Multiprocessor Architecture
Content at a Glance
Review of week 9
Introduction
Flynn's Taxonomy
Multiprocessors
Multicomputers
Multiprocessor Architecture
Demand paging - bring in pages as required (swap pages)
Thrashing - too many processes in too little memory
TLB - contains the page table entries that have been most recently used; most references will be to locations in recently used pages
Segments - multiple address spaces of variable, dynamic size
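The review points above can be sketched as a tiny address-translation model. The page table, TLB capacity, and frame numbers below are made-up illustrative values, not any real MMU:

```python
# Minimal demand-paging sketch: a TLB consulted before the page table.
PAGE_SIZE = 4096

page_table = {0: 7, 1: 3, 2: 9}   # page -> frame, resident pages only
tlb = {}                           # most recently used page-table entries
TLB_CAPACITY = 2

def translate(vaddr):
    """Return (physical address, 'tlb' | 'table' | 'fault')."""
    page, offset = divmod(vaddr, PAGE_SIZE)
    if page in tlb:                          # most references hit recently used pages
        return tlb[page] * PAGE_SIZE + offset, "tlb"
    if page in page_table:                   # TLB miss: walk the page table
        if len(tlb) >= TLB_CAPACITY:         # evict an arbitrary entry
            tlb.pop(next(iter(tlb)))
        tlb[page] = page_table[page]
        return tlb[page] * PAGE_SIZE + offset, "table"
    return None, "fault"                     # page fault: bring the page in on demand

print(translate(4100))   # page 1 -> frame 3, filled from the page table
print(translate(4200))   # same page again: TLB hit
```

A reference to a non-resident page returns a fault, which is where demand paging would swap the page in.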
1. CPU requests contents of a memory location
2. Check cache for this data
3. If present, get from cache (fast)
4. If not present, read required block from main memory into cache
5. Then deliver from cache to CPU
The cache includes tags to identify which block of main memory is in each cache slot
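The read sequence above can be sketched for a direct-mapped cache. The sizes and the fake memory contents are illustrative only:

```python
# Direct-mapped cache sketch: tags identify which block occupies each slot.
NUM_SLOTS = 4
BLOCK_SIZE = 16

main_memory = {addr: addr % 251 for addr in range(0, 1024)}  # fake contents
cache = [None] * NUM_SLOTS   # each slot holds (tag, block data) or None

def read(addr):
    """Return (value, 'cache hit' | 'cache miss')."""
    block = addr // BLOCK_SIZE
    slot = block % NUM_SLOTS       # which slot this block maps to
    tag = block // NUM_SLOTS       # the tag says WHICH block is in that slot
    entry = cache[slot]
    if entry is not None and entry[0] == tag:
        data, status = entry[1], "cache hit"       # fast path
    else:
        base = block * BLOCK_SIZE                  # read block from main memory
        data = [main_memory[a] for a in range(base, base + BLOCK_SIZE)]
        cache[slot] = (tag, data)
        status = "cache miss"
    return data[addr % BLOCK_SIZE], status         # deliver from cache to CPU

print(read(100))   # first access: miss, block fetched
print(read(100))   # second access: tag matches, hit
```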
Richard Salomon, Sudipto Mitra Copyright Box Hill Institute
It is common to find multiple processors in a computer system, often within the same chip. Multiprocessor and parallel computing organisation issues are similar, and both need careful consideration to facilitate construction of high-performance distributed computing systems.
Introduction (1)
Introduction (2)
Instruction-level parallelism has been exploited for a long time, mainly through pipelining and micro-operation parallelism. Superscalar machines with multiple execution units within a single processor (uni-processor systems) allow parallel execution of multiple instructions from the same program.
Introduction (3)
Systems with multiple processors extend parallelism to multi-program threads. Symmetric multiprocessors (SMPs), although the earliest, are still the most common parallel organisation. Clusters are common in multi-server systems with workloads beyond the capability of SMPs.
Introduction (4)
Non-Uniform Memory Access (NUMA) is a more recent approach, used in larger data warehouse systems and supporting the most recent virtualization approaches. Multiprocessor environments are classified as either tightly coupled or loosely coupled systems.
Examples:
MISD -- none
MIMD -- Butterfly, Cray X/MP
March 20, 2012
SISD
Single processor; single instruction stream; data stored in a single memory (uni-processor). The control unit (CU) provides an instruction stream (IS) to a processing unit (PU). The PU operates on a single data stream (DS) from a memory unit (MU).
SIMD
A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis. Each processing element has an associated data memory, so each instruction is executed on a different set of data by different processors. Vector and array processors are examples. A single CU feeds a single IS to multiple PUs, with dedicated local memory (LM) or shared memory.
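Lockstep SIMD execution can be shown as a toy model, not real vector hardware: one instruction, applied in the same step by every processing element to its own local data element:

```python
# SIMD sketch: one instruction stream, many processing elements (PEs).
def simd_step(instruction, local_data):
    # Every PE executes the SAME instruction on a DIFFERENT data element.
    return [instruction(x) for x in local_data]

pe_data = [1, 2, 3, 4]                       # one operand per processing element
result = simd_step(lambda x: x * 10, pe_data)
print(result)   # each PE multiplied its own operand by 10
```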
MISD
A sequence of data is transmitted to a set of processors, each of which executes a different instruction sequence. This organisation is impractical and has not been implemented.
MIMD
A set of processors simultaneously execute different instruction sequences on different sets of data. Multiple CUs each feed an IS to their own PU, with shared memory or a distributed-memory multicomputer. MIMD systems are further classified by the method of processor communication. Examples include SMPs, clusters and NUMA systems.
[Figure: (a) a tightly coupled organisation -- processors P1 and P2 sharing memory and I/O modules I-O 1 and I-O 2; (b) a loosely coupled organisation -- P1 and P2, each with its own I/O, connected by a communications link]
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
The logical structure of a simple DVD player is a heterogeneous multiprocessor containing multiple cores for different functions.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
Hardware - multiprocessor computers have become commodity products, e.g., quad-processor Pentium Pros, SGI and Sun workstations.
Programming - multithreaded programming is supported by commodity operating systems, e.g., Windows NT, UNIX/Pthreads.
Applications - traditionally science and engineering; now also business and home computing.
Problem - the difficulty of multithreaded programming compared to sequential programming.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
Multiprocessors
a. A multiprocessor with 16 CPUs sharing a common memory.
b. An image partitioned into 16 sections, each being analyzed by a different CPU.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
Multicomputers
a. A multicomputer with 16 CPUs, each with its own private memory.
b. The bit-map image split up among the 16 memories.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
Topology
(a) A star; (d) a ring; (g) a cube. The heavy dots represent switches; the CPUs and memories are not shown.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
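Interconnect topologies can be compared by the number of switch hops between nodes. A toy model of two of the topologies named above, with breadth-first search for the shortest path:

```python
# Topology sketch: adjacency sets for a ring and a star of switches.
from collections import deque

def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def star(n):                       # node 0 is the central switch
    g = {0: set(range(1, n))}
    g.update({i: {0} for i in range(1, n)})
    return g

def hops(graph, src, dst):
    """Breadth-first search for the shortest switch-to-switch path."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == dst:
            return d
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))

print(hops(ring(8), 0, 4))   # opposite side of an 8-node ring: 4 hops
print(hops(star(8), 3, 5))   # any two leaves go through the hub: 2 hops
```

The star has constant distance but the hub is a single point of failure and contention; the ring degrades gracefully but distances grow with the node count.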
Multiprocessor Architectures
Message-Passing Architectures
Separate address space for each processor. Processors communicate via message passing.
Shared-Memory Architectures
Single address space shared by all processors. Processors communicate by memory read/write. SMP or NUMA. Cache coherence is an important issue.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
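The two models above can be contrasted in miniature with threads standing in for processors (a sketch, not a real multiprocessor): shared-memory "processors" read and write one address space, while message-passing "processors" keep separate state and only exchange messages over a channel.

```python
import threading, queue

# --- Shared-memory style: communicate by memory read/write ---
shared = {"total": 0}
lock = threading.Lock()

def shared_worker(values):
    for v in values:
        with lock:                 # consistency handled by locking here
            shared["total"] += v

# --- Message-passing style: separate state, explicit send/receive ---
channel = queue.Queue()

def sender(values):
    for v in values:
        channel.put(v)             # send
    channel.put(None)              # end-of-stream marker

def receiver(out):
    while True:
        v = channel.get()          # receive
        if v is None:
            break
        out.append(v)

out = []
threads = [
    threading.Thread(target=shared_worker, args=([1, 2, 3],)),
    threading.Thread(target=shared_worker, args=([4, 5],)),
    threading.Thread(target=sender, args=([10, 20],)),
    threading.Thread(target=receiver, args=(out,)),
]
for t in threads: t.start()
for t in threads: t.join()
print(shared["total"], out)
```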
Message-Passing Architecture
[Figure: message-passing architecture -- each processor has its own cache and private memory, with the processors connected through an interconnection network]
Shared-Memory Architecture
[Figure: shared-memory architecture -- processors 1..N, each with a cache, connected through an interconnection network to shared memory modules 1..M]
Two or more similar processors of comparable capacity
Processors share the same memory and I/O
Processors are connected by a bus or other internal connection
Memory access time is approximately the same for each processor
All processors can perform the same functions (hence symmetric)
The system is controlled by an integrated operating system providing interaction between processors
Interaction occurs at the job, task, file and data element levels
SMP Advantages
Since all processors can perform the same functions, failure of a single processor does not halt the system
Performance can be enhanced by adding more processors (incremental growth)
Organization Classification
Time-shared or common bus
Multiport memory
Central control unit
Time-Shared Bus
The simplest form; structure and interface are similar to a single-processor system
The following features are provided:
Addressing - distinguish modules on the bus
Arbitration - any module can be temporary master
Time sharing - if one module has the bus, others must wait and may have to suspend
There are now multiple processors as well as multiple I/O modules
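The arbitration and time-sharing rules above can be sketched with a lock standing in for bus mastership (illustrative only; real arbitration is done in hardware):

```python
# Time-shared bus sketch: any module may become temporary bus master;
# while one module holds the bus, the others must wait.
import threading

bus = threading.Lock()       # only one bus master at a time
transfer_log = []

def module(name, words):
    with bus:                # arbitration: acquire mastership, others wait
        for w in words:
            transfer_log.append((name, w))   # drive the bus for each word

threads = [threading.Thread(target=module, args=(f"cpu{i}", [0, 1]))
           for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()
print(len(transfer_log))   # all transfers completed, one master at a time
```

Because the lock is held for the whole transfer, each module's words appear contiguously in the log; this is also the bottleneck that limits bus-based SMPs.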
Multiport Memory
Direct, independent access to memory modules by each processor
Logic is required to resolve conflicts
Little or no modification to processors or modules is required
More complex, but offers better performance and increased security
Configurations range from a uniprocessor with one main memory card to a high-end system with 48 processors and 8 memory cards
Dual-core processor chip - each includes two identical central processors (CPs)
CISC superscalar microprocessor, mostly hardwired with some vertical microcode
256-kB L1 instruction cache and 256-kB L1 data cache; 32-MB L2 cache
L2 caches arranged in clusters of five, each cluster supporting eight processors and access to the entire main memory space
Arbitrates system communication and maintains cache coherence
Main store control (MSC) - interconnects the L2 caches and main memory
Memory cards - each 32 GB, maximum of 8, for a total of 256 GB; interconnect to the MSC via synchronous memory interfaces (SMIs)
Interface to I/O channels, going directly to the L2 cache
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
NUMA Multiprocessors
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
The Sun Fire E25K NUMA Multiprocessor (2)
The Sun Fire E25K uses a four-level interconnect. Dashed lines are address paths; solid lines are data paths.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)
Problem - multiple copies of the same data may reside in several caches and in main memory. The copies must be kept identical, otherwise the result is an inconsistent view of memory. This is the cache coherence problem, and cache coherence protocols control it.
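One family of such protocols is write-invalidate: on a write, copies of the location in other caches are invalidated, so a stale copy is never read. A toy model of that idea (not MESI or any real protocol):

```python
# Write-invalidate coherence sketch: per-CPU caches over one shared memory.
class System:
    def __init__(self, n_cpus):
        self.memory = {}
        self.caches = [dict() for _ in range(n_cpus)]   # addr -> value per CPU

    def read(self, cpu, addr):
        cache = self.caches[cpu]
        if addr not in cache:                    # miss: fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cpu, addr, value):
        for i, cache in enumerate(self.caches):  # snoop: invalidate other copies
            if i != cpu:
                cache.pop(addr, None)
        self.caches[cpu][addr] = value           # write-through to memory
        self.memory[addr] = value

sys_ = System(2)
sys_.write(0, 100, 1)       # CPU 0 writes location 100
print(sys_.read(1, 100))    # CPU 1 sees the new value, not a stale copy
```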
SMP, bus interconnection. 4 x 200 MHz Intel Pentium Pro processors. 8 + 8 KB L1 cache per processor. 512 KB L2 cache per processor. Snoopy cache coherence. Compaq, HP, IBM, NetPower. Windows NT, Solaris, Linux, etc.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
NUMA, hypercube interconnection. Up to 128 (64 x 2) MIPS R10000 processors. 32 + 32 KB L1 cache per processor. 4 MB L2 cache per processor. Distributed directory-based cache coherence. Automatic page migration/replication. SGI IRIX with Pthreads.
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
The shared-memory programming model is easier because data transfer is handled automatically. Message passing can be efficiently implemented on shared memory, but not vice versa. How much of the shared-memory programming model should be implemented in hardware? How efficient is the shared-memory programming model? How well does shared memory scale? Does scalability really matter?
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)
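The claim above, that message passing is easy to build on shared memory, can be shown in miniature: a send/receive channel made from a shared list plus a condition variable (a sketch, not a production queue):

```python
import threading

class Channel:
    def __init__(self):
        self._buf = []                     # shared memory holds the messages
        self._cv = threading.Condition()

    def send(self, msg):
        with self._cv:
            self._buf.append(msg)          # "transfer" is just a memory write
            self._cv.notify()

    def recv(self):
        with self._cv:
            while not self._buf:           # block until a message arrives
                self._cv.wait()
            return self._buf.pop(0)

ch = Channel()
threading.Thread(target=lambda: ch.send("hello")).start()
print(ch.recv())   # prints "hello"
```

The reverse direction is the hard one: emulating a single coherent address space on top of message-passing hardware requires trapping and forwarding every remote access.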
Summary
Flynn's taxonomy of parallel computer architecture (SISD, SIMD, MISD, MIMD)
Tightly coupled systems communicate via a shared memory
Loosely coupled systems are tied together by a switching scheme
Multiprocessor architectures are classified as message-passing or shared-memory architectures
Reference
Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, Sixth Edition, Pearson Education, ISBN 0-13-049307-4.
Mano, M. Morris, Computer System Architecture, Third Edition, Prentice Hall.
Tanenbaum, Structured Computer Organization, Fifth Edition, 2006, Pearson Education, ISBN 0-13-148521-0.
CS 284a Lecture, Tuesday, 7 October 1997, John Thornley.
Further Reading
Manufacturers' websites
Relevant Special Interest Groups (SIGs)
Articles in magazines
IEEE Computer Society Task Force on Cluster Computing website