
Centre for Computer Technology

ICT123 Computer Architecture


Week 10

Multiprocessor Architecture

Content at a Glance
- Review of week 9
- Introduction
- Flynn's Taxonomy
- Multiprocessors
- Multicomputers
- Multiprocessor Architecture

March 20, 2012

Richard Salomon, Sudipto Mitra Copyright Box Hill Institute

Paging - Allocation of Free Frames


- Demand paging: bring pages in as required (swap pages)
- Thrashing: too many processes in too little memory
- TLB: holds the most recently used page table entries; most references will be to locations in recently used pages
- Segments are multiple address spaces of variable, dynamic size


Cache operation overview


1. CPU requests the contents of a memory location
2. Check the cache for this data
3. If present, get it from the cache (fast)
4. If not present, read the required block from main memory into the cache
5. Then deliver it from the cache to the CPU
6. The cache includes tags to identify which block of main memory is in each cache slot
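The six lookup steps above can be sketched in code. This is a minimal sketch of a direct-mapped cache; the block size, line count and memory contents are illustrative assumptions, not values from the slides:

```python
# Sketch of the cache lookup steps above, assuming a direct-mapped cache.
# Line count, block size and memory contents are illustrative only.
BLOCK_SIZE = 4    # words per block
NUM_LINES = 8     # cache lines

memory = list(range(256))            # main memory: word i holds value i
cache = [None] * NUM_LINES           # each slot: (tag, block) or None

def read(addr):
    block_no = addr // BLOCK_SIZE    # which main-memory block holds addr
    line = block_no % NUM_LINES      # direct mapping: block -> fixed line
    tag = block_no // NUM_LINES      # tag says which block occupies the line
    slot = cache[line]
    if slot is not None and slot[0] == tag:
        hit = True                   # present: get from cache (fast)
    else:
        start = block_no * BLOCK_SIZE
        cache[line] = (tag, memory[start:start + BLOCK_SIZE])
        hit = False                  # miss: block fetched into the cache
    return cache[line][1][addr % BLOCK_SIZE], hit  # deliver from cache to CPU

value, hit = read(42)     # first access to this block: miss
value2, hit2 = read(43)   # same block, now cached: hit
```

The tag comparison is step 6 in action: it distinguishes the many memory blocks that map to the same cache line.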

Typical Cache Organization


Direct Mapping Example


Associative Mapping Example


Two Way Set Associative Mapping Example


Introduction (1)

- It is common to find multiple processors in a computer system, often within the same chip
- Multiprocessor systems and parallel computing organisations raise similar issues, and both need careful design to build high-performance distributed computing systems

Introduction (2)
- Instruction-level parallelism has been exploited for a long time, mainly through pipelining and micro-operation parallelism
- Superscalar machines, with multiple execution units within a single processor (uniprocessor systems), allow parallel execution of multiple instructions from the same program


Introduction (3)
- Systems with multiple processors extend parallelism to multiple program threads
- Symmetric multiprocessors (SMPs), although the earliest, are still the most common parallel organisation
- Clusters are common in multi-server systems with workloads beyond the capability of SMPs


Introduction (4)
- Non-Uniform Memory Access (NUMA) is a more recent approach, used in larger data warehouse systems and supporting the most recent virtualization approaches
- Multiprocessor environments are classified as either tightly coupled or loosely coupled systems


Flynn's Taxonomy of Parallel Processor Architectures


Taxonomy of Parallel Computers

Flynn's taxonomy of parallel computers.


(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Flynn's Taxonomy (examples)

SISD -- IBM 370, 486 PC, Macintosh, VAX
SIMD -- NASA's MPP, ILLIAC IV
MISD -- None
MIMD -- Butterfly, Cray X/MP

Parallel Organizations - SISD


SISD

- Single processor; single instruction stream; data stored in a single memory (a uniprocessor)
- The control unit (CU) provides an instruction stream (IS) to a processing unit (PU)
- The PU operates on a single data stream (DS) from a memory unit (MU)

Parallel Organizations - SIMD


SIMD

- A single machine instruction controls the simultaneous execution of a number of processing elements on a lockstep basis
- Each processing element has an associated data memory
- Each instruction is executed on a different set of data by the different processors
- Vector and array processors fall into this category
- A single CU feeds a single IS to multiple PUs; memory may be dedicated local memory (LM) or shared
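The lockstep idea can be sketched in plain Python standing in for vector hardware; the processing-element count and data values here are illustrative assumptions:

```python
# SIMD sketch: one instruction stream, multiple data streams.
# Each processing element (PE) has its own local memory; a single
# broadcast instruction is applied to every PE's data in lockstep
# (modelled here sequentially, since this is plain Python).
local_memories = [[1, 2], [3, 4], [5, 6], [7, 8]]   # one list per PE

def broadcast(instruction, memories):
    # The single CU broadcasts the instruction; every PE executes it
    # on its own data stream.
    return [[instruction(x) for x in mem] for mem in memories]

result = broadcast(lambda x: x + 10, local_memories)
```

The key property is that every element runs the *same* instruction; only the data differs per processing element.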

Parallel Organizations - MISD


MISD
- A sequence of data is transmitted to a set of processors
- Each processor executes a different instruction sequence
- Impractical, and has not been implemented


Parallel Organizations - MIMD Shared Memory


Parallel Organizations - MIMD Distributed Memory


MIMD

- A set of processors simultaneously executes different instruction sequences on different sets of data
- Multiple CUs each feed an IS to their own PU
- May be a shared-memory multiprocessor or a distributed-memory multicomputer
- Further classified by the method of processor communication
- Examples include SMPs, clusters and NUMA systems
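The MIMD idea of independent instruction streams operating on independent data can be sketched with ordinary threads. This is an illustrative sketch of the programming model only; CPython threads time-share a core rather than providing true hardware parallelism:

```python
# MIMD sketch: multiple threads (standing in for processors) run
# *different* instruction sequences on *different* data sets.
import threading

results = {}

def summer(data):          # one instruction sequence
    results["sum"] = sum(data)

def scaler(data):          # a different instruction sequence
    results["scaled"] = [x * 2 for x in data]

t1 = threading.Thread(target=summer, args=([1, 2, 3],))
t2 = threading.Thread(target=scaler, args=([4, 5],))
t1.start(); t2.start()
t1.join(); t2.join()
```

Contrast with the SIMD sketch: here nothing forces the two workers to run the same instruction at the same time.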

Tightly Coupled MP Systems (1)

[Diagram: processors P1 and P2 connected to a shared memory, with I/O modules I-O 1 and I-O 2]


Tightly Coupled MP Systems (2)


- Processors share a global common memory and communicate via that shared memory
- Each processor can also have its own local memory
- Most commercial multiprocessors provide a cache memory with each CPU
- Tightly coupled systems tolerate a higher degree of interaction between tasks

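Communication through a global common memory can be sketched with threads sharing one variable; a lock models the serialisation that shared-memory hardware must provide. A minimal, illustrative Python sketch:

```python
# Tightly coupled sketch: two "processors" (threads) communicate
# through a shared global counter; a lock guards the shared memory.
import threading

shared = {"counter": 0}         # the global common memory
lock = threading.Lock()

def worker():
    for _ in range(10000):
        with lock:              # serialise access to shared memory
            shared["counter"] += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock the two read-modify-write sequences could interleave and lose updates, which previews the coherence and synchronisation issues discussed later.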

Block Diagram of Tightly Coupled Multiprocessor System


Loosely Coupled MP Systems (1)


[Diagram: processors P1 and P2, each with private memory (M1, M2) and I/O modules (I-O 1, I-O 2), joined by a communications link]

Loosely Coupled MP Systems (2)


- Each processor has its own private memory
- Processors are tied together by a switching scheme designed to route information via message passing
- Information is relayed in packets
- Most efficient when interaction between tasks is minimal

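The message-passing style above can be sketched with a queue standing in for the communications link; the packet format and end-of-stream marker are illustrative assumptions:

```python
# Loosely coupled sketch: the two sides keep private state and
# communicate only by passing packets over a queue (the switching
# scheme), never by touching each other's memory.
import queue
import threading

link = queue.Queue()            # the communications link
received = []                   # the receiver's private memory

def sender():
    for payload in ["hello", "world"]:
        link.put({"dst": "P2", "data": payload})  # relayed in packets
    link.put(None)                                # end-of-stream marker

def receiver():
    while True:
        packet = link.get()
        if packet is None:
            break
        received.append(packet["data"])

t1 = threading.Thread(target=sender)
t2 = threading.Thread(target=receiver)
t1.start(); t2.start()
t1.join(); t2.join()
```

Note the contrast with the tightly coupled sketch: no variable is written by both sides, so no lock on shared data is needed.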

Multiprogramming and Multiprocessing


Homogeneous Multiprocessors on a Chip

Single-chip multiprocessors: (a) a dual-pipeline chip; (b) a chip with two cores.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Heterogeneous Multiprocessors on a Chip (1)

The logical structure of a simple DVD player: a heterogeneous multiprocessor with multiple cores for different functions.
(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)


Heterogeneous Multiprocessors on a Chip (2)

An example of the IBM CoreConnect architecture.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Multiprocessors and Multiprocessing

- Hardware: multiprocessor computers have become commodity products, e.g. quad-processor Pentium Pros, SGI and Sun workstations
- Programming: multithreaded programming is supported by commodity operating systems, e.g. Windows NT, UNIX/Pthreads
- Applications: traditionally science and engineering; now also business and home computing
- Problem: the difficulty of multithreaded programming compared to sequential programming
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Why Buy a Multiprocessor?


- Multiple users
- Multiple applications
- Multitasking within an application
- Responsiveness and/or throughput

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)



Multiprocessors

a. A multiprocessor with 16 CPUs sharing a common memory.
b. An image partitioned into 16 sections, each being analyzed by a different CPU.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Multicomputers

a. A multicomputer with 16 CPUs, each with its own private memory.
b. The bit-map image split up among the 16 memories.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Topology

The heavy dots represent switches; the CPUs and memories are not shown.

(a) A star (b) A complete interconnect (c) A tree (d) A ring (e) A grid (f) A double torus (g) A cube (h) A 4D hypercube

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Multiprocessor Architectures

Message-Passing Architectures

- Separate address space for each processor
- Processors communicate via message passing

Shared-Memory Architectures

- Single address space shared by all processors
- Processors communicate by memory reads and writes
- SMP or NUMA
- Cache coherence is an important issue
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)


Message-Passing Architecture
[Diagram: N nodes, each a processor with its own cache and private memory, connected by an interconnection network]

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)


Shared-Memory Architecture
[Diagram: processors 1..N, each with a cache, connected through an interconnection network to shared memory modules 1..M]

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)



Shared Memory Architecture SMP (Symmetric Multiprocessor) (1)

A stand-alone computer with the following characteristics:

- Two or more similar processors of comparable capacity
- Processors share the same memory and I/O
- Processors are connected by a bus or other internal connection
- Memory access time is approximately the same for each processor

Shared Memory Architecture SMP (Symmetric Multiprocessor) (2)

- All processors share access to I/O, either through the same channels or through different channels giving paths to the same devices
- All processors can perform the same functions (hence symmetric)
- The system is controlled by an integrated operating system, providing interaction between processors at the job, task, file and data element levels

SMP Advantages

- Performance: if some work can be done in parallel
- Availability: since all processors can perform the same functions, failure of a single processor does not halt the system
- Incremental growth: a user can enhance performance by adding more processors
- Scaling: vendors can offer a range of products based on the number of processors

Organization Classification
- Time shared or common bus
- Multiport memory
- Central control unit


Time Shared Bus

- The simplest form; structure and interface are similar to a single-processor system
- The following features are provided:
  - Addressing: distinguish modules on the bus
  - Arbitration: any module can be a temporary master
  - Time sharing: if one module has the bus, the others must wait and may have to suspend
- There are now multiple processors as well as multiple I/O modules
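Bus arbitration can be sketched as follows. This toy round-robin arbiter is only one possible policy, and the cycle-by-cycle request sets are invented for illustration; real buses implement arbitration in hardware:

```python
# Toy bus arbitration sketch: modules request the shared bus and a
# round-robin arbiter picks one temporary master per cycle; the other
# requesters must wait for a later cycle.
def arbitrate(requests, last_master, num_modules):
    # Scan modules starting just after the previous master (round robin),
    # so no module can monopolise the bus.
    for offset in range(1, num_modules + 1):
        candidate = (last_master + offset) % num_modules
        if candidate in requests:
            return candidate
    return None            # no requests: bus idle this cycle

grants = []
last = -1
for cycle_requests in [{0, 2}, {0, 2}, {1}, set()]:
    master = arbitrate(cycle_requests, last, 3)
    grants.append(master)
    if master is not None:
        last = master
```

With three modules and the request pattern above, module 0 wins the first cycle, module 2 (not 0 again) wins the second, illustrating the fairness of round robin.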

Time Share Bus - Advantages


- Simplicity
- Flexibility
- Reliability


Time Share Bus - Disadvantages


- Performance is limited by the bus cycle time
- Each processor should have a local cache to reduce the number of bus accesses
- This leads to problems with cache coherence, solved in hardware (see later)


Multiport Memory
- Direct, independent access to memory modules by each processor
- Logic is required to resolve conflicts
- Little or no modification of processors or modules is required


Multiport Memory - Advantages and Disadvantages


- More complex: extra logic is needed in the memory system
- Better performance: each processor has a dedicated path to each module
- Increased security: portions of memory can be configured as private to one or more processors
- A write-through cache policy is required

Central Control Unit


- Funnels separate data streams between independent modules
- Can buffer requests
- Performs arbitration and timing
- Passes status and control
- Performs cache update alerting
- Interfaces to modules remain the same, e.g. IBM S/370


A Mainframe SMP IBM z Series Example (1)


- Ranges from a uniprocessor with one main memory card to a high-end system with 48 processors and 8 memory cards
- Dual-core processor chip: each chip includes two identical central processors (CPs)
- CISC superscalar microprocessor, mostly hardwired, with some vertical microcode
- 256-kB L1 instruction cache and 256-kB L1 data cache per CP
- 32-MB L2 cache, arranged in clusters of five; each cluster supports eight processors and gives access to the entire main memory space

A Mainframe SMP IBM z Series Example (2)

- System control element (SCE): arbitrates system communication and maintains cache coherence
- Main store control (MSC): interconnects the L2 caches and main memory
- Memory card: 32 GB each, maximum of 8 for a total of 256 GB; interconnected to the MSC via synchronous memory interfaces (SMIs)
- Memory bus adapter (MBA): interfaces to the I/O channels; traffic goes directly to the L2 cache

IBM z990 Multiprocessor Structure


Shared-Memory Architecture - NUMA

NUMA (Non-Uniform Memory Access)


- Each memory is closer to some processors than to others, e.g. Distributed Shared Memory
- Typically the interconnection is a grid or hypercube
- Harder to program, but scales to more processors

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

NUMA Multiprocessors

A NUMA machine based on two levels of buses


(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

The Sun Fire E25K NUMA Multiprocessor (1)

The Sun Microsystems E25K multiprocessor.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

The Sun Fire E25K NUMA Multiprocessor (2)

The Sun Fire E25K uses a four-level interconnect. Dashed lines are address paths; solid lines are data paths.

(Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education)

Shared-Memory Architecture: Cache Coherence


- Problem: multiple copies of the same data may reside in several caches and in main memory
- The copies must be kept identical, or the result is an inconsistent view of memory
- This is the cache coherence problem
- Cache coherence protocols control this problem

(more on this topic in the next class)


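The problem, and a write-invalidate style of fix, can be sketched in a few lines. This is a toy model with invented data structures, not any specific real protocol:

```python
# Cache coherence sketch: two caches hold copies of the same word.
# A write by CPU 0 invalidates every other cached copy, so CPU 1
# refetches the fresh value instead of reading a stale one.
memory = {"x": 1}
caches = [{"x": 1}, {"x": 1}]      # both CPUs have cached x

def write(cpu, var, value):
    caches[cpu][var] = value
    memory[var] = value            # write-through to main memory
    for other, cache in enumerate(caches):
        if other != cpu:
            cache.pop(var, None)   # invalidate every other copy

def read(cpu, var):
    if var not in caches[cpu]:     # miss: refill from main memory
        caches[cpu][var] = memory[var]
    return caches[cpu][var]

write(0, "x", 42)
coherent_value = read(1, "x")      # CPU 1 refetches: sees 42, not stale 1
```

Without the invalidation loop, CPU 1 would keep returning its stale copy of 1, which is exactly the inconsistent view of memory described above.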

Example: Quad-Processor Pentium Pro

- SMP, bus interconnection
- 4 x 200 MHz Intel Pentium Pro processors
- 8 + 8 kB L1 cache per processor
- 512 kB L2 cache per processor
- Snoopy cache coherence
- Compaq, HP, IBM, NetPower
- Windows NT, Solaris, Linux, etc.

(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Example: SGI Origin 2000


- NUMA, hypercube interconnection
- Up to 128 (64 x 2) MIPS R10000 processors
- 32 + 32 kB L1 cache per processor
- 4 MB L2 cache per processor
- Distributed directory-based cache coherence
- Automatic page migration/replication
- SGI IRIX with Pthreads
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)


Message-Passing versus Shared-Memory Architectures


- The shared-memory programming model is easier because data transfer is handled automatically
- Message passing can be efficiently implemented on shared memory, but not vice versa
- Open questions: how much of the shared-memory programming model should be implemented in hardware? How efficient is it? How well does shared memory scale, and does scalability really matter?
(CS 284a Lecture, Tuesday, 7 October 1997, John Thornley)

Summary

- Flynn's taxonomy of parallel computer architectures (SISD, SIMD, MISD, MIMD)
- Tightly coupled systems communicate via a shared memory
- Loosely coupled systems are tied together by a switching scheme
- Multiprocessor architectures are classified as message-passing architectures or shared-memory architectures
- Message passing can be efficiently implemented on shared memory

Reference

Stallings, William, 2003, Computer Organization & Architecture: Designing for Performance, Sixth Edition, Pearson Education, Inc., ISBN 0-13-049307-4.
Mano, M. Morris, Computer System Architecture, Third Edition, Prentice Hall.
Tanenbaum, Structured Computer Organization, Fifth Edition, 2006, Pearson Education, Inc., ISBN 0-13-148521-0.
CS 284a Lecture, Tuesday, 7 October 1997, John Thornley.

Further Reading

- Manufacturers' websites
- Relevant Special Interest Groups (SIGs)
- Articles in magazines
- IEEE Computer Society Task Force on Cluster Computing website
