
Technische Universität München

Chip Multicore Processors


Tutorial 8

S. Wallentowitz

Institute for Integrated Systems, Theresienstr. 90, Building N1, www.lis.ei.tum.de


Task 8.1: Performance of Snooping-Based Cache Coherency


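[Figure: initial state. The caches of P0, P1 and P2 each have four lines (0 to 3) holding coherency state, address tag and data; main memory holds addresses 400 to 438.]

Initial cache contents as far as recoverable (line: state address (data)):
P0: 0: I, 1: S 408 (54 | 04), 2: M 410 (20 | 01), 3: I
P1: 0: I, 1: M 428 (00 | 20), 2: I, 3: S 418 (0a | 00)
P2: 0: S 420 (01 | 02), 1: S 408 (54 | 04), 2: M 430 (00 | 00), 3: I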


a) sequence 1
1: (P1) read 410
2: (P2) read 410
3: (P0) read 430

[Figure: cache and memory state after the sequence. Annotations: 1: read miss, write back; 2: write back and load; 3: replace. Final lines recoverable from the figure: P0 line 2: S 430 (00 | 00); P1 line 2: S 410 (20 | 01); P2 line 2: S 410 (20 | 01); memory 410: 20 | 01.]

Total cost of steps 1 + 2 + 3: 200 cycles
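The replacements in this sequence follow from how addresses map to cache lines. In step 2, P2 must evict its modified 430 to load 410. In step 3, P0 must evict 410 to load 430. The sketch below is a minimal model that rests on several assumptions the slides do not state. It assumes direct-mapped caches with four lines and 8-byte blocks, since the tags step by 0x8, and it treats addresses as hexadecimal. The helper names `block_of` and `line_of` are hypothetical.

```python
NUM_LINES = 4  # assumption: 4 lines per cache, direct-mapped
BLOCK = 8      # assumption: 8-byte blocks (tags step by 0x8)


def block_of(addr: int) -> int:
    """Start address of the block that contains addr."""
    return addr - addr % BLOCK


def line_of(addr: int) -> int:
    """Cache line (index) that the block containing addr maps to."""
    return (addr // BLOCK) % NUM_LINES


if __name__ == "__main__":
    for a in (0x408, 0x410, 0x418, 0x420, 0x424, 0x428, 0x430):
        print(f"{a:#x}: block {block_of(a):#x}, line {line_of(a)}")
```

Under these assumptions, 410 and 430 both map to line 2, so they conflict. Address 424 lies in block 420 (line 0), which is why the accesses to 424 in sequence 2 act on the line holding 420.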


a) sequence 2
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23

[Figure: cache and memory state after the sequence. Annotations: 1: write miss, snooped by P2 (snoop WM); 2: read miss, write back; 3: invalidate. Final lines recoverable from the figure: P0 line 0: 420 (01 | 42), state M, then S, then I; P2 line 0: 420, state I, then S, then M, final data 23 | 42; memory 420 after the write back: 01 | 42.]

Total cost of steps 1 + 2 + 3: 124 cycles


a) sequence 3
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23

Self study

[Figure: initial cache and memory state (same as at the start of Task 8.1).]


8.1 b)

To reduce external (memory) accesses, an owner state (O) is added to the cache coherency protocol. On a write, all other copies of the line are invalidated (write-invalidate). When another cache has a read miss, the current owner supplies the data instead of memory. Sketch the modified state diagram of the resulting MOSI protocol.


Example Coherency Protocol (MSI)


[Figure: MSI state diagram for a write-back cache; all actions refer to cache lines. States: Invalid, Shared, Modified. Transition labels (event, with cache action in parentheses): Invalidate; Write Miss; Read Hit; CPU Write Miss (place write miss on bus); CPU Read Miss (place read miss on bus); Read Miss; Write Miss (write back); Hit. Legend: processor-triggered and bus-triggered events, each with its cache action.]
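The MSI diagram can also be written down as a transition table. The sketch below is our own encoding of a standard write-invalidate MSI protocol, so it may differ in detail from the edges drawn on the slide. The event names and the `next_state` helper are hypothetical. Events prefixed `cpu_` are processor-triggered, and events prefixed `bus_` are snooped from other caches.

```python
# (state, event) -> (next_state, cache action or None)
MSI = {
    ("I", "cpu_read"):       ("S", "place read miss on bus"),
    ("I", "cpu_write"):      ("M", "place write miss on bus"),
    ("S", "cpu_read"):       ("S", None),                       # read hit
    ("S", "cpu_write"):      ("M", "place invalidate on bus"),  # upgrade
    ("S", "bus_read_miss"):  ("S", None),                       # memory supplies data
    ("S", "bus_write_miss"): ("I", None),
    ("S", "bus_invalidate"): ("I", None),
    ("M", "cpu_read"):       ("M", None),                       # hit
    ("M", "cpu_write"):      ("M", None),                       # hit
    ("M", "bus_read_miss"):  ("S", "write back block"),
    ("M", "bus_write_miss"): ("I", "write back block"),
}


def next_state(state: str, event: str):
    """Return (next_state, action); unlisted pairs leave the line unchanged."""
    return MSI.get((state, event), (state, None))


if __name__ == "__main__":
    print(next_state("M", "bus_read_miss"))
```

Unlisted pairs, such as bus events seen by an Invalid line, fall back to "no change, no action".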


MOSI
[Figure: MOSI state diagram. States: Invalid, Shared, Modified, Owner. Transition labels (event, with cache action in parentheses): Invalidate; Write Miss; Read Hit; Write Miss (write back); Read Miss (read miss); Read Miss; Write (invalidate), twice; Write Miss (write miss); Eviction (write back); Read Miss (provide data), twice; Read/Write Hit.]
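One possible reading of the MOSI diagram as a transition table is sketched below. The edge labels come from the slide, but their assignment to specific state pairs is our reconstruction, and the event names and `next_state` are hypothetical. Compared with MSI, a snooped read miss moves a Modified line to Owner, and the owner supplies the data instead of writing it back. Evicting an M or O line, or a write miss by another cache, still requires a write back.

```python
# (state, event) -> (next_state, cache action or None)
MOSI = {
    ("I", "cpu_read"):       ("S", "place read miss on bus"),
    ("I", "cpu_write"):      ("M", "place write miss on bus"),
    ("S", "cpu_read"):       ("S", None),                       # read hit
    ("S", "cpu_write"):      ("M", "place invalidate on bus"),
    ("S", "bus_read_miss"):  ("S", None),                       # owner or memory supplies data
    ("S", "bus_write_miss"): ("I", None),
    ("S", "bus_invalidate"): ("I", None),
    ("S", "evict"):          ("I", None),                       # clean copy, no write back
    ("O", "cpu_read"):       ("O", None),                       # read hit
    ("O", "cpu_write"):      ("M", "place invalidate on bus"),
    ("O", "bus_read_miss"):  ("O", "provide data"),
    ("O", "bus_write_miss"): ("I", "write back block"),
    ("O", "bus_invalidate"): ("I", None),                       # writer takes over the dirty data
    ("O", "evict"):          ("I", "write back block"),
    ("M", "cpu_read"):       ("M", None),                       # hit
    ("M", "cpu_write"):      ("M", None),                       # hit
    ("M", "bus_read_miss"):  ("O", "provide data"),
    ("M", "bus_write_miss"): ("I", "write back block"),
    ("M", "evict"):          ("I", "write back block"),
}


def next_state(state: str, event: str):
    """Return (next_state, action); unlisted pairs leave the line unchanged."""
    return MOSI.get((state, event), (state, None))


if __name__ == "__main__":
    print(next_state("M", "bus_read_miss"))
```

The Owner state lets a dirty line be shared without updating memory. The write back is deferred until the owner evicts the line or another cache's write miss forces it.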


c) sequence 1
1: (P1) read 410
2: (P2) read 410
3: (P0) read 430

[Figure: cache and memory state after the sequence with the MOSI protocol. Annotations: 1: read miss, P0 provides data; 2: write back and load, P0 provides data; 3: replace. P0 line 2 holds 410 in state O (owner) until it is replaced by 430.]

Total cost of steps 1 + 2 + 3: 152 cycles


c) sequence 2
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23

[Figure: cache and memory state after the sequence with the MOSI protocol. Annotations: 1: write miss; P2 snoops the write miss (WM) but does not provide data; 2: read miss, P0 provides data; 3: invalidate. Final lines recoverable from the figure: P0 line 0: 420 (01 | 42), state M, then O, then I; P2 line 0: 420, state I, then S, then M, final data 23 | 42; memory 420 stays 01 | 02.]

Total cost of steps 1 + 2 + 3: 60 cycles
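The slides give only the totals, not a latency table. Assume that each step costs the sum of its events, using four hypothetical per-event latencies:

- t_mem: memory supplies the block
- t_wb: a write back
- t_c2c: another cache supplies the block
- t_inv: an invalidate

Reading the annotations this way, the four sequences give the system below. This is our reconstruction, not values stated in the tutorial.

```latex
\begin{aligned}
\text{MSI, seq.\ 1:}\quad & (t_{wb}+t_{mem}) + (t_{wb}+t_{mem}) + t_{mem} = 3t_{mem} + 2t_{wb} = 200\\
\text{MOSI, seq.\ 1:}\quad & t_{c2c} + (t_{wb}+t_{c2c}) + (t_{wb}+t_{mem}) = t_{mem} + 2t_{wb} + 2t_{c2c} = 152\\
\text{MSI, seq.\ 2:}\quad & t_{mem} + (t_{wb}+t_{mem}) + t_{inv} = 2t_{mem} + t_{wb} + t_{inv} = 124\\
\text{MOSI, seq.\ 2:}\quad & t_{mem} + t_{c2c} + t_{inv} = 60
\end{aligned}
```

The system can be solved in three steps:

1. Subtracting the two sequence-1 equations gives t_mem - t_c2c = 24.
2. Subtracting the two sequence-2 equations gives t_mem + t_wb - t_c2c = 64, so t_wb = 40.
3. Substituting back gives t_mem = 40, t_c2c = 16 and t_inv = 4.

In the MOSI version of sequence 1, the cache-to-cache transfers are cheaper than memory accesses. However, the owner's deferred write back reappears as a cost in step 3.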


c) sequence 3
1: (P0) write 420, 42
2: (P2) read 424
3: (P2) write 424, 23

Self study (solution will be published online)

[Figure: initial cache and memory state (same as at the start of Task 8.1).]
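A small simulator can help check hand-traced sequences like those in a) and c). The sketch below is a hypothetical model, not the tutorial's tool, and makes the following assumptions:

- three direct-mapped caches with four lines and 8-byte blocks (inferred from the figure)
- write-invalidate MSI or MOSI
- data values are not modeled
- per-event latencies of memory 40, write back 40, cache-to-cache 16 and invalidate 4 cycles

The slides do not state these latencies; they were chosen so that the four slide totals are reproduced. Only the initial lines touched by sequences 1 and 2 are preloaded.

```python
from typing import List, Optional, Tuple

# Hypothetical latencies in cycles (not given in the slides).
LATENCY = {"mem": 40, "wb": 40, "c2c": 16, "inv": 4}
NUM_LINES = 4  # assumption: direct-mapped, 4 lines per cache
BLOCK = 8      # assumption: 8-byte blocks


def block_of(addr: int) -> int:
    return addr - addr % BLOCK


def line_of(addr: int) -> int:
    return (addr // BLOCK) % NUM_LINES


class SnoopingSystem:
    """Bus-based write-back caches running write-invalidate MSI or MOSI."""

    def __init__(self, protocol: str, n_cpus: int = 3) -> None:
        assert protocol in ("MSI", "MOSI")
        self.protocol = protocol
        # caches[cpu][line] = (state, block address or None)
        self.caches: List[List[Tuple[str, Optional[int]]]] = [
            [("I", None)] * NUM_LINES for _ in range(n_cpus)
        ]

    def preload(self, cpu: int, state: str, addr: int) -> None:
        self.caches[cpu][line_of(addr)] = (state, block_of(addr))

    def _others(self, cpu: int, blk: int) -> List[Tuple[int, str]]:
        """(cpu, state) of every other cache holding blk in a valid state."""
        idx = line_of(blk)
        return [(p, c[idx][0]) for p, c in enumerate(self.caches)
                if p != cpu and c[idx][1] == blk and c[idx][0] != "I"]

    def access(self, cpu: int, op: str, addr: int) -> List[str]:
        """Read ('R') or write ('W') addr; return the bus events it causes."""
        blk, idx = block_of(addr), line_of(addr)
        state, tag = self.caches[cpu][idx]
        others = self._others(cpu, blk)
        events: List[str] = []

        if tag == blk and state != "I":               # hit
            if op == "W" and state in ("S", "O"):     # upgrade: invalidate other copies
                events.append("inv")
                for p, _ in others:
                    self.caches[p][idx] = ("I", blk)
                self.caches[cpu][idx] = ("M", blk)
            return events

        if state in ("M", "O"):                       # dirty victim is written back
            events.append("wb")
        owner = next((p for p, s in others if s in ("M", "O")), None)

        if op == "R":
            if owner is not None and self.protocol == "MOSI":
                events.append("c2c")                  # owner supplies data, stays owner
                self.caches[owner][idx] = ("O", blk)
            else:
                if owner is not None:                 # MSI: owner writes back, drops to S
                    events.append("wb")
                    self.caches[owner][idx] = ("S", blk)
                events.append("mem")
            self.caches[cpu][idx] = ("S", blk)
        else:                                         # write miss
            if owner is not None:                     # dirty copy written back first
                events.append("wb")
            for p, _ in others:
                self.caches[p][idx] = ("I", blk)
            events.append("mem")
            self.caches[cpu][idx] = ("M", blk)
        return events


def total_cycles(protocol: str, preload, ops) -> int:
    """Run a sequence from the given initial lines and sum the event latencies."""
    system = SnoopingSystem(protocol)
    for cpu, state, addr in preload:
        system.preload(cpu, state, addr)
    return sum(LATENCY[e] for cpu, op, addr in ops
               for e in system.access(cpu, op, addr))


# Initial lines from the figure that sequences 1 and 2 touch.
SEQ1_PRE = [(0, "M", 0x410), (2, "M", 0x430)]
SEQ1 = [(1, "R", 0x410), (2, "R", 0x410), (0, "R", 0x430)]
SEQ2_PRE = [(2, "S", 0x420)]
SEQ2 = [(0, "W", 0x420), (2, "R", 0x424), (2, "W", 0x424)]

if __name__ == "__main__":
    for proto in ("MSI", "MOSI"):
        print(proto, total_cycles(proto, SEQ1_PRE, SEQ1),
              total_cycles(proto, SEQ2_PRE, SEQ2))
```

With these assumptions, the script prints `MSI 200 124` and `MOSI 152 60`. It only checks that one consistent event model exists, and it does not confirm the tutorial's actual latency table.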


8.2

Read the article "Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System" by Daniel Molka et al., PACT 2009. Briefly describe the investigated architecture. What does the term ccNUMA describe? How does the information in the L3 cache relate to the other cache levels, and how precise is it? Briefly describe the benchmarks performed and the central findings of the article.


ccNUMA: cache-coherent non-uniform memory access
L3: inclusive last-level cache; its core valid bits are imprecise:
0: the core definitely does not hold a copy
1: the core may hold a copy

Benchmarks: latency, local and global bandwidth
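The asymmetry of the core valid bits can be illustrated with a simplified model. This is our own model, not Intel's implementation, and the class and method names are hypothetical. It assumes that a core's bit is set when the core loads the line into its private caches. It also assumes the bit is not cleared when the core silently drops a clean copy. Under that assumption, a 0 reliably filters out snoops, while a 1 can trigger a snoop that turns out to be unnecessary.

```python
from typing import List


class InclusiveL3Line:
    """One inclusive-L3 entry with a core valid bit per core (simplified model)."""

    def __init__(self, n_cores: int) -> None:
        self.core_valid = [False] * n_cores  # True: core MAY hold a copy
        self.in_core = [False] * n_cores     # ground truth, invisible to the L3

    def core_fill(self, core: int) -> None:
        """Core loads the line into its private caches; the L3 sets its bit."""
        self.core_valid[core] = True
        self.in_core[core] = True

    def silent_evict(self, core: int) -> None:
        """Core drops a clean copy without notifying the L3; the bit stays set."""
        self.in_core[core] = False

    def snoop_targets(self, requester: int) -> List[int]:
        """Other cores that must be snooped: exactly those whose bit is set."""
        return [c for c, v in enumerate(self.core_valid) if v and c != requester]


if __name__ == "__main__":
    line = InclusiveL3Line(4)
    line.core_fill(1)
    line.silent_evict(1)
    print(line.snoop_targets(0))  # core 1 is snooped although it holds no copy
```

The invariant is that `in_core[c]` implies `core_valid[c]`, so no core that actually holds a copy is ever skipped.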
