MN Cache Coherence

CACHE COHERENCE
By: Mahesh Neupane
Cache:
Cache is key to the performance of a modern processor. In a modern computing system nearly 25% of instructions reference memory, so memory access time is a critical factor in performance. By effectively reducing the cost of a memory access, caches let modern processors reach the goal of more than one instruction per cycle of instruction throughput. A cache exploits the locality of reference property to improve data access time and reduce the cost of accessing main memory. There are 2 types of locality: a) Temporal Locality b) Spatial Locality
Temporal Locality:
Once a location is referenced, there is a high probability that it will be referenced again in the near future. Temporal locality is present in both data and instructions. The simplest example of temporal locality is instructions in loops: once the loop is entered, all the instructions in the loop will be referenced many times before the loop exits. Commonly called subroutines, functions, and interrupt handlers also exhibit temporal locality. Many programs also have hot data that they use or update many times before going on to another block of data.
Spatial Locality:
When an instruction or datum is accessed, it is very likely that nearby instructions or data will be accessed soon. An instruction stream exhibits spatial locality because, in the absence of jumps, the next instruction to be executed is the one immediately following the current one. Data also shows considerable spatial locality when arrays or strings are accessed. Programs commonly step through an array from beginning to end, accessing each element of the array sequentially.
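The effect of both kinds of locality on a cache can be illustrated with a small simulation. The following is a minimal sketch, not a model of any real processor: the function name hit_rate, the direct-mapped organization, and the sizes (64 lines of 16 bytes) are assumptions chosen only for illustration.

```python
def hit_rate(addresses, num_lines=64, block_size=16):
    """Simulate a direct-mapped cache and return the fraction of accesses that hit."""
    tags = [None] * num_lines            # block number currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size       # memory block containing this address
        line = block % num_lines         # direct-mapped placement
        if tags[line] == block:
            hits += 1
        else:
            tags[line] = block           # miss: the whole block is fetched
    return hits / len(addresses)

# Hypothetical access patterns (byte addresses).
sequential = list(range(4096))               # step through an array byte by byte
strided = [i * 16 for i in range(4096)]      # touch only one byte in every block
looped = list(range(256)) * 16               # re-execute a small loop body 16 times

print(hit_rate(sequential))   # spatial locality: 15 of every 16 accesses hit
print(hit_rate(strided))      # no locality: every access misses
print(hit_rate(looped))       # temporal locality: only the first pass misses
```

The sequential pattern benefits from spatial locality because each miss brings in 16 neighboring bytes, while the looped pattern benefits from temporal locality because the same blocks are reused after the first pass.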
Cache Coherence:
In a shared memory multiprocessor with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in the main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be changed as well. Cache coherence is the discipline that ensures that changes in the values of shared operands are propagated throughout the system in a timely fashion.
There are three distinct levels of cache coherence:
1) Every write operation appears to occur instantaneously.
2) All processes see exactly the same sequence of changes of values for each separate operand.
3) Different processes may see an operand assume different sequences of values (this is non-coherent behavior).
A cache coherence problem arises when a cache reflects a view of memory that is different from reality. Cache coherence is also a common issue when handling the I/O subsystem. In a centralized shared memory architecture, two different processors can see two different values for the same memory location. Consider the following example of a centralized shared memory architecture with 2 processors, A and B:

Time  Event                  Cache contents  Cache contents  Memory contents
                             for CPU A       for CPU B       for location X
0                                                            1
1     CPU A reads X          1                               1
2     CPU B reads X          1               1               1
3     CPU A stores 0 into X  0               1               0

After time 3, CPU B's cache still holds the old value 1, although CPU A and memory hold 0.
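The two-processor example above can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: NaiveCache is a hypothetical private write-through cache with no coherence mechanism at all, and memory is modeled as a plain dictionary.

```python
class NaiveCache:
    """A private write-through cache with NO coherence mechanism (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory             # shared main memory (a dict)
        self.lines = {}                  # this cache's private copies

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch the value from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]          # hit: return the private copy

    def write(self, addr, value):
        self.lines[addr] = value         # update the private copy
        self.memory[addr] = value        # write through to memory


memory = {"X": 1}
cpu_a, cpu_b = NaiveCache(memory), NaiveCache(memory)

cpu_a.read("X")                  # time 1: CPU A reads X
cpu_b.read("X")                  # time 2: CPU B reads X
cpu_a.write("X", 0)              # time 3: CPU A stores 0 into X

stale = cpu_b.read("X")          # CPU B hits in its own cache
print(stale, memory["X"])        # CPU B sees 1 while memory holds 0
```

Nothing tells CPU B's cache that its copy is out of date, which is exactly the gap a coherence protocol has to close.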
Conditions of Coherency:
1) A read by a processor P to a location X that follows a write by P to X, with no writes of X by another processor occurring in between, always returns the value written by P.
2) A read by P to location X that follows a write by another processor to X returns the newly written value if the read and write are sufficiently separated in time.
3) Writes to the same location are serialized: that is, two writes to the same location by any two processors are seen in the same order by all processors.
In a coherent multiprocessor the caches provide both migration and replication of shared data. Data migration allows low latency access to shared data from a local cache. Replicating data enables simultaneous access to shared data and limits the potential for contention.
Solution to Cache Coherency:

The basic protocol adopted to eliminate the cache coherence problem in the memory system is the snooping protocol.
SNOOPING PROTOCOL:

Snooping Cache Coherence protocols can be used in systems with a shared bus between the processors and memory modules. Snoopy Cache coherence protocols rely on a common channel (or bus) connecting the processors to main memory. This enables all cache controllers to observe (or snoop) the activities of all other processors and take appropriate actions to prevent the processor from obtaining stale data.
Snooping protocols are further classified into 2 schemes. They are: a) Write Invalidate Scheme b) Write Update Scheme
a) Write Invalidate Scheme:

In this protocol all other caches must invalidate their copy of a block before a single processor can modify it. In other words, a processor must obtain exclusive ownership of a block before it can modify the block. A processor first broadcasts a write over the bus to obtain the exclusive ownership of a block. Once this is done, the writing processor is sure that no other processor can receive old (stale) data.
In the write invalidate protocol, a cache block is always in one of these four possible states: INVALID, VALID, RESERVED, and DIRTY. INVALID state indicates that the cache does not have the current data for the block and that an access to the block must result in a cache miss. VALID state indicates that the block data is correct and that the block may be present in other caches as well. RESERVED state indicates that the data is correct and the block is present only in this cache and in main memory. DIRTY state indicates that this cache has the only correct copy of the block.
Although the RESERVED and DIRTY states look similar, there is a significant difference between them. In the RESERVED state, the cache as well as main memory have the correct data. In the DIRTY state, only the cache has the correct copy of the data.
The state diagram for this protocol is given below:
[Figure: write invalidate state diagram. States: 0 = INVALID; 1 = VALID (clean, potentially shared); 2 = RESERVED (clean, only copy); 3 = DIRTY (modified, only copy). Processor-based transitions are labeled Read Miss, Write Hit, and Write Miss; bus-induced transitions are labeled Bus Read and Bus Write.]
The protocol works as follows:
Read Hit: The data is returned to the processor without any delay or bus transactions.
Read Miss: The block is loaded from memory in VALID state if no cache has a copy of the block in DIRTY state. If another cache has a DIRTY copy, that cache supplies the data to the requesting cache and also writes the data to main memory. All copies of the data are then set to VALID. How a read miss is serviced depends on the write policy: with write-through, memory is always kept up to date; with write-back, the caches must be snooped to find the most recent copy of the data.
Write Hit: If the block is in RESERVED or DIRTY state, the new data is written into the cache without any delay or bus transactions, and a RESERVED block becomes DIRTY because memory no longer holds the current data. If the block is in VALID state, the new data is written through to memory and the block state is changed to RESERVED. All other caches observe this write on the bus and set their copies to INVALID. Repeated writes to a block by one processor therefore generate only one write to main memory, which serves to invalidate all other copies of the block; the later writes need no bus transactions. That is why this protocol is also known as the write-once protocol.
Write Miss: If no other cache holds the block in DIRTY state, the block is loaded from memory. If another cache has the block in DIRTY state, that cache supplies the block to the requesting cache instead of memory and sets its local block state to INVALID. After observing the write miss on the bus, all other caches set their copies to INVALID, so only the requesting processor has exclusive ownership of the block. Once the block is loaded, the processor modifies it and sets the state to DIRTY.
Block replacement: The block data must be written to memory if the block is in DIRTY state. Otherwise, the block can be replaced without any bus transaction.
The working of a write invalidate protocol can be shown by the following example:

Processor activity     Bus activity        CPU A's cache  CPU B's cache  Contents of
                                           content        content        memory (X)
                                                                         0
CPU A reads X          Cache miss for X    0                             0
CPU B reads X          Cache miss for X    0              0              0
CPU A writes a 1 to X  Invalidation for X  1                             0
CPU B reads X          Cache miss for X    1              1              1

The memory column in this example corresponds to write-back caches: memory receives the new value only when CPU A supplies the block in response to CPU B's second miss.
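The write-once rules described above (read miss, write hit, write miss) can be sketched as a small state tracker. This is an illustrative sketch only: the class name WriteOnceBus and its methods are hypothetical, it follows a single block, ignores the data values and block replacement, and simply counts bus transactions.

```python
INVALID, VALID, RESERVED, DIRTY = "INVALID", "VALID", "RESERVED", "DIRTY"


class WriteOnceBus:
    """Tracks the state of ONE block in each cache under the write-once rules."""

    def __init__(self, num_caches):
        self.state = [INVALID] * num_caches
        self.bus_transactions = 0

    def read(self, cpu):
        if self.state[cpu] != INVALID:
            return                               # read hit: no bus traffic
        self.bus_transactions += 1               # read miss appears on the bus
        for other, s in enumerate(self.state):
            if other != cpu and s in (RESERVED, DIRTY):
                self.state[other] = VALID        # owner gives up exclusivity
                                                 # (a DIRTY owner also writes back)
        self.state[cpu] = VALID

    def write(self, cpu):
        s = self.state[cpu]
        if s in (RESERVED, DIRTY):
            self.state[cpu] = DIRTY              # local write, no bus traffic
            return
        self.bus_transactions += 1               # write-through on VALID, or write miss
        for other in range(len(self.state)):
            if other != cpu:
                self.state[other] = INVALID      # snooping caches invalidate
        self.state[cpu] = RESERVED if s == VALID else DIRTY


bus = WriteOnceBus(2)
bus.read(0)                       # CPU 0 misses and loads the block VALID
bus.read(1)                       # CPU 1 misses, both copies are VALID
for _ in range(3):
    bus.write(0)                  # only the first of these writes uses the bus
print(bus.state, bus.bus_transactions)
```

In this run the three writes by CPU 0 cost a single bus transaction: the first write moves the block from VALID to RESERVED and invalidates CPU 1's copy, and the later writes stay local in DIRTY state.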
b) Write Update Scheme:

In this protocol, a write to a shared block causes the written data to be broadcast over the bus so that all other caches can update their data. If the writing processor determines that there are other copies of the block, it broadcasts the new data over the bus when it writes to the block. When the other caches observe the new data being broadcast over the bus, they update their copy of the block data. This protocol is also known as the Firefly protocol.
In this protocol, a cache block may be in one of three possible states: VALID-EXCLUSIVE, SHARED, and DIRTY. VALID-EXCLUSIVE state signifies that this cache has the only copy of the block and that the data present in main memory is correct. SHARED state indicates that the block has the correct copy of the data and may be present in other caches. A block is in DIRTY state if the cache has the only copy of the data and the data stored in main memory is incorrect. The SHARED and VALID-EXCLUSIVE states are identical to the VALID and RESERVED states of write invalidate, respectively. The DIRTY state has the same meaning in both protocols.
This protocol requires that the bus provide an additional signal called the Shared Line. The Shared Line is used to indicate whether or not a cache has a copy of the block being accessed on the bus, and it is sampled by the cache controllers to determine the appropriate actions to be taken. A cache raises the Shared Line if it has a copy of the block whenever a read or write to the block is performed.
The state transition diagram for this protocol is given below:
[Figure: write update state diagram. States: 0 = VALID-EXCLUSIVE (clean, only copy); 1 = SHARED (clean); 2 = DIRTY (dirty, only copy). Transition labels in the figure: Read Miss (Shared Line false), Read Miss (Shared Line true), Write Miss (Shared Line false), Write Miss (Shared Line true), Write Hit, Write Hit (Shared Line false), Write Hit (Shared Line true), and Bus Read/Write. Processor-based and bus-induced transitions are drawn separately.]
The protocol works as follows:
Read Hit: The data is returned to the processor without any delay or bus transactions.
Read Miss: The block is loaded into the cache in either VALID-EXCLUSIVE or SHARED state, depending on whether the Shared Line is raised. If another cache has the block, that cache supplies the block to the requesting cache and raises the Shared Line on the bus. All other caches abort their attempt to supply the requested data. If the block was supplied by another cache instead of memory (i.e. the Shared Line was raised), it is loaded in SHARED state and every cache holding the block has the same data. Otherwise it is loaded from memory in VALID-EXCLUSIVE state.
Write Hit:
* If the block is in VALID-EXCLUSIVE or DIRTY state, the data is written without any delay or bus transactions, and the block is left in DIRTY state.
* If the block is in SHARED state, the new data is broadcast over the bus so that the other caches and main memory are updated. If the Shared Line is raised during the broadcast, the block stays in SHARED state. If the Shared Line is not raised, no other cache holds the block any longer and the state changes to VALID-EXCLUSIVE.
Write Miss: The block is first obtained as on a read miss. If the Shared Line is true, the block is loaded in SHARED state and the written data is broadcast to the other caches and to main memory. If the Shared Line is false, the block is loaded from memory, modified, and placed in DIRTY state.
The following example shows the operation of the write update protocol between 2 processors, A and B:

Processor activity     Bus activity          CPU A's cache  CPU B's cache  Contents of
                                             content        content        memory (X)
                                                                           0
CPU A reads X          Cache miss for X      0                             0
CPU B reads X          Cache miss for X      0              0              0
CPU A writes a 1 to X  Write broadcast of X  1              1              1
CPU B reads X                                1              1              1
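The two-processor write update example above can be replayed in code. The sketch below is an assumption-laden illustration rather than the Firefly hardware design: UpdateBus is a hypothetical class that models one block X, never evicts it, and treats "some other cache holds a copy" as the Shared Line being raised.

```python
VALID_EX, SHARED, DIRTY = "VALID-EXCLUSIVE", "SHARED", "DIRTY"


class UpdateBus:
    """One block X shared by several caches under a write-update policy."""

    def __init__(self, num_caches, initial=0):
        self.memory = initial
        self.cache = [None] * num_caches     # None means "no copy of X"
        self.state = [None] * num_caches
        self.log = []                        # bus activity, for inspection

    def _sharers(self, cpu):
        # Other caches holding X; these would raise the Shared Line.
        return [i for i, v in enumerate(self.cache) if i != cpu and v is not None]

    def read(self, cpu):
        if self.cache[cpu] is not None:
            return self.cache[cpu]           # read hit: no bus activity
        self.log.append("cache miss")
        sharers = self._sharers(cpu)
        if sharers:                          # Shared Line true
            owner = sharers[0]
            if self.state[owner] == DIRTY:
                self.memory = self.cache[owner]   # dirty data also goes to memory
            self.cache[cpu] = self.cache[owner]
            for i in sharers + [cpu]:
                self.state[i] = SHARED
        else:                                # Shared Line false
            self.cache[cpu] = self.memory
            self.state[cpu] = VALID_EX
        return self.cache[cpu]

    def write(self, cpu, value):
        if self.cache[cpu] is None:
            self.read(cpu)                   # write miss: obtain the block first
        if self.state[cpu] == SHARED:
            self.log.append("write broadcast")
            sharers = self._sharers(cpu)
            for i in sharers:
                self.cache[i] = value        # other copies are updated in place
            self.memory = value              # the broadcast also updates memory
            if not sharers:
                self.state[cpu] = VALID_EX   # Shared Line stayed low
        else:
            self.state[cpu] = DIRTY          # sole copy: write locally
        self.cache[cpu] = value


bus = UpdateBus(2)
bus.read(0)                  # CPU A reads X
bus.read(1)                  # CPU B reads X
bus.write(0, 1)              # CPU A writes a 1 to X
value_seen_by_b = bus.read(1)
print(bus.log, bus.cache, bus.memory, value_seen_by_b)
```

CPU B's final read is a hit that returns the new value, so the bus log contains two misses and one write broadcast, as in the table.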
Comparison between Write Invalidate and Write Update:

Write invalidate is used in the vast majority of designs. Qualitative performance differences:
a) Write invalidate requires one bus transaction per write run (a sequence of writes by one processor to the same block), while write update requires a broadcast for each write.
b) Write invalidate exploits spatial locality: it needs one transaction per cache block, while write update requires a broadcast per word written.
c) Write update has lower latency between a write and a subsequent read by another processor, because the reader's copy is already current. Under write invalidate the reader first takes a cache miss.
Write invalidate is popular because bus and memory bandwidth are usually the resources most in demand, and it uses less of both. Write update can also cause problems for some memory consistency models, reducing its potential performance gain. The high bandwidth demand of write update limits its scalability to large numbers of processors.
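The difference in bus traffic for write runs can be put into numbers with a very simple model. This is an idealized sketch, not a measurement: the function bus_transactions and the run lengths are hypothetical, and the model assumes every run starts with the block shared by at least one other cache and counts only the traffic caused by the writes themselves.

```python
def bus_transactions(write_runs, policy):
    """Count bus transactions caused by writes.

    write_runs: list of run lengths; each run is a burst of writes by one
    processor to a block that other caches also hold when the run starts.
    """
    if policy == "invalidate":
        return len(write_runs)     # one invalidation per run, later writes are local
    if policy == "update":
        return sum(write_runs)     # every write in every run is broadcast
    raise ValueError(f"unknown policy: {policy}")


runs = [8, 1, 20]                  # hypothetical write-run lengths
print(bus_transactions(runs, "invalidate"))   # 3
print(bus_transactions(runs, "update"))       # 29
```

The model leaves out the read misses that write invalidate causes when another processor later reads the block, which is where write update recovers some of the cost.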
MESI Algorithm:
The Modified Exclusive Shared Invalid (MESI) algorithm is also used to solve the cache coherence problem. The states in this algorithm are given below:
Modified (M): The line is valid in the cache and in only this cache. The line is modified with respect to system memory, that is, the modified data in the line has not been written back to memory.
Exclusive (E): The addressed line is in this cache only. The data in this line is consistent with system memory.
Shared (S): The addressed line is valid in the cache and in at least one other cache. A shared line is always consistent with system memory. That is, the shared state is shared-unmodified; there is no shared-modified state.
Invalid (I): This state indicates that the addressed line is not resident in the cache and/or any data contained is considered not useful.
Exclusive may also be called CleanExclusive. Modified may also be called DirtyExclusive.
Cache States in MESI:

State of the line in Cache A  Cache A          Cache B          System memory
Modified                      M, valid data    I, invalid data  Invalid data
Shared                        S, valid data    S, valid data    Valid data
Exclusive                     E, valid data    I, invalid data  Valid data
Invalid                       I, invalid data  Don't care       Don't care
Some processors add a fifth state for shared-modified data (the O, or Owned, state) and call the result the MOESI protocol. Caches holding a line in the shared-modified state update each other's lines with current data, but do not write it to main memory.
MESI State Diagram:
[Figure: MESI state diagram]
MESI State Table:

State      Event                                          Action                                   Next State
Invalid    Read miss, shared (cache copies exist)         Read cache line                          Shared
Invalid    Read miss, exclusive (no cache copies exist)   Read cache line                          Exclusive
Invalid    Write miss                                     Broadcast invalidate, read cache line,   Modified
                                                          modify cache line
Shared     Read hit                                                                                Shared
Shared     Write hit                                      Broadcast invalidate                     Modified
Shared     Snoop hit on read                                                                       Shared
Shared     Snoop hit on invalidate                        Invalidate cache line                    Invalid
Exclusive  Read hit                                                                                Exclusive
Exclusive  Write hit                                                                               Modified
Exclusive  Snoop hit on read                                                                       Shared
Exclusive  Snoop hit on invalidate                        Invalidate cache line                    Invalid
Modified   Read hit                                                                                Modified
Modified   Write hit                                                                               Modified
Modified   Snoop hit on read                              Write back to memory                     Shared
Modified   Snoop hit on invalidate                        Write back to memory                     Invalid
Modified   LRU replacement                                Write back to memory                     Invalid
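The MESI state table above maps directly onto a lookup table keyed by (state, event). The following is a minimal sketch for one cache line in one cache; the names TRANSITIONS and run are hypothetical, and an action of None means the table lists no action for that transition.

```python
# (current state, event) -> (action, next state), following the MESI state table.
TRANSITIONS = {
    ("Invalid", "read miss, shared"): ("read cache line", "Shared"),
    ("Invalid", "read miss, exclusive"): ("read cache line", "Exclusive"),
    ("Invalid", "write miss"): (
        "broadcast invalidate, read cache line, modify cache line", "Modified"),
    ("Shared", "read hit"): (None, "Shared"),
    ("Shared", "write hit"): ("broadcast invalidate", "Modified"),
    ("Shared", "snoop hit on read"): (None, "Shared"),
    ("Shared", "snoop hit on invalidate"): ("invalidate cache line", "Invalid"),
    ("Exclusive", "read hit"): (None, "Exclusive"),
    ("Exclusive", "write hit"): (None, "Modified"),
    ("Exclusive", "snoop hit on read"): (None, "Shared"),
    ("Exclusive", "snoop hit on invalidate"): ("invalidate cache line", "Invalid"),
    ("Modified", "read hit"): (None, "Modified"),
    ("Modified", "write hit"): (None, "Modified"),
    ("Modified", "snoop hit on read"): ("write back to memory", "Shared"),
    ("Modified", "snoop hit on invalidate"): ("write back to memory", "Invalid"),
    ("Modified", "LRU replacement"): ("write back to memory", "Invalid"),
}


def run(events, state="Invalid"):
    """Apply a sequence of events to one cache line and return the trace."""
    trace = []
    for event in events:
        action, state = TRANSITIONS[(state, event)]
        trace.append((event, action, state))
    return trace


trace = run([
    "read miss, exclusive",      # no other cache has the line
    "write hit",                 # silent upgrade, no bus traffic
    "snoop hit on read",         # another cache reads the line
    "snoop hit on invalidate",   # another cache writes the line
])
for event, action, state in trace:
    print(f"{event:26} -> {state:10} action: {action}")
```

The trace shows the benefit of the Exclusive state: the write hit on an Exclusive line moves it to Modified without any bus transaction, whereas the same write on a Shared line would need a broadcast invalidate.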
