Cache Memory
Characteristics of Computer Memory
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
• Internal
• External
Capacity
• Word size
—The natural unit of organisation
• Number of words
—or Bytes
Unit of Transfer
• Internal
—Usually governed by data bus width
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word internally
—Cluster on Microsoft-formatted disks
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique address
—Access is by jumping to vicinity plus
sequential search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—e.g. cache
Memory Hierarchy
• Registers
—In CPU
• Internal or Main memory
—May include one or more levels of cache
—"RAM"
• External memory
—Backing store
Memory Hierarchy - Diagram
Performance
• Access time
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
"recover" before next access
—Cycle time is access + recovery
• Transfer Rate
—Rate at which data can be moved
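The relation between these three measures can be sketched numerically. This is a minimal sketch, not a model of any real device: the 60 ns access time and the 40 ns recovery time are illustrative assumptions, and the transfer rate is taken as 1 / cycle time, which holds for random-access memory.

```python
def cycle_time_ns(access_ns, recovery_ns):
    """Memory cycle time = access time + recovery time."""
    return access_ns + recovery_ns


def transfer_rate_per_s(cycle_ns):
    """Accesses per second for random-access memory: 1 / cycle time."""
    return 1e9 / cycle_ns


ACCESS_NS = 60     # illustrative value
RECOVERY_NS = 40   # hypothetical value, chosen only for the example

cycle = cycle_time_ns(ACCESS_NS, RECOVERY_NS)
rate = transfer_rate_per_s(cycle)
print(f"cycle time = {cycle} ns, transfer rate = {rate:.0f} accesses/s")
```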
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
• Others
—Bubble
—Hologram
Physical Characteristics
• Decay
• Volatility
• Erasable
• Power consumption
Organisation
• Physical arrangement of bits into words
• Not always obvious
—e.g. interleaved
The Bottom Line
• How much?
—Capacity
• How fast?
—Time is money
• How expensive?
• Footprint
—How much space it will take?
—Not mentioned in text b/c, at the same chip
area, larger footprint simply means more
expensive
Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• L3 Cache
• Main memory
• Expanded memory
• Disk cache
• Disk
• Optical
• Tape
Note: This hierarchy does not apply to all computers, e.g. expanded memory (in IBM PCs running DOS)
So you want it fast?
• It is possible to build a computer which
uses only static RAM (SRAM, described
later)
• This would be very fast (10 ns access
time, compared to about 60 ns for DRAM)
• This would need no cache
—How can you cache cache?
• This would cost a lot and need a huge
chip, b/c SRAM has larger footprint than
DRAM.
What makes caching possible:
Locality of Reference
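Why locality matters can be seen with a toy experiment. The sketch below is an assumption-laden illustration, not a real cache model: `hit_ratio` is a hypothetical helper that simulates a tiny direct-mapped cache (64 lines of 4 bytes) and compares a loop that keeps re-reading the same 128 bytes with the same number of references scattered over 1 MB.

```python
import random


def hit_ratio(addresses, num_lines=64, block_size=4):
    """Hit ratio of a tiny direct-mapped cache over an address trace."""
    lines = [None] * num_lines           # block number held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size
        line = block % num_lines
        if lines[line] == block:
            hits += 1
        else:
            lines[line] = block          # miss: fetch the block into the line
    return hits / len(addresses)


# A loop that re-reads the same 128 bytes 100 times (strong locality).
loop_trace = [a for _ in range(100) for a in range(128)]

# The same number of references scattered over 1 MB (no locality).
rng = random.Random(0)
random_trace = [rng.randrange(1 << 20) for _ in range(len(loop_trace))]

print("loop  :", hit_ratio(loop_trace))
print("random:", hit_ratio(random_trace))
```

The loop misses only once per block on its first pass; the scattered trace almost never finds its block already in the cache.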
3 mapping techniques exist:
• Direct
• Associative
• Set-associative
Source: http://cs.njit.edu/~sohn/cs650/lec8.pdf
Cache line    Main memory blocks assigned
0             0, m, 2m, …, 2^s - m
1             1, m+1, 2m+1, …, 2^s - m + 1
…
m-1           m-1, 2m-1, 3m-1, …, 2^s - 1
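The assignment of blocks to lines in direct mapping is the modulo operation: block j goes to line i = j mod m, where m is the number of cache lines. A minimal sketch follows; the function names and the tiny sizes (m = 4 lines, s = 4 block-address bits) are assumptions chosen so the result can be checked by hand.

```python
def cache_line(block, num_lines):
    """Direct mapping: block j is placed in line i = j mod m."""
    return block % num_lines


def blocks_for_line(line, num_lines, s_bits):
    """All main memory blocks (out of 2^s) that map onto one cache line."""
    return list(range(line, 1 << s_bits, num_lines))


M = 4   # number of cache lines (illustrative)
S = 4   # block-address bits, so 2^4 = 16 blocks in main memory (illustrative)

for line in range(M):
    print(line, blocks_for_line(line, M, S))
```

With these sizes, line 1 receives blocks 1, 5, 9, 13, i.e. 1, m+1, 2m+1, ..., 2^s - m + 1.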
We have covered pp. 111-126 in the text.
Read all explanations in the text and
thoroughly understand the modulo
operation!
5-minute quiz
1. Explain in as few words as possible the
relation between a block and a line
In our example:
• 24 bit address
• w = 2 bit word identifier (2^2 = 4 Bytes in a block)
• s = 22 bit block identifier
— r = 14 bit line identifier
— 8 bit tag (=22-14)
Remember the 256 blocks of main memory that can map onto
a given cache line? 2^8 = 256 (Coincidence?)
No two blocks that map to the same line have the same Tag!
Direct Mapping Address Structure
Tag (t = s-r bits) | Line identifier (r bits) | Word (w bits)
8 | 14 | 2
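Splitting an address into these three fields is a matter of shifts and masks. The sketch below assumes the 8/14/2 layout of this example; `split_direct` is a hypothetical helper and 0x16339C is just an illustrative 24-bit address.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2   # 24-bit address in total


def split_direct(addr):
    """Return (tag, line, word) for a 24-bit address under direct mapping."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")

# A second address with the same line field but a different tag:
# both compete for the same cache line and are told apart by the tag.
print(split_direct(0xFF339C))
```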
• Simple
• Inexpensive
• Fixed location for given block
—If a program repeatedly accesses 2 blocks that
map to the same line, the miss rate is very
high (a.k.a. thrashing)
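Thrashing is easy to reproduce in a simulation. The sketch below assumes the 8/14/2 address layout of the running example; `count_misses` is a hypothetical helper and the addresses are made up for the illustration.

```python
def count_misses(addresses, line_bits=14, word_bits=2):
    """Count misses of a direct-mapped cache over an address trace."""
    lines = {}                            # line number -> tag currently stored
    misses = 0
    for addr in addresses:
        block = addr >> word_bits
        line = block & ((1 << line_bits) - 1)
        tag = block >> line_bits
        if lines.get(line) != tag:        # empty line or wrong tag: miss
            misses += 1
            lines[line] = tag             # the new block replaces the old one
    return misses


# Two addresses with the same line field but different tags: they evict each other.
conflicting = [0x16339C, 0x17339C] * 50
# Two addresses in neighbouring lines: each misses once, then always hits.
friendly = [0x16339C, 0x1633A0] * 50

print("conflicting:", count_misses(conflicting), "misses out of", len(conflicting))
print("friendly   :", count_misses(friendly), "misses out of", len(friendly))
```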
Victim Cache
• Lower miss penalty
• Remember what was discarded
—Already fetched
—Use again with little penalty
• Fully associative ... See below
• Very small, 4 to 16 cache lines
• Between a direct-mapped L1 cache and
next memory level
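The idea can be sketched as a direct-mapped cache that hands every evicted block to a small fully associative buffer. This is a simplified illustration under stated assumptions: the class name is hypothetical, blocks are identified by block number only, the victim buffer holds 4 entries, and it discards its oldest entry first (the slides do not specify a policy).

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    def __init__(self, num_lines, victim_size=4):
        self.num_lines = num_lines
        self.lines = [None] * num_lines   # block held by each L1 line
        self.victims = OrderedDict()      # recently discarded blocks, oldest first
        self.victim_size = victim_size

    def access(self, block):
        """Return 'hit', 'victim hit' or 'miss' for a block number."""
        idx = block % self.num_lines
        if self.lines[idx] == block:
            return "hit"
        if block in self.victims:         # discarded earlier, still close by
            del self.victims[block]
            result = "victim hit"
        else:
            result = "miss"               # must go to the next memory level
        evicted = self.lines[idx]
        self.lines[idx] = block
        if evicted is not None:           # remember what was discarded
            self.victims[evicted] = True
            if len(self.victims) > self.victim_size:
                self.victims.popitem(last=False)   # drop the oldest victim
        return result


cache = DirectMappedWithVictim(num_lines=8)
# Blocks 0 and 8 map to the same line and would thrash without the victim cache.
print([cache.access(b) for b in [0, 8, 0, 8, 0, 8]])
```

After the two compulsory misses, every conflict is served from the victim buffer instead of the next memory level.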
Associative Mapping
A main memory block can load into any line
of cache
Tag (22 bits) | Word (2 bits)
• Cache is 64kB
• Cache line is 4 Bytes
— The cache has 16k
(2^14) lines
• Main memory is 16MB
— 24 bit address
Is it even possible to build a fully
associative cache controller of this size?
• Simple to understand
• Very expensive to implement the compare
function
• Flexibility to store blocks anywhere
— Miss ratio can be improved using various
replacement algorithms (later ...)
— Miss ratio is lowest of all mappings, so we
would choose it when the miss penalty is very
high (e.g. weapons control systems)
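The cost of the compare function becomes clearer in code: the tag of the requested address must be checked against the tag of every line. The sketch below assumes the 22/2 address layout of this example; the helper names are hypothetical, and the loop stands in for comparisons that hardware performs on all lines in parallel.

```python
WORD_BITS = 2   # 22-bit tag + 2-bit word = 24-bit address


def split_associative(addr):
    """Return (tag, word) for a 24-bit address under associative mapping."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def lookup(cache_tags, addr):
    """Return the index of the line holding addr's block, or None on a miss."""
    tag, _ = split_associative(addr)
    for i, stored in enumerate(cache_tags):   # one comparator per line in hardware
        if stored == tag:
            return i
    return None


tags = [0x3FFFFF, 0x058CE7]          # a block may sit in any line
print(lookup(tags, 0x16339C))        # found in line 1
print(lookup(tags, 0x000000))        # miss
```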
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
Source: http://cs.njit.edu/~sohn/cs650/lec8.pdf
Set Associative Mapping
Tag (9 bits) | Set (13 bits) | Word (2 bits)
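The field widths follow from the cache parameters. The sketch below assumes the 64kB cache, 4-Byte lines and 24-bit addresses used earlier, plus a 2-way organisation (an assumption, chosen because it yields the 13 set bits shown); the helper names are hypothetical and all sizes are taken to be powers of two.

```python
def field_widths(cache_bytes, line_bytes, ways, address_bits):
    """Return (tag_bits, set_bits, word_bits); sizes must be powers of two."""
    word_bits = (line_bytes - 1).bit_length()
    num_sets = cache_bytes // line_bytes // ways
    set_bits = (num_sets - 1).bit_length()
    tag_bits = address_bits - set_bits - word_bits
    return tag_bits, set_bits, word_bits


def split_set_associative(addr, set_bits, word_bits):
    """Return (tag, set, word) for an address."""
    word = addr & ((1 << word_bits) - 1)
    set_no = (addr >> word_bits) & ((1 << set_bits) - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_no, word


tag_bits, set_bits, word_bits = field_widths(64 * 1024, 4, 2, 24)
print(tag_bits, set_bits, word_bits)

tag, set_no, word = split_set_associative(0x16339C, set_bits, word_bits)
print(f"tag={tag:03X} set={set_no:04X} word={word}")
```

Setting `ways` to 1 gives the direct-mapped layout (one line per set), and setting it to the total number of lines gives the fully associative layout (a single set, no set bits).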
• 4.17
• 4.18
[Figure: plot against cache size (bytes), from 1k to 1M, with one curve each for direct, 2-way, 4-way, 8-way and 16-way mapping; vertical axis from 0.0 to 0.6]
5-minute quiz
• How many sets are there in a
— Direct-mapped cache
— Fully-associative cache?
• A 4-way set-associative cache has 2 k
lines. How many sets does it have?
• The main memory associated with the
above cache has 2 MB, byte-level
addressing and 8 Byte/block.
—Derive the address structure
—How long are the tags in the cache?
Example continues …
• A 4-way set-associative cache has 2 k
lines. How many sets does it have?
• The main memory associated with the
above cache has 2 MB, byte-level
addressing and 8 Byte/block.
—Derive the address structure
—How long are the tags in the cache?
—Where in the cache is the word with
address 1242AB (hex)?
Extra-credit question
From table 4.3, it seems that cache size hasn’t
been following Moore’s Law
Why not?
Replacement Algorithms (1)
Direct mapping
Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
Solution: Create separate data and instruction caches.
Processor: Pentium
Core | Cache Type | Cache Size (kB) | Cache Line Size (words) | Associativity | Location | Write Buffer Size (words)
End-of-chapter problems:
• 4
• 6
• 8
• 12
• 19 (Hint: Use formula from Ex. 4.1, p.116)
• 25
Lab week 5
In Table 4.5 (p. 143) it is stated that the
following combination of the cache control
bits is invalid: CD = 0, NW = 1.
Read the description of these bits carefully
and explain why the combination is
invalid.
— Hint: What happens when a "dirty" line needs
to be overwritten due to a cache miss?
Lab week 5
Experimenting with the first cache simulator from
the text website:
http://williamstallings.com/COA/Animation/Links.html
Lab week 5
• 4.15