3. Cache Memories
4. Cache Organization
5. Replacement Algorithms
6. Write Strategies
7. Virtual Memory

• Secondary memory: slow, cheap, direct access, located remotely from the CPU.
Datorarkitektur Fö 2 - 3 Datorarkitektur Fö 2 - 4
What do we need?

We need memory to fit very large programs and to work at a speed comparable to that of the microprocessors.

Main problem:
- microprocessors work at a very high rate and need large memories;
- memories are much slower than microprocessors.

It is possible to build a composite memory system which combines a small, fast memory and a large, slow main memory and which behaves (most of the time) like a large, fast memory.

The two-level principle above can be extended into a hierarchy of many levels, including the secondary memory (disk store).
The hierarchy levels (in order of increasing capacity):

1. Processor registers

5. Hard disk:
   - capacity: tens of Gbytes
   - access time: tens of milliseconds
• The data which is held in the registers is under the direct control of the compiler or of the assembler programmer.

• The contents of the other levels of the hierarchy are managed automatically:
  - migration of data/instructions to and from caches is performed under hardware control;
  - migration between main memory and backup store is controlled by the operating system (with hardware support).

[Figure: the processor (with its registers) sends addresses to the cache, which holds copies of instructions and data; the cache in turn exchanges addresses, instructions and data with main memory.]
• The miss rate of a well-designed cache: a few %.

• Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Problems concerning cache memories:

• How many caches?

• How to determine at a read whether we have a miss or a hit?

• If there is a miss and there is no place for a new slot in the cache, which information should be replaced?

• How to preserve consistency between cache and main memory at a write?

• The figure on slide 8 shows an architecture with a unified instruction and data cache.

• It is also common to split the cache into one dedicated to instructions and one dedicated to data.

[Figure: the processor (with its registers) is connected to an instruction cache (holding copies of instructions) and a separate data cache (holding copies of data); both caches exchange addresses, instructions and data with main memory.]
Cache Organization
Direct Mapping

[Figure: the memory address is split into a tag (8 bits), a line number (14 bits) and a word field (2 bits); the line number selects one cache line, whose stored tag is compared (cmp) with the address tag: if they are equal we have a hit, otherwise a miss.]

Direct Mapping (cont'd)

Disadvantage:
Each memory block can be placed in exactly one cache line; if two or more frequently used blocks happen to map to the same line, they will repeatedly evict one another. This can produce a low hit ratio, even if only a very small part of the cache is effectively used.
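The field splitting above can be sketched in a few lines of Python. The 8/14/2-bit field widths are the example from the slides; the concrete addresses are made up for illustration:

```python
# Sketch of direct-mapped address decomposition: 8-bit tag,
# 14-bit line number, 2-bit word offset (a 24-bit address in total).
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2

def split_address(addr):
    """Split a 24-bit address into (tag, line, word) fields."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word

# Two addresses with equal line fields always compete for the same
# cache line, no matter how empty the rest of the cache is.
a = 0x000004            # tag 0, line 1, word 0
b = 0x010004            # tag 1, line 1, word 0
assert split_address(a)[1] == split_address(b)[1]   # same line
assert split_address(a)[0] != split_address(b)[0]   # different tags
```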
• A memory block is mapped into any of the lines of a set. The set is determined by the memory address, but the line inside the set can be any one.

• If a block has to be placed in the cache, the particular line of the set will be determined according to a replacement algorithm.

• The memory address is interpreted as three fields by the cache logic, similar to direct mapping. However, a smaller number of bits (13 in our example) are used to identify the set of lines in the cache; correspondingly, the tag field will be larger (9 bits in our example).

• The number of lines in a set is determined by the designer:
  - 2 lines/set: two-way set associative mapping
  - 4 lines/set: four-way set associative mapping

• Set associative mapping keeps most of the advantages of direct mapping:
  - short tag field
  - fast access
  - relatively simple

• Set associative mapping tries to eliminate the main shortcoming of direct mapping: a certain flexibility is given concerning the line to be replaced when a new block is read into the cache.

• Cache hardware is more complex for set associative mapping than for direct mapping.

• If a set consists of a single line ⇒ direct mapping;
  if there is one single set consisting of all lines ⇒ associative mapping.
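As a sketch, a two-way set-associative lookup with the 9/13/2-bit fields of the example could look as follows (the concrete addresses, and FIFO as the replacement choice within a set, are illustrative assumptions):

```python
# Sketch of two-way set-associative lookup: 9-bit tag,
# 13-bit set number, 2-bit word offset.
TAG_BITS, SET_BITS, WORD_BITS = 9, 13, 2
NUM_SETS, WAYS = 1 << SET_BITS, 2

# Each set holds up to WAYS tags; a block may go into any line of its set.
cache = [[] for _ in range(NUM_SETS)]

def access(addr):
    """Return True on a hit; on a miss, load the block into its set."""
    set_nr = (addr >> WORD_BITS) & (NUM_SETS - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    if tag in cache[set_nr]:
        return True
    if len(cache[set_nr]) == WAYS:       # set full: replacement needed
        cache[set_nr].pop(0)             # here: evict in FIFO order
    cache[set_nr].append(tag)
    return False

# Two blocks with the same set number can coexist in the two lines:
assert access(0x000004) is False   # set 1, tag 0: cold miss
assert access(0x008004) is False   # set 1, tag 1: second line of the set
assert access(0x000004) is True    # still resident: no conflict eviction
```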
Associative Mapping (cont'd)

Replacement Algorithms

• First-in-first-out (FIFO):
  The candidate selected is the line which holds the block that has been in the cache the longest.
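A minimal sketch of FIFO replacement for a fully associative cache (the 4-line size and the access sequence are made up for illustration):

```python
from collections import deque

# FIFO replacement for a fully associative cache with 4 lines:
# the evicted block is the one that was loaded the longest ago.
LINES = 4
cache = deque()          # front = oldest block

def access(block):
    if block in cache:
        return "hit"
    if len(cache) == LINES:
        cache.popleft()  # FIFO: evict the block resident the longest
    cache.append(block)
    return "miss"

for b in [1, 2, 3, 4, 1]:
    access(b)
assert access(5) == "miss"   # cache full: block 1 (oldest) is evicted
assert access(1) == "miss"   # 1 is gone, although it was just used
```

Note the last line: unlike LRU, FIFO evicts the oldest-loaded block even if it was referenced recently.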
Write Strategies

• Write-through
  All write operations are passed to main memory; if the addressed location is currently held in the cache, the cache is updated so that it is coherent with the main memory.

Write Strategies (cont'd)

• Copy-back
  Write operations update only the cache memory, which is not kept coherent with main memory; cache lines have to remember if they have been updated; if such a line is replaced from the cache, its content has to be copied back to memory.
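The difference can be sketched for a single cached location (a toy model; the address and values are arbitrary):

```python
# Contrasting the two write strategies for one cached location.
memory = {0x10: 0}
cache = {}               # addr -> [value, dirty]

def write(addr, value, policy):
    if policy == "write-through":
        memory[addr] = value              # every write goes to memory
        if addr in cache:
            cache[addr] = [value, False]  # cache kept coherent
    else:                                 # copy-back
        cache[addr] = [value, True]       # only the cache is updated

def evict(addr):
    value, dirty = cache.pop(addr)
    if dirty:                             # copy-back: flush on replacement
        memory[addr] = value

write(0x10, 7, "copy-back")
assert memory[0x10] == 0     # memory is stale until the line is evicted
evict(0x10)
assert memory[0x10] == 7     # now copied back
```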
PowerPC 603
- two on-chip caches, for data and instructions
- each cache: 8 Kbytes
- line size: 32 bytes
- 2-way set associative organization
(simpler cache organization than the 601 but
stronger processor)
• L1 data cache:
  16KB, line size: 64 bytes, 4-way set associative, copy-back policy.

• ARM3 and ARM6 had a 4KB unified cache.

• ARM7 has an 8KB unified cache.

• Starting with ARM9 there are separate data/instruction caches:
  - ARM9, ARM10, ARM11, Cortex: up to 128/128KB instruction and data cache.
  - StrongARM: 16/16KB instruction and data cache.
  - Xscale: 32/32KB instruction and data cache.

• Line size: 8 (32-bit) words, except ARM7 and StrongARM with 4 words.

• Set associative:
  - 4-way: ARM7, ARM9E, ARM10EJ-S, ARM11
  - 64-way: ARM9T, ARM10E
  - 32-way: StrongARM, Xscale
  - various options: Cortex

• With the Cortex, an internal L2 cache is introduced.

Virtual Memory

• The address space needed and seen by programs is usually much larger than the available main memory. Only one part of the program fits into main memory; the rest is stored on secondary memory (hard disk).

• In order for the program to be executed or its data to be accessed, a certain segment of the program has to be loaded first into main memory; in this case it may have to replace another segment already in memory.

• Movement of programs and data between main memory and secondary storage is performed automatically by the operating system. These techniques are called virtual-memory techniques.

• The binary address issued by the processor is a virtual (logical) address; it refers to a virtual address space much larger than the physical one available in main memory.
• The virtual programme space (instructions + data) is divided into fixed-size pages; main memory is divided into frames of the same size.

[Figure: the processor requests data/instructions from main memory; if the referenced item is not in physical memory, it is first transferred from disk storage.]

• If a virtual address refers to a part of program or data that is currently in the physical memory (cache, main memory), then the appropriate location is accessed immediately using the respective physical address; if this is not the case, the respective program/data has to be transferred first from secondary memory.

• A special hardware unit, the Memory Management Unit (MMU), translates virtual addresses into physical ones.

Demand Paging

• The program consists of a large number of pages which are stored on disk; at any one time, only a few pages have to be stored in main memory.

• The operating system is responsible for loading/replacing pages so that the number of page faults is minimized.

• We have a page fault when the CPU refers to a location in a page which is not in main memory; this page then has to be loaded and, if there is no available frame, it has to replace a page which previously was in memory.

[Figure: pages on the disk are mapped into frames in main memory.]

Example:
• Virtual memory space: 2 Gbytes (31 address bits; 2^31 = 2G)
• Physical memory space: 16 Mbytes (2^24 = 16M)
• Page length: 2 Kbytes (2^11 = 2K)
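A quick check of the numbers in the example, showing how the address field widths follow from the sizes:

```python
# Field widths implied by the example's sizes.
VIRTUAL_BITS = 31    # 2 Gbyte virtual address space
PHYSICAL_BITS = 24   # 16 Mbyte physical memory
OFFSET_BITS = 11     # 2 Kbyte pages

pages = 2 ** (VIRTUAL_BITS - OFFSET_BITS)     # entries in the page table
frames = 2 ** (PHYSICAL_BITS - OFFSET_BITS)   # frames in main memory
assert pages == 2 ** 20      # 20-bit page number
assert frames == 2 ** 13     # 13-bit frame number
```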
[Figure: address translation using the page table. The virtual address consists of a page number (20 bits) and an offset (11 bits). The page number selects an entry of the page table (entries 0 to 2^20 − 1); the entry holds control bits (e.g. "in memory") and a frame number. The physical address consists of the frame number (13 bits) and the offset (11 bits); main memory consists of frames 0 to 2^13 − 1, each of 2 Kbytes. If a page fault occurs, the OS is activated in order to load the missed page.]

• The page table has one entry for each page of the virtual memory space.

• Each entry of the page table holds the address of the memory frame which stores the respective page, if that page is in main memory.

• Each entry of the page table also includes some control bits which describe the status of the page:
  - whether the page is actually loaded into main memory or not;
  - whether the page has been modified since its last loading;
  - information concerning the frequency of access, etc.
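A sketch of this translation in Python, using the field sizes above (the page table contents are invented for illustration, and the page-fault path is reduced to an exception):

```python
# MMU translation sketch: 20-bit page number + 11-bit offset
# -> 13-bit frame number + 11-bit offset.
OFFSET_BITS = 11

# Hypothetical page table: page number -> (in_memory bit, frame number)
page_table = {0: (True, 5), 1: (False, None)}

def translate(virtual_addr):
    page = virtual_addr >> OFFSET_BITS
    offset = virtual_addr & ((1 << OFFSET_BITS) - 1)
    in_memory, frame = page_table.get(page, (False, None))
    if not in_memory:
        raise LookupError("page fault: OS must load the page")
    return (frame << OFFSET_BITS) | offset    # frame nr + unchanged offset

assert translate(0x0123) == (5 << 11) | 0x123   # page 0 is in frame 5
```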
The Page Table (cont'd)

Problems:
- The page table is very large (the number of pages in the virtual memory space is very large).
- Access to the page table has to be very fast ⇒ the page table has to be stored in very fast memory, on chip.

• A special cache is used for page table entries, called the translation lookaside buffer (TLB); it works in the same way as an ordinary memory cache and contains those page table entries which have been most recently used.

• The page table is often too large to be stored in main memory. Virtual memory techniques are used to store the page table itself ⇒ only part of the page table is stored in main memory at a given moment.

The page table itself is distributed along the memory hierarchy:
- TLB (cache)
- main memory
- disk

Memory Reference with Virtual Memory and TLB

[Flowchart of a memory reference:]
1. Request access to a virtual address; check the TLB.
2. If the page table entry is in the TLB, the page is surely in main memory: generate the physical address.
3. Otherwise, access the page table (if the entry is not in main memory, a page fault is produced and the OS loads the missed part of the page table).
4. If the page is not in main memory (page fault), the OS is activated; it
   - loads the missed page into main memory;
   - if memory is full, replaces an "old" page;
   - updates the page table.
5. Update the TLB and generate the physical address.
6. Access the cache and, if miss, main memory.
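The reference sequence with a TLB can be sketched as code (a toy model; the TLB and page table contents are invented, and the page-fault path is reduced to an exception):

```python
# Memory reference sketch: consult the TLB first, then the page table.
OFFSET_BITS = 11
tlb = {}                                  # page -> frame (recently used)
page_table = {0: 5, 1: 9}                 # pages currently in main memory

def reference(vaddr):
    page = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if page in tlb:                       # TLB hit: page surely in memory
        frame = tlb[page]
    elif page in page_table:              # TLB miss: walk the page table
        frame = page_table[page]
        tlb[page] = frame                 # update the TLB
    else:
        raise LookupError("page fault: OS loads the page")
    return (frame << OFFSET_BITS) | offset

assert reference(0x0801) == (9 << 11) | 1   # TLB miss, page table hit
assert 1 in tlb                             # entry cached for next time
```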
Memory Reference with Virtual Memory and TLB (cont'd)

• Memory access is handled by hardware, except for the page fault sequence, which is executed by the OS software.

• The hardware unit which is responsible for the translation of a virtual address into a physical one is the Memory Management Unit (MMU).

Page Replacement

• When a new page is loaded into main memory and there is no free memory frame, an existing page has to be replaced.
  The decision on which page to replace is based on the same considerations as those for the replacement of blocks in cache memory (see slide 24).
  The LRU strategy is often used to decide which page to replace.

• When the content of a page which is loaded into main memory has been modified as a result of a write, it has to be written back to the disk upon its replacement.
  One of the control bits in the page table is used to signal that the page has been modified.
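A sketch of LRU page replacement with a modified (dirty) bit; the frame count and reference sequence are invented for illustration:

```python
from collections import OrderedDict

# LRU page replacement with a modified bit, for 2 memory frames.
FRAMES = 2
resident = OrderedDict()      # page -> modified flag, front = least recent
disk_writes = []              # pages written back to disk

def touch(page, write=False):
    if page in resident:
        resident.move_to_end(page)        # now most recently used
        resident[page] |= write
        return
    if len(resident) == FRAMES:           # no free frame: replace LRU page
        victim, modified = resident.popitem(last=False)
        if modified:                      # modified pages go back to disk
            disk_writes.append(victim)
    resident[page] = write

touch(1, write=True); touch(2); touch(1); touch(3)   # 2 is evicted, clean
assert disk_writes == []      # clean victim: no write-back needed
touch(2)                      # now page 1 (modified) is the LRU victim
assert disk_writes == [1]     # modified victim is written back to disk
```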
Summary