
Chapter 5: Memory Hierarchy Design

Rung-Bin Lin


Introduction
A memory hierarchy is both necessary and effective in computer system design because of the following two factors:
Locality of reference: the nature of program behavior.
A large gap in speed between the CPU and memory devices such as DRAM.

Level of memory hierarchy


High level <----> Low level
CPU registers, cache, main memory, disk
The levels of the hierarchy subset one another: all data in one level is also found in the level below.


Memory Hierarchy


Speed Gap between CPU and DRAM


Memory Hierarchy Difference between Desktops and Embedded Processors


Memory hierarchy for desktops
Speed

Memory hierarchy for Embedded Processors


Real-time applications must care about worst-case performance.
Power consumption is a concern.
Simple, fixed applications running on embedded processors may need no memory hierarchy at all.
Main memory itself may be quite small.


ABCs of Caches
Recalling some terms
Cache: The name given to the first level of the memory hierarchy encountered once the address leaves the CPU.
Miss rate: The fraction of accesses not found in the cache.
Miss penalty: The additional time to service the miss.
Block: The minimum unit of information that can be present in the cache.
Four questions about any level of the hierarchy:
Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)


Cache Performance
Formula for performance evaluation
CPU execution time = (CPU clock cycles + Memory stall cycles) * Clock cycle time
= IC * (CPI_execution + Memory stall clock cycles / IC) * Clock cycle time

Memory stall cycles = IC * Memory references per instruction * Miss rate * Miss penalty
Measure of memory-hierarchy performance
Average memory access time = Hit time + Miss rate * Miss penalty

Example on page 395. Example on page 396.
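
As a rough illustration of the formulas on this slide, the following Python sketch evaluates them directly. The function names are hypothetical and any numbers passed in are made up for illustration; they are not the textbook examples.

```python
def memory_stall_cycles(ic, refs_per_instr, miss_rate, miss_penalty):
    """Memory stall cycles = IC * references per instruction * miss rate * miss penalty."""
    return ic * refs_per_instr * miss_rate * miss_penalty


def cpu_execution_time(ic, cpi_execution, refs_per_instr, miss_rate,
                       miss_penalty, clock_cycle_time):
    """CPU time = (CPU clock cycles + memory stall cycles) * clock cycle time."""
    cpu_cycles = ic * cpi_execution
    stalls = memory_stall_cycles(ic, refs_per_instr, miss_rate, miss_penalty)
    return (cpu_cycles + stalls) * clock_cycle_time


def average_memory_access_time(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (all in the same time unit)."""
    return hit_time + miss_rate * miss_penalty
```

With an assumed 2% miss rate and a 100-cycle miss penalty, a 1-cycle hit time gives an average memory access time of about 3 cycles, which shows how quickly misses dominate.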


Four Memory Hierarchy Questions


Q1: Where can a block be placed in the upper level? (Block placement)
Q2: How is a block found if it is in the upper level? (Block identification)
Q3: Which block should be replaced on a miss? (Block replacement)
Q4: What happens on a write? (Write strategy)


Block Placement (1)


Q1: Where can a block be placed in a cache?
Direct mapped: Each block has only one place it can appear in the cache. The mapping is usually
(Block address) MOD (Number of blocks in cache)

Fully associative: A block can be placed anywhere in the cache.
Set associative: A block can be placed in a restricted set of places in the cache. A set is a group of blocks in the cache. A block is first mapped onto a set, and then the block can be placed anywhere within that set. The set is usually obtained by
(Block address) MOD (Number of sets in cache)
If there are n blocks in a set, the cache is called n-way set associative.
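
The MOD mappings can be sketched in a few lines of Python. This is only an illustration: the function name `candidate_frames` and the convention of numbering cache frames as `set_index * ways + way` are assumptions, not something the slides specify. Direct mapped is the 1-way case, and fully associative is the case where the whole cache forms one set.

```python
def candidate_frames(block_address, num_blocks, ways):
    """Return the cache frames in which a memory block may be placed.

    num_blocks -- total number of block frames in the cache
    ways       -- blocks per set (1 = direct mapped, num_blocks = fully associative)
    """
    num_sets = num_blocks // ways
    set_index = block_address % num_sets       # (block address) MOD (number of sets)
    first_frame = set_index * ways             # assumed frame numbering
    return list(range(first_frame, first_frame + ways))
```

For an 8-block cache, memory block 12 may go only in frame 4 when direct mapped, in frames 0 and 1 when 2-way set associative, and in any frame when fully associative.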


Block Placement (2)


Block Identification
Q2: How is a block found if it is in the cache?
Each cache block consists of
Address tag: Gives the block address.
Valid bit: Indicates whether or not the associated entry contains a valid address.
Data

Relationship of a CPU address to the cache
Address presented by the CPU = Block address ## Block offset, where Block address = Tag ## Index
Tag: Compared against the address tags in the selected set to detect a hit.
Index: Selects the set.
Block offset: Selects the desired data from the block.


Identification Steps
The index field of the CPU address is used to select a set.
The tag field presented by the CPU is compared in parallel to all address tags of the blocks in the selected set.
If any address tag matches the tag field of the CPU address and its valid bit is true, it is a cache hit.
The offset field is used to select the desired data.
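
The identification steps can be sketched in Python as follows. The `Line` class, the field widths passed as parameters, and the list-of-lists layout for the sets are all assumptions made for illustration; real hardware compares the tags of a set in parallel, whereas this sketch loops over them.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""


def split_address(addr, offset_bits, index_bits):
    """Split a CPU address into (tag, index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


def lookup(sets, addr, offset_bits, index_bits):
    """Return the addressed byte on a hit, or None on a miss."""
    tag, index, offset = split_address(addr, offset_bits, index_bits)
    for line in sets[index]:                  # index selects the set
        if line.valid and line.tag == tag:    # tag match with valid bit set
            return line.data[offset]          # offset selects the data
    return None
```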


Associativity versus Index Field


If the total cache size is kept the same,
Increasing associativity increases the number of blocks per set, thereby decreasing the size of the index and increasing the size of the tag.

The following formula characterizes this property:
2^index = (cache size) / (block size * set associativity)
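
To make the trade-off concrete, the sketch below computes the tag, index, and offset widths from this formula. The function name and the parameter values used in the usage note (a 64 KB cache, 64-byte blocks, 44-bit addresses) are assumptions for illustration, and all sizes are assumed to be powers of two.

```python
def address_fields(cache_bytes, block_bytes, ways, address_bits):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache."""
    num_sets = cache_bytes // (block_bytes * ways)   # 2^index
    index_bits = num_sets.bit_length() - 1           # log2 for a power of two
    offset_bits = block_bytes.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits
```

Under these assumptions, going from direct mapped to 2-way to 4-way shrinks the index from 10 to 9 to 8 bits while the tag grows from 28 to 29 to 30 bits.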


Block Replacement
Q3: Which block should be replaced on a cache miss?
For a direct-mapped cache, the answer is obvious: only one block can be replaced.
For a set-associative or fully associative cache, the following strategies can be used:
Random
Least-recently used (LRU)
First in, first out (FIFO)
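
A minimal sketch of how LRU and FIFO differ within a single set, assuming a hypothetical `CacheSet` class that tracks only tags (random replacement is omitted). The only difference between the two policies here is whether a hit refreshes the block's position in the eviction order.

```python
from collections import OrderedDict


class CacheSet:
    def __init__(self, ways, policy="LRU"):
        self.ways = ways
        self.policy = policy              # "LRU" or "FIFO"
        self.blocks = OrderedDict()       # tag -> None, next victim first

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is then filled)."""
        if tag in self.blocks:
            if self.policy == "LRU":
                self.blocks.move_to_end(tag)   # a hit makes the block most recent
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)    # evict the block at the front
        self.blocks[tag] = None
        return False
```

For a 2-way set and the tag sequence A, B, A, C, A, the last access hits under LRU (B was evicted) but misses under FIFO (A was evicted because it was loaded first).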


Comparison of Miss Rate between Random and LRU


Fig. 5.6 on page 400


Write Strategy
Q4: What happens on a write?
Traffic patterns
Writes account for about 7% of the overall memory traffic and about 25% of the data cache traffic.
Though reads dominate processor cache traffic, writes still cannot be ignored in a high-performance design.

Reads can be done faster than writes
In reading, the block data can be read at the same time that the tag is read and compared.
In writing, modifying a block cannot begin until the tag is checked to see if the address is a hit.


Write Policies and Write Miss Options


Write policies
Write through (or store through): Write to both the block in the cache and the block in the lower-level memory.
Write back: Write only to the block in the cache. A dirty bit, attached to each block in the cache, is set when the block is modified. When a block is being replaced and its dirty bit is set, the block is copied back to main memory. This can reduce bus traffic.

Common options on a write miss
Write allocate: The block is loaded on a write miss, followed by the write-hit actions.
No-write allocate (write around): The block is modified in the lower level and not loaded into the cache.

Either write-miss option can be used with write through or write back, but write-back caches generally use write allocate and write-through caches often use no-write allocate.
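
The traffic difference between the two common pairings can be sketched with a deliberately tiny model: a cache holding a single block. The function name, the one-block cache, and the final flush of a dirty block are assumptions made only to keep the comparison small and fair.

```python
def lower_level_writes(trace, policy):
    """Count writes that reach the lower level for a one-block cache.

    trace  -- block addresses written by the CPU, in order
    policy -- "write-through" (with no-write allocate) or
              "write-back" (with write allocate)
    """
    if policy == "write-through":
        return len(trace)                 # every write also goes to the lower level
    if policy != "write-back":
        raise ValueError("unknown policy: " + policy)
    writes = 0
    resident = None                       # block currently held in the cache
    dirty = False
    for block in trace:
        if block != resident:             # write miss
            if dirty:
                writes += 1               # copy the dirty victim back
            resident = block              # write allocate: load the block
        dirty = True                      # the write itself stays in the cache
    if dirty:
        writes += 1                       # final flush of the last dirty block
    return writes
```

For the write trace 0, 0, 0, 1, 1, write through sends five writes to the lower level, while write back sends two.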


Comparison between Write Through and Write Back


Write back can reduce bus traffic, but the contents of cache blocks can be inconsistent with those of the blocks in main memory at some moments.
Write through increases bus traffic, but the contents are consistent all the time.
Reducing write stalls
Use a write buffer. As soon as the CPU places the write data into the write buffer, the CPU is allowed to continue.

Example on page 402


An Example: the Alpha 21264 Data Cache


Features
64 KB of data in 64-byte blocks.
Two-way set associative.
Write back with a dirty bit.
Write allocate on a write miss.

The CPU address
48-bit virtual address
44-bit physical address
38-bit block address
29-bit tag
9-bit index, obtained from 2^index = 512 = 65536/(64*2)
6-bit block offset

FIFO replacement strategy
What happens on a miss?
The 64-byte block is fetched from main memory in four transfers, each taking 5 clock cycles.


Cache Access Steps


Unified versus Split Caches


Unified cache: A single cache contains both instructions and data.
Split caches: Data is held only in the data cache, while instructions are held in the instruction cache.
Fig. 5.8 on page 406.


Cache Performance
Average memory access time for processors with in-order execution
Average memory access time = Hit time + Miss rate * Miss penalty
Examples on pages 408 and 409

Miss penalty and out-of-order execution processors
Memory stall cycles / instruction = Misses / instruction * (Total miss latency - Overlapped miss latency)
Length of memory latency: Time between the start and the end of a memory reference in an out-of-order processor.
Length of latency overlap: A time period of memory latency overlapping the operations of the processor.


Improving Cache Performance


Reduce the miss rate
Reduce the miss penalty
Reduce the hit time
Reduce the miss penalty or miss rate via parallelism


Reducing Cache Miss Penalty


Multilevel caches
Critical word first and early restart
Giving priority to read misses over writes
Merging write buffers
Victim caches


Multilevel Caches
Question:
Larger cache or faster cache? A contradictory scenario.
Solution: Adding another level of cache.
A second-level cache complicates performance evaluation of cache memory.
Average memory access time = Hit time_L1 + Miss rate_L1 * Miss penalty_L1

where
Miss penalty_L1 = Hit time_L2 + Miss rate_L2 * Miss penalty_L2
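
The two-level formula nests naturally, as this small Python sketch shows. The function name and the numbers in the usage note are illustrative assumptions; `local_miss_rate_l2` is the L2 miss rate measured on the accesses that reach L2.

```python
def amat_two_level(hit_time_l1, miss_rate_l1, hit_time_l2,
                   local_miss_rate_l2, miss_penalty_l2):
    """Average memory access time for an L1 cache backed by an L2 cache."""
    # The L1 miss penalty is itself an average access time, seen from L2.
    miss_penalty_l1 = hit_time_l2 + local_miss_rate_l2 * miss_penalty_l2
    return hit_time_l1 + miss_rate_l1 * miss_penalty_l1
```

Assuming a 1-cycle L1 hit, a 4% L1 miss rate, a 10-cycle L2 hit, a 50% local L2 miss rate, and a 100-cycle L2 miss penalty, the L1 miss penalty is 60 cycles and the average memory access time is about 3.4 cycles.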


Local and Global Miss Rates


The second-level miss rate is measured on the leftovers from the first-level cache.
Local miss rate (Miss rate_L2)
The number of misses in the cache divided by the total number of memory accesses to this cache.

Global miss rate (Miss rate_L1 * Miss rate_L2)
The number of misses in the cache divided by the total number of memory accesses generated by the CPU.
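
A short sketch of the two definitions, assuming hypothetical raw counts of CPU accesses, L1 misses, and L2 misses. It also shows that the global L2 miss rate equals the product of the L1 miss rate and the local L2 miss rate, since every access to L2 is an L1 miss.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Return (L1 miss rate, local L2 miss rate, global L2 miss rate)."""
    l1_rate = l1_misses / cpu_accesses
    l2_local = l2_misses / l1_misses       # accesses to L2 are the L1 misses
    l2_global = l2_misses / cpu_accesses   # measured against all CPU accesses
    return l1_rate, l2_local, l2_global
```

With 1000 CPU accesses, 40 L1 misses, and 20 L2 misses (made-up counts), the local L2 miss rate is 50% even though only 2% of all accesses go to main memory.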


Miss Rate versus Cache size


Two Insights and Questions


Two insights from the observation of the results shown above:
The global cache miss rate is very similar to the single cache miss rate of the second-level cache.
The local cache miss rate is not a good measure of secondary caches; the global cache miss rate should be used because the effectiveness of the second-level cache is a function of the miss rate of the first-level cache.

Two questions for the design of the second-level cache:
Will it lower the average memory access time portion of the CPI?
How much does it cost?


Example (P417)


Influence of L2 Hit Time


Early Restart and Critical Word First


Basic idea: Don't wait for the full block to be loaded before sending the requested word and restarting the CPU.
Two strategies:
Early restart: As soon as the requested word of the block arrives, send it to the CPU and let the CPU continue execution.
Critical word first: Request the missed word first from memory and send it to the CPU as soon as it arrives; let the CPU continue execution while filling the rest of the words in the block.
Example on page 419.


Giving Priority to Read Misses over Writes


A write buffer can free the CPU from waiting for the completion of a write, but it could hold the updated value of a location needed on a read miss. This complicates memory access, i.e., it may cause a RAW hazard.
Two solutions:
The read miss waits until the write buffer is empty. This certainly increases the miss penalty.
Or, check the contents of the write buffer on a read miss, and let the read miss fetch the data from the write buffer.
Example on page 419


Merging Write Buffers


Victim Caches (1)


Victim cache
A small fully associative cache that contains only blocks discarded from a cache because of a miss (the victims).
The blocks of the victim cache are checked on a miss to see if they have the desired data before going to the next lower-level memory.
If the data is found there, the victim block and the cache block are swapped.
A four-entry victim cache can remove 20% to 95% of the conflict misses in a 4-KB direct-mapped data cache.


Victim Caches (2)


Reducing Miss Rate


Larger block size
Larger caches
Higher associativity
Way prediction and pseudoassociative caches
Compiler optimizations


Miss Categories
Compulsory miss
The first access to a block is not in the cache.

Capacity miss
Occurs because blocks are discarded and later retrieved when the cache cannot contain all the blocks needed during execution of a program.

Conflict miss
Occurs because a block can be discarded and later retrieved if too many blocks map to its set in direct-mapped or set-associative caches.

What can a designer do with the miss rate?
Reducing conflict misses is the easiest: use full associativity, but this is very expensive.
Reducing capacity misses: use a larger cache.
Reducing compulsory misses: use a larger block size.


Miss Rate for Each Category


Larger Block Size


Reduces compulsory misses by taking advantage of spatial locality.
Increases the miss penalty.
Increases capacity misses if the cache is small.
The selection of block size depends on both the latency and bandwidth of the lower-level memory:
High latency and high bandwidth encourage larger block sizes.
Low latency and low bandwidth encourage smaller block sizes.

Example on page 426.


Example (P426)


Miss Rate, Block Size versus Cache Size


Average Memory Access Time, Block Size versus Cache Size


Fig. 5.18 on page 428


Larger Caches
Drawbacks
Longer hit time
Higher cost


Higher Associativity
Two general rules of thumb
8-way set associative is, for practical purposes, as effective in reducing misses as fully associative.
2:1 cache rule of thumb: A direct-mapped cache of size N has about the same miss rate as a 2-way set-associative cache of size N/2.

The pressure of a fast processor clock cycle encourages simple caches, but the increasing miss penalty rewards associativity.
Example on page 429.


Average Memory Access Time versus Associativity


Fig. 5.19 on page 430


Way Prediction
Reduces conflict misses and yet maintains the hit speed of a direct-mapped cache.
Way prediction
Extra bits are kept in the cache to predict the way, or block within the set, of the next cache access.
This means the MUX can be set early to select the desired block.
A misprediction results in checking the other blocks for matches.
The Alpha 21264 uses this technique.
Prediction hits take 1 cycle.
Prediction misses take 3 cycles.

Can also be used to reduce power consumption.


Pseudoassociative Caches
Accesses proceed just as in a direct-mapped cache for a hit.
On a miss, a second cache entry is checked to see if it matches there.


Compiler Optimizations
Loop interchange
Reduces misses by improving spatial locality.

Blocking
Reduces capacity misses.
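
The effect of loop interchange can be sketched by looking at the order in which a row-major 2-D array is touched. Everything here is an illustrative assumption: the function names, the row-major layout, and the crude locality measure (a cache that holds a single block, so every change of block counts as a miss).

```python
def addresses_before_interchange(rows, cols):
    """Inner loop runs over rows: consecutive accesses are `cols` elements apart."""
    return [i * cols + j for j in range(cols) for i in range(rows)]


def addresses_after_interchange(rows, cols):
    """Inner loop runs over columns: consecutive accesses are 1 element apart."""
    return [i * cols + j for i in range(rows) for j in range(cols)]


def count_block_misses(addresses, block_elems):
    """Count block changes, assuming a cache that holds only one block."""
    misses, current = 0, None
    for addr in addresses:
        block = addr // block_elems
        if block != current:
            misses += 1
            current = block
    return misses
```

For a 4 x 4 array with 4 elements per block, the original loop order changes block on all 16 accesses, while the interchanged order changes block only 4 times.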


Blocking
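
As a rough illustration of blocking, the sketch below multiplies two n x n matrices by working on `tile` x `tile` submatrices so that the data touched by the inner loops can stay in the cache. The function names and the tile size are assumptions; the blocked version computes the same result as the plain triple loop, only in a different order.

```python
def matmul_naive(a, b, n):
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, n, tile):
    c = [[0] * n for _ in range(n)]
    for jj in range(0, n, tile):              # block of columns of b and c
        for kk in range(0, n, tile):          # block of the shared dimension
            for i in range(n):
                for j in range(jj, min(jj + tile, n)):
                    partial = 0
                    for k in range(kk, min(kk + tile, n)):
                        partial += a[i][k] * b[k][j]
                    c[i][j] += partial        # accumulate across kk blocks
    return c
```

The tile size would normally be chosen so that the submatrices in use fit in the cache together; Python lists do not model cache behavior, so this only shows the loop structure.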


Reducing Cache Miss Penalty or Miss Rate via Parallelism


Nonblocking caches to reduce stalls on cache misses
Hardware prefetching of instructions and data
Compiler-controlled prefetching


Nonblocking Caches to Reduce Stalls on Cache Misses


For pipelined machines that implement Tomasulo's algorithm, allowing out-of-order completion, the CPU need not stall on a cache miss.
A nonblocking cache can increase the potential benefits of such a scheme by allowing the data cache to continue to supply cache hits during a miss. This is called hit under miss.
When more than one outstanding miss is allowed, it is called hit under multiple misses.
Example on page 436.


Performance of Hit-Under-Miss


Hardware Prefetching of Instructions and Data


A processor fetches two (consecutive) blocks on a miss.
The requested block is placed in the instruction (data) cache when it returns.
The prefetched block is placed into the instruction (data) stream buffer.
When the requested block can be found in and read from the stream buffer, the next prefetch request is issued.

With four instruction (data) stream buffers, the hit rate improves to 50% (43%).


Compiler-Controlled Prefetching
The compiler inserts prefetch instructions to request the data before they are needed.
Register prefetch
Cache prefetch


Reducing Hit Time


Hit time is critical because it affects the clock rate of the processor.
Strategies to reduce hit time
Small and simple caches: direct mapped
Avoid address translation during indexing of the cache
Pipelined cache access
Trace cache


Access Time versus Cache Size


Summary of Cache Optimizations


Fig. 5.26


Main Memory and Organization for Improving Performance


Performance measures of main memory emphasize both latency and bandwidth.
Traditionally, latency is the primary concern of the cache, while bandwidth is the primary concern of I/O.
However, with second-level caches and their larger block sizes, bandwidth becomes important to caches as well.
It is easier to improve memory bandwidth with a new organization.


Techniques for Improving Bandwidth


Techniques
Wider main memory
Simple interleaved memory
Independent memory banks

Assume the performance of the basic organization is


4 clock cycles to send the address
56 clock cycles for the access time per word (8 bytes)
4 clock cycles to send a word of data
Given a cache block of four words, the miss penalty is 4*(4+56+4) = 256 clock cycles.


Wider Main Memory (1)


With a main memory width of two words, the miss penalty for the above example would drop from 256 cycles to 128 cycles.
Drawbacks:
Increases the critical path timing by introducing a multiplexer between the CPU and the cache.
Memory with error correction has difficulties with writes to a portion of the protected block (e.g., a write of a byte).


Wider Main Memory (2)


Simple Interleaved Memory


Basic concept
Memory chips can be organized in banks to read or write multiple words at a time rather than a single word.
Sending the address to several banks permits them all to read at the same time.
The miss penalty with this scheme becomes 4 + 56 + 4*4 = 76 cycles.
The mapping of addresses to banks affects the behavior of the memory system.
Usually, the addresses are interleaved at the word level.

Example on page 452.
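
The miss penalties quoted for the basic, wider, and interleaved organizations can be reproduced with the following sketch. The function names are hypothetical; the timing parameters are the ones assumed on the slides (4 cycles to send the address, 56 cycles of access time per word, 4 cycles to transfer a word, and a four-word block).

```python
def miss_penalty_basic(words, send_addr, access, transfer):
    """One-word-wide memory: every word pays address, access, and transfer time."""
    return words * (send_addr + access + transfer)


def miss_penalty_wide(words, width_words, send_addr, access, transfer):
    """Memory `width_words` wide: each access delivers several words at once."""
    return (words // width_words) * (send_addr + access + transfer)


def miss_penalty_interleaved(words, send_addr, access, transfer):
    """Interleaved banks: accesses overlap, only the transfers are serialized."""
    return send_addr + access + words * transfer
```

With these parameters the three functions give 256, 128 (for a two-word-wide memory), and 76 clock cycles, matching the figures on the slides.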


Independent Memory Banks


Multiple memory controllers allow banks to operate independently. Each bank needs separate address lines and possibly a separate data bus.
Such a design enables the use of nonblocking cache.


Memory Technology
Performance metrics
Latency: two measures
Access time: The time between when a read is requested and when the desired word arrives.
Cycle time: The minimum time between requests to memory.

Usually cycle time > access time


DRAM
Refresh time < 5%; slow increase in speed.


SRAM, ROM and Flash Technology


SRAM
No refresh
8 to 16 times faster than DRAM
8 to 16 times more expensive than DRAM
Suitable for embedded applications

ROM and flash
Non-volatile
Best suited to embedded processors


Improving Memory Performance in a Standard DRAM Chip


Use of a multi-bank organization provides larger bandwidth.
Three other methods to increase bandwidth
Fast page mode
Allows repeated accesses to a row without another row access time.

Synchronous DRAM
Has a programmable register to hold the number of bytes requested, and hence can send many bytes over several cycles per request without the overhead of synchronizing with the controller each time.

Double Data Rate (DDR) DRAM
Uses both the falling and rising edges of the clock for transferring data.


RAMBUS DRAM (RDRAM)

Each chip has interleaved memory and a high-speed interface, and acts more like a memory system.
RDRAM: First-generation RAMBUS DRAM
Drops RAS/CAS, replacing them with a bus that allows other accesses over the bus between the sending of the address and the return of the data (called a packet-switched bus or split-transaction bus).
Uses both edges of the clock.
Runs at 300 MHz.

Direct RDRAM (DRDRAM): Second generation
Separate data, row, and column buses, such that three transactions on these buses can be performed simultaneously.
Runs at 400 MHz.

Comparing RAMBUS and DDR SDRAM
Both increase memory bandwidth.
Neither helps in reducing latency.


Virtual Memory (VM)


VM divides physical memory into blocks and allocates them to different processes, each of which has its own address space.
It needs a protection scheme that restricts a process to the blocks belonging only to that process.
With VM, not all code and data need to be in physical memory before a program can begin.
VM provides process (program) relocation.
Virtual address
Given by the CPU

Physical address
Used to access main memory

Address translation
Converts a virtual address to a physical address.
Can easily form the critical path that limits the clock cycle time.


Types of VM
Paged
Segmented
Paged segments
Fig. 5.34 on page 463


Address Space Mapping in VM


Differences between Caches and VM


Replacement
In caches it is managed by hardware, while in VM it is managed by the OS.

The size
Of VM is determined by the size of the processor address.
Of a cache is independent of the processor address size.

Secondary storage in VM that is occupied by the file system is not normally in the address space.


Parameter Ranges of Caches versus VM


Fig. 5.32 on page 462.


Four Memory Hierarchy Questions


Q1: Where can a block be placed in main memory?
Anywhere (fully associative)

Q2: How is a block found if it is in main memory?


Use a page table, or
Use an inverted page table to reduce the size of the page table by hashing.
Problem
Two memory accesses are needed to obtain the requested data.
The solution is to use a translation lookaside buffer.

Q3: Which block should be replaced on a VM miss?


LRU

Q4: What happens on a write?


Write back


Concept of Address Mapping Using a Page Table


Techniques for Fast Address Translation


Problem with pure page table translation
Needs to have two memory accesses

Translation Lookaside Buffer (TLB) solves the problem.


Uses locality of page table references.
A fully associative memory whose entries record the most recently used base addresses of the pages.
Each entry consists of
Tag
Physical page frame number
Protection field
Valid bit, dirty bit, and used bit
ASN (Address Space Number): identifies which process owns the corresponding page.
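
A minimal sketch of how a TLB avoids the second memory access, assuming a hypothetical `TLB` class that keeps only the tag (virtual page number) and the physical frame number of each entry. The protection, valid, dirty, used, and ASN fields are omitted, and LRU replacement is an assumption; a missing page table entry simply raises `KeyError` here instead of modeling a page fault.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, page_table, page_bits, capacity):
        self.page_table = page_table      # virtual page number -> physical frame number
        self.page_bits = page_bits        # log2 of the page size
        self.capacity = capacity          # number of TLB entries
        self.entries = OrderedDict()      # most recently used entry last
        self.page_table_accesses = 0      # extra memory accesses caused by TLB misses

    def translate(self, vaddr):
        vpn = vaddr >> self.page_bits
        offset = vaddr & ((1 << self.page_bits) - 1)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)           # TLB hit: no page table access
        else:
            self.page_table_accesses += 1           # TLB miss: read the page table
            frame = self.page_table[vpn]
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)    # evict the least recently used entry
            self.entries[vpn] = frame
        return (self.entries[vpn] << self.page_bits) | offset
```

Repeated accesses to the same page translate without touching the page table again, which is the locality the TLB relies on.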


Alpha 21264 TLB


Summary of VM and Caches


Protection and Examples of VM


Process
A running program plus any state needed to continue running it.

Process (context) switch


One process stops executing and another process is brought into execution.

Requirements for context switches
Be able to save the CPU state so that execution can continue later
A computer designer's responsibility

Protect a process from being interfered with by another process
The OS's responsibility

Computer designers can make protection easy for the OS to implement via the VM design.


Protecting Process
The simplest mechanism
Use base and bound registers
An access is valid if Base <= Address <= Bound

To enable this protection, computer designers have the following three responsibilities:
Provide at least two execution modes: user and kernel (OS, supervisor) modes.
Provide a portion of the CPU state that a user process can use but not write.
Provide mechanisms whereby the CPU can go from user mode to kernel mode.
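
The base-and-bound check and the three responsibilities can be sketched together. The class and method names are hypothetical: the base and bound registers stand for the CPU state that a user process can use but not write, and `system_call` stands for the mechanism that moves the CPU from user mode to kernel mode.

```python
class BaseBoundCPU:
    def __init__(self, base, bound):
        self.base = base
        self.bound = bound
        self.kernel_mode = False          # start in user mode

    def access_is_valid(self, address):
        """An access is valid if Base <= Address <= Bound."""
        return self.base <= address <= self.bound

    def set_bounds(self, base, bound):
        """Only kernel mode may write the base and bound registers."""
        if not self.kernel_mode:
            raise PermissionError("base/bound can only be written in kernel mode")
        self.base, self.bound = base, bound

    def system_call(self):
        self.kernel_mode = True           # user mode -> kernel mode

    def return_to_user(self):
        self.kernel_mode = False          # kernel mode -> user mode
```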

More Sophisticated Mechanisms


Rings
Capabilities
A program can't unlock access to the data unless it has the keys (capabilities).


The Alpha Memory Management and the 21264 TLB


Alpha VM architecture
A combination of segmentation and paging, providing protection while minimizing page table size
64-bit address space, but with 48-bit virtual addresses
Three segments, each of which is paged
seg0 (bits 63~46 = 000): holds user processes
seg1 (bits 63~46 = 111):
kseg (bits 63~46 = 010): reserved for the operating system kernel


Mapping of an Alpha Virtual Address


Each page table is held in a page.


Memory Protection in Alpha 21264


Each page table entry (PTE) has 64 bits
The first 32 bits contain the physical page frame number.
The other half includes the following protection fields:
Valid
User read enable
Kernel read enable
User write enable
Kernel write enable

The Alpha obeys only the protection requirements imposed by the bottom-level PTEs.


Concluding Remarks
The primary challenge for the memory hierarchy designer is in choosing parameters that work well together, not in inventing new techniques (there are already enough).