
UNIT 3

Memory Organization
Unit 3 – Memory Organization
 Memory Characteristics & Hierarchy
 Internal Memory
 Memory size and Memory Cell
 RAM, SRAM, DRAM
 Advanced DRAM – (SDRAM, DDR SDRAM, RDRAM)
 ROM and its types
 Cache Memory
 Cache memory principles
 Multi level cache
 Cache Mapping techniques
 Replacement Algorithms
 Write policy – Write through and write back
 External Memory
 Magnetic disks, Disk Layout
 Optical memory
Memory
 Memory is one of the key components of
embedded systems.
 It is also one of the major limiting resources
in embedded systems.
 Building block of memory – the bit.
 A bit stores one piece of Boolean information
(0 or 1).
Memory Interfaces
 The CPU reads or writes data to memory through
the bus and memory controllers; this requires
address, data, and read/write signals.
Memory Characteristics
 Location
 Volatility
 Capacity
 Unit of transfer
 Access methods
 Performance
Memory Characteristics
Location – Refers to whether memory is internal or
external to the computer.
Internal memory = main memory. (The processor
also requires its own local memory, in the form
of registers.)
Internal (processor registers, cache, main memory)
External (optical disks, magnetic disks, tapes)
Volatility – Refers to the ability of memory to store
data without power.
Two types: volatile and non-volatile.
Volatile – requires power to retain data: SRAM, DRAM.
Non-volatile – retains data without power: ROM,
flash memory, magnetic disk.
Memory Characteristics
Capacity – Refers to the amount of storage a memory
can hold.
• Number of bytes (1 byte = 8 bits).
• Number of words – the biggest chunk of bits with
which a processor can do processing.
• WORD, DOUBLEWORD (DWORD) and QUADWORD
(QWORD) are used for 2-, 4- and 8-byte sizes, as the
sketch below illustrates.
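A quick C illustration of these unit sizes (the fixed-width types from the standard <stdint.h> header stand in for WORD, DWORD, and QWORD):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    /* WORD, DWORD and QWORD correspond to 2-, 4- and 8-byte integers. */
    printf("WORD  : %zu bytes\n", sizeof(uint16_t));  /* 2 */
    printf("DWORD : %zu bytes\n", sizeof(uint32_t));  /* 4 */
    printf("QWORD : %zu bytes\n", sizeof(uint64_t));  /* 8 */
    return 0;
}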
Unit Internal Governed by data bus width i.e. size of a bus
16-bitbus can transmit 16 bits of data
of i7 bus width Total (192 bits) and Address
transf bus-40 bits
er External Usually a block which is much larger than a
word
Block - A "memory block" is a contiguous
chunk of memory.
Example - malloc(2*sizeof(int));

Address Smallest location which can be uniquely


Memory Characteristics
Access Methods
Sequential (tapes) – Shared read/write mechanism.
Start at the beginning and read through in linear
order. Access time depends on the location of the
data and the previous location.
Direct (disks) – Ability to obtain data from a storage
device by going directly to where it is physically
located on the device; allows access to any part of
memory given the address of the location. Shared
read/write mechanism. Individual blocks have a
unique address based on physical location. Access
time is variable (depends on location and previous
location).
Random (RAM) – Each addressable location in memory
has a unique, physically wired-in addressing
mechanism. Individual addresses identify locations
exactly. Access time is independent of location or
previous access.
Memory Characteristics
Performance
Access Time / Latency – For RAM, the time taken to
perform a read or write operation, i.e. the time
between presenting the address and getting the
valid data. For non-RAM, the time taken to position
the read–write mechanism at the desired location.
Memory Cycle Time (MCT) – Time may be required for
the memory to "recover", i.e. additional time
(regenerating signals or data) required before a
second access can commence.
MCT = latency + recovery time
Transfer Rate – Rate at which data can be transferred
to and from memory units.
For RAM: (# of bits (bytes)) * (1 / cycle time),
with cycle time in ns.
For non-RAM: Tn = TA + n/R, where
Tn = average time to read or write n bits
TA = average access time
n = number of bits
R = transfer rate, in bits per second
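A quick worked example of the non-RAM formula (illustrative numbers, not from any particular device): with TA = 10 ms and R = 10^6 bits/s, reading n = 10^4 bits takes

\[
T_n = T_A + \frac{n}{R} = 10\ \text{ms} + \frac{10^4\ \text{bits}}{10^6\ \text{bits/s}} = 10\ \text{ms} + 10\ \text{ms} = 20\ \text{ms}.
\]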
Memory Hierarchy
Design considerations:
 How much? – Capacity.
 How fast? – Access time; must keep pace with the
processor.
 How expensive? – Cost must be reasonable.
Tradeoffs:
 Faster access time means greater cost per bit.
 Greater capacity means smaller cost per bit, but
slower access time.
Memory Hierarchy
Going down the hierarchy, from smaller, more
expensive, faster memories to larger, cheaper,
slower memories:
 Decreasing cost per bit.
 Increasing capacity.
 Increasing access time.
 Decreasing frequency of access of the memory by
the processor (locality of reference).
Memory Hierarchy
Example – For executing any program, the CPU refers
to memory for instructions and data:
 Registers – Internal to the processor.
 A register is a group of flip-flops, with each flip-flop
capable of storing one bit of information.
 The Intel Core i7 processors have 8 general-purpose
registers in 32-bit mode and 16 in 64-bit mode.
 Internal or main memory – RAM.
 Cache – A device for staging the data movement
between main memory and processor registers to
improve performance.
 May include one or more levels of cache.
Principle of Locality
 Principle of Locality: states that programs access
a relatively small portion of their address space
at any instant of time. (Analogy: a library.)
 Two different types of locality (both illustrated in
the C sketch below):
1. Temporal Locality (locality in time): if an item is
referenced, it will tend to be referenced again soon.
2. Spatial Locality (locality in space): if an item is
referenced, items whose addresses are close by
will tend to be referenced soon.
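A minimal C sketch of both kinds of locality (illustrative only):

#include <stdio.h>

int main(void) {
    int a[1024];
    for (int i = 0; i < 1024; i++)
        a[i] = i;

    int sum = 0;          /* 'sum' is re-referenced on every iteration: temporal locality */
    for (int i = 0; i < 1024; i++)
        sum += a[i];      /* a[0], a[1], ... touched at adjacent addresses: spatial locality */

    printf("sum = %d\n", sum);
    return 0;
}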
INTERNAL
MEMORY
Internal Memory
 Memory Size
 A survey of semiconductor main memory
subsystems
 ROM
 DRAM, and
 SRAM memories
Memory Organization
 The basic element of a semiconductor memory is the
memory cell.
 Properties –
• Exhibits 2 stable (or semistable) states, representing
binary 1 and 0.
• Capable of being written into (at least once), to set
the state.
• Capable of being read to sense the state.
Operation of Memory Cell
 The select terminal selects the memory cell for a
read or write operation.
 The control terminal indicates read or write.
 A data terminal carries the electrical signal that sets
the state on a write, and is used to sense the state
on a read.
Semiconductor Memory Types
(Individual memory words are directly accessed through wired-in addressing logic)

Memory Type                  Category            Erasure                    Write Mechanism  Volatility
Random-access memory (RAM)   Read-write memory   Electrically, byte-level   Electrically     Volatile
Read-only memory (ROM)       Read-only memory    Not possible               Masks            Nonvolatile
Programmable ROM (PROM)      Read-only memory    Not possible               Electrically     Nonvolatile
Erasable PROM (EPROM)        Read-mostly memory  UV light, chip-level       Electrically     Nonvolatile
Electrically Erasable PROM   Read-mostly memory  Electrically, byte-level   Electrically     Nonvolatile
(EEPROM)
Flash memory                 Read-mostly memory  Electrically, block-level  Electrically     Nonvolatile
Dynamic RAM – DRAM
RAM types –
• Dynamic RAM
• Static RAM

 DRAM
 Made with cells that store data as charge on
capacitors.
 Presence or absence of charge in a capacitor is
interpreted as a binary 1 or 0.
 Requires periodic charge refreshing to
maintain data storage.
 The term dynamic refers to the tendency of the
stored charge to leak away, even with power
continuously applied.
DRAM cell structure for 1 bit
(Bits stored as charge in capacitors)
 The address line is activated when the bit value
is to be read or written.
 The transistor acts as a switch:
 Closed (allows current to flow) when a voltage is
applied to the address line.
 Open (no current flows) when the address line is
not activated.
 WRITE OPERATION
 A voltage signal is applied to the bit line; a high
voltage represents 1, a low voltage represents 0.
 A signal is then applied to the address line, and
the charge is transferred to the capacitor.
DRAM cell structure for 1 bit
 READ OPERATION
 The address line is selected to read the stored
bit; the transistor turns on.
 The charge stored on the capacitor is fed out
onto the bit line and to a sense amplifier.
 The sense amplifier compares the capacitor
voltage to a reference value to determine
whether the cell holds a 1 or a 0.
 The read-out discharges the capacitor, so the
capacitor charge must be restored to complete
the operation.
SRAM
Bits stored as on/off switches
 A digital device that uses the
same logic elements used
in the processor.
 Binary values are stored
using traditional flip-flop
logic-gate configurations.
 Used for the L2 and L3
caches in a CPU.
 Present on processors or
between processor and
main memory.
 Will hold its data as long as
power is supplied to it.
 No refresh is needed to
retain data.
Static RAM Operation
 The transistor arrangement gives a stable logic
state.
 State 1
 C1 high, C2 low
 T1 T4 off, T2 T3 on
 State 0
 C2 high, C1 low
 T2 T3 off, T1 T4 on
 Address line transistors T5 and T6 act as switches.
 Write – apply the value to line B and its complement
to line B̄.
SRAM versus DRAM
 Both volatile
 Power must be continuously supplied to the memory to
preserve the bit values
 Dynamic cell
 Simpler to build, smaller
 More dense (smaller cells = more cells per unit area)
 Less expensive
 Requires the supporting refresh circuitry
 Tend to be favored for large memory requirements
 Used for main memory
 Static
 Faster
 Used for cache memory (both on and off chip)
Interleaved Memory
 Collection of DRAM chips.
 Grouped into memory banks.
 Banks independently service read or write requests.
 k banks can service k requests simultaneously
(see the sketch below).
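A minimal sketch of low-order interleaving (an assumed scheme, since the slide does not fix one): consecutive word addresses fall into consecutive banks, so the bank index is simply the address modulo k.

#include <stdio.h>

#define NUM_BANKS 4   /* k banks; illustrative value */

/* Low-order interleaving: sequential accesses spread
   across all k banks and can proceed in parallel. */
static unsigned bank_of(unsigned word_addr) {
    return word_addr % NUM_BANKS;
}

int main(void) {
    for (unsigned a = 0; a < 8; a++)
        printf("word %u -> bank %u\n", a, bank_of(a));
    return 0;
}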
Read Only Memory (ROM)
 Contains a permanent pattern of data that
cannot be changed or added to.
 No power source is required to maintain the bit
values in memory.
 Data or program is permanently in main
memory and never needs to be loaded from a
secondary storage device.
 Data is actually wired into the chip as part of the
fabrication process.
 Disadvantages of this:
 No room for error; if one bit is wrong the whole batch
of ROMs must be thrown out.
 The data insertion step includes a relatively large
fixed cost, whether one or thousands of copies are
fabricated.
Programmable ROM
(PROM)
 Less expensive alternative.
 Nonvolatile and may be written into only once.
 The writing process is performed electrically and may
be performed by the supplier or customer at a time
later than the original chip fabrication.
 Special equipment is required for the writing
process.
 Provides flexibility and convenience; ROM remains
attractive for high-volume production runs.
Read-Mostly Memory

EPROM – Erasable programmable read-only memory
• Erasure process (UV light, chip-level) can be
performed repeatedly.
• More expensive than PROM, but has the advantage
of multiple update capability.

EEPROM – Electrically erasable programmable
read-only memory
• Can be written into at any time without erasing
prior contents.
• Combines the advantage of non-volatility with the
flexibility of being updatable in place.
• More expensive than EPROM.

Flash Memory
• Intermediate between EPROM and EEPROM in both
cost and functionality.
• Uses an electrical erasing technology, but does not
provide byte-level erasure.
• The microchip is organized so that a section of
memory cells is erased in a single action or "flash".
Advanced DRAM Organization
 One of the most critical system bottlenecks when
using high-performance processors is the interface
to main internal memory.
 The traditional DRAM chip is constrained both by its
internal architecture and by its interface to the
processor's memory bus.
 A number of enhancements to the basic DRAM
architecture have been explored:
 SDRAM
 DDR-DRAM
 RDRAM

[Table: Performance Comparison of Some DRAM Alternatives]
Synchronous DRAM (SDRAM)
 One of the most widely used forms of DRAM.
 Exchanges data with the processor synchronized
to an external clock signal, running at the full
speed of the processor/memory bus without
imposing wait states.
 With synchronous access, the DRAM moves data in
and out under control of the system clock.
• The processor issues the instruction and address
information.
• The DRAM then responds after a set number of
clock cycles.
• Meanwhile, the master can perform other tasks
while the SDRAM is processing.
RDRAM
 Developed by Rambus.
 Designed to transfer data at faster rates.
 Adopted by Intel for its Pentium and Itanium
processors.
 Has become the main competitor to SDRAM.
 The bus delivers address and control information
using an asynchronous block-oriented protocol:
• Gets a memory request over the high-speed bus.
• The request contains the desired address, the type
of operation, and the number of bytes in the
operation.
 The bus can address up to 320 RDRAM chips and
is rated at 1.6 GBps.
 Chips are vertical packages with all pins on one
side.
Double Data Rate SDRAM -
(DDR SDRAM)
 SDRAM can only send data once per bus clock cycle.
 Double-data-rate SDRAM can send data twice per
clock cycle, once on the rising edge of the clock
pulse and once on the falling edge (a worked rate
example follows below).
 Later generations: DDR2, DDR3, DDR4.
 The prefetch buffer for DDR, DDR2, and DDR3 is
2n, 4n, and 8n respectively, i.e. 2, 4, and 8 data
words per memory cycle (DDR4 retains the 8n
prefetch).
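An illustrative peak-rate calculation (generic numbers, not tied to a specific module): with a 200 MHz bus clock, two transfers per cycle, and a 64-bit (8-byte) data bus,

\[
200 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}} \times 2\ \tfrac{\text{transfers}}{\text{cycle}} \times 8\ \tfrac{\text{bytes}}{\text{transfer}} = 3.2\ \text{GB/s}.
\]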
CACHE
MEMORY
Cache Memory
 Small amount of fast memory.
 Sits between normal main memory and the CPU.
 May be located on the CPU chip or module.
 Cache hierarchy: as we go up the cache hierarchy
towards the CPU, the caches get smaller, faster, and
more expensive to implement; conversely, as we go
down the cache hierarchy they get larger, cheaper,
and much slower.
The operation of cache memory
1. The cache fetches data from addresses next to the
current address in main memory (DRAM).
2. The CPU checks to see whether the next instruction
it requires is in the cache (SRAM).
3. If it is, then the instruction is fetched from the
cache – a very fast access.
4. If not, the CPU has to fetch the next instruction
from main memory – a much slower process.

(CPU ↔ Cache Memory (SRAM) ↔ Main Memory (DRAM),
joined by bus connections)
Cache operation - overview
 CPU requests contents of memory
location
 Check cache for this data
 If present, get from cache (fast)
 If not present, read required block from
main memory to cache
 Then deliver from cache to CPU
 Cache includes tags to identify which
block of main memory is in each cache
slot
Typical Cache Organization
Cache Memory Structure
Cache Design
Size does matter
 Cost
 More cache is expensive
 Speed
 More cache is faster (up to a point)
 Checking cache for data takes time
Cache Addresses – Virtual
Memory
 Virtual memory
 Facility that allows programs to address
memory from a logical point of view, without
regard to the amount of main memory
physically available.
 When used, the address fields of machine
instructions contain virtual addresses.
 For reads from and writes to main memory, a
hardware memory management unit (MMU)
translates each virtual address into a physical
address in main memory.
Logical and Physical Cache
Multilevel Cache
 As logic density has increased, it has become
possible to have a cache on the same chip as the
processor.
 The on-chip cache reduces the processor's external
bus activity, speeds up execution time, and
increases overall system performance.
 When the requested instruction or data is FOUND in
the ON-CHIP CACHE, the bus access is eliminated.
 On-chip CACHE ACCESSES complete appreciably
FASTER, in zero-wait-state bus cycles.
 During this process the bus is free to support other
transfers.
Multilevel Cache
 Two-level cache:
 Internal (on-chip) cache as level L1.
 External (off-chip) cache as level L2.
 Potential savings due to the use of an L2 cache
depend on the hit rates in both the L1 and L2
caches.
 Multilevel cache design features:
 For an off-chip L2 cache, use a separate data
path, to reduce the burden on the system bus.
 Incorporate the L2 cache on the processor chip
to improve performance.
 The use of multilevel caches complicates all of
the design issues related to caches, including
size, replacement algorithm, and write policy.
Mapping Functions
 Because there are fewer cache lines than
main memory (MM) blocks, an algorithm is
needed for mapping main memory blocks
into cache lines.
 The transformation of data from MM to cache
memory (CM) is referred to as a mapping process.
 Tells which word of MM will be placed at
which location of the cache.
 Three techniques can be used –
 Direct Mapping
 Fully Associative Mapping
 Set Associative Mapping
Assumptions for Mapping Functions
Cache Memory (CM): 64 KB
 Line size = 4 bytes = 2^2
 Lines = 2^16 / 2^2 = 2^14, i.e. 16K lines of 4 bytes each
Main Memory (MM): 16 MB
 Block size = 4 bytes = 2^2
 Blocks = 2^24 / 2^2 = 2^22, i.e. 4M blocks of 4 bytes each
 MM address: 24 bits in total (2^24 bytes = 16 MB)
Useful powers of two:
 2^10 = 1 KB;  64 KB = 2^10 · 2^6 = 2^16
 2^20 = 1 MB;  16 MB = 2^20 · 2^4 = 2^24
 2^30 = 1 GB;  4 GB = 2^30 · 2^2 = 2^32
Direct Mapping
 Simplest technique (many-to-one mapping).
 Maps each block of main memory into only one
possible cache line.
 If the jth block of MM has to be placed at the ith
line of the cache, then
i = j modulo (# of lines in cache)
 Example:
 Assume there are 128 lines in the cache; then MM
block j maps to cache line j mod 128, so blocks
0, 128, 256, … all map to line 0.
Direct Mapping
i = j modulo (# of lines in cache)
 i.e. the FIRST m blocks of MM map one-to-one onto
the m lines of the cache.
 The NEXT m blocks map in the same fashion, i.e.
block Bm to line L0, block Bm+1 to line L1, and so on.
Direct Mapping Address Structure

Tag (s-r bits) | Line (r bits) | Word (w bits, i.e. block offset)

 Tag – distinguishes which MM block currently occupies
a cache line. Tag bits = total bits – line bits – word bits
(equivalently, log2(MM blocks / CM lines)).
 Line – line # in cache where an MM block will be placed.
Line bits = log2(cache size / block size) = log2(# lines in cache).
 Word – identifies a unique word within a block of MM.
Word bits = log2(# words per block).
 Address length = s + w, i.e. Tag + Line + Word.
Direct Mapping Address Structure

Tag s-r = 8 bits   Line (slot) r = 14 bits   Word w = 2 bits

 24-bit address.
 4-byte block, i.e. the word field spans 2^2 = 4 bytes,
so w = 2 bits.
 24 – 2 = 22-bit block identifier.
 8-bit tag (= 22 – 14).
 14-bit slot or line.
 No two blocks that map into the same line have the
same Tag field.
 Check the contents of the cache by finding the line
and checking the Tag (sketched in C below).
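A minimal C sketch of this 8/14/2 address split (field widths from the example above; the function names are illustrative):

#include <stdio.h>
#include <stdint.h>

/* 24-bit address = 8-bit tag | 14-bit line | 2-bit word offset */
#define WORD_BITS 2
#define LINE_BITS 14

static uint32_t word_of(uint32_t addr) { return addr & 0x3; }                      /* low 2 bits   */
static uint32_t line_of(uint32_t addr) { return (addr >> WORD_BITS) & 0x3FFF; }    /* next 14 bits */
static uint32_t tag_of(uint32_t addr)  { return addr >> (WORD_BITS + LINE_BITS); } /* top 8 bits   */

int main(void) {
    uint32_t addr = 0x16339C;   /* an arbitrary 24-bit address */
    /* A hit requires that the line's stored tag equal tag_of(addr). */
    printf("tag=0x%02X line=0x%04X word=%u\n",
           (unsigned)tag_of(addr), (unsigned)line_of(addr), (unsigned)word_of(addr));
    return 0;
}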
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words
or bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory =
address space / block size = 2^(s+w) / 2^w = 2^s
 Number of lines in cache = m = 2^r
 Size of tag = (s – r) bits
Direct Mapping pros & cons
 Simple
 Inexpensive
 There is a fixed cache location for any
given block.
 If a program happens to reference words
repeatedly from two different blocks that
map into the same line, then the blocks
will be continually swapped in the cache,
and the hit ratio will be low.
Fully Associative Mapping
 Fastest and most flexible.
 Many-to-many mapping.
 Any block of MM can be placed in any line
of CM.
 The memory address is interpreted as tag and
word.
 The tag uniquely identifies a block of memory.
Fully Associative Mapping
 Compare the tag field with the tag entry in the cache
to check for a hit.
 Every line's tag is examined for a match by its own
comparator, and the comparator outputs are fed to
an OR gate to produce the final HIT/MISS signal.
 Cache searching gets expensive.
Fully Associative Mapping
Summary
 Address length = (s + w) bits
 Number of addressable units = 2^(s+w) words or
bytes
 Block size = line size = 2^w words or bytes
 Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
 Number of lines in cache = undetermined
 Size of tag = s bits

DISADVANTAGES:
• Cache searching gets expensive – a memory block
may be present in any line, so the whole cache must
be searched.
• Hardware cost increases, as we need m comparators
for m lines.
• Complex circuitry is required to examine the tags of
all cache lines in parallel.
Set Associative Mapping
 The cache is divided into a number of sets.
 Each set contains a number of lines.
 A given block maps to any line in a given
set.
 e.g. Block B can be in any line of set i.
 e.g. 2 lines per set:
 2-way associative mapping.
 A given block can be in one of 2 lines, in only one
set.
 Cache control logic interprets the address as
Tag + Set + Word fields.
Set Associative Mapping
 k-way associative means k comparators.
 Block Bj can be mapped into any of the lines
of set j.
 In a k-way set-associative cache, the number of
lines in a set is k.
Set Associative Mapping
 The cache consists of a number of sets, each
of which consists of a number of lines.
 The relationships are (see the sketch below):
Lines_total = Sets_total × Lines_per_set
SetNumber_cache = BlockNumber_MM % Sets_total
Cache size = # sets × lines per set × line size
 Assumption – 2-way set associative; MM = 64 bytes;
cache size = 32 bytes; block size = 4 bytes:
Lines = cache size / block size = 32/4 = 8 lines
Sets = # of lines / lines per set = 8/2 = 4 sets
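A tiny C sketch of these relationships, using the example numbers above (2-way, 32-byte cache, 4-byte blocks, 64-byte MM = 16 blocks):

#include <stdio.h>

#define CACHE_SIZE 32   /* bytes */
#define BLOCK_SIZE 4    /* bytes */
#define WAYS       2    /* lines per set (2-way) */

int main(void) {
    int lines = CACHE_SIZE / BLOCK_SIZE;  /* 8 lines */
    int sets  = lines / WAYS;             /* 4 sets  */

    /* Every MM block maps to exactly one set, but may occupy
       either of the WAYS lines inside that set. */
    for (int block = 0; block < 16; block++)
        printf("MM block %2d -> set %d\n", block, block % sets);
    return 0;
}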
Set Associative Mapping Address
Structure

Tag: 9 bits   Set: 13 bits   Word: 2 bits

 Use the set field to determine which cache set to
look in.
 Compare the tag field to see if we have a hit.
Mapping Techniques -
Summary
 Direct: cache control logic interprets the address as
Tag + Line # + Block offset.
 Fully associative: Tag + Block offset.
 Set associative: Tag + Set # + Block offset.
Replacement Algorithms
 Once the cache has been filled, when a
new block is brought into the cache, one
of the existing blocks must be replaced.
 For direct mapping there is only one
possible line for any particular block, and
no choice is possible.
 For the associative and set-associative
techniques a replacement algorithm is
needed.
 To achieve HIGH SPEED, the algorithm
must be implemented in hardware.
Replacement Algorithms (1)
Direct mapping
 No choice
 Each block only maps to one line
 Replace that line
Replacement Algorithms (2) -
Associative & Set Associative
 Hardware implemented algorithm (for speed).
 Least Recently Used (LRU) – most effective.
 Replace the block in the set that has been in the
cache longest with no reference to it.
 LRU is the most popular replacement algorithm, as it
is simple to implement (a sketch follows below).
 First In First Out (FIFO)
 Replace the block in the set that has been in the
cache longest.
 Easily implemented as a round-robin or circular buffer
technique.
 Least Frequently Used (LFU)
 Replace the block in the set that has experienced the
fewest references.
 Could be implemented by associating a counter with
each line.
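A minimal sketch of LRU bookkeeping for one set (an illustration, not how any particular cache wires it in hardware): each line carries a logical timestamp refreshed on every hit, and the victim on a miss is the line with the oldest timestamp.

#include <stdio.h>

#define WAYS 4

static unsigned last_used[WAYS];  /* per-line timestamp */
static unsigned now;              /* logical clock      */

static void touch(int way) {      /* called on every hit */
    last_used[way] = ++now;
}

static int victim(void) {         /* called on a miss: pick oldest line */
    int v = 0;
    for (int w = 1; w < WAYS; w++)
        if (last_used[w] < last_used[v])
            v = w;
    return v;
}

int main(void) {
    touch(0); touch(1); touch(2); touch(3);
    touch(0);                                /* line 0 becomes most recent */
    printf("replace line %d\n", victim());   /* line 1 is now least recent */
    return 0;
}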
Write Policy
(Must not overwrite a cache block unless main memory is up to
date)

When a block that is resident in the cache is to be
replaced, there are two cases to consider:
 If the old block in the cache has not been altered,
then it may be OVERWRITTEN with a new block
without first writing out the old block.
 If at least one write operation has been performed
on a word in that cache line, then MM must be
UPDATED by writing the cache line to the MM block
before bringing in the new block.

Two issues:
 More than one device may have access to main
memory.
 Multiple processors may be attached to the same
bus, each with its own local cache – if a word is
altered in one cache, it could invalidate a word in
other caches.
Write Through

 Simplest technique.
 All writes go to main memory as well as
cache.
 Multiple CPUs can monitor main memory
traffic to keep local CPU cache up to date.
 Disadvantages:
 Lots of traffic creating a bottleneck.
 Slows down writes
Write Back
 Minimizes memory writes (contrasted with
write-through in the sketch below).
 Updates are made only in the cache.
 The update/dirty/use bit for the cache slot is set
when an update occurs.
 If a block is to be replaced, it is written back to main
memory only if its dirty bit is set.
 Other caches can get out of sync.
 I/O must access main memory through the
cache (because portions of MM are invalid).
 This makes for complex circuitry and a
potential bottleneck.
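A compact sketch contrasting the two write policies (purely illustrative; cache_mem, main_mem, and the dirty flags are hypothetical stand-ins for real cache state):

#include <stdbool.h>
#include <stdio.h>

#define LINES 4

static int  cache_mem[LINES];   /* cached copy of each line */
static int  main_mem[LINES];    /* backing main memory      */
static bool dirty[LINES];       /* write-back bookkeeping   */

/* Write-through: every write goes to the cache AND main memory. */
static void write_through(int line, int value) {
    cache_mem[line] = value;
    main_mem[line]  = value;
}

/* Write-back: write only the cache and mark the line dirty;
   main memory is updated only when the line is evicted. */
static void write_back(int line, int value) {
    cache_mem[line] = value;
    dirty[line] = true;
}

static void evict(int line) {
    if (dirty[line]) {          /* flush only if modified */
        main_mem[line] = cache_mem[line];
        dirty[line] = false;
    }
}

int main(void) {
    write_through(0, 11);
    write_back(1, 22);
    printf("before evict: main_mem[1]=%d\n", main_mem[1]); /* stale: 0 */
    evict(1);
    printf("after  evict: main_mem[1]=%d\n", main_mem[1]); /* now 22   */
    return 0;
}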
Cache Coherence
 Two different processors
having two different
values for the same
location is called the
cache coherence
problem.
 A memory system is
coherent if any data read
returns the most recently
written value of that data
item.
 Coherence defines what
values can be returned
by a read.
Basic Schemes for Enforcing
Coherence
 Migration: A data item can be moved to a
local cache and used there in a
transparent fashion. It reduces both
latency and bandwidth demand.
 Replication: When shared data are being
simultaneously read, the caches make
a copy of the data item in the local cache. It
reduces both latency of access and
contention for a read-shared data item.
 The protocols that maintain coherence
for multiple processors are called
cache coherence protocols.
Software Solutions
 Compiler and operating system deal with the
problem.
 Overhead is transferred to compile time.
 Design complexity is transferred from
hardware to software.
 However, software tends to make
conservative decisions.
 Inefficient cache utilization.
 The compiler analyzes code to determine safe
periods for caching shared variables.
Hardware Solution
 Cache coherence protocols
 Directory protocols
 Snoopy protocols
 Dynamic recognition of potential
problems
 Run time
 More efficient use of cache
 Transparent to programmer
Directory Protocols
 Collect and maintain information about
copies of data in cache.
 Directory is stored in main memory.
 Requests are checked against directory.
 Appropriate transfers are performed.
 Creates central bottleneck.
 Effective in large scale systems with
complex interconnection schemes.
Snoopy Protocols
 Distribute cache coherence responsibility
among cache controllers.
 Individual caches monitor address lines
for MM accesses that they have cached i.e.
cache recognizes that a line is shared.
 Updates announced to other caches.
 Suited to bus-based multiprocessors.
 Increases bus traffic.
Snooping Protocols
 The most popular cache coherence protocol
is the snooping protocol.
 One method of enforcing coherence is to
ensure that a processor has exclusive
access to a data item before it writes that
item.
 This is called a write invalidate protocol
because it invalidates copies in other
caches on a write.
Write Invalidate
 Multiple readers, one writer.
 When a write operation is observed, all
other cached copies of the line are invalidated,
which forces a read from main memory of
the new value on the next access.
 The writing processor has exclusive access until
the line is required by another processor.
 The state of every line is marked as modified,
exclusive, shared, or invalid (MESI; a sketch
follows below).
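A minimal sketch of the per-line state bookkeeping this implies (illustrative C only; real controllers implement these transitions in hardware):

#include <stdio.h>

/* The four line states named above. */
typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

/* On observing a remote write to an address we hold,
   a snooping controller invalidates its local copy. */
static mesi_t snoop_remote_write(mesi_t state) {
    (void)state;             /* whatever we held is now stale */
    return INVALID;
}

/* On a local write, the processor gains exclusive ownership;
   other copies get invalidated over the bus. */
static mesi_t local_write(mesi_t state) {
    (void)state;
    return MODIFIED;
}

int main(void) {
    mesi_t line = SHARED;
    line = local_write(line);         /* -> MODIFIED */
    line = snoop_remote_write(line);  /* -> INVALID  */
    printf("final state = %s\n", line == INVALID ? "INVALID" : "other");
    return 0;
}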
Write Update
 Multiple readers and writers.
 When a write operation is observed to a
cache, the cache controller updates its own copy
of the snooped memory location with the new
data.
 Updated word is distributed to all other
processors.
 Some systems use an adaptive mixture of both
solutions.
Cache Coherency
(Bus organization: multiple caches share MM; updating one
cache requires invalidation in MM and the other caches.)
 Bus watching with write through
 Each cache controller monitors the address lines to detect
any write operations to memory.
 If another master writes to a location in shared memory
that also resides in the cache memory, the cache
controller invalidates that cache entry.
 Depends on a write-through policy by all cache controllers.
 Hardware transparency
 Additional hardware is used to ensure that all updates to
MM via a cache are reflected in all caches.
 Non-cacheable memory
 Only a portion of MM is shared by more than one
processor, and this portion is designated non-cacheable.
 All accesses to shared memory are cache misses, because
the shared memory is never copied into the cache.
Performance of Cache
 The performance of cache memory is
frequently measured in terms of a
quantity called the hit ratio: the fraction of all
memory accesses that are found in the cache
(a worked example follows below).
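A standard worked example built on the hit ratio (illustrative numbers; this simple model assumes the cache is checked first and main memory is accessed only on a miss): with hit ratio H, cache access time T_c, and main memory access time T_m,

\[
T_{avg} = H \cdot T_c + (1 - H)(T_c + T_m)
\]

For example, with H = 0.95, T_c = 1 ns, and T_m = 100 ns: T_avg = 0.95·1 + 0.05·(1 + 100) = 6 ns.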
On Write Miss
 Write allocate
 The line is allocated on a write miss,
followed by the write-hit actions above.
 Commonly paired with write-back caches.
 No write allocate
 Write misses do not affect the cache.
 The line is only modified in the lower-level
memory.
 Mostly used with write-through caches.
Line Size
When a block of data is retrieved and placed in the cache,
not only the desired word but also some number of
adjacent words are retrieved.
 As the block size increases, the hit ratio will at first
increase because of the principle of locality.
 As the block size increases further, more useful data
are brought into the cache, but the hit ratio eventually
decreases as the block becomes bigger: the probability
of using the newly fetched information becomes less
than the probability of reusing the information that has
to be replaced.
Two specific effects come into play:
• Larger blocks reduce the number of blocks
that fit into a cache.
• As a block becomes larger, each additional
word is farther from the requested word.
Unified Vs Split Caches
 On-chip cache – a single cache may be used to store
references to both data and instructions.
 SPLIT cache:
 One cache dedicated to instructions.
 One cache dedicated to data.
 Both exist at the same level, typically as two L1
caches.
 When the processor attempts to fetch an instruction
from main memory, it consults the instruction L1
cache; when it attempts to fetch data from main
memory, it consults the data L1 cache.
Unified Vs Split Caches
 Advantages of unified cache:
 Higher hit rate than a split cache.
 Balances the load of instruction and data fetches
automatically.
 Only one cache needs to be designed and implemented.
 Advantages of split cache:
 Eliminates cache contention between the instruction
fetch/decode unit and the execution unit: with a unified
cache, when the execution unit performs a memory
access to load or store data, it submits a request to the
same cache the fetch unit is reading instructions from,
and the two compete.
 Important for pipelining of instructions.
 The trend: split caches at the L1 level and unified caches
for higher levels.
Unified & Split Caches –
Intel i7
Split caches at the L1 and unified caches for higher levels.
Pentium 4 Cache
 80386 – no on-chip cache.
 80486 – 8 KB, using 16-byte lines and a four-way set-
associative organization.
 Pentium (all versions) – two on-chip L1 caches:
 Data & instructions.
 Pentium 4:

L1 cache: 16 KB; 64-byte lines; 4-way set associative.
L2 cache (feeding both L1 caches): 512 KB;
128-byte lines; 8-way set associative.
Pentium 4 Diagram
(Simplified)
Pentium 4 Core Processor
 Fetch/Decode Unit
 Fetches program instructions from L2 cache
 Decode into a series of micro-ops
 Stores the results in L1 instruction cache
 Out of order execution logic
 Schedules execution of micro-ops, based on data dependence and resource
availability.
 Different fetching and executing order.
 May speculatively execute
 Execution units
 Execute micro-ops
 Fetch the required data from L1 cache
 Temporarily store results in registers
 Memory subsystem
 L2 cache + L3 cache + system bus
 Used to access main memory when the L1 and L2 caches have a cache miss
and to access the system I/O resources
Pentium 4 Design
Reasoning
 Decodes instructions into RISC like micro-ops before L1
cache.
 Performance improved by separating decoding from
scheduling & pipelining.
 Data cache is configured as write through.
 L1 cache is controlled by 2 bits in register
 CD = cache disable
 NW = not write through
 2 instructions to invalidate (flush) cache and write-back.
EXTERNAL
MEMORY
Types of External Memory

 Magnetic Disk
 RAID
 Removable

 Optical
 CD-ROM
 CD-Recordable (CD-R)
 CD-R/W
 DVD

 Magnetic Tape
Read and Write Mechanisms
 Recording & retrieval via conductive coil called a head
 May be single read/write head or separate ones
 During read/write, head is stationary, platter rotates
 Write
 Current through coil produces magnetic field
 Pulses sent to head
 Magnetic pattern recorded on surface below
 Read (traditional)
 Magnetic field moving relative to coil produces current
 Coil is the same for read and write
 Read (contemporary)
 Separate read head, close to write head
 Partially shielded magnetoresistive (MR) sensor
 Electrical resistance depends on direction of magnetic field
 High frequency operation
 Higher storage density and speed
Data Organization and
Formatting
 Concentric rings or tracks
 Gaps between tracks
 Reduce gap to increase capacity
 Same number of bits per track (variable
packing density)
 Constant angular velocity
 Tracks divided into sectors
 Minimum block size is one sector
 May have more than one sector per
block
Magnetic Disk – Disk
layout
 Inter-track and inter-sector gaps prevent, or
minimize, errors due to head misalignment or
interference of magnetic fields.
Multiple Platter
 One head per side
 Heads are joined and aligned
 Aligned tracks on each platter form
cylinders
 Data is striped by cylinder
 reduces head movement
 Increases speed (transfer rate)
Multiple Platters
Solid State Drives
 The term solid state refers to electronic
circuitry built with semiconductors.
 Uses Flash Memory
 Advantages:
 High-performance input/output operations per
second (IOPS)
 Durable
 Longer lifespan
 Lower power consumption
 Lower access times and Latency
Optical Disks
Unit 3 – Summary
 Memory hierarchy
 Internal Memory
 Cache Memory
 Mapping techniques
 Replacement techniques
 Write Policy

 External Memory
