Week 12
Fall 2003
Heads Up
Memory hierarchies
- Reading assignment PH 7.1 and B.5
Reminders
Basics of caches
- Reading assignment PH 7.2
331 Lec19.2
[Figure: major components of a computer: datapath; memory (cache, main memory, secondary memory (disk)); devices (input, output)]
Device             Behavior          Partner    Data rate (KB/sec)
Keyboard           input             human      0.01
Mouse              input             human      0.02
Laser printer      output            human      200.00
Graphics display   output            human      60,000.00
Network/LAN        input or output   machine    500.00-6000.00
Floppy disk        storage           machine    100.00
Magnetic disk      storage           machine    2000.00-10,000.00
Magnetic Disk
Purpose
General structure
[Figure: disk organization, showing platters, tracks, sectors, cylinders, and heads]
                               Sun X6713A    Toshiba MK2016
Disk diameter (inches)         3.5           2.5
Capacity                       73 GB         20 GB
MTTF (k hrs)                   1,200         300
# of platters - heads          ?             2-4
# cylinders                    ?             16,383
# B/sector - # sectors/track   ?             512 - 63
Rotation speed (RPM)           10,000        4,200
(row label lost)               ? - 6.6       24 - 13
(row label lost)               ?             7.14
Transfer rate                  35 MB/sec     16.6 MB/sec
(row label lost)               ?             < 2.5
Volume (in^3)                  ?             4.01
Weight (oz)                    ?             3.49
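As a quick check on the disk figures, the sketch below computes average rotational latency as the time for half a revolution. It assumes the 10,000 and 4,200 entries are rotation speeds in RPM, which is an interpretation of the table rather than something the slide states; the function name is made up for illustration. Under that assumption, the 4,200 RPM result lines up with the 7.14 entry.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: on average the head waits
    half a revolution for the target sector to come around."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2.0


# Assumed RPM values, read off the disk comparison table.
for name, rpm in [("Sun X6713A", 10_000), ("Toshiba MK2016", 4_200)]:
    print(f"{name}: {avg_rotational_latency_ms(rpm):.2f} ms")
```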
[Figure labels: bus, main memory, receiver, keyboard]
Performance
Expandability
Performance Measures
[Figure: a typical I/O system: processor, cache, main memory, and I/O controllers for two disks, a terminal, and a network]
Bus Characteristics
Control Lines
Data Lines
[Figure: processor and main memory connected by control and data lines, one panel per step]
Step 2: Memory accesses data
Step 3: Memory transfers data to disk
Advantages
Versatility:
- New devices can be added easily
- Peripherals can be moved between computer systems that
use the same bus standard
Low Cost:
- A single set of wires is shared in multiple ways
Disadvantages
Types of Buses
Processor-Memory Bus (proprietary)
A Two Bus System
[Figure: processor and memory on the processor-memory bus; bus adaptors connect it to the I/O buses]
A Three Bus System
[Figure: processor and memory on the processor-memory bus; a bus adaptor connects it to a backplane bus, and bus adaptors on the backplane bus connect to the I/O buses]
[Figure: example system built around PCI: processor-memory bus, PCI interface/memory controller, PCI bus, SCSI bus, and I/O controllers for audio I/O, serial ports, CDRom, disk, tape, a graphic terminal, and a network]
[Figure labels: memory controller (Northbridge), PCI bus, I/O buses]
http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&
A Bus Transaction
- arbitration
- request
- action
[Figure: processor, memory, and bus slave connected by control and data lines, one panel per step]
Step 1: Disk wants to use the bus so it generates a bus request to processor
Step 3: Processor gives slave (disk) permission to use the bus
A bus master wanting to use the bus asserts the bus request
A bus master cannot use the bus until its request is granted
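The request/grant rule above can be sketched as a tiny central arbiter. This is only an illustration: the fixed-priority policy (lowest-numbered device wins) and the names `arbitrate` and `grant_lines` are assumptions, since the slides do not say how the arbiter chooses among simultaneous requests.

```python
from typing import List, Optional, Sequence


def arbitrate(requests: Sequence[bool]) -> Optional[int]:
    """Return the index of the device granted the bus, or None if no
    device is requesting. Assumed policy: lowest index has priority."""
    for device, requesting in enumerate(requests):
        if requesting:
            return device
    return None


def grant_lines(requests: Sequence[bool]) -> List[bool]:
    """Drive one grant line per device; at most one is asserted."""
    winner = arbitrate(requests)
    return [device == winner for device in range(len(requests))]


# Devices 1 and 2 both assert their request lines; only one is granted.
print(grant_lines([False, True, True]))
```

A device whose grant line stays low simply keeps its request asserted and waits, which is the "cannot use the bus until its request is granted" rule.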
Centralized Parallel Arbitration
[Figure: devices 1, 2, ..., N each have a Req line to and a Grant line (Grant1, Grant2, ..., GrantN) from a central bus arbiter, and share the control and data lines]
Synchronous Bus
Advantage: involves very little logic and can run very fast
Disadvantages:
- Every device on the bus must run at the same clock rate
- To avoid clock skew, the bus cannot be long if it is fast
Asynchronous Bus
Advantages:
- Can accommodate a wide range of devices
- Can be lengthened without worrying about clock skew or
synchronization problems
Disadvantage: slow(er)
[Figure: timing diagram of the handshake signals, including the data lines (addr, then data) and DataRdy, annotated with the step numbers below]
I/O device signals a request by raising ReadReq and putting the addr on the data lines
1. Memory sees ReadReq, reads addr from data lines, and raises Ack
2. I/O device sees Ack and releases the ReadReq and data lines
3. Memory sees ReadReq go low and drops Ack
4. When memory has data ready, it places it on data lines and raises DataRdy
5. I/O device sees DataRdy, reads the data from data lines, and raises Ack
6. Memory sees Ack, releases the data lines, and drops DataRdy
7. I/O device sees DataRdy go low and drops Ack
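The handshake can be walked through in code as a sequence of signal changes. The sketch below is a simplified, single-threaded model, not a hardware description: the function name `async_read` and the dictionary-based signals are illustrative. Steps 3 and 7 are written as the usual completion of the handshake (memory drops Ack once ReadReq is released; the device drops Ack once DataRdy is dropped), which is an assumption wherever the slide leaves those steps blank.

```python
def async_read(memory: dict, addr: int):
    """Run one asynchronous read; return (data, trace of signal states)."""
    sig = {"ReadReq": 0, "Ack": 0, "DataRdy": 0, "Data": None}
    trace = []

    def log(step: int) -> None:
        trace.append((step, dict(sig)))  # snapshot after each step

    # Request: I/O device raises ReadReq and puts addr on the data lines.
    sig["ReadReq"], sig["Data"] = 1, addr
    log(0)
    # 1. Memory sees ReadReq, reads addr from the data lines, raises Ack.
    latched_addr = sig["Data"]
    sig["Ack"] = 1
    log(1)
    # 2. I/O device sees Ack and releases ReadReq and the data lines.
    sig["ReadReq"], sig["Data"] = 0, None
    log(2)
    # 3. (assumed) Memory sees ReadReq low and drops Ack.
    sig["Ack"] = 0
    log(3)
    # 4. Memory places the data on the data lines and raises DataRdy.
    sig["Data"], sig["DataRdy"] = memory[latched_addr], 1
    log(4)
    # 5. I/O device sees DataRdy, reads the data, raises Ack.
    data = sig["Data"]
    sig["Ack"] = 1
    log(5)
    # 6. Memory sees Ack, releases the data lines, drops DataRdy.
    sig["Data"], sig["DataRdy"] = None, 0
    log(6)
    # 7. (assumed) I/O device sees DataRdy low and drops Ack.
    sig["Ack"] = 0
    log(7)
    return data, trace


data, trace = async_read({0x10: 42}, 0x10)
for step, state in trace:
    print(step, state)
```

Every signal returns to its idle level at the end, so the next transaction can start from the same state; no clock is involved at any point.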
                          PCI                         SCSI
Bus type                  backplane                   I/O
Data bus width (bits)     32 or 64                    8 to 32
Address/data lines        multiplexed                 multiplexed
Bus masters               multiple                    multiple
Arbitration               centralized                 self-selection
Clocking                  synchronous (33 - 66 MHz)   asynchronous
(row label lost)          ?                           5 MB/sec
(row label lost)          80 MB/sec                   1.5 MB/sec
(row label lost)          ?                           7 to 31
Bus length                0.5 meters                  25 meters
[Figure: major components of a computer: processor (control, datapath), memory, devices (input, output)]
[Figure: a typical memory hierarchy. Levels: datapath with RegFile, ITLB/DTLB, and instruction and data caches; eDRAM; second level cache (SRAM); main memory (DRAM); secondary memory (disk). Speed (ns), nearest to farthest from the processor: .1s, 1s, 10s, 100s, 1,000s. Size (bytes): 100s, Ks, 10Ks, Ms, Ts. Cost: highest to lowest.]
[Figure: levels of the hierarchy, with increasing distance from the processor in access time, and the unit of transfer between adjacent levels]
L1$ to L2$: 8-32 bytes (block)
L2$ to Main Memory: 1 block
Main Memory to Secondary Memory: 1,024+ bytes (disk sector = page)
Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in MM, which is a subset of what is in SM
Random Access
Size: DRAM/SRAM ratio of 4 to 8
[Figure: RAM cell array in which each intersection represents a 6-T SRAM cell. Labels: row address, word (row) select, column address, data word]
One memory row holds a block of data, so the column address selects the requested word from that block
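The row/column split can be illustrated with a toy memory. This is a minimal sketch under stated assumptions: word addressing, a row-major layout, and 8 rows of 4 words each; the names `split_address` and `read_word` are hypothetical.

```python
WORDS_PER_ROW = 4  # assumed: each memory row holds a 4-word block
NUM_ROWS = 8       # assumed array height

# Toy contents: the word stored at address a is just a.
rows = [[r * WORDS_PER_ROW + c for c in range(WORDS_PER_ROW)]
        for r in range(NUM_ROWS)]


def split_address(addr: int, words_per_row: int = WORDS_PER_ROW):
    """Split a word address into (row address, column address)."""
    return divmod(addr, words_per_row)


def read_word(addr: int) -> int:
    row, col = split_address(addr)
    block = rows[row]   # row select reads out the whole block
    return block[col]   # column address picks the requested word


print(split_address(13))
print(read_word(13))
```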
[Figure: RAM cell array in which each intersection represents a 1-T DRAM cell. Labels: row decoder, row address, data bit, data word]
[Figure: DRAM organized as N rows x N cols x M bits, addressed by a row address and a column address, with an M-bit output. Timing: within each cycle time, RAS and CAS strobe in a row address and then a column address for one M-bit access.]
Memory interleaving
www.chips.ibm.com/products/memory/88H2011/88H2011.pdf
www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm
Rambus DRAMs
www.rambus.com/developer/quickfind_documents.html
www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm
www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA
[Figure: without interleaving, the CPU and a single memory: start access for D1, D1 available after the access time, then D2 available. With interleaving, the CPU and memory banks 0 to 3: access bank 0, access bank 1, access bank 2, access bank 3, after which we can access bank 0 again.]
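The benefit of interleaving can be estimated with a simple cycle count. The sketch below uses a deliberately simplified model with hypothetical timing values (1 bus cycle to send the address, 15 cycles per memory access, 1 cycle to transfer each word); none of these numbers come from the slides. Banks overlap their accesses, while words still cross the bus one at a time, and transfers are not overlapped with a later round of accesses.

```python
import math


def block_fetch_cycles(words: int, banks: int,
                       addr_cycles: int = 1,
                       access_cycles: int = 15,
                       transfer_cycles: int = 1) -> int:
    """Bus cycles to fetch `words` consecutive words from `banks`
    interleaved memory banks (timing parameters are assumptions)."""
    # Each round, every bank performs one access in parallel.
    rounds = math.ceil(words / banks)
    return addr_cycles + rounds * access_cycles + words * transfer_cycles


print(block_fetch_cycles(4, banks=1))  # 65: four back-to-back accesses
print(block_fetch_cycles(4, banks=4))  # 20: the four accesses overlap
```

With one bank the access time is paid once per word; with four banks it is paid once per block, which is why bank 0 is ready again by the time bank 3 has been accessed.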
[Figure: DRAM array of N rows x N cols feeding an N x M SRAM with an M-bit output. Timing: after one row address (RAS), successive column addresses (CAS) deliver the 2nd, 3rd, and 4th M-bit outputs.]
[Figure: processor-memory performance gap. Axes: Performance (ticks at 10, 100, 1000) vs. Time (1980 to 2000). CPU curve ("Moore's Law"): 60%/year (2X/1.5yr). DRAM curve: 9%/year (2X/10yrs). Processor-Memory Performance Gap: grows 50%/year.]
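The gap figure follows from the two growth rates. As a rough check, dividing the yearly processor improvement by the yearly DRAM improvement gives the yearly growth of the gap; the variable and function names below are illustrative only.

```python
import math

cpu_growth = 1.60   # processor performance improves 60%/year
dram_growth = 1.09  # DRAM performance improves 9%/year

# The gap is the ratio of the two curves, so it grows by their ratio.
gap_growth = cpu_growth / dram_growth
print(f"gap grows about {100 * (gap_growth - 1):.0f}% per year")


def doubling_time_years(growth: float) -> float:
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(growth)


print(f"CPU doubles every {doubling_time_years(cpu_growth):.2f} years")
```

The ratio works out to roughly 47% per year, which the figure rounds to 50%, and the processor doubling time comes out close to the quoted 1.5 years.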
Address Space
2^n - 1
[Figure: blocks Blk X and Blk Y moving between the levels of the hierarchy; labels: From Processor, Lower Level Memory]
Hit Rate: the fraction of memory accesses found in the upper level
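Hit rate can be measured by replaying a trace of block accesses against a small upper level. The following is a minimal sketch, assuming a fully associative upper level with LRU replacement; the replacement policy, the capacity, and the function name `hit_rate` are assumptions made for illustration, not something the slides specify.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def hit_rate(block_trace: Sequence[Hashable], upper_level_blocks: int) -> float:
    """Fraction of accesses found in the upper level (LRU assumed)."""
    if not block_trace:
        return 0.0
    upper = OrderedDict()  # keys are resident blocks, oldest first
    hits = 0
    for blk in block_trace:
        if blk in upper:
            hits += 1
            upper.move_to_end(blk)         # mark as most recently used
        else:
            if len(upper) >= upper_level_blocks:
                upper.popitem(last=False)  # evict least recently used
            upper[blk] = True              # bring block up from lower level
    return hits / len(block_trace)


trace = ["X", "Y", "X", "Y", "X", "X", "Z", "X"]
rate = hit_rate(trace, upper_level_blocks=2)
print(f"hit rate = {rate:.3f}, miss rate = {1 - rate:.3f}")
```

The miss rate is simply one minus the hit rate, so either number describes the same trace.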
by compiler (programmer?)
by the hardware
Summary
DRAM is slow but cheap and dense
Good choice for presenting the user with a BIG memory system