
14:332:331

Computer Architecture and Assembly Language


Fall 2003

Week 12

Buses and I/O system


[Adapted from Dave Patterson's UCB CS152 slides and
Mary Jane Irwin's PSU CSE331 slides]
331 Lec19.1

Fall 2003

Heads Up

This week's material

Buses: Connecting I/O devices


- Reading assignment PH 8.4

Memory hierarchies
- Reading assignment PH 7.1 and B.5

Reminders

Next week's material

Basics of caches
- Reading assignment PH 7.2


Review: Major Components of a Computer


[Diagram: Processor (Control + Datapath); Memory (Cache, Main Memory, Secondary Memory (Disk)); Devices (Input, Output)]

Input and Output Devices

I/O devices are incredibly diverse with respect to

Behavior

Partner

Data rate

Device            Behavior          Partner   Data rate (KB/sec)
Keyboard          input             human     0.01
Mouse             input             human     0.02
Laser printer     output            human     200.00
Graphics display  output            human     60,000.00
Network/LAN       input or output   machine   500.00-6,000.00
Floppy disk       storage           machine   100.00
Magnetic disk     storage           machine   2,000.00-10,000.00

Magnetic Disk

Purpose

Long term, nonvolatile storage


Lowest level in the memory hierarchy
- slow, large, inexpensive

General structure

A rotating platter coated with a magnetic surface

Use a moveable read/write head to access the disk

Advantages of hard disks over floppy disks

Platters are more rigid (metal or glass) so they can be larger

Higher density because it can be controlled more precisely

Higher data rate because it spins faster

Can incorporate more than one platter


Organization of a Magnetic Disk


[Diagram: disk platters divided into tracks and sectors]

Typical numbers (depending on the disk size)

1 to 15 (2-surface) platters per disk, with 1" to 8" diameter

1,000 to 5,000 tracks per surface

63 to 256 sectors per track


- the smallest unit that can be read/written (typically 512 to 1,024 B)

Traditionally all tracks have the same number of sectors


- Newer disks with smart controllers can record more sectors on
the outer tracks (constant bit density)


Magnetic Disk Characteristics

Cylinder: all the tracks under the heads at a given point on all surfaces

Read/write data is a three-stage process:

Seek time: position the arm over the proper track (6 to 14 ms avg.)
- due to locality of disk references the actual average seek time may be only 25% to 33% of the advertised number

[Diagram: platter with tracks, sectors, a cylinder, and the read/write head]

Rotational latency: wait for the desired sector to rotate under the read/write head (on average, ½ of 1/RPM)

Transfer time: transfer a block of bits (sector) under the read-write head (2 to 20 MB/sec typical)

Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
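The four components above add up to the total access time; here is a sketch using representative values from the ranges on this slide (illustrative numbers, not a specific disk):

```python
# Sketch: total disk access time = seek + rotational latency + transfer
# + controller overhead, with illustrative parameter values.

def disk_access_ms(seek_ms, rpm, sector_bytes, transfer_mb_s, controller_ms):
    rotational_ms = 0.5 * 60_000 / rpm                 # half a revolution, in ms
    transfer_ms = sector_bytes / (transfer_mb_s * 1e6) * 1e3
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# 6 ms avg. seek, 10,000 RPM, 512 B sector, 10 MB/sec transfer, 2 ms controller
total = disk_access_ms(6.0, 10_000, 512, 10.0, 2.0)
print(f"{total:.2f} ms")   # → 11.05 ms (rotational latency alone is 3.00 ms)
```

Note that seek plus rotational latency dominates; the 512 B transfer itself contributes only about 0.05 ms.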


Magnetic Disk Examples


Characteristic                Sun X6713A   Toshiba MK2016
Disk diameter (inches)        3.5          2.5
Capacity                      73 GB        20 GB
MTTF (k hrs)                  1,200        300
# of platters - heads         —            2 - 4
# cylinders                   —            16,383
# B/sector - # sectors/track  —            512 - 63
Rotation speed (RPM)          10,000       4,200
Max. - Avg. seek time (ms)    ? - 6.6      24 - 13
Avg. rot. latency (ms)        —            7.14
Transfer rate                 35 MB/sec    16.6 MB/sec (PIO)
Power (watts)                 —            < 2.5
Volume (in3)                  —            4.01
Weight (oz)                   —            3.49

I/O System Interconnect Issues


[Diagram: Processor and Main Memory connected by a bus; other subsystems, such as a receiver and keyboard, attach to the same bus]

A bus is a shared communication link (a set of


wires used to connect multiple subsystems)


Performance

Expandability

Resilience in the face of failure (fault tolerance)



Performance Measures

Latency (execution time, response time) is the total time from the start to finish of one instruction or action

usually used to measure processor performance

Throughput: total amount of work done in a given amount of time

aka execution bandwidth

the number of operations performed per second

Bandwidth: amount of information communicated across an interconnect (e.g., a bus) per unit time

the bit width of the operation * rate of the operation

usually used to measure I/O performance
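The bit width × rate formula can be checked with a quick sketch (the bus parameters are hypothetical):

```python
# Sketch: bandwidth = bit width of the operation * rate of the operation.

def bandwidth_mb_s(bits_per_transfer, transfers_per_sec):
    return bits_per_transfer / 8 * transfers_per_sec / 1e6  # bytes/sec -> MB/sec

# e.g., a 32-bit bus doing one transfer per cycle at 33 MHz
print(bandwidth_mb_s(32, 33e6))   # → 132.0 MB/sec
```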


I/O System Expandability

Usually have more than one I/O device in the system

each I/O device is controlled by an I/O Controller


[Diagram: Processor (with Cache) and Main Memory on a Memory - I/O Bus; I/O Controllers for Disks, a Terminal, and a Network attach to the bus and raise interrupt signals to the processor]

Bus Characteristics
Control Lines
Data Lines

Control lines

Signal requests and acknowledgments

Indicate what type of information is on the data lines

Data lines

Data, complex commands, and addresses

Bus transaction consists of

Sending the address

Receiving (or sending) the data


Output (Read) Bus Transaction

Defined by what they do to memory

read = output: transfers data from memory (read) to I/O


device (write)

Step 1: Processor sends read request and read address to memory
Step 2: Memory accesses data
Step 3: Memory transfers data to disk

[Diagram at each step: Processor and Main Memory connected by Control and Data lines]

Input (Write) Bus Transaction

Defined by what they do to memory

write = input: transfers data from I/O device (read) to


memory (write)

Step 1: Processor sends write request and write address to memory
Step 2: Disk transfers data to memory

[Diagram at each step: Processor and Main Memory connected by Control and Data lines]

Advantages and Disadvantages of Buses

Advantages

Versatility:
- New devices can be added easily
- Peripherals can be moved between computer systems that
use the same bus standard

Low Cost:
- A single set of wires is shared in multiple ways

Disadvantages

It creates a communication bottleneck


- The bus bandwidth limits the maximum I/O throughput

The maximum bus speed is largely limited by


- The length of the bus
- The number of devices on the bus

It needs to support a range of devices with widely varying latencies and data transfer rates

Types of Buses
Processor-Memory Bus (proprietary)

Short and high speed

Matched to the memory system to maximize the memory-processor bandwidth

Optimized for cache block transfers

I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)

Usually is lengthy and slower

Needs to accommodate a wide range of I/O devices

Connects to the processor-memory bus or backplane bus

Backplane Bus (industry standard, e.g., PCI)

The backplane is an interconnection structure within the


chassis

Used as an intermediary bus connecting I/O busses to the


processor-memory bus


A Two Bus System

[Diagram: Processor and Memory on a Processor-Memory Bus; three Bus Adaptors connect separate I/O Buses to it]

I/O buses tap into the processor-memory bus via Bus


Adaptors (that do speed matching between buses)

Processor-memory bus: mainly for processor-memory


traffic

I/O busses: provide expansion slots for I/O devices


A Three Bus System

[Diagram: Processor and Memory on the Processor-Memory Bus; a Bus Adaptor connects it to a Backplane Bus, and further Bus Adaptors connect I/O Buses to the backplane]

A small number of Backplane Buses tap into the Processor-Memory Bus

Processor-Memory Bus is used for processor memory traffic

I/O buses are connected to the Backplane Bus

Advantage: loading on the Processor-Memory Bus is greatly


reduced


I/O System Example (Apple Mac 7200)

Typical of a midrange to high-end desktop system in 1997
[Diagram: Processor (with Cache) on the Processor-Memory Bus with Main Memory and a PCI Interface/Memory Controller; on the PCI bus, I/O Controllers for Audio I/O and Serial ports, a SCSI-bus I/O Controller (CDRom, Disk, Tape), a Graphics Terminal I/O Controller, and a Network I/O Controller]

Example: Pentium System Organization

[Diagram: Processor connected to the Memory controller (Northbridge), which bridges the Processor-Memory Bus, the PCI Bus, and the I/O Busses]
http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

A Bus Transaction

A bus transaction includes three parts:

Gaining access to the bus

- arbitration

Issuing the command (and address)

- request

Transferring the data

- action

[Diagram: Bus Master and Bus Slave. Control: the master initiates requests; Data: can go either way]

Gaining access to the bus

How is the bus reserved by a device that wishes to use it?

Chaos is avoided by a master-slave arrangement


- The bus master initiates and controls all bus requests

In the simplest system:

The processor is the only bus master

Major drawback - the processor must be involved in every


bus transaction

Single Master Bus Transaction

All bus requests are controlled by the processor

it initiates the bus cycle on behalf of the requesting device

Step 1: Disk wants to use the bus so it generates a bus request to the processor
Step 2: Processor responds and generates appropriate control signals
Step 3: Processor gives the slave (disk) permission to use the bus

[Diagram at each step: Processor and Memory connected by Control and Data lines]

Multiple Potential Bus Masters: Arbitration

Bus arbitration scheme:

A bus master wanting to use the bus asserts the bus request

A bus master cannot use the bus until its request is granted

A bus master must release the bus after its use

Bus arbitration schemes usually try to balance two


factors:

Bus priority - the highest priority device should be serviced first

Fairness - Even the lowest priority device should never


be completely locked out from using the bus

Bus arbitration schemes can be divided into four


broad classes

Daisy chain arbitration: all devices share 1 request line

Centralized, parallel arbitration: multiple request and grant lines

Distributed arbitration by self-selection: each device wanting the bus


places a code indicating its identity on the bus

Distributed arbitration by collision detection: Ethernet uses this


Centralized Parallel Arbitration

[Diagram: a Bus Arbiter with a separate Req and Grant line pair (Grant1 ... GrantN) for each of Device 1 through Device N, plus shared Control and Data lines]

Used in essentially all backplane and high-speed


I/O busses
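As a sketch of how a centralized, parallel arbiter picks one master from the per-device request lines, here is a fixed-priority version (the scheme and names are illustrative; real arbiters typically add a fairness mechanism such as round-robin):

```python
# Sketch of a fixed-priority centralized arbiter: one Req line and one
# Grant line per device; the lowest-numbered requester wins.

def arbitrate(requests):
    """requests: list of booleans, one per device (device 0 = highest priority).
    Returns a one-hot grant list; all False if nobody is requesting."""
    grants = [False] * len(requests)
    for i, req in enumerate(requests):
        if req:
            grants[i] = True   # grant exactly one bus master
            break
    return grants

print(arbitrate([False, True, True]))   # → [False, True, False]
```

A pure fixed-priority scheme violates the fairness goal above (device N can starve), which is exactly why practical arbiters rotate priorities.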


Synchronous and Asynchronous Buses
Synchronous Bus

Includes a clock in the control lines

A fixed protocol for communication that is relative to the clock

Advantage: involves very little logic and can run very fast

Disadvantages:
- Every device on the bus must run at the same clock rate
- To avoid clock skew, they cannot be long if they are fast

Asynchronous Bus

It is not clocked, so requires handshaking protocol (req, ack)


- Implemented with additional control lines

Advantages:
- Can accommodate a wide range of devices
- Can be lengthened without worrying about clock skew or
synchronization problems

Disadvantage: slow(er)


Asynchronous Handshaking Protocol

Output (read) data from memory to an I/O device.


[Timing diagram: ReadReq, Data (addr, then data), Ack, and DataRdy waveforms, annotated with the numbered handshake steps below]
1. I/O device signals a request by raising ReadReq and putting the addr on the data lines
2. Memory sees ReadReq, reads addr from the data lines, and raises Ack
3. I/O device sees Ack and releases the ReadReq and data lines
4. Memory sees ReadReq go low and drops Ack
5. When memory has the data ready, it places it on the data lines and raises DataRdy
6. I/O device sees DataRdy, reads the data from the data lines, and raises Ack
7. Memory sees Ack, releases the data lines, and drops DataRdy; the I/O device, seeing DataRdy go low, drops Ack
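The interlocking of these steps can be made visible by listing the (ReadReq, Ack, DataRdy) signal state after each one; this is a sketch with our own encoding, not part of any bus standard:

```python
# Sketch: the asynchronous read handshake as a sequence of signal states.
# Each entry is (description, (ReadReq, Ack, DataRdy) after that step).

steps = [
    ("device raises ReadReq, puts addr on data lines", (1, 0, 0)),
    ("memory reads addr, raises Ack",                  (1, 1, 0)),
    ("device sees Ack, drops ReadReq",                 (0, 1, 0)),
    ("memory sees ReadReq low, drops Ack",             (0, 0, 0)),
    ("memory puts data on lines, raises DataRdy",      (0, 0, 1)),
    ("device reads data, raises Ack",                  (0, 1, 1)),
    ("memory drops DataRdy; device then drops Ack",    (0, 0, 0)),
]

for desc, (req, ack, rdy) in steps:
    print(f"ReadReq={req} Ack={ack} DataRdy={rdy}  {desc}")
```

Note that every transition is a response to the other side's previous transition, which is what lets the two sides run at different speeds without a shared clock.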


Key Characteristics of Two Bus Standards


Characteristic      PCI                         SCSI
Type                backplane                   I/O
Data bus width      32 or 64                    8 to 32
Addr/data muxed?    multiplexed                 multiplexed
# of masters        multiple                    multiple
Arbitration         centralized                 self-selection
Clocking            synchronous (33 - 66 MHz)   asynchronous
Peak bandwidth      133 - 512 MB/sec            5 MB/sec
Typical bandwidth   80 MB/sec                   1.5 MB/sec
Max. devices        32 per bus segment          7 to 31
Max. length         0.5 meters                  25 meters

Review: Major Components of a Computer

[Diagram: Processor (Control, Datapath); Memory; Devices (Input, Output)]

A Typical Memory Hierarchy

By taking advantage of the principle of locality:

Present the user with as much memory as is available in the


cheapest technology.

Provide access at the speed offered by the fastest technology.


[Diagram: memory hierarchy. On-chip components: Control and Datapath with RegFile, ITLB/DTLB, Instr Cache, Data Cache, and eDRAM; then Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk).
Speed (ns): .1's, 1's, 10's, 100's, 1,000's
Size (bytes): 100's, K's, 10K's, M's, T's
Cost: highest at the top of the hierarchy, lowest at the bottom]

Characteristics of the Memory Hierarchy


[Diagram: Processor <-> L1$ <-> L2$ <-> Main Memory <-> Secondary Memory, with increasing distance from the processor in access time. Transfer units: 4-8 bytes (word) between the processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 block between L2$ and Main Memory; 1,024+ bytes (disk sector = page) between Main Memory and Secondary Memory]

Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory

(Relative) size of the memory at each level


Memory Hierarchy Technologies

Random Access

Random is good: access time is the same for all locations

DRAM: Dynamic Random Access Memory


- High density (1 transistor cells), low power, cheap, slow
- Dynamic: need to be refreshed regularly (~ every 8 ms)

SRAM: Static Random Access Memory


- Low density (6 transistor cells), high power, expensive, fast
- Static: content will last forever (until power turned off)

Size ratio: DRAM/SRAM is 4 to 8

Cost and cycle time ratio: SRAM/DRAM is 8 to 16

Not-so-random Access Technology

Access time varies from location to location and from time to


time (e.g., Disk, CDROM)


Classical SRAM Organization (~Square)


[Diagram: a square RAM Cell Array; the row address drives a row decoder that asserts one word (row) select line, and each intersection of a word line with the bit (data) lines is a 6-T SRAM cell; a Column Selector & I/O Circuits block, driven by the column address, outputs the data word]

One memory row holds a block of data, so the column address selects the requested word from that block

Classical DRAM Organization (~Square Planes)


[Diagram: several square planes of RAM Cell Arrays; the row address drives a row decoder that asserts one word (row) select line, and each intersection is a 1-T DRAM cell; a Column Selector & I/O Circuits block in each plane, driven by the column address, outputs one data bit of the data word]

The column address selects the requested bit from the row in each plane

RAM Memory Definitions

Caches use SRAM for speed

Main Memory is DRAM for density

Addresses divided into 2 halves (row and column)


- RAS or Row Access Strobe triggering row decoder
- CAS or Column Access Strobe triggering column selector

Performance of Main Memory DRAMs

Latency: Time to access one word


- Access Time: time between request and when word arrives
- Cycle Time: time between requests
- Usually cycle time > access time

Bandwidth: How much data can be supplied per unit time


- width of the data channel * the rate at which it can be used
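The row/column address split behind RAS and CAS can be sketched as follows (assuming a square 2^n × 2^n array; the bit widths are illustrative, not from a specific part):

```python
# Sketch: dividing an address into two halves for a square DRAM array.
# The upper half is latched by RAS (row), the lower half by CAS (column).

def split_address(addr, col_bits):
    row = addr >> col_bits               # upper half -> row decoder (RAS)
    col = addr & ((1 << col_bits) - 1)   # lower half -> column selector (CAS)
    return row, col

# 20-bit address, 10 column bits: a 1024 x 1024 array
print(split_address(0b1100110011_0101010101, 10))   # → (819, 341)
```

Multiplexing the two halves over the same pins is what lets a large DRAM get by with half as many address pins, at the cost of the two-phase RAS/CAS cycle.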


Classical DRAM Operation


DRAM Organization:

N rows x N columns x M bits
Read or write M bits at a time
Each M-bit access requires a RAS / CAS cycle

[Diagram: an N-row x N-column DRAM array addressed by a Row Address and a Column Address, with an M-bit output]

[Timing diagram: the 1st and 2nd M-bit accesses each take a full cycle time - RAS latches the Row Address, then CAS latches the Col Address, for every access]

Ways to Improve DRAM Performance

Memory interleaving

Fast Page Mode DRAMs (FPM DRAMs)

Extended Data Out DRAMs (EDO DRAMs)
- www.chips.ibm.com/products/memory/88H2011/88H2011.pdf

Synchronous DRAMs (SDRAMs)
- www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
- www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm

Rambus DRAMs
- www.rambus.com/developer/quickfind_documents.html
- www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm

Double Data Rate DRAMs (DDR DRAMs)
- www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA

Increasing Bandwidth - Interleaving

Access pattern without interleaving:
[Diagram: CPU connected to a single Memory; each access takes a full Cycle Time (Access Time plus recovery) - Start Access for D1, D1 available, and only then Start Access for D2]

Access pattern with 4-way interleaving:
[Diagram: CPU connected to Memory Banks 0-3; Access Bank 0, Bank 1, Bank 2, and Bank 3 are started in successive cycles, and Bank 0 can be accessed again once its cycle completes]
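A rough model makes the benefit concrete (an illustrative timing model with made-up cycle counts, not a real memory):

```python
# Sketch: cycles to read n sequential words, with and without interleaving.
# Assumes a new bank can be started each cycle and each bank is busy for
# cycle_time cycles (illustrative model).

def time_no_interleave(n_words, cycle_time):
    return n_words * cycle_time          # accesses are fully serialized

def time_interleaved(n_words, cycle_time, banks):
    # first word waits a full cycle; after that, one word completes per
    # cycle_time/banks cycles as the banks overlap (at least 1 cycle each)
    return cycle_time + (n_words - 1) * max(1, cycle_time // banks)

print(time_no_interleave(8, 4))    # → 32 cycles
print(time_interleaved(8, 4, 4))   # → 11 cycles
```

With 4 banks and a 4-cycle bank cycle time, sequential words stream out one per cycle after the initial latency, which is the "number of banks ≥ cycles to wait" rule of thumb discussed on the next slide.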

Problems with Interleaving

How many banks?

Ideally, the number of banks should be at least the number of clock cycles we have to wait to access the next word in the same bank

Only works for sequential accesses (i.e., first word requested


in first bank, second word requested in second bank, etc.)

Increasing DRAM sizes => fewer chips => harder to


have banks

Growth in bits/chip for DRAM: 50%-60%/yr

Can only be used for very large memory systems (e.g., those encountered in supercomputer systems)


Fast Page Mode DRAM Operation


Fast Page Mode DRAM

N x M SRAM register to save a row

After a row is read into the SRAM register
- only CAS is needed to access other M-bit blocks on that row
- RAS remains asserted while CAS is toggled

[Diagram: an N-row x N-column DRAM array with an N x M SRAM row register feeding the M-bit output]
[Timing diagram: RAS latches the Row Address once; the 1st through 4th M-bit accesses then each need only a Col Address on CAS]

Why Care About the Memory Hierarchy?


Processor-DRAM Memory Gap

[Plot: performance (1 to 1000, log scale) vs. year (1980-2000). Processor ("Moore's Law"): 60%/year (2X/1.5yr). DRAM: 9%/year (2X/10yrs). The Processor-Memory Performance Gap grows 50%/year.]

Memory Hierarchy: Goals

Fact: Large memories are slow, fast memories are


small

How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)? By taking advantage of:

The Principle of Locality: Programs access a


relatively small portion of the address space at any
instant of time.
[Plot: probability of reference across the Address Space, from 0 to 2^n - 1]

Memory Hierarchy: Why Does it Work?

Temporal Locality (Locality in Time):


=> Keep most recently accessed data items closer to the
processor

Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels
[Diagram: data moves between the processor and Upper Level Memory (Blk X), and between Upper Level Memory and Lower Level Memory (Blk Y)]

Memory Hierarchy: Terminology

Hit: data appears in some block in the upper level (Block X)

Hit Rate: the fraction of memory accesses found in the upper level

Hit Time: Time to access the upper level which consists of


RAM access time + Time to determine hit/miss

[Diagram: Upper Level Memory (Blk X) and Lower Level Memory (Blk Y), with data moving to and from the processor through the upper level]

Miss: data needs to be retrieved from a block in the lower level (Block Y)

Miss Rate = 1 - (Hit Rate)

Miss Penalty: Time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty
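These terms combine into the usual average memory access time (AMAT) formula, hit time + miss rate × miss penalty; a tiny sketch with made-up numbers:

```python
# Sketch: average memory access time from hit time, miss rate, miss penalty.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# e.g., 1-cycle hit, 5% miss rate, 100-cycle miss penalty (illustrative)
print(amat(1, 0.05, 100))
```

Even a small miss rate is costly when Hit Time << Miss Penalty: here a 5% miss rate makes the average access six times slower than a hit.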


How is the Hierarchy Managed?

registers <-> memory
- by compiler (programmer?)

cache <-> main memory
- by the hardware

main memory <-> disks
- by the hardware and operating system (virtual memory)
- by the programmer (files)

Summary

DRAM is slow but cheap and dense
- Good choice for presenting the user with a BIG memory system

SRAM is fast but expensive and not very dense
- Good choice for providing the user FAST access time

Two different types of locality

Temporal Locality (Locality in Time): If an item is referenced, it will


tend to be referenced again soon.

Spatial Locality (Locality in Space): If an item is referenced, items


whose addresses are close by tend to be referenced soon.

By taking advantage of the principle of locality:

Present the user with as much memory as is available in the


cheapest technology.

Provide access at the speed offered by the fastest technology.

