
14:332:331

Computer Architecture and Assembly Language


Fall 2003

Week 12

Buses and I/O system


[Adapted from Dave Patterson's UCB CS152 slides and
Mary Jane Irwin's PSU CSE331 slides]
331 Lec19.1

Fall 2003

Heads Up

This week's material

Buses: Connecting I/O devices


- Reading assignment PH 8.4

Memory hierarchies
- Reading assignment PH 7.1 and B.5

Reminders

Next week's material

Basics of caches
- Reading assignment PH 7.2


Review: Major Components of a Computer


[Diagram: Processor (Control + Datapath); Memory (Cache, Main Memory, Secondary Memory (Disk)); Devices (Input, Output)]

Input and Output Devices

I/O devices are incredibly diverse with respect to

Behavior

Partner

Data rate

Device            Behavior          Partner   Data rate (KB/sec)
Keyboard          input             human     0.01
Mouse             input             human     0.02
Laser printer     output            human     200.00
Graphics display  output            human     60,000.00
Network/LAN       input or output   machine   500.00-6,000.00
Floppy disk       storage           machine   100.00
Magnetic disk     storage           machine   2,000.00-10,000.00

Magnetic Disk

Purpose

Long term, nonvolatile storage


Lowest level in the memory hierarchy
- slow, large, inexpensive

General structure

A rotating platter coated with a magnetic surface

Use a moveable read/write head to access the disk

Advantages of hard disks over floppy disks

Platters are more rigid (metal or glass) so they can be larger

Higher density because it can be controlled more precisely

Higher data rate because it spins faster

Can incorporate more than one platter


Organization of a Magnetic Disk


[Diagram: disk platters divided into tracks and sectors]

Typical numbers (depending on the disk size)

1 to 15 (2-surface) platters per disk, with 1" to 8" diameter

1,000 to 5,000 tracks per surface

63 to 256 sectors per track


- the smallest unit that can be read/written (typically 512 to 1,024 B)

Traditionally all tracks have the same number of sectors


- Newer disks with smart controllers can record more sectors on
the outer tracks (constant bit density)


Magnetic Disk Characteristics

Cylinder: all the tracks under the heads at a given point on all surfaces

Read/write data is a three-stage process:

Seek time: position the arm over the proper track (6 to 14 ms avg.)
- due to locality of disk references the actual average seek time may be only 25% to 33% of the advertised number

[Diagram: platter with tracks, sectors, a cylinder, and the read/write head]

Rotational latency: wait for the desired sector to rotate under the read/write head (on average, ½ of 1/RPM)

Transfer time: transfer a block of bits (sector) under the read-write head (2 to 20 MB/sec typical)

Controller time: the overhead the disk controller imposes in performing a disk I/O access (typically < 2 ms)
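The four components above add up to the total access time; here is a sketch using representative values from the ranges on this slide (illustrative numbers, not a specific disk):

```python
# Sketch: total disk access time = seek + rotational latency + transfer
# + controller overhead, with illustrative parameter values.

def disk_access_ms(seek_ms, rpm, sector_bytes, transfer_mb_s, controller_ms):
    rotational_ms = 0.5 * 60_000 / rpm                 # half a revolution, in ms
    transfer_ms = sector_bytes / (transfer_mb_s * 1e6) * 1e3
    return seek_ms + rotational_ms + transfer_ms + controller_ms

# 6 ms avg. seek, 10,000 RPM, 512 B sector, 10 MB/sec transfer, 2 ms controller
total = disk_access_ms(6.0, 10_000, 512, 10.0, 2.0)
print(f"{total:.2f} ms")   # → 11.05 ms (rotational latency alone is 3.00 ms)
```

Note that seek plus rotational latency dominates; the 512 B transfer itself contributes only about 0.05 ms.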


Magnetic Disk Examples


Characteristic                Sun X6713A   Toshiba MK2016
Disk diameter (inches)        3.5          2.5
Capacity                      73 GB        20 GB
MTTF (k hrs)                  1,200        300
# of platters - heads         —            2 - 4
# cylinders                   —            16,383
# B/sector - # sectors/track  —            512 - 63
Rotation speed (RPM)          10,000       4,200
Max. - Avg. seek time (ms)    ? - 6.6      24 - 13
Avg. rot. latency (ms)        —            7.14
Transfer rate                 35 MB/sec    16.6 MB/sec (PIO)
Power (watts)                 —            < 2.5
Volume (in3)                  —            4.01
Weight (oz)                   —            3.49

I/O System Interconnect Issues


[Diagram: Processor and Main Memory connected by a bus; other subsystems, such as a receiver and keyboard, attach to the same bus]

A bus is a shared communication link (a set of


wires used to connect multiple subsystems)


Performance

Expandability

Resilience in the face of failure (fault tolerance)



Performance Measures

Latency (execution time, response time) is the total time from the start to finish of one instruction or action

usually used to measure processor performance

Throughput: total amount of work done in a given amount of time

aka execution bandwidth

the number of operations performed per second

Bandwidth: amount of information communicated across an interconnect (e.g., a bus) per unit time

the bit width of the operation * rate of the operation

usually used to measure I/O performance
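The bit width × rate formula can be checked with a quick sketch (the bus parameters are hypothetical):

```python
# Sketch: bandwidth = bit width of the operation * rate of the operation.

def bandwidth_mb_s(bits_per_transfer, transfers_per_sec):
    return bits_per_transfer / 8 * transfers_per_sec / 1e6  # bytes/sec -> MB/sec

# e.g., a 32-bit bus doing one transfer per cycle at 33 MHz
print(bandwidth_mb_s(32, 33e6))   # → 132.0 MB/sec
```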


I/O System Expandability

Usually have more than one I/O device in the system

each I/O device is controlled by an I/O Controller


[Diagram: Processor (with Cache) and Main Memory on a Memory - I/O Bus; I/O Controllers for Disks, a Terminal, and a Network attach to the bus and raise interrupt signals to the processor]

Bus Characteristics
Control Lines
Data Lines

Control lines

Signal requests and acknowledgments

Indicate what type of information is on the data lines

Data lines

Data, complex commands, and addresses

Bus transaction consists of

Sending the address

Receiving (or sending) the data


Output (Read) Bus Transaction

Defined by what they do to memory

read = output: transfers data from memory (read) to I/O


device (write)

Step 1: Processor sends read request and read address to memory
Step 2: Memory accesses data
Step 3: Memory transfers data to disk

[Diagram at each step: Processor and Main Memory connected by Control and Data lines]

Input (Write) Bus Transaction

Defined by what they do to memory

write = input: transfers data from I/O device (read) to


memory (write)

Step 1: Processor sends write request and write address to memory
Step 2: Disk transfers data to memory

[Diagram at each step: Processor and Main Memory connected by Control and Data lines]

Advantages and Disadvantages of Buses

Advantages

Versatility:
- New devices can be added easily
- Peripherals can be moved between computer systems that
use the same bus standard

Low Cost:
- A single set of wires is shared in multiple ways

Disadvantages

It creates a communication bottleneck


- The bus bandwidth limits the maximum I/O throughput

The maximum bus speed is largely limited by


- The length of the bus
- The number of devices on the bus

It needs to support a range of devices with widely varying latencies and data transfer rates

Types of Buses
Processor-Memory Bus (proprietary)

Short and high speed

Matched to the memory system to maximize the memory-processor bandwidth

Optimized for cache block transfers

I/O Bus (industry standard, e.g., SCSI, USB, ISA, IDE)

Usually is lengthy and slower

Needs to accommodate a wide range of I/O devices

Connects to the processor-memory bus or backplane bus

Backplane Bus (industry standard, e.g., PCI)

The backplane is an interconnection structure within the


chassis

Used as an intermediary bus connecting I/O busses to the


processor-memory bus


A Two Bus System

[Diagram: Processor and Memory on a Processor-Memory Bus; three Bus Adaptors connect separate I/O Buses to it]

I/O buses tap into the processor-memory bus via Bus


Adaptors (that do speed matching between buses)

Processor-memory bus: mainly for processor-memory


traffic

I/O busses: provide expansion slots for I/O devices


A Three Bus System

[Diagram: Processor and Memory on the Processor-Memory Bus; a Bus Adaptor connects it to a Backplane Bus, and further Bus Adaptors connect I/O Buses to the backplane]

A small number of Backplane Buses tap into the Processor-Memory Bus

Processor-Memory Bus is used for processor memory traffic

I/O buses are connected to the Backplane Bus

Advantage: loading on the Processor-Memory Bus is greatly


reduced


I/O System Example (Apple Mac 7200)

Typical of a midrange to high-end desktop system in 1997
[Diagram: Processor (with Cache) on the Processor-Memory Bus with Main Memory and a PCI Interface/Memory Controller; on the PCI bus, I/O Controllers for Audio I/O and Serial ports, a SCSI-bus I/O Controller (CDRom, Disk, Tape), a Graphics Terminal I/O Controller, and a Network I/O Controller]

Example: Pentium System Organization

[Diagram: Processor connected to the Memory controller (Northbridge), which bridges the Processor-Memory Bus, the PCI Bus, and the I/O Busses]
http://developer.intel.com/design/chipsets/850/animate.htm?iid=PCG+devside&

A Bus Transaction

A bus transaction includes three parts:

Gaining access to the bus

- arbitration

Issuing the command (and address)

- request

Transferring the data

- action

[Diagram: Bus Master and Bus Slave. Control: the master initiates requests; Data: can go either way]

Gaining access to the bus

How is the bus reserved by a device that wishes to use it?

Chaos is avoided by a master-slave arrangement


- The bus master initiates and controls all bus requests

In the simplest system:

The processor is the only bus master

Major drawback - the processor must be involved in every


bus transaction

Single Master Bus Transaction

All bus requests are controlled by the processor

it initiates the bus cycle on behalf of the requesting device

Step 1: Disk wants to use the bus so it generates a bus request to the processor
Step 2: Processor responds and generates appropriate control signals
Step 3: Processor gives the slave (disk) permission to use the bus

[Diagram at each step: Processor and Memory connected by Control and Data lines]

Multiple Potential Bus Masters: Arbitration

Bus arbitration scheme:

A bus master wanting to use the bus asserts the bus request

A bus master cannot use the bus until its request is granted

A bus master must release the bus after its use

Bus arbitration schemes usually try to balance two


factors:

Bus priority - the highest priority device should be serviced first

Fairness - Even the lowest priority device should never


be completely locked out from using the bus

Bus arbitration schemes can be divided into four


broad classes

Daisy chain arbitration: all devices share 1 request line

Centralized, parallel arbitration: multiple request and grant lines

Distributed arbitration by self-selection: each device wanting the bus


places a code indicating its identity on the bus

Distributed arbitration by collision detection: Ethernet uses this


Centralized Parallel Arbitration

[Diagram: a Bus Arbiter with a separate Req and Grant line pair (Grant1 ... GrantN) for each of Device 1 through Device N, plus shared Control and Data lines]

Used in essentially all backplane and high-speed


I/O busses
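As a sketch of how a centralized, parallel arbiter picks one master from the per-device request lines, here is a fixed-priority version (the scheme and names are illustrative; real arbiters typically add a fairness mechanism such as round-robin):

```python
# Sketch of a fixed-priority centralized arbiter: one Req line and one
# Grant line per device; the lowest-numbered requester wins.

def arbitrate(requests):
    """requests: list of booleans, one per device (device 0 = highest priority).
    Returns a one-hot grant list; all False if nobody is requesting."""
    grants = [False] * len(requests)
    for i, req in enumerate(requests):
        if req:
            grants[i] = True   # grant exactly one bus master
            break
    return grants

print(arbitrate([False, True, True]))   # → [False, True, False]
```

A pure fixed-priority scheme violates the fairness goal above (device N can starve), which is exactly why practical arbiters rotate priorities.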


Synchronous and Asynchronous Buses
Synchronous Bus

Includes a clock in the control lines

A fixed protocol for communication that is relative to the clock

Advantage: involves very little logic and can run very fast

Disadvantages:
- Every device on the bus must run at the same clock rate
- To avoid clock skew, they cannot be long if they are fast

Asynchronous Bus

It is not clocked, so requires handshaking protocol (req, ack)


- Implemented with additional control lines

Advantages:
- Can accommodate a wide range of devices
- Can be lengthened without worrying about clock skew or
synchronization problems

Disadvantage: slow(er)


Asynchronous Handshaking Protocol

Output (read) data from memory to an I/O device.


[Timing diagram: ReadReq, Data (addr, then data), Ack, and DataRdy waveforms, annotated with the numbered handshake steps below]
1. I/O device signals a request by raising ReadReq and putting the addr on the data lines
2. Memory sees ReadReq, reads addr from the data lines, and raises Ack
3. I/O device sees Ack and releases the ReadReq and data lines
4. Memory sees ReadReq go low and drops Ack
5. When memory has the data ready, it places it on the data lines and raises DataRdy
6. I/O device sees DataRdy, reads the data from the data lines, and raises Ack
7. Memory sees Ack, releases the data lines, and drops DataRdy; the I/O device, seeing DataRdy go low, drops Ack
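The interlocking of these steps can be made visible by listing the (ReadReq, Ack, DataRdy) signal state after each one; this is a sketch with our own encoding, not part of any bus standard:

```python
# Sketch: the asynchronous read handshake as a sequence of signal states.
# Each entry is (description, (ReadReq, Ack, DataRdy) after that step).

steps = [
    ("device raises ReadReq, puts addr on data lines", (1, 0, 0)),
    ("memory reads addr, raises Ack",                  (1, 1, 0)),
    ("device sees Ack, drops ReadReq",                 (0, 1, 0)),
    ("memory sees ReadReq low, drops Ack",             (0, 0, 0)),
    ("memory puts data on lines, raises DataRdy",      (0, 0, 1)),
    ("device reads data, raises Ack",                  (0, 1, 1)),
    ("memory drops DataRdy; device then drops Ack",    (0, 0, 0)),
]

for desc, (req, ack, rdy) in steps:
    print(f"ReadReq={req} Ack={ack} DataRdy={rdy}  {desc}")
```

Note that every transition is a response to the other side's previous transition, which is what lets the two sides run at different speeds without a shared clock.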


Key Characteristics of Two Bus Standards


Characteristic      PCI                         SCSI
Type                backplane                   I/O
Data bus width      32 or 64                    8 to 32
Addr/data muxed?    multiplexed                 multiplexed
# of masters        multiple                    multiple
Arbitration         centralized                 self-selection
Clocking            synchronous (33 - 66 MHz)   asynchronous
Peak bandwidth      133 - 512 MB/sec            5 MB/sec
Typical bandwidth   80 MB/sec                   1.5 MB/sec
Max. devices        32 per bus segment          7 to 31
Max. length         0.5 meters                  25 meters

Review: Major Components of a Computer

[Diagram: Processor (Control, Datapath); Memory; Devices (Input, Output)]

A Typical Memory Hierarchy

By taking advantage of the principle of locality:

Present the user with as much memory as is available in the


cheapest technology.

Provide access at the speed offered by the fastest technology.


[Diagram: memory hierarchy. On-chip components: Control and Datapath with RegFile, ITLB/DTLB, Instr Cache, Data Cache, and eDRAM; then Second Level Cache (SRAM), Main Memory (DRAM), Secondary Memory (Disk).
Speed (ns): .1's, 1's, 10's, 100's, 1,000's
Size (bytes): 100's, K's, 10K's, M's, T's
Cost: highest at the top of the hierarchy, lowest at the bottom]

Characteristics of the Memory Hierarchy


[Diagram: Processor <-> L1$ <-> L2$ <-> Main Memory <-> Secondary Memory, with increasing distance from the processor in access time. Transfer units: 4-8 bytes (word) between the processor and L1$; 8-32 bytes (block) between L1$ and L2$; 1 block between L2$ and Main Memory; 1,024+ bytes (disk sector = page) between Main Memory and Secondary Memory]

Inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory

(Relative) size of the memory at each level


Memory Hierarchy Technologies

Random Access

Random is good: access time is the same for all locations

DRAM: Dynamic Random Access Memory


- High density (1 transistor cells), low power, cheap, slow
- Dynamic: need to be refreshed regularly (~ every 8 ms)

SRAM: Static Random Access Memory


- Low density (6 transistor cells), high power, expensive, fast
- Static: content will last forever (until power turned off)

Size ratio: DRAM/SRAM is 4 to 8

Cost and cycle time ratio: SRAM/DRAM is 8 to 16

Not-so-random Access Technology

Access time varies from location to location and from time to


time (e.g., Disk, CDROM)


Classical SRAM Organization (~Square)


[Diagram: a square RAM Cell Array; the row address drives a row decoder that asserts one word (row) select line, and each intersection of a word line with the bit (data) lines is a 6-T SRAM cell; a Column Selector & I/O Circuits block, driven by the column address, outputs the data word]

One memory row holds a block of data, so the column address selects the requested word from that block

Classical DRAM Organization (~Square Planes)


[Diagram: several square planes of RAM Cell Arrays; the row address drives a row decoder that asserts one word (row) select line, and each intersection is a 1-T DRAM cell; a Column Selector & I/O Circuits block in each plane, driven by the column address, outputs one data bit of the data word]

The column address selects the requested bit from the row in each plane

RAM Memory Definitions

Caches use SRAM for speed

Main Memory is DRAM for density

Addresses divided into 2 halves (row and column)


- RAS or Row Access Strobe triggering row decoder
- CAS or Column Access Strobe triggering column selector

Performance of Main Memory DRAMs

Latency: Time to access one word


- Access Time: time between request and when word arrives
- Cycle Time: time between requests
- Usually cycle time > access time

Bandwidth: How much data can be supplied per unit time


- width of the data channel * the rate at which it can be used
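The row/column address split behind RAS and CAS can be sketched as follows (assuming a square 2^n × 2^n array; the bit widths are illustrative, not from a specific part):

```python
# Sketch: dividing an address into two halves for a square DRAM array.
# The upper half is latched by RAS (row), the lower half by CAS (column).

def split_address(addr, col_bits):
    row = addr >> col_bits               # upper half -> row decoder (RAS)
    col = addr & ((1 << col_bits) - 1)   # lower half -> column selector (CAS)
    return row, col

# 20-bit address, 10 column bits: a 1024 x 1024 array
print(split_address(0b1100110011_0101010101, 10))   # → (819, 341)
```

Multiplexing the two halves over the same pins is what lets a large DRAM get by with half as many address pins, at the cost of the two-phase RAS/CAS cycle.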


Classical DRAM Operation


DRAM Organization:

N rows x N columns x M bits
Read or write M bits at a time
Each M-bit access requires a RAS / CAS cycle

[Diagram: an N-row x N-column DRAM array addressed by a Row Address and a Column Address, with an M-bit output]

[Timing diagram: the 1st and 2nd M-bit accesses each take a full cycle time - RAS latches the Row Address, then CAS latches the Col Address, for every access]

Ways to Improve DRAM Performance

Memory interleaving

Fast Page Mode DRAMs (FPM DRAMs)

Extended Data Out DRAMs (EDO DRAMs)
- www.chips.ibm.com/products/memory/88H2011/88H2011.pdf

Synchronous DRAMs (SDRAMs)
- www.usa.samsungsemi.com/products/newsummary/asyncdram/K4F661612D.htm
- www.usa.samsungsemi.com/products/newsummary/sdramcomp/K4S641632D.htm

Rambus DRAMs
- www.rambus.com/developer/quickfind_documents.html
- www.usa.samsungsemi.com/products/newsummary/rambuscomp/K4R271669B.htm

Double Data Rate DRAMs (DDR DRAMs)
- www.usa.samsungsemi.com/products/newsummary/ddrsyncdram/K4D62323HA

Increasing Bandwidth - Interleaving

Access pattern without interleaving:
[Diagram: CPU connected to a single Memory; each access takes a full Cycle Time (Access Time plus recovery) - Start Access for D1, D1 available, and only then Start Access for D2]

Access pattern with 4-way interleaving:
[Diagram: CPU connected to Memory Banks 0-3; Access Bank 0, Bank 1, Bank 2, and Bank 3 are started in successive cycles, and Bank 0 can be accessed again once its cycle completes]
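A rough model makes the benefit concrete (an illustrative timing model with made-up cycle counts, not a real memory):

```python
# Sketch: cycles to read n sequential words, with and without interleaving.
# Assumes a new bank can be started each cycle and each bank is busy for
# cycle_time cycles (illustrative model).

def time_no_interleave(n_words, cycle_time):
    return n_words * cycle_time          # accesses are fully serialized

def time_interleaved(n_words, cycle_time, banks):
    # first word waits a full cycle; after that, one word completes per
    # cycle_time/banks cycles as the banks overlap (at least 1 cycle each)
    return cycle_time + (n_words - 1) * max(1, cycle_time // banks)

print(time_no_interleave(8, 4))    # → 32 cycles
print(time_interleaved(8, 4, 4))   # → 11 cycles
```

With 4 banks and a 4-cycle bank cycle time, sequential words stream out one per cycle after the initial latency, which is the "number of banks ≥ cycles to wait" rule of thumb discussed on the next slide.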

Problems with Interleaving

How many banks?

Ideally, the number of banks should be at least the number of clock cycles we have to wait to access the next word in the same bank

Only works for sequential accesses (i.e., first word requested


in first bank, second word requested in second bank, etc.)

Increasing DRAM sizes => fewer chips => harder to


have banks

Growth in bits/chip for DRAM: 50%-60%/yr

Can only be used for very large memory systems (e.g., those encountered in supercomputer systems)


Fast Page Mode DRAM Operation


Fast Page Mode DRAM

N x M SRAM register to save a row

After a row is read into the SRAM register
- only CAS is needed to access other M-bit blocks on that row
- RAS remains asserted while CAS is toggled

[Diagram: an N-row x N-column DRAM array with an N x M SRAM row register feeding the M-bit output]
[Timing diagram: RAS latches the Row Address once; the 1st through 4th M-bit accesses then each need only a Col Address on CAS]

Why Care About the Memory Hierarchy?


Processor-DRAM Memory Gap

[Plot: performance (1 to 1000, log scale) vs. year (1980-2000). Processor ("Moore's Law"): 60%/year (2X/1.5yr). DRAM: 9%/year (2X/10yrs). The Processor-Memory Performance Gap grows 50%/year.]

Memory Hierarchy: Goals

Fact: Large memories are slow, fast memories are


small

How do we create a memory that gives the illusion of being large, cheap, and fast (most of the time)? By taking advantage of:

The Principle of Locality: Programs access a


relatively small portion of the address space at any
instant of time.
[Plot: probability of reference across the Address Space, from 0 to 2^n - 1]

Memory Hierarchy: Why Does it Work?

Temporal Locality (Locality in Time):


=> Keep most recently accessed data items closer to the
processor

Spatial Locality (Locality in Space):
=> Move blocks consisting of contiguous words to the upper levels
[Diagram: data moves between the processor and Upper Level Memory (Blk X), and between Upper Level Memory and Lower Level Memory (Blk Y)]

Memory Hierarchy: Terminology

Hit: data appears in some block in the upper level (Block X)

Hit Rate: the fraction of memory accesses found in the upper level

Hit Time: Time to access the upper level which consists of


RAM access time + Time to determine hit/miss

[Diagram: Upper Level Memory (Blk X) and Lower Level Memory (Blk Y), with data moving to and from the processor through the upper level]

Miss: data needs to be retrieved from a block in the lower level (Block Y)

Miss Rate = 1 - (Hit Rate)

Miss Penalty: Time to replace a block in the upper level + time to deliver the block to the processor

Hit Time << Miss Penalty
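These terms combine into the usual average memory access time (AMAT) formula, hit time + miss rate × miss penalty; a tiny sketch with made-up numbers:

```python
# Sketch: average memory access time from hit time, miss rate, miss penalty.

def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# e.g., 1-cycle hit, 5% miss rate, 100-cycle miss penalty (illustrative)
print(amat(1, 0.05, 100))
```

Even a small miss rate is costly when Hit Time << Miss Penalty: here a 5% miss rate makes the average access six times slower than a hit.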


How is the Hierarchy Managed?

registers <-> memory
- by compiler (programmer?)

cache <-> main memory
- by the hardware

main memory <-> disks
- by the hardware and operating system (virtual memory)
- by the programmer (files)

Summary

DRAM is slow but cheap and dense
- Good choice for presenting the user with a BIG memory system

SRAM is fast but expensive and not very dense
- Good choice for providing the user FAST access time

Two different types of locality

Temporal Locality (Locality in Time): If an item is referenced, it will


tend to be referenced again soon.

Spatial Locality (Locality in Space): If an item is referenced, items


whose addresses are close by tend to be referenced soon.

By taking advantage of the principle of locality:

Present the user with as much memory as is available in the


cheapest technology.

Provide access at the speed offered by the fastest technology.

