
3/2/15

Single-Cycle CPU Datapath Control

Review: Processor Design 5 steps


Step 1: Analyze instruction set to determine datapath requirements
  - Meaning of each instruction is given by register transfers
  - Datapath must include storage element for ISA registers
  - Datapath must support each register transfer
Step 2: Select set of datapath components & establish clock methodology
Step 3: Assemble datapath components that meet the requirements
Step 4: Analyze implementation of each instruction to determine setting of control points that realizes the register transfer
Step 5: Assemble the control logic

Register-Register Timing: One Complete Cycle (Add/Sub)

[Timing diagram over one clock cycle (Clk). Each signal goes from its old value to its new value in turn:
  PC
  Rs, Rt, Rd, Op, Func: after the instruction memory access time
  ALUctr, RegWr: after the delay through the control logic
  busA, busB: after the register file access time
  busW: after the ALU delay
  The register write occurs at the end of the cycle.]

[Datapath fragment: RegFile (ports Rw, Ra, Rb, 5 bits each, fed by Rd, Rs, Rt; RegWr; clk) drives busA and busB (32 bits) into the ALU (ALUctr); the ALU result returns to the RegFile on busW (32 bits).]


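Putting it All Together: A Single Cycle Datapath

[Figure: single-cycle datapath. Instruction memory (Adr from the PC) outputs Instruction<31:0>, split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Next-PC logic: an adder for PC + 4, a second adder using PC Ext of imm16 with 00 appended, and a mux selected by nPC_sel. RegFile (Rw, Ra, Rb, 5 bits each; busA, busB, busW, 32 bits; RegWr; clk), with the RegDst mux choosing Rd or Rt as the write register. Extender (ExtOp) widens imm16 from 16 to 32 bits; the ALUSrc mux chooses busB or the extended immediate. ALU (ALUctr), with an Equal comparison output. Data Memory (WrEn = MemWr, Adr, Data In, clk); the MemtoReg mux chooses the ALU result or the memory data for busW.]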

Datapath Control Signals

ExtOp: zero, sign
ALUsrc: 0 regB; 1 immed
ALUctr: ADD, SUB, OR
MemWr: 1 write memory
MemtoReg: 0 ALU; 1 Mem
RegDst: 0 rt; 1 rd
RegWr: 1 write register

[Figure: the single-cycle datapath annotated with the control signals nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, and the Equal output.]

Control Signals

[Figure: Instruction Memory (Adr) outputs Instruction<31:0>, split into Op <26:31>, Fun <0:5>, Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. Op and Fun feed the Control block, which drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the DATA PATH.]

P&H Figure 4.17

Summary of the Control Signals (1/2)

inst  Register Transfer
add   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
      ALUsrc=RegB, ALUctr=ADD, RegDst=rd, RegWr, nPC_sel=+4
sub   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
      ALUsrc=RegB, ALUctr=SUB, RegDst=rd, RegWr, nPC_sel=+4
ori   R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
      ALUsrc=Im, Extop=Z, ALUctr=OR, RegDst=rt, RegWr, nPC_sel=+4
lw    R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
      ALUsrc=Im, Extop=sn, ALUctr=ADD, MemtoReg, RegDst=rt, RegWr, nPC_sel=+4
sw    MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
      ALUsrc=Im, Extop=sn, ALUctr=ADD, MemWr, nPC_sel=+4
beq   if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || 00)
      else PC ← PC + 4
      nPC_sel=br, ALUctr=SUB
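
The register transfers above can be tried out in software. The following is only a sketch in Python, not part of the original slides: the names step, R, MEM, and MASK32 are made up for illustration, memory is a dict keyed by byte address, and it assumes standard MIPS behavior (ori is a bitwise OR, sw stores R[rt], and a taken branch is relative to PC + 4).

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to a (possibly negative) Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Zero-extend a 16-bit immediate."""
    return imm16 & 0xFFFF


def step(inst, R, MEM, pc, rs=0, rt=0, rd=0, imm16=0):
    """Apply one instruction's register transfer and return the next PC."""
    if inst == "add":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif inst == "sub":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif inst == "ori":
        R[rt] = R[rs] | zero_ext(imm16)
    elif inst == "lw":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK32, 0)
    elif inst == "sw":
        MEM[(R[rs] + sign_ext(imm16)) & MASK32] = R[rt]
    elif inst == "beq":
        if R[rs] == R[rt]:
            # "|| 00" appends two zero bits, i.e. a shift left by 2
            return (pc + 4 + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError("unsupported instruction: " + inst)
    return (pc + 4) & MASK32
```

For example, with R[1] = 5 and R[2] = 7, step("add", R, MEM, 0, rs=1, rt=2, rd=3) leaves 12 in R[3] and returns a next PC of 4.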

Summary of the Control Signals

              add       sub       ori       lw        sw        beq       jump
func          10 0000   10 0010   We Don't Care :-)
op            00 0000   00 0000   00 1101   10 0011   10 1011   00 0100   00 0010
RegDst        1         1         0         0         x         x         x
ALUSrc        0         0         1         1         1         0         x
MemtoReg      0         0         0         1         x         x         x
RegWrite      1         1         1         1         0         0         0
MemWrite      0         0         0         0         1         0         0
nPCsel        0         0         0         0         0         1         ?
Jump          0         0         0         0         0         0         1
ExtOp         x         x         0         1         1         x         x
ALUctr<2:0>   Add       Subtract  Or        Add       Add       Subtract  x

Bit positions: 31, 26, 21, 16, 11, 6, 0

R-type   op | rs | rt | rd | shamt | funct    add, sub
I-type   op | rs | rt | immediate             ori, lw, sw, beq
J-type   op | target address                  jump

Boolean Expressions for Controller

RegDst    = add + sub
ALUSrc    = ori + lw + sw
MemtoReg  = lw
RegWrite  = add + sub + ori + lw
MemWrite  = sw
nPCsel    = beq
Jump      = jump
ExtOp     = lw + sw
ALUctr[0] = sub + beq
ALUctr[1] = ori
(assume ALUctr is 00 ADD, 01 SUB, 10 OR)

Where:
rtype = ~op5 ~op4 ~op3 ~op2 ~op1 ~op0
ori   = ~op5 ~op4  op3  op2 ~op1  op0
lw    =  op5 ~op4 ~op3 ~op2  op1  op0
sw    =  op5 ~op4  op3 ~op2  op1  op0
beq   = ~op5 ~op4 ~op3  op2 ~op1 ~op0
jump  = ~op5 ~op4 ~op3 ~op2  op1 ~op0
add   = rtype  func5 ~func4 ~func3 ~func2 ~func1 ~func0
sub   = rtype  func5 ~func4 ~func3 ~func2  func1 ~func0

How do we implement this in gates?
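
Before building gates, the equations can be checked in software. The sketch below is an illustration only and not from the slides: the helper names minterm and control are hypothetical. It evaluates the product terms (the AND stage) and then the sums (the OR stage) for a 6-bit opcode and func field. Signals the slides mark as don't-care simply come out as 0 here.

```python
def minterm(value, pattern):
    """AND of six literals. pattern[0] describes bit 5, pattern[5] bit 0:
    '1' uses the bit itself, '0' uses its complement."""
    out = 1
    for i, ch in enumerate(pattern):
        b = (value >> (5 - i)) & 1
        out &= b if ch == "1" else 1 - b
    return out


def control(op, func=0):
    """Evaluate the controller equations for one opcode/func pair."""
    # AND stage: one signal per instruction
    rtype = minterm(op, "000000")
    ori = minterm(op, "001101")
    lw = minterm(op, "100011")
    sw = minterm(op, "101011")
    beq = minterm(op, "000100")
    jump = minterm(op, "000010")
    add = rtype & minterm(func, "100000")
    sub = rtype & minterm(func, "100010")
    # OR stage: one sum per control signal
    return {
        "RegDst": add | sub,
        "ALUSrc": ori | lw | sw,
        "MemtoReg": lw,
        "RegWrite": add | sub | ori | lw,
        "MemWrite": sw,
        "nPCsel": beq,
        "Jump": jump,
        "ExtOp": lw | sw,
        "ALUctr0": sub | beq,
        "ALUctr1": ori,
    }
```

For instance, control(0b100011) (lw) sets ALUSrc, MemtoReg, RegWrite, and ExtOp, and leaves both ALUctr bits at 0 (ADD).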

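Controller Implementation

[Figure: opcode and func feed an "AND logic" block that produces one signal per instruction (add, sub, ori, lw, sw, beq, jump); these feed an "OR logic" block that produces RegDst, ALUSrc, MemtoReg, RegWrite, MemWrite, nPCsel, Jump, ExtOp, ALUctr[0], ALUctr[1].]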

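Where Do Control Signals Come From?

[Figure: Instruction Memory (Adr) outputs Instruction<31:0>; the Op <26:31> and Fun <0:5> fields feed the Control block, which drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the DATA PATH. Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15> go to the datapath directly.]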

Boolean Exprs for Controller

Op 0-5 are really Instruction bits 26-31
Func 0-5 are really Instruction bits 0-5

rtype = ~op5 ~op4 ~op3 ~op2 ~op1 ~op0
ori   = ~op5 ~op4  op3  op2 ~op1  op0
lw    =  op5 ~op4 ~op3 ~op2  op1  op0
sw    =  op5 ~op4  op3 ~op2  op1  op0
beq   = ~op5 ~op4 ~op3  op2 ~op1 ~op0
jump  = ~op5 ~op4 ~op3 ~op2  op1 ~op0
add   = rtype  func5 ~func4 ~func3 ~func2 ~func1 ~func0
sub   = rtype  func5 ~func4 ~func3 ~func2  func1 ~func0

How do we implement this in gates?

Fall 2011 -- Lecture #30


Review: Single-cycle Processor

Five steps to design a processor:
1. Analyze instruction set → datapath requirements
2. Select set of datapath components & establish clock methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic
   - Formulate Logic Equations
   - Design Circuits

[Figure: Processor (Control, Datapath), Memory, Input, Output.]

Single Cycle Performance

Assume time for actions is
  100ps for register read or write; 200ps for other events
Clock rate is?

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200ps         100ps           200ps    200ps           100ps            800ps
sw         200ps         100ps           200ps    200ps                            700ps
R-format   200ps         100ps           200ps                    100ps            600ps
beq        200ps         100ps           200ps                                     500ps

What can we do to improve clock rate?
Will this improve performance as well?
  Want increased clock rate to mean faster programs
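
As a quick check of the table, here is a small Python sketch (the dictionary names STAGE_PS and USES are invented for this example). It adds up the per-instruction times and takes the slowest one as the clock period, since in a single-cycle design every instruction must fit in one cycle.

```python
# Time per action in picoseconds, as assumed on the slide
STAGE_PS = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}

# Which actions each instruction class needs
USES = {
    "lw": ["fetch", "reg_read", "alu", "mem", "reg_write"],
    "sw": ["fetch", "reg_read", "alu", "mem"],
    "R-format": ["fetch", "reg_read", "alu", "reg_write"],
    "beq": ["fetch", "reg_read", "alu"],
}

# Total time per instruction: the sum of the actions it uses
totals = {inst: sum(STAGE_PS[a] for a in actions) for inst, actions in USES.items()}

# The single-cycle clock period is set by the slowest instruction (lw)
period_ps = max(totals.values())

# 1 / (period in ps) expressed in GHz: 1000 / 800 = 1.25
clock_ghz = 1000 / period_ps
```

Under these assumptions the period is 800ps, so the clock rate is 1.25 GHz, even though beq would need only 500ps on its own.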

Gotta Do Laundry

Ann, Brian, Cathy, Dave (A, B, C, D) each have one load of clothes to wash, dry, fold, and put away
  Washer takes 30 minutes
  Dryer takes 30 minutes
  "Folder" takes 30 minutes
  "Stasher" takes 30 minutes to put clothes into drawers

Sequential Laundry

[Figure: timeline from 6 PM to 2 AM in sixteen 30-minute slots; tasks A, B, C, D listed in task order, each load running all four stages before the next load starts.]

Sequential laundry takes 8 hours for 4 loads

Pipelined Laundry

[Figure: timeline starting at 6 PM with seven 30-minute slots; tasks A, B, C, D listed in task order, each load starting one slot after the previous one so the stages overlap.]

Pipelined laundry takes 3.5 hours for 4 loads!
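
The two laundry totals follow from simple counting. A minimal Python sketch, assuming every stage takes the same time (the function names are made up for this example):

```python
def sequential_minutes(loads, stages, stage_min):
    """Each load runs all of its stages before the next load starts."""
    return loads * stages * stage_min


def pipelined_minutes(loads, stages, stage_min):
    """The first load takes `stages` slots; each later load finishes one slot after."""
    return (stages + loads - 1) * stage_min


seq = sequential_minutes(4, 4, 30)   # 480 minutes = 8 hours
pipe = pipelined_minutes(4, 4, 30)   # 210 minutes = 3.5 hours
speedup = seq / pipe                 # about 2.3
```

With only 4 loads the speedup is about 2.3X, not 4X, because three of the seven slots are spent filling and draining the pipeline.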


Pipelining Lessons (1/2)

[Figure: pipelined laundry timeline, 6 PM to 9:30 PM, seven 30-minute slots, tasks A to D in task order.]

Pipelining doesn't help latency of single task, it helps throughput of entire workload
Multiple tasks operating simultaneously using different resources
Potential speedup = Number pipe stages
Time to fill pipeline and time to drain it reduces speedup: 2.3X v. 4X in this example

Pipelining Lessons (2/2)

[Figure: pipelined laundry timeline, 6 PM to 9:30 PM, seven 30-minute slots, tasks A to D in task order.]

Suppose new Washer takes 20 minutes, new Stasher takes 20 minutes. How much faster is pipeline?
Pipeline rate limited by slowest pipeline stage
Unbalanced lengths of pipe stages reduces speedup
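
One way to explore the question is a small simulation. This Python sketch is an illustration under stated assumptions, not the slides' own method: finish_times is a made-up name, and loads are allowed to wait between stages (unlimited baskets), which a clocked CPU pipeline cannot do. A load enters a stage once it has left the previous stage and the stage is free.

```python
def finish_times(stage_min, loads):
    """Return the completion time (in minutes) of each load.

    stage_min: duration of each stage, in pipeline order.
    """
    free_at = [0] * len(stage_min)  # when each stage finishes its current load
    done = []
    for _ in range(loads):
        t = 0  # all loads are ready at time 0
        for s, duration in enumerate(stage_min):
            # start when both the load and the stage are available
            t = max(t, free_at[s]) + duration
            free_at[s] = t
        done.append(t)
    return done


balanced = finish_times([30, 30, 30, 30], 4)     # [120, 150, 180, 210]
unbalanced = finish_times([20, 30, 30, 20], 4)   # [100, 130, 160, 190]
```

In both cases a load still finishes only every 30 minutes: the faster washer and stasher shorten each load's latency a little, but the rate is set by the 30-minute dryer and folder. In a clocked pipeline, where every stage advances on the same clock, the cycle time would stay at 30 minutes and nothing would change at all.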


Steps in Executing MIPS

1) IFtch: Instruction Fetch, Increment PC
2) Dcd: Instruction Decode, Read Registers
3) Exec:
   Mem-ref: Calculate Address
   Arith-log: Perform Operation
4) Mem:
   Load: Read Data from Memory
   Store: Write Data to Memory
5) WB: Write Data Back to Register

Single Cycle Datapath

[Figure: PC with +4 adder, instruction memory, registers (rs, rt, rd), imm, ALU, and Data memory, divided into five stages: 1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back.]

Pipeline registers

[Figure: five-stage datapath (1. Instruction Fetch, 2. Decode/Register Read, 3. Execute, 4. Memory, 5. Write Back) with a pipeline register between each pair of adjacent stages.]

Need registers between stages
  To hold information produced in previous cycle
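
The role of the registers between stages can be sketched in a few lines of Python. This is a simplified illustration, not the slides' design: the name simulate is made up, each of the four latches (IF/ID, ID/EX, EX/MEM, MEM/WB) holds a whole instruction rather than only the fields later stages need, and hazards are ignored.

```python
def simulate(program):
    """Return one row per cycle: the instruction in IF, ID, EX, MEM, WB."""
    latches = [None] * 4  # IF/ID, ID/EX, EX/MEM, MEM/WB
    pc = 0
    trace = []
    while pc < len(program) or any(x is not None for x in latches):
        # IF works on the newly fetched instruction; every other stage
        # works on what the latch in front of it captured last cycle
        fetched = program[pc] if pc < len(program) else None
        trace.append([fetched] + latches)
        # clock edge: every latch takes the value from the stage before it
        latches = [fetched] + latches[:3]
        if fetched is not None:
            pc += 1
    return trace


trace = simulate(["lw", "add", "sw"])
# trace[2] is ["sw", "add", "lw", None, None]: three instructions in flight
```

Three instructions take 3 + 5 - 1 = 7 cycles here, and each one spends exactly five cycles in the pipeline. Because the latch carries the instruction along with it, whatever a later stage needs, such as the destination register number at write back, arrives together with that instruction's data.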

More Detailed Pipeline


IF for Load, Store,

ID for Load, Store,


EX for Load

MEM for Load


WB for Load: Oops!

Wrong register number

Corrected Datapath for Load


So, in conclusion
You now know how to implement the control logic for the single-cycle CPU.
  (actually, you already knew it!)
Pipelining improves performance by increasing instruction throughput: exploits ILP
  Executes multiple instructions in parallel
  Each instruction has the same latency
Next: hazards in pipelining:
  Structure, data, control