
Single Cycle Processor Design

ICS 233
Computer Architecture and Assembly Language
Dr. Aiman El-Maleh
College of Computer Sciences and Engineering King Fahd University of Petroleum and Minerals

Outline
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
The Main Controller and ALU Controller
Drawback of the single-cycle processor design

Single Cycle Processor Design

ICS 233 KFUPM

Muhamed Mudawar slide 2

The Performance Perspective


Recall, performance is determined by:
  Instruction count (I-Count)
  Clock cycles per instruction (CPI)
  Clock cycle time (Cycle)

Execution time = I-Count × CPI × Cycle

Processor design will affect:
  Clock cycles per instruction
  Clock cycle time

Single cycle datapath and control design:
  Advantage: one clock cycle per instruction
  Disadvantage: long cycle time

Designing a Processor: Step-by-Step


1. Analyze instruction set => datapath requirements
   The meaning of each instruction is given by the register transfers
   Datapath must include storage elements for ISA registers
   Datapath must support each register transfer
2. Select datapath components and clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction
   Determine the setting of control signals for register transfer
5. Assemble the control logic

Review of MIPS Instruction Formats


All instructions are 32 bits wide
Three instruction formats: R-type, I-type, and J-type

  R-type:  Op6  Rs5  Rt5  Rd5  sa5  funct6
  I-type:  Op6  Rs5  Rt5  immediate16
  J-type:  Op6  immediate26

Op6: 6-bit opcode of the instruction
Rs5, Rt5, Rd5: 5-bit source and destination register numbers
sa5: 5-bit shift amount used by shift instructions
funct6: 6-bit function field for R-type instructions
immediate16: 16-bit immediate value or address offset
immediate26: 26-bit target address of the jump instruction
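The field layout above can be sketched as a small decoder. This is only an illustration: the function name decode and the dictionary keys are assumptions made for this example, and the bit positions follow from the field order and widths listed above (Op6 in the top 6 bits, funct6 in the bottom 6).

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into all possible fields.

    Every field is extracted regardless of format; the opcode decides
    which ones are meaningful (R-type, I-type, or J-type).
    """
    return {
        "op":    (instr >> 26) & 0x3F,      # bits 31..26
        "rs":    (instr >> 21) & 0x1F,      # bits 25..21
        "rt":    (instr >> 16) & 0x1F,      # bits 20..16
        "rd":    (instr >> 11) & 0x1F,      # bits 15..11 (R-type)
        "sa":    (instr >> 6) & 0x1F,       # bits 10..6  (R-type shifts)
        "funct": instr & 0x3F,              # bits 5..0   (R-type)
        "imm16": instr & 0xFFFF,            # bits 15..0  (I-type)
        "imm26": instr & 0x3FFFFFF,         # bits 25..0  (J-type)
    }


# add $3, $1, $2 : op = 0, rs = 1, rt = 2, rd = 3, sa = 0, funct = 0x20
add_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
add_fields = decode(add_word)

# lw $2, 4($1) : op = 0x23, rs = 1, rt = 2, imm16 = 4
lw_fields = decode(0x8C220004)
```

Extracting every field unconditionally mirrors the hardware, where the wires for Rs, Rt, Rd, and the immediates are always driven and the control signals choose which ones are used.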

MIPS Subset of Instructions


Only a subset of the MIPS instructions is considered
  ALU instructions (R-type): add, sub, and, or, xor, slt
  Immediate instructions (I-type): addi, slti, andi, ori, xori
  Load and Store (I-type): lw, sw
  Branch (I-type): beq, bne
  Jump (J-type): j

This subset does not include all the integer instructions
But it is sufficient to illustrate the design of datapath and control
Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers

Details of the MIPS Subset


Instruction           Meaning            Format
add   rd, rs, rt      addition           op6 = 0   rs5  rt5  rd5  0  0x20
sub   rd, rs, rt      subtraction        op6 = 0   rs5  rt5  rd5  0  0x22
and   rd, rs, rt      bitwise and        op6 = 0   rs5  rt5  rd5  0  0x24
or    rd, rs, rt      bitwise or         op6 = 0   rs5  rt5  rd5  0  0x25
xor   rd, rs, rt      exclusive or       op6 = 0   rs5  rt5  rd5  0  0x26
slt   rd, rs, rt      set on less than   op6 = 0   rs5  rt5  rd5  0  0x2a
addi  rt, rs, im16    add immediate      0x08      rs5  rt5  im16
slti  rt, rs, im16    slt immediate      0x0a      rs5  rt5  im16
andi  rt, rs, im16    and immediate      0x0c      rs5  rt5  im16
ori   rt, rs, im16    or immediate       0x0d      rs5  rt5  im16
xori  rt, rs, im16    xor immediate      0x0e      rs5  rt5  im16
lw    rt, im16(rs)    load word          0x23      rs5  rt5  im16
sw    rt, im16(rs)    store word         0x2b      rs5  rt5  im16
beq   rs, rt, im16    branch if equal    0x04      rs5  rt5  im16
bne   rs, rt, im16    branch not equal   0x05      rs5  rt5  im16
j     im26            jump               0x02      im26

Register Transfer Level (RTL)


RTL is a description of data flow between registers
RTL gives a meaning to the instructions
All instructions are fetched from memory at address PC

Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt);                 PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) – Reg(Rt);                 PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16);          PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];     PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);     PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt))
                  PC ← PC + 4 + 4 × sign_extend(Im16)
              else PC ← PC + 4

Instructions are Executed in Steps


R-type
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
  Execute operation:   ALU_result ← func(data1, data2)
  Write ALU result:    Reg(Rd) ← ALU_result
  Next PC address:     PC ← PC + 4

I-type
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Extend(imm16)
  Execute operation:   ALU_result ← op(data1, data2)
  Write ALU result:    Reg(Rt) ← ALU_result
  Next PC address:     PC ← PC + 4

BEQ
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
  Equality:            zero ← subtract(data1, data2)
  Branch:              if (zero) PC ← PC + 4 + 4 × sign_ext(imm16)
                       else PC ← PC + 4

Instruction Execution contd


LW
  Fetch instruction:    Instruction ← MEM[PC]
  Fetch base register:  base ← Reg(Rs)
  Calculate address:    address ← base + sign_extend(imm16)
  Read memory:          data ← MEM[address]
  Write register Rt:    Reg(Rt) ← data
  Next PC address:      PC ← PC + 4

SW
  Fetch instruction:    Instruction ← MEM[PC]
  Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
  Calculate address:    address ← base + sign_extend(imm16)
  Write memory:         MEM[address] ← data
  Next PC address:      PC ← PC + 4

Jump
  Fetch instruction:    Instruction ← MEM[PC]
  Target PC address:    target ← PC[31:28] , Imm26 , '00'    (the comma denotes concatenation)
  Jump:                 PC ← target

Requirements of the Instruction Set


Memory
  Instruction memory where instructions are stored
  Data memory where data is stored

Registers
  32 × 32-bit general purpose registers, R0 is always zero
  Read source register Rs
  Read source register Rt
  Write destination register Rt or Rd

Program counter PC register and Adder to increment PC
Sign and Zero extender for immediate constant
ALU for executing instructions


Components of the Datapath


Combinational Elements
  ALU, Adder
  Immediate extender
  Multiplexers

Storage Elements
  Instruction memory
  Data memory
  PC register
  Register file

Clocking methodology
  Timing of reads and writes

[Figure: block symbols for the datapath components. Extender (16-bit input, 32-bit output, ExtOp), mux (select), ALU (32-bit inputs, ALU result, zero, overflow, ALU control), PC, Instruction Memory (Address, Instruction), Data Memory (Address, Data_in, Data_out, MemRead, MemWrite), and Registers (RA, RB, RW, BusA, BusB, BusW, Clock, RegWrite).]

Register Element
Register
  Similar to the D-type Flip-Flop
  n-bit input (Data_In) and output (Data_Out)

Write Enable:
  Enable / disable writing of register
  Negated (0): Data_Out will not change
  Asserted (1): Data_Out will become Data_In after clock edge

Edge-triggered clocking
  Register output is modified at clock edge

[Figure: register symbol with Data_In (n bits), Write Enable, Clock, and Data_Out (n bits).]

MIPS Register File

Register File consists of 32 × 32-bit registers
  BusA and BusB: 32-bit output busses for reading 2 registers
  BusW: 32-bit input bus for writing a register when RegWrite is 1
  Two registers read and one written in a cycle

Registers are selected by:
  RA selects register to be read on BusA
  RB selects register to be read on BusB
  RW selects the register to be written

Clock input
  The clock input is used ONLY during write operation
  During read, register file behaves as a combinational logic block
    RA or RB valid => BusA or BusB valid after access time

[Figure: register file symbol with 5-bit inputs RA, RB, RW; 32-bit outputs BusA, BusB; 32-bit input BusW; Clock and RegWrite.]
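A minimal behavioral sketch of this register file, assuming Python integers stand in for 32-bit buses. The class name RegisterFile and the method names read and clock_edge are made up for this illustration; the hard-wired zero in R0 follows the requirement stated earlier that R0 is always zero.

```python
class RegisterFile:
    """Behavioral model: two combinational read ports, one clocked write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32  # 32 registers, each holding a 32-bit value

    def read(self, ra: int, rb: int) -> tuple:
        # Reads are combinational: BusA and BusB follow RA and RB
        # without waiting for a clock edge.
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, reg_write: bool) -> None:
        # The clock matters only for writes. R0 stays zero even if
        # it is selected as the destination.
        if reg_write and rw != 0:
            self.regs[rw] = bus_w & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=True)    # R5 <- 42
rf.clock_edge(rw=6, bus_w=9, reg_write=False)    # RegWrite = 0: no change
rf.clock_edge(rw=0, bus_w=7, reg_write=True)     # write to R0 is ignored
bus_a, bus_b = rf.read(5, 6)
```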

Tri-State Buffers
Allow multiple sources to drive a single bus

Two inputs:
  Data signal (Data_in)
  Output enable (Enable)

One output (Data_out):
  If (Enable) Data_out = Data_in
  else Data_out = high-impedance state (output is disconnected)

Tri-state buffers can be used to build multiplexors

[Figure: tri-state buffer symbol (Data_in, Enable, Data_out) and a 2-input multiplexor built from two tri-state buffers (Data_0, Data_1, Select, Output).]

Details of the Register File


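[Figure: internal structure of the register file. It holds 32 registers of 32 bits each; R0 is not used and reads as "0". Decoders driven by RA and RB enable tri-state buffers that place one of the registers R1 ... R31 on BusA and BusB. A decoder driven by RW, together with RegWrite and Clock, selects the register that is written from BusW.]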

Building a Multifunction ALU


Shift Operation (2 bits):      None = 00, SLL = 01, SRL = 10, SRA = 11
Arithmetic Operation (1 bit):  ADD = 0, SUB = 1
Logical Operation (2 bits):    AND = 00, OR = 01, NOR = 10, XOR = 11
ALU Selection (2 bits):        Shift = 00, SLT = 01, Arith = 10, Logic = 11

SLT: ALU does a SUB and checks the sign and overflow

[Figure: multifunction ALU. Inputs A and B (32 bits) feed a Shifter (shift amount = 5 least significant bits), an Adder (carry-in c0 selects ADD or SUB; outputs sign, overflow, zero), and a Logic Unit. A 4-input mux controlled by ALU Selection produces the 32-bit ALU Result.]
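The arithmetic and logic parts of this ALU can be sketched behaviorally as follows. This is an illustration under assumptions, not the slide's gate-level design: the function name alu and the string operation names are chosen for this example, shifts are left out, and SLT is computed as sign XOR overflow of the subtraction, which is one standard way to "check the sign and overflow".

```python
MASK = 0xFFFFFFFF  # keep every value within 32 bits


def alu(a: int, b: int, op: str) -> tuple:
    """Return (result, zero, overflow) for 32-bit operands a and b."""
    a &= MASK
    b &= MASK
    overflow = False
    if op in ("ADD", "SUB", "SLT"):
        # SUB and SLT reuse the adder: A + (NOT B) + 1, with carry-in = 1.
        b_eff = b if op == "ADD" else (~b & MASK)
        carry_in = 0 if op == "ADD" else 1
        total = (a + b_eff + carry_in) & MASK
        sign_a, sign_b, sign_r = a >> 31, b_eff >> 31, total >> 31
        # Signed overflow: adder inputs share a sign that the result lacks.
        overflow = sign_a == sign_b and sign_r != sign_a
        if op == "SLT":
            # A < B exactly when the true sign of A - B is negative,
            # i.e. result sign corrected by overflow.
            result = sign_r ^ int(overflow)
            overflow = False  # SLT itself does not report overflow here
        else:
            result = total
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "NOR":
        result = ~(a | b) & MASK
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result, result == 0, overflow
```

For example, alu(5, 5, "SUB") yields a zero result with the zero flag set, which is the condition beq examines later in the datapath.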

Instruction and Data Memories


Instruction memory needs only provide read access
  Because datapath does not write instructions
  Behaves as combinational logic for read
  Address selects Instruction after access time

Data Memory is used for load and store
  MemRead: enables output on Data_out
    Address selects the word to put on Data_out
  MemWrite: enables writing of Data_in
    Address selects the memory word to be written
    The Clock synchronizes the write operation

Separate instruction and data memories
  Later, we will replace them with caches

[Figure: Instruction Memory (32-bit Address in, 32-bit Instruction out) and Data Memory (32-bit Address, Data_in, Data_out; MemRead, MemWrite, Clock).]

Clocking Methodology
Clocks are needed in sequential logic to decide when a state element (register) should be updated
To ensure correctness, a clocking methodology defines when data can be written and read

We assume edge-triggered clocking
  All state changes occur on the same clock edge
  Data must be valid and stable before arrival of clock edge
  Edge-triggered clocking allows a register to be read and written during the same clock cycle

[Figure: Register 1 feeds combinational logic, which feeds Register 2; the clock waveform shows the rising edge and the falling edge.]

Determining the Clock Cycle


With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register
  Tclk-q : clock to output delay through register
  Tmax_comb : longest delay through combinational logic
  Ts : setup time that input to a register must be stable before arrival of clock edge
  Th : hold time that input to a register must hold after arrival of clock edge

Hold time (Th) is normally satisfied since Tclk-q > Th

Tcycle ≥ Tclk-q + Tmax_comb + Ts

[Figure: Register 1, combinational logic, Register 2, and a clock waveform marking Tclk-q, Tmax_comb, Ts, and Th relative to the writing edge.]

Clock Skew
Clock skew arises because the clock signal uses different paths with slightly different delays to reach state elements
Clock skew is the difference in absolute time between when two storage elements see a clock edge
With a clock skew, the clock cycle time is increased

Tcycle ≥ Tclk-q + Tmax_combinational + Tsetup + Tskew

Clock skew is reduced by balancing the clock delays
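The cycle-time constraint can be checked numerically with a short sketch. The function name min_cycle_time and all delay values below are hypothetical, chosen only to show how the terms add up; they are not taken from the course.

```python
def min_cycle_time(t_clk_q: int, t_max_comb: int, t_setup: int, t_skew: int = 0) -> int:
    """Smallest clock period (same time unit as the inputs) that satisfies
    Tcycle >= Tclk-q + Tmax_comb + Tsetup + Tskew."""
    return t_clk_q + t_max_comb + t_setup + t_skew


# Hypothetical delays in picoseconds, for illustration only.
no_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30)
with_skew = min_cycle_time(t_clk_q=50, t_max_comb=800, t_setup=30, t_skew=20)
```

With these made-up numbers, the skew term lengthens the minimum period from 880 ps to 900 ps, which is why balancing the clock delays matters.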


Instruction Fetching Datapath


We can now assemble the datapath from its components

For instruction fetching, we need:
  Program Counter (PC) register
  Instruction Memory
  Adder for incrementing PC

Improved datapath increments the upper 30 bits of the PC by 1
  The least significant 2 bits of the PC are 00 since the PC is a multiple of 4

Datapath does not handle branch or jump instructions

[Figure: instruction fetch datapath. Basic version: the 32-bit PC addresses the Instruction Memory and an adder computes next PC = PC + 4. Improved version: a 30-bit PC is incremented by +1 and 00 is appended to form the 32-bit address.]

Datapath for R-type Instructions


Instruction fields: Op6  Rs5  Rt5  Rd5  sa5  funct6

RA & RB come from the instruction's Rs & Rt fields
RW comes from the Rd field
ALU inputs come from BusA & BusB
ALU result is connected to BusW

Control signals
  ALUCtrl is derived from the funct field because Op = 0 for R-type
  RegWrite is used to enable the writing of the ALU result

[Figure: R-type datapath. The 30-bit PC with +1 incrementer addresses the Instruction Memory; Rs, Rt, Rd drive RA, RB, RW of the register file; BusA and BusB feed the ALU; the ALU result returns on BusW; control inputs are RegWrite and ALUCtrl.]

Datapath for I-type ALU Instructions


Instruction fields: Op6  Rs5  Rt5  immediate16

RW now comes from Rt, instead of Rd
Second ALU input comes from the extended immediate
RB and BusB are not used

Control signals
  ALUCtrl is derived from the Op field
  RegWrite is used to enable the writing of the ALU result
  ExtOp is used to control the extension of the 16-bit immediate

[Figure: I-type ALU datapath. Rs drives RA and Rt drives RW; Imm16 passes through the Extender (ExtOp) to the second ALU input; BusA feeds the first ALU input; the ALU result returns on BusW; control inputs are RegWrite, ExtOp, and ALUCtrl.]

Combining R-type & I-type Datapaths


A mux selects RW as either Rt or Rd
Another mux selects the 2nd ALU input as either the source register Rt data on BusB or the extended immediate

Control signals
  ALUCtrl is derived from either the Op or the funct field
  RegWrite enables the writing of the ALU result
  ExtOp controls the extension of the 16-bit immediate
  RegDst selects the register destination as either Rt or Rd
  ALUSrc selects the 2nd ALU source as BusB or extended immediate

[Figure: combined R-type and I-type datapath. A mux controlled by RegDst chooses Rt (0) or Rd (1) for RW; a mux controlled by ALUSrc chooses BusB (0) or the Extender output (1) for the second ALU input; control inputs are RegWrite, RegDst, ExtOp, ALUSrc, and ALUCtrl.]

Controlling ALU Instructions


For R-type ALU instructions, RegDst is 1 to select Rd on RW and ALUSrc is 0 to select BusB as the second ALU input. RegWrite = 1.

For I-type ALU instructions, RegDst is 0 to select Rt on RW and ALUSrc is 1 to select the extended immediate as the second ALU input. RegWrite = 1.

[Figure: two copies of the combined datapath, one per case above, with the active part of the datapath shown in green.]

Details of the Extender


Two types of extensions
  Zero-extension for unsigned constants
  Sign-extension for signed constants

Control signal ExtOp indicates type of extension
  ExtOp = 0: Upper16 = 0
  ExtOp = 1: Upper16 = sign bit

Extender Implementation: wiring and one AND gate

[Figure: the lower 16 bits pass Imm16 through unchanged; the upper 16 bits are all driven by the output of an AND gate whose inputs are ExtOp and the sign bit of Imm16.]
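The "wiring and one AND gate" implementation can be mirrored in a few lines. The function name extend is an assumption for this sketch; the logic follows the two ExtOp cases listed above.

```python
def extend(imm16: int, ext_op: int) -> int:
    """Extend a 16-bit immediate to 32 bits.

    ext_op = 0 -> zero-extension, ext_op = 1 -> sign-extension.
    """
    imm16 &= 0xFFFF
    sign_bit = (imm16 >> 15) & 1
    upper_bit = sign_bit & ext_op             # the single AND gate
    upper16 = 0xFFFF if upper_bit else 0x0000  # same bit wired to all 16 upper lines
    return (upper16 << 16) | imm16             # lower 16 bits are plain wiring


zero_extended = extend(0x8000, ext_op=0)
sign_extended = extend(0x8000, ext_op=1)
```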

Adding Data Memory to Datapath


A data memory is added for load and store instructions
  ALU calculates data memory address
  A 3rd mux selects data on BusW as either ALU result or memory Data_out
  BusB is connected to Data_in of Data Memory for store instructions

Additional Control signals
  MemRead for load instructions
  MemWrite for store instructions
  MemtoReg selects data on BusW as ALU result or Memory Data_out

[Figure: datapath with Data Memory added. The ALU result drives the Data Memory Address, BusB drives Data_in, and a mux controlled by MemtoReg chooses the ALU result (0) or Data_out (1) for BusW; control inputs are RegDst, RegWrite, ExtOp, ALUSrc, ALUCtrl, MemRead, MemWrite, and MemtoReg.]

Controlling the Execution of Load


ExtOp = sign to sign-extend Immediate16 to 32 bits
ALUSrc = 1 selects extended immediate as second ALU input
ALUCtrl = ADD to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
MemRead = 1 to read data memory
MemWrite = 0
MemtoReg = 1 places the data read from memory on BusW
RegDst = 0 selects Rt as destination register
RegWrite = 1 to write the memory data on BusW to register Rt

[Figure: datapath with the load path active and the control values listed above.]

Controlling the Execution of Store


ExtOp = sign to sign-extend Immediate16 to 32 bits
ALUSrc = 1 to select the extended immediate as second ALU input
ALUCtrl = ADD to calculate data memory address as Reg(Rs) + sign-extend(Imm16)
MemRead = 0
MemWrite = 1 to write data memory
MemtoReg = x because we don't care what data is placed on BusW
RegDst = x because there is no destination register
RegWrite = 0 because no register is written by the store instruction

[Figure: datapath with the store path active and the control values listed above.]

Adding Jump and Branch to Datapath


Additional Control Signals
  J, Beq, Bne for jump and branch instructions
  Zero condition of the ALU is examined
  PCSrc = 1 for Jump & taken Branch

Next PC computes jump or branch target instruction address
For Branch, ALU does a subtraction

[Figure: datapath with a Next PC block added. Its inputs are the incremented 30-bit PC, Imm16, Imm26, the ALU zero flag, and the J, Beq, Bne signals; it outputs the 30-bit Jump or Branch Target Address and PCSrc, which controls a mux selecting between the incremented PC and the target address.]

Details of Next PC
Imm16 is sign-extended to 30 bits
  Sign-Extension: most-significant bit is replicated
Branch target address: incremented PC (Inc PC) + sign-extended Imm16
Jump target address: upper 4 bits of PC are concatenated with Imm26

PCSrc = J + (Beq . Zero) + (Bne . Zero'), where Zero' is the complement of Zero

[Figure: Next PC block. Inputs: Inc PC (30 bits), Imm16, Imm26, Beq, Bne, J, Zero. Inside: a sign extender (SE), a 30-bit adder, and a mux that selects the branch or the jump target. Outputs: Branch or Jump Target Address (30 bits) and PCSrc.]
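The Next PC logic can be sketched on 30-bit word addresses, as in the improved fetch datapath. The function name next_pc and its parameter names are assumptions for this example. Bne is taken when Zero is 0, so the sketch uses the complement of Zero in the Bne term, and the upper 4 bits for a jump come from the incremented PC.

```python
MASK30 = (1 << 30) - 1  # the PC holds the upper 30 bits of the byte address


def next_pc(pc30: int, imm16: int, imm26: int,
            j: bool, beq: bool, bne: bool, zero: bool) -> int:
    """Return the next 30-bit PC value."""
    inc_pc = (pc30 + 1) & MASK30

    # PCSrc = J + (Beq . Zero) + (Bne . NOT Zero)
    pc_src = j or (beq and zero) or (bne and not zero)

    if j:
        # Jump target: upper 4 bits of the incremented PC, then Imm26.
        target = ((inc_pc >> 26) << 26) | (imm26 & 0x3FFFFFF)
    else:
        # Branch target: sign-extend Imm16 to 30 bits and add to Inc PC.
        offset = imm16 & 0xFFFF
        if offset & 0x8000:
            offset |= MASK30 & ~0xFFFF   # replicate the sign bit
        target = (inc_pc + offset) & MASK30

    return target if pc_src else inc_pc
```

As a usage example, a taken beq at word address 100 with Imm16 = 0xFFFF (that is, -1) branches back to word address 100, since the offset is relative to the incremented PC.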

Controlling the Execution of Jump


J = 1 selects Imm26 as jump target address
  Upper 4 bits are from the incremented PC
PCSrc = 1 to select jump target address
MemRead, MemWrite & RegWrite are 0
We don't care about RegDst, ExtOp, ALUSrc, ALUCtrl, and MemtoReg

[Figure: datapath with the jump path active: J = 1, PCSrc = 1, MemRead = MemWrite = RegWrite = 0, and RegDst = ExtOp = ALUSrc = ALUCtrl = MemtoReg = x.]

Controlling the Execution of Branch


Either Beq or Bne = 1
Next PC outputs branch target address
ALUSrc = 0 (2nd ALU input is BusB)
ALUCtrl = SUB produces zero flag
Next PC logic determines PCSrc according to zero flag
MemRead = MemWrite = RegWrite = 0
RegDst = ExtOp = MemtoReg = x

[Figure: datapath with the branch path active: Beq = 1 or Bne = 1, ALUSrc = 0, ALUCtrl = SUB, and PCSrc = 1 when the branch is taken.]


Main Control and ALU Control


Main Control
  Input: 6-bit opcode field from instruction
  Output: 10 control signals for datapath
          ALUOp for ALU Control

ALU Control
  Input: 6-bit function field from instruction
         ALUOp from main control
  Output: ALUCtrl signal for ALU

[Figure: the Instruction Memory supplies Op6 to Main Control and funct6 to ALU Control. Main Control drives RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, Beq, Bne, J into the Datapath and ALUOp into ALU Control, which drives ALUCtrl into the ALU.]

Single-Cycle Datapath + Control


[Figure: complete single-cycle datapath with control. Main Control decodes Op and drives RegDst, RegWrite, ExtOp, ALUSrc, MemRead, MemWrite, MemtoReg, J, Beq, Bne, and ALUOp; ALU Ctrl combines ALUOp with func to produce ALUCtrl; the Next PC block uses Imm16, Imm26, zero, and J, Beq, Bne to produce the Jump or Branch Target Address and PCSrc.]

Main Control Signals


Signal     Effect when 0                          Effect when 1
RegDst     Destination register = Rt              Destination register = Rd
RegWrite   None                                   Destination register is written with the data value on BusW
ExtOp      16-bit immediate is zero-extended      16-bit immediate is sign-extended
ALUSrc     Second ALU operand comes from the      Second ALU operand comes from the
           second register file output (BusB)     extended 16-bit immediate
MemRead    None                                   Data memory is read: Data_out ← Memory[address]
MemWrite   None                                   Data memory is written: Memory[address] ← Data_in
MemtoReg   BusW = ALU result                      BusW = Data_out from Memory
Beq, Bne   PC ← PC + 4                            PC ← Branch target address, if branch is taken
J          PC ← PC + 4                            PC ← Jump target address
ALUOp      This multi-bit signal specifies the ALU operation as a function of the opcode

Main Control Signal Values


Op      RegDst  RegWrite  ExtOp   ALUSrc  ALUOp   Beq  Bne  J  MemRead  MemWrite  MemtoReg
R-type  1=Rd    1         x       0=BusB  R-type  0    0    0  0        0         0
addi    0=Rt    1         1=sign  1=Imm   ADD     0    0    0  0        0         0
slti    0=Rt    1         1=sign  1=Imm   SLT     0    0    0  0        0         0
andi    0=Rt    1         0=zero  1=Imm   AND     0    0    0  0        0         0
ori     0=Rt    1         0=zero  1=Imm   OR      0    0    0  0        0         0
xori    0=Rt    1         0=zero  1=Imm   XOR     0    0    0  0        0         0
lw      0=Rt    1         1=sign  1=Imm   ADD     0    0    0  1        0         1
sw      x       0         1=sign  1=Imm   ADD     0    0    0  0        1         x
beq     x       0         x       0=BusB  SUB     1    0    0  0        0         x
bne     x       0         x       0=BusB  SUB     0    1    0  0        0         x
j       x       0         x       x       x       0    0    1  0        0         x

x is a don't care (can be 0 or 1), used to minimize logic

Logic Equations for Control Signals


Logic Equations (NOT marks a complemented term)
  RegDst   <= R-type
  RegWrite <= NOT (sw + beq + bne + j)
  ExtOp    <= NOT (andi + ori + xori)
  ALUSrc   <= NOT (R-type + beq + bne)
  MemRead  <= lw
  MemWrite <= sw
  MemtoReg <= lw

[Figure: a Decoder on Op6 produces one line per instruction (R-type, addi, slti, andi, ori, xori, lw, sw, beq, bne, j); the logic equations combine these lines into RegDst, RegWrite, ExtOp, ALUSrc, ALUOp, MemRead, MemWrite, and MemtoReg, while Beq, Bne, and J come directly from the decoder.]
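The main control equations can be sketched as a decoder followed by simple logic. The names OPCODES and main_control are assumptions for this example. RegWrite, ExtOp, and ALUSrc are written as complemented sums, which is what the signal value table requires (for instance, RegWrite must be 0 for sw, beq, bne, and j). Don't-care entries resolve to whatever value the equations happen to give.

```python
# Opcode values for the MIPS subset used in this design.
OPCODES = {
    0x00: "R-type", 0x08: "addi", 0x0A: "slti", 0x0C: "andi",
    0x0D: "ori", 0x0E: "xori", 0x23: "lw", 0x2B: "sw",
    0x04: "beq", 0x05: "bne", 0x02: "j",
}


def main_control(op6: int) -> dict:
    """Return the 1-bit control signals for a 6-bit opcode (ALUOp omitted)."""
    name = OPCODES[op6]            # the decoder: exactly one line is active

    def any_of(*names: str) -> int:
        # OR of the selected decoder output lines.
        return int(name in names)

    return {
        "RegDst":   any_of("R-type"),
        "RegWrite": 1 - any_of("sw", "beq", "bne", "j"),
        "ExtOp":    1 - any_of("andi", "ori", "xori"),
        "ALUSrc":   1 - any_of("R-type", "beq", "bne"),
        "MemRead":  any_of("lw"),
        "MemWrite": any_of("sw"),
        "MemtoReg": any_of("lw"),
        "Beq":      any_of("beq"),
        "Bne":      any_of("bne"),
        "J":        any_of("j"),
    }


lw_signals = main_control(0x23)
```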

ALU Control Truth Table


Op6      ALUOp    funct6   ALUCtrl   4-bit Encoding
R-type   R-type   add      ADD       0000
R-type   R-type   sub      SUB       0010
R-type   R-type   and      AND       0100
R-type   R-type   or       OR        0101
R-type   R-type   xor      XOR       0110
R-type   R-type   slt      SLT       1010
addi     ADD      x        ADD       0000
slti     SLT      x        SLT       1010
andi     AND      x        AND       0100
ori      OR       x        OR        0101
xori     XOR      x        XOR       0110
lw       ADD      x        ADD       0000
sw       ADD      x        ADD       0000
beq      SUB      x        SUB       0010
bne      SUB      x        SUB       0010
j        x        x        x         x

The 4-bit encoding for ALUCtrl is chosen here to be equal to the last 4 bits of the function field
Other binary encodings are also possible. The idea is to choose a binary encoding that will minimize the logic for ALU Control
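A sketch of the ALU Control block under this encoding. The function name alu_control and the use of strings for ALUOp are assumptions for this example; real hardware would encode ALUOp in a few bits. Because the encoding equals the low 4 bits of funct6, the R-type case reduces to passing those bits through.

```python
# 4-bit ALUCtrl encodings, equal to the low 4 bits of the matching funct6.
ENCODING = {
    "ADD": 0b0000, "SUB": 0b0010, "AND": 0b0100,
    "OR": 0b0101, "XOR": 0b0110, "SLT": 0b1010,
}


def alu_control(alu_op: str, funct6: int = 0) -> int:
    """Return the 4-bit ALUCtrl value from ALUOp and the function field."""
    if alu_op == "R-type":
        return funct6 & 0xF        # operation comes from the funct field
    return ENCODING[alu_op]        # operation comes from the opcode via ALUOp


slt_ctrl = alu_control("R-type", 0x2A)   # slt has funct6 = 0x2a
beq_ctrl = alu_control("SUB")            # beq and bne subtract
```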


Drawbacks of Single Cycle Processor


Long cycle time
  All instructions take as much time as the slowest

  ALU:     Instruction Fetch, Reg Read, ALU, Reg Write
  Load:    Instruction Fetch, Reg Read, ALU, Memory Read, Reg Write   (longest delay)
  Store:   Instruction Fetch, Reg Read, ALU, Memory Write
  Branch:  Instruction Fetch, Reg Read, ALU
  Jump:    Instruction Fetch, Decode

Alternative Solution: Multicycle implementation
  Break down instruction execution into multiple cycles

Multicycle Implementation
Break instruction execution into five steps
  Instruction fetch
  Instruction decode and register read
  Execution, memory address calculation, or branch completion
  Memory access or ALU instruction completion
  Load instruction completion

One step = One clock cycle (clock cycle is reduced)
  First 2 steps are the same for all instructions

Instruction    # cycles
ALU & Store    4
Load           5
Branch         3
Jump           2

Performance Example
Assume the following operation times for components:
  Instruction and data memories: 200 ps
  ALU and adders: 180 ps
  Decode and Register file access (read or write): 150 ps
  Ignore the delays in PC, mux, extender, and wires

Which of the following would be faster and by how much?
  Single-cycle implementation for all instructions
  Multicycle implementation optimized for every class of instructions

Assume the following instruction mix:
  40% ALU, 20% Loads, 10% stores, 20% branches, & 10% jumps

Solution
Instruction  Instruction  Register  ALU        Data    Register
Class        Memory       Read      Operation  Memory  Write     Total
ALU          200          150       180                150       680 ps
Load         200          150       180        200     150       880 ps
Store        200          150       180        200               730 ps
Branch       200          150       180                          530 ps
Jump         200          150 (decode and update PC)             350 ps

For fixed single-cycle implementation:
  Clock cycle = 880 ps determined by longest delay (load instruction)

For multi-cycle implementation:
  Clock cycle = max (200, 150, 180) = 200 ps (maximum delay at any step)
  Average CPI = 0.4×4 + 0.2×5 + 0.1×4 + 0.2×3 + 0.1×2 = 3.8

Speedup = 880 ps / (3.8 × 200 ps) = 880 / 760 = 1.16
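The arithmetic in this solution can be reproduced with a short script. The variable names are assumptions for this sketch; the delays, cycle counts, and instruction mix are the ones given in the example.

```python
# Component delays in picoseconds, from the problem statement.
delay_ps = {"imem": 200, "reg_read": 150, "alu": 180, "dmem": 200, "reg_write": 150}

# Components each instruction class uses in the single-cycle datapath.
# For Jump, the 150 ps step is decode and update PC.
path = {
    "ALU":    ["imem", "reg_read", "alu", "reg_write"],
    "Load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "Store":  ["imem", "reg_read", "alu", "dmem"],
    "Branch": ["imem", "reg_read", "alu"],
    "Jump":   ["imem", "reg_read"],
}

# Multicycle step counts and instruction mix (in percent).
cycles = {"ALU": 4, "Load": 5, "Store": 4, "Branch": 3, "Jump": 2}
mix_percent = {"ALU": 40, "Load": 20, "Store": 10, "Branch": 20, "Jump": 10}

latency_ps = {c: sum(delay_ps[u] for u in units) for c, units in path.items()}

single_cycle_ps = max(latency_ps.values())   # clock set by the slowest class
multi_cycle_ps = max(delay_ps.values())      # clock set by the slowest step
avg_cpi = sum(mix_percent[c] * cycles[c] for c in cycles) / 100

speedup = single_cycle_ps / (avg_cpi * multi_cycle_ps)
```

Changing the mix or the component delays in the dictionaries shows how sensitive the speedup is to the share of loads and to the slowest single step.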

Worst Case Timing (Load Instruction)


[Timing diagram: events along the load path within one clock cycle, in order]
  Clk-to-q:                            Old PC → New PC
  Instruction Memory Access Time:      Old Instruction → New Instruction = (Op, Rs, Rt, Rd, Funct, Imm16, Imm26)
  Delay Through Control Logic:         Old Control Signal Values → New Control Signal Values (ExtOp, ALUSrc, ALUOp, ...)
  Register File Access Time:           Old BusA Value → New BusA Value = Register(Rs)
  Delay Through Extender and ALU Mux:  Old Second ALU Input → New Second ALU Input = sign-extend(Imm16)
  ALU Delay:                           Old ALU Result → New ALU Result = Address
  Data Memory Access Time:             Old Data Memory Output Value → New Value
  Mux delay + Setup time + Clock skew, after which the Write Occurs at the end of the Clock Cycle

Worst Case Timing Cont'd


Long cycle time: must be long enough for Load operation
    PC's Clk-to-Q
  + Instruction Memory's Access Time
  + Maximum of (Register File's Access Time,
                Delay through control logic + extender + ALU mux)
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Delay through MemtoReg Mux
  + Setup Time for Register File Write
  + Clock Skew

Cycle time is longer than needed for other instructions
  Therefore, single cycle processor design is not used in practice

Summary
5 steps to design a processor
  Analyze instruction set => datapath requirements
  Select datapath components & establish clocking methodology
  Assemble datapath meeting the requirements
  Analyze implementation of each instruction to determine control signals
  Assemble the control logic

MIPS makes Control easier
  Instructions are of same size
  Source registers always in same place
  Immediates are of same size and same location
  Operations are always on registers/immediates

Single cycle datapath => CPI = 1, but Long Clock Cycle
