4. Pipeline Hazards

COMP2611 CSE HKUST The Processor: Datapath & Control


Dependences in Programs

• Data dependence
  Example: lw  $1, 200($2)
           add $3, $4, $1
  add can’t do ID (i.e., read register $1) until lw updates $1

• Control dependence
  Example: bne $1, $2, target
           add $3, $4, $5
  The next IF can’t start until bne completes the comparison

• These dependences may prevent the pipeline from being fully filled
  • Execution stops to wait for data or a control decision to be produced
  • The next instruction cannot be executed in the next cycle


Pipeline Hazards

• Hazards are situations in pipelining when the next instruction cannot
  be executed in the following clock cycle.

• Three types of pipeline hazards
  • Structural hazards: a required resource is busy
  • Data hazards: need to wait for a previous instruction to complete
    its data read/write
  • Control hazards: deciding on a control action depends on a previous
    instruction

• Hazards can always be resolved by waiting, but this slows down
  the pipeline.


Structural Hazards: Memory

Instruction order
instr #1   IF   ID   EXE  MEM  WB
instr #2        IF   ID   EXE  MEM  WB
instr #3             IF   ID   EXE  MEM  WB
instr #4                  IF   ID   EXE  MEM  WB
instr #5                       IF   ID   EXE  MEM  WB

• Conflict for use of memory
• In a MIPS pipeline with a single memory
  • Load/store requires data access
  • Instruction fetch would have to stall for that cycle
  • Would cause a pipeline “bubble”
• Hence, pipelined datapaths require separate instruction/data memories
  • Or separate instruction/data caches
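
The memory conflict can be located mechanically. The sketch below is an illustration only: it assumes instruction i enters IF in cycle i + 1 with no stalls, and the helper name `memory_conflicts` is hypothetical. It reports each cycle in which a load/store in MEM and a later instruction in IF would both need a single shared memory.

```python
STAGES = ["IF", "ID", "EXE", "MEM", "WB"]

def memory_conflicts(instrs):
    """Return (cycle, fetching_index, accessing_index) tuples for a single shared memory.

    instrs is a list of opcode strings; instruction i is assumed to enter IF
    in cycle i + 1 and to advance one stage per cycle.
    """
    conflicts = []
    for i, op in enumerate(instrs):
        if op in ("lw", "sw"):
            mem_cycle = i + 1 + STAGES.index("MEM")  # cycle in which instruction i is in MEM
            fetch_index = mem_cycle - 1               # instruction that is in IF in that cycle
            if fetch_index < len(instrs):
                conflicts.append((mem_cycle, fetch_index, i))
    return conflicts

print(memory_conflicts(["lw", "add", "sub", "and", "or"]))
```

With a load as instr #1, the only conflict is in cycle 4, where instr #4 wants to fetch while the load accesses data.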


Structural Hazards: Registers

Instruction order
instr #1   IF   ID   EXE  MEM  WB
instr #2        IF   ID   EXE  MEM  WB
instr #3             IF   ID   EXE  MEM  WB
instr #4                  IF   ID   EXE  MEM  WB
instr #5                       IF   ID   EXE  MEM  WB

• Fact: register access is VERY fast; it takes half the time of the ALU stage or less
  • Always write to registers during the 1st half of each clock cycle
  • Always read from registers during the 2nd half of each clock cycle
• The register file supports a write and a read during the same clock cycle
  (in this order)


Data Hazard

• An instruction depends on completion of data access by a previous
  instruction
      add $s0, $t0, $t1
      sub $t2, $s0, $t3

• A bubble or pipeline stall is a delay in the execution of an instruction in
  an instruction pipeline in order to resolve a hazard.
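
As a worked illustration of how many bubbles such a dependence costs, the sketch below counts stall cycles when no forwarding is available. It assumes the 5-stage pipeline and a register file that writes in the first half of a cycle and reads in the second half, so the consumer's ID may share a cycle with the producer's WB. The function name is hypothetical.

```python
def stalls_without_forwarding(distance):
    """Bubbles needed when a consumer is `distance` instructions after its producer.

    Assumes IF ID EX MEM WB, no forwarding, and a register file that is
    written in the first half of a cycle and read in the second half.
    """
    wb_cycle = 5                  # a producer fetched in cycle 1 writes back in cycle 5
    id_cycle = 1 + distance + 1   # the consumer's ID cycle if nothing stalls
    return max(0, wb_cycle - id_cycle)

print(stalls_without_forwarding(1))
```

Under these assumptions the add/sub pair above (distance 1) needs two bubbles, and a consumer three instructions later needs none.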


Forwarding (aka Bypassing)

• Forwarding partially solves the data hazard problem
  • Use the result as soon as it is computed
  • Don’t wait for it to be stored in a register
  • Requires extra connections in the datapath


Load-Use Data Hazard

• Can’t always avoid stalls by forwarding
  • If the value is not computed when needed
  • Can’t forward backward in time!

      lw  $s0, 20($t1)
      sub $t2, $s0, $t3


Code Scheduling to Avoid Stalls

• Consider this code sequence
      a = b + c;
      d = b + e;
  Assume b, c, e, a and d are stored at memory addresses 0($t0), 4($t0),
  8($t0), 12($t0) and 16($t0) respectively. Assume forwarding is used.

  Original order                    Reordered
         lw  $t1, 0($t0)            lw  $t1, 0($t0)
         lw  $t2, 4($t0)            lw  $t2, 4($t0)
  stall  add $t3, $t1, $t2          lw  $t4, 8($t0)
         sw  $t3, 12($t0)           add $t3, $t1, $t2
         lw  $t4, 8($t0)            sw  $t3, 12($t0)
  stall  add $t5, $t1, $t4          add $t5, $t1, $t4
         sw  $t5, 16($t0)           sw  $t5, 16($t0)

  13 cycles                         11 cycles
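
The two cycle counts can be checked with a short sketch. It assumes a 5-stage pipeline with full forwarding, so the only stall is one bubble when an instruction uses the result of the load immediately before it. The tuple encoding of instructions and the name `count_cycles` are illustrative choices, not part of the slides.

```python
def count_cycles(program, stages=5):
    """program: list of (op, dest, sources). Returns (total_cycles, load_use_stalls)."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        # One bubble when the instruction right after a load reads the loaded register.
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    # The first instruction takes `stages` cycles; each later one adds a cycle.
    return len(program) + (stages - 1) + stalls, stalls

original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw", None, ["$t5", "$t0"]),
]
# Reordered: the third load is hoisted above the first add.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(count_cycles(original))
print(count_cycles(reordered))
```

Seven instructions need 7 + 4 = 11 cycles when nothing stalls; the two load-use bubbles in the original order account for the difference.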


Control Hazards

• Branch determines flow of control
  • Fetching the next instruction depends on the branch outcome
  • Pipeline can’t always fetch the correct instruction
    • Still working on the ID stage of the branch

• In MIPS pipeline
  • Need to compare registers and compute the target early in the
    pipeline
  • Add hardware to do it in the ID stage


Stall on Branch

• Wait until the branch outcome is determined before fetching
  the next instruction


Branch Prediction

• Longer pipelines can’t readily determine the branch outcome early
  • Stall penalty becomes unacceptable
• Predict the outcome of the branch
  • Only stall if the prediction is wrong
• In MIPS pipeline
  • Can predict branches not taken
  • Fetch the instruction after the branch, with no delay


MIPS with Static Branch Prediction (Not Taken)

[Figure omitted; two panels: "Prediction correct" and "Prediction incorrect"]


Pipeline Summary

The BIG Picture

• Pipelining improves performance by increasing instruction
  throughput
  • Executes multiple instructions in parallel
  • Each instruction has the same latency
• Subject to hazards
  • Structural, data, control
• Instruction set design affects the complexity of the pipeline
  implementation


5. Pipeline Datapath with Hazards


Data Hazards in ALU Instructions

• Consider this sequence:
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
• We can resolve hazards with forwarding
  • How do we detect when to forward?


Dependencies and Forwarding

• From the figure the decision is simple (the required “forwardings” are
  represented by the two red lines):

[Figure omitted]


Detecting the Need to Forward

• Pass register numbers along the pipeline
  • e.g., ID/EX.RegisterRs = register number for Rs sitting
    in the ID/EX pipeline register
• ALU operand register numbers in the EX stage are given by
  • ID/EX.RegisterRs, ID/EX.RegisterRt
• Data hazards when
  1a. EX/MEM.RegisterRd = ID/EX.RegisterRs   (forward from the EX/MEM pipeline register)
  1b. EX/MEM.RegisterRd = ID/EX.RegisterRt   (forward from the EX/MEM pipeline register)
  2a. MEM/WB.RegisterRd = ID/EX.RegisterRs   (forward from the MEM/WB pipeline register)
  2b. MEM/WB.RegisterRd = ID/EX.RegisterRt   (forward from the MEM/WB pipeline register)


Detecting the Need to Forward (cont.)

• But only if the forwarding instruction will write to a register!
  • EX/MEM.RegWrite, MEM/WB.RegWrite
• And only if Rd for that instruction is not $zero
  • EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0


Forwarding Paths

• Forwarding always takes place to the EX stage
  • Implement these conditions in a forwarding control unit
  • Use two multiplexers to select the inputs for operands A and B
    of the ALU


Forwarding Conditions

• EX hazard
  • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
  • if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10
• MEM hazard
  • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
  • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01


Double Data Hazard

• Consider the sequence:
      add $1, $1, $2
      add $1, $1, $3
      add $1, $1, $4
• Both hazards occur
  • Want to use the most recent result
• Revise the MEM hazard condition
  • Only forward if the EX hazard condition isn’t true


Revised Forwarding Condition

• MEM hazard
  • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
  • if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
                 and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
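
The EX and revised MEM hazard conditions can be combined into one small function. This is a minimal sketch rather than the course's implementation: the dict-based pipeline registers and the name `forwarding_unit` are illustrative, and the default value `00` (no forwarding, operand comes from the register file) is an assumption, since the conditions here list only `10` and `01`.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex_rs, id_ex_rt):
    """Return (ForwardA, ForwardB) as 2-bit strings.

    ex_mem and mem_wb are dicts with keys "RegWrite" (bool) and "Rd" (int).
    """
    def select(src):
        ex_hazard = ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src
        mem_hazard = mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src
        if ex_hazard:
            return "10"  # EX/MEM holds the most recent result, so it takes priority
        if mem_hazard:
            return "01"  # forward from MEM/WB only when there is no EX hazard
        return "00"      # assumed default: no forwarding
    return select(id_ex_rs), select(id_ex_rt)

# Third instruction of: add $1,$1,$2 / add $1,$1,$3 / add $1,$1,$4
ex_mem = {"RegWrite": True, "Rd": 1}   # second add
mem_wb = {"RegWrite": True, "Rd": 1}   # first add
print(forwarding_unit(ex_mem, mem_wb, id_ex_rs=1, id_ex_rt=4))
```

Checking the EX hazard first is equivalent to the "and not (EX hazard)" term in the revised MEM condition.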


Datapath with Forwarding

[Figure omitted]


Load-Use Data Hazard

[Figure omitted; annotation: "Need to stall for one cycle"]


Load-Use Hazard Detection

• Check when the using instruction is decoded in the ID stage
• ALU operand register numbers in the ID stage are given by
  • IF/ID.RegisterRs, IF/ID.RegisterRt
• Load-use hazard when
  • ID/EX.MemRead and
    ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
     (ID/EX.RegisterRt = IF/ID.RegisterRt))
• If detected, stall and insert a bubble
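
The detection condition translates directly into a predicate. The sketch below is illustrative; the function and parameter names are hypothetical stand-ins for the pipeline register fields named above, and the register numbers in the example follow the standard MIPS numbering ($s0 = 16, $t3 = 11).

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX will write."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)

# lw $s0, 20($t1) is in EX; sub $t2, $s0, $t3 is in ID.
print(load_use_hazard(True, id_ex_rt=16, if_id_rs=16, if_id_rt=11))
```

A load writes the register named by its Rt field, which is why ID/EX.RegisterRt is compared against both source fields of the instruction in ID.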


How to Stall the Pipeline

• Force control values in the ID/EX register to 0
  • EX, MEM and WB do nop (no-operation)
• Prevent update of the PC and the IF/ID register
  • The using instruction is decoded again
  • The following instruction is fetched again
  • A 1-cycle stall allows MEM to read data for lw
    • Can subsequently forward to the EX stage
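
One clock edge of the stall mechanism can be sketched as follows. This is a simplified model, not the actual datapath: only three control bits are shown, and `clock_front_end` and the `fetch` callback are hypothetical names introduced for illustration.

```python
NOP_CONTROL = {"RegWrite": 0, "MemRead": 0, "MemWrite": 0}

def clock_front_end(pc, if_id, decoded_control, stall, fetch):
    """Advance PC, IF/ID and the ID/EX control bits by one cycle.

    fetch(pc) is a hypothetical callback returning the instruction at pc.
    Returns (new_pc, new_if_id, new_id_ex_control).
    """
    if stall:
        # Hold PC and IF/ID so the same instructions are fetched and decoded
        # again, and send all-zero control values down the pipeline (a bubble).
        return pc, if_id, dict(NOP_CONTROL)
    return pc + 4, fetch(pc), dict(decoded_control)

program = {0: "lw", 4: "sub", 8: "and"}
sub_control = {"RegWrite": 1, "MemRead": 0, "MemWrite": 0}
print(clock_front_end(8, "sub", sub_control, True, program.get))
print(clock_front_end(8, "sub", sub_control, False, program.get))
```

In the stalled cycle nothing in the front end moves and the later stages see a nop; in the next cycle the same `sub` is decoded with its real control values.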


Stall/Bubble in the Pipeline

[Figure omitted; annotation: "Stall inserted here"]


Stall/Bubble in the Pipeline (cont.)

[Figure omitted; annotation: "Or, more accurately…"]


Datapath with Hazard Detection

[Figure omitted; annotation: "Zero input to create a nop operation"]


Stalls and Performance

The BIG Picture

• Stalls reduce performance
  • But are required to get correct results
• Compiler can arrange code to avoid hazards and stalls
  • Requires knowledge of the pipeline structure


Branch Hazards

• If the branch outcome is determined in MEM

[Figure omitted; annotation: "Flush these instructions (set control values to 0)"]


Reducing Branch Delay

• Add hardware to the MIPS pipeline to determine the branch result in
  the ID stage
  • Target address calculation requires an adder
  • Register comparator

• An example (assume branch taken)

      36: sub $10, $4, $8
      40: beq $1, $3, 7       # PC-relative branch to 40+4+4*7=72
      44: and $12, $2, $5
      48: or  $13, $2, $6
      52: add $14, $4, $2
      56: slt $15, $6, $7
      ...
      72: lw  $4, 50($7)
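
The target address arithmetic in the example can be written as a one-line helper. The function name is illustrative; the formula is the one used in the comment on the `beq` line (address of the next instruction plus the word offset scaled to bytes).

```python
def branch_target(pc, offset_words):
    """PC-relative target: next instruction address plus the offset in bytes."""
    return pc + 4 + 4 * offset_words

print(branch_target(40, 7))  # the beq at address 40 with offset 7
```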


Example: Branch Taken

[Figure omitted; annotation: "Target address calculator and register comparator"]


Example: Branch Taken (cont.)

[Figure omitted; annotation: "IF.Flush flushes the “and” instruction."]


Data Hazards for Branches: Example 1

• A branch instruction depends on data values (in registers) to
  make its decision, so it is prone to data hazards.
• If a comparison register is a destination of the 2nd or 3rd
  preceding ALU instruction

add $1, $2, $3        IF   ID   EX   MEM  WB
add $4, $5, $6             IF   ID   EX   MEM  WB
…                               IF   ID   EX   MEM  WB
beq $1, $4, target                   IF   ID   EX   MEM  WB

• Can resolve using forwarding


Data Hazards for Branches: Example 2

add $1, $2, $3        IF   ID   EX   MEM  WB
add $4, $5, $6             IF   ID   EX   MEM  WB
beq stalled                     IF   ID
beq $1, $4, target                        ID   EX   MEM  WB

• If a comparison register is a destination of the preceding ALU instruction
  or the 2nd preceding load instruction
  • Need 1 stall cycle


Data Hazards for Branches: Example 3

lw $1, addr           IF   ID   EX   MEM  WB
beq stalled                IF   ID
beq stalled                          ID
beq $1, $0, target                        ID   EX   MEM  WB

• If a comparison register is a destination of the immediately preceding load
  instruction
  • Need 2 stall cycles
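
The three branch examples follow one rule, sketched below. It assumes the branch is resolved in ID with forwarding into the comparator, that an ALU result exists at the end of EX, and that loaded data exists at the end of MEM. The function name and its arguments are hypothetical.

```python
def branch_stalls(producer, distance):
    """Stall cycles for a branch resolved in ID whose operand comes from `producer`.

    producer is "alu" or "load"; distance is 1 for the immediately preceding
    instruction, 2 for the 2nd preceding one, and so on.
    """
    # With the producer fetched in cycle 1, its result is produced in EX
    # (cycle 3) for an ALU instruction or in MEM (cycle 4) for a load.
    result_cycle = {"alu": 3, "load": 4}[producer]
    branch_id_cycle = 1 + distance + 1   # the branch's ID cycle if nothing stalls
    # The branch's ID must come in a cycle after the result is produced.
    return max(0, result_cycle + 1 - branch_id_cycle)

for producer, distance in [("alu", 2), ("alu", 1), ("load", 2), ("load", 1)]:
    print(producer, distance, branch_stalls(producer, distance))
```

This reproduces the cases above: no stall for the 2nd or 3rd preceding ALU instruction, one stall for the preceding ALU instruction or the 2nd preceding load, and two stalls for the immediately preceding load.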


Dynamic Branch Prediction

• In deeper and superscalar pipelines, the branch penalty is more significant
• Use dynamic prediction
  • Branch prediction buffer (aka branch history table)
  • Indexed by recent branch instruction addresses
  • Stores outcome (taken/not taken)
  • To execute a branch
    • Check the table, expect the same outcome
    • Start fetching from fall-through or target
    • If wrong, flush the pipeline and flip the prediction
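
A branch history table with one bit per entry can be sketched in a few lines. The table size, the indexing by low-order word-address bits, and the initial "not taken" state are assumptions made for illustration; the class name is hypothetical.

```python
class OneBitPredictor:
    """Branch history table with one bit per entry (True = predict taken)."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [False] * entries     # assumed initial state: predict not taken

    def _index(self, pc):
        return (pc >> 2) % self.entries    # low-order bits of the word address

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        """Record the actual outcome; return True if the prediction was wrong."""
        idx = self._index(pc)
        mispredicted = self.table[idx] != taken
        self.table[idx] = taken            # flipping on a miss equals storing the outcome
        return mispredicted

predictor = OneBitPredictor()
outcomes = [True] * 9 + [False]            # a loop branch: taken 9 times, then falls through
misses = sum(predictor.update(40, taken) for taken in outcomes)
print(misses)
```

For the loop-style outcome sequence in the usage lines, the predictor is wrong twice: on the first iteration and on the loop exit.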


Concluding Remarks

• Pipelining improves throughput by allowing functional units to be
  reused by different instructions
• Pipelining allows an instruction to complete in each clock cycle,
  but it requires very careful design and additional registers to store
  intermediate results between pipeline stages
• Pipelined control is implemented like single-cycle control, with the needed
  control signals passed down the pipeline
• Concurrency between instructions in the pipeline may cause
  • Data hazard: data is needed by an instruction before it is produced
    by a previous one
  • Structural hazard: a hardware unit is needed by an instruction
    while another is still using it
  • Control hazard: the next instruction cannot be determined in the
    next clock cycle
• Hazards can always be solved by delaying (inserting bubbles)


Concluding Remarks (cont.)

• Structural hazards are solved by:
  • Separating the instruction memory from the data memory
  • Writing to the register file in the first half of the clock cycle and
    reading from it in the second half
• Data hazards are solved by:
  • Forwarding/bypassing
  • Inserting bubbles
• Control hazards are solved by:
  • Hardware: add a comparator to complete the comparison earlier
  • Speculation: guess whether the branch is taken or not
  • Delaying the branch: fill the bubbles with useful work that is
    independent of the branch
