
Appendix A: Basic Pipelining: Basic and Intermediate Concepts

Rung-Bin Lin

Appendix-1

Appendix A. Pipelining: Basic and Intermediate Concepts
What is Pipelining?
Pipelining is an implementation technique whereby multiple
instructions are overlapped in execution.
Pipe stage (pipe segment)
Throughput
Machine cycle: The time required between moving an instruction one
step down the pipeline. This time is equal to the time required for the
slowest pipe stage.
In a computer, the machine cycle is usually one clock cycle.
The pipeline designer's goal is to balance the length of each pipe stage.
If the stages are perfectly balanced,

$$\text{Time per instruction on pipelined machine} = \frac{\text{Time per instruction on unpipelined machine}}{\text{Number of pipe stages}}$$


A Simple Implementation of a RISC ISA


Five-cycle implementation
Instruction fetch cycle (IF)
Instruction decode/register fetch cycle (ID)
Operand fetches;
Sign-extending the immediate field;
Decoding is done in parallel with reading registers. This technique is
known as fixed-field decoding;
Test the branch condition and compute the branch target address; the
branch completes at the end of this cycle.

Execution/effective address cycle (EX)


Memory reference;
Register-Register ALU instruction;
Register-Immediate ALU instruction;

Memory access/branch completion cycle (MEM)


Write-back cycle (WB)
Register-Register ALU instruction;
Register-Immediate ALU instruction;
Load instruction;


Performance of the Five-Cycle Implementation


CPI=4.54
Branch instructions (12%) take 2 cycles
Store instructions (10%) require 4 cycles
All other instructions take 5 cycles
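
The CPI figure above is a weighted average over the instruction mix. A minimal Python sketch of that arithmetic follows, assuming the remaining 78% of instructions all take 5 cycles; the names `mix` and `average_cpi` are illustrative and not from the slides.

```python
# Instruction mix from the slide: class -> (fraction of instructions, cycles).
mix = {
    "branch": (0.12, 2),
    "store": (0.10, 4),
}
# Assumption: every remaining instruction takes the full five cycles.
other_fraction = 1.0 - sum(fraction for fraction, _ in mix.values())
mix["other"] = (other_fraction, 5)


def average_cpi(instruction_mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in instruction_mix.values())


cpi = average_cpi(mix)
print(f"CPI = {cpi:.2f}")  # CPI = 4.54
```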


The Classic Five-Stage Pipeline for a RISC Processor



The RISC Pipeline with Registers



Instruction Issue
The process of letting an instruction move from the
instruction decode stage (ID) into the execution stage
(EX) of this pipeline.


Basic Performance Issues in Pipelining


Pipelining increases instruction throughput, but it does not
reduce the execution time of an individual instruction; pipeline
overhead in fact increases it slightly. Sources of overhead:
Pipeline register delay
Clock skew

Pipeline depth is limited by:
Pipeline latency
Pipe stage imbalance
Pipeline overhead

Example in A-10.
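
To see how stage imbalance and pipeline overhead cap the achievable speedup, here is a small Python sketch. The stage delays and the 0.2 ns overhead are made-up numbers for illustration only; they are not the figures from the textbook example referenced above, and the function names are hypothetical.

```python
def pipelined_clock(stage_times_ns, overhead_ns):
    """Clock period is set by the slowest stage plus register/skew overhead."""
    return max(stage_times_ns) + overhead_ns


def ideal_speedup(stage_times_ns, overhead_ns):
    """Unpipelined instruction time divided by the pipelined clock period,
    assuming one instruction completes per cycle (no stalls)."""
    unpipelined_time = sum(stage_times_ns)
    return unpipelined_time / pipelined_clock(stage_times_ns, overhead_ns)


# Hypothetical five-stage designs, delays in nanoseconds.
balanced = [1.0, 1.0, 1.0, 1.0, 1.0]
imbalanced = [0.8, 0.8, 1.4, 1.2, 0.8]

print(ideal_speedup(balanced, 0.0))    # perfectly balanced, no overhead
print(ideal_speedup(balanced, 0.2))    # overhead alone lowers the speedup
print(ideal_speedup(imbalanced, 0.2))  # imbalance lowers it further
```

With no overhead and balanced stages the speedup equals the number of stages; either effect pulls it below that bound.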


The Major Hurdle of Pipelining - Pipeline Hazards
A hazard is a situation that prevents the next instruction in
the instruction stream from executing during its designated
clock cycle.
Three classes of hazards
Structural hazards: arise from resource conflicts.
Data hazards: arise when an instruction depends on the result of a
previous instruction.
Control hazards: arise from branches and other instructions that
change the PC.

A hazard can force the pipeline to stall. When an instruction is
stalled to clear a hazard:
Instructions issued later than the stalled instruction are also stalled.
Instructions issued earlier than the stalled instruction must continue,
since otherwise the hazard would never clear.

Note that a cache miss stalls the whole pipeline.


Performance of Pipeline with Stalls


$$\text{Speedup from pipelining} = \frac{\text{Average instruction time unpipelined}}{\text{Average instruction time pipelined}} = \frac{\text{CPI unpipelined}}{\text{CPI pipelined}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}}$$

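When pipelining is thought of as decreasing the CPI,

$$\text{Speedup} = \frac{\text{CPI unpipelined}}{1 + \text{Pipeline stall cycles per instruction}} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$

(The second form holds when the unpipelined CPI equals the pipeline depth.)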


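When pipelining is thought of as improving the clock cycle time,

$$\text{Speedup} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \frac{\text{Clock cycle unpipelined}}{\text{Clock cycle pipelined}} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall cycles per instruction}}$$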
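
The speedup relations on these slides can be evaluated numerically. The sketch below is illustrative: the function names and the sample stall rate of 0.25 cycles per instruction are assumptions, not values from the slides.

```python
def speedup_general(cpi_unpipelined, clock_unpipelined,
                    cpi_pipelined, clock_pipelined):
    """Ratio of average instruction times, unpipelined over pipelined."""
    return (cpi_unpipelined * clock_unpipelined) / (cpi_pipelined * clock_pipelined)


def speedup_with_stalls(pipeline_depth, stall_cycles_per_instruction):
    """Special case: balanced stages, no clock overhead, ideal pipelined CPI of 1."""
    return pipeline_depth / (1.0 + stall_cycles_per_instruction)


# Hypothetical five-stage pipeline losing 0.25 cycles per instruction to stalls.
depth = 5
stalls = 0.25
print(speedup_with_stalls(depth, stalls))              # 4.0
# Same result from the general form: unpipelined CPI = depth, equal clocks.
print(speedup_general(depth, 1.0, 1.0 + stalls, 1.0))  # 4.0
```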


Structural Hazards
Arise from resource conflicts (example on A-14):
Some functional unit is not fully pipelined.
Some resource has not been duplicated enough.



Data Hazards
An operand access (register or memory) depends on the result of an
instruction that has not yet finished.



Forwarding (Bypassing) ALU Results to Minimize Hazards



Forwarding (Bypassing) Results to Store



Bypassing Results of LOAD



Data Hazard Classification


Consider two instructions i and j, with i occurring before j.
The possible hazards are:
RAW (read after write): j tries to read a source before i writes it.
WAW (write after write): j tries to write an operand before it is
written by i. For example,
    LW   R1, 0(R2)      IF  ID  EX  MEM1 MEM2 WB
    DADD R1, R2, R3         IF  ID  EX   WB
WAR (write after read): j tries to write a destination before it is read
by i. For example, if the read is done in the second half of MEM2 and
the write is done in the first half of WB:
    SW   0(R1), R2      IF  ID  EX  MEM1 MEM2 WB
    DADD R2, R3, R4         IF  ID  EX   WB
RAR (read after read): not a hazard.
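
The classification above can be expressed as set intersections over the registers each instruction reads and writes. This is a minimal sketch: the `Instr` record and `classify_hazards` function are hypothetical names, and the result only reports potential hazards by register name. Whether one actually occurs depends on the pipeline timing, as the two diagrams show.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    writes: set = field(default_factory=set)  # registers written
    reads: set = field(default_factory=set)   # registers read


def classify_hazards(i, j):
    """Potential hazards between i and j, where i precedes j in program order."""
    hazards = set()
    if i.writes & j.reads:
        hazards.add("RAW")   # j reads what i writes
    if i.writes & j.writes:
        hazards.add("WAW")   # both write the same register
    if i.reads & j.writes:
        hazards.add("WAR")   # j writes what i reads
    return hazards           # read-after-read is never a hazard


# The two examples from the slide.
lw = Instr("LW R1, 0(R2)", writes={"R1"}, reads={"R2"})
dadd1 = Instr("DADD R1, R2, R3", writes={"R1"}, reads={"R2", "R3"})
sw = Instr("SW 0(R1), R2", reads={"R1", "R2"})
dadd2 = Instr("DADD R2, R3, R4", writes={"R2"}, reads={"R3", "R4"})

print(classify_hazards(lw, dadd1))  # {'WAW'}
print(classify_hazards(sw, dadd2))  # {'WAR'}
```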


Data Hazards Requiring Stalls


Pipeline interlock
A piece of hardware that detects a hazard and stalls the pipeline
until the hazard is cleared.

Load interlock
Example (Fig. A.10 at A-21)


Control Hazards
Caused by instructions that change the PC.
Some basics
If a branch changes the PC to its target address, it is a taken
branch. If it does not change the PC, it falls through or it is not
taken.
Recall that if an instruction i is a taken branch, the PC is normally
not changed until the end of ID. A stall cycle is required.
Branch instruction     IF  ID  EX  MEM WB
Branch successor           IF  IF  ID  EX  MEM WB
Branch successor+1                 IF  ID  EX  MEM WB
Branch successor+2                     IF  ID  EX  MEM WB


Branch Penalty
Branch delay: the length of a control hazard.
Branch penalty: the branch delay turns into a branch penalty
unless it is dealt with.
The deeper the pipeline, the worse the branch penalty.
The number of branch stalls can be reduced in two ways:
Find out whether the branch is taken or not taken earlier in the
pipeline.
Compute the taken PC (i.e., the address of the branch target)
earlier.
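
A rough model of how branch stalls affect performance, assuming an ideal CPI of 1 and that every branch pays the same penalty. The 12% branch frequency is borrowed from the instruction mix quoted earlier in these slides; the penalty values and the function name are illustrative assumptions.

```python
def cpi_with_branch_stalls(branch_frequency, branch_penalty, ideal_cpi=1.0):
    """Effective CPI when each branch costs `branch_penalty` stall cycles."""
    return ideal_cpi + branch_frequency * branch_penalty


# Deeper pipelines tend to resolve branches later, so the penalty grows.
for penalty in (1, 2, 3):
    cpi = cpi_with_branch_stalls(0.12, penalty)
    print(f"penalty={penalty} cycles -> CPI={cpi:.2f}")
# penalty=1 cycles -> CPI=1.12
# penalty=2 cycles -> CPI=1.24
# penalty=3 cycles -> CPI=1.36
```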

Branch behavior in programs


Average frequency of taken branches : 67%
60% of the forward branches are taken.
85% of the backward branches are taken.


Reducing Pipeline Branch Penalties


Static branch prediction methods (Compile-time guess).
Freeze or flush the pipeline
Holding or deleting any instructions after the branch until the branch
destination is known.

Predict-not-taken (untaken) (Fig. A.12 in A-23)


Predict-taken
Does it have any advantage in the five-stage MIPS pipeline? No: the
target address is not known any earlier than the branch outcome.

Delayed branch:
The execution sequence with a branch delay of length n is
    Branch instruction
    Sequential successor 1
    Sequential successor 2
    ...
    Sequential successor n (n=1 for MIPS)
    Branch target if taken



Scheduling the Branch Delay Slot



Effectiveness of Scheduling Branch Delay Slots


Requirements for being effective
Scheduling from before: always effective.
Scheduling from target: effective when the branch is taken.
Scheduling from fall through: effective when the branch is not taken.

The limitation on delayed-branch scheduling arises from


The restrictions on the instructions that are scheduled into the
delay slots.
The ability to predict at compile time whether a branch is likely to
be taken or not.

Using canceling (nullifying) branches to relieve these limits

In a canceling branch, the instruction includes the direction in which
the branch was predicted. When the branch behaves as predicted,
the instruction in the branch delay slot is simply executed.
Otherwise, the instruction in the branch delay slot is turned
into a no-op.


How Is Pipelining Implemented?


Unpipelined 5-cycle implementation



Simple Pipelining Implementation for MIPS


Implementing the Control for MIPS Pipeline


Implementing the control focuses on detecting hazards and
generating the control signals for forwarding.
Hazard detection
All the data hazards can be checked and forwarding control
signals can be set during the ID phase. If a data hazard exists, the
instruction is stalled before it is issued.
Alternatively, hazards and forwarding are checked at the beginning
of a clock cycle that uses an operand (EX and MEM for the MIPS
pipeline).

Implementing the logic for hazard detection


Hazard detection by comparing the destination and sources of
adjacent instructions (fig. A.20 on page A-34).
An example shows how to detect all load interlocks while the
instruction using the load result is in the ID stage (Fig. A.21 on page A-34).
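
The load-interlock check described above compares the destination of a load in EX with the sources of the instruction in ID. The sketch below is one way to write that comparison, not a transcription of the referenced figures. The function and parameter names are hypothetical, and treating register 0 as never causing a stall (it is hardwired to zero in MIPS) is stated here as an explicit assumption.

```python
def needs_load_interlock(id_ex_is_load, id_ex_dest, if_id_sources):
    """Return True if the instruction in ID must stall for one cycle.

    id_ex_is_load  -- True if the instruction now in EX is a load
    id_ex_dest     -- destination register number of that instruction, or None
    if_id_sources  -- register numbers read by the instruction now in ID
    """
    if not id_ex_is_load or id_ex_dest is None:
        return False
    if id_ex_dest == 0:  # assumption: R0 is hardwired zero, never a hazard
        return False
    return id_ex_dest in if_id_sources


# A load into R1 is in EX; the instruction in ID reads R1 and R7: stall.
print(needs_load_interlock(True, 1, {1, 7}))   # True
# The instruction in ID reads R6 and R7 only: no stall.
print(needs_load_interlock(True, 1, {6, 7}))   # False
# The instruction in EX is an ALU operation: forwarding covers it, no stall.
print(needs_load_interlock(False, 1, {1, 7}))  # False
```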


Implementing Forwarding Logic


Forwarding sources: ALU or data memory output.
Forwarding destination: ALU input, data memory input,
or zero detection unit (for BRANCH).
The forwarding can be implemented by checking the following
conditions
EX/MEM.IR.destination = ID/EX.IR.source ?
MEM/WB.IR.destination = ID/EX.IR.source ?
MEM/WB.IR.destination = EX/MEM.IR.source ?
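
The three comparisons above can be sketched as two small selector functions. This is an illustrative model, not the actual control logic: the function names and return labels are hypothetical, `None` stands for "no register is written", and giving the EX/MEM result priority (it is the more recent value) and skipping register 0 are assumptions of the sketch.

```python
def forward_alu_operand(source_reg, ex_mem_dest, mem_wb_dest):
    """Pick where an ALU operand for the instruction in EX should come from."""
    if source_reg == 0:
        return "register file"
    if ex_mem_dest == source_reg:   # EX/MEM.IR.destination = ID/EX.IR.source
        return "EX/MEM"
    if mem_wb_dest == source_reg:   # MEM/WB.IR.destination = ID/EX.IR.source
        return "MEM/WB"
    return "register file"


def forward_store_data(store_source_reg, mem_wb_dest):
    """Pick the data-memory input for a store that is now in MEM."""
    if store_source_reg != 0 and mem_wb_dest == store_source_reg:
        return "MEM/WB"             # MEM/WB.IR.destination = EX/MEM.IR.source
    return "EX/MEM"


# R1 was written by both of the two preceding instructions: take the newer one.
print(forward_alu_operand(1, ex_mem_dest=1, mem_wb_dest=1))     # EX/MEM
print(forward_alu_operand(1, ex_mem_dest=2, mem_wb_dest=1))     # MEM/WB
print(forward_alu_operand(1, ex_mem_dest=2, mem_wb_dest=None))  # register file
```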


Forwarding Data to the Two ALU Inputs



Dealing with Branches in the Pipeline



What Makes Pipelining Hard to Implement


Exceptions (interrupts, faults) make pipelining
difficult to implement.
Instruction set complications


Types of Exceptions
Types

I/O device request


Invoking an OS service from a user program
Tracing instruction execution
Breakpoint
Integer arithmetic overflow or underflow
FP arithmetic anomaly
Page fault
Misaligned memory access
Memory-protection violation
Using an undefined instruction
Hardware malfunction
Power failure

Exceptions in different architectures (Fig. A.26 on page A-40).


Classification of Exceptions
Synchronous versus asynchronous
If the event occurs at the same place every time that the program
is executed with the same data and memory allocation, the event is
called synchronous.

User requested versus coerced


User maskable versus nonmaskable
Within versus between instructions
Depends on whether the event prevents instruction completion by
occurring in the middle of execution or whether it is recognized
between instructions.

Resume versus terminate (fig. 3.40 on page 182).


Action Requirements for Different Exception Types (Fig. A.27 on page A-42)
Actions
Resume
Terminate

The most difficult exceptions have two properties:


They occur within instructions (i.e. at EX or MEM stages).
They must be restartable (must save the PC of the
instruction at which to restart).


Exception Handling
Stopping and restarting execution
Force a trap instruction on the next IF
Until the trap is taken, turn off all writes for the faulting instruction and
for all instructions that follow in the pipeline.
After the exception-handling routine in the operating system receives
control, it immediately saves the PC of the faulting instruction.
Faulting instruction   IF  ID  EX  MEM WB
                           IF  ID  EX  MEM WB
                               IF  ID  EX  MEM WB
                                   IF  ID  EX  MEM WB
                                       IF  ID  EX  MEM
Trap instruction                           IF  ID  EX
If delayed branch is used, we need to save and restore as many PCs as the
length of the branch delay plus one.


Precise Interrupt
A pipeline supports precise interrupts if it can be stopped so that
the instructions just before the faulting instruction are completed
and those after it can be restarted from scratch.
Supporting precise interrupts is a requirement in many
systems.

Exceptions in DLX
With pipelining, multiple exceptions may occur in the
same clock cycle. (fig. A.28 on page A-44).


Implementations of Precise Exceptions


Principle
The pipeline should be able to handle the exceptions caused by
instruction i prior to the exceptions caused by instruction i+1.

Implementation
Hardware posts all exceptions caused by a given instruction in a
status vector associated with that instruction.
Once an exception indication is set in the exception status vector,
any control signal that may cause a data value to be written is
turned off.
When an instruction enters WB, the exception status vector is
checked. If any exceptions are posted, they are handled in the
order in which they would occur in time on an unpipelined
machine.
This will guarantee that all exceptions will be seen on instruction i
before any are seen on i+1.
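
The status-vector scheme can be modelled in a few lines. This is a toy sketch under stated assumptions: the class `InFlight`, the function `first_exception_at_wb`, and the stage list are all hypothetical names, and instructions are simply checked in program order as they would reach WB.

```python
STAGE_ORDER = ["IF", "ID", "EX", "MEM"]  # stages in which an exception can be posted


class InFlight:
    """An instruction in the pipeline together with its exception status vector."""

    def __init__(self, name):
        self.name = name
        self.status_vector = []    # list of (stage, cause) pairs
        self.writes_enabled = True

    def post(self, stage, cause):
        """Record an exception and turn off any later state-changing write."""
        self.status_vector.append((stage, cause))
        self.writes_enabled = False


def first_exception_at_wb(program_order):
    """Check status vectors in program order, as each instruction enters WB.

    Returns (instruction name, (stage, cause)) for the exception to handle
    first, or None if nothing was posted.
    """
    for instr in program_order:
        if instr.status_vector:
            # Within one instruction, use the order of an unpipelined machine.
            earliest = min(instr.status_vector,
                           key=lambda entry: STAGE_ORDER.index(entry[0]))
            return instr.name, earliest
    return None


ld = InFlight("LD")
dadd = InFlight("DADD")
# The younger instruction faults first in time (in IF), the older one later (in MEM).
dadd.post("IF", "instruction page fault")
ld.post("MEM", "data page fault")
print(first_exception_at_wb([ld, dadd]))  # ('LD', ('MEM', 'data page fault'))
```

Even though the DADD fault is raised earlier in time, the LD fault is handled first because the vectors are examined in program order.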


Instruction Committed
When an instruction is guaranteed to complete, it is called
committed.
In the MIPS pipeline, all instructions are committed when
they reach the end of the MEM stage and no instruction
updates the state before that stage. Thus precise exceptions
are straightforward.


Instruction Set Complications


Some machines have instructions that change the state in
the middle of instruction execution.
VAX: Autoincrement addressing mode.
VAX or IBM 360: String copy.
Implicitly set condition codes
Cause difficulties in scheduling any pipeline delays between
setting the condition code and the branch.
    ADD XXX      <--- Sets condition code C.
    ...          <--- Cannot place instructions that change C here.
    BR C, YYY    <--- Uses C for the branch.
In fact, the condition code must be treated as an operand that
requires hazard detection for RAW hazards with branches, regardless
of whether the condition code is set implicitly or explicitly.
Multicycle operations in VAX


Extending the MIPS Pipeline to Handle MultiCycle Operations


Assuming four separate functional units in our MIPS
implementation
Integer unit
Handle loads and stores, ALU operations and branches.
FP and integer multiplier
FP adder
FP and integer divider

If an instruction cannot proceed to the EX stage, the entire
pipeline behind that instruction will be stalled.


MIPS Pipeline with Multi-cycle Functional Units



Pipelining Multi-cycle Functional Units



Latency and Initiation (Repeat) Interval


Latency
The number of intervening cycles between an instruction that
produces a result and an instruction that uses the result.

Initiation (repeat) interval


The number of cycles that must elapse between issuing two
operations of a given type.

Latency and initiation interval for pipelining multi-cycle
functional units

Functional unit          Latency   Initiation interval
Integer ALU              0         1
Data memory access       1         1
FP add                   3         1
FP (integer) multiply    6         1
FP (integer) divide      24        25
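
The two definitions translate directly into stall counts. The sketch below uses the table's values and assumes a single-issue pipeline with full forwarding; it ignores special cases (for example, a store consuming the result), and the function names are hypothetical.

```python
# Values from the table: unit -> (latency, initiation interval).
UNITS = {
    "integer ALU": (0, 1),
    "data memory access": (1, 1),
    "FP add": (3, 1),
    "FP multiply": (6, 1),
    "FP divide": (24, 25),
}


def raw_stall_cycles(producer_unit, independent_instructions_between=0):
    """Stalls seen by an instruction that uses the producer's result.

    Latency is the number of intervening cycles needed between producer and
    consumer, so independent instructions in between hide part of it.
    """
    latency, _ = UNITS[producer_unit]
    return max(0, latency - independent_instructions_between)


def structural_stall_cycles(unit, cycles_since_last_issue):
    """Stalls before another operation can issue to the same unit."""
    _, interval = UNITS[unit]
    return max(0, interval - cycles_since_last_issue)


print(raw_stall_cycles("data memory access"))    # 1  (the load delay)
print(raw_stall_cycles("FP add"))                # 3
print(raw_stall_cycles("FP multiply", 2))        # 4
print(structural_stall_cycles("FP divide", 1))   # 24 (divider is not pipelined)
print(structural_stall_cycles("FP add", 1))      # 0  (fully pipelined)
```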


Hazards and Forwarding in Longer Latency Pipelines
Hazard detection and forwarding work as before, with these differences:
Structural hazards can occur because the divide unit is not fully
pipelined.
The number of register writes required in a cycle can be larger than 1
because instructions have varying running times.
WAW hazards are possible, but WAR hazards are not possible.
Instructions can complete in a different order than they were
issued, causing problems with exceptions.
Stalls for RAW hazards will be more frequent because of longer
latency.
Assuming all hazard detection is done in ID, three checks must be
done before issuing an instruction:
Check for structural hazards
Check for a RAW data hazard
Check for a WAW data hazard


RAW Hazards Caused by Longer Pipeline


Fig. A.33



Structural Hazards in Longer Pipeline


Fig. A.34



Maintaining Precise Exceptions (1)


Problems caused by out-of-order completion
    DIV.D  F0, F2, F4
    ADD.D  F10, F10, F8
    SUB.D  F12, F12, F14

Four possible approaches


Ignore the problem and settle for imprecise exceptions
Buffer the results of an operation until all the operations that were
issued earlier are completed.
History file approach: Buffer the original register values.
Future file approach: Keep the newer values of registers.
Allow the exceptions to become somewhat imprecise, but keep
enough information so that the trap-handling routines can create a
precise sequence for exceptions. This means knowing what
operations were in the pipeline and their PCs.


Maintaining Precise Exceptions (2)


Worst-case scenario:
Instruction 1: a long-running instruction that interrupts.
Instruction 2: not completed.
...
Instruction n-1: not completed.
Instruction n: completed. <-- The latest completed instruction.
The software must simulate instruction 1 through instruction n-1
and restart execution at instruction n+1.
Allow instruction issue to continue only if it is certain that all
the instructions before the issuing instruction will complete
without causing an exception. This sometimes means stalling the
machine to maintain precise exceptions.


Number of Stalls per FP Operation



Performance of a MIPS FP Pipeline



Overview of The MIPS R4000 Pipeline


An implementation of MIPS64
Eight pipeline stages (superpipelining)



Load Delay in MIPS R4000



Branch Delay in MIPS R4000



CPI of MIPS R4000



Concluding Remarks
We can spend a little money to buy a very powerful
computer today.
