
CS-421 Parallel Processing BE (CIS) Batch 2004-05

Handout_8

Delayed Branching
Delayed branching is a compiler-supported solution to control hazards. The compiler rearranges
the code so that the effect of a branch instruction is delayed by as many cycles as are needed
to resolve the branch. It does this by placing, immediately after the branch, one or more
instructions that execute regardless of whether the branch is taken or not. Such instructions
are said to be in the branch delay slot.
In our MIPS example, the branch target address (BTA) is computed and the branch condition is
evaluated in the ID stage, so a misprediction (i.e. the branch is taken although we guessed it
to be untaken) costs a 1-cycle penalty. The branch delay slot in our example is therefore
1 instruction long. In pipelines where the branch is resolved later in the pipeline, the branch
delay slot may comprise multiple instructions.
The behavior of the MIPS pipeline employing delayed branching is shown in the following diagrams.
The branch instruction is assumed to be the i-th instruction.

Untaken branch instr              IF  ID  EX  M   WB
Instr i + 1 (branch delay slot)       IF  ID  EX  M   WB
Instr i + 2                               IF  ID  EX  M   WB
Instr i + 3                                   IF  ID  EX  M   WB
Instr i + 4                                       IF  ID  EX  M   WB

Taken branch instr                IF  ID  EX  M   WB
Instr i + 1 (branch delay slot)       IF  ID  EX  M   WB
Branch target                             IF  ID  EX  M   WB
Branch target + 1                             IF  ID  EX  M   WB
Branch target + 2                                 IF  ID  EX  M   WB
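
The two diagrams can be reproduced with a short Python sketch. It is only an illustration of the timing described above, assuming one delay slot and a branch resolved in ID; the names delayed_branch_schedule and render are hypothetical and not part of the handout.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]


def delayed_branch_schedule(taken, following=3):
    """Return (label, first_cycle) pairs for a branch with ONE delay slot.

    Assumption: the branch is resolved in ID, so the instruction fetched
    right after the delay slot is already the correct successor.
    """
    labels = ["Taken branch instr" if taken else "Untaken branch instr",
              "Instr i + 1 (delay slot)"]
    for k in range(following):
        if taken:
            labels.append("Branch target" if k == 0 else f"Branch target + {k}")
        else:
            labels.append(f"Instr i + {k + 2}")
    # One instruction enters IF every cycle: no bubble in either case.
    return [(label, cycle) for cycle, label in enumerate(labels, start=1)]


def render(schedule):
    """Format the schedule as an aligned text diagram, 4 columns per cycle."""
    width = max(len(label) for label, _ in schedule) + 2
    stages = "".join(stage.ljust(4) for stage in STAGES).rstrip()
    rows = []
    for label, start in schedule:
        rows.append(label.ljust(width) + " " * (4 * (start - 1)) + stages)
    return "\n".join(rows)


if __name__ == "__main__":
    print(render(delayed_branch_schedule(taken=False)))
    print()
    print(render(delayed_branch_schedule(taken=True)))
```

In both cases the instruction in the delay slot enters IF in cycle 2 and the correct successor enters IF in cycle 3, so no cycle is lost.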

Schemes for Scheduling Branch Delay Slot(s)


The job of the compiler is to make the successor instructions (those in the branch delay slot) valid
and useful. There are three delay-slot-scheduling schemes used in practice:
a. From Before Branch
b. From Target
c. From Fall Through
a. From Before Branch
Here the delay slot is filled with an independent instruction from before the branch.
This is the best choice, because that instruction has to be executed anyway and so the slot
always does useful work. Options b and c are used only when this option is not possible.
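
The independence requirement can be written as a small dependence check. The following Python sketch is an assumption-laden illustration, not a compiler algorithm from the handout: the Instr record, the function can_fill_from_before and the register names in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: opcode, destination, source registers."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def can_fill_from_before(candidate: Instr, branch: Instr,
                         between: Sequence[Instr] = ()) -> bool:
    """True if `candidate`, which precedes `branch`, may move into the delay slot.

    `between` holds the instructions located between the candidate and the
    branch; the candidate must be independent of them and of the branch.
    """
    # The branch condition must not read the candidate's result.
    if candidate.dest is not None and candidate.dest in branch.srcs:
        return False
    for instr in between:
        # An intervening instruction reads or overwrites the candidate's result.
        if candidate.dest is not None and (
                candidate.dest in instr.srcs or candidate.dest == instr.dest):
            return False
        # An intervening instruction produces a value the candidate reads.
        if instr.dest is not None and instr.dest in candidate.srcs:
            return False
    return True


if __name__ == "__main__":
    add = Instr("add", "$s1", ("$s2", "$s3"))
    print(can_fill_from_before(add, Instr("beq", None, ("$s2", "$zero"))))
    print(can_fill_from_before(add, Instr("beq", None, ("$s1", "$zero"))))
```

In the usage example the add can fill the slot when the branch tests $s2, but not when the branch tests $s1, the register the add writes.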


b. From Target


In this code sequence, the branch uses $s1, which prevents the add instruction (whose
destination is $s1) from being moved after the branch, i.e. into the branch delay slot. Here
the branch delay slot is instead scheduled from the target of the branch. The target
instruction usually has to be copied rather than moved, because it can be reached via another
path.
c. From Fall Through

add $1, $2, $3
if $1 == 0 then
    [Delay Slot]
sub $4, $5, $6

becomes

add $1, $2, $3
if $1 == 0 then
    sub $4, $5, $6

To make this optimization legal for (b) and (c), it must be OK to execute the instruction in
the delay slot (here the sub) when the branch goes in the unexpected direction.
OK means that the work done might be wasted but the program will still execute correctly.
This is the case, for example, if $4 is an unused temporary register when the branch goes in
the unexpected direction.
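
One way to read "unused temporary" is that the destination register is not live on the unexpected path. The Python sketch below checks this for a straight-line path only. The functions live_in and ok_in_delay_slot and the (dest, srcs) tuple format are hypothetical, and the sketch ignores other side effects such as stores or exceptions.

```python
def live_in(path, live_out=frozenset()):
    """Registers whose old value is still needed at the start of `path`.

    `path` is a list of (dest, srcs) tuples for a straight-line instruction
    sequence; `live_out` holds registers still needed after the path ends.
    """
    written, live = set(), set()
    for dest, srcs in path:
        # A read counts only if the register was not written earlier on the path.
        live.update(reg for reg in srcs if reg not in written)
        if dest is not None:
            written.add(dest)
    return live | (set(live_out) - written)


def ok_in_delay_slot(slot_dest, unexpected_path, live_out=frozenset()):
    """True if overwriting `slot_dest` is harmless on the unexpected path."""
    return slot_dest not in live_in(unexpected_path, live_out)


if __name__ == "__main__":
    # $4 is overwritten before it is read, so the sub's work is merely wasted.
    print(ok_in_delay_slot("$4", [("$4", ("$7", "$8")), ("$9", ("$4",))]))
    # $4 is read first, so executing the sub would corrupt the program.
    print(ok_in_delay_slot("$4", [("$9", ("$4", "$7"))]))
```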

Summary
Scheduling strategy   Requirements                          When Improves Performance

From before branch    Branch must not depend on the         Always
                      rescheduled instructions

From target           Must be OK to execute rescheduled     When branch is taken with high probability,
                      instructions if branch is not taken   such as backward branches in loops. May
                                                            enlarge program if instructions are duplicated

From fall through     Must be OK to execute rescheduled     When branch is not taken
                      instructions if branch is taken
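
The last column of the summary can be turned into a rough cost model. The Python sketch below is a simplification under stated assumptions: an ideal CPI of 1, every delay slot filled by the named strategy, and a slot counted as a lost cycle whenever its instruction does no useful work. The function names and the example numbers (20% branches, 90% taken) are hypothetical, not measurements.

```python
def useful_fraction(strategy, p_taken):
    """Fraction of filled delay slots that do useful work."""
    if strategy == "before":
        return 1.0            # the instruction was needed in any case
    if strategy == "target":
        return p_taken        # useful only when the branch is taken
    if strategy == "fall_through":
        return 1.0 - p_taken  # useful only when the branch is not taken
    raise ValueError(f"unknown strategy: {strategy}")


def effective_cpi(branch_freq, strategy, p_taken, slots=1):
    """Ideal CPI of 1 plus the cycles lost to delay slots that do no useful work."""
    wasted = 1.0 - useful_fraction(strategy, p_taken)
    return 1.0 + branch_freq * slots * wasted


if __name__ == "__main__":
    for strategy in ("before", "target", "fall_through"):
        print(strategy, round(effective_cpi(0.2, strategy, 0.9), 3))
```

Under these assumptions a mostly-taken branch favors scheduling from the target over scheduling from the fall-through path, which matches the table.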

Limitations
The limitations on delayed-branch scheduling arise from:
– the restrictions on the instructions that are scheduled into the delay slots and
– our ability to predict at compile time whether a branch is likely to be taken or not.
Delayed branching is now losing popularity. As machines move to longer pipelines and issue
multiple instructions per clock cycle, the branch delay grows beyond what a single delay slot
can hide.
Canceling Branch
To improve the ability of the compiler to fill branch delay slots, most machines with conditional
branches have introduced a canceling (or annulling) branch. In a canceling branch, the branch
instruction encodes the direction that was predicted for the branch when the delay slot was
scheduled at compile time. If the branch behaves as predicted, the instruction in the branch
delay slot is fully executed. However, if the branch goes against the prediction, the
instruction in the delay slot is turned into a NOP by the processor hardware.
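
The annulment rule can be sketched in a few lines of Python. This is a behavioral illustration only, assuming a register file modeled as a dictionary and a delay-slot instruction of the form dest = fn(src1, src2); the function run_delay_slot and the register values are hypothetical.

```python
import operator


def run_delay_slot(regs, slot, predicted_taken, actually_taken):
    """Return the register file after the delay slot of a canceling branch.

    `slot` is (dest, fn, src1, src2). The slot instruction takes effect only
    if the branch went the way the compiler predicted; otherwise it is
    annulled and behaves like a NOP.
    """
    result = dict(regs)
    if predicted_taken != actually_taken:
        return result  # annulled: no architectural state changes
    dest, fn, src1, src2 = slot
    result[dest] = fn(regs[src1], regs[src2])
    return result


if __name__ == "__main__":
    regs = {"$4": 0, "$5": 9, "$6": 4}
    sub = ("$4", operator.sub, "$5", "$6")  # sub $4, $5, $6
    print(run_delay_slot(regs, sub, predicted_taken=True, actually_taken=True))
    print(run_delay_slot(regs, sub, predicted_taken=True, actually_taken=False))
```

With a canceling branch the compiler no longer has to prove that the slot instruction is harmless on the unexpected path, because the hardware discards it there.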
******
