Handout_8
Delayed Branching
This is a compiler-supported solution to control hazards. The idea is to let the compiler
rearrange the code so that the effect of a branch instruction is delayed by however many cycles
are required to resolve the dependency on the branch outcome. This is accomplished by placing,
after the branch, one or more instructions that execute regardless of whether the branch is taken
or not. Such instructions are said to be in the branch delay slot.
In the MIPS example, we compute the branch target address (BTA) and evaluate the branch condition
in the ID stage, and hence suffer a 1-cycle penalty in case of a misprediction (i.e. the branch is
taken although we guessed it to be untaken). This means that in our example the branch delay slot
holds 1 instruction. In pipelines where the branch is tested later, the branch delay slot may
comprise multiple instructions.
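The relation between the stage that resolves the branch and the size of the delay slot can be sketched as below. The function name `branch_delay_slots` is hypothetical, and the sketch assumes that one instruction is fetched per cycle and that the branch outcome steers the fetch in the cycle right after the resolving stage.

```python
# Classic five-stage MIPS pipeline, using the stage names of this handout.
STAGES = ["IF", "ID", "EX", "M", "WB"]

def branch_delay_slots(resolve_stage: str) -> int:
    """Instructions fetched after the branch before its outcome is known.

    Assumption: the outcome is available at the end of `resolve_stage`
    and is used by the instruction fetch of the following cycle.
    """
    return STAGES.index(resolve_stage)

print(branch_delay_slots("ID"))  # 1 (the MIPS example in this handout)
print(branch_delay_slots("EX"))  # 2 (branch tested one stage later)
```

Under these assumptions, each stage by which the branch test is postponed adds one instruction to the delay slot.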
The behavior of the MIPS pipeline employing delayed branching is shown in the following diagrams.
The branch instruction is assumed to be the ith instruction.
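[Diagrams: pipeline timing (IF ID EX M WB) for an untaken branch, whose last row is Instr i + 4, and for a taken branch, whose last row is Branch target + 2.]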
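The two timing diagrams can be regenerated with a short script. This is a sketch under assumptions: only the row labels "Instr i + 4" and "Branch target + 2" come from the handout; the other labels and the helper name `pipeline_diagram` are chosen for illustration, with a single delay slot at position i + 1 and no stalls.

```python
STAGES = ["IF", "ID", "EX", "M", "WB"]

def pipeline_diagram(labels):
    """Return one text row per instruction; instruction k enters IF in cycle k + 1."""
    width = max(len(label) for label in labels)
    rows = []
    for k, label in enumerate(labels):
        # Shift each instruction one cycle (one 2-character cell) to the right.
        cells = ["  "] * k + STAGES
        rows.append(label.ljust(width) + "  " + " ".join(c.ljust(2) for c in cells).rstrip())
    return rows

# Assumed labels: the delay-slot instruction is i + 1 in both cases.
untaken = ["Untaken branch (i)", "Delay slot (i + 1)",
           "Instr i + 2", "Instr i + 3", "Instr i + 4"]
taken = ["Taken branch (i)", "Delay slot (i + 1)",
         "Branch target", "Branch target + 1", "Branch target + 2"]

for title, labels in (("Branch not taken", untaken), ("Branch taken", taken)):
    print(title)
    print("\n".join(pipeline_diagram(labels)))
    print()
```

In both cases the delay-slot instruction follows the branch without a bubble; only the instructions fetched after it differ.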
CS-421 Parallel Processing BE (CIS) Batch 2004-05
b. From Target
In this code sequence, the branch uses $s1, so the add instruction (whose destination is $s1)
cannot be moved after the branch, i.e. into the branch delay slot. Instead, the branch delay slot
is scheduled from the target of the branch; the target instruction usually has to be copied,
because it can also be reached via another path.
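The reason the add cannot be moved can be expressed as a small dependence check. This is a minimal sketch, not a full scheduler: the `Instr` record and `can_fill_from_before` are hypothetical names, the candidate is assumed to sit immediately before the branch, and all operands other than $s1 are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Instr:
    text: str              # assembly text, for display only
    dest: Optional[str]    # register written, or None
    srcs: Tuple[str, ...]  # registers read

def can_fill_from_before(candidate: Instr, branch: Instr) -> bool:
    """True if `candidate`, located just before `branch`, may move into the delay slot.

    The branch must not read the register that the candidate writes.
    """
    return candidate.dest is None or candidate.dest not in branch.srcs

# Operands other than $s1 are assumed for illustration.
add = Instr("add $s1, $s2, $s3", dest="$s1", srcs=("$s2", "$s3"))
branch = Instr("beqz $s1, target", dest=None, srcs=("$s1",))
other = Instr("or $t0, $t1, $t2", dest="$t0", srcs=("$t1", "$t2"))

print(can_fill_from_before(add, branch))    # False: the branch reads $s1
print(can_fill_from_before(other, branch))  # True: no dependence on the branch
```

When the check fails for every instruction before the branch, the compiler has to fall back on the target or the fall-through path to fill the slot.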
c. From Fall Through
[Figure: the branch "if $1 == 0 then" followed by an empty delay slot becomes the same branch with the delay slot filled by an instruction from the fall-through path.]
To make this optimization legal for (b) and (c), it must be OK to execute the instruction in the
delay slot (the sub) when the branch goes in the unexpected direction. OK means that the work done
might be wasted, but the program will still execute correctly. This is the case, for example, if
$4 is an unused temporary register when the branch goes in the unexpected direction.
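The legality condition for (b) and (c) amounts to a liveness test on the unexpected path. The sketch below assumes the set of live registers on that path is already known (for example from a compiler's liveness analysis); the function name `ok_to_speculate` and the register sets are hypothetical.

```python
def ok_to_speculate(dest_reg, live_on_unexpected_path, has_side_effects=False):
    """True if running the delay-slot instruction on the unexpected path only wastes work.

    dest_reg: register written by the delay-slot instruction (e.g. "$4").
    live_on_unexpected_path: registers whose current values are still needed there.
    has_side_effects: assumed flag for instructions such as stores, whose
    effects cannot simply be ignored.
    """
    if has_side_effects:
        return False
    return dest_reg not in live_on_unexpected_path

# $4 is an unused temporary on the unexpected path: overwriting it is harmless.
print(ok_to_speculate("$4", live_on_unexpected_path={"$1", "$2"}))  # True
# $4 is still needed on the unexpected path: the sub would corrupt it.
print(ok_to_speculate("$4", live_on_unexpected_path={"$4", "$2"}))  # False
```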
Summary
Scheduling strategy    Requirements                             When it improves performance
From fall through      Must be OK to execute the rescheduled    When the branch is not taken
                       instructions if the branch is taken
Limitations
The limitations on delayed-branch scheduling arise from:
– the restrictions on the instructions that are scheduled into the delay slots and
– our ability to predict at compile time whether a branch is likely to be taken or not.
Delayed branching is now losing popularity. As machines move to longer pipelines and issue
multiple instructions per clock cycle, a single delay slot offers little help.
Canceling Branch
To improve the ability of the compiler to fill branch delay slots, most machines with conditional
branches have introduced a canceling or annulling branch. In a canceling branch, the branch
instruction encodes the direction that was predicted when the delay slot was scheduled at
compile time. If the branch behaves as predicted, the instruction in the branch delay slot is fully
executed. However, if the branch behaves against the prediction, the instruction in the delay slot
is turned into a NOP by the processor hardware.
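The behavior of a canceling branch can be modeled in a few lines. This is an illustrative sketch only: `execute_delay_slot`, the dictionary register file, and the register values are assumptions, not a description of any particular processor.

```python
def execute_delay_slot(regs, delay_slot, predicted_taken, actually_taken):
    """Return the register state after the delay slot of a canceling branch.

    regs: dict mapping register names to values (left unmodified).
    delay_slot: function that applies the delay-slot instruction to a dict.
    """
    result = dict(regs)
    if predicted_taken == actually_taken:
        delay_slot(result)  # prediction correct: instruction fully executed
    # Otherwise the hardware annuls the instruction, so it acts as a NOP.
    return result

def sub_instr(regs):
    # Hypothetical delay-slot instruction: sub $4, $5, $6
    regs["$4"] = regs["$5"] - regs["$6"]

regs = {"$4": 0, "$5": 9, "$6": 2}
print(execute_delay_slot(regs, sub_instr, predicted_taken=True, actually_taken=True)["$4"])   # 7
print(execute_delay_slot(regs, sub_instr, predicted_taken=True, actually_taken=False)["$4"])  # 0
```

Because the annulled instruction leaves no trace, the compiler no longer has to prove that the delay-slot instruction is harmless on the unexpected path.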