Speculation
Outline
Speculation
Re-order buffers
Limits to ILP
Speculation
Branch Prediction and Out-of-Order Execution
Example:
for (i=0; i<1000; i++)
C[i] = A[i]+B[i];
Branches cannot be avoided,
but their outcome can be predicted very accurately
Branch prediction:
predict the outcome as accurately as possible,
optimizing for the frequent case
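To make "very accurate" concrete for the loop above, here is a minimal sketch of one common predictor, a 2-bit saturating counter. The slides do not name a predictor, so the scheme and the names `two_bit_counter`, `predict`, `update`, and `loop_branch_mispredictions` are hypothetical. It assumes the loop-closing branch is tested at the bottom of the loop, so it is taken on every iteration except the last.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 2-bit saturating counter: states 0..3, predict taken when >= 2. */
typedef struct {
    int state;
} two_bit_counter;

static bool predict(const two_bit_counter *c) {
    assert(c->state >= 0 && c->state <= 3);   /* counter must stay in range */
    return c->state >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
static void update(two_bit_counter *c, bool taken) {
    if (taken && c->state < 3) c->state++;
    else if (!taken && c->state > 0) c->state--;
}

/* Feed the loop-closing branch of an n-iteration loop through the counter:
   taken on the first n-1 checks, not taken on the last one.
   Returns the number of mispredictions. */
static int loop_branch_mispredictions(two_bit_counter *c, int n) {
    int misses = 0;
    for (int i = 0; i < n; i++) {
        bool taken = (i < n - 1);
        if (predict(c) != taken) misses++;
        update(c, taken);
    }
    return misses;
}
```

Starting from the weakly-taken state (2), the 1000-iteration loop mispredicts only the final exit, and because one not-taken outcome only moves the counter back to 2, re-entering the loop does not cost an extra miss.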
Speculative execution with recovery: if the prediction is wrong, roll the execution back
Misprediction is the less frequent event, but can we ignore it?
Exception Behavior
Preserving exception behavior: exceptions must be raised exactly as in sequential execution
Same order as in sequential execution
No extra exceptions
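Example:
    DADDU R2,R3,R4
    BEQZ  R2,L1
    LW    R1,0(R2)
L1:
If the LW is executed speculatively before the BEQZ resolves and R2 is 0, it may raise a memory-protection exception that sequential execution would never raise.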
Exceptions in Order
Solutions:
Early detection of FP exceptions
Software mechanisms that restore a precise exception state before resuming execution
Delaying instruction completion until we know an
exception is impossible
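The last solution can be sketched on the DADDU/BEQZ/LW example. This is a minimal illustration under simplifying assumptions: a toy word-addressed memory of 8 words in which address 0 (and anything out of range) faults. The hoisted LW records the fault instead of trapping, and the fault is raised only once it is known that the LW is on the executed path. The names `mem`, `lw_speculative`, and `run_example` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy memory model (assumption): word addresses 1..MEM_WORDS-1 are readable,
   address 0 and out-of-range addresses fault. */
#define MEM_WORDS 8
static int32_t mem[MEM_WORDS];

typedef struct {
    int32_t value;   /* loaded value, valid only if !faulted */
    bool faulted;    /* exception recorded but not yet raised */
} spec_load;

/* LW R1,0(R2) executed speculatively: never trap, just record the fault. */
static spec_load lw_speculative(int32_t addr) {
    spec_load r = { 0, false };
    if (addr <= 0 || addr >= MEM_WORDS) r.faulted = true;
    else r.value = mem[addr];
    return r;
}

/* DADDU R2,R3,R4 ; BEQZ R2,L1 ; LW R1,0(R2) ; L1:
   The LW runs before the branch resolves. Returns true if an exception must be
   raised; otherwise updates *r1 only when the LW is on the correct path. */
static bool run_example(int32_t r3, int32_t r4, int32_t *r1) {
    int32_t r2 = r3 + r4;                /* DADDU */
    spec_load ld = lw_speculative(r2);   /* LW hoisted above the branch */
    assert(ld.faulted || (r2 > 0 && r2 < MEM_WORDS));
    bool branch_taken = (r2 == 0);       /* BEQZ resolves afterwards */
    if (branch_taken) return false;      /* LW squashed: its fault is discarded */
    if (ld.faulted) return true;         /* LW really executes: raise the fault now */
    *r1 = ld.value;
    return false;
}
```

With R3 = R4 = 0 the branch is taken, so the recorded fault is silently dropped and no extra exception appears, matching sequential execution.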
Precise Interrupts
An interrupt is precise if the saved process
state corresponds with a sequential model of
program execution where one instruction
completes before the next begins.
Tomasulo had:
In-order issue, out-of-order execution, and
out-of-order completion
Need to fix the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.
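Reorder buffer (ROB): reorders out-of-order instructions back into program order at the time they write registers/memory (commit)
[Figure: pipeline with IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, load/store buffers (L-buf, S-buf), reservation stations (RS), functional units FU1 and FU2, and DM]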
Reorder Buffer
Entry fields: Program Counter, Dest reg, Result, Ready?, Exceptions?, Branch or L/W?
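The ROB entry fields listed above map naturally onto a struct kept in a circular buffer. This is a minimal sketch, not a hardware description: `ROB_SIZE` is assumed to be 16, register values are modeled as `int64_t`, and the names `rob_alloc`, `rob_write_result`, and `rob_commit` are hypothetical. Entries are allocated in program order at issue, filled in any order as results arrive, and retired only from the head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROB_SIZE 16   /* assumption */

typedef struct {
    uint64_t pc;                /* Program Counter of the instruction */
    int      dest_reg;          /* destination register, -1 if none (branch/store) */
    int64_t  result;            /* value written to the register file at commit */
    bool     ready;             /* result has been produced */
    bool     exception;         /* exception seen during execution, deferred */
    bool     is_branch_or_mem;  /* branch or load/store: needs special handling */
} rob_entry;

typedef struct {
    rob_entry e[ROB_SIZE];
    int head, tail, count;      /* head = oldest entry, tail = next free slot */
} rob;

/* Issue: allocate in program order. Returns the entry index, or -1 if full (stall). */
static int rob_alloc(rob *r, uint64_t pc, int dest_reg, bool is_branch_or_mem) {
    if (r->count == ROB_SIZE) return -1;
    int idx = r->tail;
    r->e[idx] = (rob_entry){ pc, dest_reg, 0, false, false, is_branch_or_mem };
    r->tail = (r->tail + 1) % ROB_SIZE;
    r->count++;
    return idx;
}

/* Write result: may happen out of program order. */
static void rob_write_result(rob *r, int idx, int64_t value, bool exception) {
    assert(idx >= 0 && idx < ROB_SIZE);
    r->e[idx].result = value;
    r->e[idx].exception = exception;
    r->e[idx].ready = true;
}

/* Commit: only the head may retire. Returns 1 on commit, 0 if the ROB is empty
   or the head is not ready, -1 if the head carries an exception (precise: every
   older instruction has already committed and no younger one has). */
static int rob_commit(rob *r, int64_t regfile[]) {
    if (r->count == 0 || !r->e[r->head].ready) return 0;
    rob_entry *h = &r->e[r->head];
    if (h->exception) return -1;
    if (h->dest_reg >= 0) regfile[h->dest_reg] = h->result;
    r->head = (r->head + 1) % ROB_SIZE;
    r->count--;
    return 1;
}
```

If the younger of two instructions finishes first, it simply waits in the ROB: the register file is not updated until the older instruction at the head has committed.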
Speculative Execution Recovery
When to flush?
[Figure: pipeline with IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, L-buf, S-buf, RS, FU1, FU2, and DM, used to ask when wrong-path instructions must be flushed]
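One common answer to "when to flush?" is to wait until the mispredicted branch reaches the head of the ROB: at that point every older instruction has committed, so every younger entry is on the wrong path and can be discarded at once. Recovering earlier is also possible but needs more bookkeeping. The sketch below shows only the ROB side of that policy; resetting rename/register-status state and the reservation stations is omitted, and the names and `ROB_SIZE` of 16 are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ROB_SIZE 16   /* assumption */

typedef struct {
    bool     ready;
    bool     is_branch;
    bool     mispredicted;   /* set when the branch resolves */
    uint64_t target;         /* correct next PC if mispredicted */
} rob_entry;

typedef struct {
    rob_entry e[ROB_SIZE];
    int head, tail, count;
} rob;

/* Commit the head entry; if it is a mispredicted branch, flush every younger
   entry and redirect fetch. Returns true if a flush happened. */
static bool commit_or_flush(rob *r, uint64_t *fetch_pc) {
    assert(r->count >= 0 && r->count <= ROB_SIZE);
    if (r->count == 0 || !r->e[r->head].ready) return false;
    rob_entry h = r->e[r->head];
    r->head = (r->head + 1) % ROB_SIZE;   /* the branch itself commits */
    r->count--;
    if (h.is_branch && h.mispredicted) {
        r->tail = r->head;                /* discard all wrong-path entries */
        r->count = 0;
        *fetch_pc = h.target;             /* restart at the correct successor */
        return true;
    }
    return false;
}
```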
Complexity of ROB
Assume dual-issue superscalar
Load/Store machine with three-operand instructions
64 registers
16-entry circular buffer
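A rough count of the lookup hardware under these assumptions, taking as a simplifying assumption that at issue every source operand is compared against the destination-register field of every ROB entry (an associative search):

```latex
\begin{aligned}
\text{dest-reg field width} &= \log_2 64 = 6 \text{ bits} \\
\text{head/tail pointer width} &= \log_2 16 = 4 \text{ bits} \\
\text{source lookups per cycle} &= 2 \text{ instructions} \times 2 \text{ sources} = 4 \\
\text{comparators} &= 4 \times 16 \text{ entries} = 64 \quad (\text{each } 6 \text{ bits wide}) \\
\text{intra-group checks} &= 2 \quad (\text{2nd instruction's sources vs. 1st instruction's dest})
\end{aligned}
```

Because several in-flight entries may target the same register, the matches must also pass through priority logic that selects the youngest matching entry older than the reading instruction.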
Code Example
Loop: LD R2, 0(R1)
DADDIU R2, R2, #1
SD R2, 0(R1)
DADDIU R1, R1, #4
BNE R2, R3, Loop
How would this code be executed?
Inst | Issue | Exec | Memory read | Write results | Commit
LD   |       |      |             |               |
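Whatever cycles end up in the Exec and Write-results columns, the Commit column follows one rule: each instruction commits no earlier than the cycle after its result is written, never before an older instruction, and at most two per cycle on a dual-issue machine. The sketch below applies that rule; the commit width of 2 is an assumption, and the ready cycles in the usage note are hypothetical numbers, not the slide's answer.

```c
#include <assert.h>

#define COMMIT_WIDTH 2   /* assumption: up to 2 commits per cycle */

/* ready[i]: cycle in which instruction i's result is written (any order).
   commit[i]: computed commit cycle, in program order. */
static void commit_cycles(const int ready[], int commit[], int n) {
    for (int i = 0; i < n; i++) {
        assert(ready[i] >= 0);
        int c = ready[i] + 1;                          /* after write-back */
        if (i > 0 && c < commit[i - 1]) c = commit[i - 1];  /* in order */
        /* Commit cycles are non-decreasing, so if instruction i-2 commits in
           cycle c, instruction i-1 does too and the cycle is already full. */
        if (i >= COMMIT_WIDTH && c == commit[i - COMMIT_WIDTH]) c++;
        commit[i] = c;
    }
}
```

For example, with hypothetical write-result cycles {5, 6, 7, 4, 7} for LD, DADDIU, SD, DADDIU, BNE, the second DADDIU finishes first (cycle 4) yet commits in cycle 8, together with the SD, and the BNE is pushed to cycle 9 by the commit-width limit.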
Summary
Reservation stations: implicit register renaming to a larger set of registers, plus buffering of source operands
Prevents registers from becoming a bottleneck
Avoids the WAR and WAW hazards of the scoreboard
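How tag-based renaming removes both hazards can be sketched with a register status table. This is a minimal model under stated assumptions: 32 registers, reservation-station tags numbered from 1 (0 meaning "value is in the register file"), and only the register-file side of the result broadcast (updating waiting reservation stations is omitted). All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32   /* assumption */

/* tag[r] == 0: value[r] is current; otherwise the tag that will produce it. */
typedef struct {
    int64_t value[NUM_REGS];
    int     tag[NUM_REGS];
} regstate;

typedef struct {
    int     qj;   /* producer tag to wait for (0 = value already captured) */
    int64_t vj;   /* captured source value */
} operand;

/* Read a source at issue: copy the value now (removes WAR) or wait on a tag. */
static operand read_source(const regstate *s, int reg) {
    assert(reg >= 0 && reg < NUM_REGS);
    operand o = { s->tag[reg], 0 };
    if (o.qj == 0) o.vj = s->value[reg];
    return o;
}

/* Issue a writer: the destination register is renamed to the new tag. */
static void rename_dest(regstate *s, int reg, int new_tag) {
    s->tag[reg] = new_tag;
}

/* Result broadcast: update a register only if this tag is still its latest
   producer, so an older writer cannot overwrite a younger one (removes WAW). */
static void broadcast(regstate *s, int tag, int64_t result) {
    for (int r = 0; r < NUM_REGS; r++) {
        if (s->tag[r] == tag) {
            s->value[r] = result;
            s->tag[r] = 0;
        }
    }
}
```

In the usage scenario below, a reader of R1 captures its value before two writers are issued; the younger writer finishes first, and the older writer's late result is ignored because R1 no longer waits on its tag.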
Lasting Contributions
Dynamic scheduling
Register renaming
Load/store disambiguation
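Load/store disambiguation decides whether a load may bypass older stores. The sketch below uses a conservative policy (an assumption, not the only option): the load waits while any older store address is unknown; otherwise it takes its value from the youngest older store to the same address, or from memory if none matches. It also assumes store data is already available; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical store-buffer entry, in program order (index 0 = oldest). */
typedef struct {
    bool     addr_known;
    uint64_t addr;
    int64_t  data;
} store_entry;

typedef enum { LOAD_WAIT, LOAD_FROM_MEMORY, LOAD_FORWARDED } load_action;

/* Decide what a load from `addr` may do, given the n stores older than it. */
static load_action disambiguate(const store_entry older[], int n,
                                uint64_t addr, int64_t *forwarded) {
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (!older[i].addr_known) return LOAD_WAIT;   /* possible conflict */
    for (int i = n - 1; i >= 0; i--) {                /* youngest match wins */
        if (older[i].addr == addr) {
            *forwarded = older[i].data;
            return LOAD_FORWARDED;
        }
    }
    return LOAD_FROM_MEMORY;                          /* no address conflict */
}
```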
Advantages of HW (Tomasulo) vs. SW (VLIW) Speculation
HW determines address conflicts at run time
HW has better (dynamic) branch prediction
HW maintains a precise exception model
Works across multiple implementations
On the other hand, SW speculation makes the HW design much simpler