Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
ECE232: Hardware Organization and Design
Part 11: Pipelining Chapter 4/6 http://www.ecs.umass.edu/ece/ece232/ ECE232: Pipelining I 2 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren CPI Calculation CPI stands for average number of Cycles Per Instruction Assume an instruction mix of 24% loads, 12% stores, 44% R- format, 18% branches, and 2% jumps CPI = 0.24 * 5 + 0.12 * 4 + 0.44 * 4 + 0.18 * 3 + 0.02 * 3 = 4.04 Speedup? Question: Can we achieve a CPI of 1??? ECE232: Pipelining I 3 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Speeding up through pipelining Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, and fold Washer takes 30 minutes Dryer takes 30 minutes Folder takes 30 minutes Stasher takes 30 minutes to put clothes into drawers A B C D ECE232: Pipelining I 4 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Sequential Laundry Sequential laundry takes 8 hours for 4 loads If they learned pipelining, how long would laundry take? 30 T a s k O r d e r B C D A Time 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 6 PM 7 8 9 10 11 12 1 2 AM ECE232: Pipelining I 5 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Pipelined Laundry: Start work ASAP Pipelined laundry takes 3.5 hours for 4 loads! T a s k O r d e r 12 2 AM 6 PM 7 8 9 10 11 1 Time B C D A 30 30 30 30 30 30 30 ECE232: Pipelining I 6 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Pipelining Lessons Pipelining doesnt help latency of single task, it helps throughput of entire workload Multiple tasks operating simultaneously using different resources Potential speedup = Number pipe stages Pipeline rate limited by slowest pipeline stage Unbalanced lengths of pipe stages reduces speedup Time to fill pipeline and time to drain it reduces speedup 6 PM 7 8 9 Time B C D A 30 30 30 30 30 30 30 T a s k O r d e r ECE232: Pipelining I 7 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Pipelining Instructions Time (in cycles) I n s t r u c t i o n F D EX M W F D EX M W F D EX M W F D EX M W F D EX M F D EX Fetch = 10 ns Decode = 6 ns Execute = 8 ns Memory = 10 ns Write back = 6 ns W M W ECE232: Pipelining I 8 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Ifetch Reg Exec Mem Wr Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Load Ifetch Reg Exec Mem Wr Ifetch Reg Exec Mem Load Store Ifetch Reg Exec Mem Wr Store Clk Load Store Waste Ifetch R-type Ifetch Reg Exec Mem Wr R-type Cycle 1 Cycle 2 Multiple Cycle Implementation: Pipeline Implementation: Single Cycle Implementation: ECE232: Pipelining I 9 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Why Pipeline? Suppose we execute 100 instructions Single Cycle Machine 45 ns/cycle x 1 CPI x 100 inst = 4500 ns Multicycle Machine 10 ns/cycle x 4.04 CPI (for the given inst mix) x 100 inst = 4040 ns Instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps Ideal pipelined machine (with 5 stages) 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns Speedup=4.33 vs. single-cycle 3.88 vs. multi-cycle (for the given inst mix) ECE232: Pipelining I 10 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Why Pipeline? Because the resources are there! I n s t r. O r d e r Time (clock cycles) Inst 1 Inst 2 Inst 3 Inst 5 Inst 4 A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg ECE232: Pipelining I 11 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Pipelining Rules Forward traveling signals at each stage are latched Only perform logic on signals in the same stage signal labeling useful to prevent errors, e.g., IR R , IR A , IR M , IR W Backward travelling signals at each stage represent hazards A L U IMem Reg DMem Reg Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 ECE232: Pipelining I 12 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren MIPS Pipelined Datapath State registers between pipeline stages to isolate them Read Address Instruction Memory Add P C 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 16 32 ALU Shift left 2 Add Data Memory Address Write Data Read Data I F e t c h / D e c D e c / E x e c E x e c / M e m M e m / W B IF:IFetch ID:Dec EX:Execute MEM: MemAccess WB: WriteBack System Clock Sign Extend Inst 1 Inst 2 Inst 3 Inst 4 Inst 5 ECE232: Pipelining I 13 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Pipeline Hazards Data hazards: an instruction uses the result of a previous instruction (RAW) ADD R1, R2, R3 or SW R1, 4(R2) SUB R4, R1, R5 LW R3, 4(R2) Control hazards: the address of the next instruction to be executed depends on a previous instruction BEQ R1,R2,CONT SUB R6,R7,R8
CONT: ADD R3,R4,R5
Structural hazards: two instructions need access to the same resource e.g., single memory shared for instruction fetch and load/store ECE232: Pipelining I 14 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren I n s t r. O r d e r Time (clock cycles) lw Inst 1 Inst 2 Inst 4 Inst 3 A L U Mem Reg Mem Reg A L U Mem Reg Mem Reg A L U Mem Reg Mem Reg A L U Mem Reg Mem Reg A L U Mem Reg Mem Reg Structural Hazard Reading data from memory Reading instruction from memory Fix with separate instruction and data memories (I$ and D$) ECE232: Pipelining I 15 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Data Hazards (RAW) Time (in cycles) F I n s t r u c t i o n D EX M W F D EX M W Write Data to R1 Here Get data from R1 Here ADD R1, R2, R3 SUB R4, R1, R5 ECE232: Pipelining I 16 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren stall stall One Way to handle a Data Hazard I n s t r. O r d e r add $1, A L U IM Reg DM Reg sub $4,$1,$5 A L U IM Reg DM Reg By waiting introducing stalls but impacts CPI stall ECE232: Pipelining I 17 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren I n s t r. O r d e r Time (clock cycles) Inst 1 Inst 2 Inst 3 Inst 5 Inst 4 A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg A L U Im Reg Dm Reg Must allow Wr/Rd in REG in same cycle Split cycle into two halves ECE232: Pipelining I 18 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren stall stall Only two stall cycles I n s t r. O r d e r add $1, A L U IM Reg DM Reg sub $4,$1,$5 and $6,$1,$7 A L U IM Reg DM Reg A L U IM Reg DM Reg Write in 1 st half, Read in 2 nd half ECE232: Pipelining I 19 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Another Way to Fix a Data Hazard I n s t r. O r d e r add $1, A L U IM Reg DM Reg sub $4,$1,$5 and $6,$1,$7 A L U IM Reg DM Reg A L U IM Reg DM Reg by forwarding xor $4,$1,$5 or $8,$1,$9 A L U IM Reg DM Reg A L U IM Reg DM Reg Time ECE232: Pipelining I 20 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Register File (write and then read) I n s t r. O r d e r Time (clock cycles) Inst 1 Inst 2 A L U IM Reg DM Reg A L U IM Reg DM Reg A L U IM Reg DM Reg A L U IM Reg DM Reg Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half add $1, or $8,$1,$9 clock edge that controls loading of pipeline state registers ECE232: Pipelining I 21 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Internal data forwarding I n s t r. O r d e r add $1, A L U IM Reg DM Reg sub $4,$1,$5 and $6,$1,$7 A L U IM Reg DM Reg A L U IM Reg DM Reg Fix data hazards by forwarding results as soon as they are available to where they are needed xor $4,$1,$5 or $8,$1,$9 A L U IM Reg DM Reg A L U IM Reg DM Reg ALU-to-ALU forwarding vs. full forwarding ECE232: Pipelining I 22 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Forwarding with Load-use Data Hazards A L U IM Reg DM Reg A L U IM Reg DM Reg A L U IM Reg DM Reg A L U IM Reg DM Reg A L U IM Reg DM Reg sub needs to stall Will still need one stall cycle even with forwarding I n s t r. O r d e r lw $1,4($2) sub $4,$1,$5 and $6,$1,$7 xor $4,$1,$5 or $8,$1,$9 ECE232: Pipelining I 23 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Injecting Bubbles Read Address Instruction Memory Add P C 4 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 16 32 ALU Shift left 2 Add Data Memory Address Write Data Read Data I F e t c h / D e c D e c / E x e c E x e c / M e m M e m / W B IF ID EX MEM WB System Clock Sign Extend Inst -2 Inst -1 lw sub and Inst -1 lw bubble sub and Inst 2 Inst 1 lw sub and ECE232: Pipelining I 24 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren 3 Types of Data Hazards RAW (read after write) only hazard for fixed pipelines later instruction must read after earlier instruction writes WAW (write after write) variable-length pipeline later instruction must write after earlier instruction writes WAR (write after read) instruction with late read (e.g., waiting for an execution unit) later instruction must write after earlier instruction reads div $1,$4,$3 add $1,$2,$5 mlt $4,$1,$3 add $1,$2,$5 add $1,$2,$3 sub $4,$1,$5 W M EX D F W M EX D F W M EX D F s4 s5 E1 s2 s1 s3 W E3 E2 D F E2 E1 E3 W E5 E4 D F W M EX D F ECE232: Pipelining I 25 Adapted from Computer Organization and Design, Patterson&Hennessy,UCB, Kundu,UMass Koren Control Hazard Time (in cycles) F I n s t r u c t i o n D EX M W F D EX M W Destination Available Here Need Destination Here JR R25 ... XX: ADD ... Simple solution: Flush Instruction fetch until branch resolved