
5-stage Pipeline CPU hardware

Pipeline CPU hardware

Distribution of control signals in pipeline CPU


Data hazards
Control hazards
 A control hazard occurs whenever the normal sequential flow of the
program changes (caused by a branch/jump, a subroutine call, an interrupt, a return from
interrupt, etc.)
Structural hazards
 [1] A multiply instruction holds the Ex stage for two or more clock cycles.
 [2] Two or more instructions in the pipeline try to read/write the register file =>
since there is only one read/write port, only one instruction at a time is allowed to
read/write the register file.
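
The cost of the first structural hazard can be estimated with a short calculation. This is a minimal sketch, assuming an ideal in-order 5-stage pipeline with no data or control hazards, where every extra cycle an instruction spends in the Ex stage delays everything behind it by one cycle. The function name and the list-of-latencies input are illustrative and do not come from the slides.

```python
def pipeline_cycles(ex_latencies, stages=5):
    """Total clock cycles to run a sequence of instructions through an
    in-order pipeline.

    ex_latencies[i] is the number of cycles instruction i occupies the
    Ex stage (1 for ordinary instructions, 2 or more for a multiply).
    Assumption: the Ex-stage conflict is the only source of stalls.
    """
    if not ex_latencies:
        return 0
    # Ideal case: fill the pipeline once, then finish one instruction per cycle.
    ideal = stages + len(ex_latencies) - 1
    # Each extra Ex cycle adds one stall cycle to the total.
    stalls = sum(latency - 1 for latency in ex_latencies)
    return ideal + stalls


# Four single-cycle instructions, then the same sequence with a 3-cycle multiply.
print(pipeline_cycles([1, 1, 1, 1]))  # 8
print(pipeline_cycles([1, 3, 1, 1]))  # 10
```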
ARM Architecture
 ARM core :
 Pipelined RISC CPU with a reduced number of fixed-size instructions
 Offers high code density, small size, low power
 Applications include cell phones, handheld PDAs, cameras
 But differs from pure RISC (to gain some advantages):
 Variable cycle execution for certain instructions to support multiple
load and store
 Inline barrel shifter leading to few complex instructions –
preprocessing one operand enhances computational power
 Thumb state (16-bit instruction set) to improve code density
 Conditional execution of instructions for smooth pipeline operation
 DSP instructions to support signal processing
 Performance: speed => MIPS @ clock freq., DMIPS @ clock freq.;
power => mW @ (voltage, clock freq., technology)

DMIPS
 Dhrystone is a synthetic benchmark program for system programming. So
DMIPS measures not just instructions per second but gives an idea of how
long overall it will take one processor to perform a task versus another,
taking into account the different number and kinds of instructions.
 Industry has adopted the VAX 11/780 as the reference 1 MIPS
machine. The VAX 11/780 achieves 1757 Dhrystones per second.
 The DMIPS figure of a given computing system is calculated by measuring
the number of Dhrystones executed per second and dividing that by 1757.
So if a computing system is able to execute 140560 Dhrystones per second,
then its rating is 140560/1757 = 80 DMIPS.
 To compare two computing systems that run at different clock frequencies,
DMIPS is normalized to the clock frequency,
e.g. 60 DMIPS @ 40 MHz = 1.5 DMIPS/MHz
 Newer benchmark => CoreMark
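
The DMIPS arithmetic above can be written as two small functions. This is a sketch that only restates the slide's formulas; the function and constant names are illustrative.

```python
# Dhrystones per second achieved by the VAX 11/780, the reference 1 MIPS machine.
VAX_11_780_DHRYSTONES_PER_SEC = 1757


def dmips(dhrystones_per_second):
    """DMIPS rating = measured Dhrystones per second / 1757."""
    return dhrystones_per_second / VAX_11_780_DHRYSTONES_PER_SEC


def dmips_per_mhz(dmips_value, clock_mhz):
    """Normalize a DMIPS rating to the clock frequency in MHz."""
    return dmips_value / clock_mhz


print(dmips(140560))          # 80.0
print(dmips_per_mhz(60, 40))  # 1.5
```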

 Sign extend -> converts a signed 8/16-bit value to a 32-bit value and places it in a register
 Two source registers (Rn and Rm) and one result register (Rd)
 Barrel shifter => preprocesses Rm before it enters the ALU
 MAC unit => for multiply-and-accumulate operations
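
What the sign-extend block does can be modelled in a few lines. This is a minimal sketch, assuming registers are represented as unsigned 32-bit Python integers; the function name is illustrative.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of `value` (8 or 16 in the datapath
    described above) to a 32-bit two's-complement register image."""
    value &= (1 << bits) - 1          # keep only the narrow field
    sign_bit = 1 << (bits - 1)
    if value & sign_bit:
        value -= 1 << bits            # negative number: subtract 2**bits
    return value & 0xFFFFFFFF         # back to an unsigned 32-bit pattern


print(hex(sign_extend(0x80, 8)))     # 0xffffff80  (-128)
print(hex(sign_extend(0x7F, 8)))     # 0x7f        (+127)
print(hex(sign_extend(0x8000, 16)))  # 0xffff8000  (-32768)
```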

On Chip Debug Hardware

ARM Architecture
 ARM Core under study is ARM7TDMI
 ARM state => Instructions are 32-bit wide and address is word aligned
 Thumb state => Instructions are 16-bit and address is half-word aligned
ARM Modes:
 Different modes of the ARM processor are defined for specific purposes
 User mode => most application software runs in this mode

ARM Architecture
 Exception modes => Supervisor, IRQ, FIQ, abort, undefined
 Non exception modes=> User, System
 ‘supervisor’ mode => runs embedded operating system routines
 ‘User’ mode => runs Application programs
 IRQ & FIQ modes => handles hardware interrupts
 Abort mode => handles memory access violations
 Undefined mode => handles undefined instruction
ARM Architecture
CPSR:
 32-bit register holding condition flags, control bits, and status and extension fields
 Only privileged modes have full write access to CPSR
 Every processor mode except user mode can change mode by writing
directly to the mode bits of the CPSR.

 N = 1 if the MSB of the ALU result is 1
 Z = 1 if the ALU result is zero
 C = 1 if the ALU operation produces a carry (for a subtraction, C is reset when a borrow occurs, i.e. when the unsigned result would be negative)
 V = 1 if the ALU operation overflowed (meaningful for signed numbers only)
 Flags are updated only if the suffix ‘S’ is added to the instruction
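
The flag rules above can be checked with a small model of ADDS and SUBS. This is a sketch, assuming 32-bit operands held as unsigned Python integers; the function names are illustrative, and the carry rule for subtraction follows the ARM convention that C is the inverse of borrow.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit ADDS of a and b."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = result >> 31                       # MSB of the result
    z = int(result == 0)
    c = int(full > MASK32)                 # carry out of bit 31
    # Overflow: operands have the same sign, result has the other sign.
    v = ((~(a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v


def sub_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit SUBS computing a - b."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                        # C = 1 means no borrow
    # Overflow: operands differ in sign and the result's sign differs from a.
    v = (((a ^ b) & (a ^ result)) >> 31) & 1
    return result, n, z, c, v


print(add_flags(0x7FFFFFFF, 1))  # (2147483648, 1, 0, 0, 1): signed overflow
print(sub_flags(3, 5))           # (4294967294, 1, 0, 0, 0): borrow clears C
```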
ARM Architecture
 When the processor is executing in ARM state:
 All instructions are 32 bits wide
 All instructions must be word aligned
 Therefore the pc value is stored in bits [31:2] with bits [1:0]
undefined (as instruction cannot be halfword or byte aligned).

 When the processor is executing in Thumb state:


 All instructions are 16 bits wide
 All instructions must be halfword aligned
 Therefore the pc value is stored in bits [31:1] with bit [0] undefined
(as instruction cannot be byte aligned).

 When the processor is executing in Jazelle state:


 All instructions are 8 bits wide
 Executes Java bytecodes
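
The alignment rules for the three states can be summarized as a mask applied to the pc. This is a sketch only: the slides call the low bits "undefined", and treating them as cleared for the fetch address is an assumption made here for illustration, as are the function name and the state strings.

```python
def fetch_address(pc, state):
    """Instruction fetch address for a 32-bit pc in the given state.

    Assumption: the undefined low-order bits are simply masked off.
    """
    if state == "ARM":
        return pc & 0xFFFFFFFC   # word aligned: bits [1:0] dropped
    if state == "Thumb":
        return pc & 0xFFFFFFFE   # halfword aligned: bit [0] dropped
    if state == "Jazelle":
        return pc & 0xFFFFFFFF   # byte aligned: all 32 bits are used
    raise ValueError(f"unknown state: {state}")


print(hex(fetch_address(0x8003, "ARM")))      # 0x8000
print(hex(fetch_address(0x8003, "Thumb")))    # 0x8002
print(hex(fetch_address(0x8003, "Jazelle")))  # 0x8003
```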

Banked Registers:

ARM Architecture

 Total 37 registers = 30 general purpose + 6 status + 1 PC
 A different set of registers is visible in each mode of operation
 User and System modes use the same set of registers
 Shaded registers (banked registers) are hidden from User/System mode and
available only in exception modes
 R13 = stack pointer (SP)
 R14 = link register (LR) -> holds the return address of a subroutine when it is
called with the BL instruction:
BL{<cc>} subroutine_label (LR automatically stores the return address)
 Each exception mode has its own SP and LR
 The return can be done in two ways:
 MOV PC, LR or
 BX LR
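
The count of 37 registers can be reproduced by listing the register banks. This is a sketch: the exact banking (R8-R14 for FIQ, R13-R14 for the other exception modes, one SPSR per exception mode) is the standard ARM7TDMI arrangement rather than something spelled out in the text above, and the register names with mode suffixes are the conventional ones.

```python
# Registers visible in User/System mode: R0-R15, where R15 is the PC.
USER_REGISTERS = [f"R{i}" for i in range(16)]

# Banked general-purpose registers private to each exception mode.
BANKED = {
    "FIQ": [f"R{i}_fiq" for i in range(8, 15)],   # R8-R14
    "IRQ": ["R13_irq", "R14_irq"],                # own SP and LR
    "Supervisor": ["R13_svc", "R14_svc"],
    "Abort": ["R13_abt", "R14_abt"],
    "Undefined": ["R13_und", "R14_und"],
}

# One CPSR plus one saved status register (SPSR) per exception mode.
STATUS = ["CPSR"] + [f"SPSR_{m}" for m in ("fiq", "irq", "svc", "abt", "und")]


def register_counts():
    """Count general-purpose, PC, and status registers across all banks."""
    general = (len(USER_REGISTERS) - 1) + sum(len(r) for r in BANKED.values())
    status = len(STATUS)
    return {"general": general, "pc": 1, "status": status,
            "total": general + 1 + status}


print(register_counts())
# {'general': 30, 'pc': 1, 'status': 6, 'total': 37}
```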

ARM Family and Cores

ARM family  Core          Features                                                                      ARM ISA version  Thumb version
ARM7        ARM7TDMI      3-stage pipeline, Thumb state                                                 ARMv4T           v1
            ARM720T       as ARM7TDMI, cache
            ARM740T       as ARM7TDMI, cache
ARM9        ARM920T       5-stage pipeline, Thumb, data and inst. cache, MMU                            ARMv4T
            ARM922T       5-stage pipeline, Thumb, data and inst. cache, MMU
            ARM946E       5-stage pipeline, Thumb, Enhanced DSP instructions, caches, MPU               ARMv5TE
            ARM926EJ      5-stage pipeline, Thumb, Jazelle DBX, Enhanced DSP instructions, caches, MMU  ARMv5TEJ
ARM11       ARM1156T2(F)  8-stage pipeline, SIMD, Thumb-2, VFP, Enhanced DSP instructions               ARMv6T2          v2

ARM Cortex Series: Profile A, Profile R, Profile M


ARM Data Processing
 Syntax: <opcode>{<cc>}{S} Rd, Rn, op2
 ‘op2’ normally comes from the barrel shifter and can be an immediate value, a register (Rm), or Rm shifted/rotated by an immediate (#) or by a register (Rs)
 Rm and Rs should not be PC (r15) in the shift/rotate-by-register mode of ‘op2’
 Shift and rotate affect the N, Z, C flags
 The # value for shift and rotate is a 5-bit unsigned integer

ARM - The Barrel Shifter
 LSL: Logical Shift Left (CF <- Destination <- 0)
Multiplication by a power of 2
 LSR: Logical Shift Right (0 -> Destination -> CF)
Division by a power of 2
 ASR: Arithmetic Shift Right (Destination -> CF)
Division by a power of 2, preserving the sign bit
 ROR: Rotate Right (Destination -> CF)
Bit rotate with wrap around from LSB to MSB
 RRX: Rotate Right Extended (Destination -> CF)
Single bit rotate with wrap around from CF to MSB
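
The five barrel shifter operations can be modelled on 32-bit values as follows. This is a minimal sketch, assuming shift amounts in the range 0 to 31 and ignoring the special encodings real ARM instructions use for amounts of 0 and 32; each function returns the shifted value together with the carry flag (CF) it would produce. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n, carry_in=0):
    """Logical shift left: zeros enter at the LSB, last bit out goes to CF."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    carry = (value >> (32 - n)) & 1
    return (value << n) & MASK32, carry


def lsr(value, n, carry_in=0):
    """Logical shift right: zeros enter at the MSB, last bit out goes to CF."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    carry = (value >> (n - 1)) & 1
    return value >> n, carry


def asr(value, n, carry_in=0):
    """Arithmetic shift right: the sign bit is replicated at the MSB end."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    carry = (value >> (n - 1)) & 1
    result = value >> n
    if value >> 31:                          # negative: fill with ones
        result |= (MASK32 << (32 - n)) & MASK32
    return result, carry


def ror(value, n, carry_in=0):
    """Rotate right: bits leaving the LSB re-enter at the MSB."""
    value &= MASK32
    if n == 0:
        return value, carry_in
    result = ((value >> n) | (value << (32 - n))) & MASK32
    return result, result >> 31              # CF = new MSB


def rrx(value, carry_in):
    """Rotate right extended: one-bit rotate through CF (CF enters the MSB)."""
    value &= MASK32
    return (carry_in << 31) | (value >> 1), value & 1


print(lsl(1, 3))                     # (8, 0): multiply by 2**3
print(hex(asr(0x80000000, 4)[0]))    # 0xf8000000: sign preserved
print(rrx(1, 0))                     # (0, 1): old LSB moves into CF
```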

ARM Data Processing Instructions

 CMP, CMN, TST & TEQ always update the flags (even if ‘S’ is not used as a
suffix) and do not alter any register. They use only Rn and op2.
 MOV & MVN use only two operands, i.e. Rd and ‘op2’

Data processing:
 ADD R9, R5, R5, LSL #3 ; R9 = R5 + (R5*8) = 9*R5
 RSB R9, R5, R5, LSR #3 ; R9 = (R5/8) - R5
 MOV R12, R4, ROR R3 ; R12 = R4 rotated right by the value in R3
 CMP R7, R5 ; update flags after (R7 - R5)
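
The first three instructions above can be checked numerically. This is a sketch, assuming 32-bit unsigned register values; each helper models one specific instruction form and the helper names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def add_lsl(rn, rm, shift):
    """ADD Rd, Rn, Rm, LSL #shift  ->  Rd = Rn + (Rm << shift)."""
    return (rn + (rm << shift)) & MASK32


def rsb_lsr(rn, rm, shift):
    """RSB Rd, Rn, Rm, LSR #shift  ->  Rd = (Rm >> shift) - Rn."""
    return (((rm & MASK32) >> shift) - rn) & MASK32


def mov_ror(rm, rs):
    """MOV Rd, Rm, ROR Rs  ->  Rd = Rm rotated right by Rs (modulo 32)."""
    rm &= MASK32
    n = rs & 31
    return ((rm >> n) | (rm << (32 - n))) & MASK32


r5 = 5
print(add_lsl(r5, r5, 3))        # 45, i.e. 9 * R5
print(hex(rsb_lsr(16, 16, 3)))   # 0xfffffff2, i.e. 2 - 16 = -14
print(hex(mov_ror(0x1, 4)))      # 0x10000000
```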

Conditional Execution:
 ARM instructions can be made to execute conditionally by postfixing
them with the appropriate condition code field (e.g. MOVEQ R0, R1)
 The condition checks the status of the appropriate flags
 If the condition is true, the instruction executes normally; otherwise it is not executed
 Advantage => greater pipeline performance and higher code density, leading to
higher instruction throughput
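
How a condition code is evaluated against the N, Z, C, V flags can be sketched as a lookup table. The mnemonics and flag tests are the standard ARM condition codes; the dictionary and function names are illustrative.

```python
# Each entry maps a condition mnemonic to a test on the flags (n, z, c, v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,               # equal
    "NE": lambda n, z, c, v: z == 0,               # not equal
    "CS": lambda n, z, c, v: c == 1,               # carry set / unsigned >=
    "CC": lambda n, z, c, v: c == 0,               # carry clear / unsigned <
    "MI": lambda n, z, c, v: n == 1,               # negative
    "PL": lambda n, z, c, v: n == 0,               # positive or zero
    "VS": lambda n, z, c, v: v == 1,               # overflow
    "VC": lambda n, z, c, v: v == 0,               # no overflow
    "HI": lambda n, z, c, v: c == 1 and z == 0,    # unsigned >
    "LS": lambda n, z, c, v: c == 0 or z == 1,     # unsigned <=
    "GE": lambda n, z, c, v: n == v,               # signed >=
    "LT": lambda n, z, c, v: n != v,               # signed <
    "GT": lambda n, z, c, v: z == 0 and n == v,    # signed >
    "LE": lambda n, z, c, v: z == 1 or n != v,     # signed <=
    "AL": lambda n, z, c, v: True,                 # always
}


def condition_passed(cc, n, z, c, v):
    """True if an instruction with condition suffix `cc` would execute."""
    return CONDITIONS[cc](n, z, c, v)


# Flags after comparing a register holding 0 with #0: N=0, Z=1, C=1, V=0.
print(condition_passed("EQ", 0, 1, 1, 0))  # True:  MOVEQ executes
print(condition_passed("GT", 0, 1, 1, 0))  # False: MOVGT is skipped
```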

ARM Conditional Execution
 Set the flags, and then use various condition codes (here r0 = a, r1 = x)
if (a==0) x=0;
if (a>0) x=1;
    CMP r0, #0
    MOVEQ r1, #0
    MOVGT r1, #1
 Use a sequence of conditional compare instructions
if (a==4 or a==10) x=0;
    CMP r0, #4
    CMPNE r0, #10
    MOVEQ r1, #0

 Reduces the number of instructions

while (a!=b) {
if (a>b) a=a-b; else b=b-a; } (here r1 = a, r2 = b)
------------------------------------------------------------------------------------------
Using branches:
loop:     CMP r1, r2
          BEQ finish
          BLT lessthan
          SUB r1, r1, r2
          B loop
lessthan: SUB r2, r2, r1
          B loop
finish

Using conditional execution:
loop1:    CMP r1, r2
          SUBGT r1, r1, r2
          SUBLT r2, r2, r1
          BNE loop1
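
The two assembly versions can be compared with a small model that counts instructions passing through the pipeline and taken branches. This is a sketch, assuming positive inputs (the loops do not terminate for zero) and treating a conditional instruction whose condition fails as fetched but not executed; the function names and the counting scheme are illustrative.

```python
def gcd_branches(a, b):
    """Model of the branch-based loop. Returns (gcd, fetched, taken_branches)."""
    fetched = taken = 0
    while True:
        fetched += 2                 # CMP r1, r2 ; BEQ finish
        if a == b:
            taken += 1               # BEQ taken, loop ends
            return a, fetched, taken
        fetched += 1                 # BLT lessthan
        if a < b:
            taken += 1               # BLT taken
            b -= a                   # SUB r2, r2, r1
        else:
            a -= b                   # SUB r1, r1, r2
        fetched += 2                 # the SUB and the B loop
        taken += 1                   # B loop is always taken


def gcd_conditional(a, b):
    """Model of the conditional-execution loop. Same return format."""
    fetched = taken = 0
    while True:
        # CMP sets the flags once; the three following instructions all
        # test these same flags because SUBGT/SUBLT have no S suffix.
        gt, lt, ne = a > b, a < b, a != b
        fetched += 4                 # CMP, SUBGT, SUBLT, BNE
        if gt:
            a -= b                   # SUBGT r1, r1, r2
        if lt:
            b -= a                   # SUBLT r2, r2, r1
        if not ne:
            return a, fetched, taken
        taken += 1                   # BNE loop1 taken


print(gcd_branches(12, 18))     # (6, 12, 4)
print(gcd_conditional(12, 18))  # (6, 12, 2)
```

In this example both versions pass the same number of instructions through the pipeline, but the conditional version takes half as many branches and occupies 4 instructions of code instead of 7.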
