Lect 2b - IEEE Floating Point Adder Arch

IEEE Floating Point Adder
Using the IEEE Floating Point Standard for an add/subtract execution unit
1/8/2007 - L25 Floating Point Adder
Copyright 2006 - Joanne DeGroat, ECE, OSU
Lecture overview
The interface
Part by part: a floating point adder design
Adder is double precision
Double Precision
Fields: s (1 bit), e (11 bits), f (52 bits)
Value of bits in word representation is:

If e = 2047 and f ≠ 0, then v is NaN regardless of s.
If e = 2047 and f = 0, then $v = (-1)^s \infty$.
If 0 < e < 2047, then $v = (-1)^s \, 2^{e-1023} (1.f)$ (normalized number).
If e = 0 and f ≠ 0, then $v = (-1)^s \, 2^{-1022} (0.f)$ (denormalized number). Denormalized numbers allow for graceful underflow.
If e = 0 and f = 0, then $v = (-1)^s \, 0$ (zero).
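The five cases above can be checked mechanically from the e and f fields alone. As a sketch, assuming IEEE std_logic_1164 types instead of the BIT_VECTOR ports used later in the lecture, the hypothetical package fp_class_pkg below (with a hypothetical classify function) tests whether e is all 0s or all 1s and whether f is all 0s.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: classifies a 64-bit IEEE 754 double
-- according to the five cases of the value table.
package fp_class_pkg is
  type fp_class is (FP_ZERO, FP_DENORM, FP_NORM, FP_INF, FP_NAN);
  function classify (word : std_logic_vector(63 downto 0)) return fp_class;
end package fp_class_pkg;

package body fp_class_pkg is
  constant E_ZERO : std_logic_vector(10 downto 0) := (others => '0');
  constant E_ONES : std_logic_vector(10 downto 0) := (others => '1');
  constant F_ZERO : std_logic_vector(51 downto 0) := (others => '0');

  function classify (word : std_logic_vector(63 downto 0)) return fp_class is
    -- Field extraction: s = word(63), e = word(62 downto 52), f = word(51 downto 0)
    constant e : std_logic_vector(10 downto 0) := word(62 downto 52);
    constant f : std_logic_vector(51 downto 0) := word(51 downto 0);
  begin
    if e = E_ONES then
      -- e = 2047: infinity if f = 0, otherwise NaN (the sign is ignored)
      if f = F_ZERO then
        return FP_INF;
      else
        return FP_NAN;
      end if;
    elsif e = E_ZERO then
      -- e = 0: zero if f = 0, otherwise denormalized 2**(-1022) * 0.f
      if f = F_ZERO then
        return FP_ZERO;
      else
        return FP_DENORM;
      end if;
    else
      -- 0 < e < 2047: normalized 2**(e - 1023) * 1.f
      return FP_NORM;
    end if;
  end function classify;
end package body fp_class_pkg;
```

For example, classify(x"7FF0000000000000") returns FP_INF and classify(x"0000000000000001") returns FP_DENORM.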

Specification of a FPA
Floating Point Add/Subtract Unit Specification

Inputs are in IEEE 754 double precision
Must perform both addition and subtraction
Must handle the full floating point standard:

  Normalized numbers
  Not-a-Numbers (NaNs)
  +/- Infinity
  Denormalized numbers

Specifications continued

The result will be an IEEE 754 double precision representation. The unit will correctly handle the invalid operation of adding +∞ and -∞, producing NaN per the standard. The unit latches its inputs into registers from parallel 64-bit data buses. A separate signal line indicates the operation, add or subtract.
Specifications continued
Outputs

The correctly represented result
Flags that are output are:
  Zero result
  Overflow to infinity from normalized numbers as inputs
  NaN result
  Overshift (result is the larger of the two operands)
  Denormalized result
  Inexact (result was rounded)
  Invalid operation for addition
High level block diagram
Basic architecture interface

Data: 64-bit A, B, and C buses
Control signals: Latch, Add/Sub, Asel, Drive
Condition flags: 7 output flag signals
Clocks: Phi1 and Phi2 (a two-phase clocked architecture)
[Figure: Floating Point Adder Unit block with inputs Abus, Bbus, Add/Sub, Latch, Phi1, Phi2, Asel and Drive, and outputs Cbus and Flags]
Start the VHDL
The entity interface

entity Floating_Point_Adder is
  port (A_Main     : in  BIT_VECTOR(63 downto 0);
        B_Main     : in  BIT_VECTOR(63 downto 0);
        C_Main     : out BIT_VECTOR(63 downto 0);
        A_Out      : out BIT_VECTOR(63 downto 0);
        Flags      : out BIT_VECTOR(6 downto 0);
        Add_or_sub : in  BIT;
        Latch      : in  BIT;
        Drive      : in  BIT;
        Phi1       : in  BIT;
        Phi2       : in  BIT;
        Asel       : in  BIT);
end Floating_Point_Adder;
[Figure: Floating Point Adder symbol with inputs A_Main, B_Main, Add_or_sub, Latch, Phi1, Phi2, Drive and Asel, and outputs C_Main, A_Out and Flags]
Basic design

The design can be divided into functional sub-blocks. First, the latches and drivers:
[Figure: A/S and the INPUT LATCHES at the top; RESULT LATCHES and OUTPUT DRIVERS at the bottom]
What goes in the other blocks

Adjust the inputs in preparation for the add, add, renormalize, and round:
[Figure: A/S → INPUT LATCHES → Input Adjust → Add Mantissas → Normalize Result → Round according to selected scheme → RESULT LATCHES → OUTPUT DRIVERS]
VHDL coding for the latches

A first cut at the input latches. Note the two-phase clocking: the buses are latched on Phi2 when Latch is high, and the operand fields are split out on Phi1.
b1: block ((Phi2 and Latch) = '1')
begin
  A_temp <= guarded A_Main;
  -- Add_or_sub = '1' (subtract) inverts the sign of B
  B_temp(63) <= guarded Add_or_sub xor B_Main(63);
  B_temp(62 downto 0) <= guarded B_Main(62 downto 0);
end block b1;

b2: block (Phi1 = '1')
begin
  signa <= guarded A_temp(63);
  signb <= guarded B_temp(63);
  expa <= guarded A_temp(62 downto 52);
  expb <= guarded B_temp(62 downto 52);
  mana <= guarded A_temp(51 downto 0);
  manb <= guarded B_temp(51 downto 0);
end block b2;
And on the output

The output drivers. Note the use of guarded blocks: Asel steers the result onto C_Main (Asel = '0') or A_Out (Asel = '1').
out_latch1 : block ((Drive and Phi2) = '1')
begin
  Flags <= guarded flag_temp;
end block out_latch1;

out_latch2 : block ((Drive and Phi2 and (not Asel)) = '1')
begin
  C_Main <= guarded signout & exp_round & man_round(51 downto 0);
end block out_latch2;

out_latch3 : block ((Drive and Phi2 and Asel) = '1')
begin
  A_Out <= guarded signout & exp_round & man_round(51 downto 0);
end block out_latch3;
And what goes in between?

In the final design a lot goes in between, but first you want to make sure that the latches are working properly. So just pass one input to the output and check:
signout <= signa;
exp_round <= expa;
man_round <= '0' & mana;
Once this works properly, you can move on with the design.
The first section

Prepare to add: identify the type of each input and adjust the operands appropriately.

[Figure: input-adjust datapath. The Exponent Processing Logic takes Aexp, Bexp, Asign and Bsign and produces Shift Dist, E>, E=, E<, EA0, EA1, EB0, EB1 and the Larger Exp (sent to the normalize unit). The Mantissa Processing Logic takes Amantissa and Bmantissa and produces M>, M=, M<, MA0 and MB0. For each operand a 2-1 mux selects NOT(EA0) & Aman or Aman & 0 (likewise for B). A 2x2 crossbar swaps the operands when E> + (E= · M>). A left-path mux can force "Zero" (when EA1 + EB1) and a right-path mux can force "NaN". The left operand then passes through a right linear shifter (Shift Dist) and a 2-1 mux controlled by SignA xor SignB into the ADDER, whose output goes to the normalize unit. Sign Out (bit 63) goes to the output latch.]
The exponent unit portion

Must produce the larger exponent and the difference between the exponents, which is the shift distance. Also produces several control signals:
  Exponent all 0s and exponent all 1s (for each operand)
  Exponent relations A>B, A<B, A=B
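A minimal behavioral sketch of the exponent unit, assuming std_logic ports and ieee.numeric_std. The entity name exp_unit is hypothetical; the output names follow the signals in the lecture's behavioral code, but the inline subtraction stands in for the lecture's expdist function, whose body is not shown. Both differences are computed and the relation picks the nonnegative one. Because the mantissa-processing step rescales denormalized operands so that their exponent field of 0 can be used as is, the raw field difference serves as the shift distance.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity exp_unit is
  port (
    expa, expb          : in  std_logic_vector(10 downto 0);
    expgt, expeq, explt : out std_logic;                     -- A>B, A=B, A<B
    expa0, expa1        : out std_logic;                     -- A exponent all 0s / all 1s
    expb0, expb1        : out std_logic;                     -- B exponent all 0s / all 1s
    larger_exp          : out std_logic_vector(10 downto 0); -- to the normalize unit
    shiftdist           : out std_logic_vector(10 downto 0)  -- |expa - expb|
  );
end entity exp_unit;

architecture behav of exp_unit is
  signal ua, ub    : unsigned(10 downto 0);
  signal a_minus_b : unsigned(10 downto 0);
  signal b_minus_a : unsigned(10 downto 0);
  signal a_ge_b    : std_logic;
begin
  ua <= unsigned(expa);
  ub <= unsigned(expb);

  -- Both differences are formed; the relation selects the nonnegative one.
  a_minus_b <= ua - ub;
  b_minus_a <= ub - ua;
  a_ge_b    <= '1' when ua >= ub else '0';

  expgt <= '1' when ua > ub else '0';
  expeq <= '1' when ua = ub else '0';
  explt <= '1' when ua < ub else '0';

  expa0 <= '1' when ua = 0 else '0';
  expa1 <= '1' when ua = 2047 else '0';
  expb0 <= '1' when ub = 0 else '0';
  expb1 <= '1' when ub = 2047 else '0';

  larger_exp <= expa when a_ge_b = '1' else expb;
  shiftdist  <= std_logic_vector(a_minus_b) when a_ge_b = '1'
                else std_logic_vector(b_minus_a);
end architecture behav;
```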

Mantissa Processing Logic
Need to examine the two fractional parts and generate several control signals that are required to prepare the operands:
  Relational signals M>, M=, M<, needed to know which operand to shift when the exponents are equal
  Whether each stored fractional part is all 0s or not, needed for NaN, 0, and infinity determination
After generating control signals

Step 1 is to select between a normalized and a denormalized form of the mantissa. For the normalized form, prepend NOT(Ex0) to the fraction.

If Ex0 is 1, the exponent is all 0s and you have a denormalized number or 0. When Ex0 is 0, you have a NaN, infinity, or a normalized number.
The other selection is the fractional part shifted left by 1 with a 0 appended.

This form is for denormalized numbers: it rewrites $0.f \times 2^{-1022}$ as $(f_{51}.f_{50}\ldots f_{0}0) \times 2^{-1023}$, taking the scale factor from $2^{-1022}$ to $2^{-1023}$, which is the scale $2^{e-1023}$ gives for e = 0. The operand can now be treated like a normalized number.
Now select between these two:
Select the denormalized form when Ex0 · NOT(Mx0). When Ex0 is 1 you have a denormalized number or 0; when Mx0 is 0, at least 1 bit of the fractional part is a 1, and thus you have a denormalized number.

Select the NaN, infinity, 0, or normalized-number form when Ex0 is 0 or Mx0 is 1. When Mx0 is 1 you have infinity, 0, or a normalized number; when Ex0 is 0 you have a normalized number, infinity, or NaN.

Shown in table form
The selection table also points out this relationship. Note that for a 0, NOT(Ex0) = 0 is prepended to the all-0s fractional part, giving 0.000...0.

Ex0  Mx0  NOT(Mx0)  Ex0·NOT(Mx0)  Select  Operand
 0    0      1           0          L     normalized or NaN
 0    1      0           0          L     normalized or infinity
 1    0      1           1          R     denormalized
 1    1      0           0          L     zero
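The selection above, sketched as a small VHDL entity for one operand. The entity name mant_expand and its ports (expx0, manx, xbarin) are hypothetical, modeled on the lecture's expa0, mana and lxbarin signals, and the sketch assumes std_logic types. It should produce the same value as one of the lxbarin/rxbarin assignments in the lecture's behavioral code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: builds the 53-bit significand x.xxx...x that
-- enters the crossbar for one operand.
entity mant_expand is
  port (
    expx0  : in  std_logic;                      -- exponent field is all 0s
    manx   : in  std_logic_vector(51 downto 0);  -- stored fraction f
    xbarin : out std_logic_vector(52 downto 0)   -- bit 52 is the integer bit
  );
end entity mant_expand;

architecture behav of mant_expand is
  constant ZERO_MAN : std_logic_vector(51 downto 0) := (others => '0');
  signal manx0   : std_logic;   -- Mx0: fraction is all 0s
  signal xdenorm : std_logic;   -- Ex0 * NOT(Mx0): denormalized operand
begin
  manx0   <= '1' when manx = ZERO_MAN else '0';
  xdenorm <= expx0 and (not manx0);

  -- Denormalized: f shifted left by 1 with a 0 appended (scale 2**-1023).
  -- Otherwise:    NOT(Ex0) prepended, giving 1.f (or 0.000...0 for zero).
  xbarin <= (manx & '0') when xdenorm = '1' else ((not expx0) & manx);
end architecture behav;
```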
Selections are input to a crossbar
The crossbar switch places the larger value on the right path and the smaller onto the left path. The smaller is the operand to shift if any shifting is needed to align the binary points. The equation for exchange on the crossbar is
E> + (E= · M>), that is, move the A input to the right side if the exponent of A is larger OR the exponents are equal and the fractional part of A is larger.
The next multiplexers

Now the smaller operand is on the left path and the larger on the right path. On the left path, if either exponent is all 1s, then that operand is a NaN or infinity and has already been crossbarred to the right path (or both exponents are all 1s). In this case we simply want to pass the right-path operand through to the output by adding 0 to it, so a 0 is one choice of the left-path mux. On the right path, select either the right-path value or a hardwired NaN for an invalid operation.
Linear shifting
The next step is to linearly shift the left operand right to align it. The exponent unit generates the exponent relation signals by computing both ExpA - ExpB and ExpB - ExpA. With the help of those control signals, the nonnegative exponent difference is known, and this value is sent to the shifter as the shift distance.
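A behavioral sketch of the alignment shifter, assuming ieee.numeric_std. The entity name align_shifter and its ports are hypothetical. Two choices here are assumptions rather than details from the lecture: the 53-bit significand is extended with four zero bits (matching the "0000" appended to radderin), and bits shifted off the right end are ORed into the least significant bit as a sticky bit so that rounding can still see them. The overshift condition used here (shift distance at least the width of the extended field) is also an assumption; the lecture does not give its equation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical alignment shifter for the left (smaller) operand.
entity align_shifter is
  port (
    sig_in    : in  std_logic_vector(52 downto 0);  -- x.xxxx significand
    shiftdist : in  std_logic_vector(10 downto 0);
    sig_out   : out std_logic_vector(56 downto 0);  -- 4 extra low-order bits
    overshift : out std_logic   -- every bit was shifted out of the field
  );
end entity align_shifter;

architecture behav of align_shifter is
begin
  process (sig_in, shiftdist)
    variable ext    : unsigned(56 downto 0);
    variable sticky : std_logic;
    variable d      : natural;
  begin
    ext    := unsigned(sig_in) & "0000";
    d      := to_integer(unsigned(shiftdist));
    sticky := '0';
    if d >= ext'length then
      -- Whole operand shifted out: only the sticky bit survives.
      if ext /= 0 then
        sticky := '1';
      end if;
      ext       := (others => '0');
      overshift <= '1';
    else
      -- Collect the d bits that fall off the right end.
      for i in 0 to ext'length - 1 loop
        if i < d then
          sticky := sticky or ext(i);
        end if;
      end loop;
      ext       := shift_right(ext, d);
      overshift <= '0';
    end if;
    ext(0)  := ext(0) or sticky;
    sig_out <= std_logic_vector(ext);
  end process;
end architecture behav;
```

Keeping a sticky bit, instead of simply discarding the bits that fall off, is what lets the rounding step distinguish an exact halfway case from a value slightly above it.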
One last multiplexer
The right-path operand, the larger, is input directly to the ADDER. On the left path, the output of the linear shifter is sent to the ADDER for an effective addition, OR the ones' complement of that value is sent to the ADDER for an effective subtraction. In that case the adder's carry-in is set to 1, which completes the two's complement.
Code for this section - behavioral
Most of the code generates the various control signals and moves data through the muxes.
-- Exponent processing
expgt <= '1' when (expa > expb) else '0';
expeq <= '1' when (expa = expb) else '0';
explt <= '1' when (expa < expb) else '0';
expa0 <= '1' when (expa = zeroexp) else '0';
expa1 <= '1' when (expa = oneexp) else '0';
expb0 <= '1' when (expb = zeroexp) else '0';
expb1 <= '1' when (expb = oneexp) else '0';
shiftdist <= expdist(expa, expb);
larger_exp <= expa when (expa >= expb) else expb;
-- Mantissa processing
mangt <= '1' when (mana > manb) else '0';
maneq <= '1' when (mana = manb) else '0';
manlt <= '1' when (mana < manb) else '0';
mana0 <= '1' when (mana = zeroman) else '0';
manb0 <= '1' when (manb = zeroman) else '0';
-- Expanded normalized form
adenorm <= expa0 and (not mana0);
bdenorm <= expb0 and (not manb0);
lshfa(52 downto 1) <= mana;
lshfa(0) <= '0';
lshfb(52 downto 1) <= manb;
lshfb(0) <= '0';
lxbarin <= lshfa when (adenorm = '1') else ((not expa0) & mana);
rxbarin <= lshfb when (bdenorm = '1') else ((not expb0) & manb);
-- XBar to place the larger operand on the right path
swap <= expgt or (expeq and mangt);
xbar_r <= lxbarin when (swap = '1') else rxbarin;
xbar_l <= rxbarin when (swap = '1') else lxbarin;
-- Hard code NaN
in_mux_r_nan <= expa1 and mana0 and expb1 and manb0 and (signa xor signb);
in_mux_r <= nan_man when (in_mux_r_nan = '1') else xbar_r;
-- Hard code zero
in_mux_l_zero <= expa1 or expb1;
in_mux_l <= zero_man when (in_mux_l_zero = '1') else xbar_l;
-- Shift smaller
shifted_sig <= shift(shiftdist, in_mux_l);
-- A+B or A+(-B)?
twoscomp <= signa xor signb;
ladderin <= '0' & shifted_sig when (twoscomp = '0') else '1' & (not shifted_sig);
radderin <= '0' & in_mux_r & "0000";
-- Binary add
adder_out <= add(ladderin, radderin, twoscomp);
Xbar code highlight
Code

swap <= expgt OR (expeq AND mangt);
xbar_r <= lxbarin when (swap = '1') else rxbarin;
xbar_l <= rxbarin when (swap = '1') else lxbarin;
Hard code NaN VHDL code
The code

-- Control equation for mux: both operands infinite with opposite effective signs
in_mux_r_nan <= expa1 AND mana0 AND expb1 AND manb0 AND (signa XOR signb);
in_mux_r <= nan_man WHEN (in_mux_r_nan = '1') ELSE xbar_r;
Now add the mantissas

Simply add the two mantissas. Because the sign of the B input was XORed with the operation (inverted for a subtract), the carry-in is the XOR of the two signs. If the signs are different, a subtraction is being performed and a 1 is input to the carry-in of the adder. The adder does two's complement addition. Each significand is of the form x.xxxxxxx; in the code, ladderin and radderin extend it with a leading bit and four extra low-order bits to 58 bits, and the output is of the form xx.xxxxxxx, also 58 bits.
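A minimal sketch of the adder step, assuming ieee.numeric_std and hypothetical port names: l_in is the aligned smaller operand, r_in the larger operand already extended with four zero bits, and twoscomp is signa xor signb as in the lecture's code. For an effective subtraction it adds the ones' complement of the left operand with a carry-in of 1, which is the two's complement subtraction r_in - l_in.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical significand adder.
entity mant_adder is
  port (
    l_in, r_in : in  std_logic_vector(56 downto 0);  -- x.xxxx form
    twoscomp   : in  std_logic;                      -- '1' = effective subtraction
    adder_out  : out std_logic_vector(57 downto 0)   -- xx.xxxx form
  );
end entity mant_adder;

architecture behav of mant_adder is
begin
  process (l_in, r_in, twoscomp)
    variable ladderin, radderin : unsigned(57 downto 0);
    variable cin                : natural range 0 to 1;
  begin
    radderin := unsigned('0' & r_in);
    if twoscomp = '1' then
      -- Ones' complement of the left operand (its leading '0' becomes '1',
      -- a sign extension) plus a carry-in of 1.
      ladderin := unsigned(not ('0' & l_in));
      cin      := 1;
    else
      ladderin := unsigned('0' & l_in);
      cin      := 0;
    end if;
    -- 58-bit sum; a carry out of bit 57 is discarded. For a subtraction
    -- the result is r_in - l_in, never negative because the crossbar
    -- placed the larger operand on the right path.
    adder_out <= std_logic_vector(radderin + ladderin + cin);
  end process;
end architecture behav;
```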
On to the next challenge
This is perhaps the hardest part: renormalization of the result. There is a result exponent (the exponent of the larger operand) and a mantissa of the form xx.xxxxxxxxxx. The following slide shows the processing needed.
Renormalization Unit
Have exponent and mantissa to deal with.

[Figure: renormalization unit. Inputs are the Larger Exponent and the Adder Output (XX.XXXXXX). Blocks include detectors for all 1s (all1) and all 0s (all0, zero), a leading-1 position detector (Ld1pos), Result Signal Generation (fract0, UF, mux controls c1 to c5), a left linear shifter, two Right Shift 1 units, inverters, a +1 incrementer and an adder on the exponent path, a 4-to-1 mux producing exp_norm, and a 5-to-1 mux producing man_norm.]
Many choices to deal with

The mantissa may need to be shifted 1 position to the right (relative to a fixed binary point), may be OK as is, or may have to be shifted left, in which case you need to know the position of the leading 1.

In a behavioral model you can simply shift left once, increment a counter, and check again. In hardware you need a leading-1 detector that gives the position of the leading 1 so that the mantissa can be shifted left in one step.
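A sketch of the count that a leading-1 detector provides, written as a VHDL function in a hypothetical package lod_pkg. It still uses a loop, so it is a behavioral description; a synthesis tool would unroll it into priority logic. The point is that the whole count is available at once, which is what allows a single left shift.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: behavioral equivalent of a leading-1 detector.
package lod_pkg is
  -- Number of '0' elements before the first '1', scanning from v'left.
  -- Returns v'length when v contains no '1' at all.
  function leading_zeros (v : std_logic_vector) return natural;
end package lod_pkg;

package body lod_pkg is
  function leading_zeros (v : std_logic_vector) return natural is
    variable n : natural := 0;
  begin
    for i in v'range loop      -- v'range runs from v'left to v'right
      if v(i) = '1' then
        return n;
      end if;
      n := n + 1;
    end loop;
    return n;
  end function leading_zeros;
end package body lod_pkg;
```

For a normalizing left shift, the shift distance would be leading_zeros of the x.xxxx part of the adder output, and the exponent is reduced by the same amount.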
Interactions
All shifts of the mantissa result in an exponent adjustment. There are 4 choices for the exponent:
  As is
  Incremented by 1
  Adjusted down by an amount that depends on the shift
  Zero

Interactions
There are 5 choices for the mantissa:

  As is
  Right shifted by 1 (exponent incremented by 1)
  Left shifted to bring up the leading 1
  Left shifted and then right shifted by 1
  Hardwired 0
This part is the same for both addition and multiplication, and it is easy to do algorithmically.
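One way the exponent and mantissa choices can interact is sketched below as a hypothetical entity normalize_unit, using ieee.numeric_std. It assumes the convention implied by the mantissa-expansion step: value = man × 2^(exp - 1023), with man in x.xxxx form and denormalized inputs rescaled to exp = 0. A result whose exponent reaches 0 takes the "left shifted and then right shifted by 1" path to reach the stored 0.f form. Exponent overflow (exp reaching 2047) is not handled here; in the lecture it is reported through the overflow flag.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical renormalization sketch.
entity normalize_unit is
  port (
    larger_exp : in  std_logic_vector(10 downto 0);
    adder_out  : in  std_logic_vector(57 downto 0);  -- xx.xxxx form
    exp_norm   : out std_logic_vector(10 downto 0);
    man_norm   : out std_logic_vector(56 downto 0)   -- x.xxxx form
  );
end entity normalize_unit;

architecture behav of normalize_unit is
begin
  process (larger_exp, adder_out)
    variable e   : natural;
    variable m   : unsigned(56 downto 0);
    variable lz  : natural;
    variable lsb : std_logic;
  begin
    e := to_integer(unsigned(larger_exp));
    m := unsigned(adder_out(56 downto 0));
    if unsigned(adder_out) = 0 then
      -- Hardwired 0, exponent zero.
      e := 0;
      m := (others => '0');
    elsif adder_out(57) = '1' then
      -- Carry out: right shift by 1 (lost bit kept as sticky), exponent + 1.
      m    := unsigned(adder_out(57 downto 1));
      m(0) := m(0) or adder_out(0);
      e    := e + 1;
    else
      -- Leading-1 detection: lz = number of zeros above the highest '1'.
      lz := 0;
      for i in 0 to 56 loop
        if m(i) = '1' then
          lz := 56 - i;
        end if;
      end loop;
      if lz < e then
        -- Left shift for the leading 1 (lz = 0 is "as is"); exponent down by lz.
        m := shift_left(m, lz);
        e := e - lz;
      else
        -- Denormal result: left shift only until exp = 0, then right
        -- shift by 1 to produce the stored 0.f form.
        m    := shift_left(m, e);
        lsb  := m(0);
        m    := shift_right(m, 1);
        m(0) := m(0) or lsb;
        e    := 0;
      end if;
    end if;
    exp_norm <= std_logic_vector(to_unsigned(e, 11));
    man_norm <= std_logic_vector(m);
  end process;
end architecture behav;
```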
Rounding Unit

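Once renormalization is done, the unit looks at the guard bits to determine rounding. The standard specifies several rounding modes; the unit can also simply truncate.
[Figure: rounding unit. man_norm is split into its 53 msb and 5 lsb. Round Logic examines the low-order bits and produces Round, which drives a +1 incrementer on the 53-bit mantissa (Mantissa Output, with msbout). exp_norm feeds a +1 incrementer and a 2-1 mux that produces the Exponent output; the mux selects the incremented exponent on Round · (msbin xor msbout).]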
Rounding
Rounding can change both the mantissa and the exponent. After rounding, the final result is output in normalized form.
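A sketch of the rounding step for round to nearest, ties to even (the default rounding mode of IEEE 754), using ieee.numeric_std and a hypothetical entity round_unit. It assumes man_norm carries four extra low-order bits below the 53-bit significand; the figure splits man_norm into 53 msb and 5 lsb, so this width is an assumption. The first extra bit is the guard bit and the rest are ORed into a sticky bit. The exponent is incremented when rounding changes the most significant bit, which is one reading of the figure's Round · (msbin xor msbout): it covers both 1.111...1 rounding up to 10.000...0 and a denormal rounding up to the smallest normalized number. Here inexact is set whenever any discarded bit is 1.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical round-to-nearest-even unit.
entity round_unit is
  port (
    exp_norm  : in  std_logic_vector(10 downto 0);
    man_norm  : in  std_logic_vector(56 downto 0);  -- 53 bits + 4 extra
    exp_round : out std_logic_vector(10 downto 0);
    man_round : out std_logic_vector(51 downto 0);  -- stored fraction
    inexact   : out std_logic
  );
end entity round_unit;

architecture behav of round_unit is
begin
  process (exp_norm, man_norm)
    variable keep, r   : unsigned(52 downto 0);
    variable g, s, rnd : std_logic;
  begin
    keep := unsigned(man_norm(56 downto 4));               -- 53 msb
    g    := man_norm(3);                                    -- guard bit
    s    := man_norm(2) or man_norm(1) or man_norm(0);      -- sticky bit
    -- Round up if above halfway, or exactly halfway with an odd LSB.
    rnd := g and (s or keep(0));
    if rnd = '1' then
      r := keep + 1;    -- wraps to 0 when keep is all 1s
    else
      r := keep;
    end if;
    -- msbin xor msbout: the rounding increment changed the leading bit.
    if (keep(52) xor r(52)) = '1' then
      exp_round <= std_logic_vector(unsigned(exp_norm) + 1);
    else
      exp_round <= exp_norm;
    end if;
    man_round <= std_logic_vector(r(51 downto 0));
    inexact   <= g or s;
  end process;
end architecture behav;
```

Truncation corresponds to forcing rnd to '0', in which case the exponent never changes.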
And don't forget the flags

Any arithmetic unit outputs flags on the status and validity of the result. The flags are generated from various control signals or combinations of control signals.
zero <= (not dec2) and (not dec1) and fract0;
overflow <= man_sel_0 or (round_incr_exp and exp_norm_p1_all1);
nan <= (expa1 and (not mana0)) or (expb1 and (not manb0)) or invalid;
denorm <= (man_sel_ls_rs and (not fract0)) or ((not dec2) and dec1 and lgrall0);
inexact <= round;
invalid <= in_mux_r_nan;
flag_temp <= zero & overflow & nan & overshift & denorm & inexact & invalid;
To test (verify) the design

Must test for normal operation and boundary conditions. Will check each class of input A against each class of input B:

A: NaN, +/- infinity, +/- 0, Denorm, Norm
B: NaN, +/- infinity, +/- 0, Denorm, Norm
for both the direct and all crossed pairings.

Boundary conditions
Wish to check several boundary conditions:

  Denorm + Denorm = Max Denorm
  Denorm + Denorm = Min Norm
  Norm - Norm = Max Denorm
  Rounding using the first guard bit
  Rounding using the 1st and 2nd guard bits
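These boundary conditions can be pinned down as concrete bit patterns. The package below is a hypothetical set of test vectors (all names are illustrative). Each expected result assumes round to nearest, ties to even, and op = '1' means subtract, matching the way the input latch XORs Add_or_sub into B's sign bit. A testbench would drive a and b onto A_Main and B_Main, latch them, and compare the C_Main result against expected.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical boundary test vectors for the floating point adder.
package fpa_boundary_vectors is
  type fpa_vector is record
    a, b     : std_logic_vector(63 downto 0);
    op       : std_logic;                      -- '0' add, '1' subtract
    expected : std_logic_vector(63 downto 0);
  end record;
  type fpa_vector_array is array (natural range <>) of fpa_vector;

  constant BOUNDARY_VECTORS : fpa_vector_array := (
    -- Denorm + Denorm = Max Denorm
    (x"0008000000000000", x"0007FFFFFFFFFFFF", '0', x"000FFFFFFFFFFFFF"),
    -- Denorm + Denorm = Min Norm
    (x"0008000000000000", x"0008000000000000", '0', x"0010000000000000"),
    -- Norm - Norm = Max Denorm: (2 - 2**-52)*2**-1022 - 2**-1022
    (x"001FFFFFFFFFFFFF", x"0010000000000000", '1', x"000FFFFFFFFFFFFF"),
    -- First guard bit only: 1.0 + 2**-53 is a tie, rounds to even (1.0)
    (x"3FF0000000000000", x"3CA0000000000000", '0', x"3FF0000000000000"),
    -- Tie with odd LSB: (1 + 2**-52) + 2**-53 rounds up to 1 + 2**-51
    (x"3FF0000000000001", x"3CA0000000000000", '0', x"3FF0000000000002"),
    -- 1st and 2nd guard bits: 1.0 + 1.5 * 2**-53 rounds up to 1 + 2**-52
    (x"3FF0000000000000", x"3CA8000000000000", '0', x"3FF0000000000001")
  );
end package fpa_boundary_vectors;
```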
Testing
Testing of the design code is not necessarily the same as the testing that would be done on the chip. Testing of the design is called verification, and it must ensure that all possible input combinations produce the specified output.
Scan of entire architecture
Scan of the chip
