
ELECT 90X

Programmable Logic Circuits:


Computer Arithmetic: Introduction

Dr. Eng. Amr T. Abdel-Hamid

Slides based on slides prepared by:


• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.

Fall 2009
What is Computer Arithmetic?
Pentium Division Bug (1994-95): Pentium's radix-4 SRT division algorithm occasionally gave an incorrect quotient.
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:
$1/5 + 1/7 + 1/11 + 1/13 + \cdots + 1/p + 1/(p + 2) + \cdots$
Worst-case example of division error in Pentium:
$c = 4\,195\,835 / 3\,145\,727$ = 1.333 820 44... (correct quotient)
  = 1.333 739 06... (circa-1994 Pentium double FLP value; accurate to only 14 bits, worse than single precision!)

A Motivating Example
Using a calculator with √, $x^2$, and $x^y$ functions, compute:
$u = \sqrt{\sqrt{\cdots\sqrt{2}}}$ = 1.000 677 131 ("1024th root of 2", ten square roots)
$v = 2^{1/1024}$ = 1.000 677 131
Save u and v; if you can't save them, recompute the values when needed.
$x = (((u^2)^2)\cdots)^2$ = 1.999 999 963
$x' = u^{1024}$ = 1.999 999 973
$y = (((v^2)^2)\cdots)^2$ = 1.999 999 983
$y' = v^{1024}$ = 1.999 999 994
Perhaps v and u are not really the same value:
$w = v - u = 1 \times 10^{-11}$ (nonzero due to hidden digits)
$(u - 1) \times 1000$ = 0.677 130 680 [hidden digits ... (0) 68]
$(v - 1) \times 1000$ = 0.677 130 690 [hidden digits ... (0) 69]

Finite Range Can Lead to Disaster
Example: Explosion of Ariane Rocket (1996 June 4)
Unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m).
The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion.
Cause: "software error in the inertial reference system"
Problem specifics: A 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer.
An SRI* software exception arose during conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767).
*SRI = Inertial Reference System

Encoding Numbers in 4 Bits
[Figure: Some of the possible ways of assigning 16 distinct codes to represent numbers, plotted on a number line from -16 to 16. Number formats shown: unsigned integers; signed-magnitude; 3 + 1 fixed-point, xxx.x; signed fraction, .xxx; 2's-complement fraction, x.xxx; 2 + 2 floating-point, $s \times 2^e$ with $e \in [-2, 1]$, $s \in [0, 3]$; 2 + 2 logarithmic (log x = xx.xx).]

The Binary Number System
➢ In conventional digital computers, integers are represented as binary numbers of fixed length n
➢ An ordered sequence $(x_{n-1} x_{n-2} \cdots x_1 x_0)$ of binary digits
➢ Each digit $x_i$ (bit) is 0 or 1
➢ The above sequence represents the integer value $X = \sum_{i=0}^{n-1} x_i \cdot 2^i$
➢ Upper case letters represent numerical values or sequences of digits
➢ Lower case letters, usually indexed, represent individual digits

Radix of a Number System
➢ The weight of the digit $x_i$ is the i-th power of 2
➢ 2 is the radix of the binary number system
➢ Binary numbers are radix-2 numbers; allowed digits are 0, 1
➢ Decimal numbers are radix-10 numbers; allowed digits are 0, 1, 2, …, 9
➢ Radix indicated in subscript as a decimal number
➢ Example:
  ➢ $(101)_{10}$: decimal value 101
  ➢ $(101)_2$: decimal value 5
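
A minimal Python sketch of the positional rule (digit $x_i$ weighted by $r^i$), evaluated with Horner's rule; the function name `positional_value` is ours, not from the slides. It reproduces the two examples above.

```python
def positional_value(digits, r):
    """Value of a digit sequence written most significant digit first, in radix r."""
    value = 0
    for d in digits:
        if not 0 <= d < r:
            raise ValueError(f"digit {d} is not allowed in radix {r}")
        value = value * r + d   # Horner's rule: equals the sum of d_i * r**i
    return value

print(positional_value([1, 0, 1], 10))  # 101
print(positional_value([1, 0, 1], 2))   # 5
```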

Range of Representations
➢ Operands and results are stored in registers of fixed length n, so only a finite number of distinct values can be represented within an arithmetic unit
➢ $X_{min}$, $X_{max}$: smallest and largest representable values
➢ $[X_{min}, X_{max}]$: range of the representable numbers
➢ A result larger than $X_{max}$ or smaller than $X_{min}$ is incorrectly represented
➢ The arithmetic unit should indicate that the generated result is in error: an overflow indication

Example - Overflow in Binary System
➢ Unsigned integers with 5 binary digits (bits)
➢ $X_{max} = (31)_{10}$, represented by $(11111)_2$
➢ $X_{min} = (0)_{10}$, represented by $(00000)_2$
➢ Increasing $X_{max}$ by 1 gives $(32)_{10} = (100000)_2$
➢ 5-bit representation: only the last five digits are retained, yielding $(00000)_2 = (0)_{10}$
➢ In general:
  ➢ A number X not in the range $[X_{min}, X_{max}] = [0, 31]$ is represented by X mod 32
  ➢ If X+Y exceeds $X_{max}$, the result is S = (X+Y) mod 32
➢ Example:
        X     10001    17
      + Y     10010    18
            -------
            1 00011     3 = 35 mod 32
➢ The result has to be stored in a 5-bit register, so the most significant bit (with weight $2^5 = 32$) is discarded
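
A small Python sketch of the 5-bit behavior described above, assuming nonnegative operands in [0, 31]; `add_unsigned` is a hypothetical helper name. The sum is reduced mod $2^5$ and an overflow flag is raised when the true sum exceeds $X_{max}$.

```python
def add_unsigned(x, y, n_bits=5):
    """Add two n-bit unsigned integers in an n-bit register.

    Returns (S, overflow) where S = (x + y) mod 2**n_bits, i.e. the bit
    with weight 2**n_bits is discarded.
    """
    x_max = 2 ** n_bits - 1
    full = x + y
    return full % (2 ** n_bits), full > x_max

print(add_unsigned(17, 18))  # (3, True)
print(add_unsigned(31, 1))   # (0, True)
print(add_unsigned(10, 5))   # (15, False)
```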

Fixed Radix Systems
➢ r: the radix of the number system
➢ Conventional number systems are also called fixed-radix systems
➢ With no redundancy: $0 \le x_i \le r-1$
➢ $x_i \ge r$ introduces redundancy into the fixed-radix number system. How?
➢ If $x_i \ge r$ is allowed, there are two machine representations for the same value:
  $(\ldots, x_{i+1}, x_i, \ldots)$ and $(\ldots, x_{i+1}+1, x_i-r, \ldots)$
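
A quick numerical check of the redundancy claim, as a sketch: in radix 10, allowing the digit 13 gives a second digit vector for the same value. The helper name and the digit values are illustrative choices, not from the slides.

```python
def digit_vector_value(digits, r):
    """Value of a digit vector (most significant first); digits >= r are accepted."""
    value = 0
    for d in digits:
        value = value * r + d
    return value

r = 10
with_large_digit = [1, 2, 13]   # x_0 = 13 >= r
conventional = [1, 3, 3]        # x_1 + 1 = 3, x_0 - r = 3
print(digit_vector_value(with_large_digit, r), digit_vector_value(conventional, r))  # 133 133
```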

Representation of Mixed Numbers
➢ A sequence of n digits in a register does not necessarily represent an integer
➢ It can represent a mixed number with a fractional part and an integral part
➢ The n digits are partitioned into two: k in the integral part and m in the fractional part (k+m=n)
➢ The value of an n-tuple $(x_{k-1} \cdots x_1 x_0 . x_{-1} \cdots x_{-m})$ with a radix point between the k most significant digits and the m least significant digits is $X = \sum_{i=-m}^{k-1} x_i r^i$

Fixed Point Representations
➢ Radix point not stored in the register; it is understood to be in a fixed position between the k most significant digits and the m least significant digits
➢ These are called fixed-point representations
➢ Programmer not restricted to the predetermined position of the radix point
➢ Operands can be scaled, with the same scaling for all operands
➢ Add and subtract operations are correct:
  ➢ $aX \pm aY = a(X \pm Y)$ (a: scaling factor)
➢ Corrections required for multiplication and division:
  ➢ $aX \cdot aY = a^2 X \cdot Y$; $aX / aY = X / Y$
➢ Commonly used positions for the radix point:
  ➢ rightmost side of the number (pure integers, m=0)
  ➢ leftmost side of the number (pure fractions, k=0)

ULP - Unit in Last Position
➢ Given the length n of the operands, the weight $r^{-m}$ of the least significant digit indicates the position of the radix point
➢ Unit in the last position (ulp): the weight of the least significant digit
➢ $ulp = r^{-m}$
➢ This notation simplifies the discussion
➢ No need to distinguish between the different partitions of numbers into fractional and integral parts
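
A sketch of mixed-number evaluation and the ulp, using Python's `fractions.Fraction` to keep the values exact; the names `fixed_point_value` and `ulp` are ours. It shows that the value of an n-digit fixed-point number equals the same digits read as an integer, times one ulp.

```python
from fractions import Fraction

def fixed_point_value(digits, r, m):
    """Value of an n-digit fixed-point number (MSD first) with m fractional digits.

    The k = n - m integral digits get weights r**(k-1) ... r**0 and the
    fractional digits get r**-1 ... r**-m.
    """
    k = len(digits) - m
    return sum(d * Fraction(r) ** (k - 1 - j) for j, d in enumerate(digits))

def ulp(r, m):
    """Weight of the least significant digit."""
    return Fraction(r) ** -m

bits = [1, 0, 1, 1]                   # (10.11)_2: k = 2, m = 2
v = fixed_point_value(bits, 2, 2)
print(v, ulp(2, 2))                   # 11/4 1/4
# Same bits read as the integer (1011)_2 = 11, scaled by one ulp:
print(v == 11 * ulp(2, 2))            # True
```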

Representation of Negative Numbers
➢ Fixed-point numbers in a radix r system
➢ Two ways of representing negative numbers:
  ➢ Sign and magnitude representation (or signed-magnitude representation)
  ➢ Complement representation with two alternatives:
    ➢ Radix complement (two's complement in the binary system)
    ➢ Diminished-radix complement (one's complement in the binary system)

Signed-Magnitude Representation
➢ Sign and magnitude are represented separately
➢ First digit is the sign digit; the remaining n-1 digits represent the magnitude
➢ Binary case: sign bit is 0 for positive, 1 for negative numbers
➢ Non-binary case: 0 and r-1 indicate positive and negative numbers
➢ Only $2r^{n-1}$ out of the $r^n$ possible sequences are utilized
➢ Two representations for zero: positive and negative
➢ Inconvenient when implementing an arithmetic unit: when testing for zero, the two different representations must be checked
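
A sketch decoding every 4-bit signed-magnitude pattern (assuming n = 4 and r = 2); `sm_decode` is a hypothetical name. It confirms the range -7 to 7 and the two encodings of zero.

```python
def sm_decode(pattern, n=4):
    """Value of an n-bit signed-magnitude pattern: MSB is the sign, the rest the magnitude."""
    sign = (pattern >> (n - 1)) & 1
    magnitude = pattern & ((1 << (n - 1)) - 1)
    return -magnitude if sign else magnitude

values = [sm_decode(p) for p in range(16)]
print(min(values), max(values))   # -7 7
print(values.count(0))            # 2: 0000 (+0) and 1000 (-0)
```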

Disadvantage of the Signed-Magnitude Representation
➢ Operation may depend on the signs of the operands
➢ Example: adding a positive number X and a negative number -Y: X+(-Y)
➢ If Y>X, the final result is -(Y-X)
➢ Calculation:
  ➢ switch the order of the operands
  ➢ perform subtraction rather than addition
  ➢ attach the minus sign
➢ A sequence of decisions must be made, costing excess control logic and execution time
➢ This is avoided in the complement representation methods

Complement Representations of Negative Numbers
➢ Two alternatives:
  ➢ Radix complement (called two's complement in the binary system)
  ➢ Diminished-radix complement (called one's complement in the binary system)
➢ In both complement methods, positive numbers are represented as in the signed-magnitude method
➢ A negative number -Y is represented by R-Y, where R is a constant
➢ This representation satisfies -(-Y)=Y since R-(R-Y)=Y

Advantage of Complement Representation
➢ No decisions made before executing addition or subtraction
➢ Example: X-Y=X+(-Y)
  ➢ -Y is represented by R-Y
  ➢ Addition is performed by X+(R-Y) = R-(Y-X)
  ➢ If Y>X, -(Y-X) is already represented as R-(Y-X)
➢ No need to interchange the order of the two operands

Two's Complement
➢ r=2, k=n=4, m=0, ulp=$2^0$=1
➢ Radix complement (called two's complement in the binary case) of a number X = $2^4 - X$
➢ It can instead be calculated by $\bar{X}+1$ (complement every bit, then add 1)
➢ 0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$
➢ The two's complement of 0111 is 1000+1=1001
  ➢ it represents the value $(-7)_{10}$
➢ The two's complement of 0000 is 1111+1=10000=0 mod $2^4$: single representation of zero
➢ Each positive number has a corresponding negative number that starts with a 1
➢ 1000, representing $(-8)_{10}$, has no corresponding positive number
➢ Range of representable numbers is $-8 \le X \le 7$
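
A 4-bit Python sketch of the two's complement as described (complement every bit, add 1, keep four bits), with a decoder; both function names are ours. It reproduces the 0111 and 0000 examples and the range -8 to 7.

```python
def twos_complement(x, n=4):
    """Two's complement of an n-bit pattern: complement every bit, add 1, keep n bits."""
    mask = (1 << n) - 1
    return ((~x & mask) + 1) & mask

def tc_decode(pattern, n=4):
    """Value of an n-bit two's-complement pattern (the MSB carries weight -2**(n-1))."""
    return pattern - (1 << n) if (pattern >> (n - 1)) & 1 else pattern

print(format(twos_complement(0b0111), "04b"), tc_decode(0b1001))  # 1001 -7
print(format(twos_complement(0b0000), "04b"))                      # 0000 (single zero)
print(min(map(tc_decode, range(16))), max(map(tc_decode, range(16))))  # -8 7
```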

The Two's Complement Representation
[Figure: the two's-complement representation; image not recoverable.]

Example - Addition in Two's Complement
➢ Calculating X+(-Y) with Y>X: 3+(-5)
        0011     3
      + 1011    -5
        ----
        1110    -2
➢ Correct result represented in the two's complement method; no need for preliminary decisions or post-corrections
➢ Calculating X+(-Y) with X>Y: 5+(-3)
        0101     5
      + 1101    -3
        ----
      1 0010     2
➢ Only the four least significant digits are retained, yielding 0010
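
A sketch that replays the two additions above with plain binary addition and the carry out of the MSB discarded; the helpers are hypothetical and assume 4-bit patterns.

```python
def tc_add(a, b, n=4):
    """Add two n-bit two's-complement patterns; the carry out of the MSB is dropped."""
    return (a + b) & ((1 << n) - 1)

def tc_decode(pattern, n=4):
    """Value of an n-bit two's-complement pattern."""
    return pattern - (1 << n) if (pattern >> (n - 1)) & 1 else pattern

for a, b in [(0b0011, 0b1011), (0b0101, 0b1101)]:   # 3 + (-5), 5 + (-3)
    s = tc_add(a, b)
    print(format(s, "04b"), tc_decode(s))
# 1110 -2
# 0010 2
```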

One's Complement in Binary System
➢ r=2, k=n=4, m=0, ulp=$2^0$=1
➢ Diminished-radix complement (called one's complement in the binary case) of a number X = $(2^4 - 1) - X = \bar{X}$
➢ As before, the sequences 0000 to 0111 represent the positive numbers $0_{10}$ to $7_{10}$
➢ The one's complement of 0111 is 1000, representing $(-7)_{10}$
➢ The one's complement of zero is 1111: two representations of zero
➢ Range of representable numbers is $-7 \le X \le 7$
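
The one's-complement rules above as a 4-bit Python sketch (function names are ours): inversion equals $(2^4 - 1) - X$, and 0000 and 1111 both decode to zero.

```python
def ones_complement(x, n=4):
    """One's complement of an n-bit pattern: (2**n - 1) - x, i.e. every bit inverted."""
    return ~x & ((1 << n) - 1)

def oc_decode(pattern, n=4):
    """Value of an n-bit one's-complement pattern."""
    if (pattern >> (n - 1)) & 1:
        return -ones_complement(pattern, n)
    return pattern

print(format(ones_complement(0b0111), "04b"), oc_decode(0b1000))  # 1000 -7
print(oc_decode(0b0000), oc_decode(0b1111))                       # 0 0 (two zeros)
```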

Comparing the Three Representations in a Binary System
[Figure: comparison of signed-magnitude, two's-complement, and one's-complement representations; image not recoverable.]

5.1 Bit-Serial and Ripple-Carry Adders
Inputs    Outputs
x  y      c  s
--------------
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0
Half-adder (HA): Truth table and block diagram (inputs x, y; outputs c, s)

Inputs       Outputs
x  y  cin    cout  s
--------------------
0  0  0      0     0
0  0  1      0     1
0  1  0      0     1
0  1  1      1     0
1  0  0      0     1
1  0  1      1     0
1  1  0      1     0
1  1  1      1     1
Full-adder (FA): Truth table and block diagram (inputs x, y, cin; outputs cout, s)
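
A sketch of the half-adder and full-adder truth tables in Python, computing s as the odd-parity function and $c_{out}$ as the majority function; names are illustrative.

```python
from itertools import product

def half_adder(x, y):
    """Returns (c, s)."""
    return x & y, x ^ y

def full_adder(x, y, cin):
    """Returns (cout, s): s is the odd-parity function, cout the majority function."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return cout, s

print("x y cin | cout s")
for x, y, cin in product((0, 1), repeat=3):
    cout, s = full_adder(x, y, cin)
    print(f"{x} {y}  {cin}  |   {cout}  {s}")
```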

Half-Adder Implementations
[Figure: Three implementations of a half-adder: (a) AND/XOR half-adder; (b) NOR-gate half-adder; (c) NAND-gate half-adder with complemented carry.]

Full-Adder Implementations
[Figure: Possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates: (a) built of half-adders; (b) built as an AND-OR circuit; (c) suitable for CMOS realization (using 4-input muxes).]

Full-Adder Details
Logic equations for a full-adder:
$s = x \oplus y \oplus c_{in}$ (odd parity function)
  $= x y c_{in} \lor x \bar{y} \bar{c}_{in} \lor \bar{x} y \bar{c}_{in} \lor \bar{x} \bar{y} c_{in}$
$c_{out} = x y \lor x c_{in} \lor y c_{in}$ (majority function)
[Figure: CMOS transmission gate and its use in a 2-to-1 mux: (a) CMOS transmission gate: circuit and symbol; (b) two-input mux built of two transmission gates.]

Simple Adders Built of Full-Adders
[Figure: Using full-adders in building bit-serial and ripple-carry adders.]
(a) Bit-serial adder: one FA adds bits $x_i$ and $y_i$ shifted in each clock cycle; the carry $c_{i+1}$ is held in a flip-flop and fed back as $c_i$, and the sum bit $s_i$ is shifted out.
(b) Ripple-carry adder: a chain of FAs for bits 0 to 31, with carries $c_0$ (cin) to $c_{32}$ (cout) and sum bits $s_0$ to $s_{31}$ (plus $s_{32}$ = cout).
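
A behavioral Python sketch of the ripple-carry adder in (b): k full-adders chained so that each carry-out feeds the next carry-in. It models only the logic, not the delay; the function names are ours.

```python
def full_adder(x, y, cin):
    """Returns (cout, s)."""
    return (x & y) | (x & cin) | (y & cin), x ^ y ^ cin

def ripple_carry_add(x, y, k, c0=0):
    """k-bit ripple-carry adder: the carry-out of FA i is the carry-in of FA i+1."""
    s, c = 0, c0
    for i in range(k):
        c, s_i = full_adder((x >> i) & 1, (y >> i) & 1, c)
        s |= s_i << i
    return s, c        # (sum bits s_{k-1}..s_0, carry-out c_k)

print(ripple_carry_add(0b1011, 0b0110, 4))   # (1, 1): 11 + 6 = 17 = 1 0001
```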

Critical Path Through a Ripple-Carry Adder
$T_{ripple\text{-}add} = T_{FA}(x, y \to c_{out}) + (k - 2) \cdot T_{FA}(c_{in} \to c_{out}) + T_{FA}(c_{in} \to s)$
[Figure: Critical path in a k-bit ripple-carry adder (operands $x_{k-1} y_{k-1} \ldots x_0 y_0$, carries $c_0$ to $c_k$, sum bits $s_{k-1} \ldots s_0$).]

Binary Adders as Versatile Building Blocks
[Figure: full-adder truth table (inputs x, y, cin; outputs cout, s) and FA block diagram, annotated as follows.]
➢ Set one input to 0: $c_{out}$ = AND of the other inputs
➢ Set one input to 1: $c_{out}$ = OR of the other inputs
➢ Set one input to 0 and another to 1: s = NOT of the third input
[Figure: Four-bit binary adder (bits 3 to 0, carries $c_0$ to $c_4$) used to realize the logic function $f = w + xyz$ and its complement.]
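
The three rules above can be checked with a behavioral full-adder. The wiring below (bit 0 gets x, y and $c_0 = 0$; bit 1 gets z and 0; bit 2 gets w and 1; bit 3 gets 0 and 1) is our reading of the figure residue and should be treated as an assumption; it yields f as $c_4$ and its complement as $s_3$.

```python
from itertools import product

def full_adder(a, b, cin):
    """Returns (cout, s)."""
    return (a & b) | (a & cin) | (b & cin), a ^ b ^ cin

def f_and_complement(w, x, y, z):
    """Hypothetical wiring of four full-adders to get f = w OR xyz and NOT f."""
    c1, _ = full_adder(x, y, 0)      # one input 0: carry = x AND y
    c2, _ = full_adder(z, 0, c1)     # one input 0: carry = z AND c1 = xyz
    c3, _ = full_adder(w, 1, c2)     # one input 1: carry = w OR c2
    c4, s3 = full_adder(0, 1, c3)    # inputs 0 and 1: s = NOT c3, carry = c3
    return c4, s3                    # (f, complement of f)

for w, x, y, z in product((0, 1), repeat=4):
    print(w, x, y, z, *f_and_complement(w, x, y, z))
```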

Conditions and Exceptions
[Figure: Two's-complement adder with provisions for detecting conditions and exceptions (overflow, negative, zero); a chain of FAs with carries $c_0$ (cin) to $c_k$ (cout) and sum bits $s_{k-1} \ldots s_0$.]
$overflow_{2's\text{-}compl} = c_k \oplus c_{k-1} = c_k \bar{c}_{k-1} \lor \bar{c}_k c_{k-1}$
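
A sketch of the overflow rule for a k-bit two's-complement adder, computing $c_k$ and $c_{k-1}$ explicitly from the operand bit patterns; `add_with_overflow` is a hypothetical name.

```python
def add_with_overflow(a, b, k):
    """k-bit two's-complement addition of bit patterns a and b.

    Returns (sum pattern, overflow), with overflow = c_k XOR c_{k-1}.
    """
    mask = (1 << k) - 1
    low = mask >> 1                                   # the k-1 low-order bit positions
    c_k = ((a + b) >> k) & 1                          # carry out of position k-1
    c_km1 = (((a & low) + (b & low)) >> (k - 1)) & 1  # carry into position k-1
    return (a + b) & mask, c_k ^ c_km1

print(add_with_overflow(0b0111, 0b0001, 4))   # (8, 1): 7 + 1 overflows to 1000
print(add_with_overflow(0b0101, 0b1101, 4))   # (2, 0): 5 + (-3) = 2
```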

Manchester Carry Chains and Adders
Sum digit in radix r: $s_i = (x_i + y_i + c_i) \bmod r$
Special case of radix 2: $s_i = x_i \oplus y_i \oplus c_i$
Computing the carries $c_i$ is thus our central problem.
For this, the actual operand digits are not important. What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed).
For binary addition:
$g_i = x_i y_i$   $p_i = x_i \oplus y_i$   $a_i = \bar{x}_i \bar{y}_i = \overline{(x_i \lor y_i)}$
It is also helpful to define a transfer signal:
$t_i = g_i \lor p_i = \bar{a}_i = x_i \lor y_i$
Using these signals, the carry recurrence is written as
$c_{i+1} = g_i \lor c_i p_i = g_i \lor c_i g_i \lor c_i p_i = g_i \lor c_i t_i$
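
A behavioral sketch of an adder driven only by the g, p, and t signals defined above; the function name is ours. The internal `assert` checks that the p-form and the t-form of the carry recurrence agree at every position.

```python
def add_via_gpt(x, y, k, c0=0):
    """k-bit addition driven by the per-position signals g, p, t.

    Returns (sum, list of carries c_0..c_k).
    """
    c = [c0]
    s = 0
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p, t = xi & yi, xi ^ yi, xi | yi
        c_next = g | (c[i] & p)            # c_{i+1} = g_i OR c_i p_i
        assert c_next == g | (c[i] & t)    # same value as g_i OR c_i t_i
        c.append(c_next)
        s |= (p ^ c[i]) << i               # s_i = x_i XOR y_i XOR c_i
    return s, c

print(add_via_gpt(0b1011, 0b0110, 4))   # (1, [0, 0, 1, 1, 1])
```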

Carry Network is the Essence of a Fast Adder
$g_i$  $p_i$   Carry is:
0      0       annihilated or killed
0      1       propagated
1      0       generated
1      1       (impossible)
with $g_i = x_i y_i$ and $p_i = x_i \oplus y_i$
[Figure: a carry network (ripple; skip; lookahead; parallel-prefix) takes $g_{k-1}, p_{k-1}, \ldots, g_0, p_0$ and $c_0$ and produces the carries $c_1, \ldots, c_k$; each $c_i$ is combined with $p_i$ to form the sum bit $s_i$.]
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

Ripple-Carry Adder Revisited
The carry recurrence: $c_{i+1} = g_i \lor p_i c_i$
Latency of a k-bit adder is roughly 2k gate delays:
 1 gate delay for production of p and g signals, plus
 2(k – 1) gate delays for carry propagation, plus
 1 XOR gate delay for generation of the sum bits
[Figure: The carry propagation network of a ripple-carry adder ($g_{k-1}, p_{k-1}, \ldots, g_0, p_0$ in; carries $c_1, \ldots, c_k$ out).]

The Complete Design of a Ripple-Carry Adder
[Figure: the generic adder structure (g and p generation, carry network, sum bits) with the ripple-carry propagation network used as its carry network.]

Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal   Radix r                            Binary
$g_i$    is 1 iff $x_i + y_i \ge r$         $x_i y_i$
$p_i$    is 1 iff $x_i + y_i = r - 1$       $x_i \oplus y_i$
$a_i$    is 1 iff $x_i + y_i < r - 1$       $\bar{x}_i \bar{y}_i = \overline{(x_i \lor y_i)}$
$t_i$    is 1 iff $x_i + y_i \ge r - 1$     $x_i \lor y_i$
$s_i$    $(x_i + y_i + c_i) \bmod r$        $x_i \oplus y_i \oplus c_i$
The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation:
$c_i = g_{i-1} \lor c_{i-1} p_{i-1}$
$= g_{i-1} \lor (g_{i-2} \lor c_{i-2} p_{i-2}) p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor c_{i-2} p_{i-2} p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor c_{i-3} p_{i-3} p_{i-2} p_{i-1}$
$= g_{i-1} \lor g_{i-2} p_{i-1} \lor g_{i-3} p_{i-2} p_{i-1} \lor g_{i-4} p_{i-3} p_{i-2} p_{i-1} \lor c_{i-4} p_{i-4} p_{i-3} p_{i-2} p_{i-1}$
$= \ldots$

Full Carry Lookahead
[Figure: each sum bit $s_3, \ldots, s_0$ computed directly from the operand bits $x_3 y_3 \ldots x_0 y_0$ and $c_{in}$.]
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it.
Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits.

Four-Bit Carry-Lookahead Adder
Full carry lookahead is quite practical for a 4-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
$c_4 = g_3 \lor g_2 p_3 \lor g_1 p_2 p_3 \lor g_0 p_1 p_2 p_3 \lor c_0 p_0 p_1 p_2 p_3$
Complexity is reduced by deriving the carry-out indirectly.
[Figure: Four-bit carry network with full lookahead.]
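
The four lookahead equations transcribed into Python as a sketch (helper names are ours), with an exhaustive comparison against ordinary addition in the test.

```python
def cla4_carries(g, p, c0):
    """Carries c1..c4 of a 4-bit adder with full lookahead; g, p = [g0..g3], [p0..p3]."""
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (c0 & p0)
    c2 = g1 | (g0 & p1) | (c0 & p0 & p1)
    c3 = g2 | (g1 & p2) | (g0 & p1 & p2) | (c0 & p0 & p1 & p2)
    c4 = (g3 | (g2 & p3) | (g1 & p2 & p3) | (g0 & p1 & p2 & p3)
          | (c0 & p0 & p1 & p2 & p3))
    return c1, c2, c3, c4

def gp_signals(x, y):
    """Per-bit generate and propagate signals of two 4-bit operands."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    return g, p

print(cla4_carries(*gp_signals(0b1011, 0b0110), 0))   # (0, 1, 1, 1)
```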

Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder:
$c_1 = g_0 \lor c_0 p_0$
$c_2 = g_1 \lor g_0 p_1 \lor c_0 p_0 p_1$
$c_3 = g_2 \lor g_1 p_2 \lor g_0 p_1 p_2 \lor c_0 p_0 p_1 p_2$
  ⋮
$c_{31} = g_{30} \lor g_{29} p_{30} \lor g_{28} p_{29} p_{30} \lor g_{27} p_{28} p_{29} p_{30} \lor \cdots \lor c_0 p_0 p_1 p_2 p_3 \cdots p_{29} p_{30}$
The last term needs a 32-input AND and the whole expression a 32-input OR; such high fan-ins necessitate tree-structured circuits.

Solutions to the Fan-in Problem
• Multilevel lookahead
• Block adders
• High-radix addition (i.e., radix $2^h$): increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Example: 16-bit addition
 • Radix-16 (four digits)
 • Two-level carry lookahead (four 4-bit blocks)
Either way, the carries $c_4$, $c_8$, and $c_{12}$ are determined first.
[Figure: carries $c_{16}$ (cout) down to $c_0$ (cin) of a 16-bit adder, with $c_{12}$, $c_8$, and $c_4$ marked "?".]

Block Ripple Adder
[Figure: block ripple adder; image not recoverable.]

Larger Carry-Lookahead Adder Design
Block generate and propagate signals:
$g_{[i,i+3]} = g_{i+3} \lor g_{i+2} p_{i+3} \lor g_{i+1} p_{i+2} p_{i+3} \lor g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and it can be propagated to the MSB, the block generates a carry.
[Figure: 4-bit lookahead carry generator, with inputs $g_{i+3}, p_{i+3}, \ldots, g_i, p_i$ and $c_i$, and outputs $c_{i+1}, c_{i+2}, c_{i+3}$, $g_{[i,i+3]}$, and $p_{[i,i+3]}$.]
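
A sketch of the block signals for one 4-bit block, together with the block carry-out $c_{i+4} = g_{[i,i+3]} \lor c_i p_{[i,i+3]}$, which follows from the two bullet rules above; function names are hypothetical.

```python
def block_gp(g, p):
    """Block signals g[i,i+3], p[i,i+3]; g and p list positions i, i+1, i+2, i+3."""
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    block_g = g3 | (g2 & p3) | (g1 & p2 & p3) | (g0 & p1 & p2 & p3)
    block_p = p0 & p1 & p2 & p3
    return block_g, block_p

def block_carry_out(x, y, c_in):
    """Carry out of a 4-bit block: c_{i+4} = g[i,i+3] OR c_i p[i,i+3]."""
    g = [((x >> j) & (y >> j)) & 1 for j in range(4)]
    p = [((x >> j) ^ (y >> j)) & 1 for j in range(4)]
    block_g, block_p = block_gp(g, p)
    return block_g | (c_in & block_p)

print(block_carry_out(0b1111, 0b0000, 1), block_carry_out(0b0111, 0b0001, 0))   # 1 0
```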

A Building Block for Carry-Lookahead Addition
[Figure: Four-bit lookahead carry generator (block signal generation of $g_{[i,i+3]}$, $p_{[i,i+3]}$ and intermediate carries $c_{i+1}, c_{i+2}, c_{i+3}$ from $c_i$), shown next to a four-bit adder with carries $c_1$ to $c_4$.]

Combining Block g and p Signals
[Figure: Combining of g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block.]

A Two-Level Carry-Lookahead Adder
[Figure: Building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators. First-level 4-bit lookahead carry generators combine $g_{[0,3]}, \ldots, g_{[12,15]}$ (and the p signals) into $g_{[0,15]}, p_{[0,15]}$ while producing $c_4, c_8, c_{12}$; a second-level generator combines $g_{[0,15]}, g_{[16,31]}, g_{[32,47]}, g_{[48,63]}$ into $g_{[0,63]}, p_{[0,63]}$ while producing $c_{16}, c_{32}, c_{48}$.]

Ling Adder and Related Designs
Consider the carry recurrence and its unrolling by 4 steps:
$c_i = g_{i-1} \lor c_{i-1} t_{i-1}$
$= g_{i-1} \lor g_{i-2} t_{i-1} \lor g_{i-3} t_{i-2} t_{i-1} \lor g_{i-4} t_{i-3} t_{i-2} t_{i-1} \lor c_{i-4} t_{i-4} t_{i-3} t_{i-2} t_{i-1}$
Ling's modification: propagate $h_i = c_i \lor c_{i-1}$ instead of $c_i$
$h_i = g_{i-1} \lor h_{i-1} t_{i-2}$
$= g_{i-1} \lor g_{i-2} \lor g_{i-3} t_{i-2} \lor g_{i-4} t_{i-3} t_{i-2} \lor h_{i-4} t_{i-4} t_{i-3} t_{i-2}$
        gates   max inputs   gate inputs
CLA:    5       5            19
Ling:   4       5            14
The advantage of $h_i$ over $c_i$ is even greater with wired-OR:
CLA:    4       5            14
Ling:   3       4            9
Once $h_i$ is known, however, the sum is obtained by a slightly more complex expression compared with $s_i = p_i \oplus c_i$:
$s_i = (t_i \oplus h_{i+1}) \lor h_i g_i t_{i-1}$
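
A behavioral sketch of a Ling-style adder that carries $h_i$ instead of $c_i$, using the recurrence $h_{i+1} = g_i \lor h_i t_{i-1}$ and the sum expression above, plus $c_k = h_k t_{k-1}$ for the carry-out. The boundary conditions $h_0 = c_0$ and $t_{-1} = 1$ are our assumptions, and the function name is ours.

```python
def ling_add(x, y, k, c0=0):
    """k-bit addition that propagates Ling's h_i = c_i OR c_{i-1} instead of c_i."""
    g = [((x >> i) & (y >> i)) & 1 for i in range(k)]
    t = [((x >> i) | (y >> i)) & 1 for i in range(k)]

    def t_prev(i):
        # t_{i-1}; assumption: t_{-1} = 1 so that c_0 = h_0 t_{-1} holds
        return t[i - 1] if i > 0 else 1

    h = [c0]                                   # h_0 = c_0 (taking c_{-1} = 0)
    for i in range(k):
        h.append(g[i] | (h[i] & t_prev(i)))    # h_{i+1} = g_i OR h_i t_{i-1}
    s = 0
    for i in range(k):
        s_i = (t[i] ^ h[i + 1]) | (h[i] & g[i] & t_prev(i))
        s |= s_i << i
    return s, h[k] & t[k - 1]                  # carry-out c_k = h_k t_{k-1}

print(ling_add(0b1011, 0b0110, 4))   # (1, 1)
```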

Carry Determination as Prefix Computation
[Figure: Block B' (positions $i_0$ to $j_0$) with signals (g', p') and block B'' (positions $i_1$ to $j_1$) with signals (g'', p'') are combined by the carry operator ¢ into block B with signals (g, p); B'' is the more significant block.]
$g = g'' \lor g' p''$
$p = p' p''$

Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given $(g_0, p_0)\ (g_1, p_1) \ldots (g_{k-2}, p_{k-2})\ (g_{k-1}, p_{k-1})$
Find $(g_{[0,0]}, p_{[0,0]})\ (g_{[0,1]}, p_{[0,1]}) \ldots (g_{[0,k-2]}, p_{[0,k-2]})\ (g_{[0,k-1]}, p_{[0,k-1]})$, corresponding to the carries $c_1\ c_2 \ldots c_{k-1}\ c_k$
The desired pairs are found by evaluating all prefixes of
$(g_0, p_0)$ ¢ $(g_1, p_1)$ ¢ … ¢ $(g_{k-2}, p_{k-2})$ ¢ $(g_{k-1}, p_{k-1})$
The carry operator ¢ is associative, but not commutative:
$[(g_1, p_1)$ ¢ $(g_2, p_2)]$ ¢ $(g_3, p_3)$ = $(g_1, p_1)$ ¢ $[(g_2, p_2)$ ¢ $(g_3, p_3)]$
Prefix sums analogy:
Given $x_0\ x_1\ x_2 \ldots x_{k-1}$
Find $x_0,\ x_0+x_1,\ x_0+x_1+x_2,\ \ldots,\ x_0+x_1+\cdots+x_{k-1}$
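
A sketch of the carry operator ¢ on (g, p) pairs (left argument = less significant block), checking associativity over all pairs and showing that the order of operands matters. Prefixes are computed with `itertools.accumulate`, assuming $c_0 = 0$ so that $g_{[0,j]} = c_{j+1}$; the function names are ours.

```python
from itertools import accumulate, product

def carry_op(low, high):
    """Carry operator: (g', p') of the less significant block, (g'', p'') of the more
    significant block -> (g'' OR g' p'', p' p'')."""
    g1, p1 = low
    g2, p2 = high
    return g2 | (g1 & p2), p1 & p2

pairs = list(product((0, 1), repeat=2))   # every (g, p) pair, including (1, 1)
assert all(carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
           for a in pairs for b in pairs for c in pairs)      # associative
print(carry_op((1, 0), (0, 0)), carry_op((0, 0), (1, 0)))      # (0, 0) (1, 0): not commutative

def carries(x, y, k):
    """c_1..c_k (with c_0 = 0) as the g parts of all prefixes (g[0,j], p[0,j])."""
    gp = [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]
    return [g for g, _ in accumulate(gp, carry_op)]

print(carries(0b1011, 0b0110, 4))   # [0, 1, 1, 1]
```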

Example Prefix-Based Carry Network
[Figure: Top: a four-input prefix sums network (two levels of + cells) applied to the inputs 6, -1, 2, 5 (positions 3 to 0, scan order from right to left) yields 12, 6, 7, 5. Bottom: the same network with + replaced by the carry operator ¢ forms a four-bit carry lookahead network; from $(g_3, p_3), \ldots, (g_0, p_0)$ it produces $(g_{[0,3]}, p_{[0,3]}), \ldots, (g_{[0,0]}, p_{[0,0]})$, whose g components are $c_4, c_3, c_2, c_1$.]

Alternative Parallel Prefix Networks
[Figure: Parallel prefix sums network built of two k/2-input networks and k/2 adders (Ladner-Fischer): one network handles $x_{k/2-1} \ldots x_0$, the other $x_{k-1} \ldots x_{k/2}$, and the last prefix of the lower half, $s_{k/2-1}$, is added to every output of the upper half.]

Brent-Kung Recursive Construction
[Figure: Parallel prefix sums network built of one k/2-input network and k – 1 adders: adjacent input pairs are first added, the k/2 pair sums go through a k/2-input prefix sums network, and a final row of adders fills in the remaining outputs $s_{k-1} \ldots s_0$.]

Brent-Kung Carry Network (8-Bit Adder)
[Figure: Brent-Kung carry network for an 8-bit adder. ¢ cells combine the position signals [0,0] to [7,7] into [0,1], [2,3], [4,5], [6,7], then [0,3] and [4,7], and finally produce all prefixes [0,0] to [0,7].]

Brent-Kung Carry Network (16-Bit Adder)
[Figure: Brent-Kung parallel prefix graph for 16 inputs ($x_{15} \ldots x_0$ in, $s_{15} \ldots s_0$ out), arranged in 6 levels; an annotation marks the reason for its latency.]

Kogge-Stone Carry Network (16-Bit Adder)
[Figure: Kogge-Stone parallel prefix graph for 16 inputs ($x_{15} \ldots x_0$ in, $s_{15} \ldots s_0$ out), with $\log_2 k$ levels (minimum possible).]

Speed-Cost Tradeoffs in Carry Networks
Method           Delay   Cost
Ladner-Fischer   ?       $(k/2) \log_2 k$
Kogge-Stone      ?       $k \log_2 k - k + 1$
Brent-Kung       ?       $2k - 2 - \log_2 k$
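
A sketch that builds the Kogge-Stone and Brent-Kung prefix networks from the ¢ operator and counts their cells and logic levels (one level per cell delay is our simplifying assumption; k is assumed to be a power of 2). It reproduces the Cost column above; the Delay column is left as in the slides. All function names are ours.

```python
def carry_op(low, high):
    """(g', p') of the lower block, (g'', p'') of the higher block."""
    return high[0] | (low[0] & high[1]), low[1] & high[1]

def kogge_stone(pairs):
    """Kogge-Stone prefix network; returns (prefixes, cells, levels)."""
    x, k = list(pairs), len(pairs)
    cells = levels = 0
    d = 1
    while d < k:
        # every position i >= d combines with the position d places below it
        x = [x[i] if i < d else carry_op(x[i - d], x[i]) for i in range(k)]
        cells += k - d
        levels += 1
        d *= 2
    return x, cells, levels

def brent_kung(pairs):
    """Brent-Kung prefix network; returns (prefixes, cells, levels)."""
    x, k = list(pairs), len(pairs)
    depth = [0] * k   # logic level at which each position's value becomes ready
    cells = 0

    def cell(lo, hi):
        nonlocal cells
        x[hi] = carry_op(x[lo], x[hi])
        depth[hi] = max(depth[lo], depth[hi]) + 1
        cells += 1

    d = 1
    while d < k:                          # up-sweep: a reduction tree
        for i in range(2 * d - 1, k, 2 * d):
            cell(i - d, i)
        d *= 2
    d = k // 4
    while d >= 1:                         # down-sweep: fill in the missing prefixes
        for i in range(3 * d - 1, k, 2 * d):
            cell(i - d, i)
        d //= 2
    return x, cells, max(depth)

def gp(x, y, k):
    """Per-position (g, p) pairs of two k-bit operands."""
    return [(((x >> i) & (y >> i)) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]

for net in (kogge_stone, brent_kung):
    _, cells, levels = net(gp(0, 0, 16))
    print(net.__name__, cells, levels)
# kogge_stone 49 4
# brent_kung 26 6
```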

Hybrid B-K/K-S Carry Network (16-Bit Adder)
[Figure: A hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs, compared with the pure designs. Brent-Kung: 6 levels, 26 cells; Kogge-Stone: 4 levels, 49 cells; hybrid (Brent-Kung outer levels around a Kogge-Stone core): 5 levels, 32 cells.]

Simple Carry-Skip Adders
[Figure: (a) ripple-carry adder built of four 4-bit blocks of ripple-carry stages, with carries $c_0, c_4, c_8, c_{12}, c_{16}$; (b) simple carry-skip adder, in which each block's propagate signal $p_{[0,3]}, p_{[4,7]}, p_{[8,11]}, p_{[12,15]}$ drives skip logic (2 gates).]
Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.
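
A functional sketch of the simple carry-skip adder with b-bit blocks (names and defaults are ours). Logically, the skip path produces the same carry the ripple chain would, so the sum matches a ripple-carry adder; the benefit is a shorter worst-case delay, which this model does not capture.

```python
def carry_skip_add(x, y, k=16, b=4, c0=0):
    """Simple carry-skip adder: ripple-carry inside each b-bit block, plus skip logic
    that passes the block's carry-in to its carry-out when p[j, j+b-1] = 1."""
    s, c = 0, c0
    for j in range(0, k, b):
        c_in = c
        block_p = 1
        for i in range(j, min(j + b, k)):
            xi, yi = (x >> i) & 1, (y >> i) & 1
            p_i = xi ^ yi
            s |= (p_i ^ c) << i                 # s_i = p_i XOR c_i
            c = (xi & yi) | (c & p_i)           # ripple: c_{i+1} = g_i OR c_i p_i
            block_p &= p_i
        c = c | (block_p & c_in)                # skip logic (2 gates): AND, then OR
    return s, c

print(carry_skip_add(0xFFFF, 0x0001))   # (0, 1)
```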

Another View of Carry-Skip Addition
[Figure: Street/freeway analogy for carry-skip adder: the carry ripples through positions 4j to 4j+3 ($g_{4j}, p_{4j}, \ldots, g_{4j+3}, p_{4j+3}$) along a "one-way street" from $c_{4j}$ to $c_{4j+4}$, or takes the "freeway" that skips the block.]

Multilevel Carry-Skip Adders
[Figure: One-level carry-skip adder (a row of S1 skip blocks between cin and cout).]
[Figure: Example of a two-level carry-skip adder (S1 blocks grouped under an S2 skip block).]
[Figure: Two-level carry-skip adder optimized by removing the short-block skip circuits.]

Using Two-Operand Adders
Some applications of multioperand addition:
[Figure: Multioperand addition problems for multiplication or inner-product computation in dot notation. Multiplication: the four shifted terms $x_0 a 2^0$, $x_1 a 2^1$, $x_2 a 2^2$, $x_3 a 2^3$ are added to form the product p. Inner product: seven terms $p^{(0)}, \ldots, p^{(6)}$ are added to form s.]

Serial Implementation with One Adder
[Figure: Serial implementation of multi-operand addition with a single 2-operand adder: each k-bit input $x^{(i)}$ is added to a (k + $\log_2 n$)-bit partial sum register holding $\sum_{j=0}^{i-1} x^{(j)}$.]

Pipelined Implementation for Higher Throughput
[Figure: Serial multi-operand addition when each adder is a 4-stage pipeline: partial sums such as $x^{(i)} + x^{(i-1)}$, $x^{(i-4)} + x^{(i-5)}$, $x^{(i-6)} + x^{(i-7)}$, and $x^{(i-8)} + x^{(i-9)} + x^{(i-10)} + x^{(i-11)}$ are in flight at once, ready to compute $s^{(i-12)}$.]

Parallel Implementation as Tree of Adders
[Figure: Adding 7 numbers in a binary tree of adders: n – 1 adders arranged in $\lceil \log_2 n \rceil$ adder levels, with operand widths growing from k to k+1, k+2, and k+3 bits.]

Carry-Save Adders
A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
[Figure: a cut through the carry chain of a row of FAs turns a carry-propagate adder into a carry-save adder. Carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation; a CSA is also called a (3; 2)-counter or 3-to-2 reduction circuit.]
Specifying full- and half-adder blocks, with their inputs and outputs, in dot notation.
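
A sketch of a carry-save adder on Python integers: every bit position is an independent full-adder, so the result is kept as a sum word and a carry word; a single carry-propagate addition at the end resolves them. Names are ours and operands are assumed nonnegative.

```python
def csa(x, y, z):
    """Carry-save adder: a row of independent full-adders, no carry propagation.

    Returns (sum word, carry word) with x + y + z == sum word + carry word.
    """
    sum_word = x ^ y ^ z                                # per-bit parity
    carry_word = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, moved up one position
    return sum_word, carry_word

def multioperand_add(operands):
    """Reduce nonnegative operands to two with CSAs, then do one carry-propagate addition."""
    ops = list(operands)
    while len(ops) > 2:
        ops = ops[3:] + list(csa(*ops[:3]))             # 3-to-2 reduction
    return sum(ops)                                     # the final CPA

print(multioperand_add([17, 42, 5, 63, 28, 9, 51]))     # 215
```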

Multioperand Addition Using Carry-Save Adders
[Figure: Left: serial carry-save addition using a single CSA, with sum and carry registers feeding a final carry-propagate adder. Right: tree of carry-save adders reducing seven numbers to two, followed by a CPA.]

Example Reduction by a CSA Tree
Bit position   8 7 6 5 4 3 2 1 0
                     7 7 7 7 7 7   6 × 2 = 12 FAs
                   2 5 5 5 5 5 3   6 FAs
                   3 4 4 4 4 4 1   6 FAs
                 1 2 3 3 3 3 2 1   4 FAs + 1 HA
                 2 2 2 2 2 1 2 1   7-bit adder
       --Carry-propagate adder--
               1 1 1 1 1 1 1 1 1
Representing a seven-operand addition in tabular form.
Total cost = 7-bit adder + 28 FAs + 1 HA
A full-adder compacts 3 dots into 2 (compression ratio of 1.5); a half-adder rearranges 2 dots (no compression, but still useful).
[Figure: Addition of seven 6-bit numbers in dot notation.]

Width of Adders in a CSA Tree
[Figure: Adding seven k-bit numbers and the CSA/CPA widths required. Four levels of k-bit CSAs reduce the seven [0, k–1] operands to two, and a k-bit CPA produces the (k+3)-bit result. The index pair [i, j] means that bit positions from i up to j are involved.]
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels.

Wallace Tree Multiplier
[Figures: several slides illustrating Wallace tree multipliers; images not recoverable.]

Dadda Tree Multiplier
[Figures: several slides illustrating Dadda tree multipliers; images not recoverable.]

Saturating Adders
Saturating (saturation) arithmetic:
 When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format
Example: In 8-bit 2's-complement format, we have:
 120 + 26 → -110 (wraparound); 120 +sat 26 → 127 (saturating)
Saturating arithmetic is desirable in many DSP applications.
Designing saturating adders:
 Unsigned (quite easy)
 Signed (slightly harder)
[Figure: an adder followed by a 2-to-1 mux; the overflow signal selects either the adder output (input 0) or the saturation value (input 1).]
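
A sketch of the two saturating adders in 8-bit arithmetic, assuming operands already lie in range; function names are hypothetical. For signed addition, overflow is detected when both operands have the same sign but the wrapped result does not, and the mux then selects 127 or -128.

```python
def wrap_add(x, y, n=8):
    """n-bit two's-complement addition with wraparound (x, y in range)."""
    s = (x + y) & ((1 << n) - 1)
    return s - (1 << n) if (s >> (n - 1)) & 1 else s

def sat_add_signed(x, y, n=8):
    """Saturating signed addition: on overflow, return the most positive or most negative value."""
    s = wrap_add(x, y, n)
    # overflow: operands have the same sign but the wrapped result has the other sign
    if (x < 0) == (y < 0) and (s < 0) != (x < 0):
        return -(1 << (n - 1)) if x < 0 else (1 << (n - 1)) - 1
    return s

def sat_add_unsigned(x, y, n=8):
    """Saturating unsigned addition: a carry-out selects the all-ones value."""
    total = x + y
    return (1 << n) - 1 if total >> n else total

print(wrap_add(120, 26), sat_add_signed(120, 26))   # -110 127
```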

Readings:
➢ Main reference for the above slides:
  ➢ Chapters 5, 6, 7, & 8, B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
