Computer Arithmetic: Introduction: Programmable Logic Circuits

ELECT 90X
Programmable Logic Circuits:

Computer Arithmetic: Introduction
Dr. Eng. Amr T. Abdel-Hamid
Slides based on slides prepared by:

• B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
• I. Koren, Computer Arithmetic Algorithms, 2nd Edition, A.K. Peters, Natick, MA, 2002.
Fall 2009
What is Computer Arithmetic?
Pentium Division Bug (1994-95): Pentium's radix-4 SRT algorithm occasionally gave an incorrect quotient
First noted in 1994 by T. Nicely, who computed sums of reciprocals of twin primes:
1/5 + 1/7 + 1/11 + 1/13 + . . . + 1/p + 1/(p + 2) + . . .
Worst-case example of division error in Pentium:
c = 4 195 835 / 3 145 727 = 1.333 820 44... (correct quotient)
                          = 1.333 739 06... (circa 1994 Pentium double FLP value; accurate to only 14 bits, worse than single!)
Dr. Amr Talaat

A Motivating Example
Using a calculator with √, x^2, and x^y functions, compute:
u = √√ … √2 = 1.000 677 131 ("1024th root of 2", ten successive square roots)
v = 2^(1/1024) = 1.000 677 131
Save u and v; if you can't save, recompute values when needed
x = (((u^2)^2)...)^2 = 1.999 999 963
x' = u^1024 = 1.999 999 973
y = (((v^2)^2)...)^2 = 1.999 999 983
y' = v^1024 = 1.999 999 994
Perhaps v and u are not really the same value
w = v – u = 1 × 10^–11 (nonzero due to hidden digits)
(u – 1) × 1000 = 0.677 130 680 [Hidden ... (0) 68]
(v – 1) × 1000 = 0.677 130 690 [Hidden ... (0) 69]

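The calculator experiment can be approximated in software. The following is only a sketch: it assumes a calculator that rounds every intermediate result to a fixed number of significant decimal digits (a real calculator's hidden digits may behave differently), and the function name `root_then_square` is made up for this illustration.

```python
from decimal import Decimal, localcontext

def root_then_square(digits: int, steps: int = 10) -> Decimal:
    """Take `steps` square roots of 2, then square the result `steps` times,
    rounding every intermediate value to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits               # assumed calculator precision
        u = Decimal(2)
        for _ in range(steps):
            u = u.sqrt()                # after 10 steps, u is close to 2^(1/1024)
        x = u
        for _ in range(steps):
            x = x * x                   # every squaring roughly doubles the relative error
        return x

if __name__ == "__main__":
    for digits in (10, 12, 16):
        print(digits, root_then_square(digits))
```

With fewer digits the final value drifts further from 2, because the small rounding error in u is amplified by a factor of about 1024 during the ten squarings.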
Finite Range Can Lead to Disaster
Example: Explosion of Ariane Rocket (1996 June 4)
Unmanned Ariane 5 rocket of the European Space Agency veered off its flight path, broke up, and exploded only 30 s after lift-off (altitude of 3700 m)
The $500 million rocket (with cargo) was on its first voyage after a decade of development costing $7 billion
Cause: "software error in the inertial reference system"
Problem specifics: A 64-bit floating-point number relating to the horizontal velocity of the rocket was being converted to a 16-bit signed integer
An SRI* software exception arose during conversion because the 64-bit floating-point number had a value greater than what could be represented by a 16-bit signed integer (max 32 767)
*SRI = Inertial Reference System

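The range problem behind this failure can be illustrated with a small sketch. This is not the flight software; the function names are hypothetical and the code only contrasts a range-checked conversion with an unchecked one that keeps the low 16 bits.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767

def to_int16_checked(value: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value is out of range."""
    truncated = int(value)                      # truncate toward zero
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated

def to_int16_wrapped(value: float) -> int:
    """Unchecked conversion: keep only the low 16 bits (two's-complement wraparound)."""
    low16 = int(value) & 0xFFFF
    return struct.unpack("<h", struct.pack("<H", low16))[0]

if __name__ == "__main__":
    print(to_int16_checked(32767.9))            # largest value that still fits
    print(to_int16_wrapped(40000.0))            # silently becomes a negative number
```

A checked conversion turns the out-of-range case into an explicit error that the caller has to handle; the wrapped version shows what a silent 16-bit truncation would produce instead.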
Encoding Numbers in 4 Bits
[Figure: number line from –16 to 16 showing the values representable in each 4-bit number format]
Number formats shown:
• Unsigned integers
• Signed-magnitude
• 3 + 1 fixed-point, xxx.x
• Signed fraction, ±.xxx
• 2's-compl. fraction, x.xxx
• 2 + 2 floating-point, s × 2^e, e in [–2, 1], s in [0, 3]
• 2 + 2 logarithmic (log = xx.xx)
Some of the possible ways of assigning 16 distinct codes to represent numbers.

The Binary Number System
➢ In conventional digital computers - integers represented as binary numbers of fixed length n
➢ An ordered sequence (x_{n-1}, x_{n-2}, ..., x_1, x_0) of binary digits
➢ Each digit x_i (bit) is 0 or 1
➢ The above sequence represents the integer value X = x_{n-1} 2^(n-1) + x_{n-2} 2^(n-2) + ... + x_1 2 + x_0
➢ Upper case letters represent numerical values or sequences of digits
➢ Lower case letters, usually indexed, represent individual digits

Radix of a Number System
➢ The weight of the digit x_i is the i-th power of 2
➢ 2 is the radix of the binary number system
➢ Binary numbers are radix-2 numbers - allowed digits are 0, 1
➢ Decimal numbers are radix-10 numbers - allowed digits are 0, 1, 2, …, 9
➢ Radix indicated in subscript as a decimal number
➢ Example:
➢ (101)_10 - decimal value 101
➢ (101)_2 - decimal value 5

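The positional weighting can be checked with a few lines of code. This is a minimal sketch; the helper name `digits_to_value` is chosen here for illustration.

```python
def digits_to_value(digits, radix):
    """Evaluate a digit sequence (most significant digit first) in the given radix."""
    value = 0
    for d in digits:
        if not 0 <= d < radix:
            raise ValueError(f"digit {d} is not allowed in radix {radix}")
        value = value * radix + d       # Horner's rule: digit x_i ends up with weight radix^i
    return value

if __name__ == "__main__":
    print(digits_to_value([1, 0, 1], 10))   # (101)_10
    print(digits_to_value([1, 0, 1], 2))    # (101)_2
```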
Range of Representations
➢ Operands and results are stored in registers of fixed length n - finite number of distinct values that can be represented within an arithmetic unit
➢ Xmin ; Xmax - smallest and largest representable values
➢ [Xmin,Xmax] - range of the representable numbers
➢ A result larger than Xmax or smaller than Xmin - incorrectly represented
➢ The arithmetic unit should indicate that the generated result is in error - an overflow indication

Example - Overflow in Binary System
➢ Unsigned integers with 5 binary digits (bits)
➢ Xmax = (31)_10 - represented by (11111)_2
➢ Xmin = (0)_10 - represented by (00000)_2
➢ Increasing Xmax by 1 gives (32)_10 = (100000)_2
➢ 5-bit representation - only the last five digits retained - yielding (00000)_2 = (0)_10
➢ In general -
➢ A number X not in the range [Xmin,Xmax] = [0,31] is represented by X mod 32
➢ If X+Y exceeds Xmax - the result is S = (X+Y) mod 32
➢ Example:
       X     10001    17
      +Y     10010    18
          1  00011     3 = 35 mod 32
➢ Result has to be stored in a 5-bit register - the most significant bit (with weight 2^5 = 32) is discarded

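The wraparound behaviour of an n-bit register can be sketched as follows. The function name and the returned overflow flag are illustrative choices, not part of the slides.

```python
def add_unsigned(x: int, y: int, n: int = 5):
    """Add two unsigned n-bit values; return (stored result, overflow flag)."""
    mask = (1 << n) - 1                 # Xmax = 2^n - 1
    total = x + y
    return total & mask, total > mask   # keep the low n bits, flag the discarded bit

if __name__ == "__main__":
    print(add_unsigned(17, 18))         # the example above: 35 mod 32
    print(add_unsigned(31, 1))          # Xmax + 1 wraps to 0
```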
Fixed Radix Systems
➢ r - the radix of the number system
➢ Conventional number systems are also called fixed-radix systems
➢ With no redundancy - 0 ≤ x_i ≤ r-1
➢ x_i ≥ r introduces redundancy into the fixed-radix number system (how?)
➢ If x_i ≥ r is allowed - two machine representations for the same value: (..., x_{i+1}, x_i, ...) and (..., x_{i+1}+1, x_i - r, ...)

Representation of Mixed Numbers
➢ A sequence of n digits in a register - not necessarily representing an integer
➢ Can represent a mixed number with a fractional part and an integral part
➢ The n digits are partitioned into two - k in the integral part and m in the fractional part (k+m=n)
➢ The value of an n-tuple (x_{k-1} ... x_1 x_0 . x_{-1} ... x_{-m}) with a radix point between the k most significant digits and the m least significant digits is X = x_{k-1} r^(k-1) + ... + x_1 r + x_0 + x_{-1} r^(-1) + ... + x_{-m} r^(-m)

Fixed Point Representations
➢ Radix point not stored in register - understood to be in a fixed position between the k most significant digits and the m least significant digits
➢ These are called fixed-point representations
➢ Programmer not restricted to the predetermined position of the radix point
➢ Operands can be scaled - same scaling for all operands
➢ Add and subtract operations are correct -
➢ aX ± aY = a(X ± Y) (a - scaling factor)
➢ Corrections required for multiplication and division
➢ aX • aY = a^2 X • Y ; aX/aY = X/Y
➢ Commonly used positions for the radix point -
➢ rightmost side of the number (pure integers - m=0)
➢ leftmost side of the number (pure fractions - k=0)

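The scaling rules can be tried out with integers standing in for register contents. The sketch below assumes a binary format with m = 4 fractional bits, so the scaling factor is a = 2^4; the helper names are hypothetical.

```python
M = 4                 # assumed number of fractional bits
A = 1 << M            # scaling factor a = 2^m

def to_fixed(x: float) -> int:
    return round(x * A)                 # stored integer is aX

def from_fixed(X: int) -> float:
    return X / A

def fx_add(X: int, Y: int) -> int:
    return X + Y                        # aX + aY = a(X + Y): no correction needed

def fx_mul(X: int, Y: int) -> int:
    return (X * Y) >> M                 # aX * aY = a^2 XY: divide by a once (truncating)

def fx_div(X: int, Y: int) -> int:
    return (X << M) // Y                # aX / aY = X/Y: multiply by a once

if __name__ == "__main__":
    x, y = to_fixed(1.5), to_fixed(2.5)
    print(from_fixed(fx_add(x, y)), from_fixed(fx_mul(x, y)))
```

Addition works directly on the scaled integers, while multiplication and division each need exactly one correction by the scaling factor, as the slide states.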
ULP - Unit in Last Position
➢ Given the length n of the operands, the weight r^-m of the least significant digit indicates the position of the radix point
➢ Unit in the last position (ulp) - the weight of the least significant digit
➢ ulp = r^-m
➢ This notation simplifies the discussion
➢ No need to distinguish between the different partitions of numbers into fractional and integral parts

Representation of Negative Numbers
➢ Fixed-point numbers in a radix r system
➢ Two ways of representing negative numbers:
➢ Sign and magnitude representation (or signed-magnitude representation)
➢ Complement representation with two alternatives
➢ Radix complement (two's complement in the binary system)
➢ Diminished-radix complement (one's complement in the binary system)

Signed-Magnitude Representation
➢ Sign and magnitude are represented separately
➢ First digit is the sign digit, remaining n-1 digits represent the magnitude
➢ Binary case - sign bit is 0 for positive, 1 for negative numbers
➢ Non-binary case - 0 and r-1 indicate positive and negative numbers
➢ Only 2r^(n-1) out of the r^n possible sequences are utilized
➢ Two representations for zero - positive and negative
➢ Inconvenient when implementing an arithmetic unit - when testing for zero, the two different representations must be checked

Disadvantage of the Signed-Magnitude Representation
➢ Operation may depend on the signs of the operands
➢ Example - adding a positive number X and a negative number -Y : X+(-Y)
➢ If Y>X, final result is -(Y-X)
➢ Calculation -
➢ switch order of operands
➢ perform subtraction rather than addition
➢ attach the minus sign
➢ A sequence of decisions must be made, costing excess control logic and execution time
➢ This is avoided in the complement representation methods

Complement Representations of Negative Numbers
➢ Two alternatives -
➢ Radix complement (called two's complement in the binary system)
➢ Diminished-radix complement (called one's complement in the binary system)
➢ In both complement methods - positive numbers represented as in the signed-magnitude method
➢ A negative number -Y is represented by R-Y where R is a constant
➢ This representation satisfies -(-Y)=Y since R-(R-Y)=Y

Advantage of Complement Representation
➢ No decisions made before executing addition or subtraction
➢ Example: X-Y=X+(-Y)
➢ -Y is represented by R-Y
➢ Addition is performed by X+(R-Y) = R-(Y-X)
➢ If Y>X, -(Y-X) is already represented as R-(Y-X)
➢ No need to interchange the order of the two operands

Two's Complement
➢ r=2, k=n=4, m=0, ulp=2^0=1
➢ Radix complement (called two's complement in the binary case) of a number X = 2^4 - X
➢ It can instead be calculated by ¬X + 1 (complement every bit of X, then add 1)
➢ 0000 to 0111 represent positive numbers 0_10 to 7_10
➢ The two's complement of 0111 is 1000+1=1001
➢ it represents the value (-7)_10
➢ The two's complement of 0000 is 1111+1=10000=0 mod 2^4 - single representation of zero
➢ Each positive number has a corresponding negative number that starts with a 1
➢ 1000 representing (-8)_10 has no corresponding positive number
➢ Range of representable numbers is -8 ≤ X ≤ 7

The Two's Complement Representation
[Figure not recovered]

Example - Addition in Two's complement
➢ Calculating X+(-Y) with Y>X - 3+(-5)
        0011     3
      + 1011    -5
        1110    -2
➢ Correct result represented in the two's complement method - no need for preliminary decisions or post corrections
➢ Calculating X+(-Y) with X>Y - 5+(-3)
        0101     5
      + 1101    -3
      1 0010     2
➢ Only the last four least significant digits are retained, yielding 0010

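The 4-bit two's-complement rules and the two addition examples can be reproduced with a short sketch. The helper names (`encode`, `decode`, `negate`, `add`) are illustrative only.

```python
N = 4
MASK = (1 << N) - 1

def encode(value: int) -> int:
    """Two's-complement bit pattern of a value in [-8, 7]."""
    if not -(1 << (N - 1)) <= value < (1 << (N - 1)):
        raise OverflowError(f"{value} is outside the {N}-bit two's-complement range")
    return value & MASK                 # negative values map to 2^N - |value|

def decode(bits: int) -> int:
    """Signed value of an N-bit pattern (leading 1 means negative)."""
    return bits - (1 << N) if bits >> (N - 1) else bits

def negate(bits: int) -> int:
    return (~bits + 1) & MASK           # complement every bit, add 1, drop the carry-out

def add(a: int, b: int) -> int:
    return (a + b) & MASK               # only the N least significant bits are retained

if __name__ == "__main__":
    print(f"{add(encode(3), encode(-5)):04b}")   # 3 + (-5)
    print(f"{add(encode(5), encode(-3)):04b}")   # 5 + (-3)
```

The same `add` is used for both sign combinations, which is the point of the complement representation: no preliminary decision or post-correction is needed.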
One's Complement in Binary System
➢ r=2, k=n=4, m=0, ulp=2^0=1
➢ Diminished-radix complement (called one's complement in the binary case) of a number X = (2^4 - 1) - X = ¬X (every bit of X complemented)
➢ As before, the sequences 0000 to 0111 represent the positive numbers 0_10 to 7_10
➢ The one's complement of 0111 is 1000, representing (-7)_10
➢ The one's complement of zero is 1111 - two representations of zero
➢ Range of representable numbers is -7 ≤ X ≤ 7

Comparing the Three Representations in a Binary System
[Figure or table not recovered: signed-magnitude, one's complement, and two's complement compared]

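Since the comparison itself did not survive extraction, the following sketch regenerates one for the 4-bit case from the definitions on the preceding slides. It is a reconstruction under those definitions, not the original slide content, and the decoder names are hypothetical.

```python
N = 4
MASK = (1 << N) - 1

def decode_signed_magnitude(bits: int) -> int:
    magnitude = bits & ((1 << (N - 1)) - 1)
    return -magnitude if bits >> (N - 1) else magnitude     # 1000 is "negative zero"

def decode_ones_complement(bits: int) -> int:
    return -((~bits) & MASK) if bits >> (N - 1) else bits   # 1111 is "negative zero"

def decode_twos_complement(bits: int) -> int:
    return bits - (1 << N) if bits >> (N - 1) else bits

if __name__ == "__main__":
    print("bits  sign-mag  1's-compl  2's-compl")
    for bits in range(1 << N):
        print(f"{bits:04b}  {decode_signed_magnitude(bits):8d}  "
              f"{decode_ones_complement(bits):9d}  {decode_twos_complement(bits):9d}")
```

The printed table shows the two zero patterns of signed-magnitude and one's complement, and the extra value -8 that only two's complement can represent.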
5.1 Bit-Serial and Ripple-Carry Adders
Half-adder (HA): Truth table and block diagram
Inputs    Outputs
x  y      c  s
--------------
0  0      0  0
0  1      0  1
1  0      0  1
1  1      1  0
[Block diagram: HA with inputs x, y and outputs c, s]

Full-adder (FA): Truth table and block diagram
Inputs        Outputs
x  y  cin     cout  s
---------------------
0  0  0       0     0
0  0  1       0     1
0  1  0       0     1
0  1  1       1     0
1  0  0       0     1
1  0  1       1     0
1  1  0       1     0
1  1  1       1     1
[Block diagram: FA with inputs x, y, cin and outputs cout, s]

Half-Adder Implementations
[Figure: three implementations of a half-adder. (a) AND/XOR half-adder. (b) NOR-gate half-adder. (c) NAND-gate half-adder with complemented carry.]

Full-Adder Implementations
[Figure: possible designs for a full-adder in terms of half-adders, logic gates, and CMOS transmission gates. (a) Built of half-adders. (b) Built as an AND-OR circuit. (c) Suitable for CMOS realization (mux-based).]

Full-Adder Details
Logic equations for a full-adder:
s = x ⊕ y ⊕ cin (odd parity function)
  = x y cin ∨ ¬x ¬y cin ∨ ¬x y ¬cin ∨ x ¬y ¬cin
cout = x y ∨ x cin ∨ y cin (majority function)
[Figure: CMOS transmission gate and its use in a 2-to-1 mux. (a) CMOS transmission gate: circuit and symbol. (b) Two-input mux built of two transmission gates.]

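The full-adder equations can be checked exhaustively against ordinary integer addition. The sketch below is an illustration with made-up function names; bits are represented as the integers 0 and 1.

```python
from itertools import product

def full_adder(x: int, y: int, cin: int):
    """Return (cout, s) from the parity and majority equations."""
    s = x ^ y ^ cin                             # odd parity of the three inputs
    cout = (x & y) | (x & cin) | (y & cin)      # majority of the three inputs
    return cout, s

def sum_sop(x: int, y: int, cin: int) -> int:
    """Sum bit written as the four-term OR of AND terms."""
    nx, ny, nc = 1 - x, 1 - y, 1 - cin          # complemented inputs
    return (x & y & cin) | (nx & ny & cin) | (nx & y & nc) | (x & ny & nc)

if __name__ == "__main__":
    for x, y, cin in product((0, 1), repeat=3):
        cout, s = full_adder(x, y, cin)
        print(x, y, cin, "->", cout, s)
```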
Simple Adders Built of Full-Adders
[Figure: using full-adders in building bit-serial and ripple-carry adders. (a) Bit-serial adder: operands x and y are shifted into one FA bit by bit, the carry c_{i+1} is held in a flip-flop and fed back as c_i, and the sum bits s_i are shifted out. (b) Ripple-carry adder: a chain of 32 FAs with inputs x_i, y_i, carries c_0 (cin) through c_32 (cout), and sum outputs s_0 through s_31, with s_32 = c_32.]

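A ripple-carry adder is a direct software loop over full-adders. The following sketch models the 32-bit chain; the function names and the default width are assumptions made for the example.

```python
def full_adder(x: int, y: int, cin: int):
    """One FA cell: returns (cout, s)."""
    return (x & y) | (x & cin) | (y & cin), x ^ y ^ cin

def ripple_carry_add(x: int, y: int, k: int = 32, c0: int = 0):
    """Add two k-bit operands by chaining k full-adders; returns (sum bits, carry-out)."""
    carry, total = c0, 0
    for i in range(k):                          # the carry ripples from bit 0 up to bit k-1
        carry, s_i = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s_i << i
    return total, carry

if __name__ == "__main__":
    print(ripple_carry_add(123456, 654321))
    print(ripple_carry_add(0xFFFFFFFF, 1))      # all carries propagate
```

The loop makes the latency issue visible: bit i cannot be finished before the carry from bit i-1 is known.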
Critical Path Through a Ripple-Carry Adder
T_ripple-add = T_FA(x,y→cout) + (k – 2) T_FA(cin→cout) + T_FA(cin→s)
[Figure: critical path in a k-bit ripple-carry adder, running through the carry chain c_1, c_2, ..., c_{k–1} of the k FAs]

Binary Adders as Versatile Building Blocks
Starting from the full-adder truth table:
Set one input to 0: cout = AND of other inputs
Set one input to 1: cout = OR of other inputs
Set one input to 0 and another to 1: s = NOT of third input
[Figure: four-bit binary adder used to realize the logic function f = w + xyz and its complement.]

Conditions and Exceptions
[Figure: two's-complement adder with provisions for detecting conditions and exceptions: a k-bit chain of FAs producing s_0 to s_{k–1} and cout, with Overflow, Negative, and Zero outputs]
overflow_2's-compl = c_k ⊕ c_{k–1} = c_k ¬c_{k–1} ∨ ¬c_k c_{k–1}

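The overflow condition can be tested in a small model of the adder. In this sketch the Negative flag is assumed to be the sign bit s_{k-1} of the stored result and Zero is assumed to mean that all sum bits are 0; the function name and the dictionary layout are illustrative.

```python
def add_with_flags(x: int, y: int, k: int = 8, c0: int = 0) -> dict:
    """Add two k-bit two's-complement bit patterns and report condition flags."""
    carries = [c0]
    s = 0
    for i in range(k):
        xi, yi, ci = (x >> i) & 1, (y >> i) & 1, carries[i]
        s |= (xi ^ yi ^ ci) << i
        carries.append((xi & yi) | (xi & ci) | (yi & ci))
    return {
        "sum": s,
        "cout": carries[k],
        "overflow": carries[k] ^ carries[k - 1],    # c_k XOR c_{k-1}
        "negative": (s >> (k - 1)) & 1,             # assumed: sign bit of the result
        "zero": int(s == 0),                        # assumed: all sum bits are 0
    }

if __name__ == "__main__":
    print(add_with_flags(120, 26))                  # 146 does not fit in 8-bit two's complement
    print(add_with_flags(0b11111101, 0b00000011))   # -3 + 3
```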
Manchester Carry Chains and Adders
Sum digit in radix r: s_i = (x_i + y_i + c_i) mod r
Special case of radix 2: s_i = x_i ⊕ y_i ⊕ c_i
Computing the carries c_i is thus our central problem
For this, the actual operand digits are not important
What matters is whether in a given position a carry is generated, propagated, or annihilated (absorbed)
For binary addition:
g_i = x_i y_i    p_i = x_i ⊕ y_i    a_i = ¬x_i ¬y_i = ¬(x_i ∨ y_i)
It is also helpful to define a transfer signal:
t_i = g_i ∨ p_i = ¬a_i = x_i ∨ y_i
Using these signals, the carry recurrence is written as
c_{i+1} = g_i ∨ c_i p_i = g_i ∨ c_i g_i ∨ c_i p_i = g_i ∨ c_i t_i

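The equivalence of the two forms of the carry recurrence (with p_i and with t_i) can be confirmed by running both side by side. This is a sketch with a hypothetical function name, using integers 0 and 1 for the signals.

```python
def carries_from_signals(x: int, y: int, k: int, c0: int = 0):
    """Return [c_0, c_1, ..., c_k] computed from generate/propagate/transfer signals."""
    c = [c0]
    for i in range(k):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        g, p, t = xi & yi, xi ^ yi, xi | yi
        with_p = g | (c[i] & p)                 # c_{i+1} = g_i OR c_i p_i
        with_t = g | (c[i] & t)                 # c_{i+1} = g_i OR c_i t_i
        assert with_p == with_t                 # both forms always agree
        c.append(with_p)
    return c

if __name__ == "__main__":
    print(carries_from_signals(0b1011, 0b0110, 4))
```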
Carry Network is the Essence of a Fast Adder
g_i  p_i  Carry is:
0    0    annihilated or killed
0    1    propagated
1    0    generated
1    1    (impossible)
g_i = x_i y_i    p_i = x_i ⊕ y_i
[Figure: the operand bits x_i, y_i produce the signals g_i, p_i for positions 0 to k–1; together with c_0 these feed the carry network (ripple; skip; lookahead; parallel-prefix), which outputs c_1 to c_k; each sum bit s_i is formed from p_i and c_i]
The main part of an adder is the carry network. The rest is just a set of gates to produce the g and p signals and the sum bits.

Ripple-Carry Adder Revisited
The carry recurrence: c_{i+1} = g_i ∨ p_i c_i
Latency of k-bit adder is roughly 2k gate delays:
1 gate delay for production of p and g signals, plus
2(k – 1) gate delays for carry propagation, plus
1 XOR gate delay for generation of the sum bits
[Figure: the carry propagation network of a ripple-carry adder, with g_i and p_i entering stage i and the carry rippling from c_0 to c_k]

The Complete Design of a Ripple-Carry Adder
[Figure: the general adder structure (g_i and p_i generation, carry network, sum-bit generation) with the ripple-carry propagation network inserted as the carry network]

Unrolling the Carry Recurrence
Recall the generate, propagate, annihilate (absorb), and transfer signals:
Signal   Radix r                        Binary
g_i      is 1 iff x_i + y_i ≥ r         x_i y_i
p_i      is 1 iff x_i + y_i = r – 1     x_i ⊕ y_i
a_i      is 1 iff x_i + y_i < r – 1     ¬x_i ¬y_i = ¬(x_i ∨ y_i)
t_i      is 1 iff x_i + y_i ≥ r – 1     x_i ∨ y_i
s_i      (x_i + y_i + c_i) mod r        x_i ⊕ y_i ⊕ c_i
The carry recurrence can be unrolled to obtain each carry signal directly from inputs, rather than through propagation
c_i = g_{i–1} ∨ c_{i–1} p_{i–1}
    = g_{i–1} ∨ (g_{i–2} ∨ c_{i–2} p_{i–2}) p_{i–1}
    = g_{i–1} ∨ g_{i–2} p_{i–1} ∨ c_{i–2} p_{i–2} p_{i–1}
    = g_{i–1} ∨ g_{i–2} p_{i–1} ∨ g_{i–3} p_{i–2} p_{i–1} ∨ c_{i–3} p_{i–3} p_{i–2} p_{i–1}
    = g_{i–1} ∨ g_{i–2} p_{i–1} ∨ g_{i–3} p_{i–2} p_{i–1} ∨ g_{i–4} p_{i–3} p_{i–2} p_{i–1} ∨ c_{i–4} p_{i–4} p_{i–3} p_{i–2} p_{i–1}
    = ...

Full Carry Lookahead
[Figure: a four-bit adder in which each sum bit s_0 to s_3 is produced directly from the inputs x_0, y_0, ..., x_3, y_3 and cin that affect it]
Theoretically, it is possible to derive each sum digit directly from the inputs that affect it
Carry-lookahead adder design is simply a way of reducing the complexity of this ideal, but impractical, arrangement by hardware sharing among the various lookahead circuits

Four-Bit Carry-Lookahead Adder
Full carry lookahead is quite practical for a 4-bit adder
c1 = g0 ∨ c0 p0
c2 = g1 ∨ g0 p1 ∨ c0 p0 p1
c3 = g2 ∨ g1 p2 ∨ g0 p1 p2 ∨ c0 p0 p1 p2
c4 = g3 ∨ g2 p3 ∨ g1 p2 p3 ∨ g0 p1 p2 p3 ∨ c0 p0 p1 p2 p3
Complexity reduced by deriving the carry-out indirectly
[Figure: four-bit carry network with full lookahead, with inputs g0 to g3, p0 to p3, and c0, and outputs c1 to c4]

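The four lookahead equations can be transcribed literally and checked against integer addition for all inputs. The sketch below does that; the function name `cla4` is made up for the example.

```python
def cla4(x: int, y: int, c0: int = 0):
    """4-bit addition with every carry computed directly from g, p, and c0."""
    g = [(x >> i) & (y >> i) & 1 for i in range(4)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(4)]
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3]) | (g[0] & p[1] & p[2] & p[3])
          | (c0 & p[0] & p[1] & p[2] & p[3]))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))   # s_i = p_i XOR c_i
    return s, c4

if __name__ == "__main__":
    print(cla4(0b1011, 0b0110))
```

No carry in this version depends on another carry, which is what removes the ripple delay at the cost of wider gates.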
Carry Lookahead Beyond 4 Bits
Consider a 32-bit adder
c1 = g0 ∨ c0 p0
c2 = g1 ∨ g0 p1 ∨ c0 p0 p1
c3 = g2 ∨ g1 p2 ∨ g0 p1 p2 ∨ c0 p0 p1 p2
. . .
c31 = g30 ∨ g29 p30 ∨ g28 p29 p30 ∨ g27 p28 p29 p30 ∨ . . . ∨ c0 p0 p1 p2 p3 ... p29 p30
The last term of c31 needs a 32-input AND, and combining all the terms needs a 32-input OR
High fan-ins necessitate tree-structured circuits

Solutions to the Fan-in Problem
• Multilevel lookahead
• Block adders
• High-radix addition (i.e., radix 2^h): increases the latency for generating g and p signals and sum digits, but simplifies the carry network (optimal radix?)
Example: 16-bit addition
Radix-16 (four digits)
Two-level carry lookahead (four 4-bit blocks)
Either way, the carries c4, c8, and c12 are determined first
[Figure: the carries c16 (cout) down to c0 (cin), with c12, c8, and c4 marked as the ones to be determined first]

Block Ripple Adder
[Figure not recovered]

Larger Carry-Lookahead Adder Design
Block generate and propagate signals
g[i,i+3] = g_{i+3} ∨ g_{i+2} p_{i+3} ∨ g_{i+1} p_{i+2} p_{i+3} ∨ g_i p_{i+1} p_{i+2} p_{i+3}
p[i,i+3] = p_i p_{i+1} p_{i+2} p_{i+3}
• If all 4 bits in a block propagate, the block propagates a carry.
• If at least one of the 4 bits generates a carry and it can be propagated to the MSB, the block generates a carry.
[Figure: 4-bit lookahead carry generator with inputs g_i to g_{i+3}, p_i to p_{i+3}, and c_i, and outputs c_{i+1}, c_{i+2}, c_{i+3}, g[i,i+3], and p[i,i+3]]

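The block signals can be used to obtain the carries c4, c8, c12, and c16 of a 16-bit adder without looking at the intermediate carries. The sketch below is a simplified illustration: the block carries are chained one after another here rather than produced by a second-level lookahead generator, and all function names are hypothetical.

```python
def bit_signals(x: int, y: int, k: int):
    """Per-bit generate and propagate signals g_i, p_i for i = 0..k-1."""
    g = [(x >> i) & (y >> i) & 1 for i in range(k)]
    p = [((x >> i) ^ (y >> i)) & 1 for i in range(k)]
    return g, p

def block_signals(g, p, i: int):
    """Block generate and propagate signals for bit positions i..i+3."""
    bg = (g[i + 3] | (g[i + 2] & p[i + 3]) | (g[i + 1] & p[i + 2] & p[i + 3])
          | (g[i] & p[i + 1] & p[i + 2] & p[i + 3]))
    bp = p[i] & p[i + 1] & p[i + 2] & p[i + 3]
    return bg, bp

def block_carries(x: int, y: int, c0: int = 0):
    """Return [c4, c8, c12, c16] for a 16-bit addition."""
    g, p = bit_signals(x, y, 16)
    carry, out = c0, []
    for i in (0, 4, 8, 12):
        bg, bp = block_signals(g, p, i)
        carry = bg | (carry & bp)       # same recurrence as for single bits, one level up
        out.append(carry)
    return out

if __name__ == "__main__":
    print(block_carries(0x0FFF, 0x0001))
```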
A Building Block for Carry-Lookahead Addition
[Figure: the carry network of a four-bit adder (inputs p0 to p3, g0 to g3, c0; outputs c1 to c4) redrawn as a four-bit lookahead carry generator (inputs p_i to p_{i+3}, g_i to g_{i+3}, c_i) with two parts: block signal generation, producing g[i,i+3] and p[i,i+3], and intermediate carries, producing c_{i+1}, c_{i+2}, c_{i+3}]

Combining Block g and p Signals
[Figure: combining of g and p signals of four blocks of arbitrary widths into the g and p signals for the overall block]

A Two-Level Carry-Lookahead Adder
[Figure: building a 64-bit carry-lookahead adder from 16 4-bit adders and 5 lookahead carry generators. Within a 16-bit carry-lookahead adder, the block signals g[0,3], p[0,3], g[4,7], p[4,7], g[8,11], p[8,11], g[12,15], p[12,15] and c0 yield c4, c8, c12; at the top level, g[0,15], p[0,15], g[16,31], p[16,31], g[32,47], p[32,47], g[48,63], p[48,63] yield c16, c32, c48 as well as g[0,63] and p[0,63]]

Ling Adder and Related Designs
Consider the carry recurrence and its unrolling by 4 steps:
c_i = g_{i–1} ∨ c_{i–1} t_{i–1}
    = g_{i–1} ∨ g_{i–2} t_{i–1} ∨ g_{i–3} t_{i–2} t_{i–1} ∨ g_{i–4} t_{i–3} t_{i–2} t_{i–1} ∨ c_{i–4} t_{i–4} t_{i–3} t_{i–2} t_{i–1}
Ling's modification: Propagate h_i = c_i ∨ c_{i–1} instead of c_i
h_i = g_{i–1} ∨ h_{i–1} t_{i–2}
    = g_{i–1} ∨ g_{i–2} ∨ g_{i–3} t_{i–2} ∨ g_{i–4} t_{i–3} t_{i–2} ∨ h_{i–4} t_{i–4} t_{i–3} t_{i–2}
CLA:   5 gates   max 5 inputs   19 gate inputs
Ling:  4 gates   max 5 inputs   14 gate inputs
The advantage of h_i over c_i is even greater with wired-OR:
CLA:   4 gates   max 5 inputs   14 gate inputs
Ling:  3 gates   max 4 inputs   9 gate inputs
Once h_i is known, however, the sum is obtained by a slightly more complex expression compared with s_i = p_i ⊕ c_i
s_i = (t_i ⊕ h_{i+1}) ∨ h_i g_i t_{i–1}

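Carry Determination as Prefix Computation
[Figure: two overlapping or adjacent blocks, B' (bit positions i0 to i1, signals g', p') and B" (bit positions j0 to j1, signals g", p"), are merged by the carry operator ¢ into block B with signals (g, p)]
(g, p) = (g", p") ¢ (g', p'), with B" the more significant block
g = g" ∨ g' p"
p = p' p"
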
Formulating the Prefix Computation Problem
The problem of carry determination can be formulated as:
Given  (g_0, p_0)  (g_1, p_1)  . . .  (g_{k–2}, p_{k–2})  (g_{k–1}, p_{k–1})
Find   (g[0,0], p[0,0])  (g[0,1], p[0,1])  . . .  (g[0,k–2], p[0,k–2])  (g[0,k–1], p[0,k–1])
       c_1               c_2               . . .  c_{k–1}               c_k
The desired pairs are found by evaluating all prefixes of
(g_0, p_0) ¢ (g_1, p_1) ¢ . . . ¢ (g_{k–2}, p_{k–2}) ¢ (g_{k–1}, p_{k–1})
The carry operator ¢ is associative, but not commutative
[(g_1, p_1) ¢ (g_2, p_2)] ¢ (g_3, p_3) = (g_1, p_1) ¢ [(g_2, p_2) ¢ (g_3, p_3)]
Prefix sums analogy:
Given  x_0  x_1  x_2  . . .  x_{k–1}
Find   x_0  x_0+x_1  x_0+x_1+x_2  . . .  x_0+x_1+...+x_{k–1}

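The prefix formulation maps directly onto a running "scan" with the carry operator. The sketch below assumes c_0 = 0, so that the g component of the i-th prefix is c_{i+1}; `carry_op` and `prefix_carries` are names chosen for this illustration, and the scan is sequential (it shows the definition, not a fast parallel network).

```python
from itertools import accumulate, product

def carry_op(low, high):
    """(g', p') ¢ (g", p") = (g" OR g' p", p' p"), with `low` the less significant block."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (g_lo & p_hi), p_lo & p_hi

def prefix_carries(x: int, y: int, k: int):
    """Return [c_1, ..., c_k] as the g components of all prefixes (assumes c_0 = 0)."""
    pairs = [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]
    return [g for g, _ in accumulate(pairs, carry_op)]

if __name__ == "__main__":
    print(prefix_carries(0b1011, 0b0110, 4))
    # Associativity check over all (g, p) pairs
    values = list(product((0, 1), repeat=2))
    print(all(carry_op(carry_op(a, b), c) == carry_op(a, carry_op(b, c))
              for a in values for b in values for c in values))
```

Associativity is what allows the prefixes to be regrouped and evaluated by the parallel networks discussed next.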
Example Prefix-Based Carry Network
[Figure: four-input prefix sums network. Inputs 6, –1, 2, 5 (left to right) yield the prefix sums 12, 6, 7, 5, scanning from the right.]
[Figure: four-bit carry lookahead network built the same way. Inputs (g3, p3), (g2, p2), (g1, p1), (g0, p0) are combined with ¢ to yield g[0,3], p[0,3] = (c4, --); g[0,2], p[0,2] = (c3, --); g[0,1], p[0,1] = (c2, --); g[0,0], p[0,0] = (c1, --)]

Alternative Parallel Prefix Networks
[Figure: inputs x_{k–1} ... x_{k/2} and x_{k/2–1} ... x_0 each feed a k/2-input prefix sums network; the lower network gives s_{k/2–1} ... s_0 directly, and k/2 adders combine its top output with the upper network's outputs to give s_{k–1} ... s_{k/2}]
Parallel prefix sums network built of two k/2-input networks and k/2 adders. (Ladner-Fischer)

Brent-Kung Recursive Construction
[Figure: inputs x_{k–1} ... x_0 are first added in adjacent pairs, the pair sums feed a k/2-input prefix sums network, and a final row of adders produces the remaining outputs s_{k–1} ... s_0]
Parallel prefix sums network built of one k/2-input network and k – 1 adders.

Brent-Kung Carry Network (8-Bit Adder)
[Figure: Brent-Kung carry network for an 8-bit adder. Inputs [7,7] ... [0,0] are combined by ¢ cells into [6,7], [4,5], [2,3], [0,1], then [4,7] and [0,3], and finally into the outputs [0,7], [0,6], ..., [0,1], [0,0]; for example, g[1,1], p[1,1] and g[0,0], p[0,0] combine into g[0,1], p[0,1]]

Brent-Kung Carry Network (16-Bit Adder)
[Figure: Brent-Kung parallel prefix graph for 16 inputs, with inputs x15 ... x0, outputs s15 ... s0, and the levels of the graph numbered 1 to 6; an annotation marks the reason for the latency]

Kogge-Stone Carry Network (16-Bit Adder)
[Figure: Kogge-Stone parallel prefix graph for 16 inputs, with inputs x15 ... x0 and outputs s15 ... s0]
log2 k levels (minimum possible)

Speed-Cost Tradeoffs in Carry Networks
Method           Delay   Cost
Ladner-Fischer   ?       (k/2) log2 k
Kogge-Stone      ?       k log2 k – k + 1
Brent-Kung       ?       2k – 2 – log2 k

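The Kogge-Stone cost formula can be cross-checked by building the network level by level and counting the ¢ cells. The sketch below assumes k is a power of 2 and c_0 = 0; the helper names are hypothetical, and the loop is a software model of the structure rather than a hardware description.

```python
def carry_op(low, high):
    """Carry operator ¢ with `low` the less significant block."""
    return high[0] | (low[0] & high[1]), low[1] & high[1]

def gp_pairs(x: int, y: int, k: int):
    return [((x >> i) & (y >> i) & 1, ((x >> i) ^ (y >> i)) & 1) for i in range(k)]

def kogge_stone(pairs):
    """Return (prefix pairs, number of ¢ cells used, number of levels)."""
    k, cur = len(pairs), list(pairs)
    cells = levels = 0
    d = 1
    while d < k:
        nxt = list(cur)
        for i in range(d, k):                   # position i combines with position i - d
            nxt[i] = carry_op(cur[i - d], cur[i])
            cells += 1
        cur, d, levels = nxt, 2 * d, levels + 1
    return cur, cells, levels

def cost_formulas(k: int) -> dict:
    """Cell counts from the table above (k a power of 2)."""
    lg = k.bit_length() - 1
    return {"Ladner-Fischer": (k // 2) * lg,
            "Kogge-Stone": k * lg - k + 1,
            "Brent-Kung": 2 * k - 2 - lg}

if __name__ == "__main__":
    _, cells, levels = kogge_stone(gp_pairs(0xBEEF, 0x1234, 16))
    print(cells, levels, cost_formulas(16))
```

For k = 16 the counted cells and the formula agree, and the Brent-Kung formula gives the smaller cell count that is traded for extra levels.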
Hybrid B-K/K-S Carry Network (16-Bit Adder)
[Figure: 16-input parallel prefix graphs compared.
Brent-Kung: 6 levels, 26 cells
Kogge-Stone: 4 levels, 49 cells
Hybrid (a Brent-Kung level, then Kogge-Stone levels, then a Brent-Kung level): 5 levels, 32 cells]
A hybrid Brent-Kung/Kogge-Stone parallel prefix graph for 16 inputs.

Simple Carry-Skip Adders
[Figure: (a) Ripple-carry adder: four 4-bit blocks of ripple-carry stages (block 0 shown with bits 3 2 1 0), with carries c0, c4, c8, c12, c16 between blocks. (b) Simple carry-skip adder: the same blocks, each with skip logic (2 gates) controlled by p[0,3], p[4,7], p[8,11], p[12,15].]
Converting a 16-bit ripple-carry adder into a simple carry-skip adder with 4-bit skip blocks.

Another View of Carry-Skip Addition
[Figure: street/freeway analogy for carry-skip adder. Within block j, the carry moves along a one-way street from c_{4j} through c_{4j+1}, c_{4j+2}, c_{4j+3} to c_{4j+4}, driven by g_{4j}, p_{4j}, ..., g_{4j+3}, p_{4j+3}; the skip path is the freeway.]

Multilevel Carry-Skip Adders
[Figure: one-level carry-skip adder, with five blocks between c in and c out, each with a first-level skip circuit S1]
[Figure: example of a two-level carry-skip adder, with a second-level skip circuit S2 spanning several S1 blocks]
[Figure: two-level carry-skip adder optimized by removing the short-block skip circuits]

Using Two-Operand Adders
Some applications of multioperand addition
[Figure: multioperand addition problems for multiplication or inner-product computation in dot notation. Left: multiplying a by x produces the partial products x0 a 2^0, x1 a 2^1, x2 a 2^2, x3 a 2^3, which are added to form the product p. Right: seven operands p(0) to p(6) are added to form the sum s.]

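Serial Implementation with One Adder
[Figure: serial implementation of multi-operand addition with a single 2-operand adder. The k-bit operand x(i) and the contents of the partial sum register, which holds the sum of x(j) for j = 0 to i–1 in k + log2 n bits, feed the adder, whose output is written back to the register.]
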
Pipelined Implementation for Higher Throughput
[Figure: serial multi-operand addition when each adder is a 4-stage pipeline. Labels recovered from the diagram: x(i), x(i–1), x(i) + x(i–1), x(i–4) + x(i–5), x(i–6) + x(i–7), x(i–8) + x(i–9) + x(i–10) + x(i–11), delay elements, and "ready to compute s(i–12)".]

Parallel Implementation as Tree of Adders
[Figure: adding 7 numbers in a binary tree of adders. Seven k-bit inputs; adder outputs grow to k+1, k+2, and k+3 bits from level to level; n – 1 adders arranged in ⌈log2 n⌉ adder levels.]

Carry-Save Adders
A ripple-carry adder turns into a carry-save adder if the carries are saved (stored) rather than propagated.
[Figure: a row of FAs with the carry chain from cin to cout (carry-propagate adder) and the same row with the chain cut (carry-save adder)]
[Figure: carry-propagate adder (CPA) and carry-save adder (CSA) functions in dot notation; a CSA is also called a (3; 2)-counter or a 3-to-2 reduction circuit]
[Figure: specifying full- and half-adder blocks, with their inputs and outputs, in dot notation]

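A carry-save adder is easy to model on whole words: each bit position is an independent full-adder, and the saved carries form a second word shifted left by one position. The sketch below assumes non-negative integers of unbounded width, and the function names are illustrative; the grouping order inside `csa_reduce` is one arbitrary choice among many valid trees.

```python
def carry_save_add(a: int, b: int, c: int):
    """Reduce three operands to a (sum word, carry word) pair without carry propagation."""
    sum_word = a ^ b ^ c                                # per-bit sum outputs
    carry_word = ((a & b) | (a & c) | (b & c)) << 1     # per-bit carries, weight doubled
    return sum_word, carry_word

def csa_reduce(operands):
    """Apply CSAs repeatedly until only two words remain."""
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        while len(ops) >= 3:                            # one CSA per group of three
            nxt.extend(carry_save_add(ops.pop(), ops.pop(), ops.pop()))
        nxt.extend(ops)                                 # leftovers pass through unchanged
        ops = nxt
    return ops

if __name__ == "__main__":
    numbers = [63, 1, 22, 45, 7, 60, 33]                # seven 6-bit operands
    two = csa_reduce(numbers)
    print(two, sum(two), sum(numbers))                  # one carry-propagate addition remains
```

Only the final addition of the two remaining words needs a carry-propagate adder.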
Multioperand Addition Using Carry-Save Adders
[Figure: serial carry-save addition using a single CSA. The input and the contents of the sum and carry registers feed the CSA, and a carry-propagate adder produces the output.]
[Figure: tree of carry-save adders reducing seven numbers to two, followed by a CPA]

Example Reduction by a CSA Tree
[Figure: addition of seven 6-bit numbers in dot notation, reduced in stages by 12 FAs, 6 FAs, 6 FAs, 4 FAs + 1 HA, and a 7-bit adder]
Representing a seven-operand addition in tabular form (number of dots per bit position):
8  7  6  5  4  3  2  1  0    Bit position
         7  7  7  7  7  7    6 × 2 = 12 FAs
      2  5  5  5  5  5  3    6 FAs
      3  4  4  4  4  4  1    6 FAs
   1  2  3  3  3  3  2  1    4 FAs + 1 HA
   2  2  2  2  2  1  2  1    7-bit adder
   --Carry-propagate adder--
1  1  1  1  1  1  1  1  1
A full-adder compacts 3 dots into 2 (compression ratio of 1.5)
A half-adder rearranges 2 dots (no compression, but still useful)
Total cost = 7-bit adder + 28 FAs + 1 HA

Width of Adders in a CSA Tree
[Figure: adding seven k-bit numbers and the CSA/CPA widths required. Seven inputs with bit positions [0, k–1] pass through five k-bit CSAs with intermediate results such as [1, k], [0, k–1], [2, k+1], [1, k–1], and [1, k+1]; a final k-bit CPA on positions [2, k+1] yields bits k+2 down to 2, while bits 1 and 0 are retired earlier.]
The index pair [i, j] means that bit positions from i up to j are involved.
Due to the gradual retirement (dropping out) of some of the result bits, CSA widths do not vary much as we go down the tree levels

Wallace Tree Multiplier
[Figure not recovered]

Dadda Tree Multiplier
[Figure not recovered]

Saturating Adders
Saturating (saturation) arithmetic:
When a result's magnitude is too large, do not wrap around; rather, provide the most positive or the most negative value that is representable in the number format
Example – In 8-bit 2's-complement format, we have:
120 + 26 → –110 (wraparound, since 146 – 256 = –110); 120 +sat 26 → 127 (saturating)
Saturating arithmetic is desirable in many DSP applications
Designing saturating adders
Unsigned (quite easy)
Signed (slightly harder)
[Figure: a 2-to-1 mux controlled by the adder's overflow signal selects between the adder output (input 0) and the saturation value (input 1)]

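Saturating and wraparound addition can be compared side by side in a short sketch. The function names are hypothetical; signed operands are passed as ordinary Python integers assumed to lie within the k-bit two's-complement range.

```python
def sat_add_unsigned(x: int, y: int, k: int = 8) -> int:
    """Unsigned saturating add: clamp at the largest k-bit value."""
    return min(x + y, (1 << k) - 1)

def sat_add_signed(x: int, y: int, k: int = 8) -> int:
    """Signed saturating add: clamp to the most negative or most positive value."""
    lo, hi = -(1 << (k - 1)), (1 << (k - 1)) - 1
    return max(lo, min(hi, x + y))

def wrap_add_signed(x: int, y: int, k: int = 8) -> int:
    """Ordinary two's-complement add: keep the low k bits and reinterpret the sign."""
    total = (x + y) & ((1 << k) - 1)
    return total - (1 << k) if total >> (k - 1) else total

if __name__ == "__main__":
    print(wrap_add_signed(120, 26), sat_add_signed(120, 26))
    print(wrap_add_signed(-100, -100), sat_add_signed(-100, -100))
```

The unsigned case needs only the carry-out to detect saturation, while the signed case must also decide between the two extreme values, which is why it is the slightly harder design.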
Readings:
➢ Main reference for the above slides:
➢ Chapters 5, 6, 7, & 8, B. Parhami, Computer Arithmetic: Algorithms and Hardware Design, Oxford University Press, 2000.
