2002B5A3503
Table of Contents
Acknowledgements
Abstract
1. Introduction
5. VERILOG CODE
6. RESULTS
6.1 Multiplier Interface
6.2 Partial Product generator
7. CONCLUSION
8. FUTURE SCOPE
References
Acknowledgements
I would like to express my sincere gratitude to Dr. D. Sriram, Instructor In-charge, Lab
oriented Project Bits C314, for providing me an opportunity to work in the methodology
of research, for cultivating logical and creative thinking, and for making me express my
findings in the form of a scientific report.
I would also like to express my gratitude to Dr. (Mrs.) Anu Gupta, Assistant
Professor, EEE Group, for giving me an opportunity to work under her guidance.
Working under her supervision gave me an opportunity to consolidate my subject
knowledge and apply it to the given problem.
Last but not least, I would like to thank Mr. Pawan Sharma for allowing me
to use the various tools in OYSTER LAB.
Abstract
A low power multiplication algorithm and its VLSI architecture using a mixed number
representation are proposed. The reduced switching activity and low power dissipation are
achieved through the Sign-Magnitude (SM) notation for the multiplicand and through a
novel design of the Redundant Binary (RB) adder and Booth decoder. The high speed
operation is achieved through the Carry-Propagation-Free (CPF) accumulation of the
Partial Products (PP) by using the RB notation. Analysis showed that the switching
activity in the PP generation process can be reduced on average by 90%. Compared with
multipliers of the same type, the proposed design dissipates much less power and is 18%
faster on average.
1. Introduction
It has been shown that by using the SM notation for the multiplicand, the
Two's Complement (2C) representation for the multiplier, and the RB
representation for the PP accumulation, the Expected Switching Activity (ESA), and
therefore the power dissipation, can be significantly reduced. The ESA reduction occurs
whenever the negation of the multiplicand is needed to generate the PPs under the
radix-4 Booth's algorithm. High speed operation is sustained by using the RB notation
for accumulating the PPs, since a CPF addition can be executed with RB numbers. The
inputs and outputs of the multiplication unit are assumed to be in 2C notation.
Although the proposed algorithm and its VLSI architecture are complex in terms of
number conversions, the design is more energy efficient, has an operating speed close
to that of the Wallace tree architecture, and is faster than the other proposed
multipliers.
Therefore, an extra bit $x_n = x_{n-1}$ (sign extension) must be appended to the left of
$x_{n-1}$ to make the triplet $x_n, x_{n-1}, x_{n-2}$. If n is even, then the largest index
is $2\lfloor (n-1)/2 \rfloor + 1 = n-1$. Therefore, multiplier X can be exactly grouped into
n/2 triplets and no sign extension is needed. For parallel multiplication, all triplets can
be scanned at the same time.
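
The triplet grouping described above can be sketched in Python as follows. This is only an illustration: the function name `booth_radix4_digits` is hypothetical, and the digit mapping $-2x_{2k+1} + x_{2k} + x_{2k-1}$ is the standard radix-4 Booth recoding, assumed here to agree with Table I.

```python
def booth_radix4_digits(x, n):
    """Return the radix-4 Booth digits (least significant first) of an
    n-bit two's complement integer x. Each digit lies in {-2, -1, 0, 1, 2}."""
    # Python's >> on negative ints behaves as an arithmetic shift, so this
    # reads the two's complement bits x_0 .. x_{n-1} directly.
    bits = [(x >> i) & 1 for i in range(n)]
    if n % 2:
        # Odd length: append the sign-extension bit x_n = x_{n-1}.
        bits.append(bits[-1])
    padded = [0] + bits  # padded[i + 1] holds x_i, padded[0] holds x_{-1} = 0
    digits = []
    for k in range(len(bits) // 2):
        x_hi, x_mid, x_lo = padded[2 * k + 2], padded[2 * k + 1], padded[2 * k]
        digits.append(-2 * x_hi + x_mid + x_lo)
    return digits


def booth_value(digits):
    """Value represented by a list of radix-4 Booth digits."""
    return sum(d * 4 ** k for k, d in enumerate(digits))
```

An even-length multiplier yields n/2 digits and an odd-length one yields (n+1)/2, matching the triplet counts given in the text.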
From Table I, when the radix-4 Booth's algorithm encounters the multiplier patterns 110,
101 and 100, it has to generate -Y or -2Y. These patterns, which will be referred to
hereafter as NEG (negation) patterns, are directly related to the ESA in the Booth
PP generator. The average probability of a NEG pattern occurring in any given triplet
$x_{2k+1}, x_{2k}, x_{2k-1}$ of the multiplier can be analyzed as follows.
Assume an n-bit 2C number $X = x_{n-1} x_{n-2} \ldots x_1 x_0$ in which each bit of the
multiplier is 1 with probability 0.5.
Case 1: n is even, $\lfloor (n-1)/2 \rfloor = (n-2)/2$. Therefore n/2 triplets are needed to
cover all the bits of the multiplier and the sign extension is not needed. For $x_1 x_0$,
since the Booth's algorithm assumes bit $x_{-1}$ to be always zero, there are only four
choices for the triplet $x_1, x_0, x_{-1}$: 000, 010, 100 and 110. Two of them are NEGs.
Hence, the probability of a NEG appearing in the $x_1$, $x_0$ and $x_{-1}$ positions is 1/2.
For the remaining (n-2) bits, each triplet $(x_{2k+1}, x_{2k}, x_{2k-1})$ has 8 possible
patterns and 3 of them are NEGs. So, the probability of a NEG appearing in the remaining
(n-2) bits is 3/8. Therefore, the average probability of a NEG appearing in a triplet
$x_{2k+1}, x_{2k}, x_{2k-1}$ is
\[ P_{even} = \frac{\frac{1}{2} + \left(\frac{n}{2} - 1\right)\frac{3}{8}}{\frac{n}{2}} = \frac{3}{8} + \frac{1}{4n} \]
Case 2: n is odd, $\lfloor (n-1)/2 \rfloor = (n-1)/2$. Therefore, the number must be sign
extended and (n+1)/2 triplets are needed to cover all the bits of the multiplier. Based on
the sign extension rule, the triplet $x_n, x_{n-1}, x_{n-2}$ has four possible patterns: 000,
001, 110, 111. Among them there is just one NEG. So, the probability of a NEG occurring in
the triplet $x_n, x_{n-1}, x_{n-2}$ is 1/4. For $x_1 x_0$, as in the case when n is even, the
probability of a NEG occurring in the triplet $x_1, x_0, x_{-1}$ is 1/2. For the remaining
(n-3) bits, the probability of a NEG occurring in a triplet $x_{2k+1}, x_{2k}, x_{2k-1}$ is
3/8. Therefore, the average probability of a NEG appearing in a triplet
$x_{2k+1}, x_{2k}, x_{2k-1}$ is:
\[ P_{odd} = \frac{\frac{1}{4} + \frac{1}{2} + \left(\frac{n+1}{2} - 2\right)\frac{3}{8}}{\frac{n+1}{2}} = \frac{3}{8} \]
Combining cases 1 and 2, the average probability for a NEG to appear in a triplet
$x_{2k+1}, x_{2k}, x_{2k-1}$ is
\[ P_{NEG} = \begin{cases} \frac{3}{8} + \frac{1}{4n}, & n \text{ even} \\ \frac{3}{8}, & n \text{ odd} \end{cases} \]
Since, for 2C numbers, $-Y = \bar{Y} + 1$ and the generation of $\bar{Y}$ requires the
complementation of every bit of Y, the ESA in the PP generation process is:
\[ ESA_{2C} = P_{NEG} \]
On average, the ESA in the partial product generation process is about 0.40. This
results in a large power dissipation.
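
The case analysis above can be checked by brute force. The following Python sketch enumerates every n-bit multiplier, forms the Booth triplets (with $x_{-1} = 0$ and sign extension for odd n), and counts how many are NEG patterns. The function name `neg_probability` is hypothetical; the closed forms it is compared against, 3/8 + 1/(4n) for even n and 3/8 for odd n, follow from the probabilities 1/2, 1/4 and 3/8 stated in the two cases.

```python
from fractions import Fraction

# Triplets (x_{2k+1}, x_{2k}, x_{2k-1}) that require -Y or -2Y.
NEG_PATTERNS = {(1, 0, 0), (1, 0, 1), (1, 1, 0)}


def neg_probability(n):
    """Average fraction of Booth triplets that are NEG patterns, taken over
    all 2**n equally likely n-bit multipliers."""
    neg_count = 0
    triplet_count = 0
    for x in range(1 << n):
        bits = [(x >> i) & 1 for i in range(n)]
        if n % 2:
            bits.append(bits[-1])  # sign extension for odd n
        padded = [0] + bits        # padded[i + 1] = x_i, padded[0] = x_{-1}
        for k in range(len(bits) // 2):
            triplet = (padded[2 * k + 2], padded[2 * k + 1], padded[2 * k])
            triplet_count += 1
            if triplet in NEG_PATTERNS:
                neg_count += 1
    return Fraction(neg_count, triplet_count)


if __name__ == "__main__":
    for n in (4, 5, 8, 9):
        print(n, neg_probability(n))
```

Exact fractions are used so that the enumeration can be compared with the closed forms without rounding error.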
A comparison of the ESA for the SM and 2C numbers is reported in Table III. The reduction
of the ESA is significant, ranging from 87.5% for 8-bit operands to 98.4% for 64-bit
operands. As the operand length increases, the ESA for even-length 2C numbers
decreases towards the asymptotic value of 3/8, while the ESA for odd-length 2C numbers
is a constant 3/8. For the SM numbers, the ESA decreases at the rate of O(1/n) and
asymptotically reaches zero. Thus, for longer operands the ESA reduction, and therefore
the power saving, is more pronounced.
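
The quoted reductions can be reproduced with a short Python sketch under two assumptions that are not stated explicitly in the text: that the 2C ESA equals the NEG probability (3/8 + 1/(4n) for even n, 3/8 for odd n), and that negating an SM number toggles only its sign bit, so the SM ESA is the 2C value divided by n. The function names are hypothetical, and Table III itself is not reproduced here.

```python
from fractions import Fraction


def esa_2c(n):
    """Assumed ESA for an n-bit 2C multiplicand: the NEG probability."""
    extra = Fraction(1, 4 * n) if n % 2 == 0 else Fraction(0)
    return Fraction(3, 8) + extra


def esa_sm(n):
    """Assumed ESA for an n-bit SM multiplicand: only 1 of n bits toggles."""
    return esa_2c(n) / n


def reduction_percent(n):
    """Relative ESA reduction of SM over 2C, in percent."""
    return float(100 * (1 - esa_sm(n) / esa_2c(n)))


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, float(esa_2c(n)), float(esa_sm(n)), reduction_percent(n))
```

Under these assumptions the reduction is 1 - 1/n, which gives 87.5% for n = 8 and about 98.4% for n = 64, in line with the figures quoted above.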
For positive numbers, the 2C and SM notations are identical and no conversion is needed.
For negative numbers, the conversion from 2C to SM can be implemented by
complementing all the bits except the sign bit $y_{n-1}$ and adding 1 to the result. If
one assumes a uniform distribution of positive and negative numbers, then the
probability that a number has to be converted is 0.5. Although the conversion adds
some delay, it does not offset the power dissipation gain due to the SM representation
of the multiplicand. Indeed, if the multiplicand is in 2C notation, one has to execute the
negation process for about 40% of all the PPs needed, and the number of negations
increases with the operand length, while the conversion from 2C to SM
takes place only once for any operand length. For the "add 1" operation, instead of using
an n-bit adder, which introduces delay and power overhead, we generate a correction term
associated with each PP and then add this correction term to all PPs through the binary
addition tree, as shown in Figure 3. In this manner, only one more input is added to the
addition tree, while the whole n-bit addition operation is avoided. The correction term can
be generated according to Table IV.
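
The arithmetic of the 2C-to-SM conversion can be sketched in Python as follows. This is a behavioural illustration only: it performs the "add 1" directly instead of deferring it to a correction term in the addition tree, and the function name `twos_complement_to_sign_magnitude` is hypothetical.

```python
def twos_complement_to_sign_magnitude(pattern, n):
    """Convert an n-bit two's complement bit pattern (given as an unsigned
    integer in the range 0 .. 2**n - 1) into a (sign, magnitude) pair."""
    low_mask = (1 << (n - 1)) - 1       # selects every bit except the sign bit
    sign = (pattern >> (n - 1)) & 1     # y_{n-1}
    low_bits = pattern & low_mask
    if sign == 0:
        # Positive numbers are identical in both notations.
        return 0, low_bits
    # Negative: complement all bits except the sign bit, then add 1.
    magnitude = ((~low_bits) & low_mask) + 1
    return 1, magnitude


def sign_magnitude_value(sign, magnitude):
    """Integer value of a (sign, magnitude) pair."""
    return -magnitude if sign else magnitude
```

Note that the most negative value, -2^(n-1), produces a magnitude of 2^(n-1), which needs one more magnitude bit than the other values. The sketch returns it as a plain integer and does not model how a fixed-width datapath would handle that case.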
The logic for C1 and C2 is trivial: $C_1 = y_{n-1} \cdot 1Y$ and $C_2 = y_{n-1} \cdot 2Y$. The
block diagram, as shown in Figure 1, indicates that the 2C-to-SM conversion adds only one
inverter delay, or about 0.5 gate delay, which comes from the complementation operation of
the 2C number. The correction term does not introduce extra power overhead compared to the
traditional 2C implementation, since the traditional 2C implementation also
needs a similar correction term generator (adding 1) to generate the negation of the
multiplicand.
with digit $r_i \in \{1, 0, -1\}$, are more suitable for high speed parallel arithmetic
computations [1, 6]. Due to the redundancy in RB numbers, one can perform the CPF addition
through the selection of different representations of the same value. Hence, we further
convert the PPs into the RB representation. We adopt the selection rule proposed by Takagi
in [1] to perform CPF addition for the PP accumulation. The rule is shown in Table VI.
Let us ... and ... is shown in Figure 2. One can see that the carry is limited to adjacent
digits and there is no global carry propagation.
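
A carry-propagation-free RB addition can be sketched in Python as follows. Table VI is not reproduced in this copy, so the selection rule below is the commonly used signed-digit rule that chooses the intermediate sum and carry at position i by looking only at the signs of the digits at position i-1. It is assumed, not confirmed, to correspond to the rule in Table VI. The names `rb_add` and `rb_value` are hypothetical.

```python
def rb_value(digits):
    """Integer value of an RB number given as digits in {-1, 0, 1},
    least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def rb_add(a, b):
    """Add two RB numbers without global carry propagation.
    Returns the sum as RB digits, least significant digit first."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    interim = [0] * (n + 1)  # intermediate sum digit per position
    carry = [0] * (n + 1)    # carry[i] is the carry INTO position i
    for i in range(n):
        p = a[i] + b[i]
        # Only the neighbouring lower position is inspected.
        lower_nonneg = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if p == 2:
            carry[i + 1], interim[i] = 1, 0
        elif p == 1:
            carry[i + 1], interim[i] = (1, -1) if lower_nonneg else (0, 1)
        elif p == -1:
            carry[i + 1], interim[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif p == -2:
            carry[i + 1], interim[i] = -1, 0
        # p == 0: both the carry and the intermediate digit stay 0.
    # Each result digit depends on positions i and i-1 only, and the rule
    # guarantees interim[i] + carry[i] stays within {-1, 0, 1}.
    return [interim[i] + carry[i] for i in range(n + 1)]
```

Because every output digit depends on at most two neighbouring input positions, the delay of the addition is independent of the word length, which is the property the text relies on for high speed PP accumulation.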
can group the sign bit $x_{n-1}$ with all the remaining bits in a pair by pair fashion,
$(x_{n-1}, x_{n-2}), (x_{n-1}, x_{n-3}), \ldots, (x_{n-1}, x_1), (x_{n-1}, x_0)$, and interpret
the pairs according to the SM coding rule shown in Table V. Clearly, we do not need any
operations except some wiring.
THE ALGORITHM:
Step 1: Convert the multiplicand from 2C into the SM representation and keep the
multiplier in 2C form.
Step 2: Apply the radix-4 Booth's algorithm to generate all the PPs represented in SM
notation.
Step 3: Convert all the partial products from SM into RB representation.
Step 4: Sum up all the PPs through an RB adder tree.
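
The four steps can be traced end to end with a behavioural Python sketch. It is not the Verilog implementation of this report. The function names are hypothetical, the Booth digit mapping is the standard radix-4 recoding, the SM-to-RB step assumes each magnitude bit becomes a digit carrying the sign of the number, and Step 4 is modelled with ordinary integer addition as a stand-in for the RB adder tree.

```python
def rb_value(digits):
    """Integer value of RB digits in {-1, 0, 1}, least significant first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def multiply_mixed(x, y, n):
    """Multiply two n-bit two's complement integers x (multiplier) and
    y (multiplicand) following Steps 1 to 4."""
    # Step 1: multiplicand from 2C to SM; the multiplier stays in 2C.
    sign_y, mag_y = (1, -y) if y < 0 else (0, y)

    # Step 2: radix-4 Booth recoding of the multiplier, PPs kept in SM form.
    bits = [(x >> i) & 1 for i in range(n)]
    if n % 2:
        bits.append(bits[-1])             # sign extension for odd n
    padded = [0] + bits                   # padded[0] is x_{-1} = 0
    partial_products = []
    for k in range(len(bits) // 2):
        digit = -2 * padded[2 * k + 2] + padded[2 * k + 1] + padded[2 * k]
        # In SM form a negation only toggles the sign bit.
        pp_sign = sign_y ^ (1 if digit < 0 else 0)
        pp_magnitude = mag_y * abs(digit)  # 0, Y or 2Y
        partial_products.append((pp_sign, pp_magnitude, 2 * k))

    # Step 3: SM to RB, assuming each magnitude bit becomes a signed digit.
    rb_operands = []
    for pp_sign, pp_magnitude, shift in partial_products:
        unit = -1 if pp_sign else 1
        digits = [0] * shift + [unit * ((pp_magnitude >> i) & 1)
                                for i in range(pp_magnitude.bit_length())]
        rb_operands.append(digits)

    # Step 4: stand-in for the RB adder tree (plain integer accumulation).
    return sum(rb_value(digits) for digits in rb_operands)


if __name__ == "__main__":
    print(multiply_mixed(-7, 13, 8))
```

The sketch is useful mainly as a reference model against which an HDL implementation of the same steps could be compared.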
7. CONCLUSION
This architecture was chosen with low power as the main objective. All the stages of the
above architecture have been coded in Verilog HDL, and special care was taken in the
implementation to meet this objective. All the modules involved have been functionally
verified. After testing, logic synthesis was carried out, and the delay of each stage was
calculated from the synthesis results. The whole design has been synthesized in tsmc018
technology, and the delay obtained is around 16 ns. Further, a semi-custom layout of the
design has been done in Autocell.
8. FUTURE SCOPE
Power versus delay optimization is the main aim of all designs, and various techniques can
be applied to achieve it. Since the whole design is modular, with a single one-bit adder
repeated throughout the adder tree, optimizing this adder can increase the
speed. For this purpose, transmission gate designs can be further exploited to build a fast
RB adder, which could not be done here because of technology library constraints.
Various circuit-level power reduction techniques can also be applied to further reduce
the power consumption.
REFERENCES