
LETTERS International Journal of Recent Trends in Engineering, Vol. 2, No. 6, November 2009

A Novel Power Delay Optimized 32-bit Parallel Prefix Adder For High Speed Computing
P. Ramanathan¹ and P. T. Vanathi²
¹ PSG College of Technology, Department of ECE, Coimbatore, India. Email: pramanathan_2000@yahoo.com
² PSG College of Technology, Department of ECE, Coimbatore, India. Email: ptvani@yahoo.com

Abstract: Parallel Prefix addition is a technique for improving the speed of binary addition. Due to continually increasing integration density and the growing needs of portable devices, low-power and high-performance designs are of prime importance. The classical parallel prefix adder structures presented in the literature over the years optimize for logic depth, area, fan-out and interconnect count of logic circuits. In this paper, a new architecture for performing 32-bit Parallel Prefix addition is proposed. The proposed 32-bit prefix adder is compared with several classical adders of the same bit width in terms of power, delay and number of computational nodes. The results reveal that the proposed 32-bit Parallel Prefix adder has the least power delay product when compared with its existing peer prefix adder structures. The Tanner EDA tool was used for simulating the adder designs in the TSMC 180 nm and TSMC 130 nm technologies.

Index Terms: Parallel Prefix Adder, Dot operator, Semi-Dot operator, CMOS, Odd-dot operator, Even-dot operator, Odd-semi-dot operator, Even-semi-dot operator

I. INTRODUCTION

VLSI integer adders find applications in Arithmetic and Logic Units (ALUs), microprocessors and memory addressing units. The speed of the adder often decides the minimum clock cycle time in a microprocessor. Parallel Prefix adders are attractive primarily because they are fast when compared with ripple carry adders. Parallel Prefix adders (PPA) are a family of adders derived from the commonly known carry look-ahead adders, and they are best suited for wide word lengths. PPA circuits use a tree network to reduce the latency to $O(\log_2 n)$, where n represents the number of bits. A three step process is generally involved in the construction of a PPA. The first step involves the creation of generate and propagate signals for the input operand bits. The second step involves the generation of carry signals; for this purpose the dot operator and the semi-dot operator are introduced. The dot operator is defined by equation (1) and the semi-dot operator by equation (2):

$(p_i, g_i) \bullet (p_{i-1}, g_{i-1}) = (p_i \, p_{i-1}, \; g_i + p_i \, g_{i-1})$    (1)

$(p_i, g_i) \circ (p_{i-1}, g_{i-1}) = (g_i + p_i \, g_{i-1})$    (2)
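As a purely illustrative sketch (not from the paper; the function and variable names below are invented here), the dot and semi-dot operators of equations (1) and (2) can be exercised in Python along a simple serial chain to produce every carry and sum bit:

```python
def dot(hi, lo):
    # Equation (1): (p_i, g_i) . (p_j, g_j) = (p_i & p_j, g_i | (p_i & g_j))
    (p_i, g_i), (p_j, g_j) = hi, lo
    return (p_i & p_j, g_i | (p_i & g_j))

def semi_dot(hi, g_lo):
    # Equation (2): only the group generate (carry) term is produced.
    p_i, g_i = hi
    return g_i | (p_i & g_lo)

def add(a, b, n=32):
    mask = (1 << n) - 1
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]    # bit propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]    # bit generate
    carry, group = [g[0]], (p[0], g[0])
    for i in range(1, n):
        group = dot((p[i], g[i]), group)                 # (P[i:0], G[i:0])
        carry.append(semi_dot((p[i], g[i]), carry[-1]))  # c_i from the last node of column i
        assert carry[-1] == group[1]                     # dot and semi-dot forms agree
    s = 0
    for i in range(n):
        s |= (p[i] ^ (carry[i - 1] if i else 0)) << i    # sum_i = p_i xor c_{i-1}
    return s & mask

assert add(0xDEADBEEF, 0x12345678) == (0xDEADBEEF + 0x12345678) & 0xFFFFFFFF
```

A real parallel prefix adder evaluates the same operators in a tree of depth $O(\log_2 n)$ rather than this linear chain.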

In equations (1) and (2), the operators are applied to two pairs of bits $(p_i, g_i)$ and $(p_{i-1}, g_{i-1})$. These bits represent the generate and propagate signals used in addition. The output of the dot operator is a new pair of bits which is again combined, using a dot operator or a semi-dot operator, with another pair of bits. This repeated use of the dot and semi-dot operators creates a prefix tree network which ultimately ends in the generation of all carry signals. In the final step, the sum bits of the adder are generated from the propagate signals of the operand bits and the preceding stage carry bits using XOR gates. The semi-dot operator is present as the last computation node in each column of the prefix graph, where it is essential to compute only the generate term, whose value is the carry generated from that bit position to the succeeding bit position. The structure of the prefix network specifies the type of the PPA. The prefix network described by Haikun Zhu, Chung-Kuan Cheng and Ronald Graham [1] has the minimal depth for a given n-bit adder. Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product were presented by Matthew Ziegler and Mircea Stan [2]. The Sklansky adder [3] presents a minimum depth prefix network at the cost of increased fan-out for certain computation nodes. The algorithm invented by Kogge and Stone [4] has both optimal depth and low fan-out, but produces massively complex circuit realizations and also accounts for a large number of interconnects. The Brent-Kung adder [5] has the merit of a minimal number of computation nodes, which yields reduced area, but the structure has maximum depth, which results in a slight increase in latency when compared with other structures. The Han-Carlson adder [6] combines the Brent-Kung and Kogge-Stone structures to achieve a balance between logic depth and interconnect count. Knowles [7] presented a class of logarithmic adders with minimum depth by allowing the fan-out to grow. Ladner and Fischer [8] proposed a general method to construct a prefix network with slightly higher depth when compared with the Sklansky topology, but achieved some merit by reducing the maximum fan-out of the computation nodes in the critical path. Related work in the PPA literature, such as the Ling adder [9], achieves improved performance gains by changing the equation of the dot operator. A taxonomy of classical Parallel Prefix Adders based on fan-out, interconnect count and depth

characteristics has been presented by Harris [10]. The sparse tree binary adder proposed by Yan Sun et al. [11] combines the benefits of a prefix adder and a carry save adder. An Integer Linear Programming method to build parallel prefix adders was proposed by Liu et al. [12]. A technique to achieve energy savings for the Kogge-Stone adder was proposed by Frustaci et al. [14]. In this paper, a novel 32-bit Parallel Prefix adder is proposed; the proposed structure has the least power delay product amongst all its peers.

II. EXISTING PARALLEL PREFIX ADDERS

The Brent-Kung adder is oriented towards a simpler tree structure with fewer computation nodes. The Kogge-Stone adder possesses a regular layout and is preferred for high performance applications. The Han-Carlson adder reduces the hardware complexity when compared to that of the Kogge-Stone adder, but at the cost of introducing an additional stage in its carry merge path. In the Sklansky adder, the fan-out from the inputs to the outputs along the critical path increases drastically, which introduces latency in the circuit. The Ladner-Fischer adder is an improved version of the Sklansky adder in which the maximum fan-out is reduced. Table I summarizes the number of computation nodes and the logic depth required by the various existing Parallel Prefix adders, where n is the word-length of the adder in bits.

III. PROPOSED 32-BIT PARALLEL PREFIX ADDER

Fig. 1 shows the architecture of the proposed 32-bit parallel prefix adder. The objective is to eliminate the massive overlap between the prefix sub-terms being computed; hence the associative property of the dot operator is employed to keep the number of computation nodes at a minimum. The first stage of the computation is called pre-processing. It involves the creation of generate and propagate signals for the individual operand bits in active-low format. Equations (3) and (4) represent the functionality of the first stage.
Figure 1. Proposed 32-bit Parallel Prefix Adder (prefix graph with input bits 31..0, computation stages 1 through 9, and carry outputs C31..C0)

$\overline{G_i} = \overline{a_i \, b_i}$    (3)

$\overline{P_i} = \overline{a_i \oplus b_i} = a_i \odot b_i$    (4)
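As a quick sanity check (illustrative Python only, not code from the paper), the active-low signals of equations (3) and (4) are simply the NAND and XNOR of the operand bits; complementing them recovers the conventional generate and propagate terms:

```python
# Sketch of the active-low pre-processing of equations (3) and (4); names are illustrative.
for a_i in (0, 1):
    for b_i in (0, 1):
        g_bar = 1 - (a_i & b_i)      # equation (3): NAND -> active-low generate
        p_bar = 1 - (a_i ^ b_i)      # equation (4): XNOR -> active-low propagate
        assert (1 - g_bar) == (a_i & b_i)   # complement gives the true generate
        assert (1 - p_bar) == (a_i ^ b_i)   # complement gives the true propagate
```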

In equations (3) and (4), $a_i$ and $b_i$ represent the input operand bits of the adder, where i varies from 0 to 31. The second stage in the prefix addition is termed prefix computation. This stage is responsible for the creation of the group generate and group propagate signals. For deriving the carry signals in the second stage, this architecture introduces four different computation nodes, each drawn with its own symbol in the prefix graph, to achieve improved performance. Two cells are designed for the dot operator. The first cell for the dot operator, named odd-dot, works with active-low inputs and generates active-high outputs. The second cell for the dot operator, named even-dot, works with active-high inputs and generates active-low outputs. Similarly, two cells are designed for the semi-dot operator. The first cell for the semi-dot operator, named odd-semi-dot, works with active-low inputs and generates active-high outputs. The second cell for the semi-dot operator, named even-semi-dot, works with active-high inputs and generates active-low outputs. The stages with odd indices use odd-dot and odd-semi-dot cells, whereas the stages with even indices use even-dot and even-semi-dot cells. Equations (5) and (6) represent the functionality of the odd-dot and even-dot cells respectively.

TABLE I. COMPARATIVE STUDY ON EXISTING PREFIX ADDERS

Adder Type        Number of Computation Nodes   Logic Depth
Brent-Kung        2n - 2 - log2(n)              2*log2(n) - 2
Kogge-Stone       n*log2(n) - n + 1             log2(n)
Han-Carlson       (n/2)*log2(n)                 log2(n) + 1
Ladner-Fischer    (n/2)*log2(n)                 log2(n) + 1
Sklansky          (n/2)*log2(n)                 log2(n)

$(P, G) = (\overline{p_i}, \overline{g_i}) \bullet (\overline{p_{i-1}}, \overline{g_{i-1}}) = \left(\overline{\overline{p_i} + \overline{p_{i-1}}}, \; \overline{\overline{g_i}\,(\overline{p_i} + \overline{g_{i-1}})}\right) = (p_i \, p_{i-1}, \; g_i + p_i \, g_{i-1})$    (5)



$(\overline{P}, \overline{G}) = (p_i, g_i) \bullet (p_{i-1}, g_{i-1}) = \left(\overline{p_i \, p_{i-1}}, \; \overline{g_i + p_i \, g_{i-1}}\right)$    (6)

Equations (7) and (8) represent the functionality of the odd-semi-dot and even-semi-dot cells respectively.

$(G) = (\overline{p_i}, \overline{g_i}) \circ (\overline{g_{i-1}}) = \overline{\overline{g_i}\,(\overline{p_i} + \overline{g_{i-1}})} = g_i + p_i \, g_{i-1} = c_i$    (7)

$(\overline{G}) = (p_i, g_i) \circ (g_{i-1}) = \overline{g_i + p_i \, g_{i-1}} = \overline{c_i}$    (8)
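The odd and even cells are De Morgan duals of the plain dot and semi-dot operators. The following brute-force check (a sketch with invented function names, not code from the paper) confirms that equations (5) through (8) reproduce equation (1) and the carry of equation (2) for every input combination:

```python
# Exhaustive check of the odd/even cell equations (5)-(8); all names invented here.
from itertools import product

def odd_dot(p_i_n, g_i_n, p_j_n, g_j_n):
    # Equation (5): active-low inputs, active-high outputs.
    return 1 - (p_i_n | p_j_n), 1 - (g_i_n & (p_i_n | g_j_n))

def even_dot(p_i, g_i, p_j, g_j):
    # Equation (6): active-high inputs, active-low outputs.
    return 1 - (p_i & p_j), 1 - (g_i | (p_i & g_j))

def odd_semi_dot(p_i_n, g_i_n, g_j_n):
    # Equation (7): active-low inputs, true carry out.
    return 1 - (g_i_n & (p_i_n | g_j_n))

def even_semi_dot(p_i, g_i, g_j):
    # Equation (8): active-high inputs, complemented carry out.
    return 1 - (g_i | (p_i & g_j))

for p_i, g_i, p_j, g_j in product((0, 1), repeat=4):
    P, G = p_i & p_j, g_i | (p_i & g_j)                    # equation (1), true form
    assert odd_dot(1 - p_i, 1 - g_i, 1 - p_j, 1 - g_j) == (P, G)
    assert even_dot(p_i, g_i, p_j, g_j) == (1 - P, 1 - G)
    assert odd_semi_dot(1 - p_i, 1 - g_i, 1 - g_j) == G    # G plays the role of c_i
    assert even_semi_dot(p_i, g_i, g_j) == 1 - G
```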

A static CMOS logic gate implements only an inverting function. Thus cascading odd cells and even cells alternately gives the benefit of eliminating two inverters between them, whenever a dot or semi-dot computation node in an odd stage receives both of its input edges from even stages, and vice-versa. Conversely, it is essential to introduce two inverters in a path if a dot or semi-dot computation node in an odd stage receives any of its edges from another odd stage, or a node in an even stage receives an edge from another even stage. From the prefix graph of the proposed structure shown in Fig. 1, we infer that there are only a few edges needing such a pair of inverters, to make $(\overline{G}, \overline{P})$ available as $(G, P)$ or to make $(G, P)$ available as $(\overline{G}, \overline{P})$ respectively. A pair of inverters in a path is marked on the corresponding edge in the prefix graph. By introducing two cells for the dot operator and two cells for the semi-dot operator, a large number of inverters is eliminated. Due to the inverter elimination in these paths, their propagation delay is reduced. A further benefit is power reduction, since these inverters, if not eliminated, would have contributed a significant amount of switching power dissipation. The output of an odd-semi-dot cell gives the true value of the carry signal at the corresponding bit position, while the output of an even-semi-dot cell gives the complemented value of the carry at its bit position.

The final stage in the prefix addition is termed post-processing. It involves the generation of the sum bits from the active-low propagate signals of the individual operand bits and the carry bits, which are available in true or complemented form. The proposed 32-bit structure has a maximum fan-out of 6 and a logic depth of 9. The first stage and last stage are intrinsically fast because they involve only simple operations on signals local to each bit position. The intermediate stage embodies long distance propagation of carries, so the performance of the adder depends on this intermediate stage. The lateral fan-out increases slightly, but the interconnect lengths remain limited since the tree grows along the main diagonal.
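To make the polarity bookkeeping concrete, the sketch below is illustrative only: it models a simple linear carry chain rather than the nine-stage tree of Fig. 1, takes each carry from the group generate instead of dedicated semi-dot cells, and all names are invented here. It alternates odd and even cells, complements a (P, G) pair only where the stage parities clash, counts those inverter pairs, and checks the resulting sums against ordinary addition:

```python
# Behavioural sketch of alternating odd/even prefix stages on a linear chain (not Fig. 1).
def prefix_add(a, b, n=8):
    mask = (1 << n) - 1
    # Stage 0 (pre-processing): active-low propagate/generate, equations (3)-(4).
    pg = [(1 - (((a >> i) ^ (b >> i)) & 1), 1 - (((a >> i) & (b >> i)) & 1))
          for i in range(n)]                           # (p_bar_i, g_bar_i)
    inverter_pairs = 0

    def fix(pair, is_low, want_low):
        # Complement both signals of the pair when its polarity does not match the stage.
        nonlocal inverter_pairs
        if is_low == want_low:
            return pair
        inverter_pairs += 1                            # one inverter pair per (P, G) edge
        return (1 - pair[0], 1 - pair[1])

    group, group_low = pg[0], True                     # (P[0:0], G[0:0]) in active-low form
    carries = [1 - pg[0][1]]                           # c_0 = g_0
    for i in range(1, n):
        stage_is_odd = (i % 2 == 1)                    # odd stages: low in, high out
        want_low = stage_is_odd
        p_i, g_i = fix(pg[i], True, want_low)
        P_j, G_j = fix(group, group_low, want_low)
        if stage_is_odd:                               # odd-dot, equation (5)
            group, group_low = (1 - (p_i | P_j), 1 - (g_i & (p_i | G_j))), False
        else:                                          # even-dot, equation (6)
            group, group_low = (1 - (p_i & P_j), 1 - (g_i | (p_i & G_j))), True
        carries.append(group[1] if not group_low else 1 - group[1])  # true c_i

    s = 0                                              # post-processing: sum_i = p_i xor c_{i-1}
    for i in range(n):
        s |= ((1 - pg[i][0]) ^ (carries[i - 1] if i else 0)) << i
    return s & mask, inverter_pairs

for a, b in ((200, 57), (0xFF, 0x01), (123, 231)):
    s, pairs = prefix_add(a, b)
    assert s == (a + b) & 0xFF                         # pairs counts polarity-fixing edges
```

In this linear model, only the per-bit edges feeding even stages need a polarity fix, which mirrors the observation above that inverter pairs are required only where a stage receives an edge of the same parity.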

IV. SIMULATION ENVIRONMENT

Simulation of the various Parallel Prefix Adder designs was carried out with the Tanner EDA tool using the TSMC 180 nm and TSMC 130 nm technologies. All the Parallel Prefix Adder structures were implemented in the static CMOS logic family. The aspect ratios of the MOS transistors were chosen such that $\left(\tfrac{W}{L}\right)_n = 3\left(\tfrac{W}{L}\right)_p$. For the TSMC 180 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.3694 V and -0.3944 V respectively, and the supply voltage was kept at 1.8 V. For the TSMC 130 nm technology, the threshold voltages of the NMOS and PMOS transistors are around 0.332 V and -0.3499 V respectively, and the supply voltage was kept at 1.3 V. The parameters considered for comparison are power consumption, worst case delay and power-delay product. The various PPA structures were also compared in terms of the number of transistors needed for each circuit realization.

V. RESULTS AND DISCUSSION

Table II lists the structural characteristics of each of the 32-bit Parallel Prefix adders. From the table it is observed that the proposed 32-bit Parallel Prefix adder has the least number of computation nodes amongst all the peer designs. The structure of the proposed 32-bit adder also reveals that the prefix tree builds along the main diagonal after the first two stages.
TABLE II. CHARACTERISTICS OF 32-BIT PARALLEL PREFIX ADDERS

Adder Type        Computation Nodes (Dot)   Computation Nodes (Semi-Dot)   Logic Depth
Brent-Kung                  26                          31                      8
Kogge-Stone                 98                          31                      5
Han-Carlson                 33                          31                      6
Ladner-Fischer              33                          31                      6
Sklansky                    33                          31                      5
Proposed                    23                          31                      9

Table III and Table IV list the performance comparison of the various 32-bit Parallel Prefix adders in the 180 nm and 130 nm technologies respectively.
TABLE III. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 180 NM TECHNOLOGY

Adder Name        Average Power (µW)   Delay (ns)   Power-Delay Product (x 10^-15 J)
Brent-Kung             228.5272           1.05              239.9536
Kogge-Stone            342.1976           0.79              270.3361
Han-Carlson            253.1247           0.89              225.2810
Sklansky               261.1407           0.70              184.8985
Ladner-Fischer         232.7179           0.91              211.7733
Proposed               211.9441           0.85              180.1525
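The power-delay products in Table III follow directly from the measured power and delay figures; for instance (values copied from the table, a small illustrative check only):

```python
# Power-delay product check for two rows of Table III (180 nm results).
rows = {
    "Kogge-Stone": (342.1976e-6, 0.79e-9),   # (average power in W, delay in s)
    "Proposed":    (211.9441e-6, 0.85e-9),
}
for name, (power, delay) in rows.items():
    pdp_fj = power * delay * 1e15            # express in units of 1e-15 joules
    print(f"{name}: {pdp_fj:.1f} x 10^-15 J")  # ~270.3 and ~180.2, as in the table
```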


Figure 3. Power Delay Product of various PPA Designs in 180 nm

TABLE IV. PERFORMANCE COMPARISON OF 32-BIT PARALLEL PREFIX ADDERS USING TSMC 130 NM TECHNOLOGY

Adder Name        Average Power (µW)   Delay (ns)   Power-Delay Product (x 10^-15 J)
Brent-Kung              67.02993           0.88               58.9863
Kogge-Stone            101.4262            0.62               62.8842
Han-Carlson             76.92042           0.68               52.3059
Sklansky                78.23902           0.53               41.4667
Ladner-Fischer          69.72199           0.67               46.7137
Proposed                62.9453            0.65               40.9144
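The relative power savings reported later in the conclusions can be reproduced, to within rounding, from the 180 nm power figures of Table III; a short sketch:

```python
# Relative power savings of the proposed adder, from Table III (180 nm, in uW).
power_180nm = {"Brent-Kung": 228.5272, "Kogge-Stone": 342.1976,
               "Han-Carlson": 253.1247, "Sklansky": 261.1407,
               "Ladner-Fischer": 232.7179, "Proposed": 211.9441}
proposed = power_180nm["Proposed"]
for name, p in power_180nm.items():
    if name != "Proposed":
        # Compare with the percentages quoted in the conclusions.
        print(f"{name}: {100 * (p - proposed) / p:.1f}% power saving")
```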

Fig. 2 and Fig. 3 show the Power and Power Delay Product comparison of various PPA Designs simulated using TSMC 180 nm Technology respectively.

Figure 4. Power Comparison of various PPA Designs in 130 nm

Figure 2. Power Comparison of various PPA Designs in 180 nm

Figure 5. Power Delay Product of various PPA Designs in 130 nm

VI. CONCLUSIONS

It is observed that the proposed 32-bit Parallel Prefix adder has the least power delay product when compared with its existing peers. The proposed structure achieves 7.2 % power savings when compared with the Brent-Kung adder, 38 % power savings when compared with the Kogge-Stone structure, 18.8 % power savings when compared with the Sklansky structure and 16.26 % power savings when compared with the Han-Carlson structure. The delay of the proposed structure is almost comparable with that of the Kogge-Stone adder. The significant improvement in the power delay product of the proposed 32-bit prefix structure makes it well suited for ALUs and multiplier units.

REFERENCES
[1] Haikun Zhu, Chung-Kuan Cheng and Ronald Graham, "Constructing zero deficiency parallel prefix adder of minimum depth," Proceedings of the 2005 Conference on Asia South Pacific Design Automation (ASP-DAC 2005), Shanghai, vol. 2, pp. 883-888, January 2005.
[2] Matthew Ziegler and Mircea Stan, "Optimal logarithmic adder structures with a fan-out of two for minimizing the area-delay product," IEEE International Symposium on Circuits and Systems 2001, Sydney, vol. 2, pp. 657-660, May 2001.
[3] J. Sklansky, "Conditional sum addition logic," IRE Transactions on Electronic Computers, vol. EC-9, pp. 226-231, June 1960.
[4] P. Kogge and H. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence relations," IEEE Transactions on Computers, vol. C-22, no. 8, pp. 786-793, August 1973.
[5] R. Brent and H. Kung, "A regular layout for parallel adders," IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260-264, March 1982.
[6] T. Han and D. Carlson, "Fast area efficient VLSI adders," Proceedings of the Eighth Symposium on Computer Arithmetic, Como, Italy, pp. 49-56, September 1987.
[7] S. Knowles, "A family of adders," Proceedings of the 15th IEEE Symposium on Computer Arithmetic, Vail, Colorado, pp. 277-281, June 2001.
[8] R. Ladner and M. Fischer, "Parallel prefix computation," Journal of the ACM, vol. 27, no. 4, pp. 831-838, October 1980.
[9] Giorgos Dimitrakopoulos and Dimitris Nikolos, "High speed parallel prefix VLSI Ling adders," IEEE Transactions on Computers, vol. 54, no. 2, pp. 225-231, February 2005.
[10] David Harris, "A taxonomy of parallel prefix networks," Proceedings of the 37th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, California, pp. 2213-2217, November 2003.
[11] Yan Sun, Dongyu Zheng, Minxuan Zhang and Shaoqing Li, "High performance low power sparse tree binary adders," Proceedings of the 8th International Conference on Solid State and Integrated Circuit Technology (ICSICT 2006), Shanghai, pp. 1649-1651, October 2006.
[12] Jianhua Liu, Yi Zhu, Haikun Zhu and Chung-Kuan Cheng, "Optimum prefix adders in a comprehensive area, timing and power design space," Proceedings of the 2007 Asia and South Pacific Design Automation Conference, Washington, pp. 609-615, January 2007.
[13] Jun Chen and James E. Stine, "Enhancing parallel prefix structures with carry save notation," Proceedings of the 51st IEEE International Midwest Symposium on Circuits and Systems, Knoxville, pp. 354-357, August 2008.
[14] F. Frustaci, M. Lanuzza, P. Zicari, S. Perri and P. Corsonello, "Designing high speed adders in power constrained environments," IEEE Transactions on Circuits and Systems, vol. 56, pp. 172-176, February 2009.

