Article info

Article history:
Received 15 July 2012
Received in revised form 8 February 2013
Accepted 24 April 2013
Available online 7 May 2013

Keywords:
Alternative logic family
Carry Look Ahead (CLA) adder
Full-Swing GDI
Gate Diffusion Input (GDI)
Low power

Abstract

Full Swing Gate Diffusion Input (FS-GDI) methodology is presented. The proposed methodology is applied to a 40 nm Carry Look Ahead Adder (CLA). The CLA is implemented mainly using GDI full-swing F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. A 16-bit GDI CLA was designed in a 40 nm low power TSMC process. The CLA, implemented according to the proposed methodology, presents full functionality and robustness under global and local process variations over a wide range of supply voltages. Simulation results show 2× area reduction, 5× improvement in dynamic energy dissipation and 4× decrease in leakage, with a slight (24%) degradation in performance, when compared to the CMOS CLA. Advanced design metrics of GDI cells, such as minimum energy point (MEP) operation and minimum leakage vector (MLV), are discussed.

© 2013 Elsevier B.V. All rights reserved.
1. Introduction

Power consumption and area reduction of logic and memory have become primary focuses of attention in VLSI digital design [1–6]. Power is the limiting factor in both high performance systems and portable applications. Die area directly affects the device size and cost. Since the introduction of standard CMOS logic in the early 1980s, many design solutions have been proposed to improve power dissipation, area and performance of digital VLSI chips.

Gate Diffusion Input (GDI) design methodology was introduced as a promising alternative to Static CMOS Logic [7]. Originally proposed for fabrication in Silicon on Insulator (SOI) and twin-well CMOS processes, GDI methodology allowed implementation of a wide range of complex logic functions using only two transistors [7]. It was shown that the area and dynamic power of GDI combinatorial and sequential logic were significantly reduced, as compared to standard CMOS implementations. Similarly to existing alternatives to CMOS, such as Pass Transistor Logic (PTL), the GDI gates presented reduced voltage swing at their outputs due to threshold drops. These drops usually cause degradation in performance and increased short circuit power [8]. However, since the GDI circuits were implemented with far fewer transistors, a significant overall power reduction was observed, while maintaining a minimal performance penalty.

Recently, it was shown that any GDI circuit can be implemented in a standard CMOS process [8]. The efficiency of the GDI method for both combinatorial and sequential logic was shown by many groups [8–15]. Various combinatorial circuits, such as adders, multipliers, comparators, and counters, were implemented in processes from 0.8 µm down to 65 nm. GDI Flip-Flops were also presented, showing improvements in both area and power, compared to existing Flip-Flop styles.

In this paper we present an efficient methodology for the implementation of digital circuits. The proposed methodology was applied to a 16-bit adder in a low power standard 40 nm TSMC process. The CLA adder architecture, which was originally proposed as an alternative for the speed enhancement of a simple ripple adder, was chosen as a benchmark circuit in this work. The proposed CLA implementation utilizes improved full-swing GDI F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. The CLA design is compared with our previously shown GDI methodology [8], which utilizes swing-restoring buffers with selective application of high-Vth transistors, as well as with a standard CMOS implementation. Simulation results show CLA functionality and robustness under global and local process variations. The CLA presents 2× area reduction and 4–5× power reduction, compared to the conventional CMOS implementation.

The contributions of this paper are as follows: (1) The GDI full swing methodology for a standard nanoscaled CMOS technology is presented; (2) Leakage reduction in GDI is discussed through minimum leakage vector (MLV) analysis; (3) GDI robustness is evaluated through statistical Monte Carlo simulations; and (4) Low voltage operation of the GDI cells, including minimum energy point (MEP) operation, is shown.

* Corresponding author. Tel.: +972 54 8044144; fax: +972 3 7384051.
E-mail addresses: alexander.fish@gmail.com, alexander.fish@biu.ac.il (A. Fish).

0167-9260/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.vlsi.2013.04.002
A. Morgenshtein et al. / INTEGRATION, the VLSI journal 47 (2014) 62–70 63
The paper is organized as follows: Section 2 overviews the GDI methodology and presents its benefits and limitations. The proposed CLA implementation is discussed in Section 3. Section 4 presents simulation results of the proposed GDI CLA in a 40 nm standard CMOS process, comparing them to the CMOS CLA. Section 5 concludes the paper.

2. Overview of GDI

The basic GDI cell is shown in Fig. 1. At first glance, the GDI cell, which consists of only two transistors, resembles the conventional CMOS inverter. However, unlike the inverter, it has three inputs: G (common gate input of both the nMOS and the pMOS), P (input to the source/drain of the pMOS), and N (input to the source/drain of the nMOS).

It was shown that multiple Boolean functions can be implemented by a simple GDI cell, as demonstrated in Table 1. This is achieved by changing the input configuration of the GDI cell. While implementation of most of these functions is relatively complex (6–12 transistors) in Static CMOS, it is very efficient (only 2 transistors) with the GDI cells. The Multiplexer (MUX) is the most complex function that can be implemented with a basic GDI cell, while being the most efficient function as compared to the CMOS implementation.

GDI gates may suffer from threshold voltage drops, which reduce current drive and therefore affect the performance of the gate. These drops also increase direct-path static power dissipation in the cascaded inverters used for swing restoration. It was shown that these effects can be significantly reduced by using swing-restoration buffers with a multiple VTH approach [7], herein named MVT. This approach suggests using low threshold transistors in all paths where a voltage drop is expected. This way, the voltage drop at the output will be minimal. In addition, all regenerative inverters are implemented using high threshold transistors. This combination allows minimization of the direct-path static power in the inverters.

Most of today's static digital designs are based on CMOS NAND and NOR gates. The reasons for this are known and well explored. Both NAND and NOR gates are implemented using only four transistors and each one of these functions is a universal set. The GDI method, which is very efficient for implementation of various gates, such as MUX, AND, OR (see Table 2), has a similar number of transistors in NAND/NOR gate implementations as the standard CMOS methodology. However, the GDI technology provides alternative basic functions, F1 and F2. Consisting of only two transistors (one GDI cell), each one of these functions represents a universal set. Moreover, it was shown in [7] that F1 and F2 functions can be used to synthesize other functions more efficiently than the NAND and NOR gates. Fig. 2 shows a comparison of the number of various functions that can be implemented using the same number of F1 and NAND gates. The strength of the F1 and F2 gates will also be demonstrated by the implementation of a CLA using mainly these GDI gates (see Section 3).

3. GDI CLA adder design

3.1. Full-swing GDI cells

In this paper we propose full-swing (FS) GDI cells. The proposed technique utilizes a single swing restoration (SR) transistor to improve the output swing of F1 and F2 GDI gates. Fig. 3 shows the structure of full swing F1 and F2 cells. As can be seen, the SR transistor is activated only in cases when the Vth drop may occur at the output. Since in F1 and F2 gates the output VTH drop can occur only at one of the logical levels (VTH instead of 0 V in F1, and VDD–VTH instead of VDD in F2), only a single SR transistor is required to ensure full swing operation.

In cases where the gate input signal of a GDI cell has an inverted representation in the circuit, it can be used to control the swing restoring transistor. This transistor will have a diffusion input similar to the diffusion input of GDI, but will be of the opposite type (nMOS for F1, and pMOS for F2). In this manner, the diffusion input signal will pass through a pair of transistors of both types (the transistor of the original GDI cell and the complementary SR transistor). The FS GDI cells are an efficient alternative to swing restoration buffers in designs where inverted signals can be obtained as part of the logic function implementation. The CLA adder presented in the next subsection is a good example of such a design. An example of a logic chain in the CLA, containing full-swing GDI cells, is shown in Fig. 4.
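The single-level threshold drop described above can be illustrated with a small switch-level model. The sketch below is not taken from the paper: it assumes the F1 and F2 input configurations of the original GDI cell in [7] (F1: G = A, P = B, N = 0; F2: G = A, P = 1, N = B), ideal pass-transistor behavior with no body effect, and purely illustrative VDD and VTH values. All function names are hypothetical.

```python
from itertools import product

VDD, VTH = 1.1, 0.4  # illustrative values only, not taken from the paper


def gdi_cell(g, p, n):
    """Switch-level GDI cell: returns (logic value, output voltage).

    G = 0 turns on the pMOS, which passes P; G = 1 turns on the nMOS,
    which passes N. A pMOS passes a weak 0 (VTH above ground) and an
    nMOS passes a weak 1 (VTH below VDD).
    """
    if g == 0:
        return p, (VDD if p else VTH)
    return n, (VDD - VTH if n else 0.0)


def f1(a, b):
    """F1 = (not A) and B, assumed configuration G = A, P = B, N = 0."""
    return gdi_cell(a, b, 0)


def f2(a, b):
    """F2 = (not A) or B, assumed configuration G = A, P = 1, N = B."""
    return gdi_cell(a, 1, b)


def f1_fs(a, b):
    """F1 with an nMOS SR transistor (gate = inverted A, diffusion = B)."""
    value, volts = f1(a, b)
    if a == 0 and b == 0:
        volts = 0.0  # the SR nMOS conducts and passes a strong 0
    return value, volts


def f2_fs(a, b):
    """F2 with a pMOS SR transistor (gate = inverted A, diffusion = B)."""
    value, volts = f2(a, b)
    if a == 1 and b == 1:
        volts = VDD  # the SR pMOS conducts and passes a strong 1
    return value, volts


def degraded_levels(gate):
    """Input pairs (A, B) for which the output is not a full-rail level."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if gate(a, b)[1] not in (0.0, VDD)]


if __name__ == "__main__":
    for name, gate in (("F1", f1), ("F2", f2), ("FS F1", f1_fs), ("FS F2", f2_fs)):
        print(name, "degraded for inputs:", degraded_levels(gate))
```

Under these assumptions, F1 degrades only for A = B = 0 (the low level) and F2 only for A = B = 1 (the high level), which is why one SR transistor per gate is enough to restore the full swing.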
Table 2
The transistor-level design of CLA blocks.
Blocks shown: pg, PG, LCG, CLA, Last CLA.
Legend: inverter with HVT nMOS transistor; inverter with HVT pMOS transistor; inverter with both HVT transistors.
Note: F1* and F2* stand for GDI F1 and F2 full swing gates.
Fig. 2. Number of various functions that can be implemented using the same number of F1 and NAND cells (after [7]). Axes: Number of Cells (horizontal), Number of Functions (vertical); series: GDI and CMOS.

Fig. 3. Scheme of the FS GDI gates.

where a similar GDI XOR gate was used in both versions, as will be explained below.

The PG unit implementation is the following:

\[
P = p_1 p_0 = F_1(p_1, p_0)
\]
\[
G = g_1 + g_0 p_1 =
\begin{cases}
F_1(g_1, F_2(g_0, p_1)), & \text{MVT GDI} \\
F_2(F_1(p_1, g_0), g_1), & \text{FS GDI}
\end{cases}
\tag{2}
\]

where output P has a similar implementation in both versions.

The local and global carry generator blocks are implemented according to:

\[
C_{loc} = g_0 + p_0 C_{in} =
\begin{cases}
F_1(g_0, F_2(C_{in}, p_0)), & \text{MVT GDI} \\
F_2(F_1(p_0, C_{in}), g_0), & \text{FS GDI}
\end{cases}
\tag{3}
\]
\[
\begin{aligned}
C_{out} &= G_1 + P_1 G_0 + P_1 P_0 C_{in} = F_2(P_1 G_0 + G_1,\; P_1 P_0 C_{in}) \\
&= F_2(F_2(P_1 G_0, G_1),\; F_1(P_1 P_0, C_{in})) \\
&= F_2(F_2(F_1(P_1, G_0), G_1),\; F_1(F_1(P_1, P_0), C_{in}))
\end{aligned}
\tag{4}
\]

The signals $P_1, G_1$ and $P_0, G_0$ in (4) represent the outputs from top hierarchy level CLA blocks.

It should be noted that in the MVT implementation, the application of high-Vth (HVT) transistors in swing restoration buffers is selective and depends on the Vth drop that may occur in the path between two buffers. If the Vth drop at the buffer input can occur only at the high (low) voltage level, an asymmetric buffer with a high-Vth nMOS (pMOS) transistor is used. All other transistors in the buffers remain low-Vth (LVT) to maintain the performance. In all the GDI cells, the transistor that may cause a Vth drop is LVT in order to minimize the drop. Other GDI transistors are standard-Vth (SVT).

The FS GDI implementation is based on the F1 and F2 cells, as described in Fig. 3. The implementation did not require the addition of inverters for driving the SR transistors. All the inverted signals that were used in SR transistors appeared inherently in the functional implementation. In the MUX cell implementation, a pair of complementary SR transistors was used at the output, making the cell similar to a PTL MUX.

In both versions the XOR implementation was optimized. The basic implementation of a GDI XOR gate, as was proposed in [1], consists of 4 transistors comprising a GDI cell used as a MUX, and an inverter. However, in complex circuits, the diffusion input to the GDI XOR may already have an inverted representation elsewhere in the circuit. Thus, instead of implementing an inverter again as part of the GDI XOR, we can use both signals as diffusion inputs to the GDI MUX while maintaining the same functionality. This allows a significant decrease in the number of transistors in circuits with a high number of XOR/XNOR gates. An example of the optimization in the GDI CLA adder can be seen in the implementation of the p function in Table 2. This optimization reduced the number of transistors in the GDI design.

Both versions of the GDI implementation were compared with a standard CMOS design of the CLA adder. In order to maintain an optimal design, the XOR circuits in CMOS were implemented using the PTL technique. Table 3 summarizes the transistor count and area estimation of both GDI designs vs. the CMOS counterpart. Note that all CMOS gates were implemented using minimum sized transistors with a standard ratio between pull up and pull down networks, i.e. β = 2. GDI F1, F2 and XOR gates were sized similarly to a CMOS inverter (β = 2). Supplementary SR transistors were a minimum sized NMOS and a double sized PMOS. The area estimation is normalized with respect to $W_{min} L_{min}$.

It can be clearly seen that both GDI implementations have a significant advantage in terms of transistor count and area, as compared to CMOS. While the FS GDI implementation provides full swing and improved performance (as will be shown in the following sections), it implies about 40% area increase as compared to MVT GDI. Still, it occupies only half of the area of the CMOS design.
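The "half of the area" figure can be reproduced from the totals reported in Table 3. The short sketch below is only an arithmetic check, under the assumption that the last pair of numbers in each Table 3 row is the total transistor count followed by the normalized area in parentheses; the dictionary and function names are hypothetical.

```python
# Totals read from the last column of Table 3, assumed to be
# (transistor count, normalized area) for the complete adder.
totals = {"CMOS": (934, 2039), "FS GDI": (627, 925)}


def ratio_to_cmos(design):
    """Return (transistor-count ratio, area ratio) of a design relative to CMOS."""
    count, area = totals[design]
    cmos_count, cmos_area = totals["CMOS"]
    return count / cmos_count, area / cmos_area


count_ratio, area_ratio = ratio_to_cmos("FS GDI")
print(f"transistor count: {count_ratio:.2f}x CMOS, area: {area_ratio:.2f}x CMOS")
```

With these numbers the FS GDI adder uses about two thirds of the CMOS transistor count and a little under half of the CMOS area, in line with the 2× area reduction quoted in the text.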
Table 3
Transistor count and area estimation comparison between GDI and Static CMOS designs. Each entry gives the transistor count, with the area estimation in parentheses.

Design    pg        PG        LCG       CLA       Last CLA   Total
CMOS      18 (41)   18 (35)   12 (24)   30 (59)   34 (77)    934 (2039)
FS GDI    11 (16)   11 (16)   10 (15)   21 (31)   31 (46)    627 (925)
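The Boolean forms used in Section 3 for the PG unit and the carry generators, P = p1 p0, G = g1 + g0 p1, C_loc = g0 + p0 C_in and C_out = G1 + P1 G0 + P1 P0 C_in, can be sanity-checked against a plain ripple carry. The sketch below is an illustration only, not the authors' verification flow. It assumes the usual bit-level definitions p = a XOR b and g = a AND b (the corresponding equation is not reproduced in this text), models a 4-bit slice built from two PG units, and uses hypothetical function names. It checks the logic functions only, not their F1/F2 gate mappings.

```python
from itertools import product


def pg(a, b):
    """Bit-level propagate and generate (assumed: p = a XOR b, g = a AND b)."""
    return a ^ b, a & b


def group_pg(p1, g1, p0, g0):
    """PG unit, Eq. (2): P = p1 p0, G = g1 + g0 p1."""
    return p1 & p0, g1 | (g0 & p1)


def c_loc(g0, p0, c_in):
    """Local carry, Eq. (3): C_loc = g0 + p0 C_in."""
    return g0 | (p0 & c_in)


def c_out(P1, G1, P0, G0, c_in):
    """Global carry, Eq. (4): C_out = G1 + P1 G0 + P1 P0 C_in."""
    return G1 | (P1 & G0) | (P1 & P0 & c_in)


def ripple_carry(a_bits, b_bits, c_in):
    """Reference carry-out of a ripple adder; bits are given LSB first."""
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)
    return carry


def lookahead_carry_4bit(a_bits, b_bits, c_in):
    """Carry-out of a 4-bit slice built from two PG units and Eq. (4)."""
    (p0, g0), (p1, g1), (p2, g2), (p3, g3) = [pg(a, b) for a, b in zip(a_bits, b_bits)]
    P0, G0 = group_pg(p1, g1, p0, g0)  # lower bit pair
    P1, G1 = group_pg(p3, g3, p2, g2)  # upper bit pair
    return c_out(P1, G1, P0, G0, c_in)


def check_all():
    """Exhaustively compare the lookahead carry with the ripple carry."""
    bits4 = list(product((0, 1), repeat=4))
    return all(lookahead_carry_4bit(a, b, c) == ripple_carry(a, b, c)
               for a in bits4 for b in bits4 for c in (0, 1))


if __name__ == "__main__":
    print("lookahead carry matches ripple carry:", check_all())
```

The exhaustive comparison covers all 512 input combinations of the 4-bit slice, so a mismatch in any of the equations above would show up as a failed check.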
at higher VDD values because of the VTH drops. Thus, the MVT GDI achieves the MEP at a higher voltage. The FS GDI design has slightly higher effective capacitances. Therefore, its energy consumption at large voltages is increased. Since no VTH drops occur in FS GDI, its leakage component becomes dominant at a lower supply voltage and therefore the MEP is shifted towards lower VDD. The CMOS adder MEP appears at lower VDD because of the dominating dynamic energy caused by higher switching capacitances.

Fig. 12 presents the propagation delay of all adders as a function of the supply voltage sweep. The delays at MEP are also shown. As can be seen, the MVT GDI adder exhibits a delay about one order of magnitude higher than that of CMOS and FS GDI. However, at MEP, the delay of MVT GDI is comparable with the MEP delays of the other techniques.

Table 5 summarizes the characteristics of all the designs at MEP. As can be seen, FS GDI presents the best energy and delay at MEP. Moreover, the MEP is achieved at a higher VDD than in CMOS, which, taking into account the similar process variation sensitivity, makes FS GDI beneficial for minimal energy operation.

5. Conclusions

Full Swing Gate Diffusion Input (FS GDI) methodology was proposed and evaluated on a 16-bit Carry Look Ahead Adder (CLA). It was shown that the proposed FS-GDI circuits are beneficial in terms of performance and static power consumption, as compared to the conventional multiple Vth GDI (MVT-GDI). Three CLA versions, based on FS-GDI, MVT-GDI and standard CMOS were

References

[1] M. Alioto, Ultra-low power VLSI circuit design demystified and explained: a tutorial, IEEE Transactions on Circuits and Systems, Part I (invited) 59 (1) (2012) 3–29.
[2] G. Gammie, A. Wang, M. Chau, S. Gururajarao, R. Pitts, F. Jumel, S. Engel, P. Royannez, R. Lagerquist, H. Mair, A 45 nm 3.5G baseband-and-multimedia application processor using adaptive body-bias and ultra-low-power techniques, in: Proceedings of IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2008, pp. 258–611.
[3] D. Bol, Robust and energy-efficient ultra-low-voltage circuit design under timing constraints in 65/45 nm CMOS, Journal of Low Power Electronics and Applications 1 (1) (2011) 1–19.
[4] G. Chen, M. Fojtik, D. Kim, D. Fick, J. Park, M. Seok, M.T. Chen, Z. Foo, D. Sylvester, D. Blaauw, Millimeter-scale nearly perpetual sensor system with stacked battery and solar cells, in: Proceedings of IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2010, pp. 288–289.
[5] I. Vaisband, E.G. Friedman, R. Ginosar, A. Kolodny, Low power clock network design, Journal of Low Power Electronics and Applications 1 (2011) 219–246.
[6] A. Teman, L. Pergament, O. Cohen, A. Fish, Minimum leakage quasi-static RAM bitcell, Journal of Low Power Electronics and Applications 1 (2011) 204–218.
[7] A. Morgenshtein, A. Fish, I.A. Wagner, Gate-diffusion input (GDI): a power-efficient method for digital combinatorial circuits, IEEE Transactions on VLSI Systems 10 (5) (2002).
[8] A. Morgenshtein, I. Shwartz, A. Fish, Gate diffusion input (GDI) logic in standard CMOS nanoscale process, in: Proceedings of IEEE Convention of Electrical and Electronics Engineers in Israel, 2010.
[9] M. Kumar, M.A. Hussain, L.L.K. Singh, Design of a low power high speed ALU in 45nm using GDI technique and its performance comparison, Communications in Computer and Information Science 142 (Part 3) (2011) 458–463.
[10] K.K. Chaddha, R. Chandel, Design and analysis of a modified low power CMOS full adder using gate-diffusion input technique, Journal of Low Power Electronics 6 (4) (2010) 482–490.
[11] O.P. Hari, A.K. Mai, Low power and area efficient implementation of N-phase non overlapping clock generator using GDI technique, in: Proceedings of IEEE International Conference on Electronics Computer Technology (ICECT), 2011.
[12] P.M. Lee, C.H. Hsu, Y.-H. Hung, Novel 10-T full adders realized by GDI structure, in: Proceedings of the IEEE International Symposium on Integrated Circuits, 2007.
[13] F. Moradi, D.T. Wisland, D.T.H. Mahmoodi, H.S. Aunet, T.V. Cao, A. Peiravi, Ultra low power full adder topologies, in: Proceedings of ISCAS'04, Taipei, Taiwan, May 2009.
[14] A. Morgenshtein, A. Fish, I.A. Wagner, An efficient implementation of D-flip-flop using the GDI technique, in: Proceedings of ISCAS'04 Conference, Canada, May 2004, pp. 673–676.
[15] R. Uma, P. Dhavachelvan, Modified gate diffusion input technique: a new technique for enhancing performance in full adder circuits, Proceedings of ICCCS 6 (2012) 74–81.
[16] A. Wang, B.H. Calhoun, A.P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power Systems, Springer Verlag, 2006.
[17] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, A. Fish, Digital subthreshold logic design - motivation and challenges, in: Proceedings of the IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI), vol. 3–5, 2008, pp. 702–706.
[18] P.R. Panda, A. Shrivastava, P.R. Panda, B.V.N. Silpa, K. Gummidipudi, Power-Efficient System Design, Springer Verlag, 2010.
[19] D. Markovic, C.C. Wang, L.P. Alarcon, J.M. Rabaey, Ultralow-power design in near-threshold region, Proceedings of the IEEE 98 (2010) 237–252.
[20] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, Analysis and mitigation of variability in subthreshold design, in: Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005, pp. 20–25.
[21] N. Verma, J. Kwong, A.P. Chandrakasan, Nanometer MOSFET variation in minimum energy subthreshold circuits, IEEE Transactions on Electron Devices 55 (2008) 163–174.
[22] D.F. Finchelstein, V. Sze, M.E. Sinangil, A.P. Chandrakasan, A 0.7-V 1.8-mW H.264/AVC 720p video decoder, IEEE Journal of Solid State Circuits 44 (2009) 2943–2956.
[23] Y. Pu, J.P. de Gyvez, H. Corporaal, Y. Ha, An ultra-low-energy/frame multi-standard JPEG co-processor in 65 nm CMOS with sub/near-threshold power supply, in: Proceeding of IEEE International Solid-State Circuits Conference-Digest of Technical Papers (ISSCC), 147 a, 2009, pp. 146–147.
[24] B.H. Calhoun, A. Wang, A. Chandrakasan, Modeling and sizing for minimum energy operation in subthreshold circuits, IEEE Journal of Solid-State Circuits 40 (9) (2005) 1778–1786.

Arkadiy Morgenshtein received the B.Sc. degree in electrical engineering in 1999, M.Sc. in biomedical engineering in 2003, MBA in 2006 and Ph.D. in electrical engineering in 2008 from Technion, Israel Institute of Technology. In 2012 he joined the IBM Haifa Research Lab. Prior to that he worked with the Core CAD Technologies group at Intel Corporation, where he was researching and developing tools for power optimization and estimation at various levels of VLSI design. He has been a Teaching and Research Assistant at the Electrical Engineering Department, Technion since 1999, where he is currently an Adjunct Lecturer.
Dr. Morgenshtein's research interests include low-power design techniques for digital circuits, optimization of on-chip interconnect, CMOS sensors and EDA tools for power estimation and optimization. He has authored over 40 scientific papers and patent applications. Dr. Morgenshtein co-authored a paper that won the IEEE VLSI Transactions (TVLSI) Best Paper award for 2012. He was honored by the Technion President's award and the Intel-Technion award for excellence in study in 1998 and 2007. He supervised projects winning the award of the Oz Moses Foundation by Intel in 2002 and best VLSI project in 2003 and 2005. Dr. Morgenshtein has served as associate editor for the Journal of Low Power Electronics and Applications (JLPEA), as session chairman at the ICECS'04 Conference and as referee in multiple journals and conferences.

Viacheslav Yuzhaninov received the B.Sc. degree in Electrical Engineering from Ben-Gurion University, Be’er Sheva, Israel, in 2012. He has been a Research Assistant at the Low Power Circuits and Systems Lab, VLSI Systems Center, Ben-Gurion University, since 2011. He is currently working on his M.Sc. degree in Electrical Engineering at Bar-Ilan University. His research interests are energy efficient logic families and low voltage high performance digital design.

Alexey Kovshilovsky received the B.Sc. degree in Electrical Engineering from Ben-Gurion University, Be’er Sheva, Israel, in 2012. He has been a Research Assistant at the Low Power Circuits and Systems Lab, VLSI Systems Center, Ben-Gurion University, since 2011. Currently he is working as an Embedded Software Engineer at Powermat Technologies in Neve-Ilan, Israel.

Alexander Fish received the B.Sc. degree in Electrical Engineering from the Technion, Israel Institute of Technology, Haifa, Israel, in 1999. He completed his M.Sc. in 2002 and his Ph.D. (summa cum laude) in 2006, respectively, at Ben-Gurion University in Israel. He was a postdoctoral fellow in the ATIPS laboratory at the University of Calgary (Canada) from 2006–2008. In 2008 he joined the Ben-Gurion University in Israel, as a faculty member in the Electrical and Computer Engineering Department. There he founded the Low Power Circuits and Systems (LPC&S) laboratory, specializing in low power circuits and systems. In July 2011 he was appointed as a head of the VLSI Systems Center at BGU. In October 2012 Prof. Fish joined the Bar-Ilan University, Faculty of Engineering as an Associate Professor and the head of the microelectronics track. Prof. Fish also leads new Energy Efficient Electronics and Applications Labs.
Prof. Fish's research interests include development of energy efficient “smart” CMOS image sensors, ultra low power SRAM, DRAM and Flash memory arrays and energy efficient design techniques for low voltage digital and analog VLSI chips. He has authored over 70 scientific papers in journals and conferences, including IEEE Journal of Solid State Circuits, IEEE Transactions on Electron Devices, IEEE Transactions on Circuits and Systems and many others. He also submitted 16 patent applications. Prof. Fish has published two book chapters. He was a co-author of papers that won the Best Paper Finalist awards at IEEE ISCAS and ICECS conferences.
Prof. Fish serves as an Editor in Chief for the MDPI Journal of Low Power Electronics and Applications (JLPEA) and as an Associate Editor for the IEEE Sensors Journal. He also served as a chair of different tracks of various IEEE conferences. He was a co-organizer of many special sessions at IEEE conferences, including IEEE ISCAS, IEEE Sensors and IEEEI conferences. Prof. Fish is a member of Sensory, VLSI Systems and Applications and Bio-medical Systems Technical Committees of IEEE Circuits and Systems Society.