Full Swing Gate Diffusion Input Logic - Case Study of Low Power CLA Adder Design

INTEGRATION, the VLSI journal 47 (2014) 62–70
Contents lists available at ScienceDirect
INTEGRATION, the VLSI journal

journal homepage: www.elsevier.com/locate/vlsi
Full-Swing Gate Diffusion Input logic—Case-study of low-power

CLA adder design
Arkadiy Morgenshtein, Viacheslav Yuzhaninov, Alexey Kovshilovsky, Alexander Fish n
Faculty of Engineering, Bar-Ilan University, Ramat-Gan 52900, Israel
art ic l e i nf o a b s t r a c t
Article history: Full Swing Gate Diffusion Input (FS-GDI) methodology is presented. The proposed methodology is
Received 15 July 2012 applied to a 40 nm Carry Look Ahead Adder (CLA). The CLA is implemented mainly using GDI full-swing
Received in revised form F1 and F2 gates, which are the counterparts of standard CMOS NAND and NOR gates. A 16-bit GDI CLA
8 February 2013
was designed in a 40 nm low power TSMC process. The CLA, implemented according to the proposed
Accepted 24 April 2013
Available online 7 May 2013
methodology, presents full functionality and robustness under global and local process variations at
wide range of supply voltages. Simulation results show 2 area reduction, 5 improvement in dynamic
Keywords: energy dissipation and 4 decrease in leakage, with a slight (24%) degradation in performance, when
Alternative logic family compared to the CMOS CLA. Advanced design metrics of GDI cells, such as minimum energy point (MEP)
Carry Look Ahead (CLA) adder
operation and minimum leakage vector (MLV), are discussed.
Full-Swing GDI
& 2013 Elsevier B.V. All rights reserved.
Gate Difusion Input (GDI)
Low power
1. Introduction Recently, it was shown that any GDI circuit can be implemented
in a standard CMOS process [8]. The efficiency of the GDI method
Power consumption and area reduction of logic and memory for both combinatorial and sequential logic was shown by many
have become primary focuses of attention in VLSI digital design groups [8–15]. Various combinatorial circuits, such as adders,
[1–6]. Power is the limiting factor in both high performance multipliers, comparators, and counters, were implemented in
systems and portable applications. Die area directly affects the processes from 0.8 mm down to 65 nm. GDI Flip-Flops were also
device size and cost. Since the introduction of the standard CMOS presented, showing improvements in both area and power, com-
Logic in early 80s, many design solutions have been proposed pared to existing Flip Flop styles.
to improve power dissipation, area and performance of digital In this paper we present an efficient methodology for digital
VLSI chips. circuits implementation. The proposed methodology was applied
Gate Diffusion Input (GDI) design methodology was introduced to a 16-bit adder in low power standard 40 nm TSMC process. CLA
as a promising alternative to Static CMOS Logic [7]. Originally adder architecture, which was originally proposed as an alterna-
proposed for fabrication in Silicon on Insulator (SOI) and twin-well tive for the speed enhancement of a simple ripple adder, was
CMOS processes, GDI methodology allowed implementation of a chosen as a benchmark circuit in this work. The proposed CLA
wide range of complex logic functions using only two transistors implementation utilizes improved full-swing GDI F1 and F2 gates,
[7]. It was shown, that area and dynamic power of GDI combina- which are the counterparts of standard CMOS NAND and NOR
torial and sequential logic were significantly reduced, as compared gates. The CLA design is compared with our previously shown GDI
to standard CMOS implementations. Similarly to existing alter- methodology [8], which utilizes swing-restoring buffers with
natives to CMOS, such as Pass Transistor Logic (PTL), the GDI gates selective application of high-Vth transistors, as well as with
presented reduced voltage swing at their outputs due to threshold standard CMOS implementation. Simulation results show CLA
drops. These drops usually cause degradation in performance and functionality and robustness under global and local process varia-
increased short circuit power [8]. However, since the GDI circuits tions. The CLA presents 2 area reduction and 4–5 power
were implemented with much less transistors, a significant power reduction, compared to the conventional CMOS implementation.
overall power reduction was observed, while maintaining minimal The contributions of this paper are as follows: (1) The GDI full
performance penalty. swing methodology for a standard nanoscaled CMOS technology is
presented; (2) Leakage reduction in GDI is discussed through
minimum leakage vector (MLV) analysis; (3) GDI robustness is
evaluated through statistical Monte Carlo simulations; and (4) Low
n
Corresponding author. Tel.: +972 54 8044144; fax: +972 3 7384051. voltage operation of the GDI cells, including minimum energy
E-mail addresses: alexander.fish@gmail.com, alexander.fish@biu.ac.il (A. Fish). operation (MEP) is shown.
0167-9260/$ - see front matter & 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.vlsi.2013.04.002
A. Morgenshtein et al. / INTEGRATION, the VLSI journal 47 (2014) 62–70 63
The paper is organized as follows: Section 2 overviews the transistors. This combination allows minimization of the direct-
GDI methodology and presents its benefits and limitations. The path static power in the inverters.
proposed CLA implementation is discussed in Section 3. Section 4 Most of today's static digital designs are based on CMOS NAND
presents simulation results of the proposed GDI CLA in 40 nm and NOR gates. The reasons for this are known and well explored.
standard CMOS process, comparing them to the CMOS CLA. Both NAND and NOR gates are implemented using only four
Section 5 concludes the paper. transistors and each one of these functions is a universal set. The
GDI method, which is very efficient for implementation of various
gates, such as MUX, AND, OR (see Table 2), has similar number of
transistors in NAND/NOR gates implementation as standard CMOS
2. Overview of GDI
methodology. However, the GDI technology provides alternative
basic functions, F1 and F2. Consisting only of two transistors (one
The basic GDI cell is shown in Fig. 1. At the first glance, the GDI
GDI cell), each one of these functions represents a universal set.
cell, which consists of only two transistors, resembles the conven-
Moreover, it was shown in [7] that F1 and F2 functions can be used
tional CMOS inverter. However, contradictory to the inverter, it
to synthesize other functions more efficiently than the NAND
contains three inputs: G (common gate input of both the nMOS
and NOR gates. Fig. 2 shows a comparison of number of various
and the pMOS), P (input to the source/drain of the pMOS), and N
functions that can be implemented using the same number of F1
(input to the source/drain of the nMOS).
and NAND gates. The strength of the F1 and F2 gates will be also
It was shown that multiple Boolean functions can be imple-
demonstrated by implementation of CLA using mainly these GDI
mented by a simple GDI cell, as demonstrated in Table 1. This is
gates (see Section 3).
achieved by a change of the input configuration of the GDI cell.
While implementation of most of these functions is relatively
complex (6–12 transistors) in Static CMOS, it is very efficient (only
2 transistors) with the GDI cells. The Multiplexer (MUX) is the 3. GDI CLA adder design
most complex function that can be implemented with a basic GDI
cell, while being the most efficient function as compared to CMOS 3.1. Full-swing GDI cells
implementation.
GDI gates may suffer from threshold voltage drops which In this paper we propose full-swing (FS) GDI cells. The
reduce current drive and therefore affect the performance of the proposed technique utilizes a single swing restoration (SR) tran-
gate. These drops also increase direct-path static power dissipation sistor to improve the output swing of F1 and F2 GDI gates. Fig. 3
in the cascaded inverters, used for swing restoration. It was shown shows the structure of full swing F1 and F2 cells. As can be seen,
that these effects can be significantly reduced by using swing- the SR transistor is activated only in cases when the Vth drop may
restoration buffers with a multiple VTH approach [7], herein occur at the output. Since in F1 and F2 gates, the output VTH drop
named MVT. This approach suggests using low threshold transis- can occur only at one of the logical levels (VTH instead of 0 V in F1,
tors in all paths where a voltage drop is expected. This way, and VDD–VTH instead of VDD in F2), only a single SR transistor is
the voltage drop at the output will be minimal. In addition, all required to ensure the full swing operation.
regenerative inverters are implemented using high threshold In cases where the gate input signal of GDI cell has an inverted
representation in the circuit, it can be used to control the swing
restoring transistor. This transistor will have a diffusion input similar
to the diffusion input of GDI, but will be of an opposite type (nMOS
for F1, and pMOS for F2). In this manner, the diffusion input signal
will pass through a pair of transistors of both types (in the transistor
of original GDI cell, and the complementary SR transistor). The
FS GDI cells are efficient alternative for swing restoration buffers,
in designs where inverted signals can be obtained as part of logic
function implementation. The CLA adder presented in the next sub-
section is a good example of such design. An example of a logic chain
in CLA, containing full-swing GDI cells, is shown in Fig. 4.
3.2. CLA implementation
Two GDI versions of a 16-bit CLA adder were implemented in

this work: one using swing restoration buffers with multiple Vth
(MVT GDI), and the second one with FS GDI gates.
Fig. 1. Basic GDI cell.
A similar conventional CLA architecture, shown in Fig. 5, was
used to implement all the CLA versions. The circuit-level imple-
mentation of MVT and FS GDI adders was different, in order to
Table 1 address the specific properties of each technique. The implemen-
Boolean function synthesis through input configuration of a simple GDI cell tation was based mostly on F1 and F2 cells. The detailed GDI
implementation of various CLA blocks is given in (1)–(4) and
N P G Out Function
is depicted in Table 2, assuming the logical functions F1 ða; bÞ ¼ ab
‘0’ B A AB F1 and F2 ða; bÞ ¼ a þ b.
B ‘1’ A AþB F2 The pg unit outputs were implemented as follows:
‘1’ B A AþB OR
‘0’
p ¼ a⊕b; GDI XOR gate
B A AB AND (
C B A AB þ AC MUX F 2 ða; bÞ; MVT GDI ð1Þ
‘0’ ‘1’ A NOT g ¼ ab ¼
A F 1 ðb; aÞ; FS GDI
64 A. Morgenshtein et al. / INTEGRATION, the VLSI journal 47 (2014) 62–70
Table 2
The transistor-level design of CLA blocks.
Unit MVT GDI Full-Swing (FS) GDI
pg
PG
LCG
CLA
Table 2 (continued )
Unit MVT GDI Full-Swing (FS) GDI
Last CLA
Inverter with HVT nMOS transistor, Inverter with HVT pMOS transistor, Inverter with both, HVT transistors.
Note: F1* and F2* stand for GDI F1 and F2 full swing gates.
CMOS NAND cell vs. GDI F1 cell

45
41
40
35
Number of Functions
30
25 GDI
21
20
16
CMOS
15
10
8 Fig. 3. Scheme of the FS GDI gates.
8
5 4
2
The local and global carry generator blocks are implemented
0
0 1 2 3 4 according to:
Number of Cells C loc ¼ g 0 þ p0 C in
(
Fig. 2. Number of various functions that can be implemented using the same F 1 ðg 0 ; F 2 ðC in ; p0 ÞÞ; MVT GDI
number of F1 and NAND cells (after [7]). ¼ ð3Þ
F 2 ðF 1 ðp0 ; C in Þ; g 0 Þ; FS GDI
C out ¼ G1 þ P 1 G0 þ P 1 P 0 C in ¼ F 2 ðP 1 G0 þ G1 ; P 1 P 0 C in Þ
where a similar GDI XOR gate was used in both versions, as will be
explained below. ¼ F 2 ðF 2 ðP 1 G0 ; G1 Þ; F 1 ðP 1 P 0 ; C in ÞÞ
The PG unit implementation is the following:
¼ F 2 ðF 2 ðF 1 ðP 1 ; G0 Þ; G1 Þ; F 1 ðF 1 ðP 1 ; P 0 Þ; C in ÞÞ ð4Þ
P ¼ p1 p0 ¼ F 1 ðp1 ; p0 Þ
( The signals P 1 ; G1 and P 0 ; G0 in (4) represent the outputs from
F 1 ðg 1 ; F 2 ðg 0 ; p1 ÞÞ; MVT GDI top hierarchy level CLA blocks.
G ¼ g 1 þ g 0 p1 ¼ ð2Þ It should be noted that in the MVT implementation, the
F 2 ðF 1 ðp1 ; g 0 Þ;g 1 Þ; FS GDI
application of high-Vth (HVT) transistors in swing restoration
where output P has similar implementation in both the versions. buffers is selective and depends on the Vth drop that may occur
in the path between two buffers. In case that the Vth drop at buffer elsewhere in the circuit. Thus, instead of implementing an inverter
input may occur only at high (low) voltage, then an asymmetric again as part of GDI XOR, we can use both signals as diffusion
buffer will be used with high-Vth nMOS (pMOS) transistor. All inputs to GDI MUX while maintaining the same functionality.
other transistors in the buffers will remain low-Vth (LVT) to This allows a significant decrease in number of transistors in
maintain the performance. In all the GDI cells, the transistor that circuits with high number of XOR/XNOR gates. An example of the
may cause a Vth drop, will be LVT in order to minimize the Vth optimization in GDI CLA adder can be seen in implementation of
drop. Other GDI transistors will be standard-Vth (SVT). the p function in Table 2. This optimization allowed reducing the
The FS GDI implementation is based on the F1 and F2 cells, as number of transistors in GDI design.
described in Fig. 3. The implementation did not require addition of Both versions of GDI implementation were compared with
inverters for driving the SR transistors. All the inverted signals that standard CMOS design of CLA adder. In order to maintain optimal
were used in SR transistors appeared inherently in the functional design, the XOR circuits in CMOS were implemented using the
implementation. In MUX cell implementation, a couple of com- PTL technique. Table 3 summarizes the transistor count and area
plementary SR transistors was used at the output, making the cell estimation of both GDI designs vs. CMOS counterpart. Note, all
similar to a PTL MUX. CMOS gates were implemented using minimum sized transistors
In both the versions the XOR implementation was optimized. with a standard ratio between pull up and pull down networks, i.e.
The basic implementation of a GDI XOR gate, as was proposed β¼2. GDI F1, F2 and XOR gates were sized similarly to a CMOS
in [1], consists of 4 transistors comprising a GDI cell used as a inverter (β¼ 2). Supplementary SR transistors were: minimum
MUX, and an inverter. However, in complex circuits, the diffusion sized NMOS and double sized PMOS. The area estimation is
input to GDI XOR may already have an inverted representation normalized with respect to WminLmin.
It can be clearly seen that both GDI implementations have
significant advantage in terms of transistors count and area,
as compared to CMOS. While FS GDI implementation provides
full swing and improved performance (as will be shown in the
F* following sections), it implies about 40% area increase as com-
pared to MVT GDI. Still, it occupies only half of area as compared to
CMOS design.
F* 3.3. Leakage elimination in GDI
As was shown in [7], the unique structure of the GDI cell

provides significant reduction of both the sub-threshold and the
Fig. 4. Example logic chain in CLA, containing full-swing GDI cells. gate leakage components, as compared to a static CMOS gate.
Fig. 5. The general CLA architecture.
Table 3
Transistor count and area estimation comparison between GDI and Static CMOS designs.
Design Unit
pg PG LCG CLA Last CLA Total
CMOS 18 18 12 30 34 934
(41) (35) (24) (59) (77) (2039)
MVT GDI 8 8 8 16 24 438

(12) (12) (12) (24) (36) (657)
FS GDI 11 11 10 21 31 627
(16) (16) (15) (31) (46) (925)
Transistors count (Area [W min L min]).

values of 100 mV up to nominal values of 1.1 V. The designs were

simulated using SPICE based Virtuoso simulator. Both GDI designs
were compared to the CMOS counterpart in terms of performance,
power consumption, area estimation and sensitivity to process
variations.
4.1. Nominal voltage operation
The comparative results of performance, static power, energy

per operation and energy-delay product (EDP) in CMOS and GDI
Fig. 6. LCG unit with minimum leakage vector.
implementations are presented in Table 4.
Delay—As can be seen, the CMOS design has the shortest delay
among all implementations. The FS GDI implementation shows a
30% delay increase as compared to CMOS. The MVT GDI is about
three times slower than CMOS due to voltage drops. The perfor-
mance improvement in FS GDI is achieved due to better driving
capabilities of the modified F1 and F2 gates.
Static power consumption—According to the results presented
in Table 4, the static power of both GDI designs is significantly
lower than in CMOS design. One of the reasons for leakage
reduction in GDI is the reduced transistor count. In addition, as
shown in previous section, GDI benefits from an inherent sub-
threshold leakage reduction in half of the input vectors, leading to
a zero potential between the diffusion inputs of GDI cell [7]. The FS
GDI implementation presents reduction in static power consump-
tion. as compared to MVT GDI. This is achieved by maintaining full
swing at output nodes of all GDI cells, and therefore eliminating
the direct-path currents.
Dynamic Energy and EDP—Table 4 shows a 5–6 reduction of
dynamic energy consumption per operation in both GDI circuits
as compared to CMOS. The main reason for this is the reduced
switching capacitance in GDI. Note that although CMOS presents
an advantage over GDI by means of delay, it can be clearly seen
that the EDP metric of both GDI designs is better.
Fig. 7. PG unit with minimum leakage vector.
Since the sub-threshold leakage is still dominant, it is addressed Table 4

Comparison of performance, static and dynamic power in GDI and CMOS CLA
here in more details. implementations.
A general GDI cell eliminates the sub-threshold leakage in half
of all possible states. This is contrary to static CMOS gates, where tpd [psec] PStat [nW] EDyn [f] EDP [J sec 10−24]
the pull-up and the pull-down networks are always connected to
CMOS 134 78.7 123 16.5
the supply voltage and ground, respectively.
MVT GDI 422 35.9 23.5 8.2
Here we demonstrate this advantage by analyzing the minimum FS GDI 167 18.4 19.4 3.9
leakage vector (MLV) of a basic two-bit CLA module, consisting of
PG and LCG blocks. The following input vector provides the minimal
leakage in the two-bit FS-GDI CLA
!
vin ¼ ðg 0 ; g 1 ; p0 ; p1 ; C in Þ ¼ ð1; 1; 0; 1; 0Þ
As can be seen in Fig. 6, when the input vector is applied to LCG
block, four transistors are turned off (M1, M3, M4 and the inverter
nMOS). However, as the inputs Cin and go are connected to
diffusions of turned-off transistors, the potential at both diffusion
nodes of the transistors is similar. This leads to elimination of sub-
threshold leakage in transistors M1, M3 and M4.
Similar effect is observed when the input vector vin is applied
to the PG block, as shown in Fig. 7. As can be seen, due to the
diffusion input connections, three out of five turned-off transistors
have zero potential between the diffusion nodes. In this case
transistors M2, M8 and M9 exhibit zero sub-threshold leakage.
4. Comparative simulation results
The proposed 16-bit CLA circuits were designed in 40 nm TSMC

process with supply voltage varying from deep sub-threshold Fig. 8. Delay distribution of CMOS design derived by Monte Carlo simulation.
Sensitivity of Process Variations—In order to evaluate the sensi-

tivity of the designs to local and global process variations, Monte
Carlo simulations have been carried out. Figs. 8–10 present the
delay distribution of CMOS, MVT GDI and FS GDI, respectively.
As expected, FS GDI presents much better immunity to process
variations than MVT GDI, while showing μ/s ratio that is very close
to CMOS (only 12% degradation). The MVT GDI adder is much
more sensitive (4 degradation as compared to CMOS), because
of driving current dependence on process-sensitive Vth, which is
amplified due to voltage drops at internal nodes.
4.2. Low voltage operation
Driven by demand for ultra-low power dissipation, low voltage

operation of digital circuits has recently gained extensive research
efforts [16–23]. It was shown that minimum energy operation
point (MEP) is usually achieved in the sub-threshold region. Here
we examine the operation of GDI and CMOS adders at low supply
Fig. 9. Delay distribution of MVT GDI design derived by Monte Carlo simulation. voltages.
While discussing the MEP term, it should be reminded that the
total energy consumption is comprised of two different compo-
nents— Etotal ¼ Edyn þ Eleak . The dynamic energy component is
proportional to an effective load capacitance and supply voltage—
Edyn ∝C ef f V DD 2 , while the leakage component is dominated by
integration of a sub-threshold current along the operation time—
Eleak ∝V DD 2 expð−V DD =nV t Þ, assuming full swing operation [24]. It can
be easily noticed that these two components have an opposite
effect as a function of VDD, so their relation determines the MEP.
Fig. 10 presents the dependency of total energy consumption
on varying supply voltages. MEPs of GDI and CMOS adders are
shown. Note that the simulation was carried out at a typical corner
(TT), thus the actual MEPs may be higher due to increased leakage
currents under process variations.
An interesting observation from Fig. 11 is that the same order of
MEP energy dissipation is obtained for all designs, while the MEPs
were achieved at different voltages. The FS GDI has 35% lower MEP
energy, as compared to CMOS.
Both GDI designs have reduced effective load capacitances, thus
the consumed energy at high VDD values is lower than in CMOS.
Fig. 10. Delay distribution of FS GDI design derived by Monte Carlo simulation. The sub-threshold leakage in MVT GDI adder starts dominating
Fig. 11. MEPs simulation.

Fig. 12. Propagation delay of CLA Adders at low supply voltages.
Table 5 designed and compared in 40 nm low power TSMC process.

Transistor count and area estimation comparison between GDI and Static CMOS Simulation results showed a clear advantage of the proposed GDI
designs. CLAs by means of area, dynamic and static energy. The FS-GDI
achieved 2 area reduction, 5 improvement in dynamic energy
MEP VDD [V] MEP energy [fJ] MEP delay [μ sec]
dissipation and 4 decrease in leakage, with a slight (24%)
CMOS 0.18 4.7 5.51 degradation in performance, when compared to the CMOS CLA.
MVT GDI 0.32 5.2 5.92 Advanced design metrics of GDI cells, such as minimum energy
FS GDI 0.22 3.1 4.17 point (MEP) operation and minimum leakage vector (MLV) were
discussed. It was shown that MVT-GDI achieved better character-
istics at MEP, as compared to other techniques.
at higher VDD values because of the VTH drops. Thus, the MVT GDI
achieves the MEP at higher voltage. FS GDI design has slightly References
higher effective capacitances. Therefore, its energy consumption at
large voltages is increased. Since no VTH drops occur in FS GDI, its [1] M. Alioto, Ultra-low power VLSI circuit design demystified and explained: a
tutorial, IEEE Transactions on Circuits and Systems—Part I (invited) 59 (1)
leakage component becomes dominant at lower supply voltage and (2012) 3–29.
therefore the MEP is shifted towards lower VDD. The CMOS adder [2] G. Gammie, A. Wang, M. Chau, S. Gururajarao, R. Pitts, F. Jumel, S. Engel, P.
MEP appears at lower VDD because of the dominating dynamic Royannez, R. Lagerquist, H. Mair, A 45 nm 3.5 g baseband-and-multimedia
application processor using adaptive body-bias and ultra-low-power techni-
energy caused by higher switching capacitances.
ques, in: Proceedings of IEEE International Solid-State Circuits Conference
Fig. 12 presents the propagation delay of all adders as function Digest of Technical Papers (ISSCC), 2008, pp. 258–611.
of supply voltage sweep. The delays at MEP are also shown. As can [3] Bol D. Robust and energy-efficient ultra-low-voltage circuit design under
be seen, the MVT GDI adder exhibits a delay of about one order timing constraints in 65/45 nm CMOS. Journal of Low Power Electronics and
Applications 1 (1) (2011) 1–19.
higher than in CMOS and FS GDI. However, at MEP, the delay [4] G. Chen, M. Fojtik, D. Kim, D. Fick, J. Park, M. Seok, M.T. Chen, Z. Foo, D. Sylvester,
of MVT GDI is comparable with the MEP delays of the other D. Blaauw, Millimeter-scale nearly perpetual sensor system with stacked battery
techniques. and solar cells, in: Proceedings of IEEE International Solid-State Circuits
Conference Digest of Technical Papers (ISSCC), 2010 , pp. 288–289.
Table 5 summarizes the characteristics of all the designs at [5] I. Vaisband, E.G. Friedman, R. Ginosar, A. Kolodny, Low power clock network
MEP. As can be seen, FS GDI presents the best energy and delay design, Journal of Low Power Electronics and Applications 1 (2011) 219–246.
at MEP. Moreover, the MEP is achieved at higher VDD than CMOS, [6] A. Teman, L. Pergament, O. Cohen, A. Fish, Minimum leakage quasi-static RAM
bitcell, Journal of Low Power Electronics and Applications 1 (2011) 204–218.
which when accounting the similar process variation sensitivity [7] A. Morgenshtein, A. Fish, I.A. Wagner, Gate-diffusion input (GDI)—a power-
make FS GDI beneficial for minimal energy operation. efficient method for digital combinatorial circuits, IEEE Transactions on VLSI
Systems 10 (5) (2002).
[8] A. Morgenshtein, I. Shwartz, A. Fish, Gate diffusion input (GDI) logic in
standard CMOS nanoscale process, in: Proceedings of IEEE Convention of
5. Conclusions Electrical and Electronics Engineers in Israel, 2010.
[9] M. Kumar, M.A. Hussain, L.L.K. Singh, Design of a Low Power High Speed ALU
in 45nm Using GDI Technique and Its Performance Comparison Communica-
Full Swing Gate Diffusion Input (FS GDI) methodology was
tions in Computer and Information Science 142 (Part 3) (2011) 458–463.
proposed and evaluated on a 16-bit Carry Look Ahead Adder (CLA). [10] K.K. Chaddha, R. Chandel, Design and analysis of a modified low power CMOS
It was shown that the proposed FS-GDI circuits are beneficial in full adder using gate-diffusion input technique, Journal of Low Power
terms of performance and static power consumption, as compared Electronics 6 (4) (2010) 482–490.
[11] O.P. Hari, A.K. Mai, Low power and area efficient implementation of N-phase
to the conventional multiple Vth GDI (MVT-GDI). Three CLA non overlapping clock generator using GDI technique, in: Proceedings of IEEE
versions, based on FS-GDI, MVT-GDI and standard CMOS were International Conference on Electronics Computer Technology (ICECT), 2011.
[12] P.M Lee, C.H. Hsu, Y.-H. Hung, Novel 10-T full adders realized by GDI structure, in: Viacheslav Yuzhaninov received the B.Sc. degree
Proceedings of the IEEE International Symposium on Integrated Circuits, 2007. in Electrical Engineering from Ben-Gurion University,
[13] F. Moradi, D.T. Wisland, D.T.H. Mahmoodi, H.S. Aunet, T.V. Cao, A. Peiravi, Ultra Be’er Sheva, Israel, in 2012. He has been a Research
low power full adder topologies, in: Proceedings of ISCAS'04, Taipei, Taiwan, Assistant at the Low Power Circuits and Systems Lab,
May 2009. VLSI Systems Center, Ben-Gurion University, since 2011.
[14] A. Morgenshtein, A. Fish, I.A. Wagner, An efficient implementation of D-flip- He is currently working on his M.Sc degree in
flop using the GDI technique, in: Proceedings of ISCAS'04 Conference, Canada, Electrical Enginering at Bar-Ilan University. His research
May 2004, pp. 673–676. interests are energy effcient logic families and low
[15] R. Uma, P. Dhavachelvan, Modified gate diffusion input technique: a new voltage high performance digital design.
technique for enhancing performance in full adder circuits, Proceedings of
ICCCS 6 (2012) 74–81.
[16] A. Wang, B.H. Calhoun, A.P. Chandrakasan, Sub-Threshold Design for Ultra
Low-Power Systems, Springer Verlag, 2006.
[17] S. Fisher, A. Teman, D. Vaysman, A. Gertsman, O. Yadid-Pecht, A. Fish, Digital
subthreshold logic design - motivation and challenges, in: Proceedings of the
IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI),
Alexey Kovshilovsky received the B.Sc. degree in
vol. 3–5, 2008, pp. 702–706.
Electrical Engineering from Ben-Gurion University,
[18] P.R. Panda, A. Shrivastava, P.R. Panda, B.V.N. Silpa, K. Gummidipudi, Power-
Be’er Sheva, Israel, in 2012. He has been a Research
Efficient System Design, Springer Verlag, 2010.
Assistant at the Low Power Circuits and Systems Lab,
[19] D. Markovic, C.C. Wang, L.P. Alarcon, J.M. Rabaey, Ultralow-power design in
VLSI Systems Center, Ben-Gurion University, since 2011.
near-threshold region, Proceedings of the IEEE 98 (2010) 237–252.
Currently he is working as an Embedded Software
[20] B. Zhai, S. Hanson, D. Blaauw, D. Sylvester, Analysis and mitigation of
Engineer at Powermat Technologies in Neve-Ilan, Israel.
variability in subthreshold design, in: Proceedings of the 2005 International
Symposium on Low power Electronics and Design,2005, pp. 20–25.
[21] N. Verma, J. Kwong, A.P. Chandrakasan, Nanometer MOSFET variation in
minimum energy subthreshold circuits, IEEE Transactions on Electron Devices
55 (2008) 163–174.
[22] D.F. Finchelstein, V. Sze, M.E. Sinangil, A.P. Chandrakasan, A 0.7-V 1.8-mW H.
264/AVC 720p video decoder, IEEE Journal of Solid State Circuits 44 (2009)
2943–2956.
[23] Y. Pu, J.P. de Gyvez, H. Corporaal, Y. Ha, An ultra-low-energy/frame multi-
standard JPEG co-processor in 65 nm CMOS with sub/near-threshold power Alexander Fish received the B.Sc. degree in Electrical
supply, in: Proceeding of IEEE International Solid-State Circuits Conference- Engineering from the Technion, Israel Institute of
Digest of Technical Papers (ISSCC), 147 a, 2009, pp. 146–147. Technology, Haifa, Israel, in 1999. He completed his
[24] B.H. Calhoun, A. Wang, A. Chandrakasan, Modeling and sizing for minimum M.Sc. in 2002 and his Ph.D. (summa cum laude) in
energy operation in subthreshold circuits, IEEE Journal of Solid-State Circuits 2006, respectively, at Ben-Gurion University in Israel.
40 (9) (2005) 1778–1786. He was a postdoctoral fellow in the ATIPS laboratory at
the University of Calgary (Canada) from 2006–2008. In
2008 he joined the Ben-Gurion University in Israel, as a
faculty member in the Electrical and Computer Engi-
Arkadiy Morgenshtein received the B.Sc. degree in neering Department. There he founded the Low Power
electrical engineering in 1999, M.Sc. in biomedical Circuits and Systems (LPC&S) laboratory, specializing in
engineering in 2003, MBA in 2006 and Ph.D in elec- low power circuits and systems. In July 2011 he was
trical engineering in 2008 from Technion, Israel Insti- appointed as a head of the VLSI Systems Center at BGU.
tute of Technology. In 2012 he joined the IBM Haifa In October 2012 Prof. Fish joined the Bar-Ilan University, Faculty of Engineering as
Research Lab. Prior to that he worked with Core CAD an Associate Professor and the head of the microelectronics track. Prof. Fish also
Technologies group at Intel Corporation, where he was leads new Energy Efficient Electronics and Applications Labs.
researching and developing tools for power optimiza- Prof. Fish's research interests include development of energy efficient “smart”
tion and estimation at various levels of VLSI design. He CMOS image sensors, ultra low power SRAM, DRAM and Flash memory arrays and
has been a Teaching and Research Assistant at Electrical energy efficient design techniques for low voltage digital and analog VLSI chips. He
Engineering Department, Technion since 1999, where has authored over 70 scientific papers in journals and conferences, including IEEE
he is currently an Adjunct Lecturer. Journal of Solid State Circuits, IEEE Transactions on Electron Devices, IEEE Transac-
Dr. Morgenshtein's research interests include low- tions on Circuits and Systems and many others. He also submitted 16 patent
power design techniques for digital circuits, optimization of on-chip interconnect, applications. Prof. Fish has published two book chapters. He was a co-author
CMOS sensors and EDA tools for power estimation and optimization. He has of papers that won the Best Paper Finalist awards at IEEE ISCAS and ICECS
authored over 40 scientific papers and patent applications. Dr. Morgenshtein co- conferences.
authored a paper that won the IEEE VLSI Transactions (TVLSI) Best Paper award Prof. Fish serves as an Editor in Chief for the MDPI Journal of Low Power
for 2012. He was honored by Technion President's award and Intel-Technion award Electronics and Applications (JLPEA) and as an Associate Editor for the IEEE Sensors
for excellence in study in 1998 and 2007. He supervised projects winning the award Journal. He also served as a chair of different tracks of various IEEE conferences. He
of Oz Moses Foundation by Intel in 2002 and best VLSI project in 2003 and 2005. was a co-organizer of many special sessions at IEEE conferences, including IEEE
Dr. Morgenshtein has served as associate editor for the Journal of Low Power ISCAS, IEEE Sensors and IEEEI conferences. Prof. Fish is a member of Sensory, VLSI
Electronics and Applications (JLPEA), as session chairman at ICECS'04 Conference Systems and Applications and Bio-medical Systems Technical Committees of IEEE
and as referee in multiple journals and conferences. Circuits and Systems Society.

Full Swing Gate Diffusion Input Logic - Case Study of Low Power CLA Adder Design

Diunggah oleh

Informasi Dokumen

Judul Asli

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Full Swing Gate Diffusion Input Logic - Case Study of Low Power CLA Adder Design

Diunggah oleh

Hak Cipta:

Format Tersedia

INTEGRATION, the VLSI journal 47 (2014) 62–70

Contents lists available at ScienceDirect

INTEGRATION, the VLSI journal

Full-Swing Gate Diffusion Input logic—Case-study of low-power

3.2. CLA implementation

Two GDI versions of a 16-bit CLA adder were implemented in

Unit MVT GDI Full-Swing (FS) GDI

Unit MVT GDI Full-Swing (FS) GDI

CMOS NAND cell vs. GDI F1 cell

F* 3.3. Leakage elimination in GDI

As was shown in [7], the unique structure of the GDI cell

Fig. 5. The general CLA architecture.

pg PG LCG CLA Last CLA Total

MVT GDI 8 8 8 16 24 438

Transistors count (Area [W min L min]).

values of 100 mV up to nominal values of 1.1 V. The designs were

4.1. Nominal voltage operation

The comparative results of performance, static power, energy

Since the sub-threshold leakage is still dominant, it is addressed Table 4

4. Comparative simulation results

The proposed 16-bit CLA circuits were designed in 40 nm TSMC

Sensitivity of Process Variations—In order to evaluate the sensi-

4.2. Low voltage operation

Driven by demand for ultra-low power dissipation, low voltage

Fig. 11. MEPs simulation.

Fig. 12. Propagation delay of CLA Adders at low supply voltages.

Table 5 designed and compared in 40 nm low power TSMC process.

Anda mungkin juga menyukai