Abstract—In Part II of this paper, a comparison of the most representative flip-flop (FF) classes and topologies in a 65-nm CMOS technology is carried out. The comparison, which is performed in the energy-delay-area domain, exploits the strategies and methodologies for FF analysis and design reported in Part I. In particular, the analysis accounts for the impact of leakage and layout parasitics on the optimization of the circuits. The tradeoffs between leakage, area, clock load, delay, and other interesting properties are extensively discussed. The investigation allows several considerations to be derived on each FF class and the best topologies to be identified for a targeted application.

Index Terms—Clocking, energy-delay tradeoff, energy efficiency, flip-flops (FFs), high speed, interconnects, leakage, logical effort, low power, nanometer technologies, VLSI.

Manuscript received September 23, 2009; revised December 21, 2009. First published March 25, 2010; current version published April 27, 2011.
M. Alioto is with the Dipartimento di Ingegneria dell’Informazione (DII), Università di Siena, 53100 Siena, Italy, and also with the Berkeley Wireless Research Center—Electrical Engineering and Computer Science Department, University of California, Berkeley, CA 94704-1302 USA (e-mail: malioto@dii.unisi.it; alioto@eecs.berkeley.edu).
E. Consoli and G. Palumbo are with the Dipartimento di Ingegneria Elettrica, Elettronica e dei Sistemi (DIEES), Università di Catania, I-95125 Catania, Italy (e-mail: econsoli@diees.unict.it; elio83@katamail.com; gpalumbo@diees.unict.it).
Digital Object Identifier 10.1109/TVLSI.2010.2041377
1063-8210/$26.00 © 2010 IEEE

I. INTRODUCTION
In Part II of this paper, the general framework for FF analysis and design discussed in Part I [1] is adopted to carry out a comparison among 19 flip-flop (FF) topologies in a 65-nm CMOS technology, which is the widest comparison presented in the literature. The circuits are selected among the most efficient and best-known ones and were presented at the end of Part I [1]. They are listed again in the following, together with their respective classes [see Fig. 6(a)–(s) in Part I].
1) Master-Slave (MS):
   a) Transmission-Gate FF (TGFF) [2];
   b) Write-Port Master Slave FF (WPMS) [3];
   c) Gated Master Slave FF (GMSL) [2];
   d) Data-transition lookahead FF (DTLA) [4].
2) Pulsed, both implicit (IP) and explicit (EP):
   e) Hybrid Latch-FF (HLFF) [5];
   f) Semi-Dynamic FF (SDFF) [6];
   g) UltraSPARC Semi-Dynamic FF (USDFF) [7];
   h) Implicitly Push-Pull FF (IPPFF) [8];
   i) Conditional Precharge FF (CPFF) [9];
   j) Static Explicit Pulsed FF (SEPFF) [10];
   k) Transmission-Gate Pulsed Latch (TGPL) [11].
3) Differential:
   l) Modified Sense-Amplifier FF (MSAFF) [12];
   m) Skew-Tolerant FF (STFF) [13];
   n) Conditional Capture FF (CCFF) [14];
   o) Variable Sampling Window FF (VSWFF) [15].
4) Dual-edge-triggered (DET):
   p) Transmission-gate Latch-Mux (DET-TGLM) [16];
   q) Symmetric Pulse Generator FF (DET-SPGFF) [17];
   r) Static Pulsed Latch (DET-SPL) [18];
   s) Conditional Discharge FF (DET-CDFF) [19].

To derive the ranking among the various FFs, the energy efficiency is evaluated by extracting their energy-efficient curves (EEC) [20], [21], under the conditions described in Part I. At the same time, the most significant points of the EEC are associated with the minimization of the energy-delay products $E^iD^j$ with proper exponents. Hence, results are very general and have a clear physical meaning, since they are linked to figures of merit (FOMs) that designers are familiar with.
The impact of leakage on FF energy in standby and active mode is discussed, and its influence on FF design is highlighted. The tradeoff between leakage, area, clock load, and delay is analyzed. Several additional FF features are also considered, such as the load on the clock network, the layout efficiency, and the leakage-area interdependence.
This paper is structured as follows. In Section II, the normalization of results to technology is shown. The energy-efficiency potentials of the considered FFs and the various tradeoffs are discussed in Sections III and IV. The following three sections deal with leakage (see Section V), area (see Section VI), and clock load (see Section VII). Comparison among the considered FFs, tradeoffs with delay, and several other interesting properties are reported in Sections V–VII. Finally, the conclusions are in Section VIII.

II. NORMALIZATION TO TECHNOLOGY
To gain an intuitive understanding of results independently of technology, the various quantities and data are properly normalized to reference technology values. In particular, we have the following:
• capacitances are normalized to that of a symmetrical minimum inverter, $C_{\min}$;
• delays are normalized to the FO4 delay [24];
• energies are normalized to the energy dissipated by an unloaded symmetrical minimum inverter during a complete transition cycle at its output;
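As a minimal sketch of the normalization to technology described in Section II, the snippet below divides raw capacitance, delay, and energy values by reference quantities of the technology. The class and function names, the reference numbers, and the raw design values are placeholders chosen only for illustration; they are not the 65-nm values of Table I.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechReference:
    """Reference quantities of a technology (placeholder values, not Table I)."""
    c_min_fF: float  # capacitance of a symmetrical minimum inverter
    fo4_ps: float    # fan-out-of-4 inverter delay
    e_min_fJ: float  # energy of an unloaded minimum inverter per full output cycle


def normalize(cap_fF: float, delay_ps: float, energy_fJ: float,
              ref: TechReference) -> dict:
    """Return dimensionless capacitance, delay, and energy."""
    return {
        "C/Cmin": cap_fF / ref.c_min_fF,
        "D/FO4": delay_ps / ref.fo4_ps,
        "E/Emin": energy_fJ / ref.e_min_fJ,
    }


# Hypothetical numbers, for illustration only.
ref = TechReference(c_min_fF=1.0, fo4_ps=20.0, e_min_fJ=2.0)
result = normalize(cap_fF=16.0, delay_ps=50.0, energy_fJ=30.0, ref=ref)
print(result)
```

Working with dimensionless ratios of this kind is what allows results obtained in one technology to be read independently of its absolute capacitance, delay, and energy scales.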
738 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 5, MAY 2011
TABLE I
65-nm CMOS TECHNOLOGY: MAIN PARAMETERS
Fig. 5. EECs of differential FFs: $C_L = 16C_{\min}$, $\alpha = 0.25$, $T/\mathrm{FO4} = 40$.

C. Single-Edge-Triggered Differential FFs
The EECs of the SET Differential FFs in the reference case are reported in Fig. 5. From Fig. 5, the space is split into two regions: the high-speed one (from to FOMs), where the STFF is the most energy-efficient, and the low-energy one (from to FOMs), where the MSAFF is the best Differential FF. In particular, STFF is the fastest among all the analyzed FFs. For instance, the of TGPL is 1.1× greater than the STFF,

D. Dual-Edge-Triggered FFs
We place the selected DET topologies in a single class even if they have quite different basic operations and features. The EECs of the DET FFs, derived in the reference case, are reported in Fig. 7. As for the Differential class, two topologies emerge as the most energy-efficient ones: DET-SPL in the high-speed region (from to FOMs) and the DET-TGLM in the low-energy one (from to FOMs). In particular, DET-TGLM (which has an MS structure) dissipates the least energy among all the analyzed FFs. On average, DET-SPGFF, DET-SPL, and DET-CDFF dissipate 1.7×, 2.4×, and 2.1× more energy than the DET-TGLM. This is due to the combination of the DET and MS features, both of which contribute to reducing energy consumption.
ALIOTO et al.: ANALYSIS AND COMPARISON IN THE ENERGY-DELAY-AREA DOMAIN OF NANOMETER CMOS FLIP-FLOPS 741
Fig. 6. EECs of differential FFs: (a) $\alpha = 0.1$ and (b) $\alpha = 0.5$ ($C_L = 16C_{\min}$, $T/\mathrm{FO4} = 40$).
Fig. 8. EECs of DET FFs: (a) $C_L = 64C_{\min}$ and (b) $C_L = 4C_{\min}$ ($\alpha = 0.25$, $T/\mathrm{FO4} = 40$).
Fig. 9. EECs of DET FFs: (a) $\alpha = 0.1$ and (b) $\alpha = 0.5$ ($C_L = 16C_{\min}$, $T/\mathrm{FO4} = 40$).

A. Metrics
In Fig. 10(a)–(e), we report the FOMs , , , , and of all FFs, normalized to the best topology (again, we consider the reference case). This allows general conclusions to be drawn on the comparison of the analyzed FF classes. It is apparent that Pulsed topologies are the most energy efficient in the high-speed region ( and FOMs), and the EP FFs in particular are more energy-efficient than the IP FFs in such a region. Since pulsed FFs are employed in real high-speed applications (e.g., Intel microprocessors), EP topologies can be considered the best choice in such a case.
The superiority of EP over IP FFs is explained by considering that, in nanometer technologies, IP FFs suffer from a complex

Fig. 10. (a) D ; (b) ED ; (c) ED ; (d) E D ; (e) E normalized FOMs: $C_L = 16C_{\min}$, $\alpha = 0.25$, $T/\mathrm{FO4} = 40$.
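The normalization "to the best topology" used for the FOMs can be sketched as follows: for a figure of merit of the energy-delay product form, each topology's value is divided by the smallest value in the set, so the best topology scores 1. The topology names, the (energy, delay) pairs, and the exponent pairs in the usage lines are made-up placeholders, not data from the paper.

```python
def fom(energy: float, delay: float, i: int, j: int) -> float:
    """Energy-delay figure of merit E^i * D^j."""
    return energy**i * delay**j


def normalized_to_best(designs: dict, i: int, j: int) -> dict:
    """Divide each topology's FOM by the minimum FOM in the set."""
    values = {name: fom(e, d, i, j) for name, (e, d) in designs.items()}
    best = min(values.values())
    return {name: v / best for name, v in values.items()}


# Placeholder (energy, delay) pairs in normalized units; not measured data.
designs = {"FF_A": (10.0, 2.0), "FF_B": (8.0, 3.0), "FF_C": (20.0, 1.5)}

ed = normalized_to_best(designs, i=1, j=1)   # ED product
ed2 = normalized_to_best(designs, i=1, j=2)  # ED^2 product
print(ed)
print(ed2)
```

Changing the exponents shifts the weight between energy and delay, which is why a topology that is best for one FOM need not be best for another.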
TABLE III
OPTIMUM SIZING VARIATION FOR $T/\mathrm{FO4} = 10$ AND $T/\mathrm{FO4} = 80$
TABLE IV
AVERAGE LEAKAGE UNDER VARIOUS OPTIMUM SIZINGS (LEAKAGE NORMALIZED TO THE MINIMUM IS REPORTED IN BRACKETS) AND AVERAGE (AMONG THE FOMS) RATIO BETWEEN AVERAGE AND MINIMUM LEAKAGE
Fig. 12. Ideal EEC extracted selecting the most energy-efficient FFs and minimum-E D designs (layout parasitics not included).
TABLE II
OPTIMUM TGFF SIZING FOR $T/\mathrm{FO4} = 10$ AND $T/\mathrm{FO4} = 80$
(1)
TABLE V
ABSOLUTE AREA UNDER VARIOUS OPTIMUM SIZINGS (AREA NORMALIZED TO THE MINIMUM IS REPORTED IN BRACKETS) AND LAYOUT EFFICIENCY UNDER VARIOUS OPTIMUM SIZINGS
Fig. 16. Clock load degradation (normalization to E sizing). Optimization in active mode: $C_L = 16C_{\min}$, $\alpha = 0.25$, $T/\mathrm{FO4} = 40$.

which are relatively low compared to other classes (because of the presence of the PG, as explained above). IP FFs (except DET-SPGFF), MS, MSAFF, and STFF show clock load increments up to 3.5×. Conditional Differential FFs (CCFF and VSWFF) reach a nearly 4× clock load increase. For the previously mentioned reasons, DET-SPGFF exhibits the greatest increase (up to 5.5×).

B. Impact of Layout Parasitics on Clock Load
In the analysis reported in Section VII-A, the clock load includes the contribution of layout parasitics. In this section, we evaluate in detail the fraction of clock load due to these parasitics. To this aim, we consider all the , , and conditions and all the minimum designs (i.e., 216 different sizings). In Table VI (5th column), we report the average percentage ratio between the clock load fraction due to layout parasitics and the total clock load .
From Table VI (5th column), the layout parasitics are a sizeable fraction of the overall clock load, which typically is in the 40%–60% range (i.e., the layout parasitics can even account for most of the clock load in a FF). This confirms that layout parasitics must necessarily be taken into account to fairly compare FFs, although they were neglected in previous papers.
Globally, MS FFs (except GMSL) have the most complex clock (and complementary clock) routing paths, showing an impact of clock wires higher than 50%. Exceptions to this trend are DET-SPL and DET-CDFF, where the clock terminal is not decoupled from some internal transistors (because of turned-on transmission gates) and hence the clock wires have a minor impact on .

C. Joint Flip-Flops and Clock Distribution Energy Dissipation
According to the traditional approach adopted in the literature, up to this point the FF comparison has been carried out by considering the dissipation related to the FFs alone. However, as mentioned in Section VII-A, the dissipation of clock buffers in the clock domains is directly connected with the clock load. Therefore, we have to further investigate this aspect and in case

(2)

where both dynamic and static energy are taken into account, is the capacitance of the first buffer stage (in the following ) and is the number of buffer stages.
Unlike in [35], given that the clock wires distributing the clock signal throughout the domain produce an increment of that is independent of FF features, they are neglected in the following analysis. Hence, is simply equal to , with $M$ being the number of FFs within the clock domain (in the following, $M = 128$).
In Table VII, we report the energy (with respect to the energy due to the FFs alone, that is ) due to clock buffers driving the FFs clock load in the case of clock slope and for the optimum value (i.e., is the optimum tapering factor leading to the minimum clocking energy in the clock domain). The values of are also reported in Table VII (see [35] for a detailed discussion on how these values are derived, given the FF features), and the considered sizings are minimum , , and (in the reference case).
By inspection of Table VII, it is apparent that, except for the MS FFs, when considering a steep clock waveform (i.e., ), the energy increment typically is 10%–30% of the FFs energy, nearly regardless of FF sizing (actually, it slightly diminishes for low-energy designs), and EP FFs are again rewarded. However, the previously reported rankings change in the low-energy region because of the behavior of MS FFs, which, due to their low intrinsic energy and their high clock load, see a very high energy increment (up to 100% for the DET-TGLM) due to .
When the optimum clock slope is used to minimize the overall clocking energy, it can be easily seen that the energy increments become much more similar for all the FF topologies, and they become equal to 30%–35% for MS FFs. Moreover, the energy increments significantly diminish for low-energy sizings. Hence, by also considering that FF speed is not practically degraded by assuming a smoother slope up to [35], the previously reported rankings in the energy-delay space do not change significantly when adopting the optimum clock slope. Nevertheless, one should emphasize that, although they remain the most suitable circuits for very low-energy applications, the inclusion of worsens the performance of MS FFs (e.g., for minimum ).
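To give a feel for why the clock-buffer overhead is tied to the FF clock load and to the tapering factor, the following is a rough first-order sketch. It is not the paper's expression (2): it is a dynamic-only model that assumes a geometrically tapered chain, full-swing switching at every node, and no self-loading or static energy, so that the buffer-to-load energy ratio reduces to a ratio of switched capacitances. The function name and all numeric values except the 128 FFs per clock domain are hypothetical.

```python
def tapered_buffer_overhead(m_ffs: int, c_ck: float, c_first: float,
                            n_stages: int) -> tuple:
    """Return (tapering factor, buffer-to-load switched-capacitance ratio).

    Simplified dynamic-only model: stage k (k = 0..n_stages-1) has input
    capacitance c_first * t**k, and the last stage drives m_ffs * c_ck.
    Self-loading and static energy are ignored.
    """
    c_load = m_ffs * c_ck                        # total FF clock load
    t = (c_load / c_first) ** (1.0 / n_stages)   # geometric tapering factor
    c_buffers = sum(c_first * t**k for k in range(n_stages))
    return t, c_buffers / c_load


# Placeholder capacitances in units of the minimum-inverter capacitance.
t, overhead = tapered_buffer_overhead(m_ffs=128, c_ck=8.0,
                                      c_first=16.0, n_stages=3)
print(f"tapering factor = {t:.2f}, buffer overhead = {100 * overhead:.1f}%")
```

Under these assumptions, the overhead relative to the load depends mainly on the tapering factor, while the absolute buffer energy scales with the FF clock load; this is consistent with the observation that topologies with low energy and high clock load are penalized most when buffers are included.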
TABLE VII
PERCENTAGE ENERGY INCREMENT DUE TO A CLOCK TAPERED BUFFER DRIVING M = 128 FFs (NORMALIZED TO FFs ENERGY, ME)
VIII. CONCLUSION
In this paper, an exhaustive comparison of a large number of FFs (19 topologies belonging to four different classes) in nanometer (65-nm) CMOS technology has been carried out, unlike the other most relevant analyses reported in the literature, which have so far adopted technologies up to 0.13 μm. The comparison has been performed in the whole energy-delay-area design space. The impact of layout parasitics has been included in the transistor-level design phase. The contribution of leakage has been considered in both standby and active mode, weighting it according to the logic depth in the active case. Wide loading and switching activity conditions have been explored, and other properties (e.g., the clock load) have been analyzed in detail.
In contrast to previous papers, figures of merit that designers are familiar with have been considered to gain an insight into the considered tradeoffs in a wide range of applications. The analysis showed that the results are different from previous papers because, here, the layout parasitics have been explicitly included from the beginning and a much wider range of topologies has been considered.
According to the presented results, the fastest topology is the STFF, the best low-energy FFs are the DET-TGLM and TGFF, whereas the most energy-efficient throughout a wide region of the energy-delay design space is the TGPL. Moreover, the best topologies within each of the main FF classes (MS, implicit-explicit pulsed, differential, and dual-edge-triggered) have been identified as well.
For the first time, the layout efficiency of FFs has been analyzed. In particular, HLFF, MSAFF, and TGFF exhibit a very efficient area-delay tradeoff. Moreover, it has been shown that area is almost proportional to leakage regardless of the FF topology and the transistor sizing.
The differences between the leakage-delay and the more general energy-delay tradeoff have been pointed out. It has also been shown that leakage has a significant impact on the optimum transistor sizing, especially for MS FFs. The clock load seen from the clock terminal of a FF, and the related dissipation of the clock distribution network, have also been analyzed. Results showed that the clock load is severely impacted by layout parasitics, and that explicit pulsed FFs have a small clock load thanks to the decoupling effect brought by the pulse generator. It is also shown that, by including the impact of local clock distribution buffers, whose dissipation is directly related to the FFs clock load, the rankings of FFs in the E-D space do not change significantly, except for the MS class, which is somewhat penalized.
As a general remark, simpler basic structures are rewarded in nanometer technologies because of the strong impact of layout parasitics. In particular, explicit pulsed topologies, and specifically the TGPL, have been recognized as the most efficient FF topologies in a very wide range of applications from many points of view.

REFERENCES
[1] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodologies and design strategies,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[2] D. Markovic, B. Nikolic, and R. Brodersen, “Analysis and design of low-energy flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 52–55.
[3] D. Markovic, J. Tschanz, and V. De, “Transmission-gate based flip-flop,” U.S. Patent 6 642 765, Nov. 4, 2003.
[4] M. Nogawa and Y. Ohtomo, “A data-transition look-ahead DFF circuit for statistical reduction in power consumption,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 702–706, May 1998.
[5] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, “Flow-through latch and edge-triggered flip-flop hybrid elements,” in Proc. IEEE Int. Solid-State Circuit Conf., Feb. 1996, pp. 138–139.
[6] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, “A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–716, May 1999.
[7] R. Heald, K. Aingaran, C. Amir, M. Ang, M. Boland, P. Dixit, G. Gouldsberry, D. Greenley, J. Grinberg, J. Hart, T. Horel, W. Hsu, J. Kaku, C. Kim, S. Kim, F. Klass, H. Kwan, G. Lauterbach, R. Lo, H. McIntyre, A. Mehta, D. Murata, S. Nguyen, Y. Pai, S. Patel, K. Shin, K. Tam, S. Vishwanthaiah, J. Wu, G. Yee, and E. You, “A third generation SPARC V9 64-b microprocessor,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1526–1538, Nov. 2000.
[8] N. Nedovic, “Clocked storage elements for high-performance applications,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. California, Davis, 2003.
[9] N. Nedovic, M. Aleksic, and V. Oklobdzija, “Conditional techniques for low power consumption flip-flops,” in Proc. IEEE Int. Conf. Electron., Circuits Syst., Feb./May 2001, vol. 2, pp. 803–806.
[10] P. Zhao, T. Darwish, and M. Bayoumi, “Low power and high speed explicit-pulsed flip-flops,” in Proc. IEEE Midw. Symp. Circuits Syst., Aug. 2002, pp. 477–480.
[11] S. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
[12] B. Nikolic, V. Stojanovic, V. Oklobdzija, W. Jia, J. Chiu, and M. Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[13] N. Nedovic, V. Oklobdzija, and W. Walker, “A clock skew absorbing flip-flop,” in Proc. IEEE Int. Solid-State Circuit Conf., Feb. 2003, pp. 342–344.
[14] B. Kong, S. Kim, and Y. Jun, “Conditional-capture flip-flop for statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
[15] S. Shin and B. Kong, “Variable sampling window flip-flops for low-power high-speed VLSI,” IEE Proc. IEE Circuits, Devices Syst., vol. 152, no. 3, pp. 266–271, Jun. 2005.
[16] R. Llopis and M. Sachdev, “Low power, testable dual edge triggered flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 1996, pp. 341–345.
[17] N. Nedovic, W. Walker, V. Oklobdzija, and M. Aleksic, “A low power symmetrically pulsed dual edge-triggered flip-flop,” in Proc. IEEE Eur. Solid-State Circuits Conf., Sep. 2002, pp. 399–402.
[18] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, “Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 147–152.
[19] P. Zhao, T. Darwish, and M. Bayoumi, “High-performance and low-power conditional discharge flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004.
[20] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to design nanometer flip-flops in the energy-delay space,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.
[21] C. Giacomotto, N. Nedovic, and V. Oklobdzija, “The effect of the system specification on the optimal selection of clocked storage elements,” IEEE J. Solid-State Circuits, vol. 42, no. 6, pp. 1392–1404, Jun. 2007.
[22] S. Heo and K. Asanovic, “Load-sensitive flip-flop characterization,” in Proc. IEEE Comput. Soc. Workshop VLSI, Apr. 2001, pp. 87–92.
[23] S. Heo, R. Krashinsky, and K. Asanovic, “Activity-sensitive flip-flop and latch selection for reduced energy,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp. 1060–1064, Sep. 2007.
[24] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 3rd ed. Boston, MA: Addison Wesley, 2004.
[25] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, Digital System Clocking: High-Performance and Low-Power Aspects. New York: Wiley-IEEE Press, 2003.
[26] H. Partovi, “Clocked storage elements,” in Design of High-Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2001, pp. 207–234.
[27] V. Stojanovic and V. Oklobdzija, “Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
[28] N. Nedovic and V. Oklobdzija, “Dual-edge triggered storage elements and clocking strategy for low-power systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 5, pp. 577–590, May 2005.
[29] S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies. New York: Springer, 2006.
[30] A. Abdollahi, F. Fallah, and M. Pedram, “Leakage current reduction in CMOS VLSI circuits by input vector control,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 2, pp. 140–154, Feb. 2004.
[31] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
[32] M. Agostinelli, M. Alioto, D. Esseni, and L. Selmi, “Leakage-delay tradeoff in FinFET logic circuits: A comparative analysis with bulk technology,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., to be published.
[33] P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, “High-performance microprocessor design,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 676–686, May 1998.
[34] D. Bailey and B. Benschneider, “Clocking design and analysis for a 600-MHz Alpha microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627–1633, Nov. 1998.
[35] M. Alioto, E. Consoli, and G. Palumbo, “Flip-flop energy/performance versus clock slope and impact on the clock network design,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.

Photographs and biographies of all authors are available in M. Alioto et al. [1].