
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 19, NO. 5, MAY 2011 737

Analysis and Comparison in the Energy-Delay-Area
Domain of Nanometer CMOS Flip-Flops:
Part II—Results and Figures of Merit
Massimo Alioto, Senior Member, IEEE, Elio Consoli, and Gaetano Palumbo, Fellow, IEEE

Abstract: In Part II of this paper, a comparison of the most representative flip-flop (FF) classes and topologies in a 65-nm CMOS technology is carried out. The comparison, which is performed in the energy-delay-area domain, exploits the strategies and methodologies for FF analysis and design reported in Part I. In particular, the analysis accounts for the impact of leakage and layout parasitics on the optimization of the circuits. The tradeoffs between leakage, area, clock load, delay, and other interesting properties are extensively discussed. The investigation allows several considerations to be derived on each FF class and the best topologies to be identified for a targeted application.

Index Terms: Clocking, energy-delay tradeoff, energy efficiency, flip-flops (FFs), high speed, interconnects, leakage, logical effort, low power, nanometer technologies, VLSI.

Manuscript received September 23, 2009; revised December 21, 2009. First published March 25, 2010; current version published April 27, 2011.
M. Alioto is with the Dipartimento di Ingegneria dell’Informazione (DII), Università di Siena, 53100 Siena, Italy, and also with the Berkeley Wireless Research Center, Electrical Engineering and Computer Science Department, University of California, Berkeley, CA 94704-1302 USA (e-mail: malioto@dii.unisi.it; alioto@eecs.berkeley.edu).
E. Consoli and G. Palumbo are with the Dipartimento di Ingegneria Elettrica, Elettronica e dei Sistemi (DIEES), Università di Catania, I-95125 Catania, Italy (e-mail: econsoli@diees.unict.it; elio83@katamail.com; gpalumbo@diees.unict.it).
Digital Object Identifier 10.1109/TVLSI.2010.2041377
1063-8210/$26.00 © 2010 IEEE

I. INTRODUCTION
In Part II of this paper, the general framework for FF analysis and design discussed in Part I [1] is adopted to carry out a comparison among 19 flip-flop (FF) topologies in a 65-nm CMOS technology, which is the widest comparison presented in the literature. The circuits are selected among the most efficient and best-known ones and were presented at the end of Part I [1]. They are listed again below, together with their respective classes [see Fig. 6(a)–(s) in Part I].
1) Master-Slave (MS):
a) Transmission-Gate FF (TGFF) [2];
b) Write-Port Master Slave FF (WPMS) [3];
c) Gated Master Slave FF (GMSL) [2];
d) Data-transition lookahead FF (DTLA) [4].
2) Pulsed, both implicit (IP) and explicit (EP):
e) Hybrid Latch-FF (HLFF) [5];
f) Semi-Dynamic FF (SDFF) [6];
g) UltraSPARC Semi-Dynamic FF (USDFF) [7];
h) Implicitly Push-Pull FF (IPPFF) [8];
i) Conditional Precharge FF (CPFF) [9];
j) Static Explicit Pulsed FF (SEPFF) [10];
k) Transmission-Gate Pulsed Latch (TGPL) [11].
3) Differential:
l) Modified Sense-Amplifier FF (MSAFF) [12];
m) Skew-Tolerant FF (STFF) [13];
n) Conditional Capture FF (CCFF) [14];
o) Variable Sampling Window FF (VSWFF) [15].
4) Dual-edge-triggered (DET):
p) Transmission-gate Latch-Mux (DET-TGLM) [16];
q) Symmetric Pulse Generator FF (DET-SPGFF) [17];
r) Static Pulsed Latch (DET-SPL) [18];
s) Conditional Discharge FF (DET-CDFF) [19].

To derive the ranking among the various FFs, the energy efficiency is evaluated by extracting their energy-efficient curves (EEC) [20], [21], under the conditions described in Part I. At the same time, the most significant points of the EEC are associated with the minimization of the energy-delay products with proper exponents. Hence, results are very general and have a clear physical meaning, since they are linked to figures of merit (FOMs) that designers are familiar with.
The impact of leakage on FF energy in standby and active mode is discussed, and its influence on FF design is highlighted. The tradeoff between leakage, area, clock load, and delay is analyzed. Several additional FF features are also considered, like the load on the clock network, the layout efficiency, and the leakage-area interdependence.
This paper is structured as follows. In Section II, the normalization of results to technology is shown. The energy-efficiency potentials of the considered FFs and the various tradeoffs are discussed in Sections III and IV. The following three sections deal with leakage (see Section V), area (see Section VI), and clock load (see Section VII). Comparison among the considered FFs, tradeoffs with delay and several other interesting properties are reported in Sections V–VII. Finally, the conclusions are in Section VIII.

II. NORMALIZATION TO TECHNOLOGY
To gain an intuitive understanding of results independently of technology, the various quantities and data are properly normalized to reference technology values. In particular, we have the following:
• capacitances are normalized to that of a symmetrical minimum inverter;
• delays are normalized to the FO4 delay [24];
• energies are normalized to the energy dissipated by an unloaded symmetrical minimum inverter during a complete transition cycle at its output;

TABLE I
65-nm CMOS TECHNOLOGY: MAIN PARAMETERS

Fig. 1. EECs of MS FFs: load of 16× the minimum-inverter capacitance, switching activity 0.25, T/FO4 = 40.

• leakage currents are normalized to the average leakage current of a symmetrical minimum inverter;
• areas are normalized to a unit area derived from the minimum pitch of the Metal2 layer.
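
The normalizations listed in this section amount to dividing each raw quantity by the corresponding reference value of the technology. A minimal Python sketch follows; the class and function names, the units, and all numeric reference values are hypothetical placeholders rather than the Table I entries, and area is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechReference:
    """Reference values of one technology (placeholder numbers, not Table I)."""
    c_min_fF: float       # input capacitance of a symmetrical minimum inverter
    fo4_ps: float         # fanout-of-4 inverter delay
    e_min_fJ: float       # energy of an unloaded minimum inverter per full output cycle
    i_leak_min_nA: float  # average leakage current of a minimum inverter


def normalize(raw: dict, ref: TechReference) -> dict:
    """Divide each raw quantity by its technology reference value."""
    return {
        "C": raw["C_fF"] / ref.c_min_fF,
        "D": raw["D_ps"] / ref.fo4_ps,
        "E": raw["E_fJ"] / ref.e_min_fJ,
        "I_leak": raw["I_leak_nA"] / ref.i_leak_min_nA,
    }


# Hypothetical reference values and one hypothetical FF measurement.
ref = TechReference(c_min_fF=1.0, fo4_ps=20.0, e_min_fJ=2.0, i_leak_min_nA=5.0)
raw = {"C_fF": 16.0, "D_ps": 60.0, "E_fJ": 30.0, "I_leak_nA": 50.0}
print(normalize(raw, ref))
```

Working with dimensionless values of this kind is what lets results from different technologies be compared on the same axes.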
All the normalization parameters, together with the minimum feasible channel width and length, are reported in Table I for the considered 65-nm CMOS technology. For all the analyses, a 1 V supply voltage is adopted. Moreover, the capacitances per unit length of poly, metal1 and metal2 wires, normalized to the capacitance per unit width of a minimum channel length transistor, are also reported in Table I. It is worth noting that such capacitances are extracted from the design kit by including the fringing contribution due to coupled lines between stacked buses, which is usually reported as the worst-case interconnect capacitance (as opposed to the best-case value obtained for an isolated interconnect above a ground plane) [20]. Inclusion of the fringing contribution is very important to correctly estimate layout parasitics, as the worst-case value can be typically three times the best-case value [20].

III. E-D TRADEOFF IN EACH FF CLASS
In this section, we discuss the tradeoff between the energy and the data-to-output delay (see Part I for their definition) by comparing the EECs for various FF classes. The curves are derived under different load and switching activity conditions for T/FO4 equal to 10 (high-speed applications), 40 (energy-efficient applications), and 80 (low-energy applications). Since the ranking of topologies does not change significantly with T/FO4, the results for T/FO4 = 40 are presented.
In the rest of the paper, we assume a load capacitance of 16 times the minimum-inverter capacitance and a switching activity of 0.25 as the “reference case”.

A. Single-Edge-Triggered Master-Slave FFs
The EECs of the SET MS FFs, derived in the reference case, are reported in Fig. 1. Unlike the results in [21], where the performances of TGFF and WPMS are similar in the high-speed and minimum [...] region, from Fig. 1 we find that TGFF is more energy-efficient than WPMS in all regions (WPMS has minimum delay, energy-delay product, and minimum energy worse than the TGFF by a factor close to 1.5, and other FOMs are even worse).
This is partly due to the adoption of NMOS pass-transistors (versus TGFF transmission gates) and partially non-gated keepers (versus TGFF full-gated keepers), but also to the impact of the longer internal wires needed (WPMS area is 1.3–1.5× greater than TGFF area in the various considered sizings).
Clock-gated FFs (GMSL and DTLA) exhibit the worst performance throughout the E-D space. Their high latency is due to the high number of stages involved in the paths, because of the additional gating logic with respect to the basic MS topologies. In regard to energy consumption, in principle clock-gated FFs should have a low dissipation for low switching activity [1], [4], [25]. Actually, this holds given that GMSL (DTLA) nearly achieves 700% (200%) clock-related energy savings when working in gating rather than in non-gating condition. However, from an absolute point of view (i.e., when comparing to other FFs), the [...] of GMSL and DTLA are about 1.2× (1.8×) and 3.0× (3.0×) greater than TGFF for a switching activity of 0.1 (0.25). Again, this is due to the strong impact of layout parasitics that degrade the performance of clock-gated FFs, since they have a very complex layout (see layouts in Part I).
The ranking of the analyzed MS FFs does not change for different load and switching activity values, and the TGFF largely remains the most energy-efficient MS FF in all the E-D space. For this reason, additional figures relative to this FF class are not reported for the sake of brevity.

B. Single-Edge-Triggered Implicit-Explicit Pulsed FFs
The EECs of the SET IP-EP FFs, derived in the reference case, are reported in Fig. 2. From Fig. 2, the TGPL is clearly the most energy-efficient SET Pulsed FF in the high-speed region and in the low-energy one up to the [...] FOM. This was expected from the simplicity of the basic latch structure of TGPL (and hence the low impact of layout parasitics) [26]. This good energy efficiency of TGPL is remarkable since here every latch has its own pulse generator (PG), but actually energy may be further reduced by sharing the PG among various latches. From Fig. 2, in the deep low-energy region (minimum [...] and [...] FOMs), the CPFF and IPPFF are the best SET Pulsed FFs. Indeed, both are Implicit Pulsed and hence do not require a PG. In addition, the CPFF employs a conditional technique to avoid unnecessary precharge [9], while the IPPFF reduces the load on
ALIOTO et al.: ANALYSIS AND COMPARISON IN THE ENERGY-DELAY-AREA DOMAIN OF NANOMETER CMOS FLIP-FLOPS 739

the precharged internal node by using a push-pull second stage. CPFF and HLFF also exhibit the best speed among SET IP FFs.

Fig. 2. EECs of IP-EP FFs: load of 16× the minimum-inverter capacitance, switching activity 0.25, T/FO4 = 40.

SEPFF is fast, but dissipates more than TGPL in all conditions and hence is less energy-efficient. Its delay is also nearly 1.2× greater than that of TGPL in the various conditions. This is somewhat different from previous works [18], which predicted the same speed for an average load (like 16 times the minimum-inverter capacitance). Again, this is due to the heavier parasitic delay associated with interconnects, since SEPFF apparently has a slightly more complex layout compared to TGPL (see layouts in Part I).
Among all the SET Pulsed FFs, the semi-dynamic ones (SDFF and USDFF) exhibit the worst speed in the whole E-D space. The reason is again related to the layout complexity, as is apparent from the comparison of the layouts in Part I. In contrast with [6], [18], and [27], where it is stated that such FFs have features very similar to the HLFF, we find that the latter one is significantly more energy-efficient throughout the whole E-D space (except in the very high-speed region where they are similar). Indeed, HLFF has a much simpler schematic and hence its layout has much shorter interconnects, thus reducing energy consumption.
Moreover, in contrast to previous results [21], USDFF does not outperform SDFF, again because of its more complex routing, although this can be only partly inferred from the inspection of the layouts reported in Part I of the paper, which are relative to a single sizing strategy. Given the mirror-like structure of the two circuits, the local wire capacitances can be compared by averaging out the results for all the different nodes and for all the different sizing strategies considered in this work. On average, we find that local wire parasitics are nearly 60% larger for USDFF than SDFF.
All SET IP FFs are slower than EP FFs. In particular, by averaging out the delays corresponding to the various optimized FOMs, IP FF delays are nearly 1.3× greater than for EP FFs. This happens because IP FFs need stages with three stacked transistors in their critical path, whereas EP FFs exploit a real pulsed signal and need stages with two stacked transistors. In particular, IPPFF has the worst minimum delay among IP FFs, since it exhibits three- and four-stage paths for the rising and falling data transitions and this effect overcomes the advantages given by the push-pull stage [21].

Fig. 3. EECs of IP-EP FFs: (a) load of 64× and (b) load of 4× the minimum-inverter capacitance (switching activity 0.25, T/FO4 = 40).

To understand the dependence of the above results on the load, the EECs of Pulsed FFs for loads of 64 and 4 times the minimum-inverter capacitance are reported in Fig. 3(a) and (b) (in both cases with a switching activity of 0.25 and T/FO4 = 40). The ranking of IP FFs does not change significantly, except for IPPFF that, having a greater number of stages in its paths, becomes relatively faster for a large load, as is obvious from logical effort theory.
As concerns EP FFs, unlike [22], where the speed of a two-stage FF (TGPL) is overcome by that of a three-stage topology (SEPFF) when the load is large enough, the SEPFF still shows a 1.1× (1.3×) delay increment even for [...]. When the load is small (4×), TGPL is the most energy-efficient Pulsed FF up to the [...] FOM, whereas it is dominant “only” up to the [...] FOM for large load (64×).
To understand the effect of switching activity on Pulsed FFs, their EECs for switching activities of 0.1 and 0.5 are reported in Fig. 4(a) and (b) (in both cases with a load of 16 times the minimum-inverter capacitance and T/FO4 = 40). The main changes occur in the low-energy region, where the CPFF becomes more energy efficient for a switching activity of 0.1 from [...] on, since it takes advantage of the conditional precharge. Conversely, for a switching activity of 0.5, the IPPFF becomes the most energy-efficient Pulsed FF in the low-energy region, whereas the CPFF and the SEPFF (both exhibiting pseudo-static first stages) suffer from a considerable increase in their dissipation due to the high data activity rate.
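
The remark above that a FF with more stages becomes relatively faster at large load can be illustrated with the standard logical-effort estimate of the minimum delay of an N-stage path, N·F^(1/N) + P, where F is the path effort and P the total parasitic delay. The Python sketch below is an idealized illustration only: the function name is hypothetical, the path effort is taken equal to the load-to-input capacitance ratio (logical and branching efforts of 1), each stage is assigned a parasitic delay of 1, and wire parasitics are ignored.

```python
def min_path_delay(n_stages: int, path_effort: float, parasitic: float) -> float:
    """Minimum delay of an N-stage path, in units of the basic delay tau.

    Uses the logical-effort result D = N * F**(1/N) + P, which holds when
    every stage bears the same effort F**(1/N).
    """
    return n_stages * path_effort ** (1.0 / n_stages) + parasitic


# Hypothetical comparison of a two-stage and a three-stage path,
# with one unit of parasitic delay per stage.
for load_ratio in (4.0, 16.0, 64.0):
    d2 = min_path_delay(2, load_ratio, parasitic=2.0)
    d3 = min_path_delay(3, load_ratio, parasitic=3.0)
    print(f"load ratio {load_ratio:5.1f}: 2 stages {d2:6.2f}, 3 stages {d3:6.2f}")
```

Under these assumptions the two-stage path is faster at a load ratio of 4 and the three-stage path is faster at 64. The results in this section show that layout parasitics can cancel this theoretical advantage, as in the comparison of SEPFF and TGPL.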

Fig. 4. EECs of IP-EP FFs: (a) switching activity 0.1 and (b) switching activity 0.5 (load of 16× the minimum-inverter capacitance, T/FO4 = 40).

C. Single-Edge-Triggered Differential FFs
The EECs of the SET Differential FFs in the reference case are reported in Fig. 5. From Fig. 5, the E-D space is split in two regions: the high-speed one (from [...] to [...] FOMs), where the STFF is the most energy-efficient, and the low-energy one (from [...] to [...] FOMs), where the MSAFF is the best Differential FF. In particular, STFF is the fastest among all the analyzed FFs. For instance, the delay of TGPL is 1.1× greater than that of the STFF, whereas those of MSAFF, CCFF and VSWFF are 1.8×, 1.3×, and 1.4× greater, respectively.

Fig. 5. EECs of differential FFs: load of 16× the minimum-inverter capacitance, switching activity 0.25, T/FO4 = 40.

These differences in the speed of such differential FFs can be explained as follows: all of them have equal second (skewed inverter) and third (push-pull) stages, which are very fast. As regards the first stage, the speed of MSAFF is affected by the load imposed by the cross-coupled inverters, whose NMOS transistors belong to the complementary critical paths (although the sense-amplifier nature is useful for level-restoring). The first stage of CCFF and VSWFF does not have this drawback and is significantly faster, but not as much as the first stage of STFF, where only two stacked NMOS are employed thanks to the use of additional driving NOR gates.
The high energy-efficiency of MSAFF in the low-energy region is due to its relatively simpler layout and to the lower impact of layout parasitics, which allows for downsizing transistors with minor performance loss with respect to STFF, CCFF, and VSWFF. As shown in Fig. 7(l) in Part I, the high regularity of the MSAFF layout leads to a very small area despite its differential signaling.
For analogous reasons, CCFF and VSWFF, which have extremely complex layout and local wires (see the layouts in Figs. 7(n) and (o) in Part I), are never the most energy-efficient. This is in contrast to what is claimed in many papers (especially as concerns CCFF) [9], [14], [15], and [25], where the conditional capture property is praised as a very efficient technique to reduce energy at a negligible speed penalty. This is no longer true in nanometer technologies where the impact of local wires is considerable (in order to maintain good speed, CCFF and VSWFF need to be strongly oversized for a targeted speed).
Given the very similar topology of the considered differential FFs, the same ranking is obtained regardless of the load. Instead, switching activity has a significant impact on the comparison, as is shown in Fig. 6(a) and (b), where the EECs derived for switching activities of 0.1 and 0.5 are plotted (in both cases with a load of 16 times the minimum-inverter capacitance and T/FO4 = 40). In detail, for a switching activity of 0.1, CCFF and VSWFF become the most energy-efficient from [...] to [...] FOMs (thus including also the product [...]). For a switching activity of 0.5, their EECs move far away from the MSAFF and STFF ones, in contrast to [14], where it is stated that conditional capture FFs have a reasonable energy consumption even for such a data transition rate.

D. Dual-Edge-Triggered FFs
We put the selected DET topologies in a single class even if they have quite different basic operations and features. The EECs of the DET FFs, derived in the reference case, are reported in Fig. 7. As for the Differential class, two topologies emerge as the most energy-efficient ones: DET-SPL in the high-speed region (from [...] to [...] FOMs) and the DET-TGLM in the low-energy one (from [...] to [...] FOMs). In particular, DET-TGLM (which has an MS structure) dissipates the least energy among all the analyzed FFs. On average, DET-SPGFF, DET-SPL and DET-CDFF dissipate 1.7×, 2.4×, and 2.1× more energy than the DET-TGLM. This is due to the combination of the DET and MS features, which both contribute to reduce energy consumption.

Fig. 6. EECs of differential FFs: (a) switching activity 0.1 and (b) switching activity 0.5 (load of 16× the minimum-inverter capacitance, T/FO4 = 40).

Fig. 7. EECs of DET FFs: load of 16× the minimum-inverter capacitance, switching activity 0.25, T/FO4 = 40.

In general, DET FFs can have rather complex layouts in those topologies where some parts of the circuit are replicated (as for DET-TGLM or DET-SPGFF, from Fig. 6(p) and (q) in Part I). Instead, in other cases (DET-SPL or DET-CDFF), the DET functionality is simply accomplished by adopting a DET Pulse Generator. Anyhow, in the case of DET-TGLM, the more complex layout is compensated by the DET property that makes it the best FF in the low-energy region with the TGFF.
DET-SPL, which has an explicit pulsed structure, is the fastest DET topology thanks to the simplicity of its path. Also DET-CDFF has an EP structure and shows a good tradeoff but is competitive only for the [...] FOM.
DET-SPGFF, which is an implicit pulsed FF, is never the most energy-efficient FF because it suffers from a high layout complexity, and also from the inclusion in the paths of the clocked precharge transistors, which thus need to be oversized. This explains why results contrast with those in [28], where it was stated that the DET-SPGFF has a better [...] product than DET-TGLM and DET-SPL in typical conditions.

Fig. 8. EECs of DET FFs: (a) load of 64× and (b) load of 4× the minimum-inverter capacitance (switching activity 0.25, T/FO4 = 40).

The effect of load on DET FFs is shown in Fig. 8(a) and (b), where the EECs for loads of 64 and 4 times the minimum-inverter capacitance are plotted (in both cases with a switching activity of 0.25 and T/FO4 = 40). The speed of DET-SPL and DET-CDFF is nearly the same for large load (they have a nearly equal [...]), whereas the DET-SPL is significantly faster for small load since it has only two stages in the paths. However, for large load, the DET-CDFF is more energy-efficient than DET-SPL from the [...] FOM on, i.e., in almost all the E-D space (differently from the previous similar discussion on the comparison of TGPL and SEPFF). This is because the conditional discharge allows for considerably reducing energy (although DET-CDFF has a more complex layout than DET-SPL).
The effect of switching activity on DET FFs is analyzed in Fig. 9(a) and (b), where the EECs for switching activities of 0.1 and 0.5 are reported (in both cases with a load of 16 times the minimum-inverter capacitance and T/FO4 = 40). Even if the DET-CDFF adopts the conditional discharge

property, it is the most energy-efficient circuit only around the [...] FOM. Indeed, for low switching activity, DET-TGLM takes advantage of the intrinsic absence of precharge dissipation, while DET-CDFF always suffers from the Pulse Generator consumption. For analogous reasons, DET-SPGFF has an even higher energy due to precharge of internal nodes.

Fig. 9. EECs of DET FFs: (a) switching activity 0.1 and (b) switching activity 0.5 (load of 16× the minimum-inverter capacitance, T/FO4 = 40).

For a switching activity of 0.5, DET-CDFF can no longer benefit from the conditional discharge and DET-TGLM suffers from the frequent transitions in its numerous internal nodes. On the other hand, DET-SPL is the most energy efficient in the region from [...] to [...] FOMs, and DET-SPGFF is the best at [...] and [...] FOMs. DET-TGLM is still the best circuit in the deep low-energy region.
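
The FOMs used to rank the topologies in this section are minima of energy-delay products along each EEC. As a rough illustration, the following Python sketch evaluates a product of the form E^i·D^j at every point of a curve and ranks topologies by their best value. The function names, the topology labels FF_A and FF_B, the exponents, and all (energy, delay) pairs are hypothetical and are not data from this work.

```python
def fom(energy: float, delay: float, i: int, j: int) -> float:
    """Energy-delay product E^i * D^j for one design point."""
    return energy ** i * delay ** j


def best_design(points, i: int, j: int):
    """Return the (energy, delay) point of one EEC that minimizes E^i * D^j."""
    return min(points, key=lambda p: fom(p[0], p[1], i, j))


def rank_topologies(eecs: dict, i: int, j: int) -> list:
    """Sort topology names by their minimum E^i * D^j, best first."""
    best = {name: min(fom(e, d, i, j) for e, d in pts) for name, pts in eecs.items()}
    return sorted(best, key=best.get)


# Hypothetical normalized (energy, delay) points on two EECs.
eecs = {
    "FF_A": [(10.0, 2.0), (6.0, 3.0), (4.0, 6.0)],
    "FF_B": [(14.0, 1.5), (9.0, 2.5), (7.0, 5.0)],
}

print(rank_topologies(eecs, 1, 1))  # balanced energy-delay product
print(rank_topologies(eecs, 1, 3))  # delay-weighted product (high-speed emphasis)
```

With these made-up numbers the ranking flips between the two exponent choices, which is the kind of region-dependent ranking the comparisons above describe.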

IV. E-D GLOBAL COMPARISON AMONG ALL THE FFs

A. Metrics
In Fig. 10(a)–(e), we report five FOMs of all FFs, normalized to the best topology (again, we consider the reference case). This allows general conclusions to be drawn on the comparison of the analyzed FF classes. It is apparent that Pulsed topologies are the most energy efficient in the high-speed region ([...] and [...] FOMs) and the EP FFs in particular are more energy-efficient than the IP FFs in such a region. Since pulsed FFs are employed in real high-speed applications (e.g., Intel microprocessors), EP topologies can be considered the best choice in such a case.

Fig. 10. Normalized FOMs, from (a) minimum delay through (b)–(d) energy-delay products to (e) minimum energy: load of 16× the minimum-inverter capacitance, switching activity 0.25, T/FO4 = 40.

The superiority of EP over IP FFs is explained by considering that, in nanometer technologies, IP FFs suffer from a complex

routing between the stages involved in the paths, which thus need to be oversized to avoid a speed penalty.
In particular, TGPL is the best circuit even in terms of [...] product. Observe that, in high-speed applications, pulsed FFs can benefit from an even greater energy reduction when the pulse generator is shared among various FFs (here, every latch has its own PG). The advantage of EP over IP FFs no longer exists in the low-energy region ([...] and [...] FOMs).
As expected, differential FFs also exhibit very good features in the high-speed region. Indeed, their basic structures closely resemble those of pulsed FFs (STFF is the fastest FF).
Obviously the energy dissipation of differential FFs is high since they have to provide both polarities of the output. Some of them have a single-ended counterpart, like the STFF [13] and the CCFF [14]. However, such single-ended versions (which are IP FFs) are quite complex and (in a separate analysis) we found that their energy-efficiency is always worse than other analogous single-ended topologies (for this reason they were not included in our analysis). In the low-energy region, MSAFF is quite efficient since it achieves acceptable speed at the cost of a relatively low consumption.
MS FFs are clearly the most energy-efficient FFs in the low-energy region, whereas their speed is limited. Together with TGPL, TGFF and DET-TGLM also offer the best compromise in terms of [...] product. Clock-gated FFs are by far the worst circuits and have a degraded speed and energy compared to any other topology. Accordingly, clock-gated FFs are unsuitable for nanometer technologies.
Among DET FFs, the DET-TGLM represents the most energy-efficient solution in the deep low-energy region, together with TGFF. It is the DET counterpart of TGFF and they show similar performances since the greater layout complexity of DET-TGLM is compensated by the energy reduction due to the DET property.

B. Selection of the Most Energy-Efficient Topologies
An ideal energy-efficient curve extrapolated by selecting the best circuits for each region is reported in Fig. 11. STFF exhibits the best [...] and [...] products. TGPL is the best from [...] to [...] FOMs. DET-TGLM has the best [...], [...] and [...] FOMs (TGFF has nearly the same performances in the low-energy region and is the best circuit in a narrow window between [...] and [...] FOMs). Hence, except for extreme high-speed designs, TGPL reveals itself as the most energy-efficient solution in a very wide region of practical applicability. MS FFs based on transmission gates (TGFF, DET-TGLM) are the best when energy is the main concern.

Fig. 11. Ideal EEC extracted selecting the most energy-efficient FFs and minimum energy-delay-product designs.

Such results only partially agree with those in [21] and lead to the following different considerations:
• the suitability of STFF for high-speed designs is now limited only to extremely high-speed applications ([...] ratio greater than 4, see Part I);
• the basic simplicity of TGPL leads to low layout parasitics (in general, topologies having simple layouts in their paths are definitely favored);
• IP FFs are no longer advantageous and, differently from the IPPFF in [21], none of them is the most energy-efficient in any of the regions;
• despite the presumed ineffectiveness of DET topologies, DET-TGLM is slightly more energy-efficient than the TGFF in the low-energy region (in [21] only TGFF is considered to extract the ideal EEC).
The previously mentioned FFs are still the best ones even combining all the considered load and switching activity values. In general, the following few differences emerge.
• For large load (64×), STFF always achieves the best [...] FOM and TGPL does not always have the best [...] product. By comparing STFF and TGPL in the high-speed region, the delay of the latter one is nearly 1.03×, 1.10×, and 1.20× greater in the small, average and large loading conditions, respectively. Instead, the energy consumption of STFF (which has a more complex routing and 1.3–1.5× larger area) is nearly 1.9× greater in such a region.
• For small load (4×), TGPL is always the best circuit in the extremely wide range [...].
• For low switching activity (0.1), DET-TGLM is more efficient than TGFF. Indeed, it partly replicates the TGFF circuit but the increased number of nodes is not a significant concern if the data rarely varies. For instance, DET-TGLM has the best overall [...] for [...] and [...].
• For high switching activity (0.5), TGFF replaces DET-TGLM as the best circuit in terms of [...], [...] and [...].
The significantly larger number of analyzed topologies and the inclusion of the impact of layout parasitics are responsible for the previously mentioned differences with respect to the results in [21]. This is easily demonstrated by comparing the above results with those in Fig. 12, where layout parasitics are not considered at all. By comparing Fig. 12 with [21, Fig. 15], not surprisingly, the results are nearly coincident (except for CPFF and DET-TGLM that were not considered in [21]).
In particular, STFF now also shows the best [...] and is more suitable in a wider part of the high-speed region. TGPL no longer has the best [...] (TGFF does) and IPPFF (together with CPFF) is the most energy-efficient in a non-negligible E-D space window as in [21]. The couple TGFF/DET-TGLM still exhibits the best features in the low-energy region.
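
An ideal curve of this kind can be approximated by taking, among the design points of all topologies, those that no other point beats in both energy and delay. The Python sketch below does this for made-up data; the function name, topology labels, and numbers are hypothetical, and the non-dominated selection is a simplification of the FOM-based selection described above.

```python
def ideal_eec(eecs: dict) -> list:
    """Return the non-dominated (energy, delay, name) points, fastest first.

    A point is kept only if its energy is lower than that of every point
    with a smaller or equal delay, i.e., it lies on the lower envelope of
    all curves in the energy-delay plane.
    """
    # Sort by delay, then energy, so that faster designs are visited first.
    pts = sorted((d, e, name) for name, curve in eecs.items() for e, d in curve)
    front = []
    best_energy = float("inf")
    for d, e, name in pts:
        if e < best_energy:
            front.append((e, d, name))
            best_energy = e
    return front


# Hypothetical normalized (energy, delay) points on two EECs.
eecs = {
    "FF_A": [(10.0, 2.0), (6.0, 3.0), (4.0, 6.0)],
    "FF_B": [(14.0, 1.5), (9.0, 2.5), (7.0, 5.0)],
}

for energy, delay, name in ideal_eec(eecs):
    print(f"D = {delay:4.1f}  E = {energy:5.1f}  ({name})")
```

Tagging each surviving point with its topology shows directly which circuit is the best choice in each part of the energy-delay plane.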

Fig. 12. Ideal EEC extracted selecting the most energy-efficient FFs and minimum energy-delay-product designs (layout parasitics not included).

V. LEAKAGE

A. Leakage Impact in Active Mode
In our analysis, we find that the leakage energy in active mode does not influence the ranking of FFs, although it can significantly impact the optimum transistor sizing. Thus, leakage cannot be considered only in standby mode (where it is the only source of dissipation). For instance, let us analyze the TGFF for [...] and [...]. By considering the T/FO4 = 10 and T/FO4 = 80 conditions, the optimal sizing of the circuit to minimize the [...] product changes. Table II reports the optimum design variables, and shows that a smaller size of the circuit is required when the leakage contribution increases (high T/FO4), for a targeted FOM.

TABLE II
OPTIMUM TGFF SIZING FOR T/FO4 = 10 AND T/FO4 = 80

In general, FFs exhibit an extremely high dynamic energy consumption compared to combinational logic gates, as the clock is the signal with the highest transition rate within a chip. However, with technology scaling, the growing impact of leakage on transistor sizing can be significant even in active mode, according to Table II. In particular, the impact of leakage is certainly stronger for circuits that do not adopt precharge and whose topologies do not lead to frequent transitions on the internal nodes. In particular, the topologies exhibiting significant sizing changes when T/FO4 is varied from 10 to 80 are reported in Table III, which lists the following quantities:
• the maximum relative variation in the size of a single transistor;
• the average relative variation by including all the FF transistors;
• the standard deviation of the relative variation by including all the FF transistors.

TABLE III
OPTIMUM SIZING VARIATION FOR T/FO4 = 10 AND T/FO4 = 80

The three above quantities are evaluated by considering all the optimum designs for T/FO4 = 10 and T/FO4 = 80. From Table III, it is apparent that MS FFs exhibit significant changes in the transistor sizes because of leakage, since their dynamic consumption is rather low. On the other hand, transistor sizes in Pulsed and Differential FFs are negligibly impacted by leakage, as energy is always dominated by the transient energy contribution.

B. Leakage Impact in Standby Mode and Tradeoff With Delay
In standby mode, leakage currents represent the only source of dissipation. Especially when circuits stay idle for very long periods, standby leakage may be even more important than the active mode energy.
In the following we refer to the average leakage current (see Part I). In Table IV, it is reported for the various FFs and under three typical optimum sizings, i.e., minimum [...], [...] and [...] (the optimization is carried out in the reference case). Absolute and normalized (to the best circuit, highlighted in gray) values are reported.

TABLE IV
AVERAGE LEAKAGE UNDER VARIOUS OPTIMUM SIZINGS (LEAKAGE NORMALIZED TO THE MINIMUM IS REPORTED IN BRACKETS) AND AVERAGE (AMONG THE FOMS) RATIO BETWEEN AVERAGE AND MINIMUM LEAKAGE

From Table IV (2nd–4th columns), the circuits with the greatest leakage are the clock-gated and the differential ones (except MSAFF), due to the high number of transistors and layout complexity (which leads to oversized transistors for a given speed target). On the other hand, all pulsed FFs (both EP/IP and SET-DET) except HLFF show a moderate leakage. This

Except for GMSL and DTLA, the energy savings achievable in standby mode through ISA are moderate, i.e., in the range 8%–33%. Clock-gated FFs behave differently, as the output of one of the gates in the PG of DTLA remains floating for [...], leading to an enormous leakage. It is worth noting that the results in the previous subsections were derived by neglecting the leakage contributions for [...], i.e., assuming an intentional (and necessary) driving in standby mode for DTLA.
Effectiveness of RBB can be evaluated through parameter

(1)

which is a figure of merit that was recently introduced to eval-


uate the ability to reduce leakage with small reverse body volt-
Fig. 13. Leakage-delay tradeoff. Optimization in active mode: C = 16C , ages [32], where is the body bias voltage. In particular,
= 0 25
: , T =F O 4 = 40 . represents the reverse body voltage that must be applied
to reduce leakage by an order of magnitude [32]. In our anal-
ysis, starting from for NMOS
is explained by considering that such FFs extensively employ (PMOS), the body bias voltage is decreased (increased) by up to
stacked transistors and hence leakage is somewhat reduced [29]. 0.4 V for NMOS (PMOS) transistors. The average
DET FFs (except DET-TGLM) do not show higher leakage RBB slope (again considering all the aforementioned siz-
currents than their SET counterparts, because they need to em- ings) is in the range of 1.8–2.1 for all the considered FF topolo-
ploy only a slightly more complex PG (DET-SPL and DET- gies, i.e., no appreciable differences arise among them.
CDFF) or because they can again exploit the leakage reduction Interestingly, by evaluating for a single NMOS and a
due to stacking (DET-SPGFF). Instead, DET-TGLM is signif- single PMOS transistor (both with and
icantly worse than SET MS FFs (TGFF and WPMS) since the ), one finds and , respectively.
increased complexity due to the duplication of some stages is The average of such values (2.1) is very close to the
not compensated by the DET property (as opposite to transient values found for complex circuits like the analyzed FFs. This
consumption in active mode). means that the leakage sensitivity of FFs to body biasing is ap-
As mentioned in Sections III and IV, MSAFF tends to be proximately that of a single transistor, which agrees with our
downsized compared to other Differential FFs because of the intuition.
simpler layout, and also because the branching due to the cross-
coupling in the first stage would prevent significant speed im- VI. SILICON AREA
provements if the size were strongly increased.
Finally WPMS and in particular TGFF and HLFF have the A. Comparison of FF Area
minimum leakage. Indeed, MS FFs cannot exploit stacking but The silicon area occupied by FFs can be accurately estimated
have very simple structures and typically small transistors sizes. by using the same procedure that is used to evaluate the in-
HLFF is the simplest among IP FFs and extensively employs terconnects length, as discussed in the Section II-D of Part I.
stacking. Hence, the above procedure permits for the first time to exten-
When circuits operate in standby mode, the tradeoff sively compare FFs in terms of silicon area (previous works did
must be reinterpreted as leakage-delay tradeoff, although we not analyze this aspect [2]–[23], [25]–[28]).
still refer to the optimum FFs sizings discussed in Sections III Table V (2nd–4th columns) reports the area of the various
and IV, where the active mode energy is considered. In Fig. 13, FFs under three typical optimum sizings, i.e., minimum ,
such tradeoff is depicted. For practical delay ranges, SEPFF, , and (the optimization is carried out in the reference
TGPL, HLFF, and TGFF show the best compromise. case). Absolute and normalized (with respect to the best circuit,
highlighted in gray) values are reported.
C. Effectiveness of Leakage-Reduction Techniques Area is mostly dictated by the topological complexity. By in-
Leakage can be reduced by resorting to techniques like the spection of Table V (2nd–4th columns), we can draw the fol-
“input state assignment” (ISA) [30] or the “reverse body bi- lowing main conclusions, which roughly hold for all the fol-
asing” (RBB) [31]. lowing considered sizings:
The effectiveness of ISA can be simply analyzed by evalu- • DTLA and the conditional differential FFs (CCFF and
ating the proportion between the average leakage current and VSWFF) have the greatest area (for minimum ,
the minimum one, considering all possible values of input (data the major complexity of DTLA is the dominant factor);
and clock) and output. Table IV (5th column) reports the av- • TGFF, HLFF, and MSAFF have the smallest areas. Indeed,
erage ratio between and the minimum leakage cur- as explained when dealing with leakage, MSAFF requires
rent . Such average ratio is extrapolated by con- a very low area (despite its differential nature) thanks to
sidering all optimum sizings for the FOMs in formula (7) in Part its regularity, while TGFF and HLFF have the simplest
I and all the , , and conditions. structures among the considered MS and Pulsed FFs.
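A small numerical sketch may help put the leakage-reduction figures of Section V-C in perspective. It assumes, beyond what the text states, that (i) the ISA saving is measured against the leakage averaged over all input/output states, and (ii) leakage falls exponentially with reverse body bias at a constant slope expressed in volts per decade. The function names and the per-state leakage values are hypothetical placeholders, not data from Table IV.

```python
def isa_metrics(leakage_by_state):
    """Return (avg/min leakage ratio, fractional saving) for input state assignment.

    Assumption: without ISA the circuit sits in each state equally often,
    so the baseline is the plain average over states.
    """
    values = list(leakage_by_state.values())
    i_avg = sum(values) / len(values)
    i_min = min(values)
    return i_avg / i_min, 1.0 - i_min / i_avg


def rbb_leakage_factor(delta_v, slope_v_per_decade):
    """Fraction of the original leakage left after applying a reverse body bias.

    Assumption: constant slope, i.e. leakage drops by 10x for every
    `slope_v_per_decade` volts of reverse bias.
    """
    return 10.0 ** (-delta_v / slope_v_per_decade)


# Hypothetical per-state leakage values (arbitrary units), for illustration only.
example_states = {"D=0,CK=0": 1.0, "D=0,CK=1": 1.2, "D=1,CK=0": 1.5, "D=1,CK=1": 1.3}
ratio, saving = isa_metrics(example_states)
print(f"avg/min ratio = {ratio:.2f}, ISA saving = {saving:.0%}")

# Slope of about 2 (assumed to be in V/decade) and the 0.4 V bias range quoted above.
print(f"leakage left after RBB = {rbb_leakage_factor(0.4, 2.0):.2f}")
```

Under these assumptions, an average-to-minimum ratio of 1.25 corresponds to a 20% saving, which falls inside the 8%–33% range quoted for ISA, and a slope of about 2 V/decade would leave roughly 63% of the original leakage at 0.4 V of reverse bias.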

TABLE V
ABSOLUTE AREA UNDER VARIOUS OPTIMUM SIZINGS (AREA NORMALIZED TO THE MINIMUM IS REPORTED IN BRACKETS) AND LAYOUT EFFICIENCY UNDER VARIOUS OPTIMUM SIZINGS

As concerns EP FFs, the values in Table V (2nd–4th columns) are somewhat pessimistic. Indeed, when sharing the PG among an increasing number of latches, the area increase of the PG is very low. Thus, the actual area evaluation is affected by the number of latches sharing the same PG.

B. Area-Delay Tradeoff

The area-delay tradeoff is illustrated for the reference case in Fig. 14. From Fig. 14, the area-delay tradeoff closely resembles the energy-delay tradeoff discussed in Section IV. The reason is that (differently from the leakage-delay tradeoff) the overall energy dissipation is strongly related to the area and the size of the circuits. The main differences with the composite EEC in Section IV-B are the very good tradeoff offered by the HLFF in the delay range [3–5] and the better features of TGFF with respect to DET-TGLM in the low-energy (i.e., high delay) region.

Fig. 14. Area-delay tradeoff. Optimization in active mode: C = 16C, = 0.25, T/FO4 = 40.

We also analyze the area degradation versus sizing (i.e., when optimizing FOMs where more emphasis is given to the speed). The results in Fig. 15 (where MS, pulsed, and differential FFs are depicted with continuous, dashed and dotted lines, respectively) refer to the usual optimization conditions and are normalized with respect to the minimum area for each FF (obviously achieved when optimizing ). Note that the area is a parameter that does not change continuously, given the use of folded layout technique when dealing with large transistors.

Fig. 15. Area degradation (normalization to E sizing). Optimization in active mode: C = 16C, = 0.25, T/FO4 = 40.

Differential FFs see the highest relative increase in their area (up to 1.8×) when they are progressively sized for smaller delays. Indeed, their complex layouts and the high branching effects due to local wires’ parasitics and additional gates (not lying in the paths) require a significant transistor oversizing of their critical stages. Pulsed FFs (both IP and EP) show

an intermediate behavior, with area increases up to 1.4–1.7×. MS FFs exhibit the smallest relative increase, up to 1.1–1.5×.

C. Area-Related Properties

In order to express how the topology is amenable for efficient physical design, we evaluated in Table V (5th–12th columns) the “layout-efficiency”, which was defined as the ratio of the transistor count to the FF area normalized to (it represents the number of transistors in a square with side ). Due to the large number of different sizings and conditions, we evaluated the mean value , the standard deviation , and the variability of the layout efficiency for each FF topology.

From Table V (5th–12th columns), as expected the layout efficiency decreases in high-speed designs (i.e., minimum , , and ) since transistors (those in paths) must be larger, compared to low-energy designs. Moreover, the layout efficiency tends to be almost the same for all FFs when referring to the low-energy sizings. This is quantitatively confirmed by the variability which is very small (less than 10%). On the other hand, bigger differences are found in high-speed designs ( is up to 20%).

It is also interesting to analyze the relationship between area and leakage for the various FFs. By considering all the , and conditions and all the sizings, (i.e., on the whole, 216 different sizings), the correlation coefficient between the average leakage current (minimum leakage current ) and area was evaluated. This correlation coefficient turns out to be very close to unity (always larger than 0.95 for any FF), which means that area is always proportional to leakage for any specific FF topology.

To infer if the area-leakage correlation can be assumed as a general property independently of the specific FF topology, we also analyze the correlation coefficients relative to the 19 FFs altogether. The latter turns out to be 0.71 (0.80) for , which is still rather close to unity. Hence, area can still be considered to be almost linearly related to leakage regardless of the FF topology. In other words, the silicon area of FFs can be inferred immediately from the analysis of leakage (i.e., it does not require a separate analysis). Quantitatively, by again considering the 19 FFs altogether, the linear proportionality coefficient between and area turns out to be 30.8 nA/μm² (24.0 nA/μm²), which is very useful information to roughly estimate leakage once the area is given (and vice versa).

VII. CLOCK LOAD

A. Clock Load Comparison and Tradeoff With Delay

The clock load of a FF is defined as the capacitance seen from the FF clock terminal, and is an important feature since it is closely related to the design of the clock network, which is responsible for a large fraction (up to 30%–50% [28]) of the whole energy budget in high-performance microprocessors [33]–[35]. Indeed, the higher the clock load, the larger the clock buffers that locally distribute the clock signal to FFs throughout the various clock domains [24], [35].

Therefore, in addition to the evaluation of the energy spent to charge/discharge the clock input capacitance (see Part I), the clock load is a further figure of merit since its value is inherently related to the consumption of the clock network.

In Table VI (2nd–4th columns) we report the clock load of the various FFs under three typical optimum sizings, i.e., minimum , , and (the optimization is carried out in the reference case). Absolute and normalized (with respect to the best circuit, highlighted in gray) values are reported.

TABLE VI
CLOCK LOAD UNDER VARIOUS OPTIMUM SIZINGS (CLOCK LOAD NORMALIZED TO THE MINIMUM IS REPORTED IN BRACKETS) AND AVERAGE (AMONG THE FOMS) PERCENTAGE CONTRIBUTION OF CLOCK WIRES

FFs obviously have a decreasing clock load when moving towards low-energy designs, except DTLA, which has the minimum and constant clock load for all sizings (its PG sees a small internal load and hence is always minimum-sized).

MS FFs exhibit the highest clock load in almost all conditions (the loads seen by the true and complementary versions of clock signals are added), since, independently of their sizing, they have a rather high number of clocked transistors and clock interconnects (see schematics in Fig. 6(a)–(d) in Part I).

DET-SPGFF shows wide clock load variations when sized for high-speed or low-energy. Indeed (see Section III-D), clocked precharge transistors lie in two of the four paths of such FF and hence they need to be strongly oversized. This is not the case when sizing for low-energy.

EP FFs exhibit a very small clock load thanks to the “decoupling” effect accomplished through the use of a PG. Since the PG dissipation has already been fully accounted for, EP FFs reveal another significant advantage, given that they do not present a large load to the clock distribution network. DET-SPL is slightly worse because of the features of its PG, which does not guarantee a full decoupling.

Also the clock gating logic (GMSL) and the NOR gates in the STFF (see schematics in Fig. 6(c) and 6(m) in Part I) separate the external clock from the internal nodes and hence such FFs have a low clock load.

The tradeoff between the clock load and the FF delay can be understood from Fig. 16, which reports the clock load increase with respect to the minimum-energy sizing when FFs are progressively sized for high speed (continuous, dashed and dotted lines for MS, pulsed, and differential FFs, respectively).

From Fig. 16, EP FFs (except DET-SPL) show clock load increments up to 2.5× compared to the minimum energy sizing,

which are relatively low compared to other classes (because of the presence of the PG, as explained above). IP FFs (except DET-SPGFF), MS, MSAFF, and STFF show clock load increments up to 3.5×. Conditional Differential FFs (CCFF and VSWFF) reach nearly 4× clock load increase. For the previously mentioned reasons, DET-SPGFF exhibits the greatest increase (up to 5.5×).

Fig. 16. Clock load degradation (normalization to E sizing). Optimization in active mode: C = 16C, = 0.25, T/FO4 = 40.

B. Impact of Layout Parasitics on Clock Load

In the analysis reported in Section VII-A, the clock load includes the contribution of layout parasitics. In this section, we evaluate in detail the fraction of clock load due to these parasitics. To this aim, we consider all the , , and conditions and all the minimum designs (i.e., 216 different sizings). In Table VI (5th column), we report the average percentage ratio between the clock load fraction due to layout parasitics and the total clock load .

From Table VI (5th column), the layout parasitics are a sizeable fraction of the overall clock load, which typically is in the 40%–60% range (i.e., the layout parasitics can even account for most of the clock load in a FF). This confirms that layout parasitics must necessarily be taken into account to fairly compare FFs, although they were neglected in previous papers.

Globally, MS FFs (except GMSL) have the most complex clock (and complementary clock) routing paths, showing an impact of clock wires higher than 50%. An exception to this trend is represented by DET-SPL and DET-CDFF, where the clock terminal is not decoupled from some internal transistors (because of turned-on transmission gates) and hence the clock wires have a minor impact on .

C. Joint Flip-Flops and Clock Distribution Energy Dissipation

According to the traditional approach adopted in the literature, up to this point the FFs comparison has been carried out by considering the dissipation related to the FFs only. However, as mentioned in Section VII-A, the dissipation of clock buffers in the clock domains is directly connected with the clock load. Therefore, we have to further investigate this aspect and, if necessary, revise the results reported so far by carrying out an analysis that includes the clock network contribution in the overall energy breakdown.

Traditionally, when designing clock networks, a steep clock waveform, typically featuring a clock slope, is ensured. However, in [35] we showed that a proper clock slope optimization with a value , allows reducing the overall clocking energy (i.e., clock buffers and FFs), at the cost of a negligible speed and local skew/jitter degradation.

As shown in [35], the energy of a tapered clock buffer driving a clock load equal to and featuring an slope is

(2)

where both dynamic and static energy are taken into account, is the capacitance of the first buffer stage (in the following ) and is the number of buffer stages.

Differently from [35], given that the clock wires distributing the clock signal throughout the domain produce an increment of that is independent of FFs features, they are neglected in the following analysis. Hence, is simply equal to , being the number of FFs within the clock domain (in the following ).

In Table VII we report the energy (with respect to the energy due to the FFs only, that is ) due to clock buffers driving the FFs clock load in the case of clock slope and for the optimum value (i.e., is the optimum tapering factor leading to the minimum clocking energy in the clock domain). The values of are also reported in Table VII (see [35] for a detailed discussion on how these values are derived, given the FFs features) and the considered sizings are minimum , , and (in the reference case).

By inspection of Table VII, it is apparent that, except for the MS FFs, when considering a steep clock waveform (i.e., ), the energy increment typically is 10%–30% of the FFs energy, nearly regardless of FFs sizing (actually it slightly diminishes for low-energy design), and EP FFs are again rewarded. However, the previously reported rankings change in the low-energy region because of the behavior of MS FFs, which, due to their basic low energy and their high clock load, see a very high energy increment (up to 100% for the DET-TGLM) due to .

When the optimum clock slope is used to minimize the overall clocking energy, it can be easily seen that the energy increments become much more similar for all the FFs topologies, and they become equal to 30%–35% for MS FFs. Moreover, the energy increments significantly diminish for low-energy sizings. Hence, by also considering that FFs speed is not practically degraded by assuming a smoother slope up to [35], the previously reported rankings in the energy-delay space do not change significantly when adopting the optimum clock slope. Nevertheless, one should emphasize that, although they remain the most suitable circuits for very low-energy applications, the inclusion of worsens the performance of MS FFs (e.g., for minimum ).
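To make the link between clock load and buffer energy more tangible, the sketch below uses a textbook first-order model of a tapered inverter chain. It is not expression (2) of [35]: static energy, the dependence on clock slope, and parasitic self-loading are all ignored, and every name and number (`c_ck_ff`, `e_ff`, `c_in`, the 1.0 V supply) is a hypothetical placeholder. Only the domain size of 128 FFs is taken from the caption of Table VII.

```python
import math


def tapered_buffer(c_load, c_in, taper):
    """Return (stages, summed stage input capacitance) for an ideal tapered chain.

    Stage k has input capacitance c_in * t**k; the last stage drives c_load.
    """
    if c_load <= c_in:
        return 1, c_in
    # Small tolerance so exact powers of `taper` are not rounded up by float error.
    stages = max(1, math.ceil(math.log(c_load / c_in) / math.log(taper) - 1e-9))
    t_eff = (c_load / c_in) ** (1.0 / stages)  # per-stage taper actually used
    c_buf = sum(c_in * t_eff ** k for k in range(stages))
    return stages, c_buf


def buffer_energy_increment(m_ffs, c_ck_ff, e_ff, c_in, taper, vdd):
    """Buffer dynamic energy per clock cycle, relative to the energy of the FFs alone."""
    c_load = m_ffs * c_ck_ff          # clock wires neglected, as in the text
    _, c_buf = tapered_buffer(c_load, c_in, taper)
    e_buf = c_buf * vdd ** 2          # each buffer node is charged once per cycle
    return e_buf / (m_ffs * e_ff)


# Hypothetical numbers: capacitances in fF, energies in fJ, supply in V.
for taper in (2.0, 4.0, 8.0):
    inc = buffer_energy_increment(m_ffs=128, c_ck_ff=1.0, e_ff=2.0,
                                  c_in=2.0, taper=taper, vdd=1.0)
    print(f"taper {taper}: buffer energy = {inc:.1%} of FF energy")
```

In this toy model a larger tapering factor means fewer and smaller stages, hence less buffer energy, at the price of slower clock edges. This is only a qualitative echo of the slope optimization discussed in [35], and the FF energy loaded on the clock pin itself is assumed to be already counted in `e_ff`.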

TABLE VII
PERCENTAGE ENERGY INCREMENT DUE TO A CLOCK TAPERED BUFFER DRIVING M = 128 FFs (NORMALIZED TO FFs ENERGY)

VIII. CONCLUSION

In this paper, an exhaustive comparison of a large number of FFs (19 topologies belonging to four different classes) in nanometer (65-nm) CMOS technology has been carried out, differently from the other most relevant analyses reported in the literature that have so far adopted technologies up to 0.13 μm. The comparison has been performed in the whole energy-delay-area design space. The impact of layout parasitics has been included in the transistor-level design phase. The contribution of leakage has been considered in both standby and active mode, weighting it according to the logic depth in the active case. Wide loading and switching activity conditions have been explored, and other properties (e.g., the clock load) have been analyzed in detail.

As opposed to previous papers, figures of merit that designers are familiar with have been considered to gain an insight into the considered tradeoffs in a wide range of applications. Analysis showed that the results are different from previous papers because, here, the layout parasitics have been explicitly included from the beginning and a much wider range of topologies has been considered.

According to the presented results, the fastest topology is the STFF, the best low-energy FFs are the DET-TGLM and TGFF, whereas the most energy-efficient throughout a wide region of the energy-delay design space is the TGPL. Moreover, the best topologies within each of the main FF classes (MS, implicit-explicit pulsed, differential, and dual-edge-triggered) have been identified as well.

For the first time, the layout efficiency of FFs has been analyzed. In particular, HLFF, MSAFF, and TGFF exhibit a very efficient area-delay tradeoff. Moreover, it has been shown that area is almost proportional to leakage regardless of the FF topology and the transistor sizing.

The differences between the leakage-delay and the more general energy-delay tradeoff have been pointed out. It has also been shown that leakage has a significant impact on the optimum transistor sizing, especially for MS FFs. The clock load seen from the clock terminal of a FF and the related dissipation of the clock distribution network have also been analyzed. Results showed that the clock load is severely impacted by layout parasitics, and that explicit pulsed FFs have a small clock load thanks to the decoupling effect brought by the pulse generator. It is also shown that, by including the impact of local clock distribution buffers, whose dissipation is directly related to FFs clock load, the rankings of FFs in the E-D space do not change significantly, except for the MS class that is somewhat penalized.

As a general remark, simpler basic structures are rewarded in nanometer technologies because of the strong impact of layout parasitics. In particular, explicit pulsed topologies, and specifically the TGPL, have been recognized as the most efficient FF topologies in a very wide range of applications from many points of view.

REFERENCES

[1] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison in the energy-delay-area domain of nanometer CMOS flip-flops: Part I—Methodologies and design strategies,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[2] D. Markovic, B. Nikolic, and R. Brodersen, “Analysis and design of low-energy flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 52–55.
[3] D. Markovic, J. Tschanz, and V. De, “Transmission-gate based flip-flop,” U.S. Patent 6 642 765, Nov. 4, 2003.
[4] M. Nogawa and Y. Ohtomo, “A data-transition look-ahead DFF circuit for statistical reduction in power consumption,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 702–706, May 1998.
[5] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, “Flow-through latch and edge-triggered flip-flop hybrid elements,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1996, pp. 138–139.
[6] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, “A new family of semidynamic and dynamic flip-flops with embedded logic for high-performance processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–716, May 1999.
[7] R. Heald, K. Aingaran, C. Amir, M. Ang, M. Boland, P. Dixit, G. Gouldsberry, D. Greenley, J. Grinberg, J. Hart, T. Horel, W. Hsu, J. Kaku, C. Kim, S. Kim, F. Klass, H. Kwan, G. Lauterbach, R. Lo, H. McIntyre, A. Mehta, D. Murata, S. Nguyen, Y. Pai, S. Patel, K. Shin, K. Tam, S. Vishwanthaiah, J. Wu, G. Yee, and E. You, “A third generation SPARC V9 64-b microprocessor,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1526–1538, Nov. 2000.
[8] N. Nedovic, “Clocked storage elements for high-performance applications,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. California, Davis, 2003.

[9] N. Nedovic, M. Aleksic, and V. Oklobdzija, “Conditional techniques for low power consumption flip-flops,” in Proc. IEEE Int. Conf. Electron., Circuits Syst., Feb./May 2001, vol. 2, pp. 803–806.
[10] P. Zhao, T. Darwish, and M. Bayoumi, “Low power and high speed explicit-pulsed flip-flops,” in Proc. IEEE Midw. Symp. Circuits Syst., Aug. 2002, pp. 477–480.
[11] S. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
[12] B. Nikolic, V. Stojanovic, V. Oklobdzija, W. Jia, J. Chiu, and M. Leung, “Improved sense-amplifier-based flip-flop: Design and measurements,” IEEE J. Solid-State Circuits, vol. 35, no. 6, pp. 876–884, Jun. 2000.
[13] N. Nedovic, V. Oklobdzija, and W. Walker, “A clock skew absorbing flip-flop,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2003, pp. 342–344.
[14] B. Kong, S. Kim, and Y. Jun, “Conditional-capture flip-flop for statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
[15] S. Shin and B. Kong, “Variable sampling window flip-flops for low-power high-speed VLSI,” IEE Proc. IEE Circuits, Devices Syst., vol. 152, no. 3, pp. 266–271, Jun. 2005.
[16] R. Llopis and M. Sachdev, “Low power, testable dual edge triggered flip-flops,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 1996, pp. 341–345.
[17] N. Nedovic, W. Walker, V. Oklobdzija, and M. Aleksic, “A low power symmetrically pulsed dual edge-triggered flip-flop,” in Proc. IEEE Eur. Solid-State Circuits Conf., Sep. 2002, pp. 399–402.
[18] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V. De, “Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors,” in Proc. Int. Symp. Low Power Electron. Des., Aug. 2001, pp. 147–152.
[19] P. Zhao, T. Darwish, and M. Bayoumi, “High-performance and low-power conditional discharge flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 477–484, May 2004.
[20] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to design nanometer flip-flops in the energy-delay space,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.
[21] C. Giacomotto, N. Nedovic, and V. Oklobdzija, “The effect of the system specification on the optimal selection of clocked storage elements,” IEEE J. Solid-State Circuits, vol. 42, no. 6, pp. 1392–1404, Jun. 2007.
[22] S. Heo and K. Asanovic, “Load-sensitive flip-flop characterization,” in Proc. IEEE Comput. Soc. Workshop VLSI, Apr. 2001, pp. 87–92.
[23] S. Heo, R. Krashinsky, and K. Asanovic, “Activity-sensitive flip-flop and latch selection for reduced energy,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 15, no. 9, pp. 1060–1064, Sep. 2007.
[24] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and System Perspective (3rd edition). Boston, MA: Addison Wesley, 2004.
[25] V. Oklobdzija, V. Stojanovic, D. Markovic, and N. Nedovic, Digital System Clocking: High-Performance and Low-Power Aspects. New York: Wiley-IEEE Press, 2003.
[26] H. Partovi, “Clocked storage elements,” in Design of High-Performance Microprocessor Circuits. Piscataway, NJ: IEEE Press, 2001, pp. 207–234.
[27] V. Stojanovic and V. Oklobdzija, “Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536–548, Apr. 1999.
[28] N. Nedovic and V. Oklobdzija, “Dual-edge triggered storage elements and clocking strategy for low-power systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 5, pp. 577–590, May 2005.
[29] S. G. Narendra and A. Chandrakasan, Leakage in Nanometer CMOS Technologies. New York: Springer, 2006.
[30] A. Abdollahi, F. Fallah, and M. Pedram, “Leakage current reduction in CMOS VLSI circuits by input vector control,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 2, pp. 140–154, Feb. 2004.
[31] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
[32] M. Agostinelli, M. Alioto, D. Esseni, and L. Selmi, “Leakage-delay tradeoff in FinFET logic circuits: A comparative analysis with bulk technology,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., to be published.
[33] P. Gronowski, W. Bowhill, R. Preston, M. Gowan, and R. Allmon, “High-performance microprocessor design,” IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 676–686, May 1998.
[34] D. Bailey and B. Benschneider, “Clocking design and analysis for a 600-MHz alpha microprocessor,” IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1627–1633, Nov. 1998.
[35] M. Alioto, E. Consoli, and G. Palumbo, “Flip-flop energy/performance versus clock slope and impact on the clock network design,” IEEE Trans. Circuits Syst. I, Reg. Papers, to be published.

Photographs and biographies of all authors are available in M. Alioto et al. [1].
