Comparative study on low-power high-performance standard-cell flip-flops
S. Tahmasbi Oskuii, A. Alvandpour
Electronic Devices, Linköping University, Linköping, Sweden

ABSTRACT

This paper explores the energy-delay space of eight widely referenced flip-flops in a 0.13 µm CMOS technology. The main
goal has been to find the smallest set of flip-flop topologies to be included in a high-performance flip-flop cell library
covering a wide range of power-performance targets. Based on our comparison results, transmission-gate-based
flip-flops show the best power-performance trade-off, with a total delay (clock-to-output + setup time) down to 105 ps. For
higher performance, the pulse-triggered flip-flops are the fastest (80 ps) alternatives suitable for inclusion in a flip-flop
cell library. However, pulse-triggered flip-flops consume significantly more power (about 2.5x) than other fast
but fully dynamic flip-flops such as TSPC and dynamic TG-based flip-flops.

Keywords: flip-flops, latches, low-power, standard-cell, cell library, energy-delay space

1. INTRODUCTION

For high-performance VLSI chip design, the choice of the back-end methodology has a significant impact on the design
time and the design cost. Designing every single gate from scratch is not necessarily the best method. Instead, a sufficient
set of pre-designed standard cells can be used as building blocks for most of the functional blocks.
Semiconductor manufacturers offer standard-cell libraries, which are also supported by CAD tools in automated design
flows, including the final automatic placement and routing. However, the selection of standard cells, as well as
their performance, is often limited. Despite these limitations, standard-cell libraries can be useful even in
the design of high-performance VLSI chips. Often only a small portion of a chip consists of performance-critical units,
and the rest of the design can be automated as far as possible to reduce the design time without degrading the targeted
performance. In addition, the concept of a cell library can be extended to support the full-custom part of the chip:
custom (in-house) cell libraries can be made and shared by the designers of the performance-critical units. This sharply
decreases the number of cells to be created and verified, which significantly reduces the total chip layout time.
Hence, the development of an efficient cell library for high-performance chips is essential.
A cell library includes a number of cells with different functionalities, where each cell may be available in several sizes
with different driving capabilities. Two central categories of cells in a cell library are flip-flops and latches.
These are extremely important circuit elements in any synchronous VLSI chip. They are not only responsible for the correct
timing, functionality, and performance of the chip; their clocked devices also consume a significant portion of the
total active power.
A universal flip-flop with the best performance, the lowest power consumption, and the highest robustness against noise
would be an ideal component for a cell library. However, as this paper shows, increasing the performance of a flip-flop
generally involves significant power and robustness trade-offs. Therefore, a set of latches and flip-flops with different
performance levels is essential, so that the more power-consuming and noise-sensitive elements are used only in the
small, performance-critical portion of the chip. This avoids a global and unnecessary increase in power consumption, as
well as a robustness degradation that would lower the overall noise margin and require extra-careful, time-consuming
design.

390 Microelectronics: Design, Technology, and Packaging, edited by Derek Abbott, Kamran Eshraghian, Charles A. Musca,
Dimitris Pavlidis, Neil Weste, Proceedings of SPIE Vol. 5274 (SPIE, Bellingham, WA, 2004) 0277-786X/04/$15
doi: 10.1117/12.530225

The goal of this work is to find a small set (ideally the smallest set) of flip-flop topologies to be included in a library
covering a wide range of power-performance targets. Our strategy has been to first explore the capabilities of the
conventional and simpler transmission-gate (TG) based flip-flop topologies before including other types of flip-flops.
Among the large number of flip-flops that have been proposed in the past [1-5], we have selected some of the widely used
and/or referenced topologies. Sec. 2 shows the eight flip-flops incorporated in our initial benchmark, including static
and dynamic edge-triggered master-slave as well as semi-dynamic pulsed flip-flops.
In contrast to many previously published results [3], [5], we have explored a wide power-performance space for each of
the eight flip-flops. By sizing, we have identified the useful operating ranges of the flip-flops. The design-space
exploration not only enables a true comparison but also reveals potentially large overlaps in the operating ranges of the
flip-flops. This in turn provides an opportunity to reduce the number of different circuit topologies in a flip-flop library.
Sec. 3 describes our simulation setup, as well as the flip-flop parameters we have considered in our comparisons. Sec. 4
presents the comparison results, including the energy-delay space for each flip-flop topology, followed by the
conclusions in Sec. 5.

2. FLIP-FLOP TOPOLOGIES

As described in Sec. 1, many flip-flop topologies have been proposed in the past. For our comparative study, we
have selected some of the widely used and/or referenced topologies for our initial benchmark. Four static master-slave
flip-flops are included in our test bench. Figure 1 shows the classic transmission-gate based flip-flop (TGMS) [1]. Another
variation of TGMS is the flip-flop shown in figure 2, which is derived from the PowerPC 603 master-slave flip-flop [6]. In
the PowerPC 603 flip-flop, the interrupting feedback in the storage elements is based on CMOS inverters. Figure 3 shows
the third topology, a modified clocked-inverter flip-flop (mCMOS) [7], where the dynamic master-slave CMOS flip-flop is
modified into a pseudo-static CMOS flip-flop by adding CMOS feedback at the outputs. The fourth master-slave flip-flop
(figure 4) is based on the traditional SR latch built from cross-coupled NAND/NOR gates [1], [4].
The next two flip-flops (figures 5 and 6) are pulse-triggered latches. They are based on a single latch that is transparent
for a short time (during a pulse) at the clock edge. Figure 5 shows the hybrid-latch flip-flop (HLFF)
[8], and figure 6 shows the semi-dynamic flip-flop (SDFF) [9]. Both pulse-triggered topologies require and include
pulse generators.
Further, there are two fully dynamic flip-flops in our benchmark: the TSPC flip-flop [10] in figure 7 and the dynamic
transmission-gate flip-flop [1], [2] in figure 8. These fast flip-flops have floating nodes and are therefore especially
sensitive to noise and leakage currents. However, we have included their performance level as a reference for evaluating
the other flip-flops.

Figure 1: TGMS flip-flop

Figure 2: PowerPC 603 flip-flop

Figure 3: Modified CMOS flip-flop (mCMOS)

Figure 4: NAND-NOR flip-flop

Figure 5: Hybrid latch flip-flop (HLFF)

Figure 6: Semi-dynamic flip-flop (SDFF)

Figure 7: 9T TSPC flip-flop

Figure 8: Fully dynamic TGMS flip-flop

3. SIMULATION SETUP

All the circuits are designed in a standard 0.13 µm CMOS technology. The supply voltage used for the simulations is 1.2 V,
and the operating temperature is 27 °C. The simulation setup is shown in figure 9. All of the flip-flops use
identical, fixed input drivers (minimum-sized inverters) and are loaded equally by the input capacitances of four
minimum-sized inverters.

Figure 9: The simulation test bench

3.1. Energy-delay space exploration


We have explored the energy-delay space of each flip-flop by sizing the internal devices while keeping the input drivers
and the output loads fixed. The best delays achieved for a number of different target energy consumption points have
been selected as the sub-optimal energy-delay points. The timing and energy metrics are defined in Secs. 3.1.1 and 3.1.2.
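
The selection step described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `best_delay_per_energy_target`, the (energy, delay) tuple layout, and all numbers are hypothetical and do not come from the paper's simulations.

```python
from typing import List, Optional, Tuple

# One sizing candidate is (energy_fJ, delay_ps). Hypothetical layout.
Point = Tuple[float, float]


def best_delay_per_energy_target(
    points: List[Point], energy_targets: List[float]
) -> List[Tuple[float, Optional[Point]]]:
    """For each energy target, return the fastest candidate within that budget."""
    selected = []
    for target in energy_targets:
        feasible = [p for p in points if p[0] <= target]
        # Pick the smallest delay; None when no sizing fits inside the budget.
        best = min(feasible, key=lambda p: p[1]) if feasible else None
        selected.append((target, best))
    return selected


# Made-up sizing sweep, for illustration only (not simulation results).
sweep = [(12.0, 140.0), (15.0, 120.0), (14.0, 150.0), (20.0, 110.0)]
for target, best in best_delay_per_energy_target(sweep, [10.0, 15.0, 25.0]):
    print(target, best)
```

Sweeping the energy targets from low to high traces out one energy-delay curve per topology, which is the kind of curve the later comparison relies on.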

3.1.1. Delay and Timing metrics


The performance of a flip-flop is defined by three important time windows and delays: the clock-to-output delay, the setup
time, and the hold time. For a performance comparison, the setup and hold times require a clear definition. An
edge-triggered flip-flop requires the input data to be stable for some time before the clock edge. This time window is
referred to as the setup time of the flip-flop. The time after the clock edge for which the input has to remain stable is
called the hold time. The setup time can be defined and measured in different ways. It could be measured relative to the
timing edge at which the flip-flop fails to capture the data. However, this definition can be impractical: before the
input reaches the failure limit, the flip-flop already responds more slowly to the input data, which increases its delay.
Throughout this paper, we use the following definition of setup and hold time, which was also used in [3]:
setup time and hold time are the data-to-clock offsets that cause a 5% increase in the clock-to-output delay. This is
illustrated in figure 10. Propagation delay, setup time, and hold time may differ for low-to-high and high-to-low input
transitions. For the comparisons, we have chosen the worst-case delays:
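$$t_{\text{Clock-to-Output}} = \max\left(t_{\text{Clock-to-Output},LH},\; t_{\text{Clock-to-Output},HL}\right)$$
$$t_{\text{setup}} = \max\left(t_{\text{setup},LH},\; t_{\text{setup},HL}\right), \qquad t_{\text{hold}} = \max\left(t_{\text{hold},LH},\; t_{\text{hold},HL}\right)$$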
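
As an illustration of the 5% criterion, the sketch below extracts a setup time from a sweep of data-to-clock offsets. It assumes the clock-to-output delay has already been measured at each offset (for example by circuit simulation). The function name, the linear interpolation between samples, and the choice of the most relaxed sample as the nominal delay are our assumptions, not the authors' measurement procedure.

```python
from typing import Sequence


def setup_time_5pct(
    offsets_ps: Sequence[float],
    clk_to_q_ps: Sequence[float],
    degradation: float = 0.05,
) -> float:
    """Offset at which clock-to-output has risen by `degradation` over nominal.

    offsets_ps:  data-to-clock offsets, ordered from large (data early) to
                 small (data late).
    clk_to_q_ps: clock-to-output delay measured at each offset.
    The delay at the first (most relaxed) offset is taken as nominal.
    """
    if len(offsets_ps) != len(clk_to_q_ps) or len(offsets_ps) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    limit = clk_to_q_ps[0] * (1.0 + degradation)
    for i in range(1, len(offsets_ps)):
        d_prev, d_cur = clk_to_q_ps[i - 1], clk_to_q_ps[i]
        if d_prev <= limit < d_cur:
            # Linear interpolation between the two samples bracketing the limit.
            frac = (limit - d_prev) / (d_cur - d_prev)
            return offsets_ps[i - 1] + frac * (offsets_ps[i] - offsets_ps[i - 1])
    raise ValueError("clock-to-output never exceeds the degradation limit")


def worst_case(value_lh: float, value_hl: float) -> float:
    """Worst case over low-to-high and high-to-low input transitions."""
    return max(value_lh, value_hl)
```

The same routine would be run once for the low-to-high and once for the high-to-low input transition, keeping the larger result in line with the worst-case definitions above. A hold-time sweep could be handled the same way with the offsets taken after the clock edge.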

Figure 10: Setup time and hold time definitions

Figure 11: Flip-flops at the logic boundaries (flip-flop A, combinational logic with delay t logic, flip-flop B; relative clock skew t skew)

Further, in a digital system (figure 11) the following condition has to be satisfied:
$$\text{Latency} = t_{\text{Clock-to-Output}}(\text{Max}) + t_{\text{setup}} \tag{1}$$
$$\text{Latency} + t_{\text{logic}} + t_{\text{skew}} \le T \tag{2}$$
where T (the clock period) must be at least the sum of the worst-case clock-to-output delay of flip-flop A, the
setup time of flip-flop B, the maximum delay of the combinational logic, and the relative clock skew. Therefore, we
have used the sum of the clock-to-output delay and the setup time as the delay imposed by the flip-flops.
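
The arithmetic behind Eqs. (1) and (2) can be made concrete with a small sketch. The function names and the example numbers (60 ps, 50 ps, 800 ps, 40 ps) are hypothetical placeholders chosen for illustration only.

```python
def flip_flop_latency(t_clk_to_q_ps: float, t_setup_ps: float) -> float:
    """Eq. (1): delay charged to the flip-flops on a register-to-register path."""
    return t_clk_to_q_ps + t_setup_ps


def min_clock_period(
    t_clk_to_q_ps: float, t_setup_ps: float, t_logic_ps: float, t_skew_ps: float
) -> float:
    """Eq. (2) taken at equality: the smallest period T that satisfies it."""
    return flip_flop_latency(t_clk_to_q_ps, t_setup_ps) + t_logic_ps + t_skew_ps


def logic_budget(
    period_ps: float, t_clk_to_q_ps: float, t_setup_ps: float, t_skew_ps: float
) -> float:
    """Time left for combinational logic in a given period (may be negative)."""
    return period_ps - flip_flop_latency(t_clk_to_q_ps, t_setup_ps) - t_skew_ps


# Hypothetical example values, all in picoseconds.
print(min_clock_period(60.0, 50.0, 800.0, 40.0))  # 950.0
print(logic_budget(1000.0, 60.0, 50.0, 40.0))     # 850.0
```

Read this way, every picosecond removed from the flip-flop latency is a picosecond added to the logic budget at a fixed clock period, which is why the sum of clock-to-output delay and setup time is a natural figure of merit.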

3.1.2. Energy metrics


Two energy metrics have been used for the comparisons. The first is the energy per transition (EPT), which is the
average energy consumed when a transition appears at the input of the flip-flop (the average over low-to-high and
high-to-low transitions). The second is the clock energy, which is the average energy consumed when the data activity is
zero. Since the flip-flops are intended for cell libraries, the input clock and the input data are both single-ended. All
additional clock phases are generated inside the flip-flops. The power consumption of the clock and data drivers is
included in the total power consumption of the flip-flops.
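
One common way to obtain such energy figures from a transient simulation is to integrate the supply current and multiply by the supply voltage. The sketch below assumes this approach. The paper does not state how the energies were extracted, and the function names, units (ps, µA), and the waveform are illustrative assumptions.

```python
from typing import Sequence


def supply_energy_fj(
    time_ps: Sequence[float], current_ua: Sequence[float], vdd_v: float
) -> float:
    """Energy drawn from the supply: VDD times the integral of i(t) dt.

    Uses trapezoidal integration over the sampled waveform.
    """
    if len(time_ps) != len(current_ua):
        raise ValueError("time and current must have the same length")
    charge = 0.0  # accumulated in uA * ps = 1e-18 C
    for k in range(1, len(time_ps)):
        dt = time_ps[k] - time_ps[k - 1]
        charge += 0.5 * (current_ua[k] + current_ua[k - 1]) * dt
    # V * uA * ps = 1e-18 J = 1e-3 fJ
    return vdd_v * charge * 1e-3


def energy_per_transition_fj(e_low_to_high_fj: float, e_high_to_low_fj: float) -> float:
    """EPT: average over one low-to-high and one high-to-low input transition."""
    return 0.5 * (e_low_to_high_fj + e_high_to_low_fj)


# Made-up triangular current pulse at VDD = 1.2 V (not simulation data).
t = [0.0, 10.0, 20.0]
i = [0.0, 100.0, 0.0]
print(supply_energy_fj(t, i, 1.2))
```

Under this approach the clock energy would come from the same integral evaluated over a clock cycle in which the data input is held constant, with the clock and data driver currents included in the integrated supply current.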

4. SIMULATION RESULTS AND COMPARISON

Figures 12-19 show the energy-delay space of each flip-flop. Each figure includes two sub-graphs:
1- The upper sub-graph shows the flip-flop energy per transition versus the total delay (clock-to-output + setup
time). The energy consumed by the clocked devices is shown in black.
2- The lower sub-graph shows the total delay (clock-to-output + setup time) versus the total flip-flop energy per
transition. The setup time and the clock-to-output delay are highlighted in white and black, respectively.
Figures 20-23 summarize the energy-delay space of all the flip-flops. As figure 21 shows, the transmission-gate
flip-flops TGMS and PowerPC 603 offer the best power-performance trade-off among the fully static flip-flops. Further,
they cover a relatively wide portion of the total energy-delay space. The pulse-triggered flip-flops HLFF and SDFF can
support shorter delay targets. Figure 21 shows that they are faster mainly because of
their shorter setup time. Based on this figure, the SDFF is the fastest flip-flop. However, the pulse-triggered flip-flops
consume considerably more power (about 2x compared to the TGMS flip-flops). The TSPC and the dynamic TG-based
flip-flops have comparable performance while consuming up to 50% of the energy needed for the SDFF. However,
their internal floating nodes are sensitive to leakage currents and other sources of noise [11].

Figure 12: Energy-delay space for TGMS

Figure 13: Energy-delay space for PowerPC 603

Figure 14: Energy-delay space for mCMOS

Figure 15: Energy-delay space for NANDNOR

Figure 16: Energy-delay space for HLFF

Figure 17: Energy-delay space for SDFF

Figure 18: Energy-delay space for TSPC

Figure 19: Energy-delay space for dynamic TGMS

Figure 20: Energy-per-transition versus clock-to-output delay

Figure 21: Energy-per-transition versus total delay

Figure 22: Clock-Energy versus clock-to-output delay

Figure 23: Clock-Energy versus total delay

Figures 20-23 can be used to identify the optimum flip-flop topology for different energy-delay targets. As an
example, Table 1 compares the flip-flops at their minimum EPT delay points in figure 21.

Flip-flop      Overall      Clock-to-     Setup       Hold        Energy per        Clock
               delay [ps]   output [ps]   time [ps]   time [ps]   transition [fJ]   energy [fJ]
SDFF           83.6         65.1          15.0        18.8        46.8              34.4
HLFF           94.5         64.4          26.9        15.6        34.7              21.7
TGMS-dynamic   98.4         49.8          46.1        -6.4        15.8              4.4
TSPC           103.8        59.7          41.1        3.9         15.6              6.7
PowerPC        116.3        60.2          53.1        -17.4       18.9              5.7
TGMS           118.7        63.3          52.2        -17.8       18.8              5.6
mCMOS          152.8        68.6          80.8        -31.7       29.9              10.6
NANDNOR        197.5        94.9          97.9        -30.8       25.1              7.5
Table 1: Performance parameters at minimum EPT Latency
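
To illustrate how the table could be used, the sketch below picks the flip-flop with the lowest energy per transition that still meets a given overall-delay target. The numeric rows are copied from Table 1. The function name, the `allow_dynamic` switch, and the idea of screening on a single point per topology are our simplifications, since each topology actually spans a range of energy-delay points (figures 12-19).

```python
# Rows copied from Table 1: name -> (overall delay, clock-to-output, setup time,
# hold time, energy per transition, clock energy); delays in ps, energies in fJ.
TABLE_1 = {
    "SDFF": (83.6, 65.1, 15.0, 18.8, 46.8, 34.4),
    "HLFF": (94.5, 64.4, 26.9, 15.6, 34.7, 21.7),
    "TGMS-dynamic": (98.4, 49.8, 46.1, -6.4, 15.8, 4.4),
    "TSPC": (103.8, 59.7, 41.1, 3.9, 15.6, 6.7),
    "PowerPC": (116.3, 60.2, 53.1, -17.4, 18.9, 5.7),
    "TGMS": (118.7, 63.3, 52.2, -17.8, 18.8, 5.6),
    "mCMOS": (152.8, 68.6, 80.8, -31.7, 29.9, 10.6),
    "NANDNOR": (197.5, 94.9, 97.9, -30.8, 25.1, 7.5),
}

# Fully dynamic topologies (Sec. 2), which have noise-sensitive floating nodes.
DYNAMIC = {"TGMS-dynamic", "TSPC"}


def lowest_energy_meeting(delay_target_ps, allow_dynamic=False):
    """Name of the lowest-EPT flip-flop whose overall delay meets the target.

    Hypothetical helper: returns None when no tabulated point is fast enough.
    """
    candidates = [
        (row[4], name)
        for name, row in TABLE_1.items()
        if row[0] <= delay_target_ps and (allow_dynamic or name not in DYNAMIC)
    ]
    return min(candidates)[1] if candidates else None


for target in (120.0, 100.0, 80.0):
    print(target, lowest_energy_meeting(target), lowest_energy_meeting(target, True))
```

With the fully dynamic flip-flops excluded, a 120 ps target selects TGMS and a 100 ps target selects HLFF, which mirrors the text's split between transmission-gate and pulse-triggered topologies. As a side observation from the numbers (not stated in the text), the overall delay column agrees with 1.05 x clock-to-output + setup time to within about 0.3 ps, which may reflect the 5% criterion used to define the setup time.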

5. CONCLUSION

In this paper, we have explored the energy-delay space of eight widely referenced flip-flops to be included in a
high-performance flip-flop cell library covering a wide range of power-performance targets. All eight flip-flops have been
designed in a standard 0.13 µm CMOS technology at 1.2 V. Based on our simulation results, we have shown that
transmission-gate-based flip-flops (such as TGMS and PowerPC 603) exhibit the best power-performance trade-off, with
a total delay (clock-to-output + setup time) down to 105 ps. For higher performance, the pulse-triggered semi-dynamic
flip-flop SDFF (figure 6) is the fastest (80 ps) alternative suitable for inclusion in a flip-flop cell library. However,
pulse-triggered flip-flops consume significantly more power (about 2.5x) than fully dynamic flip-flops such as
TSPC and dynamic TG-based flip-flops.

ACKNOWLEDGEMENTS

The authors would like to thank Dr. Ram Krishnamurthy and James Tschanz (Intel Corporation) and Prof. Christer
Svensson (Linköping University) for useful technical discussions.

REFERENCES

1. Weste N. H. E., Eshraghian K., Principles of CMOS VLSI Design: A Systems Perspective, 2nd ed.,
Addison-Wesley, 1994.
2. Rabaey J. M., Chandrakasan A., Nikolic B., Digital Integrated Circuits: A Design Perspective, 2nd ed.,
Prentice Hall, 2003.
3. Markovic D., Nikolic B., Brodersen R. W., Analysis and design of low-energy flip-flops, Proceedings of the
International Symposium on Low Power Electronics and Design, 6-7 Aug. 2001, pp. 52-55.
4. Uyemura J., Circuit Design for CMOS VLSI, Kluwer Academic Publishers, Norwell, Massachusetts, 1992.
5. Stojanovic V., Oklobdzija V. G., Comparative analysis of master-slave latches and flip-flops for
high-performance and low-power systems, IEEE Journal of Solid-State Circuits, vol. 34, no. 4, April 1999,
pp. 536-548.
6. Gerosa G., Gary S., Dietz C., Dac Pham, Hoover K., Alvarez J., Sanchez H., Ippolito P., Tai Ngo, Litch S., Eno
J., Golab J., Vanderschaaf N., Kahle J., A 2.2 W, 80 MHz superscalar RISC microprocessor, IEEE Journal of
Solid-State Circuits, vol. 29, no. 12, Dec. 1994, pp. 1440-1454.
7. Suzuki Y., Odagawa K., Abe T., Clocked CMOS calculator circuitry, IEEE Journal of Solid-State Circuits,
vol. 8, no. 6, Dec. 1973, pp. 462-469.
8. Partovi H., Burd R., Salim U., Weber F., DiGregorio L., Draper D., Flow-through latch and edge-triggered
flip-flop hybrid elements, 1996 IEEE International Solid-State Circuits Conference (43rd ISSCC), Digest of
Technical Papers, 8-10 Feb. 1996, pp. 138-139.
9. Klass F., Semi-dynamic and dynamic flip-flops with embedded logic, 1998 Symposium on VLSI Circuits, Digest of
Technical Papers, Honolulu, HI, USA, 11-13 June 1998, pp. 108-109.
10. Yuan J., Svensson C., High-speed CMOS circuit technique, IEEE Journal of Solid-State Circuits, vol. 24,
no. 1, Feb. 1989, pp. 62-70.
11. Larsson P., Svensson C., Noise in digital dynamic CMOS circuits, IEEE Journal of Solid-State Circuits,
vol. 29, no. 6, June 1994, pp. 655-662.
