07 Chapter 2

CHAPTER 2
LITERATURE REVIEW
Energy conversion is needed to represent a change in signal value. If
energy exists in only one form, i.e. electric energy, then the only
conversion available is the irreversible one from electric energy to heat.
To break this one-way conversion, researchers have introduced another
energy form, i.e. magnetic field energy, into the digital circuit. If a
signal change is tied to the conversion of electric energy to magnetic
energy, the so-called “energy recovery” can be realized. By this method,
the irreversible conversion from electric energy to heat caused by
dissipative elements, i.e. resistors, is largely reduced or avoided.

The energy conversion from electric field to magnetic field and vice
versa implies that circuits should be supplied with AC power. In this
case, signals in the circuits should also be alternating quantities. The
latter have been used extensively in dynamic CMOS logic, clocked CMOS
logic and various domino logics (Weste et al 1993). However, those circuits
still rely on DC power, and the energy conversion remains electric energy
to heat. There is a need for further study of circuits supplied with AC
power. The AC power controls the working rhythm of the circuit and acts as
the clock, and is therefore called the power-clock.

Research shows that a power clock that rises and falls gradually
dissipates less energy when charging and discharging the node capacitance
through a conducting MOS transistor. The result is “adiabatic” switching,
which offers a new approach to the design of low-power CMOS circuits.
Clocked CMOS circuits with a gradually rising and falling power-clock are
expected to yield significant energy savings, and the issue has attracted
many researchers in recent years. However, the operational constraint that
the output signal must track the gradual rise and fall of the power clock
to complete the charging and discharging process makes the circuit design
more difficult. At present, existing research adopts either a retractile
cascade power clock or a multiple-phase power clock with memory schemes
(Pedram et al 2000).

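The energy argument above can be illustrated with the standard first-order
model of a node capacitance C charged through a switch resistance R. This
is only a sketch and is not taken from the cited works: it assumes an ideal
linear ramp of duration T much longer than RC, and the component values are
illustrative assumptions.

```python
def conventional_energy(c_farads, vdd):
    """Energy dissipated when C is charged abruptly from a DC supply: C*V^2/2."""
    return 0.5 * c_farads * vdd ** 2


def adiabatic_energy(r_ohms, c_farads, vdd, ramp_seconds):
    """First-order dissipation for a linear power-clock ramp (valid for T >> RC)."""
    return (r_ohms * c_farads / ramp_seconds) * c_farads * vdd ** 2


# Assumed values for illustration: 10 kOhm switch, 10 fF node, 1 V swing.
R, C, VDD = 10e3, 10e-15, 1.0
e_conv = conventional_energy(C, VDD)
for ramp in (1e-9, 10e-9, 100e-9):
    ratio = adiabatic_energy(R, C, VDD, ramp) / e_conv
    print(f"ramp = {ramp:.0e} s  adiabatic/conventional = {ratio:.3f}")
```

In this model the dissipation falls in proportion to RC/T, which is why a
slower power-clock edge saves energy at the cost of speed.
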
New research on energy-recovery CMOS circuits should start from the
basic theory, including the basic algebraic expressions and the basic
properties of clocked signals. At the same time, both the basic clocked
CMOS gate and the clocked flip-flop, the basic units of energy-recovery
CMOS circuits, should be investigated at the outset. With this in view,
this research focuses on these two topics.

A variety of full adders using static and dynamic logic styles have
been reported in the literature, 34 of which are covered by Jiang et al
(2008) alone, including the well-known static complementary CMOS adders
using 28 transistors and 40 transistors.

In particular, Shalem et al (1999) and Bui et al (2000) described the
Static Energy Recovery Full (SERF) adder. This design requires only 10
transistors to implement a full adder. It uses an architecture in which an
intermediately generated XNOR(A, B) signal is shared to generate the
carry-out and the sum outputs. SERF has been shown to consume 26% less
power than a Transmission Function Adder.

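The idea of sharing one XNOR(A, B) signal between the sum and carry
outputs can be checked at the logic level. The sketch below is a Boolean
model only, under the assumption of ideal logic levels; it is not the
10-transistor netlist of the cited works and ignores the threshold losses
discussed later. The function name is illustrative.

```python
from itertools import product


def xnor_shared_full_adder(a, b, cin):
    """Full adder built around one shared XNOR(A, B) signal (logic level only)."""
    h = 1 - (a ^ b)                 # shared XNOR(A, B)
    total = cin if h else 1 - cin   # SUM = A xor B xor Cin
    carry = a if h else cin         # A == B: carry is A; otherwise carry is Cin
    return total, carry


for a, b, cin in product((0, 1), repeat=3):
    s, c = xnor_shared_full_adder(a, b, cin)
    assert 2 * c + s == a + b + cin  # agrees with ordinary binary addition
```
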
Sub-threshold circuit design involves scaling the supply voltage below
the threshold voltage, where load capacitances are charged or discharged
by sub-threshold leakage currents. Leakage currents are orders of
magnitude lower than drain currents in the strong inversion regime, so
there is a significant limit on the maximum performance of sub-threshold
circuits. Traditionally, therefore, sub-threshold circuits have been used
for applications that require ultra-low power dissipation with
low-to-moderate circuit performance (Wang et al 2002 and Shams et al 2002).

The SERF design uses only 10 transistors to implement a full adder.
The circuit operates well at higher supply voltages, but it fails to work
if the supply voltage is scaled below 0.3 V.

Table 2.1 Output of SERF adder with various input combinations

A  B  C   SUM                               CARRY
0  0  0   0                                 0
0  0  1   Vdd - Vt                          0
0  1  0   1                                 Vt
0  1  1   0                                 1
1  0  0   1                                 >Vtp
1  0  1   0                                 1
1  1  0   Power consuming (causes failure)  Vdd - 2Vth
1  1  1   >Vdd - 2Vth                       Vdd - Vth

The output of the SERF adder for the various input combinations is
shown in Table 2.1. It shows that the SERF adder faces serious problems,
especially at lower supply voltages. Assume that one of the two input
vectors ABCin = “110” or “111” is applied. When A = 1 and B = 1, the
node voltage is Vdd - Vth. If Cin = 0, then Cout equals Vdd - 2Vth and the
SUM signal is driven towards zero by a MOS transistor whose gate is
connected to Vdd - Vth. When Cin = 1, Cout is connected to Vdd (or possibly
lower) and the SUM signal goes to Vdd - Vth. Another problem with this
design arises when the floating node is connected to ‘0’ (A = 0, B = 1 or
A = 1, B = 0). When Cin is “1”, Cout is charged to Vdd, but when Cin = 0,
Cout must be discharged to ground through a PMOS pass transistor that
cannot fully discharge the output. In this case, Cout is discharged only
to Vtp, which is higher than Vtn. This problem is intensified if the
circuit works at sub-threshold voltage.

If A is at logic “1”, some current leaks to the Cout node, which in
some cases raises Cout even above Vtp, depending on the sizing of the pass
transistors. In this case the SUM value depends on the state of Cin; for
instance, if Cin is “1”, the SUM output goes to Vdd - Vth, which is a
problem in the sub-threshold region. The most important problem with the
SERF full adder occurs when A = 1, B = 1 and Cin = 0. In this case, as
mentioned before, the output signal reaches Vdd. The results show that at
Vdd = 0.3 V the output signal rises only to 0.1 V, which is not high
enough to change the state of the next stage. To eliminate these problems
a new topology must be introduced. This limitation also constrains how far
the supply voltage can be lowered.

For instance, to obtain a correct SUM output it seems that the supply
voltage cannot be lowered below Vdd/2 + 2Vtn, indicating that the supply
voltage must be higher than Vdd/2 + 0.28 V in a 65 nm CMOS technology.
However, this limit depends on the circuit topology and also on the sizing
and the device types employed. To mitigate this problem, the gates of the
power-consuming transition for the Cout signal must be connected to Vdd
during the challenging state (A = B = 1, Cin = 0). The supply voltage may
then be reduced to as low as Vdd/2 + Vth, which is estimated to be
Vdd/2 + 0.14 V. For example, when Vdd = 0.3 V, in the worst case Cout will
be Vdd - Vth = 0.16 V, which can be used as a logic high. In addition, the
NMOS pass transistor may be upsized to lower the supply voltage further;
it seems possible to lower the supply voltage to 0.25 V. For A = 1, B = 1
and Cin = 0, the output cannot be decided exactly, because in this case
two PMOS devices and also the NMOS transistors are ON, so the output state
depends roughly on the transistors (Junming et al 2001 and Wu et al 2001).

Detailed analysis and synthesis procedures for pass networks are
presented by Pedron et al (2005) and Radhakrishnan et al (2006). A pass
transistor is an NMOS (PMOS) transistor with the signal input fed to the
source and the signal output taken from the drain. A pass network is an
interconnection of a number of pass transistors that achieves a particular
switching function. If signals X (X̄) and Y are connected to the gate and
source of an NMOS (PMOS) transistor respectively, this is denoted as X(Y)
and read as ‘X passing Y’. When both an NMOS and a PMOS transistor are
connected in parallel to pass the signal Y, the circuit is referred to as
a CMOS transmission gate. An NMOS transistor passes a strong '0', but a
weak '1'. A PMOS transistor passes a strong '1', but a weak '0'.

A CMOS transmission gate passes both a strong '1' and a strong '0'. A
series connection of a number of NMOS (PMOS) transistors passes the input
to the output when all control signals are high (low). This is represented
by the expression X1X2…Xn(V), where X1, X2, …, Xn are the control
variables applied to the gates of the NMOS transistors (X̄1, X̄2, …, X̄n
are the control variables applied to the gates of the PMOS transistors)
and V is the pass variable. A product term P = X1X2…Xn passing an input
signal V is defined as a pass implicant and is denoted by P(V). A pass
function is formed by the logical sum of a number of pass implicants. In a
minimal pass function, all pass implicants belong to the set of pass prime
implicants. A pass prime implicant is a pass implicant that cannot subsume
any other pass implicant with a smaller number of literals that implies
the same function. The design of minimal-transistor CMOS pass networks
with full voltage swing at the output is described by Radhakrishnan et al
(2006).

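The notion of a pass function as a logical sum of pass implicants P(V)
can be sketched as a small evaluator. This is a logic-level illustration
only: the data layout and the function name are assumptions made here, and
weak levels (threshold loss) are not modelled. The example realises XOR as
the pass function "A low passes B, A high passes the complement of B".

```python
def eval_pass_function(implicants, inputs):
    """Evaluate a pass function given as a list of (controls, pass_value) pairs.

    controls  : dict mapping a variable name to the level (0 or 1) that turns
                the corresponding series transistor on
    pass_value: function of the inputs giving the signal passed to the output
    Returns 0 or 1, or None when no implicant conducts (floating output).
    """
    driven = {
        pass_value(inputs)
        for controls, pass_value in implicants
        if all(inputs[name] == level for name, level in controls.items())
    }
    if len(driven) > 1:
        raise ValueError("conflicting pass implicants drive the output")
    return driven.pop() if driven else None


# XOR written as two pass implicants (an illustrative choice).
xor_pass = [
    ({"A": 0}, lambda v: v["B"]),      # A low passes B
    ({"A": 1}, lambda v: 1 - v["B"]),  # A high passes the complement of B
]

for a in (0, 1):
    for b in (0, 1):
        assert eval_pass_function(xor_pass, {"A": a, "B": b}) == a ^ b
```

A well-formed pass function drives the output for every input combination;
the evaluator returns None where the network would leave the output
floating and raises an error where two implicants would fight.
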
A number of different XOR and XNOR circuit realizations using a
combination of pass transistors, transmission gates and inverters were
presented by Wang et al (2004). To minimize the number of inverters in the
design, the K-map cells must be grouped such that complementary variables
are not used as pass variables. If A = B = 0, both PMOS transistors pass a
weak '0' to the output. This can be easily remedied by replacing one of
the PMOS transistors by a transmission gate, thus requiring a total of 4
transistors. This XOR configuration was simulated at a reduced power
supply voltage of 2 V and was found to have no threshold voltage loss,
thereby providing full voltage swing (0 to 2 V) at the output. By
replacing the PMOS transistor controlled by the input ‘A’ by a
transmission gate, the output becomes a strong '0'. This needs a total of
only 4 transistors (6 including an inverter).

Dhireesha Kudithipudi et al (2005) presented a comparative analysis of
the power, delay and power-delay product (PDP) optimization
characteristics of four parallel digital multipliers implemented using
low-power 10-transistor (10T) adders and conventional CMOS adder cells. To
achieve optimal power savings at smaller geometry sizes, a heuristic
approach known as hybrid adder models was proposed. Multipliers realized
using the Static Energy Recovery Full adder (SERF) circuit consumed
considerably less power than 10T and static CMOS based multipliers for all
the configurations studied. Furthermore, the difference between the power
consumption of the 10-transistor based multipliers and the 28T multipliers
was significant at 180 nm, but not at 70 nm. For smaller geometry sizes
down to 70 nm, the propagation delay of the multipliers implemented with
10 transistors translates to a better performance measure. Carry-save
multipliers have a better PDP range than the other multipliers for all
three adder sub-module designs. The PDP measure for optimally scaled gate
width resulted in a best-case scenario for the SERF Wallace tree
multiplier compared to the other three SERF based multipliers. This can be
attributed to the fast computational capability of the Wallace tree
multiplier and to the fact that SERF adders save more power at deep
sub-micron sizes.

A design called Gate-Diffusion Input (GDI) was proposed by
Morgenshtein et al (2002). It is an ingenious design that is very flexible
for digital circuits. It is also power efficient without incurring a large
transistor count. Despite these advantages, GDI still has some
difficulties that remain to be solved. The major problem of a GDI cell is
that it requires a twin-well CMOS or silicon-on-insulator (SOI) process,
so a GDI chip is more expensive to realize. If only a standard p-well CMOS
process can be used, the GDI scheme lacks driving capability, which makes
it difficult to realize a feasible chip. A modified GDI scheme has been
proposed that suits the general CMOS process. In addition, four 10T based
full adders were proposed using the modified GDI scheme.

In most of these systems, the adder lies in the critical path that
determines the overall speed of the system, so enhancing the performance
of the 1-bit full adder cell is a significant goal. Demand for low-power
VLSI has been pushing the development of aggressive design methodologies
to reduce power consumption drastically. To meet this growing demand, a
new improved 14T CMOS 1-bit full adder cell was presented by Vigneswaran
et al (2006). By sacrificing MOS transistor count, this low-power adder
cell reduces the serious threshold loss problem. It considerably increases
the speed and decreases the power compared to the static energy recovery
full (SERF) adder.

Several designs of low-power adder cells can be found in the
literature. The transmission function full adder (Zimmermann et al 2007)
uses 16 transistors for the realization of the circuit. For this circuit
there are two possible short-circuit paths to ground. The design uses
pull-up and pull-down logic as well as complementary pass logic to drive
the load.

Shams et al (2004) proposed the dual value logic (DVL) full adder,
which uses 23 transistors to realize the adder function. DVL was developed
to improve the characteristics of double pass transistor logic. It is
designed so that the logic-high signal is passed to the load through a
P-transistor and the logic-low level is drained from the load through an
N-transistor.

Numerous methods for leakage power control have been reported in the
literature. The work by Halter et al (1997) made use of the dependence of
the leakage current on the input vector to the gate. With additional
control logic, the circuit is put into a low-leakage standby state when it
is idle and restored to the original state when reactivated. Reactivation
requires the original state information to be remembered before the
circuit enters the low-leakage standby state. This requires special
latches, which increase the area of the circuit by about five times in the
worst case (Temel et al 2004). Also, the unit must remain idle long enough
that the dynamic power consumed in forcing the circuit into the
low-leakage state and the leakage power dissipated in the standby state
are together less than the leakage power without the technique.

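The idle-time condition stated above amounts to a break-even calculation.
The sketch below expresses it in energy terms; the parameter names and the
numerical values are assumptions for illustration and do not come from the
cited work.

```python
def break_even_idle_time(transition_energy, leak_power_original, leak_power_standby):
    """Minimum idle time (s) for which entering the standby state saves energy.

    Standby pays off when
        transition_energy + leak_power_standby * t < leak_power_original * t
    """
    saving_rate = leak_power_original - leak_power_standby
    if saving_rate <= 0:
        return float("inf")  # the standby state never saves energy
    return transition_energy / saving_rate


# Assumed numbers: 2 nJ spent forcing the circuit into and out of standby,
# 50 uW leakage without the technique, 10 uW in the low-leakage state.
t_min = break_even_idle_time(2e-9, 50e-6, 10e-6)
print(f"idle period must exceed about {t_min * 1e6:.1f} us")
```

Idle periods shorter than this bound cost more energy than they save, which
is why the technique suits units that stay idle for long stretches.
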
The use of multiple threshold voltage CMOS (MTCMOS) technology for
leakage control is described by Kao et al (2001) and Gopalakrishnan et al
(2003). The transistors of the gates have a low threshold voltage, and the
ground is connected to the gate through a high-threshold-voltage NMOS
gating transistor. The logical function of a gating transistor is similar
to that of a sleep transistor. The existence of reverse conduction paths
tends to reduce the noise margin or, in the worst case, may result in
complete failure of the gate. Moreover, there is a performance penalty
since high-threshold transistors appear in series with all the switching
current paths.

A variation of the MTCMOS technique is the Dual Vt technique, which
uses transistors with two different threshold voltages. Low-threshold
transistors are used for the gates on the critical path and high-threshold
transistors are used for those not on the critical path (Wei et al 1998
and Sundarajan et al 1999).

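The Dual Vt assignment rule can be sketched as a per-gate slack check. This
is a deliberately simplified illustration: the gate names, slack values and
the delay penalty are hypothetical, and a real flow would re-run timing
analysis after each swap because slack is shared along a path.

```python
def assign_thresholds(gate_slack_ps, high_vt_penalty_ps):
    """Choose 'low' or 'high' Vt per gate from its timing slack.

    gate_slack_ps     : dict of gate name -> slack in ps (0 = on critical path)
    high_vt_penalty_ps: assumed extra delay when one gate is moved to high Vt
    """
    return {
        gate: "high" if slack >= high_vt_penalty_ps else "low"
        for gate, slack in gate_slack_ps.items()
    }


# Hypothetical gates: g1 is critical, g2 has little slack, g3 has plenty.
assignment = assign_thresholds({"g1": 0, "g2": 5, "g3": 40}, high_vt_penalty_ps=15)
print(assignment)
```
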
In both the MTCMOS and Dual Vt methods, additional mask layers for each
value of threshold voltage are required to fabricate the transistors
selectively according to their assigned threshold voltage values. This
makes the fabrication process complex. In addition to these limitations,
the techniques discussed above suffer from turn-on latency: when the idle
subsections of the circuit are reactivated, they cannot be used
immediately because some time is needed before the sub-circuit returns to
its normal operating condition. The latency is typically a few cycles for
power gating and much higher for Dual Vt technology. Hence these
techniques are not effective in controlling leakage power when the circuit
is in the active state.

Narendra et al (2001) introduced the concept of forced stacks for
leakage control. Forced stacking introduces an additional transistor for
every input of the gate in both the N and P networks. This ensures that
two transistors are OFF instead of one for every OFF input of the gate and
hence yields significant savings in leakage current. However, the loading
requirements that forced stacking introduces at each input reduce the
drive current of the gate significantly, with a detrimental impact on the
speed of the circuit.

Johnson et al (2002) explained how a blend of sleep transistors and the
stacking effect can be used to reduce leakage power. This method
identifies a circuit input vector for which the leakage current of the
circuit is the lowest possible. The sleep-signal-controlled transistors
are inserted away from the critical path, where only one transistor is OFF
when the low-leakage input vector is applied to the circuit. Hence, this
technique is input-vector dependent. Moreover, as the technique uses sleep
transistors, it needs additional hardware to control them. This additional
hardware consumes power in both the idle and active states of the circuit.

A circuit technique was proposed by Johnson et al (2002) for
simultaneously reducing the sub-threshold and gate oxide leakage power
consumption in domino logic circuits. Only p-channel sleep transistors and
a dual-threshold-voltage CMOS technology are used to place an idle domino
logic circuit into a low-leakage state. Sleep transistors are added to the
dynamic nodes to reduce the sub-threshold leakage current by strongly
turning off all of the high-threshold-voltage transistors. Similarly, the
sleep switches added to the output nodes suppress the voltages across the
gate insulating layers of the transistors in the fan-out gates, thereby
minimizing the gate tunneling current (Zhiyu Liu et al 2007).

Feature size scaling in MOSFETs requires reducing the supply and
threshold voltages. Lowering the threshold voltage leads to an exponential
increase in the sub-threshold leakage current. Several circuit techniques
based on multiple threshold voltage (multiple-Vt) CMOS technologies have
been described in the literature for reducing the sub-threshold leakage
current (Allam et al 2000 and Kursun et al 2004). The effect of these
multiple-Vt CMOS circuit techniques on the gate oxide leakage current
characteristics, however, had not been explored until recently.

The exponential increase in leakage power due to technology scaling
has made power gating an attractive design choice for low-power
applications. A novel power gating approach for large combinational
circuit blocks and latch-to-latch data paths was introduced to yield an
improved power-performance tradeoff. It is a multiple sleep mode power
gating technique in which each mode represents a different point in the
wake-up overhead. The high wake-up latency and wake-up power penalty of
traditional power gating limit its application to long stretches of
inactivity. The multiple-mode feature allows a processor to enter
power-saving modes more frequently and hence results in enhanced leakage
savings (Harmander Singh et al 2007).

To maintain circuit performance while scaling Vdd, the threshold
voltage of the device, Vt, is also reduced. However, this causes the
sub-threshold current to increase exponentially, since it is exponentially
dependent on Vt (Borkar 2004). In addition to the sub-threshold leakage
current, the gate tunneling current also increases due to the scaling of
gate oxide thickness. Each new technology generation results in nearly a
30x increase in gate leakage (Bernstein et al 2003). Hence, it has become
extremely important to develop design techniques that reduce static power
dissipation during periods of inactivity.

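The exponential dependence of sub-threshold current on Vt can be made
concrete with the common first-order model in which the current changes by
one decade per sub-threshold swing S. The value S = 100 mV/decade used
below is an assumption for illustration, not a figure from the cited works.

```python
def subthreshold_leakage_ratio(delta_vt_volts, swing_volts_per_decade=0.1):
    """Factor by which sub-threshold current grows when Vt drops by delta_vt.

    First-order model: I is proportional to 10 ** (-Vt / S), with S the
    sub-threshold swing (assumed 100 mV/decade by default).
    """
    return 10 ** (delta_vt_volts / swing_volts_per_decade)


for dvt in (0.05, 0.10, 0.20):
    ratio = subthreshold_leakage_ratio(dvt)
    print(f"Vt lowered by {dvt * 1000:.0f} mV -> leakage grows by about {ratio:.1f}x")
```
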
Multi-threshold CMOS (MTCMOS) is widely used to suppress leakage
currents significantly in standby mode. In this technique, a sleep
transistor is added between the actual ground rail and the circuit ground
(Tschanz et al 2003). This device is turned off in sleep mode to cut off
the leakage path through the stack effect. The MTCMOS method has become
very popular, and there has been significant research towards optimizing
its benefits. The technique is effective because gating the power supply
causes the virtual ground rail to charge up to a steady-state value close
to Vdd, strongly suppressing leakage current (Kawaguchi et al 2000 and
Anis et al 2002).

The wake-up latency and power penalty associated with the power gating
technique limit the overall leakage savings by limiting how often a logic
block can go in and out of sleep mode. Thus, it seems reasonable to
investigate multiple sleep modes that trade off wake-up penalty for
leakage savings. During a stretch of inactivity, the processor can go into
one of the intermediate sleep modes, as determined by the wake-up
overhead, and save power without degrading performance (Kim et al 2004).
Unlike conventional power gating, multiple sleep mode capability could
also provide the option of a state-retentive mode to enable power savings
during inactive periods while preserving the state of the circuit.
However, the concept of more than one low-power mode is not entirely new.

A circuit for an intermediate power-saving mode was proposed by Kim et
al (1998). The authors proposed using a PMOS device in parallel with the
NMOS footer. In the intermediate mode, the PMOS device is turned on while
the NMOS footer is off. This holds the virtual ground rail potential at
the threshold voltage of the PFET. However, this approach allows only one
intermediate mode, and the virtual ground rail potential of that mode is
set by the threshold voltage of the PMOS device and cannot be arbitrarily
controlled.

Previous work has focused on the energy-delay product of timing
elements (TEs), but real designs include many TEs that are not on the
critical path, and this timing slack can be exploited by using slower,
lower-energy TEs. Instead of simultaneously optimizing delay and energy,
critical TEs should be optimized to reduce delay and noncritical TEs
should be optimized to reduce energy.

Hamada et al (2006) used different structures for critical and
noncritical flip-flops in the context of a logic synthesis design flow.
Previous works often measured energy consumption using a limited set of
data patterns with the clock switching every cycle. But real designs have
a wide variation in clock and data activity across different TE instances
(Nikolic et al 2003 and Stojanovic et al 2004).

Low-power microprocessors make extensive use of clock gating,
resulting in many TEs whose energy consumption is dominated by input data
transitions rather than clock transitions. Other TEs, in contrast, have
negligible data input activity but are clocked every cycle (Gonzalez et al
2006).

Significant energy savings can be achieved when each TE instance is
selected from a heterogeneous library of designs, each tuned to a
different operating regime. A detailed energy analysis compares a number
of TE designs, including designs that exploit particular combinations of
signal activity and timing slack. Statistics on TE activity in a pipelined
MIPS microprocessor running SPECint95 benchmarks show that
activity-sensitive TE selection can reduce total TE energy without
increasing cycle time.

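Activity-sensitive TE selection can be sketched as a constrained
minimisation: among the TE designs whose extra delay fits in the available
slack, pick the one with the lowest expected energy for the observed clock
and data activity. The library entries, field names and energy figures
below are hypothetical and serve only to show the selection logic.

```python
def select_timing_element(library, slack_ps, clock_activity, data_activity):
    """Pick the lowest-energy TE whose extra delay fits in the available slack.

    library: list of dicts with assumed fields
             name, extra_delay_ps, clock_energy, data_energy (per transition)
    """
    feasible = [te for te in library if te["extra_delay_ps"] <= slack_ps]
    if not feasible:
        raise ValueError("no timing element meets the slack")
    return min(
        feasible,
        key=lambda te: clock_activity * te["clock_energy"]
        + data_activity * te["data_energy"],
    )


# Hypothetical heterogeneous library (arbitrary energy units).
LIBRARY = [
    {"name": "fast_ff", "extra_delay_ps": 0, "clock_energy": 10, "data_energy": 6},
    {"name": "low_clock_ff", "extra_delay_ps": 30, "clock_energy": 3, "data_energy": 12},
    {"name": "low_data_latch", "extra_delay_ps": 30, "clock_energy": 8, "data_energy": 2},
]

critical = select_timing_element(LIBRARY, 0, 1.0, 0.05)
always_clocked = select_timing_element(LIBRARY, 50, 1.0, 0.05)
clock_gated = select_timing_element(LIBRARY, 50, 0.1, 0.5)
print(critical["name"], always_clocked["name"], clock_gated["name"])
```
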
Chen et al (2007) described the application of an advanced version of
the earlier spurious power suppression technique (SPST) to multipliers for
high-speed and low-power purposes. To filter out useless switching power,
there are two approaches, i.e. using registers and using AND gates, to
assert the data signals of the multipliers after the data transition. The
SPST was applied to both the modified Booth decoder and the compression
tree of the multipliers to enlarge the power reduction.

Lowering the power consumption and enhancing the processing
performance of circuit designs are undoubtedly the two important design
challenges of wireless multimedia and digital signal processor (DSP)
applications, in which multiplications are frequently used for key
computations such as the fast Fourier transform (FFT), discrete cosine
transform (DCT), quantization and filtering. To save significant power in
a VLSI design, a good direction is to reduce its dynamic power, which is
the major part of total power dissipation.

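The levers available for reducing dynamic power follow from the classic
switching-power estimate P = alpha * C * Vdd^2 * f. The sketch below uses
that textbook expression, which is not quoted in the text above; the
numerical values are assumptions for illustration.

```python
def dynamic_power(activity, c_switched_farads, vdd, freq_hz):
    """Classic switching-power estimate: alpha * C * Vdd^2 * f (watts)."""
    return activity * c_switched_farads * vdd ** 2 * freq_hz


# Assumed design point: 1 nF total switched capacitance, 1.2 V, 200 MHz.
baseline = dynamic_power(0.2, 1e-9, 1.2, 200e6)
half_activity = dynamic_power(0.1, 1e-9, 1.2, 200e6)
print(f"baseline {baseline * 1e3:.1f} mW, halved activity {half_activity * 1e3:.1f} mW")
```

Suppressing spurious transitions or switching off unused parts of an
arithmetic unit lowers the activity and switched-capacitance terms without
touching supply voltage or clock frequency.
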
Existing work by Chen et al (2003) reduces dynamic power consumption
by minimizing the switched capacitance. The design proposes a concept
called partially guarded computation (PGC), which divides arithmetic
units, e.g. adders and multipliers, into two parts and turns off the
unused part to minimize power consumption. The reported results show that
PGC can reduce power consumption by 10% to 20% in an array multiplier,
with 20% to 28% area overhead, in speech-related applications.

Chen et al (2002) proposed a 32-bit 2’s complement adder equipped with
a master-stage flip-flop and a slave-stage flip-flop for both operands of
the adder, a Dynamic-Range Determination (DRD) unit and a sign-extension
unit. This design aims to reduce the power dissipation of conventional
adders for multimedia applications. The authors also presented a
multiplier that uses the DRD unit to select the input operand with the
smaller effective dynamic range to yield the Booth codes. The multiplier
is reported to save over 10% of the power dissipation of conventional
ones.

Benini et al (2000) proposed a technique for glitching power
minimization that replaces some existing gates with functionally
equivalent ones that can be “frozen” by asserting a control signal. This
technique can be applied to replace layout-level descriptions and
guarantees predictable results. However, it achieves savings of only 6.3%
in total power dissipation, since it operates in the layout-level
environment, which is tightly restricted.

Henzler et al (2004) proposed a double-switch circuit-block switch
scheme capable of reducing power dissipation during down time by
shortening the settling time after reactivation. The drawbacks of the
scheme are the need for two independent virtual power rails and for two
additional transistors to switch each cell. Huang et al (2003) presented
the arithmetic details of signal gating schemes and illustrated a 10% to
25% power reduction for adders.

Current Mode Logic (CML) has some advantages over voltage-mode
multiple-valued logic (MVL). Implementing voltage-mode MVL requires
partitioning the total voltage range, from zero to the supply voltage,
into many discrete levels. Thus, the dynamic range and the noise margin
are highly dependent on the supply voltage. In current-mode circuits,
currents are usually defined to have logical levels that are integer
multiples of a reference current unit. Current can be copied, scaled and
algebraically sign-changed with a simple current mirror. The frequently
used linear sum operation can be performed simply by wiring, resulting in
a reduced number of active devices in the circuit (Dubrova 2003).

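The current-mode encoding described above can be illustrated with an
idealised model in which logic levels are integer multiples of a reference
current and summation is just the joining of wires. The reference value
and all function names are assumptions; device non-idealities are ignored.

```python
I_REF_UA = 10  # assumed reference current unit, in microamperes


def level_to_current(level):
    """Logic level k is represented by k reference-current units."""
    return level * I_REF_UA


def wired_sum(*currents_ua):
    """Currents joined at one node simply add (Kirchhoff's current law)."""
    return sum(currents_ua)


def mirror(current_ua, ratio=1, invert=False):
    """Idealised current mirror: copies, scales (integer ratio) or sign-changes."""
    out = current_ua * ratio
    return -out if invert else out


def current_to_level(current_ua):
    """Recover the logic level from an ideal current."""
    return current_ua // I_REF_UA


total = wired_sum(level_to_current(2), level_to_current(1))
print(current_to_level(total))  # levels 2 and 1 sum to level 3
```
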
In digital circuit applications, 3-valued and 4-valued circuits have
been defined for implementing 2-input and 3-input adders with borrow/carry
save redundant number representation. This is the number representation
used by Kawahito et al (2002). The adders are based on 3-valued to binary
converter (3-BC) and 4-valued to binary converter (4-BC) circuits.

The increasing demand for high-fidelity portable devices has placed
emphasis on the development of low-power, high-performance systems. In
next-generation processors, low-power design has to be incorporated into
fundamental computation units such as multipliers. The characterization
and optimization of such low-power multipliers will aid in the comparison
and choice of multiplier modules in system design.

Liu et al (2004) gathered statistics on multiplier operands and
identified two characteristics of their distribution that have important
consequences for the design of low-power multipliers: most inputs are
positive, and most inputs have a small number of significant bits. These
characteristics were exploited in the design of a multiplier that employs
three techniques to minimize power consumption: asynchronous control, a
radix-2 algorithm and split registers. The power savings resulting from
these techniques were 21%, 23% and 12% respectively, compared to a
synchronous multiplier using a radix-4 modified Booth's algorithm with
unified registers.

The ALU is the core of a CPU in a computer, and the adder cell is the
elementary unit of an ALU. The adder must satisfy the area, power and
speed requirements. Some of the conventional types of adder are the
ripple-carry adder, carry-look-ahead adder, carry-skip adder and
Manchester carry chain adder (Harrison et al 2005). The delay in an adder
is dominated by the carry chain. Carry chain analysis must consider
transistor and wiring delays.

Lehman et al (1961) initially proposed the Carry-Skip Adder (CSK) to
improve the speed of a Ripple-Carry Adder (RCA) with only a minimal
overhead in the number of gates. The speed of the CSK was improved in the
Variable-Block Adder (VBA), where the block sizes are varied to balance
the delay between the ripple and carry paths optimally. This improvement
in speed required only a small increase in the number of gates.

Bedrij (1992) and Tyagi et al (1993) described the Carry Select Adder
(CSA), in which each block is evaluated conditionally. When the carry-in
to a block becomes available, it selects the carry-out and sum bits of the
block. The critical path of the CSA is either the ripple carry path in the
largest block or, in the worst case, the carry select path. The optimal
block sizing is chosen such that the delays of the ripple and carry select
paths are balanced. The addition can be performed in fewer stages than in
the VBA. However, this comes at the expense of increased branching, and
more logic gates are required for the conditional computation.

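The conditional evaluation performed by a carry select adder can be
sketched as a functional model. Each block is computed for both possible
carry-in values, and the arriving carry only selects between the two
results. The uniform 4-bit block size and the function names are
assumptions made for this illustration; timing is not modelled.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add of two little-endian bit lists; returns (sum_bits, cout)."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out, carry


def carry_select_add(a, b, width=16, block=4):
    """Functional model of a carry select adder with uniform block size."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = [], 0
    for lo in range(0, width, block):
        blk_a, blk_b = a_bits[lo:lo + block], b_bits[lo:lo + block]
        # Both outcomes are prepared before the real carry-in is known...
        sum0, cout0 = ripple_block(blk_a, blk_b, 0)
        sum1, cout1 = ripple_block(blk_a, blk_b, 1)
        # ...and the arriving carry only selects between them.
        result += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    value = sum(bit << i for i, bit in enumerate(result))
    return value, carry


value, cout = carry_select_add(12345, 54321)
assert (cout << 16) | value == 12345 + 54321
```

The duplicated block computation is the source of the extra logic gates
mentioned above.
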
A modification of the VBA was introduced by Chirca et al (2004) to
further minimize delay by using carry-look-ahead logic within the blocks.
Group sizing is chosen to balance the delay of each path within the adder.
This balancing of delay is intended to reduce power consumption by
eliminating the spurious glitch transitions that occur when the path
delays are unequal. A 32-bit implementation has seven logic levels, each
with complex CMOS gates. The results indicated speed comparable to
high-performance 32-bit adders. However, it was not shown how the CSK
compares to those adders when energy is taken into account.

A sparse carry-tree adder architecture was proposed by Mutoh (2001)
that reduces carry-tree density through the use of 4-bit conditional sum
computation. Carry signals are generated for every fourth bit (C3, C7, C23
and C27). This is in contrast to the Kogge-Stone adder architecture (KS)
presented by Kogge et al (1973), which generates a carry signal for every
bit.

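The dense carry generation of the Kogge-Stone architecture can be sketched
with bitwise operations, where each loop iteration corresponds to one level
of the parallel-prefix tree. This is a functional model with carry-in fixed
at zero, written here for illustration; it says nothing about gate-level
implementation or the sparse variant.

```python
def kogge_stone_add(a, b, width=32):
    """Functional model of a Kogge-Stone parallel-prefix adder (carry-in = 0)."""
    mask = (1 << width) - 1
    g = a & b & mask            # per-bit generate
    p = (a ^ b) & mask          # per-bit propagate
    half_sum = p                # kept for the final sum
    dist = 1
    while dist < width:         # log2(width) prefix levels
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2
    # After the tree, bit i of g is the carry out of bits 0..i, for every bit.
    carries_in = (g << 1) & mask
    total = half_sum ^ carries_in
    cout = (g >> (width - 1)) & 1
    return total, cout


total, cout = kogge_stone_add(123456789, 987654321)
assert (cout << 32) | total == 123456789 + 987654321
```
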
Digital circuits made up of classical gates dissipate a significant
amount of energy as bits are erased during logic operations. The use of
reversible logic gates to implement such circuits can significantly reduce
the power consumed. Keskar et al (2011) discuss various aspects of
reversible computing and reversible logic gates, and present a reversible
implementation of an eight-bit arithmetic and logic unit that is optimal
in terms of the number of gates used and the number of garbage outputs
produced.

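What makes a gate reversible can be checked directly: its truth table must
be a one-to-one mapping, so no input information is erased. The text does
not name specific gates, so the Toffoli gate is used below as a well-known
example chosen for this sketch.

```python
from itertools import product


def toffoli(a, b, c):
    """Toffoli (controlled-controlled-NOT) gate: flips c when a and b are both 1."""
    return a, b, c ^ (a & b)


states = list(product((0, 1), repeat=3))
images = [toffoli(*s) for s in states]

# Every output pattern occurs exactly once, so no information is erased.
assert sorted(images) == states
# Applying the gate twice restores the input (the gate is its own inverse).
assert all(toffoli(*toffoli(*s)) == s for s in states)
# With c = 0 the third output is AND(a, b); the copies of a and b that are
# carried along unused are what the literature calls garbage outputs.
assert all(toffoli(a, b, 0)[2] == (a & b) for a, b in product((0, 1), repeat=2))
```
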
2.1 SUMMARY
Various works are reported in the literature on the design of adders,
multipliers, flip-flops, FIR filters and the Arithmetic and Logic Unit.
Optimization of area, power and delay is important for all digital
circuits. The circuits discussed in the literature have advantages and
disadvantages. Some circuits operate at high speed but consume more power.
A few circuits consume less power but are unable to work at low voltages.
A few designs produce proper logic output but occupy more area.

2.2 OBJECTIVE
Optimization of power, delay and area is very important for all
digital circuits, and an investigation is carried out into the trade-off
between those parameters. Optimization techniques that enable digital
circuits to achieve high speed data processing are proposed. The proposed
techniques have been implemented in an adder, multiplier, FIR filter and
ALU, and their performance is compared with existing techniques. The
implementation of the optimized techniques and the results are discussed
in detail in the forthcoming chapters.
