
EC6601 VLSI Design

Department of Electronics and Communication


Engineering

QUESTION BANK
VI-SEMESTER
(Reg. 2013)

AUTHORS
1. Mr. M. Yuvaraj,
Assistant Professor, Dept. of ECE, Agni College of Technology.
2. Mrs. A Shifana Parween,
Assistant Professor, Dept. of ECE, Agni College of Technology.
3. Mr. G. Laxmanaa,
Assistant Professor, Dept. of ECE, Agni College of Technology.


1. What are the different operating regions for an MOS transistor?


 Cutoff region
 Non-saturated region
 Saturated region

2. What is Channel-length modulation?


Ideally, the current between the drain and source terminals in saturation is constant and
independent of the applied drain-to-source voltage. This is not entirely correct: the effective
length of the conductive channel is actually modulated by the applied VDS. Increasing VDS
causes the depletion region at the drain junction to grow, reducing the length of the effective
channel and thereby slightly increasing the drain current.
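
A small numerical sketch (parameter values are assumed for illustration, not taken from the text) of how the (1 + λ·VDS) channel-length-modulation term makes the saturation current depend slightly on VDS:

# Illustrative only: first-order saturation current with channel-length modulation.
# beta (A/V^2), Vt (V) and lambda_ (1/V) are assumed example values, not process data.
beta, Vt, lambda_ = 200e-6, 0.5, 0.1

def id_sat(vgs, vds):
    """Saturation-region drain current with the (1 + lambda*Vds) correction."""
    return 0.5 * beta * (vgs - Vt) ** 2 * (1 + lambda_ * vds)

for vds in (0.6, 0.9, 1.2):
    print(f"Vds = {vds:.1f} V -> Id = {id_sat(1.0, vds) * 1e6:.1f} uA")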

3. Define Threshold voltage in CMOS?


The threshold voltage VT of a MOS transistor can be defined as the voltage applied
between the gate and the source of the transistor below which the drain-to-source
current IDS effectively drops to zero.

4. What is Body effect?


The threshold voltage VT is not a constant with respect to the voltage difference between the
substrate and the source of the MOS transistor. This effect is called the substrate-bias effect or
body effect.
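
As a worked sketch (all parameters below are assumed example values, not from the text), the body effect raises VT with source-to-body bias according to VT = VT0 + γ(√(2φF + VSB) − √(2φF)):

# Illustrative sketch of the body (substrate-bias) effect; assumed example values:
# VT0 = zero-bias threshold, gamma = body-effect coefficient (V^0.5),
# phi_f = Fermi potential (V), vsb = source-to-body voltage (V).
from math import sqrt

VT0, gamma, phi_f = 0.45, 0.4, 0.3

def vt_body(vsb):
    """Threshold voltage raised by a positive source-to-body bias."""
    return VT0 + gamma * (sqrt(2 * phi_f + vsb) - sqrt(2 * phi_f))

for vsb in (0.0, 0.5, 1.0):
    print(f"Vsb = {vsb:.1f} V -> VT = {vt_body(vsb):.3f} V")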

5. What is Scaling?
Scaling is the proportional adjustment of the dimensions of an electronic device while
maintaining the electrical properties of the device; it results in a device that is either larger or
smaller than the unscaled device.

6. What is Elmore’s constant?


In general, most circuits of interest can be represented as an RC tree, i.e., an RC
circuit with no loops. The root of the tree is the voltage source and the leaves are the
capacitors at the ends of the branches. The Elmore delay model [Elmore48] estimates
the delay from a source switching to one of the leaf nodes changing as the sum, over
each node i, of the capacitance Ci on the node multiplied by the effective resistance
Ris on the shared portion of the paths from the source to the node and to the leaf.
Application of the Elmore delay is best illustrated through examples.
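
For example (the R and C values below are assumed, not from the text), the Elmore delay of a three-segment RC ladder is the sum over each node of its capacitance times the resistance shared with the path to the leaf:

# Elmore delay of a simple RC ladder: source -R1- n1 -R2- n2 -R3- n3 (leaf).
# For a ladder, the resistance shared between the source-to-node path and the
# source-to-leaf path is just the total resistance up to that node.
R = [1e3, 1e3, 1e3]           # ohms per segment (assumed values)
C = [10e-15, 10e-15, 20e-15]  # farads per node (assumed values)

t_elmore = 0.0
r_shared = 0.0
for r_i, c_i in zip(R, C):
    r_shared += r_i           # resistance from the source up to this node
    t_elmore += r_shared * c_i

print(f"Elmore delay ~ {t_elmore * 1e12:.1f} ps")   # 10 + 20 + 60 = 90 ps here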


7. Define Static CMOS logic.


The principle of static CMOS logic is that the output is connected to ground through
an n-block and to VDD through a dual p-block. Without changes of the inputs this
gate consumes only the leakage currents of some transistors. When it is switching it
draws an additional current which is needed to charge and discharge the internal
capacitances and the load. Although the gate's logic function is ideally independent
of the transistor channel widths, they essentially determine the dynamic behavior:
wider transistors will switch a capacitive load faster, but they will also cause a larger
input capacitance of the gate. Unless otherwise noted, minimum-width and, of course,
minimum-channel-length transistors are assumed. For given capacitances the
transistors' on-state current Ion will limit the switching speed of the gate and,
consequently, the maximum clock frequency of a synchronous circuit.

8. Define Dynamic CMOS logic.


Dynamic logic is distinguished from so-called static logic in that dynamic logic uses
a clock signal in its implementation of combinational logic circuits. The usual use of
a clock signal is to synchronize transitions in sequential logic circuits. For most
implementations of combinational logic, a clock signal is not even needed.


9. What is meant by transmission gate?


A transmission gate, or analog switch, is defined as an electronic element that will
selectively block or pass a signal level from the input to the output. This solid-state
switch is composed of a pMOS transistor and an nMOS transistor. The control gates are
biased in a complementary manner so that both transistors are either on or off. A
transmission gate consists of an n-channel transistor and a p-channel transistor with
separate gates and a common source and drain.

10.What are the different types of power dissipation?


There are three types of power dissipation. They are
 Static power dissipation: Ps = leakage current × supply voltage.
 Dynamic power dissipation: Pd = CL · Vdd² · fclk.
 Short-circuit power dissipation: Psc = Imean × Vdd.
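
A quick numerical sketch of the three components; all values below are assumed for illustration only:

# Assumed example numbers, only to show how the three terms are combined.
Vdd    = 1.2      # supply voltage (V)
CL     = 50e-15   # switched load capacitance (F)
fclk   = 500e6    # clock frequency (Hz)
I_leak = 10e-9    # total leakage current (A)
I_mean = 5e-6     # mean short-circuit current (A)

P_static  = I_leak * Vdd            # Ps = leakage current x supply voltage
P_dynamic = CL * Vdd ** 2 * fclk    # Pd = CL * Vdd^2 * fclk (activity factor of 1 assumed)
P_short   = I_mean * Vdd            # Psc = Imean * Vdd

print(f"Ps = {P_static*1e9:.1f} nW, Pd = {P_dynamic*1e6:.1f} uW, Psc = {P_short*1e6:.1f} uW")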

11.What are synchronizers?


A synchronizer is a circuit that accepts an input that can change at arbitrary times and
produces an output aligned to the synchronizer’s clock. Because the input can change
during the synchronizer’s aperture, the synchronizer has a nonzero probability of
producing a metastable output.


12. State bistability principle


A bistable circuit has two stable states. In the absence of any triggering, the circuit remains
in a single state (assuming that the power supply remains applied to the circuit), and
hence remembers a value. A trigger pulse must be applied to change the state of the
circuit. Another common name for a bistable circuit is flip-flop (unfortunately, an edge-
triggered register is also referred to as a flip-flop).

13. Explain about C2MOS latch.


The dynamic latch of Figure 10.17(d) can also be drawn as a clocked tristate. Such a form is
sometimes called clocked CMOS (C2MOS); the output is driven through the nMOS and pMOS
working in parallel. C2MOS is slightly smaller because it eliminates two contacts.

14. What is meant by true single phase clocked register?


The True Single-Phase Clocked Register (TSPCR) uses a single clock
(without an inverse clock). The basic single-phase positive and negative latches
are shown in Figure 7.30. For the positive latch, when CLK is high, the latch is in
the transparent mode and corresponds to two cascaded inverters; the latch is non-
inverting, and propagates the input to the output. On the other hand, when CLK =
0, both inverters are disabled, and the latch is in hold mode. Only the pull-up
networks are still active, while the pull-down circuits are deactivated. As a result
of the dual-stage approach, no signal can ever propagate from the input of the
latch to the output in this mode. A register can be constructed by cascading
positive and negative latches.

15. Define pipelining.


Pipelining is a popular design technique often used to accelerate the operation of
the datapaths in digital processors. The idea is easily explained with the example of
Figure 7.40a. The goal of the presented circuit is to compute log(|a - b|), where both a and
b represent streams of numbers, that is, the computation must be performed on a large set
of input values.

16.What is meant by Datapath circuits?


A datapath is a collection of functional units (such as arithmetic logic
units or multipliers), registers, and buses that perform data-processing operations.
Along with the control unit, it constitutes the central processing unit (CPU).

17.How does a CLA differ from an RCA?


CLA: The carry lookahead adder (CLA) solves the carry delay problem by calculating the carry
signals in advance, based on the input signals. It is based on the fact that a carry signal will be
generated in two cases: (1) when both bits ai and bi are 1, or (2) when one of the two bits is 1
and the carry-in is 1.

RCA: In the ripple carry adder, the output is known only after the carry generated by the
previous stage is produced. Thus, the sum of the most significant bit is only available after the
carry signal has rippled through the adder from the least significant stage to the most significant
stage. As a result, the final sum and carry bits are valid only after a considerable delay.
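
The difference can be sketched behaviorally (this is not a gate-level design; the input bit patterns are made-up examples): the ripple adder computes ci+1 = gi + pi·ci one stage after another, while the lookahead adder expands the same recurrence so every carry depends only on the inputs and c0.

# Behavioral sketch of 4-bit ripple-carry vs. carry-lookahead carry computation.
# gi = ai & bi (generate), pi = ai ^ bi (propagate); input values are assumed examples.
a = [1, 0, 1, 1]   # bit 0 first
b = [1, 1, 0, 1]
c0 = 0

g = [ai & bi for ai, bi in zip(a, b)]
p = [ai ^ bi for ai, bi in zip(a, b)]

# Ripple carry: each carry waits for the previous one.
ripple = [c0]
for i in range(4):
    ripple.append(g[i] | (p[i] & ripple[i]))

# Carry lookahead: every carry written directly in terms of g, p and c0.
c1 = g[0] | (p[0] & c0)
c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]) \
     | (p[3] & p[2] & p[1] & p[0] & c0)

print("ripple carries   :", ripple[1:])
print("lookahead carries:", [c1, c2, c3, c4])   # identical values, but no rippling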

18.List out different high speed adders.


 Carry look ahead adder
 Carry skip adder
 Carry save adder
 Carry select adder
 Carry bypass adder

19.Define Accumulator.
An accumulator is a register for short-term, intermediate storage of arithmetic and
logic data in a computer's CPU (central processing unit). The term "accumulator" is
rarely used in reference to contemporary CPUs, having been replaced around the turn
of the millennium by the term "register." In modern computers, any register can
function as an accumulator.


20.Draw the generic block diagram of digital processor?

21. Write the design style classification?

The IC design style can be classified as

 Full custom Design ASICs


 Semi custom Design ASICs
o Standard Cell Design
o Gate Array Design
 Channeled Gate Array
 Channel less Gate Array
 Programmable ASICs
o PLDs
o FPGA
22.Differentiate between channeled & channel less gate array.

Channeled gate array:

 The channeled gate array was the first to be developed. In a channeled gate array, space is
left between the rows of transistors for wiring.
 A channeled gate array is similar to a CBIC. Both use rows of cells separated by channels
used for interconnect. One difference is that the space for interconnect between rows of cells
is fixed in height in a channeled gate array, whereas the space between rows of cells may be
adjusted in a CBIC.

Channelless gate array:

 This channelless gate-array architecture is now more widely used. The routing on a
channelless gate array uses rows of unused transistors.
 The key difference between a channelless gate array and a channeled gate array is that there
are no predefined areas set aside for routing between cells on a channelless gate array.
Instead, we route over the top of the gate-array devices. We can do this because we customize
the contact layer that defines the connections between metal 1, the first layer of metal, and
the transistors.

23. What is a FPGA?


A field programmable gate array (FPGA) is a programmable logic device that
supports implementation of relatively large logic circuits. FPGAs can be used to
implement a logic circuit with more than 20,000 gates whereas a CPLD can implement
circuits of up to about 20,000 equivalent gates.

24.What is an antifuse?
An antifuse is normally high resistance (>100 MΩ). On application of appropriate
programming voltages, the antifuse is changed permanently to a low-resistance structure
(200–500 Ω).


1. A. Discuss DC transfer characteristics of the CMOS. (8)

DC transfer characteristics
Digital circuits are merely analog circuits used over a special portion of their range. The DC
transfer characteristics of a circuit relate the output voltage to the input voltage, assuming the
input changes slowly enough that capacitances have plenty of time to charge or discharge.
Specific ranges of input and output voltages are defined as valid 0 and 1 logic levels. This
section explores the DC transfer characteristics of CMOS gates and pass transistors.

Static CMOS Inverter DC Characteristics


Let us derive the DC transfer function (Vout vs. Vin) for the static CMOS inverter shown in
Figure 2.25. We begin with Table 2.2, which outlines various regions of operation for the n- and
p-transistors. In this table, Vtn is the threshold voltage of the n-channel device, and Vtp is the
threshold voltage of the p-channel device. Note that Vtp is negative. The equations are given both
in terms of Vgs/Vds and Vin/Vout. As the source of the nMOS transistor is grounded, Vgsn =
Vin and Vdsn = Vout. As the source of the pMOS transistor is tied to VDD, Vgsp = Vin – VDD
and Vdsp = Vout – VDD. The objective is to find the variation in output voltage (Vout) as a
function of the input voltage (Vin). This may be done graphically, analytically (see Exercise
2.16), or through simulation [Carr72]. Given Vin, we must find Vout subject to the constraint
that Idsn = |Idsp|. For simplicity, we assume Vtp = –Vtn and that the pMOS transistor is 2–3 times
as wide as the nMOS transistor so that βn = βp. We relax this assumption in Section 2.5.2. We
commence with the graphical representation of the simple algebraic equations described by EQ
(2.10) for the two transistors shown in Figure 2.26(a). The plot shows Idsn and Idsp in terms of
Vdsn and Vdsp for various values of Vgsn and Vgsp. Figure 2.26(b) shows the same plot of
Idsn and |Idsp|, now in terms of Vout for various values of Vin. The possible operating points of
the inverter, marked with dots, are the values of Vout where Idsn = |Idsp| for a given value of
Vin. These operating points are plotted on Vout vs. Vin axes in Figure 2.26(c) to show the
inverter DC transfer characteristics. The supply current IDD = Idsn = |Idsp| is also plotted against
Vin in Figure 2.26(d), showing that both transistors are momentarily ON as Vin passes through
voltages between GND and VDD, resulting in a pulse of current drawn from the power supply.
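
The operating-point search described above can be mimicked numerically. The sketch below uses ideal square-law devices with assumed VDD, thresholds and matched betas (no channel-length modulation), and for each Vin simply scans Vout for the value where Idsn = |Idsp|:

# Numerical sketch of the inverter VTC: for each Vin, find Vout where Idsn = |Idsp|.
# Ideal long-channel (square-law) model, matched betas, assumed parameter values.
VDD, Vtn, Vtp, beta = 1.8, 0.4, -0.4, 100e-6

def ids(vgs, vds, vt, b):
    """Square-law drain current (returns 0 in cutoff); all arguments are magnitudes."""
    if vgs <= vt:
        return 0.0
    if vds < vgs - vt:                        # linear region
        return b * ((vgs - vt) * vds - vds * vds / 2)
    return 0.5 * b * (vgs - vt) ** 2          # saturation

def vtc_point(vin, steps=1000):
    best_vout, best_err = 0.0, float("inf")
    for k in range(steps + 1):
        vout = VDD * k / steps
        i_n = ids(vin, vout, Vtn, beta)                        # nMOS: Vgs = Vin, Vds = Vout
        i_p = ids(VDD - vin, VDD - vout, abs(Vtp), beta)       # pMOS by symmetry (magnitudes)
        err = abs(i_n - i_p)
        if err < best_err:
            best_vout, best_err = vout, err
    return best_vout

for vin in (0.0, 0.6, 0.9, 1.2, 1.8):
    print(f"Vin = {vin:.1f} V -> Vout ~ {vtc_point(vin):.2f} V")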

The operation of the CMOS inverter can be divided into five regions indicated on Figure 2.26(c).
The state of each transistor in each region is shown in Table 2.3. In region A, the nMOS
transistor is OFF so the pMOS transistor pulls the output to VDD. In region B, the nMOS
transistor starts to turn ON, pulling the output down. In region C, both transistors are in
saturation. Notice that ideal transistors are only in region C for Vin = VDD/2 and that the slope
of the transfer curve in this example is –∞ in this region, corresponding to infinite gain. Real
transistors have finite output resistances on account of channel-length modulation, described in
Section 2.4.2, and thus have finite slopes over a broader region C. In region D, the pMOS
transistor is partially ON and in region E, it is completely OFF.

Region   Condition                      p-device     n-device     Output

A        0 <= Vin < Vtn                 linear       cutoff       Vout = VDD
B        Vtn <= Vin < VDD/2             linear       saturated    Vout > VDD/2
C        Vin = VDD/2                    saturated    saturated    Vout drops sharply
D        VDD/2 < Vin <= VDD – |Vtp|     saturated    linear       Vout < VDD/2
E        Vin > VDD – |Vtp|              cutoff       linear       Vout = 0

2. Explain in detail about the current–voltage characteristics of the MOS transistor.

I-V Characteristics
When familiarizing yourself with a new process, a starting point is to plot the current–voltage
(I-V) characteristics. Although digital designers seldom make calculations directly from these plots,
it is helpful to know the ON current of nMOS and pMOS transistors, how severely velocity-
saturated the process is, how the current rolls off below threshold, how the devices are affected
by DIBL and body effect, and so forth. These plots are made with DC sweeps, as discussed in
Section 8.2.2. Each transistor is 1 μm wide in a representative 65 nm process at 70 °C with VDD
= 1.0 V. Figure 8.16 shows nMOS characteristics and Figure 8.17 shows pMOS characteristics.
Figure 8.16(a) plots Ids vs. Vds at various values of Vgs, as was done in Figure 8.5. The saturation
current would ideally increase quadratically with Vgs – Vt, but in this plot it shows closer to a
linear dependence, indicating that the nMOS transistor is severely velocity saturated (α closer to
1 than 2 in the α-power model). The significant increase in saturation current with Vds is caused
by channel-length modulation. Figure 8.16(b) makes a similar plot for a device with a drawn
channel length of twice minimum. The current drops by less than a factor of two because it
experiences less velocity saturation. The current is slightly flatter in saturation because channel-
length modulation has less impact at longer channel lengths. Figure 8.16(c) plots Ids vs. Vgs on a
semilogarithmic scale for Vds = 0.1 V and 1.0 V. The straight line at low Vgs indicates that the
current rolls off exponentially below threshold. The difference in subthreshold leakage at the
varying drain voltage reflects the effects of drain-induced barrier lowering (DIBL).
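
The velocity-saturation observation above is often summarized with the α-power model. The sketch below uses assumed, purely illustrative parameters: α = 2 gives the ideal quadratic dependence on Vgs − Vt, while α close to 1 gives the nearly linear dependence seen in short-channel devices.

# Illustrative alpha-power-law saturation current: Idsat = Id0 * ((Vgs - Vt)/(VDD - Vt))^alpha.
# Id0, Vt and VDD are assumed example values; alpha = 2 is the long-channel limit,
# alpha -> 1 models a strongly velocity-saturated device.
VDD, Vt, Id0 = 1.0, 0.3, 1.0e-3   # Id0 = on-current at Vgs = VDD (assumed)

def idsat(vgs, alpha):
    if vgs <= Vt:
        return 0.0
    return Id0 * ((vgs - Vt) / (VDD - Vt)) ** alpha

for vgs in (0.5, 0.7, 0.9, 1.0):
    print(f"Vgs={vgs:.1f} V  ideal(a=2): {idsat(vgs, 2)*1e3:.3f} mA   "
          f"velocity-saturated(a=1.1): {idsat(vgs, 1.1)*1e3:.3f} mA")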


2.a. Discuss the techniques to reduce switching activity in static and dynamic CMOS circuits.
Circuit Families
Static CMOS circuits with complementary nMOS pulldown and pMOS pullup networks
are used for the vast majority of logic gates in integrated circuits. They have good noise
margins, and are fast, low power, insensitive to device variations, easy to design, widely
supported by CAD tools, and readily available in standard cell libraries. When noise does exceed
the margins, the gate delay increases because of the glitch, but the gate eventually will settle to
the correct answer. Most design teams now use static CMOS exclusively for combinational
logic. This section begins with a number of techniques for optimizing static CMOS circuits.
Nevertheless, performance or area constraints occasionally dictate the need for other circuit
families. The most important alternative is dynamic circuits. However, we begin by considering
ratioed circuits, which are simpler and offer a helpful conceptual transition between static and
dynamic. We also consider pass transistors, which had their zenith in the 1990s for general-
purpose logic and still appear in specialized applications.

Static CMOS

Designers accustomed to AND and OR functions must learn to think in terms of NAND
and NOR to take advantage of static CMOS. In manual circuit design, this is often done through
bubble pushing. Compound gates are particularly useful to perform complex functions with
relatively low logical efforts. When a particular input is known to arrive latest, the gate can be
optimized to favor that input. Similarly, when either the rising or falling edge is known to be
more critical, the gate can be optimized to favor that edge. We have focused on building gates
with equal rising and falling delays; however, using smaller pMOS transistors can reduce power,
area, and delay. In processes with multiple threshold voltages, multiple flavors of gates can be
constructed with different speed/leakage power trade-offs

Bubble Pushing. CMOS stages are inherently inverting, so AND and OR functions must be built
from NAND and NOR gates. DeMorgan’s law helps with this conversion: In general, logical
effort of compound gates can be different for different inputs. Figure 9.4 shows how logical
efforts can be estimated for the AOI21, AOI22, and a more complex compound AOI gate. The
transistor widths are chosen to give the same drive as a unit inverter. The logical effort of each
input is the ratio of the input capacitance of that input to the input capacitance of the inverter. For
the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than for
the two AND terminals (A, B). The parasitic delay is crudely estimated from the total diffusion
capacitance on the output node by summing the sizes of the transistors attached to the output.
These relations are illustrated graphically in Figure 9.1. A NAND gate is equivalent to an OR of
inverted inputs. A NOR gate is equivalent to an AND of inverted inputs. The same relationship
applies to gates with more inputs. Switching between these representations is easy to do on a
whiteboard and is often called bubble pushing.
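
The logical-effort bookkeeping described above can be reproduced with simple capacitance ratios. The sketch below assumes a unit inverter with a 2:1 pMOS:nMOS width ratio (input capacitance of 3 units) and the usual sizing of series stacks for equal drive; it is an estimate, not a layout-accurate value.

# Logical effort g = (input capacitance of the gate input) / (input capacitance of
# a unit inverter). Unit inverter: nMOS width 1 + pMOS width 2 -> Cin = 3 units.
C_INV = 3

def g_nand(n):
    """n-input NAND: series nMOS sized n, parallel pMOS sized 2 -> g = (n + 2)/3."""
    return (n + 2) / C_INV

def g_nor(n):
    """n-input NOR: parallel nMOS sized 1, series pMOS sized 2n -> g = (2n + 1)/3."""
    return (2 * n + 1) / C_INV

for n in (2, 3, 4):
    print(f"{n}-input NAND g = {g_nand(n):.2f}   {n}-input NOR g = {g_nor(n):.2f}")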


Dynamic Circuits
Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to the
inputs with a single resistive pullup. The drawbacks of ratioed circuits include slow rising
transitions, contention on the falling transitions, static power dissipation, and a nonzero VOL.
Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor rather than a
pMOS that is always ON. Figure 9.21 compares (a) static CMOS, (b) pseudo-nMOS, and (c)
dynamic inverters. Dynamic circuit operation is divided into two modes, as shown in Figure
9.22. During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the output Y
high. During evaluation, the clock is 1 and the clocked pMOS turns OFF. The output may
remain high or may be discharged low through the pulldown network. Dynamic circuits are the
fastest commonly used circuit family because they have lower input capacitance and no
contention during switching. They also have zero static power dissipation. However, they require
careful clocking, consume significant dynamic power, and are sensitive to noise during
evaluation. Clocking of dynamic circuits will be discussed in much more detail in Section 10.5.
In Figure 9.21(c), if the input A is 1 during precharge, contention will take
place because both the pMOS and nMOS transistors will be ON. When the input cannot be
guaranteed to be 0 during precharge, an extra clocked evaluation transistor can be added to the
bottom of the nMOS stack to avoid contention as shown in Figure 9.23. The extra transistor is
sometimes called a foot.

Figure 9.24 shows generic footed and unfooted gates. Figure 9.25 estimates the falling logical
effort of both footed and unfooted dynamic gates. As usual, the pulldown transistors’ widths are
chosen to give unit resistance. Precharge occurs while the gate is idle and often may take place
more slowly. Therefore, the precharge transistor width is chosen for twice unit resistance. This
reduces the capacitive load on the clock and the parasitic capacitance at the expense of greater
rising delays. We see that the logical efforts are very low. Footed gates have higher logical effort

than their unfooted counterparts but are still an improvement over static logic. In practice, the
logical effort of footed gates is better than predicted because velocity saturation means series
nMOS transistors have less resistance than we have estimated. Moreover, logical efforts are also
slightly better than predicted because there is no contention between nMOS and pMOS
transistors during the input transition. The size of the foot can be increased relative to the other
nMOS transistors to reduce logical effort of the other inputs at the expense of greater clock
loading. Like pseudo-nMOS gates, dynamic gates are particularly well suited to wide NOR
functions or multiplexers because the logical effort is independent of the number of inputs. Of
course, the parasitic delay does increase with the number of inputs because there is more
diffusion capacitance on the output node. Characterizing the logical effort and parasitic delay of
dynamic gates is tricky because the output tends to fall much faster than the input
rises, leading to potentially misleading dependence of propagation delay on fanout
[Sutherland99]. A fundamental difficulty with dynamic circuits is the monotonicity requirement.
While a dynamic gate is in evaluation, the inputs must be monotonically rising. That is, the input
can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and remain HIGH, but
not start HIGH and fall LOW. Figure 9.26 shows waveforms for a footed dynamic inverter in
which the input violates monotonicity. During precharge, the output is pulled HIGH. When the
clock rises, the input is HIGH so the output is discharged LOW through the pulldown network,
as you would want to have happen in an inverter. The input later falls LOW, turning off the
pulldown network. However, the precharge transistor is also OFF so the output floats, staying
LOW rather than rising as it would in a normal inverter. The output will remain low until the
next precharge step. In summary, the inputs must be monotonically rising for the dynamic gate to
compute the correct function. Unfortunately, the output of a dynamic gate begins HIGH and
monotonically falls LOW during evaluation. This monotonically falling output X is not a suitable
input to asecond dynamic gate expecting monotonically rising signals, as shown in Figure 9.27.
Dynamic gates sharing the same clock cannot be directly connected.


2.b. Explain the various sources of power dissipation in CMOS circuits.

Sources of Power Dissipation


Power dissipation in CMOS circuits comes from two components:
 Dynamic dissipation due to
○ charging and discharging load capacitances as gates switch
○ “short-circuit” current while both pMOS and nMOS stacks are partially ON
 Static dissipation due to
○ subthreshold leakage through OFF transistors
○ gate leakage through gate dielectric
○ junction leakage from source/drain diffusions
○ contention current in ratioed circuits (see Section 9.2.2)
Putting this together gives the total power of a circuit:

Ptotal = Pdynamic + Pstatic, where Pdynamic = Pswitching + Pshort-circuit

Power can also be considered in active, standby, and sleep modes. Active power is the power
consumed while the chip is doing useful work. It is usually dominated by Pswitching. Standby
power is the power consumed while the chip is idle. If clocks are stopped and ratioed circuits are
disabled, the standby power is set by leakage. In sleep mode, the supplies to unneeded circuits
are turned off to eliminate leakage. This drastically reduces the sleep power required, but the
chip requires time and energy to wake up so sleeping is only viable if the chip will idle for long
enough.[Gonzalez96] found that roughly one-third of microprocessor power is spent on the
clock, another third on memories, and the remaining third on logic and wires. In nanometer
technologies, nearly one-third of the power is leakage. High-speed I/O contributes growing
component too. For example, Figure 5.6 shows the active power consumption of Sun’s 8-core 84
W Niagra2 processor [Nawathe08]. The cores and other components collectively account for
clock, logic, and wires. The next sections investigate how to estimate and minimize each of these
components of power.

Dynamic Power
Dynamic power consists mostly of the switching power, given in EQ (5.10). The supply voltage
VDD and frequency f are readily known by the designer. To estimate this power, one can
consider each node of the circuit. The capacitance of the node is the sum of the gate, diffusion,
and wire capacitances on the node. The activity factor can be estimated using techniques
described in Section 5.2.1 or measured from logic simulations. The effective capacitance of the
node is its true capacitance multiplied by the activity factor. The switching power depends on the

sum of the effective capacitances of all the nodes. Activity factors can be heavily dependent on
the particular task being executed. For example, a processor in a cell phone will use more power
while running video games than while displaying a calendar. CAD tools do a fine job of power
estimation when given a realistic workload. Low power design involves considering and
reducing each of the terms in switching power.
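
A minimal sketch of the node-by-node estimate described here (the node list, capacitances and activity factors are made-up examples, not from any real netlist): the effective capacitance of each node is its true capacitance times its activity factor, and the switching power sums α·C·VDD²·f over all nodes.

# Node-level switching power estimate: P = sum_i (alpha_i * C_i) * VDD^2 * f.
VDD, f = 1.0, 1e9   # supply (V) and clock frequency (Hz), assumed

nodes = [            # (node name, total capacitance in F, activity factor)
    ("clk_buf", 30e-15, 1.0),
    ("alu_out", 80e-15, 0.15),
    ("bus_a",   120e-15, 0.10),
]

P_sw = sum(alpha * c for _, c, alpha in nodes) * VDD ** 2 * f
P_total_dyn = 1.1 * P_sw   # short-circuit power conservatively added as ~10%

print(f"switching power ~ {P_sw*1e6:.1f} uW, with short-circuit ~ {P_total_dyn*1e6:.1f} uW")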

As VDD is a quadratic term, it is good to select the minimum VDD that can support the required
frequency of operation. Likewise, we choose the lowest frequency of operation that achieves the
desired end performance. The activity factor is mainly reduced by putting unused blocks to sleep.
Finally, the circuit may be optimized to reduce the overall load capacitance of each section.
Dynamic power also includes a short-circuit power component caused by current rushing from
VDD to GND when both the pull-up and pull-down networks are partially ON while a transistor
switches. This is normally less than 10% of the whole, so it can be conservatively estimated by
adding 10% to the switching power. Switching power is consumed by delivering energy to
charge a load capacitance, then dumping this energy to GND. Intuitively, one might expect that
power could be saved by shuffling the energy around to where it is needed rather than just
dumping it. Resonant circuits, and adiabatic charge-recovering circuits [Maksimovic00, Sathe07]
seek to achieve such a goal. Unfortunately, all of these techniques add complexity that detracts

from the potential energy savings, and none have

3.a. Explain about Static CMOS circuit.

The most widely used logic style is static complementary CMOS. The static CMOS style
is really an extension of the static CMOS inverter to multiple inputs. In review, the primary
advantage of the CMOS structure is robustness (i.e, low sensitivity to noise), good performance,
and low power consumption with no static power dissipation. Most of those properties are
carried over to large fan-in logic gates implemented using a similar circuit topology. The
complementary CMOS circuit style falls under a broad class of logic circuits called static circuits
in which at every point in time (except during the switching transients), each gate output is
connected to either VDD or Vss via a low-resistance path. Also, the outputs of the gates assume

at all times the value of the Boolean function implemented by the circuit (ignoring, once again,
the transient effects during switching periods). This is in contrast to the dynamic circuit class,
which relies on temporary storage of signal values on the capacitance of high-impedance circuit
nodes. The latter approach has the advantage that the resulting gate is simpler and faster. Its
design and operation are however more involved and prone to failure due to an increased
sensitivity to noise. In this section, we sequentially address the design of various static circuit
flavors including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and
pass-transistor logic. The issues of scaling to lower power supply voltages and threshold voltages
will also be dealt with.

Complementary CMOS
Concept
A static CMOS gate is a combination of two networks, called the pull-up network (PUN)
and the pull-down network (PDN) (Figure 6.2). The figure shows a generic N input logic gate
where all inputs are distributed to both the pull-up and pull-down networks. The function of the
PUN is to provide a connection between the output and VDD anytime the output of the logic gate
is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the
output to VSS when the output of the logic gate is meant to be 0. The PUN and PDN networks
are constructed in a mutually exclusive fashion such that one and only one of the networks is
conducting in steady state. In this way, once the transients have settled, a path always exists
between VDD and the output F, realizing a high output (“one”),
or, alternatively, between VSS and F for a low output (“zero”). This is equivalent to stating that
the output node is always a low-impedance node in steady state.

In constructing the PDN and PUN networks, the following observations should be kept in mind:

• A transistor can be thought of as a switch controlled by its gate signal. An NMOS switch is on
when the controlling signal is high and is off when the controlling signal is low. A PMOS
transistor acts as an inverse switch that is on when the controlling signal is low and off when the
controlling signal is high.
• The PDN is constructed using NMOS devices, while PMOS transistors are used in the PUN.
The primary reason for this choice is that NMOS transistors produce “strong zeros,” and PMOS

devices generate “strong ones”. To illustrate this, consider the examples shown in Figure 6.3. In
Figure 6.3a, the output capacitance is initially charged to VDD. Two possible discharge scenarios
are shown. An NMOS device pulls the output all the way down to GND, while a PMOS lowers
the output no further than |VTp| — the PMOS turns off at that point, and stops contributing
discharge current. NMOS transistors are hence the preferred devices in the PDN. Similarly, two
alternative approaches to charging up a capacitor are shown, with the output initially at GND. A PMOS
switch succeeds in charging the output all the way to VDD, while the NMOS device fails to raise
the output above VDD-VTn. This explains why PMOS transistors are preferentially used in a
PUN. A set of construction rules can be derived to construct logic functions

NMOS devices connected in series correspond to an AND function. With all the inputs high, the
series combination conducts and the value at one end of the chain is transferred to the other end.
Similarly, NMOS transistors connected in parallel represent an OR function. A conducting path
exists between the output and input terminal if at least one of the inputs is high. Using similar
arguments, construction rules for PMOS networks can be formulated. A series connection of
PMOS devices conducts if both inputs are low, representing a NOR function (A'·B' = (A + B)'),
while PMOS transistors in parallel implement a NAND function (A' + B' = (A·B)'). Using
De Morgan’s theorems ((A + B)' = A'·B' and (A·B)' = A' + B'), it can be shown that the pull-up
and pull-down networks of a complementary CMOS structure are dual networks. This means
that a parallel connection of
transistors in the pull-up network corresponds to a series connection of the corresponding
devices in the pull-down network, and vice versa. Therefore, to construct a CMOS gate, one of
the networks (e.g., PDN) is implemented using combinations of series and parallel devices. The
other network (i.e., PUN) is obtained using duality principle by walking the hierarchy, replacing
series sub-nets with parallel sub-nets, and parallel sub-nets with series sub-nets. The complete
CMOS gate is constructed by combining the PDN with the PUN.

• The complementary gate is naturally inverting, implementing only functions such as NAND,
NOR, and XNOR. The realization of a non-inverting Boolean function (such as AND, OR, or
XOR) in a single stage is not possible, and requires the addition of an extra inverter stage.
• The number of transistors required to implement an N-input logic gate is 2N.
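
A tiny sketch comparing the transistor counts mentioned in this section and the ones that follow (complementary CMOS 2N, pseudo-NMOS N+1, dynamic N+2):

# Transistor-count comparison for an N-input gate in the three styles discussed here.
def counts(n):
    return {"complementary CMOS": 2 * n, "pseudo-NMOS": n + 1, "dynamic": n + 2}

for n in (2, 4, 8):
    print(n, "inputs:", counts(n))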

Ratioed Logic
Concept
Ratioed logic is an attempt to reduce the number of transistors required to implement a
given logic function, at the cost of reduced robustness and extra power dissipation. The purpose

of the PUN in complementary CMOS is to provide a conditional path between VDD and the
output when the PDN is turned off. In ratioed logic, the entire PUN is replaced with a single
unconditional load device that pulls up the output for a high output. Instead of a combination of
active pull-down and pull-up networks, such a gate consists of an NMOS pull-down network that
realizes the logic function, and a simple load device. Figure 6.27b shows an example of ratioed
logic, which uses a grounded PMOS load and is referred to as a pseudo-NMOS gate.

The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1 versus 2N for
complementary CMOS). The nominal high output voltage (VOH) for this gate is VDD since the
pull-down devices are turned off when the output is pulled high (assuming that VOL is below
VTn). On the other hand, the nominal low output voltage is

not 0 V since there is a fight between the devices in the PDN and the grounded PMOS load
device. This results in reduced noise margins and more importantly static power dissipation.

The sizing of the load device relative to the pull-down devices can be used to trade off
parameters such as noise margin, propagation delay, and power dissipation. Since the voltage
swing on the output and the overall functionality of the gate depends upon the ratio between the
NMOS and PMOS sizes, the circuit is called ratioed. This is in contrast to the ratioless logic
styles, such as complementary CMOS, where the low and high levels do not depend upon
transistor sizes. Computing the dc-transfer characteristic of the pseudo-NMOS proceeds along
paths
similar to those used for its complementary CMOS counterpart. The value of VOL is obtained by
equating the currents through the driver and load devices for Vin = VDD. At this operation point,
it is reasonable to assume that the NMOS device resides in linear mode (since the output should
ideally be close to 0V), while the PMOS load is saturated. In order to make VOL as small as
possible, the PMOS device should be sized much smaller than the NMOS pull-down devices.
Unfortunately, this has a negative impact on the propagation delay for charging up the output
node since the current provided by the PMOS device is limited.


A major disadvantage of the pseudo-NMOS gate is the static power that is dissipated when the
output is low, through the direct current path that exists between VDD and GND. The static
power consumption in the low-output mode is easily derived as Plow = VDD · Ilow, where Ilow
is the current drawn through the always-on PMOS load.
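
A numerical sketch of that derivation (square-law devices with assumed parameter values): VOL is found where the NMOS linear-region current equals the grounded-PMOS saturation current, and the low-output static power is then VDD times that current.

# Pseudo-NMOS VOL and static power sketch (assumed square-law parameters).
VDD = 1.8
kn, Vtn = 400e-6, 0.4    # NMOS driver: beta (A/V^2) and threshold (assumed)
kp, Vtp = 50e-6, 0.4     # grounded-PMOS load: beta and |Vtp| (assumed, deliberately weak)

def i_nmos_lin(vol):
    """NMOS driver in the linear region with Vgs = VDD, Vds = VOL."""
    return kn * ((VDD - Vtn) * vol - vol * vol / 2)

i_load = 0.5 * kp * (VDD - Vtp) ** 2   # PMOS load saturated (gate grounded)

# Bisection for VOL where the driver current equals the load current.
lo, hi = 0.0, VDD / 2
for _ in range(60):
    mid = (lo + hi) / 2
    if i_nmos_lin(mid) < i_load:
        lo = mid
    else:
        hi = mid

VOL = (lo + hi) / 2
print(f"VOL ~ {VOL*1e3:.0f} mV, static power when output low ~ {VDD*i_load*1e6:.1f} uW")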

Pass-Transistor Logic
Pass-Transistor Basics
A popular and widely-used alternative to complementary CMOS is pass-transistor logic,
which attempts to reduce the number of transistors required to implement logic by allowing the
primary inputs to drive gate terminals as well as source/drain terminals This is in contrast to
logic families that we have studied so far, which only allow primary inputs to drive the gate
terminals of MOSFETS. shows an implementation of the AND function constructed that way,
using only NMOS transistors. In this gate, if the B input is high, the top transistor is turned on
and copies the input A to the output F. When B is low, the bottom pass transistor is turned on and
passes a 0. The switch driven by B seems to be redundant at first glance. Its presence is essential
to ensure that the gate is static, this is that a low-impedance path exists to the supply rails under
all circumstances, or, in this particular case, when B is low. The promise of this approach is that
fewer transistors are required to implement a given function. For example, the implementation of
the AND gate in Figure 6.33 requires 4 transistors (including the inverter required to invert B),
while a complementary CMOS implementation would require 6 transistors. The reduced number
of devices has the additional advantage of lower capacitance. Unfortunately, as discussed earlier,
an NMOS device is effective at passing a 0 but is poor at pulling a node to VDD. When the pass
transistor pulls a node high, the output only charges up to VDD -VTn. In fact, the situation is
worsened by the fact that the devices

3.b.Explain about Dynamic CMOS Design (8)

Dynamic CMOS Design


It was noted earlier that static CMOS logic with a fan-in of N requires 2N devices. A
variety of approaches were presented to reduce the number of transistors required to implement a
given logic function including pseudo-NMOS, pass transistor logic, etc. The pseudo-NMOS
logic style requires only N + 1 transistors to implement an N input logic gate, but unfortunately it
has static power dissipation. In this section, an alternate logic style called dynamic logic is
presented that obtains a similar result, while avoiding static power consumption. With the
addition of a clock input, it uses a sequence of precharge and conditional evaluation phases.


Dynamic Logic: Basic Principles


The basic construction of an (n-type) dynamic logic gate is shown in Figure 6.52a. The PDN
(pull-down network) is constructed exactly as in complementary CMOS. The operation of this
circuit is divided into two major phases: precharge and evaluation, with the mode of operation
determined by the clock signal CLK.

Precharge
When CLK = 0, the output node Out is precharged to VDD by the PMOS transistor Mp. During
that time, the evaluate NMOS transistor Me is off, so that the pull-down path is disabled. The
evaluation FET eliminates any static power that would be consumed during the precharge period
(this is, static current would flow between the supplies if both the pulldown and the precharge
device were turned on simultaneously).

Evaluation
For CLK = 1, the precharge transistor Mp is off, and the evaluation transistor Me is turned on.
The output is conditionally discharged based on the input values and the pull-down topology. If
the inputs are such that the PDN conducts, then a low resistance path exists between Out and
GND and the output is discharged to GND. If the PDN is turned off, the precharged value
remains stored on the output capacitance CL, which is a combination of junction capacitances,
the wiring capacitance, and the input capacitance of the fan-out gates. During the evaluation
phase, the only possible path between the output node and a supply rail is to GND.
Consequently, once Out is discharged, it cannot be charged again until the next precharge
operation. The inputs to the gate can therefore make at most one transition during evaluation.
Notice that the output can be in the high-impedance state during the evaluation period if the pull-
down network is turned off. This behavior is fundamentally different from the static counterpart
that always has a low-resistance path between the output and one of the power rails. As an
example, consider the circuit shown in Figure 6.52b. During the precharge phase (CLK=0), the
output is precharged to VDD regardless of the input values since the evaluation device is turned
off. During evaluation (CLK=1), a conducting path is created
between Out and GND if (and only if) A·B+C is TRUE. Otherwise, the output remains at the
precharged state of VDD. The following function is thus realized: Out = (A·B + C)'.


A number of important properties can be derived for the dynamic logic gate:
• The logic function is implemented by the NMOS pull-down network. The construction of the
PDN proceeds just as it does for static CMOS.
• The number of transistors (for complex gates) is substantially lower than in the static case: N +
2 versus 2N.
• It is non-ratioed. The sizing of the PMOS precharge device is not important for realizing proper
functionality of the gate. The size of the precharge device can be made large to improve the low-
to-high transition time (of course, at a cost to the high-to-low transition time). There is, however,
a trade-off with power dissipation since a larger precharge device directly increases clock-power
dissipation.
• It only consumes dynamic power. Ideally, no static current path ever exists between VDD and
GND. The overall power dissipation, however, can be significantly higher compared to a static
logic gate.
• The logic gates have faster switching speeds. There are two main reasons for this. The first
(obvious) reason is due to the reduced load capacitance attributed to the lower number of
transistors per gate and the single-transistor load per fan-in. Second, the dynamic gate does not
have short circuit current, and all the current provided by the pull-down devices goes towards
discharging the load capacitance. The low and high output levels VOL and VOH are easily
identified as GND and VDD and are not dependent upon the transistor sizes. The other VTC
parameters are dramatically different from static gates. Noise margins and switching thresholds
have been defined as static quantities that are not a function of time. To be functional, a dynamic
gate requires a periodic sequence of precharges and evaluations. Pure static analysis, therefore,
does not apply. During the evaluate period, the pull-down network of a dynamic inverter starts to
conduct when the input signal exceeds the threshold voltage (VTn) of the NMOS pull-down
transistor. Therefore, it is reasonable to set the switching threshold (VM) as well as VIH and VIL
of the gate equal to VTn. This translates to a low value for the NML.

Speed and Power Dissipation of Dynamic Logic


The main advantages of dynamic logic are increased speed and reduced implementation area.
Fewer devices to implement a given logic function implies that the overall load capacitance is
much smaller. The analysis of the switching behavior of the gate has some interesting
peculiarities to it. After the precharge phase, the output is high. For a low input signal, no
additional switching occurs. As a result, tpLH = 0! The high-to-low transition, on the other hand,
requires the discharging of the output capacitance through the pull-down network. Therefore
tpHL is proportional to CL and the current-sinking capabilities of the pull-down network. The
presence of the evaluation transistor slows the gate somewhat, as it presents an extra series
resistance. Omitting this transistor, while functionally not forbidden, may result in static power
dissipation and potentially a performance loss. The above analysis is somewhat unfair, because it
ignores the influence of the precharge time on the switching speed of the gate. The precharge
time is determined by the time it takes to charge CL through the PMOS precharge transistor.
During this time, the logic in the gate cannot be utilized. However, very often, the overall digital
system can be designed in such a way that the precharge time coincides with other system
functions. For instance, the precharge of the arithmetic unit in a microprocessor can coincide
with the instruction decode. The designer has to be aware of this “dead zone” in the use of
dynamic logic, and should carefully consider the pros and cons of its usage, taking the overall
system requirements into account.


When evaluating the power dissipation of a dynamic gate, it would appear that dynamic logic
presents a significant advantage. There are three reasons for this. First, the physical capacitance
is lower since dynamic logic uses fewer transistors to implement a given function. Also, the load
seen for each fanout is one transistor instead of two. Second, dynamic logic gates by construction
can at most have one transition per clock cycle. Glitching (or dynamic hazards) does not occur in
dynamic logic. Finally, dynamic gates do not exhibit short circuit power since the pull-up path is
not turned on when the gate is evaluating. While these arguments are generally true, they are
offset by other considerations:

(i) the clock power of dynamic logic can be significant, particularly since the clock node has
a guaranteed transition on every single clock cycle;
(ii) the number of transistors is higher than the minimal set required for implementing the
logic;
(iii) short-circuit power may exist when leakage-combatting devices are added (as will be
discussed further);
(iv) and, most importantly, dynamic logic generally displays a higher switching activity due
to the periodic precharge and discharge operations. Earlier, the transition probability for
a static gate was shown to be p0·p1 = p0(1 – p0). For dynamic logic, the output transition
probability does not depend on the state (history) of the inputs, but rather on the signal
probabilities only. For an n-tree dynamic gate, the output makes a 0→1 transition during
the precharge phase only if the output was discharged during the preceding evaluate
phase. The 0→1 transition probability for an n-type dynamic gate hence equals
α0→1 = p0, the probability that the gate output is low.
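
A small sketch of this comparison for a 2-input NOR gate with independent, equiprobable inputs (illustrative assumptions): the static gate's output transition probability is p0·p1, whereas the dynamic gate precharges every cycle and its 0→1 activity equals the probability that the output evaluates low.

# Switching-activity comparison for a 2-input NOR with independent inputs, P(1) = 0.5 each.
from itertools import product

def nor2(a, b):
    return int(not (a or b))

outputs = [nor2(a, b) for a, b in product((0, 1), repeat=2)]
p1 = sum(outputs) / len(outputs)        # probability the output is 1
p0 = 1 - p1                             # probability the output is 0

alpha_static  = p0 * p1                 # static gate: 0->1 transition probability
alpha_dynamic = p0                      # dynamic gate: discharged every time the output is low

print(f"static NOR2  alpha(0->1) = {alpha_static:.3f}")   # 0.188
print(f"dynamic NOR2 alpha(0->1) = {alpha_dynamic:.3f}")  # 0.750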

4a. Explain in detail about pulsed latches and resettable latches.

A pulsed latch can be built from a conventional CMOS transparent latch driven by a brief clock
pulse. Figure 10.22(a) shows a simple pulse generator, sometimes called a clock chopper or one-
shot [Harris01a]. The pulsed latch is faster than a regular flip-flop because it involves a single
latch rather than two and because it allows time borrowing. It can also consume less energy,
although the pulse generator adds to the energy consumption (and is ideally shared across
multiple pulsed latches for energy and area efficiency). The drawback is the increased hold time.
The Naffziger pulsed latch used on the Itanium 2
processor consists of the latch from Figure 10.17(k) driven by even shorter pulses produced by
the generator of Figure 10.22(b) [Naffziger02]. This pulse generator uses a fairly slow (weak)
inverter to produce a pulse with a nominal width of about one-sixth of the cycle (125 ps for 1.2
GHz operation). When disabled, the internal node of the pulse generator floats high momentarily,
but no keeper is required because the duration is short. Of course, the enable signal has setup and
hold requirements around the rising edge of the clock, as shown in Figure 10.22(c).

Figure 10.22(d) shows yet another pulse generator used on an NEC RISC processor [Kozu96] to
produce substantially longer pulses. It includes a built-in dynamic transmission gate latch to
prevent the enable from glitching during the pulse. Many designers consider short pulses risky.
The pulse generator should be carefully simulated across process corners and possible RC loads
to ensure the pulse is not degraded too badly by process variation or routing. However, the
Itanium 2 team found that the pulses could be used just as regular clocks as long as the pulse
generator had adequate drive. The quad-core Itanium pulse generator selects between 1- and 3-
inverter delay chains using a transmission gate multiplexer [Stackhouse09]. The wider pulse
offers more robust latch operation across process and environmental variability and permits more
time borrowing, but increases the hold time. The multiplexer select is software-programmable to
fix problems discovered after fabrication. The Partovi pulsed latch in Figure 10.23 eliminates the
need to distribute the pulse by building the pulse generator into the latch itself [Partovi96,
Draper97]. The weak cross-coupled inverters in the dashed box staticize the circuit, although the
latch is susceptible to back-driven output noise on Q or its complement unless an extra inverter is used to
buffer the output. The Partovi pulsed latch was used on the AMD K6 and Athlon [Golden99], but
is slightly slower than a simple latch [Naffziger02]. It was originally called an Edge Triggered
Latch (ETL), but strictly speaking is a pulsed latch because it has a brief window of
transparency.

Resettable Latches and Flip-Flops


Most practical sequencing elements require a reset signal to enter a known initial state on
startup and ensure deterministic behavior. Figure 10.24 shows latches and flip-flops with reset
inputs. There are two types of reset: synchronous and asynchronous. Asynchronous reset forces
Q low immediately, while synchronous reset waits for the clock. Synchronous reset signals must
be stable for a setup and hold time around the clock edge while asynchronous reset is
characterized by a propagation delay from reset to output. Synchronous reset simply requires
ANDing the input D with reset. Asynchronous reset requires gating both the data and the
feedback to force the reset independent of the clock. The tristate NAND gate can be constructed
from a NAND gate in series with a clocked transmission gate. Settable latches and flip-flops
force the output high instead of low. They are similar to the resettable elements of Figure 10.24 but
replace NAND with NOR and reset with set; a flip-flop can also combine both asynchronous set
and reset.
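
A behavioral sketch (not a transistor-level description) contrasting the two reset styles described above: synchronous reset is sampled only on the clock edge, while asynchronous reset forces Q low immediately. The class and method names are hypothetical, chosen only for this illustration.

# Behavioral model of a positive-edge D flip-flop with reset.
class ResettableDFF:
    def __init__(self, async_reset=False):
        self.q = 0
        self.async_reset = async_reset

    def apply_reset(self, reset):
        """Asynchronous reset acts immediately, independent of the clock."""
        if self.async_reset and reset:
            self.q = 0

    def clock_edge(self, d, reset):
        """Synchronous reset is simply ANDed with the data at the clock edge."""
        self.q = 0 if reset else d

sync_ff  = ResettableDFF(async_reset=False)
async_ff = ResettableDFF(async_reset=True)

sync_ff.clock_edge(d=1, reset=0); async_ff.clock_edge(d=1, reset=0)
sync_ff.apply_reset(reset=1)    # no effect until the next clock edge
async_ff.apply_reset(reset=1)   # Q falls right away
print("after asserting reset between clock edges:", sync_ff.q, async_ff.q)   # 1 0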

Enabled Latches and Flip-Flops


Sequencing elements also often accept an enable input. When enable en is low, the
element retains its state independently of the clock. The enable can be performed with an input
multiplexer or clock gating, as shown in Figure 10.26. The input multiplexer feeds back the old
state when the element is disabled. The multiplexer adds area and delay. Clock gating does not
affect delay from the data input and the AND gate can be shared among multiple clocked
elements. Moreover, it significantly reduces power consumption because the clock on the
disabled element does not toggle. However, the AND gate delays
the clock, potentially introducing clock skew; this skew can be minimized by building the AND
gate into the final buffer of the clock distribution network. en must be stable while the clock is
high to prevent glitches on the clock, as will be discussed further.

4b. Explain about Master-Slave Based Edge Triggered Register


Master-Slave Based Edge Triggered Register
The most common approach for constructing an edge-triggered register is to use a master-
slave configuration as shown in Figure 7.14. The register consists of cascading a negative latch
(master stage) with a positive latch (slave stage). A multiplexer based latch is used in this
particular implementation, though any latch can be used to realize the master and slave stages.
On the low phase of the clock, the master stage is transparent and the D input is passed to the
master stage output, QM. During this period, the slave stage is in the hold mode, keeping its
previous value using feedback. On the rising edge of the clock, the master stage stops sampling
the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage
samples the output of the master stage (QM), while the master stage remains in hold mode.
Since QM is constant during the high phase of the clock, the output Q makes only one transition
per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the
positive edge-triggered effect. A negative edge-triggered register can be constructed using the
same principle by simply switching the order of the positive and negative latch (i.e., placing the
positive latch first). A complete transistor-level implementation of the master-slave positive
edge-triggered register is shown in Figure 7.15. The multiplexer is implemented using
transmission gates as discussed in the previous section. When the clock is low (CLK = 0), T1 is on
and T2 is off, and the D input is sampled onto node QM. During this period, T3 is off and T4 is on,
and the cross-coupled inverters (I5, I6) hold the state of the slave latch. When the clock goes
high, the master stage stops sampling the input and goes into a hold mode. T1 is off and T2 is on,
and the cross-coupled inverters I3 and I4 hold the state of QM. Also, T3 is on and T4 is off, and
QM is copied to the output Q.


Timing Properties of the Multiplexer-Based Master-Slave Register. As discussed earlier, there
are three important timing metrics in registers: the set-up time, the hold time, and the propagation
delay. It is important to understand the factors that affect these timing parameters and to develop
the intuition to estimate them manually. Assume that the propagation delay of each inverter is
tpd_inv and the propagation delay of the transmission gate is tpd_tx. Also assume that the
contamination delay is 0 and that the inverter that derives the complementary clock from CLK
has zero delay.

The set-up time is the time before the rising edge of the clock that the input data D must
become valid. Another way to ask the question is: how long before the rising edge does the D
input have to be stable such that QM samples the value reliably? For the transmission gate
multiplexer-based register, the input D has to propagate through I1, T1, I3 and I2 before the
rising edge of the clock. This is to ensure that the node voltages on both terminals of the
transmission gate T2 are at the same value. Otherwise, it is possible for the cross-coupled pair I2
and I3 to settle to an incorrect value. The set-up time is therefore equal to 3·tpd_inv + tpd_tx.
The propagation delay is the time for the value of QM to propagate to the output Q. Note that
since we included the delay of I2 in the set-up time, the output of I4 is valid before the rising edge
of the clock. Therefore the delay tc-q is simply the delay through T3 and I6 (tc-q = tpd_tx +
tpd_inv). The hold time represents the time that the input must be held stable after the rising edge
of the clock. In this case, the transmission gate T1 turns off when the clock goes high and therefore
any changes in the D input after the clock goes high are not seen by the input. Therefore, the hold
time is 0.
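
Plugging assumed gate delays into the expressions above gives a quick feel for the numbers (the values are illustrative, not characterized delays):

# Setup time and clk-to-q delay of the transmission-gate master-slave register,
# using the expressions derived above with assumed unit delays.
tpd_inv = 50e-12   # inverter propagation delay (assumed)
tpd_tx  = 30e-12   # transmission-gate propagation delay (assumed)

t_setup = 3 * tpd_inv + tpd_tx   # D must propagate through I1, T1, I3, I2
t_c2q   = tpd_tx + tpd_inv       # QM propagates through T3 and I6
t_hold  = 0.0                    # T1 turns off at the clock edge

print(f"t_setup = {t_setup*1e12:.0f} ps, t_c2q = {t_c2q*1e12:.0f} ps, t_hold = {t_hold*1e12:.0f} ps")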

As mentioned earlier, the drawback of the transmission gate register is the high
capacitive load presented to the clock signal. The clock load per register is important since it
directly impacts the power dissipation of the clock network. Ignoring the overhead required to
invert the clock signal (since the buffer inverter overhead can be amortized over multiple register
bits), each register has a clock load of 8 transistors. One approach to reduce the
clock load at the cost of robustness is to make the circuit ratioed. Figure 7.18 shows that the
feedback transmission gate can be eliminated by directly cross coupling the inverters.

The penalty for the reduced clock load is increased design complexity. The transmission gate
(T1) and its source driver must overpower the feedback inverter (I2) to switch the state of the
cross-coupled inverter. The sizing requirements for the transmission gates can be derived using a
similar analysis as performed for the SR flip-flop. The input to the inverter I1 must be brought
below its switching threshold in order to make a transition. If minimum-sized devices are to be
used in the transmission gates, it is essential that the transistors of inverter I2 should be made
even weaker. This can be accomplished by making their channel-lengths larger than minimum.
Using minimum or close-to-minimum-size devices in the transmission gates is desirable to reduce
the power dissipation in the latches and the clock distribution network. Another problem with
this scheme is reverse conduction: that is, the second stage can affect the state of the first
latch. When the slave stage is on (Figure 7.19), it is possible for the combination of T2 and I4 to
influence the data stored in the I1-I2 latch. As long as I4 is a weak device, this is fortunately not a
major problem.

5a. Explain the concept of a 4-bit Barrel Shifter. (8)


Barrel Shifter
Any general purpose n-bit shifter should be able to shift incoming data by up to n - 1 places in a
right-shift or left-shift direction. If we now further specify that all shifts should be on an 'end-
around' basis, so that any bit shifted out at one end of a data word will be shifted in at the other
end of the word, then the problem of right shift or left shift is greatly eased. In fact, a moment's
consideration will reveal, for a 4-bit word, that a 1-bit shift right is equivalent to a 3-bit shift left
and a 2-bit shift right is equivalent to a 2-bit shift left, etc. Thus we can achieve a capability to
shift left or right by zero, one, two, or three places by designing a circuit which will shift right
only (say) by zero, one, two, or three places. The nature of the shifter having been decided on, its
implementation must then be considered. Obviously, the first circuit which comes to mind is that
of the shift register in Figures 6.38, 6.39 and 6.40. Data could be loaded from the output of the
ALU and shifting effected; then the outputs of each stage of the shift register would provide the
required parallel output to be returned to the register array (or elsewhere in the general case).
However, there is danger in accepting the obvious without question. Many designers, used to the
constraints of TTL, MSI, and SSI logic, would be conditioned to think in terms of such standard
arrangements. When designing VLSI systems, it pays to set out exactly what
is required to assess the best approach. In this case, the shifter must have:

• input from a four-line parallel data bus;

• four output lines for the shifted data;


• means of transferring input data to output lines with any shift from zero to three bits
inclusive.
In looking for a way of meeting these requirements, we should also attempt to take best
advantage of the technology; for example, the availability of the switch-like MOS pass
transistor and transmission gate.


We must also observe the strategy decided on earlier for the direction of data and control signal
flow, and the approach adopted should make this feasible. Remember that the overall strategy in
this case is for data to flow horizontally and control signals vertically. A solution which meets
these requirements emerges from the days of switch- and relay-contact-based switching
networks: the crossbar switch. Consider a direct MOS switch implementation of a 4 x 4 crossbar
switch, as in Figure 7.6. The arrangement is quite general and may be readily expanded to
accommodate n-bit inputs/outputs. In fact, this arrangement is overkill, in that any input line
can be connected to any or all output lines: if all switches are closed, then all inputs are connected
to all outputs in one glorious short circuit. Furthermore, 16 control signals (sw00-sw15), one for
each transistor switch, must be provided to drive the crossbar switch, and such complexity is
highly undesirable. An adaptation of this arrangement recognizes the fact that we can couple the
switch gates together in groups of four (in this case) and also form four separate groups
corresponding to shifts of zero, one, two and three bits. The arrangement is readily adapted so
that the in-lines also run horizontally (to conform to the required strategy). The resulting
arrangement is known as a barrel shifter, and a 4 x 4-bit barrel shifter circuit diagram is given in
Figure 7.7. The interbus switches have their gate inputs connected in a staircase fashion in
groups of four and there are now four shift control inputs which must be mutually exclusive in
the active state. CMOS transmission gates may be used in place of the simple pass transistor
switches if appropriate.
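
As a quick behavioural check of the end-around property described above, the following Python
sketch models a 4-bit rotate-only shifter. It is a software model only (it assumes exactly one shift
control is active and says nothing about the pass-transistor circuit itself), with bit 0 of the word
stored at list index 0.

def rotate_right(bits, k):
    # End-around right shift: output bit i takes input bit (i + k) mod n.
    n = len(bits)
    return [bits[(i + k) % n] for i in range(n)]

def rotate_left(bits, k):
    n = len(bits)
    return [bits[(i - k) % n] for i in range(n)]

word = [1, 0, 1, 1]                                     # bits b0..b3
assert rotate_right(word, 1) == rotate_left(word, 3)    # 1-bit right shift = 3-bit left shift for a 4-bit word
print(rotate_right(word, 2))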

The structure of the barrel shifter is clearly one of high regularity and generality and it may be
readily represented in stick diagram form. One possible implementation, using simple n-type
switches, is given in Figure 7.8. The stick diagram clearly conveys regular topology and allows
the choice of a standard
cell from which complete barrel shifters of any size may be formed by replication of the standard
cell. It should be noted that standard cell boundaries must be carefully chosen to allow for
butting together side by side and top to bottom to retain the overall topology. The mask layout
for standard cell number 2 (arbitrary choice) of Figure 7.8 may then be set out as in Figure 7.9.
Once the standard cell dimensions have been determined, then any n x n barrel shifter may be
configured and its outline, or bounding box, arrived at by summing up the dimensions of the
replicated standard cell. The use of simple n-type switches in a CMOS environment might be
questioned. Although there will be a degrading of logic 1 levels through n-type switches, this
generally does not matter if the shifter is followed by restoring circuitry such as inverters or gate
logic. Furthermore, as there will only ever be one n-type switch in series between an input and
the corresponding output line, the arrangement is fast. The minimum size bounding box outline
for the 4 x 4-way barrel shifter is given in
Figure 7.10. The figure also indicates all inlet and outlet points around the periphery together
with the layer on which each is located. This allows ready placing of the shifter within the floor
plan (Figure 7.5) and its interconnection with the other subsystems forming the datapath. It also
emphasizes the fact that, as in this case, many subsystems need external links to complete their
architecture. In this case, the links shown on the right of the bounding box must be made and
must be allowed for in interconnections and overall dimensions. This form of representation also
allows the subsystem geometric characterization to be that of the bounding box alone for
composing higher levels of the system hierarchy.

5b. Explain the Carry Lookahead Adder in detail.

Carry-Propagate Addition
N-bit adders take inputs {AN, …, A1}, {BN, …, B1}, and carry-in Cin, and compute the sum
{SN, …, S1} and the carry-out of the most significant bit Cout, as shown in Figure 11.9. The
simplest adders ripple the carry from one bit to the next; faster adders look ahead to predict the
carry-out of a group of bits, and long adders use multiple levels of lookahead structures for even
more speed.

Carry-Ripple Adder. An N-bit adder can be constructed by cascading N full adders, as shown
in Figure 11.11(a); this is called a carry-ripple adder (or ripple-carry adder). The
carry-out of bit i, Ci, is the carry-in to bit i + 1, and this carry has twice the weight of
the sum Si. The delay of the adder is set by the time for the carries to ripple through the N stages,
so the carry-in-to-carry-out delay (tCin→Cout) of each full adder should be
minimized. This delay can be reduced by omitting the inverters on the outputs, as was done in
Figure 11.4(c). Because addition is a self-dual function (i.e., the function of complementary
inputs is the complement of the function), an inverting full adder receiving complementary
inputs produces true outputs. Figure 11.11(b) shows a carry-ripple adder built from inverting full
adders. Every other stage operates on complementary data. The delay of inverting the adder inputs
or sum outputs is off the critical ripple-carry path.
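
A minimal behavioural model of this structure, assuming bit lists with the least significant bit at
index 0, might look like the sketch below; the majority function plays the role of the carry circuit
in each full adder.

def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin) | (a & cin)   # majority of a, b, cin
    return s, cout

def ripple_adder(A, B, cin=0):
    # A, B: lists of bits, index 0 = LSB; the carry ripples through the N stages.
    s, c = [], cin
    for a, b in zip(A, B):
        si, c = full_adder(a, b, c)
        s.append(si)
    return s, c

print(ripple_adder([0, 1, 1, 0], [1, 1, 0, 0]))   # 6 + 3 = 9 -> ([1, 0, 0, 1], 0)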

Carry Generation and Propagation


This section introduces notation commonly used in describing faster adders. Recall that the P
(propagate) and G (generate) signals were defined in Section 11.2.1. We can generalize these
signals to describe whether a group spanning bits i…j, inclusive, generates a carry or propagates a
carry. A group of bits generates a carry if its carry-out is true independent of the carry-in; it
propagates a carry if its carry-out is true when there is a carry-in. These signals can be defined
recursively for i ≥ k > j as

Gi:j = Gi:k + Pi:k · Gk–1:j        (11.4)
Pi:j = Pi:k · Pk–1:j

In other words, a group generates a carry if the upper (more significant) or the lower portion
generates and the upper portion propagates that carry. The group propagates a carry if both the
upper and lower portions propagate the carry. The carry-in must be treated specially: let us define
C0 = Cin and CN = Cout. Then we can define the generate and propagate signals for bit 0 as

G0:0 = Cin
P0:0 = 0


Observe that the carry into bit i is the carry-out of bit i–1, that is, Ci–1 = Gi–1:0. This is an
important relationship; group generate signals and carries will be used synonymously in the
subsequent sections. We can thus compute the sum for bit i using EQ (11.2) as

Si = Pi ⊕ Gi–1:0        (11.7)

Hence, addition can be reduced to a three-step process:

1. Computing bitwise generate and propagate signals using EQs (11.5) and (11.6)
2. Combining PG signals to determine group generates Gi–1:0 for all N ≥ i ≥ 1 using EQ (11.4)
3. Calculating the sums using EQ (11.7)
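
The three steps can be sketched in software as below. This is a simplified model that ripples the
group generate signals through a chain of AND-OR (gray-cell) stages rather than building a
lookahead tree; the LSB-at-index-0 bit ordering is an assumption of the sketch, not part of the
notation above.

def pg_add(A, B, cin=0):
    n = len(A)
    # Step 1: bitwise generate and propagate signals
    g = [a & b for a, b in zip(A, B)]
    p = [a ^ b for a, b in zip(A, B)]
    # Step 2: group generates Gi-1:0 (the carry into bit i), rippled through AND-OR cells
    G = [cin]
    for i in range(n):
        G.append(g[i] | (p[i] & G[i]))   # C(i+1) = Gi + Pi * Ci
    # Step 3: the sums
    S = [p[i] ^ G[i] for i in range(n)]
    return S, G[n]                       # sum bits and carry-out

print(pg_add([0, 1, 1, 0], [1, 1, 0, 0]))   # 6 + 3 = 9 -> ([1, 0, 0, 1], 0)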

These steps are illustrated in Figure 11.12. The first and third steps are routine, so most of the
attention in the remainder of this section is devoted to alternatives for the group PG logic with
different trade-offs between speed, area, and complexity. Some of the hardware can be shared in
the bitwise PG logic, as shown in Figure 11.13. In a carry-ripple adder, the critical path runs from
carry-in to carry-out along the carry chain of majority gates. As the P and G signals will have
already stabilized by the time the carry arrives, we can use them to simplify the majority function
into an AND-OR gate:

Ci = Gi + Pi · Ci–1

In this extreme, the group propagate signals are never used and need not be computed. Figure
11.14 shows a 4-bit carry-ripple adder. The critical carry path now proceeds through a chain of
AND-OR gates rather than a chain of majority gates. Figure 11.15 illustrates the group PG logic
for a 16-bit carry-ripple adder, where the AND-OR gates in the group PG network are
represented with gray cells. Diagrams like these will be used to compare a variety of adder
architectures in subsequent sections. The diagrams use black cells, gray cells, and white buffers
defined in Figure 11.16(a) for valency-2 cells. Black cells contain the group generate and
propagate logic (an AND-OR gate and an AND gate) defined in EQ (11.4). Gray cells containing
only the group generate logic are used at the final cell position in each column because only the
group generate signal is required to compute the sums. Buffers can be used to minimize the load
on critical paths. Each line represents a bundle of the group generate and propagate signals
(propagate signals are omitted after gray cells).

6.a. Explain in detail the types of ASICs.

Types of ASICs
ICs are made on a thin (a few hundred microns thick), circular silicon wafer , with each wafer
holding hundreds of die (sometimes people use dies or dice for the plural of die). The transistors
and wiring are made from many layers (usually between 10 and 15 distinct layers) built on top of
one another. Each successive mask layer has a pattern that is defined using a mask similar to a
glass photographic slide. The first half-dozen or so layers define the transistors. The last half-
dozen or so layers define the metal wires between the transistors (the interconnect ).

A full-custom IC includes some (possibly all) logic cells that are customized and all mask layers
that are customized. A microprocessor is an example of a full-custom IC—designers spend many
hours squeezing the most out of every last square micron of microprocessor chip space by hand.
Customizing all of the IC features in this way allows designers to include analog circuits,
optimized memory cells, or mechanical structures on an IC, for example. Full-custom ICs are the
most expensive to manufacture and to design. The manufacturing lead time (the time it takes
just to make an IC—not including design time) is typically eight weeks for a full-custom IC.
These specialized full-custom ICs are often intended for a specific application, so we might call
some of them full-custom ASICs. We shall discuss full-custom ASICs briefly next, but the
members of the IC family that we are more interested in are semicustom ASICs , for which all
of the logic cells are predesigned and some (possibly all) of the mask layers are customized.
Using predesigned cells from a cell library makes our lives as designers much, much easier.

There are two types of semicustom ASICs that we shall cover: standard-cell–based ASICs and
gate-array–based ASICs. Following this we shall describe the programmable ASICs , for which
all of the logic cells are predesigned and none of the mask layers are customized. There are two
types of programmable ASICs: the programmable logic device and, the newest member of the
ASIC family, the field programmable gate array.

Full-Custom ASICs
In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout
specifically for one ASIC. This means the designer abandons the approach of using pretested and
precharacterized cells for all or part of that design. It makes sense to take this approach only if
there are no suitable existing cell libraries available that can be used for the entire design. This
might be because existing cell libraries are not fast enough, or the logic cells are not small
enough or consume too much power. You may need to use full-custom design if the ASIC
technology is new or so specialized that there are no existing cell libraries or because the ASIC is
so specialized that some circuits must be custom designed. Fewer and fewer full-custom ICs are
being designed because of the problems with these special parts of the ASIC. There is one
growing member of this family, though, the mixed analog/digital ASIC, which we shall discuss
next. Bipolar technology has historically been used for precision analog functions. There are
some fundamental reasons for this. In all integrated circuits the matching of component
characteristics between chips is very poor, while the matching of characteristics between
components on the same chip is excellent. Suppose we have transistors T1, T2, and T3 on an
analog/digital ASIC. The three transistors are all the same size and are constructed in an identical
fashion. Transistors T1 and T2 are located adjacent to each other and have the same orientation.
Transistor T3 is the same size as T1 and T2 but is located on the other side of the chip from T1
and T2 and has a different orientation. ICs are made in batches called wafer lots. A wafer lot
is a group of silicon wafers that are all processed together. Usually there are between 5 and 30
wafers in a lot. Each wafer can contain tens or hundreds of chips depending on the size of the IC
and the wafer. If we were to make measurements of the characteristics of transistors T1, T2, and
T3 we would find the following:

• Transistor T1 will have virtually identical characteristics to T2 on the same IC. We say that the
transistors match well, or that the tracking between devices is excellent.
• Transistor T3 will match transistors T1 and T2 on the same IC very well, but not as closely as
T1 matches T2 on the same IC.
• Transistors T1, T2, and T3 will match fairly well with transistors T1, T2, and T3 on a different
IC on the same wafer. The matching will depend on how far apart the two ICs are on the wafer.
• Transistors on ICs from different wafers in the same wafer lot will not match very well.
• Transistors on ICs from different wafer lots will match very poorly.

For many analog designs the
close matching of transistors is crucial to circuit operation. For these circuit designs pairs of
transistors are used, located adjacent to each other. Device physics dictates that a pair of bipolar
transistors will always match more precisely than CMOS transistors of a comparable size.
Bipolar technology has historically been more widely used for full-custom analog design because
of its improved precision. Despite its poorer analog properties, the use of CMOS technology for
analog functions is increasing. There are two reasons for this. The first reason is that CMOS is
now by far the most widely available IC technology. Many more CMOS ASICs and CMOS
standard products are now being manufactured than bipolar ICs. The second reason is that
increased levels of integration require mixing analog and digital functions on the same IC: this
has forced designers to find ways to use CMOS technology to implement analog functions.
Circuit designers, using clever new techniques, have been very successful in finding new ways to
design analog CMOS circuits that can approach the accuracy of bipolar analog designs.

Standard-Cell–Based ASICs
A cell-based ASIC (cell-based IC, or CBIC —a common term in Japan, pronounced “sea-bick”)
uses predesigned logic cells (AND gates, OR gates, multiplexers, and flipflops, for example)
known as standard cells . We could apply the term CBIC to any IC that uses cells, but it is
generally accepted that a cell-based ASIC or CBIC means a standard-cell–based ASIC. The
standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells—
like a wall built of bricks. The standard-cell areas may be used in combination with larger
predesigned cells, perhaps microcontrollers or even microprocessors, known as megacells.
Megacells are also called megafunctions, full-custom blocks, system-level macros (SLMs), fixed
blocks, cores, or Functional Standard Blocks (FSBs).

The ASIC designer defines only the placement of the standard cells and the interconnect in a
CBIC. However, the standard cells can be placed anywhere on the silicon; this means that all the
mask layers of a CBIC are customized and are unique to a particular customer. The advantage of
CBICs is that designers save time, money, and reduce risk by using a predesigned, pretested, and
precharacterized standard-cell library . In addition each standard cell can be optimized
individually. During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example. The disadvantages are the time
or expense of designing or buying the standard-cell library and the time needed to fabricate all
layers of the ASIC for each new design. Figure 1.2 shows a CBIC (looking down on the die
shown in Figure 1.1b, for example). The important features of this type of ASIC are as follows:
• All mask layers are customized (transistors and interconnect).
• Custom blocks can be embedded.
• Manufacturing lead time is about eight weeks.

Each standard cell in the library is
constructed using full-custom design methods, but you can use these predesigned and
precharacterized circuits without having to do any full-custom design yourself. This design style
gives you the same performance and flexibility advantages of a full-custom ASIC but reduces
design time and reduces risk. Standard cells are designed to fit together like bricks in a wall.
Figure 1.3 shows an example of a simple standard cell (it is simple in the sense it is not
maximized for density—but ideal for showing you its internal construction). Power and ground
buses (VDD and GND or VSS) run horizontally on metal lines inside the cells. Standard-cell
design allows the automation of the process of assembling an ASIC. Groups of standard cells fit
horizontally together to form rows. The rows stack vertically to form flexible rectangular blocks
(which you can reshape during design). You may then connect a flexible block built from
several rows of standard cells to other standard-cell blocks or other full-custom logic blocks. For
example, you might want to include a custom interface to a standard, predesigned
microcontroller together with some memory. The microcontroller block may be a fixed-size
megacell, you might generate the memory using a memory compiler, and the custom logic and
memory controller will be built from flexible standard-cell blocks, shaped to fit in the empty
spaces on the chip.

Both cell-based and gate-array ASICs use predefined cells, but there is a difference—we can
change the transistor sizes in a standard cell to optimize speed and performance, but the device
sizes in a gate array are fixed. This results in a tradeoff in performance and area in a gate array at
the silicon level. The trade-off between area and performance is made at the library level for a
standard-cell ASIC. Modern CMOS ASICs use two, three, or more levels (or layers) of metal for
interconnect. This allows wires to cross over different layers in the same way that we use copper
traces on different layers on a printed-circuit board. In a two-level metal CMOS technology,
connections to the standard-cell inputs and outputs are usually made using the second level of
metal ( metal2 , the upper level of metal) at the tops and bottoms of the cells. In a three-level
metal technology, connections may be internal to the logic cell (as they are in Figure 1.3). This
allows for more sophisticated routing programs to take advantage of the extra metal layer to
route interconnect over the top of the logic cells. We shall cover the details of routing ASICs later.

A connection that needs to cross over a row of standard cells uses a feedthrough. The term
feedthrough can refer either to the piece of metal that is used to pass a signal through a cell or to
a space in a cell waiting to be used as a feedthrough—very confusing. Figure 1.4 shows two
feedthroughs: one in cell A.14 and one in cell A.23. In both two-level and three-level metal
technology, the power buses (VDD and GND) inside the standard cells normally use the lowest
(closest to the transistors) layer of metal ( metal1 ). The width of each row of standard cells is
adjusted so that they may be aligned using spacer cells . The power buses, or rails, are then
connected to additional vertical power rails using row-end cells at the aligned ends of each
standard-cell block. If the rows of standard cells are long, then vertical power rails can also be
run in metal2 through the cell rows using special power cells that just connect to VDD and
GND. Usually the designer manually controls the number and width of the vertical power rails
connected to the standard-cell blocks during physical design. A diagram of the power
distribution scheme for a CBIC is shown in Figure 1.4. All the mask layers of a CBIC are
customized. This allows megacells (SRAM, a SCSI controller, or an MPEG decoder, for
example) to be placed on the same IC with standard cells. Megacells are usually supplied by an
ASIC or library company complete with behavioral models and some way to test them (a test
strategy). ASIC library companies also supply compilers to generate flexible DRAM, SRAM,
and ROM blocks. Since all mask layers on a standard-cell design are customized, memory design
is more efficient and denser than for gate arrays. For logic that operates on multiple signals
across a data bus—a datapath ( DP )—the use of standard cells may not be the most efficient
ASIC design style. Some ASIC library companies provide a datapath compiler that
automatically generates datapath logic . A datapath library typically contains cells such as
adders, subtracters, multipliers, and simple arithmetic and logical units ( ALUs ). The
connectors of datapath library cells are pitch-matched to each other so that they fit together.
Connecting datapath cells to form a datapath usually, but not always, results in faster and denser
layout than using standard cells or a gate array. Standard-cell and gate-array libraries may
contain hundreds of different logic cells, including combinational functions (NAND, NOR,
AND, OR gates) with multiple inputs, as well as latches and flip-flops with different
combinations of reset, preset and clocking options. The ASIC library company provides
designers with a data book in paper or electronic form with all of the functional descriptions and
timing information for each library element.
Gate-Array–Based ASICs
In a gate array (sometimes abbreviated to GA) or gate-array–based ASIC the transistors are
predefined on the silicon wafer. The predefined pattern of transistors on a gate array is the base
array , and the smallest element that is replicated to make the base array (like an M. C. Escher
drawing, or tiles on a floor) is the base cell (sometimes called a primitive cell ). Only the top
few layers of metal, which define the interconnect between transistors, are defined by the
designer using custom masks. To distinguish this type of gate array from other types of gate
array, it is often called a masked gate array (MGA). The designer chooses from a gate-array
library of predesigned and precharacterized logic cells. The logic cells in a gate-array library are
often called macros . The reason for this is that the base-cell layout is the same for each logic
cell, and only the interconnect (inside cells and between cells) is customized, so that there is a
similarity between gate-array macros and a software macro. Inside IBM, gate-array macros are
known as books (so that books are part of a library), but unfortunately this descriptive term is not
very widely used outside IBM. We can complete the diffusion steps that form the transistors and
then stockpile wafers (sometimes we call a gate array a prediffused array for this reason). Since
only the metal interconnections are unique to an MGA, we can use the stockpiled wafers for
different customers as needed. Using wafers prefabricated up to the metallization steps reduces
the time needed to make an MGA, the turnaround time , to a few days or at most a couple of
weeks. The costs for all the initial fabrication steps for an MGA are shared for each customer and
this reduces the cost of an MGA compared to a full-custom or standard-cell ASIC design. There
are the following different types of MGA or gate-array–based ASICs:


 Channeled gate arrays.


 Channelless gate arrays.
 Structured gate arrays.

The hyphenation of these terms when they are used as adjectives explains their construction. For
example, in the term “channeled gate-array architecture,” the gate array is channeled , as will be
explained. There are two common ways of arranging (or arraying) the transistors on a MGA: in a
channeled gate array we leave space between the rows of transistors for wiring; the routing on a
channelless gate array uses rows of unused transistors. The channeled gate array was the first to
be developed, but the channelless gate-array architecture is now more widely used. A structured
(or embedded) gate array can be either channeled or channelless but it includes (or embeds) a
custom block.

6.b. Explain the types of gate-array–based ASICs.


Channeled Gate Array

The important features of this type of MGA are:

 Only the interconnect is customized.


 The interconnect uses predefined spaces between rows of base cells.
 Manufacturing lead time is between two days and two weeks.
A channeled gate array is similar to a CBIC—both use rows of cells separated by channels used
for interconnect. One difference is that the space for interconnect between rows of cells is fixed
in height in a channeled gate array, whereas the space between rows of cells may be adjusted in a
CBIC.

Channelless Gate Array

A channelless gate array is also known as a channel-free gate array, a sea-of-gates array, or a
SOG array. The important features of this type of MGA are as follows:

 Only some (the top few) mask layers are customized—the interconnect.
 Manufacturing lead time is between two days and two weeks.


The key difference between a channelless gate array and channeled gate array is that there are no
predefined areas set aside for routing between cells on a channelless gate array. Instead we route
over the top of the gate-array devices. We can do this because we customize the contact layer
that defines the connections between metal1, the first layer of metal, and the transistors. When
we use an area of transistors for routing in a channelless array, we do not make any contacts to
the devices lying underneath; we simply leave the transistors unused. The logic density—the
amount of logic that can be implemented in a given silicon area—is higher for channelless gate
arrays than for channeled gate arrays. This is usually attributed to the difference in structure
between the two types of array. In fact, the difference occurs because the contact mask is
customized in a channelless gate array, but is not usually customized in a channeled gate array.
This leads to denser cells in the channelless architectures. Customizing the contact layer in a
channelless gate array allows us to increase the density of gate-array cells because we can route
over the top of unused contact sites.

Structured Gate Array

An embedded gate array or structured gate array (also known as masterslice or


masterimage ) combines some of the features of CBICs and MGAs. One of the disadvantages of
the MGA is the fixed gate-array base cell. This makes the implementation of memory, for
example, difficult and inefficient. In an embedded gate array we set aside some of the IC area
and dedicate it to a specific function. This embedded area either can contain a different base cell
that is more suitable for building memory cells, or it can contain a complete circuit block, such
as a microcontroller.

The important features of this type of MGA are the following:

 Only the interconnect is customized.


 Custom blocks (the same for each design) can be embedded.
 Manufacturing lead time is between two days and two weeks.

An embedded gate array gives the improved area efficiency and increased
performance of a CBIC but with the lower cost and faster turnaround of an MGA. One
disadvantage of an embedded gate array is that the embedded function is fixed. For example, if
an embedded gate array contains an area set aside for a 32 k-bit memory, but we only need a 16
k-bit memory, then we may have to waste half of the embedded memory function. However, this
may still be more efficient and cheaper than implementing a 32 k-bit memory using macros on a
SOG array. ASIC vendors may offer several embedded gate array structures containing different
memory types and sizes as well as a variety of embedded functions. ASIC companies wishing to
offer a wide range of embedded functions must ensure that enough customers use each different
embedded gate array to give the cost advantages over a custom gate array or CBIC (the Sun
Microsystems SPARCstation 1 described in Section 1.3 made use of LSI Logic embedded gate
arrays—and the 10K and 100K series of embedded gate arrays were two of LSI Logic’s most
successful products).

1. Discuss in detail about ASIC Cell Libraries


The cell library is the key part of ASIC design. For a programmable ASIC the FPGA
company supplies you with a library of logic cells in the form of a design kit; you normally do
not have a choice, and the cost is usually a few thousand dollars. For MGAs and CBICs you have
three choices: the ASIC vendor (the company that will build your ASIC) will supply a cell
library, or you can buy a cell library from a third party library vendor , or you can build your
own cell library. The first choice, using an ASIC-vendor library , requires you to use a set of
design tools approved by the ASIC vendor to enter and simulate your design. You have to buy
the tools, and the cost of the cell library is folded into the NRE. Some ASIC vendors (especially
for MGAs) supply tools that they have developed in-house. For some reason the more common
model in Japan is to use tools supplied by the ASIC vendor, but in the United States, Europe, and
elsewhere designers want to choose their own tools. Perhaps this has to do with the relationship
between customer and supplier being a lot closer in Japan than it is elsewhere.

An ASIC vendor library is normally a phantom library —the cells are empty boxes, or
phantoms , but contain enough information for layout (for example, you would only see the
bounding box or abutment box in a phantom version of the cell in Figure 1.3). After you
complete layout you hand off a netlist to the ASIC vendor, who fills in the empty boxes (
phantom instantiation ) before manufacturing your chip.

The second and third choices require you to make a buy-or-build decision. If you complete an
ASIC design using a cell library that you bought, you also own the masks (the tooling ) that are
used to manufacture your ASIC. This is called customer-owned tooling (COT, pronounced
“see-oh-tee”). A library vendor normally develops a cell library using information about a
process supplied by an ASIC foundry . An ASIC foundry (in contrast to an ASIC vendor) only
provides manufacturing, with no design help. If the cell library meets the foundry specifications,
we call this a qualified cell library . These cell libraries are normally expensive (possibly
several hundred thousand dollars), but if a library is qualified at several foundries this allows you
to shop around for the most attractive terms. This means that buying an expensive library can be
cheaper in the long run than the other solutions for high-volume production.

The third choice is to develop a cell library in-house. Many large computer and electronics
companies make this choice. Most of the cell libraries designed today are still developed in-
house despite the fact that the process of library development is complex and very expensive.
However created, each cell in an ASIC cell library must contain the following:

 A physical layout
 A behavioral model
 A Verilog/VHDL model
 A detailed timing model
 A test strategy
 A circuit schematic
 A cell icon
 A wire-load model
 A routing model

For MGA and CBIC cell libraries we need to complete cell design and cell layout and shall
discuss this in Chapter 2. The ASIC designer may not actually see the layout if it is hidden inside
a phantom, but the layout will be needed eventually. In a programmable ASIC the cell layout is
part of the programmable ASIC design.

The ASIC designer needs a high-level, behavioral model for each cell because simulation at the
detailed timing level takes too long for a complete ASIC design. For a NAND gate a behavioral
model is simple. A multiport RAM model can be very complex. We shall discuss behavioral
models when we describe Verilog and VHDL. ASIC designers also need a detailed timing model
for each cell to determine the performance of the critical pieces of an ASIC. It is too difficult, too
time-consuming, and too expensive to build every cell in silicon and measure the cell delays.
Instead library engineers simulate the delay of each cell, a process known as characterization.

Characterizing a standard-cell or gate-array library involves circuit extraction from the full-
custom cell layout for each cell. The extracted schematic includes all the parasitic resistance and
capacitance elements. Then library engineers perform a simulation of each cell including the
parasitic elements to determine the switching delays. The simulation models for the transistors
are derived from measurements on special chips included on a wafer called process control
monitors ( PCMs ) or dropins . Library engineers then use the results of the circuit simulation
to generate detailed timing models for logic simulation.
All ASICs need to be production tested (programmable ASICs may be tested by the manufacturer
before they are customized, but they still need to be tested). Simple cells in small or medium-size
blocks can be tested using automated techniques, but large blocks such as RAM or multipliers
need a planned test strategy. We shall discuss testing later. The cell schematic (a netlist description)
describes each cell so that the cell designer can perform simulation for complex cells. You may
not need the detailed cell schematic for all cells, but you need enough information to compare
what you think is on the silicon (the schematic) with what is actually on the silicon (the
layout)—this is a layout versus schematic ( LVS ) check. If the ASIC designer uses schematic
entry, each cell needs a cell icon together with connector and naming information that can be
used by design tools from different vendors. We shall cover ASIC design using schematic entry
later. One of the advantages of using logic synthesis rather than schematic design entry is
eliminating the problems with icons, connectors, and cell names. Logic synthesis also makes
moving an ASIC between different cell libraries, or retargeting, much easier. In order to
estimate the parasitic capacitance of wires before we actually complete any routing, we need a
statistical estimate of the capacitance for a net in a given-size circuit
block. This usually takes the form of a look-up table known as a wire-load model . We also need
a routing model for each cell. Large cells are too complex for the physical design or layout tools
to handle directly and we need a simpler representation—a phantom —of the physical layout
that still contains all the necessary information. The phantom may include information that tells
the automated routing tool where it can and cannot place wires over the cell, as well as the
location and types of the connections to the cell.

7. Explain the Xilinx FPGA blocks.


Xilinx LCA
Xilinx LCA (a trademark, denoting logic cell array) basic logic cells, configurable logic blocks
or CLBs , are bigger and more complex than the Actel or QuickLogic cells. The Xilinx LCA
basic logic cell is an example of a coarse-grain architecture . The Xilinx CLBs contain both
combinational logic and flip-flops.

XC3000 CLB

The XC3000 CLB, shown in Figure 5.6, has
five logic inputs (A–E), a common clock input (K), an asynchronous direct-reset input (RD), and
an enable (EC). Using programmable MUXes connected to the SRAM programming cells, you
can independently connect each of the two CLB outputs (X and Y) to the output of the flip-flops
(QX and QY) or to the output of the combinational logic (F and G). A 32-bit look-up table (
LUT ), stored in 32 bits of SRAM, provides the ability to implement combinational logic.
Suppose you need to implement the function F = A · B · C · D · E (a five-input AND). You set
the contents of LUT cell number 31 (with address '11111') in the 32-bit SRAM to a '1'; all the
other SRAM cells are set to '0'. When you apply the input variables as an address to the 32-bit
SRAM, only when ABCDE = '11111' will the output F be a '1'. This means that the CLB
propagation delay is fixed, equal to the LUT access time, and independent of the logic function
you implement. There are seven inputs for the combinational logic in the XC3000 CLB: the five
CLB inputs (A–E) and the flip-flop outputs (QX and QY). There are two outputs from the LUT
(F and G). Since a 32-bit LUT requires only five variables to form a unique address (32 = 2^5),
there are several ways to use the LUT:

• You can use five of the seven possible inputs (A–E, QX, QY) with the entire 32-bit LUT. The
CLB outputs (F and G) are then identical.
• You can split the 32-bit LUT in half to implement two functions of four variables each. You can
choose four input variables from the seven inputs (A–E, QX, QY). You have to choose two of the
inputs from the five CLB inputs (A–E); then one function output connects to F and the other
output connects to G.
• You can split the 32-bit LUT in half, using one of the seven input variables as a select input to a
2:1 MUX that switches between F and G. This allows you to implement some functions of six
and seven variables.
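
A toy software model of the 32-bit LUT described above (implementing the five-input AND
example) is shown below; the address bit ordering (A taken as the most significant address bit)
is an assumption made for illustration, not the actual XC3000 wiring.

lut = [0] * 32
lut[0b11111] = 1            # programme cell 31 for the five-input AND F = A.B.C.D.E

def clb_lut(a, b, c, d, e):
    # The five inputs form an address into the 32 SRAM cells; the delay is just the LUT access time.
    addr = (a << 4) | (b << 3) | (c << 2) | (d << 1) | e
    return lut[addr]

print(clb_lut(1, 1, 1, 1, 1))   # -> 1
print(clb_lut(1, 0, 1, 1, 1))   # -> 0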


XC4000 Logic Block


The CLB used in the XC4000 series of Xilinx FPGAs is a fairly complicated basic logic cell
containing two four-input LUTs that feed a three-input
LUT. The XC4000 CLB also has special fast carry logic hard-wired between CLBs. MUX
control logic maps four control inputs (C1–C4) into the four inputs: LUT input H1, direct in
(DIN), enable clock (EC), and a set / reset control (S/R) for the flip-flops. The control inputs
(C1–C4) can also be used to control the use of the F' and G' LUTs as 32 bits of SRAM.


XC5200 Logic Block


The basic logic cell used in the XC5200 family of Xilinx LCA FPGAs is called a Logic Cell or
LC. The LC is similar to the CLBs in the XC2000/3000/4000 families, but simpler. Xilinx
retained the term CLB in the XC5200 to mean a group of four LCs (LC0–LC3). The XC5200 LC
contains a four-input LUT, a flip-flop, and MUXes to handle signal switching. The arithmetic
carry logic is separate from the LUTs. A limited capability to cascade functions is provided
(using the MUX labeled F5_MUX in logic cells LC0 and LC2 in Figure 5.8) to gang two LCs in
parallel to provide the equivalent of a five-input LUT.
