QUESTION BANK
VI-SEMESTER
( Reg-2013)
AUTHORS
1. Mr. M. Yuvaraj,
Assistant Professor, Dept. of ECE, Agni College of Technology.
2. Mrs. A Shifana Parween,
Assistant Professor, Dept. of ECE, Agni College of Technology.
3. Mr. G. Laxmanaa,
Assistant Professor, Dept. of ECE, Agni College of Technology.
5. What is Scaling?
Scaling is the proportional adjustment of the dimensions of an electronic device while maintaining its electrical properties; it results in a device either larger or smaller than the unscaled device.
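As an illustration of the definition above, here is a minimal sketch (not from the question bank) of constant-field ("full") scaling by a factor S, in which dimensions and voltages all shrink together so internal electric fields stay constant. Parameter names and starting values are illustrative assumptions.

```python
# Illustrative sketch: constant-field (full) scaling of a MOS device by a
# factor S. All dimensions and voltages shrink by S, keeping fields constant.

def full_scale(params, S):
    """Scale device parameters by S under constant-field scaling."""
    return {
        "length_um": params["length_um"] / S,  # channel length L -> L/S
        "width_um":  params["width_um"] / S,   # channel width  W -> W/S
        "tox_nm":    params["tox_nm"] / S,     # oxide thickness -> tox/S
        "vdd_v":     params["vdd_v"] / S,      # supply voltage  -> VDD/S
        # gate delay improves by S, power per gate drops by S^2,
        # so power density (power/area) stays roughly constant
    }

device = {"length_um": 1.0, "width_um": 10.0, "tox_nm": 20.0, "vdd_v": 5.0}
scaled = full_scale(device, S=2.0)
print(scaled)   # every entry halved
```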
19. Define Accumulator.
An accumulator is a register for short-term, intermediate storage of arithmetic and logic data in a computer's CPU (central processing unit). The term "accumulator" is rarely used in reference to contemporary CPUs, having been replaced around the turn of the millennium by the term "register." In modern computers, any register can function as an accumulator.
gate array, whereas the space between rows of cells may be adjusted in a CBIC. We can do this because we customize the contact layer that defines the connections between metal 1, the first layer of metal, and the transistors.
24. What is an antifuse?
An antifuse is normally high resistance (>100 MΩ). On application of appropriate programming voltages, the antifuse is changed permanently to a low-resistance structure (200-500 Ω).
DC transfer characteristics
Digital circuits are merely analog circuits used over a special portion of their range. The DC
transfer characteristics of a circuit relate the output voltage to the input voltage, assuming the
input changes slowly enough that capacitances have plenty of time to charge or discharge.
Specific ranges of input and output voltages are defined as valid 0 and 1 logic levels. This
section explores the DC transfer characteristics of CMOS gates and pass transistors.
The operation of the CMOS inverter can be divided into five regions, indicated on Figure 2.26(c). The state of each transistor in each region is shown in Table 2.3. In region A, the nMOS transistor is OFF, so the pMOS transistor pulls the output to VDD. In region B, the nMOS transistor starts to turn ON, pulling the output down. In region C, both transistors are in saturation. Notice that ideal transistors are only in region C for Vin = VDD/2 and that the slope of the transfer curve in this example is -∞ in this region, corresponding to infinite gain. Real transistors have finite output resistances on account of channel length modulation, described in Section 2.4.2, and thus have finite slopes over a broader region C. In region D, the pMOS transistor is partially ON, and in region E it is completely OFF, leaving the nMOS transistor to pull the output down to GND.
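The five inverter regions described above can be sketched as a small lookup using the ideal long-channel boundaries. This is an illustrative sketch only; the supply and threshold values are assumptions, not from the text.

```python
# Sketch of the five CMOS inverter operating regions (A-E) versus Vin,
# using ideal long-channel boundaries. VDD, VTN, VTP are assumed values.

VDD, VTN, VTP = 1.0, 0.3, -0.3   # assumed supply and thresholds (volts)

def inverter_region(vin):
    """Return region A-E and the (nMOS, pMOS) states for input vin."""
    mid = VDD / 2
    if vin < VTN:
        return "A", ("OFF", "linear")             # output pulled to VDD
    if vin < mid:
        return "B", ("saturation", "linear")      # output starts to fall
    if vin == mid:
        return "C", ("saturation", "saturation")  # both saturated: max gain
    if vin <= VDD + VTP:                          # i.e., vin <= VDD - |VTP|
        return "D", ("linear", "saturation")
    return "E", ("linear", "OFF")                 # output pulled to GND

for v in (0.1, 0.4, 0.5, 0.6, 0.9):
    print(v, inverter_region(v))
```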
I-V Characteristics
When familiarizing yourself with a new process, a good starting point is to plot the current-voltage (I-V) characteristics. Although digital designers seldom make calculations directly from these plots, it is helpful to know the ON current of nMOS and pMOS transistors, how severely velocity-saturated the process is, how the current rolls off below threshold, how the devices are affected by DIBL and body effect, and so forth. These plots are made with DC sweeps, as discussed in Section 8.2.2. Each transistor is 1 μm wide in a representative 65 nm process at 70 °C with VDD = 1.0 V. Figure 8.16 shows nMOS characteristics and Figure 8.17 shows pMOS characteristics. Figure 8.16(a) plots Ids vs. Vds at various values of Vgs, as was done in Figure 8.5. The saturation current would ideally increase quadratically with Vgs - Vt, but in this plot it shows closer to a linear dependence, indicating that the nMOS transistor is severely velocity saturated (α closer to 1 than 2 in the α-power model). The significant increase in saturation current with Vds is caused by channel-length modulation. Figure 8.16(b) makes a similar plot for a device with a drawn channel length of twice minimum. The current drops by less than a factor of two because it experiences less velocity saturation. The current is slightly flatter in saturation because channel-length modulation has less impact at longer channel lengths. Figure 8.16(c) plots Ids vs. Vgs on a semilogarithmic scale for Vds = 0.1 V and 1.0 V. The straight line at low Vgs indicates that the current rolls off exponentially below threshold. The difference in subthreshold leakage at the varying drain voltage reflects the effects of DIBL.
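The α-power model mentioned above can be sketched in a few lines. The constants below (k, Vt) are made-up illustration values; the point is only how α controls the dependence of saturation current on gate overdrive.

```python
# Sketch of the alpha-power law: Idsat ~ (Vgs - Vt)^alpha, with alpha near 2
# for long-channel devices and near 1 for severely velocity-saturated ones.
# k and vt are assumed illustration constants, not process data.

def idsat_alpha(vgs, vt=0.3, k=1e-3, alpha=1.2):
    """Saturation current per the alpha-power model (amps)."""
    if vgs <= vt:
        return 0.0
    return k * (vgs - vt) ** alpha

# A fully velocity-saturated device (alpha = 1) only doubles its current
# when the gate overdrive doubles; an ideal square-law device quadruples it.
i1 = idsat_alpha(0.65, alpha=1.0)   # overdrive 0.35 V
i2 = idsat_alpha(1.00, alpha=1.0)   # overdrive 0.70 V
print(round(i2 / i1, 6))            # -> 2.0
```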
2.a. Discuss the techniques to reduce switching activity in static and dynamic CMOS circuits.
Circuit Families
Static CMOS circuits with complementary nMOS pulldown and pMOS pullup networks are used for the vast majority of logic gates in integrated circuits. They have good noise
margins, and are fast, low power, insensitive to device variations, easy to design, widely
supported by CAD tools, and readily available in standard cell libraries. When noise does exceed
the margins, the gate delay increases because of the glitch, but the gate eventually will settle to
the correct answer. Most design teams now use static CMOS exclusively for combinational
logic. This section begins with a number of techniques for optimizing static CMOS circuits.
Nevertheless, performance or area constraints occasionally dictate the need for other circuit
families. The most important alternative is dynamic circuits. However, we begin by considering
ratioed circuits, which are simpler and offer a helpful conceptual transition between static and
dynamic. We also consider pass transistors, which had their zenith in the 1990s for general-
purpose logic and still appear in specialized applications.
Static CMOS
Designers accustomed to AND and OR functions must learn to think in terms of NAND
and NOR to take advantage of static CMOS. In manual circuit design, this is often done through
bubble pushing. Compound gates are particularly useful to perform complex functions with
relatively low logical efforts. When a particular input is known to arrive latest, the gate can be
optimized to favor that input. Similarly, when either the rising or falling edge is known to be
more critical, the gate can be optimized to favor that edge. We have focused on building gates
with equal rising and falling delays; however, using smaller pMOS transistors can reduce power,
area, and delay. In processes with multiple threshold voltages, multiple flavors of gates can be constructed with different speed/leakage power trade-offs.
Bubble Pushing
CMOS stages are inherently inverting, so AND and OR functions must be built from NAND and NOR gates. DeMorgan's law helps with this conversion. In general, the logical effort of compound gates can be different for different inputs. Figure 9.4 shows how logical
efforts can be estimated for the AOI21, AOI22, and a more complex compound AOI gate. The
transistor widths are chosen to give the same drive as a unit inverter. The logical effort of each
input is the ratio of the input capacitance of that input to the input capacitance of the inverter. For
the AOI21 gate, this means the logical effort is slightly lower for the OR terminal (C) than for
the two AND terminals (A, B). The parasitic delay is crudely estimated from the total diffusion
capacitance on the output node by summing the sizes of the transistors attached to the output.
These relations are illustrated graphically in Figure 9.1. A NAND gate is equivalent to an OR of
inverted inputs. A NOR gate is equivalent to an AND of inverted inputs. The same relationship
applies to gates with more inputs. Switching between these representations is easy to do on a
whiteboard and is often called bubble pushing.
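The logical-effort bookkeeping described above can be sketched as a small calculation. This is an illustrative sketch using NAND2/NOR2 rather than the AOI gates of Figure 9.4; the widths follow the usual unit-inverter convention (nMOS width 1, pMOS width 2), which is an assumption of the sketch.

```python
# Sketch: logical effort g = Cin(gate input) / Cin(inverter), taking input
# capacitance as the sum of transistor widths tied to that input.

INV_CIN = 1 + 2   # unit inverter: nMOS width 1 + pMOS width 2

def logical_effort(widths_on_input):
    """widths_on_input: widths of transistors whose gates tie to one input."""
    return sum(widths_on_input) / INV_CIN

# 2-input NAND sized for unit drive: each input sees a series nMOS of
# width 2 and a parallel pMOS of width 2.
g_nand2 = logical_effort([2, 2])
# 2-input NOR: parallel nMOS of width 1, series pMOS of width 4.
g_nor2 = logical_effort([1, 4])

print(g_nand2, g_nor2)   # 4/3 and 5/3, the familiar textbook values
```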
Dynamic Circuits
Ratioed circuits reduce the input capacitance by replacing the pMOS transistors connected to the
inputs with a single resistive pullup. The drawbacks of ratioed circuits include slow rising
transitions, contention on the falling transitions, static power dissipation, and a nonzero VOL.
Dynamic circuits circumvent these drawbacks by using a clocked pullup transistor rather than a
pMOS that is always ON. Figure 9.21 compares (a) static CMOS, (b) pseudo-nMOS, and (c)
dynamic inverters. Dynamic circuit operation is divided into two modes, as shown in Figure
9.22. During precharge, the clock φ is 0, so the clocked pMOS is ON and initializes the output Y
high. During evaluation, the clock is 1 and the clocked pMOS turns OFF. The output may
remain high or may be discharged low through the pulldown network. Dynamic circuits are the
fastest commonly used circuit family because they have lower input capacitance and no
contention during switching. They also have zero static power dissipation. However, they require
careful clocking, consume significant dynamic power, and are sensitive to noise during
evaluation. Clocking of dynamic circuits will be discussed in much more detail in Section 10.5.
In Figure 9.21(c), if the input A is 1 during precharge, contention will take
place because both the pMOS and nMOS transistors will be ON. When the input cannot be
guaranteed to be 0 during precharge, an extra clocked evaluation transistor can be added to the
bottom of the nMOS stack to avoid contention as shown in Figure 9.23. The extra transistor is
sometimes called a foot.
Figure 9.24 shows generic footed and unfooted gates. Figure 9.25 estimates the falling logical
effort of both footed and unfooted dynamic gates. As usual, the pulldown transistors’ widths are
chosen to give unit resistance. Precharge occurs while the gate is idle and often may take place
more slowly. Therefore, the precharge transistor width is chosen for twice unit resistance. This
reduces the capacitive load on the clock and the parasitic capacitance at the expense of greater
rising delays. We see that the logical efforts are very low. Footed gates have higher logical effort
than their unfooted counterparts but are still an improvement over static logic. In practice, the
logical effort of footed gates is better than predicted because velocity saturation means series
nMOS transistors have less resistance than we have estimated. Moreover, logical efforts are also
slightly better than predicted because there is no contention between nMOS and pMOS
transistors during the input transition. The size of the foot can be increased relative to the other
nMOS transistors to reduce logical effort of the other inputs at the expense of greater clock
loading. Like pseudo-nMOS gates, dynamic gates are particularly well suited to wide NOR
functions or multiplexers because the logical effort is independent of the number of inputs. Of
course, the parasitic delay does increase with the number of inputs because there is more
diffusion capacitance on the output node. Characterizing the logical effort and parasitic delay of
dynamic gates is tricky because the output tends to fall much faster than the input
rises, leading to potentially misleading dependence of propagation delay on fanout
[Sutherland99]. A fundamental difficulty with dynamic circuits is the monotonicity requirement.
While a dynamic gate is in evaluation, the inputs must be monotonically rising. That is, the input
can start LOW and remain LOW, start LOW and rise HIGH, start HIGH and remain HIGH, but
not start HIGH and fall LOW. Figure 9.26 shows waveforms for a footed dynamic inverter in
which the input violates monotonicity. During precharge, the output is pulled HIGH. When the
clock rises, the input is HIGH so the output is discharged LOW through the pulldown network,
as you would want to have happen in an inverter. The input later falls LOW, turning off the
pulldown network. However, the precharge transistor is also OFF so the output floats, staying
LOW rather than rising as it would in a normal inverter. The output will remain low until the
next precharge step. In summary, the inputs must be monotonically rising for the dynamic gate to
compute the correct function. Unfortunately, the output of a dynamic gate begins HIGH and
monotonically falls LOW during evaluation. This monotonically falling output X is not a suitable
input to a second dynamic gate expecting monotonically rising signals, as shown in Figure 9.27.
Dynamic gates sharing the same clock cannot be directly connected
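The monotonicity failure described above can be shown with a toy cycle-by-cycle simulation. This is a behavioral sketch, not a circuit simulation; it simply models precharge, evaluation, and the floating output.

```python
# Toy time-step model of a footed dynamic inverter, showing why an input
# that rises then falls during evaluation leaves the output stuck low.

def dynamic_inverter(clk_seq, a_seq):
    """Simulate output Y of a footed dynamic inverter over time steps."""
    y = 1
    out = []
    for clk, a in zip(clk_seq, a_seq):
        if clk == 0:
            y = 1        # precharge: clocked pMOS pulls Y high
        elif a == 1:
            y = 0        # evaluate: pulldown network discharges Y
        # else: pullup and pulldown both OFF -> Y floats, keeping its value
        out.append(y)
    return out

# Input violates monotonicity: it rises, then falls, during evaluation.
clk = [0, 1, 1, 1]
a   = [0, 1, 0, 0]
print(dynamic_inverter(clk, a))   # [1, 0, 0, 0]: Y floats low, never recovers
```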
Power can also be considered in active, standby, and sleep modes. Active power is the power
consumed while the chip is doing useful work. It is usually dominated by Pswitching. Standby
power is the power consumed while the chip is idle. If clocks are stopped and ratioed circuits are
disabled, the standby power is set by leakage. In sleep mode, the supplies to unneeded circuits
are turned off to eliminate leakage. This drastically reduces the sleep power required, but the
chip requires time and energy to wake up, so sleeping is only viable if the chip will idle for long enough. [Gonzalez96] found that roughly one-third of microprocessor power is spent on the
clock, another third on memories, and the remaining third on logic and wires. In nanometer
technologies, nearly one-third of the power is leakage. High-speed I/O contributes a growing component too. For example, Figure 5.6 shows the active power consumption of Sun's 8-core 84 W Niagara 2 processor [Nawathe08]. The cores and other components collectively account for the clock, logic, and wire power. The next sections investigate how to estimate and minimize each of these
components of power.
Dynamic Power
Dynamic power consists mostly of the switching power, given in EQ (5.10). The supply voltage
VDD and frequency f are readily known by the designer. To estimate this power, one can
consider each node of the circuit. The capacitance of the node is the sum of the gate, diffusion,
and wire capacitances on the node. The activity factor can be estimated using techniques
described in Section 5.2.1 or measured from logic simulations. The effective capacitance of the
node is its true capacitance multiplied by the activity factor. The switching power depends on the
sum of the effective capacitances of all the nodes. Activity factors can be heavily dependent on
the particular task being executed. For example, a processor in a cell phone will use more power
while running video games than while displaying a calendar. CAD tools do a fine job of power
estimation when given a realistic workload. Low power design involves considering and
reducing each of the terms in switching power.
As VDD is a quadratic term, it is good to select the minimum VDD that can support the required
frequency of operation. Likewise, we choose the lowest frequency of operation that achieves the
desired end performance. The activity factor is mainly reduced by putting unused blocks to sleep.
Finally, the circuit may be optimized to reduce the overall load capacitance of each section.
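The node-by-node estimate described above follows EQ (5.10), P_switching = α·C·VDD²·f summed over nodes. Here is a minimal sketch; the node capacitances and activity factors are made-up illustration values.

```python
# Sketch of switching-power estimation: sum effective capacitances
# (true capacitance times activity factor) and apply C_eff * VDD^2 * f.

def switching_power(nodes, vdd, freq):
    """nodes: list of (capacitance_farads, activity_factor) pairs."""
    c_eff = sum(c * alpha for c, alpha in nodes)   # effective capacitance
    return c_eff * vdd ** 2 * freq

nodes = [
    (10e-15, 0.1),   # 10 fF node switching in 10% of cycles
    (20e-15, 0.5),   # heavily exercised 20 fF node
]
p = switching_power(nodes, vdd=1.0, freq=1e9)
print(p)   # approximately 1.1e-5 W, i.e., about 11 microwatts
```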
Dynamic power also includes a short-circuit power component caused by power rushing from VDD to GND when both the pullup and pulldown networks are partially ON while a transistor
switches. This is normally less than 10% of the whole, so it can be conservatively estimated by
adding 10% to the switching power. Switching power is consumed by delivering energy to
charge a load capacitance, then dumping this energy to GND. Intuitively, one might expect that
power could be saved by shuffling the energy around to where it is needed rather than just
dumping it. Resonant circuits and adiabatic charge-recovery circuits [Maksimovic00, Sathe07] seek to achieve such a goal. Unfortunately, all of these techniques add complexity that detracts from the potential energy savings.
The most widely used logic style is static complementary CMOS. The static CMOS style
is really an extension of the static CMOS inverter to multiple inputs. In review, the primary advantages of the CMOS structure are robustness (i.e., low sensitivity to noise), good performance, and low power consumption with no static power dissipation. Most of those properties are
carried over to large fan-in logic gates implemented using a similar circuit topology. The
complementary CMOS circuit style falls under a broad class of logic circuits called static circuits
in which at every point in time (except during the switching transients), each gate output is
connected to either VDD or Vss via a low-resistance path. Also, the outputs of the gates assume
at all times the value of the Boolean function implemented by the circuit (ignoring, once again,
the transient effects during switching periods). This is in contrast to the dynamic circuit class,
which relies on temporary storage of signal values on the capacitance of high-impedance circuit
nodes. The latter approach has the advantage that the resulting gate is simpler and faster. Its
design and operation are however more involved and prone to failure due to an increased
sensitivity to noise. In this section, we sequentially address the design of various static circuit
flavors including complementary CMOS, ratioed logic (pseudo-NMOS and DCVSL), and
pass-transistor logic. The issues of scaling to lower power supply voltages and threshold voltages
will also be dealt with.
Complementary CMOS
Concept
A static CMOS gate is a combination of two networks, called the pull-up network (PUN)
and the pull-down network (PDN) (Figure 6.2). The figure shows a generic N input logic gate
where all inputs are distributed to both the pull-up and pull-down networks. The function of the
PUN is to provide a connection between the output and VDD anytime the output of the logic gate
is meant to be 1 (based on the inputs). Similarly, the function of the PDN is to connect the
output to VSS when the output of the logic gate is meant to be 0. The PUN and PDN networks
are constructed in a mutually exclusive fashion such that one and only one of the networks is
conducting in steady state. In this way, once the transients have settled, a path always exists
between VDD and the output F, realizing a high output (“one”),
or, alternatively, between VSS and F for a low output (“zero”). This is equivalent to stating that
the output node is always a low-impedance node in steady state.
In constructing the PDN and PUN networks, the following observations should be kept in mind:
• A transistor can be thought of as a switch controlled by its gate signal. An NMOS switch is on
when the controlling signal is high and is off when the controlling signal is low. A PMOS
transistor acts as an inverse switch that is on when the controlling signal is low and off when the
controlling signal is high.
• The PDN is constructed using NMOS devices, while PMOS transistors are used in the PUN.
The primary reason for this choice is that NMOS transistors produce “strong zeros,” and PMOS
devices generate “strong ones”. To illustrate this, consider the examples shown in Figure 6.3. In
Figure 6.3a, the output capacitance is initially charged to VDD. Two possible discharge scenarios
are shown. An NMOS device pulls the output all the way down to GND, while a PMOS lowers
the output no further than |VTp| — the PMOS turns off at that point, and stops contributing
discharge current. NMOS transistors are hence the preferred devices in the PDN. Similarly, Figure 6.3 shows two alternative approaches to charging up a capacitor, with the output initially at GND. A PMOS switch succeeds in charging the output all the way to VDD, while the NMOS device fails to raise the output above VDD - VTn. This explains why PMOS transistors are preferentially used in a
PUN. A set of construction rules can be derived to construct logic functions:
• NMOS devices connected in series correspond to an AND function. With all the inputs high, the series combination conducts and the value at one end of the chain is transferred to the other end. Similarly, NMOS transistors connected in parallel represent an OR function. A conducting path
exists between the output and input terminal if at least one of the inputs is high. Using similar
arguments, construction rules for PMOS networks can be formulated. A series connection of PMOS transistors conducts if both inputs are low, representing a NOR function (NOT A · NOT B = NOT(A + B)), while PMOS transistors in parallel implement a NAND (NOT A + NOT B = NOT(A · B)).
• Using De Morgan's theorems (NOT(A + B) = NOT A · NOT B and NOT(A · B) = NOT A + NOT B), it can be shown that the pull-up and pull-down networks of a
complementary CMOS structure are dual networks. This means that a parallel connection of
transistors in the pull-up network corresponds to a series connection of the corresponding
devices in the pull-down network, and vice versa. Therefore, to construct a CMOS gate, one of
the networks (e.g., PDN) is implemented using combinations of series and parallel devices. The
other network (i.e., PUN) is obtained using duality principle by walking the hierarchy, replacing
series sub-nets with parallel sub-nets, and parallel sub-nets with series sub-nets. The complete
CMOS gate is constructed by combining the PDN with the PUN.
• The complementary gate is naturally inverting, implementing only functions such as NAND,
NOR, and XNOR. The realization of a non-inverting Boolean function (such as AND, OR, or XOR)
XOR) in a single stage is not possible, and requires the addition of an extra inverter stage.
• The number of transistors required to implement an N-input logic gate is 2N.
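The duality rule described above (series in the PDN becomes parallel in the PUN, and vice versa) can be sketched mechanically. The tuple encoding below is a hypothetical representation invented for this sketch.

```python
# Sketch of PDN -> PUN duality: represent a network as nested
# ('series', ...) / ('parallel', ...) tuples and swap the two kinds.

def dual(net):
    """Swap series <-> parallel recursively; leaf inputs are unchanged."""
    if isinstance(net, str):
        return net                      # a single transistor input, e.g. 'A'
    kind, *subs = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, *[dual(s) for s in subs])

# PDN of a 2-input NAND: A and B in series (both must be 1 to pull down).
pdn_nand2 = ("series", "A", "B")
print(dual(pdn_nand2))    # ('parallel', 'A', 'B'): pMOS A and B in parallel

# PDN of an AOI21-style gate: (A series B) in parallel with C.
pdn_aoi21 = ("parallel", ("series", "A", "B"), "C")
print(dual(pdn_aoi21))    # ('series', ('parallel', 'A', 'B'), 'C')
```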
Ratioed Logic
Concept
Ratioed logic is an attempt to reduce the number of transistors required to implement a
given logic function, at the cost of reduced robustness and extra power dissipation. The purpose
of the PUN in complementary CMOS is to provide a conditional path between VDD and the
output when the PDN is turned off. In ratioed logic, the entire PUN is replaced with a single
unconditional load device that pulls up the output for a high output. Instead of a combination of
active pull-down and pull-up networks, such a gate consists of an NMOS pull-down network that
realizes the logic function, and a simple load device. Figure 6.27b shows an example of ratioed
logic, which uses a grounded PMOS load and is referred to as a pseudo-NMOS gate.
The clear advantage of pseudo-NMOS is the reduced number of transistors (N+1 versus 2N for
complementary CMOS). The nominal high output voltage (VOH) for this gate is VDD since the
pull-down devices are turned off when the output is pulled high (assuming that VOL is below
VTn). On the other hand, the nominal low output voltage is
not 0 V since there is a fight between the devices in the PDN and the grounded PMOS load
device. This results in reduced noise margins and more importantly static power dissipation.
The sizing of the load device relative to the pull-down devices can be used to trade off parameters such as noise margin, propagation delay, and power dissipation. Since the voltage
swing on the output and the overall functionality of the gate depends upon the ratio between the
NMOS and PMOS sizes, the circuit is called ratioed. This is in contrast to the ratioless logic
styles, such as complementary CMOS, where the low and high levels do not depend upon
transistor sizes. Computing the DC transfer characteristic of the pseudo-NMOS gate proceeds along paths similar to those used for its complementary CMOS counterpart. The value of VOL is obtained by
equating the currents through the driver and load devices for Vin = VDD. At this operation point,
it is reasonable to assume that the NMOS device resides in linear mode (since the output should
ideally be close to 0V), while the PMOS load is saturated. In order to make VOL as small as
possible, the PMOS device should be sized much smaller than the NMOS pull-down devices.
Unfortunately, this has a negative impact on the propagation delay for charging up the output
node since the current provided by the PMOS device is limited.
A major disadvantage of the pseudo-NMOS gate is the static power dissipated when the output is low, through the direct current path that exists between VDD and GND. The static power consumption in the low-output mode is easily derived: Plow = VDD · Ilow.
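The VOL calculation described above can be sketched numerically: equate the linear-region nMOS driver current with the saturated pMOS load current at Vin = VDD and solve for Vout. The device constants below are assumptions for illustration, not data from the text.

```python
# Sketch: solve VOL of a pseudo-NMOS inverter by bisection, equating the
# linear nMOS driver current with the saturated grounded-gate pMOS load.
# All device constants are assumed illustration values.

VDD, VTN, VTP = 2.5, 0.4, -0.4
KN, KP = 120e-6, 30e-6          # kn'(W/L) values: driver sized 4x stronger

def i_nmos_linear(vout):        # Vgs = VDD, Vds = vout (small)
    return KN * ((VDD - VTN) * vout - vout**2 / 2)

def i_pmos_sat():               # grounded-gate load: |Vgs| = VDD, saturated
    return KP / 2 * (VDD + VTP) ** 2

def solve_vol(lo=0.0, hi=1.0, steps=60):
    for _ in range(steps):      # bisection on In(v) - Ip (monotonic in v)
        mid = (lo + hi) / 2
        if i_nmos_linear(mid) < i_pmos_sat():
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

vol = solve_vol()
print(round(vol, 3))            # small but nonzero, as the text explains
```

Making KP smaller drives VOL toward 0 V, at the cost of a slower rising output, matching the trade-off described above.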
Pass-Transistor Logic
Pass-Transistor Basics
A popular and widely-used alternative to complementary CMOS is pass-transistor logic,
which attempts to reduce the number of transistors required to implement logic by allowing the
primary inputs to drive gate terminals as well as source/drain terminals. This is in contrast to logic families that we have studied so far, which only allow primary inputs to drive the gate terminals of MOSFETs. Figure 6.33 shows an implementation of the AND function constructed that way,
using only NMOS transistors. In this gate, if the B input is high, the top transistor is turned on
and copies the input A to the output F. When B is low, the bottom pass transistor is turned on and
passes a 0. The switch driven by B seems to be redundant at first glance. Its presence is essential
to ensure that the gate is static, that is, that a low-impedance path exists to the supply rails under all circumstances; in this particular case, when B is low. The promise of this approach is that
fewer transistors are required to implement a given function. For example, the implementation of
the AND gate in Figure 6.33 requires 4 transistors (including the inverter required to invert B),
while a complementary CMOS implementation would require 6 transistors. The reduced number
of devices has the additional advantage of lower capacitance. Unfortunately, as discussed earlier,
an NMOS device is effective at passing a 0 but is poor at pulling a node to VDD. When the pass
transistor pulls a node high, the output only charges up to VDD - VTn. In fact, the situation is worsened by the fact that the devices experience body effect, which raises VTn as the output voltage rises.
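The degraded high level can be estimated with a short fixed-point calculation: the output settles where Vout = VDD - VTn(Vsb = Vout). The body-effect parameters below are assumptions chosen for illustration.

```python
# Sketch: final "high" level passed by an nMOS pass transistor, including
# body effect. VT0, GAMMA, and PHI_F2 are assumed illustration values.

import math

VDD, VT0 = 2.5, 0.43
GAMMA, PHI_F2 = 0.4, 0.6        # body-effect coefficient, 2*phi_F (volts)

def vt_body(vsb):
    """VTn with body effect: VT0 + gamma*(sqrt(2phiF + Vsb) - sqrt(2phiF))."""
    return VT0 + GAMMA * (math.sqrt(PHI_F2 + vsb) - math.sqrt(PHI_F2))

def pass_high_level(iterations=50):
    """Fixed-point iteration on Vout = VDD - VTn(Vsb = Vout)."""
    vout = VDD - VT0
    for _ in range(iterations):
        vout = VDD - vt_body(vout)
    return vout

v = pass_high_level()
print(round(v, 2))   # noticeably below the naive VDD - VT0 = 2.07 V
```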
Precharge
When CLK = 0, the output node Out is precharged to VDD by the PMOS transistor Mp. During
that time, the evaluate NMOS transistor Me is off, so that the pull-down path is disabled. The
evaluation FET eliminates any static power that would otherwise be consumed during the precharge period (that is, static current would flow between the supplies if both the pulldown network and the precharge device were turned on simultaneously).
Evaluation
For CLK = 1, the precharge transistor Mp is off, and the evaluation transistor Me is turned on.
The output is conditionally discharged based on the input values and the pull-down topology. If
the inputs are such that the PDN conducts, then a low resistance path exists between Out and
GND and the output is discharged to GND. If the PDN is turned off, the precharged value
remains stored on the output capacitance CL, which is a combination of junction capacitances,
the wiring capacitance, and the input capacitance of the fan-out gates. During the evaluation
phase, the only possible path between the output node and a supply rail is to GND.
Consequently, once Out is discharged, it cannot be charged again until the next precharge
operation. The inputs to the gate can therefore make at most one transition during evaluation.
Notice that the output can be in the high-impedance state during the evaluation period if the pull-
down network is turned off. This behavior is fundamentally different from the static counterpart
that always has a low resistance path between the output and one of the power rails. As an example, consider the circuit shown in Figure 6.52b. During the precharge phase (CLK = 0), the
output is precharged to VDD regardless of the input values since the evaluation device is turned
off. During evaluation (CLK=1), a conducting path is created
between Out and GND if (and only if) A·B + C is TRUE. Otherwise, the output remains at the precharged state of VDD. The following function is thus realized: Out = NOT(A·B + C), valid during evaluation.
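The precharge/evaluate behavior just described can be sketched as a toy simulation of the Figure 6.52b gate, which computes Out = NOT(A·B + C) during evaluation and otherwise holds its stored value.

```python
# Behavioral sketch of a footed dynamic gate with PDN = A*B + C.

def dynamic_gate(clk, a, b, c, prev_out):
    """One time step; prev_out models the charge stored on CL."""
    if clk == 0:
        return 1                       # precharge: Out = VDD
    if (a and b) or c:                 # PDN conducts during evaluation
        return 0                       # Out discharged to GND
    return prev_out                    # PDN off: precharged value retained

out = 1
trace = []
for clk, a, b, c in [(0, 0, 0, 0), (1, 1, 1, 0), (0, 0, 0, 0), (1, 0, 0, 0)]:
    out = dynamic_gate(clk, a, b, c, out)
    trace.append(out)
print(trace)   # [1, 0, 1, 1]
```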
A number of important properties can be derived for the dynamic logic gate:
• The logic function is implemented by the NMOS pull-down network. The construction of the
PDN proceeds just as it does for static CMOS.
• The number of transistors (for complex gates) is substantially lower than in the static case: N +
2 versus 2N.
• It is non-ratioed. The sizing of the PMOS precharge device is not important for realizing proper
functionality of the gate. The size of the precharge device can be made large to improve the low-to-high transition time (of course, at a cost to the high-to-low transition time). There is, however, a trade-off with power dissipation, since a larger precharge device directly increases clock-power
dissipation.
• It only consumes dynamic power. Ideally, no static current path ever exists between VDD and
GND. The overall power dissipation, however, can be significantly higher compared to a static
logic gate.
• The logic gates have faster switching speeds. There are two main reasons for this. The first
(obvious) reason is due to the reduced load capacitance attributed to the lower number of
transistors per gate and the single-transistor load per fan-in. Second, the dynamic gate does not
have short circuit current, and all the current provided by the pull-down devices goes towards
discharging the load capacitance. The low and high output levels VOL and VOH are easily
identified as GND and VDD and are not dependent upon the transistor sizes. The other VTC
parameters are dramatically different from static gates. Noise margins and switching thresholds
have been defined as static quantities that are not a function of time. To be functional, a dynamic
gate requires a periodic sequence of precharges and evaluations. Pure static analysis, therefore,
does not apply. During the evaluate period, the pull-down network of a dynamic inverter starts to
conduct when the input signal exceeds the threshold voltage (VTn) of the NMOS pull-down
transistor. Therefore, it is reasonable to set the switching threshold (VM) as well as VIH and VIL
of the gate equal to VTn. This translates to a low value for the NML.
When evaluating the power dissipation of a dynamic gate, it would appear that dynamic logic
presents a significant advantage. There are three reasons for this. First, the physical capacitance
is lower since dynamic logic uses fewer transistors to implement a given function. Also, the load
seen for each fanout is one transistor instead of two. Second, dynamic logic gates by construction
can at most have one transition per clock cycle. Glitching (or dynamic hazards) does not occur in
dynamic logic. Finally, dynamic gates do not exhibit short circuit power since the pull-up path is
not turned on when the gate is evaluating. While these arguments are generally true, they are
offset by other considerations:
(i) the clock power of dynamic logic can be significant, particularly since the clock node has
a guaranteed transition on every single clock cycle;
(ii) the number of transistors is higher than the minimal set required for implementing the logic;
(iii) short-circuit power may exist when leakage-combating devices are added (as will be discussed further);
(iv) and, most importantly, dynamic logic generally displays a higher switching activity due to the periodic precharge and discharge operations. Earlier, the transition probability for a static gate was shown to be p0·p1 = p0(1 - p0). For dynamic logic, the output transition probability does not depend on the state (history) of the inputs, but rather on the signal probabilities only. For an n-tree dynamic gate, the output makes a 0→1 transition during the precharge phase only if the output was discharged during the preceding evaluate phase. The 0→1 transition probability for an n-type dynamic gate hence equals α0→1 = p0, the probability that the output evaluates to 0.
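The activity-factor comparison above can be checked with a worked example for a 2-input NOR with independent, equiprobable inputs; the input probabilities are an assumption of the example.

```python
# Worked example: 0->1 transition probability of a 2-input NOR,
# static vs. n-type dynamic, assuming independent inputs with p = 0.5.

p_out_one = 0.25          # NOR output is 1 only when both inputs are 0
p_out_zero = 1 - p_out_one

# Static gate: a 0->1 output transition needs output state 0 then state 1.
alpha_static = p_out_zero * p_out_one   # p0 * p1 = 3/16

# Dynamic (n-type) gate: the output precharges high every cycle, so it makes
# a 0->1 transition whenever the previous evaluation discharged it.
alpha_dynamic = p_out_zero              # p0 = 3/4

print(alpha_static, alpha_dynamic)      # 0.1875 0.75: 4x higher activity
```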
A pulsed latch can be built from a conventional CMOS transparent latch driven by a brief clock
pulse. Figure 10.22(a) shows a simple pulse generator, sometimes called a clock chopper or one-
shot [Harris01a]. The pulsed latch is faster than a regular flip-flop because it involves a single
latch rather than two and because it allows time borrowing. It can also consume less energy,
although the pulse generator adds to the energy consumption (and is ideally shared across
multiple pulsed latches for energy and area efficiency). The drawback is the increased hold time.
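The clock chopper idea described above (AND the clock with a delayed, inverted copy of itself) can be sketched behaviorally. The discrete-time delay model below is an illustration; real pulse width is set by an inverter chain, not by sample counts.

```python
# Behavioral sketch of a clock chopper: pulse = clk AND NOT(delayed clk).
# The inverter-chain delay is modeled as a fixed number of time steps.

def clock_chopper(clk_samples, delay=3):
    """pulse[t] = clk[t] AND NOT clk[t - delay]."""
    pulses = []
    for t, clk in enumerate(clk_samples):
        delayed = clk_samples[t - delay] if t >= delay else 0
        pulses.append(1 if clk and not delayed else 0)
    return pulses

clk = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]    # one long high phase of the clock
print(clock_chopper(clk))                # brief pulse only at the rising edge
```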
The Naffziger pulsed latch used on the Itanium 2
processor consists of the latch from Figure 10.17(k) driven by even shorter pulses produced by
the generator of Figure 10.22(b) [Naffziger02]. This pulse generator uses a fairly slow (weak)
inverter to produce a pulse with a nominal width of about one-sixth of the cycle (125 ps for 1.2
GHz operation). When disabled, the internal node of the pulse generator floats high momentarily,
but no keeper is required because the duration is short. Of course, the enable signal has setup and
hold requirements around the rising edge of the clock, as shown in Figure 10.22(c).
Figure 10.22(d) shows yet another pulse generator used on an NEC RISC processor [Kozu96] to
produce substantially longer pulses. It includes a built-in dynamic transmission gate latch to
prevent the enable from glitching during the pulse. Many designers consider short pulses risky.
The pulse generator should be carefully simulated across process corners and possible RC loads
to ensure the pulse is not degraded too badly by process variation or routing. However, the
Itanium 2 team found that the pulses could be used just as regular clocks as long as the pulse
generator had adequate drive. The quad-core Itanium pulse generator selects between 1- and 3-
inverter delay chains using a transmission gate multiplexer [Stackhouse09]. The wider pulse
offers more robust latch operation across process and environmental variability and permits more
time borrowing, but increases the hold time. The multiplexer select is software-programmable to
fix problems discovered after fabrication. The Partovi pulsed latch in Figure 10.23 eliminates the
need to distribute the pulse by building the pulse generator into the latch itself [Partovi96,
Draper97]. The weak cross-coupled inverters in the dashed box staticize the circuit, although the
latch is susceptible to back-driven output noise on Q or Q unless an extra inverter is used to
buffer the output. The Partovi pulsed latch was used on the AMD K6 and Athlon [Golden99], but
is slightly slower than a simple latch [Naffziger02]. It was originally called an Edge Triggered
Latch (ETL), but strictly speaking is a pulsed latch because it has a brief window of
transparency.
Gating the clock with an AND gate delays the clock, potentially introducing clock skew; this
skew can be minimized by building the AND gate into the final buffer of the clock distribution
network. The enable en must be stable while the clock is high to prevent glitches on the clock,
as will be discussed further.
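The clock-chopper idea can be sketched as a simple discrete-time model: the pulse is the AND of the clock with a delayed, inverted copy of itself, as produced by an odd-length inverter chain. Delays here are measured in samples and are purely illustrative, not taken from the text.

```python
# Sketch of a clock chopper / one-shot: pulse = clk AND NOT(delayed clk).
# The odd-length inverter chain is modeled as a pure delay plus inversion;
# chain_delay is in arbitrary time steps.
def chopper(clk_samples, chain_delay):
    """Return the pulsed output for a sampled clock waveform."""
    out = []
    for t, clk in enumerate(clk_samples):
        # Delayed copy of clk (assume the chain output starts low).
        delayed = clk_samples[t - chain_delay] if t >= chain_delay else 0
        # AND the clock with the inverted, delayed clock.
        out.append(clk & (1 - delayed))
    return out

clk = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]    # one clock high phase
print(chopper(clk, 3))                   # -> [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

The pulse width equals the chain delay (three samples here), which is why a slower (weaker) inverter chain yields a wider pulse, as in the Naffziger generator above.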
Timing Properties of the Multiplexer-Based Master-Slave Register. As discussed earlier, there
are three important timing metrics in registers: the setup time, the hold time, and the propagation
delay. It is important to understand the factors that affect these timing parameters and to develop
the intuition to estimate them manually. Assume that the propagation delay of each
inverter is tpd_inv and the propagation delay of the transmission gate is tpd_tx. Also assume that
the contamination delay is 0 and that the inverter used to derive the complementary clock CLK'
from CLK has zero delay.
The setup time is the time before the rising edge of the clock by which the input data D must
become valid. Another way to ask the question is: how long before the rising edge must the D
input be stable so that QM samples the value reliably? For the transmission-gate
multiplexer-based register, the input D has to propagate through I1, T1, I3, and I2 before the
rising edge of the clock. This ensures that the node voltages on both terminals of the
transmission gate T2 are at the same value. Otherwise, it is possible for the cross-coupled pair I2
and I3 to settle to an incorrect value. The setup time is therefore equal to 3 × tpd_inv + tpd_tx.
The propagation delay is the time for the value of QM to propagate to the output Q. Note that
since we included the delay of I2 in the setup time, the output of I4 is valid before the rising edge
of the clock. Therefore the delay tc-q is simply the delay through T3 and I6 (tc-q = tpd_tx +
tpd_inv). The hold time represents the time that the input must be held stable after the rising edge
of the clock. In this case, the transmission gate T1 turns off when the clock goes high, and therefore
any changes in the D input after the clock goes high are not seen by the input. Therefore, the hold
time is 0.
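As a quick numeric sketch of these relations (the delay values are assumed for illustration, not taken from the text):

```python
# Sketch: plugging illustrative numbers into the register timing relations
# derived above. tpd_inv and tpd_tx are assumed values, not from the text.
tpd_inv = 50e-12   # inverter propagation delay (s), assumed
tpd_tx  = 30e-12   # transmission-gate propagation delay (s), assumed

t_setup = 3 * tpd_inv + tpd_tx   # D must pass through I1, T1, I3, and I2
t_cq    = tpd_tx + tpd_inv       # QM propagates through T3 and I6
t_hold  = 0.0                    # T1 turns off at the clock edge

print(t_setup, t_cq, t_hold)     # 180 ps, 80 ps, 0
```

Note that the long setup path (three inverters plus a transmission gate) is the price paid for reliably writing the cross-coupled pair before the clock edge.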
As mentioned earlier, the drawback of the transmission gate register is the high
capacitive load presented to the clock signal. The clock load per register is important since it
directly impacts the power dissipation of the clock network. Ignoring the overhead required to
invert the clock signal (since the buffer inverter overhead can be amortized over multiple register
bits), each register has a clock load of 8 transistors. One approach to reduce the
clock load at the cost of robustness is to make the circuit ratioed. Figure 7.18 shows that the
feedback transmission gate can be eliminated by directly cross coupling the inverters.
The penalty for the reduced clock load is increased design complexity. The transmission gate
(T1) and its source driver must overpower the feedback inverter (I2) to switch the state of the
cross-coupled inverter. The sizing requirements for the transmission gates can be derived using a
similar analysis as performed for the SR flip-flop. The input to the inverter I1 must be brought
below its switching threshold in order to make a transition. If minimum-sized devices are to be
used in the transmission gates, it is essential that the transistors of inverter I2 should be made
even weaker. This can be accomplished by making their channel-lengths larger than minimum.
Using minimum or close-to-minimum-size devices in the transmission gates is desirable to reduce
the power dissipation in the latches and the clock distribution network. Another problem with
this scheme is reverse conduction; that is, the second stage can affect the state of the first
latch. When the slave stage is on (Figure 7.19), it is possible for the combination of T2 and I4 to
influence the data stored in the I1-I2 latch. Fortunately, as long as I4 is a weak device, this is not a
major problem.
We must also observe the strategy decided on earlier for the direction of data and control signal
flow, and the approach adopted should make this feasible. Remember that the overall strategy in
this case is for data to flow horizontally and control signals vertically. A solution which meets
these requirements emerges from the days of switch and relay contact based switching
networks-the crossbar switch. Consider a direct MOS switch implementation of a 4 x 4 crossbar
switch, as in Figure 7.6. The arrangement is quite general and may be readily expanded to
accommodate n-bit inputs/outputs. In fact, this arrangement is an overkill in that any input line
can be connected to any or all output lines-if all switches are closed, then all inputs are connected
to all outputs in one glorious short circuit. Furthermore, 16 control signals (sw00-sw15), one for
each transistor switch, must be provided to drive the crossbar switch, and such complexity is
highly undesirable. An adaptation of this arrangement recognizes the fact that we can couple the
switch gates together in groups of four (in this case) and also form four separate groups
corresponding to shifts of zero, one, two and three bits. The arrangement is readily adapted so
that the in-lines also run horizontally (to conform to the required strategy). The resulting
arrangement is known as a barrel shifter, and a 4 x 4-bit barrel shifter circuit diagram is given in
Figure 7.7. The interbus switches have their gate inputs connected in a staircase fashion in
groups of four and there are now four shift control inputs which must be mutually exclusive in
the active state. CMOS transmission gates may be used in place of the simple pass transistor
switches if appropriate.
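A functional sketch of the 4 x 4 barrel shifter follows; the integer shift amount stands in for asserting exactly one of the four mutually exclusive shift controls, and the rotation direction is a modeling choice, not specified by the text.

```python
# Sketch: functional model of a 4 x 4 barrel shifter. The staircase-connected
# pass-transistor array rotates the input word; `shift` models which of the
# four one-hot shift controls is active.
def barrel_shift(bits, shift):
    """Rotate a 4-bit word by `shift` places (0-3)."""
    n = len(bits)
    return [bits[(i + shift) % n] for i in range(n)]

word = [1, 0, 1, 1]
for sh in range(4):
    print(sh, barrel_shift(word, sh))
```

Because each output is driven through exactly one pass transistor regardless of the shift amount, the hardware this models is fast, as noted above.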
The structure of the barrel shifter is clearly one of high regularity and generality and it may be
readily represented in stick diagram form. One possible implementation, using simple n-type
switches, is given in Figure 7.8. The stick diagram clearly conveys regular topology and allows
the choice of a standard
cell from which complete barrel shifters of any size may be formed by replication of the standard
cell. It should be noted that standard cell boundaries must be carefully chosen to allow for
butting together side by side and top to bottom to retain the overall topology. The mask layout
for standard cell number 2 (arbitrary choice) of Figure 7.8 may then be set out as in Figure 7.9.
Once the standard cell dimensions have been determined, then any n x n barrel shifter may be
configured and its outline, or bounding box, arrived at by summing up the dimensions of the
replicated standard cell. The use of simple n-type switches in a CMOS environment might be
questioned. Although there will be a degrading of logic 1 levels through n-type switches, this
generally does not matter if the shifter is followed by restoring circuitry such as inverters or gate
logic. Furthermore, as there will only ever be one n-type switch in series between an input and
the corresponding output line, the arrangement is fast. The minimum size bounding box outline
for the 4 x 4-way barrel shifter is given in
Figure 7.10. The figure also indicates all inlet and outlet points around the periphery together
with the layer on which each is located. This allows ready placing of the shifter within the floor
plan (Figure 7.5) and its interconnection with the other subsystems forming the datapath. It also
emphasizes the fact that, as in this case, many subsystems need external links to complete their
architecture. In this case, the links shown on the right of the bounding box must be made and
must be allowed for in interconnections and overall dimensions. This form of representation also
allows the subsystem geometric characterization to be that of the bounding box alone for
composing higher levels of the system hierarchy.
Carry-Propagate Addition
N-bit adders take inputs {AN, …, A1}, {BN, …, B1}, and carry-in Cin, and compute the sum
{SN, …, S1} and the carry-out of the most significant bit, Cout, as shown in Figure 11.9. Long
adders use multiple levels of lookahead structures for even more speed.
Carry-Ripple Adder. An N-bit adder can be constructed by cascading N full adders, as shown
in Figure 11.11(a); this is called a carry-ripple adder (or ripple-carry adder). The
carry-out of bit i, Ci, is the carry-in to bit i + 1, and this carry has twice the weight of
the sum Si. The delay of the adder is set by the time for the carries to ripple through the N stages,
so the tCin→Cout delay of each full adder should be
minimized. This delay can be reduced by omitting the inverters on the outputs, as was done in
Figure 11.4(c). Because addition is a self-dual function (i.e., the function of complementary
inputs is the complement of the function), an inverting full adder receiving complementary
inputs produces true outputs. Figure 11.11(b) shows a carry ripple adder built from inverting full
adders. Every other stage operates on complementary data. The delay inverting the adder inputs
or sum outputs is off the critical ripple-carry path.
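The full-adder cascade described above can be sketched functionally (bit lists are LSB first; this models behavior only, not the inverting-stage optimization):

```python
# Sketch: bit-level carry-ripple adder built from a cascade of full adders.
def full_adder(a, b, cin):
    s = a ^ b ^ cin                    # sum bit
    cout = (a & b) | (cin & (a | b))   # majority(a, b, cin)
    return s, cout

def ripple_add(A, B, cin=0):
    """A, B: lists of bits, LSB first. Returns (sum bits, carry-out)."""
    S = []
    c = cin
    for a, b in zip(A, B):             # carry ripples through the N stages
        s, c = full_adder(a, b, c)
        S.append(s)
    return S, c

# 0b0110 (6) + 0b0111 (7) = 0b1101 (13)
S, cout = ripple_add([0, 1, 1, 0], [1, 1, 1, 0])
print(S, cout)   # [1, 0, 1, 1] 0
```

The loop makes the delay behavior explicit: each bit's carry computation waits on the previous one, which is why the adder's delay grows linearly in N.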
Observe that the carry into bit i is the carry-out of bit i−1, i.e., Ci−1 = Gi−1:0. This is an
important relationship; group generate signals and carries will be used synonymously in the
subsequent sections. We can thus compute the sum for bit i using EQ (11.2) as
Si = Pi ⊕ Gi−1:0 (11.7)
1. Computing bitwise generate and propagate signals using EQs (11.5) and (11.6)
2. Combining PG signals to determine group generates Gi−1:0 for all N ≥ i ≥ 1 using EQ (11.4)
3. Calculating the sums using EQ (11.7)
These steps are illustrated in Figure 11.12. The first and third steps are routine, so most of the
attention in the remainder of this section is devoted to alternatives for the group PG logic with
different trade-offs between speed, area, and complexity. Some of the hardware can be shared in
the bitwise PG logic, as shown in Figure 11.13. The critical path of the adder runs from
carry-in to carry-out along the carry chain of majority gates. As the P and G signals will have
already stabilized by the time the carry arrives, we can use them to simplify the majority
function into an AND-OR gate: Ci = Gi + Pi · Ci−1.
In this extreme, the group propagate signals are never used and need not be computed. Figure
11.14 shows a 4-bit carry-ripple adder. The critical carry path now proceeds through a chain of
AND-OR gates rather than a chain of majority gates. Figure 11.15 illustrates the group PG logic
for a 16-bit carry-ripple adder, where the AND-OR gates in the group PG network are
represented with gray cells. Diagrams like these will be used to compare a variety of adder
architectures in subsequent sections. The diagrams use black cells, gray cells, and white buffers
defined in Figure 11.16(a) for valency-2 cells. Black cells contain the group generate and
propagate logic (an AND-OR gate and an AND gate) defined in EQ (11.4). Gray cells containing
only the group generate logic are used at the final cell position in each column because only the
group generate signal is required to compute the sums. Buffers can be used to minimize the load
on critical paths. Each line represents a bundle of the group generate and propagate signals
(propagate signals are omitted after gray cells).
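The three-step PG procedure can be sketched with a serial group-generate chain, mirroring the AND-OR gray cells (a behavioral model only; a real lookahead network would combine PG signals in a tree):

```python
# Sketch: addition via bitwise generate/propagate signals and a serial
# (ripple-style) chain of AND-OR group-generate cells.
def pg_add(A, B, cin=0):
    """A, B: bit lists, LSB first. Uses Gi = Ai & Bi, Pi = Ai ^ Bi and the
    AND-OR recurrence G(i:0) = Gi | (Pi & G(i-1:0))."""
    G = [a & b for a, b in zip(A, B)]          # step 1: bitwise generate
    P = [a ^ b for a, b in zip(A, B)]          # step 1: bitwise propagate
    carries = [cin]                            # carry into bit i = G(i-1:0)
    for g, p in zip(G, P):                     # step 2: group generates
        carries.append(g | (p & carries[-1]))
    S = [p ^ c for p, c in zip(P, carries)]    # step 3: Si = Pi ^ C(i)
    return S, carries[-1]

S, cout = pg_add([0, 1, 1, 0], [1, 1, 1, 0])   # 6 + 7 = 13
print(S, cout)   # [1, 0, 1, 1] 0
```

Replacing this serial loop with a tree of the same AND-OR cells is exactly what the faster group PG networks compared in the following sections do.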
Types of ASICs
ICs are made on a thin (a few hundred microns thick), circular silicon wafer , with each wafer
holding hundreds of die (sometimes people use dies or dice for the plural of die). The transistors
and wiring are made from many layers (usually between 10 and 15 distinct layers) built on top of
one another. Each successive mask layer has a pattern that is defined using a mask similar to a
glass photographic slide. The first half-dozen or so layers define the transistors. The last half-
dozen or so layers define the metal wires between the transistors (the interconnect ).
A full-custom IC includes some (possibly all) logic cells that are customized and all mask layers
that are customized. A microprocessor is an example of a full-custom IC—designers spend many
hours squeezing the most out of every last square micron of microprocessor chip space by hand.
Customizing all of the IC features in this way allows designers to include analog circuits,
optimized memory cells, or mechanical structures on an IC, for example. Full-custom ICs are the
most expensive to manufacture and to design. The manufacturing lead time (the time it takes
just to make an IC—not including design time) is typically eight weeks for a full-custom IC.
These specialized full-custom ICs are often intended for a specific application, so we might call
some of them full-custom ASICs. We shall discuss full-custom ASICs briefly next, but the
members of the IC family that we are more interested in are semicustom ASICs , for which all
of the logic cells are predesigned and some (possibly all) of the mask layers are customized.
Using predesigned cells from a cell library makes our lives as designers much, much easier.
There are two types of semicustom ASICs that we shall cover: standard-cell–based ASICs and
gate-array–based ASICs. Following this we shall describe the programmable ASICs , for which
all of the logic cells are predesigned and none of the mask layers are customized. There are two
types of programmable ASICs: the programmable logic device and, the newest member of the
ASIC family, the field programmable gate array.
Full-Custom ASICs
In a full-custom ASIC an engineer designs some or all of the logic cells, circuits, or layout
specifically for one ASIC. This means the designer abandons the approach of using pretested and
precharacterized cells for all or part of that design. It makes sense to take this approach only if
there are no suitable existing cell libraries available that can be used for the entire design. This
might be because existing cell libraries are not fast enough, or the logic cells are not small
enough or consume too much power. You may need to use full-custom design if the ASIC
technology is new or so specialized that there are no existing cell libraries or because the ASIC is
so specialized that some circuits must be custom designed. Fewer and fewer full-custom ICs are
being designed because of the problems with these special parts of the ASIC. There is one
growing member of this family, though, the mixed analog/digital ASIC, which we shall discuss
next. Bipolar technology has historically been used for precision analog functions. There are
some fundamental reasons for this. In all integrated circuits the matching of component
characteristics between chips is very poor, while the matching of characteristics between
components on the same chip is excellent. Suppose we have transistors T1, T2, and T3 on an
analog/digital ASIC. The three transistors are all the same size and are constructed in an identical
fashion. Transistors T1 and T2 are located adjacent to each other and have the same orientation.
Transistor T3 is the same size as T1 and T2 but is located on the other side of the chip from T1
and T2 and has a different orientation. ICs are made in batches called wafer lots. A wafer lot
is a group of silicon wafers that are all processed together. Usually there are between 5 and 30
wafers in a lot. Each wafer can contain tens or hundreds of chips depending on the size of the IC
and the wafer. If we were to make measurements of the characteristics of transistors T1, T2, and
T3 we would find the following:
- Transistor T1 will have virtually identical characteristics to T2 on the same IC. We say that the transistors match well, or that the tracking between devices is excellent.
- Transistor T3 will match transistors T1 and T2 on the same IC very well, but not as closely as T1 matches T2 on the same IC.
- Transistors T1, T2, and T3 will match fairly well with transistors T1, T2, and T3 on a different IC on the same wafer. The matching will depend on how far apart the two ICs are on the wafer.
- Transistors on ICs from different wafers in the same wafer lot will not match very well.
- Transistors on ICs from different wafer lots will match very poorly.
For many analog designs the close matching of transistors is crucial to circuit operation. For
these circuit designs, pairs of transistors are used, located adjacent to each other. Device physics
dictates that a pair of bipolar
transistors will always match more precisely than CMOS transistors of a comparable size.
Bipolar technology has historically been more widely used for full-custom analog design because
of its improved precision. Despite its poorer analog properties, the use of CMOS technology for
analog functions is increasing. There are two reasons for this. The first reason is that CMOS is
now by far the most widely available IC technology. Many more CMOS ASICs and CMOS
standard products are now being manufactured than bipolar ICs. The second reason is that
increased levels of integration require mixing analog and digital functions on the same IC: this
has forced designers to find ways to use CMOS technology to implement analog functions.
Circuit designers, using clever new techniques, have been very successful in finding new ways to
design analog CMOS circuits that can approach the accuracy of bipolar analog designs.
Standard-Cell–Based ASICs
A cell-based ASIC (cell-based IC, or CBIC —a common term in Japan, pronounced “sea-bick”)
uses predesigned logic cells (AND gates, OR gates, multiplexers, and flipflops, for example)
known as standard cells . We could apply the term CBIC to any IC that uses cells, but it is
generally accepted that a cell-based ASIC or CBIC means a standard-cell–based ASIC. The
standard-cell areas (also called flexible blocks) in a CBIC are built of rows of standard cells—
like a wall built of bricks. The standard-cell areas may be used in combination with larger
predesigned cells, perhaps microcontrollers or even microprocessors, known as megacells .
Megacells are also called megafunctions, full-custom blocks, system-level macros (SLMs), fixed
blocks, cores, or Functional Standard Blocks (FSBs).
The ASIC designer defines only the placement of the standard cells and the interconnect in a
CBIC. However, the standard cells can be placed anywhere on the silicon; this means that all the
mask layers of a CBIC are customized and are unique to a particular customer. The advantage of
CBICs is that designers save time, money, and reduce risk by using a predesigned, pretested, and
precharacterized standard-cell library . In addition each standard cell can be optimized
individually. During the design of the cell library each and every transistor in every standard cell
can be chosen to maximize speed or minimize area, for example. The disadvantages are the time
or expense of designing or buying the standard-cell library and the time needed to fabricate all
layers of the ASIC for each new design. Figure 1.2 shows a CBIC (looking down on the die
shown in Figure 1.1b, for example). The important features of this type of ASIC are as follows:
- All mask layers are customized—transistors and interconnect.
- Custom blocks can be embedded.
- Manufacturing lead time is about eight weeks.
Each standard cell in the library is
constructed using full-custom design methods, but you can use these predesigned and
precharacterized circuits without having to do any full-custom design yourself. This design style
gives you the same performance and flexibility advantages of a full-custom ASIC but reduces
design time and reduces risk. Standard cells are designed to fit together like bricks in a wall.
Figure 1.3 shows an example of a simple standard cell (it is simple in the sense it is not
maximized for density—but ideal for showing you its internal construction). Power and ground
buses (VDD and GND or VSS) run horizontally on metal lines inside the cells. Standard-cell
design allows the automation of the process of assembling an ASIC. Groups of standard cells fit
horizontally together to form rows. The rows stack vertically to form flexible rectangular blocks
(which you can reshape during design). You may then connect a flexible block built from
several rows of standard cells to other standard-cell blocks or other full-custom logic blocks. For
example, you might want to include a custom interface to a standard, predesigned
microcontroller together with some memory. The microcontroller block may be a fixed-size
megacell, you might generate the memory using a memory compiler, and the custom logic and
memory controller will be built from flexible standard-cell blocks, shaped to fit in the empty
spaces on the chip.
Both cell-based and gate-array ASICs use predefined cells, but there is a difference—we can
change the transistor sizes in a standard cell to optimize speed and performance, but the device
sizes in a gate array are fixed. This results in a tradeoff in performance and area in a gate array at
the silicon level. The trade-off between area and performance is made at the library level for a
standard-cell ASIC. Modern CMOS ASICs use two, three, or more levels (or layers) of metal for
interconnect. This allows wires to cross over different layers in the same way that we use copper
traces on different layers on a printed-circuit board. In a two-level metal CMOS technology,
connections to the standard-cell inputs and outputs are usually made using the second level of
metal ( metal2 , the upper level of metal) at the tops and bottoms of the cells. In a three-level
metal technology, connections may be internal to the logic cell (as they are in Figure 1.3). This
allows for more sophisticated routing programs to take advantage of the extra metal layer to
route interconnect over the top of the logic cells. We shall cover the details of routing ASICs in
A connection that needs to cross over a row of standard cells uses a feedthrough. The term
feedthrough can refer either to the piece of metal that is used to pass a signal through a cell or to
a space in a cell waiting to be used as a feedthrough—very confusing. Figure 1.4 shows two
feedthroughs: one in cell A.14 and one in cell A.23. In both two-level and three-level metal
technology, the power buses (VDD and GND) inside the standard cells normally use the lowest
(closest to the transistors) layer of metal ( metal1 ). The width of each row of standard cells is
adjusted so that they may be aligned using spacer cells . The power buses, or rails, are then
connected to additional vertical power rails using row-end cells at the aligned ends of each
standard-cell block. If the rows of standard cells are long, then vertical power rails can also be
run in metal2 through the cell rows using special power cells that just connect to VDD and
GND. Usually the designer manually controls the number and width of the vertical power rails
connected to the standard-cell blocks during physical design. A diagram of the power
distribution scheme for a CBIC is shown in Figure 1.4. All the mask layers of a CBIC are
customized. This allows megacells (SRAM, a SCSI controller, or an MPEG decoder, for
example) to be placed on the same IC with standard cells. Megacells are usually supplied by an
ASIC or library company complete with behavioral models and some way to test them (a test
strategy). ASIC library companies also supply compilers to generate flexible DRAM, SRAM,
and ROM blocks. Since all mask layers on a standard-cell design are customized, memory design
is more efficient and denser than for gate arrays. For logic that operates on multiple signals
across a data bus—a datapath ( DP )—the use of standard cells may not be the most efficient
ASIC design style. Some ASIC library companies provide a datapath compiler that
automatically generates datapath logic . A datapath library typically contains cells such as
adders, subtracters, multipliers, and simple arithmetic and logical units ( ALUs ). The
connectors of datapath library cells are pitch-matched to each other so that they fit together.
Connecting datapath cells to form a datapath usually, but not always, results in faster and denser
layout than using standard cells or a gate array. Standard-cell and gate-array libraries may
contain hundreds of different logic cells, including combinational functions (NAND, NOR,
AND, OR gates) with multiple inputs, as well as latches and flip-flops with different
combinations of reset, preset and clocking options. The ASIC library company provides
designers with a data book in paper or electronic form with all of the functional descriptions and
timing information for each library element.
Gate-Array–Based ASICs
In a gate array (sometimes abbreviated to GA) or gate-array–based ASIC the transistors are
predefined on the silicon wafer. The predefined pattern of transistors on a gate array is the base
array , and the smallest element that is replicated to make the base array (like an M. C. Escher
drawing, or tiles on a floor) is the base cell (sometimes called a primitive cell ). Only the top
few layers of metal, which define the interconnect between transistors, are defined by the
designer using custom masks. To distinguish this type of gate array from other types of gate
array, it is often called a masked gate array ( MGA ). The designer chooses from a gate-array
library of predesigned and precharacterized logic cells. The logic cells in a gate-array library are
often called macros . The reason for this is that the base-cell layout is the same for each logic
cell, and only the interconnect (inside cells and between cells) is customized, so that there is a
similarity between gate-array macros and a software macro. Inside IBM, gate-array macros are
known as books (so that books are part of a library), but unfortunately this descriptive term is not
very widely used outside IBM. We can complete the diffusion steps that form the transistors and
then stockpile wafers (sometimes we call a gate array a prediffused array for this reason). Since
only the metal interconnections are unique to an MGA, we can use the stockpiled wafers for
different customers as needed. Using wafers prefabricated up to the metallization steps reduces
the time needed to make an MGA, the turnaround time , to a few days or at most a couple of
weeks. The costs for all the initial fabrication steps for an MGA are shared for each customer and
this reduces the cost of an MGA compared to a full-custom or standard-cell ASIC design. There
are the following different types of MGA or gate-array–based ASICs: channeled gate arrays,
channelless gate arrays, and structured gate arrays.
The hyphenation of these terms when they are used as adjectives explains their construction. For
example, in the term “channeled gate-array architecture,” the gate array is channeled , as will be
explained. There are two common ways of arranging (or arraying) the transistors on an MGA: in a
channeled gate array we leave space between the rows of transistors for wiring; the routing on a
channelless gate array uses rows of unused transistors. The channeled gate array was the first to
be developed, but the channelless gate-array architecture is now more widely used. A structured
(or embedded) gate array can be either channeled or channelless but it includes (or embeds) a
custom block.
Channelless gate array (also known as a channel-free gate array , sea-of-gates array , or
SOG array). The important features of this type of MGA are as follows:
- Only some (the top few) mask layers are customized—the interconnect.
- Manufacturing lead time is between two days and two weeks.
The key difference between a channelless gate array and channeled gate array is that there are no
predefined areas set aside for routing between cells on a channelless gate array. Instead we route
over the top of the gate-array devices. We can do this because we customize the contact layer
that defines the connections between metal1, the first layer of metal, and the transistors. When
we use an area of transistors for routing in a channelless array, we do not make any contacts to
the devices lying underneath; we simply leave the transistors unused. The logic density—the
amount of logic that can be implemented in a given silicon area—is higher for channelless gate
arrays than for channeled gate arrays. This is usually attributed to the difference in structure
between the two types of array. In fact, the difference occurs because the contact mask is
customized in a channelless gate array, but is not usually customized in a channeled gate array.
This leads to denser cells in the channelless architectures. Customizing the contact layer in a
channelless gate array allows us to increase the density of gate-array cells because we can route
over the top of unused contact sites.
An embedded memory block may still be more efficient and cheaper than implementing a 32 K-bit memory using macros on a
SOG array. ASIC vendors may offer several embedded gate array structures containing different
memory types and sizes as well as a variety of embedded functions. ASIC companies wishing to
offer a wide range of embedded functions must ensure that enough customers use each different
embedded gate array to give the cost advantages over a custom gate array or CBIC (the Sun
Microsystems SPARCstation 1 described in Section 1.3 made use of LSI Logic embedded gate
arrays—and the 10K and 100K series of embedded gate arrays were two of LSI Logic’s most
successful products).
An ASIC vendor library is normally a phantom library —the cells are empty boxes, or
phantoms , but contain enough information for layout (for example, you would only see the
bounding box or abutment box in a phantom version of the cell in Figure 1.3). After you
complete layout you hand off a netlist to the ASIC vendor, who fills in the empty boxes (
phantom instantiation ) before manufacturing your chip.
The second and third choices require you to make a buy-or-build decision. If you complete an
ASIC design using a cell library that you bought, you also own the masks (the tooling ) that are
used to manufacture your ASIC. This is called customer-owned tooling ( COT , pronounced
“see-oh-tee”). A library vendor normally develops a cell library using information about a
process supplied by an ASIC foundry. An ASIC foundry (in contrast to an ASIC vendor) only
provides manufacturing, with no design help. If the cell library meets the foundry specifications,
we call this a qualified cell library. These cell libraries are normally expensive (possibly
several hundred thousand dollars), but if a library is qualified at several foundries, you can
shop around for the most attractive terms. This means that buying an expensive library can be
cheaper in the long run than the other solutions for high-volume production.
The third choice is to develop a cell library in-house. Many large computer and electronics
companies make this choice. Most of the cell libraries designed today are still developed in-
house despite the fact that the process of library development is complex and very expensive.
However created, each cell in an ASIC cell library must contain the following:
A physical layout
A behavioral model
A Verilog/VHDL model
A detailed timing model
A test strategy
A circuit schematic
A cell icon
A wire-load model
A routing model
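The items in the list above can be pictured as a single record per cell. The sketch below is only an illustration of that idea; the class, field names, and file references are invented for this example and do not come from any real library format.

```python
from dataclasses import dataclass

@dataclass
class LibraryCell:
    """One cell in an ASIC cell library, holding every view the text lists.
    All names and values here are hypothetical, for illustration only."""
    name: str
    physical_layout: str   # reference to the layout data
    behavioral_model: str  # high-level simulation model
    hdl_model: str         # Verilog/VHDL description
    timing_model: dict     # detailed delay data from characterization
    test_strategy: str
    schematic: str         # netlist used for the LVS check
    icon: str              # symbol for schematic entry
    wire_load_model: dict  # statistical net-capacitance estimates
    routing_model: str     # phantom view for the routing tools

# A hypothetical two-input NAND cell entry:
cell = LibraryCell(
    name="NAND2",
    physical_layout="nand2.gds",
    behavioral_model="nand2_beh.v",
    hdl_model="nand2.v",
    timing_model={"rise_ns": 0.12, "fall_ns": 0.10},
    test_strategy="ATPG",
    schematic="nand2.sp",
    icon="nand2.sym",
    wire_load_model={1: 0.02, 2: 0.05},
    routing_model="nand2_phantom.lef",
)
```

The point of the sketch is simply that every cell carries several distinct views, each consumed by a different tool in the design flow.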
For MGA and CBIC cell libraries we need to complete cell design and cell layout; we shall
discuss this in Chapter 2. The ASIC designer may not actually see the layout if it is hidden inside
a phantom, but the layout will be needed eventually. In a programmable ASIC the cell layout is
part of the programmable ASIC design.
The ASIC designer needs a high-level, behavioral model for each cell because simulation at the
detailed timing level takes too long for a complete ASIC design. For a NAND gate a behavioral
model is simple. A multiport RAM model can be very complex. We shall discuss behavioral
models when we describe Verilog and VHDL. ASIC designers also need a detailed timing model
for each cell to determine the performance of the critical pieces of an ASIC. It is too difficult, too
time-consuming, and too expensive to build every cell in silicon and measure the cell delays.
Instead library engineers simulate the delay of each cell, a process known as characterization.
Characterizing a standard-cell or gate-array library involves circuit extraction from the full-
custom cell layout for each cell. The extracted schematic includes all the parasitic resistance and
capacitance elements. Then library engineers perform a simulation of each cell including the
parasitic elements to determine the switching delays. The simulation models for the transistors
are derived from measurements on special chips included on a wafer, called process control
monitors (PCMs) or drop-ins. Library engineers then use the results of the circuit simulation
to generate detailed timing models for logic simulation.
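As a rough illustration of what characterization produces, the sketch below fits a simple linear delay model, delay = t_intrinsic + r_drive × C_load, to simulated (load, delay) points. The data points and units are invented for this example; real values come from circuit simulation of the extracted cell netlist.

```python
# Hypothetical simulated points for one cell: (C_load in pF, delay in ns).
sim_points = [(0.01, 0.11), (0.05, 0.15), (0.10, 0.20)]

# Least-squares fit of delay = t_intrinsic + r_drive * c_load.
n = len(sim_points)
sum_c = sum(c for c, _ in sim_points)
sum_d = sum(d for _, d in sim_points)
sum_cc = sum(c * c for c, _ in sim_points)
sum_cd = sum(c * d for c, d in sim_points)

r_drive = (n * sum_cd - sum_c * sum_d) / (n * sum_cc - sum_c ** 2)
t_intrinsic = (sum_d - r_drive * sum_c) / n

def delay(c_load):
    """Estimated cell delay (ns) for a given load capacitance (pF)."""
    return t_intrinsic + r_drive * c_load
```

A real timing model is far richer (separate rise/fall delays, input-slope dependence, per-arc tables), but this captures the idea: the simulator produces a few points, and the library engineer reduces them to a model the logic simulator can evaluate quickly.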
All ASICs need to be production tested (programmable ASICs may be tested by the manufacturer
before they are customized, but they still need to be tested). Simple cells in small or medium-size
blocks can be tested using automated techniques, but large blocks such as RAM or multipliers
need a planned strategy. We shall discuss test in a later chapter. The cell schematic (a netlist description)
describes each cell so that the cell designer can perform simulation for complex cells. You may
not need the detailed cell schematic for all cells, but you need enough information to compare
what you think is on the silicon (the schematic) with what is actually on the silicon (the
layout)—this is a layout versus schematic (LVS) check. If the ASIC designer uses schematic
entry, each cell needs a cell icon together with connector and naming information that can be
used by design tools from different vendors. We shall cover ASIC design using schematic entry
in a later chapter. One of the advantages of using logic synthesis rather than schematic design
entry is eliminating the problems with icons, connectors, and cell names. Logic synthesis also
makes moving an ASIC between different cell libraries, or retargeting, much easier. In order to
estimate the parasitic capacitance of wires before we actually complete any routing, we need a
statistical estimate of the capacitance for a net in a given size circuit
block. This usually takes the form of a look-up table known as a wire-load model. We also need
a routing model for each cell. Large cells are too complex for the physical design or layout tools
to handle directly and we need a simpler representation, a phantom, of the physical layout
that still contains all the necessary information. The phantom may include information that tells
the automated routing tool where it can and cannot place wires over the cell, as well as the
location and types of the connections to the cell.
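A wire-load model can be pictured as exactly the kind of look-up table described above. The sketch below is hypothetical: the block names, fanout keys, and capacitance values are invented for illustration, whereas real tables are built from statistics gathered over many routed designs.

```python
# Hypothetical wire-load model: estimated net capacitance (pF) before routing,
# indexed first by circuit-block size class, then by net fanout.
WIRE_LOAD = {
    "small_block": {1: 0.010, 2: 0.018, 3: 0.025},
    "large_block": {1: 0.025, 2: 0.045, 3: 0.070},
}

def estimated_net_cap(block_size, fanout):
    """Look up the estimated net capacitance; nets with a fanout beyond
    the table simply use the largest tabulated fanout entry."""
    table = WIRE_LOAD[block_size]
    key = min(fanout, max(table))
    return table[key]
```

For example, a fanout-2 net in a small block would be estimated at 0.018 pF before any routing exists, letting synthesis and timing tools work with plausible wire parasitics early in the flow.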