VLSI Design
Course 0-2
Darmstadt University of Technology
Institute of Microelectronic Systems 0
pn Junction Properties
Chapter 1
The following analysis considers the pn junction without external voltage (V = 0).
Diffusion (a statistical phenomenon) of mobile carriers across the junction leaves ionized dopants behind, so that space charge regions arise. The diffusion is limited by the electric field that this space charge builds up. The relation between the space charge density ρ(x), the depletion electric field E(x) and the potential φ(x) is given by the Poisson equation
$$-\frac{d^2\phi(x)}{dx^2} = \frac{dE(x)}{dx} = \frac{\rho(x)}{\varepsilon_{Si}}. \tag{1.1}$$
ρ(x) is the volume charge density of the ionized dopants; in idealized form it can be written as
$$\rho(x) = \begin{cases} +qN_d & x \in [0, x_n], \\ -qN_a & x \in [-x_p, 0]. \end{cases} \tag{1.2}$$
$$\frac{D_p}{\mu_p} = \frac{D_n}{\mu_n} = V_T = \frac{kT}{q} \tag{1.8}$$
where k is the Boltzmann constant (in joules per kelvin) and T the temperature (in K). The electric field E(x) in the junction is not zero for all x, so a drift current density $J_{drift}$ exists as well. For positive charge carriers it is
$$J_{p,drift}(x) = q\mu_p p(x)E(x) \tag{1.9}$$
The resulting hole current density is
$$J_p(x) = q\mu_p p(x)E(x) - qD_p\frac{dp(x)}{dx} \quad \left[\frac{A}{m^2}\right] \tag{1.10}$$
$$J_n(x) = q\mu_n n(x)E(x) + qD_n\frac{dn(x)}{dx} \quad \left[\frac{A}{m^2}\right]. \tag{1.11}$$
Setting $J_p = 0$ (equilibrium condition) and using the Einstein relationship $D_p = \mu_p V_T$ we obtain
$$E(x) = -\frac{d\phi(x)}{dx} = \frac{V_T}{p(x)}\frac{dp(x)}{dx} \tag{1.12}$$
and we can calculate the potential as
$$dV = -V_T\frac{dp(x)}{p}. \tag{1.13}$$
$$\phi_0 = V_T\ln\frac{p(-x_p)}{p(x_n)}. \tag{1.15}$$
With
$$p(-x_p) = N_a \tag{1.16}$$
and
$$np = n_i^2 \tag{1.17}$$
$$\Longrightarrow\quad p(x_n) = \frac{n_i^2}{N_d} \tag{1.18}$$
we get the final expression for $\phi_0$
$$\phi_0 = V_T\ln\frac{N_a N_d}{n_i^2}. \tag{1.19}$$
Note: Equation 1.17 holds independently of the donor and acceptor doping levels.
$$W = x_p + x_n \tag{1.20}$$
With $N_a x_p = N_d x_n$ follows
$$x_n = \frac{N_a}{N_a + N_d}W \tag{1.21}$$
$$x_p = \frac{N_d}{N_a + N_d}W. \tag{1.22}$$
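From
$$-E(x) = \frac{d\phi}{dx} = \begin{cases} \dfrac{qN_d}{\varepsilon_{Si}}(x_n - x) & x \in [0, x_n], \\[2ex] \dfrac{qN_a}{\varepsilon_{Si}}(x + x_p) & x \in [-x_p, 0]. \end{cases} \tag{1.24}$$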
where N = min(Na , Nd ).
Assume that the positive terminal of an external voltage V is attached to the p-type region and the negative terminal to the n-type region (V > 0: forward bias; V < 0: reverse bias). The equilibrium equations can then be modified by the transformation $\phi_0 \to (\phi_0 - V)$, which gives the depletion width
$$W = \sqrt{\frac{2\varepsilon_{Si}}{q}(\phi_0 - V)\left(\frac{1}{N_a} + \frac{1}{N_d}\right)}. \tag{1.28}$$
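As a numerical illustration of Eqs. (1.19), (1.21), (1.22) and (1.28), the following sketch evaluates the built-in potential and the depletion width. The function names, the intrinsic density n_i = 1.45e10 cm^-3, the temperature of 300 K and the example doping levels are assumptions made for this example only; they are not taken from the course text.

```python
import math

Q = 1.602e-19           # elementary charge [C]
K_B = 1.381e-23         # Boltzmann constant [J/K]
EPS_0 = 8.854e-14       # vacuum permittivity [F/cm]
EPS_SI = 11.8 * EPS_0   # silicon permittivity [F/cm]


def thermal_voltage(temp_k=300.0):
    """V_T = kT/q in volts (Eq. 1.8)."""
    return K_B * temp_k / Q


def built_in_potential(na, nd, ni=1.45e10, temp_k=300.0):
    """Eq. (1.19): phi_0 = V_T ln(Na Nd / ni^2); densities in cm^-3."""
    return thermal_voltage(temp_k) * math.log(na * nd / ni**2)


def depletion_width(na, nd, v=0.0, ni=1.45e10, temp_k=300.0):
    """Eq. (1.28): total depletion width W in cm for applied voltage v < phi_0."""
    phi0 = built_in_potential(na, nd, ni, temp_k)
    return math.sqrt(2.0 * EPS_SI / Q * (phi0 - v) * (1.0 / na + 1.0 / nd))


def depletion_edges(na, nd, v=0.0):
    """Eqs. (1.21), (1.22): split W into (x_p, x_n) using Na*x_p = Nd*x_n."""
    w = depletion_width(na, nd, v)
    return nd / (na + nd) * w, na / (na + nd) * w


# Assumed example: Na = 1e15 cm^-3 (p-side), Nd = 1e17 cm^-3 (n-side)
phi0 = built_in_potential(1e15, 1e17)
w_zero_bias = depletion_width(1e15, 1e17)
w_reverse = depletion_width(1e15, 1e17, v=-5.0)
print(f"phi_0 = {phi0:.3f} V, W(0 V) = {w_zero_bias * 1e4:.2f} um, "
      f"W(-5 V) = {w_reverse * 1e4:.2f} um")
```

The depletion region widens under reverse bias and extends mainly into the more lightly doped side, as Eqs. (1.21) and (1.22) predict.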
The junction capacitance originates from the depletion charge. It is important in reverse bias (V < 0), where it is given by
$$C_j(V) = \frac{\varepsilon_{Si}}{W(V)} \quad [F/cm^2] \tag{1.29}$$
$C_j$ is nonlinear since it changes with the voltage V.
Current flow through the junction is established by tracking the minority carriers, which leads to the ideal diode equation
$$I = I_0\left(e^{qV/kT} - 1\right) \tag{1.30}$$
where
$$I_0 = qA\left(\frac{D_n n_{p0}}{L_n} + \frac{D_p p_{n0}}{L_p}\right) \tag{1.31}$$
is the reverse saturation current. The reverse generation current (V < 0) is found as
$$I_{gen} \simeq -\frac{qAn_i}{2\tau_0}W(V), \tag{1.32}$$
while the forward recombination current assumes the form
$$I_{rec} \simeq \frac{qAn_i W(V)}{2\tau_0}e^{qV/2kT}, \tag{1.33}$$
where $\tau_0$ is the average carrier lifetime. These contributions must be added to the ideal diode current.
MOS Transistor Theory
n-channel MOSFET:
• p-type wafer (single crystal p-type silicon) uniformly doped with acceptor (e.g. boron) concentration $N_a$ ($N_a \simeq 10^{15}\,cm^{-3}$)
• Close to the bulk electrode, the majority and minority thermal equilibrium concentrations are approximated by
$$p_{p0} \simeq N_a \quad\text{and}\quad n_{p0} \simeq \frac{n_i^2}{N_a} \tag{1.34}$$
• Oxide layer (SiO2 = quartz glass) is used as insulating dielectric between metal and
semiconductor layer with a resistivity > 1015 Ωcm.
• State of the art MOS processes use poly silicon as gate material.
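The gate capacitance per unit area is given by
$$C_{ox} = \frac{\varepsilon_{ox}}{x_{ox}} \quad [F/cm^2] \tag{1.35}$$
with $\varepsilon_{ox} = 3.9\,\varepsilon_0$, $\varepsilon_0 = 8.854 \cdot 10^{-14}$ F/cm
$x_{ox} \simeq 50$ nm ⇒ $C_{ox} \simeq 7 \cdot 10^{-8}$ F/cm²
Total gate capacitance for a gate area A: $C_{ges} = C_{ox} \cdot A$ [F]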
• the top layer of metal is used for low resistance connections of transistor structures
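The oxide capacitance of Eq. (1.35) can be checked numerically with a few lines. This is a minimal sketch; the function names and the 1 µm x 1 µm gate area are assumptions chosen for illustration.

```python
EPS_0 = 8.854e-14        # vacuum permittivity [F/cm]
EPS_OX = 3.9 * EPS_0     # SiO2 permittivity [F/cm]


def oxide_capacitance(x_ox_cm):
    """Eq. (1.35): C_ox = eps_ox / x_ox in F/cm^2 (x_ox in cm)."""
    return EPS_OX / x_ox_cm


def total_gate_capacitance(x_ox_cm, width_cm, length_cm):
    """C_ges = C_ox * A in farads for a rectangular gate of area W * L."""
    return oxide_capacitance(x_ox_cm) * width_cm * length_cm


x_ox = 50e-7                                   # 50 nm expressed in cm
c_ox = oxide_capacitance(x_ox)
c_ges = total_gate_capacitance(x_ox, 1e-4, 1e-4)   # assumed 1 um x 1 um gate
print(f"C_ox = {c_ox:.2e} F/cm^2, C_ges = {c_ges:.2e} F")
```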
Varying the gate voltage gives three modes of operation for the MOS capacitor:
Accumulation
Positively charged majority carriers (holes) accumulate at the Si-SiO2 interface (Fig. 1.4). The MOS system behaves as a capacitor (Eq. 1.35). This state is only useful for measuring some basic MOS properties. It is not an operating region of the transistor.
Depletion
Figure 1.5: MOS fields and potentials for positive gate voltages
MOS field effect: An externally applied gate voltage $V_G$ controls the semiconductor electric field E(x) and the semiconductor potential φ(x), and therefore the silicon carrier densities p and n.
$$E(x) = -\frac{d\phi(x)}{dx} \tag{1.36}$$
Potential boundary condition: $\phi(x) \to V_B = 0$ at the bulk electrode.
The total voltage across the semiconductor is equal to the surface potential
$$\phi_S = \phi(x = 0). \tag{1.37}$$
Applying Kirchhoff's voltage law (KVL) leads to
$$V_G = V_{ox} + \phi_S \tag{1.38}$$
Connection between $V_G$ and $E_S$:
$$E_S = E(x = 0) = -\left.\frac{d\phi}{dx}\right|_{x=0} \tag{1.39}$$
$E_S$ is the maximum value of the semiconductor field. It is controlled by the voltage $V_G$ (Poisson equation) and influences the surface carrier concentrations
$$p_S = p_p(x = 0) \quad\text{and}\quad n_S = n_p(x = 0) \tag{1.40}$$
⇒ negatively charged acceptor ions are termination points for the electric field lines.
If $V_G$ is increased to a point where $p_S \ll N_a$ (induced by the electric field $E_S$) is satisfied, the depletion region extends from x = 0 to $x = x_d$. The depletion phenomenon in the MOS system is analogous to the p-side of a one-sided $n^+p$ junction, with the difference that the voltage across the depletion region is $\phi_S$.
Replacing the built-in voltage $\phi_0$ by the surface potential $\phi_S$ leads to an equation for the depletion width:
$$x_d = \sqrt{\frac{2\varepsilon_{Si}}{qN_a}\phi_S}, \qquad \varepsilon_{Si} = 11.8\,\varepsilon_0. \tag{1.41}$$
Inversion
Increasing $V_G$ increases $\phi_S$ and drives $x_d$ deeper towards the bulk electrode. When $V_G$ reaches a critical threshold value $V_{T0}$ (assuming $V_B = 0$), the inversion phenomenon occurs: a layer of minority carriers accumulates at the surface (x = 0), and the depth of the depletion region remains constant ($x_d = x_{dm}$) because the inversion layer electrons shield the bulk substrate from the increasing field at the surface.
The inversion condition is given by
$$V_G \ge V_T \tag{1.45}$$
with
$$\phi_S(V_G = V_T) = 2|\phi_F| \tag{1.46}$$
where
$$|\phi_F| = \frac{kT}{q}\ln\frac{N_a}{n_i} \quad \text{(bulk Fermi potential).} \tag{1.47}$$
The maximum depletion width is
$$x_{dm} = \sqrt{\frac{2\varepsilon_{Si}}{qN_a}\left(2|\phi_F|\right)}. \tag{1.48}$$
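To get a feeling for the magnitudes in Eqs. (1.41), (1.47) and (1.48), the following sketch evaluates the bulk Fermi potential and the maximum depletion width. The intrinsic density n_i = 1.45e10 cm^-3, T = 300 K and the function names are assumptions made for the example.

```python
import math

Q = 1.602e-19            # elementary charge [C]
K_B = 1.381e-23          # Boltzmann constant [J/K]
EPS_SI = 11.8 * 8.854e-14  # silicon permittivity [F/cm]


def fermi_potential(na, ni=1.45e10, temp_k=300.0):
    """Eq. (1.47): |phi_F| = (kT/q) ln(Na/ni) in volts."""
    return K_B * temp_k / Q * math.log(na / ni)


def depletion_depth(phi_s, na):
    """Eq. (1.41): x_d in cm for surface potential phi_s (in volts)."""
    return math.sqrt(2.0 * EPS_SI * phi_s / (Q * na))


def max_depletion_depth(na, ni=1.45e10, temp_k=300.0):
    """Eq. (1.48): x_dm, reached at phi_s = 2|phi_F|."""
    return depletion_depth(2.0 * fermi_potential(na, ni, temp_k), na)


na = 1e15  # acceptor doping [cm^-3], the typical value quoted for the wafer
print(f"|phi_F| = {fermi_potential(na):.3f} V, "
      f"x_dm = {max_depletion_depth(na) * 1e4:.2f} um")
```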
In reality, an additional term $V_{FB}$ (called the flatband voltage) adds to the oxide voltage drop:
$$V_{FB} = \Phi_{GS} - \frac{1}{C_{ox}}(Q_{ox} + Q_{SS}) \tag{1.53}$$
• $\Phi_{GS} = \Phi_G - \Phi_S$ represents the difference in work functions Φ between the gate and substrate materials (material-specific contact voltages which can be taken from tables).
Since $Q_{ox}$ and $Q_{SS}$ are positive, $V_{FB}$ may become negative, resulting in a negative threshold voltage.
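A nonzero bulk voltage reverse biases the pn junction. The depletion charge is
$$Q_B = -\sqrt{2q\varepsilon_{Si}N_a(2|\phi_F| + V_B)} \tag{1.56}$$
Threshold shift:
$$\Delta V_T = V_T(V_B) - V_{T0}, \qquad V_{T0} = V_T(V_B = 0)$$
$$\Delta V_T = \frac{\sqrt{2q\varepsilon_{Si}N_a}}{C_{ox}}\left(\sqrt{2|\phi_F| + V_B} - \sqrt{2|\phi_F|}\right) \tag{1.57}$$
$$V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_F| + V_B} - \sqrt{2|\phi_F|}\right) \tag{1.58}$$
with the body effect constant
$$\gamma = \frac{\sqrt{2q\varepsilon_{Si}N_a}}{C_{ox}} \quad [V^{1/2}] \tag{1.59}$$
The n-channel inversion charge is given by
$$Q_I = -C_{ox}(V_G - V_T) \tag{1.60}$$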
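The threshold shift of Eqs. (1.57) to (1.59) is easy to tabulate. The sketch below assumes N_a = 1e15 cm^-3, x_ox = 50 nm, 2|phi_F| = 0.58 V and V_T0 = 1 V as illustrative values; the function names are hypothetical.

```python
import math

Q = 1.602e-19              # elementary charge [C]
EPS_0 = 8.854e-14          # vacuum permittivity [F/cm]
EPS_SI = 11.8 * EPS_0      # silicon permittivity [F/cm]
EPS_OX = 3.9 * EPS_0       # SiO2 permittivity [F/cm]


def body_effect_constant(na, c_ox):
    """Eq. (1.59): gamma = sqrt(2 q eps_Si Na) / C_ox in V^(1/2)."""
    return math.sqrt(2.0 * Q * EPS_SI * na) / c_ox


def threshold_voltage(v_t0, gamma, two_phi_f, v_b):
    """Eq. (1.58): V_T for a body bias v_b >= 0; two_phi_f is 2|phi_F|."""
    return v_t0 + gamma * (math.sqrt(two_phi_f + v_b) - math.sqrt(two_phi_f))


c_ox = EPS_OX / 50e-7                      # assumed 50 nm gate oxide
gamma = body_effect_constant(1e15, c_ox)   # assumed Na = 1e15 cm^-3
for v_b in (0.0, 1.0, 2.0, 5.0):
    v_t = threshold_voltage(1.0, gamma, 0.58, v_b)
    print(f"V_B = {v_b:.1f} V -> V_T = {v_t:.3f} V")
```

The square-root dependence means that the first volt of body bias shifts the threshold more than any later volt.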
n-channel MOSFET:
• VSB > 0 because VB must be more negative than VS to make sure that the pn-junction
from bulk to source is reverse biased.
The channel electric field $E_y(y)$ established by the drain-source voltage $V_{DS}$ is
$$E_y(y) = -\frac{dV(y)}{dy} \tag{1.62}$$
with $V(y = 0) = V_S = 0$, $V(y = L) = V_{DS}$.
The depletion depth has its maximum at the drain electrode because V(y) has its maximum at y = L:
$$X_{dm}(y) \simeq \sqrt{\frac{2\varepsilon_{Si}}{qN_a}\left[2|\Phi_F| + V(y)\right]} \tag{1.63}$$
The inversion charge density as a function of the position y is given by
Rearranging
$$dV = I_D\,dR = -\frac{I_D\,dy}{\mu_n W Q_I(y)} \tag{1.67}$$
The resulting equation from the GCA for the nonsaturated current in a convenient form is
$$I_D = \frac{\beta}{2}\left[2(V_{GS} - V_T)V_{DS} - V_{DS}^2\right] \tag{1.71}$$
At the onset of saturation the current $I_D$ reaches a peak value and remains constant in the saturation region:
$$\frac{\partial I_D}{\partial V_{DS}} = 0 = \beta(V_{GS} - V_T - V_{DS}) \tag{1.72}$$
Evaluating the derivative yields $V_{DS,SAT} = V_{GS} - V_T$ and therefore
$$\Rightarrow\; I_{D,SAT} = I_D(V_{DS} = V_{DS,SAT}) = \frac{\beta}{2}(V_{GS} - V_T)^2 \tag{1.74}$$
⇒ parabolic border between saturation and nonsaturation.
$$\Rightarrow\; I_D \simeq I_{D0}(1 + \lambda V_{DS}) = \frac{\beta}{2}(V_{GS} - V_{T0})^2(1 + \lambda V_{DS}) \tag{1.82}$$
λ has typical values from 0.01 to 0.1 V⁻¹ and represents the influence of $V_{DS}$ on $I_D$ in saturation. λ is important in small-geometry devices. In the following exercises we will neglect λ.
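$$V_T = V_{T0} + \gamma\left(\sqrt{2|\phi_F| + V_{SB}} - \sqrt{2|\phi_F|}\right) \tag{1.83}$$
$$I_D \simeq 0 \qquad (V_{GS} < V_T) \tag{1.84}$$
$$I_D = \frac{\beta}{2}\left[2(V_{GS} - V_T)V_{DS} - V_{DS}^2\right] \qquad (V_{GS} > V_T,\ V_{DS} < V_{DS,sat}) \tag{1.85}$$
$$V_{DS,sat} = V_{GS} - V_T \tag{1.86}$$
$$I_D = \frac{\beta}{2}(V_{GS} - V_T)^2(1 + \lambda V_{DS}) \qquad (V_{GS} > V_T,\ V_{DS} \ge V_{DS,sat}) \tag{1.87}$$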
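The three operating regions of Eqs. (1.84) to (1.87) translate directly into a piecewise function. The following is a minimal sketch for hand-calculation checks; the function name and the example values (beta = 1e-4 A/V², V_T = 1 V) are assumptions.

```python
def drain_current(v_gs, v_ds, beta, v_t, lam=0.0):
    """Square-law n-channel MOSFET drain current in amperes.

    Cutoff (1.84), nonsaturation (1.85) and saturation (1.87);
    the region boundary is V_DS,sat = V_GS - V_T (1.86).
    """
    if v_gs <= v_t:
        return 0.0                      # cutoff
    v_ds_sat = v_gs - v_t
    if v_ds < v_ds_sat:                 # nonsaturated (ohmic) region
        return 0.5 * beta * (2.0 * (v_gs - v_t) * v_ds - v_ds**2)
    # saturated region with channel length modulation
    return 0.5 * beta * (v_gs - v_t) ** 2 * (1.0 + lam * v_ds)


# Assumed example device: beta = 1e-4 A/V^2, V_T = 1 V, lambda neglected
for v_ds in (0.5, 1.0, 2.0, 4.0):
    i_d = drain_current(3.0, v_ds, beta=1e-4, v_t=1.0)
    print(f"V_DS = {v_ds:.1f} V -> I_D = {i_d * 1e6:.1f} uA")
```

Note that with a nonzero λ this simple model has a small step in $I_D$ at $V_{DS} = V_{DS,sat}$, because the factor $(1 + \lambda V_{DS})$ appears only in the saturation equation.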
Get
and λ from
(4) $\dfrac{I_{D2}}{I_{D1}} = \dfrac{1 + \lambda V_{D2}}{1 + \lambda V_{D1}}$
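Solving relation (4) for λ gives $\lambda = (I_{D2} - I_{D1}) / (I_{D1}V_{D2} - I_{D2}V_{D1})$. The sketch below applies this to two saturation-region measurement points; the function name and the sample values are assumptions made for illustration.

```python
def extract_lambda(i_d1, v_d1, i_d2, v_d2):
    """Channel length modulation parameter from two saturation points.

    Rearranged from I_D2 / I_D1 = (1 + lambda V_D2) / (1 + lambda V_D1).
    Both points must be taken at the same V_GS.
    """
    return (i_d2 - i_d1) / (i_d1 * v_d2 - i_d2 * v_d1)


# Synthetic data generated with lambda = 0.05 1/V and I_D0 = 200 uA
i_d0 = 200e-6
point_1 = (i_d0 * (1 + 0.05 * 3.0), 3.0)   # (I_D1, V_D1)
point_2 = (i_d0 * (1 + 0.05 * 5.0), 5.0)   # (I_D2, V_D2)
lam = extract_lambda(point_1[0], point_1[1], point_2[0], point_2[1])
print(f"lambda = {lam:.4f} 1/V")
```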
• includes additional depletion charge created by the channel voltage V (y), which is reverse
bias across the n+ p junction at the channel-substrate boundary
• assume VS = 0 = VB
• calculation for nonsaturated MOSFET
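$$V_{T0}(V) = V_{FB} + 2|\phi_F| + \frac{qD_I}{C_{ox}} + \frac{1}{C_{ox}}\sqrt{2q\varepsilon_{Si}N_a(2|\phi_F| + V)} \tag{1.88}$$
The basic GCA integral
$$I_D = \beta\int_0^{V_{DS}}\left[V_{GS} - V_{T0}(V) - V\right]dV \tag{1.89}$$
is modified to (now: $V_{T0}$ is not constant and depends on $Q_{B0}$)
$$I_D = \beta\int_0^{V_{DS}}\left[V_{GS} - V_{FB} - 2|\phi_F| - \frac{qD_I}{C_{ox}} - V - \frac{1}{C_{ox}}\sqrt{2q\varepsilon_{Si}N_a(2|\phi_F| + V)}\right]dV \tag{1.90}$$
which gives for the nonsaturated drain current
$$I_D = \beta\left\{\left(V_{GS} - V_{FB} - 2|\phi_F| - \frac{qD_I}{C_{ox}}\right)V_{DS} - \frac{1}{2}V_{DS}^2 - \frac{2}{3C_{ox}}\sqrt{2q\varepsilon_{Si}N_a}\left[(2|\phi_F| + V_{DS})^{3/2} - (2|\phi_F|)^{3/2}\right]\right\}. \tag{1.91}$$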
Introducing a "reduction factor" M < 1 modifies the nonsaturated current equation to
$$I_D = M\frac{\beta}{2}\left[2(V_{GS} - V_{T0})V_{DS} - V_{DS}^2\right]. \tag{1.92}$$
The saturated current is then given by
$$I_{D,sat} = M\frac{\beta}{2}(V_{GS} - V_{T0})^2 \tag{1.93}$$
Figure 1.24: Comparison of circuit equations with the complete GCA model
Figure 1.25: Comparison of modified circuit equations with the complete GCA model
is modeled by
(Nd − Na ) > 0. (1.95)
The current $I_D$ can be modeled by
$$I_D = -\mu_n\frac{W}{L}\int_0^{V_{DS}}Q_C(V)\,dV \tag{1.96}$$
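$$Q_n = -q(N_d - N_a)a \tag{1.98}$$
$$Q_S(V) = -C_{ox}\left[V_{GS} - V_{FB} - V\right] \tag{1.99}$$
$$Q_j(V) = \sqrt{2q\varepsilon_{Si}N(\phi_0 + V)} \tag{1.100}$$
$$\phi_0 \simeq \frac{kT}{q}\ln\left[\frac{(N_d - N_a)N_a}{n_i^2}\right] \quad \text{(built-in voltage)} \tag{1.101}$$
$$N = \frac{(N_d - N_a)N_a}{(N_d - N_a) + N_a} = \frac{N_a}{N_d}(N_d - N_a) \tag{1.102}$$
Using these charge densities gives
$$I_D = \mu_n\frac{W}{L}\int_0^{V_{DS}}\left[q(N_d - N_a)a + C_{ox}(V_{GS} - V_{FB} - V) - \sqrt{2q\varepsilon_{Si}N(\phi_0 + V)}\right]dV$$
$$= \beta\left\{\frac{q(N_d - N_a)a}{C_{ox}}V_{DS} + (V_{GS} - V_{FB})V_{DS} - \frac{1}{2}V_{DS}^2 - \frac{2}{3C_{ox}}\sqrt{2q\varepsilon_{Si}N}\left[(\phi_0 + V_{DS})^{3/2} - (\phi_0)^{3/2}\right]\right\}. \tag{1.103}$$
This equation is too complicated for hand calculations, so the D-mode MOSFET is usually described by
$$I_D = \frac{\beta}{2}\left[2(V_{GS} - V_{T0})V_{DS} - V_{DS}^2\right], \tag{1.104}$$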
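Threshold voltage:
$$V_{Tp} = \Phi_{GS} - 2\Phi_{Fn} - \frac{1}{C_{ox}}(Q_{SS} + Q_{ox}) - \frac{1}{C_{ox}}Q_{Bn}$$
$$\Phi_{Fn} = \frac{kT}{q}\ln\frac{N_d}{n_i} > 0 \qquad (N_d:\ \text{n-type substrate doping})$$
$$Q_{Bn} = \sqrt{2q\varepsilon_{Si}N_d\left[2\Phi_{Fn} + V_{BSp}\right]} \tag{1.106}$$
$$V_{Tp} = V_{T0p} - \gamma_p\left(\sqrt{V_{BSp} + 2\Phi_{Fn}} - \sqrt{2\Phi_{Fn}}\right)$$
with
$$\gamma_p = \frac{\sqrt{2qN_d\varepsilon_{Si}}}{C_{ox}}$$
$V_{Tp}$ is negative for an enhancement p-channel MOSFET. The current equations are similar to those of the n-channel MOSFET, but all signs are opposite.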
1.2.10 Conclusions
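Cutoff
n-channel: $I_D = 0$
p-channel: $I_D = 0$
Nonsaturation
n-channel: $V_{GS} > V_{Tn}$ and $V_{DS} \le (V_{GS} - V_{Tn})$
$$I_D = \frac{\beta_n}{2}\left[2(V_{GSn} - V_{Tn})V_{DSn} - V_{DSn}^2\right]$$
p-channel: $|V_{GSp}| > |V_{Tp}|$ and $|V_{DSp}| \le |V_{GSp} - V_{Tp}|$
$$I_{Dp} = \frac{\beta_p}{2}\left[2(V_{SGp} + V_{Tp})V_{SDp} - V_{SDp}^2\right]$$
Saturation
n-channel: $V_{GS} > V_{Tn}$ and $V_{DS} \ge (V_{GS} - V_{Tn})$
$$I_D = \frac{\beta_n}{2}(V_{GS} - V_{Tn})^2$$
p-channel: $|V_{GSp}| > |V_{Tp}|$ and $|V_{DSp}| \ge |V_{GSp} - V_{Tp}|$
$$I_D = \frac{\beta_p}{2}(V_{SGp} + V_{Tp})^2$$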
DC Characteristics of MOS Inverters
$$\frac{\beta_D}{2}(V_{in} - V_T)^2(1 + \lambda V_{out}) = I_L(V_L) = I_L(V_{DD} - V_{out}) \tag{1.109}$$
If $V_{in}$ is increased further, the point is reached where $V_{out} < (V_{GS} - V_T)$ and the driver enters the ohmic mode:
$$\frac{\beta_D}{2}\left[2(V_{in} - V_T)V_{out} - V_{out}^2\right] = I_L(V_L) = I_L(V_{DD} - V_{out}) \tag{1.110}$$
Calculation of VOH
Calculation of VOL
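$V_{in} = V_{OH}$ and the driver is nonsaturated, because $V_{out} < V_{in} - V_T$.
$$\frac{V_{DD} - V_{OL}}{R_L} = \frac{\beta_D}{2}\left[2(V_{OH} - V_T)V_{OL} - V_{OL}^2\right] \tag{1.117}$$
$$\Leftrightarrow\; V_{OL}^2 - 2\left(\frac{1}{\beta_D R_L} + V_{DD} - V_T\right)V_{OL} + \frac{2V_{DD}}{\beta_D R_L} = 0 \tag{1.118}$$
Solving the quadratic equation ⇒ $V_{OL}$.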
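The quadratic (1.118) can be solved in closed form; the physically meaningful solution is the smaller root, since $V_{OL}$ must lie below $V_{DD} - V_T$. The following sketch assumes $V_{OH} = V_{DD}$ as in (1.118); the function name and the component values are illustrative assumptions.

```python
import math


def v_ol_resistive_load(v_dd, v_t, beta_d, r_l):
    """Smaller root of Eq. (1.118) for the resistor-loaded inverter."""
    k = 1.0 / (beta_d * r_l)          # 1 / (beta_D R_L) in volts
    b = k + v_dd - v_t
    disc = b * b - 2.0 * v_dd * k
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    return b - math.sqrt(disc)


# Assumed example: V_DD = 5 V, V_T = 1 V, beta_D = 1e-4 A/V^2, R_L = 100 kOhm
v_ol = v_ol_resistive_load(5.0, 1.0, 1e-4, 100e3)
print(f"V_OL = {v_ol:.3f} V")
```

Increasing the product $\beta_D R_L$ lowers $V_{OL}$, which is the design lever discussed for this inverter type.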
Calculation of VIL
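For $V_{in} = V_{IL}$ the driver transistor is saturated, because $V_{out}$ is slightly below $V_{OH}$. From $I_D = I_L$ follows:
$$\frac{\beta_D}{2}(V_{in} - V_T)^2 = \frac{V_{DD} - V_{out}}{R_L} \tag{1.119}$$
$V_{IL}$ is defined as the point where
$$\frac{dV_{out}}{dV_{in}} = -1 \tag{1.120}$$
Differentials of both sides of $I_D(V_{in}) = I_L(V_{out})$:
$$\frac{dI_D}{dV_{in}}dV_{in} = \frac{dI_L}{dV_{out}}dV_{out} \tag{1.121}$$
$$\Leftrightarrow\; \frac{dV_{out}}{dV_{in}} = \frac{dI_D/dV_{in}}{dI_L/dV_{out}} = \frac{\beta_D(V_{in} - V_T)}{-\frac{1}{R_L}} \tag{1.122}$$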
Calculation of VIH
For $V_{in} = V_{IH}$, $V_{out} < (V_{GS} - V_T)$, so the driver is in the ohmic (nonsaturated) mode. Equating $I_D$ and $I_L$ gives
$$\frac{\beta_D}{2}\left[2(V_{IH} - V_T)V_{out} - V_{out}^2\right] = \frac{1}{R_L}(V_{DD} - V_{out}) \tag{1.126}$$
Evaluation of the condition $(dV_{out}/dV_{in}) = -1$ for $I_D(V_{in}, V_{out}) = I_L(V_{out})$ gives
VIH can be computed by solving this quadratic equation and selecting the proper physical
root.
Calculation of Vth
The inverter threshold voltage is defined as the VTC point where Vin = Vout .
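The current equation can be written as (with $V_{th} = V_{in} = V_{out}$):
$$\frac{\beta_D}{2}(V_{th} - V_T)^2 = \frac{V_{DD} - V_{th}}{R_L} \tag{1.132}$$
Rearranging and solving the equation
$$V_{th}^2 - 2\left(V_T - \frac{1}{\beta_D R_L}\right)V_{th} + V_T^2 - \frac{2V_{DD}}{\beta_D R_L} = 0 \tag{1.133}$$
yields $V_{th}$.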
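Equation (1.133) has the closed-form solution $V_{th} = (V_T - k) + \sqrt{k^2 + 2k(V_{DD} - V_T)}$ with $k = 1/(\beta_D R_L)$; the positive square root is taken because $V_{th}$ must exceed $V_T$ for the driver to conduct. The sketch below evaluates it; the function name and the component values are assumptions for illustration.

```python
import math


def inverter_threshold_resistive(v_dd, v_t, beta_d, r_l):
    """Root of Eq. (1.133) with V_th > V_T (driver saturated, V_in = V_out)."""
    k = 1.0 / (beta_d * r_l)          # 1 / (beta_D R_L) in volts
    return (v_t - k) + math.sqrt(k * k + 2.0 * k * (v_dd - v_t))


# Assumed example: V_DD = 5 V, V_T = 1 V, beta_D = 1e-4 A/V^2, R_L = 100 kOhm
v_th = inverter_threshold_resistive(5.0, 1.0, 1e-4, 100e3)
print(f"V_th = {v_th:.3f} V")
```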
In this approach $V_{OH}$ and $V_{OL}$ are of primary importance, while $V_{IH}$ and $V_{IL}$ are of secondary importance.
The inverter is modeled as a series resistive voltage divider.
$$V_{OH} = \frac{R_{off}}{R_{off} + R_L}V_{DD} \tag{1.134}$$
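For $R_L \ll R_{off}$, $V_{OH} \simeq V_{DD}$. Current equation for $V_{OL}$ (assuming $V_{in} = V_{OH}$):
$$\frac{\beta_D}{2}\left[2(V_{OH} - V_T)V_{OL} - V_{OL}^2\right] = \frac{V_{DD} - V_{OL}}{R_L} \tag{1.135}$$
Rearrangement yields
$$\left(\frac{W}{L}\right)_D R_L = \frac{2(V_{DD} - V_{OL})}{k'\left[2(V_{OH} - V_T)V_{OL} - V_{OL}^2\right]} \tag{1.136}$$
with $\beta_D = k'(W/L)_D$. This equation describes the product $R_L(W/L)_D$ needed for given voltages $V_{OH}$ and $V_{OL}$. The driver on-resistance can be written as follows:
$$R_{on} = \frac{V_{OL}}{I_D} = \frac{1}{k'\left(\frac{W}{L}\right)_D\left[(V_{OH} - V_T) - \frac{1}{2}V_{OL}\right]} \tag{1.137}$$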
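With $V_{GSL} = V_{DSL}$ ⇒ $V_{DSL} > (V_{GSL} - V_{TL})$ the load is automatically saturated and the current is given by
$$I_L = \frac{k'}{2}\left(\frac{W}{L}\right)_L(V_{GSL} - V_{TL})^2 \tag{1.138}$$
Since $V_{GSL} = (V_{DD} - V_{out})$ and $V_{out} = V_{DSD}$,
$$I_D = I_L = \frac{k'}{2}\left(\frac{W}{L}\right)_L\left[V_{DD} - V_{DSD} - V_{TL}(V_{DSD})\right]^2 \tag{1.139}$$
$V_{SBL} = V_{out}$, so
$$V_{TL} = V_{T0L} + \gamma\left(\sqrt{V_{out} + 2|\phi_F|} - \sqrt{2|\phi_F|}\right) \tag{1.140}$$
The driver is in cutoff for $V_{in} < V_{TD}$ ⇒ $V_{out} = V_{OH}$. As $V_{in}$ increases above $V_{TD}$ the driver is saturated, so
$$\frac{\beta_D}{2}(V_{in} - V_{TD})^2 = \frac{\beta_L}{2}\left[V_{DD} - V_{out} - V_{TL}(V_{out})\right]^2 \tag{1.141}$$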
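When $V_{in}$ is increased further and the condition $V_{out} < (V_{in} - V_{TD})$ becomes true, the driver is nonsaturated and the current balance is
$$\frac{\beta_D}{2}\left[2(V_{in} - V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[V_{DD} - V_{out} - V_{TL}(V_{out})\right]^2. \tag{1.142}$$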
The ideal load line in Fig. 1.49 applies to the case in which the body bias effects of the load transistor are ignored.
Because $V_{GSL} = 0 > V_{TL}$ is always satisfied, there always exists a conducting channel in the depletion load.
$$V_{DSL,sat} = (V_{GSL} - V_{TL}) = |V_{TL}| \tag{1.147}$$
Border between saturated and nonsaturated load region:
$$I_L = \frac{\beta_L}{2}\left[V_{GSL} - V_{TL}(V_{out})\right]^2 = \frac{\beta_L}{2}\left[-V_{TL}(V_{out})\right]^2 \tag{1.150}$$
Condition for the load being in nonsaturation: $(V_{DD} - V_{out}) < |V_{TL}(V_{out})|$
$$I_L = \frac{\beta_L}{2}\left[2|V_{TL}(V_{out})|(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right] \tag{1.151}$$
For the following discussion it is assumed that
When $V_{in} < V_{TD}$ the driver is in cutoff and the load provides a conduction path between $V_{DD}$ and $V_{out}$, so $V_{out} \simeq V_{OH} \simeq V_{DD}$.
When $V_{in}$ is increased above $V_{TD}$ the driver enters the saturation region while the load remains ohmic ($V_{DD} - V_{out} < |V_{TL}|$):
$$\frac{\beta_D}{2}(V_{in} - V_{TD})^2 = \frac{\beta_L}{2}\left[2|V_{TL}(V_{out})|(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right] \tag{1.153}$$
When $V_{in}$ is increased further, either the driver or the load changes its operating region. If
$$V_{out} < V_{DD} - |V_{TL}| \tag{1.154}$$
is satisfied first, the load changes to saturation while the driver remains in saturation. Otherwise
$$V_{out} < V_{in} - V_{TD} \tag{1.155}$$
is satisfied first and the driver becomes nonsaturated while the load is still nonsaturated.
When $V_{in}$ is increased further to a voltage slightly below $V_{DD}$, the driver is nonsaturated and the load is in the saturation region:
$$\frac{\beta_D}{2}\left[2(V_{in} - V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[-V_{TL}(V_{out})\right]^2 \tag{1.156}$$
Calculation of VOH
Usually taken:
$$V_{OH} \simeq V_{DD} \tag{1.157}$$
Figure 1.50: VTC for inverter with depletion mode MOSFET load
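The current $I_L$ is in this case the driver leakage current. The conductance of the nonsaturated load is:
$$G_{DSL} = \frac{I_L}{V_{DSL}} = \frac{\beta_L}{2}\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right] \tag{1.159}$$
With $V_{DSL} = I_L/G_{DSL}$ results:
$$V_{OH} = V_{DD} - \frac{I_L}{\frac{k'_L}{2}\left(\frac{W}{L}\right)_L\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right]} \tag{1.160}$$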
Calculation of VOL
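$$\frac{\beta_D}{2}\left[2(V_{in} - V_{TD})V_{out} - V_{out}^2\right] = \frac{\beta_L}{2}\left[-V_{TL}(V_{out})\right]^2 \tag{1.161}$$
Setting $V_{in} = V_{OH}$ and $V_{out} = V_{OL}$ yields, with the driver-to-load ratio $\beta_R = \beta_D/\beta_L$,
$$\beta_R\left[2(V_{OH} - V_{TD})V_{OL} - V_{OL}^2\right] = |V_{TL}(V_{OL})|^2 \tag{1.162}$$
Rearranging
$$V_{OL}^2 - 2(V_{OH} - V_{TD})V_{OL} + \frac{1}{\beta_R}|V_{TL}(V_{OL})|^2 = 0 \tag{1.163}$$
and solving this quadratic equation (body bias is ignored at this step) yields
$$V_{OL} = (V_{OH} - V_{TD}) - \sqrt{(V_{OH} - V_{TD})^2 - \frac{1}{\beta_R}|V_{TL}(V_{OL})|^2}. \tag{1.164}$$
⇒ final result for $V_{OL}$ by iteration of this equation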
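The iteration of Eq. (1.164) can be sketched as a fixed-point loop: start with $V_{OL} = 0$, evaluate the load threshold including body bias, insert it into (1.164) and repeat until the value settles. The body-bias expression follows the form of Eq. (1.140); the function names and all numerical values ($V_{T0L} = -3$ V, γ = 0.4 V^(1/2), 2|φ_F| = 0.6 V, β_R = 4) are assumptions for the example.

```python
import math


def load_threshold(v_out, v_t0l, gamma, two_phi_f):
    """Load threshold with body bias V_SBL = V_out (form of Eq. 1.140)."""
    return v_t0l + gamma * (math.sqrt(v_out + two_phi_f) - math.sqrt(two_phi_f))


def v_ol_depletion_load(v_oh, v_td, beta_r, v_t0l, gamma, two_phi_f,
                        tol=1e-9, max_iter=100):
    """Fixed-point iteration of Eq. (1.164) for the depletion-load inverter."""
    a = v_oh - v_td
    v_ol = 0.0                                   # initial guess: no body bias
    for _ in range(max_iter):
        v_tl = load_threshold(v_ol, v_t0l, gamma, two_phi_f)
        v_new = a - math.sqrt(a * a - v_tl**2 / beta_r)
        if abs(v_new - v_ol) < tol:
            return v_new
        v_ol = v_new
    raise RuntimeError("V_OL iteration did not converge")


# Assumed example values
v_ol = v_ol_depletion_load(v_oh=5.0, v_td=1.0, beta_r=4.0,
                           v_t0l=-3.0, gamma=0.4, two_phi_f=0.6)
print(f"V_OL = {v_ol:.4f} V")
```

Because $V_{TL}$ depends only weakly on $V_{OL}$, the loop typically settles within a few passes.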
The output voltages $V_{OL}$ and $V_{OH}$ are tuned to predefined values by adjusting the (W/L) ratios. For $V_{OH}$ the following equation has been given before
$$V_{OH} = V_{DD} - \frac{I_L}{\frac{k'_L}{2}\left(\frac{W}{L}\right)_L\left[2|V_{TL}(V_{OH})| - (V_{DD} - V_{OH})\right]}, \tag{1.165}$$
can be written as
$$R_{on} = \frac{V_{OL}}{I_D} = \frac{1}{\frac{k'_D}{2}\left(\frac{W}{L}\right)_D\left[2(V_{OH} - V_{TD}) - V_{OL}\right]} = \frac{1}{G_{on}} \tag{1.169}$$
The equation
$$V_{OL} = \frac{R_{on}}{R_{on} + R_{DSL}}V_{DD} \tag{1.171}$$
implies that the value of $V_{OL}$ is lowered by increasing $\beta_R$. The transistor conductances are proportional to their (W/L) ratios.
• CMOS circuits dissipate power only during switching events. When the inputs are stable, only leakage currents are required from the power supply. (NMOS: current flows whenever the driver is on.)
• Processing is more complex than for NMOS: extra processing steps must be added to create n-tub areas for p-transistor realizations (including an extra step for adjusting the threshold voltage of the p-channel device).
For $V_{in} < V_{Tn}$ ⇒ $V_{out} = V_{DD}$ the nMOS transistor is in cutoff while the pMOS transistor is in nonsaturation ($|V_{DSp}| = |V_{out} - V_{DD}| < |V_{GSp} - V_{Tp}| = |V_{in} - V_{DD} - V_{Tp}|$).
When $V_{in}$ is increased to values above $V_{Tn}$, the nMOS transistor starts conducting in saturation mode while the pMOS transistor is still in the ohmic region:
$$\frac{\beta_n}{2}(V_{in} - V_{Tn})^2 = \frac{\beta_p}{2}\left[2(V_{DD} - V_{in} - |V_{Tp}|)(V_{DD} - V_{out}) - (V_{DD} - V_{out})^2\right] \tag{1.174}$$
As Vin is increased further, Vout is decreased. When the point is reached, where
the pMOS transistor goes into cutoff (⇒ IDn = IDp = 0, Vout = 0).
Calculation of VOH
VOH ' VDD when Vin < VT n (n-channel transistor in cutoff, current is leakage current only)
Calculation of VOL
VOL ' 0 when (VDD − Vin ) < |VT p | (p-channel transistor in cutoff)
Calculation of VIL
Calculation of VIH
At this point of the VTC the nMOS device is nonsaturated and the pMOS transistor is
saturated.
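$$\frac{\beta_n}{2}\left[2(V_{IH} - V_{Tn})V_{out} - V_{out}^2\right] = \frac{\beta_p}{2}(V_{DD} - V_{IH} - |V_{Tp}|)^2. \tag{1.183}$$
The derivative condition $(dV_{out}/dV_{in}) = -1$ has to be evaluated for $I_{Dn}(V_{in}, V_{out}) = I_{Dp}(V_{in})$:
$$\frac{dV_{out}}{dV_{in}} = \frac{(dI_{Dp}/dV_{in}) - (\partial I_{Dn}/\partial V_{in})}{\partial I_{Dn}/\partial V_{out}} = -1 \tag{1.184}$$
which gives
$$V_{IH}\left(1 + \frac{\beta_p}{\beta_n}\right) = 2V_{out} + V_{Tn} + \frac{\beta_p}{\beta_n}(V_{DD} - |V_{Tp}|) \tag{1.185}$$
Together with equation 1.183, this equation forms a quadratic in $V_{IH}$ which has to be solved.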
Calculation of Vth
Design
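While in nMOS design a lot of effort has to be made to optimize the levels of $V_{OH}$ and $V_{OL}$, the (W/L) ratio in CMOS design is used to set the level of $V_{th}$ ($V_{OH} = V_{DD}$, $V_{OL} = 0$).
$$\frac{\beta_p}{\beta_n} = \frac{\mu_p\left(\frac{W}{L}\right)_p}{\mu_n\left(\frac{W}{L}\right)_n} \tag{1.188}$$
If the process is set such that $|V_{Tp}| = V_{Tn}$ ⇒ $\beta_p = \beta_n$, then the device aspect ratios are related by
$$\frac{\left(\frac{W}{L}\right)_p}{\left(\frac{W}{L}\right)_n} = \frac{\mu_n}{\mu_p}. \tag{1.191}$$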
Since $\mu_n/\mu_p \simeq 2.5$, a minimum-area CMOS inverter will have $(W/L)_n \simeq 1$ and $(W/L)_p \simeq 2.5$. In this case the VTC is completely symmetric.
Switching of MOS Inverters
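$$\text{Saturation:}\quad t_x - t_1 = -C_{OUT}\int_{V_{OH}}^{V_{OH} - V_T}\frac{dV_{OUT}}{\frac{\beta}{2}(V_{OH} - V_T)^2} \tag{1.193}$$
$$\text{Nonsaturation:}\quad t_2 - t_x = -C_{OUT}\int_{V_{OH} - V_T}^{V_{OL}}\frac{dV_{OUT}}{\frac{\beta}{2}\left[2(V_{OH} - V_T)V_{OUT} - V_{OUT}^2\right]} \tag{1.194}$$
$$\Rightarrow\; t_x - t_1 = \frac{2C_{OUT}V_T}{\beta(V_{OH} - V_T)^2}, \tag{1.195}$$
$$\text{with}\quad \int\frac{dx}{x(a + bx^n)} = \frac{1}{an}\ln\frac{x^n}{a + bx^n} \tag{1.196}$$
$$\text{follows:}\quad t_2 - t_x = \frac{C_{OUT}}{\beta(V_{OH} - V_T)}\ln\left[\frac{2(V_{OH} - V_T)}{V_{OL}} - 1\right]; \tag{1.197}$$
$$\Rightarrow\; t_{HL} = \tau\left\{\frac{2V_T}{V_{OH} - V_T} + \ln\left[\frac{2(V_{OH} - V_T)}{V_{OL}} - 1\right]\right\} \tag{1.198}$$
$$\text{with}\quad \tau = \frac{C_{OUT}}{\beta(V_{OH} - V_T)} \tag{1.199}$$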
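The closed-form fall time of Eq. (1.198) is convenient for quick estimates. The sketch below evaluates it together with the time constant of Eq. (1.199); the function name and the example values (C_OUT = 100 fF, β = 1e-4 A/V², V_OH = 5 V, V_OL = 0.5 V, V_T = 1 V) are assumptions.

```python
import math


def fall_time(c_out, beta, v_oh, v_ol, v_t):
    """Eq. (1.198): output high-to-low time t_HL in seconds.

    The first term covers the saturated phase (V_OH down to V_OH - V_T),
    the logarithmic term the nonsaturated phase (down to V_OL).
    """
    tau = c_out / (beta * (v_oh - v_t))                     # Eq. (1.199)
    sat_term = 2.0 * v_t / (v_oh - v_t)
    lin_term = math.log(2.0 * (v_oh - v_t) / v_ol - 1.0)
    return tau * (sat_term + lin_term)


# Assumed example values
t_hl = fall_time(c_out=100e-15, beta=1e-4, v_oh=5.0, v_ol=0.5, v_t=1.0)
print(f"t_HL = {t_hl * 1e9:.3f} ns")
```

For these values the logarithmic term dominates, so most of the fall time is spent with the driver in the nonsaturated region.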
$$I_L(V_{OUT}) = C_{OUT}\frac{dV_{OUT}}{dt} \tag{1.202}$$
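$$t_{LH} = \int_{t_1}^{t_2}dt = C_{OUT}\int_{V_0}^{V_1}\frac{dV_{OUT}}{I_L(V_{OUT})} \tag{1.203}$$
$$I_L = \frac{V_{DD} - V_{out}}{R_L} \tag{1.204}$$
$$t_{LH} = R_L C_{OUT}\int_{V_0}^{V_1}\frac{dV_{OUT}}{V_{DD} - V_{OUT}} = R_L C_{OUT}\ln\left(\frac{V_{DD} - V_0}{V_{DD} - V_1}\right) \tag{1.205}$$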
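Equation (1.205) is the familiar RC charging result. As a small check, the sketch below evaluates the rise time between the 10 % and 90 % points of a 5 V swing, which should give ln(9) ≈ 2.2 time constants. The function name and the component values are assumptions for illustration.

```python
import math


def rise_time_resistive(r_l, c_out, v_dd, v_0, v_1):
    """Eq. (1.205): time for V_OUT to rise from v_0 to v_1 through R_L."""
    return r_l * c_out * math.log((v_dd - v_0) / (v_dd - v_1))


# Assumed example: R_L = 100 kOhm, C_OUT = 100 fF, 10 % to 90 % of V_DD = 5 V
r_l, c_out = 100e3, 100e-15
t_lh = rise_time_resistive(r_l, c_out, 5.0, 0.5, 4.5)
print(f"t_LH = {t_lh * 1e9:.2f} ns = {t_lh / (r_l * c_out):.3f} R_L C_OUT")
```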
First approximation: $I_L = \frac{\beta_L}{2}|V_{TL}(V_{OUT})|^2$
$$t_{LH} = \frac{C_{OUT}\cdot\Delta V}{I_L} = \frac{2C_{OUT}(V_1 - V_0)}{\beta_L|V_{TL}|^2} \tag{1.210}$$
With more accuracy: $V_{TL}$ is not constant because of the substrate effect (body bias effect). The depletion MOSFET changes from saturation to nonsaturation if
$V_{DD} - V_{OUT} < |V_{TL}|$.
Nonsaturation
$$I_L = \frac{\beta_L}{2}\left[2|V_{TL}(V_{OUT})|(V_{DD} - V_{OUT}) - (V_{DD} - V_{OUT})^2\right] \tag{1.211}$$
$$t_{LH} = C_{OUT}\int_{V_0}^{V_{DD} - |V_{TL}|}\frac{dV_{OUT}}{I_{L(SAT)}} + C_{OUT}\int_{V_{DD} - |V_{TL}|}^{V_1}\frac{dV_{OUT}}{I_{L(nonSAT)}} \tag{1.212}$$
Load charge time constant: $\tau_L = \dfrac{C_{OUT}}{\beta_L|V_{TL}|}$
With interconnect line resistance $R_{LINE}$: $\tau_L = \left[\dfrac{1}{\beta_L|V_{TL}|} + R_{LINE}\right]C_{OUT}$
Max. switching frequency: $f_{max} = \dfrac{1}{t_{HL} + t_{LH}}$
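$$t_{PHL} = -C_{OUT}\int_{V_{OH}}^{V_{1/2}}\frac{dV_{OUT}}{I_D(V_{OUT})} \tag{1.215}$$
$$= -C_{OUT}\int_{V_{OH}}^{V_{OH} - V_{TD}}\frac{dV_{OUT}}{\frac{\beta_D}{2}(V_{OH} - V_{TD})^2} - C_{OUT}\int_{V_{OH} - V_{TD}}^{V_{1/2}}\frac{dV_{OUT}}{\frac{\beta_D}{2}\left[2(V_{OH} - V_{TD})V_{OUT} - V_{OUT}^2\right]} \tag{1.216}$$
$$= \tau_D\left\{\frac{2V_{TD}}{V_{OH} - V_{TD}} + \ln\left[\frac{4(V_{OH} - V_{TD})}{V_{OH} + V_{OL}} - 1\right]\right\} \tag{1.217}$$
$$\text{with}\quad \tau_D = \frac{C_{OUT}}{\beta_D(V_{OH} - V_{TD})} \tag{1.218}$$
Depletion load
$$t_{PLH} = C_{OUT}\int_{V_{OL}}^{V_{DD} - |V_{TL}|}\frac{dV_{OUT}}{I_{L(SAT)}} + C_{OUT}\int_{V_{DD} - |V_{TL}|}^{V_{1/2}}\frac{dV_{OUT}}{I_{L(nonSAT)}} \tag{1.220}$$
$$t_{PLH} = \frac{C_{OUT}}{\beta_L|V_{TL}|}\left\{\frac{2(V_{DD} - |V_{TL}| - V_{OL})}{|V_{TL}|} + \ln\left[\frac{2|V_{TL}| - (V_{DD} - V_{1/2})}{V_{DD} - V_{1/2}}\right]\right\} \tag{1.221}$$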
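$$t_p = \frac{1}{2}(t_{PHL} + t_{PLH}) \tag{1.228}$$
with
$$\tau_n = \left[\frac{1}{\beta_n(V_1 - V_{Tn})} + R_{LINE}\right]C_{OUT} \tag{1.229}$$
and
$$\tau_p = \left[\frac{1}{\beta_p(V_1 - |V_{Tp}|)} + R_{LINE}\right]C_{OUT}. \tag{1.230}$$
$$t_{PHL} = t_{PLH} = t_P = \tau_n\left\{\frac{2V_T}{V_{DD} - V_T} + \ln\left[\frac{4(V_{DD} - V_T)}{V_{DD}} - 1\right]\right\} \tag{1.232}$$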
$$PDP = P_{av}t_p \tag{1.233}$$
where $P_{av}$ is the average power dissipated by the circuit and $t_p$ is the average propagation delay time.
⇒ a small PDP is desirable.
For PDP computation the input signal waveform must be taken into consideration (Fig. 1.60).
For the following PDP analysis, simplified versions of the propagation delay time equations will be used:
With Iav (average power supply current) the power dissipated by the circuit is
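$(PDP)_{HL} \simeq C_{out} V_{DD}^2 \, \dfrac{R_{on}}{R_L} \, \dfrac{t_p}{t_{HL}}$ (1.246)
The final PDP expression is obtained by adding all contributions:
$PDP \simeq C_{out} V_{DD}^2 \left[ \dfrac{1}{4} + \dfrac{t_p}{t_{LH}} + \dfrac{R_{on}}{R_L} \dfrac{t_p}{t_{HL}} \right]$. (1.247)
For well-designed inverters, $R_{on} \ll R_L$. The propagation delay time is then $t_p \simeq \tau_L / 2$.
With the approximations $t_{LH} = 2\tau_L$ and $t_{HL} = 2\tau_D$ it follows that
$PDP \simeq \tfrac{3}{4} C_{out} V_{DD}^2$ (1.248)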
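The step from (1.247) to (1.248) can be checked numerically. The sketch below assumes $\tau_L = R_L C_{out}$ and $\tau_D = R_{on} C_{out}$ (definitions not restated in this passage) and uses hypothetical component values.

```python
def pdp_nmos(c_out, vdd, t_p, t_lh, t_hl, r_on, r_l):
    """Evaluate eq. (1.247) for the nMOS inverter power-delay product."""
    return c_out * vdd**2 * (0.25 + t_p / t_lh + (r_on / r_l) * (t_p / t_hl))

# Hypothetical values: C_out = 100 fF, VDD = 5 V, R_on = 4 kOhm, R_L = 40 kOhm
c_out, vdd = 100e-15, 5.0
r_on, r_l = 4e3, 40e3
tau_l = r_l * c_out   # assumed load time constant
tau_d = r_on * c_out  # assumed driver time constant

pdp = pdp_nmos(c_out, vdd, t_p=tau_l / 2, t_lh=2 * tau_l, t_hl=2 * tau_d,
               r_on=r_on, r_l=r_l)
ratio = pdp / (c_out * vdd**2)  # prefactor of C_out * VDD^2, compare with 3/4
print(ratio)
```

Each of the three bracketed terms contributes 1/4 under these approximations, which is where the factor 3/4 comes from.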
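$(P_{av})_{DC} \simeq \dfrac{\beta_D}{4} \left[ 2(V_{OH} - V_{TD}) V_{OL} - V_{OL}^2 \right] V_{DD}$ (1.250)
$\simeq \dfrac{\beta_L}{4} \left[ V_{TL}(V_{OL}) \right]^2 V_{DD}$ (1.251)
$(I_{av})_{LH} = \dfrac{1}{T} \int_0^{t_{LH}} I_L(t)\, dt$ (1.252)
with
$I_L(t) = I_D(t) + C_{out} \dfrac{dV_{out}}{dt}$ (1.253)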
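⇒
$(I_{av})_{LH} = \dfrac{1}{T} \int_0^{t_{LH}} I_D(t)\, dt + \dfrac{C_{out}}{T} \int_0^{t_{LH}} \dfrac{dV_{out}}{dt}\, dt$ (1.254)
The first term may be rewritten as
$\bar{I}_{D,LH} \equiv \dfrac{1}{t_{LH}} \int_0^{t_{LH}} I_D(t)\, dt$. (1.255)
$\bar{I}_{D,LH} = 0$ if $V_{in}$ is an ideal square wave (driver in cutoff). The second term can be evaluated
as follows:
$\int_0^{t_{LH}} \dfrac{dV_{out}}{dt}\, dt = \int_{V_0}^{V_1} dV_{out}$ (1.256)
⇒
$(I_{av})_{LH} = \dfrac{1}{T} \left[ \bar{I}_{D,LH}\, t_{LH} + C_{out} (V_1 - V_0) \right]$ (1.257)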
Current flows only during a switching event, so the average current in a logic cycle T can be
written as
$I_{av} = \dfrac{1}{T} \left[ \bar{I}_{Dn,LH}\, t_{LH} + \bar{I}_{Dn,HL}\, t_{HL} \right]$. (1.268)
In this equation
$\bar{I}_{Dn,LH} \equiv \dfrac{1}{t_{LH}} \int_0^{t_{LH}} I_{Dn}(t)\, dt$ (1.269)
gives the average current during the rise time, while
$\bar{I}_{Dn,HL} \equiv \dfrac{1}{t_{HL}} \int_0^{t_{HL}} I_{Dn}(t)\, dt$ (1.270)
is the average fall time current. For a completely symmetric CMOS inverter $\bar{I}_{Dn,LH} =
\bar{I}_{Dn,HL} = \bar{I}_{Dn,av}$, so the power-delay product is given by
$PDP_{CMOS} = \bar{I}_{Dn,av} V_{DD}\, t_p\, \dfrac{f}{f_{max}}$ (1.271)
• MOSFET capacitances are complicated functions of the fabrication processes and the
layout geometry
$L' = L_s + L + L_D$ (1.272)
The gate overlap is necessary to ensure the contact of the channel and the n+ regions.
The overlap capacitances are given by
The gate-bulk capacitance consists of the gate capacitance in series with the depletion capac-
itance of the depletion region.
2. Nonsaturation: the channel shields the bulk electrode from the gate since the inversion
layer acts as conductor between drain and source ⇒ Cgb = 0
3. Saturation: the channel shields the bulk electrode from the gate since the inversion layer
acts as conductor between drain and source ⇒ Cgb = 0. The channel is pinched off and
does not contact the drain n+ region.
$C_G = C_{ox} W L'$ (1.287)
where $L' = L + 2L_o$
Figure 1.67: Expanded view of an n+ drain or source region for computing depletion capaci-
tances
where $V_r$ is the magnitude of the reverse-bias voltage applied to the junction. $\phi_0$ is the built-in
potential
$\phi_0 = \dfrac{kT}{q} \ln\left( \dfrac{N_d N_a}{n_i^2} \right)$ (1.291)
and $C_{j0}$ is the zero-bias ($V_r = 0$) capacitance per unit area.
$C_{j0} = \sqrt{ \dfrac{q\, \varepsilon_{Si}}{2 \left( \frac{1}{N_a} + \frac{1}{N_d} \right) \phi_0} }$ (1.292)
The bottom capacitance can be computed simply using the doping concentrations $N_d$ and $N_a$
for the pn junction:
$C_{bottom} = \dfrac{C_{j0} W Y}{\left( 1 + \frac{V_r}{\phi_0} \right)^{1/2}}$ (1.293)
For computing the sidewall capacitance the p+ channel stop doping must be taken into con-
sideration (−→ see also technology description later on). The sidewall capacitance is usually
computed by first taking the sidewall capacitance per unit area as
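$C_{j0sw} = \sqrt{ \dfrac{q\, \varepsilon_{Si}}{2 \left( \frac{1}{N_{a,sw}} + \frac{1}{N_d} \right) \phi_{0sw}} }$ (1.294)
where
$\phi_{0sw} = \dfrac{kT}{q} \ln\left( \dfrac{N_d N_{a,sw}}{n_i^2} \right)$ (1.295)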
is the sidewall built-in potential. Because the n+ area has a junction depth of xj , the sidewall
capacitance per unit length Cjsw is taken as
where l is the total sidewall perimeter length (2W + 2Y ). Assuming φ0 = φ0sw , the total
depletion capacitance for a drain or source area is given by
For drain regions Vr = VDB and for source regions Vr = VSB ⇒ the depletion capacitance
depends on actual voltages.
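An average depletion capacitance may be defined by
$C_{av} = \dfrac{1}{V_2 - V_1} \int_{V_1}^{V_2} C_d(V_r)\, dV_r$ (1.299)
$= \dfrac{2 \phi_0 C_T}{V_2 - V_1} \left[ \left( 1 + \dfrac{V_2}{\phi_0} \right)^{1/2} - \left( 1 + \dfrac{V_1}{\phi_0} \right)^{1/2} \right]$ (1.300)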
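where
$C_T = C_{j0} W Y + C_{jsw}\, l$. (1.301)
Defining a dimensionless voltage factor
$K(V_1, V_2) = \dfrac{C_{av}}{C_T} = \dfrac{2 \phi_0}{V_2 - V_1} \left[ \left( 1 + \dfrac{V_2}{\phi_0} \right)^{1/2} - \left( 1 + \dfrac{V_1}{\phi_0} \right)^{1/2} \right] < 1$ (1.302)
yields
$C_{av} = K(V_1, V_2)\, C_T$ (1.303)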
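The voltage factor can be cross-checked against a direct numerical average. This sketch assumes the step-junction form $C_d(V_r) = C_T / (1 + V_r/\phi_0)^{1/2}$ with $\phi_0 = \phi_{0sw}$, which is the form that integrates to (1.300); the numerical values are hypothetical.

```python
import math

def k_factor(v1, v2, phi0):
    """Dimensionless voltage factor K(V1, V2) of eq. (1.302)."""
    return (2.0 * phi0 / (v2 - v1)) * (
        math.sqrt(1.0 + v2 / phi0) - math.sqrt(1.0 + v1 / phi0))

def c_av_numeric(v1, v2, phi0, c_t, steps=10000):
    """Midpoint-rule average of the assumed C_d(Vr) = C_T / sqrt(1 + Vr/phi0)."""
    h = (v2 - v1) / steps
    total = sum(c_t / math.sqrt(1.0 + (v1 + (i + 0.5) * h) / phi0)
                for i in range(steps))
    return total * h / (v2 - v1)

# Hypothetical values: phi0 = 0.9 V, C_T = 10 fF, output swing 0 V to 5 V
phi0, c_t = 0.9, 10e-15
k = k_factor(0.0, 5.0, phi0)
c_av = k * c_t
print(f"K = {k:.3f}, C_av = {c_av * 1e15:.2f} fF")
```

Because the depletion capacitance falls with increasing reverse bias, K stays below 1 for any swing with $V_2 > V_1 \geq 0$.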
Cout = CGD1 + CGD2 + K(VOL , VOH )[Cdb1 + Csb2 ] + Cline + CG3 (1.304)
For computation of the line capacitance transmission line theory should be used (parasitic
capacitances, structures must be treated in a distributed manner). The problem can be
reduced by a lumped-element approximation:
with
$C_{int} = \dfrac{\varepsilon_{ox}}{x_{int}}$ [F/cm²]. (1.306)
$C_{int}$ is the capacitance per unit area formed between the line and the substrate, and $x_{int}$ is the
oxide thickness between line and substrate. The line resistance can be estimated in a similar
manner by
$R_{line} = n R_\square$ [Ω] (1.307)
where $n = d/w$ is the number of squares (□) with area $w^2$ as seen in the direction of current
flow. Fig. 1.70 gives an example for cascaded stages with a fanout of three:
The output capacitance of CMOS inverters can be computed using similar techniques. In
Fig. 1.71 two cascaded CMOS inverters are shown.
$C_{out} \simeq C_{GDn} + C_{GDp} + K(V_{OL}, V_{OH})(C_{dbp} + C_{dbn}) + C_{line} + C_G$ (1.309)
Assuming that device dimensions are scaled with S > 1, such that
$\text{Length}' = \dfrac{\text{Length}}{S}$ (1.311)
This length reduction applies to all geometries in the chip.
nMOS high-to-low time:
$t_{HL} = \tau_D \left\{ \dfrac{2V_{TD}}{V_{OH} - V_{TD}} + \ln\left[ \dfrac{2(V_{OH} - V_{TD})}{V_{OL}} - 1 \right] \right\}$ (1.312)
Scaling also reduces the voltages by $V' = V/S$. The term enclosed by curly brackets in the
previous equation remains constant, but $\tau_D$ is modified:
$\tau_D = \dfrac{C_{out}}{\beta_D (V_{OH} - V_{TD})}$ (1.313)
$(V_{OH} - V_{TD})' = \dfrac{V_{OH} - V_{TD}}{S}$ (1.314)
$C'_{ox} = S\, C_{ox} \Rightarrow \beta'_D = S\, \beta_D$ (1.315)
$C_{out}$ consists of oxide and depletion capacitances:
$(C')_{oxide} = C'_{ox} (\text{Area})' = \dfrac{(C)_{oxide}}{S}$ (1.316)
$(C')_{junction} \simeq \dfrac{(C)_{junction}}{S}$ (approximation) (1.317)
$\Rightarrow C'_{out} = \dfrac{C_{out}}{S}$ (1.318)
$\Rightarrow \tau'_D \simeq \dfrac{\tau_D}{S}$ (1.319)
The maximum switching frequency is
$f'_{max} = \dfrac{1}{t'_{HL} + t'_{LH}} \simeq S f_{max}$ (1.320)
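The scaling chain (1.313) to (1.319) can be traced with a few lines of code. This is only a sketch with hypothetical starting values; it applies the three scaling rules and compares the resulting time constant with $\tau_D / S$.

```python
def tau_d(c_out, beta_d, v_drive):
    """Eq. (1.313) with v_drive = V_OH - V_TD."""
    return c_out / (beta_d * v_drive)

def scaled_tau_d(c_out, beta_d, v_drive, s):
    """Apply the scaling rules (1.314), (1.315) and (1.318), then re-evaluate (1.313)."""
    c_out_s = c_out / s      # (1.318): output capacitance shrinks by S
    beta_s = s * beta_d      # (1.315): beta grows with C_ox
    v_s = v_drive / s        # (1.314): voltages shrink by S
    return tau_d(c_out_s, beta_s, v_s)

# Hypothetical values: C_out = 100 fF, beta_D = 100 uA/V^2, V_OH - V_TD = 4 V, S = 2
tau = tau_d(100e-15, 100e-6, 4.0)
tau_scaled = scaled_tau_d(100e-15, 100e-6, 4.0, s=2.0)
print(tau_scaled / tau)  # compare with 1/S
```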
CMOS Technology
If the base-emitter junction of the pnp transistor becomes forward biased, the transistor is
switched on and I begins to flow, causing the npn transistor to be forward biased. The collector
current of the npn transistor forces the pnp transistor to conduct more current. This feedback
leads to latch-up and the circuit will be destroyed by heat.
The circuit can be protected from latch-up by placing heavily doped guard rings around the
MOSFETs. This reduces the effectiveness of the base and emitter regions in both transistors.
Overview: Combinational Logic
Chapter 2
• Random Logic: Circuit design using NAND gates, NOR gates and Inverters (often called
“AOI Logic Gate Representation” = AND-OR-Inverter Logic)
• Complex MOS Logic: A Boolean function is realized by a pull-up network (realizes the
product terms for logic ’1’) and a pull-down network (realizes logic ’0’). The product terms
are realized by parallel/serial combinations of MOS transistors whose inputs are
controlled by the literals of the Boolean equation.
• Passtransistor Logic: transistors are used as switches which are controlled by input
literals.
• Logic Arrays: PLA (programmable logic arrays), gate-matrix layout, Weinberger Arrays
and regular layout achieved by application of the Euler-Graph method
Complex nMOS Logic
Complex Static CMOS Logic
• Build logic gates as shown in figure 2.15 where transistors are represented as switches
• The pMOS pull-up network replaces resistive or depletion loads used in nMOS technique
⇒ pull-up and pull-down networks implement complementary functions, when one con-
ducts the other does not
• No quiescent current through the gate means zero or very low static power dissipation
Design Method
• nMOS devices pull the output to ’0’ when the gate inputs are ’1’
• pMOS devices pull the output to ’1’ when the gate inputs are ’0’
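$F_{PD} = \overline{F}(A, B, C, \ldots)$
$F_{PU} = F(A, B, C, \ldots)$
For example, for a three-input NOR gate:
$F_{PD} = A + B + C$
$F_{PU} = \overline{A + B + C} = \bar{A} \cdot \bar{B} \cdot \bar{C}$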
⇒ Synthesis can use conventional logic design techniques (Boolean functions, Karnaugh
maps, logic minimization) and express the results in AND/OR form for realisation in
series and parallel connections for devices
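The complementary behaviour of the two networks can be checked by enumerating all input combinations. The sketch below models a three-input NOR gate as an illustration: the nMOS pull-down conducts for A + B + C (parallel devices), and the pMOS pull-up conducts when all inputs are 0 (series devices). The function names are hypothetical.

```python
from itertools import product

def pull_down(a, b, c):
    """nMOS network, transistors in parallel: conducts when A + B + C is true."""
    return a or b or c

def pull_up(a, b, c):
    """pMOS network, transistors in series: conducts when all inputs are 0."""
    return (not a) and (not b) and (not c)

rows = []
for a, b, c in product([False, True], repeat=3):
    pd, pu = pull_down(a, b, c), pull_up(a, b, c)
    # Exactly one network conducts, so there is no quiescent path from VDD to ground.
    assert pd != pu
    rows.append((a, b, c, int(pu)))  # output is 1 exactly when the pull-up conducts

for row in rows:
    print(row)
```

The same enumeration can be used for any other gate by swapping in the series/parallel expressions of its pull-down network and the De Morgan dual for the pull-up.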
First the logic nMOS transistors are structured according to the rules above. The output of
the function is the complement of the nMOS logic. Now the pMOS transistor network has to
be structured according to the following rules:
F = A(BC + D) (2.1)
$F = D_0 \bar{S}_0 \bar{S}_1 + D_2 \bar{S}_0 S_1 + D_1 S_0 \bar{S}_1 + D_3 S_0 S_1$ (2.2)
⇒ ...
F = AB + (A + B)C (2.3)
Figure 2.19: Combinational adder layout possibilities for one adder circuit
• Consists of a single pMOS load per gate (emulating the nMOS depletion load, without
body effect) and an nMOS pull-down network
Passtransistor and Transmission Gate Logic
A   B   A ⊙ B   Pass Function
0   0     1     $\bar{A} + \bar{B}$
0   1     0     $A + \bar{B}$
1   0     0     $\bar{A} + B$
1   1     1     $A + B$
With
$\tau_{ch} \equiv \dfrac{2 C_{in}}{\beta_P (V_{DD} - V_{TP})}$ (2.10)
$V_{in}(t) = (V_{DD} - V_{TP}) \dfrac{t/\tau_{ch}}{1 + t/\tau_{ch}}$. (2.11)
$V_{GS,P} = V_{DD} - V_P = V_{DD}$ (2.15)
Since the passtransistor is always nonsaturated, the charging current differential equation can
be written as:
$-C_{in} \dfrac{dV_{in}}{dt} = \dfrac{\beta_P}{2} \left[ 2(V_{DD} - V_{TP}) V_{in} - V_{in}^2 \right]$ (2.16)
Ignoring the body bias effect, the solution of this differential equation is given by:
$V_{in}(t) = (V_{DD} - V_{TP}) \left( \dfrac{2 e^{-t/\tau_{dis}}}{1 + e^{-t/\tau_{dis}}} \right)$. (2.17)
where
$\tau_{dis} \equiv \dfrac{C_{in}}{\beta_P (V_{DD} - V_{TP})}$ (2.18)
$\tau_{dis} \simeq \tfrac{1}{2} \tau_{ch}$ (2.19)
⇒ Discharging is much faster than charging.
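To make the difference between the two transients concrete, the following sketch evaluates the charging waveform (2.11) and the discharging waveform (2.17) for the same pass transistor. The parameter values are hypothetical.

```python
import math

def tau_ch(c_in, beta_p, vdd, vtp):
    """Charging time constant, eq. (2.10)."""
    return 2.0 * c_in / (beta_p * (vdd - vtp))

def tau_dis(c_in, beta_p, vdd, vtp):
    """Discharging time constant, eq. (2.18)."""
    return c_in / (beta_p * (vdd - vtp))

def v_charge(t, vdd, vtp, tau):
    """Charging waveform, eq. (2.11)."""
    return (vdd - vtp) * (t / tau) / (1.0 + t / tau)

def v_discharge(t, vdd, vtp, tau):
    """Discharging waveform, eq. (2.17)."""
    e = math.exp(-t / tau)
    return (vdd - vtp) * 2.0 * e / (1.0 + e)

# Hypothetical values: C_in = 50 fF, beta_P = 100 uA/V^2, VDD = 5 V, V_TP = 1 V
tau_c = tau_ch(50e-15, 100e-6, 5.0, 1.0)
tau_d = tau_dis(50e-15, 100e-6, 5.0, 1.0)

# Charging reaches 90 % of (VDD - V_TP) only after 9 tau_ch = 18 tau_dis,
# while discharging falls below 10 % within about 3 tau_dis.
print(v_charge(9 * tau_c, 5.0, 1.0, tau_c), v_discharge(3 * tau_d, 5.0, 1.0, tau_d))
```

The charging curve approaches its final value only algebraically in t, whereas the discharge decays exponentially, so the gap is larger than the factor of two between the time constants suggests.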
$I_{Dn} + I_{Dp} = C_{out} \dfrac{dV_{out}}{dt}$ (2.21)
Logic 1 transfer:
$V_{out}(t) = V_{DD} \left[ 1 - e^{-t/\tau_{TG}} \right]$ (2.22)
with
$\tau_{TG} = R_{TG} C_{out}$ (2.23)
Logic 0 transfer:
$V_{out}(t) = V_{DD}\, e^{-t/\tau_{TG}}$ (2.24)
Equivalent Resistance
$R_{TG} = \dfrac{V_{TG}}{I_{Dn} + I_{Dp}}$ (2.25)
$R_n = \dfrac{1}{\beta_n (V_{DD} - V_{Tn})}$ (2.26)
$R_p = \dfrac{1}{\beta_p (V_{DD} - |V_{Tp}|)}$ (2.27)
Path Selector
$F = AS + B\bar{S}$ (2.29)
S = 1 : F = A
S = 0 : F = B
OR Gate
$F = A + \bar{A}B = A + B$ (2.30)
$F_1 = A \oplus B = A\bar{B} + \bar{A}B$ (2.31)
$F_2 = A \odot B = AB + \bar{A}\bar{B}$ (2.32)
Adders
S0 = A0 ⊕ B0 (2.33)
C0 = A0 B0 (2.34)
Array Logic
Multiplexers/Demultiplexers
4-to-1 Multiplexer:
Split Arrays
⇒ improvement of the layout efficiency by separating pMOS and nMOS transistors into two
distinct areas (physical separation)
For reduction of device count and area an nMOS version with pMOS pull-up can also be useful
(→ kind of pseudo nMOS).
Clocking
Chapter 3
3.1 Clocking
Clock Signal:
⇒ For nonoverlapping clock phases $\phi$ and $\bar{\phi}$, fine-tuned and well-designed delay lines (realized
as transmission gates) have to be inserted in order to avoid overlapping of $\phi$ and $\bar{\phi}$.
Clocked Static Logic
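$\tau_{TG} = R_{TG} C_L$ (3.2)
where
$C_L = C_{TG} + C_{in} + C_{line}$ (3.3)
$V_A = V_{DD}$ with $V_{in}(0) = 0$:
$V_{in}(t) \simeq V_{DD} \left[ 1 - e^{-t/\tau_{TG}} \right]$ (3.4)
The inverter switches when $V_{in} = V_{IH}$, which occurs after
$t_1 \simeq -\tau_{TG} \ln\left( 1 - \dfrac{V_{IH}}{V_{DD}} \right)$ (3.5)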
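A short numerical sketch of eq. (3.5): given the transmission gate resistance and the load capacitance, it returns the time at which the inverter input reaches V_IH. All component values are hypothetical.

```python
import math

def t_switch(r_tg, c_l, v_ih, vdd):
    """Time for V_in to rise from 0 to V_IH through the transmission gate, eq. (3.5)."""
    tau_tg = r_tg * c_l  # eq. (3.2)
    return -tau_tg * math.log(1.0 - v_ih / vdd)

# Hypothetical values: R_TG = 5 kOhm, C_L = 100 fF, V_IH = 3 V, VDD = 5 V
t1 = t_switch(r_tg=5e3, c_l=100e-15, v_ih=3.0, vdd=5.0)
print(f"t1 = {t1 * 1e12:.0f} ps")
```

Substituting t1 back into (3.4) returns V_IH, which is a quick consistency check on the sign of the logarithm.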
$C_{store} \dfrac{dV}{dt} = I_{Lp} - I_{Ln}$. (3.15)
The solution of this equation is
$V(t) = \dfrac{I_{Lp} - I_{Ln}}{C_{store}}\, t + V(0)$ (3.16)
If $\Delta V$ is the maximum allowed voltage change:
$t_{max} = \dfrac{C_{store}\, \Delta V}{I_L}$ (3.17)
With $T_{max} = 2 t_{max}$ (the longest allowed clock period), the minimum frequency follows as
$f_{min} \simeq \dfrac{1}{2 t_{max}} \simeq \dfrac{I_L}{2 C_{store}\, \Delta V}$ (3.18)
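Equations (3.17) and (3.18) translate directly into a small helper. The values below (leakage current, storage capacitance, allowed droop) are hypothetical and only indicate the order of magnitude.

```python
def t_max(i_leak, c_store, delta_v):
    """Longest hold time before the stored voltage changes by delta_v, eq. (3.17)."""
    return c_store * delta_v / i_leak

def f_min(i_leak, c_store, delta_v):
    """Minimum clock frequency, eq. (3.18), with T_max = 2 t_max."""
    return 1.0 / (2.0 * t_max(i_leak, c_store, delta_v))

# Hypothetical values: I_L = 1 pA, C_store = 50 fF, allowed droop 1 V
hold = t_max(1e-12, 50e-15, 1.0)
freq = f_min(1e-12, 50e-15, 1.0)
print(f"t_max = {hold * 1e3:.0f} ms, f_min = {freq:.1f} Hz")
```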
The transmission gate capacitance is
For a realistic analysis of the charge leakage problems, the dependence of the leakage currents
on the reverse-bias voltage has to be taken into consideration (see [25]).
Charge Sharing
$Q_T = C_1 V_{DD}$ (3.23)
$V_f = V_1(t > 0) = V_2(t > 0) = \dfrac{C_1}{C_1 + C_2} V_{DD}$ (3.25)
$= \dfrac{1}{1 + (C_2 / C_1)} V_{DD}$
Initial charge:
$Q_T = \sum_{i=1}^{N} C_i V_i(0)$ (3.26)
Final voltage:
$V_f = \dfrac{\sum_{i=1}^{N} C_i V_i(0)}{\sum_{i=1}^{N} C_i}$ (3.28)
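The general result (3.28) is a charge-weighted average and is easy to evaluate for any number of nodes. The sketch below uses hypothetical capacitor values and reproduces the two-capacitor case (3.25) as a special case.

```python
def final_voltage(caps, v0):
    """Final common voltage after charge sharing, eq. (3.28)."""
    q_total = sum(c * v for c, v in zip(caps, v0))  # initial charge, eq. (3.26)
    return q_total / sum(caps)

# Hypothetical two-node case: C1 = 30 fF precharged to VDD = 5 V, C2 = 10 fF at 0 V
vdd = 5.0
c1, c2 = 30e-15, 10e-15
v_f = final_voltage([c1, c2], [vdd, 0.0])
v_f_two_cap = vdd / (1.0 + c2 / c1)  # closed form of eq. (3.25)
print(v_f, v_f_two_cap)
```

The smaller the ratio C2/C1, the closer the final voltage stays to VDD, which is why the storage node is usually made large compared with the parasitic nodes it can share charge with.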
Dynamic Logic
• Transistor count is reduced from 2n (static CMOS) to n+2 for dynamic precharged
CMOS (but now: 2 phases of operation)
Precharge Phase
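If $V_{in} = 0$ then
$\tau_{ch} = \dfrac{C_{out}}{\beta_p (V_{DD} - |V_{Tp}|)} = R_p C_{out}$ (3.29)
Worst case ($V_{in} = V_{DD}$):
$\tau_{ch,max} = R_p (C_{out} + C_n)$ (3.30)
$t_{ch,max} = \tau_{ch,max} \left[ \dfrac{2|V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln\left( \dfrac{2(V_{DD} - |V_{Tp}|)}{V_0} - 1 \right) \right]$ (3.31)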
Evaluation Phase
If M1 is switched on and M1 and Mn are designed with identical channel width W,
the discharge time constant is given by
$\tau_{dis} = \dfrac{(L_1 + L_n)\, C_{out}}{k'_n W (V_{DD} - V_{Tn})}$ (3.32)
$t_{dis} = \tau_{dis} \left[ \dfrac{2V_{Tn}}{V_{DD} - V_{Tn}} + \ln\left( \dfrac{2(V_{DD} - V_{Tn})}{V_0} - 1 \right) \right]$ (3.33)
$f_{max} \simeq \dfrac{1}{2 t_M}$ (3.35)
• in the evaluation phase the output remains HIGH (LOW) or is optionally discharged
(charged)
pMOS blocks and nMOS blocks have to be arranged alternately in order to avoid glitches
Domino CMOS Logic
• Domino Logic: design method for glitch-free cascading of nMOS logic blocks
– Precharge during φ = 0
– Evaluation when φ = 1
Precharge Phase: The gate output is precharged to logic 1 and the inverter output
is going to logic 0. Logic transmission errors are avoided by providing a logic 0 at
the inverter output (avoiding discharge of the next logic stage).
Evaluation Phase: Depending on the actual input values, the inverter output either stays at
logic 0 or is set to logic 1. The correct result signal is provided at the end of the
domino cascade after stabilization of all stages.
• small load capacity to be driven by logic (one inverter only) =⇒ low dimension of
transistors
• only positive logic realizations possible because of the input inverters ⇒ domino logic is
noninverting
Functions such as
$F_1 = A \oplus B = A\bar{B} + \bar{A}B$
$F_2 = A \odot B = AB + \bar{A}\bar{B}$ (3.36)
3.5.2 Analysis
Precharge
Assuming that all Ai (coming from previous stages) are zero, the capacitance CX is charged,
where
$C_X = C_0 + C_T$ (3.37)
$\simeq (C_{GDn1} + C_{BDn1}) + (C_{GDp1} + C_{BDp1}) + C_G + C_{line}$ (3.38)
Evaluate
If all inputs Ai are set to logic 1, the worst case delay time can be estimated by
with
$R_j = \dfrac{1}{k'_n (W/L)_j (V_{DD} - V_{Tn})}$ (3.40)
Figure 3.33: Use of feedback to control a pull-up MOSFET for charge sharing problem
NORA Logic
(NORA = NO RAce)
• one clock signal and the inverted clock signal, both with short rise times, are sufficient
• no inverter is needed between the logic stages, because n-type and p-type blocks are used
alternately
From fig. 3.34 the signal race problem can be seen: A signal race can arise, when both
transmission gates conduct at the same time. If the new input from TG1 reaches the input of
TG2 while TG2 is still transmitting the output, the output information will be lost. Imperfect
TG synchronization occurs because of normal transition intervals or clock skew.
Memory Structures
Behaviour:
LD = 1 : $Q \leftarrow D$, $\bar{Q} \leftarrow \bar{D}$
LD = 0 : store current state
Figure 3.42: Pseudo 2-phase clocking (a) waveforms and simple latch, (b) clock skew, and (c)
slow clock edges
3.7.4 Dynamic Flip-Flop with reduced Transistor Count and Clock Connection
Figure 3.47: Reduced transistor count latch with high impedance sustainer transistor
Characteristic Equation:
where
INPUTS OUTPUT
CL D R S Q
X X 1 0 0
X X 0 1 1
X X 1 1 NA
Signaldelay
Chapter 4
Performance
4.1 Signaldelay
ρ = resistivity
t = thickness
l = conductor length
w = conductor width
This expression may be rewritten as $R = R_s \dfrac{l}{w}$, where $R_s$ is the sheet resistance having
units of Ω/□ (ohms per square). Thus to obtain the resistance of a layer, one would simply
multiply the sheet resistance Rs , by the ratio of the length to width of the conductor. Note
that for metal having a given thickness t, the resistivity is known, while for poly and diffusion
the resistivities are significantly influenced by the concentration density of the impurities that
have been introduced into the conducting regions during implantation. This means that the
process parameters have to be known to accurately estimate these quantities.
Although the voltage-current characteristic of a MOS transistor is generally nonlinear, it is
sometimes useful to approximate its behavior in terms of a channel resistance to estimate
performance. The channel resistance may be expressed by
$R_c = k \dfrac{L}{W}$
with
$k = \left[ \mu\, \dfrac{\varepsilon_0 \varepsilon_r}{t_{ox}}\, (V_{gs} - V_t) \right]^{-1}$
For both the n-channel and p-channel devices, k may take a value within the range 50,000 to
30,000 Ω/□. The equation for k as given above demonstrates the dependence of channel resistance
on the surface mobility µ of the majority carriers. Since the mobility is also a function of
temperature, the channel resistance and therefore switching time parameters, as well as power
dissipation, change with temperature variations. The increase in channel resistance may be
approximated by +0.25% per °C for an increase in temperature above 25 °C.
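The temperature rule of thumb can be written as a one-line model. The sketch below assumes that the +0.25 % per °C applies linearly to the 25 °C value (a compounding interpretation would give slightly larger numbers); the resistance value is hypothetical.

```python
def channel_resistance(r_25, temp_c, coeff=0.0025):
    """Channel resistance at temp_c (deg C), assuming a linear +0.25 %/degC rise above 25 degC."""
    return r_25 * (1.0 + coeff * (temp_c - 25.0))

# Hypothetical device with 10 kOhm channel resistance at 25 degC
r_hot = channel_resistance(10e3, 125.0)
print(f"R_c at 125 degC = {r_hot / 1e3:.2f} kOhm")
```

Under this assumption a 100 °C temperature rise increases the channel resistance, and with it the RC switching times, by 25 %.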
The dynamic response of MOS systems is very much dependent on the parasitic capacitances
associated with the MOS device and on the interconnection capacitances that are formed by metal,
poly, and diffusion wires, in concert with transistor and conductor resistances. The total load
capacitance on the output of a MOS gate is the sum of:
• gate capacitance (of other inputs connected to the output of the gate)
• routing capacitance (of connections between the output and other inputs).
Gate Capacitances
The large-signal MOSFET capacitance model that will be used to compute Cgate is based on
the self-aligned, poly gate LOCOS (local oxidation of silicon) structure depicted in Fig. 4.1.
[Figure: MOSFET cross section and top view. Labels: poly gate (oxide thickness $x_{ox}$), n+ source and drain, p+ regions, p-substrate, capacitances $C_{gb}$, $C_{sb}$, $C_{db}$, and dimensions $L_s$, $L$, $L_d$, $Y_s$, $Y_d$]
Although the LOCOS MOSFET has been singled out for the analysis, the model developed
here is generally applicable to any MOSFET regardless of the technology base. Figure 4.2a
shows the basic lumped-element capacitances and their physical origins in terms of the device
cross section. This particular model is chosen because it allows the capacitors to be divided
into contributions that may be computed directly from the device and processing parameters.
1. The overlap capacitances $C_{ols}$ and $C_{old}$ are parasitic elements that originate from the
basic fabrication steps. In the self-aligned process, the polysilicon gate is employed as a
mask to define the n+ drain and source regions. Directly after this step, $L_s = L_d = 0$
and $L' = L$. The overlaps occur because the remaining steps require heating of the
wafer. This gives rise to lateral diffusion of the n+ dopants. Typically, these overlap
Parameter                          Cutoff      Nonsaturation    Saturation
$C_{gb}$                           εA/t_ox     0                0
$C_{gs}$                           0           εA/(2 t_ox)      2εA/(3 t_ox)
$C_{gd}$                           0           εA/(2 t_ox)      0
$C_g = C_{gb} + C_{gs} + C_{gd}$   εA/t_ox     εA/t_ox          2εA/(3 t_ox)
2. The gate-source capacitance Cgs is really the gate-to-channel capacitance as seen be-
tween the gate and source; similarly, Cgd represents the gate-drain capacitance when
the channel is acting as a conductor to the drain n+ region. The voltage-dependent
nature of the channel implies that these elements are nonlinear. Cgb is the gate-bulk
capacitance and consists of the gate capacitance in series with the depletion capacitance
established by the p-type space charge region. Table 4.1 shows approximated values of
these three capacitances in various states of the MOS transistor.
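The approximations of Table 4.1 can be collected in a small lookup. This sketch assumes an SiO2 gate oxide (relative permittivity 3.9) and names the three operating states cutoff, nonsaturation and saturation; the function name and the device dimensions are hypothetical.

```python
EPS_OX = 3.9 * 8.854e-12  # F/m, permittivity of SiO2 (assumed gate insulator)

def gate_capacitances(area, t_ox, region, eps=EPS_OX):
    """Approximate Cgb, Cgs, Cgd and their sum Cg for one operating region."""
    c0 = eps * area / t_ox  # parallel-plate gate capacitance
    table = {
        "cutoff":        {"Cgb": c0,  "Cgs": 0.0,           "Cgd": 0.0},
        "nonsaturation": {"Cgb": 0.0, "Cgs": c0 / 2.0,      "Cgd": c0 / 2.0},
        "saturation":    {"Cgb": 0.0, "Cgs": 2.0 * c0 / 3.0, "Cgd": 0.0},
    }
    caps = dict(table[region])
    caps["Cg"] = caps["Cgb"] + caps["Cgs"] + caps["Cgd"]
    return caps

# Hypothetical device: 1 um x 2 um gate, 20 nm oxide
area, t_ox = 2e-12, 20e-9
c0 = EPS_OX * area / t_ox
for region in ("cutoff", "nonsaturation", "saturation"):
    print(region, gate_capacitances(area, t_ox, region))
```

Only in saturation does the total gate capacitance drop below the parallel-plate value, because the pinched-off channel no longer couples the gate to the drain.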
Diffusion Capacitances
The two remaining capacitors in the model of Fig. 4.2a are Csb and Cdb . These represent the
voltage-dependent depletion capacitances that result from the pn junctions at the drain and
source regions. The problem of determining these elements is aided by using the expanded
drawing in Fig. 4.3. This shows an n+ well in a p-type bulk region and is representative
of either a drain or a source; note that a p+ region surrounds the n+ sidewalls. The actual
doping profile around the pn junction is generally quite complicated. A step doping will be
assumed for simplicity.
The total depletion capacitance Cd can be presented by
where
Figure 4.3: Expanded view of an n+ drain or source region for computing depletion capaci-
tances.
Since the thickness of the depletion layer depends on the voltage across the junction, both Cja
and Cjp are functions of the junction voltage Vj. A general expression that describes the junction
capacitance is
$$C_j = C_{j0}\left(1 - \frac{V_j}{\Phi_B}\right)^{-m}$$
where Vj is the junction voltage (negative for reverse bias), Cj0 is the zero-bias capacitance
(Vj = 0), and ΦB is the built-in junction potential (∼ 0.6 V). m is a constant that depends on the
distribution of impurities near the junction and has a value on the order of 0.3 to 0.5.
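As a hedged illustration of the expression above, the following Python sketch evaluates Cj for a
few reverse-bias voltages. The zero-bias value of 10 fF and the choice m = 0.5 are assumptions made
only for this example; ΦB = 0.6 V is the value quoted in the text.

```python
def junction_capacitance(c_j0, v_j, phi_b=0.6, m=0.5):
    """Return Cj = Cj0 * (1 - Vj/phi_b)**(-m); v_j is negative for reverse bias."""
    if v_j >= phi_b:
        raise ValueError("expression is only valid for v_j < phi_b")
    return c_j0 * (1.0 - v_j / phi_b) ** (-m)


C_J0 = 10e-15  # assumed zero-bias capacitance in farads (10 fF)

for v_j in (0.0, -1.8, -5.4):
    c_j = junction_capacitance(C_J0, v_j)
    print(f"Vj = {v_j:5.1f} V  ->  Cj = {c_j * 1e15:.2f} fF")
```

The capacitance falls as the reverse bias grows, because the depletion layer widens.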
Routing Capacitances:
Routing capacitances between metal and poly layers and the substrate can be approximated
using a parallel-plate model (C = εA/t), where A is the area of the plate capacitor, t is
the insulator thickness, and ε is the permittivity of the insulating material between
the plates. The parallel-plate approximation, however, ignores fringing fields. The effect of
fringing fields is to increase the effective area of the plates. Consequently, poly and metal lines
will actually have a higher capacitance (up to twice as large) than that predicted by the model.
Interlayer capacitance such as metal-poly capacitance is also enhanced by fringing. As line
widths are scaled, the widths (w) and heights of wires tend to reduce less than their separations
(l). Accordingly, this fringing effect increases in importance. For current processes, a factor
of 1.5 to 3 should be used. Another factor that should be taken into account for small
geometries when using the parallel-plate model is that a drawn shape (on the mask) will not be
the same as the actual physical shape produced on silicon.
Figure 4.4: Representation of a long wire by RC sections (node voltages Vj−1, Vj, Vj+1;
currents Ij−1, Ij).
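The parallel-plate estimate with a fringing correction can be sketched in Python as follows. The
wire geometry, the oxide thickness, and the use of SiO2 as the insulator are assumptions chosen
for illustration; the fringing factor is simply a multiplier applied to the parallel-plate value.

```python
EPS_0 = 8.854e-12   # vacuum permittivity in F/m
EPS_R_SIO2 = 3.9    # relative permittivity of SiO2 (assumed insulator)


def wire_capacitance(width_um, length_um, t_ox_um, fringe_factor=1.0):
    """Parallel-plate capacitance C = eps*A/t, multiplied by a fringing factor."""
    area_m2 = (width_um * 1e-6) * (length_um * 1e-6)
    c_plate = EPS_0 * EPS_R_SIO2 * area_m2 / (t_ox_um * 1e-6)
    return fringe_factor * c_plate


# Assumed example: 2 um wide, 1 mm long line over 1 um of oxide.
for factor in (1.0, 1.5, 3.0):
    c = wire_capacitance(2.0, 1000.0, 1.0, fringe_factor=factor)
    print(f"fringing factor {factor:.1f}: C = {c * 1e15:.1f} fF")
```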
The propagation of a signal along a wire depends on many factors, including the distributed
resistance and capacitance of the wire, the impedance of the driving source, and the load
impedance. For very long wires, propagation delays caused by the distributed resistance-capacitance
(RC) of the wiring layer tend to dominate. This transmission-line effect is particularly
severe in poly wires because of the relatively high resistance of this layer. A long wire can be
represented in terms of several RC sections, as shown in Fig. 4.4.
The response at node Vj with respect to time is then given by
$$C\,\frac{dV_j}{dt} = (I_{j-1} - I_j) = \frac{(V_{j-1} - V_j)}{R} - \frac{(V_j - V_{j+1})}{R}$$
As the number of sections in the network becomes large (and the sections become small), the
above expression reduces to the differential form:
$$rc\,\frac{dV}{dt} = \frac{d^2 V}{dx^2}$$
where x is the distance from the input, r is the resistance per unit length, and c is the
capacitance per unit length. The propagation delay over a wire of length l is then approximately
$$\tau = \frac{r c\, l^2}{2}$$
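The limit from the lumped RC sections to the l² dependence can be checked numerically. The sketch
below uses an Elmore-style sum (each section capacitance is charged through all resistors between
it and the input), which is a named substitute for the full differential equation, and compares it
with rc·l²/2. The per-unit-length values of r and c are assumptions for illustration.

```python
def ladder_delay(r, c, length, sections):
    """Elmore-style delay of a wire split into N equal lumped RC sections."""
    r_sec = r * length / sections
    c_sec = c * length / sections
    # The k-th capacitor (k = 1..N) is charged through k series resistors.
    return sum(k * r_sec * c_sec for k in range(1, sections + 1))


def distributed_delay(r, c, length):
    """Distributed-line estimate tau = r*c*l**2 / 2."""
    return r * c * length ** 2 / 2.0


R_PER_UM = 20.0     # assumed resistance per unit length in ohm/um
C_PER_UM = 4e-16    # assumed capacitance per unit length in F/um
LENGTH_UM = 2000.0

for n in (1, 10, 100, 1000):
    ratio = ladder_delay(R_PER_UM, C_PER_UM, LENGTH_UM, n) / distributed_delay(
        R_PER_UM, C_PER_UM, LENGTH_UM
    )
    print(f"{n:5d} sections: lumped / distributed = {ratio:.4f}")
```

The ratio equals 1 + 1/N, so the lumped estimate approaches rc·l²/2 as the number of sections grows.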
Figure 4.5: Segmentation of a long line into 1 mm sections with an inserted buffer
(labels: Input, Buffer, Output, τbuf).
Figure 4.6: Model for the distributed RC delay including driver and receiver loading
(labels: Rs, Rt, Ct, Cl, τ, V).
The l² term in the equation above shows that signal delay will be totally dominated by this
RC effect for very long signal paths. In order to optimize speed in a long poly line, one possible
strategy is to segment the line into several sections and insert buffers between these sections, as
shown in Fig. 4.5.
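A minimal sketch of the benefit of segmentation, assuming the delay of each segment follows
rc·l²/2 and each inserted buffer adds a fixed delay t_buf. The numerical values of r, c, and
t_buf are assumptions for illustration only.

```python
def unbuffered_delay(r, c, length):
    """RC delay of a single wire of the given length: r*c*l**2 / 2."""
    return r * c * length ** 2 / 2.0


def buffered_delay(r, c, length, segments, t_buf):
    """Delay when the wire is cut into equal segments with a buffer between them."""
    seg_len = length / segments
    return segments * unbuffered_delay(r, c, seg_len) + (segments - 1) * t_buf


R_PER_UM = 20.0    # assumed poly resistance in ohm/um
C_PER_UM = 4e-16   # assumed capacitance in F/um
T_BUF = 1e-9       # assumed buffer delay in seconds

plain = unbuffered_delay(R_PER_UM, C_PER_UM, 2000.0)
split = buffered_delay(R_PER_UM, C_PER_UM, 2000.0, segments=2, t_buf=T_BUF)
print(f"2 mm line, no buffer : {plain * 1e9:.1f} ns")
print(f"2 x 1 mm with buffer : {split * 1e9:.1f} ns")
```

Halving the segment length quarters each segment's RC delay, so the buffered line is faster as
long as the buffer delay stays below the delay saved.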
A model for the distributed RC delay, which takes driver and receiver loading into account,
is shown in Fig. 4.6. Rs is the output resistance of the driver. Cl is the receiver input
capacitance. Rt and Ct are the total lumped resistance and capacitance of the line. τ is the
RC delay calculated using the equation τ = rc·l²/2. The concept of using RC time constants for
delay estimation is based upon the assumption that the time taken for a signal to reach 63%
of its final value approximates the switching point of an inverter.
Wire length design guide
For the purpose of timing analysis, an electrical node may be defined as that region of connected
paths in which the delay associated with signal propagation is small in comparison with
gate delays. For sufficiently small wire lengths, RC delays can be ignored. Wires can then be
treated as one electrical node and modeled as simple capacitive loads. It is therefore useful to
define simple electrical rules that can be used as a guide in determining the maximum length
of communication paths for the various interconnect levels. To do this we require that the wire
delay τw and the gate delay τg satisfy the following condition:
$$\tau_w \ll \tau_g$$
To fulfil this condition, the length l of the wire must satisfy
$$l \ll \sqrt{\frac{2\tau_g}{rc}}$$
This establishes an upper bound on the allowable length of the interconnects for which the above
approximations are valid.
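The bound can be turned into a small design-guide calculation. In the sketch below the safety
margin (wire delay at most one tenth of the gate delay) and the values of τg, r, and c are
assumptions; only the relation l ≪ sqrt(2τg/(rc)) comes from the text.

```python
import math


def wire_length_bound(tau_g, r, c):
    """Length at which the wire delay r*c*l**2/2 equals the gate delay tau_g."""
    return math.sqrt(2.0 * tau_g / (r * c))


def max_wire_length(tau_g, r, c, margin=10.0):
    """Length at which the wire delay is tau_g / margin (assumed safety margin)."""
    return wire_length_bound(tau_g, r, c) / math.sqrt(margin)


TAU_G = 1e-9       # assumed gate delay in seconds
R_PER_UM = 20.0    # assumed resistance in ohm/um
C_PER_UM = 4e-16   # assumed capacitance in F/um

print(f"bound      : {wire_length_bound(TAU_G, R_PER_UM, C_PER_UM):.0f} um")
print(f"with margin: {max_wire_length(TAU_G, R_PER_UM, C_PER_UM):.0f} um")
```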
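CMOS Gate Transistor Sizing
To have the same rise and fall times for an inverter, we must make
Wp = 2Wn
where Wp is the channel width of the p-device and Wn is the channel width of the n-device.
This, of course, increases layout area and dynamic power dissipation. In some cascaded structures
it is possible to use minimum-size devices without compromising the switching response.
This is illustrated in the following analysis, in which the delay response for an inverter pair
(Fig. 4.7a) with Wp = 2Wn is given by
$$t_{inv.pair} \propto t_{fall} + t_{rise} = R\,(3C_{eq}) + 2\left(\frac{R}{2}\right)(3C_{eq}) = 6RC_{eq}$$
where R is the effective resistance of a minimum-size n-device and Ceq is the gate capacitance of
a minimum-size device. With Wp = Wn (Fig. 4.7b), the corresponding response is
$$t_{inv.pair} \propto R\,(2C_{eq}) + 2R\,(2C_{eq}) = 6RC_{eq}$$
Thus we find similar responses are obtained for the two different conditions.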
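Figure 4.7: Delay models for an inverter pair: (a) Wp = 2Wn (resistance R, load 3Ceq);
(b) Wp = Wn (resistance 2R, load 2Ceq).
Power Dissipation
There are two components that establish the amount of power dissipated in a CMOS circuit.
These are static dissipation, due to leakage current drawn continuously from the power supply,
and dynamic dissipation, due to switching transient currents and the charging and discharging
of load capacitances.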
Considering a complementary CMOS gate, if the input=‘0’, the associated n-device is ‘OFF’
and the p-device is ‘ON’. The output voltage is VDD or logic ‘1’. When the input=‘1’, the
associated n-channel is biased ‘ON’ and the p-channel device is ‘OFF’. The output voltage is
0V (VSS ). Note that one of the transistors is always ‘OFF’ when the gate is in either of these
logic states. Since no current flows into the gate terminal and there is no D.C. current path
from VDD to VSS, the resultant quiescent current, and hence the static power Ps, is zero.
However, there is some small static dissipation due to reverse bias leakage between diffusion
regions and the substrate. The source-drain diffusion and the p-well diffusion form parasitic
diodes. Since the diodes are reverse biased, only their leakage current contributes to static
power dissipation. The leakage current is described by the diode equation
$$i_0 = i_s\left(e^{qV/kT} - 1\right)$$
where is is the reverse saturation current, V is the diode voltage, q is the electronic charge,
k is Boltzmann's constant, and T is the temperature.
The static power dissipation is the product of the device leakage current and the supply
voltage. A useful estimate is to allow a leakage current of 0.1 nA to 0.5 nA per gate at room
temperature. The total static power dissipation Ps is then obtained from
$$P_s = \sum_{1}^{n} \text{leakage current} \times \text{supply voltage}$$
where n is the number of devices. For example, the typical static power dissipation due to
leakage for an inverter operating at 5 V is between 1 and 2 nW (nanowatts).
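The sum above is straightforward to evaluate. The following sketch applies the 0.1 nA to 0.5 nA
per-gate estimate from the text to a hypothetical block of 10,000 gates at 5 V; the gate count is
an assumption for illustration.

```python
def static_power(leakage_currents_a, v_supply):
    """Ps = (sum of leakage currents) * supply voltage."""
    return sum(leakage_currents_a) * v_supply


N_GATES = 10_000   # assumed number of gates
V_DD = 5.0         # supply voltage in volts

low = static_power([0.1e-9] * N_GATES, V_DD)
high = static_power([0.5e-9] * N_GATES, V_DD)
print(f"static power between {low * 1e6:.1f} uW and {high * 1e6:.1f} uW")
```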
During transition from either ‘0’ to ‘1’ or, alternatively, from ‘1’ to ‘0’, both n- and p-transistors
are on for a short period of time. This results in a short current pulse from VDD to VSS .
Current is also required to charge and discharge the output capacitive load. This latter term
is generally the dominant term. The current pulse from VDD to VSS results in a "short-circuit"
dissipation which is dependent on the load capacitance and the gate design. This is of relevance
to I/O buffer design.
The dynamic dissipation can be modeled by assuming that the rise and fall times of the step input
are much less than the repetition period. The average dynamic power, Pd, dissipated during
switching for a square-wave input Vin having a repetition frequency of fp = 1/tp, as shown
in Fig. 4.8, is given by
$$P_d = \frac{1}{t_p}\int_{0}^{t_p/2} i_n(t)\,V_o\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\,(V_{DD} - V_o)\,dt$$
where in is the transient current of the n-device and ip is the transient current of the p-device.
For a step input with in(t) = CL dVo/dt (CL = load capacitance),
$$P_d = \frac{C_L}{t_p}\int_{0}^{V_{DD}} V_o\,dV_o + \frac{C_L}{t_p}\int_{V_{DD}}^{0} (V_{DD} - V_o)\,d(V_{DD} - V_o) = \frac{C_L V_{DD}^2}{t_p}$$
(the limits of the second integral refer to Vo), with fp = 1/tp, resulting in
$$P_d = C_L V_{DD}^2 f_p$$
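The result can be checked numerically. The sketch below evaluates the closed form and, separately,
integrates CL·Vo·dVo over one charge and one discharge event with a midpoint rule. The load
capacitance, supply voltage, and frequency are assumed example values.

```python
def dynamic_power(c_load, v_dd, f_p):
    """Closed form Pd = CL * VDD**2 * fp."""
    return c_load * v_dd ** 2 * f_p


def energy_per_cycle(c_load, v_dd, steps=1000):
    """Midpoint-rule integral of CL*Vo dVo from 0 to VDD, counted for both half-cycles."""
    dv = v_dd / steps
    half_cycle = sum(c_load * (i + 0.5) * dv * dv for i in range(steps))
    return 2.0 * half_cycle


C_L = 50e-15   # assumed load capacitance (50 fF)
V_DD = 5.0     # assumed supply voltage in volts
F_P = 20e6     # assumed switching frequency (20 MHz)

print(f"closed form : {dynamic_power(C_L, V_DD, F_P) * 1e6:.2f} uW")
print(f"integration : {energy_per_cycle(C_L, V_DD) * F_P * 1e6:.2f} uW")
```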
Figure 4.8: Waveforms for the dynamic power calculation: input Vin (period tp, fall and rise times
tf and tr), output Vo, and drain current Id (labels Idn, Ipn).
Thus for the repetitive step input the average power that is dissipated is proportional to the
energy required to charge and discharge the circuit capacitance. The important factor to be
noted here is that the last equation shows power to be proportional to switching frequency
but independent of the device parameters.
The power delay product (PDP) is used to characterize the overall performance of a digital
gate circuit. It is given by
$$PDP = P_{av}\, t_p$$
where Pav is the average power dissipated by the gate and tp is the average propagation delay
time. Typically, MOS-based digital gates display power-delay products on the order of a few
picojoules (pJ). The PDP is commonly used to compare the performance of various logic
families or processing technologies. A small PDP is desirable, as this implies both low power
consumption and fast switching speeds.
As a first step towards understanding the meaning of the PDP, suppose that an ideal square
wave Vin(t) (Fig. 4.9a) is applied to the resistively loaded nMOS inverter shown in Fig. 4.10a; the
output voltage Vout(t) then assumes the form drawn in Fig. 4.10b. The average propagation
delay is
$$t_p \approx \frac{1}{2}(R_{on} + R_L)\,C_{out}$$
using the approximations
$$t_{PHL} \approx \tau_D = R_{on} C_{out}, \qquad t_{PLH} \approx \tau_L = R_L C_{out}$$
where Ron is the on-resistance of the driver; note that Ron = RDS.
Figure 4.9: Input waveform Vin(t) for the PDP analysis (axis labels VOH, V1/2, VOL; time marks
0, T/2, T).
The average power dissipated by the circuit is given by
$$P_{av} = I_{av} V_{DD}$$
Iav is the average power supply current and is separated into two contributions: the constant
(DC) current flow when the output is stable with Vout = VOL, and the transient current that
flows during the rise and fall times. Using Ohm's law, the average DC power dissipation
during the period T is
$$P_{av} = \frac{V_{DD}^2}{2(R_{on} + R_L)}$$
The PDP that results from the constant DC current flow only is given by
$$(PDP)_{DC} \approx \frac{1}{4} C_{out} V_{DD}^2$$
Figure 4.10: Resistively loaded nMOS inverter: (a) circuit; (b) output voltage; (c) resistor
analogy for Vout = VOL.
The total power-delay product for the circuit must also account for the average power consumed
by the gate during the rise and fall time intervals. Consider first the charging current
supplied by VDD during the rise time tLH. Since the driver is in cutoff, this can be estimated
by
$$I_{av} \approx C_{out}\frac{\Delta V}{\Delta t} = C_{out}\frac{V_l}{t_{LH}}$$
with Vl = VDD being the logic swing. The resulting PDP contribution due to this current is
then
$$(PDP)_{LH} \approx C_{out} V_{DD}^2\,\frac{t_p}{t_{LH}}$$
The power supply current used by the inverter during the discharge time tHL is approximated
by
$$I_{av} \approx \frac{1}{2}(I_{initial} + I_{final}) = \frac{1}{2}\left[\frac{(V_{DD} - V_{OH})}{R_L} + \frac{(V_{DD} - V_{OL})}{R_L}\right]$$
Iinitial and Ifinal give the current at the beginning and end of the discharging event. Thus,
$$(PDP)_{HL} \approx C_{out} V_{DD}^2\,\frac{R_{on}}{R_L}\,\frac{t_p}{t_{HL}}$$
The complete expression for the PDP is obtained by summing all contributions:
$$PDP \approx C_{out} V_{DD}^2\left[\frac{1}{4} + \frac{t_p}{t_{LH}} + \frac{R_{on}}{R_L}\,\frac{t_p}{t_{HL}}\right]$$
This can be simplified by noting that Ron ≪ RL will be valid in a well-designed inverter. The
propagation delay time is then tp ≈ (τL/2). Using this in conjunction with the approximations
tLH ≈ 2τL and tHL ≈ 2τD gives
$$PDP \approx \frac{3}{4} C_{out} V_{DD}^2$$
as the lowest-order approximation for the total PDP.
Figure 4.11: (a) Input voltage waveform Vin(t); (b) supply current I(t) with levels Imax and
Ileak; (c) supply current I(t) with peak value Ipeak.
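As a numerical cross-check of the summation, the sketch below evaluates the three contributions
with tp = (Ron + RL)Cout/2, tLH ≈ 2τL, and tHL ≈ 2τD. The component values are assumptions; for
Ron ≪ RL the bracket approaches 3/4.

```python
def pdp_resistive_nmos(c_out, v_dd, r_on, r_l):
    """Total PDP of the resistively loaded nMOS inverter from the three contributions."""
    tau_d = r_on * c_out                 # discharge time constant
    tau_l = r_l * c_out                  # charge time constant
    t_p = 0.5 * (r_on + r_l) * c_out     # average propagation delay
    t_lh = 2.0 * tau_l
    t_hl = 2.0 * tau_d
    bracket = 0.25 + t_p / t_lh + (r_on / r_l) * (t_p / t_hl)
    return c_out * v_dd ** 2 * bracket


C_OUT = 1e-13   # assumed output capacitance (0.1 pF)
V_DD = 5.0      # assumed supply voltage in volts

for r_on, r_l in ((1e4, 1e5), (1e3, 1e5), (1e2, 1e5)):
    ratio = pdp_resistive_nmos(C_OUT, V_DD, r_on, r_l) / (C_OUT * V_DD ** 2)
    print(f"Ron/RL = {r_on / r_l:.3f}: PDP / (Cout*VDD^2) = {ratio:.4f}")
```

With the full expression for tp the bracket evaluates to 3/4 + Ron/(2RL), which reduces to the
lowest-order result when Ron/RL is small.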
The power-delay product for the CMOS inverter is computed by using the current waveform
in Fig. 4.11c. Since current flows only during a switching event, the average power supply
current required during a single logic cycle T can be written as
$$I_{av} = \frac{1}{T}\left[I_{Dn,LH}\,t_{LH} + I_{Dn,HL}\,t_{HL}\right]$$
In this equation IDn,LH gives the average current during the rise time, while IDn,HL is the
average fall-time current. For a completely symmetric CMOS inverter, the two currents are
the same, so the power-delay product is given by
$$PDP_{CMOS} = I_{Dn,av}\,V_{DD}\,t_p\,\frac{f}{f_{max}}$$
4.4 Scaling
Very large-scale integration (VLSI) requires dense circuit layouts on silicon. The level of
integration depends on the smallest-size feature permitted by the fabrication processes. To
obtain the highest packing density, the size of the transistors must be made as small as possible.
This, however, changes the internal operating physics of the MOSFETs. Phenomena that are
negligible in “large” devices become limiting factors as device geometries are reduced.
This section discusses some of the important aspects involved in describing small MOSFETs.
The level is introductory, with emphasis on parameters that affect circuit design. The model
we use is a simple first-order constant field scaling.
First-order MOS scaling theory indicates that the characteristics of an MOS device can be
maintained and the basic operational characteristics preserved if the critical parameters of a
device are scaled in accordance with a given criterion. Such an approach has been shown to be very
effective in scaling from the range 5µm to 10µm minimum features to the range 1µm to 3µm
minimum feature size.
Although first-order scaling does not give optimized device performance at small dimensions,
the technique is very powerful in providing the necessary guidelines to identify the improve-
ments (or otherwise) that can be expected as processes are scaled.
Basically the scaled device is obtained by applying a dimensionless factor α to
• all device dimensions
• device voltages
• concentration densities
The resultant effect of the first-order scaling process is illustrated in Table 4.2. Table 4.2
shows that if device dimensions (which include channel length L, channel width W, oxide
thickness Tox, and junction depth Xj), applied voltages, and substrate concentration density N
are scaled by the constant parameter α, then the depletion thickness d, the threshold voltage
Vt, and drain-to-source current Ids are also scaled. One of the important factors to be noted
is that since the voltage is scaled, the electric field E in the device remains constant. This has
the desirable effect that many nonlinear factors essentially remain unaffected. A further point
is that the reduction in oxide thickness requires the fabrication process to provide thinner
oxides with a yield comparable to that of conventional oxide thicknesses.

Table 4.2: Resultant effect of first-order scaling.

                      PARAMETER                              SCALING FACTOR
DEVICE PARAMETER      Length; L                              1/α
                      Width; W                               1/α
                      Gate oxide thickness; tox              1/α
                      Junction depth; Xj                     1/α
                      Substrate doping; Na or Nd             α
                      Supply voltage; VDD                    1/α
RESULTANT INFLUENCE   Electric field across gate oxide; E    1
                      Depletion layer thickness; d           1/α
                      Parasitic capacitance; WL/tox          1/α
                      Gate delay; (VC/I)                     1/α
                      DC power dissipation; Ps               1/α²
                      Dynamic power dissipation; Pd          1/α²
                      Power speed product                    1/α³
                      Gate area                              1/α²
                      Power density; (VI/A)                  1
                      Current density; (I/A)                 α
                      Transconductance; gm                   1
The depletion regions associated with the pn junctions of the source and drain determine how
small we can make the channel. As a rule, the source-drain distance must be greater than the
sum of the widths of the depletion layers to ensure that the gate is able to exercise control over
the conductance of the channel. Thus in order to reduce the length of the channel one needs to
reduce the width of the depletion layers. This is accomplished by increasing the doping level
of the substrate silicon. As we scale device dimensions by 1/α, the drain-to-source current Ids
per transistor reduces by a factor of α, while the number of transistors per unit area (that is,
the circuit density) scales up by α². As a result, the current density scales linearly with α.
Thus wider metal conductors will be necessary for densely packed structures.
A second characteristic illustrated in Table 4.2 is power density. Both the static power dissi-
pation Ps and frequency dependent dissipation Pd decrease by 1/α2 as the result of scaling.
However, since the number of devices per unit area increases by α2 , the resultant effect is that
the power density remains constant.
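The entries of Table 4.2 follow from one another, which the sketch below makes explicit. It is an
illustration under the first-order assumptions stated in the text (dimensions, voltage, and
current all scale by 1/α); the dictionary keys are names chosen for this example.

```python
def constant_field_scaling(alpha):
    """Derive first-order scaling factors from the basic 1/alpha assumptions."""
    f = {}
    f["dimension"] = 1.0 / alpha      # L, W, tox, Xj
    f["voltage"] = 1.0 / alpha        # VDD
    f["current"] = 1.0 / alpha        # Ids per transistor
    f["capacitance"] = f["dimension"] * f["dimension"] / f["dimension"]    # W*L/tox
    f["gate_delay"] = f["voltage"] * f["capacitance"] / f["current"]       # V*C/I
    f["power_per_gate"] = f["voltage"] * f["current"]
    f["gate_area"] = f["dimension"] ** 2
    f["power_density"] = f["power_per_gate"] / f["gate_area"]              # V*I/A
    f["current_density"] = f["current"] / f["gate_area"]                   # I/A
    f["power_delay_product"] = f["power_per_gate"] * f["gate_delay"]
    return f


for name, value in constant_field_scaling(2.0).items():
    print(f"{name:20s} x {value:g}")
```

For α = 2 the per-gate power drops to one quarter while the gate area also drops to one quarter,
so the power density is unchanged, in agreement with the table.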
An estimate of the limit in power density is derived from the thermodynamic relationship
given by
$$T_j = T_{amb} + \theta_{jA}\,P$$
where Tj is the junction temperature, Tamb is the ambient temperature, θjA is the thermal
resistance between junction and ambient, and P is the power dissipation.
Generally the thermal resistance is expressed in °C per watt: a thermal resistance of ∆ °C/W means
that one watt of dissipated power raises the temperature by ∆ °C.
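A short sketch of how this relationship bounds the allowable power. The thermal resistance of
40 °C/W, the ambient temperature, and the 125 °C junction limit are assumed example values, not
figures from the text.

```python
def junction_temperature(t_amb_c, theta_ja_c_per_w, power_w):
    """Tj = Tamb + theta_jA * P (temperatures in deg C, power in W)."""
    return t_amb_c + theta_ja_c_per_w * power_w


def max_power(t_j_max_c, t_amb_c, theta_ja_c_per_w):
    """Largest power that keeps the junction at or below t_j_max_c."""
    return (t_j_max_c - t_amb_c) / theta_ja_c_per_w


THETA_JA = 40.0   # assumed thermal resistance in deg C per watt
T_AMB = 25.0      # assumed ambient temperature in deg C

print(f"Tj at 1.5 W       : {junction_temperature(T_AMB, THETA_JA, 1.5):.1f} C")
print(f"max power at 125 C: {max_power(125.0, T_AMB, THETA_JA):.2f} W")
```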
As the temperature increases, the carrier mobility falls, thus reducing the gain of devices.
This, in turn, would reduce the speed of circuits. If high temperature, high speed circuits are
required, then special consideration during design is necessary.
One of the limitations of first-order scaling is that it gives the wrong impression of being
able to scale proportionally to zero dimension, or to zero threshold voltages. In reality, both
theoretical and practical considerations do not permit such behavior. This is highlighted
when the surface concentrations become larger than 1 × 10^19 cm^−3, above which the gate oxide
breaks down before surface inversion can take place for the formation of the channel.
Scaling also affects the interconnect. The resistance of a conductor of length L, width W, and
thickness t is
$$R = \frac{\rho L}{t\,W}$$
where ρ is the resistivity and t is the conductor thickness. When all three dimensions are scaled
by 1/α, the scaled line resistance becomes R' = αR. The voltage drop along such a
line can now be expressed as
$$V_d' = (I/\alpha)(\alpha R) = IR$$
which is a constant. However, for constant chip size, the lengths of some of the signal paths
that traverse the chip, as a rule, do not scale down. This gives the principal result that
voltage drops along communication paths are larger by a factor of α with respect to the scaled
voltages. In a similar manner, we can derive the line response time as
$$\tau_s' = (\alpha R)(C/\alpha) = RC$$
which is a constant. However, as before, for a constant chip size many of the communication
paths do not scale. Thus the line response time, normalized to the scaled line response, is larger
by a factor of α. The significance of this result is that it is somewhat difficult to take full
advantage of the higher switching speeds inherent in scaled devices when signals are required to
propagate over long paths. Thus the distribution and organization of clocking signals becomes
a major problem as geometries are scaled.
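The two constants derived above can be reproduced with a few lines of Python. The sketch scales
the length, width, and thickness of a line by 1/α and reports the resistance ratio, voltage drop,
and response time; the material and geometry values are assumptions for illustration.

```python
def line_resistance(rho, length, width, thickness):
    """R = rho * L / (t * W)."""
    return rho * length / (thickness * width)


def scaled_line(alpha, rho, length, width, thickness, current, capacitance):
    """Effect of scaling all line dimensions, the current, and C by 1/alpha."""
    r = line_resistance(rho, length, width, thickness)
    r_s = line_resistance(rho, length / alpha, width / alpha, thickness / alpha)
    return {
        "resistance_ratio": r_s / r,                      # expected: alpha
        "voltage_drop": (current / alpha) * r_s,          # expected: I * R
        "response_time": r_s * (capacitance / alpha),     # expected: R * C
    }


# Assumed example: 1 mm x 2 um x 1 um line, 1 mA, 0.2 pF, rho = 3e-8 ohm*m.
result = scaled_line(2.0, 3e-8, 1e-3, 2e-6, 1e-6, 1e-3, 2e-13)
for name, value in result.items():
    print(f"{name:17s} = {value:.4g}")
```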
The influence of scaling on interconnection paths is summarized in Table 4.3. As seen from
Table 4.3, metal lines must carry a higher current with respect to cross-sectional area; thus
electromigration becomes a major factor to consider. The second problem relates to an
increase in the capacitance of wiring. As the level of integration increases, the average line
length on a chip tends to increase also. However, the power dissipation per gate decreases,
which diminishes the ability of gates to drive wiring capacitances. Under such conditions, the
average gate delay is determined by the interconnection rather than the gate itself.
Many of these limitations are being overcome by scaling lateral dimensions while keeping
vertical dimensions approximately constant.
Power and Clock Distribution
One of the most important issues in chip planning is the routing of power. In technologies in
which there is only one level of metal, VDD and ground are routed in interdigitated trees. This
is illustrated in Fig. 4.12. Crossunders are very difficult. When necessary, these are done in
low-resistance interconnect (poly over buried contact over active area) with a multiplicity of
contact cuts. Consider the extreme case of a crossunder that must carry 100 mA. One square of
low-resistance interconnect might have a maximum resistance of, say, 10 Ω/□. Thus a square
crossunder would drop 1 volt. Over fifty 2 µm contact cuts to the metal on each side would
be needed because of metal migration limits. Obviously, 100 mA is an awful lot of current to
squeeze through a crossunder. Even 10 mA can be difficult, and 10 mA corresponds to only
about twenty nMOS inverters.
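The arithmetic of this example can be written out as a small sketch. The function and its
geometry arguments are illustrative; the 100 mA, 10 Ω per square, fifty contacts, and twenty
inverters are the figures quoted above.

```python
def crossunder_drop(current_a, sheet_res_ohm_per_sq, length_um, width_um):
    """IR drop across a crossunder: I * R_sheet * (number of squares)."""
    squares = length_um / width_um
    return current_a * sheet_res_ohm_per_sq * squares


drop = crossunder_drop(0.1, 10.0, length_um=10.0, width_um=10.0)  # one square
per_contact_ma = 100.0 / 50     # current per contact cut with fifty cuts
per_inverter_ma = 10.0 / 20     # current per inverter implied by the text

print(f"drop across one square at 100 mA: {drop:.1f} V")
print(f"current per contact cut         : {per_contact_ma:.1f} mA")
print(f"current per nMOS inverter       : {per_inverter_ma:.1f} mA")
```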
Power is usually distributed locally in diffusion since it must get to the sources and drains
anyway. For low-power gates, this local power distribution is not too bad, but for high
performance devices, great care must be taken. When two levels of metal are available the
general power distribution is much easier, though by no means trivial.
Clearly, one of the worst scenarios for power supply noise is when large segments of the chip
transition simultaneously. One strategy, therefore, is to distribute power in such a way that
parts of the chip that are likely to transition all at once are routed separately. If power is
distributed across these simultaneously switching segments, we would expect large surges on
the power lines, but if power is distributed along the signal lines, then surge currents should
be much smaller.
Figure 4.12: Interdigitated VDD and VSS power routing (labels: Vdd, Vss, output pads).
A major problem of high-performance chips is bringing power onto the chip. Bonding wires
can have anywhere from 0.25 to 2 nH of inductance (about 0.5 to 1 nH/mm). VDD and ground
are often double-bonded (two wires to the bonding pad), but while this lowers the inductance
somewhat, it does not give the expected factor of two unless the wires are kept far apart. This
is because there is mutual coupling between the wires. Separate power pins might be used for
the output drivers, since these drivers cause huge switching transients and can tolerate more
power supply noise than the internal circuitry.
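The effect of mutual coupling on a double bond can be illustrated with the standard result for two
equal inductors in parallel carrying current in the same direction, L_eff = L(1 + k)/2, where k
is the coupling coefficient. The 1 nH wire inductance and the values of k are assumptions.

```python
def double_bond_inductance(l_wire, k):
    """Effective inductance of two equal, parallel, mutually coupled bond wires."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("coupling coefficient k must lie in [0, 1]")
    return l_wire * (1.0 + k) / 2.0


L_WIRE = 1e-9  # assumed inductance of a single bond wire (1 nH)

for k in (0.0, 0.3, 0.6):
    l_eff = double_bond_inductance(L_WIRE, k)
    print(f"k = {k:.1f}: L_eff = {l_eff * 1e9:.2f} nH "
          f"(reduction factor {L_WIRE / l_eff:.2f})")
```

Only for k = 0, that is, widely separated wires, does the inductance drop by the full factor of two.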
Synchronizing machine operations and data transfers with clock pulses provides us with a
structured framework for dealing with the complexities of large system designs. Clocking
is a global control technique which provides the “glue” for system operation. It is equally
important at the circuit level, particularly in a dynamic logic stage.
System level timing can be described using circular timing charts. Consider an ideal pseudo
2-phase scheme with mutually-exclusive pulses φ1 and φ2 :
φ1 (t) · φ2 (t) = 0
System timing can be described by constructing the chart shown in Fig. 4.13. Time increases
in a counter-clockwise direction with one full rotation corresponding to the clock period T .
Segments are labeled according to time intervals when a clock signal is high. In this example,
φ1 = 1 during the first half-period, while φ2 = 1 during the last half-period.
A more realistic clocking arrangement is depicted by the clocking circle in Fig. 4.14. If both
clocks have 50% duty cycles, normal operation gives
φ1 (t) · φ2 (t) = 0
except during the transition times. Mutually-exclusive clock signals provide timing intervals
for logical operations, and are used to allow for normal gate delay times. Overlapped segments
are avoided to prevent ill-defined movement of data, instructions, or control signals. Transition
times can be made small by proper clock generator design.
Clock skew is represented by rotating one of the clocks as shown in Fig. 4.15. The skew time
ts is defined as the time interval where
φ1 (t) · φ2 (t) = 1
and indicates the possibility of unwanted simultaneous bit transfers. This may lead to severe
conflict problems in the operation.
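The overlap condition φ1(t) · φ2(t) = 1 can be illustrated with a sampled model of the two clocks.
In this sketch time is measured in integer units, both clocks have 50% duty cycle, and skew is
modeled by shifting φ2 earlier; the period of 100 units is an assumption for illustration.

```python
def phase(t, period, rise, width):
    """Return 1 while the clock that rises at `rise` is high, else 0."""
    return 1 if (t - rise) % period < width else 0


def overlap_time(period, skew):
    """Total time per period during which phi1 and phi2 are both high."""
    total = 0
    for t in range(period):
        phi1 = phase(t, period, 0, period // 2)
        phi2 = phase(t, period, period // 2 - skew, period // 2)
        total += phi1 * phi2
    return total


PERIOD = 100  # assumed clock period in arbitrary integer time units

for skew in (0, 5, 20):
    print(f"skew = {skew:2d}: overlap = {overlap_time(PERIOD, skew)} units")
```

With zero skew the product is zero everywhere; any rotation of one clock produces an overlap
interval equal to the skew.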
A basic 2-phase clock generator circuit is designed to generate φ and φ̄ from a single input
CLK signal. This is often a matter of convenience to the user: requiring only a single external
clock makes the chip's usage more attractive to the board designer.
Various circuits have been developed for use in clock generation. Fig. 4.16 provides a CMOS
generator/driver which uses a transmission gate as a delay element. MOSFETs Mn1 and
Mp1 form an inverter which acts as the first driver for the chain. The upper branch of the
circuit consists of two cascaded inverters and generates the signal φ̄ = $\overline{CLK}$, while the
lower branch has only a single inverter and gives φ = CLK. Transmission gate TG is used as a delay
element to minimize clock skew between φ and φ̄. Since it is biased into active conduction,
we will model it using an equivalent resistance RTG, and introduce the time constant
$$t_D \simeq R_{TG} C_{in}$$
where Cin is the input capacitance of the inverter driven by the transmission gate. With tP
denoting the propagation delay of a single inverter, setting
$$t_D \simeq t_P$$
equalizes the delay between the upper and lower branches. Recalling that the transmission
gate conductance is proportional to the aspect ratios (W/L) of its transistors,
we see that clocking skew can be controlled by adjusting the size of the TG transistors.
Another straightforward approach uses an SR latch as shown in Fig. 4.17. The clocking signal
CLK is inverted, and CLK and $\overline{CLK}$ are used to drive the SR circuit. The 2-phase clock
signals φ and φ̄ are taken from the latch outputs. This logic can also be used to generate
pseudo 2-phase clocks φ1 and φ2 by redefining the outputs.
Once the clocking pulses are generated they must be distributed throughout the chip in a
manner which minimizes clock skew. Fig. 4.18 illustrates the problem in a pseudo 2-phase
circuit by showing timing circles at various points on a chip. Skew problems originate mostly
from
so that the driver circuits and associated distribution schemes are important in maintaining
the synchronous logic design. A related problem is that the drive capability of the circuit
must be able to handle large capacitive loads at the required clock frequency.
One approach to designing a clock distribution network is to use a cascaded chain of inverting
buffers that matches the clock generator to the distribution line. Also careful global planning
and structured distribution patterns can be used to solve the problem.
Clock distribution can also be accomplished by using a balanced tree network with multiple
fanouts as shown in Fig. 4.19. Identical drivers can be used within a given stage. Moreover,
the drive requirements of the output circuits are reduced from the single inverter design since
the fanout (FO) has been split into groups. Each inverter reshapes the clocking waveform, making
the performance less sensitive to variations in the interconnect routing.
Clock skew problems can be minimized by using symmetrical geometries for the clock distri-
bution lines. An example is the “H-tree” network shown in Fig. 4.20. Every clock distribution
point O is the same distance from the driver D, giving equal delay times. If the load capac-
itance is the same at every O-point, then the clocks will all be in phase with one another.
Other geometrical patterns can be used so long as the general design criteria are unchanged.
Input Protection Circuits
Input pads connect data, control, or clocking signals to on-chip logic gates. When the pads
are directly connected to the gate electrodes of MOSFETs, care must be taken to ensure
that excessive static electrical charge does not destroy the transistor. Protection circuits are
designed to drain excessive charge away from the MOS capacitance to avoid static burnout.
To understand the origin of the problem, recall that a MOSFET gate is basically a capacitor
of value
Cg = Cox W L
With a gate-substrate voltage VG applied to the transistor, the internal oxide electric field is
given by
$$E_{ox} \simeq \frac{V_G}{x_{ox}}$$
where we have ignored any trapped oxide or surface charge. Breakdown occurs because
silicon dioxide has a breakdown field value of approximately
$$E_{BD} \sim 7.5 \times 10^{6}\ \mathrm{V/cm}$$
If Eox exceeds this value, the oxide insulating properties break down and charge is transported
through the material. This usually results in destruction of the device. Since xox is usually
less than about 450 Å, the maximum gate voltage VG,max ≃ EBD · xox which can be applied
to the device is a relatively small number.
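Plugging the quoted numbers into VG,max ≃ EBD · xox gives a concrete figure. The sketch below uses
the breakdown field and the 450 Å oxide thickness from the text; the second, thinner oxide is an
assumed value added for comparison.

```python
E_BD_V_PER_CM = 7.5e6    # breakdown field of SiO2 quoted in the text
ANGSTROM_IN_CM = 1e-8    # 1 angstrom expressed in cm


def oxide_field(v_gate, x_ox_angstrom):
    """Eox = VG / xox in V/cm."""
    return v_gate / (x_ox_angstrom * ANGSTROM_IN_CM)


def max_gate_voltage(x_ox_angstrom):
    """VG,max = EBD * xox in volts."""
    return E_BD_V_PER_CM * x_ox_angstrom * ANGSTROM_IN_CM


for x_ox in (450.0, 200.0):   # 200 angstrom is an assumed comparison value
    print(f"xox = {x_ox:.0f} A: VG,max = {max_gate_voltage(x_ox):.2f} V")
```

A few tens of volts is far below the voltages that static discharge can deliver, which is why
protection circuits are needed.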
The basic idea of an input protection circuit is to allow for alternate charge flow paths when
the input voltage gets too large. Diode structures are very useful in this application since
their breakdown voltages can be controlled. Moreover, reverse breakdown in
a pn-junction is non-destructive, so that the protection circuit is reusable. Junctions which
are purposely used at the reverse-bias breakdown voltage are generally termed Zener diodes.
Fig. 4.21 illustrates a simple input protection circuit for a CMOS IC. Reverse-biased pn-junctions
are used as protection diodes, and a series-connected resistor is included to drop some of the
voltage. Both diode pairs (D1, D2) and (D3, D4) are designed to undergo breakdown for
positive or negative voltage surges. R is designed to reduce the voltage that reaches (D3, D4);
this effectively increases the level of protection to the transistor gate.
One problem that exists with this input protection circuit is the introduction of parasitic RC
time constants into the network.
Other input protection schemes are used. Fig. 4.22 shows a common circuit based on the
properties of a thick field oxide MOSFET. The transistor has a threshold voltage of VT,F >
VDD and is in cutoff during normal operation. A large input voltage V > VT,F drives the
transistor into conduction, providing a path to ground to drain off the excessive charge. The
breakdown voltage of the FOX MOSFET is large enough to withstand the high voltages since
XFOX is large.
Static Gate Sizing
An interesting and useful problem is that of optimizing a chain of static gates to minimize
the overall propagation delay. This problem arises in many different situations and is
important to high-performance circuits. In particular, it is relevant to the output drivers and
clocking circuits.
A classic example is shown in Fig. 4.23 where the objective is to design the fastest network
for driving a large capacitance. For the problem at hand, we will assume a series of inverting
buffers for the driving network. At first sight, it may appear that we would want the fewest
possible gates between the input and the load. This simple solution, however, ignores the effect
of capacitive loading on successive stages. Accounting for these factors shows that the sizing
of the transistors in the chain allows for minimization of the delay. This gives the interesting
result that additional logic gates are often inserted to reduce the overall propagation delay
between two points.
Consider the scaled inverter chain shown in Fig. 4.24. Each gate is characterized by a sizing
factor Sj which is normalized to the first stage such that S1 = 1, while Sj > 1 for (j > 1). By
definition, the first stage has a MOSFET conduction factor
$$\beta_1 = k'\left(\frac{W}{L}\right)_1$$
The values of Ci and Co are determined by gate 1, and scaled for successive gates. Note
that an additional capacitive component Cw has been added between stages. This represents
the wiring contribution. We assume that the wiring capacitance between two stages is
proportional to the sizing factor of the second stage. The capacitance between the j-th gate
and the (j + 1)-st gate can be summarized as follows:
$$C_j = S_j C_o + S_{j+1}(C_i + C_w)$$
Our calculation is to determine the values of Sj for (j = 2, ...) which minimizes the total delay
through the chain.
Suppose that there are N stages in the chain. The total time delay is given by
$$T_D = \sum_{j=1}^{N} \frac{R \left[ S_j C_o + S_{j+1} (C_i + C_w) \right]}{S_j}$$
To minimize $T_D$, we differentiate with respect to $S_j$ and look for zero-slope points via
$$\frac{\partial T_D}{\partial S_j} = 0.$$
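Carrying out the differentiation for each stage shows that
$$\frac{S_{j+1}}{S_j} = \frac{S_j}{S_{j-1}} = K$$
must be true, i.e. every stage is scaled by the same factor K relative to its predecessor.
The boundary conditions of the problem are
$$S_1 = 1, \qquad S_{N+1} = \frac{C_L}{C_i}$$
so that $K^N = C_L / C_i$ and
$$K = \left(\frac{C_L}{C_i}\right)^{1/N}$$
which is our final result. Explicitly, the scaling factors are given by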
$$S_1 = 1, \quad S_2 = K, \quad S_3 = K^2, \quad \ldots, \quad S_N = K^{N-1}$$
as the scaling required to optimize the chain. The minimum delay is then
$$T_{D,\min} = \sum_{j=1}^{N} R \left[ C_o + K (C_i + C_w) \right] = N R \left[ C_o + K (C_i + C_w) \right]$$
One important point obtained from the above analysis concerns the delay time.
The equation $K = S_{j+1} / S_j$ says physically that the minimum chain delay occurs when every
stage has the same individual time delay $t_D$.
The final question which must be answered is the number of stages N needed to optimize the
delay. To calculate this, we differentiate $T_D$ with respect to N and set the result to 0. This
gives the general equation
$$R C_o + R (C_i + C_w) \left(\frac{C_L}{C_i}\right)^{1/N} \left[ 1 - \frac{\ln (C_L / C_i)}{N} \right] = 0$$
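The stage-count condition is transcendental in N, so in practice it is often easier to evaluate the minimum-delay expression for a few integer values of N and pick the smallest. The following Python sketch does this by brute force instead of solving the equation above. The function names and all numerical values (R, Co, Ci, Cw, CL) are illustrative assumptions, not data from the text.

```python
def chain_delay(n, R, Co, Ci, Cw, CL):
    """Minimum delay of an n-stage chain: n*R*[Co + K*(Ci + Cw)], with K = (CL/Ci)**(1/n)."""
    K = (CL / Ci) ** (1.0 / n)
    return n * R * (Co + K * (Ci + Cw))


def best_stage_count(R, Co, Ci, Cw, CL, n_max=50):
    """Integer stage count in 1..n_max with the smallest delay (brute-force search)."""
    return min(range(1, n_max + 1), key=lambda n: chain_delay(n, R, Co, Ci, Cw, CL))


# Assumed example values: 1 kOhm stage resistance, 10 fF input and output
# capacitance for the unit inverter, no wiring capacitance, 10 pF load.
R, Co, Ci, Cw, CL = 1e3, 10e-15, 10e-15, 0.0, 10e-12

n = best_stage_count(R, Co, Ci, Cw, CL)
K = (CL / Ci) ** (1.0 / n)          # common scaling ratio between stages
sizes = [K ** j for j in range(n)]  # S_1 .. S_N = 1, K, K^2, ...

print(n, round(K, 3), [round(s, 1) for s in sizes])
```

With these assumed numbers the load ratio is 1000 and the search settles on five stages with a scaling ratio of about 4; a different ratio of Co to Ci or a nonzero Cw shifts the optimum.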
Off-Chip Driver Circuits
Off-chip driver circuits are critical to the overall chip design. Much effort is put into speeding
up internal switching networks. Careful output design ensures that the high-performance
specifications apply to the external characteristics as well. Some important problems which
must be addressed include
• Fast switching
The simplest off-chip driver circuit consists of an inverter chain which is designed to handle a
large capacitive load. $C_{out}$ includes contributions from the bonding pad, the package wiring,
and the circuit board trace. Since this easily amounts to tens or even a few hundred picofarads
depending on the interface specifications, the transistors must be relatively large.
Consider the 2-stage off-chip driver network shown in Fig. 4.25. We may use time constants
to obtain first-order design estimates for the sizes of the output transistors Mn2 and Mp2 by
writing
$$\left(\frac{W}{L}\right)_{n2} = \frac{C_{out}}{\tau_n k'_n (V_{DD} - V_{Tn})}$$
$$\left(\frac{W}{L}\right)_{p2} = \frac{C_{out}}{\tau_p k'_p (V_{DD} - |V_{Tp}|)}$$
where $\tau_n$ and $\tau_p$ are the high-to-low and low-to-high time constants, respectively. Since the
output capacitance seen by an off-chip driver can be large, the MOSFET aspect ratios are also
quite large. These are obtained using several parallel-connected transistors to aid in layout
and parasitic control. Sizing theory may be used to determine the sizes of the first stage
transistors Mn1 and Mp1.
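The actual values of the fall and rise times can be estimated from
$$t_{HL} = \tau_n \left[ \frac{2 V_{Tn}}{V_{DD} - V_{Tn}} + \ln \left( \frac{2 (V_{DD} - V_{Tn})}{V_o} - 1 \right) \right]$$
$$t_{LH} = \tau_p \left[ \frac{2 |V_{Tp}|}{V_{DD} - |V_{Tp}|} + \ln \left( \frac{2 (V_{DD} - |V_{Tp}|)}{V_o} - 1 \right) \right]$$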
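As a rough numerical illustration of the sizing and timing estimates above, the Python sketch below evaluates the aspect ratio of the output nMOS transistor and the resulting fall time. All values are assumptions chosen for the example: a 50 pF load, a 1 ns target time constant, a process transconductance of 40 µA/V², a 5 V supply, and a 0.8 V threshold. The reference voltage Vo is not defined in the text; it is taken here as 10% of VDD purely as an assumption.

```python
import math


def aspect_ratio(c_out, tau, k_prime, vdd, vt):
    """(W/L) = C_out / (tau * k' * (VDD - |VT|))."""
    return c_out / (tau * k_prime * (vdd - abs(vt)))


def switching_time(tau, vdd, vt, v_o):
    """t = tau * [2|VT|/(VDD - |VT|) + ln(2(VDD - |VT|)/Vo - 1)]."""
    vt = abs(vt)
    return tau * (2.0 * vt / (vdd - vt) + math.log(2.0 * (vdd - vt) / v_o - 1.0))


# Assumed example values (not from the text)
c_out = 50e-12      # load capacitance in farads
tau_n = 1e-9        # target high-to-low time constant in seconds
k_n = 40e-6         # process transconductance k'_n in A/V^2
vdd, vtn = 5.0, 0.8
v_o = 0.1 * vdd     # assumed reference level, see lead-in

wl_n2 = aspect_ratio(c_out, tau_n, k_n, vdd, vtn)
t_hl = switching_time(tau_n, vdd, vtn, v_o)

print(f"(W/L)_n2 = {wl_n2:.0f}, t_HL = {t_hl * 1e9:.2f} ns")
```

The aspect ratio of roughly 300 obtained with these numbers shows why the output devices are normally built from several transistors in parallel.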
Tri-state off-chip driver circuits are constructed by splitting the input signal to individually
control each output transistor. Normal operation gives high and low voltages, while the
high-impedance state is obtained by driving both the nMOS and pMOS devices into cutoff. An
inverting tri-state circuit is shown in Fig. 4.26. When the tri-state variable Z = 1, pMOSFETs
Mp1 and Mp2 are off, while nMOSFET Mn conducts. This gives normal circuit operation.
If Z = 0, then the gate voltages of the output transistors are given by
Vp = VDD
Vn = 0
so that both are in cutoff. A condition of Z = 0 thus provides the necessary high-impedance
state.
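The two operating modes can be captured in a small behavioural model. The Python sketch below is only an illustration: the function name, the 5 V supply, and the treatment of normal operation as an ideal inverter whose two output gates simply follow the input are assumptions and are not read off Fig. 4.26. Only the Z = 0 gate voltages (Vp = VDD, Vn = 0) come from the text.

```python
VDD = 5.0  # assumed supply voltage


def tristate_inverting(z, a):
    """Return (Vp, Vn, out) for control Z and logic input a.

    Vp and Vn are the gate voltages of the output pMOS and nMOS devices;
    out is 1, 0, or None for the high-impedance state.
    """
    if z == 0:
        # pMOS gate at VDD and nMOS gate at 0 V: both output devices are cut off
        return VDD, 0.0, None
    # Z = 1: assumed ideal inverter action, both gates follow the input level
    gate = VDD if a else 0.0
    return gate, gate, 0 if a else 1


for z in (0, 1):
    for a in (0, 1):
        print(z, a, tristate_inverting(z, a))
```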
Bi-directional input/output (I/O) circuits are also quite useful. An example is shown in
Fig. 4.27. The tri-state section of the circuit is a non-inverting buffer with an enable control
E, where E = 0 gives the High-Z state. Operation is straightforward and easily understood
by examining the circuit.
Processing Steps
Chapter 5
The fabrication of an integrated circuit consists of a series of steps carried out in a specific
order. These steps convert the circuit design into an operable silicon integrated circuit chip.
The way in which individual IC fabrication steps are carried out is of critical importance to
the outcome of the manufacturing process. The main objective is to minimize the departure
of geometrical features of the processed circuit from those determined during the design. To
achieve this, a high degree of control over the parameters of each processing step is required.
Equally rigid requirements apply to the physical and chemical properties of materials used for
IC fabrication as well as to the cleanliness of the production environment.
The basic raw material used in semiconductor plants is a wafer or disk of silicon, which varies
from 75 mm to 150 mm in diameter and is less than 1 mm thick. Wafers are cut from ingots of
single crystal silicon that have been pulled from a crucible melt of pure molten polycrystalline
silicon. Controlled amounts of impurities are added to the melt to provide the crystal with
the required electrical properties. The crystal orientation is determined by a seed crystal that
is dipped into the melt to initiate single crystal growth. The seed is then gradually withdrawn
vertically from the melt while simultaneously being rotated.
Slicing into wafers is usually carried out using internal cutting edge diamond blades.
A common approach to n-well CMOS fabrication has been to start with a moderately doped
p-type substrate (wafer), create the n-type well for the p-channel devices, and build the n-
channel transistors in the native p-substrate. The mask that is used in each process step is
shown in addition to a sample cross-section through an n-device and a p-device.
1. The first mask defines the n-well (or n-tub). p-channel transistors will be fabricated in
this well. Field oxide is etched away to allow a deep diffusion.
2. The next mask is called the “thin oxide” or “thinox” mask, as it defines where areas of
thin oxide are needed to implement transistor gates and allow implantation to form p-
or n-type diffusions for transistor source/drain regions. The field oxide areas are etched
to the silicon surface and then the thin oxide is grown on these areas. Other terms for
this mask include active area, island, and mesa.
3. Polysilicon gate definition is then completed. This involves covering the surface with
polysilicon and then etching the required pattern. In a self-aligned process, the poly
gate regions lead to aligned source-drain regions.
4. An n+ mask is then used to indicate those thin-oxide areas (and polysilicon) that are to
be implanted n+. Hence the thin-oxide area exposed by the n+ mask will become an n+
diffusion area. If the n+ area is in the p-substrate, then an n-channel transistor or n-type
wire may be constructed. If the n+ area is in the n-well, then an ohmic contact to the
n-well may be constructed. An ohmic contact is one which is only resistive in nature
and is not rectifying (as in the case of a diode). In other words, there is no junction and
current can flow in both directions in an ohmic contact. This type of mask is sometimes
called the select mask as it selects those transistor regions that are to be n-type.
5. The next step usually uses the complement of the n+ mask, although an extra mask is
normally not needed. The “absence” of an n+ region over a thin-oxide area indicates that
the area will be a p+ diffusion. A p+ diffusion in the n-well defines possible p-transistors
and wires. A p+ diffusion in the p-substrate allows an ohmic contact to be made.
Following this step, the surface of the chip is covered with a layer of SiO2.
6. Contact cuts are then defined. This involves etching any SiO2 down to the contacted
surface. These allow metal to contact diffusion regions or polysilicon regions.
8. As a final step, the wafer is passivated and openings to the bond pads are etched to
allow for wire bonding. Passivation protects the silicon surface against the ingress of
contaminants that can modify circuit behavior in deleterious ways.
Additional steps might include threshold adjust steps to set the threshold voltages of the n-
and p-devices.
In current fabrication processes the polysilicon is normally doped n+. The p+ doping phase
reduces the poly doping such that the polysilicon inside the p+ regions has a higher sheet
resistance than the polysilicon outside the p+ regions. The extent of this reduction may influence
the quality of metal-poly contacts within p+ regions.
Typical p-well fabrication steps are similar to those of an n-well process, except that a p-well is used.
The first masking step defines the p-well regions. This is followed by a low-dose boron implant
driven in by a high-temperature step for the formation of the p-well. The next steps are to
define the devices and other diffusions, to grow field oxide, and to form contact cuts and metallization. A
p-well mask is used to define the p-well regions, as opposed to an n-well mask in an n-well process.
A p+ mask may be used to define the p-channel transistors and VSS contacts. Alternatively,
we could use an n+ mask to define the n-channel transistors, as the masks usually are the
complement of each other.
Twin-tub CMOS technology provides the basis for separate optimization of the p-type and
n-type transistors, thus making it possible for threshold voltage, body effect, and the gain
associated with n- and p-devices to be independently optimized. Generally the starting
material is either an n+ or p+ substrate with a lightly doped epitaxial or epi layer, which is
used for protection against latch-up. The aim of epitaxy (which means “arranged upon”) is
to grow high purity silicon layers of controlled thickness with accurately determined dopant
concentrations distributed homogeneously throughout the layer. The electrical properties for
this layer are determined by the dopant and its concentration in the silicon.
The process sequence, which is similar to the p-well process apart from the tub formation
where both p-well and n-well are utilized, entails the following steps:
• tub formation
• metallization.
5.1.5 Isolation
LOCOS
The Local Oxidation of Silicon (LOCOS) achieves device isolation by selective oxide growth.
A typical LOCOS process starts by growing a thin stress relief thermal oxide (SiO2 ) layer on
the silicon surface. Next, silicon nitride (Si3 N4 ) is deposited and patterned, keeping nitride
in the areas where transistors will be built. The entire surface is then exposed to an oxidizing
ambient. Nitride does not oxidize, but any exposed silicon will react to form SiO2 . The
resulting LOCOS structure is illustrated in Fig. 5.12.
where XR is the depth of recession and XF OX is the thickness of the grown field oxide (FOX)
which separates device locations. In general, the patterned nitride regions are called active
areas, while the oxide growth defines the field regions between active transistor sections.
LOCOS is a widely used isolation technique in many processing lines. However, a major
limitation is the problem of active area encroachment which occurs during the FOX growth
process and reduces the usable size of the region. The problem is illustrated in Fig. 5.13. Even
though the nitride protects the silicon surface, oxygen diffuses through the sides of the
stress-relief oxide layer during the FOX growth. SiO2 is thus formed around the edges, lifting the
nitride upwards and forming a characteristic bird’s beak transition region between the active
area and the field oxide. Encroachment cannot be avoided and affects the integration density.
Trench Isolation
Trench isolation uses reactive ion etching (RIE) to form small trenches in the silicon. The
trenches are then filled with oxide and polysilicon to electrically isolate neighboring device
regions from one another. High integration levels are possible since the trench widths can be
reduced to the order of a few microns. Trench isolation is illustrated in Fig. 5.14. A field
implant may be used to increase the trench threshold voltage $V_{T,Tr}$. Small trench dimensions
make this approach particularly important for high-density integration.
The vertical trench regions may also be used to create large-value capacitors without con-
suming valuable surface real estate. An example geometry which uses doped poly and p+ as
capacitor plates is shown in Fig. 5.15.
Trench capacitors are commonly used in advanced dynamic RAM (DRAM) cell design since
they conserve surface real estate. Trench isolation has been developed to the point where it is a
viable production line technique. It almost eliminates the problem of active area encroachment
found in LOCOS and is useful when increasing the logic integration density.
5.1.6 Latchup
Bulk CMOS technologies are susceptible to latchup. This condition occurs when a parasitic
conducting path is established between VDD and ground, directing current away from the cir-
cuit. Once latchup occurs, it can only be stopped by removing the power supply and restarting
the circuit. In addition to halting the circuit operation, latchup may induce catastrophic fail-
ure from heating.
Fig. 5.16 shows the cross-section of an n-well CMOS substrate region where the latchup problem
originates. To understand the origin of the latchup problem, note that the voltage across
parasitic resistor Rw1 acts to forward bias the emitter-base junction of Q2. If VEB2 reaches
the turn-on voltage of about 0.7 volts, IC2 flows. This current flowing through Rs1 develops
a forward bias VBE1 across the base-emitter junction of Q1, causing IC1 to increase. The
transistor pair Q1 and Q2 are connected to form a positive feedback loop, so that the buildup
continues.
Latchup triggering may occur anytime the circuit voltages exceed normal levels. Causes in-
clude
• Voltage overshoot/undershoot
• Avalanche breakdown
• Punchthrough
• Parasitic MOSFETs
• Photocurrent
and others. Although careful circuit design may reduce the possibility of inducing latchup, it
is generally worthwhile to take extra precautions.
There are two main approaches to dealing with the latchup problem: (a) reduce the transistor
current gains, or (b) decouple the transistor feedback loop; it is common to use both in
practice. Deep trench isolation can also be used to reduce the possibility of latchup. Fig. 5.17
illustrates adjacent nMOS and pMOS transistors separated by deep trenches. Parasitic bipolar
transistors are not found in the structure since the isolating pn-junctions have been replaced
by an oxide barrier.
Latchup prevention is an important aspect of CMOS chip layout and design. One should
always check to ensure that all suggested rules have been followed to guard against the problem.
Design Rules
Design rules are sets of geometrical specifications which govern chip design for a given
fabrication process. The layout rules are statements of the geometrical limits placed on the mask
patterns and include items such as minimum widths, dimensions, and spacings. Violating the
design rules can lead to a geometry which cannot be replicated in the fabrication line, yielding
a non-functional circuit. Designers are often saved from simple mistakes by the omnipotent
design rule checker (DRC) used to find layout violations. Another important fact is that
parasitic circuit component values are a direct consequence of the layout geometry. Since the
layout is an integral part of the circuit design, it is important to examine how a design rule
set affects the overall performance.
and constitutes a metric for the surface dimensions. The most common approach is optical
lithography which uses an ultraviolet light source through a patterned mask to selectively
expose a light-sensitive photoresist layer. Alternate approaches include electron-beam and
X-ray sources; these offer finer resolution but introduce other problems. X-ray lithography
currently appears to be the likely winner in the next generation, but recent advances in e-beam
systems still look promising.
Regardless of the approach, the resolution is limited by diffraction effects which occur whenever
a wave passes by an opaque edge. This results in the minimum linewidth specification in the
design rule set and may be viewed as the smallest mask dimension which can be reliably
transferred to the chip surface. UV optical lithography has a minimum linewidth on the order
of 0.5 microns; e-beam systems can pattern down to one-tenth of a micron or less.
Diffraction also limits how small we can make the spacing between two lines; this consideration
gives a set of minimum spacing allowances in the design rule set. Minimum spacings also are
needed to account for misaligned masking steps, lateral spreading, and other problems which
occur during the many weeks it takes to fabricate a wafer. Yield enhancement plays an
important role in setting the final numbers.
Design rules are best illustrated by example. We consider a 1.5-micron n-well, single-poly,
double-metal process which uses 10 masks. The process flow description in Table 5.2 lists the
major steps in the fabrication and indicates each mask in proper sequence.
Geometrical layout rules specify minimum mask feature sizes. Rules are provided for each
masking layer, and also for spacings between different layers. The former originate from
lithographic constraints or physical considerations. Bloats and shrinks may be applied to
selected layers during the fabrication process, but the resulting physical overlay for the structure
is still represented by the layout drawing.
Table 5.1 provides a listing of design rules for a 2-micron CMOS process. These consist of
minimum widths or dimensions, minimum spacings between features on the same or other
layers, overlap distances, and other items of importance to the chip layout. Some examples of
the design rules are shown below. Ground rules are usually accompanied by a complete set of
drawings to illustrate each specification.
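To make the idea of minimum-width and minimum-spacing rules concrete, the following Python sketch checks a handful of axis-aligned rectangles on a single layer. It is a toy illustration, not the algorithm of any particular design rule checker: the rectangle format (x1, y1, x2, y2) in microns, the 2.0 micron rule values, the Euclidean distance measure, and the convention that touching or overlapping rectangles count as one shape are all assumptions made for the example.

```python
from itertools import combinations
from math import hypot


def width_violations(rects, min_width):
    """Rectangles whose smaller side is below the minimum width."""
    return [r for r in rects if min(r[2] - r[0], r[3] - r[1]) < min_width]


def spacing_violations(rects, min_space):
    """Pairs of separate rectangles that are closer than the minimum spacing."""
    bad = []
    for a, b in combinations(rects, 2):
        dx = max(b[0] - a[2], a[0] - b[2], 0.0)  # horizontal gap, 0 if overlapping in x
        dy = max(b[1] - a[3], a[1] - b[3], 0.0)  # vertical gap, 0 if overlapping in y
        gap = hypot(dx, dy)
        if 0.0 < gap < min_space:                # gap == 0 means the shapes touch or overlap
            bad.append((a, b))
    return bad


# Three hypothetical poly lines, coordinates in microns
poly = [(0.0, 0.0, 2.0, 10.0), (3.0, 0.0, 5.0, 10.0), (8.0, 0.0, 9.0, 10.0)]

w_bad = width_violations(poly, min_width=2.0)
s_bad = spacing_violations(poly, min_space=2.0)
print("width violations:", w_bad)
print("spacing violations:", s_bad)
```

Here the third line is too narrow and the first two lines are too close together. A production checker would work on arbitrary polygons and on rules between different layers as well.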
An integrated circuit may be viewed as a set of overlaid geometric patterns. Each layer is
shaped to provide the proper characteristics when referenced to every other layer. High-density
circuit design requires compacting the geometrical patterns into a small area without violating
the design rules.
[Figures: design-rule drawings for the n-well, active area (n-active and p-active), poly, n+,
contact, metal1, via, and metal2 layers, including the clearances to the scribe lane. The
dimension labels cannot be reliably assigned to individual features in this extraction.]
Notes legible in the drawings: the minimum channel length is 1.5µ for VDD = 5 V and 2.25µ
for VDD > 5 V; the p+ mask is the reverse of the n+ mask.
Active Areas
Dimensional specifications for active device areas are larger than those permitted by the
lithography to account for encroachment from the isolation. As shown in the sequence of Fig. 5.18,
growth of the field oxide creates the bird’s beak region which must be avoided when patterning
the device.
Gate Dimensions
Basic self-aligned MOSFETs are fabricated using the polysilicon gate as a mask for an n+ or
p+ drain/source ion implant. Lateral doping effects give effective channel lengths which are
smaller than the drawn values shown on the poly mask.
Gate Overhang
Self-aligned MOSFETs use the gate polysilicon as a mask for the drain and source implants.
To ensure a functional MOSFET we require that the masks are drawn so that the poly gate
extends further than required in the W direction. Fig. 5.20 shows the geometry. Providing
for a gate overhang allowance compensates for mask misalignment between the poly and n+
or p+ regions. If the gate overhang is reduced to zero, then even a minor registration error
would result in a shorted transistor.
Contact and via etches in the oxide can be troublesome failure points in a high-density layout.
If the contact windows are too large, nonuniform coverage may result in void formation and
other problems. The same comment also applies to oxide cuts which are too small. To avoid
inducing contact-related failure modes, it is common practice to allow only one size for contact
windows; large areas are connected by multiple contacts. This is illustrated in Fig. 5.21.
Metal Dimensions
Metal layers are deposited at the end of the fabrication sequence. They generally encounter
a very rugged terrain due to patterning of the previous layers. Owing to this fact, the design
rule widths and spacings must be large to ensure electrical current flow. Another reason for
increased widths is to allow larger current flow levels for power and ground connections.
Circuit Extraction and Electrical Process Parameters
The term circuit extraction covers a broad class of layout analysis problems. The fundamental
problem is connectivity extraction, which derives a list of interconnections among the
terminals from a layout description. There are also several kinds of parameter extraction, which augment the
basic connectivity information with measurements of features that are related to the (analog)
electrical characteristics of the chip.
Consider the problem of finding transistors. Transistors are formed by intersecting the polysil-
icon and diffusion layers; their type depends on the presence or absence of different kinds of
implant or tub.
Most circuit extractors treat two points (on the same or different layers) as electrically
connected if they lie in the same region of a single layer or if they can be joined by a sequence
of regions on several layers that are connected explicitly by contact windows. A common
circuit extraction operation is to find maximal regions of electrically connected points, more
commonly called nodes. This operation involves labeling the contents of each layer so that
items belong to the same node if and only if they have the same label.
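One common way to assign such labels is a union-find (disjoint-set) structure: every region starts as its own node, and each contact window merges the two regions it joins. The Python sketch below illustrates this under simplifying assumptions. Regions are represented only by hypothetical names, and the list of contacts is given directly rather than derived from mask geometry; the text does not prescribe this particular algorithm.

```python
def find(parent, x):
    """Return the representative of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def label_nodes(regions, contacts):
    """Map each region name to a node number; connected regions share a number."""
    parent = {r: r for r in regions}
    for a, b in contacts:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra  # merge the two nodes
    numbers = {}
    labels = {}
    for r in regions:
        root = find(parent, r)
        if root not in numbers:
            numbers[root] = len(numbers)
        labels[r] = numbers[root]
    return labels


# Hypothetical single-layer regions and the contact windows that join them
regions = ["metal_a", "poly_a", "diff_a", "metal_b", "diff_b"]
contacts = [("metal_a", "poly_a"), ("metal_b", "diff_b")]

labels = label_nodes(regions, contacts)
print(labels)
```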
The output of connectivity extraction is a list of transistors on the chip, together with node
numbers on each transistor’s gate, source, and drain. This transistor list is adequate for
checking the logical correctness of the circuit. In order to check analog characteristics of the
circuit, it is necessary to extract parasitic capacitances and resistances and transistor size
information.
The first step in connectivity extraction is to create derived layers that correspond to transistors
of different kinds and to electrically connected regions on single layers. To illustrate the
creation of derived layers using the edge representation, suppose the artwork for an nMOS
chip includes the following six levels: Dmask (the diffusion mask), Pmask (the polysilicon
mask), Mmask (the metal mask), Cmask (contact windows from metal to underlying layers),
Bmask (buried contact windows between polysilicon and diffusion), and Imask (the depletion
transistor implant). Then we could create five derived layers as follows:
trans ← Dmask and Pmask and not Bmask
dwires ← Dmask and not trans
PDcuts ← Pmask and Dmask and Bmask
MPcuts ← Mmask and Pmask and Cmask
MDcuts ← Mmask and Dmask and Cmask and not Pmask
Regions in layer trans are transistor channels, that is, places where polysilicon crosses diffusion
outside of a buried contact region. Conducting diffusion regions are represented in layer
dwires. Files PDcuts, MPcuts, and MDcuts contain precisely the places where materials of
the appropriate types make electrical contact.
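The Boolean mask operations map directly onto set operations if each mask is represented as a set of unit grid cells. The Python sketch below uses that representation instead of the edge representation mentioned in the text, purely to keep the example short. The tiny layout is hypothetical, and MDcuts is taken here as the metal-to-diffusion contacts lying outside polysilicon, which is an assumption about the intended definition.

```python
def rect(x0, y0, x1, y1):
    """Set of unit grid cells covered by the half-open rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Hypothetical layout: a horizontal diffusion strip crossed by a vertical poly line,
# with a metal-to-diffusion contact at the left end of the strip.
Dmask = rect(0, 2, 8, 4)
Pmask = rect(3, 0, 5, 6)
Mmask = rect(0, 2, 2, 4)
Cmask = rect(0, 2, 1, 4)
Bmask = set()            # no buried contacts in this example

trans = (Dmask & Pmask) - Bmask              # poly over diffusion, outside buried contacts
dwires = Dmask - trans                       # conducting diffusion
PDcuts = Pmask & Dmask & Bmask               # buried poly-diffusion contacts
MPcuts = Mmask & Pmask & Cmask               # metal-poly contacts
MDcuts = (Mmask & Dmask & Cmask) - Pmask     # metal-diffusion contacts (assumed definition)

print(len(trans), len(dwires), sorted(MDcuts))
```

In this layout the channel covers four cells, which splits the diffusion strip into a left and a right wire region, and the only contact is the metal-to-diffusion cut on the left.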
The next step is to assign globally consistent signal labels to the items on each conducting
layer that belong to a node, using the contact windows to merge signals between layers. The
final step in connectivity extraction is to find for each transistor the signal labels on the nodes
that are its terminals. This requires examining all regions that abut a transistor region.
To extract capacitance we still treat each node as equipotential but also consider it as the
terminal of one or more capacitors. Each region has a capacitance between itself and the
chip substrate and also internodal capacitances between itself and other overlapping or nearby
nodes.
Substrate capacitance can be accurately approximated as a function of the area and perimeter
of each region on each layer. Capacitance between two nodes of the circuit is much harder to
compute accurately. Internodal capacitance is not a simple function of area and perimeter.
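A simple model of the substrate capacitance adds an area term and a perimeter (sidewall) term. The Python sketch below evaluates it for a rectangular region. The function name and the 10 µm by 10 µm region are illustrative assumptions; the coefficients 25 nF/cm² (bottom) and 4 pF/cm (sidewall) are the n+ diffusion values listed among the example process parameters later in this chapter.

```python
def substrate_capacitance_fF(width_um, height_um, c_area_nF_per_cm2, c_side_pF_per_cm):
    """Area plus perimeter capacitance of a rectangle, returned in femtofarads."""
    um_to_cm = 1e-4
    area_cm2 = (width_um * um_to_cm) * (height_um * um_to_cm)
    perimeter_cm = 2.0 * (width_um + height_um) * um_to_cm
    c_bottom_fF = c_area_nF_per_cm2 * 1e6 * area_cm2       # nF -> fF
    c_sidewall_fF = c_side_pF_per_cm * 1e3 * perimeter_cm  # pF -> fF
    return c_bottom_fF + c_sidewall_fF


# Hypothetical 10 um x 10 um n+ diffusion region
c_diff = substrate_capacitance_fF(10.0, 10.0, 25.0, 4.0)
print(f"{c_diff:.1f} fF")
```

For this region the bottom contributes about 25 fF and the sidewall about 16 fF, so the perimeter term is far from negligible for small geometries.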
Analog characteristics such as the drive of an MOS transistor are a function of its channel
length and width. For a rectangular transistor formed by polysilicon that completely overlaps
diffusion, length is one-half of the transistor’s perimeter with polysilicon, and width is one-half
of the transistor’s perimeter with diffusion.
When we consider the problem of extracting resistances from a layout, the abstraction of
transistors connected by equipotential nodes breaks down completely. It does not make sense
to associate resistance with a node: resistance is defined between pairs of points. Thus, a
node attached to the terminals of k transistors gives rise to k(k−1)/2 resistances, one between
each pair of terminals.
One idea is to reduce the number of resistances we must compute by chopping the region into
electrically isolated regions. If we add the appropriate k − 2 junctions to a node attached to k
terminals, then we need to compute only O(k) resistances, instead of k(k−1)/2 (see Fig. 5.22).
Figure 5.22: A region with eight terminals has 28 interconnection resistances. Making the
cross-hatched junctions into new nodes splits the region into 10 electrically isolated regions and
reduces the number of interconnection resistances to 10.
A second way to reduce the number of resistances is to break nodes into rectangles by intro-
ducing artificial junctions at corners. Thus, resistances can be more easily computed.
Careful resistance extraction is the hardest and most expensive problem. Indeed, most chips
are manufactured without ever undergoing a complete resistance extraction because such an
extraction would result in a prohibitively large network of resistors.
The technology description file contains all information specific to a particular technology.
Among this information, and of particular importance for the extractor, is the specification
of the layers that can be used in a process and the electrical parameters of that process.
Layers are specified by their name and their type. The type of a layer distinguishes between
auxiliary layers, implantation layers, and interconnect layers. Auxiliary layers are ignored by
the extractor. Interconnect layers form the conducting patterns in a chip layout, so in a chip all
interconnections will always be made via such layers. If a layer is of type interconnect, an
associated terminal layer must be specified for it. Given the interconnect layers, the extractor
is able to determine where the nodes of an element are located. Another important part of
the technology description is the specification of the elements to be extracted.
For extraction of parasitic elements, electrical process parameters must be known. The layer
capacitances and the layer and contact resistances are necessary for exact modelling of parasitic
capacitances and resistances on a wafer, e.g. for calculating load capacitances (gates) and coupling
capacitances (between wires). Furthermore, process parameters must be taken into account during
the design. For example, poly lines must not be made too long because of the high capacitance
and resistance of polysilicon. Example parameters of an n-well CMOS process are listed in
Tables 5.3 and 5.4.
Capacitances                       Value (nF/cm²)
Gate oxide                         135
n+ diff to substrate (bottom)      25
n+ diff to substrate (sidewall)    4 (pF/cm)
p+ diff to n-well (bottom)         38
p+ diff to n-well (sidewall)       4 (pF/cm)
Poly–substrate                     5.9
Metal1–substrate                   3.2
Metal2–substrate                   2
Metal1–metal2                      3.9
Metal1–poly                        5.4
Metal2–poly                        2.5
Metal1–n+ diff                     5.2
Metal1–p+ diff                     5.5
Metal2–n+ diff                     2.4
Metal2–p+ diff                     2.5
Resistances        Value
n-well             2.5 kΩ/□
n+ diffusion       50 Ω/□
p+ diffusion       150 Ω/□
Poly               50 Ω/□
Metal1             60 mΩ/□
Metal2             40 mΩ/□
Contact            100 Ω/contact
Via                1 Ω/via
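The tabulated parameters make it easy to see why long polysilicon lines are avoided. The Python sketch below compares a polysilicon wire with a Metal1 wire of the same drawn size, using the sheet resistances and substrate capacitances listed above (poly: 50 Ω per square and 5.9 nF/cm²; Metal1: 60 mΩ per square and 3.2 nF/cm²). The wire dimensions of 1000 µm by 2 µm are an assumption, and the estimate is deliberately crude: it uses only the area capacitance to the substrate and a single lumped RC product.

```python
def wire_rc(length_um, width_um, r_sheet_ohm_per_sq, c_area_nF_per_cm2):
    """Return (resistance in ohms, capacitance in farads, lumped RC in seconds)."""
    squares = length_um / width_um
    resistance = r_sheet_ohm_per_sq * squares
    area_cm2 = (length_um * 1e-4) * (width_um * 1e-4)
    capacitance = c_area_nF_per_cm2 * 1e-9 * area_cm2
    return resistance, capacitance, resistance * capacitance


# Assumed wire: 1000 um long, 2 um wide
r_poly, c_poly, rc_poly = wire_rc(1000.0, 2.0, 50.0, 5.9)
r_m1, c_m1, rc_m1 = wire_rc(1000.0, 2.0, 0.060, 3.2)

print(f"poly:   R = {r_poly:.0f} ohm, C = {c_poly * 1e15:.0f} fF, RC = {rc_poly * 1e9:.2f} ns")
print(f"metal1: R = {r_m1:.0f} ohm, C = {c_m1 * 1e15:.0f} fF, RC = {rc_m1 * 1e12:.2f} ps")
```

With these numbers the polysilicon wire has a lumped RC product of a few nanoseconds, more than a thousand times that of the Metal1 wire.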
Basic Layout
Transforming schematics into physical circuits occurs during the layout process. All aspects
of the circuit performance are structured by the patterning. Parasitics, interconnect coupling,
and logic integration density are also determined by the geometries used in the layout artwork.
Although layout is easy to learn, the interplay between the geometrical shapes and the resulting
electrical behavior makes it difficult to master.
5.4.1 IC Design
IC design is a very complex process that involves hundreds of decisions dealing with a
variety of performance and manufacturing-related issues. The final phase in the design is
the creation of an IC layout; i.e. the creation of the drawing representing the geometry of the
designed circuit. For a given process such a drawing uniquely defines the IC geometry and
therefore the performance of the designed circuit.
The layout of an IC is defined as a set of polygons that determines the presence or absence
of regions in a number of conducting and isolating layers. In other words, an IC layout shows
from which part of the IC surface such materials as metal, silicon dioxide, photoresist, and so
on should be removed, and where other materials should be deposited.
During the design the IC is represented by a set of numbers that can be manipulated to create
a composite drawing of IC masks on the screen of the terminal or on the color plotter. In the
manufacturing process a “hard copy” of this layout is needed in the form of photolithographic
masks.
Typically, the IC design is transformed into a set of masks in a sequence of steps illustrated in
Fig. 5.23. First, the coordinates of all elements of the IC composite drawing are computed. Then
the data representing the different layers are separated (Fig. 5.23 (c) and (d)) and an image of each
IC layer is produced. Typically, such images are engraved on the surface of glass plates covered
with chromium, using a photographic technique and a pattern generator or E-beam equipment.
Masks created in this way are called master masks.
Next, the master masks are scaled down (Fig. 5.23 (e-f)) and duplicated (Fig. 5.23 (g-h)), so that
the working masks made in this way contain a couple of tens to a couple of hundreds of the same
images as the master masks. The size of the working mask is such that with a single exposure
the entire area of a single manufacturing wafer can be covered.
In the new lithography techniques, working masks are not needed and the image from the
mask is transferred directly onto the surface of the wafer (the master mask is then called a
reticle). Special high-precision optical step-and-repeat cameras are used for this purpose.
Data that describe a single IC layer can also be used to project an image directly onto the
surface of the manufacturing wafer using an electron beam technique. In this technique a
deflected beam of electrons exposes appropriate regions directly on the surface of the photore-
sist.
Structured layout is based on the idea of grids and cells. The simplest approaches start with
the power distribution lines VDD and VSS and structure the circuits as needed. Each gate is
placed in a semi-rectangular cell, and cascaded logic is achieved using adjacent cells. Fig. 5.24
illustrates the general idea. Both signal and power lines run horizontally in the network. Logi-
cal gates are built between metal VDD and VSS lines, while the signals may move between poly
and metal layers when necessary. Minimization of the area is achieved by creative placement
and shaping of the MOSFETs, interconnects, and cells in the overall grid structure. It is
important to remember that the dimensions determine the electrical characteristics and must
adhere to the set of design rules. CMOS has the added complications of complementary nMOS/pMOS
logic blocks and physical separation of nMOS and pMOS transistors, which affect the layout.
Complementary structuring is illustrated in Fig. 5.25. Each input is connected to both nMOS
and pMOS transistors which are physically separated from one another due to the opposite
background polarity requirements.
High-speed switching requires large currents and a small $C_{out}$ to ensure small charging and
discharging time constants. It is evident that this leads to a design problem: to increase
current flow, we must use large $(W/L)$ values for the MOSFETs, which in turn increases the
transistor capacitances. Increasing the aspect ratios in a CMOS circuit gives larger values for
both $C_{in}$ and $C_{out}$, affecting the performance of the entire logic chain. In bottom-up design,
we attempt to optimize each gate, both intrinsically and with respect to its nearest neighbors.
The concept of the equivalent load helps with the initial layout problem by defining “standard”
transistor or logic gate capacitances which are used as a reference. All loads are then specified
by the number of equivalent loads. A common choice is a minimum-area transistor as shown
in Fig. 5.26. Assuming drawn gate dimensions of (W × L), the gate input capacitance is
approximated by
$$C_G \approx C_{ox} W L.$$
An inverter made using minimum-area nMOS and pMOS transistors has an input capacitance
of approximately
$$C_{in} = 2 C_G,$$
which becomes our reference value.
To use the equivalent load concept, we assume that the circuit we are designing must drive a
load of value
$$C_L = n C_{in},$$
where n is a scaling factor indicating the size of the transistors used in the next gate. For
example, n = 2 may imply a single gate with MOSFETs which are twice as large as the
reference, or a fan-out FO = 2 into two minimum-size gates. The circuit is designed according
to the assumed load value. After the design of the logic chain is completed, we recheck the
circuit to ensure that the actual switching performance is acceptable.
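The equivalent-load bookkeeping can be sketched in a few lines of Python. This is only an illustration: the gate-oxide capacitance of 135 nF/cm² is the value from the process table earlier in this chapter, while the drawn gate size (W = 3 µm, L = 1 µm) and the function names are assumptions.

```python
# Equivalent-load estimate: C_G ≈ C_ox * W * L, C_in = 2 * C_G, C_L = n * C_in.
# C_ox is the gate-oxide value from the process table; W and L are hypothetical.

C_OX_NF_PER_CM2 = 135.0  # gate-oxide capacitance per unit area
UM_TO_CM = 1e-4          # one micrometre expressed in centimetres


def gate_capacitance(w_um: float, l_um: float, c_ox: float = C_OX_NF_PER_CM2) -> float:
    """Gate input capacitance C_G in farads for drawn dimensions W x L."""
    return c_ox * 1e-9 * (w_um * UM_TO_CM) * (l_um * UM_TO_CM)


def inverter_input_capacitance(w_um: float, l_um: float) -> float:
    """Reference load C_in = 2 * C_G (one nMOS and one pMOS gate of equal size)."""
    return 2.0 * gate_capacitance(w_um, l_um)


def load_capacitance(n: float, w_um: float, l_um: float) -> float:
    """Load C_L = n * C_in expressed as n equivalent loads."""
    return n * inverter_input_capacitance(w_um, l_um)


if __name__ == "__main__":
    w_um, l_um = 3.0, 1.0  # assumed minimum-size transistor
    for n in (1, 2, 4):
        print(f"n = {n}: C_L = {load_capacitance(n, w_um, l_um) * 1e15:.2f} fF")
```

Specifying loads as multiples of one reference inverter keeps the sizing of a logic chain independent of the absolute process numbers until the final recheck.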
Optimization of the circuit performance can also be specified at the system level and then
applied to each gate. This type of top-down approach has been used to estimate gate sizing
rules to speed up the response of a static logic chain. In general, combining the two views
offered by bottom-up (circuit level) and top-down (system level) design provides the most
powerful approach to high-performance design. Large digital networks contain both critical
and non-critical logic paths, so that intermixing the design philosophies is often required.
Circuits which are fabricated in bulk CMOS require additional safeguards to avoid latch-up.
A common approach is to use guard rings, which are heavily doped n+ or p+ regions around
MOSFETs as shown in Fig. 5.27. Guard rings reduce the transistor current gain and offset
the potential and are effective in preventing latch-up. Another common preventative measure
is providing substrate bias contacts next to every MOSFET which is connected to the power
supply or ground.
Static CMOS gates are based on complementary nMOS/pMOS logic blocks. Cell design can
be split into two tasks: transistor placement and interconnect routing. Real estate budgets
often have priority status, so that some thought may be required to fit the subsystem into the
allocated area. The main limitations are usually due to design rule spacings and the complexity
of the interconnect topology. Other considerations which may come into play include the shape
of the allocated area, location of input and output lines relative to neighboring logic units,
and clock distribution.
Some of the more interesting designs are based on the complementary placement of opposite
polarity MOSFETs. Consider a NOR2 gate. This circuit uses 2 nMOS transistors in paral-
lel and 2 pMOS transistors in series. Fig. 5.28 shows how the complementary arrangement
can be implemented by using similar transistor arrays with different interconnect patterning.
Reversing the transistors in the NOR2 gate in Fig. 5.28(a) directly yields the NAND2 gate
shown in Fig. 5.28(b).
Although some layouts are based on the schematic patterning, these do not generally yield
minimum-area circuits. Thoughtful use of transistor arrays and interconnect routing is usually
The layout of transmission-gate logic circuits is complicated by the transmission gate itself.
The switch uses a parallel-connected nMOS and pMOS transistor pair, and the two transistors
reside in opposite-polarity backgrounds. Consider, for example, a p-well process. The p-channel
transistor is located on the n-substrate, while the nMOS is in a p-well region. Two extreme layout
philosophies are (a) use a p-well for every transmission gate, or (b) use a single p-well for all
transmission gates in the circuit. These are illustrated in Fig. 5.29. Approach (a) reduces integration
density due to the p-well spacing requirement, but is easy to replicate on a CAD system; (b),
on the other hand, may provide higher logic density, but has a larger capacitance from the
extra interconnect. Although both are used in practice, minimizing the number of wells is
usually the preferred strategy. Since each well requires a connection to either VDD or VSS,
this also aids in power distribution.
A critical aspect of high-speed CMOS layout is control of the parasitic capacitance values.
Layout Examples
Chapter 6
6.1 Introduction
Packaging significantly affects, and in some cases dominates, the overall chip cost ([22]). The
increase in packaging cost with an increasing number of gates on a chip differs between memory
and logic/microprocessor devices:
Memory devices:
Due to multiplexing techniques on the chip, the I/O requirements remain essentially constant
• high reliability
• package must be compatible with a variety of assembly, test and handling systems
Figure 6.1: Continuous growth in DRAM complexity and size places little
demand on package size and number of I/Os
Figure 6.2: Comparison of I/O requirements for DRAM, logic and micro-
processor devices
Package Types
In principle, there are two ways of mounting devices on printed wiring boards (PWB):
• up to 48 terminals:
– small outline (SO) (available in plastic only):
SOP: small outline package
SSOP: shrink small outline package
– quad types: chip carriers (CC) and flatpacks (available in ceramic and plastic)
• above 48 terminals: quad types only
– leaded plastic (PLCC)
– leaded ceramic (LDCC)
– leadless ceramic (LLCC)
Figure 6.3: Examples of packages and PWB mounting techniques: (a) TH:
Dual-in-line (DIL) package. (b) TH: Pin-grid-array (PGA) package.
(c) SM: "J"-leaded packages, leaded chip carrier or small-outline.
(d) SM: Gull-wing-leaded packages, chip carrier or small-outline.
(e) SM: Butt-leaded package, small-outline dual-in-line type.
(f) Leadless type, ceramic chip carrier mounted to a matching
ceramic substrate
Design Considerations
Figure 6.7: Bonding-pad pitch versus chip lead count for several chip sizes
Figure 6.9: CAD template for positioning bonding pads (assures that wire
span length meets the design rules)
Figure 6.11: CAD template for checking the maximum distance that wire
spans over silicon. Here: violation of the guidelines. The circle
must be at minimum tangent to the step-and-repeat centerline
(case of maximum distance) or cross it
• Objective: keep temperature of silicon die low enough to prevent failure rate
Increased operating speed and reduced noise margins demand a more careful consideration of
package design. Performance criteria:
The inductances of SM packages are significantly lower than the inductances of TH packages
due to their shorter lead traces.
Most important problem: noise reduction. The noise induced in the ground line when one line
is switching is given by
$$V_i = L_g \frac{di}{dt} \qquad (6.2)$$
If m ground leads are used, the total inductance is approximately $L_g/m$. In practical designs,
often up to 25% of the leads have to be grounded in order to keep the noise within the desired
limits; large-area power and ground planes within the package are used for the same purpose.
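The effect of paralleling ground leads can be illustrated with a small Python sketch of eq. (6.2) combined with the $L_g/m$ approximation. The lead inductance (5 nH) and the current slew rate (20 mA/ns) are assumed example values, not data from a particular package.

```python
# Ground noise V_i = (L_g / m) * di/dt for m parallel ground leads.
# The numeric values used below are hypothetical example figures.

def ground_noise(l_g_henry: float, di_dt_amp_per_s: float, m_ground_leads: int = 1) -> float:
    """Induced ground-line noise in volts, assuming m identical parallel leads."""
    if m_ground_leads < 1:
        raise ValueError("at least one ground lead is required")
    return (l_g_henry / m_ground_leads) * di_dt_amp_per_s


if __name__ == "__main__":
    l_g = 5e-9    # assumed inductance of one ground lead: 5 nH
    di_dt = 20e6  # assumed slew rate: 20 mA/ns = 2e7 A/s
    for m in (1, 2, 4, 8):
        print(f"m = {m}: V_i = {ground_noise(l_g, di_dt, m) * 1e3:.1f} mV")
```

The sketch treats the leads as ideal, uncoupled inductors in parallel; mutual inductance between neighbouring leads reduces the benefit in a real package.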
• Ideally: use materials that are matched in their physical properties, especially materials
that have the same TCE (Thermal Coefficient of Expansion)
Assembly Technologies
Figure 6.15: Generic assembly sequence for plastic and ceramic packages
• In some cases: wafer thinning using highly automated backgrinding processes
• The sawed wafer is still mounted on a tape frame-fixture (to which it has been attached
before sawing and which is not destroyed by the sawing step) and loaded into an auto-
matic die bonder that picks only the good chips from the tape
The back of the die is mechanically attached to a mount medium, such as ceramic substrate,
multilayer-ceramic-package-piece part or metal leadframe. This attachment sometimes also
provides an electrical connection to the back of the die.
Package Technologies
(f) the capillary rises and the wire clamp closes at a predefined height
Figure 6.19: Thermosonic ball wire bonds on a gate array VLSI chip
• very effective for constructing complex packages with many signal, power, ground, bond-
ing and sealing layers
Lower cost ceramic technology applicable to single-chip DIPs and quad CERPACs. This
technology relies on glass-sealing a leadframe between two pressed ceramic units.
Postmolding
• low cost
• thermosetting epoxy resins are molded around the leadframe-chip subassembly after the
chip has been wire-bonded to the leadframe
Premolding
→ the preheated molding compound flows under pressure to fill the cavities containing lead-
frame strips with their attached ICs.
IC Package Market Share
Packaging Trends
CAD Tools
Chapter 7
The following list shows some important CAD tools used for the design of integrated circuits:
• graphics editor (drawing schematic diagrams, physical layout, stick layout diagrams, . . . ,
used for displaying results from simulations, layout verifications (like design rule checks),
placement and routing, . . . )
• language based circuit capture tools (for hardware description languages like VHDL,
Verilog, EDIF, . . . )
• physical design verification tools (design rule checker, extractor, LVS, schematic and
electrical rule checker, . . . )
• simulation tools (analog simulation: circuit level; digital simulations: circuit level, switch
level, logic level, register transfer level, architectural level, behavioural level; thermal
simulation: displaying heat dissipation on chip)
• logic optimizer
• database management (to keep different versions (current, backup 1, ..., backup n) and views
of a design object [schematic, simulation netlist, stick diagram, physical layout, . . . ] in
the design database)
Full Custom Design
With Full Custom Design techniques, the designer is able to individually specify the geometri-
cal layout of the integrated circuit (transistor size [channel length, channel width, shape, . . . ],
transistor placement, wire width, . . . ). The designer has the option to manually optimize the
layout → the most dense layouts can be generated using the full custom design styles.
• The layout is drawn in form of rectangles and polygons on different layers using a
graphics editor.
• The designer has to know a large set of process dependent design rules.
• The mask layout is generated as drawn on the screen → direct influence on component
placement and on important parameters such as W and L of transistors, wire widths, . . .
Stick Diagram
• The layout is drawn in form of lines and polygons on different layers using a graphics
editor. A stick–to–layout converter together with a compactor and a description of
the process design rules is then used to generate the rectangle based layout.
• The designer can draw symbolic layouts that are almost independent of the process and
its design rules. Process adaptation is done by the converter/compactor.
• Converter constraints (cell dimensions, channel widths / lengths of transistors, . . . )
can be specified.
• The layout is specified in textual form giving either the position and layer of rect-
angles (similar to hand crafted layout) or lines (as in stick diagrams).
• Since programming language constructs like parameterized macros (to be used
for layout segments as cells, . . . ), loops (while, repeat, for, . . . ), and conditional
statements (if, case, . . . ) may be available, parameterized layouts (e. g. generic
transistor with W and L as parameters, cells for different bit–widths, . . . ) can be
described using geometrical specification languages.
• Used in a large number of macrocell compilers.
Command       Description
B x y dx dy   Box with length dx, width dy, and lower left-hand corner placed at (x, y).
L n           Layout level (layer) for the box definitions that follow.
M n           Start of macro definition n.
E             End of macro definition.
C n x y m     Call for macro number n with translation x, y and orientation m.
Q             End of layout file.
Orientation   Description
1             no rotation
2             rotate 90° counterclockwise
3             rotate 180° counterclockwise
4             rotate 270° counterclockwise
5             mirror about y-axis
6             rotate 90° counterclockwise and mirror about y-axis
7             rotate 180° counterclockwise and mirror about y-axis
8             rotate 270° counterclockwise and mirror about y-axis
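How a macro call with an orientation code might be evaluated can be sketched in Python. The sketch assumes that the rotation is about the macro origin, that the rotation is applied before the mirroring (as the wording of codes 6 to 8 suggests), and that the translation is applied last; the function names are hypothetical and not part of the specification language.

```python
# Apply the orientation codes 1..8 of a macro call to a point (x, y).
# Assumptions: rotate about the macro origin first, then mirror about the
# y-axis (x -> -x), then translate by the call offsets.

def apply_orientation(x: int, y: int, m: int) -> tuple[int, int]:
    """Return the point (x, y) transformed by orientation code m."""
    if m not in range(1, 9):
        raise ValueError("orientation code must be in 1..8")
    for _ in range((m - 1) % 4):  # number of 90-degree counterclockwise steps
        x, y = -y, x
    if m >= 5:                    # codes 5..8 add a mirror about the y-axis
        x = -x
    return x, y


def place_point(x: int, y: int, m: int, tx: int, ty: int) -> tuple[int, int]:
    """Orientation followed by the translation (tx, ty) of the macro call."""
    ox, oy = apply_orientation(x, y, m)
    return ox + tx, oy + ty


if __name__ == "__main__":
    for code in range(1, 9):
        print(code, apply_orientation(1, 0, code), apply_orientation(0, 1, code))
```

A box would be transformed by mapping two opposite corners and re-deriving its lower left-hand corner, length and width from the results.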
Figure 7.2: Full custom layout (hand crafted or generated out of a stick diagram
resp. a layout description)
Figure 7.3: Corresponding geometrical specification file and schematic diagram
[Figure: full custom design flow. Blocks shown: Stick Diagram Editor, stick2layout
Converter / Compactor, Symbol Generation, Schematic Entry, Simulation Netlist Extraction,
Layout Editor, Block Layout, Floorplanning, Placement, Routing, Circuit Simulation,
Timing Analysis, Design Analysis (DRC, ERC, Circuit Extraction, LVS), Mask Layout Data,
Fabrication, Fabrication Test Pattern]
Cell Based Design
The Cell Based Design approaches rely on layout components predefined and provided by the
silicon foundry. Several implementation styles can be distinguished:
Gate Array
Standard Cell
Macrocell
[Figure: cell based design flow. Blocks shown: Symbol Generation, Specification /
Compilation, Compiled Macrocell, Graphical Schematic Entry, Simulation Netlist Extraction,
Cell Simulation Models, Library (Macrocells, IO-Cells, Standard-Cells), Placement,
Layout Data, Logic Simulation, Fault Simulation, Timing Analysis, Design Analysis
(DRC, Parasitics Extraction), Mask Layout Data, Fabrication, Fabrication Test Pattern]
Design Verification
Physical design rule checks (DRCs) are performed to guarantee the conformity of a layout
design to the silicon vendor’s set of design rules. Design rules are defined between objects on
the same layer (minimum width, minimum spacing) as well as for objects on different layers
(minimum spacing, overlapping, extension).
Minimum width
Minimum spacing
Overlapping
Extension
Design rule violations are usually reported in the physical layout using a graphics editor.
Sometimes a table listing the location and type of each design rule violation can also
be generated.
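The two same-layer checks (minimum width and minimum spacing) can be sketched in Python for layouts made of axis-parallel boxes. This is a simplified illustration and not the algorithm of any particular DRC tool: the `Box` type, the rule values and the layer name are assumptions, boxes that touch or overlap are treated as connected (so they are skipped by the spacing check), and the width of shapes merged from several boxes is not examined.

```python
# Simplified same-layer design rule check on axis-parallel boxes.
# Rule values and layer names are hypothetical.
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Box:
    layer: str
    x: float   # lower left-hand corner
    y: float
    dx: float  # extent in x
    dy: float  # extent in y


def gap(a: Box, b: Box) -> float:
    """Euclidean distance between two boxes; 0.0 if they touch or overlap."""
    gx = max(b.x - (a.x + a.dx), a.x - (b.x + b.dx), 0.0)
    gy = max(b.y - (a.y + a.dy), a.y - (b.y + b.dy), 0.0)
    return hypot(gx, gy)


def check_rules(boxes, min_width, min_spacing):
    """Return a list of (rule, layer, box indices) tuples, one per violation."""
    violations = []
    for i, box in enumerate(boxes):
        width = min_width.get(box.layer)
        if width is not None and min(box.dx, box.dy) < width:
            violations.append(("width", box.layer, (i,)))
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        spacing = min_spacing.get(a.layer)
        if a.layer != b.layer or spacing is None:
            continue
        if 0.0 < gap(a, b) < spacing:
            violations.append(("spacing", a.layer, (i, j)))
    return violations


if __name__ == "__main__":
    layout = [
        Box("metal1", 0, 0, 3, 10),
        Box("metal1", 4, 0, 3, 10),   # only 1 unit away from the first box
        Box("metal1", 20, 0, 1, 10),  # narrower than the minimum width
    ]
    for violation in check_rules(layout, {"metal1": 2}, {"metal1": 2}):
        print(violation)
```

The returned tuples correspond to the tabular report mentioned above; a graphical report would highlight the listed boxes in the layout editor.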
7.4.2 Extraction
Circuit Level Extraction: can be used to create a netlist for circuit level simulations (e. g.
SPICE, . . . ). The netlist consists of MOS transistors (including geometrical parameters
as W / L, parasitic capacitances), resistors, capacitances, diodes, . . . .
Switch Level Extraction: can be used to create a netlist which can be processed by a
switch level simulator. The resulting netlist consists of MOS transistors and parasitic
capacitances (to model storage effects in MOS circuits).
Parasitics Extraction: is used in conjunction with cell based design techniques. Since wire
delay is dependent on the parasitic capacitance of a wire, parasitic capacitances of nets
and input capacitances of other gates connected to an output can be used to estimate
the extrinsic delays (Note: intrinsic delays [i. e. the delay of unloaded gates] are fetched
from the cell library’s simulation model data).
Schematic Extraction: is executed to generate the connectivity data out of a graphical rep-
resentation (schematic diagram) of a circuit module. The connectivity data is forwarded
to a netlister which provides the information required e. g. by simulation tools (the sim-
ulators cannot operate on graphical data, they require netlists in a textual format). This
kind of extraction is usually required in pre-layout design specification phases.
Figure 7.7: Example of a design rules set checked during design verification
7.4.3 LVS
The layout–versus–schematic (LVS) comparison tool checks the equivalence of the layout and
its schematic. The tool can be used to find wrong connections or parameter mismatches (such as
W / L of transistors, . . . ) between a schematic and its physical layout representation.
To verify schematics used e. g. in cell based designs, a schematic rule checker can find schematic
rule violations (like the following examples):
Warnings:
• open outputs
• exceeded fanout
Errors:
• more than one active driver connected to a net at the same time
7.5 Simulation
• Software development
3-valued logic:
log. zero = 0
log. one = 1
unknown = U
Example:
AND | 0  1  U
----+---------
 0  | 0  0  0
 1  | 0  1  U
 U  | 0  U  U
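The AND table of the 3-valued logic can be written as a small Python function. The string encoding of the values ("0", "1", "U") and the function name are choices made for this sketch.

```python
# 3-valued AND: a controlling 0 on either input forces the output to 0,
# two 1 inputs give 1, and every other combination is unknown (U).

VALUES = ("0", "1", "U")


def and3(a: str, b: str) -> str:
    """AND of two 3-valued logic signals."""
    if a not in VALUES or b not in VALUES:
        raise ValueError("inputs must be '0', '1' or 'U'")
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "U"


if __name__ == "__main__":
    print("AND " + " ".join(VALUES))
    for a in VALUES:
        print(f"{a}   " + " ".join(and3(a, b) for b in VALUES))
```

The key property is that an unknown input does not necessarily make the output unknown: a 0 on the other input still determines the result.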
Problems:
[Figure: delay element with input x, delay τ(n) and output y]
• Zero-Delay: ∆ = 0
• Introduction of signal strength additional to logic values for driver and bus modelling
Resolution table (only the upper triangle is listed):

   | A0 A1 AX P0 P1 PX S0 S1 SX X0 X1 XX Y0 Y1 YX ZZ
---+------------------------------------------------
A0 | A0 AX AX A0 A0 A0 A0 A0 A0 A0 AX AX A0 A0 A0 A0
A1 |    A1 A1 A1 A1 A1 A1 A1 A1 AX A1 AX A1 A1 A1 A1
AX |       AX AX AX AX AX AX AX AX AX AX AX AX AX AX
P0 |          P0 PX PX P0 P0 P0 X0 XX XX P0 PX PX P0
P1 |             P1 PX P1 P1 P1 XX X1 XX PX P1 PX P1
PX |                PX PX PX PX XX XX XX PX PX PX PX
S0 |                   S0 SX SX X0 XX XX Y0 YX YX S0
S1 |                      S1 SX XX X1 XX YX Y1 YX S1
SX |                         SX XX XX XX YX YX YX SX
X0 |                            X0 XX XX X0 X0 XX X0
X1 |                               X1 XX X1 XX XX X1
XX |                                  XX XX XX XX XX
Y0 |                                     Y0 YX YX Y0
Y1 |                                        Y1 YX Y1
YX |                                           YX YX
ZZ |                                              ZZ
Problems:
– Feedbacks
– Sorting of gate netlist
– Zero delay model
– Entire circuit is simulated
• Event driven simulation . . .
[Figure: four examples of two drivers A and B connected to a common node C]
Driver A   Driver B   Node C
A1         P0         A1
A1         A0         AX
P0         S1         P0
X1         P0         XX
• algebraic or RC models
[Figure: MOS transistor modelled as a switch between Drain and Source, controlled by the Gate]

Logic (Gate)   n-Channel Enhancement   p-Channel Enhancement   Depletion
1              Closed                  Open                    Weak
0              Open                    Closed                  Weak
X              Unknown                 Unknown                 Weak

remarks:
• Switch transition time is assumed to be zero or some nominal value.

[Figure: MOS transistor modelled as a resistor R_EFF between Drain and Source, controlled by the Gate]

Logic (Gate)   n-Channel Enhancement   p-Channel Enhancement   Depletion
1              R_EFF                   ∞                       R_EFF
0              ∞                       R_EFF                   R_EFF
X              [R_EFF, ∞]              [R_EFF, ∞]              R_EFF

remarks:
• In the linear model, node capacitance and device resistance are used to compute output
logic levels and transition times.
Hardware Description with VHDL
Weinberger Structuring
Chapter 8
Weinberger structuring is a structured approach that simplifies physical layout and improves
layout density. The method was presented by Weinberger in 1967.
Weinberger Arrays
• are created by placing transistors on the chip in a geometrically regular manner. Hori-
zontal and vertical interconnect patterns are used to wire the devices together.
• using one type of gate; for example, NOR gates form a complete logic set for nMOS
circuits
Example:
$$F = \overline{A + B + C} = \bar{A}\,\bar{B}\,\bar{C} \qquad (8.1)$$
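Eq. (8.1) describes a three-input NOR gate, read here with the complement bars that the NOR function requires (the complement of the OR equals the AND of the complemented inputs). The identity can be checked exhaustively with a short Python sketch; the function names are chosen for illustration only.

```python
# Exhaustive check of De Morgan's identity for a 3-input NOR gate:
# NOT(A + B + C) == (NOT A)(NOT B)(NOT C)
from itertools import product


def nor(*inputs: int) -> int:
    """NOR gate: 1 only if every input is 0."""
    return int(not any(inputs))


def and_of_complements(*inputs: int) -> int:
    """AND of the complemented inputs."""
    return int(all(not value for value in inputs))


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, "->", nor(a, b, c), and_of_complements(a, b, c))
```

Both columns of the printed truth table agree for all eight input combinations, which is why a single NOR row of the array can realize either form.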
Example:
$$Z = U + V + W + X + Y \qquad (8.2)$$
Figure 8.5: Weinberger NOR array representation
Gate Matrix Layout
Gate matrix layout is a character-based layout style for custom CMOS circuitry. It is a
regular design style that employs a matrix of intersecting transistor diffusion rows and polysilicon
columns, so that every intersection is a potential transistor site.
The layout is developed as a representational line drawing (stick figure) using the available
levels of interconnection (e.g. in polysilicon gate technology: polysilicon, metal, diffusion):
• first draw a series of parallel poly lines corresponding to the number of inputs to
the circuit (more lines may be needed if an output is chosen to be polysilicon)
• place the transistors; each placement is determined by two factors, i.e. the input column
and the serial or parallel association among the transistors
• after row definition, make further interconnections with horizontal and vertical
metal interconnection tracks
• apply final improvements
Figure 8.8: Gate matrix layout: (a) schematic (b) layout (c) optimized layout of n part
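Half adder (HA) with inputs A, B and outputs C (carry), S (sum):
\[ C = A B = \overline{\overline{A B}} \tag{8.3} \]
\[ \begin{aligned}
S &= A \bar{B} + \bar{A} B \\
  &= (\bar{A} + \bar{B})\, B + (\bar{A} + \bar{B})\, A \\
  &= \overline{(A B)}\, B + \overline{(A B)}\, A \\
  &= \overline{\overline{(\overline{A B}\, B)}\;\overline{(\overline{A B}\, A)}}
\end{aligned} \tag{8.4} \]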
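The NAND-only form of the half adder in (8.3) and (8.4) can be verified exhaustively. The following Python sketch uses a hypothetical helper nand and the intermediate name t for the shared term (the complement of A·B); both names are chosen here for illustration.

```python
from itertools import product

def nand(a, b):
    """Two-input NAND on bits 0/1."""
    return 1 - (a & b)

def half_adder_nand(a, b):
    t = nand(a, b)                     # shared term: complement of A*B
    s = nand(nand(t, b), nand(t, a))   # sum according to (8.4)
    c = nand(t, t)                     # carry according to (8.3): double complement of A*B
    return c, s

# the pair (C, S) must encode the arithmetic sum A + B
for a, b in product((0, 1), repeat=2):
    c, s = half_adder_nand(a, b)
    assert 2 * c + s == a + b
```

Five two-input NAND gates suffice, because the first gate is shared between the sum and the carry.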
Figure 8.10: Half adder realizations: (a) standard cell (b) gate matrix
N n-channel transistor
P p-channel transistor
+ metal-poly or metal-diffusion crossover
∗ contact
| polysilicon or n-diffusion wire
! p-diffusion wire
: vertical metal
– horizontal metal
1. Polysilicon runs only in one direction and is of constant width and pitch.
2. Diffusion wires (of constant width) may run vertically between polysilicon columns.
3. Metal may run horizontally and vertically. Any pitch departures from a minimum (e.g.
power rails) are manually specified.
+ technology updatable
+ circuit extraction may be done at the symbolic level or at the mask level by conventional
circuit extractors
Optimal CMOS Complex Gate Layout
In MOS circuit design, complex functional cells can be used to achieve better performance.
This section discusses the implementation of a random logic function on an array of CMOS
transistors. The method was presented by Uehara and van Cleemput in 1981. A graph-theoretical
approach to systematic and efficient layout generation minimizes the required chip area.
Figure 8.13: (a) CMOS complex gate schematic and (b) corresponding layout
Figure 8.14: Implementation of an EXOR function: (a) Logic diagram. (b) Circuit. (c) Layout
+ better performance
+ smaller size
In the following, the consideration is limited to AND/OR networks realized in complex gate
CMOS by means of series/parallel connections of transistors. The topologies of the nMOS
network and the pMOS network are assumed to be dual.
The delay of a complex CMOS cell mainly depends on the maximum number of series transistors
between VDD or VSS and the cell output, which is called the level of the complex cell. This
quantity has a direct influence on the charging or discharging resistance of the cell. Generally,
cells with fewer than four levels are desirable. The number of cells with parallel/serial topology
is given by the following table:
So it is reasonable to use mainly cells with three levels and only occasionally cells with four levels
in order to achieve sufficient performance.
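The level of a cell can be computed mechanically from its series/parallel description. The sketch below assumes a nested-tuple representation invented for this illustration: a string is a single transistor, ("s", ...) is a series connection and ("p", ...) a parallel connection. The pMOS network is obtained as the dual of the nMOS network, as assumed in the text.

```python
def depth(net):
    """Maximum number of series transistors through a series/parallel network."""
    if isinstance(net, str):          # a single transistor
        return 1
    kind, *parts = net
    depths = [depth(p) for p in parts]
    return sum(depths) if kind == "s" else max(depths)

def dual(net):
    """Dual network: series and parallel connections are exchanged."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    return ("p" if kind == "s" else "s", *[dual(p) for p in parts])

def level(nmos):
    """Level of the cell: worst series depth of the nMOS network and its pMOS dual."""
    return max(depth(nmos), depth(dual(nmos)))

# pull-down network of F = NOT(A*B + C*D), a hypothetical example cell
aoi22 = ("p", ("s", "A", "B"), ("s", "C", "D"))
nand4 = ("s", "A", "B", "C", "D")

print(level(aoi22))  # 2
print(level(nand4))  # 4
```

A four-input NAND already reaches level four in its pull-down path, which is why such cells are used only occasionally.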
Figure 8.16: Alternative complex gate implementation of EXOR function: (a) Logic diagram.
(b) Circuit. (c) Layout
Figure 8.17: Basic layout of the functional cell: (a) Logic diagram. (b) Circuit. (c) Graph
model. (d) Layout
• two rows of transistors, implementing the pMOS and nMOS part of the circuit
• equal number of transistors in both rows
Figure 8.18: Layout optimization: (a) Diffusion connection of adjacent transistors. (b) Optimal
arrangement (reordered input lines)
Fig. 8.18 shows layout improvements for the circuit in Fig. 8.17. If the metal connections
between adjacent transistors are replaced by diffusion (the designer should be careful in doing
this for high-speed circuits), the layout of Fig. 8.18(a) is obtained. An even more sophisticated
layout arrangement, which further reduces the required area, is shown in Fig. 8.18(b).
The best layout is achieved by the transistor arrangement of Fig. 8.19, which is logically
equivalent to the previous figures.
Figure 8.19: Alternative optimal circuit layout: (a) Logic diagram. (b) Circuit. (c) Graph
model. (d) Optimal Layout.
The p-side and the n-side of the circuit can be formulated as graphs which can be defined as
follows:
Graph properties:
If two edges Ei and Ej are adjacent in the graph model, then it is possible to place the
corresponding gates in physically adjacent positions of an array and hence connect them by
a diffusion area. In order to minimize the number of separations, a minimum-size set of paths
has to be found; each path corresponds to a chain of transistors in the array.
Definition 1 An Euler path is a path on a graph that covers every edge of the graph
exactly once.
If there exist Euler paths for GN and GP, then all transistors can be chained by diffusion areas.
Otherwise the graphs have to be partitioned into subgraphs which have Euler paths.
It is necessary to find a pair of paths for GP and GN with the same sequence of labels, because
the p- and n-type transistors corresponding to the same input have to be positioned at the same
horizontal position (poly line).
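The search for a common label sequence can be illustrated with a brute-force sketch in Python. It assumes the usual graph model in which vertices are circuit nodes and every edge is a transistor labelled with its gate input; the example gate F = NOT(A·B + C) and the node names X and Y are hypothetical. The enumeration is exponential and only meant for small gates.

```python
def euler_label_sequences(edges):
    """Return the label sequences of all Euler paths of a small multigraph.

    edges maps an edge label (gate input) to the pair of nodes it connects.
    """
    found = set()

    def extend(node, used, seq):
        if len(seq) == len(edges):
            found.add(tuple(seq))
            return
        for label, (u, v) in edges.items():
            if label not in used and node in (u, v):
                nxt = v if node == u else u
                extend(nxt, used | {label}, seq + [label])

    for start in {n for pair in edges.values() for n in pair}:
        extend(start, frozenset(), [])
    return found

# nMOS graph: A and B in series (via node X), in parallel with C
g_n = {"A": ("OUT", "X"), "B": ("X", "GND"), "C": ("OUT", "GND")}
# pMOS graph (dual): A and B in parallel, in series with C (via node Y)
g_p = {"A": ("VDD", "Y"), "B": ("VDD", "Y"), "C": ("Y", "OUT")}

common = euler_label_sequences(g_n) & euler_label_sequences(g_p)
print(sorted(common))
```

Every sequence in common gives an input ordering for which both transistor rows can be chained by diffusion without a separation; the ordering A, C, B, for example, is an Euler path of the nMOS graph only and is therefore excluded.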
General algorithm:
1. enumerate all possible decompositions of the graph model to find the minimum number
of Euler paths that cover the graph
2. chain the gates by means of a diffusion area according to the order of the edges in each
Euler path and
3. if more than one Euler path is necessary to cover the graph model, then provide a
separation area between each pair of chains
Definition 2 The reduced graph is obtained by iteratively replacing an odd number of series
(parallel) edges by a single edge, until no further reduction is possible.
Theorem 1 If there is an Euler path in the reduced graph then there exists an Euler path in
the original graph.
Proof: It is possible to reconstruct an Euler path in the original graph by replacing each edge of the Euler path
in the reduced graph by a sequence of the original odd number of edges.
If there are gates in the logic diagram with an even number of inputs, additional “pseudo”
inputs have to be introduced in order to guarantee an odd number of inputs. The second
theorem given previously guarantees that there exists an Euler path for this modified
problem. However, the pseudo edges in the Euler path have to be removed afterwards, and
they can then cause diffusion separations. An algorithm for minimizing the separations caused by
pseudo edges is given in the next section (⇒ minimal interlace of normal and pseudo inputs).
The heuristic algorithm for generating an Euler path is given by:
Figure 8.21: Application of reduction rule: (a) Logic Diagram. (b) Graph model and its
reduction. (c) Reconstruction of an Euler path
1. For every gate with an even number of inputs, add a “pseudo” input.
2. Add this new input to the gate such that the planar representation of the logic diagram
shows a minimal interlace of “pseudo” and real inputs. It should be noted that a
“pseudo” input at the top or at the bottom of the logic diagram does not contribute to
the separation areas as shown in Fig. 8.22(b) and (c).
3. Construct the graph model such that the sequence of edges corresponds to the vertical
order of inputs on the planar logic diagram.
4. Chain together the gates by means of diffusion areas, as indicated by the sequence of
edges on the Euler path. “Pseudo” edges indicate separation areas.
5. The final circuit topology can be derived by deleting “pseudo” edges in parallel with
other edges and by contracting “pseudo” edges in series with other edges.
This heuristic algorithm does not necessarily give the optimal layout, but if the resulting
sequence has no separation areas, it is indeed the optimal solution.
Figure 8.22: Application of the heuristic algorithm: (a) New inputs p1 and p2 are added. (b)
Optimal sequence of inputs without the interlace of p1 or p2. (c) Circuit with the dual path
{p1,2,3,1,4,5,p2}
8.3.6 Examples
Figure 8.25: Carry look-ahead circuit (this representation has no Euler path)
Figure 8.26: Alternative topology for carry look-ahead circuit (with possibility of constructing
an Euler path)
Figure 8.27: Comparison of space: (a) Functional cell realization. (b) Conventional NAND
realization
Standard Cell Layout
Programmable Logic Arrays
A programmable logic array (PLA) maps a set of Boolean functions in canonical, two-level
sum-of-products form into a geometrical structure. A PLA consists of an AND-plane and an
OR-plane. For every input variable in the Boolean equations, there is an input signal to
the AND-plane. The AND-plane produces a set of product terms by performing AND
operations. The OR-plane generates the output signals by performing OR operations on the
product terms fed by the AND-plane.
PAL: the AND array is programmable and the OR array has fixed connection points (OR gates)
PROM: the AND array is hardwired and the OR array is programmable (→ the set of all possible
product terms is realized)
Example:
x0 x1 x2 | z0 z1
 0  0  0 |  1  1
 0  0  1 |  1  1
 0  1  0 |  0  0
 0  1  1 |  0  0
 1  0  0 |  0  0
 1  0  1 |  0  0
 1  1  0 |  1  0
 1  1  1 |  0  1
\[ z_0 = \bar{x}_0 \bar{x}_1 \bar{x}_2 + \bar{x}_0 \bar{x}_1 x_2 + x_0 x_1 \bar{x}_2 = \bar{x}_0 \bar{x}_1 + x_0 x_1 \bar{x}_2 \tag{8.10} \]
\[ z_1 = \bar{x}_0 \bar{x}_1 \bar{x}_2 + \bar{x}_0 \bar{x}_1 x_2 + x_0 x_1 x_2 = \bar{x}_0 \bar{x}_1 + x_0 x_1 x_2 \tag{8.11} \]
here:
• PROM implementation realizes all of the 8 product terms
AND plane (x0 x1 x2) | OR plane (z0 z1)
0 0 X | 1 1
1 1 0 | 1 0
1 1 1 | 0 1
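Pseudo nMOS PLA: pull-up by a high-resistance pMOS transistor with permanently grounded
gate input
Since the AND-OR structure is not well suited to MOS circuit technology, both the AND and the OR
plane are implemented using distributed NOR or NAND gate structures based on De Morgan's
law:
1. INV-NOR-NOR-INV structure:
\[ a b + c d = \overline{(\bar{a} + \bar{b})} + \overline{(\bar{c} + \bar{d})}
 = \overline{\overline{\,\overline{(\bar{a} + \bar{b})} + \overline{(\bar{c} + \bar{d})}\,}} \tag{8.13} \]
Reading the right-hand side from the inside out: the bars on the single literals are the input
inverters (INV), the bars over each sum are the NOR gates of the AND-plane, the bar over the
whole sum is the NOR gate of the OR-plane, and the outermost bar is the output inverter (INV).
Example:
\[ z_0 = \bar{x}_0 \bar{x}_1 + x_0 x_1 \bar{x}_2
 = \overline{\overline{(\bar{x}_0 \bar{x}_1 + x_0 x_1 \bar{x}_2)}} \tag{8.14} \]
\[ \phantom{z_0} = \overline{\overline{\,\overline{(x_0 + x_1)} + \overline{(\bar{x}_0 + \bar{x}_1 + x_2)}\,}} \tag{8.15} \]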
Properties:
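2. NAND-NAND structure:
\[ a b + c d = \overline{\overline{a b + c d}} = \overline{\overline{(a b)}\;\overline{(c d)}} \tag{8.16} \]
Example:
\[ z_0 = \bar{x}_0 \bar{x}_1 + x_0 x_1 \bar{x}_2
 = \overline{\overline{(\bar{x}_0 \bar{x}_1)}\;\overline{(x_0 x_1 \bar{x}_2)}} \tag{8.17} \]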
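That both structures realize the same sum-of-products function a b + c d follows from De Morgan's law and can be confirmed by exhaustive simulation. The function names in this Python sketch are chosen for illustration only.

```python
from itertools import product

def inv(a):
    return 1 - a

def nor(*xs):
    return int(not any(xs))

def nand(*xs):
    return int(not all(xs))

def sum_of_products(a, b, c, d):
    return (a & b) | (c & d)

def inv_nor_nor_inv(a, b, c, d):
    # input inverters, NOR AND-plane, NOR OR-plane, output inverter
    return inv(nor(nor(inv(a), inv(b)), nor(inv(c), inv(d))))

def nand_nand(a, b, c, d):
    # NAND AND-plane followed by NAND OR-plane
    return nand(nand(a, b), nand(c, d))

for bits in product((0, 1), repeat=4):
    assert inv_nor_nor_inv(*bits) == sum_of_products(*bits) == nand_nand(*bits)
```

The NOR-based version needs inverters at the inputs and at the output, whereas the NAND-NAND version works on the uninverted signals directly.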
Properties:
NOR gates with a large number of inputs should be avoided in CMOS, because the p-channel
devices are in series.
Static CMOS PLAs are usually realized with a NAND-INV-INV-NAND structure in order to
avoid long chains of pMOS transistors. Properties:
• working fast
• fast
• 2-phase clocking
• states of φ1 :
φ1 = 1:
– no path to ground
– inputs change
• to reduce noise: locally grounding the PLA; use of metal lines for power supply whenever
possible (reduced impedance)
Logic Minimization
• if a term is needed both positive and negative, a reduction can sometimes be achieved
using negative logic
Example:
\[ z = x_1 + x_0' x_1' x_2 + x_0 x_1' x_2' \]
=⇒ 3 product terms
\[ \begin{aligned}
z' &= (x_1 + x_0' x_1' x_2 + x_0 x_1' x_2')' \\
   &= x_1' \, (x_0' x_1' x_2)' \, (x_0 x_1' x_2')' \\
   &= x_1' \, (x_0 + x_1 + x_2') \, (x_0' + x_1 + x_2) \\
   &= (x_1' x_0 + x_1' x_2') \, (x_0' + x_1 + x_2) \\
   &= x_0 x_1' x_2 + x_0' x_1' x_2'
\end{aligned} \]
=⇒ 2 product terms
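The reduction can be checked by comparing both forms over all eight input combinations. The sketch below assumes that the complement marks of the example are placed as z = x1 + x0'·x1'·x2 + x0·x1'·x2'; this placement is an assumption, chosen because it is consistent with every step of the derivation.

```python
from itertools import product

def z(x0, x1, x2):
    """Positive-logic form with three product terms (assumed placement of complements)."""
    return x1 | ((1 - x0) & (1 - x1) & x2) | (x0 & (1 - x1) & (1 - x2))

def z_complement(x0, x1, x2):
    """Negative-logic form with two product terms."""
    return (x0 & (1 - x1) & x2) | ((1 - x0) & (1 - x1) & (1 - x2))

for bits in product((0, 1), repeat=3):
    assert z_complement(*bits) == 1 - z(*bits)
```

Implementing the complemented output therefore saves one row of the PLA, at the cost of an inverting output buffer.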
Folding
An advantage of multiple-sided access and folding is the decreased layout area, but the layout
structure changes and the wiring becomes more difficult.
Delay is determined by
Minimum Delay:
• (W/L)_{OR plane} = e · (W/L)_{AND plane}
Limitations:
• the stage sizing factor e for successive stages cannot always be realized due to the
floorplan
PLA generator flow (figure):
• logical optimization of the truth table (= matrix)
• cells: input/output buffer, clock driver, VDD/VSS cells, Schmitt trigger, ...
• floorplanner: structure of PLA
• output: layout with mask data
PLA adderpla;
INPUT: I1,I2,I3;
OUTPUT: O1,O2;
PRODUCT: P1,P2,P3,P4,P5,P6,P7;
AND_BEGIN
P1 := I1 * I2;
P2 := I1 * I3;
P3 := I2 * I3;
P4 := I1 * I2' * I3';
P5 := I1' * I2 * I3';
P6 := I1' * I2' * I3;
P7 := I1 * I2 * I3;
AND_END
OR_BEGIN
O1 := P1 + P2 + P3;
O2 := P4 + P5 + P6 + P7;
OR_END
11X 10
1X1 10
X11 10
100 01
010 01
001 01
111 01
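The personality matrix above can be evaluated directly: a row fires when all of its non-X entries match the inputs, and each output is the OR of the rows that select it. The Python sketch below assumes that the three left columns are I1, I2, I3 and the two right columns are O1, O2, in that order; the function names are illustrative.

```python
from itertools import product

PERSONALITY = """11X 10
1X1 10
X11 10
100 01
010 01
001 01
111 01"""

def parse(text):
    """Split each row into its AND-plane part and its OR-plane part."""
    return [tuple(line.split()) for line in text.splitlines()]

def evaluate(rows, inputs):
    outputs = [0] * len(rows[0][1])
    for and_part, or_part in rows:
        # the product term is active if every non-don't-care entry matches
        if all(p == "X" or int(p) == x for p, x in zip(and_part, inputs)):
            for k, bit in enumerate(or_part):
                outputs[k] |= int(bit)
    return tuple(outputs)

rows = parse(PERSONALITY)
for i1, i2, i3 in product((0, 1), repeat=3):
    carry, total = evaluate(rows, (i1, i2, i3))
    assert 2 * carry + total == i1 + i2 + i3   # behaves as a full adder
```

Under these assumptions O1 is the carry and O2 the sum of a full adder, which matches the name adderpla used in the description.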
Finite-State Machine
A typical digital circuit architecture for computation-intensive applications consists of a
data-path and a controller. The data-path is formed by a number of arithmetic units such as adders,
ALUs and multipliers, connected through a network of wires, busses, multiplexors and
registers. Registers are required to separate computational stages from each other (to synchronize
computations) or to feed back data for further arithmetic operations (to break up
circuit loops).
However, no circuit can be realized through a data-path alone, since the data-path has to be
controlled to perform actual computations. Signals are required, for example, to select the functionality
of an ALU, to steer data through multiplexors to a dedicated input of an arithmetic unit, or to
control the loading of values into registers. These signals are provided by a control unit,
or controller for short. To support a hierarchical design approach, data-path and controller are
always regarded separately, as shown in figure 8.48. The control section provides the control
signals required by the data-path and, in turn, reads status information such as
overflow flags or comparator results (to control loop execution etc.). A typical control task
is the instruction set execution of standard microprocessors. In simplified form, the controller
can work in the following way:
Different steps are required to fetch, decode and execute an instruction. Depending on the
decoding of the instruction, a dedicated sequence of steps will be executed. During each step
an output vector is produced to control the data-path (e.g. to switch a multiplexor to select
a certain operand or to determine the ALU operation to be performed on the operands). In this
example the controller also receives signals from the data-path, e.g. the instruction decoding
information in step 3, to be able to branch into the corresponding instruction execution
sequence.
The question now arises how such a controller can be specified and designed. Combinational
circuit specification through Boolean equations provides a good model for the behaviour of
memoryless digital circuits. However, a controller realization cannot
be memoryless, because the controller passes through a sequence of steps which
is generally influenced by signals read from the data-path. During each step an
output vector has to be produced to control the data-path. A controller can therefore
be regarded as a black box with an input and an output vector, where the values of the output
vector depend on the current step. A certain step is reached through a sequence of preceding
steps, which means that the value of the output vector depends on the history of the
circuit. Such behaviour is only possible when memory is available.
Synchronous digital circuits which comprise memory elements are called sequential circuits
since the results produced at the primary outputs generally depend on the values at the
primary inputs and the history of the circuit. History in this context means the values of all
registers in the current step (or state) which received those values before the actual clock cycle
(in the past). Therefore, during its operation the circuit will run through a sequence of states
represented through the register contents.
Each sequential circuit can be represented in a way as depicted for the controller on the left
side of figure 8.48 if all registers are collected into the state register, all combinational logic
producing the contents of the registers into the next state logic and all combinational logic
producing the primary output values into the output function.
Due to the existence of memory, combinational circuit theory is not a well-suited model for the
description of controllers or any other sequential logic. Since a controller can be regarded as a
special case of sequential logic (and one is interested in a general approach that copes
with all sequential logic circuits), the more general term sequential logic will be used
in the remainder of this section. Figure 8.49 shows a small example of a sequential circuit. Although
it is possible in principle to replace the registers by the corresponding combinational
circuits and to open the feedback loop such that combinational circuit theory can be applied,
a more abstract description of the behaviour is desirable. This is especially true for complex
controllers, where a designer does not want to be concerned with too many circuit details.
Fortunately, the theory of finite state machines provides an abstract basis for the modelling
of sequential logic.
In this section we will show how a sequential circuit can be seen as one of several possible
implementations of a particular finite state machine (FSM). Each FSM has a finite set of
discrete states as well as a finite set of digital inputs and outputs and a set of rules that
govern its behaviour. An FSM operates in discrete time, so its behaviour can be characterized
as a sequence of steps that occur at regular intervals (all registers are synchronously clocked).
An FSM’s inputs, outputs, and state are assumed to be constant during each interval, changing
only at the boundaries between consecutive intervals (the registers are triggered with the rising
or falling clock edge). In summary, an FSM is defined in the following way:
A finite state machine is a digital device having
• a finite set of states S_1, S_2, ..., S_k (where k is the number of states). Optionally one of
these, S_I, is distinguished as the initial state of the FSM
• a set of state-transition rules specifying, for each choice of current state S_S and input
values I_1, I_2, ..., I_m, a next state S_S'
• a set of output rules specifying, for each choice of current state S_S and input values
I_1, I_2, ..., I_m, the binary value at each output
One distinguishes between two types of finite state machines, namely the Moore machine and
the Mealy machine. The two types differ in the last of the items listed above, the output rules. In
a Moore machine the outputs are functions of the current state only. In figure 8.48 this would
mean that the control inputs go only into the next-state function block and not into the output
function block.
The alternative Mealy machine model allows outputs to reflect current inputs as well as current
state. Therefore, figure 8.48 represents a Mealy machine. The behaviour of every FSM can be
described using either model, although the number of states and timing details will generally
differ.
The Moore machine has some advantages for theoretical reasoning and is therefore generally
used in proofs; the Mealy machine, however, is preferred in actual circuit implementations
since it generally requires fewer states (which means less logic for its realization) and
it can respond immediately to changes of the input vector (a Moore machine first has to
branch into a new state, since its output values depend only on the state information). Practical
FSM implementations typically have a reset input, which returns the FSM to a well-defined
initial state such that the automaton can be reset before a new input sequence is applied (e.g.
when the system containing the FSM is turned on).
Returning to the circuit of figure 8.49, one can identify the discrete states by tabulating
combinations of values for its state variables. If q0 and q1 are used to denote the values of
the state variables in the current state, and n0 and n1 to denote the values in the succeeding
state, the following equations will describe this circuit:
\[ n_0 = \mathit{in} \cdot \bar{q}_1 \]
\[ n_1 = q_0 \]
\[ \mathit{out} = q_1 \cdot q_0 \]
The state-transition and output rules are shown in the truth table of table 8.1, which lists
all possible combinations of current state and input variables on the left side, and the next
state which the machine should enter on the right side along with the corresponding output.
These tables can be easily obtained from the implementation of the FSM. For example, if in
the circuit above, q1 = 0, q0 = 0, and in = 0, then the next state that results is q1 = 0, q0 = 0.
If in = 1, the next state will be q1 = 0, q0 = 1.
The state-transition table immediately suggests a ROM implementation of the FSM, the left-
hand side of the table being the address of the ROM and the right-hand columns being data
outputs.
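The ROM view can be sketched in Python by tabulating the three circuit equations for all combinations of q1, q0 and in, and then using the table as a lookup memory. The names next_state_and_output, rom and run are chosen for this illustration; the address is the triple (q1, q0, in) and the data word is the next state (n1, n0) together with out.

```python
from itertools import product

def next_state_and_output(q1, q0, inp):
    n0 = inp & (1 - q1)    # n0 = in AND NOT q1
    n1 = q0                # n1 = q0
    out = q1 & q0          # out = q1 AND q0
    return (n1, n0), out

# ROM contents: address (q1, q0, in) -> data ((n1, n0), out)
rom = {(q1, q0, inp): next_state_and_output(q1, q0, inp)
       for q1, q0, inp in product((0, 1), repeat=3)}

def run(inputs, state=(0, 0)):
    """Clock the FSM once per input value, starting in state (q1, q0)."""
    outputs = []
    for inp in inputs:
        state, out = rom[(*state, inp)]
        outputs.append(out)
    return outputs, state

print(run([1, 1, 1, 0]))  # ([0, 0, 1, 0], (0, 0))
```

The two table rows quoted in the text correspond to rom[(0, 0, 0)] and rom[(0, 0, 1)], which give the next states (0, 0) and (0, 1).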
The final and most abstract representation for a finite-state machine is a state-transition
diagram. In such a diagram, states are shown as circles. Outputs associated with the state
are given inside the circle. Transitions between states are represented as directed arcs from
one circle to another. The input combination that causes a given transition is written along
the arc. Since we are dealing with clocked sequential machines, transitions only occur on
clock edges, and for this reason the clock is not explicitly shown on state-transition diagrams.
Figure 8.50 gives the state-transition diagram for the FSM discussed above.
The realization of clocked sequential circuits is a fairly straightforward process with four
main steps.
The first step is to draw a state-transition diagram for the FSM. This is often a very difficult
step, since it requires thinking very precisely about what the FSM is supposed to do. Next,
determine the number of state variables (and therefore registers) from the number of states
in the state-transition diagram and assign a binary encoding to each state. This assignment
can be done arbitrarily; however, this might result in an inefficient solution. An optimal
state assignment is of major importance for the amount of combinational circuitry required to
implement the FSM. Unfortunately this problem is NP-hard, which means that it is suspected
to require computation time that grows exponentially with the problem size. The
importance of an appropriate state encoding will be illustrated at the end of this subsection.
Then, based on the state-transition diagram, a state-transition table has to be built. It is
important that the table covers all possible input combinations for each possible state (if
a combination does not occur, don’t cares should be inserted, which can be exploited during
combinational logic minimization). From the table, the circuit can be directly implemented
with ROMs. If another implementation is required (logic gates, for example), Karnaugh maps
have to be developed from the state-transition table for each next-state variable. Finally,
a reduced sum-of-products expression has to be found for each variable, which can be implemented
through appropriate combinational logic.
To illustrate these steps, consider the design of a simple FSM whose single output goes high
every five clock cycles and remains high for one clock period. The frequency of the output
pulses is one-fifth that of the clock. This type of circuit is called a divide-by-5 counter. This
machine has no external inputs. Its state-transition diagram is shown in figure 8.51. A state
assignment and a state-transition table for this counter are given in table 8.2. Please note
that the choice of 3 bits for the state encoding as well as the actual encoding of each state
were made arbitrarily. A larger number of bits or another encoding could have been selected!
The table can now be realized through e.g. ROMs or using explicit combinational logic (realized
as two-level or multilevel gates, PLA etc.).
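A ROM-style realization of the divide-by-5 counter can be simulated in a few lines of Python. The state assignment used here is an assumption made for this sketch, not necessarily the one of table 8.2: the five states are encoded as the binary numbers 000 to 100, the output is high in the last state, and the three unused codes are don't cares that are resolved to "go to state 000".

```python
NUM_STATES = 5

# ROM contents: 3-bit state code -> (next state code, output)
rom = {}
for code in range(8):
    if code < NUM_STATES:
        next_code = (code + 1) % NUM_STATES
        out = int(code == NUM_STATES - 1)   # assumed: output high in the last state
    else:
        next_code, out = 0, 0               # unused codes: don't cares, resolved here
    rom[code] = (next_code, out)

def run(cycles, state=0):
    """Return the output sequence over the given number of clock cycles."""
    outputs = []
    for _ in range(cycles):
        next_state, out = rom[state]
        outputs.append(out)
        state = next_state
    return outputs

print(run(10))  # [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```

The output is high for exactly one clock period out of five, as required. A logic-gate realization would instead exploit the three don't-care codes during minimization.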
Figure 8.52 shows another example, which will be used in the following to illustrate the importance
of state encoding. The behaviour of the FSM is given in transition table
8.3. State encoding is the process of assigning a unique bit vector to each state of the
FSM, e.g. the following two encodings can be selected:
Encoding 1    Encoding 2
S0 = 00       S0 = 00
S1 = 01       S1 = 11
S2 = 11       S2 = 01
The number of possible encodings is $k!/(k-m)!$,
with $k = 2^n$ (n is the number of selected state bits) and m being the number of states to be
encoded. Typically n is chosen as $n = \lceil \log_2 m \rceil$. However, other values are possible for n, e.g.
one bit per state!
In the example above: $k = 2^2 = 4$ and $m = 3$. With these constraints the number of
possible encodings is $4!/(4-3)! = 24$. Different encodings result in realizations of different
complexity.
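The count of 24 can be reproduced by enumerating all assignments of distinct 2-bit codes to the three states. In this Python sketch the function name num_encodings and the variable names are illustrative; the formula is the one used in the example, k!/(k−m)! with k = 2^n.

```python
from itertools import permutations
from math import ceil, factorial, log2

def num_encodings(m, n=None):
    """Number of ways to assign distinct n-bit codes to m states."""
    if n is None:
        n = ceil(log2(m))          # minimum number of state bits
    k = 2 ** n
    return factorial(k) // factorial(k - m)

states = ["S0", "S1", "S2"]
codes = ["00", "01", "10", "11"]
all_encodings = [dict(zip(states, p)) for p in permutations(codes, len(states))]

assert len(all_encodings) == num_encodings(3) == 24
assert {"S0": "00", "S1": "01", "S2": "11"} in all_encodings   # encoding 1
assert {"S0": "00", "S1": "11", "S2": "01"} in all_encodings   # encoding 2
```

With one bit per state (n = 3 for three states) the formula already gives 8!/5! = 336 candidates, which shows how quickly exhaustive search becomes impractical.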
The first state encoding was
S0 = 00, S1 = 01, S2 = 11
out = abc
y1 = āb̄c
y2 = ab̄ + b̄c + ac
The number of product terms is 5 and the number of literals is 12. A resulting hardware
implementation using combinational logic is shown in figure 8.53. The second encoding was
S0 = 00, S1 = 11, S2 = 01
out = ābc
y1 = ab̄
y2 = ab̄ + bc
The number of product terms is now 4 and the number of literals is 9. Figure 8.54 shows a
corresponding realization.
As one can see, state encoding is crucial for the efficiency of the final solution. Unfortunately, no
algorithm is known that finds an optimal assignment with a complexity bounded by a
polynomial expression. A good heuristic is to simply select an encoding where only one bit
changes when sequencing from state to state (Gray code). Another good approach can be
one-hot encoding (where a single bit represents each state), which is of course restricted to a
small number of states.
Although it is possible to base FSM realizations on self-timed or other timing disciplines, most
FSM implementations are based on a synchronous, single-clock scheme. As already mentioned
in connection with figure 8.48 a general sketch of an implementation strategy using the Moore
machine model (outputs are functions only of current state, independently of current inputs)
is shown in figure 8.55. One should note the use of a clocked register to hold the current state
information. All other blocks are combinational logic components which can be realized in
different ways (PLA, ROM, dedicated logic circuits etc).
The inputs of such a circuit have to be synchronous with the FSM’s clock, because
all outputs of the next-state logic have to have settled before the values are loaded
into the registers at the rising clock edge. In the case of asynchronous transitions, invalid values
might be loaded or the registers might enter meta-stable states.
Asynchronous inputs can be treated as shown in figure 8.56. The synchronisation through
additional clocked registers guarantees that the inputs to the state register are stable at each
active clock edge, assuming of course that the propagation delay along the combinational
path through the logic is shorter than the clock period minus the setup time of the state
register. Moreover, although metastable behaviour of the input register remains a possibility,
its output has a clock period (minus the next-state propagation delay) to become valid before
it can corrupt the contents of the state register. Thus, for sufficiently long clock periods, this
latter design can be made arbitrarily reliable.
It is important to recognize that the implementations of figures 8.55 and 8.56 behave slightly
differently, owing to the extra clock delay in the inputs of figure 8.56. Given identical next-
state logic, identical input sequences will yield output sequences delayed by one clock cycle in
the second approach.
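The one-cycle delay can be made visible with a small clock-by-clock simulation. The following Python sketch is only an illustration: the function run_moore, the two-state toggle machine and the reset value 0 of the input register are assumptions and do not correspond to the circuits of figures 8.55 and 8.56 in detail.

```python
def run_moore(next_state, output, start, inputs, synchronize=False, reset_input=0):
    """Simulate a clocked Moore FSM; returns the output seen in each clock cycle."""
    state = start
    input_register = reset_input  # only used when synchronize is True
    outputs = []
    for x in inputs:
        outputs.append(output[state])        # Moore output: function of the state only
        seen = input_register if synchronize else x
        state = next_state[(state, seen)]    # loaded at the active clock edge
        input_register = x                   # the input register samples x at the same edge
    return outputs


# Hypothetical two-state example: the state toggles whenever the input is 1.
NEXT_STATE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
OUTPUT = {0: 0, 1: 1}

stimulus = [1, 0, 1, 1, 0, 0]
direct = run_moore(NEXT_STATE, OUTPUT, 0, stimulus)
synced = run_moore(NEXT_STATE, OUTPUT, 0, stimulus, synchronize=True)
print(direct)
print(synced)
```

In this example the synchronized output sequence equals the direct one shifted by one clock cycle, because the reset value of the input register leaves the start state unchanged.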
Most real digital systems are finite-state machines, yet the view and techniques introduced
in this chapter are not appropriate in every circumstance. The binary encoding of an FSM’s
state allows at most 2^k states to be represented in k bits of state variables, and in general
about k flip-flops are required to hold the state of a 2^k-state machine. Adding a single flip-flop
to a machine potentially doubles its number of states. This exponential relationship between
the number of states and the amount of physical hardware in a sequential circuit leads the
FSM model to become awkward in dealing with sequential circuits having more than a few
bits of storage. A 10-bit register, for example, would be quite difficult to characterize by a
state-transition diagram; the number of states of a supercomputer is inconceivably large.
Typically, such systems are viewed in terms of memory cells and registers, partitioning the
enormous state into more tractable units. It is important to recognize that sequential circuits
may be viewed either in state or in bit terms, that the two are exponentially related, and that
it is often useful to change between these views.
Therefore, the reader should be aware that it makes no sense to apply the FSM model to every
type of sequential circuit. However, the FSM model is very well suited to support the design
of controllers, since their number of states is reasonably small.
The input/output behaviour of two FSMs may be identical even though the machines have
different transition and output rules or even different numbers of states. As a degenerate
example, consider two single-input FSMs whose output remains constant, independent of
their state. From external observations it is impossible to distinguish between the states of
such machines – one might have one state and the other nine, yet the machines are externally
indistinguishable. We call FSMs equivalent if they are indistinguishable; for all practical
purposes, equivalent FSMs are interchangeable. The equivalence of FSMs is therefore important
for their construction, since the designer is interested in transforming an initial FSM
specification into an equivalent machine which can be realized most efficiently on silicon while
meeting all required constraints. It is thus useful to develop the notion of equivalence together
with engineering tools for reducing a specified FSM to a simpler equivalent.
The terms state equivalence and FSM equivalence are defined in the following way:
State equivalence: Let s1 and s2 be particular states of FSMs M1 and M2 . State s1 of M1
is equivalent to state s2 of M2 if and only if for every finite sequence of
inputs, the outputs resulting from the application of that sequence to
M1 in s1 are identical to the outputs resulting from the application of
the same sequence to M2 in s2 .
Thus two states are not equivalent only if there exists a finite input sequence that leads them
to produce distinct outputs. The notation M : s will be used to specify state s of machine M .
FSM equivalence: Let s1 and s2 be initial states of FSMs M1 and M2 . Then the machines
M1 and M2 are equivalent if and only if M1 : s1 is equivalent to M2 : s2 .
Given an FSM that solves some practical problem, one is often interested in finding the
smallest equivalent FSM in order to minimize costs. While several measures of ‘smallest’
might be proposed, a natural candidate (and usual choice) is the number of FSM states. Thus
one seeks to perform a state reduction on a given FSM M1 to yield an equivalent M2 having
fewer states. In general, this may be done by detecting and merging equivalent states within
M1 .
For example, one can look for pairs M1 : si and M1 : sj that are equivalent. When such a
pair is found, the two states can simply be combined into a single state, yielding an equivalent
FSM with one fewer state. This process of looking for equivalent states can be continued in the
new FSM and terminates when a pair of equivalent states can no longer be found. This is an
example of a relaxation algorithm, in which a set of reduction rules is repeatedly applied to
reduce a structure until it can be reduced no more. It begins with a pessimistic but working
model of the desired FSM and iteratively improves the cost while maintaining equivalence.
This approach has the disadvantage that the equivalence of two states can be difficult to detect.
Rather than incrementally improving an initial pessimistic model, the optimistic relaxation
approach begins with the assumption that all of the states of M1 are equivalent (yielding a
one-state machine). The relaxation iteratively discovers pairs of presumed equivalent states
that cannot in fact be equivalent and grudgingly splits them into their components. This
scheme is based on the detection of state nonequivalence through the following two rules:
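1. If states si and sj produce different outputs, then si and sj are nonequivalent
2. If, for some input, state si1 makes a transition to state si2 and state sj1 makes a transition
to state sj2 , where si2 and sj2 are nonequivalent, then si1 and sj1 are nonequivalent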
Beginning with the unrealistic assumption that all states are equivalent, iteration of the above
rules will uncover more and more nonequivalent pairs of states until every pair that has not
been shown nonequivalent is in fact equivalent.
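The optimistic relaxation can be sketched in a few lines of Python. The function name equivalence_classes and the data layout are assumptions made for illustration. The five-state machine used as input is a hypothetical one chosen to be consistent with the outputs and transitions quoted in the worked example of this section; the actual diagram of figure 8.57 is not reproduced here. For brevity the sketch starts directly with the split by output value instead of a single aggregate state.

```python
def equivalence_classes(states, inputs, output, delta):
    """Split classes of presumed equivalent states until no rule applies any more."""
    # Rule 1: states with different outputs are nonequivalent.
    block = {s: output[s] for s in states}
    while True:
        # Rule 2: states whose successors lie in different classes are nonequivalent.
        signature = {
            s: (block[s],) + tuple(block[delta[s][x]] for x in inputs)
            for s in states
        }
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        if len(ids) == len(set(block.values())):
            break  # no class was split in this pass
        block = refined
    classes = {}
    for s in states:
        classes.setdefault(block[s], []).append(s)
    return sorted(classes.values())


# Hypothetical machine (assumed, see lead-in): output and next state per input 0/1.
STATES = ["S0", "S1", "S2", "S3", "S4"]
OUTPUT = {"S0": 1, "S1": 0, "S2": 0, "S3": 1, "S4": 0}
DELTA = {
    "S0": {0: "S1", 1: "S0"},
    "S1": {0: "S2", 1: "S3"},
    "S2": {0: "S2", 1: "S4"},
    "S3": {0: "S4", 1: "S0"},
    "S4": {0: "S2", 1: "S3"},
}

print(equivalence_classes(STATES, [0, 1], OUTPUT, DELTA))
```

Every pass either splits at least one class or terminates, so the loop runs at most as many times as there are states.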
Consider e.g. the FSM diagrammed in figure 8.57. The search for a reduced equivalent starts
by constructing a truth table for output and transition rules for a one-state equivalent:
New state                  Output   Transition on 0   Transition on 1
S0 = S1 = S2 = S3 = S4     X
In the course of building the table, it has to be checked that each output and next-state value
for a merged state is consistent with each of the component states from the original FSM. In
this first step, an inconsistency will be detected immediately: It is impossible to put a value
into the output column for the single combined state that is consistent with all five component
states. Thus the aggregate state has to be split into two new states for the next iteration, with
output values of 0 and 1. One partitions the five-state aggregate into one state corresponding
to the original S0 and S3 states with a 1 output, and a second state corresponding to the
original states with a 0 output. One then attempts to fill out the truth table again:
New state          Output   Transition on 0   Transition on 1
S0 = S3            1        S1 = S4           S0
S1 = S2 = S4       0        S2                X
This time the table could be nearly completed. A single inconsistency is encountered when
trying to assign a transition for the S1 = S2 = S4 state on a 1 input: In the original machine,
S1 and S4 both go to S3 in this case, while S2 goes to S4 . Since the respective next states S3
and S4 are not equivalent, S2 has to be split into a separate state. This results in:
New state          Output   Transition on 0   Transition on 1
S0 = S3            1        S1 = S4           S0
S1 = S4            0        S2                S3
S2                 0        S2                S4
The corresponding state-transition diagram is shown in figure 8.58. The reader might verify
Regular expressions are a commonly used notation for describing simple classes of strings
and symbols. For the purpose of this subsection the following regular-expression syntax for
describing strings of uppercase letters will be used:
1. Finite strings of symbols (letters), including the empty string (which will be written
as ε), are regular expressions. Thus ε, A, and ABCAABCAAABB are valid regular
expressions, each denoting a set containing only the specified string of zero or more
letters
2. If p and q are regular expressions, then pq is a regular expression denoting the set of
strings formed by concatenating a string from p with a string from q
3. If p and q are regular expressions, then p | q is a regular expression denoting the set of
strings that includes both the strings denoted by p and the strings denoted by q. Thus
A | B is a regular expression defining a set containing the strings A and B
4. If p is a regular expression, then (p) is a regular expression denoting the same set of
strings; parentheses are used to disambiguate – for example, to distinguish (AB) | C
from A(B | C)
5. If p is a regular expression, then p∗ is a regular expression denoting all strings that are
concatenations of finitely many (zero or more) strings denoted by p. Thus A∗ denotes
the set of strings containing the empty string as well as every string consisting of finitely
many As; A(A | B)∗ B denotes the set of all strings of As and Bs that begin with A and
end with B
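The five rules above map directly onto the syntax of common regular-expression libraries. As a hedged illustration, the following sketch uses Python's standard re module, whose concatenation, |, parentheses and * operators behave like rules 2 to 5; the helper name denotes is an assumption, and re.fullmatch is used so that the whole string has to belong to the denoted set.

```python
import re


def denotes(expression, string):
    """True if `string` is in the set of strings denoted by `expression`."""
    return re.fullmatch(expression, string) is not None


# A(A | B)* B: strings of As and Bs that begin with A and end with B
for candidate in ["AB", "AABAB", "A", "BAB"]:
    print(candidate, denotes(r"A(A|B)*B", candidate))

# Parentheses disambiguate: (AB) | C versus A(B | C)
print(denotes(r"(AB)|C", "C"), denotes(r"A(B|C)", "C"), denotes(r"A(B|C)", "AC"))

# A* also contains the empty string
print(denotes(r"A*", ""))
```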
An interesting property of regular expressions is that each regular expression defines a set
of strings that can be recognized by a finite state machine. It is assumed that the input to
the FSM is a sequence of symbols (in this case, encoded uppercase letters) and that each
consecutive input symbol can cause a transition from the current FSM state to a new state.
At any time when the sequence of input symbols corresponds to a string to be recognized, the
FSM is in a distinguished state marked R; several states may be marked in this way. The
starting state will be marked S. The FSM of figure 8.59, for example, recognizes the strings
B(AB)∗ . Note that transitions corresponding to input strings that are not recognized (such as
those containing the letter C) are omitted. The selected convention is that such strings cause
implicit transitions to a BAD state, which causes the entire input sequence to be rejected.
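A recognizer of this kind can be written as a transition table with an implicit BAD state. The Python sketch below recognizes B(AB)∗; since figure 8.59 is not reproduced here, the state name T and the exact shape of the table are assumptions, while S, R and BAD follow the conventions described above.

```python
# Explicit transitions only; every missing entry leads to the BAD state.
TRANSITIONS = {
    ("S", "B"): "R",   # first B: string "B" is recognized
    ("R", "A"): "T",   # an A must be followed by another B
    ("T", "B"): "R",   # "...AB" completed: recognized again
}


def recognizes(string):
    """Run the FSM from the starting state S; accept if it ends in state R."""
    state = "S"
    for symbol in string:
        state = TRANSITIONS.get((state, symbol), "BAD")  # BAD has no way out
    return state == "R"


for candidate in ["B", "BAB", "BABAB", "", "BA", "BCB"]:
    print(repr(candidate), recognizes(candidate))
```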
Although every regular expression denotes a set of strings recognizable by an FSM, the system-
atic derivation of an FSM recognizer from a regular expression is not entirely trivial. A useful
conceptual tool in dealing with regular expressions is the nondeterministic FSM (NFSM),
whose state-transition diagram is ambiguous in the sense that it may indicate several possible
transitions on a given input symbol. The simple NFSM in figure 8.60 recognizes the strings
A | (AB).
One can view the NFSM as being in several states simultaneously. Its behaviour can be
emulated by hand, using tokens that are moved about on the state-transition diagram to
record active states. One begins with a token on the starting state. At each input symbol,
tokens are placed on each state at the arrow end of a transition from a marked state, and the
previous tokens are removed. Note that at most one token has to be placed in each
state. Whenever one or more states marked R contain a token, the input string is accepted
(recognized) by the NFSM.
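The token model translates naturally into a set of active states. The following Python sketch emulates an NFSM for A | (AB); the intermediate state name X and the transition table are assumptions, since figure 8.60 is not reproduced here.

```python
# Ambiguous transitions: one (state, symbol) pair may lead to several states.
NFSM = {
    ("S", "A"): {"R", "X"},  # A alone is accepted (R) or may be followed by B (X)
    ("X", "B"): {"R"},
}


def accepts(string, start="S", accepting=frozenset({"R"})):
    """Move tokens along all possible transitions; accept if a token ends on R."""
    tokens = {start}                      # one token on the starting state
    for symbol in string:
        # Place tokens on all successor states and remove the previous tokens.
        tokens = set().union(*(NFSM.get((s, symbol), set()) for s in tokens))
    return bool(tokens & accepting)


for candidate in ["A", "AB", "", "B", "ABB"]:
    print(repr(candidate), accepts(candidate))
```

Using a set guarantees that each state carries at most one token.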
It is possible to construct a deterministic FSM that recognizes any regular expression, but
the construction becomes cumbersome when an expression of the form α | β is encountered.
In effect, the FSM under construction must entertain the two alternative forms α and β as
possible inputs until some input symbol rules one or both forms out; this may require a
number of states, each corresponding to some combination of a tentative parse of form α or
an alternative parse of form β. In contrast, the NFSM provides direct accommodation for
alternative input forms by means of ambiguous transitions. The dual paths between the S
and R states of figure 8.60, for example, correspond directly to the alternative input forms A
and AB.
As a further convenience in the construction of NFSMs from regular expressions, the use of
transitions on the empty input string is allowed; such transitions are taken spontaneously by
the NFSM. In the token model, whenever there is an empty transition from a state marked by
a token, the target of the empty transition will be marked as well. Figure 8.61 shows how one
might use empty transitions, designated by ε, to convert the A | (AB) NFSM, for example, to
recognize (A | (AB))∗ .
Nondeterministic FSMs are, in an important sense, no more powerful than deterministic FSMs:
The same set of strings (the ones that can be described by regular expressions) can be rec-
ognized by each. NFSMs, however, provide a primitive model for parallelism because of their
ability to model several discrete states simultaneously. While NFSMs and FSMs perform the
same computations, a deterministic FSM may require exponentially many states compared to
the equivalent NFSM.
The nondeterministic FSM, although not directly realizable in hardware, can be an important
tool in the synthesis of realizable deterministic FSMs that perform useful computations. The
synthesis of an FSM to recognize strings described by the regular expression (A | (AB))∗ ,
for example, might be approached by the straightforward synthesis of the NFSM of figure
8.61 followed by the derivation of an equivalent (but less intuitive) deterministic FSM using
a computer-based algorithm.
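One such computer-based algorithm is the subset construction, in which every state of the deterministic FSM stands for a set of token positions of the NFSM. The Python sketch below is a minimal version under stated assumptions: the NFSM for (A | (AB))∗ with its empty transitions between S and R is a hypothetical stand-in for figure 8.61, empty transitions are labelled with the empty string, and all function names are illustrative.

```python
from collections import deque

EPS = ""  # label used here for transitions on the empty input string

# Hypothetical NFSM for (A | (AB))*: the A | (AB) machine plus empty transitions.
DELTA = {
    ("S", "A"): {"R", "X"},
    ("X", "B"): {"R"},
    ("R", EPS): {"S"},
    ("S", EPS): {"R"},
}


def closure(states, delta):
    """Add every state reachable through empty transitions."""
    stack, seen = list(states), set(states)
    while stack:
        for target in delta.get((stack.pop(), EPS), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return frozenset(seen)


def determinize(start, accepting, delta, alphabet):
    """Subset construction; the empty set plays the role of the BAD state."""
    d_start = closure({start}, delta)
    table, queue = {}, deque([d_start])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = {t for s in current for t in delta.get((s, symbol), ())}
            table[current][symbol] = closure(moved, delta)
            queue.append(table[current][symbol])
    return d_start, {q for q in table if q & accepting}, table


def accepts(dfsm, string):
    start, accepting, table = dfsm
    state = start
    for symbol in string:
        state = table[state][symbol]
    return state in accepting


DFSM = determinize("S", {"R"}, DELTA, "AB")
print(len(DFSM[2]), accepts(DFSM, "AAB"), accepts(DFSM, "ABB"))
```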
8.6.7 Context
Finite-state machines are simultaneously a mathematical abstraction that has received con-
siderable attention from theorists and a practical engineering tool of enormous consequence to
the designer of digital systems. These roles are not independent; the formal study of FSMs has
significantly enriched the repertoire of optimizations and techniques available to the engineer,
while their practical significance stimulates continued attention by theorists.
ASIC Design Process
Chapter 9
Control-dominated:
ASIC Design Styles
Gate Arrays
– Row Structure
– Island Structure
– Matrix of structures (= sea of gates)
Fig. 9.3 shows the basic structure of gate arrays of International Microcircuits Inc. (IMI)
(single metal layer). The real circuit has 1440 cells. In the figure, a reduced number of 40
cells is drawn in order to improve the clarity of the representation.
The gate array consists of the following elements:
• Underpasses to cross under the power and ground buses without contacting them
• Cells containing transistors are clustered around the VDD and VSS buses.
• In each cell four horizontal bars (crossing VDD and VSS ) can be seen. The thick bar
represents a poly underpass while the three thin bars are common poly input lines
to an nMOS/pMOS transistor pair
Figure 9.6: Explanations of grid: (a) basic cell. (b) internal interconnects. (c) basic cell and
crossover (poly) block. (d) XR = transistor. (e) crossover block interconnects
In Fig. 9.6 (b) the internal gate (long horizontal poly lines) and internal diffusion (short
horizontal diffusion lines) are shown. From Fig. 9.6 (d) it can be seen that adjacent nMOS
or pMOS transistors have a common drain/source connection. Contacts for the nMOS source
and drain connections are at both sides of the VSS bus (the same holds for pMOS transistors
and the VDD bus).
Figure 9.11: Personalization for inverter: (a) schematic. (b),(c) IMI layout. (d) CDI layout
Figure 9.13: Layout of transmission gates: (a) single TG. (b) pair of TGs with common output
Advantages:
• Many others for masters, second source fabrication, libraries and design systems
Disadvantages:
Standard Cell Design
Standard Cells:
• Cells in rows: VDD /VSS - lines connected by cell abutment, uniform cell height, variable
width: I/O - connections top and bottom
Advantages:
Disadvantages:
• very complex or large-area functional blocks like RAM, ROM or PLA cannot be inserted
Macro Cell Concept
Figure 9.21: Floor plan for macro cell design style (= building block approach)
Mixed Design Styles
• EEPROM cells
• Power components:
• ASIC-Hybrid combinations
• SC-Filter
• Biquad units
• Temperature sensors
Programmable Logic Devices
• PROM (Programmable Read Only Memory)
Device with fixed AND array and a programmable OR array
1. mask programmable
+ superior speed performance due to internal connections hardwired during man-
ufacture
+ cheap at high volumes
– can only be programmed by manufacturer
– development cycle = weeks or months
2. field programmable
+ immediately programmable
+ at low volumes less expensive than mask-programmable devices
– resistance of programmable routing switches lowers signal performance
• PLA
AND array and OR array programmable
product term sharing: every product term of the AND array can be connected to
any of the OR output gates
• PAL
AND array is programmable and OR array has fixed connection points
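Product term sharing in a PLA can be illustrated with a small behavioural model. The Python sketch below is an assumption-laden example: the function pla, the representation of the AND and OR planes, and the two example functions f0 = ab + b'c and f1 = ab + a'c' are all hypothetical and chosen only to show one product term (ab) feeding two OR gates.

```python
def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA: programmable AND plane followed by a programmable OR plane.

    and_plane: list of product terms, each a dict {input index: required value}
    or_plane:  for every output, the indices of the product terms it sums
    """
    products = [all(inputs[i] == value for i, value in term.items())
                for term in and_plane]
    return [int(any(products[k] for k in terms)) for terms in or_plane]


# Inputs are (a, b, c). Product terms: P0 = a b, P1 = b' c, P2 = a' c'
AND_PLANE = [{0: 1, 1: 1}, {1: 0, 2: 1}, {0: 0, 2: 0}]
# Outputs: f0 = P0 + P1, f1 = P0 + P2 (P0 is shared by both OR gates)
OR_PLANE = [[0, 1], [0, 2]]

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, pla((a, b, c), AND_PLANE, OR_PLANE))
```

In a PAL, where the OR array has fixed connection points, the shared term would have to be programmed once for each output that uses it.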
=⇒ these devices use EPROM cells or EEPROM cells instead of fuses as programmable
connections
=⇒ tendency:
instead of large global logic planes, a block-oriented architecture with local logic
blocks and macrocells and an interconnection network between the blocks is used
• each EP1800 quadrant contains 12 macrocells and has a local bus with 24 lines (for
normal and inverted macrocell outputs) and a local clock
• the global bus has 64 lines and runs through all of the four quadrants (true and
complement signals of 12 inputs (= 24 lines) + true and complement of 4 clocks (= 8 lines)
+ true and complement of I/O-pins of the 4 global macro cells in each quadrant (= 32
lines))
Field Programmable Gate Arrays
• Logic blocks
1. Block organized, SRAM based (internal block structure not restricted to AND–OR)
• Xilinx
• Altera (FLEX)
• Plessey
• AT&T
• ...
• Actel
• Quicklogic
• ...
• Anti-fuses are made with a modified CMOS process involving an extra step
• This step creates a very thin insulating layer that separates two conducting layers
• This insulator is penetrated by applying a high voltage to the two conducting layers (this
process is not reversible)
• The programming voltage must be much higher than the logic threshold, otherwise the
chip would program itself under operation
• Such high voltages can be destructive for CMOS logic circuitry
• Large isolation devices may be required to protect logic gates from the programming
voltage
• Actel PLICE anti-fuses are programmed by placing a relatively high voltage (18V)
across the anti-fuse terminals; a driving current of about 5 mA heats and melts the
dielectric and forms a conductive link between poly-Si and n+ diffusion
• bottom and top layer of the anti-fuse are connected to metal; the overall resistance of
a programmed anti-fuse (from metal to metal) is about 300Ω – 500Ω
• a low resistance path (80Ω) between two metal wires is created by a 10V programming
voltage at the terminals of the anti-fuse
• two-stage look-up tables: two functions of 4 variables or one function of 5 variables can
be implemented
Figure 9.42: Xilinx XC4000 double length lines and long lines
FPGA:
– relatively low speed of operation caused by the resistance and capacitance of pro-
grammable switches in the routing network
– decreased logical density, programmable switches and configuration network require chip
area
MPGA:
– no redesign flexibility
Overview on Logic Design Alternatives
Adders / Subtracters
Chapter 10
Arithmetic Units
In the following chapter, basic arithmetic units like adders, subtracters, or multipliers are
discussed. These components are widely used in VLSI circuits, e.g. in the digital signal
processing application domain. More detailed descriptions of arithmetic units can be found
e.g. in [13] or [3].
A circuit realizing the equations
$$ C = A_1 A_2 \tag{10.1} $$
$$ S = A_1 \oplus A_2 \tag{10.2} $$
is called half–adder and can be used to calculate the sum S of two bits A1 and A2 . A possible
carry is set at the C output.
Full Adder For adding binary numbers with a bitwidth of more than one bit, the
concept of the half–adder has to be extended. The carry outputs of less significant bits
have to be taken into account in the more significant bits. For that, a new
circuit structure called full–adder is used which is based on the following functional equations:
These equations can be realized either by logic gates (AND, OR, XOR) or by two half–adders
and an OR gate.
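The realization by two half–adders and an OR gate can be checked with a short Python sketch. The function names and the argument name carry_in are assumptions; the half–adder follows equations (10.1) and (10.2).

```python
def half_adder(a1, a2):
    """Returns (S, C) according to equations (10.1) and (10.2)."""
    return a1 ^ a2, a1 & a2


def full_adder(a, b, carry_in):
    """Full adder built from two half-adders and an OR gate; returns (sum, carry_out)."""
    partial_sum, carry_1 = half_adder(a, b)
    total, carry_2 = half_adder(partial_sum, carry_in)
    return total, carry_1 | carry_2


# Print the complete truth table of the full adder.
for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            print(a, b, carry_in, full_adder(a, b, carry_in))
```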
The following section introduces the basic arithmetic components used in VLSI designs. First,
adder and subtracter architectures are discussed. Since addition and subtraction for binary
numbers can be calculated by almost the same hardware (by selecting the appropriate
complement representation first), the term “adder” is used as a synonym for both adder and
subtracter in the following section.
Serial Adders
[Figure: serial adder with n-bit shift registers; outputs Sum and Cout]
At the beginning of the operation, the two n–bit operands A and B are loaded into the shift
registers. The carry register is cleared or set to the value of the carry input. During the
next n clock cycles (if a wordlength of n bits for each operand is assumed), the operands are
added bitwise in the full–adder and the result is stored in the sum register. For that, the operand
shift registers apply their least significant bit to the full–adder inputs, whereas the sum shift
register reads the current sum output of the full–adder at its serial input and shifts its contents
by one bit to the right each clock cycle. The carry output of an addition is stored in the carry
register for use in the next clock cycle. The n-bit sum and the carry output are available after
(n+1) clock cycles [1 operand load, n calculation].
The serial adder has the smallest hardware complexity which is wordlength independent (if
the shift registers are not considered) but requires the highest computation time of all adder
implementations.
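The operation of the serial adder can be traced cycle by cycle with the following Python sketch. Registers are modelled as integers; the function name serial_add and the register variable names are assumptions made for illustration.

```python
def serial_add(a, b, n, carry_in=0):
    """Bit-serial addition of two n-bit operands; returns (n-bit sum, carry out)."""
    reg_a, reg_b = a, b          # operand shift registers (loaded in the first cycle)
    reg_sum = 0                  # sum shift register
    carry = carry_in             # carry register, cleared or set to the carry input
    for _ in range(n):           # n calculation cycles
        bit_a, bit_b = reg_a & 1, reg_b & 1          # least significant bits
        sum_bit = bit_a ^ bit_b ^ carry              # full-adder sum output
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        reg_a >>= 1                                  # shift operands to the right
        reg_b >>= 1
        # The sum bit enters at the serial input; the contents shift right by one bit.
        reg_sum = (reg_sum >> 1) | (sum_bit << (n - 1))
    return reg_sum, carry


print(serial_add(0b1011, 0b0110, 4))
```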
Parallel Adders
Ripple Carry Adder Chained full–adders which form an adder of the required wordlength
are called ripple carry adder since during addition the carry “ripples” through the whole chain
from the least significant to the most significant bit as shown in Fig. 10.2:
The addition time is therefore dependent on the wordlength of the operands.
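The chaining of full–adders can be sketched as follows. This Python model is illustrative only; the helper names (full_adder, to_bits, from_bits, ripple_carry_add) are assumptions, and the full–adder is described behaviourally rather than at gate level.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder; returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant bit first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """Chain of full-adders; the carry ripples from stage 0 to the last stage."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # each stage waits for the previous carry
        sum_bits.append(s)
    return sum_bits, carry


sum_bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
```

The loop makes the dependence on the wordlength explicit: one full–adder delay is added per bit position.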
Carry Lookahead Adder To speed up the addition process, lookahead methods can be
applied to reduce the time associated with carry propagation. The carry input of a stage
i is calculated directly from the input of the preceding stages i − 1, i − 2, . . . i − k rather
than allowing carries to ripple from stage to stage. To perform that task, the carry outputs of
ordinary full–adders are replaced by the generate and propagate signals defined by
$$ g_i = a_i b_i \tag{10.5} $$
$$ p_i = a_i + b_i . \tag{10.6} $$
As can be seen in the equations above, the carry lookahead logic can be realized by a
two-level logic implementation. That means that the carries, and hence the whole addition, are
computed in constant time (independent of the wordlength), provided that gates with a
sufficient number of inputs are available. The implementation of the carry lookahead
corresponding to the above equations is shown in Fig. 10.3.
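The two-level structure of the lookahead logic can be made explicit in a short Python sketch. The expanded carry expression given in the comment is the commonly used one and is an assumption here, since the corresponding equations are not reproduced in this section; all function names are illustrative.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least significant bit first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def lookahead_carries(a_bits, b_bits, c0=0):
    """Carry into every stage plus the final carry out, each as one AND-OR expression."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate, equation (10.5)
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate, equation (10.6)
    carries = [c0]
    for i in range(len(g)):
        # Assumed expansion:
        # c[i+1] = g[i] + p[i] g[i-1] + ... + p[i] ... p[1] g[0] + p[i] ... p[0] c0
        products = [g[j] & all(p[j + 1:i + 1]) for j in range(i + 1)]
        products.append(c0 & all(p[:i + 1]))
        carries.append(int(any(products)))        # OR of the AND terms
    return carries


def cla_add(a_bits, b_bits, c0=0):
    """Sum bits use the directly computed carries; no carry ripples between stages."""
    carries = lookahead_carries(a_bits, b_bits, c0)
    sum_bits = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sum_bits, carries[-1]


bits, carry_out = cla_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(bits), carry_out)
```

The number of factors in the longest AND term grows with the stage index, which is where the fan-in restriction of real gates becomes visible.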
[Figure 10.3: carry lookahead adder; inputs A[3:0], B[3:0], Cin; internal carries Cin[3:0]; outputs Sum[3:0], Cout]
The number of gate inputs is restricted due to technological constraints. That means that the
wordlength of a carry lookahead adder cannot be increased arbitrarily. For that reason,
adders for large wordlengths are split into smaller groups processed by single carry lookahead
adders with reasonable wordlengths as shown in Fig. 10.4.
[Figure 10.4: adder split into carry lookahead groups; outputs Cout, Sum[15:12], Sum[11:8], Sum[7:4], Sum[3:0]]
The carry signal produced by a group is forwarded to the next group so that, if each group is
considered as a single block, the carry ripples through the blocks as in the ripple carry
adder. Alternatively, a hierarchical approach might be chosen in which a group-generate as
well as a group-propagate signal are generated for each group and evaluated by a
second-level carry lookahead circuit.
Carry Select Adder In the following adder type, the wordlength of the operands is again
subdivided into clusters (see Fig. 10.5). The cluster subwordlength is chosen to balance the
time required for intra-cluster carry ripple additions and carry calculation of the preceding
clusters. The additions are all performed in parallel for the following two cases: the carry
input of a cluster is ’0’ and the carry input is ’1’. The results (cluster carry out and partial sum
C/Sum[i : j]) are forwarded to multiplexors which select the appropriate value depending on
the carry output of the preceding stages. Since the time to switch a multiplexor is almost
negligible compared to the time required for the carry ripple additions, the overall addition time
is almost independent of the wordlength.
[Figure 10.5: carry select adder; inputs A[15:0], B[15:0], Cin; per 4-bit cluster two CR-Adders with carry input 0 and 1 (a single CR-Adder with Cin for the least significant cluster); multiplexors controlled by C[3], C[7], C[11]; outputs Cout, Sum[15:12], Sum[11:8], Sum[7:4], Sum[3:0]]
Since the carry select adder requires two carry ripple adder chains for each cluster (except for
the least significant one), the amount of hardware is almost twice that of a simple ripple carry
adder. It is slower than a carry lookahead adder, but it has a higher regularity
and is for that reason better suited for VLSI implementation.
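The selection mechanism of the carry select adder can be sketched in Python as follows. The cluster adders are modelled behaviourally with integer arithmetic instead of gate-level carry ripple chains; the function names and the default of four 4-bit clusters are assumptions for illustration.

```python
def cluster_add(a, b, carry_in, width):
    """Behavioural stand-in for a `width`-bit carry ripple adder."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, width=16, cluster=4, carry_in=0):
    """Returns (sum, carry out) of two `width`-bit operands."""
    mask = (1 << cluster) - 1
    result, carry = 0, carry_in
    for low in range(0, width, cluster):
        a_part, b_part = (a >> low) & mask, (b >> low) & mask
        if low == 0:
            # The least significant cluster needs only one adder chain.
            part, carry = cluster_add(a_part, b_part, carry, cluster)
        else:
            # In hardware both candidates are computed in parallel.
            sum0, carry0 = cluster_add(a_part, b_part, 0, cluster)
            sum1, carry1 = cluster_add(a_part, b_part, 1, cluster)
            # Multiplexor controlled by the carry of the preceding clusters.
            part, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= part << low
    return result, carry


print(carry_select_add(0x0FFF, 0x0001))
```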
Carry Save Adder For the addition of very many addends (e.g. in parallel multipliers),
the time required for full carry propagation, even if carry lookahead adders are used,
might be too high for some applications. To achieve constant addition time complexity, the
propagation of computed carry results within the same stage is avoided, and both the S and the
Cout vectors are connected to the correct adder in the succeeding stage. This concept requires
a final addition to merge the sum and the carry vector of the final stage into a single sum
vector which can be realized using any of the adders discussed above (in Fig. 10.6 a carry ripple
adder has been chosen for simplicity). In a carry save adder, the adder delay is increased by
one full-adder delay if it is extended by an additional operand.
[Figure 10.6: Carry save adder array. Rows of full adders add the operands X[n-1:0], Y[n-1:0], W[n-1:0] and V[n-1:0] without carry propagation; a final row of full adders (final carry propagation, drawn as a carry ripple adder) merges the sum and carry vectors into Cout, Sum[n+1], ..., Sum[0]. Annotation in the figure: stages required to evaluate the carry outputs of preceding stages.]
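The carry save principle can be sketched on integers treated as bit vectors. The function names are hypothetical; the model only shows that no carry is propagated inside a stage and that one carry-propagating addition remains at the end.

```python
def carry_save_step(x, y, z):
    """One row of full adders: three operands in, sum and carry vector out."""
    s = x ^ y ^ z                             # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, moved to the next weight
    return s, c


def carry_save_sum(operands):
    """Add many non-negative integers with one final carry-propagating addition."""
    s, c = 0, 0
    for op in operands:
        s, c = carry_save_step(s, c, op)      # each operand costs one full-adder row
    return s + c                              # final carry propagation
```

Each call to `carry_save_step` preserves the invariant that the sum vector plus the carry vector equals the sum of the three inputs, which is why the final addition yields the correct result.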
10.2 Multipliers
Shift and Add Multiplier The most common multiplier is the Shift and Add Multiplier
(SAA Mult.). Two binary unsigned integer words X and Y of bit-size Nx and Ny , respectively,
can be written using their binary representation:
$$X = \sum_{i=0}^{N_x-1} x_i 2^i \qquad Y = \sum_{j=0}^{N_y-1} y_j 2^j \qquad (10.12)$$
$$Z = \sum_{i=0}^{N_x-1} x_i Y 2^i = (\ldots((x_{N_x-1} Y)2 + x_{N_x-2} Y)2 + \ldots)2 + x_0 Y \qquad (10.13)$$
In each step of the recurrence one bit of X is multiplied (a simple AND-operation) with Y and
added to the intermediate result Di, which is shifted by one bit. Figure 10.7 shows the general
structure of the Shift and Add multiplier with bit-sizes Nx and Ny.
For this multiplier type it takes Nx clock cycles to complete the multiplication, since one bit
of X is processed in each step. The delay of the combinational circuit (which determines the
maximum clock frequency) is approximately Ny δFA (δFA is the delay of a full adder; the
register delays are not considered).
The cost of a Shift and Add Multiplier is (3Ny + 2Nx) γFA (the cost of a full adder γFA is
assumed to be equal to the cost of a register).
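The recurrence of Eq. (10.13) can be written out as a short behavioural model. The function name and the explicit bit-size argument are assumptions made for this sketch; each loop iteration corresponds to one clock cycle of the hardware.

```python
def shift_and_add(x, y, nx):
    """Multiply unsigned x (nx bits) by y using the recurrence of Eq. (10.13)."""
    d = 0                                   # intermediate result D
    for i in reversed(range(nx)):           # x_{Nx-1} first, x_0 last
        x_i = (x >> i) & 1
        d = (d << 1) + (y if x_i else 0)    # shift by one bit, then add x_i AND Y
    return d
```

For example, `shift_and_add(13, 11, 4)` runs four iterations, one per bit of X.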
Carry Save Multiplier In contrast to the SAA multiplier, the Carry Save Multiplier
(CSM) calculates the result in one step. Every bit of the first argument is multiplied with
every bit of the second argument concurrently. The results are added up according to the
position of the source bits.
The CSM consists of combinatorial logic only. The multiplication of two 4-bit binary numbers
can be written as
                    X3   X2   X1   X0
                    Y3   Y2   Y1   Y0
---------------------------------------
                    P30  P20  P10  P00
               P31  P21  P11  P01
          P32  P22  P12  P02
     P33  P23  P13  P03
---------------------------------------
Z7   Z6   Z5   Z4   Z3   Z2   Z1   Z0
where Pij = Xi ∧ Yj. The addition of all Pij terms can be done in an array of full adders.
Figure 10.8 shows the general structure of a Carry Save Multiplier assuming Nx ≥ Ny. Part
II is omitted if Nx and Ny are equal. The Carry In of each full adder is supplied in
the upper right corner. Not every full adder needs a Carry In; for some positions half adders
are sufficient. The adder Carry Out is depicted in the lower left corner.
The delay of this type of multiplier is (Nx + Ny − 2) δFA. The cost is (Nx − 1) Ny γFA plus
(2Ny + 2Nx) γFA if the X, Y and Z registers are accounted for as in the shift and add case above.
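As a sketch of the arithmetic only (not of the adder array itself), the partial products and their weighted sum can be modelled as follows. All function names are hypothetical; the delay and cost functions simply restate the formulas given in the text, in units of δFA and γFA.

```python
def partial_products(x, y, nx, ny):
    """Return P[i][j] = x_i AND y_j for all bit pairs."""
    return [[((x >> i) & 1) & ((y >> j) & 1) for j in range(ny)]
            for i in range(nx)]


def array_multiply(x, y, nx, ny):
    """Sum all partial products according to their weight 2**(i+j)."""
    p = partial_products(x, y, nx, ny)
    return sum(p[i][j] << (i + j) for i in range(nx) for j in range(ny))


def csm_delay(nx, ny):
    """Delay in full-adder delays, as stated in the text."""
    return nx + ny - 2


def csm_cost(nx, ny):
    """Cost in full-adder equivalents, including the X, Y and Z registers."""
    return (nx - 1) * ny + 2 * ny + 2 * nx
```

For a 4 x 4 multiplier these formulas give a delay of 6 full-adder delays and a cost of 28 full-adder equivalents.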
Block Multiplier A combination of the fully parallel Carry Save Multiplier and the serial
Shift and Add Multiplier leads to a flexible architecture which can be configured from working
fully serial to working fully parallel. Many combinations in between are possible, thus allowing
the adaptation to given specifications and restrictions.
The basic idea of the block multiplication is to divide each argument into blocks of the same
size. Each block of the first argument is multiplied with each block of the second argument in
a fast Carry Save Multiplier. All calculated block products are added up taking into account
the positions of the current argument blocks. Therefore, as in the Shift and Add Multiplier,
the arguments and the intermediate result have to be shifted in an appropriate way.
[Figure 10.9: Block multiplier. The X register (block width nx) and the Y register (block width ny) feed a Carry Save Multiplier; its (nx + ny)-bit output is accumulated by an adder into the Z register. A controller generates the shift control signals.]
Figure 10.9 shows the architecture of the block multiplier. The argument registers and the
Carry Hold Register are simple shift registers. The intermediate result has to be shifted in
both directions, thus requiring a bidirectional shift register. Signals for controlling the shift
directions are generated by a controller, which can be realized using a simple counter.
The multiplier can be configured by varying the block sizes of the arguments. With increasing
block sizes the multiplier becomes more parallel, thus reducing the number of clock cycles
needed to perform a multiplication. Larger block sizes, however, require a larger Carry Save
Multiplier, which increases the area needed to realize the multiplier. Assuming that the first
argument is separated into kx blocks of size nx and the second argument into ky blocks of size
ny, the multiplier needs kx · ky clock cycles to perform a multiplication. The delay of the
multiplier is determined by the size of the ripple carry adder, which has a width of nx + ny
bits.
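The block scheme can be sketched in a few lines. This is a behavioural model under assumptions: the function name and argument order are hypothetical, the block product is computed with the built-in multiplication in place of the Carry Save Multiplier, and the register shifts are replaced by explicit weights.

```python
def block_multiply(x, y, kx, nx, ky, ny):
    """Multiply via kx*ky block products; return (product, clock_cycles)."""
    z, cycles = 0, 0
    for i in range(kx):
        x_blk = (x >> (i * nx)) & ((1 << nx) - 1)
        for j in range(ky):
            y_blk = (y >> (j * ny)) & ((1 << ny) - 1)
            # One block product per clock cycle, added at its bit position.
            z += (x_blk * y_blk) << (i * nx + j * ny)
            cycles += 1
    return z, cycles
```

With kx = ky = 1 the model degenerates to the fully parallel case (one cycle), while nx = 1 and ky = 1 correspond to the serial shift and add behaviour.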
Chapter 11
Microarchitectures
The term microarchitecture describes the domain between the macroarchitecture (the lowest-
level hardware visible to the user) and the implementation technology (MOS VLSI) [27]. For
better analysis, microarchitectures are usually divided into 3 parts: the data path, which
performs the data manipulations and calculations; the control path, which applies the correct
sequences of control signals to the data path; and the input/output unit, which provides access
from/to the external world (see Fig. 11.1).
[Figure 11.1: Microarchitecture scheme. The control path sends control signals to the data path and receives status flags from it; the input/output unit connects to the external I/O data.]
The control path, which can be interpreted as a more or less complex finite state machine
(FSM), can be either hardwired (used in fixed applications like a controller for the serial
adder in Fig. 10.1) or programmable (microprocessor with downloadable microcode). The
microarchitecture scheme shown in Fig. 11.1 can represent quite simple circuits (like a
traffic light controller) as well as complex microprocessors.
Datapath Design
In the datapath of a microarchitecture, the operations and data manipulations are performed.
The control signals required for this are generated by the control path depending on the
operation(s) to be executed. By forwarding information about the status of the data path
(e. g. exceptional conditions such as underflow, overflow, division by zero, . . . ), the control
path is able to react correctly to the current situation. The state signals (flags) can be used to
enable conditional branching depending on the state of the data path. Data processing is usually
performed by typical components like ALUs, shifters, register files, . . . .
The following section shows how datapath structures are usually implemented in larger VLSI
designs. For that, we assume the following simple datapath structure:
Figure 11.2: Datapath example (input registers for Ain and Bin, ALU, multiplexor with Cin input, shifter, output register Rout; control signals Clock, OP-Sel, Sel, Shift; status flags)
The datapath consists of 2 input registers for the input operands Ain and Bin, an arithmetic-
logic unit (ALU), a multiplexor to select between the Cin input and the ALU output, a
shifter unit, and an output register. The datapath structure could be implemented based on
standard cells, where basic library cells (like gates, muxes, registers, . . . ) are selected and
interconnected, or, if a datapath compiler is used, based on a set of several layout tiles as
shown in Fig. 11.3.
A datapath compiler creates a regular layout depending on the wordlength of the operands by
stacking the appropriate number of tiles in the layout. The horizontal structure consisting of
a set of tiles performing all functions for a single bit is called a bit slice. If we apply vertical cuts
to the layout structure, the whole layout is subdivided into layout blocks, each corresponding to
a single implemented function. These layout stripes are called functional slices.
As an example for a discrete datapath implementation the 2901 bit-slice will be discussed in
the following section (→ [10]).
The 2901 integrated circuit contains besides of a 16 word register set, a Q register (used
[Figure 11.3: Layout tiles of a compiled datapath: control signal buffers, bit slices Bit[0] to Bit[n-1], status buffers, and functional slices.]
Figure 11.4: 2901 4-bit ALU slice
Figure 11.5: 2901 µ-OPs
The 2901 IC has been widely used for applications in digital signal processing and for
minicomputers. It is available as a stand-alone IC, and some silicon manufacturers also provide
macrocells with the functionality of the 2901 (for different wordlengths) that can be included in
ASIC designs.
Controller Implementations
Controllers are used to apply a sequence of control signals to the datapath components. These
control signals are chosen to perform the desired operation(s) within the datapath. The
datapath is able to interact with the controller unit by sending appropriate status signals
(e. g. overflow flag when an addition is performed, equal flag as a result of a comparison, . . . ).
The controller can be designed to change the sequence of control signals depending on these
flags (used e. g. in microprocessors to perform conditional branches).
The general structure of such a controller can be found in Fig. 11.7.
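[Figure 11.7: General controller structure: environmental inputs → combinational logic → state register → control outputs, with the register content fed back to the combinational logic.]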
It consists of a combinational logic block and a register. From the input signals (which can be
e. g. an instruction word defining the sequence of control signals to be generated, state flags,
. . . ) and parts of the previous register content, the combinational logic block generates the
control output signals as well as the information about which step in the sequence of control
signals is to be executed in the next cycle. The controller can be seen as a realization of the
abstract model of a finite state machine.
To get a high level of regularity in the design of a controller, regular layout structures
(like ROMs or PLAs) are very often used to implement the combinational logic block rather than
implementing the logic functions directly in separate gates (random logic). The random logic
approach was chosen in the control unit of many early microprocessors (≤ 8 bit) and in RISC
(Reduced Instruction Set Computer) processors, whereas the regular layout structures are used in
CISC (Complex Instruction Set Computer) processors to simplify their controller design. Regular
structures simplify the design process because, if modifications in the control
sequences are required, only the contents of a PLA or a ROM have to be redefined instead
of redesigning a whole combinational gate network. Since the design process with regular
structures resembles programming the contents of a memory rather than designing a circuit,
that approach is called microprogramming and will be considered in detail in the sequel.
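The idea that changing the control sequence only means changing memory contents can be illustrated with a small model. Everything in this sketch is hypothetical: the ROM contents, the control words LOAD, ADD, STORE and CLEAR, and the single status flag are invented for illustration only.

```python
# Hypothetical microcode: address = (state, status flag) -> (control word, next state)
MICROCODE_ROM = {
    (0, 0): ("LOAD", 1), (0, 1): ("LOAD", 1),
    (1, 0): ("ADD", 2),  (1, 1): ("ADD", 2),
    (2, 0): ("STORE", 0),   # flag clear: write the result back
    (2, 1): ("CLEAR", 0),   # flag set: conditional branch to another action
}


def run_controller(flags, rom=MICROCODE_ROM, state=0):
    """Clock the controller once per flag value; return the control words."""
    outputs = []
    for flag in flags:
        control, state = rom[(state, flag)]   # combinational logic as ROM lookup
        outputs.append(control)               # state register updated per clock
    return outputs
```

Replacing an entry of `MICROCODE_ROM` changes the generated control sequence without touching `run_controller`, which plays the role of the fixed hardware structure.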
[Figures: controller realized with a ROM (address decoder, ROM, register holding the control word and the next address NA) and controller realized with a PLA (AND and OR planes, register holding the control word and NA). In both cases the environmental inputs and NA form the inputs, and the control field drives the control outputs.]
Depending on the generation of the control signals, two types of microinstructions can be
distinguished:
[Figures: microinstruction whose bits drive the control lines directly, and microinstruction whose encoded bits pass through a control bit decoder before driving the control lines.]
In controller design, one can proceed one step further: if a microinstruction itself can be
represented as a sequence of ‘sub’microinstructions (so called nanoinstructions), the structure
shown in Fig. 11.12 can be used. The simplest approach, which has already been mentioned
under vertical microcode, is a single step ‘sequence’ of nanoinstructions, namely the decoding
of the control outputs from an encoded control vector stored in the microcode control memory.
If feedback is introduced in the decoder PLA (via the NNA [nanocode next address] register),
control sequences can be generated by the nanocode PLA. As long as a nanocode sequence
is running, the MNA [microcode next address] register is halted. In the case that many
microinstructions use the same nanocode sequences, significant savings in implementation
area for the whole controller can be reached.
[Figure 11.12: Controller with microcode PLA (AND and OR planes), MNA register, environmental inputs and control outputs.]
Chapter 12
12.1 Introduction
The following design guidelines have been adapted from [5]. These recommendations are useful
in order to avoid functional faults and get the desired functionality.
• the same active edge of a single clock is applied at precisely the same time to all storage
elements
→ The clock-input of the second FF is skewed by the clock-to-q delay of the first FF and
not activated at every activation clock edge (e.g. ripple counter)
Synchronous Circuits
→ Clock skew caused by gating the clock line (e.g. multiplexer in clock line)
→ The synchronous design principle that all FFs change state at exactly the same time is not
fulfilled
Clock Buffering
Recommended circuits for synchronous circuit design are described in the subsequent sections.
→ Clock skew
→ Same fanout
Gated Clocks
→ Signal change at multiplexer input can cause a glitch at the clk input (FF captures
invalid data)
Double-edged Clocking
Asynchronous Resets
12.7 Shift-Registers
Asynchronous Inputs
→ Circuits with complicated feedback loops to capture asynchronous inputs (very sensitive
to noise; functionality can be influenced by placement and routing delays)
• the first flip-flop is reset asynchronously when the r input is zero or when the qb outputs
of the second and the third FF both have the value 0
• the q-output of the first FF is asynchronously set to high, when a positive edge arises
at its ck-input
• the high output of the first FF is propagated through the second and the third FF in
the two following cycles. The q-outputs of these FFs are set to zero and the reset logic
for the first FF is activated. Now the first FF is ready to receive another edge at its
input.
1. the second FF stabilizes to q=1 before the next rising clock edge (circuit works as
desired)
2. the second FF settles to q=0 and the third FF remains in its state. Since the
output q of the first FF is high, the propagation of this output works correctly, but
it needs one cycle more than in the first case.
3. The metastable state of the second FF is still there at the next rising edge of
the clock signal. Then the third FF also becomes metastable. The probability of
receiving a metastable d (internal) signal can be reduced by increasing the length
of the register chain.
Delay Lines and Monostables
In general, building circuits whose functionality relies on delays cannot be recommended.
Bistable Elements
RAMs and ROMs in Synchronous Circuits
Problem: RAMs are double-edge triggered. The address is latched on the opposite edge to
the data.
12.12 Tristates
Figure 12.36: Tristate bus with central control of tristate enables and additional
driver activated on non-controlled states
Disadvantages of Tristates:
• large area
• limited buffering
Advantages of Multiplexers:
• small area
• efficient routing
Parallel Signals
12.14 Fanout
Design for Speed
2. Use AOI logic (complex cells from standard cell library) where possible
Figure 12.44: Late changing input fed late into combinational logic
q0 q1 q2 q3
0 0 0 0
1 0 0 0
1 1 0 0
1 1 1 0
1 1 1 1
0 1 1 1
0 0 1 1
0 0 0 1
0 0 0 0
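The state table above matches the sequence of a 4-bit twisted-ring (Johnson) counter, in which the inverted last stage is fed back to the first stage. The accompanying figure is not preserved here, so this reading is an assumption; the following sketch merely reproduces the table, and the function name is hypothetical.

```python
def twisted_ring_states(n=4):
    """Return the state sequence (q0, ..., q_{n-1}) of an n-bit twisted-ring counter."""
    state = [0] * n
    sequence = [tuple(state)]
    for _ in range(2 * n):
        # Shift by one stage and feed the inverted last output back into q0.
        state = [1 - state[-1]] + state[:-1]
        sequence.append(tuple(state))
    return sequence
```

Only one flip-flop changes per clock in this sequence, so the next-state logic is a single inverter regardless of the counter length.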
Design for Testability
Testability:
1. Controllability
2. Observability
Figure 12.47: Circuit with inaccessible internal logic: only first block is
controllable and only last block is directly observable
Figure 12.48: Chain of counters: first counter is not directly observable and
second counter is not directly controllable
Figure 12.49: Counter with closed feedback loop: initial state not known
Figure 12.51: Chain of counters broken by test input and output signals
Figure 12.52: Counter with feedback loop opened by test control and output
signals
Chapter 13
13.1 Motivation
Example:
Testing of a combinational circuit with n inputs (10 MHz, one test per cycle)
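Assuming that the example refers to exhaustive testing, i. e. applying all 2^n input combinations, the required test time can be estimated with a few lines. The function name and the default rate argument are hypothetical; the 10 MHz rate with one test per cycle is taken from the example.

```python
def exhaustive_test_time(n_inputs, test_rate_hz=10e6):
    """Seconds needed to apply all 2**n input patterns, one pattern per cycle."""
    return (2 ** n_inputs) / test_rate_hz


SECONDS_PER_YEAR = 365 * 24 * 3600

for n in (20, 32, 64):
    t = exhaustive_test_time(n)
    print(f"n = {n:2d}: {t:.3e} s = {t / SECONDS_PER_YEAR:.3e} years")
```

The test time doubles with every additional input, which is why exhaustive testing becomes impractical for circuits with many inputs.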
Economical Considerations
$$\mathrm{AQL} = \frac{\#\,\mathrm{DefectiveParts}}{\#\,\mathrm{AcceptedParts}} \qquad (13.1)$$
• DL(= AQL): Number of defective circuits which have been classified as correct working
(testing with T )
• Y: yield
• T: fault coverage
$$DL = 1 - Y^{1-T} \qquad (13.2)$$
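Eq. (13.2) is easy to evaluate numerically. The following sketch uses hypothetical function and parameter names, with yield and fault coverage given as fractions between 0 and 1; the numeric values in the loop are arbitrary illustration inputs, not data from the course.

```python
def defect_level(yield_fraction, fault_coverage):
    """Defect level DL = 1 - Y**(1 - T) according to Eq. (13.2)."""
    return 1.0 - yield_fraction ** (1.0 - fault_coverage)


for coverage in (0.0, 0.9, 0.99, 1.0):
    print(f"Y = 0.5, T = {coverage:4.2f}: DL = {defect_level(0.5, coverage):.4f}")
```

The two boundary cases follow directly from the formula: with T = 1 no defective part is accepted (DL = 0), and with T = 0 the defect level equals 1 − Y.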
Design Flow: Testing
Fundamental Definitions
Manufacturing Process
↓
Parametric Test (current/power dissipation)
(erroneous chips are marked with color points and removed after sawing)
↓
Chip Test on Tester
• fault:
physical defect, imperfection or flaw which occurs in a hardware or software component
• error:
manifestation of a fault (erroneous information on a hardware line or in a program,
caused by a fault)
• failure:
malfunction of a system
Fault Models
• Oxide defects
• Missing implants
• Lithographic defects
• Junction defects
• Moisture accumulation
• Impurities/Contaminants
• Static discharge
Fault Tolerant Design
• Self-Checking Logic
Test Pattern Generation
• manually
• algorithmic
In the D-Algorithm the symbols D and D̄ are used to refer to the changes. D and D̄ are used
as follows:
D: used if a line has the value 1 in the absence of a fault and the value 0 if the fault
occurs
D̄: used if a line has the value 0 if no fault occurs and otherwise the value 1
The D-algorithm method for path sensitization consists of two principal phases:
These two steps are iterated for different propagation paths for the D-value from one dedicated
internal point i to one dedicated primary output point o until the backward trace phase is
finished without any contradiction (a test vector for a fault at i has been found) or until all
possible paths from i to o have been examined.
1. A primitive D-cube of a failure is a D-cube associated with a fault l/α on the output
line l of a gate G. This produces the value D or D̄ on l, and the input lines have values
which would produce ᾱ (the complement of α) in the fault-free case.
Figure 13.10: Primitive D-cube of fault (pdcf) for two-input NAND gate
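One common way to compute with D values is to represent every signal as a pair (value in the fault-free circuit, value in the faulty circuit). The sketch below uses this pair encoding for a two-input NAND gate; the encoding and all names are assumptions chosen for illustration, not the notation of the course.

```python
# Each signal is a pair: (value in the fault-free circuit, value in the faulty circuit).
ZERO, ONE = (0, 0), (1, 1)
D, D_BAR = (1, 0), (0, 1)     # D: 1 without fault, 0 with fault; D_BAR: the opposite


def nand(a, b):
    """Evaluate a two-input NAND separately for the good and the faulty circuit."""
    return (1 - (a[0] & b[0]), 1 - (a[1] & b[1]))


print(nand(D, ONE))    # the other input is non-controlling: D passes through inverted
print(nand(D, ZERO))   # the other input is controlling: the fault effect is blocked
```

The two calls show why the remaining gate inputs on a sensitized path have to be set to the non-controlling value: only then does a D or its complement appear at the gate output.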
In the following, the D-algorithm is illustrated for the example from Fig. 13.15.
4. If cube i is used with D instead of D, the propagation to the output can be done:
5. Now the consistency phase is started and a value for line 4 has to be found. The
singular cover table shows that a 0 on line 10 implies that both line 7 and line 8 are
1. In cube m, line 7 is a D (as is line 5, which is connected to 7 by j), and this D would
now have to be set to 1. This contradiction rules out the path sensitization 5 → 6/7
→ 9 → 11.
⇒ Start test vector generation using another path
7. From the singular cover table we get the information that a 1 on line 8 is the same as a
0 on line 4. Additionally it can be seen that the 0 on line 9 can be obtained by a 1 on
line 1.
1110DDD10DD
1110
Fault Simulation
Design for Testability
Testability:
• controllability
• observability
• → additional chip area required
• → shorter design cycle
• ad-hoc techniques
• structured approaches
Figure 13.17: Design for testability: complex gate (a) not testable with stuck-at model. (b)
fully testable with stuck-at model
Figure 13.19: Testability: ad-hoc techniques (a) insertion of register in order to limit logic
depth to a given maximum value. (b) test shift registers for PLA test (increasing PLA area).
Scan-Path:
2. System operation mode: Wait until inputs of Y are steady. Clock new state into Y.
Advantages:
Disadvantages:
• Wastes silicon
• Additional Complexity
Overhead
Example:
Signature Analysis
$$\frac{D}{P} = Q + \frac{R}{P}$$
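Reading D as the data polynomial, P as the characteristic polynomial of the LFSR, Q as the quotient and R as the remainder (the signature), the division can be carried out over GF(2). This interpretation of the symbols is an assumption, since their definitions are not preserved here; the function name and the integer bit-vector encoding are also chosen only for this sketch.

```python
def gf2_divmod(d, p):
    """Divide polynomial d by p over GF(2); bit k of an int is the coefficient of x**k."""
    if p == 0:
        raise ZeroDivisionError("characteristic polynomial must be non-zero")
    q = 0
    deg_p = p.bit_length() - 1
    while d.bit_length() - 1 >= deg_p:
        shift = (d.bit_length() - 1) - deg_p
        q |= 1 << shift
        d ^= p << shift        # subtraction over GF(2) is a bitwise XOR
    return q, d                # quotient Q and remainder R (the signature)


quotient, signature = gf2_divmod(0b110101101, 0b1011)   # P = x^3 + x + 1
print(bin(quotient), bin(signature))
```

The remainder always has fewer bits than P, so an arbitrarily long data stream is compressed into a signature of fixed length.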
$$N = \frac{2^{m-n} - 1}{2^m - 1}$$
$$F \approx 1 - 2^{-n}$$
Interpretation:
• n = 16 bit −→ F ≈ 1 − 2^−16 ≈ 99.9985%
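The two expressions for signature analysis can be checked numerically. In this sketch m is taken as the length of the data stream and n as the length of the signature register; this reading of the symbols, like the function names, is an assumption.

```python
def undetected_fraction(m, n):
    """Fraction of erroneous m-bit streams that yield the correct n-bit signature."""
    return (2 ** (m - n) - 1) / (2 ** m - 1)


def detection_probability(n):
    """Approximate probability of detecting an error for long data streams."""
    return 1.0 - 2.0 ** (-n)


for n in (8, 16, 32):
    print(f"n = {n:2d}: F = {100 * detection_probability(n):.7f} %")
```

For m much larger than n the undetected fraction approaches 2^−n, so the detection probability depends practically only on the signature length.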
A BILBO register is a universal element for use in either a scanpath environment or a self-test
(signature analysis) environment.
Figure 13.27: BILBO registers: 1. full circuit 2. normal use 3. scan-path use 4. signature
analysis
Advantages:
• Versatility
– Normal operation
– Scan-path test: enhances testability
– Test vector generation via LFSR
– Data compression via LFSR
– Combined scan-path/self-test using same LFSRs
Disadvantages:
• Silicon area
Chapter 14
Boundary-Scan Architecture –
JTAG Standard
• later North American companies joined the group (→ Joint Test Action Group = JTAG)
Classical Board Test Approaches
• increased density
Introduction to Boundary Scan
Input          Output (expected)   Output (actual)
x1x1x0xxxxxx   xxxxxxxx01x1        xxxxxxxx11x0
x0x0x1xxxxxx   xxxxxxxx10x0        xxxxxxxx11x0
• self-testing ICs: boundary scan can be used to trigger the self-test procedure
The IEEE Standard 1149.1
• TAP Controller: responds to the control sequences supplied through the test access port
(TAP) and generates the clocks and control signals required for the operation of the other
circuit blocks
• Instruction Register: shift register which is serially loaded with the instruction for a test
• Test Data Registers: bank of shift registers. The stimuli values required for a test are
serially loaded into a test register selected by the current instruction. After execution
the results can be shifted out for examination
• Test Clock Input (TCK): independent of the system clock; used for synchronization of
test operations between various chips on a board
• Test Mode Select Input (TMS): Input for controlling the test logic
• Test Data Input (TDI): Serial input for instruction and test register data
• Test Data Output (TDO): Serial output of instruction or test register data (source
selected by TMS code)
Figure 14.11: Use of bus master chip to control IEEE Std 1149.1 chips
14.3.3 TAP-Controller
• 16-state FSM which controls data register (DR) and instruction register (IR) operations
• input signals:
– TRST∗
– TCK
– TMS
– last state (stored in internal FFs)
• output signals:
– Reset*
– Select
– Enable
– ShiftIR
– ClockIR
– UpdateIR
– ShiftDR
– ClockDR
– UpdateDR
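The 16-state FSM can be written down as a next-state table indexed by TMS and evaluated once per TCK cycle. The sketch below is only a behavioural illustration: the state names follow the state diagram of IEEE Std 1149.1, while the dictionary layout and the helper names `step` and `run` are assumptions. The output signals listed above are not modelled.

```python
# state -> (next state if TMS = 0, next state if TMS = 1)
NEXT_STATE = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state, tms):
    """Advance the controller by one TCK cycle with the given TMS value (0 or 1)."""
    return NEXT_STATE[state][tms]


def run(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK cycle, and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# From reset, TMS = 0, 1, 0, 0 leads into the data register shift state.
print(run("Test-Logic-Reset", [0, 1, 0, 0]))
```

A useful property of this table is that holding TMS at 1 for five TCK cycles returns the controller to Test-Logic-Reset from any state, so the test logic can be reset even without the TRST* input.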
Bypass Register
Chapter 15
Analog Signal Processing
as shown in Fig.15.1
The aim of development is to integrate all these functions on a single chip.
Figure 15.3: Signal bandwidths that can be processed by present day (1989)
technologies
Fig. 15.4 illustrates how analog-to-digital (A/D) and digital-to-analog (D/A) converters are
used in data systems. In general, an A/D conversion process converts a sampled and held
analog signal into a digital word that represents the analog signal. The D/A conversion
process is essentially the inverse of the A/D process: digital words are applied to the input
of the D/A converter, which derives from a reference voltage an analog output signal that
represents the digital word.
Figure 15.4: Converters in signal processing systems: (a) A/D, (b) D/A
Digital-To-Analog Converters
Figure 15.5: (a) Conceptual block diagram of a D/A converter, (b) Clocked
D/A converter
In most cases, the digital input of the D/A converter is synchronously clocked. It is therefore
necessary to provide a latch to hold the word for conversion and a sample-and-hold circuit at
the output, as shown in Fig. 15.5(b).
The basic architecture of the D/A converter without an output sample-and-hold circuit is
shown in Fig. ??. Fig. ?? shows the ideal input-output characteristics for such a D/A
converter.
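The output voltage of a current-scaling D/A converter as shown in Fig. ?? can be expressed
as
$$V_{out} = -\frac{R}{2}\, I_0 = -\frac{R}{2} \left( \frac{b_1}{R} + \frac{b_2}{2R} + \frac{b_3}{4R} + \ldots + \frac{b_N}{2^{N-1} R} \right) V_{ref} \tag{15.4}$$
$$V_{out} = -V_{ref} \left( b_1 2^{-1} + b_2 2^{-2} + b_3 2^{-3} + \ldots + b_N 2^{-N} \right) \tag{15.5}$$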
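As a quick numerical check, Eq. 15.4 and Eq. 15.5 can be evaluated side by side. The sketch below assumes ideal components; the function names and the example resistance of 1 kΩ are illustrative choices and do not come from the text.

```python
def dac_from_currents(bits, v_ref, r=1.0e3):
    """Eq. 15.4: sum the binary-weighted branch currents, then scale by -R/2.

    bits = [b1, ..., bN] with b1 the MSB; r is an arbitrary example resistance.
    """
    i0 = sum(b * v_ref / (2 ** (k - 1) * r) for k, b in enumerate(bits, start=1))
    return -(r / 2.0) * i0


def dac_from_weights(bits, v_ref):
    """Eq. 15.5: the same output written with the binary weights 2^-k."""
    return -v_ref * sum(b * 2.0 ** -k for k, b in enumerate(bits, start=1))


word = [1, 0, 1]  # b1 = 1, b2 = 0, b3 = 1
print(dac_from_currents(word, v_ref=2.0), dac_from_weights(word, v_ref=2.0))
```

Both forms give the same value because the resistance R cancels, which is why only the ratios of the resistors matter for the output voltage.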
Figure 15.6: (a) Sample-and-hold circuit, (b) Waveforms illustrating the operation of the
sample-and-hold circuit
The major disadvantage of this approach is the large ratio of component values. For example,
the ratio of the resistor for the MSB to the resistor for the LSB is given by
$$\frac{R_{MSB}}{R_{LSB}} = \frac{1}{2^{N-1}} \tag{15.6}$$
Thus, the output voltage of the R-2R D/A converter is given by Eq. ??.
A voltage-scaling D/A converter is shown in Fig. ??. Its output voltage at any tap i can be
expressed as
$$V_i = \frac{V_{ref}}{8}\,(i - 0.5) \tag{15.8}$$
The output voltage of the D/A converter is then determined by the values of the inputs b1 ,
b2 and b3 .
The structure of this voltage-scaling D/A converter is very regular and thus well suited for
MOS technology. A problem with this type of D/A converter is the accuracy required of the
resistors used. This makes it difficult to build D/A converters of this type with more
than 8-bit resolution.
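The tap voltages of Eq. 15.8 and the selection by the three input bits can be illustrated with a short sketch. Two points are assumptions rather than statements from the text: the taps are numbered i = 1 to 8, and the bits select tap i = 1 + (4 b1 + 2 b2 + b3) with b1 as the MSB. The actual decoding depends on the switch tree in the figure.

```python
def tap_voltage(i, v_ref, n_taps=8):
    """Eq. 15.8: voltage at tap i of a resistor string with n_taps taps."""
    return (i - 0.5) * v_ref / n_taps


def voltage_scaling_dac(b1, b2, b3, v_ref):
    """Select one tap from the three input bits (assumed binary decoding, b1 = MSB)."""
    i = 1 + (4 * b1 + 2 * b2 + b3)
    return tap_voltage(i, v_ref)


# With V_ref = 8 V the taps sit at 0.5 V, 1.5 V, ..., 7.5 V.
print([tap_voltage(i, 8.0) for i in range(1, 9)])
```

The number of taps doubles with every additional bit, so an N-bit converter of this type needs 2^N matched resistors, which is consistent with the practical limit of about 8 bits mentioned above.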
Analog-To-Digital Converters
The objective of an A/D converter is the determination of the digital word corresponding to
the analog input signal. Usually a sample-and-hold circuit (see Fig. 15.6) is required at the
input of the A/D converter because it is not possible to convert a changing analog signal.
A block diagram of a general A/D converter is shown in Fig. ??. The ideal input-output
characteristics for an A/D converter are shown in Fig. ??.
Two possible implementations of serial A/D converters are single-slope and dual-slope A/D
converters. Neither is discussed in detail here. The main advantage of these converters
is their simplicity; their main disadvantage is the long conversion time required.
A successive approximation A/D converter converts an analog input into an N-bit digital word
in N clock cycles. Consequently, the conversion time is shorter than for the serial converters
without much increase in the complexity of the circuit. Fig. ?? shows an example of a
successive approximation A/D converter architecture.
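The N-cycle conversion can be illustrated with a behavioural sketch of the successive approximation loop: in each clock cycle one bit is tried, starting with the MSB, and kept if the output of an internal D/A converter does not exceed the input. The sketch assumes a unipolar input range 0 ≤ v_in < v_ref and an ideal D/A converter and comparator; the function name is illustrative.

```python
def sar_adc(v_in, v_ref, n_bits):
    """Successive approximation: decide one bit per clock cycle, MSB first."""
    code = 0
    for k in range(n_bits - 1, -1, -1):            # exactly n_bits iterations
        trial = code | (1 << k)                    # tentatively set bit k
        v_dac = trial * v_ref / (1 << n_bits)      # ideal internal D/A converter
        if v_in >= v_dac:                          # comparator decision
            code = trial                           # keep the bit
    return code


print(sar_adc(0.3, 1.0, 3))   # 3-bit conversion of 0.3 * V_ref
```

The loop is a binary search over the 2^N possible codes, which is why N comparisons are sufficient.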
In many applications, it is necessary to have a shorter conversion time than is possible with
the previously described A/D converter architectures. Parallel A/D converters, also known as
flash A/D converters, can complete a conversion in as little as one clock cycle. An
architecture of a 3-bit parallel A/D converter is shown in Fig. ??.
Parallel A/D converters in CMOS technology typically reach conversion rates of up to 20 MHz.
The sample-and-hold time, however, may be larger than 50 ns and can prevent this conversion
rate from being realised. Another problem is that the number of comparators required is
$2^N - 1$. For N greater than 8, too much area is required.
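A behavioural sketch of a parallel converter makes the comparator count visible: all 2^N − 1 comparators evaluate the input at once and the resulting thermometer code is reduced to a binary value. The uniformly spaced thresholds k · V_ref / 2^N are an assumption for illustration; the reference ladder in the figure may place them differently.

```python
def flash_adc(v_in, v_ref, n_bits):
    """Return (output code, number of comparators) for an ideal flash converter."""
    n_comparators = 2 ** n_bits - 1
    # One threshold per comparator (assumed uniform spacing).
    thresholds = [(k + 1) * v_ref / 2 ** n_bits for k in range(n_comparators)]
    thermometer = [v_in >= t for t in thresholds]   # all comparisons in parallel
    return sum(thermometer), n_comparators           # count of ones = binary code


print(flash_adc(0.5, 1.0, 3))
print([2 ** n - 1 for n in (3, 8, 10)])   # comparators needed for 3, 8 and 10 bits
```

Each additional bit roughly doubles the number of comparators, which is the area problem noted above.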
One method of achieving small system conversion times is to use slower A/D converters in
parallel, which is called time-interleaving and is shown in Fig. ??. Here M successive
approximation A/D converters are used in parallel to complete the N-bit conversion of one
analog signal per clock cycle. The sample-and-hold circuits consecutively sample the analog
input signal and apply it to their respective A/D converters. N clock cycles later, the
corresponding A/D converter provides its digital output word. If M = N, a digital word is
output every clock cycle. If one compares the chip area of an N-bit A/D converter using the
parallel A/D converter architecture (M = 1) with that of the time-interleaved architecture
for M = N, the minimum area will occur for a value of M between 1 and N.
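The timing of the time-interleaved arrangement can be sketched as a simple round-robin schedule. The model below assumes that one sample is taken per clock cycle and that every converter is busy for exactly N cycles; the function name and the tuple layout are illustrative.

```python
def interleave_schedule(n_samples, m_converters, n_cycles):
    """Assign samples to converters in turn.

    Returns a list of (sample cycle, converter index, cycle at which the word is ready).
    Raises ValueError if a converter would receive a new sample while still busy.
    """
    busy_until = [0] * m_converters
    schedule = []
    for k in range(n_samples):                     # sample k is taken in cycle k
        conv = k % m_converters
        if busy_until[conv] > k:
            raise ValueError("converter %d is still busy in cycle %d" % (conv, k))
        busy_until[conv] = k + n_cycles
        schedule.append((k, conv, k + n_cycles))
    return schedule


# M = N = 3: after a latency of 3 cycles, one word is ready in every cycle.
for entry in interleave_schedule(6, 3, 3):
    print(entry)
```

With fewer converters than conversion cycles (M < N) the schedule fails in this model, which shows why M = N is needed to sustain one output word per clock cycle.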
Introduction
The basic structure of a sigma-delta converter is shown in Fig. ??. The sigma-delta converter
can be referred to as an oversampling converter, although oversampling is just one of the
techniques contributing to the performance of a sigma-delta converter. The sigma-delta
converter shown in Fig. ?? quantizes an analog signal with very low resolution (1 bit) and a
very high sampling rate (2 MHz). With the use of oversampling techniques and digital
filtering, the sampling rate is reduced (8 kHz) and the resolution is increased (16 bits).
A more detailed block diagram of the sigma-delta modulator is shown in Fig. ??. It consists
of an integrator, a quantizer (comparator for 1 bit) and a feedback loop with a D/A converter
(switch for 1 bit). The output of the sigma-delta modulator is shown in Fig. ?? for a sine
wave input. The single-bit conversion will result in an output which is either '1' or '0'. When
the signal is near plus full scale, the output is positive during most of the clock cycles. The
opposite is true for near minus full scale signals. When the output is followed by a digital
filter as shown in Fig. ??, which can perform sophisticated averaging functions, the 1-bit
sequence is transformed into a much more meaningful signal.
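The behaviour described above can be reproduced with a minimal discrete-time sketch of a first-order modulator. It assumes an input normalised to the range −1 to +1, an ideal integrator, a comparator threshold of zero and a 1-bit D/A converter that feeds back +1 or −1; these modelling choices and the function name are illustrative and not taken from the figure.

```python
def sigma_delta_modulate(samples):
    """First-order sigma-delta modulator; returns one output bit per input sample."""
    integrator = 0.0
    feedback = 0.0                      # output of the 1-bit D/A converter
    bits = []
    for x in samples:
        integrator += x - feedback      # integrate the difference signal (x - y)
        bit = 1 if integrator >= 0.0 else 0   # 1-bit quantizer (comparator)
        feedback = 1.0 if bit else -1.0       # 1-bit D/A converter (switch)
        bits.append(bit)
    return bits


# A constant input of half full scale gives '1' in three out of four cycles.
bits = sigma_delta_modulate([0.5] * 1000)
print(sum(bits) / len(bits))
```

Mapping the bits back to ±1 and averaging them recovers the input value, which is the averaging function the digital filter performs.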
Noise Shaping
One feature that makes the sigma-delta converter so powerful is its noise-shaping capability.
To understand how this works, it is appropriate to analyse the sigma-delta modulator in the
frequency domain. Fig. ?? shows the linearized frequency-domain model of a sigma-delta
modulator.
The integrator is represented as an analog filter. For an integrator, the transfer function
has an amplitude which is inversely proportional to the input frequency ($1/f$ relationship).
The quantizer is modelled as a gain stage followed by the addition of quantization noise.
Thus, the output y of the sigma-delta converter can be expressed by
$$y = (x - y)\,\frac{1}{f} + q \tag{15.9}$$
where (x − y) is the difference signal from the summing node at the input and q is the
quantization noise. Applying some algebraic rearrangement yields
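$$
\begin{aligned}
y &= \frac{x}{f} - \frac{y}{f} + q \\
\left(1 + \frac{1}{f}\right) y &= \frac{x}{f} + q \\
y &= \frac{x/f}{1 + 1/f} + \frac{q}{1 + 1/f} \\
y &= \frac{x}{f + 1} + \frac{q f}{f + 1}
\end{aligned}
\tag{15.10}
$$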
At a frequency f = 0, the output signal equals x with no noise element q. At higher frequencies,
the value of x is reduced and the influence of q increases. In essence, the sigma-delta modulator
has a low pass effect on the signal and a high pass effect on the noise. As a result of this,
the modulator can be thought of as a noise shaping filter where noise in the signal pass band
is reduced and noise energy is pushed into the higher frequency region. The effect of this
procedure on normally equally distributed (white) quantization noise is shown in Fig. ??.
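The low-pass and high-pass behaviour can be read off Eq. 15.10 by evaluating the two gain factors. The sketch below treats f as a normalised, dimensionless frequency, as the simplified model in the text does; the function names are illustrative.

```python
def signal_gain(f):
    """Factor multiplying x in Eq. 15.10: 1 / (f + 1), a low-pass characteristic."""
    return 1.0 / (f + 1.0)


def noise_gain(f):
    """Factor multiplying q in Eq. 15.10: f / (f + 1), a high-pass characteristic."""
    return f / (f + 1.0)


def output(x, q, f):
    """Modulator output y according to Eq. 15.10."""
    return signal_gain(f) * x + noise_gain(f) * q


for f in (0.0, 0.1, 1.0, 10.0):
    print(f, round(signal_gain(f), 3), round(noise_gain(f), 3))
```

At f = 0 the signal passes with gain 1 and the noise gain is 0; as f grows the two gains swap roles, and they always add up to 1 in this model.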
Digital Filtering
The sigma-delta modulator described so far produces a stream of single-bit digital values at
a very high rate. The modulator’s output bit stream is fed into the converter’s digital filter,
which performs several different functions. All of these functions, however, are integrated into
a single filter implementation. The functions of the filter are:
The sampling rate reduction is done by averaging the input bit stream over a number of clock
cycles. This produces an output data stream that is reduced in sampling rate, but increased
in resolution (i.e. number of bits per sample).
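The principle of rate reduction by averaging can be sketched with a plain block average. This is a deliberately simplified stand-in for the converter's digital filter, whose actual structure is not given in the text; the function name and the block-wise (non-overlapping) averaging are assumptions.

```python
def decimate_by_averaging(bits, ratio):
    """Average non-overlapping blocks of `ratio` input bits into one output sample."""
    out = []
    for start in range(0, len(bits) - ratio + 1, ratio):
        block = bits[start:start + ratio]
        out.append(sum(block) / ratio)   # value between 0 and 1, finer than 1 bit
    return out


# Eight 1-bit samples become two samples with five possible levels each.
print(decimate_by_averaging([1, 1, 1, 0, 1, 1, 1, 0], 4))
```

Reducing 2 MHz to 8 kHz corresponds to a ratio of 250. A plain average over 250 bits distinguishes only 251 levels, so reaching 16-bit resolution relies on a more selective filter that removes the shaped high-frequency noise.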
• Sigma-delta converters are a complete conversion and filtering system; additional digital
filtering functions may easily be implemented in the digital output filter of the converter
• Very low-cost and high-performance conversion is possible as the analog part of the
converter is very simple and need not be as accurate as in other A/D converters. The
main part of the converter is the digital filter, which can be integrated more easily in
MOS technology.