Lecture Notes On Mixed Signal Circuit Design by Prof Dinesh.K.sharma

http://www.satishkashyap.com/
Basics of Semiconductor Devices
Dinesh Sharma
Microelectronics group
EE Department, IIT Bombay
October 13, 2005
In this booklet, we review the fundamentals of Semiconductor Physics and basics
of device operation. We shall concentrate largely on elemental semiconductors such
as silicon or germanium, and most numerical values used for examples are specific to
silicon.
1 Semiconductor fundamentals
A semiconductor has two types of mobile charge carriers: negatively charged electrons and positively charged holes. We shall denote the concentrations of these charge
carriers by n and p respectively. The discussions in this booklet apply to elemental semiconductors (like silicon) which belong to group IV of the periodic table. We
can intentionally add impurities from groups III and V to the semiconductor. These
impurities are called dopants. Impurities from group III are called acceptors while
those from group V are called donors. Each donor atom has an extra electron, which
is very loosely bound to it. At room temperature, there is sufficient thermal energy
present, so that the loosely bound electron breaks free from the donor, leaving the
donor positively charged. This contributes an additional electron to the free charge
carriers in the semiconductor, and a positive ionic charge at a fixed location in the
semiconductor. Similarly, an acceptor atom captures an electron, thus producing a
mobile hole and becoming negatively charged itself. A semiconductor without any
dopants is called intrinsic. An unperturbed semiconductor must be charge neutral as
a whole. If we denote the concentration of ionised donors by $N_d^+$ and the concentration of ionised acceptors by $N_a^-$, we can write the net charge density at any point
in the semiconductor as:
$$\rho = q\,(N_d^+ - N_a^- + p - n) \tag{1}$$
where q is the absolute value of the electronic charge. In an unperturbed semiconductor, $\rho$ will be zero everywhere. Electrons and holes are generated thermally: the
availability of energy equal to the band gap of the semiconductor results in the generation of an electron-hole pair. Simultaneously, electrons and holes can recombine
to annihilate each other, giving out energy which is equal to the band gap of the
semiconductor. Thus we have the reversible reaction:
$$e^- + h^+ \rightleftharpoons E_g$$
where $E_g$ is the band gap energy of the semiconductor.
Applying the law of mass action to the above reaction, we can write for the equilibrium
concentration of holes and electrons:
$$n\,p = \text{constant}$$
The above relation applies to doped as well as intrinsic semiconductors. But for an
intrinsic semiconductor,
$$n = p \equiv n_i$$
Therefore, the constant in the equation connecting n and p must be $n_i^2$. Thus, for
a semiconductor in equilibrium,
$$n\,p = n_i^2 \tag{2}$$
Since n and p are not independent, but are constrained by the above relation, we can
define a single independent variable, the Fermi potential $\phi_F$, by
$$\phi_F \equiv \frac{K_B T}{q}\ln\frac{p}{n_i} = \frac{K_B T}{q}\ln\frac{n_i}{n} \tag{3}$$
where $K_B$ is the Boltzmann constant, T is the absolute temperature and q is the
absolute value of the electronic charge. At room temperature, $K_B T/q$ is approximately 26 mV and $n_i$ is of the order of $10^{10}$ /cm$^3$ for silicon. Now electron and hole
concentrations are given by:
$$n = n_i\,e^{-q\phi_F/K_B T}, \qquad p = n_i\,e^{q\phi_F/K_B T} \tag{4}$$
To simplify these relations, we define a dimensionless Fermi potential by:
$$u_F \equiv \frac{q\phi_F}{K_B T} = \ln(p/n_i) = \ln(n_i/n)$$
then:
$$n = n_i\,e^{-u_F}, \qquad p = n_i\,e^{u_F} \tag{5}$$
Generally, a semiconductor will be doped with only one kind of impurity. A
semiconductor doped with donors will have many more electrons than holes. This
type of semiconductor is called N type, and electrons are the majority carriers in this
type of semiconductor. Similarly, holes are the majority carriers in a semiconductor
doped with acceptors and it is termed P type. If both types of dopants are present,
the one present in higher concentration determines the type of the semiconductor.
The net doping is defined as the difference in the concentrations of the more abundant
and the less abundant dopants.
In most practical cases, the ratio of majority to minority carriers is very high. The
concentration of majority carriers is then very nearly equal to the net dopant concentration. To take a typical example, consider P type silicon with boron concentration
of $10^{16}$ atoms/cm$^3$. This gives:
$$p = N_a = 10^{16}\ /\mathrm{cm}^3$$
$$n = n_i^2/p \approx 10^{20}/10^{16}\ /\mathrm{cm}^3 = 10^{4}\ /\mathrm{cm}^3$$
$$p/n \approx 10^{12}$$
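The worked example above can be reproduced numerically. The following is a minimal sketch, assuming T = 300 K and the order-of-magnitude value of 10^10 /cm^3 for the intrinsic concentration; the function names are illustrative and not part of the notes.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # magnitude of the electronic charge, C
N_I = 1.0e10           # intrinsic concentration of Si, /cm^3 (order of magnitude, assumed)


def thermal_voltage(temp_k=300.0):
    """Return K_B*T/q in volts."""
    return K_B * temp_k / Q_E


def p_type_equilibrium(n_a, n_i=N_I, temp_k=300.0):
    """Return (p, n, phi_F) for P type material with net acceptor density n_a."""
    p = n_a                                              # majority carriers ~ net doping
    n = n_i ** 2 / p                                     # law of mass action, eq. (2)
    phi_f = thermal_voltage(temp_k) * math.log(p / n_i)  # Fermi potential, eq. (3)
    return p, n, phi_f


p, n, phi_f = p_type_equilibrium(1.0e16)
print(f"p = {p:.3g} /cm^3, n = {n:.3g} /cm^3, p/n = {p / n:.3g}")
print(f"phi_F = {phi_f:.3f} V")
```

The Fermi potential comes out positive, as expected for a P type sample under the sign convention of equation (3).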
1.1 Band Diagrams
The above concepts are often visualised with the help of band diagrams. The arrangement of atoms in a semiconductor results in certain electron energies which are not
permitted. Thus, the energy range is divided into bands of permitted energy values
alternating with forbidden gaps.
The highest such band which is nearly filled with electrons is called the valence
band. Unoccupied levels in this band correspond to holes. For stability, electrons
seek the lowest energy level available. If a vacancy is available at a lower energy, an
electron at a higher energy will drop to this level. The vacancy thus bubbles up to a
higher level. Therefore, holes seek the highest electron energy available.
The band just above the valence band is called the conduction band. In a semiconductor, this is partially
filled. Conduction in a semiconductor is caused by electrons
in the conduction band (which are normally to be found at the lowest energy in the
conduction band) or holes in the valence band (found at the highest electron energy
in the valence band). Band diagrams are plots of electron energies as a function of
position in the semiconductor. Typically, the top of the valence band (corresponding
to minimum hole energy) and the bottom of the conduction band are plotted. We can
show the Fermi potential and the corresponding Fermi energy ($= -q\phi_F$) in the band
diagram of silicon as a level in the band gap. We use the halfway point between the
conduction and the valence band as the reference for energy and potential. When
$n = p = n_i$, the Fermi potential is 0 (from eq. 3) and correspondingly, the Fermi
energy lies at the intrinsic Fermi level halfway in the band gap. (Actually, this level
can be slightly away from the middle of the band gap depending on the density of
allowed states in the conduction and valence bands, but for now, we'll ignore this).
When holes are the majority carriers, $\phi_F$ is positive and the Fermi energy ($= -q\phi_F$)
lies below the mid gap level, as shown in the adjoining figure. When electrons are the
majority carriers, $\phi_F$ is negative, and the Fermi energy lies above the mid gap level.
[Figure: band diagram showing the levels Ec, Ei, EF and Ev, with the separation $q\phi_F$ between Ei and EF]
1.2 A semiconductor in the presence of an electric field
In the presence of an electric field, the electrostatic potential is different at different
positions. The energy of an electron has an extra component $= -q\phi$ where $\phi$ is the electrostatic potential. Consequently in the band diagram the conduction, valence and intrinsic levels are bent. In
equilibrium, the Fermi level is still straight. (We shall see later that in the absence of a current, the
slope of the Fermi level must vanish). Relations for n and p must now take the electrostatic potential
as well as the Fermi potential into account and the electron and hole concentrations are not
uniform over the semiconductor.
Figure 1: Potential distribution and Band Diagram in the presence of a field
If we represent the concentrations of electrons and holes without any applied field by $n_0$ and $p_0$ respectively, then
in the presence of a field (but in equilibrium),
$$n = n_0\,e^{q\phi/K_B T}, \qquad p = p_0\,e^{-q\phi/K_B T} \tag{6}$$
where $\phi$ is the electrostatic potential.

If we define a dimensionless electrostatic potential by:
$$u \equiv \frac{q\phi}{K_B T} \tag{7}$$
we can write the above relations as:
$$n = n_0\,e^{u} = n_i\,e^{(u - u_F)}, \qquad p = p_0\,e^{-u} = n_i\,e^{(u_F - u)} \tag{8}$$
Since there is equilibrium, even though the electron and hole concentrations are not uniform,
the product of n and p is still constant and equal to $n_i^2$ everywhere.
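The invariance of the np product under band bending can be checked numerically. This is a small sketch, assuming the dimensionless forms of equation (8) and an illustrative P type sample with p0 = 10^16 /cm^3; the helper name is hypothetical.

```python
import math

N_I = 1.0e10  # intrinsic concentration, /cm^3 (order of magnitude, assumed)


def equilibrium_carriers(u, u_f, n_i=N_I):
    """Return (n, p) from eq. (8) for dimensionless potentials u and u_F."""
    n = n_i * math.exp(u - u_f)
    p = n_i * math.exp(u_f - u)
    return n, p


u_f = math.log(1.0e16 / N_I)  # u_F = ln(p0/n_i) for p0 = 1e16 /cm^3
for u in (0.0, 5.0, 10.0):
    n, p = equilibrium_carriers(u, u_f)
    print(f"u = {u:4.1f}: n = {n:.3e}, p = {p:.3e}, n*p = {n * p:.3e}")
```

Each row shows n rising and p falling as u increases, while the product stays at the square of the intrinsic concentration.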
1.3 Non-equilibrium case
The above relations assume a semiconductor in equilibrium. It is possible to create
excess carriers in the semiconductor over those dictated by equilibrium considerations.
For example, if we shine light on a semiconductor, electron-hole pairs will be created.
Since the value of n as well as that of p goes up, the np product will exceed $n_i^2$, till
equilibrium is restored after the light is turned off (by enhanced recombination). If the
number of excess carriers is small compared to the majority carriers, we may assume
that the carrier concentrations are still described by relations like those given above.
However, the concentrations of electrons and holes are not constrained by relation (2)
any more. Therefore, we cannot use the same value of $u_F$ for describing electron as
well as hole concentrations. We now have separate values of $\phi_F$ for electrons and holes.
These are called quasi Fermi levels (or imrefs) for electrons and holes, $\phi_{Fn}$ and $\phi_{Fp}$,
defined by the relations
$$n = n_i\,e^{(u - u_{Fn})}, \qquad p = n_i\,e^{(u_{Fp} - u)} \tag{9}$$
where $u_{Fn}$ and $u_{Fp}$ are the dimensionless versions of the quasi Fermi levels $\phi_{Fn}$ and
$\phi_{Fp}$, defined as in equation (7). The np product is now given by
$$np = n_i^2\,e^{(u_{Fp} - u_{Fn})} \tag{10}$$
and is no longer constant. Because the number of additional carriers is assumed to be
small compared to the majority carriers, the concentration of majority carriers and
hence its quasi Fermi level is very close to the equilibrium value. The relative change
in the concentration of minority carriers could, however, be large and consequently the
minority carrier quasi Fermi level could be substantially different from the equilibrium
Fermi level.
2 The p-n diode
We shall analyse the abrupt pn junction, in reverse and forward bias.
We assume that the doping density is constant and its value = $N_a$ on the P side and $N_d$ on
the N side, changing abruptly at the metallurgical junction as shown. Because there is a strong
concentration gradient for electrons and holes at the junction, there will be a diffusion current of
holes towards the N side and of electrons towards the P side. As these carriers leave behind ionised
dopants, small regions on either side of the junction acquire a charge. The P side, from where
positively charged holes have left (leaving behind negatively charged acceptor ions), acquires a
negative potential. Similarly, the N side becomes positively charged. The regions from where mobile
charges have left are called depletion regions.
Figure 2: The abrupt p-n junction (P and N regions with depletion widths Xdp and Xdn, and the band diagram showing Ec, EF, Ei and Ev)
The potential difference resulting from this charge redistribution (called the built-in
voltage) opposes further diffusion of carriers. A dynamic equilibrium is reached when
the drift current due to this potential difference and the diffusion current due to the
concentration gradient become equal and opposite. In equilibrium, the electron as
well as hole currents must be zero individually (principle of detailed balance). Writing
the electron and hole current densities as sums of their respective drift and diffusion
current densities:
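$$J_n = n q \mu_n \left(-\frac{\partial\phi}{\partial x}\right) + q D_n \frac{\partial n}{\partial x}$$
$$J_p = p q \mu_p \left(-\frac{\partial\phi}{\partial x}\right) - q D_p \frac{\partial p}{\partial x} \tag{11}$$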
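From equation (9)
$$\frac{\partial n}{\partial x} = n_i\,e^{(u - u_{Fn})}\,\frac{\partial}{\partial x}(u - u_{Fn})$$
$$\frac{\partial p}{\partial x} = n_i\,e^{(u_{Fp} - u)}\,\frac{\partial}{\partial x}(u_{Fp} - u)$$
or
$$\frac{\partial n}{\partial x} = n\,\frac{q}{K_B T}\,\frac{\partial}{\partial x}(\phi - \phi_{Fn})$$
$$\frac{\partial p}{\partial x} = p\,\frac{q}{K_B T}\,\frac{\partial}{\partial x}(\phi_{Fp} - \phi)$$
Using Einstein relations ($D = \frac{K_B T}{q}\mu$), and substituting in the relations for $J_n$ and $J_p$,
$$J_n = n q \mu_n \left(-\frac{\partial\phi}{\partial x}\right) + n q \mu_n \frac{\partial}{\partial x}(\phi - \phi_{Fn})$$
$$J_p = p q \mu_p \left(-\frac{\partial\phi}{\partial x}\right) - p q \mu_p \frac{\partial}{\partial x}(\phi_{Fp} - \phi)$$
which leads to
$$J_n = -n q \mu_n \frac{\partial\phi_{Fn}}{\partial x}; \qquad J_p = -p q \mu_p \frac{\partial\phi_{Fp}}{\partial x} \tag{12}$$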
When there is no flow of current, $\phi_{Fn} = \phi_{Fp} = \phi_F$. According to the relations derived
above, the derivative of $\phi_F$ must vanish everywhere for zero current. Thus, the Fermi
level is constant and the same at the two sides of the junction. The Fermi potentials
before being put in contact were:
$$\phi_F = \frac{K_B T}{q}\ln(N_a/n_i) \qquad \text{P side}: x < 0$$
$$\phi_F = -\frac{K_B T}{q}\ln(N_d/n_i) \qquad \text{N side}: x > 0$$
The Fermi potential difference was, therefore, $\frac{K_B T}{q}\ln\frac{N_d N_a}{n_i^2}$. Since after being put
in contact, the Fermi levels have equalised on the two sides, the built in voltage must
be equal and opposite to this potential, taking the P side to a negative potential and
the N side to a positive potential. We can write for the magnitude of the built in
voltage:
$$V_{bi} = \frac{K_B T}{q}\ln\left(\frac{N_a N_d}{n_i^2}\right) \tag{13}$$
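Equation (13) is easy to evaluate for a given pair of doping densities. The sketch below assumes T = 300 K, an intrinsic concentration of 10^10 /cm^3, and illustrative doping values (10^16 /cm^3 acceptors, 10^18 /cm^3 donors) that are not taken from the notes.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # magnitude of the electronic charge, C


def built_in_voltage(n_a, n_d, n_i=1.0e10, temp_k=300.0):
    """Return the magnitude of the built-in voltage of eq. (13), in volts."""
    return (K_B * temp_k / Q_E) * math.log(n_a * n_d / n_i ** 2)


v_bi = built_in_voltage(n_a=1.0e16, n_d=1.0e18)  # assumed example doping
print(f"V_bi = {v_bi:.3f} V")
```

Because the doping enters only through a logarithm, changing either density by a factor of ten shifts the built-in voltage by only about 60 mV at room temperature.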
2.1 pn Diode in Reverse Bias
The diode is reverse biased when we apply a voltage such that the n side is more
positive as compared to the p side. In this case, the applied voltage is in the same
direction as the built-in field, which opposes the movement of majority carriers and
widens the depletion regions on either side of the junction. We analyse the reverse
biased diode by making the depletion approximation. We assume that in reverse bias,
the depletion regions have zero carrier density, and the field is completely confined to
depletion regions. Solving Poisson's equation in the P region (x < 0) and the N region
(x > 0)
$$\frac{\partial^2\phi}{\partial x^2} = \frac{qN_a}{\epsilon_{si}} \qquad (\text{for } x < 0)$$
$$\frac{\partial^2\phi}{\partial x^2} = -\frac{qN_d}{\epsilon_{si}} \qquad (\text{for } x > 0)$$
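Integrating with respect to x
$$\frac{\partial\phi}{\partial x} = \frac{qN_a}{\epsilon_{si}}\,x + c_1 \qquad (\text{for } x < 0)$$
$$\frac{\partial\phi}{\partial x} = -\frac{qN_d}{\epsilon_{si}}\,x + c_2 \qquad (\text{for } x > 0)$$
where $c_1$ and $c_2$ are constants of integration, which can be evaluated from the condition
that the field vanishes at the edge of the depletion regions at $-X_{dp}$ and at $X_{dn}$. This
leads to
$$\frac{\partial\phi}{\partial x} = \frac{qN_a}{\epsilon_{si}}\,(x + X_{dp}) \qquad (\text{for } x < 0)$$
$$\frac{\partial\phi}{\partial x} = -\frac{qN_d}{\epsilon_{si}}\,(x - X_{dn}) \qquad (\text{for } x > 0) \tag{14}$$
Since the value of the field must match at x = 0,
$$N_a X_{dp} = N_d X_{dn} \tag{15}$$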
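Integrating equation (14) once again with respect to x, we get
$$\phi = \frac{qN_a}{\epsilon_{si}}\left(\frac{x^2}{2} + X_{dp}\,x\right) + c_3 \qquad (\text{for } x < 0)$$
$$\phi = -\frac{qN_d}{\epsilon_{si}}\left(\frac{x^2}{2} - X_{dn}\,x\right) + c_4 \qquad (\text{for } x > 0)$$
where the constants of integration $c_3$ and $c_4$ can again be evaluated from the boundary
conditions at $-X_{dp}$ and $X_{dn}$. If we require that the potential is 0 at $-X_{dp}$ and V at $X_{dn}$,
$$c_3 = \frac{qN_a}{2\epsilon_{si}}\,X_{dp}^2$$
$$c_4 = V - \frac{qN_d}{2\epsilon_{si}}\,X_{dn}^2$$
Substituting these values, we get:
$$\phi = \frac{qN_a}{\epsilon_{si}}\left(\frac{x^2 + X_{dp}^2}{2} + X_{dp}\,x\right) \qquad (\text{for } x < 0)$$
$$\phi = V - \frac{qN_d}{\epsilon_{si}}\left(\frac{x^2 + X_{dn}^2}{2} - X_{dn}\,x\right) \qquad (\text{for } x > 0) \tag{16}$$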
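Since the potential at x = 0 should be continuous,
$$\frac{qN_a}{2\epsilon_{si}}\,X_{dp}^2 = V - \frac{qN_d}{2\epsilon_{si}}\,X_{dn}^2$$
$$\text{so, } V = \frac{q}{2\epsilon_{si}}\left(N_a X_{dp}^2 + N_d X_{dn}^2\right) \tag{17}$$
Making use of equation (15), we can write
$$V = \frac{qN_a X_{dp}^2}{2\epsilon_{si} N_d}\,(N_d + N_a) = \frac{qN_d X_{dn}^2}{2\epsilon_{si} N_a}\,(N_d + N_a)$$
which leads to
$$X_{dp} = \sqrt{\frac{2\epsilon_{si} V}{q(N_d + N_a)}\,\frac{N_d}{N_a}}, \qquad X_{dn} = \sqrt{\frac{2\epsilon_{si} V}{q(N_d + N_a)}\,\frac{N_a}{N_d}} \tag{18}$$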
From which the total depletion width can be calculated as:
$$X_d \equiv X_{dp} + X_{dn} = \sqrt{\frac{2\epsilon_{si} V}{q(N_d + N_a)}}\left(\sqrt{\frac{N_d}{N_a}} + \sqrt{\frac{N_a}{N_d}}\right)$$
which gives
$$X_d = \sqrt{\frac{2\epsilon_{si} V}{q}\left(\frac{1}{N_a} + \frac{1}{N_d}\right)} \tag{19}$$
The voltage V in the above expressions is the total voltage across the junction. Since
there is a reverse bias of $V_{bi}$ for a zero applied voltage, that will add (in magnitude)
to the applied reverse voltage. Using equation (13) we can write:
$$V = V_{bi} + V_{appl} = V_{appl} + \frac{K_B T}{q}\ln\left(\frac{N_a N_d}{n_i^2}\right) \tag{20}$$
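The depletion widths of equations (18) and (19) can be evaluated directly. This is a minimal sketch, assuming a relative permittivity of 11.7 for silicon and illustrative values for the doping densities and the total junction voltage; none of these numbers come from the notes.

```python
import math

Q_E = 1.602176634e-19  # magnitude of the electronic charge, C
EPS_0 = 8.854e-14      # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS_0  # permittivity of silicon (relative value 11.7 assumed)


def depletion_widths(n_a, n_d, v_total):
    """Return (X_dp, X_dn) in cm from eq. (18); v_total is V_bi + V_appl in volts."""
    common = 2.0 * EPS_SI * v_total / (Q_E * (n_a + n_d))
    x_dp = math.sqrt(common * n_d / n_a)
    x_dn = math.sqrt(common * n_a / n_d)
    return x_dp, x_dn


# Assumed example: Na = 1e16 /cm^3, Nd = 1e18 /cm^3, 1 V total across the junction
x_dp, x_dn = depletion_widths(1.0e16, 1.0e18, 1.0)
x_d = x_dp + x_dn
print(f"X_dp = {x_dp * 1e4:.4f} um, X_dn = {x_dn * 1e4:.4f} um, X_d = {x_d * 1e4:.4f} um")
```

With these values almost all of the depletion region lies on the lightly doped P side, which is what the charge balance of equation (15) requires.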
3 The pn diode in forward bias
If we apply an external voltage, such that the P side is made positive with respect
to the N side, the applied voltage will reduce the built in voltage across the junction.
The magnitude of the built-in voltage is such that it balances the drift and diffusion
currents, resulting in zero net current. But if the voltage across the junction is reduced,
a net current will flow through the diode. This is the forward mode of operation.
Because of this flow of current, electrons are injected into the P side and holes into
the N side. Consequently, the concentration of carriers is no longer at the equilibrium
value. We denote the equilibrium values of the electron and hole concentrations on the P and
N sides by $n_{p0}$, $n_{n0}$, $p_{p0}$, $p_{n0}$ respectively. Since the majority carrier concentration in
equilibrium is equal to the doping density, we have:
$$n_{n0} \approx N_d, \qquad p_{p0} \approx N_a$$
and
$$n_{p0} = n_i^2/N_a, \qquad p_{n0} = n_i^2/N_d$$
According to equation (10)
$$np = n_i^2\,e^{(u_{Fp} - u_{Fn})}$$
As we make the potential of P type more positive compared to N type, the np product
in forward bias is greater than $n_i^2$. From relations (12), we see that the change in quasi
Fermi levels is small wherever the carrier concentration is high. Thus, we can assume
that the quasi Fermi levels of the majority carriers at either side of the junction remain
at their equilibrium values. Hence the voltage across the junction is given by
$$V = \phi_{Fp} - \phi_{Fn}$$
and therefore the non-equilibrium np product is given by
$$np = n_i^2\,e^{qV/K_B T}$$
therefore,
$$n_p = \frac{n_i^2}{p_p}\,e^{qV/K_B T} = n_{p0}\,e^{qV/K_B T} \tag{21}$$
$$p_n = \frac{n_i^2}{n_n}\,e^{qV/K_B T} = p_{n0}\,e^{qV/K_B T} \tag{22}$$
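The continuity equation for any particle flow can be written as
$$-\nabla\cdot(\text{particle current density}) = \frac{\partial}{\partial t}(\text{particle concentration})$$
Applying it to electron and hole currents in 1 dimension on the n side, in steady state and with carriers removed by recombination,
$$\frac{\partial}{\partial x}\left(\frac{J_n}{q}\right) = U, \qquad -\frac{\partial}{\partial x}\left(\frac{J_p}{q}\right) = U$$
where U is the net recombination rate. Using relation (11), we have
$$\frac{\partial}{\partial x}\left(\mu_n n_n \frac{\partial\phi}{\partial x} - D_n \frac{\partial n_n}{\partial x}\right) = -U$$
$$\frac{\partial}{\partial x}\left(\mu_p p_n \frac{\partial\phi}{\partial x} + D_p \frac{\partial p_n}{\partial x}\right) = U$$
or
$$\mu_n \frac{\partial n_n}{\partial x}\frac{\partial\phi}{\partial x} + \mu_n n_n \frac{\partial^2\phi}{\partial x^2} - D_n \frac{\partial^2 n_n}{\partial x^2} = -U$$
$$\mu_p \frac{\partial p_n}{\partial x}\frac{\partial\phi}{\partial x} + \mu_p p_n \frac{\partial^2\phi}{\partial x^2} + D_p \frac{\partial^2 p_n}{\partial x^2} = U$$
Assuming the regions outside the small depletion regions to be charge neutral,
$$(n_n - n_{n0}) \approx (p_n - p_{n0})$$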
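We define ambipolar diffusion and lifetime by the relations
$$D_a \equiv \frac{n_n + p_n}{n_n/D_p + p_n/D_n} \tag{23}$$
$$\tau_a \equiv \frac{n_n - n_{n0}}{U} = \frac{p_n - p_{n0}}{U} \tag{24}$$
Multiplying the electron continuity equation with $\mu_p p_n$ and the hole continuity equation with $\mu_n n_n$ and combining, we get
$$-\frac{p_n - p_{n0}}{\tau_a} + D_a \frac{\partial^2 p_n}{\partial x^2} + \frac{n_n - p_n}{n_n/\mu_p + p_n/\mu_n}\,\frac{\partial\phi}{\partial x}\frac{\partial p_n}{\partial x} = 0 \tag{25}$$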
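If we make the low injection assumption ($p_n \ll n_n \approx n_{n0}$), this reduces to
$$-\frac{p_n - p_{n0}}{\tau_p} + D_p \frac{\partial^2 p_n}{\partial x^2} + \mu_p \frac{\partial\phi}{\partial x}\frac{\partial p_n}{\partial x} = 0 \tag{26}$$
In the neutral region, $\partial\phi/\partial x$ is zero, so the above simplifies further to
$$\frac{\partial^2 p_n}{\partial x^2} - \frac{p_n - p_{n0}}{D_p \tau_p} = 0 \tag{27}$$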
This can be solved with the boundary condition given by relation (22) and noting that
$p_n = p_{n0}$ at $x = \infty$ to give:
$$p_n - p_{n0} = p_{n0}\left(e^{qV/K_B T} - 1\right) e^{-(x - x_n)/L_p} \tag{28}$$
where
$$L_p \equiv \sqrt{D_p \tau_p} \tag{29}$$
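Evaluating the hole current at $X_{dn}$, we get
$$J_p = -q D_p \frac{\partial p_n}{\partial x} = \frac{q D_p p_{n0}}{L_p}\left(e^{qV/K_B T} - 1\right) \tag{30}$$
Similarly, we can evaluate the electron current on the p side as
$$J_n = q D_n \frac{\partial n_p}{\partial x} = \frac{q D_n n_{p0}}{L_n}\left(e^{qV/K_B T} - 1\right) \tag{31}$$
which gives the total current density as
$$J = J_p + J_n = J_s\left(e^{qV/K_B T} - 1\right) \tag{32}$$
where
$$J_s \equiv \frac{q D_p p_{n0}}{L_p} + \frac{q D_n n_{p0}}{L_n} \tag{33}$$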
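Equations (32) and (33) can be turned into a short numerical sketch. The diffusion constants (12 and 35 cm^2/s), the lifetimes (1 microsecond) and the doping densities used here are illustrative assumptions and are not given in the notes; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # magnitude of the electronic charge, C


def saturation_current_density(n_a, n_d, d_p, d_n, tau_p, tau_n, n_i=1.0e10):
    """Return J_s of eq. (33) in A/cm^2 (densities in /cm^3, D in cm^2/s, tau in s)."""
    p_n0 = n_i ** 2 / n_d         # equilibrium holes on the N side
    n_p0 = n_i ** 2 / n_a         # equilibrium electrons on the P side
    l_p = math.sqrt(d_p * tau_p)  # hole diffusion length, eq. (29)
    l_n = math.sqrt(d_n * tau_n)  # electron diffusion length
    return Q_E * d_p * p_n0 / l_p + Q_E * d_n * n_p0 / l_n


def diode_current_density(v, j_s, temp_k=300.0):
    """Return J of eq. (32); v > 0 means the P side is positive (forward bias)."""
    return j_s * math.expm1(Q_E * v / (K_B * temp_k))


# Assumed example values, for illustration only
j_s = saturation_current_density(1.0e16, 1.0e18, d_p=12.0, d_n=35.0,
                                 tau_p=1.0e-6, tau_n=1.0e-6)
for v in (-1.0, 0.0, 0.3, 0.6):
    print(f"V = {v:5.2f} V: J = {diode_current_density(v, j_s):.3e} A/cm^2")
```

The reverse current settles at the saturation value, while the forward current grows by roughly a factor of ten for every 60 mV at room temperature.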
4 The MOS Capacitor
It is important to understand the MOS capacitor in order to understand the behaviour
of the MOS transistor. Before we describe the MOS structure, it is useful to review
the basic electrostatics as applied to parallel plate capacitors. We shall then go on to
analyse the MOS structure.
4.1 The Parallel Plate Capacitor
The parallel plate capacitor consists of two parallel metallic plates of area A, separated
by an insulator of thickness $t_i$ and dielectric constant $\epsilon$. If we place a charge Q on the
upper plate, it attracts charges of opposite sign in the bottom plate, while repelling
charges of the same sign.
If the bottom plate is connected to ground, the repelled charge flows to ground.
Now the two capacitor plates hold equal and opposite charge. This charge resides just
next to the insulator on either side of it. This is true, whatever the quantity or sign
of charge placed on the upper plate. The inducing and induced charge are always
separated by the thickness of the insulator, $t_i$. Therefore this structure has a constant
capacitance given by:
$$C_{total} = \frac{\epsilon A}{t_i}$$
Since there are no charges inside the dielectric, the electric field in the insulator is
constant and the electrostatic potential changes linearly from one plate to the other.
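As a numerical illustration of the parallel plate formula, the capacitance per unit area of a thin oxide layer can be computed as below. This sketch assumes silicon dioxide with a relative permittivity of 3.9 and a thickness of 22.5 nm; both values are assumptions chosen for illustration.

```python
EPS_0 = 8.854e-14  # permittivity of free space, F/cm


def capacitance_per_area(thickness_cm, eps_r=3.9):
    """Return the parallel plate capacitance per unit area, eps/t_i, in F/cm^2."""
    return eps_r * EPS_0 / thickness_cm


t_i = 22.5e-7  # 22.5 nm expressed in cm (assumed example thickness)
c_per_area = capacitance_per_area(t_i)
print(f"C/A = {c_per_area:.3e} F/cm^2")
```

Halving the insulator thickness doubles the capacitance, since the inducing and induced charges sit closer together.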
4.2 The MOS capacitor
In a MOS capacitor, we replace the lower plate by a semiconductor. Unlike a metal,
a semiconductor can have charges distributed in its bulk. For the sake of an example,
let us consider a P type semiconductor (Si) doped to $10^{16}$ atoms/cm$^3$. As we know,
holes outnumber electrons in this semiconductor by an extremely large factor.
[Figure: MOS capacitor cross-section showing Metal, Insulator (Oxide) of thickness $t_i$, Depletion region, Semiconductor and Metal, with the charge Q on the upper plate]
If we place a negative charge on the upper plate, holes
will be attracted by this charge, and will accumulate near the silicon-insulator interface. This
situation is analogous to the parallel plate capacitor and thus, the capacitance will be the same as
that for a parallel plate capacitor. If, however, we
place a positive charge on the upper plate, negative charges will be attracted by it and positive
charges will be repelled. In a P type semiconductor, there are very few electrons. The negative charge is provided by the ionised
acceptors after the holes have been pushed away from them. But the acceptors are
fixed in their locations and cannot be driven to the edge of the insulator. Therefore,
the distance between the induced and inducing charges increases - so the capacitance
is lower as compared to the parallel plate capacitor. As more and more positive charge
is placed on the upper plate, holes from a thicker slice of the semiconductor are driven
away, and the incremental induced charge is farther from the inducing charge. Thus
the capacitance continues to decrease. This does not, however, continue indefinitely.
We know from the law of mass action that as hole density reduces, the electron density
increases. At some point, the hole density is reduced and electron density increased to
such an extent that electrons now become the majority carriers near the interface.
This is called inversion. Beyond this point, more positive charge on the upper plate is
answered by more electrons in the semiconductor. But the electrons are mobile, and
will be attracted to the silicon insulator interface. Therefore, the capacitance quickly
increases to the parallel plate value.
Figure 3: Low frequency capacitance for a MOS capacitor (Capacitance plotted across the Accumulation, Depletion and Inversion regions)
4.3 Quantitative Analysis
Consider a one dimensional representation of the MOS structure as shown in the
figure below.
[Figure: one dimensional M-O-S-M structure with the origin at the silicon-oxide interface and the X axis pointing into the silicon]
The origin is assumed to be at the silicon-oxide
interface and the positive x direction is into the
bulk of silicon. Using a one dimensional analysis,
we want to relate the semiconductor charge to the
applied gate voltage. In a practical case, there is a
potential difference between two dissimilar materials in contact. Also, the silicon-oxide interface
will have some fixed charge sitting there. However, we consider the ideal case first, where there
is no built in contact potential between the semiconductor and the metal, and there
is no interface charge.
4.3.1 Ideal Case
Let the back surface of Si be at zero potential and the voltage applied to the gate
terminal be $V_g$. Let the electrostatic potential at any point x be denoted by $\phi(x)$ and
let the potential at the silicon-oxide interface be $\phi_s$.
We construct a Gaussian box passing through
the interface and extending to $+\infty$. According
to Gauss law, the integral of the outward pointing D vector around the box should be equal to
the charge contained inside. The only boundary
where D is non zero is the one passing through
the interface. Therefore,
$$\text{Area} \times \epsilon_{ox}\,\frac{\phi_s - V_g}{t_{ox}} = \text{Total Charge in silicon}$$
[Figure: the M-O-S-M structure with the Gaussian box enclosing the semiconductor]
If we define $Q_{si}$ to be the semiconductor charge
per unit area, and $C_{ox}$ to be the parallel plate capacitance per unit area, we get
$$V_g = \phi_s - \frac{Q_{si}}{C_{ox}}$$
Thus, the surface potential and the applied gate voltage can be related to each other.
If the surface potential is known, we can evaluate the semiconductor charge by integrating
Poisson's equation in the semiconductor, once.
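We can write Poisson's equation in the semiconductor as
$$\nabla\cdot D = \rho$$
or
$$-\epsilon_{si}\,\frac{\partial^2\phi}{\partial x^2} = q\,(N_d^+ - N_a^- + p - n)$$
Since the electrostatic potential is dependent only on x, we can change partial derivatives to total derivatives.
$$\frac{d^2\phi}{dx^2} = \frac{d}{dx}\left(\frac{d\phi}{dx}\right) = -\frac{d}{dx}(E)$$
where E is the electrostatic field. Changing the variable from x to $\phi$,
$$\frac{d^2\phi}{dx^2} = -\frac{dE}{dx} = -\frac{d\phi}{dx}\,\frac{d}{d\phi}(E) = E\,\frac{d}{d\phi}(E) = \frac{1}{2}\,\frac{d}{d\phi}E^2$$
If we define
$$u \equiv \beta\phi \quad\text{where}\quad \beta \equiv \frac{q}{K_B T}$$
we get
$$\frac{d^2\phi}{dx^2} = \frac{1}{2}\,\frac{d}{d\phi}E^2 = \frac{\beta}{2}\,\frac{d}{du}E^2 \tag{34}$$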
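The right hand side of Poisson's equation represents the charge density. In the
absence of an applied voltage, this must be zero everywhere. Therefore,
$$q\,(N_d^+ - N_a^- + p_0 - n_0) = 0$$
where $p_0$ and $n_0$ represent the hole and electron density in the absence of an applied
field. Therefore,
$$N_d^+ - N_a^- = -(p_0 - n_0)$$
Substituting equation (34) and the above in Poisson's equation,
$$-\frac{\epsilon_{si}\,\beta}{2}\,\frac{d}{du}E^2 = q\,[p - p_0 - (n - n_0)]$$
so
$$\frac{d}{du}E^2 = -\frac{2qp_0}{\beta\epsilon_{si}}\left[\frac{p}{p_0} - 1 - \frac{n_0}{p_0}\left(\frac{n}{n_0} - 1\right)\right]$$
From equation (8)
$$n = n_0\,e^{u} \quad\text{and}\quad p = p_0\,e^{-u}$$
So,
$$\frac{d}{du}E^2 = -\frac{2qp_0}{\beta\epsilon_{si}}\left[e^{-u} - 1 - \frac{n_0}{p_0}\left(e^{u} - 1\right)\right]$$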
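This can be integrated from $x = \infty$ (where E = 0 and u = 0) to x to give
$$E^2 = \frac{2qp_0}{\beta\epsilon_{si}}\left[e^{-u} - 1 + u + \frac{n_0}{p_0}\left(e^{u} - 1 - u\right)\right]$$
Therefore
$$E = \pm\sqrt{\frac{2qp_0}{\beta\epsilon_{si}}}\left[e^{-u} - 1 + u + \frac{n_0}{p_0}\left(e^{u} - 1 - u\right)\right]^{1/2}$$
And thus, the displacement vector D can be evaluated as:
$$D = \epsilon_{si} E = \pm\sqrt{\frac{2qp_0\epsilon_{si}}{\beta}}\left[e^{-u} - 1 + u + \frac{n_0}{p_0}\left(e^{u} - 1 - u\right)\right]^{1/2} \tag{35}$$
The positive sign applies for u > 0 and the negative sign for u < 0.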
This equation permits us to calculate D ($= -\epsilon_{si}\,\partial\phi/\partial x$) from u. In fact if u is very small,
the exponentials in u can be expanded to second order. The first two terms cancel
with 1 and u, leaving
$$\frac{\partial u}{\partial x} \simeq -\sqrt{\frac{q\beta p_0}{\epsilon_{si}}\left(1 + \frac{n_0}{p_0}\right)}\;u$$
If we take $n_0 \ll p_0$, we get exponential solutions for u with a characteristic length
$L_D = \sqrt{\epsilon_{si}/(q\beta p_0)}$. This implies that small local perturbations in potential tend to decrease
exponentially, with this characteristic length. This length is known as the extrinsic
Debye Length.
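To get a feel for the size of the extrinsic Debye length, it can be evaluated for a typical doping. The sketch below assumes T = 300 K, a relative permittivity of 11.7 for silicon and p0 = 10^16 /cm^3; the function name is illustrative.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # magnitude of the electronic charge, C
EPS_0 = 8.854e-14      # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS_0  # permittivity of silicon (relative value 11.7 assumed)


def debye_length(p0, temp_k=300.0):
    """Return the extrinsic Debye length sqrt(eps_si / (q * beta * p0)) in cm."""
    beta = Q_E / (K_B * temp_k)  # beta = q / (K_B T), in 1/V
    return math.sqrt(EPS_SI / (Q_E * beta * p0))


l_d = debye_length(1.0e16)
print(f"L_D = {l_d * 1e7:.1f} nm")
```

The length scales as the inverse square root of the doping, so a heavier doped sample screens potential perturbations over a shorter distance.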
By putting $u = u_s$ in eq. 35, we get the D vector at the surface. We construct
a Gaussian box passing through the interface and enclosing the semiconductor (as
described in section 4.3.1). The charge contained in the box is then the integral of
the outward pointing D vector over the surface of the box. D is non zero only at
the interface. The outward pointing D is along the negative x axis. Therefore by
application of Gauss theorem,
$$\text{Sem. Charge} = \text{Area} \times (-D)$$
Hence the charge in the semiconductor per unit area is:
$$Q_{si} = \mp\frac{\sqrt{2}\,\epsilon_{si}}{\beta L_D}\left[e^{-u_s} - 1 + u_s + \frac{n_0}{p_0}\left(e^{u_s} - 1 - u_s\right)\right]^{1/2} \tag{36}$$
where the negative sign applies for $u_s > 0$ and the positive sign for $u_s < 0$,
$$\beta \equiv \frac{q}{K_B T}, \qquad u_s = \beta\phi_s$$
and
$$L_D \equiv \sqrt{\frac{\epsilon_{si}}{q\beta p_0}} = \text{The Extrinsic Debye Length}$$
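Equation (36) can be evaluated numerically to see how the semiconductor charge varies with surface potential. This is a sketch under stated assumptions: T = 300 K, an intrinsic concentration of 10^10 /cm^3, a relative permittivity of 11.7 for silicon, and p0 = 10^16 /cm^3. The function name and the sample surface potentials are illustrative.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
Q_E = 1.602176634e-19      # magnitude of the electronic charge, C
EPS_0 = 8.854e-14          # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS_0      # permittivity of silicon (relative value 11.7 assumed)
BETA = Q_E / (K_B * 300.0) # q / (K_B T) at an assumed 300 K, in 1/V


def semiconductor_charge(u_s, p0, n_i=1.0e10):
    """Return Q_si of eq. (36) in C/cm^2 for dimensionless surface potential u_s."""
    n0 = n_i ** 2 / p0
    l_d = math.sqrt(EPS_SI / (Q_E * BETA * p0))
    # expm1 keeps the small-u_s cancellation (exp - 1) numerically accurate
    bracket = (math.expm1(-u_s) + u_s) + (n0 / p0) * (math.expm1(u_s) - u_s)
    magnitude = math.sqrt(2.0) * EPS_SI / (BETA * l_d) * math.sqrt(max(bracket, 0.0))
    return -math.copysign(magnitude, u_s)  # charge sign is opposite to that of u_s


for phi_s in (-0.2, 0.2, 0.4, 0.8):
    q_si = semiconductor_charge(BETA * phi_s, 1.0e16)
    print(f"phi_s = {phi_s:5.2f} V: Q_si = {q_si:.3e} C/cm^2")
```

For moderate positive surface potentials the result is close to the depletion charge alone, while at larger surface potentials the exponential electron term takes over.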
Notice that $Q_{si}$ is the charge in the semiconductor per unit area. In this treatment,
we shall use symbols of the type Q and C with various subscripts to denote the corresponding charges and capacitance values per unit area. $Q_{si}$ consists of mobile as
well as fixed charge. The mobile charge is contributed by holes when $u_s < 0$ and
by electrons when $u_s > 0$ (for a P type semiconductor). As we shall see later, the
mobile electron charge is substantial only when the positive surface potential exceeds
a threshold value.
The fixed charge is contributed by the depletion charge when the surface potential
is positive. The depletion charge per unit area can be calculated by the depletion
formula.
$$Q_{depl} = -qN_a X_d = -\sqrt{2qN_a\epsilon_{si}\,\phi_s} \qquad (\phi_s > 0)$$
A somewhat more accurate expression for depletion charge accounts for slightly lower
charge density at the edge of the depletion region by subtracting $K_B T/q$ from $\phi_s$.
$$Q_{depl} = -qN_a X_d = -\sqrt{2qN_a\epsilon_{si}\,(\phi_s - K_B T/q)} \qquad (\phi_s > K_B T/q) \tag{37}$$
Figure 4: semiconductor charge as a function of surface potential (Abs. Sem. Charge (C/cm^2) on a logarithmic axis; curves labelled Maj. Carrier Charge, Q total and Depl.)
Calculated values for the total semiconductor charge per unit area (i.e. inclusive
of depletion and mobile charge) and just the depletion charge per unit area have
been plotted in figure 4 for a P type semiconductor doped to $10^{16}$ /cm$^3$. For small
positive surface potential, the total semiconductor charge contains only depletion
charge. However, beyond a surface potential near $2\phi_F$, the total charge exceeds the
depletion charge very rapidly. This additional charge is due to mobile minority carriers
(in this case, electrons).
4.3.2 Practical case
A practical MOS structure will differ from the ideal case assumed above in a few
respects. There is a built-in potential difference between the metal used and Si, due
to the difference between their work functions. This shifts the relationship between
$V_g$ and $\phi_s$. Also, there is a fixed oxide charge which resides essentially at the silicon-oxide interface. Thus, the total charge in the Gaussian box includes this fixed charge
and the semiconductor charge. These two non-idealities can be accounted for by
modifying the relationship between $V_g$ and $\phi_s$ to be
$$V_g = \Phi_{ms} + \phi_s - \frac{Q_{si} + Q_{ox}}{C_{ox}} \tag{38}$$
where $\Phi_{ms}$ is the metal to semiconductor work function difference.
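Equation (38) gives the gate voltage for a chosen surface potential once the semiconductor charge is known from equation (36). The sketch below combines the two. It assumes T = 300 K, relative permittivities of 11.7 (Si) and 3.9 (oxide), an oxide thickness of 22.5 nm, p0 = 10^16 /cm^3, an oxide charge of 4 x 10^10 q per cm^2, and a work function difference of -0.9 V; the last value in particular is an illustrative assumption, not a figure from the notes.

```python
import math

K_B = 1.380649e-23         # Boltzmann constant, J/K
Q_E = 1.602176634e-19      # magnitude of the electronic charge, C
EPS_0 = 8.854e-14          # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS_0      # silicon (relative permittivity assumed)
EPS_OX = 3.9 * EPS_0       # oxide (relative permittivity assumed)
BETA = Q_E / (K_B * 300.0) # q / (K_B T) at an assumed 300 K, in 1/V


def q_si(phi_s, p0, n_i=1.0e10):
    """Semiconductor charge per unit area from eq. (36), in C/cm^2."""
    u_s = BETA * phi_s
    l_d = math.sqrt(EPS_SI / (Q_E * BETA * p0))
    bracket = (math.expm1(-u_s) + u_s) + (n_i ** 2 / p0 ** 2) * (math.expm1(u_s) - u_s)
    magnitude = math.sqrt(2.0) * EPS_SI / (BETA * l_d) * math.sqrt(max(bracket, 0.0))
    return -math.copysign(magnitude, u_s)


def gate_voltage(phi_s, p0, t_ox_cm, phi_ms=0.0, q_ox=0.0):
    """Gate voltage from eq. (38); phi_ms = q_ox = 0 gives the ideal case."""
    c_ox = EPS_OX / t_ox_cm
    return phi_ms + phi_s - (q_si(phi_s, p0) + q_ox) / c_ox


T_OX = 22.5e-7          # 22.5 nm in cm
Q_OX = 4.0e10 * Q_E     # assumed oxide charge per cm^2
for phi_s in (0.0, 0.2, 0.4, 0.7):
    ideal = gate_voltage(phi_s, 1.0e16, T_OX)
    practical = gate_voltage(phi_s, 1.0e16, T_OX, phi_ms=-0.9, q_ox=Q_OX)
    print(f"phi_s = {phi_s:.2f} V: ideal Vg = {ideal:.3f} V, practical Vg = {practical:.3f} V")
```

In this model the two non-idealities simply shift the whole curve along the gate voltage axis by a constant amount.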

Figure 5 shows the surface potential as a function of applied voltage for a MOS
capacitor with oxide thickness of 22.5 nm, substrate doping of $10^{16}$ /cc, oxide charge
of $4 \times 10^{10}\,q$ and aluminium as the gate metal. The surface potential changes quite
slowly as a function of gate voltage in the accumulation and inversion regions.
The absolute value of semiconductor charge has been plotted as a function of applied
gate voltage in figure 6. (The charge is actually negative for positive gate voltages).
As one can see, for small positive gate voltages, the entire semiconductor charge
is depletion charge. As the voltage exceeds a threshold voltage, the total charge
becomes much larger than the depletion charge. The excess charge is provided by
mobile electron charges. This is the inversion region of operation, where electrons
become the majority carriers near the surface in a p type semiconductor. Notice that
the depletion charge is practically constant in this region. This region begins when
the surface potential exceeds $2\phi_F$.
Figure 5: Surface potential as a function of gate voltage (Surface Potential (V) against GATE VOLTAGE (V))
Figure 6: Semiconductor charge as a function of gate voltage (Abs. Sem. Charge (C/cm^2) against Gate Voltage (V); curves labelled Q total, Q inv and depletion)
5 The MOS Transistor
Inversion converts a p type semiconductor to n type at the surface. We can use this
fact to construct a transistor. We place semiconductor regions strongly doped to N
type on either side of a MOS capacitor made using P type silicon. Now if we try
to pass a current between these two N regions when inversion has not occurred, we
encounter series connected NP and PN diodes on the way. Whatever the polarity of
the voltage applied to pass current, one of these will be reverse biased and practically
no current will flow.
Figure 7: A MOS Transistor (n+ regions S and D on either side of the GATE, on P type Si)
However, after inversion, the intervening P region would have been converted to
N type. Now there are no junctions as the whole surface region is n type. Current
can now be easily passed between the two n regions. This structure is an n channel
MOS transistor. PMOS transistors can be similarly made using P regions on either
side of a MOS capacitor made on n type silicon. When current flows in an n channel
transistor, electrons are supplied by the more negative of the two n+ contacts. This
is called the source electrode. The more positive n+ contact collects the electrons and
is called the drain. The current in the transistor is controlled by the metal electrode
on top of the oxide. This is called the gate electrode.
6 I-V characteristics of a MOS transistor
A quantitative derivation of the current-voltage characteristics of the MOS device is
complicated by the fact that it is inherently a two dimensional device. The vertical
field due to the gate voltage sets up a mobile charge density in the channel region as
seen in figure 6. The horizontal field due to source-drain voltage causes these charges
to move, and this constitutes the drain current. Therefore, a two dimensional analysis
is required to calculate the transistor current, which can be quite complex. However,
reasonably simple models can be derived by making several simplifying assumptions.
6.1 A simple MOS model
We make the following simplifying assumptions:
- The vertical field is much larger than the horizontal field. Then, the resultant
  field is nearly vertical, and the results derived from the 1 dimensional analysis
  of the MOS capacitor can be used to calculate the point-wise charge density
  in the channel. This is known as the gradual channel approximation. Accurate
  numerical simulations have shown that this approximation is valid in most cases.
- The source is shorted to the bulk.
- The gate and drain voltages are such that a continuous inversion region exists
  all the way from the source to the drain.
- The depletion charge is constant along the channel.
- The total current is dominated by drift current.
- The mobility of carriers is constant along the channel.
Figure 8 shows the co-ordinate system used for evaluating the drain current. The x
axis points into the semiconductor, the y axis is from source to the drain and the
z axis is along the width of the transistor. The origin is at the source end of the
channel. We represent the channel voltage as V(y), which is 0 at the source end and
Vd at the drain end. We assume the current to be made up of just the drift current.
Since we are carrying out a quasi 2 dimensional analysis, all variables are assumed
to be constant along the z axis. Let n(x,y) be the concentration of mobile carriers
(electrons for an n channel device) at the position x,y (for any z). The drift current
density at a point is
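$$J = (\text{no. of carriers}) \times (\text{charge per carrier}) \times (\text{velocity})$$

$$J = n(x,y)\,(-q)\,\mu\,\frac{\partial V(y)}{\partial y} = -\mu\, n(x,y)\, q\, \frac{\partial V(y)}{\partial y}$$

Here $\mu$ is the carrier mobility and $\mu\,\partial V(y)/\partial y$ is the electron drift velocity along y. The negative sign means that the conventional current flows from drain to source; the drain current $I_d$ is counted positive in that direction.

Figure 8: Coordinate system used for analysing the MOS transistor
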
Integrating the current density over a semi-infinite plane at the channel position y (as
shown in the figure 8) will then give the drain current.
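$$I_d = \int_{x=0}^{\infty} \int_{z=0}^{W} \mu\, n(x,y)\, q\, \frac{\partial V(y)}{\partial y}\, dz\, dx$$

Since there is no dependence on z, the z integral just gives a multiplication by W.
Therefore,

$$I_d = \mu W q \int_{x=0}^{\infty} n(x,y)\, \frac{\partial V(y)}{\partial y}\, dx$$

The value of n(x,y) is non zero only in a very narrow channel near the surface. We can
assume that $\partial V(y)/\partial y$ is constant over this depth. Then,

$$I_d = \mu W q\, \frac{\partial V(y)}{\partial y} \int_{x=0}^{\infty} n(x,y)\, dx$$

But $q \int_{x=0}^{\infty} n(x,y)\, dx = -Q_n(y)$, where $Q_n(y)$ is the electron charge per unit area in the
semiconductor at point y in the channel. ($Q_n(y)$ is negative, of course.) Therefore,

$$I_d = -\mu W\, Q_n(y)\, \frac{\partial V(y)}{\partial y} \qquad (39)$$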
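Integrating the drain current along the channel (the same current flows at every y) gives

$$\int_{0}^{L} I_d\, dy = -\mu W \int_{0}^{L} Q_n(y)\, \frac{\partial V(y)}{\partial y}\, dy$$

$$I_d L = -\mu W \int_{0}^{V_d} Q_n(y)\, dV(y)$$

$$\text{So, } I_d = -\mu \frac{W}{L} \int_{0}^{V_d} Q_n(y)\, dV(y)$$
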
We now use the assumption that the surface potential due to the vertical field saturates
around $2\Phi_F$ if we are in the inversion region. Therefore, the total surface potential at
point y is $\phi_s = V(y) + 2\Phi_F$. Now, by Gauss law and continuity of the normal component of
D at the interface,

$$C_{ox}\,(V_g - \Phi_{MS} - \phi_s) = -(Q_{si} + Q_{ox})$$

therefore,

$$Q_{si} = -C_{ox}\left[V_g - \Phi_{MS} - V(y) - 2\Phi_F + Q_{ox}/C_{ox}\right]$$

However,

$$Q_{si} = Q_n + Q_{depl}$$

So

$$Q_n(y) = Q_{si}(y) - Q_{depl} = -C_{ox}\left[V_g - \Phi_{MS} - V(y) - 2\Phi_F + (Q_{ox} + Q_{depl})/C_{ox}\right]$$

We have assumed the depletion charge to be constant along the channel. Let us define

$$V_T \equiv \Phi_{MS} + 2\Phi_F - \frac{Q_{ox} + Q_{depl}}{C_{ox}}$$

then

$$Q_n(y) = -C_{ox}\,(V_g - V_T - V(y))$$

and therefore,

$$I_d = \mu C_{ox} \frac{W}{L} \int_{0}^{V_d} (V_g - V_T - V(y))\, dV(y) = \mu C_{ox} \frac{W}{L} \left[(V_g - V_T)V_d - \frac{1}{2} V_d^2\right] \qquad (40)$$

This derivation gives a very simple expression for the drain current. However, it
requires a lot of simplifying assumptions, which limit the accuracy of this model.
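If we do not assume a constant depletion charge along the channel, we can apply the
depletion formula to get its dependence on V(y).

$$Q_{depl} = -\sqrt{2\epsilon_{si}\, q N_a\, (V(y) + 2\Phi_F)}$$

then,

$$Q_n = -C_{ox}\left[V_g - \Phi_{MS} - V(y) - 2\Phi_F + \frac{Q_{ox}}{C_{ox}} - \frac{\sqrt{2\epsilon_{si}\, q N_a\, (V(y) + 2\Phi_F)}}{C_{ox}}\right]$$

which leads to

$$I_d = \mu C_{ox} \frac{W}{L} \left[\left(V_g - \Phi_{MS} - 2\Phi_F + \frac{Q_{ox}}{C_{ox}}\right)V_d - \frac{1}{2}V_d^2 - \frac{2}{3}\frac{\sqrt{2\epsilon_{si}\, q N_a}}{C_{ox}}\left((V_d + 2\Phi_F)^{3/2} - (2\Phi_F)^{3/2}\right)\right]$$

This is a more complex expression, but gives better accuracy.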
6.2 Modeling the saturation region
The treatment in the previous section is valid only if there is an inversion layer all the
way from the source to the drain. For high drain voltage, the local vertical field near
the drain is not adequate to take the semiconductor into inversion. Several models
have been used to describe the transistor behaviour in this regime. The simplest of
these defines a saturation voltage at which the channel just pinches off at the drain
end. The current calculated for this voltage by the above models is then supposed
to remain constant at this value for all higher drain voltages. The pinchoff voltage is
the drain voltage at which the channel just vanishes near the drain end. Therefore,
at this point the gate voltage Vg is just less than a threshold voltage above the drain
voltage Vd . Thus, at this point,
$$V_{dsat} = V_g - V_T$$

The current calculated at $V_{dsat}$ will be denoted as $I_{dss}$. Thus,

$$I_{dss} = \mu C_{ox} \frac{W}{L} \left[(V_g - V_T)^2 - \frac{1}{2}(V_g - V_T)^2\right]$$

for the simple transistor model. Thus

$$I_{dss} = \frac{1}{2}\, \mu C_{ox} \frac{W}{L}\, (V_g - V_T)^2 \qquad (41)$$

The drain current is supposed to remain constant at this $V_d$ independent value for all
drain voltages $> V_g - V_T$.
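The piecewise behaviour described by eq. 40 and eq. 41 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `drain_current`, the lumped parameter `k` (standing for the product of mobility, oxide capacitance per unit area and W/L) and the default values are assumptions made for the example, not values from these notes.

```python
def drain_current(vg, vd, vt=0.7, k=1e-4):
    """Drain current of the simple n-channel model.

    k stands for mu * Cox * W / L (in A/V^2). Both vt and k are
    illustrative values assumed for this sketch.
    """
    vov = vg - vt  # gate overdrive; equals Vdsat in the simple model
    if vov <= 0:
        return 0.0  # no inversion layer, so the simple model gives no current
    if vd < vov:
        # Linear region: eq. 40
        return k * (vov * vd - 0.5 * vd * vd)
    # Saturation: current held at its value for Vd = Vdsat, eq. 41
    return 0.5 * k * vov * vov


if __name__ == "__main__":
    for vd in (0.5, 1.0, 1.5, 3.0):
        print(vd, drain_current(2.0, vd, vt=0.5, k=2.0))
```

Both branches give the same value at Vd = Vg - VT, so the current is continuous at the changeover point, and it stays flat for larger drain voltages.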
6.2.1 Early Voltage approach
Assuming a constant current in the saturation region leads to an infinite output resistance. This can lead to exaggerated estimates of gain from an amplifier. Therefore,
we need a more realistic model for the transistor current in the saturation region.
One of these is a generalisation of the model proposed by James Early for bipolar
transistors. This model is not strictly applicable to MOS transistors. However, due
to its numerical simplicity, it is often used in compact models for circuit simulation.
A geometrical interpretation of the Early model states that the drain current
increases linearly in the saturation region with drain voltage, and if saturation characteristics for different gate voltages are produced backwards, they will all cut the
drain voltage axis at the same (negative) drain voltage point. The absolute value of
this voltage is called the Early Voltage VE .
The current equations in saturation mode now become:
$$I_{dss} \equiv I_d(V_g, V_{dss})$$

$$I_d = I_{dss}\, \frac{V_d + V_E}{V_{dss} + V_E} \qquad \text{for } V_d > V_{dss} \qquad (42)$$

Any model can be used for calculating the drain current for $V_d < V_{dss}$. The value of
$V_{dss}$ will be determined by considerations of continuity of the drain current and its
derivative at the changeover point from linear to saturation regime. For example, if
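we use the simple model described in eq. 40,

$$\frac{\partial I_d}{\partial V_d} = \mu C_{ox} \frac{W}{L}\, (V_g - V_T - V_d) \qquad \text{for } V_d \le V_{dss}$$

and

$$\frac{\partial I_d}{\partial V_d} = \frac{I_{dss}}{V_{dss} + V_E} \qquad \text{for } V_d \ge V_{dss}$$

where

$$I_{dss} \equiv \mu C_{ox} \frac{W}{L} \left[(V_g - V_T)V_{dss} - \frac{1}{2} V_{dss}^2\right]$$

On matching the value of $\partial I_d / \partial V_d$ on both sides of $V_{dss}$, we get

$$V_{dss} = V_E \left[\sqrt{1 + \frac{2\,(V_g - V_T)}{V_E}} - 1\right]$$

In practice, $V_E$ is much larger than $V_g - V_T$. If we expand the above expression, we
find that to first order the value of $V_{dss}$ remains the same as the one used in the simple
model - that is, $V_g - V_T$. Expansion to second order gives

$$V_{dss} \simeq (V_g - V_T)\left(1 - \frac{V_g - V_T}{2 V_E}\right) \qquad (43)$$

6.2.2 Simulation Model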
Since the value of $V_{dss}$ does not change substantially from the ideal saturation case,
a simpler approach can be tried. The drain current is calculated using the ideal
saturation model and its value is multiplied by a correction factor $(1 + \lambda V_d)$ in
saturation as well as in the linear regime. This automatically assures continuity of $I_d$ and
its derivative. $\lambda$ is a fit parameter, whose value is $1/V_E$. This approach is used in
SPICE, a popular circuit simulation program.
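As a rough numerical check of the two saturation models, the sketch below compares the exact changeover voltage with its second order expansion (eq. 43) and applies the $(1 + \lambda V_d)$ correction to the ideal current. The function names, the lumped parameter `k` (for mobility times oxide capacitance times W/L) and all numerical values are assumptions chosen for illustration; this is not the SPICE implementation itself.

```python
import math


def vdss_exact(vov, ve):
    """Changeover voltage from matching the slope of Id; vov = Vg - VT."""
    return ve * (math.sqrt(1.0 + 2.0 * vov / ve) - 1.0)


def vdss_second_order(vov, ve):
    """Second order expansion of the same expression (eq. 43)."""
    return vov * (1.0 - vov / (2.0 * ve))


def id_with_lambda(vg, vd, vt=0.5, k=1e-4, lam=0.02):
    """Ideal current multiplied by (1 + lam * Vd) in both regimes."""
    vov = vg - vt
    if vov <= 0:
        return 0.0
    if vd < vov:
        ideal = k * (vov * vd - 0.5 * vd * vd)  # linear regime
    else:
        ideal = 0.5 * k * vov * vov             # ideal saturation value
    return ideal * (1.0 + lam * vd)


if __name__ == "__main__":
    print(vdss_exact(1.0, 50.0), vdss_second_order(1.0, 50.0))
    print(id_with_lambda(1.5, 0.999), id_with_lambda(1.5, 1.001))
```

With an assumed Early voltage of 50 V and an overdrive of 1 V, the exact and expanded values of the changeover voltage differ by well under a millivolt, which is the sense in which the ideal saturation voltage is a good approximation.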
The Design Process

Basic HDL concepts
Concurrent and sequential Descriptions
Hardware Description Languages

Basic Concepts
Dinesh Sharma
Microelectronics Group, EE Department
IIT Bombay, Mumbai
May 2006
The Design Process
We ask ourselves the question:
What is Electronic Design?
Given specifications, we want to develop a circuit by connecting
known electronic devices, such that the circuit meets given
specifications.
Specifications refer to the description of the desired behaviour
of the circuit.
Known devices are those whose behaviour can be modeled
by known equations or algorithms, with known values of
parameters.
Electronic Design
Electronic Design is the process of converting

a behavioural description (What happens when ..)
to
a structural description (What is connected to what and how ..)
After conversion to a structural description, we may need to do
Physical Design which involves choosing device sizes,
placement of blocks, routing of interconnect lines etc.
This part is already done for us in FPGA based design.
Conquest over Complexity
The main challenge for modern electronic design is that the
circuits being designed these days are extremely complex.
While IC technology has moved at a rapid pace,
capabilities of human brain have remained the same :-(
The human mind cannot handle too many objects at the
same time. So a complex design has to be broken down
into a small number of manageable objects.
If each object is still too complex to handle, the above
process has to be repeated recursively. This leads to
hierarchical design.
Systematic procedures have to be developed to handle
complexity.
A page out of the software designer's book
We must learn from the experience of software designers in
handling complexity.
We must adopt:
Hierarchical Design.
Modular architecture.
Text based, rather than pictorial descriptions.
Re-use of existing resources
Abstraction Levels
Abstraction levels refer to functional, structural or
geometric views of the design.
Top down design begins with higher levels of abstraction.
As we go to lower levels of abstraction, the level of detail
goes up.
It is advantageous to do as much work as possible at
higher levels of abstraction, when the detail is low.
[Figure: Y chart (Gajski and Kuhn), "Types and levels of modeling", with functional, structural and geometric axes; the level of abstraction runs from low to high along each axis]
Abstraction Levels: Geometric
At high levels of geometric abstraction, we view the layout
as a floor plan with blocks.
At lower levels, we look at basic cells.
At lower levels still, we view transistors as stick diagrams.
At the lowest level, we have to worry about all rectangles and
polygons making up the layout.
[Figure: geometric axis of the Y chart (Gajski and Kuhn): floor plan, unit cells, stick diagrams, polygons]
Abstraction Levels: Structural
At high levels of abstraction, we view the structure in terms
of functional blocks or IP cores.
At lower levels, we see it in terms of registers and simple blocks.
At still lower levels, we view it in terms of logic gates etc.
At the lowest level, we have to see full details at transistor level.
[Figure: structural axis of the Y chart (Gajski and Kuhn): functional blocks, registers, gates, transistors]
Abstraction Levels: Functional
At the top level, we have the functional specifications.
At lower levels, we view the design in terms of protocols
and algorithms.
At still lower levels, we view it in terms of data and control
flow etc.
At the highest level of detail, we have to worry about all the
governing equations at all nodes.
[Figure: functional axis of the Y chart (Gajski and Kuhn): specifications, algorithms, data and control flow, equations]
Design Flow: System and logic level
[Flow chart: System Partitioning, Block Specification, Block Level Simulation, OK?, Logic Design, Logic Simulation, OK?]
Design Flow: Physical level
[Flow chart: Physical Design, Layout and Back Extraction, Resimulation and Timing, OK?, Mask Making, Fabrication, Test, Debug, OK?]
Hierarchical Design
The design process has to be hierarchical.

A complex circuit is converted to a structural description of
blocks which have not yet been designed - but whose
behaviour can be described.
Each of these blocks is then designed as if it was an
independent design problem of lower complexity.
This process is continued till all blocks are broken down
into known devices.
It is essential that any departure from proper operation is
detected early - at a low complexity level.
A hardware description language must be able to simulate
a system whose components have been designed to
different levels of detail.
But Hardware is different!
Hardware components are concurrent (all parts work at the same time),
whereas (traditional) software is sequential (it executes one instruction at a time).
Description of hardware behaviour has timing as an integral
part.
Traditional software is not real time sensitive.
Therefore, design of complex hardware involves many more
basic concepts beyond those of programming languages.
Hardware description languages need the ability to
describe and simulate a design at the behavioural, structural and mixed levels,
and to synthesize (structure from behaviour).
Basic HDL concepts
Timing
Concurrency
Hardware simulation process, which involves analysis, elaboration and simulation
Simulation proceeds in two distinct phases: signal update and selective re-simulation
HDL Uses
Hardware Description Languages are used for:
Description of interfaces, behaviour and structure
Test benches
Synthesis
Delays
How do we describe delays?
[Figure: a block with input In, output Out and Delay = 30 µs]
Out <= In AFTER 30 uS;
Is this description unambiguous?
Delay: Inertial
[Figure: an inertial delay of 30 µs between In and Out, with the In and Out waveforms]
Delay: Transport
[Figure: an optical fibre between In and Out with Delay = 30 µs, with the In and Out waveforms]
Modeling Delay
So the same amount of delay (30 µs in our example) can
result in qualitatively different phenomena!
We have to define two different kinds of delay
Inertial Delay is the RC kind of delay, which swallows pulses
much narrower than the delay amount.
Transport Delay is the optical fibre kind of delay, which lets all
pulses pass through irrespective of their width.
In most hardware description languages, Delays are inertial by
default.
The delay amount is taken to be zero if not specified.
Signal Assignments: Transactions
To represent real hardware, each signal assignment has to be

associated with a delay.
When a value is assigned to a signal, the target signal does not
acquire the assigned value immediately. The value is acquired
after some delay.
Remembering that a signal is scheduled to acquire a value in
the future is called a Transaction
Thus, when an assignment is made, we imply that the target
signal will acquire this value after so much delay of this type.
Concept of delta delay
When a transaction is placed on a signal, the default type of
delay is inertial and the default amount of delay is zero.
Zero delay is implemented as a small (δ) delay which goes to
zero in the limit.
This has scheduling implications.
Events occurring at t, t + δ, t + 2δ are all reported as having
occurred at t, but are time ordered as if δ were non zero.
Handling Concurrency
Concurrency is handled by following an event driven
architecture.
In a concurrent system many things can happen at the
same time.
We can efficiently handle only one thing at a time.
Therefore we need to control the passage of time.
Time is treated as a global variable. Things which happen
simultaneously are handled one after the other, keeping
the time value the same. Time is incremented explicitly
after all events at the current time have been handled.
Obviously, the value of the time variable represents the
time during the operation of the concurrent system - and
has nothing to do with the actual time taken by a computer
to simulate the system.
Hardware Simulation
Hardware simulation involves three stages:
Analysis: Syntax of the hardware description is checked and
interpreted.
Elaboration: This is a preparatory step which sets up a
hierarchically described circuit for simulation.
  Flattening the hierarchy: For structural descriptions, components
  are expanded, till the circuit is reduced to an interconnection of
  simple components which are described behaviourally.
  Data structures describing sensitivity lists of all elemental
  components are built up.
Simulation: Event driven simulation is carried out.
Analysis
Check for Syntax and Semantics

Syntax: Grammar of the language
Semantics: Meaning of the model
Analyse each design unit separately
Place analysed units in a working library,
(generally in an implementation dependent internal form to
enhance efficiency).
Elaboration
This step builds up a detailed circuit from a hierarchical

description.
Flatten the design hierarchy
Create ports (interfaces with other blocks).
Create signals and processes.
For each instantiated component, copy the component
template to the instance.
Repeat recursively till we are left only with behaviourally
described atomic modules.
The end result of elaboration is a flat collection of signal

nets connected to behaviourally described modules
through defined ports.
Event Driven Simulation
We maintain a time-ordered queue of signals which are

waiting to acquire their assigned values.
The time variable is advanced to the earliest entry in this
queue.
All signals waiting for acquiring their values at this time are
updated.
If this updating results in a change in the value of a signal,
an Event is said to have occurred on this signal.
Sensitivity List
During the elaboration phase, we determine which pieces of

hardware are affected by (are sensitive to) which event.
This is called a sensitivity list
The data structure is optimized for reverse look up:
That is, given an event, one can quickly get a list of all
hardware which is sensitive to it.
Notice that hardware could be sensitive to a particular kind of
change- for example to a rising edge of the clock.
The Simulation Cycle
The time variable is advanced to the earliest time entry in the
time ordered queue of transactions.
The update phase: Update all signals which were to acquire
their values at the current time (and then delete
their entry from the queue).
Event handling phase: If the value of a signal changes due to
the above update, it is said to have had an event.
All events which resulted at the current time are
handled by a scheduler.
Scheduling
For each event that took place at the current time,

We re-simulate all modules which are sensitive to this
event.
As a result of re-simulation, fresh transactions will be
placed on various signals. These are inserted at
appropriate positions in the time ordered queue.
This is done for all events which occurred at the current time.
When all events have been handled, we advance the time to
the earliest entry in the time ordered transactions list and start
the update phase again.
A Simulation Example
[Figure: circuit with an inverter (input A, output B) and a NAND gate (inputs A and B, output C); the waveform of A has edges at times 20 and 50]
Nodes: A, B and C
Input A, Output C
Inverter delay: 8 units
NAND delay: 6 units
Sensitivity list:
Event on A: Inverter, NAND
Event on B: NAND
Time ordered transaction list:
Time  Trans.
0     A=0
20    A=1
50    A=0
The simulation proceeds one queue entry at a time. At each time, the earliest
transaction is applied (update), any resulting event triggers re-evaluation of the
sensitive modules, and fresh transactions are placed on the list. X denotes an
unknown value.

Time     A B C   Event   Re-evaluation                               Transaction list after re-simulation
Initial  X X X   -       -                                           0: A=0; 20: A=1; 50: A=0
0        0 X X   A       Inverter: B -> 1 at 8; NAND: C -> 1 at 6    6: C=1; 8: B=1; 20: A=1; 50: A=0
6        0 X 1   C       none (no module is sensitive to C)          8: B=1; 20: A=1; 50: A=0
8        0 1 1   B       NAND: C -> 1 at 14                          14: C=1; 20: A=1; 50: A=0
14       0 1 1   none    none (no sensitivity is triggered)          20: A=1; 50: A=0
20       1 1 1   A       Inverter: B -> 0 at 28; NAND: C -> 0 at 26  26: C=0; 28: B=0; 50: A=0
26       1 1 0   C       none (no module is sensitive to C)          28: B=0; 50: A=0
28       1 0 0   B       NAND: C -> 1 at 34                          34: C=1; 50: A=0
34       1 0 1   C       none (no module is sensitive to C)          50: A=0
50       0 0 1   A       Inverter: B -> 1 at 58; NAND: C -> 1 at 56  56: C=1; 58: B=1
56       0 0 1   none    none (no sensitivity is triggered)          58: B=1
58       0 1 1   B       NAND: C -> 1 at 64                          64: C=1
64       0 1 1   none    none (no sensitivity is triggered)          empty

[Figure: waveforms of A and B on a time axis from 10 to 60, built up step by step]
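The update and event handling phases of the worked example can be sketched as a small event driven simulator in Python. This is a minimal sketch, not a real HDL simulator: the function name `simulate`, the use of `None` for the unknown value X and the use of a heap for the time ordered queue are choices made for this example, and every transaction is simply kept in the queue (no inertial cancellation).

```python
import heapq


def simulate(stimulus, inv_delay=8, nand_delay=6):
    """Event driven simulation of an inverter (A -> B) and a NAND (A, B -> C).

    stimulus is a list of (time, signal, value) transactions.
    Returns the list of events as (time, signal, new_value).
    """
    values = {"A": None, "B": None, "C": None}  # None plays the role of X

    def inverter(v):
        return None if v["A"] is None else 1 - v["A"]

    def nand(v):
        a, b = v["A"], v["B"]
        if a == 0 or b == 0:
            return 1            # a single 0 input forces the output to 1
        if a is None or b is None:
            return None         # output unknown
        return 0

    modules = {"inv": ("B", inv_delay, inverter), "nand": ("C", nand_delay, nand)}
    sensitivity = {"A": ["inv", "nand"], "B": ["nand"], "C": []}

    queue = []   # time ordered transaction queue
    seq = 0      # tie breaker so that heap entries never compare values
    for t, sig, val in stimulus:
        heapq.heappush(queue, (t, seq, sig, val))
        seq += 1

    events = []
    while queue:
        now = queue[0][0]        # advance time to the earliest entry
        changed = []
        # Update phase: apply every transaction scheduled for this time.
        while queue and queue[0][0] == now:
            _, _, sig, val = heapq.heappop(queue)
            if values[sig] != val:
                values[sig] = val
                changed.append(sig)
                events.append((now, sig, val))
        # Event handling phase: re-simulate each sensitive module once.
        triggered = []
        for sig in changed:
            for name in sensitivity[sig]:
                if name not in triggered:
                    triggered.append(name)
        for name in triggered:
            target, delay, func = modules[name]
            heapq.heappush(queue, (now + delay, seq, target, func(values)))
            seq += 1
    return events


if __name__ == "__main__":
    for event in simulate([(0, "A", 0), (20, "A", 1), (50, "A", 0)]):
        print(event)
```

Transactions that do not change a value (for example C=1 at time 14) are applied silently and produce no event, which is why they do not appear in the returned list.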
Scheduling for Delay types
What do we do if there is more than one
transaction waiting for the same signal?
Inertial Delay: A transaction scheduled for later time results in
deletion of waiting transactions for a different value
on the same signal.
Transport Delay: All transactions are retained and signal
assignments made at their respective times.
Inertial Delay Example
[Figure: a block with an inertial delay of 30 µs between In and Out; the In waveform has edges at 0, 40, 45, 80 and 130, the Out waveform has edges at 30, 110 and 160]

The time ordered transaction list evolves as follows, one step per line:

Step  Waiting transactions (Time Transaction)
1     0 In := 0; 40 In := 1; 45 In := 0; 80 In := 1; 130 In := 0
2     30 Out := 0; 40 In := 1; 45 In := 0; 80 In := 1; 130 In := 0
3     40 In := 1; 45 In := 0; 80 In := 1; 130 In := 0
4     45 In := 0; 70 Out := 1; 80 In := 1; 130 In := 0
5     70 Out := 1; 75 Out := 0; 80 In := 1; 130 In := 0
6     75 Out := 0; 80 In := 1; 130 In := 0 (the waiting transaction 70 Out := 1 has been deleted)
7     80 In := 1; 130 In := 0
8     110 Out := 1; 130 In := 0
9     130 In := 0
10    160 Out := 0
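The difference between the two scheduling rules can be sketched in Python using the input transactions of the example above. This is a simplified sketch of the rule as stated in these notes (a new transaction deletes waiting transactions for a different value); the function name `delayed_output` and its arguments are assumptions for illustration, and real HDL simulators implement more detailed rules.

```python
def delayed_output(inputs, delay, mode="inertial"):
    """Return the output events (time, value) for a single delay element.

    inputs is a time ordered list of (time, value) changes on the input.
    mode is "inertial" or "transport".
    """
    pending = []    # waiting transactions on the output, in time order
    events = []     # actual value changes seen on the output
    current = None  # output value, None while still unknown

    def mature(until):
        nonlocal current
        # Apply every waiting transaction whose time has been reached.
        while pending and pending[0][0] <= until:
            t, v = pending.pop(0)
            if v != current:
                current = v
                events.append((t, v))

    for t, v in inputs:
        mature(t)
        if mode == "inertial":
            # Delete waiting transactions that carry a different value.
            pending[:] = [(pt, pv) for pt, pv in pending if pv == v]
        pending.append((t + delay, v))
    mature(float("inf"))
    return events


if __name__ == "__main__":
    stimulus = [(0, 0), (40, 1), (45, 0), (80, 1), (130, 0)]
    print(delayed_output(stimulus, 30, "inertial"))
    print(delayed_output(stimulus, 30, "transport"))
```

With the inertial rule the 5 unit wide pulse at time 40 never reaches the output, while the transport rule reproduces it at times 70 and 75.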
Concurrent Descriptions
The order of placing concurrent descriptions in a

hardware description language is immaterial.
As seen in the example described earlier, each concurrent
block is handled when its sensitivity is struck, wherever it
is placed in the overall description.
So what defines the limits of a concurrent block?
If it is a single line, there is no problem.
If the description of a concurrent block needs multiple
lines, How are these lines to be executed?
Multi-line concurrent descriptions
A multiline concurrent block has to be executed completely

when its sensitivity is struck.
Therefore, the multi-line description of a complex
concurrent block must be executed sequentially, line by
line.
A hardware description language must therefore provide a
syntax to distinguish sequential parts from concurrent
parts.
(After all, a single line of description could be a
stand-alone concurrent description or part of a multi-line
sequential code).
Multiline descriptions of hardware blocks are concurrent
outside and sequential inside!
Describing hardware by sequential code raises a problem!

What happens when the sequential description reaches its
end?
Hardware blocks are perpetual objects. These cannot
terminate like software routines.
We can make sequential descriptions perpetual by adding
the convention that a sequential description loops back to
its beginning when it reaches its end.
This, however, leads to yet another problem!
Suspending endless loops
An endless loop will never terminate.

Then how can we handle the next event?
Indeed, when can we advance the time variable?
The convention should therefore be that when a sequential
description ends, execution will loop back to the beginning,
and execution of the loop will be suspended there!
The suspended loop will restart only when the sensitivity of this
block is struck again.
Now we can handle multiple blocks waiting to be handled at any
given time.
We handle each block whose sensitivity has been triggered, till
it is suspended.
Then we handle the next block and so on, till all blocks have
been done.
Now we update the time to the next earliest entry in the time
ordered queue and go through the next signal update - event
handling cycle.
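The loop-back-and-suspend convention has a close analogy in Python generators, which can be used to sketch the idea. This is only an analogy under simplifying assumptions: the names `inverter_process` and `run_when_triggered` are invented for this sketch, and the signal is assigned immediately rather than through a delayed transaction.

```python
def inverter_process(signals):
    """Sequential description of an inverter, written as a perpetual loop."""
    while True:                      # loop back to the beginning at the end
        a = signals["A"]             # statements execute one after the other
        signals["B"] = None if a is None else 1 - a
        yield                        # suspend until the sensitivity is struck


def run_when_triggered(process):
    """Resume a suspended block and run it until it suspends again."""
    next(process)


if __name__ == "__main__":
    signals = {"A": 0, "B": None}
    block = inverter_process(signals)
    run_when_triggered(block)        # first execution
    print(signals)
    signals["A"] = 1                 # an event on A strikes the sensitivity
    run_when_triggered(block)
    print(signals)
```

A scheduler would call `run_when_triggered` once for each block whose sensitivity was struck at the current time, and only then advance the time variable.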
This ends
The first part of the lecture series on
HARDWARE DESCRIPTION LANGUAGES

Fundamental Concepts
Current Mode Interconnect

Dinesh Sharma, Marshnil Dave
Department Of Electrical Engineering
Indian Institute Of Technology, Bombay
Sept. 25, 2010
Current Mode Interconnects Group at IIT Bombay

Prof. Maryam Shojaei Baghini
Marshnil Dave
Amit Vishnani
Navin Kacharappu
Sandeep Waikar
Girish Naik
Dinesh Sharma
Supreet Joshi
Rajkumar Satkuri
Mahavir Jain
M. Veerraju
Part I
Current Mode Data Communication

Scaling
Unscaled Interconnect Delay
Solutions for Interconnect Delay problem
Buffer Insertion
Current signaling
Inductive Peaking
Dynamic Overdriving
Scaling
To increase packing density, we would like to reduce the

size of transistors and passive components.
In order to decrease lateral sizes, we have to reduce

vertical sizes too.
If dimensions are scaled down, voltages must also be

reduced to avoid breakdown.
This is known as constant field scaling.

So what price do we have to pay to get denser, more complex
circuits?
MOS model
[Figure: drain current (mA) versus drain voltage (V) for Vg = 1.0, 1.5, 2.0, 2.5, 3.0 and 3.5]
For $V_{gs} \le V_T$,
$I_{ds} = 0$
For $V_{gs} > V_T$ and $V_{ds} \le V_{gs} - V_T$,
$I_{ds} = K\left[(V_{gs} - V_T)V_{ds} - \frac{1}{2}V_{ds}^2\right]$
For $V_{gs} > V_T$ and $V_{ds} > V_{gs} - V_T$,
$I_{ds} = \frac{K}{2}(V_{gs} - V_T)^2$
where $K \equiv \mu C_{ox}\frac{W}{L}$ and $C_{ox} \equiv \frac{\epsilon_{ox}}{t_{ox}}$
(Gate capacitance $C_{ox}$ is per unit area)
Consequences of Scaling
All dimensions and voltages divided by the factor S (> 1).

Quantity              Depends on            Scaling               Result
Device area           W L                   (1/S)(1/S)            1/S^2
Cox                   eps_ox / t_ox         const/(1/S)           S
Ctotal                eps A / t             (1/S^2)/(1/S)         1/S
VDS, VGS, VT          voltages              (1/S)                 1/S
Id                    mu Cox (W/L) V^2      (S)(const)(1/S^2)     1/S
Slew rate dV/dt       I / Ctotal            (1/S)/(1/S)           const.
Delay                 V / (dV/dt)           (1/S)/(const)         1/S
Static power          V I                   (1/S)(1/S)            1/S^2
Dynamic power         Ctotal V^2 f          (1/S)(1/S^2)(S)       1/S^2
Power delay product   delay x power         (1/S)(1/S^2)          1/S^3
Power density         power / area          (1/S^2)/(1/S^2)       const.
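The entries of the scaling table can be cross-checked by tracking only the exponent of S for each quantity, since products add exponents and ratios subtract them. The sketch below does this bookkeeping in Python; the function name `scaling_exponents` and the dictionary keys are assumptions made for this example.

```python
def scaling_exponents():
    """Exponent of S for each quantity under constant field scaling.

    An exponent of -1 means the quantity is divided by S, +1 means it is
    multiplied by S, and 0 means it is unchanged.
    """
    e = {"W": -1, "L": -1, "tox": -1, "V": -1}   # dimensions and voltages
    e["area"] = e["W"] + e["L"]
    e["Cox"] = -e["tox"]                          # eps_ox / tox, eps_ox constant
    e["Ctotal"] = e["Cox"] + e["area"]
    e["Id"] = e["Cox"] + (e["W"] - e["L"]) + 2 * e["V"]   # mobility constant
    e["slew_rate"] = e["Id"] - e["Ctotal"]
    e["delay"] = e["V"] - e["slew_rate"]
    e["static_power"] = e["V"] + e["Id"]
    e["frequency"] = -e["delay"]
    e["dynamic_power"] = e["Ctotal"] + 2 * e["V"] + e["frequency"]
    e["power_delay"] = e["delay"] + e["dynamic_power"]
    e["power_density"] = e["dynamic_power"] - e["area"]
    return e


if __name__ == "__main__":
    for name, exponent in scaling_exponents().items():
        print(f"{name}: S^{exponent}")
```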
Impact of scaling
Improved packing density: increases by a factor $S^2$
Improved speed: delay reduces by a factor $S$
Improved power consumption: reduces by a factor $S^2$
However . . .
The above improvements apply to active circuits.
What about passive components?
Also, reduced voltages imply a lower signal to noise ratio.
Concern: Interconnect Delay
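$R = \dfrac{\rho L}{W t_m}$, $\quad C = \dfrac{\epsilon L W}{t_i}$

Charge Time $RC = \dfrac{\rho\,\epsilon\, L^2}{t_m t_i}$

where $\rho$ is the resistivity of the wire, $\epsilon$ the permittivity of the insulator, $t_m$ the wire thickness and $t_i$ the insulator thickness.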
To first order, delay is independent of W.

This is because increasing W reduces resistance but
increases capacitance in the same ratio.
Unfortunately W is the only parameter that the circuit
designer can decide! (L is fixed by the distance between
the points to be connected; $\rho$, $\epsilon$, $t_m$ and $t_i$ are decided by
the technology).
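A quick numerical check of the first order result is sketched below in Python. The function name `wire_rc` and every numerical value (resistivity, permittivity, thicknesses, lengths and widths) are assumptions picked only to illustrate the trend; they are not data from these notes.

```python
import math


def wire_rc(length, width, t_metal, t_ins, rho, eps):
    """First order charge time of a wire over an insulator (all SI units)."""
    resistance = rho * length / (width * t_metal)
    capacitance = eps * length * width / t_ins   # parallel plate estimate
    return resistance * capacitance


if __name__ == "__main__":
    # Assumed illustrative values: metal resistivity, oxide permittivity,
    # 0.5 um thick wire over 0.5 um of insulator.
    params = dict(t_metal=0.5e-6, t_ins=0.5e-6, rho=2.7e-8, eps=3.9 * 8.854e-12)
    narrow = wire_rc(1e-3, 0.5e-6, **params)
    wide = wire_rc(1e-3, 2.0e-6, **params)
    longer = wire_rc(2e-3, 0.5e-6, **params)
    print(narrow, wide, math.isclose(narrow, wide))
    print(longer / narrow)
```

In this first order estimate, changing the width leaves the charge time unchanged, while doubling the length makes it four times larger.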
Concern: Interconnect Delay
[Figure: relative frequency versus normalized wire length]
Local interconnects scale with device size.
Global interconnects scale with die size.
Interconnect Delay $= \dfrac{\rho\,\epsilon}{t_m t_i} L^2 \equiv A L^2$
For local interconnects, L scales the same way as $t_m$, $t_i$,
so delay is invariant.
For global interconnects, L goes up with die size, while $t_m$ and
$t_i$ scale down. This leads to a sharp increase in delay.
Buffer Insertion
Global Interconnect delay can be the determining factor for the

speed of an integrated system.
The $L^2$ dependence of interconnect delay is a source of
particular concern.
This problem can be somewhat mitigated by buffer insertion in
long wires.
We define some critical wire length and when a wire segment
exceeds this length, we insert a buffer.
Repeater Insertion in Voltage Mode
What is the optimum wire length after which we should insert a
buffer? (Wire Delay $= \dfrac{\rho\,\epsilon}{t_m t_i} L^2 = A L^2$)
Let the total wire length = $L$ and the wire segment length = $\ell$.
Segment wire delay = $A\ell^2$.
Let buffer delay = $\tau$.
For n segments, there will be n-1 buffers, and $L = n\ell$.

Total delay $= nA\ell^2 + (n-1)\tau = \dfrac{L}{\ell}A\ell^2 + \left(\dfrac{L}{\ell} - 1\right)\tau = AL\ell + \left(\dfrac{L}{\ell} - 1\right)\tau$

Putting the derivative with respect to $\ell$ = 0 for optimization,

$AL - \dfrac{\tau L}{\ell^2} = 0$, so $A\ell^2 = \tau$

$\ell$ should be so chosen that the wire segment delay = $\tau$.
Total delay is proportional to n and so is linear in L.
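The optimisation above can be checked numerically with a short Python sketch. The function names and the values chosen for the wire constant A and the buffer delay are assumptions for illustration, in arbitrary consistent units.

```python
import math


def total_delay(length, segment, a, tau):
    """Delay of a wire of given length cut into equal segments with buffers."""
    n = length / segment                  # number of segments
    return n * a * segment ** 2 + (n - 1) * tau


def optimum_segment(a, tau):
    """Segment length for which the segment wire delay equals the buffer delay."""
    return math.sqrt(tau / a)


if __name__ == "__main__":
    a, tau, length = 1.0, 4.0, 20.0       # assumed illustrative values
    best = optimum_segment(a, tau)
    print("optimum segment length:", best)
    for segment in (1.0, best, 4.0, length):
        print(segment, total_delay(length, segment, a, tau))
```

With these assumed values the unbuffered wire (a single segment of length 20) has a delay of 400 units, against 76 units at the optimum segment length of 2.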
Difficulties with Buffer Insertion
Currently, buffer insertion is the most widely used method to

control interconnect delay.
However, there are several difficulties with buffer insertion.
Buffers consume power and silicon area.
Typically, we do floor planning and layout first and then put

in the interconnects. When the wire length reaches L, we
need to put in a buffer. However, it is quite possible that
there is active circuitry underneath, and there is no room to
put in a buffer!
We either live with buffer insertion at non-optimal wire

lengths or create space by pushing out existing cells and
modifying the lay out.
Problem with bi-directional data transmission

Global interconnects often include data busses, which may

require bidirectional data transmission. (For example, a
bus connecting a processor and memory).
However, buffer insertion fixes the direction of data flow!
We need to replace buffers with bidirectional transceivers.
These require a direction signal, which will enable a buffer

in the desired direction.
This direction signal must also be routed with the bus and
should have its own buffers. It should reach the
bidirectional buffers ahead of the data.
Concern: Signal Integrity
As interconnect wire separation is reduced . . .
There is a serious signal integrity problem because of

electrostatic coupling between long wires.
Inter-signal interference can lead to unpredictable delay

variations.
Grounded shielding wires must often be inserted to avoid

interference.
This leads to extra capacitance and $CV^2 f$ power loss.
Concern: Timing closure
Global interconnects are placed after active circuit design

and layout is complete.
One has to anticipate the wire length, and then design the
active circuits to meet total delay specifications.
If the actual wire length is different from what was

anticipated, one has to re-design the active circuits after
layout.
After a fresh layout, wire lengths and hence, delays are

changed.
This leads to a design-layout-redesign iteration known as

Timing Closure. This iteration becomes longer and longer
when total delays are dominated by interconnect delay.
Promise of current mode signaling
Why not signal with current rather than voltage?
Current rise time is limited by inductance rather than

capacitance. Typically, inductive effects are much smaller
than capacitive effects.
(After all, $\epsilon_r \approx 4$, $\mu_r = 1$ for insulators used in ICs).
So electromagnetic coupling is lower than electrostatic
coupling.
Signal voltage swings are limited by scaled down supply

voltages: this does not restrict current swings.
In fact, we can use multiple current values to send more

than one bit down the same wire!
Promise of current mode signaling
If we hold the Voltage on the interconnect nearly constant
Dynamic power is negligible.
Latency is much lower.
We also have the option of using multiple current levels to

transmit multiple bits simultaneously. This can give
Higher Throughput.
Lower interconnect area.
Possibility for improving Latency, Throughput and Power

simultaneously!
Since $\Delta V \approx 0$ while $\Delta I \neq 0$,
We need a low (near 0) input impedance receiver.
Digital Designers need not panic!
Only the interface works in current mode. Rest of the circuit is

traditional.
A library circuit does the voltage mode to current conversion
(transmitter) and another converts the current back to voltage
mode (receiver).
To put this plan into action, we need a receiver with very low
input impedance.
(If inductive effects are to be taken into account, we would like
to terminate the line into its characteristic impedance.)
Zero input impedance circuit
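Low $r_{in}$ amplifiers are used for photo-detectors (1).
[Figure: input stage with transistors Mp1, Mp2, Mn1 and Mn2; reference Vref, input voltage v, node voltages v1 and v2, branch currents i1 and i2]
$i_1 = g_{mn1} v_1 = g_{mp1} (v - v_2)$
$i_2 = g_{mn2} v_1 = -g_{mp2} v_2$
$v_1 = \dfrac{i_1}{g_{mn1}}$, $v_2 = -\dfrac{g_{mn2}}{g_{mp2}} v_1 = -\dfrac{g_{mn2}}{g_{mp2}\, g_{mn1}} i_1$
$i_1 = g_{mp1} v + \dfrac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}} i_1$
Define $\alpha \equiv \dfrac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}}$; then $i_1 (1 - \alpha) = g_{mp1} v$.
This gives $r_{in} = (1 - \alpha)/g_{mp1}$.
(1) C.-K. Kim et al., "High Injection Efficiency Readout Circuit for Low Resistance Infrared Detector", IEE Electronics Letters, 35, 1507, 1999.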
Robustness of design
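In saturation,
$I_d = \dfrac{1}{2} \mu C_{ox} \dfrac{W}{L} (V_g - V_T)^2$
So, $g_m = \mu C_{ox} \dfrac{W}{L} (V_g - V_T) = \sqrt{2 \mu C_{ox} \dfrac{W}{L} I_d}$
$g_{mn2}/g_{mn1} = \sqrt{\dfrac{(W/L)_{n2}\, I_2}{(W/L)_{n1}\, I_1}}$
$g_{mp2}/g_{mp1} = \sqrt{\dfrac{(W/L)_{p2}\, I_2}{(W/L)_{p1}\, I_1}}$
Therefore $\alpha = \dfrac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}} = \sqrt{\dfrac{(W/L)_{n2}/(W/L)_{n1}}{(W/L)_{p2}/(W/L)_{p1}}}$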
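Because the currents and the process-dependent factor cancel in the ratio, the feedback factor depends only on geometry. The sketch below illustrates this under the square-law assumption used above; the function names, the W/L values and the value of the transconductance of Mp1 are hypothetical.

```python
import math

def alpha_from_geometry(wl_n1, wl_n2, wl_p1, wl_p2):
    """Feedback factor alpha from the four W/L ratios (square-law model)."""
    return math.sqrt((wl_n2 / wl_n1) / (wl_p2 / wl_p1))

def input_resistance(alpha, gm_p1):
    """Small-signal input resistance r_in = (1 - alpha) / g_mp1."""
    return (1.0 - alpha) / gm_p1

# Illustrative geometry and transconductance (assumptions).
alpha = alpha_from_geometry(wl_n1=10.0, wl_n2=9.0, wl_p1=20.0, wl_p2=20.0)
r_in = input_resistance(alpha, gm_p1=1e-3)

# Scaling every W/L by the same factor leaves alpha unchanged.
alpha_scaled = alpha_from_geometry(13.0, 11.7, 26.0, 26.0)

print(f"alpha = {alpha:.4f}, r_in = {r_in:.1f} ohm (1/g_mp1 = {1 / 1e-3:.0f} ohm)")
```

Choosing the ratios so that alpha approaches 1 pushes the input resistance well below that of a single diode-connected device.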
Receiver Design - Input stage
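[Figure: receiver input stage with transistors Mp1, Mp2, Mn1 and Mn2; reference Vref, input current Iint, branch currents i1 and i2, node voltages v1 and v2, output current Iout]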
Input resistance controlled by geometry of transistors
Interconnect voltage held fixed
Input resistance insensitive to process variations
Reduced swing signaling
[Figure: low swing voltage mode (low swing driver, line, buffer/amplifier) and low swing current mode (low swing driver, line, receiver with load RL)]
In reduced swing voltage mode signaling, the line is not

terminated in a low impedance.
Current mode signaling terminates the line in a low

impedance.
This reduces the time constant, increases bandwidth.
However, this also leads to static power consumption.
Improving Current Mode Signaling

[Figure: low swing driver, line, and receiver with load RL]
Current mode signaling
Consumes Static Power
Direct Trade-off between speed and static power
Possible Improvements
Inductive Peaking
Dynamic Over-driving
Concept of Inductive Peaking
On-chip interconnects can be

modeled as distributed RC which is
essentially a low pass filter.
[Figure: driver feeding a distributed RC line (series R0 and shunt C0 sections) terminated in inductance L and load resistance RL]
Bandwidth enhancement techniques

used in RF amplifiers can be
employed for bandwidth
enhancement on interconnects
Inductive Peaking: Line termination

circuit exhibits inductive input
impedance
Shows enhancement of about

500MHz in 3dB bandwidth.
Bandwidth Enhancement Vs Load Inductance

For a given line length, the amount

of bandwidth enhancement is a
function of inductance and load
resistance.
Significant bandwidth enhancement

can be achieved for a wide range of
inductance values greater than
Lpeak .
The required inductance for significant enhancement in
bandwidth is a few hundred nanohenries!
An active inductor is required
Beta Multiplier: A Gyrator
[Figure: beta multiplier with transistors Mp1, Mp2, Mn1 and Mn2; reference Vref, input voltage v, node voltages v1 and v2, branch currents i1 and i2]
The Beta Multiplier essentially forms a gyrator circuit with two Gm elements connected back to back along with the parasitic capacitance of the transistors.
So Beta Multiplier circuits can exhibit inductive input impedance for some frequency range if designed properly.
Beta Multiplier: Input Impedance
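$Z_{in} = \dfrac{(\tau_1 \tau_2 + k \tau_2 \tau_3) s^2 + (\tau_1 + \tau_2 + k(\tau_3 + \tau_2)) s + 1 + k - \alpha}{\left(g_{mp1} + \dfrac{1}{R_3}\right)(1 + \tau_1 s)(1 + \tau_2 s)(1 + \tau_4 s)}$
where
$\tau_1 = \dfrac{C_{g1}}{g_{mn1}}$, $\tau_2 = \dfrac{C_{g2}}{g_{mp2}}$, $\tau_3 = C_{g3}\, r_{op1}$, $\tau_4 = \dfrac{C_{g3}}{g_{mp1}}$
$R_1 = \dfrac{1}{g_{mn1}}$, $R_3 = r_{op1}$, $k = \dfrac{R_1}{R_3} = \dfrac{1}{g_{mn1} r_{op1}}$
$\alpha = \dfrac{g_{mp1}/g_{mp2}}{g_{mn1}/g_{mn2}}$
$R_{in} = \dfrac{(1 - \alpha) + \dfrac{1}{g_{mn1} r_{op1}}}{g_{mp1} + \dfrac{1}{r_{op1}}}$
[Figure: small-signal equivalent circuit with controlled sources i1 = gmp1 (vint - vg2) and i2 = gmn2 vg1, resistances ro_p1, 1/gmp2 and 1/gmn1, and gate capacitances Cg1, Cg2 and Cg3]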
Beta Multiplier : Equivalent Circuit
The relative location of poles and zeros determines the nature of the impedance (inductive or capacitive).
If the first zero occurs a decade prior to the first pole, the input impedance is inductive.
$\alpha + \dfrac{1}{g_{mn1} r_{op1}} > 0.9$ and any two time constants being equal ensures that a zero occurs a decade prior to the first pole.
$R_{eff} = \dfrac{(1 - \alpha) + \dfrac{1}{g_{mn1} r_{op1}}}{g_{mp1} + \dfrac{1}{r_{op1}}}$
$C_{eff} = K C_{gx}$
[Equation: $L_{eff}$, expressed in terms of $C_{g1}$, $C_{g2}$, $C_{g3}$, $g_{mn1}$, $g_{mp1}$, $g_{mp2}$ and $r_{op1}$]
[Figure: equivalent circuit of $Z_{in}$ with elements Req, Ceq and Leq]
Beta Multiplier : Input Impedance Control
The Beta Multiplier shows an effective inductance of hundreds of nanohenries for a practical range of input current and transistor geometries.
Its effective resistance can be controlled by ratios of transconductances, while its effective inductance depends on the absolute value of transconductance.
It is possible to control Rin and Leff with very little interaction between the two. Inductance changes from 100 nH to 980 nH while the effective resistance remains within 12% of its nominal value for a 20 µA change in the current.
Current Mode Receiver Circuit with Beta Multiplier

Effective impedance offered by the receiver is

equal to the parallel combination of the
impedance offered by individual beta multipliers.
Voltage at input node swings around Vref . Small

voltage swing on the line is sensed and
amplified by the inverting amplifier.
Vref is generated by shorting the input and

output of an inverter to ensure that the value of
Vref is the same as switching threshold of
receiver amplifier across all process corners.
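[Figure: current mode receiver with a source type beta multiplier and a sink type beta multiplier (transistors Mp11, Mp22, Mn11, Mn22 and Mp1, Mp2, Mn1, Mn2) at the input node, reference Vref, supply Vdd, and an inverting amplifier]
rout of the Vref generation circuit comes in series with the beta multiplier Zin, and hence the beta multiplier has to be sized accordingly.
The Vref generation circuit consumes static power.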
Simulation Results
Performance comparison of three signaling schemes (line = 6 mm, power measured at 1 Gbps)

Signaling Scheme            Delay (ps)   Throughput (Gbps)   Power (µW)   Area (µm²)
CMS-BMul (30 mV) [1]        420          2.56                310          2.00
CMS-Diode-CC (30 mV) [2]    500          2.45                380          2.00
Voltage Mode                1000         2.85                3000         12.53

Inductive termination gives a 16% improvement in delay and about an 18% improvement in power over the CMS-Diode-CC scheme. Compared with voltage mode, it gives more than 50% improvement in delay and, at the same time, an order of magnitude lower power.
[1] M Dave et. al., ISLPED 2008, [2] V. Venkatraman et. al. ISQED 2005
Concept of Dynamic Overdriving/Pre-emphasis

Current mode transmission can be sped up by using a
high drive current.
However, this increases static power consumption.
One possible solution is to dump high drive current only

when the state of the line needs to be changed from 0 to 1
or from 1 to 0.
When the line remains at 1 or 0 from one bit to the next, we

use a small drive current to maintain the line at the
required voltage.
This is called Dynamic Over Driving.
Dynamic Overdriving essentially means amplifying high

frequency components of the input signal
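The drive policy described above can be sketched per bit period. This is a simplified illustration only: it assigns one current value to each whole bit period, the sign convention (positive for sourcing, negative for sinking) is an assumption, and the function name and current values are hypothetical.

```python
def drive_currents(bits, i_peak, i_static, initial_state=0):
    """Signed drive current for each bit: strong on a transition, weak when holding."""
    currents = []
    previous = initial_state
    for bit in bits:
        sign = 1 if bit else -1                      # source current for 1, sink for 0
        magnitude = i_peak if bit != previous else i_static
        currents.append(sign * magnitude)
        previous = bit
    return currents

# Illustrative currents (assumptions): 500 uA overdrive, 50 uA holding current.
pattern = [1, 1, 0, 0, 1]
currents = drive_currents(pattern, i_peak=500e-6, i_static=50e-6)

for bit, current in zip(pattern, currents):
    print(f"bit {bit}: {current * 1e6:+.0f} uA")
```

Only bits that differ from their predecessor draw the large current, which is why the scheme behaves like a boost of the high frequency content of the data.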
Possible implementation of Dynamic Overdriving

[Figure: steady state (weak) driver between VDD and ground with transistors labelled Swing Control (High), p Drive, n Drive and Swing Control (Low), driven by Input. A. Katoch et al., ESSCIRC, 2005]
The p channel driver gate is low (enabled) when the input is 1.
As the line reaches VDD − VTp, the upper p channel transistor turns off, restricting the line voltage swing.
Similarly, the n channel driver transistor is enabled when the input is 0, and the lower transistor turns off when the line approaches VTn during discharge.
[Figure: dynamic (strong) driver supplied from VDD, with Input, Wire and Feedback connections]
The feedback inverter acts as an inverting amplifier, converting low swing logic levels on the wire to a full swing (inverted) CMOS logic level on its output.
P channel gate is low (enabled) only when the input is high

AND the line is at 0.
N channel gate is high (enabled) only when the input is low

AND the line is at 1.
Input to the feedback inverter is a low swing level around

VDD /2. Therefore it consumes static power.
Self limiting Strong Driver
[Figure: dynamic (strong) driver supplied from VDD, with Input, Wire and Feedback connections]
Input = 1, wire voltage < Vm:
Inverter output = 1, NAND output = 0, NOR output = 0.
The P channel driver dumps current to charge the line.
Input = 0, wire voltage > Vm:
Inverter output = 0, NAND output = 1, NOR output = 1.
The N channel driver sinks current to discharge the line.
As soon as the low swing logic level on the line equals the input:
Inverter output = complement of the input, NAND output = 1, NOR output = 0.
This disables both drive transistors automatically.
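The gate-level behaviour listed above can be checked with a small truth table. This is a logic-only sketch, assuming the feedback inverter simply compares the wire with its threshold Vm; the function and key names are hypothetical.

```python
def strong_driver_state(data_in, wire_above_vm):
    """Logic state of the self limiting strong driver for one input/wire condition."""
    inverter = 0 if wire_above_vm else 1              # feedback inverter output
    nand = 0 if (data_in and inverter) else 1         # drives the P channel gate (active low)
    nor = 1 if not (data_in or inverter) else 0       # drives the N channel gate (active high)
    return {
        "inverter": inverter,
        "nand": nand,
        "nor": nor,
        "p_on": nand == 0,
        "n_on": nor == 1,
    }

for data_in in (0, 1):
    for wire_above_vm in (False, True):
        state = strong_driver_state(data_in, wire_above_vm)
        print(f"input={data_in} wire_above_vm={wire_above_vm} -> {state}")
```

In the two cases where the line already agrees with the input, neither drive transistor is on, which is the self limiting property.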
Dynamic Overdriving with Inductive termination?

Dynamic Overdriving (DOD) and Inductive line termination both

essentially amplify high frequency components of input signal.
Can we use both?
Current Mode Signaling Schemes with Ideal Components
The following four current mode signaling schemes were simulated:
CMS Scheme with DOD and Resistive Load
CMS Scheme with Simple Driver and Resistive Load
CMS Scheme with Inductive Load
CMS Scheme with DOD and Inductive Load
Implementation details of these circuits are:
The Dynamic Overdriving driver is implemented by an ideal VCCS with the current wave shape shown in the figure. The controlling voltage is the input.
The simple driver is implemented as a VCCS with a square wave shape, the input current ranging from −Iavg to +Iavg.
$I_{avg} = \dfrac{t_p I_{peak} + I_{static} (t - t_p)}{t}$
$R_L = 4\,\mathrm{k\Omega}$, $l = 4\,\mu\mathrm{H}$
[Figure: drive current wave shape with peak current Ipeak for duration tp and static current Istatic]
Comparison of Delay
With Large Overdrive (Ipeak = 500 µA)
Dynamic overdriving shows a 5× improvement in delay over RC
Inductive peaking does not offer

substantial additional advantage when
combined with dynamic overdriving.
Inductive peaking alone shows 25% of

improvement in delay over RC
With Small Overdrive (Ipeak = 50 µA)
Dynamic Overdriving alone and inductive

peaking alone give nearly the same delay
Inductive peaking along with dynamic

overdriving shows around 20%
improvement in delay over dynamic
overdriving alone
Comparison of Throughput (Eye-opening)

Dynamic overdriving improves throughput by 5× over RC
Inductive peaking does not offer

substantial additional advantage
when combined with dynamic
overdriving.
Inductive peaking shows throughput

enhancement of 26% over RC
Conclusion: Inductive Peaking vs Dynamic Overdrive

For very high data rate applications, dynamic overdriving

alone should be employed as inductive peaking does not
offer any additional advantages
For low power and low data rate applications, the use of
inductive peaking can give 26% improvement in throughput
over RC
inductive peaking can give 16% improvement in delay over
RC
dynamic overdrive along with inductive peaking can further
improve throughput by 20%
Part II
Variation Tolerant Current Mode Signaling
Need for Process Variation Tolerance
Effect of Process Variations on different CMS Schemes
The Proposed Variation Tolerant CMS Scheme
Performance Evaluation
Bidirectional Links
Simulated Performance of Bidirectional Link
Current mode signaling derives its advantages over

voltage mode due to the reduced swing on the line.
Careful design is necessary, otherwise small changes in

device parameters can have a disproportionate effect on
the performance of the system.
In modern short channel processes, variations in transistor parameters are large: some of the parameters can vary by as much as 60%.
We have to design circuits so that they are robust with respect to batch-to-batch variations, as well as variations between devices on the same die.
Batch-to-batch or inter-die variations can shift operating points and drive strengths.
Intra-die variations cause mismatch in parameters of transmitter and receiver transistors.
Robustness requirements
Process, Supply Voltage and Temperature variations will

affect the core logic as well as data communication
circuitry.
The requirement for data transmission is therefore not of

complete invariance with respect to PVT variations.
We have to ensure that throughput and delay properties of

the interconnect are at least as good as data generation
and clock rates.
Thus the deterioration in interconnect properties should be

no worse than the deterioration in general logic.
Because global interconnects, by definition, connect

remote points on the die, on chip variations can be of
greater concern.
Effect of common mode voltage mismatch

[Figure: transmitter and receiver levels relative to VcmRx, for the ideal case and the misaligned case]
In case of ideal match, small fluctuations in line voltage are converted to rail to rail swing by the receiver.
If, however, the mismatch is large, the small swing on the line may be completely ignored by the receiver.
It is important, therefore, that the amount of swing on the line is much more than the mismatch in common mode voltages.
But high swing will cause power dissipation.
It is better to have smart bias circuits, which will reduce mismatch and the need for a large swing.
System parameters affected by variations
Variations in the following parameters have a strong influence

on the performance of the signaling scheme:
1. Ipeak : Peak current supplied by the strong driver during
input transition
2. tp : Duration for which the strong driver is ON
3. ΔV : Line voltage swing at the receiver end in steady state
4. Mismatch between any VCMRx and operating point of an
amplifier
CMS Scheme with Feedback (CMS-Fb)
[Figure: CMS-Fb link. Strong driver and weak driver on VDD drive the wire from Input, with feedback from the wire through inverter I1; receiver equivalent circuit at node LineRx with load RL and source VcmRx, producing RxOut]
NAND/NOR gates generate pulses to turn the strong driver on and off.
Input transition → the strong driver turns on → line voltage at the transmitter end crosses VM of inverter I1 → the strong driver turns off.
The weak driver supplies Istatic, and the line voltage swing at the receiver end is VCMRx ± Istatic RL.
Effect of Inter-die Process Variations on CMS with

feedback
[Figure: CMS scheme with feedback: strong driver, weak driver, feedback inverter I1, wire, and receiver (LineRx, RL, VcmRx, RxOut)]
Variations in Ipeak are well compensated due to the

feedback at the driver end.
If the driver is weaker due to process variations, the feed
back system keeps it on for longer till the line reaches the
desired voltage.
This might, however, not be optimum from a power point of
view.
Effect of Intra-die Process Variations on CMS-Fb

[Figure: line voltage waveform with levels VCMRx and VMTx marked]
Line voltage is not constant for a constant low input voltage.
During a low to high transition, the strong driver is turned off well before the line voltage crosses VCMRx.
CMS Scheme without Feedback (CMS-Fpw)

[Figure: CMS-Fpw link. Strong driver and weak driver on VDD, fixed width pulse generator with a delay element, wire, and receiver (LineRx, RL, VcmRx, RxOut). A. Tabrizi et al., MWSCAS, 2007]
tp is given by the delay element.
Less sensitive to intra-die variations.
In the skewed corners, the sourcing Ipeak and sinking Ipeak are different, leading to different rise and fall delays.
Throughput can degrade significantly in skewed corners.
Minimizing Process Dependence
To minimize process dependence, we need smart bias circuits

which sense the process corner and adjust the bias to
compensate for variations.
Long channel transistors show relatively less variation with process compared to short channel transistors in the same process.
We can make use of this difference to design a bias generator which senses the process corner and tries to increase the transistor current in the slow corners and to decrease it in the fast corners.
Simple bias generators using inverters with input and output shorted, which use this feature, are shown here.
[Figure: two bias generators supplied from Vdd. A short pMOS with a long nMOS generates Vbp; a long pMOS with a short nMOS generates Vbn]
Proposed CMS Scheme with Smart Bias
We propose a Dynamic Overdrive scheme in which both the

strong and the weak drivers use constant current sources
controlled by process aware bias generators.
[Figure: proposed link. Strong and weak drivers with a delay element at the input; p bias generator (short pMOS, long nMOS, output Vbp) and n bias generator (long pMOS, short nMOS, output Vbn); wire; receiver (Rx) with RxBias and an inverter amplifier producing the output]
There is no feedback inverter in the driver circuit.
Bias voltages change in the desired direction to keep the current through the weak and strong drivers the same across all corners.
Effect of Process Variation on the Proposed CMS

Scheme
Ipeak remains nearly the same across all corners. In the extreme corners, SS and FF, the small change in Ipeak is compensated by the opposite change in tp.
$\Delta V = I_{static} R_L$ remains the same across all corners, where $R_L = \dfrac{1}{g_{mn} + g_{mp}}$.
The inverter with input-output shorted and the inverter

amplifier are designed using fingers and placed close to
each other so that their switching thresholds are closely
matched across all corners.
This makes the proposed circuit less sensitive to intra die

process variations as well.
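The steady state swing relation quoted above, with the line terminated in 1/(gmn + gmp), can be evaluated as follows. This is a minimal sketch; the function names and the transconductance and current values are illustrative assumptions.

```python
def termination_resistance(gm_n, gm_p):
    """Load resistance R_L = 1 / (g_mn + g_mp) of the receiver termination."""
    return 1.0 / (gm_n + gm_p)

def line_swing(i_static, r_load):
    """Steady state line voltage swing, delta V = I_static * R_L."""
    return i_static * r_load

# Illustrative values only (assumptions).
r_load = termination_resistance(gm_n=0.5e-3, gm_p=0.5e-3)
swing = line_swing(i_static=30e-6, r_load=r_load)

print(f"R_L = {r_load:.0f} ohm, swing = {swing * 1e3:.1f} mV")
```

If the bias generators hold the static current and the transconductances steady across corners, the swing computed this way stays steady as well.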
Simulation Setup
Foundry specified four corner model files and the mismatch model file for Monte Carlo simulations were used.
All the signaling schemes offer the same input capacitance (equivalent to one minimum sized inverter).
All signaling schemes drive an FO4 load.
Line RLC values used were: Rline = 244 Ω/mm, Lline = 1.5 nH/mm, Cline = 201 fF/mm.
All schemes were designed for a throughput of 2.65 Gbps.
Current mode schemes are designed for Ipeak = 500 µA.
Effect of Intra-die Process Variations
Mismatch in VM of the inverter can be up to 40 mV (from the mismatch data sheet from the foundry). For a VM mismatch of 40 mV:

                Percentage Degradation
CMS system      Delay     Throughput
CMS-Fb          25        33
CMS-Fpw         10        14
CMS-Bias        4         9.5

Effect of Inter-die Process Variations
Signaling System / Logic Circuit    SS       SNFP     FNSP
CMS-Fb                              17.5     5.7      2.9
CMS-Fpw                             32       33.6     34.9
CMS-Bias                            18.75    8.2      7.14
Voltage Mode                        27       <1       2.8
Ring Oscillator Freq                23       2.88     3

Interconnects with CMS-Fpw scheme become the

bottleneck in overall performance of the chip in skewed
corners
Degradation in the throughput of the proposed scheme in

the skewed corners is around 7% which is less than that in
CMS-Fpw
scheme
Overall Comparison
Performance comparison of four signaling schemes (line = 6 mm, power measured at 1 Gbps)

Signaling Scheme    Delay (ps)   Throughput (Gbps)   Power (µW)   Area (µm²)
CMS-Fb (90 mV)      700          2.56                146          2.00
CMS-Fpw             503          2.65                114          2.40
Proposed CMS        490          2.56                113          3.07
Voltage Mode        1100         2.85                655          12.53

The CMS-Fb scheme consumes higher power than other

schemes due to static power consumption in the feedback
inverter
The proposed scheme shows 78% improvement in area

over voltage mode scheme whereas other schemes,
CMS-Fb and CMS-Fpw show 84% and 80% respectively
Overall Comparison
[Figure: six panels (a) to (f) comparing the proposed scheme, DODFpw+RxBMul [3], DODFpw+RxFb [2], DODFb+RxFb [1] and voltage mode. Axes: power (µW), delay (ns) and energy (pJ) against data rate (Mbps) and line length (mm). Panel conditions include line = 1.5 mm, line = 6 mm, and data rates of 50 Mbps, 125 Mbps and 500 Mbps. Annotations: "CMS Power < VM Power", "×6.6", "×8"]
Bidirectional Links
In many applications, on-chip buses need to carry signal in both

directions.
For example, the bus between processor and memory, main
processor and floating point multiplier etc.
Often bidirectional buffers with direction control are used for
this.
Limitations of Conventional Bidirectional Buffer

[Figure: back-to-back connected tri-state buffers with enable inputs En, and a bus of wire segments joined by such repeaters, all controlled by a direction signal]
One of the two tristate buffers is enabled at a given time.
Two transistors in a stack → increased sizes of PMOS and NMOS.
Delay of a bidirectional repeater is more than that of a unidirectional buffer.
A direction control signal is required by each repeater.
Buffers offer a huge load to the direction control signal.
Buffers carrying the direction control signal consume additional power.
We need a repeaterless signaling scheme.
The Proposed Current Mode Bidirectional Link

Employs only two bidirectional transceivers, one at each

end of the line.
Direction signal is required only at two ends of the line
The direction control signal can be the same as one of the
control signals, or derived from it based on the communication
protocol
Assumption: Direction signal (Tx/Rx) is locally available at

both ends before data transmission starts
Proposed Current-Mode Transceiver
[Figure: proposed transceiver. Transmitter part: strong driver, weak driver and delay element, with data input In and internal signals Tx_ip_1 and Tx_ip_0. Receiver part: terminator and inverter amplifier with output out. Bias voltages Vbp (short PMOS, long NMOS) and Vbn (long PMOS, short NMOS); both parts are gated by Tx/Rx and share the wire]
Either the transmitter part or the receiver part is enabled at a time.
Speed-Power of Proposed Bidirectional CMS Scheme

Current-Mode vs. Voltage-Mode
[Figure: four panels (a) to (d) comparing CMBid and VMBid: delay (ns) and power (µW) against line length (mm) at a data rate of 500 Mbps, power (µW) against data rate (Mbps) for a 4 mm line, and crossover data rate (Mbps) against line length (mm)]
35% improvement in delay for nearly all line lengths.
1.7× lower power for 2 mm lines and 7× lower power for 8 mm lines.
Power crossover frequency is 100 Mbps for 4 mm long lines.
5× reduction in power at 1 Gbps.
For lines longer than 2 mm communicating at data rates of more than 180 Mbps, the proposed scheme consumes less power than voltage mode.
Designed in 180 nm for Vdd = 1.8 V using nominal Vt devices.
Line characteristics: R = 211 Ω/mm and C = 0.245 pF/mm.
Effect on Supply Noise
Peak current drawn from supply: 68% reduction in peak current, and hence the contribution to supply noise is much less.
80% reduction in active area.
Performance of Proposed Scheme in Four Digital

Process Corners
For a 4 mm line operating at 500 Mbps:

          Delay (ns)            Power (µW)
Specs     VM-Bid    CM-Bid      VM-Bid    CM-Bid
TT        1.35      0.81        2127      567
SS        1.57      0.90        2055      435
FF        1.21      0.69        2163      727
FNSP      1.35      0.80        2113      572

38% improvement in delay even in the worst case (SNFP).
3.45× lower power consumption even in the worst case (SNSP).
Part III
Implementation on Si and Measured Results

On chip measurement
Time to Frequency Conversion
Time to Voltage Conversion
Implementation on a Test Chip
Measurement Results
Bidirectional Lines
Motivation
Delays of on-chip interconnects are of the order of

hundreds of pico-seconds.
It is nearly impossible to measure these off-chip.
We need on chip delay measurement circuits. We have designed two test circuits based on:
Time to Frequency Conversion
Time to Voltage Conversion
Transmission gates were used to

implement switches.
Multiplexer(demultiplexer) are designed so

that delays for both possible paths through
the mux/demux pair are the same.
The floor plan of the circuit is such that the

beginning and the end of the long
interconnect are close to each other.
Therefore when the short path L3 is

chosen, the total delay corresponds to the
delay in inverters, mux/demux etc.
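[Figure: (a) Delay measurement circuit, principle: a ring oscillator (RO) whose loop passes through a demux and a mux, with S = 0 selecting the plain RO through the short path L3 and S = 1 selecting the RO with wire through L1, the CMS link (Tx, wire, Rx) and L2. (b) Delay measurement with CMS link, floorplan: inverters, demux, mux, transmitter, wire and receiver, with L3 = L1 + L2]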
We first measure the frequency of

oscillation choosing the short wire path
between the demux and mux.
This gives the delay of the measurement

circuit except for the system under test.
We now select the interconnect system

whose delay we want to measure and find
the frequency again.
$\text{Delay} = 0.5 \left( \dfrac{1}{f_{RO}} - \dfrac{1}{f_{system}} \right)$
Time to Frequency Conversion: Accuracy
To assess the accuracy of the scheme, we simulated the whole

circuit, for different line lengths up to 14 mm in a 180 nm
process.
The delay through the interconnect scheme was noted

from the simulation results. We call this the Simulated
Delay
The delay was also calculated by the formula $0.5 \left( \dfrac{1}{f_{RO}} - \dfrac{1}{f_{system}} \right)$.
We call this the Calculated Delay.
These results were tabulated to assess the expected

accuracy from this test scheme.
Time to Frequency Conversion: Accuracy
Line Length (mm)   Simulated Delay (ps)   Calculated Delay (ps)   % Error
4                  501                    507                     1.2
6                  661                    658                     0.4
10                 1068                   1077                    0.8
14                 1599 calculated against 1575 simulated         1.5

Delays are the average of rise and fall delay
Power-delay product can be evaluated using this circuit.
This being a differential measurement, the only source of

error is differences in rise and fall time
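The differential measurement can be sketched as below. The function and argument names are hypothetical and chosen to avoid guessing which slide symbol belongs to which path: the sketch simply assumes one frequency is measured through the short path and one through the link under test, and that one oscillation period contains two traversals of the loop (hence the factor 0.5). The percentage errors are recomputed from the tabulated delays.

```python
def delay_from_frequencies(f_short_path, f_with_link):
    """Delay added by the link, from the two ring oscillator frequencies (Hz)."""
    return 0.5 * (1.0 / f_with_link - 1.0 / f_short_path)

def percent_error(simulated, calculated):
    """Relative difference between calculated and simulated delay, in percent."""
    return abs(calculated - simulated) / simulated * 100.0

# Hypothetical frequencies: 500 MHz through the short path, 400 MHz through the link.
example_delay = delay_from_frequencies(500e6, 400e6)
print(f"link delay: {example_delay * 1e12:.0f} ps")

# Simulated and calculated delays (ps) for line lengths of 4, 6, 10 and 14 mm.
table = {4: (501, 507), 6: (661, 658), 10: (1068, 1077), 14: (1575, 1599)}
errors = {length: percent_error(sim, calc) for length, (sim, calc) in table.items()}
for length, err in errors.items():
    print(f"{length} mm: {err:.2f} %")
```

Because the short path measurement is subtracted, the delay of the inverters and the mux/demux pair drops out of the result.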
[Figure: time to voltage conversion circuit with supply Vdd, reference Vref, transistors Mn0 and Mn1, current I, clock, and a pulse select multiplexer (inputs 0 and 1) choosing between the test pulse input and the delayed pulse from the system under test]
Capacitor C is pre-charged to its peak value during the negative phase of the clock.
It is then discharged for a time equal to the delay through the system.
$\text{Delay} = \dfrac{C \Delta V}{I} = k \Delta V$
The value of k is found experimentally using a calibration pulse of known duration.
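The calibration step can be sketched as follows. This assumes an ideal constant discharge current, so that the voltage drop is proportional to the pulse duration; the function names and all component values are illustrative assumptions.

```python
def calibrate_k(known_pulse_width, measured_delta_v):
    """Conversion constant k (s/V) from a calibration pulse of known duration."""
    return known_pulse_width / measured_delta_v

def delay_from_voltage(k, delta_v):
    """Delay corresponding to a measured voltage drop, Delay = k * delta V."""
    return k * delta_v

# Illustrative values (assumptions): C = 1 pF discharged by I = 100 uA.
C = 1e-12
I = 100e-6
ideal_k = C / I                        # s/V for an ideal current source

# A 1 ns calibration pulse would then drop the capacitor voltage by 0.1 V.
k = calibrate_k(known_pulse_width=1e-9, measured_delta_v=0.1)
measured = delay_from_voltage(k, delta_v=0.05)

print(f"k = {k:.2e} s/V, delay for 50 mV drop = {measured * 1e12:.0f} ps")
```

Calibrating k experimentally absorbs the actual values of C and I, so neither has to be known accurately.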
Time to Voltage Conversion: Accuracy
Line          Simulated Delay (ps)    Calculated Delay (ps)    Error (%)
Length (mm)   rising     falling      rising     falling       rising    falling
4             380        393          378        398           0.8       1.0
6             478        497          482        503           0.8       1.2
10            730        769          733        781           0.4       1.8
14            1065       1149         1078       1171          1.2       1.9

This scheme permits the measurement of rise and fall

delays separately.
Accuracy of about 2% is predicted by simulations.
Current-Mode Signaling Test Chip
1.5 mm × 1.5 mm chip fabricated in 180 nm MM/RF process
44-pin die packaged in QFN56 package
Measurement Results
(Frequency measured using a 6-digit frequency counter)

Signaling Scheme   Delay (ns)   Energy (pJ)   EDP (pJ·ns)   Measured at Data Rate (Mbps)
Voltage Mode       1.191        4.54          5.328         371
CMS-Fb             1.006        1.52          1.52          400
CMS-Bias           0.938        0.851         0.799         621

The proposed circuit offers 22% improvement in delay and 85%

improvement in EDP over voltage-mode scheme.
Performance of Proposed CMS Scheme
[Figure: four panels (a) to (d) comparing VM, CMS-Fb and CMS-Bias: delay (ns), power (mW) and energy/bit (pJ) against line length (mm) and data rate (Mbps), and breakeven data rate (Mbps) against line length (mm). Conditions: line = 6 mm, data rate = 600 Mbps. Annotations: "40%", "66.66 Mbps"]
At least 7× lower power in the worst process corner.
78% gain in active area.
65% reduction in peak current.
The voltage-mode scheme was optimized for delay separately for every line length.
Comparison with Existing Dynamic Overdriving CMS

Schemes
Source                 JSSC 2006   CICC 2006   ESSCIRC 2005 (CMS-Fb)   This work   This work*
Sim./Measured          Meas.       Meas.       Meas.                   Meas.       Sim.
Tech.                  130nm       250nm       130nm                   180nm       180nm
Line (mm)              10          5           10                      6           6
Gain in Delay          32%         28.3%       53%                     22.5%       32%
Gain in Energy/bit     35.48%      67%         25%                     81.0%       87%
Gain in EDP            56.5%       76.8%       65.5%                   85%         90%
Data Rate (Gbps)       3           2           0.7                     0.62        1
Activity               1.0         1.0         NA                      1.0         1.0

Comparison With Voltage Mode Buffer Insertion

The proposed dynamic overdriving CMS scheme offers

26-40% improvement in delay over the voltage-mode
scheme for 2mm-8mm long lines.
These also offer improvement in energy consumption over

buffer insertion scheme for lines longer than 2mm
operating at data-rates more than around 66Mbps.
The proposed 6mm long link reduces energy consumption

at least by a factor of 7 compared to the voltage-mode
scheme at 1Gbps.
It offers 85% improvement in Energy Delay Product (EDP)

over voltage-mode scheme.
Comparison With Other Current Mode Schemes

The scheme proposed by us offers 22% improvement in

Power Delay Product (PDP) over the current mode scheme
with feedback proposed by Katoch et al.
The CMS scheme with feedback is sensitive to intra-die

variations. Our CMS scheme remains faster than logic
circuit even in the presence of intra-die and inter-die
process variations.
Measurement Results for Bidirectional Links

Measurement results match simulation results within 20%
Voltage-mode bidirectional link was not put on silicon due

to limited number of pads
Signaling Scheme   Delay (ns)   Power (µW)   PDP (mW·ns)   Data rate of measurement (Gbps)
CM-Bid             1.16         680          0.788         0.56

Matched Model Parameters
BSIM parameters corresponding to this run were extracted
A few main model parameters (BSIM) were changed to

define four process corners (FF,SS,FS,SF)
Main model parameters (BSIM) were adjusted to match

Isat , Vth , Ioff and a few points on measured Ids -Vgs
characteristics of the devices fabricated in this process run.
Simulation with Matched Model Parameters

Parameters           TT       Measured   MMP     % Match
Basic Device Parameters
Isatn (mA)           6.23     6.44       6.43    99.8
Isatp (mA)           2.40     2.22       2.28    97.3
Vtn (mV)             501      510        506     99.2
Vtp (mV)             494      493        499     98.8
Ioffn (pA)           75       170        120     82.4
Ioffp (pA)           80       48         58      80.5
Ids-Vgs points (Idsn / Idsp @ Vgs)
Idsn @ 0.9 (µA)      66.6     65         66.4    97.85
Idsp @ 0.9 (µA)      76.2     70         67.5    96.45
Idsn @ 1.2 (µA)      154.4    150        145     96.67
Idsp @ 1.2 (µA)      191      170        172     98.82
Idsn @ 1.8 (µA)      347      330        317     96
Idsp @ 1.8 (µA)      491      440        452     97.27

Measurement Results and Simulation Results with

MMP
[Figure: delay (ns), power (µW) and PDP (X 1e12) against Vdd (V) from 1.6 V to 1.8 V for CMBid (MMP), CMSBid (measured) and VMBid (MMP)]
Improvement in specs for simulations using MMP:

Vdd (V)   Delay (%)   Power (x)   PDP (x)
1.6       36.8        4.5         7.2
1.7       34.4        4.39        6.8
1.8       34.21       4.01        6.0

Conclusion
Global interconnects form a major bottleneck for the performance of digital systems in scaled down technologies.
The use of current mode signaling promises to remove this bottleneck.
Through simulation, circuit fabrication and actual measurements, we have demonstrated that current mode signaling has overwhelming advantages over the currently used voltage mode buffer insertion schemes.
We have demonstrated that the particular configuration suggested by us for a current mode scheme is superior to other current mode schemes.
Our scheme is robust with respect to batch to batch parametric variations and to on chip parametric variation.
Therefore we assert that it is a practical option for use in modern systems for implementing both unidirectional and bidirectional data links.
Current Mode Interconnect

Marshnil Dave, Maryam Shojaei Baghini, Dinesh Sharma
Department Of Electrical Engineering
Indian Institute Of Technology, Bombay
December 2, 2010
Contents
1 Introduction
  1.1 Scaling
    1.1.1 Unscaled Interconnect Delay
  1.2 Buffer Insertion for Delay Reduction
    1.2.1 Optimum Buffer Insertion
  1.3 Concerns with Voltage mode Buffer Insertion Technique
    1.3.1 Timing closure
    1.3.2 Problem with bi-directional data transmission
    1.3.3 Signal Integrity
  1.4 Current signaling
    1.4.1 Zero input impedance circuit
  1.5 Other low impedance line terminations
    1.5.1 Digital Designers need not panic!
  1.6 Reduced swing signaling
  1.7 Improvement in Current Mode Signaling
    1.7.1 Inductive Peaking
    1.7.2 Simulation Results
    1.7.3 Dynamic Overdriving
2 Variation Tolerant Current Mode Signaling
  2.1 Need for Process Variation Tolerance
  2.2 Robustness requirements
    2.2.1 Effect of Process, Voltage and Temperature Variation
    2.2.2 Effect of common mode voltage mismatch
  2.3 System parameters affected by variations
  2.4 A brief review of Current Mode Signaling Schemes
    2.4.1 CMS Scheme with Feedback (CMS-Fb)
  2.5 Effect of Process Variations on different CMS Schemes
    2.5.1 CMS Scheme with Feedback (CMS-Fb)
    2.5.2 CMS Scheme with fixed pulse width (CMS-Fpw)
  2.6 The Proposed Variation Tolerant CMS Scheme
  2.7 Performance Evaluation
  2.8 Bidirectional Links
    2.8.1 Simulated Performance of Bidirectional Link
Chapter 1
Introduction
1.1 Scaling
VLSI technology has used device scaling to continually improve the performance of circuits. In constant field scaling, all device dimensions as well as all voltages are scaled down by some factor S. This leads to improved packing density ($\propto S^2$), improved speed (delay $\propto 1/S$), and improved power consumption ($\propto 1/S^2$). However, these improvements apply only to active circuits. What about passive components?
1.1.1 Unscaled Interconnect Delay
Consider an interconnect in a chip. This is made of a metal layer of thickness $t_m$ running over an insulator of thickness $t_i$.
Figure 1.1: Delay through an Interconnect
For a wire of length $L$ and width $W$, with metal resistivity $\rho$ and insulator permittivity $\epsilon$,
$$R = \frac{\rho L}{W t_m}, \qquad C = \frac{\epsilon L W}{t_i}$$
$$\text{Charge Time} \propto RC = \frac{\rho \epsilon L^2}{t_m t_i} \qquad (1.1)$$
To first order, delay is independent of W. This is because increasing W reduces resistance but increases capacitance in the same ratio. Unfortunately W is the only parameter that the circuit designer can decide! (L is fixed by the distance between the points to be connected, while $\rho$, $\epsilon$, $t_m$ and $t_i$ are decided by the technology).
If we look at the distribution of wire lengths in a design, there are a large number of short wires which connect one gate to another locally. At the same time, there is a considerable number of much longer wires which run over the entire chip. These include clocks, power-on reset signals, power supply lines, data buses etc. These are the global interconnects.
Figure 1.2: Notional distribution of wire lengths on a chip (relative frequency versus normalized wire length)
While local interconnects scale with device size, global interconnects scale with die size.
From eqn 1.1,
$$\text{Interconnect Delay} \propto \frac{\rho \epsilon}{t_m t_i} L^2 \equiv A L^2 \qquad (1.2)$$
For local interconnects, L scales the same way as $t_m$ and $t_i$, so delay is invariant. However, even as the transistor sizes are scaled down as the technology advances, average chip sizes show an increasing trend. This is because the complexity of systems that we put on integrated circuits has increased at a rate higher than the rate at which device geometries shrink. Therefore, for global interconnects, L goes up with die size, while $t_m$ and $t_i$ scale down. This leads to a sharp increase in delay.
1.2 Buffer Insertion for Delay Reduction
Global interconnect delay can be the determining factor for the speed of an integrated system. The $L^2$ dependence of interconnect delay is a source of particular concern. This problem can be somewhat mitigated by buffer insertion in long wires. We define some critical wire length $L'$ and when a wire segment exceeds this length, we insert a buffer.
1.2.1 Optimum Buffer Insertion
What is the optimum wire length after which we should insert a buffer? Consider a long wire of total length $L$ in which we insert buffers after every segment of length $L'$. From eqn 1.2,
$$\text{Segment wire delay} \propto \frac{\rho \epsilon L'^2}{t_m t_i} = A L'^2$$
Figure 1.3: A buffered interconnect line
Let the buffer delay be $\tau_b$. For n segments, there will be n-1 buffers, and $L = nL'$. If the total delay is denoted by $\tau$,
$$\tau = nAL'^2 + (n-1)\tau_b = \frac{L}{L'} A L'^2 + \tau_b\left(\frac{L}{L'} - 1\right) = ALL' + \tau_b\left(\frac{L}{L'} - 1\right)$$
Putting the derivative with respect to $L'$ equal to 0 for optimization,
$$AL - \tau_b \frac{L}{L'^2} = 0, \quad \text{so} \quad AL'^2 = \tau_b \qquad (1.3)$$
Since $AL'^2$ is the wire delay for the segment, this equation tells us that $L'$ should be so chosen that the wire segment delay equals $\tau_b$. The total delay is then proportional to n and so is linear in L.
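The optimisation above can be checked numerically. The following is a minimal sketch, assuming purely illustrative values for the wire constant A, the buffer delay and the total wire length (none of these numbers come from the notes); the function names are hypothetical.

```python
import math

def optimum_segment_length(a, tau_b):
    """Segment length L' that satisfies A * L'^2 = tau_b (eqn 1.3)."""
    return math.sqrt(tau_b / a)

def total_delay(a, tau_b, length, n):
    """Delay of a wire of the given length cut into n equal segments with n - 1 buffers."""
    seg = length / n
    return n * a * seg ** 2 + (n - 1) * tau_b

# Hypothetical values, chosen only for illustration
A = 25.0        # wire delay constant, ps per mm^2
TAU_B = 100.0   # buffer delay, ps
LENGTH = 10.0   # total wire length, mm

seg_opt = optimum_segment_length(A, TAU_B)
n_opt = round(LENGTH / seg_opt)

for n in (1, n_opt - 1, n_opt, n_opt + 1):
    print(f"n = {n}: total delay = {total_delay(A, TAU_B, LENGTH, n):.1f} ps")
```

With these assumed numbers the delay is smallest when each segment's wire delay equals the buffer delay, and the unbuffered wire (n = 1) is much slower than the buffered one.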
1.3 Concerns with Voltage mode Buffer Insertion Technique
Currently, buffer insertion is the most widely used method to control interconnect delay. However, there are several difficulties with buffer insertion. Buffers consume power and silicon area. Also, we normally do floor planning and layout first and then put in the interconnects. When the wire length reaches the optimum segment length, we need to put in a buffer. However, it is quite possible that at this point, there is active circuitry underneath, and there is no room to put in a buffer! Then we either have to live with buffer insertion at non-optimal wire lengths or create space by pushing out existing cells and modifying the layout.
1.3.1 Timing closure
Global interconnects are placed after active circuit design and layout is complete. One has to
anticipate the wire length, and then design the active circuits to meet total delay specifications.
If the actual wire length is different from what was anticipated, one has to re-design the active
circuits after layout. After a fresh layout, wire lengths and hence, delays are changed. This
leads to a design-layout-redesign iteration known as Timing Closure. This iteration becomes
longer and longer when total delays are dominated by interconnect delay.
1.3.2 Problem with bi-directional data transmission
Global interconnects often include data busses, which may require bidirectional data transmission. (For example, a bus connecting a processor and memory). However, buffer insertion
fixes the direction of data flow! Therefore, if we need bidirectional transmission, we need to
replace buffers with bidirectional transceivers. These require a direction signal, which will
enable the buffers pointing in the desired direction. This direction signal must also be routed
with the bus (and should have its own buffers) and it should reach the bidirectional buffers
ahead of the data.
1.3.3 Signal Integrity
As interconnect wire separation is reduced, there is a serious signal integrity problem because of electrostatic coupling between long wires. Inter-signal interference can lead to unpredictable delay variations. Grounded shielding wires must often be inserted to avoid interference. This leads to extra capacitance and $CV^2 f$ power loss.
1.4 Current signaling
Because of these problems with voltage mode signaling, we propose that 1s and 0s be signaled by the presence or absence of a current and not by a high or a low voltage. This has several advantages:
- Current rise time is limited by inductance rather than capacitance. Typically, inductive effects are much smaller than capacitive effects. (After all, $\epsilon_r \approx 4$ while $\mu_r = 1$ for insulators used in ICs.) So electromagnetic coupling is lower than electrostatic coupling.
- Signal voltage swings are limited by scaled down supply voltages; this does not restrict current swings.
- In fact, we can use multiple current values to send more than one bit down the same wire!
If we hold the voltage on the interconnect nearly constant, dynamic power will be negligible and latency will be much lower.
We also have the option of using multiple current levels to transmit multiple bits simultaneously. This can give higher throughput and lower interconnect area.
Current mode transmission offers the possibility of improving latency, throughput and power simultaneously!
Since $\Delta V \approx 0$ while $I \neq 0$, we need a receiver with a low (near 0) input impedance.
1.4.1 Zero input impedance circuit
Low $r_{in}$ amplifiers are used for photo-detectors [?]. One such configuration is shown below:
Figure 1.4: Low input impedance Beta Multiplier Circuit (transistors Mp1, Mp2, Mn1, Mn2; reference Vref; input voltage v; node voltages v1, v2; branch currents i1, i2)
This circuit uses complementary current mirrors feeding each other. This configuration is also known as a beta multiplier. To derive its input impedance, we can write small signal currents and voltages as:
$$i_1 = g_{mn1} v_1 = g_{mp1} (v - v_2)$$
$$i_2 = g_{mn2} v_1 = -g_{mp2} v_2$$
$$v_2 = -\frac{g_{mn2}}{g_{mp2}} v_1 = -\frac{g_{mn2}}{g_{mp2}\, g_{mn1}} i_1$$
$$i_1 = g_{mp1} v + \frac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}} i_1$$
We define
$$\alpha \equiv \frac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}} \qquad (1.4)$$
then $i_1 (1 - \alpha) = g_{mp1} v$, which gives
$$r_{in} = \frac{1 - \alpha}{g_{mp1}} \qquad (1.5)$$
By making $\alpha$ close to 1, we can reduce the input impedance to 0. In fact we can set the input impedance to any value (for example, the characteristic impedance of a transmission line) by a proper choice of $\alpha$ and $g_{mp1}$. However, we should make sure that $\alpha$ does not exceed 1, because that will lead to a negative input impedance, and instability. Therefore it is of some interest to determine how accurately we may set the value of $\alpha$ in spite of power supply, process and temperature variations.
Robustness of design
In saturation,
$$I_d = \frac{1}{2} \mu C_{ox} \frac{W}{L} (V_g - V_T)^2$$
$$\text{So, } g_m = \mu C_{ox} \frac{W}{L} (V_g - V_T) = \sqrt{2 \mu C_{ox} \frac{W}{L} I_d}$$
$$\frac{g_{mn2}}{g_{mn1}} = \sqrt{\frac{(W/L)_{n2}\, I_2}{(W/L)_{n1}\, I_1}}, \qquad \frac{g_{mp2}}{g_{mp1}} = \sqrt{\frac{(W/L)_{p2}\, I_2}{(W/L)_{p1}\, I_1}}$$
$$\text{Therefore } \alpha = \frac{g_{mn2}/g_{mn1}}{g_{mp2}/g_{mp1}} = \sqrt{\frac{(W/L)_{n2}/(W/L)_{n1}}{(W/L)_{p2}/(W/L)_{p1}}} \qquad (1.6)$$
This means that $\alpha$ depends only on transistor geometries and is independent of supply voltage, bias values, transistor parameters or temperature. This enables us to choose a value of $\alpha$ very close to 1, which in turn can provide very low input impedance.
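As a quick numerical illustration of eqns 1.5 and 1.6, the sketch below computes the transconductance ratio (written here as `alpha`) from the four W/L ratios and the resulting input resistance. The W/L values and the value of gmp1 are assumptions made up for the example, and the function names are hypothetical.

```python
import math

def alpha_from_geometry(wl_n1, wl_n2, wl_p1, wl_p2):
    """Ratio (gmn2/gmn1)/(gmp2/gmp1) expressed through W/L ratios only (eqn 1.6)."""
    return math.sqrt((wl_n2 / wl_n1) / (wl_p2 / wl_p1))

def input_resistance(alpha, gmp1):
    """Small signal input resistance r_in = (1 - alpha) / gmp1 (eqn 1.5)."""
    if alpha >= 1.0:
        raise ValueError("alpha must stay below 1 to avoid negative input impedance")
    return (1.0 - alpha) / gmp1

# Hypothetical geometries (W/L ratios) and transconductance
alpha = alpha_from_geometry(wl_n1=4.0, wl_n2=3.61, wl_p1=8.0, wl_p2=8.0)
r_in = input_resistance(alpha, gmp1=1e-3)   # gmp1 in siemens

print(f"alpha = {alpha:.3f}, r_in = {r_in:.1f} ohm")
```

Because the bias currents cancel in eqn 1.6, only the geometry enters `alpha_from_geometry`; the bias point affects the result only through gmp1.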
Receiver Design - Input stage
Just by adding another current mirror transistor and a current to voltage converter, we can use the beta multiplier as a receiver for current mode data signaling.
Figure 1.5: A Beta Multiplier based Current Mode Receiver
The input resistance is controlled largely by the geometry of transistors. The beta multiplier also has the property that it drives its own input through a low output impedance to bring it to the same voltage as Vref. Thus the interconnect voltage is held fixed. The input resistance is largely insensitive to process variations. The only dependence comes through $g_{mp1}$, but since $1/g_{mp1}$ is multiplied by the factor $(1 - \alpha)$ of eqn 1.5, which is close to 0, the sensitivity to variations is quite low.
1.5 Other low impedance line terminations
The beta multiplier is not the only choice for providing low input impedance. Simpler circuits like a diode connected MOS transistor are often used. Another option is to use an inverter with its output shorted to its input as the termination. This is equivalent to terminating the line to ground through a diode connected n channel transistor and to Vdd through a diode connected p channel transistor. The effective terminating admittance is the sum of the gm values of the n and p channel transistors.
Indeed in our later work, we have preferred a reference inverter with its output shorted to input as the line termination. Low input impedance can be achieved by adjusting the geometry of the p and n channel transistors. This termination is faster because of the absence of parasitic capacitances contributed by the beta multiplier transistors. The termination holds the line at a DC potential which is matched to the transition voltage of the amplifier inverter which follows the termination.
Figure 1.6: Alternative circuit for Low impedance Termination
1.5.1 Digital Designers need not panic!
We suggest that only the interface works in current mode. The rest of the circuit remains traditional. A library circuit will do the voltage mode to current mode conversion (transmitter) and another will convert the current back to voltage mode (receiver).
To put this plan into action, we need a receiver with very low input impedance. (If inductive effects are to be taken into account, we would like to terminate the line into its characteristic impedance.)
1.6 Reduced swing signaling
The main advantage of current mode signaling comes from the fact that the line voltage is held nearly constant. This is somewhat similar to low swing signaling in voltage mode. Low swing signaling in voltage mode involves driving high capacitive loads like interconnects to re-defined levels for 0 and 1 which drastically reduce the voltage swing on the load. The levels are restored to the usual CMOS levels at the receiver end by amplification. This can drastically reduce the power required by line drivers.
Figure 1.7: Reduced Swing Voltage Mode Signaling (low swing driver, line, buffer/amplifier)
It is important to distinguish between reduced swing voltage mode signaling and current mode signaling.
Figure 1.8: Current Mode signaling (low swing driver, line, receiver with termination RL)
- In reduced swing voltage mode signaling, the line is not terminated in a low impedance.
- Current mode signaling terminates the line in a low impedance.
- This reduces the time constant and increases the bandwidth.
- However, this also leads to static power consumption.
1.7 Improvement in Current Mode Signaling
Traditional current mode signaling consumes static power and presents a trade-off between speed, static power and signal to noise ratio. Its performance can be improved by two techniques:
- Inductive Peaking
- Dynamic Over-driving
1.7.1 Inductive Peaking
On-chip interconnects can be modeled as distributed RC lines, which are essentially low pass filters. This results in severe attenuation of high frequency components of the signal arriving at the receiver end. This can be corrected by bandwidth enhancement techniques used in RF amplifiers. This involves inductive peaking, where the line termination circuit exhibits inductive input impedance. Current flowing through the inductor will produce a voltage $(j\omega L)i$, which increases with frequency. Thus, this can counteract the high frequency attenuation due to the line.
Figure 1.9: Inductively Terminated Line (driver, line modeled as R0-C0 segments, termination of inductance L and resistance RL)
We performed simulations in which the interconnect line was represented by a realistic LCR segmented line. This was then terminated with resistive/inductive loads of different values. Results of the simulation are shown in fig. 1.10 for a 4 mm long line terminated with a 1 kΩ resistor in series with different inductance values. The transfer function of the terminated line is plotted as a function of frequency on a log-log scale in fig. 1.10 (a). For a given line length, the amount of bandwidth enhancement is a function of inductance and load resistance. The bandwidth increases with inductance up to a point and after that it remains fixed at that value. As can be seen, we can achieve an enhancement of about 500 MHz in 3 dB bandwidth in this example for an inductive termination of 100 nH. (Because of the log scale, the separation between the curves does not truly reflect the amount by which the bandwidth has been increased). The bandwidth enhancement remains at roughly the same value for larger inductances. We designate the inductance at which the improvement in bandwidth saturates as $L_{peak}$.
Figure 1.10: Effect of Inductive Termination on Bandwidth (panels (a) and (b))
As seen from fig 1.10 (b), the dependence on L is not very critical as long as the value is greater than $L_{peak}$. The inductance $L_{peak}$ required for significant enhancement in bandwidth is of the order of a few hundred nanohenries. This cannot be conveniently made from spiral inductors etc. Therefore for a practical implementation, we need an active inductor.
Beta Multiplier: A Gyrator
The beta multiplier circuit suggested earlier for achieving low input resistance values can in fact be used to simulate inductances of required values. The beta multiplier essentially forms a gyrator circuit with two Gm elements connected back to back along with the parasitic capacitance of the transistors. So beta multiplier circuits can exhibit inductive input impedance for some frequency range if designed properly.
[Figure: the beta multiplier circuit of figure 1.4, with transistors Mp1, Mp2, Mn1, Mn2, reference Vref, input v, node voltages v1, v2 and currents i1, i2.]
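Beta Multiplier: Input Impedance

The input impedance of the beta multiplier is calculated by taking parasitic capacitances into account.
Figure 1.11: Small Signal Equivalent Circuit of Beta Multiplier (controlled sources $i_1 = g_{mp1}(v_{int} - v_{g2})$ and $i_2 = g_{mn2} v_{g1}$; elements $r_{op1}$, $1/g_{mp2}$, $1/g_{mn1}$, $C_{g1}$, $C_{g2}$, $C_{g3}$)
We define:
$$\tau_1 \equiv \frac{C_{g1}}{g_{mn1}}, \quad \tau_2 \equiv \frac{C_{g2}}{g_{mp2}}, \quad \tau_3 \equiv C_{g3}\, r_{op1}, \quad \tau_4 \equiv \frac{C_{g3}}{g_{mp1}}$$
$$\alpha \equiv \frac{g_{mp1}/g_{mp2}}{g_{mn1}/g_{mn2}}, \quad R_1 \equiv \frac{1}{g_{mn1}}, \quad R_3 \equiv r_{op1}, \quad k \equiv \frac{R_1}{R_3}$$
Then the input impedance can be shown to be:
$$Z_{in} = \frac{(\tau_1 \tau_2 + k \tau_2 \tau_3) s^2 + (\tau_1 + \tau_2 + k(\tau_3 + \tau_2)) s + 1 + k - \alpha}{\left(g_{mp1} + \frac{1}{R_3}\right)(1 + \tau_1 s)(1 + \tau_2 s)(1 + \tau_4 s)} \qquad (1.7)$$
Correspondingly, the resistive part of the input impedance can be expressed as:
$$R_{in} = \frac{(1 - \alpha) + \frac{1}{g_{mn1} r_{op1}}}{g_{mp1} + \frac{1}{r_{op1}}}$$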
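The input impedance expression can be explored numerically. The sketch below evaluates it with Python complex arithmetic, using time constants Cg1/gmn1, Cg2/gmp2, Cg3*rop1 and Cg3/gmp1, the ratio alpha = (gmp1/gmp2)/(gmn1/gmn2) and k = 1/(gmn1*rop1). All component values are hypothetical, picked only so that alpha is close to 1; the function names are illustrative.

```python
import math

def beta_multiplier_zin(s, gmn1, gmn2, gmp1, gmp2, rop1, cg1, cg2, cg3):
    """Input impedance of the beta multiplier at complex frequency s (eqn 1.7)."""
    tau1 = cg1 / gmn1
    tau2 = cg2 / gmp2
    tau3 = cg3 * rop1
    tau4 = cg3 / gmp1
    alpha = (gmp1 / gmp2) / (gmn1 / gmn2)
    k = 1.0 / (gmn1 * rop1)
    num = ((tau1 * tau2 + k * tau2 * tau3) * s ** 2
           + (tau1 + tau2 + k * (tau3 + tau2)) * s
           + 1.0 + k - alpha)
    den = (gmp1 + 1.0 / rop1) * (1 + tau1 * s) * (1 + tau2 * s) * (1 + tau4 * s)
    return num / den

def beta_multiplier_rin(gmn1, gmn2, gmp1, gmp2, rop1):
    """Resistive (DC) part of the input impedance."""
    alpha = (gmp1 / gmp2) / (gmn1 / gmn2)
    return ((1.0 - alpha) + 1.0 / (gmn1 * rop1)) / (gmp1 + 1.0 / rop1)

# Hypothetical small signal parameters (siemens, ohms, farads)
params = dict(gmn1=1e-3, gmn2=0.95e-3, gmp1=1e-3, gmp2=1e-3, rop1=100e3)
caps = dict(cg1=10e-15, cg2=10e-15, cg3=10e-15)

z_dc = beta_multiplier_zin(0.0, **params, **caps)
z_hf = beta_multiplier_zin(1j * 2 * math.pi * 100e6, **params, **caps)
print("Rin from formula:", beta_multiplier_rin(**params))
print("Zin at DC       :", z_dc)
print("Zin at 100 MHz  :", z_hf)
```

A positive imaginary part at a given frequency indicates that the termination looks inductive there; at s = 0 the expression reduces to the resistive part, which gives a simple consistency check.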
Beta Multiplier: Equivalent Circuit

The nature of the input impedance (inductive or capacitive) is determined by the relative location of poles and zeros. If the first zero occurs at least a decade prior to the first pole, the input impedance is inductive. To ensure that a zero occurs a decade prior to the first pole, we have to choose operating currents etc. such that $\alpha - \frac{1}{g_{mn1} r_{op1}} > 0.9$, where $\alpha$ is the transconductance ratio $(g_{mn2}/g_{mn1})/(g_{mp2}/g_{mp1})$, and any two time constants are equal. Under these conditions, we may approximate the input impedance of the beta multiplier by the equivalent circuit shown in fig 1.12,
Figure 1.12: Equivalent circuit for the Beta Multiplier (Zin represented by Req, Ceq and Leq)
where
$$L_{eff} = \frac{r_{op1}}{g_{mp1} r_{op1} + 1} \left( \frac{C_{g1}}{g_{mn1}} + \frac{C_{g2}}{g_{mp2}} \right. \qquad (1.8)$$
$$\left. \qquad + \frac{C_{g2}}{g_{mp2}\, g_{mn1}\, r_{op1}} + \frac{C_{g3}}{g_{mn1}\, g_{mp1}\, r_{op1}} \right) \qquad (1.9)$$
$$R_{eff} = \frac{(1 - \alpha) + \frac{1}{g_{mn1} r_{op1}}}{g_{mp1} + \frac{1}{r_{op1}}} \qquad (1.10)$$
$$C_{eff} = K C_{gx} \qquad (1.11)$$
Beta Multiplier: Input Impedance Control

We are interested in using an inductor whose value should be in hundreds of nanohenries. We want to find if these values can be achieved under reasonable bias and geometry conditions. We therefore evaluated the input impedance of the beta multiplier under various operating conditions. As can be seen from the figure, the beta multiplier shows an effective inductance of hundreds of nanohenries for a practical range of input current and transistor geometries.
Figure 1.13: Bandwidth enhancement with Beta multiplier termination
Its effective resistance can be controlled by ratios of transconductances while its effective inductance depends on the absolute value of transconductance. It is possible to control $R_{in}$ and $L_{eff}$ with very little interaction between the two. The inductance changes from 100 nH to 980 nH while the value of effective resistance remains within 12% of its nominal value for a 20 µA change in the current.
Current Mode Receiver Circuit with Beta Multiplier
Figure 1.14: Current mode receiver with inductive peaking using beta multipliers (source type beta multiplier: Mp11, Mp22, Mn11, Mn22; sink type beta multiplier: Mp1, Mp2, Mn1, Mn2; reference Vref; input followed by an inverting amplifier)
We can design a current mode receiver with inductive peaking using two beta multipliers as shown in fig. 1.14 above. One of the beta multipliers sources current while the other sinks current. The effective impedance offered by the receiver is equal to the parallel combination of the impedances offered by the individual beta multipliers. The voltage at the input node swings around Vref. The small voltage swing on the line is sensed and amplified by the inverting amplifier. Vref is generated by shorting the input and output of an inverter to ensure that the value of Vref is the same as the switching threshold of the receiver amplifier across all process corners.
- $r_{out}$ of the Vref generation circuit comes in series with the beta multiplier $Z_{in}$ and hence the beta multiplier has to be sized accordingly.
- The Vref generation circuit consumes static power.
1.7.2 Simulation Results
To see the effectiveness of inductive termination, we should compare the power as well as the speed of the voltage mode buffer insertion scheme, the diode connected MOS terminated current mode scheme and the beta multiplier based inductive peaking scheme. Simulations were performed for a 6 mm long line at a rate of 1 Gbps. Results of the comparison are summarized in the table below:

(line = 6 mm, power measured at 1 Gbps)
Signaling Scheme             Delay (ps)   Throughput (Gbps)   Power (µW)   Area (µm²)
CMS-BMul (30 mV) [1]         420          2.56                310          2.00
CMS-Diode-CC (30 mV) [2]     500          2.45                380          2.00
Voltage Mode                 1000         2.85                3000         12.53

Inductive termination gives a 16% improvement in delay and about an 18% improvement in power compared to diode termination. Compared to the voltage mode scheme, we see more than 50% improvement in delay and an order of magnitude lower power [?, ?].
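The percentages quoted above follow directly from the delay and power columns of the table. A small sketch of that arithmetic is given below; the dictionary layout and function name are illustrative choices, while the numbers are the tabulated ones.

```python
# Delay (ps) and power (uW) as listed in the comparison table
results = {
    "CMS-BMul": {"delay_ps": 420.0, "power_uW": 310.0},
    "CMS-Diode-CC": {"delay_ps": 500.0, "power_uW": 380.0},
    "Voltage Mode": {"delay_ps": 1000.0, "power_uW": 3000.0},
}

def improvement_percent(new, old):
    """Percentage reduction of a quantity when going from old to new."""
    return 100.0 * (old - new) / old

bmul = results["CMS-BMul"]
diode = results["CMS-Diode-CC"]
vmode = results["Voltage Mode"]

delay_vs_diode = improvement_percent(bmul["delay_ps"], diode["delay_ps"])
power_vs_diode = improvement_percent(bmul["power_uW"], diode["power_uW"])
delay_vs_vmode = improvement_percent(bmul["delay_ps"], vmode["delay_ps"])
power_ratio_vs_vmode = vmode["power_uW"] / bmul["power_uW"]

print(f"Delay improvement over diode termination: {delay_vs_diode:.1f} %")
print(f"Power improvement over diode termination: {power_vs_diode:.1f} %")
print(f"Delay improvement over voltage mode:      {delay_vs_vmode:.1f} %")
print(f"Power ratio, voltage mode / CMS-BMul:     {power_ratio_vs_vmode:.1f} x")
```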
1.7.3 Dynamic Overdriving
Inductive peaking attempts to correct the low pass nature of the line by putting a high pass termination at the receiver end. However, by the time the signal reaches the receiver, its high frequency components have been severely attenuated. Therefore boosting them back to the normal level will also boost high frequency noise.
Rather than boosting the high frequency components at the receiver end, why don't we boost them before attenuation, at the transmitter itself? This technique of boosting the high frequency components before passing them through a low pass channel is known as pre-emphasis.
Concept of Dynamic Overdriving/Pre-emphasis
Current mode transmission can be speeded up by using a high drive current. However, this increases static power consumption. One possible solution is to dump a high drive current only when the state of the line needs to be changed from 0 to 1 or from 1 to 0. When the line remains at 1 or at 0 from one bit to the next, we use a small drive current to maintain the line at the required voltage. This is called Dynamic Over Driving. Dynamic overdriving essentially means amplifying the high frequency components of the input signal.
The transmitter end contains a weak driver and a strong driver. The strong driver is enabled only when a level change is needed from 0 to 1 or from 1 to 0.
Weak Driver
The weak driver provides the minimal drive required to keep the line (terminated by a low impedance) at the desired voltage level. When the input is 1, the p channel driver gate is low (enabled). This charges up the output. As the line voltage reaches $V_{DD} - V_{Tp}$, the upper p channel transistor turns off, restricting the line voltage swing in the up direction.
Figure 1.15: Steady State (Weak) Driver (labels: VDD, Swing Control (High), p Drive, Input, n Drive, Swing Control (Low))
Similarly, when the input is 0, the n channel driver transistor is enabled by a high level at its gate. The transistor discharges the line. However, when the line voltage approaches $V_{Tn}$ during discharge, the lower transistor turns off, stopping the discharging process.
Thus the line can only swing between $V_{DD} - V_{Tp}$ and $V_{Tn}$. [?]
Strong Driver
The strong driver should be enabled only when the input and the level on the output line do not represent the same logic. The feedback inverter acts as an inverting amplifier, converting low swing logic levels on the wire to a full swing (inverted) CMOS logic level on its output. The P channel gate is low (enabled) only when both inputs to the NAND are 1. This will happen only when the input is high AND the line is at 0. This is indeed the condition when we want the strong driver to charge the line.
Figure 1.16: Dynamic (Strong) Driver (labels: VDD, Input, Wire, Feedback)
The N channel gate is high (enabled) only when both inputs to the NOR gate are 0. This will happen only when the input is low AND the line is at 1.
Notice that the input to the feedback inverter is a low swing level around $V_{DD}/2$. Therefore it consumes static power.
The action of the strong driver is self limiting. This is because both NAND and NOR receive the input and the inverted logic level of the line. If the input and the logic level of the line are the same, NAND and NOR are fed with the input and its complement. Thus one of the inputs to NAND/NOR is 1, while the other is 0. This ensures that the output of NAND is 1, while that of NOR is 0, so that both the p and n channel transistors are OFF. Therefore the strong driver does not need a series transistor as was the case for the weak driver.
- When the input = 1 and the wire voltage < $V_m$, the inverter output = 1, NAND output = 0 and NOR output = 0. The P channel driver is ON and dumps current to charge the line.
- When the input = 0 and the wire voltage > $V_m$, the inverter output = 0, NAND output = 1 and NOR output = 1. The N channel driver is ON and sinks current to discharge the line.
- As soon as the low swing logic level on the line becomes equal to the logic level at the input, the inverter output is the complement of the input, and so NAND output = 1 and NOR output = 0, which disables both drive transistors automatically.
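The self limiting behaviour of the strong driver can be checked with a small logic-level model. The sketch below treats the feedback inverter as an ideal comparator around its switching point Vm; this idealisation, and the function name `strong_driver`, are assumptions made for illustration only.

```python
def strong_driver(data_in, wire_above_vm):
    """Return (p_on, n_on) for the strong driver.

    data_in       : logic level at the transmitter input (0 or 1)
    wire_above_vm : True if the wire voltage is above the feedback
                    inverter switching point Vm
    """
    feedback = 0 if wire_above_vm else 1            # feedback inverter output
    nand_out = 0 if (data_in == 1 and feedback == 1) else 1
    nor_out = 1 if (data_in == 0 and feedback == 0) else 0
    p_on = nand_out == 0    # p channel driver conducts when its gate is low
    n_on = nor_out == 1     # n channel driver conducts when its gate is high
    return p_on, n_on

for data_in in (0, 1):
    for wire_above_vm in (False, True):
        p_on, n_on = strong_driver(data_in, wire_above_vm)
        print(f"input={data_in} wire_above_vm={wire_above_vm}: "
              f"p_on={p_on} n_on={n_on}")
```

In this model the two drive transistors are never on together, and both are off whenever the logic level on the wire already matches the input.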
Dynamic Overdriving with Inductive termination?

Dynamic Overdriving (DOD) and inductive line termination both essentially amplify high frequency components of the input signal. Can we use both?
Figure 1.17: Current drive from a Dynamic Over Drive (DOD) type transmitter
To answer this question, the following four current mode signaling schemes were simulated:
- CMS Scheme with DOD and Resistive Load
- CMS Scheme with Simple Driver and Resistive Load
- CMS Scheme with Inductive Load
- CMS Scheme with DOD and Inductive Load
The dynamic overdriving driver was implemented by an ideal voltage controlled current source (VCCS) with the output current wave shape as shown in fig 1.17. The simple driver was implemented as a voltage controlled current source with a square output current wave shape. The drive current in this case is $-I_{avg}$ for a 0 at the input and $+I_{avg}$ for a 1 at the input. For a fair comparison, $I_{avg}$ for the simple driver is equal to the weighted mean of the current used for the dynamic overdrive transmitter:
$$I_{avg} = \frac{I_{peak}\, t_p + I_{static}\, (t - t_p)}{t} \qquad (1.12)$$
For this comparison, we used terminations of $R_L = 4\ \mathrm{k\Omega}$, $L = 4\ \mathrm{\mu H}$.
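The weighted mean used to equalise the two drivers is simple to evaluate. The sketch below does so for one set of numbers; the peak current matches the large-overdrive case discussed in these notes, but the static current, pulse width and bit period are hypothetical values chosen only to show the calculation.

```python
def average_drive_current(i_peak, i_static, t_p, t_bit):
    """Weighted mean drive current over one bit period (eqn 1.12).

    i_peak   : current supplied while the strong driver is on
    i_static : current supplied for the rest of the bit period
    t_p      : duration of the overdrive pulse
    t_bit    : bit period
    """
    if not 0.0 <= t_p <= t_bit:
        raise ValueError("pulse width must lie between 0 and the bit period")
    return (i_peak * t_p + i_static * (t_bit - t_p)) / t_bit

# Hypothetical operating point (amperes and seconds)
i_avg = average_drive_current(i_peak=500e-6, i_static=20e-6,
                              t_p=0.1e-9, t_bit=1.0e-9)
print(f"I_avg = {i_avg * 1e6:.1f} uA")
```

A short overdrive pulse keeps the average current, and hence the static power of the equivalent simple driver, far below the peak current.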
Comparison of Delay
With Large Overdrive ($I_{peak}$ = 500 µA)
- Dynamic overdriving shows a 5× improvement in delay over RC.
- Inductive peaking does not offer a substantial additional advantage when combined with dynamic overdriving.
- Inductive peaking alone shows a 25% improvement in delay over RC.
With Small Overdrive ($I_{peak}$ = 50 µA)
- Dynamic overdriving alone and inductive peaking alone give nearly the same delay.
- Inductive peaking along with dynamic overdriving shows around 20% improvement in delay over dynamic overdriving alone.
Comparison of Throughput (Eye-opening)

We apply a random sequence of bits to the input at a given data rate and observe the wave form at the receiver. The wave form, when observed for two clock periods, looks like a pair of eyes and is known as the eye diagram. Wide open eyes in the vertical direction represent a good signal to noise ratio, as the 1 level and the 0 level are well separated. Good eye opening in the time direction represents low timing jitter in the arrival time of bits, which is also a desirable feature.
As the data rate is increased, the eye closes in the vertical direction, as there is not sufficient time for the driver to charge/discharge the line. Assuming that the receiver is capable of resolving a 30 mV input to a full rail to rail swing output, we determine the data rate at which the eye opening is reduced to 30 mV. This is the maximum throughput which can be supported by the interconnect. Using this criterion, we can now compare the throughput for the different schemes. We find that
- Dynamic overdriving improves throughput by 5× over RC.
- Inductive peaking does not offer a substantial additional advantage when combined with dynamic overdriving.
- Inductive peaking shows a throughput enhancement of 26% over RC.
Conclusion: Inductive Peaking vs Dynamic Overdrive
- For very high data rate applications, dynamic overdriving alone should be employed, as inductive peaking does not offer any additional advantages.
- For low power and low data rate applications, the use of inductive peaking can give 26% improvement in throughput and 16% improvement in delay over RC.
- For low power and low data rate applications, the use of dynamic overdrive along with inductive peaking can further improve the throughput by 20%.
Figure 1.18: Eye diagram for different schemes at data rates where the eye opening is 32 mV
Chapter 2
Variation Tolerant Current Mode Signaling
2.1 Need for Process Variation Tolerance
Current mode signaling derives its advantages over voltage mode from the reduced swing on the line. Careful design is necessary, otherwise small changes in device parameters can have a disproportionate effect on the performance of the system. In modern short channel processes, variations in transistor parameters are large: some of the parameters can vary by as much as 40% of their nominal values. We have to design circuits so that they are robust with respect to batch-to-batch variations, as well as variations between devices on the same die. Batch-to-batch or inter-die variations can shift operating points and drive strengths, while intra-die variations cause mismatch in parameters of transmitter and receiver transistors.
2.2 Robustness requirements
Process, Supply Voltage and Temperature (PVT) variations will affect the core logic as well as the data communication circuitry. The requirement for data transmission is therefore not one of complete invariance with respect to PVT variations. We have to ensure that the throughput and delay properties of the interconnect are at least as good as the data generation and clock rates. Thus the deterioration in interconnect properties should be no worse than the deterioration in general logic.
2.2.1 Effect of Process, Voltage and Temperature Variation
Due to process, voltage and temperature variations, the drive capabilities and operating points of various circuits used for data transmission will vary. We must consider the cumulative effect of all these variations on the performance of the interconnect scheme.
2.2.2 Effect of common mode voltage mismatch
Because global interconnects, by definition, connect remote points on the die, on chip variations can, in fact, be of even greater concern. On chip variations will result in different common mode voltages at the transmitter and the receiver end.
Figure 2.1: Mismatched common mode voltages at Transmitter and Receiver (ideal and misaligned VcmRx)
In case of an ideal match, small fluctuations in line voltage are converted to a rail to rail swing by the receiver. If, however, the mismatch is large, the small swing on the line may be completely ignored by the receiver. It is important, therefore, that the amount of swing on the line is much more than the mismatch in common mode voltages. But a high swing will increase power dissipation. Therefore, it is better to have smart bias circuits, which will reduce the mismatch and the need for a large swing.
2.3
System parameters affected by variations
Variations in the following parameters have a strong influence on the performance of the
signaling scheme:
1. $I_{peak}$ : Peak current supplied by the strong driver during input transition
2. $t_p$ : Duration for which the strong driver is ON
3. $\Delta V$ : Line voltage swing at the receiver end in steady state
4. Mismatch between $V_{CMRx}$ and operating point of an amplifier
2.4
A brief review of Current Mode Signaling Schemes
Several current mode signaling schemes have been suggested in the literature. We shall
concentrate on three schemes here.
2.4.1
This scheme uses feedback at both the transmitter and the receiver ends to adjust the operating points of these circuits. [?] The transmitter used by this scheme is shown in Figure 2.2.
Figure 2.2: Transmitter used by CMS scheme with feedback
The feedback inverter converts low swing logic levels on the line to full rail-to-rail CMOS
levels. The NAND/NOR gates ensure that the strong driver is turned on only during data
transitions and is turned off as soon as the line crosses the switching point of the feedback
inverter to make the logic level on the line equal to the input. The weak driver supplies $I_{static}$
and the line voltage swing at the receiver end is $V_{CMRx} \pm I_{static} R_L$. The receiver also uses
feedback to adjust its common-mode voltage.
2.5
Effect of Process Variations on different CMS Schemes
2.5.1
Figure 2.3: Current Mode Scheme with Feedback (CMS-fb)
Effect of Inter-die Process Variations on CMS with feedback
Variations in Ipeak are well compensated due to the feedback at the driver end.
If the driver is weaker due to process variations, the feedback system keeps it on for
longer, till the line reaches the desired voltage.
This might, however, not be optimum from a power point of view.
Effect of Intra-die Process Variations on CMS-Fb
If the $V_{CMTx}$ for the feedback inverter at the transmitter end is not the same as the $V_{CMRx}$
for the receiver amplifier, this scheme does not work very well. Take the case where $V_{CMTx}$
at the transmitter end is lower than the $V_{CMRx}$ at the receiver end. During the low to high
transitions the strong driver will be turned off well before the line voltage crosses $V_{CMRx}$.
This can result in very slow charging of the line after the strong driver is turned off, leading
to a low throughput. In an extreme case, the line voltage may never reach $V_{CMRx}$, leading to
malfunction.
Figure 2.4: Mismatched common mode voltages at Transmitter and Receiver
The same phenomenon will occur for the high to low transition if $V_{CMTx} > V_{CMRx}$.
2.5.2
CMS Scheme with fixed pulse width (CMS-Fpw)

[Figure: CMS scheme with fixed pulse width: input, fixed width pulse generator (delay element), strong and weak drivers, wire, receiver with RL and Vcm Rx, output RxOut]
tp is given by the delay element.
Less sensitive to intra-die variations.
In the skewed corners, the sourcing Ipeak and the sinking Ipeak are different, leading to different
rise and fall delays.
Throughput can degrade significantly in skewed corners. [?]
2.6
The Proposed Variation Tolerant CMS Scheme
Minimizing Process Dependence

To minimize process dependence, we need smart bias circuits which sense the process corner
and adjust the bias to compensate for variations.
[Figure: Bias generators. Vbp is produced by a short pMOS with a long nMOS; Vbn is produced by a long pMOS with a short nMOS.]
Long channel transistors show relatively less variation with process compared to short
channel transistors in the same process.
We can make use of this difference to design a bias generator which senses the process
corner and tries to increase the transistor current in the slow corners and to decrease it
in the fast corners.
Simple bias generators which use this feature, built from inverters with input and output
shorted, are shown in the figure.
Proposed CMS Scheme with Smart Bias
We propose a Dynamic Overdrive scheme in which both the strong and the weak drivers use
constant current sources controlled by process aware bias generators.
[Figure: Proposed CMS scheme with smart bias: input, delay element, strong and weak drivers biased by Vbp (p bias generator: short pMOS, long nMOS) and Vbn (n bias generator: long pMOS, short nMOS), wire, and a receiver with RxBias and an inverter amplifier driving the output]
There is no feedback inverter in the driver circuit.
Bias voltages change in the desired direction to keep the current through the weak and
strong drivers the same across all corners.
Effect of Process Variation on the Proposed CMS Scheme
$I_{peak}$ remains nearly the same across all corners. In the extreme corners, SS and FF, the small
change in $I_{peak}$ is compensated by the opposite change in $t_p$.
$\Delta V = I_{static} R_L$ remains the same across all corners, where $R_L = \frac{1}{g_{mn} + g_{mp}}$.
The inverter with input-output shorted and the inverter amplifier are designed using
fingers and placed close to each other so that their switching thresholds are closely
matched across all corners.
This makes the proposed circuit less sensitive to intra-die process variations as well.
2.7
Performance Evaluation
Simulation Setup
Foundry-specified four-corner model files and a mismatch model file for Monte Carlo simulations were used.
All the signaling schemes offer the same input capacitance (equivalent to one minimum
sized inverter).
All signaling schemes drive an FO4 load.
Line RLC values used were: Rline = 244 Ω/mm, Lline = 1.5 nH/mm, Cline = 201 fF/mm.
All schemes were designed for a throughput of 2.65 Gbps.
Current mode schemes are designed for Ipeak = 500 µA.
Effect of Intra-die Process Variations
Mismatch in Vm of an inverter can be up to 40 mV (mismatch data sheet from the foundry). For a mismatch of 40 mV in the Vm
value of the inverters:
CMS system    Delay    Throughput
CMS-Fb        25       33
CMS-Fpw       10       14
CMS-Bias      4        9.5
Effect of Inter-die Process Variations

Percentage degradation in each process corner:
Signaling System / Logic Circuit    SS       SNFP    FNSP
CMS-Fb                              17.5     5.7     2.9
CMS-Fpw                             32       33.6    34.9
CMS-Bias                            18.75    8.2     7.14
Voltage Mode                        27       <1      2.8
Ring Oscillator Freq                23       2.88    3
Interconnects with the CMS-Fpw scheme become the bottleneck in overall performance of
the chip in skewed corners.
Degradation in the throughput of the proposed scheme in the skewed corners is around
7%, which is less than that in the CMS-Fpw scheme.
Overall Comparison
Performance Comparison of four signaling schemes (line = 6 mm, power measured at 1 Gbps)
Signaling Scheme    Delay (ps)    Throughput (Gbps)    Power (µW)    Area (µm²)
CMS-Fb (90 mV)      700           2.56                 146           2.00
CMS-Fpw             503           2.65                 114           2.40
Proposed CMS        490           2.56                 113           3.07
Voltage Mode        1100          2.85                 655           12.53
The CMS-Fb scheme consumes higher power than the other current mode schemes due to static power
consumption in the feedback inverter.
The proposed scheme shows 78% improvement in area over the voltage mode scheme, whereas
the other schemes, CMS-Fb and CMS-Fpw, show 84% and 80% respectively.
[Figure: Power (uW), delay (ns) and energy (pJ) of the signaling schemes versus data rate (Mbps) and line length (mm), panels (a) to (f). Curves: Proposed, DODFb+RxFb [1], DODFpw+RxFb [2], DODFpw+RxBMul [3] and Voltage Mode. Panel annotations include "CMS Power < VM Power", Line = 1.5 mm, Line = 6 mm, and data rates of 50 Mbps, 125 Mbps and 500 Mbps.]
2.8
Bidirectional Links
In many applications, on-chip buses need to carry signal in both directions.
For example, the bus between processor and memory, main processor and floating point
multiplier etc.
Often bidirectional buffers with direction control are used for this.
Limitations of Conventional Bidirectional Buffer
[Figure: Back-to-Back Connected Tri-state Buffers joining wire segments, each enabled by En derived from the direction signal]
One of the two tristate buffers is enabled at a given time.
Two transistors in the stack require increased sizes of PMOS and NMOS.
Delay of a bidirectional repeater is more than that of a unidirectional buffer.
A direction control signal is required by each repeater.
Buffers offer a huge load to the direction control signal.
Buffers carrying the direction control signal consume additional power.
We need a repeaterless signaling scheme.

The Proposed Current Mode Bidirectional Link
Employs only two bidirectional transceivers, one at each end of the line.
The direction signal is required only at the two ends of the line.
The direction control signal can be the same as one of the control signals, or derived from
it based on the communication protocol.
Assumption: the direction signal (Tx/Rx) is locally available at both ends before data
transmission starts.
Proposed Current-Mode Transceiver

[Figure: Proposed current-mode transceiver. Transmitter part: data input, delay element, Tx_ip_1 and Tx_ip_0, strong and weak drivers biased by Vbp and Vbn and gated by Tx/Rx. Receiver part: terminator and inverter amplifier gated by Tx/Rx, driving the output.]
Either the transmitter part or the receiver part is enabled at a time.
2.8.1
Simulated Performance of Bidirectional Link
Speed-Power of Proposed Bidirectional CMS Scheme
[Figure: Current-mode (CMBid) versus voltage-mode (VMBid) bidirectional links, panels (a) to (d): delay (ns) and power (uW) versus line length (mm) at a data rate of 500 Mbps, power (uW) versus data rate (Mbps) for a 4 mm line, and the current-mode versus voltage-mode crossover data rate (Mbps) versus line length (mm)]
35% improvement in delay for nearly all line lengths.
1.7× lower power for 2 mm lines and 7× lower power for 8 mm lines.
5× reduction in power at 1 Gbps.
Power crossover frequency is 100 Mbps for 4 mm long lines.
For lines longer than 2 mm communicating at data rates of more than 180 Mbps, the proposed
scheme consumes less power than voltage-mode.
Designed in 180 nm for Vdd = 1.8 V using nominal Vt devices.
Line characteristics: R = 211 Ω/mm and C = 0.245 pF/mm.
Logic Design Styles
Dinesh Sharma
IIT Bombay, Mumbai
June 1, 2006
Outline:
CMOS Static Logic
Pseudo nMOS Design Style
Complementary Pass gate Logic
Cascade Voltage Switch Logic
Dynamic Logic
A simple model
for $V_{gs} \le V_T$, $I_{ds} = 0$
for $V_{gs} > V_T$ and $V_{ds} \le V_{gs} - V_T$,
$$I_{ds} = K\left[(V_{gs} - V_T)V_{ds} - \frac{1}{2}V_{ds}^2\right]$$
for $V_{gs} > V_T$ and $V_{ds} > V_{gs} - V_T$,
$$I_{ds} = K\,\frac{(V_{gs} - V_T)^2}{2}$$
[Figure: Drain current (mA) versus drain voltage (V) for Vg = 1.0 V to 3.5 V in steps of 0.5 V]
This model assumes current to be independent of $V_{ds}$ in the saturation region.
(This is somewhat oversimplified.)
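The three cases of the simple model can be sketched as a small function. This is only an illustration: the default values of K and VT are hypothetical and are not taken from the plotted device.

```python
def ids_simple(vgs, vds, k=2.0e-4, vt=0.7):
    """Drain current (A) from the simple square-law model.

    k (A/V^2) and vt (V) are hypothetical illustration values.
    """
    vov = vgs - vt                      # gate overdrive Vgs - VT
    if vov <= 0:
        return 0.0                      # cut-off: Vgs <= VT
    if vds <= vov:
        # linear regime: K[(Vgs - VT)Vds - Vds^2/2]
        return k * (vov * vds - 0.5 * vds ** 2)
    # saturation: K(Vgs - VT)^2/2, independent of Vds
    return 0.5 * k * vov ** 2


if __name__ == "__main__":
    for vds in (0.5, 1.0, 2.3, 4.0):
        print(f"Vds = {vds:.1f} V, Ids = {ids_simple(3.0, vds) * 1e3:.4f} mA")
```

The two branches give the same current at Vds = Vgs - VT, so the modelled characteristic is continuous there.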
A more realistic model
Let the Early Voltage be $V_E$. We define
$$V_{dss} \equiv V_E\left(\sqrt{1 + \frac{2(V_{gs} - V_T)}{V_E}} - 1\right) \approx (V_{gs} - V_T)\left(1 - \frac{V_{gs} - V_T}{2V_E}\right)$$
and
$$I_{dss} \equiv K\left[(V_{gs} - V_T)V_{dss} - \frac{1}{2}V_{dss}^2\right]$$
for $V_{gs} > V_T$ and $V_{ds} \le V_{dss}$,
$$I_{ds} = K\left[(V_{gs} - V_T)V_{ds} - \frac{1}{2}V_{ds}^2\right]$$
for $V_{gs} > V_T$ and $V_{ds} > V_{dss}$,
$$I_{ds} = I_{dss}\,\frac{V_{ds} + V_E}{V_{dss} + V_E}$$
[Figure: Drain current (mA) versus drain voltage (V) for the more realistic model]
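A minimal sketch of this model follows, assuming hypothetical values for K, VT and VE. With Vdss defined as above, both the current and its slope are continuous at Vds = Vdss, which the usage lines check numerically.

```python
import math


def vdss(vov, ve):
    """Saturation voltage Vdss for gate overdrive vov and Early voltage ve."""
    return ve * (math.sqrt(1.0 + 2.0 * vov / ve) - 1.0)


def ids_early(vgs, vds, k=2.0e-4, vt=0.7, ve=20.0):
    """Drain current (A) with Early effect; k, vt, ve are hypothetical values."""
    vov = vgs - vt
    if vov <= 0:
        return 0.0
    vs = vdss(vov, ve)
    if vds <= vs:
        return k * (vov * vds - 0.5 * vds ** 2)       # below Vdss
    idss = k * (vov * vs - 0.5 * vs ** 2)             # current at Vds = Vdss
    return idss * (vds + ve) / (vs + ve)              # linear rise beyond Vdss


if __name__ == "__main__":
    vs = vdss(3.0 - 0.7, 20.0)
    h = 1e-5
    left = (ids_early(3.0, vs) - ids_early(3.0, vs - h)) / h
    right = (ids_early(3.0, vs + h) - ids_early(3.0, vs)) / h
    print(f"Vdss = {vs:.4f} V, slope below = {left:.3e}, slope above = {right:.3e}")
```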
CMOS Static Logic
Topics: CMOS Inverter, Inverter Static Characteristics, Noise margins, Dynamic Characteristics, Conversion of CMOS Inverters to other logic
Each logic stage contains pull up and pull down networks controlled by input signals.
The pull up network contains p channel transistors.
The pull down network is made of n channel transistors.
If the pull up network is on, the pull down network is off and vice versa.
Since the pull up and pull down networks are never on simultaneously, there is no static power consumption.
CMOS Inverter
The simplest CMOS logic structure is the inverter; it is the basic gate.
[Figure: CMOS inverter with supply Vdd, input Vi and output Vo]
More complex gates are designed by mapping them to an equivalent inverter.
The pull up network of the logic gate is made equivalent to the pMOS of the inverter.
The pull down network of the logic gate is made equivalent to the nMOS of the inverter.
Thumb rules are used to map the geometries of the pull up and pull down networks to single transistors.
Static Characteristics
Inverter Transfer Curve
The range of input voltages can be divided into several regions.
nMOS off, pMOS on
nMOS saturated, pMOS linear
nMOS saturated, pMOS saturated
nMOS linear, pMOS saturated
nMOS on, pMOS off
[Figure: Inverter transfer curve with VoH, VoL, ViL and ViH marked]
nMOS off, pMOS on
For $0 < V_i < V_{Tn}$ the n channel transistor is off, the p channel transistor is on and the output voltage = $V_{dd}$.
This is the normal digital operation range with input = 0 and output = 1.
nMOS saturated, pMOS linear
In this regime, both transistors are on.
The input voltage $V_i$ is $> V_{Tn}$, but is small enough so that the n channel transistor is in saturation, and the p channel transistor is in the linear regime.
In static condition, the output voltage will adjust itself such that the currents through the n and p channel transistors are equal.
The absolute value of the gate-source voltage on the p channel transistor is $V_{dd} - V_i$, and therefore the overvoltage on its gate is $V_{dd} - V_i - V_{Tp}$.
The drain-source voltage of the pMOS has an absolute value $V_{dd} - V_o$.
Therefore,
$$I_d = K_p\left[(V_{dd} - V_i - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right] = \frac{K_n}{2}(V_i - V_{Tn})^2$$
where the symbols have their usual meanings.
We define $\beta \equiv K_n/K_p$ and $V_{dp} \equiv V_{dd} - V_o$.
Then we can solve the quadratic equation:
$$I_d = K_p\left[(V_{dd} - V_i - V_{Tp})V_{dp} - \frac{1}{2}V_{dp}^2\right] = \frac{K_n}{2}(V_i - V_{Tn})^2$$
So
$$V_o = V_i + V_{Tp} + \sqrt{(V_{dd} - V_i - V_{Tp})^2 - \beta(V_i - V_{Tn})^2}$$
If $K_n = K_p$ ($\beta = 1$),
$$V_o = (V_i + V_{Tp}) + \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(V_{dd} - 2V_i + V_{Tn} - V_{Tp})}$$
for $V_i \le \dfrac{V_{dd} + V_{Tn} - V_{Tp}}{2}$.
nMOS saturated, pMOS saturated
When
$$V_i = \frac{V_{dd} + \sqrt{\beta}\,V_{Tn} - V_{Tp}}{1 + \sqrt{\beta}},$$
both transistors are saturated.
Currents of both transistors are independent of their drain voltages.
We do not get a unique solution for $V_o$ by equating drain currents.
The currents will be equal for all values of $V_o$ in the range
$$V_i - V_{Tn} \le V_o \le V_i + V_{Tp}$$
Thus the transfer curve of an inverter shows a drop of $V_{Tn} + V_{Tp}$ at a voltage near $V_{dd}/2$.
[Figure: Output voltage versus input voltage (0 to 3.0 V) with VoH, VoL, ViL, ViH and the drop of VTn + VTp marked]
nMOS linear, pMOS saturated
As we increase $V_i$ further, so that
$$\frac{V_{dd} + \sqrt{\beta}\,V_{Tn} - V_{Tp}}{1 + \sqrt{\beta}} < V_i < V_{dd} - V_{Tp}$$
both transistors are still on, but the nMOS enters the linear regime while the pMOS is saturated. Equating currents in this condition,
$$I_d = \frac{K_p}{2}(V_{dd} - V_i - V_{Tp})^2 = K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right]$$
From this, we get the quadratic equation
$$\frac{1}{2}V_o^2 - (V_i - V_{Tn})V_o + \frac{(V_{dd} - V_i - V_{Tp})^2}{2\beta} = 0$$
This has solutions
$$V_o = (V_i - V_{Tn}) \pm \sqrt{(V_i - V_{Tn})^2 - \frac{(V_{dd} - V_i - V_{Tp})^2}{\beta}}$$
The nMOS is in its linear regime only for $V_o < V_i - V_{Tn}$, so the negative sign applies.
In the special case where $\beta = 1$, we have
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(2V_i - V_{dd} - V_{Tn} + V_{Tp})}$$
nMOS on, pMOS off
As we increase the input voltage beyond $V_{dd} - V_{Tp}$, the p channel transistor turns off, while the n channel conducts strongly.
As a result, the output voltage falls to zero.
This is the normal digital operation range with input = 1 and output = 0.
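The five regions of the transfer curve can be collected into one function. The sketch below uses the closed-form expressions from the simple model; the default supply, threshold and β values are hypothetical, and in the region where both transistors are saturated (where the output is not unique) it returns the midpoint of the allowed range as an arbitrary choice.

```python
import math


def inverter_vo(vi, vdd=3.0, vtn=0.6, vtp=0.6, beta=1.0):
    """Static output voltage of a CMOS inverter (simple model, beta = Kn/Kp)."""
    a = vdd - vi - vtp          # pMOS overdrive
    b = vi - vtn                # nMOS overdrive
    if b <= 0:
        return vdd              # nMOS off, pMOS on
    if a <= 0:
        return 0.0              # nMOS on, pMOS off
    rb = math.sqrt(beta)
    v_mid = (vdd + rb * vtn - vtp) / (1.0 + rb)
    if vi < v_mid:              # nMOS saturated, pMOS linear
        return vi + vtp + math.sqrt(max(0.0, a * a - beta * b * b))
    if vi > v_mid:              # nMOS linear, pMOS saturated
        return b - math.sqrt(max(0.0, b * b - a * a / beta))
    # both saturated: any Vo in [vi - vtn, vi + vtp]; return the midpoint
    return vi + 0.5 * (vtp - vtn)


if __name__ == "__main__":
    for vi in (0.3, 1.0, 1.4, 1.6, 2.0, 2.7):
        print(f"Vi = {vi:.1f} V -> Vo = {inverter_vo(vi):.4f} V")
```

With equal thresholds and β = 1 the curve is symmetric about Vdd/2, so Vo(Vi) + Vo(Vdd - Vi) = Vdd.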
Noise Margins
For robust design, the output levels must be interpreted correctly at the input of the next stage even in the presence of noise.
For the high level, we require that the output of one stage should still be interpreted as high at the input of the next gate even when pulled down a little due to noise.
Therefore $V_{oH}$ should be $> V_{iH}$.
Similarly $V_{oL}$ should be $< V_{iL}$.
The difference $V_{iL} - V_{oL}$ is the low noise margin, and $V_{oH} - V_{iH}$ is the high noise margin.
Logic Levels
A digital circuit should distinguish logic levels, but be insensitive to the exact analog voltage at the input.
Therefore flat portions of the transfer curve (where $\left|\frac{\partial V_o}{\partial V_i}\right|$ is small) are suitable for digital logic.
We select two points on the transfer curve where the slope ($\frac{\partial V_o}{\partial V_i}$) is -1.0.
The coordinates of these two points define the values of ($V_{iL}$, $V_{oH}$) and ($V_{iH}$, $V_{oL}$).
The region to the left of $V_{iL}$ and to the right of $V_{iH}$ has $\left|\frac{\partial V_o}{\partial V_i}\right| < 1$, and is suitable for digital operation.
Calculation of Noise Margins
To evaluate the values of noise margins, we shall use the expressions derived for $\beta = 1$ to keep the algebra simple.
When the input is low and output high, the n channel transistor is saturated and the p channel transistor is in its linear regime.
When the input is high and the output is low, the n channel transistor is in its linear regime, while the p channel transistor is saturated.
Calculation of ViL and VoH
For ($V_{iL}$, $V_{oH}$), the n channel transistor is saturated, while the p channel transistor is in its linear regime.
$$V_o = (V_i + V_{Tp}) + \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(V_{dd} + V_{Tn} - V_{Tp} - 2V_i)}$$
From this, we evaluate $\frac{\partial V_o}{\partial V_i}$ and set it = -1.
$$\frac{\partial V_o}{\partial V_i} = -1 = 1 - \sqrt{\frac{V_{dd} - V_{Tn} - V_{Tp}}{V_{dd} + V_{Tn} - V_{Tp} - 2V_i}}$$
This gives
$$V_{iL} = \frac{3V_{dd} + 5V_{Tn} - 3V_{Tp}}{8}$$
$$V_{oH} = \frac{7V_{dd} + V_{Tn} + V_{Tp}}{8} = V_{dd} - \frac{V_{dd} - V_{Tn} - V_{Tp}}{8}$$
Calculation of ViH and VoL
When the input is high, we should use the equation for nMOS linear and pMOS saturated.
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(2V_i - V_{dd} - V_{Tn} + V_{Tp})}$$
Differentiating with respect to $V_i$ gives
$$\frac{\partial V_o}{\partial V_i} = -1 = 1 - \sqrt{\frac{V_{dd} - V_{Tn} - V_{Tp}}{2V_i - V_{dd} - V_{Tn} + V_{Tp}}}$$
From where, we get
$$V_{iH} = \frac{5V_{dd} + 3V_{Tn} - 5V_{Tp}}{8}$$
$$V_{oL} = \frac{V_{dd} - V_{Tn} - V_{Tp}}{8}$$
The High noise margin is given by
$$V_{oH} - V_{iH} = \frac{V_{dd} - V_{Tn} + 3V_{Tp}}{4}$$
Similarly, the Low noise margin is
$$V_{iL} - V_{oL} = \frac{V_{dd} + 3V_{Tn} - V_{Tp}}{4}$$
The two noise margins can be made equal by choosing equal values for $V_{Tn}$ and $V_{Tp}$.
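As a numerical check of the β = 1 results, the sketch below evaluates the four logic levels and the two noise margins, and confirms by finite differences that the slope of the transfer curve at ViL is -1. The supply and threshold voltages in the usage lines are hypothetical.

```python
import math


def logic_levels(vdd, vtn, vtp):
    """Logic levels and noise margins of a CMOS inverter for beta = 1."""
    vil = (3 * vdd + 5 * vtn - 3 * vtp) / 8
    voh = (7 * vdd + vtn + vtp) / 8
    vih = (5 * vdd + 3 * vtn - 5 * vtp) / 8
    vol = (vdd - vtn - vtp) / 8
    return {"ViL": vil, "VoH": voh, "ViH": vih, "VoL": vol,
            "NMH": voh - vih, "NML": vil - vol}


def vo_input_low(vi, vdd, vtn, vtp):
    """Transfer curve for nMOS saturated, pMOS linear (beta = 1)."""
    return vi + vtp + math.sqrt((vdd - vtn - vtp) * (vdd + vtn - vtp - 2 * vi))


if __name__ == "__main__":
    vdd, vtn, vtp = 3.0, 0.6, 0.6        # hypothetical values
    lv = logic_levels(vdd, vtn, vtp)
    h = 1e-6
    slope = (vo_input_low(lv["ViL"] + h, vdd, vtn, vtp)
             - vo_input_low(lv["ViL"] - h, vdd, vtn, vtp)) / (2 * h)
    print(lv)
    print(f"slope at ViL = {slope:.6f}")
```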
For the calculation of rise and fall times, we shall assume that only one of the two transistors in the inverter is on.
This is more conservative than the static logic levels calculated by slope considerations.
We shall use the simple model described at the beginning of this lecture.
Rise time
When the input is low, the n channel transistor is off, while the p channel transistor is on.
From Kirchhoff's current law at the output node,
$$I_{dp} = C\,\frac{dV_o}{dt}$$
so,
$$\frac{dt}{dV_o} = \frac{C}{I_{dp}}$$
Integrating both sides, we get
$$\frac{\tau_{rise}}{C} = \int_0^{V_{oH}} \frac{dV_o}{I_{dp}}$$
Till the output rises to $V_{iL} + V_{Tp}$, the p channel transistor is in saturation.
If $V_{oH} > V_{iL} + V_{Tp}$ (which is normally the case), the integration range can be broken into saturation and linear regimes. Thus
$$\frac{\tau_{rise}}{C} = \int_0^{V_{iL}+V_{Tp}} \frac{dV_o}{\frac{K_p}{2}(V_{dd} - V_{iL} - V_{Tp})^2} + \int_{V_{iL}+V_{Tp}}^{V_{oH}} \frac{dV_o}{K_p\left[(V_{dd} - V_{iL} - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right]}$$
$$\tau_{rise} = \frac{2C(V_{iL} + V_{Tp})}{K_p(V_{dd} - V_{iL} - V_{Tp})^2} + \frac{C}{K_p(V_{dd} - V_{iL} - V_{Tp})}\ln\frac{V_{dd} + V_{oH} - 2V_{iL} - 2V_{Tp}}{V_{dd} - V_{oH}}$$
The first term is just the constant current charging of the load capacitor.
The second term represents the charging by the pMOS in its linear range.
This can be compared with resistive charging, which would have taken a charge time of
$$\tau = RC\ln\frac{V_{dd} - V_{iL} - V_{Tp}}{V_{dd} - V_{oH}}$$
to charge from $V_{iL} + V_{Tp}$ to $V_{oH}$.
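The closed-form rise time can be cross-checked against a direct numerical evaluation of the integral of C/Idp. This is a sketch under stated assumptions: the load capacitance, Kp, voltages and the midpoint-rule step count are hypothetical illustration values.

```python
import math


def rise_time_closed(c, kp, vdd, vtp, vil, voh):
    """Closed-form rise time: saturation term plus linear-regime term."""
    a = vdd - vil - vtp                              # pMOS overdrive
    t_sat = 2.0 * c * (vil + vtp) / (kp * a * a)
    t_lin = c / (kp * a) * math.log(
        (vdd + voh - 2 * vil - 2 * vtp) / (vdd - voh))
    return t_sat + t_lin


def rise_time_numeric(c, kp, vdd, vtp, vil, voh, steps=20000):
    """Midpoint-rule integration of C dVo / Idp from 0 to VoH."""
    a = vdd - vil - vtp
    dv = voh / steps
    total = 0.0
    for i in range(steps):
        vo = (i + 0.5) * dv
        u = vdd - vo                                 # |Vds| of the pMOS
        if vo <= vil + vtp:
            idp = 0.5 * kp * a * a                   # saturation
        else:
            idp = kp * (a * u - 0.5 * u * u)         # linear regime
        total += dv / idp
    return c * total


if __name__ == "__main__":
    args = (50e-15, 1.0e-4, 3.0, 0.6, 0.0, 2.7)      # hypothetical values
    print(f"closed form: {rise_time_closed(*args) * 1e12:.2f} ps")
    print(f"numeric:     {rise_time_numeric(*args) * 1e12:.2f} ps")
```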
Fall Time
When the input is high, the p channel transistor is off, while the n channel transistor is on.
From Kirchhoff's current law at the output node,
$$I_{dn} = -C\,\frac{dV_o}{dt}$$
Separating variables and integrating from the initial voltage (= $V_{dd}$) to some terminal voltage $V_{oL}$ gives
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_{oL}} \frac{dV_o}{I_{dn}}$$
The n channel transistor will be in saturation till the output falls to $V_i - V_{Tn}$. Below this, the transistor will be in its linear regime.
We can divide the integration range in two parts.
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_i - V_{Tn}} \frac{dV_o}{I_{dn}} - \int_{V_i - V_{Tn}}^{V_{oL}} \frac{dV_o}{I_{dn}}$$
$$= \int_{V_i - V_{Tn}}^{V_{dd}} \frac{dV_o}{\frac{K_n}{2}(V_i - V_{Tn})^2} + \int_{V_{oL}}^{V_i - V_{Tn}} \frac{dV_o}{K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right]}$$
$$\frac{\tau_{fall}}{C} = \frac{V_{dd} - V_i + V_{Tn}}{\frac{K_n}{2}(V_i - V_{Tn})^2} + \frac{1}{K_n(V_i - V_{Tn})}\ln\frac{2(V_i - V_{Tn}) - V_{oL}}{V_{oL}}$$
The first term represents the time taken to discharge at constant current in the saturation regime, whereas the second term is the quasi-resistive discharge in the linear regime.
Trade off between power, speed and robustness
Noise margins are given by
$$V_{oH} - V_{iH} = \frac{V_{dd} - V_{Tn} + 3V_{Tp}}{4}$$
$$V_{iL} - V_{oL} = \frac{V_{dd} + 3V_{Tn} - V_{Tp}}{4}$$
As we scale technologies, we improve speed and power consumption. However, the noise margin becomes worse.
We can improve noise margins by choosing relatively higher threshold voltages. However, this will reduce speeds.
We could also increase $V_{dd}$, but that would increase power dissipation.
Thus we have a trade off between power, speed and noise margins.
CMOS Inverter Design Flow
A common design requirement is symmetric charge and discharge behaviour and equal noise margins for high and low logic values.
This requires matched values of Kn and Kp and equal values of VTn and VTp.
Rise and fall times are inversely proportional to Kp and Kn respectively: each expression has the form C/K multiplied by a function of voltages only.
Thus it is a straightforward calculation to determine transistor geometries if speed requirements and technological parameters are given.
However, as transistor geometries are made larger, self loading can become significant.
For large self-loading, we have to model the load capacitance as
$$C_{Load} = C_{ext} + \gamma K_n$$
where we have assumed that $\beta = K_n/K_p$ is constant. $\gamma$ is a technological constant.
We use the expressions for $K/C$ which depend only on voltages. Once these values are calculated, the geometry can be determined.
In the extreme case, when self capacitance dominates the load capacitance, $K/C$ becomes constant and geometry independent. There is no advantage in using wider transistors in this regime to increase the speed. It is better to use multi-stage logic with tapered buffers in this regime.
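The diminishing return from wider transistors can be seen by evaluating C/K, to which the rise and fall times are proportional, for the load model C = Cext + γK. This is only a sketch; the symbol `gamma` stands for the technological constant of the load model and all numbers are hypothetical.

```python
def c_over_k(k, c_ext, gamma):
    """C_load / K for C_load = c_ext + gamma * k (delay is proportional to this)."""
    return (c_ext + gamma * k) / k


if __name__ == "__main__":
    c_ext, gamma = 50e-15, 2.0e-11       # hypothetical values
    for k in (1e-4, 1e-3, 1e-2, 1e-1):
        print(f"K = {k:.0e} A/V^2 -> C/K = {c_over_k(k, c_ext, gamma):.3e}")
```

As K grows, C/K approaches `gamma` from above, so further widening no longer improves speed.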
From Inverters to Other Logic
Once the basic CMOS inverter is designed, other logic gates can be derived from it. The logic has to be put in a canonical form which is a sum of products with a bar (inversion) on top.
For every . in the expression, we put the corresponding n channel transistors in series and the corresponding p channel transistors in parallel.
For every +, we put the n channel transistors in parallel and the p channel transistors in series.
We scale the transistor widths up by the number of devices (n or p) put in series.
The geometries are left untouched for devices put in parallel.
CMOS implementation of $\overline{A.B + C.(D + E)}$
[Figure: Implementation of $\overline{A.B + C.(D + E)}$ in CMOS logic design style, with inputs A to E, supply Vdd and output Out]
For n channel, A and B are in series. The pair is in parallel with C which is in series with a parallel combination of D and E.
For p channel, A is in parallel with B, the pair is in series with C which is in parallel with a series combination of D and E.
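The series and parallel rules can be checked exhaustively: for every input combination exactly one of the two networks should conduct. In the sketch below a series connection is modelled as a logical AND of conducting devices and a parallel connection as an OR; the function names are illustrative only.

```python
from itertools import product


def pull_down(a, b, c, d, e):
    """nMOS network: (A series B) parallel (C series (D parallel E))."""
    return (a and b) or (c and (d or e))


def pull_up(a, b, c, d, e):
    """pMOS network (each device conducts when its input is low):
    (A parallel B) series (C parallel (D series E))."""
    return ((not a) or (not b)) and ((not c) or ((not d) and (not e)))


def networks_are_complementary():
    """True if exactly one network conducts for every input combination."""
    return all(pull_down(*bits) != pull_up(*bits)
               for bits in product((False, True), repeat=5))


if __name__ == "__main__":
    print("complementary:", networks_are_complementary())
```

The gate output is high exactly when the pull up network conducts, which is the complement of A.B + C.(D + E).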
CMOS summary
Logic consumes no static power in CMOS design style.
However, signals have to be routed to the n pull down network as well as to the p pull up network.
So the load presented to every driver is high.
This is exacerbated by the fact that n and p channel transistors cannot be placed close together as these are in different wells which have to be kept well separated in order to avoid latchup.
Pseudo nMOS Design Style
[Figure: Pseudo nMOS inverter: pMOS load with grounded gate between Vdd and Out, nMOS driven by in between Out and Gnd]
The CMOS pull up network is replaced by a single pMOS transistor with its gate grounded.
Since the pMOS is not driven by signals, it is always on.
The effective gate voltage seen by the pMOS transistor is Vdd. Thus the overvoltage on the p channel gate is always Vdd - VTp.
When the nMOS is turned on, a direct path between supply and ground exists and static power will be drawn.
However, the dynamic power is reduced due to lower capacitive loading.
As we sweep the input voltage from ground to Vdd , we

encounter the following regimes of operation:
nMOS off
nMOS linear, pMOS linear
Low input
When the input voltage is less than VTn, the output is high and no current is drawn from the supply.
As we raise the input just above VTn, the output starts falling.
In this region the nMOS is saturated, while the pMOS is linear.
The input voltage is assumed to be sufficiently low so that the output voltage exceeds the saturation voltage $V_i - V_{Tn}$.
Normally, this voltage will be higher than $V_{Tp}$, so the p channel transistor is in linear mode of operation.
Equating currents through the n and p channel transistors, we get
$$K_p\left[(V_{dd} - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right] = \frac{K_n}{2}(V_i - V_{Tn})^2$$
Defining $V_1 \equiv V_{dd} - V_o$ and $V_2 \equiv V_{dd} - V_{Tp}$, we get
$$\frac{1}{2}V_1^2 - V_2 V_1 + \frac{\beta}{2}(V_i - V_{Tn})^2 = 0$$
The solutions are:
$$V_1 = V_2 \pm \sqrt{V_2^2 - \beta(V_i - V_{Tn})^2}$$
Substituting the values of $V_1$ and $V_2$ and choosing the sign which puts $V_o$ in the correct range, we get
$$V_o = V_{Tp} + \sqrt{(V_{dd} - V_{Tp})^2 - \beta(V_i - V_{Tn})^2}$$
As the input voltage is increased, the output voltage will decrease.
The output voltage will fall below $V_i - V_{Tn}$ when
$$V_i > V_{Tn} + \frac{V_{Tp} + \sqrt{V_{Tp}^2 + (\beta + 1)V_{dd}(V_{dd} - 2V_{Tp})}}{\beta + 1}$$
The nMOS is now in its linear mode of operation. The derived equation does not apply beyond this input voltage.
As the input voltage is raised still further, the output voltage will fall below $V_{Tp}$. The pMOS transistor is now in saturation regime. Equating currents, we get
$$K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right] = \frac{K_p}{2}(V_{dd} - V_{Tp})^2$$
which gives
$$\frac{1}{2}V_o^2 - (V_i - V_{Tn})V_o + \frac{(V_{dd} - V_{Tp})^2}{2\beta} = 0$$
This can be solved to get
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_i - V_{Tn})^2 - (V_{dd} - V_{Tp})^2/\beta}$$
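The two closed-form branches can be compared with a direct numerical solution of the current balance, which also covers the intermediate range where both transistors are in their linear regime. The sketch below uses bisection on In(Vo) - Ip(Vo) with the simple model; currents are normalised by Kp, and the supply, thresholds and β are hypothetical values.

```python
import math

VDD, VTN, VTP, BETA = 3.0, 0.6, 0.6, 4.0    # hypothetical values


def i_n(vi, vo):
    """nMOS current / Kp for input vi and output vo."""
    x = vi - VTN
    if x <= 0:
        return 0.0
    return BETA * (x * vo - 0.5 * vo * vo) if vo < x else 0.5 * BETA * x * x


def i_p(vo):
    """Current / Kp of the grounded-gate pMOS load."""
    v2, u = VDD - VTP, VDD - vo
    return v2 * u - 0.5 * u * u if u < v2 else 0.5 * v2 * v2


def vo_numeric(vi):
    """Solve i_n = i_p for Vo by bisection (i_n - i_p is non-decreasing in Vo)."""
    lo, hi = 0.0, VDD
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if i_n(vi, mid) > i_p(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def vo_nsat_plin(vi):
    """Closed form for nMOS saturated, pMOS linear."""
    return VTP + math.sqrt((VDD - VTP) ** 2 - BETA * (vi - VTN) ** 2)


def vo_nlin_psat(vi):
    """Closed form for nMOS linear, pMOS saturated."""
    x = vi - VTN
    return x - math.sqrt(x * x - (VDD - VTP) ** 2 / BETA)


if __name__ == "__main__":
    print(f"Vi = 1.0 V: {vo_numeric(1.0):.4f} V vs {vo_nsat_plin(1.0):.4f} V")
    print(f"Vi = 3.0 V: {vo_numeric(3.0):.4f} V vs {vo_nlin_psat(3.0):.4f} V")
```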
Noise Margins
We find points on the transfer curve where the slope is -1.
When the input is low and output high, we should use
$$V_o = V_{Tp} + \sqrt{(V_{dd} - V_{Tp})^2 - \beta(V_i - V_{Tn})^2}$$
Differentiating this equation with respect to $V_i$ and setting the slope to -1, we get
$$V_{iL} = V_{Tn} + \frac{V_{dd} - V_{Tp}}{\sqrt{\beta(\beta + 1)}}$$
and
$$V_{oH} = V_{Tp} + \sqrt{\frac{\beta}{\beta + 1}}\,(V_{dd} - V_{Tp})$$
When the input is high and the output low, we use
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_i - V_{Tn})^2 - (V_{dd} - V_{Tp})^2/\beta}$$
Differentiating with respect to $V_i$ and setting the slope to -1, we get
$$V_{iH} = V_{Tn} + \frac{2}{\sqrt{3\beta}}(V_{dd} - V_{Tp})$$
and
$$V_{oL} = \frac{V_{dd} - V_{Tp}}{\sqrt{3\beta}}$$
Ratioed Logic
To make the output low value lower than $V_{Tn}$, we get the condition
$$\beta > \frac{1}{3}\left(\frac{V_{dd} - V_{Tp}}{V_{Tn}}\right)^2$$
This places a requirement on the ratios of widths of n and p channel transistors. The logic gates work properly only when this equation is satisfied.
Therefore this kind of logic is also called ratioed logic.
In contrast, CMOS logic is called ratioless logic because it does not place any restriction on the ratios of widths of n and p channel transistors for static operation.
The noise margin for pseudo nMOS can be determined easily from the expressions for $V_{iL}$, $V_{oL}$, $V_{iH}$, $V_{oH}$.
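A short sketch of how these expressions might be evaluated: it returns the four logic levels for a given β and the minimum β required by the ratio condition. The voltages in the usage lines are hypothetical.

```python
import math


def pseudo_nmos_levels(vdd, vtn, vtp, beta):
    """Return (ViL, VoH, ViH, VoL) of a pseudo nMOS inverter, beta = Kn/Kp."""
    v2 = vdd - vtp
    vil = vtn + v2 / math.sqrt(beta * (beta + 1.0))
    voh = vtp + v2 * math.sqrt(beta / (beta + 1.0))
    vih = vtn + 2.0 * v2 / math.sqrt(3.0 * beta)
    vol = v2 / math.sqrt(3.0 * beta)
    return vil, voh, vih, vol


def min_beta(vdd, vtn, vtp):
    """Smallest beta for which VoL does not exceed VTn."""
    return ((vdd - vtp) / vtn) ** 2 / 3.0


if __name__ == "__main__":
    vdd, vtn, vtp = 3.0, 0.6, 0.6            # hypothetical values
    b = min_beta(vdd, vtn, vtp)
    vil, voh, vih, vol = pseudo_nmos_levels(vdd, vtn, vtp, 1.5 * b)
    print(f"minimum beta = {b:.3f}")
    print(f"NML = {vil - vol:.3f} V, NMH = {voh - vih:.3f} V")
```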
Rise Time
When the input is low, the nMOS is off and the output rises from low to high.
The situation is identical to the charge up condition of a CMOS gate with the pMOS being biased with its gate at 0V.
This gives
$$\tau_{rise} = \frac{C}{K_p(V_{dd} - V_{Tp})}\left[\frac{2V_{Tp}}{V_{dd} - V_{Tp}} + \ln\frac{V_{dd} + V_{oH} - 2V_{Tp}}{V_{dd} - V_{oH}}\right]$$
Fall Time
Calculation of fall time is complicated by the fact that the pMOS load continues to dump current in the output node, even as the nMOS tries to discharge the output capacitor.
The nMOS needs to sink the discharge current as well as the drain current of the pMOS transistor.
Simplifying assumption: pMOS current remains constant at its saturation value through the entire discharge process.
(This will result in a slightly pessimistic value of discharge time.)
Fall Time
If we assume that the pMOS current remains constant at its saturation value,
$$I_p = \frac{K_p}{2}(V_{dd} - V_{Tp})^2$$
We can write the KCL equation at the output node as:
$$I_n - I_p + C\,\frac{dV_o}{dt} = 0$$
which gives
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_{oL}} \frac{dV_o}{I_n - I_p}$$
We define $V_1 \equiv V_i - V_{Tn}$ and $V_2 \equiv V_{dd} - V_{Tp}$.
The integration range can be divided into two regimes.
nMOS is saturated when $V_1 \le V_o < V_{dd}$.
It is in the linear regime when $V_{oL} < V_o < V_1$.
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_1} \frac{dV_o}{\frac{1}{2}K_n V_1^2 - I_p} - \int_{V_1}^{V_{oL}} \frac{dV_o}{K_n\left(V_1 V_o - \frac{1}{2}V_o^2\right) - I_p}$$
so,
$$\frac{\tau_{fall}}{C} = \frac{V_{dd} - V_1}{\frac{1}{2}K_n V_1^2 - I_p} + \int_{V_{oL}}^{V_1} \frac{dV_o}{K_n\left(V_1 V_o - \frac{1}{2}V_o^2\right) - I_p}$$
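The fall-time integral can be evaluated numerically under the same constant-Ip assumption. The sketch below uses the midpoint rule; all device values are hypothetical. The terminal voltage VoL must lie above the static low level, because In - Ip goes to zero there and the integral diverges, so the function raises an error in that case.

```python
def fall_time_pseudo_nmos(c, kn, kp, vdd, vtn, vtp, vi, vol, steps=20000):
    """Fall time from Vdd down to vol with constant pMOS current Ip."""
    v1 = vi - vtn                       # nMOS overdrive
    v2 = vdd - vtp                      # pMOS overdrive
    ip = 0.5 * kp * v2 * v2             # pMOS saturation current
    dv = (vdd - vol) / steps
    total = 0.0
    for i in range(steps):
        vo = vol + (i + 0.5) * dv
        if vo >= v1:
            i_n = 0.5 * kn * v1 * v1                    # nMOS saturated
        else:
            i_n = kn * (v1 * vo - 0.5 * vo * vo)        # nMOS linear
        net = i_n - ip                                  # net discharge current
        if net <= 0:
            raise ValueError("nMOS cannot pull the output down to vol")
        total += dv / net
    return c * total


if __name__ == "__main__":
    # hypothetical values: C = 50 fF, Kn = 4 Kp, Vdd = 3 V, VT = 0.6 V
    t = fall_time_pseudo_nmos(50e-15, 4.0e-4, 1.0e-4, 3.0, 0.6, 0.6, 3.0, 0.6)
    print(f"fall time to 0.6 V: {t * 1e12:.1f} ps")
```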
Pseudo nMOS Inverter design
We design the basic inverter and then scale device sizes based on the logic function being designed.
The load device size is calculated from the rise time.
$$\tau_{rise} = \frac{C}{K_p(V_{dd} - V_{Tp})}\left[\frac{2V_{Tp}}{V_{dd} - V_{Tp}} + \ln\frac{V_{dd} + V_{oH} - 2V_{Tp}}{V_{dd} - V_{oH}}\right]$$
Given a value of $\tau_{rise}$, operating voltages and technological constants, $K_p$ and hence the geometry of the p channel transistor can be determined.
Pseudo nMOS Inverter design
Geometry of the n channel transistor can be determined from static considerations.
$$V_{oL} = (V_{iH} - V_{Tn}) - \sqrt{(V_{iH} - V_{Tn})^2 - (V_{dd} - V_{Tp})^2/\beta}$$
We take $V_{oL} = V_{Tn}$, and calculate $\beta$.
But $\beta \equiv K_n/K_p$ and $K_p$ is already known.
This evaluates $K_n$ and hence, the geometry of the n channel transistor.
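Squaring the static equation gives β = (Vdd - VTp)^2 / [VoL (2(Vi - VTn) - VoL)] for a chosen high input level Vi. The sketch below uses this rearrangement; treating the high input level as a free parameter `vi_high` and the numerical values used are assumptions made for illustration.

```python
import math


def beta_for_vol(vdd, vtn, vtp, vi_high, vol):
    """beta = Kn/Kp that gives static output vol for a high input vi_high."""
    v2 = vdd - vtp
    x = vi_high - vtn
    return v2 * v2 / (vol * (2.0 * x - vol))


def vol_from_beta(vdd, vtn, vtp, vi_high, beta):
    """Static low output (nMOS linear, pMOS saturated), used as a round-trip check."""
    x = vi_high - vtn
    return x - math.sqrt(x * x - (vdd - vtp) ** 2 / beta)


if __name__ == "__main__":
    vdd, vtn, vtp, kp = 3.0, 0.6, 0.6, 1.0e-4        # hypothetical values
    beta = beta_for_vol(vdd, vtn, vtp, vi_high=vdd, vol=vtn)
    print(f"beta = {beta:.4f}, Kn = {beta * kp:.3e} A/V^2")
    print(f"check: VoL = {vol_from_beta(vdd, vtn, vtp, vdd, beta):.4f} V")
```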
Conversion to other logic
Once the basic pseudo nMOS inverter is designed, other

logic gates can be derived from it.
The procedure is the same as that for CMOS, except that it
is applied only to nMOS transistors.
The p channel transistor is kept at the same size as that for
an inverter.
The logic is expressed as a sum of products with a bar

(inversion) on top.
For every . in the expression, we put the corresponding n
channel transistors in series.
For every +, we put the n channel transistors in parallel.
We scale the transistor widths up by the number of devices
put in series.
The geometries are left untouched for devices put in
parallel.
$\overline{A.B + C.(D + E)}$ in pseudo-nMOS
A and B are in series.
The pair is in parallel with C which is in series with a parallel combination of D and E.
[Figure: Implementation of $\overline{A.B + C.(D + E)}$ in pseudo-nMOS logic design style.]
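The series/parallel sizing rule can be sketched as a small tree walk. The representation of the pull-down network as nested ("series" | "parallel", children) tuples and the function names are assumptions of this example. For nested networks the sketch scales each device by the longest series chain passing through it, which is one reasonable reading of the rule rather than something stated in the notes.

```python
def depth(node):
    """Longest series chain (number of devices) through a sub-network."""
    if isinstance(node, str):
        return 1
    kind, children = node
    depths = [depth(c) for c in children]
    return sum(depths) if kind == "series" else max(depths)

def width_scale(node, others=0, out=None):
    """Map each transistor name to its width multiplier."""
    out = {} if out is None else out
    if isinstance(node, str):
        out[node] = others + 1   # this device plus the devices in series with it
        return out
    kind, children = node
    for i, child in enumerate(children):
        extra = 0
        if kind == "series":
            # devices contributed by the other elements of this series chain
            extra = sum(depth(c) for j, c in enumerate(children) if j != i)
        width_scale(child, others + extra, out)
    return out

# Pull-down network for the A.B + C.(D + E) example
pulldown = ("parallel", [("series", ["A", "B"]),
                         ("series", ["C", ("parallel", ["D", "E"])])])
print(width_scale(pulldown))
```

Every nMOS in this example sits on a two-device series path, so each is made twice as wide as the inverter nMOS, while the pMOS load keeps its inverter size.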
Logic Design using CPL
This logic family is based on multiplexer logic.
Given a boolean function $F(x_1, x_2, \ldots, x_n)$, we can express it as:
$$F(x_1, x_2, \ldots, x_n) = x_i \cdot f_1 + \overline{x_i} \cdot f_2$$
where $f_1$ and $f_2$ are reduced expressions for F with $x_i$ forced to 1 and 0 respectively.
Thus, F can be implemented with a multiplexer controlled by $x_i$ which selects $f_1$ or $f_2$ depending on $x_i$.
$f_1$ and $f_2$ can themselves be decomposed into simpler expressions by the same technique.
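This decomposition (the Shannon expansion) is easy to check exhaustively. In the sketch below, the helper names `cofactors` and `mux` and the example function `F` are hypothetical; the code only verifies that a multiplexer selecting between the two cofactors reproduces the original function.

```python
from itertools import product

def cofactors(f, i):
    """Return (f1, f2): f with input i forced to 1 and to 0 respectively."""
    def f1(*x):
        return f(*x[:i], 1, *x[i + 1:])
    def f2(*x):
        return f(*x[:i], 0, *x[i + 1:])
    return f1, f2

def mux(sel, when_one, when_zero):
    """Ideal 2:1 multiplexer."""
    return when_one if sel else when_zero

def F(a, b, c):  # hypothetical example function
    return (a & b) | c

f1, f2 = cofactors(F, 0)   # expand about the first input
ok = all(mux(x[0], f1(*x), f2(*x)) == F(*x)
         for x in product((0, 1), repeat=3))
print(ok)
```

Applying `cofactors` again to `f1` and `f2` gives the next level of the multiplexer tree.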
To implement a multiplexer, we need both $x_i$ and $\overline{x_i}$.
Therefore, this logic family needs all inputs in true as well as in complement form.
In order to drive other gates of the same type, it must produce the outputs also in true and complement forms.
Thus each signal is carried by two wires.
This logic style is called Complementary Passgate Logic or CPL for short.
Basic Multiplexer Structure
[Figure: Basic multiplexer with logic restoring inverters; select lines $x_i$ and $\overline{x_i}$, inputs $f_1$ and $f_2$ in true and complement form, outputs $F$ and $\overline{F}$.]
Pure passgate logic contains no amplifying elements. Therefore, each logic stage degrades the logic level.
Hence, multiple logic stages cannot be cascaded.
We include conventional CMOS inverters to restore the logic level.
Ideally, the multiplexer should be composed of complementary pass gate transistors.
However, we shall use just n channel transistors as switches for simplicity.
For any logic function, we pick one input as the control variable.
Multiplexer inputs are decided by re-evaluating the function, forcing this variable to 1 and 0 respectively.
Since both true and complement outputs are generated by CPL, we need fewer types of gates.
For example, we do not need separate gates for AND and NAND functions.
The same applies to OR-NOR, and XOR-XNOR functions.
Implementation of XOR and XNOR
To take an example, let us consider the XOR-XNOR functions.
Because of the inverter, for the XOR output we calculate the XNOR function, given by $A.B + \overline{A}.\overline{B}$.
If we put A = 1, this reduces to $B$, and for A = 0 it reduces to $\overline{B}$.
For the XNOR output, we generate the XOR expression $A.\overline{B} + \overline{A}.B$.
This expression reduces to $\overline{B}$ for A = 1 and to $B$ for A = 0.
[Figure: Implementation of XOR and XNOR by CPL logic.]
Implementation of AND-NAND and OR-NOR
[Figure: Implementation of (a) AND-NAND and (b) OR-NOR functions using complementary passgate logic.]
For AND, the mux should output $\overline{A.B}$, to be inverted by the buffer. This reduces to $\overline{B}$ when A = 1 and to 1 (= $\overline{A}$) when A = 0.
Implementation of NAND, OR and NOR functions follows along the same lines.
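The multiplexer inputs for each gate can be checked with a small behavioural model. The AND and XOR/XNOR entries follow the reductions described for CPL; the NAND, OR and NOR entries are worked out here along the same lines and should be read as this example's derivation, not as the original figure. The function names are hypothetical.

```python
def cpl_gate(a, pass_if_a1, pass_if_a0):
    """nMOS multiplexer selected by A, followed by the restoring inverter."""
    y = pass_if_a1 if a else pass_if_a0   # multiplexer output
    return 1 - y                          # inverter output

def cpl_outputs(a, b):
    na, nb = 1 - a, 1 - b   # complements are available in dual-rail logic
    return {
        "AND":  cpl_gate(a, nb, na),   # mux carries NOT(A.B)
        "NAND": cpl_gate(a, b, a),     # mux carries A.B
        "OR":   cpl_gate(a, na, nb),   # mux carries NOT(A+B)
        "NOR":  cpl_gate(a, a, b),     # mux carries A+B
        "XOR":  cpl_gate(a, b, nb),    # mux carries XNOR
        "XNOR": cpl_gate(a, nb, b),    # mux carries XOR
    }

for a in (0, 1):
    for b in (0, 1):
        print(a, b, cpl_outputs(a, b))
```

Each true/complement pair uses the same multiplexer structure with the pass inputs swapped for their complements, which is why separate gate types are not needed.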
Buffer Leakage Current
The high output of the multiplexer (y) cannot rise above $V_{dd} - V_{Tn}$ because we use nMOS multiplexers.
Consequently, the pMOS transistor in the buffer inverter never quite turns off.
This results in static power consumption in the inverter.
[Figure: High leakage current in inverter. The nMOS multiplexer, controlled by $x_i$ and $\overline{x_i}$, passes $f_1$ or $f_2$ to node $y = \overline{F}$, which drives the inverter producing F.]
This can be avoided by adding a pull up pMOS with the inverter.
[Figure: Pull up pMOS to avoid leakage in the inverter.]
Use of Pullup PMOS
When the multiplexer output (y) is low, the inverter output (F) is high. The pMOS is off and has no effect.
When the multiplexer output (y) goes high, the inverter output falls and turns the pMOS on.
Now, even though the multiplexer nMOS turns off as y approaches $V_{dd} - V_{Tn}$, the pMOS remains on and takes the inverter input (y) all the way to $V_{dd}$.
This avoids leakage in the inverter.
Need for ratioing
The use of pMOS pullup brings up another problem. Consider the equivalent circuit when the inverter output is low and the pMOS is on.
If the final output is low, the pMOS pullup is on. Now if the multiplexer output wants to go low, it has to fight the pMOS pullup - which is trying to keep this node high.
In fact, the multiplexer n transistor and the pull up p transistor constitute a pseudo nMOS inverter.
Therefore, the multiplexer output cannot be pulled low unless the transistor geometries are appropriately ratioed.
[Figure: Problem with a low to high transition on the output.]
Improving Pseudo nMOS
[Figure: Pseudo-nMOS NOR (left) and pseudo-nMOS OR from complemented inputs (right).]
In the pseudo-nMOS NOR circuit on the left, static power is consumed when the output is LOW.
We would like to turn the pMOS off when A OR B is TRUE.
The OR logic can be constructed by using a pseudo-nMOS NAND of $\overline{A}$ and $\overline{B}$, as in the circuit on the right.
But then what about the pMOS drive of this circuit?
Pseudo nMOS without Static Power
The output of the circuit on the right is LOW when both $\overline{A}$ and $\overline{B}$ are HIGH (A = B = 0).
We would like to turn its pMOS off when NOR of A and B is TRUE.
But this can be provided by the circuit on the left!
So the two circuits can drive each other's pMOS transistors and avoid static power consumption.
[Figure: OR-NOR implementation in Cascade Voltage Switch Logic.]
This kind of logic is called Cascade Voltage Switch Logic (CVSL).
It can use any network f and its complementary network $\overline{f}$ in the two cross-coupled branches.
Like CMOS static logic, there is no static power consumption.
Like CPL, this logic requires both true and complement signals. It also provides both true and complement outputs (dual rail logic).
Like pseudo nMOS, the inputs present a single transistor load to the driving stage.
The circuit is self latching. This reduces ratioing requirements.
Dynamic logic
In this style of logic, some nodes are required to hold their logic value as a charge stored on a capacitor.
These nodes are not connected to their drivers permanently.
The driver places the logic value on them, and is then disconnected from the node.
Due to leakage etc., the logic value cannot be held indefinitely.
Dynamic circuits therefore require a minimum clock frequency to operate correctly.
Use of dynamic circuits can reduce circuit complexity and power consumption substantially.
A CMOS dynamic logic circuit
[Figure: CMOS dynamic gate to implement $\overline{(A + B).C}$, with inputs A, B, C, clock Ck and load capacitance CL.]
When the clock is low, the pMOS is on and the bottom nMOS is off.
The output is pre-charged to 1 unconditionally.
When the clock goes high, the pMOS turns off and the bottom nMOS comes on.
The circuit then conditionally discharges the output node, if (A+B).C is TRUE.
This implements the function $\overline{(A + B).C}$.
Problem with Cascading
[Figure: Cascaded CMOS dynamic gates with clock Ck, and output waveforms for (A+B).C = TRUE and (A+B).C = FALSE.]
There is no problem when (A+B).C is false. X pre-charges to 1 and remains at 1.
When (A+B).C is TRUE, X takes some time to discharge.
During this time, charge placed on the output leaks away as the input to the nMOS of the inverter is not 0.
4 Phase Dynamic Logic
The problem can be solved by using a 4 phase clock.
[Figure: CMOS 4 phase dynamic logic, with clocks Ck1 to Ck4, derived clocks Ck12 and Ck23, inputs A, B, C, internal node P and output Out.]
In phase 1, node P is pre-charged.
In phase 2, P and the output are pre-charged.
In phase 3, the gate evaluates.
In phases 4 and 1, the output is isolated from the driver and remains valid.
This is called a type 3 gate. It evaluates in phase 3 and is valid in phases 4 and 1.
Drive cycles
[Figure: Drive sequences between gates of Type 1, Type 2, Type 3 and Type 4.]
A type 3 gate can drive a type 4 or a type 1 gate.
Similarly, type 4 will drive types 1 and 2; type 1 will drive types 2 and 3; and type 2 will drive types 3 and 4.
We can use a 2 phase clock if we stick to type 1 and type 3 gates (or type 2 and type 4 gates) as these can drive each other.
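The drive rules form a simple cyclic pattern: a gate of type k can drive the next two types, counting modulo 4. A minimal sketch, with hypothetical function names, makes the pattern and the two-phase pairing explicit:

```python
def drives(gate_type):
    """Types that a 4 phase dynamic gate of the given type (1..4) can drive."""
    return [gate_type % 4 + 1, (gate_type + 1) % 4 + 1]

def mutually_drivable(a, b):
    """True if gates of types a and b can each drive the other."""
    return b in drives(a) and a in drives(b)

for t in (1, 2, 3, 4):
    print(f"type {t} drives types {drives(t)}")

# Pairs that can drive each other, i.e. candidates for a 2 phase scheme
pairs = [(a, b) for a in range(1, 5) for b in range(a + 1, 5)
         if mutually_drivable(a, b)]
print(pairs)
```

Only the pairs (1, 3) and (2, 4) are mutually drivable, which matches the two-phase restriction.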
Domino Logic
Another way to eliminate the problem with cascading logic stages is to use a static inverter after the CMOS dynamic gate.
The output is 0 when it is not valid. Therefore, it does not affect the evaluation of the next gate.
[Figure: CMOS domino logic, with inputs A, B, C, clock Ck and dynamic node P.]
However, the logic is non-inverting. Therefore, it cannot be used to implement any arbitrary logic function.
Zipper Logic
Instead of using an inverter, we can alternate n and p evaluation stages.
[Figure: Zipper logic. A, B, C must be from p stages. D and E must be from n stages.]
The n stage is pre-charged high, but it drives a p stage. A high pre-charged stage will keep the p evaluation stage off, which will not cause any malfunction.
The p stage will be pre-discharged to low, which is safe for driving n stages.
This kind of logic is called zipper logic.
Logic Design
Dinesh Sharma
Microelectronics group
EE Department, IIT Bombay
Contents
1 Transistor Models
2 Static CMOS Logic Design
  2.1 Static CMOS Design style
  2.2 CMOS Inverter
    2.2.1 Static Characteristics
    2.2.2 Noise margins
    2.2.3 Dynamic Considerations
    2.2.4 Trade off between power, speed and robustness
    2.2.5 CMOS Inverter Design Flow
    2.2.6 Conversion of CMOS Inverters to other logic
3 Beyond Static CMOS
  3.1 Pseudo nMOS Design Style
    3.1.1 Static Characteristics
    3.1.2 Noise margins
    3.1.3 Dynamic characteristics
    3.1.4 Pseudo nMOS design Flow
    3.1.5 Conversion of pseudo nMOS Inverter to other logic
  3.2 Complementary Pass gate Logic
    3.2.1 Basic Multiplexer Structure
    3.2.2 Logic Design using CPL
    3.2.3 Buffer Leakage Current
  3.3 Cascade Voltage Switch Logic
  3.4 Dynamic Logic
    3.4.1 Problem with Cascading CMOS dynamic logic
    3.4.2 Four Phase Dynamic Logic
    3.4.3 Domino Logic
    3.4.4 Zipper logic
List of Figures
1.1 MOS characteristics according to the simple analytic model
1.2 MOS characteristics with non zero conductance in saturation
2.1 The basic CMOS inverter
2.2 Transfer Curve of a CMOS inverter
2.3 CMOS inverter with the nMOS off
2.4 CMOS inverter with the pMOS off
2.5 CMOS implementation of A.B + C.(D + E)
3.1 high to low transition on the output
3.2 Pseudo NMOS implementation of A.B + C.(D + E)
3.3 Basic Multiplexer with logic restoring inverters
3.4 Implementation of XOR and XNOR by CPL logic
3.5 Implementation of (a) AND-NAND and (b) OR-NOR functions using complementary passgate logic
3.6 High leakage current in inverter
3.7 Pull up pMOS to avoid leakage in the inverter
3.8 Problem with a low to high transition on the output
3.9 Pseudo-nMOS NOR
3.10 Pseudo-nMOS OR from complemented inputs
3.11 OR-NOR implementation in Cascade Voltage Switch Logic
3.12 CMOS dynamic gate to implement (A + B).C
3.13 CMOS 4 phase dynamic logic
3.14 CMOS 4 phase dynamic logic drive constraints
3.15 CMOS domino logic
3.16 Zipper logic
Chapter 1
Transistor Models
In this booklet, we shall use simple analytical models for MOS transistors. We
use a sign convention according to which, voltage and current symbols associated
with the pMOS transistor (such as VT p ) have positive values. Then, the n channel
formulae can be used for both transistors and we shall assign signs to quantities
explicitly.
[Plot: Drain Current (mA) versus Drain Voltage (V), for Vg = 1.0 to 3.5 in steps of 0.5.]
Figure 1.1: MOS characteristics according to the simple analytic model

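The model we use is described by the following equations:
for $V_{gs} \le V_T$,
$$I_{ds} = 0 \tag{1.1}$$
for $V_{gs} > V_T$ and $V_{ds} \le V_{gs} - V_T$,
$$I_{ds} = K\left[(V_{gs} - V_T)V_{ds} - \frac{1}{2}V_{ds}^2\right] \tag{1.2}$$
and for $V_{gs} > V_T$ and $V_{ds} > V_{gs} - V_T$,
$$I_{ds} = K\,\frac{(V_{gs} - V_T)^2}{2} \tag{1.3}$$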
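The three regions of the simple model translate directly into a piecewise function. The sketch below is only an illustration; the function name and the values of K and VT are assumptions, not parameters taken from the figure.

```python
def ids_simple(vgs, vds, k, vt):
    """Drain current of the simple MOS model, eqs. (1.1) to (1.3)."""
    if vgs <= vt:
        return 0.0                                   # cut off
    vov = vgs - vt                                   # gate overdrive
    if vds <= vov:
        return k * (vov * vds - 0.5 * vds ** 2)      # linear regime
    return 0.5 * k * vov ** 2                        # saturation

# Illustrative parameters only: K = 1 mA/V^2, VT = 0.5 V
k, vt = 1e-3, 0.5
for vds in (0.5, 1.0, 2.0, 3.0):
    print(vds, ids_simple(2.5, vds, k, vt))
```

At $V_{ds} = V_{gs} - V_T$ the linear and saturation expressions give the same current, so the characteristic is continuous at the boundary.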
The saturation region equation is somewhat oversimplified because it assumes that the current is independent of $V_{ds}$. In reality, the current has a weak dependence on $V_{ds}$ in this region.
In order to model the saturation region more accurately, we adopt an Early Voltage like formalism.
[Plot: Drain Current (mA) versus Drain Voltage (V).]
Figure 1.2: MOS characteristics with non zero conductance in saturation
It is assumed that the current increases linearly in the saturation region. All linear characteristics in saturation can be extended backwards towards negative drain voltages and will intersect the drain voltage axis at a single point at $-V_E$. (This is, at best, an approximation.) Because the conductance in saturation is now non zero, the onset of saturation has to be redefined, so that the current and its derivative are continuous at the boundary of linear and saturation regimes. The current equations are given by:
For $V_{gs} > V_T$ and $V_{ds} \le V_{dss}$,
$$I_{ds} = K\left[(V_{gs} - V_T)V_{ds} - \frac{1}{2}V_{ds}^2\right] \tag{1.4}$$
and for $V_{gs} > V_T$ and $V_{ds} > V_{dss}$,
$$I_{ds} = I_{dss}\,\frac{V_{ds} + V_E}{V_{dss} + V_E} \tag{1.5}$$
where $V_E$ is the Early Voltage. Here $V_{dss}$ and $I_{dss}$ are saturation drain voltage and drain current respectively. Since the current values must match at either side of $V_{ds} = V_{dss}$, we must have:
$$I_{dss} \equiv K\left[(V_{gs} - V_T)V_{dss} - \frac{1}{2}V_{dss}^2\right] \tag{1.6}$$
For the curve to be smooth and continuous at $V_{ds} = V_{dss}$, the value of the first derivative should match on either side of $V_{dss}$. Therefore,
$$K(V_{gs} - V_T - V_{dss}) = \frac{I_{dss}}{V_{dss} + V_E}$$
So,
$$K(V_{gs} - V_T - V_{dss})(V_{dss} + V_E) = K\left[(V_{gs} - V_T)V_{dss} - \frac{1}{2}V_{dss}^2\right] \tag{1.7}$$
This leads to a quadratic equation in $V_{dss}$
$$\frac{1}{2}V_{dss}^2 + V_E V_{dss} - (V_{gs} - V_T)V_E = 0 \tag{1.8}$$
Solving this quadratic, we get
$$V_{dss} = V_E\left[\sqrt{1 + \frac{2(V_{gs} - V_T)}{V_E}} - 1\right] \tag{1.9}$$
For $V_E \gg V_{gs} - V_T$ this reduces to
$$V_{dss} \simeq (V_{gs} - V_T)\left[1 - \frac{V_{gs} - V_T}{2V_E}\right] \tag{1.10}$$
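A short numerical check can make the matching conditions concrete. The sketch below evaluates the exact and approximate saturation voltages and compares the slope of the linear-region current just below $V_{dss}$ with the slope of the saturation line just above it. Function names and parameter values (K = 1 mA/V^2, VT = 0.6 V, VE = 20 V) are assumptions for illustration only.

```python
import math

def vdss_exact(vov, ve):
    """Saturation voltage from eq. (1.9); vov = Vgs - VT, ve = Early voltage."""
    return ve * (math.sqrt(1.0 + 2.0 * vov / ve) - 1.0)

def vdss_approx(vov, ve):
    """First-order form of eq. (1.10), intended for ve >> vov."""
    return vov * (1.0 - vov / (2.0 * ve))

def ids_early(vgs, vds, k, vt, ve):
    """Drain current with finite output conductance, eqs. (1.4) to (1.6)."""
    if vgs <= vt:
        return 0.0
    vov = vgs - vt
    vdss = vdss_exact(vov, ve)
    if vds <= vdss:
        return k * (vov * vds - 0.5 * vds ** 2)
    idss = k * (vov * vdss - 0.5 * vdss ** 2)
    return idss * (vds + ve) / (vdss + ve)

# Illustrative values only
k, vt, ve, vgs = 1e-3, 0.6, 20.0, 2.6
vdss = vdss_exact(vgs - vt, ve)
slope_linear = k * (vgs - vt - vdss)                        # just below Vdss
slope_sat = ids_early(vgs, vdss, k, vt, ve) / (vdss + ve)   # just above Vdss
print(vdss, vdss_approx(vgs - vt, ve), slope_linear, slope_sat)
```

The two slopes agree, which is the condition that fixes $V_{dss}$, and the approximate form stays close to the exact one when the Early voltage is much larger than the overdrive.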
Characteristics of a MOS transistor using this model are shown in fig.1.2. While
accurate modeling of the output conductance is essential for linear design, the
simpler model assuming constant Id in saturation is often adequate for preliminary
digital design. In any case, final designs will have to be validated with detailed
simulations. In this booklet, we shall use the simple model for MOS devices to
keep the algebra simple.
Chapter 2
Static CMOS Logic Design
Static logic circuits are those which can hold their output logic levels for indefinite
periods as long as the inputs are unchanged. Circuits which depend on charge
storage on capacitors are called dynamic circuits and will be discussed in a later
chapter.
2.1 Static CMOS Design style
The most common design style in modern VLSI design is the Static CMOS logic
style. In this, each logic stage contains pull up and pull down networks which are
controlled by input signals. The pull up network contains p channel transistors,
whereas the pull down network is made of n channel transistors. The networks are
so designed that the pull up and pull down networks are never on simultaneously.
This ensures that there is no static power consumption.
2.2 CMOS Inverter
The simplest of such logic structures is the CMOS inverter. In fact, for any CMOS
logic design, the CMOS inverter is the basic gate which is first analyzed and
designed in detail. Thumb rules are then used to convert this design to other more
complex logic. The basic CMOS inverter is shown in fig. 2.1. We shall develop
the characteristics of CMOS logic through the inverter structure, and later discuss
ways of converting this basic structure to more complex logic gates.
2.2.1 Static Characteristics
The range of input voltages can be divided into several regions.
Figure 2.1: The basic CMOS inverter
nMOS off, pMOS on

For 0 < Vi < VT n the n channel transistor is off, the p channel transistor is on
and the output voltage = Vdd . This is the normal digital operation range with
input = 0 and output = 1.
nMOS saturated, pMOS linear

In this regime, both transistors are on. The input voltage $V_i$ is > $V_{Tn}$, but is small enough so that the n channel transistor is in saturation, and the p channel transistor is in the linear regime. In static condition, the output voltage will adjust itself such that the currents through the n and p channel transistors are equal. The absolute value of gate-source voltage on the p channel transistor is $V_{dd} - V_i$, and therefore the over voltage on its gate is $V_{dd} - V_i - V_{Tp}$. The drain source voltage of the pMOS has an absolute value $V_{dd} - V_o$. Therefore,
$$I_d = K_p\left[(V_{dd} - V_i - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right] = \frac{K_n}{2}(V_i - V_{Tn})^2 \tag{2.1}$$
where symbols have their usual meanings.
We define $\beta \equiv K_n/K_p$. We make the substitution $V_{dp} \equiv V_{dd} - V_o$, where $V_{dp}$ is the absolute value of the drain-source voltage for the p channel transistor. Then,
$$(V_{dd} - V_i - V_{Tp})V_{dp} - \frac{1}{2}V_{dp}^2 = \frac{\beta}{2}(V_i - V_{Tn})^2 \tag{2.2}$$
which gives the quadratic
$$\frac{1}{2}V_{dp}^2 - V_{dp}(V_{dd} - V_i - V_{Tp}) + \frac{\beta}{2}(V_i - V_{Tn})^2 = 0 \tag{2.3}$$
Solutions to the quadratic are:
$$V_{dp} = (V_{dd} - V_i - V_{Tp}) \pm \sqrt{(V_{dd} - V_i - V_{Tp})^2 - \beta(V_i - V_{Tn})^2} \tag{2.4}$$
These equations are valid only when the pMOS is in its linear regime. This requires that
$$V_{dp} \equiv V_{dd} - V_o \le V_{dd} - V_i - V_{Tp}$$
Therefore, we must choose the negative sign. Thus
$$V_{dd} - V_o = (V_{dd} - V_i - V_{Tp}) - \sqrt{(V_{dd} - V_i - V_{Tp})^2 - \beta(V_i - V_{Tn})^2} \tag{2.5}$$
Therefore,
$$V_o = V_i + V_{Tp} + \sqrt{(V_{dd} - V_i - V_{Tp})^2 - \beta(V_i - V_{Tn})^2} \tag{2.6}$$
Since $V_o$ must be $\ge V_i + V_{Tp}$, the limit of applicability of the above result is given by
$$(V_{dd} - V_i - V_{Tp})^2 = \beta(V_i - V_{Tn})^2$$
That is, the solution for $V_o$ is valid for
$$V_i \le \frac{V_{dd} + \sqrt{\beta}\,V_{Tn} - V_{Tp}}{1 + \sqrt{\beta}} \tag{2.7}$$
In the case where we size the n and p channel transistors such that $K_n = K_p$, so $\beta = 1$, we have
$$V_o = (V_i + V_{Tp}) + \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(V_{dd} - 2V_i + V_{Tn} - V_{Tp})} \tag{2.8}$$
with
$$V_i \le \frac{V_{dd} + V_{Tn} - V_{Tp}}{2}$$
At the limit of applicability of eq. 2.7, when the input voltage is exactly at
$$V_i = \frac{V_{dd} + \sqrt{\beta}\,V_{Tn} - V_{Tp}}{1 + \sqrt{\beta}} \tag{2.9}$$
both transistors are saturated. Since the currents of both transistors are independent of their drain voltages in this condition, we do not get a unique solution for $V_o$ by equating drain currents. The currents will be equal for all values of $V_o$ in the range
$$V_i - V_{Tn} \le V_o \le V_i + V_{Tp}$$
Thus the transfer curve of an inverter shows a drop of $V_{Tn} + V_{Tp}$ at a voltage near $V_{dd}/2$. This is actually an artifact of the simple transistor model chosen for this analysis, which assumes perfect saturation of drain current. In a real case, the drain current does depend on the drain voltage (albeit weakly) in the saturation region. If the model incorporates an Early Voltage like effect, the drop near the middle of the characteristic is more gradual.
[Plot: Output Voltage versus Input Voltage, marking VoH, VoL, ViL, ViH and the drop of VTn + VTp.]
Figure 2.2: Transfer Curve of a CMOS inverter

nMOS linear, pMOS saturated

At the gate voltage given by eq. 2.9, both transistors are saturated. As we increase $V_i$ beyond this value, such that
$$\frac{V_{dd} + \sqrt{\beta}\,V_{Tn} - V_{Tp}}{1 + \sqrt{\beta}} < V_i < V_{dd} - V_{Tp}$$
both transistors are still on, but the nMOS enters the linear regime while the pMOS gets saturated. Equating currents in this condition,
$$I_d = \frac{K_p}{2}(V_{dd} - V_i - V_{Tp})^2 = K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right] \tag{2.10}$$
From this, we get the quadratic equation
$$\frac{1}{2}V_o^2 - (V_i - V_{Tn})V_o + \frac{(V_{dd} - V_i - V_{Tp})^2}{2\beta} = 0 \tag{2.11}$$
This has solutions
$$V_o = (V_i - V_{Tn}) \pm \sqrt{(V_i - V_{Tn})^2 - \frac{(V_{dd} - V_i - V_{Tp})^2}{\beta}} \tag{2.12}$$
Since the equations are valid only when the n channel transistor is in the linear regime ($V_o < V_i - V_{Tn}$), we choose the negative sign. This gives,
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_i - V_{Tn})^2 - \frac{(V_{dd} - V_i - V_{Tp})^2}{\beta}} \tag{2.13}$$
Again, in the special case where $\beta = 1$, we have
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(2V_i - V_{dd} - V_{Tn} + V_{Tp})} \tag{2.14}$$
nMOS on, pMOS off

As we increase the input voltage beyond $V_{dd} - V_{Tp}$, the p channel transistor turns off, while the n channel conducts strongly. As a result, the output voltage falls to zero. This is the normal digital operation range with input = 1 and output = 0.
The figure below shows the transfer curve of an inverter with $V_{dd}$ = 3V, $V_{Tn}$ = 0.6V and $V_{Tp}$ = 0.5V, and $\beta$ = 1.
[Plot: Output Voltage versus Input Voltage.]
The plot produced by SPICE for this circuit with realistic models is quite similar.
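The piecewise transfer curve can be reproduced with a few lines of code. The sketch below strings together the four regions using the expressions of eqs. (2.6) and (2.13), with the default values quoted above. The function name is hypothetical, and at the exact switching point, where the simple model gives a range of outputs, the sketch arbitrarily returns the upper end of that range.

```python
import math

def inverter_vo(vi, vdd=3.0, vtn=0.6, vtp=0.5, beta=1.0):
    """Static output voltage of a CMOS inverter in the simple MOS model."""
    if vi <= vtn:
        return vdd                        # nMOS off, pMOS on
    if vi >= vdd - vtp:
        return 0.0                        # nMOS on, pMOS off
    a = vdd - vi - vtp                    # pMOS overdrive
    b = vi - vtn                          # nMOS overdrive
    rb = math.sqrt(beta)
    v_mid = (vdd + rb * vtn - vtp) / (1.0 + rb)   # eq. (2.9)
    if vi <= v_mid:                       # nMOS saturated, pMOS linear
        return vi + vtp + math.sqrt(max(0.0, a * a - beta * b * b))
    return b - math.sqrt(max(0.0, b * b - a * a / beta))  # nMOS linear

for vi in (0.0, 0.5, 1.0, 1.5, 1.6, 2.0, 2.5, 3.0):
    print(f"Vi = {vi:.2f} V  ->  Vo = {inverter_vo(vi):.3f} V")
```

Evaluating the function on either side of the switching point shows the abrupt step that the simple model predicts near the middle of the curve.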
2.2.2 Noise margins
The requirement from a digital circuit is that it should distinguish logic levels, but be insensitive to the exact analog voltage at the input. This implies that the flat portions of the transfer curve (where $\partial V_o/\partial V_i$ is small) are suitable for digital logic. We select two points on the transfer curve where the slope ($\partial V_o/\partial V_i$) is -1.0. The coordinates of these two points define the values of ($V_{iL}$, $V_{oH}$) and ($V_{iH}$, $V_{oL}$). Robust digital design requires that the output high level be higher than what is acceptable as a high level at the input ($V_{oH} > V_{iH}$). The difference between these two levels is the high noise margin. This is the amount of noise that can ride on the worst case high output and still be accepted as a high at the input of the next gate. Similarly, we require $V_{oL} < V_{iL}$. The difference, $V_{iL} - V_{oL}$, is the low noise margin. Obviously, it is of interest to evaluate the values of these noise margins. For the discussion which follows, we shall use the expressions derived earlier for $\beta = 1$ to keep the algebra simple.
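Calculation of ViL and VoH

From eq. (2.8),
$$V_o = (V_i + V_{Tp}) + \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(V_{dd} + V_{Tn} - V_{Tp} - 2V_i)}$$
From this, we can evaluate $\partial V_o/\partial V_i$ and set it = -1.
$$\frac{\partial V_o}{\partial V_i} = -1 = 1 - \sqrt{\frac{V_{dd} - V_{Tn} - V_{Tp}}{V_{dd} + V_{Tn} - V_{Tp} - 2V_i}} \tag{2.15}$$
This gives
$$V_{iL} = \frac{3V_{dd} + 5V_{Tn} - 3V_{Tp}}{8} \tag{2.16}$$
Substituting this in eq. (2.8), we get
$$V_{oH} = \frac{7V_{dd} + V_{Tn} + V_{Tp}}{8} = V_{dd} - \frac{V_{dd} - V_{Tn} - V_{Tp}}{8} \tag{2.17}$$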
Calculation of ViH and VoL

When the input is high, we should use eq. (2.14).
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_{dd} - V_{Tn} - V_{Tp})(2V_i - V_{dd} - V_{Tn} + V_{Tp})}$$
Differentiating with respect to $V_i$ gives
$$\frac{\partial V_o}{\partial V_i} = -1 = 1 - \sqrt{\frac{V_{dd} - V_{Tn} - V_{Tp}}{2V_i - V_{dd} - V_{Tn} + V_{Tp}}} \tag{2.18}$$
From where, we get
$$V_{iH} = \frac{5V_{dd} + 3V_{Tn} - 5V_{Tp}}{8} \tag{2.19}$$
and
$$V_{oL} = \frac{V_{dd} - V_{Tn} - V_{Tp}}{8} \tag{2.20}$$
The high noise margin is given by
$$V_{oH} - V_{iH} = \frac{V_{dd} - V_{Tn} + 3V_{Tp}}{4} \tag{2.21}$$
Similarly, the low noise margin is
$$V_{iL} - V_{oL} = \frac{V_{dd} + 3V_{Tn} - V_{Tp}}{4} \tag{2.22}$$
The two noise margins can be made equal by choosing equal values for $V_{Tn}$ and $V_{Tp}$.
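For a quick feel of the numbers, the noise-margin results for $\beta = 1$ can be evaluated directly: $V_{iL} = (3V_{dd} + 5V_{Tn} - 3V_{Tp})/8$, $V_{iH} = (5V_{dd} + 3V_{Tn} - 5V_{Tp})/8$, $V_{oL} = (V_{dd} - V_{Tn} - V_{Tp})/8$ and $V_{oH} = V_{dd} - V_{oL}$. The function name is hypothetical, and the example voltages are the ones used for the transfer curve (Vdd = 3 V, VTn = 0.6 V, VTp = 0.5 V).

```python
def noise_margins(vdd, vtn, vtp):
    """Unity-slope points and noise margins of a CMOS inverter with beta = 1."""
    a = vdd - vtn - vtp
    vil = (3.0 * vdd + 5.0 * vtn - 3.0 * vtp) / 8.0
    vih = (5.0 * vdd + 3.0 * vtn - 5.0 * vtp) / 8.0
    vol = a / 8.0
    voh = vdd - a / 8.0
    return {"ViL": vil, "VoH": voh, "ViH": vih, "VoL": vol,
            "NM_high": voh - vih, "NM_low": vil - vol}

nm = noise_margins(3.0, 0.6, 0.5)
for name, value in nm.items():
    print(f"{name} = {value:.4f} V")
```

With unequal thresholds the two margins differ; setting both thresholds to the same value makes them equal.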
2.2.3 Dynamic Considerations
In this section, we analyze the dynamic behaviour of the inverter. For the calculation of rise and fall times, we shall assume that only one of the two transistors
in the inverter is on. (Notice that this is more conservative than the input high
and low conditions determined by slope considerations in eq.2.19 and 2.16). We
shall continue to use the simple model described at the beginning of this booklet.
Rise time
When the input is low, the n channel transistor is off, while the p channel transistor is on. The equivalent circuit in this condition is shown in fig. 2.3.
Figure 2.3: CMOS inverter with the nMOS off
From Kirchhoff's current law at the output node,
$$I_{dp} = C\,\frac{dV_o}{dt}$$
so,
$$\frac{dt}{C} = \frac{dV_o}{I_{dp}}$$
This separates the variables, with the LHS independent of operating voltages and the RHS independent of time. Integrating both sides, we get
$$\frac{\tau_{rise}}{C} = \int_0^{V_{oH}} \frac{dV_o}{I_{dp}}$$
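Till the output rises to $V_{iL} + V_{Tp}$, the p channel transistor is in saturation. Since the current is constant, the integration is trivial. If $V_{oH} > V_{iL} + V_{Tp}$ (which is normally the case), the integration range can be broken into saturation and linear regimes. Thus
$$\frac{\tau_{rise}}{C} = \int_0^{V_{iL}+V_{Tp}} \frac{dV_o}{\frac{K_p}{2}(V_{dd} - V_{iL} - V_{Tp})^2} + \int_{V_{iL}+V_{Tp}}^{V_{oH}} \frac{dV_o}{K_p\left[(V_{dd} - V_{iL} - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right]}$$
We define $V_1 \equiv V_{dd} - V_o$ and $V_2 \equiv V_{dd} - V_{iL} - V_{Tp}$, so $dV_o = -dV_1$. We get
$$\frac{K_p \tau_{rise}}{2C} = \frac{V_{iL} + V_{Tp}}{V_2^2} + \int_{V_{dd} - V_{oH}}^{V_2} \frac{dV_1}{2V_1V_2 - V_1^2}$$
The integral can be evaluated as
$$\int_{V_{dd}-V_{oH}}^{V_2} \frac{dV_1}{2V_1 V_2 - V_1^2} = \frac{1}{2V_2}\int_{V_{dd}-V_{oH}}^{V_2}\left(\frac{1}{V_1} + \frac{1}{2V_2 - V_1}\right) dV_1 = \frac{1}{2V_2}\left[\ln\frac{V_1}{2V_2 - V_1}\right]_{V_{dd}-V_{oH}}^{V_2} = \frac{1}{2V_2}\ln\frac{2V_2 - V_{dd} + V_{oH}}{V_{dd} - V_{oH}}$$
Therefore,
$$\frac{K_p\tau_{rise}}{2C} = \frac{V_{iL}+V_{Tp}}{V_2^2} + \frac{1}{2V_2}\ln\frac{2V_2 - V_{dd} + V_{oH}}{V_{dd}-V_{oH}}$$
or
$$\frac{K_p\tau_{rise}}{2C} = \frac{V_{iL}+V_{Tp}}{(V_{dd}-V_{iL}-V_{Tp})^2} + \frac{1}{2(V_{dd}-V_{iL}-V_{Tp})}\ln\frac{V_{dd} + V_{oH} - 2V_{iL} - 2V_{Tp}}{V_{dd}-V_{oH}}$$
Thus,
$$\tau_{rise} = \frac{C(V_{iL} + V_{Tp})}{\frac{K_p}{2}(V_{dd} - V_{iL} - V_{Tp})^2} + \frac{C}{K_p(V_{dd} - V_{iL} - V_{Tp})}\ln\frac{V_{dd} + V_{oH} - 2V_{iL} - 2V_{Tp}}{V_{dd} - V_{oH}} \tag{2.23}$$
The first term is just the constant current charging of the load capacitor. The second term represents the charging by the pMOS in its linear range. This can be compared with resistive charging, which would have taken a charge time of
$$\tau = RC\,\ln\frac{V_{dd} - V_{iL} - V_{Tp}}{V_{dd} - V_{oH}}$$
to charge from $V_{iL} + V_{Tp}$ to $V_{oH}$.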
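The closed-form rise time can be cross-checked against a direct numerical integration of $C\,dV_o/I_{dp}$ from 0 to $V_{oH}$. The sketch below assumes the simple MOS model used throughout; the function names and the component values are hypothetical and chosen only for illustration.

```python
import math

def rise_time(c, kp, vdd, vtp, vil, voh):
    """Closed-form rise time: saturation term plus linear-regime term."""
    v2 = vdd - vil - vtp
    t_sat = c * (vil + vtp) / (0.5 * kp * v2 ** 2)
    t_lin = c / (kp * v2) * math.log((vdd + voh - 2 * vil - 2 * vtp) / (vdd - voh))
    return t_sat + t_lin

def rise_time_numeric(c, kp, vdd, vtp, vil, voh, steps=20000):
    """Midpoint-rule integration of C dVo / Idp from 0 to VoH."""
    v2 = vdd - vil - vtp
    dv = voh / steps
    total = 0.0
    for n in range(steps):
        vo = (n + 0.5) * dv
        v1 = vdd - vo
        if vo <= vil + vtp:
            i_p = 0.5 * kp * v2 ** 2               # pMOS saturated
        else:
            i_p = kp * (v2 * v1 - 0.5 * v1 ** 2)   # pMOS linear
        total += c * dv / i_p
    return total

# Illustrative values: 100 fF load, Kp = 100 uA/V^2, input held at 0 V
args = dict(c=100e-15, kp=1e-4, vdd=3.0, vtp=0.5, vil=0.0, voh=2.7)
print(rise_time(**args), rise_time_numeric(**args))
```

Agreement between the two numbers is a useful sanity check on the algebra, in particular on the partial-fraction step.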

Fall time
When the input is high, the n channel transistor is on and the p channel transistor is off. If the output was initially high, it will be discharged to ground through the nMOS.
Figure 2.4: CMOS inverter with the pMOS off
To analyse the fall time, we apply Kirchhoff's current law to the output node. This gives
$$I_{dn} = -C\,\frac{dV_o}{dt}$$
Again, separating variables and integrating from the initial voltage (= $V_{dd}$) to some terminal voltage $V_{oL}$ gives
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_{oL}} \frac{dV_o}{I_{dn}}$$
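The n channel transistor will be in saturation till the output voltage falls to $V_i - V_{Tn}$. Below this voltage, the transistor will be in its linear regime. Thus, we can divide the integration range in two parts.
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_i - V_{Tn}} \frac{dV_o}{I_{dn}} - \int_{V_i - V_{Tn}}^{V_{oL}} \frac{dV_o}{I_{dn}} = \int_{V_i - V_{Tn}}^{V_{dd}} \frac{dV_o}{\frac{K_n}{2}(V_i - V_{Tn})^2} + \int_{V_{oL}}^{V_i - V_{Tn}} \frac{dV_o}{K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right]}$$
Therefore
$$\frac{K_n \tau_{fall}}{2C} = \frac{V_{dd} - V_i + V_{Tn}}{(V_i - V_{Tn})^2} + \int_{V_{oL}}^{V_i - V_{Tn}} \frac{dV_o}{2V_o(V_i - V_{Tn}) - V_o^2}$$
$$= \frac{V_{dd} - V_i + V_{Tn}}{(V_i - V_{Tn})^2} + \frac{1}{2(V_i - V_{Tn})}\int_{V_{oL}}^{V_i - V_{Tn}}\left(\frac{1}{V_o} + \frac{1}{2(V_i - V_{Tn}) - V_o}\right) dV_o$$
Which gives
$$\frac{K_n \tau_{fall}}{2C} = \frac{V_{dd} - V_i + V_{Tn}}{(V_i - V_{Tn})^2} + \frac{1}{2(V_i - V_{Tn})}\left[\ln\frac{V_o}{2(V_i - V_{Tn}) - V_o}\right]_{V_{oL}}^{V_i - V_{Tn}}$$
$$= \frac{V_{dd} - V_i + V_{Tn}}{(V_i - V_{Tn})^2} + \frac{1}{2(V_i - V_{Tn})}\ln\frac{2(V_i - V_{Tn}) - V_{oL}}{V_{oL}}$$
and therefore
$$\tau_{fall} = \frac{C(V_{dd} - V_i + V_{Tn})}{\frac{K_n}{2}(V_i - V_{Tn})^2} + \frac{C}{K_n(V_i - V_{Tn})}\ln\frac{2(V_i - V_{Tn}) - V_{oL}}{V_{oL}} \tag{2.24}$$
Again, the first term represents the time taken to discharge at constant current in the saturation regime, whereas the second term is the quasi-resistive discharge in the linear regime.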
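To see how the two contributions compare, the fall-time expression can be evaluated term by term. The sketch below returns the constant-current (saturation) part and the quasi-resistive (linear) part separately; the function name and all numerical values are assumptions made for illustration.

```python
import math

def fall_time_terms(c, kn, vdd, vtn, vi, vol):
    """Return (saturation term, linear term) of the CMOS inverter fall time."""
    vov = vi - vtn                                   # nMOS gate overdrive
    t_sat = c * (vdd - vov) / (0.5 * kn * vov ** 2)  # discharge Vdd -> Vi - VTn
    t_lin = c / (kn * vov) * math.log((2.0 * vov - vol) / vol)  # down to VoL
    return t_sat, t_lin

# Illustrative values: 100 fF load, Kn = 100 uA/V^2, input at Vdd = 3 V
t_sat, t_lin = fall_time_terms(c=100e-15, kn=1e-4, vdd=3.0, vtn=0.6,
                               vi=3.0, vol=0.3)
print(f"saturation: {t_sat * 1e12:.1f} ps, linear: {t_lin * 1e12:.1f} ps")
```

For these values the linear-regime term dominates, since the output spends most of the swing below $V_i - V_{Tn}$.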
2.2.4 Trade off between power, speed and robustness
As we scale technologies, we improve speed and power consumption. However, as we can see from the expressions for noise margins (eq. 2.21 and eq. 2.22), the noise margin becomes worse. We can improve noise margins by choosing relatively higher threshold voltages. However, this will reduce speeds. We could also increase $V_{dd}$ - but that would increase power dissipation. Thus we have a trade off between power, speed and noise margins.
This choice is made much more complicated by process variations, because we have to design for the worst case.
2.2.5 CMOS Inverter Design Flow
The CMOS inverter forms the basis of most static CMOS logic design. More complex logic can be designed from it by simple thumb rules. A common (though not universal) design requirement is symmetric charge and discharge behaviour and equal noise margins for high and low logic values. This requires matched values of $K_n$ and $K_p$ and equal values of $V_{Tn}$ and $V_{Tp}$. For a constant load capacitance, the rise time is inversely proportional to $K_p$ and the fall time is inversely proportional to $K_n$. Thus it is a straightforward calculation to determine transistor geometries if speed requirements and technological parameters are given. However, as transistor geometries are made larger, self loading can become significant. We now have to model the load capacitance as
$$C_{Load} = C_{ext} + \alpha K_n$$
where we have assumed that $\beta = K_n/K_p$ is kept constant. $\alpha$ is a technological constant. We use the expressions for $K\tau/C$ which depend only on voltages. Once these values are calculated, the geometry can be determined.
In the extreme case, when self capacitance dominates the load capacitance, K/C becomes constant and $\tau$ becomes geometry independent. There is no advantage in using wider transistors in this regime to increase the speed. It is better to use multi-stage logic with tapered buffers in this regime. This will be discussed in the module on Logical Effort.
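The diminishing return from widening transistors can be illustrated numerically. At fixed voltages the delay is proportional to $C_{Load}/K_n$. The sketch below writes the self-loading term as alpha times $K_n$, where the symbol alpha, the function name and all numbers are hypothetical choices made for this example.

```python
def delay_figure(kn, c_ext, alpha):
    """C_load / Kn, proportional to the rise or fall time at fixed voltages."""
    return (c_ext + alpha * kn) / kn

c_ext = 50e-15   # hypothetical external load capacitance
alpha = 2e-11    # hypothetical self-loading constant (capacitance per unit Kn)
for kn in (1e-4, 1e-3, 1e-2, 1e-1):
    print(f"Kn = {kn:.0e}  ->  C/K = {delay_figure(kn, c_ext, alpha):.3e}")
```

As $K_n$ grows, the figure approaches alpha and stops improving, which is the regime where tapered multi-stage buffers become the better option.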
2.2.6 Conversion of CMOS Inverters to other logic
Once the basic CMOS inverter is designed, other logic gates can be derived from it. The logic has to be put in a canonical form which is a sum of products with a bar (inversion) on top. For every . in the expression, we put the corresponding n channel transistors in series and the corresponding p channel transistors in parallel. For every +, we put the n channel transistors in parallel and the p channel transistors in series. We scale the transistor widths up by the number of devices (n or p) put in series. The geometries are left untouched for devices put in parallel. Fig. 2.5 shows the implementation of $\overline{A.B + C.(D + E)}$ in CMOS logic design style.
Figure 2.5: CMOS implementation of $\overline{A.B + C.(D + E)}$
Chapter 3
Beyond Static CMOS
3.1
CMOS design style ensures that the logic consumes no static power. This is because the pull down and pull up networks are never on simultaneously. However, this requires that signals be routed to the n pull down network as well as to the p pull up network. This means that the load presented to every driver is high. The problem is exacerbated by the fact that n and p channel transistors cannot be placed close together, as these are in different wells which have to be kept well separated in order to avoid latchup.
Pseudo nMOS design style reduces dynamic power (by reducing capacitive
loading) at the cost of having non-zero static power by replacing the pull up
network by a single pMOS transistor with its gate terminal grounded. The pseudo
nMOS inverter is shown below.
[Figure: pseudo nMOS inverter, with labels Vdd, Out, in and Gnd]
Notice that since the pMOS is not driven by signals, it is always on. The effective
gate voltage seen by the pMOS transistor is Vdd . Thus the overvoltage on the p
channel gate is always Vdd - VT p . When the nMOS is turned on, a direct path
between supply and ground exists and static power will be drawn.
3.1.1
As we sweep the input voltage from ground to Vdd , we encounter the following
regimes of operation:
nMOS off
This is the case when the input voltage is less than VT n . The output is high and
no current is drawn from the supply.
nMOS saturated, pMOS linear
As the input voltage is raised above $V_{Tn}$, we enter this region. The input voltage is assumed to be sufficiently low that the output voltage exceeds the saturation voltage $V_i - V_{Tn}$. Normally, this voltage will be higher than $V_{Tp}$, so the p channel transistor is in linear mode of operation. Equating currents through the n and p channel transistors, we get
$$K_p\left[(V_{dd} - V_{Tp})(V_{dd} - V_o) - \frac{1}{2}(V_{dd} - V_o)^2\right] = \frac{K_n}{2}(V_i - V_{Tn})^2 \tag{3.1}$$
Defining $V_1 \equiv V_{dd} - V_o$ and $V_2 \equiv V_{dd} - V_{Tp}$, and with $\beta \equiv K_n/K_p$, we get
$$\frac{1}{2}V_1^2 - V_2 V_1 + \frac{\beta}{2}(V_i - V_{Tn})^2 = 0$$
with solutions
$$V_1 = V_2 \pm \sqrt{V_2^2 - \beta (V_i - V_{Tn})^2} \tag{3.2}$$
Substituting the values of $V_1$ and $V_2$ and choosing the sign which puts $V_o$ in the correct range, we get
$$V_o = V_{Tp} + \sqrt{(V_{dd} - V_{Tp})^2 - \beta (V_i - V_{Tn})^2} \tag{3.3}$$
As the input voltage is increased, the output voltage will decrease in accordance with equation (3.3). At some point, the output voltage will fall below $V_i - V_{Tn}$. It can be shown that this will happen when
$$V_i > V_{Tn} + \frac{V_{Tp} + \sqrt{V_{Tp}^2 + (\beta + 1)V_{dd}(V_{dd} - 2V_{Tp})}}{\beta + 1}$$
nMOS linear, pMOS linear
The nMOS is now in its linear mode of operation. We shall not derive the expression for the output voltage in this mode of operation in the discussion here. The solution is straightforward, though algebraically tedious.
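nMOS linear, pMOS saturated
As the input voltage is raised still further, the output voltage will fall below $V_{Tp}$. The pMOS transistor is now in the saturation regime. Equating currents, we get
$$K_n\left[(V_i - V_{Tn})V_o - \frac{1}{2}V_o^2\right] = \frac{K_p}{2}(V_{dd} - V_{Tp})^2$$
which gives (with $\beta \equiv K_n/K_p$)
$$\frac{1}{2}V_o^2 - (V_i - V_{Tn})V_o + \frac{(V_{dd} - V_{Tp})^2}{2\beta} = 0$$
This can be solved to get
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_i - V_{Tn})^2 - (V_{dd} - V_{Tp})^2/\beta} \tag{3.4}$$
3.1.2
Noise margins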
As in the case of the CMOS inverter, we find points on the transfer curve where the slope is -1.
When the input is low and the output high, we should use eq. (3.3). Differentiating this equation with respect to $V_i$ and setting the slope to -1, we get
$$V_{iL} = V_{Tn} + \frac{V_{dd} - V_{Tp}}{\sqrt{\beta(\beta + 1)}} \tag{3.5}$$
and
$$V_{oH} = V_{Tp} + \sqrt{\frac{\beta}{\beta + 1}}\,(V_{dd} - V_{Tp}) \tag{3.6}$$
When the input is high and the output low, we use eq. (3.4). Again, differentiating with respect to $V_i$ and setting the slope to -1, we get
$$V_{iH} = V_{Tn} + \frac{2}{\sqrt{3\beta}}(V_{dd} - V_{Tp}) \tag{3.7}$$
and
$$V_{oL} = \frac{V_{dd} - V_{Tp}}{\sqrt{3\beta}} \tag{3.8}$$
To make the output low value lower than $V_{Tn}$, we get the condition
$$\beta > \frac{1}{3}\left(\frac{V_{dd} - V_{Tp}}{V_{Tn}}\right)^2$$
This condition on values of $\beta \equiv K_n/K_p$ places a requirement on the ratios of widths of n and p channel transistors. The logic gates work properly only when this condition is satisfied. Therefore this kind of logic is also called ratioed logic. In contrast, CMOS logic is called ratioless logic because it does not place any restriction on the ratios of widths of n and p channel transistors for static operation. The noise margin for pseudo nMOS can be determined easily from the expressions for $V_{iL}$, $V_{oL}$, $V_{iH}$, $V_{oH}$.
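The noise margin expressions can be evaluated numerically. The following is a minimal sketch, assuming the ratio Kn/Kp is written as beta and V_Tp is taken as a positive magnitude; the function names and the supply and threshold values are hypothetical, chosen only for illustration.

```python
import math

def output_high_branch(vi, vdd, vtn, vtp, beta):
    """Vo for nMOS saturated and pMOS linear (the branch of eq. 3.3)."""
    return vtp + math.sqrt((vdd - vtp) ** 2 - beta * (vi - vtn) ** 2)

def pseudo_nmos_levels(vdd, vtn, vtp, beta):
    """Return (ViL, VoH, ViH, VoL) at the slope = -1 points."""
    v2 = vdd - vtp
    vil = vtn + v2 / math.sqrt(beta * (beta + 1))
    voh = vtp + math.sqrt(beta / (beta + 1)) * v2
    vih = vtn + 2 * v2 / math.sqrt(3 * beta)
    vol = v2 / math.sqrt(3 * beta)
    return vil, voh, vih, vol

def min_beta(vdd, vtn, vtp):
    """Smallest beta for which VoL stays below V_Tn."""
    return ((vdd - vtp) / vtn) ** 2 / 3

# Illustrative values only.
VDD, VTN, VTP, BETA = 3.3, 0.6, 0.6, 8.0
vil, voh, vih, vol = pseudo_nmos_levels(VDD, VTN, VTP, BETA)
print(f"ViL={vil:.3f} VoH={voh:.3f} ViH={vih:.3f} VoL={vol:.3f}")
print(f"NML={vil - vol:.3f} NMH={voh - vih:.3f} beta_min={min_beta(VDD, VTN, VTP):.2f}")
```

The low noise margin is ViL - VoL and the high noise margin is VoH - ViH; raising beta lowers VoL at the cost of a wider nMOS.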
3.1.3
In the sections above, we have derived the behaviour of a pseudo nMOS inverter
in static conditions. In the sections below, we discuss the dynamic behaviour of
this inverter.
Rise Time
When the input is low and the output rises from low to high, the nMOS is off.
The situation is identical to the charge up condition of a CMOS gate with the
pMOS being biased with its gate at 0V. This gives
$$\tau_{rise} = \frac{C}{K_p (V_{dd} - V_{Tp})}\left[\frac{2V_{Tp}}{V_{dd} - V_{Tp}} + \ln\frac{V_{dd} + V_{oH} - 2V_{Tp}}{V_{dd} - V_{oH}}\right] \tag{3.9}$$
Fall Time
Analytical calculation of fall time is complicated by the fact that the pMOS load
continues to dump current in the output node, even as the nMOS tries to discharge
the output capacitor.
[Figure 3.1: high to low transition on the output]
Thus the nMOS should sink the discharge current as well as the drain current of the pMOS transistor. We make the simplifying assumption that the pMOS current remains constant at its saturation value through the entire discharge process. (This will result in a slightly pessimistic value of discharge time.) Then,
$$I_p = \frac{K_p}{2}(V_{dd} - V_{Tp})^2$$
We can write the KCL equation at the output node as:
$$I_n - I_p + C\frac{dV_o}{dt} = 0$$
which gives
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_{oL}} \frac{dV_o}{I_n - I_p}$$
We define $V_1 \equiv V_i - V_{Tn}$ and $V_2 \equiv V_{dd} - V_{Tp}$. The integration range can be divided into two regimes. The nMOS is saturated when $V_1 \le V_o < V_{dd}$ and is in the linear regime when $V_{oL} < V_o < V_1$. Therefore,
$$\frac{\tau_{fall}}{C} = -\int_{V_{dd}}^{V_1} \frac{dV_o}{\frac{1}{2}K_n V_1^2 - I_p} - \int_{V_1}^{V_{oL}} \frac{dV_o}{K_n\left(V_1 V_o - \frac{1}{2}V_o^2\right) - I_p}$$
so,
$$\frac{\tau_{fall}}{C} = \frac{V_{dd} - V_1}{\frac{1}{2}K_n V_1^2 - I_p} + \int_{V_{oL}}^{V_1} \frac{dV_o}{K_n\left(V_1 V_o - \frac{1}{2}V_o^2\right) - I_p}$$
3.1.4
We design the basic inverter first and then map the inverter design to other logic circuits. The load device size is calculated from the rise time. From eq. 3.9 we have
$$\tau_{rise} = \frac{C}{K_p (V_{dd} - V_{Tp})}\left[\frac{2V_{Tp}}{V_{dd} - V_{Tp}} + \ln\frac{V_{dd} + V_{oH} - 2V_{Tp}}{V_{dd} - V_{oH}}\right]$$
Given a value of $\tau_{rise}$, operating voltages and technological constants, $K_p$ and hence the geometry of the p channel transistor can be determined.
Geometry of the n channel transistor in the reference inverter design can be determined from static considerations. Using eq. 3.4, the output low level is given by:
$$V_o = (V_i - V_{Tn}) - \sqrt{(V_i - V_{Tn})^2 - (V_{dd} - V_{Tp})^2/\beta}$$
If the desired value of the output low level is given, we can calculate $\beta$. But $\beta \equiv K_n/K_p$ and $K_p$ is already known. This evaluates $K_n$ and hence the geometry of the n channel transistor.
[Figure 3.2: Pseudo NMOS implementation of $\overline{A.B + C.(D + E)}$]
3.1.5
Conversion of pseudo nMOS Inverter to other logic
Once the basic pseudo nMOS inverter is designed, other logic gates can be derived
from it. The procedure is the same as that for CMOS, except that it is applied
only to nMOS transistors. The p channel transistor is kept at the same size as
that for an inverter.
The logic is expressed as a sum of products with a bar (inversion) on top.
For every . in the expression, we put the corresponding n channel transistors in
series and for every +, we put the n channel transistors in parallel. We scale
the transistor widths up by the number of devices put in series. The geometries
are left untouched for devices put in parallel. Fig. 3.2 shows the implementation of $\overline{A.B + C.(D + E)}$ in pseudo NMOS logic design style.
3.2
This logic family is based on multiplexer logic.
Given a boolean function $F(x_1, x_2, \ldots, x_n)$, we can express it as:
$$F(x_1, x_2, \ldots, x_n) = x_i \cdot f_1 + \overline{x_i} \cdot f_2$$
where $f_1$ and $f_2$ are reduced expressions for $F$ with $x_i$ forced to 1 and 0 respectively. Thus, $F$ can be implemented with a multiplexer controlled by $x_i$ which selects $f_1$ or $f_2$ depending on $x_i$. $f_1$ and $f_2$ can themselves be decomposed into simpler expressions by the same technique.
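The decomposition can be checked with a few lines of code. This is only an illustrative sketch: the helper names (cofactors, mux, expand) and the majority function used as the example are assumptions made here, not part of the notes.

```python
from itertools import product

def cofactors(f, i):
    """Return (f1, f2): f with input i forced to 1 and to 0 respectively."""
    def f1(*xs):
        return f(*xs[:i], 1, *xs[i + 1:])
    def f2(*xs):
        return f(*xs[:i], 0, *xs[i + 1:])
    return f1, f2

def mux(sel, a, b):
    """Two-input multiplexer: a when sel is 1, b when sel is 0."""
    return a if sel else b

def expand(f, i):
    """Rebuild f as a multiplexer controlled by input i."""
    f1, f2 = cofactors(f, i)
    return lambda *xs: mux(xs[i], f1(*xs), f2(*xs))

def majority(a, b, c):
    """Example function: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)

rebuilt = expand(majority, 0)
assert all(rebuilt(*xs) == majority(*xs) for xs in product((0, 1), repeat=3))
```

Applying expand again to the cofactors corresponds to decomposing f1 and f2 with further multiplexers.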
To implement a multiplexer, we need both $x_i$ and $\overline{x_i}$. Therefore, this logic family needs all inputs in true as well as in complement form. In order to drive other gates of the same type, it must also produce its outputs in true and complement forms. Thus each signal is carried by two wires. This logic style is called Complementary Passgate Logic, or CPL for short.
[Figure 3.3: Basic Multiplexer with logic restoring inverters]
3.2.1
Basic Multiplexer Structure
Pure passgate logic contains no amplifying elements. Therefore, it has zero or negative noise margin (each logic stage degrades the logic level), and multiple logic stages cannot be cascaded. We shall assume that each stage includes conventional CMOS inverters to restore the logic level. Ideally, the multiplexer should be composed of complementary pass gate transistors. However, we shall use just n channel transistors as switches for simplicity.
This gives us the multiplexer structure shown in fig. 3.3.
3.2.2
Since both true and complement outputs are generated by CPL, we do not need
separate gates for AND and NAND functions. The same applies to OR-NOR, and
XOR-XNOR functions.
To take an example, let us consider the XOR-XNOR functions. Because of the inverter, the multiplexer for the XOR output first calculates the XNOR function given by $A.B + \overline{A}.\overline{B}$. If we put A = 1, this reduces to $B$ and for A = 0, it reduces to $\overline{B}$. Similarly, for the XNOR output, we generate the XOR expression $A.\overline{B} + \overline{A}.B$, which will be inverted by the logic level restoring inverter. The expression reduces to $\overline{B}$ for A = 1 and to $B$ for A = 0. This leads to an implementation of XOR-XNOR as shown in fig. 3.4.
[Figure 3.4: Implementation of XOR and XNOR by CPL logic.]

[Figure 3.5: Implementation of (a) AND-NAND and (b) OR-NOR functions using complementary passgate logic.]
Implementation of AND and OR functions is similar. In case of AND, the multiplexer should output $\overline{A.B}$, to be inverted by the buffer. This reduces to $\overline{B}$ when A = 1. When A = 0, it evaluates to 1 = $\overline{A}$. For the NAND output, the multiplexer should output $A.B$, which evaluates to $B$ for A = 1 and to 0 (or $A$) when A = 0.
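The multiplexer inputs described for the XOR-XNOR and AND-NAND gates can be verified by a small behavioural model. This is a logic-level sketch only (no electrical behaviour), and the function names are hypothetical.

```python
def cpl_stage(a, f1, f2):
    """Multiplexer selecting f1 (a = 1) or f2 (a = 0), followed by the restoring inverter."""
    y = f1 if a else f2
    return 1 - y

def cpl_xor(a, b):
    return cpl_stage(a, b, 1 - b)        # mux computes XNOR: B for A=1, B' for A=0

def cpl_xnor(a, b):
    return cpl_stage(a, 1 - b, b)        # mux computes XOR: B' for A=1, B for A=0

def cpl_and(a, b):
    return cpl_stage(a, 1 - b, 1 - a)    # mux computes NAND: B' for A=1, A' (= 1) for A=0

def cpl_nand(a, b):
    return cpl_stage(a, b, a)            # mux computes AND: B for A=1, A (= 0) for A=0

for a in (0, 1):
    for b in (0, 1):
        assert cpl_xor(a, b) == a ^ b
        assert cpl_xnor(a, b) == 1 - (a ^ b)
        assert cpl_and(a, b) == (a & b)
        assert cpl_nand(a, b) == 1 - (a & b)
```

Each gate needs only signals that are already available in true and complement form, which is why no separate AND and NAND cells are required.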
3.2.3
Buffer Leakage Current
The circuit configuration described above uses nMOS multiplexers. This limits the high output of the multiplexer (node y, which is the input for the inverter) to $V_{dd} - V_{Tn}$. Consequently, the pMOS transistor in the buffer inverter never quite turns off. This results in static power consumption in the inverter.
[Figure 3.6: High leakage current in inverter]
This can be avoided by adding a pull up pMOS as shown in fig. 3.7. When the multiplexer output (y) is low, the inverter output is high. The pMOS is therefore off and has no effect. When the multiplexer output goes high, the inverter input charges up, the output starts falling and turns the pMOS on. Now, as the multiplexer output (y) approaches $V_{dd} - V_{Tn}$, the nMOS switch in the multiplexer turns off. However, the pMOS pull up remains on and takes the inverter input all the way to $V_{dd}$. This avoids leakage in the inverter.
[Figure 3.7: Pull up pMOS to avoid leakage in the inverter]
However, this solution brings up another problem. Consider the equivalent circuit when the inverter output is low and the pMOS is on. Now if the multiplexer output wants to go low, it has to fight the pMOS pull up, which is trying to keep this node high.
[Figure 3.8: Problem with a low to high transition on the output]
In fact, the multiplexer n transistor and the pull up p transistor constitute a pseudo nMOS inverter. Therefore, the multiplexer output cannot be pulled low unless the transistor geometries are appropriately ratioed.
3.3
We can understand this logic configuration as an attempt to improve pseudo-nMOS logic circuits. Consider the NOR gate shown below:
[Figure 3.9: Pseudo-nMOS NOR]
Static power is consumed by this NOR circuit whenever the output is LOW. This happens when A OR B is TRUE. We wish that the pMOS could be turned off for just this combination of inputs.
To turn the pMOS transistor off, we need to apply a HIGH voltage level to its gate whenever A OR B is true. This obviously requires an OR gate. Non-inverting gates cannot be made in a single stage. However, we can create the OR function by using a NAND of $\overline{A}$ and $\overline{B}$, as shown in figure 3.10.
[Figure 3.10: Pseudo-nMOS OR from complemented inputs]
But then what about the pMOS drive of this circuit?
We want to turn the pMOS of this OR circuit off when both $\overline{A}$ and $\overline{B}$ are HIGH; i.e. when A = B = 0. This means we would like to turn the pMOS of this circuit off when the NOR of A and B is TRUE.
But we already have this signal as the output of the first (NOR) circuit! So the two circuits can drive each other's pMOS transistors and avoid static power consumption. This kind of logic is called Cascade Voltage Switch Logic (CVSL).
[Figure 3.11: OR-NOR implementation in Cascade Voltage Switch Logic]
It can use any network $f$ and its complementary network $\overline{f}$ in the two cross-coupled branches. The complementary network is constructed by changing all series connections in $f$ to parallel and all parallel connections to series, and complementing all input signals.
CVSL shares many characteristics with static CMOS, CPL and pseudo-nMOS.
Like CMOS static logic, there is no static power consumption.
Like CPL, this logic requires both True and Complement signals. It also
provides both True and complement outputs. (Dual Rail Logic).
Like pseudo nMOS, the inputs present a single transistor load to the driving
stage.
The circuit is self latching. This reduces ratioing requirements.
3.4
Dynamic Logic
In this style of logic, some nodes are required to hold their logic value as a charge stored on a capacitor. These nodes are not connected to their drivers permanently. The driver places the logic value on them, and is then disconnected from the node. Due to leakage etc., the logic value cannot be held indefinitely. Dynamic circuits therefore require a minimum clock frequency to operate correctly. Use of dynamic circuits can reduce circuit complexity and power consumption substantially.
[Figure 3.12: CMOS dynamic gate to implement $\overline{(A + B).C}$]
When the clock is low, the pMOS is on and the bottom nMOS is off. The output is pre-charged to 1 unconditionally. When the clock goes high, the pMOS turns off and the bottom nMOS comes on. The circuit then conditionally discharges the output node if (A+B).C is TRUE. This implements the function $\overline{(A + B).C}$.
3.4.1
Problem with Cascading CMOS dynamic logic
There is no problem when (A+B).C is false. X pre-charges to 1 and remains at 1.
[Figure: dynamic gate with internal node X and output Out; waveforms of Ck, X and Out for (A+B).C = FALSE and for (A+B).C = TRUE]
When (A+B).C is TRUE, X takes some time to discharge. During this time, charge placed on the output leaks away, as the input to the nMOS of the inverter is not 0.
3.4.2

[Figure 3.13: CMOS 4 phase dynamic logic]
The problem can be solved by using a 4 phase clock. The idea is to sample the previous stage only after its evaluation is complete.
In phase 1, node P is pre-charged. In phase 2, P as well as the output are pre-charged. In phase 3, the gate evaluates. In phases 4 and 1, the output is isolated from the driver and remains valid. This is called a type 3 gate. It evaluates in phase 3 and is valid in phases 4 and 1. Similarly, we can have type 4, type 1 and type 2 gates. A type 3 gate can drive a type 4 or a type 1 gate. Similarly, type 4 will drive types 1 and 2; type 1 will drive types 2 and 3; and type 2 will drive types 3 and 4. We can use a 2 phase clock if we stick to type 1 and type 3 gates (or type 2 and type 4 gates) as these can drive each other.
[Figure 3.14: CMOS 4 phase dynamic logic drive constraints]
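The drive constraints follow a simple modular pattern: a gate of a given type can drive the next two types in cyclic order. A minimal sketch, with a hypothetical helper name:

```python
def can_drive(driver, load):
    """True if a type-`driver` gate may drive a type-`load` gate (types 1 to 4)."""
    return (load - driver) % 4 in (1, 2)

# Drive table stated in the text.
assert can_drive(3, 4) and can_drive(3, 1)
assert can_drive(4, 1) and can_drive(4, 2)
assert can_drive(1, 2) and can_drive(1, 3)
assert can_drive(2, 3) and can_drive(2, 4)

# Types 1 and 3 (or 2 and 4) can drive each other, which permits a 2 phase clock.
assert can_drive(1, 3) and can_drive(3, 1)
assert can_drive(2, 4) and can_drive(4, 2)

# A gate cannot drive its own type or the type just before it.
assert not can_drive(3, 3) and not can_drive(3, 2)
```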
3.4.3
Domino Logic
[Figure 3.15: CMOS domino logic]
Another way to eliminate the problem with cascading logic stages is to use a static inverter after the CMOS dynamic gate. Recall that the cascaded dynamic CMOS stage causes problems because the output is pre-charged to $V_{dd}$. If the final value is meant to be zero, the next stage nMOS to which the output is connected erroneously sees a one till the pre-charged output is brought down to zero. During this time, it ends up discharging its own pre-charged output, which it was not supposed to do. If an inverter is added, the output is held low before logic evaluation. If the final output is zero, there is no problem anyway. If the final output is supposed to be one, the next stage is erroneously held at zero for some time. However, this does not result in a false evaluation by the next stage. The only effect it can have is that the next stage starts its evaluation a little later. However, the addition of an inverter means that the logic is non-inverting. Therefore, it cannot be used to implement any arbitrary logic function.
3.4.4
Zipper logic
Instead of using an inverter, we can alternate n and p evaluation stages. The n stage is pre-charged high, but it drives a p stage. A high pre-charged stage will keep the p evaluation stage off, which will not cause any malfunction. The p stage will be pre-discharged to low, which is safe for driving n stages. This kind of logic is called zipper logic.
[Figure 3.16: Zipper logic. A, B, C must be from p stages. D and E must be from n stages.]
CMOS Mixed Signal Design

Part I: OpAmp Design
Dinesh Sharma
IIT Bombay, Mumbai
September 19, 2010

Introduction
Linear Mode of Operation
Analog circuits require the output voltage to be sensitive to the input voltage.
Digital logic requires the output to be insensitive to the exact input voltage.
[Figure: voltage transfer curve with the levels VOH, VOL, ViL and ViH marked]
Circuits need to be biased for operation in the linear regime.

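Single Transistor Amplifier
A Single Transistor Amplifier
[Figure: single transistor amplifier with current source load; input vi at the gate (Vg), output vo at the drain (Vd), drain current Id]
$$dI_d = \frac{\partial I_d}{\partial V_g}dV_g + \frac{\partial I_d}{\partial V_d}dV_d$$
$\partial I_d/\partial V_g = g_m$ (Transconductance)
$\partial I_d/\partial V_d = g_o$ (O/P conductance)
The current source load keeps the drain current constant. So
$$dI_d = 0 = g_m v_i + g_o v_o$$
Hence, the voltage gain ($A_o$) is
$$A_o = \frac{v_o}{v_i} = -\frac{g_m}{g_o} = -g_m r_o$$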

Transistor Characteristics
$g_m$ and $g_o$ depend on the transistor characteristics.
In saturation,
$$I_d \approx \frac{K}{2}(V_{gs} - V_T)^2$$
where $K$ is the conductivity factor given by:
$$K \equiv \mu C_{ox}\frac{W}{L} = K'\frac{W}{L}$$
$V_T$ is the threshold voltage,
$W$ and $L$ are the transistor width and length respectively,
$\mu$ is the mobility
and $C_{ox}$ is the gate oxide capacitance per unit area.

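Transconductance
Let $V_{GT} \equiv V_{gs} - V_T$. Then
$$I_d = \frac{K V_{GT}^2}{2}, \qquad V_{GT} = \sqrt{\frac{2I_d}{K}}$$
$$g_m = \frac{\partial I_d}{\partial V_g} = K V_{GT} = K'\frac{W}{L}V_{GT}$$
Also
$$g_m = K V_{GT} = K\sqrt{\frac{2I_d}{K}} = \sqrt{2KI_d} = \sqrt{2K'\frac{W}{L}I_d}$$
Similarly, $K = 2I_d/V_{GT}^2$. Therefore
$$g_m = K V_{GT} = \frac{2I_d}{V_{GT}^2}V_{GT} = \frac{2I_d}{V_{GT}}$$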

Which formula?
$$g_m = K'\frac{W}{L}V_{GT}, \qquad g_m = \sqrt{2K'\frac{W}{L}I_d}, \qquad g_m = \frac{2I_d}{V_{GT}}$$
To increase $g_m$, should we increase $V_{GT}$, or decrease it?
Is $g_m$ linearly dependent on transistor size? Dependent on its square root? Or is it independent of transistor size?
In fact, which formula should be applied depends on how the transistor is biased and sized. If size and $V_{GT}$ are known, the first formula applies. If the drain current and size are known, the second one does. If gate voltage and drain current are given and the transistor is accordingly sized, the third formula should be used.
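The three expressions for gm are the same quantity written in terms of different pairs of known variables. The sketch below checks this numerically under the square law model; the function names and the bias values are hypothetical.

```python
import math

def gm_from_size_and_vgt(k_prime, w_over_l, vgt):
    """gm = K' (W/L) VGT: use when size and overdrive are known."""
    return k_prime * w_over_l * vgt

def gm_from_size_and_current(k_prime, w_over_l, i_d):
    """gm = sqrt(2 K' (W/L) Id): use when size and drain current are known."""
    return math.sqrt(2 * k_prime * w_over_l * i_d)

def gm_from_current_and_vgt(i_d, vgt):
    """gm = 2 Id / VGT: use when drain current and overdrive are known."""
    return 2 * i_d / vgt

# Illustrative bias point (square law assumed).
K_PRIME, W_OVER_L, VGT = 150e-6, 5.2, 0.81
I_D = 0.5 * K_PRIME * W_OVER_L * VGT ** 2

g1 = gm_from_size_and_vgt(K_PRIME, W_OVER_L, VGT)
g2 = gm_from_size_and_current(K_PRIME, W_OVER_L, I_D)
g3 = gm_from_current_and_vgt(I_D, VGT)
print(g1, g2, g3)   # three equal values at a consistent bias point
```

The apparent contradiction between the formulas disappears once it is clear which two of size, VGT and Id are held fixed; the third then follows from the square law.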


Output conductance
Assuming a simple Early effect like model, we can write for $g_o$:
$$g_o \approx \lambda I_d / L$$
where $L$ is the channel length and $\lambda$ is a technology dependent parameter. In terms of geometry and $V_{GT}$, we can write:
$$g_o = \frac{\lambda K'}{2}\frac{W}{L^2}V_{GT}^2$$
The Early Voltage $V_A$ is $L/\lambda$. So,
$$g_o \approx I_d/V_A = \frac{K'}{2}\frac{W}{L}\frac{V_{GT}^2}{V_A}$$

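DC Voltage Gain
The magnitude of the voltage gain in terms of geometry and $V_{GT}$, with $\lambda$ the output conductance parameter ($g_o \approx \lambda I_d/L$):
$$|A_o| = \frac{2L}{\lambda V_{GT}}$$
In terms of drain current and geometry:
$$|A_o| = \frac{1}{\lambda}\sqrt{\frac{2K'WL}{I_d}}$$
Thus, if the transistor is biased at constant current, the DC gain is determined by the square root of the gate area.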

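AC Behaviour
[Figure: small signal equivalent circuit with Cg between G and S, Cgd between G and D, and the controlled source gm vi, ro and Co between D and S]
Applying KCL at the output node:
$$sC_{gd}(v_i - v_o) - g_m v_i - \frac{v_o}{r_o} - sC_o v_o = 0$$
$$v_i\,(sC_{gd} - g_m) - v_o\left(sC_{gd} + \frac{1}{r_o} + sC_o\right) = 0$$
So the AC gain is
$$A_1 = \frac{v_o}{v_i} = -g_m r_o\,\frac{1 - sC_{gd}/g_m}{1 + s r_o (C_{gd} + C_o)}$$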

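Bandwidth
$$A_1 = -g_m r_o\,\frac{1 - sC_{gd}/g_m}{1 + s r_o (C_{gd} + C_o)}$$
Let $C_{tot} \equiv C_{gd} + C_o$ and $A_o = -g_m r_o$. Then,
$$A_1 = A_o\,\frac{1 - sC_{gd}/g_m}{1 + s r_o C_{tot}}$$
Normally, $|sC_{gd}/g_m| \ll 1$ at the frequencies of interest. Therefore,
$$A_1 \approx \frac{A_o}{1 + s r_o C_{tot}}$$
This describes the frequency response of a system with one dominant pole. The bandwidth is given by $1/(r_o C_{tot})$.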

Gain Bandwidth Product
[Figure: gain (dB) versus frequency, with Ao, the Ao - 3 dB point at BW, and the 0 dB crossing at GBW marked]
$$GBW = g_m r_o \cdot \frac{1}{r_o C_{tot}} = \frac{g_m}{C_{tot}}$$
The gain bandwidth product (or the cutoff frequency) is independent of $r_o$.
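The single pole response can be evaluated numerically to confirm that the gain drops by 3 dB at the bandwidth and is close to unity at GBW = gm/Ctot. The sketch below assumes the dominant pole approximation (the Cgd zero is ignored); the function name and component values are illustrative.

```python
import math

def single_pole_gain(f_hz, gm, ro, ctot):
    """Complex gain -gm*ro / (1 + s*ro*Ctot) evaluated at s = j*2*pi*f."""
    s = 2j * math.pi * f_hz
    return -gm * ro / (1 + s * ro * ctot)

# Illustrative values.
GM, RO, CTOT = 628.3e-6, 1 / 12.7e-6, 1e-12

f_bw = 1 / (2 * math.pi * RO * CTOT)     # bandwidth in Hz
f_gbw = GM / (2 * math.pi * CTOT)        # gain bandwidth product in Hz

print(f"DC gain      : {abs(single_pole_gain(0, GM, RO, CTOT)):.2f}")
print(f"|A| at BW    : {abs(single_pole_gain(f_bw, GM, RO, CTOT)):.2f}")
print(f"|A| at GBW   : {abs(single_pole_gain(f_gbw, GM, RO, CTOT)):.4f}")
```

Changing RO moves the DC gain and the bandwidth in opposite directions but leaves f_gbw untouched, as the expression gm/Ctot indicates.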

Maximum GBW
GBW is maximum when there is no load connected and the load is entirely due to the device capacitance itself. Then the load capacitance is proportional to the device width:
$C_{tot} = \gamma W$, where $\gamma$ is a technological parameter.
$$GBW_{max} = \frac{g_m}{\gamma W} = \frac{K' V_{GT}}{\gamma L} = \frac{1}{\gamma}\sqrt{\frac{2K' I_d}{WL}} = \frac{2I_d}{\gamma W V_{GT}}$$

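Summary
Here $\lambda$ is the output conductance parameter ($g_o \approx \lambda I_d/L$) and $\gamma$ the capacitance per unit width ($C_{tot} = \gamma W$).
Free design variables: | $W, L, V_{GT}$ | $W, L, I_d$ | $L, V_{GT}, I_d$
$g_m$ | $K'\frac{W}{L}V_{GT}$ | $\sqrt{2K'\frac{W}{L}I_d}$ | $\frac{2I_d}{V_{GT}}$
$g_o$ | $\frac{\lambda K' W V_{GT}^2}{2L^2}$ | $\frac{\lambda I_d}{L}$ | $\frac{\lambda I_d}{L}$
$A_o$ | $\frac{2L}{\lambda V_{GT}}$ | $\frac{1}{\lambda}\sqrt{\frac{2K'WL}{I_d}}$ | $\frac{2L}{\lambda V_{GT}}$
GBW | $\frac{K' W V_{GT}}{L C_{tot}}$ | $\sqrt{\frac{2K' W I_d}{L}}\,\frac{1}{C_{tot}}$ | $\frac{2I_d}{V_{GT} C_{tot}}$
GBW max | $\frac{K' V_{GT}}{\gamma L}$ | $\frac{1}{\gamma}\sqrt{\frac{2K' I_d}{WL}}$ | $\frac{K' V_{GT}}{\gamma L}$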

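Technological Constraint
With $\lambda$ the output conductance parameter and $\gamma$ the capacitance per unit width:
$$A_o \cdot GBW_{max} = \frac{2L}{\lambda V_{GT}} \cdot \frac{K' V_{GT}}{\gamma L} = \frac{1}{\lambda}\sqrt{\frac{2K'WL}{I_d}} \cdot \frac{1}{\gamma}\sqrt{\frac{2K' I_d}{WL}}$$
So
$$A_o \cdot GBW_{max} = \frac{2K'}{\lambda\gamma}$$
Therefore, this quantity is a technological constant and the designer has no control over it.
What if an application requires a Gain-GBW product higher than this value?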

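Cascode Amplifier
[Figure: cascode amplifier; M1 (gate Vg1, input v in) with M2 (gate Vg2 at Vref) stacked on its drain Vd1; output v out at Vd2, drain current Id]
$$dI_d = g_{meq}\,dV_{g1} + g_{oeq}\,dV_{d2}$$
So $g_{meq} = \partial I_d/\partial V_{g1}$ with $dV_{d2} = 0$
and $g_{oeq} = \partial I_d/\partial V_{d2}$ with $dV_{g1} = 0$.
To calculate $g_{meq}$, we put a voltage source at the output node and calculate $\partial I_d/\partial V_{g1}$.
$g_{oeq}$ is calculated by putting a voltage source at $v_{g1}$ and calculating $\partial I_d/\partial V_{d2}$.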

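Equivalent gm of Cascode
$$g_{meq} = \frac{\partial I_d}{\partial V_{g1}} \quad \text{with } dV_{d2} = 0$$
With $dV_{d2} = 0$: $dV_{gs2} = -dV_{d1}$ and $dV_{ds2} = -dV_{d1}$.
$$i_d = g_{m1} v_{g1} + g_{o1} v_{d1}$$
$$i_d = -g_{m2} v_{d1} - g_{o2} v_{d1}$$
So
$$v_{d1} = -\frac{i_d}{g_{m2} + g_{o2}}, \qquad i_d = g_{m1} v_{g1} - i_d\,\frac{g_{o1}}{g_{m2} + g_{o2}}$$
$$g_{meq} = \frac{i_d}{v_{g1}} = g_{m1}\,\frac{g_{m2} + g_{o2}}{g_{o1} + g_{o2} + g_{m2}} \approx g_{m1}$$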

Equivalent go of Cascode
$$g_{oeq} = \frac{\partial I_d}{\partial V_{d2}} \quad \text{with } dV_{g1} = 0$$
$dV_{gs1} = 0$, $dV_{gs2} = -dV_{d1}$, $dV_{ds2} = dV_{d2} - dV_{d1}$
$$i_d = 0 + g_{o1} v_{d1}, \quad \text{so } v_{d1} = \frac{i_d}{g_{o1}}$$
$$i_d = -g_{m2} v_{d1} + g_{o2}(v_{d2} - v_{d1})$$
$$i_d = -i_d\,\frac{g_{m2} + g_{o2}}{g_{o1}} + g_{o2} v_{d2}$$
$$g_{oeq} = \frac{i_d}{v_{d2}} = \frac{g_{o1} g_{o2}}{g_{o1} + g_{o2} + g_{m2}} \approx g_{o1}\,\frac{g_{o2}}{g_{m2}}$$

DC gain of Cascode
$$A_o = \frac{g_{meq}}{g_{oeq}} = \frac{g_{m1}(g_{m2} + g_{o2})}{g_{o1} + g_{o2} + g_{m2}} \cdot \frac{g_{o1} + g_{o2} + g_{m2}}{g_{o1} g_{o2}}$$
So
$$A_o = \frac{g_{m1}(g_{m2} + g_{o2})}{g_{o1} g_{o2}} = \frac{g_{m1}}{g_{o1}}\left(1 + \frac{g_{m2}}{g_{o2}}\right)$$
Let $A_{01} \equiv g_{m1}/g_{o1}$ (common source gain)
and $A_{02} \equiv 1 + g_{m2}/g_{o2}$ (common gate gain).
Then, $A_o = A_{01} A_{02}$
DC gain = the product of the DC gains of the two transistors.

AC Behaviour of Cascode
[Figure: cascode amplifier and its small signal equivalent circuit, with Cg1, Cdg1, gm1 vi, ro1, the intermediate node vx, gm2 vx, ro2, Co and the output vo]
We shall see presently that $v_x$ is quite small.
Initially, we shall ignore the effect of the drain capacitance of the lower transistor and the gate capacitance of the upper one. If necessary, we can always replace $r_{o1}$ by $r_{o1} \parallel C_{ds1} \parallel C_{g2}$.

Applying KCL at the output node:
$$g_{m2} v_x + \frac{v_x - v_o}{r_{o2}} = sC_o v_o$$
$$v_x = \frac{1 + s r_{o2} C_o}{1 + g_{m2} r_{o2}}\,v_o = \frac{1 + s r_{o2} C_o}{A_2}\,v_o$$
where $A_2 \equiv 1 + g_{m2} r_{o2}$. Since $A_2$ is quite large, $v_x$ is very small compared to $v_o$.

Applying KCL at the node $v_x$:
$$sC_{dg1}(v_i - v_x) = g_{m1} v_i + \frac{v_x}{r_{o1}} + sC_o v_o$$
$$\frac{v_o}{v_i} = -\frac{(A_1 - s r_{o1} C_{dg1})A_2}{(1 + s r_{o2} C_o)(1 + s r_{o1} C_{dg1}) + A_2\,s C_o r_{o1}}$$
where $A_1 \equiv g_{m1} r_{o1}$ and $A_2 \equiv 1 + g_{m2} r_{o2}$. If $s r_{o1} C_{dg1}$ is small,
$$\text{Voltage gain} = \frac{v_o}{v_i} \approx -\frac{A_1 A_2}{1 + s r_{o1} C_o (A_2 + r_{o2}/r_{o1})}$$
This shows that the DC gain is multiplied by $A_2$ and the bandwidth is reduced by roughly the same factor.

Example Cascode Design
We want to design a cascode amplifier with the following specifications:
DC gain = 2500
Gain-Bandwidth product = 100 MHz
Load capacitance = 1 pF
The two transistors in cascode configuration have identical geometries and the load is an ideal current source.
Assume the following technological parameters:
$K'_n = 150\ \mu A/V^2$, $V_{Tn} = 0.5$ V, $V_E = 20$ V
Assume the supply voltage to be 3.3 V.

Calculation of gm
The gain bandwidth product is given by $g_m/C$. So,
$$2\pi \times 10^8 = \frac{g_m}{C} = \frac{g_{m1}}{10^{-12}}$$
So $g_{m1} = 628.3\ \mu S$.
Since the same current flows through the two transistors and they have the same geometry, $g_{m1} = g_{m2}$ and $g_{o1} = g_{o2}$.
Let $A = g_{m1}/g_{o1} = g_{m2}/g_{o2}$. Therefore,
$$2500 = \frac{g_{m1}}{g_{o1}}\left(1 + \frac{g_{m2}}{g_{o2}}\right) = A(A + 1)$$
This gives $A \approx 49.5$.

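Calculation of bias current and geometry
$$49.5 = \frac{628.3 \times 10^{-6}}{g_{o1}}, \quad \text{so } g_{o1} = 12.7\ \mu S$$
Therefore
$$g_{o1} = 12.7 \times 10^{-6} = \frac{I_d}{V_E} = \frac{I_d}{20}$$
from which the drain current is 254 µA.
Since $g_{m1} = \sqrt{2K'\frac{W}{L}I_d}$,
$$\frac{W}{L} = \frac{628.3^2 \times 10^{-12}}{300 \times 10^{-6} \times 254 \times 10^{-6}} \approx 5.2$$
Therefore $g_m = 628.3\ \mu S$, $W/L = 5.2$, $I_d = 254\ \mu A$.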

Bias Voltages
$$I_d = \frac{1}{2}K'\frac{W}{L}V_{GT}^2$$
So
$$V_{GT} = \sqrt{\frac{2 \times 254}{150 \times 5.2}} = 0.81\ V$$
$V_{g1} \equiv V_{Tn} + V_{GT} = 0.5 + 0.81 = 1.31$ V
M1 will be in saturation when $V_{d1} = V_{S2} \ge 0.81$ V.
So $V_{g2} \ge 0.81 + 0.5 + 0.81 = 2.12$ V.
For M2 to be in saturation, $V_{d2} \ge 2.12 - 0.5 = 1.62$ V.
Thus the maximum output swing is from 1.62 V to $V_{dd}$.
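The arithmetic of this design example can be retraced in a few lines. This is a sketch under the same assumptions as the slides (square law devices, identical transistors, ideal current source load, go = Id/VE); the function name design_cascode and the dictionary keys are hypothetical.

```python
import math

def design_cascode(dc_gain, gbw_hz, c_load, k_prime, vt, v_early):
    """Retrace the example: gm from GBW, A from A(A+1) = gain, then Id, W/L and bias."""
    gm = 2 * math.pi * gbw_hz * c_load               # GBW = gm / C
    a = (-1 + math.sqrt(1 + 4 * dc_gain)) / 2        # positive root of A(A + 1) = dc_gain
    go = gm / a                                      # A = gm / go
    i_d = go * v_early                               # go = Id / VE
    w_over_l = gm ** 2 / (2 * k_prime * i_d)         # gm = sqrt(2 K' (W/L) Id)
    vgt = math.sqrt(2 * i_d / (k_prime * w_over_l))  # Id = K' (W/L) VGT^2 / 2
    vg2_min = vgt + vt + vgt                         # keeps M1 saturated
    return {
        "gm": gm, "A": a, "go": go, "Id": i_d, "W/L": w_over_l, "VGT": vgt,
        "Vg1": vt + vgt, "Vg2_min": vg2_min, "Vd2_min": vg2_min - vt,
    }

result = design_cascode(dc_gain=2500, gbw_hz=100e6, c_load=1e-12,
                        k_prime=150e-6, vt=0.5, v_early=20.0)
for name, value in result.items():
    print(f"{name:8s} = {value:.4g}")
```

The computed values agree with the slide figures to within the rounding used there (for instance, the unrounded minimum output voltage comes out slightly below 1.62 V).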

DC level incompatibility
The output DC level of a cascode amplifier is higher than the input DC level. This causes problems with direct connection to the next stage, or with DC feedback to itself.
These problems can be reduced if we use a complementary arrangement of n and p channel transistors for cascoding.
The upper transistor of the cascode arrangement can be thought of as a source follower to its bias voltage, which keeps the drain voltage of the lower amplifier transistor (nearly) constant.
[Figure: n channel cascode, with labels Vdd, Load, Vbiasn, Vout, Vin and Gnd]
Can we use a p channel transistor as a source follower?

Alternative Cascode
The p source follower will keep the drain voltage of the amplifier at $V_{biasp} + |V_{Tp}|$, allowing the cascode action as before.
Unfortunately, the circuit won't work as there is no path between Vdd and ground!
We can rectify this problem by providing a current source p load to the amplifier transistor M1.
[Figure: alternative cascode, with labels Vdd, M1, Gnd, M2, Vbiasp, Vout and Load]

Folded Cascode
[Figure: folded cascode, with labels Vdd, Vbiasp1, M3, M2, Vin, M1, Gnd, Vbiasp2, Vout and Load]
This arrangement is called a folded cascode.
M3 provides the bias current.
M2 and M3 keep the drain voltage of M1 nearly fixed.
Id3 - Id1 flows through the p channel cascoding transistor M2, which provides amplification in a common gate configuration.
$$r_{out} = (1 + g_{m2} r_{o2})(r_{o1} \parallel r_{o3}) + r_{o2}$$
This is lower than the output resistance of the telescopic cascode stage, because of the paralleling of $r_{o1}$ and $r_{o3}$. However, it is much higher than the single transistor output resistance.

Current Mirrors
Current Source Loads
Up to now we have assumed current source loads. How do we implement these?
A transistor in saturation has a (nearly) constant drain current.
Therefore single transistors (preferably with long channels) can be used as current sources/sinks.
These act as current sources/sinks only over some voltage range, not for all voltages.
There is a weak dependence on voltage due to nonzero output conductance.
This dependence can be reduced by using a cascode stage.

A simple Current Mirror
[Figure: simple current mirror, with labels Iref, Io, M1, Vref and M2]
For M1, $V_{ds} = V_{gs} > V_{gs} - V_T$. Therefore M1 is saturated.
$$I_{ref} = \frac{K}{2}(V_{ref} - V_T)^2$$
Therefore
$$V_{ref} = V_T + \sqrt{\frac{2I_{ref}}{K}}$$
If M2 is also saturated, $I_o = I_{ref}$.
Thus M2 can act as a current source load if $V_o > V_{ref} - V_T$, i.e.
$$V_o > \sqrt{\frac{2I_{ref}}{K}}$$

Load for a Cascode stage
[Figure: cascode stage and its load, with labels Vdd, Vbiasp1, Vbiasp2, Vbiasn, Vin, Gnd and Vout]
The output resistance of the load appears in parallel with that of the amplifying stage.
If we use a single transistor current load for a cascode, the output resistance of the load will be $r_o$ while that of the cascode stage will be $A\,r_o$.
The effective output resistance will thus be dominated by the much lower resistance of the load and we shall lose the advantages of the cascode stage.
It is important, therefore, that the load should also be a current source made from a cascode pair.

A cascode current mirror
[Figure: cascode current mirror, with labels Iref, Vx, M1, Io, Vb, M3, Vy, Vref and M2]
A single transistor current mirror will have some dependence on the drain voltage due to its output resistance.
This dependence can be reduced substantially by using a cascode stage.
However, this reduces the available voltage range over which the transistors are saturated.
For saturation of M2:
$$V_y \ge V_{ref} - V_T = \sqrt{\frac{2I_{ref}}{K}}$$
Therefore
$$V_b \ge 2\sqrt{\frac{2I_{ref}}{K}} + V_T$$
For saturation of M3:
$$V_o \ge 2\sqrt{\frac{2I_{ref}}{K}}$$

Self biased Cascode current mirror
[Figure: self biased cascode current mirror, with labels Iref, Io, M0, M3, Vb, Vx, Vy, M1, Vref and M2]
This circuit does not need an external voltage bias.
The reference side of the mirror generates the bias voltages for both the transistors of the cascode output side.
However, this reduces the voltage range over which the output may swing.
$$V_b = 2\sqrt{\frac{2I_{ref}}{K}} + 2V_T$$
For saturation of M3:
$$V_o \ge 2\sqrt{\frac{2I_{ref}}{K}} + V_T$$
The output voltage needs to be a $V_T$ higher than the minimum.
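The headroom cost of each mirror can be compared numerically. The sketch below uses the saturation conditions quoted for the simple, externally biased cascode and self biased cascode mirrors; the function names and the values of Iref, K and VT are hypothetical.

```python
import math

def overdrive(i_ref, k):
    """Overdrive voltage sqrt(2 Iref / K) of a transistor carrying Iref."""
    return math.sqrt(2 * i_ref / k)

def min_vout_simple(i_ref, k):
    """Simple mirror: Vo > sqrt(2 Iref / K)."""
    return overdrive(i_ref, k)

def min_vout_cascode(i_ref, k):
    """Cascode mirror with the minimum external bias Vb: Vo >= 2 sqrt(2 Iref / K)."""
    return 2 * overdrive(i_ref, k)

def min_vout_self_biased(i_ref, k, vt):
    """Self biased cascode mirror: Vo >= 2 sqrt(2 Iref / K) + VT."""
    return 2 * overdrive(i_ref, k) + vt

# Illustrative values only.
I_REF, K, VT = 100e-6, 800e-6, 0.5
print(f"simple       : {min_vout_simple(I_REF, K):.2f} V")
print(f"cascode      : {min_vout_cascode(I_REF, K):.2f} V")
print(f"self biased  : {min_vout_self_biased(I_REF, K, VT):.2f} V")
```

With these numbers the self biased version gives up one full threshold voltage of output swing compared with the optimally biased cascode.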

Folded Cascode with load
[Figure: folded cascode with cascode load, with labels Vdd, M3, Vbiasp1, M2, Vbiasp2, Vout, Vbiasn2, Vin, M1, Vbiasn1 and Gnd]
The load for the folded cascode should also be a cascode pair.
Here two n channel transistors in cascode configuration are used as the load.
One major advantage of the folded cascode is that the output can be directly coupled to the input for negative feedback.
The single transistor amplifier can be replaced by any transconductance, of course. In operational amplifiers, the single transistor stage will be replaced by a differential amplifier.

Operational Amplifiers
Differential Amplifiers
Circuits which amplify the difference of two input voltages (each of which has equal and opposite signal excursions) have many advantages over single ended amplifiers.
Noise picked up by both inputs gets canceled in the output.
Input and feedback paths can be isolated.
If both inputs have the same DC bias, the output is
insensitive to changes in the bias.

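Some definitions
It is more convenient to represent the two input voltages and the two output voltages by their mean and difference values.
$$v_{id} \equiv v_{i1} - v_{i2}, \qquad v_{icm} \equiv \frac{v_{i1} + v_{i2}}{2}$$
$$v_{od} \equiv v_{o1} - v_{o2}, \qquad v_{ocm} \equiv \frac{v_{o1} + v_{o2}}{2}$$
The common mode and differential gains are:
$$A_{diff} \equiv \frac{v_{od}}{v_{id}}, \qquad A_{cm} \equiv \frac{v_{ocm}}{v_{icm}}$$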

Common Mode Rejection Ratio
For a good diff amp, the differential gain should be high and independent of input common mode voltage, whereas the common mode gain should be as low as possible. The common mode rejection ratio is:
$$CMRR \equiv 20 \log \frac{A_{diff}}{A_{cm}}\ \text{dB}$$
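The mean and difference representation and the CMRR definition translate directly into code. A minimal sketch, with hypothetical function names and illustrative gain values:

```python
import math

def to_diff_and_cm(v1, v2):
    """Return (differential, common mode) components of two node voltages."""
    return v1 - v2, (v1 + v2) / 2

def cmrr_db(a_diff, a_cm):
    """Common mode rejection ratio in dB: 20 log10(|Adiff / Acm|)."""
    return 20 * math.log10(abs(a_diff / a_cm))

# Two inputs riding on a 1 V common mode level with a 20 mV difference.
vid, vicm = to_diff_and_cm(1.01, 0.99)
print(f"vid = {vid:.3f} V, vicm = {vicm:.3f} V")

# Illustrative gains: Adiff = 100, Acm = 0.1.
print(f"CMRR = {cmrr_db(100.0, 0.1):.1f} dB")
```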

Will this do?
[Figure: two single ended amplifiers sharing Vdd, with inputs vi1, vi2 and outputs vo1, vo2]
One (not very good) way of implementing a diff amp is to use two single ended amplifiers as shown above.
Output = $V_{o1} - V_{o2}$
Here the transistor currents, and hence the differential gain, will depend on the common mode voltage. This is not desirable as we would like the circuit to ignore the common mode voltage and to amplify just the difference signal.

The long tail pair
A better diff amp can be implemented by adding a current source to keep the total current constant.
[Figure: long tail pair, with inputs vi1, vi2, outputs vo1, vo2, common source node Vs and tail current source Is]
If the common mode voltage appearing at the two inputs changes, it will only change the voltage at the node where the two sources join ($V_s$). However, the current remains unchanged due to the current source, and therefore the differential gain is unaffected by the common mode voltage.

Diff amp with single ended output

[Figure: diff amp with current mirror load; labels Vdd, Mp1, Mp2, Mn1, Mn2, vi1, vi2, Vs, Is, iout]
$I(Mp2) = I(Mp1)$ (current mirror)
$I(Mp1) = I(Mn1)$ (series connection)
$i_{out} = I(Mp2) - I(Mn2)$
$i_{out} = I(Mn1) - I(Mn2) = g_m (v_{i1} - v_{i2})$
$i_{out} \equiv G_m (v_{i1} - v_{i2}) = G_m v_{id}$
Thus we have a single output which is proportional to the difference of inputs.
The effective $G_m$ is just the $g_m$ of either of the diff-pair transistors.

Gain of the OTA

[Figure: diff amp with current mirror load; labels Vdd, Mp1, Mp2, Mn1, Mn2, vi1, vi2, Vs, Is, iout]
This circuit is also called an operational transconductance amplifier (OTA) because the output is a current.
$R_{out} = r_o(Mn2) \parallel r_o(Mp2)$
So DC voltage gain $= g_m \, (r_o(Mn2) \parallel r_o(Mp2))$
and $GBW = \frac{g_m}{C_L}$
$C_L$ includes $C_{dg}$ and $C_d$ for Mn2 and Mp2, as well as the load capacitance.

The two stage op-amp
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
A simple two stage op-amp can be constructed by following the diff amp by a common source stage with a constant current load.
The current source for the diff amp is implemented by an n channel MOS transistor in saturation.
The two stage design permits us to optimize the output stage for driving the load and the input stage for providing good differential gain and CMRR.
A diff amp with n transistors and an output stage with p driver is shown. However, a p type diff amp with n type common source stage is better for low noise operation.

op-amp eq. circuit

[Figure: equivalent circuit. Differential stage: source gm1 v1 driving R1 and C1 at node v2. Output stage: source gm2 v2 driving R2 and C2 at node v0]
Each stage of the opamp can be considered a gain stage with a single pole frequency response.
Notice that the phase of the output of each stage will undergo a phase change of 90° around its pole frequency.

op-amp Compensation
Most opamps are used with negative feedback.
If the opamp stages themselves contribute a phase difference of 180°, the negative feedback will appear as positive feedback.
If the gain at this frequency is > 1, the circuit will become unstable.
Both stages of the opamp have a single pole frequency response.
The poles for both the stages can be quite close together.
As a result, they can contribute a total of 180° phase shift over a relatively narrow frequency range.

Pole Splitting
To avoid instability, we would like to arrange things such that the gain drops to below one by the time the phase shift through the opamp becomes 180°, even if it means that we have to reduce the bandwidth of the op amp.
This is often achieved by a technique called pole splitting.
The lower frequency pole is brought to a low enough frequency, so that the gain diminishes to below one by the time the second pole is reached.
One way of doing this is to use a Miller capacitor.

Eq. Circuit of compensated Opamp
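[Figure: equivalent circuit of the compensated opamp. Differential stage (gm1 v1, R1, C1) at node v2 and output stage (gm2 v2, R2, C2) at node v0, with the compensation capacitor Cc between v2 and v0]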

Miller Compensation
[Figure: gain stages A1 and A2 with Miller capacitor C]
The diff amp stage sees a load capacitance $A_2 C$.
This brings its pole to $\frac{1}{r_{o1} A_2 C}$.
The total DC gain is $A_1 A_2$.
The bandwidth is set by the diff amp stage.
Therefore the gain-bandwidth product is:
$A_1 A_2 \cdot \frac{1}{r_{o1} A_2 C} = \frac{A_1}{r_{o1} C}$
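The gain-bandwidth product can be tied back to the transconductance of the input stage. The short derivation below is a sketch that assumes the first stage DC gain is $A_1 = g_{m1} r_{o1}$; the symbol $g_{m1}$ (transconductance of the differential stage) is introduced here only for illustration.

```latex
\mathrm{GBW} = \frac{A_1}{r_{o1} C}
             = \frac{g_{m1}\, r_{o1}}{r_{o1} C}
             = \frac{g_{m1}}{C}
```

Under this assumption the output resistance of the first stage cancels, which matches the relation GBW = gm(Mn2)/C used in the design equations.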

Slew rate
Miller compensation also sets the slew rate of the op amp.
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
For large signal input, the output current of the OTA = tail current.
The effective load capacitance for this stage is $A_2 C$.
$\frac{dV}{dt} = \frac{I(Mn4)}{A_2 C}$
Output of the OTA slews at a rate $\frac{I(Mn4)}{A_2 C}$.
So the op amp slews at a rate which is $A_2$ times this value.
Hence the slew rate of the op amp is $\frac{I(Mn4)}{C}$.

Design Equations-I
All transistors must be saturated.
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
$I(Mn1) = I(Mn2) = \frac{I(Mn4)}{2}$
$I(Mn1) = I(Mp1)$ (series connection)
$I(Mp1) = I(Mp2)$ (mirror)
Mp1 is always saturated.
Mp1, Mp2 have the same Vs, Vg, Id.
Since W/L(Mp2) = W/L(Mp1), Mp2 will have the same Vd as Mp1, and so, will be saturated.

Design Equations-II
Mp3 has the same Vs, Vg as Mp1.
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
If $\frac{I(Mp3)}{I(Mp1)} = \frac{W/L(Mp3)}{W/L(Mp1)}$, Mp3 will have the same Vd as Mp1 and will be saturated.
The slew rate determines I(Mn4).
$I(Mn4) = C \times \text{Slew Rate}$
$I(Mn1) = I(Mn2) = \frac{I(Mn4)}{2}$

Design Equations-III
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
GBW determines gm of Mn1, Mn2.
$GBW = \frac{g_m(Mn2)}{C}$
The current as well as gm of Mn1 and Mn2 are now known, and
$g_m(Mn2) = \sqrt{2 K \; W/L(Mn2) \; I(Mn2)}$
$W/L(Mn1) = W/L(Mn2)$
This will determine the geometries of Mn1 and Mn2.

Design Equations-IV
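Currents through Mn2, Mp2, Mp3 and Mn3 are known, so their output conductances are known too ($g_o = I_d / V_A$), where $V_A$ is the Early voltage $= L/\lambda$.
The overall DC gain is given by
$A = \frac{g_m(Mn2) \, g_m(Mp3)}{(g_o(Mn2) \parallel g_o(Mp2)) \, (g_o(Mp3) \parallel g_o(Mn3))}$
Here $\parallel$ denotes the parallel combination, so the two conductances add.
As gm for Mn2 and all go values are known, this determines the gm for Mp3.
Once we know the gm as well as the current for Mp3, we can calculate its geometry.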

Example Design: Specifications
K(n) = 120 μA/V², K(p) = 60 μA/V²
VT(n) = 0.4 V, VT(p) = -0.4 V
Early voltage VA = 20 V for both p and n channel transistors
Op amp DC gain = 80 dB (voltage gain of 10000)
Gain bandwidth product = 50 MHz, slew rate = 20 V/μs

Example Design-1
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
We choose a compensation capacitor value of 2 pF.
We shall bias the second stage at 5 times the tail current of the differential stage.
From the slew rate, $I(Mn4) = 2 \times 10^{-12} \times \frac{20}{10^{-6}} = 40\,\mu A$
Therefore I(Mn1) = I(Mn2) = I(Mp1) = I(Mp2) = 20 μA
and I(Mp3) = I(Mn3) = 200 μA

Example Design-2
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
From the GBW requirement,
$2\pi \times 50 \times 10^{6} = \frac{g_m(Mn2)}{2 \times 10^{-12}}$
This gives gm(Mn2) ≈ 628 μS.
To get a gm of 628 μS with a current of 20 μA,
$628 \times 10^{-6} = \sqrt{2 \times 120 \times 10^{-6} \times (W/L) \times 20 \times 10^{-6}}$
this gives W/L(Mn2) ≈ 82 = W/L(Mn1)

Example Design-3
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
go of Mn2 and Mp2 = 20 μA / 20 V = 1 μS.
Therefore go(Mn2) ∥ go(Mp2) = 2 μS.
go of Mn3 and Mp3 = 200 μA / 20 V = 10 μS.
Therefore go(Mp3) ∥ go(Mn3) = 20 μS.
DC gain $= 10000 = \frac{628}{2} \times \frac{g_m(Mp3)}{20}$ (all conductances in μS)
So, gm(Mp3) ≈ 637 μS

Example Design-4
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
To get a gm of 637 μS with a drain current of 200 μA, we should have
$637 \times 10^{-6} = \sqrt{2 \times 60 \times 10^{-6} \times (W/L) \times 200 \times 10^{-6}}$
which gives the W/L of Mp3 ≈ 17.
Since the geometry of Mp1 and Mp2 has to be in the current ratio with Mp3, W/L of Mp1 and Mp2 should be 1.7.

Example Design-5
[Figure: two stage op-amp; labels Vdd, Mp1, Mp2, Mp3, Mn1, Mn2, Mn3, Mn4, Vbias, vi1, vi2, Vs, iout, vout]
Finally, we assume that an n type reference bias transistor of W/L = 4 is available with a current of 10 μA. This will give the W/L of Mn4 and Mn3 as 16 and 80 respectively.
This completes the design for the simple two stage op amp.
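The design steps above can be retraced numerically. The following Python sketch simply chains the design equations of these notes in order; the function name, argument names and dictionary keys are assumptions made for this illustration, and the square law model is the one used in the notes.

```python
import math

def design_two_stage(c_comp, slew_rate, gbw_hz, dc_gain, v_early,
                     k_n, k_p, stage2_ratio, i_ref, wl_ref):
    """Size a simple two stage op-amp; all arguments are in SI units."""
    i_tail = c_comp * slew_rate                 # I(Mn4) = C x slew rate
    i_half = i_tail / 2.0                       # I(Mn1) = I(Mn2) = I(Mp1) = I(Mp2)
    i_out = stage2_ratio * i_tail               # I(Mp3) = I(Mn3)
    gm_n2 = 2.0 * math.pi * gbw_hz * c_comp     # GBW = gm(Mn2) / C
    wl_n12 = gm_n2 ** 2 / (2.0 * k_n * i_half)  # from gm = sqrt(2 K (W/L) I)
    go_stage1 = 2.0 * i_half / v_early          # go(Mn2) + go(Mp2)
    go_stage2 = 2.0 * i_out / v_early           # go(Mp3) + go(Mn3)
    gm_p3 = dc_gain * go_stage1 * go_stage2 / gm_n2
    wl_p3 = gm_p3 ** 2 / (2.0 * k_p * i_out)
    return {
        "I(Mn4)": i_tail,
        "I(Mp3)": i_out,
        "gm(Mn2)": gm_n2,
        "gm(Mp3)": gm_p3,
        "W/L(Mn1,Mn2)": wl_n12,
        "W/L(Mp3)": wl_p3,
        "W/L(Mp1,Mp2)": wl_p3 * i_half / i_out,  # scaled by the current ratio
        "W/L(Mn4)": wl_ref * i_tail / i_ref,     # mirrored from the reference
        "W/L(Mn3)": wl_ref * i_out / i_ref,
    }

# Specifications of the example design (2 pF, 20 V/us, 50 MHz, 80 dB).
design = design_two_stage(c_comp=2e-12, slew_rate=20e6, gbw_hz=50e6,
                          dc_gain=1e4, v_early=20.0, k_n=120e-6, k_p=60e-6,
                          stage2_ratio=5, i_ref=10e-6, wl_ref=4.0)
for name, value in design.items():
    print(f"{name:14s} = {value:.4g}")
```

Keeping the steps in one function makes it easy to see how a change in one specification (for instance the slew rate) propagates to every transistor geometry.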

Cascode Opamps
Telescopic Cascode Opamp
[Figure: telescopic cascode opamp; labels Vdd, Vbiasp1, Vbiasp2, Vbiasn1, Vbiasn2, Vin+, Vin-, Vout, Gnd]
The telescopic cascode is a differential version of the cascode amplifier discussed earlier.
Its gain is comparable to the two stage op-amp.
The output impedance is (very) high!
The output impedance in conjunction with the load capacitance constitutes the dominant pole of the system.

Gain is comparable to the two stage opamp (product of two
single stage amplifiers).
It needs a higher supply voltage compared to a two stage
opamp.
The output stage is high impedance, so the dominant pole
is at the output.
Compensation is provided by the load capacitance. So a
minimum value of load capacitance is required for stability.
The output common mode voltage is different from the
input common mode voltage range.
This presents difficulties in direct coupling to the next stage
and DC feedback to its own input.

Cascode Opamps
Folded Cascode
The common mode voltage incompatibility of a telescopic
cascode can be solved by using a folded cascode.
[Figure: folded cascode opamp; labels Vdd, Vbiasp1, Vbiasp2, Vbiasn1, Vbiasn2, Vin+, Vin-, Vout, Gnd]

Push Pull Output Stage
Push-Pull Op Amp
Differential to single ended conversion can be done in the output stage, by using a push-pull driver. The output loads in the differential stage (Mp1 and Mp2) are diode connected.
[Figure: push-pull op amp; labels Vdd, Mp1, Mp2, Mp3, Mp4, Mn1, Mn2, Mn3, Mn4, Mn5, vi+, vi-, Vs, Vbias, Out, Gnd]
Current through Mp2 is mirrored in the output p transistor Mp4.
Current through Mp1 is mirrored into a pMOS (Mp3) and passed through a diode connected nMOS (Mn3). This current is mirrored in the output stage nMOS (Mn4).
Mirroring ratio of Mp4 to Mp2 and Mn4 to Mn3 should be identical (and can be large).
Pipeline Optimization
Dinesh Sharma
IIT Bombay, Mumbai
2006
Von Neumann Architecture
[Figure: Von Neumann architecture; labels State, Data Processing, Instruction Processing, Data, Instructions, Bus, Memory]
A common bus is used for data as well as instructions.
The system can become bus bound.
Bottleneck!
Harvard Architecture
[Figure: Harvard architecture; labels State, Data Processing, Instruction Processing, Data, Instructions, Data Memory, Instruction Memory]
Separate data and instruction paths
Good performance
Needs 2 buses: expensive!
Traffic on the buses is not balanced.
Instruction bus may remain idle.
Modified Harvard Architecture
[Figure: modified Harvard architecture; labels State, Data Processing, Instruction Processing, Data, Instructions, Constants, MUX, Data Memory, Read Only Memory]
Better bus balancing is possible.
Constants can be stored with instructions in ROM.
Typically, 1 instruction read, 1 constant read, 1 data read and 1 result write per instruction.
2 mem ops per bus.
Modified Harvard with Cache
[Figure: modified Harvard architecture with cache; labels State, Data Processing, Instruction Processing, Data, Instructions, Constants, Cache, MUX, Data Memory, Read Only Memory]
Each operation need not be balanced individually.
Cache allows optimum utilization of bus bandwidths.
Instruction and Data State Machines
[Figure: state diagrams of the instruction processor and the data processor; state labels include Address From PC, Req. Instr., Recv. Instr., Recv State From DP, Operand Addr to DP, Decode and Send to DP, Request Operands, Receive Oper. Addr, Receive Instruction, Receive Operands, Execute Instruction, Store Results, Return State]
Operation of the system may be modeled as two interacting state machines.
Instruction processor fetches instr, decodes and gives operation type and operand locations to data processor.
Data processor fetches operands, performs operation and writes back the result.
A pipelined processor
Consider a Harvard architecture processor, which performs the following tasks repetitively:
Instruction Fetch: fetch op code (ROM)
Data and Constant Fetch: fetch constant (ROM), fetch variable (RAM)
Execution Phase: calculate result
Write Back: store result (RAM)
[Figure: ROM and RAM connected to the processor, shown for each of the four phases; labels ROM Address, ROM data, RAM Address, RAM data, Instruction, Constant, Data, Result]
Resource Reservation
We can keep track of which resource is doing what at any given time by a table as shown below:
Resource Reservation Table
Cycle   0             1              2         3
ROM     Instr Fetch   Const. fetch   -         -
RAM     -             Var. Fetch     -         Write Back
ALU     -             -              Compute   -
This is called a reservation table.
Given this reservation table, it appears that we can launch a new instruction every 4 cycles.
Overlapping Operations
However, we need not wait for the previous operation to be over before launching a new one.
(Entries give the operation number; "-" marks an idle cycle.)
Cycle  0  1  2  3
ROM    0  0  -  -
RAM    -  0  -  0
ALU    -  -  0  -
When can we launch the next calculation?
Pipelining
We can fetch the next instruction from ROM while we write back the result of the current one to the RAM.
Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  0  -  1  1  -  2  2  -  -  -
RAM    -  0  -  0  1  -  1  2  -  2  -
ALU    -  -  0  -  -  1  -  -  2  -  -
This will enable us to launch a new calculation every third cycle.
Overlapping Operations
Is this the best we can do?
None of the resources are utilized 100% in this scheme.
The ROM and the RAM are busy for 2 out of 3 cycles, whereas the ALU is used for 1 cycle out of 3.
A new sample is handled every 3rd cycle now.
Can we get even better throughput?
Improved Scheduling
If we store the result in a local register for 1 cycle, and write it to the RAM only in the 4th cycle, we get
Modified Resource Reservation Table
Cycle  0  1  2  3  4
ROM    0  0  -  -  -
RAM    -  0  -  -  0
ALU    -  -  0  -  -
BUF    -  -  -  0  -
By delaying the write back, we can launch the next instruction earlier!
Improved Scheduling

Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  0  1  1  2  2  3  3  4  4  5
RAM    -  0  -  1  0  2  1  3  2  4  3
ALU    -  -  0  -  1  -  2  -  3  -  4
BUF    -  -  -  0  -  1  -  2  -  3  -
We can now launch a new operation every 2nd cycle.
Can this be further improved?
Improved Scheduling

The RAM and the ROM are now occupied 100% of the time, so the design is optimal and the throughput cannot be improved any further.
How can we always find the optimum solution?

Given a Resource Reservation Table, we would like to set up a systematic method which optimizes the throughput of the process using this table.
For maximum throughput, we would like to launch new operations as frequently as possible.
Thus, we want to minimize the time gap between launching two operations. This is called the Sample Period (SP).
What is the minimum possible value of SP?
The minimum Sampling Period
Consider an operation in which the busiest resource is used for n cycles.
If we launch a new operation every n cycles, this resource will be used 100% of the time.
If we launch operations any more frequently than this, the resource will not have enough time to do its work.
Therefore, the minimum possible Sample Period is equal to the maximum number of cycles for which the busiest of the resource(s) is in operation.
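This bound is easy to compute from a reservation table. The sketch below is a minimal Python illustration; the dictionary layout (resource name mapped to the list of cycles in which it is busy) and the function name are assumptions made for this example.

```python
def min_sample_period(reservation):
    """Return the number of busy cycles of the busiest resource."""
    return max(len(cycles) for cycles in reservation.values())

# Busy cycles of the ROM / RAM / ALU example used in these notes.
original_table = {"ROM": [0, 1], "RAM": [1, 3], "ALU": [2]}
print(min_sample_period(original_table))
```

For this table the ROM and the RAM are each busy for two cycles per operation, so no schedule can launch operations more often than once every two cycles on average.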
Sampling Period
We want to minimize the sampling period.

But the sampling period need not be a constant!
SP can cycle through a finite set of values.
We should therefore define an Average Sampling period
ASP.
The minimum value of this average Sampling Period
(MASP) is given by the number of cycles for which the
busiest resource is used in an operation.
Cyclic Sampling Period
Consider the following reservation table:
Cycle  0  1  2  3
RSC1   0  -  0  -
RSC2   -  0  -  0
RSC3   -  -  0  -
Now the next operation can be launched in cycle 1 itself.
However, the following one can only be launched after a gap of 3 cycles in cycle 4.
Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  1  0  1  2  3  2  3  4  5  4
RAM    -  0  1  0  1  2  3  2  3  4  5
ALU    -  -  0  1  -  -  2  3  -  -  4
Again, the next operation can be launched in the next cycle (in cycle 5) and after that, with a gap of 3 cycles in cycle 8.
Average Sampling Period
New operations can be launched in clock periods 0,1,4,5,8,9 . . . .
Thus, the sample period cycles through the values {1,3}.
The average of the cycle is called the Average Sampling Period (ASP).
The Average Sampling Period (ASP) is 2 here.
The whole pattern repeats every 4 cycles. This is called the period (p).
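The launch pattern can be reproduced with a small simulation. The Python sketch below uses a greedy policy (launch at the earliest cycle that causes no resource clash), which is an assumption made here for illustration rather than a method stated in the notes. The busy cycles are also an assumed reading of the reservation table: RSC1 in cycles 0 and 2, RSC2 in cycles 1 and 3, RSC3 in cycle 2.

```python
def greedy_launches(reservation, count):
    """Launch each operation at the earliest cycle with no resource clash."""
    busy = {name: set() for name in reservation}  # cycles already reserved
    launches = []
    t = 0
    while len(launches) < count:
        clash = any(t + c in busy[name]
                    for name, cycles in reservation.items() for c in cycles)
        if not clash:
            for name, cycles in reservation.items():
                busy[name].update(t + c for c in cycles)
            launches.append(t)
        t += 1
    return launches

table = {"RSC1": [0, 2], "RSC2": [1, 3], "RSC3": [2]}
launches = greedy_launches(table, 6)
gaps = [b - a for a, b in zip(launches, launches[1:])]
print(launches)  # launch cycles of the first six operations
print(gaps)      # sample periods between successive launches
```

Averaging the gaps over one full repetition of the pattern gives the Average Sampling Period.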
Minimum Average Sampling Period
The minimum value of the Average Sampling Period (MASP) is given by the maximum number of cycles for which a resource is busy during an operation.
Therefore, given a reservation table, MASP is known.
If the actual average Sampling Period is equal to MASP, the system is already optimum and nothing needs to be done.
If the actual average Sampling Period is greater than MASP, we can attempt to modify the reservation table, such that MASP is achieved.
For a given reservation table, find the current average sample period (ASP).
Find the largest number of cycles for which a resource is busy. This is equal to the Minimum possible Average Sampling Period (MASP).
If ASP = MASP, there is nothing to be done.
Else, we should try to re-schedule events such that MASP is achieved.
Method to achieve MASP
We first consider various cycles whose average is the desired MASP.
For example, if MASP is 2, we can have cycles of {2}, {1,3} or {1,1,4} etc.
The periods are 2, 4 and 6 in these three cases.
The Generator Set
For each cycle, we construct a generator set G, which contains elements of the cycle, their sums taken two at a time, three at a time etc., modulo periodicity p.
In our example, cycles are {2}, {1,3} and {1,1,4}.
For a cycle of {2}, p = 2, so G = {0}
For a cycle of {1,3}, p = 4, so G = {0,1,3}
For a cycle of {1,1,4}, p = 6, so G = {0,1,2,4,5}
The Source Set
For each selected cycle, we now construct the Source set S. This contains integers 0 through p-1, from which all members of G except 0 have been removed.
In our example, cycles are {2}, {1,3} and {1,1,4}.
Cycle     p   G             S
{2}       2   {0}           {0,1}
{1,3}     4   {0,1,3}       {0,2}
{1,1,4}   6   {0,1,2,4,5}   {0,3}
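The construction of G and S can be checked mechanically. The Python sketch below takes the description literally: sums of the cycle elements taken one, two, three at a time and so on, reduced modulo p, with p taken as the sum of the cycle. The function names are assumptions made for this illustration.

```python
from itertools import combinations

def generator_set(cycle):
    """Return (p, G): the period and all sums of cycle elements modulo p."""
    p = sum(cycle)  # the pattern repeats after the sum of all gaps
    g = set()
    for size in range(1, len(cycle) + 1):
        for combo in combinations(cycle, size):
            g.add(sum(combo) % p)
    return p, g

def source_set(cycle):
    """Return S: integers 0..p-1 with all members of G except 0 removed."""
    p, g = generator_set(cycle)
    return {x for x in range(p) if x == 0 or x not in g}

for cycle in ([2], [1, 3], [1, 1, 4]):
    p, g = generator_set(cycle)
    print(cycle, p, sorted(g), sorted(source_set(cycle)))
```

Since these sets depend only on the cycle, the results can be tabulated once and reused for any reservation table with the same MASP.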
Design Sets
For each selected cycle, we construct Design sets $D_i$ which have the property that:
if $a \in D$ and $b \in D$ then $|a - b|$ also $\in D$.
In our example,
Cycle     p   S       D sets
{2}       2   {0,1}   {0}, {1} and {0,1}
{1,3}     4   {0,2}   {0}, {2}, {0,2}
{1,1,4}   6   {0,3}   {0}, {3}, {0,3}
Notice that Design sets do not depend on the reservation table.
The sets G, S and $D_i$ are constructed from the repetition cycles whose average value is the MASP.
Therefore we can make a library of these in advance for different combinations of MASP values and cycles, and use them when needed.
Row Vectors
We construct a row vector for each resource in the reservation table.
The row vector is a set which contains the clock periods in which a specific resource is busy.
Cycle  0  1  2  3
ROM    0  0  -  -
RAM    -  0  -  0
ALU    -  -  0  -
In this example, the row vector for ROM is {0,1}, for RAM is {1,3} and for ALU is {2}.
Matching Rows with Design Sets
Choose a particular cycle with the desired MASP. (Say MASP = 2, cycle = {2}).
Pick the corresponding design sets. (In this example, D = {0}, {1}, {0,1}).
For each resource, take its row vector and take a design set with the same cardinality.
Align these according to defined rules.
Rules for Alignment of the First elements
Compare R(1) and D(1).
If these are equal, nothing needs to be done.
Else,
If R(1) < D(1), add D(1)-R(1) to all members of R
If R(1) > D(1), add R(1)-D(1) to all members of D
This is equivalent to a rigid shift of R or D till their first members are aligned.
For example, if R = {1,3,4,6} and D = {0,2,5,6}, then R(1) > D(1), so 1 is added to all members of D. R stays at {1,3,4,6} and D becomes {1,3,6,7}.
Alignment of other elements
If R(i) = D(i), nothing needs to be done.
If R(i) < D(i), add D(i) - R(i) delays to all members of R at position i and beyond.
For example, with R = {1,3,4,6} and D = {1,3,6,7}, R(3) < D(3). Break R here and move its members from position 3 onwards forward by 2, giving R = {1,3,6,8}.
The ith elements are now aligned.
Alignment of other elements
If D(i) < R(i):
Add sufficient multiples of p to D(i) such that it is ≥ R(i).
Add the same number to members of D beyond i.
Now if R(i) < D(i), add D(i) - R(i) delays to all members of R at position i and beyond.
For example, with periodicity p = 2, R = {1,3,4,6} and D = {1,2,5,6}, we have D(2) < R(2).
Break D here and move forward by p (= 2) steps: D = {1,4,7,8}.
Now align R: R(2) < D(2), so add 1 to R(2) and beyond, giving R = {1,4,5,7}.
Alignment Example
Let R = 1,3,4,6 and D = 0,1,4,5; with periodicity p = 2
To align the first element, move all elements of D forward by 1 step. Now D = 1,2,5,6.
For the second element, D is behind. Move D2 onwards fwd by p = 2, so D = 1,4,7,8.
Move R2 onwards fwd by 1. So R = 1,4,5,7
Alignment Example
R = 1,4,5,7 and D = 1,4,7,8. R3 < D3
Move R3 and beyond forward by 2. So R = 1,4,7,9 and D = 1,4,7,8.
D4 < R4. Move D4 forward by 2 to 10.
Now R4 < D4. Move R4 forward by 1 to 10.
Vectors are now aligned at 1,4,7,10.
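The alignment rules can be written as a short routine. The Python sketch below follows the rules as stated in these notes (rigid shift for the first element, then element by element delays, with D moved forward only in multiples of p); the function name and the list based representation are assumptions made for this illustration.

```python
def align(row, design, period):
    """Align row vector R with design set D; return the delayed row vector."""
    r, d = list(row), list(design)
    # First elements: rigid shift of whichever vector is behind.
    if r[0] < d[0]:
        r = [x + (d[0] - r[0]) for x in r]
    elif r[0] > d[0]:
        d = [x + (r[0] - d[0]) for x in d]
    # Remaining elements, one position at a time.
    for i in range(1, len(r)):
        if d[i] < r[i]:
            # Move D(i) and beyond forward by enough multiples of the period.
            steps = -((d[i] - r[i]) // period)  # ceiling of (R(i)-D(i))/p
            for j in range(i, len(d)):
                d[j] += steps * period
        if r[i] < d[i]:
            # Delay R(i) and beyond until R(i) meets D(i).
            delay = d[i] - r[i]
            for j in range(i, len(r)):
                r[j] += delay
    return r

print(align([1, 3, 4, 6], [0, 1, 4, 5], 2))
```

Only delays are ever added to the row vector, so an operation is never asked to use a resource earlier than in the original table.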
Example System
We shall illustrate the method using our original example, whose reservation table is:
Cycle  0  1  2  3
ROM    0  0  -  -
RAM    -  0  -  0
ALU    -  -  0  -
Since the ROM and the RAM are used for 2 cycles each in every operation, MASP = 2.
However, as we had seen before, ASP = 3 in this case.
Therefore, the schedule needs improvement.
Example Application
Aligning the ROM
MASP = 2, Choose the cycle: {2}
Then D = {0}, {1}, {0,1}
For ROM: R = {0,1}, D = {0,1}
So no alignment is required.
Adjusting the RAM Schedule
For RAM: R = {1,3}, D = {0,1}
Aligning the First Element:
R(1) > D(1)
Add (1-0) = 1 to D elements: D = {1,2}
Aligning other elements:
R(2) > D(2)
Add p (= 2) to D(2): D = {1,4}
Now R(2) < D(2)
Add (4-3) = 1 to R(2): R = {1,4}
R and D are now aligned.
ALU Schedule
For ALU: R = {2}, D = {0}
Aligning first element: Add (2-0) = 2 to D: D = {2}
R and D are now aligned.
ROM = {0,1}, RAM = {1,4}, ALU = {2}
Modified Reservation Table
Cycle  0  1  2  3  4
ROM    0  0  -  -  -
RAM    -  0  -  -  0
ALU    -  -  0  -  -
As we have seen earlier, this is indeed the optimal schedule with ASP = 2.
Optimized Reservation Table

Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  0  1  1  2  2  3  3  4  4  5
RAM    -  0  -  1  0  2  1  3  2  4  3
ALU    -  -  0  -  1  -  2  -  3  -  4
The ALU is idle 50% of the time.
Rather than buffering its result to delay the write back, we can use a slower ALU which takes 2 cycles to compute.
Using a Slower ALU
The reservation table with a slower ALU is:
Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  0  1  1  2  2  3  3  4  4  5
RAM    -  0  -  1  0  2  1  3  2  4  3
ALU    -  -  0  0  1  1  2  2  3  3  4
One can trade off power for speed when designing the ALU.
By using optimization techniques, we are able to reach a higher throughput, even with a slower ALU!
Alternative Choice of Cycle
MASP = 2, Choose the cycle: {1,3}
Then D = {0}, {2}, {0,2}
For ROM: R = {0,1}, D = {0,2}
R(1) = D(1) = 0, R(2) < D(2)
Add D(2) - R(2) to all members of R at position 2 (and beyond): R(2) = 2.
R and D are now aligned at {0,2}
Alternative Cycle:RAM Schedule
For RAM: R = {1,3}, D = {0,2}
R(1) > D(1)
Add (1-0) = 1 to D elements: D = {1,3}
R and D are now aligned at {1,3}.
For ALU: R = {2}, D = {0}
Aligning first element: Add (2-0) = 2 to D: D = {2}
R and D are now aligned at {2}.
Cycle  0  1  2  3
ROM    0  -  0  -
RAM    -  0  -  0
ALU    -  -  0  -
Time Ordering
Cycle  0  1  2  3  4  5  6  7  8  9  10
ROM    0  1  0  1  2  3  2  3  4  5  4
RAM    -  0  1  0  1  2  3  2  3  4  5
ALU    -  -  0  1  -  -  2  3  -  -  4
As expected, the schedule is optimum.
The sampling period alternates between 1 and 3.
However this schedule does not preserve time order.
It asks for computation and constant fetch in the same cycle.
If we pre-fetch the constant for the next to next calculation in this cycle and store it for 4 cycles, it may still work.
Conclusions
Pipelining can improve throughput of systems.
A systematic procedure for optimizing pipeline throughput exists. It can create modified reservation tables which are optimal by delaying some operations.
However, it does not guarantee that the time order of different operations will be preserved.
Different cycles with the same Average Sampling Period may have to be tried before an acceptable time order is found.
The procedure also allows us to identify non-critical components which can then be redesigned to be slower but at lower power consumption.
An Introduction to VHDL
Overview
Dinesh Sharma
IIT Bombay, Mumbai
August 2008
Part I
Design Units in VHDL
    entity
    Architecture
    Component
    Configuration
    Packages and Libraries
Object and Data Types
    Scalar data types
    Composite Data Types
An introduction to VHDL
VHDL is a hardware description language which uses the syntax of Ada. Like any hardware description language, it is used for many purposes.
For describing hardware.
As a modeling language.
For simulation of hardware.
For early performance estimation of system architecture.
For synthesis of hardware.
For fault simulation, test and verification of designs.
etc.
Design Elements in VHDL: ENTITY

The basic design element in VHDL is called an ENTITY.
An ENTITY represents a template for a hardware block.
It describes just the outside view of a hardware module, namely its interface with other modules in terms of input and output signals.
The hardware block can be the entire design, a part of it or indeed an entire test bench.
A test bench includes the circuit being designed, blocks which apply test signals to it and those which monitor its output.
The inner operation of the entity is described by an ARCHITECTURE associated with it.
ENTITY DECLARATION
The declaration of an ENTITY describes the signals which connect this hardware to the outside. These are called port signals. It also provides optional values of manifest constants. These are called generics.
VHDL 93:
entity name is
    generic(list);
    port(list);
end entity name;
VHDL 87:
entity name is
    generic(list);
    port(list);
end name;
ENTITY EXAMPLE
VHDL 93:
entity flipflop is
    generic (Tprop: delay_length);
    port (clk, d: in bit; q: out bit);
end entity flipflop;
VHDL 87:
entity flipflop is
    generic (Tprop: delay_length);
    port (clk, d: in bit; q: out bit);
end flipflop;
The entity declares port signals, their directions and data types. These signals are used by an architecture associated with this entity.
Design Elements in VHDL: ARCHITECTURE

An ARCHITECTURE describes how an ENTITY operates. An ARCHITECTURE is always associated with an ENTITY.
There can be multiple ARCHITECTURES associated with an ENTITY.
An ARCHITECTURE can describe an entity in a structural style, behavioural style or mixed style.
The language provides constructs for describing components, their interconnects and composition (structural descriptions).
The language also includes signal assignments, sequential and concurrent statements for describing data and control flow, and for behavioural descriptions.
ARCHITECTURE Syntax
VHDL 93:
architecture name of entity-name is
    (declarations)
begin
    (concurrent statements)
end architecture name;
VHDL 87:
architecture name of entity-name is
    (declarations)
begin
    (concurrent statements)
end name;
The architecture inherits the port signals from its entity. It must declare its internal signals. Concurrent statements constituting the architecture can be placed in any order.
ARCHITECTURE Example
VHDL 93:
architecture simple of dff is
    signal ...;
begin
    ...
end architecture simple;
VHDL 87:
architecture simple of dff is
    signal ...;
begin
    ...
end simple;
Design Elements in VHDL: COMPONENTS

An ENTITY-ARCHITECTURE pair actually describes a component type.
In a design, we might use several instances of the same component type.
Each instance of a component type may be distinguished by using a unique name.
Thus, a component instance with a unique instance name is associated with a component type, which in turn is associated with an ENTITY-ARCHITECTURE pair.
This is like saying U1 (component instance) is a D Flip Flop (component type) which is associated with an entity DFF (which describes its pin diagram) using architecture LS7474 (which describes its inner operation).
Component Example
VHDL 93:
component name is
    generic(list);
    port(list);
end component name;
EXAMPLE:
component flipflop is
end component flipflop;
VHDL 87:
component name
    generic(list);
    port(list);
end component;
EXAMPLE:
component flipflop
end component;
Design Elements in VHDL: Configuration

Structural Descriptions describe components and their interconnections.
A component is an instance of a component type.
Each component type is associated with an ENTITY-ARCHITECTURE pair.
The architecture used can itself contain other components whose type will then be associated with other ENTITY-ARCHITECTURE pairs.
A configuration describes linkages between component types and ENTITY-ARCHITECTURE pairs. It specifies bindings for all components used in an architecture associated with an entity.
Design Elements in VHDL: Packages

Related declarations and design elements like subprograms and procedures can be placed in a package for re-use.
A package has a declarative part and an implementation part. This is somewhat like entity and architecture for designs.
Objects in a package can be referred to by a packagename.objectname syntax.
A description can include a use clause to incorporate the package in the design. Objects in the package then become visible to the description without having to use the dot reference as above.
Design Elements in VHDL: Libraries

Many design elements such as packages, definitions and entire entity architecture pairs can be placed in a library.
The description invokes the library by first declaring it:
For example,
library IEEE;
Objects in the Library can then be incorporated in the design by a use clause.
For example,
use IEEE.std_logic_1164.all;
In this example, IEEE is a library and std_logic_1164 is a package in the library.
Object and Data Types in VHDL

VHDL defines several types of objects. These include constants, variables, signals and files.
The types of values which can be assigned to these objects are called data types.
Same data types may be assigned to different object types.
For example, a constant, a variable and a signal can all have values which are of data type BIT.
Declarations of objects include their object type as well as the data type of values that they can acquire.
For example
signal Enable: BIT;
Data Types
Scalar
    Discrete
        Integer
        enumeration: bit, character, boolean, Severity Level, file_open_kind, file_open_status
    Floating Pt.: real
    Physical: time
Composite
    constrained array
    unconstrained array: bit_vector, string
Access
File
Enumeration Type
VHDL enumeration types allow us to define a set of values that a variable of this type can acquire. For example, we can define a data type by the following declaration:
type instr is (add, sub, adc, sbb, rotl, rotr);
Now a variable or a signal defined to be of type instr can only be assigned values enumerated above, that is: add, sub, adc, sbb, rotl and rotr.
In actual implementation, these values may be mapped to a 3 bit value. However, an attempt to assign, say, "010" to a variable of type instr will result in an error. Only the enumerated values can be assigned to a variable of this type.
Pre-defined Enumeration Types

A few enumeration types are pre-defined in the language. These are:
type bit is ('0', '1');
type boolean is (false, true);
type severity_level is (note, warning, error, failure);
type file_open_kind is (read_mode, write_mode, append_mode);
type file_open_status is (open_ok, status_error, name_error, mode_error);
In addition to these, the character type enumerates all the ASCII characters.
Types and SubTypes

A signal type defined in the IEEE Library is std_logic. This is a signal which can take one of 9 possible values. It is defined by:
type std_logic is ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-');
A subtype of this kind of signal can be defined, which can take the four values 'X', '0', '1', and 'Z' only.
This can be defined to be a subtype of std_logic
subtype fourval_logic is std_logic range 'X' to 'Z';
Similarly, we may want to constrain some integers to a limited range of values. This can be done by defining a subtype:
subtype bitnum is integer range 31 downto 0;
Physical Types
Objects which are declared to be of Physical type carry a value
as well as a unit. These are used to represent physical
quantities such as time, resistance and capacitance.
The Physical type defines a basic unit for the quantity and may
define other units which are multiples of this unit.
Time is the only Physical type which is pre-defined in the
language. The user may define other Physical types.

Pre-defined Physical Type: Time

type time is range 0 to . . .
units
fs;
ps = 1000 fs;
ns = 1000 ps;
us = 1000 ns;
ms = 1000 us;
sec = 1000 ms;
min = 60 sec;
hr = 60 min;
end units time;
The user may define other physical types as required.


User Defined Physical Types

As an example of user defined Physical types, we can define
the resistance type.
type resistance is range 0 to 1E9
units
ohm;
kohm = 1000 ohm;
Mohm = 1000 kohm;
end units resistance;
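The sketch below shows how objects of the resistance type might be declared and combined. The entity name resistance_demo and the constants r1 and r2 are hypothetical, chosen only for this illustration.

```vhdl
entity resistance_demo is
end entity resistance_demo;

architecture sketch of resistance_demo is
  type resistance is range 0 to 1E9
    units
      ohm;
      kohm = 1000 ohm;
      Mohm = 1000 kohm;
    end units resistance;
  constant r1 : resistance := 4700 ohm;
  constant r2 : resistance := 10 kohm;
begin
  process
    variable series : resistance;
  begin
    -- values with different units of the same type can be added directly
    series := r1 + r2;
    assert series = 14700 ohm
      report "series resistance is wrong" severity error;
    -- dividing by the base unit gives a plain integer
    assert series / ohm = 14700
      report "conversion to integer is wrong" severity error;
    wait;
  end process;
end architecture sketch;
```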

Composite data types are collections of scalar types.

VHDL recognizes records and arrays as composite data types.
Records are like structures in C.
Arrays are indexed collections of scalar types. The index must
be a discrete scalar type.
Arrays may be one-dimensional or multi dimensional.

Arrays
Arrays can be constrained or unconstrained.
In constrained arrays, the type definition itself places
bounds on index values. For example:
type byte is array (7 downto 0) of bit;
type rotmatrix is array (1 to 3, 1 to 3) of real;
In unconstrained arrays, no bounds are placed on index
values. Bounds are established at the time of declaration.
type bitbus is array (natural range <>) of bit;
(The type is named bitbus here because bus itself is a reserved word in VHDL.)
The declaration could be:
signal addr_bus: bitbus(15 downto 0);
signal data_bus: bitbus(7 downto 0);

Built in Array types
VHDL defines two built in types of arrays. These are
bit vectors and strings. Both are unconstrained.
type bit_vector is array (natural range <>) of bit;
type string is array (positive range <>) of character;
As a result we can directly declare:
variable message: string(1 to 20);
signal Areg: bit_vector(7 downto 0);

Records
While an array is a collection of the same type of objects,
a record can hold components of different types and sizes.
This is like a struct in C.
The syntax of a record declaration contains
a semicolon separated list of fields, each field having the format
name, . . ., name : subtype
For example:
type resource is record
P_reg, Q_reg : bit_vector(7 downto 0);
Enable : bit;
end record resource;
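A short sketch of how a record of this type might be assigned and read is given below. The entity name record_demo and the signal name res are assumptions for illustration only.

```vhdl
entity record_demo is
end entity record_demo;

architecture sketch of record_demo is
  type resource is record
    P_reg, Q_reg : bit_vector(7 downto 0);
    Enable       : bit;
  end record resource;
  signal res : resource;
begin
  process
  begin
    -- assign the whole record with a named aggregate
    res <= (P_reg => "10100101", Q_reg => (others => '0'), Enable => '1');
    wait for 1 ns;
    -- assign a single field using the record.field notation
    res.Q_reg <= res.P_reg;
    wait for 1 ns;
    assert res.Q_reg = "10100101" and res.Enable = '1'
      report "record fields not updated as expected" severity error;
    wait;
  end process;
end architecture sketch;
```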
Part II
Structural Description in VHDL
Component Declarations
Component Instantiation
Configuration
Repetition Grammar

Structural Style
Structural style describes a design in terms of components and
their interconnections.
Each component declares its ports and the type and direction
of signals that it expects through them.
How can we describe interconnections between components?
[Figure: three components U1, U2 and U3, each with pins p1 to p6, connected between In and Out through the signals s1 to s7.]
Describing Interconnect
[Figure: the circuit of U1, U2 and U3, with the interconnecting signals s1 to s7 labelled.]
For each internal interconnect, we define an internal signal.
When instantiating a component, we map its ports to specific internal
signals.
For example, in the circuit above, at the time of
instantiating U1, we map its pin p2 to signal s2.
Similarly, when instantiating U2, we map its pin p3 to s2.
This connects p2 of U1 to s2 and through s2 to pin p3 of
U2.
Structural Architecture
A purely structural architecture for an entity will consist of
1. Component declarations: to associate component types
   with their port lists.
2. Signal declarations: to declare the signals used.
3. Component instantiations: to place component instances
   and to port map their ports to signals. Signals can be
   internal or port signals declared by the ENTITY.
4. Configurations: to bind component types to
   ENTITY-ARCHITECTURE pairs.
5. Repetition grammar: for describing multiple instances of
   the same component type, for example memory cells or
   bus buffers.
VHDL 93:
component name is
generic(list);
port(list);
end component name;
EXAMPLE:
component flipflop is
end component flipflop;
VHDL 87:
component name
generic(list);
port(list);
end component;
EXAMPLE:
component flipflop
end component;
VHDL-93: Direct Instantiation
VHDL-93 allows direct instantiation of
ENTITY ARCHITECTURE pairs without having to go through
a component type declaration first.
Instance-name: entity entity-name (architecture-name)
generic map(list)
port map(list);
This form is convenient, but does not have the flexibility of
associating alternative ENTITY ARCHITECTURE pairs with
a component.
VHDL-87 does not allow direct instantiation.

VHDL-93: Normal Instantiation

Instance-name: component component-type-name
generic map(list)
port map(list);
The association here is with a previously declared component
type. The type will be bound to an ENTITY ARCHITECTURE
pair using an inline configuration statement or a configuration
construct.
VHDL-87
The keyword component is not used in VHDL-87. This is
because direct instantiations are not allowed and therefore the
binding is always to a component.
Instance-name: component-type-name
generic map(list)
port map(list);
The association is with a previously declared component type.
The type will be bound to an ENTITY ARCHITECTURE pair
using an inline configuration statement or construct.
Inline Configuration
The association between component types and
ENTITY-ARCHITECTURE pairs can be made inline with a
use clause.
for all: component-name
use entity entity-name(architecture-name);
Instead of saying for all, we can specify a list of selected
instances of this component type to which this binding will
apply.
for instance-name-list: component-name
use entity entity-name(architecture-name);
The key word OTHERS
If we use the keyword others instead of a list of instance
names, it refers to all component instances of this
component-name which have not yet figured in a name-list.
In VHDL, the key word others is used in different contexts
involving lists.
If some members of the list have been specified, then others
refers to the remaining members. (If none was specified, it is
equivalent to all.)
Hierarchical Configuration
When we associate a component type with a previously defined

ENTITY ARCHITECTURE pair,
the chosen architecture could itself contain other components
- and these components in turn would be associated with other
ENTITY ARCHITECTURE pairs.
This hierarchical association can be described by a standalone
design unit called a configuration.
VHDL contains fairly complex configuration statements. A
simplified construct is introduced here:
configuration config-name of entity-name is
for architecture-name
for component-instance-namelist: component-type-name
use entity entity-name(architecture-name);
end for;
end for;
end configuration config-name;
Structural description: Example
[Figure: an xor gate built from four two-input NAND gates, with inputs A and B.]
Let us choose the xor gate shown in the figure as an
example for structural description.
It uses four instances of a single type of component: two
input NAND.
We shall describe the NAND gate first.
The work library
In VHDL, as we describe entities and architectures, these
are compiled into a special library called WORK.
This library is always included and does not have to be
declared.
In some sense, the WORK library represents the current
state of development of the project for designing
something.
Definition of NAND
Entity nand2 is
port (in1, in2: in bit; p: out bit);
end entity nand2;
Architecture trivial of nand2 is
begin
p <= not (in1 and in2);
end Architecture trivial;
We do not use any generic for this simple example.
not and and are inbuilt logical functions.
(Actually so is nand but we are trying to be cute!)
Now that we have this entity-architecture pair, we can use it to
build our xor gate.
XOR Gate example
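[Figure: xor gate built from four NAND gates. N1 takes A and B and produces s1; N2 takes A and s1 and produces s2; N3 takes B and s1 and produces s3; N4 takes s2 and s3 and produces axb.]
USE WORK.ALL;
-- xor is a reserved word, so the entity is named xor2
Entity xor2 is
port(a, b: in bit; axb: out bit);
End Entity xor2;
Architecture simple of xor2 is
-- component ports match those of entity nand2
component NAND2in IS
port(in1, in2: in bit; p: out bit);
end component NAND2in;
For all: NAND2in use Entity NAND2(Trivial);
signal s1, s2, s3: bit;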
XOR Architecture body
begin
N1: component NAND2in port map(a, b, s1);
N2: component NAND2in port map(a, s1, s2);
N3: component NAND2in port map(b, s1, s3);
N4: component NAND2in port map(s2, s3, axb);
end Architecture simple;
Repetition Grammar
We frequently use a large number of identical components of

the same type. (For example memory cells or bus drivers).
It is tedious to instantiate and configure each one of them
individually.
VHDL provides a way to place a collection of instances of a
component type at one go using the generate statement.
GENERATE Statement
The generate statement contains a for loop which takes effect
during the circuit elaboration step. This can be used to repeat
instantiation constructs. We illustrate this statement with an
example:
groupname: for index in 0 to width-1 generate
begin
some-name: component outbuf
port map (...);
end generate groupname;
The defined index in the for construct has local scope and can
be used to pick specific signals from an array in port map
statements.
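A minimal sketch of a generate loop that places one buffer per bit of a bus is shown below. The entity outbuf, its ports d and q, and the entity bus_buffer are hypothetical and exist only for this illustration. No configuration is given, so the sketch relies on default binding of the component to the entity of the same name.

```vhdl
-- Hypothetical one-bit buffer, used only for this illustration.
entity outbuf is
  port (d : in bit; q : out bit);
end entity outbuf;

architecture trivial of outbuf is
begin
  q <= d;
end architecture trivial;

use work.all;

entity bus_buffer is
  generic (width : positive := 8);
  port (din  : in  bit_vector(width-1 downto 0);
        dout : out bit_vector(width-1 downto 0));
end entity bus_buffer;

architecture structural of bus_buffer is
  component outbuf is
    port (d : in bit; q : out bit);
  end component outbuf;
begin
  -- one instance of outbuf is created for every value of index
  buffers : for index in 0 to width-1 generate
  begin
    buf : component outbuf
      port map (d => din(index), q => dout(index));
  end generate buffers;
end architecture structural;
```

The index picks bit index of din and dout for each instance, so changing the generic width changes the number of buffers without editing the architecture.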
Example: Full adder

[Figure: Full Adder block with inputs a, b and C_in and outputs sum and C_out.]
Entity FullAdder is
Port(a, b, C_in: in bit; sum, C_out: out bit);
End Entity FullAdder;
C_out and sum represent the more significant and less
significant bits of a+b+C_in.
Suppose this is too difficult for the likes of us to figure out.
We would like to decompose the circuit into blocks which
handle two bits at a time.
Decomposition of Full Adder

[Figure: full adder built from two half adders, HA1 and HA2, and a combiner (combn). HA1 produces s1 and cy1; HA2 takes s1 and C_in and produces sum and cy2; the combiner takes cy1 and cy2 and produces C_out.]
Each half adder represents the sum and carry of just two bits.
Carry occurs only if both bits are 1.
Sum is zero if both bits are zero or both are one.
So sum = a xor b, cy = a and b.
The combiner just combines the carries from the two half adders.
(Just an OR gate will do it.)
Description of full Adder

Entity HalfAdder is
port(a, b: in bit; s, cy: out bit);
End Entity HalfAdder;
Architecture trivial of HalfAdder is
begin
s <= a xor b;
cy <= a and b;
end Architecture trivial;

Architecture simple of FullAdder is
Component HalfAdder is
port(a, b: in bit; s, cy: out bit);
End Component HalfAdder;
-- OR2in is assumed to be a two-input OR component, declared like NAND2in
signal s1, cy1, cy2: bit;
begin
HA1: Component HalfAdder port map(a, b, s1, cy1);
-- the second half adder adds C_in to the partial sum s1
HA2: Component HalfAdder port map(s1, C_in, sum, cy2);
Cmbn: Component OR2in port map(cy1, cy2, C_out);
end Architecture simple;
The half adder

Carry from the half adder is an AND gate, and the combiner is
an OR.
But gates without inversion are slow. So we bring out cybar
(the complement of carry) rather than carry, using a NAND gate.
[Figure: Half Adder block with inputs i1 and i2 and outputs s and cybar.]
Entity HalfAdder is
port(a, b: in bit; s, cybar: out bit);
End Entity HalfAdder;
Architecture better of HalfAdder is
begin
s <= a xor b;
cybar <= a nand b;
end Architecture better;
The combiner should now be an OR of negative true signals.
This is just a NAND.
Efficient Full Adder
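[Figure: full adder built from half adders HA1 and HA2 with complemented carry outputs c1b and c2b, and a combiner (combn) producing C_out.]
Architecture better of FullAdder is
Component HalfAdder is
port(a, b: in bit; s, cybar: out bit);
End Component HalfAdder;
-- NAND2in must also be declared here, as in the xor example
signal s1, c1b, c2b: bit;
begin
HA1: Component HalfAdder port map(a, b, s1, c1b);
-- the second half adder adds C_in to the partial sum s1
HA2: Component HalfAdder port map(s1, C_in, sum, c2b);
Cmbn: Component NAND2in port map(c1b, c2b, C_out);
end Architecture better;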
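The decomposition can be checked with a small self-contained sketch such as the one below. It departs from the component style used above in two ways, both assumptions made for brevity: it uses VHDL-93 direct instantiation, and it writes the combiner as a concurrent nand assignment instead of a NAND2in instance. The test bench name FullAdder_tb is hypothetical.

```vhdl
entity HalfAdder is
  port (a, b : in bit; s, cybar : out bit);
end entity HalfAdder;

architecture better of HalfAdder is
begin
  s     <= a xor b;
  cybar <= a nand b;
end architecture better;

entity FullAdder is
  port (a, b, C_in : in bit; sum, C_out : out bit);
end entity FullAdder;

architecture better of FullAdder is
  signal s1, c1b, c2b : bit;
begin
  HA1 : entity work.HalfAdder(better) port map (a, b, s1, c1b);
  HA2 : entity work.HalfAdder(better) port map (s1, C_in, sum, c2b);
  -- combiner: an OR of two negative true carries is a NAND
  C_out <= c1b nand c2b;
end architecture better;

entity FullAdder_tb is
end entity FullAdder_tb;

architecture test of FullAdder_tb is
  signal a, b, cin, sum, cout : bit;
begin
  dut : entity work.FullAdder(better) port map (a, b, cin, sum, cout);

  stimulus : process
    variable total : natural;
  begin
    -- apply all eight input combinations and compare with a+b+cin
    for i in 0 to 1 loop
      for j in 0 to 1 loop
        for k in 0 to 1 loop
          a <= bit'val(i); b <= bit'val(j); cin <= bit'val(k);
          wait for 10 ns;
          total := i + j + k;
          assert bit'pos(sum) = total mod 2 and bit'pos(cout) = total / 2
            report "full adder mismatch" severity error;
        end loop;
      end loop;
    end loop;
    wait;
  end process stimulus;
end architecture test;
```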
Part III
Behavioural Description Using VHDL
Concurrent Statements
VHDL Operators
Processes
Sequential Statements
Subprograms
Attributes
Array attributes
Type Attributes
Signal attributes

Behavioural Style
Behavioural style describes a design in terms of its behaviour,
and not in terms of a netlist of components.
We describe behaviour through if-then-else type of constructs,
loops, sequential and concurrent assignment statements.
Statements like if-then-else are inherently sequential. These
must therefore occur only inside sequential bodies like
processes.
A concurrent assignment statement may be considered as a
shorthand for a very simple process.
Specifying a waveform
A waveform is described by a comma separated list of values
and optionally, delays. For example, we may assign a waveform
by a statement like
indata <= '0', '1' AFTER 20 NS, '0' AFTER 50 NS;
The values at different times are treated as transport delays
and are all inserted in the time ordered queue without wiping
out earlier values.
(This is the only context where delays are transport by default).
Single value assignments use inertial delay by default.
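The waveform assignment above can be tried out with a sketch like the following, which samples the signal between the scheduled transitions. The entity name waveform_demo and the initial value of indata are assumptions for illustration.

```vhdl
entity waveform_demo is
end entity waveform_demo;

architecture sketch of waveform_demo is
  signal indata : bit := '1';
begin
  -- all three values are queued when the assignment executes at time 0
  indata <= '0', '1' after 20 ns, '0' after 50 ns;

  check : process
  begin
    wait for 10 ns;                       -- now at 10 ns
    assert indata = '0' report "expected 0 at 10 ns" severity error;
    wait for 20 ns;                       -- now at 30 ns
    assert indata = '1' report "expected 1 at 30 ns" severity error;
    wait for 30 ns;                       -- now at 60 ns
    assert indata = '0' report "expected 0 at 60 ns" severity error;
    wait;
  end process check;
end architecture sketch;
```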
Concurrent Assignment
A concurrent assignment can be made conditionally by using

when clauses.
name <= [delay-mechanism]
waveform when Boolean-expression else
waveform when Boolean-expression;
The assignment is made from the first waveform where the
Boolean expression evaluates to TRUE.
The assignment can also be made on a selective basis, based

on the value of some expression:
with expression select
name <= [delay-mechanism]
waveform when choices,
waveform when choices;
If the expression evaluates to one of the specified choices, the
corresponding assignment is made.
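As an illustration of both forms of concurrent assignment, the sketch below describes the same 4-to-1 multiplexer twice, once with when clauses and once with a select. The names mux_demo, sel, d, y_cond and y_sel are hypothetical.

```vhdl
entity mux_demo is
end entity mux_demo;

architecture sketch of mux_demo is
  signal sel           : bit_vector(1 downto 0) := "00";
  signal d             : bit_vector(3 downto 0) := "0101";
  signal y_cond, y_sel : bit;
begin
  -- conditional form: the first condition that is TRUE decides the value
  y_cond <= d(0) when sel = "00" else
            d(1) when sel = "01" else
            d(2) when sel = "10" else
            d(3);

  -- selected form: the value of sel picks one of the choices
  with sel select
    y_sel <= d(0) when "00",
             d(1) when "01",
             d(2) when "10",
             d(3) when others;

  check : process
  begin
    sel <= "10";
    wait for 1 ns;
    assert y_cond = '1' and y_sel = '1'
      report "sel = 10 should select d(2)" severity error;
    sel <= "11";
    wait for 1 ns;
    assert y_cond = '0' and y_sel = '0'
      report "sel = 11 should select d(3)" severity error;
    wait;
  end process check;
end architecture sketch;
```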
Assignment to an aggregate
Assignments can be made to a collection of signals
simultaneously. For example, let vec be defined as bit_vector(2
downto 0):
vec <= ("000"); -- 000 : string
vec <= ('0', '0', '1'); -- 001 : positional
vec <= (1 => '1', others => '0'); -- 010 : named, partial
vec <= ('1', others => '0'); -- 100 : positional, partial
vec <= (2|0 => '1', others => '0'); -- 101 : partial
vec <= (others => '1'); -- 111
VHDL Operators
Logical operators: AND, OR, NAND, NOR, XOR, XNOR and
NOT
For example x <= a xor b;
Relational operators: =, /=, <, <=, >, >=
= and /= operate on any type. Others operate on arithmetic
types (integers, reals etc.). All of these return a boolean
value.
Shift operators: SLL (logical left), SLA (arithmetic left), SRL
(logical right), SRA (arithmetic right), ROL (rotate left) and
ROR (rotate right).
Processes
Sequential constructs need to be placed inside a process. A
process uses the syntax:
[ process-label: ] process [(sensitivity-list)] [is]
[declarations]
begin
[sequential statements]
end process [process-label];
Sequential statements include if constructs, case statements,
looping constructs, assertions, wait statements etc.
Process with Sensitivity list
Every process is like an endless loop. Therefore, it requires an

explicit or implicit suspend statement.
If a sensitivity list is given with the process statement, the
process automatically suspends when it reaches its end.
It restarts from the beginning when any of the signals in its
sensitivity list has an event.
This process has a static sensitivity and an implicit suspend
statement.
Wait statements
A process without a sensitivity list requires explicit suspend
statements. These are provided by wait statements. These can
be of the form:
wait for waiting-time;
wait on signal-list;
wait until waiting-condition;
wait for 0 some-time-unit;
wait;
wait for 0 ns causes the process to suspend till the next delta.
The last form (bare wait statement) suspends the process forever.
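A small sketch of the different wait forms in use is given below. The entity name wait_demo and the process labels are hypothetical. One process generates a few clock transitions and a second process waits for them.

```vhdl
entity wait_demo is
end entity wait_demo;

architecture sketch of wait_demo is
  signal clk : bit := '0';
begin
  clock_gen : process
  begin
    for n in 1 to 8 loop
      wait for 5 ns;            -- suspend for a fixed time
      clk <= not clk;
    end loop;
    wait;                       -- suspend forever: no more transitions
  end process clock_gen;

  watcher : process
  begin
    wait until clk = '1';       -- resumes at the first rising edge
    assert now = 5 ns report "first edge expected at 5 ns" severity error;
    wait on clk;                -- resumes at the next event on clk
    assert now = 10 ns and clk = '0'
      report "second event expected at 10 ns" severity error;
    wait;
  end process watcher;
end architecture sketch;
```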
Dynamic sensitivity
Processes without a sensitivity list and multiple wait statements
have a dynamic sensitivity. This is because these processes
are sensitive to different events at different times.
One cannot mix static and dynamic sensitivity.
Thus, a process with a sensitivity list cannot use wait
statements.
This is because once the process is suspended, it is possible to
have an event on a signal in the sensitivity list simultaneously
with the condition for resumption after wait being fulfilled.
This would leave the process undecided on where to resume
from.
IF statements
if statements are similar to their counterparts in programming
languages. The syntax is:
[ if-label: ] if Boolean-expression then
sequential statements
[ elsif Boolean-expression then
sequential statements ]
[ elsif ... ]
[ else sequential statements ]
end if [ if-label ];
CASE statements
A case statement acts like a multiplexer.

The syntax is:
[ case-label: ] case expression is
when choices =>
sequential-statements
[ when ... ]
end case [ case-label ];
CASE Choices
Choices can be specified in CASE statements as vertical bar
separated lists of expressions, discrete ranges or the keyword
others. For example:
case opcode is
when load | store | add | subtract =>
...
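A complete case statement along these lines might look like the sketch below. The enumeration opcode_t, the extra values jump and halt, and the signal uses_data are assumptions introduced only to make the example self-contained.

```vhdl
entity case_demo is
end entity case_demo;

architecture sketch of case_demo is
  type opcode_t is (load, store, add, subtract, jump, halt);
  signal opcode    : opcode_t := load;
  signal uses_data : boolean;
begin
  decode : process (opcode)
  begin
    case opcode is
      when load | store | add | subtract =>
        uses_data <= true;
      when others =>               -- covers jump and halt
        uses_data <= false;
    end case;
  end process decode;

  check : process
  begin
    opcode <= subtract;
    wait for 1 ns;
    assert uses_data report "subtract should use data" severity error;
    opcode <= halt;
    wait for 1 ns;
    assert not uses_data report "halt should not use data" severity error;
    wait;
  end process check;
end architecture sketch;
```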
Loop Statements
There are several different forms of the loop statement. The
simplest is the endless loop:
[ loop-label: ] loop
sequential statements
end loop [ loop-label ];
It is assumed that it will have an exit statement or a wait
statement inside to suspend operation.
Exiting a Loop
The exit statement has the syntax:

[ label: ] exit [ loop-label ] [ when Boolean expression ]
The loop label allows one to exit several levels of nested loops.
We can also skip to the end of a loop by using the next
statement. This works like continue in C.
NEXT Statement
[ label: ] next [ loop-label ] [ when Boolean expression ];

The next statement skips the remaining statements of the loop
and immediately starts the next iteration of the specified loop.
The loop label allows one to skip through several levels of
nested loops.
WHILE Loops
VHDL also has a while loop.

[ loop-label: ]
while Boolean-expression loop
sequential statements
end loop [ loop-label ];
The loop continues as long as the Boolean expression is TRUE.
For Loops
VHDL also provides a for loop.
[ loop-label: ]
for identifier in discrete-range loop
sequential statements
end loop [ loop-label ];
The discrete range can be of the form
expression to | downto expression
The identifier is initialized to the left limit of the range and takes
on successive values in the discrete range till it passes the
right limit.
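The loop forms described so far, together with next and exit, can be sketched as follows. The entity name loop_demo and the constant data are hypothetical and are chosen only so that the expected counts are easy to verify by hand.

```vhdl
entity loop_demo is
end entity loop_demo;

architecture sketch of loop_demo is
begin
  process
    constant data  : bit_vector(7 downto 0) := "00101100";
    variable ones  : natural := 0;
    variable first : integer := -1;
    variable n     : natural := 0;
  begin
    -- for loop with next: count the bits that are '1'
    for i in data'range loop
      next when data(i) = '0';
      ones := ones + 1;
    end loop;
    assert ones = 3 report "expected three ones" severity error;

    -- for loop with exit: find the lowest index holding a '1'
    for i in 0 to 7 loop
      if data(i) = '1' then
        first := i;
        exit;
      end if;
    end loop;
    assert first = 2 report "expected lowest set bit at index 2" severity error;

    -- while loop: smallest n with 2**n not less than 100
    while 2**n < 100 loop
      n := n + 1;
    end loop;
    assert n = 7 report "expected n = 7" severity error;
    wait;
  end process;
end architecture sketch;
```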
Assertions and Reports
The assert statement takes the form

[ label: ] assert Boolean expression
[ report expression ] [ severity expression ];
If the Boolean expression is TRUE, no action is taken.
If it is FALSE, an assertion violation is said to have occurred.
The simulator then outputs the report expression.
Subsequent operation depends on the severity clause.
Severity Clause in Assertions
Assert statements are used for debugging and documentation.

The severity clause decides what happens when an assertion
failure occurs.
Severity is an enumerated type which is predefined to take any
of the values:
note, warning, error, failure
Depending on the severity value, simulation continues or is
aborted.
Severity values
Note is simply to generate an output when an assertion

violation occurs.
Warning is useful when the validity of the simulation may be
in doubt, but we would like to issue a warning and
continue anyway.
Error is used when an unexpected value is encountered.
Failure is the most severe violation and is used when
some inconsistency is detected.
Assertions defaults
[ label: ] assert Boolean expression
[ report expression ] [ severity expression ];
If the optional report clause is missing in the assert statement,
the default report message is Assertion Violation.
If the severity clause is omitted, the default value is error.
Most simulators allow the user to set a severity threshold,
beyond which the simulation is aborted on an assertion
violation. It is common to continue on note and warning and to
abort on error and failure.
In VHDL-93, the report clause can be used by itself as a
statement to output useful messages.
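The following sketch shows an assertion that passes silently, one that is deliberately violated with severity warning, and a stand-alone VHDL-93 report statement. The entity name assert_demo and the signal count are hypothetical, and whether simulation continues after the warning depends on the simulator's severity threshold.

```vhdl
entity assert_demo is
end entity assert_demo;

architecture sketch of assert_demo is
  signal count : integer range 0 to 15 := 0;
begin
  process
  begin
    for i in 1 to 5 loop
      count <= count + 1;
      wait for 10 ns;
    end loop;
    -- the condition is TRUE here, so no message is produced
    assert count = 5
      report "count should be 5" severity error;
    -- the condition is FALSE here, so the message is printed as a warning
    assert count > 10
      report "count has not passed 10 yet" severity warning;
    -- report used by itself as a statement (severity defaults to note)
    report "finished at " & time'image(now);
    wait;
  end process;
end architecture sketch;
```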
Subprograms in VHDL
VHDL has two types of subprograms: Functions and
Procedures.
FUNCTIONS are used to return a single value from a given list
of input parameters. These occur in expressions on
the right hand side of VHDL statements. Functions
execute in zero simulation time.
PROCEDURES can return multiple values and need not
execute in zero simulation time. The parameters
have their type as well as direction defined in the
parameter list. These are invoked like a VHDL
statement.
FUNCTIONS
Functions can be PURE or IMPURE.
A PURE function returns the same value every time it is called
with the same value of input parameters. Most functions are
PURE.
An IMPURE function can return different values for calls with
the same parameter values.
An example is the function NOW, which returns the current
simulation time.
A random number generator written as a function would also be IMPURE.
Functions
Function name(parameter list) Return type IS

. . . Local declarations . . .
BEGIN
Sequential Statements;
...;
END [FUNCTION] name;
Function Example
TYPE Byte IS ARRAY(7 DOWNTO 0) OF BIT;
FUNCTION ByteVal(InByte: Byte) RETURN Integer IS
Variable RetVal: Integer := 0;
BEGIN
-- scan from the most significant bit, doubling the value each step
FOR I IN 7 DOWNTO 0 LOOP
RetVal := 2 * RetVal;
IF (InByte(I) = '1') THEN RetVal := RetVal + 1;
END IF;
END LOOP;
RETURN RetVal;
END FUNCTION ByteVal;
Procedures
Declaration:
PROCEDURE name (parameter list) IS
. . . Local declarations . . .
BEGIN
Sequential Statements;
...;
END [PROCEDURE] name;
A procedure ends when it reaches the END statement. It can

be terminated earlier by using the RETURN statement.
Parameter Lists for Procedures
Parameter lists are similar to the list of signals in a PORT declaration.
Elements of the list have a TYPE as well as a direction.
The direction can be in, out or inout.
Elements of the list can also have their Object Class
(Constant/ Variable/ Signal) specified in the parameter list.
For example: (SIGNAL a, b, c: IN BIT; Variable result: OUT
INTEGER);
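A procedure using exactly this parameter list could be sketched as follows. The procedure name count_ones, its body, and the signals p, q and r are hypothetical and serve only to show how the object classes and directions are used in a call.

```vhdl
entity proc_demo is
end entity proc_demo;

architecture sketch of proc_demo is
  signal p, q, r : bit := '0';

  -- Hypothetical procedure: counts how many of the three signals are '1'
  procedure count_ones (signal a, b, c : in bit;
                        variable result : out integer) is
    variable n : integer := 0;
  begin
    if a = '1' then n := n + 1; end if;
    if b = '1' then n := n + 1; end if;
    if c = '1' then n := n + 1; end if;
    result := n;
  end procedure count_ones;
begin
  process
    variable ones : integer;
  begin
    p <= '1';
    r <= '1';
    wait for 1 ns;
    -- invoked like a statement; signals and a variable are passed as actuals
    count_ones(p, q, r, ones);
    assert ones = 2 report "expected two ones" severity error;
    wait;
  end process;
end architecture sketch;
```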
Attributes
VHDL provides built in functions which return useful attributes
of the objects that they operate on.
Attribute functions may provide attributes of
Arrays
Types
Signals
Entities
Attributes are invoked as name'attribute_name.
The single quote is read as tick.
Array Attributes
Array attributes interrogate the properties of arrays. Consider the
declaration:
TYPE regfile IS ARRAY(0 To 3, 7 Downto 0) OF BIT;
Then we can use the following attributes:
LEFT: regfile'LEFT(2) = 7
RIGHT: regfile'RIGHT(1) = 3
HIGH: regfile'HIGH(2) = 7
LOW: regfile'LOW(1) = 0
RANGE: regfile'RANGE(1) = 0 TO 3
REVERSE_RANGE: regfile'REVERSE_RANGE(1) = 3 DOWNTO 0
LENGTH: regfile'LENGTH(1) = 4
ASCENDING: regfile'ASCENDING(1) = TRUE
Type Attributes
Type attributes apply only to scalar types. Consider the
declarations:
TYPE nineval IS ('U', 'X', '0', '1', 'Z', 'L', 'H', 'W', '-');
SUBTYPE fourval IS nineval RANGE 'X' to 'Z';
Then, fourval'BASE = nineval.
Attributes LEFT, RIGHT, HIGH and LOW are defined for TYPES
also. When applied to a TYPE, these return the corresponding
values as defined for the type. For example,
nineval'LEFT = 'U', fourval'LEFT = 'X'
POSITIVE'LOW = 1
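The array and type attribute values quoted in this part can be checked with a sketch such as the one below, which restates the regfile, nineval and fourval declarations and asserts each value. The entity name attr_demo is hypothetical.

```vhdl
entity attr_demo is
end entity attr_demo;

architecture sketch of attr_demo is
  type regfile is array (0 to 3, 7 downto 0) of bit;
  type nineval is ('U', 'X', '0', '1', 'Z', 'L', 'H', 'W', '-');
  subtype fourval is nineval range 'X' to 'Z';
begin
  process
  begin
    -- array attributes; the argument selects the dimension
    assert regfile'left(2) = 7 and regfile'right(1) = 3
      report "LEFT/RIGHT mismatch" severity error;
    assert regfile'high(2) = 7 and regfile'low(1) = 0
      report "HIGH/LOW mismatch" severity error;
    assert regfile'length(1) = 4 and regfile'length(2) = 8
      report "LENGTH mismatch" severity error;
    assert regfile'ascending(1) and not regfile'ascending(2)
      report "ASCENDING mismatch" severity error;
    -- type attributes
    assert nineval'left = 'U' and fourval'left = 'X' and fourval'right = 'Z'
      report "type LEFT/RIGHT mismatch" severity error;
    assert positive'low = 1
      report "POSITIVE'LOW mismatch" severity error;
    wait;
  end process;
end architecture sketch;
```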
Signal Attributes
Name            Example           Return type   Value type
DELAYED         s'DELAYED         Signal        same as s
STABLE          s'STABLE(5 ns)    Signal        Boolean
EVENT           s'EVENT           Value         Boolean
QUIET           s'QUIET(3 ns)     Signal        Boolean
TRANSACTION     s'TRANSACTION     Signal        BIT
DRIVING         s'DRIVING         Value         Boolean
DRIVING_VALUE   s'DRIVING_VALUE   Value         same as s
Case of RS Latch
Entity RS_Latch is
Port(R, S: IN BIT; Q, Qbar: OUT BIT);
End Entity RS_Latch;
Architecture trouble of RS_Latch is
Begin
Q <= R NOR Qbar;
Qbar <= S NOR Q;
End Architecture trouble;
This will run into trouble as Q and Qbar are declared to be
outputs and cannot be used on the RHS expression of an
assignment.
RS Latch
We have several choices:

Declare Q and Qbar to be inout.
This is not safe as this will allow outside circuitry to drive the Q and
Qbar nodes.
Use structural description and connect nor outputs to internal
signals s1 and s2. Later assign s1 and s2 to Q, Qbar.
This introduces an artificial delay in driving of Q and Qbar.
A better choice is to use the driving_value attribute.
Part IV
The IEEE Package Std_Logic_1164
Signal types in Package Std_Logic_1164
The resolution Function
Functions Defined in std_logic package 1164
Logic Functions with std_logic

9 Valued Logic
The std_logic_1164 package uses 9 valued logic.
The basic unresolved signal type is declared as:
TYPE std_ulogic IS ('U', 'X', '0', '1', 'Z', 'W', 'L', 'H', '-');
Here U is uninitialized,
X is forcing unknown, W is weak unknown,
L and H are weak 0 and 1,
Z is high impedance and - is don't care.
This type combines signal values and drive strengths,
permitting modeling of open drain and wired-or circuits. Other
types are derived from this basic signal type.

Derived types
We derive the following types from the basic std_ulogic signal type:

TYPE std_ulogic_vector IS
ARRAY (NATURAL RANGE <>) OF std_ulogic;
FUNCTION resolved(s: std_ulogic_vector) RETURN std_ulogic;
SUBTYPE std_logic IS resolved std_ulogic;
TYPE std_logic_vector IS
ARRAY (NATURAL RANGE <>) OF std_logic;

Other Types
The IEEE package 1164 also defines the following subtypes of
std_ulogic.
1. X01 allows the values X, 0 and 1.
2. X01Z allows the values X, 0, 1 and Z. This type is
   compatible with the default verilog signal type.
3. UX01 allows the values U, X, 0 and 1.
4. UX01Z allows the values U, X, 0, 1 and Z.
The package includes functions for conversion between various
types.

The Resolution Function

This function uses the following table:
     U  X  0  1  Z  W  L  H  -
U    U  U  U  U  U  U  U  U  U
X    U  X  X  X  X  X  X  X  X
0    U  X  0  X  0  0  0  0  X
1    U  X  X  1  1  1  1  1  X
Z    U  X  0  1  Z  W  L  H  X
W    U  X  0  1  W  W  W  W  X
L    U  X  0  1  L  W  L  W  X
H    U  X  0  1  H  W  W  H  X
-    U  X  X  X  X  X  X  X  X
The resolution function receives a vector of driving values of
type std_ulogic. The return is type std_ulogic!

FUNCTION resolved(s: std_ulogic_vector)
RETURN std_ulogic IS
VARIABLE result: std_ulogic := 'Z';
BEGIN
IF (s'LENGTH = 1) THEN RETURN s(s'LOW);
ELSE
FOR i IN s'RANGE LOOP
result := resolution_table(result, s(i));
END LOOP;
END IF;
RETURN result;
END resolved;
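The effect of the resolution function can be observed with a sketch like the one below, in which two concurrent assignments drive the same std_logic signal. The entity name resolve_demo, the labels and the signal name shared are hypothetical. The expected values are read off the resolution table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity resolve_demo is
end entity resolve_demo;

architecture sketch of resolve_demo is
  signal shared : std_logic;
begin
  -- two drivers on one signal; the resolution function combines them
  driver_a : shared <= '0', 'Z' after 10 ns, '1' after 20 ns;
  driver_b : shared <= 'Z', 'H' after 10 ns, '0' after 20 ns;

  check : process
  begin
    wait for 5 ns;      -- drivers are '0' and 'Z'
    assert shared = '0' report "0 with Z should give 0" severity error;
    wait for 10 ns;     -- at 15 ns the drivers are 'Z' and 'H'
    assert shared = 'H' report "Z with H should give H" severity error;
    wait for 10 ns;     -- at 25 ns the drivers are '1' and '0'
    assert shared = 'X' report "1 with 0 should give X" severity error;
    wait;
  end process check;
end architecture sketch;
```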

Since signals can now acquire a multiplicity of values, we need
to redefine logic functions.
This is done by overloading logic functions with new definitions
when their arguments are of type std_ulogic or std_logic.
What happens when we put an inverter on a std_ulogic signal?
This is defined by the NOT logic function:
input:   U  X  0  1  Z  W  L  H  -
output:  U  X  1  0  X  X  1  0  X

Logic Truth TABLES

Truth tables of 2 input logic functions will now be 9x9 matrices!
AND  U  X  0  1  Z  W  L  H  -
U    U  U  0  U  U  U  0  U  U
X    U  X  0  X  X  X  0  X  X
0    0  0  0  0  0  0  0  0  0
1    U  X  0  1  X  X  0  1  X
Z    U  X  0  X  X  X  0  X  X
W    U  X  0  X  X  X  0  X  X
L    0  0  0  0  0  0  0  0  0
H    U  X  0  1  X  X  0  1  X
-    U  X  0  X  X  X  0  X  X

Conversion Functions
The following type conversion functions are included in
package 1164:
To_bit (from std_ulogic) and To_StdULogic (from bit)
To_bitvector (from std_logic_vector and std_ulogic_vector)
To_StdULogicVector (from bit_vector) and
To_StdLogicVector (from bit_vector)
To_StdLogicVector (from std_ulogic_vector) and
To_StdULogicVector (from std_logic_vector)
There are similar functions for inter-conversions between
X01, X01Z etc. and std_logic and std_ulogic.

Edge Detection Functions

The IEEE library package 1164 includes edge detection
functions for std_ulogic types. These are defined as:
FUNCTION rising_edge (SIGNAL s: std_ulogic)
RETURN Boolean;
The rising edge is detected when there is a transition
from 0 or L to 1 or H.
FUNCTION falling_edge (SIGNAL s: std_ulogic)
RETURN Boolean;
The falling edge is detected when there is a transition
from 1 or H to 0 or L.
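A typical use of rising_edge is a positive edge triggered flip-flop. The sketch below is one such description with a small stimulus process. The entity name dff_demo and the signal names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff_demo is
end entity dff_demo;

architecture sketch of dff_demo is
  signal clk, d, q : std_logic := '0';
begin
  flipflop : process (clk)
  begin
    if rising_edge(clk) then    -- TRUE only on a 0 to 1 transition of clk
      q <= d;
    end if;
  end process flipflop;

  stimulus : process
  begin
    d <= '1';
    wait for 5 ns;
    assert q = '0' report "q changed without a clock edge" severity error;
    clk <= '1';                 -- rising edge: q takes the value of d
    wait for 5 ns;
    assert q = '1' report "q did not follow d on rising edge" severity error;
    d <= '0';
    clk <= '0';                 -- falling edge: q must hold its value
    wait for 5 ns;
    assert q = '1' report "q changed on falling edge" severity error;
    wait;
  end process stimulus;
end architecture sketch;
```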
Part V
An Example Design
A magnitude comparator
First Level Description
Constructing the Byte Comparator
Structural Description of Bit Comparator

A Magnitude Comparator
The example used in this section has been described in

the book: VHDL: Analysis and Modeling of Digital
Systems by Zainalabedin Navabi (McGraw Hill).
However the treatment in this tutorial is different.
We illustrate top down design using this example.

We want to design a circuit to compare the magnitude of
two binary numbers.
We shall illustrate the design by a comparator for byte wide
numbers.
However, the design should be stackable, so that wider
numbers can be compared.
The inputs to the system are the two numbers and the stacking
inputs gt_in, eq_in and lt_in.
The outputs are the result of comparison: gt_out, eq_out
and lt_out.
The stacking inputs and outputs use one hot coding:
exactly one of the conditions gt, eq or lt is TRUE at a given
time.

First level description
Library IEEE;
USE IEEE.std_logic_1164.ALL;
-- the type Byte has to be declared in a package visible to the entity
TYPE Byte IS Array (7 DownTo 0) OF std_ulogic;
Entity Byte_Compar is
Port(a, b: IN BYTE;
gt_in, eq_in, lt_in: IN std_ulogic;
gt_out, eq_out, lt_out: OUT std_ulogic);
End Entity Byte_Compar;

Architecture of Byte Comparator

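Architecture first Of Byte_Compar is
BEGIN
  P1: PROCESS(a, b, gt_in, eq_in, lt_in)
    -- Variables belong to the process. ByteVal is assumed to be a
    -- function returning the integer value of a Byte.
    Variable val1, val2: Integer := 0;
  BEGIN
    val1 := ByteVal(a);
    val2 := ByteVal(b);
    IF (val1 > val2) THEN
      gt_out <= '1'; eq_out <= '0'; lt_out <= '0';
    ELSIF (val1 < val2) THEN
      gt_out <= '0'; eq_out <= '0'; lt_out <= '1';
    ELSE
      gt_out <= gt_in; eq_out <= eq_in; lt_out <= lt_in;
    END IF;
  END PROCESS P1;
END Architecture first;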
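The behavioural description relies on a function ByteVal which is not shown in these slides. One possible sketch is given below; the package name Byte_Types, the placement of the function in the same package as the Byte type, and the treatment of unknown bits as 0 are all assumptions.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical package holding the Byte type together with ByteVal.
PACKAGE Byte_Types IS
  TYPE Byte IS ARRAY (7 DOWNTO 0) OF std_ulogic;
  FUNCTION ByteVal(v: Byte) RETURN Integer;
END PACKAGE Byte_Types;

PACKAGE BODY Byte_Types IS
  FUNCTION ByteVal(v: Byte) RETURN Integer IS
    VARIABLE result: Integer := 0;
  BEGIN
    -- v'RANGE is 7 DOWNTO 0, so the most significant bit is handled first
    FOR i IN v'RANGE LOOP
      result := result * 2;
      -- '1' and 'H' count as 1; every other value is treated as 0 (assumption)
      IF To_X01(v(i)) = '1' THEN
        result := result + 1;
      END IF;
    END LOOP;
    RETURN result;
  END FUNCTION ByteVal;
END PACKAGE BODY Byte_Types;
```

With this definition, a byte holding "00000101" gives 5, so the comparisons val1 > val2 and val1 < val2 operate on unsigned magnitudes in the range 0 to 255.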

Decomposition of Byte Comparator

The byte comparator is difficult to design directly.
We can break up the design into bit comparators
with cascading inputs gt_in, eq_in and lt_in;
and cascading outputs gt_out, eq_out and lt_out.
[Figure: eight BitPart cells in cascade, comparing the bit pairs (A0, B0) to (A7, B7); the >, = and < outputs of each cell drive the cascading inputs of the next cell.]
Notice that the most significant bit is compared closest to the
output.

Composing the Byte comparator

Architecture compose of Byte_Compar IS
  COMPONENT BitPart IS
    Port(a, b: IN std_ulogic;
         gt_in, eq_in, lt_in: IN std_ulogic;
         gt_out, eq_out, lt_out: OUT std_ulogic);
  END COMPONENT BitPart;
  FOR ALL: BitPart
    USE ENTITY WORK.Bit_Compar(behave);
  TYPE Connect IS ARRAY (1 TO 3, 0 TO 6) OF std_ulogic;
  Signal Cascade: Connect;


BEGIN
  -- Positional port order: a, b, gt_in, eq_in, lt_in, gt_out, eq_out, lt_out.
  -- Cascade(1, I), Cascade(2, I) and Cascade(3, I) carry gt, eq and lt
  -- from stage I to stage I+1.
  Stages: FOR I IN 0 TO 7 GENERATE
    First: IF I = 0 GENERATE
      Cell: BitPart PORT MAP
        (a(I), b(I),
         gt_in, eq_in, lt_in,
         Cascade(1, I), Cascade(2, I), Cascade(3, I));
    END GENERATE First;
    Last: IF I = 7 GENERATE
      Cell: BitPart PORT MAP
        (a(I), b(I),
         Cascade(1, I-1), Cascade(2, I-1), Cascade(3, I-1),
         gt_out, eq_out, lt_out);
    END GENERATE Last;
    Mid: IF (I > 0) AND (I < 7) GENERATE
      Cell: BitPart PORT MAP
        (a(I), b(I),
         Cascade(1, I-1), Cascade(2, I-1), Cascade(3, I-1),
         Cascade(1, I), Cascade(2, I), Cascade(3, I));
    END GENERATE Mid;
  END GENERATE Stages;
END Architecture compose;

The bit comparator

Once we have decomposed the byte comparator as above, we
need to design the bit comparator.
Each bit comparator receives a pair of bits to compare.
If A > B, i.e. A=1 and B=0, it makes the output gt_out
TRUE and makes the other outputs FALSE.
If A < B, i.e. A=0 and B=1, it makes the output lt_out TRUE
and makes the other outputs FALSE.
If A and B are equal, it copies its cascading inputs (gt_in,
eq_in, lt_in) to its outputs (gt_out, eq_out, lt_out).

The bit comparator

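Library IEEE; USE IEEE.std_logic_1164.ALL;
Entity Bit_Compar is
  Port(a, b, gt_in, eq_in, lt_in: IN std_ulogic;
       gt_out, eq_out, lt_out: OUT std_ulogic);
End Entity Bit_Compar;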

Behavioural Architecture of Bit Comparator

Architecture behave Of Bit_Compar is
BEGIN
  P1: PROCESS(a, b, gt_in, eq_in, lt_in)
  BEGIN
    IF (a = '1' AND b = '0') THEN
      gt_out <= '1'; eq_out <= '0'; lt_out <= '0';
    ELSIF (a = '0' AND b = '1') THEN
      gt_out <= '0'; eq_out <= '0'; lt_out <= '1';
    ELSE
      gt_out <= gt_in; eq_out <= eq_in; lt_out <= lt_in;
    END IF;
  END PROCESS P1;
END Architecture behave;
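The behaviour described above can be checked with a small test bench. The sketch below assumes that the entity Bit_Compar, with ports named a, b, gt_in, eq_in, lt_in, gt_out, eq_out and lt_out, and its architecture behave have been compiled into the library WORK; the test bench name and the 10 ns settling time are arbitrary choices.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical test bench; it has no ports of its own.
ENTITY Bit_Compar_TB IS
END ENTITY Bit_Compar_TB;

ARCHITECTURE test OF Bit_Compar_TB IS
  SIGNAL a, b: std_ulogic := '0';
  -- Cascading inputs held at "equal so far" (one hot: only eq_in is '1')
  SIGNAL gt_in, lt_in: std_ulogic := '0';
  SIGNAL eq_in: std_ulogic := '1';
  SIGNAL gt_out, eq_out, lt_out: std_ulogic;
BEGIN
  -- Named association, so the check does not depend on the port order
  DUT: ENTITY WORK.Bit_Compar(behave)
    PORT MAP(a => a, b => b,
             gt_in => gt_in, eq_in => eq_in, lt_in => lt_in,
             gt_out => gt_out, eq_out => eq_out, lt_out => lt_out);

  Stimulus: PROCESS IS
  BEGIN
    a <= '1'; b <= '0'; WAIT FOR 10 ns;        -- A > B
    ASSERT gt_out = '1' AND eq_out = '0' AND lt_out = '0'
      REPORT "A > B case failed" SEVERITY ERROR;

    a <= '0'; b <= '1'; WAIT FOR 10 ns;        -- A < B
    ASSERT gt_out = '0' AND eq_out = '0' AND lt_out = '1'
      REPORT "A < B case failed" SEVERITY ERROR;

    a <= '1'; b <= '1'; WAIT FOR 10 ns;        -- A = B: inputs are copied
    ASSERT gt_out = '0' AND eq_out = '1' AND lt_out = '0'
      REPORT "A = B case failed" SEVERITY ERROR;
    WAIT;
  END PROCESS Stimulus;
END ARCHITECTURE test;
```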


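We can write Karnaugh maps for the three outputs easily.
[Figure: Karnaugh maps for gt_out, eq_out and lt_out. Each map has columns ab = 00, 01, 11, 10 and rows for the corresponding cascading input (gt_in, eq_in or lt_in) = 0, 1.]
This gives:
$gt\_out = a\,\overline{b} + gt\_in\,(a + \overline{b})$
$lt\_out = \overline{a}\,b + lt\_in\,(\overline{a} + b)$
$eq\_out = eq\_in\,(a\,b + \overline{a}\,\overline{b})$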
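These three equations can also be written directly as concurrent signal assignments. The sketch below adds a further architecture, here called dataflow (an assumed name), to the entity Bit_Compar, whose port names are taken to be a, b, gt_in, eq_in, lt_in, gt_out, eq_out and lt_out.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Hypothetical dataflow architecture: one assignment per output equation.
ARCHITECTURE dataflow OF Bit_Compar IS
BEGIN
  -- a greater than b at this bit, or not less at this bit and greater so far
  gt_out <= (a AND NOT b) OR (gt_in AND (a OR NOT b));
  -- a less than b at this bit, or not greater at this bit and less so far
  lt_out <= (NOT a AND b) OR (lt_in AND (NOT a OR b));
  -- equal at this bit and equal so far
  eq_out <= eq_in AND ((a AND b) OR (NOT a AND NOT b));
END ARCHITECTURE dataflow;
```

The parentheses are required: VHDL does not allow AND and OR to be mixed in one expression without them.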

Final Design of bit comparator
[Figure: gate level schematic of the bit comparator, with inputs a, b, lt_in, eq_in and gt_in, two internal sum nodes, and outputs lt_out, eq_out and gt_out.]
This design can be described structurally in terms of basic
gates.
The design uses only inverting gates. It can be implemented
directly on a chip.


Structural Architecture of Bit Comparator

Architecture struct Of Bit_Compar is
  Component Inv IS
    PORT(In1: IN std_ulogic; op1: OUT std_ulogic);
  END COMPONENT Inv;
  FOR ALL: Inv USE ENTITY WORK.Inverter(behav);
  Component Nand2 IS
    PORT(In1, In2: IN std_ulogic; op1: OUT std_ulogic);
  END COMPONENT Nand2;
  FOR ALL: Nand2 USE ENTITY WORK.Nand2(behav);
  Component Nand3 IS
    PORT(In1, In2, In3: IN std_ulogic; op1: OUT std_ulogic);
  END COMPONENT Nand3;
  FOR ALL: Nand3 USE ENTITY WORK.Nand3(behav);
  -- AplusBbar = a + (NOT b), BplusAbar = b + (NOT a)
  SIGNAL Abar, Bbar, AplusBbar, BplusAbar: std_ulogic;
  SIGNAL s1, s2, Eqbar: std_ulogic;
BEGIN
  Inv1: Inv PORT MAP(A, Abar);
  Inv2: Inv PORT MAP(B, Bbar);
  N1:   Nand2 PORT MAP(A, Bbar, BplusAbar);
  N2:   Nand2 PORT MAP(B, Abar, AplusBbar);
  N3:   Nand2 PORT MAP(lt_in, BplusAbar, s1);
  N4:   Nand2 PORT MAP(gt_in, AplusBbar, s2);
  N5:   Nand2 PORT MAP(s1, AplusBbar, lt_out);
  N6:   Nand2 PORT MAP(s2, BplusAbar, gt_out);
  N7:   Nand3 PORT MAP(AplusBbar, BplusAbar, Eq_in, Eqbar);
  Inv3: Inv PORT MAP(Eqbar, Eq_out);
END ARCHITECTURE struct;
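The structural architecture binds its components to the entities Inverter, Nand2 and Nand3, each with an architecture behav, none of which is shown in these slides. A minimal sketch of what these leaf cells could look like, without any delay modelling (an assumption), is:

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Inverter IS
  PORT(In1: IN std_ulogic; op1: OUT std_ulogic);
END ENTITY Inverter;
ARCHITECTURE behav OF Inverter IS
BEGIN
  op1 <= NOT In1;
END ARCHITECTURE behav;

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Nand2 IS
  PORT(In1, In2: IN std_ulogic; op1: OUT std_ulogic);
END ENTITY Nand2;
ARCHITECTURE behav OF Nand2 IS
BEGIN
  op1 <= In1 NAND In2;
END ARCHITECTURE behav;

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Nand3 IS
  PORT(In1, In2, In3: IN std_ulogic; op1: OUT std_ulogic);
END ENTITY Nand3;
ARCHITECTURE behav OF Nand3 IS
BEGIN
  -- NAND is not associative, so the three input gate is written as NOT (AND)
  op1 <= NOT (In1 AND In2 AND In3);
END ARCHITECTURE behav;
```

The port names In1, In2, In3 and op1 match the component declarations, so the default port association in the binding applies.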

Inline configuration
The configuration of a component can be declared inline in an
architecture.
Architecture compose of Byte_Compar IS
  COMPONENT BitPart IS
    -- port declarations as before
  END COMPONENT BitPart;
  FOR ALL: BitPart
    USE ENTITY WORK.Bit_Compar(behave);
  TYPE Connect IS ARRAY (1 TO 3, 0 TO 6) OF std_ulogic;
  Signal Cascade: Connect;
All components of type BitPart have been configured to use the
entity Bit_Compar with architecture behave.

Standalone configuration
In the example given, all components of type BitPart were
configured to use the entity Bit_Compar with architecture
behave.
This was specified inline in the architecture declarative
part.
We can instead write a separate configuration description outside
the architecture, using a configuration declaration.

Standalone configuration

The syntax of a standalone configuration is:
CONFIGURATION configname OF entityname IS
  FOR architecture_name
    FOR instance_name | OTHERS | ALL : component_name
      USE ENTITY sub_entity_name(sub_architecture_name);
      ...
    END FOR;
  END FOR;
END [CONFIGURATION] [configname];

Hierarchical configuration
The architecture being configured may contain
components which are bound to architectures containing
other components.
This requires hierarchical configuration.
Instead of binding component instances to
entity-architecture pairs directly, we bind these to other
configurations.
These other configurations associate the component with
an entity-architecture pair and configure the lower level
components.

Hierarchical configuration
The syntax used for hierarchical configuration is:
CONFIGURATION configname OF entityname IS
  FOR architecture_name
    FOR instance_name | OTHERS | ALL : component_name
      USE CONFIGURATION subconfig_name;
      ...
    END FOR;
  END FOR;
END [CONFIGURATION] [configname];
Subconfig_name will associate the component with an
entity-architecture pair and will configure lower level
components in the hierarchy.

Hierarchy in a single configuration

The hierarchy can be described through nested FORs in a
single configuration description.
CONFIGURATION single OF Byte_Compar IS
  FOR compose  -- architecture name
    FOR ALL: BitPart
      USE ENTITY WORK.Bit_Compar(struct);
      FOR struct  -- architecture of Bit_Compar
        FOR ALL: Nand2 USE ENTITY . . .
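To show a complete instance of the nested form, the sketch below uses a deliberately small two level design rather than the comparator. All names in it (Inverter, Buf2, Top, BufC, Top_Cfg) are assumptions made for the example, and the architectures contain no inline configuration, so that all binding is done by the configuration declaration.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Inverter IS
  PORT(In1: IN std_ulogic; op1: OUT std_ulogic);
END ENTITY Inverter;
ARCHITECTURE behav OF Inverter IS
BEGIN
  op1 <= NOT In1;
END ARCHITECTURE behav;

-- Lower level: a buffer built from two inverter components
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Buf2 IS
  PORT(x: IN std_ulogic; y: OUT std_ulogic);
END ENTITY Buf2;
ARCHITECTURE struct OF Buf2 IS
  COMPONENT Inv IS
    PORT(In1: IN std_ulogic; op1: OUT std_ulogic);
  END COMPONENT Inv;
  SIGNAL mid: std_ulogic;
BEGIN
  I1: Inv PORT MAP(x, mid);
  I2: Inv PORT MAP(mid, y);
END ARCHITECTURE struct;

-- Upper level: uses one buffer component
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY Top IS
  PORT(x: IN std_ulogic; y: OUT std_ulogic);
END ENTITY Top;
ARCHITECTURE struct OF Top IS
  COMPONENT BufC IS
    PORT(x: IN std_ulogic; y: OUT std_ulogic);
  END COMPONENT BufC;
BEGIN
  B1: BufC PORT MAP(x, y);
END ARCHITECTURE struct;

-- One configuration describing both levels through nested FORs
CONFIGURATION Top_Cfg OF Top IS
  FOR struct                              -- architecture of Top
    FOR ALL: BufC
      USE ENTITY WORK.Buf2(struct);
      FOR struct                          -- architecture of Buf2
        FOR ALL: Inv
          USE ENTITY WORK.Inverter(behav);
        END FOR;
      END FOR;
    END FOR;
  END FOR;
END CONFIGURATION Top_Cfg;
```

Simulating Top_Cfg rather than Top(struct) selects these bindings. Had a separate configuration been written for Buf2, the inner FOR struct block could be replaced by USE CONFIGURATION, which is the hierarchical form described earlier.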
Part VI
File I-O in VHDL
10 Files in VHDL
File Declarations
Opening and Closing Files
Reading and writing
Example of File usage
11 The Textio Package
Files in VHDL
To VHDL, a file is a collection of information of a type that is
known to it.
File I-O presents a special problem, because conventions
for naming files and directories are different for different
Operating Systems.
We would like to insulate hardware descriptions from this
variation.
We do it by making a distinction between file names used
by VHDL and the operating system dependent filename
which is associated with it.
FILE Types
In VHDL, in order to use files, we use a two step procedure.
1 We declare a FILE TYPE first. This associates a file type
with the kind of objects that files of this type will contain.
2 We can then declare files of this FILE TYPE.
The file declaration associates a VHDL filename with a
FILE TYPE and, optionally, with a physical file name and
file mode (read, write or append).
Examples
TYPE datafile IS FILE OF CHARACTER;
This specifies that any file which has the type datafile will
contain characters: each read will return a character, while
each write will accept a character to be written to the file.
Once a file type has been declared, we may declare one or
more files of this type. For example,
FILE vfile1: datafile;
FILE vfile2: datafile IS "indata.dat";
FILE vfile3: datafile OPEN WRITE_MODE IS "output.dat";
FILE vfile1: datafile;
This form merely associates the VHDL name vfile1 with the file
TYPE datafile, which specifies that it contains characters.
FILE vfile2: datafile IS "indata.dat";
This form also associates the VHDL filename vfile2 with the
physical filename indata.dat. Since no mode is given, the file
is opened in the default read mode.
FILE vfile3: datafile OPEN WRITE_MODE IS "output.dat";
This form associates the VHDL filename vfile3 with the physical
filename output.dat and also opens it in write mode.

If a file has not been opened during its declaration, it can be
opened later by specific statements.
Once a file type has been declared as:
TYPE FileType IS FILE OF DataType;
it implicitly defines various procedures and functions.
PROCEDURE FILE_OPEN(FILE f: FileType;
  Phys_name: IN string;
  open_kind: IN FILE_OPEN_KIND := READ_MODE);
PROCEDURE FILE_CLOSE(FILE f: FileType);
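A minimal sketch of opening a file after its declaration is shown below. It uses the second implicitly declared form of FILE_OPEN, which VHDL-93 also provides and which returns a status value as its first parameter, so that a missing file can be handled instead of stopping the simulation. The entity name Open_Demo and the file name indata.dat are assumptions made for illustration.

```vhdl
-- Open_Demo is a hypothetical, port-less entity used only to host the process.
ENTITY Open_Demo IS
END ENTITY Open_Demo;

ARCHITECTURE sketch OF Open_Demo IS
  TYPE datafile IS FILE OF CHARACTER;
BEGIN
  Try_Open: PROCESS IS
    FILE vfile1: datafile;               -- declared, but not yet opened
    VARIABLE status: FILE_OPEN_STATUS;   -- OPEN_OK, STATUS_ERROR, NAME_ERROR or MODE_ERROR
  BEGIN
    -- "indata.dat" is only an illustrative physical file name
    FILE_OPEN(status, vfile1, "indata.dat", READ_MODE);
    IF status = OPEN_OK THEN
      REPORT "indata.dat opened for reading";
      FILE_CLOSE(vfile1);
    ELSE
      REPORT "could not open indata.dat" SEVERITY WARNING;
    END IF;
    WAIT;
  END PROCESS Try_Open;
END ARCHITECTURE sketch;
```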
Reading from and Writing to Files
Once file types and files have been declared, various
subprograms become available.
PROCEDURE READ(FILE f: FileType; value: OUT DataType);
PROCEDURE WRITE(FILE f: FileType; value: IN DataType);
FUNCTION ENDFILE(FILE f: FileType) RETURN Boolean;
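The sketch below puts these subprograms together: it writes a few characters to a file, closes it, and reads them back until ENDFILE becomes true. The entity name File_RW_Demo, the file name scratch.dat and the test string are assumptions.

```vhdl
-- File_RW_Demo is a hypothetical, port-less entity used only to host the process.
ENTITY File_RW_Demo IS
END ENTITY File_RW_Demo;

ARCHITECTURE sketch OF File_RW_Demo IS
  TYPE datafile IS FILE OF CHARACTER;
BEGIN
  Round_Trip: PROCESS IS
    FILE vfile: datafile;
    CONSTANT message: STRING := "VHDL";
    VARIABLE c: CHARACTER;
    VARIABLE count: NATURAL := 0;
  BEGIN
    -- Write the string one character at a time
    FILE_OPEN(vfile, "scratch.dat", WRITE_MODE);
    FOR i IN message'RANGE LOOP
      WRITE(vfile, message(i));
    END LOOP;
    FILE_CLOSE(vfile);

    -- Read the characters back until the end of the file is reached
    FILE_OPEN(vfile, "scratch.dat", READ_MODE);
    WHILE NOT ENDFILE(vfile) LOOP
      READ(vfile, c);
      count := count + 1;
      ASSERT c = message(count)
        REPORT "unexpected character read back" SEVERITY FAILURE;
    END LOOP;
    FILE_CLOSE(vfile);

    ASSERT count = message'LENGTH
      REPORT "wrong number of characters read back" SEVERITY FAILURE;
    WAIT;
  END PROCESS Round_Trip;
END ARCHITECTURE sketch;
```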
Unconstrained Data Types

It is possible to declare a file type to contain unconstrained
arrays as data types. For example:
TYPE VectorFile IS FILE OF std_ulogic_vector;
Now how do we know the amount of data which will be returned
upon each read request? For this, there is an additional syntax
for the read procedure:
PROCEDURE READ(FILE f: FileType; value: OUT DataType;
  Length: OUT natural);
When we use this form, we supply an array large enough to
accommodate the array in the worst case and a variable, which
will receive the length of the vector actually read.
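As a sketch of this form, the example below writes two vectors of different lengths and reads them back into a 16 element buffer. The entity name, the file name vectors.dat and the buffer size are assumptions, and the on-disk format of such a file depends on the simulator.

```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;

-- Vector_File_Demo is a hypothetical, port-less entity used only to host the process.
ENTITY Vector_File_Demo IS
END ENTITY Vector_File_Demo;

ARCHITECTURE sketch OF Vector_File_Demo IS
  TYPE VectorFile IS FILE OF std_ulogic_vector;
BEGIN
  Round_Trip: PROCESS IS
    FILE f: VectorFile;
    VARIABLE buf: std_ulogic_vector(1 TO 16);  -- large enough for the worst case
    VARIABLE len: NATURAL;                     -- length actually read
  BEGIN
    FILE_OPEN(f, "vectors.dat", WRITE_MODE);
    WRITE(f, std_ulogic_vector'("101"));
    WRITE(f, std_ulogic_vector'("11110000"));
    FILE_CLOSE(f);

    FILE_OPEN(f, "vectors.dat", READ_MODE);
    READ(f, buf, len);
    ASSERT len = 3 AND buf(1 TO 3) = "101"
      REPORT "first vector read back incorrectly" SEVERITY FAILURE;
    READ(f, buf, len);
    ASSERT len = 8 AND buf(1 TO 8) = "11110000"
      REPORT "second vector read back incorrectly" SEVERITY FAILURE;
    FILE_CLOSE(f);
    WAIT;
  END PROCESS Round_Trip;
END ARCHITECTURE sketch;
```

Only the first len elements of buf are meaningful after each read.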
Library IEEE;
USE IEEE.std_logic_1164.ALL;
ENTITY ROM_Block IS
  GENERIC(size: NATURAL; content_file: STRING);
  PORT(Chip_sel: IN std_logic;
       rdbar: IN std_logic;
       Addr: IN std_logic_vector;
       Data: OUT std_logic_vector);
END ENTITY ROM_Block;
ROM Initialization
ARCHITECTURE From_File OF ROM_Block IS
  -- DataLength is not declared in this excerpt; it is assumed to be
  -- the width of the Data port.
  SUBTYPE Word IS
    std_logic_vector(DataLength-1 DOWNTO 0);
  TYPE Mem_Array IS
    ARRAY(NATURAL RANGE 0 TO 2**size - 1) OF Word;
  -- A plain VARIABLE is not allowed in an architecture declarative part;
  -- a VHDL-93 shared variable lets the read process see the contents too.
  SHARED VARIABLE Mem_Contents: Mem_Array;
  ...
  TYPE RomData_File IS FILE OF Word;
  FILE Rom_Contents: RomData_File
    OPEN READ_MODE IS content_file;
  ...
BEGIN
  Filling: PROCESS IS
    VARIABLE Index: NATURAL;
  BEGIN
    Index := 0;
    WHILE NOT ENDFILE(Rom_Contents) LOOP
      READ(Rom_Contents, Mem_Contents(Index));
      Index := Index + 1;
    END LOOP;
    WAIT;
  END PROCESS Filling;
  ...
  -- process to handle rdbar
END ARCHITECTURE From_File;
The Textio Package
This package defines various TYPEs and provides many
procedures for handling text.
TYPE TEXT IS FILE OF STRING;
TYPE LINE IS ACCESS STRING;
FILE INPUT: TEXT OPEN READ_MODE IS "STD_INPUT";
FILE OUTPUT: TEXT OPEN WRITE_MODE IS "STD_OUTPUT";
PROCEDURE READLINE(FILE f: TEXT; L: INOUT LINE);
Reading and Writing Text

Text reading and writing is a two step procedure. For writing,
you first compose a line and then write it to a file. For reading,
you read a line and then extract values from it.
Several overloaded procedures, all carrying the names READ or
WRITE, are provided for this. For example:
PROCEDURE READ (L: InOut LINE; value: OUT BIT);
PROCEDURE READ (L: InOut LINE; value: OUT BIT_VECTOR);
PROCEDURE READ (L: InOut LINE; value: OUT Integer);
etc.
Similarly, there are many WRITE procedures.
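The two step procedure can be sketched as follows. Each integer is first composed into a line with WRITE and then sent to the file with WRITELINE (the counterpart of READLINE in the same package); on reading, READLINE fetches a line and READ extracts the value from it. The entity name Textio_Demo and the file name numbers.txt are assumptions.

```vhdl
USE STD.TEXTIO.ALL;

-- Textio_Demo is a hypothetical, port-less entity used only to host the process.
ENTITY Textio_Demo IS
END ENTITY Textio_Demo;

ARCHITECTURE sketch OF Textio_Demo IS
BEGIN
  Demo: PROCESS IS
    FILE f: TEXT;
    VARIABLE buf: LINE;
    VARIABLE n: INTEGER;
    VARIABLE sum: INTEGER := 0;
  BEGIN
    -- Writing: compose a line, then write it to the file
    FILE_OPEN(f, "numbers.txt", WRITE_MODE);
    FOR i IN 1 TO 3 LOOP
      WRITE(buf, i * 10);
      WRITELINE(f, buf);
    END LOOP;
    FILE_CLOSE(f);

    -- Reading: read a line, then extract a value from it
    FILE_OPEN(f, "numbers.txt", READ_MODE);
    WHILE NOT ENDFILE(f) LOOP
      READLINE(f, buf);
      READ(buf, n);
      sum := sum + n;
    END LOOP;
    FILE_CLOSE(f);

    -- The qualified expression avoids the STRING / BIT_VECTOR ambiguity
    WRITE(buf, STRING'("sum = "));
    WRITE(buf, sum);
    WRITELINE(OUTPUT, buf);

    ASSERT sum = 60 REPORT "unexpected sum" SEVERITY FAILURE;
    WAIT;
  END PROCESS Demo;
END ARCHITECTURE sketch;
```

The file receives the lines 10, 20 and 30, so the process prints "sum = 60" on the standard output.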