Design Challenges
An Overview
Jan M. Rabaey
BWRC
University of California @ Berkeley
http://bwrc.eecs.berkeley.edu
DSM: Microscopic Problems
- Wiring Load Management
- Noise, Crosstalk
- Reliability, Manufacturability
- Complexity: LRC, ERC
- Accurate Power Prediction
- Accurate Delay Prediction
- etc.
1/DSM: Macroscopic Issues
- Time-to-Market
- Millions of Gates
- High-Level Abstractions
- Reuse & IP: Portability
- Predictability
- etc.
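[Figure: two scaling trend plots. Left: log scale from 10^7 to 10^10 vs. year, 1970-2010, with markers for 64 Mbits (0.35-0.4 µm), 0.25-0.3 µm, 1 Gbits (0.15-0.2 µm), 4 Gbits (0.15 µm), and 0.08 µm, plus human memory and human DNA reference levels. Right: log scale (100.0, 1000.0) vs. year, 1980-2000, with markers for P6 and P7.]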
[Figure: power density (Watts/cm^2, 0 to 70) and supply current Icc (amps, log scale from 1 to 10,000) vs. year, 1985-2010, for the 386, 486, Pentium (R), and Pentium Pro (R). Annotations: "Due to 30% Vdd scaling", "100-2,000 amps".]
Courtesy Intel
Challenges in Deep-Submicron Design
Device scaling
Scaling of the voltage
The leaky transistor
Short- and long-term reliability
Interconnect scaling
Capacitance
Resistance
Inductance
Transistor Scaling
(velocity-saturated devices)
DSM devices: Evolution of Idsat
[Figure: Idsat (µA/µm, 200 to 800) vs. drawn channel length (µm, 0.08 to 0.24), for NMOS and PMOS]
Data taken from 16 papers (IBM,TI, Bell Labs, Motorola, Intel, AMD)
Demonstrates a relatively constant Idsat from Ldrawn of 0.25 to 0.09 µm
[Sylvester, Keutzer, 98]
Evolution of Power Density
Source: Sakurai97
Scaling the Supply Voltage
[Figure: three plots. Supply voltage (V, 0.5 to 5) vs. minimum feature size (micron, log scale). Vout (V) vs. Vin (V), both 0 to 0.2. Off-current vs. technology node (250, 180, 130, 100, 70, 50).]
Power & Delay Dependence on VDD & VTH
Courtesy Sakurai97
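The dependence named in this slide title can be explored numerically. The sketch below is not taken from the slides: it assumes an alpha-power-law delay model, t ~ C·VDD / (VDD - VTH)^alpha, and a C·VDD^2 switching energy, with hypothetical normalized constants (`k`, `c_load`, `alpha=1.3`). It only illustrates the trend that a lower VDD/VTH ratio makes delay more sensitive to VTH variation.

```python
def gate_delay(vdd, vth, c_load=1.0, k=1.0, alpha=1.3):
    """Alpha-power-law delay estimate (assumed model, normalized units)."""
    if vdd <= vth:
        raise ValueError("model only covers VDD > VTH")
    return c_load * vdd / (k * (vdd - vth) ** alpha)


def switching_energy(vdd, c_load=1.0):
    """Energy drawn from the supply per charging transition: C * VDD^2."""
    return c_load * vdd ** 2


def vth_sensitivity(vdd, vth, delta_vth=0.05):
    """Ratio of delay with VTH raised by delta_vth to the nominal delay."""
    return gate_delay(vdd, vth + delta_vth) / gate_delay(vdd, vth)


if __name__ == "__main__":
    # Hypothetical supply voltages and a hypothetical 0.4 V threshold.
    for vdd in (2.5, 1.8, 1.2, 0.9):
        d = gate_delay(vdd, vth=0.4)
        e = switching_energy(vdd)
        s = vth_sensitivity(vdd, vth=0.4)
        print(f"VDD={vdd:.1f} V  delay={d:.3f}  energy={e:.3f}  "
              f"delay ratio for +50 mV VTH={s:.3f}")
```

Under these assumptions, lowering VDD cuts the energy quadratically while the delay and its sensitivity to a fixed VTH shift both grow.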
Power-Delay vs Energy-Delay Product
Reduced VDD/VT Ratio Reduces Predictability
[Sakurai & Kuroda]
DSM Reduces Predictability
[Figure: device structure with labels Drain, Gate (two), Source, SiO2]
[Huang, IEDM99]
Device Challenges Summary
- Conventional planar CMOS continues as long as possible
- Transistor gets (slightly) faster and (plenty) leakier
- Off-current and gate-current will both increase to meet design limit
- Circuit design techniques needed to address standby power dissipation
- Deep sub-micron effects (VT-variation, drain-induced effects, hot-carrier) impact predictability
- Non-planar transistors separate shrinks from performance improvements
- Dual-gate devices help to suppress DSM effects
The Interconnect Challenge
- With increases in performance and integration density, wire parasitics gain dominance
- The wire combines capacitance, resistance, and inductance
- Wire parasitics impact performance, energy dissipation, and reliability
[Figure: wire connecting transmitters to receivers]
Interconnect Distribution
Source: Intel
[Figure: distribution of interconnect lengths; x-axis: length (µm), log scale from 10 to 100,000]
The Ideal Wire Scaling Model
The RC Dilemma
[Figure: plot vs. process generation (µm): 0.1, 0.13, 0.18, 0.25; two vertical axes, each 0 to 70; labels: fringing, parallel]
Crosstalk
Neighboring wires switch, coupling to a quiet line
Quiet line sees an undesired voltage spike
[Figure: wire cross-section above a ground plane; dimension labels W, S, T, H; coupling capacitance Cc to each neighbor; capacitances Ca and Cv]
Crosstalk can lead to:
- Logic faults (especially in dynamic circuits)
- Voltage overshoot (stress, forward-bias PN junctions)
Voltage spike, Vx ~ Cc / Ctotal
Vx is a complex function of
- Driver strength
- Fan-out capacitance
- Wiring resistance
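As a rough illustration of the Cc / Ctotal ratio above, the sketch below computes the spike for the simplest case only: a floating (undriven) quiet line, where the spike is a pure capacitive divider. This is an assumption, not the slide's model; a driven line sees a smaller spike that depends on the factors just listed. The capacitance values are hypothetical.

```python
def crosstalk_peak(v_swing, c_couple, c_ground):
    """Capacitive-divider spike on a floating quiet line.

    v_swing  : voltage step on the switching neighbor (V)
    c_couple : total coupling capacitance to the switching neighbors (F)
    c_ground : remaining capacitance of the quiet line to ground (F)
    """
    return v_swing * c_couple / (c_couple + c_ground)


if __name__ == "__main__":
    # Hypothetical values: 1.8 V swing, 40 fF coupling, 60 fF to ground.
    vx = crosstalk_peak(1.8, 40e-15, 60e-15)
    print(f"worst-case spike on a floating line: {vx:.2f} V")
```

With two neighbors switching in the same direction, `c_couple` would be the sum of both coupling capacitances.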
Delay Degradation
Miller Effect
- Both terminals of capacitor are switched in opposite directions (0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed (from Q = CV)
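The Q = CV argument above can be turned into an effective load capacitance. The sketch below is a minimal illustration, assuming the common switch-factor approximation (0, 1, or 2 times Cc depending on what the neighbor does); the function names and capacitance values are hypothetical.

```python
# Switch factor applied to the coupling capacitance, by neighbor activity.
# "opposite" is the Miller case: the voltage across Cc changes by 2 * Vdd.
SWITCH_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_coupling(c_couple, neighbor):
    """Coupling capacitance as seen by the driver of the switching wire."""
    return SWITCH_FACTOR[neighbor] * c_couple


def total_load(c_ground, c_couple, left, right):
    """Effective load of a wire with one neighbor on each side."""
    return (c_ground
            + effective_coupling(c_couple, left)
            + effective_coupling(c_couple, right))


if __name__ == "__main__":
    # Hypothetical values in fF: 60 to ground, 40 to each neighbor.
    for activity in ("same", "quiet", "opposite"):
        load = total_load(60.0, 40.0, activity, activity)
        print(f"both neighbors {activity:8s}: {load:.0f} fF")
```

Because delay scales with the load being driven, the spread between the best and worst case is what makes the delay data-dependent.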
Structured and Predictable Interconnect
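[Figure: wire pattern V S G S V S, repeated in both the horizontal and vertical directions, so that every signal wire (S) runs between a V wire and a G wire]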
Example: Dense Wire Fabric (DWF) [Khatri, DAC99]
Trade-off:
Cross-coupling capacitance 40x lower, 2% delay variation
Increase in area and overall capacitance
The Impact of Resistivity
[Figure: wire modeled as a chain of N RC sections with capacitors C1, C2, ..., CN-1, CN, driven by Vin]
Diffused signal propagation
Delay ~ L^2
[Figure: voltage (V, 0 to 2.5) vs. time (nsec, 0 to 5) at positions x = L/10, L/4, L/2, and L along the wire]
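The quadratic dependence of delay on wire length can be checked with a small calculation. The sketch below assumes the Elmore delay as the delay metric for a uniform RC ladder; the per-length resistance and capacitance values are hypothetical.

```python
def elmore_delay(r_per_len, c_per_len, length, sections=100):
    """Elmore delay at the far end of a uniform RC ladder.

    The wire is split into equal sections. The capacitor at node k is
    charged through the k resistor sections between it and the driver.
    """
    r_seg = r_per_len * length / sections
    c_seg = c_per_len * length / sections
    return sum(k * r_seg * c_seg for k in range(1, sections + 1))


if __name__ == "__main__":
    r, c = 0.1, 0.2e-15  # hypothetical: ohm per µm, farad per µm
    for length_um in (1000, 2000, 4000):
        delay = elmore_delay(r, c, length_um)
        print(f"L = {length_um} µm  delay = {delay * 1e12:.1f} ps")
```

Doubling the length doubles both the total resistance and the total capacitance, so the delay grows by a factor of four; for many sections the sum approaches r·c·L^2 / 2.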
Using Copper as Interconnect Material
- With cladding and other effects, Cu ~ 2.2 µΩ-cm vs. 3.5 µΩ-cm for Al(Cu): 40% reduction in resistance
- Yields 12% performance improvement over an aluminum process in a PowerPC design
- Electromigration improvement: 100X longer lifetime (IBM, IEDM97)
- Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95)
[Figure: transistor SEM]
The Global Wire Problem
$T_d = 0.377 R_w C_w + 0.693 (R_d C_{out} + R_d C_w + R_w C_{out})$
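The delay expression above translates directly into code. The sketch below evaluates it for hypothetical values; the parameter names follow the symbols in the expression (wire resistance and capacitance, driver resistance, load capacitance).

```python
def wire_delay(r_wire, c_wire, r_driver, c_out):
    """Td = 0.377 Rw Cw + 0.693 (Rd Cout + Rd Cw + Rw Cout), in seconds."""
    return (0.377 * r_wire * c_wire
            + 0.693 * (r_driver * c_out + r_driver * c_wire + r_wire * c_out))


if __name__ == "__main__":
    # Hypothetical global wire: 1 kOhm, 1 pF, 500 Ohm driver, 50 fF load.
    td = wire_delay(r_wire=1000.0, c_wire=1e-12, r_driver=500.0, c_out=50e-15)
    print(f"Td = {td * 1e12:.1f} ps")
```

The first term is the wire's own distributed RC contribution; it depends only on the wire and grows quadratically with length, so a stronger driver cannot reduce it.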
Challenges
- No further improvements to be expected after the introduction of copper (superconducting, optical?)
Design solutions
- Use of fat wires
- Insert repeaters, but these might become prohibitive (power, area)
- Efficient chip floorplanning
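To see why repeaters help and why they stop helping, the sketch below applies this slide's delay expression to each segment of a wire cut into n equal pieces. It assumes identical repeaters (same output resistance and input capacitance) and ignores the power and area cost noted above; all values are hypothetical.

```python
def segment_delay(r_wire, c_wire, r_driver, c_out):
    """Delay of one driver plus wire segment, using the slide's expression."""
    return (0.377 * r_wire * c_wire
            + 0.693 * (r_driver * c_out + r_driver * c_wire + r_wire * c_out))


def repeated_wire_delay(r_wire, c_wire, r_driver, c_in, n):
    """Total delay when the wire is cut into n equal, separately driven segments.

    Assumption: every segment is driven by an identical repeater with output
    resistance r_driver and is loaded by the next repeater's input c_in.
    """
    return n * segment_delay(r_wire / n, c_wire / n, r_driver, c_in)


def best_segment_count(r_wire, c_wire, r_driver, c_in, max_n=50):
    """Brute-force search for the segment count with the lowest total delay."""
    return min(range(1, max_n + 1),
               key=lambda n: repeated_wire_delay(r_wire, c_wire, r_driver, c_in, n))


if __name__ == "__main__":
    wire = dict(r_wire=2000.0, c_wire=2e-12, r_driver=500.0, c_in=20e-15)
    n_best = best_segment_count(**wire)
    for n in (1, 2, 4, n_best, 40):
        td = repeated_wire_delay(n=n, **wire)
        print(f"n = {n:2d}  Td = {td * 1e12:.0f} ps")
```

The wire term falls as 1/n while the repeater term grows with n, so the delay has a minimum; each added repeater also adds power and area, which this sketch does not model.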
Towards communication-based design
- How to deal with latency?
- Is synchronicity an absolute necessity?
Architecture Must Evolve to Fit the Landscape
Global operations
- Low bandwidth
- High latency & high power
[Figure annotations: 20 clocks; 90,000 tracks]
[Figure: wiring stack cross-sections (substrate, poly, M1 through M5) for process generations 1.0, 0.8, 0.6, 0.35, and 0.25 µm; vertical axis 0.0 to 3.5; caption: 0.25 µm wiring stack]
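Resistance and the Power Distribution Problem
[Figure: Icc (amps, log scale from 1 to 10,000) vs. year, 1985-2010, for the 386, 486, Pentium (R), and Pentium Pro (R); annotation: 100-3,000 amps]
[Figure: RI drop. A supply VDD delivers current I through wire resistance R, so the circuit sees VDD minus the RI drop.]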
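The RI drop is a one-line calculation, but it shows how small the supply-network resistance has to be at these currents. The sketch below is illustrative only; the 10% drop budget and the example voltage and current are assumptions, not values from the slides.

```python
def ir_drop(current, resistance):
    """Voltage lost across the supply network: V = I * R."""
    return current * resistance


def max_supply_resistance(vdd, current, budget=0.10):
    """Largest supply resistance that keeps the drop within budget * VDD."""
    return budget * vdd / current


if __name__ == "__main__":
    # Hypothetical chip: 1.2 V supply drawing 100 A, 10% drop allowed.
    r_max = max_supply_resistance(vdd=1.2, current=100.0)
    print(f"max supply resistance: {r_max * 1e3:.2f} mOhm")
    print(f"drop at 1 mOhm: {ir_drop(100.0, 1e-3) * 1e3:.0f} mV")
```

Falling supply voltage and rising current both shrink the allowed resistance, which is why the power grid consumes a growing share of the wiring resources.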
Resistance and the Power Distribution Problem
[Figure: layout before and after; labels: Vdd, GND, bus lines]
Inductive Noise - Ldi/dt
[Figure: di/dt in AU (log scale from 1.E+00 to 1.E+08) vs. process generation (1.5, 0.8, 0.35, 0.18, 0.1) for the 386, 486, Pentium, and Pentium Pro; annotation: di/dt noise increases]
Source: Intel
Inductive Noise - L di/dt
- Decoupling capacitance problem becoming extreme
- DEC 21164: 128 nF of on-chip decoupling
- DEC 21264: add flip-chip decoupling capacitor chip
[Figure: supply (+/-) connected through board wiring and bonding wire to the chip, with decoupling capacitance Cd across the chip]
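A back-of-the-envelope sketch of the two quantities on this slide: the L di/dt noise across the package inductance, and the decoupling capacitance needed to ride through a current step. It assumes a linear current ramp and that the capacitor alone supplies the full step for the given time (a pessimistic simplification); all numbers are hypothetical.

```python
def inductive_noise(inductance, delta_i, delta_t):
    """Voltage across the supply inductance for a linear current ramp: L * di/dt."""
    return inductance * delta_i / delta_t


def decoupling_cap(delta_i, delta_t, max_droop):
    """Capacitance that supplies delta_i for delta_t with at most max_droop.

    From Q = C * V: the charge delta_i * delta_t must come from the capacitor
    while its voltage falls by no more than max_droop.
    """
    return delta_i * delta_t / max_droop


if __name__ == "__main__":
    # Hypothetical: 1 nH effective inductance, 10 A step in 1 ns, 0.1 V droop.
    noise = inductive_noise(1e-9, 10.0, 1e-9)
    cap = decoupling_cap(10.0, 1e-9, 0.1)
    print(f"L di/dt noise without decoupling: {noise:.1f} V")
    print(f"decoupling capacitance needed: {cap * 1e9:.0f} nF")
```

With these assumed numbers the required capacitance comes out in the same range as the on-chip decoupling quoted for the DEC 21164 above.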