
Ultra Deep Submicron

Design Challenges
An Overview

Jan M. Rabaey
BWRC
University of California @ Berkeley
http://bwrc.eecs.berkeley.edu

With contributions from D. Sylvester, K. Keutzer, and many others
The Deep Sub-Micron Challenge

DSM                              1/DSM
Microscopic Problems             Macroscopic Issues
Wiring Load Management           Time-to-Market
Noise, Crosstalk                 Millions of Gates
Reliability, Manufacturability   High-Level Abstractions
Complexity: LRC, ERC             Reuse & IP: Portability
Accurate Power Prediction        Predictability
Accurate Delay Prediction        etc.
etc.

Everything Looks a Little Different ... and There's a Lot of Them!
Design at a Crossroad
Silicon technology tracking Moore's Law

Silicon in 2010
Die Area: 2.5 x 2.5 cm
Voltage: 0.6 - 0.9 V
Technology: 0.07 µm
15 times denser than today
2.5 times power density
5 times clock rate

                  Density        Access Time
                  (Gbits/cm2)    (ns)
DRAM              8.5            10
DRAM (Logic)      2.5            10
SRAM (Cache)      0.3            1.5

                  Density        Max. Ave. Power   Clock Rate
                  (Mgates/cm2)   (W/cm2)           (GHz)
Custom            25             54                3
Std. Cell         10             27                1.5
Gate Array        5              18                1
Single-Mask GA    2.5            12.5              0.7
FPGA              0.4            4.5               0.25
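
To put the per-area figures in perspective, the short sketch below multiplies them by the 2.5 x 2.5 cm die area quoted on the slide. It assumes, purely for illustration, that the whole die is built in a single implementation style; the function and variable names are hypothetical.

```python
# Die area from the slide: 2.5 cm x 2.5 cm.
DIE_AREA_CM2 = 2.5 * 2.5

# style: (density in Mgates/cm^2, max. average power in W/cm^2), from the table
STYLES = {
    "Custom": (25, 54),
    "Std. Cell": (10, 27),
    "Gate Array": (5, 18),
    "Single-Mask GA": (2.5, 12.5),
    "FPGA": (0.4, 4.5),
}


def die_totals(density_mgates_cm2, power_w_cm2, area_cm2=DIE_AREA_CM2):
    """Return (total Mgates, total watts) for a die covered by one style."""
    return density_mgates_cm2 * area_cm2, power_w_cm2 * area_cm2


for name, (density, power) in STYLES.items():
    gates, watts = die_totals(density, power)
    print(f"{name:15s} {gates:8.2f} Mgates {watts:8.2f} W")
```

Under that assumption a full-custom die would hold about 156 Mgates and dissipate several hundred watts, which is why the power numbers dominate the rest of the discussion.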
Design at a Crossroad
Silicon technology tracking Moore's Law

[Figure, left panel: number of bits per chip (64 Kbits to 64 Gbits) vs. year, 1970-2010, annotated with feature sizes from 1.6-2.4 µm down to 0.08 µm and with reference capacities: page, book, encyclopedia, 2 hrs CD audio, 30 sec HDTV, human DNA, human memory.]
[Figure, right panel: microprocessor power (MIPS, log scale) vs. year, 1980-2000, for the 286, 386, 486, Pentium, P6, and P7.]

Courtesy of David Eaglesham, Lucent


Power Dissipation

[Figure: supply current Icc (amps, log scale) and power density (Watts/cm2) vs. year, 1985-2010, for the 386, 486, Pentium (R), and Pentium Pro (R); projected supply current of 100-2,000 amps. Annotation: due to 30% Vdd scaling.]

Surpassed hot-plate power density in 0.6 µm CMOS

Courtesy Intel
Challenges in Deep-Submicron Design

Device scaling
- Scaling of the voltage
- The leaky transistor
- Short- and long-term reliability
Interconnect scaling
- Capacitance
- Resistance
- Inductance
Transistor Scaling
(velocity-saturated devices)
DSM devices: Evolution of Idsat

[Figure: Idsat (µA/µm, 200 to 800) for NMOS and PMOS vs. drawn channel length (0.08 to 0.24 µm).]

Data taken from 16 papers (IBM, TI, Bell Labs, Motorola, Intel, AMD)
Demonstrates a relatively constant Idsat from Ldrawn of 0.25 to 0.09 µm
[Sylvester, Keutzer, 98]
Evolution of Power Density

Source: Sakurai97
Scaling the Supply Voltage
[Figure, left panel: supply voltage (V) vs. minimum feature size (micron, log scale). Scaling forced by reliability and power considerations.]
[Figure, right panel: Vout (V) vs. Vin (V), both from 0 to 0.2 V. Scaling limited by gain and thermal noise.]
Projected Evolution in Ioff
[Figure: off-current at 25 °C (scale 0 to 12) vs. technology node (250, 180, 130, 100, 70, 50 nm).]
Power & Delay Dependence on VDD & VTH

Courtesy Sakurai97
Power-Delay vs Energy-Delay Product
Reduced VDD/VT Ratio Reduces Predictability

[Sakurai&Kuroda]
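
One way to see why a smaller VDD/VT ratio hurts predictability is to plug numbers into a delay model. The sketch below assumes the alpha-power-law form, delay proportional to VDD / (VDD - VT)^alpha, with an illustrative alpha of 1.3 and a 50 mV threshold shift; the model choice, the parameter values, and the function names are assumptions made for illustration, not figures from the slides.

```python
def relative_delay(vdd, vth, alpha=1.3):
    """Gate delay up to a constant factor: VDD / (VDD - VTH)**alpha."""
    if vdd <= vth:
        raise ValueError("model only covers VDD above VTH")
    return vdd / (vdd - vth) ** alpha


def delay_spread(vdd, vth, dvth=0.05, alpha=1.3):
    """Fractional delay increase when VTH is dvth above its nominal value."""
    nominal = relative_delay(vdd, vth, alpha)
    slow = relative_delay(vdd, vth + dvth, alpha)
    return slow / nominal - 1.0


VTH = 0.4  # volts, illustrative nominal threshold
for vdd in (2.5, 1.8, 1.2, 0.9):
    spread = 100 * delay_spread(vdd, VTH)
    print(f"VDD/VTH = {vdd / VTH:4.2f}: +{spread:.1f}% delay for +50 mV VTH")
```

The same threshold shift costs a few percent of delay at a high VDD/VT ratio and well over ten percent once the supply approaches twice the threshold.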
DSM Reduces Predictability

Degradation of IV characteristics of NMOS transistor due to hot-electron effects [McGaughy88]
Silicon-on-Insulator

[Figure: SOI transistor cross-section; labels: Gate, tOX, tSi < 50 nm, n+ regions, Oxide, Buried Oxide (BOX), tBOX, P Substrate.]

Extension beyond Bulk CMOS Scaling Limit
Performance Improvement
Reduced Junction Capacitance
Absence of Reverse-Body Effect
Soft Error Rate (SER) Improvement
Reduced leakage (low-power)
Latch-Up Elimination
Ease of Device Isolation
Potentially Reduced Wafer Fabrication Cost
IBM 64b PowerPC
Bulk CMOS Base Design: 0.12 µm Leff, 6LM (Cu), 450 MHz, 22 W @ 1.8 V
PD/SOI Technology: 0.12 µm Leff, 6LM (Cu), 550 MHz, 24 W @ 1.8 V

                        CMOS6S2           CMOS7S              CMOS7S SOI
Core Clock Frequency    350 MHz           450 MHz             550 MHz
L1 Cache                64KB-I + 64KB-D   128KB-I + 128KB-D   128KB-I + 128KB-D
L2 Directory            N/A               104 x 16 K          104 x 16 K
Supply Voltage          2.5 V             1.8 V               1.8 V
Transistors             12 M              34 M                34 M
Die Size                162 mm2           139 mm2             139 mm2
Power                   34 W              22 W                24 W
Leff (nFET)             0.18 µm           0.12 µm             0.12 µm
TOX                     5.0 nm            3.5 nm              3.5 nm
Metallization           5 Layers Al       6 Layers Cu         6 Layers Cu
Contacted M2-M4 Pitch   1.26 µm           0.81 µm             0.81 µm

(D. H. Allen et al., ISSCC, 1999)
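
The bulk and SOI versions can be compared on energy per clock cycle as well as raw frequency. The sketch below only divides the power and frequency figures quoted in the table; treating power divided by frequency as "energy per cycle" is a simplification that ignores leakage and workload, and the helper name is hypothetical.

```python
def energy_per_cycle_nj(power_w, freq_mhz):
    """Average energy per clock cycle in nanojoules: P / f."""
    return power_w / (freq_mhz * 1e6) * 1e9


bulk = energy_per_cycle_nj(22, 450)  # CMOS7S bulk: 22 W at 450 MHz
soi = energy_per_cycle_nj(24, 550)   # CMOS7S SOI: 24 W at 550 MHz

print(f"bulk: {bulk:.1f} nJ/cycle, SOI: {soi:.1f} nJ/cycle")
print(f"frequency gain: {100 * (550 / 450 - 1):.0f}%")
print(f"energy per cycle change: {100 * (soi / bulk - 1):.0f}%")
```

With these figures the SOI part runs about 22% faster while spending roughly 11% less energy per cycle, even though its total power is slightly higher.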


Alternatives to planar CMOS:
Vertical transistors

[Figure: vertical transistor structure; labels: Drain, Gate, Gate, Source.]

Some early types of vertical structures discussed since the 1980s
Separates performance-scaling from packing-scaling
Requires a manufacturable process with low parasitics
Dual-Gate Berkeley FinFET (1999)

[Figure: dual-gate FinFET structure; label: SiO2.]

Closer to planar technology


Proven operation for NMOS and PMOS down to 18 nm
Can be scaled down to 10 nm
Suppresses short-channel effects! [Huang, IEDM99]
Berkeley PMOS FinFET (Lg = 45 nm)

S = 69 mV/decade
Highest reported PMOS Drive Current

[Huang, IEDM99]
Device Challenges Summary
Conventional planar CMOS continues as long as possible
- Transistor gets (slightly) faster and (plenty) leakier
- Off-current and gate-current will both increase to meet design limit
- Circuit design techniques needed to address standby power dissipation
- Deep sub-micron effects (VT-variation, drain-induced effects, hot-carrier) impact predictability
Non-planar transistors separate shrinks from performance improvements
- Dual-gate devices help to suppress DSM effects
The Interconnect Challenge
With increases in performance and integration density, wire parasitics gain dominance
The wire combines capacitance, resistance, and inductance
Wire parasitics impact performance, energy dissipation and reliability

[Figure: wires connecting transmitters to receivers.]
Interconnect Distribution

[Figure: number of nets (log scale) vs. length (µ, 10 to 100,000, log scale) for the Pentium (R), Pentium (MMX), Pentium Pro (R), and Pentium (R) II.]

Source: Intel
The Ideal Wire Scaling Model
The RC Dilemma

While transistor delay scales as 1/S!
Constant Resistance Scaling

Scaling would increase R (~ S^3)
Historically, aspect ratio has increased to compensate
Constant Resistance Scaling
Differential scaling of horizontal and vertical dimensions keeps resistance in check

c: horizontal/vertical capacitance scaling factor (including fringing)


Will Interconnect Dominate Delay?
[Figure: normalized delay (1, 0.5, 0.25) vs. year (88, 94, 00), split into gate delay, delay due to sizing and buffering, and interconnect delay.]

# logic levels decreasing (architecture)
Min. gate size shrinking
Parasitics increase due to scaling
Increasing RC delay with chip size

From Aykut Dengi, 1996 ICCAD tutorial
Or Will Its Impact Decrease?
[Figure: gate delay and stage delay (ps, 0 to 120) vs. process generation (0.1, 0.13, 0.18, 0.25 µm) for a 2-input NAND, FO = 2, W/L = 16 [Keutzer98].]

Shorter local wire length, transistor sizing, and low-k dielectrics


Interconnect Projections
Low-k dielectrics
Both delay and power are reduced by dropping interconnect capacitance
Types of low-k materials include: inorganic (SiO2), organic (Polyimides) and aerogels (ultra low-k)
The numbers below are on the conservative side of the NTRS roadmap

Generation (µm)       0.25   0.18   0.13   0.1    0.07   0.05
Dielectric Constant   3.3    2.7    2.3    2.0    1.8    1.5
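
As a rough reading of the table, the sketch below expresses each generation's wire capacitance relative to the 0.25 µm generation. It assumes capacitance is simply proportional to the dielectric constant with the wire geometry held fixed, which isolates the material effect and ignores the geometric scaling that happens at the same time; the names are hypothetical.

```python
# Dielectric constant per generation (µm), copied from the table.
DIELECTRIC_CONSTANT = {
    0.25: 3.3,
    0.18: 2.7,
    0.13: 2.3,
    0.1: 2.0,
    0.07: 1.8,
    0.05: 1.5,
}


def relative_capacitance(generation_um, reference_um=0.25):
    """Capacitance relative to the reference generation, for fixed geometry."""
    return DIELECTRIC_CONSTANT[generation_um] / DIELECTRIC_CONSTANT[reference_um]


for gen in DIELECTRIC_CONSTANT:
    print(f"{gen:5.2f} µm: {100 * relative_capacitance(gen):5.1f}% of 0.25 µm capacitance")
```

Under this assumption, the dielectric alone would cut capacitance, and with it both RC delay and switching energy, to about 45% of the 0.25 µm value by the 0.05 µm generation.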
From Capacitance-to-GND to Interwire Capacitance

[Figure: wire cross-sections above a ground plane; labels: fringing and parallel capacitance components, wire width W, spacing S, thickness T, height H, coupling capacitance Cc, capacitances Ca and Cv.]

Crosstalk
Neighboring wires switch, coupling to a quiet line
Quiet line sees an undesired voltage spike
Crosstalk can lead to:
- Logic faults (especially in dynamic circuits)
- Voltage overshoot (stress, forward-bias PN junctions)
Voltage spike, Vx ~ Cc / Ctotal
Vx is a complex function of
- Driver strength
- Fan-out capacitance
- Wiring resistance
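
The proportionality between the voltage spike and Cc / Ctotal is easiest to see in the limiting case of an undriven (floating) victim, where the wires form a pure capacitive divider. The sketch below covers only that case: it assumes Ctotal is the coupling capacitance plus the victim's capacitance to ground, and it ignores the driver strength and wire resistance that the slide lists as further dependencies. The function name and example values are illustrative.

```python
def crosstalk_spike(c_coupling, c_ground, v_swing):
    """Peak noise on a floating victim: v_swing * Cc / (Cc + Cgnd)."""
    c_total = c_coupling + c_ground
    return v_swing * c_coupling / c_total


# Illustrative values: 1 fF coupling, 3 fF to ground, 1.8 V aggressor swing.
print(f"{crosstalk_spike(1e-15, 3e-15, 1.8):.2f} V")
```

A driven victim sees a smaller spike that decays as its driver restores the line, so this figure is best read as an upper bound.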
Delay Degradation

- Impact of neighboring signal activity on switching delay
- When neighboring lines switch in opposite direction of victim line, delay increases

[Figure: coupling capacitance Cc between neighboring lines.]

Miller Effect
- Both terminals of capacitor are switched in opposite directions (0 → Vdd, Vdd → 0)
- Effective voltage is doubled and additional charge is needed (from Q = CV)
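
The doubling of the effective voltage is often folded into a switching factor on the coupling capacitance. The sketch below uses that common bookkeeping: the capacitance a driver must charge is Cgnd + k * Cc, with k = 0 when the neighbor switches in the same direction, 1 when it is quiet, and 2 when it switches in the opposite direction. The factor names and the function are illustrative assumptions.

```python
# Switching factor applied to the coupling capacitance for one neighbor.
MILLER_FACTOR = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_capacitance(c_ground, c_coupling, neighbor="quiet"):
    """Capacitance seen by the victim driver for a given neighbor activity."""
    return c_ground + MILLER_FACTOR[neighbor] * c_coupling


for activity in ("same", "quiet", "opposite"):
    c_eff = effective_capacitance(2.0, 1.0, activity)
    print(f"neighbor {activity:8s}: {c_eff:.1f} fF")
```

Because delay tracks the charged capacitance, the spread between the best and worst cases is a direct measure of how much neighbor activity can move the switching delay.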
Structured and Predictable Interconnect

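[Figure: wiring fabric with a repeating V S G S V S pattern of supply (V), signal (S), and ground (G) wires, shown in both routing directions.]

Example: Dense Wire Fabric (DWF) [Khatri, DAC99]
Trade-off:
- Cross-coupling capacitance 40x lower, 2% delay variation
- Increase in area and overall capacitance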
The Impact of Resistivity
The distributed rc-line

[Figure: ladder network driven by Vin, with series resistances R1, R2, ..., RN-1, RN and shunt capacitances C1, C2, ..., CN-1, CN.]
[Figure: voltage (V, 0 to 2.5) vs. time (0 to 5 nsec) at positions x = L/10, L/4, L/2, and L along the line.]

Diffused signal propagation
Delay ~ L^2
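
The quadratic dependence of delay on length can be checked numerically on the ladder model. The sketch below computes the Elmore delay, a first-order delay estimate, at the far end of a uniform N-segment RC ladder; the per-unit-length values are normalized and the function name is hypothetical.

```python
def elmore_delay(length, r_per_len, c_per_len, n_segments=1000):
    """Elmore delay at the far end of a uniform RC ladder with n_segments."""
    r_seg = r_per_len * length / n_segments
    c_seg = c_per_len * length / n_segments
    delay = 0.0
    upstream_r = 0.0
    for _ in range(n_segments):
        upstream_r += r_seg            # resistance between source and this node
        delay += upstream_r * c_seg    # each capacitor charges through it
    return delay


base = elmore_delay(1.0, 1.0, 1.0)
doubled = elmore_delay(2.0, 1.0, 1.0)
print(f"delay(L) = {base:.4f} * r*c*L^2, delay(2L) / delay(L) = {doubled / base:.2f}")
```

Doubling the length doubles both the total resistance and the total capacitance, so the estimate grows by a factor of four and tends to r*c*L^2/2 as the number of segments increases.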
Using Copper as Interconnect Material
With cladding and other effects, Cu ~ 2.2 µΩ-cm vs. 3.5 for Al(Cu)
- 40% reduction in resistance
Yields 12% performance improvement over an aluminum process in a PowerPC design
Electromigration improvement; 100X longer lifetime (IBM, IEDM97)
Electromigration is a limiting factor beyond 0.18 µm if Al is used (HP, IEDM95)

[Figure: transistor SEM.]
The Global Wire Problem
T_d = 0.377 R_w C_w + 0.693 (R_d C_out + R_d C_w + R_w C_out)
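
The delay expression can be evaluated directly. The sketch below assumes the usual reading of the symbols, which the slide does not spell out: Rd is the driver resistance, Rw and Cw are the total wire resistance and capacitance, and Cout is the load capacitance. The example component values are illustrative only.

```python
def wire_delay(r_driver, r_wire, c_wire, c_out):
    """Td = 0.377 Rw Cw + 0.693 (Rd Cout + Rd Cw + Rw Cout), in seconds."""
    distributed = 0.377 * r_wire * c_wire
    lumped = 0.693 * (r_driver * c_out + r_driver * c_wire + r_wire * c_out)
    return distributed + lumped


# Illustrative values: 1 kOhm driver, 500 Ohm / 1 pF wire, 50 fF load.
td = wire_delay(r_driver=1e3, r_wire=500.0, c_wire=1e-12, c_out=50e-15)
print(f"Td = {td * 1e12:.1f} ps")
```

The first term grows with the square of the wire length because both Rw and Cw scale with it, which is what makes long global wires the problem case.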

Challenges
- No further improvements to be expected after the introduction of Copper (superconducting, optical?)
Design solutions
- Use of fat wires
- Insert repeaters, but might become prohibitive (power, area)
- Efficient chip floorplanning
Towards communication-based design
- How to deal with latency?
- Is synchronicity an absolute necessity?
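
The trade-off behind repeater insertion can be sketched with the global wire delay expression from this slide. The example assumes that a wire is cut into k equal segments, that each segment is driven by an identical repeater with resistance Rd and input capacitance Cin, and that the intrinsic repeater delay is negligible; these simplifications, the names, and the component values are all illustrative.

```python
def segment_delay(r_driver, r_wire, c_wire, c_load):
    """Delay of one driver + wire + load stage (same form as Td above)."""
    return 0.377 * r_wire * c_wire + 0.693 * (
        r_driver * c_load + r_driver * c_wire + r_wire * c_load
    )


def repeated_wire_delay(r_driver, r_wire, c_wire, c_in, n_stages):
    """Total delay when the wire is split into n_stages equal segments."""
    per_stage = segment_delay(r_driver, r_wire / n_stages, c_wire / n_stages, c_in)
    return n_stages * per_stage


# Illustrative long wire: 5 kOhm, 5 pF; repeater: 1 kOhm drive, 50 fF input.
for k in (1, 2, 4, 8, 16):
    total = repeated_wire_delay(1e3, 5e3, 5e-12, 50e-15, k)
    print(f"{k:2d} stage(s): {total * 1e9:.2f} ns")
```

The quadratic wire term shrinks as 1/k while the per-repeater overhead grows with k, so the delay reaches a minimum and then rises again; every added stage also costs power and area, which is the "prohibitive" part of the bullet.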
Architecture Must Evolve to Fit the Landscape
Global operations
- Low bandwidth
- High latency & high power
Local, parallel operations
- High bandwidth
- Low latency & low power

[Figure: chip landscape; labels: 20 clocks, 90,000 tracks.]

Source: Bill Dally, Stanford


Interconnect: # of Wiring Layers
# of metal layers is steadily increasing due to:
- Increasing die size and device count: we need more wires and longer wires to connect everything
- Rising need for a hierarchical wiring network; local wires with high density and global wires with low RC

[Figure: 0.25 µm wiring stack from substrate and poly up through M1 to M6; labels: ρ = 2.2 µΩ-cm, Tins, W, S, H.]
[Figure: minimum widths (relative) and minimum spacing (relative) of poly and M1 to M5 vs. technology generation (1.0, 0.8, 0.6, 0.35, 0.25).]
Resistance and the Power Distribution Problem

[Figure: supply current Icc (amps, log scale) vs. year, 1985-2010, for the 386, 486, Pentium (R), and Pentium Pro (R); projected 100-3,000 amps.]
[Figure: RI drop on a supply line; labels: VDD, I, R, V, VDD - V.]
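
The RI drop follows from Ohm's law, so a quick calculation shows how little resistance the supply network can tolerate at the projected currents. In the sketch below the supply voltage, the 5% drop budget, and the function names are illustrative assumptions; only the current range comes from the slide.

```python
def ir_drop(current_a, resistance_ohm):
    """Voltage lost across the supply network: V = I * R."""
    return current_a * resistance_ohm


def max_supply_resistance(vdd, current_a, budget_fraction=0.05):
    """Largest network resistance that keeps the drop within the budget."""
    return budget_fraction * vdd / current_a


print(f"100 A through 1 mOhm drops {ir_drop(100, 1e-3):.2f} V")
for current in (100, 1000, 3000):
    r_max = max_supply_resistance(0.9, current)
    print(f"{current:5d} A at 0.9 V: at most {r_max * 1e6:.0f} µOhm")
```

At a sub-1 V supply and thousands of amps, the whole distribution path has to stay in the tens of micro-ohms, which is why power-grid design and packaging become first-order concerns.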
Resistance and the Power Distribution Problem

[Figure: two panels labeled Before and After.]

Requires fast and accurate peak current prediction
Heavily influenced by packaging technology
Source: Simplex
Inductance

Transmission line effects cause overshooting and non-monotonic behavior
Wave propagation puts minimum bound on delay and may require termination
Only to be considered when the rise and fall times of the signal are comparable to the time-of-flight of the line, and when the resistance of the wire is small (< 5 Z0)
Clock signals in 400 MHz IBM Microprocessor
(measured using e-beam prober) [Restle98]
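
The two conditions for considering transmission line effects can be turned into a quick screening check. In the sketch below, "comparable" is interpreted as a rise time shorter than a chosen multiple of the time of flight; the default multiple of 2.5, the propagation velocity, and all names are assumptions for illustration, while the 5 Z0 resistance bound comes from the slide.

```python
def transmission_line_effects_matter(
    rise_time_s, length_m, velocity_m_s, r_wire_ohm, z0_ohm, edge_ratio=2.5
):
    """True if the edge is fast relative to time of flight and the wire is low-loss."""
    time_of_flight = length_m / velocity_m_s
    fast_edge = rise_time_s < edge_ratio * time_of_flight
    low_loss = r_wire_ohm < 5 * z0_ohm
    return fast_edge and low_loss


# Illustrative 1 cm wire, signal velocity 1.5e8 m/s (about half the speed of light).
print(transmission_line_effects_matter(100e-12, 0.01, 1.5e8, 100.0, 50.0))
print(transmission_line_effects_matter(100e-12, 0.01, 1.5e8, 1000.0, 50.0))
print(transmission_line_effects_matter(1e-9, 0.01, 1.5e8, 100.0, 50.0))
```

Only the first case passes both tests: the second wire is too resistive, so it behaves as an RC line, and the third edge is slow enough for a lumped model.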
Dealing with Inductance
Inductance is hard to analyze accurately
Structural design approaches might be more appropriate
The DEC approach in the Alpha 21264 uses entire planes of metal as references (Vdd and GND) to reduce inductance by controlling the return path
- Loss of routing density, added metal layers reduce yield & raise costs
Another industry approach uses shield wires every ~3 signal lines in a dense array

[Figure: bus lines shielded by Vdd and GND wires.]
Inductive Noise - Ldi/dt
[Figure: di/dt in AU (log scale, 1.E+00 to 1.E+08) vs. technology generation (1.5, 0.8, 0.35, 0.18, 0.1) for the 386, 486, Pentium, and Pentium Pro; di/dt noise increases.]

Source: Intel
Inductive Noise - Ldi/dt
[Figure: supply (+/-) connected through board wiring and bonding wire to the chip, with decoupling capacitor Cd across the chip.]

Decoupling capacitance problem becoming extreme
- DEC 21164: 128 nF of on-chip decoupling
- DEC 21264: add flip-chip decoupling capacitor chip
Mostly solvable by advances in packaging technology and novel timing approaches
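
The role of the decoupling capacitor can be illustrated with two one-line formulas: the noise across the supply inductance is L * di/dt, and a capacitor that supplies a current step locally droops by I * dt / C. In the sketch below the 1 nH inductance and the 10 A step in 1 ns are made-up illustrative values; the 128 nF figure is the DEC 21164 number quoted on the slide.

```python
def inductive_noise(inductance_h, delta_i_a, delta_t_s):
    """Voltage across the supply inductance: V = L * di/dt."""
    return inductance_h * delta_i_a / delta_t_s


def decap_droop(current_a, duration_s, capacitance_f):
    """Voltage droop when the capacitor alone supplies the current: I * dt / C."""
    return current_a * duration_s / capacitance_f


# Illustrative event: current steps by 10 A within 1 ns.
print(f"without decoupling, 1 nH: {inductive_noise(1e-9, 10.0, 1e-9):.1f} V")
print(f"with 128 nF on chip:      {decap_droop(10.0, 1e-9, 128e-9) * 1e3:.0f} mV")
```

Without local charge storage, the step would demand an impossible voltage across the bond-wire inductance; with the capacitor supplying the charge, the droop stays under 100 mV, at the cost of a large on-chip capacitance.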
Summary
Deep-submicron effects impact reliability, performance, and power dissipation
The major device challenge: low-voltage, non-leaky design
Interconnect starts playing a dominant role
- capacitive: the increasing impact of interwire capacitance
- resistive: global wire delay and power distribution
- inductive: mostly supply noise, but transmission line effects are emerging
Requires a new generation of fast and accurate analysis tools
But most of all novel design methodologies and concepts producing predictable or insensitive design
