Horowitz

Scaling, Power and the Future of CMOS
Mark Horowitz, Stanford
The 80s Power Problem

Until mid 80s technology was mixed
nMOS, bipolar, some CMOS
Supply voltage was not scaling / power was rising

nMOS, bipolar gates dissipate static power
From Roger Schmidt, IBM Corp

Slide 2
Solution: Move to CMOS

And then scale Vdd
10
Feat Size (um) Vdd 0.1 Jan-85 Jan-88 Jan-91 Jan-94 Jan-97 Jan-00 Jan-03
Slide 3
Bad News
Voltage scaling has stopped as well
kT/q does not scale Vth scaling has power consequences
If Vdd does not scale

Energy scales slowly
Ed Nowak, IBM
Slide 4
Energy Performance Space

Every design is a point on a 2-D plane
Energy
Performance
Slide 5

Energy
Performance
Slide 6

Energy
Performance
Slide 7
Trade-offs for an Adder

10
5
Dom. BK Stat. KS Stat. BK
10
Stat.Carr.Chain
Stat.Carr.Sel
Dom. LF
Dom. KS
10
Stat. LF
Stat. 84421
10
10
Slide 8
Key Observation:
Define the Energy/Delay sensitivity of parameter
For example Vdd:
Sens (Vdd ) =
E D
Vdd Vdd
Vdd =Vdd *
At optimal point, all sensitivities should be the same

Must equal the slope of the Pareto optimal curve
Slide 9
What This Means

Vdd and Vth are not directly set by scaling
Instead set by slope of Pareto optimal curve Leakage rose to lower total system power!
10
5
[V =1.3 Vt =0.22]
dd N
10
[V =1.2 Vt =0.26]
dd N
10
3
[V =0.68, Vt =0.31]
dd N
[V =1 Vt =0.30]
dd N
10
1
10
Slide 10
Leakage Energy
Matching marginal costs for Vdd and Vth
1.2 1 0.8 Voltage 0.6 0.4 0.2 0 -2 10 Leakage Ratio 0.2 Vth 0.1 Vdd 0.5
0.4 Leakage Ratio
0.3
10 Activity Factor
-1
Slide 11
Low Power Design Techniques

Three main classes of methods to reduce energy: Cheating
Reducing the performance of the design
Reducing waste
Stop using energy for stuff that does not produce results Stop waiting for stuff that you dont need (parallelism)
Problem reformulation
Reduce work (less energy and less delay)
Slide 12
Reducing Energy Waste

Clock gating
If a section is idle, remove clock Removes clock power Prevents any internal node from transitioning
Create system power states

Turn on subsystems only when they are needed Can have different off states
Power vs. wakeup time Disk (do you stop it from spinning?)
Slide 13
Embedded Power Gating

Since transistors still leak when power is off
Power Switch Control Signals Embedded Power Switches
Can reduce leakage

250x reported
But costs
Performance Drop in Vdd, Gnd
Rows of Standard Cells

Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005
Slide 14
Range of Applicability
Power supply gating
Done to remove leakage power But slows down the circuit
Adds series resistance to the supply Makes circuit worse when energy sensitivity is high
Energy/op
Performance
Slide 15
Parallelism
If the application has data parallelism
Parallelism is a way to improve performance

With low additional energy cost
Slide 16
Existing Processors
1
Watts/(Spec*L*Vdd )
2
0.1
0.01 1 10
10 processors working in parallel

100 Spec2000*L 1000
Slide 17
Problem Reformulation
Best way to save energy is to do less work
Energy directly reduced by the reduction in work But required time for the function decreases as well
Convert this into extra power gains
Shifts the optimal curve down and to the right Energy/op
User Performance
Slide 18
Cost of Variation
Variability changes position of the optimal curves
Need to margin Vth, Vdd to ensure circuit always works
10
1
Vth = 120 mV
Energy
10
10
Slide 19
-1
Vth = 0 mV
10
-1
Performance
10
Partial Compensation
Adjust Vdd after you get part back
Compensates very well for small deviations in Vth
10
1
Vth = 120 mV
Energy
10
0
10
Slide 20
-1
Vth = 0 mV
10
-1
Performance
10
Variable Application Demands

Try to provide a couple of operating points
Application can control speed and energy Hard question is what are valid Vdd, F pairs
Usually determined during test
Dynamic voltage scaling

Intel Speed Step in laptop processors
2 performance/power points
Transmeta Long Run Technology

Many operating points. Test data + formula
Slide 21
Constant Power Scaling

Foxton controller on next-gen. Itanium II
Raises Vdd/boosts F when most units idle Lowers Vdd for parallel code to stay in budget
Slide 22
Self Checking Hardware

Razor (Austin/Blaauw, U of Mich)
Use the actual hardware to check for errors Latch the input data twice
Once on the clock edge, and then a little later If the data is not the same, you are going too fast
clk Din 2 Din 1 Din 0 0 1 Main Flip-Flop Shadow Latch Q[0] Error_L error clk_del comparator
RAZOR FF
Slide 23
Future Systems
Some simple math
Assume scaling continues Dies dont shrink in size Average power/gate must decrease by 2x / generation
Since gates are shrinking in size

Get 1.4x from capacitive reduction
Where is the other factor of 1.4x ?

Slide 24
Exploit Parallelism / Scale Vdd

If you have parallelism
Add more function units
Fill up new die (2x) Energy/op
Lower energy/op
E/P will decrease Vdd, sizes, etc will reduce Build simpler architectures
Performance
Works well when E/P is large

Per unit performance decrease is small
Slide 25
Exploit Specialization
Optimize execution units for different applications
Reformulate the hardware to reduce needed work Can improve energy efficiency for a class of applications
Stream / Vector processing is a current example

Exploit locality, reuse High compute density
HISC NI -controller Memory System
SRF Clusters
Bill Dally et al, Stanford Imagine
Slide 26
Exploit Integration
If both those techniques dont work
Still can increase integration by at least 1.4x
Moving units onto one chip

Reduces the number of I/Os on system I/O can take significant power today Allows even larger integration
Slide 27
TI - OMAP2420
Specialization And power domains
Most units are off
OMAP 2420
5 Power Domains #1: MCU Core #2: DSP Core #3: Graphic Accelerator #4: Core + Periph. #5: Always On logic
Royannez, et al, 90nm Low Leakage SoC Design Techniques for Wireless Applications, ISSCC 2005
Slide 28
What All This Means

As long as $/function and cap continue to scale
Moving to the new technology will be profitable And will allow designs to be better systems
In the worst case, active die area will decrease

Scale gates by the decrease in gate capacitance
In most cases, we will do much better

But effectiveness of a gate must go down
Either have lower duty cycle Or lower energy by lowering Vdd, which makes it slower
Slide 29
Conclusions
Unfortunately power is an old problem
Magic bullets have mostly been spent
Power will be addressed by application-level optimization, parallelism/specialized functional units, and more adaptive control Performance growth will slow
NRE costs will be a very large issue More performance will require application specialization
Slide 30

Horowitz

Diunggah oleh

Informasi Dokumen

Deskripsi Asli:

Hak Cipta

Format Tersedia

Bagikan dokumen Ini

Bagikan atau Tanam Dokumen

Opsi Berbagi

Apakah menurut Anda dokumen ini bermanfaat?

Apakah konten ini tidak pantas?

Hak Cipta:

Format Tersedia

Horowitz

Diunggah oleh

Hak Cipta:

Format Tersedia

Scaling, Power and the Future of CMOS

Mark Horowitz, Stanford

The 80s Power Problem

Supply voltage was not scaling / power was rising

From Roger Schmidt, IBM Corp

Solution: Move to CMOS

If Vdd does not scale

Energy Performance Space

Energy Performance Space

Energy Performance Space

Trade-offs for an Adder

Dom. BK Stat. KS Stat. BK

At optimal point, all sensitivities should be the same

What This Means

0.4 Leakage Ratio

Low Power Design Techniques

Reducing Energy Waste

Create system power states

Embedded Power Gating

Can reduce leakage

Rows of Standard Cells

Parallelism is a way to improve performance

10 processors working in parallel

Shifts the optimal curve down and to the right Energy/op

Variable Application Demands

Dynamic voltage scaling

Transmeta Long Run Technology

Constant Power Scaling

Self Checking Hardware

Since gates are shrinking in size

Where is the other factor of 1.4x ?

Exploit Parallelism / Scale Vdd

Works well when E/P is large

Stream / Vector processing is a current example

Bill Dally et al, Stanford Imagine

Moving units onto one chip

What All This Means

In the worst case, active die area will decrease

In most cases, we will do much better

Anda mungkin juga menyukai