
Introduction to CMOS VLSI Design

Design for Skew

Outline

Clock Distribution
Clock Skew
Skew-Tolerant Static Circuits
Traditional Domino Circuits
Skew-Tolerant Domino Circuits

CMOS VLSI Design

Design for Skew
Slide 2

Clocking
Synchronous systems use a clock to keep
operations in sequence
Distinguish this from previous or next
Determine speed at which machine operates
Clock must be distributed to all the sequencing
elements
Flip-flops and latches
Also distribute clock to other elements
Domino circuits and memories


Clock Distribution
On a small chip, the clock distribution network is just
a wire
And possibly an inverter for clkb
On practical chips, the RC delay of the wire
resistance and gate load is very long
Variations in this delay cause clock to get to
different elements at different times
This is called clock skew
Most chips use repeaters to buffer the clock and
equalize the delay
Reduces but doesn't eliminate skew

Example
Skew comes from differences in gate and wire delay
With right buffer sizing, clk1 and clk2 could ideally
arrive at the same time.
But power supply noise changes buffer delays
clk2 and clk3 will always see RC skew
[Figure: clock buffers distributing gclk to clk1, clk2, and clk3. Labels: wire lengths 3 mm, 3.1 mm, and 0.5 mm; loads 1.3 pF, 0.4 pF, and 0.4 pF.]
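The RC skew between clk2 and clk3 can be estimated with an Elmore delay. The sketch below is only illustrative: the wire resistance and capacitance per millimeter are hypothetical values, not taken from the slide, and it assumes clk3 sits 0.5 mm of wire beyond clk2 with a 0.4 pF load, as the figure labels suggest.

```python
# Hypothetical wire parameters (assumptions, not from the slide).
R_OHM_PER_MM = 50.0   # wire resistance per mm
C_PF_PER_MM = 0.2     # wire capacitance per mm


def wire_elmore_delay_ps(length_mm, load_pf, r_ohm_per_mm, c_pf_per_mm):
    """Elmore delay of a distributed RC wire driving a lumped load.

    delay = R_wire * (C_wire / 2 + C_load); ohm * pF gives picoseconds.
    """
    r_wire = r_ohm_per_mm * length_mm
    c_wire = c_pf_per_mm * length_mm
    return r_wire * (c_wire / 2 + load_pf)


# Extra delay from clk2 to clk3: 0.5 mm of wire into a 0.4 pF load.
rc_skew_ps = wire_elmore_delay_ps(0.5, 0.4, R_OHM_PER_MM, C_PF_PER_MM)
print(f"clk2 to clk3 RC skew is about {rc_skew_ps:.2f} ps")
```

Buffer sizing cannot remove this term, because clk3 is always downstream of clk2 on the same wire.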


Review: Skew Impact

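Ideally the full cycle is available for work
Skew adds sequencing overhead
Increases hold time too

Setup (max-delay) constraint:
$$t_{pd} \le T_c - \underbrace{\left(t_{pcq} + t_{setup} + t_{skew}\right)}_{\text{sequencing overhead}}$$

Hold (min-delay) constraint:
$$t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$$

[Figure: flip-flops F1 and F2 around combinational logic CL (Q1 to D2), with clk waveforms marking Tc, tpcq, tpdq, tsetup, tskew, thold, tccq, and tcd.]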
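As a quick numeric check of the flip-flop constraints, the sketch below evaluates the setup bound t_pd <= Tc - (t_pcq + t_setup + t_skew) and the hold bound t_cd >= t_hold - t_ccq + t_skew. All delay values are hypothetical, in picoseconds, and the function names are made up for illustration.

```python
def flop_max_logic_delay(t_c, t_pcq, t_setup, t_skew):
    """Largest allowed logic propagation delay between two flip-flops."""
    return t_c - (t_pcq + t_setup + t_skew)


def flop_min_contamination_delay(t_hold, t_ccq, t_skew):
    """Smallest allowed logic contamination delay (hold constraint)."""
    return t_hold - t_ccq + t_skew


# Hypothetical numbers in ps (assumptions for illustration only).
T_C, T_PCQ, T_CCQ, T_SETUP, T_HOLD, T_SKEW = 1000, 80, 40, 50, 20, 60

max_logic = flop_max_logic_delay(T_C, T_PCQ, T_SETUP, T_SKEW)
min_cd = flop_min_contamination_delay(T_HOLD, T_CCQ, T_SKEW)
min_cd_no_skew = flop_min_contamination_delay(T_HOLD, T_CCQ, 0)

print(max_logic)       # 810: skew costs 60 ps of the cycle
print(min_cd)          # 40: logic needs at least 40 ps of contamination delay
print(min_cd_no_skew)  # -20: without skew, back-to-back flops would be safe
```

With these numbers, skew turns a path that had no hold problem into one that needs 40 ps of minimum delay.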


Cycle Time Trends


Much of CPU performance comes from higher f
f is improving faster than simple process shrinks
Sequencing overhead is bigger part of cycle
[Figure: four trend plots for the 80386, 80486, Pentium, and Pentium II / III, 1985 to 2000: clock frequency (MHz); SpecInt95; fanout-of-4 (FO4) inverter delay (ps) versus process, with VDD = 5, 3.3, and 2.5; and FO4 inverter delays per cycle.]


Solutions
Reduce clock skew
Careful clock distribution network design
Plenty of metal wiring resources
Analyze clock skew
Only budget actual, not worst case skews
Local vs. global skew budgets
Tolerate clock skew
Choose circuit structures insensitive to skew


Clock Dist. Networks

Ad hoc
Grids
H-tree
Hybrid


Clock Grids

Use grid on two or more levels to carry clock


Make wires wide to reduce RC delay
Ensures low skew between nearby points
But possibly large skew across die


Alpha Clock Grids


[Figure: clock grids of the Alpha 21064, Alpha 21164, and Alpha 21264; labels include "PLL" and "gclk grid".]


H-Trees
Fractal structure
Gets clock arbitrarily close to any point
Matched delay along all paths
Delay variations cause skew
[Figure: H-tree with points A and B marked.]
A and B might see big skew


Itanium 2 H-Tree
Four levels of buffering:
Primary driver
Repeater
Second-level clock buffer (SLCB)
Gater
Route around obstructions

[Figure: Itanium 2 H-tree; labels: Primary Buffer, Repeaters, Typical SLCB Locations.]


Hybrid Networks
Use H-tree to distribute clock to many points
Tie these points together with a grid
Ex: IBM Power4, PowerPC
H-tree drives 16-64 sector buffers
Buffers drive total of 1024 points
All points shorted together with grid


Skew Tolerance
Flip-flops are sensitive to skew because of hard edges
Data launches at latest rising edge of clock
Must setup before earliest next rising edge of clock
Overhead would shrink if we can soften edge
Latches tolerate moderate amounts of skew
Data can arrive anytime latch is transparent


Skew: Latches
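[Figure: latch pipeline D1, L1, Q1, Combinational Logic 1, D2, L2, Q2, Combinational Logic 2, D3, L3, Q3.]

2-Phase Latches

$$t_{pd} \le T_c - \underbrace{2t_{pdq}}_{\text{sequencing overhead}}$$
$$t_{cd1},\ t_{cd2} \ge t_{hold} - t_{ccq} - t_{nonoverlap} + t_{skew}$$
$$t_{borrow} \le \frac{T_c}{2} - \left(t_{setup} + t_{nonoverlap} + t_{skew}\right)$$

Pulsed Latches

$$t_{pd} \le T_c - \underbrace{\max\left(t_{pdq},\ t_{pcq} + t_{setup} - t_{pw} + t_{skew}\right)}_{\text{sequencing overhead}}$$
$$t_{cd} \ge t_{hold} + t_{pw} - t_{ccq} + t_{skew}$$
$$t_{borrow} \le t_{pw} - \left(t_{setup} + t_{skew}\right)$$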
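The latch constraints can be compared side by side with a small calculator. This is a sketch under assumed numbers: every delay value below is hypothetical (in ps), and the function names are illustrative. The formulas are the standard two-phase and pulsed latch bounds, restated in the comments.

```python
def two_phase_latch(t_c, t_pdq, t_setup, t_hold, t_ccq, t_nonoverlap, t_skew):
    """Timing bounds for two-phase transparent latches with clock skew."""
    return {
        # t_pd <= Tc - 2*t_pdq (skew does not appear in the overhead)
        "max_t_pd": t_c - 2 * t_pdq,
        # t_cd1, t_cd2 >= t_hold - t_ccq - t_nonoverlap + t_skew
        "min_t_cd": t_hold - t_ccq - t_nonoverlap + t_skew,
        # t_borrow <= Tc/2 - (t_setup + t_nonoverlap + t_skew)
        "max_t_borrow": t_c / 2 - (t_setup + t_nonoverlap + t_skew),
    }


def pulsed_latch(t_c, t_pdq, t_pcq, t_setup, t_hold, t_ccq, t_pw, t_skew):
    """Timing bounds for pulsed latches with pulse width t_pw."""
    overhead = max(t_pdq, t_pcq + t_setup - t_pw + t_skew)
    return {
        "max_t_pd": t_c - overhead,
        # t_cd >= t_hold + t_pw - t_ccq + t_skew
        "min_t_cd": t_hold + t_pw - t_ccq + t_skew,
        # t_borrow <= t_pw - (t_setup + t_skew)
        "max_t_borrow": t_pw - (t_setup + t_skew),
    }


# Hypothetical numbers in ps (assumptions for illustration only).
two_phase = two_phase_latch(t_c=1000, t_pdq=70, t_setup=50, t_hold=20,
                            t_ccq=40, t_nonoverlap=0, t_skew=60)
pulsed = pulsed_latch(t_c=1000, t_pdq=70, t_pcq=80, t_setup=50, t_hold=20,
                      t_ccq=40, t_pw=150, t_skew=60)
print(two_phase)
print(pulsed)
```

In this example the pulse is wide enough that skew drops out of the pulsed-latch overhead (the max picks t_pdq), but the wider pulse raises the minimum contamination delay.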

Dynamic Circuit Review


Static circuits are slow because fat pMOS transistors load the inputs
Dynamic gates use precharge to remove pMOS transistors from the inputs
Precharge: φ = 0, output forced high
Evaluate: φ = 1, output may pull low

[Figure: gate schematics with inputs A, B, C, D and output Y.]


Domino Circuits
Dynamic inputs must monotonically rise during
evaluation
Place inverting stage between each dynamic gate
Dynamic / static pair called domino gate
Domino gates can be safely cascaded
[Figure: domino AND gate with inputs A and B, built from a dynamic NAND followed by a static inverter.]
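A behavioral model makes the monotonicity rule concrete. The class below is an idealized sketch of a domino AND (dynamic NAND plus static inverter); it ignores leakage, charge sharing, and delay, and the class and method names are made up for illustration.

```python
class DominoAnd:
    """Idealized domino AND: dynamic NAND followed by a static inverter."""

    def __init__(self):
        self.dyn = 1  # dynamic node, assumed to start precharged high

    def step(self, clk, a, b):
        if clk == 0:
            self.dyn = 1          # precharge: dynamic node forced high
        elif a and b:
            self.dyn = 0          # evaluate: pulldown stack discharges the node
        # Otherwise the node holds its value; it cannot be pulled back high
        # until the next precharge.
        return 1 - self.dyn       # static inverter output


gate = DominoAnd()
outputs = [
    gate.step(clk=0, a=0, b=0),  # precharge: output low
    gate.step(clk=1, a=1, b=0),  # evaluate, inputs not both high: stays low
    gate.step(clk=1, a=1, b=1),  # b rises: output rises
    gate.step(clk=1, a=1, b=0),  # b falls again: output is stuck high
    gate.step(clk=0, a=1, b=0),  # next precharge restores the output to low
]
print(outputs)  # [0, 0, 1, 1, 0]
```

The fourth step shows why inputs must only rise during evaluation: once the dynamic node has discharged, a falling input cannot correct the output. Because the domino output itself only rises during evaluation, it is a safe input to the next domino gate.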


Domino Timing
Domino gates are 1.5 to 2x faster than static CMOS
Lower logical effort because of reduced Cin
Challenge is to keep precharge off critical path
Look at clocking schemes for precharge and eval
Traditional schemes have severe overhead
Skew-tolerant domino hides this overhead


Traditional Domino Ckts


Hide precharge time by ping-ponging between half-cycles
One evaluates while other precharges
Latches hold results during precharge
[Figure: traditional domino pipeline over one cycle Tc: chains of alternating dynamic and static gates, with a latch at the end of each half-cycle and tpdq marked at each latch.]

$$t_{pd} \le T_c - 2t_{pdq}$$


Clock Skew
Skew increases sequencing overhead
Traditional domino has hard edges
Evaluate at latest rising edge
Setup at latch by earliest falling edge
[Figure: traditional domino pipeline with clock skew: chains of dynamic and static gates between latches, with tsetup and tskew marked at the latch.]

$$t_{pd} \le T_c - 2t_{setup} - 2t_{skew}$$
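To see how quickly the hard edges eat into the cycle, the sketch below evaluates the traditional domino bound t_pd <= Tc - 2*t_setup - 2*t_skew. The cycle time, setup time, and skew are hypothetical values in ps, and the function name is illustrative.

```python
def traditional_domino_budget(t_c, t_setup, t_skew):
    """Return (time left for logic, overhead fraction) for traditional domino.

    Each of the two half-cycles pays one latch setup time and one skew.
    """
    overhead = 2 * t_setup + 2 * t_skew
    return t_c - overhead, overhead / t_c


# Hypothetical numbers in ps (assumptions for illustration only).
logic_time, overhead_fraction = traditional_domino_budget(
    t_c=1000, t_setup=50, t_skew=60)
print(logic_time)                    # 780
print(f"{overhead_fraction:.0%}")    # 22%
```

Because the overhead is a fixed number of picoseconds, its share of the cycle grows as Tc shrinks.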


Time Borrowing
Logic may not exactly fit half-cycle
No flexibility to borrow time to balance logic
between half cycles
Traditional domino sequencing overhead is about
25% of cycle time in fast systems!
[Figure: traditional domino pipeline (dynamic and static gates between latches), with tsetup and tskew marked at the latch.]


Relaxing the Timing


Sequencing overhead caused by hard edges
Data departs dynamic gate on late rising edge
Must setup at latch on early falling edge
Latch functions
Prevent glitches on inputs of domino gates
Holds results during precharge
Is the latch really necessary?
No glitches if inputs come from other domino
Can we hold the results in another way?


Skew-Tolerant Domino
Use overlapping clocks to eliminate latches at phase
boundaries.
Second phase evaluates using results of first
[Figure: alternating dynamic and static gates crossing a phase boundary (φ2 labeled), annotated "No latch at phase boundary".]


Full Keeper
After second phase evaluates, first phase precharges
Input to second phase falls
Violates monotonicity?
But we no longer need the value
Now the second gate has a floating output
Need full keeper to hold it either high or low

[Figure: dynamic gate with weak full keeper transistors; labels H, X, f.]


Time Borrowing
Overlap can be used to
Tolerate clock skew
Permit time borrowing
No sequencing overhead

[Figure: overlapping clock phases φ1 and φ2 annotated with toverlap, tborrow, and tskew; the Phase 1 and Phase 2 chains of dynamic and static gates have no latch between them.]

$$t_{pd} \le T_c$$


Multiple Phases
With more clock phases, each phase overlaps more
Permits more skew tolerance and time borrowing
[Figure: four overlapping clock phases φ1 to φ4, each clocking a group of dynamic and static gate pairs labeled Phase 1 to Phase 4.]
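The effect of adding phases can be sketched with simple geometry. The function below assumes ideal, evenly spaced phases that are each high for a fixed fraction of the cycle (50% by default); the name `phase_overlap` and the numbers are illustrative, and hold time and precharge time are ignored.

```python
def phase_overlap(t_c, n_phases, duty=0.5):
    """Overlap between consecutive phases for ideal, evenly spaced clocks.

    Each phase is high for duty * t_c and rises t_c / n_phases after the
    previous one, so consecutive phases are both high for the difference.
    """
    high_time = duty * t_c
    spacing = t_c / n_phases
    return max(0.0, high_time - spacing)


# Hypothetical 1000 ps cycle (assumption for illustration only).
for n in (2, 3, 4, 6):
    print(n, round(phase_overlap(1000, n), 1))
```

Under these assumptions, two 50% duty cycle phases do not overlap at all, so a two-phase scheme would need clocks that are high for more than half the cycle, while four phases overlap by a quarter cycle. That overlap is the budget available for skew tolerance and time borrowing.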


Clock Generation
[Figure: clock generator circuit; labels: en, clk, φ1, φ2, φ3, φ4.]

Summary
Clock skew effectively increases setup and hold
times in systems with hard edges
Managing skew
Reduce: good clock distribution network
Analyze: local vs. global skew
Tolerate: use systems with soft edges
Flip-flops and traditional domino are costly
Latches and skew-tolerant domino perform at full
speed even with moderate clock skews.
