Clock to WL (Read/Write speed)
Read path (Memory Core)
Data-Out path (Periphery)
Data-In path (Periphery)
Write Path (Core)
Precharge scheme (Core)
Misc Issues
CLK TO WORDLINE (WL)
CLK to WL timing is one of the most critical timings in high speed SRAM design (roughly 50% of SRAM access time).
Important for both Read and Write.
For a read, after the WL turns on, data from the cell appears on the bitline, where it is sensed and sent to the data-path blocks.
For a write, after the WL turns on, the data to be written is presented on the bitline, which flips the cell.
In high speed designs the WL is "pulsed": it stays on only until data is sensed (Read) or written (Write); a first-order pulse-width estimate is sketched below.
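A minimal back-of-the-envelope sketch (Python) of the read-side constraint on the WL pulse: the cell current must discharge the bitline by the split the sense amp needs. I_CELL and C_BL are hypothetical values; the ~200 mV split figure comes from the precharge slides later in this deck.

# First-order WL pulse-width estimate for a read: the cell pulls
# charge off the bitline, so dV = I*t/C  ->  t = C*dV/I.
# I_CELL and C_BL are hypothetical, for illustration only.

I_CELL  = 50e-6     # cell read current (A), assumed
C_BL    = 100e-15   # bitline capacitance (F), assumed
V_SPLIT = 0.200     # BL/NBL split needed by the sense amp (V)

t_pulse = C_BL * V_SPLIT / I_CELL
print(f"minimum WL pulse width ~ {t_pulse*1e12:.0f} ps")   # ~400 ps here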
Clock to WL path
Write Path Timing (Eg. QDR B4)
1 / 2 cycle delay between DIN at the pin and DIN actually being written, for B2 / B4 respectively; the DIN path is not speed critical.
Write Path Basics (Core)
Connecting BL/BL_ to the WRTDRV (2 ways):
(a) Through the column pass-gate (nMOS).
(b) Directly (no column pass-gate).
In case (a), typically a 16:1 or 32:1 mux is used, i.e. 1 WRTDRV per 16/32 bitline pairs.
In case (b), there is one WRTDRV per BL/BL_ pair.
Case (b) needs more circuitry in the core for the WRTDRV and related logic; the WRTDRV layout has to fit in the BL/BL_ pitch.
T-gate muxing for Write
Difficult to optimise the nMOS size for speed (see the sketch below):
Less width: less C but more R.
More width: less R but more C.
Keep the WRTDRV at the midpoint of the TBUS routing.
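A minimal sketch of this sizing trade-off, with hypothetical driver, wire and device R/C numbers (a real design would sweep extracted parasitics): widening the pass-gate lowers its series resistance, but all 16 mux junctions load the TBUS, so the delay has a shallow optimum.

# Elmore-style (0.69*RC) sketch of the write pass-gate width trade-off
# for a 16:1 column mux.  All R/C numbers are hypothetical.

R_DRV   = 500.0      # write-driver output resistance (ohm), assumed
C_TBUS  = 100e-15    # TBUS wire capacitance (F), assumed
C_BL    = 100e-15    # bitline capacitance (F), assumed
R_PG_1U = 2000.0     # pass-gate resistance at W = 1 um (ohm), assumed
C_J_1U  = 1.5e-15    # pass-gate junction cap per um of width (F), assumed
N_MUX   = 16         # 16:1 mux -> 16 pass-gate junctions load the TBUS

def write_delay(w_um):
    r_pg = R_PG_1U / w_um           # more width -> less R ...
    c_mux = C_J_1U * w_um * N_MUX   # ... but more C on the TBUS
    return 0.69 * (R_DRV * (C_TBUS + c_mux) + r_pg * C_BL)

for w in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    print(f"W = {w:4.1f} um -> TBUS->BL delay ~ {write_delay(w)*1e12:5.1f} ps")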
Motivation: Per BL/BL_ WRTDRV scheme
Since either BL or BL_ is driven to GND to write the cell, an nMOS mux is used.
During write, the 0.7/0.13 nMOS mux device comes in series with the big write-driver pulldown (W = 3.4), effectively reducing its strength to flip the cell (a first-order estimate follows).
At 1.2V/TT/120C, this series nMOS adds about 800 ps of delay from TBUS falling to BL falling (16:1 CPG and 20u TBUS length).
There is not much speed gain between 16:1, 8:1 and 4:1 muxes.
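A first-order sanity check on the series-device penalty, treating each on-device as a resistor proportional to L/W (same L for both devices here); the series-composition rule is standard, the "effective width" reading is just for illustration.

# Effective strength of the write-driver pulldown with the column
# pass-gate in series.  Series resistances ~1/W, so the widths
# combine like parallel resistances.

W_PASS_GATE = 0.7    # column pass-gate width (um), from the slide
W_PULLDOWN  = 3.4    # write-driver pulldown width (um), from the slide

w_eff = (W_PASS_GATE * W_PULLDOWN) / (W_PASS_GATE + W_PULLDOWN)

print(f"effective pulldown width ~ {w_eff:.2f} um")            # ~0.58 um
print(f"vs. standalone pulldown : {w_eff / W_PULLDOWN:.0%}")   # ~17%
# The 3.4 um pulldown acts like a ~0.6 um device, which is why
# dropping the pass-gate (per-BL WRTDRV) buys back write speed.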
Per BL/BL_ WRTDRV scheme
One nMOS pulldown is directly connected to each BL/NBL.
CPG<0:15> combines with WE rising to select 1 out of 16 WRTDRVs (modeled in the sketch below).
The WRTDRV enable pulse width is determined by the "WE" pulse width.
The CPGs need to arrive before WE rises.
The nMOS pulldowns are laid out in the BL/NBL pitch.
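A behavioral sketch (Python) of the enable decode described above. The signal names follow the slides; the exact gating is an assumption from the text, not the actual schematic.

def wrtdrv_enables(cpg, we, data_bl, data_nbl):
    """cpg: one-hot CPG<0:15>; we: write-enable pulse (True while high).

    Returns (bl_pulldown_on, nbl_pulldown_on) per column.  The enable
    is high only while WE is high, so the WE pulse width sets the
    write pulse width; CPG must be stable before WE rises.
    """
    assert bin(cpg).count("1") <= 1, "CPG<0:15> is one-hot"
    out = []
    for i in range(16):
        sel = bool((cpg >> i) & 1) and we
        # Write a 0: pull BL low.  Write a 1: pull NBL low.
        out.append((sel and data_bl, sel and data_nbl))
    return out

# Example: write a '0' into column 5.
en = wrtdrv_enables(cpg=1 << 5, we=True, data_bl=True, data_nbl=False)
print(en[5])   # (True, False) -> BL pulldown fires
print(en[0])   # (False, False) -> unselected columns stay idle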
No pull-up in write driver
Per BL/BL_ WRTDRV scheme
There is no pullup in the WRTDRV: since SEN2 is low during write, the ssamp pMOS keeps BL/NBL high.
If BL is driven low by turning on MN8, this low passes (weakly) through the SEN2 passgate MP58 and turns on the pullup of inverter I4, which keeps NBL at VDD.
Per BL/BL_ WRTDRV scheme (1)
One pre write-driver for every 16 bitlines.
The pre-wrtdrv outputs DATA_BL and DATA_NBL combine with CPG<0:15> for every BL/NBL pair.
nMOS gate-sharing in the final wrtdrv gives a compact layout.
Per BL/BL_ WRTDRV scheme (2)
• Final wrtdrv: just 3 transistors, compared to 13 transistors.
• The gate load on the CPG and DATA signals is higher.
• No leakage path in the wrtdrv logic (the previous circuit had 4 leakage paths).
• Bigger devices in the final stage to get the same drive strength, because of the series nMOS.
Per BL/BL_ Wrtdrv scheme (3)
• 10 devices (fewer than the 1st scheme).
• 3 leakage paths.
• Extra routing for the local NCPG signal in the wrtdrv.
• Less load on the CPG and DATA signals; CPG low isolates the gate load on DATA.
• Single nMOS in the final stage, so a smaller final driver than in the 2nd scheme.
Per BL/BL_ write driver: issues
Like the SAMP, the write driver has to be laid out in the BL/BL_ pitch.
Internal nodes are at digital levels (VDD/GND), unlike the analog (differential) voltages in the SAMP.
Hence fewer layout constraints/requirements.
Since the logic is repeated for every BL/BL_ pair, channel lengths are kept above the minimum length to limit leakage current (standby current spec).
Bitline precharge (Basics)
Bitlines are precharged to VDD between active cycles (RD/WR).
During a write, either BL or NBL is driven fully to ground, so the BL swing during equalisation is very large.
Write -> Read equalisation is one of the most important critical timings in high speed SRAMs.
During a read, because of the pulsed WL, typical BL/NBL splits are only around 200+ mV, so precharge after a read is not critical (compare the recovery times in the sketch below).
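A first-order RC sketch (Python) of why the write case dominates: modeling the precharge pMOS as a resistor pulling BL back to VDD, the recovery time grows with the log of the initial swing. R, C and the recovery target are hypothetical.

import math

R_EQ  = 1000.0     # precharge/EQ device resistance (ohm), assumed
C_BL  = 100e-15    # bitline capacitance (F), assumed
VDD   = 1.2        # supply (V)
V_OK  = 0.010      # residual split treated as 'recovered' (V), assumed

def recovery_time(v_swing):
    """Exponential RC recovery from an initial swing down to V_OK."""
    return R_EQ * C_BL * math.log(v_swing / V_OK)

print(f"after write (full rail) : {recovery_time(VDD)*1e12:5.0f} ps")
print(f"after read (~200 mV)    : {recovery_time(0.200)*1e12:5.0f} ps")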
Typical Precharge Scheme/Issues
For longer BL/BL_, a backend EQ helps.
Routing for the backend EQ is longer, hence a smaller pMOS is used for the backend EQ.
The size of the LOGIC EQ pMOS is determined by the WR -> RD timing.
NEQ sees a big load; its rise/fall time is an issue for faster EQ.
Motivation: Faster precharge scheme
Equalisation takes more time only for the BL or NBL that was driven low.
Typically only a few bitlines are driven low in a coreseg (e.g. in the ALSC QDR SRAM only 6 out of 192 bitlines are driven low during WR).
CPG selects the BL/NBL to be driven low during write, so it can also be used to selectively turn on big EQ devices during precharge (see the sketch below).
Backend EQ devices can be sized to take care of equalisation after a read. Hence, to save current, the big EQ devices should be turned on only after a write.
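A behavioral sketch of the CPG-gated big-EQ idea; the names follow the slides (NEQ_CPG appears in the schemes on the next slides), but the gating expression is an assumption from the text.

def big_eq_enables(cpg_last_write, eq):
    """cpg_last_write: one-hot CPG latched from the write access.
    eq: global equalisation pulse (True while high).

    NEQ_CPG is active-low (it drives a pMOS gate): it goes low, i.e.
    big EQ on, only for the just-written column and only while EQ is
    high.  All other columns rely on the small backend EQ devices.
    """
    return [not (eq and bool((cpg_last_write >> i) & 1)) for i in range(16)]

neq_cpg = big_eq_enables(cpg_last_write=1 << 3, eq=True)
print(neq_cpg[3])   # False -> big EQ pMOS on for the written column
print(neq_cpg[0])   # True  -> big EQ off elsewhere, saving current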
Faster Precharge scheme (1)
The big EQ device turns on only for the BL/NBL being written.
EQ is less loaded.
The latch should power up such that NEQ_CPG = high.
8 transistors per BL/NBL; 2 leakage paths per BL pair.
Difficult for pitched layout.
Faster precharge scheme (2)
Fewer devices.
NEQ_CPG goes low only where CPG = 1 while EQ is high.
NEQ_CPG floats for BLs with CPG = 0 while EQ is high.
During standby, NEQ_CPG floats low for the last-written bitline but floats high for all other BLs.
Only 1 leakage path.
Easy for pitched layout (3 transistors).
Faster precharge scheme (3)
EQ behaviour is changed: its default is low, with a self-timed pulse after WL falls.
Ensure EQ is low during power-up.
NEQ_CPG floats high for BLs with CPG = 0 while EQ is high.
During standby, NEQ_CPG is driven solidly high (better!).
Voltage regulator issues
Separate supply for the SAMP, to isolate switching noise from the regular VDD rail.
Max VDD under the minimum-current condition.
Tail-current adjustment according to the switching load.
Regulator current under active and standby conditions.
Regulator biasing-block placement.
VREF routing (shielded) with decap.
Regulator final-driver placement.
Misc Layout: Decaps
Decaps have to be kept under the power bus and signal lines, especially near big drivers, to prevent localised dips in VDD.
Decaps kept under Top and (Top-1) metal lines increase the line capacitance by < 2% if thin orthogonal Metal1 is used for the decaps' supply connection.
A higher L for decaps means more decap can be laid out in a given area, but decap effectiveness reduces because of the series resistance due to the higher L (see the sketch below).
The decap length should be kept about 12 times the minimum channel length to balance the parasitic series resistance of the decap transistor against the amount of decap.
Put decaps on the VDDQ/VSSQ bus as part of the output-buffer layout. Keep decaps a little away from ESD structures, as they tend to store charge during an ESD event.
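A minimal sketch of the decap-length trade described above, with hypothetical oxide-cap and channel-resistance numbers: capacitance grows linearly with L, but so does the distributed series resistance, so the decap's RC response time grows roughly as L squared.

# MOS decap of width W, length L: C ~ Cox*W*L, with a distributed
# channel series resistance R ~ (rho_ch/12)*(L/W).  Numbers assumed.

COX    = 10e-15     # gate cap per um^2 (F), assumed
RHO_CH = 5000.0     # channel sheet resistance (ohm/sq), assumed
L_MIN  = 0.13       # minimum channel length (um)
W      = 2.0        # decap width (um), assumed

for mult in (1, 4, 12, 40):
    L = mult * L_MIN
    c = COX * W * L                   # more L -> more decap ...
    r = (RHO_CH / 12.0) * (L / W)     # ... but more series R too
    print(f"L = {mult:2d} x Lmin: C = {c*1e15:6.2f} fF, "
          f"R = {r:6.1f} ohm, tau = {r*c*1e12:6.2f} ps")

The sweep suggests why a moderate length (around the slide's ~12x minimum) keeps the response time small while still packing useful capacitance.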
Layout: Drivers
Avoid overly big drivers for long routing; keep options to reduce/increase the drive strength.
Big drivers cause localised drops in VDD when they turn on, due to the high current.
Instead, use staged drivers for long traces (a tapering sketch follows).
Avoid placing too many big drivers nearby that are likely to switch on simultaneously.
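A small sketch of the staged-driver idea using standard geometric tapering; the load and unit-inverter numbers are hypothetical.

import math

C_IN   = 2e-15      # input cap of a unit inverter (F), assumed
C_LOAD = 500e-15    # cap of the long trace being driven (F), assumed
TAPER  = 4.0        # fanout per stage (a classic choice)

n_stages = max(1, round(math.log(C_LOAD / C_IN, TAPER)))
sizes = [TAPER ** i for i in range(n_stages)]

print(f"{n_stages} stages, relative sizes:",
      [f"{s:.0f}x" for s in sizes])
# Each stage draws a moderate current, spreading di/dt along the chain
# instead of one localised surge (and VDD dip) at a single big driver.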
Layout: Routing long lines
Top metal is thicker and hence has the highest coupling capacitance.
Top metal and (Top-1) metal capacitance differ by ~10% for a routing length of 3.5K.
Route in Top and (Top-1) metal alternately: use Top metal for relatively long distances and (Top-1) metal for shorter ones.
Delayed signals like sense clocks, IO precharge, address pulse etc. can be routed in (Top-1) metal; the routing delay can be taken into account in the overall timing.
Setup-time-critical signals like redundancy info and WL clocks should be routed in Top metal.
Many Thanks !!
To all QDR team members (SQ/SF; design and layout) for implementing the schemes and for thorough simulations.
To the DLL/IO/Regulator teams for their support and assistance.