Clock to WL (Read/Write speed)
Read path (Memory Core)
Data-Out path (Periphery)
Data-In path (Periphery)
Write Path (Core)
Precharge scheme (Core)
Misc Issues
CLK TO WORDLINE (WL)
CLK to WL timing is one of the most critical timings in high speed SRAM design (roughly 50% of SRAM access time).
Important for both Read and Write.
For a read, after the WL turns on, data from the cell appears on the bitline, where it is sensed and sent to the data-path blocks.
For a write, after the WL turns on, the data to be written is presented on the bitline, which flips the cell.
In high speed designs the WL is "pulsed": it stays on only until data is sensed (Read) or written (Write); a first-order pulse-width estimate is sketched below.
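A minimal back-of-the-envelope sketch (Python) of the read-side constraint on the WL pulse: the cell current must discharge the bitline by the split the sense amp needs. I_CELL and C_BL are hypothetical values; the ~200 mV split figure comes from the precharge slides later in this deck.

# First-order WL pulse-width estimate for a read: the cell pulls
# charge off the bitline, so dV = I*t/C  ->  t = C*dV/I.
# I_CELL and C_BL are hypothetical, for illustration only.

I_CELL  = 50e-6     # cell read current (A), assumed
C_BL    = 100e-15   # bitline capacitance (F), assumed
V_SPLIT = 0.200     # BL/NBL split needed by the sense amp (V)

t_pulse = C_BL * V_SPLIT / I_CELL
print(f"minimum WL pulse width ~ {t_pulse*1e12:.0f} ps")   # ~400 ps here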
Clock to WL path
Write Path Timing (Eg. QDR B4)
1 / 2 cycle delay between DIN at the pin and DIN actually being written, for B2 / B4 respectively; the DIN path is not speed critical.
Write Path Basics (Core)
Connecting BL/BL_ to the WRTDRV (2 ways):
(a) Through the column pass-gate (nMOS).
(b) Directly (no column pass-gate).
In case (a), typically a 16:1 or 32:1 mux is used, i.e. 1 WRTDRV per 16/32 bitline pairs.
In case (b), there is one WRTDRV per BL/BL_ pair.
Case (b) needs more circuitry in the core for the WRTDRV and related logic; the WRTDRV layout has to fit in the BL/BL_ pitch.
T-gate muxing for Write
Difficult to optimise the nMOS size for speed (see the sketch below):
Less width: less C but more R.
More width: less R but more C.
Keep the WRTDRV at the midpoint of the TBUS routing.
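A minimal sketch of this sizing trade-off, with hypothetical driver, wire and device R/C numbers (a real design would sweep extracted parasitics): widening the pass-gate lowers its series resistance, but all 16 mux junctions load the TBUS, so the delay has a shallow optimum.

# Elmore-style (0.69*RC) sketch of the write pass-gate width trade-off
# for a 16:1 column mux.  All R/C numbers are hypothetical.

R_DRV   = 500.0      # write-driver output resistance (ohm), assumed
C_TBUS  = 100e-15    # TBUS wire capacitance (F), assumed
C_BL    = 100e-15    # bitline capacitance (F), assumed
R_PG_1U = 2000.0     # pass-gate resistance at W = 1 um (ohm), assumed
C_J_1U  = 1.5e-15    # pass-gate junction cap per um of width (F), assumed
N_MUX   = 16         # 16:1 mux -> 16 pass-gate junctions load the TBUS

def write_delay(w_um):
    r_pg = R_PG_1U / w_um           # more width -> less R ...
    c_mux = C_J_1U * w_um * N_MUX   # ... but more C on the TBUS
    return 0.69 * (R_DRV * (C_TBUS + c_mux) + r_pg * C_BL)

for w in (0.5, 1.0, 2.0, 4.0, 8.0, 16.0):
    print(f"W = {w:4.1f} um -> TBUS->BL delay ~ {write_delay(w)*1e12:5.1f} ps")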
Motivation: Per BL/BL_ WRTDRV scheme
Since either BL or BL_ is driven to GND to write the cell, an nMOS mux is used.
During write, the 0.7/0.13 nMOS mux device comes in series with the big write-driver pulldown (W = 3.4), effectively reducing its strength to flip the cell (a first-order estimate follows).
At 1.2V/TT/120C, this series nMOS adds about 800 ps of delay from TBUS falling to BL falling (16:1 CPG and 20u TBUS length).
There is not much speed gain between 16:1, 8:1 and 4:1 muxes.
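A first-order sanity check on the series-device penalty, treating each on-device as a resistor proportional to L/W (same L for both devices here); the series-composition rule is standard, the "effective width" reading is just for illustration.

# Effective strength of the write-driver pulldown with the column
# pass-gate in series.  Series resistances ~1/W, so the widths
# combine like parallel resistances.

W_PASS_GATE = 0.7    # column pass-gate width (um), from the slide
W_PULLDOWN  = 3.4    # write-driver pulldown width (um), from the slide

w_eff = (W_PASS_GATE * W_PULLDOWN) / (W_PASS_GATE + W_PULLDOWN)

print(f"effective pulldown width ~ {w_eff:.2f} um")            # ~0.58 um
print(f"vs. standalone pulldown : {w_eff / W_PULLDOWN:.0%}")   # ~17%
# The 3.4 um pulldown acts like a ~0.6 um device, which is why
# dropping the pass-gate (per-BL WRTDRV) buys back write speed.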
Per BL/BL_ WRTDRV scheme
One nMOS pulldown is directly connected to each BL/NBL.
CPG<0:15> combines with WE rising to select 1 out of 16 WRTDRVs (modeled in the sketch below).
The WRTDRV enable pulse width is determined by the "WE" pulse width.
The CPGs need to arrive before WE rises.
The nMOS pulldowns are laid out in the BL/NBL pitch.
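A behavioral sketch (Python) of the enable decode described above. The signal names follow the slides; the exact gating is an assumption from the text, not the actual schematic.

def wrtdrv_enables(cpg, we, data_bl, data_nbl):
    """cpg: one-hot CPG<0:15>; we: write-enable pulse (True while high).

    Returns (bl_pulldown_on, nbl_pulldown_on) per column.  The enable
    is high only while WE is high, so the WE pulse width sets the
    write pulse width; CPG must be stable before WE rises.
    """
    assert bin(cpg).count("1") <= 1, "CPG<0:15> is one-hot"
    out = []
    for i in range(16):
        sel = bool((cpg >> i) & 1) and we
        # Write a 0: pull BL low.  Write a 1: pull NBL low.
        out.append((sel and data_bl, sel and data_nbl))
    return out

# Example: write a '0' into column 5.
en = wrtdrv_enables(cpg=1 << 5, we=True, data_bl=True, data_nbl=False)
print(en[5])   # (True, False) -> BL pulldown fires
print(en[0])   # (False, False) -> unselected columns stay idle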
No pull-up in write driver
Per BL/BL_ WRTDRV scheme
There is no pullup in the WRTDRV: since SEN2 is low during write, the ssamp pMOS keeps BL/NBL high.
If BL is driven low by turning on MN8, this low passes (weakly) through the SEN2 passgate MP58 and turns on the pullup of inverter I4, which keeps NBL at VDD.
Per BL/BL_ WRTDRV scheme (1)
One pre write-driver for every 16 bitlines.
The pre-wrtdrv outputs DATA_BL and DATA_NBL combine with CPG<0:15> for every BL/NBL pair.
nMOS gate-sharing in the final wrtdrv gives a compact layout.
Per BL/BL_ WRTDRV scheme (2)
• Final wrtdrv: just 3 transistors, compared to 13 transistors.
• The gate load on the CPG and DATA signals is higher.
• No leakage path in the wrtdrv logic (the previous circuit had 4 leakage paths).
• Bigger devices in the final stage to get the same drive strength, because of the series nMOS.
Per BL/BL_ Wrtdrv scheme (3)
• 10 devices (fewer than the 1st scheme).
• 3 leakage paths.
• Extra routing for the local NCPG signal in the wrtdrv.
• Less load on the CPG and DATA signals; CPG low isolates the gate load on DATA.
• Single nMOS in the final stage, so a smaller final driver than in the 2nd scheme.
Per BL/BL_ write driver: issues
Like the SAMP, the write driver has to be laid out in the BL/BL_ pitch.
Internal nodes are at digital levels (VDD/GND), unlike the analog (differential) voltages in the SAMP.
Hence fewer layout constraints/requirements.
Since the logic is repeated for every BL/BL_ pair, channel lengths are kept above the minimum length to limit leakage current (standby current spec).
Bitline precharge (Basics)
Bitlines are precharged to VDD between active cycles (RD/WR).
During a write, either BL or NBL is driven fully to ground, so the BL swing during equalisation is very large.
Write -> Read equalisation is one of the most important critical timings in high speed SRAMs.
During a read, because of the pulsed WL, typical BL/NBL splits are only around 200+ mV, so precharge after a read is not critical (compare the recovery times in the sketch below).
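A first-order RC sketch (Python) of why the write case dominates: modeling the precharge pMOS as a resistor pulling BL back to VDD, the recovery time grows with the log of the initial swing. R, C and the recovery target are hypothetical.

import math

R_EQ  = 1000.0     # precharge/EQ device resistance (ohm), assumed
C_BL  = 100e-15    # bitline capacitance (F), assumed
VDD   = 1.2        # supply (V)
V_OK  = 0.010      # residual split treated as 'recovered' (V), assumed

def recovery_time(v_swing):
    """Exponential RC recovery from an initial swing down to V_OK."""
    return R_EQ * C_BL * math.log(v_swing / V_OK)

print(f"after write (full rail) : {recovery_time(VDD)*1e12:5.0f} ps")
print(f"after read (~200 mV)    : {recovery_time(0.200)*1e12:5.0f} ps")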
Typical Precharge Scheme/Issues
For longer BL/BL_, a backend EQ helps.
Routing for the backend EQ is longer, hence a smaller pMOS is used for the backend EQ.
The size of the LOGIC EQ pMOS is determined by the WR -> RD timing.
NEQ sees a big load; its rise/fall time is an issue for faster EQ.
Motivation: Faster precharge scheme
Equalisation takes more time only for the BL or NBL that was driven low.
Typically only a few bitlines are driven low in a coreseg (e.g. in the ALSC QDR SRAM only 6 out of 192 bitlines are driven low during WR).
CPG selects the BL/NBL to be driven low during write, so it can also be used to selectively turn on big EQ devices during precharge (see the sketch below).
Backend EQ devices can be sized to take care of equalisation after a read. Hence, to save current, the big EQ devices should be turned on only after a write.
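A behavioral sketch of the CPG-gated big-EQ idea; the names follow the slides (NEQ_CPG appears in the schemes on the next slides), but the gating expression is an assumption from the text.

def big_eq_enables(cpg_last_write, eq):
    """cpg_last_write: one-hot CPG latched from the write access.
    eq: global equalisation pulse (True while high).

    NEQ_CPG is active-low (it drives a pMOS gate): it goes low, i.e.
    big EQ on, only for the just-written column and only while EQ is
    high.  All other columns rely on the small backend EQ devices.
    """
    return [not (eq and bool((cpg_last_write >> i) & 1)) for i in range(16)]

neq_cpg = big_eq_enables(cpg_last_write=1 << 3, eq=True)
print(neq_cpg[3])   # False -> big EQ pMOS on for the written column
print(neq_cpg[0])   # True  -> big EQ off elsewhere, saving current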
Faster Precharge scheme (1)
The big EQ device turns on only for the BL/NBL being written.
EQ is less loaded.
The latch should power up such that NEQ_CPG = high.
8 transistors per BL/NBL; 2 leakage paths per BL pair.
Difficult for pitched layout.
Faster precharge scheme (2)
Fewer devices.
NEQ_CPG goes low only where CPG = 1 while EQ is high.
NEQ_CPG floats for BLs with CPG = 0 while EQ is high.
During standby, NEQ_CPG floats low for the last-written bitline but floats high for all other BLs.
Only 1 leakage path.
Easy for pitched layout (3 transistors).
Faster precharge scheme (3)
EQ behaviour is changed: its default is low, with a self-timed pulse after WL falls.
Ensure EQ is low during power-up.
NEQ_CPG floats high for BLs with CPG = 0 while EQ is high.
During standby, NEQ_CPG is driven solidly high (better!).
Voltage regulator issues
Separate supply for the SAMP, to isolate switching noise from the regular VDD rail.
Max VDD under the minimum-current condition.
Tail-current adjustment according to the switching load.
Regulator current under active and standby conditions.
Regulator biasing-block placement.
VREF routing (shielded) with decap.
Regulator final-driver placement.
Misc Layout: Decaps
Decaps have to be kept under the power bus and signal lines, especially near big drivers, to prevent localised dips in VDD.
Decaps kept under Top and (Top-1) metal lines increase the line capacitance by < 2% if thin orthogonal Metal1 is used for the decaps' supply connection.
A higher L for decaps means more decap can be laid out in a given area, but decap effectiveness reduces because of the series resistance due to the higher L (see the sketch below).
The decap length should be kept about 12 times the minimum channel length to balance the parasitic series resistance of the decap transistor against the amount of decap.
Put decaps on the VDDQ/VSSQ bus as part of the output-buffer layout. Keep decaps a little away from ESD structures, as they tend to store charge during an ESD event.
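A minimal sketch of the decap-length trade described above, with hypothetical oxide-cap and channel-resistance numbers: capacitance grows linearly with L, but so does the distributed series resistance, so the decap's RC response time grows roughly as L squared.

# MOS decap of width W, length L: C ~ Cox*W*L, with a distributed
# channel series resistance R ~ (rho_ch/12)*(L/W).  Numbers assumed.

COX    = 10e-15     # gate cap per um^2 (F), assumed
RHO_CH = 5000.0     # channel sheet resistance (ohm/sq), assumed
L_MIN  = 0.13       # minimum channel length (um)
W      = 2.0        # decap width (um), assumed

for mult in (1, 4, 12, 40):
    L = mult * L_MIN
    c = COX * W * L                   # more L -> more decap ...
    r = (RHO_CH / 12.0) * (L / W)     # ... but more series R too
    print(f"L = {mult:2d} x Lmin: C = {c*1e15:6.2f} fF, "
          f"R = {r:6.1f} ohm, tau = {r*c*1e12:6.2f} ps")

The sweep suggests why a moderate length (around the slide's ~12x minimum) keeps the response time small while still packing useful capacitance.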
Layout: Drivers
Avoid overly big drivers for long routing; keep options to reduce/increase the drive strength.
Big drivers cause localised drops in VDD when they turn on, due to the high current.
Instead, use staged drivers for long traces (a tapering sketch follows).
Avoid placing too many big drivers nearby that are likely to switch on simultaneously.
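A small sketch of the staged-driver idea using standard geometric tapering; the load and unit-inverter numbers are hypothetical.

import math

C_IN   = 2e-15      # input cap of a unit inverter (F), assumed
C_LOAD = 500e-15    # cap of the long trace being driven (F), assumed
TAPER  = 4.0        # fanout per stage (a classic choice)

n_stages = max(1, round(math.log(C_LOAD / C_IN, TAPER)))
sizes = [TAPER ** i for i in range(n_stages)]

print(f"{n_stages} stages, relative sizes:",
      [f"{s:.0f}x" for s in sizes])
# Each stage draws a moderate current, spreading di/dt along the chain
# instead of one localised surge (and VDD dip) at a single big driver.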
Layout: Routing long lines
Top metal is thicker and hence has the highest coupling capacitance.
Top metal and (Top-1) metal capacitance differ by ~10% for a routing length of 3.5K.
Route in Top and (Top-1) metal alternately: use Top metal for relatively long distances and (Top-1) metal for shorter ones.
Delayed signals like sense clocks, IO precharge, address pulse etc. can be routed in (Top-1) metal; the routing delay can be taken into account in the overall timing.
Setup-time-critical signals like redundancy info and WL clocks should be routed in Top metal.
Many Thanks !!
To all QDR team members (SQ/SF; design and layout) for implementing the schemes and for thorough simulations.
To the DLL/IO/Regulator teams for their support and assistance.