A Low Power 128Mb Pseudo SRAM Using Hyper Destructive Read Architecture

Jin-Hong Ahn, Sang Hoon Hong, Jae-Bum Ko, Se Jun Kim, Sung-Won Shin, Hee-Bok Kang, Jae-Jin Lee, and Joong-Sik Kih

Memory R&D, Hynix Semiconductor Inc.
San 136-1, Ichon, Kyoung-gi, Korea
Jinhong.ahn@hynix.com

Abstract—A 128Mb Pseudo SRAM is developed using a special type of architecture with the purpose of effectively reducing the standby current. Standby current, especially the off-leakage current, is becoming a more difficult problem to handle in modern devices because the shorter channel lengths of high-density, high-speed devices have reached a point where off leakage is the major source. To solve this issue, the Hyper Destructive Read Architecture (HyDRA) is developed. HyDRA is a special DRAM architecture enabling destructive reading of DRAM cells using global bitlines and an extra tag memory. This paper demonstrates HyDRA using its fast row cycle capability to minimize standby power to 150uA at 2.1V and 90°C while maintaining the speed and chip area of the conventional scheme in the same process technology.

I. INTRODUCTION

While technological obstacles to nano-scale DRAM have been steadily overcome, the core barrier, the short-channel sub-threshold leakage of nano-scale transistors, still remains and is one of the biggest challenges in silicon technology[1]. Moreover, as nano-level scaling increases the intra-die variability of device parameters, the statistical distribution of the threshold voltage becomes sensitive to process conditions, which worsens the leakage issue[2].

On the other hand, using longer-channel devices and higher threshold voltages degrades the voltage headroom VDD-Vt, which is critical to DRAM performance. Circuit techniques such as body bias and sleep transistors can reduce the leakage without much performance degradation[3]. However, these techniques require considerable wake-up time to reactivate the DRAM chip, so their usage in DRAM is limited to the deep-power-down condition only.

Destructive reading methods for DRAM have been studied to overcome the long cycle time of DRAM cell access[4]. They apply to the DRAM cell a scheme similar to the pipelining used in high-speed logic to reduce the clock cycle time. However, recovering the data of a destructively read DRAM cell requires a hidden write-back circuit and additional routing lines, which a standard DRAM process does not allow. This is why destructive reading has so far been used only for embedded DRAMs in ASIC processes, not for standard DRAMs[4].

In this paper, we describe the Hyper Destructive Read Architecture (HyDRA), which uses a standard DRAM process and enables destructive reading of DRAM cells without much area penalty by means of global bitlines and an extra tag memory. Ideally, HyDRA halves the DRAM row cycle time relative to a normal architecture. When applied to a pseudo SRAM, this can save about 20% of the address access time; alternatively, more conservative devices can be used to reduce the standby current while maintaining the speed and chip area of the conventional scheme.

II. CONCEPT

Figure 1 shows the main mechanism for reducing the row cycle time. An operation cycle of a typical DRAM is made up of sense and restore phases. The basic rule behind HyDRA is that when two consecutive operations occur, the restore phase of the first cycle is overlapped with the sense phase of the second cycle, similar to the approach taken in [4]. This is straightforward when the two cycles access independent rows (rows of different blocks that do not share the same sense amps), as shown in the first half of the figure. If, however, the accesses are to dependent rows, the restore of the first operation is performed at a separate location independent of the accessed block, labeled 'extra block' in the figure, and the following sense operation is performed in the initially accessed block, maintaining the overlap rule. Of course, the next time the row address that was restored at the extra block is accessed, the extra block must be accessed instead.

[Figure 1: Basic concept of fast row cycle capability. For the access sequence block0/row0, block1/row1, block1/row2, block1/row1, each restore overlaps the following sense; the back-to-back block1 accesses force block1/row1's restore into the extra block, where it is later re-accessed.]
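To make the overlap rule concrete, the following Python sketch is a minimal behavioral model of the scheduling decision, assuming one pending restore at a time. It is our own illustration, not the authors' control logic, and it omits the relocation bookkeeping needed to re-access a row that has moved to the extra block, which the tag memory of Section III handles.

```python
def schedule(accesses, extra_block="extra"):
    """accesses: list of (block, row). Returns one entry per half-cycle,
    pairing each sense with the overlapped restore of the previous access."""
    timeline = []
    pending = None                       # (block, row) awaiting restore
    for blk, row in accesses:
        restore = None
        if pending is not None:
            pblk, prow = pending
            # Same-block collision: redirect the old restore to the
            # extra block; otherwise restore in place, overlapped.
            restore = (extra_block if pblk == blk else pblk, prow)
        timeline.append({"sense": (blk, row), "restore": restore})
        pending = (blk, row)
    if pending is not None:              # drain the last restore
        timeline.append({"sense": None, "restore": pending})
    return timeline

# First three accesses of Figure 1: the block1/row2 access forces
# block1/row1's restore into the extra block.
for step in schedule([(0, 0), (1, 1), (1, 2)]):
    print(step)
```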

III. HYDRA: CHIP ARCHITECTURE

A. Bank Architecture

In order to keep track of data moved to the 'extra block', a tag memory is required. Figure 2 shows such an architecture incorporating the tag. It shows 9 blocks in a 16Mb bank, each block containing 256 rows and 8k columns. Eight of these blocks are addressed logically, and one block is designated as the extra block. In addition to the tag memory, there is another small memory pointing to the extra block. This pointer indicates where the extra block is at the time of an operation. It will be shown later that the extra block for a given row can be any one of these 9 blocks. Each block has its own independent local sense amplifiers, which are connected through switches to the global bitlines driven by the global sense amplifiers. These global sense amplifiers serve both as an in/out port for data and as the driver for restore operations in extra blocks.

[Figure 2: Bank architecture. A 16Mb bank contains 9 blocks (Block 0-8, with Block 8 the initial extra block), per-block tags (Tag0-Tag8, 1Kb each), local BLSAs shared between neighboring blocks, global BLSAs, the extra block table (EBT), and the HyDRA control logic.]

B. Global Bitline and S/A

HyDRA's main feature is restoring an entire row at a location different from the sensed location. To move the row data efficiently, global bitlines are placed between the global S/As as shown in Figure 3. The number of global bitlines (GBL) per pitch is half that of the local bitlines, which means the GBL can be laid out using just the first metal layer. This is made possible by the GBIS switches, which split the global bitlines into two sides to effectively double their count and match the local bitlines. This is shown in the figure illustrating data movement from block2 to block7. The data from block2 are moved to the global S/A by splitting the GBL with GBIS_1_2 and channeling the sensed data from block2 through BIS_2. Data sensed on the left side of block2 move to the left global S/A, and data sensed on the right side move to the right global S/A. Since all the 'sensed' data from the selected row of block2 are then loaded in the global S/A, they can be 'restored' at block7 by using GBIS_7_8 and BIS_7. The ON resistance of the GBIS switches must be kept low so as not to impede the data transfer through the global bitlines. Figure 4 shows the GBIS switch layout. The width of the GBIS switch is extended to about 3 times the minimum device size, and the worst-case global bitline resistance (4 series GBIS connections) is almost the same as that of one local bitline switch (BIS), which suffers a stronger narrow-channel effect than the GBIS.

[Figure 3: Restoring at a different block. Top: GBIS_1_2 and BIS_2 route the sensed row of block2 to the global sense amplifiers. Bottom: GBIS_7_8 and BIS_7 drive the data from the global sense amplifiers into block7.]

[Figure 4: Global bitline switch layout, showing the M1 global bitline (GBL), local bitline (BL), sense amplifiers, and the widened global BL switch.]
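As a rough illustration of the switch control implied by Figure 3, the Python sketch below encodes the two phases of a block-to-block transfer. The block-to-GBIS mapping is our assumption, inferred only from the switch names in the figure (GBIS_1_2 through GBIS_7_8); the paper does not spell out the general rule.

```python
def gbis_near(block):
    # Assumed mapping (not stated in the paper): each odd/even block pair
    # shares the GBIS between them (GBIS_1_2, GBIS_3_4, GBIS_5_6,
    # GBIS_7_8); block 0 is assumed to share GBIS_1_2.
    lower = block if block % 2 == 1 else block - 1
    lower = max(lower, 1)
    return f"GBIS_{lower}_{lower + 1}"

def transfer(src_block, dst_block):
    """Two-phase move of a sensed row, per Figure 3 (block2 -> block7)."""
    # Phase 1: split the GBL near the source and channel the source
    # block's local S/A data into the global S/A.
    yield {"GBIS": gbis_near(src_block), "BIS": f"BIS_{src_block}",
           "action": "load global S/A"}
    # Phase 2: re-split near the destination and let the global S/A
    # drive the destination block's local S/A (restore).
    yield {"GBIS": gbis_near(dst_block), "BIS": f"BIS_{dst_block}",
           "action": "restore into destination"}

# Reproduces the Figure 3 example: GBIS_1_2/BIS_2, then GBIS_7_8/BIS_7.
for phase in transfer(2, 7):
    print(phase)
```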

C. Tag Memory and Extra Block Table

The tag memory is constructed from DRAM cells and is effectively pitch-matched to the main cell blocks. For each block of main memory there is an equal number of tag rows, each holding 4 bits of information: the logical block number (3 bits) and a refresh flag (1 bit). The refresh bit is set only if the corresponding main row has been written at least once. At power-up, all tag rows are initialized to their respective logical block numbers except for those of block8. Block8 is the initial extra block holding the extra wordlines, so its tags do not need to be initialized. The extra block table, which points to the extra block (4 bits) for a given row, is a 256x4bit SRAM and is initialized to point at block8.

Figure 5 illustrates the sequence of events around the tag when two consecutive operations access the same block of memory. The example accesses row0 of logical block0 followed by row1 of logical block0. First, the extra block table is read out to find the extra block for row0. Then the row0 tags of all blocks, except the block pointed to by the extra block table, are read out and compared against the logical block address. At the same time, the logical block address is written to the row0 tag of the pointed extra block. Since the tag blocks were initialized at power-up, physical block0 matches, and physical block8 is excluded by the pointer. Only row0 of the matched block is 'sensed' in the main memory. The next operation is then detected, and this time the row1 tags of all blocks are compared against logical block0. The controller, detecting a second consecutive access to physical block0, initiates an extra block restore operation, updating the row1 tag of the extra block with the logical block0 address. The data sensed by the first operation is restored at the extra block (physical block8) pointed to by the first operation, and concurrently the second operation's sense occurs at physical block0. The extra block table entry for row0 is updated to point at physical block0, which is now vacant. Therefore, the next time logical block0 row0 is accessed, row0 of physical block8 will be accessed, with the pointer pointing to physical block0. Since there is an extra block entry for each row, there will always be an extra row available for restore operations redirected to a different block.

Figure 6 shows simulated waveforms of the HyDRA operation. The tag memory checks for a block collision every 15ns cycle, half the normal DRAM row cycle time. If the accessed block differs from the previous block, operation on the 15ns cycle base is possible without block collision. At the 9th cycle in Figure 6, block0 is accessed consecutively. Although the data sensed in block0 at the 8th cycle is destructed to prepare for the 9th cycle, the waveforms show the data being successfully restored at block8. The restore time at block8 cannot be made very long, because another block collision may occur at block8. However, the restored level at block8 is shown to be higher than that of the normal half-cycle case (block0 at the 8th cycle), because the strong global sense amplifier drives the local sense amplifier directly, without a sensing operation.
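The bookkeeping described above can be summarized in a short behavioral model. The Python sketch below is our own simplified reading of the Figure 5 walkthrough, assuming one pending restore at a time; the names, the single-bank scope, and the timing of the speculative tag write are illustrative choices, and the refresh bit is not modeled.

```python
ROWS = 256      # rows per block
BLOCKS = 9      # 8 logical blocks + 1 extra block per 16Mb bank

class HydraBank:
    def __init__(self):
        # tag[b][r]: logical block number whose row r currently resides in
        # physical block b. Blocks 0..7 start holding their own logical
        # rows; block 8 is the initial extra block (tags are don't-care).
        self.tag = [[b] * ROWS for b in range(8)] + [[None] * ROWS]
        # ebt[r]: physical block currently vacant ("extra") for row r;
        # the 256x4bit extra block table, initialized to block 8.
        self.ebt = [8] * ROWS
        self.pending = None       # (physical block, row) awaiting restore

    def access(self, logical_block, row):
        """Return the physical block to sense for (logical_block, row)."""
        extra = self.ebt[row]
        # Compare this row's tags of all blocks except the extra block.
        phys = next(b for b in range(BLOCKS)
                    if b != extra and self.tag[b][row] == logical_block)
        # Write the logical address into the extra block's tag, in case
        # this operation's own restore must be redirected there later.
        self.tag[extra][row] = logical_block
        if self.pending is not None and self.pending[0] == phys:
            # Block collision: the previous operation's data is restored
            # in its extra block (via the global S/A), and the vacated
            # row becomes the new extra block for that row.
            prev_phys, prev_row = self.pending
            self.ebt[prev_row] = prev_phys
        self.pending = (phys, row)    # restore overlaps the next sense
        return phys

bank = HydraBank()
assert bank.access(0, 0) == 0   # sense logical block0 row0 at phys block0
assert bank.access(0, 1) == 0   # collision: row0's data moves to block8
assert bank.access(0, 0) == 8   # the moved row is now found in block8
```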

[Figure 5: Operation profile. The diagram shows the extra block table, the tag blocks (Tag0-Tag8), and the main blocks (Block0-Block8), together with the state sequence for back-to-back accesses to logical block0: physical block0 senses row0 and then row1 while row0's data is restored at physical block8; Tag8's row0 and row1 entries are updated to block0; and the extra block table entry for row0 changes from physical block8 to physical block0.]

IV. HYDRA: IMPLEMENTATION

HyDRA can be applied to any DRAM type that needs a small row cycle time. In the pseudo SRAM case, using the row cycle time reduction capability of HyDRA, the timing budget reserved for the self-refresh cycle inserted inside every normal address access can be half that of a normal pseudo SRAM, reducing the address access time by about 15ns. For this pseudo SRAM design, to guarantee a sufficient extra-row restore as shown in Figure 6, a specially provided circuit redirects the refresh counter's address pointer to the corresponding extra row again at the next possible refresh cycle. This is possible for a pseudo SRAM because the only possible collision is between an internal refresh cycle and a normal cycle; there is no normal-to-normal collision case, because the normal cycle time is more than 50ns.
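A minimal sketch of this refresh redirection follows, under our own assumptions about the interface; the paper only states that the refresh counter's pointer is redirected to the extra row at the next possible refresh cycle, so the class and method names here are hypothetical.

```python
class RefreshController:
    """Illustrative model: spend one refresh slot topping up an extra row
    whose restore was shortened by a refresh/access collision."""

    def __init__(self, rows=256):
        self.rows = rows
        self.counter = 0          # normal wrap-around refresh counter
        self.redirect = None      # (physical block, row) needing a top-up

    def note_collision(self, extra_block, row):
        # Called when a collision forced a shortened restore into the
        # extra block during an internal refresh cycle.
        self.redirect = (extra_block, row)

    def next_refresh_target(self):
        if self.redirect is not None:
            target, self.redirect = self.redirect, None
            return target                    # redirected refresh cycle
        row = self.counter
        self.counter = (self.counter + 1) % self.rows
        return (None, row)                   # normal refresh of this row
```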

[Figure 6: Simulated waveforms of HyDRA operation, showing the tag memory, block0, and block8 traces around the 8th and 9th cycles.]

[Figure 8: Shmoo plot of the 128Mb Pseudo SRAM, supply voltage (1.5V to 2.5V) versus address access time (40ns to 90ns).]

V. MEASUREMENTS

Figure 7 is a micrograph of the 128Mb HyDRA pseudo SRAM fabricated in a 0.12um double-metal process, measuring 6850um x 7900um. Since HyDRA's speed advantage leaves timing margin, long wordlines and bitlines are used for the main memory to increase cell efficiency and reduce the die penalty. Also, to reduce off current, gate lengths were increased by 20%. The measured access time at 1.65V and 90°C is about 65ns, as shown in Figure 8. Table 1 summarizes the characteristics of the chip.

Table 1: Chip characteristics

                     HyDRA 128Mb PSRAM               Using conventional architecture
  Technology         0.12um double-metal DRAM        Same process,
                     process, 0.22um CMOS devices    0.18um CMOS devices
  Chip size          6.85mm x 7.9mm                  Approximately 5% larger,
                                                     due to short WL and BL
  Access time        65ns at 1.65V, 90°C             Similar
  Standby current    150uA at 2.1V, 90°C             300uA at 2.1V, 90°C

VI. CONCLUSIONS

In this implementation of HyDRA, a fast row-cycle-capable architecture is demonstrated. It is used to eliminate the die penalty and reduce the standby current while maintaining speed comparable to conventional schemes. The architecture can also be used for a variety of other implementations optimized for either speed or power, such as network memory and embedded memory.

[Figure 7: Chip micrograph of the 128Mb PSRAM, with the main memory, tag memory, row controller, memory banks, HyDRA controller, and extra block table labeled.]

REFERENCES

[1] K. Itoh, "Review and future prospects of low voltage embedded RAMs," Digest of CICC 2004, Oct. 2004.
[2] S. Narendra, "Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18-µm CMOS," IEEE Journal of Solid-State Circuits, vol. 39, no. 2, Feb. 2004, pp. 501-509.
[3] K. Zhang, "An SRAM design on 65nm CMOS technology with integrated leakage reduction scheme," Symp. VLSI Circuits, June 2004, pp. 294-295.
[4] C. Hwang, "A 2.9ns random access cycle embedded DRAM with a destructive-read architecture," Symp. VLSI Circuits, 2002, pp. 174-175.
