Marching Memory
Tadao Nakamura
C-M-C
Computer
Communication
Memory
CPU
[Figure: Processing Speed scale with ticks at 10 TFlops, 100 TFlops, 1 PFlops, 10 PFlops, 100 PFlops, and 1 EFlops. Labels: Ideal; Maximum of Today's SX; Today's Mainframe; Applications; HPC]
Computer System
Tunable Energy Efficiency
1, 2, ..., 9 at Levels 1-9
COOL Software
Software Domain
Instruction Set Architecture (Old Definition of Architecture)
Hardware Domain
COOL Chips
Architecture
[Figure: Energy Consumption. Plots of power against time, showing dynamic and static power]
Conventional memory
CPU
Memory access time is 1000 times the clock cycle of the CPU
This memory bottleneck is the memory wall
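To make the cost of the memory wall concrete, here is a minimal arithmetic sketch. Only the 1000x ratio comes from the slide. The function names, and the model in which a streaming memory delivers one word per clock after an initial fill delay, are assumptions made for illustration.

```python
# Memory access time in CPU clock cycles (the 1000x ratio stated on the slide).
ACCESS_CYCLES = 1000


def conventional_cycles(n_words: int) -> int:
    """Cycles to fetch n_words when every access pays the full latency in turn."""
    return n_words * ACCESS_CYCLES


def streaming_cycles(n_words: int, fill_cycles: int = ACCESS_CYCLES) -> int:
    """Cycles when one word arrives per clock after an initial fill delay.

    Assumption for illustration only: the slides do not give this model.
    """
    if n_words == 0:
        return 0
    return fill_cycles + (n_words - 1)


if __name__ == "__main__":
    n = 4096
    print(conventional_cycles(n))  # 4096000
    print(streaming_cycles(n))     # 5095
```

Under these assumptions the latency is paid once per stream rather than once per word, which is the gap a sequentially delivered memory would aim to close.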
Marching memory
[Figure: CPU with cache and memory, compared with CPU with marching memory]
Increasing bandwidth
[Figure: CPU with register file, conventional cache, and conventional DRAM]
Marching memory appears as a huge vector register file
ALU / Arithmetic pipeline
Classification of Marching Memory
Definition: Marching Memory is abbreviated as MM
1) Simple MM
Sequential Access Mode
As a core of a complex MM <<<<<
As an independent MM unit for streaming/vector data only
As an independent MM unit for sequential programs only
2) Extended simple MM
Random Access Mode with High Locality
As a core of a complex MM <<<<<
As an independent MM unit for data with high locality
As an independent MM unit for programs without long-distance branches
3) Complex MM
Random Access Mode with Low Locality
As an independent MM unit for huge programs with long-distance branches
As an independent MM unit for general types of data
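The classification above can be read as a choice driven by access mode and locality. The sketch below encodes that reading. The enum, the function name `classify`, and the two boolean inputs are hypothetical names introduced here, not part of the original talk.

```python
from enum import Enum


class MMType(Enum):
    SIMPLE = "Simple MM"
    EXTENDED_SIMPLE = "Extended simple MM"
    COMPLEX = "Complex MM"


def classify(sequential: bool, high_locality: bool) -> MMType:
    """Map an access pattern onto the three MM classes listed on the slide.

    sequential:    accesses follow one another in order (streaming/vector
                   data, sequential programs).
    high_locality: random accesses stay close together (no long-distance
                   branches).
    """
    if sequential:
        return MMType.SIMPLE
    if high_locality:
        return MMType.EXTENDED_SIMPLE
    return MMType.COMPLEX


if __name__ == "__main__":
    print(classify(sequential=True, high_locality=True).value)    # Simple MM
    print(classify(sequential=False, high_locality=True).value)   # Extended simple MM
    print(classify(sequential=False, high_locality=False).value)  # Complex MM
```

Treating the two properties as booleans is a simplification: in practice locality is a matter of degree, and the slide gives no threshold.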
[Figure (a) Memory scheme: Cells 1 to 5, each with a capacitor and an AND gate with delay; Memory Write and Memory Read; states at time = t and time = t+1. Plot labels: cell position and time (space, time); logical value 1; clock; annotation "Vague part!"]
[Figure: Cells 1 and 2, each with a capacitor, and an AND gate with delay; Memory Write; states at time = t and time = t+1. Axis: cell position and time]
1.
2. A definite way of using memory cells that have one normal AND gate consisting of several CMOS FETs, and one capacitor
3. An expected way of using memory cells that have one (special) AND gate function MOS FET and one capacitor
Information shifting
(a) Shifting (marching) right
(b) Staying
(c) Shifting (marching) left
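The three behaviors listed above (marching right, staying, marching left) can be modeled as one clock step on a row of cells. This is a behavioral sketch only: the function name `step`, the mode strings, and the use of `None` for an empty cell are assumptions for illustration, and the list operations stand in for the gate-and-capacitor hardware rather than describing it.

```python
from typing import List, Optional

Cell = Optional[int]  # None marks an empty cell


def step(cells: List[Cell], mode: str, incoming: Cell = None) -> List[Cell]:
    """Return the cell contents at time t+1 given the contents at time t.

    mode "right": every value moves one cell right; `incoming` enters on
                  the left and the rightmost value leaves the row.
    mode "stay":  contents are unchanged.
    mode "left":  every value moves one cell left; `incoming` enters on
                  the right and the leftmost value leaves the row.
    """
    if mode == "right":
        return [incoming] + cells[:-1]
    if mode == "stay":
        return list(cells)
    if mode == "left":
        return cells[1:] + [incoming]
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    row = [1, 0, 1, None, None]
    row = step(row, "right")
    print(row)  # [None, 1, 0, 1, None]
    row = step(row, "stay")
    print(row)  # [None, 1, 0, 1, None]
    row = step(row, "left")
    print(row)  # [1, 0, 1, None, None]
```

Each call corresponds to one clock tick, so a value needs as many ticks as there are cells to cross the row.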
[Figure: Cells 1 to 5, each with a capacitor, with two AND gates with delay, a selector, and the clock; Memory Read/Write at both ends; states at time = t and time = t+1. Axis: cell position and time (space, time)]
Contents
[Figure (c) For vector data and streaming data. Labels: Is, scalar/vector instruction; contents t, s, r, q, p, o]
[Figure (a) For instructions. Labels: instructions marching; position not used; vector data from the next column]
Instruction to the CPU, and from/to the next column
Vector data to a pipeline
[Figure: Memory space. Conventional DRAM: row decoder, column decoder, a wordline, Bitline 1 or 2 or 3 or 4. Marching memory (MM): one stage of MM]
[Figure: two timing diagrams over time, each with the phases data sense, column access, data restore, and bitline precharge; the second is annotated "Possible?"]
[Figure: DRAM array. Labels: word line, bit line, row decoder, sense amplifiers, column decoder]
[Figure: Dual in-line memory module. Labels: DRAM device, rank]
256 Kbit core
Bitline pitch is 90 nm
90 nm * 512 bitlines = about 46 um
[Figure: DRAM device core. Labels: wordline, bitline]
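The core-size arithmetic on this slide can be checked directly. In the sketch below the pitch and the bitline count come from the slide. Reading "256 Kbit" as 256 * 1024 bits, and storing one bit per wordline/bitline crossing, are assumptions, as are the function names.

```python
# Values taken from the slide.
BITLINE_PITCH_NM = 90
BITLINES_PER_CORE = 512

# Assumption: "256 Kbit" means 256 * 1024 bits, with one bit stored at
# each wordline/bitline crossing.
CORE_BITS = 256 * 1024


def core_width_um(pitch_nm: int, bitlines: int) -> float:
    """Width spanned by the bitlines, converted from nanometres to micrometres."""
    return pitch_nm * bitlines / 1000.0


def wordlines_per_core(core_bits: int, bitlines: int) -> int:
    """Number of wordlines needed to hold core_bits across the given bitlines."""
    return core_bits // bitlines


if __name__ == "__main__":
    print(core_width_um(BITLINE_PITCH_NM, BITLINES_PER_CORE))  # 46.08
    print(wordlines_per_core(CORE_BITS, BITLINES_PER_CORE))    # 512
```

Under these assumptions the core is a 512 x 512 array about 46 um wide, which matches the figure quoted on the slide.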
[Figure: Dual in-line marching memory module. Labels: core, complex marching memory, rank]
pins
Contents
Position Index/Tag (for example, the token of the column, meaning the first address of the column's bytes)
Disappeared!
And then
Addressing, sensing, restore, precharge, and refresh no longer exist
MMs can be downsized with much less hardware
No long wires, even among complex MMs
Simple overall structure of MM systems
As a result
High speed
Large capacity
Low energy consumption
Low cost
Conclusions
We have presented two types of marching memory (MM) using DRAM-type technology
1. Simple MM is fast and suitable for streaming applications
1-1. Extended simple MM for random access with high locality of data
2. Complex MM, for random access in a program, uses multiple (extended) simple MM cores
This is slower and requires multi-core interconnection networks. More research is needed.
MM has potential as a faster memory technology, especially with good compiler support.