Outline
Background.
Marching Memory Through Type (MMTH).
Proposed router.
Evaluation.
Conclusion.
NOCS 2014
NoC (Network-on-Chip)
10%
Various approaches: reduction of the power in buffers
Utilize next-generation memory
A hybrid buffer design with STT-MRAM [H. Jang, et al., NOCS12]
September 18, 2014
Our approach
MMTH (Marching Memory Through Type)
Behavior of MMTH
Write:
column
Read: Data in the column indicated by the R-pointer are transferred to the output port.
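The read behavior can be sketched as a small behavioral model. This is a minimal sketch, not the authors' design: the class name `MarchingBuffer`, the write rule (new data enters column 0 and stored columns march one step toward the output), the pointer updates, and the loss of the oldest entry on a write to a full buffer are all assumptions made for illustration. Only the read rule (output the column indicated by the R-pointer) comes from the slide.

```python
class MarchingBuffer:
    """Behavioral sketch of a marching-memory style buffer (hypothetical model)."""

    def __init__(self, depth=8):
        self.depth = depth
        self.columns = [None] * depth
        self.r_pointer = -1  # -1 means no column holds valid data

    def write(self, data):
        """Assumed write rule: all columns march one step, data enters column 0.

        Returns True if the oldest entry was overwritten (buffer was full).
        """
        overwritten = self.r_pointer == self.depth - 1
        self.columns = [data] + self.columns[:-1]
        if not overwritten:
            self.r_pointer += 1
        return overwritten

    def read(self):
        """Read rule from the slide: the column at the R-pointer goes to the output."""
        if self.r_pointer < 0:
            return None  # nothing valid to read
        data = self.columns[self.r_pointer]
        self.r_pointer -= 1
        return data
```

Under these assumptions the buffer behaves as a FIFO: after writing `a` then `b`, two reads return `a` then `b`, and a write to a full buffer drops the oldest entry.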
Structure of MMTH
Inverted T
Memory cell
Overwrite occurs
Summary:
Characteristics of MMTH
Advantage
Disadvantage
Read latency (1 clock cycle at 2 GHz, depth 8)
Router architecture
Router pipeline
Reduction of BR stage
Bypass buffers
Latency = 3H
Latency = 4H
Overhead
Latency = 3H + 1
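To make the three expressions concrete, the sketch below evaluates them for a few hop counts. It assumes H is the number of hops a flit traverses and that the expressions correspond, in order, to a three-stage router, a router with an added BR stage at every hop, and a router that bypasses the buffer so the extra cycle is paid only once. The slide does not label the expressions explicitly, so this mapping and the function names are assumptions.

```python
def latency_three_stage(hops):
    """Latency = 3H: three pipeline stages at every hop (assumed baseline)."""
    return 3 * hops


def latency_four_stage(hops):
    """Latency = 4H: one extra stage at every hop (assumed naive design)."""
    return 4 * hops


def latency_with_bypass(hops):
    """Latency = 3H + 1: the extra cycle is paid once (assumed proposed design)."""
    return 3 * hops + 1


if __name__ == "__main__":
    for hops in (1, 2, 4, 6):
        print(hops,
              latency_three_stage(hops),
              latency_four_stage(hops),
              latency_with_bypass(hops))
```

For H = 6 the three expressions give 18, 24, and 19 cycles, so the constant one-cycle term matters less as the path gets longer, while the per-hop term grows with H.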
A reset signal
Reset MMTH after finishing transmitting a packet
Reset
An invalid signal
Invalidate data during the BR stage
Invalidate(flit type<=none)
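The two control signals can be illustrated with a short sketch. The `FlitType` enum, the `Flit` dataclass, and the function names are hypothetical. The slides only state that MMTH is reset after a packet has been transmitted and that invalidation sets the flit type to none. Treating reset as "invalidate every column" is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class FlitType(Enum):
    NONE = 0  # invalid entry
    HEAD = 1
    BODY = 2


@dataclass
class Flit:
    flit_type: FlitType
    payload: int = 0


def invalidate(flit):
    """Invalid signal: mark one entry as holding no flit (flit type <= none)."""
    flit.flit_type = FlitType.NONE


def reset(columns):
    """Reset signal (assumed semantics): invalidate every column of the buffer."""
    for flit in columns:
        invalidate(flit)


def send_packet(columns):
    """Send all valid flits in order, then reset the buffer."""
    sent = [f.payload for f in columns if f.flit_type is not FlitType.NONE]
    reset(columns)
    return sent
```

For example, if one body flit is invalidated before transmission, `send_packet` skips it and leaves every column marked `NONE` afterwards.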
Overview of evaluation
Simulation setup
(a) CMP system configuration in the GEM5 full-system simulator
System parameter: Details
Processor: x86-64
Clock frequency: 2 GHz
Topology: 2D mesh
# of processors
# of directories
# of cores
# of L2 caches: 16
32 KB
L2 cache size: 256 KB
Buffer size: 8 flits
Routing: XY routing
Coherence protocol: MOESI directory
Arbiter type: Round-robin
4x4 mesh: each router has an L2 cache bank; routers in the four corners are connected to processors and directories.
Flit size: 64 bits
Packet size: 1 header + 6 body flits
Traffic pattern: Uniform
Performance overhead: Network simulation
[Figure: two panels plotted against injected traffic [packets/node/cycle]; series: Baseline, MM(naive), MM(proposed)]
Performance overhead: Full system simulation
[Figure: normalized execution time of Baseline, MM(naive), and MM(proposed) for the benchmark programs IS, MG, CG, FT, BT, LU, EP, SP, UA, and their average; annotations: 10%, 2%]
[Figure: frequency [%] of bit transitions (1 to 0, 0 to 1, 1 to 1, 0 to 0) and BCR [%] for the benchmark programs IS, MG, CG, FT, BT, LU, EP, SP, UA, and their average]
Power consumption
[Figure: power consumption [mW], split into input buffers and the others, for Baseline, MM(MAX), MM(IS), MM(MG), MM(CG), MM(FT), MM(BT), MM(LU), MM(EP), MM(SP), MM(UA), and MM(Ave.); annotations: Ave. 13%, Ave. 68%, Max 45.4%, Ave. 42.4%]
Conclusion