SCIENTIFIC PROGRAMMING
Editors: Konstantin Läufer, laufer@cs.luc.edu
Konrad Hinsen, hinsen@cnrs-orleans.fr
CPUs spend most of their time waiting for data to arrive. Identifying low-level bottlenecks, and how to ameliorate them, can save hours of frustration over poor performance in apparently well-written programs.
The Hierarchical Memory Model
Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current implementation, which includes additional cache levels; and (c) a sensible guess at what's coming over the next decade: three levels of cache in the CPU and solid-state disks lying between main memory and classical mechanical disks.
implemented several memory layers with different capabilities: lower-level caches (that is, those closer to the CPU) are faster but have reduced capacities and are best suited for performing computations; higher-level caches are slower but have higher capacity and are best suited for storage purposes.
Figure 1 shows the evolution of this hierarchical memory model over time. The forthcoming (or should I say the present?) hierarchical model includes a minimum of six memory levels. Taking advantage of such a deep hierarchy isn't trivial at all, and programmers must grasp this fact if they want their code to run at an acceptable speed.
Techniques to Fight Data Starvation
March/April 2010
Figure 2. The blocking technique (C = A <oper> B). The blocking approach exploits both spatial and temporal localities for maximum benefit by retrieving a contiguous block that fits into the CPU cache at each memory access, then operating on it and reusing it as much as possible before writing the block back to memory.