
CPU cache

CPU cache is a cache used by the central processing unit of a computer to reduce the average time to
access memory. The cache is a smaller, faster memory which stores copies of the data from the most
frequently used main memory locations. As long as most memory accesses are to cached memory
locations, the average latency of memory accesses will be closer to the cache latency than to the latency
of main memory.
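
As a rough illustration (the numbers here are invented for the example, not taken from any particular CPU): if a cache access takes 1 ns, a main memory access takes 60 ns, and 95% of accesses hit the cache, then the

    average access time = 0.95 x 1 ns + 0.05 x 60 ns = 3.95 ns

which is far closer to the cache latency than to the main memory latency.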

When the processor needs to read from or write to a location in main memory, it first checks whether a
copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which
is much faster than reading from or writing to main memory.

Most modern desktop and server CPUs have at least three independent caches: an instruction cache to
speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation
lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable
instructions and data.

How Caching Works

Consider a simple loop that reads characters from a file, stores them in working
memory, and then writes them to the screen. The first time each of these instructions
(read, store, write) is executed, it must be loaded from relatively slow system memory
(assuming it is in memory at all; otherwise it must be read from the hard disk, which
is much, much slower even than memory).
The cache is designed (in hardware) to hold recently accessed memory locations
in case they are needed again. So each of these instructions will be saved in the cache
after being loaded from memory the first time. The next time the processor wants to
use the same instruction, it will check the cache first, see that the instruction it needs
is there, and load it from the cache instead of going to the slower system RAM. The
number of instructions that can be buffered this way is a function of the size and
design of the cache.
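
As a concrete (if simplified) illustration, here is what such a loop might look like in C; the file name is hypothetical, and the 1,000-character limit matches the example that follows:

    #include <stdio.h>

    int main(void) {
        FILE *in = fopen("input.txt", "r");   /* hypothetical input file */
        if (in == NULL)
            return 1;

        char buffer[1000];                    /* working memory */
        int c, i = 0;

        /* The loop body compiles to a handful of instructions (read,
           store, write). After the first iteration those instructions
           sit in the instruction cache, so every later iteration
           fetches them from the cache instead of from main memory. */
        while (i < 1000 && (c = fgetc(in)) != EOF) {
            buffer[i++] = (char)c;            /* store in working memory */
            putchar(c);                       /* write to the screen */
        }

        fclose(in);
        return 0;
    }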

Let's suppose that our loop is going to process 1,000 characters and the cache is able
to hold all three instructions in the loop (which sounds obvious, but isn't always the
case, due to cache mapping techniques). This means that 999 of the 1,000 times these
instructions are executed, they will be loaded from the cache; that is, 99.9% of the time.
This is why caching is able to satisfy such a large percentage of requests for memory
even though it has a capacity that is often less than 1% of the size of the system RAM.

How The Memory Cache Works


Author: Gabriel Torres
Type: Tutorial. Last updated: September 12, 2007

Introduction
The memory cache is high-speed memory available inside the CPU to speed up
access to the data and instructions stored in RAM. In this tutorial we will
explain how this circuit works in easy-to-follow language.

A computer is completely useless if you don't tell the processor (i.e., the
CPU) what to do. This is done through a program: a list of instructions for
the CPU to execute.

The CPU fetches programs from RAM. The problem with RAM is that its
contents are lost when its power is cut, which classifies RAM as a
"volatile" medium. Thus, if you want to have your programs and data back
after you turn off your PC, they must be stored on non-volatile media,
i.e., media whose contents aren't lost when the power goes off, such as
hard disk drives and optical media like CDs and DVDs.

When you double-click an icon in Windows to run a program, the program,
which is usually stored on the computer's hard disk drive, is loaded into
RAM, and from RAM the CPU loads the program through a circuit called the
memory controller, which is located inside the chipset (north bridge chip)
on Intel processors or inside the CPU itself on AMD processors. In Figure 1
we summarize this (for AMD processors, please ignore the chipset drawn).

Figure 1: How stored data is transferred to the CPU.

The CPU can’t fetch data directly from hard disk drives because they are
too slow for it, even if you consider the fastest hard disk drive available.
Just to give you some idea of what we are talking about, a SATA-300 hard
disk drive – the fastest kind of hard disk drive available today for the
regular user – has a maximum theoretical transfer rate of 300 MB/s. A
CPU running internally at 2 GHz with 64-bit internal datapaths* will
transfer data internally at 16 GB/s – over 50 times faster.

* Translation: the paths between the CPU's internal circuits. This is rough
math, just to give you an idea, because CPUs have several different internal
datapaths, each with a different width. For example, on AMD processors the
datapath between the L2 and the L1 memory caches is 128 bits wide, while on
current Intel CPUs this datapath is 256 bits wide. If you got confused, don't
worry. The point is simply that the number in the above paragraph isn't
fixed, but the CPU is always a lot faster than hard disk drives.

The difference in speed comes from the fact that hard disk drives are
mechanical systems, which are slower than purely electronic systems, because
mechanical parts have to move for the data to be retrieved (which is far
slower than moving electrons around). RAM, on the other hand, is 100%
electronic and thus faster than hard disk drives; ideally, it would be as
fast as the CPU.

And here is the problem: even the fastest RAM isn't as fast as the CPU.
DDR2-800 memories transfer data at 6,400 MB/s, or 12,800 MB/s if
dual-channel mode is used. Even though these numbers are somewhat close to
the 16 GB/s of the previous example, current CPUs can fetch data from the
L2 memory cache over 128- or 256-bit datapaths, which at an internal clock
of 2 GHz means 32 GB/s or 64 GB/s. Don't worry about what the heck "L2
memory cache" is right now; we will explain it later. All we want is for
you to get the idea that RAM is slower than the CPU.

By the way, transfer rates can be calculated using the following formula
(in all examples so far, "data per clock" is equal to 1):

Transfer rate = width (number of bits) x clock rate x data per clock / 8
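
For instance, for the 2 GHz CPU with 64-bit internal datapaths mentioned earlier:

    Transfer rate = 64 x 2,000 MHz x 1 / 8 = 16,000 MB/s (16 GB/s)

and for DDR2-800, using its effective clock of 800 MHz:

    Transfer rate = 64 x 800 MHz x 1 / 8 = 6,400 MB/s

which matches the figures quoted above.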

The problem is not only the transfer rate, i.e., the transfer speed, but also
latency. Latency (a.k.a. "access time") is the time the memory takes to hand
back the data that the CPU asked for; the reply isn't instantaneous. When the
CPU asks for an instruction (or data) stored at a given address, the memory
takes a certain time to deliver it back. On current memories, if a module is
labeled as having a CL (CAS Latency, which is the latency we are talking
about) of 5, this means that the memory will deliver the requested data only
after five memory clock cycles, during which the CPU has to wait.

Waiting reduces the CPU performance. If the CPU has to wait five memory
clock cycles to receive the instruction or data it asked for, its performance
will be only 1/5 of the performance it would get if it were using a memory
capable of delivering data immediately. In other words, when accessing a
DDR2-800 memory with CL5, the performance the CPU gets is the same as with
a memory working at 160 MHz (800 MHz / 5). In the real world the performance
decrease isn't that large, because memories work in a mode called burst mode,
in which the second and subsequent pieces of data can be delivered
immediately, provided they are stored at contiguous addresses (usually the
instructions of a given program are stored at sequential addresses). This is
expressed as "x-1-1-1" (e.g., "5-1-1-1" for the memory in our example),
meaning that the first piece of data is delivered after five clock cycles,
but from the second one on, data can be delivered in just one clock cycle
each, if stored at contiguous addresses, as we said.
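
To put numbers on this, with the 5-1-1-1 timing of our example: fetching four consecutive pieces of data takes 5 + 1 + 1 + 1 = 8 memory clock cycles in burst mode, against 4 x 5 = 20 cycles if every access paid the full CAS latency.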

How does the CPU Cache work?

Without cache memory, every time the CPU requested data it would send a request to the main
memory, and the data would then be sent back across the memory bus to the CPU. This is a slow process
in computing terms. The idea of the cache is that this extremely fast memory stores the data that is
frequently accessed and, if possible, the data around it, in order to achieve the quickest possible
response time for the CPU. It is based on playing the percentages: if a certain piece of data has been
requested five times before, it's likely that this specific piece of data will be required again, and so it is
stored in the cache memory.

Let's take a library as an example of how caching works. Imagine a large library with only one
librarian (the standard one-CPU setup). The first person comes into the library and asks for Lord of the
Rings. The librarian goes off, follows the path to the bookshelves (the memory bus), retrieves the book,
and gives it to the person. The book is returned to the library once it's finished with. Now, without a
cache, the book would be returned to the shelf. When the next person arrives and asks for Lord of the
Rings, the same process happens and takes the same amount of time.

If this library had a cache system, then once the book was returned it would have been put on a shelf at
the librarian's desk. This way, when the second person comes in and asks for Lord of the Rings, the
librarian only has to reach down to the shelf and retrieve the book. This significantly reduces the time it
takes to retrieve the book. Back in computing terms, this is the same idea: the data in the cache is
retrieved much more quickly. The computer uses its logic to determine which data is the most frequently
accessed and keeps those books on the shelf, so to speak.

That is a one-level cache system, as used in most hard drives and other components. CPUs,
however, use a two-level cache system. The principles are the same: the level 1 cache is the fastest and
smallest memory, and the level 2 cache is larger and slightly slower, but still smaller and faster than the
main memory. Going back to the library, when Lord of the Rings is returned this time it will be stored on
the shelf. This time the library gets busy, lots of other books are returned, and the shelf soon fills up.
Lord of the Rings hasn't been taken out for a while, so it gets taken off the shelf and put into a bookcase
behind the desk. The bookcase is still closer than the rest of the library and still quick to get to. Now,
when the next person comes in asking for Lord of the Rings, the librarian will first look on the shelf and
see that the book isn't there. They will then proceed to the bookcase to see if the book is in there. This is
the same for CPUs: they check the L1 cache first and then check the L2 cache for the data they
require.
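
The following toy C program mirrors this two-level lookup. The direct-mapped design, the sizes, and the cycle counts are illustrative assumptions, not the behaviour of any real CPU:

    #include <stdbool.h>
    #include <stdio.h>

    #define L1_LINES 8     /* the shelf at the desk: small and fast   */
    #define L2_LINES 64    /* the bookcase behind it: bigger, slower  */

    static long l1[L1_LINES];   /* each slot remembers which address it holds */
    static long l2[L2_LINES];

    static bool lookup(long addr, const long *cache, int lines) {
        return cache[addr % lines] == addr;  /* direct-mapped: one slot per address */
    }

    /* Returns an illustrative cost in cycles for accessing 'addr'. */
    static int access_cost(long addr) {
        if (lookup(addr, l1, L1_LINES)) return 1;    /* L1 hit: found on the shelf    */
        if (lookup(addr, l2, L2_LINES)) {            /* L2 hit: found in the bookcase */
            l1[addr % L1_LINES] = addr;              /* promote the book to the shelf */
            return 10;
        }
        l2[addr % L2_LINES] = addr;                  /* miss: fetch from main memory */
        l1[addr % L1_LINES] = addr;                  /* and fill both levels         */
        return 100;
    }

    int main(void) {
        for (int i = 0; i < L1_LINES; i++) l1[i] = -1;  /* -1 marks an empty slot */
        for (int i = 0; i < L2_LINES; i++) l2[i] = -1;

        printf("first access:  %d cycles\n", access_cost(42));  /* miss:   100 */
        printf("second access: %d cycles\n", access_cost(42));  /* L1 hit: 1   */
        return 0;
    }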

Is more Cache always better?

The answer is mostly yes, but certainly not always. The main problem with having too much cache
memory is that the CPU will always check the cache memory before the main system memory. Looking
at our library again as an example: suppose 20 different people come into the library, all after different
books that haven't been taken out in quite a while, but the library has been busy before, so the shelf and
the bookcase are both full. We have a problem. Each time a person asks for a book, the librarian will
check the shelf and then check the bookcase before realising that the book has to be in the main library,
and only then trot off to get it. If this library had no cache system, it would actually be quicker in this
instance, because the librarian would go straight to the book in the main library instead of first checking
the shelf and the bookcase.

However, a cacheless system only wins in circumstances like these, so for most applications
CPUs are definitely better off with a decent amount of cache. Applications such as MPEG encoders are not
good cache users, because they work through a constant stream of completely different data.
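
To make this concrete with the same kind of invented latencies used in the sketch above (1 cycle to check L1, 10 cycles to check L2, 100 cycles to reach main memory): a workload that always misses pays 1 + 10 + 100 = 111 cycles per access, whereas a cacheless design would pay only the 100-cycle memory access. Real CPUs overlap these lookups, so the penalty is smaller in practice, but the direction of the effect is the same.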

Does cache only store frequently accessed data?

If the cache memory has space, it will also store data that is close to the frequently accessed data.
Looking back again at our library: if the first person of the day comes in and takes out Lord of the Rings,
the intelligent librarian may well place Lord of the Rings Part II on the shelf. In this case, when the
person brings back the first book, there is a good chance that they will ask for Part II. Since this will
happen more often than not, it was well worth the librarian fetching the second part in advance in case
it was required.
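
In hardware, the analogue of fetching Part II in advance is the cache line (helped further by hardware prefetchers): on a miss, the cache loads a whole block of neighbouring bytes, so code that walks through memory sequentially benefits automatically. A minimal C sketch, assuming a common (but not universal) 64-byte line size:

    #include <stdio.h>

    #define N (1 << 20)     /* about a million ints, an illustrative size */
    static int data[N];

    int main(void) {
        long total = 0;
        /* A miss on data[i] loads a whole cache line (64 bytes here,
           i.e. 16 four-byte ints), so roughly 15 of every 16 iterations
           hit in the cache even though no element is ever reused. */
        for (int i = 0; i < N; i++)
            total += data[i];
        printf("%ld\n", total);
        return 0;
    }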

Cache Hit and Cache Miss

Cache hit and cache miss are just simple terms for the accuracy of what goes into the CPU's cache.
When the CPU accesses its cache looking for data, it will either find it or it won't. If the CPU finds what
it's after, that's called a cache hit. If it has to go to main memory to find it, that is called a cache miss.
The percentage of hits out of all cache requests is called the hit rate, and you will want it to be as high
as possible for best performance.
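
The hit rate can be written as:

    Hit rate = hits / (hits + misses)

For example, if 950 out of 1,000 cache lookups find the data they are after, the hit rate is 950 / 1,000 = 95%, and the remaining 5% of accesses are misses that go to main memory.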

What is Cache Memory?


Cache (pronounced cash) memory is extremely fast memory that is built into a computer’s central
processing unit (CPU), or located next to it on a separate chip. The CPU uses cache memory to store
instructions that are repeatedly required to run programs, improving overall system speed. The advantage
of cache memory is that the CPU does not have to use the motherboard's system bus for data transfer.
Whenever data must be passed through the system bus, the data transfer speed slows to the
motherboard’s capability. The CPU can process data much faster by avoiding the bottleneck created by
the system bus.
