
CPEG655-High-Perf Computing Lab 0

Haoke Xu
UD ID: 702367277

1. Environment
Processor: Intel Core i7-4710MQ CPU, 2.50 GHz
RAM: 8.00 GB
OS: Windows 7 64-bit SP1
Compiler: Visual Studio 2015 Community
2. Introduction
To gain a deeper understanding of how the CPU and memory
work, we double the value of every element in an array,
visiting the elements in different orders. We then record
the time the machine takes for each order and analyze the
results.
3. Experiment design
My design is to process the elements by strides. With stride
i, we process element n, then element n+i, then n+2i, then
n+3i, and so on. Once that pass reaches the end of the array,
we start again at n+1, then n+i+1, and we repeat this kind of
pass until every element in the array has been doubled.
So it is easy to see that if the stride is 1, we simply
process the elements one by one. If the stride is 2, we
process all the elements in odd positions and then all the
elements in even positions.
The experiment measures the processing time for every stride
from 1 to 100, recording the time for each one.
3. Experiment result
This is part of my result:

To make the trend easier to see, I plotted the results in the diagram below:


[Figure: Computing time of each stride set. x-axis: Stride; y-axis: Computing time (seconds).]

We can see that the time increases until the stride reaches
about 50, and then it decreases.
It is also apparent that the points are quite dense at small
strides. As the stride increases, they become sparser.
4. Analysis
After some research, I think two main factors cause this
result:
1. The time to read the array from memory and load it
into the cache.
2. The number of iterations. Because the length of the array
is fixed, the fewer iterations there are, the less time is
spent on the loop structure itself (the structure cost).
When the stride becomes large, each time the program moves
to the next element it cannot find that element in the cache,
because the cache only holds the elements around the one
just accessed. Once the stride is large enough to jump
outside the range the cache covers, the cache must reload
the elements. This makes the time increase as the stride
increases.
As for factor 2, because the number of iterations is
(array size / stride), the number of iterations decreases
as the stride increases, which reduces the time.
Both factors affect the result: one makes the time increase
and the other makes it decrease. So why does the final
result increase and then decrease?
An example helps to explain it. When the stride first begins
to increase, the next three strided elements are still
covered by the cache, so the cache must reload every 4
accesses. As the stride increases further, only two strided
elements are covered, so the cache must reload every 3
accesses. As the stride keeps increasing, the cache must
reload more and more often, which makes the time increase
up to a stride of about 50.
As for factor 2 (the decreasing structure cost), its effect is
much smaller and cannot offset the increase in cache-loading
cost.
But once the stride is larger than the range the cache
covers, the cache must reload on every access, so this cost
becomes fixed and no longer increases. Meanwhile factor 2,
the structure cost, is still decreasing because there are
fewer iterations. So the time begins to decrease.
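The reload pattern in this argument (a reload every 4 accesses, then every 3, then on every access) can be put into a rough model. The sketch below is only an illustration under assumptions the report does not state: the function name miss_fraction is hypothetical, and the sizes used in the usage note (64-byte cache lines, 4-byte elements) are illustrative values, not measurements from this machine. The model ignores hardware prefetching and multiple cache levels, so it is not expected to reproduce the measured peak near stride 50.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated fraction of accesses within one strided pass that
   land on a new cache line and therefore need a reload.
   If one step (stride * elem_bytes) is smaller than a line,
   several consecutive accesses share a line; once the step is
   at least one line long, every access needs a reload and the
   fraction stays at 1. */
static double miss_fraction(size_t stride, size_t elem_bytes,
                            size_t line_bytes)
{
    assert(line_bytes > 0);
    size_t step = stride * elem_bytes;
    if (step >= line_bytes) {
        return 1.0;
    }
    return (double)step / (double)line_bytes;
}
```

With the assumed sizes, miss_fraction(4, 4, 64) gives 0.25, which corresponds to one reload every 4 accesses, and any stride of 16 or more gives 1.0, which corresponds to the fixed reload cost described above.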
5. Conclusion
By my reasoning, two different factors affect the time
cost. One is the time to load data into the cache, and the
other is the structure cost of the program itself. Together,
their properties produce the result plotted in the
diagram.