Anda di halaman 1dari 33

## UNIVERSITY OF HUMAN DEVELOPMENT

COMPUTER SCIENCE DEPARTMENT
COMPUTER ARCHITECTURE

MEASURING PERFORMANCE
Azhi A. Faraj
Performance
2

## When we say one computer is faster than another

is, what do we mean?
The computer user is interested in reducing response
time
Performance: Latency vs. Throughput
3

## Latency(execution time): time to finish a fixed task

Throughput (bandwidth): number of tasks per unit time
Example: move people 10 miles
Car: capacity = 5, speed = 60 miles/hour
Bus: capacity = 60, speed = 20 miles/hour
Latency:
car = 10 min,
bus = 30 min
Throughput:
car= 30 PPH ,
bus = 120 PPH

3
Measuring Performance
4

Execution time:
Time between start and completion of a task (including
disk accesses, memory accesses )
Throughput:
Total amount of work done in a given time

4
Performance of a Computer
5

## Two Computer X and Y;

Performance of (X) > Performance of (Y)

## Execution Time (Y) > Execution Time (X)

5
Performance of difference 2 Computer
6

6
CPU Time
7

## Time CPU spends on a task

User CPU time
CPU time spent in the program
System CPU time
CPU time spent in OS performing tasks on behalf of the
program

7
CPU Time (Example)
8

## User CPU time = 90.7s

System CPU time 12.9s
Execution time 2m 39 s 159s

% of CPU time =
User CPU Time + System CPU Time
X 100 %
Execution time

8
CPU Time
9

159
= 65 %

9
Amdahls law
10

## Performance improvement that can be gained from

some faster mode of execution is limited by fraction
of the time the faster mode can be used

10
Amdahls law
11

Speedup depends on
Fraction of computation time in original machine that
can be converted to take advantage of the
enhancement
(Fraction Enhanced)
Improvement gains by enhanced execution mode
(Speedup Enhanced)

11
Example
12

## Total execution time of a Program = 50 s

Execution time that can be enhanced = 30 s

12
Speedup
13

13
Example
14

## Normal mode execution time for some portion of a

program = 6s
Enhances mode execution time for the same program
= 2s

=3

14
Execution Time
15

15
Example
16

## Suppose we consider an enhancement to the processor of a

server system used for Web serving.
New CPU is 10 times faster on computation in Web application
than original CPU. Assume original CPU is busy with
computation 40% of the time and is waiting for I/O 60% of
time.

enhancement?

16
17

17
Remark
18

## Ifan enhancement is only usable for fraction of a

task, we cannot speedup by more than

18
Example
19

## A common transformation required in graphics

engines is square root implementation of floating-
point (FP). square root vary significantly in
performance, especially among processors designed
graphics
Suppose FP square root (FPSQR) is responsible for
20% of execution time of a critical graphics program
Design alternative
1. Enhance FPSQR hardware and speed up this operation by
a factor of 10
2. Make all FP instruction run faster by a factor of 1.6

19
Example
20

## FP instruction are responsible for a total of 50% of

execution time. Design team believes they can make
all fp instruction run 1.6 times faster with same
effort as required for fast square root.

## Compare these two design alternatives

20
21

21
Computer Clock Times
22

## Computers run according to a clock that runs at a

The time interval is called a clock cycle (eg, 10ns).
The clock rate is the reciprocal of clock cycle - a
frequency, how many cycles per sec (eg, 100MHz).
10 ns = 1/100,000,000 (clock cycle), same as:-
1/10ns = 100,000,000 = 100MHz (clock rate).
Cycles Per Instruction
23

## The execution time of a program clearly must depend on the

number of instructions
but different instructions take different times
An expression that includes this is:-
CPU clock cycles = N * CPI
N = number of instructions
CPI = average clock cycles per instruction
CPU performance equation
24

CPU time = CPU clock cycles for a program x Clock cycle time

24
Example
25

## A program runs in 10s (CPU time) on computer A

having 400 MHz clock rate.
A new machine B, which could run the same
program in 6s, has to be designed.
Further, B should have 1.2 times as many clock
cycles as A.

25
26

26
CPU Clock Cycles
27

## CPI (clock cycles per instruction)

average no. of clock cycles each instruction takes to
execute
IC (instruction count)
no. of instructions executed in the program

## Note: CPI can be used to compare two different

implementations of the same instruction set architecture
(as IC required for a program is same)
27
Example
28

## Consider two implementations of same instruction set

architecture. For a certain program, details of time
measurements of two machines are given below

## Which machine is faster for this program and by how

much?

28
29

29
Measuring components of CPU performance
equation
30

## CPU Time: by running the program

Clock Cycle Time: published in documentation
IC: by a software tools/simulator of the architecture
((more difficult to obtain)
CPI: by simulation of an implementation (more
difficult to obtain)

30
CPU clock cycles
31
Suppose n different types of instruction
Let
ICi No. of times instruction i is executed in a program
CPIi Avg. no. of clock cycles for instruction i

31
Example
32

## Suppose we have made the following measurements:

Frequency of all FP operation = 25%
Average CPI of FP operations excluding FPSQR = 4.0
Average CPI of all other (non-FP) instructions = 1.33
Frequency of FPSQR = 3%
CPI of FPSQR = 20

Design alternatives:
1. decrease CPI of FPSQR to 2
2. decrease average CPI of all FP operation to 2.5

## Compare these two design alternatives using CPU

performance equation
32