What is performance?
Which computer performs better?
It depends on what you want to perform.
2
You go to the grocery store
What is important?
response time: how fast one customer is
processed
throughput: how many customers are processed
per hour.
3
For now focus on response time.
Response time is also called execution time.
Maximum performance means minimum
execution time:
1
Performancex
ExecutionTimex
4
Comparing Performance
x is n times faster than y
Performance
x n
Performance
y
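A minimal sketch of the comparison above, assuming performance is taken as the reciprocal of execution time. The function names and the two timings are hypothetical and chosen only for illustration.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(exec_time_x_s: float, exec_time_y_s: float) -> float:
    """Return n such that x is n times faster than y."""
    return performance(exec_time_x_s) / performance(exec_time_y_s)


# Hypothetical timings: x runs the program in 10 s, y runs it in 15 s.
n = times_faster(10.0, 15.0)
print(f"x is {n:.2f} times faster than y")
```

Because performance is a reciprocal, the ratio of performances equals the inverse ratio of execution times (here 15 / 10).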
5
Time
Elapsed Time: wall clock time.
CPU time: the time the CPU spends on the
program.
user CPU time: the time spent in my program.
system CPU time: time spent in the O.S. (in
service of my program).
CPU times do not include time waiting for I/O.
6
The Unix time command
timetells you the breakdown of your
program in user, system and elapsed times.
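The same three quantities can be observed from inside a program. The sketch below is one possible way to do it in Python, using os.times() for user and system CPU time and time.perf_counter() for elapsed time. The measure and workload functions are hypothetical names, and the sleep is only a stand-in for waiting on I/O.

```python
import os
import time


def measure(func):
    """Run func once and return (user, system, elapsed) in seconds."""
    start_cpu = os.times()
    start_wall = time.perf_counter()
    func()
    end_wall = time.perf_counter()
    end_cpu = os.times()
    user = end_cpu.user - start_cpu.user
    system = end_cpu.system - start_cpu.system
    elapsed = end_wall - start_wall
    return user, system, elapsed


def workload():
    # Hypothetical workload: some computation followed by a short sleep.
    # The sleep adds elapsed time but almost no CPU time, much like
    # waiting for I/O.
    sum(i * i for i in range(200_000))
    time.sleep(0.2)


user, system, elapsed = measure(workload)
print(f"user {user:.3f}s  system {system:.3f}s  elapsed {elapsed:.3f}s")
```

On a typical run the elapsed time is noticeably larger than user plus system time, because the time spent sleeping is not charged to the CPU.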
7
User vs. Designer
Users worry about how long a program will
take.
Quiz
Computer A has a 400MHz clock.
Our program takes 10 seconds on A.
Find the total number of clock cycles
$$\text{\# cycles} = \text{CPU time} \times \text{clock rate}$$
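Use the same equation and solve for the new clock rate
$$\text{\# cycles} = \text{CPU time} \times \text{clock rate} \quad\Rightarrow\quad \text{clock rate} = \frac{\text{\# cycles}}{\text{CPU time}}$$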
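As a rough illustration of both uses of the equation, the sketch below plugs in the quiz numbers for computer A (400 MHz, 10 seconds). The target time of 6 seconds is a hypothetical value picked for the example and is not taken from the quiz, and the function names are likewise illustrative.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """# cycles = CPU time * clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Same equation solved for the clock rate."""
    return cycles / target_cpu_time_s


# Computer A from the quiz: 400 MHz clock, program takes 10 seconds.
cycles_a = clock_cycles(10.0, 400e6)

# Hypothetical target (an assumption): run the same cycles in 6 seconds.
TARGET_TIME_S = 6.0
new_rate_hz = required_clock_rate(cycles_a, TARGET_TIME_S)

print(f"cycles on A: {cycles_a:.3g}")
print(f"clock rate needed: {new_rate_hz / 1e6:.1f} MHz")
```

This assumes the cycle count stays the same on the faster machine, which is exactly the assumption the next slide questions.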
A realistic twist on the problem
Suppose in order to move to a faster clock we
need to make a change in CPU design.
Instruction Set
Each processor has a set of instructions that
it can execute.
programs are just sequences of these
instructions.
Some instructions take longer than others
Instruction time is often measured in cycles.
CPI: Cycles per Instruction
CPI is the average number of cycles per instruction.
The average is taken over the instructions executed in a specific program.
You can't just take the average over all the instructions in the instruction set!
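The difference between the two averages can be made concrete with a small sketch. The instruction classes, their cycle costs, and the executed counts below are all hypothetical values invented for the example.

```python
# Hypothetical instruction classes and their cycle costs (assumptions).
CYCLES_PER_CLASS = {"alu": 1, "load": 3, "branch": 2, "divide": 20}


def program_cpi(executed_counts):
    """Average cycles per instruction over the instructions actually executed."""
    total_cycles = sum(CYCLES_PER_CLASS[k] * n for k, n in executed_counts.items())
    total_instructions = sum(executed_counts.values())
    return total_cycles / total_instructions


# Hypothetical program: mostly ALU operations, no divides at all.
executed = {"alu": 700, "load": 200, "branch": 100, "divide": 0}

cpi = program_cpi(executed)
naive_average = sum(CYCLES_PER_CLASS.values()) / len(CYCLES_PER_CLASS)

print(f"CPI for this program: {cpi}")
print(f"Average over the instruction set: {naive_average}")
```

The program never executes the expensive divide, so its CPI (1.5) is far below the plain average over the instruction set (6.5).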
IPS: Items per Scan
Average # of items the cashier can process per scan.
2 identical cans of a particular item might take only 1
scan (for an experienced scanner).
a bunch of bananas might take only 1 scan.
Given a cart full of items, the IPS is the total
number of items divided by the total number of
scans.
IPS will depend on what is in the cart!
CPI depends on what is in the program!
New Relationship
Another Problem
2 Computers: A and B.
Same instruction set (same program will run on
both computers).
A has cycle time of 1ns and CPI of 2.0 for
program P.
B has cycle time of 2ns and CPI of 1.2 for
program P.
Which machine is faster (and by how much)?
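Computer A
$$\text{CPU Execution Time} = \frac{\text{\# Instructions in program} \times \text{CPI}}{\text{Clock Rate}} = \text{\# Instructions in program} \times \text{CPI} \times \text{Clock Cycle Time}$$
With p instructions executed in program P: p × 2.0 × 1 ns = 2.0p ns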
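Computer B
$$\text{CPU Execution Time} = \frac{\text{\# Instructions in program} \times \text{CPI}}{\text{Clock Rate}} = \text{\# Instructions in program} \times \text{CPI} \times \text{Clock Cycle Time}$$
With p instructions executed in program P: p × 1.2 × 2 ns = 2.4p ns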
A vs. B on program P
A: 2.0p ns    B: 2.4p ns    (p = number of instructions executed in P)
A is 2.4p / 2.0p = 1.2 times faster than B.
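The comparison can be checked with a few lines of code. This is only a sketch: the instruction count P is a hypothetical placeholder, since it cancels out of the ratio, and the function name is illustrative.

```python
def cpu_time_ns(instruction_count: float, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical instruction count; any positive value gives the same ratio.
P = 1_000_000

time_a = cpu_time_ns(P, cpi=2.0, cycle_time_ns=1.0)  # 2.0 * P ns
time_b = cpu_time_ns(P, cpi=1.2, cycle_time_ns=2.0)  # 2.4 * P ns

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B on program P")
```

B has the lower CPI, but its longer cycle time more than cancels that advantage.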
Important Relationship!
$$\text{Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$
A Relevant Question
Given a program, how can we tell what the
instruction count is?
Instruction Count
IC is a count of the instructions executed at
runtime!
You have to watch the program flow when
running (called profiling) or simulate the
entire program.
You can't tell just by looking at the program
(even at the machine code).
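To see why the static program text is not enough, here is a toy simulation. The three-instruction machine, its opcodes, and the program are hypothetical and exist only for this example; real profilers and simulators are far more involved.

```python
# A toy program for a hypothetical three-instruction machine:
#   ("dec",)         decrement the counter register
#   ("jnz", target)  jump to index `target` if the counter is not zero
#   ("halt",)        stop
PROGRAM = [
    ("dec",),    # 0: counter -= 1
    ("jnz", 0),  # 1: loop back to 0 while counter != 0
    ("halt",),   # 2: done
]


def dynamic_instruction_count(program, counter):
    """Simulate the program and count the instructions actually executed."""
    if counter < 1:
        raise ValueError("this toy loop assumes counter >= 1")
    pc = 0
    executed = 0
    while True:
        op = program[pc]
        executed += 1
        if op[0] == "halt":
            return executed
        if op[0] == "dec":
            counter -= 1
            pc += 1
        elif op[0] == "jnz":
            pc = op[1] if counter != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {op[0]!r}")


print("static instructions:", len(PROGRAM))
for start in (1, 5, 1000):
    print(f"counter={start}: executed {dynamic_instruction_count(PROGRAM, start)}")
```

The program text always holds three instructions, yet the executed count grows with the input value, which is why IC has to be measured or simulated at runtime.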
Improving Performance
(1) Let's first find the number of clock cycles required for the program on A:
$$\text{CPU clock cycles}_A = 10\ \text{seconds} \times 4 \times 10^9\ \frac{\text{cycles}}{\text{second}} = 40 \times 10^9\ \text{cycles}$$
(2) CPU time for B can be found using this equation:
We can conclude that computer A is 1.2 times as fast as computer B for this
program.
Figure 4.2 The basic components of performance and how
each is measured.
This yields
$$\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$$
                        Computer A    Computer B
Program 1 (seconds)              1            10
Program 2 (seconds)           1000           100
Total time (seconds)          1001           110
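The table supports different conclusions depending on which row is read. A short sketch of the ratios, using only the numbers from the table (the variable names are illustrative):

```python
# Execution times in seconds, copied from the table.
times_s = {
    "A": {"Program 1": 1, "Program 2": 1000},
    "B": {"Program 1": 10, "Program 2": 100},
}

total_s = {machine: sum(t.values()) for machine, t in times_s.items()}

# How many times faster A is than B on Program 1, and B than A on Program 2.
a_over_b_program1 = times_s["B"]["Program 1"] / times_s["A"]["Program 1"]
b_over_a_program2 = times_s["A"]["Program 2"] / times_s["B"]["Program 2"]
b_over_a_total = total_s["A"] / total_s["B"]

print(f"Program 1: A is {a_over_b_program1:.1f} times faster than B")
print(f"Program 2: B is {b_over_a_program2:.1f} times faster than A")
print(f"Total time: B is {b_over_a_total:.1f} times faster than A")
```

Each machine wins one program by a factor of 10, while the total-time comparison is dominated by the longer-running Program 2.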
4.4 Real Stuff: Two SPEC Benchmarks and the Performance of Recent Intel Processors
Keywords
System Performance Evaluation Cooperative (SPEC) benchmark: A set of standard CPU-intensive, integer and floating-point benchmarks based on real programs.
Benchmarks
Performance best determined by running a real application
Use programs typical of expected workload
Or, typical of expected class of applications
e.g., compilers/editors, scientific applications, graphics, etc.
Small benchmarks
nice for architects and designers
easy to standardize
can be abused
SPEC (System Performance Evaluation Cooperative)
companies have agreed on a set of real programs and inputs
valuable indicator of performance (and compiler technology)
can still be abused
Benchmark Games
An embarrassed Intel Corp. acknowledged Friday that a bug in a software
program known as a compiler had led the company to overstate the speed of
its microprocessor chips on an industry benchmark by 10 percent. However,
industry analysts said the coding error ... was a sad commentary on a
common industry practice of cheating on standardized performance
tests ... The error was pointed out to Intel two days ago by a competitor,
Motorola ... came in a test known as SPECint92 ... Intel acknowledged that it
had optimized its compiler to improve its test scores. The company had
also said that it did not like the practice but felt compelled to make the
optimizations because its competitors were doing the same thing ... At the
heart of Intel's problem is the practice of tuning compiler programs to
recognize certain computing problems in the test and then substituting
special handwritten pieces of code ...
[Figure: SPEC performance ratio (scale 0 to 600) by benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), with two series: Compiler and Enhanced compiler.]
SPEC CPU2000
SPEC 2000
Does doubling the clock rate double the performance?
[...] better performance?
[Figure: SPEC CINT2000 and CFP2000 results plotted against clock rate in MHz (500 to 3500). Recoverable series labels: Pentium 4 CFP2000, Pentium III CINT2000.]
Figure 4.7 SPECweb99 performance for a variety of Dell PowerEdge systems using the Xeon versions of the Pentium III and Pentium 4 microprocessors.
[Figure: SPECINT2000 and SPECFP2000 results (scale 0.0 to 1.2) by benchmark and power mode: always on/maximum clock, laptop mode/adaptive clock, minimum power/minimum clock.]
Experiment
Phone a major computer retailer and tell them you are having
trouble deciding between two different computers; specifically,
you are confused about the processors' strengths and weaknesses.
What kind of response could you give a friend with the same
question?
4.5 Fallacies and Pitfalls
Keywords
Amdahl's law: A rule stating that the performance enhancement possible
with a given improvement is limited by the amount that the improved feature
is used.
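Example:
$$\text{Execution time}_1 = \frac{10 \times 10^9}{4 \times 10^9} = 2.5\ \text{seconds}$$
$$\text{Execution time}_2 = \frac{15 \times 10^9}{4 \times 10^9} = 3.75\ \text{seconds}$$
Now let's compute the MIPS rate for each version of the program:
$$\text{MIPS}_1 = \frac{(5 + 1 + 1) \times 10^9}{2.5 \times 10^6} = 2800$$
$$\text{MIPS}_2 = \frac{(10 + 1 + 1) \times 10^9}{3.75 \times 10^6} = 3200$$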
So, the code from compiler 2 has a higher MIPS rating, but the code
from compiler 1 runs faster!
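The example can be reproduced with a short sketch. The instruction counts (5, 1, 1 and 10, 1, 1 billion) and the 4 × 10^9 Hz clock come from the figures above. The per-class CPIs of 1, 2 and 3 are an assumption, chosen because they reproduce the 10 × 10^9 and 15 × 10^9 cycle totals. The class labels and function name are illustrative.

```python
CLOCK_RATE_HZ = 4e9

# Assumed CPI per instruction class (not stated in the text above); these
# values reproduce the 10e9 and 15e9 cycle totals used in the example.
CPI = {"A": 1, "B": 2, "C": 3}


def summarize(counts):
    """Return (execution time in seconds, MIPS rating) for one code version."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    exec_time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time_s * 1e6)
    return exec_time_s, mips


compiler1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler2 = {"A": 10e9, "B": 1e9, "C": 1e9}

time1, mips1 = summarize(compiler1)
time2, mips2 = summarize(compiler2)

print(f"compiler 1: {time1:.2f} s, {mips1:.0f} MIPS")
print(f"compiler 2: {time2:.2f} s, {mips2:.0f} MIPS")
```

Compiler 2 executes more instructions, most of them cheap ones, so its MIPS rating rises even though its execution time gets worse.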
Example
Suppose we enhance a machine making all floating-point instructions
run five times faster. If the execution time of some benchmark before
the floating-point enhancement is 10 seconds, what will the speedup be
if half of the 10 seconds is spent executing floating-point instructions?
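One way to work through this example is to split the execution time into the part affected by the enhancement and the part that is not. The sketch below does that with the numbers from the problem; the function name is illustrative.

```python
def time_after_improvement(total_time_s: float,
                           affected_time_s: float,
                           improvement_factor: float) -> float:
    """Amdahl's law: only the affected part of the time shrinks."""
    unaffected_time_s = total_time_s - affected_time_s
    return affected_time_s / improvement_factor + unaffected_time_s


before_s = 10.0
# Half of the 10 seconds is floating point, which becomes 5 times faster.
after_s = time_after_improvement(before_s, affected_time_s=5.0, improvement_factor=5.0)
speedup = before_s / after_s

print(f"time after enhancement: {after_s:.1f} s")
print(f"speedup: {speedup:.2f}")
```

The floating-point half drops from 5 seconds to 1 second, but the other 5 seconds are untouched, so the overall speedup is 10 / 6, about 1.67, rather than 5.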