Lecture 04
Outline of Lecture
1. The Role of Computer Performance 2. Measuring Performance
comp 180
Lecture 04
Summary
The CPU time can be decomposed as follows:
Instructions Clock cycles - ----------------------------------CPU time = --------------------------------Program Instructions Seconds Seconds - = ---------------------- -------------------------------Clock cycle Program
CPU execution time for a pro- Seconds for the program gram Instruction count Clock cycles per instruction (CPI) Clock time Instructions executed for the program Average clock cycles / Instruction Seconds / Clock cycle
comp 180
Lecture 04
The Challenge
There are trade-offs between the various factors that affect performance (i.e., clock cycle, CPI, IC) - optimizing one of them can lead to worsening the performance of the other.
comp 180
Lecture 04
Example
A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the following facts: Instruction class A B C CPI for this instruction class 1 2 3
For a particular high-level-language statement, the compiler writer is considering two code sequences that require the following instruction counts: Code sequence 1 2 IC for instruction class A 2 4 B 1 1 C 2 1
Which code sequence executes the most instructions? Which will be faster? What is the CPI for each sequence?
Answer
Sequence 1 executes 2 + 1 + 2 = 5 instructions.
& ! " #!
comp 180
Lecture 04
'
comp 180
Lecture 04
Speed
MIPS (Million Instructions Per Second) depend on the instruction set - cannot compare two computers with different instruction sets. MIPS varies between programs on the same computer - different programs use different instruction mixes. MIPS can vary inversely to performance (See textbook for concrete example).
comp 180
Lecture 04
)19A@5BC25476
MFLOPS (Million Floating-point Operations Per Second) depends on the program. Must be FP intensive. MFLOPS depends on the computer as well. The oating point operations vary in complexity (e.g., add & divide). Assumes other type of instructions are not work.
25DFEHG
Peak performance occurs only for meaningless codes (e.g., executing the same instruction over and over).
comp 180
Lecture 04
Benchmarks
Benchmarking are the programs to use to measure performance
Real programs: CAD tools, text processing (Latex), compilers (C) - have inputs, outputs, and options when the user wants to use them. The most accurate way to characterize performance.
Kernels: key pieces from real programs. Typically used to extract specic features of the machine. They are good for focusing on individual features. Often useful for processor designer. Have little value.
comp 180
Lecture 04
Benchmark Suites
These are a collection of programs that try to explore and capture all the strengths and weaknesses of a computer system (real programs, kernels). Good benchmarks accelerate progress
V
help real programs vs. sell machines (optimize the machine for the benchmark). Inventions that help real programs may not help benchmark.
comp 180
Lecture 04
comp 180
Lecture 04
Comparing Performance
Comparing the performance by looking at individComputer A Program P1 (secs) Program P2 (secs) Total time (secs) 1 1000 1001 Computer B 10 100 110 Computer C 20 20 40
ual programs is not fair. A better measure would be the arithmetic mean:
n 1 -- Time i n i=1
where Timei is the execution time of program i of a total of n in the workload. An even better measure would be the weighted arithmetic mean:
Weight i Time i = WAM
! " #!
i=1
comp 180
Lecture 04
where Weighti is the frequency of program i of a total of n in the workload. An another approach is to measure the performance of a new computer using a certain program by normalizing it to a reference machine.