Anda di halaman 1dari 18

dce

2010

COMPUTER ARCHITECTURE CE2010


BK
TP.HCM

Faculty of Computer Science and Engineering Department of Computer Engineering Nam Ho

dce

2010

Chapter 1
Computer Abstraction and Technology - Exercises
Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

dce

2010

Pitfall: MIPS as a Performance Metric


MIPS: Millions of Instructions Per Second
Doesnt account for
Differences in ISAs between computers Differences in complexity between instructions

Instruction count MIPS = Execution time 106 Instruction count Clock rate = = Instruction count CPI CPI 106 6 10 Clock rate

CPI varies between programs on a given CPU


Computer Architecture Chapter 1 2010, Dr. Dinh Duc Anh Vu 3

dce

2010

Using MIPS
There are three problems with MIPS:
MIPS specifies the instruction execution rate but not the capabilities of the instructions MIPS varies between programs on the same computer MIPS can vary inversely with performance (see the next example)

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

dce

2010

Exercise 1
Consider the machine with the following three instruction classes and CPI:
CPI for this instruction class A B C 1 2 3

CPI

Now suppose we measure the code for the same program from two different compilers and obtain the following data:
Code from Compiler 1 Compiler 2 Instruction count in (billons) for each instruction class A B C 5 1 1 10 1 1

Assume that the machines clock rate is 500 MHz. Which code sequence will execute faster according to execution time ? According to MIPS?
2010, Dr. Dinh Duc Anh Vu 5

Computer Architecture Chapter 1

dce

2010

Exercise 1
Using the formula: CPU clock cycles
Clock Cycles = (CPIi Instruction Count i )
i=1 n

Execution time = CPU clock cycles/clock rate

Compiler 1:
CPU clock cycles = (5 x 1 + 1 x 2 + 1 x 3) x 109 = 10 x 109 cycles Execution time = (10 x 109)/500 x 106 = 20 seconds

Compiler 2:
CPU clock cycles = (10 x 1 + 1 x 2 + 1 x 3) x 109 = 15 x 109 cycles Execution time = (15 x 109)/(500 x 106) = 30 seconds

Therefore compiler 1 generates a faster program

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

dce

2010

Exercise 1
Instruction count MIPS = Execution Time 106

Compiler 1: Compiler 2:

(5 + 1 + 1) 109 MIPS = = 350 6 20 10 (10 + 1 + 1) 109 MIPS = = 400 6 30 10

Although compiler 2 has a higher MIPS rating, the code from generated by compiler 1 runs faster
Computer Architecture Chapter 1 2010, Dr. Dinh Duc Anh Vu 7

dce

2010

Exercise 2
PowerPC G4 can execute 4 instructions/cycle with a clock rate 867 MHz. What is the MIPS rate ? Answer:
CPI = 1/4 MIPS = 1/(CPI x Cycle Time x 106) MIPS = Clock Rate/CPI x 106 MIPS = 4 x 867 = 3468

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

dce

2010

Speedup - Amdahls Law


Speedup = Execution Time Old --------------------------------Execution Time New
1

Tbefore improved = --------------Timproved

Speedup = ----------------------------------, n is the improvement factor 1 fenhanced + fenhanced/n Taffected fenhanced = ----------------, Tbefore improved = Taffected + Tunaffected Tbefore improved

Timproved =

Taffected + Tunaffected improvement factor

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

dce

2010

Exercise 3
Consider the implementation of a ISA. The clock rate is 2Ghz. What is the execution time of the following program ?
Arith Instructions CPI 500 1 Store 50 5 Load 100 5 Branch 50 2

Execution Time = (500 1 + 50 5 + 100 5 + 50 2) 0.5 109 = 675 ns If the number of load instructions can be reduced by one-half, what is the speedup and the CPI? Execution Time = (500 1 + 50 5 + 50 5 + 50 2) 0.5 109 = 550 ns Speedup = 675/550 = 1.22 CPI = Execution Time x Clock rate/ Instruction Count CPI = 550 x 10-9 x 2 x 109/700 = 1.57

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

10

dce

2010

Exercise 4
Speedup = Execution Time Old --------------------------------Execution Time New
1

Tbefore improved = --------------Timproved

Prove that Speedup = ---------------------------------1 fenhanced + fenhanced/n n is the improvement factor Taffected fenhanced = ---------------Tbefore improved Tbefore improved = Taffected + Tunaffected

Taffected Timproved = + Tunaffected improvement factor


Computer Architecture Chapter 1 2010, Dr. Dinh Duc Anh Vu 11

dce

2010

Exercise 5
A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10. The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. Compare these two design alternatives.

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

12

dce

2010

Exercise 5
SpeedupFPSQR
1 = = 1.22 = 0.2 0.82 (1 0.2) + 10 1

SpeedupFP =

1 = = 1.23 0.5 0.8125 (1 0.5) + 1.6

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

13

dce

2010

Geometric Mean
GM =
n

Execution time ratio


i =1

The advantage of the geometric mean is that


It is independent of the running times of the individual programs And it doesnt matter which computer is used for normalization.

The drawback to using geometric means of execution times is that


They violate our fundamental principle of performance measurement, they do not predict execution time

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

14

dce

2010

Geometric Mean

A and B have the same performance. Performance of C is 0.63 of A or B. Unfortunately, the total execution time of A is 10 times longer than that of B, and B in turn is about 3 times longer than C.
2010, Dr. Dinh Duc Anh Vu 15

Computer Architecture Chapter 1

dce

2010

Exercise 6
Show that the Geometric Mean doesnt matter which computer is used for normalization.

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

16

dce

2010

Fallacy: MFLOPS
MFLOPS: Million Floating-point Operations Per Second Fallacy: MFLOPS is a consistent and useful measure of performance. For example, the Cray C90 has no divide instruction, while the Intel Pentium has divide, square root, sine, and cosine

Computer Architecture Chapter 1

2010, Dr. Dinh Duc Anh Vu

17

dce

2010

Exercise 7
Consider the programs in the following table, running on a processor with clock rate = 3GHz. Find the MFLOPS for the program a, b.

Program a: FPop = 106 0.4 = 4 105, clockcylesfp = CPI No. FP instr. = 4 105 Tfp = 4 105 0.33 109 = 1.32 104 then MFLOPS = 3.03 103 Program b: FPop = 3 106 0.4 = 1.2 106, clockcylesfp = CPI No. FP instr. = 0.70 1.2 106 Tfp = 0.84 106 0.33 109 = 2.77 104 then MFLOPS = 4.33 103
Computer Architecture Chapter 1 2010, Dr. Dinh Duc Anh Vu 18