
ASSIGNMENT-1

ACSA-2012
1. You have a system that contains a special processor for doing floating-point operations. You have determined that 60% of your computations can use the floating-point processor. When a program uses the floating-point processor, the computations that use it run 40% faster than they would without it.
   a) What is the overall speedup achieved by using the floating-point processor?
   b) In order to improve the speedup, you are considering two options:
      Option 1: Modify the compiler so that 70% of the computations can use the floating-point processor. The cost incurred is Rs. 25 lakhs.
      Option 2: Improve the FP unit so that it doubles the speed of FP operations. Assume in this case that 50% of the computations can use the floating-point processor. The cost incurred is Rs. 30 lakhs.
      Which option would you recommend? Justify your answer quantitatively.
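Problems of this kind are usually approached with Amdahl's law, overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of the computation that can use the enhancement and s is the speedup of that fraction. The Python sketch below is a minimal helper under stated assumptions, not a model answer: the function name `amdahl_speedup` is hypothetical, and reading "40% faster" as s = 1.4 is one interpretation of the wording.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup by Amdahl's law: 1 / ((1 - f) + f / s)."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced_part = 1.0 - fraction_enhanced
    enhanced_part = fraction_enhanced / speedup_enhanced
    return 1.0 / (unenhanced_part + enhanced_part)


if __name__ == "__main__":
    # Assumption: "40% faster" is read as a speedup factor of s = 1.4.
    print(f"f = 0.60, s = 1.4: {amdahl_speedup(0.60, 1.4):.4f}")
    # Option 1 changes only the fraction f; s stays at the assumed 1.4.
    print(f"f = 0.70, s = 1.4: {amdahl_speedup(0.70, 1.4):.4f}")
```

Option 2 can be evaluated with the same function once a value for s is fixed. Whether "doubles the speed of FP operations" means s = 2 or s = 2 x 1.4 is a matter of interpretation, so it is worth stating the choice explicitly and weighing the resulting speedup against the cost of each option.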

2. Suppose you have a load/store computer with the following instruction mix:

   Operation   Frequency   Clock cycles
   ALU ops     35%         1
   Loads       25%         2
   Stores      15%         2
   Branches    25%         3

   a) Compute the average CPI.
   b) It is observed that 35% of the ALU ops are paired with a load, and we propose to replace these ALU ops and their loads with a new instruction. The new instruction takes 1 clock cycle. With the new instruction added, branches take 5 clock cycles. Compute the CPI for the new version.
   c) If the clock of the old version is 20% faster than that of the new version, which version has the lower CPU execution time, and by what percentage?

3. Consider the following four-segment normalized floating-point adder with a 10 ns delay per segment, which equals the pipeline clock period.

   [Figure: pipeline adder with inputs X and Y, segments S1, S2, S3, S4, and output Z]

   Name the appropriate function to be performed by each of the four segments. Find the minimum number of pipeline clock periods required to add 100 floating-point numbers A1, A2, ..., A100 using this pipeline adder. Assume that the output Z of stage S4 can be fed back to any of the inputs with a delay equal to the pipeline clock period. Compute the speedup of this pipeline in doing the job. What is its throughput?

4. i) The memory unit of a computer has 256K words of 32 bits each. The computer has an instruction format with four fields: an operation code field, a mode field to specify one of the seven addressing modes, a register address field to specify one of the 60 processor registers, and a memory address. Specify the instruction format and the number of bits in each field if the instruction is in one memory word.
   ii) An instruction is stored at location 300 with its address field at location 301. The address field has the value 400. A processor register R1 contains the number 200. Evaluate the effective address if the addressing mode of the instruction is a) direct; b) immediate; c) relative; d) register indirect; e) index with R1 as the index register.

5. The pipeline of the figure given below has the following propagation times: 40 ns for the operands to be read from memory into the registers R1 and R2, 45 ns for the signal to propagate through the multiplier, 5 ns for the transfer into R3, and 15 ns to add the two numbers into R5.
   a) What is the minimum clock cycle time that can be used?
   b) A non-pipelined system can perform the same operation by removing R3 and R4. How long will it take to multiply and add the operands without using the pipeline?
   c) Calculate the speedup of the pipeline for 10 tasks and again for 100 tasks.
   d) What is the maximum speedup that can be achieved?
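The pipeline speedup questions above rest on the standard timing model for a k-segment linear pipeline: n tasks take (k + n - 1) clock periods of length t_p, against n * t_n on a non-pipelined unit that needs t_n per task. The Python sketch below assumes that model; the function names are hypothetical, and the numbers in the usage section are illustrative values, not taken from the assignment's figures (k, t_p and t_n have to be read from the actual pipeline).

```python
def pipeline_time(k: int, n: int, t_p: float) -> float:
    """Time for n tasks on a k-segment pipeline with clock period t_p."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be at least 1")
    # The first task needs k periods; each later task completes one period apart.
    return (k + n - 1) * t_p


def pipeline_speedup(k: int, n: int, t_p: float, t_n: float) -> float:
    """Speedup over a non-pipelined unit that needs t_n per task."""
    return (n * t_n) / pipeline_time(k, n, t_p)


def max_pipeline_speedup(t_p: float, t_n: float) -> float:
    """Limit of the speedup as the number of tasks grows without bound."""
    return t_n / t_p


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the assignment).
    k, t_p, t_n = 4, 20.0, 80.0
    for n in (10, 100):
        print(n, round(pipeline_speedup(k, n, t_p, t_n), 3))
    print("limit:", max_pipeline_speedup(t_p, t_n))
```

Under this model the clock period t_p is set by the slowest segment, including any register transfer delay, which is why the speedup limit t_n / t_p can be smaller than the number of segments.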

6. Suppose the branch frequencies (as percentages of all instructions) are as follows:

   Conditional branches         20%
   Jumps and calls              5%
   Conditional branches taken   60% are taken

   We are examining a four-deep pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes, and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?
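A common way to set up this kind of question is to add the average branch stall cycles to an ideal CPI of 1 and compare the two CPIs. The Python sketch below assumes that approach; the function names are hypothetical, and the stall counts used in the example (1 cycle for jumps and calls, 2 for taken conditional branches, 1 for untaken ones) are only one possible reading of the problem statement, not a confirmed answer.

```python
from typing import Iterable, Tuple


def cpi_with_branch_stalls(
    branch_classes: Iterable[Tuple[float, float]], base_cpi: float = 1.0
) -> float:
    """Effective CPI: base CPI plus the sum of frequency * stall cycles."""
    return base_cpi + sum(freq * stalls for freq, stalls in branch_classes)


def speedup_without_hazards(
    branch_classes: Iterable[Tuple[float, float]], base_cpi: float = 1.0
) -> float:
    """How much faster the machine would be with no branch stalls at all."""
    return cpi_with_branch_stalls(branch_classes, base_cpi) / base_cpi


if __name__ == "__main__":
    conditional, jumps_and_calls, taken_ratio = 0.20, 0.05, 0.60
    # Assumed stall cycles per class; adjust these to match your own reading.
    classes = [
        (jumps_and_calls, 1),                   # unconditional branches
        (conditional * taken_ratio, 2),         # taken conditional branches
        (conditional * (1 - taken_ratio), 1),   # untaken conditional branches
    ]
    print(f"{speedup_without_hazards(classes):.3f}")
```

Keeping the stall counts as explicit inputs makes it easy to check how sensitive the result is to the interpretation of when each branch type is resolved.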
