
University of California, Berkeley

College of Engineering
Computer Science Division  EECS

Spring 2003 John Kubiatowicz

Midterm I
March 12, 2003
CS152 Computer Architecture and Engineering

Your Name:

SID Number:

Discussion Section:

Problem Possible Score

1 25

2 20

3 25

4 30

Total

[ This page left for π ]

3.141592653589793238462643383279502884197169399375105820974944

Problem 1: Short Answer
Problem 1a [3 pts]:
What is Amdahl’s law? Give a formula and define the terms. How is this useful?

Amdahl’s law: speedup = 1 / ((1 − F) + F/n)

F is the fraction of the execution time that you speed up
n is the speedup ratio of that part

Amdahl’s law tells you that you can NOT speed up the whole system much by speeding up only one part: even as n grows without bound, the overall speedup is limited to 1/(1 − F).
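As a rough illustration of the formula, here is a minimal Python sketch. The function name `amdahl_speedup` and the numbers passed to it are hypothetical and chosen only to show the 1/(1 − F) limit.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical case: half of the run time is made 2x faster, then 1000x faster.
print(amdahl_speedup(0.5, 2))     # about 1.33
print(amdahl_speedup(0.5, 1000))  # just under 2, the 1/(1 - F) limit
```

Even a 1000x improvement to half of the program yields less than a 2x overall speedup.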

Problem 1b [3 pts]:
What are setup and hold-time and how can they be violated in a synchronous circuit? Can you
still use a chip that is experiencing hold-time violations? How about setup violations?
Tsetup is the time that the input signal must be stable before the clock edge.
Thold is the time that the input signal must remain unchanged after the clock edge.

If the clock cycle time is not long enough, you can have a setup-time violation. To avoid a setup violation, the cycle
time Tclk must satisfy:
Tclk > Tclk_Q + Tmax + Tsetup + Tskew + Tj
If the combinational logic delay is too short, you can have a hold-time violation. To avoid a hold violation, the
following must be satisfied:
Tclk_Q + Tmin > Tskew + Thold

If setup time is violated, you can slow down the clock to make the chip work.
If hold time is violated, you can NOT make the chip work: the hold constraint does not contain Tclk, so changing the clock rate does not help.
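The two inequalities can be checked mechanically. The sketch below is only an illustration: the function names and every delay value (in ns) are made up, and `t_j` simply stands for the Tj term of the setup inequality.

```python
def setup_ok(t_clk, t_clk_q, t_max, t_setup, t_skew, t_j):
    """Setup constraint: the clock period must cover the longest path."""
    return t_clk > t_clk_q + t_max + t_setup + t_skew + t_j


def hold_ok(t_clk_q, t_min, t_skew, t_hold):
    """Hold constraint: note that the clock period does not appear at all."""
    return t_clk_q + t_min > t_skew + t_hold


# Hypothetical delays in ns.
print(setup_ok(9.0, 0.5, 8.0, 0.3, 0.2, 0.1))   # False: clock too fast
print(setup_ok(10.0, 0.5, 8.0, 0.3, 0.2, 0.1))  # True: slowing the clock fixes it
print(hold_ok(0.5, 0.1, 0.2, 0.5))              # False at any clock rate
```

Because `hold_ok` takes no clock-period argument, no choice of clock rate can turn a failing hold check into a passing one.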

Problem 1c [2 pts]:
Is the multi-cycle data path always faster than the single-cycle data path? Explain.
No. For example, lw on MIPS is slower in the multi-cycle data path than in the single-cycle data path, because
each additional cycle introduces extra Tclk_Q overhead.

Problem 1d [3 pts]:
What are precise interrupts and why is it easy to provide them for our multi-cycle data path?
Precise interrupts mean that all instructions before the offending instruction have executed and no instructions
after the offending instruction have executed. In our multi-cycle data path, it is easy to provide precise interrupts
because interrupts only occur before the WB stage and instructions are executed in order.

Problem 1e [3 pts]:
Suppose that you have analyzed a benchmark that runs on your company’s processor. This
processor runs at 500MHz and has the following characteristics:

Instruction Type          Frequency (%)   Cycles
Arithmetic and logical    40              1
Load and Store            30              2
Branches                  20              1
Floating Point            10              12

What is the CPI and MIPS rating of this processor running this benchmark?
CPI=1*0.4+2*0.30+1*0.2+12*0.10=2.4
MIPS=500/2.4=208 MIPS
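The same arithmetic can be written as a short Python sketch. The dictionary layout and function names are illustrative choices; the percentages, cycle counts, and 500 MHz clock come from the problem statement.

```python
# (frequency in percent, cycles) per instruction class, from the table.
MIX = {
    "Arithmetic and logical": (40, 1),
    "Load and Store": (30, 2),
    "Branches": (20, 1),
    "Floating Point": (10, 12),
}


def cpi(mix):
    """Weighted average cycles per instruction."""
    return sum(pct * cycles for pct, cycles in mix.values()) / 100


def mips_rating(clock_mhz, cpi_value):
    """Millions of instructions per second = clock rate (MHz) / CPI."""
    return clock_mhz / cpi_value


print(cpi(MIX))                    # 2.4
print(mips_rating(500, cpi(MIX)))  # about 208.3
```

Floating point is only 10% of the instructions but contributes half of the CPI (1.2 of 2.4).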

Problem 1f [2 pts]:
What is the technique used for a carry-select adder and how can it be used in general to speed up
hardware (hint: this is a very general technique to make a time⇔space tradeoff)

In a carry-select adder, you pre-compute the addition for both carry-in values, “0” and “1”. The correct result is then
selected by the late-arriving carry in. In general, you can throw in more hardware, pre-compute the possible results in
parallel, and select the correct one with the late-arriving signal, which speeds up the hardware.
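A behavioral Python sketch of the idea, assuming a single split of the word into a low and a high half (real designs may use several blocks; the function name and the 8-bit default width are illustrative):

```python
def carry_select_add(a: int, b: int, width: int = 8):
    """Add two `width`-bit numbers; return (sum, carry_out)."""
    half = width // 2
    lo_mask = (1 << half) - 1
    hi_bits = width - half
    hi_mask = (1 << hi_bits) - 1

    a_lo, b_lo = a & lo_mask, b & lo_mask
    a_hi, b_hi = (a >> half) & hi_mask, (b >> half) & hi_mask

    # In hardware these three additions run in parallel.
    lo_total = a_lo + b_lo
    hi_if_carry0 = a_hi + b_hi      # high half assuming carry-in 0
    hi_if_carry1 = a_hi + b_hi + 1  # high half assuming carry-in 1

    # The late-arriving carry only drives a select, not another addition.
    carry = lo_total >> half
    hi_total = hi_if_carry1 if carry else hi_if_carry0

    result = ((hi_total & hi_mask) << half) | (lo_total & lo_mask)
    return result, hi_total >> hi_bits


print(carry_select_add(0x0F, 0x01))  # (16, 0)
print(carry_select_add(0xFF, 0x01))  # (0, 1)
```

The cost is a second high-half adder and a mux; the gain is that the high half no longer waits for the low-half carry to ripple through.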

Problem 1g [3pts]: What does it mean for a branch to have a delay slot, and why does it
complicate the servicing of interrupts?

A delay slot means that the instruction right after the branch is always executed, regardless of whether the branch is
taken. If the instruction in the delay slot causes an exception, you have to re-evaluate the branch to figure out the
correct next instruction to execute after the interrupt is serviced.

Problem 1h[3pts]: The Clark paper on testing talked about using randomness in at least two
different ways during testing of the VAX. What were they?
1. Use random vector
2. Insert random errors

Problem 1i [3 pts]:
The 1-bit Booth algorithm recodes one of the operands of a multiplier from binary into trinary
logic with symbols: -1, 0, and 1. The transformation occurs one bit at a time, as given in class:
Cur Prev Out
0 0 0
0 1 1
1 0 -1
1 1 0
Suppose we encode 3 bits at a time. Finish filling out the following transformation table:
Cur Prev Out
000 0 0
000 1 1
001 0 1
001 1 2
010 0 2
010 1 3
011 0 3
011 1 4
100 0 -4
100 1 -3
101 0 -3
101 1 -2
110 0 -2
110 1 -1
111 0 -1
111 1 0
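Both tables follow one rule: read the current group as a two's-complement number and add the previous bit. A small Python sketch of that rule (the function names are illustrative):

```python
def booth_digit(cur: int, prev: int, width: int) -> int:
    """Recode a `width`-bit group plus the previous (lower) bit into one signed digit."""
    # Interpret the group as two's complement, then add the previous bit.
    signed_cur = cur - (1 << width) if (cur >> (width - 1)) & 1 else cur
    return signed_cur + prev


def booth_table(width: int):
    """All (cur, prev, out) rows in the same order as the exam tables."""
    return [(cur, prev, booth_digit(cur, prev, width))
            for cur in range(1 << width) for prev in (0, 1)]


for cur, prev, out in booth_table(3):
    print(f"{cur:03b} {prev} {out}")
```

With `width=1` the same function reproduces the four-row table given in class.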

Problem 2: Delay For a Full Adder
A key component of an ALU is a full adder. A symbol for a full adder is:

[Figure: Full Adder block with inputs A, B, and Cin and output Cout]

Problem 2a [5pts]:
Implement a full adder using as few 2-input AND, OR, and XOR gates as possible. Keep in
mind that the Carry In signal may arrive much later than the A or B inputs. Thus, optimize your
design (if possible) to have as few gates between Carry In and the two outputs as possible:

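[Figure: full-adder circuit with inputs A, B, Cin and outputs S, Cout. Consistent with the paths and input loads used in Problems 2b and 2c: S = (A XOR B) XOR Cin; Cout = (A AND B) OR ((A XOR B) AND Cin)]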

Assume the following characteristics for the gates:
AND: Input load: 100fF
Propagation delay: TPlh=0.4ns, TPhl=0.4ns
Load-dependent delay: TPlhf=0.0020ns/fF, TPhlf=0.0021ns/fF
OR: Input load: 100fF
Propagation delay: TPlh=0.2ns, TPhl=0.6ns
Load-dependent delay: TPlhf=0.0020ns/fF, TPhlf=0.0021ns/fF
XOR: Input load: 200fF
Propagation delay: TPlh=0.8ns, TPhl=0.8ns
Load-dependent delay: TPlhf=0.0040ns/fF, TPhlf=0.0042ns/fF

Problem 2b [3pts]:
Compute the input load for each of the 3 inputs to your full adder:

CA=CB=Ccin=200fF+100fF=300fF

Problem 2c [4pts]:
Identify two critical paths from the inputs to the Sum and the Carry Out signal.
Compute the propagation delays for these critical paths based on the information given above.
(You will have 2 numbers for each of these two paths):

Notice that A and B are symmetric inputs.

Critical path from inputs to the sum: since XOR is slow compared to AND and OR, the critical
path for the sum is A->XOR->XOR->S

Critical path from inputs to the carry out: since XOR is slow compared to AND and OR, the
critical path for the carry out is A->XOR->AND->OR->Cout
W/o wire delay calculation
TphlAS= 0.8+0.0042*(200+100)+0.8=2.86ns
TplhAS= 0.8+0.0042*(200+100)+0.8=2.86ns

TphlACout=0.6+0.0021*200+0.4+0.0042*(200+100)+0.8=3.48ns
TplhACout=0.2+0.0021*200+0.4+0.0042*(200+100)+0.8=3.08ns

W/ wire delay calculation


TphlAS=0.8+0.0042*(200+100)*2+0.8=4.12ns
TplhAS=0.8+0.0042*(200+100)*2+0.8=4.12ns

TphlACout=0.6+0.0021*200*2+0.4+0.0042*(200+100)*2+0.8=5.16ns
TplhACout=0.2+0.0021*200*2+0.4+0.0042*(200+100)*2+0.8=4.76ns
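The eight numbers can be reproduced with a short Python sketch. It mirrors the expressions exactly as written in the solution; the `wire_factor` parameter is an assumption read off the arithmetic, where the "with wire delay" case doubles each load-dependent term.

```python
def sum_path(wire_factor: int = 1) -> float:
    """A -> XOR -> XOR -> S, in ns."""
    return 0.8 + 0.0042 * (200 + 100) * wire_factor + 0.8


def cout_path(or_delay: float, wire_factor: int = 1) -> float:
    """A -> XOR -> AND -> OR -> Cout, in ns; or_delay is the OR gate's TPhl or TPlh."""
    return (or_delay + 0.0021 * 200 * wire_factor + 0.4
            + 0.0042 * (200 + 100) * wire_factor + 0.8)


for w in (1, 2):
    print(w, round(sum_path(w), 2),
          round(cout_path(0.6, w), 2), round(cout_path(0.2, w), 2))
```

The sum path gives the same value for both transitions because the solution uses the XOR's larger load-dependent coefficient (0.0042) in both cases.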

Problem 2d [2pts]:
Compute the Load Dependent delay for your two outputs.
TphlASf=0.0042ns/fF, TplhASf=0.0040ns/fF
TphlACoutf=0.0021ns/fF, TplhACoutf=0.0020ns/fF

Problem 2e [6pts]:
Suppose we wish to build a fast adder. One component might be a 4-bit adder with propagate
and generate signals (such as might be used for carry-lookahead logic). Construct this adder as a
4-bit ripple adder internally (you can use your full-adders) with separate propagate and generate
outputs. Let the output carry be a function of the propagate and generate signals. Draw a circuit
for this component and compute the propagation delay for the slowest signal.

[Figure: 4-bit adder block ("4-BIT Adder [FAST]") with inputs A[3:0], B[3:0], Cin and outputs S[3:0], G, P, Cout. Internally, four 1-bit full adders (inputs A3 B3 ... A0 B0, outputs S3 ... S0) ripple from Cin; the propagate signals P0, P1, P2, P3 feed the AND/OR logic that produces C4]

Notice that the slowest path should be A0->S3, because the XOR gate is much slower than the AND/OR gates
used to generate the carry out C4 (which uses 7 AND/OR levels). Assume that P0 drives 1 AND
gate, P1 drives 2 AND gates, P2 drives 3 AND gates, and P3 drives 4 AND gates for C4
generation. Notice that the implementation is NOT unique!

W/o wiring delay:
TphlCinS3 = [0.6+0.4+0.0021*100+0.8+0.0042*(100*2+200)] + 2*(0.0021*300+0.4+0.0021*200+0.6) + 0.0021*300 + 0.8
          = 9.22ns
W/ wiring delay:
TphlCinS3 = [0.6+0.4+0.0021*100*2+0.8+2*0.0042*(100*2+200)] + 2*(0.0021*300*2+0.4+0.0021*2*200+0.6) + 0.0021*300*2 + 0.8
          = 13.84ns
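As a check on the arithmetic, the expression can be evaluated in Python. The grouping into a first stage, two middle stages, and a last stage is read off the brackets in the solution; the variable names and the `wire_factor` parameter (2 for the "with wiring delay" case) are illustrative assumptions.

```python
def slowest_path(wire_factor: int = 1) -> float:
    """Delay of the slowest path through the 4-bit ripple adder, in ns."""
    w = wire_factor
    # First full adder: the bracketed term of the solution.
    first = 0.6 + 0.4 + 0.0021 * 100 * w + 0.8 + 0.0042 * (100 * 2 + 200) * w
    # Two middle full adders: one AND and one OR each on the carry chain.
    middle = 2 * (0.0021 * 300 * w + 0.4 + 0.0021 * 200 * w + 0.6)
    # Last full adder: the load it sees plus the final XOR producing S3.
    last = 0.0021 * 300 * w + 0.8
    return first + middle + last


print(round(slowest_path(1), 2))  # 9.22
print(round(slowest_path(2), 2))  # 13.84
```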

Problem 3: Non-Restoring Division
Here is the pseudo-code for an unsigned division algorithm. It is the non-restoring version of the
last divider that we developed. Assume that quotient and remainder are 32-bit global values.
As far as inputs are concerned dividend is 32 bits wide, while divisor is no more than 31 bits.

divide(dividend, divisor)
{
    int count;

    /* Missing initialization instructions */

    MOBIUS64(remainder, quotient);
    while (count > 0) {
        count--;
        /* Low bit of quotient is inverted sign of remainder */
        if (quotient & 0x1)
            remainder = remainder - divisor;
        else
            remainder = remainder + divisor;

        MOBIUS64(remainder, quotient);
    }

    /* Missing trailing instructions */
}

The MOBIUS64(hi,lo) instruction rotates the 64-bit value <hi,lo> around to the left 1 bit,
wrapping the top bit of hi back to the bottom of lo and inverting it in the process. So, for
instance: MOBIUS64(0xFFFFFFFF, 0x000000FF) ⇒ 0xFFFFFFFE, 0x000001FE
And: MOBIUS64(0x0FFFFFFF, 0x000000FF) ⇒ 0x1FFFFFFE, 0x000001FF
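A minimal Python model of the instruction, assuming both halves are held as unsigned 32-bit values (the function name `mobius64` and the masking approach are illustrative):

```python
MASK32 = 0xFFFFFFFF


def mobius64(hi: int, lo: int):
    """Rotate <hi,lo> left by one bit, inverting the bit that wraps around."""
    inverted_top = ((hi >> 31) & 1) ^ 1          # top bit of hi, inverted
    new_hi = ((hi << 1) | (lo >> 31)) & MASK32   # top bit of lo moves into hi
    new_lo = ((lo << 1) | inverted_top) & MASK32
    return new_hi, new_lo


# The two examples from the problem statement.
print([hex(x) for x in mobius64(0xFFFFFFFF, 0x000000FF)])  # ['0xfffffffe', '0x1fe']
print([hex(x) for x in mobius64(0x0FFFFFFF, 0x000000FF)])  # ['0x1ffffffe', '0x1ff']
```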

Problem 3a [5pts]:
Implement MOBIUS64($t1, $t0) as 6 MIPS instructions. Assume $t2 contains the constant
0x80000000. Hint: what can different flavors of slt/sltu do to help?
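slt   $t3, $t0, $0      # $t3 = top bit of lo (1 if $t0 is negative)
sltu  $t4, $t1, $t2     # $t4 = inverted top bit of hi (1 if $t1 < 0x80000000)
sll   $t0, $t0, 1
sll   $t1, $t1, 1
addu  $t0, $t0, $t4     # low bit of lo = inverted top bit of hi
addu  $t1, $t1, $t3     # low bit of hi = old top bit of lo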

Problem 3b [5pts]:
The divide algorithm is incomplete. It is missing some initialization and some final code. What
is missing? (Careful – what if the remainder is negative?)
Initialization: Remainder = 0; Quotient = Dividend; Count = 32;
Trail: Remainder = Remainder >> 1;
If (!(Quotient & 0x01)) {
Remainder = Remainder | 0x80000000;
Remainder += Divisor;
}
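To see the initialization and trailing code work together with the loop, here is a Python sketch of the whole algorithm. It assumes 32-bit registers modeled with masks and a divisor of at most 31 bits, as the problem states; the helper `mobius64` is restated so the block runs on its own, and all names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def mobius64(hi, lo):
    """Rotate <hi,lo> left one bit, inverting the bit that wraps from hi into lo."""
    inverted_top = ((hi >> 31) & 1) ^ 1
    return ((hi << 1) | (lo >> 31)) & MASK32, ((lo << 1) | inverted_top) & MASK32


def divide(dividend, divisor):
    """Non-restoring unsigned divide; returns (quotient, remainder)."""
    # Initialization
    remainder, quotient, count = 0, dividend & MASK32, 32

    remainder, quotient = mobius64(remainder, quotient)
    while count > 0:
        count -= 1
        if quotient & 1:   # previous remainder was non-negative
            remainder = (remainder - divisor) & MASK32
        else:              # previous remainder was negative
            remainder = (remainder + divisor) & MASK32
        remainder, quotient = mobius64(remainder, quotient)

    # Trailing code: undo the extra shift, then fix up a negative remainder.
    remainder >>= 1
    if not (quotient & 1):
        remainder |= 0x80000000   # restore the sign bit lost by the shift
        remainder = (remainder + divisor) & MASK32
    return quotient, remainder


print(divide(100, 7))         # (14, 2)
print(divide(0xFFFFFFFF, 1))  # (4294967295, 0)
```

The loop performs 32 add/subtract steps but 33 rotations, which is why the remainder needs the final shift right by one.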

Problem 3c [11pts]:
Assume that you have a MIPS processor that is missing the divide instruction. Assume that
dividend and divisor are in $a0 and $a1 respectively, and that remainder and quotient are
returned in registers $v0 and $v1 respectively. You can use MOBIUS64 as a pseudo-instruction
that takes 3 registers (don’t forget the constant in $t2). Make sure to adhere to MIPS register
conventions, and optimize the loop as much as possible.

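Divide:
        move  $t1, $0            # remainder = 0
        move  $t0, $a0           # quotient = dividend
        ori   $t8, $0, 32        # count = 32
        lui   $t2, 0x8000        # constant 0x80000000 for MOBIUS64

loop:
        MOBIUS64 $t1, $t0, $t2
        beq   $t8, $0, DONE
        addiu $t8, $t8, -1
        andi  $t3, $t0, 1
        beq   $t3, $0, MOBIUSNEG
        subu  $t1, $t1, $a1      # low bit 1: remainder -= divisor
        j     loop

MOBIUSNEG:
        addu  $t1, $t1, $a1      # low bit 0: remainder += divisor
        j     loop
DONE:
        move  $v1, $t0           # quotient
        srl   $v0, $t1, 1        # remainder = remainder >> 1
        andi  $t3, $v1, 1
        bne   $t3, $0, exit
        or    $v0, $v0, $t2      # negative remainder: restore sign bit
        addu  $v0, $v0, $a1      # and add the divisor back
exit:   jr    $ra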

Problem 3d [4pts]:
How will this algorithm have to change if the divisor is allowed to be 32 bits in size?
Check for a 32-bit divisor (top bit set) and treat it separately; the quotient can then only be 0 or 1:
Either Q=1, Remainder=Dividend-Divisor (when Dividend >= Divisor)
Or Q=0, Remainder=Dividend (when Dividend < Divisor)
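The special case is small enough to state directly. A Python sketch, assuming unsigned 32-bit inputs and a hypothetical function name:

```python
def divide_wide_divisor(dividend: int, divisor: int):
    """Unsigned divide when the divisor has its top (32nd) bit set.

    The divisor is at least 2**31, so it fits into a 32-bit dividend
    at most once; returns (quotient, remainder).
    """
    assert (divisor >> 31) == 1, "only for full 32-bit divisors"
    if dividend >= divisor:
        return 1, dividend - divisor
    return 0, dividend


print(divide_wide_divisor(0xFFFFFFFF, 0x80000000))  # (1, 2147483647)
print(divide_wide_divisor(0x12345678, 0x80000000))  # (0, 305419896)
```

In MIPS the comparison would need to be unsigned (sltu), since both values may have the top bit set.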

Problem 4: New instructions for a multi-cycle data path
[Figure: multi-cycle datapath. Blocks: PC, Ideal Memory (RAdr, WrAdr, Din, Dout), Instruction Reg, Mem Data Reg, Reg File (Ra, Rb, Rw, busA, busB, busW), A and B registers, Extend, << 2, ALU, ALU Control, and ALU Out. Control signals: PCWr, PCWrCond, PCSrc, IorD, MemWr, IRWr, RegDst, RegWr, ExtOp, MemtoReg, ALUSelA, ALUSelB, ALUOp. The ALU produces Zero.]
The Multi-Cycle datapath developed in class and the book is shown above. In class, we
developed an assembly language for microcode. It is included here for reference:

Field Name   Values For Field   Function of Field
ALU          Add                ALU adds
             Sub                ALU subtracts
             Func               ALU does function code (Inst[5:0])
             Or                 ALU does logical OR
SRC1         PC                 PC ⇒ 1st ALU input
             rs                 R[rs] ⇒ 1st ALU input
SRC2         4                  4 ⇒ 2nd ALU input
             rt                 R[rt] ⇒ 2nd ALU input
             Extend             sign ext imm16 (Inst[15:0]) ⇒ 2nd ALU input
             Extend0            zero ext imm16 (Inst[15:0]) ⇒ 2nd ALU input
             ExtShft            2nd ALU input = sign extended imm16 << 2
ALU Dest     rd-ALU             ALUout ⇒ R[rd]
             rt-ALU             ALUout ⇒ R[rt]
             rt-Mem             Mem input ⇒ R[rt]
Memory       Read-PC            Read Memory using the PC for the address
             Read-ALU           Read Memory using the ALUout register for the address
             Write-ALU          Write Memory using the ALUout register for the address
MemReg       IR                 Mem input ⇒ IR
PC Write     ALU                ALU value ⇒ PC
             ALUoutCond         If ALU Zero is true, then ALUout ⇒ PC
Sequence     Seq                Go to next sequential microinstruction
             Fetch              Go to the first microinstruction
             Dispatch           Dispatch using ROM

In class, we made our multicycle machine support the following six MIPS instructions:

op | rs | rt | rd | shamt | funct = MEM[PC]
op | rs | rt | Imm16 = MEM[PC]

INST   Register Transfers
ADDU   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI    R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
LW     R[rt] ← MEM[ R[rs] + sign_ext(Imm16)]; PC ← PC + 4
SW     MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ    if ( R[rs] == R[rt] ) then PC ← PC + 4 + sign_ext(Imm16) || 00
       else PC ← PC + 4
For your reference, here is the microcode for two of the 6 MIPS instructions:

Label     ALU   SRC1  SRC2     ALUDest  Memory  MemReg  PCWrite     Sequence
Fetch     Add   PC    4                 ReadPC  IR      ALU         Seq
Dispatch  Add   PC    ExtShft                                       Dispatch
RType     Func  rs    rt                                            Seq
                               rd-ALU                               Fetch
BEQ       Sub   rs    rt                                ALUoutCond  Fetch

In this problem, we are going to add five new instructions to this data path:
jal <const>               ⇒ PC ← zero_ext(Instr[25:0]) || 00
                            R[31] ← PC + 4
addiu $rt, $rs, <const>   ⇒ R[rt] ← R[rs] + sign_ext(Imm16)
multu $rs, $rt            ⇒ <hi,lo> ← R[rs] × R[rt]
mfhi $rs                  ⇒ R[rs] ← <hi>
mflo $rs                  ⇒ R[rs] ← <lo>

1. The jal instruction is familiar to you from the normal MIPS instruction set.
2. The addiu instruction is also a normal MIPS instruction that has an immediate value
3. The multu instruction is an unsigned multiply, and the mfhi/mflo instructions are for getting
the result.

Problem 4a [10pts]:
Describe/sketch the modifications needed to the datapath for the new instructions. Assume that
the original datapath had only enough functionality to implement the original 6 instructions. Try
to add as little hardware as possible. In particular, you must use the existing ALU for multiply.
You can show additions to the data path – you do not need to completely redraw it. Make
sure that you are very clear about your changes. Hint: you need to add hi and lo registers. They
can be made from shift-registers. Your CPI for multiply should be no more than 40. So: how will
you loop 32 times for multu???
jal
1. Expand the RegW mux to include $r31
2. Expand the BusW mux to include data from the PC register
3. Expand the PC source mux to include the jump address
[Figure: RegW mux selecting among Rt, Rd, and the constant 31; BusW mux selecting among PC, MEM, and ALUout; PCsrc mux with the added input PC[31:28] || ShiftedExtImm]

addiu
No change is needed to implement this instruction

multu $rs, $rt
1. pc <- pc+4
2. ALUout <- pc+ShiftedExImm
3. lo <- R[rt], hi <- 0, A <- R[rs], count <- 32
4. hi <- hi + (lo[0]==0 ? 0 : A); count <- count-1
5. srl ovm||hi||lo, 1
6. hi <- hi + (lo[0]==0 ? 0 : A); count <- count-1
[Figure: hi and lo shift registers with overflow bit OVM and controls WriteHi, WriteLo, and Shift; a HiSrc mux feeding hi; lo loaded from R[rt]; ALUsrcA and ALUsrcB muxes (inputs shown include PC, A, Hi, 0, 4, B, ExImm, and shiftExImm) with Lo[0] as a select input; a Counter with load and dec inputs and a Zero output]

mfhi, mflo
[Figure: RegW mux expanded to include Rs alongside Rt and Rd; BusW mux expanded to include Hi and Lo alongside MEM and ALUout]
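The multu steps amount to a shift-and-add multiply. The following Python sketch models them, assuming that the add and shift steps alternate until the counter reaches zero; the function name and the use of masks for 32-bit registers are illustrative choices.

```python
MASK32 = 0xFFFFFFFF


def multu(rs_value: int, rt_value: int):
    """Unsigned 32x32 multiply using hi/lo shift registers; returns (hi, lo)."""
    a = rs_value & MASK32                    # A  <- R[rs]
    hi, lo, count = 0, rt_value & MASK32, 32  # hi <- 0, lo <- R[rt], count <- 32

    while count > 0:
        # Add step: hi <- hi + (lo[0] == 0 ? 0 : A), keeping the carry in ovm.
        total = hi + (a if lo & 1 else 0)
        ovm, hi = total >> 32, total & MASK32
        count -= 1
        # Shift step: srl ovm||hi||lo, 1
        lo = ((hi & 1) << 31) | (lo >> 1)
        hi = (ovm << 31) | (hi >> 1)
    return hi, lo


hi, lo = multu(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))  # 0xfffffffe 0x1
```

The overflow bit matters: without it, the carry out of the 32-bit add in the example would be lost before the shift.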
Problem 4b [6pts]:
Draw a block diagram of a microcontroller that will support the new instructions (it will be
slightly different than that required for the original instructions). Include sequencing hardware,
the dispatch ROM, the microcode ROM, and decode blocks to turn the fields of the microcode
into control signals. Hint: you will need to provide a branching facility to your sequencer. Make
sure to include this functionality!

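[Figure: microcontroller block diagram. Components shown: microPC, two Adders (labeled 1 and -1), a Mux with a 0 input, a µAddress ROM indexed by Opcode, Select Logic with the "Counter is not zero" input, and the microcode ROM. Output fields: ALUop, SRC1, SRC2, ALUdest, Mem, MemReg, PCWrite, Sequence, Multu (labeled 5, 3, 6, 7, 3, 1, 4, 5, 4 in the figure)]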

Problem 4c [4pts]:
Describe changes to the microinstruction assembly language for these new instructions. How
wide are your microinstructions now?
ALUop: no change (3 bits)
SRC1: 1 bit -> 2 bits (including “0” data)
SRC2: no change (3 bits, but the mux is expanded)
ALUdest: 2 bits -> 3 bits (supporting r31-PC, rs-Hi, rs-Lo)
Mem: no change (2 bits)
MemReg: no change (1 bit)
PCWrite: no change (2 bits, but supports the jump address now)
Sequence: no change (2 bits, but supports “dec”)
Multu: new, 3 bits (supports load, shift, WriteHi, WriteLo, HiSrc)
New microinstruction length: 3+2+3+3+2+1+2+2+3 = 21 bits

Problem 4d [10pts]:
Write complete microcode for the new instructions. Include the Fetch and Dispatch
microinstructions. If any of the microcode for the original instructions must change, explain how.

label     ALU  SRC1     SRC2      ALUdest  Mem     MemReg  PCwrite   Sequence  MULTU
fetch     add  pc       4                  ReadPC  IR      ALU       seq
dispatch  add  pc       shiftext                                     dispatch
jal                               31-pc                    jumpaddr  fetch
addiu     add  rs       extImm                                       seq
                                  rt-ALU                             fetch
multu                                                                          Load, WriteHi, WriteLo, HiSrc=0
          add  0 or rs  hi                                           seq       Count--, WriteHi, HiSrc=1
                                                                     dec       shift
mfhi                              rs-Hi                              fetch
mflo                              rs-Lo                              fetch

Problem 4e [Extra Credit: 5pts]: Describe a (relatively) small change that will allow you to
implement the signed mult instruction (think Booth!). Be complete with your answer
(datapath/microcode, etc).

The microcode will be quite similar to the non-Booth case, but one reset signal is
introduced to reset the 1-bit “prev lo[0]” register to “0”.

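[Figure: hi and lo shift registers (controls WriteHi, WriteLo, Shift) with overflow bit OVM and an added 1-bit “Prev lo[0]” register with a Reset input; lo is loaded from R[rt]; the ALUsrcA mux still offers PC and 0, and a second mux, steered by Lo[1:0], chooses among 0, A, and -A]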
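The idea can be sketched in Python as a 1-bit Booth multiply with an explicit previous-bit register. This is only a behavioral model under stated assumptions: the names are illustrative, and `hi` is kept as an unbounded Python integer, which sidesteps the extra overflow bit that real hardware needs.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def booth_mult(rs_value: int, rt_value: int):
    """Signed 32x32 multiply; returns the 64-bit product as (hi, lo) bit patterns."""
    a = to_signed32(rs_value)
    hi, lo, prev = 0, rt_value & MASK32, 0  # prev is reset to 0 at the start

    for _ in range(32):
        cur = lo & 1
        if (cur, prev) == (1, 0):    # Booth digit -1: subtract A
            hi -= a
        elif (cur, prev) == (0, 1):  # Booth digit +1: add A
            hi += a
        prev = cur
        # Arithmetic shift right of hi||lo by one bit.
        lo = ((hi & 1) << 31) | (lo >> 1)
        hi >>= 1                     # Python's >> on ints preserves the sign
    return hi & MASK32, lo


hi, lo = booth_mult(-3 & MASK32, 5)
print(hex(hi), hex(lo))  # 0xffffffff 0xfffffff1
```

Compared with the unsigned loop, the only new state is the `prev` bit, and the only new ALU operation is the subtraction of A.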

