EE-457 Spring

EE457 Final (~30%)
Closed-book Closed-notes Exam; No cheat sheets;

Calculators are not needed and are not allowed. Verilog Guides are not needed and are not allowed.
Smart phones, tablets (and any kind of computing/Internet devices) are not allowed.
This is a Crowdmark exam. Please do not write on margins or on backside.
Spring 2016
Instructor: Gandhi Puvvada
Saturday, 5/7/2016
10:30 AM - 01:30 PM (3 Hour 00 min. = 180 min)
Location: SGM123
Student’s Last Name: _______________________________________
Student’s First Name: _______________________________________
Student’s DEN Bb username: ______________________________@usc.edu
Ques#   Topic                                  Page#   Time       Points
1       Lab 7 Part 3 modified                  2-5     70 min.    116
2       FIFO and ROB                           6       15 min.    39
3       Branch Prediction                      7       25 min.    48
4       CMP, CMT, Cache Coherency, LL/SC       8-9     35 min.    91
5       Tomasulo                               10-11   20 min.    42
6       Virtual Memory                         11      15 min.    29
Total                                          11      180 min.   365
Perfect Score                                                     340
Viterbi School of Engineering

University of Southern California
May 6, 2016 11:18 am EE457 Final - Spring 2016 1 / 11 C Copyright 2016 Gandhi Puvvada
1 ( 42 + 28 + 46 = 116 points) 70 min. Pipelining (Lab 7 Part 3 modified)
This is a mix of the two past midterm questions (from Fall 2010 and Spring 2011) that you were
asked to go through. Please see the block diagram on the next page. It has five stages, IF, ID, EX1,
EX2, and WB. There is an ADD4 unit in each of the EX1 and EX2 stages. The rest of this page is
reproduced from the Fall 2010 Midterm solution.
The BZ (branch if zero) line was added from the Spring 2011 midterm.
Here, in EX1 also we have an ADD4 unit. No SUB3 at all. One can perform ADD4 in each of the
two stages EX1 and EX2 to do ADD8 as shown below.
Instruction      Operation                   Opcode                      MSD   32-bit instruction in hex
                                             MOV  SUB3/BZ  ADD4  ADD8          (D = Destination, S = Source)
NOP                                          0    0        0     0       0     000000DS
MOV  $R, $X;     ($R) <= ($X)                1    0        0     0       8     800000DS
SUB3 $R, $X;     ($R) <= ($X) - 3            0    1        0     0       4     400000DS
BZ   $X, JJJJ;   (PC) <= JJJJ if ($X) = 0    0    1        0     0       4     4JJJJ0DS
ADD4 $R, $X;     ($R) <= ($X) + 4            0    0        1     0       2     200000DS
ADD8 $R, $X;     ($R) <= ($X) + 8            0    0        0     1       1     100000DS
An ADD4 instruction can now execute from either EX1 or EX2.

ADD4 tries to execute as soon as possible so that he can provide forwarding help to his juniors.
However, he will not insist on executing from EX1, if he himself is dependent on, say, an ADD8
ahead of him. In that case he will skip EX1 (SKIP1 = 1) and execute from EX2. Let us call him
"ADD4_Skpd1" (ADD4 Skipped EX1). Now if the next ADD4 is dependent on this ADD4_Skpd1,
then he will also skip EX1 and will become another ADD4_Skpd1! Now if a dependent ADD8 comes
after him, that ADD8 needs to stall itself, as it cannot get help from this ADD4_Skpd1 in time.
It is important to know whether the ADD4 currently standing in EX2 has already finished execution
in EX1 (and hence may activate SKIP2 here) or skipped execution in EX1 and came here to get
forwarding help and then perform the addition of 4 here in EX2. To this end, Mr. Trojan carried
the SKIP1 signal into EX2 using a FF in the EX1/EX2 stage register. This signal is called
EX2_SKIP1 (SKIP1 carried into EX2).
Do not forget the MOV instruction. Let us call the MOV instruction, a "she". She necessarily skips
both EX1 and EX2 and may receive forwarding help in either EX1 or EX2 or perhaps in both
EX1 and EX2. Circle all correct statements below.
A. There are occasions where she has to receive the needed forwarding help only in EX1.
B. There are occasions where she has to receive the needed forwarding help only in EX2.
C. There is no harm if she receives help in EX1 from EX2 occupied by ADD8 or
ADD4_Skpd1, as this wasteful help will be replaced by correct help in the next clock.
D. Unlike ADD4 or ADD8, she will never regret (feel sorry) for receiving help in EX2.
E. In short, she can receive help in EX1 as well as EX2 without any worry.
You never need to stall (circle all correct answers): a NOP, a MOV, an ADD4, an ADD8
42 pts
[Figure: Q#1, Modified LAB 7 Part 3 Block Diagram. The datapath has five stages (IF, ID, EX1, EX2, WB) separated by stage registers with enables (EN), together with the PC, the I-MEM and the Reg. File (IFRF). It shows HDU_BR and FU_BR in the ID stage, HDU_EX1 and FU_EX1 in the EX1 stage, FU_EX2 in the EX2 stage, an ADD4 unit in each of EX1 and EX2, and the muxes FA_Mux, FB_Mux, XA_Mux, XB_Mux, R1_Mux and R2_Mux. Labeled signals include PCSource, IF_Flush, STALL_ID, STALL_EX1, ID_Bubble, EX1_Bubble, FA_Sel, FB_Sel, FORW_1A, FORW_1B, FORW_2, SKIP1, SKIP2, EX2_SKIP1, EX2_Write, WB_Write, ID_BZ and XD_ZERO.]
Comp Station in the ID Stage:
ID_XMS1 (ID_XMEX1) = ID_XA matched with his S1_RA, where S1_RA = Senior #1 RA (i.e. here EX1_RA)
ID_XMS2 (ID_XMEX2) = ID_XA matched with his S2_RA, where S2_RA = Senior #2 RA (i.e. here EX2_RA)
Comp Station in the EX1 Stage:
EX1_XMS1 (=/= EX1_XMEX1) = EX1_XA matched with his S1_RA, where S1_RA = Senior #1 RA (i.e. here EX2_RA)
EX1_XMS2 (=/= EX1_XMEX2) = EX1_XA matched with his S2_RA, where S2_RA = Senior #2 RA (i.e. here WB_RA)
Notes:
1. Cross-out unneeded comparison units and unneeded comparison signal propagating FFs.
2. Complete the 14 marked items on this page.
14 items: These are 5 EN (enables) for the 5 registers, 5 forwarding paths for the 5 forwarding muxes,
PCSource, IF_Flush, ID_Bubble and EX1_Bubble. Actually you can cross-out either ID_Bubble or EX1_Bubble.
3. Produce the 8 marked items on the next few pages.
8 items: There are 2 HDUs, 3 FUs, 2 Skips and 1 Write control.
The following paragraph is from the Spring 2011 Midterm regarding the BZ instruction.
The BZ (Branch if Zero) instruction uses the opcode previously allocated to the SUB3
instruction. The instructions are 32-bits in size, but the addresses are only 16-bit. PC is 16-bit
wide and is incremented by a "1". The JJJJ in the BZ $X, JJJJ stands for a 16-bit (4-digit
hex) absolute branch address. If the source register $X is a zero then we branch to JJJJ
[ (PC) <= JJJJ if ($X) = 0 ]. The "D" in "4JJJJ0DS" is a random hex digit and should not be treated as
a valid destination, similar to the "DS" in "000000DS" for a NOP instruction. BZ executes from the
ID stage.
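The encoding described above can be sketched as a small decoder. This is only an illustration under stated assumptions: the function name `decode` and the returned dictionary layout are hypothetical, and the field positions are read off the hex patterns in the instruction table (opcode in the most significant digit, JJJJ in the next four digits, D and S in the last two).

```python
# Hypothetical decoder for the 32-bit instruction words of this pipeline.
# Field positions follow the hex patterns 800000DS, 4JJJJ0DS, 200000DS, 100000DS.
OPCODES = {0x0: "NOP", 0x8: "MOV", 0x4: "BZ", 0x2: "ADD4", 0x1: "ADD8"}


def decode(word):
    """Split a 32-bit instruction word into its named fields."""
    msd = (word >> 28) & 0xF                 # most significant hex digit = opcode
    name = OPCODES.get(msd, "UNKNOWN")
    fields = {"op": name}
    if name == "BZ":
        fields["src"] = word & 0xF           # S digit: register tested for zero
        fields["target"] = (word >> 12) & 0xFFFF  # JJJJ: 16-bit absolute address
        # The D digit is random for BZ and is deliberately not reported.
    elif name in ("MOV", "ADD4", "ADD8"):
        fields["dst"] = (word >> 4) & 0xF    # D digit
        fields["src"] = word & 0xF           # S digit
    # For NOP both D and S are random, so neither is reported.
    return fields


print(decode(0x4ABCD012))  # a BZ on register 2 with target 0xABCD
```

Reporting no destination for BZ and NOP mirrors the remark that their "D" digit must not be treated as a valid destination.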
Specifics of this semester’s question:
1. (5 pts) Mr. Trojan says that, unlike in the MIPS 5-stage pipeline, in this 5-stage pipeline you can stall
an instruction such as ADD8 in the EX1 stage instead of in the ID stage. Explain the difference
between the two pipelines. ______________________________________________________
____________________________________________________________________________
____________________________________________________________________________
(8 pts) So, here only the BZ instruction is stalled, if required, in the ID stage. If any other instruction needs a
dependency stall, it is done in the EX1 stage. When a BZ instruction is stalled in the ID stage, you
also stall the _________________. When an ADD8 instruction is stalled in EX1, you also stall the
__________________. A BZ instruction may get stalled for at most _____ (1/2/3) clocks. An ADD8
instruction may get stalled for at most _____ (1/2/3) clocks.
2. Similar to the early branch of our lab 6, we have HDU_BR and FU_BR in the ID stage and
FU_EX1 in the EX1 stage. But, unlike in our lab #6, we have HDU_EX1 in the EX1 stage.
Also there is FU_EX2 in EX2 stage.
3. (8 pts) Notice that we started using more appropriate names for the register match signals.
For example, the earlier ID_XMEX1 is now called ID_XMS1 [ID_XA Matches with the S1_RA
(standing for Senior #1 RA) where Senior #1 is the closest senior]. However, Miss Trojan has
crossed out two of the earlier pipeline FFs in the ID/EX1 stage register (as shown in the diagram on the
previous page) and added a fresh comparison station in EX1. She says that, in this design, the
ID stage comparator station inferences can be used only in the ID stage and should not be carried into the
EX1 stage. Please explain why. __________________________________________________
____________________________________________________________________________
____________________________________________________________________________
(7 pts) The EX1 stage comparisons were, however, carried into the EX2 stage. Cross-out on the diagram
any unneeded comparison units and comparison signals unnecessarily propagated downstream.
In the lab 6 Part 5, we removed 2 comparison units inside the IFRF as they are duplicates. T / F
In this design, we could remove 1 comparison unit inside the IFRF as it is a duplicate. T / F
4. The last MOV instruction in the sequence of the four MOV instructions below receives
forwarding help (though redundantly) _____ (2/3/4/5/6/7) times (including inside the IFRF).
MOV $5, $1;
MOV $5, $2;
MOV $5, $3;
MOV $6, $5;
46
pts
STALL_ID
FA_Sel
FB_Sel
STALL_EX1
SKIP_1
FORW_1A
FORW_1B
FORW_2
SKIP_2
EX2_Write
2 ( 20 + 19 = 39 points) 15 min. The FIFO and the ROB labs
2.1 FIFO (single-clock FIFO):
2.1.1 (6 pts) When we use (n+1)-bit pointers for WP[n:0] and RP[n:0] for a 2**n location FIFO, we use the
________ (lower/upper) n bits, namely _____________ ([n-1:0]/[n:1]), to ___________________
_____________ (index the array of locations/perform WP-RP to arrive at depth/both/neither).
2.1.2 (6 pts) For a 128-location deep FIFO, since the populated depth can vary from completely empty to
completely full, we have ________ (127/128/129) depth values and should be doing __ (a/b/c/d).
(a) (WP-RP) mod-64 or (b) (WP-RP) mod-128 or (c) (WP-RP) mod-256 or (d) other
If multiple choices are possible, state where you would use what. _______________________
____________________________________________________________________________
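The pointer convention in 2.1.1 and 2.1.2 can be tried out with a short behavioral model. This is a minimal sketch, not the lab's Verilog: the class name `Fifo` and its methods are hypothetical, and it assumes the usual scheme in which the pointers carry one bit more than is needed to address the storage, so that the full-width difference distinguishes a full FIFO from an empty one.

```python
class Fifo:
    """Behavioral model of a 2**n-location FIFO with (n+1)-bit pointers."""

    def __init__(self, n):
        self.size = 1 << n                   # number of storage locations
        self.index_mask = self.size - 1      # selects the n address bits
        self.ptr_mask = (1 << (n + 1)) - 1   # pointers wrap modulo 2**(n+1)
        self.mem = [None] * self.size
        self.wp = 0
        self.rp = 0

    def depth(self):
        # Difference taken over all n+1 bits: ranges from 0 to 2**n inclusive.
        return (self.wp - self.rp) & self.ptr_mask

    def empty(self):
        return self.depth() == 0

    def full(self):
        return self.depth() == self.size

    def write(self, item):
        if self.full():
            raise OverflowError("write to a full FIFO")
        self.mem[self.wp & self.index_mask] = item
        self.wp = (self.wp + 1) & self.ptr_mask

    def read(self):
        if self.empty():
            raise IndexError("read from an empty FIFO")
        item = self.mem[self.rp & self.index_mask]
        self.rp = (self.rp + 1) & self.ptr_mask
        return item


fifo = Fifo(7)              # 128 locations, 8-bit pointers
for value in range(128):
    fifo.write(value)
print(fifo.depth(), fifo.full())
```

With `n = 7` the model stores 128 items while the pointers count modulo 256, which is what lets `depth()` report the completely full case separately from the completely empty one.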
2.1.3 (8 pts) If you are using just 4-bit pointers for a 16-location FIFO, you need to set a lower threshold RAE
and an upper threshold RAF. Four of your junior engineers have set the thresholds as shown below.
Comment/correct/praise them.
2.2 ROB: Compare our ROB lab with the Tomasulo part 2 (IoI-OoE-IoC) taught in class:
2.2.1 (6 pts) In Tomasulo, an LS-buffer was provided after the cache as the cache is a ___________ (fixed/variable)
latency unit. Here, the dividers are all ___________ (fixed/variable) latency units.
Once a divider is done, it _______ (A/B).
A. remains in its DONE state until the Issue Unit selects it for transfer to ROB. B. goes to ROB.
2.2.2 (2 pts) Associative search of ROB is conducted in ______ (A/B/C/D)
A. only in the Tomasulo P2, B. only in this ROB lab, C. both, D. neither
2.2.3 (2 pts) WP goes ahead and then RP follows it in the ROB of ______ (A/B/C/D)
A. only in the Tomasulo P2, B. only in this ROB lab, C. both, D. neither
2.2.4 (9 pts) In the ROB diagrams on the side for the 8-location ROB in this
lab, indicate the populated locations by shading them. If you have shaded more than 4 locations,
explain how it is possible as there are only 4 single-dividers. ___________
_________________________________________________
_________________________________________________
[Figure: two 8-location ROB diagrams (locations 0 to 7), each marked with an RP pointer and a WP pointer.]
3 ( 48 points) 25 min. Branch Prediction
3.1 (8 pts) _________ (Early / Late) branch ___________ (is likely to / will) cause more branch penalty.
_________ (Early / Late) branch ___________ (is likely to / will) cause more dependency stalls.
Branch penalty refers to the clocks lost due to __________________ (flushing by/stalling due to/
forwarding to) _________________________ (a taken branch / an untaken branch / any branch).
3.2 (6 pts) Branch direction prediction becomes more important in __________ (deeper / shallow) pipelines.
Branch direction prediction becomes more important in __________ (out-of-order / in-order)
executing pipelines. Branch target address needs to be predicted if branch prediction is done from
the __________ (IF / ID) stage.
3.3 (8 pts) ________ (JAL only / JR$31 only / both / neither) cause changes to the content of the RAS.
________ (JAL only / JR$31 only / both / neither) is helped by the RAS.
RAS stands for _______________________ and may be __________ (4 / 4K) locations deep.
3.4 (6 pts) A 2-bit branch direction predictor is better than a 1-bit predictor. T / F
______________ (However / Similarly) a 3-bit predictor ______________________________
____________________________________________________________________________
____________________________________________________________________________
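The behavior of 1-bit and 2-bit direction predictors on a loop-closing branch can be compared with a small simulation. This is a sketch under assumptions: it models an n-bit saturating up/down counter that predicts taken when the counter is in its upper half, the function name `mispredictions` is hypothetical, and the trace (a branch taken three times and then not taken, repeated four times) is made up for illustration.

```python
def mispredictions(outcomes, bits):
    """Count mispredictions of an n-bit saturating-counter predictor."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)      # predict taken when state >= threshold
    state = max_state                # assumed start: strongest "taken" state
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= threshold
        if predicted_taken != taken:
            misses += 1
        # Count up on a taken branch, down on an untaken one, saturating.
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return misses


# Hypothetical trace: taken, taken, taken, not taken, repeated four times.
trace = ([True] * 3 + [False]) * 4
print(mispredictions(trace, 1), mispredictions(trace, 2))  # prints "7 4"
```

On this trace the 1-bit counter mispredicts both the loop exit and the next loop entry, while the 2-bit counter mispredicts only the exit.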
3.5 (6 pts) No aliasing if you are predicting from _______ (IF / ID) stage, but aliasing is OK if you are
predicting from _______ (IF / ID) stage.
Out of the two below ______________ (A/B/A and B/neither A nor B) can cause a serious drop in performance
A. Predicting a non-branch instruction such as an ADD instruction as a taken branch and correcting later
B. Applying the prediction information of one branch to another branch
3.6 (2 pts) A BPB (Branch Prediction Buffer) with depth = 2**K needs K bits to index it.
The K bits are correctly taken from the PC in the _________ (Left / Right)-side design.
[Figure: two BPB indexing designs, each with a 2**K-entry BPB indexed by K bits of the 32-bit PC. Left: the PC is divided into fields K | 30-K | 00. Right: the PC is divided into fields 30-K | K | 00.]
3.7 (8 pts) Our Lab 6 design has no branch prediction. It is equivalent
to predicting always ______ (taken/not-taken).
If the real dynamic execution trace of most codes shows that
60% of the conditional branches are taken, it appears that we should choose to predict "always
taken" rather than "always not-taken". Mr. Trojan voiced a caution. He said ______
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
3.8 (4 pts) In the (m, n) predictor, _______ (m / n / neither) refers to the size in bits of the global history shift
register. An (m, n) predictor improves prediction accuracy if a branch's behavior correlates with the
past few branches globally. T / F
4 ( 60 + 31 = 91 points) 35 min. CMP, CMT, Cache Coherency, LL/SC
4.1 (7.5 pts) Of the two CMP (Chip Multi Processor) organizations shown below, we discussed the
_________ (Left / Right) design in EE457. If a CROSS BAR of 8x4 is used as the memory interconnection
network in the left design, the L2 cache can be divided into ___________ (8 / 4 / other) banks.
The L1 cache is ____ (A / B) and L2 cache is ____ (A / B). A. private to a core B. shared by all cores.
The L1 cache is a common resource to all threads of a core. T / F.
[Figure: two CMP organizations, each with cores C0, C1, ... C7 and one L1$ per core. Left: the L1 caches connect through a Memory Interconnection Network to a shared (banked) L2$. Right: the L1 caches connect through a Bus to a shared L2 cache (no banks).]
4.1.1 (8.5 pts) Locking the BUS in order to implement an atomic RMW operation on a semaphore is
_____________ (an old / a current) technique. ___________ (Both/Neither/Only LL/Only SC)
requires locking the Bus. During "busy" wait, excessive traffic on the Bus is _____________
(caused / avoided) by spinning around a ____________ (local/global) copy of the lock variable
using LL instruction.
4.2 (16 pts) The SCU(s) in ___________ (one / multiple) core(s) may be driving address and BusRdX on the
Bus while the SCU(s) in ___________ (one / multiple) core(s) may be listening on the bus to invalidate
that block. Division of the cache operation into CCU and SCU is a key feature of a __________
____________ (Blocking/Non-Blocking) cache. In a 4-threaded core, there is/are _____ (1/4)
SCU(s). In an 8-core processor, there is/are _____ (1/8) SCU(s).
In the context of Non-blocking cache, MSHR stands for ____________________________________.
________ (CCU/SCU) leaves a request in MSHR and ________ (CCU/SCU) attends to it.
4.3 (6 pts) In the MPI (Miss rate per instruction) calculations, ____________________ (all / only memory
accessing) instructions are considered. If the L1 cache MPI is 5% and the L1 miss penalty is 10
clocks, and if the L2 cache MPI is 1% and the L2 miss penalty is 200 clocks, what is the overall CPI,
assuming there are no other problems degrading the CPI? ______________________
____________________________________________________________________________
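The arithmetic behind this kind of question can be sketched as follows. The sketch rests on assumptions that the question does not spell out: a base CPI of 1 for the ideal pipeline, both MPI figures counted per instruction executed, and the two miss penalties simply adding up. The function name `overall_cpi` is hypothetical.

```python
def overall_cpi(base_cpi, l1_mpi, l1_penalty, l2_mpi, l2_penalty):
    """Base CPI plus the stall clocks per instruction added by each cache level."""
    l1_stalls = l1_mpi * l1_penalty   # average L1 miss clocks per instruction
    l2_stalls = l2_mpi * l2_penalty   # average L2 miss clocks per instruction
    return base_cpi + l1_stalls + l2_stalls


# Assumed base CPI of 1.0; the other numbers are taken from the question.
cpi = overall_cpi(1.0, 0.05, 10, 0.01, 200)
print(cpi)  # 1 + 0.5 + 2.0
```

Under these assumptions the L2 term dominates even though its miss rate is five times smaller, because its penalty is twenty times larger.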
4.4 (18 pts) Assume a simple in-order pipeline like the 6-stage Oracle Niagara (T1) processor.
Switching between threads on every clock in a fine-grain multi-threaded core wastes 1 or more clocks per switch. T / F
Switching between threads because of a data cache miss in the running thread wastes 1 or more clocks per switch. T / F
Now assume out-of-order execution in SMT (same as HTT in Intel terminology). Here, do you
incur a loss of clocks due to switching threads? ________ (Y / N).
SMT stands for _______________________. HTT stands for ___________________________.
Since threads carry their _______ (thread ID/process ID) with them, they write to their respective
register files in the WB stage. TLB can be common or separate for the threads. T / F
If it is common, it has ASN (Address Space Number). T / F
Operating System _____ (is/isn’t) informed when a thread switch occurs in a multi-threaded core.
4.5 (5 pts) MOESI state encoding: Fill-up the encoding table below for the 5 states.
                 Code, by property
State            Only-Copy bit    Valid bit    Dirty bit
I (Invalid)
S (Shared)
O (Owner)
E (Exclusive)
M (Modified)
4.6 (6 pts) The advantage of the "E" state is _________________________________
____________________________________________________________
Canceled
____________________________________________________________
The advantage of the "O" state is _________________________________
____________________________________________________________
____________________________________________________________
4.7 (10 pts) The word "Flush" in the diagrams below means helping a neighbor L1 cache.
It is _________________ (wrong/wasteful) to flush to the MM also unless needed.
[Figure: two copies of the MOESI state-transition diagram, one labeled EE557 and one labeled EE457, over the states M, O, E, S and I. Edge labels include PrRd/--, PrWr/--, PrWr/BusUpgr, PrWr/BusRdX, PrRd(S)/BusRd, BusRd/Flush, BusRd/--, BusRdX/Flush, BusRdX/-- and BusUpgr/--.]
Figure 5.12 State-transition diagram of a MOESI protocol

Copyright © 2012 Michel Dubois, Murali Annavaram and Per Stenström
Mark appropriate state transitions in the EE457 design with either R/FMM (meaning replacement
causing flush to main memory) or R/-- (meaning replacement causing no flush to main memory).
The R/FMM or R/-- markings for the EE557 design would be identical to EE457 markings. T / F
We ______________ (wish to / do not wish to) defer (postpone) updating the MM as far in time as possible.
An example of this is (narrate a case of transferring responsibility to update the MM to another cache):
__________________________________________________________________________________
__________________________________________________________________________________
__________________________________________________________________________________
4.8 (16 pts) LL stands for Load _________ and SC stands for Store ___________.
In LL $2, 2000($0); $2 is a _______________________ (source / destination / both).
In SC $2, 2000($0); $2 is a _______________________ (source / destination / both).
It is ______________________ (possible / not possible) that a thread in a core executes two LL
instructions with the same address without any intervening SC instruction.
It is ______________________ (possible / not possible) that a thread in a core executes two SC
instructions with the same address without any intervening LL instruction. Successful execution
of an SC $2, 2000($0); by one thread in a core should have broken LL links for the address
2000 first for all threads in other cores when the block was ____________________________
____________________________________________________________________________
and then for all threads in that very core when the SC instruction is ______________________
5 ( 42 points) 20 min. Tomasulo
[Figure: two Tomasulo designs side by side. LHS: IoI - OoE - OoC design. RHS: IoI - OoE - IoC design, i.e. OoO Execution and In-Order Committing with ROB (Re-Order Buffer). Labeled blocks, split into Front-end and Back-end, include I-Cache, I-Fetch Queue, BPB, Dispatch, Reg File, ROB, the Integer, Load/Store, Div and Mult Queues with their Exe Units, Addr Buff, Issue Unit, Cache, LS_Buffer, Load Buffer and CDB.]
5.1 (10 pts) In the LHS (Left-Hand Side) out of order commitment design (IoI-OoE-OoC), the store word
instructions are committed ___________ (in order / out of order).
In this design the register renaming using tokens from the TAG FIFO for the ____________
(source/destination) registers avoids WAW and WAR problems associated with the _________
_____________ (registers only / memory locations only / both). RAW problems among registers
are ___________________ (made to disappear / preserved) by the register renaming.
5.2 (8 pts) The memory disambiguation rules are simpler in the ________ (LHS/RHS/both) design(s)
because the WAW and WAR problems are taken care of by the strict ______________________
__________________________ in the ________ (LHS/RHS/both) design(s).
5.3 (2 pts) Every register write reaches the register file in the ________ (LHS/RHS/both) design(s).
5.4 (10 pts) In his implementation of the RHS design, Mr. Bruin was careless and did not check to see if the
2-bit bypass counter (associated with the load instruction) in the LSQ became FULL when he was letting
a junior SW instruction bypass over the senior LW instructions with matching address. Please
explain to him how this could potentially cause deadlocks in his system. __________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
____________________________________________________________________________
5.5 (12 pts) For an 8-location ROB using 4-bit pointers for WP and RP, some of the four situations below,
namely ________ (#1, #2, #3, #4), is/are illegal. Among the remaining legal situations any one
of them could be occurring before the other(s) as ROB is a circular buffer. T / F .
Circle the populated locations for the legal situations and state the depth.
[Figure: four circular pointer diagrams, #1 to #4, each showing the pointer values 0 to 15 around a ring with an RP marker and a WP marker.]
Depth= _______ Depth= _______ Depth= _______ Depth= _______
6 ( 29 points) 15 min. Virtual Memory
6.1 (6 pts) PTBR stands for _____________________________________________________________.
It is initialized by _________________________ (hardware / operating system) and is utilized by
___________ (MMU / CCU) (i.e. memory management unit or cache control unit) to look up
______________________ (TLB / Page Table / Cache Tag RAM).
6.2 (12 pts) Page Table: Number of A,B,C,D Tables built by the OS:
PQRST in TABLE-I below represents a 20-bit (5-digit hex) VPN in a 4-level page
table with upper 8 bits (PQ) indexing the A-level table, next 4 bits (R)
indexing the B-level tables, next 4 bits (S) indexing the C-level tables,
and the last 4 bits (T) indexing the D-level tables. Suppose the first 8
distinct virtual pages accessed by the application program had the VPNs
as stated in TABLE-I (in sorted order).
TABLE-I
P Q R S T
7 2 6 4 5
7 2 6 4 7
7 3 8 6 5
7 3 8 6 7
7 4 9 6 5
7 5 9 6 5
7 6 9 6 5
7 6 9 7 5
How many tables of what size were built by OS by this time?
A-level: _____________________________________________
B-level: _____________________________________________
C-level: _____________________________________________
D-level: _____________________________________________
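The index split described in 6.2 can be sketched in code. This is an illustration under an assumption the question leaves implicit: tables are built on demand, so a lower-level table exists only for index prefixes that some accessed page actually uses. The names `split_vpn` and `count_tables` are hypothetical; the VPN list is TABLE-I written as 5-digit hex numbers.

```python
# TABLE-I, each row PQRST written as one 5-digit hex VPN.
VPNS = [0x72645, 0x72647, 0x73865, 0x73867,
        0x74965, 0x75965, 0x76965, 0x76975]


def split_vpn(vpn):
    """Return the (A, B, C, D)-level indices of a 20-bit VPN."""
    pq = (vpn >> 12) & 0xFF   # upper 8 bits index the A-level table
    r = (vpn >> 8) & 0xF      # next 4 bits index a B-level table
    s = (vpn >> 4) & 0xF      # next 4 bits index a C-level table
    t = vpn & 0xF             # last 4 bits index a D-level table
    return pq, r, s, t


def count_tables(vpns):
    """Count on-demand tables: one per distinct index prefix at each level."""
    return {
        "A": 1 if vpns else 0,                      # single root table
        "B": len({split_vpn(v)[:1] for v in vpns}), # one per distinct PQ
        "C": len({split_vpn(v)[:2] for v in vpns}), # one per distinct PQR
        "D": len({split_vpn(v)[:3] for v in vpns}), # one per distinct PQRS
    }


print(count_tables(VPNS))
```

An 8-bit index implies 256 entries in the A-level table, and each 4-bit index implies 16 entries in every B-, C- and D-level table.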
6.3 (8 pts) Memory addresses: In a 32-bit virtual address system using 4KB pages,
state any two consecutive 32-bit word addresses (in hex) which do not fall in the same virtual
page. ______________________
I am evicting a page containing the byte with virtual address 2345B789h. What is its virtual page
number (in hex)? __________. What is the range of byte addresses residing in that page (lowest
virtual byte address to highest virtual byte address)? ____________________________________
The physical page frame number in the main memory is 2 (just 2). What is the range of byte
addresses residing in that page (lowest physical byte address to highest physical byte address)?
___________________________________________________________________________
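The address arithmetic for 4KB pages can be checked with a few lines of code. This is a minimal sketch: the helper names `page_number` and `page_range` are hypothetical, and it assumes byte addressing with a 12-bit page offset, which follows from the 4KB page size.

```python
PAGE_OFFSET_BITS = 12                 # 4KB pages: 2**12 bytes per page
PAGE_SIZE = 1 << PAGE_OFFSET_BITS


def page_number(address):
    """Drop the 12-bit page offset to get the page (or frame) number."""
    return address >> PAGE_OFFSET_BITS


def page_range(number):
    """Lowest and highest byte address inside the given page or frame."""
    lowest = number << PAGE_OFFSET_BITS
    return lowest, lowest + PAGE_SIZE - 1


vpn = page_number(0x2345B789)
print(hex(vpn), [hex(a) for a in page_range(vpn)])
print([hex(a) for a in page_range(2)])

# Two word addresses 4 bytes apart that straddle a page boundary.
print(page_number(0x00000FFC), page_number(0x00001000))
```

The same two helpers serve for virtual pages and physical frames, since both use the same 12-bit offset.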
6.4 (3 pts) ________ (VIPT/PIPT) is advantageous over ________ (VIPT/PIPT).
