


COMP273 Page 1

Audio recording started: 1:39 PM



Audio recording started: 1:08 PM September-23-10


The smallest thing you can ask for is a whole register value or a byte. Since the status register is an entire register, you
can't ask for a single bit.

If the status register looks like 10101111 and you are interested in one particular bit, how do you isolate it? AND it with a mask:
10101111 & 00000100
If the result is 0, you know the 3rd bit from the right was a 0; otherwise it
was a 1 and the result is != 0.
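
The masking trick above can be sketched in a few lines of Python. This is only an illustration: the function name bit_is_set and the choice to number bits from 0 at the right are assumptions, not something from the lecture.

```python
# Hypothetical 8-bit status register value, taken from the note above.
STATUS = 0b10101111


def bit_is_set(value: int, position: int) -> bool:
    """Return True if the bit at `position` (0 = rightmost) is 1."""
    mask = 1 << position        # position 2 gives the mask 0b00000100
    return (value & mask) != 0  # a zero result means the bit was 0


print(bit_is_set(STATUS, 2))  # tests the 3rd bit from the right
print(bit_is_set(STATUS, 4))  # tests the 5th bit from the right
```

The whole register is still read; the AND only throws away the bits we are not interested in.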


Audio recording started: 1:12 PM September-28-10



For question 4 of Assignment 1:

1. Cmd looks like this: 10101
2. Cmd looks like this: 01010 etc…

• Answer to the question I had: Send one character, and then send the next character, then send the next

Programmable Combinatorial Parts (PLA, PROM, ROM)

A programmable circuit is like a grid of wires: wherever you connect a crossing point on the grid, anything coming down one wire can
pass through to the other.

Use dots to mark the crossing points you want to connect. Another shorthand where we show:

Can send binary numbers to the machine - input enters decoder, to trigger particular wires


PAL has a programmable AND array and a fixed (non-programmable) OR array

PLA has programmable AND and OR arrays -- i.e. both the left and right sides are programmable

PLA Question: Basically want to make a decoder

PLA: Programmable Logic Array

PAL: Programmable Array Logic
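
A rough way to picture the "PLA as a decoder" question is to model the two arrays as connection tables. The sketch below is an assumption-laden illustration: the table layout, the names AND_PLANE, OR_PLANE and pla, and the 2-to-4 decoder size are all made up for the example.

```python
from typing import List, Optional

# AND array: one row per product term. For each input, 1 means "connect the
# input", 0 means "connect its complement", None means "no dot on the grid".
AND_PLANE: List[List[Optional[int]]] = [
    [0, 0],  # term 0: NOT a1 AND NOT a0
    [0, 1],  # term 1: NOT a1 AND a0
    [1, 0],  # term 2: a1 AND NOT a0
    [1, 1],  # term 3: a1 AND a0
]

# OR array: one row per output, listing which product terms are connected.
OR_PLANE: List[List[int]] = [[0], [1], [2], [3]]


def pla(inputs: List[int]) -> List[int]:
    """Evaluate the PLA for inputs given as [a1, a0]."""
    terms = [
        all(want is None or bit == want for bit, want in zip(inputs, row))
        for row in AND_PLANE
    ]
    return [int(any(terms[t] for t in row)) for row in OR_PLANE]


print(pla([1, 0]))  # input 10 (binary) should raise only output wire 2
```

In a PAL the OR_PLANE table would be fixed by the manufacturer and only AND_PLANE could be changed.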

More Questions:

Assume a temperature is stored on a register….

Micro Architecture (Lecture 8)

Atomic: each instruction does one and only one activity. If it is an add, it adds; it won't subtract.

Processing unit = CPU; its job is to move each instruction from RAM to the control unit
- The control unit's job is to recognize what the instruction is and execute it

RAM(input) holds the program; RAM(output) holds the answer

This is pretty much identical to how computers operate today. No real alternatives exist (quantum computing is
sort of an exception)

Modern Memory Types:

RAM: primary storage Random Access Memory

ROM: Read only Memory

Cache: Represented very similarly to the Von Neumann representation of information: memory in, memory out

Pipeline: Acts as memory but is also part of the CPU (important concept); all modern CPUs are now pipelined CPUs

Basic CPU Architecture & Processing

Classical CPU Design:

From the system bus line, moving RIGHT - CPU
From the system bus line, moving LEFT - Rest of computer

If you don't have a cache and are loading everything from RAM, a clock 'beat' is required to move the instruction
through each element, so it takes a lot of effort to transfer it.

If the instruction is already in the cache, it takes much less time

Key Components:

• Program counter - acts like a pointer into RAM. It contains the address of the current instruction; after the
instruction is executed, the PC goes through a simple adder, 1 is added to it, and it moves to the next instruction


Classical CPU Design


Address   Byte
101       BEQ R3, 200      (* if (R3 == 0) goto 200 *)
100       Add R3, R2, R1   (* R3 = R2 + R1 *)

The Address column connects to the Address Reg.; the Byte column connects to the Data Reg.
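
To make the classical fetch/execute cycle concrete, here is a small Python sketch that runs the two instructions from the RAM picture above. It is a simplification under assumptions: instructions are stored as tuples rather than bytes, and the names run_classical and RAM are invented for the example.

```python
# RAM contents from the notes, keyed by address. The tuple encoding is an
# assumption made for readability; a real machine stores bit patterns.
RAM = {
    100: ("ADD", 3, 2, 1),  # R3 = R2 + R1
    101: ("BEQ", 3, 200),   # if (R3 == 0) goto 200
}


def run_classical(ram, regs, pc, steps):
    """Run `steps` fetch/execute cycles and return the final PC."""
    for _ in range(steps):
        address_reg = pc             # PC value goes to the address register
        data_reg = ram[address_reg]  # RAM answers through the data register
        pc += 1                      # simple adder: point at the next instruction
        op = data_reg[0]
        if op == "ADD":
            _, dest, src1, src2 = data_reg
            regs[dest] = regs[src1] + regs[src2]
        elif op == "BEQ":
            _, reg, target = data_reg
            if regs[reg] == 0:
                pc = target          # branch taken: overwrite the PC
        else:
            raise ValueError(f"unknown instruction {op!r}")
    return pc


regs = {1: 2, 2: 3, 3: 0}
print(run_classical(RAM, regs, pc=100, steps=2), regs[3])
```

With R1 = 2 and R2 = 3 the branch is not taken; if both were 0, R3 would be 0 and the PC would jump to 200.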


Audio recording started: 1:25 PM


Micro Architecture Part 2

- Assignment 2 Out
- Midterm exam postponed to October 19th tentative

If the exam is held later, it will still only cover material NOT related to programming (even though we may have started programming in MIPS assembler by then).

Pipeline Optimized Architecture

Classic CPU Architecture VS Pipeline Optimized

- The pipeline, instead of being spread out, is laid out in a sort of straight line. Like a car assembly conveyor belt, it lets you accomplish more things at once
- The pipeline has 2 caches, one for instructions and one for data, which improves the efficiency of the pipeline
- The pipeline's linear activity allows more than one instruction to execute at once

- The program counter contains the address of the next instruction to execute
- Instruction memory holds the instructions themselves; the program counter supplies the address to read

RAM --> Cache I --> Pipeline --> Cache Data

- As soon as we can load our code into the cache, we can take advantage of the pipeline
- Registers are organized sort of like RAM: an integer number enters and selects which register you want to use (similar to the A1, Q2 RAM circuit)
○ Can select up to 3 registers at once; if you need 4 registers, you have to find a different way of implementing it, since only up to 3 registers MAX are wired
○ Can ask for 3 registers, but can only get 2 register values out of the box (the third is the one written to)
- Register values move directly to the ALU, or can skip the ALU if it is not needed and go directly to the data memory cache
- Otherwise, two wires from the registers go into the ALU and the answer is output to the data memory cache
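
The "ask for 3, get 2 out" behaviour of the register box can be sketched as a tiny class with two read outputs and one write input. The class and method names are assumptions for illustration, and details of real MIPS hardware (such as register 0 being hard-wired to zero) are deliberately not modeled.

```python
class RegisterFile:
    """Toy register file: two values read out, one value written in."""

    def __init__(self, count: int = 32):
        self.regs = [0] * count

    def read(self, read1: int, read2: int):
        # Two register numbers go in, two register values come out.
        return self.regs[read1], self.regs[read2]

    def write(self, dest: int, value: int, reg_write: bool = True):
        # The third register number selects where a result is stored;
        # nothing happens unless the write signal is on.
        if reg_write:
            self.regs[dest] = value


rf = RegisterFile()
rf.write(5, 10)
rf.write(2, 7)
a, b = rf.read(5, 2)  # the two wires that feed the ALU
rf.write(8, a + b)    # ALU answer written to the third register
print(rf.regs[8])
```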

Should be able to create the high-level version of the diagrams. PIPELINE, CLASSICAL, HOW TO CODE IN MIPS -- what this course
is about.

Pipeline Architecture
Organize the pipeline into 4 different stages, and ensure there are boundaries between them (no other wires interfere with the process)
- Imagine a bunch of AND gates that hold the result once fetch has completed and stop further activity
- Similar for LOAD, ALU, STORE -- they're all segregated into these particular steps.

If truly separated, this means you could have 4 different instructions inside the CPU executing at the same time, each at a different stage.

ADD Y is fetched from instruction memory at the address in the PC and held by the AND gates; the clock ticks, the AND gates open, and ADD Y passes through to LOAD
ADD Y, 5, 2 <<== 5, 2 are basically saying that at the LOAD stage, "give me the contents of register 5 and register 2"
The contents of registers 5 and 2 come out of the registers and the ADD Y instruction tells the ALU it has to add
The answer comes out of the ALU and is stored in Data Memory at address Y

While this is happening, other instructions are being loaded and held back by the AND gates separating each section.
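
The overlap described above can be visualized with a small Python sketch that prints which instruction sits in which stage on every clock tick. The stage names come from the notes; the function name pipeline_timeline and the placeholder instructions I2..I4 are assumptions, and hazards or stalls are ignored.

```python
STAGES = ["FETCH", "LOAD", "ALU", "STORE"]


def pipeline_timeline(program):
    """Return, for each clock tick, which instruction occupies each stage."""
    timeline = []
    total_ticks = len(program) + len(STAGES) - 1
    for tick in range(total_ticks):
        row = {}
        for s, stage in enumerate(STAGES):
            i = tick - s  # instruction i entered FETCH at tick i
            row[stage] = program[i] if 0 <= i < len(program) else None
        timeline.append(row)
    return timeline


# "I2".."I4" are placeholder instructions following ADD Y,5,2.
for tick, row in enumerate(pipeline_timeline(["ADD Y,5,2", "I2", "I3", "I4"])):
    print(tick, row)
```

On the fourth tick all four stages are busy at once, which is the situation the notes describe as 4 instructions executing at different stages.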

Key improvement on Classical

ADD y, 10, 2
- One tick for load
- Get 10 to ALU
- Get 2 to ALU
- Add
- Y data saved

Therefore with the classical architecture it would take 4-5 ticks to get the result stored, whereas the pipelining method allows other
instructions to execute simultaneously, processing ~10 instructions
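
As a back-of-the-envelope comparison, the tick counts can be written as two small functions. This assumes an idealized model that is not stated in the lecture: every classical instruction takes a fixed 5 ticks, and the 4-stage pipeline never stalls, so it needs one fill period plus one tick per additional instruction.

```python
def classical_ticks(n: int, ticks_per_instruction: int = 5) -> int:
    """Classical CPU: each instruction finishes before the next one starts."""
    return n * ticks_per_instruction


def pipelined_ticks(n: int, stages: int = 4) -> int:
    """Ideal pipeline: fill the stages once, then finish one instruction per tick."""
    return stages + n - 1 if n > 0 else 0


for n in (1, 10, 100):
    print(n, classical_ticks(n), pipelined_ticks(n))
```

For a single instruction the pipeline gives almost no benefit; the gain shows up only when many instructions flow through back to back.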

Drawback of Pipeline: All instructions execute in the same number of ticks (everything needs to go through the whole pipeline even if it doesn't need to)
I.e. a car has to sit in production waiting for 'fancy decals' that it won't actually get, until it can move to the next station.
- But the benefit far outweighs the drawback

Timing issues:
BREQ R0 ==> if R0 = 0…
Instructions can depend on the answers of previous instructions
- TIMING FAULT: all instructions in the pipeline are cancelled ---> BAD; all advantage is lost, and everything has to be re-loaded

PC --> RAM --> IR (instruction register) --> CU --- does all the work <<CLASSICAL MODEL>>

PC --> CACHE --> IR1 (not shown in slide 4 diagram...this is how we hold the instruction)--> Registers ((and IR1 --> another IR2)) <<PIPELINE MODEL>>


EX. ADD Y,5,2 --> IR1 takes the instruction; it must then be passed on to IR2 so that it can be used in the ALU stage, and
the adding portion must be in IR3 so that the result can be stored in the data cache/memory

- Instead of having a control unit (CU) at the end as with the Classical CPU, there are typically CUs at every stage in the Pipeline CPU

Fetch Portion of the CPU

R-Format Instruction: Instruction that needs to use the ALU

+4: each instruction is 32 bits = 4 bytes long, so the PC increments by 4 for every instruction

Load Portion of CPU

Can split wires to have different instructions moving different places

Add OP1/S1/S2/D <<== S1 and S2 hold variables since adding variable to variable

Addi OP2/S1/CONSTANT/D <<== doesn't need second S2 because the S2 equivalent is stored as a constant since var + const.
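
Splitting the wires of an instruction into fields can be mimicked with shifts and masks. The sketch below uses the standard MIPS field widths (6-bit opcode, 5-bit register numbers, 16-bit constant); note that in the actual encoding the destination field of an I-format instruction sits before the constant. The function name decode is an assumption, and treating every non-zero opcode as I-format is a simplification that ignores jumps.

```python
def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields."""
    opcode = (word >> 26) & 0x3F  # top 6 bits
    rs = (word >> 21) & 0x1F      # first source register (S1)
    rt = (word >> 16) & 0x1F      # second source (S2), or destination for addi
    if opcode == 0:               # R-format: register-to-register ALU work
        return {
            "format": "R", "opcode": opcode, "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,    # destination register (D)
            "shamt": (word >> 6) & 0x1F,  # shift amount
            "funct": word & 0x3F,         # the 'mini OP code'
        }
    # Simplification: anything else is treated as I-format here.
    return {"format": "I", "opcode": opcode, "rs": rs, "rt": rt,
            "imm": word & 0xFFFF}         # 16-bit constant


print(decode(0x012A4020))  # add  $t0, $t1, $t2
print(decode(0x21280005))  # addi $t0, $t1, 5
```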

Sign extend: Takes 16 bits and turns them into 32 bits

- Assume the constant is signed; then the MSB holds the sign (0 if positive, 1 if negative)
- If it is a negative 2's complement number, it will have a whole bunch of ones and then the actual number
- If it's a positive number, it will have a bunch of zeros and then the actual number
- Can take all 16 bits (0 - 15), then take bit 15 (the MSB) and fan it out onto a bunch of wires for bits 16 - 31, since the upper half will be
one of two patterns (all zeros or all ones) for a two's complement number (and then you have bits 0 - 31)
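
The wiring trick for sign extension translates directly into a few bit operations. This is a minimal sketch; the function name sign_extend_16 is an assumption, and values are handled as unsigned 32-bit patterns.

```python
def sign_extend_16(value: int) -> int:
    """Extend a 16-bit two's complement pattern to a 32-bit pattern."""
    value &= 0xFFFF                 # keep only bits 0-15
    sign = (value >> 15) & 1        # bit 15 is the sign bit
    upper = 0xFFFF if sign else 0   # bit 15 copied onto wires 16-31
    return (upper << 16) | value


print(hex(sign_extend_16(0x0005)))  # positive: upper half all zeros
print(hex(sign_extend_16(0xFFFB)))  # negative (-5): upper half all ones
```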

ALU Portion of CPU

- Can address all of RAM with 32 bits; if you only have 16 bits you can only jump so far in RAM

- To get to a particular address, the instruction stores how many instructions away it is (positively or negatively); that count multiplied by 4, plus the PC, is the
address -- lets you get 4 times farther than the 16 bits alone would allow you to 'travel'
- Shift left 2: every time you shift the bits over by one place you multiply by 2 (if you shift twice, you multiply by four!)
○ 0001 - 1
○ 0010 - 2 SHIFTING BY 2
○ 0100 - 4

Multiplies by 4!!
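
Putting the shift and the addition together gives the branch target calculation. The sketch below follows the usual MIPS convention that the offset is added to the already incremented PC (PC + 4), which is slightly more specific than the notes; the function name branch_target is an assumption.

```python
def branch_target(pc: int, offset16: int) -> int:
    """Compute a MIPS-style branch target from the PC and a 16-bit offset."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:  # negative in two's complement
        offset -= 0x10000
    # Shift left by 2 = multiply by 4: the offset counts instructions, not bytes.
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 3 instructions forward
print(hex(branch_target(0x00400010, 0xFFFD)))  # 3 instructions backward
```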

Delayed Branching

- Branching is a delayed activity -- the work happens just in case it will be used; the machine doesn't know whether
BEQZ R0, … will be true at the branching stage; it must assume it is true "just in case"

Single Clock Cycle: Load/Store/ALU/B… ? Slide 17

Mux = multiplexer

Micro Architecture Part 2 cont'd

If you're having trouble understanding anything here, read the textbook! Slides are actually scanned from textbook, so very similar.

Final Summary Layout

Data memory should be thought of as very similar to Instruction Memory, which in turn is similar to what we did for
Assignments 1 and 2.

Can't have a simple PC that just loops, because there might be other things connected to it. Therefore, once the ALU adds 4 to the PC, the result
moves through the other elements

In MIPS, there are two control units.

The ALU control is driven by 6 bits (bits 5-0) that come out of the low half of the instruction (bits 15-0); it then controls the ALU
- How do you get bits 5-0 out of 15-0? The OP code specifies how the bits are used - so the instruction could carry an address or constant in those bits, but could also carry
some sort of OP-based instruction
- Good because it lets you re-use the wires, and just take out the bits that are necessary.

MIPS does OP codes in a special way to optimize how the pipeline works.

'Control' control unit shows the wires 'coming out'

- RegWrite: Register write - signals that we want to write something to the destination register along that wire

MIPS 4 Instruction Classes

- Every instruction has its own unique OP code

- OP codes are classed

R Type is a popular class (add, sub, and, or, slt)

- All have same OP code: OP code 0

R-type | 0 | |mini OP code|

- When OP code 0 is sent to the Control Unit (CU), it tells it that the instruction needs to use the ALU
- Information on how we want to use the ALU is stored in the 'back' of the instruction
- The mini OP code goes directly to the ALU control unit
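
How the mini OP code selects the ALU operation can be sketched as a lookup on the low 6 bits of the instruction. The funct values below are the standard MIPS ones for add, sub, and, or and slt; the function names are assumptions, and overflow trapping on add/sub is not modeled (results simply wrap to 32 bits).

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(funct: int, a: int, b: int) -> int:
    """Pick the ALU operation from the R-type funct field."""
    if funct == 0x20:    # add
        result = a + b
    elif funct == 0x22:  # sub
        result = a - b
    elif funct == 0x24:  # and
        result = a & b
    elif funct == 0x25:  # or
        result = a | b
    elif funct == 0x2A:  # slt: set to 1 if a < b (signed), else 0
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"funct {funct:#x} not handled in this sketch")
    return result & MASK32


print(alu(0x20, 2, 3), alu(0x2A, 0xFFFFFFFF, 1))
```

All five instructions share OP code 0, so only the funct value changes what the ALU does.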

Load & Store Class:

- Specifically take stuff out of RAM (or put it back) - special because they reach outside of the pipeline
- Instructions that should be avoided where possible, because as soon as you use them you break the flow of the pipeline
- Base = PC; wherever you are currently in program counter
- Good to declare a function like:

fn ()
/*declare variables*/

return whatever;

Branch Instructions:

Jump Instruction:
- OP Code 2
- Can jump farther than other operations

Remember that the CPU can't operate on RAM directly - it can only operate on registers. So data must be moved into registers before it can be used.

R-Type Instruction Processing

Inactive MIPS CPU

R-type wakes up and says it's ready to go…

1) FETCH: The address held in the PC goes into the Read Address input, and also goes up to the ADD ALU and the MUX.
It only continues back around to the PC when the MUX produces a 1
2) LOAD: The 32 wires are split into separate paths - wires go different places; everything stops at those places, waiting for signals to
come from other elements of the CPU; there is no control for reading information - as soon as the info goes in, it is read in corresponding

R2000 - 210000 MIPS

- Power PC
- Simple --> can optimize

- Complex --> mini optimize
- Advantage: arraysum dest, ptr, cellcount => would sum the entire array REALLY fast; can't do in RISC without building as a program

The Control Unit

See slides for definitions

Microprogramming: ADD has a set of wires that carries out that activity (that's the micro program)

Flat: another word for 'Classical' CPU

Slide 5
- The 'register opcode field' is the opcode that's coming in from the opcode register/instruction register

Add dest, source1, source2
Really does:
Get s1 & s2
Add s1 with s2
Dest. = answer

3 Separate steps, but only one opcode - so how is it accomplished?

What happens:
- Instruction register contains OPCODE
□ Opcode comes in and extra wires are added to it within the control unit

For each instruction, AND gates can be arranged to determine which opcode has been given.
In a way, it's like a huge case statement where each case is a different opcode
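
The "one opcode, several steps" idea can be sketched as a table that maps each opcode to its sequence of micro-steps, with a small sequencer walking through them. Everything here is a made-up illustration: the step names, the MICROPROGRAM table and the execute function are assumptions, not the actual MIPS control logic.

```python
# Each opcode expands into an ordered list of micro-steps (the 'case statement').
MICROPROGRAM = {
    "ADD": ["read_sources", "alu_add", "write_dest"],
    "SUB": ["read_sources", "alu_sub", "write_dest"],
}


def execute(instruction, regs):
    """Run one instruction step by step and return the micro-steps performed."""
    opcode, dest, s1, s2 = instruction
    a = b = result = 0
    trace = []
    for step in MICROPROGRAM[opcode]:
        if step == "read_sources":  # Get s1 & s2
            a, b = regs[s1], regs[s2]
        elif step == "alu_add":     # Add s1 with s2
            result = a + b
        elif step == "alu_sub":
            result = a - b
        elif step == "write_dest":  # Dest. = answer
            regs[dest] = result
        trace.append(step)
    return trace


regs = {1: 4, 2: 6, 3: 0}
print(execute(("ADD", 3, 1, 2), regs), regs[3])
```

Both entries have the same number of steps, which matches the pipeline requirement that every instruction takes the same number of steps.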

In a pipeline machine, all instructions have the same number of steps - so you would construct the instructions in such a way
that they always take the same number of steps.

The only difference between a FLAT and a PIPELINE sequencer is that a FLAT one has everything in it - big long

PIPELINE (SLIDE 8) can have many sequencers; MIPS has 2.
