
15CSE301: Computer Organization and Architecture

Amritha School of Engineering

July 12, 2019


Course Contents

Outline
Introduction Computer Organization
Instruction Set Architecture (ISA)
MIPS Architecture & Instruction Set
Introduction to SPIM
Data & Control Path Design
Floating point arithmetic and Data Path
Data Path for Arith/LD/ST/CTR instruction
Control Path for Single Cycle CPU
Multi Cycle CPU Design
Pipelining
Hazards; Data, Control and Structural
Branch Prediction; Static and Dynamic
Memory Hierarchy Design
Cache Memory Organization
Main Memory Interleaving
Books and References
I David A Patterson and John L Hennessy, Computer
Organization and Design, Fifth Edition. Morgan Kaufmann,
2013.
II John L Hennessy and David A Patterson, Computer
Architecture: A Quantitative Approach, Fourth Edition.
Elsevier, 2003.
III Processor Design, Narosa.

Online
https://www.edx.org/course/computation-structures-2-
computer-architecture
https://www.edx.org/course/computation-structures-3-
computer-architecture
https://nptel.ac.in/courses/106106092/18
Evaluation

Internal Evaluation
Two written tests- 25 marks
Written Assignments-10 marks
Programming Assignments- 10 Marks
Quiz, Viva, Class Performance- 5 Marks

End Semester Exam


Written Test (Closed Book)- 50 Marks
Organization of a Computer I

Five Components
Five classic components of a computer input, output,
memory, datapath, and control
datapath + control = processor
Processor + Memory + I/O Devices = computer
Organization of a Computer II

Components
Components:
input (mouse, keyboard, camera, microphone...)
output (display, printer, speakers....)
memory (caches, DRAM, SRAM, hard disk drives, Flash....)
network (both input and output)
Our primary focus: the processor (datapath and control)
implemented using billions of transistors
Impossible to understand by looking at each transistor
We need...abstraction!
Organization of a Computer III

Abstraction in Computer
Abstraction is the act of representing essential features without
including the background details or explanations
Each of the following abstracts everything below it:
Applications software
Systems software
Assembly Language
Machine Language
Architectural Approaches: Caches, Virtual Memory, Pipelining
Sequential logic, finite state machines
Combinational logic, arithmetic circuits
Boolean logic, 1s and 0s
Transistors used to build logic gates (e.g. CMOS)
Semiconductors/Silicon used to build transistors
Properties of atoms, electrons, and quantum dynamics
Notice how abstraction hides the detail of lower levels, yet
gives a useful view for a given purpose
Organization of a Computer IV
Assembly-, Machine-, and High-Level Languages

Hierarchy of Languages
Assembly and Machine Language

Languages
Machine language
Native to a processor: executed directly by hardware
Instructions consist of binary code: 1s and 0s
Assembly language
Slightly higher-level language
Readability of instructions is better than machine language
One-to-one correspondence with machine language instructions
Assemblers translate assembly to machine code
Compilers translate high-level programs to machine code
Either directly, or Indirectly via an assembler
Compiler and Assembler

Translation Process
Compiler and Assembler

Translating Language
Instruction Set Architecture I

ISA
What is Computer Architecture?
Computer Architecture = Instruction Set Architecture + Machine Organization
Instruction Set Architecture II

ISA
A very important abstraction
interface between hardware and low-level software
standardizes instructions, machine language bit patterns, etc.
advantage: different implementations of the same architecture
disadvantage: sometimes prevents using new innovations
Common instruction set architectures:
IA-64, IA-32, PowerPC, MIPS, SPARC, ARM, and others
All are multi-sourced, with different implementations for the
same ISA
Performance Measures

Basics
Performance is determined by execution time
Do any of these other variables equal performance?
# of cycles to execute program?
# of instructions in program?
# of cycles per second?
average # of cycles per instruction?
average # of instructions per second?
Performance Measures

Metrics
Clock cycle time = 1 / clock speed
CPU time = clock cycle time x cycles per instruction x
number of instructions
Influencing factors for each:
clock cycle time: technology and pipeline
CPI: architecture and instruction set design
instruction count: instruction set design and compiler
CPI (cycles per instruction) or IPC (instructions per cycle)
cannot be accurately estimated analytically

Execution Time
CPU execution time = Instruction count × average CPI × Clock
cycle time
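The execution time equation translates directly into a small calculator. A minimal Python sketch (the function name and example numbers are illustrative, not from the slides):

```python
def cpu_time(instruction_count, avg_cpi, clock_rate_hz):
    """CPU execution time = IC x average CPI x clock cycle time,
    where clock cycle time = 1 / clock rate."""
    cycle_time = 1.0 / clock_rate_hz
    return instruction_count * avg_cpi * cycle_time

# Example: 1 million instructions, average CPI of 2, 1 GHz clock -> 2 ms
t = cpu_time(1_000_000, 2.0, 1e9)
```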
Performance Metrics

Speedup
Speedup is a ratio = old exec time / new exec time
Improvement, Increase, Decrease usually refer to percentage
relative to the baseline
= (new perf − old perf) / old perf

Example
A program ran in 100 seconds on my old laptop and in 70
seconds on my new laptop
What is the speedup?
(1/70) / (1/100) ≈ 1.43
What is the percentage increase in performance?
( 1/70 − 1/100 ) / (1/100) ≈ 43%
What is the reduction in execution time?
30%
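The three answers in this example can be checked numerically. A minimal Python sketch (variable names are mine):

```python
old_time, new_time = 100.0, 70.0   # seconds, from the laptop example

speedup = old_time / new_time                              # ratio of execution times
perf_increase = (1/new_time - 1/old_time) / (1/old_time)   # relative performance gain
time_reduction = (old_time - new_time) / old_time          # fraction of time saved
```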
CPU Performance: Problem

Example
My new laptop has an IPC that is 20% worse than my old
laptop. It has a clock speed that is 30% higher than the old
laptop. I'm running the same binaries on both machines.
What speedup is my new laptop providing?
Solution
Exec time = cycle time * CPI * instrs
Perf = clock speed * IPC / instrs
Speedup = new perf / old perf
= new clock speed * new IPC / old clock speed * old IPC
= 1.3 * 0.8 = 1.04
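Because the binaries are identical, the instruction count cancels and the speedup is just the product of the two ratios. A quick check in Python (variable names are mine):

```python
ipc_ratio = 0.8     # new IPC is 20% worse than the old one
clock_ratio = 1.3   # new clock speed is 30% higher

# Perf = clock speed * IPC / instrs; with identical instruction counts
# the instrs term cancels in the ratio:
speedup = clock_ratio * ipc_ratio   # 1.04
```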
Power-Energy of CPU

Problem
If processor A consumes 1.4x the power of processor B, but
finishes the task in 20% less time, which processor would you
pick:
1 if you were constrained by power delivery constraints?
2 if you were trying to minimize energy per operation?
3 if you were trying to minimize response times?
Solution:
1 Proc-B
2 Proc-A is 1.4x0.8 = 1.12 times the energy of Proc-B
3 Proc-A is faster, but we could scale up the frequency (and
power) of Proc-B and match Proc-A's response time (while still
doing better in terms of power and energy)
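The energy comparison in part 2 follows from energy = power × time. A sketch with the relative numbers from the problem, normalized to Proc-B:

```python
# Proc-B normalized to power = 1 and time = 1
power_a = 1.4   # Proc-A consumes 1.4x the power of Proc-B
time_a = 0.8    # Proc-A finishes the task in 20% less time

energy_a = power_a * time_a   # relative energy: 1.12x that of Proc-B
```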
Problem

Execution Time
1 Given the following parameters, answer each of the following two

questions for the program P1 on machine M1


Instruction Count = 26,395, Average CPI = 2.17, Clock Rate = 3.24
GHz
1 Calculate the execution time ∆texec for P1 on M1 (show all work):
2 If M1 runs program P2 2.9 times faster than P1 with clock rate and
CPI remaining the same as above, then what variable in the
performance equation changed, and by how much?

Solution
1 ∆texec = IC . CPI . ∆tclock = IC . CPI . (clock rate)−1
= 26,395 instr . 2.17 cycles/instr . (3.24 Gcycles/sec)−1
≈ 17,678 . 10−9 sec ≈ 17.7 µs
2 If CPI and clock rate are unchanged, then the remaining variable is
IC. For P2 to run 2.9 times faster, the runtime must be 1 / 2.9 ≈
0.345 that of P1. This means that IC of P2 must be 0.345 times the
IC of P1, so we have IC(P2) ≈ 0.345 . IC(P1) = 0.345 . 26,395 ≈ 9,102
instructions
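Both parts of the solution can be verified numerically. A Python sketch using the given parameters (variable names are mine):

```python
ic, avg_cpi, clock_rate = 26_395, 2.17, 3.24e9   # given for P1 on M1

t_exec = ic * avg_cpi / clock_rate   # part 1: execution time in seconds (~17.7 us)
ic_p2 = ic / 2.9                     # part 2: instruction count of P2 (~9,102)
```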
Amdahl’s Law

Problem
Suppose a program runs in 100 seconds on a computer, with multiply
operations responsible for 80 seconds of this time. How much do I
have to improve the speed of multiplication if I want my program
to run 2 times faster?
Solution
Execution time after improvement =
(Execution time affected by improvement / Amount of improvement)
+ Execution time unaffected
Let n be the amount of improvement:
50 = 80/n + (100 − 80)
n ≈ 2.67
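Solving the same equation for n in Python (a quick sanity check; variable names are mine):

```python
total_time = 100.0        # seconds, whole program
affected = 80.0           # seconds spent in multiply operations
target = total_time / 2   # we want the program to run 2x faster

# target = affected/n + (total_time - affected)  =>  solve for n
n = affected / (target - (total_time - affected))   # required multiply speedup
```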
Amdahl’s Law: Speedup Calculation

SpeedUp
Execution time after improvement =
(Execution time affected by improvement / Amount of improvement)
+ Execution time unaffected

Speedup = Execution time before / Execution time after
= Execution time before /
(Execution time before − Execution time affected
+ Execution time affected / Amount of improvement)

Speedup = 1 / (1 − Fraction of time affected
+ Fraction of time affected / Amount of improvement)

Speedup = 1 / (1 − FractionEnhanced + FractionEnhanced / SpeedupEnhanced)
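The final form of Amdahl's Law is easy to wrap in a helper function (a minimal Python sketch; the function name is mine):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of execution time is sped up
    by a given factor, per Amdahl's Law."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# e.g. 40% of time sped up 10x gives an overall speedup of about 1.56
```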
Amdahl’s Law: Speedup Calculation

Problem 1
A new processor is 10 times faster on serving a web application
than the old one. Assume original processor is busy in 40% of
computation and waiting I/O in 60%. What is the overall speed?

Solution
FractionEnhanced = 0.4, SpeedupEnhanced = 10
Speedup = 1 / (1 − 0.4 + 0.4/10) ≈ 1.56
Amdahl’s Law

Problem 2
Bob is given the job to write a program that will get a speedup of 3.8
on 4 processors. He makes it 95% parallel, and goes home dreaming
of a big pay raise. Using Amdahl's Law, and assuming the problem
size is the same as the serial version, and ignoring communication
costs, what speedup will Bob actually get?

Solution
FractionEnhanced = 0.95, SpeedupEnhanced = 4
Speedup = 1 / (1 − 0.95 + 0.95/4) ≈ 3.48
Amdahl’s Law

Problem 3
Mary has a problem whose size can increase with an increasing num-
ber of processors. She executes the program and determines that in
a parallel execution on 100 processors, 5% of the time is spent in
the sequential part of the program. What is the scaled speedup of
the program on 100 processors?

Solution
FractionEnhanced = 0.95, SpeedupEnhanced = 100
Speedup = 1 / (1 − 0.95 + 0.95/100) ≈ 16.8
Average CPI

Calculation
Performance Equation:
CPU Time = Cycle time x Instruction Count x Average CPI
Assuming n different type of instructions, each with count ICi
and requiring CPIi cycles:
CPU Time = Cycle time x Σni=1 (ICi x CPIi )
Then:
Average CPI = Σni=1 (ICi x CPIi ) / IC = Σni=1 CPIi x Fi ,
where Fi is the frequency of instruction type i
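The weighted-sum form of average CPI translates directly to code. A Python sketch (function and variable names are mine):

```python
def average_cpi(cpis, fractions):
    """Average CPI = sum over instruction types of CPI_i * F_i,
    where F_i is the fraction of instructions of type i."""
    assert abs(sum(fractions) - 1.0) < 1e-9, "fractions must sum to 1"
    return sum(c * f for c, f in zip(cpis, fractions))

# Four instruction types with CPIs 1.7, 2.1, 2.7, 2.4 and a
# 22/29/17/32 percent instruction mix give an average CPI of 2.21
avg = average_cpi([1.7, 2.1, 2.7, 2.4], [0.22, 0.29, 0.17, 0.32])
```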
Performance with multiple type instructions

Problem
A computer M has the following CPIs for instruction types A
thru D, and a program P has the following mix of instructions
(Note: pct = percent):
M: Type A CPI (A) = 1.7, Type B CPI (B) = 2.1, Type C CPI (C) = 2.7, Type D CPI (D) = 2.4
P: Type A = 22 pct, Type B = 29 pct, Type C = 17 pct, Type D = remaining pct (32 pct)
Calculate the average CPI for P on machine M:
CPI = Σni=1 CPIi x Fi
= 1.7(0.22) + 2.1(0.29) + 2.7(0.17) + 2.4(0.32)
= 2.21
Calculate the runtime of P on M if IC = 22,311 and clock rate
is 3.3 GHz:
CPU time = IC . CPI . (clock rate) −1
= 22,311 . 2.21 . (3.3 Gcycles/sec) −1
≈ 14.9 µs
MIPS Architecture

MIPS Basics
MIPS: Microprocessor
without Interlocked Pipeline
Stages. We'll be working with
the MIPS instruction set
architecture
similar to other
architectures developed
since the 1980’s
Almost 100 million MIPS
processors manufactured in
2002
used by NEC, Nintendo,
Cisco, Silicon Graphics,
Sony,
MIPS Design Principles

Simplicity Favors Regularity


Keep all instructions a single size
Always require three register operands in arithmetic
instructions
Smaller is Faster
Has only 32 registers rather than many more
Good Design Makes Good Compromises
Compromise between providing larger addresses and constants
in instructions and keeping all instructions the same length
Make the Common Case Fast
PC-relative addressing for conditional branches
Immediate addressing for constant operands
MIPS Registers & Memory
MIPS Registers & Use
MIPS Memory Organization
MIPS Instruction Format
MIPS ALU Instructions

Used for arithmetic, logical, shift instructions


op: Basic operation of the instruction (opcode)
rs: first register source operand
rt: second register source operand
rd: register destination operand
shamt: shift amount (more about this later)
funct: function - specific type of operation
Also called R-Format or R-Type Instructions

Instruction usage (assembly)


add dest, src1, src2 dest=src1 + src2
sub dest, src1, src2 dest=src1 - src2
and dest, src1, src2 dest=src1 AND src2
MIPS Data Transfer Instructions
Transfer data between registers and memory
Instruction format (assembly)
lw $dest, offset($addr) load word
sw $src, offset($addr) store word
Uses:
Accessing a variable in main memory
Accessing an array element
R and I Format
Format Examples

I Format
lw $9, 1200($8) == lw $t1, 1200($t0)
R Format
add $8, $8, $9
MIPS Instruction Types

Arithmetic & Logical - manipulate data in registers


add $s1, $s2, $s3 $s1 = $s2 + $s3
or $s3, $s4, $s5 $s3 = $s4 OR $s5
Data Transfer - move register data to/from memory
lw $s1, 100($s2) $s1 = Memory[$s2 + 100]
sw $s1, 100($s2) Memory[$s2 + 100] = $s1
Branch - alter program flow
beq $s1, $s2, 25 if ($s1==$s2) PC = PC + 4 + 4*25
ALU Instructions

Example
Machine language for
add $8, $17, $18
See reference card for op, funct values
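The R-format bit layout can be checked with a small encoder (a hypothetical Python helper, not part of the course material; the 6/5/5/5/5/6 field widths are from the R-format description, and 0x20 is the standard funct value for add):

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6/5/5/5/5/6 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $8, $17, $18: op=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20
word = encode_r(0, 17, 18, 8, 0, 0x20)   # 0x02324020
```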
Data Transfer I

Loading a Simple Variable


Data Transfer II

Array Variable
Data Transfer Instructions - Binary Representation

Used for load, store instructions


op: Basic operation of the instruction (opcode)
rs: first register source operand
rt: second register source operand
offset: 16-bit signed address offset (-32,768 to +32,767)
Also called I-Format or I-Type instructions
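The same kind of sketch works for the I-format, with the offset masked to 16 bits so that negative offsets wrap correctly (a hypothetical Python helper; 0x23 is the standard opcode for lw):

```python
def encode_i(op, rs, rt, offset):
    """Pack the I-format fields (6/5/5/16 bits) into a 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)

# lw $9, 1200($8): op=0x23 (lw), rs=8 (base), rt=9 (dest), offset=1200
word = encode_i(0x23, 8, 9, 1200)   # 0x8D0904B0
```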
MIPS Examples I

Translate to MIPS code


Translate the following Java statement into MIPS assembly code.
Assume that x, y, z, q are stored in registers $s1-$s4. You may use
the other registers to hold intermediate results.

x = x + y + z - q;

Solution
add $t0,$s1,$s2
add $t0,$t0,$s3
sub $s1,$t0,$s4
MIPS Examples II

Translate to MIPS code


Write equivalent MIPS program for the C code i=N*N+3*N
MIPS Code
lw $t0, 4($gp) # fetch N
mul $t0, $t0, $t0 # N*N
lw $t1, 4($gp) # fetch N
ori $t2, $zero, 3 # 3
mul $t1, $t1, $t2 # 3*N
add $t2, $t0, $t1 # N*N + 3*N
sw $t2, 0($gp) # i = ...
A Simple ALP I

Arithmetic operations
A simple program: take two numbers from the user and perform
basic arithmetic operations such as addition, subtraction, and
multiplication on them

Program Flow
1 Print statements to ask the user to enter the two different
numbers
2 Store the two numbers in different registers and print the
menu of arithmetic instructions to the user
3 Based on the choice made by the user, create branch
structures to perform the commands.
4 Print the result and Exit
A Simple ALP II

Step 1- text for interaction


.data
prompt1: .asciiz "Enter the first number:"
prompt2: .asciiz "Enter the second number:"
menu: .asciiz "Enter the number: 1 => add, 2 => subtract or 3 => multiply:"
resultText: .asciiz "Your final result is: "

Store the user’s choice


.text
.globl main
main:
li $t3, 1 #1 into the temporary register $t3
li $t4, 2 #2 into the temporary register $t4
li $t5, 3 #3 into the temporary register $t5
A Simple ALP III
Step 1- Prompt the user to get the first value
li $v0, 4 #command for printing a string
la $a0,prompt1 #loading the string to print
syscall #executing the command

Step 1- Get the first value and save


li $v0, 5 #command for reading an integer
syscall #executing the command
move $t0, $v0 #moving the number read to $t0

Step 1- Prompt the user to get the second value


li $v0, 4 #command for printing a string
la $a0,prompt2 #loading the string to print
syscall #executing the command
A Simple ALP IV

Step 1- Get the second value and save


li $v0, 5 #command for reading an integer
syscall #executing the command
move $t1, $v0 #moving the number read to $t1

• Step 1 Completes

Step 2- Print the menu


li $v0, 4 #command for printing a string
la $a0, menu #loading the string for printing
syscall #executing the command
A Simple ALP V

Step 2- Get the user’s choice


li $v0, 5 #command for reading an integer
syscall #executing the command
move $t2, $v0 # move the choice to $t2

• Step 2 completes

Step 3- Branch for different operations


beq $t2,$t3,addProcess # go to 'addProcess' if $t2 = $t3
beq $t2,$t4,subProcess # go to 'subProcess' if $t2 = $t4
beq $t2,$t5,mulProcess # go to 'mulProcess' if $t2 = $t5
A Simple ALP VI

Step 3- Addition
addProcess: add $t6,$t0,$t1 # $t6=$t0+$t1
j DisplayResult # Jump to display

Step 3- Subtraction
subProcess: sub $t6,$t0,$t1 # $t6=$t0-$t1
j DisplayResult # Jump to display

Step 3- Multiplication
mulProcess: mul $t6,$t0,$t1 # $t6=$t0*$t1
j DisplayResult # Jump to display
A Simple ALP VII

Step 4- Display results and exit


DisplayResult:
li $v0,4 # for printing a string
la $a0,resultText #loads the string to print
syscall #executes the command
# Print the result
li $v0,1
move $a0, $t6 # move the result for printing
syscall
li $v0,10 #This is to terminate the program
syscall
MIPS Program- Arrays I

Array Access
Getting the data from an array cell, e.g., x = list[i];
Storing data into an array cell, e.g. list[i] = x;
Determining the length of an array, i.e. list.length.
MIPS Program- Arrays II

Array as Byte or Word


Declaration:

vowels: .byte ’a’, ’e’, ’i’, ’o’, ’u’


list: .word 3, 0, 1, 2, 6, -2, 4, 7, 3, 7

To Access array:

la $t3, list # put address of list into $t3


lw $t4, 0($t3) # get the value from the array cell
sw $t2, 12($t3) # store the value into the array cell

Address Calculation:
Address of vowels[k] == vowels + k
Address of list[k] == list + 4 * k
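The two address calculations differ only in element size: 1 byte per element for .byte arrays, 4 bytes for .word arrays. A Python sketch, with a hypothetical base address chosen purely for illustration:

```python
def byte_array_addr(base, k):
    return base + k        # .byte elements are 1 byte apart

def word_array_addr(base, k):
    return base + 4 * k    # .word elements are 4 bytes apart

# With a hypothetical base address for list:
list_base = 0x10010000
addr_of_list_3 = word_array_addr(list_base, 3)   # list + 12
```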
MIPS Program- Arrays III

Sum of the Array


Program Flow
Declare and initialize the array and result message in the data
section
Initialize the counter and sum
Start a loop, get the first no, add to sum, increment counter,
check loop
Display the result

Step1:Initialize the array


.data
list: .word 3, 2, 1, 0, 1, 2
result: .asciiz "\n The sum of the array is: "
MIPS Program- Arrays IV

Step2: Initialize Sum, Counter and Index


.text
.globl main
main:
li $s0, 0 # Counter
li $a0, 0 # Sum
li $t0, 0 # Index
MIPS Program- Arrays V

Step3: Loop for addition


forsum:
bge $s0, 6, end_forsum
lw $t1,list($t0) # Load the number from array
add $a0, $a0, $t1 # Compute the sum
move $t2, $a0 # Copy of the sum
addi $t0,$t0,4 # Increment index
addi $s0,$s0,1 #Increment counter
j forsum
end_forsum:
MIPS Program- Arrays VI

Step4: Display the result


li $v0, 4
la $a0, result
syscall

move $a0, $t2 # Move back the result


li $v0,1
syscall # Print sum
li $v0,10 # Terminate program
syscall
Largest of Array I

Program Flow
1 Declare and initialize array, array-size
2 First iteration
Set the counter ← array-size
get the address of the array to a register
get the first value and set as max
increment pointer
decrement count
3 Get into loop
Get next number
Compare with max
replace max if a new max is found
increment pointer, decrement count
4 Display the result
Largest of Array II

Declare and initialize array


.data
array: .word 8,2,31,81,12,10
size: .word 6
max: .word 0
Largest of Array III

Do first iteration
.text
.globl main
main:
jal start
start:
li $t3, 6 #size
la $t1, array # get array address
lw $s5, ($t1) # set max, $s5 to array[0]
add $t1, $t1, 4 # skip array[0]
add $t3, $t3, -1 # len = len - 1
Largest of Array IV

Go to loop
loop:
lw $t4, ($t1) # get the current array element
ble $t4,$s5,L1 # if $t4 <= current max, skip
lw $s5, 0($t1) # otherwise this element is the new max
L1: add $t3, $t3, -1 #counter-1
addi $t1, $t1, 4 # advance array pointer
bnez $t3, loop #if not 0 then go on and loop
Largest of Array V

Display Result & Exit


sw $s5, max # store the max val of the array
lw $a0, max # load it for printing
li $v0, 1
syscall
li $v0, 10 # terminate program
syscall