
1

CS 2253
COMPUTER ORGANIZATION AND
ARCHITECTURE
K. R. Sarath Chandran
Assistant Professor (CSE Dept)
SSN College of Engineering
2
Unit 1
3
Basic Structure of
Computers
4
Functional Units
K. R. Sarath Chandran. AP/CSE/SSNCE
5
Functional Units
Figure 1.1. Basic functional units of a computer.
I/O Processor
Output
Memory
Input
and
Arithmetic
logic
Control
6
Information Handled by a
Computer
Instructions/machine instructions
Govern the transfer of information within a computer as
well as between the computer and its I/O devices
Specify the arithmetic and logic operations to be
performed
Program
Data
Used as operands by the instructions
Source program
Encoded in binary code 0 and 1
7
Memory Unit
Store programs and data
Two classes of storage
Primary storage
Fast
Programs must be stored in memory while they are being executed
Large number of semiconductor storage cells
Processed in words
Address
RAM and memory access time
Memory hierarchy: cache, main memory
Secondary storage: larger and cheaper
8
Arithmetic and Logic Unit
(ALU)
Most computer operations are executed in
ALU of the processor.
Load the operands into memory, bring them to the processor, perform the operation in the ALU, then store the result back to memory or retain it in the processor.
Registers
Fast control of ALU
9
Control Unit
All computer operations are controlled by the control
unit.
The timing signals that govern the I/O transfers are
also generated by the control unit.
Control unit is usually distributed throughout the
machine instead of standing alone.
Operations of a computer:
Accept information in the form of programs and data through an
input unit and store it in the memory
Fetch the information stored in the memory, under program control,
into an ALU, where the information is processed
Output the processed information through an output unit
Control all activities inside the machine through a control unit
10
The processor : Data Path and
Control
Two types of functional units:
elements that operate on data values (combinational)
elements that contain state (state elements)
11
Five Execution Steps
Step 1. Instruction fetch (all instructions):
  IR = Mem[PC]
  PC = PC + 4
Step 2. Instruction decode / register fetch (all instructions):
  A = Reg[IR[25-21]]
  B = Reg[IR[20-16]]
  ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
  R-type instructions: ALUOut = A op B
  Memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
  Branches: if (A == B) then PC = ALUOut
  Jumps: PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
  R-type instructions: Reg[IR[15-11]] = ALUOut
  Memory-reference instructions: Load: MDR = Mem[ALUOut], or Store: Mem[ALUOut] = B
Step 5. Memory read completion:
  Memory-reference instructions: Load: Reg[IR[20-16]] = MDR
12
Basic Operational
Concepts
13
Review
Activity in a computer is governed by instructions.
To perform a task, an appropriate program
consisting of a list of instructions is stored in the
memory.
Individual instructions are brought from the memory
into the processor, which executes the specified
operations.
Data to be used as operands are also stored in the
memory.
14
A Typical Instruction
Add LOCA, R0
Add the operand at memory location LOCA to the
operand in a register R0 in the processor.
Place the sum into register R0.
The original contents of LOCA are preserved.
The original contents of R0 are overwritten.
The instruction is fetched from the memory into the processor, the operand at LOCA is fetched and added to the contents of R0, and the resulting sum is stored in register R0.
15
Separate Memory Access and
ALU Operation
Load LOCA, R1
Add R1, R0
Whose contents will be overwritten?
16
Connection Between the
Processor and the Memory
Figure 1.2. Connections between the processor and the memory.
[Figure: the processor contains MAR, MDR, PC, IR, the control unit, the ALU, and n general-purpose registers R0 to Rn-1; the processor is connected to the memory.]
17
Registers
Instruction register (IR)
Program counter (PC)
General-purpose registers (R0 to Rn-1)
Memory address register (MAR)
Memory data register (MDR)
18
Typical Operating Steps
Programs are placed in the memory through input devices
PC is set to point to the first instruction
The contents of PC are transferred to MAR
A Read signal is sent to the memory
The first instruction is read out and loaded
into MDR
The contents of MDR are transferred to IR
Decode and execute the instruction
19
Typical Operating Steps
(Cont)
Get operands for ALU
General-purpose register
Memory (address to MAR, Read, MDR to ALU)
Perform operation in ALU
Store the result back
To general-purpose register
To memory (address to MAR, result to MDR, Write)
During the execution, PC is
incremented to the next instruction
20
Interrupt
Normal execution of programs may be preempted if
some device requires urgent servicing.
The device raises an interrupt signal, and the normal execution of the current program must then be interrupted.
Interrupt-service routine
Current system information backup and restore (PC,
general-purpose registers, control information,
specific information)
21
Bus Structures
There are many ways to connect different
parts inside a computer together.
A group of lines that serves as a connecting
path for several devices is called a bus.
Address/data/control
22
Bus Structure
Single-bus
Figure 1.3. Single-bus structure.
Memory Input Output Processor
23
Speed Issue
Different devices have different
transfer/operate speed.
If the speed of bus is bounded by the slowest
device connected to it, the efficiency will be
very low.
How to solve this?
A common approach: use buffers.
24
Performance
25
Performance
The most important measure of a computer is
how quickly it can execute programs.
Three factors affect performance:
Hardware design
Instruction set
Compiler
26
Performance
Processor time to execute a program depends on the hardware
involved in the execution of individual machine instructions.
Figure 1.5. The processor cache.
[Figure: the processor with its cache memory, connected to the main memory over the bus.]
27
Performance
The processor and a relatively small cache
memory can be fabricated on a single
integrated circuit chip.
Speed
Cost
Memory management
28
Processor Clock
Clock, clock cycle, and clock rate
The execution of each instruction is divided
into several steps, each of which completes
in one clock cycle.
Hertz: cycles per second
29
Basic Performance Equation
T: processor time required to execute a program that has been prepared in a high-level language
N: number of actual machine language instructions needed to complete the execution (note: loops)
S: average number of basic steps needed to execute one machine instruction; each step completes in one clock cycle
R: clock rate
Note: these are not independent of each other
$$T = \frac{N \times S}{R}$$
How to improve T?
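As a rough illustration of the basic performance equation, the sketch below evaluates T for a made-up workload. The function name `execution_time` and all numeric values are assumptions chosen for the example, not figures from the slides.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Return T = (N * S) / R in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz


# Hypothetical workload: 2 million instructions, 4 basic steps each, 500 MHz clock.
baseline = execution_time(2_000_000, 4, 500e6)

# Same program if pipelining brought the effective S down to about 1.
pipelined = execution_time(2_000_000, 1, 500e6)

print(baseline, pipelined)
```

Lowering N, lowering S, or raising R each reduces T, which is the question the slide poses.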
30
Pipeline and Superscalar
Operation
Instructions are not necessarily executed one after another.
The value of S doesn't have to be the number of clock cycles to execute one instruction.
Pipelining: overlapping the execution of successive instructions.
Add R1, R2, R3
Superscalar operation: multiple instruction pipelines are implemented in the processor.
Goal: reduce S (could become < 1!)
31
Clock Rate
Increase clock rate
Improve the integrated-circuit (IC) technology to make
the circuits faster
Reduce the amount of processing done in one basic step
(however, this may increase the number of basic steps
needed)
Increases in R that are entirely caused by
improvements in IC technology affect all
aspects of the processor's operation equally
except the time to access the main memory.
32
CISC and RISC
Tradeoff between N and S
A key consideration is the use of pipelining
S is close to 1 even though the number of basic steps
per instruction may be considerably larger
It is much easier to implement efficient pipelining in
processor with simple instruction sets
Reduced Instruction Set Computers (RISC)
Complex Instruction Set Computers (CISC)
33
Compiler
A compiler translates a high-level language program
into a sequence of machine instructions.
To reduce N, we need a suitable machine instruction
set and a compiler that makes good use of it.
Goal: reduce N × S
A compiler may not be designed for a specific
processor; however, a high-quality compiler is
usually designed for, and with, a specific processor.
34
Performance Measurement
T is difficult to compute.
Measure computer performance using benchmark programs.
System Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different application
domains, together with test results for many commercially available
computers.
Compile and run (no simulation)
Reference computer

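For one benchmark program:
$$\text{SPEC rating} = \frac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$$
Overall, for n programs:
$$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$$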
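A small sketch of the two SPEC formulas: a per-program rating as a ratio of running times, and the overall rating as the geometric mean of the per-program ratings. The function names and the timing values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math


def spec_rating(reference_time, test_time):
    """Rating for one program: reference running time / running time under test."""
    return reference_time / test_time


def overall_spec_rating(ratings):
    """Geometric mean of the per-program ratings."""
    return math.prod(ratings) ** (1.0 / len(ratings))


# Hypothetical running times in seconds for two benchmark programs.
ratings = [spec_rating(100.0, 50.0), spec_rating(80.0, 10.0)]
overall = overall_spec_rating(ratings)
print(ratings, overall)
```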

35
Multiprocessors and
Multicomputers
Multiprocessor computer
Execute a number of different application tasks in parallel
Execute subtasks of a single large task in parallel
All processors have access to all of the memory: shared-memory multiprocessor
Cost: processors, memory units, complex interconnection networks
Multicomputers
Each computer only has access to its own memory
Exchange messages via a communication network: message-passing multicomputers
36
Machine Instructions
and Programs
37
Objectives
Machine instructions and program execution,
including branching and subroutine call and return
operations.
Number representation and addition/subtraction in
the 2's-complement system.
Addressing methods for accessing register and
memory operands.
Assembly language for representing machine
instructions, data, and programs.
Program-controlled Input/Output operations.
Operations on stack, queue, list, linked-list, and
array data structures.
38
Number, Arithmetic
Operations, and
Characters
39
Unsigned Integer
Consider an n-bit vector of the form:
$$A = a_{n-1} a_{n-2} a_{n-3} \cdots a_0$$
where $a_i = 0$ or $1$ for $i$ in $[0, n-1]$.
This vector can represent positive integer values V(A) in the range 0 to $2^n - 1$, where
$$V(A) = a_{n-1} \times 2^{n-1} + a_{n-2} \times 2^{n-2} + \cdots + a_1 \times 2^1 + a_0 \times 2^0$$
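The weighted sum above can be checked with a few lines of code. This is a minimal sketch; the helper name `unsigned_value` and the convention of passing the bits as a string (most significant bit first) are assumptions made for the example.

```python
def unsigned_value(bits):
    """Value of an unsigned bit vector given as a string, MSB first."""
    n = len(bits)
    # Bit i of the string (counting from the left) has weight 2^(n-1-i).
    return sum(int(b) << (n - 1 - i) for i, b in enumerate(bits))


print(unsigned_value("1011"))
print(unsigned_value("1111"))
```

For n = 4 the largest value is obtained from 1111, which matches the upper end of the stated range.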



40
Signed Integer
3 major representations:
Sign and magnitude
One's complement
Two's complement
Assumptions:
4-bit machine word
16 different values can be represented
Roughly half are positive, half are negative
41
Sign and Magnitude
Representation
[Figure: number wheel of the sixteen 4-bit patterns; 0000 to 0111 represent +0 to +7, and 1000 to 1111 represent -0 to -7.]
Example: 0 100 = +4, 1 100 = -4
High-order bit is the sign: 0 = positive (or zero), 1 = negative
Three low-order bits are the magnitude: 0 (000) through 7 (111)
Number range for n bits = $\pm(2^{n-1} - 1)$
Two representations for 0
42
One's Complement
Representation
Subtraction implemented by addition and 1's complement
Still two representations of 0! This causes some problems
Some complexities in addition
[Figure: number wheel of the sixteen 4-bit patterns; 0000 to 0111 represent +0 to +7, and 1000 to 1111 represent -7 to -0.]
Example: 0 100 = +4, 1 011 = -4
43
Two's Complement
Representation
[Figure: number wheel of the sixteen 4-bit patterns; 0000 to 0111 represent +0 to +7, and 1000 to 1111 represent -8 to -1. Like 1's complement except shifted one position clockwise.]
Example: 0 100 = +4, 1 100 = -4
Only one representation for 0
One more negative number than positive numbers
44
Binary, Signed-Integer
Representations
B                Values represented
b3 b2 b1 b0      Sign and magnitude   1's complement   2's complement
0  1  1  1       +7                   +7               +7
0  1  1  0       +6                   +6               +6
0  1  0  1       +5                   +5               +5
0  1  0  0       +4                   +4               +4
0  0  1  1       +3                   +3               +3
0  0  1  0       +2                   +2               +2
0  0  0  1       +1                   +1               +1
0  0  0  0       +0                   +0               +0
1  0  0  0       -0                   -7               -8
1  0  0  1       -1                   -6               -7
1  0  1  0       -2                   -5               -6
1  0  1  1       -3                   -4               -5
1  1  0  0       -4                   -3               -4
1  1  0  1       -5                   -2               -3
1  1  1  0       -6                   -1               -2
1  1  1  1       -7                   -0               -1
Figure 2.1. Binary, signed-integer representations.
45
Comparison
Sign and Magnitude
Cumbersome addition/subtraction
Must compare magnitudes to determine sign of
result
One's Complement
Simply bit-wise complement
Two's Complement
Simply bit-wise complement + 1
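To compare the three encodings side by side, the sketch below produces the n-bit pattern of a small integer in each representation. The function names are hypothetical, n defaults to the 4-bit word assumed on the slides, and no range checking is done, so inputs are assumed to fit.

```python
def sign_magnitude(value, n=4):
    """Sign bit followed by the (n-1)-bit magnitude."""
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n - 1}b")


def ones_complement(value, n=4):
    """Negative values are the bit-wise complement of the magnitude."""
    if value >= 0:
        return format(value, f"0{n}b")
    return format((2**n - 1) - abs(value), f"0{n}b")


def twos_complement(value, n=4):
    """Negative values are the bit-wise complement plus 1, i.e. value mod 2^n."""
    return format(value % 2**n, f"0{n}b")


for v in (4, -4):
    print(v, sign_magnitude(v), ones_complement(v), twos_complement(v))
```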
46
Addition of Positive Numbers
Figure 2.2. Addition of 1-bit numbers.
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 10 (sum 0, carry-out 1)
47
Addition and Subtraction: Sign and Magnitude
  4 + 3 = 7:         0100 + 0011 = 0111
  -4 + (-3) = -7:    1100 + 1011 = 1111
Result sign bit is the same as the operands' sign.
  4 - 3 = 1:         0100 + 1011 = 0001
  -4 + 3 = -1:       1100 + 0011 = 1001
When signs differ, the operation is subtract; the sign of the result depends on the sign of the number with the larger magnitude.
48
Addition and Subtraction: 1's Complement
  4 + 3 = 7:         0100 + 0011 = 0111
  -4 + (-3) = -7:    1011 + 1100 = 1 0111; end-around carry: 0111 + 1 = 1000
  4 - 3 = 1:         0100 + 1100 = 1 0000; end-around carry: 0000 + 1 = 0001
  -4 + 3 = -1:       1011 + 0011 = 1110
49
Addition and Subtraction: 1's Complement
Why does end-around carry work?
It's equivalent to subtracting $2^n$ and adding 1.
$$M - N = M + \overline{N} = M + (2^n - 1 - N) = (M - N) + 2^n - 1 \qquad (M > N)$$
$$-M + (-N) = \overline{M} + \overline{N} = (2^n - M - 1) + (2^n - N - 1) = 2^n + [2^n - 1 - (M + N)] - 1 \qquad (M + N < 2^{n-1})$$
After end-around carry:
$$2^n - 1 - (M + N)$$
This is the correct form for representing -(M + N) in 1's complement!
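The end-around carry can be sketched in code by adding two n-bit patterns and, when a carry leaves the high-order bit, adding it back into the low-order bit. The function name `ones_complement_add` is an assumption, and operands are passed as raw n-bit patterns rather than signed values.

```python
def ones_complement_add(a, b, n=4):
    """Add two n-bit 1's-complement patterns using end-around carry."""
    mask = (1 << n) - 1
    total = a + b
    if total > mask:                 # a carry came out of the high-order bit
        total = (total & mask) + 1   # drop it (subtract 2^n) and add 1
    return total & mask


# The slide examples, written as 4-bit patterns.
print(format(ones_complement_add(0b1011, 0b1100), "04b"))  # -4 + (-3)
print(format(ones_complement_add(0b0100, 0b1100), "04b"))  # 4 - 3
print(format(ones_complement_add(0b1011, 0b0011), "04b"))  # -4 + 3
```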
50
Addition and Subtraction: 2's Complement
  4 + 3 = 7:         0100 + 0011 = 0111
  -4 + (-3) = -7:    1100 + 1101 = 1 1001; ignore the carry: 1001
  4 - 3 = 1:         0100 + 1101 = 1 0001; ignore the carry: 0001
  -4 + 3 = -1:       1100 + 0011 = 1111
If the carry-in to the high-order bit equals the carry-out, then ignore the carry.
If the carry-in differs from the carry-out, then overflow.
The simpler addition scheme makes two's complement the most common choice for integer number systems within digital systems.
51
Addition and Subtraction: 2's Complement
Why can the carry-out be ignored?
-M + N when N > M:
$$M^* + N = (2^n - M) + N = 2^n + (N - M)$$
Ignoring the carry-out is just like subtracting $2^n$.
-M + (-N) where $N + M \le 2^{n-1}$:
$$-M + (-N) = M^* + N^* = (2^n - M) + (2^n - N) = 2^n - (M + N) + 2^n$$
After ignoring the carry, this is just the right two's complement representation for -(M + N)!
52
2's-Complement Add and Subtract Operations
(a)  0010 (+2) + 0011 (+3) = 0101 (+5)
(b)  0100 (+4) + 1010 (-6) = 1110 (-2)
(c)  1011 (-5) + 1110 (-2) = 1001 (-7)
(d)  0111 (+7) + 1101 (-3) = 0100 (+4)
(e)  1101 (-3) - 1001 (-7)  becomes  1101 + 0111 = 0100 (+4)
(f)  0010 (+2) - 0100 (+4)  becomes  0010 + 1100 = 1110 (-2)
(g)  0110 (+6) - 0011 (+3)  becomes  0110 + 1101 = 0011 (+3)
(h)  1001 (-7) - 1011 (-5)  becomes  1001 + 0101 = 1110 (-2)
(i)  1001 (-7) - 0001 (+1)  becomes  1001 + 1111 = 1000 (-8)
(j)  0010 (+2) - 1101 (-3)  becomes  0010 + 0011 = 0101 (+5)
Figure 2.4. 2's-complement Add and Subtract operations.
53
Overflow: adding two positive numbers gives a negative number, or adding two negative numbers gives a positive number.
5 + 3 = -8
-7 - 2 = +7
[Figure: two 4-bit two's-complement number wheels, 0000 = +0 through 0111 = +7 and 1000 = -8 through 1111 = -1, one for each example.]
54
Overflow Conditions
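In each carry row, the leftmost digit is the carry-out and the next digit is the carry-in to the high-order bit.
  5 + 3:        carries 0111;  0101 + 0011 = 1000 (-8)       Overflow
  -7 + (-2):    carries 1000;  1001 + 1110 = 1 0111 (+7)     Overflow
  5 + 2:        carries 0000;  0101 + 0010 = 0111 (+7)       No overflow
  -3 + (-5):    carries 1111;  1101 + 1011 = 1 1000 (-8)     No overflow
Overflow when carry-in to the high-order bit does not equal carry-out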
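The overflow rule can be sketched by computing the carry into the high-order bit and the carry out of it separately and comparing them. The function name `add_with_overflow` is hypothetical, and operands are n-bit 2's-complement patterns.

```python
def add_with_overflow(a, b, n=4):
    """Add two n-bit 2's-complement patterns.

    Returns (n-bit result, overflow flag), where overflow is detected as
    carry-in to the high-order bit != carry-out of the high-order bit.
    """
    mask = (1 << n) - 1
    low_mask = mask >> 1                      # the n-1 low-order bits
    carry_in = ((a & low_mask) + (b & low_mask)) >> (n - 1)
    total = (a & mask) + (b & mask)
    carry_out = total >> n
    return total & mask, carry_in != carry_out


print(add_with_overflow(0b0101, 0b0011))  # 5 + 3
print(add_with_overflow(0b1001, 0b1110))  # -7 + (-2)
print(add_with_overflow(0b0101, 0b0010))  # 5 + 2
print(add_with_overflow(0b1101, 0b1011))  # -3 + (-5)
```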
55
Sign Extension
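Task:
Given a w-bit signed integer x
Convert it to a (w+k)-bit integer with the same value
Rule:
Make k copies of the sign bit:
$$X' = \underbrace{x_{w-1}, \ldots, x_{w-1}}_{k \text{ copies of MSB}},\ x_{w-1},\ x_{w-2},\ \ldots,\ x_0$$
[Figure: the w-bit X is widened to the (w+k)-bit X', with the k new high-order bits copied from the MSB.]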
56
Sign Extension Example
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;
      Decimal   Hex           Binary
x     15213     3B 6D         00111011 01101101
ix    15213     00 00 3B 6D   00000000 00000000 00111011 01101101
y     -15213    C4 93         11000100 10010011
iy    -15213    FF FF C4 93   11111111 11111111 11000100 10010011
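The same rule can be tried on the bit patterns from the table. The sketch below is in Python rather than C; the helper name `sign_extend` is an assumption, and the values are treated as raw w-bit patterns.

```python
def sign_extend(pattern, w, k):
    """Extend a w-bit pattern to w+k bits by copying the sign bit k times."""
    sign = (pattern >> (w - 1)) & 1
    if sign:
        # Set the k new high-order bits to 1.
        pattern |= ((1 << k) - 1) << w
    return pattern


print(format(sign_extend(0x3B6D, 16, 16), "08X"))  # x  = 15213
print(format(sign_extend(0xC493, 16, 16), "08X"))  # y  = -15213
```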
57
Memory Locations,
Addresses, and
Operations
58
Memory Location, Addresses,
and Operation
Memory consists of many millions of storage cells, each of which can store 1 bit.
Data is usually accessed in n-bit groups; n is called the word length.
Figure 2.5. Memory words.
[Figure: a column of n-bit words: first word, second word, ..., i-th word, ..., last word.]


59
Memory Location, Addresses,
and Operation
32-bit word length example
(a) A signed integer: 32 bits, b31 b30 ... b1 b0. Sign bit: b31 = 0 for positive numbers, b31 = 1 for negative numbers
(b) Four characters: four 8-bit ASCII characters


60
Memory Location, Addresses,
and Operation
To retrieve information from memory, either for one
word or one byte (8-bit), addresses for each location
are needed.
A k-bit address memory has $2^k$ memory locations, namely 0 to $2^k - 1$, called the memory space.
24-bit memory: $2^{24}$ = 16,777,216 = 16M (1M = $2^{20}$)
32-bit memory: $2^{32}$ = 4G (1G = $2^{30}$)
1K (kilo) = $2^{10}$
1T (tera) = $2^{40}$
61
Memory Location, Addresses,
and Operation
It is impractical to assign distinct addresses
to individual bit locations in the memory.
The most practical assignment is to have
successive addresses refer to successive
byte locations in the memory: byte-addressable memory.
Byte locations have addresses 0, 1, 2, .... If the word length is 32 bits, successive words are located at addresses 0, 4, 8, ....
62
Big-Endian and Little-Endian
Assignments
Word address   (a) Big-endian byte addresses     (b) Little-endian byte addresses
0              0      1      2      3           3      2      1      0
4              4      5      6      7           7      6      5      4
...
2^k - 4        2^k-4  2^k-3  2^k-2  2^k-1       2^k-1  2^k-2  2^k-3  2^k-4
Figure 2.7. Byte and word addressing.
Big-endian: lower byte addresses are used for the most significant bytes of the word
Little-endian: opposite ordering; lower byte addresses are used for the less significant bytes of the word
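A quick way to see the two byte orderings is to serialize one 32-bit word both ways. This sketch uses Python's built-in `int.to_bytes`; the word value 0x12345678 is just an illustrative choice.

```python
word = 0x12345678

# Index 0 of each bytes object corresponds to the lowest byte address.
big = word.to_bytes(4, byteorder="big")
little = word.to_bytes(4, byteorder="little")

print([hex(b) for b in big])
print([hex(b) for b in little])
```

In the big-endian layout the most significant byte (0x12) sits at the lowest address; in the little-endian layout the least significant byte (0x78) does.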
63
Memory Location, Addresses,
and Operation
Address ordering of bytes
Word alignment
Words are said to be aligned in memory if they begin at a byte address that is a multiple of the number of bytes in a word.
16-bit word: word addresses: 0, 2, 4, ...
32-bit word: word addresses: 0, 4, 8, ...
64-bit word: word addresses: 0, 8, 16, ...
Access numbers, characters, and character
strings
64
Memory Operation
Load (or Read or Fetch)
Copy the content. The memory content doesn't change.
Address Load
Registers can be used
Store (or Write)
Overwrite the content in memory
Address and Data Store
Registers can be used
65
Instruction and
Instruction Sequencing
66
Must-Perform Operations
Data transfers between the memory and the
processor registers
Arithmetic and logic operations on data
Program sequencing and control
I/O transfers
67
Register Transfer Notation
Identify a location by a symbolic name
standing for its hardware binary address
(LOC, R0, ...)
Contents of a location are denoted by placing square brackets around the name of the location (R1 ← [LOC], R3 ← [R1] + [R2])
Register Transfer Notation (RTN)
68
Assembly Language Notation
Represent machine instructions and
programs.
Move LOC, R1 means R1 ← [LOC]
Add R1, R2, R3 means R3 ← [R1] + [R2]
69
Basic Instruction Types
High-level language: C = A + B
Action: C ← [A] + [B]
Assembly: Add A, B, C
Three-address instruction:
Operation Source1, Source2, Destination
Two-address instruction:
Operation Source, Destination
Add A, B means B ← [A] + [B]
70
Basic Instruction Types
Need to add something to the above two-address
instruction to finish:
Move B, C means C ← [B]
One-address instruction (to fit in one word length)
Accumulator: Add A
Load A
Add B
Store C
Zero-address instructions (stack operation)
71
Using Registers
Registers are faster
Shorter instructions
The number of registers is smaller (e.g. 32
registers need 5 bits)
Potential speedup
Minimize the frequency with which data is
moved back and forth between the memory
and processor registers.
72
Using Registers
Load A, Ri
Store Ri, A
Add A, Ri
Add Ri, Rj
Add Ri, Rj, Rk
Move Source, Destination
Move A, Ri is the same as Load A, Ri
Move Ri, A is the same as Store Ri, A
73
Using Registers
C=A+B
In processors where arithmetic operations are allowed only on operands in registers:
Move A, Ri
Move B, Rj
Add Ri, Rj
Move Rj, C
In processors where one operand may be in the memory but the other one must be in a register:
Move A, Ri
Add B, Ri
Move Ri, C
74
Instruction Execution and
Straight-Line Sequencing
Address   Contents
i         Move A,R0      (begin execution here)
i + 4     Add B,R0
i + 8     Move R0,C
...                      (the three instructions above are the 3-instruction program segment)
A                        (data for the program)
B
C
Figure 2.8. A program for C ← [A] + [B].
Assumptions:
- One memory operand per instruction
- 32-bit word length
- Memory is byte addressable
- Full memory address can be directly specified in a single-word instruction
Two-phase procedure:
- Instruction fetch
- Instruction execute
75
Branching
Address       Contents
i             Move NUM1,R0
i + 4         Add NUM2,R0
i + 8         Add NUM3,R0
...
i + 4n - 4    Add NUMn,R0
i + 4n        Move R0,SUM
...
SUM
NUM1
NUM2
...
NUMn
Figure 2.9. A straight-line program for adding n numbers.



76
Branching
        Move N,R1
        Clear R0
LOOP    Determine address of "Next" number and add "Next" number to R0      (program loop)
        Decrement R1
        Branch>0 LOOP        (conditional branch; LOOP is the branch target)
        Move R0,SUM
...
SUM
N       n
NUM1
NUM2
...
NUMn
Figure 2.10. Using a loop to add n numbers.
77
Condition Codes
Condition code flags
Condition code register / status register
N (negative)
Z (zero)
V (overflow)
C (carry)
Different instructions affect different flags
78
Generating Memory Addresses
How to specify the address of branch target?
Can we give the memory operand address
directly in a single Add instruction in the loop?
Use a register to hold the address of NUM1;
then increment by 4 on each pass through
the loop.
79
Addressing Modes
80
Addressing Modes
The different ways in which the location of an operand is specified in an instruction are referred to as addressing modes.
Name                          Assembler syntax   Addressing function
Immediate                     #Value             Operand = Value
Register                      Ri                 EA = Ri
Absolute (Direct)             LOC                EA = LOC
Indirect                      (Ri)               EA = [Ri]
                              (LOC)              EA = [LOC]
Index                         X(Ri)              EA = [Ri] + X
Base with index               (Ri,Rj)            EA = [Ri] + [Rj]
Base with index and offset    X(Ri,Rj)           EA = [Ri] + [Rj] + X
Relative                      X(PC)              EA = [PC] + X
Autoincrement                 (Ri)+              EA = [Ri]; Increment Ri
Autodecrement                 -(Ri)              Decrement Ri; EA = [Ri]
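The addressing functions in the table can be sketched as small functions over a toy machine state. Everything here is hypothetical scaffolding: the `registers` and `memory` dictionaries, their contents, the function names, and the fixed operand size of 4 bytes are assumptions made only for the example.

```python
# Toy machine state (illustrative values only).
registers = {"R1": 1000, "R2": 20, "PC": 500}
memory = {1000: 2000}


def ea_absolute(loc):
    return loc                                  # EA = LOC


def ea_indirect_register(ri):
    return registers[ri]                        # EA = [Ri]


def ea_indirect_memory(loc):
    return memory[loc]                          # EA = [LOC]


def ea_index(x, ri):
    return registers[ri] + x                    # EA = [Ri] + X


def ea_base_index(ri, rj, x=0):
    return registers[ri] + registers[rj] + x    # EA = [Ri] + [Rj] (+ X)


def ea_relative(x):
    return registers["PC"] + x                  # EA = [PC] + X


def ea_autoincrement(ri, size=4):
    ea = registers[ri]                          # EA = [Ri], then increment Ri
    registers[ri] += size
    return ea


def ea_autodecrement(ri, size=4):
    registers[ri] -= size                       # decrement Ri, then EA = [Ri]
    return registers[ri]


print(ea_index(20, "R1"), ea_base_index("R1", "R2"), ea_relative(-8))
```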



81
Implementation of Variables
A variable is represented by allocating a
register or a memory location to hold its value.
Register mode
Absolute mode
Move LOC, R2
82
Implementation of Constants
Immediate mode
Move 200 (immediate), R0
Move #200, R0
A = B + 6
Move B, R1
Add #6, R1
Move R1, A
83
Indirection and Pointers
Indirect mode: the effective address (EA) of the operand is the contents of a register or memory location whose address appears in the instruction.
Figure 2.11. Indirect addressing.
(a) Through a general-purpose register: Add (R1),R0, where register R1 contains B, the address of the operand in main memory
(b) Through a memory location: Add (A),R0, where memory location A contains B, the address of the operand
84
Indirection and Pointers
(a) Through a general-purpose register
R1 holds B = $2000, and the value 5 is stored in memory at $2000
Add (R1), R0 is equivalent to Add $2000, R0: R0 ← 5 + [R0]
(b) Through a memory location
A = $1000, B = $2000, and the value $2000 is stored in memory at $1000
Add (A), R0 is Add ($1000), R0, which is equivalent to Add $2000, R0: R0 ← 5 + [R0]
Compare: Add $1000, R0 gives R0 ← $2000 + [R0]
85
Indirection and Pointers
Address   Contents
          Move N,R1
          Move #NUM1,R2      (initialization)
          Clear R0
LOOP      Add (R2),R0
          Add #4,R2
          Decrement R1
          Branch>0 LOOP
          Move R0,SUM
Figure 2.12. Use of indirect addressing in the program of Figure 2.10.
The register or memory location that contains the address of an operand is called a pointer.
86
Indirection and Pointers
C-language: A = *B
Indirect addressing through registers:
Move B, R1
Move (R1), A
Indirect addressing through memory:
Move (B), A
87
Indexing and Arrays
Index mode: the effective address of the operand is generated by adding a constant value to the contents of a register.
Index register
X(Ri): EA = X + [Ri]
The constant X may be given either as an explicit
number or as a symbolic name representing a
numerical value.
If X is shorter than a word, sign-extension is needed.
88
Indexed Addressing
Figure 2.13. Indexed addressing.
(a) Offset is given as a constant: Add 20(R1),R2, with R1 containing 1000 and 20 = offset; the operand is at address 1020
(b) Offset is in the index register: Add 1000(R1),R2, with R1 containing 20 = offset; the operand is at address 1020
89
Example Student Record List
Figure 2.14. A list of students' marks.
N            n
LIST         Student ID   (Student 1)
LIST + 4     Test 1
LIST + 8     Test 2
LIST + 12    Test 3
LIST + 16    Student ID   (Student 2)
             Test 1
             Test 2
             Test 3
             ...


90
Example (Cont)
Compute the sum of all scores obtained on each of
the tests and store these three sums in memory
locations SUM1, SUM2, and SUM3.
        Move #LIST,R0
        Clear R1
        Clear R2
        Clear R3
        Move N,R4
LOOP    Add 4(R0),R1
        Add 8(R0),R2
        Add 12(R0),R3
        Add #16,R0
        Decrement R4
        Branch>0 LOOP
        Move R1,SUM1
        Move R2,SUM2
        Move R3,SUM3
Note: the indexed accesses 4(R0), 8(R0), and 12(R0) do not change R0.
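The loop can be mirrored in a high-level sketch to see how the index offsets 4, 8, and 12 pick out the three test scores of each 16-byte record. The memory is modelled as a dictionary keyed by byte address; the function name `sum_tests`, the base address 1000, and the marks themselves are made-up values for illustration. Like the assembly loop, the sketch assumes n is at least 1.

```python
def sum_tests(memory, list_addr, n):
    """Mirror of the loop: r0 walks the records, r1..r3 accumulate the sums."""
    r0 = list_addr
    r1 = r2 = r3 = 0
    r4 = n
    while True:
        r1 += memory[r0 + 4]    # Add 4(R0),R1
        r2 += memory[r0 + 8]    # Add 8(R0),R2
        r3 += memory[r0 + 12]   # Add 12(R0),R3
        r0 += 16                # Add #16,R0
        r4 -= 1                 # Decrement R4
        if not r4 > 0:          # Branch>0 LOOP
            break
    return r1, r2, r3


# Two hypothetical student records starting at address 1000.
memory = {
    1000: 101, 1004: 70, 1008: 80, 1012: 90,
    1016: 102, 1020: 60, 1024: 75, 1028: 85,
}
print(sum_tests(memory, 1000, 2))
```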
91
Indexing and Arrays
In general, the Index mode facilitates access
to an operand whose location is defined
relative to a reference point within the data
structure in which the operand appears.
Several variations:
(Ri, Rj): EA = [Ri] + [Rj]
X(Ri, Rj): EA = X + [Ri] + [Rj]
92
Relative Addressing
Relative mode: the effective address is determined by the Index mode using the program counter in place of the general-purpose register.
X(PC): note that X is a signed number
Branch>0 LOOP
This location is computed by specifying it as an offset from the current value of PC.
The branch target may be either before or after the branch instruction, so the offset is given as a signed number.
93
Additional Modes
Autoincrement mode: the effective address of the operand is the contents of a register specified in the instruction. After accessing the operand, the contents of this register are automatically incremented to point to the next item in a list.
(Ri)+. The increment is 1 for byte-sized operands, 2 for 16-bit operands, and 4 for 32-bit operands.
Autodecrement mode: -(Ri), decrement first
        Move N,R1
        Move #NUM1,R2      (initialization)
        Clear R0
LOOP    Add (R2)+,R0
        Decrement R1
        Branch>0 LOOP
        Move R0,SUM
Figure 2.16. The Autoincrement addressing mode used in the program of Figure 2.12.
94
Assembly Language
95
Assembly Language
Machine instructions are represented by 0s and 1s
cool but
Symbolic names: Move, Add, Increment, Branch, ...
Mnemonics: MOV, ADD, INC, BRA, ...
A complete set of such symbolic names and rules
for their use constitute an assembly language.
The set of rules for using the mnemonics is called
the syntax of the language.
Assembler
Source program → object program
96
Assembly language
May or may not be case sensitive
Op code: binary pattern for the operation
Several examples:
ADD #5, R3
ADDI 5, R3
MOVE #5, (R2)
MOVEI 5, (R2)
97
Assembler Directives
Assembler directives (commands) allow the programmer to specify other information needed to translate the source program into the object program.
SUM EQU 20 / LOCA EQU $1000
Label Operation Operand(s) Comment
98
Sample Program
Address   Label   Contents
100               Move N,R1
104               Move #NUM1,R2
108               Clear R0
112       LOOP    Add (R2),R0
116               Add #4,R2
120               Decrement R1
124               Branch>0 LOOP
128               Move R0,SUM
132               ...
200       SUM
204       N       100
208       NUM1
212       NUM2
...
604       NUMn
Figure 2.17. Memory arrangement for the program in Figure 2.12.
Assembler has to know:
- How to interpret the names
- Where to place the instructions in the memory
- Where to place the data operands in the memory
99
Sample Program
                        Memory address                Addressing or
                        label           Operation     data information
Assembler directives    SUM             EQU           200
                                        ORIGIN        204
                        N               DATAWORD      100
                        NUM1            RESERVE       400
                                        ORIGIN        100
Statements that         START           MOVE          N,R1
generate machine                        MOVE          #NUM1,R2
instructions                            CLR           R0
                        LOOP            ADD           (R2),R0
                                        ADD           #4,R2
                                        DEC           R1
                                        BGTZ          LOOP
                                        MOVE          R0,SUM
Assembler directives                    RETURN
                                        END           START
100
Assembly and Execution of
Programs
Assembler's task (source → object)
Replace all symbols denoting operations and addressing modes
with the binary codes used in machine instructions
Replace all names and labels with their actual values
Assign addresses to instructions and data blocks (ORIGIN,
DATAWORD, RESERVE)
Determine the values that replace the names (EQU, label)
Branch address (Relative addressing, branch offset)
Scan through the source program, keep track of all names and the
numerical values that correspond to them in a symbol table.
What if a name appears as an operand before it is given a value (forward branch)?
Two-pass assembler
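The forward-reference problem can be sketched with a toy two-pass scheme: the first pass only assigns addresses to labels, and the second pass substitutes those addresses for label operands. The tuple format, the function names, and the mnemonic `BRANCH` are hypothetical; each statement is assumed to occupy one 4-byte word starting at address 100.

```python
def build_symbol_table(lines, origin=100, word_size=4):
    """Pass 1: record the address of every label in a symbol table."""
    symbols = {}
    address = origin
    for label, _op, _operand in lines:
        if label:
            symbols[label] = address
        address += word_size
    return symbols


def resolve(lines, symbols, origin=100, word_size=4):
    """Pass 2: replace operands that are labels with their addresses."""
    out = []
    address = origin
    for _label, op, operand in lines:
        out.append((address, op, symbols.get(operand, operand)))
        address += word_size
    return out


# (label, operation, operand); DONE is used before it is defined.
program = [
    (None, "BRANCH", "DONE"),
    ("LOOP", "DEC", "R1"),
    (None, "BGTZ", "LOOP"),
    ("DONE", "RETURN", None),
]
symbols = build_symbol_table(program)
resolved = resolve(program, symbols)
print(symbols)
print(resolved)
```

A single pass would reach the first statement before DONE has an address, which is why the names are collected first.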
101
Assembly and Execution of
Programs
Loader: loads the object program from the disk into memory
Must already be in memory
Knows the length of the program and the address in the memory where it will be stored
Branches to the first instruction to be executed
Debugger: lets the user find errors easily
Number notation: decimal, binary (%), hexadecimal/hex ($)
102
Basic Input/Output
Operations
103
I/O
The data on which the instructions operate
are not necessarily already stored in memory.
Data need to be transferred between
processor and outside world (disk, keyboard,
etc.)
I/O operations are essential, and the way they are performed can have a significant effect on the performance of the computer.
104
Program-Controlled I/O
Example
Read in character input from a keyboard and
produce character output on a display screen.
Rate of data transfer (keyboard, display, processor)
Difference in speed between processor and I/O device
creates the need for mechanisms to synchronize the
transfer of data.
A solution: on output, the processor sends the first
character and then waits for a signal from the display
that the character has been received. It then sends the
second character. Input is sent from the keyboard in a
similar way.
Program-Controlled I/O
Example
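Figure 2.19. Bus connection for processor, keyboard, and display.
[Figure: the processor, keyboard, and display are connected by a bus. The keyboard interface contains buffer register DATAIN and status flag SIN; the display interface contains buffer register DATAOUT and status flag SOUT.]
- Registers
- Flags
- Device interface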
Program-Controlled I/O
Example
Machine instructions that can check the state
of the status flags and transfer data:
READWAIT Branch to READWAIT if SIN = 0
Input from DATAIN to R1
WRITEWAIT Branch to WRITEWAIT if SOUT = 0
Output from R1 to DATAOUT
Program-Controlled I/O
Example
Memory-mapped I/O: some memory
address values are used to refer to peripheral
device buffer registers, so no special I/O
instructions are needed. Device status registers
are addressed in the same way.
READWAIT  Testbit   #3, INSTATUS
          Branch=0  READWAIT
          MoveByte  DATAIN, R1
Program-Controlled I/O
Example
        Move      #LOC,R0         Initialize pointer register R0 to point to the
                                  address of the first location in memory
                                  where the characters are to be stored.
READ    TestBit   #3,INSTATUS     Wait for a character to be entered
        Branch=0  READ            in the keyboard buffer DATAIN.
        MoveByte  DATAIN,(R0)     Transfer the character from DATAIN into
                                  the memory (this clears SIN to 0).
ECHO    TestBit   #3,OUTSTATUS    Wait for the display to become ready.
        Branch=0  ECHO
        MoveByte  (R0),DATAOUT    Move the character just read to the display
                                  buffer register (this clears SOUT to 0).
        Compare   #CR,(R0)+       Check if the character just read is CR
                                  (carriage return). If it is not CR, then
        Branch≠0  READ            branch back and read another character.
                                  Also, increment the pointer to store the
                                  next character.
Figure 2.20. A program that reads a line of characters and displays it.
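The read-and-echo loop of Figure 2.20 can be mimicked in software. The sketch below is only an illustration: the `Keyboard` and `Display` classes are hypothetical device models whose `sin` and `sout` attributes stand in for the SIN and SOUT status flags, and the display is assumed to be always ready.

```python
from collections import deque

class Keyboard:
    """Hypothetical device model: SIN is 1 while DATAIN holds an unread character."""
    def __init__(self, text):
        self.pending = deque(text)

    @property
    def sin(self):
        return 1 if self.pending else 0

    def read_datain(self):               # reading DATAIN consumes the character
        return self.pending.popleft()

class Display:
    """Hypothetical device model: SOUT is 1 when DATAOUT can accept a character."""
    def __init__(self):
        self.shown = []
        self.sout = 1                    # assumed always ready in this sketch

    def write_dataout(self, ch):
        self.shown.append(ch)

def read_and_echo_line(keyboard, display, cr="\r"):
    line = []
    while True:
        while keyboard.sin == 0:         # READ wait loop (busy waiting)
            pass
        ch = keyboard.read_datain()
        line.append(ch)
        while display.sout == 0:         # ECHO wait loop (busy waiting)
            pass
        display.write_dataout(ch)
        if ch == cr:                     # stop after the carriage return
            return "".join(line)
```

Both inner `while` loops do nothing useful while they spin, which is the wasted processor time that motivates interrupts.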
Program-Controlled I/O
Example
Assumption: the initial state of SIN is 0 and the
initial state of SOUT is 1.
Any drawback of this mechanism in terms of
efficiency?
Two wait loops: processor execution time is wasted
Alternate solution?
Interrupts
Stacks and Queues
Stacks
A list of data elements, usually words or
bytes, with the accessing restriction that
elements can be added or removed at one
end of the list only.
Top / Bottom / Pushdown Stack
Last-In-First-Out (LIFO)
Push / Pop
Stacks
Figure 2.21. A stack of words in the memory.
[Figure: memory addresses run from 0 to 2^k - 1. The stack pointer register SP points to the current top element (-28); below it are 17 and 739, down to the bottom element 43 at address BOTTOM.]
Stacks
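Stack Pointer (SP): a processor register that holds the address of the current top element
Push
  Subtract #4, SP
  Move NEWITEM, (SP)
or, using the Autodecrement mode:
  Move NEWITEM, -(SP)
Pop
  Move (SP), ITEM
  Add #4, SP
or, using the Autoincrement mode:
  Move (SP)+, ITEM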
Stacks
Figure 2.22. Effect of stack operations on the stack in Figure 2.21.
[Figure: (a) After push from NEWITEM: NEWITEM holds 19, SP points to the new top element 19, which sits above -28, 17, 739, ..., 43. (b) After pop into ITEM: ITEM holds -28, and SP points to 17, which sits above 739, ..., 43.]
Stacks
The size of the stack in a program is fixed in
memory.
Need to avoid pushing if the maximum size is
reached
Need to avoid popping if the stack is empty
Compare instruction
Compare src, dst
performs [dst] - [src] and sets the condition code flags.
It does not change the values of src and dst.
Stacks
SAFEPOP   Compare   #2000,SP       Check to see if the stack pointer contains
          Branch>0  EMPTYERROR     an address value greater than 2000. If it
                                   does, the stack is empty. Branch to the
                                   routine EMPTYERROR for appropriate
                                   action.
          Move      (SP)+,ITEM     Otherwise, pop the top of the stack into
                                   memory location ITEM.
(a) Routine for a safe pop operation
SAFEPUSH  Compare   #1500,SP       Check to see if the stack pointer
          Branch≤0  FULLERROR      contains an address value equal
                                   to or less than 1500. If it does, the
                                   stack is full. Branch to the routine
                                   FULLERROR for appropriate action.
          Move      NEWITEM,-(SP)  Otherwise, push the element in memory
                                   location NEWITEM onto the stack.
(b) Routine for a safe push operation
Figure 2.23. Checking for empty and full errors in pop and push operations.
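The safe push and pop routines can be modeled with a dictionary standing in for memory. This is a sketch under stated assumptions: the class name `Stack`, the 4-byte word size, and the use of Python exceptions in place of the EMPTYERROR and FULLERROR routines are illustrative choices; the address limits 1500 and 2000 come from Figure 2.23.

```python
WORD = 4
TOP_LIMIT, BOTTOM = 1500, 2000       # address limits taken from Figure 2.23

class Stack:
    def __init__(self):
        self.memory = {}             # address -> word (hypothetical memory model)
        self.sp = BOTTOM + WORD      # empty stack: SP is one word past BOTTOM

    def safe_push(self, item):
        if self.sp <= TOP_LIMIT:     # Compare #1500,SP ; Branch<=0 FULLERROR
            raise OverflowError("stack full")
        self.sp -= WORD              # Move NEWITEM,-(SP)
        self.memory[self.sp] = item

    def safe_pop(self):
        if self.sp > BOTTOM:         # Compare #2000,SP ; Branch>0 EMPTYERROR
            raise IndexError("stack empty")
        item = self.memory[self.sp]  # Move (SP)+,ITEM
        self.sp += WORD
        return item
```

The stack grows toward lower addresses, so a push decrements SP before storing and a pop reads before incrementing.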
Queues
Data are stored in and retrieved from a queue
on a First-In-First-Out (FIFO) basis.
New data are added at the back (high-address
end) and retrieved from the front
(low-address end).
How many pointers are needed for stack and
queue, respectively?
1 (top) for a stack, 2 (front and back) for a queue
Circular buffer: both pointers wrap around within a fixed region of memory
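A circular buffer keeps the two queue pointers inside a fixed block of storage. The following is a minimal sketch; the class name `CircularQueue` and the extra `count` field used to tell a full queue from an empty one are assumptions of this example, not something the slides specify.

```python
class CircularQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.front = 0      # index of the next item to remove
        self.back = 0       # index where the next item will be stored
        self.count = 0      # distinguishes a full queue from an empty one

    def enqueue(self, item):
        if self.count == len(self.buf):
            raise OverflowError("queue full")
        self.buf[self.back] = item
        self.back = (self.back + 1) % len(self.buf)    # wrap around
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue empty")
        item = self.buf[self.front]
        self.front = (self.front + 1) % len(self.buf)  # wrap around
        self.count -= 1
        return item
```

Without the wrap-around, both pointers would keep moving toward higher addresses and the queue would drift through memory.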
Subroutines
Subroutines
It is often necessary to perform a particular subtask (subroutine)
many times on different data values.
To save space, only one copy of the instructions that constitute
the subroutine is placed in the memory.
Any program that requires the use of the subroutine simply
branches to its starting location (Call).
After a subroutine has been executed, it is said to return to the
program that called the subroutine, and the program resumes
execution. (Return)
[Figure: the calling program has "Call SUB" at memory location 200 and the next instruction at 204. Subroutine SUB has its first instruction at memory location 1000 and ends with Return.]
Subroutines
Since the subroutine may be called from different
places in a calling program, provision must be made
for returning to the appropriate location.
Subroutine Linkage method: use link register to
store the PC.
Call instruction
Store the contents of the PC in the link register
Branch to the target address specified by the instruction
Return instruction
Branch to the address contained in the link register
Subroutines
Figure 2.24. Subroutine linkage using a link register.
[Figure: the calling program has "Call SUB" at memory location 200 and the next instruction at 204; subroutine SUB starts at 1000 and ends with Return. On Call, the PC changes from 204 to 1000 and the link register receives 204. On Return, the PC is reloaded with 204 from the link register.]
Subroutine Nesting and The
Processor Stack
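If a subroutine calls another subroutine, the
contents of the link register will be overwritten.
If subroutine A calls B and B calls C, then after C has
been executed, the PC should return to B,
then to A.
Return addresses are generated and used in LIFO order, so a stack fits.
The Call instruction pushes the return address onto the processor stack automatically;
the Return instruction pops it into the PC.
[Figure: processor stack with SP pointing to the return location in B, stored above the return location in A.]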
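Nested calls can be traced with a toy model in which Call pushes the return address and Return pops it. The `Processor` class and the particular addresses used below are hypothetical and serve only to show the LIFO behaviour.

```python
class Processor:
    """Toy model: only the PC and the return addresses on the stack are simulated."""
    def __init__(self):
        self.pc = 0
        self.stack = []                        # processor stack; top is the list end

    def call(self, target, return_address):
        self.stack.append(return_address)      # Call pushes the return address
        self.pc = target

    def ret(self):
        self.pc = self.stack.pop()             # Return pops it into the PC

cpu = Processor()
cpu.call(1000, 204)     # main program calls A
cpu.call(2000, 1008)    # A calls B
cpu.call(3000, 2012)    # B calls C
cpu.ret()               # C returns into B
back_in_b = cpu.pc
cpu.ret()               # B returns into A
back_in_a = cpu.pc
cpu.ret()               # A returns into the main program
back_in_main = cpu.pc
```

A single link register could hold only the most recent of the three return addresses; the stack keeps all of them in the order they are needed.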
Parameter Passing
Exchange of information between a calling
program and a subroutine.
Several ways:
Through registers
Through memory locations
Through stack
Passing Parameters through
Processor Registers
Calling program
          Move       N,R1        R1 serves as a counter.
          Move       #NUM1,R2    R2 points to the list.
          Call       LISTADD     Call subroutine.
          Move       R0,SUM      Save result.
          ...
Subroutine
LISTADD   Clear      R0          Initialize sum to 0.
LOOP      Add        (R2)+,R0    Add entry from list.
          Decrement  R1
          Branch>0   LOOP
          Return                 Return to calling program.
Figure 2.25. Program of Figure 2.16 written as a subroutine; parameters passed through registers.
Passing Parameters through
Stack
Assume top of stack is at level 1 below.
          Move          #NUM1,-(SP)    Push parameters onto stack.
          Move          N,-(SP)
          Call          LISTADD        Call subroutine
                                       (top of stack at level 2).
          Move          4(SP),SUM      Save result.
          Add           #8,SP          Restore top of stack
                                       (top of stack at level 1).
          ...
LISTADD   MoveMultiple  R0-R2,-(SP)    Save registers
                                       (top of stack at level 3).
          Move          16(SP),R1      Initialize counter to n.
          Move          20(SP),R2      Initialize pointer to the list.
          Clear         R0             Initialize sum to 0.
LOOP      Add           (R2)+,R0       Add entry from list.
          Decrement     R1
          Branch>0      LOOP
          Move          R0,20(SP)      Put result on the stack.
          MoveMultiple  (SP)+,R0-R2    Restore registers.
          Return                       Return to calling program.
(a) Calling program and subroutine

Level 3 ->  [R0]
            [R1]
            [R2]
Level 2 ->  Return address
            n                 (passed by value)
            NUM1              (passed by reference)
Level 1 ->
(b) Top of stack at various times
Figure 2.26. Program of Figure 2.16 written as a subroutine; parameters passed on the stack.
The Stack Frame
Some stack locations constitute a private
work space for a subroutine, created at the
time the subroutine is entered and freed up
when the subroutine returns control to the
calling program. Such space is called a stack
frame.
Frame pointer (FP)
Index addressing to access data inside the frame:
-4(FP), 8(FP), ...
Additional
Instructions
Logic Instructions
AND
OR
NOT (what is the purpose of the following?)
Not R0
Add #1, R0
Determine if the leftmost character of the four ASCII
characters stored in the 32-bit register R0 is Z
(01011010)
And #$FF000000, R0
Compare #$5A000000, R0
Branch=0 YES
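Both instruction sequences above can be checked with ordinary integer arithmetic. In this sketch the function names `negate` and `leftmost_char_is_z` are made up for illustration, and a 32-bit register is modeled by masking with `0xFFFFFFFF`.

```python
MASK32 = 0xFFFFFFFF

def negate(r0):
    """Not R0 ; Add #1,R0 forms the 2's complement (negation) of a 32-bit value."""
    return ((~r0 & MASK32) + 1) & MASK32

def leftmost_char_is_z(r0):
    """And #$FF000000,R0 ; Compare #$5A000000,R0 ; Branch=0 YES."""
    return (r0 & 0xFF000000) == 0x5A000000
```

For example, `negate(5)` yields `0xFFFFFFFB`, the 32-bit pattern for -5, and the mask-and-compare test looks only at the high-order byte of the register.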
Logical Shifts
Logical shift: shifting left (LShiftL) and shifting right
(LShiftR)
[Figure: (a) Logical shift left, LShiftL #2,R0: the bits of R0 move two positions left through the carry flag C, and 0s enter on the right. (b) Logical shift right, LShiftR #2,R0: the bits move two positions right into C, and 0s enter on the left. Each panel shows C and R0 before and after the shift.]
Logical Shifts
Two decimal digits represented in ASCII code are located at LOC
and LOC+1. Pack these two digits in a single byte location PACKED.
Extract the low-order four bits in LOC and LOC+1, and concatenate
them into the single byte at PACKED. Example: A1, B212
        Move      #LOC,R0       R0 points to data.
        MoveByte  (R0)+,R1      Load first byte into R1.
        LShiftL   #4,R1         Shift left by 4 bit positions.
        MoveByte  (R0),R2       Load second byte into R2.
        And       #$F,R2        Eliminate high-order bits.
        Or        R1,R2         Concatenate the BCD digits.
        MoveByte  R2,PACKED     Store the result.
Figure 2.31. A routine that packs two BCD digits.
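The packing routine of Figure 2.31 maps directly onto shift, AND, and OR operators. The function name `pack_bcd` is an illustrative choice; the final mask models the fact that MoveByte stores only the low-order byte.

```python
def pack_bcd(first, second):
    """Pack two ASCII decimal digits into one byte, following Figure 2.31."""
    r1 = ord(first) << 4        # LShiftL #4,R1
    r2 = ord(second) & 0xF      # And #$F,R2 removes the high-order bits
    r2 |= r1                    # Or R1,R2 concatenates the BCD digits
    return r2 & 0xFF            # MoveByte stores only the low-order byte
```

The high-order four bits of the first ASCII code are shifted out of the byte, so they never need to be masked explicitly.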
Arithmetic Shifts
[Figure: (c) Arithmetic shift right, AShiftR #2,R0: the bits of R0 move two positions right into the carry flag C, and the sign bit is repeated in the vacated positions on the left. The panel shows R0 and C before and after the shift.]
Rotate
Figure 2.32. Rotate instructions.
[Figure: four panels, each showing the carry flag C and register R0 before and after: (a) Rotate left without carry, RotateL #2,R0. (b) Rotate left with carry, RotateLC #2,R0. (c) Rotate right without carry, RotateR #2,R0. (d) Rotate right with carry, RotateRC #2,R0.]
Multiplication and Division
Not very popular (especially division)
Multiply Ri, Rj
  Rj ← [Ri] × [Rj]
  2n-bit product case: high-order half in R(j+1)
Divide Ri, Rj
  Rj ← [Ri] / [Rj]
  Quotient is in Rj, remainder may be placed in R(j+1)
Example Programs
Vector Dot Product Program
        Move       #AVEC,R1      R1 points to vector A.
        Move       #BVEC,R2      R2 points to vector B.
        Move       N,R3          R3 serves as a counter.
        Clear      R0            R0 accumulates the dot product.
LOOP    Move       (R1)+,R4      Compute the product of
        Multiply   (R2)+,R4      next components.
        Add        R4,R0         Add to previous sum.
        Decrement  R3            Decrement the counter.
        Branch>0   LOOP          Loop again if not done.
        Move       R0,DOTPROD    Store dot product in memory.
Figure 2.33. A program for computing the dot product of two vectors.
$\text{Dot Product} = \sum_{i=0}^{n-1} A(i) \times B(i)$
Byte-Sorting Program
Sort a list of bytes stored in memory into
ascending alphabetic order.
The list consists of n bytes, not necessarily
distinct, stored from memory location LIST to
LIST+n-1. Each byte contains the ASCII code
for a character from A to Z. The first bit is 0.
Straight-selection algorithm (compare and
swap)
Byte-Sorting Program
for (j = n - 1; j > 0; j = j - 1)
  { for (k = j - 1; k >= 0; k = k - 1)
      { if (LIST[k] > LIST[j])
          { TEMP = LIST[k];
            LIST[k] = LIST[j];
            LIST[j] = TEMP;
          }
      }
  }
(a) C-language program for sorting
          Move         #LIST,R0      Load LIST into base register R0.
          Move         N,R1          Initialize outer loop index
          Subtract     #1,R1         register R1 to j = n - 1.
OUTER     Move         R1,R2         Initialize inner loop index
          Subtract     #1,R2         register R2 to k = j - 1.
          MoveByte     (R0,R1),R3    Load LIST(j) into R3, which holds
                                     current maximum in sublist.
INNER     CompareByte  R3,(R0,R2)    If LIST(k) ≤ [R3],
          Branch≤0     NEXT          do not exchange.
          MoveByte     (R0,R2),R4    Otherwise, exchange LIST(k)
          MoveByte     R3,(R0,R2)    with LIST(j) and load
          MoveByte     R4,(R0,R1)    new maximum into R3.
          MoveByte     R4,R3         Register R4 serves as TEMP.
NEXT      Decrement    R2            Decrement index registers R2 and
          Branch≥0     INNER         R1, which also serve
          Decrement    R1            as loop counters, and branch
          Branch>0     OUTER         back if loops not finished.
(b) Assembly language program for sorting
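The same straight-selection loops can be run directly for checking. This sketch mirrors the C version index for index; the function name `sort_bytes` and the use of a `bytearray` to stand in for the byte list at LIST are assumptions of the example.

```python
def sort_bytes(data):
    """Straight-selection sort of a byte list into ascending order."""
    lst = bytearray(data)
    n = len(lst)
    for j in range(n - 1, 0, -1):            # outer loop: j = n-1 down to 1
        for k in range(j - 1, -1, -1):       # inner loop: k = j-1 down to 0
            if lst[k] > lst[j]:              # larger byte moves to position j
                lst[k], lst[j] = lst[j], lst[k]
    return bytes(lst)
```

After each pass of the outer loop, position j holds the largest byte of the sublist LIST[0..j], so the sorted region grows from the high-address end.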
Byte-Sorting Program
The list must have at least two elements because
the check for loop termination is done at the end of
each loop.
If the machine instruction set allows a move
operation from one memory location directly to
another memory location, the four-instruction exchange
  MoveByte (R0, R2), R4
  MoveByte R3, (R0, R2)
  MoveByte R4, (R0, R1)
  MoveByte R4, R3
can be replaced by three instructions:
  MoveByte (R0, R2), (R0, R1)
  MoveByte R3, (R0, R2)
  MoveByte (R0, R1), R3
Linked List
Many nonnumeric application programs
require that an ordered list of information
items be represented and stored in memory
in such a way that it is easy to add items to
the list or to delete items from the list at ANY
position while maintaining the desired order
of items.
Different from Stacks or Queues.
Problem Illustration
Maintain this list of records in consecutive memory locations in some contiguous block of memory, in increasing order of student ID numbers.
What if a student withdraws from the course so that an empty record slot is created?
What if another student registers in the course?
[Figure: location N holds n, the number of students. The records start at LIST and each occupies four words. Student 1: Student ID at LIST, Test 1 at LIST + 4, Test 2 at LIST + 8, Test 3 at LIST + 12. Student 2: Student ID at LIST + 16, followed by its Test 1, Test 2, and Test 3.]
Linked List
Each record still occupies a consecutive four-word block in the memory.
But successive records in the order do not necessarily occupy
consecutive blocks in the memory address space.
Each record contains an address value in a one-word link field that
specifies the location of the next record in order to enable connecting
the blocks together to form the ordered list.
[Figure: (a) Linking structure: Record 1 (Head) links to Record 2, and so on to Record k (Tail), whose link address is 0. (b) Inserting a new record between Record 1 and Record 2: Record 1 links to the new record, and the new record links to Record 2.]
List in Memory
Figure 2.36. A list of student test scores organized as a linked list in memory.
Record                 Memory     Key field    Link field   Data field
                       address    (ID)                      (Test scores)
                                  1 word       1 word       3 words
First record (Head)    2320       27243        1040         ...
Second record          1040       28106        1200         ...
Third record           1200       28370        2880         ...
...
Second last record     2720       40632        1280         ...
Last record (Tail)     1280       47871        0            ...
The head pointer holds the address of the first record.
Insertion of a New Record
Suppose that the ID number of the new record is 28241, and the next available free record block is at address 2960.
What are the possibilities for the new record's position in the list?
One-entry list
New head
Interior position
New tail
INSERTION  Compare   #0, RHEAD
           Branch>0  HEAD                    (list not empty)
           Move      RNEWREC, RHEAD          new record becomes a
           Return                            one-entry list
HEAD       Compare   (RHEAD), (RNEWREC)
           Branch>0  SEARCH                  insert new record somewhere after current head
           Move      RHEAD, 4(RNEWREC)       new record becomes
           Move      RNEWREC, RHEAD          new head
           Return
SEARCH     Move      RHEAD, RCURRENT
LOOP       Move      4(RCURRENT), RNEXT
           Compare   #0, RNEXT
           Branch=0  TAIL                    new record becomes new tail
           Compare   (RNEXT), (RNEWREC)
           Branch<0  INSERT                  insert new record in an interior position
           Move      RNEXT, RCURRENT
           Branch    LOOP
INSERT     Move      RNEXT, 4(RNEWREC)
TAIL       Move      RNEWREC, 4(RCURRENT)
           Return
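The four insertion cases can be exercised with a small model of memory. This is a sketch, not the slide's subroutine: `memory` is a hypothetical dictionary mapping a record address to `[key, link]`, the test-score fields are omitted, and the three-record list below is a shortened version of the list in Figure 2.36.

```python
def insert(memory, head, new_addr):
    """Insert the record at new_addr in key order; return the (possibly new) head."""
    new_key = memory[new_addr][0]
    if head == 0:                               # empty list: one-entry list
        return new_addr
    if not new_key > memory[head][0]:           # new record becomes new head
        memory[new_addr][1] = head
        return new_addr
    current = head
    while True:                                 # SEARCH / LOOP
        nxt = memory[current][1]
        if nxt == 0 or new_key < memory[nxt][0]:
            memory[new_addr][1] = nxt           # interior position, or 0 for new tail
            memory[current][1] = new_addr
            return head
        current = nxt

def keys(memory, head):
    """Walk the links from the head and collect the key fields in list order."""
    out = []
    while head != 0:
        out.append(memory[head][0])
        head = memory[head][1]
    return out

# Shortened list (an assumption): three records, then the new record at 2960.
memory = {2320: [27243, 1040], 1040: [28106, 1200], 1200: [28370, 0],
          2960: [28241, 0]}
head = insert(memory, 2320, 2960)
```

Only two link fields change for an interior insertion: the new record's link and the link of the record just before it. No record is moved in memory.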
Deletion of a Record
DELETION  Compare   (RHEAD), RIDNUM
          Branch>0  SEARCH                  (not the head record)
          Move      4(RHEAD), RHEAD
          Return
SEARCH    Move      RHEAD, RCURRENT
LOOP      Move      4(RCURRENT), RNEXT
          Compare   (RNEXT), RIDNUM
          Branch=0  DELETE
          Move      RNEXT, RCURRENT
          Branch    LOOP
DELETE    Move      4(RNEXT), RTEMP
          Move      RTEMP, 4(RCURRENT)
          Return
Figure 2.38. A subroutine for deleting a record from a linked list. Any problem?
Encoding of Machine
Instructions
Encoding of Machine
Instructions
An assembly language program needs to be converted into machine
instructions. (ADD = 0100 in ARM instruction set)
In the previous section, an assumption was made that all
instructions are one word in length.
OP code: the type of operation to be performed and the type of
operands used may be specified using an encoded binary pattern
Suppose a 32-bit word length, an 8-bit OP code (how many instructions
can we have? 2^8 = 256), 16 registers in total (how many bits? 4), and a 3-bit
addressing mode indicator.
Add R1, R2
Move 24(R0), R5
LShiftR #2, R0
Move #$3A, R1
Branch>0 LOOP
(a) One-word instruction
Field:  OP code | Source | Dest | Other info
Bits:   8       | 7      | 7    | 10
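The one-word format can be tried out by packing the four fields into a 32-bit integer. The field widths (8, 7, 7, 10) come from the slide; the ordering of the 3-bit mode and 4-bit register number inside each 7-bit field, the OP code value used in the example, and the function names `encode` and `decode` are assumptions made for illustration.

```python
def encode(opcode, src_mode, src_reg, dst_mode, dst_reg, other=0):
    """Pack OP code (8), source (7), destination (7), other info (10) into 32 bits."""
    assert 0 <= opcode < 2**8 and 0 <= other < 2**10
    assert 0 <= src_mode < 8 and 0 <= dst_mode < 8
    assert 0 <= src_reg < 16 and 0 <= dst_reg < 16
    src = (src_mode << 4) | src_reg      # mode-then-register layout is an assumption
    dst = (dst_mode << 4) | dst_reg
    return (opcode << 24) | (src << 17) | (dst << 10) | other

def decode(word):
    """Split a 32-bit word back into (opcode, source, destination, other)."""
    return ((word >> 24) & 0xFF, (word >> 17) & 0x7F,
            (word >> 10) & 0x7F, word & 0x3FF)

word = encode(0x04, 0, 1, 0, 2)          # hypothetical encoding of "Add R1, R2"
```

The 10-bit "other info" field is what would hold, for example, the index value 24 in `Move 24(R0), R5` or the shift count in `LShiftR #2, R0`.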
Encoding of Machine
Instructions
What happens if we want to specify a memory
operand using the Absolute addressing mode?
Move R2, LOC
Only 14 bits remain for LOC, which is insufficient for a full 32-bit address
Solution: use two words
(b) Two-word instruction
Word 1:  OP code | Source | Dest | Other info
Word 2:  Memory address/Immediate operand
Encoding of Machine
Instructions
What about an instruction in which two operands
are specified using the Absolute addressing
mode?
Move LOC1, LOC2
Solution: use two additional words
This approach results in instructions of variable
length. Complex instructions can be implemented,
closely resembling operations in high-level
programming languages: Complex Instruction Set
Computer (CISC)
Encoding of Machine
Instructions
If we insist that all instructions must fit into a single
32-bit word, it is not possible to provide a 32-bit
address or a 32-bit immediate operand within the
instruction.
It is still possible to define a highly functional
instruction set, which makes extensive use of the
processor registers.
Add R1, R2 ----- yes
Add LOC, R2 ----- no
Add (R3), R2 ----- yes
Encoding of Machine
Instructions
How to load a 32-bit address into a register that
serves as a pointer to memory locations?
Solution #1: direct the assembler to place the
desired address in a word location in a data area
close to the program, so that the Relative
addressing mode can be used to load the address.
Solution #2: use logical and shift instructions to
construct the desired 32-bit address by giving it in
parts that are small enough to be specifiable using
the Immediate addressing mode.
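Solution #2 amounts to a load, a shift, and an OR. The sketch below assumes, purely for illustration, that immediates of up to 16 bits fit in one instruction; the function name `build_address` and the instruction sequence in the comments are hypothetical.

```python
def build_address(high16, low16):
    """Build a 32-bit value from two 16-bit immediate parts."""
    r = high16 & 0xFFFF            # Move   #high16,R   (assumed 16-bit immediate)
    r = (r << 16) & 0xFFFFFFFF     # LShiftL #16,R
    r |= low16 & 0xFFFF            # Or     #low16,R
    return r
```

For example, `build_address(0x1234, 0xABCD)` produces `0x1234ABCD` using three one-word instructions instead of one two-word instruction.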
Encoding of Machine
Instructions
An instruction must occupy only one word:
Reduced Instruction Set Computer (RISC)
Other restrictions
(c) Three-operand instruction
OP code | Ri | Rj | Rk | Other info
Unit 2
ALU Design
Outline
A basic operation in all digital computers is
the addition or subtraction of two numbers.
ALU: AND, OR, NOT, XOR
Unsigned/signed numbers
Addition/subtraction
Multiplication
Division
Floating-point number operations
Adders
Addition of Unsigned Numbers
Half Adder
(a) The four possible cases (carry c, sum s): 0 + 0 = 0 0, 0 + 1 = 0 1, 1 + 0 = 0 1, 1 + 1 = 1 0
(b) Truth table
x  y | Carry c  Sum s
0  0 |    0       0
0  1 |    0       1
1  0 |    0       1
1  1 |    1       0
(c) Circuit: $s = x \oplus y$, $c = x y$
(d) Graphical symbol: block HA with inputs x, y and outputs s, c
Addition and Subtraction of
Signed Numbers
x_i  y_i  Carry-in c_i | Sum s_i  Carry-out c_{i+1}
 0    0       0        |    0         0
 0    0       1        |    1         0
 0    1       0        |    1         0
 0    1       1        |    0         1
 1    0       0        |    1         0
 1    0       1        |    0         1
 1    1       0        |    0         1
 1    1       1        |    1         1
$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$
Example: X = 0111 (7), Y = 0110 (6), Z = X + Y = 1101 (13), with carries $c_4 c_3 c_2 c_1 c_0$ = 0 1 1 0 0.
Legend for stage i: inputs $x_i$, $y_i$, carry-in $c_i$; outputs sum $s_i$, carry-out $c_{i+1}$.
Figure 6.1. Logic specification for a stage of binary addition.
Addition and Subtraction of
Signed Numbers
A full adder (FA)
(a) Logic for a single stage
[Figure: gate network with inputs $x_i$, $y_i$, $c_i$ that forms $s_i$ and $c_{i+1}$, and the block symbol "Full adder (FA)" with the same inputs and outputs.]
Addition and Subtraction of
Signed Numbers
n-bit ripple-carry adder
Overflow?
(b) An n-bit ripple-carry adder
[Figure: n full adders in a chain. The stage at the least significant bit (LSB) position takes $x_0$, $y_0$, $c_0$ and produces $s_0$, $c_1$; each carry-out feeds the next stage; the stage at the most significant bit (MSB) position takes $x_{n-1}$, $y_{n-1}$, $c_{n-1}$ and produces $s_{n-1}$, $c_n$.]
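The full-adder equations and the ripple-carry chain can be simulated bit by bit. In this sketch the function names and the choice to list bits least significant first are assumptions of the example.

```python
def full_adder(x, y, c):
    """One stage: s = x xor y xor c, carry-out = xy + xc + yc."""
    s = x ^ y ^ c
    c_out = (x & y) | (x & c) | (y & c)
    return s, c_out

def ripple_carry_add(x_bits, y_bits, c0=0):
    """Bits are listed LSB first; returns (sum bits LSB first, carry-out c_n)."""
    carry = c0
    sums = []
    for x, y in zip(x_bits, y_bits):       # the carry ripples from stage to stage
        s, carry = full_adder(x, y, carry)
        sums.append(s)
    return sums, carry
```

Adding 7 (`[1, 1, 1, 0]`) and 6 (`[0, 1, 1, 0]`) gives `[1, 0, 1, 1]`, that is 1101 = 13, with carry-out 0. Each stage must wait for the carry from the stage before it, which is the source of the long critical path.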
Addition and Subtraction of
Signed Numbers
kn-bit ripple-carry adder
(c) Cascade of k n-bit adders
[Figure: k n-bit adders in a chain. The first adder takes $x_{n-1} \ldots x_0$, $y_{n-1} \ldots y_0$, $c_0$ and produces $s_{n-1} \ldots s_0$ and $c_n$; the second handles bits $2n-1 \ldots n$; the last handles bits $kn-1 \ldots (k-1)n$ and produces $c_{kn}$.]
Figure 6.2. Logic for addition of binary vectors.
Addition and Subtraction of
Signed Numbers
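Addition/subtraction logic unit
[Figure: an n-bit adder with inputs $x_{n-1} \ldots x_0$ and $y_{n-1} \ldots y_0$, outputs $s_{n-1} \ldots s_0$ and $c_n$. The Add/Sub control line is combined with each $y_i$ before it enters the adder and also drives the carry-in $c_0$.]
Figure 6.3. Binary addition-subtraction logic network.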
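An addition/subtraction unit of this kind can be modeled on integers. The sketch assumes the usual arrangement in which each y bit is XORed with the Add/Sub control and the control also supplies c0, so that subtraction adds the 2's complement of Y; the function name `add_sub` and the overflow expression are illustrative.

```python
def add_sub(x, y, sub, n=4):
    """Add/Sub control 0 adds, 1 subtracts. Returns (result, carry-out, overflow)."""
    mask = (1 << n) - 1
    x &= mask
    y_in = (y ^ (mask if sub else 0)) & mask   # each y bit XORed with the control
    total = x + y_in + (1 if sub else 0)       # control also drives c0
    result = total & mask
    carry_out = total >> n
    sign = 1 << (n - 1)
    # signed overflow: adder inputs have equal signs, but the sum's sign differs
    overflow = bool(~(x ^ y_in) & (x ^ result) & sign)
    return result, carry_out, overflow
```

With 4-bit operands, `add_sub(5, 3, 1)` returns `(2, 1, False)`, while `add_sub(7, 6, 0)` returns `(13, 0, True)` because +13 does not fit in a 4-bit signed number.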
Make Addition Faster
Ripple-Carry Adder (RCA)
Straight-forward design
Simple circuit structure
Easy to understand
Most power efficient
Slowest (too long critical path)
Adders
We can view addition in terms of generate,
G[i], and propagate, P[i].
Carry-lookahead Logic
Carry Generate $G_i = A_i B_i$: must generate carry when A = B = 1
Carry Propagate $P_i = A_i \oplus B_i$: carry-in will equal carry-out here
Sum and Carry can be reexpressed in terms of generate/propagate/$C_i$:
$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$
$C_{i+1} = A_i B_i + A_i C_i + B_i C_i$
$= A_i B_i + C_i (A_i + B_i)$
$= A_i B_i + C_i (A_i \oplus B_i)$
$= G_i + C_i P_i$
Carry-lookahead Logic
Reexpress the carry logic as follows:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
Each of the carry equations can be implemented in a two-level logic
network
Variables are the adder inputs and carry in to stage 0!
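The expanded carry equations can be checked against ordinary addition. This sketch evaluates every carry directly from the G, P, and C0 values, as a two-level network would; the function name `cla_4bit` and the LSB-first bit lists are assumptions of the example.

```python
def cla_4bit(a, b, c0=0):
    """a and b are 4-bit lists, LSB first. Returns (sum bits, [C1, C2, C3, C4])."""
    g = [ai & bi for ai, bi in zip(a, b)]      # generate:  G_i = A_i B_i
    p = [ai ^ bi for ai, bi in zip(a, b)]      # propagate: P_i = A_i xor B_i
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0] | p[3] & p[2] & p[1] & p[0] & c0)
    carries = [c0, c1, c2, c3]
    s = [p[i] ^ carries[i] for i in range(4)]  # S_i = P_i xor C_i
    return s, [c1, c2, c3, c4]
```

No carry expression refers to another carry other than C0, so all four carries become available at the same time instead of rippling.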
Carry-lookahead
Implementation
Adder with Propagate and Generate Outputs
[Figure: from inputs Ai, Bi, and Ci, one adder stage produces Pi @ 1 gate delay, Gi @ 1 gate delay, and Si @ 2 gate delays.]
Increasingly complex logic
[Figure: two-level AND-OR networks that form C1, C2, C3, and C4 from G0 to G3, P0 to P3, and C0.]
Carry-lookahead Logic
Cascaded Carry Lookahead
Carry lookahead logic generates individual carries, so sums are computed much faster
[Figure: four adder stages with inputs A0 B0 to A3 B3 and carry-in C0. Arrival times in gate delays: S0 @2; C1 @3, S1 @4; C2 @3, S2 @4; C3 @3, S3 @4; C4 @3.]
Carry-lookahead Logic
Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4 b).
[Figure: four 4-bit adders for bits 3-0, 7-4, 11-8, and 15-12, with inputs x and y, carry-ins $c_0$, $c_4$, $c_8$, $c_{12}$, and sum outputs $s_{3-0}$, $s_{7-4}$, $s_{11-8}$, $s_{15-12}$. Each adder supplies first-level outputs $G_k^I$ and $P_k^I$ (k = 0 to 3) to a carry-lookahead logic block, which produces $c_4$, $c_8$, $c_{12}$, $c_{16}$ and the second-level outputs $G_0^{II}$ and $P_0^{II}$.]
Carry-lookahead Logic
4-bit adders with internal carry lookahead
Second-level carry lookahead unit extends lookahead to 16 bits
Group Propagate $P = P_3 P_2 P_1 P_0$
Group Generate $G = G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1$
[Figure: four 4-bit adders with inputs A[3-0] B[3-0], A[7-4] B[7-4], A[11-8] B[11-8], A[15-12] B[15-12], sum outputs S[3-0] to S[15-12], and group outputs P and G. A Lookahead Carry Unit takes C0 and the four P, G pairs and produces C4, C8, C12, C16 together with P 3-0 and G 3-0. Signals are annotated with arrival times in gate delays, from @0 for C0 up to @8 for the last sum bits.]
Unsigned
Multiplication
Manual Multiplication
Algorithm
            1 1 0 1     (13) Multiplicand M
          × 1 0 1 1     (11) Multiplier Q
          ---------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
    ---------------
    1 0 0 0 1 1 1 1     (143) Product P
(a) Manual multiplication algorithm
Array Multiplication
(b) Array implementation
[Figure: the multiplicand bits $m_3 m_2 m_1 m_0$ run across the array and the multiplier bits $q_0$ to $q_3$ select the rows. The initial partial product PP0 is 0 0 0 0; each row forms the next partial product (PP1, PP2, PP3), and PP4 = $p_7, p_6, \ldots, p_0$ = Product. Typical cell: a full adder (FA) that adds the bit of the incoming partial product (PPi), the AND of $m_j$ and $q_i$, and the carry-in, producing the bit of the outgoing partial product [PP(i+1)] and the carry-out.]
Another Version of 4×4 Array
Multiplier
[Figure: the sixteen product terms $x_i y_j$ (i, j = 0 to 3) are added by three rows of three full adders (FA) each, with 0 inputs where no term exists, to give the product bits $p_7$ to $p_0$.]
Array Multiplication
What is the critical path (worst case signal
propagation delay path)?
Assuming that there are two gate delays from
the inputs to the outputs of a full adder block,
the path has a total of 6(n-1)-1 gate delays,
including the initial AND gate delay in all cells,
for the n×n array.
Any advantages/disadvantages?
Sequential Circuit Binary
Multiplier
(a) Register configuration
[Figure: register A (initially 0, bits $a_{n-1} \ldots a_0$) and the multiplier register Q (bits $q_{n-1} \ldots q_0$) form one register that shifts right together with the carry flip-flop C. An n-bit adder adds to A either the multiplicand M (bits $m_{n-1} \ldots m_0$) or 0, selected by a MUX. A control sequencer generates the Add/Noadd control from $q_0$.]
(b) Multiplication example (M = 1101, Q = 1011)
                          C   A      Q
Initial configuration     0   0000   1011
First cycle     Add       0   1101   1011
                Shift     0   0110   1101
Second cycle    Add       1   0011   1101
                Shift     0   1001   1110
Third cycle     No add    0   1001   1110
                Shift     0   0100   1111
Fourth cycle    Add       1   0001   1111
                Shift     0   1000   1111   Product
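The add-and-shift cycles can be reproduced with three integer variables standing in for C, A, and Q. This is a sketch of unsigned multiplication only; the function name `sequential_multiply` and the default width n = 4 are assumptions of the example.

```python
def sequential_multiply(m, q, n=4):
    """Unsigned shift-and-add multiplication using registers C, A, and Q."""
    mask = (1 << n) - 1
    a, c = 0, 0
    for _ in range(n):
        if q & 1:                          # q0 = 1: add M to A
            total = a + m
            c, a = total >> n, total & mask
        # shift C, A, Q right by one position as a single register
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q                    # product: high half in A, low half in Q
```

Calling `sequential_multiply(13, 11)` passes through A, Q values 0110 1101, 1001 1110, 0100 1111, and 1000 1111, and returns 143.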
Signed Multiplication
Signed Multiplication
Considering 2's-complement signed operands, what will happen to
(-13) × (+11) if we follow the same method as unsigned multiplication?
[Figure: multiplicand 1 0 0 1 1 (-13) times multiplier 0 1 0 1 1 (+11). Each summand is sign-extended to the width of the product (the sign extension is shown in blue in the original figure), giving the product 1 1 0 1 1 1 0 0 0 1 (-143).]
Figure 6.8. Sign extension of negative multiplicand.
Signed Multiplication
For a negative multiplier, a straightforward
solution is to form the 2's-complement of both
the multiplier and the multiplicand and
proceed as in the case of a positive multiplier.
This is possible because complementation of
both operands does not change the value or
the sign of the product.
A technique that works equally well for both
negative and positive multipliers: the Booth
algorithm.
Booth Algorithm
Consider a multiplication in which the multiplier is
positive, 0011110. How many appropriately
shifted versions of the multiplicand are added
in a standard procedure?
[Figure: multiplicand 0 1 0 1 1 0 1 times multiplier 0 0 +1 +1 +1 +1 0. Four shifted versions of the multiplicand, one for each 1 in the multiplier, are added together with the all-zero rows to form the 14-bit product.]
Booth Algorithm
Since 0011110 = 0100000 - 0000010, if we
use the expression to the right, what will
happen?
[Figure: multiplicand 0 1 0 1 1 0 1 times the recoded multiplier 0 +1 0 0 0 -1 0. Only two nonzero summands are needed: the 2's complement of the multiplicand at bit position 1 and the multiplicand at bit position 5. All other rows are zero, and the product is the same as before.]
Booth Algorithm
In general, in the Booth scheme, -1 times the shifted multiplicand
is selected when moving from 0 to 1, and +1 times the shifted
multiplicand is selected when moving from 1 to 0, as the
multiplier is scanned from right to left.
Multiplier:   0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
Recoded:      0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0
Figure 6.10. Booth recoding of a multiplier.
Booth Algorithm
Figure 6.11. Booth multiplication with a negative multiplier.
[Figure: multiplicand 0 1 1 0 1 (+13) times multiplier 1 1 0 1 0 (-6), which is recoded as 0 -1 +1 -1 0. The sign-extended summands add up to the product 1 1 1 0 1 1 0 0 1 0 (-78).]
Booth Algorithm
Multiplier            Version of multiplicand
Bit i    Bit i-1      selected by bit i
  0        0            0 × M
  0        1           +1 × M
  1        0           -1 × M
  1        1            0 × M
Figure 6.12. Booth multiplier recoding table.
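The recoding table translates into a one-line rule: the digit at position i is (bit i-1) minus (bit i). The sketch below applies that rule and then sums the selected multiples; the function names `booth_recode` and `booth_multiply` are illustrative, and Python's unbounded integers stand in for sign-extended summands.

```python
def booth_recode(q, n):
    """Booth digits (-1, 0, +1) of an n-bit 2's-complement multiplier, LSB first."""
    digits = []
    prev = 0                            # implied 0 to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1
        digits.append(prev - bit)       # 0 -> 1 gives -1, 1 -> 0 gives +1
        prev = bit
    return digits

def booth_multiply(m, q, n):
    """m and q are signed integers that fit in n bits."""
    digits = booth_recode(q & ((1 << n) - 1), n)
    return sum(d * (m << i) for i, d in enumerate(digits))
```

For the multiplier 11010 (-6), `booth_recode(0b11010, 5)` returns `[0, -1, 1, -1, 0]` (LSB first), and `booth_multiply(13, -6, 5)` returns -78, matching Figure 6.11.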
Booth Algorithm
Best case: a long string of 1s (skipping over 1s)
Worst case: 0s and 1s are alternating
[Figure: three 16-bit multipliers with their Booth recodings. Worst-case multiplier: alternating 0s and 1s, so every recoded digit is +1 or -1. Ordinary multiplier: a mix of runs, so some recoded digits are 0. Good multiplier: long strings of 1s and 0s, so only a few recoded digits are nonzero.]
Fast Multiplication
Bit-Pair Recoding of
Multipliers
Bit-pair recoding halves the maximum number of
summands (versions of the multiplicand).
Multiplier (with sign extension):    1  1 |  1  0 |  1  0     (implied 0 to right of LSB)
Booth recoding:                      0  0 | -1 +1 | -1  0
Bit-pair recoding:                     0  |   -1  |   -2
(a) Example of bit-pair recoding derived from Booth recoding
Bit-Pair Recoding of
Multipliers
Multiplier bit-pair    Multiplier bit on the right    Multiplicand selected
 i+1      i                     i-1                   at position i
  0       0                      0                       0 × M
  0       0                      1                      +1 × M
  0       1                      0                      +1 × M
  0       1                      1                      +2 × M
  1       0                      0                      -2 × M
  1       0                      1                      -1 × M
  1       1                      0                      -1 × M
  1       1                      1                       0 × M
(b) Table of multiplicand selection decisions
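Every row of the selection table follows from one expression: digit = -2 × (bit i+1) + (bit i) + (bit i-1). The sketch below uses that expression; the function name `bit_pair_recode` is illustrative, and the multiplier is assumed to be sign-extended to an even number of bits.

```python
def bit_pair_recode(q, n):
    """Radix-4 digits in {-2, -1, 0, +1, +2} of an n-bit multiplier (n even), LSB pair first."""
    digits = []
    prev = 0                                     # implied 0 to the right of the LSB
    for i in range(0, n, 2):
        b0 = (q >> i) & 1                        # bit i
        b1 = (q >> (i + 1)) & 1                  # bit i+1
        digits.append(-2 * b1 + b0 + prev)       # one row of the selection table
        prev = b1
    return digits

# -6 as the 6-bit pattern 111010 (11010 with one bit of sign extension)
digits = bit_pair_recode(0b111010, 6)
product = sum(d * 13 * 4**i for i, d in enumerate(digits))
```

The digits come out as `[-2, -1, 0]` (least significant pair first), so only three summands are needed for a 6-bit multiplier, and the product with 13 is -78.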
Bit-Pair Recoding of
Multipliers
[Figure: (+13) × (-6) = (-78) computed two ways with multiplicand 0 1 1 0 1. With Booth recoding the multiplier is 0 -1 +1 -1 0; with bit-pair recoding it is 0 -1 -2, which needs only three summands. Both give the product 1 1 1 0 1 1 0 0 1 0.]
Figure 6.15. Multiplication requiring only n/2 summands.
Carry-Save Addition of
Summands
            1 1 0 1     (13) Multiplicand M
          × 1 0 1 1     (11) Multiplier Q
          ---------
            1 1 0 1
          1 1 0 1
        0 0 0 0
      1 1 0 1
    ---------------
    1 0 0 0 1 1 1 1     (143) Product P
(a) Manual multiplication algorithm
Carry-Save Addition of
Summands
(b) Array implementation
[Figure: the array multiplier shown earlier. Multiplicand bits $m_3$ to $m_0$, multiplier bits $q_0$ to $q_3$, partial products PP0 (all 0) to PP4 = $p_7, p_6, \ldots, p_0$ = Product. Typical cell: a full adder (FA) with the incoming partial-product bit, $m_j$ AND $q_i$, and carry-in as inputs, and the outgoing partial-product bit and carry-out as outputs.]
Carry-Save Addition of
Summands
CSA speeds up the addition process.
(a) Ripple-carry array (Figure 6.6 structure)
[Figure: three rows of four full adders (FA) add the product terms $m_j q_i$ (i, j = 0 to 3). Within each row the carry ripples from one full adder to the next, and the outputs are the product bits $p_7$ to $p_0$.]
Carry-Save Addition of
Summands
(b) Carry-save array
[Figure: the same product terms $m_j q_i$ are added by three rows of four full adders (FA), but the carry outputs of a row are saved and passed to the next row instead of rippling along the same row. The outputs are the product bits $p_7$ to $p_0$.]
Figure 6.16. Ripple-carry and carry-save arrays for the multiplication operation M × Q = P for 4-bit operands.
Carry-Save Addition of
Summands
The delay through the carry-save array is somewhat
less than delay through the ripple-carry array. This is
because the S and C vector outputs from each row
are produced in parallel in one full-adder delay.
Consider the addition of many summands, we can:
Group the summands in threes and perform carry-save addition on
each of these groups in parallel to generate a set of S and C vectors
in one full-adder delay
Group all of the S and C vectors into threes, and perform carry-save
addition on them, generating a further set of S and C vectors in one
more full-adder delay
Continue with this process until there are only two vectors remaining
They can be added in a RCA or CLA to produce the desired product
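The grouping procedure above can be sketched in Python. This is a minimal illustration, not a hardware description: the function names carry_save_add, csa_tree_sum and multiply are made up for this example, Python integers stand in for bit vectors, and the final two-vector addition uses ordinary integer addition in place of an RCA or CLA.

```python
def carry_save_add(x, y, z):
    """One CSA level: reduce three summands to a sum vector S and a carry vector C."""
    s = x ^ y ^ z                            # bitwise sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1   # carries, weighted one position higher
    return s, c


def csa_tree_sum(summands):
    """Reduce summands in groups of three until two vectors remain, then add them."""
    vectors = list(summands)
    levels = 0                               # number of CSA levels used
    while len(vectors) > 2:
        full = len(vectors) - len(vectors) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*vectors[i:i + 3]))
        reduced.extend(vectors[full:])       # leftover vectors pass to the next level
        vectors = reduced
        levels += 1
    return sum(vectors), levels              # final addition (stands in for RCA/CLA)


def multiply(m, q, n_bits):
    """Form one shifted summand per multiplier bit and add them with the CSA tree."""
    summands = [(m << i) if (q >> i) & 1 else 0 for i in range(n_bits)]
    return csa_tree_sum(summands)


product, levels = multiply(45, 63, 6)
print(product, levels)
```

For the six summands of a 6-bit multiplication such as 45 x 63, this grouping uses three carry-save levels before the final addition.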
Carry-Save Addition of Summands
Figure 6.17. A multiplication example used to illustrate carry-save addition as shown in Figure 6.18.
M = 1 0 1 1 0 1 (45)
Q = 1 1 1 1 1 1 (63)
Every bit of Q is 1, so each of the six summands A, B, C, D, E, F is a copy of M, shifted left by 0, 1, 2, 3, 4 and 5 bit positions respectively.
Product = 1 0 1 1 0 0 0 1 0 0 1 1 (2,835)
Figure 6.18. The multiplication example from Figure 6.17 performed using carry-save addition.
[Figure: the summands A, B, C are reduced by carry-save addition to S1 and C1, and D, E, F to S2 and C2; then S1, C1, S2 are reduced to S3 and C3; then S3, C3, C2 are reduced to S4 and C4; finally S4 and C4 are added to give the product.]
Carry-Save Addition of Summands
[Figure: Level 1 CSA reduces A, B, C to S1, C1 and D, E, F to S2, C2. Level 2 CSA reduces S1, C1, S2 to S3, C3. Level 3 CSA reduces S3, C3, C2 to S4, C4. The final addition S4 + C4 gives the product.]
Figure 6.19. Schematic representation of the carry-save addition operations in Figure 6.18.
Carry-Save Addition of Summands
When the number of summands is large, the time saved is proportionally much greater.
Some issues omitted here:
Sign extension
Computation width of the final CLA/RCA
Bit-pair recoding
Integer Division
Manual Division
Figure 6.20. Longhand division examples.
Decimal: 274 divided by 13 gives quotient 21 and remainder 1.
      21
13 ) 274
     26
      14
      13
       1
Binary: 100010010 divided by 1101 gives quotient 10101 and remainder 1.
           10101
1101 ) 100010010
        1101
         10000
          1101
            1110
            1101
               1
Longhand Division Steps
Position the divisor appropriately with respect to the
dividend and perform a subtraction.
If the remainder is zero or positive, a quotient bit of
1 is determined, the remainder is extended by
another bit of the dividend, the divisor is
repositioned, and another subtraction is performed.
If the remainder is negative, a quotient bit of 0 is
determined, the dividend is restored by adding back
the divisor, and the divisor is repositioned for
another subtraction.
Circuit Arrangement
[Figure: an (n+1)-bit register A (bits a_n ... a_0) and the n-bit dividend register Q (bits q_(n-1) ... q_0) shift left together. An (n+1)-bit adder adds the divisor M (a leading 0 followed by bits m_(n-1) ... m_0) to A or subtracts it from A. A control sequencer issues the Shift left and Add/Subtract signals and sets the quotient bit q_0.]
Figure 6.21. Circuit arrangement for binary division.
Restoring Division
Shift A and Q left one binary position.
Subtract M from A, and place the answer back in A.
If the sign of A is 1, set q0 to 0 and add M back to A (restore A); otherwise, set q0 to 1.
Repeat these steps n times.
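The restoring steps can be traced with a short Python sketch. It rests on some assumptions: the name restoring_divide is made up for this example, a signed Python integer stands in for the (n+1)-bit two's-complement register A, and the dividend is assumed to fit in n bits with a nonzero divisor.

```python
def restoring_divide(dividend, divisor, n):
    """Restoring division of an n-bit dividend; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):
        # Shift A and Q left one position: the MSB of Q moves into the LSB of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                  # subtract M from A
        if a < 0:               # sign of A is 1: q0 stays 0, restore A
            a += m
        else:                   # sign of A is 0: set q0 to 1
            q |= 1
    return q, a


print(restoring_divide(0b1000, 0b11, 4))   # 8 divided by 3
```

With a 4-bit dividend 1000 and divisor 11, the loop runs four cycles and leaves the quotient in Q and the remainder in A.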
Examples
Figure 6.22. A restoring-division example: 1000 (8) divided by 11 (3), with M = 00011.
Initially: A = 00000, Q = 1000
First cycle: shift, A = 00001, Q = 000_; subtract M, A = 11110; set q0 = 0 and restore, A = 00001, Q = 0000
Second cycle: shift, A = 00010, Q = 000_; subtract M, A = 11111; set q0 = 0 and restore, A = 00010, Q = 0000
Third cycle: shift, A = 00100, Q = 000_; subtract M, A = 00001; set q0 = 1, Q = 0001
Fourth cycle: shift, A = 00010, Q = 001_; subtract M, A = 11111; set q0 = 0 and restore, A = 00010, Q = 0010
Quotient = 0010 (2), remainder = 00010 (2)
Nonrestoring Division
Avoid the need for restoring A after an unsuccessful subtraction.
Any idea?
If A is negative after a subtraction, restoring it and then shifting left and subtracting M gives 2(A + M) - M = 2A + M, so the same result is obtained by shifting A left and adding M.
Step 1: (Repeat n times)
If the sign of A is 0, shift A and Q left one bit position and subtract M from A; otherwise, shift A and Q left and add M to A.
Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
Step 2: If the sign of A is 1, add M to A.
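A minimal Python sketch of these two steps follows. The function name nonrestoring_divide is made up for this example, a signed Python integer stands in for the two's-complement register A, and the dividend is assumed to fit in n bits with a nonzero divisor.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Nonrestoring division of an n-bit dividend; returns (quotient, remainder)."""
    a, q, m = 0, dividend, divisor
    mask = (1 << n) - 1
    for _ in range(n):                      # Step 1, repeated n times
        was_negative = a < 0                # sign of A before the shift
        a = 2 * a + ((q >> (n - 1)) & 1)    # shift A and Q left one position
        q = (q << 1) & mask
        if was_negative:
            a += m                          # sign was 1: add M
        else:
            a -= m                          # sign was 0: subtract M
        if a >= 0:
            q |= 1                          # sign of A is 0: set q0 to 1
    if a < 0:                               # Step 2: final remainder correction
        a += m
    return q, a


print(nonrestoring_divide(0b1000, 0b11, 4))   # 8 divided by 3
```

Each cycle performs exactly one addition or subtraction, and a single corrective addition is needed at the end only when A finishes negative.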
Examples
Figure 6.23. A nonrestoring-division example: 1000 (8) divided by 11 (3), with M = 00011.
Initially: A = 00000, Q = 1000
First cycle: shift, A = 00001, Q = 000_; subtract M, A = 11110; set q0 = 0, Q = 0000
Second cycle: shift, A = 11100, Q = 000_; add M, A = 11111; set q0 = 0, Q = 0000
Third cycle: shift, A = 11110, Q = 000_; add M, A = 00001; set q0 = 1, Q = 0001
Fourth cycle: shift, A = 00010, Q = 001_; subtract M, A = 11111; set q0 = 0, Q = 0010
Restore remainder: add M, A = 00010
Quotient = 0010 (2), remainder = 00010 (2)
Floating-Point Numbers
and Operations
Floating-Point Numbers
So far we have dealt with fixed-point numbers, in which the position of the binary point is fixed, and have considered them as integers.
Fixed-point numbers can also be treated as fractions: the binary point is then just to the right of the sign bit.
$B = b_0 . b_{-1} b_{-2} \ldots b_{-(n-1)}$
$F(B) = -b_0 \times 2^{0} + b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + \cdots + b_{-(n-1)} \times 2^{-(n-1)}$
Where the range of F is:
$-1 \le F \le 1 - 2^{-(n-1)}$
In a floating-point number, the position of the binary point is variable and is automatically adjusted as computation proceeds.
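As a small check of the signed-fraction interpretation, with the binary point just to the right of the sign bit, the value F(B) can be evaluated in Python. The function name fraction_value and the string encoding of B are assumptions made for this sketch.

```python
def fraction_value(bits):
    """Value of a two's-complement fraction b0.b-1b-2... given as a bit string."""
    value = -float(int(bits[0]))            # the sign bit b0 has weight -2^0
    for i, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2.0 ** -i       # bit b-i has weight 2^-i
    return value


for pattern in ("0111", "1000", "1100"):
    print(pattern, fraction_value(pattern))
```

For n = 4 the largest pattern 0111 gives 1 - 2^-3 and the smallest pattern 1000 gives -1, matching the stated range of F.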
Floating-Point Numbers
What is needed to represent a floating-point decimal number?
Sign
Mantissa (the significant digits)
Exponent to an implied base (scale factor)
Normalized: the decimal point is placed to the right of the first (nonzero) significant digit.
IEEE Standard for Floating-Point Numbers
Think about this number (all digits are decimal):
$\pm X_1.X_2X_3X_4X_5X_6X_7 \times 10^{\pm Y_1Y_2}$
It is possible to approximate this mantissa precision and scale factor range in a binary representation that occupies 32 bits: a 24-bit mantissa, 1 sign bit for the number, and an 8-bit exponent. (The leading 1 of a normalized mantissa is implicit, so only 23 mantissa bits are stored.)
Instead of the signed exponent, E, the value actually stored in the exponent field is an unsigned integer E' = E + 127, the so-called excess-127 format.
IEEE Standard
(a) Single precision: 32 bits, made up of the sign of the number S (0 signifies +, 1 signifies -), the 8-bit signed exponent in excess-127 representation E', and the 23-bit mantissa fraction M.
Value represented = $\pm 1.M \times 2^{E' - 127}$
(b) Example of a single-precision number: S = 0, E' = 00101000, M = 001010...0
Value represented = $+1.0010100\ldots0 \times 2^{-87}$, since $(00101000)_2 = 40_{10}$ and $40 - 127 = -87$.
(c) Double precision: 64 bits, made up of the sign S, the 11-bit excess-1023 exponent E', and the 52-bit mantissa fraction M.
Value represented = $\pm 1.M \times 2^{E' - 1023}$
Figure 6.24. IEEE standard floating-point formats.
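The single-precision field layout can be inspected with Python's standard struct module. This is only a sketch: the function name decode_single is made up here, and the example value is the one from the figure, sign 0 with stored exponent 40 and fraction 0010100...0.

```python
import struct


def decode_single(x):
    """Split a number, stored as IEEE single precision, into (S, E', M)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31                  # 1 sign bit
    e_stored = (word >> 23) & 0xFF     # 8-bit excess-127 exponent E'
    fraction = word & 0x7FFFFF         # 23-bit mantissa fraction M
    return sign, e_stored, fraction


# The example value: +1.0010100 (binary) times 2 to the power -87.
sign, e_stored, fraction = decode_single(1.15625 * 2.0 ** -87)
print(sign, e_stored, e_stored - 127, format(fraction, "023b"))
```

Subtracting 127 from the stored exponent field recovers the true exponent, here 40 - 127 = -87.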
IEEE Standard
For excess-127 format, $0 \le E' \le 255$. However, 0 and 255 are used to represent special values, so actually $1 \le E' \le 254$. That means $-126 \le E \le 127$.
Single precision uses 32 bits. The scale factor ranges from $2^{-126}$ to $2^{+127}$.
Double precision uses 64 bits. The scale factor ranges from $2^{-1022}$ to $2^{+1023}$.
Two Aspects
If a number is not normalized, it can always be put in normalized form by shifting the fraction and adjusting the exponent.
(a) Unnormalized value: sign 0, excess-127 exponent 10001000, fraction 0010110...
(There is no implicit 1 to the left of the binary point.)
Value represented = $+0.0010110\ldots \times 2^{9}$, since $(10001000)_2 = 136_{10}$ and $136 - 127 = 9$.
(b) Normalized version: sign 0, excess-127 exponent 10000101, fraction 0110...
Value represented = $+1.0110\ldots \times 2^{6}$, since $6 + 127 = 133_{10} = (10000101)_2$.
Figure 6.25. Floating-point normalization in IEEE single-precision format.
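The shift-and-adjust step can be sketched in Python. The function name normalize and the integer encoding are assumptions for this example: the mantissa is held as an integer scaled by 2^frac_bits, so the value 1.0 corresponds to 1 << frac_bits, and bits shifted out to the right are simply dropped.

```python
def normalize(mantissa, exponent, frac_bits=23):
    """Shift a nonzero scaled mantissa until 1 <= mantissa / 2**frac_bits < 2."""
    if mantissa <= 0:
        raise ValueError("only nonzero magnitudes can be normalized")
    one = 1 << frac_bits
    while mantissa >= 2 * one:     # too large: shift right, raise the exponent
        mantissa >>= 1
        exponent += 1
    while mantissa < one:          # too small: shift left, lower the exponent
        mantissa <<= 1
        exponent -= 1
    return mantissa, exponent


# 0.0010110 (binary) times 2^9, held with 7 fraction bits.
m, e = normalize(0b0010110, 9, frac_bits=7)
print(format(m, "08b"), e, format(e + 127, "08b"))
```

For the example in Figure 6.25, three left shifts turn 0.0010110 into 1.0110000 and reduce the exponent from 9 to 6, which is stored as 133.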
Two Aspects
As computations proceed, a number that
does not fall in the representable range of
normal numbers might be generated.
It requires an exponent less than -126
(underflow) or greater than +127 (overflow).
Both are exceptions that need to be
considered.
Special Values
The end values 0 and 255 of the stored exponent E' are used to represent special values.
When E' = 0 and M = 0, the value exact 0 is represented. ($\pm 0$)
When E' = 255 and M = 0, the value infinity is represented. ($\pm\infty$)
When E' = 0 and M ≠ 0, denormal numbers are represented. The value is $\pm 0.M \times 2^{-126}$.
When E' = 255 and M ≠ 0, the value is Not a Number (NaN).
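These four cases, together with the normal case, can be written as one decoding function. This is a sketch with a made-up name, interpret_single, that takes the raw 32-bit word as an integer; Python's struct module can be used to cross-check the result against the machine's own interpretation.

```python
import math
import struct


def interpret_single(word):
    """Value of a 32-bit IEEE single-precision word, including the special values."""
    sign = -1.0 if word >> 31 else 1.0
    e_stored = (word >> 23) & 0xFF
    m = (word & 0x7FFFFF) / 2 ** 23            # the fraction 0.M
    if e_stored == 255:
        return sign * math.inf if m == 0 else math.nan
    if e_stored == 0:
        return sign * m * 2.0 ** -126          # exact 0 when M = 0, else denormal
    return sign * (1 + m) * 2.0 ** (e_stored - 127)


for word in (0x00000000, 0x7F800000, 0x00000001, 0x3F800000):
    reference = struct.unpack(">f", struct.pack(">I", word))[0]
    print(hex(word), interpret_single(word), reference)
```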
Exceptions
A processor must set exception flags if any of the following occur in performing operations: underflow, overflow, divide by zero, inexact, invalid.
When an exception occurs, the result is set to a special value.
Arithmetic Operations on
Floating-Point Numbers
Add/Subtract rule
Choose the number with the smaller exponent and shift its mantissa right a
number of steps equal to the difference in exponents.
Set the exponent of the result equal to the larger exponent.
Perform addition/subtraction on the mantissas and determine the sign of the
result.
Normalize the resulting value, if necessary.
Multiply rule
Add the exponents and subtract 127.
Multiply the mantissas and determine the sign of the result.
Normalize the resulting value, if necessary.
Divide rule
Subtract the exponents and add 127.
Divide the mantissas and determine the sign of the result.
Normalize the resulting value, if necessary.
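The three rules can be sketched on (sign, stored exponent, significand) triples. Everything named here is an assumption for illustration: the significand is the 24-bit value 1.M scaled by 2^23, extra bits are simply chopped with no guard bits, and special values, overflow and underflow are ignored.

```python
FRAC = 23
ONE = 1 << FRAC            # the significand 1.0 as a scaled integer


def renormalize(sig, e):
    """Bring a nonzero significand back into the range [ONE, 2 * ONE)."""
    while sig >= 2 * ONE:
        sig, e = sig >> 1, e + 1
    while 0 < sig < ONE:
        sig, e = sig << 1, e - 1
    return sig, e


def fp_add(a, b):
    if a[1] < b[1]:                        # make a the operand with the larger exponent
        a, b = b, a
    (sa, ea, ma), (sb, eb, mb) = a, b
    mb >>= ea - eb                         # shift the smaller number's mantissa right
    total = (-ma if sa else ma) + (-mb if sb else mb)
    sig, e = renormalize(abs(total), ea)   # result exponent starts at the larger one
    return (1 if total < 0 else 0), e, sig


def fp_multiply(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    sig, e = renormalize((ma * mb) >> FRAC, ea + eb - 127)   # add exponents, subtract 127
    return sa ^ sb, e, sig


def fp_divide(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    sig, e = renormalize((ma << FRAC) // mb, ea - eb + 127)  # subtract exponents, add 127
    return sa ^ sb, e, sig


def to_float(x):
    s, e, sig = x
    return (-1) ** s * sig / ONE * 2.0 ** (e - 127)


three = (0, 128, 3 << 22)    # +1.5 x 2^1
half = (0, 126, ONE)         # +1.0 x 2^-1
print(to_float(fp_add(three, half)), to_float(fp_multiply(three, three)),
      to_float(fp_divide(three, half)))
```

The 127 correction in the multiply and divide rules appears because both stored exponents already include the excess of 127.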
Guard Bits and Truncation
During the intermediate steps, it is important to retain extra bits, often called guard bits, to yield the maximum accuracy in the final results.
Removing the guard bits in generating a final result requires truncation of the extended mantissa. How?
Guard Bits and Truncation
Chopping: remove the guard bits and make no change to the retained bits. Biased; the error ranges from 0 to almost 1 at the LSB position of the retained bits.
$0.b_{-1}b_{-2}b_{-3}000$ through $0.b_{-1}b_{-2}b_{-3}111$ are all truncated to $0.b_{-1}b_{-2}b_{-3}$.
Von Neumann rounding: if any of the bits to be removed are 1, the LSB of the retained bits is set to 1. Unbiased; the error ranges from -1 to +1 at the LSB.
All 6-bit fractions with $b_{-4}b_{-5}b_{-6}$ not equal to 000 are truncated to $0.b_{-1}b_{-2}1$.
Why is unbiased rounding better when many operands are involved?
Rounding: a 1 is added to the LSB position of the bits to be retained if there is a 1 in the MSB position of the bits being removed. Unbiased; the error ranges from -1/2 to +1/2 at the LSB.
Round to the nearest number, or to the nearest even number in case of a tie ($0.b_{-1}b_{-2}0100 \to 0.b_{-1}b_{-2}0$ and $0.b_{-1}b_{-2}1100 \to 0.b_{-1}b_{-2}1 + 0.001$).
Best accuracy
Most difficult to implement
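The three truncation methods can be compared on small bit patterns. In this sketch the function names are made up, the fraction is held as an integer whose low `drop` bits are the guard bits to be removed, and `drop` is assumed to be at least 1.

```python
def chop(value, drop):
    """Chopping: discard the guard bits, leave the retained bits unchanged."""
    return value >> drop


def von_neumann(value, drop):
    """Von Neumann rounding: force the retained LSB to 1 if any removed bit is 1."""
    kept = value >> drop
    removed = value & ((1 << drop) - 1)
    return kept | 1 if removed else kept


def round_nearest_even(value, drop):
    """Rounding: round to nearest, and to the even neighbour in case of a tie."""
    kept = value >> drop
    removed = value & ((1 << drop) - 1)
    half = 1 << (drop - 1)                 # the pattern 100...0 marks a tie
    if removed > half or (removed == half and kept & 1):
        kept += 1
    return kept


# 6-bit fractions truncated to 3 bits, as in the examples above.
for pattern in (0b010100, 0b011100, 0b010001, 0b011111):
    print(format(pattern, "06b"),
          format(chop(pattern, 3), "03b"),
          format(von_neumann(pattern, 3), "03b"),
          format(round_nearest_even(pattern, 3), "03b"))
```

Note that rounding up can carry out of the retained bits (for example 111 plus 1), so a full implementation would renormalize afterwards.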
Implementing Floating-Point
Operations
Hardware/software
In most general-purpose processors, floating-
point operations are available at the machine-
instruction level, implemented in hardware.
In high-performance processors, a significant
portion of the chip area is assigned to
floating-point operations.
Addition/subtraction circuitry
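[Figure: 32-bit operands A: S_A, E'_A, M_A and B: S_B, E'_B, M_B. An 8-bit subtractor computes n = |E'_A - E'_B| and the sign of the difference. SWAP routes the M of the number with the smaller E' to the SHIFTER, which shifts it n bits to the right, and the M of the number with the larger E' to the mantissa adder/subtractor. A combinational CONTROL network takes S_A, S_B, the Add/Sub input and the sign from the subtractor, and selects Add / Subtract and the result sign S_R. The resulting magnitude M goes to the leading zeros detector, which produces X, and to normalize and round, which gives M_R. A MUX selects the larger exponent E', and a second 8-bit subtractor forms E'_R = E' - X. The 32-bit result is R = A + B, with fields S_R, E'_R, M_R.]
Figure 6.26. Floating-point addition-subtraction unit.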