CS 2253
COMPUTER ORGANIZATION AND
ARCHITECTURE
K. R. Sarath Chandran
Assistant Professor (CSE Dept)
SSN College of Engineering
Unit 1
Basic Structure of Computers
Functional Units
K. R. Sarath Chandran. AP/CSE/SSNCE
Functional Units
Figure 1.1. Basic functional units of a computer: input, output, memory, arithmetic and logic, and control. Input and output make up the I/O group; the arithmetic and logic unit and the control unit make up the processor.
Information Handled by a
Computer
Instructions/machine instructions
Govern the transfer of information within a computer as
well as between the computer and its I/O devices
Specify the arithmetic and logic operations to be
performed
Program
Data
Used as operands by the instructions
Source program
Encoded in binary code 0 and 1
Memory Unit
Store programs and data
Two classes of storage
Primary storage
Fast
Programs must be stored in memory while they are being executed
Large number of semiconductor storage cells
Processed in words
Address
RAM and memory access time
Memory hierarchy: cache, main memory
Secondary storage: larger and cheaper
Arithmetic and Logic Unit
(ALU)
Most computer operations are executed in
ALU of the processor.
Load the operands into memory, bring them
to the processor, perform the operation in the ALU, then
store the result back to memory or retain it in
the processor.
Registers
Fast control of ALU
Control Unit
All computer operations are controlled by the control
unit.
The timing signals that govern the I/O transfers are
also generated by the control unit.
Control unit is usually distributed throughout the
machine instead of standing alone.
Operations of a computer:
Accept information in the form of programs and data through an
input unit and store it in the memory
Fetch the information stored in the memory, under program control,
into an ALU, where the information is processed
Output the processed information through an output unit
Control all activities inside the machine through a control unit
The processor : Data Path and
Control
Two types of functional units:
elements that operate on data values (combinational)
elements that contain state (state elements)
Five Execution Steps
Step 1. Instruction fetch (all instructions):
    IR = Mem[PC]
    PC = PC + 4
Step 2. Instruction decode/register fetch (all instructions):
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
Step 3. Execution, address computation, branch/jump completion:
    R-type instructions:           ALUOut = A op B
    Memory-reference instructions: ALUOut = A + sign-extend(IR[15-0])
    Branches:                      if (A == B) then PC = ALUOut
    Jumps:                         PC = PC[31-28] || (IR[25-0] << 2)
Step 4. Memory access or R-type completion:
    R-type instructions:           Reg[IR[15-11]] = ALUOut
    Memory-reference instructions: Load: MDR = Mem[ALUOut]
                                   or Store: Mem[ALUOut] = B
Step 5. Memory read completion:
    Memory-reference instructions: Load: Reg[IR[20-16]] = MDR
Basic Operational
Concepts
Review
Activity in a computer is governed by instructions.
To perform a task, an appropriate program
consisting of a list of instructions is stored in the
memory.
Individual instructions are brought from the memory
into the processor, which executes the specified
operations.
Data to be used as operands are also stored in the
memory.
A Typical Instruction
Add LOCA, R0
Add the operand at memory location LOCA to the
operand in a register R0 in the processor.
Place the sum into register R0.
The original contents of LOCA are preserved.
The original contents of R0 are overwritten.
The instruction is fetched from the memory into the
processor; the operand at LOCA is fetched and
added to the contents of R0; the resulting sum is
stored in register R0.
Separate Memory Access and
ALU Operation
Load LOCA, R1
Add R1, R0
Whose contents will be overwritten?
Connection Between the
Processor and the Memory
Figure 1.2. Connections between the processor and the memory. The processor contains the MAR, MDR, PC, IR, the control circuitry, the ALU, and n general-purpose registers R0 to Rn-1; the MAR and MDR are the processor's interface to the memory.
Registers
Instruction register (IR)
Program counter (PC)
General-purpose registers (R0 to Rn-1)
Memory address register (MAR)
Memory data register (MDR)
Typical Operating Steps
Programs are loaded into the memory through input
devices
PC is set to point to the first instruction
The contents of PC are transferred to MAR
A Read signal is sent to the memory
The first instruction is read out and loaded
into MDR
The contents of MDR are transferred to IR
Decode and execute the instruction
Typical Operating Steps
(Cont)
Get operands for ALU
General-purpose register
Memory (address to MAR, Read, MDR to ALU)
Perform operation in ALU
Store the result back
To general-purpose register
To memory (address to MAR, result to MDR, Write)
During the execution, PC is
incremented to the next instruction
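The operating steps on the last two slides can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the tuple encoding of instructions, the word-addressed memory (so PC advances by 1), and the address 100 chosen for LOCA are hypothetical; only the register names PC, MAR, MDR, IR, R0 and R1 come from the slides.

```python
def run(memory, pc, regs, steps):
    """Execute `steps` instructions; memory maps word addresses to contents."""
    for _ in range(steps):
        mar = pc                  # contents of PC are transferred to MAR
        mdr = memory[mar]         # Read: the instruction is loaded into MDR
        ir = mdr                  # contents of MDR are transferred to IR
        pc += 1                   # PC now points to the next instruction
        op, src, dst = ir         # decode (hypothetical tuple encoding)
        if src in regs:           # operand is in a general-purpose register
            operand = regs[src]
        else:                     # operand in memory: address to MAR, Read, MDR to ALU
            mar = src
            mdr = memory[mar]
            operand = mdr
        if op == "Load":
            regs[dst] = operand
        elif op == "Add":
            regs[dst] = regs[dst] + operand   # ALU result kept in a register
        else:
            raise ValueError(op)
    return regs, pc

LOCA = 100                        # hypothetical address of the operand
memory = {0: ("Load", LOCA, "R1"), 1: ("Add", "R1", "R0"), LOCA: 7}
regs, pc = run(memory, 0, {"R0": 5, "R1": 0}, 2)
```

Running the two-instruction sequence Load LOCA, R1 and Add R1, R0 this way shows that both R1 and R0 are overwritten while the contents of LOCA are preserved.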
Interrupt
Normal execution of programs may be preempted if
some device requires urgent servicing.
The normal execution of the current program must
be interrupted: the device raises an interrupt
signal.
Interrupt-service routine
Current system information backup and restore (PC,
general-purpose registers, control information,
specific information)
Bus Structures
There are many ways to connect different
parts inside a computer together.
A group of lines that serves as a connecting
path for several devices is called a bus.
Address/data/control
Bus Structure
Single-bus
Figure 1.3. Single-bus structure.
Memory Input Output Processor
Speed Issue
Different devices have different
transfer/operate speed.
If the speed of bus is bounded by the slowest
device connected to it, the efficiency will be
very low.
How to solve this?
A common approach: use buffers.
Performance
Performance
The most important measure of a computer is
how quickly it can execute programs.
Three factors affect performance:
Hardware design
Instruction set
Compiler
Performance
Processor time to execute a program depends on the hardware
involved in the execution of individual machine instructions.
Figure 1.5. The processor cache. The processor and its cache memory are connected over the bus to the main memory.
Performance
The processor and a relatively small cache
memory can be fabricated on a single
integrated circuit chip.
Speed
Cost
Memory management
Processor Clock
Clock, clock cycle, and clock rate
The execution of each instruction is divided
into several steps, each of which completes
in one clock cycle.
Hertz: cycles per second
Basic Performance Equation
T: processor time required to execute a program that has been
prepared in a high-level language
N: number of actual machine language instructions needed to
complete the execution (note: an instruction inside a loop is counted each time it is executed)
S: average number of basic steps needed to execute one
machine instruction. Each step completes in one clock cycle
R: clock rate
Note: these parameters are not independent of each other
$T = \dfrac{N \times S}{R}$
How to improve T?
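As a quick numeric check of the basic performance equation, the sketch below evaluates T for made-up values of N, S and R. The function name and the sample numbers are illustrative assumptions, not figures from the slides.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation: T = (N * S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz

# Hypothetical program: 1 million instructions, 4 steps each, 500 MHz clock.
t_base = execution_time(1_000_000, 4, 500e6)
# Halving S (for example through pipelining) halves T if N and R are unchanged.
t_pipelined = execution_time(1_000_000, 2, 500e6)
```

T falls when N or S is reduced or when R is increased, which is the starting point for the pipelining, clock-rate and compiler slides that follow.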
Pipeline and Superscalar
Operation
Instructions are not necessarily executed one after
another.
The value of S doesn't have to be the number of
clock cycles needed to execute one instruction.
Pipelining: overlapping the execution of successive
instructions.
Add R1, R2, R3
Superscalar operation: multiple instruction
pipelines are implemented in the processor.
Goal: reduce S (it could become less than 1!)
Clock Rate
Increase clock rate
Improve the integrated-circuit (IC) technology to make
the circuits faster
Reduce the amount of processing done in one basic step
(however, this may increase the number of basic steps
needed)
Increases in R that are entirely caused by
improvements in IC technology affect all
aspects of the processor's operation equally
except the time to access the main memory.
CISC and RISC
Tradeoff between N and S
A key consideration is the use of pipelining
S is close to 1 even though the number of basic steps
per instruction may be considerably larger
It is much easier to implement efficient pipelining in
processor with simple instruction sets
Reduced Instruction Set Computers (RISC)
Complex Instruction Set Computers (CISC)
Compiler
A compiler translates a high-level language program
into a sequence of machine instructions.
To reduce N, we need a suitable machine instruction
set and a compiler that makes good use of it.
Goal: reduce N × S
A compiler may not be designed for a specific
processor; however, a high-quality compiler is
usually designed for, and with, a specific processor.
Performance Measurement
T is difficult to compute.
Measure computer performance using benchmark programs.
System Performance Evaluation Corporation (SPEC) selects and
publishes representative application programs for different application
domains, together with test results for many commercially available
computers.
Compile and run (no simulation)
Reference computer
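For each benchmark program:
$\text{SPEC rating} = \dfrac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$
Overall rating over n programs (geometric mean):
$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$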
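The two rating formulas can be sketched as follows. The running times are invented for illustration; they are not SPEC results, and the function name is an assumption.

```python
import math

def spec_rating(reference_times, test_times):
    """Geometric mean of per-program ratios (reference time / time under test)."""
    ratios = [ref / test for ref, test in zip(reference_times, test_times)]
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical running times in seconds for two benchmark programs.
rating = spec_rating([100.0, 400.0], [50.0, 100.0])   # ratios are 2 and 4
```

A rating above 1 means the computer under test is faster than the reference computer on this set of programs.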
Multiprocessors and
Multicomputers
Multiprocessor computer
Execute a number of different application tasks in parallel
Execute subtasks of a single large task in parallel
All processors have access to all of the memory: shared-memory
multiprocessor
Cost: processors, memory units, complex interconnection networks
Multicomputers
Each computer has access only to its own memory
Exchange messages via a communication network: message-passing multicomputers
Machine Instructions
and Programs
Objectives
Machine instructions and program execution,
including branching and subroutine call and return
operations.
Number representation and addition/subtraction in
the 2's-complement system.
Addressing methods for accessing register and
memory operands.
Assembly language for representing machine
instructions, data, and programs.
Program-controlled Input/Output operations.
Operations on stack, queue, list, linked-list, and
array data structures.
Number, Arithmetic
Operations, and
Characters
Unsigned Integer
Consider an n-bit vector of the form:
$A = a_{n-1} a_{n-2} \cdots a_1 a_0$
where $a_i = 0$ or $1$ for $i$ in $[0, n-1]$.
This vector can represent non-negative integer values V = A in the range 0 to $2^n - 1$, where
$A = 2^{n-1} a_{n-1} + 2^{n-2} a_{n-2} + \cdots + 2^{1} a_1 + 2^{0} a_0$
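The weighted sum above can be checked with a few lines of code. This is a minimal sketch; the function name and the choice to list bits most-significant first are assumptions made for the example.

```python
def unsigned_value(bits):
    """Value of the bit vector a_{n-1} ... a_1 a_0, given most significant bit first."""
    n = len(bits)
    # bits[k] is a_{n-1-k}, so its weight is 2^(n-1-k).
    return sum(a << (n - 1 - k) for k, a in enumerate(bits))

eleven = unsigned_value([1, 0, 1, 1])     # 8 + 2 + 1
largest = unsigned_value([1] * 8)         # 2^8 - 1
```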
Branch target
Conditional branch
Condition Codes
Condition code flags
Condition code register / status register
N (negative)
Z (zero)
V (overflow)
C (carry)
Different instructions affect different flags
Generating Memory Addresses
How to specify the address of branch target?
Can we give the memory operand address
directly in a single Add instruction in the loop?
Use a register to hold the address of NUM1;
then increment by 4 on each pass through
the loop.
Addressing Modes
Addressing Modes
The different ways in which the location of an operand is specified in
an instruction are referred to as addressing modes.

Name                         Assembler syntax   Addressing function
Immediate                    #Value             Operand = Value
Register                     Ri                 EA = Ri
Absolute (Direct)            LOC                EA = LOC
Indirect                     (Ri)               EA = [Ri]
                             (LOC)              EA = [LOC]
Index                        X(Ri)              EA = [Ri] + X
Base with index              (Ri,Rj)            EA = [Ri] + [Rj]
Base with index and offset   X(Ri,Rj)           EA = [Ri] + [Rj] + X
Relative                     X(PC)              EA = [PC] + X
Autoincrement                (Ri)+              EA = [Ri]; Increment Ri
Autodecrement                -(Ri)              Decrement Ri; EA = [Ri]
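The addressing functions in the table can be sketched as one lookup routine. This is an illustration only: the mode names, the dictionary model of registers and memory, and the 4-byte increment for the auto modes (word operands) are assumptions for the example.

```python
def effective_address(mode, regs, mem, ri=None, rj=None, x=0, loc=None):
    """Return the effective address (EA) for the memory-operand modes in the table."""
    if mode == "absolute":
        return loc                          # EA = LOC
    if mode == "indirect":
        return regs[ri]                     # EA = [Ri]
    if mode == "indirect_mem":
        return mem[loc]                     # EA = [LOC]
    if mode == "index":
        return regs[ri] + x                 # EA = [Ri] + X
    if mode == "base_index":
        return regs[ri] + regs[rj]          # EA = [Ri] + [Rj]
    if mode == "base_index_offset":
        return regs[ri] + regs[rj] + x      # EA = [Ri] + [Rj] + X
    if mode == "relative":
        return regs["PC"] + x               # EA = [PC] + X
    if mode == "autoincrement":
        ea = regs[ri]                       # EA = [Ri]; then increment Ri
        regs[ri] += 4                       # assumes 4-byte word operands
        return ea
    if mode == "autodecrement":
        regs[ri] -= 4                       # decrement Ri first; then EA = [Ri]
        return regs[ri]
    raise ValueError(mode)
```

Immediate and Register modes are left out because they name the operand itself or a register rather than a memory address.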
Stacks
Stack Pointer (SP)
Push
    Subtract #4, SP
    Move NEWITEM, (SP)
  or, with the Autodecrement mode:
    Move NEWITEM, -(SP)
Pop
    Move (SP), ITEM
    Add #4, SP
  or, with the Autoincrement mode:
    Move (SP)+, ITEM
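The push and pop sequences can be modelled as below. This is a sketch under assumptions: memory is a Python dictionary, the stack grows toward lower addresses, and the bounds 2000 (bottom element) and 1500 (last usable address) are borrowed from the safe push/pop routines of Figure 2.23.

```python
class Stack:
    BOTTOM = 2000        # address of the bottom element
    TOP_LIMIT = 1500     # lowest address the stack may occupy

    def __init__(self):
        self.mem = {}
        self.sp = self.BOTTOM + 4         # empty stack: SP is just past the bottom element

    def push(self, item):
        if self.sp <= self.TOP_LIMIT:     # stack is full
            raise OverflowError("stack full")
        self.sp -= 4                      # Subtract #4, SP
        self.mem[self.sp] = item          # Move NEWITEM, (SP)

    def pop(self):
        if self.sp > self.BOTTOM:         # stack is empty
            raise IndexError("stack empty")
        item = self.mem[self.sp]          # Move (SP), ITEM
        self.sp += 4                      # Add #4, SP
        return item
```

Each method body corresponds to the two-instruction form; the one-instruction Autodecrement and Autoincrement forms have the same effect.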
Stacks
Figure 2.22. Effect of stack operations on the stack in Figure 2.21.
(a) After push from NEWITEM: NEWITEM holds 19; the stack holds 19, -28, 17, 739, ..., 43 from top to bottom, and SP points to 19.
(b) After pop into ITEM: ITEM holds -28; the stack holds 17, 739, ..., 43 from top to bottom, and SP points to 17.
Stacks
The size of stack in a program is fixed in
memory.
Need to avoid pushing if the maximum size is
reached
Need to avoid popping if stack is empty
Compare instruction
Compare src, dst
Performs [dst] - [src] and sets the condition code flags according to the result.
Does not change the values of src and dst.
Stacks
SAFEPOP   Compare   #2000,SP       Check to see if the stack pointer contains
          Branch>0  EMPTYERROR     an address value greater than 2000. If it
                                   does, the stack is empty. Branch to the
                                   routine EMPTYERROR for appropriate
                                   action.
          Move      (SP)+,ITEM     Otherwise, pop the top of the stack into
                                   memory location ITEM.
(a) Routine for a safe pop operation

SAFEPUSH  Compare   #1500,SP       Check to see if the stack pointer
          Branch≤0  FULLERROR      contains an address value equal
                                   to or less than 1500. If it does, the
                                   stack is full. Branch to the routine
                                   FULLERROR for appropriate action.
          Move      NEWITEM,-(SP)  Otherwise, push the element in memory
                                   location NEWITEM onto the stack.
(b) Routine for a safe push operation
Figure 2.23. Checking for empty and full errors in pop and push operations.
Queues
Data are stored in and retrieved from a queue
on a First-In-First-Out (FIFO) basis.
New data are added at the back (high-
address end) and retrieved from the front
(low-address end).
How many pointers are needed for a stack and a
queue, respectively? (Answer: 1 and 2)
Circular buffer
Subroutines
Subroutines
It is often necessary to perform a particular subtask (subroutine)
many times on different data values.
To save space, only one copy of the instructions that constitute
the subroutine is placed in the memory.
Any program that requires the use of the subroutine simply
branches to its starting location (Call).
After a subroutine has been executed, it is said to return to the
program that called the subroutine, and the program resumes
execution. (Return)
Memory location   Calling program
200               Call SUB
204               next instruction

Memory location   Subroutine SUB
1000              first instruction
                  ...
                  Return
Subroutines
Since the subroutine may be called from different
places in a calling program, provision must be made
for returning to the appropriate location.
Subroutine Linkage method: use link register to
store the PC.
Call instruction
Store the contents of the PC in the link register
Branch to the target address specified by the instruction
Return instruction
Branch to the address contained in the link register
Subroutines
Memory location   Calling program
200               Call SUB
204               next instruction

Memory location   Subroutine SUB
1000              first instruction
                  ...
                  Return

Call:   Link receives 204 (the contents of PC); PC receives 1000.
Return: PC receives 204 (the contents of Link).
Figure 2.24. Subroutine linkage using a link register.
Subroutine Nesting and The
Processor Stack
If a subroutine calls another subroutine, the
contents in the link register will be destroyed.
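If subroutine A calls B and B calls C, then after C has
been executed the PC should return to B, and
then to A.
Return addresses are needed in LIFO order: use a stack.
Automatic process
by the Call instruction
[Figure: processor stack, with SP pointing to the most recently saved return address (MEM location of B), which sits above the earlier one (MEM location of A).]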
Parameter Passing
Exchange of information between a calling
program and a subroutine.
Several ways:
Through registers
Through memory locations
Through stack
Passing Parameters through
Processor Registers
Calling program
          Move       N,R1         R1 serves as a counter.
          Move       #NUM1,R2     R2 points to the list.
          Call       LISTADD      Call subroutine.
          Move       R0,SUM       Save result.
          ...
Subroutine
LISTADD   Clear      R0           Initialize sum to 0.
LOOP      Add        (R2)+,R0     Add entry from list.
          Decrement  R1
          Branch>0   LOOP
          Return                  Return to calling program.
Figure 2.25. Program of Figure 2.16 written as a subroutine; parameters passed through registers.
Passing Parameters through
Stack
Assume top of stack is at level 1 below.
          Move          #NUM1,-(SP)    Push parameters onto stack.
          Move          N,-(SP)
          Call          LISTADD        Call subroutine
                                       (top of stack at level 2).
          Move          4(SP),SUM      Save result.
          Add           #8,SP          Restore top of stack
                                       (top of stack at level 1).
          ...
LISTADD   MoveMultiple  R0-R2,-(SP)    Save registers
                                       (top of stack at level 3).
          Move          16(SP),R1      Initialize counter to n.
          Move          20(SP),R2      Initialize pointer to the list.
          Clear         R0             Initialize sum to 0.
LOOP      Add           (R2)+,R0       Add entry from list.
          Decrement     R1
          Branch>0      LOOP
          Move          R0,20(SP)      Put result on the stack.
          MoveMultiple  (SP)+,R0-R2    Restore registers.
          Return                       Return to calling program.
(a) Calling program and subroutine

Stack contents, from the top of the stack downward:
Level 3:  [R0]
          [R1]
          [R2]
Level 2:  Return address
          n                (passing by value)
          NUM1             (passing by reference)
Level 1:  top of stack before the parameters are pushed
(b) Top of stack at various times
Figure 2.26. Program of Figure 2.16 written as a subroutine; parameters passed on the stack.
The Stack Frame
Some stack locations constitute a private
work space for a subroutine, created at the
time the subroutine is entered and freed up
when the subroutine returns control to the
calling program. Such space is called a stack
frame.
Frame pointer (FP)
Index addressing to access data inside frame
-4(FP), 8(FP), ...
Additional
Instructions
Logic Instructions
AND
OR
NOT (what is the purpose of the following?)
Not R0
Add #1, R0
Determine if the leftmost character of the four ASCII
characters stored in the 32-bit register R0 is "Z"
(01011010)
And #$FF000000, R0
Compare #$5A000000, R0
Branch=0 YES
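Both snippets on this slide can be tried out on 32-bit values. The sketch below is illustrative; the function names are assumptions, and Python integers are masked to 32 bits to imitate the register R0.

```python
MASK32 = 0xFFFFFFFF

def not_then_add_one(r0):
    """Not R0 followed by Add #1, R0: forms the 2's-complement negation of R0."""
    r0 = ~r0 & MASK32            # Not R0
    return (r0 + 1) & MASK32     # Add #1, R0

def leftmost_char_is_Z(r0):
    """And #$FF000000, R0 then Compare #$5A000000, R0 then Branch=0 YES."""
    return (r0 & 0xFF000000) == 0x5A000000

minus_five = not_then_add_one(5)              # 32-bit pattern for -5
is_z = leftmost_char_is_Z(0x5A414243)         # the characters "ZABC"
```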
Logical Shifts
Logical shift: shifting left (LShiftL) and shifting right
(LShiftR)
(a) Logical shift left: LShiftL #2,R0
(b) Logical shift right: LShiftR #2,R0
[Figure: contents of the carry flag C and register R0 before and after each shift. Vacated bit positions are filled with 0, and the last bit shifted out is placed in C.]
Logical Shifts
Two decimal digits represented in ASCII code are located at LOC
and LOC+1. Pack these two digits in a single byte location PACKED.
Extract the low-order four bits in LOC and LOC+1, and concatenate
them into the single byte at PACKED. Example: A1, B212
          Move      #LOC,R0      R0 points to data.
          MoveByte  (R0)+,R1     Load first byte into R1.
          LShiftL   #4,R1        Shift left by 4 bit positions.
          MoveByte  (R0),R2      Load second byte into R2.
          And       #$F,R2       Eliminate high-order bits.
          Or        R1,R2        Concatenate the BCD digits.
          MoveByte  R2,PACKED    Store the result.
Figure 2.31. A routine that packs two BCD digits.
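The same packing steps can be followed on ordinary integers. This is a minimal sketch; the function name and the sample digits are assumptions for the example.

```python
def pack_bcd(first_byte, second_byte):
    """Pack the low-order four bits of two ASCII digits into one byte."""
    r1 = first_byte << 4           # LShiftL #4,R1
    r2 = second_byte & 0xF         # And #$F,R2
    r2 = r1 | r2                   # Or R1,R2
    return r2 & 0xFF               # MoveByte stores only the low-order byte

packed = pack_bcd(ord("1"), ord("2"))   # ASCII $31 and $32
```

The high-order bits of the first byte do not need masking because they are shifted out of the low-order byte that MoveByte stores.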
Arithmetic Shifts
(c) Arithmetic shift right: AShiftR #2,R0
[Figure: contents of register R0 and the carry flag C before and after the shift. The sign bit is replicated into the vacated positions, and the last bit shifted out is placed in C.]
Rotate
(a) Rotate left without carry: RotateL #2,R0
(b) Rotate left with carry: RotateLC #2,R0
(c) Rotate right without carry: RotateR #2,R0
(d) Rotate right with carry: RotateRC #2,R0
[Figure: contents of the carry flag C and register R0 before and after each rotate.]
Figure 2.32. Rotate instructions.
Multiplication and Division
Not very popular (especially division)
Multiply Ri, Rj
    Rj ← [Ri] × [Rj]
    2n-bit product case: high-order half in R(j+1)
Divide Ri, Rj
    Rj ← [Ri] / [Rj]
    Quotient is in Rj, remainder may be placed in R(j+1)
Example Programs
Vector Dot Product Program
          Move       #AVEC,R1      R1 points to vector A.
          Move       #BVEC,R2      R2 points to vector B.
          Move       N,R3          R3 serves as a counter.
          Clear      R0            R0 accumulates the dot product.
LOOP      Move       (R1)+,R4      Compute the product of
          Multiply   (R2)+,R4      next components.
          Add        R4,R0         Add to previous sum.
          Decrement  R3            Decrement the counter.
          Branch>0   LOOP          Loop again if not done.
          Move       R0,DOTPROD    Store dot product in memory.
Figure 2.33. A program for computing the dot product of two vectors.
$\text{Dot Product} = \sum_{i=0}^{n-1} A(i) \times B(i)$
Byte-Sorting Program
Sort a list of bytes stored in memory into
ascending alphabetic order.
The list consists of n bytes, not necessarily
distinct, stored from memory location LIST to
LIST+n-1. Each byte contains the ASCII code
for a character from A to Z. The first bit is 0.
Straight-selection algorithm (compare and
swap)
Byte-Sorting Program
for (j = n - 1; j > 0; j = j - 1)
  { for (k = j - 1; k >= 0; k = k - 1)
      { if (LIST[k] > LIST[j])
          { TEMP = LIST[k];
            LIST[k] = LIST[j];
            LIST[j] = TEMP;
          }
      }
  }
(a) C-language program for sorting

          Move         #LIST,R0      Load LIST into base register R0.
          Move         N,R1          Initialize outer loop index
          Subtract     #1,R1         register R1 to j = n - 1.
OUTER     Move         R1,R2         Initialize inner loop index
          Subtract     #1,R2         register R2 to k = j - 1.
          MoveByte     (R0,R1),R3    Load LIST(j) into R3, which holds
                                     current maximum in sublist.
INNER     CompareByte  R3,(R0,R2)    If LIST(k) ≤ [R3],
          Branch≤0     NEXT          do not exchange.
          MoveByte     (R0,R2),R4    Otherwise, exchange LIST(k)
          MoveByte     R3,(R0,R2)    with LIST(j) and load
          MoveByte     R4,(R0,R1)    new maximum into R3.
          MoveByte     R4,R3         Register R4 serves as TEMP.
NEXT      Decrement    R2            Decrement index registers R2 and
          Branch≥0     INNER         R1, which also serve
          Decrement    R1            as loop counters, and branch
          Branch>0     OUTER         back if loops not finished.
(b) Assembly language program for sorting
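For comparison, here is the same straight-selection loop structure as a runnable sketch. The function name and the sample input are assumptions; the loop bounds follow the C-language version above.

```python
def sort_bytes(data):
    """Sort ASCII bytes into ascending order using the compare-and-swap loops above."""
    lst = bytearray(data)
    n = len(lst)
    for j in range(n - 1, 0, -1):          # j = n-1 down to 1
        for k in range(j - 1, -1, -1):     # k = j-1 down to 0
            if lst[k] > lst[j]:            # larger byte found: exchange
                lst[k], lst[j] = lst[j], lst[k]
    return bytes(lst)

result = sort_bytes(b"SORTING")
```

After each pass of the outer loop, position j holds the largest byte of the sublist 0..j, which is why the outer index can stop at 1.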
Insertion of a New Record
Suppose that the ID number of the new record is 28241,
and the next available free record block is at address 2960.
What are the possibilities for the new record's
position in the list?
One-entry list
New head
Interior position
New tail

INSERTION  Compare   #0,RHEAD
           Branch>0  HEAD                   (not empty)
           Move      RNEWREC,RHEAD          new record becomes a
           Return                           one-entry list
HEAD       Compare   (RHEAD),(RNEWREC)
           Branch>0  SEARCH
           Move      RHEAD,4(RNEWREC)       new record becomes
           Move      RNEWREC,RHEAD          new head
           Return
SEARCH     Move      RHEAD,RCURRENT         insert new record
LOOP       Move      4(RCURRENT),RNEXT      somewhere after
           Compare   #0,RNEXT               current head
           Branch=0  TAIL
           Compare   (RNEXT),(RNEWREC)
           Branch<0  INSERT
           Move      RNEXT,RCURRENT
           Branch    LOOP
INSERT     Move      RNEXT,4(RNEWREC)       insert new record in
                                            an interior position
TAIL       Move      RNEWREC,4(RCURRENT)    new record becomes new tail
           Return
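The four cases handled by the insertion subroutine can be sketched with a dictionary standing in for memory. This is an illustration under assumptions: each record is modelled as a two-element list [ID, link], a link of 0 marks the end of the list, and every address and ID other than 28241 and 2960 is made up for the example.

```python
def insert(mem, head, new):
    """Insert record `new` into the ID-ordered list starting at `head`; return the head."""
    if head == 0:                        # empty list: new record becomes a one-entry list
        return new
    if mem[new][0] < mem[head][0]:       # new record becomes new head
        mem[new][1] = head
        return new
    current = head                       # search somewhere after the current head
    while True:
        nxt = mem[current][1]
        if nxt == 0 or mem[new][0] < mem[nxt][0]:
            break                        # reached the tail, or found an interior position
        current = nxt
    mem[new][1] = nxt                    # nxt is 0 when the new record becomes the tail
    mem[current][1] = new
    return head

# Hypothetical list: record address -> [ID, link]
mem = {2320: [27243, 1040], 1040: [28106, 1200], 1200: [28370, 0], 2960: [28241, 0]}
head = insert(mem, 2320, 2960)
```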
Deletion of a Record
DELETION   Compare   (RHEAD),RIDNUM
           Branch>0  SEARCH
           Move      4(RHEAD),RHEAD
           Return
SEARCH     Move      RHEAD,RCURRENT         (not the head record)
LOOP       Move      4(RCURRENT),RNEXT
           Compare   (RNEXT),RIDNUM
           Branch=0  DELETE
           Move      RNEXT,RCURRENT
           Branch    LOOP
DELETE     Move      4(RNEXT),RTEMP
           Move      RTEMP,4(RCURRENT)
           Return
Figure 2.38. A subroutine for deleting a record from a linked list. Any problem?
Encoding of Machine
Instructions
Encoding of Machine
Instructions
Assembly language program needs to be converted into machine
instructions. (ADD = 0100 in ARM instruction set)
In the previous section, an assumption was made that all
instructions are one word in length.
OP code: the type of operation to be performed and the type of
operands used may be specified using an encoded binary pattern
Suppose 32-bit word length, 8-bit OP code (how many instructions
can we have?), 16 registers in total (how many bits?), 3-bit
addressing mode indicator.
Add R1, R2
Move 24(R0), R5
LShiftR #2, R0
Move #$3A, R1
Branch>0 LOOP
Field:   OP code | Source | Dest | Other info
Bits:       8    |   7    |  7   |    10
(a) One-word instruction
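The one-word format can be sketched as bit-field packing. With an 8-bit OP code there can be 2^8 = 256 distinct instructions, and 16 registers need 4 bits, so each 7-bit operand field holds a 4-bit register number plus the 3-bit addressing mode indicator. The placement of the mode bits above the register bits and the opcode value used below are assumptions made only for this example.

```python
def encode(opcode, src_reg, src_mode, dst_reg, dst_mode, other=0):
    """Pack the fields into one 32-bit word: 8 | 7 | 7 | 10 bits."""
    for value, bits in ((opcode, 8), (src_reg, 4), (src_mode, 3),
                        (dst_reg, 4), (dst_mode, 3), (other, 10)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit")
    src = (src_mode << 4) | src_reg      # 7-bit source field (layout assumed)
    dst = (dst_mode << 4) | dst_reg      # 7-bit destination field (layout assumed)
    return (opcode << 24) | (src << 17) | (dst << 10) | other

def decode(word):
    """Unpack a 32-bit word into (opcode, src_reg, src_mode, dst_reg, dst_mode, other)."""
    src = (word >> 17) & 0x7F
    dst = (word >> 10) & 0x7F
    return (word >> 24) & 0xFF, src & 0xF, src >> 4, dst & 0xF, dst >> 4, word & 0x3FF

# "Add R1, R2" with a hypothetical opcode 0x12 and mode 0 meaning register mode.
word = encode(0x12, 1, 0, 2, 0)
```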
Encoding of Machine
Instructions
What happens if we want to specify a memory
operand using the Absolute addressing mode?
Move R2, LOC
14 bits for LOC: insufficient
Solution: use two words
Word 1:  OP code | Source | Dest | Other info
Word 2:  Memory address/Immediate operand
(b) Two-word instruction
Encoding of Machine
Instructions
Then what if an instruction in which two operands
can be specified using the Absolute addressing
mode?
Move LOC1, LOC2
Solution: use two additional words
This approach results in instructions of variable
length. Complex instructions can be implemented,
closely resembling operations in high-level
programming languages: Complex Instruction Set
Computer (CISC)
Encoding of Machine
Instructions
If we insist that all instructions must fit into a single
32-bit word, it is not possible to provide a 32-bit
address or a 32-bit immediate operand within the
instruction.
It is still possible to define a highly functional
instruction set, which makes extensive use of the
processor registers.
Add R1, R2 ----- yes
Add LOC, R2 ----- no
Add (R3), R2 ----- yes
Encoding of Machine
Instructions
How to load a 32-bit address into a register that
serves as a pointer to memory locations?
Solution #1: direct the assembler to place the
desired address in a word location in a data area
close to the program, so that the Relative
addressing mode can be used to load the address.
Solution #2: use logical and shift instructions to
construct the desired 32-bit address by giving it in
parts that are small enough to be specifiable using
the Immediate addressing mode.
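Solution #2 can be illustrated by assembling a 32-bit address from two immediate-sized parts. The 16-bit split is an assumption for this sketch; the slides only require that each part be small enough for the Immediate mode.

```python
def build_address(high16, low16):
    """Build a 32-bit address from two 16-bit immediates with a shift and an OR."""
    r = high16 & 0xFFFF              # load the high-order part as an immediate
    r = (r << 16) & 0xFFFFFFFF       # logical shift left by 16 bit positions
    r = r | (low16 & 0xFFFF)         # OR in the low-order part
    return r

address = build_address(0x1234, 0xABCD)
```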
Encoding of Machine
Instructions
An instruction must occupy only one word:
Reduced Instruction Set Computer (RISC)
Other restrictions
OP code | Ri | Rj | Rk | Other info
(c) Three-operand instruction
Unit 2
ALU Design
Outline
A basic operation in all digital computers is
the addition or subtraction of two numbers.
ALU: AND, OR, NOT, XOR
Unsigned/signed numbers
Addition/subtraction
Multiplication
Division
Floating-point number operations
Adders
Addition of Unsigned Numbers
Half Adder
(a) The four possible cases of x + y, giving a carry c and a sum s:
    0 + 0 = 0 0     0 + 1 = 0 1     1 + 0 = 0 1     1 + 1 = 1 0
(b) Truth table
    x  y | Carry c  Sum s
    0  0 |    0       0
    0  1 |    0       1
    1  0 |    0       1
    1  1 |    1       0
(c) Circuit: inputs x and y, outputs s and c
(d) Graphical symbol: block HA with inputs x and y, outputs s and c
Addition and Subtraction of
Signed Numbers
Truth table for stage i:
    x_i  y_i  Carry-in c_i | Sum s_i  Carry-out c_{i+1}
     0    0        0       |    0            0
     0    0        1       |    1            0
     0    1        0       |    1            0
     0    1        1       |    0            1
     1    0        0       |    1            0
     1    0        1       |    0            1
     1    1        0       |    0            1
     1    1        1       |    1            1
$s_i = \bar{x}_i \bar{y}_i c_i + \bar{x}_i y_i \bar{c}_i + x_i \bar{y}_i \bar{c}_i + x_i y_i c_i = x_i \oplus y_i \oplus c_i$
$c_{i+1} = y_i c_i + x_i c_i + x_i y_i$
Example: X = 7 (0111) plus Y = 6 (0110) gives Z = 13 (1101).
Figure 6.1. Logic specification for a stage of binary addition.
Addition and Subtraction of
Signed Numbers
A full adder (FA)
(a) Logic for a single stage
[Figure: gate-level circuit that forms s_i and c_{i+1} from x_i, y_i and c_i, and the block symbol "Full adder (FA)" with inputs x_i, y_i, c_i and outputs s_i, c_{i+1}.]
Addition and Subtraction of
Signed Numbers
n-bit ripple-carry adder
Overflow?
(b) An n-bit ripple-carry adder
[Figure: n FA stages in a chain. Stage i takes x_i, y_i and c_i and produces s_i and c_{i+1}; c_0 enters at the least significant bit (LSB) position and c_n leaves the most significant bit (MSB) position.]
Addition and Subtraction of
Signed Numbers
kn-bit ripple-carry adder
(c) Cascade of k n-bit adders
[Figure: k n-bit adders in a chain. The first adder takes x_{n-1}..x_0, y_{n-1}..y_0 and c_0 and produces s_{n-1}..s_0 and c_n; the last takes x_{kn-1}.., y_{kn-1}.. and produces s_{kn-1}..s_{(k-1)n} and c_{kn}.]
Figure 6.2. Logic for addition of binary vectors.
Addition and Subtraction of
Signed Numbers
Addition/subtraction logic unit
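[Figure: an n-bit adder with inputs x_{n-1}..x_0, outputs s_{n-1}..s_0 and carry-out c_n. An Add/Sub control line drives c_0 and also conditions the inputs y_{n-1}..y_0 before they enter the adder.]
Figure 6.3. Binary addition-subtraction logic network.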
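The adder slides can be tied together in a small bit-level model. This is a sketch under assumptions: bit lists are least-significant-bit first, the function names are invented, and the subtraction path assumes the usual arrangement in which the Add/Sub control line is XORed with each y input and also supplies c_0, so that X - Y is computed as X plus the 2's complement of Y.

```python
def full_adder(x, y, c):
    """One stage: s_i = x xor y xor c, c_{i+1} = yc + xc + xy."""
    s = x ^ y ^ c
    c_out = (y & c) | (x & c) | (x & y)
    return s, c_out

def ripple_add(x_bits, y_bits, c0=0):
    """n-bit ripple-carry adder; bit lists are LSB first. Returns (sum bits, c_n)."""
    carry, s_bits = c0, []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)   # the carry ripples to the next stage
        s_bits.append(s)
    return s_bits, carry

def add_sub(x_bits, y_bits, sub):
    """sub = 0 adds; sub = 1 complements y and sets c_0 = 1 (assumed XOR arrangement)."""
    y_in = [y ^ sub for y in y_bits]
    return ripple_add(x_bits, y_in, c0=sub)

seven = [1, 1, 1, 0]     # 0111, LSB first
six = [0, 1, 1, 0]       # 0110, LSB first
total, carry_out = add_sub(seven, six, 0)
difference, _ = add_sub(seven, six, 1)
```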
Make Addition Faster
Ripple-Carry Adder (RCA)
Straight-forward design
Simple circuit structure
Easy to understand
Most power efficient
Slowest (too long critical path)
Adders
We can view addition in terms of generate,
G[i], and propagate, P[i].
Carry-lookahead Logic
Carry Generate: $G_i = A_i B_i$ (must generate a carry when $A_i = B_i = 1$)
Carry Propagate: $P_i = A_i \oplus B_i$ (carry-in will equal carry-out here)
Sum and Carry can be reexpressed in terms of generate/propagate/$C_i$:
$S_i = A_i \oplus B_i \oplus C_i = P_i \oplus C_i$
$C_{i+1} = A_i B_i + A_i C_i + B_i C_i$
$\phantom{C_{i+1}} = A_i B_i + C_i (A_i + B_i)$
$\phantom{C_{i+1}} = A_i B_i + C_i (A_i \oplus B_i)$
$\phantom{C_{i+1}} = G_i + C_i P_i$
Carry-lookahead Logic
Reexpress the carry logic as follows:
$C_1 = G_0 + P_0 C_0$
$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_0$
$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0$
$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0$
Each of the carry equations can be implemented in a two-level logic
network
Variables are the adder inputs and carry in to stage 0!
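The expanded carry equations can be verified against ordinary addition. The sketch below is an illustration; the function name and the integer-based interface are assumptions, and each carry is written directly from the flattened expressions above so that no carry depends on another carry.

```python
def cla4(a, b, c0=0):
    """4-bit carry-lookahead addition; returns (4-bit sum, C4)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [x & y for x, y in zip(A, B)]        # G_i = A_i B_i
    P = [x ^ y for x, y in zip(A, B)]        # P_i = A_i xor B_i
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((P[i] ^ carries[i]) << i for i in range(4))   # S_i = P_i xor C_i
    return s, c4

s, c4 = cla4(7, 6)
```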
Carry-lookahead
Implementation
Adder with Propagate and Generate Outputs
    Inputs Ai, Bi, Ci
    Pi @ 1 gate delay
    Gi @ 1 gate delay
    Si @ 2 gate delays
Increasingly complex logic
[Figure: two-level gate networks that form C1, C2, C3 and C4 from C0 and the signals P0..P3 and G0..G3; each successive carry needs more and wider gates.]
Carry-lookahead Logic
Cascaded Carry Lookahead
Carry lookahead logic generates individual carries, so sums are computed
much faster.
[Figure: four adder stages with inputs A_i, B_i and C_i. Signals are available after the following gate delays: S0 @2; C1 @3, S1 @4; C2 @3, S2 @4; C3 @3, S3 @4; C4 @3.]
Carry-lookahead Logic
Figure 6.5. 16-bit carry-lookahead adder built from 4-bit adders (see Figure 6.4 b).
[Figure: four 4-bit adders handle bits 3-0, 7-4, 11-8 and 15-12 of x and y and produce s_{3-0}, s_{7-4}, s_{11-8} and s_{15-12}. Adder k supplies the first-level signals G_k^I and P_k^I to a shared carry-lookahead logic block, which takes c_0, returns c_4, c_8 and c_12 to the adders, produces c_16, and outputs the second-level signals G_0^II and P_0^II.]
Carry-lookahead Logic
4 bit adders with internal carry lookahead
second level carry lookahead unit, extends lookahead to 16 bits
Group Propagate: $P = P_3 P_2 P_1 P_0$
Group Generate: $G = G_3 + G_2 P_3 + G_1 P_3 P_2 + G_0 P_3 P_2 P_1$
[Figure: four 4-bit adders with inputs A[3-0] B[3-0], A[7-4] B[7-4], A[11-8] B[11-8] and A[15-12] B[15-12] and outputs S[3-0] through S[15-12]. Each adder sends its group P and G to the Lookahead Carry Unit, which takes C0, supplies C4, C8 and C12 to the adders, and produces C16 together with P 3-0 and G 3-0. Signals are annotated with their gate delays.]
Unsigned
Multiplication
Manual Multiplication
Algorithm
          1 1 0 1     (13) Multiplicand M
        × 1 0 1 1     (11) Multiplier Q
        ---------
          1 1 0 1
        1 1 0 1
      0 0 0 0
    1 1 0 1
  ---------------
  1 0 0 0 1 1 1 1     (143) Product P
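The manual algorithm adds a shifted copy of the multiplicand for every 1 bit of the multiplier. A minimal sketch follows; the function name and the default width of 4 bits are assumptions for the example.

```python
def multiply_unsigned(m, q, n=4):
    """Shift-and-add multiplication of two n-bit unsigned numbers."""
    product = 0
    for i in range(n):
        if (q >> i) & 1:          # multiplier bit i is 1:
            product += m << i     # add the multiplicand shifted left by i
    return product

p = multiply_unsigned(13, 11)     # 1101 x 1011
```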
Booth Algorithm
Multiplier           Version of multiplicand
Bit i    Bit i-1     selected by bit i
  0         0          0 × M
  0         1         +1 × M
  1         0         -1 × M
  1         1          0 × M
Figure 6.12. Booth multiplier recoding table.
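The recoding table amounts to computing (bit i-1) minus (bit i) at every position, with an implied 0 to the right of the least significant bit. The sketch below applies that rule; the function names and the 5-bit example are assumptions made for illustration.

```python
def booth_recode(q, n):
    """Booth digits (each -1, 0 or +1) for the n-bit pattern q; index 0 is bit 0."""
    digits = []
    prev = 0                           # implied bit to the right of the LSB
    for i in range(n):
        bit = (q >> i) & 1
        digits.append(prev - bit)      # 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0
        prev = bit
    return digits

def booth_multiply(m, q, n):
    """Multiply m by the n-bit 2's-complement multiplier q using the recoded digits."""
    pattern = q & ((1 << n) - 1)       # n-bit 2's-complement bit pattern of q
    return sum(d * (m << i) for i, d in enumerate(booth_recode(pattern, n)))

digits = booth_recode(0b11010, 5)      # 11010 is -6 in 5-bit 2's complement
product = booth_multiply(13, -6, 5)
```

Because the recoded digits sum to the signed value of the multiplier, the same procedure handles positive and negative multipliers.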
Bit-Pair Recoding of
Multipliers
[Figure: worked multiplication example (+13) × (-6) = (-78), carried out with the recoded multiplier.]
Floating-Point Numbers
What is needed to represent a floating-point
decimal number?
Sign
Mantissa (the significant digits)
Exponent to an implied base (scale factor)
Normalized: the decimal point is placed to
the right of the first (nonzero) significant digit.
IEEE Standard for Floating-
Point Numbers
Think about this number (all digits are decimal):
±X_1.X_2 X_3 X_4 X_5 X_6 X_7 × 10^(±Y_1 Y_2)
It is possible to approximate this mantissa precision
and scale factor range in a binary representation
that occupies 32 bits: 1 sign bit, a 24-bit mantissa
(23 stored fraction bits plus an implied leading 1),
and an 8-bit exponent.
Instead of the signed exponent, E, the value actually
stored in the exponent field is an unsigned integer
E' = E + 127, the so-called excess-127 format.
IEEE Standard
(a) Single precision (32 bits)
  S: sign of number (0 signifies +, 1 signifies -)
  E': 8-bit signed exponent in excess-127 representation
  M: 23-bit mantissa fraction
  Value represented = ±1.M × 2^(E' - 127)
(b) Example of a single-precision number
  S = 0, E' = 00101000, M = 001010...0
  Value represented = +1.001010...0 × 2^(-87)
  (00101000)_2 = 40_10, and 40 - 127 = -87
(c) Double precision (64 bits)
  S: sign
  E': 11-bit excess-1023 exponent
  M: 52-bit mantissa fraction
  Value represented = ±1.M × 2^(E' - 1023)
Figure 6.24. IEEE standard floating-point formats.
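The single-precision layout can be unpacked with a few shifts and masks. The sketch below rebuilds the example pattern (S = 0, stored exponent 00101000, fraction 0010100 followed by zeros) and compares the result with Python's own interpretation of the same 32 bits; the function name is illustrative, and the formula applies only to normal numbers (stored exponent 1 to 254).

```python
import struct


def decode_single(bits):
    """Split a 32-bit pattern into S, stored exponent, M and the normal value."""
    s = (bits >> 31) & 1
    e_stored = (bits >> 23) & 0xFF   # excess-127 exponent field
    m = bits & 0x7FFFFF              # 23-bit mantissa fraction
    value = (-1) ** s * (1 + m / 2 ** 23) * 2.0 ** (e_stored - 127)
    return s, e_stored, m, value


# Example pattern: sign 0, exponent 00101000 (40), fraction 0010100 0...0
bits = (0 << 31) | (0b00101000 << 23) | (0b0010100 << 16)
s, e_stored, m, value = decode_single(bits)

# Cross-check: let the struct module reinterpret the same bits as a float.
reference = struct.unpack(">f", struct.pack(">I", bits))[0]
print(e_stored - 127, value == reference)  # prints -87 True
```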
IEEE Standard
For excess-127 format, 0 ≤ E' ≤ 255.
However, 0 and 255 are used to represent
special values, so actually 1 ≤ E' ≤ 254. That
means -126 ≤ E ≤ 127.
Single precision uses 32 bits. The scale factor
ranges from 2^(-126) to 2^(+127).
Double precision uses 64 bits. The scale factor
ranges from 2^(-1022) to 2^(+1023).
Two Aspects
If a number is not normalized, it can always be put in normalized
form by shifting the fraction and adjusting the exponent.
(a) Unnormalized value
  S = 0, excess-127 exponent = 10001000, fraction = 0010110...
  (There is no implicit 1 to the left of the binary point.)
  Value represented = +0.0010110... × 2^9
  (10001000)_2 = 136_10, and 136 - 127 = 9
(b) Normalized version
  S = 0, excess-127 exponent = 10000101, fraction = 0110...
  Value represented = +1.0110... × 2^6
  6 + 127 = 133, and 133_10 = (10000101)_2
Figure 6.25. Floating-point normalization in IEEE single-precision format.
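The normalization step in the figure can be reproduced numerically. This sketch treats the mantissa as a nonnegative magnitude held in a Python float and the exponent as the true (unbiased) exponent; both the representation and the function name are assumptions made for illustration.

```python
def normalize(mantissa, exponent):
    """Shift until 1 <= mantissa < 2, adjusting the exponent to compensate."""
    if mantissa == 0:
        return mantissa, exponent   # zero cannot be normalized
    while mantissa >= 2:            # shift right, exponent goes up
        mantissa /= 2
        exponent += 1
    while mantissa < 1:             # shift left, exponent goes down
        mantissa *= 2
        exponent -= 1
    return mantissa, exponent


# Unnormalized value from the figure: 0.0010110 (binary) x 2^9
m, e = normalize(0b0010110 / 2 ** 7, 9)
print(m, e, format(e + 127, "08b"))  # prints 1.375 6 10000101
```

Here 1.375 is 1.0110 in binary, and the stored exponent 6 + 127 = 133 matches the normalized version.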
Two Aspects
As computations proceed, a number that
does not fall in the representable range of
normal numbers might be generated.
It requires an exponent less than -126
(underflow) or greater than +127 (overflow).
Both are exceptions that need to be
considered.
Special Values
The end values 0 and 255 of the stored exponent E' are used to
represent special values.
When E' = 0 and M = 0, the value exact 0 is
represented. (±0)
When E' = 255 and M = 0, the value ∞ is represented.
(±∞)
When E' = 0 and M ≠ 0, denormal numbers are
represented. The value is ±0.M × 2^(-126).
When E' = 255 and M ≠ 0, the value is Not a Number (NaN).
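The four special cases depend only on the stored exponent field and on whether the mantissa fraction is zero. A short sketch for single precision, with an illustrative function name and category labels:

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern by its exponent and fraction."""
    e_stored = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e_stored == 0:
        return "zero" if m == 0 else "denormal"
    if e_stored == 255:
        return "infinity" if m == 0 else "NaN"
    return "normal"


for pattern in (0x00000000, 0x80000000, 0x00000001, 0x7F800000, 0x7FC00000, 0x3F800000):
    print(format(pattern, "08X"), classify_single(pattern))
```

The sign bit is ignored here, so +0 and -0 (and the two infinities) fall into the same category.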
Exceptions
A processor must set exception flags if any of
the following occur in performing operations:
underflow, overflow, divide by zero, inexact,
invalid.
When an exception occurs, the results are set to
special values.
Arithmetic Operations on
Floating-Point Numbers
Add/Subtract rule
Choose the number with the smaller exponent and shift its mantissa right a
number of steps equal to the difference in exponents.
Set the exponent of the result equal to the larger exponent.
Perform addition/subtraction on the mantissas and determine the sign of the
result.
Normalize the resulting value, if necessary.
Multiply rule
Add the exponents and subtract 127.
Multiply the mantissas and determine the sign of the result.
Normalize the resulting value, if necessary.
Divide rule
Subtract the exponents and add 127.
Divide the mantissas and determine the sign of the result.
Normalize the resulting value, if necessary.
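The three rules can be traced on a toy model. The sketch assumes each number is a tuple (sign, stored excess-127 exponent, mantissa), with the mantissa kept as a Python float in the range 1 to 2; it ignores rounding, overflow, underflow and special values, and all names are illustrative.

```python
def normalize(m, e):
    """Bring a nonnegative mantissa back into [1, 2), adjusting the exponent."""
    while m >= 2:
        m /= 2
        e += 1
    while 0 < m < 1:
        m *= 2
        e -= 1
    return m, e


def fp_add(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    if ea < eb:  # make a the operand with the larger exponent
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    mb = mb / 2 ** (ea - eb)                   # shift smaller operand right
    total = (-1) ** sa * ma + (-1) ** sb * mb  # add or subtract mantissas
    m, e = normalize(abs(total), ea)           # result exponent = larger exponent
    return (1 if total < 0 else 0), e, m


def fp_mul(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    m, e = normalize(ma * mb, ea + eb - 127)   # add exponents, subtract 127
    return sa ^ sb, e, m


def fp_div(a, b):
    (sa, ea, ma), (sb, eb, mb) = a, b
    m, e = normalize(ma / mb, ea - eb + 127)   # subtract exponents, add 127
    return sa ^ sb, e, m


def to_float(x):
    s, e, m = x
    return (-1) ** s * m * 2.0 ** (e - 127)


a = (0, 129, 1.5)    # 1.5 x 2^2 = 6.0
b = (0, 127, 1.25)   # 1.25 x 2^0 = 1.25
print(to_float(fp_add(a, b)), to_float(fp_mul(a, b)))  # prints 7.25 7.5
```

The bias correction (subtract 127 on multiply, add 127 on divide) keeps the stored exponent in excess-127 form, since adding two stored exponents counts the bias twice and subtracting them cancels it.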
Guard Bits and Truncation
During the intermediate steps, it is important
to retain extra bits, often called guard bits, to
yield the maximum accuracy in the final
results.
Removing the guard bits in generating a final
result requires truncation of the extended
mantissa. How?
Guard Bits and Truncation
Chopping: biased, error from 0 to 1 at LSB.
  All 6-bit fractions from 0.b_{-1}b_{-2}b_{-3}000 to 0.b_{-1}b_{-2}b_{-3}111 are truncated to 0.b_{-1}b_{-2}b_{-3}.
Von Neumann rounding (if any of the bits to be removed are 1,
the LSB of the retained bits is set to 1): unbiased, error from -1 to +1 at
LSB.
  All 6-bit fractions with b_{-4}b_{-5}b_{-6} not equal to 000 are truncated to 0.b_{-1}b_{-2}1.
  Why is unbiased rounding better when many
  operands are involved?
Rounding (a 1 is added to the LSB position of the bits to be
retained if there is a 1 in the MSB position of the bits being
removed): unbiased, error from -1/2 to +1/2 at LSB.
  Round to the nearest number, or to the nearest even number in case of a tie
  (0.b_{-1}b_{-2}0100 → 0.b_{-1}b_{-2}0, 0.b_{-1}b_{-2}1100 → 0.b_{-1}b_{-2}1 + 0.001)
  Best accuracy
  Most difficult to implement
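The three truncation methods can be compared on 6-bit fractions reduced to 3 bits. In this sketch a fraction 0.b_{-1}...b_{-6} is held as an integer from 0 to 63, and the function names are illustrative.

```python
def chop(frac6):
    """Chopping: simply drop the three low-order bits."""
    return frac6 >> 3


def von_neumann(frac6):
    """Set the retained LSB to 1 if any removed bit is 1."""
    kept = frac6 >> 3
    return (kept | 1) if (frac6 & 0b111) else kept


def round_nearest_even(frac6):
    """Round to nearest; on a tie (removed bits 100) pick the even result."""
    kept, removed = frac6 >> 3, frac6 & 0b111
    if removed > 0b100 or (removed == 0b100 and kept & 1):
        kept += 1  # may carry out to 0b1000 (1.000), which needs renormalizing
    return kept


for value in (0b100011, 0b100100, 0b101100, 0b100101):
    print(format(value, "06b"),
          format(chop(value), "03b"),
          format(von_neumann(value), "03b"),
          format(round_nearest_even(value), "03b"))
```

The tie cases 100100 and 101100 show the nearest-even choice: the first stays at 100, the second rounds up to 110.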
Implementing Floating-Point
Operations
Hardware/software
In most general-purpose processors, floating-
point operations are available at the machine-
instruction level, implemented in hardware.
In high-performance processors, a significant
portion of the chip area is assigned to
floating-point operations.
Addition/subtraction circuitry
Figure 6.26. Floating-point addition-subtraction unit.
[Figure: the 32-bit operands are A: S_A, E'_A, M_A and B: S_B, E'_B, M_B. An 8-bit subtractor forms n = |E'_A - E'_B| and its sign. SWAP sends the M of the number with the smaller E' to the SHIFTER (n bits to right) and the M of the number with the larger E' to the mantissa adder/subtractor. A combinational CONTROL network takes S_A, S_B and Add/Sub and produces the Add/Subtract control and the sign S_R. A MUX selects the larger exponent E'; a leading zeros detector yields X, a second 8-bit subtractor forms E'_R = E' - X, and the normalize and round block turns the magnitude M into M_R. The 32-bit result is R = A + B: S_R, E'_R, M_R.]