Anda di halaman 1dari 101

Instruction Set Principles and Examples

UNIT - 2

Classification of Instruction Set


Architectures

Instruction Set
Design

software
instruction set

hardware

Multiple Implementations: 8086 Pentium 4


ISAs evolve: MIPS-I, MIPS-II, MIPS-III, MIPS-IV,
MIPS,MDMX, MIPS-32, MIPS-64

MIPS (originally an acronym for Microprocessor


without Interlocked Pipeline Stages)
MIPS is a reduced instruction set computer (RISC)
instruction set architecture (ISA) developed by MIPS
Technologies (formerly MIPS Computer Systems, Inc.).
The early MIPS architectures were 32-bit, and later
versions were 64-bit. Multiple revisions of the MIPS
instruction set exist, including MIPS I, MIPS II, MIPS III,
MIPS IV, MIPS V, MIPS32, and MIPS64. The current
revisions are MIPS32 (for 32-bit implementations) and
MIPS64 (for 64-bit implementations).[1][2] MIPS32 and
MIPS64 define a control register set as well as the
instruction set.

Typical Processor Execution Cycle


Instruction

Obtain instruction from program storage

Fetch
Instruction

Determine required actions and instruction size

Decode
Operand

Locate and obtain operand data

Fetch
Execute
Result

Compute result value or status


Deposit results in register or storage for later use

Store
Next
Instruction

Determine successor instruction

Instruction and Data Memory: Unified or Separate


Computer
Program
(Instructions)

Programmer's View
ADD
SUBTRACT
AND
OR
COMPARE
.
.
.

01010
01110
10011
10001
11010
.
.
.

CPU
Memory
I/O

Computer's View
Princeton (Von Neumann) Architecture
--- Data and Instructions mixed in same
unified memory

Harvard Architecture
--- Data & Instructions in
separate memories

--- Program as data


--- Storage utilization

--- Has advantages in certain


high performance
implementations

--- Single memory interface

--- Can optimize each memory

Classifying instruction set Architectures


There are four types of internal storages uses by the
processor to store operands explicitly and implicitly for
execution of a program
1.Stack
2.Accumulator
3.Set of Registers (Register-Memory)
4.Set of Registers (Register-Register/load-store)
The operands in stack architecture are implicitly on the top
of the stack, and in an accumulator architecture one
operand is implicitly the accumulator. The general-purpose
register architectures (Register-Memory and RegisterRegister) have only explicit operands, either in registers or
memory locations.

Basic Addressing Classes

Declining cost of registers

Operand locations for four instruction set architecture


classes.

The arrows indicate whether the operand is an input or the result of the
ALU operation, or both an input and result.
Lighter shades indicate inputs, and the dark shade indicates the result.
In (a), a Top Of Stack register (TOS), points to the top input operand,
which is combined with the operand below. The first operand is removed
from the stack, the result takes the place of the second operand, and
TOS is updated to point to the result. All operands are implicit. In (b), the
Accumulator is both an implicit input operand and a result. In (c), one
input operand is a register, one is in memory, and the result goes to a
register. All operands are registers in (d) and, like the stack architecture,
can be transferred to memory only via separate instructions: push or pop
for (a) and load or store for (d).

Code Sequence for C=A+B

Stack
Push A
Push B
Add
Pop C

Accumulator
Load A
Add B
Store C

Register-memory Register-register
Load R1, A
Add R3,R1, B
Store R3, C

Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C

The code sequence for C = A + B for four classes of instruction sets.


Note that the Add instruction has implicit operands for stack and accumulator
architectures, and explicit operands for register architectures. It is assumed
that A, B, and C all belong in memory and that the values of A and B cannot be
destroyed.

Stack Architectures

Accumulator Architectures

Register-Set Architectures

Register-to-Register: Load-Store Architectures

Register-to-Memory Architectures

Memory-to-Memory Architectures

Instruction Formats

Instruction Set Architecture (ISA )


To command a computer's hardware, you must speak its
language
The words of a machine's language are called instructions, and
its vocabulary is called instruction set
Once you learn one machine language, it is easy to pick up
others:

There are few fundamental operations that all computers must provide
All designer have the same goal of finding a language that simplifies building
the hardware and the compiler while maximizing performance and
minimizing cost

Learning how instructions are represented leads to discovering


the secret of computing: the stored-program concept
The MIPS instruction set is used as a case study

Interface Design
A good interface:

Lasts through many implementations (portability, compatibility)


Is used in many different ways (generality)
Provides convenient functionality to higher levels
Permits an efficient implementation at lower levels
Design decisions must take into account:

Technology
Machine organization

use

Programming languages
Compiler technology
Operating systems

imp 1
Interface

use

imp 3

Time

use

imp 2

Classifying Instruction Set Architectures


Accumulator Architecture
Common in early stored-program computers when hardware was so expensive
Machine has only one register (accumulator) involved in all math. & logical operations
All operations assume the accumulator as a source operand and a destination for the
operation, with the other operand stored in memory

Extended Accumulator Architecture


Dedicated registers for specific operations, e.g stack and array index registers, added
The 8085 microprocessor is a an example of of such special-purpose register arch.

General-Purpose Register Architecture


MIPS is an example of such arch. where registers are not sticking to play a single role
This type of instruction set can be further divided into:
Register-memory: allows for one operand to be in memory
Register-register (load-store): demands all operands to be in registers
Machine
Motorola 6800
DEC VAX
Intel 8086
Motorola 68000
Intel 80386
PowerPC
DEC Alpha

# general-purpose
registers

Architecture style

2
16
1
16
32
32
32

Accumulator
Register-memory, memory-memory
Extended accumulator
Register-memory
Register-memory
Load-store
Load-store

Year
1974
1977
1978
1980
1985
1992
1992

Compact Code and Stack Architectures

When memory is scarce, machines, like Intel 80x86 had variable-length


instructions to match varying operand specifications and minimize code size
Stack machines abandoned registers altogether arguing that it is hard for
compilers to use them efficiently

Operands are to be pushed on a stack from memory and the results have to
be popped from the stack to memory
Operations take their operand by default from the top of the stack and insert
the results back onto the stack
Stack machines simplify compilers and lent themselves to a compact
instruction encoding but limit compiler optimization (e.g. in math. expressions)
Example:

A= B+C

Push AddressC
Push AddressB
add
Pop AddressA

# Top=Top+4; Stack[Top]=Memory[AddressC]
# Top=Top+4; Stack[Top]=Memory[AddressB]
# Stack[Top-4]=Stack[Top]+Stack[Top-4]; Top=Top-4
# Memory[AddressA]=Stack[Top]; Top=Top-4

Compact code is important for heralded network computers where programs


must be downloaded over the Internet (e.g. Java-based applications)

Other types of Architecture


High-Level-Language Architecture
In the 1960s, systems software was rarely written in high-level languages and virtually
every commercial operating system before Unix was written in assembly
Some people blamed the code density on the instruction set rather than the
programming language
A machine design philosophy was advocated with the goal of making the hardware
more like high-level languages
The effectiveness of high-level languages, memory size limitation and lack of efficient
compilers doomed this philosophy to a historical footnote

Reduced Instruction Set Architecture


With the recent development in compiler technology and expanded memory sizes less
programmers are using assembly level coding
Instruction set architecture became measurable in the way compilers rather
programmable use them
RISC architecture favors simplifying hardware design over enriching the offered set of
instructions, relying on compilers to effectively use them to perform complex operations
Virtually all new architecture since 1982 follows the RISC philosophy of fixed
instruction lengths, load-store operations, and limited addressing mode

Evolution of Instruction Sets


Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)

Separation of Programming Model


from Implementation
High-level Language Based
(B5000 1963)

Concept of a Family
(IBM 360 1964)

General Purpose Register Machines


Complex Instruction Sets
(Vax, Intel 432 1977-80)

Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
RISC
(MIPS,SPARC,IBM RS6000, . . .1987)

Register-Memory Architectures
# memory
addresses

Max. number
of operands

SPARC, MIPS, PowerPC, ALPHA

Intel 60X86, Motorola 68000

VAX (also has 3 operands format)

VAX (also has 2 operands format)

Effect of the number of memory operands:

Examples

Memory Address
Interpreting Memory Addressing
The address of a word matches the byte address of one of its 4 bytes
The addresses of sequential words differ by 4 (word size in byte)
words' addresses are multiple of 4 (alignment restriction)
Machines that use the address of the leftmost byte as the word address is
called "Big Endian" and those that use rightmost bytes called "Little Endian"
Misalignment complicates memory access and causes programs to run slower
(Some machines does not allow misaligned memory access at all)
Byte ordering can be a problem when exchanging data among different machines
Byte addresses affects array index calculation to account for word addressing and
offset within the word
Object
addressed

Aligned at
byte offsets

Misaligned at
byte offsets

Byte

1,2,3,4,5,6,7

Never

Half word

0,2,4,6

1,3,5,7

Word

0,4

1,2,3,5,6,7

Double word

1,2,3,4,5,6,7

Addressing Modes
Addressing modes refer to how to specify the location of an
operand (effective address)
Addressing modes have the ability to:

Significantly reduce instruction counts


Increase the average CPI
Increase the complexity of building a machine
The VAX machine is used for benchmark data since it supports
wide range of memory addressing modes
Famous addressing modes can be classified based on:

the source of the data into register, immediate or


memory
the address calculation into direct and indirect
An indexed addressing mode is usually provided to allow
efficient implementation of loops and array access

Example of Addressing Modes


Address. mode

Example

Register

ADD R4, R3

Immediate
Displacement

ADD R4, #3
ADD R4, 100 (R1)

Register indirect

ADD R4, (R1)

Indexed

ADD R4, (R1 + R2)

Direct or absolute

ADD R4, (1001)

Memory indirect or
memory deferred
Autoincrement

ADD R4, @(R3)

Auto decrement

Scaled

ADD R4, (R2) +

ADD R4, -(R2)

ADD R4, 100 (R2)


[R3]

Meaning
Regs[R4] = Regs[R4] +
Regs[R3]
Regs[R4] = Regs[R4] + 3
Regs[R4] = Regs[R4] +
Mem[ 100 + Regs[R1] ]
Regs[R4] = Regs[R4] +
Mem[Regs[R1] ]
Regs[R4] = Regs[R4] +
Mem[Regs[R1] +
Regs[R2]]
Regs[R4] = Regs[R4] +
Mem[ 1001 ]
Regs[R4] = Regs[R4] +
Mem[Mem[Regs[R3] ]]
Regs[R4] = Regs[R4] +
Mem[Regs[R2] ]
Regs[R2] = Regs[R2] + d
Regs[R2] = Regs[R2] d
Regs[R4] = Regs[R4] +
Mem[Regs[R2] ]
Regs[R4] = Regs[R4] +
Mem[100 + Regs[R2] +
Regs[R3] * d]

When used
When a value is in a register
For constants
Accessing local variables
Accessing using a pointer or a
computed address
Sometimes useful in array
addressing: R1 = base of the
array: R2 = index amount
Sometimes useful for accessing
static data; address constant
may need to be large
If R3 is the address of the
pointer p, then mode yields *p
Useful for stepping through
arrays within a loop. R2 points to
start of the array; each reference
increments R2 by d.
Same use as autoincrement.
Autodecrement/increment can
also act as push/pop to
implement a stack
Used to index arrays.

Addressing Mode for Signal Processing


DSP offers special addressing modes to better serve popular algorithms
Special features requires either hand coding or a compiler that uses such
features (GNU would not be a good choice)
Modulo addressing:
Since DSP deals with continuous data streams,
circular buffers are widely used

Fast Fourier Transform

Circular or modulo addressing allows automatic


increment and decrement and resets pointer
when reaching the end of the buffer

1 (0012) 4 (1002)

Reverse addressing:

3 (0112) 6 (1102)

Resulting address is the reverse order of the


current address

4 (1002) 1 (0012)

Reverse addressing mode expedites the


access, which other wise requires a number of
logical instructions or extra memory access

0 (0002) 0 (0002)

2 (0102) 2 (0102)

5 (1012) 5 (1012)
6 (1102) 3 (0112)
7 (1112) 7 (1112)

Operations of the Computer Hardware


There must certainly be instructions for performing the
fundamental arithmetic operations.

Burkes, Goldstine and Von Neumann, 1947

Assembly language is a symbolic representation of what the processor actually understand


MIPS assembler allows only one instructions/line and ignore comments following # until end of line

Example:

Translation of a segment of a C program to MIPS assembly


instructions:
C:

f = (g + h) - (i + j)

MIPS:
add
add
sub

t0, g, h
t1, i, j
f, t0, t1

# temp. variable t0 contains "g + h"


# temp. variable t1 contains "i + j"
# f = t0 - t1 = (g + h) - (i + j)

Operations in the Instruction Set


Operator type
Arithmetic and logical
Data Transfer
Control
System
Floating point
Decimal
String
Graphics

Examples
Integer arithmetic and logical operations: add, and, subtract , or
Loads-stores (move instructions on machines with memory addressing)
Branch, jump, procedure call and return, trap
Operating system call, Virtual memory management instructions
Floating point instructions: add, multiply
Decimal add, decimal multiply, decimal to character conversion
String move, string compare, string search
Pixel operations, compression/decompression operations

Arithmetic, logical, data transfer and control are almost standard categories
for all machines
System instructions are required for multi-programming environments
although support for system functions varies
Decimal and string instructions can be primitives, e.g. IBM 360 and the VAX
Support for floating point, decimal, string and graphics can be optionally
sometimes provided via co-processor
Some machines rely on the compiler to synthesize special operations such
as string handling from simpler instructions

Operations for Media & Signal Process.


Single instruction multiple data (SIMD) and vector instructions
are often supported in DSPs, which are commonly used in
multimedia and signal processing applications
Partitioned Add (integer)

Perform multiple 16-bit addition on a 64-bit ALU since most data are narrow
Increases ALU throughput for multimedia applications
Paired single operations (float)

Allow same register to be acting as two operands to the same operation


Handy in dealing with vertices and coordinates
Multiply and accumulate

Very handy for calculating dot products of vectors (signal processing) and
matrix multiplication

Frequency of Operations Usage


The most widely executed instructions are the simple operations of an
instruction set
The following is the average usage in SPECint92 on Intel 80x86
Rank
1
2
3
4
5
6
7
8
9
10

80x86 Instruction
Load
Conditional branch
Compare
Store
Add
And
Sub
Move register-register
Call
Return
Total

Integer Average
(% total executed)
22%
20%
16%
12%
8%
6%
5%
4%
1%
1%
96%

Make the common case fast by focusing on these operations

Control Flow Instructions

Data is based on SPEC2000 on Alpha

jump for unconditional change in the control flow


branch for conditional change in the control flow
Procedure calls and returns

Destination Address Definition


Data is based SPEC2000 on Alpha

Relative addressing w.r.t. the program counter proved to be the best choice
for forward and backward branching or jumps (load address independent)
To allow for dynamic loading of library routines, register indirect address
allows addresses to be loaded in special registers
(e.g. virtual functions in C++ and system calls in a case statement)

Condition Evaluation

Compare&branch can be efficient if majority


of conditions are comparison with zero

Based on SPEC92 on MIPS

Remember to focus
on the common case

Frequency of Types of Comparison

Data is based on SPEC2000 on Alpha


Different benchmark and machine set new design
priority

DSPs support repeat instruction for for loops (vectors) using 3 registers

Supporting Procedures
Execution of a procedure follows the following steps:
Store parameters in a place accessible to the procedure
Transfer control to the procedure
Acquire the storage resources needed for the procedure
Perform the desired task
Store the results value in a place accessible to the calling program
Return control to the point of origin
The hardware provides a program counter to trace instruction flow and
manage transfer of control
Parameter Passing
Registers can be used for passing small number of parameters
A stack is used to spill registers of the current context and make room for
the called procedure to run and to allow for large parameters to be passed
Storage of machine state can be performed by caller or callee
Handling of shared variables is important to ensure correct semantics and
thus requires clear specifications in the library interface
Global variables stored in registers need careful handling

Type and Size of Operands


The type of an operand is designated by encoding it in the instructions
operation code
The type of an operand, e.g. single precision float, effectively gives its size
Common operand types include character, half word and word size integer,
single- and double-precision floating point
Characters are almost always in ASCII and integers are in 2s complement
and floating point in IEEE 754
The 16-bit Unicode, used in Java is gaining popularity due its support for
the international character sets
For business applications, some architecture support a decimal format in
binary coded decimal (BCD)
Depending on the size of the word, the complexity of handling different
operand types differs
DSP offers fixed point data types to support high precision floating point
arithmetic and to allow sharing single exponent for multiple numbers
For Graphics applications vertex and pixel operands are added features

Size of Operands

Frequency of reference by size based on SPEC2000 on Alpha


Double-word data type is used for double-precision floating point operations
and address storage in machines with a 64-bit wide address bus
Words are used for integer operations and for 32-bit address bus machines
Because the mix in SPEC, word and double-word data types dominates

Instruction Representation
Humans are taught to think in base 10 (decimal) but numbers may be
represented in any base (123 in base 10 = 1111011 in binary or base 2)
Numbers are stored in computers as a series of high and low electronic
signals (binary numbers)
Binary digits are called bits and considered the atom of computing
Each piece of an instruction is a number and placing these numbers
together forms the instruction
Assembler translate the assembly symbolic instructions into machine
language instructions (machine code)
Example:
Assembly:
M/C language (decimal):
M/C language (binary):

add $t0, $s1, $s2


0

17

18

32

000000

10001

10010

01000

00000

100000

6 b its

5 b its

5 b its

5 b its

5 b its

6 b its

Note: MIPS compiler by default maps $s0,,$s7 to reg. 16-23 and $t0,,$t7 to reg. 8-15

Encoding an Instruction Set


Instruction encoding affects the size of the compiled program and the
complexity of the CPU implementation
The operation is typically specified in one field called opcode
The addressing mode for the operand can be encoded with the operation
or specified through a separate identifier in case of large number of
supported modes
The architecture must balance between several competing factors:
1. Desire to support as many registers and addressing modes as possible
2. Effect of operand specification on the size of the instruction (program)
3. Desire to simplify instruction fetching and decoding during execution
Fixed size instruction encoding simplify the CPU design while limiting the
addressing modes supported
An architect caring about the code size can use variable size encoding
A hybrid approach is to allow variability by supporting multiple-sized
instruction

Encoding Examples

MIPS Instruction format


Register-format instructions:
op:
rs:
rt:
rd:
shmat:
funct:

op

rs

rt

rd

sham t

fu n c t

6 b its

5 b it s

5 b its

5 b it s

5 b its

6 b it s

Basic operation of the instruction, traditionally called opcode


The first register source operand
The second register source operand
The register destination operand, it gets the result of the operation
Shift amount
This field selects the specific variant of the operation of the op field

Immediate-type instructions:
op

rs

rt

a d d re s s

6 b its

5 b its

5 b it s

1 6 b its

Some instructions need longer fields than provided for large value constant
The 16-bit address means a load word instruction can load a word within a
region of 215 bytes of the address in the base register
Example:
lw
$t0, 32($s3)
# Temporary register $t0 gets A[8]
Instruction
add
sub
lw
sw

Format
R
R
I
I

op
0
0
35
43

rs
reg
reg
reg
reg

rt
reg
reg
reg
reg

rd
reg
reg
N/A
N/A

shamt
0
0
N/A
N/A

funct
32
34
N/A
N/A

address
N/A
N/A
address
address

The Stored Program Concept


Learning how instructions are represented leads to discovering
the secret of computing: the stored-program concept
Todays computers are build on two key principles :
Instructions are represented as numbers
Programs can be stored in memory to be
read or written just like numbers

The power of the concept:


memory can contain:
the source code for an editor

A c c o u n t in g p r o g r a m
( m a c h in e c o d e )
E d it o r p r o g r a m
( m a c h in e c o d e )

P ro c e s s o r

the compiled m/c code for the editor


the text that the compiled program is using
the compiler that generated the code

M e m o ry

C c o m p i le r
( m a c h in e c o d e )
P a y r o ll d a ta
B o o k te x t
S o u r c e c o d e in C
fo r e d ito r p ro g r a m

Compiling if-then-else in MIPS


Assuming the five variables f, g, h, i,
and j correspond to the five registers
$s0 through $s4, what is the compiler
MIPS code for the following C if
statement:

i= j

i = = j?

E ls e :
f= g + h

f= g h

if (i == j) f = g + h; else f = g - h;
MIPS:

E x it :

Else:
Exit:

bne

$s3, $s4, Else

add

$s0, $s1, $s2

Exit

sub

$s0, $s1, $s2

i j

# go to Else if i j
# f = g + h (skipped if i j)

# f = g - h (skipped if i = j)

Typical Compilation

Major Types of Optimization


Optimization Name

Explanation

Frequency

High level

At or near source level; machine indep.

Procedure integration

Replace procedure call by procedure body

Local

Within straight line code

Common sub- expression


elimination

Replace two instances of the same computation by


single copy

Constant propagation

Replace all instances of a variable that is assigned a


constant with the constant

Stack height reduction

Rearrange expression tree to minimize resources


needed for expression evaluation

Global

Across a branch

Global common sub


expression elimination

Same as local, but this version crosses branches

13%

Copy propagation

Replace all instances of a variable A that has been


assigned X (i.e., A = X) with X

11%

Code motion

Remove code from a loop that computes same value


each iteration of the loop

16%

Induction variable

Simplify/eliminate array addressing calculations


within loops

elimination

Machine-dependant

Depends on machine knowledge

Strength reduction

Many examples, such as replace multiply by a


constant with adds and shifts

Pipeline Scheduling

Reorder instructions to improve pipeline performance

N.M

18%
22%
N.M

2%

N.M
N.M.

Program and Compiler Optimization Level

Effect of Complier Optimization


Measurements taken on MIPS

Level 0: non-optimized code

Level 2: global optimization, s/w pipelining

Level 1: local optimization

Level 3: adds procedure integration

Compiler Support for Multimedia Instr.


Intels MMX and PowerPC AltiVec have small vector processing capabilities
targeting Multimedia applications (to speed up graphics)
Intel added new set of instructions, called Streaming SIMD Extension
A major advantage of vector computers is hiding latency of memory access
by loading multiple elements and then overlapping execution with data
transfer
Vector computers typically have strided and/or gather/scatter addressing to
perform operations on distant memory locations
Strided addressing allows memory access in increment larger than one
Gather/scatter addressing is similar to register indirect mode where the
address are stored instead of the data
Supporting vector operation without strided addressing, such as Intels MMX
limits the potential speedup
Such limited support for vector processing makes the use of vectorizing
compiler optimization unpopular and restrict its scope to hand coded routines
SIMD instructions on MMX and AltiVec tend to be solutions not primitives

Starting a Program
C p ro g ra m

C o m p il e r

A s s e m b ly la n g u a g e p r o g r a m

A s s e m b le r

Object files for Unix typically contains:


Header: size & position of components
Text segment: machine code
Data segment: static and dynamic variables
Relocation info: identify absolute memory ref.
Symbol table: name & location of labels,
procedures and variables
Debugging info: mapping source to object
code, break points, etc.

O b j e c t : M a c h in e la n g u a g e m o d u le

O b je c t : L ib r a r y r o u t in e ( m a c h in e la n g u a g e )

Linker
L in k e r

- Place code & data modules


symbolically in memory
-Determine the address of data &
instruction labels
-Patch both internal & external ref.

E x e c u t a b le : M a c h in e la n g u a g e p r o g r a m

L oa de r

M e m o ry

Loading Executable Program


$sp

7 ff f ff ff

hex

S ta c k

To load an executable, the operating system


follows these steps:
Reads the executable file header to
determine the size of text and data segments
Creates an address space large enough for
the text and data
Copies the instructions and data from the
executable file into memory

D y n a m ic d a t a
$gp

1000 8000

hex

1000 0000
pc

S t a tic d a ta

Initializes the machine registers and sets the


stack pointer to the first free location

Text

Jumps to a start-up routines that copies the


parameters into the argument registers and
calls the main routine of the program

hex

0040 0000

hex

R eserved
0

Copies the parameters (if any) to the main


program onto the stack

Effect of ISA on Compiler Complexity


Promote regularity

Limit # register formats and variability of # operands in the instruction


Avoid fixing a class of registers to some operations
Separate addressing modes from operations to allow consistent handling
Provide primitives, not solutions

Avoid being influenced by features of a particular high level programming


Language (VAX provide call instruction with automatic context saving and does not
allow for per-call optimization)

Allow enhancing the handling of common features


Simplify trade-offs among alternatives

Simplify the analysis of special features such as cache and pipeline


Allow of simultaneous activities to promote optimization
Favor static binding

Avoid dynamically changing, at run time, values set at compile time to


allow for optimization (e.g. changing the mask that indicates registers to be
saved by the caller can diminish its value in case of separate compilation)

Instruction Set Design Issues

6 Instruction Set Design Issues


1. Number of Addresses
2. Flow of Control
3. Operand Types
4. Addressing Modes
5. Instruction Types
6. Instruction Formats

Number of Addresses

Four categories
3-address machines
- 2 for the source operands and one for the result

2-address machines
- One address doubles as source and result

1-address machine
- Accumulator machines
- Accumulator is used for one source and result

0-address machines
- Stack machines
- Operands are taken from the stack
- Result goes onto the stack

Number of Addresses (cont.)


Three-address machines
Two for the source operands, one for the result
RISC processors use three addresses
Sample instructions
add

dest,src1,src2
; M(dest)=[src1]+[src2]

sub

dest,src1,src2
; M(dest)=[src1]-[src2]

mult

dest,src1,src2
; M(dest)=[src1]*[src2]

Three addresses:
Operand 1, Operand 2, Result
Example: a = b + c
Three-address instruction formats are not common, because they require a
relatively long instruction format to hold the three address references.

Number of Addresses (cont.)


Example

C statement
A=B+C*DE+F+A

Equivalent code:
mult

T,C,D

;T = C*D

add

T,T,B

;T = B+C*D

sub

T,T,E

;T = B+C*D-E

add

T,T,F

;T = B+C*D-E+F

add

A,T,A

;A = B+C*D-E+F+A

Number of Addresses (cont.)


Two-address machines
One address doubles (for source operand & result)
Last example makes a case for it
- Address T is used twice

Sample instructions
load

dest,src ; M(dest)=[src]

add

dest,src ; M(dest)=[dest]+[src]

sub

dest,src ; M(dest)=[dest]-[src]

mult

dest,src ; M(dest)=[dest]*[src]

Two Addresses:
One address doubles as operand and result
Example: a = a + b
The two-address formal reduces the space requirement but also
introduces some awkwardness. To avoid altering the value of an
operand, a MOVE instruction is used to move one of the values to a
result or temporary location before performing the operation.

Number of Addresses (cont.)


Example

C statement
A=B+C*DE+F+A

Equivalent code:
load

T,C

;T = C

mult

T,D

;T = C*D

add

T,B

;T = B+C*D

sub

T,E

;T = B+C*D-E

add

T,F

;T = B+C*D-E+F

add

A,T

;A = B+C*D-E+F+A

Number of Addresses (cont.)

One-address machines
Use special set of registers called accumulators
- Specify one source operand & receive the result

Called accumulator machines


Sample instructions

load

addr ; accum = [addr]

store

addr ; M[addr] = accum

add

addr ; accum = accum + [addr]

sub

addr ; accum = accum - [addr]

mult

addr ; accum = accum * [addr]

Number of Addresses (cont.)


Example

C statement
A=B+C*DE+F+A
Equivalent code:
load

;load C into accum

mult

;accum = C*D

add

;accum = C*D+B

sub

;accum = B+C*D-E

add

;accum = B+C*D-E+F

add

;accum = B+C*D-E+F+A

store

;store accum contents in A

Number of Addresses (cont.)

Zero-address machines
Stack supplies operands and receives the result
- Special instructions to load and store use an address

Called stack machines (Ex: HP3000, Burroughs B5500)


Sample instructions
push

addr ; push([addr])

pop

addr ; pop([addr])

add

; push(pop + pop)

sub

; push(pop - pop)

mult

; push(pop * pop)

Number of Addresses (cont.)

Example
C statement

A=B+C*DE+F+A
Equivalent code:
push

sub

push

push

push

add

Mult
push
add

push
B

F
A

add
pop

Load/Store Architecture

Instructions expect operands in internal processor registers


Special LOAD and STORE instructions move data between registers
and memory
RISC uses this architecture
Reduces instruction length

63

Load/Store Architecture (cont.)

Sample instructions
load

Rd,addr

;Rd = [addr]

store

addr,Rs

;(addr) = Rs

add

Rd,Rs1,Rs2 ;Rd = Rs1 + Rs2

sub

Rd,Rs1,Rs2 ;Rd = Rs1 - Rs2

mult

Rd,Rs1,Rs2 ;Rd = Rs1 * Rs2

Number of Addresses (cont.)

Example
C statement

A = B + C * D E + F + A
Equivalent code:
load

R1,B

mult

R2,R2,R3

load

R2,C

add

R2,R2,R1

load

R3,D

sub

R2,R2,R4

load

R4,E

add

R2,R2,R5

load

R5,F

add

R2,R2,R6

load

R6,A

store

A,R2

2: Flow of Control

Default is sequential flow


Several instructions alter this default
execution
Branches
- Unconditional
- Conditional
- Delayed branches

Procedure calls
- Delayed procedure calls

Flow of Control (cont.)

Branches
Unconditional
- Absolute address
- PC-relative
Target address is specified relative to PC contents
Relocatable code

Example: MIPS
- Absolute address
j target
- PC-relative
b target

Flow of Control (cont.)

e.g., Pentium

e.g., SPARC

Flow of Control (cont.)


Branches
Conditional
- Jump is taken only if the condition is met
Two types
- Set-Then-Jump
Condition testing is separated from branching
Condition code registers are used to convey the condition test
result
Condition code registers keep a record of the status of the last
ALU operation such as overflow condition

- Example: Pentium code


cmp
AX,BX
je
target

; compare AX and BX
; jump if equal

Flow of Control (cont.)


- Test-and-Jump
Single instruction performs condition testing and branching

- Example: MIPS instruction


beq
Rsrc1,Rsrc2,target
Jumps to target if Rsrc1 = Rsrc2

Delayed branching
Control is transferred after executing the instruction that
follows the branch instruction
- This instruction slot is called delay slot

Improves efficiency
Highly pipelined RISC processors support

Flow of Control (cont.)

Procedure calls
Facilitate modular programming
Require two pieces of information to return
- End of procedure
Pentium
uses ret instruction
MIPS
uses jr instruction

- Return address
In a (special) register
MIPS allows any general-purpose register
On the stack
Pentium

Flow of Control (cont.)

Flow of Control (cont.)

Delay slot

Parameter Passing

Two basic techniques


Register-based (e.g., PowerPC, MIPS)
- Internal registers are used
Faster
Limit the number of parameters
Recursive procedure

Stack-based (e.g., Pentium)


- Stack is used
More general

3: Operand Types

Instructions support basic data types


Characters
Integers
Floating-point

Instruction overload
Same instruction for different data types
Example: Pentium
mov

AL,address

;loads an 8-bit value

mov

AX,address

;loads a 16-bit value

mov

EAX,address

;loads a 32-bit value

Operand Types

Separate instructions
Instructions specify the operand size
Example: MIPS
lb

Rdest,address

;loads a byte

lh

Rdest,address

;loads a halfword
;(16 bits)

lw

Rdest,address

;loads a word
;(32 bits)

ld

Rdest,address
Similar instruction: store

;loads a doubleword
;(64 bits)

4: Addressing Modes

How the operands are specified


Operands can be in three places
- Registers
Register addressing mode

- Part of instruction
Constant
Immediate addressing mode
All processors support these two addressing modes

- Memory
Difference between RISC and CISC
CISC supports a large variety of addressing modes
RISC follows load/store architecture

5: Instruction Types

Several types of instructions


Data movement
- Pentium:

mov

dest,src

- Some do not provide direct data movement instructions


- Indirect data movement
add
Rdest,Rsrc,0 ;Rdest = Rsrc+0

Arithmetic and Logical


- Arithmetic
Integer and floating-point, signed and unsigned
add, subtract, multiply, divide

- Logical
and, or, not, xor

Instruction Types (cont.)

Condition code bits


S: Sign bit (0 = +, 1= -)
Z: Zero bit (0 = nonzero, 1 = zero)
O: Overflow bit (0 = no overflow, 1 = overflow)
C: Carry bit (0 = no carry, 1 = carry)

Example: Pentium
cmp

count,25

;compare count to 25
;subtract 25 from count

je

target

;jump if equal

Instruction Types (cont.)

Flow control and I/O instructions


- Branch
- Procedure call
- Interrupts

I/O instructions
- Memory-mapped I/O
Most processors support memory-mapped I/O
No separate instructions for I/O

- Isolated I/O
Pentium supports isolated I/O
Separate I/O instructions
in
AX,io_port ;read from an I/O port
out
io_port,AX ;write to an I/O port

6: Instruction Formats

Two types
Fixed-length
- Used by RISC processors
- 32-bit RISC processors use 32-bits wide instructions
Examples: SPARC, MIPS, PowerPC

Variable-length
- Used by CISC processors
- Memory operands need more bits to specify

Opcode
Major and exact operation

Examples of Instruction Formats

RISC

(Reduced Instruction Set Computer )

versus

CISC (Complex Instruction Set Computer)

RISC Vs CISC
The underlying philosophy of RISC machines is that a
system is better able to manage program execution
when the program consists of only a few different
instructions that are the same length and require the
same number of clock cycles to decode and execute.
RISC systems access memory only with explicit load
and store instructions.
In CISC systems, many different kinds of instructions
access memory, making instruction length variable
and fetch-decode-execute time unpredictable.
84

RISC Vs CISC
The difference between CISC and RISC becomes
evident through the basic computer performance
equation:

RISC systems shorten execution time by reducing


the clock cycles per instruction.
CISC systems improve performance by reducing the
number of instructions per program.

85

RISC Vs CISC
The simple instruction set of RISC machines
enables control units to be hardwired for maximum
speed.
The more complex-- and variable-- instruction set of
CISC machines requires microcode-based control
units that interpret instructions as they are fetched
from memory. This translation takes time.
With fixed-length instructions, RISC lends itself to
pipelining and speculative execution.

86

RISC Vs CISC
Consider the the program fragments:
CISC

mov ax, 10
mov bx, 5
mul bx, ax

RISC

Begin

mov ax, 0
mov bx, 10
mov cx, 5
add ax, bx
loop Begin

The total clock cycles for the CISC version might be:

(2 movs 1 cycle) + (1 mul 30 cycles) = 32 cycles

While the clock cycles for the RISC version is:

(3 movs 1 cycle) + (5 adds 1 cycle) + (5 loops 1


cycle) = 13 cycles

With RISC clock cycle being shorter, RISC gives us


much faster execution speeds.
87

RISC Vs CISC
Because of their load-store ISAs, RISC architectures
require a large number of CPU registers.
These register provide fast access to data during
sequential program execution.
They can also be employed to reduce the overhead
typically caused by passing parameters to
subprograms.
Instead of pulling parameters off of a stack, the
subprogram is directed to use a subset of registers.

88

RISC Vs CISC
This is how
registers can
be overlapped
in a RISC
system.
The current
window pointer
(CWP) points
to the active
register
window.

89

RISC Vs CISC
It is becoming increasingly difficult to distinguish
RISC architectures from CISC architectures.
Some RISC systems provide more extravagant
instruction sets than some CISC systems.
Some systems combine both approaches.
The following two slides summarize the
characteristics that traditionally typify the differences
between these two architectures.

90

RISC Vs CISC
RISC
Multiple register sets.
Three operands per
instruction.
Parameter passing
through register
windows.
Single-cycle
instructions.
Hardwired
control.
Continued....
Highly pipelined.

CISC
Single register set.
One or two register
operands per
instruction.
Parameter passing
through memory.
Multiple cycle
instructions.
Microprogrammed
control.
Less pipelined.
91

RISC Vs CISC
RISC
Simple instructions,
few in number.
Fixed length
instructions.
Complexity in
compiler.
Only LOAD/STORE
instructions access
memory.
Few addressing modes.

CISC
Many complex
instructions.
Variable length
instructions.
Complexity in
microcode.
Many instructions can
access memory.
Many addressing
modes.
92

RISC Vs CISC

Summary

Instruction Set Design Issues

Instruction set design issues include:


Where are operands stored?
- registers, memory, stack, accumulator
How many explicit operands are there?
- 0, 1, 2, or 3
How is the operand location specified?
- register, immediate, indirect, . . .
What type & size of operands are supported?
- byte, int, float, double, string, vector. . .
What operations are supported?
- add, sub, mul, move, compare . . .

More About General Purpose Registers

Why do almost all new architectures use


GPRs?
Registers are much faster than memory (even
cache)
- Register values are available immediately
- When memory isnt ready, processor must wait
(stall)

Registers are convenient for variable storage


- Compiler assigns some variables just to registers
- More compact code since small fields specify
registers
(compared to memory addresses)

Processor
Registers

Memory

Cache

Disk

What Operations are Needed


Arithmetic + Logical
Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
Logical operation: AND, OR, XOR, NOT

Data Transfer - copy, load, store


Control - branch, jump, call, return
Floating Point (ADDF, MULF, DIVF )
Same as arithmetic but usually take bigger operands

Decimal - ADDD, CONVERT


String - move, compare, search
Graphics pixel and vertex, compression/decompression operations

97

Stacks Architecture: Pros and Cons

Pros
Good code density (implicit top of stack)
Low hardware requirements
Easy to write a simpler compiler for stack architectures

Cons
Stack becomes the bottleneck
Little ability for parallelism or pipelining
Data is not always at the top of stack when need, so additional
instructions like TOP and SWAP are needed
Difficult to write an optimizing compiler for stack architectures

Accumulators Architecture : Pros and Cons

Pros
Very low hardware requirements
Easy to design and understand

Cons
Accumulator becomes the bottleneck
Little ability for parallelism or pipelining
High memory traffic

Memory-Memory Architecture : Pros and Cons

Pros
Requires fewer instructions (especially if 3 operands)
Easy to write compilers for (especially if 3 operands)

Cons
Very high memory traffic (especially if 3 operands)
Variable number of clocks per instruction
With two operands, more data movements are required

Memory-Register Architecture : Pros and Cons


Pros
Some data can be accessed without loading first
Instruction format easy to encode
Good code density

Cons
Operands are not equivalent (poor orthogonal)
Variable number of clocks per instruction
May limit number of registers