Register Usage in MIPS ABI
Register    Soft      ABI function for this register
Number      Name
$0          zero      always contains zero
$1          at        reserved for assembler
$2-$3       v0,v1     integer function result (out) or static link (in)
$4-$7       a0-a3     first 4 integer-type function arguments
$8-$15      t0-t7     temporary registers for expression evaluation
$16-$23     s0-s7     registers preserved across function call
$24-$25     t8,t9     temporary registers for expression evaluation
$28         gp        global pointer
$29         sp        stack pointer
$30         fp        frame pointer
$31         ra        return address
The ABI gives well-understood functions to each of the registers in the general purpose register
set. There are obvious uses, such as the stack pointer. There are also three other special registers;
the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is
assigned the return address when a function call is made. Software will put this value on the
stack if the called function itself calls further functions. The fp register points to the base of the
stack frame for the current function. We'll see that in the next slide. The gp register, when used,
points to a pool of global data that can be commonly referenced by all functions. This may
include variables with file or global scope.
A function can use registers t0-t9 freely, but if it calls another function they may be overwritten.
A function may not overwrite the contents of s0-s7, and must preserve their original contents if it
wants to use them. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.
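To make the caller-saved/callee-saved split concrete, here is a minimal sketch (not from the slides; the function names compute and helper are invented for illustration):

# compute() keeps a value live across a call, so it uses a callee-saved
# register ($s0) and therefore must preserve $s0 (and $ra, because it makes
# a call of its own). Anything left in $t0-$t9 could be trashed by helper().
compute:
        addiu $sp, $sp, -8
        sw    $ra, 4($sp)         # $ra is clobbered by the jal below
        sw    $s0, 0($sp)         # callee-saved: must look unchanged to our caller
        move  $s0, $a0            # keep the argument somewhere call-safe
        jal   helper              # may freely overwrite $t0-$t9, $a0-$a3, $v0-$v1
        addu  $v0, $v0, $s0       # $s0 is still valid here
        lw    $s0, 0($sp)
        lw    $ra, 4($sp)
        addiu $sp, $sp, 8
        jr    $ra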
Functions and Stack Frames
int foo (int i)
{
return bar (i);
}
int bar (int n)
{
int a = n+1, b = n-1;
return (a*b);
}
! Each function has a dynamically
allocated stack frame
! Frame contents normally accessed by
addresses that are relative to either
the stack pointer $sp or the frame
pointer $fp
[Diagram: the stack frame for foo sits at higher addresses, with the stack frame for bar below
it and free stack space below that; $fp and $sp delimit bar's frame. The stack usually grows
downwards, from high addresses to low addresses.]
Stacks usually grow downwards in memory. Can you think why this might be?
Anatomy of a Stack Frame
int foo (int i)
{
return bar (i);
}
int bar (int n)
{
int a = n+1, b = n-1;
return (a*b);
}
! Positive offsets from $fp = args
! Negative offsets from $fp = locals
! Not all portions of frame are needed by
all functions
! Callee save space holds previous $fp, $ra, and any $s0-$s7 that are modified by
function bar
[Diagram: bar's stack frame laid out from high to low addresses as incoming args, callee-save
space, local variables, outgoing args; $fp points at the top of the frame (so incoming args are
at positive offsets and locals at negative offsets) and $sp points at the bottom, with foo's
frame above and free stack space below.]
The incoming arguments are values passed from foo to bar. Some of the args may be passed
in registers and may not need space on the stack. The callee save space is a region that bar
can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may
require some storage space on the stack. The outgoing args space is where args for functions
that bar calls will be stored. This space will become the incoming args space of functions
that bar calls (if any). If bar calls several functions, then the outgoing args space would
typically be the maximum space needed by any such function, allowing it to be allocated
once.
Call Return Sequencing
! Call sequence
Save caller-saved registers
Copy arguments to stack or regs
Call the function
! Return sequence
Restore caller-saved registers
! Function Prologue
Allocate callee's stack frame
Reposition frame pointer
Save callee-saved registers
! < execute body of function >
! Function Epilogue
Restore callee-saved registers
Restore frame pointer
De-allocate callee's stack frame
Return to caller
Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your
workstation to produce an assembler file, and identify the four sequences listed in this slide.
To do this, type:
gcc -O -S -o assembler.lis program.c
where assembler.lis is the output file in which your assembler code will be produced, and
program.c is the name of your C source file containing foo() and bar().
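For reference, a hedged sketch of what these sequences might look like for bar() on MIPS; the exact code gcc emits will differ with compiler version and optimisation level, and the frame size and offsets here are purely illustrative:

bar:                              # --- prologue ---
        addiu $sp, $sp, -16       # allocate bar's stack frame
        sw    $fp, 12($sp)        # save caller's frame pointer
        sw    $ra, 8($sp)         # save return address (strictly unnecessary here,
                                  # since bar is a leaf function)
        move  $fp, $sp            # reposition frame pointer
                                  # --- body ---
        addiu $t0, $a0, 1         # a = n + 1
        addiu $t1, $a0, -1        # b = n - 1
        mul   $v0, $t0, $t1       # return value = a * b
                                  # --- epilogue ---
        lw    $ra, 8($sp)         # restore return address
        lw    $fp, 12($sp)        # restore caller's frame pointer
        addiu $sp, $sp, 16        # de-allocate bar's stack frame
        jr    $ra                 # return to caller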
Categorising Data by Location and Access
! C programs contain several categories of data, according to where they
live and how they are created
! The way addresses are computed depends on the category of access
Classification                    How created                 Where data is located            Addressing mode
Embedded constants                Static, read-only           Often in a constant pool         $pc + signed offset
                                                              in the .text section
Global and static variables       Static, read or write       .bss section                     $gp + signed offset
Dynamically allocated variables   Dynamic, malloc()/free()    On the heap                      GPR + offset
Automatic variables               Dynamic, function scope     On stack, below frame pointer    $fp + negative offset
Function arguments                Dynamic, function scope     On stack, above frame pointer    $fp + positive offset
Each category of data, whether a function argument or an automatic variable, is allocated in
a different way, and is therefore accessed in a different way. There are well-defined regions,
such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp,
$gp) or may be accessed relative to $pc or a general-purpose register.
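As an illustrative sketch of the table (symbol names and offsets are invented; real MIPS assemblers use a relocation such as %gp_rel for the $gp case, and classic MIPS32 has no pc-relative data addressing, so embedded constants are usually materialised with lui/ori or placed in a $gp-addressed section):

        lw    $t0, var_g($gp)     # global/static variable: $gp + signed offset
        lw    $t2, 4($t1)         # heap data: plain register + offset, with the
                                  # malloc()'d pointer already in $t1
        lw    $t3, -8($fp)        # automatic (local) variable: negative offset from $fp
        lw    $t4, 8($fp)         # incoming argument spilled to the stack:
                                  # positive offset from $fp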
Addressing Mode Frequency
! Bottom-line: few addressing modes account for most of the
instructions in programs
H&P Fig. 2.7: frequency of each addressing mode (% of all memory references)

Addressing mode    TeX    spice    gcc
Indirect            1%      6%      1%
Scaled              0%     16%      6%
Register           24%      3%     11%
Immediate          43%     17%     39%
Displacement       32%     55%     40%
In practice, compilers usually convert complex address calculations into unsigned integer
computations and then use very simple addressing modes based on computed addresses.
Many memory references are to variables located on the stack. These always use [sp + offset]
addressing modes, making the Displacement mode one of the most common.
Try compiling a simple piece of C code into assembler and look at the addressing modes obtained
for each variable accessed by the code.
Hint: gcc -S foo.c
Displacement Addressing and Data Classification
! Stack pointer and Frame pointer relative
Compiler can often eliminate frame pointer
Function must not call alloca()
5 to 10 bits of offset is sufficient in most cases
! Register + offset
Generic form for accessing via pointers
Multi-dimensional arrays require address calculations
! PC relative addresses
Useful for locating commonly-used constants in a pool of
constants located in the .text section
Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets
compiled. Try man alloca if unsure how to use it.
Floating point arithmetic
! Usually based on the IEEE 754 floating point standard
! Useful when a greater range of numbers is required
Integer: -2^(m-1) .. +2^(m-1) - 1
Floating point:
                      Binary                    Decimal
Single precision      (2 - 2^-23) x 2^127       ~ 10^38.53
Double precision      (2 - 2^-52) x 2^1023      ~ 10^308.25
! See Hennessy & Patterson appendix for details of formats and operations
Set aside an hour to read their appendix and become familiar with the overall
structure of the FP standard (don't memorise details; you can always refer
back to the standard if you ever need to use it)
! Key points for instruction sets:
Integer and Floating Point never mixed in same operation
Separate register sets for integer and FP operations are therefore common
Floating point operations often optional or omitted from embedded processors
Other ways to represent fractional values, e.g. fixed-point types
Follow the suggested reading on Hennessy and Patterson from the second bullet point. Make
summary notes here.
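As one concrete example of the single-precision format (1 sign bit, 8-bit biased exponent, 23-bit fraction): -0.75 = -1.1 (binary) x 2^-1, so the sign bit is 1, the exponent field is -1 + 127 = 126 = 01111110, the fraction field is 1000...0, and the full 32-bit pattern is 1 01111110 10000000000000000000000 = 0xBF400000.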
Encoding the Instruction Set
! How many bits per instruction?
Fixed-length 32-bit RISC encoding
Variable-length encoding (e.g. Intel x86)
Compact 16-bit RISC encodings
! ARM Thumb
! MIPS16
! ARCompact
! Formats define instruction groups with a common set of
operands
An instruction format defines a set of operands that are used in common by a group of
instructions. An instruction set is simply a collection of formats and the operations defined
for each format.
Design considerations for ISA encoding
! How compact is the encoding?
! Is the encoding orthogonal?
! How easy is it to extract operands unambiguously?
Register specifiers should be aligned in all formats (ideally)
Implicitly defined registers will complicate decode
How are the literals aligned and/or extended?
! Are control transfers easily identifiable?
If not, slow decoding of branches may increase CPI
! Op-code assignment:
Minimise Hamming distance between codes that perform
similar operations.
Leads to simpler and faster decode logic
If you don't know what Hamming distance is, see page 193 of Andrew Tanenbaum, Computer
Networks, 4th edition (a standard text in communications); a Google search will also find the
definition. In short, the Hamming distance between two equal-length bit patterns is the number
of bit positions in which they differ: for example, opcodes 000101 and 000111 differ in a single
bit, so their Hamming distance is 1. Think about why this is useful in instruction set design,
and then make notes here as a reminder.
MIPS 32-bit Instruction Formats
! R-type (register to register)
three register operands
most arithmetic, logical and shift instructions
! I-type (register with immediate)
instructions which use two registers and a constant
arithmetic/logical with immediate operand
load and store
branch instructions with relative branch distance
! J-type (jump)
jump instructions with a 26 bit address
At this point you will find it helpful to read Appendix B from Hennessy and Patterson (4/e)
Putting it all together: The MIPS Architecture, p.B-32
Appendix B is all about ISA design issues, using the MIPS architecture as a teaching
vehicle.
MIPS R-type instruction format
opcode    reg rs    reg rt    reg rd    shamt     funct
6 bits    5 bits    5 bits    5 bits    5 bits    6 bits

add $1, $2, $3     ->    special | $2 | $3 | $1 |  0 | add
sll $4, $5, 16     ->    special |  0 | $5 | $4 | 16 | sll
Make your own list of instructions that follow this format.
MIPS I-type instruction format
opcode    reg rs    reg rt    immediate value/addr
6 bits    5 bits    5 bits    16 bits

lw   $1, offset($2)     ->    lw   | $2 | $1 | address offset
beq  $4, $5, .L001      ->    beq  | $4 | $5 | (.L001 - PC) >> 2
addi $1, $2, -10        ->    addi | $2 | $1 | 0xfff6
Find more examples of instructions that follow this format and write them here.
MIPS J-type instruction format
opcode    address
6 bits    26 bits

jal func     ->    jal | (absolute address of func) >> 2
Again, find other examples of MIPS instructions that use this format.
Code density optimisations
! Prologue and Epilogue
! Constant pools and PC relative loads
! 2-register formats
! Restricted register sets
! Non-orthogonality and implicit register operands
Read section B.10, Fallacies and Pitfalls, on page B-39 of Hennessy & Patterson. Make
brief notes here to remind you of the main points.
Examples:
Instruction Set    Instruction      GP registers     Special Features
Architecture       Size
ARCompact          Mixed 16 and     8 direct,        Freely-mixed compact and 32-bit
                   32 bit           32 available     instructions; long-immediate data
ARM Thumb          16 bit           8                push and pop for stack frame support
MIPS16             16 bit           8                Some special ABI registers still accessible
Most 32-bit architectures used in embedded systems have acquired a subset that is encoded
in 16 bits. These instructions still operate on 32-bit data, but are encoded more efficiently.
Generally speaking they all use two register operands rather than three, and also restrict the
number of general purpose registers to 8. The ARCompact instruction set allows a free
mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not
permitted in ARM Thumb or MIPS16, where each function must be compiled entirely into either the
32-bit or the 16-bit instruction set. More recently, ARM introduced the Thumb-2 instruction set,
which removes that restriction.
ARM Thumb Push and Pop instructions
! Particularly effective for encoding function entry and exit code in
a compact form.
! Operand is a bit vector, with each bit specifying whether one of
the callee saved registers should be pushed or popped.
! Push may also save the link register (equiv. to MIPS $ra)
! Pop may then pop that value directly into PC, causing the
function to return to the caller.
! E.g.
push { r4, r5, r6, r7, lr }
pop { r4, r5, r6, r7, pc }
! These are multi-cycle operations, performing up to 5 memory
reads or writes.
! Complex to implement, but highly effective in terms of code
density
Prologue and epilogue can account for 10-15% of the code space
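For comparison, a rough sketch of what push { r4, r5, r6, r7, lr } costs when spelled out on an ISA without a multi-register push, such as MIPS (offsets illustrative):

        addiu $sp, $sp, -20       # one 16-bit Thumb push becomes six 32-bit
        sw    $s0, 0($sp)         # MIPS instructions (24 bytes of code)
        sw    $s1, 4($sp)
        sw    $s2, 8($sp)
        sw    $s3, 12($sp)
        sw    $ra, 16($sp)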
Try to find other Instruction Set Architectures that support multi-register move operations.
List them here:
Instruction Frequency
! Bottom-line: few instruction types account for most of the
instructions executed
80x86 instruction         Fraction (%)
load                          22
conditional branch            20
compare                       16
store                         12
add                            8
and                            6
sub                            5
move register-register         4
call                           1
return                         1
Total                         96
(H&P Fig. 2.16)
Bear in mind that each architecture is different, but that in general the frequencies shown above
are representative of typical desktop applications.
Embedded applications often see increasing frequencies of signal processing operations,
especially 16-bit multiplications.
IS and Performance
! ISA -> Implementation: cycle time, pipelining, CPI, instruction length
! ISA -> Compiler: instruction scheduling, code motion, branch
optimizations, code generation, code size, register allocation
! Implementation -> instruction delays, register allocation, functional
units
[Diagram: ISA, Compiler and Implementation together determine Performance.]
This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.
IS Guidelines
! Regularity: operations, data types, addressing modes, and
registers should be independent (orthogonal)
! Primitives, not solutions: do not attempt to match HLL
constructs with special IS instructions
! Simplify tradeoffs: make it easy for compiler to make choices
based on estimated performance
! Trust compiler: provide compiler with instructions and
primitives that exploit knowledge at compile-time
Instruction Sets can vary enormously from one architecture to another. However, within the set of
all RISC architectures there are actually few substantial differences.
It is also worth noting that the number of distinct desktop architectures has been decreasing year
on year. In 2007, most new desktop systems shipped with x86 processors. In the server space
one can still find Sun SPARC and IBM PowerPC architectures.
The embedded computing domain has a much greater diversity of architectures. Can you think
why this might be?
Improving CPU Performance (H&P 2.11; A.1; A.3)
! CPU performance can be computed by the CPU
performance equation: CPU time = IC x CPI x Clock time
! To reduce CPU time: reduce IC; reduce the clock period; reduce CPI
! ISA influences implementation, compiler optimizations, and
therefore performance
! ISA must be an easy compiler target
! No need to provide too many or overly complex
instructions
! Compiler has a significant role in improving performance
Essentially, to reduce CPU time we must reduce at least one of the three factors in the equation
(instruction count, CPI, or clock period); issuing more than one instruction per cycle is one way
to push CPI down.
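As a worked example with made-up numbers: a program that executes 2 x 10^9 instructions with an average CPI of 1.5 on a 1 GHz clock (1 ns cycle time) takes 2 x 10^9 x 1.5 x 1 ns = 3 seconds; halving the CPI to 0.75, for instance by issuing two instructions per cycle, halves the CPU time to 1.5 seconds.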
Program Structure: Basic-Blocks (BB)
! Definition: straight-line code with single entry and single exit
! Boundaries:
Branches and jumps
Calls and returns
Targets of branches, jumps, calls, and returns

        lw   r2,0(r1)
        lw   r3,4(r1)
        addi r3,r3,n
        bne  r2,r3,Label2
Label1: lw   r4,8(r1)
        sub  r2,r2,m
        beq  r2,r0,Label1
Label2: add  r1,r1,r3
...
[Annotation: BB1 is the first four instructions, ending with the bne; BB2 is the three
instructions starting at Label1, ending with the beq; BB3 is the add at Label2. The
accompanying control-flow graph shows BB1 at the top with edges down to BB2 and BB3.]
Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to
illustrate this point here:
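One possible sketch (invented code): the block starting at Skip begins at a branch target, yet the instruction immediately before it is an ordinary add rather than a branch; control simply falls through into it.

        beq  r2, r0, Skip         # ends BB1
        add  r3, r3, r1           # BB2: entered by falling through from the beq
        add  r4, r4, r1           # still BB2; note this instruction is not a branch
Skip:   sub  r5, r3, r4           # BB3 starts here because Skip is a branch target,
                                  # even though it is not preceded by a branch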
Structure of Modern Compilers
[Diagram: code flows from HLL code at the top, through intermediate representations
(IR, optimized IR, SSA), down to machine code; each phase is annotated with its
dependences and its function.]

Phase                       Dependences                          Function
Front-end                   Language dependent;                  Generate intermediate
                            machine independent                  representation
High-level optimizations    Somewhat language independent;       Procedure inlining;
                            largely machine independent          loop transformations
Global optimizer            Mostly language independent;         Global + local optimizations;
                            mostly machine independent           register allocation
Code generator              Language independent;                Instruction selection;
                            machine dependent                    scheduling
If you are taking a compiler course this year, these optimisations will be familiar. If not, you need
to be at least aware of:
1. The difference between global and local optimisations
2. Machine dependent and machine independent optimisations
If you need help with understanding the role of compilers, read section B.8, Crosscutting Issues:
The Role of Compilers, in H&P (4/e) on page B-24
Compiler Optimizations
! High-level: at HLL source
Procedure inlining
! Local: within basic-block (BB)
Common sub-expression elimination
Constant propagation
Stack height reduction
! Global: across BBs
Global common sub-expression elimination
Copy propagation
Code motion
Induction variable elimination
! Machine-dependent
Strength reduction
Pipeline scheduling
Branch offset optimization
This slide summarises the essential concepts. A little reading around the subject and
supplementary note-taking will help with revision.
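As a small sketch of two of these at the MIPS level (not from the slides; mul with an immediate is treated as an assembler pseudo-instruction and the register choices are arbitrary), common sub-expression elimination removes the repeated i*4, and strength reduction turns the multiply into a shift:

# Before: the offset i*4 is computed twice, each time with a multiply
        mul   $t0, $s0, 4         # i * 4  (offset for a[i])
        addu  $t1, $s1, $t0       # &a[i]
        mul   $t2, $s0, 4         # i * 4 again  (offset for b[i])
        addu  $t3, $s2, $t2       # &b[i]

# After CSE + strength reduction
        sll   $t0, $s0, 2         # i * 4 computed once, multiply replaced by a shift
        addu  $t1, $s1, $t0       # &a[i]
        addu  $t3, $s2, $t0       # &b[i]  (reuses $t0)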
