This document discusses register usage in the MIPS architecture instruction set. It contains three key points:
1. The MIPS architecture dedicates specific registers for special purposes like the stack pointer ($sp), frame pointer ($fp), and return address ($ra). Other registers like $t0-$t9 are used for temporary values while $s0-$s7 must be preserved across function calls.
2. Functions are assigned stack frames to store local variables and arguments. The $fp register points to the base of the current stack frame. Addresses within the frame are computed as offsets from $fp.
3. Calling and returning between functions involves saving/restoring registers as needed and managing stack frames and return
Inf3 Computer Architecture - 2007-2008

Register Usage in MIPS ABI

Register Number   Soft Name   ABI function for this register
$0                            always contains zero
$1                at          reserved for assembler
$2-$3             v0, v1      integer function result (out) or static link (in)
$4-$7             a0-a3       first 4 integer-type function arguments
$8-$15            t0-t7       temporary registers for expression evaluation
$16-$23           s0-s7       registers preserved across function call
$24-$25           t8, t9      temporary registers for expression evaluation
$28               gp          global pointer
$29               sp          stack pointer
$30               fp          frame pointer
$31               ra          return address

Additional notes: The ABI gives a well-understood function to each of the registers in the general-purpose register set. Some uses are obvious, such as the stack pointer. There are also three other special registers: the return address (ra), the frame pointer (fp) and the global pointer (gp). The ra register is assigned the return address when a function call is made. Software will put this value on the stack if the called function itself calls further functions. The fp register points to the base of the stack frame for the current function. We'll see that in the next slide. The gp register, when used, points to a pool of global data that can be commonly referenced by all functions. This may include variables with file or global scope. A function can use registers t0-t9 freely, but if it calls another function they may be overwritten. A function may use s0-s7 only if it saves their original contents first and restores them before it returns. Hence, s0-s7 are callee-saved, whereas t0-t9 are caller-saved registers.

2008-2009 Informatics 3 - Computer Architecture

Functions and Stack Frames

    int bar (int n);
    int foo (int i) { return bar (i); }
    int bar (int n) { int a = n+1, b = n-1; return (a*b); }

- Each function has a dynamically allocated stack frame
- Frame contents are normally accessed by addresses that are relative to either the stack pointer $sp or the frame pointer $fp

[Figure: memory from high addresses to low addresses, showing the stack frame for foo, the stack frame for bar and the free stack space, with $fp and $sp marked; the stack usually grows downwards]

Additional notes: Stacks usually grow downwards in memory. Can you think why this might be?

Anatomy of a Stack Frame

    int bar (int n);
    int foo (int i) { return bar (i); }
    int bar (int n) { int a = n+1, b = n-1; return (a*b); }

- Positive offsets from $fp = args
- Negative offsets from $fp = locals
- Not all portions of the frame are needed by all functions
- Callee-save space holds the previous $fp, $ra, and any of $s0-$s7 that are modified by function bar

[Figure: the stack frame for bar, below the stack frame for foo and above the free stack space, divided from high to low addresses into incoming args, callee-save space, local variables and outgoing args, with $fp and $sp marked]

Additional notes: The incoming arguments are values passed from foo to bar. Some of the args may be passed in registers and may not need space on the stack. The callee-save space is a region that bar can use to save any of $s0-$s7 that may be modified in bar. Local variables in bar may require some storage space on the stack. The outgoing args space is where args for functions that bar calls will be stored. This space will become the incoming args space of functions that bar calls (if any). If bar calls several functions, then the outgoing args space would typically be the maximum space needed by any such function, allowing it to be allocated once.

Call Return Sequencing

- Call sequence
    - Save caller-saved registers
    - Copy arguments to stack or regs
    - Call the function
- Return sequence
    - Restore caller-saved registers
- Function Prologue
    - Allocate callee's stack frame
    - Reposition frame pointer
    - Save callee-saved registers
- < execute body of function >
- Function Epilogue
    - Restore callee-saved registers
    - Restore frame pointer
    - De-allocate callee's stack frame
    - Return to caller

Additional notes: Exercise: take the foo() and bar() code shown earlier. Compile it using gcc on your workstation to produce an assembler file, and identify the four sequences listed in this slide. To do this type:

    gcc -O -S -o assembler.lis program.c

where assembler.lis is the output file in which your assembler code will be produced, and program.c is the name of your C source file containing foo() and bar().

Categorising Data by Location and Access

- C programs contain several categories of data, according to where they live and how they are created
- The way addresses are computed depends on the category of access

Classification                    Where data is located                           Addressing mode         How created
Embedded constants                Often in a constant pool in the .text section   $pc + signed offset     Static, read-only
Global and static variables       .bss section                                    $gp + signed offset     Static, read or write
Dynamically allocated variables   On the heap                                     GPR + offset            Dynamic, malloc(), free()
Automatic variables               On stack, below frame pointer                   $fp + negative offset   Dynamic, function scope
Function arguments                On stack, above frame pointer                   $fp + positive offset   Dynamic, function scope

Additional notes: Each category of data, whether a function argument or an automatic variable, is allocated in a different way, and is therefore accessed in a different way. There are well-defined regions, such as the stack, the heap and the global data area. Each may have its own pointer (e.g. $sp, $gp) or may be accessed relative to $pc or a general-purpose register.

Addressing Mode Frequency

- Bottom line: few addressing modes account for most of the instructions in programs

Frequency of the addressing mode (%), H&P Fig. 2.7

Addressing mode    TeX   spice   gcc
Indirect             1       6     1
Scaled               0      16     6
Register            24       3    11
Immediate           43      17    39
Displacement        32      55    40

Additional notes: In practice, compilers usually convert complex address calculations into unsigned integer computations and then use very simple addressing modes based on computed addresses. Many memory references are to variables located on the stack. These always use [sp + offset] addressing modes, making the Displacement mode one of the most common. Try compiling a simple piece of C code into assembler and look at the addressing modes obtained for each variable accessed by the code. Hint: gcc -S foo.c

Displacement Addressing and Data Classification

- Stack pointer and Frame pointer relative
    - Compiler can often eliminate the frame pointer, provided the function does not call alloca()
    - 5 to 10 bits of offset is sufficient in most cases
- Register + offset
    - Generic form for accessing via pointers
    - Multi-dimensional arrays require address calculations
- PC relative addresses
    - Useful for locating commonly-used constants in a pool of constants located in the .text section

Additional notes: Exercise: add a call to alloca() in both foo() and bar() to see the effect on how the code gets compiled. Try man alloca if unsure how to use it.

Floating point arithmetic

- Usually based on the IEEE 754 floating point standard
- Useful when a greater range of numbers is required
    - Integer (m bits): $-2^{m-1} \ldots +2^{m-1}-1$
    - Floating point (largest magnitude):

                          Binary                            Decimal
      Single precision    $(2-2^{-23}) \times 2^{127}$      $\approx 10^{38.53}$
      Double precision    $(2-2^{-52}) \times 2^{1023}$     $\approx 10^{308.25}$

- See Hennessy & Patterson appendix for details of formats and operations
    - Set aside an hour to read their appendix and become familiar with the overall structure of the FP standard (don't memorise details; you can always refer back to the standard if you ever need to use it)
- Key points for instruction sets:
    - Integer and floating point are never mixed in the same operation
    - Separate register sets for integer and FP operations are therefore common
    - Floating point operations are often optional or omitted from embedded processors
    - There are other ways to represent fractional values, e.g. fixed-point types

Additional notes: Follow the suggested reading on Hennessy and Patterson from the second bullet point. Make summary notes here.

Encoding the Instruction Set

- How many bits per instruction?
    - Fixed-length 32-bit RISC encoding
    - Variable-length encoding (e.g. Intel x86)
    - Compact 16-bit RISC encodings
        - ARM Thumb
        - MIPS16
        - ARCompact
- Formats define instruction groups with a common set of operands

Additional notes: An instruction format defines a set of operands that are used in common by a group of instructions. An instruction set is simply a collection of formats and the operations defined for each format.

Design consideration for ISA encoding

- How compact is the encoding?
- Is the encoding orthogonal?
- How easy is it to extract operands unambiguously?
    - Register specifiers should be aligned in all formats (ideally)
    - Implicitly defined registers will complicate decode
    - How are the literals aligned and/or extended?
- Are control transfers easily identifiable?
    - If not, slow decoding of branches may increase CPI
- Op-code assignment:
    - Minimise Hamming distance between codes that perform similar operations
    - Leads to simpler and faster decode logic

Additional notes: If you don't know what Hamming distance is, see page 193 of Andrew Tanenbaum, Computer Networks, 4th edition (a standard text in communications). A Google search will also find the definition. Think about why this is useful in instruction set design, and then make notes here as a reminder.
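As a small illustration of the op-code assignment point, the sketch below computes the Hamming distance between two 6-bit codes. The function name hamming_distance6 and the enum names are illustrative choices and do not come from the course material; the three sample values are the MIPS32 funct field encodings for add, sub and slt.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: MIPS32 funct field values for three R-type operations. */
enum { FUNCT_ADD = 0x20, FUNCT_SUB = 0x22, FUNCT_SLT = 0x2a };

/* Hamming distance between two 6-bit codes: the number of bit positions
   in which they differ. */
unsigned hamming_distance6(uint8_t a, uint8_t b)
{
    assert(a < 64 && b < 64);        /* both codes must fit in 6 bits */
    uint8_t diff = a ^ b;            /* set bits mark differing positions */
    unsigned count = 0;
    while (diff != 0) {
        diff &= (uint8_t)(diff - 1); /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

For example, hamming_distance6(FUNCT_ADD, FUNCT_SUB) is 1, because the two codes differ in a single bit. A decoder could, in principle, route that one bit to the adder as an add/subtract select, which is the kind of simplification the slide has in mind.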
MIPS 32-bit Instruction Formats

- R-type (register to register)
    - three register operands
    - most arithmetic, logical and shift instructions
- I-type (register with immediate)
    - instructions which use two registers and a constant
    - arithmetic/logical with immediate operand
    - load and store
    - branch instructions with relative branch distance
- J-type (jump)
    - jump instructions with a 26 bit address

Additional notes: At this point you will find it helpful to read "Putting it all together: The MIPS Architecture" (p. B-32) in Appendix B of Hennessy and Patterson (4/e). Appendix B is all about ISA design issues, using the MIPS architecture as a teaching vehicle.

MIPS R-type instruction format

                   opcode    reg rs   reg rt   reg rd   shamt    funct
                   6 bits    5 bits   5 bits   5 bits   5 bits   6 bits
add $1, $2, $3     special   $2       $3       $1       -        add
sll $4, $5, 16     special   -        $5       $4       16       sll

Additional notes: Make your own list of instructions that follow this format.

MIPS I-type instruction format

                     opcode   reg rs   reg rt   immediate value/addr
                     6 bits   5 bits   5 bits   16 bits
lw $1, offset($2)    lw       $2       $1       address offset
beq $4, $5, .L001    beq      $4       $5       (.L001 - (PC + 4)) >> 2
addi $1, $2, -10     addi     $2       $1       0xfff6

Additional notes: Find more examples of instructions that follow this format and write them here.

MIPS J-type instruction format

              opcode   address
              6 bits   26 bits
call func     call     absolute func address >> 2

Additional notes: Again, find other examples of MIPS instructions that use this format.

Code density optimisations

- Prologue and Epilogue
- Constant pools and PC relative loads
- 2-register formats
- Restricted register sets
- Non-orthogonality and implicit register operands

Additional notes: Read section B.10, Fallacies and Pitfalls, on page B-39 of Hennessy & Patterson. Make brief notes here to remind you of the main points.

Examples:

Instruction Set Architecture   Instruction Size      GP registers             Special Features
ARCompact                      Mixed 16 and 32 bit   8 direct, 32 available   Freely-mixed compact and 32-bit instructions; long-immediate data
ARM Thumb                      16 bit                8                        push and pop for stack frame support
MIPS16                         16 bit                8                        Some special ABI registers still accessible

Additional notes: Most 32-bit architectures used in embedded systems have acquired a subset that is encoded in 16 bits. These instructions still operate on 32-bit data, but are encoded more efficiently. Generally speaking they all use two register operands rather than three, and also restrict the number of general purpose registers to 8. The ARCompact instruction set allows a free mixing of the original 32-bit instructions and the compact 16-bit instructions. This is not permitted in ARM Thumb or MIPS16, where each function must be compiled into either the 32-bit or the 16-bit instruction set. Recently, ARM introduced the Thumb-2 instruction set, which removes that restriction.

ARM Thumb Push and Pop instructions

- Particularly effective for encoding function entry and exit code in a compact form.
- Operand is a bit vector, with each bit specifying whether one of the callee-saved registers should be pushed or popped.
- Push may also save the link register (equiv. to MIPS $ra)
- Pop may then pop that value directly into PC, causing the function to return to the caller.
- E.g.
      push { r4, r5, r6, r7, lr }
      pop { r4, r5, r6, r7, pc }
- These are multi-cycle operations, performing up to 5 memory reads or writes.
- Complex to implement, but highly effective in terms of code density
    - Prologue and epilogue can account for 10-15% of the code space

Additional notes: Try to find other Instruction Set Architectures that support multi-register move operations. List them here:

Instruction Frequency

- Bottom line: few instruction types account for most of the instructions executed

H&P Fig. 2.16

80x86 instruction         Fraction (%)
load                      22
conditional branch        20
compare                   16
store                     12
add                        8
and                        6
sub                        5
move register-register     4
call                       1
return                     1
Total                     96

Additional notes: Bear in mind that each architecture is different, but that in general the frequencies shown above are representative of typical desktop applications. Embedded applications often see increasing frequencies of signal processing operations, especially 16-bit multiplications.

IS and Performance

- ISA -> Implementation: cycle time, pipelining, CPI, instruction length
- ISA -> Compiler: instruction scheduling, code motion, branch optimizations, code generation, code size, register allocation
- Implementation -> instruction delays, register allocation, functional units

[Figure: diagram relating ISA, Compiler, Implementation and Performance]

Additional notes: This slide summarises the relationship between ISA and Compiler, and ISA and Implementation.

IS Guidelines

- Regularity: operations, data types, addressing modes, and registers should be independent (orthogonal)
- Primitives, not solutions: do not attempt to match HLL constructs with special IS instructions
- Simplify tradeoffs: make it easy for the compiler to make choices based on estimated performance
- Trust compiler: provide the compiler with instructions and primitives that exploit knowledge at compile-time

Additional notes: Instruction sets can vary enormously from one architecture to another. However, within the set of all RISC architectures there are actually few substantial differences. It is also worth noting that the number of distinct desktop architectures has been decreasing year on year. In 2007 most new desktop systems shipped will have x86 processors. In the server space one can still find Sun SPARC and IBM PowerPC architectures. The embedded computing domain has a much greater diversity of architectures. Can you think why this might be?

Improving CPU Performance (H&P 2.11; A.1; A.3)

- CPU performance can be computed by the CPU performance equation: CPU time = IC x CPI x Clock time, where IC is the instruction count and CPI is the average number of clock cycles per instruction
- To reduce CPU time: reduce IC, reduce the clock period, or reduce CPI
- ISA influences implementation, compiler optimizations, and therefore performance
- ISA must be an easy compiler target
- No need to provide too many and too complex instructions
- Compiler has a significant role in improving performance

Additional notes: Essentially, to improve CPU performance we must reduce one of the three primary contributors, or else issue more than one instruction per cycle (or both!)

Program Structure: Basic-Blocks (BB)

- Definition: straight-line code with single entry and single exit
- Boundaries:
    - Branches and jumps
    - Calls and returns
    - Targets of branches, jumps, calls, and returns
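To make the boundary rules concrete, here is a small C function whose comments mark one plausible division into basic blocks. The function sum_abs is a made-up example, and the labels BB1 to BB6 assume straightforward, unoptimised code generation; a real compiler may lay out or merge the blocks differently.

```c
/* Sum of the absolute values of v[0..n-1]. The comments mark one plausible
   division into basic blocks for straightforward, unoptimised code. */
int sum_abs(const int *v, int n)
{
    int sum = 0;        /* BB1: function entry */
    int i = 0;
    while (i < n) {     /* BB2: loop test; target of the backward jump, ends in a branch */
        int x = v[i];   /* BB3: load and test of x; ends in a conditional branch */
        if (x < 0)
            x = -x;     /* BB4: runs only when x is negative */
        sum += x;       /* BB5: branch target, also entered by falling through from BB4;
                           ends in the jump back to BB2 */
        i++;
    }
    return sum;         /* BB6: loop exit; ends in the return */
}
```

Each block has one entry at its first instruction and one exit at its last, so once a block is entered every instruction in it executes.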
[Figure: two example control-flow arrangements of basic blocks BB1, BB2 and BB3]

Additional notes: Note: not all basic blocks are preceded by a branch. Contrive an example instruction sequence to illustrate this point here:

Structure of Modern Compilers

Stage                      Dependences                                                       Function
Front-end                  Language dependent; machine independent                           Generate intermediate representation
High-level optimizations   Somewhat language independent; largely machine independent        Procedure inlining; loop transformations
Global optimizer           Mostly language independent; mostly machine independent           Global + local optimizations; register allocation
Code generator             Language independent; machine dependent                           Instruction selection; scheduling

Data flow: HLL code -> Front-end -> IR -> High-level optimizations -> Optimized IR -> Global optimizer -> SSA -> Code generator -> Machine code

Additional notes: If you are taking a compiler course this year, these optimisations will be familiar. If not, you need to be at least aware of:
1. The difference between global and local optimisations
2. The difference between machine-dependent and machine-independent optimisations
If you need help with understanding the role of compilers, read section B.8, Crosscutting Issues: The Role of Compilers, in H&P (4/e) on page B-24.

Compiler Optimizations

- High-level: at HLL source
    - Procedure inlining
- Local: within basic-block (BB)
    - Common sub-expression elimination
    - Constant propagation
    - Stack height reduction
- Global: across BBs
    - Global common sub-expression elimination
    - Copy propagation
    - Code motion
    - Induction variable elimination
- Machine-dependent
    - Strength reduction
    - Pipeline scheduling
    - Branch offset optimization

Additional notes: This slide summarises the essential concepts. A little reading around the subject and supplementary note-taking will help with revision.
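Three of the optimisations listed above can be shown by hand at source level. The sketch below is only an illustration: the functions naive_sum and optimised_sum are hypothetical, and the second is a hand-written version of what a compiler might produce internally, not the output of any particular compiler.

```c
/* As written by the programmer: (a + b) is evaluated twice per iteration
   and i * 8 needs a multiplication per iteration. */
long naive_sum(const int *v, int n, int a, int b)
{
    long total = 0;
    for (int i = 0; i < n; i++)
        total += (long)v[i] * (a + b) + (a + b) + i * 8;
    return total;
}

/* Hand-optimised equivalent, computing the same result. */
long optimised_sum(const int *v, int n, int a, int b)
{
    const int s = a + b;   /* common sub-expression elimination: a + b computed once;
                              code motion: the loop-invariant value is hoisted out of the loop */
    long total = 0;
    int k = 0;             /* strength reduction: i * 8 replaced by a running addition */
    for (int i = 0; i < n; i++) {
        total += (long)v[i] * s + s + k;
        k += 8;
    }
    return total;
}
```

Comparing the assembler that gcc -O -S produces for naive_sum with the unoptimised output (gcc -S) is one way to see which of these transformations the compiler applies on its own.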