
Code Generation : Expressions

Expression evaluation.
Register evaluation: the integer unit of the x86 series works this way.
Stack evaluation: some machines perform arithmetic on a stack. An example is the Intel Floating Point Unit in the Pentium series.

Stack generation
Consider the statement a:=(a*b)+(1-(c*2)) with a,b,c:real. On a Pentium this would translate to

instruction       action
fld dword [a]     T:=T-1; st[T]:=a
fmul dword [b]    st[T]:=st[T]*b
floadlit 1.0      T:=T-1; st[T]:=1
fld dword [c]     T:=T-1; st[T]:=c
floadlit 2.0      T:=T-1; st[T]:=2
fmulp st1         st[T+1]:=st[T]*st[T+1]; T:=T+1
fsubp st1         st[T+1]:=st[T+1]-st[T]; T:=T+1
faddp st1         st[T+1]:=st[T]+st[T+1]; T:=T+1
fstp dword [a]    a:=st[T]; T:=T+1

Note that each instruction specifies one operand; the top of the stack is always the other operand.

After remapping to actual registers: assume T=0 to start with and that arithmetic on T is modulo 8. We label the registers f0-f7.

instruction       action        T at end
fld dword [a]     f7:=a         7
fmul dword [b]    f7:=f7*b      7
floadlit 1.0      f6:=1         6
fld dword [c]     f5:=c         5
floadlit 2.0      f4:=2         4
fmulp st1         f5:=f4*f5     5
fsubp st1         f6:=f6-f5     6
faddp st1         f7:=f6+f7     7
fstp dword [a]    a:=f7         0

Effect

instruction       action        effect
fld dword [a]     f7:=a         f7:=a
fmul dword [b]    f7:=f7*b      f7:=a*b
floadlit 1.0      f6:=1         f6:=1
fld dword [c]     f5:=c         f5:=c
floadlit 2.0      f4:=2         f4:=2
fmulp st1         f5:=f4*f5     f5:=2*c
fsubp st1         f6:=f6-f5     f6:=1-2*c
faddp st1         f7:=f6+f7     f7:=a*b+1-2*c
fstp dword [a]    a:=f7         a:=a*b+1-2*c

Register Evaluation
Other machines evaluate operations in registers. An example is the Intel integer arithmetic unit on the Pentium family. This has 8 registers, each of which is given a mnemonic identifier in assembly language.

Reg   Mnemonic   Reserved
r0    eax
r1    ecx
r2    edx
r3    ebx
r4    esp        yes
r5    ebp        yes
r6    esi
r7    edi

It can be seen that these mirror the registers f0-f7 in the floating point unit, but the way they are used is different.

2 or 3 operand instructions
The FPU registers are treated as a stack, but the integer registers are directly addressable. Instructions must specify the source and destination registers involved. Suppose a,b,c:integer; then the statement a:=(a*b)+1-(2*c) translates to

instruction           action        effect
mov edi,[a]           r7:=a         r7=a
imul edi,[b]          r7:=r7*b      r7=a*b
mov esi,1             r6:=1         r6=1
imul ebx,[c],2        r3:=c*2       r3=c*2
sub esi,ebx           r6:=1-r3      r6=1-c*2
lea edi,[edi+esi]     r7:=r7+r6     r7=a*b+1-c*2
mov [a],edi           a:=r7         a:=a*b+1-c*2

Points to note
1. On a stack machine, register allocation is done by hardware.
2. On a register machine, register allocation is done by the compiler.
3. The underlying process is very similar in both cases: registers must be chosen, allocated and reserved for the duration of use.

Example a:=w*(w+x*(x+y*z));
As stack code this compiles as
push w
push w
push x
push x
push y
push z
*
+
*
+
*
Max stack depth used = 6
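The stack code above is a post-order walk of the expression tree. The sketch below assumes a hypothetical tuple encoding (op, left, right) with variable names as leaves; stack_code and max_depth are illustrative names, not part of any compiler described here.

```python
def stack_code(tree):
    """Post-order walk: emit both operands first, then the operator."""
    if not isinstance(tree, tuple):
        return [f"push {tree}"]
    op, left, right = tree
    return stack_code(left) + stack_code(right) + [op]


def max_depth(code):
    """A push grows the stack by one; a dyadic operator shrinks it by one."""
    depth = deepest = 0
    for instr in code:
        depth += 1 if instr.startswith("push") else -1
        deepest = max(deepest, depth)
    return deepest


# a := w*(w+x*(x+y*z))
expr = ("*", "w", ("+", "w", ("*", "x", ("+", "x", ("*", "y", "z")))))
code = stack_code(expr)
print("\n".join(code))
print("max stack depth:", max_depth(code))
```

Because the expression is right-nested, every operand is pushed before the first operator runs, so the depth reaches the number of operands.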

Registers: naive allocation

Assume we have instructions of the form
add reg, reg/mem
Given a tree of the form x+y:
reserve a register r1
load x into r1
reserve a register r2
load y into r2
emit add r1,r2
unreserve r2
result in r1
We assume that we maintain reserve bits for each register at compile time.

Code generated this way

mov edi,DWORD [w]    ;edi:=w
mov esi,DWORD [w]    ;esi:=w
mov ebx,DWORD [x]    ;ebx:=x
mov eax,DWORD [x]    ;eax:=x
mov edx,DWORD [y]    ;edx:=y
imul edx,[z]         ;edx:=y*z
add DWORD eax,edx    ;eax:=x+y*z
imul ebx,eax         ;ebx:=x*(x+y*z)
add DWORD esi,ebx    ;esi:=w+x*(x+y*z)
imul edi,esi         ;edi:=w*(w+x*(x+y*z))
; result in edi

5 registers, 10 instructions, 6 memory accesses. This is almost equivalent to the stack code.
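The naive scheme can be sketched as a small recursive generator. Everything named here is an assumption made for illustration: the class NaiveGen, the register order in FREE_ORDER, the tuple tree encoding, and the simplification that leaves are always variables in memory. As in the listing above, a right operand that is a plain variable is used directly as a memory operand.

```python
FREE_ORDER = ["edi", "esi", "ebx", "eax", "edx", "ecx"]  # assumed allocation order
MNEMONIC = {"+": "add", "-": "sub", "*": "imul"}


class NaiveGen:
    def __init__(self):
        self.reserved = set()  # compile-time reserve bits
        self.used = set()      # every register ever handed out
        self.code = []

    def reserve(self):
        for reg in FREE_ORDER:
            if reg not in self.reserved:
                self.reserved.add(reg)
                self.used.add(reg)
                return reg
        raise RuntimeError("out of registers")

    def gen(self, tree):
        """Emit code for tree and return the register holding its value."""
        if not isinstance(tree, tuple):  # leaf: a variable in memory
            reg = self.reserve()
            self.code.append(f"mov {reg},[{tree}]")
            return reg
        op, left, right = tree
        r1 = self.gen(left)
        if not isinstance(right, tuple):  # reg/mem form: no second register
            self.code.append(f"{MNEMONIC[op]} {r1},[{right}]")
        else:
            r2 = self.gen(right)
            self.code.append(f"{MNEMONIC[op]} {r1},{r2}")
            self.reserved.discard(r2)  # unreserve r2
        return r1


expr = ("*", "w", ("+", "w", ("*", "x", ("+", "x", ("*", "y", "z")))))
g = NaiveGen()
result = g.gen(expr)
print("\n".join(g.code))
print("result in", result, "using", len(g.used), "registers")
```

For this right-nested expression each left operand is loaded and held while the whole right subtree is evaluated, which is where the five registers go.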

Problems
This kind of code can rapidly lead to the number of registers being exhausted, especially if you get a lot of right-nested expressions of the form a*(b+(c*(d+(e*...... It also uses more instructions than are strictly necessary.

We can, however, use far fewer registers. Consider:
mov edi,DWORD [y]    ;edi:=y
imul edi,[z]         ;edi:=y*z
add edi,[x]          ;edi:=x+y*z
imul edi,[x]         ;edi:=x*(x+y*z)
add edi,[w]          ;edi:=w+x*(x+y*z)
imul edi,[w]         ;edi:=w*(w+x*(x+y*z))
This uses 1 register, 6 instructions and 6 memory accesses. It was done by reorganising the expression to ((y*z+x)*x+w)*w.

Weights
We can reorganise expressions to minimise their usage of registers. The weight function estimates how many registers are used to compile a sub-tree. On encountering a dyadic operator:

if weight(right) > weight(left) then
  if commutes(operator) then swap(left, right)

This is because in
ADD r1,r2 ; r1:=r1+r2
we already need r1 to hold the result, so a left expression uses one less free register than a right expression.

Commuting operators
For the reorganisation to work we need

a op b = b op a

for the operator op.
Examples: +, *, and, or, xor, =
Not valid with: -, /, <, >, <=, >=

The weight function, which estimates register usage for an expression:

weight(node) -> int
  case node.type of
    Dyad:
      l = weight(left);
      r = weight(right);
      return (l<r ? r+1 : (l>r ? l : l+1));
    register, constant:
      return 0;
    memoryloc:
      w = weight(address_expr);
      return (w<1 ? 1 : w);
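A runnable sketch of the weight function and the commutative swap is given below. It makes several assumptions that are not in the slides: trees are tuples (op, left, right), integer leaves are constants (weight 0), string leaves are simple variables in memory (weight 1, since their address expression is a constant), and the COMMUTATIVE set and the function name reorder are illustrative.

```python
COMMUTATIVE = {"+", "*", "and", "or", "xor", "="}


def weight(tree):
    """Estimate of the registers needed to evaluate tree."""
    if not isinstance(tree, tuple):
        # Assumed leaf model: int = constant (0), str = memory variable (1).
        return 0 if isinstance(tree, int) else 1
    _, left, right = tree
    l, r = weight(left), weight(right)
    if l < r:
        return r + 1
    if l > r:
        return l
    return l + 1


def reorder(tree):
    """Bottom-up: put the heavier operand on the left when the operator commutes."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    left, right = reorder(left), reorder(right)
    if op in COMMUTATIVE and weight(right) > weight(left):
        left, right = right, left
    return (op, left, right)


expr = ("*", "w", ("+", "w", ("*", "x", ("+", "x", ("*", "y", "z")))))
better = reorder(expr)
print(weight(expr), weight(better))
print(better)
```

On the running example the reordered tree corresponds to ((y*z+x)*x+w)*w. The weight is only an estimate: under this leaf model it does not account for a right-hand variable being usable directly as a memory operand.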

Minimising instruction usage and register usage is not always optimal. It may be better to minimise memory fetches

Example
Example of the same original expression using common sub expression elimination

mov eax,DWORD [x]    ;eax:=x
mov esi,DWORD [w]    ;esi:=w
mov edi,DWORD [y]    ;edi:=y
imul edi,[z]         ;edi:=y*z
add DWORD edi,eax    ;edi:=x+y*z
imul edi,eax         ;edi:=x*(x+y*z)
add DWORD edi,esi    ;edi:=w+x*(x+y*z)
imul edi,esi         ;edi:=w*(w+x*(x+y*z))

Uses 3 registers, 8 instructions, 4 memory accesses

This was done by first reorganising the expression to ((y*z+x)*x+w)*w, then eliminating repeated variable accesses: first eax:=x in ((y*z+eax)*eax+w)*w, then esi:=w in ((y*z+eax)*eax+esi)*esi.

Using ILCG you can generate either stack code or register code. Which one gets used depends on both the data types you use and the order in which the different instructions are listed in the instructionset list. If you use floating point with the default instruction order, then stack code is used. If you use integers with the default instruction order, then register code is produced.

If you were to move the FPU instructions up the instructionset list, then even integer expressions would use the FPU stack. This is not a good idea for two reasons:
1. It reduces the number of on-chip temporaries that are available from 8 registers + 8 stack elements to just 8 stack elements.
2. Many integer expressions are used in array addressing, and these must return their value in an index register, not on the FPU stack, since the latter cannot be used to index memory.

Reuse
If you have a register-based code generator you can reuse variables. Consider
LET T:= C+(C*B)
C is used twice, so the code generator can cache it in a register:

line15: ; #substituting in edi with 2 occurences and score
mov edi,DWORD [varC]
mov eax,DWORD [varB]
imul eax,edi
lea eax,[eax+edi]
mov DWORD [varT],eax

If an expression is too complicated we can run out of registers, and we must spill results to memory.
LET C:= (B*((C+B)+(C*(3+3))))-((3-((3+C)*(B*C)))-(1-(1-9)))
line12:                  ; #substituting in edi with 4
mov edi,DWORD [varC]     ; substituting edi with 4 uses
mov esi,DWORD [varB]     ; substituting esi with 3
mov DWORD eax, 3
MOV ebx, esi
imul ebx,edi
MOV edx, edi
add DWORD edx, 3
imul ebx,edx
sub eax,ebx
add DWORD eax, -9
mov DWORD [temp],eax     ; here we ran short of registers
imul eax,DWORD edi, 6
lea ebx,[edi+esi]
lea eax,[eax+ebx]
imul eax,esi
sub eax,[temp]           ; get the temp back
mov DWORD [varC],eax

Register spilling
If an expression is complex it may not be possible to evaluate all of it in registers or, on a machine with a limited evaluation stack like the Pentium FPU, all of it on the stack. In that case we have to split the expression up into parts each of which can be evaluated within the available number of registers.

Expression splitting
Use the weight function to determine whether an expression needs to be split:

if weight(e) > the number of free regs then split(e)

To split, we take the subtree with the greatest weight and assign it to a temporary store location. The temporary store location is then substituted for the subtree. Given that we have already re-ordered expressions so that the tree on the left has the greatest weight, this amounts to assigning the subtree on the left to a store location.
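One possible shape for split is sketched below. It is an illustration under stated assumptions, not the method of any particular compiler: trees are tuples (op, left, right), leaves follow the same hypothetical model as before (int = constant, str = variable in memory), temporaries are named temp0, temp1, ... and at least two free registers are required. Because the example uses subtraction, which cannot be re-ordered, the sketch spills whichever operand is heavier rather than always the left one.

```python
def weight(tree):
    """Register estimate; assumed leaf model: int = 0, str (memory variable) = 1."""
    if not isinstance(tree, tuple):
        return 0 if isinstance(tree, int) else 1
    _, left, right = tree
    l, r = weight(left), weight(right)
    return r + 1 if l < r else (l if l > r else l + 1)


def split(tree, free_regs, assignments):
    """Return a tree that fits in free_regs, appending (temp, subtree) spills."""
    if free_regs < 2:
        raise ValueError("this sketch needs at least two free registers")
    if not isinstance(tree, tuple) or weight(tree) <= free_regs:
        return tree
    op, left, right = tree
    # First make each operand fit on its own.
    left = split(left, free_regs, assignments)
    right = split(right, free_regs, assignments)
    tree = (op, left, right)
    # Then spill the heavier operand until the whole node fits.
    while weight(tree) > free_regs:
        op, left, right = tree
        name = f"temp{len(assignments)}"
        if weight(left) >= weight(right):
            assignments.append((name, left))
            left = name
        else:
            assignments.append((name, right))
            right = name
        tree = (op, left, right)
    return tree


# a-(b-(c-(d-e))) is right-nested and '-' does not commute.
expr = ("-", "a", ("-", "b", ("-", "c", ("-", "d", "e"))))
spills = []
residual = split(expr, 3, spills)
print(spills)
print(residual)
```

With three free registers the sketch stores c-(d-e) in temp0 and leaves a-(b-temp0), which then fits.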

Pre-compute constant expressions

16 LET D:= ((C*7)+9)+1
The above can be simplified to (C*7)+10, so the compiler will generate
line16:
imul eax,DWORD [varC], 7
add DWORD eax, 10
mov DWORD [varD],eax
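A minimal sketch of this simplification follows. It assumes the tuple tree encoding (op, left, right) with Python integers as constants, and it handles only two cases: an operator applied to two constants, and the re-association (x + c1) + c2 to x + (c1 + c2) needed for the example above. The function name fold is illustrative, and Python's unbounded integers stand in for machine arithmetic, so overflow is ignored.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(tree):
    """Pre-compute constant sub-expressions, bottom-up."""
    if not isinstance(tree, tuple):
        return tree
    op, left, right = tree
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    # (x + c1) + c2  ->  x + (c1 + c2)
    if (op == "+" and isinstance(right, int)
            and isinstance(left, tuple) and left[0] == "+"
            and isinstance(left[2], int)):
        return ("+", left[1], left[2] + right)
    return (op, left, right)


expr = ("+", ("+", ("*", "C", 7), 9), 1)  # ((C*7)+9)+1
folded = fold(expr)
print(folded)
```

The same pass reduces pure constant subtrees such as 1-(1-9) to a single literal.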

Common sub expressions


Consider the following
17 LET D:= (4*(T*(B*C)))+(2 *(B*C))
The expression B*C occurs more than once, so it would be wasteful to evaluate it twice. The code generator will spot this and remove the repetition by rewriting the tree as follows
17 LET edi:=B*C in D:= (4*(T*edi))+(2 *edi)
mov edi,DWORD [varB]
imul edi,[varC]
mov eax,DWORD [varT]
imul eax,edi
imul eax,DWORD eax, 4
imul ebx,DWORD edi, 2
lea eax,[eax+ebx]
mov DWORD [varD],eax

How is this done?


Search expression tree for repeated sub trees. You build a hash table of all subtrees, keyed on their printable source equivalent. You then count how often each subtree occurs, and if it is more than once, you allocate a register to it and substitute that into the tree before code generation.
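The search described above can be sketched with a counter keyed on each subtree's printable form. The tuple tree encoding and the names show, count_subtrees and substitute are assumptions made for illustration; a Python Counter plays the role of the hash table, and the register name is simply passed in rather than allocated.

```python
from collections import Counter


def show(tree):
    """Printable source equivalent, used as the hash key."""
    if not isinstance(tree, tuple):
        return str(tree)
    op, left, right = tree
    return f"({show(left)}{op}{show(right)})"


def count_subtrees(tree, counts):
    """Count every operator subtree by its printed form."""
    if isinstance(tree, tuple):
        counts[show(tree)] += 1
        count_subtrees(tree[1], counts)
        count_subtrees(tree[2], counts)


def substitute(tree, key, name):
    """Replace every subtree whose printed form is key by name."""
    if not isinstance(tree, tuple):
        return tree
    if show(tree) == key:
        return name
    op, left, right = tree
    return (op, substitute(left, key, name), substitute(right, key, name))


# D := (4*(T*(B*C)))+(2*(B*C))
expr = ("+", ("*", 4, ("*", "T", ("*", "B", "C"))), ("*", 2, ("*", "B", "C")))
counts = Counter()
count_subtrees(expr, counts)
repeated = [key for key, n in counts.items() if n > 1]
rewritten = substitute(expr, repeated[0], "edi")
print(repeated)
print(show(rewritten))
```

Keying on the printed form means B*C and C*B count as different subtrees; normalising the operand order of commutative operators first would be one way to catch both.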
