
Q:23 What is a symbol table? What information is stored in it? Where is it created? Explain the different techniques for storing a symbol table for a non-block-structured language.

A symbol table is a data structure used by a language translator such as a compiler or interpreter, for storing names encountered in the source program, along with the relevant attributes for those names.

Which information is stored in the symbol table?


Information about the following entities is stored in the symbol table:
- Variable/identifier
- Procedure/function
- Keyword (stored before lexical analysis starts)
- Constant
- Class name
- Label name
- Structure and union name

The following information is stored in the symbol table about each entity:
- Name
- Location (address)
- Type
- Size
- Value (for named constants)
- Scope information
- Number of arguments, types of arguments, return type, and parameter-passing method (such as call by value), if the name is a procedure
- Dimensions with lower and upper bounds, if the name is an array
- Line number where the variable is declared
- Etc.

What is the role of the symbol table in a compiler?


A symbol table is an integral part of the semantic processing phase. The symbol table is used in different phases of the compiler, as listed below:
- Semantic analysis: checks the correct semantic usage of language constructs, which may require checking the types of identifiers.
- Code generation: all program variables and temporaries need to be allocated memory locations; the symbol table provides the memory size required for each identifier according to its type.
- Error detection: for example, detecting variables that are left undefined.
- Optimization: to reduce the total number of variables used in a program, the temporaries generated by the compiler need to be reused.

What are the operations performed on a symbol table?

The basic operations defined on a symbol table include:
- Allocate(): allocate a new, empty symbol table
- free(): remove all entries and free the storage of a symbol table
- Insert(name, token): insert a name into a symbol table and return a pointer to its entry
- Lookup(name): search for a name and return a pointer to its entry
- set_attribute(*entry, attribute_value): associate an attribute with a given entry
- get_attribute(*entry, attribute_value): get an attribute associated with a given entry

Other operations can be added depending on requirements. For example, a delete operation removes a name previously inserted, which is useful because some identifiers become invisible (out of scope) after exiting a block.
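The operations listed above can be sketched as a small class. This is only an illustration: the class name SymbolTable, the use of a Python dict as storage, and the key/value form of the attribute calls are assumptions made for the example, not part of the notes.

```python
class SymbolTable:
    """Toy symbol table; the dict storage and method layout are illustrative."""

    def __init__(self):                 # plays the role of Allocate()
        self.entries = {}

    def free(self):
        # Remove all entries.
        self.entries.clear()

    def insert(self, name, token):
        # Create the entry only once and return it (the "pointer" to the entry).
        return self.entries.setdefault(name, {"name": name, "token": token})

    def lookup(self, name):
        # Returns None when the name has not been inserted.
        return self.entries.get(name)

    @staticmethod
    def set_attribute(entry, key, value):
        entry[key] = value

    @staticmethod
    def get_attribute(entry, key):
        return entry.get(key)


table = SymbolTable()
x = table.insert("x", "id")
SymbolTable.set_attribute(x, "type", "int")
```

Here insert and lookup return the same entry object for a given name, so attributes set through one reference are visible through the other.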

When is the symbol table constructed?


The point in the translation process at which the symbol-table handling routines are invoked depends primarily on the number and the nature of the passes in the compiler. In a multipass compiler, such as the one depicted in Fig. 1, the symbol table is created during the lexical-analysis (scanning) pass. Index entries for variables in the symbol table form part of the token string produced by the scanner. For example, in Fig. 1, when X and Y occupy symbol-table positions 1 and 2, respectively, a token string such as i1 := i2 + i1 is created. The syntactic-analysis pass receives the token string, checks for syntactic correctness, and generates a parse tree or some encoded form of the parse tree. This encoded form is then analyzed for semantic correctness (i.e., context-dependent correctness) and is used in the code-generation phase to generate a series of object-code instructions. The leaves of the parse tree contain indices into the symbol table.

Note that no table-handling routines are used during the syntactic-analysis phase. It is not until the semantic-checking and code-generation phases that many of the attributes associated with a variable can be assigned values in the symbol table. For example, in a language with explicit declarations, the type of a variable can be assigned only when it is recognized that a declaration statement is being compiled. It is in the code-generation phase that the full context of the declaration (i.e., both the identifier name and its type) is known through the sequence of parsing actions performed by the syntactic analyzer.

Fig. 1 Multipass compiler

A second approach to symbol-table handling is illustrated in Fig. 2. The lexical-analysis, syntactic-analysis, and code-generation phases are completed in one pass. As a result, it is possible in the code-generation phase to recognize that a declaration statement is being processed before the entire source statement is scanned. This helps immensely, since the attributes of a variable, as dictated by the declaration statement, can be placed in the table as they are identified during code generation. Therefore, in the second approach, the only module which needs to interact with the symbol table is the code generator.

Fig. 2 Combined pass compiler configuration

In summary, it is suggested that in multipass compilers variable names should be inserted into the symbol table during lexical analysis and the other attributes of the variable should be assigned during code generation. If the lexical-analysis, syntactic-analysis, and code-generation phases are combined in one pass, all symbol-table interaction can be confined to the code-generation phase.

How is a symbol table implemented?

Programming languages can be categorized as:
- Non-block-structured languages
- Block-structured languages

Symbol table implementation techniques for non-block-structured languages:
1. Linear list (in the form of a flat file, array, or linked list)
2. Ordered list
3. Tree
4. Hash table

Follow the class notes and topic 8.5 (chapter 8) of the book Compiler Writing by Jean-Paul Tremblay and Paul G. Sorenson.

1) Linear list
The simplest and easiest data structure to implement for a symbol table is a linear list of records. A single array, or a collection of several arrays, is used to store the names and their associated information. New names are added at the end of the array. The end of the array is always marked by a pointer to the next free space, called available here. To insert a name, the whole array is first searched from available back to the beginning. If the name is not found, an entry is created at available, and available is incremented by one (or by the size of an entry). In this implementation the first slot of the symbol table is always left empty, so that the lookup routine can return 0 to indicate that the name is not in the symbol table.

An array-based list contains no pointers or other overhead that takes memory but holds no data; everything it contains is data needed during compilation. Its fundamental disadvantage is that its size must be fixed in advance, which places a limit on the number of identifiers.

Complexity: if the symbol table holds n names, inserting a new name requires examining up to n entries in the worst case, so the cost of a search is O(n). Inserting n names therefore costs O(n^2) in the worst case.
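A minimal sketch of the array-based linear list described above. The class name LinearSymbolTable and the choice to store only names (no other attributes) are assumptions for illustration; the available pointer and the unused slot 0 follow the description in these notes.

```python
class LinearSymbolTable:
    def __init__(self, capacity):
        # Slot 0 is deliberately left unused so that index 0 can mean "not found".
        self.names = [None] * (capacity + 1)
        self.available = 1              # index of the next free slot

    def lookup(self, name):
        # Scan from the most recent entry back to the beginning: O(n).
        for i in range(self.available - 1, 0, -1):
            if self.names[i] == name:
                return i
        return 0

    def insert(self, name):
        index = self.lookup(name)
        if index:                       # name is already present
            return index
        if self.available >= len(self.names):
            # The fixed size is the main drawback of the array form.
            raise OverflowError("symbol table is full")
        self.names[self.available] = name
        self.available += 1
        return self.available - 1
```

Because every insert first performs a full lookup, inserting n distinct names costs on the order of n^2 comparisons in total.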

2) Ordered list
This is a variation of the linear table. The list is kept sorted, so a binary search can be used for lookup in O(log n). Each insertion must be made at the proper place to preserve the sorted order, which may require moving existing entries.
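The ordered list can be sketched with Python's standard bisect module. The function names sorted_lookup and sorted_insert are hypothetical, and the table holds only names to keep the example short.

```python
import bisect


def sorted_lookup(table, name):
    # Binary search: O(log n) comparisons. Returns the index, or -1 if absent.
    i = bisect.bisect_left(table, name)
    return i if i < len(table) and table[i] == name else -1


def sorted_insert(table, name):
    # Find the proper place, then insert there to keep the list sorted.
    i = bisect.bisect_left(table, name)
    if i == len(table) or table[i] != name:
        table.insert(i, name)           # shifts the later entries: O(n)
    return i


names = []
for n in ["FIRST", "B", "ANS"]:
    sorted_insert(names, n)
```

The search is logarithmic, but each insertion may still shift up to n entries, which is the cost the tree organization tries to avoid.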

Linked list
A linked list is expandable, as opposed to an array. Searching is sequential; however, a proper self-organizing storage can speed up searching this list significantly. Specifically, storage of this kind should move frequently used items up to the list head. This clustering at the beginning of the list shortens the average search time, because references to the same identifiers tend to come in bursts during compilation.
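One common self-organizing scheme is move-to-front: every successful lookup moves the found node to the head of the list. The sketch below assumes that scheme; the class names Node and MoveToFrontList are made up for the example.

```python
class Node:
    def __init__(self, name, next_node=None):
        self.name = name
        self.next = next_node


class MoveToFrontList:
    def __init__(self):
        self.head = None

    def insert(self, name):
        # New names go to the head of the list.
        self.head = Node(name, self.head)

    def lookup(self, name):
        prev, node = None, self.head
        while node is not None and node.name != name:
            prev, node = node, node.next
        if node is None:
            return None
        if prev is not None:
            # Unlink the node and move it to the front of the list.
            prev.next = node.next
            node.next = self.head
            self.head = node
        return node

    def names(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out
```

After a lookup, a repeated reference to the same identifier is found at the first position, which is what makes bursts of references cheap.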

Unordered linked list

Ordered linked list

3) Tree
The time to perform a table-insertion operation can be reduced from that of an ordered table by using a tree-structured type of storage organization. In this type of organization an attempt is made to reflect the search structure of the binary search in the structural links of a binary-search tree.

In a binary-tree organization, the tree is ordered such that every node in its left subtree precedes lexicographically the root node of the tree. Similarly, every node in its right subtree follows lexicographically the root node of the tree. This ordering holds for all subtrees in the tree as well. It is important to realize that, because of this insertion strategy, the organization of the tree structure depends on the order of insertion. Sometimes the tree becomes heavily loaded on the left or right side.

Tree implementation of symbol sequence FIRST, B, ANS, COMPANY#, M, FORM1, and X3

Tree implementation of symbol sequence ANS, B, COMPANY#, FIRST, FORM1, M, and X3

A major problem exists with the binary-tree organization just described. Since insertion always takes place at the leaves of the tree, a search tree can become very unbalanced. Hence a balanced binary search tree, such as an AVL tree, is used to implement the symbol table.
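The effect of insertion order can be checked with a plain (unbalanced) binary search tree built from the two symbol sequences given above. The helper names TreeNode, bst_insert, height and build are hypothetical, the symbol printed as FORMl is assumed to be FORM1, and Python's built-in string comparison stands in for lexicographic order.

```python
class TreeNode:
    def __init__(self, name):
        self.name = name
        self.left = None
        self.right = None


def bst_insert(root, name):
    # Insertion always happens at a leaf position.
    if root is None:
        return TreeNode(name)
    if name < root.name:
        root.left = bst_insert(root.left, name)
    elif name > root.name:
        root.right = bst_insert(root.right, name)
    return root


def height(root):
    # Number of nodes on the longest root-to-leaf path.
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


def build(names):
    root = None
    for n in names:
        root = bst_insert(root, n)
    return root


mixed = build(["FIRST", "B", "ANS", "COMPANY#", "M", "FORM1", "X3"])
already_sorted = build(["ANS", "B", "COMPANY#", "FIRST", "FORM1", "M", "X3"])
```

The first sequence gives a tree of height 3, while the already sorted sequence degenerates into a right-leaning chain of height 7, so a lookup in it is no better than a linear search.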
4) Hash Table

Hashing for symbol table
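A hashed symbol table can be sketched with separate chaining, where each bucket holds a short list of entries. The class name HashedSymbolTable, the table size of 11, and the hash function (sum of character codes modulo the table size) are all assumptions chosen to keep the example simple; a real compiler would use a better hash function.

```python
class HashedSymbolTable:
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _index(self, name):
        # Illustrative hash: sum of character codes modulo the table size.
        return sum(ord(c) for c in name) % len(self.buckets)

    def lookup(self, name):
        # Only the entries in one bucket are examined.
        for entry in self.buckets[self._index(name)]:
            if entry["name"] == name:
                return entry
        return None

    def insert(self, name, **attributes):
        entry = self.lookup(name)
        if entry is None:
            entry = {"name": name}
            self.buckets[self._index(name)].append(entry)
        entry.update(attributes)
        return entry
```

With this hash function the names ab and ba collide, and the chain in the shared bucket keeps both entries distinct.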

Q.32 Discuss various code optimization techniques in brief.

A) Compile-time evaluation
B) Common sub-expression elimination
C) Copy propagation
D) Dead code elimination
Follow the textbook for these techniques.

E) Loop optimization

e.1 Code motion


An important modification that decreases the amount of code in a loop is code motion. This transformation takes an expression that yields the same result independent of the number of times a loop is executed (a loop-invariant computation) and places the expression before the loop. Note that the notion "before the loop" assumes the existence of an entry for the loop. For example, evaluation of limit - 2 is a loop-invariant computation in the following while-statement:

    while ( i <= limit-2 ) /* statement does not change limit */

Code motion will result in the equivalent of:

    t = limit-2;
    while ( i <= t ) /* statement does not change limit or t */

Strength reduction with elimination of induction variable

Three conditions ensure that code motion does not change what the program computes, although none of them is absolutely essential.
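The limit - 2 example can be checked with a small sketch that counts how often the invariant expression is evaluated before and after code motion. The function names and the loop body (a plain increment of i) are assumptions for illustration.

```python
def loop_without_code_motion(i, limit):
    evaluations = 0
    while True:
        evaluations += 1                # limit - 2 is recomputed at every test
        if not (i <= limit - 2):
            break
        i += 1                          # body does not change limit
    return i, evaluations


def loop_with_code_motion(i, limit):
    t = limit - 2                       # hoisted: computed once, before the loop
    evaluations = 1
    while i <= t:
        i += 1                          # body does not change limit or t
    return i, evaluations
```

Both versions leave i with the same final value; only the number of evaluations of limit - 2 differs.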

For details, follow the textbook.

e.2 Loop unrolling


Loop overhead can be reduced by reducing the number of iterations and replicating the body of the loop.

Example:
Loop unrolling involves replicating the body of the loop to reduce the required number of tests if the number of iterations is constant. For example, consider the following loop:

    I = 1;
    while (I <= 100)
    {
        x[I] = 0;
        I++;
    }

In this case, the test I <= 100 will be performed 100 times. But if the body of the loop is replicated, then the number of times this test will need to be performed will be 50. After replication of the body, the loop will be:
    I = 1;
    while (I <= 100)
    {
        x[I] = 0;
        I++;
        x[I] = 0;
        I++;
    }

It is possible to choose any divisor of the number of times the loop is executed, and the body will be replicated that many times. Unrolling once (that is, replicating the body to form two copies of the body) saves 50% of the executions of the loop test.
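The saving can be made concrete by counting the loop tests in both versions. This is a sketch with hypothetical function names; index 0 of the list is left unused so that the indices run from 1 to n as in the example above.

```python
def zero_rolled(n):
    x = [None] * (n + 1)                # index 0 unused, I runs from 1 to n
    tests, i = 0, 1
    while True:
        tests += 1                      # one evaluation of the test I <= n
        if not (i <= n):
            break
        x[i] = 0
        i += 1
    return x, tests


def zero_unrolled_by_two(n):
    if n % 2 != 0:
        raise ValueError("n must be divisible by the unrolling factor 2")
    x = [None] * (n + 1)
    tests, i = 0, 1
    while True:
        tests += 1
        if not (i <= n):
            break
        x[i] = 0                        # first copy of the body
        i += 1
        x[i] = 0                        # second copy of the body
        i += 1
    return x, tests
```

For n = 100 the counters come out as 101 and 51, because they include the final failing test; the successful tests are 100 and 50, as stated above.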

e.3 Loop jamming


Loop jamming is a technique that merges the bodies of two loops if the two loops have the same number of iterations and they use the same indices. This eliminates the test of one loop. For example, consider the following loops:

    {
        for (I = 0; I < 10; I++)
            for (J = 0; J < 10; J++)
                X[I,J] = 0;
        for (I = 0; I < 10; I++)
            X[I,I] = 1;
    }

Here, the bodies of the loops on I can be concatenated. The result of loop jamming will be:
    {
        for (I = 0; I < 10; I++)
        {
            for (J = 0; J < 10; J++)
                X[I,J] = 0;
            X[I,I] = 1;
        }
    }
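That the jammed loop computes the same matrix as the two separate loops can be checked directly. The sketch below uses nested Python lists for X and hypothetical function names.

```python
N = 10


def separate_loops():
    x = [[None] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            x[i][j] = 0
    for i in range(N):                  # second loop over the same index range
        x[i][i] = 1
    return x


def jammed_loops():
    x = [[None] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            x[i][j] = 0
        x[i][i] = 1                     # body of the second loop merged in
    return x
```

Jamming is safe here because row I is completely zeroed before X[I,I] is set, and later iterations never touch row I again.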

The following conditions are sufficient for making loop jamming legal:
1. No quantity is computed by the second loop at iteration I if it is computed by the first loop at iteration J ≥ I.
2. If a value is computed by the first loop at iteration J ≥ I, then this value should not be used by the second loop at iteration I.

e.4 Elimination of induction variable using strength reduction
Follow the textbook.

Q:36 Discuss the issues in the design of a code generator.

Ans: While the details depend on the target language and the operating system, issues such as memory management, instruction selection, register allocation, and evaluation order are inherent in almost all code-generation problems.

INPUT TO CODE GENERATOR: The input to the code generator consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table that is used to determine the run-time addresses of the data objects denoted by the names in the intermediate representation.

TARGET PROGRAM: The output of the code generator is the target program. Like the intermediate code, this output may take a variety of forms. Producing an absolute machine-language program as output has the advantage that it can be placed in a fixed location in memory and immediately executed. Producing an assembly-language program as output makes the process of code generation somewhat easier, because the code generator can emit symbolic instructions and use the macro facilities of the assembler.

INSTRUCTION SELECTION: The nature of the instruction set of the target machine determines the difficulty of instruction selection. The quality of the generated code is determined by its speed and size. A target machine with a rich instruction set may provide several ways of implementing a given operation. Ex (adding 1 to a):

    MOV a, R0
    ADD #1, R0
    MOV R0, a

Instruction speeds are needed to design good code sequences.

REGISTER ALLOCATION: Instructions involving register operands are usually shorter and faster than those involving operands in memory. Therefore efficient utilization of registers is particularly important in generating good code. During register allocation, we select the set of variables that will reside in registers at a point in the program. During a subsequent register assignment phase, we pick the specific register that a variable will reside in. Ex: Multiplication M x, y.

CHOICE OF EVALUATION ORDER: The order in which computations are performed can affect the efficiency of the target code. Initially, we shall avoid the problem by generating code for the three-address statements in the order in which they have been produced by the intermediate code generator.

APPROACHES TO CODE GENERATION
1. Undoubtedly the most important criterion for a code generator is that it produce correct code.
2. Correctness takes on special significance because of the number of special cases that a code generator might face.
3. Given the premium on correctness, designing a code generator so it can be easily implemented, tested, and maintained is an important design goal.
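The MOV/ADD/MOV sequence from the instruction-selection example can be produced by a very naive generator that handles one three-address addition at a time. This is a sketch only: the function name gen_add, the single fixed register R0, and the convention that integer operands become immediates written with # are assumptions for illustration, not a description of any real code generator.

```python
def gen_add(dest, left, right, reg="R0"):
    """Emit the straightforward three-instruction sequence for dest := left + right."""

    def operand(value):
        # Integer constants are written as immediates, names are used as-is.
        return f"#{value}" if isinstance(value, int) else value

    return [
        f"MOV {operand(left)}, {reg}",   # load the left operand into the register
        f"ADD {operand(right)}, {reg}",  # add the right operand
        f"MOV {reg}, {dest}",            # store the result
    ]


code = gen_add("a", "a", 1)
```

Translating each statement in isolation like this is always possible, but it ignores instruction costs and the contents of registers, which is why instruction selection and register allocation are listed as separate design issues.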

Que 21) Write a syntax-directed definition to construct a syntax tree for an assignment statement.

Que 39) What is an attribute grammar? Can it be simulated using YACC? What are the input and output of YACC?

Context-free grammars are not able to completely specify the structure of programming languages. For example, declaration of names before reference, number and type of parameters in procedures and functions, the correspondence between formal and actual parameters, name or structural equivalence, scope rules, and the distinction between identifiers and reserved words are all structural aspects of programming languages which cannot be specified using a context-free grammar. Even in a simple language like Simp, context-free grammars are unable to specify that variables appearing in expressions must have an assigned value. Context-free descriptions of syntax are therefore supplemented with natural-language descriptions of the static semantics, or are extended to become attribute grammars.

Attribute grammars are an extension of context-free grammars which permit the specification of context-sensitive properties of programming languages. Attribute grammars are actually much more powerful and are fully capable of specifying the semantics of programming languages as well. Attribute grammars have been used extensively in every phase of traditional compiler construction.

Definition: An attribute grammar is a context-free grammar that has been extended to provide context-sensitive information by appending attributes to some of its nonterminals. With each nonterminal it associates a set of attributes, and with each production a set of semantic rules for computing the values of the attributes associated with the symbols appearing in that production.

Yes, an attribute grammar can be simulated using YACC.
Input: translation scheme
Output: LALR parsing table

Q: Explain the challenges in compiler design.

Q: Explain machine-dependent and machine-independent optimization.

Machine-dependent optimization is based on characteristics of the target machine, such as its instruction set and the addressing modes of its instructions, and aims to produce efficient target code. Machine-independent optimization is based on the characteristics of the programming language and aims to reduce execution time.

Machine-dependent optimization can be achieved using the following criteria:
1. Allocate a sufficient number of resources to improve the execution efficiency of the program.
2. Use immediate instructions.

Machine-independent optimization can be achieved using the following criteria:
1. Analyze the code completely and use an alternative, equivalent sequence of source code that produces a minimum amount of target code.
2. Use appropriate program structures to improve the efficiency of the target code.
3. Eliminate unreachable code from the source program.
4. Move two or more identical computations to one place and reuse the result instead of recomputing the expression each time.

Q. Write the production rules and semantic actions for a declarative statement.

Production        Semantic rule

S -> D            { offset := 0 }

D -> T L          { L.in = T.type;
                    L.widthin = T.width }

T -> int          { T.type = int;
                    T.width = 2 }

T -> float        { T.type = float;
                    T.width = 4 }

T -> char         { T.type = char;
                    T.width = 1 }

L -> L1 , id      { L1.in = L.in;
                    L1.widthin = L.widthin;
                    id.entry = lookup(id.name);
                    if (id.entry == NULL) enter_symtab(id.name, L.in, offset)
                    else addtype(id.entry, L.in, offset);
                    offset = offset + L.widthin }

L -> L1 , *id     { L1.in = L.in;
                    L1.widthin = L.widthin;
                    id.entry = lookup(id.name);
                    if (id.entry == NULL) enter_symtab(id.name, pointer(L.in), offset)
                    else addtype(id.entry, pointer(L.in), offset);
                    offset = offset + 2 }

L -> id           { id.entry = lookup(id.name);
                    if (id.entry == NULL) enter_symtab(id.name, L.in, offset)
                    else addtype(id.entry, L.in, offset);
                    offset = offset + L.widthin }

L -> *id          { L.width = 2;
                    id.entry = lookup(id.name);
                    if (id.entry == NULL) enter_symtab(id.name, pointer(L.in), offset)
                    else addtype(id.entry, pointer(L.in), offset);
                    offset = offset + L.widthin }
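The effect of these semantic rules on a declaration such as int a, *p, b can be simulated without a parser. This is a rough sketch under stated assumptions: the function name process_declaration is hypothetical, a Python dict stands in for lookup, enter_symtab and addtype, declarators are passed as strings with a leading * for pointers, and every pointer is given width 2 as in the rule for L -> L1 , *id.

```python
WIDTHS = {"int": 2, "float": 4, "char": 1}   # T.width for each base type
POINTER_WIDTH = 2


def process_declaration(type_name, declarators, symtab, offset):
    width = WIDTHS[type_name]                # T.width, inherited as L.widthin
    for d in declarators:
        if d.startswith("*"):
            name, typ, size = d[1:], f"pointer({type_name})", POINTER_WIDTH
        else:
            name, typ, size = d, type_name, width
        # Stands in for lookup / enter_symtab / addtype.
        symtab[name] = {"type": typ, "offset": offset}
        offset += size                       # advance to the next free address
    return offset


symtab = {}
offset = process_declaration("int", ["a", "*p", "b"], symtab, 0)
offset = process_declaration("float", ["f"], symtab, offset)
```

The type flows down the declarator list as an inherited attribute, while offset acts as a running global that gives each name its relative address.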

Here the type and width attributes are synthesized, and the in and widthin attributes are inherited.

Q. Construct unambiguous production rules for statements involving +, -, *, /, ^ (power), and unary minus.

E -> E + T | E - T | T
T -> T * P | T / P | P
P -> U ^ P | U
U -> -U | F
F -> (E) | id | num
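The precedence and associativity encoded by such a layered grammar can be tried out with a small recursive-descent evaluator. The sketch assumes the grammar E -> E + T | E - T | T, T -> T * P | T / P | P, P -> U ^ P | U, U -> -U | F, F -> (E) | num, with the left-recursive rules rewritten as loops and with numbers only (no identifiers); the function name evaluate and the tokenizer are made up for the example.

```python
import re


def evaluate(text):
    tokens = re.findall(r"\d+|[-+*/^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():                         # E: + and - are left-associative
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():                         # T: * and / are left-associative
        value = power()
        while peek() in ("*", "/"):
            op = take()
            rhs = power()
            value = value * rhs if op == "*" else value / rhs
        return value

    def power():                        # P -> U ^ P | U: right-associative
        base = unary()
        if peek() == "^":
            take()
            return base ** power()
        return base

    def unary():                        # U -> -U | F
        if peek() == "-":
            take()
            return -unary()
        return factor()

    def factor():                       # F -> ( E ) | num
        tok = take()
        if tok == "(":
            value = expr()
            take()                      # consume the closing parenthesis
            return value
        return int(tok)

    return expr()
```

Each nonterminal becomes one function, and the lower a nonterminal sits in the grammar, the tighter its operator binds; in this grammar unary minus binds tighter than ^, so -2^2 is read as (-2)^2.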

Q. Construct unambiguous production rules for Boolean statements.

E -> E OR T | T
T -> T AND N | N
N -> NOT N | F
F -> true | false
