
ROLE OF THE PARSER

The parser obtains a string of tokens from the lexical analyzer and verifies that

the string can be generated by the grammar for the source language. The parser should report any syntax errors, and it should recover from commonly occurring errors so that it can continue processing the remainder of its input. There are three general types of parsers for grammars.
Universal Parser

The Cocke-Younger-Kasami (CYK) algorithm and Earley's algorithm can parse any grammar, but these universal methods are too inefficient to use in production compilers.

[Fig. Position of parser in compiler model: the lexical analyzer reads the source program and supplies a token to the parser on each "get next token" request; the parser produces a parse tree for the rest of the front end, which emits an intermediate representation; both phases consult the symbol table.]

Top-down Parser

Top-down parsers build parse trees from the top (root) to the bottom (leaves).

Bottom-up Parser

Bottom-up parsers build parse trees from the bottom (leaves) to the top (root). In both the top-down and bottom-up cases, the input to the parser is scanned from left to right, one symbol at a time. The most efficient methods work only on subclasses of grammars, such as the LL and LR grammars; automated tools can construct parsers for the larger class, the LR grammars. The output of the parser is some representation of the parse tree for the stream of tokens produced by the lexical analyzer.

Tasks conducted during parsing

Several tasks are conducted during parsing: collecting information about various tokens into the symbol table, performing type checking and other kinds of semantic analysis, and generating intermediate code. We also consider the nature of syntactic errors and general strategies for error recovery; two of these strategies, panic-mode and phrase-level recovery, are discussed in more detail.

Syntax error handling

If a compiler had to process only correct programs, its design and implementation would be greatly simplified. But programmers frequently write incorrect programs, and a good compiler should assist the programmer in identifying and locating errors. Since programming languages require strict syntactic accuracy, the compiler must detect and report such errors. Planning error handling right from the start can both simplify the structure of a compiler and improve its response to errors.

Different levels of errors

Lexical: misspelling an identifier, keyword, or operator.
Syntactic: an arithmetic expression with unbalanced parentheses.
Semantic: an operator applied to an incompatible operand.
Logical: an infinitely recursive call.

Error detection and recovery in the syntax analysis phase

Many errors are syntactic in nature, or are exposed when the stream of tokens coming from the lexical analyzer disobeys the grammatical rules defining the programming language. Parsing methods can detect syntactic errors in programs very efficiently.

Accurately detecting semantic and logical errors at compile time is a very difficult task.

The error handler in a parser has goals that are simple to state:

It should report the presence of errors clearly and accurately. It should recover from each error quickly enough to be able to detect subsequent errors. It should not significantly slow down the processing of correct programs. The LL and LR parsing methods detect an error as soon as possible: they have the viable-prefix property, meaning they detect that an error has occurred as soon as they see a prefix of the input that is not a prefix of any string in the language.
Error-recovery strategies

A parser can recover from a syntactic error using several general strategies.

Panic-mode recovery

The parser discards input symbols one at a time until one of a designated set of synchronizing tokens is found. The compiler designer must select the synchronizing tokens appropriate for the source language; typically they are delimiters, such as semicolons or end, whose role in the source program is clear. Panic-mode correction often skips a considerable amount of input without checking it for additional errors, so if multiple errors occur in the same statement, this method may miss them. It can be used by most parsing methods. Advantage: it is the simplest to implement and is guaranteed not to go into an infinite loop.
Phrase-level recovery

A parser may perform local correction on the remaining input; that is, it may replace a prefix of the remaining input by some string that allows the parser to continue. A typical local correction would be to replace a comma by a semicolon, delete an extraneous semicolon, or insert a missing semicolon. The choice of local correction is left to the compiler designer, who must take care to choose replacements that do not lead to an infinite loop. This type of replacement can correct any input string and has been used in several error-repairing compilers. The method was first used with top-down parsing. Drawback: it has difficulty coping with situations in which the actual error occurred before the point of detection.

Error productions

If the common errors that may be encountered are known, the grammar for the language at hand can be augmented with productions that generate the erroneous constructs. A parser constructed from this augmented grammar can then, whenever it uses an error production, generate appropriate error diagnostics to indicate the erroneous construct that has been recognized in the input.
Global correction

Ideally, a compiler should make as few changes as possible in processing an incorrect input string. There are algorithms for choosing a minimal sequence of changes to obtain a globally least-cost correction: given an incorrect input string x and grammar G, these algorithms find a parse tree for a related string y such that the number of insertions, deletions, and changes of tokens required to transform x into y is as small as possible. These methods are too costly to implement in terms of time and space, so they are currently only of theoretical interest. The notion of least-cost correction does, however, provide a yardstick for evaluating error-recovery techniques, and it has been used for finding optimal replacement strings for phrase-level recovery.

CONTEXT-FREE GRAMMARS

Many constructs of programming languages have a recursive structure that can be defined by context-free grammars. For example, a conditional statement may be defined by the rule: if S1 and S2 are statements and E is an expression, then "if E then S1 else S2" is a statement. Regular expressions can specify the lexical structure of tokens, but not this kind of nested construct. Using the syntactic variable stmt to denote the class of statements and expr the class of expressions, the grammar production is

    stmt → if expr then stmt else stmt

A context-free grammar (CFG) consists of terminals, non-terminals, a start symbol, and productions.

Terminals are the basic symbols from which strings are formed. The word "token" is a synonym for terminal when talking about grammars for programming languages. In the production stmt → if expr then stmt else stmt, each of the keywords if, then, and else is a terminal.

Non-terminals are syntactic variables that denote sets of strings. The non-terminals define sets of strings that help define the language generated by the grammar, and they impose a hierarchical structure on the language that is useful for both syntax analysis and translation. In the example above, stmt and expr are non-terminals.

In a grammar, one non-terminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.

The productions of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. Each production consists of a non-terminal, followed by an arrow, followed by a string of non-terminals and terminals.

Example:

    expr → expr op expr
    expr → ( expr )
    expr → - expr
    expr → id
    op → +
    op → -
    op → *
    op → /

In this grammar, the terminal symbols are id + - * / ( ), the non-terminal symbols are expr and op, and expr is the start symbol.

Notational Conventions

These symbols are terminals: lower-case letters early in the alphabet, such as a, b, c; operator symbols such as +, -, etc.; punctuation symbols such as parentheses, comma, etc.; the digits 0, 1, ..., 9; boldface strings such as id and if.

These symbols are non-terminals: upper-case letters early in the alphabet, such as A, B, C; the letter S, which, when it appears, is usually the start symbol; lower-case italic names such as expr or stmt.

Upper-case letters late in the alphabet, such as X, Y, Z, represent grammar symbols, that is, either non-terminals or terminals. Lower-case letters late in the alphabet, such as u, v, ..., z, represent strings of terminals. Lower-case Greek letters α, β, γ represent strings of grammar symbols.

A production can be written as A → α, indicating a single non-terminal A on the left side of the production and a string of grammar symbols α on the right side. If A → α1, A → α2, ..., A → αk are all productions with A on the left, we may write A → α1 | α2 | ... | αk, where α1, α2, ..., αk are the alternatives for A. Unless otherwise stated, the left side of the first production is the start symbol.

For example, E → E A E | ( E ) | - E | id, A → + | - | * | /. Here, E and A are non-terminals, with E the start symbol; the remaining symbols are terminals.


Derivations

A derivation step replaces the non-terminal on the left of a production by the string on its right side. For example, given E → E + E | E * E | ( E ) | - E | id, the production E → -E says that an expression preceded by a minus sign is also an expression. This production can be used to generate more complex expressions from simpler ones, by allowing us to replace any instance of E by -E. We write E ⇒ -E, read "E derives -E". More generally, αAβ ⇒ αγβ if A → γ is a production and α and β are arbitrary strings of grammar symbols. If α1 ⇒ α2 ⇒ ... ⇒ αn, we say α1 derives αn. The symbol ⇒ means "derives in one step", ⇒* means "derives in zero or more steps", and ⇒+ means "derives in one or more steps".

Two facts about ⇒*: 1. α ⇒* α for any string α; 2. if α ⇒* β and β ⇒ γ, then α ⇒* γ. A language that can be generated by a grammar is said to be a context-free language. If two grammars generate the same language, the grammars are said to be equivalent. Strings in L(G) may contain only terminal symbols of G: a string of terminals w is in L(G) if and only if S ⇒+ w, and such a w is called a sentence of G. If S ⇒* α, where α may contain non-terminals, then α is a sentential form of G; a sentence is a sentential form with no non-terminals.

For example, the string -(id + id) is a sentence of the grammar E → E + E | E * E | ( E ) | -E | id, because there is the derivation

    E ⇒ -E ⇒ -( E ) ⇒ -( E + E ) ⇒ -( id + E ) ⇒ -( id + id )

The strings E, -E, -( E ), ..., -( id + id ) appearing in this derivation are all sentential forms of this grammar, and we write E ⇒* -( id + id ) to indicate that -(id + id) can be derived from E.

A derivation in which only the leftmost non-terminal in any sentential form is replaced at each step is called leftmost. If α ⇒ β by a step in which the leftmost non-terminal in α is replaced, we write α ⇒lm β. The derivation above is leftmost: E ⇒lm -E ⇒lm -( E ) ⇒lm -( E + E ) ⇒lm -( id + E ) ⇒lm -( id + id ). Using our notational conventions, every leftmost step can be written wAγ ⇒lm wδγ, where w consists of terminals only, A → δ is the production applied, and γ is a string of grammar symbols. If α derives β by a leftmost derivation, we write α ⇒*lm β; if S ⇒*lm α, then α is a left-sentential form of the grammar. Rightmost derivations, in which the rightmost non-terminal is replaced at each step, are defined analogously; rightmost derivations are also called canonical derivations.

Parse tree and derivations

A parse tree is a graphical representation of a derivation. Each interior node of a parse tree is labeled by some non-terminal A, and the children of the node are labeled, from left to right, by the symbols in the right side of the production by which this A was replaced in the derivation. The leaves of the parse tree are labeled by non-terminals or terminals and, read from left to right, they constitute a sentential form, called the yield or frontier of the tree.

Consider a derivation α1 ⇒ α2 ⇒ ... ⇒ αn, where α1 is a single non-terminal A. For each sentential form αi in the derivation, we can construct a parse tree whose yield is αi. The process is an induction on i. Suppose we have a parse tree whose yield is αi-1 = X1 X2 ... Xk, and that αi is derived from αi-1 by replacing Xj, a non-terminal, by β = Y1 Y2 ... Yr. That is, at the i-th step of the derivation, the production Xj → β is applied to αi-1 to derive αi = X1 X2 ... Xj-1 β Xj+1 ... Xk; in the tree, the leaf for Xj is given the children Y1, Y2, ..., Yr. For example, in the parse tree for -(id + id) implied by the derivation above, the root E has children -, (, E, ), and the inner E has children E, +, E, which derive id and id.

Fig. Parse tree for -( id + id )

Example :

The sentence id + id * id has two distinct leftmost derivations:

    E ⇒ E + E                      E ⇒ E * E
      ⇒ id + E                       ⇒ E + E * E
      ⇒ id + E * E                   ⇒ id + E * E
      ⇒ id + id * E                  ⇒ id + id * E
      ⇒ id + id * id                 ⇒ id + id * id

with two corresponding parse trees: in the first, + is at the root and the subtree for id * id is its right operand; in the second, * is at the root and the subtree for id + id is its left operand.

Note: the first parse tree reflects the convention that the * operator has higher precedence than +.

Ambiguity

A grammar that produces more than one parse tree for some sentence is said to be ambiguous. Equivalently, an ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.

Writing a grammar

A limited amount of syntax analysis is done by the lexical analyzer as it produces the sequence of tokens from the input characters. The sequence of tokens accepted by a parser typically forms a superset of the programming language. Grammars for expressions can be constructed from associativity and precedence information, and grammars can be rewritten to make them suitable for top-down parsing. Some programming-language constructs, however, cannot be described by any grammar.

Regular Expressions vs. Context-Free Grammars

Every construct that can be described by a regular expression can also be described by a grammar. For example, the regular expression (a|b)*abb and the grammar

    A0 → aA0 | bA0 | aA1
    A1 → bA2
    A2 → bA3
    A3 → ε

describe the same language: the set of strings of a's and b's ending in abb.

An NFA can be converted into a grammar that generates the same language as the NFA recognizes. For each state i of the NFA, create a non-terminal symbol Ai. If state i has a transition to state j on symbol a, add the production Ai → aAj. If state i goes to state j on input ε, add the production Ai → Aj. If i is an accepting state, add Ai → ε. If i is the start state, make Ai the start symbol of the grammar.
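The state-by-state rules above can be sketched in a few lines of Python. The triple encoding of transitions, the EPS marker, and the A<i> naming are illustrative choices for this example, not a standard API.

```python
# Sketch: convert an NFA into a right-linear grammar, one non-terminal per state.
EPS = ""   # label used for epsilon-transitions and for epsilon bodies

def nfa_to_grammar(transitions, accepting, start):
    """transitions: iterable of (state_i, symbol, state_j) triples."""
    productions = {}
    for i, a, j in transitions:
        rhs = f"{a}A{j}" if a != EPS else f"A{j}"   # A_i -> aA_j  or  A_i -> A_j
        productions.setdefault(f"A{i}", []).append(rhs)
    for i in accepting:
        productions.setdefault(f"A{i}", []).append(EPS)  # A_i -> epsilon
    return f"A{start}", productions

# NFA for (a|b)*abb: states 0..3, start state 0, accepting state 3
start, prods = nfa_to_grammar(
    [(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2), (2, "b", 3)],
    accepting={3}, start=0)
```

Run on the NFA for (a|b)*abb, this reproduces the grammar shown earlier: A0 → aA0 | bA0 | aA1, A1 → bA2, A2 → bA3, A3 → ε.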

Every regular set is a context-free language. Why, then, use regular expressions to define the lexical syntax of a language? The lexical rules of a language are quite simple, and to describe them we do not need a notation as powerful as grammars. Regular expressions provide a more concise and easier-to-understand notation for tokens than grammars, and more efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars. Separating the syntactic structure of a language into lexical and non-lexical parts also provides a convenient way of modularizing the front end of a compiler into two manageable-sized components. Regular expressions are most useful for describing the structure of lexical constructs such as identifiers, constants, and keywords; grammars are most useful for describing nested structures such as balanced parentheses, matching begin-ends, and corresponding if-then-elses. Note: nested structures cannot be described by regular expressions.
Verifying the language generated by a grammar

It is sometimes necessary to reason that a given set of productions generates a particular language: every string generated by grammar G is in language L, and every string in L is generated by G. For example, consider the grammar

    S → ( S ) S | ε

This grammar generates all strings of balanced parentheses. To prove it, we show that every sentence derivable from S is balanced, and that every balanced string is derivable from S. For the first direction, we use induction on the number of steps in a derivation. Basis: the only string of terminals derivable from S in one step is the empty string, which is balanced. Induction: a longer derivation must be of the form

    S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y

The derivations of x and y from S take fewer steps, so x and y are balanced by the inductive hypothesis, and therefore the string ( x ) y is balanced.

For the converse, that every balanced string is derivable from S, we use induction on the length of the string. Basis: the empty string is derivable from S. Induction: suppose every balanced string of length less than 2n is derivable from S, and consider a balanced string w of length 2n, n ≥ 1. Surely w begins with a left parenthesis. Let ( x ) be the shortest prefix of w having an equal number of left and right parentheses; then w can be written as ( x ) y, where both x and y are balanced. Since x and y are of length less than 2n, they are derivable from S by the inductive hypothesis. Thus we can find a derivation of the form S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y, proving that w = ( x ) y is also derivable from S.
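The derivation shape S ⇒ ( S ) S ⇒* ( x ) S ⇒* ( x ) y translates directly into a recursive-descent recognizer; this is a minimal sketch, not part of the original text.

```python
# Sketch: recursive-descent recognizer for S -> ( S ) S | ε, the
# balanced-parentheses grammar above. parse_S returns the input
# position reached after deriving one S.
def balanced(s):
    def parse_S(i):
        if i < len(s) and s[i] == "(":   # S -> ( S ) S
            i = parse_S(i + 1)           # derive the inner S, i.e. x
            if i >= len(s) or s[i] != ")":
                raise SyntaxError("expected ')'")
            return parse_S(i + 1)        # derive the trailing S, i.e. y
        return i                         # S -> ε
    try:
        return parse_S(0) == len(s)
    except SyntaxError:
        return False
```

For example, balanced("(()())") and balanced("()()") hold, while balanced("(()") and balanced(")(") do not.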

Eliminating ambiguity

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity. For example, consider the dangling-else grammar:

    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other

Here "other" stands for any other statement. According to this grammar, the compound conditional statement if E1 then S1 else if E2 then S2 else S3 has the parse tree shown below.

Rule : Match each else with the closest previous unmatched then.



[Fig. (1): Parse tree for the conditional statement if E1 then S1 else if E2 then S2 else S3.]

The grammar is ambiguous: the string if E1 then S1 else if E2 then S2 has two parse trees, one matching the else with the first then and one matching it with the second.

[Fig. (2): Two parse trees for an ambiguous statement.]

The tree that matches each else with the closest previous unmatched then is preferred, because this disambiguating rule can be incorporated directly into the grammar. The idea is that a statement appearing between a then and an else must be "matched"; that is, it must not end with an unmatched then followed by any statement, for the else would then be forced to match this unmatched then. A matched statement is either an if-then-else statement containing no unmatched statements, or it is any other kind of unconditional statement. We therefore use the grammar

    stmt → matched-stmt | unmatched-stmt
    matched-stmt → if expr then matched-stmt else matched-stmt
                 | other
    unmatched-stmt → if expr then stmt
                   | if expr then matched-stmt else unmatched-stmt

This grammar generates the same set of strings as the dangling-else grammar, but it allows only one parsing for the string if E1 then S1 else if E2 then S2, namely the one that associates each else with the closest previous unmatched then.

Elimination of left recursion

A grammar is left-recursive if it has a non-terminal A such that there is a derivation A ⇒+ Aα for some string α. A production of the form A → Aα exhibits immediate left recursion.

The left-recursive pair of productions A → Aα | β can be replaced by the non-left-recursive productions

    A → βA'
    A' → αA' | ε

without changing the set of strings derivable from A.

Example: in the grammar for arithmetic expressions

    E → E + T | T
    T → T * F | F
    F → ( E ) | id

eliminating the immediate left recursion from the productions for E and for T gives

    E → TE'        E' → +TE' | ε
    T → FT'        T' → *FT' | ε
    F → ( E ) | id

No matter how many A-productions there are, we can eliminate immediate left recursion by the following technique. Group the productions as

    A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn

where no βi begins with an A. Then replace the A-productions by

    A → β1A' | β2A' | ... | βnA'
    A' → α1A' | α2A' | ... | αmA' | ε

Example: consider S → Aa | b, A → Ac | Sd | ε. Substituting the S-productions into A → Sd gives A → Ac | Aad | bd | ε, and eliminating the immediate left recursion yields

    S → Aa | b
    A → bdA' | A'
    A' → cA' | adA' | ε
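The A → Aα | β rewriting above can be sketched for one non-terminal in Python. Productions are plain strings of one-character symbols, "" stands for ε, and naming the new non-terminal A + "'" is a convention adopted here for the example.

```python
# Sketch: eliminate immediate left recursion  A -> Aα1|...|Aαm | β1|...|βn
# by rewriting to  A -> β1A'|...|βnA'  and  A' -> α1A'|...|αmA' | ε.
def eliminate_immediate_left_recursion(A, alternatives):
    recursive = [alt[len(A):] for alt in alternatives if alt.startswith(A)]
    others = [alt for alt in alternatives if not alt.startswith(A)]
    if not recursive:
        return {A: alternatives}          # no immediate left recursion
    A2 = A + "'"
    return {
        A:  [beta + A2 for beta in others],              # A  -> βA'
        A2: [alpha + A2 for alpha in recursive] + [""],  # A' -> αA' | ε
    }

print(eliminate_immediate_left_recursion("E", ["E+T", "T"]))
# {'E': ["TE'"], "E'": ["+TE'", '']}
```

Applied to E → E + T | T it yields E → TE', E' → +TE' | ε, matching the hand derivation above; productions with no left recursion, such as F → ( E ) | id, pass through unchanged.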

Algorithm: Eliminating left recursion from a grammar.
Input: Grammar G with no cycles or ε-productions.
Output: An equivalent grammar with no left recursion.
Method:
1. Arrange the non-terminals in some order A1, A2, ..., An.
2. for i := 1 to n do begin
       for j := 1 to i-1 do begin
           replace each production of the form Ai → Ajγ by the productions
           Ai → δ1γ | δ2γ | ... | δkγ, where Aj → δ1 | δ2 | ... | δk are all
           the current Aj-productions
       end;
       eliminate the immediate left recursion among the Ai-productions
   end

This algorithm systematically eliminates left recursion, provided the grammar has no cycles (derivations of the form A ⇒+ A) or ε-productions; cycles and ε-productions can themselves be systematically eliminated from a grammar.

Left factoring

Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. If A → αβ1 | αβ2 are two A-productions and the input begins with a non-empty string derived from α, we do not know whether to expand A to αβ1 or to αβ2; we may defer the decision by expanding A to αA'. The left-factored productions are

    A → αA'
    A' → β1 | β2
Algorithm: Left factoring a grammar.
Input: Grammar G.
Output: An equivalent left-factored grammar.
Method: For each non-terminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, i.e., there is a non-trivial common prefix, replace all the A-productions A → αβ1 | αβ2 | ... | αβn | γ, where γ represents all alternatives that do not begin with α, by

    A → αA' | γ
    A' → β1 | β2 | ... | βn

where A' is a new non-terminal. Repeatedly apply this transformation until no two alternatives for a non-terminal have a common prefix.

The dangling-else problem

Example:

Consider the dangling-else grammar

    S → iEtS | iEtSeS | a
    E → b

Here i, t, and e stand for if, then, and else; E and S stand for expression and statement. Left-factored, this grammar becomes

    S → iEtSS' | a
    S' → eS | ε
    E → b

Thus we may expand S to iEtSS' on input i, and wait until iEtS has been seen to decide whether to expand S' to eS or to ε. The grammar is still ambiguous: on input e, it is not clear which alternative for S' should be chosen.
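The left-factoring step just performed by hand can be sketched for one non-terminal in Python. As before, alternatives are strings of one-character symbols, "" is ε, and the A + "'" naming is a convention chosen for this example.

```python
# Sketch: one round of left factoring  A -> αβ1 | ... | αβn | γ
# becomes  A -> αA' | γ  and  A' -> β1 | ... | βn.
from os.path import commonprefix   # element-wise longest common prefix

def left_factor(A, alternatives):
    # find the longest prefix common to two or more alternatives
    best = ""
    for i in range(len(alternatives)):
        for j in range(i + 1, len(alternatives)):
            p = commonprefix([alternatives[i], alternatives[j]])
            if len(p) > len(best):
                best = p
    if not best:
        return {A: alternatives}           # no non-trivial common prefix
    A2 = A + "'"
    factored = [a[len(best):] for a in alternatives if a.startswith(best)]
    rest = [a for a in alternatives if not a.startswith(best)]
    return {A: [best + A2] + rest, A2: factored}

print(left_factor("S", ["iEtS", "iEtSeS", "a"]))
# {'S': ["iEtSS'", 'a'], "S'": ['', 'eS']}
```

On the dangling-else grammar this reproduces S → iEtSS' | a with S' → ε | eS, as derived above.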

Parsing table M for this grammar (blank entries denote error; note that the entry for S' on input e is multiply defined, reflecting the ambiguity):

    M[S, a]  = S → a
    M[S, i]  = S → iEtSS'
    M[S', e] = S' → eS  and  S' → ε
    M[S', $] = S' → ε
    M[E, b]  = E → b

Top-down Parsing

Our goal is to construct an efficient non-backtracking form of top-down parser called a predictive parser, and to define the class of LL(1) grammars, from which predictive parsers can be constructed automatically.

Recursive-descent parsing

Top-down parsing can be viewed as an attempt to find a leftmost derivation for an input string. Equivalently, it can be viewed as an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder. The general form of top-down parsing, called recursive descent, may involve backtracking, that is, making repeated scans of the input; a special case called predictive parsing requires no backtracking. Backtracking parsers are not very efficient and are rarely needed to parse programming-language constructs. For example, consider the grammar

    S → cAd
    A → ab | a

and the input string w = cad.

Fig. Steps in a top-down parse

To construct a parse tree for this string top-down, we begin with a tree consisting of a single node labeled S and expand S using S → cAd. The leftmost leaf, labeled c, matches the first symbol of w, so we advance the input pointer to a, the second symbol of w, and consider the next leaf, labeled A. We expand A using its first alternative, A → ab. The leaf a matches the second input symbol, so we advance the input pointer to d, the third input symbol, and compare it against the next leaf, labeled b. Since b does not match d, we go back to A and try its other alternative, resetting the input pointer to position 2; this means that the procedure for A must store the input pointer in a local variable. The second alternative for A produces the leaf a, which matches the second symbol of w, and the leaf d then matches the third symbol. Having produced a parse tree for w, we halt and announce successful completion of parsing. Note that a left-recursive grammar can cause a recursive-descent parser, even one with backtracking, to go into an infinite loop.
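The backtracking just described can be sketched with generators: each non-terminal's procedure yields every input position it can reach, so the caller can fall back to A's next alternative when a later match fails. This is a minimal illustration, not a general parsing framework.

```python
# Sketch: recursive descent with backtracking for S -> cAd, A -> ab | a,
# on inputs such as w = "cad" from the example above.
def parse(w):
    def A(i):
        if w[i:i + 2] == "ab":      # first alternative: A -> ab
            yield i + 2
        if w[i:i + 1] == "a":       # second alternative: A -> a
            yield i + 1
    def S(i):                       # S -> cAd
        if w[i:i + 1] == "c":
            for j in A(i + 1):      # backtracking point: retry A on failure
                if w[j:j + 1] == "d":
                    yield j + 1
    return any(j == len(w) for j in S(0))

print(parse("cad"))    # True
```

On "cad", A's first alternative fails, the input position is reset to 2, and the second alternative succeeds, exactly as in the narrative above.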

Predictive Parsers

By eliminating left recursion from a grammar and left factoring the resulting grammar, we can often obtain a grammar that can be parsed by a recursive-descent parser that needs no backtracking, i.e., a predictive parser. To construct a predictive parser, we must know, given the current input symbol a and the non-terminal A to be expanded, which one of the alternatives of production A → α1 | α2 | ... | αn is the unique alternative that derives a string beginning with a. For example, given the productions

    stmt → if expr then stmt else stmt
         | while expr do stmt
         | begin stmt-list end

the keywords if, while, and begin tell us which alternative is the only one that could possibly succeed if we are to find a statement.

Transition diagrams for Predictive Parsers

It is useful to create a transition diagram as a plan or flowchart for a predictive parser, just as for a lexical analyzer. Here the labels of edges are tokens and non-terminals: a transition on a token means we take that transition if that token is the next input symbol, while a transition on a non-terminal is a call of the procedure for that non-terminal. To construct the transition diagrams of a predictive parser from a grammar, first eliminate left recursion from the grammar and then left-factor it. Then, for each non-terminal A:

a) Create an initial and a final state.
b) For each production A → X1 X2 ... Xn, create a path from the initial to the final state, with edges labeled X1, X2, ..., Xn.

If more than one transition leaves a state on the same input, the diagram is ambiguous, and the best we can do is build a recursive-descent parser with backtracking. For example, starting from E → E + T | T, T → T * F | F, F → ( E ) | id, the transformed grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id yields one transition diagram per non-terminal. Substituting diagrams into one another can considerably simplify the collection of transition diagrams.

[Fig. Transition diagrams for the grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id: one diagram per non-terminal, with edges labeled by the symbols of each production body. Fig. Simplified transition diagrams: substituting the diagram for E' into the diagram for E turns the +TE' path into a loop on + T, and similarly for T' and T, yielding the simplified transition diagrams for arithmetic expressions.]

Non-recursive predictive parsing

It is possible to build a non-recursive predictive parser by maintaining a stack explicitly. The key problem in predictive parsing is determining the production to be applied for a non-terminal; the non-recursive parser looks up that production in a parsing table, which can be constructed directly from the grammar. A table-driven predictive parser has an input buffer, a stack, a parsing table, and an output stream. The input buffer contains the string to be parsed, followed by $, a symbol used as a right endmarker to indicate the end of the input string. The stack contains a sequence of grammar symbols with $ on the bottom; initially, it contains the start symbol of the grammar on top of $. The parsing table is a two-dimensional array M[A, a], where A is a non-terminal and a is a terminal or the symbol $. The parser is controlled by X, the symbol on top of the stack, and a, the current input symbol.


[Fig. Non-recursive predictive parser: the predictive parsing program reads an input buffer a + b $, maintains a stack X Y Z $, consults the parsing table M, and writes to the output.]

There are three possibilities for the top-of-stack symbol X and the current input symbol a:

1. If X = a = $, the parser halts and announces successful completion of parsing.
2. If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
3. If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will be either an X-production of the grammar or an error entry. If, for example, M[X, a] = {X → UVW}, the parser replaces X on top of the stack by WVU (with U on top). If M[X, a] = error, the parser calls an error-recovery routine.
Algorithm: Non-recursive predictive parsing.
Input: A string w and a parsing table M for grammar G.
Output: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
Method: Initially the parser has $S on the stack, with S, the start symbol of G, on top, and w$ in the input buffer. The following program uses the predictive parsing table M to produce a parse of the input.

    set ip to point to the first symbol of w$;
    repeat
        let X be the top stack symbol and a the symbol pointed to by ip;
        if X is a terminal or $ then
            if X = a then
                pop X from the stack and advance ip
            else error()
        else /* X is a non-terminal */
            if M[X, a] = X → Y1 Y2 ... Yk then begin
                pop X from the stack;
                push Yk, Yk-1, ..., Y1 onto the stack, with Y1 on top;
                output the production X → Y1 Y2 ... Yk
            end
            else error()
    until X = $ /* stack is empty */

Fig. Predictive parsing algorithm

For example, consider the grammar E → E + T | T, T → T * F | F, F → ( E ) | id, transformed into E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id. On input id + id * id $, the predictive parser makes the sequence of moves shown below. The input pointer points to the leftmost unscanned symbol in the INPUT column. The productions output are those of a leftmost derivation of the input: the input symbols already scanned, followed by the grammar symbols on the stack (from top to bottom), form the left-sentential forms of that derivation.

    STACK        INPUT              OUTPUT
    $E           id + id * id $
    $E'T         id + id * id $     E → TE'
    $E'T'F       id + id * id $     T → FT'
    $E'T'id      id + id * id $     F → id
    $E'T'        + id * id $
    $E'          + id * id $        T' → ε
    $E'T+        + id * id $        E' → +TE'
    $E'T         id * id $
    $E'T'F       id * id $          T → FT'
    $E'T'id      id * id $          F → id
    $E'T'        * id $
    $E'T'F*      * id $             T' → *FT'
    $E'T'F       id $
    $E'T'id      id $               F → id
    $E'T'        $
    $E'          $                  T' → ε
    $            $                  E' → ε

    Fig. Moves made by predictive parser on input id + id * id

FIRST and FOLLOW

The construction of a predictive parser is aided by two functions associated with a grammar G: FIRST and FOLLOW. Sets of tokens yielded by the FOLLOW function can also be used as synchronizing tokens during panic-mode error recovery. Define FIRST(α), where α is any string of grammar symbols, to be the set of terminals that begin strings derived from α; if α ⇒* ε, then ε is also in FIRST(α). Define FOLLOW(A), for a non-terminal A, to be the set of terminals a that can appear immediately to the right of A in some sentential form; that is, the set of terminals a such that there exists a derivation of the form S ⇒* αAaβ for some α and β. Note that there may, during the derivation, have been symbols between A and a, but if so, they derived ε and disappeared. If A can be the rightmost symbol in some sentential form, then $ is in FOLLOW(A).
Rules for FIRST:

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set.

i) If X is a terminal, then FIRST(X) is {X}.
ii) If X → ε is a production, then add ε to FIRST(X).
iii) If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then place a in FIRST(X) if, for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1 ... Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all j = 1, 2, ..., k, then add ε to FIRST(X).

Rules for FOLLOW:

To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any FOLLOW set.

i) Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
ii) If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
iii) If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).

For example, consider again the grammar E → E + T | T, T → T * F | F, F → ( E ) | id, with the left recursion eliminated: E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → ( E ) | id. Then:

    FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
    FIRST(E') = { +, ε }
    FIRST(T') = { *, ε }

    FOLLOW(E) = FOLLOW(E') = { ), $ }
    FOLLOW(T) = FOLLOW(T') = { +, ), $ }
    FOLLOW(F) = { +, *, ), $ }
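The rules for FIRST and FOLLOW amount to a fixed-point computation, which can be sketched as follows. The grammar encoding (bodies as lists of symbols, an empty body for ε) is an illustrative choice.

```python
# Sketch: fixed-point computation of FIRST and FOLLOW for the grammar above.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],   # [] stands for an ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
EPS = "ε"

def first_of(seq, FIRST):
    """FIRST of a string of grammar symbols; an empty seq yields {ε}."""
    out = set()
    for X in seq:
        fx = FIRST.get(X, {X})       # a terminal X has FIRST(X) = {X}
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}               # every symbol in seq can derive ε

def compute_first_follow(grammar, start):
    FIRST = {A: set() for A in grammar}
    FOLLOW = {A: set() for A in grammar}
    FOLLOW[start].add("$")
    changed = True
    while changed:                   # iterate until nothing can be added
        changed = False
        for A, bodies in grammar.items():
            for body in bodies:
                f = first_of(body, FIRST)
                if not f <= FIRST[A]:
                    FIRST[A] |= f
                    changed = True
                for i, B in enumerate(body):
                    if B not in grammar:
                        continue     # FOLLOW is defined for non-terminals only
                    tail = first_of(body[i + 1:], FIRST)
                    add = tail - {EPS}
                    if EPS in tail:  # B can be rightmost: inherit FOLLOW(A)
                        add |= FOLLOW[A]
                    if not add <= FOLLOW[B]:
                        FOLLOW[B] |= add
                        changed = True
    return FIRST, FOLLOW

FIRST, FOLLOW = compute_first_follow(GRAMMAR, "E")
```

On the expression grammar this reproduces the sets listed above, e.g. FIRST(E') = {+, ε} and FOLLOW(F) = {+, *, ), $}.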

CONSTRUCTION OF PREDICTIVE PARSING TABLES

The idea is as follows: i) a production A → α is chosen if the next input symbol a is in FIRST(α); ii) if α = ε, or more generally α ⇒* ε, A → α is chosen if the current input symbol is in FOLLOW(A), or if the $ on the input has been reached and $ is in FOLLOW(A).

Algorithm: Construction of a predictive parsing table.
Input: Grammar G.
Output: Parsing table M.
Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.

For the expression grammar above, the algorithm produces the parsing table:

              id          +            *            (           )          $
    E     E → TE'                                E → TE'
    E'                E' → +TE'                              E' → ε     E' → ε
    T     T → FT'                                T → FT'
    T'                T' → ε      T' → *FT'                  T' → ε     T' → ε
    F     F → id                                 F → (E)

LL(1) Grammars
A grammar whose parsing table has no multiply defined entries is said to be LL(1). The first L in LL(1) stands for scanning the input from left to right, the second L stands for producing a leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step to make parsing-action decisions.

No ambiguous or left-recursive grammar can be LL(1). A grammar G is LL(1) if and only if, whenever A → α | β are two distinct productions of G, the following conditions hold:

i) For no terminal a do both α and β derive strings beginning with a.
ii) At most one of α and β can derive the empty string.
iii) If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
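The table-construction algorithm doubles as an LL(1) test: a grammar is LL(1) exactly when no table entry is multiply defined. This can be sketched with the FIRST and FOLLOW sets written out by hand for the expression grammar; the FIRST_OF_BODY encoding is an assumption made for the example.

```python
# Sketch: fill the predictive parsing table from FIRST(α) and FOLLOW(A),
# flagging multiply-defined entries (which would mean "not LL(1)").
EPS = "ε"
FIRST_OF_BODY = {            # FIRST(α) for each production A -> α
    ("E", "TE'"): {"(", "id"},
    ("E'", "+TE'"): {"+"},   ("E'", EPS): {EPS},
    ("T", "FT'"): {"(", "id"},
    ("T'", "*FT'"): {"*"},   ("T'", EPS): {EPS},
    ("F", "(E)"): {"("},     ("F", "id"): {"id"},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def build_table():
    M, conflicts = {}, []
    for (A, body), first in FIRST_OF_BODY.items():
        # A -> α goes under each a in FIRST(α); if ε is in FIRST(α),
        # it also goes under each symbol in FOLLOW(A), including $.
        targets = (first - {EPS}) | (FOLLOW[A] if EPS in first else set())
        for a in targets:
            if (A, a) in M:
                conflicts.append((A, a))   # multiply defined: not LL(1)
            M[(A, a)] = body
    return M, conflicts

M, conflicts = build_table()
```

For the expression grammar, conflicts comes back empty, confirming that the left-recursion-free, left-factored grammar is LL(1); for the ambiguous dangling-else grammar, the entry for (S', e) would be reported as a conflict.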


The main difficulty lies in writing a grammar for the source language such that a predictive parser can be constructed from it; moreover, the resulting grammar may be hard to read and difficult to use for translation purposes.

Error recovery in Predictive parsing

There are no universal rules by which multiply-defined entries can be made single-valued without affecting the language recognized by the parser. An error is detected during predictive parsing when the terminal on top of the stack does not match the next input symbol, or when a non-terminal A is on top of the stack, a is the next input symbol, and the parsing-table entry M[A, a] is empty.

Panic-mode error recovery skips symbols on the input until a token in a selected set of synchronizing tokens appears. Its effectiveness depends on the choice of the synchronizing set. Some heuristics:
i) As a starting point, place all symbols in FOLLOW(A) into the synchronizing set for non-terminal A. If we skip tokens until an element of FOLLOW(A) is seen and pop A from the stack, it is likely that parsing can continue.
ii) FOLLOW(A) alone is not enough as the synchronizing set for A. If symbols in FIRST(A) are added to the synchronizing set for non-terminal A, then it may be possible to resume parsing according to A when a symbol in FIRST(A) appears in the input.
iii) If a non-terminal can generate the empty string, then the production deriving ε can be used as a default. This reduces the number of non-terminals that have to be considered during error recovery.
iv) If a terminal on top of the stack cannot be matched, pop the terminal, pretend that it was inserted into the input, and continue parsing.

Phrase-level recovery

Phrase-level recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines.

Example :

Using FIRST and FOLLOW symbols as synchronizing tokens works well when expressions are parsed according to the non-left-recursive grammar E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id (obtained from E → E + T | T, T → T * F | F, F → (E) | id by eliminating left recursion). Construct the parsing table for this grammar, with synch indicating synchronizing tokens obtained from the FOLLOW set of the non-terminal.
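The panic-mode scheme just described might be sketched as the driver loop below: blank table entries cause the input symbol to be skipped, synch entries cause the non-terminal on top of the stack to be popped, and an unmatched terminal on top of the stack is popped as if it had been inserted into the input. The table encoding and helper names are illustrative assumptions, not a production implementation.

```python
# Sketch of table-driven predictive parsing with panic-mode recovery.
# The table holds productions plus "synch" entries taken from FOLLOW sets.

SYNCH = "synch"
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
    ("F", "+"): SYNCH, ("F", "*"): SYNCH,
    ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Parse, collecting error-recovery messages instead of aborting."""
    tokens = tokens + ["$"]
    stack, i, errors = ["$", "E"], 0, []
    while stack[-1] != "$":
        top, a = stack[-1], tokens[i]
        if top not in NONTERMINALS:          # terminal on top of the stack
            if top == a:
                stack.pop(); i += 1          # match
            else:                            # pretend the terminal was inserted
                errors.append(f"missing {top!r}"); stack.pop()
        else:
            entry = TABLE.get((top, a))
            if entry is None:                # blank entry: skip input symbol
                errors.append(f"skipping {a!r}"); i += 1
            elif entry == SYNCH:             # synch token: pop the non-terminal
                errors.append(f"popping {top!r}"); stack.pop()
            else:                            # expand by the chosen production
                stack.pop(); stack.extend(reversed(entry))
    if tokens[i] != "$":
        errors.append("trailing input")
    return errors

print(parse(["+", "id", "*", "id"]))   # recovers by skipping the stray '+'
print(parse(["id", "+"]))              # recovers by popping T on a synch entry
```

Both erroneous inputs are repaired locally and the parse runs to completion, which is the point of panic-mode recovery.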


NON-          INPUT SYMBOL
TERMINAL    id          +           *           (           )          $
E           E → TE'                             E → TE'     synch      synch
E'                      E' → +TE'                           E' → ε     E' → ε
T           T → FT'     synch                   T → FT'     synch      synch
T'                      T' → ε      T' → *FT'               T' → ε     T' → ε
F           F → id      synch       synch       F → (E)     synch      synch

Fig. Synchronizing tokens added to parsing table

On the erroneous input ) id * + id, the parser and error-recovery mechanism behave as follows:

STACK           INPUT            REMARK
$ E             ) id * + id $    Error, skip )
$ E             id * + id $      id is in FIRST(E)
$ E' T          id * + id $
$ E' T' F       id * + id $
$ E' T' id      id * + id $
$ E' T'         * + id $
$ E' T' F *     * + id $
$ E' T' F       + id $           Error, M[F, +] = synch,
$ E' T'         + id $           F has been popped
$ E'            + id $
$ E' T +        + id $
$ E' T          id $
$ E' T' F       id $
$ E' T' id      id $
$ E' T'         $
$ E'            $
$               $

Bottom-up parsing

Bottom-up syntax analysis is known as shift-reduce parsing, and it is easy to implement. The general method of shift-reduce parsing is called LR parsing; LR parsing is used in a number of automatic parser generators. The goal is to construct a parse tree for an input string beginning at the leaves and working up towards the root. At each reduction step a particular substring matching the right side of a production is replaced by the symbol on the left of that production, and if the substring is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Example :

Consider the grammar
S → aABe
A → Abc | b
B → d
The sentence abbcde can be reduced to S as follows:
abbcde ⇒ aAbcde (A → b) ⇒ aAde (A → Abc) ⇒ aABe (B → d) ⇒ S (S → aABe)

This sequence of four reductions traces out the following rightmost derivation in reverse:
S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde
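These reductions can be found by a brute-force search: repeatedly replace some substring that matches a production body, backtracking when a replacement leads to a dead end. The sketch below is purely illustrative (no practical parser searches like this; the function name is hypothetical), but it does recover exactly the reduction sequence above.

```python
# Tiny brute-force illustration of the reduction process for
# S -> aABe, A -> Abc | b, B -> d.

PRODUCTIONS = [("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")]

def reduce_to_start(s, trace):
    """Search for a sequence of reductions from s to the start symbol S."""
    if s == "S":
        return trace
    for head, body in PRODUCTIONS:
        pos = s.find(body)
        while pos != -1:
            candidate = s[:pos] + head + s[pos + len(body):]
            found = reduce_to_start(candidate, trace + [candidate])
            if found is not None:
                return found
            pos = s.find(body, pos + 1)   # dead end: try the next occurrence
    return None                           # no reduction works here: back up

print(reduce_to_start("abbcde", ["abbcde"]))
```

The search happens to pick the correct handle at each step; the whole point of LR parsing, developed below, is to make that choice deterministically without backtracking.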

HANDLES

A handle of a string is a substring that matches the right side of a production, and whose reduction to the non-terminal on the left side of the production represents one step along the reverse of a rightmost derivation. Formally, a handle of a right-sentential form γ is a production A → β together with a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. If a grammar is unambiguous, then every right-sentential form of the grammar has exactly one handle. In the parse tree, the handle corresponds to the leftmost complete subtree consisting of a node and all its children.

Consider the grammar E → E + E | E * E | ( E ) | id. Because this grammar is ambiguous, the sentence id1 + id2 * id3 has two rightmost derivations:

E ⇒rm E + E                      E ⇒rm E * E
  ⇒rm E + E * E                    ⇒rm E * id3
  ⇒rm E + E * id3                  ⇒rm E + E * id3
  ⇒rm E + id2 * id3                ⇒rm E + id2 * id3
  ⇒rm id1 + id2 * id3              ⇒rm id1 + id2 * id3

Note : The string appearing to the right of a handle contains only terminal symbols.

HANDLE PRUNING

A rightmost derivation in reverse can be obtained by handle pruning. If w is a sentence of the grammar at hand, then w = γn, where γn is the nth right-sentential form of some as yet unknown rightmost derivation

S = γ0 ⇒rm γ1 ⇒rm γ2 ⇒rm ... ⇒rm γn-1 ⇒rm γn = w.

To construct this derivation in reverse order, locate the handle βn in γn and replace βn by the left side An of the production An → βn to obtain the (n-1)st right-sentential form γn-1. Repeating this process, locate the handle βn-1 in γn-1 and reduce this handle to obtain the right-sentential form γn-2. If, by continuing this process, we produce a right-sentential form consisting only of the start symbol S, then we halt and announce successful completion of parsing. The reverse of the sequence of productions used in the reductions is a rightmost derivation for the input string. For example, consider again the grammar E → E + E | E * E | ( E ) | id and the input id1 + id2 * id3:

RIGHT-SENTENTIAL FORM    HANDLE     REDUCING PRODUCTION
id1 + id2 * id3          id1        E → id
E + id2 * id3            id2       E → id
E + E * id3              id3        E → id
E + E * E                E * E      E → E * E
E + E                    E + E      E → E + E
E

Fig. Reductions made by shift-reduce parser

STACK IMPLEMENTATION OF SHIFT-REDUCE PARSING

Two problems must be solved to parse by handle pruning:
i) locating the substring to be reduced in a right-sentential form, and
ii) determining which production to choose when more than one production has that substring on its right side.

A convenient way to implement a shift-reduce parser is to use a stack to hold grammar symbols and an input buffer to hold the string w to be parsed. Initially, the stack is empty and the string w is on the input:

STACK     INPUT
$         w$

The parser operates by shifting zero or more input symbols onto the stack until a handle is on top of the stack. The parser then reduces the handle to the left side of the appropriate production. The parser repeats this cycle until it has detected an error or until the stack contains the start symbol and the input is empty:

STACK     INPUT
$S        $

After entering this configuration, the parser halts and announces successful completion of parsing.

Example: A shift-reduce parser parses the input string id1 + id2 * id3 according to the grammar E → E + E | E * E | ( E ) | id as follows:

STACK             INPUT                ACTION
$                 id1 + id2 * id3 $    shift
$ id1             + id2 * id3 $        reduce by E → id
$ E               + id2 * id3 $        shift
$ E +             id2 * id3 $          shift
$ E + id2         * id3 $              reduce by E → id
$ E + E           * id3 $              shift
$ E + E *         id3 $                shift
$ E + E * id3     $                    reduce by E → id
$ E + E * E       $                    reduce by E → E * E
$ E + E           $                    accept (reduce by E → E + E)
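The trace above can be reproduced by a small stack-based parser. Because the grammar E → E + E | E * E | ( E ) | id is ambiguous, the sketch below resolves the shift/reduce choices with an assumed operator-precedence rule (reduce E op E only when the lookahead does not bind tighter); this heuristic is my addition for illustration, not part of the grammar itself.

```python
# Sketch of a shift-reduce parser for E -> E+E | E*E | (E) | id.
# Precedence of the lookahead decides shift vs. reduce for E op E.

PREC = {"+": 1, "*": 2, "$": 0, ")": 0}

def parse(tokens):
    stack, actions, i = ["$"], [], 0
    tokens = tokens + ["$"]
    while True:
        if stack[-1] == "id":                       # handle: id
            stack[-1] = "E"; actions.append("reduce E -> id")
        elif stack[-3:] == ["(", "E", ")"]:         # handle: (E)
            stack[-3:] = ["E"]; actions.append("reduce E -> (E)")
        elif (len(stack) >= 3 and stack[-1] == "E" and stack[-2] in "+*"
              and stack[-3] == "E"
              and PREC.get(tokens[i], 0) <= PREC[stack[-2]]):
            op = stack[-2]                          # handle: E op E
            stack[-3:] = ["E"]; actions.append(f"reduce E -> E{op}E")
        elif tokens[i] == "$":                      # no handle, input exhausted
            break
        else:                                       # otherwise shift
            stack.append(tokens[i]); i += 1
            actions.append("shift")
    assert stack == ["$", "E"], "syntax error"
    return actions

for act in parse(["id", "+", "id", "*", "id"]):
    print(act)
```

Running it on id + id * id prints the same shift/reduce sequence as the table above, ending with the reductions E → E * E and then E → E + E.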



The primary operations of the parser are shift and reduce. A shift-reduce parser can make four possible actions: a) shift, b) reduce, c) accept, and d) error.
a) In a shift action, the next input symbol is shifted onto the top of the stack.
b) In a reduce action, the parser knows that the right end of the handle is at the top of the stack. It must locate the left end of the handle within the stack and decide with what non-terminal to replace the handle.
c) In an accept action, the parser announces successful completion of parsing.
d) In an error action, the parser discovers that a syntax error has occurred and calls an error-recovery routine.

An important fact justifying the use of a stack in shift-reduce parsing: the handle will always eventually appear on top of the stack, never inside. Consider the two possible cases of two successive steps in a rightmost derivation:
1) S ⇒*rm αAz ⇒rm αβByz ⇒rm αβγyz, where A → βBy and B → γ.
2) S ⇒*rm αBxAz ⇒rm αBxyz ⇒rm αγxyz, where A → y and B → γ.
In reverse, the shift-reduce parser behaves as follows:

1) STACK        INPUT
   $ αβγ        yz$
   $ αβB        yz$    (B → γ)
   $ αβBy       z$     (shift)
   $ αA         z$     (A → βBy)
   $ αAz        $      (shift)
   $ S          $      (S → αAz)

2) STACK        INPUT
   $ αγ         xyz$
   $ αB         xyz$   (B → γ)
   $ αBx        yz$    (shift)
   $ αBxy       z$     (shift)
   $ αBxA       z$     (A → y)
   $ αBxAz      $      (shift)
   $ S          $      (S → αBxAz)

In both cases, after making a reduction the parser had to shift zero or more symbols to get the next handle onto the stack; it never had to go into the stack to find the handle.

Viable prefixes

The set of prefixes of right-sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes. Equivalently, a viable prefix is a prefix of a right-sentential form that does not continue past the right end of the rightmost handle of that sentential form.

Conflicts during shift-reduce parsing

There are context-free grammars for which shift-reduce parsing cannot be used. Every shift-reduce parser for such a grammar can reach a configuration in which the parser, knowing the entire stack contents and the next input symbol, cannot decide whether to shift or to reduce (a shift/reduce conflict), or cannot decide which of several reductions to make (a reduce/reduce conflict). Such grammars are not in the LR(k) class of grammars; the k in LR(k) refers to the number of symbols of lookahead on the input. Grammars used in compiling usually fall in the LR(1) class, with one symbol of lookahead. Another kind of non-LR-ness occurs when the stack contents and the next input symbol are not sufficient to determine which production should be used in a reduction.