Outline
• Role of parser
• Context-free grammars
• Top-down parsing
• Bottom-up parsing
• Parser generators
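[Figure: role of the parser. The parser requests tokens from the lexical analyzer with getNextToken and produces a parse tree; both consult the symbol table.]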
Uses of grammars
E -> E + T | T
T -> T * F | F
F -> (E) | id
Error handling
• Common programming errors
  - Lexical errors
  - Syntactic errors
  - Semantic errors
  - Logical errors
• Error handler goals
  - Report the presence of errors clearly and accurately
  - Recover from each error quickly enough to detect subsequent errors
  - Add minimal overhead to the processing of correct programs
Error-recovery strategies
• Panic mode recovery
  - Discard input symbols one at a time until one of a designated set of synchronizing tokens is found
• Phrase-level recovery
  - Replace a prefix of the remaining input by some string that allows the parser to continue
• Error productions
  - Augment the grammar with productions that generate the erroneous constructs
• Global correction
  - Choose a minimal sequence of changes to obtain a globally least-cost correction
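Panic-mode recovery can be sketched in Python as follows. This sketch assumes tokens are plain strings and that the caller supplies the synchronizing set. The names `panic_mode_skip` and `sync_tokens`, and the token stream, are hypothetical.

```python
def panic_mode_skip(tokens, pos, sync_tokens):
    """Panic-mode recovery: discard tokens starting at `pos` until a
    synchronizing token (or the end of the input) is reached.

    Returns the new position and the list of discarded tokens."""
    skipped = []
    while pos < len(tokens) and tokens[pos] not in sync_tokens:
        skipped.append(tokens[pos])  # throw this token away
        pos += 1
    return pos, skipped


# Hypothetical token stream; the error is detected at the stray "*" (index 4).
tokens = ["x", "=", "3", "+", "*", "4", ";", "y", "=", "2", ";"]
new_pos, dropped = panic_mode_skip(tokens, 4, {";", "}"})
print(new_pos, dropped)  # prints: 6 ['*', '4']
```

The real design decision is the choice of synchronizing set (here ';' and '}'). Delimiters like these are a natural choice because parsing can restart cleanly after them.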
Derivations
• Productions are treated as rewriting rules to generate a string
Parse trees
• Input: -(id+id)
• E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
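The derivation above can be replayed mechanically. This Python sketch assumes the usual grammar for this example, E -> E + E | E * E | -E | (E) | id, which the slide does not show. It treats every step as a leftmost rewrite. `leftmost_step` and `derive` are hypothetical helper names.

```python
# Assumed grammar for this example: E -> E + E | E * E | -E | (E) | id
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]]}


def leftmost_step(form, body):
    """Rewrite the leftmost nonterminal of `form` using the production body `body`."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:
            if body not in GRAMMAR[sym]:
                raise ValueError(f"{sym} -> {''.join(body)} is not a production")
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def derive(start, bodies):
    """Apply the given production bodies as leftmost steps; show every sentential form."""
    forms = [[start]]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return " => ".join("".join(f) for f in forms)


print(derive("E", [["-", "E"], ["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]))
# prints: E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
```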
Ambiguity
• For some strings there exists more than one parse tree
• Or more than one leftmost derivation
• Or more than one rightmost derivation
• Example: id+id*id
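Counting parse trees makes the ambiguity concrete. The sketch below assumes the ambiguous grammar E -> E + E | E * E | id, which the slide does not give. It counts how many parse trees derive a token list by splitting at every operator, memoized on the (start, end) span. `count_parse_trees` is a hypothetical name.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Number of parse trees for `tokens` under the assumed grammar E -> E+E | E*E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of ways E derives toks[i:j]
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        for k in range(i + 1, j - 1):           # try each operator as the root
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # prints: 2
```

For id+id*id the count is 2: one tree groups id*id first, the other groups id+id first. With three + operators the count grows to 5.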
Elimination of ambiguity
Elimination of left recursion
• A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α
• Top-down parsing methods can't handle left-recursive grammars
• A simple rule for direct left recursion elimination:
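  For a rule like:
    A -> Aα | β
  we may replace it with:
    A -> βA'
    A' -> αA' | ε

• General left recursion elimination: arrange the nonterminals in some order A1, A2, ..., An, then
  for (each i from 1 to n) {
      for (each j from 1 to i-1) {
          replace each production of the form Ai -> Aj γ by the productions
              Ai -> δ1 γ | δ2 γ | ... | δk γ,
          where Aj -> δ1 | δ2 | ... | δk are all current Aj-productions
      }
      eliminate the immediate left recursion among the Ai-productions
  }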
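The direct left-recursion rule can be sketched in Python as follows. Productions are tuples of symbol names, with () standing for ε. The new nonterminal is named by appending a prime, which assumes that name is not already in use. The function name is hypothetical.

```python
def remove_immediate_left_recursion(head, bodies):
    """Replace A -> Aα1 | ... | Aαm | β1 | ... | βn with
    A -> β1A' | ... | βnA'  and  A' -> α1A' | ... | αmA' | ε.
    Bodies are tuples of symbols; () stands for ε."""
    alphas = [b[1:] for b in bodies if b and b[0] == head]   # α from each A -> Aα
    betas = [b for b in bodies if not b or b[0] != head]     # each A -> β
    if not alphas:
        return {head: list(bodies)}                          # not left recursive
    new = head + "'"                                         # fresh nonterminal A'
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


print(remove_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```

Applied to E -> E + T | T, it yields E -> TE' and E' -> +TE' | ε. This is the form used by the top-down examples in these notes.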
Left factoring
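• Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing.
• Consider the following grammar:
    stmt -> if expr then stmt else stmt
          | if expr then stmt
• On seeing the input if, it is not clear to the parser which production to use.
• We can easily perform left factoring:
  - If we have A -> αβ1 | αβ2, then we replace it with
      A -> αA'
      A' -> β1 | β2
• In general: for each nonterminal A, find the longest prefix α common to two or more of its alternatives. If α ≠ ε, replace all of the A-productions A -> αβ1 | αβ2 | ... | αβn | γ by
      A -> αA' | γ
      A' -> β1 | β2 | ... | βn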
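This sketch performs one round of left factoring, using tuples of symbols for bodies (an assumption). It picks the longest prefix shared by at least two alternatives and introduces A'. Repeating it until no common prefix remains gives the fully factored grammar. `left_factor_once` is a hypothetical name.

```python
from itertools import combinations


def common_prefix(x, y):
    """Longest common prefix of two symbol tuples."""
    n = 0
    while n < min(len(x), len(y)) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor_once(head, bodies):
    """Factor the longest prefix α shared by two or more alternatives of `head`:
    A -> αβ1 | ... | αβn | γ  becomes  A -> αA' | γ  and  A' -> β1 | ... | βn."""
    alpha = max((common_prefix(x, y) for x, y in combinations(bodies, 2)),
                key=len, default=())
    if not alpha:
        return {head: list(bodies)}                  # nothing to factor
    new, k = head + "'", len(alpha)                  # assumes A' is an unused name
    betas = [b[k:] for b in bodies if b[:k] == alpha]
    gammas = [b for b in bodies if b[:k] != alpha]
    return {head: [alpha + (new,)] + gammas, new: betas}


print(left_factor_once("S", [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]))
```

Consider S -> iEtS | iEtSeS | a, which is if-then-else with i, t and e abbreviating the keywords. On this grammar the function returns S -> iEtSS' | a and S' -> ε | eS.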
Introduction
• A top-down parser tries to create a parse tree from the root towards the leaves, scanning the input from left to right
• It can also be viewed as finding a leftmost derivation for an input string
• Example: id+id*id
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
[Figure: top-down parse of id+id*id, showing the parse tree after each leftmost derivation step (=>lm), starting from E]
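Recursive descent parsing
• A recursive descent parser consists of a set of procedures, one for each nonterminal
• Execution begins with the procedure for the start symbol
• A typical procedure for a nonterminal: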
void A() {
    choose an A-production, A -> X1 X2 ... Xk;
    for (i = 1 to k) {
        if (Xi is a nonterminal)
            call procedure Xi();
        else if (Xi equals the current input symbol a)
            advance the input to the next symbol;
        else
            /* an error has occurred */;
    }
}
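Here is a recursive-descent parser for the non-left-recursive expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id, with one method per nonterminal. It assumes the step "choose an A-production" can be decided by peeking at the next token, so no backtracking is needed. The class and method names are hypothetical.

```python
class RecursiveDescentParser:
    """One method per nonterminal of E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε,
    F -> (E) | id. The next token selects the production."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # $ marks the end of the input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # terminal case of the procedure: compare with the input, then advance
        if self.peek() != terminal:
            raise SyntaxError(f"expected {terminal!r}, found {self.peek()!r} at {self.pos}")
        self.pos += 1

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.peek() == "+":               # E' -> +TE'
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise E' -> ε: consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.peek() == "*":               # T' -> *FT'
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise T' -> ε

    def F(self):
        if self.peek() == "(":               # F -> (E)
            self.match("(")
            self.E()
            self.match(")")
        else:                                # F -> id
            self.match("id")


def parses(tokens):
    parser = RecursiveDescentParser(tokens)
    try:
        parser.E()                           # start with the start symbol's procedure
        parser.match("$")                    # all input must be consumed
        return True
    except SyntaxError:
        return False


print(parses(["id", "+", "id", "*", "id"]))  # prints: True
```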
• General recursive descent may require backtracking
• In its general form, it can't choose an A-production easily
• So we need to try all alternatives
• If one fails, the input pointer needs to be reset and another alternative should be tried
• Recursive descent parsers can't be used for left-recursive grammars
Example
S -> cAd
A -> ab | a
Input: cad
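[Figure: three stages of parsing cad. S is expanded to c A d; the alternative A -> ab is tried and fails when b does not match d; the parser backtracks and A -> a succeeds.]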
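Backtracking recursive descent can be sketched with Python generators. Each symbol yields every input position it can reach, and when a later symbol fails, the generator resumes with the next alternative. This is one possible implementation, assuming a dictionary grammar whose alternatives are tried in the listed order. The names are hypothetical.

```python
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],   # A -> ab is tried before A -> a
}


def parse(symbol, tokens, pos):
    """Yield every position reachable after deriving `symbol` from tokens[pos:]."""
    if symbol not in GRAMMAR:                     # terminal: must match the input
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    for body in GRAMMAR[symbol]:                  # try each alternative in order;
        yield from parse_seq(body, tokens, pos)   # exhausting one means backtracking


def parse_seq(symbols, tokens, pos):
    """Yield every position reachable after deriving the symbol sequence."""
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], tokens, pos):
        yield from parse_seq(symbols[1:], tokens, mid)


def accepts(tokens, start="S"):
    return any(end == len(tokens) for end in parse(start, tokens, 0))


print(accepts(list("cad")))  # prints: True
```

On cad, A -> ab fails at d, and the parser falls back to A -> a. Like any recursive-descent scheme, this one would loop forever on a left-recursive grammar.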
Computing First
• To compute First(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any First set:
  1. If X is a terminal, then First(X) = {X}.
  2. If X is a nonterminal and X -> Y1Y2...Yk is a production for some k >= 1, then place a in First(X) if, for some i, a is in First(Yi) and ε is in all of First(Y1), ..., First(Yi-1); that is, Y1...Yi-1 =>* ε. If ε is in First(Yj) for all j = 1, ..., k, then add ε to First(X).
  3. If X -> ε is a production, then add ε to First(X).
• Example!
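These rules can be run to a fixed point. The sketch below assumes the non-left-recursive expression grammar E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id. It writes ε-productions as empty lists and uses the string "ε" as the ε marker. All names are hypothetical.

```python
EPS = "ε"

# Non-left-recursive expression grammar; [] is an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_seq(seq, first):
    """FIRST of a string of symbols; any symbol not in GRAMMAR is a terminal."""
    result = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}   # First(terminal) = {terminal}
        result |= f - {EPS}
        if EPS not in f:                              # this symbol cannot vanish
            return result
    result.add(EPS)     # every symbol can derive ε (or seq is empty: X -> ε)
    return result


def compute_first():
    first = {a: set() for a in GRAMMAR}
    changed = True
    while changed:                                    # repeat until nothing is added
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_seq(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


print(compute_first()["E'"])  # the set {'+', 'ε'}
```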
Computing follow
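• To compute Follow(A) for all nonterminals A, apply the following rules until nothing can be added to any Follow set:
  1. Place $ in Follow(S), where S is the start symbol and $ is the input right endmarker.
  2. If there is a production A -> αBβ, then everything in First(β) except ε is in Follow(B).
  3. If there is a production A -> αB, or a production A -> αBβ where First(β) contains ε, then everything in Follow(A) is in Follow(B).
• Example!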
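Follow sets can be computed the same way. This sketch reuses a fixed-point FIRST computation for the same expression grammar, with the same representation and hypothetical names. The comments mark where $, First(β) and Follow(A) flow into Follow(B).

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is an ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_seq(seq, first):
    result = set()
    for sym in seq:
        f = first[sym] if sym in GRAMMAR else {sym}
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)                    # seq is empty or every symbol derives ε
    return result


def compute_first():
    first = {a: set() for a in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_seq(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(start="E"):
    first = compute_first()
    follow = {a: set() for a in GRAMMAR}
    follow[start].add("$")                       # $ goes into Follow(start symbol)
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in GRAMMAR:         # only nonterminals get Follow sets
                        continue
                    rest = first_of_seq(body[i + 1:], first)
                    new = rest - {EPS}           # First(β) minus ε
                    if EPS in rest:              # β is empty or β =>* ε
                        new |= follow[head]      # so Follow(A) flows into Follow(B)
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return follow


print(compute_follow()["F"])  # the set {'+', '*', ')', '$'}
```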
LL(1) Grammars
• Predictive parsers are recursive descent parsers that need no backtracking
• Grammars for which we can create predictive parsers are called LL(1)
  - The first L means scanning the input from left to right
  - The second L means producing a leftmost derivation
  - And 1 stands for using one input symbol of lookahead
• For each production A -> α in the grammar, do the following:
  1. For each terminal a in First(α), add A -> α to M[A, a].
  2. If ε is in First(α), then for each terminal b in Follow(A), add A -> α to M[A, b]. If ε is in First(α) and $ is in Follow(A), add A -> α to M[A, $] as well.
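The table construction can be sketched in Python as follows. The FIRST and FOLLOW sets of the expression grammar are taken as given; they match the sets listed in this section's example for the grammar. The code raises an error if two productions land in the same cell, which would mean the grammar is not LL(1). Names are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] is an ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def first_of_seq(seq):
    result = set()
    for sym in seq:
        f = FIRST.get(sym, {sym})      # a terminal's FIRST is itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_ll1_table():
    table = {}
    for head, bodies in GRAMMAR.items():
        for body in bodies:
            f = first_of_seq(body)
            lookaheads = f - {EPS}               # rule 1: terminals in First(α)
            if EPS in f:
                lookaheads |= FOLLOW[head]       # rule 2: Follow(A), including $
            for a in lookaheads:
                if table.get((head, a), body) != body:
                    raise ValueError(f"not LL(1): conflict at M[{head}, {a}]")
                table[(head, a)] = body
    return table


M = build_ll1_table()
print(M[("E'", "$")])  # prints: []
```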
Example
E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' | ε
F -> (E) | id
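Nonterminal | First    | Follow
F           | {(, id}  | {+, *, ), $}
T           | {(, id}  | {+, ), $}
E           | {(, id}  | {), $}
E'          | {+, ε}   | {), $}
T'          | {*, ε}   | {+, ), $}

Nonterminal | id       | +          | *          | (        | )       | $
E           | E -> TE' |            |            | E -> TE' |         |
E'          |          | E' -> +TE' |            |          | E' -> ε | E' -> ε
T           | T -> FT' |            |            | T -> FT' |         |
T'          |          | T' -> ε    | T' -> *FT' |          | T' -> ε | T' -> ε
F           | F -> id  |            |            | F -> (E) |         |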
Another example
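S -> iEtSS' | a
S' -> eS | ε
E -> b

Nonterminal | a      | b      | e                 | i           | t | $
S           | S -> a |        |                   | S -> iEtSS' |   |
S'          |        |        | S' -> ε, S' -> eS |             |   | S' -> ε
E           |        | E -> b |                   |             |   |

• M[S', e] contains two productions, so this grammar is not LL(1).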
[Figure: model of a table-driven predictive parser: input buffer, stack (X Y Z $), parsing program driven by parsing table M, output]
Example
• Input: id+id*id$

Matched | Stack | Input     | Action
        | E$    | id+id*id$ |
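The table-driven predictive parser (a stack plus parsing table M) can be sketched as below. The table is hard-coded for the expression grammar, and the top of the stack is the end of a Python list. The function returns the productions it used, which form a leftmost derivation. `ll1_parse` is a hypothetical name.

```python
# M[A, a] for E -> TE', E' -> +TE' | ε, T -> FT', T' -> *FT' | ε, F -> (E) | id
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, start="E"):
    """Return the productions used (a leftmost derivation) or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                     # top of stack = end of the list
    pos, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[pos]
        if X == a:                           # terminal on top matches the input
            stack.pop()
            pos += 1
        elif X not in NONTERMINALS:
            raise SyntaxError(f"expected {X!r}, found {a!r}")
        elif (X, a) in TABLE:
            body = TABLE[(X, a)]
            stack.pop()
            stack.extend(reversed(body))     # first body symbol ends up on top
            output.append(f"{X} -> {' '.join(body) or 'ε'}")
        else:
            raise SyntaxError(f"no entry M[{X}, {a}]")
    if tokens[pos] != "$":
        raise SyntaxError("extra input after the expression")
    return output


for production in ll1_parse(["id", "+", "id", "*", "id"]):
    print(production)
```

For id+id*id it prints the eleven productions E -> T E', T -> F T', F -> id, T' -> ε, E' -> + T E', and so on, ending with E' -> ε.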
Example
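Nonterminal | id       | +          | *          | (        | )       | $
E           | E -> TE' |            |            | E -> TE' | synch   | synch
E'          |          | E' -> +TE' |            |          | E' -> ε | E' -> ε
T           | T -> FT' | synch      |            | T -> FT' | synch   | synch
T'          |          | T' -> ε    | T' -> *FT' |          | T' -> ε | T' -> ε
F           | F -> id  | synch      | synch      | F -> (E) | synch   | synch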
Stack    | Input     | Action
E$       | )id*+id$  | Error, skip )
E$       | id*+id$   | id is in First(E)
TE'$     | id*+id$   |
FT'E'$   | id*+id$   |
idT'E'$  | id*+id$   |
T'E'$    | *+id$     |
*FT'E'$  | *+id$     |
FT'E'$   | +id$      | Error, M[F, +] = synch
T'E'$    | +id$      | F has been popped
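This sketch implements the panic-mode scheme behind the trace. A blank entry skips the input symbol, and a synch entry (here, any blank M[A, a] with a in Follow(A)) pops A. An unmatched terminal on the stack is also popped. To reproduce the first step of the trace, the sketch skips the input rather than popping the start symbol when it is alone on the stack. That extra rule is an assumption, not something the slide states. Names are hypothetical.

```python
# M[A, a] for the expression grammar; [] is an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"], ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
# synch entries: blank cells M[A, a] with a in Follow(A)
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}


def parse_with_recovery(tokens, start="E"):
    """Table-driven LL(1) parse with panic-mode recovery; returns the error actions."""
    tokens = list(tokens) + ["$"]
    stack, pos, errors = ["$", start], 0, []
    while stack[-1] != "$":
        X, a = stack[-1], tokens[pos]
        if X == a:                                   # matching terminal
            stack.pop()
            pos += 1
        elif X not in FOLLOW:                        # unmatched terminal: pop it
            errors.append(f"pop {X}")
            stack.pop()
        elif (X, a) in TABLE:                        # normal expansion
            stack.pop()
            stack.extend(reversed(TABLE[(X, a)]))
        elif a in FOLLOW[X] and not (stack == ["$", start] and a != "$"):
            errors.append(f"M[{X}, {a}] = synch, pop {X}")
            stack.pop()                              # synch: pop the nonterminal
        else:                                        # blank entry: skip the input symbol
            errors.append(f"skip {a}")
            pos += 1
    if tokens[pos] != "$":
        errors.append("extra input after the expression")
    return errors


print(parse_with_recovery([")", "id", "*", "+", "id"]))
# prints: ['skip )', 'M[F, +] = synch, pop F']
```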
Introduction
• Constructs a parse tree for an input string beginning at the leaves (the bottom) and working towards the root (the top)
• Example: id*id
E -> E + T | T
T -> T * F | F
F -> (E) | id
[Figure: bottom-up parse of id*id, showing the parse tree as it is built from the leaves: id*id, F*id, T*id, T*F, T, E]
Shift-reduce parser
• The general idea is to shift some symbols of the input onto the stack until a reduction can be applied
• At each reduction step, a specific substring matching the body of a production is replaced by the nonterminal at the head of the production
• The key decisions during bottom-up parsing are about when to reduce and which production to apply
• A reduction is the reverse of a step in a derivation
• The goal of a bottom-up parser is to construct a derivation in reverse:
  E => T => T*F => T*id => F*id => id*id
Handle pruning
• A handle is a substring that matches the body of a production and whose reduction represents one step along the reverse of a rightmost derivation

Right sentential form | Handle | Reducing production
id*id                 | id     | F -> id
F*id                  | F      | T -> F
T*id                  | id     | F -> id
T*F                   | T*F    | T -> T*F
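Handle pruning can be replayed on id*id. This sketch is given the handles rather than finding them. It applies the reductions F -> id, T -> F, F -> id, T -> T*F and finally E -> T, which is the derivation E => T => T*F => T*id => F*id => id*id in reverse. The names `prune`, `STEPS` and `replay` are hypothetical.

```python
def prune(form, handle_start, body, head):
    """Replace the handle `body`, located at `handle_start` in `form`, by `head`."""
    end = handle_start + len(body)
    if form[handle_start:end] != body:
        raise ValueError("handle does not match the production body")
    return form[:handle_start] + [head] + form[end:]


# (position of the handle, production body, production head) for id*id
STEPS = [(0, ["id"], "F"), (0, ["F"], "T"), (2, ["id"], "F"),
         (0, ["T", "*", "F"], "T"), (0, ["T"], "E")]


def replay(tokens):
    """Right-sentential forms visited while pruning handles, ending at the start symbol."""
    forms = [list(tokens)]
    for start, body, head in STEPS:
        forms.append(prune(forms[-1], start, body, head))
    return ["".join(f) for f in forms]


print(replay(["id", "*", "id"]))
# prints: ['id*id', 'F*id', 'T*id', 'T*F', 'T', 'E']
```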
Reduce/reduce conflict
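stmt -> id(parameter_list)
stmt -> expr := expr
parameter_list -> parameter_list, parameter
parameter_list -> parameter
parameter -> id
expr -> id(expr_list)
expr -> id
expr_list -> expr_list, expr
expr_list -> expr

Stack: ... id ( id      Input: , id ) ... $
• With id on top of the stack, the parser cannot tell whether to reduce by parameter -> id or by expr -> id.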
LR Parsing
• The most prevalent type of bottom-up parser
• LR(k); we are mostly interested in parsers with k <= 1
• Why LR parsers?
  - Table driven
  - Can be constructed to recognize virtually all programming language constructs for which context-free grammars can be written
  - Most general non-backtracking shift-reduce parsing method
  - Can detect a syntactic error as soon as it is possible to do so
  - The class of grammars for which we can construct LR parsers is a proper superset of those for which we can construct LL parsers
States of an LR parser
• States represent sets of items
• An LR(0) item of G is a production of G with a dot at some position of the body
• closure(I) of a set of items I is constructed from I by the following rules:
  - Add every item in I to closure(I)
  - If A -> α.Bβ is in closure(I) and B -> γ is a production, then add the item B -> .γ to closure(I), if it is not already there
• I0 = closure({[E' -> .E]})
• Example:
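[Figure: part of the LR(0) automaton: on T, I0 goes to I2 = {E -> T., T -> T.*F}; on (, it goes to I4 = {F -> (.E), E -> .E+T, E -> .T, T -> .T*F, T -> .F, F -> .(E), F -> .id}]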
Closure algorithm
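SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for (each item A -> α.Bβ in J)
            for (each production B -> γ of G)
                if (B -> .γ is not in J)
                    add B -> .γ to J;
    until no more items are added to J on one round;
    return J;
}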
GOTO algorithm
SetOfItems GOTO(I, X) {
    J = empty;
    for (each item [A -> α.Xβ] in I)
        add CLOSURE({[A -> αX.β]}) to J;
    return J;
}
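CLOSURE and GOTO for LR(0) items can be sketched in Python. Each item is encoded as (head, body, dot position), an encoding chosen here for illustration. The grammar is the augmented expression grammar E' -> E, E -> E + T | T, T -> T * F | F, F -> (E) | id, and the function names mirror the pseudocode.

```python
# Augmented expression grammar as (head, body) pairs.
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {head for head, _ in GRAMMAR}


def closure(items):
    """LR(0) closure: for each A -> α.Bβ, add B -> .γ for every B-production."""
    J, work = set(items), list(items)
    while work:                                   # worklist instead of repeated rounds
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, gamma in GRAMMAR:
                if h == body[dot] and (h, gamma, 0) not in J:
                    J.add((h, gamma, 0))
                    work.append((h, gamma, 0))
    return frozenset(J)


def goto(I, X):
    """Move the dot over X in every item A -> α.Xβ of I, then close."""
    return closure({(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X})


I0 = closure({("E'", ("E",), 0)})
print(len(I0))  # prints: 7
```

Here I0 has seven items, GOTO(I0, E) gives {E' -> E., E -> E.+T}, and GOTO(I0, +) is empty.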
Example
I0 = closure({[E' -> .E]}):
  E' -> .E
  E -> .E+T
  E -> .T
  T -> .T*F
  T -> .F
  F -> .(E)
  F -> .id
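[Figure: LR(0) automaton for the expression grammar. Recoverable states: I1 = {E' -> E., E -> E.+T} (accept on $); I2 = {E -> T., T -> T.*F}; I3 = {T -> F.}; I4 = {F -> (.E), E -> .E+T, E -> .T, T -> .T*F, T -> .F, F -> .(E), F -> .id}; I5 = {F -> id.}; I8 = {E -> E.+T, F -> (E.)}; I11 = {F -> (E).}]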
LR-Parsing model
[Figure: LR parsing model: input a1 ... ai ... an $; stack sm sm-1 ... $; the LR parsing program consults the ACTION and GOTO tables and produces the output]
LR parsing algorithm
let a be the first symbol of w$;
while (1) { /* repeat forever */
    let s be the state on top of the stack;
    if (ACTION[s, a] = shift t) {
        push t onto the stack;
        let a be the next input symbol;
    } else if (ACTION[s, a] = reduce A -> β) {
        pop |β| symbols off the stack;
        let state t now be on top of the stack;
        push GOTO[t, A] onto the stack;
        output the production A -> β;
    } else if (ACTION[s, a] = accept) break; /* parsing is done */
    else call error-recovery routine;
}
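This Python sketch of the LR driver uses the SLR table for the expression grammar, with the production numbering (1) to (6) shown alongside the table. ACTION entries are encoded as ('s', j), ('r', k) or ('acc',), an encoding chosen here. Like the pseudocode, it keeps only states on the stack. It raises SyntaxError where the pseudocode would call error recovery. `lr_parse` is a hypothetical name.

```python
# Productions (1)-(6) of the expression grammar: number -> (head, body)
PRODUCTIONS = {1: ("E", ["E", "+", "T"]), 2: ("E", ["T"]), 3: ("T", ["T", "*", "F"]),
               4: ("T", ["F"]), 5: ("F", ["(", "E", ")"]), 6: ("F", ["id"])}

ACTION = {}
for s in (0, 4, 6, 7):                       # states that shift on id and (
    ACTION[(s, "id")] = ("s", 5)
    ACTION[(s, "(")] = ("s", 4)
ACTION.update({(1, "+"): ("s", 6), (1, "$"): ("acc",), (2, "*"): ("s", 7),
               (8, "+"): ("s", 6), (8, ")"): ("s", 11), (9, "*"): ("s", 7)})
for state, prod, lookaheads in [(2, 2, "+)$"), (3, 4, "+*)$"), (5, 6, "+*)$"),
                                (9, 1, "+)$"), (10, 3, "+*)$"), (11, 5, "+*)$")]:
    for a in lookaheads:                     # one reduce entry per lookahead terminal
        ACTION[(state, a)] = ("r", prod)

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3, (4, "E"): 8, (4, "T"): 2,
        (4, "F"): 3, (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    tokens = list(tokens) + ["$"]
    stack, pos, moves = [0], 0, []
    while True:
        s, a = stack[-1], tokens[pos]
        act = ACTION.get((s, a))
        if act is None:
            raise SyntaxError(f"no ACTION[{s}, {a!r}]")
        if act[0] == "s":                    # shift: push the state, advance the input
            stack.append(act[1])
            pos += 1
            moves.append(f"shift to {act[1]}")
        elif act[0] == "r":                  # reduce: pop |body| states, then push GOTO
            head, body = PRODUCTIONS[act[1]]
            del stack[len(stack) - len(body):]
            stack.append(GOTO[(stack[-1], head)])
            moves.append(f"reduce by {head} -> {''.join(body)}")
        else:                                # accept
            moves.append("accept")
            return moves


for move in lr_parse(["id", "*", "id", "+", "id"]):
    print(move)
```

For id*id+id it produces the same fourteen actions as the trace of moves, from "shift to 5" through "accept".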
Example
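      |                ACTION                 || GOTO
STATE | id | +  | *  | (  | )   | $   || E | T | F
0     | s5 |    |    | s4 |     |     || 1 | 2 | 3
1     |    | s6 |    |    |     | acc ||   |   |
2     |    | r2 | s7 |    | r2  | r2  ||   |   |
3     |    | r4 | r4 |    | r4  | r4  ||   |   |
4     | s5 |    |    | s4 |     |     || 8 | 2 | 3
5     |    | r6 | r6 |    | r6  | r6  ||   |   |
6     | s5 |    |    | s4 |     |     ||   | 9 | 3
7     | s5 |    |    | s4 |     |     ||   |   | 10
8     |    | s6 |    |    | s11 |     ||   |   |
9     |    | r1 | s7 |    | r1  | r1  ||   |   |
10    |    | r3 | r3 |    | r3  | r3  ||   |   |
11    |    | r5 | r5 |    | r5  | r5  ||   |   |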
(0) E' -> E
(1) E -> E + T
(2) E -> T
(3) T -> T * F
(4) T -> F
(5) F -> (E)
(6) F -> id
Moves of an LR parser on id*id+id:

Line | Stack    | Symbols | Input     | Action
(1)  | 0        |         | id*id+id$ | shift to 5
(2)  | 0 5      | id      | *id+id$   | reduce by F -> id
(3)  | 0 3      | F       | *id+id$   | reduce by T -> F
(4)  | 0 2      | T       | *id+id$   | shift to 7
(5)  | 0 2 7    | T*      | id+id$    | shift to 5
(6)  | 0 2 7 5  | T*id    | +id$      | reduce by F -> id
(7)  | 0 2 7 10 | T*F     | +id$      | reduce by T -> T*F
(8)  | 0 2      | T       | +id$      | reduce by E -> T
(9)  | 0 1      | E       | +id$      | shift to 6
(10) | 0 1 6    | E+      | id$       | shift to 5
(11) | 0 1 6 5  | E+id    | $         | reduce by F -> id
(12) | 0 1 6 3  | E+F     | $         | reduce by T -> F
(13) | 0 1 6 9  | E+T     | $         | reduce by E -> E+T
(14) | 0 1      | E       | $         | accept
• If [A -> α.aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j"
• If [A -> α.] is in Ii (with A ≠ S'), then set ACTION[i, a] to "reduce A -> α" for all a in Follow(A)
• If [S' -> S.] is in Ii, then set ACTION[i, $] to "accept"
• If any conflicting actions result from the rules above, the grammar is not SLR(1)
• If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
• All entries not defined by the above rules are made "error"
• The initial state of the parser is the one constructed from the set of items containing [S' -> .S]
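The SLR rules can be combined with CLOSURE and GOTO into a small table builder. This sketch hard-codes Follow(E) = {+, ), $} and Follow(T) = Follow(F) = {+, *, ), $} for the augmented expression grammar. It numbers states in breadth-first order, so the numbers need not match the slide's table, and it raises an error on any conflict. Names are hypothetical.

```python
# Augmented expression grammar; an LR(0) item is (head, body, dot).
GRAMMAR = [("E'", ("E",)), ("E", ("E", "+", "T")), ("E", ("T",)),
           ("T", ("T", "*", "F")), ("T", ("F",)), ("F", ("(", "E", ")")), ("F", ("id",))]
NONTERMINALS = {"E'", "E", "T", "F"}
SYMBOLS = sorted({s for _, body in GRAMMAR for s in body})
FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"}, "F": {"+", "*", ")", "$"}}


def closure(items):
    J, work = set(items), list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for h, gamma in GRAMMAR:
                if h == body[dot] and (h, gamma, 0) not in J:
                    J.add((h, gamma, 0))
                    work.append((h, gamma, 0))
    return frozenset(J)


def goto(I, X):
    return closure({(h, b, d + 1) for h, b, d in I if d < len(b) and b[d] == X})


def build_slr_table():
    states = [closure({("E'", ("E",), 0)})]
    index = {states[0]: 0}
    k = 0
    while k < len(states):            # breadth-first numbering; I0 stays state 0
        for X in SYMBOLS:
            J = goto(states[k], X)
            if J and J not in index:
                index[J] = len(states)
                states.append(J)
        k += 1
    action, goto_table = {}, {}

    def set_action(i, a, value):
        if action.get((i, a), value) != value:
            raise ValueError(f"not SLR(1): conflict in ACTION[{i}, {a}]")
        action[(i, a)] = value

    for i, I in enumerate(states):
        for head, body, dot in I:
            if dot < len(body):
                X = body[dot]
                j = index[goto(I, X)]
                if X in NONTERMINALS:
                    goto_table[(i, X)] = j                        # GOTO[i, A] = j
                else:
                    set_action(i, X, ("shift", j))                # [A -> α.aβ]
            elif head == "E'":
                set_action(i, "$", ("accept",))                   # [E' -> E.]
            else:
                for a in FOLLOW[head]:                            # [A -> α.], a in Follow(A)
                    set_action(i, a, ("reduce", head, body))
    return states, action, goto_table


states, action, goto_table = build_slr_table()
print(len(states), len(action), len(goto_table))  # prints: 12 36 9
```

It finds 12 states, 36 ACTION entries and 9 GOTO entries, the same counts as the SLR table for this grammar, with no conflicts.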
Example
S' -> S
S -> CC
C -> cC | d
• If [A -> α.aβ, b] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j"
• If [A -> α., a] is in Ii (with A ≠ S'), then set ACTION[i, a] to "reduce A -> α"
• If [S' -> S., $] is in Ii, then set ACTION[i, $] to "accept"
• If any conflicting actions result from the rules above, the grammar is not LR(1)
• If GOTO(Ii, A) = Ij, then GOTO[i, A] = j
• All entries not defined by the above rules are made "error"
• The initial state of the parser is the one constructed from the set of items containing [S' -> .S, $]
Example
S' -> S
S -> CC
C -> cC | d
[Figure: canonical LR(1) item sets and the ACTION/GOTO table for this grammar; recoverable detail: I7 = {[C -> d., $]}]
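For the grammar S' -> S, S -> CC, C -> cC | d, the canonical LR(1) construction can be sketched as below. Each item carries a lookahead, encoded as (head, body, dot, lookahead). No symbol of this grammar derives ε, so FIRST(βa) is simply FIRST of the first symbol of βa. That shortcut does not hold for grammars with ε-productions. Names are hypothetical.

```python
# Augmented grammar: S' -> S, S -> CC, C -> cC | d
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
SYMBOLS = ["S", "C", "c", "d"]
FIRST = {"S'": {"c", "d"}, "S": {"c", "d"}, "C": {"c", "d"}}


def first_of_seq(seq):
    X = seq[0]     # never empty: it always ends with a lookahead; no symbol derives ε
    return FIRST[X] if X in NONTERMINALS else {X}


def closure(items):
    """LR(1) closure: for [A -> α.Bβ, a], add [B -> .γ, b] for each b in FIRST(βa)."""
    J, work = set(items), list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for b in first_of_seq(body[dot + 1:] + (la,)):
                for h, gamma in GRAMMAR:
                    item = (h, gamma, 0, b)
                    if h == body[dot] and item not in J:
                        J.add(item)
                        work.append(item)
    return frozenset(J)


def goto(I, X):
    return closure({(h, b, d + 1, la) for h, b, d, la in I if d < len(b) and b[d] == X})


def build_lr1_table():
    states = [closure({("S'", ("S",), 0, "$")})]
    index = {states[0]: 0}
    k = 0
    while k < len(states):            # breadth-first, so I0 keeps number 0
        for X in SYMBOLS:
            J = goto(states[k], X)
            if J and J not in index:
                index[J] = len(states)
                states.append(J)
        k += 1
    action, goto_table = {}, {}

    def set_action(i, a, value):
        if action.get((i, a), value) != value:
            raise ValueError(f"not LR(1): conflict in ACTION[{i}, {a}]")
        action[(i, a)] = value

    for i, I in enumerate(states):
        for head, body, dot, la in I:
            if dot < len(body):
                X = body[dot]
                j = index[goto(I, X)]
                if X in NONTERMINALS:
                    goto_table[(i, X)] = j                       # GOTO[i, A] = j
                else:
                    set_action(i, X, ("shift", j))               # [A -> α.aβ, b]
            elif head == "S'":
                set_action(i, "$", ("accept",))                  # [S' -> S., $]
            else:
                set_action(i, la, ("reduce", head, body))        # [A -> α., a]
    return states, action, goto_table


states, action, goto_table = build_lr1_table()
print(len(states))  # prints: 10
```

It builds 10 states, including {[C -> d., $]} (I7 in the figure), and fills 16 ACTION and 5 GOTO entries without conflicts.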
Readings
• Chapter 4 of the book