Parsing

Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Principles
Lecture 3
Context-Free Grammars
Definition: A context-free grammar (CFG)
is a quadruple G = (Φ, Σ, P, S), where all
productions are of the form A → α, for A ∈ Φ
and α ∈ (Σ ∪ Φ)*.

Re-writing using grammar rules:

A => α if A → α (derivation).
String Derivations

Left-most derivation: At each step, the
left-most nonterminal is re-written.

Right-most derivation: At each step, the
right-most nonterminal is re-written.
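
The two derivation orders can be sketched in a few lines of Python. This is only an illustration: the toy grammar (E → E + T | T, T → i) and the helper names `rewrite` and `derive` are assumptions made for the example, not part of the lecture.

```python
# Hypothetical toy grammar: E -> E + T | T,  T -> i
# Nonterminals are the dict keys; every other symbol is a terminal.
GRAMMAR = {"E": [["E", "+", "T"], ["T"]], "T": [["i"]]}

def rewrite(form, rhs, leftmost=True):
    """Replace the left-most (or right-most) nonterminal in `form` with `rhs`."""
    positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
    pos = positions[0] if leftmost else positions[-1]
    assert rhs in GRAMMAR[form[pos]], "rhs must be a production of that nonterminal"
    return form[:pos] + rhs + form[pos + 1:]

def derive(choices, leftmost=True):
    """Apply a sequence of right-hand sides, starting from the start symbol E."""
    form, steps = ["E"], ["E"]
    for rhs in choices:
        form = rewrite(form, rhs, leftmost)
        steps.append(" ".join(form))
    return steps

# Both derivations produce the sentence "i + i", in different orders.
left = derive([["E", "+", "T"], ["T"], ["i"], ["i"]], leftmost=True)
right = derive([["E", "+", "T"], ["i"], ["T"], ["i"]], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the same sentence; only the order in which the nonterminals are re-written differs.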


Derivation Trees
Derivation trees:
Describe re-writes, independently of the
order (left-most or right-most).

Each tree branch matches a production
rule in the grammar.

Derivation Trees
Notes:
1) Leaves are terminals.
2) Bottom contour is the sentence.
3) Left recursion causes left branching.
4) Right recursion causes right branching.
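
As a rough illustration of notes 2 and 3, the sketch below stores a derivation tree as nested tuples and reads off its bottom contour. The tree shown is for "i + i + i" under an assumed left-recursive grammar (E → E + T | T, T → i); the names `frontier` and `spine` are hypothetical.

```python
# A derivation tree as nested tuples: (symbol, child, child, ...).
# A leaf is a tuple with no children.
# Assumed grammar for this sketch: E -> E + T | T,  T -> i.
tree = ("E",
        ("E",
         ("E", ("T", ("i",))),
         ("+",),
         ("T", ("i",))),
        ("+",),
        ("T", ("i",)))

def frontier(node):
    """Collect the leaves left to right: the bottom contour of the tree."""
    symbol, *children = node
    if not children:
        return [symbol]
    return [leaf for child in children for leaf in frontier(child)]

def spine(node, index):
    """Number of nodes on the path that always follows child `index` (0 or -1)."""
    symbol, *children = node
    return 1 + spine(children[index], index) if children else 1

print(" ".join(frontier(tree)))        # the sentence
print(spine(tree, 0), spine(tree, -1)) # left spine vs. right spine
```

The left spine is longer than the right one, which is the left branching that left recursion produces.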
Goal of Parsing
Examine input string, determine whether
it's legal.
Equivalent to building derivation tree.
Added benefit: tree embodies syntactic
structure of input.
Therefore, tree should be unique.
Ambiguous Grammars
Definition: A CFG is ambiguous if there
exist two different right-most (or left-
most, but not both) derivations for some
sentence z.

(Equivalent) Definition: A CFG is
ambiguous if there exist two different
derivation trees for some sentence z.
Ambiguous Grammars
Classic ambiguities:

Simultaneous left/right recursion:
E → E + E
  → i

Dangling else problem:
S → if E then S
  → if E then S else S
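
For the first grammar (E → E + E | i), ambiguity can be made concrete by counting derivation trees. The function below is a minimal sketch under that grammar only; `count_trees` is a hypothetical name, and the approach (try every + as the root operator) is one simple way to count, not a general ambiguity test.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count the derivation trees for `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        for k in range(lo + 1, hi - 1):
            if tokens[k] == "+":      # try this + as the root of the tree
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i+i"))     # one tree
print(count_trees("i+i+i"))   # more than one tree: the grammar is ambiguous
```

Any sentence with a count above one witnesses the ambiguity: "i+i+i" can be grouped as (i+i)+i or as i+(i+i).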

Operator Precedence and
Associativity
Let's build a CFG for expressions consisting of:

elementary identifier i.
+ and - (binary ops) have lowest
precedence, and are left associative.
* and / (binary ops) have middle
precedence, and are right associative.
+ and - (unary ops) have highest
precedence, and are right associative.
Corresponding Grammar for Expressions
E → E + T        E consists of T's,
  → E - T        separated by -'s and +'s
  → T            (lowest precedence).
T → F * T        T consists of F's,
  → F / T        separated by *'s and /'s
  → F            (next precedence).
F → - F          F consists of a single P,
  → + F          preceded by +'s and -'s
  → P            (next precedence).
P → '(' E ')'    P consists of a parenthesized E,
  → i            or a single i (highest precedence).
Operator Precedence and Associativity
Operator precedence:
The lower in the grammar, the higher the
precedence.
Operator Associativity:
Tie breaker for precedence.
Left recursion in the grammar means
left associativity of the operator,
left branching in the tree.
Right recursion in the grammar means
right associativity of the operator,
right branching in the tree.

Building Derivation Trees

Sample Input :
- + i - i * ( i + i ) / i + i

(Human) derivation tree construction:

Bottom-up.
On each pass, scan entire expression,
process operators with highest precedence
(parentheses are highest).
Lowest precedence operators are last, at
the top of tree.
Abstract Syntax Trees
AST is a condensed version of the
derivation tree.
No noise (intermediate nodes).
String-to-tree transduction grammar:
rules of the form A → α => 's'.
Build 's' tree node, with one child per tree
from each nonterminal in α.
Example
E → E + T        => +
  → E - T        => -
  → T
T → F * T        => *
  → F / T        => /
  → F
F → - F          => neg
  → + F          => +
  → P
P → '(' E ')'
  → i            => i


Sample Input : - + i - i * ( i + i ) / i + i
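
One way to see the transduction at work is a small hand-written parser that returns the AST as nested tuples. This is a sketch, not the lecture's method: it assumes whitespace-separated tokens, represents the leaf i as the string "i", and replaces the left recursion in E with a loop (a common recursive-descent workaround that still yields left-associative trees). All function names are hypothetical.

```python
def parse(text):
    """Build an AST (nested tuples) following the transduction rules above."""
    tokens = text.split()          # assumption: tokens are separated by spaces
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def E():                       # E -> E + T | E - T | T   (left associative)
        node = T()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, T())
        return node

    def T():                       # T -> F * T | F / T | F   (right associative)
        left = F()
        if peek() in ("*", "/"):
            op = take()
            return (op, left, T())
        return left

    def F():                       # F -> - F => neg | + F => + | P
        if peek() == "-":
            take()
            return ("neg", F())
        if peek() == "+":
            take()
            return ("+", F())
        return P()

    def P():                       # P -> '(' E ')' | i => i
        if peek() == "(":
            take("(")
            node = E()
            take(")")
            return node            # parentheses leave no node in the AST
        take("i")
        return "i"

    tree = E()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree

print(parse("- + i - i * ( i + i ) / i + i"))
```

The lowest-precedence operator ends up at the root, and the P → '(' E ')' and chain rules (E → T, T → F, F → P) contribute no nodes, which is what makes the AST a condensed derivation tree.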
String-to-Tree Transduction

We transduce from vocabulary of input
symbols, to vocabulary of tree node names.

Could eliminate construction of unary +
node, anticipating semantics.

F → - F          => neg
  → + F          // no more unary + node
  → P

The Game of Syntactic Dominoes
The grammar:

E → E+T      T → P*T      P → (E)
  → T          → P          → i

The playing pieces: An arbitrary supply of
each piece (one per grammar rule).

The game board:
Start domino at the top.
Bottom dominoes are the "input."
The Game of Syntactic Dominoes
Game rules:
Add game pieces to the board.
Match the flat parts and the symbols.
Lines are infinitely elastic.

Object of the game:
Connect start domino with the input
dominoes.
Leave no unmatched flat parts.

Parsing Strategies
Same as for the game of syntactic dominoes.

Top-down parsing: start at the start
symbol, work toward the input string.
Bottom-up parsing: start at the input
string, work towards the goal symbol.

In either strategy, the input can be processed
left-to-right or right-to-left.
Top-Down Parsing
Attempt a left-most derivation, by predicting
the re-write that will match the remaining
input.

Use a string (a stack, really) from which the
input can be derived.
Top-Down Parsing
Start with S on the stack.
At every step, two alternatives:
1) The stack begins with a terminal t.
Match t against the first input symbol.
2) The stack begins with a nonterminal A.
Consult an OPF (Omniscient Parsing Function)
to determine which production for A would
lead to a match with the first symbol of
the input.
The OPF does the predicting in such a
predictive parser.
Classical Top-Down Parsing
Algorithm
Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ
    then if Top(Stack) = Head(input)
           then input := tail(input)
                Pop(Stack)
           else error (Stack, input)
    else P := OPF (Stack, input)
         Push (Pop(Stack), RHS(P))
od
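
The algorithm above translates almost line for line into Python. The sketch below is one possible rendering under stated assumptions: the OPF is passed in as an ordinary function, the final check for left-over input is an addition not present in the pseudocode, and `demo_opf` is a hand-written oracle for an assumed toy grammar (S → A, A → bAd | empty).

```python
def top_down_parse(tokens, start, terminals, opf):
    """Classical top-down parse; returns the productions used, in order.

    `opf(stack, rest_of_input)` must return a pair (lhs, rhs) for the
    nonterminal currently on top of the stack.
    """
    stack = [start]                 # the top of the stack is the end of the list
    pos = 0
    used = []
    while stack:
        top = stack[-1]
        if top in terminals:
            if pos < len(tokens) and tokens[pos] == top:
                pos += 1            # input := tail(input)
                stack.pop()
            else:
                raise SyntaxError(f"expected {top!r} at position {pos}")
        else:
            lhs, rhs = opf(stack, tokens[pos:])
            stack.pop()
            stack.extend(reversed(rhs))   # left-most RHS symbol ends up on top
            used.append((lhs, rhs))
    if pos != len(tokens):          # added check: the whole input must be consumed
        raise SyntaxError(f"unexpected input at position {pos}")
    return used

def demo_opf(stack, rest):
    """Hypothetical oracle for S -> A, A -> b A d | (empty)."""
    top = stack[-1]
    nxt = rest[0] if rest else None       # None stands for end of input
    if top == "S":
        return ("S", ["A"])
    if top == "A":
        return ("A", ["b", "A", "d"]) if nxt == "b" else ("A", [])
    raise SyntaxError(f"no production for {top!r}")

print(top_down_parse(list("bbdd"), "S", {"b", "d"}, demo_opf))
```

The list of productions returned is a left-most derivation of the input, since the nonterminal on top of the stack is always the left-most one still to be re-written.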

Top-Down Parsing
Most parsing methods impose bounds on
the amount of stack lookback and input
lookahead. For programming languages,
a common choice is (1,1).

We must define OPF (A,t), where A is the
top element of the stack, and t is the first
symbol on the input.

Storage requirements: O(n²), where n is
the size of the grammar vocabulary
(a few hundred).
LL(1) Grammars
Definition:
A CFG G is LL(1) (Left-to-right, Left-most,
one-symbol lookahead)
iff for all A ∈ Φ, and for all A → α, A → β with α ≠ β,

Select (A → α) ∩ Select (A → β) = ∅

Previous example: Grammar is not LL(1).
More later on why, and what to do about it.
Example:
S → A          {b, ⊥}
A → bAd        {b}
  → ε          {d, ⊥}

Disjoint!

Grammar is LL(1)!

Parse table (⊥ marks the end of the input):

       d         b           ⊥
S                S → A       S → A
A      A → ε     A → bAd     A → ε

(At most) one production per entry.
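
The Select sets and the table for this example can also be computed mechanically. The sketch below uses the usual First/Follow fixed-point construction, which is an assumption here since the slides defer the definition of Select; the names `select_sets` and `build_table`, the tuple encoding of productions, and the spelling of the end marker are all hypothetical.

```python
END = "⊥"   # end-of-input marker (spelling assumed for this sketch)

def select_sets(grammar, start):
    """Return {(lhs, rhs): Select set} for grammar = {lhs: [rhs tuples]}."""
    nonterminals = set(grammar)
    nullable = set()
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    follow[start].add(END)

    def first_of(symbols):
        """First set of a symbol string, and whether the string can vanish."""
        out = set()
        for sym in symbols:
            if sym not in nonterminals:
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                before = (len(first[lhs]), len(nullable))
                f, vanishes = first_of(rhs)
                first[lhs] |= f
                if vanishes:
                    nullable.add(lhs)
                for k, sym in enumerate(rhs):
                    if sym in nonterminals:
                        size = len(follow[sym])
                        f2, rest_vanishes = first_of(rhs[k + 1:])
                        follow[sym] |= f2
                        if rest_vanishes:
                            follow[sym] |= follow[lhs]
                        changed |= len(follow[sym]) != size
                changed |= before != (len(first[lhs]), len(nullable))

    result = {}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            f, vanishes = first_of(rhs)
            result[(lhs, rhs)] = f | (follow[lhs] if vanishes else set())
    return result

def build_table(select):
    """Return {(A, t): rhs}; raise ValueError if some entry needs two productions."""
    table = {}
    for (lhs, rhs), terminals in select.items():
        for t in terminals:
            if (lhs, t) in table:
                raise ValueError(f"not LL(1): conflict at ({lhs}, {t})")
            table[(lhs, t)] = rhs
    return table

example = {"S": [("A",)], "A": [("b", "A", "d"), ()]}   # () is the empty production
select = select_sets(example, "S")
table = build_table(select)
print(select)
print(table)
```

A left-recursive grammar such as E → E + T | T, T → i makes `build_table` raise, because both E productions have i in their Select sets.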