(Based on: Compilers: Principles, Techniques, and Tools, by Aho, Sethi and Ullman, 1986)
Compilers
A compiler is a program that reads a program written in one language (the source language) and translates it into another (the target language). A compiler operates in phases, each of which transforms the source program from one representation to another:

Source program → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator → Code Optimizer → Code Generator → Target program

The part of the compiler we will focus on in this part of the course is the Syntax Analyzer, or Parser.
Parsing
Parsing is the process of determining whether a string of tokens can be generated by a grammar. Most parsing methods fall into one of two classes, called the top-down and bottom-up methods. In top-down parsing, construction of the parse tree starts at the root and proceeds towards the leaves. In bottom-up parsing, construction starts at the leaves and proceeds towards the root. Efficient top-down parsers are easy to build by hand. Bottom-up parsers, however, can handle a larger class of grammars. They are harder to build by hand, but tools for generating them directly from a grammar are available.
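To make the idea of a hand-built top-down parser concrete, here is a minimal recursive-descent recognizer. It is a sketch under stated assumptions: the input is already a list of token strings, `"$"` marks the end of input, and the grammar is the non-left-recursive expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. The class and method names are hypothetical. Each method handles one nonterminal and chooses an alternative by looking only at the next token, so the sequence of calls traces the parse tree from the root down to the leaves.

```python
class RecursiveDescentParser:
    """Hypothetical top-down recognizer: one method per nonterminal."""

    def __init__(self, tokens):
        self.tokens = tokens + ["$"]   # "$" marks end of input (assumed convention)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume one token, which must be the expected terminal.
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        self.E()
        self.match("$")                # the whole input must be consumed
        return True

    def E(self):                       # E -> T E'
        self.T()
        self.E_prime()

    def E_prime(self):                 # E' -> + T E' | eps
        if self.peek() == "+":
            self.match("+")
            self.T()
            self.E_prime()             # otherwise choose eps: consume nothing

    def T(self):                       # T -> F T'
        self.F()
        self.T_prime()

    def T_prime(self):                 # T' -> * F T' | eps
        if self.peek() == "*":
            self.match("*")
            self.F()
            self.T_prime()

    def F(self):                       # F -> ( E ) | id
        if self.peek() == "(":
            self.match("(")
            self.E()
            self.match(")")
        else:
            self.match("id")
```

For example, `RecursiveDescentParser(["id", "+", "id", "*", "id"]).parse()` returns `True`, while `["id", "+"]` raises `SyntaxError` because `F` finds `$` where it expects `id`.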
LL(1) Grammars
Left-Recursive Grammars II
The general procedure for removing direct left recursion (recursion that occurs in a single rule) is the following:
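Group the A-rules as
$$A \to A\alpha_1 \mid \cdots \mid A\alpha_m \mid \beta_1 \mid \beta_2 \mid \cdots \mid \beta_n$$
where none of the $\beta_i$ begins with $A$. Replace the original A-rules with
$$A \to \beta_1 A' \mid \beta_2 A' \mid \cdots \mid \beta_n A'$$
$$A' \to \alpha_1 A' \mid \alpha_2 A' \mid \cdots \mid \alpha_m A' \mid \varepsilon$$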
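The procedure above can be sketched as a small function. This is an illustration, not a definitive implementation: it assumes each right-hand side is a list of symbol strings (the empty list stands for ε) and names the new nonterminal by appending a prime to A, assuming that name is not already in use. The function name `remove_direct_left_recursion` is hypothetical.

```python
def remove_direct_left_recursion(A, alternatives):
    """Hypothetical helper. alternatives: list of RHS lists for A ([] means epsilon).
    Returns {nonterminal: alternatives} with A' introduced if A was left-recursive."""
    A_prime = A + "'"                                  # assumed fresh name
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == A]   # A -> A alpha_i
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != A]     # A -> beta_j
    if not alphas:
        return {A: alternatives}                       # no direct left recursion
    return {
        A: [beta + [A_prime] for beta in betas],                  # A  -> beta_j A'
        A_prime: [alpha + [A_prime] for alpha in alphas] + [[]],  # A' -> alpha_i A' | eps
    }
```

Applied to E → E + T | T, the sketch returns E → T E' and E' → + T E' | ε.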
This procedure will not eliminate indirect left recursion of the kind:
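$$A \to BaA \qquad B \to Ab$$
Here $A \Rightarrow BaA \Rightarrow AbaA$, so $A$ is left-recursive through $B$ even though no rule for $A$ begins with $A$. [Another procedure exists that is not given here]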
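Direct or indirect left recursion is problematic for all top-down parsers: a parser that expands A by a rule whose right side (directly or indirectly) begins with A can keep doing so forever without consuming any input. However, it is not a problem for bottom-up parsing algorithms.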
Left-Factoring a Grammar I
Left recursion is not the only property that prevents predictive top-down parsing. Another is whether the parser can always choose the correct right-hand side on the basis of the next token of input, using only the first token generated by the leftmost nonterminal in the current derivation. To ensure that this is possible, we need to left-factor the non-left-recursive grammar generated in the previous step.
Left-Factoring a Grammar II
Here is the procedure used to left-factor a grammar:
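For each nonterminal $A$, find the longest prefix $\alpha$ common to two or more of its alternatives. If $\alpha \neq \varepsilon$, replace all the A-productions
$$A \to \alpha\beta_1 \mid \alpha\beta_2 \mid \cdots \mid \alpha\beta_n \mid \gamma$$
(where $\gamma$ represents all alternatives that do not begin with $\alpha$) by
$$A \to \alpha A' \mid \gamma$$
$$A' \to \beta_1 \mid \beta_2 \mid \cdots \mid \beta_n$$
Repeat this transformation until no two alternatives for a nonterminal have a common prefix.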
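The left-factoring loop can be sketched as follows, under the assumption that a grammar is a dictionary from nonterminal names to lists of right-hand sides, each a list of symbol strings (`[]` is ε). The names `common_prefix` and `left_factor` and the priming scheme for fresh nonterminals (A', A'', ...) are hypothetical choices made for this illustration.

```python
def common_prefix(x, y):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return x[:n]


def left_factor(grammar):
    """Repeatedly factor out the longest prefix shared by two or more alternatives."""
    grammar = {A: [list(r) for r in rhss] for A, rhss in grammar.items()}
    changed = True
    while changed:
        changed = False
        for A in list(grammar):               # copy: new nonterminals may be added
            alts = grammar[A]
            alpha = []                        # longest prefix common to >= 2 alternatives
            for i in range(len(alts)):
                for j in range(i + 1, len(alts)):
                    p = common_prefix(alts[i], alts[j])
                    if len(p) > len(alpha):
                        alpha = p
            if not alpha:                     # alpha = eps: nothing to factor
                continue
            new = A + "'"
            while new in grammar:             # pick an unused name A', A'', ...
                new += "'"
            k = len(alpha)
            betas = [r[k:] for r in alts if r[:k] == alpha]
            gammas = [r for r in alts if r[:k] != alpha]
            grammar[A] = [alpha + [new]] + gammas     # A  -> alpha A' | gamma
            grammar[new] = betas                      # A' -> beta_1 | ... | beta_n
            changed = True
    return grammar
```

On the dangling-else grammar S → i E t S | i E t S e S | a, E → b, the sketch produces S → i E t S S' | a and S' → ε | e S.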
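Grammar:
E  → T E'
E' → + T E' | ε
T  → F T'
T' → * F T' | ε
F  → ( E ) | id

Parsing Table (- marks an error entry):
             id          +           *           (           )           $
E            E → TE'     -           -           E → TE'     -           -
E'           -           E' → +TE'   -           -           E' → ε      E' → ε
T            T → FT'     -           -           T → FT'     -           -
T'           -           T' → ε      T' → *FT'   -           T' → ε      T' → ε
F            F → id      -           -           F → (E)     -           -

Algorithm Trace (input id+id*id):
Stack        Input        Output
$E           id+id*id$
$E'T         id+id*id$    E → TE'
$E'T'F       id+id*id$    T → FT'
$E'T'id      id+id*id$    F → id
$E'T'        +id*id$
$E'          +id*id$      T' → ε
$E'T+        +id*id$      E' → +TE'
$E'T         id*id$
$E'T'F       id*id$       T → FT'
$E'T'id      id*id$       F → id
$E'T'        *id$
$E'T'F*      *id$         T' → *FT'
$E'T'F       id$
$E'T'id      id$          F → id
$E'T'        $
$E'          $            T' → ε
$            $            E' → ε

First(E') = {+, ε}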
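The trace above follows the usual table-driven predictive parsing loop: if the top of the stack is a terminal, it must match the next input token; if it is a nonterminal A, the table entry M[A, a] for the current token a says which right-hand side replaces A (pushed in reverse, so its leftmost symbol ends up on top). Below is a minimal sketch of that loop, assuming tokens arrive as a list of strings and the parsing table for the expression grammar is hard-coded as a dictionary; `ll1_parse`, `TABLE`, and `NONTERMINALS` are hypothetical names.

```python
# Table-driven LL(1) parser for the expression grammar
#   E -> T E'    E' -> + T E' | eps    T -> F T'    T' -> * F T' | eps    F -> ( E ) | id
# Input: a list of token strings; "$" is appended as the end marker.

EPSILON_RHS = []  # an epsilon production has an empty right-hand side

# M[A, a]: right-hand side to use when A is on top of the stack and a is the next token
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): EPSILON_RHS, ("E'", "$"): EPSILON_RHS,
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): EPSILON_RHS, ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): EPSILON_RHS, ("T'", "$"): EPSILON_RHS,
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens):
    """Return the productions applied, in order; raise SyntaxError on an error entry."""
    stack = ["$", "E"]                 # end of the list is the top of the stack
    tokens = tokens + ["$"]
    pos = 0
    output = []
    while True:
        top, a = stack[-1], tokens[pos]
        if top == "$" and a == "$":
            return output              # stack and input both exhausted: accept
        if top not in NONTERMINALS:    # terminal (or $): it must match the input
            if top != a:
                raise SyntaxError(f"expected {top!r}, found {a!r}")
            stack.pop()
            pos += 1
        else:
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # leftmost RHS symbol ends up on top
            output.append(f"{top} -> {' '.join(rhs) or 'eps'}")
```

On `["id", "+", "id", "*", "id"]` it returns eleven productions, in the same order as the Output column of the trace, starting with `E -> T E'` and ending with `E' -> eps`.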
LL(1) Grammars
A grammar whose parsing table has no multiply-defined entries is said to be LL(1). No ambiguous or left-recursive grammar can be LL(1). A grammar G is LL(1) if and only if, whenever $A \to \alpha \mid \beta$ are two distinct productions of G, the following conditions hold:
1. For no terminal $a$ do both $\alpha$ and $\beta$ derive strings beginning with $a$.
2. At most one of $\alpha$ and $\beta$ can derive the empty string.
3. If $\beta$ can (directly or indirectly) derive $\varepsilon$, then $\alpha$ does not derive any string beginning with a terminal in Follow(A).
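These conditions amount to saying that the parsing table built from First and Follow sets has no multiply-defined entries. The sketch below computes First and Follow by iterating to a fixed point, places A → α in M[A, a] for every a in First(α), plus every a in Follow(A) when α can derive ε, and records any cell that receives two different productions. It assumes a grammar is a dictionary from nonterminals to lists of right-hand sides (lists of symbol strings, `[]` for ε), uses `"eps"` as a hypothetical marker for ε inside First sets and `"$"` as the end marker; all function names are illustrative.

```python
EPS = "eps"  # hypothetical marker for the empty string inside First sets


def first_of_seq(seq, first):
    """First set of a sequence of symbols, given the current First sets of nonterminals."""
    out = set()
    for X in seq:
        fx = first[X] if X in first else {X}   # a terminal's First set is itself
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)                               # every symbol can vanish (or seq is empty)
    return out


def first_follow(grammar, start):
    first = {A: set() for A in grammar}
    follow = {A: set() for A in grammar}
    follow[start].add("$")
    changed = True
    while changed:                             # iterate until nothing grows
        changed = False
        for A, alts in grammar.items():
            for rhs in alts:
                f = first_of_seq(rhs, first)
                if not f <= first[A]:
                    first[A] |= f
                    changed = True
                for i, B in enumerate(rhs):
                    if B not in grammar:       # only nonterminals have Follow sets
                        continue
                    trailer = first_of_seq(rhs[i + 1:], first)
                    new = trailer - {EPS}
                    if EPS in trailer:         # B can end A's expansion
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return first, follow


def ll1_table(grammar, start):
    """Return (table, conflicts); the grammar is LL(1) iff conflicts is empty."""
    first, follow = first_follow(grammar, start)
    table, conflicts = {}, []
    for A, alts in grammar.items():
        for rhs in alts:
            f = first_of_seq(rhs, first)
            targets = (f - {EPS}) | (follow[A] if EPS in f else set())
            for a in targets:
                if (A, a) in table and table[A, a] != rhs:
                    conflicts.append((A, a))   # multiply-defined entry
                table.setdefault((A, a), rhs)
    return table, conflicts
```

For the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id the conflict list is empty. For the left-recursive E → E + T | T, T → id, both alternatives of E land in M[E, id], which is exactly a multiply-defined entry.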
LR Parsing: Advantages
LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written. LR parsing is the most general non-backtracking shift-reduce parsing method known, yet it can be implemented as efficiently as other shift-reduce approaches. The class of grammars that can be parsed by an LR parser is a proper superset of the class that can be parsed by a predictive parser. An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
LR-Parsing: Drawback/Solution
The main drawback of LR parsing is that it is too much work to construct an LR parser by hand for a typical programming-language grammar. Fortunately, specialized tools to construct LR parsers automatically have been designed. With such tools, a user can write a context-free grammar and have a parser generator automatically produce a parser for that grammar. An example of such a tool is Yacc (Yet Another Compiler-Compiler).
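SLR parsing table for the grammar (1) E → E + T, (2) E → T, (3) T → T * F, (4) T → F, (5) F → (E), (6) F → id. In the ACTION part, si means shift and go to state i, rj means reduce by production j, acc means accept, and - means error; the GOTO part gives the state to enter after a reduction.

        ACTION                               GOTO
State   id    +     *     (     )     $      E    T    F
0       s5    -     -     s4    -     -      1    2    3
1       -     s6    -     -     -     acc    -    -    -
2       -     r2    s7    -     r2    r2     -    -    -
3       -     r4    r4    -     r4    r4     -    -    -
4       s5    -     -     s4    -     -      8    2    3
5       -     r6    r6    -     r6    r6     -    -    -
6       s5    -     -     s4    -     -      -    9    3
7       s5    -     -     s4    -     -      -    -    10
8       -     s6    -     -     s11   -      -    -    -
9       -     r1    s7    -     r1    r1     -    -    -
10      -     r3    r3    -     r3    r3     -    -    -
11      -     r5    r5    -     r5    r5     -    -    -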
Moves of the LR parser on input id * id + id:

      Stack                 Input           Action
(1)   0                     id * id + id $  Shift
(2)   0 id 5                * id + id $     Reduce by F → id
(3)   0 F 3                 * id + id $     Reduce by T → F
(4)   0 T 2                 * id + id $     Shift
(5)   0 T 2 * 7             id + id $       Shift
(6)   0 T 2 * 7 id 5        + id $          Reduce by F → id
(7)   0 T 2 * 7 F 10        + id $          Reduce by T → T * F
(8)   0 T 2                 + id $          Reduce by E → T
(9)   0 E 1                 + id $          Shift
(10)  0 E 1 + 6             id $            Shift
(11)  0 E 1 + 6 id 5        $               Reduce by F → id
(12)  0 E 1 + 6 F 3         $               Reduce by T → F
(13)  0 E 1 + 6 T 9         $               Reduce by E → E + T
(14)  0 E 1                 $               Accept
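The moves above follow the generic LR driver loop: look up ACTION[s, a] for the state s on top of the stack and the current token a; on a shift, push the new state; on a reduce by A → β, pop |β| states and push GOTO[s', A] for the state s' that is then exposed; stop on accept. The sketch below hard-codes the SLR table entries for the expression grammar as dictionaries (`ACTION`, `GOTO`, `PRODUCTIONS`, and `lr_parse` are hypothetical names) and keeps only states on the stack, since the grammar symbols shown in the trace are implied by them.

```python
# Productions numbered as in the table: number -> (left side, length of right side)
# 1 E->E+T  2 E->T  3 T->T*F  4 T->F  5 F->(E)  6 F->id
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {}
for s in (0, 4, 6, 7):                       # states that can start an F
    ACTION[s, "id"] = ("s", 5)
    ACTION[s, "("] = ("s", 4)
ACTION[1, "+"] = ("s", 6)
ACTION[1, "$"] = ("acc", None)
ACTION[8, "+"] = ("s", 6)
ACTION[8, ")"] = ("s", 11)
ACTION[2, "*"] = ("s", 7)
ACTION[9, "*"] = ("s", 7)
for a in ("+", ")", "$"):                    # Follow(E)
    ACTION[2, a] = ("r", 2)
    ACTION[9, a] = ("r", 1)
for a in ("+", "*", ")", "$"):               # Follow(T) = Follow(F)
    ACTION[3, a] = ("r", 4)
    ACTION[5, a] = ("r", 6)
    ACTION[10, a] = ("r", 3)
    ACTION[11, a] = ("r", 5)

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Return the numbers of the productions reduced by, in order."""
    stack = [0]                              # states only
    tokens = tokens + ["$"]
    pos = 0
    reductions = []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION.get((state, a))
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        kind, arg = act
        if kind == "s":                      # shift: push state, advance input
            stack.append(arg)
            pos += 1
        elif kind == "r":                    # reduce by production arg
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]             # every RHS here has length >= 1
            stack.append(GOTO[stack[-1], lhs])
            reductions.append(arg)
        else:                                # accept
            return reductions
```

On `["id", "*", "id", "+", "id"]` the sketch returns `[6, 4, 6, 3, 2, 6, 4, 1]`, the same sequence of reductions as in the trace.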
SLR Parsing
Definition: An LR(0) item of a grammar G is a production of G with a dot at some position of the right side. Example: the production A → XYZ yields the following four items:
A → .XYZ
A → X.YZ
A → XY.Z
A → XYZ.
The production A → ε generates only one item, A → . (the dot alone). Intuitively, an item indicates how much of a production we have seen at a given point in the parsing process.
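Enumerating the items of a production is mechanical: there is one item per possible dot position, so a right side of length n gives n + 1 items (and an ε-production gives exactly one). A small sketch, assuming an item is represented as a (left side, right side, dot index) triple; `lr0_items` and `show` are hypothetical helpers.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of one production; the dot index ranges over 0..len(rhs)."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with '.' marking the dot position."""
    lhs, rhs, dot = item
    return f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])}"
```

For A → XYZ this yields the four items `A -> . X Y Z`, `A -> X . Y Z`, `A -> X Y . Z`, and `A -> X Y Z .`; for A → ε it yields only `A -> .`.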
SLR Parsing
To create an SLR parsing table, we define three new elements:
1. An augmented grammar G' for G, the initial grammar. If S is the start symbol of G, we add a new start symbol S' and the production S' → S. The purpose of this new starting production is to indicate to the parser when it should stop parsing and accept the input: acceptance occurs exactly when the parser is about to reduce by S' → S.
2. The closure operation.
3. The goto function.
Augmented grammar
0. E' → E
1. E → E + T
2. E → T
3. T → T * F
4. T → F
5. F → (E)
6. F → id
If I = { [E' → .E] }, then
Closure(I) = { [E' → .E], [E → .E + T], [E → .T], [T → .T * F], [T → .F], [F → .(E)], [F → .id] }
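The closure operation can be sketched as a worklist loop: whenever an item has the dot immediately before a nonterminal B, add B → .γ for every B-production, and continue until no new items appear. The sketch below hard-codes the augmented expression grammar, represents an item as a (left side, right side, dot index) triple, and uses `GRAMMAR`, `NONTERMINALS`, and `closure` as hypothetical names.

```python
# Augmented expression grammar, productions 0..6 in the order listed above
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """items: set of (lhs, rhs, dot) triples. Returns the closure as a new set."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            B = rhs[dot]                       # nonterminal right after the dot
            for head, body in GRAMMAR:
                item = (head, body, 0)         # B -> .gamma
                if head == B and item not in result:
                    result.add(item)
                    work.append(item)          # its own dot may precede a nonterminal
    return result
```

Starting from `{("E'", ("E",), 0)}`, the sketch returns the seven items of Closure(I) listed above.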