Syntax Analysis
The next phase is called syntax analysis, or parsing. It takes the tokens produced by
lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token
arrangements are checked against the grammar of the source language, i.e. the parser
checks whether the expression formed by the tokens is syntactically correct.
Explanation
Syntax analysis, or parsing, is the second phase, i.e. it comes after lexical analysis. It checks
the syntactic structure of the given input, i.e. whether the input is in the correct syntax of the
language in which it has been written. It does so by building a data structure called a parse tree
or syntax tree. The parse tree is constructed using the pre-defined grammar of the language and
the input string. If the input string can be produced by a derivation from the grammar, it is found
to be in the correct syntax; if not, the syntax analyzer reports an error.
In short: the parser analyzes the source code (token stream) against the production rules
to detect any errors in the code. It accomplishes two tasks: parsing the code while looking
for errors, and generating a parse tree as the output of the phase.
Parsers are expected to parse the whole code even if some errors exist in the program,
so they use error-recovery strategies.
TPX
Derivation
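A derivation is a sequence of production-rule applications that produces the input string
from the start symbol. During parsing, two decisions are made for each sentential form:
which non-terminal to replace, and which production rule to replace it with. The choice of
non-terminal gives two kinds of derivation: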
Left-most Derivation: If, at every step, the left-most non-terminal of the sentential form is
the one that is replaced, the derivation is called a left-most derivation. A sentential form
derived by a left-most derivation is called a left-sentential form.
Example 1
Production rules:
1. S = S + S
2. S = S - S
3. S = a | b | c
Input:
a - b + c
1. S = S+S
2. S = S-S+S
3. S = a-S+S
4. S = a-b+S
5. S = a-b+c
Example 2
Production rules:
E → E + E
E → E * E
E → id
Input string: id + id * id
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id
Notice that the left-most non-terminal is always processed first.
Right-most Derivation: If, at every step, the right-most non-terminal of the sentential form
is the one that is replaced, the derivation is known as a right-most derivation. A sentential
form derived by a right-most derivation is called a right-sentential form.
Example 1
1. S = S + S
2. S = S - S
3. S = a | b | c
Input:
a-b+c
1. S = S - S
2. S = S - S + S
3. S = S - S + c
4. S = S - b + c
5. S = a - b + c
Example 2
Production rules:
E → E + E
E → E * E
E → id
Input string: id + id * id
The right-most derivation is:
E → E + E
E → E + E * E
E → E + E * id
E → E + id * id
E → id + id * id
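The two derivation orders can be replayed mechanically. The following is a minimal sketch, not a parser: it assumes the grammar of Example 2, and the rule numbering in `RULES` and the function name `derive` are hypothetical names introduced only for this illustration. Given a list of rule choices, it rewrites either the left-most or the right-most E at each step.

```python
# Productions of Example 2, numbered for this sketch only (assumed numbering).
RULES = {
    1: ["E", "+", "E"],   # E -> E + E
    2: ["E", "*", "E"],   # E -> E * E
    3: ["id"],            # E -> id
}

def derive(choices, leftmost=True):
    """Apply the chosen rules in order, always rewriting the left-most
    (or right-most) non-terminal E, and return every sentential form."""
    form = ["E"]
    steps = [" ".join(form)]
    for rule in choices:
        positions = [i for i, sym in enumerate(form) if sym == "E"]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        at = positions[0] if leftmost else positions[-1]
        form[at:at + 1] = RULES[rule]   # replace that one E by the rule body
        steps.append(" ".join(form))
    return steps

if __name__ == "__main__":
    print("Left-most derivation:")
    for step in derive([2, 1, 3, 3, 3], leftmost=True):
        print("  ", step)
    print("Right-most derivation:")
    for step in derive([1, 2, 3, 3, 3], leftmost=False):
        print("  ", step)
```

Both calls end in id + id * id; only the order in which the non-terminals are replaced differs.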
Parse Tree
A parse tree is a graphical depiction of a derivation. Its nodes are grammar symbols, which
can be terminals or non-terminals. In parsing, the string is derived from the start symbol, and
the start symbol of the derivation becomes the root of the parse tree.
A parse tree follows the precedence of operators. The deepest sub-tree is traversed first,
so the operator in that sub-tree takes precedence over the operator in its parent nodes.
Example 1:
Production rules:
1. T = T + T | T * T
2. T = a | b | c
Input:
a * b + c
[Figures: Steps 1 to 5 of the parse tree construction, followed by the resulting parse tree for a * b + c]
Example 2:
Here we take the left-most derivation of a + b * c, where a, b and c are each an id.
The left-most derivation is:
E → E * E
E → E + E * E
E → id + E * E
E → id + id * E
E → id + id * id
Example 3:
Consider the expression S = 2+3*4. The parse tree corresponding to S would be:
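One way to write such a tree down is as nested tuples. The sketch below assumes the conventional tree for 2+3*4, in which * sits deeper than +; the tuple layout and the names `OPS`, `TREE`, `OTHER_TREE` and `evaluate` are illustrative choices, not something the text prescribes. Evaluating the tree bottom-up shows how the deepest sub-tree is computed first.

```python
import operator

# Operators used by this sketch.
OPS = {"+": operator.add, "*": operator.mul}

# Assumed tree for 2+3*4: the * node is deeper than the + node.
TREE = ("+", 2, ("*", 3, 4))

# The alternative shape, with + deeper than *, for comparison.
OTHER_TREE = ("*", ("+", 2, 3), 4)

def evaluate(node):
    """Evaluate a tree bottom-up: children first, then the operator."""
    if not isinstance(node, tuple):
        return node                      # a leaf holds a number
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

if __name__ == "__main__":
    print(evaluate(TREE))        # 3*4 is computed first, then 2 is added
    print(evaluate(OTHER_TREE))  # 2+3 is computed first, then multiplied by 4
```

The first tree evaluates to 14 and the second to 20, so the shape of the tree, not the order of the tokens, decides which operator is applied first.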
Ambiguity
A grammar G is said to be ambiguous if there exists more than one left-most derivation,
more than one right-most derivation, or more than one parse tree for a given input string.
If the grammar is not ambiguous, it is called unambiguous.
Example 1:
E → E + E
E → E – E
E → id
For the input string id + id – id, the above grammar generates two parse trees:
Example 2:
Consider the grammar E → E+E | id.
This grammar yields two parse trees for the string id+id+id.
The following are the two parse trees generated by left-most derivations:
Both parse trees are derived from the same grammar rules, yet they are different.
Hence the grammar is ambiguous.
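The claim that id+id+id has exactly two parse trees under E → E+E | id can be checked by brute force. The following is a small sketch under that grammar only; the function name `parse_trees` and the nested-tuple tree format are assumptions made for illustration. It tries every + as the top-level split point and combines the trees of the two sides.

```python
from functools import lru_cache

def parse_trees(tokens):
    """Return every parse tree of `tokens` under E -> E + E | id.
    A tree is either the string "id" or a tuple (left, "+", right)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        found = []
        # E -> id covers a span holding exactly one id token.
        if hi - lo == 1 and tokens[lo] == "id":
            found.append("id")
        # E -> E + E: try each + that leaves a non-empty span on both sides.
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "+":
                for left in trees(lo, mid):
                    for right in trees(mid + 1, hi):
                        found.append((left, "+", right))
        return tuple(found)

    return trees(0, len(tokens))

if __name__ == "__main__":
    for tree in parse_trees("id + id + id".split()):
        print(tree)
```

For id + id + id this lists two trees, one grouping the first two operands and one grouping the last two, which is the ambiguity described above.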
Example 3:
Let us now consider the following grammar:
Example 4:
1. S = aSb | SS
2. S = ε
For the string aabb, the above grammar generates two parse trees:
Parser
In the syntax analysis phase, a compiler verifies whether or not the tokens generated by the
lexical analyzer are grouped according to the syntactic rules of the language. This is done by a
parser. The parser obtains a string of tokens from the lexical analyzer and verifies that the string
can be generated by the grammar of the source language. It detects and reports any syntax errors
and produces a parse tree from which intermediate code can be generated. So, in short, the
parser is the component of the compiler that breaks the data coming from the lexical analysis
phase into smaller elements. A parser takes input in the form of a sequence of tokens and
produces output in the form of a parse tree.
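The token-stream-in, parse-tree-out behaviour can be sketched with a small hand-written parser. Everything below is an assumption made for illustration: the grammar (expr → term { + term }, term → factor { * factor }, factor → id), the recursive-descent technique, the tuple-shaped tree and the function names are not taken from the text. The sketch stops at the first syntax error rather than applying an error-recovery strategy.

```python
def parse(tokens):
    """Return a parse tree for `tokens`, or raise SyntaxError.
    Assumed grammar: expr -> term { + term }
                     term -> factor { * factor }
                     factor -> id"""
    pos = 0

    def peek():
        # "$" stands for the end of the input.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(
                f"expected {kind!r} at token {pos}, found {peek()!r}")
        pos += 1
        return kind

    def factor():
        return ("factor", expect("id"))

    def term():
        node = ("term", factor())
        while peek() == "*":
            expect("*")
            node = ("term", node, "*", factor())
        return node

    def expr():
        node = ("expr", term())
        while peek() == "+":
            expect("+")
            node = ("expr", node, "+", term())
        return node

    tree = expr()
    expect("$")          # the whole token stream must be consumed
    return tree

if __name__ == "__main__":
    print(parse(["id", "+", "id", "*", "id"]))
    try:
        parse(["id", "+"])
    except SyntaxError as err:
        print("syntax error:", err)
```

The first call prints a tree whose root is an expr node joined by +; the second reports an error because the token stream ends where an id is required.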
Types of parser
2. Follow(): the set of terminal symbols that can follow a variable (non-terminal) in the
process of derivation.
Rules for Follow Sets
1. Put $ (the end-of-input marker) in FOLLOW(S), where S is the start symbol.
2. If there is a production A → aBb (where a and b can each be a whole string of symbols),
then everything in FIRST(b) except ε is placed in FOLLOW(B).
3. If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B).
4. If there is a production A → aBb where FIRST(b) contains ε, then everything in FOLLOW(A)
is in FOLLOW(B).
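The four rules can be applied repeatedly until no FOLLOW set changes. The sketch below does this for a small grammar that is hypothetical and chosen only for illustration (E → T E', E' → + T E' | ε, T → id); the names `GRAMMAR`, `first_of`, `compute_first` and `compute_follow` are likewise assumptions. Because the rules refer to FIRST, the sketch computes FIRST sets as well.

```python
EPS = "ε"
END = "$"

# Hypothetical grammar used only for this sketch:
#   E  -> T E'
#   E' -> + T E' | ε
#   T  -> id
GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T": [["id"]],
}

def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)      # the string is empty or every symbol can vanish
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:       # repeat until a full pass adds nothing
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(start):
    first = compute_first()
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)                        # rule 1
    changed = True
    while changed:
        changed = False
        for a, bodies in GRAMMAR.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in GRAMMAR:
                        continue                  # only non-terminals have FOLLOW
                    rest_first = first_of(body[i + 1:], first)
                    new = rest_first - {EPS}      # rule 2
                    if EPS in rest_first:         # rules 3 and 4
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

if __name__ == "__main__":
    for nt, symbols in compute_follow("E").items():
        print(f"FOLLOW({nt}) = {sorted(symbols)}")
```

For this grammar, FOLLOW(E) and FOLLOW(E') contain only $, while FOLLOW(T) contains + and $: the + comes from rule 2 and the $ from rule 4, because E' can derive ε.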
Bottom-up parsing
As the name suggests, bottom-up parsing starts with the input symbols and tries to
construct the parse tree up to the start symbol.
1. Shift-Reduce Parsing, with these LR variants:
o LR(1)
o SLR(1)
o CLR(1)
o LALR(1)
Production rules:
1. E → T
2. T → T * F
3. T → id
4. F → T
5. F → id
o Shift-reduce parsing performs two actions, shift and reduce, which is why it is
known as shift-reduce parsing.
o At a shift action, the current symbol in the input string is pushed onto a stack.
o At each reduction, the symbols on top of the stack are replaced by a non-terminal. The
symbols form the right side of a production and the non-terminal is its left side.
Example:
Grammar:
1. S → S+S
2. S → S-S
3. S → (S)
4. S → a
Input string:
1. a1-(a2+a3)
Parsing table:
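A trace of this kind can be produced mechanically. The following is a minimal sketch of a shift-reduce recognizer for the grammar above, under stated assumptions: a1, a2 and a3 are all treated as the terminal a, the recognizer reduces as soon as the top of the stack matches the right side of any production, and the names `RULES`, `tokenize` and `shift_reduce` are hypothetical. This greedy strategy happens to work for this grammar and input; it is not a general LR parser, and its steps are not guaranteed to match any particular hand-built table.

```python
import re

# Productions of the example grammar: S -> S+S | S-S | (S) | a
RULES = [
    ("S", ["S", "+", "S"]),
    ("S", ["S", "-", "S"]),
    ("S", ["(", "S", ")"]),
    ("S", ["a"]),
]

def tokenize(text):
    """Split the input into tokens; a1, a2, ... all count as terminal 'a'.
    Characters outside the grammar are silently skipped in this sketch."""
    tokens = re.findall(r"a\d*|[-+()]", text)
    return ["a" if tok.startswith("a") else tok for tok in tokens]

def shift_reduce(text):
    """Return (accepted, trace); each trace row is (stack, input, action)."""
    stack, rest, trace = [], tokenize(text), []
    while True:
        for lhs, rhs in RULES:
            if stack[-len(rhs):] == rhs:
                # Reduce: replace the right side on top of the stack by lhs.
                del stack[-len(rhs):]
                stack.append(lhs)
                action = "reduce " + lhs + " -> " + "".join(rhs)
                trace.append(("".join(stack), "".join(rest), action))
                break
        else:
            if not rest:
                break                      # nothing to reduce, nothing to shift
            # Shift: push the current input symbol onto the stack.
            stack.append(rest.pop(0))
            trace.append(("".join(stack), "".join(rest), "shift"))
    return stack == ["S"], trace

if __name__ == "__main__":
    accepted, trace = shift_reduce("a1-(a2+a3)")
    print(f"{'Stack':<10}{'Input':<10}Action")
    for stack, rest, action in trace:
        print(f"{stack:<10}{rest:<10}{action}")
    print("accepted" if accepted else "rejected")
```

The input is accepted when the stack holds only the start symbol S and no input remains.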
1. Operator-Precedence Parsing
2. LR-Parser