[1]
Language of a Grammar
If G is a grammar we write
L(G) = { w ∈ T* | S ⇒* w }
Definition: A language L is context-free iff there is a grammar G such that
L = L(G)
start symbol corresponds to start state
variable symbols correspond to states
terminal symbols T correspond to the alphabet
CFG
[2]
B → 0B | 1C
C → ε | 0B | 1C
[3]
B → A0 | C0 | B0
C → B1
[4]
Example of a derivation
Given the grammar for English above, we can generate (leftmost derivation)
SENTENCE ⇒ SUBJECT VERB OBJECT
⇒ ARTICLE NOUN VERB OBJECT
⇒ the NOUN VERB OBJECT
[5]
Derivation Tree
[6]
Derivation Tree
Both derivations correspond to the same derivation tree, or parse tree, which
reflects the internal structure of the sentence
Number of leftmost derivations of one word
= number of rightmost derivations
= number of parse trees
[7]
S → (S) | S + S | S × S | I
I → 1 | 2 | 3
The terminal symbols are {(, ), +, ×, 1, 2, 3}
The variable symbols are S and I
[8]
Ambiguity
[9]
Ambiguity
[10]
Ambiguity
A word like
if b then if b then s else s
can be interpreted as
if b then (if b then s else s)
or
if b then (if b then s) else s
[11]
[12]
[13]
[14]
Abstract Syntax
The parse tree often contains too much information with respect to the internal
structure of a document. This structure is best reflected by an abstract syntax
tree. We give only an example here.
Here is a BNF for arithmetic expressions
E → E + E | E × E | (E) | I
I → 1 | 2 | 3
[15]
Abstract Syntax
This can be compared with the abstract syntax tree for the expression 2 + (1 × 3)
Concrete syntax describes the way documents are written while abstract syntax
describes the pure structure of a document.
[16]
Abstract Syntax
data Exp = Plus Exp Exp | Times Exp Exp | Num Atom
data Atom = One | Two | Three
ex = Plus (Num Two) (Times (Num One) (Num Three))
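One thing the abstract syntax makes easy is a recursive evaluator with one clause per constructor. The following is only a sketch: the names `atomValue` and `eval` are hypothetical helpers that do not appear on the slides, and the data declarations are repeated so that the block stands on its own.

```haskell
-- Abstract syntax from the slide, repeated so the sketch is self-contained.
data Exp  = Plus Exp Exp | Times Exp Exp | Num Atom
data Atom = One | Two | Three

-- Hypothetical helper (not on the slides): numeric value of an atom.
atomValue :: Atom -> Int
atomValue One   = 1
atomValue Two   = 2
atomValue Three = 3

-- Hypothetical evaluator (not on the slides): one clause per constructor.
eval :: Exp -> Int
eval (Plus e1 e2)  = eval e1 + eval e2
eval (Times e1 e2) = eval e1 * eval e2
eval (Num a)       = atomValue a

-- The term for 2 + (1 × 3); atoms are wrapped in Num so that it type-checks.
ex :: Exp
ex = Plus (Num Two) (Times (Num One) (Num Three))
```

With these definitions, `eval ex` evaluates to 5. No parentheses or precedence rules are needed at this level, since the tree shape already fixes the grouping.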
[17]
Ambiguity
[18]
Ambiguity
[19]
A simple example
Σ = {0, 1}
L = {uu^R | u ∈ Σ*}
This language is not regular: use the pumping lemma on 0^k 1 1 0^k
L = L(G) for the grammar
S → ε | 0S0 | 1S1
We prove that if S ⇒* v then v ∈ L by induction on the length of the derivation S ⇒* v
We prove that uu^R ∈ L(G) if u ∈ Σ* by induction on the length of u
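The equality L = L(G) can also be checked experimentally on short words. In the sketch below, `inLG` follows the productions S → ε | 0S0 | 1S1 by stripping matching outer letters, and `inL` uses the fact that the words uu^R are exactly the even-length palindromes. Both function names are hypothetical and are not part of the slides.

```haskell
-- Membership in L(G), following S -> ε | 0S0 | 1S1.
inLG :: String -> Bool
inLG []  = True                 -- S -> ε
inLG [_] = False                -- a single letter: no production applies
inLG (c:cs) =
  c `elem` "01" && c == last cs && inLG (init cs)   -- S -> cSc, c in {0,1}

-- Membership in L = { u u^R }: the even-length palindromes over {0,1}.
inL :: String -> Bool
inL w = all (`elem` "01") w && even (length w) && w == reverse w

-- All words over {0,1} of length at most n.
wordsUpTo :: Int -> [String]
wordsUpTo n = concatMap (\k -> sequence (replicate k "01")) [0 .. n]
```

For instance, `all (\w -> inLG w == inL w) (wordsUpTo 8)` compares the two definitions on every word of length at most 8. This is a sanity check, not a replacement for the two inductive proofs.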
[20]
A simple example
[21]
Polish notation
I → a | b
[22]
Polish notation
The proof is by induction on |u|. If |u| = n + 1 with n > 1 there are three
cases
(1) u = +v then the derivation has to be of the form
E^k ⇒ +EEE^(k-1) ⇒* +v
for a derivation E^(k+1) ⇒* v and we conclude by induction hypothesis
(2) u = ×v then the derivation has to be of the form
E^k ⇒ ×EEE^(k-1) ⇒* ×v
for a derivation E^(k+1) ⇒* v and we conclude by induction hypothesis
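The case analysis above reads the first letter of u and then knows which production must have been used. The same idea gives a deterministic recognizer. The sketch below assumes the grammar E → +EE | ×EE | I, I → a | b, writes the second operator as the ASCII character `*`, and uses the hypothetical names `parseE` and `isPolish`.

```haskell
-- Try to read one E from the front of the input.
-- On success, return the part of the input that is left over.
parseE :: String -> Maybe String
parseE (c:rest)
  | c `elem` "+*" = parseE rest >>= parseE   -- E -> +EE | *EE: read two E's
  | c `elem` "ab" = Just rest                -- E -> I, I -> a | b
parseE _ = Nothing                           -- empty input or unknown letter

-- A word is in the language iff exactly one E consumes all of it.
isPolish :: String -> Bool
isPolish w = parseE w == Just ""
```

For example, `isPolish "+a*ab"` holds while `isPolish "+a"` and `isPolish "ab"` do not. At every step the next letter determines the production, which is the computational content of the uniqueness argument.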
[23]
Polish notation
[24]
Polish notation
[25]
Ambiguity
[26]
Ambiguity
[27]
Ambiguity
[28]
Ambiguity
[29]
Ambiguity
[30]
Ambiguity
[31]
Ambiguity
If the property holds for every v with |v| = n, and |u| = n + 1, then u = 0v or
u = 1v with |v| = n.
We have two cases
(1) u = 1v and S(1S)^k ⇒* 1v, the derivation has the form
S(1S)^k ⇒ (1S)^k ⇒* 1v
for a derivation S(1S)^(k-1) ⇒* v and we conclude by induction hypothesis
(2) u = 0v and S(1S)^k ⇒* 0v, the derivation has the form
S(1S)^k ⇒ 0S1S(1S)^k ⇒* 0v
for a derivation S(1S)^(k+1) ⇒* v and we conclude by induction hypothesis
[32]
Inherent Ambiguity
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc
[33]
[34]
[35]
[36]
S → CBh | D
A → aaC
B → Sf | ggg
C → cA | d | C
D → E | SABC
E → be
[37]
[38]
Elimination of ε-productions
[39]
[40]
[41]
A → aAA | ε
B → bBB | ε
A → aA | a
B → bB | b
A → aAA | aA | a
B → bBB | bB | b
[42]
[43]
Reachable Symbols
[44]
Reachable Symbols
Example: Consider the following CFG
S → aB | BC
B → DB | C
A → aA | c | aDb
C → b
D → B
B → DB | C
C → b
D → B
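The reachable variables can be computed by a simple search from the start symbol. The sketch below assumes a hypothetical representation of a grammar as a list of pairs (variable, right-hand sides), with upper-case letters as variables. The names `Grammar`, `reachable` and `restrict` are illustrative only.

```haskell
import Data.Char (isUpper)
import Data.List (nub)

-- Hypothetical representation: each variable with its right-hand sides.
type Grammar = [(Char, [String])]

-- The example grammar from the slide.
example :: Grammar
example =
  [ ('S', ["aB", "BC"])
  , ('B', ["DB", "C"])
  , ('A', ["aA", "c", "aDb"])
  , ('C', ["b"])
  , ('D', ["B"])
  ]

-- Variables reachable from the start symbol, in order of discovery.
reachable :: Grammar -> Char -> [Char]
reachable g start = go [start] []
  where
    go [] seen = reverse seen
    go (x:todo) seen
      | x `elem` seen = go todo seen
      | otherwise     = go (todo ++ successors x) (x : seen)
    -- Variables occurring in some right-hand side of x.
    successors x =
      nub [ y | (v, rhss) <- g, v == x, rhs <- rhss, y <- rhs, isUpper y ]

-- Keep only the productions of reachable variables.
restrict :: Grammar -> Char -> Grammar
restrict g start = [ p | p@(v, _) <- g, v `elem` reachable g start ]
```

On the example, `reachable example 'S'` yields `"SBCD"`, so `restrict example 'S'` drops the productions for A and keeps those for S, B, C and D.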
[45]
Generating Symbols
[46]
Generating Symbols
Example: We consider
S → aS | W | U
U → a
W → aW
V → aa
V → aa
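Generating variables can be computed as a least fixed point: start with no variables, and repeatedly add every variable that has a right-hand side built only from terminals and variables already known to be generating. The sketch below uses the same hypothetical grammar representation as a list of pairs, with upper-case letters as variables. The name `generating` is illustrative.

```haskell
import Data.Char (isUpper)

-- Hypothetical representation: each variable (listed once) with its right-hand sides.
type Grammar = [(Char, [String])]

-- The example grammar from the slide.
example :: Grammar
example =
  [ ('S', ["aS", "W", "U"])
  , ('U', ["a"])
  , ('W', ["aW"])
  , ('V', ["aa"])
  ]

-- Iterate until the set of generating variables stops growing.
generating :: Grammar -> [Char]
generating g = go []
  where
    go gen
      | gen' == gen = gen
      | otherwise   = go gen'
      where
        -- A variable is generating if some right-hand side uses only
        -- terminals and variables already in gen.
        gen' = [ v | (v, rhss) <- g, any (all ok) rhss ]
        ok c = not (isUpper c) || c `elem` gen
```

On the example, `generating example` yields `"SUV"`: W never becomes generating because its only production W → aW mentions W itself.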
[47]
A → b
[48]
A → b
[49]
S → gAe | aYB | CY,
B → dd | D,
D → n,
C → jVB | gi
U → kW
V → baXXX | oV,
X → fV,
A → bBY | ooC
W → c
Y → Yhm
[50]
Simplified grammar
S → gAe
A → ooC
C → gi
[51]
[52]
A → aA | c | aDb
C → b | B
S B,
SC
A A,
A D,
BB
B D,
B C,
CB
[53]
B C,
A, C D
[54]
A C,
B Ca,
Ca
C A,
C B,
[55]
Definition: A CFG is in Chomsky Normal Form (CNF) iff all productions are
of the form A → BC or A → a
Theorem: For any CFG G there is a CFG G′ in Chomsky Normal Form such
that L(G′) = L(G) − {ε}
[56]
[57]
A → a
B → b
and then
S → AC | SS | AB
A → a
B → b
C → SB
[58]
[59]
[60]
[61]
|vx| > 0,
|vwx| ≤ N,
[62]
|vx| > 0,
|vwx| ≤ N,
Since |vwx| ≤ N there is one letter d ∈ {a, b, c} that does not occur in vwx, and
since |vx| > 0 there is another letter e ≠ d that occurs in vx. Then e has more
occurrences than d in uv^2wx^2y, and this contradicts uv^2wx^2y ∈ L. Q.E.D.
[63]
[64]
A → a,
B → b,
C → SB
[65]
T = {a, b, c}
L1 = {a^k b^k c^m | k, m > 0}
L2 = {a^m b^k c^k | k, m > 0}
L1 and L2 are CFLs, but the intersection
L1 ∩ L2 = {a^k b^k c^k | k > 0}
is not context-free
[66]
However one can show (we will not do the proof in this course, but you should
know the result)
Theorem: If L1 is context-free and L2 is regular then L1 ∩ L2 is
context-free
Application: The following language, for Σ = {0, 1}
L = {uu | u ∈ Σ*}
is not context-free, by considering the intersection with L(0*1*0*1*)
One can show that the complement of L is context-free!
[67]
If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2, and
the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1 | S2}, S)
It is then direct to show that L(G) = L(G1) ∪ L(G2) since a derivation has
the form
S ⇒ S1 ⇒* u
or
S ⇒ S2 ⇒* u
[68]
L1 ∩ L2 = complement(complement(L1) ∪ complement(L2))
So CFL cannot be closed under complement in general. Otherwise they would
be closed under intersection.
[69]
If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2, and
the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1S2}, S)
It is then direct to show that L(G) = L(G1)L(G2) since a derivation has the
form
S ⇒ S1S2 ⇒* u1u2
with
S1 ⇒* u1,
S2 ⇒* u2
[70]
LL(1) parsing
There are algorithms to decide if a grammar is LL(1) (not done in this course)
Any LL(1) grammar is unambiguous (because by definition there is at most
one leftmost derivation for any string)
[71]
Grammar transformations
The grammar
S → AB, A → aA | a, B → bB | c
is equivalent to the grammar
S → aAB, A → aA | ε, B → bB | c
[72]
Grammar transformations
The grammar
S → Bb
B → Sa | a
is equivalent to the grammar
S → abT
T → abT | ε
which is LL(1)
[73]
Grammar transformations
The grammar
A → Aa | b
is equivalent to the grammar
A → bB, B → aB | ε
In general, however, there is no algorithm to decide whether L(G1) = L(G2)
For regular expressions, we have an algorithm to decide whether L(E1) = L(E2)
[74]
[75]
dynamic programming
fib 0 = fib 1 = 1
fib (n + 2) = fib n + fib (n + 1)
Computing fib 5 calls fib 4 and fib 3, and fib 4 calls fib 3 again
So in a top-down computation there is duplication of work (if one does not
use memoization)
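The duplication can be avoided by computing each value once and sharing it. Below is a small sketch: `fib` is the direct top-down definition from the slide, while `fibs` and `fibMemo` are hypothetical names for a bottom-up variant that stores the values in a lazy list.

```haskell
-- Direct top-down definition: recomputes the same values many times.
fib :: Int -> Integer
fib 0 = 1
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

-- Bottom-up variant: every element of the list is computed once and
-- then shared, since it is defined in terms of the list itself.
fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

-- Look up the n-th value in the shared list (n is assumed non-negative).
fibMemo :: Int -> Integer
fibMemo n = fibs !! n
```

Both definitions agree, for example `fib 5` and `fibMemo 5` are both 8, but `fibMemo` uses a number of additions linear in n whereas `fib` takes exponentially many calls.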
[76]
dynamic programming
A → BA | a,
B → CC | b,
C → AB | a
[77]
dynamic programming
The idea is to represent bab as the collection of the facts b(0, 1), a(1, 2), b(2, 3)
We then compute the facts X(i, k) for i < k by induction on k − i
Only one rule:
If we have a production C → AB and A in X(i, j) and B in X(j, k) then C
is in X(i, k)
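The rule above can be turned into a table-filling procedure. The following is a minimal sketch, assuming a grammar in Chomsky Normal Form given as a list of rules. It uses only the productions A → BA | a, B → CC | b, C → AB | a, with no start production assumed, and the names `Rule`, `rules` and `facts` are hypothetical.

```haskell
import Data.List (nub, sort)

-- Productions in Chomsky Normal Form.
data Rule = Bin Char Char Char   -- Bin c a b  stands for  C -> AB
          | Term Char Char       -- Term a t   stands for  A -> t

-- Example productions; no start production is assumed here.
rules :: [Rule]
rules =
  [ Bin 'A' 'B' 'A', Term 'A' 'a'
  , Bin 'B' 'C' 'C', Term 'B' 'b'
  , Bin 'C' 'A' 'B', Term 'C' 'a'
  ]

-- facts rs w i k: the variables X such that the fact X(i, k) holds for w.
-- The table is lazy, so each cell is computed at most once per word.
facts :: [Rule] -> String -> Int -> Int -> [Char]
facts rs w = look
  where
    n = length w
    table = [ [ cell i k | k <- [0 .. n] ] | i <- [0 .. n] ]
    look i k = table !! i !! k
    cell i k
      | k <= i     = []
      | k == i + 1 = sort (nub [ a | Term a t <- rs, t == w !! i ])
      | otherwise  = sort (nub [ c | j <- [i + 1 .. k - 1]
                                   , Bin c a b <- rs
                                   , a `elem` look i j
                                   , b `elem` look j k ])
```

For the word bab, `facts rules "bab" 0 3` yields `"C"`, matching the derivation C ⇒ AB ⇒ BAB ⇒* bab. List indexing keeps the sketch short; an array would be the usual choice for larger inputs.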
[78]
B → b,
C → SB,
D → SA
[79]
b(x, y) → B(x, y)
[80]
The problem of whether one can derive S ⇒* aabbab is transformed into the problem:
can one produce S(0, 6) in this production system given the facts
a(0, 1), a(1, 2), b(2, 3), b(3, 4), a(4, 5), b(5, 6)
[81]
[82]
For instance the fact that C(3, 6) is produced corresponds to the derivation
C ⇒ SB ⇒ BAB ⇒ bAB ⇒ baB ⇒ bab
In this way, we get a solution in O(n^3)!
[83]
Forward-chaining inference
This idea works actually for any grammar. For instance
S → SS | aSb | ε
is represented by the production system
S(x, x),
[84]
Forward-chaining inference
[85]
Complement of a CFL
We have seen that CFLs are not closed under intersection but are closed under
union
It follows that they are not closed under complement
Here is an explicit example: one can show that the complement of
{a^n b^n c^n | n > 0}
is a CFL
[86]
Undecidable Problems
[87]
Undecidable Problems
[88]
Haskell Program
isPrefix [] ys = True
isPrefix (x:xs) (y:ys) = x == y && isPrefix xs ys
isPrefix xs ys = False
isComp (xs,ys) = isPrefix xs ys || isPrefix ys xs
exists p [] = False
exists p (x:xs) = p x || exists p xs
exhibit p (x:xs) = if p x then x else exhibit p xs
[89]
Haskell Program
addNum k [] = []
addNum k (x:xs) = (k,x):(addNum (k+1) xs)
nextStep xs ys =
  concat (map (\ (n,(s,t)) ->
                 map (\ (ns,(u,v)) -> (ns++[n],(u ++ s,v ++ t)))
                     ys)
              xs)
[90]
Haskell Program
mainLoop xs ys =
  let
    bs = filter (isComp . snd) ys
    prop (_,(u,v)) = u == v
  in
    if exists prop bs then exhibit prop bs
    else if null bs then error "NO SOLUTION"
    else mainLoop xs (nextStep xs bs)
[91]
Haskell Program
post xs =
  let
    as = addNum 1 xs
  in mainLoop as (map (\ (n,z) -> ([n],z)) as)
xs1 = [("1","101"),("10","00"),("011","11")]
xs2 = [("001","0"),("01","011"),("01","101"),("10","001")]
[92]
Haskell Program
[93]
[94]
[95]
[96]
[97]
[98]