
CFG

[1]

Language of a Grammar

If G is a grammar we write
L(G) = { w ∈ T* | S ⇒* w }
Definition: A language L is context-free iff there is a grammar G such that L = L(G)
the start symbol corresponds to the start state
variable symbols correspond to states
terminal symbols T correspond to the alphabet

CFG

[2]

Context-Free Languages and Regular Languages

Theorem: If L is regular then L is context-free.

Proof: We know L = L(A) for a DFA A. From A we can build a CFG G such that L(A) = L(G).
The variables are A, B, C, with start symbol A, the terminal symbols are 0, 1, and the productions are
A → 1A | 0B
B → 0B | 1C
C → ε | 0B | 1C
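
The construction is mechanical, as the sketch below illustrates. It is only a sketch: the encoding of states and symbols as characters and the name dfaToGrammar are choices made here, not part of the slides.

type Grammar = [(Char, String)]

-- One production q -> a q' for every transition delta q a = q',
-- and one production q -> "" (that is, q -> epsilon) for every final state q.
dfaToGrammar :: [Char] -> [Char] -> (Char -> Char -> Char) -> [Char] -> Grammar
dfaToGrammar states alphabet delta finals =
  [ (q, [a, delta q a]) | q <- states, a <- alphabet ]
    ++ [ (q, "") | q <- finals ]

-- The DFA of this slide: states A, B, C, start state A, final state C.
delta :: Char -> Char -> Char
delta 'A' '1' = 'A'; delta 'A' '0' = 'B'
delta 'B' '0' = 'B'; delta 'B' '1' = 'C'
delta 'C' '0' = 'B'; delta 'C' '1' = 'C'
delta _ _ = error "no such transition"

-- dfaToGrammar "ABC" "01" delta "C" yields exactly the productions above.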

CFG

[3]

Context-Free Languages and Regular Languages

Let LX be the language generated by the grammar with X as start symbol
We prove (mutual induction!) that w ∈ LX iff δ̂(X, w) = C, by induction on |w|
Such a CFG is called right regular
It would also be possible to define L by a left regular grammar with start symbol C
A → ε | A1
B → A0 | B0 | C0
C → B1 | C1
The intuition here is that LX represents the paths from A to X

CFG

[4]

Example of a derivation

Given the grammar for English above we can generate (leftmost derivation)
SENTENCE ⇒ SUBJECT VERB OBJECT
⇒ ARTICLE NOUN VERB OBJECT
⇒ the NOUN VERB OBJECT
⇒ the cat VERB OBJECT
⇒ the cat caught OBJECT
⇒ the cat caught ARTICLE NOUN
⇒ the cat caught a NOUN
⇒ the cat caught a dog

CFG

[5]

Derivation Tree

Notice that the following generation is possible (rightmost derivation)

SENTENCE ⇒ SUBJECT VERB OBJECT
⇒ SUBJECT VERB ARTICLE NOUN
⇒ SUBJECT VERB ARTICLE dog
⇒ SUBJECT VERB a dog
⇒ SUBJECT caught a dog
⇒ ARTICLE NOUN caught a dog
⇒ ARTICLE cat caught a dog
⇒ the cat caught a dog

CFG

[6]

Derivation Tree

Both derivations correspond to the same derivation tree, or parse tree, which reflects the internal structure of the sentence
Number of leftmost derivations of a word
= number of rightmost derivations
= number of parse trees

CFG

[7]

A grammar for arithmetical expressions

S → (S) | S + S | S ∗ S | I
I → 1 | 2 | 3
The terminal symbols are {(, ), +, ∗, 1, 2, 3}
The variable symbols are S and I

CFG

[8]

Ambiguity

Definition: A grammar G is ambiguous iff there is some word in L(G) which has two distinct derivation trees
Intuitively, there are two possible meanings of this word
Example: the previous grammar for arithmetical expressions is ambiguous, since the word 2 + 1 ∗ 3 has two possible parse trees

CFG

[9]

Ambiguity

An example of ambiguity in programming languages is else, with the following productions
C → if b then C else C
C → if b then C
C → s

CFG

[10]

Ambiguity

A word like
if b then if b then s else s
can be interpreted as
if b then (if b then s else s)
or
if b then (if b then s) else s


CFG

[11]

Context-Free Languages and Inductive Definitions


Each CFG can be seen as an inductive definition
For instance the grammar for arithmetical expressions can be seen as the following inductive definition:
1, 2, 3 are arithmetical expressions
if w is an arithmetical expression then so is (w)
if w1, w2 are arithmetical expressions then so are w1 + w2 and w1 ∗ w2
A natural way to do proofs on context-free languages is to follow this inductive structure

CFG

[12]

Context-Free Languages and Regular Languages

The following language L = {a^n b^n | n ≥ 1} is context-free

We know that it is not regular
Proposition: The following grammar G generates L
S → ab | aSb

CFG

[13]

Context-Free Languages and Regular Languages

We prove w ∈ L(G) implies w ∈ L by induction on the derivation of w ∈ L(G):

ab ∈ L(G)
if w ∈ L(G) then awb ∈ L(G)
We can also prove w ∈ L(G) implies w ∈ L by induction on the length of a derivation S ⇒* w
We prove a^n b^n ∈ L(G) by induction on n

CFG

[14]

Abstract Syntax

The parse tree often has too much information w.r.t. the internal structure of a document. This structure is better reflected by an abstract syntax tree. We give only an example here.
Here is a BNF for arithmetic expressions
E → E + E | E ∗ E | (E) | I
I → 1 | 2 | 3

Parse tree for 2 + (1 ∗ 3) (figure omitted)

CFG

[15]

Abstract Syntax

This can be compared with the abstract syntax tree for the expression 2 + (1 ∗ 3)
Concrete syntax describes the way documents are written, while abstract syntax describes the pure structure of a document.

CFG

[16]

Abstract Syntax

In Haskell, we use data types for abstract syntax

data Exp = Plus Exp Exp | Times Exp Exp | Num Atom
data Atom = One | Two | Three
ex = Plus (Num Two) (Times (Num One) (Num Three))
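
A small evaluator shows how this abstract syntax is used; it is a sketch, reusing the data types above, and the names eval and atom are chosen here, not from the slides.

eval :: Exp -> Int
eval (Plus e1 e2) = eval e1 + eval e2
eval (Times e1 e2) = eval e1 * eval e2
eval (Num a) = atom a

atom :: Atom -> Int
atom One = 1
atom Two = 2
atom Three = 3

-- eval ex == 5, the value of 2 + (1 * 3)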

CFG

[17]

Ambiguity

Definition: A grammar G is ambiguous iff there is some word in L(G) which has two distinct derivation trees
Intuitively, there are two possible meanings of this word
Example: the previous grammar for arithmetical expressions is ambiguous, since the word 2 + 1 ∗ 3 has two possible parse trees

CFG

[18]

Ambiguity

Let Σ be {0, 1}.

The following grammar of parenthesis expressions is ambiguous
E → ε | EE | 0E1

CFG

[19]

A simple example

Σ = {0, 1}
L = {u u^R | u ∈ Σ*}
This language is not regular: use the pumping lemma on 0^k 1 1 0^k
L = L(G) for the grammar
S → ε | 0S0 | 1S1
We prove that if S ⇒* v then v ∈ L by induction on the length of S ⇒* v
We prove u u^R ∈ L(G) for u ∈ Σ* by induction on the length of u

CFG

[20]

A simple example

Theorem: The grammar for S is not ambiguous

Proof: By induction on |v| we prove that there is at most one leftmost derivation S ⇒* v

CFG

[21]

Polish notation

The following is a grammar for arithmetical expressions

E → ∗EE | +EE | I
I → a | b

Theorem: This grammar is not ambiguous

We show by induction on |u| the following.
Lemma: for any k there is at most one leftmost derivation E^k ⇒* u

CFG

[22]

Polish notation
The proof is by induction on |u|. If |u| = n + 1 with n ≥ 1 there are three cases.
(1) u = +v: then the derivation has to be of the form
E^k ⇒ +EE E^{k−1} ⇒* +v
for a derivation E^{k+1} ⇒* v, and we conclude by the induction hypothesis
(2) u = ∗v: then the derivation has to be of the form
E^k ⇒ ∗EE E^{k−1} ⇒* ∗v
for a derivation E^{k+1} ⇒* v, and we conclude by the induction hypothesis

CFG

[23]

Polish notation

(3) u = iv with i = a or i = b: then the derivation has to be of the form

E^k ⇒* i E^{k−1} ⇒* iv
for a derivation E^{k−1} ⇒* v, and we conclude by the induction hypothesis

CFG

[24]

Polish notation

It follows from this result that we have the following property.

Corollary: If ∗u1u2 = ∗v1v2 ∈ L(E) with u1, u2, v1, v2 ∈ L(E), then u1 = v1 and u2 = v2. Similarly if +u1u2 = +v1v2 ∈ L(E) then u1 = v1 and u2 = v2.
But the result also says that if u ∈ L(E) then there is a unique parse tree for u.
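
The corollary is exactly what makes one-symbol-lookahead parsing of Polish notation deterministic. Below is a minimal sketch in Haskell, assuming the grammar above with terminals +, *, a, b; the names E and parseE are chosen here, not from the slides.

data E = Var Char | Add E E | Mul E E deriving Show

-- parseE consumes exactly one expression and returns the unread rest;
-- by the corollary the split between the two arguments is unique,
-- so no backtracking is ever needed.
parseE :: String -> Maybe (E, String)
parseE ('+':s) = do (e1, s1) <- parseE s
                    (e2, s2) <- parseE s1
                    Just (Add e1 e2, s2)
parseE ('*':s) = do (e1, s1) <- parseE s
                    (e2, s2) <- parseE s1
                    Just (Mul e1 e2, s2)
parseE (c:s) | c == 'a' || c == 'b' = Just (Var c, s)
parseE _ = Nothing

-- parseE "+a*ba" == Just (Add (Var 'a') (Mul (Var 'b') (Var 'a')), "")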

CFG

[25]

Ambiguity

Now, a more complicated example. Let Σ be {0, 1}.

The following grammar of parenthesis expressions is ambiguous
E → ε | EE | 0E1
We replace this by the following equivalent grammar
S → 0S1S | ε
Lemma: L(S) = L(E)
Theorem: The grammar for S is not ambiguous

CFG

[26]

Ambiguity

Lemma: L(S)L(S) ⊆ L(S)

Proof: we prove that if u ∈ L(S) then uL(S) ⊆ L(S), by induction on |u|
If u = ε then uL(S) = L(S)
If |u| = n + 1 then u = 0v1w with v, w ∈ L(S) and |v|, |w| ≤ n. By the induction hypothesis, we have wL(S) ⊆ L(S) and so
uL(S) = 0v1wL(S) ⊆ 0v1L(S) ⊆ L(S)
since v ∈ L(S) and 0L(S)1L(S) ⊆ L(S). Q.E.D.

CFG

[27]

Ambiguity

We can also do an induction on the length of a derivation S ⇒* u

Using this lemma, we can show L(E) ⊆ L(S)

CFG

[28]

Ambiguity

Lemma: L(E) ⊆ L(S)

Proof: We prove that if u ∈ L(E) then u ∈ L(S), by induction on the length of a derivation E ⇒* u
If E ⇒ ε = u then u ∈ L(S)
If E ⇒ EE ⇒* vw = u then by induction v, w ∈ L(S), and by the previous Lemma we have u ∈ L(S)
If E ⇒ 0E1 ⇒* 0v1 = u then by induction v ∈ L(S), and so u = 0v1 ∈ L(S). Q.E.D.

CFG

[29]

Ambiguity

The proof that the grammar for S is not ambiguous is difficult

One first tries to show that there is at most one leftmost derivation
S ⇒*lm u
for any string u
If u is not ε then u must be of the form 0u1, and the derivation should be
S ⇒ 0S1S ⇒* 0u1
with S1S ⇒* u1

CFG

[30]

Ambiguity

This suggests the following statement φ(u), to be proved by induction on the length of u:

For any k there exists at most one leftmost derivation S(1S)^k ⇒* u
We can then prove φ(u) by induction on |u|
If u = ε we must have k = 0, and the derivation has to be S ⇒ ε

CFG

[31]

Ambiguity
If φ(v) holds for |v| = n and |u| = n + 1, then u = 0v or u = 1v with |v| = n. We have two cases
(1) u = 1v and S(1S)^k ⇒* 1v: the derivation has the form
S(1S)^k ⇒ (1S)^k ⇒* 1v
for a derivation S(1S)^{k−1} ⇒* v, and we conclude by the induction hypothesis
(2) u = 0v and S(1S)^k ⇒* 0v: the derivation has the form
S(1S)^k ⇒ 0S1S(1S)^k ⇒* 0v
for a derivation S(1S)^{k+1} ⇒* v, and we conclude by the induction hypothesis

CFG

[32]

Inherent Ambiguity

There exists a context-free language L such that for any grammar G, if L = L(G) then G is ambiguous

L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}
L is context-free:
S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc

CFG

[33]

Eliminating - and unit productions

Definition: A unit production is a production of the form A → B with A, B nonterminal symbols.

This is similar to ε-transitions in an ε-NFA
Definition: An ε-production is a production of the form A → ε
Theorem: For any CFG G there exists a CFG G′ with no ε- or unit productions such that L(G′) = L(G) − {ε}

CFG

[34]

Elimination of unit productions

Let P1 be a system of productions such that if A → B and B → α are in P1 then so is A → α, and let G1 = (V, T, P1, S).

Let P2 be the set of non-unit productions of P1 and G2 = (V, T, P2, S)
Theorem: L(G1) = L(G2)

CFG

[35]

Elimination of unit productions

Proof: If u ∈ L(G1) and S ⇒* u is a derivation of minimal length, then this derivation is in G2. Otherwise it would use a unit production A → B and have the shape

S ⇒* α1 A α2 ⇒ α1 B α2 ⇒^n α1′ B α2′ ⇒ α1′ β α2′ ⇒* u
where this occurrence of B is eventually rewritten by some production B → β. Since A → β is then also in P1, we have the shorter derivation
S ⇒* α1 A α2 ⇒^n α1′ A α2′ ⇒ α1′ β α2′ ⇒* u
contradiction.

CFG

[36]

Elimination of unit productions

S → CBh | D
A → aaC
B → Sf | ggg
C → cA | d | C
D → E | SABC
E → be

CFG

[37]

Elimination of unit productions

We eliminate the unit productions:

S → SABC | be | CBh
A → aaC
B → Sf | ggg
C → cA | d

CFG

[38]

Elimination of -productions

If G = (V, T, P, S), build the new system P1 by closing P under the rule:

if A → αBβ and B → ε are in P1 then A → αβ is in P1
We have L(G1) = L(G). Let P2 be the system obtained from P1 by taking away all ε-productions
Theorem: L(G2) = L(G) − {ε}
Proof: We clearly have L(G2) ⊆ L(G1). We prove that if S ⇒* u, with u ∈ T* and u ≠ ε, is a derivation of minimal length, then it does not use any ε-production, so it is a derivation in G2. Q.E.D.

CFG

[39]

Eliminating - and unit productions


Starting from G = (V, T, P, S) we build a larger set P1 of productions containing P and closed under the two rules
1. if A → w1Bw2 and B → ε are in P1 then A → w1w2 is in P1
2. if A → B and B → w are in P1 then so is A → w
We add only productions whose right-hand side is a substring of an old right-hand side, so this process stops.
It can be shown that L(V, T, P1, S) = L(G), and that if P′ is the set of productions in P1 that are neither ε- nor unit productions, then L(V, T, P′, S) = L(G) − {ε}
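
One way to see this process concretely is as a fixed-point computation. A sketch, assuming productions are encoded as pairs of a nonterminal (an uppercase character) and a right-hand-side string; Prod, step and closure are names chosen here.

import Data.List (nub)

type Prod = (Char, String)

-- One closure step: add all consequences of rules 1 and 2.
step :: [Prod] -> [Prod]
step ps = nub (ps ++ rule1 ++ rule2)
  where
    -- rule 1: from A -> w1 B w2 and B -> epsilon, add A -> w1 w2
    rule1 = [ (a, pre ++ drop 1 post)
            | (a, w) <- ps, (b, "") <- ps
            , i <- [0 .. length w - 1]
            , let (pre, post) = splitAt i w
            , take 1 post == [b] ]
    -- rule 2: from A -> B and B -> w, add A -> w
    rule2 = [ (a, w) | (a, [b]) <- ps, b `elem` ['A'..'Z']
                     , (b', w) <- ps, b' == b ]

-- Iterate until nothing new appears; this terminates because every new
-- right-hand side is a substring of an old one.
closure :: [Prod] -> [Prod]
closure ps = let ps' = step ps
             in if length ps' == length ps then ps else closure ps'

-- closure [('S',"aSb"),('S',"SS"),('S',"")] adds ('S',"ab") and ('S',"S"),
-- matching the example on the next slide.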

CFG

[40]

Eliminating - and unit productions

Example: If we start from the grammar

S → aSb | SS | ε
we first get the new productions
S → ab | S
and if we eliminate the ε- and unit productions we get
S → aSb | SS | ab

CFG

[41]

Eliminating - and unit productions

Example: If we start from the grammar

S → AB
A → aAA | ε
B → bBB | ε

we first get the new productions

S → A | B | ε
A → aA | a
B → bB | b

and if we eliminate the ε- and unit productions we get

S → AB | aAA | aA | a | bBB | bB | b
A → aAA | aA | a
B → bBB | bB | b

CFG

[42]

Eliminating Useless Symbols

A symbol X is useful if there is some derivation S ⇒* αXβ ⇒* w where w is in T*

X can be in V or T
X is useless iff it is not useful
X is generating iff X ⇒* w for some w in T*
X is reachable iff S ⇒* αXβ for some α, β

CFG

[43]

Reachable Symbols

By analogy with accessible states, we can define accessible or reachable symbols. We give an inductive definition.

BASIS: The start symbol S is reachable
INDUCTION: If A is reachable and A → w is a production, then all symbols occurring in w are reachable

CFG

[44]

Reachable Symbols
Example: Consider the following CFG
S → aB | BC
B → DB | C
A → aA | c | aDb
C → b
D → B

Then S is accessible, hence also B and C, and hence D is accessible.
But A is not accessible.
We can take A away from this grammar and we get the same language
S → aB | BC
B → DB | C
C → b
D → B
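
The inductive definition can be run directly as a fixed-point computation. A sketch: the name reachable and the pair encoding of productions are assumptions made here.

import Data.List (nub)

-- Reachable symbols: start from the start symbol and keep adding every
-- symbol that occurs in the right-hand side of a production whose
-- left-hand side is already reachable.
reachable :: Char -> [(Char, String)] -> [Char]
reachable s ps = go [s]
  where
    go rs = let rs' = nub (rs ++ concat [ w | (a, w) <- ps, a `elem` rs ])
            in if length rs' == length rs then rs else go rs'

-- For the grammar above:
-- reachable 'S' [('S',"aB"),('S',"BC"),('B',"DB"),('B',"C"),
--                ('A',"aA"),('A',"c"),('A',"aDb"),('C',"b"),('D',"B")]
-- == "SaBCDb"   -- A never appears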

CFG

[45]

Generating Symbols

We define when an element of V ∪ T (terminal or nonterminal symbol) is generating, by an inductive definition

BASIS: all elements of T are generating
INDUCTION: if there is a production X → w where all symbols occurring in w are generating, then X is generating
This gives exactly the generating variables

CFG

[46]

Generating Symbols

Example: We consider
S aS | W | U
U a

W aW

V aa

Then U, V are generating because U a

V aa

Hence S is generating because S U


W is not generating, we have only W aW for production for W

46
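
The symmetric fixed point computes the generating symbols; note that here information flows from right-hand sides to left-hand sides. A sketch, with the name generating chosen here.

import Data.Char (isLower)
import Data.List (nub)

-- Terminals (lowercase) are generating; a variable becomes generating as
-- soon as one of its productions has an all-generating right-hand side.
generating :: [(Char, String)] -> [Char]
generating ps = go (nub [ c | (_, w) <- ps, c <- w, isLower c ])
  where
    go gs = let gs' = nub (gs ++ [ a | (a, w) <- ps, all (`elem` gs) w ])
            in if length gs' == length gs then gs else go gs'

-- generating [('S',"aS"),('S',"W"),('S',"U"),('U',"a"),('W',"aW"),('V',"aa")]
-- == "aUVS"   -- W is never added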

CFG

[47]

Eliminating Useless Symbols

To eliminate useless symbols in a grammar G, first eliminate all non-generating symbols: we get an equivalent grammar G1. Then eliminate all symbols in G1 that are not reachable.

We get a grammar G2 that is equivalent to G1 and to G
We have to do this in this order
Example: For the grammar
S → AB | a
A → b

CFG

[48]

B is not generating; we get the grammar

S → a
A → b

and then A is not reachable; we get the grammar

S → a

CFG

[49]

Elimination of useless variables

S → gAe | aYB | CY
B → dd | D
D → n
C → jVB | gi
U → kW
V → baXXX | oV
X → fV
A → bBY | ooC
W → c
Y → Yhm

CFG

[50]

Elimination of useless variables

Simplified grammar (Y, V, X are not generating; after removing them, B, D, U, W become unreachable):
S → gAe
A → ooC
C → gi

CFG

[51]

Linear production systems

Several algorithms we have seen are instances of graph-searching algorithms, i.e. of derivability in linear production systems

CFG

[52]

Linear Production systems


For testing accessibility in the grammar
S → aB | BC
B → DB | C
A → aA | c | aDb
C → b | B

we associate the production system

→ S
S → B, S → C
A → A, A → D
B → B, B → D, B → C
C → B

and we can produce S, B, D, C

CFG

[53]

Linear Production systems

A lot of problems in elementary logic are of this form

A → B
B → C
A, C → D

What can we deduce from A?

CFG

[54]

Linear Production systems

For computing generating symbols we have a more general form of production system

For instance for the grammar
A → ABC, A → C, B → Ca, C → a

we associate the following production system

A, B, C → A
C → A
C → B
→ C

and we can produce C, B, A. There is an algorithm for this kind of problem in 7.4.3
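
A sketch of that algorithm in its general (naive) form: a rule fires as soon as all of its premises have been produced, and we iterate to a fixed point. Rule and derivable are names chosen here; this is only an illustration of the algorithm referenced in 7.4.3, not its efficient version.

import Data.List (nub)

-- A rule: premises => conclusion. A fact is a rule with no premises.
type Rule a = ([a], a)

derivable :: Eq a => [Rule a] -> [a]
derivable rules = go []
  where
    go facts =
      let facts' = nub (facts ++ [ c | (ps, c) <- rules, all (`elem` facts) ps ])
      in if length facts' == length facts then facts else go facts'

-- The production system of this slide:
-- derivable [(["A","B","C"],"A"), (["C"],"A"), (["C"],"B"), ([],"C")]
-- == ["C","A","B"]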

CFG

[55]

Chomsky Normal Form

Definition: A CFG is in Chomsky Normal Form (CNF) iff all productions are of the form A → BC or A → a
Theorem: For any CFG G there is a CFG G′ in Chomsky Normal Form such that L(G′) = L(G) − {ε}

CFG

[56]

Chomsky Normal Form

We can assume that G has no ε- or unit productions. For each terminal a we introduce a new nonterminal Aa with the production

Aa → a
We can then assume that all productions are of the form A → a or A → B1B2 . . . Bk with k ≥ 2
If k > 2 we introduce C with productions A → B1C and C → B2 . . . Bk, until we have only right-hand sides of length ≤ 2
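
The splitting step can be written down directly; a sketch in which long right-hand sides are cut in two until they are binary. The numbering scheme for fresh variables is an assumption made here.

type Prod = (String, [String])

-- Replace A -> B1 B2 ... Bk (k > 2) by A -> B1 C and C -> B2 ... Bk,
-- repeating until every right-hand side has at most two symbols.
binarize :: Int -> [Prod] -> [Prod]
binarize _ [] = []
binarize n ((a, bs) : ps)
  | length bs <= 2 = (a, bs) : binarize n ps
  | otherwise =
      let c = "C" ++ show n                     -- fresh variable
      in (a, [head bs, c]) : binarize (n + 1) ((c, tail bs) : ps)

-- binarize 1 [("S",["A","S","B"])]
-- == [("S",["A","C1"]), ("C1",["S","B"])]  -- as in the example that follows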

CFG

[57]

Chomsky Normal Form

Example: For the grammar

S → aSb | SS | ab
we first get
S → ASB | SS | AB
A → a
B → b
and then
S → AC | SS | AB
A → a
B → b
C → SB
which is in Chomsky Normal Form

CFG

[58]

The Chomsky Hierarchy


Noam Chomsky 1956
Four types of grammars
Type 0: no restrictions
Type 1: Context-sensitive, rules A
Type 2: Context-free or context-insensitive
Type 3: Regular, rules of the form A Ba or A aB or A 
Type 3 Type 2 Type 1 Type 0
Grammars for programming languages are usually Type 2
58

CFG

[59]

Context-Free Languages and Regular Languages

Theorem: If L is regular then L is context-free.

Proof: We know L = L(A) for a DFA A. We have A = (Q, Σ, δ, q0, F).
We define a CFG G = (Q, Σ, P, q0) where P is the set of productions q → aq′ for δ(q, a) = q′, together with q → ε for q ∈ F. We then have q ⇒* uq′ iff δ̂(q, u) = q′, and q ⇒* u iff δ̂(q, u) ∈ F. In particular u ∈ L(G) iff u ∈ L(A).
A grammar where all productions are of the form A → aB or A → ε is called right regular

CFG

[60]

Pumping Lemma for Right Regular Grammars

Let G = (V, T, P, S) be a right regular grammar, and let N be |V|.

If a1 . . . an is a string of length n ≥ N, any derivation
S ⇒ a1B1 ⇒ a1a2B2 ⇒* a1 . . . aiA ⇒* a1 . . . ajA ⇒* a1 . . . an
has length n and there is at least one variable A which is used twice (pigeonhole principle)
If x = a1 . . . ai and y = ai+1 . . . aj and z = aj+1 . . . an, we have |xy| ≤ N and
xy^k z ∈ L(G) for all k

CFG

[61]

Pumping Lemma for Context-Free Languages

Let L be a context-free language

Theorem: There exists N such that if z ∈ L and N ≤ |z| then one can write z = uvwxy such that

|vx| > 0,
|vwx| ≤ N,
u v^k w x^k y ∈ L for all k

CFG

[62]

Pumping Lemma for Context-Free Languages

Theorem: The language {a^k b^k c^k | k ≥ 0} is not context-free

Proof: Assume L to be context-free. Then we have N as stated in the Pumping Lemma. Consider z = a^N b^N c^N. We have N ≤ |z| so we can write z = uvwxy such that

|vx| > 0,
|vwx| ≤ N,
u v^k w x^k y ∈ L for all k

Since |vwx| ≤ N there is one letter d ∈ {a, b, c} that does not occur in vwx, and since |vx| > 0 there is another letter e ≠ d that occurs in vx. Then e has more occurrences than d in u v^2 w x^2 y, and this contradicts u v^2 w x^2 y ∈ L. Q.E.D.

CFG

[63]

Proof of the CFL Pumping Lemma

We can assume that the language is presented by a grammar in Chomsky Normal Form, working with L − {ε}

The crucial remark is that a binary tree of height p + 1 has at most 2^p leaves
The height of a binary tree is the number of nodes on a longest path from the root

CFG

[64]

Proof of the CFL Pumping Lemma

Example: the Chomsky Normal Form grammar

S → AC | AB
A → a
B → b
C → SB

Consider a parse tree for a^4 b^4 corresponding to the derivation

S ⇒ AC ⇒ aC ⇒ aSB ⇒ aACB ⇒ aaCB ⇒ aaSBB
⇒ a^2 ACBB ⇒ a^3 CBB ⇒ a^3 SBBB ⇒ a^3 ABBBB ⇒ a^4 BBBB
⇒ a^4 bBBB ⇒ a^4 b^2 BB ⇒ a^4 b^3 B ⇒ a^4 b^4
The symbol S appears twice on a path: u = aa, v = a, w = ab, x = b, y = bb

CFG

[65]

Non closure under intersection

T = {a, b, c}
L1 = {a^k b^k c^m | k, m ≥ 0}
L2 = {a^m b^k c^k | k, m ≥ 0}
L1 and L2 are CFLs, but the intersection
L1 ∩ L2 = {a^k b^k c^k | k ≥ 0}
is not CF

CFG

[66]

Non closure under intersection

However one can show (we will not do the proof in this course, but you should know the result)
Theorem: If L1 is context-free and L2 is regular then L1 ∩ L2 is context-free
Application: The following language, for Σ = {0, 1},
L = {uu | u ∈ Σ*}
is not context-free, by considering the intersection with L(0*1*0*1*)
One can show that the complement of L is context-free!

CFG

[67]

Closure under union

If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2 (with start symbols S1, S2) and the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1 | S2}, S)
It is then direct to show that L(G) = L(G1) ∪ L(G2), since a derivation has the form
S ⇒ S1 ⇒* u
or
S ⇒ S2 ⇒* u

CFG

[68]

Non-Closure Under Complement

L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)
So CFLs cannot be closed under complement in general: otherwise they would be closed under intersection.

CFG

[69]

Closure Under Concatenation

If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2 and the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1S2}, S)
It is then direct to show that L(G) = L(G1)L(G2), since a derivation has the form
S ⇒ S1S2 ⇒* u1u2
with
S1 ⇒* u1, S2 ⇒* u2

CFG

[70]

LL(1) parsing

A grammar is LL(1) if, in a sequence of leftmost productions, we can decide which production to use by looking only at the first symbol of the string to be parsed

For instance S → +SS | a | b is LL(1)
Any regular grammar S → aA, . . . , A → bA | ε is LL(1) iff it corresponds to a deterministic FA
There are algorithms to decide whether a grammar is LL(1) (not done in this course)
Any LL(1) grammar is unambiguous (because by definition there is at most one leftmost derivation for any string)

CFG

[71]

Grammar transformations

The grammar
S → AB, A → aA | a, B → bB | c
is equivalent to the grammar
S → aAB, A → aA | ε, B → bB | c

CFG

[72]

Grammar transformations

The grammar
S → Bb
B → Sa | a

which is not LL(1), is equivalent to the grammar

S → abT
T → abT | ε

which is LL(1)

CFG

[73]

Grammar transformations

The grammar
A → Aa | b
is equivalent to the grammar
A → bB, B → aB | ε
In general, however, there is no algorithm to decide L(G1) = L(G2)
For regular expressions, we do have an algorithm to decide L(E1) = L(E2)

CFG

[74]

The CYK Algorithm

We now present an algorithm to decide whether w ∈ L(G), assuming G to be in Chomsky Normal Form.

This is an example of the technique of dynamic programming
Let n be |w|. The naive algorithm (trying all derivations of length < 2n) may be exponential. This technique gives an O(n^3) algorithm!

CFG

[75]

dynamic programming

fib 0 = 1
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

fib 5 calls fib 4 and fib 3, and fib 4 calls fib 3 again
So in a top-down computation there is duplication of work (if one does not use memoization)

CFG

[76]

dynamic programming

For a bottom-up computation:

fib 2 = 2, fib 3 = 3, fib 4 = 5, fib 5 = 8
What is going on in the CYK algorithm or the Earley algorithm is similar
S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Is bab ∈ L(G)? Is aba ∈ L(G)?
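
In Haskell the bottom-up table is naturally a lazy list; a one-line sketch (fibs is a name chosen here):

fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)   -- each entry computed once

-- fibs !! 5 == 8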

CFG

[77]

dynamic programming

The idea is to represent bab as the collection of the facts b(0, 1), a(1, 2), b(2, 3)
We then compute the facts X(i, k) for i < k, by induction on k − i
Only one rule:
If we have a production C → AB, and A is in X(i, j) and B is in X(j, k), then C is in X(i, k)
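
A compact rendering of this computation, for the grammar of the previous slide. It is a sketch: the list-based memo table is a choice made here (a real implementation would use an array), not the slides' code.

import Data.List (nub)

-- cyk g w returns the table of all facts: the cell ((i,k), xs) lists the
-- variables X with X(i,k), computed by induction on k - i.
cyk :: [(Char, String)] -> String -> [((Int, Int), [Char])]
cyk g w = table
  where
    n = length w
    table = [ ((i, i + d), cell i (i + d)) | d <- [1 .. n], i <- [0 .. n - d] ]
    cell i k
      | k - i == 1 = [ x | (x, [c]) <- g, c == w !! i ]
      | otherwise  = nub [ x | (x, [a, b]) <- g
                             , j <- [i + 1 .. k - 1]
                             , a `elem` look i j, b `elem` look j k ]
    look i k = head [ xs | ((i', k'), xs) <- table, (i', k') == (i, k) ]

g1 :: [(Char, String)]
g1 = [('S',"AB"),('S',"BC"),('A',"BA"),('A',"a"),
      ('B',"CC"),('B',"b"),('C',"AB"),('C',"a")]

-- For w = "bab" the cell (0,3) is "SC", so S(0,3) holds and bab is in L(G);
-- for w = "aba" the cell (0,3) is "B", so aba is not.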

CFG

[78]

The CYK Algorithm

The algorithm is best understood in terms of production systems

Example: the grammar
S → AB | BA | SS | AC | BD
A → a
B → b
C → SB
D → SA

becomes the production system

CFG

[79]

The CYK Algorithm

A(x, y), B(y, z) → S(x, z)
B(x, y), A(y, z) → S(x, z)
S(x, y), S(y, z) → S(x, z)
A(x, y), C(y, z) → S(x, z)
B(x, y), D(y, z) → S(x, z)
S(x, y), B(y, z) → C(x, z)
S(x, y), A(y, z) → D(x, z)
a(x, y) → A(x, y)
b(x, y) → B(x, y)

CFG

[80]

The CYK Algorithm

The problem whether one can derive S ⇒* aabbab is transformed into the problem: can one produce S(0, 6) in this production system, given the facts
a(0, 1), a(1, 2), b(2, 3), b(3, 4), a(4, 5), b(5, 6)

CFG

[81]

The CYK Algorithm

For this we apply a forward chaining / bottom-up sequence of productions

A(0, 1), A(1, 2), B(2, 3), B(3, 4), A(4, 5), B(5, 6)
S(1, 3), S(3, 5), S(4, 6)
S(1, 5), C(1, 4), C(3, 6)
S(0, 4), . . .
S(0, 6)

CFG

[82]

The CYK Algorithm

For instance the fact that C(3, 6) is produced corresponds to the derivation
C ⇒ SB ⇒ BAB ⇒ bAB ⇒ baB ⇒ bab
In this way, we get a solution in O(n^3)!

CFG

[83]

Forward-chaining inference
This idea actually works for any grammar. For instance
S → SS | aSb | ε
is represented by the production system
→ S(x, x)
S(x, y), S(y, z) → S(x, z)
a(x, y), S(y, z), b(z, t) → S(x, t)

and the problem to decide S ⇒* aabb is replaced by the problem to derive S(0, 4) from the facts
a(0, 1), a(1, 2), b(2, 3), b(3, 4)

CFG

[84]

Forward-chaining inference

This is the main idea behind the Earley algorithm

It is mainly used for parsing in computational linguistics
Earley parsers are interesting because they can parse all context-free languages

CFG

[85]

Complement of a CFL

We have seen that CFLs are closed under union but not under intersection
It follows that they are not closed under complement
Here is an explicit example: one can show that the complement of
{a^n b^n c^n | n ≥ 0}
is a CFL

CFG

[86]

Undecidable Problems

We have given algorithms to decide L(G) ≠ ∅ and w ∈ L(G). What is surprising is that it can be shown that there are no algorithms for the following problems:

Given G1 and G2, do we have L(G1) ⊆ L(G2)? Do we have L(G1) = L(G2)?
Given G and a regular expression R, do we have L(G) = L(R)? L(R) ⊆ L(G)?
Do we have L(G) = T* where T is the alphabet of G? (Compare with the case of regular languages)
Given G, is G ambiguous?

CFG

[87]

Undecidable Problems

One reduces these problems to the Post Correspondence Problem:

Given u1, . . . , un and v1, . . . , vn in {0, 1}*, is it possible to find i1, . . . , ik such that
u_{i1} . . . u_{ik} = v_{i1} . . . v_{ik}?
Example: 1, 10, 011 and 101, 00, 11
Challenge example: 001, 01, 01, 10 and 0, 011, 101, 001

CFG

[88]

Haskell Program

-- isPrefix xs ys tests whether xs is a prefix of ys
isPrefix [] ys = True
isPrefix (x:xs) (y:ys) = x == y && isPrefix xs ys
isPrefix xs ys = False

-- a candidate pair is compatible if one string is a prefix of the other
isComp (xs, ys) = isPrefix xs ys || isPrefix ys xs

exists p [] = False
exists p (x:xs) = p x || exists p xs

-- exhibit returns the first element satisfying p (assuming one exists)
exhibit p (x:xs) = if p x then x else exhibit p xs

CFG

[89]

Haskell Program

-- number the pairs 1, 2, 3, ...
addNum k [] = []
addNum k (x:xs) = (k, x) : addNum (k + 1) xs

-- extend every candidate (ns,(u,v)) with every indexed pair (n,(s,t))
nextStep xs ys =
  concat (map (\ (n,(s,t)) ->
                 map (\ (ns,(u,v)) -> (ns ++ [n], (u ++ s, v ++ t))) ys)
              xs)

CFG

[90]

Haskell Program

mainLoop xs ys =
  let
    bs = filter (isComp . snd) ys        -- keep only the compatible candidates
    prop (_, (u, v)) = u == v            -- a solution: both strings are equal
  in
    if exists prop bs then exhibit prop bs
    else if bs == [] then error "NO SOLUTION"
    else mainLoop xs (nextStep xs bs)

CFG

[91]

Haskell Program

post xs =
  let
    as = addNum 1 xs
  in mainLoop as (map (\ (n, z) -> ([n], z)) as)

xs1 = [("1","101"), ("10","00"), ("011","11")]
xs2 = [("001","0"), ("01","011"), ("01","101"), ("10","001")]

CFG

[92]

Haskell Program

Main> post xs1
([1,3,2,3],("101110011","101110011"))

Main> post xs2
ERROR - Garbage collection fails to reclaim sufficient space
[2,2,2,3,2,2,2,3,3,4,4,6,8,8,15,
21,15,17,18,24,15,12,12,18,18,24,24,45,
63,66,84,91,140,182,201,346,418,324,330,321,423,459,780

CFG

[93]

Post Correspondence Problem and CFL

To the sequence u1, . . . , un we associate the following grammar GA

The alphabet is {0, 1, a1, . . . , an}
The productions are
A → u1 a1 | . . . | un an | u1 A a1 | . . . | un A an
This grammar is unambiguous

CFG

[94]

Post Correspondence Problem and CFL

To the sequence v1, . . . , vn we associate the following grammar GB

The alphabet is the same, {0, 1, a1, . . . , an}
The productions are
B → v1 a1 | . . . | vn an | v1 B a1 | . . . | vn B an
This grammar is unambiguous

CFG

[95]

Post Correspondence Problem and CFL

Theorem: We have L(GA) ∩ L(GB) ≠ ∅ iff the Post Correspondence Problem for u1, . . . , un and v1, . . . , vn has a solution

CFG

[96]

Post Correspondence Problem and CFL

Finally we have the grammar G with productions
S → A | B
(together with the productions of GA and GB)
Theorem: The grammar G is ambiguous iff the Post Correspondence Problem for u1, . . . , un and v1, . . . , vn has a solution

CFG

[97]

Post Correspondence Problem and CFL

The complement of L(GA) is CF

We see this on one example: u1 = 0, u2 = 10
The complement of L(GB) is CF
Hence we have a grammar GC for the union of the complement of L(GA) and the complement of L(GB)

CFG

[98]

Post Correspondence Problem and CFL

Theorem: We have L(GC) = T* iff L(GA) ∩ L(GB) = ∅

Hence the problems
L(E) = L(G)
L(E) ⊆ L(G)
are in general undecidable
