
CFG

[1]

Language of a Grammar

If G is a grammar we write
L(G) = { w ∈ T* | S ⇒* w }
Definition: A language L is context-free iff there is a grammar G such that L = L(G)
the start symbol corresponds to the start state
variable symbols correspond to states
terminal symbols T correspond to the alphabet

CFG

[2]

Context-Free Languages and Regular Languages

Theorem: If L is regular then L is context-free.

Proof: We know L = L(A) for a DFA A. From A we can build a CFG G such that L(A) = L(G).
The variables are A, B, C, with start symbol A, the terminal symbols are 0, 1, and the productions are
A → 1A | 0B
B → 0B | 1C
C → ε | 0B | 1C
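
The construction is mechanical, as the sketch below illustrates. It is only a sketch: the encoding of states and symbols as characters and the name dfaToGrammar are choices made here, not part of the slides.

type Grammar = [(Char, String)]

-- One production q -> a q' for every transition delta q a = q',
-- and one production q -> "" (that is, q -> epsilon) for every final state q.
dfaToGrammar :: [Char] -> [Char] -> (Char -> Char -> Char) -> [Char] -> Grammar
dfaToGrammar states alphabet delta finals =
  [ (q, [a, delta q a]) | q <- states, a <- alphabet ]
    ++ [ (q, "") | q <- finals ]

-- The DFA of this slide: states A, B, C, start state A, final state C.
delta :: Char -> Char -> Char
delta 'A' '1' = 'A'; delta 'A' '0' = 'B'
delta 'B' '0' = 'B'; delta 'B' '1' = 'C'
delta 'C' '0' = 'B'; delta 'C' '1' = 'C'
delta _ _ = error "no such transition"

-- dfaToGrammar "ABC" "01" delta "C" yields exactly the productions above.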

CFG

[3]

Context-Free Languages and Regular Languages

Let LX be the language generated by the grammar with X as start symbol
We prove (mutual induction!) that w ∈ LX iff δ̂(X, w) = C, by induction on |w|
Such a CFG is called right regular
It would also be possible to define L by a left regular grammar with start symbol C
A → ε | A1
B → A0 | B0 | C0
C → B1 | C1
The intuition here is that LX represents the paths from A to X

CFG

[4]

Example of a derivation

Given the grammar for English above we can generate (leftmost derivation)
SENTENCE ⇒ SUBJECT VERB OBJECT
⇒ ARTICLE NOUN VERB OBJECT
⇒ the NOUN VERB OBJECT
⇒ the cat VERB OBJECT
⇒ the cat caught OBJECT
⇒ the cat caught ARTICLE NOUN
⇒ the cat caught a NOUN
⇒ the cat caught a dog

CFG

[5]

Derivation Tree

Notice that the following generation is possible (rightmost derivation)

SENTENCE ⇒ SUBJECT VERB OBJECT
⇒ SUBJECT VERB ARTICLE NOUN
⇒ SUBJECT VERB ARTICLE dog
⇒ SUBJECT VERB a dog
⇒ SUBJECT caught a dog
⇒ ARTICLE NOUN caught a dog
⇒ ARTICLE cat caught a dog
⇒ the cat caught a dog

CFG

[6]

Derivation Tree

Both derivations correspond to the same derivation tree, or parse tree, which reflects the internal structure of the sentence
Number of leftmost derivations of a word
= number of rightmost derivations
= number of parse trees

CFG

[7]

A grammar for arithmetical expressions

S → (S) | S + S | S ∗ S | I
I → 1 | 2 | 3
The terminal symbols are {(, ), +, ∗, 1, 2, 3}
The variable symbols are S and I

CFG

[8]

Ambiguity

Definition: A grammar G is ambiguous iff there is some word in L(G) which has two distinct derivation trees
Intuitively, there are two possible meanings of this word
Example: the previous grammar for arithmetical expressions is ambiguous, since the word 2 + 1 ∗ 3 has two possible parse trees

CFG

[9]

Ambiguity

An example of ambiguity in programming languages is else, with the following productions
C → if b then C else C
C → if b then C
C → s

CFG

[10]

Ambiguity

A word like
if b then if b then s else s
can be interpreted as
if b then (if b then s else s)
or
if b then (if b then s) else s


CFG

[11]

Context-Free Languages and Inductive Definitions


Each CFG can be seen as an inductive definition
For instance the grammar for arithmetical expressions can be seen as the following inductive definition:
1, 2, 3 are arithmetical expressions
if w is an arithmetical expression then so is (w)
if w1, w2 are arithmetical expressions then so are w1 + w2 and w1 ∗ w2
A natural way to do proofs on context-free languages is to follow this inductive structure

CFG

[12]

Context-Free Languages and Regular Languages

The following language L = {a^n b^n | n ≥ 1} is context-free

We know that it is not regular
Proposition: The following grammar G generates L
S → ab | aSb

CFG

[13]

Context-Free Languages and Regular Languages

We prove w ∈ L(G) implies w ∈ L by induction on the derivation of w ∈ L(G):

ab ∈ L(G)
if w ∈ L(G) then awb ∈ L(G)
We can also prove w ∈ L(G) implies w ∈ L by induction on the length of a derivation S ⇒* w
We prove a^n b^n ∈ L(G) by induction on n

CFG

[14]

Abstract Syntax

The parse tree often has too much information w.r.t. the internal structure of a document. This structure is better reflected by an abstract syntax tree. We give only an example here.
Here is a BNF for arithmetic expressions
E → E + E | E ∗ E | (E) | I
I → 1 | 2 | 3

Parse tree for 2 + (1 ∗ 3) (figure omitted)

CFG

[15]

Abstract Syntax

This can be compared with the abstract syntax tree for the expression 2 + (1 ∗ 3)
Concrete syntax describes the way documents are written, while abstract syntax describes the pure structure of a document.

CFG

[16]

Abstract Syntax

In Haskell, we use data types for abstract syntax

data Exp = Plus Exp Exp | Times Exp Exp | Num Atom
data Atom = One | Two | Three
ex = Plus (Num Two) (Times (Num One) (Num Three))
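
A small evaluator shows how this abstract syntax is used; it is a sketch, reusing the data types above, and the names eval and atom are chosen here, not from the slides.

eval :: Exp -> Int
eval (Plus e1 e2) = eval e1 + eval e2
eval (Times e1 e2) = eval e1 * eval e2
eval (Num a) = atom a

atom :: Atom -> Int
atom One = 1
atom Two = 2
atom Three = 3

-- eval ex == 5, the value of 2 + (1 * 3)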

CFG

[17]

Ambiguity

Definition: A grammar G is ambiguous iff there is some word in L(G) which has two distinct derivation trees
Intuitively, there are two possible meanings of this word
Example: the previous grammar for arithmetical expressions is ambiguous, since the word 2 + 1 ∗ 3 has two possible parse trees

CFG

[18]

Ambiguity

Let Σ be {0, 1}.

The following grammar of parenthesis expressions is ambiguous
E → ε | EE | 0E1

CFG

[19]

A simple example

Σ = {0, 1}
L = {u u^R | u ∈ Σ*}
This language is not regular: use the pumping lemma on 0^k 1 1 0^k
L = L(G) for the grammar
S → ε | 0S0 | 1S1
We prove that if S ⇒* v then v ∈ L by induction on the length of S ⇒* v
We prove u u^R ∈ L(G) for u ∈ Σ* by induction on the length of u

CFG

[20]

A simple example

Theorem: The grammar for S is not ambiguous

Proof: By induction on |v| we prove that there is at most one leftmost derivation S ⇒* v

CFG

[21]

Polish notation

The following is a grammar for arithmetical expressions

E → ∗EE | +EE | I
I → a | b

Theorem: This grammar is not ambiguous

We show by induction on |u| the following.
Lemma: for any k there is at most one leftmost derivation E^k ⇒* u

CFG

[22]

Polish notation
The proof is by induction on |u|. If |u| = n + 1 with n ≥ 1 there are three cases.
(1) u = +v: then the derivation has to be of the form
E^k ⇒ +EE E^{k−1} ⇒* +v
for a derivation E^{k+1} ⇒* v, and we conclude by the induction hypothesis
(2) u = ∗v: then the derivation has to be of the form
E^k ⇒ ∗EE E^{k−1} ⇒* ∗v
for a derivation E^{k+1} ⇒* v, and we conclude by the induction hypothesis

CFG

[23]

Polish notation

(3) u = iv with i = a or i = b: then the derivation has to be of the form

E^k ⇒* i E^{k−1} ⇒* iv
for a derivation E^{k−1} ⇒* v, and we conclude by the induction hypothesis

CFG

[24]

Polish notation

It follows from this result that we have the following property.

Corollary: If ∗u1u2 = ∗v1v2 ∈ L(E) with u1, u2, v1, v2 ∈ L(E), then u1 = v1 and u2 = v2. Similarly if +u1u2 = +v1v2 ∈ L(E) then u1 = v1 and u2 = v2.
But the result also says that if u ∈ L(E) then there is a unique parse tree for u.
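
The corollary is exactly what makes one-symbol-lookahead parsing of Polish notation deterministic. Below is a minimal sketch in Haskell, assuming the grammar above with terminals +, *, a, b; the names E and parseE are chosen here, not from the slides.

data E = Var Char | Add E E | Mul E E deriving Show

-- parseE consumes exactly one expression and returns the unread rest;
-- by the corollary the split between the two arguments is unique,
-- so no backtracking is ever needed.
parseE :: String -> Maybe (E, String)
parseE ('+':s) = do (e1, s1) <- parseE s
                    (e2, s2) <- parseE s1
                    Just (Add e1 e2, s2)
parseE ('*':s) = do (e1, s1) <- parseE s
                    (e2, s2) <- parseE s1
                    Just (Mul e1 e2, s2)
parseE (c:s) | c == 'a' || c == 'b' = Just (Var c, s)
parseE _ = Nothing

-- parseE "+a*ba" == Just (Add (Var 'a') (Mul (Var 'b') (Var 'a')), "")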

CFG

[25]

Ambiguity

Now, a more complicated example. Let Σ be {0, 1}.

The following grammar of parenthesis expressions is ambiguous
E → ε | EE | 0E1
We replace this by the following equivalent grammar
S → 0S1S | ε
Lemma: L(S) = L(E)
Theorem: The grammar for S is not ambiguous

CFG

[26]

Ambiguity

Lemma: L(S)L(S) ⊆ L(S)

Proof: we prove that if u ∈ L(S) then uL(S) ⊆ L(S), by induction on |u|
If u = ε then uL(S) = L(S)
If |u| = n + 1 then u = 0v1w with v, w ∈ L(S) and |v|, |w| ≤ n. By the induction hypothesis, we have wL(S) ⊆ L(S) and so
uL(S) = 0v1wL(S) ⊆ 0v1L(S) ⊆ L(S)
since v ∈ L(S) and 0L(S)1L(S) ⊆ L(S). Q.E.D.

CFG

[27]

Ambiguity

We can also do an induction on the length of a derivation S ⇒* u

Using this lemma, we can show L(E) ⊆ L(S)

CFG

[28]

Ambiguity

Lemma: L(E) ⊆ L(S)

Proof: We prove that if u ∈ L(E) then u ∈ L(S), by induction on the length of a derivation E ⇒* u
If E ⇒ ε = u then u ∈ L(S)
If E ⇒ EE ⇒* vw = u then by induction v, w ∈ L(S), and by the previous Lemma we have u ∈ L(S)
If E ⇒ 0E1 ⇒* 0v1 = u then by induction v ∈ L(S), and so u = 0v1 ∈ L(S). Q.E.D.

CFG

[29]

Ambiguity

The proof that the grammar for S is not ambiguous is difficult

One first tries to show that there is at most one leftmost derivation
S ⇒*lm u
for any string u
If u is not ε then u must be of the form 0u1, and the derivation should be
S ⇒ 0S1S ⇒* 0u1
with S1S ⇒* u1

CFG

[30]

Ambiguity

This suggests the following statement φ(u), to be proved by induction on the length of u:

For any k there exists at most one leftmost derivation S(1S)^k ⇒* u
We can then prove φ(u) by induction on |u|
If u = ε we must have k = 0, and the derivation has to be S ⇒ ε

CFG

[31]

Ambiguity
If φ(v) holds for |v| = n and |u| = n + 1, then u = 0v or u = 1v with |v| = n. We have two cases
(1) u = 1v and S(1S)^k ⇒* 1v: the derivation has the form
S(1S)^k ⇒ (1S)^k ⇒* 1v
for a derivation S(1S)^{k−1} ⇒* v, and we conclude by the induction hypothesis
(2) u = 0v and S(1S)^k ⇒* 0v: the derivation has the form
S(1S)^k ⇒ 0S1S(1S)^k ⇒* 0v
for a derivation S(1S)^{k+1} ⇒* v, and we conclude by the induction hypothesis

CFG

[32]

Inherent Ambiguity

There exists a context-free language L such that for any grammar G, if L = L(G) then G is ambiguous

L = {a^n b^n c^m d^m | n, m ≥ 1} ∪ {a^n b^m c^m d^n | n, m ≥ 1}
L is context-free:
S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc

CFG

[33]

Eliminating - and unit productions

Definition: A unit production is a production of the form A → B with A, B nonterminal symbols.

This is similar to ε-transitions in an ε-NFA
Definition: An ε-production is a production of the form A → ε
Theorem: For any CFG G there exists a CFG G′ with no ε- or unit productions such that L(G′) = L(G) − {ε}

CFG

[34]

Elimination of unit productions

Let P1 be a system of productions such that if A → B and B → α are in P1 then so is A → α, and let G1 = (V, T, P1, S).

Let P2 be the set of non-unit productions of P1 and G2 = (V, T, P2, S)
Theorem: L(G1) = L(G2)

CFG

[35]

Elimination of unit productions

Proof: If u ∈ L(G1) and S ⇒* u is a derivation of minimal length, then this derivation is in G2. Otherwise it would use a unit production A → B and have the shape

S ⇒* α1 A α2 ⇒ α1 B α2 ⇒^n α1′ B α2′ ⇒ α1′ β α2′ ⇒* u
where this occurrence of B is eventually rewritten by some production B → β. Since A → β is then also in P1, we have the shorter derivation
S ⇒* α1 A α2 ⇒^n α1′ A α2′ ⇒ α1′ β α2′ ⇒* u
contradiction.

CFG

[36]

Elimination of unit productions

S → CBh | D
A → aaC
B → Sf | ggg
C → cA | d | C
D → E | SABC
E → be

CFG

[37]

Elimination of unit productions

We eliminate the unit productions:

S → SABC | be | CBh
A → aaC
B → Sf | ggg
C → cA | d

CFG

[38]

Elimination of -productions

If G = (V, T, P, S), build the new system P1 by closing P under the rule:

if A → αBβ and B → ε are in P1 then A → αβ is in P1
We have L(G1) = L(G). Let P2 be the system obtained from P1 by taking away all ε-productions
Theorem: L(G2) = L(G) − {ε}
Proof: We clearly have L(G2) ⊆ L(G1). We prove that if S ⇒* u, with u ∈ T* and u ≠ ε, is a derivation of minimal length, then it does not use any ε-production, so it is a derivation in G2. Q.E.D.

CFG

[39]

Eliminating - and unit productions


Starting from G = (V, T, P, S) we build a larger set P1 of productions containing P and closed under the two rules
1. if A → w1Bw2 and B → ε are in P1 then A → w1w2 is in P1
2. if A → B and B → w are in P1 then so is A → w
We add only productions whose right-hand side is a substring of an old right-hand side, so this process stops.
It can be shown that L(V, T, P1, S) = L(G), and that if P′ is the set of productions in P1 that are neither ε- nor unit productions, then L(V, T, P′, S) = L(G) − {ε}
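
One way to see this process concretely is as a fixed-point computation. A sketch, assuming productions are encoded as pairs of a nonterminal (an uppercase character) and a right-hand-side string; Prod, step and closure are names chosen here.

import Data.List (nub)

type Prod = (Char, String)

-- One closure step: add all consequences of rules 1 and 2.
step :: [Prod] -> [Prod]
step ps = nub (ps ++ rule1 ++ rule2)
  where
    -- rule 1: from A -> w1 B w2 and B -> epsilon, add A -> w1 w2
    rule1 = [ (a, pre ++ drop 1 post)
            | (a, w) <- ps, (b, "") <- ps
            , i <- [0 .. length w - 1]
            , let (pre, post) = splitAt i w
            , take 1 post == [b] ]
    -- rule 2: from A -> B and B -> w, add A -> w
    rule2 = [ (a, w) | (a, [b]) <- ps, b `elem` ['A'..'Z']
                     , (b', w) <- ps, b' == b ]

-- Iterate until nothing new appears; this terminates because every new
-- right-hand side is a substring of an old one.
closure :: [Prod] -> [Prod]
closure ps = let ps' = step ps
             in if length ps' == length ps then ps else closure ps'

-- closure [('S',"aSb"),('S',"SS"),('S',"")] adds ('S',"ab") and ('S',"S"),
-- matching the example on the next slide.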

CFG

[40]

Eliminating - and unit productions

Example: If we start from the grammar

S → aSb | SS | ε
we first get the new productions
S → ab | S
and if we eliminate the ε- and unit productions we get
S → aSb | SS | ab

CFG

[41]

Eliminating - and unit productions

Example: If we start from the grammar

S → AB
A → aAA | ε
B → bBB | ε

we first get the new productions

S → A | B | ε
A → aA | a
B → bB | b

and if we eliminate the ε- and unit productions we get

S → AB | aAA | aA | a | bBB | bB | b
A → aAA | aA | a
B → bBB | bB | b

CFG

[42]

Eliminating Useless Symbols

A symbol X is useful if there is some derivation S ⇒* αXβ ⇒* w where w is in T*

X can be in V or T
X is useless iff it is not useful
X is generating iff X ⇒* w for some w in T*
X is reachable iff S ⇒* αXβ for some α, β

CFG

[43]

Reachable Symbols

By analogy with accessible states, we can define accessible or reachable symbols. We give an inductive definition.

BASIS: The start symbol S is reachable
INDUCTION: If A is reachable and A → w is a production, then all symbols occurring in w are reachable

CFG

[44]

Reachable Symbols
Example: Consider the following CFG
S → aB | BC
B → DB | C
A → aA | c | aDb
C → b
D → B

Then S is accessible, hence also B and C, and hence D is accessible.
But A is not accessible.
We can take A away from this grammar and we get the same language
S → aB | BC
B → DB | C
C → b
D → B
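
The inductive definition can be run directly as a fixed-point computation. A sketch: the name reachable and the pair encoding of productions are assumptions made here.

import Data.List (nub)

-- Reachable symbols: start from the start symbol and keep adding every
-- symbol that occurs in the right-hand side of a production whose
-- left-hand side is already reachable.
reachable :: Char -> [(Char, String)] -> [Char]
reachable s ps = go [s]
  where
    go rs = let rs' = nub (rs ++ concat [ w | (a, w) <- ps, a `elem` rs ])
            in if length rs' == length rs then rs else go rs'

-- For the grammar above:
-- reachable 'S' [('S',"aB"),('S',"BC"),('B',"DB"),('B',"C"),
--                ('A',"aA"),('A',"c"),('A',"aDb"),('C',"b"),('D',"B")]
-- == "SaBCDb"   -- A never appears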

CFG

[45]

Generating Symbols

We define when an element of V ∪ T (terminal or nonterminal symbol) is generating, by an inductive definition

BASIS: all elements of T are generating
INDUCTION: if there is a production X → w where all symbols occurring in w are generating, then X is generating
This gives exactly the generating variables

CFG

[46]

Generating Symbols

Example: We consider
S aS | W | U
U a

W aW

V aa

Then U, V are generating because U a

V aa

Hence S is generating because S U


W is not generating, we have only W aW for production for W

46
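
The symmetric fixed point computes the generating symbols; note that here information flows from right-hand sides to left-hand sides. A sketch, with the name generating chosen here.

import Data.Char (isLower)
import Data.List (nub)

-- Terminals (lowercase) are generating; a variable becomes generating as
-- soon as one of its productions has an all-generating right-hand side.
generating :: [(Char, String)] -> [Char]
generating ps = go (nub [ c | (_, w) <- ps, c <- w, isLower c ])
  where
    go gs = let gs' = nub (gs ++ [ a | (a, w) <- ps, all (`elem` gs) w ])
            in if length gs' == length gs then gs else go gs'

-- generating [('S',"aS"),('S',"W"),('S',"U"),('U',"a"),('W',"aW"),('V',"aa")]
-- == "aUVS"   -- W is never added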

CFG

[47]

Eliminating Useless Symbols

To eliminate useless symbols in a grammar G, first eliminate all non-generating symbols: we get an equivalent grammar G1. Then eliminate all symbols in G1 that are not reachable.

We get a grammar G2 that is equivalent to G1 and to G
We have to do this in this order
Example: For the grammar
S → AB | a
A → b

CFG

[48]

B is not generating; we get the grammar

S → a
A → b

and then A is not reachable; we get the grammar

S → a

CFG

[49]

Elimination of useless variables

S → gAe | aYB | CY
B → dd | D
D → n
C → jVB | gi
U → kW
V → baXXX | oV
X → fV
A → bBY | ooC
W → c
Y → Yhm

CFG

[50]

Elimination of useless variables

Simplified grammar (Y, V, X are not generating; after removing them, B, D, U, W become unreachable):
S → gAe
A → ooC
C → gi

CFG

[51]

Linear production systems

Several algorithms we have seen are instances of graph-searching algorithms, i.e. of derivability in linear production systems

CFG

[52]

Linear Production systems


For testing accessibility in the grammar
S → aB | BC
B → DB | C
A → aA | c | aDb
C → b | B

we associate the production system

→ S
S → B, S → C
A → A, A → D
B → B, B → D, B → C
C → B

and we can produce S, B, D, C

CFG

[53]

Linear Production systems

A lot of problems in elementary logic are of this form

A → B
B → C
A, C → D

What can we deduce from A?

CFG

[54]

Linear Production systems

For computing generating symbols we have a more general form of production system

For instance for the grammar
A → ABC, A → C, B → Ca, C → a

we associate the following production system

A, B, C → A
C → A
C → B
→ C

and we can produce C, B, A. There is an algorithm for this kind of problem in 7.4.3
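
A sketch of that algorithm in its general (naive) form: a rule fires as soon as all of its premises have been produced, and we iterate to a fixed point. Rule and derivable are names chosen here; this is only an illustration of the algorithm referenced in 7.4.3, not its efficient version.

import Data.List (nub)

-- A rule: premises => conclusion. A fact is a rule with no premises.
type Rule a = ([a], a)

derivable :: Eq a => [Rule a] -> [a]
derivable rules = go []
  where
    go facts =
      let facts' = nub (facts ++ [ c | (ps, c) <- rules, all (`elem` facts) ps ])
      in if length facts' == length facts then facts else go facts'

-- The production system of this slide:
-- derivable [(["A","B","C"],"A"), (["C"],"A"), (["C"],"B"), ([],"C")]
-- == ["C","A","B"]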

CFG

[55]

Chomsky Normal Form

Definition: A CFG is in Chomsky Normal Form (CNF) iff all productions are of the form A → BC or A → a
Theorem: For any CFG G there is a CFG G′ in Chomsky Normal Form such that L(G′) = L(G) − {ε}

CFG

[56]

Chomsky Normal Form

We can assume that G has no ε- or unit productions. For each terminal a we introduce a new nonterminal Aa with the production

Aa → a
We can then assume that all productions are of the form A → a or A → B1B2 . . . Bk with k ≥ 2
If k > 2 we introduce C with productions A → B1C and C → B2 . . . Bk, until we have only right-hand sides of length ≤ 2
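
The splitting step can be written down directly; a sketch in which long right-hand sides are cut in two until they are binary. The numbering scheme for fresh variables is an assumption made here.

type Prod = (String, [String])

-- Replace A -> B1 B2 ... Bk (k > 2) by A -> B1 C and C -> B2 ... Bk,
-- repeating until every right-hand side has at most two symbols.
binarize :: Int -> [Prod] -> [Prod]
binarize _ [] = []
binarize n ((a, bs) : ps)
  | length bs <= 2 = (a, bs) : binarize n ps
  | otherwise =
      let c = "C" ++ show n                     -- fresh variable
      in (a, [head bs, c]) : binarize (n + 1) ((c, tail bs) : ps)

-- binarize 1 [("S",["A","S","B"])]
-- == [("S",["A","C1"]), ("C1",["S","B"])]  -- as in the example that follows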

CFG

[57]

Chomsky Normal Form

Example: For the grammar

S → aSb | SS | ab
we first get
S → ASB | SS | AB
A → a
B → b
and then
S → AC | SS | AB
A → a
B → b
C → SB
which is in Chomsky Normal Form

CFG

[58]

The Chomsky Hierarchy


Noam Chomsky 1956
Four types of grammars
Type 0: no restrictions
Type 1: Context-sensitive, rules A
Type 2: Context-free or context-insensitive
Type 3: Regular, rules of the form A Ba or A aB or A 
Type 3 Type 2 Type 1 Type 0
Grammars for programming languages are usually Type 2
58

CFG

[59]

Context-Free Languages and Regular Languages

Theorem: If L is regular then L is context-free.

Proof: We know L = L(A) for a DFA A. We have A = (Q, Σ, δ, q0, F).
We define a CFG G = (Q, Σ, P, q0) where P is the set of productions q → aq′ for δ(q, a) = q′, together with q → ε for q ∈ F. We then have q ⇒* uq′ iff δ̂(q, u) = q′, and q ⇒* u iff δ̂(q, u) ∈ F. In particular u ∈ L(G) iff u ∈ L(A).
A grammar where all productions are of the form A → aB or A → ε is called right regular

CFG

[60]

Pumping Lemma for Right Regular Grammars

Let G = (V, T, P, S) be a right regular grammar, and let N be |V|.

If a1 . . . an is a string of length n ≥ N, any derivation
S ⇒ a1B1 ⇒ a1a2B2 ⇒* a1 . . . aiA ⇒* a1 . . . ajA ⇒* a1 . . . an
has length n and there is at least one variable A which is used twice (pigeonhole principle)
If x = a1 . . . ai and y = ai+1 . . . aj and z = aj+1 . . . an, we have |xy| ≤ N and
xy^k z ∈ L(G) for all k

CFG

[61]

Pumping Lemma for Context-Free Languages

Let L be a context-free language

Theorem: There exists N such that if z ∈ L and N ≤ |z| then one can write z = uvwxy such that

|vx| > 0,
|vwx| ≤ N,
u v^k w x^k y ∈ L for all k

CFG

[62]

Pumping Lemma for Context-Free Languages

Theorem: The language {a^k b^k c^k | k ≥ 0} is not context-free

Proof: Assume L to be context-free. Then we have N as stated in the Pumping Lemma. Consider z = a^N b^N c^N. We have N ≤ |z| so we can write z = uvwxy such that

|vx| > 0,
|vwx| ≤ N,
u v^k w x^k y ∈ L for all k

Since |vwx| ≤ N there is one letter d ∈ {a, b, c} that does not occur in vwx, and since |vx| > 0 there is another letter e ≠ d that occurs in vx. Then e has more occurrences than d in u v^2 w x^2 y, and this contradicts u v^2 w x^2 y ∈ L. Q.E.D.

CFG

[63]

Proof of the CFL Pumping Lemma

We can assume that the language is presented by a grammar in Chomsky Normal Form, working with L − {ε}

The crucial remark is that a binary tree of height p + 1 has at most 2^p leaves
The height of a binary tree is the number of nodes on a longest path from the root

CFG

[64]

Proof of the CFL Pumping Lemma

Example: the Chomsky Normal Form grammar

S → AC | AB
A → a
B → b
C → SB

Consider a parse tree for a^4 b^4 corresponding to the derivation

S ⇒ AC ⇒ aC ⇒ aSB ⇒ aACB ⇒ aaCB ⇒ aaSBB
⇒ a^2 ACBB ⇒ a^3 CBB ⇒ a^3 SBBB ⇒ a^3 ABBBB ⇒ a^4 BBBB
⇒ a^4 bBBB ⇒ a^4 b^2 BB ⇒ a^4 b^3 B ⇒ a^4 b^4
The symbol S appears twice on a path: u = aa, v = a, w = ab, x = b, y = bb

CFG

[65]

Non closure under intersection

T = {a, b, c}
L1 = {a^k b^k c^m | k, m ≥ 0}
L2 = {a^m b^k c^k | k, m ≥ 0}
L1 and L2 are CFLs, but the intersection
L1 ∩ L2 = {a^k b^k c^k | k ≥ 0}
is not CF

CFG

[66]

Non closure under intersection

However one can show (we will not do the proof in this course, but you should know the result)
Theorem: If L1 is context-free and L2 is regular then L1 ∩ L2 is context-free
Application: The following language, for Σ = {0, 1},
L = {uu | u ∈ Σ*}
is not context-free, by considering the intersection with L(0*1*0*1*)
One can show that the complement of L is context-free!

CFG

[67]

Closure under union

If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2 (with start symbols S1, S2) and the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1 | S2}, S)
It is then direct to show that L(G) = L(G1) ∪ L(G2), since a derivation has the form
S ⇒ S1 ⇒* u
or
S ⇒ S2 ⇒* u

CFG

[68]

Non-Closure Under Complement

L1 ∩ L2 = ¬(¬L1 ∪ ¬L2)
So CFLs cannot be closed under complement in general: otherwise they would be closed under intersection.

CFG

[69]

Closure Under Concatenation

If L1 = L(G1) and L2 = L(G2) with disjoint sets of variables V1 and V2 and the same alphabet T, we can define
G = (V1 ∪ V2 ∪ {S}, T, P1 ∪ P2 ∪ {S → S1S2}, S)
It is then direct to show that L(G) = L(G1)L(G2), since a derivation has the form
S ⇒ S1S2 ⇒* u1u2
with
S1 ⇒* u1, S2 ⇒* u2

CFG

[70]

LL(1) parsing

A grammar is LL(1) if, in a sequence of leftmost productions, we can decide which production to use by looking only at the first symbol of the string to be parsed

For instance S → +SS | a | b is LL(1)
Any regular grammar S → aA, . . . , A → bA | ε is LL(1) iff it corresponds to a deterministic FA
There are algorithms to decide whether a grammar is LL(1) (not done in this course)
Any LL(1) grammar is unambiguous (because by definition there is at most one leftmost derivation for any string)

CFG

[71]

Grammar transformations

The grammar
S → AB, A → aA | a, B → bB | c
is equivalent to the grammar
S → aAB, A → aA | ε, B → bB | c

CFG

[72]

Grammar transformations

The grammar
S → Bb
B → Sa | a

which is not LL(1), is equivalent to the grammar

S → abT
T → abT | ε

which is LL(1)

CFG

[73]

Grammar transformations

The grammar
A → Aa | b
is equivalent to the grammar
A → bB, B → aB | ε
In general, however, there is no algorithm to decide L(G1) = L(G2)
For regular expressions, we do have an algorithm to decide L(E1) = L(E2)

CFG

[74]

The CYK Algorithm

We now present an algorithm to decide whether w ∈ L(G), assuming G to be in Chomsky Normal Form.

This is an example of the technique of dynamic programming
Let n be |w|. The naive algorithm (trying all derivations of length < 2n) may be exponential. This technique gives an O(n^3) algorithm!

CFG

[75]

dynamic programming

fib 0 = 1
fib 1 = 1
fib n = fib (n - 2) + fib (n - 1)

fib 5 calls fib 4 and fib 3, and fib 4 calls fib 3 again
So in a top-down computation there is duplication of work (if one does not use memoization)

CFG

[76]

dynamic programming

For a bottom-up computation:

fib 2 = 2, fib 3 = 3, fib 4 = 5, fib 5 = 8
What is going on in the CYK algorithm or the Earley algorithm is similar
S → AB | BC
A → BA | a
B → CC | b
C → AB | a

Is bab ∈ L(G)? Is aba ∈ L(G)?
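
In Haskell the bottom-up table is naturally a lazy list; a one-line sketch (fibs is a name chosen here):

fibs :: [Integer]
fibs = 1 : 1 : zipWith (+) fibs (tail fibs)   -- each entry computed once

-- fibs !! 5 == 8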

CFG

[77]

dynamic programming

The idea is to represent bab as the collection of the facts b(0, 1), a(1, 2), b(2, 3)
We then compute the facts X(i, k) for i < k, by induction on k − i
Only one rule:
If we have a production C → AB, and A is in X(i, j) and B is in X(j, k), then C is in X(i, k)
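
A compact rendering of this computation, for the grammar of the previous slide. It is a sketch: the list-based memo table is a choice made here (a real implementation would use an array), not the slides' code.

import Data.List (nub)

-- cyk g w returns the table of all facts: the cell ((i,k), xs) lists the
-- variables X with X(i,k), computed by induction on k - i.
cyk :: [(Char, String)] -> String -> [((Int, Int), [Char])]
cyk g w = table
  where
    n = length w
    table = [ ((i, i + d), cell i (i + d)) | d <- [1 .. n], i <- [0 .. n - d] ]
    cell i k
      | k - i == 1 = [ x | (x, [c]) <- g, c == w !! i ]
      | otherwise  = nub [ x | (x, [a, b]) <- g
                             , j <- [i + 1 .. k - 1]
                             , a `elem` look i j, b `elem` look j k ]
    look i k = head [ xs | ((i', k'), xs) <- table, (i', k') == (i, k) ]

g1 :: [(Char, String)]
g1 = [('S',"AB"),('S',"BC"),('A',"BA"),('A',"a"),
      ('B',"CC"),('B',"b"),('C',"AB"),('C',"a")]

-- For w = "bab" the cell (0,3) is "SC", so S(0,3) holds and bab is in L(G);
-- for w = "aba" the cell (0,3) is "B", so aba is not.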

CFG

[78]

The CYK Algorithm

The algorithm is best understood in terms of production systems

Example: the grammar
S → AB | BA | SS | AC | BD
A → a
B → b
C → SB
D → SA

becomes the production system

CFG

[79]

The CYK Algorithm

A(x, y), B(y, z) → S(x, z)
B(x, y), A(y, z) → S(x, z)
S(x, y), S(y, z) → S(x, z)
A(x, y), C(y, z) → S(x, z)
B(x, y), D(y, z) → S(x, z)
S(x, y), B(y, z) → C(x, z)
S(x, y), A(y, z) → D(x, z)
a(x, y) → A(x, y)
b(x, y) → B(x, y)

CFG

[80]

The CYK Algorithm

The problem whether one can derive S ⇒* aabbab is transformed into the problem: can one produce S(0, 6) in this production system, given the facts
a(0, 1), a(1, 2), b(2, 3), b(3, 4), a(4, 5), b(5, 6)

CFG

[81]

The CYK Algorithm

For this we apply a forward chaining / bottom-up sequence of productions

A(0, 1), A(1, 2), B(2, 3), B(3, 4), A(4, 5), B(5, 6)
S(1, 3), S(3, 5), S(4, 6)
S(1, 5), C(1, 4), C(3, 6)
S(0, 4), . . .
S(0, 6)

CFG

[82]

The CYK Algorithm

For instance the fact that C(3, 6) is produced corresponds to the derivation
C ⇒ SB ⇒ BAB ⇒ bAB ⇒ baB ⇒ bab
In this way, we get a solution in O(n^3)!

CFG

[83]

Forward-chaining inference
This idea actually works for any grammar. For instance
S → SS | aSb | ε
is represented by the production system
→ S(x, x)
S(x, y), S(y, z) → S(x, z)
a(x, y), S(y, z), b(z, t) → S(x, t)

and the problem to decide S ⇒* aabb is replaced by the problem to derive S(0, 4) from the facts
a(0, 1), a(1, 2), b(2, 3), b(3, 4)

CFG

[84]

Forward-chaining inference

This is the main idea behind the Earley algorithm

It is mainly used for parsing in computational linguistics
Earley parsers are interesting because they can parse all context-free languages

CFG

[85]

Complement of a CFL

We have seen that CFLs are closed under union but not under intersection
It follows that they are not closed under complement
Here is an explicit example: one can show that the complement of
{a^n b^n c^n | n ≥ 0}
is a CFL

CFG

[86]

Undecidable Problems

We have given algorithms to decide L(G) ≠ ∅ and w ∈ L(G). What is surprising is that it can be shown that there are no algorithms for the following problems:

Given G1 and G2, do we have L(G1) ⊆ L(G2)? Do we have L(G1) = L(G2)?
Given G and a regular expression R, do we have L(G) = L(R)? L(R) ⊆ L(G)?
Do we have L(G) = T* where T is the alphabet of G? (Compare with the case of regular languages)
Given G, is G ambiguous?

CFG

[87]

Undecidable Problems

One reduces these problems to the Post Correspondence Problem:

Given u1, . . . , un and v1, . . . , vn in {0, 1}*, is it possible to find i1, . . . , ik such that
u_{i1} . . . u_{ik} = v_{i1} . . . v_{ik}?
Example: 1, 10, 011 and 101, 00, 11
Challenge example: 001, 01, 01, 10 and 0, 011, 101, 001

CFG

[88]

Haskell Program

-- isPrefix xs ys tests whether xs is a prefix of ys
isPrefix [] ys = True
isPrefix (x:xs) (y:ys) = x == y && isPrefix xs ys
isPrefix xs ys = False

-- a candidate pair is compatible if one string is a prefix of the other
isComp (xs, ys) = isPrefix xs ys || isPrefix ys xs

exists p [] = False
exists p (x:xs) = p x || exists p xs

-- exhibit returns the first element satisfying p (assuming one exists)
exhibit p (x:xs) = if p x then x else exhibit p xs

CFG

[89]

Haskell Program

-- number the pairs 1, 2, 3, ...
addNum k [] = []
addNum k (x:xs) = (k, x) : addNum (k + 1) xs

-- extend every candidate (ns,(u,v)) with every indexed pair (n,(s,t))
nextStep xs ys =
  concat (map (\ (n,(s,t)) ->
                 map (\ (ns,(u,v)) -> (ns ++ [n], (u ++ s, v ++ t))) ys)
              xs)

CFG

[90]

Haskell Program

mainLoop xs ys =
  let
    bs = filter (isComp . snd) ys        -- keep only the compatible candidates
    prop (_, (u, v)) = u == v            -- a solution: both strings are equal
  in
    if exists prop bs then exhibit prop bs
    else if bs == [] then error "NO SOLUTION"
    else mainLoop xs (nextStep xs bs)

CFG

[91]

Haskell Program

post xs =
  let
    as = addNum 1 xs
  in mainLoop as (map (\ (n, z) -> ([n], z)) as)

xs1 = [("1","101"), ("10","00"), ("011","11")]
xs2 = [("001","0"), ("01","011"), ("01","101"), ("10","001")]

CFG

[92]

Haskell Program

Main> post xs1
([1,3,2,3],("101110011","101110011"))

Main> post xs2
ERROR - Garbage collection fails to reclaim sufficient space
[2,2,2,3,2,2,2,3,3,4,4,6,8,8,15,
21,15,17,18,24,15,12,12,18,18,24,24,45,
63,66,84,91,140,182,201,346,418,324,330,321,423,459,780

CFG

[93]

Post Correspondence Problem and CFL

To the sequence u1, . . . , un we associate the following grammar GA

The alphabet is {0, 1, a1, . . . , an}
The productions are
A → u1 a1 | . . . | un an | u1 A a1 | . . . | un A an
This grammar is unambiguous

CFG

[94]

Post Correspondence Problem and CFL

To the sequence v1, . . . , vn we associate the following grammar GB

The alphabet is the same, {0, 1, a1, . . . , an}
The productions are
B → v1 a1 | . . . | vn an | v1 B a1 | . . . | vn B an
This grammar is unambiguous

CFG

[95]

Post Correspondence Problem and CFL

Theorem: We have L(GA) ∩ L(GB) ≠ ∅ iff the Post Correspondence Problem for u1, . . . , un and v1, . . . , vn has a solution

CFG

[96]

Post Correspondence Problem and CFL

Finally we have the grammar G with productions
S → A | B
(together with the productions of GA and GB)
Theorem: The grammar G is ambiguous iff the Post Correspondence Problem for u1, . . . , un and v1, . . . , vn has a solution

CFG

[97]

Post Correspondence Problem and CFL

The complement of L(GA) is CF

We see this on one example: u1 = 0, u2 = 10
The complement of L(GB) is CF
Hence we have a grammar GC for the union of the complement of L(GA) and the complement of L(GB)

CFG

[98]

Post Correspondence Problem and CFL

Theorem: We have L(GC) = T* iff L(GA) ∩ L(GB) = ∅

Hence the problems
L(E) = L(G)
L(E) ⊆ L(G)
are in general undecidable
