Theory of Computation Class Notes

based on the books by Sudkamp and by Hopcroft, Motwani and Ullman


Contents

1 Introduction
   1.1 Sets
   1.2 Functions and Relations
   1.3 Countable and Uncountable Sets
   1.4 Proof Techniques

2 Languages and Grammars
   2.1 Languages
   2.2 Regular Expressions
   2.3 Grammars
   2.4 Classification of Grammars and Languages
   2.5 Normal Forms of Context-Free Grammars
       2.5.1 Chomsky Normal Form (CNF)
       2.5.2 Greibach Normal Form (GNF)

3 Finite State Automata
   3.1 Deterministic Finite Automata (DFA)
   3.2 Nondeterministic Finite Automata (NFA)
   3.3 NFA with Epsilon Transitions (NFA-ε or ε-NFA)
   3.4 Finite Automata and Regular Sets
       3.4.1 Removing Nondeterminism
       3.4.2 Expression Graphs

4 Regular Languages and Sets
   4.1 Regular Grammars and Finite Automata
   4.2 Closure Properties of Regular Sets
   4.3 Pumping Lemma for Regular Languages

5 Pushdown Automata and Context-Free Languages
   5.1 Pushdown Automata
   5.2 Variations on the PDA Theme
   5.3 Pushdown Automata and Context-Free Languages
   5.4 The Pumping Lemma for Context-Free Languages
   5.5 Closure Properties of Context-Free Languages
   5.6 A Two-Stack Automaton

6 Turing Machines
   6.1 The Standard Turing Machine
       6.1.1 Notation for the Turing Machine
   6.2 Turing Machines as Language Acceptors
   6.3 Alternative Acceptance Criteria
   6.4 Multitrack Machines
   6.5 Two-Way Tape Machines
   6.6 Multitape Machines
   6.7 Nondeterministic Turing Machines
   6.8 Turing Machines as Language Enumerators

7 The Chomsky Hierarchy
   7.1 The Chomsky Hierarchy

8 Decidability
   8.1 Decision Problems
   8.2 The Church-Turing Thesis
   8.3 The Halting Problem for Turing Machines
   8.4 A Universal Machine
   8.5 The Post Correspondence Problem

9 Undecidability
   9.1 Problems That Computers Cannot Solve
       9.1.1 Programs that Print "Hello, World"
       9.1.2 The Hypothetical "Hello, World" Tester
       9.1.3 Reducing One Problem to Another
   9.2 A Language That Is Not Recursively Enumerable
       9.2.1 Enumerating the Binary Strings
       9.2.2 Codes for Turing Machines
       9.2.3 The Diagonalization Language
       9.2.4 Proof that Ld is not Recursively Enumerable
       9.2.5 Complements of Recursive and RE Languages
       9.2.6 The Universal Language
       9.2.7 Undecidability of the Universal Language
   9.3 Undecidable Problems About Turing Machines
       9.3.1 Reductions
       9.3.2 Turing Machine That Accepts the Empty Language
       9.3.3 Rice's Theorem and Properties of RE Languages
   9.4 Post's Correspondence Problem
       9.4.1 The Modified PCP
   9.5 Other Undecidable Problems
       9.5.1 Undecidability of Ambiguity for CFGs
       9.5.2 The Complement of a List Language

10 Intractable Problems
   10.1 The Classes P and NP
       10.1.1 Problems Solvable in Polynomial Time
       10.1.2 An Example: Kruskal's Algorithm
       10.1.3 An NP Example: The Travelling Salesman Problem
       10.1.4 NP-complete Problems
       10.1.5 The Satisfiability Problem
       10.1.6 NP-Completeness of 3SAT

List of Figures

2.1 Derivation tree

3.1 Example DFA
3.2 L(M1) ∪ L(M2)
3.3 L(M1)L(M2)
3.4 L(M1)*
3.5 Sample Union Construction
3.6 Machines that accept the primitive regular sets
3.7 An NFA-ε
3.8 Equivalent DFA
3.9 Expression Graph
3.10 Expression Graph Transformation
3.11 (a) w*, (b) w1* w2 (w3 ∪ w4 (w1)* w2)*
3.12 Example 3.4.3 - 1(a),(b)
3.13 Example 3.4.3 - 2(a),(b)
3.14 Example 3.4.3 - 2(c),(d)
3.15 Example 3.4.3 - 3

4.1 NFA accepting a*(a ∪ b+)
4.2 Example 4.1.2

5.1 L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0}
5.2 PDA with L(M) = {ww^R}
5.3 Pumping Lemma for CFL

6.1 A Turing Machine
6.2 Turing Machine COPY
6.3 TM accepting (a ∪ b)*aa(a ∪ b)*
6.4 TM accepting a^i b^i c^i
6.5 A k-tape TM for L = a^i b^i c^i

8.1 Halting Machine
8.2 Turing Machine D with R(M) as input
8.3 Turing Machine D with R(D) as input
8.4 Universal Machine
8.5 Post Correspondence System
8.6 Post Correspondence Solution
8.7 Example 8.5.1

9.1 Hello-World Program
9.2 Fermat's last theorem expressed as a hello-world program
9.3 A hypothetical program H that is a hello-world detector
9.4 H1 behaves like H, but it says "hello, world" instead of "no"
9.5 H2 behaves like H1, but uses its input P as both P and I
9.6 What does H2 do when given itself as input?
9.7 Reduction of P1 to P2
9.8 The table that represents acceptance of strings by Turing machines
9.9 Relationship between the recursive, RE, and non-RE languages
9.10 Construction of a TM accepting the complement of a recursive language
9.11 Simulation of two TMs accepting a language and its complement
9.12 Organization of a universal Turing machine
9.13 Reduction of Ld to Lu
9.14 Reductions turn positive instances into positive and negative to negative
9.15 Construction of an NTM to accept Lne
9.16 Plan of TM M′ constructed from (M, w)
9.17 Construction of M′ for the proof of Rice's Theorem
9.18 Turing Machine that accepts after guessing 10 strings
9.19 Turing Machine that simulates M on w
9.20 Turing Machine for L ∪ Lu

10.1 A graph

Chapter 1

Introduction
1.1 Sets

A set is a collection of elements. To indicate that x is an element of the set S, we write x ∈ S. The
statement that x is not in S is written as x ∉ S. A set is specified by enclosing some description of
its elements in curly braces; for example, the set of all natural numbers 0, 1, 2, ... is denoted by
N = {0, 1, 2, 3, ...}
We use ellipses (i.e., ...) when the meaning is clear; thus Jn = {1, 2, 3, ..., n} represents the set of all
natural numbers from 1 to n.
When the need arises, we use more explicit notation, in which we write
S = {i | i > 0, i is even}
for the last example. We read this as "S is the set of all i such that i is greater than zero and i is even."
Considering a universal set U, the complement S̄ of S is defined as
S̄ = {x | x ∈ U ∧ x ∉ S}
The usual set operations are union (∪), intersection (∩), and difference (−), defined as
S1 ∪ S2 = {x | x ∈ S1 ∨ x ∈ S2}
S1 ∩ S2 = {x | x ∈ S1 ∧ x ∈ S2}
S1 − S2 = {x | x ∈ S1 ∧ x ∉ S2}
The set with no elements, called the empty set, is denoted by ∅. It is obvious that
S ∪ ∅ = S − ∅ = S
S ∩ ∅ = ∅
∅̄ = U
and the complement of S̄ is S itself.
A set S1 is said to be a subset of S if every element of S1 is also an element of S. We write this as
S1 ⊆ S
If S1 ⊆ S, but S contains an element not in S1, we say that S1 is a proper subset of S; we write this
as
S1 ⊂ S

The following identities are known as De Morgan's laws:
1. (S1 ∪ S2)‾ = S̄1 ∩ S̄2
2. (S1 ∩ S2)‾ = S̄1 ∪ S̄2
We prove the first law, (S1 ∪ S2)‾ = S̄1 ∩ S̄2:
x ∈ (S1 ∪ S2)‾
⇔ x ∈ U and x ∉ S1 ∪ S2                      (def. complement)
⇔ x ∈ U and not (x ∈ S1 or x ∈ S2)           (def. union)
⇔ x ∈ U and (x ∉ S1 and x ∉ S2)              (negation of disjunction)
⇔ (x ∈ U and x ∉ S1) and (x ∈ U and x ∉ S2)
⇔ x ∈ S̄1 and x ∈ S̄2                          (def. complement)
⇔ x ∈ S̄1 ∩ S̄2                                (def. intersection)

If S1 and S2 have no common element, that is,
S1 ∩ S2 = ∅,
then the sets are said to be disjoint.
A set is said to be finite if it contains a finite number of elements; otherwise it is infinite. The size of
a finite set is the number of elements in it; this is denoted by |S| (or #S).
A set may have many subsets. The set of all subsets of a set S is called the power set of S and is
denoted by 2^S or P(S). Observe that 2^S is a set of sets.
Example 1.1.1
If S is the set {1, 2, 3}, then its power set is
2^S = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}}
Here |S| = 3 and |2^S| = 8. This is an instance of a general result: if S is finite, then
|2^S| = 2^|S|
Proof: (By induction on the number of elements in S.)
Basis: |S| = 1. Then 2^S = {∅, S}, so |2^S| = 2^1 = 2.
Induction Hypothesis: Assume the property holds for all sets with k elements.
Induction Step: Show that the property holds for all sets with k + 1 elements. Denote
Sk+1 = {y1, y2, ..., yk+1} = Sk ∪ {yk+1}
where Sk = {y1, y2, y3, ..., yk}. Every subset of Sk+1 either does not contain yk+1, in which case it is
one of the 2^k subsets of Sk (by the induction hypothesis), or contains yk+1, in which case it is obtained
by adding yk+1 to exactly one subset of Sk; the number of subsets of Sk+1 which contain yk+1 is
therefore also 2^k. Consequently |2^Sk+1| = 2 · 2^k = 2^(k+1).
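
The doubling in the induction step can be traced directly in code. The following Python sketch (ours, for illustration; not part of the text) builds 2^S one element at a time, exactly as in the proof: the subsets of Sk ∪ {yk+1} are the subsets of Sk together with yk+1 added to each of them.

    def power_set(s):
        """Build 2^S iteratively: each new element doubles the number of subsets."""
        subsets = [set()]                              # 2^(empty set) contains only the empty set
        for y in s:
            subsets += [t | {y} for t in subsets]      # old subsets, plus y added to each
        return subsets

    S = {1, 2, 3}
    assert len(power_set(S)) == 2 ** len(S)            # |2^S| = 2^|S|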

A set which has as its elements ordered sequences of elements from other sets is called the Cartesian
product of the other sets. For the Cartesian product of two sets, which itself is a set of ordered pairs,
we write
S = S1 × S2 = {(x, y) | x ∈ S1, y ∈ S2}
Example 1.1.2
Let S1 = {1, 2} and S2 = {1, 2, 3}. Then
S1 × S2 = {(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)}
Note that the order in which the elements of a pair are written matters; the pair (3, 2) is not in S1 × S2.
Example 1.1.3
If A is the set of throws of a coin, i.e., A = {head, tail}, then
A × A = {(head, head), (head, tail), (tail, head), (tail, tail)},
the set of all possible throws of two coins.
The notation is extended in an obvious fashion to the Cartesian product of more than two sets;
generally
S1 × S2 × ... × Sn = {(x1, x2, ..., xn) | xi ∈ Si}

1.2 Functions and Relations

A function is a rule that assigns to elements of one set (the function domain) a unique element of
another set (the range). We write
f : S1 → S2
to indicate that the domain of the function f is a subset of S1 and that the range of f is a subset of
S2. If the domain of f is all of S1, we say that f is a total function on S1; otherwise f is said to be a
partial function on S1.
1. Domain of f: Df = {x ∈ S1 | (x, y) ∈ f, for some y ∈ S2}
2. Range of f: Rf = {y ∈ S2 | (x, y) ∈ f, for some x ∈ S1}
3. The restriction of f to A ⊆ S1: f|A = {(x, y) ∈ f | x ∈ A}
4. The inverse f⁻¹ : S2 → S1 is {(y, x) | (x, y) ∈ f}
5. f : S1 → S1 is called a function on S1
6. If x ∈ Df then f is defined at x; otherwise f is undefined at x
7. f is a total function if Df = S1
8. f is a partial function if Df ⊂ S1
9. f is an onto function or surjection if Rf = S2. If Rf ⊂ S2 then f is a function from S1 (Df)
   into S2
10. f is a one-to-one function or injection if (f(x) = z and f(y) = z) ⇒ x = y
11. A total function f is a bijection if it is both an injection and a surjection.
A function can be represented by a set of pairs {(x1, y1), (x2, y2), ...}, where each xi is an
element in the domain of the function, and yi is the corresponding value in its range. For such
a set to define a function, each xi can occur at most once as the first element of a pair. If this
condition is not required, such a set of pairs is called a relation.

A specific kind of relation is an equivalence relation. A relation r on X is an equivalence
relation if it satisfies three rules:
the reflexivity rule: (x, x) ∈ r for all x ∈ X;
the symmetry rule: if (x, y) ∈ r then (y, x) ∈ r, for all x, y ∈ X; and
the transitivity rule: if (x, y) ∈ r and (y, z) ∈ r then (x, z) ∈ r, for all x, y, z ∈ X.
An equivalence relation on X induces a partition of X into disjoint subsets called equivalence classes
Xj, with ∪j Xj = X, such that elements from the same class belong to the relation, and any two
elements taken from different classes are not in the relation.
Example 1.2.1
Consider the relation congruence mod m (modulo m) on the set of the integers Z:
i ≡ j (mod m) if i − j is divisible by m. Z is partitioned into m equivalence classes:
{..., −2m, −m, 0, m, 2m, ...}
{..., −2m + 1, −m + 1, 1, m + 1, 2m + 1, ...}
{..., −2m + 2, −m + 2, 2, m + 2, 2m + 2, ...}
...
{..., −m − 1, −1, m − 1, 2m − 1, 3m − 1, ...}
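
These classes are easy to generate mechanically. A small Python sketch (ours, for illustration): two integers land in the same class exactly when they leave the same remainder on division by m.

    def classes_mod(m, lo=-10, hi=10):
        """Partition the integers in [lo, hi] into equivalence classes mod m."""
        classes = {r: [] for r in range(m)}
        for i in range(lo, hi + 1):
            classes[i % m].append(i)       # i and j are related iff i % m == j % m
        return classes

    print(classes_mod(3)[0])               # [-9, -6, -3, 0, 3, 6, 9]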

1.3 Countable and Uncountable Sets

Cardinality is a measure that compares the size of sets. The cardinality of a finite set is the number
of elements in it; it can thus be obtained by counting the elements of the set. Two sets X and Y have
the same cardinality if there is a total one-to-one function from X onto Y (i.e., a bijection from X to
Y). The cardinality of a set X is less than or equal to the cardinality of a set Y if there is a total
one-to-one function from X into Y. We denote the cardinality of X by #X or |X|.
A set that has the same cardinality as the set of natural numbers N is said to be countably infinite
or denumerable. Sets that are either finite or denumerable are referred to as countable sets. The
elements of a countable set can be indexed (or enumerated) using N as the index set. Sets that are
not countable are said to be uncountable.
The cardinality of denumerable sets is #N = ℵ0 (aleph-zero).
The cardinality of the set of the real numbers is #R = ℵ1 (aleph-one).
A set is infinite if it has a proper subset of the same cardinality.
Example 1.3.1
The set J = N − {0} is countably infinite; the function s(n) = n + 1 defines a one-to-one mapping
from N onto J. The set J, obtained by removing an element from N, has the same cardinality as
N. Clearly, there is no one-to-one mapping of a finite set onto a proper subset of itself. It is this
property that differentiates finite and infinite sets.
Example 1.3.2
The set of odd natural numbers is denumerable. The function f(n) = 2n + 1 establishes the bijection
between N and the set of the odd natural numbers.
A one-to-one correspondence between the natural numbers and the set of all integers exhibits
the countability of the set of integers. A correspondence is defined by the function
f(n) = ⌊n/2⌋ + 1   if n is odd
f(n) = −⌊n/2⌋      if n is even
Example 1.3.3
#Q+ = #J = #N, where Q+ is the set of the positive rational numbers p/q > 0, with p and q
integers, q ≠ 0.

1.4 Proof Techniques

We will give examples of proof by induction, proof by contradiction, and proof by Cantor diagonalization.
In proof by induction, we have a sequence of statements P1, P2, ..., about which we want to make
some claim. Suppose that we know that the claim holds for all statements P1, P2, ..., up to Pn.
We then try to argue that this implies that the claim also holds for Pn+1. If we can carry out this
inductive step for all positive n, and if we have some starting point for the induction, we can say that
the claim holds for all statements in the sequence.
The starting point for an induction is called the basis. The assumption that the claim holds for
statements P1, P2, ..., Pn is the induction hypothesis, and the argument connecting the induction
hypothesis to Pn+1 is the induction step. Inductive arguments become clearer if we explicitly show
these three parts.
Example 1.4.1 Let us prove
0^2 + 1^2 + ... + n^2 = n(n+1)(2n+1)/6
by mathematical induction. We establish
(a) the basis, by substituting 0 for n and observing that both sides are 0;
(b) the induction hypothesis: we assume that the property holds for n = k, i.e.,
0^2 + 1^2 + ... + k^2 = k(k+1)(2k+1)/6
(c) the induction step: we show that the property holds for n = k + 1, i.e.,
0^2 + 1^2 + ... + (k+1)^2 = (k+1)(k+2)(2k+3)/6
Since
0^2 + 1^2 + ... + (k+1)^2 = (0^2 + 1^2 + ... + k^2) + (k + 1)^2,
and in view of the induction hypothesis, we need only show that
k(k+1)(2k+1)/6 + (k + 1)^2 = (k+1)(k+2)(2k+3)/6
The latter equality follows from simple algebraic manipulation.
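
Induction guarantees the identity for every n; a quick numeric spot-check (ours, not part of the proof) guards against transcription errors in the formula:

    for n in range(100):
        assert sum(i * i for i in range(n + 1)) == n * (n + 1) * (2 * n + 1) // 6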


In a proof by contradiction, we assume the opposite (the contrary) of the property to be proved; we
then show that this assumption leads to a contradiction and hence must be false.

Example 1.4.2
Show that √2 is not a rational number.
As in all proofs by contradiction, we assume the contrary of what we want to show. Here we assume
that √2 is a rational number, so that it can be written as
√2 = n/m,
where n and m are integers without a common factor. Rearranging √2 = n/m, we have
2m^2 = n^2
Therefore n^2 must be even. This implies that n is even, so that we can write n = 2k, whence
2m^2 = 4k^2
and
m^2 = 2k^2
Therefore m is even. But this contradicts our assumption that n and m have no common factor.
Thus, m and n in √2 = n/m cannot exist, and √2 is not a rational number.
This example exhibits the essence of a proof by contradiction. By making a certain assumption we
are led to a contradiction of the assumption or some known fact. If all steps in our argument are
logically sound, we must conclude that our initial assumption was false.
To illustrate Cantor's diagonalization method, we prove that the set A = {f | f a total function,
f : N → N} is uncountable. This is essentially a proof by contradiction; we assume that A
is countable, i.e., that we can give an enumeration f0, f1, f2, ... of A. To come to a contradiction, we
construct a new function f̂ as
f̂(x) = fx(x) + 1    for all x ∈ N
The function f̂ is constructed from the diagonal of the function values of the fi ∈ A, as represented
in the table below. For each x, f̂ differs from fx on input x. Hence f̂ does not appear in the given
enumeration. However, f̂ is total and f̂ : N → N, so f̂ ∈ A. Since such an f̂ can be constructed for
any chosen enumeration, A cannot be enumerated; hence A is uncountable.

        0       1       2     ...
f0    f0(0)   f0(1)   f0(2)   ...
f1    f1(0)   f1(1)   f1(2)   ...
f2    f2(0)   f2(1)   f2(2)   ...
f3    f3(0)   f3(1)   f3(2)   ...

Remarks:
The set of all infinite sequences of 0s and 1s is uncountable. With each infinite sequence of 0s and
1s we can associate a real number in the range [0, 1). As a consequence, the set of real numbers in
the range [0, 1) is uncountable. Note that the set of all real numbers is also uncountable.
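
The diagonal trick can be mimicked on any finite list of total functions; this small sketch (ours, for illustration) checks that the constructed f̂ disagrees with every listed function at the diagonal. The actual proof applies the same construction to an assumed complete enumeration.

    fs = [lambda x: 0, lambda x: x, lambda x: x * x, lambda x: 2 * x + 1]

    def f_hat(x):
        return fs[x](x) + 1            # differs from fs[x] at input x

    for x in range(len(fs)):
        assert f_hat(x) != fs[x](x)    # so f_hat is not any function in the list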


Chapter 2

Languages and Grammars


2.1 Languages

We start with Σ, a finite, nonempty set of symbols, called the alphabet. From the individual symbols
we construct strings (over or on Σ), which are finite sequences of symbols from the alphabet.
The empty string ε is a string with no symbols at all. Any set of strings over/on Σ is a language
over/on Σ.
Example 2.1.1
Σ = {c}:
L1 = {cc}
L2 = {c, cc, ccc}
L3 = {w | w = c^k, k = 0, 1, 2, ...}

Example 2.1.2
Σ = {a, b}:
L1 = {ab, ba, aa, bb}
L2 = {w | w = (ab)^k, k = 0, 1, 2, 3, ...} = {ε, ab, abab, ababab, ...}

The concatenation of two strings w and v is the string obtained by appending the symbols of v to the
right end of w; that is, if
w = a1 a2 ... an
and
v = b1 b2 ... bm,
then the concatenation of w and v, denoted by wv, is
wv = a1 a2 ... an b1 b2 ... bm
If w is a string, then w^n is the string obtained by concatenating w with itself n times. As a special
case, we define
w^0 = ε


for all w. Note that εw = wε = w for all w. The reverse of a string is obtained by writing the symbols
in reverse order; if w is a string as shown above, then its reverse w^R is
w^R = an ... a2 a1
If
w = uv,
then u is said to be a prefix and v a suffix of w.
The length of a string w, denoted by |w|, is the number of symbols in the string.
Note that
|ε| = 0
If u and v are strings, then the length of their concatenation is the sum of the individual lengths,
|uv| = |u| + |v|
Let us show that |uv| = |u| + |v|. To prove this by induction on the length of strings, let us define the
length of a string recursively, by
|a| = 1
|wa| = |w| + 1
for all a ∈ Σ and any string w on Σ. This definition is a formal statement of our intuitive understanding of the length of a string: the length of a single symbol is one, and the length of any string
is incremented by one if we add another symbol to it.
Basis: |uv| = |u| + |v| holds for all u of any length and all v of length 1 (by definition).
Induction Hypothesis: We assume that |uv| = |u| + |v| holds for all u of any length and all v
of length 1, 2, ..., n.
Induction Step: Take any v of length n + 1 and write it as v = wa. Then,
|v| = |w| + 1,
|uv| = |uwa| = |uw| + 1.
By the induction hypothesis (which is applicable since w is of length n),
|uw| = |u| + |w|,
so that
|uv| = |u| + |w| + 1 = |u| + |v|,
which completes the induction step.
If Σ is an alphabet, then we use Σ* to denote the set of strings obtained by concatenating zero or
more symbols from Σ, and Σ+ = Σ* − {ε} for the set of nonempty strings. The sets Σ* and Σ+ are
always infinite, since there is no limit on the length of the strings in these sets.
A language can thus be defined as a subset of Σ*. A string w in a language L is also called a word or
a sentence of L.
Example 2.1.3
Let Σ = {a, b}. Then
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, ...}.
The set
{a, aa, aab}
is a language on Σ. Because it has a finite number of words, we call it a finite language. The set
L = {a^n b^n | n ≥ 0}
is also a language on Σ. The strings aabb and aaaabbbb are words in the language L, but the string
abb is not in L. This language is infinite.
Since languages are sets, the union, intersection, and difference of two languages are immediately
defined. The complement of a language is defined with respect to Σ*; that is, the complement of L is
L̄ = Σ* − L
The concatenation of two languages L1 and L2 is the set of all strings obtained by concatenating any
element of L1 with any element of L2; specifically,
L1 L2 = {xy | x ∈ L1 and y ∈ L2}
We define L^n as L concatenated with itself n times, with the special case
L^0 = {ε}
for every language L.
Example 2.1.4
L1 = {a, aaa}
L2 = {b, bbb}
L1 L2 = {ab, abbb, aaab, aaabbb}
Example 2.1.5
For
L = {a^n b^n | n ≥ 0},
we have
L^2 = {a^n b^n a^m b^m | n ≥ 0, m ≥ 0}
The string aabbaaabbb is in L^2. The star-closure or Kleene closure of a language is defined as
L* = L^0 ∪ L^1 ∪ L^2 ∪ ...
and the positive closure as
L+ = L^1 ∪ L^2 ∪ ...
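
Truncated to strings of bounded length, these operations can be carried out directly on Python sets. The sketch below (ours, for illustration) implements concatenation and a length-bounded fragment of the Kleene closure, which is all one can compute, since L* is infinite whenever L contains a nonempty string.

    def concat(L1, L2):
        return {x + y for x in L1 for y in L2}

    def star(L, max_len):
        """The strings of L* of length at most max_len."""
        result, frontier = {""}, {""}          # L^0 = {epsilon}
        while frontier:
            frontier = {w for w in concat(frontier, L)
                        if len(w) <= max_len} - result
            result |= frontier
        return result

    L1, L2 = {"a", "aaa"}, {"b", "bbb"}
    assert concat(L1, L2) == {"ab", "abbb", "aaab", "aaabbb"}   # Example 2.1.4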

2.2 Regular Expressions

Definition 2.2.1 Let Σ be a given alphabet. Then:
1. ∅, ε (representing {ε}), and a (representing {a}) for each a ∈ Σ are regular expressions. They are
called primitive regular expressions.
2. If r1 and r2 are regular expressions, so are (r1), (r1*), (r1 + r2), and (r1 · r2).
3. A string is a regular expression if it can be derived from the primitive regular expressions by
applying a finite number of the operations +, * and concatenation.
A regular expression denotes a set of strings, which is therefore referred to as a regular set or language.
Regarding the notation of regular expressions, texts usually print them boldface; however, we
assume that it will be understood that, in the context of regular expressions, ε is used to represent
{ε} and a is used to represent {a}.
Example 2.2.1
b*(ab*ab*) is a regular expression.

Example 2.2.2
c + da*bb denotes the set {c, dbb, dabb, daabb, ...}, so that
(c + da*bb)* = {ε, c, cc, ..., dbb, dbbdbb, ..., dabb, dabbdabb, ..., cdbb, cdabb, ...}
Beyond the usual properties of + and concatenation, important equivalences involving regular expressions concern properties of the closure (Kleene star) operation. Some are given below, where α, β, γ
stand for arbitrary regular expressions:
1. (α*)* = α*
2. α*α* = α*
3. α* + α = α*
4. (α + β)γ = αγ + βγ
5. α(βα)* = (αβ)*α
6. (α + β)* = (α* + β*)*
7. (α + β)* = (α*β*)*
8. (α + β)* = (α*β)*α*

In general, the distributive law does not hold for the closure operation. For example, the statement
(α + β)* =? α* + β*
is false, because the right-hand side denotes no string in which both α and β appear.
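
The failure can be checked by brute force on a small alphabet. In this sketch (ours, for illustration), α = a and β = b; the string ab belongs to (α + β)* but to neither α* nor β*:

    def star(L, max_len):
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in L
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result

    lhs = star({"a", "b"}, 4)              # (alpha + beta)*, up to length 4
    rhs = star({"a"}, 4) | star({"b"}, 4)  # alpha* + beta*, up to length 4
    assert "ab" in lhs and "ab" not in rhs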

2.3 Grammars

Definition 2.3.1 A grammar G is defined as a quadruple
G = (V, Σ, S, P)
where
V is a finite set of symbols called variables or nonterminals,
Σ is a finite set of symbols called terminal symbols or terminals,
S ∈ V is a special symbol called the start symbol,
P is a finite set of productions or rules or production rules.
We assume V and Σ are non-empty and disjoint sets.


Production rules specify the transformation of one string into another. They are of the form
x → y
where
x ∈ (V ∪ Σ)+ and
y ∈ (V ∪ Σ)*.
Given a string w of the form
w = uxv,
we say that the production x → y is applicable to this string, and we may use it to replace x with y,
thereby obtaining a new string z = uyv. This is written
w ⇒ z;
we say that w derives z, or that z is derived from w.
Successive strings are derived by applying the productions of the grammar in arbitrary order. A
production can be used whenever it is applicable, and it can be applied as often as desired. If
w1 ⇒ w2 ⇒ w3 ⇒ ... ⇒ wn
we say that w1 derives wn, and write w1 ⇒* wn.
The * indicates that an unspecified number of steps (including zero) can be taken to derive wn from
w1. Thus
w ⇒* w
is always the case. If we want to indicate that at least one production must be applied, we can write
w ⇒+ v
Let G = (V, Σ, S, P) be a grammar. Then the set
L(G) = {w ∈ Σ* | S ⇒* w}
is the language generated by G. If w ∈ L(G), then the sequence
S ⇒ w1 ⇒ w2 ⇒ ... ⇒ w
is a derivation of the sentence (or word) w. The strings S, w1, w2, ... are called sentential forms of
the derivation.


Example 2.3.1
Consider the grammar
G = ({S}, {a, b}, S, P)
with P given by
S → aSb
S → ε
Then
S ⇒ aSb ⇒ aaSbb ⇒ aabb,
so we can write
S ⇒* aabb.
The string aabb is a sentence in the language generated by G.
Example 2.3.2
P:
<sentence> → <Noun phrase><Verb phrase>
<Noun phrase> → <Determiner><Noun phrase> | <Adjective><Noun>
<Noun phrase> → <Article><Noun>
<Verb phrase> → <Verb><Noun phrase>
<Determiner> → This
<Adjective> → Old
<Noun> → Man | Bus
<Verb> → Missed
<Article> → The

Example 2.3.3
<expression> → <variable> | <expression><operation><expression>
<variable> → A | B | C | ... | Z
<operation> → + | − | * | /
Leftmost Derivation:
<expression> ⇒ <expression><operation><expression>
⇒ <variable><operation><expression>
⇒ A<operation><expression>
⇒ A+<expression>
⇒ A+<expression><operation><expression>
⇒ A+<variable><operation><expression>
⇒ A+B<operation><expression>
⇒ A+B*<expression>
⇒ A+B*<variable>
⇒ A+B*C

Figure 2.1: Derivation tree


This is a leftmost derivation of the string A + B * C in the grammar (corresponding to A + (B * C)).
Note that another leftmost derivation can be given for the above expression.
A grammar G (such as the one above) is called ambiguous if some string in L(G) has more than one
leftmost derivation. An unambiguous grammar for the language is the following:
<expr> → <multi expr> | <multi expr><add op><expr>
<multi expr> → <variable> | <variable><multi op><variable>
<multi op> → * | /
<add op> → + | −
<variable> → A | B | C | ... | Z

Note that, for an inherently ambiguous language L, every grammar that generates L is ambiguous.
Example 2.3.4
G: S → ε | aSb | bSa | SS
L = {w | na(w) = nb(w)}
Show that L(G) = L.
1. L(G) ⊆ L. (All strings derived by G are in L.)
For w ∈ L(G), every production of G adds as many a's as b's; hence na(w) = nb(w), and w ∈ L.
2. L ⊆ L(G).
Let w ∈ L. By definition of L, na(w) = nb(w). We show that w ∈ L(G) by induction on the length
of w.
Basis: ε is in both L and L(G). For |w| = 2, the only two strings of length 2 in L are ab and ba:
S ⇒ aSb ⇒ ab
S ⇒ bSa ⇒ ba
Induction Hypothesis: For w ∈ L with 2 ≤ |w| ≤ 2i, we assume that w ∈ L(G).
Induction Step: Let w1 ∈ L with |w1| = 2i + 2.
(a) w1 is of the form w1 = awb (or bwa), where |w| = 2i. Then w ∈ L, so w ∈ L(G) by the induction
hypothesis, and we derive w1 = awb using the rule S → aSb (respectively w1 = bwa using S → bSa).
(b) w1 = awa or w1 = bwb. Let us assign a count of +1 to a and −1 to b; thus for w1 ∈ L the total
count is 0. We now show that the count goes through 0 at least once strictly inside w1 = awa (the
case bwb is similar): after the first symbol the count is +1, and just before the last symbol it is −1,
so somewhere in between it must pass through 0. Split w1 = w′w″ at such a point; then w′ ∈ L and
w″ ∈ L. We also have |w′| ≥ 2 and |w″| ≥ 2, so that |w′| ≤ 2i and |w″| ≤ 2i, and by the induction
hypothesis w′, w″ ∈ L(G). Finally, w1 = w′w″ can be derived in G from w′ and w″ using the rule
S → SS.
Example 2.3.5
L(G) = {a^(2^n) | n ≥ 0}
G = (V, T, S, P) where
V = {S, [, ], A, D}
T = {a}
P: S → [A]
[ → [D | ε
D] → ]
DA → AAD
] → ε
A → a
For example, let us derive a^4:
S ⇒ [A]
⇒ [DA]
⇒ [AAD]
⇒ [AA]
⇒ [DAA]
⇒ [AADA]
⇒ [AAAAD]
⇒ [AAAA]
⇒ AAAA]
⇒ AAAA
⇒* aaaa = a^4
Example 2.3.6
L(G) = {w ∈ {a, b, c}* | na(w) = nb(w) = nc(w)}
V = {A, B, C, S}
T = {a, b, c}
P: S → ε | ABCS
AB → BA    BA → AB
AC → CA    CA → AC
BC → CB    CB → BC
A → a
B → b
C → c
Derive ccbaba:
S ⇒ ABCS
⇒ ABCABCS
⇒ ABCABC
⇒ ACBABC
⇒ CABABC
⇒ CABACB
⇒ CABCAB
⇒ CACBAB
⇒ CCABAB
⇒ CCBAAB
⇒ CCBABA
⇒* ccbaba
Example 2.3.7
S → ε | aSb
L(G) = {ε, ab, aabb, aaabbb, ...}
L = {a^i b^i | i ≥ 0}
To prove that L = L(G), we show:
1. L(G) ⊆ L
2. L ⊆ L(G)
2. L ⊆ L(G):
Let w ∈ L, w = a^k b^k. We apply S → aSb (k times), obtaining
S ⇒* a^k S b^k;
then, applying S → ε,
S ⇒* a^k b^k.
1. L(G) ⊆ L:
We need to show that, if w can be derived in G, then w ∈ L. ε is in the language, by definition.
We first show that all sentential forms are of the form a^i S b^i, by induction on the length of the
sentential form.
Basis: (i = 1) aSb is a sentential form, since S → aSb.
Induction Hypothesis: A sentential form of length 2i + 1 is of the form a^i S b^i.
Induction Step: A sentential form of length 2(i + 1) + 1 = 2i + 3 is derived by applying S → aSb
to a^i S b^i, giving a^i (aSb) b^i = a^(i+1) S b^(i+1).
To get a sentence, we must apply the production S → ε; i.e.,
S ⇒* a^i S b^i ⇒ a^i b^i
represents all possible derivations; hence G derives only strings of the form a^i b^i (i ≥ 0).
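
For small grammars, L(G) can also be explored by a brute-force search over sentential forms. The sketch below (ours, for illustration; variables are written as upper-case letters, terminals as lower-case) enumerates every sentence of the grammar of Example 2.3.7 with at most a given number of terminals:

    from collections import deque

    def sentences(rules, start, max_len):
        """All terminal strings derivable from `start` with at most max_len symbols."""
        seen, out = {start}, set()
        queue = deque([start])
        while queue:
            form = queue.popleft()
            if not any(c.isupper() for c in form):
                out.add(form)                      # no variables left: a sentence
                continue
            for pos, c in enumerate(form):
                for rhs in rules.get(c, []):       # apply every rule at every position
                    new = form[:pos] + rhs + form[pos + 1:]
                    if sum(ch.islower() for ch in new) <= max_len and new not in seen:
                        seen.add(new)
                        queue.append(new)
        return out

    assert sentences({"S": ["aSb", ""]}, "S", 6) == {"", "ab", "aabb", "aaabbb"}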

2.4 Classification of Grammars and Languages

A classification of grammars (and the corresponding classes of languages) is given with respect to the
form of the grammar rules x → y, into the Type 1, Type 2 and Type 3 classes, respectively.
Type 1: If all the grammar rules x → y satisfy |x| ≤ |y|, then the grammar is context-sensitive or
Type 1. Grammar G will generate a language L(G) which is called a context-sensitive language. Note
that x has to be of length at least 1, and thereby y too. Hence, it is not possible to derive the
empty string in such a grammar.
Type 2: If all production rules are of the form x → y where |x| = 1, then the grammar is said to be
context-free or Type 2 (i.e., the left-hand side of each rule is a single variable).
Type 3: If the production rules are of the following forms:
A → xB
A → x
where x ∈ Σ* (a string of terminals or the empty string) and A, B ∈ V (variables), then the grammar
is called right-linear.
Similarly, for a left-linear grammar, the production rules are of the form
A → Bx
A → x
For a regular grammar, the production rules are of the form
A → aB
A → a
A → ε
with a ∈ Σ.
A language which can be generated by a regular grammar will (later) be shown to be regular. Note
that a language can be derived by a regular grammar iff it can be derived by a right-linear grammar
iff it can be derived by a left-linear grammar.

2.5 Normal Forms of Context-Free Grammars

2.5.1 Chomsky Normal Form (CNF)

Definition 2.5.1 A context-free grammar G = (V, Σ, P, S) is in Chomsky Normal Form if each
rule is of the form
i) A → BC
ii) A → a
iii) S → ε
where B, C ∈ V − {S}.

Theorem 2.5.1 Let G = (V, Σ, P, S) be a context-free grammar. There is an algorithm to construct
a grammar G′ = (V′, Σ, P′, S) in Chomsky normal form that is equivalent to G (L(G′) = L(G)).
Example 2.5.1
Convert the given grammar G to CNF.
G: S → aABC | a
A → aA | a
B → bcB | bc
C → cC | c
Solution:
A CNF equivalent G′ can be given as:
G′: S → A′T1 | a
A′ → a
T1 → AT2
T2 → BC
A → A′A | a
B → B′T3 | B′C′
B′ → b
T3 → C′B
C → C′C | c
C′ → c

2.5.2 Greibach Normal Form (GNF)

If a grammar is in GNF, then the length of the terminal prefix of the sentential form increases with
every rule application, which prevents left recursion.

Definition 2.5.2 A context-free grammar G = (V, Σ, P, S) is in Greibach Normal Form if each
rule is of the form
i) A → aA1A2...An
ii) A → a
iii) S → ε

Chapter 3

Finite State Automata

3.1 Deterministic Finite Automata (DFA)

Definition 3.1.1 A deterministic finite automaton (DFA) is a quintuple M = (Q, Σ, δ, q0, F)
where Q is a finite set of states, Σ a finite set of symbols called the alphabet, q0 ∈ Q a distinguished
state called the start state, F a subset of Q consisting of the final or accepting states, and δ a total
function from Q × Σ to Q called the transition function.
Example 3.1.1

Figure 3.1: Example DFA

Some strings accepted by the machine are:
baab
baaab
babaabaaba
aaa...a
All of the above strings are characterized by the presence of at least one aa substring.
According to the definition of a DFA, the following are identified:
Q = {q0, q1, q2}
Σ = {a, b}
δ : Q × Σ → Q : (qi, a) ↦ qj
where i can be equal to j, and the mapping is given by the transition table below.
Transition Table:

        a     b
q0     q1    q0
q1     q2    q0
q2     q2    q2

A sample computation, on the string abaab, is represented as
[q0, abaab] ⊢ [q1, baab]
⊢ [q0, aab]
⊢ [q1, ab]
⊢ [q2, b]
⊢ [q2, ε]

Definition 3.1.2 Let M = (Q, Σ, δ, q0, F) be a DFA. The language of M, denoted L(M), is the set
of strings in Σ* accepted by M.
A DFA can be considered as a language acceptor; the language recognized by the machine is the set
of strings that are accepted by its computations. Two machines that accept the same language are
said to be equivalent.
Definition 3.1.3 The extended transition function δ̂ of a DFA with transition function δ is a
function from Q × Σ* to Q defined by recursion on the length of the input string.
i) Basis: length(w) = 0. Then w = ε and δ̂(qi, ε) = qi.
   length(w) = 1. Then w = a for some a ∈ Σ, and δ̂(qi, a) = δ(qi, a).
ii) Recursive step: Let w be a string of length n > 1. Then w = ua and δ̂(qi, ua) = δ(δ̂(qi, u), a).
The computation of a machine in state qi with string w halts in state δ̂(qi, w). A string w is accepted
if δ̂(q0, w) ∈ F. Using this notation, the language of a DFA M is the set L(M) = {w | δ̂(q0, w) ∈ F}.
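
The DFA of Example 3.1.1 is small enough to run directly. In this sketch (ours, for illustration), δ is a dictionary and the extended transition function δ̂ is computed by folding δ over the input string:

    delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q2", ("q1", "b"): "q0",
             ("q2", "a"): "q2", ("q2", "b"): "q2"}
    start, final = "q0", {"q2"}

    def accepts(w):
        q = start
        for symbol in w:              # delta-hat: one delta step per input symbol
            q = delta[(q, symbol)]
        return q in final

    assert accepts("baab") and accepts("abaab")
    assert not accepts("ababab")      # contains no aa substring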

3.2 Nondeterministic Finite Automata (NFA)

Definition 3.2.1 A nondeterministic finite automaton is a quintuple M = (Q, Σ, δ, q0, F) where
Q is a finite set of states, Σ a finite set of symbols called the alphabet, q0 ∈ Q a distinguished state
known as the start state, F a subset of Q consisting of the final or accepting states, and δ a total
function from Q × Σ to P(Q) known as the transition function.
Note that a deterministic finite automaton is considered a special case of a nondeterministic one. The
transition function of a DFA specifies exactly one state that may be entered from a given state and
on a given input symbol, while an NFA allows zero, one or more states to be entered. Hence, a string
input to an NFA may generate several distinct computations.
For the language over Σ = {a, b} where each string has at least one occurrence of a double a, an
NFA can be given with the following transition table:

        a           b
q0   {q0, q1}    {q0}
q1   {q2}        ∅
q2   {q2}        {q2}

Two computations on the string aabaa are given by:
[q0, aabaa] ⊢ [q1, abaa]
⊢ [q2, baa]
⊢ [q2, aa]
⊢ [q2, a]
⊢ [q2, ε]
and
[q0, aabaa] ⊢ [q0, abaa]
⊢ [q0, baa]
⊢ [q0, aa]
⊢ [q1, a]
⊢ [q2, ε]

We will further show that a language accepted by an NFA is also accepted by a DFA. As an
example, the language accepted by the above NFA is also accepted by the DFA of Example 3.1.1.
Definition 3.2.2 The language of an NFA M, denoted L(M), is the set of strings accepted by M.
That is, L(M) = {w | there is a computation [q0, w] ⊢* [qi, ε] with qi ∈ F}.
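
Nondeterminism is easy to simulate by tracking the set of states reachable after each input symbol; the input is accepted if some reachable state is final. A sketch (ours, for illustration) for the table above:

    delta = {("q0", "a"): {"q0", "q1"}, ("q0", "b"): {"q0"},
             ("q1", "a"): {"q2"},       ("q1", "b"): set(),
             ("q2", "a"): {"q2"},       ("q2", "b"): {"q2"}}
    start, final = "q0", {"q2"}

    def accepts(w):
        states = {start}
        for symbol in w:
            states = {q for s in states for q in delta[(s, symbol)]}
        return bool(states & final)    # some computation ends in a final state

    assert accepts("aabaa") and not accepts("ababab")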

3.3 NFA with Epsilon Transitions (NFA-ε or ε-NFA)

So far, in the discussion of finite state automata, the reading head was required to move at each
step of a computation. Intuitively, an ε-transition allows the reading head of the automaton to remain
at a cell during a transition; the machine changes state without processing an input symbol.

Definition 3.3.1 A nondeterministic finite automaton with ε-transitions is a quintuple M = (Q, Σ, δ, q0, F)
where Q, Σ, q0, and F are as in an NFA. The transition function is a function from Q × (Σ ∪ {ε})
to 2^Q.
Epsilon transitions can be used to combine existing machines to construct more complex composite
machines. Let M1 and M2 be two finite automata, each of which has a single start state and a single
final state, where no arcs enter the start state and no arcs leave the accepting state.
Composite machines that accept L(M1) ∪ L(M2), L(M1)L(M2), and L(M1)* are constructed from
M1 and M2 as depicted in Figures 3.2-3.4.
The NFA-ε of Example 3.3.1 accepts the language over Σ = {a, b} where each string has at least
one occurrence of aa or bb. The states of machines M1 and M2 are given distinct names; below we
write p0, p1, p2 for the states of the bb-machine. A possible computation on the string bbaaa is given
below.
[q0′, bbaaa] ⊢ [p0, bbaaa]
⊢ [p1, baaa]
⊢ [p2, aaa]
⊢ [q2′, aa]
⊢ [q2′, a]
⊢ [q2′, ε]
Example 3.3.1

Figure 3.2: L(M1) ∪ L(M2)

Figure 3.3: L(M1)L(M2)

Figure 3.4: L(M1)*

3.4 Finite Automata and Regular Sets

Theorem 3.4.1 The set of languages accepted by finite state automata consists precisely of the
regular sets over Σ.

First we will show that every regular set is accepted by some NFA-ε. This follows from the recursive
definition of regular sets. The regular sets are built from the basis elements ∅, {ε} and the singletons
containing a symbol from the alphabet. Machines that accept these sets are given in Figure 3.6. The
regular sets are constructed from the primitive regular sets using the union, concatenation, and Kleene
star operations.

3.4.1 Removing Nondeterminism

Definition 3.4.1 The ε-closure of a state qi, denoted ε-closure(qi), is defined recursively by:
i) Basis: qi ∈ ε-closure(qi).
ii) Recursive step: Let qj be an element of ε-closure(qi). If qk ∈ δ(qj, ε), then qk ∈ ε-closure(qi).

Figure 3.5: Sample Union Construction

Figure 3.6: Machines that accept the primitive regular sets


iii) Closure: qj is in ε-closure(qi) only if it can be obtained from qi by a finite number of applications
of the operations in ii).

Algorithm 1: Construction of DM, a DFA equivalent to NFA-ε M (see text)
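
A sketch of the subset construction behind Algorithm 1 (ours, for illustration; the text's own pseudocode is referenced but not reproduced here). The DFA states are sets of NFA states: start from the ε-closure of the start state, and for each input symbol take all one-symbol moves followed by ε-closure.

    def eps_closure(state, eps):
        """Definition 3.4.1: states reachable by zero or more epsilon-arcs."""
        closure, stack = {state}, [state]
        while stack:
            q = stack.pop()
            for r in eps.get(q, set()) - closure:
                closure.add(r)
                stack.append(r)
        return closure

    def subset_construction(alphabet, delta, eps, start, finals):
        closure = lambda S: set().union(*[eps_closure(q, eps) for q in S])
        start_set = frozenset(closure({start}))
        trans, seen, todo = {}, {start_set}, [start_set]
        while todo:
            S = todo.pop()
            for a in alphabet:
                step = set().union(*[delta.get((q, a), set()) for q in S])
                T = frozenset(closure(step))        # move on a, then epsilon-close
                trans[(S, a)] = T
                if T not in seen:
                    seen.add(T)
                    todo.append(T)
        return start_set, trans, {S for S in seen if S & finals}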

Example 3.4.1 For the NFA-ε of Figure 3.7, we derive the DFA of Figure 3.8.

Figure 3.7: An NFA-ε

Figure 3.8: Equivalent DFA
(Note: the diagram of the figure is missing a transition from FG to BCE on 1, and transitions on 0
and 1 at ∅.)

3.4.2 Expression Graphs

Definition 3.4.2 An expression graph is a labeled directed graph in which the arcs are labeled by
regular expressions. An expression graph, like a state diagram, contains a distinguished start node
and a set of accepting nodes.
Example 3.4.2
The expression graph given in Fig. 3.9 accepts the regular expressions u* and u*vw*.
Figure 3.9: Expression Graph

Figure 3.10: Expression Graph Transformation


The reduced graph has at most two nodes: the start node and an accepting node. If these are the
same node, the reduced graph has the form of Fig. 3.11(a), accepting w*. A graph with distinct start
and accepting nodes reduces to Fig. 3.11(b) and accepts the expression w1* w2 (w3 ∪ w4 (w1)* w2)*.
This expression may be simplified if any of the arcs in the graph are labeled ∅.

Figure 3.11: (a) w*, (b) w1* w2 (w3 ∪ w4 (w1)* w2)*


Algorithm 2: Construction of a Regular Expression from a Finite Automaton
input: the state diagram G of a finite automaton, whose nodes are numbered 1, 2, ..., n, with m
accepting nodes
1. Make m copies of G, each of which has one accepting state. Call these graphs G1, G2, ..., Gm.
   Each accepting node of G is the accepting node of Gt, for some t = 1, 2, ..., m.
2. for each Gt do
   2.1. repeat
        2.1.1. choose a node i in Gt that is neither the start nor the accepting node of Gt.
        2.1.2. delete the node i from Gt according to the procedure:
               for every j, k not equal to i (this includes j = k) do
                 i) if wj,i ≠ ∅, wi,k ≠ ∅, and wi,i = ∅, then add an arc from node j to node k
                    labeled wj,i wi,k
                 ii) if wj,i ≠ ∅, wi,k ≠ ∅, and wi,i ≠ ∅, then add an arc from node j to node k
                    labeled wj,i (wi,i)* wi,k
                 iii) if nodes j and k have arcs labeled w1, w2, ..., ws connecting them, then replace
                    them by a single arc labeled w1 ∪ w2 ∪ ... ∪ ws
                 iv) remove the node i and all arcs incident to it in Gt
               end for
        until the only nodes in Gt are the start node and the single accepting node
   2.2. determine the expression accepted by Gt
   end for
3. The regular expression accepted by G is obtained by joining the expressions for each Gt with ∪.
The deletion of the node i is accomplished by finding all paths j, i, k of length two that have
i as the intermediate node. An arc from j to k is added, bypassing the node i. If there is no arc from
i to itself, the new arc is labeled by the concatenation of the expressions on the two component arcs.
If wi,i ≠ ∅, then the arc wi,i can be traversed any number of times before following the arc from i to
k; the label for the new arc is then wj,i (wi,i)* wi,k. These graph transformations are illustrated in
Fig. 3.10.
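
The node-deletion step is mechanical enough to sketch in code. Below (ours, for illustration) an expression graph is a dict of dicts of regular-expression strings, and deleting node i rewires every pair (j, k) with the label wj,i (wi,i)* wi,k, joining parallel arcs with U (union):

    def eliminate(g, i):
        """Delete node i from expression graph g, as in step 2.1.2 of Algorithm 2."""
        loop = g[i].get(i)
        mid = "(" + loop + ")*" if loop else ""      # traverse w_ii any number of times
        for j in [n for n in g if n != i]:
            if i not in g[j]:
                continue
            for k, wik in g[i].items():
                if k == i:
                    continue
                new = g[j][i] + mid + wik            # w_ji (w_ii)* w_ik
                g[j][k] = ("(" + g[j][k] + " U " + new + ")") if k in g[j] else new
        for j in g:
            g[j].pop(i, None)                        # remove arcs incident to i
        del g[i]

    g = {1: {2: "a"}, 2: {2: "b", 3: "a"}, 3: {}}    # start node 1, accepting node 3
    eliminate(g, 2)
    assert g[1][3] == "a(b)*a"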


Example 3.4.3
1. Example 1: Fig. 3.12(a) shows the original DFA, which is reduced to the expression graph shown
in Fig. 3.12(b).

Figure 3.12: Example 3.4.3 - 1(a),(b)


2. Example 2: Explanation of elimination: a sequence of steps in which one state is eliminated at
each step.
Step 1: Given: Fig. 3.13(a)
Step 2: Eliminating node i at this step: Fig. 3.13(b)

Figure 3.13: Example 3.4.3 - 2(a),(b)


Step 3: After eliminating all but the initial and final states in Gt: Fig. 3.14(c)
Step 4: Final regular expression: Fig. 3.14(d)

Figure 3.14: Example 3.4.3 - 2(c),(d)

3. Example 3: Fig. 3.15 shows the different steps, where
L = r1* r2 (r3* + r3* r4 r1* r2 r3*)* = r1* r2 (r3 + r4 r1* r2)*,
in the two-node form of Fig. 3.14(d).

Figure 3.15: Example 3.4.3 - 3


Chapter 4

Regular Languages and Sets

4.1 Regular Grammars and Finite Automata

This chapter corresponds to Chapter 7 of the course textbook.


Theorem 4.1.1 Let G = (V, Σ, P, S) be a regular grammar. Define the NFA M = (Q, Σ, δ, S, F) as
follows:
i) Q = V ∪ {Z}, where Z ∉ V, if P contains a rule A → a; Q = V otherwise.
ii) δ(A, a) contains B whenever A → aB ∈ P, and contains Z whenever A → a ∈ P.
iii) F = {A | A → ε ∈ P} ∪ {Z} if Z ∈ Q; F = {A | A → ε ∈ P} otherwise.

Then L(M) = L(G).


Example 4.1.1
The grammar G generates, and the NFA M of Figure 4.1 accepts, the language a*(a ∪ b+):
G: S → aS | bB | a
B → bB | ε
The derivation and acceptance of a string such as aabb are shown below.
In G:
S ⇒ aS
⇒ aaS
⇒ aabB
⇒ aabbB
⇒ aabb

Figure 4.1: NFA accepting a*(a ∪ b+)

In M:
[S, aabb] ⊢ [S, abb]
⊢ [S, bb]
⊢ [B, b]
⊢ [B, ε]

Similarly, a regular grammar that generates L(M) is constructed from the automaton M:
G′: S → aS | bB | aZ
B → bB | ε
Z → ε
The transitions provide the S rules and the first B rule. The ε-rules are added since B and Z are
accepting states.

Example 4.1.2
A regular grammar is constructed from the DFA given in Fig. 4.2.

Figure 4.2: Example 4.1.2

S → bB | aA
A → aS | bC
B → aC | bS | ε
C → aB | bA

4.2 Closure Properties of Regular Sets

A language over an alphabet Σ is regular if it is
i) a regular set (denoted by a regular expression) over Σ,
ii) accepted by a DFA, NFA, or NFA-ε, or
iii) generated by a regular grammar.

Theorem 4.2.1 Let L1 and L2 be two regular languages. The languages L1 ∪ L2, L1L2, and L1* are
regular languages.

Theorem 4.2.2 Let L be a regular language over Σ. The language L̄ = Σ* − L is regular.

Theorem 4.2.3 Let L1 and L2 be regular languages over Σ. The language L1 ∩ L2 is regular.
Proof: By De Morgan's law,
L1 ∩ L2 = (L̄1 ∪ L̄2)‾
The right-hand side of the equality is regular since it is built from L1 and L2 using union and
complementation.

Theorem 4.2.4 Let L1 be a regular language and L2 be a context-free language. The language
L1 ∩ L2 is not necessarily regular.
Proof: Let L1 = a*b* and L2 = {a^i b^i | i ≥ 0}. L2 is context-free since it is generated by the grammar
S → aSb | ε. The intersection of L1 and L2 is L2, which is not regular.

4.3 Pumping Lemma for Regular Languages

Pumping a string refers to constructing new strings by repeating (pumping) substrings in the original
string.

Theorem 4.3.1 Let L be a regular language that is accepted by a DFA M with n states. Let w be
any string in L with length(w) ≥ n. Then w can be written as xyz with length(xy) ≤ n,
length(y) > 0, and xy^k z ∈ L for all k ≥ 0.
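
The lemma's proof is constructive: running the DFA on the first n symbols of w must revisit some state (by the pigeonhole principle), and the segment read between the two visits is the pumpable y. A sketch (ours, for illustration), reusing the DFA of Example 3.1.1:

    def pump_split(delta, start, w):
        """Return x, y, z with w = xyz, |xy| <= #states, |y| > 0, y a loop on a state."""
        q, first_visit = start, {start: 0}
        for pos, symbol in enumerate(w):
            q = delta[(q, symbol)]
            if q in first_visit:                      # a state repeats
                i = first_visit[q]
                return w[:i], w[i:pos + 1], w[pos + 1:]
            first_visit[q] = pos + 1
        return None                                   # w shorter than the state count

    def run(delta, q, w, final):
        for s in w:
            q = delta[(q, s)]
        return q in final

    delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
             ("q1", "a"): "q2", ("q1", "b"): "q0",
             ("q2", "a"): "q2", ("q2", "b"): "q2"}
    x, y, z = pump_split(delta, "q0", "babaab")
    for k in range(4):
        assert run(delta, "q0", x + y * k + z, {"q2"})   # xy^k z stays in L
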
Example 4.3.1
Prove that the language L = {a^i b^i | i ≥ 0} is not regular, using the pumping lemma for regular
languages.
Proof: By contradiction. Assume L is regular; then the pumping lemma holds. Let w = a^n b^n.
By splitting a^n b^n into xyz, we get
x = a^i, y = a^j, and z = a^(n−i−j) b^n,
where
i + j ≤ n and j > 0.
Pumping y to y^2 gives
a^i a^j a^j a^(n−i−j) b^n = a^(n+j) b^n ∉ L (contradicting the pumping lemma).
Therefore, L is not regular.
Example 4.3.2
The language L = {a^i | i is prime} is not regular.
Assume L is regular, and that a DFA with n states accepts L. Let m be a prime greater than n.
The pumping lemma implies that a^m can be decomposed as xyz, y ≠ ε, such that xy^k z is in L for
all k ≥ 0.
The length of s = xy^(m+1) z must be prime if s is in L. But
length(xy^(m+1) z) = length(xyz y^m)
= length(xyz) + length(y^m)
= m + m · length(y)
= m(1 + length(y))
Since its length is not prime, xy^(m+1) z is not in L (contradicting the pumping lemma). Hence, L
is not regular.
Corollary 4.3.1 Let DFA M have n states.
i. L(M) is not empty if, and only if, M accepts a string w with length(w) < n.
ii. L(M) has an infinite number of strings if, and only if, M accepts a string w with
n ≤ length(w) < 2n.

Theorem 4.3.2 Let M be a DFA. There is a decision procedure to determine whether
i. L(M) is empty;
ii. L(M) is finite;
iii. L(M) is infinite.

Chapter 5

Pushdown Automata and Context-Free Languages

5.1 Pushdown Automata

Definition 5.1.1 A pushdown automaton is a six-tuple (Q, Σ, Γ, δ, q0, F), where Q is a finite set
of states, Σ a finite set called the input alphabet, Γ a finite set called the stack alphabet, q0 the start
state, F ⊆ Q a set of final states, and δ a transition function from Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) to
subsets of Q × (Γ ∪ {ε}).
Example 5.1.1
The language L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0} contains strings consisting solely of a's or of an equal
number of a's and b's. The stack of the PDA M that accepts L maintains a record of the number of
a's processed until a b is encountered or the input string is completely processed.

Figure 5.1: L = {a^i | i ≥ 0} ∪ {a^i b^i | i ≥ 0}

When scanning an a in state q0, there are two transitions that are applicable. A string of the form
a^i b^i, i > 0, is accepted by a computation that remains in states q0 and q1. If a transition to state
q2 follows the processing of the final a in a string a^i, the stack is emptied and the input is accepted.
Reaching q2 in any other manner results in an unsuccessful computation, since no input is processed
after q2 is entered.

The ε-transition from q0 allows the machine to enter q2 after the entire input string has been read,
since a symbol is not required to process an ε-transition. This transition, which is applicable whenever
the machine is in state q0, introduces nondeterministic computations of M.

Example 5.1.2
The even-length palindromes over {a, b} are accepted by the PDA of Figure 5.2; that is,
L(M) = {ww^R | w ∈ {a, b}*}.

Figure 5.2: PDA with L(M) = {ww^R}

A successful computation remains in state q0 while processing the string w and enters state q1 upon
reading the first symbol of w^R.
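
Nondeterministic PDA computations can be explored by a search over configurations (state, input position, stack). The sketch below (ours, for illustration) hard-codes the behaviour of the machine of Figure 5.2: in q0 the machine pushes each symbol read, or guesses the midpoint with an ε-move to q1; in q1 it pops the stack top when it matches the input. A string is accepted if the input is exhausted in q1 with an empty stack.

    from collections import deque

    def accepts_wwR(w):
        start = ("q0", 0, "")
        queue, seen = deque([start]), {start}
        while queue:
            state, i, stack = queue.popleft()
            if state == "q1" and i == len(w) and stack == "":
                return True                                   # input read, stack empty
            moves = []
            if state == "q0":
                if i < len(w):
                    moves.append(("q0", i + 1, stack + w[i]))   # push the symbol read
                moves.append(("q1", i, stack))                  # epsilon: guess midpoint
            elif state == "q1" and i < len(w) and stack and stack[-1] == w[i]:
                moves.append(("q1", i + 1, stack[:-1]))         # pop the matching symbol
            for m in moves:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        return False

    assert accepts_wwR("abba") and accepts_wwR("")
    assert not accepts_wwR("aba")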

5.2 Variations on the PDA Theme

Pushdown automata are often defined in a manner that differs slightly from Definition 5.1.1. In this
section we examine several alterations to our definition that preserve the set of accepted languages.
Along with changing the state, a transition in a PDA is accompanied by three actions: popping
the stack, pushing a stack element, and processing an input symbol. A PDA is called atomic if each
transition causes only one of the three actions to occur. Transitions in an atomic PDA have the form
[qj, ε] ∈ δ(qi, a, ε)
[qj, ε] ∈ δ(qi, ε, A)
[qj, A] ∈ δ(qi, ε, ε)
Theorem 5.2.1 shows that the languages accepted by atomic PDAs are the same as those accepted
by PDAs. Moreover, it outlines a method to construct an equivalent atomic PDA from an arbitrary
PDA.
Theorem 5.2.1 Let M be a PDA. Then there is an atomic PDA M′ with L(M′) = L(M).
Proof: To construct M′, the nonatomic transitions of M are replaced by a sequence of atomic
transitions. Let [qj, B] ∈ δ(qi, a, A) be a transition of M. The atomic equivalent requires two new
states, p1 and p2, and the transitions
[p1, ε] ∈ δ(qi, a, ε)
δ(p1, ε, A) = {[p2, ε]}
δ(p2, ε, ε) = {[qj, B]}
In a similar manner, a transition that consists of changing the state and performing two additional
actions can be replaced with a sequence of two atomic transitions. Removing all nonatomic transitions
produces an equivalent atomic PDA.
An extended transition is an operation on a PDA that replaces the stack top with a string of
symbols, rather than just a single symbol. The transition [qj, BCD] ∈ δ(qi, u, A) replaces the stack
top A with the string BCD, with B becoming the new stack top. This apparent generalization does
not increase the set of languages accepted by pushdown automata. A PDA containing extended
transitions is called an extended PDA. Each extended PDA can be converted into an equivalent
PDA in the sense of Definition 5.1.1.
To construct a PDA from an extended PDA, extended transitions are converted into a sequence of
transitions each of which pushes a single stack element. To achieve the result of an extended transition
that pushes k elements requires k − 1 additional states to push the elements in the correct order. The
sequence of transitions
[p1, D] ∈ δ(qi, u, A)
δ(p1, ε, ε) = {[p2, C]}
δ(p2, ε, ε) = {[qj, B]}
replaces the stack top A with the string BCD and leaves the machine in state qj. This produces the
same result as the single extended transition [qj, BCD] ∈ δ(qi, u, A).

5.3 Pushdown Automata and Context-Free Languages

Theorem 5.3.1 Let L be a context-free language. Then there is a PDA that accepts L.
Proof: Let G = (V, Σ, P, S) be a grammar in Greibach normal form that generates L. An extended PDA M with start state q0 is defined by

QM = {q0, q1}
ΣM = Σ
ΓM = V − {S}
FM = {q1}

with transitions

δ(q0, a, λ) = {[q1, w] | S → aw ∈ P}
δ(q1, a, A) = {[q1, w] | A → aw ∈ P and A ∈ V − {S}}
δ(q0, λ, λ) = {[q1, λ]} if S → λ ∈ P.

We first show that L ⊆ L(M). Let S ⇒* uw be a derivation with u ∈ Σ+ and w ∈ (V − {S})*. We will prove that there is a computation

[q0, u, λ] ⊢* [q1, λ, w]

in M. The proof is by induction on the length of the derivation and utilizes the correspondence between derivations in G and computations of M. The basis consists of derivations S ⇒ aw of length one. The transition generated by the rule S → aw yields the desired computation. Assume that for all strings uw generated by derivations S ⇒^n uw there is a computation

[q0, u, λ] ⊢* [q1, λ, w]

in M.
Now let S ⇒^(n+1) uw be a derivation with u = va ∈ Σ+ and w ∈ (V − {S})*. This derivation can be written

S ⇒^n vAw2 ⇒ uw,

where w = w1w2 and A → aw1 is a rule in P. The inductive hypothesis and the transition [q1, w1] ∈ δ(q1, a, A) combine to produce the computation

[q0, va, λ] ⊢* [q1, a, Aw2]
            ⊢ [q1, λ, w1w2].

For every string u in L of positive length, the acceptance of u is exhibited by the computation in M corresponding to the derivation S ⇒* u. If λ ∈ L, then S → λ is a rule of G and the computation [q0, λ, λ] ⊢ [q1, λ, λ] accepts the null string. The opposite inclusion, L(M) ⊆ L, is established by showing that for every computation [q0, u, λ] ⊢* [q1, λ, w] there is a corresponding derivation S ⇒* uw in G.
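The transition schemes of the proof translate directly into code. The sketch below builds the transition sets of Theorem 5.3.1 from a Greibach-normal-form grammar; the rule encoding (pairs of head and body strings over single-character symbols) and the function name are our own assumptions.

def gnf_to_pda(rules, start='S'):
    # rules: list of (head, body); body is '' or a terminal followed by
    # variables, e.g. ('S', 'aSB'). Returns delta as a dict mapping
    # (state, input, stack_top) to a set of (state, push_string).
    delta = {}

    def add(state, a, top, target, push):
        delta.setdefault((state, a, top), set()).add((target, push))

    for head, body in rules:
        if head == start:
            if body == '':                       # S -> lambda
                add('q0', '', '', 'q1', '')      # accept the null string
            else:                                # S -> a w
                add('q0', body[0], '', 'q1', body[1:])
        else:                                    # A -> a w with A != S
            add('q1', body[0], head, 'q1', body[1:])
    return delta

# GNF grammar for {a^i b^i | i >= 1}: S -> aB | aSB, B -> b
print(gnf_to_pda([('S', 'aB'), ('S', 'aSB'), ('B', 'b')]))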

Theorem 5.3.2 Let P = (Q, Σ, Γ, δ, q0, F) be a PDA. Then there is a context-free grammar G such that L(G) = L(P).

5.4 The Pumping Lemma for Context-Free Languages

Lemma 5.4.1 Let G be a context-free grammar in Chomsky normal form and A ⇒* w a derivation of w with derivation tree T. If the depth of T is n, then length(w) ≤ 2^(n−1).

Corollary 5.4.1 Let G = (V, Σ, P, S) be a context-free grammar in Chomsky normal form and S ⇒* w a derivation of w ∈ L(G). If length(w) ≥ 2^n, then the derivation tree has depth at least n + 1.
Theorem 5.4.1 (Pumping Lemma for Context-Free Languages)
Let L be a context-free language. There is a number k, depending on L, such that any string z ∈ L with length(z) > k can be written as z = uvwxy where
i) length(vwx) ≤ k
ii) length(v) + length(x) > 0
iii) uv^i wx^i y ∈ L, for i ≥ 0.
Proof: Let G = (V, Σ, P, S) be a Chomsky normal form grammar that generates L and let k = 2^n, where n = #V. We show that all strings in L of length k or greater can be decomposed to satisfy the conditions of the pumping lemma. Let z ∈ L(G) be such a string and S ⇒* z a derivation in G. By Corollary 5.4.1, there is a path of length at least n + 1 = #V + 1 in the derivation tree of S ⇒* z. Let p be a path of maximal length from the root S to a leaf of the derivation tree. Then p must contain at least n + 2 nodes, all of which are labeled by variables except the leaf node, which is labeled by a terminal symbol. The pigeonhole principle guarantees that some variable A must occur twice in the final n + 2 nodes of this path.
The derivation tree can be divided into subtrees in which the nodes labeled by the variable A indicated in the diagram of Figure 5.3 are the final two occurrences of A in the path p.
The derivation of z consists of the subderivations

1. S ⇒* r1 A r2
2. r1 ⇒* u
3. A ⇒+ vAx
4. A ⇒* w
5. r2 ⇒* y.

Figure 5.3: Pumping Lemma for CFL

Subderivation 3 may be omitted, or repeated any number of times, before applying subderivation 4. The resulting derivations generate the strings uv^i wx^i y ∈ L(G) = L, establishing condition (iii).
We now show that conditions (i) and (ii) of the pumping lemma are satisfied by this decomposition. The subderivation A ⇒+ vAx must begin with a rule of the form A → BC. The second occurrence of the variable A is derived from either B or C. If it is derived from B, the derivation can be written

A ⇒ BC
  ⇒* vAsC
  ⇒* vAst
  = vAx,

where x = st. The string t is nonnull, since it is obtained by a derivation from a variable of a Chomsky normal form grammar that is not the start symbol. It follows that x is also nonnull. If the second occurrence of A is derived from the variable C, a similar argument shows that v must be nonnull. This establishes condition (ii).
The subpath of p from the first of these two occurrences of A to a leaf contains at most n + 2 nodes. Since p is a path of maximal length, this is the longest path in the subtree rooted at that occurrence of A, so the derivation tree generated by the derivation A ⇒* vwx has depth at most n + 1. By Lemma 5.4.1, the string vwx obtained from this derivation has length 2^n = k or less, establishing condition (i).

Example 5.4.1
The language L = {a^i b^i c^i | i ≥ 0} is not context-free.
Proof: Assume L is context-free. By Theorem 5.4.1, the string z = a^k b^k c^k, where k is the number specified by the pumping lemma, can be decomposed as z = uvwxy so as to satisfy the repetition properties. Consider the possibilities for the substrings v and x. If either of these contains more than one type of terminal symbol, then uv^2 wx^2 y contains a b preceding an a or a c preceding a b. In either case, the resulting string is not in L.
By the previous observation, v and x must each be a substring of one of a^k, b^k, or c^k. Since at most one of the strings v and x is null, uv^2 wx^2 y increases the number of occurrences of at least one, and at most two, of the three types of terminal symbols. This implies that uv^2 wx^2 y ∉ L. Thus there is no decomposition of a^k b^k c^k satisfying the conditions of the pumping lemma; consequently, L is not context-free.
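The case analysis can be checked mechanically for a small k. The following Python sketch exhaustively tries every decomposition of z = a^k b^k c^k; it is an illustration of the argument, not a proof, and the search procedure is our own.

def in_L(s):
    i = len(s) // 3
    return len(s) % 3 == 0 and s == 'a' * i + 'b' * i + 'c' * i

def some_decomposition_pumps(z, k, repetitions=4):
    n = len(z)
    # cut points: u = z[:a], v = z[a:b], w = z[b:c], x = z[c:d], y = z[d:]
    for a in range(n + 1):
        for b in range(a, n + 1):
            for c in range(b, n + 1):
                for d in range(c, n + 1):
                    v, x = z[a:b], z[c:d]
                    if d - a > k or len(v + x) == 0:   # conditions (i), (ii)
                        continue
                    if all(in_L(z[:a] + v * i + z[b:c] + x * i + z[d:])
                           for i in range(repetitions)):
                        return True                    # pumped successfully
    return False

k = 4
print(some_decomposition_pumps('a' * k + 'b' * k + 'c' * k, k))   # False

Every decomposition already fails for i = 0 or i = 2, exactly as the proof predicts.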

42

CHAPTER 5. PUSHDOWN AUTOMATA AND CONTEXT-FREE LANGUAGES

5.5

Closure Properties of Context-Free Languages

Theorem 5.5.1 The set of context-free languages is closed under the operations of union, concatenation, and Kleene star.
Proof: Let L1 and L2 be context-free languages generated by G1 = (V1, Σ1, P1, S1) and G2 = (V2, Σ2, P2, S2), respectively. The sets V1 and V2 of variables are assumed to be disjoint. Since we may rename variables, this assumption imposes no restriction on the grammars.
A context-free grammar will be constructed from G1 and G2 that establishes each closure property.
i) Union: Define G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, P1 ∪ P2 ∪ {S → S1 | S2}, S). A string w is in L(G) if, and only if, there is a derivation S ⇒ Si ⇒* w for i = 1 or 2; thus w is in L1 or L2. On the other hand, any derivation Si ⇒* w can be initialized with the rule S → Si to generate w in G.
ii) Concatenation: Define G = (V1 ∪ V2 ∪ {S}, Σ1 ∪ Σ2, P1 ∪ P2 ∪ {S → S1S2}, S). The start symbol initiates derivations in both G1 and G2. A leftmost derivation of a terminal string in G has the form S ⇒ S1S2 ⇒* uS2 ⇒* uv, where u ∈ L1 and v ∈ L2. The derivation of u uses only rules from P1 and that of v only rules from P2. Hence L(G) ⊆ L1L2. The opposite inclusion is established by observing that every string w in L1L2 can be written uv with u ∈ L1 and v ∈ L2. The derivations S1 ⇒* u in G1 and S2 ⇒* v in G2, along with the S rule of G, generate w in G.
iii) Kleene star: Define G = (V1 ∪ {S}, Σ1, P1 ∪ {S → S1S | λ}, S). The S rule of G generates any number of copies of S1. Each of these, in turn, initiates the derivation of a string in L1. The concatenation of any number of strings from L1 yields L1*.
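The three constructions are mechanical enough to state as code. In the sketch below a grammar is a tuple (variables, terminals, rules, start), where rules is a Python set of (head, body) pairs of strings over single-character symbols and '' is the body of a λ-rule; this encoding, and the choice of 'S' as the fresh start symbol, are our own assumptions.

def union(g1, g2):
    (v1, t1, p1, s1), (v2, t2, p2, s2) = g1, g2   # assumes V1, V2 disjoint
    rules = p1 | p2 | {('S', s1), ('S', s2)}      # S -> S1 | S2
    return (v1 | v2 | {'S'}, t1 | t2, rules, 'S')

def concat(g1, g2):
    (v1, t1, p1, s1), (v2, t2, p2, s2) = g1, g2
    rules = p1 | p2 | {('S', s1 + s2)}            # S -> S1 S2
    return (v1 | v2 | {'S'}, t1 | t2, rules, 'S')

def star(g):
    v1, t1, p1, s1 = g
    rules = p1 | {('S', s1 + 'S'), ('S', '')}     # S -> S1 S | lambda
    return (v1 | {'S'}, t1, rules, 'S')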
Theorem 5.5.2 The set of context-free languages is not closed under intersection or complementation.
Proof:
i) Intersection: Let L1 = {a^i b^i c^j | i, j ≥ 0} and L2 = {a^j b^i c^i | i, j ≥ 0}. L1 and L2 are both context-free, since they are generated by the grammars G1 and G2, respectively:

G1: S → BC          G2: S → AB
    B → aBb | λ         A → aA | λ
    C → cC | λ          B → bBc | λ

The intersection of L1 and L2 is the set {a^i b^i c^i | i ≥ 0}, which, by Example 5.4.1, is not context-free.
ii) Complementation: Let L1 and L2 be any two context-free languages. If the context-free languages were closed under complementation, then, by Theorem 5.5.1, the language L = L̄1 ∪ L̄2 would be context-free. By De Morgan's law, L̄ = L1 ∩ L2. This would imply that the context-free languages are closed under intersection, contradicting the result of part (i).

5.6 A Two-Stack Automaton

Finite automata accept the regular languages, and pushdown automata accept the context-free languages. Adding a second stack to a PDA increases the class of accepted languages.
Definition 5.6.1 A two-stack PDA is a structure (Q, Σ, Γ, δ, q0, F), where Q is a finite set of states, Σ a finite set called the input alphabet, Γ a finite set called the stack alphabet, q0 the start state, F ⊆ Q a set of final states, and δ a transition function from Q × (Σ ∪ {λ}) × (Γ ∪ {λ}) × (Γ ∪ {λ}) to subsets of Q × (Γ ∪ {λ}) × (Γ ∪ {λ}).

Example 5.6.1
The two-stack PDA defined below accepts the language L = {a^i b^i c^i | i ≥ 0}. The first stack is used to match the a's and b's, and the second to match the b's and c's.

Q = {q0, q1, q2}
Σ = {a, b, c}
Γ = {A}
F = {q2}

δ(q0, λ, λ, λ) = {[q2, λ, λ]}
δ(q0, a, λ, λ) = {[q0, A, λ]}
δ(q0, b, A, λ) = {[q1, λ, A]}
δ(q1, b, A, λ) = {[q1, λ, A]}
δ(q1, c, λ, A) = {[q2, λ, λ]}
δ(q2, c, λ, A) = {[q2, λ, λ]}

The computation that accepts aabbcc,

[q0, aabbcc, λ, λ] ⊢ [q0, abbcc, A, λ]
                   ⊢ [q0, bbcc, AA, λ]
                   ⊢ [q1, bcc, A, A]
                   ⊢ [q1, cc, λ, AA]
                   ⊢ [q2, c, λ, A]
                   ⊢ [q2, λ, λ, λ],

illustrates the interplay between the two stacks.
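Since this particular machine is deterministic once the λ-move is set aside, it can be simulated directly. The Python sketch below covers just this machine, not the general nondeterministic model; the table encoding is our own, and the λ-move into q2 (which is only useful on empty input, because no input can be consumed from q2 with empty stacks) is handled as a special case.

DELTA = {  # (state, symbol) -> (pop1, pop2, next_state, push1, push2)
    ('q0', 'a'): ('',  '',  'q0', 'A', ''),
    ('q0', 'b'): ('A', '',  'q1', '',  'A'),
    ('q1', 'b'): ('A', '',  'q1', '',  'A'),
    ('q1', 'c'): ('',  'A', 'q2', '',  ''),
    ('q2', 'c'): ('',  'A', 'q2', '',  ''),
}

def accepts(w):
    if w == '':
        return True                    # delta(q0, lambda, lambda, lambda)
    state, s1, s2 = 'q0', [], []
    for ch in w:
        if (state, ch) not in DELTA:
            return False
        pop1, pop2, state, push1, push2 = DELTA[(state, ch)]
        if pop1 and (not s1 or s1.pop() != pop1):
            return False               # required stack-1 top missing
        if pop2 and (not s2 or s2.pop() != pop2):
            return False               # required stack-2 top missing
        if push1: s1.append(push1)
        if push2: s2.append(push2)
    return state == 'q2' and not s1 and not s2

print([w for w in ('', 'abc', 'aabbcc', 'aabbc', 'abcc') if accepts(w)])
# ['', 'abc', 'aabbcc']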

Chapter 6

Turing Machines
6.1 The Standard Turing Machine

Definition 6.1.1 A Turing machine is a quintuple M = (Q, Σ, Γ, δ, q0), where Q is a finite set of states, Γ is a finite set called the tape alphabet, Γ contains a special symbol B that represents a blank, Σ is a subset of Γ − {B} called the input alphabet, δ is a partial function from Q × Γ to Q × Γ × {L, R} called the transition function, and q0 ∈ Q is a distinguished state called the start state.

6.1.1 Notation for the Turing Machine

We may visualize a Turing machine as in Fig. 6.1. The machine consists of a finite control, which can
be in any of a finite set of states. There is a tape divided into squares or cells; each cell can hold any
one of a finite number of symbols.
Initially, the input, which is a finite-length string of symbols chosen from the input alphabet, is placed on the tape. All other tape cells, extending infinitely to the left and right, initially hold a special symbol called the blank. The blank is a tape symbol but not an input symbol, and there may be other tape symbols besides the input symbols and the blank.
There is a tape head that is always positioned at one of the tape cells. The Turing Machine is said to
be scanning that cell. Initially, the tape head is at the left-most cell that holds the input.
A move of the Turing Machine is a function of the state of the finite control and the tape symbol
scanned. In one move, the Turing Machine will:
1. Change state. The next state optionally may be the same as the current state.
2. Write a tape symbol in the cell scanned. This tape symbol replaces whatever symbol was in
that cell. Optionally, the symbol written may be the same as the symbol currently there.
3. Move the tape head left or right. In our formalism we require a move and do not allow the head to remain stationary. This restriction does not constrain what a Turing machine can compute, since any sequence of moves with a stationary head could be condensed, along with the next tape head move, into a single state change, a new tape symbol, and a move left or right.
Turing machines are designed to perform computations on strings from the input alphabet. A computation begins with the tape head scanning the leftmost tape square and the input string beginning at position one. All tape squares to the right of the input string are assumed to be blank. A Turing machine with the initial conditions described above is referred to as the standard Turing machine. A language accepted by a Turing machine is called a recursively enumerable language. A language accepted by a Turing machine that halts for all input strings is said to be recursive.
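The definition translates directly into a small simulator. The Python sketch below is our own (the dictionary encoding of the partial transition function and the step bound are assumptions, not part of the notes); the sample machine mirrors the one of Example 6.2.1 below, accepting (a ∪ b)*aa(a ∪ b)* by final state.

def run(delta, w, accepting, q0='q0', blank='B', limit=10_000):
    tape = {i + 1: ch for i, ch in enumerate(w)}   # input starts at square 1
    state, pos = q0, 0                             # head on leftmost square
    for _ in range(limit):
        sym = tape.get(pos, blank)
        if (state, sym) not in delta:              # partial function: halt
            return state in accepting
        state, write, move = delta[(state, sym)]
        tape[pos] = write
        pos += 1 if move == 'R' else -1
        if pos < 0:                                # abnormal termination
            return False
    return False                                   # give up: treat as a loop

delta = {('q0', 'B'): ('q1', 'B', 'R'),
         ('q1', 'a'): ('q2', 'a', 'R'),
         ('q1', 'b'): ('q1', 'b', 'R'),
         ('q2', 'a'): ('q3', 'a', 'R'),
         ('q2', 'b'): ('q1', 'b', 'R')}

print(run(delta, 'aabb', accepting={'q3'}))        # True
print(run(delta, 'abab', accepting={'q3'}))        # False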
Example 6.1.1
Figure 6.1: A Turing Machine

The Turing machine COPY of Fig. 6.2, with input alphabet {a, b}, produces a copy of the input string. That is, a computation that begins with the tape having the form BuB terminates with the tape BuBuB.

6.2 Turing Machines as Language Acceptors

Example 6.2.1
The Turing machine of Fig. 6.3 accepts the language (a ∪ b)*aa(a ∪ b)*. The computation

q0 BaabbB ⊢ Bq1 aabbB
          ⊢ Baq2 abbB
          ⊢ Baaq3 bbB

examines only the first half of the input before accepting the string aabb. The language (a ∪ b)*aa(a ∪ b)* is recursive; the computations of M halt for every input string. A successful computation terminates when a substring aa is encountered. All other computations halt upon reading the first blank following the input.
Example 6.2.2
The language {a^i b^i c^i | i ≥ 0} is accepted by the Turing machine in Fig. 6.4. A computation successfully terminates when all the symbols in the input string have been transformed into the appropriate tape symbols.

6.3 Alternative Acceptance Criteria

Definition 6.3.1 Let M = (Q, Σ, Γ, δ, q0) be a Turing machine that accepts by halting. A string u ∈ Σ* is accepted by halting if the computation of M with input u halts (normally).
Theorem 6.3.1 The following statements are equivalent:
i) The language L is accepted by a Turing machine that accepts by final state.



Figure 6.2: Turing Machine COPY

Figure 6.3: TM accepting (a ∪ b)*aa(a ∪ b)*
ii) The language L is accepted by a Turing machine that accepts by halting.
Proof: Let M = (Q, Σ, Γ, δ, q0) be a Turing machine that accepts L by halting. The machine M′ = (Q, Σ, Γ, δ, q0, Q), in which every state is a final state, accepts L by final state.
Conversely, let M = (Q, Σ, Γ, δ, q0, F) be a Turing machine that accepts the language L by final state. Define the machine M′ = (Q ∪ {qf}, Σ, Γ, δ′, q0) that accepts by halting as follows:
i) If δ(qi, x) is defined, then δ′(qi, x) = δ(qi, x).
ii) For each state qi ∈ Q − F, if δ(qi, x) is undefined, then δ′(qi, x) = [qf, x, R].
iii) For each x ∈ Γ, δ′(qf, x) = [qf, x, R].
Computations that accept strings are identical in M and M′. An unsuccessful computation in M may halt in a rejecting state, terminate abnormally, or fail to terminate. When an unsuccessful computation in M halts, the computation in M′ enters the state qf; upon entering qf, the machine moves indefinitely to the right. The only computations that halt in M′ are those generated by computations of M that halt in an accepting state. Thus L(M′) = L(M).

Figure 6.4: TM accepting a^i b^i c^i

6.4 Multitrack Machines

A multitrack tape is one in which the tape is divided into tracks. Multiple tracks increase the amount of information that can be considered when determining the appropriate transition. A tape position in a two-track machine is represented by an ordered pair [x, y], where x is the symbol in track 1 and y the symbol in track 2. The states, input alphabet, tape alphabet, initial state, and final states of a two-track machine are the same as in the standard Turing machine. A two-track transition reads and rewrites the entire tape position. A transition of a two-track machine is written δ(qi, [x, y]) = [qj, [z, w], d], where d ∈ {L, R}.
The input to a two-track machine is placed in the standard input position in track 1. All the positions in track 2 are initially blank. Acceptance in multitrack machines is by final state.
Languages accepted by two-track machines are precisely the recursively enumerable languages.
Theorem 6.4.1 A language L is accepted by a two-track Turing machine if, and only if, it is accepted
by a standard Turing machine.
Proof: Clearly, if L is accepted by a standard Turing machine, it is accepted by a two-track machine; the equivalent two-track machine simply ignores the presence of the second track.
Let M = (Q, Σ, Γ, δ, q0, F) be a two-track machine. A one-track machine will be constructed in which a single tape square contains the same information as a tape position in the two-track tape. The representation of a two-track tape position as an ordered pair indicates how this can be accomplished. The tape alphabet of the equivalent one-track machine M′ consists of ordered pairs of tape elements of M. The input to the two-track machine consists of ordered pairs whose second component is blank; the input symbol a of M is identified with the ordered pair [a, B] of M′. The one-track machine

M′ = (Q, Σ × {B}, Γ × Γ, δ′, q0, F)

with transition function

δ′(qi, [x, y]) = δ(qi, [x, y])

accepts L(M).
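The heart of the proof is just a change of alphabet, which the following small Python sketch makes explicit; the helper names are our own.

def two_track_input(w, blank='B'):
    # track 1 holds the input; track 2 is initially blank
    return [(a, blank) for a in w]

def one_track_delta(delta2):
    # delta2: (state, (x, y)) -> (state', (z, v), direction).
    # Read over the alphabet of pairs, this already is a one-track
    # transition function, so nothing needs to change.
    return dict(delta2)

print(two_track_input('ab'))   # [('a', 'B'), ('b', 'B')]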

6.5 Two-Way Tape Machines

A Turing machine with a two-way tape is identical to the standard model except that the tape extends indefinitely in both directions. Since a two-way tape has no left boundary, the input can be placed anywhere on the tape. All other tape positions are assumed to be blank. The tape head is initially positioned on the blank to the immediate left of the input string.

6.6 Multitape Machines

A k-tape machine consists of k tapes and k independent tape heads. The states and alphabets of a multitape machine are the same as in a standard Turing machine. The machine reads all k tapes simultaneously but has a single finite-state control; this is depicted by connecting each of the independent tape heads to one control indicating the current state. A transition is determined by the state and the symbols scanned by each of the tape heads. A transition in a multitape machine may
i) change the state,
ii) write a symbol on each of the tapes,
iii) independently reposition each of the tape heads.
The repositioning consists of moving a tape head one square to the left, one square to the right, or leaving it at its current position. The input to a multitape machine is placed in the standard position on tape 1. All the other tapes are assumed to be blank. The tape heads originally scan the leftmost position of each tape. Any tape head attempting to move to the left of the boundary of its tape terminates the computation abnormally. Any language accepted by a k-tape machine is accepted by a (2k + 1)-track machine.
Theorem 6.6.1 The time taken by a one-tape TM N to simulate n moves of a k-tape TM M is O(n^2).

6.7 Nondeterministic Turing Machines

A nondeterministic Turing machine may specify any finite number of transitions for a given configuration. The components of a nondeterministic machine, with the exception of the transition function, are identical to those of the standard Turing machine. Transitions in a nondeterministic machine are defined by a partial function from Q × Γ to subsets of Q × Γ × {L, R}.
Every language accepted by a nondeterministic Turing machine is recursively enumerable.

6.8 Turing Machines as Language Enumerators

Definition 6.8.1 A k-tape Turing machine E = (Q, Σ, Γ, δ, q0) enumerates a language L if
i) the computation begins with all tapes blank;
ii) with each transition, the tape head on tape 1 (the output tape) remains stationary or moves to the right;
iii) at any point in the computation, the nonblank portion of tape 1 has the form

B#u1#u2# ... #uk# or B#u1#u2# ... #uk#v,

where ui ∈ L and v ∈ Σ*;
iv) u is written on the output tape 1, preceded and followed by #, if, and only if, u ∈ L.
Example 6.8.1
The machine E of Fig. 6.5 enumerates the language L = {a^i b^i c^i | i ≥ 0}.

Figure 6.5: A k-tape TM enumerating L = {a^i b^i c^i | i ≥ 0}

Lemma 6.8.1 If L is enumerated by a Turing machine, then L is recursively enumerable.
Proof: Assume that L is enumerated by a k-tape Turing machine E. A (k + 1)-tape machine M accepting L can be constructed from E. The additional tape of M is the input tape; the remaining k tapes allow M to simulate the computation of E. The computation of M begins with a string u on its input tape. Next, M simulates the computation of E. Whenever the simulation of E writes #, a string w ∈ L has been generated. M then compares u with w and accepts u if u = w. Otherwise, the simulation of E is used to generate another string from L, and the comparison cycle is repeated. If u ∈ L, it will eventually be produced by E and consequently accepted by M.
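A Python generator makes a convenient stand-in for the enumerating machine E; the acceptor below compares its input against each enumerated string in turn, halting exactly on members of L just like the (k + 1)-tape machine in the proof. The enumerator chosen here is our own example.

def enumerate_L():
    # emits the strings of L = { a^i b^i c^i | i >= 0 } in order
    i = 0
    while True:
        yield 'a' * i + 'b' * i + 'c' * i
        i += 1

def accepts(u):
    for w in enumerate_L():
        if w == u:
            return True              # u appeared in the enumeration
        # otherwise keep simulating E; for u not in L this loops forever,
        # which is all that recursive enumerability requires

print(accepts('aabbcc'))             # True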

Chapter 7

The Chomsky Hierarchy

7.1 The Chomsky Hierarchy

Grammars                         Languages            Accepting Machines
--------------------------------------------------------------------------
Type 0 grammars,                 Recursively          Turing machine,
  phrase-structure grammars,     enumerable           nondeterministic
  unrestricted grammars          languages            Turing machine

Type 1 grammars,                 Context-sensitive    Linear-bounded
  context-sensitive grammars,    languages            automata
  monotonic grammars

Type 2 grammars,                 Context-free         Pushdown automata
  context-free grammars          languages

Type 3 grammars,                 Regular languages    Deterministic finite
  regular grammars,                                   automata,
  left-linear grammars,                               nondeterministic
  right-linear grammars                               finite automata

Chapter 8

Decidability

8.1 Decision Problems

A decision problem P is a set of questions, each of which has a yes or no answer. The single question "Is 8 a perfect square?" is an example of the type of question under consideration in a decision problem. A decision problem usually consists of an infinite number of related questions. For example, the problem PSQ of determining whether an arbitrary natural number is a perfect square consists of the following questions:

p0: Is 0 a perfect square?
p1: Is 1 a perfect square?
p2: Is 2 a perfect square?
...

A solution to a decision problem P is an algorithm that determines the appropriate answer to every question p ∈ P.
An algorithm that solves a decision problem should be

1. Complete
2. Mechanistic
3. Deterministic.
A procedure that satisfies the preceding properties is often called effective.
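As an illustration, here is a sketch of an effective procedure solving PSQ; the function is our own, but it is complete (it answers every instance), mechanistic, and deterministic in the sense above.

def is_perfect_square(n: int) -> bool:
    # answers the question p_n: "Is n a perfect square?"
    k = 0
    while k * k < n:      # terminates: k*k eventually exceeds any fixed n
        k += 1
    return k * k == n

print([i for i in range(26) if is_perfect_square(i)])  # [0, 1, 4, 9, 16, 25]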
A problem is decidable if it has a representation in which the set of accepted input strings forms a recursive language. Since computations of deterministic multitrack and multitape machines can be simulated on a standard Turing machine, solutions using these machines also establish the decidability of a problem.
8.2 The Church-Turing Thesis

The Church-Turing thesis asserts that every solvable decision problem can be transformed into an equivalent Turing machine problem.
The Church-Turing thesis for decision problems: There is an effective procedure to solve a decision problem if, and only if, there is a Turing machine that halts for all input strings and solves the problem.
The extended Church-Turing thesis for decision problems: A decision problem P is partially solvable if, and only if, there is a Turing machine that accepts precisely the elements of P whose answer is "yes".
A proof "by the Church-Turing thesis" is a shortcut often taken in establishing the existence of a decision algorithm. Rather than constructing a Turing machine solution to a decision problem, we describe an intuitively effective procedure that solves the problem. The Church-Turing thesis asserts that a decision problem P has a solution if, and only if, there is a Turing machine that determines the answer for every p ∈ P. If no such Turing machine exists, the problem is said to be undecidable.

8.3 The Halting Problem for Turing Machines

Theorem 8.3.1 The halting problem for Turing machines is undecidable.

Proof: The proof is by contradiction. Assume that there is a Turing machine H that solves the halting problem. A string is accepted by H if
i) the input consists of the representation of a Turing machine M followed by a string w, and
ii) the computation of M with input w halts.
If either of these conditions is not satisfied, H rejects the input. The operation of the machine H is depicted in Fig. 8.1.
Figure 8.1: Halting Machine


The machine H is modified to construct a Turing machine H′. The computations of H′ are the same as those of H, except that H′ loops indefinitely whenever H terminates in an accepting state, that is, whenever M halts on input w. The transition function of H′ is constructed from that of H by adding transitions that cause H′ to move indefinitely to the right upon entering an accepting configuration of H.
H′ is combined with a copy machine to construct another Turing machine D. The input to D is a Turing machine representation R(M). A computation of D begins by creating the string R(M)R(M) from the input R(M). The computation continues by running H′ on R(M)R(M). The input to the machine D may be the representation of any Turing machine with alphabet {0, 1, B}. In particular, D is such a machine. Consider a computation of D with input R(D). Rewriting the diagram of Fig. 8.2 with M replaced by D and R(M) by R(D), we obtain Fig. 8.3.
Examining the preceding computation, we see that D halts with input R(D) if, and only if, D does not halt with input R(D). This is obviously a contradiction. However, the machine D can be

Figure 8.2: Turing Machine D with R(M) as input

Figure 8.3: Turing Machine D with R(D) as input


constructed directly from a machine H that solves the halting problem. The assumption that the
halting problem is decidable produces the preceding contradiction. Therefore, we conclude that the
halting problem is undecidable.
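The diagonal argument can be transcribed as hypothetical Python. Everything below is an assumption made only to expose the contradiction: 'halts' plays the role of H, and no such total function can actually be written.

def halts(machine_repr: str, input_str: str) -> bool:
    # hypothetical decider H: True iff the represented machine halts
    raise NotImplementedError      # assumed, not constructible

def D(machine_repr: str) -> None:
    # D copies its input R(M) and runs H' on R(M)R(M)
    if halts(machine_repr, machine_repr):
        while True:                # H': loop when M halts on R(M)
            pass
    # otherwise: halt immediately

# With R(D) denoting the representation of D itself:
#   D(R(D)) halts  <=>  halts(R(D), R(D)) is False  <=>  D(R(D)) does not
#   halt -- exactly the contradiction derived in the proof.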
Corollary 8.3.1 The language LH = {R(M)w | R(M) is the representation of a Turing machine M, w ∈ {0, 1}*, and M halts with input w} is not recursive.

8.4 A Universal Machine

The machine U defined below is called a universal Turing machine, since the outcome of the computation of any machine M with input w can be obtained by the computation of U with input R(M)w.

Figure 8.4: Universal Machine

Theorem 8.4.1 The language LH is recursively enumerable.

Proof: A deterministic three-tape machine U is designed to accept LH by halting. A computation of U begins with the input on tape 1. The encoding scheme presented in Section 8.3 is used to represent the input Turing machine. If the input string has the form R(M)w, the computation of M with input w is simulated on tape 3. The universal machine uses the information encoded in the representation R(M) to simulate the transitions of M. A computation of U consists of the following actions:
1. If the input string does not have the form R(M)w for a Turing machine M and string w, U moves indefinitely to the right.
2. The string w is written on tape 3 beginning at position one. The tape head is then repositioned at the leftmost square of the tape. The configuration of tape 3 is the initial configuration of a computation of M with input w.
3. A single 1, the encoding of state q0, is written on tape 2.
4. A transition of M is simulated on tape 3. The transition of M is determined by the symbol scanned on tape 3 and the state encoded on tape 2. Let x be the symbol from tape 3 and qi the state encoded on tape 2.
a) Tape 1 is scanned for a transition whose first two components match en(qi) and en(x). If there is no such transition, U halts, accepting the input.
b) Assume tape 1 contains the encoded transition en(qi)0en(x)0en(qj)0en(y)0en(d). Then
i) en(qi) is replaced by en(qj) on tape 2,
ii) the symbol y is written on tape 3, and
iii) the tape head of tape 3 is moved in the direction specified by en(d).
5. The next transition of M is simulated by repeating step 4.
The simulation by the universal machine U accepts precisely the strings in LH. The computations of U loop indefinitely for strings in {0, 1}* − LH. Since LH = L(U), LH is recursively enumerable.
Corollary 8.4.1 The recursive languages are a proper subset of the recursively enumerable languages.
Proof: The acceptance of LH by the universal machine demonstrates that LH is recursively enumerable, while Corollary 8.3.1 established that LH is not recursive.
Note: A language L is recursive if both L and L̄ are recursively enumerable.
Corollary 8.4.2 The language L̄H is not recursively enumerable.

8.5 The Post Correspondence Problem

The undecidable problems presented in the preceding sections have been concerned with the properties of Turing machines or mathematical systems that simulate Turing machines. The Post correspondence problem is a combinatorial question that can be described as a simple game of manipulating dominoes. A domino consists of two strings from a fixed alphabet, one on the top half of the domino and the other on the bottom.
The game begins when one of the dominoes is placed on a table. Another domino is then placed to
the immediate right of the domino on the table. This process is repeated, constructing a sequence
of adjacent dominoes. A Post correspondence system can be thought of as defining a finite set of
domino types. We assume that there is an unlimited number of dominoes of each type; playing a

domino does not limit the number of future moves. A string is obtained by concatenating the strings
in the top halves of a sequence of dominoes. We refer to this as the top string. Similarly, a sequence
of dominoes defines a bottom string. The game is successfully completed by constructing a finite
sequence of dominoes in which the top and bottom strings are identical.
Consider the Post correspondence system defined by the dominoes in Fig. 8.5. The sequence of dominoes in Fig. 8.6 is a solution to this Post correspondence system.

Figure 8.5: Post Correspondence System

Figure 8.6: Post Correspondence Solution
Formally, a Post correspondence system consists of an alphabet Σ and a finite set of ordered pairs [ui, vi], i = 1, 2, ..., n, where ui, vi ∈ Σ+. A solution to a Post correspondence system is a sequence of indices i1, i2, ..., ik such that

ui1 ui2 ... uik = vi1 vi2 ... vik.

The problem of determining whether a Post correspondence system has a solution is the Post correspondence problem.
Example 8.5.1
The Post correspondence system with alphabet {a, b} and ordered pairs [aaa, aa] and [baa, abaaa] has a solution, exhibited in Fig. 8.7.

Figure 8.7: Example 8.5.1
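Although the Post correspondence problem is undecidable in general, short solutions can be found by brute force; the depth bound below is therefore essential. The search function is our own sketch.

from itertools import product

def pcp_solution(pairs, max_len=8):
    # pairs: list of (u_i, v_i). Returns the first index sequence
    # (1-based) whose top and bottom strings agree, or None.
    n = len(pairs)
    for k in range(1, max_len + 1):
        for seq in product(range(n), repeat=k):
            top = ''.join(pairs[i][0] for i in seq)
            bottom = ''.join(pairs[i][1] for i in seq)
            if top == bottom:
                return [i + 1 for i in seq]
    return None

print(pcp_solution([('aaa', 'aa'), ('baa', 'abaaa')]))  # [1, 2, 1]

The sequence 1, 2, 1 spells aaa·baa·aaa = aa·abaaa·aa = aaabaaaaa on both halves.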

Chapter 9

Undecidability

There are specific problems we cannot solve using a computer. These problems are called undecidable. While a Turing machine looks nothing like a PC, it has been recognized as an accurate model for what any physical computing device is capable of doing. We use the Turing machine to develop a theory of undecidable problems. We show that a number of problems that are easy to express are in fact undecidable.

9.1 Problems That Computers Cannot Solve

One particular problem we discuss is whether the first thing a C program prints is "hello, world". Although we might imagine that simulation of the program would allow us to tell what the program does, we must in reality contend with programs that take an unimaginably long time before making any output at all. This problem - not knowing when, if ever, something will occur - is the ultimate cause of our inability to tell what a program does.

9.1.1 Programs that Print "Hello, World"

In Fig. 9.1, it is easy to discover that the program prints "hello, world" and terminates. However, there are other programs that also print "hello, world", yet the fact that they do so is far from obvious. Figure 9.2 shows another program that might print "hello, world". It takes an input n and looks for positive integer solutions to the equation x^n + y^n = z^n. If it finds one, it prints "hello, world". If it never finds integers x, y, and z to satisfy the equation, then it continues searching forever and never prints "hello, world".
If the value of n that the program reads is 2, then it will eventually find combinations of integers

main()
{
    printf("hello, world\n");
}

Figure 9.1: Hello-World Program

such as total = 12, x = 3, y = 4, and z = 5, for which x^n + y^n = z^n. Thus, for input 2, the program does print "hello, world".

However, for any integer n > 2, the program will never find a triple of positive integers satisfying x^n + y^n = z^n, and thus will fail to print "hello, world". Interestingly, until a few years ago, it was not known whether or not this program would print "hello, world" for some large integer n. The claim that


int exp(int i, int n)
/* computes i to the power n */
{
    int ans, j;
    ans = 1;
    for (j = 1; j <= n; j++) ans *= i;
    return ans;
}

main()
{
    int n, total, x, y, z;
    scanf("%d", &n);
    total = 3;
    while (1) {
        for (x = 1; x <= total - 2; x++)
            for (y = 1; y <= total - x - 1; y++) {
                z = total - x - y;
                if (exp(x, n) + exp(y, n) == exp(z, n))
                    printf("hello, world\n");
            }
        total++;
    }
}

Figure 9.2: Fermat's last theorem expressed as a hello-world program
it would not, i.e., that there are no integer solutions to the equation x^n + y^n = z^n if n > 2, was made by Fermat 300 years ago, but no proof was found until quite recently. This statement is often referred to as Fermat's last theorem.
Let us define the hello-world problem to be: determine whether a given C program, with a given input, prints "hello, world" as the first 12 characters that it prints. It would be remarkable indeed if we could write a program that could examine any program P and input I for P, and tell whether P, run with I as its input, would print "hello, world". We shall prove that no such program exists.

9.1.2 The Hypothetical "Hello, World" Tester

The proof of impossibility of making the hello-world test is a proof by contradiction. That is, we assume there is a program, call it H, that takes as input a program P and an input I, and tells whether P with input I prints "hello, world". Figure 9.3 is a representation of what H does.
If a problem has an algorithm like H, that always tells correctly whether an instance of the problem

Figure 9.3: A hypothetical program H that is a hello-world detector


has answer "yes" or "no", then the problem is said to be decidable. Our goal is to prove that H does not exist; i.e., the hello-world problem is undecidable.
In order to prove that statement by contradiction, we are going to make several changes to H, eventually constructing a related program called H2 that we show cannot exist. Since the changes to H are simple transformations that can be done to any C program, the only questionable statement is the existence of H, so it is that assumption we will have contradicted.
To simplify our discussion, we shall make a few assumptions about C programs.
1. All output is character-based; e.g., we are not using a graphics package or any other facility to make output that is not in the form of characters.
2. All character-based output is performed using printf, rather than putchar() or another character-based output function.
We now assume that the program H exists. Our first modification is to change the output "no", which is the response that H makes when its input program P does not print "hello, world" as its first output in response to input I. As soon as H prints "n", we know it will eventually follow with "o". Thus, we can modify any printf statement in H that prints "n" to instead print "hello, world"; any printf statement that prints "o" but not the "n" is omitted. As a result, the new program, which we call H1, behaves like H, except it prints "hello, world" exactly when H would print "no". H1 is suggested by Fig. 9.4.
Since we are interested in programs that take other programs as input and tell something about
Figure 9.4: H1 behaves like H, but it says "hello, world" instead of "no"


them, we shall restrict H1 so that it:
a. takes only input P, not P and I;
b. asks what P would do if its input were its own code, i.e., what would H1 do on input P as program and P as input I as well?
The modifications we must perform on H1 to produce the program H2 shown in Fig. 9.5 are as follows:
1. H2 first reads the entire input P and stores it in an array A, which it mallocs for the purpose.
2. H2 then simulates H1, but whenever H1 would read input from P or I, H2 reads from the stored copy in A. To keep track of how much of P and I H1 has read, H2 can maintain two cursors that mark positions in A.
Fig. 9.6 shows what H2 does when given itself as input. Recall that H2, given any program P as input, makes output "yes" if P prints "hello, world" when given itself as input. Also, H2 prints "hello, world" if P, given itself as input, does not print "hello, world" as its first output.
Suppose that the H2 represented by the box in Fig. 9.6 makes the output "yes". Then the H2 in the box is saying about its input H2 that H2, given itself as input, prints "hello, world" as its first output. But we just supposed that the first output H2 makes in this situation is "yes" rather than "hello, world".
Thus it appears that in Fig. 9.6 the output of the box is "hello, world", since it must be one or the

Figure 9.5: H2 behaves like H1, but uses its input P as both P and I

Figure 9.6: What does H2 do when given itself as input?

other. But if H2, given itself as input, prints "hello, world" first, then the output of the box in Fig. 9.6 must be "yes". Whichever output we suppose H2 makes, we can argue that it makes the other output. This situation is paradoxical, and we conclude that H2 cannot exist. As a result, we have contradicted the assumption that H exists. That is, we have proved that no program H can tell whether or not a given program P with input I prints "hello, world" as its first output.

9.1.3 Reducing One Problem to Another

Suppose we want to determine whether or not some other problem is solvable by a computer. We can
try to write a program to solve it, but if we cannot figure out how to do so, then we can try to prove
that no such program exists.
We could prove this new problem undecidable by a technique similar to what we did for the hello-world problem: assume there is a program to solve it and develop a paradoxical program that must do two contradictory things. However, once we have a problem that we know is undecidable, we no longer have to prove the existence of a paradoxical situation. It is sufficient to show that if we could solve the new problem, then we could use that solution to solve a problem that we already know is undecidable. This technique is called the reduction of P1 to P2 and is illustrated in Fig. 9.7.
Suppose we know that P1 is undecidable and P2 is a new problem that we would like to prove is undecidable as well. We suppose that there is a program, represented in Fig. 9.7 by the diamond labeled "decide"; this program prints "yes" or "no", depending on whether its input instance of problem P2 is or is not in the language of that problem.
In order to prove that problem P2 is undecidable, we have to invent a construction, represented by the square box in Fig. 9.7, that converts instances of P1 into instances of P2 that have the same answer. Once we have this construction, we can solve P1 as follows:
1. Given an instance of P1, that is, given a string w that may or may not be in the language P1, apply the construction algorithm to produce a string x.
2. Test whether x is in P2, and give the same answer about w and P1.

Figure 9.7: Reduction of P1 to P2
If w is in P1, then x is in P2, so this algorithm says "yes". If w is not in P1, then x is not in P2, and the algorithm says "no". Either way, it answers correctly about w. Since we assumed that no algorithm to decide membership of a string in P1 exists, we have a proof by contradiction that the hypothesized decision algorithm for P2 does not exist; i.e., P2 is undecidable.
We shall now give a formal proof of the existence of a problem about Turing machines that no Turing machine can solve. We divide problems that can be solved by a Turing machine into two classes: those that have an algorithm (i.e., a Turing machine that halts whether or not it accepts its input), and those that are only solved by Turing machines that may run forever on inputs they do not accept. We prove undecidable the following problem:
Does this Turing machine accept this input?
Then, we exploit this undecidability result to exhibit a number of other undecidable problems.

9.2 A Language That Is Not Recursively Enumerable

Our goal is to prove undecidable the language consisting of pairs (M, w) such that:
1. M is a Turing machine (coded in binary) with input alphabet {0, 1},
2. w is a string of 0's and 1's, and
3. M accepts input w.
We must give a coding for Turing machines that uses only 0's and 1's, regardless of how many states the TM has. We can then treat any binary string as if it were a Turing machine. If the string is not a well-formed representation of some TM, we may think of it as representing a TM with no moves.

9.2.1 Enumerating the Binary Strings

We shall assign integers to all binary strings so that each integer corresponds to one string. If w is a binary string, treat 1w as a binary integer i. Then we shall call w the ith string. That is, ε is the first string, 0 is the second, 1 is the third, 00 the fourth, 01 the fifth, and so on. Equivalently, strings are ordered by length, and strings of equal length are ordered lexicographically. We refer to the ith string as wi.
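This ordering is a one-line computation; the helper below is our own sketch of it.

def w(i: int) -> str:
    # the i-th binary string (i >= 1); w(1) = '' (epsilon)
    return bin(i)[3:]   # bin(i) = '0b1...': drop '0b' and the leading 1

print([w(i) for i in range(1, 8)])  # ['', '0', '1', '00', '01', '10', '11']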

9.2.2 Codes for Turing Machines

To represent a TM M = (Q, {0, 1}, Γ, δ, q1, B, F) as a binary string, we must first assign integers to the states, tape symbols, and directions L and R.
1. We shall assume the states are q1, q2, ..., qr for some r. The start state will always be q1, and q2 will be the only accepting state.
2. We shall assume the tape symbols are X1, X2, ..., Xs for some s. X1 will always be the symbol 0, X2 will be 1, and X3 will be B, the blank. Other tape symbols can be assigned to the remaining integers arbitrarily.
3. We shall refer to direction L as D1 and direction R as D2.

Once we have established an integer to represent each state, symbol, and direction, we can encode the transition function δ. Suppose one transition rule is δ(qi, Xj) = (qk, Xl, Dm), for some integers i, j, k, l, and m. We shall code this rule by the string 0^i 1 0^j 1 0^k 1 0^l 1 0^m. Notice that since all of i, j, k, l, and m are at least one, there are no occurrences of two or more consecutive 1's within the code for a single transition.
A code for the entire TM M consists of all the codes for the transitions, in some order, separated by pairs of 1's:

C1 11 C2 11 ... 11 Cn,

where each of the C's is the code for one transition of M.
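The encoding is easy to realize in code. In this sketch (the helper names are ours) states, symbols, and directions are given as the positive integers assigned above.

def code_rule(i, j, k, l, m):
    # encode delta(q_i, X_j) = (q_k, X_l, D_m) as 0^i 1 0^j 1 0^k 1 0^l 1 0^m
    return '1'.join('0' * n for n in (i, j, k, l, m))

def code_tm(rules):
    # join the transition codes with pairs of 1's
    return '11'.join(code_rule(*r) for r in rules)

# delta(q1, X1) = (q2, X2, D2): in state q1 reading 0, go to q2,
# write 1, and move right
print(code_tm([(1, 1, 2, 2, 2)]))   # 010100100100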

9.2.3 The Diagonalization Language

There is now a concrete notion of Mi, the ith Turing machine: that TM M whose code is wi, the ith binary string. Many integers do not correspond to any TM at all. If wi is not a valid TM code, we shall take Mi to be a TM with one state and no transitions. That is, for these values of i, Mi is a Turing machine that immediately halts on any input. Thus L(Mi) is ∅ if wi fails to be a valid TM code.
The language Ld, the diagonalization language, is the set of strings wi such that wi is not in L(Mi). That is, Ld consists of all strings w such that the TM M whose code is w does not accept when given w as input.
The reason Ld is called a diagonalization language can be seen if we consider Fig. 9.8. This table tells, for all i and j, whether TM Mi accepts input string wj; 1 means "yes" and 0 means "no". We may think of the ith row as the characteristic vector for the language L(Mi); that is, the 1's in this row indicate the strings that are members of this language.
The diagonal values tell whether Mi accepts wi. To construct Ld, we complement the diagonal. For instance, if Fig. 9.8 were the correct table, then the complemented diagonal would begin 1, 0, 0, 0, .... Thus, Ld would contain w1 = ε, and would not contain w2 through w4, which are 0, 1, and 00, and so on.
The trick of complementing the diagonal to construct the characteristic vector of a language that cannot be the language appearing in any row is called diagonalization.

9.2.4 Proof that Ld is not Recursively Enumerable

Theorem 9.2.1 Ld is not a recursively enumerable language. That is, there is no Turing machine that accepts Ld.
Proof: Suppose Ld were L(M) for some TM M. Since Ld is a language over alphabet {0, 1}, M would be in the list of Turing machines we have constructed, since it includes all TMs with input alphabet {0, 1}. Thus, there is at least one code for M, say i; that is, M = Mi.
Now, ask whether wi is in Ld.
If wi is in Ld, then Mi accepts wi. But then, by definition of Ld, wi is not in Ld, because Ld contains only those wj such that Mj does not accept wj.

Figure 9.8: The table that represents acceptance of strings by Turing machines

Similarly, if wi is not in Ld, then Mi does not accept wi. Thus, by definition of Ld, wi is in Ld.
Since wi can neither be in Ld nor fail to be in Ld, we conclude that there is a contradiction of our assumption that M exists. That is, Ld is not a recursively enumerable language.

Figure 9.9: Relationship between the recursive, RE, and non-RE languages

9.2.5 Complements of Recursive and RE languages

Theorem 9.2.2 If L is a recursive language, so is L̄.

Proof: Let L = L(M) for some TM M that always halts. We construct a TM M̄ such that L̄ = L(M̄) by the construction suggested in Fig. 9.10. That is, M̄ behaves just like M. However, M is modified as follows to create M̄:
1. The accepting states of M are made nonaccepting states of M̄ with no transitions; i.e., in these states M̄ will halt without accepting.

2. M̄ has a new accepting state r; there are no transitions from r.
3. For each combination of a nonaccepting state of M and a tape symbol of M such that M has no transition (i.e., M halts without accepting), add a transition to the accepting state r.

Figure 9.10: Construction of a TM accepting the complement of a recursive language

Since M is guaranteed to halt, we know that M̄ is also guaranteed to halt. Moreover, M̄ accepts exactly those strings that M does not accept. Thus, M̄ accepts L̄.
Theorem 9.2.3 If both a language L and its complement are RE, then L is recursive. Note that then, by Theorem 9.2.2, L̄ is recursive as well.
Proof: The proof is suggested by Fig. 9.11. Let L = L(M1) and L̄ = L(M2). Both M1 and M2 are simulated in parallel by a TM M. We can make M a two-tape TM, and then convert it to a one-tape TM, to make the simulation easy and obvious. One tape of M simulates the tape of M1, while the other tape of M simulates the tape of M2. The states of M1 and M2 are each components of the state of M.

Figure 9.11: Simulation of two TMs accepting a language and its complement

If input w to M is in L, then M1 will eventually accept. If so, M accepts and halts. If w is not in L, then it is in L̄, so M2 will eventually accept. When M2 accepts, M halts without accepting. Thus, on all inputs, M halts, and L(M) is exactly L. Since M always halts and L(M) = L, we conclude that L is recursive.

Summarizing Theorems 9.2.2 and 9.2.3, only the following four combinations are possible for a language L and its complement L̄:

1. Both L and L̄ are recursive.
2. Neither L nor L̄ is RE.
3. L is RE but not recursive, and L̄ is not RE.
4. L̄ is RE but not recursive, and L is not RE.

9.2.6 The Universal Language

Definition 9.2.1 Lu, the universal language, is the set of binary strings that encode a pair (M, w), where M is a TM with binary input alphabet and w is a string in (0 + 1)*, such that w is in L(M). That is, Lu is the set of strings representing a TM together with an input that the TM accepts.
We shall show that there is a TM U, often called the universal Turing machine, such that Lu = L(U). Since the input to U is a binary string, U is in fact some Mj in the list of binary-input Turing machines.

Lu = { (M, w) | w ∈ L(M) }

It is easy to describe U as a multitape Turing machine, in the spirit of Fig. 9.12. In the case of U, the transitions of M are stored initially on the first tape, along with the string w. A second tape will be used to hold the simulated tape of M, using the same format as for the code of M. That is, tape symbol Xi of M will be represented by 0^i, and tape symbols will be separated by single 1's. The third tape of U holds the state of M, with state qi represented by i 0's. A sketch of U is in Fig. 9.12.
The operation of U can be summarized as follows:
1. Examine the input to make sure that the code for M is a legitimate code for some TM. If not, U halts without accepting. Since invalid codes are assumed to represent the TM with no moves, and such a TM accepts no inputs, this action is correct.
2. Initialize the second tape to contain the input w in its encoded form. That is, for each 0 of w, place 10 on the second tape, and for each 1 of w, place 100 there. Note that the blanks on the simulated tape of M, which are represented by 1000, will not actually appear on that tape; all cells beyond those used for w will hold the blank of U. However, U knows that, should it look for a simulated symbol of M and find its own blank, it must replace that blank by the sequence 1000 to simulate the blank of M.
3. Place 0, the code of the start state of M, on the third tape, and move the head of U's second tape to the first simulated cell.
4. To simulate a move of M, U searches on its first tape for a transition 0^i 1 0^j 1 0^k 1 0^l 1 0^m, such that 0^i is the state on tape 3, and 0^j is the tape symbol of M that begins at the position on tape 2 scanned by U. This transition is the one M would next make. U should:
(a) Change the contents of tape 3 to 0^k; that is, simulate the state change of M. To do so, U first changes all the 0's on tape 3 to blanks, and then copies 0^k from tape 1 to tape 3.
(b) Replace 0^j on tape 2 by 0^l; that is, change the tape symbol of M. If more or less space is needed (i.e., j ≠ l), use the scratch tape and the shifting-over technique to manage the spacing.
(c) Move the head on tape 2 to the position of the next 1 to the left or right, depending on whether m = 1 (move left) or m = 2 (move right). Thus, U simulates the move of M to the left or to the right.
5. If M has no transition that matches the simulated state and tape symbol, then in (4) no transition will be found. Thus, M halts in the simulated configuration, and U must do likewise.
6. If M enters its accepting state, then U accepts.
In this manner, U simulates M on w. U accepts the coded pair (M, w) if and only if M accepts w.

Figure 9.12: Organization of a universal Turing machine

9.2.7 Undecidability of the Universal Language

We can now exhibit a problem that is RE but not recursive: the language Lu. Knowing that Lu is undecidable (i.e., not a recursive language) is in many ways more valuable than our previous discovery that Ld is not RE. The reason is that the reduction of Lu to another problem P can be used to show that there is no algorithm to solve P, regardless of whether or not P is RE. However, reduction of Ld to P is only possible if P is not RE, so Ld cannot be used to show undecidability for problems that are RE but not recursive. On the other hand, if we want to show a problem not to be RE, then only Ld can be used; Lu is useless, since it is RE.

Theorem 9.2.4 Lu is RE but not recursive.

Proof: We just proved in Section 9.2.6 that Lu is RE. Suppose Lu were recursive. Then by Theorem 9.2.2, L̄u, the complement of Lu, would also be recursive. However, if we have a TM M to accept L̄u, then we can construct a TM to accept Ld, as in Fig. 9.13. Since we already know that Ld is not RE, we have a contradiction of our assumption that Lu is recursive.

Figure 9.13: Reduction of Ld to Lu

9.3 Undecidable Problems About Turing Machines

9.3.1 Reductions

In general, if we have an algorithm to convert instances of a problem P1 into instances of a problem P2 that have the same answer, then we say that P1 reduces to P2. As in Fig. 9.14, a reduction must turn any instance of P1 that has a "yes" answer into an instance of P2 with a "yes" answer, and every instance of P1 with a "no" answer into an instance of P2 with a "no" answer.

Figure 9.14: Reductions turn positive instances into positive and negative to negative

Theorem 9.3.1 If there is a reduction from P1 to P2, then:

(a) If P1 is undecidable, then so is P2.
(b) If P1 is non-RE, then so is P2.
Proof: First suppose P1 is undecidable. If it were possible to decide P2, then we could combine the reduction from P1 to P2 with the algorithm that decides P2 to construct an algorithm that decides P1, as suggested in Fig. 9.7. Suppose we are given an instance w of P1. Apply to w the algorithm that converts w into an instance x of P2. Then apply the algorithm that decides P2 to x. If that algorithm says "yes", then x is in P2. Because we reduced P1 to P2, we know the answer to w for P1 is "yes"; i.e., w is in P1. Likewise, if x is not in P2, then w is not in P1, and whatever answer we give to the question "is x in P2?" is also the correct answer to "is w in P1?".


We have thus contradicted the assumption that P1 is undecidable. Our conclusion is that if P1 is undecidable, then P2 is also undecidable.
Now consider part (b). Assume that P1 is non-RE, but P2 is RE. Now we have an algorithm to reduce P1 to P2, but we have only a procedure to recognize P2; that is, there is a TM that says "yes" if its input is in P2, but may not halt if its input is not in P2. As for part (a), starting with an instance w of P1, convert it by the reduction algorithm to an instance x of P2. Then apply the TM for P2 to x. If x is accepted, then accept w.
This procedure describes a TM whose language is P1. If w is in P1, then x is in P2, so this TM will accept w. If w is not in P1, then x is not in P2. Then the TM may or may not halt, but it will surely not accept w. Since we assumed no TM for P1 exists, we have shown by contradiction that no TM for P2 exists either; i.e., if P1 is non-RE, then P2 is non-RE.

9.3.2 Turing Machines That Accept the Empty Language

Definition 9.3.1
Le = { M | L(M) = ∅ }
Lne = { M | L(M) ≠ ∅ }

Theorem 9.3.2 Lne is recursively enumerable.
Proof: We exhibit a nondeterministic TM M accepting Lne, as in Fig. 9.15; recall that a nondeterministic TM can be converted to an equivalent deterministic TM.

Figure 9.15: Construction of an NTM to accept Lne

The operation of M is as follows:
1. M takes as input a TM code Mi.
2. Using its nondeterminism, M guesses an input w that Mi might accept; if L(Mi) ≠ ∅, such a guess w ∈ L(Mi) exists.
3. M simulates Mi on w.
4. If Mi accepts w, then M accepts its own input, which is Mi.
In this manner, if Mi accepts even one string, M will guess that string and accept Mi. However, if L(Mi) = ∅, then no guess w leads to acceptance by Mi, so M does not accept Mi. Thus, L(M) = Lne.
Theorem 9.3.3 Lne is not recursive.

Proof: We reduce Lu to Lne. Given a pair (M, w), the reduction constructs a TM M′ whose plan is shown in Fig. 9.16.

Figure 9.16: Plan of TM M′ constructed from (M, w)

The working of M′ on an input x is as follows:
1. M′ ignores its own input x, overwriting it with the string (M, w). Since M′ is designed for a specific pair (M, w), which has some length n, we may construct M′ to have a sequence of states q0, q1, ..., qn, where q0 is the start state.
(a) In state qi, for i = 0, 1, ..., n − 1, M′ writes the (i + 1)st bit of the code for (M, w), goes to state q_{i+1}, and moves right.
(b) In state qn, M′ moves right, if necessary replacing any nonblanks by blanks.
2. When M′ reaches a blank in state qn, it uses a similar collection of states to reposition its head at the left end of the tape.
3. Now, using additional states, M′ simulates the universal TM U on its present tape.
4. If U accepts (M, w), then M′ accepts x. If U never accepts, then M′ never accepts either.
The effect of this construction is:
If M accepts w, then M′ accepts every input, so L(M′) ≠ ∅ and M′ ∈ Lne.
If M never accepts w, then M′ accepts nothing, so L(M′) = ∅ and M′ ∉ Lne.
This reduction of Lu to Lne, together with the nonrecursiveness of Lu, is sufficient to complete the proof that Lne is not recursive. However, to illustrate the impact of the reduction, suppose Lne were recursive. Then we could develop an algorithm for Lu as follows:
1. Convert (M, w) to the TM M′ as above.
2. Use the hypothetical algorithm for Lne to tell whether or not L(M′) = ∅. If so, say M does not accept w; if L(M′) ≠ ∅, say M accepts w.
Since we know from Theorem 9.2.4 that no such algorithm for Lu exists, we have contradicted the assumption that Lne is recursive, and we conclude that Lne is not recursive.
Now, we know the status of Le . If Le were RE, then by Theorem 9.2.3, both it and Lne would be
recursive. Since Lne is not recursive by Theorem 9.3.3, we conclude that:


Theorem 9.3.4 Le is not RE.

9.3.3 Rice's Theorem and Properties of RE Languages

A property of the RE languages is simply a set of RE languages. Thus, the property of being context-free is formally the set of all CFLs. The property of being empty is the set {∅}, consisting of only the empty language.
A property is trivial if it is either empty (i.e., satisfied by no language at all) or is all RE languages. Otherwise, it is nontrivial.
Note that the empty property, ∅, is different from the property of being an empty language, {∅}.
If P is a property of the RE languages, the language LP is the set of codes for Turing machines Mi such that L(Mi) is a language in P. When we talk about the decidability of a property P, we mean the decidability of the language LP.
Theorem 9.3.5 Every nontrivial property of the RE languages is undecidable.
Let P be a nontrivial property of the RE languages. Assume, to begin, that ∅, the empty language, is not in P. Since P is nontrivial, there must be some non-empty language L that is in P. Let ML be a TM accepting L.
LP = { M | L(M) ∈ P }
Examples:
1. LP = { M | L(M) ≠ ∅ } = Lne
2. LP = { M | L(M) = ∅ } = Le
3. LP = { M | L(M) is a regular language }
4. LP = { M | L(M) is context-free }
Proof: We reduce Lu to LP.
1. Assume first that ∅ ∉ P.
Since P is non-trivial, there is some L ∈ P which is not empty, and since L is RE, there is a TM ML that accepts L. Given a pair (M, w), construct a TM M′ (Fig. 9.17) that behaves as follows on its own input x:
(a) Simulate M on input w. Note that w is not the input to M′; rather, M′ writes M and w onto one of its tapes and simulates the universal TM U on that pair.
(b) If M does not accept w, then M′ does nothing else. In particular, M′ never accepts its own input x, so L(M′) = ∅. Since we assume ∅ is not in property P, the code for M′ is not in LP.
(c) If M accepts w, then M′ begins simulating ML on its own input x. Thus, M′ accepts exactly the language L. Since L is in P, the code for M′ is in LP.
In summary:
M accepts w ⟹ L(M′) = L ∈ P
M does not accept w ⟹ L(M′) = ∅ ∉ P
Since the above algorithm turns (M, w) into an M′ that is in LP if and only if (M, w) is in Lu, it is a reduction of Lu to LP, and proves that the property P is undecidable.
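The machine M′ of the proof can be sketched as a higher-order construction. Both helpers are hypothetical stand-ins: simulate_U(M, w) runs the universal TM on (M, w) and returns only if M accepts w (otherwise it diverges), and M_L is a recognizer for the fixed language L in P.

def make_M_prime(M, w, simulate_U, M_L):
    def M_prime(x):            # M' running on its own input x
        simulate_U(M, w)       # returns only if M accepts w; may diverge
        return M_L(x)          # reached only then, so M' accepts exactly L
    return M_prime             # L(M') = L if M accepts w, otherwise empty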

Figure 9.17: Construction of M′ for the proof of Rice's Theorem


2. Now assume ∅ ∈ P. Consider the complement property P̄, the set of RE languages that are not in P. Since ∅ ∉ P̄, part (1) shows that LP̄ is undecidable. Moreover, since every TM accepts an RE language, the complement of LP (the set of codes for Turing machines that do not accept a language in P) is the same as LP̄, the set of codes for Turing machines that accept a language in P̄. Suppose LP were decidable. Then so would its complement be, because the complement of a recursive language is recursive (Theorem 9.2.2); that is, LP̄ would be decidable, a contradiction. Hence LP is undecidable in this case as well.
Theorem 9.3.6 (Rice's theorem for recursive index sets) If P is non-trivial, then LP is not recursive.
Theorem 9.3.7 If LP is RE, then the list of binary codes for the finite languages in P is enumerable.
Proof: Let (i, j) be a pair produced by a pair generator. We treat i as the binary code of a finite set, taking 0 as the code for comma, 10 as the code for zero, and 11 as the code for one. We may in a straightforward manner construct a TM M(i) (essentially a finite automaton) that accepts exactly the words in the finite language represented by i. We then simulate the enumerator for LP for j steps. If it has printed M(i), we print the code for the finite set represented by i, that is, the binary representation of i itself, followed by a delimiter symbol #. In any event, after the simulation we return control to the pair generator, which generates the pair following (i, j).
Theorem 9.3.8 LP is RE if and only if
1. If L is in P and L ⊆ L′ for some RE language L′, then L′ is in P.
2. If L is an infinite language in P, then there is some finite subset L′ of L that is in P.
3. The set of finite languages in P is enumerable.
Corollary 9.3.1
The following properties of RE sets are not RE.
a) L = ∅
b) L = Σ*
c) L is recursive
d) L is not recursive

e) L is a singleton
f) L is a regular set
g) L − Lu ≠ ∅

Example 9.3.1
Show that the following properties of RE languages are not RE.
1. L = ∅
P = { L | L = ∅ }.
LP = { M | M is a TM description and L(M) = ∅ }.
Let L1 ∈ P, i.e., L1 = ∅. Let L2 = Σ*. L1 is a subset of L2, but L2 ∉ P. Rule 1 of Theorem 9.3.8 is not satisfied, and so LP is not RE.
2. S1 = { M | M is a TM and L(M) = L(01*0) }
P = { L(01*0) }. Let L1 = L(01*0), the only member of P. Let us take L2 = Σ*. We can see that L1 is a subset of L2, and we know that Σ* is RE. But Σ* does not belong to P, i.e., L2 ∉ P. The first rule of Theorem 9.3.8 is not satisfied. Therefore, S1 is not RE.
3. S2 = { M | M is a TM and L(M) ≠ L(01*0) }
P = { L | L is RE and L ≠ L(01*0) }. Let L1 = {00}, which is in P, and let L2 = L(01*0). Then L1 ⊆ L2, L2 is RE, and L2 ∉ P. This once again does not satisfy rule 1 of Theorem 9.3.8. Therefore, S2 is not RE.
4. L is recursive.
P = { L | L is recursive }.
LP = { M | M is a TM description and L(M) is recursive }.
Let L1 = ∅, which is recursive, so L1 ∈ P, and let L2 = L1 ∪ Lu = Lu. L1 is a subset of L2, but L2 ∉ P, since Lu is RE but not recursive (Theorem 6.3.1). Rule 1 of Theorem 9.3.8 is not satisfied, and so LP is not RE.
5. L is not recursive.
P = { L | L is not recursive }.
LP = { M | M is a TM description and L(M) is not recursive }.
Let L1 ∈ P, i.e., L1 is not recursive. Let L2 = Σ*. L1 is a subset of L2, but L2 ∉ P, since Σ* is recursive. Rule 1 of Theorem 9.3.8 is not satisfied, and so LP is not RE.
6. L is a singleton.
P = { L | L is a singleton }.
LP = { M | M is a TM description and L(M) is a singleton }.
Let L1 ∈ P, i.e., L1 is a singleton. Let L2 = Σ*. L1 is a subset of L2, but L2 ∉ P. Rule 1 of Theorem 9.3.8 is not satisfied, and therefore LP is not RE.
7. L is a regular set.
P = { L | L is a regular set }.
LP = { M | M is a TM description and L(M) is a regular set }.
Let L1 = ∅ ∈ P. Let L2 = L1 ∪ { 0^n 1^n | n ≥ 0 } = { 0^n 1^n | n ≥ 0 }. L1 is a subset of L2, and L2 is RE, but L2 ∉ P, since { 0^n 1^n | n ≥ 0 } is not regular. Rule 1 of Theorem 9.3.8 is not satisfied, and therefore LP is not RE.


Corollary 9.3.2
The following properties of RE sets are RE.
a) L ≠ ∅
b) L contains at least 10 members
c) w is in L for some fixed word w
d) L ∩ Lu ≠ ∅
Example 9.3.2
Show that the following properties of RE sets are RE.
1. L ≠ ∅
P = { L | L ≠ ∅ }.
LP = { M | M is a TM description and L(M) ≠ ∅ }, which is the language Lne.
We know that Lne is RE (Theorem 9.3.2).
2. L contains at least 10 members.
P = { L | |L| ≥ 10 }.
LP = { M | M is a TM description and |L(M)| ≥ 10 }.
There exists a TM T10 (fig. 9.18) that nondeterministically guesses 10 distinct strings and simulates the input TM M on each of them; T10 accepts if M accepts all 10 guessed strings. Therefore, LP is RE.








Figure 9.18: Turing Machine that accepts after guessing 10 strings


3. w is in L for some fixed word w.
There exists a TM Tw (fig. 9.19) that takes as input a TM M and simulates M on the string w. If M accepts w, then Tw accepts M. Thus Tw accepts M exactly when w ∈ L(M). Therefore, the property "w is in L for some fixed word w" is RE.
4. L ∩ Lu ≠ ∅
There exists a TM TL∩Lu (fig. 9.20) that takes as input a TM M, nondeterministically guesses a string w, and simulates M on w. If M accepts w, then TL∩Lu simulates a recognizer for Lu on w; if that recognizer also accepts, TL∩Lu accepts and halts. This Turing machine accepts M exactly when some string lies in both L(M) and Lu. Therefore this property is RE.








Figure 9.19: Turing Machine that simulates M on w

 
 


 

Figure 9.20: Turing Machine for L ∩ Lu



9.4 Post's Correspondence Problem

Definition 9.4.1
An instance of Post's Correspondence Problem (PCP) consists of two lists of strings over some alphabet Σ; the two lists must be of equal length. We generally refer to them as the A and B lists, and write A = w1, w2, ..., wk and B = x1, x2, ..., xk, for some integer k. For each i, the pair (wi, xi) is said to be a corresponding pair.
We say this instance of PCP has a solution if there is a sequence of one or more integers i1, i2, ..., im that, when interpreted as indexes for strings in the A and B lists, yields the same string; that is, wi1 wi2 ...wim = xi1 xi2 ...xim. In that case we say the sequence i1, i2, ..., im is a solution to this instance of PCP. Post's correspondence problem is:
Given an instance of PCP, tell whether this instance has a solution.
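Verifying a proposed solution is easy; only finding one is hard. Below is a small checker, with a classic instance (not from the text above) as a usage example:

def is_pcp_solution(A, B, indices):
    # indices are 1-based, matching the definition; at least one is required
    if not indices:
        return False
    top = "".join(A[i - 1] for i in indices)
    bottom = "".join(B[i - 1] for i in indices)
    return top == bottom

A = ["1", "10111", "10"]
B = ["111", "10", "0"]
print(is_pcp_solution(A, B, [2, 1, 1, 3]))   # True: both sides read 101111110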
We shall prove PCP undecidable by reducing Lu to PCP.

9.4.1 The Modified PCP

It is easier to reduce Lu to PCP if we first introduce an intermediate version of PCP, which we call the Modified Post's Correspondence Problem, or MPCP. In the modified PCP, there is an additional requirement on a solution: the first pair of the A and B lists must be the first pair in the solution. More formally, an instance of MPCP is two lists A = w1, w2, ..., wk and B = x1, x2, ..., xk, and a solution is a list of zero or more integers i1, i2, ..., im such that
w1 wi1 wi2 ...wim = x1 xi1 xi2 ...xim
Theorem 9.4.1 MPCP reduces to PCP.
Theorem 9.4.2 Post's Correspondence Problem is undecidable.

9.5 Other Undecidable Problems

Now, we shall consider a variety of problems that we can prove undecidable by reducing PCP to the
problem we wish to prove undecidable.

9.5.1 Undecidability of Ambiguity for CFGs

We now show how to reduce PCP to the question of whether a given context-free grammar is ambiguous.
Let the PCP instance consist of lists A = w1, w2, ..., wk and B = x1, x2, ..., xk. For list A we construct a CFG with A as the only variable. The terminals are all the symbols of the alphabet Σ used for this PCP instance, plus a distinct set of index symbols a1, a2, ..., ak that represent the choices of pairs of strings in a solution to the PCP instance. That is, the index symbol ai represents the choice of wi from the A list and of xi from the B list. The productions of the CFG for list A are:
A → w1 A a1 | w2 A a2 | ... | wk A ak | w1 a1 | w2 a2 | ... | wk ak
We shall call this grammar GA and its language LA; LA is the language for list A.
Notice that the terminal strings derived by GA are all those of the form wi1 wi2 ...wim aim ...ai2 ai1 for some m ≥ 1 and list of integers i1, i2, ..., im; each integer is in the range 1 to k. The sentential forms of GA all have a single A between the strings (the w's) and the index symbols (the a's), until we use one of the last group of k productions, none of which has an A in the body. Also, only two production bodies end with a given index symbol ai: A → wi A ai and A → wi ai.
Now, let us consider the other part of the given PCP instance, the list B = x1 , x2 , ..., xk . For this list
we develop another grammar GB :

B → x1 B a1 | x2 B a2 | ... | xk B ak | x1 a1 | x2 a2 | ... | xk ak

The language of this grammar will be referred to as LB . The same observations that we made for
GA apply also to GB . In particular a terminal string in LB has a unique derivation, which can be
determined by the index symbols in the tail of the string.
Finally, we combine the languages and grammars of the two lists to form a grammar GAB for the entire PCP instance. GAB consists of:
1. Variables A, B, and S; the latter is the start symbol.
2. Productions S → A | B.
3. All the productions of GA.
4. All the productions of GB.
We claim that GAB is ambiguous if and only if the instance (A, B) of PCP has a solution; that argument is the core of the next theorem.
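To make the construction concrete, here is a small generator for these productions. The representation is illustrative only: productions are emitted as plain strings, with the index symbol ai written "a<i>".

def list_grammar(var, strings):
    prods = []
    for i, w in enumerate(strings, start=1):
        a = "a{}".format(i)
        prods.append("{} -> {} {} {}".format(var, w, var, a))  # var -> w_i var a_i
        prods.append("{} -> {} {}".format(var, w, a))          # var -> w_i a_i
    return prods

def gab_grammar(A, B):
    return ["S -> A", "S -> B"] + list_grammar("A", A) + list_grammar("B", B)

# For A = ["1", "10111", "10"] this yields, among others,
# "A -> 1 A a1" and "A -> 1 a1".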

Theorem 9.5.1 It is undecidable whether a CFG is ambiguous.


Proof: We have already given the reduction of PCP to the question of whether a CFG is ambiguous; once we show the construction is correct, the reduction proves the problem of CFG ambiguity to be undecidable, since PCP is undecidable. Thus, we must show:
GAB is ambiguous if and only if the instance (A, B) of PCP has a solution.
(If) Suppose i1, i2, ..., im is a solution, so that wi1 wi2 ...wim = xi1 xi2 ...xim. Consider the leftmost derivation that begins S ⇒ A and uses the productions of GA to derive wi1 wi2 ...wim aim ...ai2 ai1, and the leftmost derivation that begins S ⇒ B and derives xi1 xi2 ...xim aim ...ai2 ai1. These are derivations of the same terminal string. Since they are clearly two distinct leftmost derivations of the same terminal string, we conclude that GAB is ambiguous.
(Only if) We already observed that a given terminal string cannot have more than one derivation in GA, nor more than one in GB. So the only way that a terminal string could have two leftmost derivations in GAB is if one of them begins S ⇒ A and continues with a derivation in GA, while the other begins S ⇒ B and continues with a derivation of the same string in GB.
The string with two derivations has a tail of indexes aim ...ai2 ai1, for some m ≥ 1. This tail must be a solution to the PCP instance, because what precedes the tail in the string with two derivations is both wi1 wi2 ...wim and xi1 xi2 ...xim.

9.5.2 The Complement of a List Language

Having context-free languages like LA for the list A lets us show a number of problems about CFLs to be undecidable. More undecidability facts for CFLs can be obtained by considering the complement language L̄A, which consists of all strings over the alphabet Σ ∪ {a1, a2, ..., ak} that are not in LA; here Σ is the alphabet of the PCP instance, and the ai's are distinct symbols representing the indexes of pairs in that PCP instance.
We claim that L̄A is a CFL. Unlike LA, it is not very easy to design a grammar for L̄A, but we can design a PDA, in fact a deterministic PDA, for L̄A. The construction is in the next theorem.
Theorem 9.5.2 If LA is the language for the list A, then L̄A is a context-free language.
Proof: Let Σ be the alphabet of the strings in the list A = w1, w2, ..., wk, and let I be the set of index symbols, I = {a1, a2, ..., ak}. The DPDA P we design to accept L̄A works as follows.
1. As long as P sees symbols in Σ, it stores them on its stack. Since all strings in Σ* are in L̄A, P accepts as it goes.


2. As soon as P sees an index symbol in I, say ai, it pops its stack to see whether the top symbols form wiR, that is, the reverse of the corresponding string.
(a) If not, then the input seen so far, and any continuation of this input, is in L̄A. Thus, P goes to an accepting state in which it consumes all future inputs without changing the stack.
(b) If wiR was popped from the stack, but the bottom-of-stack marker is not yet exposed, then P accepts, but it remembers in its state that it is looking for symbols in I only, and may yet see a string in LA (which P will not accept). P repeats step (2) as long as the question of whether the input is in LA is unresolved.
(c) If wiR was popped from the stack, and the bottom-of-stack marker is exposed, then P has seen an input in LA. P does not accept this input. However, since no continuation of this input can be in LA, P goes to a state where it accepts all future inputs, leaving the stack unchanged.
3. If, after seeing one or more symbols of I, P sees another symbol in Σ, then the input is not of the correct form to be in LA. Thus, P goes to a state in which it accepts this and all future inputs without changing its stack.
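Membership in LA, and hence in its complement, can also be decided directly, mirroring the stack discipline of P. In the sketch below a string is given as a list of tokens, with alphabet symbols as one-character strings and the index symbol ai as the integer i; this encoding is an assumption of the sketch, not part of the construction above.

def in_LA(tokens, A):
    # split into the Sigma-part and the index-part
    m = 0
    while m < len(tokens) and not isinstance(tokens[m], int):
        m += 1
    letters, indices = tokens[:m], tokens[m:]
    if not indices or not all(isinstance(t, int) for t in indices):
        return False                 # no index symbols, or a Sigma symbol after I
    s = "".join(letters)
    for i in indices:                # first index seen is a_im, as for P
        w = A[i - 1]
        if not s.endswith(w):
            return False             # stack top does not form w_i reversed
        s = s[: len(s) - len(w)]     # pop |w_i| symbols
    return s == ""                   # bottom-of-stack marker exposed exactly

def in_complement_of_LA(tokens, A):
    return not in_LA(tokens, A)

# With A = ["1", "10111", "10"]: in_LA(["1", 1], A) is True,
# while in_LA(["1", "0", 1], A) is False.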
Theorem 9.5.3
Let G1 and G2 be context-free grammars, and let R be a regular expression. Then the following are undecidable:
(a) Is L(G1) ∩ L(G2) = ∅?
(b) Is L(G1) = L(G2)?
(c) Is L(G1) = L(R)?
(d) Is L(G1) = T* for some alphabet T?
(e) Is L(G1) ⊆ L(G2)?
(f) Is L(R) ⊆ L(G1)?
Proof: Each of the proofs is a reduction from PCP. We show how to take an instance (A, B) of PCP and convert it to a question about CFGs and/or regular expressions that has answer "yes" if and only if the instance of PCP has a solution. In some cases, we reduce PCP to the question as stated in the theorem; in other cases we reduce it to the complement. It doesn't matter, since if we show the complement of a problem to be undecidable, it is not possible that the problem is decidable, since the recursive languages are closed under complementation (Theorem 9.2.2).
Let the alphabet of strings for the instance be Σ and the alphabet of index symbols be I. Our reductions depend on the fact that LA, LB, L̄A and L̄B all have CFGs. We construct these CFGs either directly, as in Section 9.5.1, or by the construction of a PDA for the complement languages given in Theorem 9.5.2, coupled with the conversion from a PDA to a CFG.
(a) Let L(G1) = LA and L(G2) = LB. Then L(G1) ∩ L(G2) is the set of solutions to this instance of PCP. The intersection is empty if and only if there is no solution. Note that, technically, we have reduced PCP to the language of pairs of CFGs whose intersection is nonempty; i.e., we have shown the problem "is the intersection of two CFLs nonempty?" to be undecidable. However, as mentioned in the introduction of the proof, showing the complement of a problem to be undecidable is tantamount to showing the problem itself undecidable.
(b) Since CFLs are closed under union, we can construct a CFG G1 for L̄A ∪ L̄B. Since (Σ ∪ I)* is a regular set, we surely may construct for it a CFG G2. Now, L̄A ∪ L̄B is the complement of LA ∩ LB. Thus, L(G1) is missing only those strings that represent solutions to the instance of PCP, while L(G2) is missing no string in (Σ ∪ I)*. Thus, their languages are equal if and only if the PCP instance has no solution.


(c) The argument is the same as for (b), but we let R be the regular expression (Σ ∪ I)*.
(d) The argument of (c) suffices, since Σ ∪ I is the only alphabet of which L̄A ∪ L̄B could possibly be the closure.
(e) Let G1 be a CFG for (Σ ∪ I)* and let G2 be a CFG for L̄A ∪ L̄B. Then L(G1) ⊆ L(G2) if and only if L̄A ∪ L̄B = (Σ ∪ I)*, i.e., if and only if the PCP instance has no solution.
(f) The argument is the same as for (e), but we let R be the regular expression (Σ ∪ I)*, and let L(G1) be L̄A ∪ L̄B.

Chapter 10

Intractable Problems
In this chapter we introduce the theory of intractability, that is, techniques for showing problems not to be solvable in polynomial time.

10.1 The Classes P and NP

10.1.1 Problems Solvable in Polynomial Time

A Turing machine M is said to be of time complexity T(n) [or to have "running time T(n)"] if, whenever M is given an input w of length n, M halts after making at most T(n) moves, regardless of whether or not M accepts. We say a language L is in class P if there is some polynomial T(n) such that L = L(M) for some deterministic TM M of time complexity T(n).

10.1.2 An Example: Kruskal's Algorithm

Let us consider the problem of finding a minimum-weight spanning tree (MWST) for a graph. A spanning tree is a subset of the edges such that all nodes are connected through these edges, yet there are no cycles. An example of a spanning tree appears in Fig. 10.1. A minimum-weight spanning tree has the least possible total edge weight among all spanning trees.
There is a well-known greedy algorithm, called Kruskal's algorithm, for finding a MWST:










Figure 10.1: A graph


1. Maintain for each node the connected component in which the node appears, using whatever
edges of the tree have been selected so far. Initially, no edges are selected, so every node is then
in a connected component by itself.


2. Consider the lowest-weight edge that has not yet been considered; break ties any way you like.
If this edge connects two nodes that are currently in different connected components then:
(a) Select that edge for the spanning tree, and
(b) Merge the two connected components involved, by changing the component number of all
nodes in one of the two components to be the same as the component number of the other.
If, on the other hand, the selected edge connects two nodes of the same component, then this
edge does not belong in the spanning tree; it would create a cycle.
3. Continue considering edges until either all edges have been considered, or the number of edges
selected for the spanning tree is one less than the number of nodes. Note that in the latter case,
all nodes must be in one connected component, and we can stop considering edges.
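The rounds above translate directly into code. The sketch below keeps the simple table-of-components representation (no sophisticated union-find), so a merge costs a scan of the table, matching the O(e(e + m)) analysis that follows. Edges are (weight, u, v) triples, nodes are numbered 1 through m, and the sample weights are those read off the encoding in Example 10.1.2 below.

def kruskal(m, edges):
    component = {v: v for v in range(1, m + 1)}    # each node starts alone
    tree = []
    for weight, u, v in sorted(edges):             # lowest-weight edge first
        if component[u] != component[v]:           # different components: select
            tree.append((u, v, weight))
            old, new = component[u], component[v]
            for node in component:                 # merge by renumbering the table
                if component[node] == old:
                    component[node] = new
        if len(tree) == m - 1:                     # spanning tree is complete
            break
    return tree

print(kruskal(4, [(15, 1, 2), (10, 1, 3), (12, 2, 3), (20, 2, 4), (18, 3, 4)]))
# -> [(1, 3, 10), (2, 3, 12), (3, 4, 18)], total weight 40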

Example 10.1.1
In the graph of Fig. 10.1, we first consider the edge (1,3), because it has the lowest weight, 10. Since 1 and 3 are initially in different components, we accept this edge, and make 1 and 3 have the same component number, say component 1. The next edge in order of weights is (2,3), with weight 12. Since 2 and 3 are in different components, we accept this edge and merge node 2 into component 1. The third edge is (1,2), with weight 15. However, 1 and 2 are now both in component 1, so we reject this edge and proceed to the fourth edge, (3,4). Since 4 is not in component 1, we accept this edge. Now we have three edges for the spanning tree of a 4-node graph, and so may stop.
It is possible to implement this algorithm (using a computer, not a TM) on a graph with m nodes and e edges in time O(m + e log e). A simpler implementation proceeds in e rounds. A table gives the current component of each node. We pick the lowest-weight remaining edge in O(e) time, and find the components of the two nodes connected by the edge in O(m) time. If they are in different components, we merge all nodes with those component numbers in O(m) time, by scanning the table of nodes. The total time taken by this algorithm is O(e(e + m)). This running time is polynomial in the "size" of the input, which we might informally take to be the sum of e and m.
When we translate the above ideas to Turing machines, we face several issues:

When we study algorithms, we encounter problems that ask for outputs in a variety of forms, such as a list of edges in a MWST. When we deal with Turing machines, we may only think of problems as languages, and the only output is "yes" or "no", i.e., accept or reject. For instance, the MWST problem could be couched as: "Given this graph G and limit W, does G have a spanning tree of weight W or less?" That problem may seem easier to answer than the MWST problem with which we are familiar, since we do not even learn what the spanning tree is. However, in the theory of intractability we generally want to argue that a problem is hard, not easy, and the fact that a yes-no version of a problem is hard implies that a more standard version, where a full answer must be computed, is also hard.

While we might think informally of the "size" of a graph as the number of nodes or edges, the input to a TM is a string over a finite alphabet. Thus, problem elements such as nodes and edges must be encoded suitably. The effect of this requirement is that inputs to Turing machines are generally slightly longer than the intuitive "size" of the input. However, there are two reasons why the difference is not significant:

1. The difference between the size as a TM input string and as an informal problem input is never more than a small factor, usually the logarithm of the input size. Thus, what can be done in polynomial time using one measure can be done in polynomial time using the other measure.

2. The length of a string representing the input is actually a more accurate measure of the number of bytes a real computer has to read to get its input. For instance, if a node is represented by an integer, then the number of bytes needed to represent that integer is proportional to the logarithm of the integer's size; it is not 1 byte for every node, as we might imagine in an informal accounting for input size.
Example 10.1.2
Let us consider a possible code for the graphs and weight limits that could be input to the MWST
problem. The code has five symbols: 0, 1, the left and right parentheses, and the comma.
1. Assign integers 1 through m to the nodes.
2. Begin the code with the value of m in binary and the weight limit W in binary, separated by a
comma.
3. If there is an edge between nodes i and j with weight w, place (i, j, w) in the code. The integers
i, j, and w are coded in binary. The order of i and j within an edge, and the order of edges
within the code are immaterial.
Thus, one of the possible codes for the graph of Fig. 10.1 with limit W = 40 is
100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)
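The encoding is mechanical enough to write down. Below is a sketch that reproduces the code string above from the graph of Fig. 10.1, with the edge weights read off this encoding:

def encode_instance(m, W, edges):
    b = lambda n: format(n, "b")            # integer -> binary, no leading zeros
    code = b(m) + "," + b(W)
    for i, j, w in edges:                   # order of edges is immaterial
        code += "({},{},{})".format(b(i), b(j), b(w))
    return code

print(encode_instance(4, 40, [(1, 2, 15), (1, 3, 10), (2, 3, 12),
                              (2, 4, 20), (3, 4, 18)]))
# -> 100,101000(1,10,1111)(1,11,1010)(10,11,1100)(10,100,10100)(11,100,10010)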
If we represent inputs to the MWST problem as in Example 10.1.2, then an input of length n can represent at most O(n/log n) edges. It is possible that m, the number of nodes, could be exponential in n, if there are very few edges. However, unless the number of edges, e, is at least m − 1, the graph cannot be connected and therefore will have no MWST, regardless of its edges. Consequently, if the number of nodes is not at least some fraction of n/log n, there is no need to run Kruskal's algorithm at all; we simply say "no, there is no spanning tree of that weight".
Thus, if we have an upper bound on the running time of Kruskal's algorithm as a function of m and e, such as the upper bound O(e(m + e)) developed above, we can conservatively replace both m and e by n and say that the running time, as a function of the input length n, is O(n(n + n)), or O(n^2).
We claim that in O(n^2) steps we can implement the version of Kruskal's algorithm described above on a multitape TM. The extra tapes are used for several jobs:
1. The input tape holds the code for the graph, beginning with the number of nodes and the limit W, followed by the edges, as described in Example 10.1.2.
2. The second tape stores the list of nodes and their current components. This tape is O(n) in length.
3. A third tape is used to store the current least-weight edge. Scanning for the lowest-weight unmarked edge takes O(n) time.
4. When an edge is selected in a round, place its two nodes on a fourth tape. Search the table of nodes and components to find the components of these two nodes. This requires O(n) time.
5. A tape can be used to hold the two components, i and j, being merged. We scan the table of nodes and components, and each node found to be in component i has its component number changed to j. This scan takes O(n) time.
We should thus be able to say that one round can be executed in O(n) time on a multitape TM. Since the number of rounds, e, is at most n, we conclude that O(n^2) time suffices on a multitape TM. Theorem 6.6.1 says that whatever a multitape TM can do in s steps, a single-tape TM can do in O(s^2) steps. Thus, if the multitape TM takes O(n^2) steps, then we can construct a single-tape TM to do the same thing in O(n^4) steps. Our conclusion is that the yes-no version of the MWST problem, "does graph G have a MWST of total weight W or less?", is in P.

10.1.3 An NP Example: The Travelling Salesman Problem

The Travelling Salesman Problem (TSP) is an example of a problem that appears to be in NP but not in P. The input to TSP is the same as to MWST: a graph with integer weights on the edges, such as that of Fig. 10.1, and a weight limit W. The question asked is whether the graph has a Hamilton circuit of total weight at most W.
Definition 10.1.1 A Hamilton circuit is a set of edges that connect the nodes into a single cycle, with each node appearing exactly once. Note that the number of edges on a Hamilton circuit must equal the number of nodes in the graph.
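That a guessed tour is easy to check is exactly why TSP is in NP: a nondeterministic machine guesses an ordering of the nodes and verifies it in polynomial time. A sketch of the certificate check, where weights maps unordered node pairs to edge weights:

def check_tsp_certificate(order, nodes, weights, W):
    if sorted(order) != sorted(nodes):
        return False                                # each node exactly once
    total = 0
    for a, b in zip(order, order[1:] + order[:1]):  # wrap around: a single cycle
        pair = frozenset((a, b))
        if pair not in weights:
            return False                            # not an edge of the graph
        total += weights[pair]
    return total <= W

# The graph of Fig. 10.1, with the weights as encoded in Example 10.1.2:
w = {frozenset(p): c for p, c in [((1, 2), 15), ((1, 3), 10), ((2, 3), 12),
                                  ((2, 4), 20), ((3, 4), 18)]}
print(check_tsp_certificate([1, 2, 4, 3], [1, 2, 3, 4], w, 63))  # True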

10.1.4 NP-complete Problems

Definition 10.1.2 Let L be a language (problem) in NP. L is NP-complete if the following statements are true about L:
1. L is in NP.
2. For every language L′ in NP there is a polynomial-time reduction of L′ to L.
Theorem 10.1.1 If P1 is NP-complete, P2 is in NP, and there is a polynomial-time reduction of P1 to P2, then P2 is NP-complete.
Theorem 10.1.2 If some NP-complete problem is in P, then P = NP.

10.1.5 The Satisfiability Problem

Definition 10.1.3
Boolean expressions are built from:
1. Variables whose values are boolean; i.e., they have either the value 1 (true) or 0 (false).
2. Binary operators ∧ and ∨, standing for the logical AND and OR of two expressions.
3. The unary operator ¬, standing for logical negation.
4. Parentheses to group operators and operands, if necessary to alter the default precedence of operators: ¬ highest, then ∧, and finally ∨.
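A brute-force test makes the difficulty concrete: checking one assignment is fast, but trying all of them takes 2^n steps. In the sketch below, an expression is represented (as an assumption of the sketch, not part of the definition) by a Python function of n boolean arguments:

from itertools import product

def satisfiable(expr, n):
    # try all 2^n truth assignments; each individual check is cheap
    return any(expr(*bits) for bits in product([False, True], repeat=n))

# (x1 OR NOT x2) AND (x2 OR x3) is satisfiable, e.g. by x1 = x2 = x3 = 1:
print(satisfiable(lambda x1, x2, x3: (x1 or not x2) and (x2 or x3), 3))  # True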
Theorem 10.1.3 (Cook's Theorem) SAT is NP-complete.

10.1.6 NP-Completeness of 3SAT

Definition 10.1.4 The 3SAT problem is:
Given a boolean expression E that is the product of clauses, each of which is the sum of three distinct literals, is E satisfiable?
Theorem 10.1.4 3SAT is NP-complete.

Bibliography
[1] Thomas A. Sudkamp, Languages and Machines: An Introduction to the Theory of Computer Science, Addison-Wesley Publishing Company, Inc., United States of America, 1997.
