
AUTOMATA THEORY, COMPUTABILITY AND FORMAL LANGUAGE

(CSC309)
Lecturer:
Dr. Oyelade, O. J.
AIM AND OBJECTIVES
The course is all about the theories that enable computation, and
computation is all about modeling, designing, and programming the
computer system to simulate our model.
In this course, we will be concerned with the languages, in other words the
formal languages, that make computation with the computer possible.
The course content, therefore, includes:
1.0 Introduction
1.1 Alphabet and Strings
1.2 Languages
1.3 Language operation
2.0 Grammars
2.1 Definition
2.2 Regular Grammar
2.3 Regular expression
2.4 Relationship between regular grammar and regular expression
2.5 Types of Grammar (Chomsky hierarchy)
3.0 Finite Automata
3.1 Deterministic and Non-deterministic finite automata
3.2 Conversion automata to certain types of grammars and back again,
using non-deterministic automata
3.3 Conversion of non-deterministic finite automata to deterministic finite
automata
3.4 Regular expressions and their relationship to finite automata
4.0 Pushdown automata and context-free grammars
4.1 Deterministic and non-deterministic pushdown automata
4.2 Context-free grammars
4.3 Useless production and emptiness test
4.4 Ambiguity
4.5 Context-free grammars for pushdown automata and vice-versa
5.0 Properties of Context-free languages
5.1 Pumping lemma
5.2 Closure properties
5.3 Existence of non-context-free languages
5.4 Turing languages
5.5 Decidability and Undecidability


Recommended Texts
1. Lawson, M.V. Finite Automata. Chapman and Hall/CRC, 2004.
2. Brookshear, J.G. Theory of Computation: Formal Languages, Automata, and Complexity. The Benjamin/Cummings Publishing Company, Inc., 1989.
3. Carroll, J. and Long, D. Theory of Finite Automata (with an Introduction to Formal Languages). Prentice Hall, 2004.

1.0 Introduction

1.1 Alphabet and strings


Nowadays, people are familiar with the idea of digitizing information, i.e.
converting information from analogue to discrete form. For example, computers
operate only on 0s and 1s, yet users of computers do not have to communicate
with them in binary: voice recognition technology enables us to input data without
using the keyboard, and computer graphics present output in the form of animation.
All these things are possible only because of the underlying sequences of 0s and 1s
that encode this information.
Therefore, information in all its forms is usually represented as sequences of symbols,
and any set of symbols Σ that is used in this way is called an alphabet. Any finite
sequence whose components are drawn from Σ is called a string over Σ, or a
Σ-string. The elements of an alphabet are called symbols or letters. The number
of symbols in an alphabet is denoted by |Σ|.
Some examples of alphabets are the following:
an alphabet for the working of a computer is {0, 1}
an alphabet for representing natural numbers in base 10 is
{0,1,2,3,4,5,6,7,8,9}
an alphabet for writing English text is {a, ..., z, A, ..., Z}
the alphabet used for describing a programming language is called the set of
tokens of the language; e.g. in the C language, the following are all tokens:
main, printf(), {, },
etc.
If x is a string, then |x| denotes the total number of symbols appearing in x and is
called the length of x. If a ∈ Σ, then |x|a is the total number of a's appearing in
x. For example, |ε| = 0 and |01101| = 5; |01101|0 = 2 and |01101|1 = 3.

The set of all strings over the alphabet Σ is denoted by Σ* and the empty string is
denoted by ε.
The set of all strings except the empty one is denoted by Σ+.
Given two strings x, y ∈ Σ*, a new string xy, called the concatenation of x and y,
can be formed by adjoining the symbols in y to those in x.
For example, if Σ = {0, 1}, then both 0101 and 101010 are strings over Σ, and
concatenating them gives the string 0101101010.


If x, y ∈ Σ* then |xy| = |x| + |y|, i.e. when two strings are concatenated, the length
of the result is the sum of the lengths of the two strings.
Note: for the empty string ε, if x ∈ Σ*, then εx = xε = x.
The order in which strings are concatenated is important. For example, suppose
Σ = {a, b}, u = ab and v = ba; then uv = abba and vu = baab, so uv ≠ vu. Therefore,
the order matters in spelling.
Associativity also holds for concatenation. For example, given three strings x, y,
and z, there are two ways to concatenate them in this order:
we can concatenate x and y first to obtain xy and then concatenate xy
with z to obtain xyz, or
we can concatenate y and z first to obtain yz and then concatenate x with
yz to obtain xyz. That is, (xy)z = x(yz).
The usual laws of indices hold. For example, if x is a string and x = ba, then (ba)^2 =
baba. If m, n ≥ 0 then x^m x^n = x^(m+n).
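These laws can be checked quickly on concrete strings. The following sketch is illustrative Python (the particular strings and variable names are my own choices), using + for concatenation and * for repeated concatenation:

# A small check of the concatenation laws on example strings.
x, y, z = "01101", "ba", "ab"

# The length of a concatenation is the sum of the lengths.
assert len(x + y) == len(x) + len(y)

# Concatenation is associative but not commutative.
assert (x + y) + z == x + (y + z)
assert y + z != z + y                  # "baab" != "abba"

# Law of indices: x^m x^n = x^(m+n); Python's * repeats a string.
m, n = 2, 3
assert y * m + y * n == y * (m + n)    # (ba)^2 (ba)^3 == (ba)^5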
1.1.1 Prefix, suffix, proper factor and substring of a string
Given x, y, z ∈ Σ*. If u = xyz then y is called a factor of u, x is called a prefix of u,
and z is called a suffix of u. y is called a proper factor of u if at least one of x and z
is not just the empty string. Also, prefix x (or suffix z) is proper if x ≠ u (or z ≠ u).
The string u is a substring of the string v if u = a1...an, where each ai ∈ Σ, and there
are strings x0, ..., xn such that v = x0a1x1...xn-1anxn. Let x ∈ Σ*. We call a representation
x = u1...un, where each ui ∈ Σ*, a factorization of x.
For example, given the string u = abab over the alphabet {a, b}:
the prefixes of u are ε, a, ab, aba, abab
the suffixes of u are ε, b, ab, bab, abab
the factors of u are ε, a, b, ab, ba, aba, bab, abab.
Examples of substrings of u are aa, bb, abb. And
u = ab.ab is a factorization of u.
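These definitions translate directly into short Python functions. The sketch below is one illustrative implementation (the function names are mine, not from the text); note that "factor" here means a contiguous block, matching the definition above:

def prefixes(u):
    """All prefixes of u, including the empty string and u itself."""
    return [u[:i] for i in range(len(u) + 1)]

def suffixes(u):
    """All suffixes of u, including u itself and the empty string."""
    return [u[i:] for i in range(len(u) + 1)]

def factors(u):
    """All factors of u, i.e. the contiguous blocks y with u = xyz."""
    result = {""}
    for i in range(len(u)):
        for j in range(i + 1, len(u) + 1):
            result.add(u[i:j])
    return sorted(result, key=lambda s: (len(s), s))

u = "abab"
print(prefixes(u))   # ['', 'a', 'ab', 'aba', 'abab']
print(suffixes(u))   # ['abab', 'bab', 'ab', 'b', '']
print(factors(u))    # ['', 'a', 'b', 'ab', 'ba', 'aba', 'bab', 'abab']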
1.1.2 The tree order on Σ*
This is the standard way of listing strings over an alphabet: for x, y ∈ Σ*, x < y iff
|x| < |y|, or |x| = |y| and the string x occurs to the left of the string y in the tree over Σ*.
For example, let Σ = {0, 1} with 0 < 1. The strings of length 2 in the tree over Σ* are:

[Tree diagram: the length-2 strings 00, 01, 10, 11]

Therefore, the tree order on Σ* begins ε, 0, 1, 00, 01, 10, 11.

1.2 Languages

Let Σ be an alphabet consisting of all the words in an English dictionary. The set Σ*
consists of all possible finite sequences of words. A subset L of Σ* consists of all
sequences of words that form grammatically correct English sentences, e.g. the
sequence (to, be, or, not, to, be) ∈ L but (be, be, to, to, or, not) ∉ L.
Therefore, for any alphabet Σ, any subset of Σ* is called a Σ-language, or a
language over Σ, or simply a language.
Let us consider the following examples:
for arithmetic expressions, we use the alphabet Σ = {0, ..., 9} ∪ {+, ×, −, ÷, =}.
We can form the language L of all correct sums: i.e. the
sequence 2+2=4 ∈ L but the sequence 1−0 ∉ L
in computer science, the set of all syntactically correct programs in a
given computer language (Java, C/C++, ...) constitutes a language
both ∅ and Σ* are languages over Σ
Languages also arise from decision problems, which are problems whose answer
is either yes or no; e.g. we can ask whether a number is prime; the answer is
either yes or no.
Let us consider the following example:
A simple graph is one with no loops and no multiple edges. A graph is said to
be connected if any two vertices can be joined by a path. Therefore, a graph is
either connected or not. "Is the graph connected?" is an example of a decision
problem. Let us see how we can construct a language from this decision
problem.
A simple graph can be represented by an adjacency matrix whose entries
consist of 0s and 1s, e.g. consider the graph G below:
[Figure: the simple graph G, whose adjacency matrix corresponds to a path on four vertices]

The adjacency matrix of G is given below:


0 1 0 0
1 0 1 0
0 1 0 1
0 0 1 0

This adjacency matrix can be used to construct a binary string by adjoining (or
concatenating) the rows of the matrix.
Therefore, the graph G is represented by:
0100.1010.0101.0010 = 0100101001010010
code(G) = 0100101001010010

Therefore, every simple graph can be encoded by a string over the alphabet
Σ = {0, 1}.
Let L = {x ∈ {0, 1}* : x = code(G) where G is connected}.
This is the language that corresponds to the decision problem "Is the simple
graph G connected?": the answer for G is yes iff code(G) ∈ L.
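The encoding step can be written as a short helper. The sketch below is illustrative Python (the function name code is taken from the notation above; the list-of-lists representation of the matrix is my own choice):

# Encode a simple graph, given as an adjacency matrix, as a binary string
# by concatenating its rows, exactly as in the example above.
def code(adj):
    return "".join("".join(str(bit) for bit in row) for row in adj)

# The graph G above, whose adjacency matrix is the 4 x 4 matrix shown earlier.
G = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(code(G))   # 0100101001010010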

1.3 Language operations

Suppose X is any set; then P(X) is the set of all subsets of X (called the power
set of X).
Let Σ be an alphabet. A language over Σ is any subset of Σ*; this implies that
the set of all languages over Σ is P(Σ*). If L and M are languages over Σ, so
are L ∪ M, L ∩ M and L\M (relative complement). If L is a language over Σ,
then L' = Σ*\L is a language called the complement of L.
The operations of intersection, union, and complementation are called Boolean
operations (from set theory).
Note that x ∈ L ∪ M means x ∈ L or x ∈ M or both.

1.3.1 Product of languages

Let L and M be languages; then
L.M = {xy : x ∈ L and y ∈ M} is called the product of L and M. We write LM
rather than L.M.
A string belongs to LM if it can be written as a string in L followed by a string
in M. This implies that the product operation enables us to talk about the order in
which symbols or strings occur.
Some examples of products of languages are:
1. ∅L = L∅ = ∅ for any language L
2. {ε}L = L = L{ε} for any language L
3. Let L = {aa, bb} and M = {aa, ab, ba, bb}. Then
LM = {aaaa, aaab, aaba, aabb, bbaa, bbab, bbba, bbbb} and
ML = {aaaa, aabb, abaa, abbb, baaa, babb, bbaa, bbbb}
so LM ≠ ML in general.
Given a language L, we define L^0 = {ε} and L^(n+1) = L^n.L for n ≥ 0.
Therefore, the language L^n consists of all strings u of the form u = x1...xn where
each xi ∈ L.
1.3.2 Kleene star of a language

The Kleene star of a language L, denoted by L*, is defined to be
L* = L^0 + L^1 + L^2 + ... and
L^+ = L^1 + L^2 + ...
Some examples of Kleene stars of languages are:
1. ∅* = {ε} and {ε}* = {ε}.
2. The language {a^2}* consists of the strings ε, a^2, a^4, a^6, ...
3. A string u belongs to {ab, ba}* if it is empty or if u can be factorised as u =
x1...xn where each xi is either ab or ba. Thus, the string abbaba belongs to the
language because abbaba = ab.ba.ba, but the string abaaba does not because
abaaba = ab.aa.ba.
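Because L* is in general infinite, it cannot be computed outright, but for finite languages the product, and the strings of L* up to a chosen length bound, can be enumerated. A minimal Python sketch (the helper names and the length bound are my own, illustrative choices):

def product(L, M):
    """The product LM = {xy : x in L and y in M}."""
    return {x + y for x in L for y in M}

def kleene_star_up_to(L, max_len):
    """All strings of L* whose length is at most max_len."""
    result = {""}                 # L^0 = {empty string}
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in L
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result

L = {"aa", "bb"}
M = {"aa", "ab", "ba", "bb"}
print(sorted(product(L, M)))                            # the set LM from the product examples above
print("abbaba" in kleene_star_up_to({"ab", "ba"}, 6))   # True
print("abaaba" in kleene_star_up_to({"ab", "ba"}, 6))   # False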

Note: We can use the Boolean operations, the product, and the Kleene star to
describe languages.

For example, let L = {a, b}* \ {a, b}*{aa, bb}{a, b}*. This consists of all strings
over the alphabet {a, b} that do not contain a doubled symbol.
Therefore, the string ababab ∈ L, but the string abaaba ∉ L.
Some examples of languages over the alphabet Σ = {a, b} are:
1. Σ* can be written as (a + b)*, i.e. Σ* = {a, b}* = ({a} + {b})* = (a + b)*
2. The language (a + b)^3 consists of all 8 strings of length 3 over Σ. This is
because (a + b)^3 means (a + b)(a + b)(a + b). A string x belongs to this
language if we can write it as x = a1a2a3 where a1, a2, a3 ∈ {a, b}.
3. The language aab(a + b)* consists of all strings that begin with the string aab,
while the language (a + b)*aab consists of all strings that end in the string aab.
The language (a + b)*aab(a + b)* consists of all strings that contain the string
aab as a factor.
4. The language (a + b)*a(a + b)*a(a + b)*b(a + b)* consists of all strings that
contain the string aab as a substring.
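These descriptions can be tested directly with Python's re module, which writes union as | where the text writes +, and whose fullmatch checks that the whole string belongs to the language. A small illustrative sketch (the pattern for the "no doubled symbol" language L above is my own rewriting of it as a regular expression):

import re

contains_aab = re.compile(r"(a|b)*aab(a|b)*")   # strings containing aab as a factor
begins_aab   = re.compile(r"aab(a|b)*")         # strings beginning with aab
no_doubles   = re.compile(r"(ab)*a?|(ba)*b?")   # the language L above: no doubled symbol

print(bool(contains_aab.fullmatch("bbaabab")))  # True
print(bool(begins_aab.fullmatch("aabba")))      # True
print(bool(no_doubles.fullmatch("ababab")))     # True
print(bool(no_doubles.fullmatch("abaaba")))     # False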


GRAMMARS
Definition
Automata are devices for recognizing strings, while grammars are devices for
generating strings belonging to a language.
Let us consider the following fragment of English
Sentence denoted by S
Noun-phrase denoted by NP
Verb-phrase denoted by VP
Noun denoted by N
Definite article denoted by T
Verb denoted by V
There are rules that tell us how the grammatical categories are related to each
other. We shall use → to indicate how a grammatical category on the left can be
constructed from grammatical categories on the right. That is:
1. S → NP.VP
2. NP → T.N
3. VP → V.NP

Also, let us include amongst these rules the specific English words that belong to
those grammatical categories consisting only of words; the symbol | means "or":

4. T → the
5. N → girl | boy | ball | bat | frisbee
6. V → hit | took | threw | lost

We can use this grammar to generate a language over the alphabet
Σ = {the, girl, boy, ball, bat, frisbee, hit, took, threw, lost}.
The starting point is always the symbol S.
S ⇒ NP.VP
NP.VP ⇒ T.N.VP
T.N.VP ⇒ T.N.V.NP
T.N.V.NP ⇒ T.N.V.T.N
T.N.V.T.N ⇒ the N.V.T.N
the N.V.T.N ⇒ the boy V.T.N
the boy V.T.N ⇒ the boy threw T.N
the boy threw T.N ⇒ the boy threw the N
the boy threw the N ⇒ the boy threw the ball

We can see that the string "the boy threw the ball" belongs to the language
generated by the grammar.
Therefore, a grammar G can be defined as a quadruple or 4-tuple G = (VT, VN,
P, S) where:

VT is a finite set of terminal symbols (letters);
VN is a set of nonterminal symbols, and VT and VN are disjoint sets;
P is a finite set of productions of the form (α, β), usually written α → β, where α ∈ V+ and β ∈ V* with V = VT ∪ VN;
S is the start or distinguished symbol, and S ∈ VN.

A grammar is said to generate a particular string of terminals if, by starting with
the start symbol, one can produce that string by successively replacing patterns
found on the left of the grammar's rewrite rules (or production rules) with the
corresponding expressions on the right, until only terminals remain. The sequence
of steps in this process is called a derivation of the string.
For example, consider the following grammar G:
S → XSZ       (1)
S → Y         (2)
Y → yY | ε    (3, 4)
X → x         (5)
Z → z         (6)

With start symbol S, a derivation showing how this grammar can generate
the string xyz is as follows:
S ⇒ XSZ ⇒ xSZ ⇒ xYZ ⇒ xyYZ ⇒ xyZ ⇒ xyz
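A derivation like the one above can be checked mechanically: each sentential form must be obtained from the previous one by replacing a single occurrence of some left-hand side with the corresponding right-hand side. A minimal Python sketch (the function name and the list representation of the derivation are mine):

def is_one_step(current, nxt, productions):
    """True if nxt follows from current by applying exactly one production."""
    for lhs, rhs in productions:
        start = 0
        while (i := current.find(lhs, start)) != -1:
            if current[:i] + rhs + current[i + len(lhs):] == nxt:
                return True
            start = i + 1
    return False

# The grammar above: S -> XSZ | Y, Y -> yY | epsilon, X -> x, Z -> z
productions = [("S", "XSZ"), ("S", "Y"), ("Y", "yY"), ("Y", ""), ("X", "x"), ("Z", "z")]

derivation = ["S", "XSZ", "xSZ", "xYZ", "xyYZ", "xyZ", "xyz"]
assert all(is_one_step(a, b, productions) for a, b in zip(derivation, derivation[1:]))
print("valid derivation of", derivation[-1])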
If the terminals of a grammar G are symbols in an alphabet Σ, we say that G is a
grammar over Σ. In such a case the strings generated by G are strings in Σ*.

Therefore, a grammar G over an alphabet Σ specifies a language over Σ that
consists of the strings G generates; this language is denoted by L(G).
Let us consider another example.
Given the grammar G = ({A, S}, {a, b, c}, P, S), where P has the following productions:
S → abASc | ε    (1, 2)
bAa → abA        (3)
bAc → bc         (4)
bAb → bbA        (5)

Derive the string a^2b^2c^2 from the grammar above:
S ⇒ abASc
⇒ abAabAScc
⇒ abAabAcc
⇒ abAabc^2
⇒ a^2bAbc^2
⇒ a^2bbAcc
⇒ a^2b^2c^2
Ex: Confirm that the language generated by the earlier grammar with productions
S → XSZ | Y, Y → yY | ε, X → x, Z → z is L(G) = {x^m y^n z^m | m, n ≥ 0}.
Consider another example of a grammar:
S → Xc
X → YX | ε
Y → a | b
The non-terminals of the grammar are S, X, and Y; the terminals are a, b, c.
Anything which we can derive from the start symbol by applying the production
rules is called a sentential form.
The last sentential form, which contains no non-terminals, is called a sentence.
The above grammar derives all words that start with any number of a's or b's and finish with a
c, and this is the language defined by the regular expression (a|b)*c.
Note: Every regular expression can be converted to a grammar but not every
grammar can be converted back to a regular expression. Any grammar which can
be converted back to a regular expression is called a regular grammar and the
language it defines is a regular language.
i.e.

Regular expression → Grammar
Regular expression ↔ Regular Grammar

Regular Grammars
A regular grammar is a grammar whose production rules (rewrite rules) conform to
the following restrictions:

The left side of any production rule in a regular grammar must consist
of a single nonterminal.

The right side must be a terminal followed by a nonterminal, or a single
terminal, or the empty string.

Therefore, production rules of the form

Z → yX
Z → x
W → ε
would be allowed in a regular grammar.

But production rules of the form:

yW → X
X → xZy
YX → WvZ

would not.

e.g. S → xX
X → yY
Y → xX | ε

This is a regular grammar that generates strings consisting of one or more copies of
the pattern xy.
Note: Any rule of the form N → x in a regular grammar could be replaced by the
pair of rules
N → xX
X → ε
where X is a nonterminal that does not appear elsewhere in the grammar, without
altering the set of strings that could be generated by the grammar.
Regular expression
Regular expressions are used to define patterns of characters. A regular expression is just a form of
notation used for describing sets of words.
For any given set of characters Σ, a regular expression over Σ is defined by:
The empty string ε is a regular expression.
Each member of Σ is a regular expression. For instance, if we write a as
a regular expression, this means "take the letter a from the input".
If p and q are regular expressions, then so is p | q.
If p and q are regular expressions, then so is p.q.
If p is a regular expression then so is p*, i.e. the Kleene closure of a regular
expression, denoted by *, indicates zero or more occurrences of that
expression. Thus p* is the (infinite) set {ε, p, pp, ppp, ...} and means
"take zero or more p from the input".
The Chomsky hierarchy
Based on pioneering work by a linguist (Chomsky, 1959), computer scientists now
recognize four classes of grammar. The classification depends on the format of the
productions, and may be summarized as follows:
Type 0 Grammars (Unrestricted)
An unrestricted grammar is one in which there are virtually no restrictions on the
form of any of the productions, which have the general form

α → β with α ∈ (N ∪ T)* N (N ∪ T)*, β ∈ (N ∪ T)*

(thus the only restriction is that there must be at least one non-terminal symbol on
the left side of each production). The other types of grammars are more restricted;
to qualify as being of type 0 rather than one of these more restricted types it is
necessary for the grammar to contain at least one production with |α| > |β|,
where |α| denotes the length of α. Such a production can be used to "erase"
symbols - for example, aAB → aB erases A from the context aAB. This type is so
rare in computer applications that we shall consider it no further here. Practical
grammars need to be far more restricted if we are to base translators on them.
Type 1 Grammars (Context-sensitive)
If we impose the restriction on a type 0 grammar that the number of symbols in the
string on the left of any production is less than or equal to the number of symbols
on the right side of that production, we get the subset of grammars known as type 1
or context-sensitive. In fact, to qualify for being of type 1 rather than of a yet more
restricted type, it is necessary for the grammar to contain at least one production
with a left side longer than one symbol.
Productions in type 1 grammars are of the general form

α → β with |α| ≤ |β|, α ∈ (N ∪ T)* N (N ∪ T)*, β ∈ (N ∪ T)+

Strictly, it follows that the null string would not be allowed as the right side of any
production. However, this is sometimes overlooked, as ε-productions are often
needed to terminate recursive definitions. Indeed, the exact definition of "context-sensitive" differs from author to author. In another definition, productions are
required to be limited to the form
αAβ → αγβ with α, β ∈ (N ∪ T)*, A ∈ N, γ ∈ (N ∪ T)+

(It can be shown that the two definitions are equivalent.) Here we can see the
meaning of context-sensitive more clearly - A may be replaced by γ when A is
found in the context of (that is, surrounded by) α and β.
A much quoted simple example of such a grammar is as follows:
G = { N, T, S, P }
N = { A, B, C }
T = { a, b, c }
S = A
P = A → aABC | abC    (1, 2)
    CB → BC           (3)
    bB → bb           (4)
    bC → bc           (5)
    cC → cc           (6)


Let us derive a sentence using this grammar. A is the start string; let us choose to
apply production (1):
A ⇒ aABC
and then in this new string choose another production for A, namely (2), to derive
A ⇒ aabCBC
and follow this by the use of (3). (We could also have chosen (5) at this point.)
A ⇒ aabBCC
We follow this by using (4) to derive
A ⇒ aabbCC
followed by the use of (5) to get
A ⇒ aabbcC
followed finally by the use of (6) to give
A ⇒ aabbcc
However, with this grammar it is possible to derive a sentential form to which no
further productions can be applied. For example, after deriving the sentential form
aabCBC
if we were to apply (5) instead of (3) we would obtain
aabcBC
but no further production can be applied to this string. The consequence of such a
failure to obtain a terminal string is simply that we must try other possibilities until
we find those that yield terminal strings. The consequences for the reverse
problem, namely parsing, are that we may have to resort to considerable
backtracking to decide whether a string is a sentence in the language.
Type 2 Grammars (Context-free)
A more restricted subset of context-sensitive grammars yields the type 2 or
context-free grammars. A grammar is context-free if the left side of every

production consists of a single non-terminal, and the right side consists of a non-empty sequence of terminals and non-terminals, so that productions have the form

α → γ with |α| ≤ |γ|, α ∈ N, γ ∈ (N ∪ T)+

that is

A → γ with A ∈ N, γ ∈ (N ∪ T)+

Strictly, as before, no ε-productions should be allowed, but this is often relaxed to
allow γ ∈ (N ∪ T)*. Such productions are easily seen to be context-free, because
if A occurs in any string, say αAβ, then we may effect a derivation step αAβ ⇒ αγβ
without any regard for the particular context (prefix α or suffix β) in which A occurs.
Type 3 Grammars (Regular, Right-linear or Left-linear)
Imposing still further constraints on productions leads us to the concept of a type 3
or regular grammar. This can take one or other of two forms (but not both at
once). It is right-linear if the right side of every production consists of zero or one
terminal symbols, optionally followed by a single non-terminal, and if the left side
is a single non-terminal, so that productions have the form
A → a or A → aB

with a ∈ T, A, B ∈ N

It is left-linear if the right side of every production consists of zero or one


terminals optionally preceded by a single non-terminal, so that productions have
the form
A → a or A → Ba

with a ∈ T, A, B ∈ N

(Strictly, as before, ε-productions are ruled out - a restriction often overlooked). A
simple example of such a grammar is one for describing binary integers:

BinaryInteger = "0" BinaryInteger | "1" BinaryInteger | "0" | "1" .

Regular grammars are rather restrictive - local features of programming languages


like the definitions of integer numbers and identifiers can be described by them,
but not much more. Such grammars have the property that their sentences may be
parsed by so-called finite state automata, and can be alternatively described by
regular expressions, which makes them of theoretical interest from that viewpoint
as well.


The relationship between grammar type and language type


It should be clear from the above that type 3 grammars are a subset of type 2
grammars, which themselves form a subset of type 1 grammars, which in turn form
a subset of type 0 grammars as in the figure below:
[Figure: nested boxes, innermost to outermost: Type 3 (Regular) inside Type 2 (Context-free) inside Type 1 (Context-sensitive) inside Type 0 (Unrestricted)]

The Chomsky hierarchy of grammars


FINITE AUTOMATA
FINITE
The word finite is self explanatory. It means having an end or limit or having a countable number
of elements.
AUTOMATA
These are devices for recognizing strings.
Therefore, a finite automaton is a mathematical model of a computing device that has discrete
inputs and outputs as well as a finite set of internal states. Finite automata are a formal way of
describing certain simple but highly useful languages, the regular languages, which can also be
described by regular expressions. A finite automaton can also be viewed as a graph with a finite
number of nodes called states; such devices are also known as finite state machines.
A finite automaton is a device that has a processing unit with limited memory capacity and no
auxiliary (main) memory. It receives input on a special input tape and reads it, one character
at a time. Reading a character results in changing the state of the automaton and moving the
head one position to the right; the set of states is its finite control. The automaton has no
means to deliver output; however, some states can be designated as favorable. Thus, even
though an automaton does not produce any physical output, it still can be used as a recognition
device. We sometimes call these favorable states accepting states. Note that a start state could
also be the final or accepting state.
A diagram showing an accepting state and a start state:

S1 is the start state and also the accepting state.


The finite automaton is basically a very simple computer that consists only of an input
tape, a tape reading device, and a finite control unit. The input tape provides the string of
symbols to be computed. The tape reading device linearly reads the tape telling the control unit
which symbol is currently being read. The finite control unit exists in one of a finite number of
states, which include a start state and some number of final states. For each character that is read,
the control unit either stays in the same state or moves to another state; for any state and any
character this decision is fixed. If after reading a string from an input tape the automaton is in a
final state, then the string is said to be accepted by the automaton.
Finite state automaton also known as the finite state machine is a model of behavior
composed of a finite number of states, transitions between those states, and actions. A state
stores information about the past, i.e. it reflects the input changes from the system start to the
present moment. A transition indicates a state change and is described by a condition that would
need to be fulfilled to enable the transition. The concept of the FSM is at the center of theory of
computing, as it begins with the basic processes by which finite bits of properly encoded
information could theoretically be handled intelligently by a machine.

The language of the FA is the set of strings that label paths that go from the start state to
the accepting state.
Formally, there are two types of finite automata and these are:
Deterministic finite automata and
Non-deterministic finite automata
DETERMINISTIC FINITE AUTOMATA (DFA)
In the theory of computation, a deterministic finite automaton (DFA) also known as
deterministic finite state machine is a finite state machine where for each pair of state and
input symbol there is one and only one transition to the next state. DFAs recognize the set of
regular languages and no other languages.
The term deterministic is used as every move of a finite automaton is completely
determined by the input and its current state and no choice is allowed. A DFA will take in a
string of input symbols. For each input symbol it will then transition to a state given by following
a transition function. When the last input symbol has been received it will either accept or reject the
string depending on whether the DFA is in an accepting state or non-accepting state.
A deterministic finite automaton (DFA) is denoted by a quintuple A = (Q, Σ, δ, I, F)
where
Q is a finite set of states;
Σ is a finite input alphabet;
δ is a transition function from Q × Σ to Q, which can be written δ: Q × Σ → Q;
I ∈ Q is the initial state of the automaton;
F ⊆ Q is the set of favorable (final) states.
δ is the transition function from Q × Σ to Q that takes a state q in Q and a symbol a in Σ and returns a
new state r, which is the state the automaton should make a transition to upon reading the input
symbol a in the state q. I is simply the initial state, and F is a subset of Q called the set of final, or
accepting, states.

There are two ways of providing the five pieces of information needed to specify an automaton:
Transition diagrams and Transition tables.
The main part of the finite automaton is a function that defines allowable transitions for all
current states and all input symbols.
A transition diagram is a diagrammatic representation of inputs being made into a machine. It
is a collection of circles, which are labeled for reference purposes and are connected to each
other by arrows known as arcs. These arcs are labeled with a symbol that might occur in the
input string being analyzed.
The initial state is indicated by an inward arrow (→) and the accepting states by double circles.


Example: [transition diagram omitted]

A transition table is a tabular representation of a transition diagram. The rows of the table are
labeled by the states and the columns by the input letters. The initial state is marked by
an inward arrow (→) and the accepting states by an outward arrow (←). In the case where a state
is both the initial and a final state, it is denoted by (↔).

To illustrate a transition table we will consider the example below.


This example is a DFA M, with a binary input, which determines if the input contains an even
number of 0s.

The state machine for M


M = (S, Σ, T, s, A) where

S = {S1, S2},
Σ = {0, 1},
s = S1,
A = {S1}, and
T is defined by the following state transition table:
      0    1
S1    S2   S1
S2    S1   S2

Simply put, the state S1 represents that there has been an even number of 0s in the input so far,
while S2 signifies an odd number. A 1 in the input does not change the state of the automaton.
When the input ends, the state will show whether the input contained an even number of 0s or
not.
The language of M is the regular language given by the regular expression 1*(01*01*)*.
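The behaviour of M can be reproduced in a few lines of code. The sketch below is a generic, table-driven DFA simulator in Python (the function name accepts and the tuple layout mirror the definition of M above but are my own choices):

def accepts(dfa, string):
    """Run the DFA on the string and report whether it ends in an accepting state."""
    states, alphabet, delta, start, accepting = dfa
    state = start
    for symbol in string:
        if symbol not in alphabet:
            raise ValueError(f"symbol {symbol!r} is not in the input alphabet")
        state = delta[(state, symbol)]
    return state in accepting

M = (
    {"S1", "S2"},                                  # S, the set of states
    {"0", "1"},                                    # the input alphabet
    {("S1", "0"): "S2", ("S1", "1"): "S1",
     ("S2", "0"): "S1", ("S2", "1"): "S2"},        # T, the transition table above
    "S1",                                          # s, the start state
    {"S1"},                                        # A, the accepting states
)

print(accepts(M, "1101101"))   # True: two 0s
print(accepts(M, "0111"))      # False: one 0
print(accepts(M, ""))          # True: zero 0s is an even number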


Conditions Necessary for an Automaton to be Deterministic

These requirements are needed for an automaton to be deterministic:
No state may have more than one arrow leaving it with the same input being
carried. That is, it is impossible for two arrows to leave the same state having the
same label.
There must be a transition from every state on every character of the input
alphabet. This means there must be no missing arrows. Because of this, we say
that our machines are complete.
Advantages and Disadvantages
DFAs are one of the most practical models of computation, since there is a trivial linear
time, constant-space, online algorithm to simulate a DFA on a stream of input. Given two DFAs
there are efficient algorithms to find a DFA recognizing the union, intersection, and
complements of the languages they recognize. There are also efficient algorithms to determine
whether a DFA accepts any strings, whether a DFA accepts all strings, whether two DFAs
recognize the same language, and to find the DFA with a minimum number of states for a
particular regular language
Creating a Finite Automaton
Steps involved in creating an FA are:
Define the meaning of the states
Determine the transition function
Label the start state and final states
Test your FA.
Example
To create a DFA that accepts input strings containing 100 as a substring over the alphabet
{0, 1}.
Following the steps listed above, assume the string is 11001.

[Transition diagram: states q0, q1, q2, q3, q4, q5 and a sink state Sn; q0 is the start state and q5 the accepting state]


q0 is the start state.
q5 is the accepting state.
The transition functions are:
δ(q0, 1) → q1
δ(q1, 1) → q2
δ(q2, 0) → q3
δ(q3, 0) → q4
δ(q4, 1) → q5
Sn is the sink (or dead) state.
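For comparison, here is a compact sketch of a DFA for the same language, in which the states simply remember how much of the pattern 100 has been matched so far (the state names p0 to p3 and the helper name are my own; this is the standard pattern-matching construction, not the machine drawn above):

# States track the longest suffix of the input read so far that is a prefix of "100".
delta = {
    ("p0", "0"): "p0", ("p0", "1"): "p1",   # p0: no progress yet
    ("p1", "0"): "p2", ("p1", "1"): "p1",   # p1: just seen "1"
    ("p2", "0"): "p3", ("p2", "1"): "p1",   # p2: just seen "10"
    ("p3", "0"): "p3", ("p3", "1"): "p3",   # p3: "100" has occurred; absorbing accept state
}

def contains_100(string):
    state = "p0"
    for symbol in string:
        state = delta[(state, symbol)]
    return state == "p3"

print(contains_100("11001"))   # True: the factor 100 occurs
print(contains_100("1010"))    # False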

NON-DETERMINISTIC FINITE AUTOMATA (NFA)


It is an important tool for designing string processors, e.g. lexical analyzers. It is imaginary in
the sense that it has to be implemented deterministically on each root to leaf branch of
computation.
The nondeterministic finite automaton is a more powerful variant of finite automata. The main
difference is that the transition rules T may contain for a given input (a) and state q, multiple arrows
leading to different states. Thus the next state is not determined by the current state and the current
input, hence the non-determinism. One can pick any of the states pointed to by the arrows as the next
state. A nondeterministic automaton is said to accept an input if there exists any possible choice of
transitions that would be consistent with the input and would lead in the end to an accepting state.
Thus intuitively the nondeterministic automaton is like the deterministic automaton but has at its
disposal the right to make lucky guesses at each transition point that might lead to an accepting state
if such a path through state space exists. It turns out that any nondeterministic finite automaton can
be simulated by a deterministic one, so the class of languages accepted by both types of automaton is
the same. The main use of non-deterministic automata is theoretical; they are simpler to design and
are useful in proofs.
A non-deterministic finite automaton is much like a deterministic automaton except that we
allow multiple initial states and we impose no restrictions on transitions as long as they are labeled
by symbols in the input alphabet. It is an easy-to-use tool that can help when designing
automata. Non-deterministic finite automata also enable us to establish a link with what are
known as grammars. Grammars are devices for generating strings belonging to a language. A
non-deterministic finite automaton violates the rules for the transitions being complete in ways such
as:
There is more than one initial state
There are forbidden configurations.
In the case of a non-deterministic finite automaton the device has a choice: it can choose where
to go from a whole set of next states. It tries to use whichever way will lead to a favorable state.
It can be argued that a real computational device cannot guess. This argument can hardly be
refuted. However, here are two facts that fully justify this variant of finite automata:
Non-deterministic finite automata are much easier to design
Non-determinism does not increase computational power of finite automata.

The standard model used is the finite-state automaton. We can convert any definition involving
regular expressions into an implementable finite automaton in two steps:
Regular expression → NFA → DFA

The purpose of an NFA is to model the process of reading in characters until we have formed
one of the words that we are looking for. A non-deterministic automaton is exactly like an
automaton except that we allow multiple initial states and we impose no restrictions on
transitions as long as they are labeled by symbols in the input alphabet. Non-deterministic
automata are regarded as tools that are helpful in designing deterministic automata rather than as
real-life machines.
Therefore, the formal definition of an NFA is given below:

A finite set of states, Q.
A finite alphabet of input symbols, Σ.
A transition function, δ. This function:
takes a state and an input symbol (or ε) as arguments;
works over all possible (state, symbol) pairs and returns a set of
states (δ: Q × Σ → P(Q)).
q0 ∈ Q is the start state.
F ⊆ Q is the set of final states.

Therefore, any FA is represented as the five-tuple A = (Q, Σ, δ, q0, F).


Example 1
Draw the equivalent NFA that accepts input strings containing 100 as a substring over the
alphabet {0, 1}; assume the string is 11001.

[Transition diagram: states q0 to q5, with the arcs along the path from q0 to q5 labelled 1, 1, 0, 0, 1]

Example 2
Create an NFA that accepts the string aabba.

q0 is the start state.
q4 is the accepting state.



The transition functions are:
δ(q0, a) → q0
δ(q0, a) → q1
δ(q1, b) → q2
δ(q2, b) → q3
δ(q3, a) → q4
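Non-determinism can be simulated directly by tracking the set of states the automaton could be in after each symbol; the input is accepted if that set finally contains an accepting state. A minimal Python sketch using the transitions of Example 2 (the helper name nfa_accepts is mine; ε-transitions are not needed here and are left out):

# The transition relation maps (state, symbol) to a SET of possible next states.
delta = {
    ("q0", "a"): {"q0", "q1"},   # the non-deterministic choice
    ("q1", "b"): {"q2"},
    ("q2", "b"): {"q3"},
    ("q3", "a"): {"q4"},
}
start, accepting = "q0", {"q4"}

def nfa_accepts(string):
    current = {start}                       # every state the NFA could be in
    for symbol in string:
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
        if not current:                     # every branch of the computation has died
            return False
    return bool(current & accepting)

print(nfa_accepts("aabba"))   # True
print(nfa_accepts("abba"))    # True  (q0 may skip its self-loop)
print(nfa_accepts("aabb"))    # False (no run ends in q4)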

PRACTICAL APPLICATIONS AND AREAS


Actual instances of finite automaton in technology are ubiquitous. One familiar example
may be the control circuitry of an elevator. The state of the system is given by the current floor
the elevator is on, while the input is given by the button pushed. This example is unique in the
sense that the alphabet A and the set of states Q are the same, and slightly irregular in the sense
that the automaton has no final state; it runs forever as long as people are there to give its input.
Indeed any kind of switching circuit can be viewed as a finite state machine.
From a theoretical point of view, an important application of finite automata is the
problem of language recognition. A language L over an alphabet of symbols A is simply some
subset of A*, where A* is the set of all possible sequences of symbols drawn from A. A finite
automaton is said to accept a sequence S in A* if and only if the automaton takes as input
sequence S and ends up in one of the special accepting states in F right after reading the last
input symbol in S. Otherwise the automaton is said to reject S. A finite automaton is said to
accept a language L in A* if the automaton accepts a sequence S whenever it is an element of the
language L, and rejects a sequence S whenever it is not an element of L.
Finite automata are used in describing the operation of a simplified version of vending
machine. Many other systems operating in practice can also be modeled by finite automata such
as control circuits of computers, computer network communication protocols, and lexical
analyzers for compilers etc. Many of those systems fall into the class of systems called reactive
system.
A reactive system is a system that changes its actions, outputs and conditions/status in response
to stimuli from within or outside it. It is an event driven or control driven system continuously
having to react to external and/or internal stimuli. The inputs for a reactive system are never
ready unlike for example when two numbers are added together by an adder. An adder does not
respond unless the input i.e. two numbers to be added are ready. A system such as an adder is
called a transformational system.
It is generally agreed that finite automata are a natural medium used to describe dynamic
behaviors of reactive systems. Finite automata are formal and rigorous, and computer programs
can easily be written to simulate their behaviors.
To model a reactive system with finite automaton, first the states the system goes in or the modes
of its operation are identified. These become the states of the finite automaton that models it.
Then the transitions between the states triggered by events and conditions, external or internal to
the system, are identified and they become arcs in the transition diagram of the finite automaton.
In addition actions that may take place in those states can also be added to the model.


Another practical application of finite automata is in a simplified version of the login
process to a computer, from the computer's point of view. Let us assume for simplicity that this
computer accepts a single user at a time.
Initially the computer waits for a user name to be typed in. This is one state of the system. When
a name is typed in, it checks whether or not the name is valid. If it is valid, then it asks for and
then waits for the password, which is another state. If the user name typed in is not valid, it goes
back to the initial state. We could make it go to a different state and count the number of login
attempts for security purpose. But to make it simple, when a password is typed in and it is
correct, then it accepts the user and starts a session. That is another state though it could further
be broken down into a number of more states. When the session terminates, it gets a signal, goes
back to the initial state and waits for another login. If the password typed in is incorrect, then it
informs the user of that and waits for the next try. That is a fourth state. If the second password
fails, it goes to the initial state and starts all over again. Again what we have seen is a model for
one level of abstraction. Depending on how much detail we are interested in, different states
would be identified and transitions would have to be selected accordingly.
Other potential applications
1. Software engineering
2. Bioinformatics
3. Process Control
4. Pattern Recognition
5. Task Scheduling
6. Control of service activities
7. Optimization Problems
8. Image Processing
9. Diagnosis
10. Computer Vision
11. Concept Learning
12. Routing and Bandwidth allocation in computer communication networks.
LIMITATIONS OF FINITE AUTOMATA
There are quite important issues about what kind of linguistic structures a Finite State
Automaton can model. These structures do not occur in the kinds of applications used as
examples in this part: word recognition, morphological analysis and spelling correction. The
limitations are best seen in relation to the syntactic structure of languages
There are most definitely many languages that cannot be recognized by finite automata.
Consider the alphabet A consisting of 0 and 1 alone, so that A* is simply the set of binary
numbers of any length, possibly with extra zeroes on the left. Furthermore consider the language
L consisting of those binary numbers in A* that contain a sequence of n^2 1's for some positive
integer n. It can be proven that no finite automaton can recognize this language. Intuitively, the
idea behind the proof is that any such automaton would have to count an arbitrary number of 1's,
say m of them, and check that m is a perfect square. No machine with a finite number of states
can do this for arbitrarily large m. Although a finite automaton is not up to the job, a Turing
machine can be designed to recognize this language. In fact it turns out that finite automata, in the
grand scheme of computing devices, are relatively weak. They form the lowest rung in a
hierarchy that includes, in order of increasing power, finite automata, pushdown automata,
linear bounded automata, and Turing machines. Pushdown automata are merely finite automata

endowed with access to an unbounded stack, and linear bounded automata are Turing machines
whose read/write head is not allowed to leave the region of tape where the input is given.
Since there are relatively strong limitations on what a finite automaton can do, one may
well ask whether there is a way to characterize all the languages that actually can be recognized by finite
automata. This set of languages is known as the set of regular languages and can be specified by
regular expressions. Regular languages themselves are very restricted classes of languages, in
keeping with the relative weakness of finite automata. Since pushdown automata are more
powerful than finite automata, they can recognize a more general class of languages known as
context free languages, whereas the linear bounded automata can recognize an even more general
class: the context sensitive languages. Finally Turing machines, the most powerful machines can
recognize any language that can be recognized by an algorithmic procedure. Such a language is
called recursively enumerable. This hierarchy of machines and languages is intimately related to
the theory of grammars and is called the Chomsky hierarchy after the linguist Noam Chomsky.

Regular expressions
Regular expressions are the algebraic equivalent of finite automata. They are used in many
places as a language for describing simple patterns in text.
Let A = {a1, a2, a3} be an alphabet. A regular expression over A is a sequence of symbols
formed by repeated applications of the following rules:
(R1) ∅ is a regular expression.
(R2) ε is a regular expression.
(R3) a1, a2, a3 are regular expressions.
(R4) (r1 + r2) is a regular expression if r1, r2 are regular expressions.
(R5) (r1.r2) is a regular expression if r1, r2 are regular expressions.
(R6) (r1)* is a regular expression if r1 is a regular expression.
(R7) Every regular expression arises by a finite number of applications of the rules (R1) to (R6).
Examples
L(001) = {001}
ab*a = {aa, aba, abba, abbba, ...}
a*b* = {ε, a, b, abb, aab, aaabbb, aaaaaabb, aaaaa, bbbbbb, ...}
a*b*a* = {ε, a, b, ab, aa, ba, aaab, abbba, baaaaa, ...}
(a ∪ b)* = {ε, a, b, aa, bb, aaaa, bbbb, ...} etc.
The languages accepted by DFAs, NFAs, and NFAs with ε-transitions, or expressed by regular
expressions (REs), are called the regular languages.
A regular language over an alphabet is one that can be obtained from basic languages
using the operations of union, concatenation and Kleene star. A regular language therefore can be
described by an explicit formula. It is common to simplify the formula slightly, by leaving out
the brackets {} or replacing them with parentheses, and by replacing ∪ (i.e. union) by +; the
result is called a regular expression.
Here are several examples of regular languages over the alphabet {0, 1}, along with the
corresponding regular expressions.

Language                              Corresponding Regular Expression
1.  {ε}                               ε
2.  {0}                               0
3.  {001} (i.e. {0}{0}{1})            001
4.  {0, 1} (i.e. {0} ∪ {1})           0 + 1
5.  {0, 10} (i.e. {0} ∪ {10})         0 + 10
6.  {1, ε}{001}                       (1 + ε)001
7.  {110}*{0, 1}                      (110)*(0 + 1)
8.  {1}*{10}                          1*10
9.  {10, 111, 11010}*                 (10 + 111 + 11010)*
10. {0, 10}*({11}* ∪ {001, ε})        (0 + 10)*((11)* + 001 + ε)

We think of a regular expression as representing the most typical string in the
corresponding language. For example, 1*10 stands for a string that consists of the substring 10
preceded by any number of 1s.

Converting a regular expression to an NFA - Thompson's Algorithm


We will use the rules which defined a regular expression as a basis for the construction:
1. The NFA representing the empty string is:

2. If the regular expression is just a character, e.g. a, then the corresponding NFA is:

3. The union operator is represented by a choice of transitions from a node; thus a|b can be
represented as:

4. Concatenation simply involves connecting one NFA to the other; e.g. ab is:

5. The Kleene closure must allow for taking zero or more instances of the letter from the
input; thus a* looks like:

Summary
Regular Expressions: Let Σ be an alphabet. A regular expression is constructed from the
symbols ∅, ε and a, where a ∈ Σ, together with the symbols +, . and * and left and right brackets,
according to the following rules: ∅, ε and a are regular expressions, and if s and t are regular
expressions so are (s+t), (s.t), (s*).
Regular Languages: Every regular expression R describes a language L(R). A language is regular
if it can be described by a regular expression.
The ε-automaton
Every non-deterministic automaton with ε-transitions can be converted into a non-deterministic
automaton without ε-transitions that recognizes the same language.
NFA with ε-transitions
If ε appears as a label on arcs, that is, if it is treated as an input label, then:
when an arc labeled ε is traversed no input will be consumed.
NON-DETERMINISTIC FINITE AUTOMATA VERSUS DETERMINISTIC FINITE
AUTOMATA
A deterministic Finite State Automaton requires less memory than a non-deterministic
Finite State Automaton. Given that no alternatives have to be stored, then the search can
consume constant memory - that is, it never has to use more memory than that used to traverse
the first arc.
A deterministic Finite State Automaton is likely (on average) to be faster than a nondeterministic Finite State Automaton because there are fewer arcs that can be traversed for any
given input.
Finite State Automata that are non-deterministic in recognition have two or more arcs
with the same label emanating from a given state. Any non-deterministic Finite State Automaton
can be converted into a corresponding deterministic Finite State Automaton, while for
a deterministic one no two arcs from one state can have the same label. This means that when
recognizing some input, the finite state machine can only go along one route through the
network.
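The conversion from a non-deterministic to a deterministic machine is carried out by the subset construction: each DFA state is the set of NFA states the machine could currently be in. A compact Python sketch (without ε-transitions, reusing the Example 2 NFA from earlier; the function name and output format are mine):

from collections import deque

def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction: every reachable DFA state is a frozenset of NFA states."""
    dfa_start = frozenset({start})
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        S = queue.popleft()
        for a in alphabet:
            T = frozenset(set().union(*(delta.get((q, a), set()) for q in S)))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, dfa_start, dfa_accepting

# The Example 2 NFA, which accepts the strings a...abba (including aabba).
delta = {("q0", "a"): {"q0", "q1"}, ("q1", "b"): {"q2"},
         ("q2", "b"): {"q3"}, ("q3", "a"): {"q4"}}
dfa_states, dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa(
    {"a", "b"}, delta, "q0", {"q4"})
print(len(dfa_states), "DFA states")   # 6, including the empty (dead) set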
Automata theory: formal languages and formal grammars

Chomsky hierarchy   Grammars                     Languages                    Minimal automaton
Type-0              Unrestricted                 Recursively enumerable       Turing machine
Type-1              Context-sensitive            Context-sensitive            Linear-bounded
Type-2              Context-free                 Context-free                 Nondeterministic pushdown
n/a                 Deterministic context-free   Deterministic context-free   Deterministic pushdown
Type-3              Regular                      Regular                      Finite

Each category of languages or grammars is a proper subset of the category directly above it.


ALGORITHMS OF FINITE AUTOMATA


Algorithm 1: (Automaton to regular expression)
Transition elimination: given any two states p and q, all the transitions from p to q can be replaced by a single
transition.
The input to this algorithm is a normalized automaton A. The output is a regular expression r
such that L(r) = L(A).
Repeat the following procedure until there are only two states and at most one transition
between them, at which point the algorithm terminates.
Procedure: repeatedly apply the transition elimination rule if necessary until the resulting automaton has the property
that between each pair of states there is at most one transition; now repeatedly apply the loop elimination
rule (L) if necessary to eliminate all loops; finally, apply the state elimination rule (S) to eliminate a
state.
When the algorithm has terminated, a regular expression describing the language recognized
by the original machine is given by the label of the unique transition, if there is a transition;
otherwise the language is the empty set.
Algorithm 2: Transition tree of an automaton
Let A be an automaton; the transition tree of A is constructed inductively in the following way. We
assume that a linear ordering of the input alphabet is specified at the outset so we can refer meaningfully to its
elements in turn:
1. The root of the tree is s0 and we put T0 = {s0}.
2. Assume that Ti has been constructed; vertices in Ti will have been labeled either closed or non-closed. The meaning of these two terms will be made clear below. We now show how to construct
Ti+1.
3. For each non-closed leaf s in Ti and for each input symbol a in turn, construct an arrow from s to s.a labeled
by a; if, in addition, s.a is a repeat of any state that has already been constructed, then we say it is
closed and mark it with an ×.

4. The algorithm terminates when all leaves are closed.
