
Theory of Automata Lecture Notes: 1.

Introduction & Theory of Formal Language Text Book: Introduction to Computer Theory by Daniel I A Cohen

1. Automata Theory: Introduction & Theory of Formal Languages.


Introduction: Automata theory (automata is the plural of automaton) is the basis for developing the theory of formal languages. A formal language is built from the following basic notions:

A symbol (a single character, meaningless by itself).
An alphabet (a finite set of symbols).
A word (a finite string of symbols from a given alphabet).
A language (a set of words formed from a given alphabet).

The set of words that form a language is usually infinite, but it may be finite or empty. Formal languages are treated as mathematical objects: they are sets, so operations such as union and intersection apply to them. Since computers can be described by mathematical models, building such a model amounts to building an abstract machine. Automata theory analyzes a machine in terms of the inputs on which that machine operates successfully. The collection of these successful inputs is called the language of that specific machine. That is, an automaton provides a method for accepting or rejecting the strings of a language. Which strings are accepted or rejected is determined by a grammar defined for that machine, where a grammar is a set of rules. The machine performs its computation on an input, as per the rules, by moving through a series of states. That is why an automaton is also described as an abstract mathematical model of a Finite State Machine (FSM).

What is a grammar? A grammar is a powerful tool (a set of rules) for describing and analyzing a language. Using a grammar, the valid sentences of a language are constructed. A simple example of an English grammar is given below (the symbol | stands for "or").

Example-1:
sentence → subject verb-phrase object
subject → This | Computers | I
verb-phrase → adverb verb | verb
adverb → never
verb → is | run | am | tell
object → the noun | a noun | noun
noun → university | world | cheese | lies

Using the above grammar (rules), some simple sentences can be constructed, such as:
This is a university.
Computers run the world.
I am the cheese.
I never tell lies.

Inst: Dr. Mohammed Yousuf Khan

Now consider the derivation (construction) of the first sentence using the above grammar.

sentence ⇒ subject verb-phrase object
⇒ This verb-phrase object
⇒ This verb object
⇒ This is object
⇒ This is a noun
⇒ This is a university
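The Example-1 grammar is small enough to explore mechanically. The following is a minimal Python sketch, not part of the original notes: it stores each production as a dictionary entry and expands the start symbol in every possible way. The names GRAMMAR, expand and SENTENCES are hypothetical, chosen only for this illustration.

```python
from itertools import product

# Example-1 grammar (illustrative encoding): each non-terminal maps to its
# alternatives, and each alternative is a list of symbols.
GRAMMAR = {
    "sentence": [["subject", "verb-phrase", "object"]],
    "subject": [["This"], ["Computers"], ["I"]],
    "verb-phrase": [["adverb", "verb"], ["verb"]],
    "adverb": [["never"]],
    "verb": [["is"], ["run"], ["am"], ["tell"]],
    "object": [["the", "noun"], ["a", "noun"], ["noun"]],
    "noun": [["university"], ["world"], ["cheese"], ["lies"]],
}


def expand(symbol):
    """Return every terminal string (a list of words) derivable from symbol."""
    if symbol not in GRAMMAR:  # a terminal: nothing left to replace
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine the pieces.
        pieces = [expand(s) for s in alternative]
        for combo in product(*pieces):
            results.append([word for part in combo for word in part])
    return results


# The language generated by the grammar, as a set of sentences.
SENTENCES = {" ".join(words) for words in expand("sentence")}

print(len(SENTENCES))
print("This is a university" in SENTENCES)
print("I never tell lies" in SENTENCES)
```

Because this grammar has no recursive productions, its language is finite: 3 subjects × 8 verb-phrases × 12 objects = 288 sentences. Exhaustive expansion would not terminate for a recursive grammar.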

Besides several reasonable sentences, the grammar also derives nonsense sentences such as "Computers run cheese" or "This am a lies". These sentences make no semantic sense, but they are syntactically correct because they follow the sequence subject, verb-phrase, object. It is very difficult to define a complete language with a finite number of rules, and equally difficult to list all acceptable sentences of a language. In general, a language should have the following properties:
It is well defined, without ambiguity.
Using a formula or rule, we should be able to recognize in finite time whether any given word is in the language or not.

Formal Definition of Grammar: A grammar G can be defined formally as a 4-tuple G = (N, Σ, P, S), where
N is a finite set of non-terminals;
Σ (or T) is a finite nonempty set of terminals;
S is the start symbol, with S ∈ N;
P is a finite set of productions of the form A → α (a non-terminal A replaced by a string α of terminals and non-terminals), as given in the above English grammar.

Definitions of the terms used above, and of some other grammar-related terms, follow.

Symbol: A symbol means a point, a letter, a digit, etc.

Alphabet: An alphabet (Σ) is a finite, nonempty set of fundamental units: letters, characters or symbols (Cohen pp-8), e.g. Σ = {a, b, c, …, z}, an alphabet of cardinality 26. The Roman alphabet is {a, b, …, z}; the commas and brackets are not essential, and the letters may simply be juxtaposed as abc…z. The binary alphabet is {0, 1}, an alphabet of cardinality 2. The octal alphabet is {0, 1, 2, …, 7}, an alphabet of cardinality 8.


String: Any finite sequence of symbols drawn from an alphabet is a string. If Σ consists of the characters #, 4, %, + then it is written as Σ = {#, 4, %, +}. So beautiful is a string over the alphabet Σ = {a, b, …, z} and 4444 is a string over the alphabet Σ = {#, 4, %, +}.

Language: A specified set of strings of characters from the alphabet is called a language.

A set of strings over an alphabet (possibly including the empty string) is called a language. A language may, for example (Cohen pp-8):
have no words, written ∅;
consist only of the empty word, written {Λ};
consist only of the single word abbab, written {abbab}.

Empty or null string: An empty (null) string is a string of zero symbols, i.e. no symbols at all. Different authors write it as Λ, λ or ε. Remember that a language with no words is written ∅ (φ in Cohen). That is, the empty word is Λ and the empty language is ∅ (Cohen pp-8); the two are different, since {Λ} contains one word and ∅ contains none. Remember also that Λ is not a symbol of the alphabet. The null string is sometimes useful in a production to specify that a symbol can be replaced by nothing at all, e.g. A → B | Λ.

Word: The strings that are permissible in a language are called its words.
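The distinction between an alphabet, a string, the empty word and the empty language can be made concrete with Python sets and strings. This is only an illustrative sketch: the helper is_string_over and the constant names are hypothetical, and the empty Python string "" stands in for Λ.

```python
def is_string_over(word, alphabet):
    """Return True if every symbol of word belongs to alphabet."""
    return all(symbol in alphabet for symbol in word)


ROMAN = set("abcdefghijklmnopqrstuvwxyz")  # alphabet of cardinality 26
SIGNS = {"#", "4", "%", "+"}               # alphabet of cardinality 4

EMPTY_WORD = ""                 # the null string: zero symbols
EMPTY_LANGUAGE = set()          # a language with no words
ONLY_EMPTY_WORD = {EMPTY_WORD}  # a language whose single word is the null string
SINGLE_WORD = {"abbab"}         # a language consisting of one word

print(is_string_over("beautiful", ROMAN))   # every letter is in ROMAN
print(is_string_over("4444", SIGNS))        # every symbol is in SIGNS
print(is_string_over(EMPTY_WORD, SIGNS))    # the null string is a string over any alphabet
print(len(EMPTY_LANGUAGE), len(ONLY_EMPTY_WORD))
```

The last line shows why the empty language and the language containing only the empty word are not the same set: the first has zero words, the second has one.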

Non-terminal: A grammar symbol that can be replaced by (expanded to) a sequence of symbols. For example, sentence, subject, verb-phrase and object are some of the non-terminals in Example-1 above.

Terminal: An actual word of the language; terminals are the symbols of a grammar that cannot be replaced by anything else. "Terminal" means a dead end: no further expansion is possible. For example: This, Computers, never, is, ...

Production: A grammar rule that describes how to replace symbols. The general form of a production for a non-terminal is illustrated by:
sentence → subject verb-phrase object
The non-terminal sentence is equivalent to the concatenation of some terminals or non-terminals.


The production means that wherever we encounter sentence we may replace it by subject verb-phrase object. Eventually we obtain a string containing nothing that can be expanded further, i.e. it consists only of terminals.

Start symbol: The start symbol is the non-terminal from which all sentences are derived, by successive replacement using the productions of the grammar. In Example-1 the start symbol is sentence.

Expression: An expression is built from single symbols combined with one another.

Example: If Σ1 = {a, b, …, z} and Σ2 = {0, 1, …, 9} are alphabets, then abba is a string over Σ1 and 234 is a string over Σ2, but a43 is not a string over Σ1 or over Σ2, because in each case it contains symbols that are not in that alphabet. Λ, however, is a string over both Σ1 and Σ2. The set of natural numbers is not an alphabet, because it is not finite. An alphabet of cardinality 2 is called a binary alphabet. An alphabet of cardinality 1 is called a unary alphabet, and a string over a unary alphabet is called a unary string.

Example: {0, 1} is a binary alphabet and {1} is a unary alphabet; 11 is a binary string over the alphabet {0, 1} or a unary string over the alphabet {1}.

Length: The length of a word x is denoted |x|, i.e. the number of symbols in the word. For example: |0102| = 4, |abcdef| = 6, |Λ| = 0, |01| + |1| = 3. Note: ΛΛ = Λ, i.e. appending no symbols to no symbols gives no symbols. So xΛ = Λx = x for every word x. Normally parentheses are not letters of the alphabet but are used to mark the ends of factors. If Σ = {x ( )}, then the parentheses are letters of the alphabet.
So length(xxxxx) = 5 whereas length((xx)(xxx)) = 9 (Cohen pp-16).

Reverse: reverse(xxx) = xxx; reverse(145) = 541 (Cohen pp-13).

Palindrome: The language PALINDROME over the alphabet Σ = {a b} is defined as: PALINDROME = {Λ, and all strings x such that reverse(x) = x}. Its elements are: PALINDROME = {Λ a b aa bb aaa aba bab bbb aaaa ...}, that is, all the strings that read the same when spelt backwards (Cohen pp-13).

Canonical: Say Σ = {0, 1}. The string 11 is canonically smaller in Σ* than the string 000, because 11 is a shorter string than 000; and 00 is canonically smaller than 11, because the strings are equal in length but 00 is alphabetically smaller than 11. The set Σ* = {Λ 0 1 00 01 10 11 000 001 ...} is given in its canonical ordering.

Prefix/Suffix: The substrings of 011 are Λ, 0, 1, 01, 11 and 011. Of these, Λ, 0 and 01 are (proper) prefixes of 011, and similarly Λ, 1 and 11 are (proper) suffixes of 011.

Grammar Implementations: Consider some more examples of grammars.

Example 2: Grammar for Simple English: Suppose we limit ourselves to a very restricted subset of the sentences of plain English. The production rules are first described in simple English, without identifying the terminals or non-terminals.
1. A sentence is a noun-phrase followed by a transitive-verb-phrase and another noun-phrase.
2. A sentence is a noun-phrase followed by an intransitive-verb-phrase.
3. A noun-phrase is an article followed by a noun.
4. A noun-phrase is a noun.
5. A transitive-verb-phrase is a transitive-verb.
6. An intransitive-verb-phrase is an intransitive-verb followed by an adverb.
7. An intransitive-verb-phrase is an intransitive-verb.
8. An article is a.
9. An article is the.
10. A noun is dog.
11. A noun is cat.
12. A transitive-verb is chases.
13. A transitive-verb is meets.
14. An intransitive-verb is runs.
15. An adverb is slowly.
16. An adverb is rapidly.

Using the notation for the start symbol, terminals and non-terminals, the above grammar becomes:
sentence → noun-phrase transitive-verb-phrase noun-phrase
sentence → noun-phrase intransitive-verb-phrase
noun-phrase → article noun
noun-phrase → noun
transitive-verb-phrase → transitive-verb
intransitive-verb-phrase → intransitive-verb adverb
intransitive-verb-phrase → intransitive-verb
article → a
article → the

noun → dog
noun → cat
transitive-verb → chases
transitive-verb → meets
intransitive-verb → runs
adverb → slowly
adverb → rapidly

The sentence "a dog runs rapidly" is derived as follows:
sentence ⇒ noun-phrase intransitive-verb-phrase
⇒ noun-phrase intransitive-verb adverb
⇒ noun-phrase intransitive-verb rapidly
⇒ noun-phrase runs rapidly
⇒ article noun runs rapidly
⇒ article dog runs rapidly
⇒ a dog runs rapidly

Note that a derivation may proceed from the right or from the left. The above derivation works from the right (the rightmost non-terminal is replaced at each step); the same sentence can be derived from the left as:
sentence ⇒ noun-phrase intransitive-verb-phrase
⇒ article noun intransitive-verb-phrase
⇒ a noun intransitive-verb-phrase
⇒ a dog intransitive-verb-phrase
⇒ a dog intransitive-verb adverb
⇒ a dog runs adverb
⇒ a dog runs rapidly
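The left derivation above can be replayed step by step in code. The sketch below is an illustration only, under the assumption that the Example 2 grammar is encoded as a Python dictionary; the names GRAMMAR, leftmost_derivation and STEPS are hypothetical. At every step it replaces the leftmost non-terminal using the alternative selected by an index.

```python
# Example 2 grammar (illustrative encoding): non-terminal -> list of alternatives.
GRAMMAR = {
    "sentence": [
        ["noun-phrase", "transitive-verb-phrase", "noun-phrase"],
        ["noun-phrase", "intransitive-verb-phrase"],
    ],
    "noun-phrase": [["article", "noun"], ["noun"]],
    "transitive-verb-phrase": [["transitive-verb"]],
    "intransitive-verb-phrase": [["intransitive-verb", "adverb"], ["intransitive-verb"]],
    "article": [["a"], ["the"]],
    "noun": [["dog"], ["cat"]],
    "transitive-verb": [["chases"], ["meets"]],
    "intransitive-verb": [["runs"]],
    "adverb": [["slowly"], ["rapidly"]],
}


def leftmost_derivation(start, choices):
    """Apply one production per choice, always to the leftmost non-terminal.

    Each entry of choices is the index of the alternative to use.
    Returns the list of sentential forms, starting with the start symbol.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that is still a non-terminal.
        index = next(i for i, s in enumerate(form) if s in GRAMMAR)
        form[index:index + 1] = GRAMMAR[form[index]][choice]
        steps.append(" ".join(form))
    return steps


# Production choices that reproduce the left derivation of "a dog runs rapidly".
STEPS = leftmost_derivation("sentence", [1, 0, 0, 0, 0, 0, 1])
for step in STEPS:
    print(step)
```

The choice list is supplied by hand here; searching for a list of choices that reaches a given sentence is the job of a parser, which this sketch does not attempt.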

Simple exercise: Provide a step-by-step derivation (right or left) for the following sentences. (Not every sentence is necessarily accepted by the automaton; some may be rejected, so beware, folks!)
(a) cat chases the dog
(b) the dog meets rapidly
(c) the cat meets cat rapidly
(d) the cat meets slowly
(e) a dog chases rapidly
(f) cat runs rapidly
(g) a cat slowly chases the dog
(h) dog runs the cat
(i) dog slowly meets the cat
(j) cat runs


Exercise: Try the example given on Cohen pp-227 as well.

Example 3: Grammar for Arithmetic Expression: Consider the grammar that specifies assignment statements involving the identifiers A, B, C, D, the arithmetic operators + and *, the equal sign = and the left and right parentheses ( ). Let T = {A, B, C, D, +, *, (, ), =} and N = {asgn_stat, exp, term, factor, id}, with asgn_stat being the start symbol. Let the following be the set of productions:
asgn_stat → id = exp
exp → exp + term
exp → term
term → term * factor
term → factor
factor → (exp)
factor → id
id → A
id → B
id → C
id → D
The four id productions may also be written as: id → A | B | C | D

Consider the derivation of the sentence C = A + D * (D + B). If the sentence can be derived, the expression C = A + D * (D + B) is said to be in the language generated by the above grammar; otherwise the expression is not in the language of the above grammar. The process of generating a sentence means replacing the whole sentence structure by a particular configuration of sub-structures, replacing those sub-structures by other sub-structures, and so on, until eventually the basic symbols are generated. The step-wise derivation (right derivation) is:
asgn_stat ⇒ id = exp
⇒ id = exp + term
⇒ id = exp + term * factor
⇒ id = exp + term * (exp)
⇒ id = exp + term * (exp + term)
⇒ id = exp + term * (exp + factor)
⇒ id = exp + term * (exp + id)
⇒ id = exp + term * (exp + B)
⇒ id = exp + term * (term + B)
⇒ id = exp + term * (factor + B)
⇒ id = exp + term * (id + B)
⇒ id = exp + term * (D + B)
⇒ id = exp + factor * (D + B)
⇒ id = exp + id * (D + B)
⇒ id = exp + D * (D + B)
⇒ id = term + D * (D + B)
⇒ id = factor + D * (D + B)
⇒ id = id + D * (D + B)
⇒ id = A + D * (D + B)
⇒ C = A + D * (D + B)

The expression is accepted. So the expression C = A + D * (D + B) is in the language of the above grammar.

Determine whether each of the following sentences is in the language generated by the above grammar, and provide a step-by-step derivation (take your pick of left or right derivation!):
1. B = C * D + A
2. B = C + D * A
3. C = ( A + B * (C + D))

Example 4: Grammar for Logical Expression: It involves the basic AND, OR and NOT operators with true and false. The following grammar is given in a standard grammar format known as Backus Naur Form (BNF). For details see the lecture notes: 2. Backus Naur Form (BNF) & Extended BNF.
<G> ::= <BE>
<BE> ::= not <BE> | <BS>
<BS> ::= <BF> and <BS> | <BF>
<BF> ::= <BP> or <BF> | <BP>
<BP> ::= true | false

Evaluate:
a) not true and true or false
b) not not true or true and false
c) not false or true or false
d) not true and not false
e) not not true or not not false
f) not and not false
g) true and true true
h) true and true false

Some derivations are shown below.

a: not true and true or false
<G> ::= <BE>
⇒ not <BE>
⇒ not <BS>
⇒ not <BF> and <BS>
⇒ not <BP> and <BS>
⇒ not true and <BS>
⇒ not true and <BF>
⇒ not true and <BP> or <BF>
⇒ not true and true or <BF>
⇒ not true and true or <BP>
⇒ not true and true or false

b: not not true or true and false
<G> ::= <BE>
⇒ not <BE>
⇒ not not <BE>
⇒ not not <BS>
⇒ not not <BF> and <BS>
⇒ not not <BP> or <BF> and <BS>
⇒ not not true or <BF> and <BS>
⇒ not not true or <BP> and <BS>
⇒ not not true or true and <BS>
⇒ not not true or true and <BF>
⇒ not not true or true and <BP>
⇒ not not true or true and false

Now consider the derivation for d: not true and not false
<G> ::= <BE>
⇒ not <BE>
⇒ not <BS>
⇒ not <BF> and <BS>
⇒ not <BP> and <BS>
⇒ not true and <BS>
⇒ not true and <BF>
⇒ not true and ????????
There is no not available under the <BF> production, so the string is rejected.

We may change the grammar rules to accept this string. If we change the 3rd production, the grammar becomes:
<G> ::= <BE>
<BE> ::= not <BE> | <BS>
<BS> ::= <BF> and <BS> | <BF> | <BE>    (changed from <BS> ::= <BF> and <BS> | <BF>)
<BF> ::= <BP> or <BF> | <BP>
<BP> ::= true | false

Then the above string is accepted:
<G> ::= <BE>
⇒ not <BE>
⇒ not <BS>
⇒ not <BF> and <BS>
⇒ not <BP> and <BS>
⇒ not true and <BS>
⇒ not true and <BE>
⇒ not true and not <BE>
⇒ not true and not <BS>
⇒ not true and not <BF>
⇒ not true and not <BP>
⇒ not true and not false
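The effect of changing the 3rd production can be checked with a small recognizer. The following Python sketch is one possible hand-written recursive-descent recognizer for the logical-expression grammar, not a method given in the notes. The function name accepts and the flag allow_bs_to_be are hypothetical: the flag switches on the extra alternative <BS> ::= <BE> of the modified grammar. Each inner function returns the position just after the part it matched, or None on failure.

```python
def accepts(sentence, allow_bs_to_be=False):
    """Return True if sentence is derivable from <G>.

    allow_bs_to_be=False uses the original grammar;
    allow_bs_to_be=True adds the alternative <BS> ::= <BE>.
    """
    tokens = sentence.split()

    def peek(i):
        return tokens[i] if i < len(tokens) else None

    def parse_be(i):
        # <BE> ::= not <BE> | <BS>
        if peek(i) == "not":
            return parse_be(i + 1)
        return parse_bs(i)

    def parse_bs(i):
        # <BS> ::= <BF> and <BS> | <BF>   (plus | <BE> in the modified grammar)
        if allow_bs_to_be and peek(i) == "not":
            return parse_be(i)  # only <BE> can start with "not"
        j = parse_bf(i)
        if j is not None and peek(j) == "and":
            return parse_bs(j + 1)
        return j

    def parse_bf(i):
        # <BF> ::= <BP> or <BF> | <BP>
        j = parse_bp(i)
        if j is not None and peek(j) == "or":
            return parse_bf(j + 1)
        return j

    def parse_bp(i):
        # <BP> ::= true | false
        return i + 1 if peek(i) in ("true", "false") else None

    # <G> ::= <BE>, and the whole input must be consumed.
    return parse_be(0) == len(tokens)


print(accepts("not true and true or false"))
print(accepts("not true and not false"))
print(accepts("not true and not false", allow_bs_to_be=True))
```

With the original grammar the recognizer accepts string a and rejects string d; with the extra alternative enabled, string d is accepted, in agreement with the derivations above.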

The BNF & EBNF lecture notes follow in the 2nd set of TA lecture notes.
