Anda di halaman 1dari 6

5.

CONTEXT-FREE GRAMMARS AND LANGUAGES

Introduction Context-free grammars have played a major role in compiler technology since the 1960s; they turned the implementation of parsers from a time-consuming, ad-hoc implementation task into a routine job that can be done in a few hours. More recently, the context-free grammar has been used to describe document formats, via the so-called document-type definition (DTD) that is used in XML community for information exchange on the Web. Definition of Context-Free Grammars A context-free grammar , or simply grammar, or CFG, represented as G= (V, T, P, S), is a grammar with four components: 1. A finite set of symbols (T) that form the strings of the language being defined. We call this alphabet the terminals, or terminal symbols. 2. A finite set of variables (V), also called nonterminals or syntactic categories. Each variable represents a language; i.e. a set of strings. 3. One of the variables represents the language being defined; it is called the start symbol(S). Other variables represent auxiliary classes of strings that are used to help define the language of the start symbol. 4. There is a finite set of productions (P) or rules that represent the recursive definition of a language. Each production consists of: a) A variable that is being defined by the production. This variable is often called the head of the production. b) The production symbol . c) A string of zero or more terminals and variables. This string, called the body of the production, represents one way to form strings in the language of the variable of the head.

Example 1: The CFG for the palindromes on = {0, 1} is represented by G= ({P}, {0, 1}, A, P) where A represents the set of five productions: 1. 2. 3. 4. 5. P P 0 P 1 P 0P0 P1P1

Example 2: Design CFG for the following languages: a) b) c) d) The set {0n1n | n1} The set {0n1n | n0} The set of strings { aibjck |i,j,k 0} The set of strings { aibjck |i,j,k 0, ij or jk} SOLUTION a) S -> 0S1 | 01 b) S -> 0S1 | 01 |

c) S -> ABC A -> |aA B -> |bB C -> |cC d) S -> AB | CD A -> aA | B -> bBc | E | cD C -> aCb | E | aA D -> cD | E -> bE | b To understand how this grammar works, observe the following: a) b) c) d) A generates zero or more a's. D generates zero or more c's. E generates one or more b's. B first generates an equal number of b's and c's, then produces either one or more b's (via E) or one or more c's (via cD). That is, B generates strings in b*c* with an unequal number of b's and c's. e) Similarly, C generates unequal numbers of a's then b's. f) Thus, AB generates strings in a*b*c* with an unequal numbers of b's and c's, while CD generates strings in a*b*c* with an unequal number of a's and b's.

Derivations Using a Grammar With derivations, we apply the productions of a CFG to infer that certain strings are in the language of a certain variable. We use productions from head to body. We expand the start symbol using one of its productions. We further expand the resulting string by replacing one of the variables by the body of one of its productions, and so on, until we derive a string consisting entirely of terminals. The language of the grammar is all strings of terminals that we can obtain in this way. Example 3: The following CFG represents (a simplification of ) expressions in a typical programming language. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. EI EE+E EE*E E (E) Ia Ib I Ia I Ib I I0 I I1

Using derivation, show that a string a*(a+b00) is in this CFG SOLUTION We can derive the string starting with start symbol E. Here is one such derivation: E E*E I*E a*E a*(E) a*( E+E) a*(I+E) a*( a+ E) a* (a + I) a*(a+I0) a* (a+ b00)

Note 1. We can use the relationship to condense the derivation. We know E E by the basis. Repeated use of the inductive part gives us E E*E, E I*E, and so on, until finally E a*(a+b00). 2. A string of terminals w is inferred to be in the language of some variable A if and only if A w

Leftmost and Rightmost Derivations In order to restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the leftmost variable with one of its production bodies. Such a derivation is called a leftmost derivation, and we indicate that a derivation is leftmost by using the relations and one or many steps respectively. The derivation in example 3 above is a leftmost derivation. Similarly, it is possible to require that at each step the rightmost variable is replaced by one of its bodies. If so, we call the derivation rightmost and use the symbols and derivations steps respectively. Example 4: The leftmost derivation in example 3 above can actually be described as follows: E E*E I*E a*E a*(E) a*( E+E) a*(I+E) a*( a+ E) a* (a + I) a*(a+I0) a* (a+ b00) We can also summarize the leftmost derivation by saying E

, for

to indicate one or many

a* (a+ b00)

Example 5: Redo example 4 above using rightmost derivation E E*E E*(E) E*(E+E) E*(E+I) E*( E+I0) E*(E+I00) E*( E+ b00) E* (I + b00) I*(a+b00) a* (a+ b00) This derivation allows us to conclude E

a* (a+ b00)

Example 6: Given the following grammar, S A1B A 0A| B 0B|1B|

Give the leftmost and rightmost derivation of the following strings:

i.

00101

ii. iii.

1001 00011

SOLUTION i. Leftmost: S => A1B => 0A1B => 00A1B => 001B => 0010B => 00101B => 00101 Rightmost: S => A1B => A10B => A101B => A101 => 0A101 => 00A101 => 00101 H/W H/W

ii. iii. Note

Any derivation has an equivalent leftmost and an equivalent rightmost derivation. That is, if w is a terminal string, and A a variable, then A w if and only if A

w, and A w if and only if A

w.

Parse Trees
A parse tree is a tree representation of derivations that has proved extremely helpful. It shows clearly how the symbols of the terminal string are grouped into substrings, each of which belongs to the language of one of the variables of the grammar. A parse tree is also used in compilers as a data structure of choice to represent a source program. Constructing Parse Trees A parse tree for a grammar G=(V, T, P, S) is a tree with the following conditions: 1. Each interior node is labeled by a variable in V. 2. Each leaf is labeled by either a variable, a terminal, or . However if the leaf is labeled , then it must be the only child of its parent. 3. If an interior node is labeled A and its children are labeled X1, X2,, Xk respectively from the left, then AX1X2Xk is a production in P. Note that the only time one of the Xs can be is if that is the label of the only child, and A is a production of G.

Example 7: Construct a parse tree using a grammar of example 3 to show that I+ E can be derived from E SOLUTION

I A parse tree showing the derivation of I+E from E

Example 8: Construct a parse tree using a grammar of example 1 to show that 0110 can be derived from P SOLUTION P

A parse tree showing the derivation P 0110

Anda mungkin juga menyukai