ENGINEERING LAB MANUAL

Group A
1. Using Divide and Conquer Strategies design a function for Binary Search using C++/Java/Python/Scala.
2. Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using C++.
Prerequisites:
Knowledge of writing programs in C++.
Objectives:
To learn the concept of Divide and Conquer Strategy.
To study the design and implementation of Binary Search algorithm.
Theory:
Divide and Conquer strategy:
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-
problems of the same (or related) type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original problem.
This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g.,
quicksort, merge sort), multiplying large numbers, syntactic analysis (e.g., top-down parsers) and
computing the discrete Fourier transform (FFTs).
Searching
Sequential Algorithm
function sequential (T[1..n], x)
{ sequential search for x in array T }
for i <- 1 to n do
    if T[i] = x then return i
return n + 1
This algorithm clearly takes a time in Θ(r), where r is the index returned: this is O(n) in the worst case and O(1) in the best case, if we assume that all the elements of T are distinct and that x is indeed somewhere in the array.
CL-I B.E. Computer Engineering
Binary Search
The binary search algorithm begins by comparing the target value to the value of the middle element of the
sorted array. If the target value is equal to the middle element's value, the position is returned. If the
target value is smaller, the search continues on the lower half of the array, or if the target value is
larger, the search continues on the upper half of the array. This process continues until the element is
found and its position is returned, or there are no more elements left to search for in the array and a
"not found" indicator is returned.
Binary search can be applied to sorted list only. It searches sorted lists using a divide and conquer
technique. On each iteration the search domain is cut in half, until the result is found. The
computational complexity of a binary search is O(log n).
function binrec (T[i..j], x)
{ binary search for x in subarray T[i..j];
  the search is assumed to succeed within this range }
if i = j then return i
k <- (i + j + 1) div 2
if x < T[k] then return binrec(T[i..k-1], x)
else return binrec(T[k..j], x)
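The recursive routine above can be sketched in C++ as follows. This is a minimal illustration; the function names and the -1 "not found" convention are our own, not part of the lab's prescribed solution:

```cpp
#include <cassert>
#include <vector>

// Recursive divide-and-conquer binary search over a sorted vector.
// Returns the index of x in v[lo..hi], or -1 if x is not present.
int binrec(const std::vector<int>& v, int lo, int hi, int x) {
    if (lo > hi) return -1;               // empty subarray: not found
    int mid = lo + (hi - lo) / 2;         // middle element (overflow-safe)
    if (v[mid] == x) return mid;          // found: return its position
    if (x < v[mid])
        return binrec(v, lo, mid - 1, x); // search the lower half
    return binrec(v, mid + 1, hi, x);     // search the upper half
}

// Convenience wrapper over the whole array.
int binary_search_dc(const std::vector<int>& v, int x) {
    return binrec(v, 0, static_cast<int>(v.size()) - 1, x);
}
```

On each call the search domain is cut in half, giving the O(log n) behaviour discussed above.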
Dr. D. Y. Patil College of Engg.,Ambi
Conclusion:
The concept of the divide and conquer strategy is studied, and the binary search algorithm is implemented using C++.
FAQs:
1) What is Divide and Conquer approach? Also explain its advantages.
3) Explain the need of analysis of algorithm with respect to complexities as well as techniques
used for analysis.
4) Compute time complexity and space complexity of your program. Also give the proper
justification for same.
5) Compare the conventional Binary Search algorithm and the Divide and Conquer Binary Search
algorithm. Also explain the advantages of Divide and Conquer approach in terms of quick sort.
6) Compare the Divide and Conquer, Concurrent Programming, Backtracking, and Branch and Bound approaches.
Assignment No: 02
Title: Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using
C++.
Prerequisites:
Knowledge of writing programs in C++.
Objectives:
To learn the concept of Divide and Conquer Strategy.
To study the design and implementation of Quick Sort algorithm.
Theory:
Divide and Conquer strategy:
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-
problems of the same (or related) type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original problem.
This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g.,
quicksort, merge sort), multiplying large numbers, syntactic analysis (e.g., top-down parsers) and
computing the discrete Fourier transform (FFTs).
Sorting
Quick Sort
The sorting algorithm invented by Hoare, usually known as "quicksort", is also based on the idea of
divide-and-conquer. As a first step, this algorithm chooses one of the items in the array to be sorted as
the pivot. The array is then partitioned on either side of the pivot, elements are moved in such a way
that those greater than the pivot are placed on its right, whereas all the others are moved to its left. If
now the two sections of the array on either side of the pivot are sorted independently by recursive calls
of the algorithm, the final result is a completely sorted array, no subsequent merge step being
necessary. To balance the sizes of the two sub instances to be sorted, we would like to use the median
element as the pivot. Finding the median takes more time than it is worth. For this reason we simply
use the first element of the array as the pivot. The quick sort algorithm is given below.
procedure quicksort (T[i..j])
{ sorts subarray T[i..j] into ascending order }
if i < j then
    pivot(T[i..j], l)
    quicksort(T[i..l-1])
    quicksort(T[l+1..j])

procedure pivot (T[i..j]; var l)
{ partitions T[i..j] around the pivot p = T[i] }
p <- T[i]
k <- i; l <- j + 1
repeat k <- k + 1 until T[k] > p or k >= j
repeat l <- l - 1 until T[l] <= p
while k < l do
    interchange T[k] and T[l]
    repeat k <- k + 1 until T[k] > p
    repeat l <- l - 1 until T[l] <= p
interchange T[i] and T[l]
Quicksort is a recursive, comparison-based sorting algorithm. It takes the list, chooses a pivot, and finds the position in the list where the pivot key should be placed, so that i) the keys smaller than the pivot end up on its low side, and ii) the keys larger than or equal to the pivot end up on its high side. The same procedure is then recursively applied to these two parts.
The average time complexity of Quick sort is O(n log n). The worst-case time complexity is O(n²).
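A minimal C++ sketch of a concurrent quicksort class, assuming C++11 std::async for parallelism and the first-element pivot of the pseudocode above; the class name and the sequential cutoff are our own design choices, not a prescribed solution:

```cpp
#include <algorithm>
#include <cassert>
#include <future>
#include <vector>

// Concurrent quicksort: after partitioning, the two halves are sorted
// in parallel. Small ranges recurse sequentially so that the
// thread-creation overhead stays bounded.
class ConcurrentQuickSort {
public:
    static void sort(std::vector<int>& a) {
        if (!a.empty()) sort_range(a, 0, static_cast<int>(a.size()) - 1);
    }

private:
    static const int kSequentialCutoff = 1000;

    // Hoare-style partition with the first element as pivot, as in the
    // pseudocode above; returns the pivot's final index l.
    static int partition(std::vector<int>& a, int i, int j) {
        int p = a[i], k = i, l = j + 1;
        while (true) {
            do { ++k; } while (k <= j && a[k] <= p);
            do { --l; } while (a[l] > p);
            if (k >= l) break;
            std::swap(a[k], a[l]);
        }
        std::swap(a[i], a[l]);   // place the pivot between the halves
        return l;
    }

    static void sort_range(std::vector<int>& a, int i, int j) {
        if (i >= j) return;
        int l = partition(a, i, j);
        if (j - i < kSequentialCutoff) {       // small range: stay sequential
            sort_range(a, i, l - 1);
            sort_range(a, l + 1, j);
        } else {                               // large range: sort halves concurrently
            auto left = std::async(std::launch::async,
                                   [&a, i, l] { sort_range(a, i, l - 1); });
            sort_range(a, l + 1, j);
            left.get();                        // join the concurrent half
        }
    }
};
```

The two recursive calls touch disjoint index ranges, so they can safely run in different threads without locking.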
Flow Chart for Quick Sort using Divide and Conquer Approach.
Conclusion:
The concept of the divide and conquer strategy is studied, and the Concurrent Quick Sort algorithm is implemented using C++.
FAQs
1) Explain the need of Divide and Conquer approach for Quick Sort.
3) Compare the conventional Quick Sort algorithm with Quick Sort using Divide and Conquer.
Assignment No: 3
Aim:
Assignment to understand the syntax of LEX specifications, built-in functions and variables. (Lexical
analyzer for sample language using LEX)
Objective:
1. To understand how to construct a compiler using LEX and YACC. LEX and YACC are tools used to
generate lexical analyzers and parsers.
2. To understand the application of data structures such as linked-lists and trees.
3. To understand LEX programming.
4. To understand the rules, i.e., LEX specifications, built-in functions, and variables.
What is LEX?
It is a tool for generating a lexical analyzer. It takes a specification of tokens in the form of a list of regular expressions, and from this input LEX generates a lexical analyzer. Its source file is a specification file consisting of a set of regular expressions together with their actions.
%{
%}
Definition Section
%%
Rules Section
%%
User Subroutines
I] Definition Section:
In this section, literal blocks, definitions, internal table declarations, start conditions, and translations are included.
We can also include C code verbatim by writing it within the special brackets shown in the diagram above, i.e. %{ %}; all code between those brackets is copied as-is into lex.yy.c. We can also declare regular expression definitions in this section, which we can then use in the Rules section.
Some regular expressions used by LEX, with their meanings, are listed below:
^ (inside brackets) matches any character except the ones within the brackets
\ escape character; makes the following metacharacter literal
II] Rules Section:
Each rule is a pattern together with an action. The pattern is a regular expression to be matched against the input stream. The action is typical C code stating what LEX should do after matching the pattern.
III] User Subroutines Section:
This section is for defining the other subroutines required by the lexical analyzer, like symbol table management. Hence it is also a typical C code section. The main() function is defined here, and it calls yylex(); yylex() is the scanning routine that LEX generates in lex.yy.c.
Block Diagram:
[Figure: LEX specification file -> lex -> lex.yy.c -> C compiler -> a.out]
Input: FirstLexProgram.l
Output: lex.yy.c
This command (lex FirstLexProgram.l) converts the lex specification given in FirstLexProgram.l into C code. The C code is stored in a fixed default destination file, lex.yy.c.
Input: lex.yy.c
Output: a.out
This command (e.g. cc lex.yy.c -o a.out) checks whether the lex.yy.c generated by the first step is syntactically correct according to C language syntax, and compiles it.
-o: redirects the output of compilation to the file named after it.
a.out: the file containing the output of compilation. a.out is the default name; we can store the result in any other file instead.
The final a.out is nothing but the lexical analyzer. If we provide an input stream to a.out, it will separate out the different tokens in the given input stream.
Built-in variables
1. yytext: the array variable which contains the text currently matched by the pattern.
Built-in functions
1. yylex(): the lexical analyzer produced by LEX is a C routine called yylex().
Built-in macros
a. input(): gets the next character from the input
b. unput(): puts a character back into the logical input stream
If two rules match the same length of input, LEX uses the rule that appears earlier in the specification. Here is a program that does nothing at all: all input is matched, but no action is associated with any pattern, so there will be no output.
%%
\n
The following example prepends line numbers to each line in a file. Some implementations of lex predefine and calculate yylineno. The input file for lex is yyin, which defaults to stdin.
Whitespace must separate the defining term and the associated expression. References to substitutions
in the rules section are surrounded by braces ({letter}) to distinguish them from literals. When we have a
match in the rules section, the associated C code is executed. Here is a scanner that counts the number of
characters, words, and lines in a file (similar to Unix wc).
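The scanner listing itself is not reproduced in this copy. As an illustration of the counting logic such a scanner performs, here is a sketch in C++ (all names are our own; an actual solution would be written as a LEX specification):

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Counts characters, words, and lines in an input stream, like the
// Unix wc utility. A word is a maximal run of characters that are not
// blanks, tabs, or newlines; a line ends at '\n'.
struct Counts { long chars = 0, words = 0, lines = 0; };

Counts wc(std::istream& in) {
    Counts c;
    bool in_word = false;
    char ch;
    while (in.get(ch)) {
        ++c.chars;
        if (ch == '\n') ++c.lines;
        if (ch == ' ' || ch == '\t' || ch == '\n') {
            in_word = false;        // whitespace ends the current word
        } else if (!in_word) {
            in_word = true;         // first character of a new word
            ++c.words;
        }
    }
    return c;
}
```

In a LEX solution the same three counters would be incremented by the actions attached to the word, newline, and catch-all patterns.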
Conclusion:
LEX is a tool which accepts regular expressions as input and generates C code to recognize the corresponding tokens. When a token is identified, LEX allows us to execute user-defined routines. Given an input specification file, LEX generates lex.yy.c as output; this file contains the function yylex(), generated by the LEX tool, which holds the C code to recognize each token and the action to be carried out when the token is found.
We also wrote a small LEX specification for recognizing the C type comments.
FAQs:
3. What is a parser?
Assignment No. 4
Aim:
Write an ambiguous CFG to implement Parser for sample Language using YACC and Lex.
Provide the details of all conflicting entries in the parser table generated by LEX and YACC and how
they have been resolved.
Objectives:
Theory:
Ambiguous grammars:
C and Java have an ambiguity in the grammar for expressions, which, hugely simplified, looks
something like this:
exp : exp '-' sub_exp
| sub_exp
;
sub_exp : '(' type_name ')' sub_exp
| '-' sub_exp
| id
| literal
| '(' exp ')'
;
type_name : id
| more_complex_type_descriptions
;
This allows expressions like: 1, a, a - 1, ( a ), ( a - 1 ), - a, ( int ) a
but what is meant by: ( b ) - ( c ) ?
The problem is that a single input string corresponds to more than one possible parse tree.
That is, it is a valid part of the language, but we don't know what it means for certain!
This is a genuine problem with Java and with C, that takes extra work by compiler-writers to
solve - every identifier has to be checked (e.g. by LEX) to see if it has already appeared in a class or
typedef declaration, in which case it is definitely a type_name; otherwise it is an ordinary id and can't
become a type_name. We would also need to modify the grammar slightly to make this distinction
clear.
Ambiguous grammars are, by definition, going to be difficult to handle no matter what tools
we use. The assumption made with languages designed for computers is that we do our best to make
them unambiguous. Therefore, we would normally expect any tools we use, like YACC, only to have
to handle unambiguous grammars. Given that, can they handle any unambiguous grammar?
Unfortunately, the answer is ``no'' - there are unambiguous grammars that tools like YACC
and JAVACC can't handle. Luckily, for most good tools, you are unlikely to come across such a
grammar, and if you do, you can usually modify the grammar to overcome the problems but still
recognize the same language.
Equally unfortunately, there is no way of deciding whether a grammar is ambiguous or not -
the best that can be done is to try to create a parser, but if the process fails it can't tell us whether this
is because the grammar is really ambiguous or if it is just because the grammar is too confusing for
the kind of parser we are trying to make.
How to confuse parsers:
The decision that a parser repeatedly makes is: given what it has already read of the input, and
the grammar rules it has already recognised, what grammar rule comes next? The more input the
parser can look at before it has to make a decision, the more likely it is to be able to avoid confusion
and get it right.
For example, suppose we look at languages where assignment is a particular kind of
statement, rather than an operation that can be embedded in any expression:
stat : target '=' exp ';'
| target '(' explist ')' ';'
;
target : id
| target '.' id
;
An LL(1) parser trying to compile this language would have difficulties distinguishing
between assignments (e.g. a=x;) and procedure calls i.e. functions/methods returning void (e.g. a(x);).
This is because an LL(1) parser has to decide which kind of statement it is looking at after seeing only
1 symbol (i.e. a), and it isn't until we see the = or ( that we can tell what is intended. Suppose we used
a more complex algorithm, such as LL(3) - even this couldn't decide between e.g. a.b=x and a.b(x). In
fact, no matter how far it looks ahead, an LL(n) parser, which looks ahead a fixed amount, can always
be confused by a sufficiently complicated target in an assignment or call.
There are two kinds of solutions - the parser can use a variable amount of lookahead, as
JAVACC can be asked to do, so it reads as far as the = or ( before making a decision - or we can
rewrite the grammar, by left-factorising it, so that the two kinds of statement are merged until we can
make the decision:
stat : target assign_or_call ';'
;
assign_or_call : '=' exp
| '(' explist ')'
;
An LR (1) parser has no difficulty dealing with the original grammar, as it will have read to
the end of the statement, and seen the = or (on the way, before it has to decide whether to recognize
an assignment or a call.
It is possible to construct unambiguous grammars that would confuse any LR(n) parser (as
well as any LL(n) parser) e.g. palindromes - strings that are their own mirror images, such as abba or
abacaba:
P:
| 'a' | 'b' | 'c' | . . .
| 'a' P 'a' | 'b' P 'b' | 'c' P 'c' | . . .
;
The problem is that, although it is perfectly obvious to us what to do - find the middle, and
work out to both ends - LR(n) and LL(n) read strictly left-to-right, and can only locate the middle of
the string by using their finite lookahead to find the end of the string. This could not work for strings
of length > n for LL(n), or length >2n for LR(n).
Confusing YACC:
Once an ambiguity has been pointed out in a grammar, it is usually clear enough to the user
what the problem is, even if it isn't obvious what to do about it. However, what kinds of error
messages are reported by tools like YACC, and how easy is it to find the corresponding ambiguity or
confusion?
YACC reports problems with grammars, whether ambiguous or just confusing, as
shift/reduce conflicts (where YACC can't decide whether to perform a shift or reduce - i.e. the
grammar rule is complete?) and/or as reduce/reduce conflicts (where YACC can't decide which
reduce to perform - i.e. which grammar rule is it?).
An example of a shift/reduce conflict:
The start of a function/method declaration in a C-like language, that accepts headers like void fred(int a, int b, float x, float z), looks something like this:
header : type_name id '(' params ')'
| type_name id '(' ')'
;
params : param
| params ',' param
;
param : type_name id
;
YACC has no problems with this grammar, but what if we modify it? It might be nice to be
able to write the example above simply as void fred(int a, b, float x, z). We could try rewriting the
grammar like this:
param : type_name ids
;
ids : id
| ids ',' id
;
But now, YACC reports a shift/reduce conflict, and the details from the y.output file are:
13: shift/reduce conflict (shift 15, reduce 5) on ','
state 13
param : type_name ids . (5)
ids : ids . ',' id (7)
That is, when the generated parser sees a , after a list of identifiers in a param, it doesn't know
whether that , (and the id it expects after) is part of the same param (in which case it should shift, to
include them as part of the RHS) or the start of the next param (in which case it should reduce this
RHS and start a new RHS).
This is not ambiguous, just confusing to YACC, as it needs more lookahead to see if the next
few symbols are e.g. , a b (a is a type_name, b is a parameter name of type a) or , a , or , a ) (a is a
parameter name of the current type). The way to make this clear to YACC is to rewrite the grammar
so that it can see more of the input before having to make a decision:
params : type_name id
| params ',' type_name id
| params ',' id
;
An example of a reduce/reduce conflict:
state 8
sub_exp : id . (5)
type_name : id . (8)
That is, when it sees id) it doesn't know whether the id is a variable giving a value or a type
name, so it doesn't know which rule to use to recognize the id.
Assuming we don't already know what the problem is, this hasn't helped much, but we can get
more information by working back through the states in the y.output file to try to find how we get
here. To do so, we need to look for states that include shift 8 or goto 8. In this example, all we find is:
state 4
sub_exp : '(' . type_name ')' sub_exp (3)
sub_exp : '(' . exp ')' (7)
...
id shift 8
So the input must include (id), which can be recognized either as a type-cast or as an
expression.
This is a big hint about the source of the ambiguity in the grammar, but more by luck than
anything else - YACC remains confused even if we make the grammar unambiguous, by removing
the rule sub_exp : '-' sub_exp. YACC still reports the same reduce/reduce conflict for this modified
grammar, as it is confused by an input as simple as ( a ) - it has to decide whether this is a value in an
expression or a type-cast before it reads past the ) to see e.g. ( a ) 99 (i.e. a type-cast) or ( a ) - 99 (i.e.
the value a - 99).
Luckily, the solution to the general problem of the ambiguity - to somehow get LEX to
distinguish between identifiers that are really type names (or class names) and all other identifiers -
also solves this confusion for YACC.
Epilogue:
Most of the time, an ambiguous grammar results from an error made by the implementers of a
programming language. Sometimes, however, it is the fault of the language designer. Many languages
are defined in such a way that some part is either inherently ambiguous or confusing (e.g. not LR(1)).
Does this matter? We should not limit language designers to what a particular type of parser generator
can cope with, but on the other hand there is no particular merit in making a language harder to
compile if a small change can simplify the problem.
An example of this is a well-known problem with conditional statements; the dangling else.
Most imperative languages permit conditional statements to take two slightly different forms:
if ( ... ) ...
if ( ... ) ... else ...
So the else d in if (a) if (b) c else d could be associated either with if (a) or with if (b).
Most languages attempt to fix this problem by stating that the second interpretation is more
natural, and so is correct, although some languages have different rules. Whatever the language
definition, it is an extra rule that anyone learning the language has to remember.
Similarly, the compiler writer has to deal with this special case: if we use a tool like YACC
we get a shift/reduce error - do we shift the else to get if (b) c else d, or do we reduce the if (b) c as it
stands, so we get if (a) ... else d? To overcome this problem, we can rewrite the grammar to explicitly say "you can't have an unmatched then (logically) immediately before an else - the then and the else must be paired up":
stat : matched
| unmatched
| . . .
;
matched : 'if' '(' exp ')' matched 'else' matched
| other_stat
;
unmatched : 'if' '(' exp ')' stat
| 'if' '(' exp ')' matched 'else' unmatched
;
Conclusion:
We have written an ambiguous CFG to recognize an infix expression and implemented a parser that recognizes the infix expression using YACC. We have also examined the details of all conflicting entries in the parser table generated by LEX and YACC and how they have been resolved.
Questions:
4. What is ambiguity?
Assignment No. 05
Aim:
Theory:
Semantic Actions:
Parsing tools use a generalization of CFGs in which each grammar symbol has one or more values, called attributes, associated with it. Each production of the grammar may have an associated "action", which can refer to and compute the values of attributes. So we have:
Terminals & non-terminals: have attributes
Productions: have semantic actions
Example:
E -> E' + E
| E'
E' -> int * E'
| int
For each symbol, let X.val be an integer value associated with X.
For terminal symbols, val is the lexeme provided by the lexical analyzer.
For non-terminals, val should be the integer value of the expression. This attribute is
computed from the attributes of sub-expressions.
Production              Action
E  -> E' + E1           E.val  = E'.val + E1.val
   |  E'                E.val  = E'.val
E' -> int * E1'         E'.val = int.val * E1'.val
   |  int               E'.val = int.val
Note: the attribute of some grammar symbols, such as the terminals + and *, is unused.
Example:
5*3+2*4
[Figure: parse tree for 5*3+2*4, annotated with its attribute equations, e.g. E1.val = E3'.val + E2.val at the root.]
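The bottom-up evaluation of these synthesized attributes can be illustrated with a small C++ recursive-descent evaluator for this grammar over single-digit integers. This is a sketch; the struct and function names are our own:

```cpp
#include <cassert>
#include <string>

// Bottom-up evaluation of the synthesized attribute val for
//   E  -> E' + E | E'        E.val  = E'.val + E.val
//   E' -> int * E' | int     E'.val = int.val * E'.val
// over single-digit integers, e.g. "5*3+2*4".
struct Evaluator {
    std::string s;
    std::size_t pos = 0;

    int parse_int() {                  // terminal: int.val is the digit's value
        return s[pos++] - '0';
    }
    int parse_eprime() {               // E' -> int * E' | int
        int v = parse_int();
        if (pos < s.size() && s[pos] == '*') {
            ++pos;
            v *= parse_eprime();       // E'.val = int.val * E1'.val
        }
        return v;                      // otherwise E'.val = int.val
    }
    int parse_e() {                    // E -> E' + E | E'
        int v = parse_eprime();
        if (pos < s.size() && s[pos] == '+') {
            ++pos;
            v += parse_e();            // E.val = E'.val + E1.val
        }
        return v;
    }
};

int eval(const std::string& expr) {
    Evaluator ev{expr};
    return ev.parse_e();
}
```

Each return value is the val attribute of the corresponding nonterminal, computed from the attributes of its sub-expressions, exactly as in the table above.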
- In practice, computing all of the attribute dependencies from the AST is rarely, if ever,
used. Instead, special cases of syntax-directed definitions are used where the attribute
evaluation order can be determined once and for all from the actions.
- The most important special case is S-attributed grammars: grammars with only
synthesized attributes. Building an AST is an example of an S-attributed grammar (i.e., PA3).
These attributes can be evaluated bottom-up during parsing.
Testing For Circularity:
- If an attribute grammar has a dependence cycle among attributes in some parse tree, then
the attribute grammar is said to be circular.
- Circular attribute grammars are considered meaningless---that is, erroneous.
- It is possible to check whether a given attribute grammar is circular.
Input:
Identifiers from the input in a symbol table and other relevant information about the identifiers
Output:
Instructions:
For the For Statement, if, if-else statement as per the syntax of C or Pascal and generate
equivalent three address code for the given input made up of constructs mentioned above using LEX
and YACC. Write a code to store the identifiers from the input in a symbol table and also to record
other relevant information about the identifiers from the input in a symbol table and also to records
stored in the symbol table.
Conclusion:
Questions:
GROUP B: ASSIGNMENTS
(any 6 assignments)
Assignment No: 07
Aim:
Objective:
Theory:
Code generation is the final phase of the compiler. Basically, code generation is the process of creating low-level (assembly language or machine) code from the three-address code generated by the intermediate code generation phase, or from the optimized three-address code produced by the code optimizer phase.
Symbol Table
Read the expression in the form operator, operand1, operand2, and generate code using the following algorithm.
Gen_Code(operator,operand1,operand2)
{
If(operand1.addressmode=R)
{
If(operator=+)
Generate(ADD operand2,R0);
else if(operator=-)
Generate(SUB operand2,R0);
else if(operator=*)
Generate(MUL operand2,R0);
else if(operator=/)
Generate(DIV operand2,R0);
}
else If(operand2.addressmode=R)
{
If(operator=+)
Generate(ADD operand1,R0);
else if(operator=-)
Generate(SUB operand1,R0);
else if(operator=*)
Generate(MUL operand1,R0);
else if(operator=/)
Generate(DIV operand1,R0);
}
else{
If(operator=+)
Generate(ADD operand2,R0);
else if(operator=-)
Generate(SUB operand2,R0);
else if(operator=*)
Generate(MUL operand2,R0);
else if(operator=/)
Generate(DIV operand2,R0);
}
}
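The dispatch in Gen_Code can be sketched in C++ as an illustrative emitter that produces instruction strings rather than a full code generator. The register convention R0 is taken from the pseudocode above; everything else (function names, the boolean flag) is our own simplification:

```cpp
#include <cassert>
#include <string>

// Maps a three-address operator to its target-instruction mnemonic.
std::string mnemonic(char op) {
    switch (op) {
        case '+': return "ADD";
        case '-': return "SUB";
        case '*': return "MUL";
        case '/': return "DIV";
        default:  return "???";
    }
}

// Sketch of the Gen_Code dispatch: when operand1 is already held in
// register R0, the emitted instruction only names operand2; otherwise
// operand1 is loaded into R0 first.
std::string gen_code(char op, const std::string& operand1,
                     bool operand1_in_r0, const std::string& operand2) {
    if (operand1_in_r0)
        return mnemonic(op) + " " + operand2 + ",R0";
    return "MOV " + operand1 + ",R0\n" + mnemonic(op) + " " + operand2 + ",R0";
}
```

For example, gen_code('+', "a", false, "b") yields the MOV a,R0 / ADD b,R0 sequence used for t1:=a+b in the worked example below.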
Example:
X:= (a+b)*(c-d)+((e/f)*(a+b))
t1:=a+b
t2:=c-d
t3:=e/f
t4:=t1*t2
t5:=t3*t1
t6:=t4+t5
Using the simple code generation algorithm, the following target code sequence is generated (with the register descriptor and address descriptor shown after each statement):

t1:=a+b     MOV a,R0 ; ADD b,R0     R0 contains t1     t1 in R0
t2:=c-d     MOV c,R1 ; SUB d,R1     R1 contains t2     t2 in R1
t3:=e/f     MOV e,R2 ; DIV f,R2     R2 contains t3     t3 in R2
t4:=t1*t2   MUL R0,R1               R1 contains t4     t4 in R1
t5:=t3*t1   MUL R2,R0               R0 contains t5     t5 in R0
t6:=t4+t5   ADD R1,R0               R0 contains t6     t6 in R0
Conclusion:
Thus, we have studied how to generate the target code for the optimized code.
Questions:
1. What is a compiler?
4. What is Ambiguity?
5. Explain the difference between target code and intermediate code.
Assignment No: 8
Aim: Write a LEX and YACC program to generate an abstract syntax tree.
Objective:
To understand working of Code Generation Phase of Compiler
Theory:
The purpose of this lab is to create and print an abstract syntax tree for a C program. The C program will
use only a small subset of the grammar.
As an example of a syntax tree, consider the statement tri_area = (base * height)/2;
The root node is an assignment operation. Its left subtree is a pointer to tri_area. Its right subtree represents the expression (base * height)/2. The tree looks like the tree in Figure 2.
In this display, each node is followed by its left subtree and then its right subtree, indented one tab stop. Notice that base and height are dereferenced, but tri_area isn't. That will be explained next.
Tree Nodes and the Tree Node Class
A tree node will be implemented by the Tree Node class. If a tree node is an interior node, then it will
contain an operator that acts on the left and right subtrees. The operator will have a mode, which will be
the data type involved in the operation. For example, if the mode of an assignment operator is INT, then
the operator will assign an int to an int. If a tree node is an exterior (leaf) node, then it will contain an
object, which will be an identifier or a number (and later a string). The mode of an exterior node will be the kind of object stored in that node. For example, if the object is an integer variable (l-value), then the mode will be a pointer to an INT.
If the object is an integer constant, then the mode will be INT. Open the file TreeNode.java. This file defines the TreeNode class whose objects have the following attributes: the operation (oper) represented by the node, the mode (mode) of the operation, a reference to the left subtree (left), a reference to the right subtree (right), the identifier (id) represented by the node, the number (num) represented by the node, and the string (str) represented by the node.
If the node is a binary interior node, then left and right will be non-null, and id, num, and str will be undefined. On the other hand, if the node is an exterior node, then left and right will be null, while exactly one of id, num, and str will be defined, depending on the kind of exterior node. From time to time, we will have unary interior nodes. They will always use the left subtree rather than the right subtree.
Note the types of the data members oper, mode, left, right, id, num, and
str. Also, one constructor
public TreeNode(IdEntry i)
and the toString() function have been defined. You will define three additional constructors. First, define the default constructor:
public TreeNode()
It should set oper, mode, and num to 0 and left, right, id, and str to null. Next, define the following constructor:
public TreeNode(int op, int m, TreeNode l, TreeNode r)
The purpose of this constructor is to join together two existing trees, with root nodes l and r, as the left and right subtrees of a new tree with this node as its root node. In the root node, the value of oper should be op and the value of mode should be m. Finally, define the constructor
public TreeNode(int n)
It will create a node that represents a number. The member oper should be Ops.NUM, mode
should be Ops.INT, and num should be the value of n. Write these constructors. We will use these
constructors later in this lab.
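The lab itself works in Java (TreeNode.java); as a compact sketch of the same three constructors, here is a C++ transcription. Ops::NUM and Ops::INT stand in for the lab's operator and mode codes, and their concrete values here are our own placeholders:

```cpp
#include <cassert>
#include <string>

// C++ transcription of the TreeNode class described above (the lab
// itself uses Java). Ops::NUM and Ops::INT stand in for the lab's
// operator and mode codes; their concrete values are placeholders.
namespace Ops {
    const int NUM = 100;   // "this node holds a number"
    const int INT = 1;     // integer mode
}

struct TreeNode {
    int oper = 0;                      // operation represented by the node
    int mode = 0;                      // data type involved in the operation
    int num = 0;                       // number represented by the node
    TreeNode* left = nullptr;          // left subtree
    TreeNode* right = nullptr;         // right subtree
    std::string id;                    // identifier represented by the node
    std::string str;                   // string represented by the node

    // Default constructor: oper, mode, and num zero; subtrees and names empty.
    TreeNode() = default;

    // Join two existing trees l and r as children of a new interior node.
    TreeNode(int op, int m, TreeNode* l, TreeNode* r)
        : oper(op), mode(m), left(l), right(r) {}

    // Exterior node representing the number n.
    explicit TreeNode(int n) : oper(Ops::NUM), mode(Ops::INT), num(n) {}
};
```

An interior node built with the second constructor owns references to its subtrees, mirroring the Java version's left/right fields.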
Yacc is a tool for building syntax analyzers, also known as parsers; yacc has been used to implement hundreds of languages. Its applications range from small desk calculators, to medium-sized preprocessors for typesetting, to large compiler front ends for complete programming languages.
A yacc specification is based on a collection of grammar rules that describe the syntax of a
language; yacc turns the specification into a syntax analyzer. A pure syntax analyzer merely checks
whether or not an input string conforms to the syntax of the language.
Algorithm:
Step 1: Start
Step 2: Write the declarations section, including the required header files (%{ #include <ctype.h> %})
Step 3: Declare the token digit
Step 4: Define the translation rules for line, expr, term, and factor:
line : expr '\n' { printf("%d\n", $1); }
expr : expr '+' term { $$ = $1 + $3; }
term : term '*' factor { $$ = $1 * $3; }
factor : '(' expr ')' { $$ = $2; }
%%
Step 5: Define the supporting C routines
Step 6: Stop
Conclusion:
FAQs
1. What is AST?
2. What is the need of AST?
3. Which phase of compiler generates AST?
4. What are the applications of AST in compiler?
Assignment No: 9
Objective:
To develop a recursive-descent parser for a given grammar.
53
Dr. D. Y. Patil College of Engg.,Ambi
CL-I B.E. Computer Engineering
A recursive descent parser is a kind of top-down parser built from a set of mutually-recursive
procedures (or a non-recursive equivalent) where each such procedure usually implements one of
the production rules of the grammar. Thus the structure of the resulting program closely mirrors that
of the grammar it recognizes.
This parser attempts to verify that the syntax of the input stream is correct as it is read from left to
right. A basic operation necessary for this involves reading characters from the input stream and
matching them with terminals from the grammar that describes the syntax of the input. Our recursive
descent parsers will look ahead one character and advance the input stream reading pointer when
proper matches occur. What a recursive descent parser actually does is to perform a depth-first
search of the derivation tree for the string being parsed. This provides the 'descent' portion of the
name. The 'recursive' portion comes from the parser's form, a collection of recursive procedures.
As our first example, consider the simple grammar

E -> x + T
T -> (E)
T -> x

and the derivation tree in figure 2 for the expression x+(x+x)
A recursive descent parser traverses the tree by first calling a procedure to recognize an E. This
procedure reads an 'x' and a '+' and then calls a procedure to recognize a T. This would look like the
following routine.
Procedure E()
Begin
    If (input_symbol = 'x') then
        next();
    If (input_symbol = '+') then
    Begin
        next();
        T();
    End
    Else
        Errorhandler();
End
Note that the 'next' looks ahead and always provides the next character that will be read from the
input stream. This feature is essential if we wish our parsers to be able to predict what is due to
arrive as input. Note that 'errorhandler' is a procedure that notifies the user that a syntax error has
been made and then possibly terminates execution.
In order to recognize a T, the parser must figure out which of the productions to execute. This is not
difficult and is done in the procedure that appears below.
Procedure T()
Begin
    If (input_symbol = '(') then
    Begin
        next();
        E();
        If (input_symbol = ')') then
            next();
        Else
            Errorhandler();
    End
    Else If (input_symbol = 'x') then
        next();
    Else
        Errorhandler();
End
In the above routine, the parser determines whether T has the form (E) or x. If not, the error
routine is called; otherwise the appropriate terminals and nonterminals are recognized.
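The two procedures above can be combined into a small runnable parser. The sketch below is one possible Java rendering of the pseudocode for the grammar E -> x+T, T -> (E) | x; the class and method names are our own, and instead of a separate error handler it throws an exception, so parse() succeeds only if the whole input is consumed without error.

```java
// Recursive descent parser for: E -> x + T ; T -> (E) | x
class RDParser {
    private final String input;
    private int pos = 0;           // reading pointer into the input stream

    RDParser(String input) { this.input = input; }

    // One-character lookahead; '\0' signals end of input.
    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    // Match an expected terminal and advance, or report a syntax error.
    private void expect(char c) {
        if (peek() == c) pos++;
        else throw new RuntimeException("syntax error at position " + pos);
    }

    // E -> x + T
    private void E() {
        expect('x');
        expect('+');
        T();
    }

    // T -> (E) | x   (the lookahead decides which production applies)
    private void T() {
        if (peek() == '(') {
            expect('(');
            E();
            expect(')');
        } else {
            expect('x');
        }
    }

    // Input is valid if E is recognized and the input is scanned completely.
    boolean parse() {
        try {
            E();
            return pos == input.length();
        } catch (RuntimeException e) {
            return false;
        }
    }
}
```

For example, new RDParser("x+(x+x)").parse() follows the same depth-first traversal described above: E reads 'x' and '+', then T sees '(' and recursively recognizes an inner E.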
Algorithm:
1. Make grammar suitable for parsing i.e. remove left recursion (if required).
2. Write a function for each production with error handler.
3. Given input is said to be valid if input is scanned completely and no error function is called.
Conclusion:
FAQs:
1. What do you mean by Recursive Descent Parsing?
2. What are the applications of a recursive descent parser?
3. What are the advantages of a recursive descent parser?
GROUP C: ASSIGNMENTS
(Any one)
Assignment No: 13
Title: Generate Huffman codes for a grayscale 8-bit image.
Prerequisites:
Knowledge of Huffman codes.
Objectives:
To generate Huffman codes for a grayscale 8-bit image.
Theory:
Huffman coding is an algorithm developed by David A. Huffman while he was a Ph.D. student at MIT,
and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source
symbol (such as a character in a file). The algorithm derives this table from the estimated probability or
frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy
encoding methods, more common symbols are generally represented using fewer bits than less common
symbols. Huffman's method can be implemented efficiently, finding a code in time linear in the number of
input weights if these weights are sorted. However, although optimal among methods encoding symbols
separately, Huffman coding is not always optimal among all compression methods.
The beauty of Huffman codes is that variable length codes can achieve a higher data density than fixed
length codes if the characters differ in frequency of occurrence. The length of the encoded character is
inversely proportional to that character's frequency. Huffman wasn't the first to discover this, but his
paper presented the optimal algorithm for assigning these codes. Huffman codes are similar to
Morse code, which uses the fewest dots and dashes for the most frequently occurring letters. An E is
represented with one dot. A T is represented with one dash. Q, a letter occurring less frequently, is
represented with dash-dash-dot-dash.
Huffman codes are created by analyzing the data set and assigning short bit streams to the values occurring
most frequently. The algorithm attempts to create codes that minimize the average number of bits per
character. Table 13.1 shows an example of the frequency of letters in some text and their corresponding
Huffman codes. To keep the table manageable, only letters were used. It is well known that
in English text, the space character is the most frequently occurring character.
As expected, E and T had the highest frequency and the shortest Huffman codes. Encoding with these
codes is simple. Encoding the word toupee is just a matter of stringing together the appropriate
bit strings, as follows:
T O U P E E
One ASCII character requires 8 bits. The original 48 bits of data have been coded with 23 bits,
achieving a compression ratio of 2.08.
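The code-construction step itself can be sketched compactly: repeatedly merge the two lowest-weight nodes into one, then read each symbol's code off the resulting tree (0 for left, 1 for right). The sketch below works the same whether the weights are letter frequencies, as in the table, or the histogram of an 8-bit grayscale image; the class and method names are our own.

```java
import java.util.*;

class Huffman {
    static class Node {
        int symbol; long weight; Node left, right;
        Node(int s, long w) { symbol = s; weight = w; }
        Node(Node l, Node r) { symbol = -1; weight = l.weight + r.weight; left = l; right = r; }
        boolean isLeaf() { return left == null; }
    }

    // weights maps symbol -> frequency; returns symbol -> code bit string.
    static Map<Integer, String> buildCodes(Map<Integer, Long> weights) {
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingLong((Node n) -> n.weight));
        for (Map.Entry<Integer, Long> e : weights.entrySet())
            pq.add(new Node(e.getKey(), e.getValue()));
        // Merge the two lowest-weight trees until one tree remains.
        while (pq.size() > 1)
            pq.add(new Node(pq.poll(), pq.poll()));
        Map<Integer, String> codes = new HashMap<>();
        assign(pq.poll(), "", codes);
        return codes;
    }

    // Walk the tree, appending 0 for left edges and 1 for right edges.
    private static void assign(Node n, String prefix, Map<Integer, String> codes) {
        if (n == null) return;
        if (n.isLeaf()) { codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix); return; }
        assign(n.left, prefix + "0", codes);
        assign(n.right, prefix + "1", codes);
    }
}
```

As the theory above says, the more frequent a symbol, the shorter its code; the most frequent symbol always ends up merged last and therefore nearest the root.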
Letter  Frequency (%)  Huffman code
K 0.43 1100011
L 3.79 00101
M 3.06 10100
N 6.81 0110
O 7.59 0100
P 2.58 10110
Q 0.14 1100010000
R 6.67 0111
S 7.64 0011
T 8.37 111
U 2.43 10111
V 0.97 0101001
W 1.07 0101000
X 0.29 11000101
Y 1.46 010101
Z 0.09 1100010001
Modified Huffman coding is used in fax machines to encode black on white images (bitmaps). It is also
an option to compress images in the TIFF file format. It combines the variable length codes of Huffman
coding with the coding of repetitive data in run length encoding. Since facsimile transmissions are
typically black text or writing on white background, only one bit is required to represent each pixel or
sample. These samples are referred to as white bits and black bits. The runs of white bits and black bits
are counted, and the counts are sent as variable length bit streams.
The encoding scheme is fairly simple. Each line is coded as a series of alternating runs of white and
black bits. Runs of 63 or less are coded with a terminating code. Runs of 64 or greater require that a
makeup code prefix the terminating code. The makeup codes are used to describe runs in multiples of
64 from 64 to 2560. This deviates from the normal Huffman scheme which would normally require
encoding all 2560 possibilities. This reduces the size of the Huffman code tree and accounts for the
term modified in the name.
Studies have shown that most facsimiles are 85 percent white, so the Huffman codes have been
optimized for long runs of white and short runs of black. The protocol also assumes that the line begins
with a run of white bits. If it doesn't, a run of white bits of 0 length must begin the encoded line. The
encoding then alternates between black bits and white bits to the end of the line. Each scan line ends
with a special EOL (end of line) character consisting of eleven zeros and a 1 (000000000001). The
EOL character doubles as an error recovery code. Since there is no other combination of codes that has
more than seven zeroes in succession, a decoder seeing eight will recognize the end of line and
continue scanning for a 1. Upon receiving the 1, it will then start a new line. If bits in a scan line get
corrupted, the most that will be lost is the rest of the line. If the EOL code gets corrupted, the most that
will get lost is the next line.
Tables 13.2 and 13.3 show the terminating and makeup codes. Figure 13.1 shows how to encode a
1275 pixel scanline with 53 bits.
Run Length  White bits  Black bits    Run Length  White bits  Black bits
0   00110101  0000110111    32  00011011  000001101010
1   000111    010           33  00010010  000001101011
Run Length  White bits (makeup)  Black bits (makeup)
64 11011 0000001111
128 10010 000011001000
192 010111 000011001001
256 0110111 000001011011
320 00110110 000000110011
384 00110111 000000110100
448 01100100 000000110101
512 01100101 0000001101100
576 01101000 0000001101101
640 01100111 0000001001010
704 011001100 0000001001011
768 011001101 0000001001100
832 011010010 0000001001101
896 011010011 0000001110010
960 011010100 0000001110011
1024 011010101 0000001110100
1088 011010110 0000001110101
1152 011010111 0000001110110
1216 011011000 0000001110111
1280 011011001 0000001010010
1344 011011010 0000001010011
1408 011011011 0000001010100
1472 010011000 0000001010101
1536 010011001 0000001011010
1600 010011010 0000001011011
Run   Color  Code words
0     white  00110101
1     black  010
4     white  1011
2     black  11
1     white  0111
1     black  010
1266  white  011011000 + 01010011
EOL          000000000001
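The 1266-pixel run in the figure illustrates the makeup/terminating split described above: 1266 = 1216 + 50, so a makeup code for the largest multiple of 64 not exceeding the run is followed by the terminating code for the remainder. A minimal sketch, with a class name of our own and only the two white-code table entries needed for this example (a real encoder carries the full T.4 tables):

```java
import java.util.Map;

class MHEncoder {
    // Tiny excerpts of the white-run code tables, enough for the figure's example.
    static final Map<Integer, String> WHITE_MAKEUP = Map.of(1216, "011011000");
    static final Map<Integer, String> WHITE_TERM = Map.of(50, "01010011", 0, "00110101");

    // Split a run into (makeup multiple of 64, terminating remainder 0..63).
    static int[] splitRun(int runLength) {
        int makeup = (runLength / 64) * 64;   // 0 if the run fits a terminating code alone
        return new int[] { makeup, runLength - makeup };
    }

    // Emit the bits for a white run: optional makeup code, then terminating code.
    static String encodeWhiteRun(int runLength) {
        int[] parts = splitRun(runLength);
        String bits = parts[0] > 0 ? WHITE_MAKEUP.get(parts[0]) : "";
        return bits + WHITE_TERM.get(parts[1]);
    }
}
```

For the figure's run, encodeWhiteRun(1266) produces the 17-bit string 011011000 01010011, matching the scanline encoding shown above.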
Conclusion:
Generation of Huffman codes for a gray scale 8 bit image is studied.