
DEPARTMENT OF COMPUTER

ENGINEERING

B.E COMP. ENGG. SEM: I

LAB MANUAL
FOR

410446: Computer Laboratory-I


Teaching Scheme: Practicals: 4 Hrs/Week
Examination Scheme: Oral Assessment: 50, Practical Assessment: 50
List of Practicals

Group A

1 Using Divide and Conquer Strategies design a function for Binary Search using
C++/ Java/ Python/Scala.

2 Using Divide and Conquer Strategies design a class for Concurrent Quick Sort
using C++.

3 Assignment to understand the syntax of LEX specifications, built-in functions and
variables. (Lexical analyzer for sample language using LEX).

4 Write an ambiguous CFG to implement a Parser for a sample language using
YACC and LEX. Provide the details of all conflicting entries in the parser
table generated by LEX and YACC and how they have been resolved.

5 Write an attributed translation grammar to recognize declarations of simple
variables, arithmetic expressions, and for, if, and if-else statements as per the
syntax of C, and generate three address code for the given input.
6 A company has three offices at remote locations with a requirement of
interoperability with remote services. Each office has a server, TCP/IP and
different users including administrator, privileged users and common clients.
Design a network model for the same. Demonstrate the network model using

7 To generate the target code for the optimized code in assignment.


8 Write a LEX and YACC program to generate abstract syntax tree.

9 Write a program to generate Recursive Descent Parser.

10 Write a program in Python to calculate end-to-end packet delay for Ethernet,
802.11 and 802.15.4 and compare the results. End-to-end packet delay should
include processing delay, queuing delay, transmission delay and propagation
delay.
11 For wireless routing, design and compare the distributed Bellman-Ford
algorithm and Dijkstra's algorithm; use FOSS Eclipse with C++/ Java/ Python/
Scala for programming.
12 The class rooms and laboratories are connected through a distributed network
having n nodes with security cameras (IP-based) along with other sensors such
as thumb-mark attendance readers. Design a network for your college for
security management and attendance management. The departments are
connected in a bipartite graph and Heads are connected to the administrative
offices of the college. Design the network and test its efficient data handling
by different entities. Develop a model to demonstrate Dijkstra's algorithm for
sampling the data. Use Python and NS3.
Group C

13 Generate Huffman codes for a gray-scale 8-bit image.
GROUP A: ASSIGNMENTS
(Mandatory Six Assignments)
Assignment No: 01
Title: Using Divide and Conquer Strategies design a function for Binary Search using C++/
Java/ Python/Scala.
Aim: Implementation of Binary Search algorithm using C++/ Java/ Python/Scala.

Prerequisites:
Knowledge of writing programs in C++.
Objectives:
To learn the concept of Divide and Conquer Strategy.
To study the design and implementation of Binary Search algorithm.
Theory:
Divide and Conquer strategy:
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-
problems of the same (or related) type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original problem.
This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g.,
quicksort, merge sort), multiplying large numbers, syntactic analysis (e.g., top-down parsers) and
computing the discrete Fourier transform (FFTs).

Searching
Sequential Algorithm
function sequential (T [1..n], x)
{ sequential search for x in array T }
  for i ← 1 to n do
    if T[i] ≥ x then return i
  return n + 1

This algorithm clearly takes a time in Θ(r), where r is the index returned: this is O(n) in the worst case and
Θ(1) in the best case. If we assume that all the elements of T are distinct, that x is indeed somewhere in the
array, and that each position is equally likely, the average time taken is in Θ(n).
CL-I B.E. Computer Engineering

Binary Search
The binary search algorithm begins by comparing the target value to value of the middle element of the
sorted array. If the target value is equal to the middle element's value, the position is returned. If the
target value is smaller, the search continues on the lower half of the array, or if the target value is
larger, the search continues on the upper half of the array. This process continues until the element is
found and its position is returned, or there are no more elements left to search for in the array and a
"not found" indicator is returned.
Binary search can be applied to sorted list only. It searches sorted lists using a divide and conquer
technique. On each iteration the search domain is cut in half, until the result is found. The
computational complexity of a binary search is O(log n).

Binary Search Algorithm


function binsearch (T [1..n], x)
{ binary search for x in array T [1..n] }
  if n = 0 or x > T[n] then return n + 1
  else return binrec (T [1..n], x)

function binrec (T [i..j], x)
{ binary search for x in subarray T [i..j] }
  if i = j then return i
  k ← ⌊(i + j) / 2⌋
  if x ≤ T[k] then return binrec (T [i..k], x)
  else return binrec (T [k+1..j], x)
Binary searching is the algorithm used to look up a word in a dictionary or a name in a telephone
directory. It is probably the simplest application of divide-and-conquer. It can be applied to a sorted list
only.
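The algorithm above can be sketched in C++ as follows. This is a minimal 0-based translation of binsearch/binrec; the function names follow the pseudocode, but the use of std::vector and the 0-based index convention are illustrative choices:

```cpp
#include <vector>

// binrec(T[i..j], x): binary search for x in the subarray t[i..j].
// Returns the position where x is, or where it would belong.
int binrec(const std::vector<int>& t, int i, int j, int x) {
    if (i == j) return i;           // one element left: this is the spot
    int k = (i + j) / 2;            // middle of the current range
    if (x <= t[k])
        return binrec(t, i, k, x);  // search the lower half
    return binrec(t, k + 1, j, x);  // search the upper half
}

// binsearch(T[1..n], x): returns an index in 0..n; the value n plays
// the role of the "n+1" (past the end / not found) result above.
int binsearch(const std::vector<int>& t, int x) {
    int n = static_cast<int>(t.size());
    if (n == 0 || x > t[n - 1]) return n;
    return binrec(t, 0, n - 1, x);
}
```

For the sorted array {1, 3, 5, 7, 9}, binsearch returns 2 for x = 5 and 5 (i.e. n) for x = 10; a caller then checks t[result] == x to turn this into a "found / not found" answer.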

Dr. D. Y. Patil College of Engg., Ambi

Flowchart for Implementation of Divide and Conquer

Conclusion:

The concept of divide and conquer strategy is studied and binary search algorithm is implemented
using C++.

FAQs:
1) What is Divide and Conquer approach? Also explain its advantages.

2) What is Time Complexity of algorithm? Explain the different time complexities.

3) Explain the need of analysis of algorithm with respect to complexities as well as techniques
used for analysis.

4) Compute time complexity and space complexity of your program. Also give the proper
justification for same.

5) Compare the conventional Binary Search algorithm and the Divide and Conquer Binary Search
algorithm. Also explain the advantages of Divide and Conquer approach in terms of quick sort.

6) Compare Divide and Conquer, Concurrent programming, Backtracking, and Branch and
Bound approaches.

Assignment No: 02

Title: Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using
C++.

Aim: Implementation of Concurrent Quick Sort algorithm using C++.

Prerequisites:
Knowledge of writing programs in C++.
Objectives:
To learn the concept of Divide and Conquer Strategy.
To study the design and implementation of Quick Sort algorithm.
Theory:
Divide and Conquer strategy:
A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-
problems of the same (or related) type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original problem.
This technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g.,
quicksort, merge sort), multiplying large numbers, syntactic analysis (e.g., top-down parsers) and
computing the discrete Fourier transform (FFTs).

Sorting

Quick Sort

The sorting algorithm invented by Hoare, usually known as "quicksort", is also based on the idea of
divide-and-conquer. As a first step, this algorithm chooses one of the items in the array to be sorted as
the pivot. The array is then partitioned on either side of the pivot, elements are moved in such a way
that those greater than the pivot are placed on its right, whereas all the others are moved to its left. If
now the two sections of the array on either side of the pivot are sorted independently by recursive calls
of the algorithm, the final result is a completely sorted array, no subsequent merge step being
necessary. To balance the sizes of the two sub instances to be sorted, we would like to use the median
element as the pivot. Finding the median takes more time than it is worth. For this reason we simply
use the first element of the array as the pivot. The quick sort algorithm is given below.
procedure quicksort (T [i..j])
{ sorts array T [i..j] into increasing order }
  if j − i is small then insert (T [i..j])
  else
    pivot (T [i..j], l)
    quicksort (T [i..l−1])
    quicksort (T [l+1..j])
Let p = T[i] be the pivot. One good way of pivoting consists of scanning the array T[i..j] just once,
but starting at both ends. Pointers k and l are initialized to i and j + 1, respectively. Pointer k is then
incremented until T[k] > p, and pointer l is decremented until T[l] ≤ p. Now T[k] and T[l] are
interchanged. This process continues as long as k < l. Finally, T[i] and T[l] are interchanged to put the
pivot in its correct position.
procedure pivot (T [i..j]; var l)
{ permutes the elements in array T [i..j] in such a way that, at the
end, i ≤ l ≤ j, the elements of T [i..l−1] are not greater than p,
T[l] = p, and the elements of T [l+1..j] are greater than
p, where p is the initial value of T [i] }

  p ← T[i]
  k ← i; l ← j + 1
  repeat k ← k + 1 until T[k] > p or k ≥ j
  repeat l ← l − 1 until T[l] ≤ p
  while k < l do
    interchange T[k] and T[l]
    repeat k ← k + 1 until T[k] > p
    repeat l ← l − 1 until T[l] ≤ p
  interchange T[i] and T[l]

Quicksort is a recursive, comparison-based sorting algorithm. It selects a pivot from the list and finds
the position in the list where the pivot should be placed, so that (i) the keys smaller than the pivot lie
on the low side of the pivot, and (ii) the keys larger than or equal to the pivot lie on the high side. The
same procedure is then applied recursively to these two parts.


The average time complexity of Quick sort is O(n log n). The worst-case time complexity is O(n²).
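As a sketch of how the "concurrent" part of this assignment can be realized in C++, the two recursive calls can be run as separate tasks with std::async once a partition is large enough to be worth a thread. The class name, the cutoff value, and the choice of std::async (rather than, say, a thread pool or OpenMP) are illustrative assumptions, not the only possible design:

```cpp
#include <algorithm>
#include <future>
#include <vector>

class ConcurrentQuickSort {
public:
    static void sort(std::vector<int>& t) {
        if (!t.empty()) qsort(t, 0, static_cast<int>(t.size()) - 1);
    }

private:
    static constexpr int kCutoff = 1000;  // below this, recurse sequentially

    // Pivoting as in the procedure above: p = t[i] is the pivot; k scans
    // up from the left, l scans down from the right, out-of-place pairs
    // are interchanged, and finally the pivot is placed at position l.
    static int pivot(std::vector<int>& t, int i, int j) {
        int p = t[i], k = i, l = j + 1;
        do { ++k; } while (k < j && t[k] <= p);
        do { --l; } while (t[l] > p);
        while (k < l) {
            std::swap(t[k], t[l]);
            do { ++k; } while (t[k] <= p);
            do { --l; } while (t[l] > p);
        }
        std::swap(t[i], t[l]);
        return l;
    }

    static void qsort(std::vector<int>& t, int i, int j) {
        if (i >= j) return;               // 0 or 1 elements: already sorted
        int l = pivot(t, i, j);
        if (j - i > kCutoff) {
            // Sort the low side in a separate task and the high side here;
            // the two tasks touch disjoint ranges, so no locking is needed.
            auto low = std::async(std::launch::async,
                                  [&t, i, l] { qsort(t, i, l - 1); });
            qsort(t, l + 1, j);
            low.get();                    // join before returning
        } else {
            qsort(t, i, l - 1);
            qsort(t, l + 1, j);
        }
    }
};
```

Note that with T[i] as the pivot, already-sorted input still degenerates to O(n²); choosing a random pivot avoids this in practice.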

Flow Chart for Quick Sort using Divide and Conquer Approach.


Conclusion:
The concept of divide and conquer strategy is studied and Concurrent Quick Sort algorithm is
implemented using C++.

FAQs

1) Explain the need of Divide and Conquer approach for Quick Sort.

2) What is advantage of Divide and Conquer Technique over the recursion?

3) Compare the conventional Quick Sort algorithm with Quick sort using Divide and Conquer .

4) When does the worst case of Quick Sort occur?

5) What are the advantages and disadvantages of quick sort?

6) What is the complexity of quick sort?

Assignment No: 3

Aim:

Assignment to understand the syntax of LEX specifications, built-in functions and variables. (Lexical
analyzer for sample language using LEX)

Objective:

1. To understand how to construct a compiler using LEX and YACC. LEX and YACC are tools used to
generate lexical analyzers and parsers.
2. To understand the application of data structures such as linked-lists and trees.
3. To understand LEX programming.
4. To understand the rules, i.e., LEX specifications, built-in functions and variables.

What is LEX?
It is a tool for generating a Lexical Analyzer. It takes a specification of tokens in the form of a
list of regular expressions. From this input, LEX generates a lexical analyzer. Its source file is a
specification file consisting of a set of regular expressions together with actions.

LEX Specification Format ( . l File Format):-


This file has three sections as given below

%{

<C global variables, prototypes and comments >

%}

Definition Section

%%

Rules Section

%%

User Subroutines

Fig. 1.1: LEX Specification Structure

Let us discuss each section in brief:

I] Definition Section:

In this section literal block, definitions, internal table declaration, start conditions and
translations are included.

We can also use C code here by writing it inside the special brackets %{ %} shown in the diagram above;
all code between those brackets is copied verbatim into lex.yy.c. We can also declare regular expression
definitions in this section, which we can then use in the rules section.

The regular expressions used by LEX are listed below with their meanings:

Regular expression: A Regular Expression is a pattern description using a meta-language, a
language that you use to describe particular patterns of interest.

Regular Expressions used by LEX:

.    Matches any single character except \n
*    Matches zero or more copies of the preceding expression
[]   Character class that matches any one character within the brackets.
     Within the brackets, - indicates a range, e.g. [a-z] means any character from a to z,
     and ^ as the first character means match any character except those within the brackets.
^    Matches the beginning of a line as the first character of a regular expression
$    Matches the end of a line as the last character of a regular expression
{}   Indicates how many times the previous pattern is allowed to match,
     e.g. A{1,3} matches one to three occurrences of the letter A
\    Escape character
+    One or more occurrences
?    Zero or one occurrence
|    Matches either the preceding or the following regular expression
/    Matches the preceding regular expression, but only if followed by the following
     regular expression
()   Groups a series of regular expressions into a new regular expression

II] Rules Section:

Rule section is list of rules as follows

< Pattern > {Action}

<Pattern > {Action}

< Pattern > {Action}

<pattern> :- the rule matched against a token. It is simply the regular
expression for that particular token.


{Action} :- the action taken by LEX when the corresponding pattern is matched in the
input stream. The action is typical C code stating what LEX should do after matching
the pattern.

III] User Subroutines:

This section is for defining the other subroutines required for the Lexical analyzer like symbol
table management etc.

Hence it is also a typical C code section. The main() function is defined here, and it calls yylex().
yylex() is the scanning routine that LEX generates in lex.yy.c.

Block Diagram:

LEX Specification (.l file)  -->  [LEX Compiler]  -->  lex.yy.c (C routine for the .l file)

lex.yy.c  -->  [CC]  -->  Output File a.out

Input Stream  -->  [a.out]  -->  List of Tokens

Fig. 1.2: Block Diagram

How to create and Execute LEX Program?


As shown in block diagram there are three steps for execution

Suppose we have written the LEX specification in a file named FirstLexProgram.l.

Step 1. First compile Lex specification file using LEX Compiler.

Command : $> lex FirstLexProgram.l

Input : FirstLexProgram.l

Output : lex.yy.c

This command will convert lex specification given in FirstLexProgram.l into C code. There is fixed
destination or the default file to store this C code and that is lex.yy.c.

Step 2. Compile the lex.yy.c using C compiler.

Command : $> cc lex.yy.c -o a.out -ll

Input : lex.yy.c

Output : a.out

This command compiles the lex.yy.c generated in the first step, checking that it is syntactically
correct according to C language syntax.

-o : redirects the output, i.e. stores the result of compilation in the file named after it.

a.out: the file containing the output of compilation. a.out is the default name; we can name a
different file to store the result instead.

-ll : links the LEX library at compile time.

Step 3. Token Generation.

Command : $> a.out

Input : Input Stream

Output : List Of Tokens


The final a.out is nothing but the lexical analyzer. If we provide an input stream to a.out, it separates
out the different tokens in the given input stream.

Built-in Variables
1. yytext: the array variable which contains the text matched by the current pattern.
2. yylval: holds the value associated with the current token (e.g. the matched number).
3. yyleng: holds the length of the string recognized by the lexer.
4. yyin: the input file handle; by default it is stdin.

Built-in Functions
1. yylex(): the lexical analyzer produced by LEX is a C routine called yylex().
2. yywrap(): returns 1 if there is no more input to read, or 0 if scanning should continue
(for example, after yyin has been switched to a new file).

Built-in Macros
1. input(): gets the next character from the input.
2. unput(): puts a character back into the logical input stream.

Disambiguating Rules for LEX

1. LEX patterns only match a given input character or string once.
2. LEX executes the action for the longest possible match for the current input.
3. If two rules match the same length of input, LEX uses the rule listed earlier in
the specification.

Here is a program that does nothing at all. All input is matched, but no action is associated with
any pattern, so there will be no output.

%%
.|\n    ;

The following example prepends line numbers to each line in a file. Some implementations of lex
predefine and calculate yylineno. The input file for lex is yyin, and defaults to stdin.

Whitespace must separate the defining term and the associated expression. References to substitutions
in the rules section are surrounded by braces ({letter}) to distinguish them from literals. When we have a
match in the rules section, the associated C code is executed. Here is a scanner that counts the number of
characters, words, and lines in a file (similar to Unix wc).
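A sketch of such a scanner is shown below, along the lines of the classic example from the lex & yacc book; the counter names are illustrative choices:

```lex
%{
#include <stdio.h>
/* counters updated by the rules below */
int chars = 0, words = 0, lines = 0;
%}
%%
[^ \t\n]+   { words++; chars += yyleng; }
\n          { chars++; lines++; }
.           { chars++; }
%%
int yywrap(void) { return 1; }  /* no more input files */
int main(void)
{
    yylex();
    printf("%d characters, %d words, %d lines\n", chars, words, lines);
    return 0;
}
```

It is built and run with the same three steps as before: lex wc.l, then cc lex.yy.c -o a.out -ll, then a.out with the file to be counted as its input stream.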

Flow Chart for Lexical Analysis

Flow Chart for Execution Process


Conclusion:


LEX is a tool which accepts regular expressions as input and generates C code to recognize the
corresponding tokens. When a token is identified, LEX allows us to execute user-defined routines
associated with it.

When we give the specification file to LEX, it generates lex.yy.c as output, which contains the
function yylex(); this function holds the C code to recognize each token and the action to be carried
out when the token is found.

We also wrote a small LEX specification for recognizing the C type comments.

FAQs:

1. What are tokens?

2. What is the Lexical Analysis?

3. What is a parser?

4. Explain the working of lexical analyzer LEX?

5. Define the following terms: a) lexemes b) tokens c) pattern

6. What are the parse trees?

7. What are two parts of compilation?

8. What is the role of finite state automata in compiler construction?

9. Explain the term bootstrapping.

10. Enlist the various lexical analysis tools.


Assignment No. 4

Aim:

Write an ambiguous CFG to implement Parser for sample Language using YACC and Lex.
Provide the details of all conflicting entries in the parser table generated by LEX and YACC and how
they have been resolved.

Objectives:

1. To understand the Ambiguous and unambiguous grammar.


2. To understand effect of ambiguity on parsing i.e. its consequences.
3. To build a parser table by automating its function through program.
4. To study Top down and Bottom up parser in details.

Theory:

Ambiguous and confusing grammars:

Ambiguous grammars:

C and Java have an ambiguity in the grammar for expressions, which, hugely simplified, looks
something like this:
exp : exp '-' sub_exp
| sub_exp
;
sub_exp : '(' type_name ')' sub_exp
| '-' sub_exp


| id
| literal
| '(' exp ')'
;
type_name : id
| more_complex_type_descriptions
;
This allows expressions like: 1, a, a - 1, ( a ), ( a - 1 ), - a, ( int ) a
but what is meant by: ( b ) - ( c ) ?
The problem is that a single input string corresponds to more than one possible parse tree.
That is, it is a valid part of the language, but we don't know what it means for certain!
This is a genuine problem with Java and with C, which takes extra work by compiler-writers to
solve - every identifier has to be checked (e.g. by LEX) to see if it has already appeared in a class or
typedef declaration, in which case it is definitely a type_name; otherwise it is an ordinary id and can't
become a type_name. We would also need to modify the grammar slightly to make this distinction
clear.
Ambiguous grammars are, by definition, going to be difficult to handle no matter what tools
we use. The assumption made with languages designed for computers is that we do our best to make
them unambiguous. Therefore, we would normally expect any tools we use, like YACC, only to have
to handle unambiguous grammars. Given that, can they handle any unambiguous grammar?
Unfortunately, the answer is "no" - there are unambiguous grammars that tools like YACC
and JAVACC can't handle. Luckily, for most good tools, you are unlikely to come across such a
grammar, and if you do, you can usually modify the grammar to overcome the problems but still
recognize the same language.
Equally unfortunately, there is no way of deciding whether a grammar is ambiguous or not -
the best that can be done is to try to create a parser, but if the process fails it can't tell us whether this
is because the grammar is really ambiguous or if it is just because the grammar is too confusing for
the kind of parser we are trying to make.
How to confuse parsers:
The decision that a parser repeatedly makes is: given what it has already read of the input, and
the grammar rules it has already recognised, what grammar rule comes next? The more input the

parser can look at before it has to make a decision, the more likely it is to be able to avoid confusion
and get it right.
For example, suppose we look at languages where assignment is a particular kind of
statement, rather than an operation that can be embedded in any expression:
stat : target '=' exp ';'
| target '(' explist ')' ';'
;
target : id
| target '.' id
;
An LL(1) parser trying to compile this language would have difficulties distinguishing
between assignments (e.g. a=x;) and procedure calls i.e. functions/methods returning void (e.g. a(x);).
This is because an LL(1) parser has to decide which kind of statement it is looking at after seeing only
1 symbol (i.e. a), and it isn't until we see the = or ( that we can tell what is intended. Suppose we used
a more complex algorithm, such as LL(3) - even this couldn't decide between e.g. a.b=x and a.b(x). In
fact, no matter how far it looks ahead, an LL(n) parser, which looks ahead a fixed amount, can always
be confused by a sufficiently complicated target in an assignment or call.
There are two kinds of solutions - the parser can use a variable amount of lookahead, as
JAVACC can be asked to do, so it reads as far as the = or ( before making a decision - or we can
rewrite the grammar, by left-factorising it, so that the two kinds of statement are merged until we can
make the decision:
stat : target assign_or_call ';'
;
assign_or_call : '=' exp
| '(' explist ')'
;
An LR (1) parser has no difficulty dealing with the original grammar, as it will have read to
the end of the statement, and seen the = or (on the way, before it has to decide whether to recognize
an assignment or a call.


It is possible to construct unambiguous grammars that would confuse any LR(n) parser (as
well as any LL(n) parser) e.g. palindromes - strings that are their own mirror images, such as abba or
abacaba:
P:
| 'a' | 'b' | 'c' | . . .
| 'a' P 'a' | 'b' P 'b' | 'c' P 'c' | . . .
;
The problem is that, although it is perfectly obvious to us what to do - find the middle, and
work out to both ends - LR(n) and LL(n) read strictly left-to-right, and can only locate the middle of
the string by using their finite lookahead to find the end of the string. This could not work for strings
of length > n for LL(n), or length >2n for LR(n).

Confusing YACC:
Once an ambiguity has been pointed out in a grammar, it is usually clear enough to the user
what the problem is, even if it isn't obvious what to do about it. However, what kinds of error
messages are reported by tools like YACC, and how easy is it to find the corresponding ambiguity or
confusion?
YACC reports problems with grammars, whether ambiguous or just confusing, as
shift/reduce conflicts (where YACC can't decide whether to perform a shift or reduce - i.e. the
grammar rule is complete?) and/or as reduce/reduce conflicts (where YACC can't decide which
reduce to perform - i.e. which grammar rule is it?).
An example of a shift/reduce conflict:
The start of a function/method declaration in a C-like language, that accepts headers like void
fred(int a, int b, float x, float z), looks something like this:
header : type_name id '(' params ')'
| type_name id '(' ')'
;
params : param
| params ',' param
;


param : type_name id
;
YACC has no problems with this grammar, but what if we modify it? It might be nice to be
able to write the example above simply as void fred(int a, b, float x, z). We could try rewriting the
grammar like this:
param : type_name ids
;
ids : id
| ids ',' id
;
But now, YACC reports a shift/reduce conflict, and the details from the y.output file are:
13: shift/reduce conflict (shift 15, reduce 5) on ','
state 13
param : type_name ids . (5)
ids : ids . ',' id (7)
That is, when the generated parser sees a , after a list of identifiers in a param, it doesn't know
whether that , (and the id it expects after) is part of the same param (in which case it should shift, to
include them as part of the RHS) or the start of the next param (in which case it should reduce this
RHS and start a new RHS).
This is not ambiguous, just confusing to YACC, as it needs more lookahead to see if the next
few symbols are e.g. , a b (a is a type_name, b is a parameter name of type a) or , a , or , a ) (a is a
parameter name of the current type). The way to make this clear to YACC is to rewrite the grammar
so that it can see more of the input before having to make a decision:
params : type_name id
| params ',' type_name id
| params ',' id
;
An example of a reduce/reduce conflict:
state 8
sub_exp : id . (5)
type_name : id . (8)

That is, when it sees id) it doesn't know whether the id is a variable giving a value or a type
name, so it doesn't know which rule to use to recognize the id.
Assuming we don't already know what the problem is, this hasn't helped much, but we can get
more information by working back through the states in the y.output file to try to find how we get
here. To do so, we need to look for states that include shift 8 or goto 8. In this example, all we find is:
state 4
sub_exp : '(' . type_name ')' sub_exp (3)
sub_exp : '(' . exp ')' (7)
...
id shift 8

So the input must include (id), which can be recognized either as a type-cast or as an
expression.

This is a big hint about the source of the ambiguity in the grammar, but more by luck than
anything else - YACC remains confused even if we make the grammar unambiguous, by removing
the rule sub_exp : '-' sub_exp. YACC still reports the same reduce/reduce conflict for this modified
grammar, as it is confused by an input as simple as ( a ) - it has to decide whether this is a value in an
expression or a type-cast before it reads past the ) to see e.g. ( a ) 99 (i.e. a type-cast) or ( a ) - 99 (i.e.
the value a - 99).

Luckily, the solution to the general problem of the ambiguity - to somehow get LEX to
distinguish between identifiers that are really type names (or class names) and all other identifiers -
also solves this confusion for YACC.

Epilogue:

Most of the time, an ambiguous grammar results from an error made by the implementers of a
programming language. Sometimes, however, it is the fault of the language designer. Many languages
are defined in such a way that some part is either inherently ambiguous or confusing (e.g. not LR(1)).
Does this matter? We should not limit language designers to what a particular type of parser generator


can cope with, but on the other hand there is no particular merit in making a language harder to
compile if a small change can simplify the problem.

An example of this is a well-known problem with conditional statements; the dangling else.
Most imperative languages permit conditional statements to take two slightly different forms:

if ( ... ) ...

if ( ... ) ... else ...

So the else d in if (a) if (b) c else d could be associated either with if (a) or with if (b).

Most languages attempt to fix this problem by stating that the second interpretation is more
natural, and so is correct, although some languages have different rules. Whatever the language
definition, it is an extra rule that anyone learning the language has to remember.

Similarly, the compiler writer has to deal with this special case: if we use a tool like YACC
we get a shift/reduce error - do we shift the else to get if (b) c else d, or do we reduce the if (b) c as it
stands, so we get if (a) ... else d? To overcome this problem, we can rewrite the grammar to explicitly
say "you can't have an unmatched then (logically) immediately before an else - the then and the else
must be paired up":

stat : matched

| unmatched

unmatched : IF '(' exp ')' stat

| IF '(' exp ')' matched ELSE unmatched

| FOR '(' exp ';' exp ';' exp ')' unmatched

| WHILE '(' exp ')' unmatched



|...

matched : IF '(' exp ')' matched ELSE matched

| FOR '(' exp ';' exp ';' exp ')' matched

| WHILE '(' exp ')' matched

|...

| exp

Alternatively, it is possible to make a simple change to the language which completely


removes this ambiguity - to have a terminating keyword such as end_if or fi:

stat : IF '(' exp ')' stat FI

| IF '(' exp ')' stat ELSE stat FI

| . . .;

Flowchart for YACC File Compilation


Flow Chart for Parser Execution Process


Conclusion:

We have written an ambiguous CFG to recognize an infix expression and implemented a parser
that recognizes the infix expression using YACC. We have also documented the details of all
conflicting entries in the parser table generated by LEX and YACC and how they have been resolved.

Questions:

1. Can YACC handle any unambiguous grammar?

2. Describe the ways to avoid confusing a parser.

3. What is a reduce/reduce conflict?

4. What is ambiguity?

5. What is a shift/reduce conflict? How does YACC resolve it?

6. What is the yywrap() function? Explain its role in parsing.

7. How does YACC refer to the symbol table?

8. Explain the role of syntax analysis in compiler construction.

9. Enlist all the available syntax analysis tools.

10. Enlist the limitations of YACC.



Assignment No. 05

Aim:

To write an attributed translation grammar to recognize declarations of simple variables,
arithmetic expressions, and for, if, and if-else statements as per the syntax of C, and to generate
three address code for the given input.

Theory:


Semantic Actions:
Parsing tools use a generalization of CFGs in which each grammar symbol has one or more
values, called attributes, associated with it. Each production of the grammar may have an
associated "action", which can refer to and compute the values of attributes. So we have:
Terminals and non-terminals have attributes.
Productions have semantic actions.
Example:
E -> E' + E
| E'
E' -> int * E'
| int
For each symbol, let X.val be an integer value associated with X.
For terminal symbols, val is the lexeme provided by the lexical analyzer.
For non-terminals, val should be the integer value of the expression. This attribute is
computed from the attributes of sub-expressions.

Production Action:
E -> E' + E1 E.val = E'.val + E1.val
| E' E.val = E'.val
E' -> int * E1' E'.val = int.val * E1'.val
| int E'.val = int.val
Note: the attribute of some grammar symbols, such as the terminals + and *, is unused.

Example: 5*3+2*4

Parse tree (each instance of a symbol is numbered):

E1 -> E3' + E2
E3' -> int7 * E4'
E4' -> int8
E2 -> E5'
E5' -> int9 * E6'
E6' -> int0

Equations:

E1.val = E3'.val + E2.val
E3'.val = int7.val * E4'.val
E4'.val = int8.val
E2.val = E5'.val
E5'.val = int9.val * E6'.val
E6'.val = int0.val
int7.val = 5
int8.val = 3
int9.val = 2
int0.val = 4

Working from the leaves to the root, we can compute each val attribute.
For example, E6'.val = 4 and E5'.val = 8. Finally, E1.val = 23.
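The bottom-up evaluation above can be sketched in Python. This is only an illustration, not part of the manual's LEX/YACC toolchain; the node tags and the tuple encoding of the parse tree are our own.

```python
# Minimal bottom-up evaluation of the synthesized .val attribute for 5*3+2*4.
# A node is (production_tag, children); an "int" leaf carries its lexeme value.

def val(node):
    """Compute .val for a node from the .val attributes of its children."""
    kind, children = node
    if kind == "int":          # terminal: val is the lexeme from the lexer
        return children
    if kind == "E_plus":       # E  -> E' + E  : E.val  = E'.val + E1.val
        return val(children[0]) + val(children[1])
    if kind == "Ep_times":     # E' -> int * E1' : E'.val = int.val * E1'.val
        return val(children[0]) * val(children[1])
    if kind == "Ep_int":       # E' -> int : E'.val = int.val
        return val(children[0])
    raise ValueError(kind)

# Parse tree for 5*3+2*4, written out by hand:
tree = ("E_plus", [
    ("Ep_times", [("int", 5), ("Ep_int", [("int", 3)])]),
    ("Ep_times", [("int", 2), ("Ep_int", [("int", 4)])]),
])
```

Evaluating val(tree) works from the leaves upward, matching the equation system above and yielding 23.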
Notes:
1. Fresh attributes are associated with every node in the parse tree.
2. The semantic actions specify a system of equations; they don't say in what order the
equations are resolved. The user just gives a specification and the parser takes care of the
implementation.
Warning: You can use side-effects in semantic actions, but in this case you have to understand the
order in which attributes get computed or the results will seem unpredictable.
3. In this example, the val attribute can be evaluated bottom-up: the .val attribute for a node
of the parse depends only on the .val attributes of its children.
4. The parse tree need not actually be built by the parser. In fact, a parser tool would
compile this specification into code that simply traces out the structure of the parse tree
without actually building it.
5. Pattern/action parsing can be thought of as a systematic translation of the original text into
a new form specified by the semantic actions. Because the translation is guided by the syntax,
it is called syntax-directed translation. (NB: the book uses SDT in a narrower sense.)
6. Attributes may also be passed top-down: an attribute of a node may depend on an attribute
of the parent in the parse tree. Such an attribute is called "inherited". We will talk about
inherited attributes eventually, but they will not be used in the course project.

Synthesized and Inherited Attributes:


Synthesized:
Attribute value depends on descendants of the node

Example: the val attribute above


Inherited:
Attribute value depends on parent and siblings of the node
Example: symbol table environment
S-attributed Definitions:
- An attribute grammar is S-attributed if it consists only of synthesized attributes
- Can be evaluated bottom-up:
- Keep a stack S parallel to parsing stack
- consider production
A -> XY A.val = X.val + Y.val
- When reducing by A -> XY
- the top of the S stack has X.val and Y.val
- compute A.val
- pop X.val and Y.val from S, push A.val
- symmetric with reduce action on the parse stack
- Tools like Bison/Flex support S-attributed definitions
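The parallel value stack can be simulated in a few lines of Python. This is our own minimal sketch of the mechanism, not Bison's actual implementation.

```python
# Simulate the value stack kept parallel to the parsing stack.
# On a reduce by A -> X Y with action A.val = f(X.val, Y.val),
# pop the rhs values and push the computed lhs value.

stack = []

def shift(v):
    """Shift: push the value of the incoming symbol."""
    stack.append(v)

def reduce(arity, action):
    """Reduce: pop `arity` rhs values (restoring left-to-right order),
    apply the semantic action, and push the lhs value."""
    args = [stack.pop() for _ in range(arity)][::-1]
    stack.append(action(*args))

# Reducing 5 * 3 by E' -> int * E1' (the '*' token carries no value here):
shift(5)
shift(3)
reduce(2, lambda x, y: x * y)
```

After the reduce, the top of the stack holds E'.val = 15, exactly as the bullet points above describe.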
Evaluating Attributes:
- S attributed definitions are a very special case of attribute grammars
- The most general method is to construct an ordering from the parse tree itself: Define a
graph as follows. For each attribute E.a to be computed add a node in the graph. If E.a
depends on E1.a1,...,En.an then add directed edges from Ei.ai to E.a for
1 <= i <= n.
A topological sort of the graph is any ordering n1,...,nk of the nodes such that edges of
the graph are all from left-to-right in the ordering; i.e., a node appears in the ordering after all of the
nodes it depends on. Any topological sort is a legal evaluation order of the attributes.
Note: for the topological sort to make sense there can be no cycles in the graph.

Cyclically defined attributes are not legal.


- Cyclically defined attributes can make sense if they are treated as recursive
definitions
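The evaluation-order construction described above can be sketched in Python. The function and attribute names are our own; the dependency graph is the one from the 5*3+2*4 example.

```python
# Topological sort of an attribute dependency graph.
# deps[a] lists the attributes that a depends on; a legal evaluation
# order lists every attribute after all of its dependencies.

def topo_order(deps):
    order, done, visiting = [], set(), set()

    def visit(a):
        if a in done:
            return
        if a in visiting:          # dependence cycle: circular grammar
            raise ValueError("circular attribute grammar")
        visiting.add(a)
        for b in deps.get(a, []):  # attributes with no entry are leaves
            visit(b)
        visiting.discard(a)
        done.add(a)
        order.append(a)            # appended only after its dependencies

    for a in deps:
        visit(a)
    return order

deps = {
    "E1.val": ["E3'.val", "E2.val"],
    "E3'.val": ["int7.val", "E4'.val"],
    "E4'.val": ["int8.val"],
    "E2.val": ["E5'.val"],
    "E5'.val": ["int9.val", "E6'.val"],
    "E6'.val": ["int0.val"],
}
```

topo_order(deps) returns a legal evaluation order; feeding it a cyclic graph raises an error, which is the circularity check in miniature.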


- In practice, computing all of the attribute dependencies from the AST is rarely, if ever,
used. Instead, special cases of syntax-directed definitions are used where the attribute
evaluation order can be determined once and for all from the actions.
- The most important special case is S-attributed grammars: grammars with only
synthesized attributes. Building an AST is an example of an S-attributed grammar (i.e., PA3).
These attributes can be evaluated bottom-up during parsing.
Testing For Circularity:
- If an attribute grammar has a dependence cycle among attributes in some parse tree, then
the attribute grammar is said to be circular.
- Circular attribute grammars are considered meaningless---that is, erroneous.
- It is possible to check whether a given attribute grammar is circular.

Algorithm: Not applicable

Input:

Identifiers from the input in a symbol table and other relevant information about the identifiers

Output:

Equivalent three address code for the given input

Instructions:

Write an attributed translation grammar for the for, if, and if-else statements as per the syntax
of C (or Pascal) and generate equivalent three-address code for input made up of the constructs
mentioned above, using LEX and YACC. Write code to store the identifiers from the input in a
symbol table, to record other relevant information about the identifiers, and to retrieve the records
stored in the symbol table.

Flow Chart for Intermediate Code Generation


Flow Chart for Execution Process


Conclusion:

We have written an attributed translation grammar to recognize declarations of simple
variables, arithmetic expressions, and for, if, and if-else statements as per the syntax of C, and
generated three-address code for the given input.

Questions:

1. What are Semantic Actions?

2. What are synthesized and Inherited Attributes?


3. What are S-attributed Definitions?

4. What is circular Grammar?

5. What is semantic analysis?

6. What is syntax directed translation?

7. How are S-attributes evaluated?

8. How are L-attributes evaluated?

9. Explain the need for SDT.

10. List the subroutines/built-in functions in YACC.


GROUP B: ASSIGNMENTS
(Any 6 assignments)

Assignment No: 07

Aim:

To generate the target code for the optimized code in assignment.

Objective:


1. To understand how to construct machine code.
2. To understand the basic instructions in assembly (ASM).
3. To understand LEX and YACC programming.
4. To understand the rules for generating target code from three-address code given as input.

Theory:

Code generation is the final phase of a compiler. It is the process of creating low-level
(assembly or machine) code from three-address code (generated by the intermediate code
generation phase) or from optimized three-address code (produced by the code optimizer phase).

Proposed Code Generator:

Source Program -> Front End -> Intermediate Code -> Code Optimization
    -> Optimized Intermediate Code -> Code Generator -> Assembly Code
(each phase consults the Symbol Table)

Fig. Position of code generator in compiler

Algorithm for code generation:

Read the expression in the form operator, operand1, operand2 and generate code using the
following algorithm.

Gen_Code(operator, operand1, operand2)
{
    if (operand1.addressmode = R)
    {
        if (operator = +)
            Generate(ADD operand2, R0);
        else if (operator = -)
            Generate(SUB operand2, R0);
        else if (operator = *)
            Generate(MUL operand2, R0);
        else if (operator = /)
            Generate(DIV operand2, R0);
    }
    else if (operand2.addressmode = R)
    {
        if (operator = +)
            Generate(ADD operand1, R0);
        else if (operator = -)
            Generate(SUB operand1, R0);
        else if (operator = *)
            Generate(MUL operand1, R0);
        else if (operator = /)
            Generate(DIV operand1, R0);
    }
    else
    {
        Generate(MOV operand1, R0);    /* neither operand is in a register: load operand1 first */
        if (operator = +)
            Generate(ADD operand2, R0);
        else if (operator = -)
            Generate(SUB operand2, R0);
        else if (operator = *)
            Generate(MUL operand2, R0);
        else if (operator = /)
            Generate(DIV operand2, R0);
    }
}
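The algorithm above can be sketched as a small runnable Python function. This is a simplified, single-register illustration of Gen_Code; the function and parameter names are our own, not from any required library.

```python
# Simplified code generator for one three-address statement  t := op1 OP op2.
# Mirrors Gen_Code above: if a source operand already sits in R0, operate
# on the other one directly; otherwise load operand1 into R0 first.

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def gen_code(operator, operand1, operand2, in_register=()):
    """Return the target instructions; in_register lists names already in R0."""
    code = []
    if operand1 in in_register:        # operand1 already in R0
        code.append(f"{OPCODES[operator]} {operand2},R0")
    elif operand2 in in_register:      # operand2 already in R0
        code.append(f"{OPCODES[operator]} {operand1},R0")
    else:                              # neither in a register: load first
        code.append(f"MOV {operand1},R0")
        code.append(f"{OPCODES[operator]} {operand2},R0")
    return code
```

For example, gen_code("+", "a", "b") produces the MOV a,R0 / ADD b,R0 pair generated for t1:=a+b in the worked example that follows.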


Example:

We will generate code for following expression

X:= (a+b)*(c-d)+((e/f)*(a+b))

The corresponding three address code can be given as,

t1:=a+b

t2:=c-d

t3:=e/f

t4:=t1*t2

t5:=t3*t1

t6:=t4+t5

Using the simple code generation algorithm, the following target code sequence can be generated:

Three Address Code   Target Code   Register Descriptor        Operand Descriptor

t1:=a+b              MOV a,R0      R0 contains t1             t1 in R0
                     ADD b,R0

t2:=c-d              MOV c,R1      R1 contains t2             t2 in R1
                     SUB d,R1

t3:=e/f              MOV e,R2      R2 contains t3             t3 in R2
                     DIV f,R2

t4:=t1*t2            MUL R0,R1     R1 contains t4             t4 in R1
                                   (R0 still holds t1)

t5:=t3*t1            MUL R2,R0     R0 contains t5             t5 in R0

t6:=t4+t5            ADD R1,R0     R0 contains t6             t6 in R0

Flow Chart for Target Code Generation


Flow Chart for Execution Process


Conclusion:

Thus we have studied how to generate target code for optimized code.

Questions:

1. What is a compiler?

2. What are the front end and back end of a compiler?

3. Write the steps for program execution.

4. What is ambiguity?

5. Explain the difference between target code and intermediate code.


Assignment No: 8

Title: Generating abstract syntax tree using LEX and YACC.


Aim: Write a LEX and YACC program to generate abstract syntax tree.

Objective:
To understand working of Code Generation Phase of Compiler

Theory:

The purpose of this lab is to create and print an abstract syntax tree for a C program. The C program will
use only a small subset of the grammar.
As an example of a syntax tree, consider the statement tri_area = (base * height)/2;
The root node is an assignment operation. Its left subtree is a pointer to tri_area.
Its right subtree represents the expression (base * height)/2. The tree looks like the tree in the
figure below.

Fig: Abstract syntax tree

ASSIGN INT
    ID PTR|INT value = "tri_area"
    DIVIDE INT
        TIMES INT
            DEREF INT
                ID PTR|INT value = "base"
            DEREF INT
                ID PTR|INT value = "height"
        NUM INT value = 2
In this display, each node is followed by its left subtree and then its right subtree, indented one tab
stop. Notice that base and height are dereferenced, but tri_area isn't. That will be explained next.
Tree Nodes and the TreeNode Class
A tree node will be implemented by the TreeNode class. If a tree node is an interior node, then it will
contain an operator that acts on the left and right subtrees. The operator will have a mode, which will be
the data type involved in the operation. For example, if the mode of an assignment operator is INT, then
the operator will assign an int to an int. If a tree node is an exterior (leaf) node, then it will contain an
object, which will be an identifier or a number (and later a string). The mode of an exterior node will be
the kind of object stored in that node. For example, if the object is an integer variable (l-value), then the
mode will be a pointer to an INT. If the object is an integer constant, then the mode will be INT.

Open the file TreeNode.java. This file defines the TreeNode class whose objects have the following
attributes: the operation (oper) represented by the node, the mode (mode) of the operation, a reference
to the left subtree (left), a reference to the right subtree (right), the identifier (id) represented by the
node, the number (num) represented by the node, and the string (str) represented by the node.

If the node is a binary interior node, then left and right will be non-null, and id, num, and str will be
undefined. On the other hand, if the node is an exterior node, then left and right will be null, while exactly
one of id, num, and str will be defined, depending on the kind of exterior node. From time to time, we will
have unary interior nodes. They will always use the left subtree rather than the right subtree.

Note the types of the data members oper, mode, left, right, id, num, and str. Also, one constructor

public TreeNode(IdEntry i)

and the toString() function have been defined. You will define three additional constructors. First, define
the default constructor:

public TreeNode()

It should set oper, mode, and num to 0 and left, right, id, and str to null. Next, define the following
constructor:

public TreeNode(int op, int m, TreeNode l, TreeNode r)

The purpose of this constructor is to join together two existing trees, with root nodes l and r, as the left
and right subtrees of a new tree with this node as its root node. In the root node, the value of oper
should be op and the value of mode should be m. Finally, define the constructor:

public TreeNode(int n)

It will create a node that represents a number. The member oper should be Ops.NUM, mode
should be Ops.INT, and num should be the value of n. Write these constructors. We will use these
constructors later in this lab.
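A Python analogue of the TreeNode class can make the three constructors concrete. The manual's version is in Java and uses Ops.NUM / Ops.INT integer constants; this sketch substitutes string tags, so the names here are illustrative only.

```python
# Minimal TreeNode mirroring the Java class: interior nodes carry an
# operator and mode plus left/right subtrees; leaves carry id, num, or str.

class TreeNode:
    def __init__(self, oper=0, mode=0, left=None, right=None,
                 id=None, num=0, str_=None):
        # Default constructor: oper, mode, num are 0; left, right, id, str are None.
        self.oper, self.mode = oper, mode
        self.left, self.right = left, right
        self.id, self.num, self.str = id, num, str_

    @classmethod
    def join(cls, op, m, l, r):
        """TreeNode(int op, int m, TreeNode l, TreeNode r): join two trees."""
        return cls(oper=op, mode=m, left=l, right=r)

    @classmethod
    def number(cls, n):
        """TreeNode(int n): a leaf representing the number n."""
        return cls(oper="NUM", mode="INT", num=n)

# (base * height) / 2 as in the figure, using string tags for oper/mode:
times = TreeNode.join("TIMES", "INT",
                      TreeNode(oper="ID", mode="PTR|INT", id="base"),
                      TreeNode(oper="ID", mode="PTR|INT", id="height"))
divide = TreeNode.join("DIVIDE", "INT", times, TreeNode.number(2))
```

The divide node reproduces the right subtree of the abstract syntax tree shown earlier: TIMES on the left, the NUM leaf 2 on the right.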
Yacc is a tool for building syntax analyzers, also known as parsers; yacc has been used to
implement hundreds of languages. Its applications range from small desk calculators, to medium-sized
preprocessors for typesetting, to large compiler front ends for complete programming languages.
A yacc specification is based on a collection of grammar rules that describe the syntax of a
language; yacc turns the specification into a syntax analyzer. A pure syntax analyzer merely checks
whether or not an input string conforms to the syntax of the language.

Algorithm:
Step 1: Start.
Step 2: Declare the declarations in the header section: %{ #include <ctype.h> %}
Step 3: Declare the token: %token DIGIT
Step 4: Define the translation rules for line, expr, term, and factor:
line : expr '\n' { printf("%d\n", $1); }
expr : expr '+' term { $$ = $1 + $3; }
term : term '*' factor { $$ = $1 * $3; }
factor : '(' expr ')' { $$ = $2; }
%%
Step 5: Define the supporting C routines.
Step 6: Stop.

Conclusion:

We have studied and implemented abstract syntax tree generation using LEX and YACC.

FAQs
1. What is AST?
2. What is the need of AST?
3. Which phase of compiler generates AST?
4. What are the applications of AST in compiler?

Assignment No: 9

Title: Implementing recursive descent parser for sample language.

Aim: Write a program to generate Recursive Descent Parser.

Objective:
To develop a recursive-descent parser for a given grammar.
To generate a syntax tree as an output of the parser.
To handle syntax errors.
Theory:

A recursive descent parser is a kind of top-down parser built from a set of mutually-recursive
procedures (or a non-recursive equivalent) where each such procedure usually implements one of
the production rules of the grammar. Thus the structure of the resulting program closely mirrors that
of the grammar it recognizes.
This parser attempts to verify that the syntax of the input stream is correct as it is read from left to
right. A basic operation necessary for this involves reading characters from the input stream and
matching them with terminals from the grammar that describes the syntax of the input. Our recursive
descent parsers will look ahead one character and advance the input stream reading pointer when
proper matches occur. What a recursive descent parser actually does is to perform a depth-first
search of the derivation tree for the string being parsed. This provides the 'descent' portion of the
name. The 'recursive' portion comes from the parser's form, a collection of recursive procedures.
As our first example, consider the simple grammar

E -> x + T
T -> (E)
T -> x

and the derivation tree in the figure below for the expression x+(x+x)

Figure: Derivation Tree for x+(x+x)

A recursive descent parser traverses the tree by first calling a procedure to recognize an E. This
procedure reads an 'x' and a '+' and then calls a procedure to recognize a T. This would look like the
following routine.
Procedure E()
Begin
    If (input_symbol = 'x') then
        next()
    Else
        Errorhandler();
    If (input_symbol = '+') then
        Begin
            next();
            T();
        End
    Else
        Errorhandler();
End
Note that the 'next' looks ahead and always provides the next character that will be read from the
input stream. This feature is essential if we wish our parsers to be able to predict what is due to
arrive as input. Note that 'errorhandler' is a procedure that notifies the user that a syntax error has
been made and then possibly terminates execution.
In order to recognize a T, the parser must figure out which of the productions to execute. This is not
difficult and is done in the procedure that appears below.
Procedure T()
Begin
    If (input_symbol = '(') then
        Begin
            next();
            E();
            If (input_symbol = ')') then
                next()
            Else
                Errorhandler();
        End
    Else if (input_symbol = 'x') then
        next()
    Else
        Errorhandler();
End
In the above routine, the parser determines whether T had the form (E) or x. If not then the error
routine was called, otherwise the appropriate terminals and nonterminals were recognized.
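The two procedures can be transcribed into a runnable Python parser. This is our own sketch of the pseudocode above, using an index as the lookahead pointer and exceptions in place of Errorhandler.

```python
# Recursive descent parser for:  E -> x + T ,  T -> (E) | x
# 'pos' is the lookahead index; syntax errors are raised, then caught.

def parse(s):
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expect(c):
        nonlocal pos
        if peek() != c:
            raise SyntaxError(f"expected {c!r} at position {pos}")
        pos += 1                      # next(): advance the input pointer

    def E():                          # E -> x + T
        expect('x'); expect('+'); T()

    def T():                          # T -> (E) | x
        if peek() == '(':
            expect('('); E(); expect(')')
        else:
            expect('x')

    try:
        E()
        return pos == len(s)          # valid only if all input was consumed
    except SyntaxError:
        return False
```

parse("x+(x+x)") succeeds, tracing out the derivation tree in the figure; parse("x+(x)") fails because a parenthesized T must contain an E of the form x+T.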

Algorithm:
1. Make grammar suitable for parsing i.e. remove left recursion (if required).
2. Write a function for each production with error handler.
3. Given input is said to be valid if input is scanned completely and no error function is called.
Conclusion:

We have studied and implemented Recursive Descent Parser.

FAQs:
1. What do you mean by recursive descent parsing?
2. What are the applications of a recursive descent parser?
3. What are the advantages of a recursive descent parser?


GROUP C: ASSIGNMENTS
(Any one)


Assignment No: 13
Title: Generate Huffman codes for a gray scale 8 bit image.

Aim: Generate Huffman codes for a gray scale 8 bit image.

Prerequisites:
Knowledge of Huffman codes.
Objectives:
To generate Huffman codes for a gray scale 8 bit image.

Theory:
Huffman coding is an algorithm developed by David A. Huffman while he was a Ph.D. student at MIT,
and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

The output from Huffman's algorithm can be viewed as a variable-length code table for encoding a source
symbol (such as a character in a file). The algorithm derives this table from the estimated probability or
frequency of occurrence (weight) for each possible value of the source symbol. As in other entropy
encoding methods, more common symbols are generally represented using fewer bits than less common
symbols. Huffman's method can be efficiently implemented, finding a code in time linear in the number of
input weights if these weights are sorted. However, although optimal among methods encoding symbols
separately, Huffman coding is not always optimal among all compression methods.

The beauty of Huffman codes is that variable length codes can achieve a higher data density than fixed
length codes if the characters differ in frequency of occurrence. The length of the encoded character is
inversely proportional to that character's frequency. Huffman wasn't the first to discover this, but his
paper presented the optimal algorithm for assigning these codes. Huffman codes are similar to
Morse code. Morse code uses fewer dots and dashes for the most frequently occurring letters. An E is
represented with one dot. A T is represented with one dash. Q, a letter occurring less frequently, is
represented with dash-dash-dot-dash.

Huffman codes are created by analyzing the data set and assigning short bit streams to the datum occurring


most frequently. The algorithm attempts to create codes that minimize the average number of bits per
character. Table C1.1 shows an example of the frequency of letters in some text and their corresponding
Huffman code. To keep the table manageable, only letters were used. It is well known that
in English text, the space character is the most frequently occurring character.

As expected, E and T had the highest frequency and the shortest Huffman codes. Encoding with these
codes is simple. Encoding the word toupee would be just a matter of stringing together the appropriate
bit strings, as follows:

T   O    U     P     E   E

111 0100 10111 10110 100 100

One ASCII character requires 8 bits. The original 48 bits of data have been coded with 23 bits,
achieving a compression ratio of about 2.09.
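The encoding can be checked with a few lines of Python, using the codes from Table C1.1 (a verification sketch, not part of the assignment's required program):

```python
# Encode "toupee" with the variable-length codes from Table C1.1.
CODES = {"T": "111", "O": "0100", "U": "10111", "P": "10110", "E": "100"}

encoded = "".join(CODES[c] for c in "TOUPEE")
fixed_bits = 8 * len("TOUPEE")      # one ASCII character needs 8 bits -> 48
ratio = fixed_bits / len(encoded)   # about 2.09
```

The 23-bit result and the roughly 2.09 compression ratio match the figures quoted above.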

Letter Frequency Code


A 8.23 0000
B 1.26 110000
C 4.04 1101
D 3.40 01011
E 12.32 100
F 2.28 11001
G 2.77 10101
H 3.94 00100
I 8.08 0001
J 0.14 110001001


K 0.43 1100011
L 3.79 00101
M 3.06 10100
N 6.81 0110
O 7.59 0100
P 2.58 10110
Q 0.14 1100010000
R 6.67 0111
S 7.64 0011
T 8.37 111
U 2.43 10111
V 0.97 0101001
W 1.07 0101000
X 0.29 11000101
Y 1.46 010101
Z 0.09 1100010001

Table C1.1 Huffman codes for the alphabet letters.


Modified Huffman Coding

Modified Huffman coding is used in fax machines to encode black on white images (bitmaps). It is also
an option to compress images in the TIFF file format. It combines the variable length codes of Huffman
coding with the coding of repetitive data in run length encoding. Since facsimile transmissions are
typically black text or writing on white background, only one bit is required to represent each pixel or
sample. These samples are referred to as white bits and black bits. The runs of white bits and black bits
are counted, and the counts are sent as variable length bit streams.

The encoding scheme is fairly simple. Each line is coded as a series of alternating runs of white and
black bits. Runs of 63 or less are coded with a terminating code. Runs of 64 or greater require that a
makeup code prefix the terminating code. The makeup codes are used to describe runs in multiples of
64 from 64 to 2560. This deviates from the normal Huffman scheme which would normally require
encoding all 2560 possibilities. This reduces the size of the Huffman code tree and accounts for the
term modified in the name.

Studies have shown that most facsimiles are 85 percent white, so the Huffman codes have been
optimized for long runs of white and short runs of black. The protocol also assumes that the line begins
with a run of white bits. If it doesn't, a run of white bits of 0 length must begin the encoded line. The
encoding then alternates between black bits and white bits to the end of the line. Each scan line ends
with a special EOL (end of line) character consisting of eleven zeros and a 1 (000000000001). The
EOL character doubles as an error recovery code. Since there is no other combination of codes that has
more than seven zeroes in succession, a decoder seeing eight will recognize the end of line and
continue scanning for a 1. Upon receiving the 1, it will then start a new line. If bits in a scan line get
corrupted, the most that will be lost is the rest of the line. If the EOL code gets corrupted, the most that
will get lost is the next line.

Tables 13.2 and 13.3 show the terminating and makeup codes. Figure 13.1 shows how to encode a
1275 pixel scanline with 53 bits.

Run Length  White bits  Black bits    Run Length  White bits  Black bits


2 0111 11 34 00010011 000011010010

3 1000 10 35 00010100 000011010011


4 1011 011 36 00010101 000011010100
5 1100 0011 37 00001110 000011010101
6 1110 0010 38 00010111 000011010110
7 1111 00011 39 00101000 000011010111
8 10011 000101 40 00101001 000001101100
9 10100 000100 41 00101010 000001101101
10 00111 0000100 42 00101011 000011011010
11 01000 0000101 43 00101100 000011011011
12 001000 0000111 44 00101101 000001010100
13 000011 00000100 45 00000100 000001010101
14 110100 00000111 46 00000101 000001010110
15 110101 000011000 47 00001010 000001010111
16 101010 0000010111 48 00001011 000001100100
17 101011 0000011000 49 01010010 000001100101
18 0100111 0000001000 50 01010011 000001010010
19 0001100 00001100111 51 01010100 000001010011
20 0001000 00001101000 52 01010101 000000100100
21 0010111 00001101100 53 00100100 000000110111
22 0000011 00000110111 54 00100101 000000111000
23 0000100 00000101000 55 01011000 000000100111
24 0101000 00000010111 56 01011001 000000101000
25 0101011 00000011000 57 01011010 000001011000
26 0010011 000011001010 58 01011011 000001011001
27 0100100 000011001011 59 01001010 000000101011
28 0011000 000011001100 60 01001011 000000101100
29 00000010 000011001101 61 00110010 000001011010
30 00000011 000001101000 62 001110011 000001100110
31 00011010 000001101001 63 00110100 000001100111

Table 13.2 Terminating codes

64 11011 000000111
128 10010 00011001000
192 010111 000011001001
256 0110111 000001011011
320 00110110 000000110011
384 00110111 000000110100
448 01100100 000000110101
512 01100101 0000001101100
576 01101000 0000001101101
640 01100111 0000001001010
704 011001100 0000001001011
768 011001101 0000001001100
832 011010010 0000001001101
896 101010011 0000001110010
960 011010100 0000001110011
1024 011010101 0000001110100
1088 011010110 0000001110101
1152 011010111 0000001110110
1216 011011000 0000001110111
1280 011011001 0000001010010
1344 011011010 0000001010011
1408 011011011 0000001010100
1472 010011000 0000001010101
1536 010011001 0000001011010
1600 010011010 0000001011011

1664 011000 0000001100100


1728 010011011 0000001100101
1792 00000001000 00000001000
1856 00000001100 00000001100
1920 00000001101 00000001101
1984 000000010010 000000010010
2048 000000010011 000000010011
2112 000000010100 000000010100

2176 000000010101 000000010101


2240 000000010110 000000010110
2304 000000010111 000000010111
2368 000000011100 000000011100
2432 000000011101 000000011101
2496 000000011110 000000011110
2560 000000011111 000000011111
EOL 000000000001 000000000001

Table 13.3 Makeup code

The example below encodes a 1275-pixel scanline in 53 bits:

Run          Code
0 white      00110101
1 black      010
4 white      1011
2 black      11
1 white      0111
1 black      010
1266 white   011011000 + 01010011
EOL          000000000001

Figure 13.1 Example encoding of a scanline.
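The makeup + terminating scheme can be sketched in Python. Only the handful of white-run codes needed for the 1266-pixel run of Figure 13.1 are included; the function name is ours.

```python
# Encode a run of white pixels as (optional makeup code) + terminating code.
# Makeup codes cover multiples of 64; the terminating code covers the rest.
# Code tables are a small subset of Tables 13.2 and 13.3.

TERMINATING_WHITE = {0: "00110101", 50: "01010011"}
MAKEUP_WHITE = {1216: "011011000"}

def encode_white_run(n):
    bits = ""
    if n >= 64:
        makeup = (n // 64) * 64        # largest multiple of 64 <= n
        bits += MAKEUP_WHITE[makeup]
        n -= makeup
    bits += TERMINATING_WHITE[n]       # remaining run of 63 or less
    return bits
```

encode_white_run(1266) emits the makeup code for 1216 followed by the terminating code for 50, exactly the pair shown for the 1266-white run in Figure 13.1.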

Conclusion:
We have studied the generation of Huffman codes for a grayscale 8-bit image.
