Compiler : A compiler is a translator program that takes a program written in a high-level
language (HLL), the source program, and translates it into an equivalent program in a
machine-level language (MLL), the target program. An important part of a compiler's job is
reporting errors to the programmer.
Executing a program written in a high-level programming language basically involves two steps.
The source program must first be compiled (translated) into an object program.
Then the resulting object program is loaded into memory for execution.
Compiler Design
OVERVIEW OF LANGUAGE PROCESSING SYSTEM
Preprocessor : A preprocessor produces input for a compiler.
It may perform the following functions.
1. Macro processing: A preprocessor may allow a user to
define macros that are shorthands for longer constructs.
2. File inclusion: A preprocessor may include header files into
the program text.
3. Rational preprocessors: These preprocessors augment older
languages with more modern flow-of-control and data-structuring
facilities.
4. Language extensions: These preprocessors attempt to add
capabilities to the language by what amounts to built-in
macros.
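As an illustration of the first two functions, the C preprocessor performs both macro processing and file inclusion. The sketch below is only an example: the names BUFFER_LIMIT, SQUARE, MAX_OF, area_plus_limit and padded_length are hypothetical and do not come from these slides.

```c
#include <stddef.h>
#include <string.h>   /* file inclusion: the header text is pasted in at this point */

/* Macro processing: each use below is replaced by its expansion
   before the compiler proper sees the program text. */
#define BUFFER_LIMIT 64
#define SQUARE(x) ((x) * (x))                 /* parentheses keep SQUARE(1 + 2) correct */
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

/* After preprocessing, the return expression reads ((side) * (side)) + 64. */
int area_plus_limit(int side) {
    return SQUARE(side) + BUFFER_LIMIT;
}

/* Uses strlen from the included header together with a macro. */
size_t padded_length(const char *s) {
    return MAX_OF(strlen(s), (size_t)BUFFER_LIMIT);
}
```

Calling area_plus_limit(3) evaluates 3 * 3 + 64. Because macros are expanded textually, an argument such as strlen(s) may be evaluated more than once, which is one reason they are described as shorthands rather than functions.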
Translator : A translator is a program that takes as input a program written in one language and produces
as output a program in another language. Besides program translation, the translator performs another
very important role: error detection. Any violation of the HLL specification is detected and
reported to the programmer.
Types of Translators :
Interpreter
Compiler
Preprocessor
Interpreter Vs Compiler
[Table: Compiler | Interpreter]
Fourth generation : Languages designed for specific applications, such as NOMAD for report
generation, ABAP (SAP) for ERP, SQL for database queries, and PostScript for
text formatting.
Fifth generation : The term is applied to logic-based and constraint-based
languages such as Prolog and OPS5.
Programming Languages classification
Based on functions :
Imperative languages : C, C++, C#, Java. These are built on the notion of program state and
statements that change the state.
Based on architecture :
Von Neumann languages : Languages whose computational
model is based on the von Neumann computer architecture.
Token 1: (const, -)
Token 2: (identifier, ‘pi’)
Token 3: (=, -)
Token 4: (realnumber, 3.14159)
Token 5: (;, -)
Outline
• Role of lexical analyzer
• Specification of tokens
• Recognition of tokens
• Lexical analyzer generator
• Finite automata
• Design of lexical analyzer generator
The role of lexical analyzer
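[Figure: source program -> Lexical Analyzer -> token -> Parser -> to semantic analysis.
The parser requests each token with getNextToken; the Lexical Analyzer and the Parser
both use the symbol table.]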
Why separate lexical analysis and parsing
1. Simplicity of design
2. Improving compiler efficiency
3. Enhancing compiler portability
Tokens, Patterns and Lexemes
• A token is a pair: a token name and an optional
token value
• A pattern is a description of the form that the
lexemes of a token may take
• A lexeme is a sequence of characters in the
source program that matches the pattern for a
token
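The three terms can be made concrete with a small data structure. This is a minimal sketch, assuming the token value is simply a reference to the lexeme in the source buffer; the names TokenName, Token, make_token and lexeme_equals are hypothetical and not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Token names: the abstract symbols the parser works with. */
typedef enum { TOK_ID, TOK_NUMBER, TOK_RELOP } TokenName;

/* A token is a pair: a name plus an optional value. Here the value
   is the lexeme itself, stored as a pointer into the source text
   together with its length (an assumption made for this sketch). */
typedef struct {
    TokenName   name;
    const char *lexeme;
    size_t      length;
} Token;

Token make_token(TokenName name, const char *lexeme, size_t length) {
    Token t;
    t.name = name;
    t.lexeme = lexeme;
    t.length = length;
    return t;
}

/* Returns 1 if the token's lexeme is exactly the given text, else 0. */
int lexeme_equals(const Token *t, const char *text) {
    return strlen(text) == t->length &&
           strncmp(t->lexeme, text, t->length) == 0;
}
```

For the source text "count <= 10", the lexeme "count" matches the pattern for identifiers, so the lexical analyzer would produce a token with name TOK_ID whose value refers to those five characters.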
Example
• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
Extensions
• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]
• Example:
– letter_ -> [A-Za-z_]
– digit -> [0-9]
– id -> letter_(letter_|digit)*
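The regular definition for id can be checked by hand with a short scanning routine. The following is a sketch, assuming ASCII input and a letter_ class of [A-Za-z_]; the function names is_letter_, is_digit and match_id are hypothetical.

```c
#include <stddef.h>

/* letter_ -> [A-Za-z_] */
static int is_letter_(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
}

/* digit -> [0-9] */
static int is_digit(char c) {
    return c >= '0' && c <= '9';
}

/* Returns the length of the longest prefix of s that matches
   id -> letter_ (letter_ | digit)*, or 0 if no prefix matches. */
size_t match_id(const char *s) {
    size_t n;
    if (!is_letter_(s[0])) {
        return 0;               /* an id must start with letter_ */
    }
    n = 1;
    while (is_letter_(s[n]) || is_digit(s[n])) {
        n++;                    /* the starred part: zero or more repeats */
    }
    return n;
}
```

On the input "rate_2+x" the routine stops at the '+' and reports a lexeme of length 6, while "9lives" yields 0 because an identifier cannot begin with a digit.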
Recognition of tokens
• Starting point is the language grammar to
understand the tokens:
stmt -> if expr then stmt
      | if expr then stmt else stmt
      | ε
expr -> term relop term
      | term
term -> id
      | number
Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
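The number and ws patterns can be turned into hand-written recognizers. This is a sketch under the assumption that number means one or more digits, an optional fraction and an optional exponent, which is how the Lex definition {digit}+(\.{digit}+)?(E[+-]?{digit}+)? later in this deck reads. The names skip_digits, match_number and match_ws are hypothetical.

```c
#include <stddef.h>

/* Advances past a run of digits starting at index i; returns the new index. */
static size_t skip_digits(const char *s, size_t i) {
    while (s[i] >= '0' && s[i] <= '9') {
        i++;
    }
    return i;
}

/* Length of the longest prefix of s matching number, or 0 if none. */
size_t match_number(const char *s) {
    size_t end = skip_digits(s, 0);
    if (end == 0) {
        return 0;                           /* the leading digits are mandatory */
    }
    if (s[end] == '.') {                    /* optional fraction */
        size_t frac = skip_digits(s, end + 1);
        if (frac > end + 1) {
            end = frac;                     /* accept only if a digit follows '.' */
        }
    }
    if (s[end] == 'E') {                    /* optional exponent */
        size_t i = end + 1;
        size_t exp_end;
        if (s[i] == '+' || s[i] == '-') {
            i++;
        }
        exp_end = skip_digits(s, i);
        if (exp_end > i) {
            end = exp_end;                  /* accept only if digits follow E */
        }
    }
    return end;
}

/* Length of the longest prefix of s matching ws -> (blank | tab | newline)+. */
size_t match_ws(const char *s) {
    size_t n = 0;
    while (s[n] == ' ' || s[n] == '\t' || s[n] == '\n') {
        n++;
    }
    return n;
}
```

An incomplete fraction or exponent is not consumed: on "7.x" only "7" is matched, and on "12E+" only "12" is matched, leaving the remaining characters for the next token.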
Transition diagrams
• Transition diagram for relop
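A transition diagram for relop can be simulated with a state variable and a switch. The sketch below covers the six operators listed in the relop pattern; the state numbers and the names RelopKind, NO_RELOP and match_relop are assumptions made for illustration and may not match the diagram on the original slide.

```c
typedef enum { LT, LE, EQ, NE, GT, GE, NO_RELOP } RelopKind;

/* Simulates the relop diagram on the start of s.
   Stores the lexeme length in *len and returns which operator was
   found, or NO_RELOP (with *len == 0) if s does not start with one. */
RelopKind match_relop(const char *s, int *len) {
    int state = 0;
    int i = 0;
    for (;;) {
        char c = s[i];
        switch (state) {
        case 0:                                  /* start state */
            if (c == '<') { state = 1; i++; }
            else if (c == '=') { *len = 1; return EQ; }
            else if (c == '>') { state = 6; i++; }
            else { *len = 0; return NO_RELOP; }
            break;
        case 1:                                  /* seen '<' */
            if (c == '=') { *len = 2; return LE; }
            if (c == '>') { *len = 2; return NE; }
            *len = 1;                            /* retract: c is not part of the lexeme */
            return LT;
        case 6:                                  /* seen '>' */
            if (c == '=') { *len = 2; return GE; }
            *len = 1;                            /* retract */
            return GT;
        }
    }
}
```

The two "retract" cases show why some accepting states give back the last character read: after '<' followed by 'x', the lexeme is only "<", and 'x' belongs to the next token.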
Transition diagrams (cont.)
• Transition diagram for reserved words and
identifiers
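One common way to handle reserved words is to scan every word with the identifier diagram and then look the lexeme up in a table of reserved words. The sketch below assumes the three keywords used in this deck (if, then, else); the names WordToken, reserved and classify_word are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TOK_IF, TOK_THEN, TOK_ELSE, TOK_ID } WordToken;

/* Table of reserved words and the token each one stands for. */
static const struct {
    const char *text;
    WordToken   token;
} reserved[] = {
    { "if",   TOK_IF   },
    { "then", TOK_THEN },
    { "else", TOK_ELSE },
};

/* Given a lexeme already matched by the identifier pattern, returns
   the keyword token if it is a reserved word, otherwise TOK_ID. */
WordToken classify_word(const char *lexeme, size_t length) {
    size_t i;
    for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        if (strlen(reserved[i].text) == length &&
            strncmp(reserved[i].text, lexeme, length) == 0) {
            return reserved[i].token;
        }
    }
    return TOK_ID;
}
```

With this design a single diagram serves both reserved words and identifiers: "then" is classified as TOK_THEN, while "iffy" falls through the table and is reported as TOK_ID.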
Lexical Analyzer Generator - Lex
[Figure: lex.yy.c -> C compiler -> a.out]
declarations
%%
translation rules Pattern {Action}
%%
auxiliary functions
Example
%{
/* definitions of manifest constants
   LT, LE, EQ, NE, GT, GE,
   IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
…

%%

int installID() {/* function to install the lexeme, whose first
                    character is pointed to by yytext, and whose
                    length is yyleng, into the symbol table and
                    return a pointer thereto */
}

int installNum() {/* similar to installID, but puts numerical
                     constants into a separate table */
}
Finite Automata
• Regular expressions = specification
• Finite automata = implementation
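To make the specification/implementation split concrete, the regular expression id -> letter (letter | digit)* can be implemented as a table-driven deterministic finite automaton. This is a minimal sketch, assuming letter is [A-Za-z] as in the Lex example; the state names, character classes and the function dfa_accepts_id are hypothetical.

```c
/* Character classes: the columns of the transition table. */
enum { CLS_LETTER, CLS_DIGIT, CLS_OTHER, NUM_CLASSES };

/* States: the rows of the transition table. */
enum { S_START, S_IN_ID, S_DEAD, NUM_STATES };

/* next_state[state][class] gives the state reached on that input. */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /* S_START */ { S_IN_ID, S_DEAD,  S_DEAD },
    /* S_IN_ID */ { S_IN_ID, S_IN_ID, S_DEAD },
    /* S_DEAD  */ { S_DEAD,  S_DEAD,  S_DEAD },
};

static int char_class(char c) {
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
        return CLS_LETTER;
    }
    if (c >= '0' && c <= '9') {
        return CLS_DIGIT;
    }
    return CLS_OTHER;
}

/* Runs the automaton over the whole string; returns 1 if the string
   is accepted as an id (the run ends in S_IN_ID), otherwise 0. */
int dfa_accepts_id(const char *s) {
    int state = S_START;
    for (; *s != '\0'; s++) {
        state = next_state[state][char_class(*s)];
    }
    return state == S_IN_ID;
}
```

The regular expression says what an identifier looks like, and the table says how to recognize one: changing the language only requires changing the table, not the driver loop.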