Overview of compiler
Environment
Pass and phase
Phases of a compiler
Regular expressions
Lexical analyzer
LEX tool
Bootstrapping
Compiler - Introduction
A compiler is a computer program that translates a program in a source language into an equivalent program in a target language.
A source program/code is a program/code written in the source language, which is usually a high-level language.
A target program/code is a program/code written in the target language, which is often a machine language or an intermediate code.
[Figure: the source program is the input to the compiler and the target program is its output; the compiler also produces error messages.]
A language-processing system
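[Figure: a language-processing system. Recoverable labels: source program, preprocessor, compiler, libraries and relocatable object files.]
Example: gcc -v myprog.c (the -v option makes gcc print the commands it runs at each stage).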
Phases of a Compiler
[Figure: source program -> lexical analyzer -> token stream -> syntax analyzer -> syntax tree -> semantic analyzer -> syntax tree. The symbol table and the error handler are drawn alongside the phases.]
Source code -> front end -> intermediate code -> back end -> target code
Semantic Analysis:
Checking to ensure Correctness of Components
structure
Parsing = Diagramming Sentences
The diagram is a tree
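[Figure: parse tree for the assignment statement position := initial + rate * 60. The root is an assignment statement with children identifier (position), :=, and expression. That expression is expression + expression, where the left operand is the identifier initial and the right operand is expression * expression, with the identifier rate and the number 60.]
[Figure: Compressed Tree. := has children position and +; + has children initial and *; * has children rate and 60.]
[Figure: Conversion Action. Same tree, with 60 wrapped as inttofloat(60), so * has children rate and inttofloat(60).]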
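[Figure: translation of an assignment statement through the phases. Recoverable content:]
character stream
Lexical Analyzer
Syntax Analyzer
    syntax tree: = ( <id,1>, + ( <id,2>, * ( <id,3>, 60 ) ) )
Semantic Analyzer
    syntax tree: = ( <id,1>, + ( <id,2>, * ( <id,3>, inttofloat(60) ) ) )
Intermediate code:
    t1 = inttofloat(60)
    t2 = id3 * t1
    t3 = id2 + t2
    id1 = t3
Machine-Independent Code Optimizer
    t1 = id3 * 60.0
    id1 = id2 + t1
Code Generator
SYMBOL TABLE
    1  position
    2  initial
    3  rate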
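The step from the semantically checked tree to the temporaries t1, t2, t3 can be sketched as a post-order walk. This is only an illustrative sketch: the tuple encoding of the tree and the helper names gen and translate are assumptions made here, not something the slides define.

```python
def gen(node, code, counter):
    """Emit three-address code for node into code; return the name holding its value."""
    if not isinstance(node, tuple):
        return str(node)                      # leaf: identifier or constant
    op = node[0]
    if op == "inttofloat":                    # unary conversion inserted by semantic analysis
        arg = gen(node[1], code, counter)
        rhs = f"inttofloat({arg})"
    else:                                     # binary operator: children first, then the operator
        left = gen(node[1], code, counter)
        right = gen(node[2], code, counter)
        rhs = f"{left} {op} {right}"
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {rhs}")
    return temp

def translate(assign):
    """Translate a tree of the form ('=', target, expression)."""
    _, target, expr = assign
    code, counter = [], [0]
    result = gen(expr, code, counter)
    code.append(f"{target} = {result}")
    return code

# Hypothetical encoding of the tree produced by the semantic analyzer.
tree = ("=", "id1", ("+", "id2", ("*", "id3", ("inttofloat", 60))))
for line in translate(tree):
    print(line)
```

Under these assumptions the printed lines are the four unoptimized statements shown above; the optimizer's shorter form is not produced by this sketch.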
Phase
program
Portions of one or more phases are combined into a module called a pass.
between phases
reduces memory
Lexical Analysis
The compiler sees the following code
if (i == j)
    z = 0;
else
    z = 1;
as the character string
\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
Token Class (or Class)
In English: noun, verb, adjective, ...
In a programming language:
Token
- A classification for a common set of strings
Pattern
- Identifiers: x, count, ...
<Class, String>
Parser
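As a rough illustration of the <Class, String> pairs a lexical analyzer hands to the parser, the sketch below tokenizes the if/else fragment with Python's re module. The class names and the TOKEN_SPEC table are assumptions chosen for this example; the slides do not fix a particular set of classes.

```python
import re

# Hypothetical token classes, in priority order (keywords before identifiers).
TOKEN_SPEC = [
    ("WHITESPACE", r"[ \t\n]+"),
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",     r"[0-9]+"),
    ("RELOP",      r"=="),
    ("ASSIGN",     r"="),
    ("PUNCT",      r"[();]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (class, string) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"lexical error at offset {pos}: {text[pos]!r}")
        if m.lastgroup != "WHITESPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

source = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
for token in tokenize(source):
    print(token)
```

Whitespace is recognized as a class of its own and then dropped, so the parser only receives the pairs it cares about.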
{
return x+y;
}
4. printf("i = %d, &i = %p", i, &i);
DO 5 I = 1.25
if (i == j)
Z = 0;
else
Z = 1;
Regular Languages
The regular expressions over Σ are the expressions including
R = ε
  | 'c'      where c is in Σ
  | R + R
  | RR
  | R*
RE examples :
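One way to get a feel for the five cases (empty string, single character, union, concatenation, iteration) is to enumerate the short strings each expression denotes. The sketch below does that for expressions written as nested tuples; the tuple encoding and the function name language are assumptions made for this illustration.

```python
def language(r, n):
    """Return the strings of length <= n in the language of regular expression r."""
    kind = r[0]
    if kind == "eps":                     # epsilon: only the empty string
        return {""}
    if kind == "char":                    # 'c': the single-character string c
        return {r[1]} if n >= 1 else set()
    if kind == "union":                   # R + R
        return language(r[1], n) | language(r[2], n)
    if kind == "concat":                  # RR
        return {x + y
                for x in language(r[1], n)
                for y in language(r[2], n)
                if len(x) + len(y) <= n}
    if kind == "star":                    # R*: repeat until no new short strings appear
        base = language(r[1], n)
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

one_star = ("star", ("char", "1"))                                    # 1*
ab_then_cs = ("concat", ("union", ("char", "a"), ("char", "b")),
              ("star", ("char", "c")))                                # (a + b)c*
print(sorted(language(one_star, 3)))
print(sorted(language(ab_then_cs, 2)))
```

The length bound n is only there to keep the sets finite; the languages of 1* and (a + b)c* themselves are infinite.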
Formal Languages
Def. Let Σ be a set of characters (an alphabet).
A language over Σ is a set of strings of characters drawn from Σ.
Alphabet = English characters
Alphabet = ASCII
Language = C programs
Lexical Specifications
Keyword: if or else or then or
At least one: A+ = AA*
Union: A | B = A + B
Option: A? = A + ε
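These shorthand identities can be spot-checked with Python's re module, which writes union as | and has + and ? built in. The helper same_language and the sample strings are assumptions for this sketch, and agreeing on a handful of samples is of course not a proof of equivalence.

```python
import re

def same_language(p, q, samples):
    """Check that patterns p and q accept exactly the same strings among samples."""
    return all(bool(re.fullmatch(p, s)) == bool(re.fullmatch(q, s)) for s in samples)

samples = ["", "a", "aa", "aaa", "b", "ab", "ba"]

print(same_language(r"a+", r"aa*", samples))       # At least one: A+ = AA*
print(same_language(r"a?", r"(?:a|)", samples))    # Option: A? = A + epsilon
print(same_language(r"[ab]", r"a|b", samples))     # Union written two ways
print(same_language(r"a+", r"a*", samples))        # Differ on the empty string
```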
x1...xi ∈ L(R)
Resolving Ambiguities
How much input is used?
- Maximal Munch
Which token is used?
- Choose the one listed first
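Both rules fit in a few lines: try every rule at the current position, keep the longest match, and let the earlier rule win a tie. The RULES table and the function next_token below are hypothetical names for this sketch, and the patterns are simple enough that each regex's own match is already its longest one.

```python
import re

# Hypothetical rules in priority order: the keyword rule is listed before the identifier rule.
RULES = [
    ("KEYWORD",    re.compile(r"if|else")),
    ("IDENTIFIER", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("RELOP",      re.compile(r"==")),
    ("ASSIGN",     re.compile(r"=")),
]

def next_token(text, pos=0):
    """Return (class, lexeme) for the longest match at pos; ties go to the earliest rule."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly longer matches replace the current best, so an earlier rule keeps a tie.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(next_token("if"))     # both rules match 2 characters; the keyword rule is listed first
print(next_token("ifx"))    # the identifier rule matches 3 characters, so maximal munch picks it
print(next_token("== j"))   # '==' is longer than '='
```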
Lexical errors
Error recovery
Input buffering
Two buffers of the same size, say 4096, are alternately reloaded.
switch (*forward++) {
case eof:
    if (forward is at end of first buffer) {
        reload second buffer;
        forward = beginning of second buffer;
    }
    else if (forward is at end of second buffer) {
        reload first buffer;
        forward = beginning of first buffer;
    }
    else /* eof within a buffer marks the end of input */
        terminate lexical analysis;
    break;
cases for the other characters;
}
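The pseudocode above can be turned into a small runnable model. In this sketch the class name TwoBufferReader, the sentinel character "\0" and the tiny buffer size of 8 are all assumptions picked for illustration; the layout is two halves of equal size, each followed by one sentinel slot.

```python
import io

EOF = "\0"   # sentinel character, assumed never to occur in the source text
N = 8        # tiny buffer size for illustration; a real lexer would use e.g. 4096

class TwoBufferReader:
    def __init__(self, stream, size=N):
        self.stream = stream
        self.size = size
        # Two halves of `size` characters, each followed by one sentinel slot.
        self.buf = [EOF] * (2 * (size + 1))
        self._reload(0)
        self.forward = 0

    def _reload(self, half):
        """Fill one half from the stream; unused slots and the last slot hold the sentinel."""
        start = half * (self.size + 1)
        chunk = self.stream.read(self.size)
        for i in range(self.size + 1):
            self.buf[start + i] = chunk[i] if i < len(chunk) else EOF

    def next_char(self):
        """Return the next character, or None at the end of input."""
        while True:
            ch = self.buf[self.forward]
            self.forward += 1
            if ch != EOF:
                return ch
            if self.forward == self.size + 1:          # sentinel at end of first buffer
                self._reload(1)                         # forward already points at the second buffer
            elif self.forward == 2 * (self.size + 1):   # sentinel at end of second buffer
                self._reload(0)
                self.forward = 0
            else:                                       # eof within a buffer: end of input
                self.forward -= 1
                return None

reader = TwoBufferReader(io.StringIO("if (i == j) z = 0; else z = 1;"))
print("".join(iter(reader.next_char, None)))
```

The point of the sentinel is that the common case costs one comparison per character; the buffer-boundary checks only run when the sentinel is actually seen.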
Transition diagrams
Transition diagram for relop
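A transition diagram translates directly into a table of states and edges. The sketch below assumes the relational operators <, <=, <>, =, >, >= and an arbitrary state numbering; neither is given in the surviving slide text, so treat both as illustrative.

```python
# state -> {input character -> next state}; the state numbers are hypothetical.
TRANSITIONS = {
    0: {"<": 1, "=": 5, ">": 6},
    1: {"=": 2, ">": 3},
    6: {"=": 7},
}
# Accepting states and the token they report. States 1 and 6 accept on
# "any other character" without consuming it (the retract step of the diagram).
ACCEPT = {1: "LT", 2: "LE", 3: "NE", 5: "EQ", 6: "GT", 7: "GE"}

def relop(text, pos=0):
    """Return (token_name, new_pos) for a relational operator at pos, or None."""
    state = 0
    while pos < len(text) and text[pos] in TRANSITIONS.get(state, {}):
        state = TRANSITIONS[state][text[pos]]
        pos += 1
    if state in ACCEPT:
        return (ACCEPT[state], pos)
    return None

for sample in ["<=", "<>", "<a", "=", ">=", ">1", "a"]:
    print(repr(sample), relop(sample))
```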