
What is a Compiler?

A compiler is a computer program that translates a program in a source language into an equivalent program in a target language.
[Figure: source program -> compiler -> target program; the compiler also produces error messages]
A source program/code is a program/code written in the source language, which is usually a high-level language.
A target program/code is a program/code written in the target language, which is often a machine language.

Chapter 1 2301373: Introduction 2


[Figure: A Language Processing System]
Structure of a compiler
A compiler has two parts:
1. Analysis (front end of the compiler)
- breaks up the source program into pieces
- imposes a grammatical structure on them
- creates an intermediate representation of the source program
- provides informative messages if the program is syntactically or semantically wrong
- stores information about the program in the symbol table
2. Synthesis (back end of the compiler)
- constructs the target program from the intermediate representation and the information in the symbol table
Phases of a Compiler
Compilation of a program proceeds through a fixed series of phases.
Each phase uses an (intermediate) form of the program produced by an earlier phase.
The symbol table (information about the entire source program) is used by all phases of the compiler.

4/16/17 COP4020 Spring 2014 6


Lexical Analysis/Scanning
A scanner reads a stream of characters and groups them into meaningful units called lexemes.
For each lexeme it produces a token, and it passes the resulting stream of tokens to the next phase of the compiler.
Each token has the form <token_name, attribute_value>.
attribute_value points to an entry in the symbol table (for example, the entry of an identifier).
For example, the assignment statement
position = initial + rate * 60
would be grouped into the following lexemes and tokens:
1. position -> <id,1>
2. = -> <=>
3. initial -> <id,2>
4. + -> <+>
5. rate -> <id,3>
6. * -> <*>
7. 60 -> <60>
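As an illustration, the grouping above can be sketched as a small regular-expression scanner in Python. This is a minimal sketch under stated assumptions: the token patterns, the function name scan and the plain list used as a symbol table are hypothetical, and numbers are emitted as a ('number', value) pair rather than the bare <60> shown above.

```python
import re

# Hypothetical token patterns for the example statement.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[=+*]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source, symtab):
    """Group characters into lexemes and return <token_name, attribute_value> tuples."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"lexical error: unexpected {source[pos]!r} at {pos}")
        kind, lexeme, pos = m.lastgroup, m.group(), m.end()
        if kind == "SKIP":
            continue  # whitespace separates lexemes but produces no token
        if kind == "ID":
            if lexeme not in symtab:
                symtab.append(lexeme)  # new identifier: add a symbol-table entry
            tokens.append(("id", symtab.index(lexeme) + 1))  # 1-based entry number
        elif kind == "NUMBER":
            tokens.append(("number", int(lexeme)))
        else:
            tokens.append((lexeme,))  # operators carry no attribute value
    return tokens


table = []
print(scan("position = initial + rate * 60", table))
print(table)
```

Identifiers receive the number of their symbol-table entry, so a later occurrence of rate would again produce <id,3>.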
Syntax analysis/Parsing
A parser gets a stream of tokens from
the scanner, and determines if the
syntax (structure) of the program is
correct according to the (context-free)
grammar of the source language.
Then, it produces a data structure,
called a parse tree or a syntax
tree, which describes the syntactic
structure of the program.



Parser: Syntax Analysis
Checks whether the token stream meets the
grammatical specification of the language and
generates the syntax tree.
A syntax error is produced by the compiler
when the program does not meet the
grammatical specification.
For a grammatically correct program, this phase
generates a syntax tree (also called a parse tree).
The syntax of a programming language is
typically described by a context-free grammar,
which also defines the structure of the parse tree.
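A hand-written recursive-descent parser is one common way to implement this phase. The sketch below assumes a small hypothetical grammar for assignments with + and *, takes the tokens of position = initial + rate * 60 as tuples, and builds an abstract syntax tree of nested tuples rather than a full parse tree.

```python
class Parser:
    """Recursive-descent parser for a hypothetical grammar:
         assign -> id '=' expr
         expr   -> term ('+' term)*
         term   -> factor ('*' factor)*
         factor -> id | number
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Name of the next token, or None at the end of the input.
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"syntax error: expected {kind!r}, found {self.peek()!r}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def parse(self):
        target = self.expect("id")
        self.expect("=")
        tree = ("=", target, self.expr())
        if self.peek() is not None:
            raise SyntaxError(f"syntax error: unexpected {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.expect("+")
            node = ("+", node, self.term())  # builds a left-associative tree
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.expect("*")
            node = ("*", node, self.factor())
        return node

    def factor(self):
        if self.peek() == "number":
            return self.expect("number")
        return self.expect("id")


tokens = [("id", 1), ("=",), ("id", 2), ("+",), ("id", 3), ("*",), ("number", 60)]
print(Parser(tokens).parse())
```

Giving * its own rule (term) beneath + (expr) is what makes * bind more tightly than +; a token stream such as id = id + (missing operand) raises a syntax error.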



Semantic analysis
It gets the parse tree from the parser
together with information in the symbol
table
It determines if the semantics or meaning of
the program is correct.
Much of the work of a semantic analyzer is type
checking.

int a;
int b;
char c[10];
a = b + c;   /* type check: b + c has type char *, which cannot be assigned to an int without a cast */
Some type conversions, called coercions, may also be permitted.
To allow them, the semantic analyzer may sometimes modify the parse tree so that the code is semantically correct.
E.g., a binary arithmetic operator may be applied to a pair of integers or to a pair of floating-point numbers.
If the operator is applied to an integer and a floating-point number, the compiler converts the integer to a floating-point number by inserting an inttofloat operation.
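The coercion described above can be sketched as a recursive type checker over the syntax tree. This sketch assumes only two numeric types (int and float), hypothetical tuple node shapes, and a types dictionary standing in for the symbol table, with position, initial and rate assumed to be declared float. It also assumes, more strictly than C, that assigning a float to an int is rejected.

```python
NUMERIC = {"int", "float"}


def check(node, types):
    """Return (annotated_node, type) for hypothetical nodes
    ('id', entry), ('number', value) and (op, left, right) with op in '=', '+', '*'."""
    kind = node[0]
    if kind == "id":
        return node, types[node[1]]  # declared type taken from the symbol table
    if kind == "number":
        return node, "int" if isinstance(node[1], int) else "float"
    op, left, right = node
    left, lt = check(left, types)
    right, rt = check(right, types)
    if lt not in NUMERIC or rt not in NUMERIC:
        raise TypeError(f"type error: {op!r} applied to {lt} and {rt}")
    if lt == rt:
        return (op, left, right), lt
    if op == "=" and lt == "int":
        # Assumption of this sketch: no implicit float-to-int narrowing.
        raise TypeError("type error: cannot assign float to int")
    # Mixed int/float operands: coerce the int side by inserting an inttofloat node.
    if lt == "int":
        left = ("inttofloat", left)
    else:
        right = ("inttofloat", right)
    return (op, left, right), "float"


types = {1: "float", 2: "float", 3: "float"}  # assumed declarations
tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("number", 60))))
annotated, ty = check(tree, types)
print(annotated, ty)
```

Only the integer literal 60 changes: it becomes ('inttofloat', ('number', 60)), while the rest of the tree is returned unchanged.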
Semantic Analysis
Static semantic checks (done by the compiler) are
performed at compile time
Type checking
Every variable is declared before it is used
Identifiers are used in appropriate contexts
Check subroutine call arguments
Dynamic semantic checks are performed at run time, and
the compiler produces code that performs these checks
Array subscript values are within bounds
Arithmetic errors, e.g. division by zero
A variable is used but hasn't been initialized
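The difference between the two kinds of check can be sketched as follows: check_declared runs once over the program at compile time, while checked_load stands for the kind of code a compiler could emit around an array access, which runs every time the access executes. Both function names and the statement format are hypothetical.

```python
def check_declared(statements):
    """Static check: every variable is declared before it is used.
    Statements are hypothetical ('decl', name) or ('use', name) tuples."""
    declared, errors = set(), []
    for line, (kind, name) in enumerate(statements, start=1):
        if kind == "decl":
            declared.add(name)
        elif name not in declared:
            errors.append(f"line {line}: '{name}' used before declaration")
    return errors


def checked_load(array, index):
    """Dynamic check: a bounds test performed at run time before the access."""
    if not 0 <= index < len(array):
        raise IndexError(f"subscript {index} out of bounds 0..{len(array) - 1}")
    return array[index]


print(check_declared([("decl", "a"), ("use", "a"), ("use", "b")]))
print(checked_load([10, 20, 30], 2))
```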



Intermediate Code Generation
An intermediate code generator takes the parse tree from the semantic analyzer and generates an intermediate representation, which should be:
- easy to produce
- easy to translate into the target language



Intermediate Code Generation (contd)
One popular intermediate code is three-address code, in which each instruction has at most one operator on its right-hand side. For the example:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
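A post-order walk of the annotated syntax tree is enough to produce code in this form: each interior node gets a fresh temporary that holds its value. The sketch assumes the running example written as hypothetical nested tuples with an explicit inttofloat node, and prints identifiers as id1, id2, id3 after their symbol-table entries.

```python
def gen_tac(tree):
    """Translate an annotated assignment tree into three-address code."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def gen(node):
        # Returns the name (variable, constant or temporary) holding the node's value.
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"
        if kind == "number":
            return str(node[1])
        if kind == "inttofloat":
            arg = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node
        l, r = gen(left), gen(right)  # children first (post-order)
        temp = new_temp()
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    _, target, value = tree  # the root is ('=', target, expression)
    dest = gen(target)
    src = gen(value)
    code.append(f"{dest} = {src}")
    return code


tree = ("=", ("id", 1), ("+", ("id", 2), ("*", ("id", 3), ("inttofloat", ("number", 60)))))
for instr in gen_tac(tree):
    print(instr)
```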



Code Optimization
Replaces an inefficient sequence of instructions with a better sequence of instructions.
Sometimes called code improvement.
Improves the intermediate code so that faster-running machine code results.
Results in better target code (e.g. faster code).
For the example, the conversion of 60 to 60.0 can be done once at compile time, and unnecessary temporaries can be removed:
t1 = id3 * 60.0
id1 = id2 + t1
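The two improvements applied to the example can be sketched as passes over three-address instructions stored as (dest, op, arg1, arg2) tuples: constant folding turns inttofloat(60) into 60.0 at compile time, and a copy such as id1 = t3 is merged into the instruction that computed t3. The tuple format, the "copy" operator name and the assumption that temporaries are named t1, t2, ... and assigned only once are choices of this sketch; it keeps the name t2 instead of renumbering temporaries.

```python
import re


def is_temp(name):
    # Assumption of this sketch: compiler temporaries are named t1, t2, ...
    return isinstance(name, str) and re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    # Pass 1: constant folding of inttofloat applied to an integer literal.
    consts, folded = {}, []
    for dest, op, a1, a2 in code:
        a1, a2 = consts.get(a1, a1), consts.get(a2, a2)  # substitute folded constants
        if op == "inttofloat" and a1.isdigit():
            consts[dest] = str(float(a1))  # '60' -> '60.0'; the instruction disappears
            continue
        folded.append((dest, op, a1, a2))

    def uses(name):
        return sum((x == name) + (y == name) for _, _, x, y in folded)

    # Pass 2: merge "tX = a op b; x = tX" into "x = a op b" if tX has no other use.
    result = []
    for dest, op, a1, a2 in folded:
        if op == "copy" and result and result[-1][0] == a1 and is_temp(a1) and uses(a1) == 1:
            _, prev_op, p1, p2 = result.pop()
            result.append((dest, prev_op, p1, p2))  # retarget the previous instruction
        else:
            result.append((dest, op, a1, a2))
    return result


code = [
    ("t1", "inttofloat", "60", None),
    ("t2", "*", "id3", "t1"),
    ("t3", "+", "id2", "t2"),
    ("id1", "copy", "t3", None),
]
print(optimize(code))
```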
Code Generation
A code generator takes an intermediate representation of the source program and produces the target program.
Compilers may generate different types of target code depending on the machine.
For the example (the F in each instruction means floating point, and # marks 60.0 as an immediate constant):
LDF R2, id3
MULF R2, R2, #60.0
LDF R1, id2
ADDF R1, R1, R2
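A naive code generator for the optimized example (t2 = id3 * 60.0, then id1 = id2 + t2) can be sketched as below. It assumes each temporary stays in its own register, loads named variables with LDF, writes constants as # immediates, and also emits the final store STF id1 that completes the assignment; its register numbering differs from the listing above. The opcode table and helper names are hypothetical, and only binary + and * instructions are handled.

```python
import re

OPCODES = {"+": "ADDF", "*": "MULF"}  # floating-point opcodes in the style of the listing


def is_temp(name):
    # Assumption: compiler temporaries are named t1, t2, ...
    return re.fullmatch(r"t\d+", name) is not None


def is_const(name):
    return re.fullmatch(r"\d+(\.\d+)?", name) is not None


def codegen(tac):
    """Generate target code for (dest, op, arg1, arg2) instructions with op in OPCODES."""
    out, reg_of, next_reg = [], {}, 1

    def new_reg():
        nonlocal next_reg
        reg = f"R{next_reg}"
        next_reg += 1
        return reg

    for dest, op, a1, a2 in tac:
        # First operand: reuse the temporary's register, or load the variable.
        if a1 in reg_of:
            reg = reg_of[a1]
        else:
            reg = new_reg()
            out.append(f"LDF {reg}, {a1}")
        # Second operand: a register, an immediate constant (#), or a loaded variable.
        if a2 in reg_of:
            src = reg_of[a2]
        elif is_const(a2):
            src = f"#{a2}"
        else:
            src = new_reg()
            out.append(f"LDF {src}, {a2}")
        out.append(f"{OPCODES[op]} {reg}, {reg}, {src}")
        if is_temp(dest):
            reg_of[dest] = reg  # the temporary's value stays in the register
        else:
            out.append(f"STF {dest}, {reg}")  # store the result into the variable
    return out


for line in codegen([("t2", "*", "id3", "60.0"), ("id1", "+", "id2", "t2")]):
    print(line)
```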
The Structure of a Compiler
[Figure: Scanner [Lexical Analyzer] -> Tokens -> Parser [Syntax Analyzer] -> Parse tree -> Semantic Process [Semantic Analyzer] -> Abstract Syntax Tree w/ Attributes -> Code Generator [Intermediate Code Generator] -> Non-optimized Intermediate Code -> Code Optimizer -> Optimized Intermediate Code -> Code Generator -> Target machine code]
Symbol Table
Identifiers are names of variables, constants, functions, data types, etc.
The symbol table stores information associated with identifiers.
It is a data structure that contains a record for each identifier, holding the attributes of that identifier.
It should allow the record for each identifier to be found quickly.
When an identifier is detected by the lexical analyzer, the identifier is entered into the symbol table.
Symbol Table (contd)
Accessed in every phase of the compiler
The scanner puts names of identifiers in the
symbol table.
The semantic analyzer stores more
information (e.g. data types) in the table.
The intermediate code generator, code
optimizer and code generator use
information in symbol table to generate
appropriate code.
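A symbol table is commonly built on a hash table so that lookups are fast. The sketch below uses a Python dict as the name index and a list of records whose position gives the entry number used in <id,n> tokens; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Symbol:
    name: str
    attrs: dict = field(default_factory=dict)  # e.g. type, scope, storage location


class SymbolTable:
    """One record per identifier, found through a hash map on the name."""

    def __init__(self):
        self._records = []  # records in insertion order
        self._index = {}    # name -> position in _records

    def insert(self, name):
        """Called by the scanner; returns the 1-based entry number for <id, n>."""
        if name not in self._index:
            self._index[name] = len(self._records)
            self._records.append(Symbol(name))
        return self._index[name] + 1

    def lookup(self, name):
        i = self._index.get(name)
        return None if i is None else self._records[i]

    def entry(self, number):
        return self._records[number - 1]


st = SymbolTable()
print(st.insert("position"), st.insert("initial"), st.insert("position"))
st.lookup("position").attrs["type"] = "float"  # e.g. added by the semantic analyzer
print(st.entry(1))
```

The scanner would call insert, and later phases would read or extend the attributes of the record returned by lookup or entry.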



Error Handler
Each phase can encounter errors.
After detecting an error, a phase must deal with that error.
A large fraction of errors is detected during the syntax and semantic analysis phases.
- Lexical phase error: the characters remaining in the input do not form any token
- Syntax phase error: the token stream violates the syntax of the language
- Semantic analysis phase error: the syntactic structure is correct, but the operations involved have no meaning
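One simple way for the lexical phase to deal with an error is panic-mode recovery: report the character that cannot start any token, skip it, and continue scanning, so that later errors are still found in the same run. The token pattern and the function name below are assumptions of this sketch.

```python
import re

# Hypothetical token classes: numbers, identifiers and a few operators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[=+*;]")


def scan_with_recovery(source):
    """Return (tokens, errors); characters that start no token are reported and skipped."""
    tokens, errors, pos = [], [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        m = TOKEN.match(source, pos)
        if m is None:
            errors.append(f"lexical error at column {pos + 1}: unexpected {source[pos]!r}")
            pos += 1  # skip the offending character and resume scanning
            continue
        tokens.append(m.group())
        pos = m.end()
    return tokens, errors


toks, errs = scan_with_recovery("x = 3 $ + y # 2")
print(toks)
print(errs)
```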
COMPILER CONSTRUCTION TOOLS
Parser generators
Produce syntax analyzers from input
based on a context-free grammar (CFG)
Use powerful parsing algorithms,
which are difficult to carry out
manually
Scanner generators
Produce lexical analyzers from a
specification based on regular
expressions
Use finite automata
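A scanner generator turns regular expressions into a finite automaton. The table below is a hand-built sketch of the kind of DFA such a tool might produce for two hypothetical token classes, id = letter (letter | digit)* and num = digit+; the driver returns the longest prefix that ends in an accepting state.

```python
# DFA states for  id = letter (letter | digit)*  and  num = digit+  (hypothetical tokens).
START, IN_ID, IN_NUM = 0, 1, 2
ACCEPTING = {IN_ID: "id", IN_NUM: "num"}

# Transition table: (state, character class) -> next state; missing entries mean "stop".
DELTA = {
    (START, "letter"): IN_ID,
    (START, "digit"): IN_NUM,
    (IN_ID, "letter"): IN_ID,
    (IN_ID, "digit"): IN_ID,
    (IN_NUM, "digit"): IN_NUM,
}


def char_class(c):
    if c.isascii() and (c.isalpha() or c == "_"):
        return "letter"
    if c.isascii() and c.isdigit():
        return "digit"
    return "other"


def longest_match(text, pos):
    """Run the DFA from pos; return (token_kind, lexeme) for the longest accepted prefix, or None."""
    state, last_accept, i = START, None, pos
    while i < len(text):
        state = DELTA.get((state, char_class(text[i])))
        if state is None:
            break  # no transition: the automaton stops
        i += 1
        if state in ACCEPTING:
            last_accept = (ACCEPTING[state], text[pos:i])
    return last_accept


print(longest_match("rate*60", 0), longest_match("rate*60", 5))
```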
Syntax-Directed Translation Engines
Produce collections of routines that
traverse the parse tree, finally
yielding intermediate code
Basic idea:
One or more translations are associated
with each node of the parse tree
Each translation is defined in terms of
translations at its neighbour nodes in the tree
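A classic example of such a translation is converting an infix expression tree into postfix notation: the translation of each operator node is the translation of its left child, then that of its right child, then the operator. The tuple node shapes and the function name below are hypothetical.

```python
def postfix(node):
    """Syntax-directed translation: each node's translation (a postfix string)
    is defined from the translations of its children."""
    if node[0] == "id":
        return node[1]
    if node[0] == "num":
        return str(node[1])
    op, left, right = node
    # Rule for  E -> E1 op E2 :  E.t = E1.t || E2.t || op
    return f"{postfix(left)} {postfix(right)} {op}"


print(postfix(("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60)))))
```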
Automatic Code Generators
Produce the machine language
Basic technique: template matching
Intermediate code statements are replaced by
templates that represent
sequences of machine instructions
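Template matching can be sketched as a table from intermediate-code operators to instruction templates whose operands are filled in by string formatting. The instruction names LD, ST, ADD and MUL and the fixed use of R1 and R2 are hypothetical; a real generator would also allocate registers and choose among alternative templates.

```python
# Hypothetical templates: one list of target instructions per intermediate operator.
TEMPLATES = {
    "+": ["LD R1, {a1}", "LD R2, {a2}", "ADD R1, R1, R2", "ST {dest}, R1"],
    "*": ["LD R1, {a1}", "LD R2, {a2}", "MUL R1, R1, R2", "ST {dest}, R1"],
    "copy": ["LD R1, {a1}", "ST {dest}, R1"],
}


def expand(tac):
    """Replace each (dest, op, a1, a2) statement by its template, filling in operands."""
    out = []
    for dest, op, a1, a2 in tac:
        for line in TEMPLATES[op]:
            out.append(line.format(dest=dest, a1=a1, a2=a2))
    return out


for line in expand([("t1", "*", "rate", "k"), ("x", "copy", "t1", None)]):
    print(line)
```

The output also shows why naive expansion benefits from optimization: t1 is stored and then immediately reloaded.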
Data-Flow Engines
Code optimization involves data-flow
analysis:
how values are transmitted from one
part of a program to another
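Liveness analysis is one such data-flow problem: it works backward through the code to find which variables still hold a value that a later instruction will use. The sketch below handles a single basic block (no branches), with instructions written as hypothetical (dest, operands) pairs; an assignment whose destination is not live afterwards is reported as dead.

```python
def live_before(block, live_out):
    """Backward liveness over one basic block of (dest, [operands]) instructions.
    Returns the live set before each instruction and the indexes of dead assignments."""
    live = set(live_out)
    before = [None] * len(block)
    dead = []
    for i in range(len(block) - 1, -1, -1):
        dest, uses = block[i]
        if dest not in live:
            dead.append(i)  # the value assigned here never reaches a use
        # live_in = uses ∪ (live_out - {dest})
        live = (live - {dest}) | set(uses)
        before[i] = set(live)
    return before, sorted(dead)


block = [
    ("t1", ["rate"]),
    ("t2", ["initial", "t1"]),
    ("t3", ["t1"]),
    ("position", ["t2"]),
]
print(live_before(block, {"position"}))
```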
