
Compilation Techniques

Lexical Analysis (Scanning)

Sulistyo Pusptodjati

Source: Compiler Construction by Vana Doufexi
users.ece.northwestern.edu/~boz283/cs-322-original

Quiz
subject: kuis pretest1_Kelas_Nama_NPM (Kelas = class, Nama = name, NPM = student ID number)

1. A compiler is a kind of:
a. interpreter b. translator
2. A translator that plays no part in generating machine code:
a. compiler b. interpreter
3. Produces an output program (as an exe) that can be run
separately from the original program:
a. compiler b. interpreter
4. An interpreter is better suited to web programming than
a compiler:
a. true b. false
5. Which use a compiler and which an interpreter: C, Ruby, C++,
Java, PHP?

Answers:
4IA16:
https://drive.google.com/drive/folders/14mvtPQtP4CckBu6CIcNwfJwoXVPA-4s3?usp=sharing

4IA09:
https://drive.google.com/drive/folders/1W1l-2wgwRTE9puRhhQREFnlqXn3rztMe?usp=sharing

4IA11:
https://drive.google.com/drive/folders/1d2kk81A9XLBJnQfr88kUlmEK5-RQaUO_?usp=sharing

4IA07:
https://drive.google.com/drive/folders/1EkI9wENGGY1uFCr

The Compilation Process

Source program with macros
  ↓ Preprocessor
Source program
  ↓ Compiler
Target assembly program
  ↓ Assembler
Relocatable machine code
  ↓ Linker
Absolute machine code

Try g++ with the -v, -E, -S flags on linprog.

Quiz 2
1. A compiler is ...
2. A two-phase compiler consists of the ... phase and the ... phase
3. The compiler front end consists of three stages: 1) ..., 2) ...,
3) ...
4. The output of the compiler's front-end pass is ...
5. The compiler back end performs ...

Compiler Front- and Back-end Pass

Front end (analysis):
Source program (character stream)
  ↓ Scanner (lexical analysis)
Tokens
  ↓ Parser (syntax analysis)
Parse tree
  ↓ Semantic Analysis and Intermediate Code Generation
Abstract syntax tree or other intermediate form

Back end (synthesis):
Abstract syntax tree or other intermediate form
  ↓ Machine-Independent Code Improvement
Modified intermediate form
  ↓ Target Code Generation
Assembly or object code
  ↓ Machine-Specific Code Improvement
Modified assembly or object code

[Figure: example of the compilation process]

The Scanning Process (Lexical Analysis)
• Main goal: recognize words (tokens)
• How? By recognizing patterns
  - Example: an identifier is a sequence of letters or digits that
    begins with a letter
• Lexical patterns form a regular language
• Regular languages can be described using
  regular expressions (REs)
• Can an RE recognizer be automated?
  - Yes!
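
The identifier pattern above can be tried out directly as a regular expression. This is only a minimal sketch using Python's standard `re` module. The names `IDENTIFIER` and `is_identifier` are illustrative, and letters are assumed to be ASCII only.

```python
import re

# A letter followed by any number of letters or digits (ASCII only, by assumption).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(word: str) -> bool:
    """Return True when the whole word matches the identifier pattern."""
    return IDENTIFIER.fullmatch(word) is not None


if __name__ == "__main__":
    for word in ["x", "count2", "2count", "a_b"]:
        print(word, is_identifier(word))
```

`fullmatch` is used rather than `match` so that a word such as `a_b` is rejected as a whole instead of being accepted on its prefix `a`.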

The scanning process
• Goal: automate the process
• Idea:
  - Start with an RE or a regular definition (RD)
  - Build a DFA
• How?
  - We can build a non-deterministic finite automaton
    (Thompson's construction)
  - Convert that to a deterministic one
    (subset construction)
  - Minimize the DFA
    (Hopcroft's algorithm)
  - Implement it
• Existing scanner generators: flex, lex

The Scanning Process
• Definition: Regular expressions (over an alphabet Σ)
  - ε is an RE denoting {ε}
  - If a ∈ Σ, then a is an RE denoting {a}
  - If r and s are REs, then
    - (r) is an RE denoting L(r)
    - r|s is an RE denoting L(r) ∪ L(s)
    - rs is an RE denoting L(r)L(s)
    - r* is an RE denoting the Kleene closure of L(r)
• Property: REs are closed under many operations
  - This allows us to build complex REs.

Examples:
{ab}* = ∪ {ab}^i, i >= 0
{ab, b}* = {ε, ab, b, abab, abb, ababab, ...}
{ab, b}{ab, b} = {abab, abb, bab, bb} = {ab, b}^2
{ab, b}* = {..., bbbbbbbb, ...}
letter → a|...|z|A|...|Z
Token_if → letter letter, which matches e.g. ab, cd, ci, ic
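
The set examples above can be checked mechanically. The sketch below models a language as a Python set of strings. The helper names `concat`, `power` and `kleene_upto` are made up for illustration. Since L* is infinite, `kleene_upto` only computes the union of the powers up to a chosen bound.

```python
from itertools import product


def concat(L1, L2):
    """Concatenation L1 L2: every string of L1 followed by every string of L2."""
    return {x + y for x, y in product(L1, L2)}


def power(L, n):
    """L^n: L concatenated with itself n times; L^0 is {""} (only the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, L)
    return result


def kleene_upto(L, max_power):
    """Finite approximation of L*: the union of L^0 .. L^max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= power(L, i)
    return out


L = {"ab", "b"}
print(sorted(power(L, 2)))               # ['abab', 'abb', 'bab', 'bb']
print("bbbbbbbb" in kleene_upto(L, 8))   # True
```

The empty string ε is represented by `""`, which is why `power(L, 0)` returns `{""}` rather than the empty set.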

Regular Definitions
• A regular expression that describes digits is:
  0|1|2|3|4|5|6|7|8|9
• For convenience, we'd like to give it a name and then
  use the name in building more complex regular
  expressions:
  digit → 0|1|2|3|4|5|6|7|8|9
• This is called a regular definition.
• Examples
  - Integer → 0 | ((1|2|3|...|9) digit*)
  - letter → a|...|z|A|...|Z
  - ident → letter (letter | digit)*
  - Token_if → if
  - Token_Then → (t|T)(h|H)(e|E)(n|N), which matches e.g. tHeN

• With digit, letter and ident defined as above, ident matches
  for example:
  bEtE2
  bE2tE
• An integer with an optional sign:
  Integer → (+|-)? (0 | ((1|2|3|...|9) digit*))
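
Regular definitions map naturally onto named pattern fragments that are composed into larger patterns. The sketch below does this with Python's `re` module. The constant names and the helper `matches` are illustrative, and the character classes are just a compact way of writing the alternations above.

```python
import re

# Regular definitions written as Python regex fragments.
DIGIT = r"[0-9]"                                # digit  -> 0|1|...|9
LETTER = r"[A-Za-z]"                            # letter -> a|...|z|A|...|Z
IDENT = rf"{LETTER}(?:{LETTER}|{DIGIT})*"       # ident  -> letter (letter | digit)*
INTEGER = rf"[+-]?(?:0|[1-9]{DIGIT}*)"          # optional sign, no leading zeros
TOKEN_THEN = r"[tT][hH][eE][nN]"                # case-insensitive "then"


def matches(pattern: str, text: str) -> bool:
    """True when the entire text is described by the pattern."""
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    for text in ["bEtE2", "bE2tE", "2bE"]:
        print(text, matches(IDENT, text))
    for text in ["0", "-120", "007"]:
        print(text, matches(INTEGER, text))
```

Note the parentheses around the whole `0 | nonzero digit*` alternative in `INTEGER`. Without them the optional sign would bind only to `0`.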

What's next
• Given an input string, we need a "machine" that has
  a regular expression hard-coded in it and can tell
  whether the input string matches the pattern
  described by the regular expression or not.
• A machine that determines whether a given string
  belongs to a regular language is called a finite automaton.

The scanning process
• Definition: Deterministic Finite Automaton (DFA)
  - a five-tuple (Σ, S, δ, s0, F) where
    - Σ is the alphabet
    - S is the set of states
    - δ is the transition function (S × Σ → S)
    - s0 is the starting state
    - F is the set of final states (F ⊆ S)
• Notation:
  - Use a transition diagram to describe a DFA
• DFAs are equivalent to REs
  - Hey! We just came up with a recognizer!
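
The five-tuple can be written down almost literally as a data structure. The sketch below is one possible encoding, not a prescribed one. The class `DFA`, its field names and the two-state identifier automaton `ident_dfa` are assumptions made for illustration. A missing entry in `delta` is treated as an implicit error state.

```python
import string
from dataclasses import dataclass


@dataclass
class DFA:
    alphabet: frozenset   # Σ
    states: frozenset     # S
    delta: dict           # δ as a table: (state, symbol) -> state
    start: str            # s0
    finals: frozenset     # F, a subset of S

    def accepts(self, text: str) -> bool:
        """Run the automaton over text and report whether it ends in a final state."""
        state = self.start
        for ch in text:
            key = (state, ch)
            if key not in self.delta:   # no transition: implicit error state
                return False
            state = self.delta[key]
        return state in self.finals


# DFA for identifiers: s0 --letter--> s1, s1 --letter or digit--> s1.
LETTERS = set(string.ascii_letters)
DIGITS = set(string.digits)
delta = {("s0", ch): "s1" for ch in LETTERS}
delta.update({("s1", ch): "s1" for ch in LETTERS | DIGITS})

ident_dfa = DFA(
    alphabet=frozenset(LETTERS | DIGITS),
    states=frozenset({"s0", "s1"}),
    delta=delta,
    start="s0",
    finals=frozenset({"s1"}),
)

if __name__ == "__main__":
    for word in ["bEtE2", "2b", ""]:
        print(repr(word), ident_dfa.accepts(word))
```

Recognition takes one table lookup per input character, which is why scanners are usually implemented as DFAs.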

The scanning process
• Main goal: recognize words/tokens
• Snapshot:
  - At any point in time, the scanner has read some input and is
    on the way to identifying what kind of token has been read
    (e.g. identifier, operator, integer literal, etc.)
  - Once the scanner identifies a token, it sends it off to the
    parser and starts over with the next word.
    - Some tokens need additional data to be carried along
      with them
    - For example, an identifier token needs to have the
      identifier itself attached to it.
  - Alternatively, the scanner generates a file of tokens which is
    then input to the parser.
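
As a rough illustration of tokens that carry extra data, the sketch below yields `Token` objects one at a time, the way a scanner would hand them to a parser. The token kinds, the `TOKEN_SPEC` table and the `scan` function are hypothetical names chosen for this example.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str                        # e.g. "IDENT", "INTEGER", "LPAREN"
    value: Optional[object] = None   # extra data carried along with the token


TOKEN_SPEC = [
    ("INTEGER", r"[0-9]+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),                # whitespace: recognized but not reported
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(source: str) -> Iterator[Token]:
    """Yield tokens one by one; the scanner starts over after each token."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "SKIP":
            continue
        if kind == "INTEGER":
            yield Token(kind, int(m.group()))   # attach the numeric value
        elif kind == "IDENT":
            yield Token(kind, m.group())        # attach the identifier itself
        else:
            yield Token(kind)                   # no extra data needed


if __name__ == "__main__":
    for token in scan("f(x1 42)"):
        print(token)
```

Writing the yielded tokens to a file instead of consuming them directly would give the alternative arrangement mentioned in the last bullet.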
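The scanning process
• A simple hand-written scanner would look a bit like this:

int value;
int nextchar = getNextChar();
switch (nextchar) {
case '(':
    return LPAREN;               /* return LPAREN token */
case '0': case '1': case '2': case '3': case '4':
case '5': case '6': case '7': case '8': case '9':
    value = nextchar - '0';
    nextchar = getNextChar();
    while (isdigit(nextchar)) {  /* isdigit() comes from <ctype.h> */
        value = value * 10 + (nextchar - '0');  /* concat the digits to build an integer */
        nextchar = getNextChar();
    }
    putBack(nextchar);           /* the first non-digit belongs to the next token */
    tokenValue = value;          /* tokenValue: hypothetical variable that attaches the value to the INTEGER token */
    return INTEGER;
/* ... other cases ... */
}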
The scanning process
• Not always as simple as it seems
  - Example from old versions of FORTRAN:

    DO 5 I=1,10
    vs.
    DO 5 I=1.10

    Blanks were not significant, so the first line starts a DO loop
    while the second assigns 1.10 to a variable named DO5I. The
    scanner cannot tell which until it reaches the comma or the period.
• Instead of writing a scanner by hand, we can
  automate the process.
  - Specify what needs to be recognized and what to do when
    something is recognized.
  - Have a scanner generator create the scanner based on our
    specification.
• Hand-written vs. automated scanner
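
The FORTRAN example shows why a scanner sometimes needs to look far ahead: with blanks ignored, both statements begin with the same characters `DO5I=1`. The toy sketch below makes the decision by inspecting the whole statement. The function `classify` and its two patterns are simplifications invented for this illustration and ignore parentheses, strings and most of the real language.

```python
import re


def classify(stmt: str) -> str:
    """Classify a statement as 'DO_LOOP' or 'ASSIGNMENT' (toy rule only)."""
    text = stmt.replace(" ", "").upper()   # blanks are not significant
    # DO <label> <var> = <expr> , <expr>: the comma is what marks a loop.
    if re.fullmatch(r"DO[0-9]+[A-Z][A-Z0-9]*=[^,]+,.+", text):
        return "DO_LOOP"
    # <var> = <expr>: without a comma, DO5I is just a variable name.
    if re.fullmatch(r"[A-Z][A-Z0-9]*=.+", text):
        return "ASSIGNMENT"
    return "UNKNOWN"


if __name__ == "__main__":
    print(classify("DO 5 I=1,10"))   # DO_LOOP
    print(classify("DO 5 I=1.10"))   # ASSIGNMENT
```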

The scanning process
• Goal: automate the process
• Idea:
  - Start with an RE
  - Build a DFA
• How?
  - We can build a non-deterministic finite automaton, an NFA
    (Thompson's construction)
  - Convert that to a deterministic one, a DFA
    (subset construction)
  - Minimize the DFA
    (Hopcroft's algorithm)
  - Implement it
• Existing scanner generators: lex, flex, etc.
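
Of the steps listed, the NFA-to-DFA conversion is the easiest to show in a few lines. The sketch below covers only the subset construction. Thompson's construction and Hopcroft's minimization are not shown. The NFA encoding (a dict from `(state, symbol)` to a set of states, with `None` as the ε label) and all function names are assumptions of this example. The sample NFA was written by hand for the language (a|b)*abb.

```python
from collections import deque

EPS = None  # label used for epsilon transitions (a convention of this sketch)


def eps_closure(states, nfa):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(nfa, start, finals, alphabet):
    """Return (delta, dfa_start, dfa_finals); each DFA state is a frozenset of NFA states."""
    dfa_start = eps_closure({start}, nfa)
    delta, seen, work = {}, {dfa_start}, deque([dfa_start])
    while work:
        S = work.popleft()
        for a in alphabet:
            moved = set()
            for s in S:
                moved |= set(nfa.get((s, a), ()))
            if not moved:
                continue                      # no transition on this symbol
            T = eps_closure(moved, nfa)
            delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                work.append(T)
    dfa_finals = {S for S in seen if S & set(finals)}
    return delta, dfa_start, dfa_finals


def dfa_accepts(delta, start, finals, text):
    """Simulate the constructed DFA on text."""
    state = start
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals


# Hand-written NFA for (a|b)*abb, with one epsilon move from "s" to 0.
nfa = {
    ("s", EPS): {0},
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
delta, dfa_start, dfa_finals = subset_construction(nfa, "s", {3}, "ab")
print(dfa_accepts(delta, dfa_start, dfa_finals, "aabb"))   # True
print(dfa_accepts(delta, dfa_start, dfa_finals, "abba"))   # False
```

Each DFA state stands for the set of NFA states the NFA could be in after reading the same input, so the DFA never has to guess.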

Scanner generator: Lex
• Lex source is a table of
  - regular expressions and
  - corresponding program fragments

digit [0-9]
letter [a-zA-Z]
%%
{letter}({letter}|{digit})* printf("id: %s\n", yytext);
\n printf("new line\n");
%%
int main(void) {
    yylex();
    return 0;
}
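
To make the "table of regular expressions and program fragments" idea concrete without running Lex, the sketch below imitates the same two rules in Python. It is not how Lex is implemented. `make_scanner` and the lambda actions are invented for this illustration. It follows Lex's conventions that the longest match wins, that the earlier rule wins a tie, and that unmatched input is copied through unchanged.

```python
import re


def make_scanner(rules):
    """rules: list of (regex, action); each action receives the matched text (yytext)."""
    compiled = [(re.compile(rx), action) for rx, action in rules]

    def scan(text):
        out, pos = [], 0
        while pos < len(text):
            best, best_action = None, None
            for rx, action in compiled:
                m = rx.match(text, pos)
                # Longest match wins; on a tie the earlier rule is kept.
                if m and m.end() > pos and (best is None or m.end() > best.end()):
                    best, best_action = m, action
            if best is None:
                out.append(text[pos])   # default rule: copy unmatched input through
                pos += 1
            else:
                result = best_action(best.group())
                if result is not None:
                    out.append(result)
                pos = best.end()
        return out

    return scan


DIGIT = "[0-9]"
LETTER = "[a-zA-Z]"
rules = [
    (f"{LETTER}({LETTER}|{DIGIT})*", lambda yytext: f"id: {yytext}"),
    (r"\n", lambda yytext: "new line"),
]
scan = make_scanner(rules)
print(scan("abc x1\n"))   # ['id: abc', ' ', 'id: x1', 'new line']
```

A real scanner generator compiles all rules into a single DFA ahead of time instead of trying each pattern in turn at every position.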

Lex Source
• Lex source is separated into three sections by %% delimiters
• The general format of Lex source is

{definitions}
%% (required)
{transition rules}
%% (optional)
{user subroutines}

• The absolute minimum Lex program is thus

%%

Example for a "Tiny" language

Assignment
• Read this presentation
• Form groups of no more than 5 for the next assignment

Two Learning Modes for Compilation Techniques

Mode 1: attend class, do the assignments, and take the midterm exam (UTS); the midterm task is to build a compiler.
Mode 2: build a compiler on your own; no need to attend every week (gmeet, vclass)