
July 2011

Master of Computer Application (MCA) - Semester 3


MC0073 System Programming - 4 credits
Assignment Set 1

1) Explain the following:

A) Lexical Analysis


In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer, or scanner. A lexer often exists as a single function which is called by a parser or another function.
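As a sketch of this character-to-token conversion, the following C fragment scans one token per call. It is a toy example: the token names, the 32-character limit and the single-function structure are illustrative, not taken from any particular compiler.

```c
#include <ctype.h>
#include <string.h>

/* Token kinds for a toy language: identifiers, integers, single-char operators. */
enum kind { TOK_ID, TOK_NUM, TOK_OP, TOK_END };

struct token { enum kind k; char text[32]; };

/* Scan one token from *src, advancing the cursor; returns TOK_END at '\0'. */
struct token next_token(const char **src)
{
    struct token t = { TOK_END, "" };
    const char *p = *src;
    size_t n = 0;
    while (isspace((unsigned char)*p)) p++;      /* skip non-significant blanks */
    if (*p == '\0') {
        t.k = TOK_END;
    } else if (isalpha((unsigned char)*p)) {     /* identifier: letter (letter|digit)* */
        t.k = TOK_ID;
        while (isalnum((unsigned char)*p) && n < 31) t.text[n++] = *p++;
    } else if (isdigit((unsigned char)*p)) {     /* integer literal */
        t.k = TOK_NUM;
        while (isdigit((unsigned char)*p) && n < 31) t.text[n++] = *p++;
    } else {                                     /* single-character operator/punctuation */
        t.k = TOK_OP;
        t.text[n++] = *p++;
    }
    t.text[n] = '\0';
    *src = p;
    return t;
}
```

A parser would call next_token repeatedly, exactly as the text above describes a lexer being "called by a parser or another function".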



B) Syntax Analysis

Unlike other aspects of the compiler, the syntax analysis parts are not very separable, since they are mixed up with calls to all other parts, such as semantic analysis.
However, the method used is that commonly known as recursive descent. This will not be treated in great detail here - consult any book on compiler theory for details.
The method depends on writing a separate parsing procedure for each kind of syntactic structure, such as if statement, assignment statement, expression and so on, and each of these is only responsible for analysing its own kind of structure. If any structure contains another structure, then the parsing procedure can call the procedure for this contained structure.
As an example, consider the procedure ifstatement. Eliminating all but the syntax analysis parts leaves:
procedure ifstatement;
begin
  expression;
  if sy = thensy then insymbol
  else error(52);
  statement;
  if sy = elsesy then
  begin
    insymbol; statement
  end
end;

2) What is RISC and how is it different from CISC?

Ans 2:

CISC:
A Complex Instruction Set Computer (CISC) supplies a large number of complex instructions at the assembly language level. Assembly language is a low-level computer programming language in which each statement corresponds to a single machine instruction. CISC instructions facilitate the extensive manipulation of low-level computational elements and events such as memory, binary arithmetic, and addressing. The goal of the CISC architectural philosophy is to make microprocessors easy and flexible to program and to provide for more efficient memory use.
The CISC philosophy was unquestioned during the 1960s, when early computing machines such as the popular Digital Equipment Corporation PDP-11 family of minicomputers were being programmed in assembly language and memory was slow and expensive.
CISC machines merely used the then-available technologies to optimize computer performance. Their advantages included the following: (1) A new processor design could incorporate the instruction set of its predecessor as a subset of an ever-growing language - no need to reinvent the wheel, code-wise, with each design cycle. (2) Fewer instructions were needed to implement a particular computing task, which led to lower memory use for program storage and fewer time-consuming instruction fetches from memory. (3) Simpler compilers sufficed, as complex CISC instructions could be written that closely resembled the instructions of high-level languages. In effect, CISC made a computer's assembly language more like a high-level language to begin with, leaving the compiler less to do.
The terms CISC and RISC (Reduced Instruction Set Computer) were coined at this time to reflect the widening split in computer-architectural philosophy.
RISC:
The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU design philosophy that favors a simpler set of instructions that all take about the same amount of time to execute. The most common RISC microprocessors are AVR, PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS and IBM's PowerPC.

RISC characteristics:
Small number of machine instructions: less than 150
Small number of addressing modes: less than 4
Small number of instruction formats: less than 4
Instructions of the same length: 32 bits (or 64 bits)
Single-cycle execution
Load/Store architecture
Large number of GPRs (General Purpose Registers): more than 32
Hardwired control
Support for HLL (High Level Language)
RISC vs CISC:

CISC                                      RISC
Emphasis on hardware                      Emphasis on software
Includes multi-clock                      Single-clock,
complex instructions                      reduced instructions only
Memory-to-memory: LOAD and STORE          Register-to-register: LOAD and STORE
incorporated in instructions              are independent instructions
Small code sizes,                         Low cycles per second,
high cycles per second                    large code sizes
Transistors used for storing              Spends more transistors
complex instructions                      on memory registers


3) Explain the following with respect to the design specifications of an Assembler:
A) Data Structures  B) Pass 1 & Pass 2 Assembler flow chart

Ans 3:

Data Structures
The second step in our design procedure is to establish the databases that we have to work with.


Pass 1 Data Structures
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.
6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assignment location.
7. A copy of the input to be used by pass 2.

Pass 2 Data Structures
1. Copy of source program input to pass 1.
2. Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates for each instruction: symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and format of instruction.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A work space, INST, that is used to hold each instruction as its various parts are being assembled together.
8. A work space, PRINT LINE, used to produce a printed listing.
9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.

Format of Data Structures


The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine operation table (MOT) containing the name, length, binary code and format; pass 1 requires only name and length. Instead of using two different tables, we construct a single MOT. The Machine-Operation Table (MOT) and Pseudo-Operation Table are examples of fixed tables. The contents of these tables are not filled in or altered during the assembly process.
The following figure depicts the format of the Machine-Op Table (MOT):

------------------------ 6 bytes per entry ------------------------
Mnemonic opcode   Binary opcode   Instruction   Instruction   Not used
(4 bytes)         (1 byte)        length        format        here
(characters)      (hexadecimal)   (2 bits)      (3 bits)      (3 bits)
                                  (binary)      (binary)
"Abbb"            5A              10            001
"AHbb"            4A              10            001
"ALbb"            5E              10            001
"ALRb"            1E              01            000
...               ...             ...           ...

'b' represents "blank"
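Because the MOT is a fixed table, it can be represented directly as a constant array of records. The C sketch below is a hypothetical rendering of the table above; the field names, the choice of a linear search, and the use of padded 4-character mnemonics are illustrative, not a prescribed assembler layout.

```c
#include <string.h>

/* One fixed MOT entry, following the layout sketched above (names hypothetical). */
struct mot_entry {
    char mnemonic[5];        /* 4 characters, blank-padded, plus NUL */
    unsigned char opcode;    /* binary machine opcode, one byte */
    unsigned char length;    /* instruction length in half-words */
    unsigned char format;    /* instruction format code */
};

static const struct mot_entry mot[] = {
    { "A   ", 0x5A, 2, 1 },
    { "AH  ", 0x4A, 2, 1 },
    { "AL  ", 0x5E, 2, 1 },
    { "ALR ", 0x1E, 1, 0 },
};

/* Linear search of the fixed table; since the MOT is sorted and never
   changes during assembly, a real assembler could use binary search. */
const struct mot_entry *mot_lookup(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof mot / sizeof mot[0]; i++)
        if (strncmp(mot[i].mnemonic, mnemonic, 4) == 0)
            return &mot[i];
    return 0;
}
```

Both passes can share this one table, as the text suggests: pass 1 reads only the length field, pass 2 also reads the opcode and format.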


The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbol names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; however, others must be inferred.
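A minimal sketch of how pass 1 infers those addresses: a location counter advances by each instruction's length, and any label on a statement is entered into the symbol table with the current counter value. The line format, field widths and table sizes below are hypothetical simplifications.

```c
#include <string.h>

/* Toy symbol table (ST): each entry pairs a label with its LC value. */
struct st_entry { char label[9]; int value; };
struct symtab { struct st_entry e[64]; int n; };

/* Process one statement in pass 1: record the label (if any) at the
   current location counter, then advance LC by the instruction length. */
void pass1_define(struct symtab *st, const char *label, int *lc, int length)
{
    if (label && *label) {                    /* a labelled statement */
        strncpy(st->e[st->n].label, label, 8);
        st->e[st->n].label[8] = '\0';
        st->e[st->n].value = *lc;             /* label gets the current LC */
        st->n++;
    }
    *lc += length;                            /* LC advances past this instruction */
}

/* Look a label up; returns its value, or -1 if undefined. */
int st_lookup(const struct symtab *st, const char *label)
{
    for (int i = 0; i < st->n; i++)
        if (strcmp(st->e[i].label, label) == 0)
            return st->e[i].value;
    return -1;
}
```

Pass 2 then consults the completed table to resolve forward references that pass 1 could not evaluate directly.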




4) Define the following:
A) Parsing
B) Scanning
C) Token

Ans 4:

Parsing:

Parsing transforms input text or a string into a data structure, usually a tree, which is suitable for later processing and which captures the implied hierarchy of the input. Lexical analysis creates tokens from a sequence of input characters, and it is these tokens that are processed by a parser to build a data structure such as a parse tree or an abstract syntax tree.
Conceptually, the parser accepts a sequence of tokens and produces a parse tree. In practice this might not occur.
1. The source program might have errors. Shamefully, we will do very little error handling.
2. Real compilers produce (abstract) syntax trees, not parse trees (concrete syntax trees). We don't do this, for the pedagogical reasons given previously.
There are three classes of grammar-based parsers:
1. Universal
2. Top-down
3. Bottom-up
The universal parsers are not used in practice as they are inefficient; we will not discuss them.







Scanning and token:
There are three phases of analysis, with the output of one phase the input of the next. Each of these phases changes the representation of the program being compiled. The phases are called lexical analysis or scanning, which transforms the program from a string of characters to a string of tokens; Syntax Analysis or Parsing, which transforms the program into some kind of syntax tree; and Semantic Analysis, which decorates the tree with semantic information.
The character stream input is grouped into meaningful units called lexemes, which are then mapped into tokens, the latter constituting the output of the lexical analyzer.
For example, any one of the following C statements

x3 = y + 3;
x3 = y + 3 ;
x3 = y+ 3 ;

but not

x 3 = y + 3;

would be grouped into the lexemes x3, =, y, +, 3, and ;. A token is a <token-name, attribute-value> pair. The hierarchical decomposition of the above sentence is given in figure 1.10.

A token is a <token-name, attribute-value> pair.
For example:
1. The lexeme x3 would be mapped to a token such as <id,1>. The name id is short for identifier. The value 1 is the index of the entry for x3 in the symbol table produced by the compiler. This table is used to gather information about the identifiers and to pass this information to subsequent phases.
2. The lexeme = would be mapped to the token <=>. In reality it is probably mapped to a pair whose second component is ignored. The point is that there are many different identifiers, so we need the second component, but there is only one assignment symbol.
3. The lexeme y is mapped to the token <id,2>.
4. The lexeme + is mapped to the token <+>.
5. The number 3 is mapped to <number, something>, but what is the something? On the one hand there is only one 3, so we could just use the token <number,3>. However, there can be a difference between how this should be printed (e.g. in an error message produced by subsequent phases) and how it should be stored (fixed vs. float vs. double). Perhaps the token should point to the symbol table where an entry for this kind of 3 is stored. Another possibility is to have a separate numbers table.
6. The lexeme ; is mapped to the token <;>.
Note: non-significant blanks are normally removed during scanning. In C, most blanks are non-significant. That does not mean the blanks are unnecessary; consider:
int x;
intx;
Note that we can define identifiers, numbers, and the various symbols and punctuation without using recursion (compare with parsing below).
Parsing involves a further grouping in which tokens are grouped into grammatical phrases, which are often represented in a parse tree.
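The mapping of identifier lexemes to <token-name, attribute-value> pairs can be sketched in C as follows. This illustrative fragment interns each identifier in a small symbol table; note the indices here are 0-based, unlike the 1-based indices in the worked example above, and all names are hypothetical.

```c
#include <string.h>

/* A token as a <token-name, attribute-value> pair. For identifiers the
   attribute is the symbol-table index; for operators it is unused (0). */
enum name { ID, ASSIGN, PLUS, NUMBER, SEMI };
struct tok { enum name name; int attr; };

static char symtab[16][32];   /* tiny fixed-size symbol table */
static int nsyms;

/* Intern a lexeme: return its existing index, or add it and return the new one. */
static int intern(const char *lexeme)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(symtab[i], lexeme) == 0) return i;
    strcpy(symtab[nsyms], lexeme);
    return nsyms++;
}

/* Build the <id, index> token for an identifier lexeme. */
struct tok make_id(const char *lexeme)
{
    struct tok t = { ID, intern(lexeme) };
    return t;
}
```

Repeated occurrences of the same identifier map to the same table entry, which is exactly how later phases gather information about each identifier.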




5) Describe the process of Bootstrapping in the context of Linkers.

Ans 5:

Bootstrapping:
In computing, bootstrapping refers to a process where a simple system activates another, more complicated system that serves the same purpose. It is a solution to the chicken-and-egg problem of starting a certain system without the system already functioning. The term is most often applied to the process of starting up a computer, in which a mechanism is needed to execute the software program that is responsible for executing software programs (the operating system).
Bootstrap loading:
The discussions of loading up to this point have all presumed that there's already an operating system or at least a program loader resident in the computer to load the program of interest. The chain of programs being loaded by other programs has to start somewhere, so the obvious question is: how is the first program loaded into the computer?
Many Unix systems use a similar bootstrap process to get user-mode programs running. The kernel creates a process, then stuffs a tiny little program, only a few dozen bytes long, into that process. The tiny program executes a system call that runs /etc/init, the user-mode initialization program that in turn runs configuration files and starts the daemons and login programs that a running system needs.
Software Bootstrapping & Compiler Bootstrapping:
Bootstrapping can also refer to the development of successively more complex, faster programming environments. The simplest environment will be, perhaps, a very basic text editor (e.g. ed) and an assembler program. Using these tools, one can write a more complex text editor, and a simple compiler for a higher-level language, and so on, until one can have a graphical IDE and an extremely high-level programming language.

Compiler Bootstrapping:
In compiler design, a bootstrap or bootstrapping compiler is a compiler that is written in the target language, or a subset of the language, that it compiles. Examples include GCC, GHC, OCaml, BASIC, PL/I and, more recently, the Mono C# compiler.



6) Describe the procedure for design of a Linker.

Ans 6:

Design of a linker

Relocation and linking requirements in segmented addressing
The relocation requirements of a program are influenced by the addressing structure of the computer system on which it is to execute. Use of the segmented addressing structure reduces the relocation requirements of a program.

A Linker for MS-DOS
Example: Consider a program written in the assembly language of the Intel 8088. The ASSUME statement declares the segment registers CS and DS to be available for memory addressing. Hence all memory addressing is performed by using suitable displacements from their contents. The translation time address of A is 0196. In statement 1, a reference to A is assembled as a displacement of 0196 from the contents of the CS register. This avoids the use of an absolute address, hence the instruction is not address sensitive. Now no relocation is needed if segment SAMPLE is to be loaded with address 2000 by a calling program (or by the OS). The effective operand address would be calculated as <CS>+0196, which is the correct address 2196. A similar situation exists with the reference to B in statement 17. The reference to B is assembled as a displacement of 0002 from the contents of the DS register. Since the DS register would be loaded with the execution time address of DATA_HERE, the reference to B would be automatically relocated to the correct address.

Though use of segment registers reduces the relocation requirements, it does not completely eliminate the need for relocation. Consider statement 14:
MOV AX, DATA_HERE
which loads the segment base of DATA_HERE into the AX register preparatory to its transfer into the DS register. Since the assembler knows DATA_HERE to be a segment, it makes provision to load the higher order 16 bits of the address of DATA_HERE into the AX register. However, it does not know the link time address of DATA_HERE, hence it assembles the MOV instruction in the immediate operand format and puts zeroes in the operand field. It also makes an entry for this instruction in RELOCTAB so that the linker would put the appropriate address in the operand field. Inter-segment calls and jumps are handled in a similar way.
Relocation is somewhat more involved in the case of intra-segment jumps assembled in the FAR format. For example, consider the following program:
FAR_LAB EQU THIS FAR ; FAR_LAB is a FAR label
JMP FAR_LAB ; a FAR jump
Here the displacement and the segment base of FAR_LAB are to be put in the JMP instruction itself. The assembler puts the displacement of FAR_LAB in the first two operand bytes of the instruction, and makes a RELOCTAB entry for the third and fourth operand bytes, which are to hold the segment base address. A statement like
ADDR_A DW OFFSET A
(which is an 'address constant') does not need any relocation, since the assembler can itself put the required offset in the bytes. In summary, the only RELOCTAB entries that must exist for a program using segmented memory addressing are for the bytes that contain a segment base address.
For linking, however, both the segment base address and the offset of the external symbol must be computed by the linker. Hence there is no reduction in the linking requirements.



July 2011
Master of Computer Application (MCA) - Semester 3
MC0073 System Programming - 4 credits
Assignment Set 2

1) Discuss the various Addressing modes for CISC.


Ans 1:

Addressing Modes of CISC:
The 68000 (Motorola) addressing modes:
Register to Register,
Register to Memory,
Memory to Register, and
Memory to Memory.
The 68000 supports a wide variety of addressing modes:
Immediate mode - the operand immediately follows the instruction.
Absolute address - the address (in either the "short" 16-bit form or "long" 32-bit form) of the operand immediately follows the instruction.
Program Counter relative with displacement - A displacement value is added to the program counter to calculate the operand's address. The displacement can be positive or negative.
Program Counter relative with index and displacement - The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the program counter are added together to get the final address.
Register direct - The operand is contained in an address or data register.
Address register indirect - An address register contains the address of the operand.
Address register indirect with predecrement or postincrement - An address register contains the address of the operand in memory. With the predecrement option set, a predetermined value is subtracted from the register before the (new) address is used. With the postincrement option set, a predetermined value is added to the register after the operation completes.
Address register indirect with displacement - A displacement value is added to the register's contents to calculate the operand's address. The displacement can be positive or negative.
Address register relative with index and displacement - The instruction contains both the identity of an "index register" and a trailing displacement value. The contents of the index register, the displacement value, and the specified address register are added together to get the final address.
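Two of these modes can be illustrated with a small effective-address calculation. This is a sketch assuming simplified 32-bit registers, not the 68000's exact semantics (which involve sign-extension rules and scaled index sizes).

```c
#include <stdint.h>

/* "Address register indirect with displacement":
   effective address = address register + signed displacement. */
uint32_t ea_indirect_disp(uint32_t areg, int32_t disp)
{
    return areg + (uint32_t)disp;             /* displacement may be negative */
}

/* "Address register relative with index and displacement":
   effective address = address register + index register + displacement. */
uint32_t ea_index_disp(uint32_t areg, uint32_t index_reg, int32_t disp)
{
    return areg + index_reg + (uint32_t)disp; /* base + index + displacement */
}
```

The program-counter-relative modes work the same way, with the PC standing in for the address register.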


2) Write about Deterministic and Non-Deterministic Finite Automata with suitable numerical examples.

Ans 2:

Deterministic Finite Automata (DFA):
A deterministic finite automaton (DFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × Σ → S)
a start state (s ∈ S)
a set of accept states (A ⊆ S)
The machine starts in the start state and reads in a string of symbols from its alphabet. It uses the transition function T to determine the next state, using the current state and the symbol just read. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the DFA recognizes.

Non-Deterministic Finite Automaton (NFA):
A non-deterministic finite automaton (NFA) is a 5-tuple (S, Σ, T, s, A):
an alphabet (Σ)
a set of states (S)
a transition function (T : S × (Σ ∪ {ε}) → P(S))
a start state (s ∈ S)
a set of accept states (A ⊆ S)
where P(S) is the power set of S and ε is the empty string. The machine starts in the start state and reads in a string of symbols from its alphabet. If, when it has finished reading, it is in an accepting state, it is said to accept the string; otherwise it is said to reject the string. The set of strings it accepts forms a language, which is the language the NFA recognizes.
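As a concrete numerical example, the table-driven C sketch below simulates a two-state DFA over the alphabet {0, 1} that accepts exactly the binary strings containing an even number of 1s. This is a standard toy language chosen for illustration, not one taken from the text above.

```c
/* DFA states: 0 = even number of 1s seen (start, accepting), 1 = odd.
   delta[state][symbol] is the transition function T. */
static const int delta[2][2] = {
    /* on '0'  on '1' */
    {    0,       1   },   /* from state 0 */
    {    1,       0   },   /* from state 1 */
};

/* Run the DFA over a string of '0'/'1' characters; returns 1 on accept. */
int dfa_accepts(const char *s)
{
    int state = 0;                       /* start state s */
    for (; *s; s++)
        state = delta[state][*s - '0'];  /* state = T(state, symbol) */
    return state == 0;                   /* accept iff final state is in A */
}
```

For instance, the string 101 has two 1s and is accepted, while 1 is rejected; the empty string (zero 1s) is also accepted, since the start state is accepting.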
3) Write a short note on:
A) C Preprocessor for GCC version 2
B) Conditional Assembly

Ans 3:

The C Preprocessor for GCC version 2
The C preprocessor is a macro processor that is used automatically by the C compiler to transform your program before actual compilation. It is called a macro processor because it allows you to define macros, which are brief abbreviations for longer constructs.
The C preprocessor provides four separate facilities that you can use as you see fit:
Inclusion of header files. These are files of declarations that can be substituted into your program.
Macro expansion. You can define macros, which are abbreviations for arbitrary fragments of C code, and then the C preprocessor will replace the macros with their definitions throughout the program.
Conditional compilation. Using special preprocessing directives, you can include or exclude parts of the program according to various conditions.
Line control. If you use a program to combine or rearrange source files into an intermediate file which is then compiled, you can use line control to inform the compiler of where each source line originally came from.
ANSI Standard C requires the rejection of many harmless constructs commonly used by today's C programs. Such incompatibility would be inconvenient for users, so the GNU C preprocessor is configured to accept these constructs by default. Strictly speaking, to get ANSI Standard C, you must use the options `-trigraphs', `-undef' and `-pedantic', but in practice the consequences of having strict ANSI Standard C make it undesirable to do this.

Conditional Assembly:
Conditional assembly means that some sections of the program may be optional, either included or not in the final program, dependent upon specified conditions. A reasonable use of conditional assembly would be to combine two versions of a program: one that prints debugging information during test executions for the developer, another version for production operation that displays only results of interest for the average user. A program fragment that assembles the instructions to print the AX register only if Debug is true is given below. Note that true is any non-zero value.

Here is a conditional statement in C programming; the following tests the expression `BUFSIZE == 1020', where `BUFSIZE' must be a macro:
#if BUFSIZE == 1020
printf ("Large buffers!\n");
#endif /* BUFSIZE is large */



4) Write about different Phases of compilation.

Ans 4:

Phases of Compiler
A compiler takes as input a source program and produces as output an equivalent sequence of machine instructions. This process is so complex that it is not reasonable, either from a logical point of view or from an implementation point of view, to consider the compilation process as occurring in one single step. For this reason, it is customary to partition the compilation process into a series of sub-processes called phases, as shown in Fig 1.2. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation.
The syntax analyzer groups tokens together into syntactic structures. For example, the three tokens representing A + B might be grouped into a syntactic structure called an expression. Expressions might further be combined to form statements. Often the syntactic structure can be regarded as a tree whose leaves are the tokens. The interior nodes of the tree represent strings of tokens that logically belong together.
Code Optimization is an optional phase designed to improve the intermediate code so that the ultimate object program runs faster and/or takes less space. Its output is another intermediate code program that does the same job as the original, but perhaps in a way that saves time and/or space.
The final phase, code generation, produces the object code by deciding on the memory locations for data, selecting code to access each datum, and selecting the registers in which each computation is to be done. Designing a code generator that produces truly efficient object programs is one of the most difficult parts of compiler design, both practically and theoretically.
The Table-Management, or bookkeeping, portion of the compiler keeps track of the names used by the program and records essential information about each, such as its type (integer, real, etc.). The data structure used to record this information is called a Symbol Table.
The Error Handler is invoked when a flaw in the source program is detected. It must warn the programmer by issuing a diagnostic, and adjust the information being passed from phase to phase so that each phase can proceed. It is desirable that compilation be completed on flawed programs, at least through the syntax-analysis phase, so that as many errors as possible can be detected in one compilation. Both the table-management and error-handling routines interact with all phases of the compiler.





5) What is a MACRO? Discuss its uses.

Ans 5:

Macro definition and Expansion
Definition: macro

A macro name is an abbreviation which stands for some related lines of code. Macros are useful for the following purposes:
To simplify and reduce the amount of repetitive coding
To reduce errors caused by repetitive coding
To make an assembly program more readable.
A macro consists of a name, a set of formal parameters and a body of code. The use of the macro name with a set of actual parameters is replaced by some code generated by its body. This is called macro expansion.
Macros allow a programmer to define pseudo-operations, typically operations that are generally desirable, are not implemented as part of the processor instruction set, and can be implemented as a sequence of instructions. Each use of a macro generates new program instructions; the macro has the effect of automating the writing of the program.

For instance,
#define max(a, b) a>b ? a : b
defines the macro max, taking two arguments a and b. This macro may be called like any C function, using identical syntax. Therefore, after preprocessing
z = max(x, y);
becomes
z = x>y ? x : y;
While this use of macros is very important for C, for instance to define type-safe generic data-types or debugging tools, it is also slow, rather inefficient, and may lead to a number of pitfalls.
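Those pitfalls can be made concrete. The sketch below (macro names are illustrative) contrasts the naive definition above, which breaks under operator precedence because the expansion is pasted in textually, with a fully parenthesized version.

```c
/* Naive: expands textually, so surrounding operators bind into the body. */
#define MAX_NAIVE(a, b) a > b ? a : b

/* Safer: every argument and the whole body are parenthesized. */
#define MAX_SAFE(a, b)  (((a) > (b)) ? (a) : (b))

int demo_precedence(void)
{
    /* 1 + MAX_NAIVE(2, 3) expands to 1 + 2 > 3 ? 2 : 3,
       i.e. ((1 + 2) > 3) ? 2 : 3, which is 3 - not the intended 1 + 3. */
    return 1 + MAX_NAIVE(2, 3);
}

int demo_safe(void)
{
    /* 1 + MAX_SAFE(2, 3) expands with parentheses intact: 1 + 3 = 4. */
    return 1 + MAX_SAFE(2, 3);
}
```

Even the safe version still evaluates each argument twice, so calls with side effects such as MAX_SAFE(i++, j) remain a hazard; that is the "double evaluation" cost of macro expansion versus a real function call.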

6) What is a compiler? Explain the compiler process.

Compiler:
A compiler is a computer program (or set of programs) that translates text written in a computer language (the source language) into another computer language (the target language). The original sequence is usually called the source code and the output called object code. Commonly the output has a form suitable for processing by other programs (e.g., a linker), but it may be a human-readable text file.
Compiler Backend:
While there are applications where only the compiler frontend is necessary, such as static language verification tools, a real compiler hands the intermediate representation generated by the frontend to the backend, which produces a functionally equivalent program in the output language. This is done in multiple steps:
1. Optimization - the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms.
2. Code Generation - the transformed intermediate language is translated into the output language, usually the native machine language of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory, and the selection and scheduling of appropriate machine instructions.
The compiler frontend consists of multiple phases in itself, each informed by formal language theory:
1. Scanning - breaking the source code text into small pieces, tokens - sometimes called 'terminals' - each representing a single piece of the language, for instance a keyword, identifier or symbol name. The token language is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
2. Parsing - identifying syntactic structures - so-called 'non-terminals' - constructed from one or more tokens and non-terminals, representing complicated language elements, for instance assignments, conditions and loops. This is typically done with a parser for a context-free grammar, often an LL parser or LR parser from a parser generator.
3. Intermediate Language Generation - an equivalent to the original program is created in a special-purpose intermediate language.
