
CS 9303 SYSTEM SOFTWARE INTERNALS
V.P JAYA CHITRA
Computer Technology Dept
Course Objective
This course helps learners understand the basic functions of
system software components, viz. assemblers, loaders, linkers,
macro processors and compilers. It also discusses the design
and implementation of assemblers and macro processors with
examples. It then introduces the concept of a virtual machine
that supports object-oriented features. The performance of
emulation techniques is also analyzed. As a prerequisite,
the learner should have had some exposure to elementary data
structures and assembly language.
SCOPE
At the end of the course, the learners will be able to:
Design and implement assemblers.
Understand and analyze the features of loaders and linkers.
Design and implement a macro processor.
Understand the design and operation of compilers.
Analyze the implementation of a virtual machine that supports
object-oriented programming features.
Analyze the performance of emulation techniques.
UNIT PLAN
UNIT 1
Title                                                          Session
Machine instructions and programs                              Session 1
Assemblers: basic assembler functions                          Session 2
Simple SIC assembler                                           Session 3
Assembler algorithm and data structures                        Session 4
Machine-dependent assembler features:
  i) Instruction formats and addressing modes                  Session 5
Machine-dependent assembler features:
  ii) Program relocation                                       Session 6
Machine-independent assembler features:
  i) Literals ii) Symbol-defining statements iii) Expressions  Session 7
Machine-independent assembler features:
  iv) Program blocks v) Control sections and program linking   Session 8
Unit 1: Review of Computer Architecture
Objective:
In this unit the basic concepts of program assembly are explained
using the SIC machine. The unit begins with a discussion of the
relationship between system software and machine
architecture. The assembler's machine-dependent and
machine-independent features are also discussed, and the essentials
of one-pass and two-pass assemblers are presented. As a result, this
unit aids in the design and implementation of an assembler.
Machine Instructions and Programs
Session 1
Introduction to system software
Software
Application software: usually used by the end user
Concerned with the solution of some problem, using the
computer as a tool.
System software
System software consists of a variety of programs that support
the operation of a computer.
Acts as an intermediary between users and hardware.
Creates a virtual environment for the user that hides the actual
computer architecture.
Virtual machine: the set of services and resources created by
the system software and seen by the user.
The characteristic in which most system software differs from
application software is machine dependency.
System software (contd.)
[Figure 1.1 The Role of System Software: the system software sits between the hardware and the user. Interface B is the actual machine interface; Interface A is the virtual machine interface.]
System software: components
Language services
Write programs in a high-level, user-oriented language and then
execute them, i.e. translators:
assembler
compiler
interpreter
Memory managers
Allocate and retrieve memory space
loader
linker
Other utilities
Collections of library routines that provide services either to user or
system routines.
DBMS, editor, debugger, ...
System software (contd.)
Compiler :
Translates high-level language to assembly language.
Assembler :
Translates assembly language to machine language
(object files).
Linker :
Builds an executable file from a collection of object files.
Loader:
Reads instructions from the object file and stores them
into memory for execution.
Issues in System Software
Advanced architectures complicate system software
Superscalar CPU
Memory model
Multiprocessor
New applications
Embedded systems
Mobile computing
Machine instructions and programs
Instruction Set
Load and store registers
LDA, LDX, STA, STX, etc.
Integer arithmetic operations
ADD, SUB, MUL, DIV
All arithmetic operations involve register A and a word in
memory, with the result being left in A
COMP
Conditional jump instructions
JLT, JEQ, JGT
Subroutine linkage
JSUB, RSUB
I/O (transferring 1 byte at a time to/from the rightmost 8 bits of
register A)
Test Device instruction (TD)
Read Data (RD)
Basic Assembler Functions
Session 2
Introduction to Assemblers
Assembler Functions:
Translating mnemonic operation codes to their machine
language equivalents
(mnemonic code to machine code).
Assigning machine addresses to symbolic labels
(symbols to addresses).
Handles
Constants
Literals
Addressing
Assembly language:
A symbolic representation of machine instructions.
Assemblers (contd.)
[Figure 1.2 Compilation pipeline: source program -> Assembler -> object code -> Linker -> executable code -> Loader]
Assemblers (contd.)
Basic assembler directives:
START : Specifies the starting address of the program
END : Indicates the end of the program
BYTE : Generates a character or hexadecimal constant
WORD : Generates a one-word integer constant
RESB : Reserves the indicated number of bytes for a data area
RESW : Reserves the indicated number of words for a data area
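As a rough illustration of how these directives affect the location counter, the sketch below computes the number of bytes each data directive occupies. It assumes the standard SIC word size of 3 bytes; the function name `directive_length` is hypothetical and not part of the course material.

```python
def directive_length(directive: str, operand: str) -> int:
    """Return how many bytes a SIC data directive adds to the location counter."""
    if directive == "WORD":
        return 3                              # one SIC word is 3 bytes
    if directive == "RESW":
        return 3 * int(operand)               # operand counts words
    if directive == "RESB":
        return int(operand)                   # operand counts bytes
    if directive == "BYTE":
        kind, body = operand[0], operand[2:-1]    # C'EOF' -> 'C', 'EOF'
        if kind == "C":
            return len(body)                  # one byte per character
        if kind == "X":
            return (len(body) + 1) // 2       # two hex digits per byte
    raise ValueError(f"not a data directive: {directive} {operand}")
```

For example, `directive_length("BYTE", "C'EOF'")` gives 3 and `directive_length("RESB", "4096")` gives 4096.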
SIC Assembler
Assembler Functions:
Convert mnemonic operation codes to their machine-language equivalents.
Mnemonic code (or instruction name) -> opcode.
Convert symbolic operands to their equivalent machine addresses
(requires two passes).
Symbolic operands (e.g., variable names) -> addresses.
Build the machine instructions in the proper format.
Convert data constants specified in the source program into their internal
machine representations.
Constants -> numbers.
Write the object program and the assembly listing.
SIC Assembler
Session 3
Issues:
Address translation
The program contains forward references
A forward reference is a reference to a label that is defined later in the program.
Requires two passes
Pass 1: scan label definitions and assign addresses
Pass 2: actual translation (object code)
Example Program with Object Code
Line  Loc   Source statement              Object code
5     1000  COPY    START   1000
10    1000  FIRST   STL     RETADR        141033
15    1003  CLOOP   JSUB    RDREC         482039
20    1006          LDA     LENGTH        001036
25    1009          COMP    ZERO          281030
30    100C          JEQ     ENDFIL        301015
35    100F          JSUB    WRREC         482061
40    1012          J       CLOOP         3C1003
45    1015  ENDFIL  LDA     EOF           00102A
50    1018          STA     BUFFER        0C1039
55    101B          LDA     THREE         00102D
60    101E          STA     LENGTH        0C1036
65    1021          JSUB    WRREC         482061
70    1024          LDL     RETADR        081033
75    1027          RSUB                  4C0000
80    102A  EOF     BYTE    C'EOF'        454F46
85    102D  THREE   WORD    3             000003
90    1030  ZERO    WORD    0             000000
95    1033  RETADR  RESW    1
100   1036  LENGTH  RESW    1
105   1039  BUFFER  RESB    4096
110         .
115         . SUBROUTINE TO READ RECORD INTO BUFFER
Fig. 1.3 Example Program
Contd.
Line  Loc   Source statement              Object code
120         .
125   2039  RDREC   LDX     ZERO          041030
130   203C          LDA     ZERO          001030
135   203F  RLOOP   TD      INPUT         E0205D
140   2042          JEQ     RLOOP         30203F
145   2045          RD      INPUT         D8205D
150   2048          COMP    ZERO          281030
155   204B          JEQ     EXIT          302057
160   204E          STCH    BUFFER,X      549039
165   2051          TIX     MAXLEN        2C205E
170   2054          JLT     RLOOP         38203F
175   2057  EXIT    STX     LENGTH        101036
180   205A          RSUB                  4C0000
185   205D  INPUT   BYTE    X'F1'         F1
190   205E  MAXLEN  WORD    4096          001000
195         .
200         . SUBROUTINE TO WRITE RECORD FROM BUFFER
205         .
210   2061  WRREC   LDX     ZERO          041030
215   2064  WLOOP   TD      OUTPUT        E02079
220   2067          JEQ     WLOOP         302064
225   206A          LDCH    BUFFER,X      509039
230   206D          WD      OUTPUT        DC2079
235   2070          TIX     LENGTH        2C1036
240   2073          JLT     WLOOP         382064
245   2076          RSUB                  4C0000
250   2079  OUTPUT  BYTE    X'05'         05
255                 END     FIRST
Fig. 1.4 Example Program
Object code
Purpose
Reads records from the input device (code F1)
Copies them to the output device (code 05)
At the end of the file, writes EOF on the output device, then executes RSUB
to return to the operating system
Data transfer (RD, WD)
A buffer is used to store each record
Buffering is necessary because of different I/O rates
The end of each record is marked with a null character (hexadecimal 00)
The end of the file is indicated by a zero-length record
Subroutines (JSUB, RSUB)
RDREC, WRREC
Save the link register first before a nested jump
Object Program
The object code generated by an assembler.
The object program format contains three types of records:
Header
Contains the program name, starting address and length.
Text
Contains the translated code and data of the program, with
the addresses where they are to be loaded.
End
Marks the end of the object program.
Specifies the address of the first executable instruction.
Object Program (contd.)
Header record:
Col. 1      H
Col. 2-7    Program name
Col. 8-13   Starting address (hex)
Col. 14-19  Length of object program in bytes (hex)
Text record:
Col. 1      T
Col. 2-7    Starting address in this record (hex)
Col. 8-9    Length of object code in this record in bytes (hex)
Col. 10-69  Object code ((69-10+1)/6 = 10 instructions)
End record:
Col. 1      E
Col. 2-7    Address of first executable instruction (hex)
Fig 1.5 Object Program
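The record layouts above can be sketched as three small formatting functions. This is a minimal illustration under the column widths listed in the slide; the function names are hypothetical, and the records are emitted without the `^` field separators.

```python
def header_record(name: str, start: int, length: int) -> str:
    """H record: 6-column name, 6-digit start address, 6-digit length (hex)."""
    return f"H{name:<6}{start:06X}{length:06X}"

def text_record(start: int, object_codes: list[str]) -> str:
    """T record: start address, byte count, then the object code itself."""
    body = "".join(object_codes)
    return f"T{start:06X}{len(body) // 2:02X}{body}"   # 2 hex digits per byte

def end_record(first_executable: int) -> str:
    """E record: address of the first executable instruction."""
    return f"E{first_executable:06X}"
```

Applied to the last text record of the example program, `text_record(0x2073, ["382064", "4C0000", "05"])` yields a record starting at 002073 with length 07.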
Contd.
H COPY 001000 00107A
T 001000^1E^141033^482039^001036^281030^301015^482061 ...
T 00101E^15^0C1036^482061^081033^4C0000^454F46^000003^000000
T 002039^1E^041030^001030^E0205D^30203F^D8205D^281030 ...
T 002057^1C^101036^4C0000^F1^001000^041030^E02079^302064 ...
T 002073^07^382064^4C0000^05
E 001000 (starting address)
Fig 1.6 Object program corresponding to Fig 1.3 and Fig 1.4
The ^ symbol is used to separate fields.
Contd.
Pass 1 (define symbols)
1. Assign addresses to all statements in the program
2. Save the values assigned to all labels for use in Pass 2
3. Perform some processing of assembler directives
Pass 2 (assemble instructions and generate object program)
1. Assemble instructions
2. Generate data values defined by BYTE, WORD
3. Perform processing of assembler directives not done in Pass 1
4. Write the object program and the assembly listing
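The two passes listed above can be sketched in a few lines of Python. This is a simplified illustration for the standard SIC machine, not the textbook algorithm: it assumes source lines are already split into (label, opcode, operand) tuples, covers only a handful of opcodes, and handles only the WORD, RESW and RESB directives. The short program in `SOURCE` is made up for the example.

```python
OPTAB = {"LDA": 0x00, "STA": 0x0C, "STL": 0x14, "LDL": 0x08,
         "JSUB": 0x48, "RSUB": 0x4C, "J": 0x3C}

def pass1(lines):
    """Assign an address to every statement and record label values in SYMTAB."""
    symtab, located, locctr = {}, [], 0
    for label, op, operand in lines:
        if op == "START":
            locctr = int(operand, 16)         # starting address is given in hex
            continue
        if op == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        located.append((locctr, op, operand))
        if op in OPTAB or op == "WORD":
            locctr += 3                       # SIC instructions and words are 3 bytes
        elif op == "RESW":
            locctr += 3 * int(operand)
        elif op == "RESB":
            locctr += int(operand)
        else:
            raise ValueError(f"unknown operation {op}")
    return symtab, located

def pass2(symtab, located):
    """Translate each located statement into object code using SYMTAB."""
    listing = []
    for loc, op, operand in located:
        if op in OPTAB:
            address = 0
            if operand:
                name, _, index = operand.partition(",")
                address = symtab[name] | (0x8000 if index == "X" else 0)
            listing.append((loc, f"{OPTAB[op]:02X}{address:04X}"))
        elif op == "WORD":
            listing.append((loc, f"{int(operand):06X}"))
    return listing

SOURCE = [                                    # hypothetical program
    ("COPY", "START", "1000"),
    ("FIRST", "STL", "RETADR"),               # forward reference
    ("", "LDA", "THREE"),                     # forward reference
    ("", "STA", "LENGTH"),                    # forward reference
    ("", "RSUB", ""),
    ("THREE", "WORD", "3"),
    ("RETADR", "RESW", "1"),
    ("LENGTH", "RESW", "1"),
    ("", "END", "FIRST"),
]
symtab, located = pass1(SOURCE)
listing = pass2(symtab, located)
```

Because every label is in SYMTAB before Pass 2 starts, the forward references to RETADR, THREE and LENGTH resolve without any special handling.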
Assembler Algorithm and Data Structures
SESSION 4
OPTAB (operation code table)
mnemonic, machine code (instruction format, length) etc.
static table
instruction length
array or hash table, easy for search
SYMTAB (symbol table)
label name, value, flag, (type, length) etc.
dynamic table (insert, delete, search)
hash table, non-random keys, hashing function
Location Counter
counted in bytes
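Contd.
[Figure: Pass 1 reads the source program and uses OPTAB to build SYMTAB and an intermediate file; Pass 2 reads the intermediate file and uses OPTAB and SYMTAB to produce the object code.]
Algorithm for Pass 1 assembler [figure not reproduced]
Algorithm for Pass 2 assembler [figure not reproduced]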
Assembler Features
SESSION 5
Machine-Dependent Assembler Features
instruction formats and addressing modes
program relocation
Machine-Independent Assembler Features
literals
symbol-defining statements
expressions
program blocks
control sections and program linking
Instruction Format and Addressing Mode
Addressing Modes:
Extended format: +op m
Indirect addressing: op @m
Immediate addressing: op #c
Indexed addressing: op m,X
Relative addressing: op m
Instruction Format and Addressing Mode (contd.)
The START directive specifies a beginning program address of 0: a
relocatable program.
Register-to-register instructions: simply convert the
mnemonic register names to their number equivalents
OPTAB: for opcodes
SYMTAB: preloaded with register names and their values
Fetching a value stored in a register is much faster than
fetching it from memory, which improves execution speed.
Contd.
PC or base relative addressing
Calculate the displacement
The displacement must be small enough to fit in the 12-bit field
(-2048..2047 for PC-relative mode, 0..4095 for base-relative
mode)
Using format 3 rather than format 4 saves one byte.
Reduces program storage space
Reduces instruction fetch time
Makes relocation easier.
Extended instruction format (4-byte)
20-bit field for direct addressing
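The displacement ranges above suggest a simple selection rule, sketched below. It assumes the assembler tries PC-relative mode first, then base-relative mode, and otherwise falls back to the extended format; that ordering and the function name `choose_displacement` are assumptions made for this illustration.

```python
def choose_displacement(target, pc, base=None):
    """Pick an addressing mode for a format 3 operand, or report that format 4 is needed.

    target: address of the operand
    pc:     address of the next instruction
    base:   contents of the base register, if a BASE directive is in effect
    """
    disp = target - pc
    if -2048 <= disp <= 2047:
        return "pc", disp & 0xFFF             # 12-bit two's complement
    if base is not None and 0 <= target - base <= 4095:
        return "base", target - base
    return "extended", target                 # needs the 20-bit address field
```

For instance, a target at 0006 with the program counter at 001A gives a displacement of -14 (hex), stored as FEC.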
Contd.
Immediate addressing mode is used whenever possible.
The operand is already included in the fetched instruction.
There is no need to fetch the operand from memory.
Indirect addressing mode is used whenever possible.
Just one instruction rather than two is enough.
Examples:
Relocatable programs
Starting address is 0.
5    0000  COPY   START  0
Register-to-register instructions
125  1036  RDREC  CLEAR  X      B410
150  1049         COMPR  A,S    A004
Simple addressing
Use extended format instructions (bit e = 1).
15   0006  CLOOP  +JSUB  RDREC  4B101036
Contd.
PC-relative
10   0000  FIRST  STL  RETADR  17202D
Displacement is RETADR (0030) - PC (0003) = 2D.
Bits n, i and p are set to 1.
40   0017         J    CLOOP   3F2FEC
Operand address is 0006; PC is 001A.
Displacement is 6 - 1A = -14 (FEC in 2's complement).
Contd.
Base relative:
Declare the value of the base register.
12         LDB   #LENGTH
13         BASE  LENGTH
Address of identifier LENGTH is 0033.
The directives BASE and NOBASE do not generate code.
160  104E  STCH  BUFFER,X  57C003
Address of BUFFER is 0036.
Contents of the base register are 0033.
Displacement is 0036 - 0033 = 0003.
Note: bits x and b are 1.
Contd.
Immediate addressing
55   0020  LDA   #3       010003
The operand (3) is part of the instruction.
Bit i = 1 indicates immediate addressing.
133  103C  +LDT  #4096    75101000
The operand (4096) does not fit in 12 bits.
The "+" character indicates extended format (bit e = 1).
12   0003  LDB   #LENGTH  69202D
"#" applied to a symbol uses the symbol's address as the immediate operand.
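The object codes in these addressing-mode examples can be reproduced by packing the opcode, the n, i, x, b, p, e flag bits and the displacement or address field. The sketch below assumes the standard SIC/XE format 3 and format 4 layouts; the function name `encode` is hypothetical.

```python
def encode(opcode, n, i, x, b, p, e, value):
    """Assemble a SIC/XE format 3 (e=0) or format 4 (e=1) instruction as hex text."""
    first = opcode | (n << 1) | i             # n and i occupy the opcode's low two bits
    flags = (x << 3) | (b << 2) | (p << 1) | e
    if e:
        return f"{first:02X}{flags:X}{value & 0xFFFFF:05X}"   # 20-bit address
    return f"{first:02X}{flags:X}{value & 0xFFF:03X}"         # 12-bit displacement

# STL RETADR, PC-relative: displacement 0030 - 0003 = 2D
stl = encode(0x14, 1, 1, 0, 0, 1, 0, 0x0030 - 0x0003)
# J CLOOP, PC-relative with a negative displacement: 0006 - 001A
jump = encode(0x3C, 1, 1, 0, 0, 1, 0, 0x0006 - 0x001A)
# STCH BUFFER,X, base-relative and indexed: 0036 - 0033 = 3
stch = encode(0x54, 1, 1, 1, 1, 0, 0, 0x0036 - 0x0033)
# +LDT #4096, immediate in extended format
ldt = encode(0x74, 0, 1, 0, 0, 0, 1, 4096)
```

Masking with `0xFFF` turns the negative displacement of the jump into its 12-bit two's complement form, which is why no separate sign handling is needed.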
Program Relocation
SESSION 6
Absolute program:
A program whose starting address is specified at assembly time.
Program relocation:
A program with absolute addresses must be loaded at the specific
starting address assumed at assembly time; its addresses may be
invalid if the program is loaded anywhere else. Relocation allows a
program to be loaded and executed correctly at any place in memory.
To have relocatable programs
The assembler identifies the object records that must be modified.
The loader modifies these records.
Contd.
Need for Program Relocation:
To increase the productivity of the machine
Want to load and run several programs at the same time
(multiprogramming)
Must be able to load programs into memory wherever there is
room
Actual starting address of the program is not known until load
time
Contd.
Example:
Consider the following instructions
Instruction "+JSUB RDREC"
Instruction "STL RETADR"
The assembler inserts the address of RDREC relative to the start of the
program.
The assembler instructs the loader to add the program's beginning
address to the address field in the JSUB instruction at load
time.
Contd.
Modification Record:
When the assembler generates an address for a symbol, the
address inserted into the instruction is relative to the start
of the program.
The assembler also produces a modification record, in which
the location and length of the address field that needs to be
modified are stored.
When the loader sees the record, it adds the beginning
address of the loaded program to the address field named in the
record.
Contd.
Instructions that need to be modified:
The address portion of instructions that use absolute
(direct) addresses.
Instructions that need not be modified:
Immediate addressing (no memory references)
Register-to-register instructions (no memory references)
PC or base-relative addressing (the relative displacement
remains the same regardless of the starting address)
Contd.
Modification Record
Col. 1    M
Col. 2-7  Starting location of the address field to be modified,
relative to the beginning of the program (hex)
Col. 8-9  Length of the address field to be modified, in half-bytes (hex)
Example: JSUB RDREC instruction
The instruction "+JSUB RDREC" assembles into 4B101036.
It starts at address 0006.
The modification record is M00000705.
The load address is added to the field at relative address 000007.
The field to be modified is 5 half-bytes long (20 bits).
Contd.
Fig 1.6 Examples of Relocation Program
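How a loader might apply such a modification record can be sketched as follows. The sketch assumes the program image is held in a `bytearray` indexed by relative address and that each record has been parsed into an (offset, length in half-bytes) pair; the function name `relocate` and the load address 5000 (hex) are illustrative choices.

```python
def relocate(memory: bytearray, mod_records, load_address: int) -> None:
    """Add load_address to every address field named by a modification record."""
    for offset, half_bytes in mod_records:
        nbytes = (half_bytes + 1) // 2                # bytes that contain the field
        field = int.from_bytes(memory[offset:offset + nbytes], "big")
        mask = (1 << (4 * half_bytes)) - 1            # low-order half-bytes hold the address
        updated = (field & ~mask) | ((field + load_address) & mask)
        memory[offset:offset + nbytes] = updated.to_bytes(nbytes, "big")

image = bytearray(16)
image[6:10] = bytes.fromhex("4B101036")               # +JSUB RDREC at relative address 0006
relocate(image, [(0x000007, 0x05)], load_address=0x5000)
```

With a 5-half-byte field starting at relative address 000007, the leading half-byte of that byte (the x, b, p, e flags) is left untouched and only the 20-bit address changes, from 01036 to 06036.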
Machine-Independent Features
SESSION 7
Literals
Literal
An operand whose value appears literally (as a constant) in the instruction.
Identified by the prefix "=".
'C' characters (1 per byte); 'X' hexadecimal digits (2 per byte).
The assembler defines the constant in memory.
The operand becomes a reference to this location.
Literal pools
Literals are assembled into literal pools.
LTORG creates a literal pool and inserts the accumulated literals.
Ensures short addresses remain valid by keeping literals close to the instructions that use them.
Duplicate literals
The assembler must recognize duplicate literals and store only one copy of
the specified data value.
Special literals (e.g., =*) must be duplicated.
Literal Implementation
LITTAB
Literal name, the operand value and length, and the address assigned
to the operand
Pass 1
Build LITTAB with the literal name, operand value and length, leaving
the address unassigned
When an LTORG statement is encountered, assign an address to
each literal not yet assigned an address
Pass 2
Search LITTAB for each literal operand encountered
Generate data values as if by BYTE or WORD statements
Generate a modification record for literals that represent an address
in the program
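The Pass 1 handling of the literal table can be sketched as below. It assumes LITTAB is a dictionary keyed by the literal name and covers only `=C'...'` and `=X'...'` literals; the function names and the starting location 002D in the usage lines are illustrative.

```python
def add_literal(littab: dict, name: str) -> None:
    """Enter a literal in LITTAB with its value and length; the address stays unassigned."""
    if name in littab:
        return                                    # duplicate literal: keep a single copy
    kind, body = name[1], name[3:-1]              # "=C'EOF'" -> 'C', 'EOF'
    if kind == "C":
        value = body.encode("ascii").hex().upper()
    else:
        value = body.upper()                      # X'..' is already hexadecimal
    littab[name] = {"value": value, "length": len(value) // 2, "address": None}

def place_literals(littab: dict, locctr: int) -> int:
    """On LTORG (or END): assign addresses to unplaced literals, return the new LOCCTR."""
    for entry in littab.values():
        if entry["address"] is None:
            entry["address"] = locctr
            locctr += entry["length"]
    return locctr

littab = {}
for operand in ["=C'EOF'", "=X'05'", "=C'EOF'"]:
    add_literal(littab, operand)
next_loc = place_literals(littab, 0x002D)
```

The second `=C'EOF'` is recognized as a duplicate, so the pool holds two literals occupying four bytes in total.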
Symbols
Labels on instructions or data areas
EQU directive
symbol EQU value
Creates an entry in the symbol table (SYMTAB) and assigns the value to it.
The value may be an expression involving constants and symbols
previously defined.
ORG directive
ORG value
Resets LOCCTR to the value specified.
Contd.
Examples
Simple constants
MAXLEN  EQU   4096
. . .
        +LDT  #MAXLEN
Array of records
STAB    RESB  1100
        ORG   STAB
SYMBOL  RESB  6
VALUE   RESW  1
FLAGS   RESB  2
        ORG   STAB+1100
. . .
        LDA   VALUE,X
For an ordinary two-pass assembler, all symbols must be defined during Pass 1:
all terms used to specify the value of a new symbol must have been defined
previously in the program. Hence, the following sequences cannot be processed
by an ordinary two-pass assembler.
Disallowed:
BETA    EQU   ALPHA
ALPHA   RESW  1
Disallowed:
        ORG   ALPHA
BYTE1   RESB  1
BYTE2   RESB  1
BYTE3   RESB  1
        ORG
ALPHA   RESB  1
Allowed:
ALPHA   RESW  1
BETA    EQU   ALPHA
Expressions
An expression may use constants, user-defined terms and special terms.
The location counter (*) is one such special term.
Expressions can be classified as absolute expressions or relative
expressions.
Absolute vs. Relative Expressions
An absolute expression is independent of program location.
Expressions that contain only absolute terms are absolute.
The difference of two relative terms is absolute.
An absolute expression may contain relative terms provided the relative
terms occur in pairs and the terms in each such pair have opposite signs.
No relative term may enter into a multiplication or division operation.
e.g. MAXLEN EQU BUFEND-BUFFER
Contd.
A relative expression depends on program location.
The value of a relative expression is relative to the beginning address of the object
program.
A relative expression is one in which all of the relative terms except
one can be paired as described above. The remaining unpaired term
must have a positive sign. No relative term may enter into a multiplication or
division operation.
BUFEND+BUFFER, 100-BUFFER, and 3*BUFFER are neither relative expressions
nor absolute expressions.
Expressions that are neither relative nor absolute should be flagged by the assembler as errors.
Symbol table entries must be tagged as relative or absolute.
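The pairing rule can be checked by counting signed relative terms: a net count of zero means every relative term is paired (absolute), a net count of +1 leaves one positive unpaired term (relative), and anything else is an error. The sketch below assumes expressions contain only symbols, numbers and the operators + - * /, and that `types` maps each symbol to "R" or "A"; the location-counter term `*` is not handled, and the function name is hypothetical.

```python
import re

def classify(expression: str, types: dict) -> str:
    """Return 'absolute', 'relative' or 'error' for a simple assembler expression."""
    net_relative, sign = 0, +1
    for token in re.split(r"([+-])", expression.replace(" ", "")):
        if token == "+":
            sign = +1
        elif token == "-":
            sign = -1
        elif token:
            factors = re.split(r"[*/]", token)
            relative = [f for f in factors if types.get(f) == "R"]
            if relative and len(factors) > 1:
                return "error"            # relative term in a multiplication or division
            if relative:
                net_relative += sign
    if net_relative == 0:
        return "absolute"
    if net_relative == 1:
        return "relative"
    return "error"

TYPES = {"BUFFER": "R", "BUFEND": "R", "MAXLEN": "A"}
```

With these tags, `classify("BUFEND-BUFFER", TYPES)` is absolute, while `classify("BUFEND+BUFFER", TYPES)` is flagged as an error.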
Contd.
Example
Consider some of the symbols
RETADR  RESW  1
LENGTH  RESW  1
BUFFER  RESB  4096
BUFEND  EQU   *
MAXLEN  EQU   BUFEND-BUFFER
Symbol  Type  Value
RETADR  R     0030
LENGTH  R     0033
BUFFER  R     0036
BUFEND  R     1036
MAXLEN  A     1000
Program Blocks
Definition
Program blocks: code segments that are rearranged within a single object program unit.
Control sections: code segments that are translated into independent object program units.
USE directive
USE [block_name]
Indicates which portions of the program belong to the various blocks:
Default unnamed block, or
Named block.
Used to reduce addressing problems in a program.
Rearranged at link time or load time.
If no USE statements are included, the entire program belongs
to the single default block.
Program Blocks: Implementation
Pass 1
Each program block has a separate location counter.
Each label is assigned an address that is relative to the start of the
block that contains it.
At the end of Pass 1, the latest value of the location counter for
each block indicates the length of that block.
The assembler can then assign to each block a starting address
in the object program.
Pass 2
The address of each symbol can be computed by adding the
assigned block starting address and the relative address of the
symbol within that block.
Contd.
Each source line is given a relative address and a block
number.
Example
Block Table
Block name  Block number  Address  Length
(default)   0             0000     0066
CDATA       1             0066     000B
CBLKS       2             0071     1000
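The block table above follows directly from the block lengths collected in Pass 1: each block starts where the previous one ends. A minimal sketch, assuming blocks are laid out in order of first appearance and using hypothetical function names:

```python
def assign_block_addresses(block_lengths: dict) -> dict:
    """Give each program block a number and a starting address from its Pass 1 length."""
    table, start = {}, 0
    for number, (name, length) in enumerate(block_lengths.items()):
        table[name] = {"number": number, "address": start, "length": length}
        start += length                       # next block begins where this one ends
    return table

def symbol_address(table: dict, block: str, relative: int) -> int:
    """Pass 2: block starting address plus the symbol's address within the block."""
    return table[block]["address"] + relative

table = assign_block_addresses({"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000})
```

A symbol at relative address 0003 within CDATA (an illustrative value) would then resolve to 0066 + 0003 = 0069.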
Program Linking
Control Sections
Code segments translated into independent object program units.
Each section can be loaded and relocated independently.
A section is made up of one or more related routines.
Sections must be linked together to form a program.
CSECT directive
label CSECT
Starts and names a new control section.
External Definition and References
External definition
EXTDEF name [, name]
EXTDEF names symbols that are defined in this control
section and may be used by other sections
External reference
EXTREF name [, name]
EXTREF names symbols that are used in this control
section and are defined elsewhere
Contd.
EXTREF directive
EXTREF symbol(,symbol)*
EXTDEF symbol(,symbol)*
Example
15   0003  CLOOP   +JSUB  RDREC          4B100000
160  0017          +STCH  BUFFER,X       57900000
190  0028  MAXLEN  WORD   BUFEND-BUFFER  000000
Implementation
The assembler must include information in the object program that
will cause the loader to insert the proper values where they are required.
Object File Records
Define record
Col. 1      D
Col. 2-7    Name of external symbol defined in this control section
Col. 8-13   Relative address within this control section (hexadecimal)
Col. 14-73  Repeat information in Col. 2-13 for other external symbols
Refer record
Col. 1      R
Col. 2-7    Name of external symbol referred to in this control section
Col. 8-73   Names of other external reference symbols
Contd.
Modification record (New & Improved)
Col. 1      M
Col. 2-7    Starting address of the field to be modified (hexadecimal)
Col. 8-9    Length of the field to be modified, in half-bytes
(hexadecimal)
Col. 10     Modification flag (+ or -)
Col. 11-16  External symbol whose value is to be added to or
subtracted from the indicated field
Note: the control section name is automatically an external symbol, i.e. it is
available for use in Modification records.
Assembler Design
Assembler design can be done in:
Single pass
Two pass
One-pass assembler:
Does everything in a single pass
Cannot resolve forward references directly
Contd.
Two-pass assembler:
Does the work in two passes
Resolves the forward references
First pass:
Scans the code
Validates the tokens
Creates a symbol table
Second pass:
Resolves forward references
Converts the code to machine code
One-Pass Assembler
Problems in One-pass assembler
Forward references to Data items
Forward references to labels on instructions
Solution
Require all such areas be defined before they are referenced
Labels on instructions: no good solution
Two types of one-pass assembler
Load-and-go
Produce code for immediate execution.
The other
Produce code for later execution
Load-and-go Assembler
Characteristics
Useful for program development and testing
Avoids the overhead of writing the object program out and
reading it back
Both one-pass and two-pass assemblers can be designed
as load-and-go.
However, a one-pass assembler also avoids the overhead of an
additional pass over the source program
For a load-and-go assembler, the actual address must be
known at assembly time, so an absolute program can be used
Multi-Pass Assemblers
Restriction on EQU and ORG
No forward references are allowed, because the symbol's value cannot be
defined during the first pass.
Example:
ALPHA  EQU   BETA
BETA   EQU   DELTA
DELTA  RESW  1
Assemblers with two passes cannot resolve these definitions.
Contd.
Resolve forward references with as many
passes as needed
Portions that involve forward references in symbol
definitions are saved during Pass 1.
Additional passes are made through the stored definitions.
Finally, a normal Pass 2 is performed.
Example implementation:
Use linked lists to keep track of the symbols whose values depend
on an undefined symbol.
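The idea of making additional passes over stored definitions can be sketched as follows. Note that this sketch simply rescans a dictionary of pending definitions until no more can be resolved, rather than maintaining the linked lists of dependents mentioned above; it also assumes each pending symbol is equated to a single other symbol. All names and the address 0000 for DELTA are illustrative.

```python
def resolve_symbols(known: dict, pending: dict) -> dict:
    """Resolve 'NAME EQU OTHER' definitions saved during Pass 1.

    known:   symbol -> value for symbols already defined
    pending: symbol -> the symbol it is equated to (a forward reference)
    """
    pending = dict(pending)
    progress = True
    while pending and progress:               # one extra pass per iteration
        progress = False
        for name, target in list(pending.items()):
            if target in known:
                known[name] = known[target]
                del pending[name]
                progress = True
    if pending:
        raise ValueError(f"unresolved symbols: {sorted(pending)}")
    return known

symbols = resolve_symbols({"DELTA": 0x0000}, {"ALPHA": "BETA", "BETA": "DELTA"})
```

Here the first extra pass resolves BETA from DELTA, and the second resolves ALPHA from BETA; a circular definition never makes progress and is reported as an error.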
Implementation example:
Microsoft MASM Assembler
SEGMENT
A program is a collection of segments; each segment is defined as
belonging to a particular class: CODE, DATA, CONST,
STACK
Segment registers: CS (code), SS (stack), DS (data), ES, FS, GS
Similar to program blocks in SIC
ASSUME
e.g. ASSUME ES:DATASEG2
e.g. MOV AX,DATASEG2
MOV ES,AX
Similar to BASE in SIC
Contd.
JMP with a forward reference
near jump: 2 or 3 bytes
far jump: 5 bytes
e.g. JMP TARGET
The programmer warns the assembler of a far jump: JMP FAR PTR TARGET
The programmer requests a short jump: JMP SHORT TARGET
Pass 1 reserves 3 bytes for the jump instruction; a wrong assumption
causes a phase error
PUBLIC, EXTRN
Similar to EXTDEF, EXTREF in SIC
