8086 Programming

INTRODUCTION
The Intel 8086 is one of the most successful microprocessors and has been in use
since 1978. Programs written for it remain upward compatible with the later Intel
processors based on the IA-32 architecture, and it is the processor that nearly every
beginner in microprocessor studies works with. It has a fairly complex and quite
powerful instruction set. A good understanding of the hardware register capabilities
as well as of the instruction set is needed to program and get the best from the processor.
Students of microprocessor courses find programming in the assembly language
rather difficult. But programming in the assembly language is important as it gives one a
very clear picture of the internals of the processor. Assembly language programming is
perhaps the best way to study the different features provided in the processor. In certain
situations assembly language programming is the best way of using the processor. It can
be efficient in use of memory and in execution time as well. As such, it may be ideal for
handling processors in embedded systems. The difficulty most people have in assembly
language programming is mainly due to the two views to be understood while writing the
assembly language program. For one thing, the programmer must view the hardware
available clearly in terms of registers and their capabilities. Secondly the programmer
must not lose the algorithmic perspective of the job being done. Managing both these
simultaneously is where the difficulty lies. In the higher level languages one is free to
concentrate on the algorithm only and the compiler will handle the hardware register
details etc. without bothering the programmer. However, if the assembly language
programs are properly commented with reference to the algorithm used, it should not be
too difficult to write an assembly language program, and to understand the logic of the
program at any time later, without ambiguity.
In the present book, sufficient care is taken to make the background required for
programming very clear in the first two chapters. In the rest of the book, adequate
examples are discussed to bring out the finer aspects of programming. Textbooks
normally give one working assembly language program for a problem and leave it at
that; alternatives are not discussed, and neither is the reasoning behind the choice of
algorithm or the allotment of registers to the variables. In this book, register selection
and algorithm selection are discussed explicitly, and in many instances more than one
program is presented for solving a given problem.
It is to be noted that we have used only the simplest programming mode in
MASM. Aids such as the tiny, small, medium and large memory models are not used, as
the focus here is on remaining as close as possible to the processor hardware and not
on the advanced features of the tool, namely MASM.
Another important and unique feature of the book is that the programs given here
are fully tested, using MASM version 5.10 for assembling and the debug environment
for testing. The working of each program in the debug is illustrated with
study documents produced from the debug during its operation. One can almost
get hands-on experience while going through these documents.
The book presents programs at various levels of difficulty, from simple to complex.
The stress is however, on number crunching type of programs, although some basic I/O
programs like keyboard handling and simple screen displays are included. Again, it is
also to be noted that the concentration throughout the book is on assembly language
programming at the hardware level. This means no serious note is made on MASM
feature based programming like use of macro libraries or modular programming etc.
which are not functions of the processor hardware. In fact, at the end of chapter 6, the
reader should be in a position to create his/her own library of macros useful in programs
handling large numbers. The individual macros are discussed at length as they serve to
introduce a concept of generating our own instructions to add to the instruction set of the
processor. The entire book is organized as follows:
In Chapter 1, a basic introduction to assembly language programming is
presented along with a study of the tools used. Appendices 1.A and 1.B
elaborate on certain aspects raised in the chapter. Appendix 1.A also refers to the 8085
processor, to bring out the difference between the ALUs of the 8085 and 8086
processors in respect of the DAA and DAS operations. Appendix 1.B uses a subroutine,
which is described later in Chapter 4; if necessary, its study may be deferred until
that chapter has been gone through.
Thorough descriptions of the register set and of the instruction set of the
8086 are presented in Chapter 2. They have to be fully understood before proceeding to
program in the assembly language. Just as a person learning chess must be thorough
with the rules of the game before starting to play, an assembly language programmer
must be completely confident of the register set and register capabilities, along with the
instruction set of the processor, before beginning to write programs. Examples are given
in this chapter for studying single instructions using the debug by itself.
In Chapter 3, we see the basics of programming. Beginners generally study
programming by studying specific programs for doing specific jobs. Almost invariably
they end up with the idea: one job, one program. Chapter 3 shows
that programming is much more individualistic, almost to the level: one person, one
program. Just as different persons may describe a given situation in different ways, so
can different persons write different programs to do a given job. In this sense
programming is like an art, capable of variations to suit the taste of the programmer.
Illustrating this effectively is the objective of the chapter.
Chapter 4 discusses the use of macros and subroutines, which are quite useful in a
programming environment. Although both serve similar purposes, namely
sparing the programmer from repetition and freezing useful tasks
into reusable, tested program units, they differ in important ways, and these
differences are brought out and illustrated in this chapter.
Chapter 5 is devoted to simple example programs which help the beginners in
understanding various aspects of developing working programs. One
should observe that a program which works the first time in the lab is not good
learning material. Only when one goes through and rectifies the errors that are inevitable
with any program is the learning complete.
Chapter 6 illustrates the power of the Intel 8086 processor in number crunching.
Very large numbers are handled in this chapter, including large BCD numbers. These
programs are not suitable as a beginner’s learning material. They are there to show the
capabilities of the processor and to motivate the reader into getting enthusiastic about
assembly language programming. One need not go into all the details; it is enough to
see the results, verify them, and believe that the details can be worked out with
enough patience.
The author is grateful to the Nitte Education Trust and the Principal and the staff of
the NMAM Institute of Technology, for providing an encouraging atmosphere where the
author can peacefully pursue his interest. The staff and students of the National Institute
of Technology, Karnataka where the author worked earlier, and where the author was
introduced to the 8086 Processor are gratefully acknowledged for motivating the author
to study the intricacies of assembly language programming. I cannot, of course, fail to
mention the constant support from my family in all my endeavors. I believe the book
will be useful to staff and students in understanding the basics of assembly language
programming.
K M Hebbar
Copyright © 2008 K M Hebbar
1. ASSEMBLY LANGUAGE PROGRAMMING
The use of microprocessors is continuously increasing, both in embedded systems
catering to some special equipment or needs and in general purpose personal computing
systems. All these microprocessors need to have a lot of system and other application
software programmed into them before they can be used. Embedded systems are
programmed once and for all during manufacture, while personal computers are supplied with
system programs initially and may also be programmed by the user during use. This
programming requirement can be met at different levels. The programming may be done
at the machine language level, at the assembly language level or in any one of the high
level languages (HLLs) like C, Java etc., with progressively increasing ease. Machine
language programming is quite difficult, for it requires not only an intelligent adaptation
of the processor hardware facilities and the instruction capabilities in solving the problem
at hand, but also a clear ability to handle the commands coded in terms of numbers. It is
this difficulty of using a number based command language that is overcome when we use
the assembly language. In the assembly language program (ALP), we use command
words rather than command numbers. For example, the command binary number 01 in
the machine language of the Intel 8085 processor implies the MOVE command; the
number 001 in the register field in this move command implies the register C of the
processor, while the number 000 represents the register B. Thus to cause the data in
register B to be moved to register C, we use the machine language command binary
number: 01 001 000. The assembly language version of this command is: MOV C, B. It
is easily seen that the command in the ALP is much more convenient for us to handle as
compared to the machine language strings of 1’s and 0’s. A program called the assembler
then maps the character and word symbols of the ALP to the number symbols of the machine
language (ML). Assembly language thus removes one of the difficulties for the
programmer, for words of the English language or word-like character combinations are
easier to remember and use than number symbols for
commands.
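As a small illustration of this mapping (a sketch, not taken from any program in this book), a few 8085 register-to-register moves are listed below with their machine codes worked out from the 01 ddd sss pattern described above. The register code 111 for register A is the standard 8085 code; it is not among the codes quoted in the text.

```asm
; 8085 MOV r1, r2 is encoded as 01 ddd sss
; ddd = destination register code, sss = source register code
; register codes used here: B = 000, C = 001, A = 111
MOV C, B ; 01 001 000 = 48 hex
MOV B, C ; 01 000 001 = 41 hex
MOV A, B ; 01 111 000 = 78 hex
```

The machine language programmer has to work with the hex column on the right; the assembly language programmer works with the mnemonics on the left and leaves the conversion to the assembler.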
The programmer using the AL (assembly language) must still have a complete
knowledge of the register set of the processor and their capabilities, before he can write
an efficient program. Writing an ALP to solve a problem requires thinking in two
distinct levels, one at the processor hardware level, and another at the problem level. Let
us take the simple operation of multiplying two variables. At the problem or algorithmic
level, all that is to be done is to multiply two numbers. At the processor or hardware
level, one has to worry about where the variables are to be placed: if they are to be in the
processor registers or in the memory, and where the result is to be put. The HLL’s differ
from AL in this aspect. When using HLL, one need not be concerned about where, in the
hardware, the variables are to be located. One can directly program in terms of the
variables and the operations to be done on them, that is, think only in terms of the
algorithm and not about implementation details in terms of the processor system
hardware. The conversion of HLL to ML to use the hardware facilities available in a
given processor is done by the HLL compiler, specific to the target processor. It is to be
noted that different compilers may have different levels of program optimization.
However, a specific problem may have special features and a general purpose
optimization of an optimizing compiler may not be able to fully exploit these special
features. An efficient human programmer may be better at exploiting
them. Of course, it requires more effort on the part of the programmer,
but once this effort is put in, the resulting gain in the speed of execution is
available every time the program is used. An ALP can have this advantage over a
compiled program. The raw power of the processor is best harnessed at the AL
level rather than at the HLL level. Further, when using an HLL, the programmer is
bound by the compiler in respect of the data types that may be used. For example,
integers much longer than the largest type the compiler provides cannot be used
directly. For further discussion on this, you may refer to the website
http://webster.cs.ucr.edu/Articles/GreatDebate/index.html
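To make the two levels concrete for the multiplication mentioned above, the following minimal sketch shows one possible set of hardware-level decisions on the 8086. The names num1, num2 and prod are hypothetical, and ending with int 3 (to hand control back to the debug) is an assumption of this sketch. The segment and assume directives used here are explained later in this chapter.

```asm
data segment
num1 dw 1234h          ; first variable, kept in memory
num2 dw 0020h          ; second variable, kept in memory
prod dw 2 dup (?)      ; 32-bit result: low word, then high word
data ends
code segment
assume cs:code, ds:data
start: mov ax, data
       mov ds, ax      ; make the variables addressable through ds
       mov ax, num1    ; MUL needs one operand in ax
       mul num2        ; unsigned multiply: dx:ax = ax * num2
       mov prod, ax    ; save low word of the product
       mov prod+2, dx  ; save high word of the product
       int 3           ; stop here when run under the debug (assumption)
code ends
end start
```

At the algorithmic level this is simply prod = num1 * num2; every other line is a hardware-level choice. With the values shown, 1234 hex times 20 hex is 24680 hex, so DX ends with 0002 and AX with 4680.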
Assembly language may have one additional advantage. Normally, people
consider the processor as a black box, but assembly language can give an insight into the
working of the processor, making it possible to peep into the subunits of the
processor, giving at least a grey box view (if not the complete white box view) of the
processor. Consider the following very unconventional and almost meaningless looking
assembly language program for an 8085 processor:
CPI 0A ; compare register A with the hex number 0A
SBI 2F ; subtract immediate with borrow: the hex number 2F from register A
DAA ; adjust register A for decimal addition! Note DAA is meant to be used
; only after an addition; here we are using it after a subtraction.
This 3-line program converts the single hex digit in register A to its ASCII
equivalent. A corresponding program given below in 8086 will not do this conversion.
CMP AL, 0AH
SBB AL, 2FH
DAA
If we try to investigate the reason for this difference, we will discover quite a lot
about how the ALUs of the two processors differ in their design. (See Appendix 1.A for a
brief discussion on these programs).
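For comparison, a commonly quoted 8086 sequence for the same single-digit conversion is sketched below. It is not taken from the appendix: it uses DAS (decimal adjust after subtraction) in place of DAA and the constant 69 hex in place of 2F hex, and it assumes AL holds a value from 00 to 0F on entry.

```asm
; al = one hex digit, 00 to 0F (assumed on entry)
cmp al, 0Ah  ; CF = 1 if al is 0 to 9, CF = 0 if al is 0A to 0F
sbb al, 69h  ; al = al - 69h - CF
das          ; decimal adjust after subtraction: al = ASCII code of the digit
; trace for al = 05: after sbb al = 9B, after das al = 35 hex, the code for '5'
; trace for al = 0A: after sbb al = A1, after das al = 41 hex, the code for 'A'
```

Following the two traces by hand, or in the debug, shows how the carry and auxiliary carry flags left by SBB steer the adjustment made by DAS.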
To study the hardware details to any extent, we should be as close to the hardware as
possible. The closest one can get without serious involvement with the machine language
is through the assembly language. The only other language which can be considered in
this context is the C language, which has some features of the assembly language. But
the assembly language gives the best possible approach to develop an insight into the
hardware of the processors.
Assembly language, however, requires thinking at two levels as we have already
noted: at the algorithmic level and at the hardware level. People feel it is a serious
difficulty to think continuously at two levels. However, it can be looked at positively, in
that it improves the mental faculties of concentration and focus. When an ALP is
written, the instructions themselves show what happens at the hardware level. But how
each instruction relates to the problem or the algorithm at hand is not very clear. After a
couple of weeks, or even days, the programmer himself may not be able to understand
why a certain instruction is present or what it does in the program. To overcome this
difficulty, writing proper comments is a necessity. The instructions clearly give the
hardware action; the algorithmic basis for that hardware action is what the
comment should say. A comment for comment’s sake, as in the example shown below,
says nothing beyond what the mnemonic says and should be strictly avoided:
MOV BX, AX ; move the data from reg. AX, to reg. BX
Note that comments are separated from the instructions by a semicolon, “;”. In
any line, the assembler ignores whatever comes after the semicolon. If the above
instruction has to be commented, the comment, depending on the algorithmic context,
can be something like:
MOV BX, AX ; save AX in BX for later use.
Writing the instructions with accompanying relevant comments at the problem
level will further make the process of thinking at two levels easier than otherwise.
Tools used for the ALP Studies: The basic tools required for studying the 8086
processor at the ALP level, are (i) a macro assembler, MASM, for example, (ii) the
associated linker, and (iii) a debugger, DEBUG, for example. We shall be using
MASM (version 5.10), which assembles files with the .asm extension to produce a
.obj file (and a .lst file also, if required), and the associated linker, which produces a
.exe file from the .obj file(s). The .exe file can be studied in the DEBUG. As we show later,
simple studies (like single instruction studies, for example), which do not need more than
a few instructions can directly be done in the DEBUG itself. We shall look into these
tools one by one.
The DEBUG: The Debug is a low level facility which allows programs to be
assembled as well as executed, either step-by-step, tracing the entire register contents on
execution of each instruction, or up to a specified break point. The trace facility also
indicates the next instruction to be executed, along with any relevant memory data
associated with the execution of that instruction. When execution runs to a break point,
the register contents etc. are displayed after the execution of
the final instruction before the break. In the case of a subroutine, the trace through the
subroutine can be suppressed, and the result of the execution of the subroutine can be
seen at the return from the subroutine. The format of the trace display in
the debug is shown below:
-t
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 8B4140 MOV AX,[BX+DI+40] DS:0040=0050
-
Color coding is done above for easy identification of the different fields of the trace
display, and the different trace fields are described below:
-t                   The trace command. The “-” here is the debug prompt; this prompt is
                     seen again on the next line after the trace operation is completed
                     (the fifth line of the display above), prompting for a fresh command.
AX=0000              The thirteen registers, excluding the flag register, are displayed with
                     their contents, in hex, after the execution of the previous instruction.
NV                   The eight flag conditions existing after the execution of the previous
                     instruction are indicated explicitly, as follows:
                     Flag             Set                        Clear
                     Overflow         OV (OVerflow)              NV (No oVerflow)
                     Direction        DN (address decreasing)    UP (address increasing; string instructions)
                     Interrupt        EI (Enable Interrupt)      DI (Disable Interrupt)
                     Sign             NG (NeGative)              PL (positive or PLus)
                     Zero             ZR (ZeRo)                  NZ (Not Zero)
                     Auxiliary carry  AC (Auxiliary Carry)       NA (No Auxiliary carry)
                     Parity           PE (Parity Even)           PO (Parity Odd)
                     Carry            CY (CarrY)                 NC (No Carry)
1377:0100 8B4140     The address of the next instruction and its machine language coding.
MOV AX,[BX+DI+40]    The next instruction in the assembly language, ready for
                     execution if a ‘t’ command is given next.
DS:0040=0050         The relevant word data (at DS:[BX+DI+40], which is DS:[40] here),
                     shown as 0050 hex at that location.
General features of the debug: The debug prompt is the ‘–’ sign, as we have
already seen. All commands of debug are single letter commands. The commands may
have one or two parameters normally (sometimes a list of numbers), to represent address,
data or register names. The parameters are just given as hex numbers or register names
following the command letter with a space to separate the two, if there are two
parameters. The commands of debug are not case sensitive. ‘A’ or ‘a’ will carry out the
same command in the debug. A200, A 200, a200 or a 200 are all the same in the debug.
Similarly u200 210 or U 200 210 are also the same; at least one blank space separating
two parameters of the command is obviously a must, but space/s separating the command
character and the first parameter is optional. Only 16 bit register names may be used as
parameters in the command. For example, ‘ax’ can be used with a register command, but
not ‘ah’ or ‘al’; ‘rax’ or ‘r ax’ is a valid command, but not ‘ral’ or ‘r ah’. We shall now
look into some of the commands.
Table 1. Some Debug Commands
command       ch   parameters
assemble      a    [address]
dump          d    [address range]
enter         e    address [list]
go            g    [=address] [list of alternative addresses]
proceed       p    [=address] [number]
quit debug    q    none
register      r    [register]
trace         t    [=address] [value]
unassemble    u    [range]
help          ?    none
Note:
1. All commands are single characters as shown in column 2 of the table.
The third column shows the parameters. Optional parameters are shown
within square brackets.
2. When optional parameters are not given, a default value based on the
current conditions will be taken.
3. Full details of the commands can be had in debug using the command ‘?’.
Several examples in the next chapter will clarify the use of these commands. An
example of the study of the g command is shown here. In this exercise you can see the
method of using not only the g command but several other commands as well.
-a ; assemble at the default address (no parameter given)
1377:0100 mov ax, 1234
1377:0103 ja 10a ; jump if above (no carry and not zero), to 10a hex location
1377:0105 jb 110 ; jump if below (carry set), to 110
1377:0107 jz 120 ; jmp if zero flag is set, to 120
1377:0109 ; simply press ‘enter’ to exit from ‘assemble’ command
-r ; display registers
1377:0100 B83412 MOV AX,1234
-g =103 10a 110 120 ; start from 103 and halt at 10a, 110 or 120

DS=1377 ES=1377 SS=1377 CS=1377 IP=010A NV UP EI PL NZ NA PO NC
1377:010A 03DB ADD BX,BX ; note, instruction at 100 is not
; executed, execution is from 103 only
-rip ; show reg ip contents and alter as indicated.
IP 010A ; ip shown as is
:100 ; alter it to 100
-rf
NV UP EI PL NZ NA PO NC -cy ; set carry flag
-r
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO CY
1377:0100 B83412 MOV AX,1234
-g = 103 10a 110 120

DS=1377 ES=1377 SS=1377 CS=1377 IP=0110 NV UP EI PL NZ NA PO CY
1377:0110 BB2000 MOV BX,0020
-rip
IP 0110
:100
-rf
NV UP EI PL NZ NA PO CY -zr nc ; set zero and clear carry.
-r

DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL ZR NA PO NC
1377:0100 B83412 MOV AX,1234
-g =103 10a 110 120

DS=1377 ES=1377 SS=1377 CS=1377 IP=0120 NV UP EI PL ZR NA PO NC
1377:0120 0000 ADD [BX+SI],AL DS:0000=CD
; the data at DS:[BX+SI] is shown
-q ; quit debug
Also see Appendix B for a study of the proceed command of the debug.
The Macro Assembler MASM, and the Linker, LINK: The assembly language
program is written using the edit command in the DOS environment, with a filename
with the file extension .asm. It can then be assembled using the command MASM
filename; for example, to assemble the file hex_to_bcd.asm, the command will be:
Masm hex_to_bcd;
The ; at the end makes the assembler ask no further questions about the files to be
generated. It will generate only the object file, a machine language file with the file
extension .obj; for the command shown above, the file generated (provided
there are no errors in the ALP) will be:
Hex_to_bcd.obj
The object file holds the machine code for the ALP, but the segment addresses are
not yet fixed. This means the file is relocatable: every effective address in the
program or for data is relative to the base address of one of the
segments (cs, ds, es or ss). The link command makes the program completely
executable, with all segments resolved and, if it is a multi-module program, with all the
modules properly linked up. For a single module program,
as the one indicated above, the link command is: link hex_to_bcd;
The result of the link operation will produce an executable hex_to_bcd.exe file, if
everything is OK. This executable file can directly be worked or studied step-by-step or
using break points and so on, in the debug environment. It can also be directly executed
to the finish as a command under DOS.
The Assembler Directives: The following skeleton of an ALP shows the main features
of a simple .asm file, indicating many of the assembler directives used.
1 data segment
2 val_hex dw 1234h, 567h, 0abch
3 val_dec dw 3 dup (?)
4 data ends
5 stak segment stack
6 dw 256 dup(?)
7 tos label word
8 stak ends
9 code segment
10 assume cs:code, ds:data, es:data, ss: stak
11 start: mov ax, data
a. mov ds, ax
b. mov es, ax
c. mov ax, stak
d. mov ss, ax
e. mov sp, offset tos
f. ;program instructions
12 code ends
13 end start
Color coding: highlights: segment named data; segment named stak; and
segment named code. End of compilation indication to the assembler
Character colours: black (followed by red) symbol/variable names
Red: Assembler Directives; Blue: Data or processor hardware related information
We shall now look at the above skeleton program line-by-line.
Line 1: The assembler directive used is the segment. This makes the assembler
open up a new segment. The word data is the name given to the segment.
The format for the segment opening is name_of_segment followed by the
word segment.
Line2: val_hex: name of the first variable stored.
dw: is the assembler directive, define_word, signifying the variable val_hex
and the rest of the data on that line are word or 16-bit data
db or dd in that position will indicate define_byte and define_double word
(32-bits) respectively.
The numbers in the line initialize the 3 words at the beginning of the
segment named ‘data’ to the values given. Values can be given in binary
(b), decimal (default) or hexadecimal (h). A hexadecimal value must begin
with a digit, which is why 0abch is written with a leading 0.
Line 3: val_dec: name of the variable stored after the 3 words of line 2
dw: assembler directive define word
3 dup (?) : this indicates 3 items (here 3 words) of data are provided
here, without being initialized. The directive means 3 duplicate (any data
word). Such un-initialized locations are kept for storing the results of the
program. If required, these locations can all be loaded with any data like all
0s by simply changing this part to 3 dup (0).
Line 4: data: The variable (segment name), as we have seen already.
ends: the assembler directive to end (or close) the segment (data segment in
this case).
Line 5: stak: the name given to the segment.
segment: segment directive, indicating open a new segment as we have
already seen (and name it as stak)
stack: the combine type, marking this segment as the stack segment. The
linker uses it to record the initial values of ss and sp in the .exe file; the
program here nevertheless loads ss and sp explicitly, in lines c to e.
Line 6: dw: define word directive, which we have already seen.
256 dup (?): 256 (= 100 hex) un-initialized words. This is the memory
provided for the stack in this program.
Line 7: tos: label name
label: directive that defines tos as a label at the current location, here the
first address after the 256 stack words, that is, the initial top of the stack.
word: this indicates to the assembler that the label tos refers to word data.
Line 8: Stak ends: marks the end of the segment named stak.
Line 9: code segment: indicates the start of a new segment named code.
Line 10: assume: this is an assembler directive. It generates no machine code and
loads no register. It tells the assembler which segment each segment register
is expected to hold while the instructions that follow are executed, so that
the assembler can choose the segment register (adding a segment override
prefix where needed) for every instruction that refers to a variable or label
by name. The program is written in the code segment, and to start the program
the first instruction is to be fetched from the location pointed to by cs:ip.
This requires that both cs and ip be available in the beginning. The program
cannot load them itself, because it cannot start without them; they are set
up when the program is loaded, from the start address named in the end
directive (line 13). The other segment registers can be loaded in the program
itself, and so they have to be specifically managed by the program, as is done
in line 11 and lines a to e. Some assemblers do take care of loading the other
segment registers also, using slightly different types of directives.
cs:code: tells the assembler that cs will hold the segment named code.
ds:data, es:data, ss:stak: tell the assembler that ds and es will hold the
segment named data, and ss the segment named stak; the program must
make this true by loading these registers.
Some more ideas on this can be had by looking at the pr.asm program
discussed in appendix B at the end of this Chapter.
Line 11: start: this is a label used for reference purposes.
mov ax, data: the first instruction of the program. This instruction begins
the loading of the segment registers. The segment address of data is to be
loaded into the ds and es segment registers. A segment register cannot be
loaded with an immediate value, so the segment address is first moved to
register ax and from there to registers ds and es in the succeeding
instructions.
Lines a. to e: These instructions take care of initializing segment registers ds, es, and
ss, and also of the stack pointer sp.
Line f: Line f and onwards, the real useful operations of the program are written.
Line 12: code ends: tells the assembler that the segment named code is
ended here, using the ends (end segment) directive.
Line 13: end: The end directive tells the assembler that it is the end of the assembly.
start: is a reference to the start label, naming it as the entry point, so that
the offset address of the start label is loaded into ip and execution
starts from that address.
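As an illustration of how the skeleton is filled in, here is a minimal sketch (not one of the tested programs of this book) that adds the three words of val_hex. The variable total, the label next and the closing int 3 are assumptions of this sketch; the es setup is left out because es is not used here.

```asm
data segment
val_hex dw 1234h, 567h, 0abch
total   dw ?               ; result of the addition (hypothetical name)
data ends
stak segment stack
        dw 256 dup(?)
tos     label word
stak ends
code segment
assume cs:code, ds:data, ss:stak
start:  mov ax, data
        mov ds, ax
        mov ax, stak
        mov ss, ax
        mov sp, offset tos
        lea si, val_hex    ; si points to the first word to be added
        mov cx, 3          ; three words to add
        xor ax, ax         ; running sum = 0
next:   add ax, [si]       ; add the current word to the sum
        add si, 2          ; move to the next word
        loop next          ; repeat until cx = 0
        mov total, ax      ; store the sum
        int 3              ; stop here when run under the debug (assumption)
code ends
end start
```

Since 1234 + 0567 + 0ABC = 2257 hex, AX and total should both hold 2257 hex when the int 3 is reached.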
The description above is, in brief, an introduction to assembly language
programming (ALP). We have, in this chapter, studied the importance of assembly
language programming in the context of embedded systems, as well as in the context of
writing efficient programs and special programs. We have seen the basics of the debug
and the use of the MASM and LINK programs. With this knowledge of the tools used, we
can now proceed in the next chapter to the study of the instruction set and other details of the
processor, to arm ourselves fully for writing good programs at the ALP level.
The Segment Definitions and Segment Integrity Aspects: With the assembler
directives segment and ends, it may look as though the assembler MASM will take care
of maintaining the segment limits and prevent operations on one segment from
overwriting another and damaging its integrity. Unfortunately, it is not so. The assembler
simply converts the program as given into a suitable machine language program. If, inside
the program, there happen to be instructions that violate segment integrity, either by
going beyond the areas defined for a segment or by crossing over into regions defined for
other segments, the assembler cannot check this, as it happens at run-time
and is not known at assemble-time.
This means the programmer should work out in advance the maximum
requirement for the data, extra and stack segments and make adequate provision for
these requirements in the program. The hardware of the processor does not check these
aspects while the program is running. Later versions of the processor, from the 286 onwards,
guard against these eventualities in the protected mode of operation, by defining
the segment limits clearly and providing hardware to prevent such infringements of
segment integrity and segment boundaries. The example below demonstrates the
unprotected behaviour of the 8086:
8086 does not control the segment size,
nor does it protect the integrity of the segments.
These are the programmer’s responsibilities.
;the program to indicate 8086 has no control over the segment overflows, nor
;does it maintain segment integrity.
data segment
base dw 5 dup(3)
data ends ; segment as defined has only 5 words in it with data 0003
;
code segment
assume cs:code, ds:data
start: mov ax, data
mov ds, ax
lea bx, base
mov cx, 9
back: mov ax, [bx]
add ax,10
mov [bx], ax
add bx,2
loop back ; this loop forces 9 words into the data segment
int 01
code ends
end start
;Testing in debug
-u 0 18
0B46:0000 B8450B MOV AX,0B45 ; note cs = 0B46, is just (ds + 1)
0B46:0003 8ED8 MOV DS,AX
0B46:0005 8D1E0000 LEA BX,[0000]
0B46:0009 B90900 MOV CX,0009 ; this forces 9 iterations of loop.
0B46:000C 8B07 MOV AX,[BX]
0B46:000E 050A00 ADD AX,000A
0B46:0011 8907 MOV [BX],AX
0B46:0013 83C302 ADD BX,+02
0B46:0016 E2F4 LOOP 000C
0B46:0018 CD01 INT 01
-g 0e
AX=0003 BX=0000 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=0B45 ES=0B35 SS=0B45 CS=0B46 IP=000E NV UP EI PL NZ NA PO NC
0B46:000E 050A00 ADD AX,000A
-d 0 1f
0B45:0000 03 00 03 00 03 00 03 00-03 00 00 00 00 00 00 00 ................

0B45:0010 B8 45 0B 8E D8 8D 1E 00-00 B9 09 00 8B 07 05 0A .E..............
;defined data segment, extra space undefined, space defined as code segment.
-g 18
AX=45C2 BX=0012 CX=0000 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=0B45 ES=0B35 SS=0B45 CS=0B46 IP=0018 NV UP EI PL NZ NA PE NC
0B46:0018 CD01 INT 01
-d 0 1f
0B45:0000 0D 00 0D 00 0D 00 0D 00-0D 00 0A 00 0A 00 0A 00 ................

0B45:0010 C2 45 0B 8E D8 8D 1E 00-00 B9 09 00 8B 07 05 0A .E..............
; data segment now has 9 words, last word over written on code segment
; changing the program itself as the listing after execution shows below:
-u 0 18
0B46:0000 C2450B RET 0B45 ; code segment overwritten
0B46:0003 8ED8 MOV DS,AX
0B46:0005 8D1E0000 LEA BX,[0000]
0B46:0009 B90900 MOV CX,0009
0B46:000C 8B07 MOV AX,[BX]
0B46:000E 050A00 ADD AX,000A
0B46:0011 8907 MOV [BX],AX
0B46:0013 83C302 ADD BX,+02
0B46:0016 E2F4 LOOP 000C
-q
The program above brings out one of the reasons for incorporating protection
features in the processor. In the absence of protection, a user program may destroy itself
during execution. It is not difficult to see that one user’s program may also destroy
another’s. Advanced processors, including the upgrades of the 8086 from the 80286
onwards, have protection features built into the hardware design of the processor.
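Why the ninth word lands on the first instruction can be checked with a little arithmetic. The sketch below is a Python model, not 8086 code; it assumes the usual 8086 rule that the physical address is the segment value shifted left by four bits plus the offset, and the function name physical_address is only illustrative.

```python
# Minimal model of 8086 address formation (an illustration, not processor code).
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF

# Segment values taken from the debug session above.
DS, CS = 0x0B45, 0x0B46

ninth_word = physical_address(DS, 0x0010)   # where the 9th loop iteration writes
first_instr = physical_address(CS, 0x0000)  # first byte of MOV AX,0B45

print(hex(ninth_word), hex(first_instr))    # both name the same memory location
```

Since cs is just ds + 1, offset 0010 in the data segment and offset 0000 in the code segment refer to the same byte, which is why the write through ds changes the program.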
EXERCISES
1. What are the advantages of programming in the assembly language?

2. Why do people find it difficult to write assembly language programs as compared to
writing in an HLL? How could the difficulty be mitigated?
3. What is the purpose of writing comments? Give an example of a wrong ALP
comment, and indicate how this comment can be corrected.
4. Show how debug can be used to study the instructions: (i) ADD AX, BX
(ii) MUL BX (iii) SAL CX, 1 (iv) XOR AX, AX
5. How would you use MASM to obtain the list file also along with the object file?
==00==
APPENDIX 1.A
8085 Operations for the hex to ASCII conversion:

The data in register A is a single hex digit, from 0 to 0F hex. Comparing this with
0A hex divides the digits into two classes: 0 to 9 hex set the carry flag, while 0A to 0F
hex leave the carry flag reset. The second instruction adds 0D0 hex, the complement
(that is, the NOT function) of 2F hex, together with the inverted carry, to the contents of
register A. Recall that subtraction is done by adding the complement of the subtrahend,
with the carry input inverted, to the accumulator. Thus, if the accumulator initially held
0 to 9 hex, 0D0 hex is added to it; otherwise 0D1 hex is added, depending on the carry
input to the second instruction. This produces the following results:
Group 1. If A register had 0 to 9 initially, it will now have 0D0 to 0D9, with auxiliary
carry and the carry flags both reset.
Group 2. Else, if it had 0A to 0E hex, it will now have, 0DB to 0DF hex with both
carry and auxiliary carry flags cleared
Group 3. Else, for the input 0F hex, it will now have 0E0 hex with auxiliary carry set
and carry reset.
Now we should look at the operation of the DAA. As we know, DAA logic is based
on the accumulator data and the carry and the auxiliary carry flags. For numbers in the
group 1 above, DAA will add 60 hex to get numbers 30 to 39 in the accumulator.
Numbers in group 2 above will have 66 added to them to get 41 to 45 hex. The number
in 3 above will also have 66 added (because of auxiliary carry flag) and it will become 46
hex. It can easily be seen that the result is the conversion of hex 0 to 0F in the
accumulator to the ASCII characters ‘0’ to ‘F’.
8086 operations of this program: Shown below is a demo of the 8086 program on
the same lines, along with the results of executing the program step by step.
CASE 1: LISTING 13D5:0000 B008 MOV AL,08
13D5:0002 3C0A CMP AL,0A
13D5:0004 1C2F SBB AL,2F
13D5:0006 27 DAA
WORKING OF THE PROGRAM CASE 1:
13D5:0000 B008 MOV AL,08
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0002 3C0A CMP AL,0A

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0004 NV UP EI NG NZ AC PO CY
13D5:0004 1C2F SBB AL,2F

AX=00D8 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ AC PE CY
13D5:0006 27 DAA
AX=003E BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY
CASE 3: LISTING 13D5:0000 B00F MOV AL,0F
13D5:0002 3C0A CMP AL,0A
13D5:0004 1C2F SBB AL,2F
13D5:0006 27 DAA
WORKING OF THE PROGRAM CASE 3:

13D5:0000 B00F MOV AL,0F
AX=000F BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
13D5:0002 3C0A CMP AL,0A

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0004 NV UP EI PL NZ NA PE NC
13D5:0004 1C2F SBB AL,2F

AX=00E0 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ NA PO CY
13D5:0006 27 DAA
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO CY
Two of the three cases are studied in these debug demonstrations: case 1, with the entry
in the group 0 to 9 hex, and case 3, for the number 0F hex. In group 1 we get 3E
instead of 38 (6 more), and in group 3 we get 40 instead of 46 (6 less). In group 2 the
SBB leaves the numbers DB to DF hex in the AL register (this case is not shown
above), and 66 is added to give the correct ASCII code. When the result goes wrong,
the culprit is seen to be the auxiliary carry flag (AC or NA) in the response to the SBB
instruction in the demo shown above.
The interesting feature to be noted here is that when performing subtraction using 2’s
complement addition, the carry of this ADD operation needs to be complemented to get
the real borrow of the subtraction at every bit, as can be easily verified. However, neither
8085 nor 8086 indicates carry at each bit stage. Only the carry at the half byte stage
(auxiliary carry) and the final byte stage carry are used. In the 8085 processor,
adjustment for decimal subtraction is not provided, while 8086 provides for this
operation. Because of this, 8085 ALU (arithmetic logic unit) does not bother about
correcting the auxiliary carry for subtraction, because as a rule, auxiliary carry is not used
after subtraction in 8085. We have used it here, sort of illegally. 8086 keeps the
auxiliary carry at the correct value, to accommodate the DAS operation. Due to this
feature, we have 6 added to the correct value in group 1 numbers, and 6 less in the group
3 number. An equivalent program for the 8086 could work if we complement auxiliary
carry after the SBB instruction to simulate the 8085 ALU behavior. However this would
require 3 additional instructions:
LAHF ; load the lower byte of flag register to AH register
XOR AH, 10H ; this will complement auxiliary carry flag (bit 4 of the flag byte)
SAHF ; store the AH register as the lower byte of flag register.
This makes the program a little bigger. Using the ideas of this program a more efficient
assembly language program can certainly be designed. A program more suitable for hex
to ASCII conversion in 8086, based on these ideas can be:
CMP AL, 0AH
CMC
ADC AL, 30H
DAA
The reader can easily work out how this program functions. The program can
conveniently be used as a macro (see chapter 4 for macros) to translate a hex digit in AL
to its ASCII equivalent. It avoids the conditional jump that would normally be used for
converting hex to ASCII, and a conditional jump generally takes more time to execute.
The conventional hex to ASCII program is as follows:
CMP AL, 0AH
JB DOWN
ADD AL, 7
DOWN: ADD AL, 30H
Exercises of this type can give a lot of insight into the design aspects of the processor
sub-units.
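The two 8086 sequences discussed in this appendix can be checked with a small model. The sketch below is Python, not 8086 code; it assumes the commonly documented DAA rule (add 06 if the low nibble exceeds 9 or the auxiliary carry is set, then add 60 if the original AL exceeded 99 hex or the carry was set). The function names are illustrative only.

```python
def daa(al, cf, af):
    """Model of DAA acting on AL with the incoming carry and auxiliary carry."""
    old_al, old_cf = al, cf
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
    return al

def via_adc(al):
    """CMP AL,0AH / CMC / ADC AL,30H / DAA"""
    cf = 1 if al < 0x0A else 0            # CMP: borrow when AL < 0AH
    cf ^= 1                               # CMC
    total = al + 0x30 + cf                # ADC AL,30H
    af = 1 if (al & 0x0F) + cf > 0x0F else 0   # low nibble of 30H is 0
    return daa(total & 0xFF, total >> 8, af)

def via_sbb(al):
    """CMP AL,0AH / SBB AL,2FH / DAA, with the 8086's true auxiliary carry."""
    cf = 1 if al < 0x0A else 0            # CMP
    diff = al - 0x2F - cf                 # SBB AL,2FH
    af = 1 if (al & 0x0F) - 0x0F - cf < 0 else 0
    return daa(diff & 0xFF, 1 if diff < 0 else 0, af)

print("".join(chr(via_adc(d)) for d in range(16)))
print(hex(via_sbb(0x08)), hex(via_sbb(0x0F)))
```

Under these assumptions the ADC version yields the sixteen characters seen in the dump of Appendix 1.B, while the SBB version reproduces the 3E and 40 observed in cases 1 and 3 above.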
APPENDIX 1.B
STUDY OF THE PROCEED COMMAND OF DEBUG
Note: the file is named pr.asm. To understand the segment relations, see the list file also
(pr.lst file).
The pr.asm program studied
Data_here segment
asc db 16 dup(0)
data_here ends
code segment
assume cs:code ; note other segments not indicated. It will
; make the program rather difficult to follow or debug.
start: mov ax,data_here
mov es, ax ; ‘data_here’ now becomes the extra segment, es.
; ds segment will be separate now.
mov di,offset asc ; offset in the ‘data’ (es) segment
cld
mov cx,16
mov bl,0
back: mov al, bl
call hasc
stosb
inc bl
loop back
int 1
hasc proc near
cmp al,10
cmc
adc al,30h
daa
ret
hasc endp
code ends
end start
The pr.exe program in the debug environment

-u 0 20
13D6:0000 B8D513 MOV AX,13D5

13D6:0003 8EC0 MOV ES,AX
13D6:0005 BF0000 MOV DI,0000
13D6:0008 FC CLD
13D6:0009 B91000 MOV CX,0010 ;this is in hex (16 decimal = 10 hex).
13D6:000C B300 MOV BL,00
13D6:000E 8AC3 MOV AL,BL
13D6:0010 E80700 CALL 001A
13D6:0013 AA STOSB
13D6:0014 FEC3 INC BL
13D6:0016 E2F6 LOOP 000E
13D6:0018 CD01 INT 01
13D6:001A 3C0A CMP AL,0A ; procedure starts here
13D6:001C F5 CMC
13D6:001D 1430 ADC AL,30
13D6:001F 27 DAA
13D6:0020 C3 RET ;procedure ends
-g 10

DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0010 NV UP EI PL NZ NA PO NC
13D6:0010 E80700 CALL 001A ; note es ≠ ds and es = ‘data_here’
-p 4 ; execute p command 4 times serially.
DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0013 NV UP EI PL NZ NA PE NC
13D6:0013 AA STOSB ; p1 over

DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0014 NV UP EI PL NZ NA PE NC
13D6:0014 FEC3 INC BL;p2 over

DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0016 NV UP EI PL NZ NA PO NC
13D6:0016 E2F6 LOOP 000E ;p3 over, p4 will complete the loop

DS=13C5 ES=13D5 SS=13D5 CS=13D6 IP=0018 NV UP EI PL NZ AC PO NC
13D6:0018 CD01 INT 01 ; p4 over; INT 01 to be executed next
-d es:0 f ; note the result is in es segment and not in ds
13D5:0000 30 31 32 33 34 35 36 37-38 39 41 42 43 44 45 46 0123456789ABCDEF
;the display at the end of the above line, indicates the characters printed
-q
The pr.lst file
Microsoft (R) Macro Assembler Version 5.10 1/19/7

Page 1-1
0000 data_here segment

0000 0010[ asc db 16 dup(0)
00
]
0010 data_here ends

0000 code segment
assume cs:code
0000 B8 ---- R start: mov ax,data_here
0003 8E C0 mov es, ax
0005 BF 0000 R mov di,offset asc
0008 FC cld
0009 B9 0010 mov cx,16
000C B3 00 mov bl,0
000E 8A C3 back: mov al, bl
0010 E8 001A R call hasc
0013 AA stosb
0014 FE C3 inc bl
0016 E2 F6 loop back
0018 CD 01 int 1
001A hasc proc near
001A 3C 0A cmp al,10
001C F5 cmc
001D 14 30 adc al,30h
001F 27 daa
0020 C3 ret
0021 hasc endp
0021 code ends
end start

Symbols-1
Segments and Groups:
N a m e Length Align Combine Class
CODE . . . . . . . . . . . . . 0021 PARA NONE

DATA_HERE . . . . . . . . . . 0010 PARA NONE
Symbols:
N a m e Type Value Attr
ASC . . . . . . . . . . . . . . L BYTE 0000 DATA_HERE Length = 0010
BACK . . . . . . . . . . . . . . L NEAR 000E CODE
HASC . . . . . . . . . . . . . . N PROC 001A CODE Length = 0007
START . . . . . . . . . . . . . L NEAR 0000 CODE
@CPU . . . . . . . . . . . . . . TEXT 0101h
@FILENAME . . . . . . . . . . . TEXT pr
@VERSION . . . . . . . . . . . . TEXT 510
26 Source Lines
26 Total Lines
11 Symbols
47090 + 412122 Bytes symbol space free
0 Warning Errors
0 Severe Errors
2. REGISTER SET AND INSTRUCTION SET OF 8086
In this chapter, we shall look at the register set of 8086, as accessible to the
programmer, and then we shall have a detailed look at the instructions; some of the
instructions do require a little bit of appreciation of the actual situation where the
instructions become useful (this is a general feature of all CISC – complex instruction set
computing – type of processors). Where required, such situations are examined with
worked out examples.
Register set of 8086 accessible to programmers
1. General purpose registers:
ax (16 bits) or ah:al (8 bits each) – accumulator
bx (16 bits) or bh:bl (8 bits each) – base register
cx (16 bits) or ch:cl (8 bits each) – counter and loop control register
dx (16 bits) or dh:dl (8 bits each) – extended accumulator and I/O address register
2. Pointers and index registers:
si (16 bits) – source index register
di (16 bits) – destination index register
bp (16 bits) – base pointer (for stack frame base)
sp (16 bits) – stack pointer (pointer for stack top)
3. Segment registers:
cs (16 bits) – code segment base pointer
ds (16 bits) – data segment base pointer
es (16 bits) – extra segment base pointer
ss (16 bits) – stack segment base pointer
4. Other utility registers:
ip (16 bits) – instruction pointer
f or status (16 bits) – flag or status register
Note that in 8086, 16-bit data are called words (words are used to represent data
or addresses), 8-bit data are called bytes or half words (bytes represent data or ASCII
characters), and 4-bit data are called nibbles (nibbles can represent BCD digits or hex
digits). Also note that register names are not case sensitive in ALP: ax and AX both
indicate the same register, so registers can be written in capital letters or in lower case
in assembly language programs.
Discussion on the use of registers: We will start with the general purpose
registers. Although the registers ax, bx, cx and dx are called general purpose registers for
handling data (either in 8 bits or in 16 bits), these have some special capabilities. The
registers ax (16 bits), and al (8 bits) are used as accumulators, capable of doing certain
specific operations. When the registers AX or AL are used this way, they are implied in
the instruction without being specifically indicated. These registers act as one of the
source operands as well as the destinations for the result of the instruction. For example:
MUL CX will mean the word in ax (implied, and not directly specified in the
instruction) is to be multiplied by the word in cx (specified in the instruction) to get a
double word product and the result will go to implied registers dx (high word of product
in the extended accumulator) and ax (low word of product in the accumulator).
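The split of the double word product between dx and ax can be illustrated with a short sketch. This is a Python model of the unsigned word multiply described above, not 8086 code; the function name mul16 and the operand values are chosen only for illustration.

```python
def mul16(ax, operand):
    """Model of MUL with a word operand: return (dx, ax) holding the 32-bit product."""
    product = (ax & 0xFFFF) * (operand & 0xFFFF)
    dx = (product >> 16) & 0xFFFF   # high word goes to the extended accumulator
    ax = product & 0xFFFF           # low word stays in the accumulator
    return dx, ax

dx, ax = mul16(0x1234, 0x0100)      # as if AX = 1234h and CX = 0100h before MUL CX
print(f"DX={dx:04X} AX={ax:04X}")
```

Here the product 123400 hex does not fit in 16 bits, so the upper part appears in dx.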
There are many such instructions which use ax, and al as implied accumulators, as
explained later while discussing the instructions in detail. In case of multiplication and
division of word size data, register dx is used as the high word extension of the
accumulator for the double word product in multiplication and for double word dividend
in division. The registers ax/ al are used as the accumulator in string instructions like
lodsb/w or stosb/w etc.
The register bx, is called the base register. As a 16 bit register, it can be used to
store an address. In the instruction XLAT (translate), it is used as the implied offset
address in the data segment, where the look up table for translation is located. The other
register associated with the XLAT instruction is the register al; al stores the byte pointer
in the look up table before execution of XLAT and after execution the data in the table
goes to al. The instruction can be used to realize any arbitrary Boolean function with up
to eight inputs and eight outputs.
The register cx is called the counter register. It is used as a counter in handling
arrays and with string instructions when they are repeatedly executed, with the prefix
‘rep’. It is also used as a loop counter, while executing loops a number of times. The
register cl is used as a counter to control the number of bits of shifts/ rotations in shift and
rotate instructions.
The register dx is used as an extension to the accumulator as already mentioned.
It also has the additional function of storing the I/O port address for indirect addressing of
the ports. The port address can be up to 16 bits. When the port address does not exceed
eight bits, direct addressing (with address given directly in the instruction itself) of the
ports is possible, but when the port address is more than eight bits, addressing must be
only through register dx indirectly.
The general purpose registers ax (ah, al), bx (bh, bl), cx (ch, cl) and dx (dh, dl) are
used for 16 or 8-bit data handling and are capable of performing arithmetic and logic
operations, shift and rotate operations on the data stored in them. In this sense, they are
all general purpose data handling registers.
Pointers and Index registers: We shall now look at the next set of registers,
which are five in number, and which are used for handling mainly addresses. They are
all 16-bit registers which can store the 16-bit offset address in a segment. Two of them
are index registers: si and di (source index and destination index); and the other three are
pointer registers: bp, sp and ip (base pointer, stack pointer and instruction pointer).
The registers si and di normally carry addresses of data in the data segment; in
case of string instructions, however, si refers to the source address in the data segment
and di refers to the destination address in the extra segment. We shall later discuss the
method of addressing data using a segment with an offset address in registers.
The registers bp (normally) and sp (always) refer to address of data in the stack
segment, while the ip refers always to the address of the instruction in the code segment.
The registers si, di, bp and sp can all handle 16-bit arithmetic and logic
operations, like the registers, ax, bx, cx and dx. Although arithmetic addition and
subtraction will be useful for handling addresses, it is difficult to see how multiply, divide
and logical operations could be used for address handling. It simply means that these
registers can also serve as data registers for 16-bit data handling, when they are not used
for address handling. They have no arrangement for handling two separate 8-bit data,
unlike al, ah of ax.
The register ip is meant exclusively for pointing to the next instruction to be
executed. As all instructions are in the segment cs, ip is always used with cs to generate
the instruction address. Jump instructions implicitly perform a 16-bit addition or
subtraction on ip, although they do not appear to be arithmetic instructions; ip cannot
take part in any other arithmetic (like multiply) or logic (like ex-or) operation. In short,
it cannot be used to handle data at all. Whatever add/ subtract it does is limited to
obtaining the address of the next instruction by adding an integer to, or subtracting one
from, the current contents of ip. We shall see more about this while discussing the jump
instructions.
Segment registers: There are four segment registers: cs (code segment), ds (data
segment), es (extra segment) and ss (stack segment). The register cs, as we have already
seen, indicates where the program instructions are located. The data and the extra
segments are indicators of the locations for storing the data used by the program
including results. The need for two segments for data storage will be brought out when
we discuss string instructions. The register ss indicates the memory area used for
the stack. As we shall see later, the stack is a very useful data structure which makes it
convenient to perform certain operations, during the execution of programs.
The flag register: The flags are single bits of information based on the nature of
the result of the immediately preceding arithmetic or logic operation. The 8086 has six
such status flags, updated by its arithmetic and logic instructions.
The overflow flag indicates that the addition or subtraction of numbers interpreted as
signed has resulted in a value which cannot be represented in the destination register.
One more bit is necessary to represent the value correctly.
Example: Consider 4-bit numbers in 4-bit registers to simplify the explanation.
Let the numbers added be -4 and -5, represented as 1100 and 1011 in the 2’s complement
notation. Note that the processor does not know about signed numbers. All it does is
add the binary numbers to produce the result 0111 with a carry of 1. However, the result
0111 in the destination register is not correct, as it is +7, which cannot be the sum of -4
and -5. The correct result is -9. Recall that -8 is the lowest negative number that can be
accommodated in a 4-bit register. The number -9 requires 5 bits or more, as it can be
written only as 10111. It may seem that the carry could serve as the overflow bit, but
that merely happens to work in this case and does not always do so, as the next example
shows. Here we add +7 (0111) and +2 (0010); the result obtained is 1001, with a carry
of 0. This result has no carry, but if we interpret it as a signed 2’s complement number
of 4 bits, it becomes -7, which is not correct. The correct result is 01001 and requires 5
bits or more to be represented correctly. An overflow has occurred, but the carry bit is
not set. It is thus seen that 2’s complement overflow and carry are not the same;
therefore a separate indication for 2’s complement add/ subtract overflow is needed.
This is the overflow flag; when it is set, it indicates that the preceding add/ subtract
operation has produced a number which cannot be represented completely by the result
register, if the numbers are interpreted as signed numbers in the 2’s complement notation.
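The two 4-bit examples above can be reproduced with a short sketch. This is a Python model of a hypothetical 4-bit adder, used only to separate carry from overflow; it assumes the standard rule that signed overflow occurs when both operands have the same sign bit and the result has the opposite one.

```python
def add4(a, b):
    """Add two 4-bit patterns; return (result, carry, overflow)."""
    total = (a & 0xF) + (b & 0xF)
    result = total & 0xF
    carry = total >> 4
    # Overflow: the sign bit of the result differs from the sign bit of both operands.
    overflow = 1 if ((a ^ result) & (b ^ result) & 0x8) else 0
    return result, carry, overflow

for a, b in [(0b1100, 0b1011),   # -4 + -5
             (0b0111, 0b0010),   # +7 + +2
             (0b1111, 0b1111)]:  # -1 + -1
    result, carry, overflow = add4(a, b)
    print(f"{a:04b} + {b:04b} = {result:04b}  carry={carry} overflow={overflow}")
```

The third pair is an extra case added here: it produces a carry without any overflow, which completes the picture that the two flags are independent.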
The carry flag indicates the same feature for numbers interpreted as unsigned.
This is the carry resulting from the normal binary add. It indicates that the numbers
added/ subtracted, taken as unsigned numbers, have produced a result which cannot be
represented fully in the destination register.
The zero flag indicates that add/ subtract operation has produced a result which is
zero. The zero applies only to the data stored in the destination register, and not to the
actual result of the arithmetic operation. To clarify this statement, consider adding 80
hex to 80 hex. The result is 100 hex. But what is stored in the 8-bit register will only be
00 hex, and the zero flag will be set, along with carry flag. The following experiment in
debug environment shows this fact.
-a ; start assembling at default address of 100 in the CS
1377:0100 add al,ah

1377:0102
-rax
AX 0000
:8080 ; load ax with 8080, that is, ah and al with 80 hex each
-r ; display all registers

1377:0100 00E0 ADD AL,AH ; add al and ah

DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 OV UP EI PL ZR NA PE CY
-q ; quit debug
Note: The complete result of this addition is not zero, as seen
by the overflow and the carry flags being set. But the ZF indicates
simply that the result that has gone into the destination register AL
is zero. When using the zero flag as indicating the result of addition
is zero, one has to be cautious of this possibility.
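The flag settings seen in this experiment can be reproduced with a small sketch. This is a Python model of an 8-bit add, not 8086 code; the function name add8 and the dictionary of flags are illustrative only.

```python
def add8(a, b):
    """Model of an 8-bit ADD: return (result, flags) for the carry, zero and overflow flags."""
    total = (a & 0xFF) + (b & 0xFF)
    result = total & 0xFF
    flags = {
        "CF": total >> 8,                                       # carry out of bit 7
        "ZF": 1 if result == 0 else 0,                          # only the stored byte is tested
        "OF": 1 if ((a ^ result) & (b ^ result) & 0x80) else 0, # signed overflow
    }
    return result, flags

print(add8(0x80, 0x80))   # the case tried in debug above
print(add8(0x01, 0xFF))   # +1 and -1: true negatives of each other
```

In both cases the stored byte is zero, but only the first also reports an overflow.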
Exercise: Show that after subtraction of two numbers if this
flag is checked, it will be set only when the two data are exactly
identical, no matter what the data are and no matter what the overflow
and the carry flags say. But if the check is done after addition, and
if the intention of the check is to see if the two data are negatives
of each other, then overflow flag setting can cause problems. Give
examples to support these statements and check them in the debug
environment.
[Hint: There is only one specific case which can have such a
problem on addition: that is the case indicated in the example above.]
The sign flag indicates that the result in the register is storing a negative number,
if interpreted in the 2’s complement mode. That is, the leading or the leftmost bit in the
result register is 1.
The auxiliary carry flag indicates, in an 8-bit or 16-bit arithmetic operation, the
presence or absence of a carry out of the least significant (L.S.) digit or nibble.
This flag is useful in applying corrections, as we shall see later, in connection with
decimal arithmetic instructions.
The parity flag indicates whether the number of 1’s in the result is odd or even.
It is used in data communication applications for carrying out a parity check on the
data received, and also for producing parity bits during transmission. For this purpose,
the parity flag reflects only the lower byte of the result in 16-bit operations; in
particular, a 16-bit add operation produces a parity flag which corresponds only to the
lower 8 bits of the result. Communication normally uses 8-bit operations. See the
demonstration below:
-a
1377:0100 add ax, 0

1377:0103 add ax, ax
1377:0105 add ax, ax
1377:0107 add ax, ax
1377:0109 add ax, ax
1377:010B
-rax
AX 0000
:00f0
-r
AX=00F0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
1377:0100 050000 ADD AX,0000; on adding AX has even parity
; AL also even parity
-t5

DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PE NC
1377:0103 01C0 ADD AX,AX ; on adding, AX has even parity
; AL has also even parity
AX=01E0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0105 01C0 ADD AX,AX ; AX has even parity
; AL has odd parity
AX=03C0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

; AL also has even parity

; AL has odd parity
DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PE NC
1377:010B 0000 ADD [BX+SI],AL DS:0000=CD
; AX has even parity
; AL also has even parity
-q
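The parity seen at each step of the demonstration can be worked out with a short sketch. This is a Python model, not 8086 code; it assumes, as stated above, that the flag depends only on the low byte, and it uses the debug mnemonics PE (even) and PO (odd). The function name is illustrative.

```python
def parity_flag(result):
    """Return 'PE' or 'PO' from the low byte of a result, ignoring the high byte."""
    ones = bin(result & 0xFF).count("1")
    return "PE" if ones % 2 == 0 else "PO"

ax = 0x00F0
steps = []
ax = (ax + 0) & 0xFFFF                 # ADD AX,0
steps.append((ax, parity_flag(ax)))
for _ in range(4):                     # four times ADD AX,AX
    ax = (ax + ax) & 0xFFFF
    steps.append((ax, parity_flag(ax)))

for value, flag in steps:
    print(f"AX={value:04X} {flag}")
```

Every value of AX in this sequence contains four 1’s, so the whole word always has even parity; the flag alternates only because the 1’s move through AL.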
In addition, there are two more flags that the user can manipulate. These are: the
direction flag and the interrupt flag. The direction or the D flag is used to control the
direction of the string operation in the string type of instructions. The interrupt or I flag
is used for interrupt control purposes as we shall see later.
There is still one more flag which is not accessible to the user, and that is the
trace or the T flag, which is essentially controlled by the system.
The Flag register details are shown below bitwise (each column indicates a bit, from
bit 15 on the left to bit 0 on the right):
Bit:  15 14 13 12 11 10  9   8     7    6    5 4       3 2      1 0
Flag: X  X  X  X  OV DIR INT TRACE SIGN ZERO X AUX. CY X PARITY X CY
The Flag register has 16 bits which are shown above. X’s are don’t cares.
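The layout above can be turned into a small decoder that prints a flag word in the two-letter form used by debug. This is a Python sketch, not 8086 code; the table name FLAG_BITS and the function name are illustrative, and the trace flag is left out because debug does not display it. The value 3CD7 hex is the one that appears in AX in the experiment in this section.

```python
# (bit number, mnemonic when the bit is 1, mnemonic when it is 0)
FLAG_BITS = [
    (11, "OV", "NV"),  # overflow
    (10, "DN", "UP"),  # direction
    (9,  "EI", "DI"),  # interrupt enable
    (7,  "NG", "PL"),  # sign
    (6,  "ZR", "NZ"),  # zero
    (4,  "AC", "NA"),  # auxiliary carry
    (2,  "PE", "PO"),  # parity
    (0,  "CY", "NC"),  # carry
]

def decode_flags(word):
    """Return the debug-style flag string for a 16-bit flag word."""
    return " ".join(on if (word >> bit) & 1 else off for bit, on, off in FLAG_BITS)

print(decode_flags(0x3CD7))            # flag word after the XOR
print(decode_flags(0x3CD7 ^ 0x0ED5))   # flag word before the XOR
```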
The experiment suggested below is an attempt to study the flag register details in
the debug environment.
-a ; assemble at cs:100
1377:0100 pushf ;push flag register onto stack

1377:0101 pop ax ;pop this into AX
1377:0102 xor ax,0ed5 ;toggle the eight flag bits
1377:0105 push ax ;put this result back in stack and then
1377:0106 popf ;into the flag register
1377:0107 ;’enter’ pressed to end assembly.
;Now execution of the program.
-r ;get the initial register contents.

1377:0100 9C PUSHF
-t5 ;trace next 5 instructions.
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000

1377:0101 58 POP AX

1377:0102 35D50E XOR AX,0ED5
;Watch the highlighted flag register contents, see how they match the
;indications in register AX. Watch the change in parity bit on execution
;of this instruction. Reason out why.
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
1377:0105 50 PUSH AX
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
1377:0106 9D POPF
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 OV DN DI NG ZR AC PE CY
-q
Explain the program and the flag conditions shown after each step.
Exercises: There are in all 8 flags which can be user controlled, and the ‘xor’ing
above reverses all these 8 flag bits. Use the method of PUSH AX followed by POPF to
identify one by one, which flag register bit corresponds to which flag. Find also which
value of the flag bit represents which condition of the flagged entity.
The data types that can be used
Although the computer basically works on unsigned binary number system, there
are instructions which can manipulate the data in registers/ memory as signed numbers,
decimal (binary coded or BCD) numbers or as ASCII characters for display purposes and
so on. We will now look into the details of these different data types handled by the
instruction set of 8086 processor. The data size handled by the processor is 8-bits and
16-bits as we have already seen.
The data can be simple unsigned binary numbers. In this case, carry and zero
flags will be of interest.
Question: In case of subtraction of unsigned numbers, do you think the sign flag
can give any meaningful indication about which of the two numbers is bigger? [Hint:
No. Can you support it with examples? Also try to reason out which flag(s) give this
information about the greater of the two source numbers, in the subtraction.]
The processor 8086 can also handle signed binary numbers of 16-bits or 8-bits. In
this case, the overflow, sign and zero flags will be of interest.
Question: Reason out which flags will be required to find out if the minuend is
greater than, the same as or less than the subtrahend in case of subtraction/ comparison of
two signed numbers. [Hint: All three flags indicated above. Give the logic for the
comparison in terms of these three flags]
Two 2-digit BCD (binary coded decimal) numbers can be handled for addition or
subtraction, in which, the result of binary addition or subtraction should be in the register
AL. In general, the operation for BCD number addition/ subtraction requires two
instructions to be executed. The first indicating the normal add, subtract binary operation
with the result in register AL, and the second to correct or adjust the result of the binary
operation in AL to be consistent with the result of BCD operation. If correction for
operation of addition is required, the add must be followed by the instruction DAA
(decimal adjust AL for addition), and if correction for subtraction is desired, the subtract
instruction must be followed by DAS (decimal adjust AL for subtraction). This BCD
correction process involves the use of the carry and the auxiliary carry flags as we shall
see later.
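The two-step BCD addition described above (binary add, then DAA) can be sketched as follows. This is a Python model, not 8086 code; it assumes the commonly documented DAA rule (add 06 if the low nibble exceeds 9 or the auxiliary carry is set, then add 60 if the original AL exceeded 99 hex or the carry was set), and the function name add_bcd is illustrative.

```python
def add_bcd(a, b):
    """Model of ADD AL,operand followed by DAA on two packed 2-digit BCD bytes.

    Returns (al, carry), where carry = 1 means the decimal sum exceeded 99.
    """
    total = a + b                                   # the binary ADD
    af = 1 if (a & 0x0F) + (b & 0x0F) > 0x0F else 0 # carry out of the low digit
    al, cf = total & 0xFF, total >> 8
    old_al, old_cf = al, cf                         # DAA tests the pre-adjust values
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf

for a, b in [(0x38, 0x45), (0x19, 0x28), (0x99, 0x01)]:
    al, cf = add_bcd(a, b)
    print(f"{a:02X} + {b:02X} -> AL={al:02X} carry={cf}")
```

The second pair needs the auxiliary carry (9 + 8 overflows the low digit in binary), and the third shows the carry flag reporting a decimal sum of 100.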
Question: In the above description, we have not included the comparison
operation. Comparison, as you know, is based on the result of subtraction. Do you think
we should use DAS after the comparison instruction while comparing two BCD
numbers? [Hint: The answer is ‘No’. Try to reason out this issue.]
It should be noted that provision is made only for addition and subtraction of 2-digit
BCD numbers, and these numbers are to be unsigned only. No direct way is
provided for handling signed BCD numbers or handling larger BCD numbers. Direct
multiplication and division of BCD numbers is not provided for, in 8086. The
instructions AAA, AAS, AAM and AAD provide for decimal addition, subtraction,
multiplication and division essentially at single digit level, in two stage operations, as we
shall see later while discussing these instructions.
There are some (actually very little) provisions in 8086 for handling ASCII
(American Standard Code for Information Interchange) characters. The console
keyboard and the monitor or other input/ output devices handle characters in the ASCII
code, as they have to take care of numbers, as well as textual material. However,
interpretation of data as ASCII is mainly at the operating systems level. Certain interrupt
operations interpret the data in the registers AL, AH and AX as ASCII characters.
Details of these operations we shall look into later.
Instruction Set Architecture

8086 uses what is known as the two address register/ memory type of
architecture. Many operations handled by microprocessors take two operands and
produce a single result. Operations like arithmetic ADD or logical AND have two input
operands and produce one result. In all, handling these three data would require three
registers or memory locations. The Intel 8086 specifies only two locations for such
instructions, either two registers or one register and one memory location. Use of
memory to specify both source operands is not permitted in 8086. Both locations
(register and register, or register and memory) specify the source operands. Then
where does the result go? The result goes to the first source operand specified, whether
memory or register, replacing that operand. This becomes quite convenient if we
have to do an operation like ADD on a series or chain of data stored in memory or
registers. Suppose we want to add the contents of the AX, BX and CX registers. The two
instructions ADD AX, BX followed by ADD AX, CX will give us the total of BX, CX
and what was originally in the register AX.
If the operation involves only a single operand like increment or rotate etc., the
result naturally replaces that operand.
This type of instruction set architecture is known as register/memory architecture.
Other instruction set architectures used in other processors can be of the zero-address
type (the operands are the top two entries of the stack, with the result replacing the
stack top operand; used in calculator type of systems), the accumulator type or single-address
architecture (one operand assumed to be in a special register called the accumulator, the
other source operand specified by the instruction, and the result replacing the accumulator
data; used mainly in 8-bit processors), or the three-address architecture, where three
separate locations in registers or memory are specified, two for the source operands and
one for the result. The data sources in three-address machines will normally be only in
registers (this is known as load/store or register/register architecture, where memory
data can only be loaded to a register or register data can only be stored in memory, while
only register data can participate in arithmetic and logic operations). This architecture is
used mainly in RISC (reduced instruction set computer) type of machines. Other types of
architectures also exist, but they are less commonly used.
Addressing Modes and Addressing

We have seen that many instructions need to specify two source operands. The
8086 permits a wide variety of methods for addressing the operands. If the
operand is directly available in a register, the method of specifying this data is known as
register direct. Consider the instruction: ADD AX, [BX]. The instruction has three
parts; the first part, ADD, is known as the opcode part. The second part, AX, is source
1, while the third, [BX], is source 2. Here the data for source 1 is the content of the
register AX and is taken directly from that register; in assembly language this is
specified by simply giving the name of the register where the data is available.
Source 2 is indicated as BX within square brackets. This means the data is not directly
available in the register BX; the data is in memory at the address held in the
register BX. The method of addressing used here is known as register indirect. The net
result of executing this instruction is to add to the data in register AX the contents of the
memory at the address indicated in register BX. The original contents of AX are replaced
by the sum. This involves a memory read first, then the ALU operation, and finally
writing the result obtained from the ALU into the register AX. Had the instruction been
ADD [BX], AX, the data in AX would not be altered on execution; the result from the ALU
would instead be written to the memory at the address indicated in BX. While ADD AX, [BX]
involves only a memory read operation, ADD [BX], AX involves a memory read for source 1
and a memory write back to the same location indicated by BX. Note that instructions of
the type ADD [BX], [SI] are not permitted, as they involve both operands in memory. Such
operations are permissible only in memory/memory type of instruction set architecture, as
in the Motorola 68000 processor.
Exercise: Study register direct and indirect addressing in the debug environment.
Immediate addressing: Immediate addressing is done by giving the data directly
in the instruction. For example, in the instruction ADD AX, 1234h, the data to be added
to the contents of AX (register direct) is the hex number 1234, appearing
immediately after AX in the instruction. The result will be stored in the location of the
first source operand, that is, in register AX. Note that with this type of addressing, the
first operand must be a place where the result can be stored, which means that the
immediate data will always be the second operand.
Based or indexed addressing: Based or indexed addressing gives a useful method
of addressing data arrays stored in memory. Although both do the same thing, the two
different names reflect two different situations where this method of addressing can be
used. An example of this type of addressing is: INC BYTE PTR [BX+2], which can also be
written as INC BYTE PTR 2[BX] (BYTE PTR states that a byte, not a word, is to be
incremented). This means read the data from memory at the address which is 2 more than
the address contained in register BX, increment the data, and write it back to the same
location. Consider a byte array starting at address 1200 hex. The first element of the
array is available by indirect addressing through the register BX, with BX holding the
address 1200 hex, while the ith byte of the array has the index i-1 in the
array and is addressed as [BX+i-1]. Here we have the base address of the array in BX, and
the index number is specified as a constant in the instruction. This is known as
based addressing. Now consider another situation where we have 2 different byte arrays,
one starting at 1200 hex and the other, say, at 1380 hex, and we want to handle, say, the
5th byte entry of each of the arrays. Then we store 5-1, that is, the index number 4, in the
BX register, and use the address 1200h[BX] to refer to the 5th byte of the first array and
1380h[BX] to refer to the 5th byte of the second array. In this case we have stored the
index number in the register, and the base address of the array is specified as
immediate data to be added to the BX register content to get the address of the data in
memory. Since the index number is now in the register BX, this method of addressing is
known as indexed addressing. Note that the processing required to get the memory
address is the same in based and in indexed addressing. The difference is only in
our interpretation in terms of the problem requirement.
Based and indexed addressing: Intel 8086 permits a combination of based and
indexed addressing with an immediate number in the instruction. For this purpose, BX
and BP are considered as base registers, while SI and DI are considered as index
registers. Any base register and any index register, along with an additional offset
number, can be used for addressing in this mode. The address 2[BP+DI] is a valid
address in this mode; it corresponds to the address obtained by adding the
contents of the BP and DI registers and then adding the number 2 to this sum. The address
2[BX+BP] is invalid, and so is the address 2[SI+DI], as neither satisfies the
requirement of one base register and one index register within the square brackets.
There are several ways in which based indexed instructions with displacements can
be written in an assembly language program. Exercise: Use the debug environment and
try to find four valid methods of writing this based indexed addressing instruction, with
displacement, in the assembly language. (Hint: Try various forms like 2[bx][di], 2[bx,
di], 2[bx]di etc. and find which gets rightly unassembled as [bx+di+2]).
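The address arithmetic of the based, indexed and based indexed modes can be modelled in a few lines of Python. This is only an illustrative sketch, not 8086 code: the function name effective_address, the register dictionary and the register values are assumptions made for the example.

```python
# Illustrative model of 8086 effective-address (offset) formation.
# The names and register values below are hypothetical, chosen for the example.
BASE_REGS = {"BX", "BP"}
INDEX_REGS = {"SI", "DI"}


def effective_address(regs, base=None, index=None, disp=0):
    """Return the 16-bit offset for an operand of the form disp[base+index]."""
    if base is not None and base not in BASE_REGS:
        raise ValueError(f"{base} is not a base register")
    if index is not None and index not in INDEX_REGS:
        raise ValueError(f"{index} is not an index register")
    total = disp
    if base is not None:
        total += regs[base]
    if index is not None:
        total += regs[index]
    return total & 0xFFFF  # the offset is only 16 bits wide


regs = {"BX": 0x1200, "BP": 0x0100, "SI": 0x0004, "DI": 0x0010}
print(hex(effective_address(regs, base="BX", disp=2)))              # [BX+2]   -> 0x1202
print(hex(effective_address(regs, base="BP", index="DI", disp=2)))  # 2[BP+DI] -> 0x112
```

Asking the model for 2[BX+BP] (base="BX", index="BP") raises an error, mirroring the rule that one base and one index register must be combined.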
The role of the segment registers in memory addressing: The Intel 8086
processor provides for addressing of memory with 20 bits of address, 00000 hex to fffff
hex. We have so far seen that addresses are contained in 16-bit registers (as
in register indirect as well as based or indexed addressing etc.). Then how is the 20-bit
address produced? The answer lies in the fact that not just one, but two 16-bit
values are used in producing the 20-bit address. Of these, we have already seen in the
above section on addressing modes how the 16-bit memory address is generated based
on the instruction. The address thus obtained is known as the EA (effective address) or
the offset address of the data. This address is now combined with a value derived from
one of the segment registers to produce the 20-bit absolute address of the data. The
segment register content is extended from 16 bits to 20 bits by
simply appending four binary 0's, or just one hex zero, at the end of the segment register
content. The 16-bit address, EA, obtained from the instruction is then added to the 20-bit
address derived from the segment register to get the 20-bit absolute memory address.
Any carry resulting from this addition is simply ignored by the processor. Which of the
four segment registers goes with which effective address? IP is the EA of the instruction
to be fetched. It always goes with the code segment register CS. Normally the EA of any
data goes with the data segment register DS. In string instructions, the EA of the source is
held in the SI register, and this goes with the data segment; the EA of the destination is
held in the DI register, and this goes with the extra segment ES. The use of
separate segments for the source and destination addresses permits the movement of data
from any source address to any destination address in the full 20-bit address range of the
memory. Attaching both source and destination addresses to the same segment register
would give a total address range of only 16 bits from the segment base, as the effective
address is only 16 bits. Please note that the offset address in any segment can only be
16 bits wide, which means a segment can accommodate 65536 bytes of data or
program.
Segment and effective address register combinations permitted: The IP register is
always combined with CS, as we have already seen. The BX, SI and DI registers normally go
with the data segment register DS. In string instructions, DI always goes with ES.
Register SP always goes with SS, and BP normally goes with SS. Wherever we have
used the word ‘normally’, other segment registers can be used, provided they are
explicitly stated in the instruction by using segment override instruction prefixes.
The following is an example of the study of the use of instruction prefixes in the debug
environment. It also shows how the confusion about word or byte of memory data is
resolved in an ambiguous situation.
-a 100
1377:0100 mov bx,200
1377:0103 cs: ; segment override prefix. Normally
; [BX] would use the DS segment.
1377:0104 mov word ptr [bx], 34; ambiguity removed, (word at address bx
; intended); data intended is not the
; byte 34, but the word 0034
1377:0108 cs:
1377:0109 mov ax,[bx] ; no ambiguity, AX is a word reg; so BX is a
; word pointer.
1377:010B
-u100 10a ; unassemble between 100 and 10a
1377:0100 BB0002 MOV BX,0200

1377:0103 2E CS: ; segment override prefix
1377:0104 C7073400 MOV WORD PTR [BX],0034
1377:0108 2E CS: ; segment override prefix again.

1377:0109 8B07 MOV AX,[BX]
-r
1377:0100 BB0002 MOV BX,0200
-t3
1377:0103 2E CS:
1377:0104 C7073400 MOV WORD PTR [BX],0034 CS:0200=75C2

1377:0108 2E CS:
1377:0109 8B07 MOV AX,[BX] CS:0200=0034

DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PO NC
-q ; quit the debug and go to DOS environment.
Exercise: The DS register in an 8086 has the hex number 1234. What memory
address in 20 bits is indicated at the effective or offset address of 123A hex? If the
segment override prefix, ES: is used, which effective address will point to the same
physical memory location when ES has the hex address 123B? [Ans: 1357A hex and
11CA hex]
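The segment and offset arithmetic used in this exercise can be checked with a short Python sketch. The function names physical_address and offset_for are hypothetical, introduced only for this illustration; the 20-bit wrap-around models the ignored carry described earlier.

```python
# Illustrative model of 8086 segment:offset address formation.
def physical_address(segment, offset):
    """Shift the segment left by 4 bits (one hex zero appended), add the offset,
    and ignore any carry out of bit 19."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


def offset_for(segment, physical):
    """Return the offset that reaches `physical` from the given segment."""
    offset = physical - ((segment & 0xFFFF) << 4)
    if not 0 <= offset <= 0xFFFF:
        raise ValueError("address not reachable from this segment")
    return offset


print(hex(physical_address(0x1234, 0x123A)))  # 0x1357a
print(hex(offset_for(0x123B, 0x1357A)))       # 0x11ca
```

The two printed values agree with the answers quoted in the exercise.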
Instruction set: The instructions of 8086 are discussed in detail below. The
instruction description and the operation details of the instructions are taken essentially
from the Intel IA-32 Software Developer’s manual, vol. 2. To start with, the various
types of 8086 instructions available are listed, and then the instructions available in each
type and the details of their operation are presented. There are several types of
instructions as listed below:
1. Data transfer instructions (including I/O transfers)
2. Binary arithmetic instructions
3. Decimal (BCD, ASCII) arithmetic instructions
4. Logical instructions
5. Shift and rotate instructions
6. Control transfer instructions
7. String instructions
8. Flag control instructions
9. Segment register instructions
10. Miscellaneous instructions
Instruction Details
1. Data Transfer instructions (including I/O transfers): Data Transfer
instructions essentially copy the data and do not affect any flags (except of
course the POPF instruction which modifies all the flags as per the stack top
word)
• MOV: stands for move, but it is actually a copy: when data is moved
from a source register to a destination register, the source is not destroyed;
a copy of the data simply appears in the destination register.
Examples: mov ax, bx ; data in bx is copied into ax
mov dx, [bx] ; memory word at the address in bx is copied into dx
mov [si + 34], cx ; cx is copied to memory at the address si + 34
mov bx, 1234h ; 1234 hex goes to reg. bx
mov word ptr [bx+si + 2], 23 ; 23 decimal or 0017 h is moved.
• XCHG: Exchange instruction exchanges data between registers or
between register and memory
Examples: xchg bx, dx ; reg-reg exchange
xchg ax, [bx] ; reg-memory exchange
xchg bx, [1234] ; reg-memory with direct addressing
• PUSH: Push causes the word in the source register or memory location to be
copied on to the stack top.
Examples: push ax ; push reg. ax on to the stack
push [bx] ; push memory word at address in bx, on to the stack.
push [1234] ; push word at effective address 1234 to the stack
pushf ; push the flag register on to the stack top.
• POP: Pop causes the stack top moved to (that is, removed from the stack
and loaded onto) the destination register or memory specified by the
instruction.
Examples: pop bx ; stack top moved to reg. bx
pop [bx + si + 4] ; stack top to memory at the address given
pop [1234] ; pop to memory at effective address 1234
popf ; pop the stack top onto the flag register
Among all the data transfer instructions popf is the only instruction that
affects and modifies the flags.
• IN: The IN instruction reads from an input port into AL or AX (only these
two registers), as specified in the instruction. The port address is
generally in the register DX. But if the address fits in 8 bits, it can also be
given directly in the instruction.
Examples: in ax, 28h; read 16-bit port at address 28 hex into reg ax
in ax, dx; read 16-bit port at address in dx into reg ax
in al, 15h; read 8-bit port at address 15 hex into reg al
in al, dx; read 8-bit port at address in dx
• OUT: The OUT instruction outputs the data in register AX or AL (only these
two registers), as specified in the instruction, to an output port. The port
address is given directly in the instruction if it fits in 8 bits, or is taken
from the register DX (for addresses of up to 16 bits)
Examples: out 16h, ax; write to output port at address 16h from reg. ax
out dx, ax; write to output port at address in dx from reg. ax
out 23h, al; write to output port at address 23h from reg al
out dx, al; write to output port at address in dx from reg al
• CBW: Convert byte to word. The source register al and the destination ax
are both implied and not specifically mentioned in this instruction. This
instruction is used to extend the 8-bit integer (signed number) in reg. al to
16-bit integer in ax. The process is called sign extension. If the number in
al is positive, ah will be loaded with 00 hex, else ah will be loaded with ff
hex.
Example: cbw
Exercise: Study the instruction in debug
• CWD: Convert word in reg. ax (implied and not stated in the instruction)
to double word in regs. dx:ax (also implied and not stated). That is, sign
extend from ax into dx:ax.
Example: cwd
Exercise: Study the instruction in debug
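The sign extension performed by CBW and CWD can be modelled as follows. This is a minimal Python sketch of the register effects only; the function names cbw and cwd simply borrow the mnemonics and are not part of any library.

```python
# Illustrative model of the CBW and CWD register effects.
def cbw(al):
    """Return AX after CBW: AH is filled with copies of bit 7 of AL."""
    al &= 0xFF
    ah = 0xFF if al & 0x80 else 0x00
    return (ah << 8) | al


def cwd(ax):
    """Return (DX, AX) after CWD: DX is filled with copies of bit 15 of AX."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


print(hex(cbw(0x7F)))   # positive byte: AH becomes 00
print(hex(cbw(0x80)))   # negative byte: AH becomes FF
print([hex(v) for v in cwd(0xFFF9)])  # -7 as a word extended into DX:AX
```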
2. Binary Arithmetic instructions: All binary arithmetic instructions update the
flag register based on the result of the operation performed, with the exceptions
noted under the individual instructions.
• ADD: Description
Adds the first operand (destination operand) and the second operand
(source operand) and stores the result in the destination operand. The
destination operand can be a register or a memory location; the source
operand can be an immediate, a register, or a memory location.
(However, two memory operands cannot be used in one instruction.)
When an immediate value is used as an operand, it is sign-extended to
the length of the destination operand format. The ADD instruction
performs integer addition. It evaluates the result for both signed and
unsigned integer operands and sets the OF and CF flags to indicate a
carry (overflow) in the signed or unsigned result, respectively. The SF
flag indicates the sign of the signed result. Operation:
DEST ← DEST + SRC;
The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
Examples: ADD AX, BX
ADD [BX + 4], DX
ADD CX, 2[SI]
ADD DX, 123H
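How the six status flags follow from a 16-bit ADD can be sketched in Python. This is an illustrative model under the flag definitions stated above, not processor code; the function name add16 and the dictionary of flags are assumptions made for the example.

```python
# Illustrative model of a 16-bit ADD and the flags it sets.
def add16(dest, src):
    """Return (result, flags) for DEST <- DEST + SRC on 16-bit operands."""
    dest &= 0xFFFF
    src &= 0xFFFF
    total = dest + src
    result = total & 0xFFFF
    flags = {
        "CF": int(total > 0xFFFF),                       # unsigned carry out of bit 15
        # signed overflow: both operands differ in sign from the result
        "OF": int(((dest ^ result) & (src ^ result) & 0x8000) != 0),
        "SF": result >> 15,                              # sign bit of the result
        "ZF": int(result == 0),
        "AF": int((dest & 0xF) + (src & 0xF) > 0xF),     # carry out of bit 3
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),  # even parity of low byte
    }
    return result, flags


print(add16(0x7FFF, 0x0001))  # signed overflow without unsigned carry
print(add16(0xFFFF, 0x0001))  # unsigned carry without signed overflow
```

The two calls show why both OF and CF are needed: the same binary adder result is read as signed by one flag and as unsigned by the other.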
• ADC: Adds with carry. Same as ADD with the following change in the
operation:
DEST ← DEST + SRC + CF;
• SUB: Subtracts. Follows the same rules as ADD with the following
change in the operation:
DEST ← DEST – SRC;
• SBB: Subtract with borrow. Same as SUB with the following change in
the operation:
DEST ← DEST – (SRC + CF);
• CMP: Compare two operands.
Compares the first source operand with the second source operand and
sets the status flags in the FLAGS register according to the results. The
comparison is performed by subtracting the second operand from the
first operand and then setting the status flags in the same manner as the
SUB instruction. When an immediate value is used as an operand, it is
sign-extended to the length of the first operand. Operation:
temp ← SRC1 − SRC2;
In case an immediate value is used, then
temp ← SRC1 − Sign Extend (SRC2);
Modify Status Flags; (* Modify status flags in the same manner as the
SUB instruction*)
Flags Affected:
The CF, OF, SF, ZF, AF, and PF flags are set according to the result.
Examples: CMP AX, 24H; (24H is sign extended to 16 bits before
subtraction, because AX is a 16-bit register)
CMP BYTE PTR [BX], -24H; (no sign extension done, data in
bytes are being handled)
CMP BX, SI
CMP AL, [BX]; (BX will be taken only as a byte pointer, as
AL is a byte register)
The meaning of sign extension is seen in the following debug study:
-a
1377:0100 cmp ax,25

1377:0103 cmp ax,-25
1377:0106 cmp ax,db
1377:0109 cmp ax,-db
1377:010C
-u 100 10B
1377:0100 3D2500 CMP AX,0025

1377:0103 3DDBFF CMP AX,FFDB
1377:0106 3DDB00 CMP AX,00DB
1377:0109 3D25FF CMP AX,FF25
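The four immediate words in the unassembled listing above can be reproduced by taking each value modulo 2^16, which is the two's complement encoding. The helper names to_word and as_signed_word are hypothetical, used only for this Python illustration.

```python
# Illustrative check of the 16-bit immediates shown in the debug listing.
def to_word(value):
    """Encode a signed or unsigned integer as a 16-bit two's complement word."""
    return value & 0xFFFF


def as_signed_word(word):
    """Interpret a 16-bit word as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


for value in (0x25, -0x25, 0xDB, -0xDB):
    print(f"{to_word(value):04X}")  # prints 0025, FFDB, 00DB, FF25 in turn
```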
• MUL: Description: Performs an unsigned multiplication of the first
operand (destination operand) and the second operand (source
operand) and stores the result in the destination operand. The
destination operand is an implied operand located in register AL
or AX (depending on the size of the operand); the source
operand is located in a general-purpose register or a memory
location. The action of this instruction and the location of the
result depend on the opcode and the operand size. Operation:
IF byte operation
THEN
AX ← AL ∗ SRC
ELSE (* word operation *)
DX:AX ← AX ∗ SRC
Flags Affected
The OF and CF flags are set to 0 if the upper half of the result is 0;
otherwise, they are set to 1.
The SF, ZF, AF, and PF flags are undefined.
Examples: MUL BX
MUL WORD PTR [BX + DI + 48H]
MUL BYTE PTR [SI]
MUL CL
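The byte and word forms of MUL, together with the CF/OF rule stated above, can be modelled in Python as follows. The function names mul8 and mul16 are assumptions for this sketch; the single returned carry value stands for both CF and OF, which MUL always sets alike.

```python
# Illustrative model of unsigned MUL in its byte and word forms.
def mul8(al, src):
    """Byte form: AX <- AL * SRC. Returns (AX, carry), where carry = CF = OF."""
    ax = (al & 0xFF) * (src & 0xFF)
    carry = int((ax >> 8) != 0)      # set when AH, the upper half, is non-zero
    return ax, carry


def mul16(ax, src):
    """Word form: DX:AX <- AX * SRC. Returns (DX, AX, carry), carry = CF = OF."""
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    dx, lo = product >> 16, product & 0xFFFF
    carry = int(dx != 0)             # set when DX, the upper half, is non-zero
    return dx, lo, carry


print(mul8(0x0F, 0x0F))      # product fits in AL, so CF = OF = 0
print(mul8(0x10, 0x10))      # product needs AH, so CF = OF = 1
print(mul16(0xFFFF, 0xFFFF)) # largest word product
```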
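• IMUL: Integer (signed) multiply. Similar to MUL, except the data are
considered as signed integers.
The OF and CF flags are set to 0 if the upper half of the result is only the sign
extension of the lower half; otherwise, they are set to 1. The SF, ZF, AF, and PF
flags are undefined.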
• DIV: Divides the unsigned integer dividend in the accumulator by the
unsigned integer divisor specified in the instruction. If the divisor
specified is a word register or word memory, the dividend is
the double word in DX:AX; the quotient will be in
register AX and the remainder in register DX, and the divisor word
specified in the instruction is not altered. If the divisor specified in
the instruction is a byte register or byte memory, the dividend is
the word in register AX; the quotient will be in register AL
and the remainder in register AH. In word
division, if the divisor word is not greater than the part of the dividend
held in DX, the quotient will not fit into the register AX.
Execution of the DIV instruction in such a case generates a division overflow
exception, and the operating system should take care of this
exception. We shall see later what ‘exception’ means. Similarly, if the
byte divisor specified in a byte divide instruction is not greater than the
part of the dividend held in the register AH, a division overflow
exception will be generated.
Examples of DIV instruction: DIV BX;
DIV WORD PTR [DI];
DIV CL;
DIV BYTE PTR [SI];
Exercises: Test the DIV instruction in the debug environment. See what
happens when the data is such as to generate division overflow exception.
[Hint: here is an example of such a study.
-a
1377:0100 div bl
1377:0102
-rax
AX 0000
:1234
-rbx
BX 0000
:000f
-r
AX=1234 BX=000F CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
1377:0100 F6F3 DIV BL
; Note AH > BL and division overflow exception will occur.
-t ; trace the execution of this instruction
Divide overflow ; this message is now in the DOS environment, as seen by the
; DOS prompt appearing below; otherwise the debug prompt
; ‘-’ would have been seen here.
C:DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1> ; DOS prompt.
On executing the instruction, the overflow exception is seen to
cause an exit from debug to the DOS environment, and the words
‘Divide overflow’ get displayed in the DOS environment.]
The CF, OF, SF, ZF, AF and PF flags are undefined, when DIV is
executed
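The overflow condition for DIV described above (divisor not greater than the upper half of the dividend) can be expressed directly in a Python model. The names div8, div16 and DivideError are hypothetical; DivideError merely stands in for the division overflow exception.

```python
# Illustrative model of unsigned DIV and its overflow condition.
class DivideError(Exception):
    """Stands in for the 8086 division overflow exception."""


def div8(ax, divisor):
    """Byte form: returns (AL, AH) = (quotient, remainder) of AX / divisor."""
    ax &= 0xFFFF
    divisor &= 0xFF
    if divisor == 0 or (ax >> 8) >= divisor:   # AH >= divisor: quotient exceeds 8 bits
        raise DivideError("divide overflow")
    return divmod(ax, divisor)


def div16(dx, ax, divisor):
    """Word form: returns (AX, DX) = (quotient, remainder) of DX:AX / divisor."""
    dx &= 0xFFFF
    divisor &= 0xFFFF
    if divisor == 0 or dx >= divisor:          # DX >= divisor: quotient exceeds 16 bits
        raise DivideError("divide overflow")
    return divmod((dx << 16) | (ax & 0xFFFF), divisor)


print(div8(0x0034, 0x0F))           # 52 / 15: quotient 3, remainder 7
print(div16(0x0000, 0x1234, 0x10))  # quotient 0x123, remainder 4
```

Calling div8(0x1234, 0x0F), the data used in the debug study above, raises DivideError because AH (12 hex) is not less than the divisor (0F hex).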
• IDIV: Integer divide, same as DIV but the data and the results are
considered as signed integers.
The CF, OF, SF, ZF, AF, and PF flags are undefined, when IDIV is
executed.
There is an interesting doubt that can come up with signed division.
Suppose we divide -7 by +3, there is no doubt about the sign of the
quotient, here the quotient can only be negative. The confusion is about
the magnitude of the quotient and the sign of the remainder. In the given
example, one can say the quotient is -3 and the remainder is +2, or the
quotient is -2 and the remainder is -1, as both these solutions satisfy the
basic requirement that (quotient)*(divisor) + remainder = dividend, and
that the magnitude of the remainder is less than the magnitude of the
divisor;
(-3)*(+3) + (+2) = -7; also (-2)*(+3) + (-1) = -7; which is correct?
Exercise: Try to see, in the debug environment, what the processor
actually gives; try to reason out logically whether that is all right. [Hint: You will
find the processor gives a result corresponding to dividing the
magnitudes involved and then attaching signs as necessary, based on the
signs of the given data. In the given example, magnitude 7 is divided
by magnitude 3 to get 2 for the quotient and 1 for the
remainder. Since the dividend and the divisor have opposite signs,
the quotient becomes negative, while the remainder carries the
same sign as the dividend. This is logically OK: it is like
distributing 7 negative units among three people; after giving 2
negative units to each, 1 negative unit remains with you. So the
quotient is -2 and the remainder is -1.]
Below, we see the demonstration in the debug environment.
-a
1377:0100 idiv bl
1377:0102
-rax
AX 0000
:fff9 ; this makes AX, the dividend = -7
-rbx
BX 0000
:803 ; this makes the divisor in BL = +3; (we ignore BH)
-r
AX=FFF9 BX=0803 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
1377:0100 F6FB IDIV BL
-t
AX=FFFE BX=0803 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

; we see the result: quotient in AL = -2, and remainder in AH = -1.
; The divisor in BL is unaltered. (The data in BH, which is not relevant to the
; program, is also not altered.)
-q ; quit debug.
Note: In the above experiment if initially you load into AX some number
like FABC H, and keep BL the same, you will see divide overflow
occurring.
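The rule the processor follows for signed division (divide the magnitudes, then attach the signs, so that the remainder takes the sign of the dividend) can be modelled in Python as below. The names idiv8, to_signed and DivideError are hypothetical. The quotient range of -128 to +127 is the one given in the IA-32 manual; treat the lower limit as an assumption for the original 8086.

```python
# Illustrative model of the byte form of IDIV (AX divided by an 8-bit divisor).
class DivideError(Exception):
    """Stands in for the 8086 division overflow exception."""


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def idiv8(ax, divisor):
    """Return (AL, AH) = (quotient, remainder) as raw bytes."""
    dividend = to_signed(ax, 16)
    d = to_signed(divisor, 8)
    if d == 0:
        raise DivideError("divide overflow")
    quotient = abs(dividend) // abs(d)      # divide the magnitudes
    if (dividend < 0) != (d < 0):           # opposite signs: negative quotient
        quotient = -quotient
    remainder = dividend - quotient * d     # takes the sign of the dividend
    if not -128 <= quotient <= 127:         # assumed range, see the note above
        raise DivideError("divide overflow")
    return quotient & 0xFF, remainder & 0xFF


al, ah = idiv8(0xFFF9, 0x03)                # -7 divided by +3
print(f"AL={al:02X} AH={ah:02X}")           # AL=FE AH=FF
```

The printed bytes match the AX=FFFE seen in the trace, and idiv8(0xFABC, 0x03) raises DivideError, in line with the note above.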
• INC: Increment register or memory. This involves only a single operand.
Adds 1 to the destination operand, while preserving the state of the CF
flag. The destination operand can be a register or a memory location. This
instruction allows a loop counter to be updated without disturbing the CF
flag. (An ADD instruction with an immediate operand of 1 also
performs an increment, but it does update the CF flag.) Operation:
DEST ← DEST + 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set
according to the result.
• DEC: Similar to INC, this instruction does the operation:
DEST ← DEST – 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set
according to the result.
• NEG: Replaces the value of operand (the destination operand) with its
two’s complement. (This operation is equivalent to subtracting the
operand from 0.) The destination operand is located in a general-purpose
register or a memory location.
DEST ← – (DEST)
Flags Affected
The CF flag is set to 0 if the source operand is 0; otherwise it is set to 1. The
OF, SF, ZF, AF, and PF flags are set according to the result.
3. Decimal (BCD, ASCII) arithmetic instructions: 8086 provides for handling
decimal digits in byte size, either as 2-digit packed BCD (unsigned) or as a single-digit
ASCII character byte (30-39 hex in order, standing for the BCD digits 0-9). The instructions
DAA and DAS are for handling 2-digit BCD for addition and subtraction, while
the instructions AAA, AAS, AAM and AAD are for handling single-digit ASCII
data for BCD digits for addition, subtraction, multiplication and division
respectively. It is to be noted that inputs from the keyboard or other input
devices, as well as outputs to the monitor, printer and other output devices, will
usually be in ASCII code, so the four ASCII adjust instructions above, starting
with the characters AA, facilitate the handling of BCD digits in the ASCII
character code for add, subtract, multiply and divide operations. It is also to be
noted that all six instructions stated above have an A as the middle character.
This A stands for ADJUST. This implies that the operations of add, subtract, multiply
and divide are not done by these instructions; they are done separately by the
normal ADD, SUB, MUL and DIV instructions, considering the data as normal
binary. What these instructions do is adjust the result of the binary operation to
match the result of the decimal operation. We now look into the details.
• DAA: Decimal adjust accumulator for addition
Description: Adjusts the sum of two packed BCD values to create a
packed BCD result. The AL register is the implied source and destination
operand. The DAA instruction is only useful when it follows an ADD
instruction that adds (binary addition) two 2-digit, packed BCD values and
stores a byte result in the AL register. The DAA instruction then adjusts
the contents of the AL register to contain the correct 2-digit, packed BCD
result. If a decimal carry is detected, the CF and AF flags are set
accordingly. Operation: A complete description of the operation is as
follows:
old_AL ← AL;
old_CF ← CF; (* AL and CF are saved in temporary registers *)
CF ← 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
THEN
    AL ← AL + 6;
    CF ← old_CF OR (Carry from AL ← AL + 6);
    AF ← 1;
ELSE
    AF ← 0;
FI; (* the first IF ends here *)
IF ((old_AL > 99H) OR (old_CF = 1))
THEN
    AL ← AL + 60H;
    CF ← 1;
ELSE
    CF ← 0;
FI;
Flags affected: The CF and AF flags are set if the adjustment of the value
results in a decimal carry in either digit of the result (see the “Operation”
section above). The SF, ZF, and PF flags are set according to the result.
The OF flag is undefined.
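The operation of DAA described above can be transcribed into Python to experiment with different data. This is an illustrative sketch following the stated pseudocode step by step; the helper add_bcd, which performs the preceding binary ADD and derives CF and AF from it, is an assumption added for convenience.

```python
# Illustrative transcription of the DAA operation described above.
def daa(al, cf, af):
    """Return (AL, CF, AF) after DAA, given AL and the flags left by ADD."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        carry_out = int(al + 6 > 0xFF)      # carry from AL <- AL + 6
        al = (al + 6) & 0xFF
        cf = old_cf | carry_out
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


def add_bcd(a, b):
    """Binary ADD of two packed BCD bytes followed by DAA."""
    total = a + b
    cf = int(total > 0xFF)
    af = int((a & 0x0F) + (b & 0x0F) > 0x0F)
    return daa(total & 0xFF, cf, af)


al, cf, af = add_bcd(0x87, 0x19)            # decimal 87 + 19 = 106
print(f"AL={al:02X} CF={cf} AF={af}")       # AL=06 CF=1 AF=1
```

Here the binary sum A0 hex is corrected by 66 hex, giving 06 with the carry flag holding the hundreds digit.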
An experimental study of DAA using an ALP converted into an
executable program with MASM and LINK and execution of the program
in the debug environment is presented below.
The assembly language program studied is given below. (Note the data chosen to
be added to the byte 87 [1000 0111] in BL register in the nine cases.)
code segment
assume cs:code
start: mov bl, 87h ; this is the addend
mov cx, 9 ; 9 different augends are chosen
assume ds: code
mov ax, cs
mov ds, ax ; initialise data segment; note this method
cld
lea si, augends
back: lodsb ; string load byte, without ‘rep’ prefix.
; note cx (count reg) is not relevant here
add al, bl ; get the binary sum
daa ; correct the sum for decimal addition
; note, data in ah is unaffected by this inst.
loop back
int 01
;
augends db 12h; no cy, no ac, no 'abcdef' hex in the sum
db 19h; ac and 'a' in msd, no cy
db 91h; cy, no ac, no 'abcdef' in the sum
db 32h; 'b' in msd, no cy, no ac
db 16h; 'd' in lsd, no cy, no ac
db 96h; cy and 'd' in lsd
db 69h; ac and 'f' in msd
db 99h; ac and cy and no ‘abcdef’ in the sum
db 67h; sum becomes 'ee'
code ends
end start
The execution of the program in debug environment:
-u 0 1e ;unassemble up from 0 to 1e hex in the code segment

13D5:0000 B387 MOV BL,87
13D5:0002 B90900 MOV CX,0009 ; 9 different data added to 87
13D5:0005 8CC8 MOV AX,CS
13D5:0007 8ED8 MOV DS,AX ; DS is made the same as CS
13D5:0009 FC CLD ; instruction to be studied yet
13D5:000A 8D361600 LEA SI,[0016]; yet to be studied
13D5:000E AC LODSB ; yet to be studied
13D5:000F 02C3 ADD AL,BL
13D5:0011 27 DAA
13D5:0012 E2FA LOOP 000E ; yet to be studied
13D5:0014 CD01 INT 01 ; end execution & return to Debug
From here on, the bytes are actually data sitting in the code segment, interpreted as
instructions (unassembled) under the ‘u 0 1e’ command
13D5:0016 1219 ADC BL,[BX+DI] ; 2 data bytes
13D5:0018 91 XCHG CX,AX ; 1 data byte
13D5:0019 32169669 XOR DL,[6996] ; 4 data bytes
13D5:001D 99 CWD ; 1 data byte
13D5:001E 67 DB 67 ; 1 data byte
-g 11 ; execute until (and excluding) the instruction at 11 hex.
; Stop just before DAA for the first data, that is, after ADD AL, BL.
; From here, the program is traced for every data.

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PE NC
13D5:0011 27 DAA
; no change in result; BCD or binary addition give the same result
13D5:0012 E2FA LOOP 000E

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000E NV UP EI NG NZ NA PE NC
13D5:000E AC LODSB ; second data

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI NG NZ NA PE NC

AX=13A0 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0018 DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ AC PE NC
13D5:0011 27 DAA
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY
; note the modification here, 66 h added to result of binary add; why?
13D5:000E AC LODSB; third data

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ AC PE CY

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 OV UP EI PL NZ NA PE CY
13D5:0011 27 DAA
; note here, 6 added to MSD. Reason?
13D5:000E AC LODSB; fourth data

AX=1332 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI PL NZ NA PE CY

AX=13B9 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PO NC
13D5:0011 27 DAA
AX=1319 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PO CY
; here also 6 added to MSD; why?
13D5:000E AC LODSB ; fifth data

AX=1316 BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ NA PO CY
AX=139D BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PO NC
13D5:0011 27 DAA
AX=1303 BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000
; 66 added to binary sum; why?
13D5:000E AC LODSB ; sixth data

AX=1396 BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000

AX=131D BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000
13D5:0011 27 DAA
AX=1383 BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY
; 66 added to result of binary addition; why?
13D5:000E AC LODSB ; seventh data

AX=1369 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI NG NZ AC PO CY

AX=13F0 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ AC PE NC
13D5:0011 27 DAA
AX=1356 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000
; 66 added, reason out why
13D5:000E AC LODSB ; Eighth data

AX=1399 BX=0087 CX=0002 DX=0000 SP=0000 BP=0000 SI=001E DI=0000

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 OV UP EI PL NZ AC PO CY
13D5:0011 27 DAA
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY
; also 66 added; why?
13D5:0012 E2FA LOOP 000E

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000E OV UP EI NG NZ AC PO CY
13D5:000E AC LODSB ; last data
AX=1367 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000

DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=000F OV UP EI NG NZ AC PO CY

AX=13EE BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000
13D5:0011 27 DAA
AX=1354 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PO CY
; 66 again! Why?
13D5:0012 E2FA LOOP 000E ; not executed.

-q ; quit debug
Exercise: Write a comprehensive analysis of what happens when DAA is
executed, so that the result of a binary addition is converted to the result
of a BCD addition. Note that the processor does not know BCD; its ALU
does only binary operations. BCD is only the user’s interpretation, of which
the processor hardware is unaware. Note that the two hex digits of the sum,
together with the AC and CY flags, completely determine whether 00, 06, 60
or 66 h is added to the binary sum to make it match the result of the BCD
addition. The carry of the BCD addition is available in the carry flag after
DAA is executed. You can also end the program with a jump instruction,
trace through it as many times as you wish, and try several sets of data,
to learn more about the behaviour of DAA.
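The trace above can be cross-checked with a small Python model of DAA. This is only a sketch: it assumes the operation sequence that Intel documents for DAA (the same shape as the DAS operation listed under DAS, with additions in place of subtractions), and the names daa and add_bcd are illustrative helpers, not part of debug or MASM.

```python
def daa(al, cf, af):
    """Model of DAA on AL; returns (AL, CF, AF) after the adjustment."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        # low digit is not a valid BCD digit, or a half carry occurred
        cf = old_cf | (1 if al + 6 > 0xFF else 0)
        al = (al + 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        # high digit needs correction: add 6 to the most significant digit
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


def add_bcd(a, b):
    """Binary ADD of two packed BCD bytes followed by DAA."""
    total = a + b
    af = 1 if (a & 0x0F) + (b & 0x0F) > 0x0F else 0
    cf = 1 if total > 0xFF else 0
    return daa(total & 0xFF, cf, af)


# Values taken from the trace: BL = 87 is added to each data byte.
for data in (0x32, 0x16, 0x96, 0x69, 0x67):
    al, cf, af = add_bcd(data, 0x87)
    print(f"{data:02X} + 87 -> AL={al:02X} CF={cf} AF={af}")
```

Comparing the printed AL values with the trace (19, 03, 83, 56 and 54, each with carry) shows which of the corrections 00, 06, 60 or 66 h was applied in each case.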
• DAS: Adjusts the result of the subtraction of two packed BCD values to
create a packed BCD result. The AL register is the implied source and
destination operand. The DAS instruction is only useful when it follows a
SUB instruction that subtracts (binary subtraction) one 2-digit, packed
BCD value from another and stores a byte result in the AL register. The
DAS instruction then adjusts the contents of the AL register to contain the
correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF
and AF flags are set accordingly.
Operation:
old_AL ← AL;
old_CF ← CF; AL and CF stored in temporary registers.
CF ← 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
THEN
AL ← AL − 6;
CF ← old_CF OR (Borrow from AL ← AL − 6);
AF ← 1;
ELSE
AF ← 0; The first IF ends here.
IF ((old_AL > 99H) OR (old_CF = 1))
THEN
AL ← AL − 60H;
CF ← 1;
ELSE
CF ← 0;
Example: The execution of the sequence of instructions SUB and DAS is
shown below, with details before and after the execution of each
instruction.
SUB AL, BL: Before: AL=35H BL=47H FLAGS(OSZAPC)=XXXXXX
After: AL=EEH BL=47H FLAGS(OSZAPC)=010111
DAS Before: AL=EEH BL=47H FLAGS(OSZAPC)=010111
After: AL=88H BL=47H FLAGS(OSZAPC)=X10111
Flags Affected by the DAS instruction:
The CF and AF flags are set if the adjustment of the value results in a
decimal borrow in either digit of the result (see the “Operation”
section above). The SF, ZF, and PF flags are set according to the result.
The OF flag is undefined.
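The operation listed above can be followed step by step in a short Python model. This is a sketch under the assumption that the pseudo-code is transcribed faithfully; das and sub_bcd are illustrative names only.

```python
def das(al, cf, af):
    """Model of DAS on AL; returns (AL, CF, AF) after the adjustment."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        cf = old_cf | (1 if al < 6 else 0)   # borrow from AL - 6
        al = (al - 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


def sub_bcd(a, b):
    """Binary SUB of two packed BCD bytes followed by DAS."""
    af = 1 if (a & 0x0F) < (b & 0x0F) else 0   # borrow into the low digit
    cf = 1 if a < b else 0                     # borrow out of the byte
    return das((a - b) & 0xFF, cf, af)


al, cf, af = sub_bcd(0x35, 0x47)
print(f"AL={al:02X} CF={cf} AF={af}")   # AL=88 CF=1 AF=1
```

The result 88 with a borrow is the ten's complement form of 35 - 47 = -12 (100 - 12 = 88), which agrees with the SUB and DAS example above.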
• AAA: ASCII adjust AL after addition.
The AAA instruction adjusts the sum of two unpacked BCD values (or
even ASCII values, since AAA clears the upper nibble of the AL register
and does not depend on the CY flag for its operation) to create an
unpacked BCD result. The AL register is the implied source and
destination operand for this instruction. The AAA instruction is only
useful when it follows an ADD instruction that adds (binary addition) two
unpacked BCD values and stores a byte result in the AL register. The
AAA instruction then adjusts the contents of the AL register to contain the
correct 1-digit unpacked BCD result. If the addition produces a decimal
carry, the AH register increments by 1, and the CF and AF flags are set. If
there was no decimal carry, the CF and AF flags are cleared and the AH
register is unchanged. In either case, bits 4 through 7 of the AL register are
set to 0. The operational details are as follows:
IF ((AL AND 0FH) > 9) OR (AF = 1)

THEN
AL ← AL + 6;
AH ← AH + 1;
AF ← 1;
CF ← 1;
ELSE
AF ← 0;
CF ← 0;
AL ← AL AND 0FH;
Flags Affected:
The AF and CF flags are set to 1 if the adjustment results in a decimal
carry; otherwise they are set to 0. The OF, SF, ZF, and PF flags are
undefined.
Below is shown an example of executing AAA following an ADD
instruction in the debug environment
-a ;assemble at 100 (default)
13D5:0100 mov al, 36 ;ASCII character ‘6’
13D5:0102 mov bl, 39 ;ASCII character ‘9’
13D5:0104 add al, bl ;binary add
13D5:0106 aaa ;adjust for unpacked BCD after ASCII add
13D5:0107
-u 100 106 ; unassemble between 100 and 106

13D5:0100 B036 MOV AL,36 ; ASCII value taken here
13D5:0102 B339 MOV BL,39
13D5:0104 02C3 ADD AL,BL
13D5:0106 37 AAA
- r ;show initial register contents

13D5:0100 B036 MOV AL,36
-t4
;execute and trace four instructions
13D5:0102 B339 MOV BL,39

13D5:0104 02C3 ADD AL,BL

13D5:0106 37 AAA
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY
-q ; quit debug
Note: It helps if AH is zero before AAA is executed; any decimal carry
(sum > 9) then appears directly as 1 in AH. If AH already holds data, it
is simply incremented. You can try this in the debug environment as
an additional experiment.
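The ASCII addition traced above ('6' + '9') can be reproduced with a small Python model of the AAA operation. It is a sketch that follows the pseudo-code given earlier; the function name aaa and the way AX is passed as one 16-bit number are choices made for this illustration.

```python
def aaa(ax, af):
    """Model of AAA; returns (AX, CF, AF) after the adjustment."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF     # decimal carry goes into AH
        af = cf = 1
    else:
        af = cf = 0
    al &= 0x0F                   # upper nibble of AL is always cleared
    return (ah << 8) | al, cf, af


# '6' + '9' = 36h + 39h = 6Fh, with no half carry from the binary ADD
print(hex(aaa(0x006F, 0)[0]))   # 0x105 : unpacked BCD 1 and 5
print(hex(aaa(0x136F, 0)[0]))   # 0x1405: AH was 13h and is simply incremented
```

The second call illustrates the note above: when AH is not zero beforehand, the carry is still added to it, but AH no longer reads directly as the tens digit.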
• AAS: ASCII adjust AL after subtraction:
Adjusts the result of the subtraction of two unpacked BCD values (or
ASCII values as in AAA) to create an unpacked BCD result. The AL
register is the implied source and destination operand for this instruction.
The AAS instruction is only useful when it follows a SUB instruction that
subtracts (binary subtraction) one unpacked BCD value from another and
stores a byte result in the AL register. The AAS instruction then adjusts the
contents of the AL register to contain the correct 1-digit unpacked BCD
result. If the subtraction produced a decimal borrow, the AH register
decrements by 1, and the CF and AF flags are set. If no decimal borrow
occurred, the CF and AF flags are cleared, and the AH register is
unchanged. In either case, the AL register is left with its top nibble set to
0.
Operation:
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL ← AL – 6;
AH ← AH – 1;
AF ← 1;
CF ← 1;
ELSE
CF ← 0;
AF ← 0;
AL ← AL AND 0FH;
Flags Affected
The AF and CF flags are set to 1 if there is a decimal borrow; otherwise,
they are set to 0. The OF, SF, ZF, and PF flags are undefined.
Exercise: study the AAS instruction in the debug.
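Before trying the exercise in debug, the expected results can be predicted with a Python model of the AAS operation listed above. This is only a sketch; the name aas and the sample values are assumptions made for illustration.

```python
def aas(ax, af):
    """Model of AAS; returns (AX, CF, AF) after the adjustment."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    if (al & 0x0F) > 9 or af:
        al = (al - 6) & 0xFF
        ah = (ah - 1) & 0xFF     # decimal borrow is taken from AH
        af = cf = 1
    else:
        af = cf = 0
    al &= 0x0F                   # upper nibble of AL is always cleared
    return (ah << 8) | al, cf, af


# '3' - '8' = 33h - 38h = FBh with a half borrow (AF = 1).
# With AH = 01 the pair AH:AL stands for 13 - 8 = 5.
print(hex(aas(0x01FB, 1)[0]))   # 0x5
```

If AH is zero before the adjustment, the borrow turns it into FFh, so the program has to account for the borrow itself (for example through CF).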
• AAM: ASCII adjust AX after multiply:
Adjusts the result of the multiplication of two unpacked BCD values to
create a pair of unpacked (base 10) BCD values. The AX register is the
implied source and destination operand for this instruction. The AAM
instruction is only useful when it follows a MUL instruction that
multiplies (binary multiplication) two unpacked BCD values and stores a
word result in the AX register. The AAM instruction then adjusts the
contents of the AX register to contain the correct 2-digit unpacked (base
10) BCD result.
The generalized version of this instruction allows adjustment of the
contents of the AX to create two unpacked digits of any number base (see
the “Operation” section below). Here, the imm8 byte is set to the selected
number base (for example, 08H for octal, 0AH for decimal, or 0CH for
base 12 numbers). The AAM mnemonic is interpreted by all assemblers to
mean adjust to ASCII (base 10) values. To adjust to values in another
number base, the instruction must be hand coded in machine code (D4
imm8).
Operation:
tempAL ← AL;
AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the
instruction.
Flags Affected:
The SF, ZF, and PF flags are set according to the resulting binary value in
the AL register. The OF, AF, and CF flags are undefined.
The following is an example of hand coding for the base 12 conversion in
the debug environment:
-rax
AX 0000
:2070 ; AH has an irrelevant data 20H and AL has the data 70h (= 94
; in the base 12 system, = 112 in the decimal system)
-e cs:100
; enter the hand code D40C at the default assembly
; address 100H (in debug) in the code segment (cs)
1377:0100 07.d4 BB.c
-u 100 101
; unassemble the first 2 bytes of code
1377:0100 D40C AAM 0C ; Note how the hand coded instruction
; gets unassembled, but if we try to
; give this as an instruction, AAM 0C
; it will produce an error in debug or
; when assembled in MASM.
-r ; display register contents

1377:0100 D40C AAM 0C
-t

-q ; quit debug
Exercise: study the regular AAM instruction (D4 0A) in the debug.
Note: The AAM instruction can be used to get a 2-digit unpacked
BCD number in register AX from a binary (hex) number less than 64 H (=
100 decimal) in register AL. With the general, hand coded form of the
instruction it is possible to apply this to conversion in any base. You
may also check what happens when this instruction is used with, say, FF H
in register AL.
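The AAM operation is just a division of AL by the base, so it is easy to model in Python. The sketch below follows the pseudo-code above; the function name aam and its base parameter (standing for the imm8 byte) are illustrative.

```python
def aam(ax, base=0x0A):
    """Model of AAM: AH <- AL / base, AL <- AL MOD base (old AH is ignored)."""
    al = ax & 0xFF
    return ((al // base) << 8) | (al % base)


print(hex(aam(0x2070, 0x0C)))   # 0x904 : 70h = 112 decimal = 94 in base 12
print(hex(aam(0x0063)))         # 0x909 : 63h = 99 decimal
print(hex(aam(0x00FF)))         # 0x1905: AH = 25 is no longer a single digit
```

The last call shows what the note above hints at: for AL values of 64 H or more, AH holds a quotient larger than 9 and the result is not a valid pair of unpacked BCD digits.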
• AAD: ASCII adjust AX before division:
Adjusts two unpacked BCD digits (the least-significant digit in the AL
register and the most significant digit in the AH register) so that a division
operation performed on the result will yield a correct unpacked BCD
value. The AAD instruction is only useful when it precedes a DIV
instruction that divides (binary division) the adjusted value in the AX
register by an unpacked BCD value. The AAD instruction sets the value
in the AL register to (AL + (10 * AH)), and then clears the AH register to
00H. The value in the AX register is then equal to the binary equivalent of
the original unpacked two-digit (base 10) number in registers AH and AL.
The generalized version of this instruction allows adjustment of two
unpacked digits of any number base (see the “Operation” section below),
by setting the imm8 byte to the selected number base (for example, 08H
for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD
mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10)
values. To adjust values in another number base, the instruction must be
hand coded in machine code (D5 imm8).
Operation:
tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH ∗ imm8)) AND FFH; (* imm8 is set to 0AH
for the AAD mnemonic *)
AH ← 0
The immediate value (imm8) is taken from the second byte of the
instruction.
Flags Affected:
The SF, ZF, and PF flags are set according to the resulting binary value in
the AL register; the OF, AF, and CF flags are undefined.
Note: This instruction can be used to convert 2-digit unpacked BCD in
AX to 2-digit packed hex in AL. The generalized hand coded version will
be useful for doing the same in any base less than 16 decimal (Why less
than 16? Try to reason out).
Exercise: Check the regular and the hand coded versions in the debug.
Hand coding in the .asm file is demonstrated below.
HAND CODING IN THE .ASM FILE
code segment
assume cs:code
start: mov ax,050AH
dw 0BD5H ; hand coded AAD instruction, with 0B or
; 11 decimal after code D5
; note the instruction word is D50B
; but loaded in memory with LS byte first.
; the base of conversion is now 0B or 11 decimal
int 01 ; return control to debug
Code ends
end start
The above file can now be assembled and linked using masm and link
programs to produce a .exe file which can be executed and seen in the debug
environment, as demonstrated below.
-u 0 6
; unassemble the first 6 bytes of the code segment
13D5:0000 B80A05 MOV AX,050A
13D5:0003 D50B AAD 0B
13D5:0005 CD01 INT 01
-r ; display registers

13D5:0000 B80A05 MOV AX,050A
-t2
; trace execution of two instructions
AX=050A BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
13D5:0003 D50B AAD 0B

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0005 NV UP EI PL NZ AC PE NC
13D5:0005 CD01 INT 01 ; not executed
;The hex value of the data 5A in Base 11 can be verified to be 41 in hex,
;validating the result seen in the register AX.
-q ; return to DOS
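The result seen in the trace can be verified with a Python model of the AAD operation. This is a sketch that follows the pseudo-code given earlier; the name aad and the base parameter (standing for the imm8 byte) are illustrative.

```python
def aad(ax, base=0x0A):
    """Model of AAD: AL <- (AL + AH * base) AND FFH, AH <- 0."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    return (al + ah * base) & 0xFF


print(hex(aad(0x050A, 0x0B)))   # 0x41: digits 5 and A in base 11 = 65 decimal
print(hex(aad(0x0909)))         # 0x63: unpacked BCD 99 = 99 decimal
```

For valid unpacked digits AAD is the inverse of AAM with the same base, which is a convenient check when experimenting with both instructions.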
Question: How is ASCII involved in AAM and AAD instructions?

[Hint: Two-digit unpacked BCD in a 16-bit register can be easily
converted to two ASCII characters of the decimal digits. Check how.]
4. Logical Instructions: The logical instructions perform the basic logical
operations AND, OR, NOT and EX-OR on bytes or words in a bitwise fashion.
There is a further instruction TEST, which can be thought of as logical compare.
This instruction does a bitwise AND of the two operands, but does not place the
result in the destination register. The nature of the result goes to the flag register.
Many logical operations exist. Why are only these four functions, AND, OR,
NOT and EX-OR, chosen? The reason is that they give the programmer the
capability to handle individual bits of a word selectively. Suppose we want
to set the 4th bit from the left in a byte. We can use a mask with a 1 in the
4th bit from the left and 0 in all other bits; the mask is then 0001 0000. If
we OR the data byte with this mask, no other bit is changed, but the 4th bit
is set irrespective of its condition in the original data byte. Similarly, AND
can be used to clear a specific bit irrespective of its original condition. The
mask required is the complement of the mask used for ORing above: a 1
does not alter a data bit on ANDing, but a 0 clears it. An EX-OR is similarly
useful for selective toggling: a 1 in the mask toggles the data bit, but a 0
leaves it unchanged. NOT is useful for finding the 1’s complement of a full
data word. The functions AND, OR and NOT form a universal logic group,
which means that any logic function can be generated by using these three
functions appropriately on a bitwise basis, and hence no further logic
functions are needed. The EX-OR function is also useful in data comparisons.
If we EX-OR two bytes or words, the result is zero in every bit (and the zero
flag is set to indicate this condition clearly) when the two data bytes or
words are equal. With this introduction we will now look at the four
instructions in detail.
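The masking ideas described above can be tried out in a few lines of Python before they are written as 8086 instructions. The sketch uses the byte mask 0001 0000 (a 1 in the 4th bit from the left); the helper names are illustrative only.

```python
MASK = 0b0001_0000            # 1 in the 4th bit from the left of a byte


def set_bits(value, mask):
    return value | mask                  # what OR does: a 1 in the mask sets the bit


def clear_bits(value, mask):
    return value & (~mask & 0xFF)        # AND with the complemented mask clears it


def toggle_bits(value, mask):
    return value ^ mask                  # EX-OR: a 1 in the mask toggles the bit


data = 0b1010_0101
print(f"{set_bits(data, MASK):08b}")                  # 10110101
print(f"{clear_bits(0b1011_0101, MASK):08b}")         # 10100101
print(f"{toggle_bits(data, MASK):08b}")               # 10110101
print(data ^ data == 0)                               # True: equal data EX-OR to zero
```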
• AND: Performs a bitwise AND operation on the destination (first) and
source (second) operands and stores the result in the destination operand
location. The source operand can be an immediate, a register, or a memory
location; the destination operand can be a register or a memory location.
(However, two memory operands cannot be used in one instruction.) Each
bit of the result is set to 1 if both corresponding bits of the first and second
operands are 1; otherwise, it is set to 0.
Operation:
DEST ← DEST AND SRC;
Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.
• OR: Performs a bitwise inclusive OR operation between the destination
(first) and source (second) operands and stores the result in the destination
operand location. The source operand can be an immediate, a register, or a
memory location; the destination operand can be a register or a memory
location. (However, two memory operands cannot be used in one
instruction.) Each bit of the result of the OR instruction is set to 0 if both
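corresponding bits of the first and second operands are 0; otherwise it is set to 1.
Operation:
DEST ← DEST OR SRC;
Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.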
• XOR: Performs a bitwise exclusive OR (XOR) operation on the
destination (first) and source (second) operands and stores the result in the
destination operand location. The source operand can be an immediate, a
register, or a memory location; the destination operand can be a register or
a memory location. (However, two memory operands cannot be used in
one instruction.) Each bit of the result is 1 if the corresponding bits of the
operands are different; each bit is 0 if the corresponding bits are the same.
Operation:
DEST ← DEST XOR SRC;
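Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.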
• NOT: Performs a bitwise NOT operation (each 1 is set to 0, and each 0 is
set to 1) or does 1’s complementing on the destination operand and stores
the result in the destination operand location. The destination operand can
be a register or a memory location.
Operation:
DEST ← NOT DEST;
Flags Affected:
None.
Exercise on Logical instructions: The register AX has some unknown
data. Give a single instruction that will produce a 1 in the 4th and the 12th
bit from the left in AX without altering the other bits. If the mask used in
the above case is used with AND instruction, what will happen to the data
in AX?
Give an instruction using XOR logic that will produce the same result on a
register as the NOT instruction does.
• TEST: Bitwise ANDs the two source operands, discards the outcome, but
preserves the nature of the result in the flag register.
This instruction computes the bit-wise logical AND of first operand
(source 1 operand) and the second operand (source 2 operand) and sets the
SF, ZF, and PF status flags according to the result. The result is then
discarded.
Operation:
TEMP ← SRC1 AND SRC2;
SF ← MSB(TEMP);
IF TEMP = 0
THEN ZF ← 1;
ELSE ZF ← 0;
PF ← Parity of the lower 8-bits of TEMP;
CF ← 0;
OF ← 0;
(*AF is Undefined*)
Flags Affected
The OF and CF flags are set to 0. The SF, ZF, and PF flags are set according to the result.
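The flag settings of TEST can be predicted with a short Python model of the operation listed above. It is a sketch for 16-bit operands; the name logical_test and the dictionary form of the flags are assumptions of this illustration.

```python
def logical_test(src1, src2, width=16):
    """Model of TEST: AND the operands, keep only the flags."""
    temp = src1 & src2 & ((1 << width) - 1)
    return {
        "SF": (temp >> (width - 1)) & 1,                       # MSB of the result
        "ZF": 1 if temp == 0 else 0,
        "PF": 1 if bin(temp & 0xFF).count("1") % 2 == 0 else 0,  # even parity, low byte
        "CF": 0,
        "OF": 0,
    }


print(logical_test(0x1234, 0x0001))   # bit 0 is clear, so ZF = 1
print(logical_test(0x8001, 0x8000))   # bit 15 is set, so ZF = 0 and SF = 1
```

A typical use is to TEST a register against a single-bit mask and then branch on ZF, which leaves the register itself unchanged.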
5. Shift and Rotate Instructions: Shift and rotate instructions shift the data by one
or more bits towards either left or right, straight or in a circular fashion. The carry
flag is always involved in these operations. There are in all 3 shift instructions
and 4 rotate instructions. The shift/rotate count can be a single bit, written as 1
in the instruction, or multiple bits, given by the contents of register CL. The 8086
processor uses the full 8-bit count in the CL register, taking 4 clocks
for each bit shifted. The higher end processors starting from the 286 mask the
upper 3 bits of CL and use only the lower 5 bits as the shift count. These
processors also permit a multi bit shift count to be specified as immediate data
in the instruction, while the 8086 allows only a single bit shift to be directly specified
in the instruction. SHL BX, 15 H is an invalid instruction in the 8086 (only SHL BX,
1 is valid), but valid in the higher end processors starting from the 80286. We will
now go to the details.
• SAL/SHL/SAR/SHR: The shift instructions, although shown with four
separate mnemonics, are only three separate instructions. SAL and SHL
are the same, but SAR and SHR are not so. (The debug will only accept
the code SHL and indicates a fault on SAL. But MASM accepts both and
produces the same code for both.) These instructions shift the bits in the
first operand (destination operand) to the left or right by the number of bits
specified in the second operand (count operand). Bits shifted beyond the
destination operand boundary are first shifted into the CF flag, and then
discarded. At the end of the shift operation, the CF flag contains the last
bit shifted out of the destination operand. The destination operand can be a
register or a memory location. The count operand can be the immediate
value of 1, or it can be any 8-bit value in register CL for multiple shifts.
The shift arithmetic left (SAL) and shift logical left (SHL) instructions
perform the same operation; they shift the bits in the destination operand
to the left (toward more significant bit locations). For each shift count, the
most significant bit of the destination operand is shifted into the CF flag,
and the least significant bit is cleared.
The shift arithmetic right (SAR) and shift logical right (SHR) instructions
are different instructions as described below. They do the right shift of the
bits of the destination operand (toward less significant bit locations). For
each shift count, the least significant bit of the destination operand is
shifted into the CF flag, and the most significant bit is either set or cleared
depending on the instruction type. The SHR instruction clears the most
significant bit, and the SAR instruction sets or clears the most significant
bit to correspond to the sign (most significant bit) of the original value in
the destination operand. In effect, the SAR instruction fills the empty bit
position’s shifted value with the sign of the unshifted value.
The SAR and SHR instructions can be used to perform signed or unsigned
division, respectively, of the destination operand by powers of 2. For
example, using the SAR instruction to shift a signed integer 1 bit to the
right divides the value by 2. Using the SAR instruction to perform a
division operation does not produce the same result as the IDIV
instruction. The quotient from the IDIV instruction is rounded toward
zero, whereas the “quotient” of the SAR instruction is rounded toward
negative infinity. This difference is apparent only for negative numbers.
For example, when the IDIV instruction is used to divide -9 by 4, the
result is -2 with a remainder of -1. If the SAR instruction is used to shift -9
right by two bits, the result is -3 and the “remainder” is +3; however, the
SAR instruction stores only the most significant bit of the remainder (in
the CF flag). The OF flag is affected only on 1-bit shifts. For left shifts,
the OF flag is set to 0 if the most significant bit of the result is the same as
the CF flag (that is, the top two bits of the original operand were the
same); otherwise, it is set to 1. For the SAR instruction, the OF flag is
cleared for all 1-bit shifts. Execution of the SHR instruction sets the OF
flag to the most-significant bit of the original operand.
(The use of the OF flag in these instructions is not indicated in the
manual.)
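The difference between SAR and IDIV for negative numbers can be seen in a small Python model working on 16-bit register values. This is only a sketch; sar16, idiv and to_signed are illustrative helper names, and the list of carries returned by sar16 is a convenience of the model (the processor keeps only the last one in CF).

```python
def to_signed(value, width=16):
    """Interpret a register value as a two's complement signed integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def sar16(value, count):
    """Shift arithmetic right one bit at a time; returns (result, carries)."""
    carries = []
    for _ in range(count):
        carries.append(value & 1)                  # bit shifted out goes to CF
        value = (value >> 1) | (value & 0x8000)    # sign bit is replicated
    return value, carries


def idiv(dividend, divisor):
    """Signed division with the quotient rounded toward zero, as IDIV does."""
    q = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        q = -q
    return q, dividend - q * divisor


result, carries = sar16(0xFFF7, 2)       # 0xFFF7 is -9 in 16 bits
print(to_signed(result), carries)        # -3 [1, 1]
print(idiv(-9, 4))                       # (-2, -1)
```

The two carries, taken as bits 0 and 1, give the “remainder” +3 of the SAR case, since (-3) × 4 + 3 = -9.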
Exercises: Study the case of dividing -9 by +4 using SAR in the debug
environment, and compare it with the operation using the IDIV
instruction. Can you think of a way of getting the full remainder +3 on
dividing -9 in register AX by 4, using the SAR instruction? [Hint: Try
using SAR AX, 1 twice, instead of using SAR AX, CL with 2 in CL and
manage to get the full remainder using the carries of the two operations]
Do you think the SHL instruction can do multiplication by powers of 2?
Explain.
Demonstrate in the debug environment, that a register can be cleared using
the SHR or SHL instructions. [Answer: See the following program.]
-a
1377:0100 mov ax, 1234 ; any random data in AX
1377:0103 mov cl, 12 ; shift count in CL is 18 decimal or 12 hex.
1377:0105 shl ax, cl ; left shift
1377:0107 mov ax, 5678 ; fresh random data loaded in AX
1377:010A shr ax, cl ; right shift by 18 bits
- u 100 10b
1377:0100 B83412 MOV AX,1234

1377:0103 B112 MOV CL,12
1377:0105 D3E0 SHL AX,CL
1377:0107 B87856 MOV AX,5678
1377:010A D3E8 SHR AX,CL
-r
1377:0100 B83412 MOV AX,1234
-t5
1377:0103 B112 MOV CL,12

1377:0105 D3E0 SHL AX,CL

DS=1377 ES=1377 SS=1377 CS=1377 IP=0107 NV UP EI PL ZR NA PE NC
1377:0107 B87856 MOV AX,5678
;NOTE: Shift count is more than 16 and hence 16 0’s are shifted into AX.

DS=1377 ES=1377 SS=1377 CS=1377 IP=010A NV UP EI PL ZR NA PE NC
1377:010A D3E8 SHR AX,CL

DS=1377 ES=1377 SS=1377 CS=1377 IP=010C NV UP EI PL ZR NA PE NC
-q
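The debug session above can be mirrored by a Python model that shifts one bit at a time and does not mask the count, as described for the 8086. The names shl16 and shr16 are illustrative; the sketch models only the result and the CF flag.

```python
def shl16(value, count):
    """Shift logical left, one bit per step; returns (result, CF)."""
    cf = 0
    for _ in range(count):
        cf = (value >> 15) & 1             # MSB goes to CF
        value = (value << 1) & 0xFFFF      # LSB is cleared
    return value, cf


def shr16(value, count):
    """Shift logical right, one bit per step; returns (result, CF)."""
    cf = 0
    for _ in range(count):
        cf = value & 1                     # LSB goes to CF
        value >>= 1                        # MSB is cleared
    return value, cf


print(shl16(0x1234, 0x12))   # (0, 0): 18 shifts push every bit out of AX
print(shr16(0x5678, 0x12))   # (0, 0)
print(hex(shl16(0x1234, 4)[0]), shl16(0x1234, 4)[1])   # 0x2340 1
```

With a count of 18 the register is already zero after 16 steps, so the remaining steps shift zeros into CF, which agrees with the NC and ZR flags seen in the trace.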
• RCL, RCR, ROL, and ROR: The rotate instructions either rotate through
the carry (RCL, RCR) or rotate only the register (ROL, ROR). These instructions
rotate the bits of the first operand (destination operand) by the number
of bit positions specified in the second operand (count operand) and store
the result in the destination operand. The destination operand can be a
register or a memory location; the count operand is either the immediate
value 1 or a value in the CL register.
The rotate left (ROL) and rotate through carry left (RCL) instructions shift
all the bits toward more-significant bit positions, except for the most-
significant bit, which is rotated to the least significant bit location. The
rotate right (ROR) and rotate through carry right (RCR) instructions shift
all the bits toward less significant bit positions, except for the least-
significant bit, which is rotated to the most-significant bit location.
The RCL and RCR instructions include the CF flag in the rotation. The
RCL instruction shifts the CF flag into the least-significant bit and shifts
the most-significant bit into the CF flag. The RCR instruction shifts the
CF flag into the most-significant bit and shifts the least-significant bit into
the CF flag. For the ROL and ROR instructions, the original value of the
CF flag is not a part of the result, but the CF flag receives a copy of the bit
that was shifted from one end to the other.
The OF flag is defined only for the 1-bit rotates; it is undefined in all other
cases (except that a zero-bit rotate does nothing, that is affects no flags).
For left rotates, the OF flag is set to the exclusive OR of the CF bit (after
the rotate) and the most-significant bit of the result. For right rotates, the
OF flag is set to the exclusive OR of the two most-significant bits of the
result.
The 8086 does not mask the rotation count. However, all other IA-32
processors (starting with the Intel 286 processor) do mask the rotation
count to 5 bits, resulting in a maximum count of 31. This masking is done
in all operating modes (including the virtual-8086 mode) to reduce the
maximum execution time of the instructions.
The SF, ZF, AF and PF flags are not affected by the rotate instructions.
Exercises: Study the rotate instructions in the debug.
For the 8086 processor show that in case of ROL and ROR instructions,
the result of rotations, using CL register for shift count, is independent of
the upper nibble of CL. This upper nibble only increases the execution
time of the instruction.
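The claim in the exercise can be explored with a Python model of ROL and RCL on a 16-bit register, rotating one bit per step with an unmasked count as on the 8086. The function names are illustrative, and only the result and CF are modelled.

```python
def rol16(value, count, cf=0):
    """Rotate left; the bit leaving the MSB re-enters at the LSB and is copied to CF."""
    for _ in range(count):
        msb = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | msb
        cf = msb
    return value, cf


def rcl16(value, cf, count):
    """Rotate left through carry; CF acts as a 17th bit of the ring."""
    for _ in range(count):
        msb = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | cf
        cf = msb
    return value, cf


print(hex(rol16(0x1234, 0x04)[0]))   # 0x2341
print(hex(rol16(0x1234, 0x14)[0]))   # 0x2341: the upper nibble of the count has no effect
print(rcl16(0x8000, 0, 1))           # (0, 1): the MSB has moved into CF
```

For ROL the ring has 16 positions, so only the count modulo 16 matters; for RCL the ring has 17 positions, so the same argument does not apply to the rotates through carry.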
6. Control Transfer Instructions: The intelligence in any program lies in the
ability of the program to follow different courses of action based on intermediate
results produced during the working of the program; that way, the program is
enabled to perform data sensitive tasks. This capability is obtained by context
sensitive jump operations, in contrast to the normal sequential operation
of fetching the next instruction and executing it in the order in which it is found
in the program. So the set of control transfer instructions to be discussed now
gives the processor all its intelligence and its raw power. The nature of the data is
determined from the intermediate results that we get. It should be noted that when
we speak of the nature of the data or the nature of partial results, we do not
mean the exact value of the data, but its nature: whether it is positive, negative
or zero, or whether it is larger or smaller than another data item, and so on. The
purpose of the flag register is to keep track of this sort of information on the
intermediate results, and it is natural that the transfer of control depends
heavily on the flag register, as we may see from the details of these instructions.
We will look at these instructions now one by one.
• JMP: Jump instruction: Transfers program control to a different point in
the instruction stream without recording return information. The
destination (target) operand specifies the address of the instruction being
jumped to. This operand can be an immediate value, a general-purpose
register, or a memory location. This instruction can be used to execute
three different types of jumps:
Near jump—A jump to an instruction within the current code segment
(the segment currently pointed to by the CS register), sometimes referred
to as an intra segment jump.
Short jump—A near jump where the jump range is limited to –128 to
+127 from the current IP value.
Far jump—A jump to an instruction located in a different segment than
the current code segment, is sometimes referred to as an inter segment
jump.
Near and Short Jumps: When executing a near jump, the processor jumps
to the address (within the current code segment) that is specified with the
target operand. The target operand specifies either an absolute offset (that
is an offset from the base of the code segment) or a relative offset (a
signed displacement relative to the current value of the instruction pointer
in the IP register). A near jump to a relative offset of 8-bits (rel8) is
referred to as a short jump. The CS register is not changed on near and
short jumps.
An absolute offset is specified indirectly in a general-purpose register or a
memory location (r/m16), or it may directly be specified in the instruction.
The following is a study of jump instructions in the debug environment.
-a
1377:0100 jmp 112 ; coded as short relative
1377:0102 jmp 1234 ; coded as relative, but with 16 bit displacement
1377:0105 jmp bx ; register direct; bx has the address
1377:0107 jmp [bx] ; register indirect (near) jump
1377:0109 jmp word ptr [bx] ; same as above
1377:010B jmp dword ptr [bx] ; interpreted same as above? See unassembly.
1377:010D jmp far [bx] ; register indirect far jump
1377:010F jmp near [bx] ; register indirect near jump
1377:0111 jmp far bx ; error, as far jump requires 32 bits of
^ Error ; address, while BX can only store 16 bits.
1377:0111 jmp far [1234] ; far jump to address @ DS:1234
1377:0115
-u100 114
1377:0100 EB10 JMP 0112 ; near, short, rel-8 address
1377:0102 E92F11 JMP 1234 ; near, rel-16 address
1377:0105 FFE3 JMP BX ; BX → IP
1377:0107 FF27 JMP [BX] ; memory word at address in BX → IP
1377:0109 FF27 JMP [BX]
1377:010B FF27 JMP [BX] ; interpreted only as near jump?
1377:010D FF2F JMP FAR [BX] ; memory dword @ [BX] → CS:IP
1377:010F FF27 JMP [BX] ; memory word @ [BX] → IP
1377:0111 FF2E3412 JMP FAR [1234] ; memory dword @ [1234] → CS:IP
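The displacement bytes in the first two lines of the unassembly can be recomputed by hand or with a few lines of Python. The sketch below assumes only the two direct near forms seen above (EB with an 8-bit displacement and E9 with a 16-bit displacement, both relative to the address of the next instruction); encode_near_jmp is an illustrative name, not an assembler function.

```python
def encode_near_jmp(at, target):
    """Return the machine code bytes of a direct near JMP placed at offset `at`."""
    disp = target - (at + 2)               # short form is 2 bytes long
    if -128 <= disp <= 127:
        return bytes([0xEB, disp & 0xFF])
    disp = (target - (at + 3)) & 0xFFFF    # rel-16 form is 3 bytes long
    return bytes([0xE9, disp & 0xFF, disp >> 8])   # LS byte first


print(encode_near_jmp(0x0100, 0x0112).hex())   # eb10
print(encode_near_jmp(0x0102, 0x1234).hex())   # e92f11
```

The second result shows the displacement 112F H stored with its least significant byte first, exactly as in the code E92F11 above.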
• Conditional Jump Instructions: There are several conditional jump
instructions, as shown in the table below. The conditions are described either
directly in terms of the flags, as in JC (jump if carry), or in terms of problem
requirements, as in JA (jump if above), which in terms of the flags is a little
involved: it causes a jump only if the carry and the zero flags are both reset.
The instructions and their operations are detailed below. The abbreviation
cb stands for code byte, which gives the jump target address relative to the
IP address of the instruction following the jump instruction. The
instruction requires this relative address to be in the range -128 to +127, so that
it can fit in the code byte indicated as cb in the opcode field and
represented as rel8 in the table below. It should also be noted that the
terms ‘above’ and ‘below’ refer to the comparison of two data items
considered as unsigned integers, while the words ‘greater’ and ‘less’
correspond to the signed integer comparison. These conditional jump
instructions are normally used after subtract or compare instructions.
The 8086 processor does not have any near or far conditional jumps; only the
short relative form described above is available. You may also note that in the table
below all opcodes start with hex digit 7 (excepting JCXZ, which is
anyway a special instruction), so in all there can only be 16 distinct instructions,
but the table shows 30 entries excluding the JCXZ entry. This
means that 12 opcodes have two or three equivalent mnemonics and 4
opcodes have a single mnemonic. Identify the equivalent mnemonics and
check that they represent the same condition. Identify also the opcodes which
have a single mnemonic.
Opcode Instruction Description
1. 77 cb JA rel8 Jump short if above (CF=0 and ZF=0)
2. 73 cb JAE rel8 Jump short if above or equal (CF=0)
3. 72 cb JB rel8 Jump short if below (CF=1)
4. 76 cb JBE rel8 Jump short if below or equal (CF=1 or ZF=1)
5. 72 cb JC rel8 Jump short if carry (CF=1)
6. E3 cb JCXZ rel8 Jump short if CX register is 0
7. 74 cb JE rel8 Jump short if equal (ZF=1)
8. 7F cb JG rel8 Jump short if greater (ZF=0 and SF=OF)
9. 7D cb JGE rel8 Jump short if greater or equal (SF=OF)
10. 7C cb JL rel8 Jump short if less (SF ≠ OF)
11. 7E cb JLE rel8 Jump short if less or equal (ZF=1 or SF<>OF)
12. 76 cb JNA rel8 Jump short if not above (CF=1 or ZF=1)
13. 72 cb JNAE rel8 Jump short if not above or equal (CF=1)
14. 73 cb JNB rel8 Jump short if not below (CF=0)
15. 77 cb JNBE rel8 Jump short if not below or equal (CF=0 and ZF=0)
16. 73 cb JNC rel8 Jump short if not carry (CF=0)
17. 75 cb JNE rel8 Jump short if not equal (ZF=0)
18. 7E cb JNG rel8 Jump short if not greater (ZF=1 or SF ≠ OF)
19. 7C cb JNGE rel8 Jump short if not greater or equal (SF ≠ OF)
20. 7D cb JNL rel8 Jump short if not less (SF=OF)
21. 7F cb JNLE rel8 Jump short if not less or equal (ZF=0 and SF=OF)
22. 71 cb JNO rel8 Jump short if not overflow (OF=0)
23. 7B cb JNP rel8 Jump short if not parity (PF=0)
24. 79 cb JNS rel8 Jump short if not sign (SF=0)
25. 75 cb JNZ rel8 Jump short if not zero (ZF=0)
26. 70 cb JO rel8 Jump short if overflow (OF=1)
27. 7A cb JP rel8 Jump short if parity (PF=1)
28. 7A cb JPE rel8 Jump short if parity even (PF=1)
29. 7B cb JPO rel8 Jump short if parity odd (PF=0)
30. 78 cb JS rel8 Jump short if sign (SF=1)
31. 74 cb JZ rel8 Jump short if zero (ZF = 1)
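The difference between the unsigned (‘above’, ‘below’) and signed (‘greater’, ‘less’) conditions in the table can be checked with a small model. The following is only a sketch in Python, not 8086 code; the helper names cmp8, CONDITIONS and taken are made up for this illustration, and only a few of the table entries are included.

```python
# Sketch: flags after an 8-bit CMP a, b and some jump conditions from the table.
# cmp8, CONDITIONS and taken are illustrative names, not 8086 or debug features.

def cmp8(a, b):
    """Return the CF, ZF, SF and OF flags that an 8-bit CMP a, b would produce."""
    a &= 0xFF
    b &= 0xFF
    result = (a - b) & 0xFF
    cf = 1 if a < b else 0                      # borrow out of bit 7
    zf = 1 if result == 0 else 0
    sf = result >> 7                            # sign bit of the result
    # signed overflow: operands differ in sign and the result sign differs from a
    of = 1 if ((a ^ b) & (a ^ result) & 0x80) else 0
    return {"CF": cf, "ZF": zf, "SF": sf, "OF": of}

CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JE": lambda f: f["ZF"] == 1,
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
}

def taken(mnemonic, a, b):
    """True if the conditional jump would be taken after CMP a, b."""
    return CONDITIONS[mnemonic](cmp8(a, b))

# FFH is 255 as an unsigned number but -1 as a signed number,
# so after CMP FFH, 01H both "above" and "less" hold.
print(taken("JA", 0xFF, 0x01), taken("JL", 0xFF, 0x01))
```

The same pair of operands therefore takes JA but not JG, which is why the choice of mnemonic must follow the interpretation of the data.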
The following debug session assembles some sample instructions and then unassembles them:
-a
1377:0100 ja 114
1377:0102 jnb 1234
^ Error; relative address more than 8 bits
1377:0102 jae 85
1377:0104
-u 100 103
1377:0100 7712 JA 0114 ; address relative to 102H is 12H

1377:0102 7381 JNB 0085 ; address relative to 104H is 81H
; or -7F H
1377:0104
-q
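The code bytes seen in the unassembly can be worked out by hand: the displacement is the target address minus the address of the instruction following the two-byte jump. A minimal Python sketch of that arithmetic is given here; the function name encode_short_jump is hypothetical and is not something debug provides.

```python
def encode_short_jump(opcode, at, target):
    """Return the two code bytes of a short jump assembled at address `at`."""
    rel = target - (at + 2)          # relative to the following instruction
    if not -128 <= rel <= 127:
        raise ValueError("relative address does not fit in 8 bits")
    return bytes([opcode, rel & 0xFF])   # two's complement for negative values

print(encode_short_jump(0x77, 0x100, 0x114).hex())   # JA 114 assembled at 100
print(encode_short_jump(0x73, 0x102, 0x085).hex())   # JAE 85 assembled at 102
```

With the addresses of the session this gives the bytes 77 12 and 73 81, while a target such as 1234 raises the error, corresponding to the message debug printed for jnb 1234.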
Exercise: All the conditional jumps are only possible with displacements
in the range -128 to +127 from the current location. If a longer range of
conditional jump is required how can you arrange for that? [Hint: try using
a simple jump with a longer or 16-bit relative address (in addition to the
conditional jump) – at the destination of the conditional jump]
• LOOP, LOOPZ (LOOPE) and LOOPNZ (LOOPNE): These are the
unconditional and conditional loop instructions. Note that there are only
two conditional loops, both based on the state of the zero flag: loop
on zero (or loop on equal) and loop on not zero (or loop on not equal, that
is, when the comparison of two data items shows that they are unequal).
Description:
The loop instruction performs a loop operation using the CX register as a counter. Each
time the LOOP instruction is executed, the count register is decremented, then checked
for 0. If the count is 0, the loop is terminated and program execution continues with the
instruction following the LOOP instruction. If the count is not zero, a short jump is
performed to the destination (target) operand, which is presumably the instruction at the
beginning of the loop. On the 8086 the count register is always CX (the ECX register exists
only on the later 32-bit processors). The target instruction is specified
with a relative offset (a signed offset relative to the current value of the instruction
pointer in the IP register). This offset is generally specified as a label in assembly code,
but at the machine code level, it is encoded as a signed, 8-bit immediate value, which is
added to the instruction pointer. Offsets of –128 to +127 are allowed with this instruction.
Conditional loop instructions (LOOPcc) accept the ZF flag as a condition for terminating
the loop before the count reaches zero. With these forms of the instruction, a condition
code (cc) is associated with each instruction to indicate the condition being tested for.
Here, the LOOPcc instruction itself does not affect the state of the ZF flag; the ZF flag is
changed by other instructions in the loop. Loopz stands for loop if zero, loopnz for loop
if not zero.
Opcode Instruction Description
E2 cb LOOP rel8 Decrement count; jump short if count ≠ 0
E1 cb LOOPE rel8 Decrement count; jump short if count ≠ 0 and ZF=1
E1 cb LOOPZ rel8 Decrement count; jump short if count ≠ 0 and ZF=1
E0 cb LOOPNE rel8 Decrement count; jump short if count ≠ 0 and ZF=0
E0 cb LOOPNZ rel8 Decrement count; jump short if count ≠ 0 and ZF=0
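The decrement-then-test order described above has one consequence worth noticing: if CX is 0 on entry, the first decrement wraps it to FFFFH and the loop body runs 65536 times. The following Python sketch models this; loop_step and passes are illustrative names only.

```python
def loop_step(cx, zf, kind="LOOP"):
    """One execution of LOOP, LOOPE or LOOPNE. Returns (new_cx, jump_taken)."""
    cx = (cx - 1) & 0xFFFF           # CX is decremented first; flags are untouched
    if kind == "LOOP":
        return cx, cx != 0
    if kind == "LOOPE":              # same opcode as LOOPZ
        return cx, cx != 0 and zf == 1
    if kind == "LOOPNE":             # same opcode as LOOPNZ
        return cx, cx != 0 and zf == 0
    raise ValueError(kind)

def passes(cx):
    """Count how many times a loop body ending in LOOP is executed."""
    n = 0
    while True:
        n += 1                       # the loop body runs once per pass
        cx, jump = loop_step(cx, 0)
        if not jump:
            return n

print(passes(5), passes(0))
```

In a real program a JCXZ placed before the loop is the usual guard against the CX = 0 case.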
• CALL: Call instruction is a returnable jump to the destination or the target
address provided in the instruction. This instruction can be used to
execute two different types of calls:
Near call—A call to a procedure within the current code segment (the
segment currently pointed to by the CS register), sometimes referred to as
an intrasegment call.
Far call—A call to a procedure located in a different segment than the
current code segment, sometimes referred to as an intersegment call.
Near Call: When executing a near call, the processor pushes the value of
the IP register (which contains the offset of the instruction following the
CALL instruction) onto the stack (for use later as a return-instruction
pointer). The processor then branches to the address in the current code
segment specified with the target operand. The target operand specifies
either an absolute offset in the code segment (that is an offset from the
base of the code segment) or a relative offset (a signed displacement
relative to the current value of the instruction pointer in the IP register,
which points to the instruction following the CALL instruction). The CS
register is not changed on near calls.
For a near call, an absolute offset is specified indirectly in a general-purpose
register or a memory location (r/m16). Absolute offsets are
loaded directly into the IP register. (When accessing an absolute offset
indirectly using the stack pointer [SP] as a base register, the base value
used is the value of the SP before the instruction executes.)
Far Calls: When executing a far call, the processor pushes the current
value of both the CS and IP registers onto the stack for use as a return-
instruction pointer. The processor then performs a “far branch” to the code
segment and offset specified with the target operand for the called
procedure. Here the target operand specifies an absolute far address either
directly with a pointer (ptr16:16) or indirectly with a memory location
(m16:16). With the pointer method, the segment and offset of the called
procedure are encoded in the instruction, using a 4-byte far address
immediate. With the indirect method, the target operand specifies a
memory location that contains a 4-byte far address. On the 8086 the offset
in the far address is always 16 bits, and the far address is loaded directly
into the CS and IP registers. (The 32-bit offset and the EIP register apply
only to the later IA-32 processors.)
Exercises: In the debug, check and see how the following instructions are
machine coded on unassembling:
CALL 156
CALL SHORT 156
CALL 6789
CALL BX
CALL SHORT BX
CALL [BX]
CALL NEAR [BX]
CALL SHORT [BX]
CALL FAR [BX]
CALL FAR 1234:5678
CALL SP
CALL [AX]
That will give you a fair idea of the machine codes used, as well as of the
different modes of call instructions. However, by far the most common
method used for call is by directly giving the address of the procedure in
the instruction. In ALP (assembly language programming) this is done by
using the label name used for the procedure or subroutine.
• RET: The RET (return) instruction returns control back from the
procedure to the program that has called the procedure. The control will
be returned to the instruction following the procedure call.
This instruction transfers the program control to a return address located
on the top of the stack. The address is usually placed on the stack by a
CALL instruction, and the return is made to the instruction that follows
the CALL instruction. The optional source operand specifies the number
of stack bytes to be released after the return address is popped; the default
is none. This operand can be used to release parameters from the stack that
were passed to the called procedure and are no longer needed.
Exercises: Study following RET instructions in the debug and see their
machine codes.
RET
RET NEAR
RETF ; this stands for return far
RET 120
RETF 120
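The stack traffic of CALL and RET described above can be followed with a very small model. This is a Python sketch under stated assumptions: the class Cpu and its method names are invented for the illustration, instruction lengths of 3 bytes (direct near call) and 5 bytes (direct far call) are assumed, and memory other than the stack is ignored.

```python
class Cpu:
    """Only CS, IP, SP and a stack dictionary; enough to follow CALL and RET."""

    def __init__(self, cs, ip, sp):
        self.cs, self.ip, self.sp = cs, ip, sp
        self.stack = {}                          # offset in SS -> 16-bit word

    def push(self, word):
        self.sp = (self.sp - 2) & 0xFFFF
        self.stack[self.sp] = word & 0xFFFF

    def pop(self):
        word = self.stack[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return word

    def call_near(self, target, length=3):
        self.push((self.ip + length) & 0xFFFF)   # offset of the next instruction
        self.ip = target                         # CS is not changed

    def call_far(self, seg, off, length=5):
        self.push(self.cs)                       # CS is pushed first, then IP
        self.push((self.ip + length) & 0xFFFF)
        self.cs, self.ip = seg, off

    def ret(self, n=0, far=False):
        self.ip = self.pop()
        if far:
            self.cs = self.pop()
        self.sp = (self.sp + n) & 0xFFFF         # RET n releases n more bytes

cpu = Cpu(cs=0x1377, ip=0x0100, sp=0xFFEE)
cpu.call_near(0x0156)
print(hex(cpu.ip), hex(cpu.sp), hex(cpu.stack[cpu.sp]))
cpu.ret()
print(hex(cpu.ip), hex(cpu.sp))
```

After the near call the word on top of the stack is 0103, the offset of the instruction following the CALL, and RET brings both IP and SP back to where they would have been had the procedure been skipped.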
• INT n, INTO and INT 3: These instructions are software interrupt
procedure calls. Software interrupts are special procedures that can be
invoked or called using an 8-bit number, known as the interrupt number.
Many system services are rendered using software interrupts. The
interrupt invoked procedures are normally known as interrupt service
routines. I/O devices also can obtain system services using these calls.
They first get the attention of the processor by activating the interrupt pin
of the processor. When the pin is activated, the processor goes through a
sequence of operations to which the interrupting I/O device responds by
inputting an 8-bit number n. The processor then invokes the service
routine for INT n. This process is known as the hardware interrupt
operation. Once an interrupt is invoked, the processor pushes the FLAGS,
the CS and the IP value (corresponding to the instruction immediately
following the interrupt call). With this, the processor is ready to accept a
returnable far jump (new values in CS and IP, returnable because the old
values of CS and IP are stored in the stack along with the old flags). The
destination operand n in the instruction specifies an interrupt vector
number from 0 to 255, encoded as an 8-bit unsigned immediate value.
Each interrupt vector number n is an index into an array of 4-byte entries,
the interrupt vector table, each entry storing the far call address associated
with that n. In all, there is provision for 256 interrupts, and with each
interrupt having a 4-byte address (far call address CS and IP), the interrupt
vector table occupies the lowest 1 KB of the memory space. The first
32 interrupt vector numbers are reserved by Intel for system use. Some of
these interrupts are used for internally generated exceptions.
The INT n instruction is the general mnemonic for executing a software-
generated call to an interrupt handler, with the vector number n.
The INTO instruction is a special mnemonic for calling the overflow
exception, interrupt vector number 4. INTO checks the
OF flag in the FLAGS register and calls the overflow interrupt handler,
that is, the interrupt with vector number 4, only if the OF flag is set to 1.
The INT 3 instruction generates a special one byte opcode (CC) that is
intended for calling the debug exception handler. This one byte form is
valuable because it can be used to replace the first byte of any instruction
with a breakpoint, including other one byte instructions, without
overwriting other code.
Exercise: 1. Unassemble the following instructions in the debug:
INTO
INT 3
INT 4
INT 73
2. Although INTO and INT 4 appear to be the same, INTO invokes the
vector 4 interrupt conditionally, based on the overflow flag, whereas
INT 4 is an unconditional software interrupt at vector 4, occurring even
when there is no overflow, as seen in the following debug
experiment. The experiment is first run with the OF reset and, as
can be seen in the execution, INT 4 actually branches to the interrupt
routine, while INTO does not. After setting the OF, executing INTO
invokes the interrupt at vector 4.
-a
1377:0100 int 4
1377:0102 into
1377:0103
-d 0000:0010 0013
;the data below shows the interrupt 4 vector
0000:0010 8B 01 70 00 ..p.
-r
1377:0100 CD04 INT 04
-t
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000

DS=1377 ES=1377 SS=1377 CS=0070 IP=018B NV UP DI PL NZ NA PO NC
0070:018B 1E PUSH DS; INT 4 has taken place even with
; OF being not there, NV = no overflow.
-rip
IP 018B
:102
-rcs
CS 0070
:1377 ; get back to cs:ip = original 1377:102 (to INTO instruction)
-r
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP DI PL NZ NA PO NC
1377:0102 CE INTO
-t
1377:0103 0000 ADD [BX+SI],AL ; INTO has not occurred, as
;OF is not set, NV = no o’flow.
-rip
IP 0103
:102 ; again get back to INTO instruction
-rf
NV UP DI PL NZ NA PO NC – ov ; set overflow flag
-r

DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 OV UP DI PL NZ NA PO NC
1377:0102 CE INTO
-t

DS=1377 ES=1377 SS=1377 CS=0070 IP=018B OV UP DI PL NZ NA PO NC
0070:018B 1E PUSH DS; with OF set, interrupt 4 has occurred.
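The link between the vector number, the four bytes dumped at 0000:0010 and the CS:IP reached by the interrupt can be reproduced with a few lines of arithmetic. The Python sketch below uses hypothetical helper names (vector_address, decode_vector) and takes the four dump bytes from the session above as its data.

```python
def vector_address(n):
    """Offset in segment 0000 of the 4-byte entry for interrupt number n."""
    if not 0 <= n <= 255:
        raise ValueError("interrupt number must be 0 to 255")
    return 4 * n

def decode_vector(entry):
    """Split a 4-byte table entry into (CS, IP); each word is low byte first."""
    ip = entry[0] | (entry[1] << 8)
    cs = entry[2] | (entry[3] << 8)
    return cs, ip

entry = bytes([0x8B, 0x01, 0x70, 0x00])      # dump of 0000:0010 to 0013
cs, ip = decode_vector(entry)
print(f"{vector_address(4):04X} {cs:04X}:{ip:04X}")
```

The result, entry address 0010 and handler address 0070:018B, agrees with the CS and IP values shown after tracing INT 4.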
Exercise: check if the codes CD 03 and CC, both standing for interrupt 3
have any difference in execution. [Hint: the code CC is only a
convenience for debug purposes, for break point provision]
• IRET: Return from interrupt: the IRET instruction performs a far return to
the interrupted program or procedure. During this operation, the processor
pops the return instruction pointer, return code segment selector, and
FLAGS image from the stack to the IP, CS, and FLAGS registers,
respectively, and then resumes execution of the interrupted program or
procedure.
Exercise: Why should the flags be saved at entry to the interrupt service
routine and why should they be restored on return? What about the other
registers used by the interrupt routine? How is their integrity maintained
on return? [Hint: It is the responsibility of the interrupt routine to return
them intact]
Why are there no instructions like IRET NEAR, IRETF or IRET n? [Hint:
consider hardware interrupts by I/O devices]
7. String Instructions: String instructions operate on strings of bytes or words,
allowing them to be moved between memory and register or memory and
memory. There are several instructions in this category, involving comparison of
memory data with AX or AL, comparison of two data arrays, handling an array of
input from an input device, outputting a data array through an output device and
so on, as we shall see from the instruction details below. When memory arrays
are used, the source array is addressed using the address DS:SI (but with the
segment override prefix ES: it can be ES:SI), and the destination address is
always ES:DI (this cannot be overridden). The SI and DI values change at
every execution of the instruction so that these addresses point to the next data
element of the relevant arrays. The next address of the data element may be in the
upward direction or downward direction (SI, DI increasing or decreasing by 2 or
1, depending on word or byte operation). When the direction flag D is 0, the
upward direction (address increasing) is taken. When it is 1, the downward
direction (address decreasing) is taken for address modification. The direction
flag can be controlled by the instruction CLD (clear direction flag D) or STD (set
direction flag D). These instructions can be used with the REP prefix to repeat a
certain number of times as per the array length. The array length should be in CX
before invoking the REP action.
• MOVSB, MOVSW: String byte, string word move. It is only in these
two instructions and the two CMPS instructions discussed next, that we
can have both source and destination operands in memory, although
implicitly specified. All other instructions will have at least one operand
specified by a register. As with MOV elsewhere, the move here is really a
copy (of a string of bytes or a string of words); the source is left unchanged. These
instructions move the byte or word specified with the second operand
(source operand) to the location specified with the first operand
(destination operand). Both the source and destination operands are
located in memory. The address of the source operand is read from the
DS:SI registers. With segment override prefix ES: the source is ES:SI.
The address of the destination operand is always read from the ES:DI registers.
Note that these operands are not explicitly mentioned in the instruction,
but implied. The instructions are just MOVSB, MOVSW all by
themselves. After the data move, the addresses in SI and DI are
appropriately modified depending on the D flag and on whether it is a byte
or a word move. What makes it necessary to use up or down addressing?
Study the debug experiment below.
-e 250
;enter the source array values at address DS:250 onwards
1377:0250 74.1 03.2 E9.3 6A.4 FF.5 B8.
; source array is: 01 02 03 04 05
-a ; assemble at 100
1377:0100 mov cx,5 ; number of bytes to be transferred
1377:0103 mov si, 250 ; Source array start

1377:0106 mov di, 252 ; Destination array start
1377:0109 rep movsb ; repeat byte transfer cx times
1377:010B ; assembly over
-r
1377:0100 B90500 MOV CX,0005 ;note, es and ds are both same.
-t8
1377:0103 BE5002 MOV SI,0250

1377:0106 BF5202 MOV DI,0252
1377:0109 F3 REPZ
1377:010A A4 MOVSB

1377:0109 F3 REPZ
1377:010A A4 MOVSB

1377:0109 F3 REPZ
1377:010A A4 MOVSB

1377:0109 F3 REPZ
1377:010A A4 MOVSB

1377:0109 F3 REPZ
1377:010A A4 MOVSB

DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PO NC
1377:010B 0100 ADD [BX+SI],AX ; CX has reached 0, so movsb is not repeated further
-d 250 258
1377:250 01 02 01 02 01 02 01 00 – EB .........
; note, the copying has not come out correct, can you figure out why?
; [Hint: Look at the D flag. Work out how the program can be corrected]
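The result of the session can be reproduced with a short model of REP MOVSB, which also makes it easy to try the other direction. This is a Python sketch, not 8086 code: the names rep_movsb and run are invented for it, DS and ES are assumed equal as in the session, and a 16-byte array stands in for offsets 250H to 25FH.

```python
def rep_movsb(mem, si, di, cx, df=0):
    """Copy cx bytes from mem[si] to mem[di]; DS and ES are taken as equal."""
    step = -1 if df else 1            # D flag set: addresses decrease
    while cx:
        mem[di] = mem[si]
        si += step
        di += step
        cx -= 1
    return si, di

def run(start_si, start_di, df):
    mem = bytearray(16)                 # stands in for offsets 250H to 25FH
    mem[0:5] = bytes([1, 2, 3, 4, 5])   # source array at 250H
    rep_movsb(mem, start_si, start_di, 5, df)
    return mem[0:7].hex()

print(run(0, 2, df=0))   # SI=250H, DI=252H, D flag clear, as in the session
print(run(4, 6, df=1))   # SI=254H, DI=256H, D flag set (that is, after STD)
```

The first line reproduces the dump 01 02 01 02 01 02 01 of the session: because the two arrays overlap, the upward copy overwrites source bytes before they are read. Starting from the far ends with the D flag set leaves 01 02 03 04 05 at 252H onwards.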
• CMPSB, CMPSW: Compare string byte and compare string word.
These instructions compare the byte or word, specified with the second
operand (source operand) to the byte or word specified with the first
operand (destination operand). Both the source and destination operands
are located in memory. The address of the source operand is read from the
DS:SI registers. The address of the destination operand is read from the
ES:DI registers. Note that the operands are not explicitly mentioned in the
instruction, but implied. The instructions are just CMPSB, CMPSW all by
themselves. After the data compare, the addresses in SI and DI are
appropriately modified depending on the D flag and on whether it is a byte
or a word compare, exactly as in MOVSB and MOVSW instructions. The
flag register is modified as per the results of comparison as would happen
in a normal comparison of data. These compare string instructions can
also take a repeat prefix like the string move instructions (with CX
initialized to the length of the array in bytes or words depending on the
instruction and decremented with every repetition); only, an unconditional
repeat would be meaningless here. Two conditional repeats are
therefore provided: REPZ (the same as REPE) and REPNZ (the same as
REPNE) as prefixes to these instructions. REPZ (REPE) will come out of
the repeat loop when a mismatch occurs between the two array elements
compared, even though CX has not reached 0. REPNZ (REPNE) will
cease to repeat when a match occurs between the array elements
compared, even if CX has not reached 0. It may be noted that REP and
REPZ or REPE have the same opcode. This means the REP, REPE and
REPZ prefixes can be used with MOVSB or MOVSW, and when so used
the zero flag will be neither checked nor modified during execution;
the instruction will continue repeating until CX becomes 0. When the REPZ
(REPE) or REPNZ (REPNE) prefixes are used with CMPSB or CMPSW,
the state of the zero flag before the instruction starts plays no part in
the repeat action. Each comparison updates the flags as a normal compare
would, and it is the zero flag resulting from the comparison just made that
decides whether to repeat or not. Consequently, at exit (either because the
condition is not satisfied or because CX has reached zero), the result of the
comparison at which the exit occurred is seen in the zero flag.
The following assembled program and the tracing of its execution in
debug clearly bring out this fact.
It is also to be noted that it is only the CMPS instructions and the SCAS
instructions discussed next that distinguish between the REPZ and
REPNZ prefixes. The other string instructions do not distinguish
between the two prefixes.
STUDY OF REP CMPSB INSTRUCTION
ASSEMBLY LANGUAGE PROGRAM:
data segment
cmpdt db 1,2,45,4,5,6,7,8, 1,2,3,4,5,6,7,8; 3rd data not matching

; group 1 compared with group2
data ends
code segment
assume cs:code, ds: data, es:data
start: mov ax, data
mov ds, ax
mov es, ax ; initialize
back2: lea si, cmpdt ; point to start of group 1
mov di, si
add di, 8 ; point to start of group 2
cmp si, di ; reset zero flag
Back1: mov cx, 5
rep cmpsb ; compare 5 bytes (the third is a mismatch)
jnz back1 ; on mismatch, compare next 5 bytes which match
; completely
jmp back2 ; start all over again (if you want another run)
int 1 ; you won’t reach here, but just in case!
code ends
end start
; UNASSEMBLING IN THE DEBUG

-u 0 1c ; unassemble from 0 to 1c
13D6:0000 B8D513 MOV AX,13D5
13D6:0003 8ED8 MOV DS,AX
13D6:0005 8EC0 MOV ES,AX
13D6:0007 8D360000 LEA SI,[0000]
13D6:000B 8BFE MOV DI,SI
13D6:000D 83C708 ADD DI,+08
13D6:0010 3BF7 CMP SI,DI
13D6:0012 B90500 MOV CX,0005
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
13D6:0017 75F9 JNZ 0012
13D6:0019 EBEC JMP 0007
13D6:001B CD01 INT 01
-g 7 ; Execute upto instruction at cs:7 (excluding)

AX=13D5 BX=0000 CX=002D DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0007 NV UP EI PL NZ NA PO NC
13D6:0007 8D360000 LEA SI,[0000] DS:0000=0201
; explain the indication DS:0000=0201 at the end of the line above
-d 0 f
; Group 1 Group 2
13D5:0000 01 02 2D 04 05 06 07 08-01 02 03 04 05 06 07 08 ..-.............
-g 15 ; execute till CS:15

DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PE NC
13D6:0015 F3 REPZ ; next instruction to be executed
13D6:0016 A6 CMPSB
-ta ; trace execution of next A hex (10) instructions

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0003 DX=0000 SP=0000 BP=0000 SI=0002 DI=000A

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0002 DX=0000 SP=0000 BP=0000 SI=0003 DI=000B

13D6:0017 75F9 JNZ 0012; first exit from the loop (NZ)
; Note: result of the comparison is available in the zero flag at exit from the
; loop.

13D6:0012 B90500 MOV CX,0005

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0004 DX=0000 SP=0000 BP=0000 SI=0004 DI=000C

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0003 DX=0000 SP=0000 BP=0000 SI=0005 DI=000D

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0002 DX=0000 SP=0000 BP=0000 SI=0006 DI=000E

13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0001 DX=0000 SP=0000 BP=0000 SI=0007 DI=000F
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0017 NV UP EI PL ZR NA PE NC
13D6:0017 75F9 JNZ 0012; second exit from loop (CX= 0)
; Note: in both loop-exit situations the zero flag at exit reflects the last
; comparison made, not the state of the zero flag before REP CMPSB started.
; WHY? What is your conclusion from the experiment?
-q ; quit
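The register values seen in the trace can be reproduced with a small model of REPE CMPSB. The sketch below is in Python and makes two assumptions that should be stated plainly: the D flag is clear, and the flags are modelled as being set by every comparison, which is how Intel documents the CMPS instructions. The name repe_cmpsb is invented for the illustration.

```python
def repe_cmpsb(mem, si, di, cx, zf):
    """Model REPE CMPSB with the D flag clear. Returns (si, di, cx, zf)."""
    while cx != 0:                       # CX = 0 on entry: nothing is compared
        zf = 1 if mem[si] == mem[di] else 0   # the compare sets ZF on every pass
        si += 1
        di += 1
        cx -= 1
        if zf == 0:                      # REPE stops on the first mismatch
            break
    return si, di, cx, zf

# the 16 data bytes of the program: group 1 at offset 0, group 2 at offset 8
data = bytes([1, 2, 45, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])

state = repe_cmpsb(data, 0, 8, 5, zf=0)
print(state)                             # first exit: mismatch at the third byte
state = repe_cmpsb(data, state[0], state[1], 5, state[3])
print(state)                             # second exit: CX has reached 0
```

The first call stops with SI=3, DI=0BH, CX=2 and ZF=0, and the second with SI=8, DI=10H, CX=0 and ZF=1, matching the two exits marked in the trace. The value of ZF passed in is used only if CX is 0 on entry.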
• SCASB, SCASW: Scan string byte, scan string word: These instructions are
the same as the CMPSB, CMPSW instructions we saw in the
previous section, except for the fact that the source for comparison is the
register AL for SCASB, or AX for SCASW. The destination is the same,
namely, ES:DI, and on execution, DI will point to the next byte or word,
based on the instruction and the direction flag as in the earlier cases of
MOVS and CMPS instructions. With CX initialized to the length of the
destination array, and DI initialized to the array start address when the D flag
is reset, or to the end address of the array when the D flag is set, we can
use the conditional instruction prefixes REPE (REPZ) or REPNE
(REPNZ). The repetition will then go on until the condition gets
contradicted or until the register CX reaches zero (that is, the destination
array is completed). As with the CMPS instructions, the zero flag seen
when the loop exits reflects the last comparison made.
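A common use of REPNE SCASB is to search an array for a given byte. The Python sketch below models that use with the D flag clear; repne_scasb and find_byte are illustrative names, and the array is indexed from 0 rather than from a real ES:DI address.

```python
def repne_scasb(mem, al, di, cx, zf=0):
    """Model REPNE SCASB with the D flag clear. Returns (di, cx, zf)."""
    while cx != 0:
        zf = 1 if mem[di] == al else 0   # compare AL with the byte at ES:DI
        di += 1
        cx -= 1
        if zf == 1:                      # REPNE stops on the first match
            break
    return di, cx, zf

def find_byte(mem, value):
    """Index of the first byte equal to value, or -1 if there is none."""
    di, cx, zf = repne_scasb(mem, value, 0, len(mem))
    return di - 1 if zf else -1          # DI has already moved past the match

print(find_byte(b"8086 CPU", ord(" ")), find_byte(b"8086", ord("x")))
```

Note that the zero flag, not CX, must be tested after the loop: if the match is at the last element, CX is 0 and ZF is 1 at the same time.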
• LODSB, LODSW: Load string byte, load string word: These instructions
are similar to MOVSB, MOVSW except the destination of the move
becomes AL for LODSB, and AX for LODSW. The source is DS:SI. On
execution the data will come to the register AL or AX, and SI will be
properly modified. The REP or REPE (REPZ) prefix may be used as in
the MOVS instructions.
The example below shows that the repeat prefix produces the same result
for this instruction whether it is used as REPE or as REPNE. (See
discussion in connection with the instruction CMPS)
THE .asm PROGRAM
code segment
assume cs:code
Start: mov ax, cs
mov ds, ax ; ds is made same as cs
mov cx,6
mov si, offset array
repe lodsb
mov cx, 6
repne lodsb
int 1
jmp start
array db 00,11h,22h,33h,44h,55h,66h,77h,88h,99h,0aah,0bbh
code ends
end start
TESTING IN THE DEBUG
-u 0 14
13DC:0000 8CC8 MOV AX,CS

13DC:0002 8ED8 MOV DS,AX
13DC:0004 B90600 MOV CX,0006
13DC:0007 BE1500 MOV SI,0015
13DC:000A F3 REPZ
13DC:000B AC LODSB
13DC:000C B90600 MOV CX,0006
13DC:000F F2 REPNZ
13DC:0010 AC LODSB
13DC:0011 CD01 INT 01
13DC:0013 EBEB JMP 0000
-d cs:15 20
13DC:0010 00 11 22-33 44 55 66 77 88 99 AA .."3DUfw...

13DC:0020 BB .
-r

DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
-g
AX=13BB BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0021 DI=0000

DS=13DC ES=13CC SS=13DC CS=13DC IP=0013 NV UP EI PL NZ NA PO NC
13DC:0013 EBEB JMP 0000
-q
Exercise: From the example shown, try to prove that as far as executing the
LODSB instruction is concerned, the prefix REPE behaves the same as the
prefix REPNE, although they are coded differently; Intel literature gives only
the prefix REP (coded the same as REPE) for the purpose, which is the authentic
coding for this repeat operation here.
• STOSB, STOSW: Store string byte, store string word: These
instructions are similar to the above, except that here the source is the
accumulator, AL for byte store and AX for word store operations. The
destination is ES:DI. The destination address modification after execution
and repeat loop with CX initialized and using REP (REPE, REPZ) prefix
will work like in LODS instructions. This can be used to initialize a
memory block with a fixed data as shown in the assembly language
program segment below:
MOV CX, 100H
MOV DI, 500H
MOV AX, 0
REP STOSW
This program segment when executed will clear memory from address
ES:500 to ES:6FF inclusive (100H words, that is 200H bytes, from address 500H),
provided the D flag is clear.
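The extent of the block written by REP STOSW follows from the count: 100H words occupy 200H bytes, so with DI starting at 500H and the D flag clear the last byte written is at 6FFH and DI ends at 700H. A short Python sketch confirms the arithmetic; rep_stosw is an illustrative name and a 64 KB bytearray stands in for the ES segment.

```python
def rep_stosw(mem, ax, di, cx):
    """Model REP STOSW with the D flag clear; words are stored low byte first."""
    while cx != 0:
        mem[di] = ax & 0xFF
        mem[di + 1] = (ax >> 8) & 0xFF
        di = (di + 2) & 0xFFFF
        cx -= 1
    return di

mem = bytearray([0xFF]) * 0x10000        # one 64 KB segment filled with FFH
end_di = rep_stosw(mem, 0x0000, 0x500, 0x100)
print(hex(end_di), mem[0x4FF], mem[0x500], mem[0x6FF], mem[0x700])
```

The bytes at 4FFH and 700H keep their old value, while everything from 500H to 6FFH is cleared.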
8. Flag Control Instructions: There are two types of instructions which control the
flags. The first type controls specific flags, like the C flag, the D flag and the I
flag. The other moves either the entire flag register or the low-order byte
of the register. The details are given below:
• STC, CLC and CMC: These instructions control the carry flag in the
flag register. They stand for set carry, clear carry and complement carry.
No other flags are affected by these instructions.
STC operation: CF ← 1;
CLC operation: CF ← 0;
CMC operation: CF ← NOT CF.
• STD and CLD: These instructions control the direction flag in the flag
register. They stand for set and clear the direction flag. Other flags are
not affected by these instructions. The need for controlling the D flag is
already seen in connection with the string instructions.
STD operation: DF ← 1; enables string addresses to be decremented
CLD operation: DF ← 0; enables string addresses to be incremented
• STI and CLI: These instructions modify the Interrupt control flag in the
flag register. When this I flag is set, the processor is enabled to accept the
hardware interrupts. Otherwise when it is reset, the processor will not be
interrupted by activating the interrupt pin of the processor from the
external hardware. Software interrupts are not disabled by clearing the I
flag, as the debug experiment below shows. Other flags are not affected
by these instructions.
STI operation: IF ← 1; Hardware interrupts enabled.
CLI operation: IF ← 0; Hardware interrupts disabled.
The debug experiment:
-a ; assemble at 100 onwards
1377:0100 cli
1377:0101 int 20
1377:0103
-u 100 102 ; unassemble 100 to 102
1377:0100 FA CLI
1377:0101 CD20 INT 20
-r
1377:0100 FA CLI
-t2
1377:0101 CD20 INT 20

DS=1377 ES=1377 SS=1377 CS=00A7 IP=1072 NV UP DI PL NZ NA PO NC
00A7:1072 90 NOP
; INT 20 vector address: segment = 0000, offset = 80-83.

-d 0000:80 83
0000:0080 72 10 A7 00 ; It is clear that interrupt is invoked even with
; the interrupt disabled, by clearing the I flag.
• LAHF: Load the flag register (lower byte) into register AH. This is a flag
register move instruction.
Description:
Moves the low byte of the FLAGS register (which includes status flags SF, ZF, AF, PF,
and CF) to the AH register. Reserved bits 1, 3, and 5 of the FLAGS register are set in the
AH register as shown in the “Operation” section below.
Operation:
AH ← FLAGS (SF:ZF:0:AF:0:PF:1:CF);
Flags Affected:
None (that is, the state of the flags in the FLAGS register is not affected).
• SAHF: Store the contents of AH register into the lower byte of Flag
register.
Description:
Loads the SF, ZF, AF, PF, and CF flags of the FLAGS register with values from the
corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5
of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the FLAGS
register remain as shown in the “Operation” section below.
Operation:
FLAGS (SF: ZF: 0: AF: 0: PF: 1: CF) ← AH;
Flags Affected:
The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register. Bits 1, 3,
and 5 of the FLAGS register are unaffected, with the values remaining 1, 0, and 0,
respectively.
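The bit positions used by LAHF and SAHF can be expressed as a mask: bits 7, 6, 4, 2 and 0 together make D5H. The Python sketch below models the two operations on a 16-bit FLAGS value; the function names lahf and sahf simply mirror the mnemonics, and the sample value 3202H is used here only as a convenient test pattern.

```python
def lahf(flags):
    """AH value produced by LAHF: the low byte of FLAGS."""
    return flags & 0xFF

def sahf(flags, ah):
    """FLAGS after SAHF: only SF, ZF, AF, PF and CF (mask D5H) are loaded."""
    mask = 0xD5                          # bits 7, 6, 4, 2 and 0
    return (flags & ~mask & 0xFFFF) | (ah & mask)

flags = 0x3202
print(f"{lahf(flags):02X} {sahf(flags, 0xFF):04X} {sahf(flags, 0x00):04X}")
```

Loading AH = FFH sets only the five status flags (giving 32D7H), and loading AH = 00H leaves bit 1 at 1, since bits 1, 3 and 5 and the whole upper byte are outside the mask.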
• PUSHF and POPF: These instructions have already been discussed in
connection with data transfer instructions (classified under type 1
instructions). It may be noted here that when we pop from the stack into
the FLAGS register, only the bits that represent the flags will be transferred,
but the other bits (marked as don’t cares in the description of the flag register)
will not be altered. The debug program below indicates this feature of the
flag register.
-a
1377:0100 pushf ; flag register → stack top
1377:0101 pop ax ; stack top → AX
1377:0102 xor ax,f02a ; the non-flag bits are complemented in AX
1377:0105 push ax ; modified AX → stack top
1377:0106 popf ; and thence to the flag register
1377:0107 pushf ; the modified flag register → stack top
1377:0108 pop ax ; and thence to the register AX
1377:0109
-u 100 108
1377:0100 9C PUSHF
1377:0101 58 POP AX
1377:0102 352AF0 XOR AX,F02A
1377:0105 50 PUSH AX
1377:0106 9D POPF
1377:0107 9C PUSHF
1377:0108 58 POP AX
-r

1377:0100 9C PUSHF
-t7
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000

1377:0101 58 POP AX

1377:0102 352AF0 XOR AX,F02A; original content of flags = 3202
; XOR with 1111 0000 0010 1010 bin
; the word chosen to complement the non-flag bits of the flag register
; Note: Non-flag bits, specially, the bits of the M S nibble of the flag
;reg were later used for identifying different x86 series of processors.
;what we see in the MS nibble of flag register here is not 1111 which is
;the ID for 8086. Here, I have a Pentium mobile processor operating in
;the real mode. Hence the nibble here is an unchangeable 0011 or 3 hex.
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI NG NZ NA PE NC
1377:0105 50 PUSH AX
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000

DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI NG NZ NA PE NC
1377:0106 9D POPF
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0107 9C PUSHF
AX=C228 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000

1377:0108 58 POP AX

; obviously the flag register has not changed even one bit.
-q
9. Segment Register Instructions: Segment registers DS and ES can be loaded
along with a pointer register by the instructions LDS and LES. Direct moves
between segment registers are not allowed; for example, MOV ES, DS is invalid.
Moves between a segment register DS, ES or SS and any one of the 8 registers
AX, BX, CX, DX, SI, DI, BP and SP are permitted. If we wish to copy DS to ES,
we have to use one of these 8 registers as an intermediate register. MOV AX, DS
followed by MOV ES, AX will be a valid operation for copying DS to ES.
• LDS and LES: LDS stands for Load DS and an address register indicated
as the first operand in the instruction. LES stands for load ES and an
address register indicated as the first operand for the instruction. The
second operand is a memory pointer, where the far address of 4 bytes is
stored. The following are examples of valid instructions.
LDS SI, [BX +1234]
LES SP, [1234]; This instruction is given only to indicate a rare
; possibility. It normally has no serious use other
; than loading both the registers ES and SP with a
; single instruction, provided the memory data is
; suitably arranged for this requirement.
• CS:, DS:, ES: and SS: : These are segment override prefixes which we
have already seen.
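What LDS and LES read from memory can be pictured with a small model. The following is only a Python sketch of the data movement, not 8086 code: the function name load_far_pointer and the sample memory contents (far address 2000:0150 stored at offset 1234 hex) are assumptions made for the illustration.

```python
def load_far_pointer(memory, ea):
    """Model of LDS/LES: return (offset, segment) read from 4 bytes at ea.

    The offset word sits at the lower address and the segment word
    follows it; each word is stored low byte first.
    """
    offset = memory[ea] | (memory[ea + 1] << 8)
    segment = memory[ea + 2] | (memory[ea + 3] << 8)
    return offset, segment


# Hypothetical data-segment contents: far address 2000:0150 stored at 1234h.
memory = bytearray(0x10000)
memory[0x1234:0x1238] = bytes([0x50, 0x01, 0x00, 0x20])

# Effect of LES SP, [1234]: SP gets the offset word, ES gets the segment word.
sp, es = load_far_pointer(memory, 0x1234)
print(f"SP={sp:04X} ES={es:04X}")
```

With the sample bytes above, the model reports SP=0150 and ES=2000, which is the pair of registers a single LES SP, [1234] would load.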
10. Miscellaneous Instructions: These instructions cannot be easily classified.
These are LEA, NOP, HLT, WAIT and XLAT. The LOCK instruction prefix
also can be considered here, as we have not found a place for it elsewhere. We
shall now look at them one by one.
• LEA: Load effective address: This instruction computes the effective
address of the second operand (the source operand) and stores it in the first
operand (the destination operand). The source operand is a memory address
(offset part) specified with one of the processor's addressing modes; the
destination operand is one of the 8 registers AX, BX, CX, DX, SI, DI, BP
or SP.
Operation:
DEST ← Effective Address (SRC);
Flags affected: None.
The instruction will not be very useful in the debug environment (why?),
but in the assembly language programs it will be quite useful as the
following program shows. Get the program assembled and linked. Test it
in debug.
data segment
addr dw 1234h, 5678h ; two words defined.
data ends
code segment
assume cs:code, ds:data
start: mov ax, data
mov ds, ax
lea ax, addr ; on execution, the address of this data → AX
lea bx, addr+2 ; address of, or pointer to the next data → BX
mov cx, [bx] ; data word 5678h → CX
mov bx, ax ; why is this necessary?
mov dx, [bx] ; data word 1234h → DX
int 01
code ends
end start
• NOP: No operation: This is a one byte instruction doing nothing except
incrementing the IP by 1. It can also be coded in the assembly language as
XCHG AX, AX, which also does nothing really. The machine code for
XCHG AX, AX and NOP are the same; both are 90 H.
• HLT: Go to the HALT state by stopping the cyclic fetch and execute
operations of the processor. The processor will remain in this HALT state
until this state is interrupted by a hardware activation (taken to logic high)
on one of its pins: INTR (interrupt request), NMI (non maskable interrupt)
or RESET.
• WAIT or FWAIT: Wait for Test signal or hardware interrupt – INTR,
NMI or RESET. This instruction takes the processor to an idle state until
one of the following happens: 1. The TEST pin of the processor goes low,
or 2. INTR, NMI or RESET goes logic high. With test pin going low, the
processor comes out of the wait state and proceeds normally. In case of
accepting the interrupt, the processor executes the interrupt service routine
and returns to the wait state again. The return address pushed to the stack
is the address of the wait instruction itself, and not of the following
instruction. This instruction is used for synchronizing the math or I/O
coprocessor operations with the 8086.
• LOCK prefix: This instruction prefix is useful with instructions which
both read and write a memory address while executing a single instruction.
This happens when a memory address is used as the destination operand of
an instruction, as in XCHG AX, [BX], or in INC or DEC of a memory
operand. These instructions first read the destination memory location,
and at the end of execution of the instruction there is a memory write,
writing the result back to the same memory location. When the 8086 is used
in a multiprocessor, parallel processing environment, these read and write
operations must follow each other without allowing other processors to
use the common data transfer bus. The bus is then said to be locked to the
processor for the duration of the read followed by the write operation, in
fact for the duration of the execution of the entire instruction. What the
prefix actually does is to activate (drive to a logic low voltage) the signal
at the LOCK pin of the processor, for the entire duration of the execution
of the instruction it prefixes. The system bus should be designed to ensure
that control of the data transfer bus, DTB (the DTB consists of the data,
address, read/write and other control lines associated with the transfer of
data between the processor and other system units like memory), cannot
be taken up by any other processor as long as LOCK# (the # symbol
indicates an active low signal) remains activated. For example, if we have
the instruction LOCK INC WORD PTR [BX], the LOCK# pin of the
processor will remain active all through the execution of this instruction.
That is, no other processor will be able to access and control the DTB
during the period of the reading of
the original data at the memory location at DS:BX, and subsequent write
back of the incremented value to the same memory location, while
executing this LOCK prefixed INC instruction. We can only say here that
parallel processor systems do require this type of control.
Exercise: Find out and list the type of instruction that can take a LOCK
prefix. We have already indicated the instructions XCHG and INC/ DEC
type. List other instruction types if any.
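Why the read and the write of an INC on memory must not be separated can be seen from a small bus model. The sketch below is a Python illustration only; it assumes a much simplified bus in which each INC takes exactly one read cycle and one write cycle, and the function name run_increments and the schedule format are invented for this example.

```python
def run_increments(schedule, locked):
    """Replay bus cycles of two processors that each execute INC on one word.

    schedule lists which processor (0 or 1) asks for the bus in each cycle.
    With locked=True a processor that has done its read keeps the bus until
    its write is done, as an active LOCK# signal would demand.
    """
    memory = 0
    latched = {}        # value each processor has read but not yet written back
    owner = None        # processor currently holding a locked bus
    pending = list(schedule)
    while pending:
        cpu = pending.pop(0)
        if locked and owner is not None and cpu != owner:
            pending.append(cpu)          # bus request refused, retried later
            continue
        if cpu not in latched:           # first cycle: memory read
            latched[cpu] = memory
            owner = cpu if locked else None
        else:                            # second cycle: write back value + 1
            memory = latched.pop(cpu) + 1
            owner = None
    return memory


interleaved = [0, 1, 0, 1]               # both reads happen before either write
print(run_increments(interleaved, locked=False))
print(run_increments(interleaved, locked=True))
```

Without the lock both processors read 0 and both write 1, so one increment is lost; with the lock the second processor is held off until the first has written, and the word ends up as 2.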
In this chapter, we have studied in detail the instruction set of the 8086. The
instruction sets of all the advanced Intel processors of the IA-32 architecture, namely
the 80x86 and the Pentium processors, are supersets of this basic set; any assembly
language program (ALP) written using the instructions we have studied here can
normally be executed on these advanced processors. That is why it is necessary to
understand this instruction set very well if we are to work on these processors at the
assembly level. In this chapter, the 8086 instruction set is studied to a sufficient depth,
with examples of actual programs in the debug and .asm environments. This study was
made on my system, which uses a Pentium mobile processor working in the real 8086
mode. We have already seen the advantages of working at the assembly level. Later
chapters will give examples of ALPs at a serious level.
EXERCISES
1. Use the DEBUG to study the following instructions after loading different
segment addresses in CS, DS, ES and SS: (i) mov [si], 58 followed by mov ax,
[si]; (ii) stosw; (iii) mov [di], 54; (iv) mov ax, 24; (v) add ax, 38; (vi) daa;
(vii) mov ax, [bp]; (viii) try another 5 instructions of your choice.
2. Write a program directly in the DEBUG to manipulate the unsigned data
available in the registers ax, bx and cx so that ax has the largest of the three and
cx has the smallest. Check the working of the program. What will be the
modification required to the program if the data are considered as signed
numbers?
3. Write a program directly in the DEBUG to shift the 32-bit data in registers
dx:ax by one bit to the left. Check the working of the program.
4. Repeat the question 3 with these modifications: (i) one bit shift to the right;
(ii) one bit rotate to the right; (iii) one bit rotate through carry to the right
and (iv) one bit arithmetic shift to the right.
A demonstration of the working for question 3 is shown below.

-rax
AX 0000
:7777 ; initialize ax with 7777 hex (random data chosen for test)
-rdx
DX 0000
:eeee ; and dx with EEEE hex; 32-bit data is now EEEE7777 hex
-a ; assemble the program (2 instructions)
137B:0100 shl ax, 1
137B:0102 rcl dx, 1 ; watch carefully the two instructions used.
137B:0104 ; program over.
-r ; examine the registers at the start of the execution.

AX=7777 BX=0000 CX=0000 DX=EEEE SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0100 NV UP EI PL NZ NA PO NC
137B:0100 D1E0 SHL AX,1
-t
AX=EEEE BX=0000 CX=0000 DX=EEEE SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0102 OV UP EI NG NZ NA PE NC
137B:0102 D1D2 RCL DX,1
-t
AX=EEEE BX=0000 CX=0000 DX=DDDC SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0104 NV UP EI NG NZ NA PE CY
137B:0104 0000 ADD [BX+SI],AL DS:0000=CD
; The shifted data is DDDCEEEE hex as seen from dx:ax now.

-q
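The two-instruction sequence of this demonstration can be checked with a small model. The following Python sketch is only a model of the arithmetic, with function names (shl16, rcl16, shift_left_32) chosen for this illustration; it follows the rule that SHL moves bit 15 into the carry and RCL brings the old carry in at bit 0.

```python
def shl16(value):
    """SHL reg16, 1: return (result, carry); the carry is the bit shifted out."""
    return (value << 1) & 0xFFFF, (value >> 15) & 1


def rcl16(value, carry_in):
    """RCL reg16, 1: the old carry enters bit 0, bit 15 leaves into the carry."""
    return ((value << 1) | carry_in) & 0xFFFF, (value >> 15) & 1


def shift_left_32(dx, ax):
    """Shift DX:AX left by one bit using SHL on AX and then RCL on DX."""
    ax, carry = shl16(ax)
    dx, carry = rcl16(dx, carry)
    return dx, ax, carry


dx, ax, carry = shift_left_32(0xEEEE, 0x7777)
print(f"DX:AX={dx:04X}{ax:04X} CY={carry}")
```

For the test data EEEE7777 hex the model gives DDDCEEEE with the carry set, in agreement with the DEBUG trace. The carry from the first instruction is what links the two halves, which is why RCL and not SHL is used on DX.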
3. PROGRAMMING BASICS
Programming is a science and an art as well. Different people writing programs
for a given problem at the assembly language level may write different programs
altogether. The variations can be in several dimensions. There is some flexibility in
allotting registers to the variables, there may be different algorithms available for solving
a given problem, some more suitable in certain processors; depending on the processor,
some problems may become very simple with certain tricky operations. Some may prefer
to write simple programs, simple to understand but perhaps not very efficient; some
others may revel in complex but efficient programming and so on.
What is it that we are looking for in a program; when can we say, a program is
good? We should answer this question first, before attempting to write programs. We
could put forth a few criteria for a good program. First, it must solve the given problem
completely and for all sets of data. Sometimes the input to the program may not be
valid, like an input of 0 for the divisor in a division program. In such cases, the
program should exit giving an indication of the data invalidity. A good program must be
easy to follow. Above all, it should be efficient in resource or register usage, efficient in
terms of time of execution and efficient in terms of memory usage. In the context of
continuously increasing memory and resource availability, it may look like the most
important entity to be economized is the time of execution of the program. However,
over indulgence in time optimization at the cost of simplicity may not be worthwhile.
The way things are evolving, most of the features including the speed of systems are
continuously improving and in this environment, a good program can be the one which is
easier to understand and modify if needed, by any programmer with average expertise.
This implies simplicity of the program may be a primary concern, more important than
resource usage or the time taken for execution.
The following sets of alternative programs illustrate that even a simple function
may be achieved in many ways. Suppose we wish to round off an eight bit number to
only seven significant bits; that is, if the last bit is 0, we leave the number unaltered, but
if it is a 1, we increment the number so that it becomes the next number,
approximately equal to the original eight bit number to seven bit accuracy. The
following programs take the number in the AL register at input and leave the modified
number there. The first four do not use any other register; the fifth uses AH as well.
Five possibilities (among several) are given below.
Alternative 1:
    ROR AL, 1
    ROL AL, 1   ; AL is unchanged, but carry has the l.s. bit
    JNC DOWN
    INC AL
DOWN:
Alternative 2:
    ROR AL, 1
    ROL AL, 1
    ADC AL, 0
Alternative 3:
    INC AL
    AND AL, 0FEh   ; kill the last bit after incrementing
Alternative 4:
    TEST AL, 01    ; TEST does not destroy the data tested
    JZ DOWN
    INC AL
DOWN:
Alternative 5 (uses register AH also):
    MOV AH, 01
    AND AH, AL
    ADD AL, AH
The fact that even such a simple operation has so many possible ways of
programming indicates that different individuals may come up with different
versions for doing the same job. The programming language is
thus quite flexible, much like our normal languages such as English.
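That the alternatives really are equivalent can be checked exhaustively, since AL has only 256 possible values. The following Python sketch models the effect of each alternative on AL; the function names alt1 to alt5 are assumptions of this illustration, and the flags are modelled only as far as each alternative uses them.

```python
def alt1(al):
    """ROR then ROL leaves bit 0 in the carry; JNC skips the INC when it is 0."""
    carry = al & 1
    return (al + 1) & 0xFF if carry else al


def alt2(al):
    """Same carry, consumed by ADC AL, 0 instead of a jump."""
    carry = al & 1
    return (al + carry) & 0xFF


def alt3(al):
    """INC AL, then AND AL, 0FEh clears the last bit."""
    return ((al + 1) & 0xFF) & 0xFE


def alt4(al):
    """TEST AL, 01 sets ZF when bit 0 is 0; JZ then skips the INC."""
    return al if (al & 0x01) == 0 else (al + 1) & 0xFF


def alt5(al):
    """AH = 01 AND AL isolates bit 0, which is then added to AL."""
    ah = 0x01 & al
    return (al + ah) & 0xFF


alternatives = (alt1, alt2, alt3, alt4, alt5)
all_agree = all(len({f(al) for f in alternatives}) == 1 for al in range(256))
print(all_agree)
print(hex(alt3(0xFF)))
```

The model reports that all five agree on every input. It also shows the one case worth remembering: an input of FF hex wraps around to 00 in every alternative, because AL holds only eight bits.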
We shall now take a little more serious problem, and see how we can program it
in different styles, with different levels of goodness, or efficiency.
The problem we take is a 4-digit HEX to 5-digit BCD conversion. BCD to hex
and hex to BCD conversions are useful in many situations. We understand BCD or
decimal numbers better, but the processor is more at home with hex (actually binary, but
binary is practically same as hex and we think of it as hex, for hex is more compact
compared to binary). Because of this, at human-machine interaction level, this
conversion from hex to BCD as well as BCD to hex will be necessary to make the
systems user-friendly. So this program has a serious application.
Basics of number Base Conversions: There are two basic methods of number
conversion from one base to another. The first method consists of separating the digits of
the given number first, then multiplying the digits with the powers of the base and
adding. Suppose we want to convert a hexadecimal number 12A to decimal. What we
can do is to separate the digits 1, 2, and A, and multiply in decimal, each of these digits
with the appropriate powers of 16 and add the results in decimal. Accordingly digit 1 is
multiplied by 16^2 = 256 decimal to get 256, digit 2 is multiplied by 16 decimal to get 32
decimal, and the digit A, which is 10 decimal, is multiplied by 1 to get 10. All these are
now added in decimal, 256 + 32 + 10, to give the value 298 decimal for the number 12A hex.
Horner’s rule can be used to simplify these calculations:
12A hex = (1*16 + 2)*16 + 10 decimal
= 18*16 + 10
= 288 + 10 = 298 decimal.
Note, in this method all calculations (multiplications and additions) are to be in decimal.
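The first method, with Horner's rule, can be sketched in a few lines. This is a Python illustration only; the function name horner_value is an assumption of the example, and Python's own integers stand in for the base in which the calculations are carried out.

```python
def horner_value(digits, base):
    """Evaluate a digit list (most significant digit first) by Horner's rule."""
    value = 0
    for digit in digits:
        value = value * base + digit     # one multiply and one add per digit
    return value


print(horner_value([0x1, 0x2, 0xA], 16))   # the digits of 12A hex
```

The call prints 298, the same value as the worked calculation above. Each digit costs one multiplication and one addition, and no powers of the base need to be computed separately.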
An alternative method to do this is to divide the hexadecimal number by 10 decimal, that
is by 0A hex (using hex throughout as the base of computation), successively, to get the
decimal digits as remainders every time, and then to put these digits in proper sequence.
According to this method, using hex computation, we have:
0A ) 12A
0A )  1D - 8 ↑
       2 - 9 |
Here we are using the hexadecimal calculation to separate the decimal digits from the
given number and then we can assemble the digits properly. In our example as shown by
the division above, we see the decimal equivalent of 12A as 2-9-8 digit wise, which,
assembled, gives the decimal number 298. The calculations done here are all in
hexadecimal to get the digits and then it is only a question of assembling the digits
properly. We, being conversant with decimal calculations, will find the first method
(method with calculations in decimal) more convenient, but in computers it is always the
method using hexadecimal computations, that is, the second method, here, of separating
the digits by hexadecimal division which is simpler to use. If we want to convert BCD
to hex, we would find successive decimal division by 16, to separate the digits, more
convenient, while in the computer, multiplying the decimal digits by powers of 0A in
the hexadecimal system and adding the results in hex would be convenient.
While going from an arbitrary base to another arbitrary base, we may find it convenient
to go via decimal system using decimal computations, and in the computers it will be
convenient to go through hex system using hex calculations.
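The second method, digit separation by successive division, can be sketched as follows. Again this is only a Python illustration with an assumed function name, digits_by_division; the machine's native integer arithmetic plays the part of the hex computation.

```python
def digits_by_division(value, base):
    """Separate digits by repeated division by the target base.

    The remainders come out least significant digit first, so the list is
    reversed before it is returned.
    """
    remainders = []
    while True:
        value, remainder = divmod(value, base)
        remainders.append(remainder)
        if value == 0:
            break
    return remainders[::-1]


print(digits_by_division(0x12A, 0x0A))
```

For 12A hex divided successively by 0A hex, the digits come out as 2, 9 and 8, that is, 298 decimal. The reversal at the end is the "assembling in proper sequence" step mentioned in the text.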
Exercise: Convert the number AB5 in base 13 to its equivalent in base 12, as
decimal-system-using people would do it, and also as a hex-system-using computer
would do it.
[Hint: In decimal computation: AB5 in base 13 = (10*13 + 11)*13 + 5 = 1838 decimal.
12 ) 1838
12 )  153 - 2 ↑
12 )   12 - 9 |
        1 - 0 |
Hence the result: AB5 in base 13 = 1092 in base 12.
In hexadecimal computation: AB5 in base D = (A*D + B)*D + 5 = 72E hex.
C ) 72E
C )  99 - 2 ↑
C )   C - 9 |
      1 - 0 |
Hence the result: AB5 in base D = 1092 in base C.]
With this background, we shall now study different programs in different styles
for the conversion of 4-digit hex to 5-digit BCD.
Programming Style 1: The first style we use is the simple minded approach to
successively divide the hex number by 0A hex and assemble properly, the different
decimal digits that we get. We continuously use word sized division, although some
divisions could be byte size (we are simple minded in this respect). The method,
involving hex operations, is ideally suited for a binary computer. We will consider the
number to be originally in register AX in hexadecimal, and we want to get the output 5
digits in DX:AX; the most significant digit in DX and the rest of the digits in AX. We
will use one register to store each digit, as we have adequate registers. CX, we use for
storing the divisor 0A initially and different shift counts (required for positioning the
digits properly) later. Registers BX, SI and DI are used to store three of the digits, while
the last two of the digits happen to be in DX and AX, where we would finally need them.
We would need no further registers. The program and its execution are given below.
The program is simple and self explanatory.
The style1.asm program
; 4-digit hex in ax, converted to 5-digit BCD in dx:ax
; regs used: bx, cx, dx, si, di
; algorithm used: separate digits by successively dividing by
; decimal 10 (hex 0a), and aligning the digits appropriately
codeseg segment
assume cs:codeseg
start: sub dx, dx; 0 → dx, prepare to do word division
mov cx, 10; divisor for digit separation
; initially digits are separated and stored in different registers.
div cx ; word division
mov bx, dx; l.s.digit → bx
sub dx, dx
div cx
mov si, dx; next m.s.digit → si
sub dx, dx
div cx
mov di, dx; next m.s.digit → di
sub dx, dx
div cx
xchg ax,dx; next m.s.digit → ax and m.s.digit → dx
; at this point dx has the m.s.digit as required, and bx has
; the l.s.digit properly positioned. Other digits will have
; to be properly positioned by shifting appropriately.
mov cx, 12
shl ax, cl; ax is positioned
mov cx,8
shl di, cl; di is positioned
mov cx, 4
shl si, cl; si is positioned
; now all we require is to assemble the last 4-digits of the
; result in ax.
add ax, si
add ax, di
add ax, bx
int 01
codeseg ends
end start
Execution of the .exe program: (i) Program unassembled
-u 0 30
13D6:0000 2BD2 SUB DX,DX

13D6:0002 B90A00 MOV CX,000A
13D6:0005 F7F1 DIV CX
13D6:0007 8BDA MOV BX,DX
13D6:0009 2BD2 SUB DX,DX
13D6:000B F7F1 DIV CX
13D6:000D 8BF2 MOV SI,DX
13D6:000F 2BD2 SUB DX,DX
13D6:0011 F7F1 DIV CX
13D6:0013 8BFA MOV DI,DX
13D6:0015 2BD2 SUB DX,DX
13D6:0017 F7F1 DIV CX
13D6:0019 92 XCHG DX,AX
13D6:001A B90C00 MOV CX,000C
13D6:001D D3E0 SHL AX,CL
13D6:001F B90800 MOV CX,0008
13D6:0022 D3E7 SHL DI,CL
13D6:0024 B90400 MOV CX,0004
13D6:0027 D3E6 SHL SI,CL
13D6:0029 03C6 ADD AX,SI
13D6:002B 03C7 ADD AX,DI
13D6:002D 03C3 ADD AX,BX
13D6:002F CD01 INT 01
(ii) Testing of the program with data FFEF hex = 65519 decimal
-rax
AX 0000
:ffef
; initialize ax with the hex data FFEF
-r
; display initial register contents
AX=FFEF BX=0000 CX=0031 DX=1234 SP=0000 BP=0000 SI=0000 DI=0000
13D6:0000 2BD2 SUB DX,DX ;execute the 1st instn.
-t16
; trace 16 hex (that is 22 decimal) instructions.
AX=FFEF BX=0000 CX=0031 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0002 NV UP EI PL ZR NA PE NC
13D6:0002 B90A00 MOV CX,000A ;2nd
AX=FFEF BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D6:0005 F7F1 DIV CX ;3rd
AX=1997 BX=0000 CX=000A DX=0009 SP=0000 BP=0000 SI=0000 DI=0000

13D6:0007 8BDA MOV BX,DX ;4th

13D6:0009 2BD2 SUB DX,DX ;5th

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000B NV UP EI PL ZR NA PE NC
13D6:000B F7F1 DIV CX ;6th
AX=028F BX=0009 CX=000A DX=0001 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000D NV UP EI PL ZR NA PE NC
13D6:000D 8BF2 MOV SI,DX ;7th

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=000F NV UP EI PL ZR NA PE NC
13D6:000F 2BD2 SUB DX,DX ;8th

13D6:0011 F7F1 DIV CX ;9th

13D6:0013 8BFA MOV DI,DX ;10th
13D6:0015 2BD2 SUB DX,DX ;11th

13D6:0017 F7F1 DIV CX ;12th

13D6:0019 92 XCHG DX,AX ;13th

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=001A NV UP EI PL ZR NA PE NC
13D6:001A B90C00 MOV CX,000C ;14th
AX=0005 BX=0009 CX=000C DX=0006 SP=0000 BP=0000 SI=0001 DI=0005

13D6:001D D3E0 SHL AX,CL ;15th
AX=5000 BX=0009 CX=000C DX=0006 SP=0000 BP=0000 SI=0001 DI=0005

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=001F NV UP EI PL NZ NA PE NC
13D6:001F B90800 MOV CX,0008 ;16th

13D6:0022 D3E7 SHL DI,CL ;17th

13D6:0024 B90400 MOV CX,0004 ;18th

13D6:0027 D3E6 SHL SI,CL ;19th

13D6:0029 03C6 ADD AX,SI ;20th

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002B NV UP EI PL NZ NA PO NC
13D6:002B 03C7 ADD AX,DI ;21st

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002D NV UP EI PL NZ NA PO NC
13D6:002D 03C3 ADD AX,BX ;22nd

DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=002F NV UP EI PL NZ NA PO NC
13D6:002F CD01 INT 01 ;23rd instruction not executed.
; The result in DX:AX is seen to be 6:5519, as the equivalent of FFEF hex.

; The registers BX, CX, DX, SI and DI have lost their original contents.
-q
Review and Comments on the style of the above Program: As already stated,
the program is simple. The only attempt at economic register management is seen in
making a single register, CX, serve initially as the divisor store and later, after the
digit separation, as the shift count store. For the shift count store, no other register
can be used. However, for storing the divisor 0A another register, BP, could have been
used, which means the demands of the program are much less than the register
resources available. The operations performed are mindlessly repeated as many times as
required without any attempt to optimize: first, digit separation using word division,
and then positioning of the words for the final assembly. The data is handled throughout
in terms of words, while byte handling at places could have simplified the operation. The
style reminds me of the style of children’s stories, with repetitions of identical stuff many
times. It is tolerable perhaps in a beginner’s program.
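For readers who wish to check the Style 1 logic away from DEBUG, its arithmetic can be modelled in a few lines. This is only a Python sketch of what the registers hold, not a replacement for the .asm program; the function name style1_hex_to_bcd is an assumption of the illustration, and the variable names follow the registers used in the program.

```python
def style1_hex_to_bcd(ax):
    """Model of style1.asm: four word divisions by 10, then shift and add."""
    ax, bx = divmod(ax, 10)      # l.s. digit d0 -> BX
    ax, si = divmod(ax, 10)      # d1 -> SI
    ax, di = divmod(ax, 10)      # d2 -> DI
    dx, ax = divmod(ax, 10)      # after XCHG: d4 (quotient) in DX, d3 in AX
    ax = (ax << 12) + (di << 8) + (si << 4) + bx   # position and assemble
    return dx, ax


dx, ax = style1_hex_to_bcd(0xFFEF)
print(f"{dx:X}:{ax:04X}")
```

For the test data FFEF hex the model prints 6:5519, the result seen in the trace. The repetition in the program shows up here as four almost identical lines.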
Programming Style 2, efficient Resource Management: In this style we again use
the same algorithm and try to improve the program to overcome the deficiencies
noted above. After two word divisions, we use byte divisions, and exercise the utmost
economy in register usage by handling the digits as bytes rather than words. The CX
register (as CX or as its low byte CL) is used for the divisor and for the shift count. The
program conceived on this basis is shown below, along with a demonstration in the debug.
Style2.asm
; As before, the data to be converted is considered to be in register AX.
; Consider result to be d4d3d2d1d0, where each dn is a nibble of BCD digit.
; We want to have 000d4 in dx, and d3d2d1d0 in ax as the result.
; Registers destroyed: BX, CX and DX
codeseg segment
assume cs:codeseg
begin: mov cx,0ah ; decimal 10
sub dx,dx ; preparing for word division
div cx
mov bx,dx ; 000d0 → bx
sub dx,dx ; for word division again
div cx ; 00 → dh, 0d1 → dl
div cl ; byte division now
xchg bh, ah ; 0d20d0 → bx; 00 → ah for next byte division
div cl ; 0d30d4 → ax
xchg dl, al ; 000d4 → dx; 0d30d1 → ax
mov cl, 04
shl ax,cl ; d30d10 → ax
add ax,bx ; d3d2d1d0 → ax; dx already has 000d4
; hence the result is ready
int 01
codeseg ends
end begin
Execution of the program in the debug (i) Program unassembled
-u 0 1b
13D5:0000 B90A00 MOV CX,000A

13D5:0005 F7F1 DIV CX
13D5:000D F6F1 DIV CL
13D5:000F 86FC XCHG BH,AH
13D5:0011 F6F1 DIV CL
13D5:0013 86D0 XCHG DL,AL
13D5:0015 B104 MOV CL,04
13D5:0017 D3E0 SHL AX,CL
13D5:0019 03C3 ADD AX,BX
13D5:001B CD01 INT 01
(ii) Program executed for the data ABCD hex = 43981 decimal
-rax
AX 0000
:abcd
-r
AX=ABCD BX=0000 CX=001D DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0000 B90A00 MOV CX,000A
-t 0d ; execute the next 0d(hex) or 13(decimal) instructions
AX=ABCD BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

AX=ABCD BX=0000 CX=000A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0005 F7F1 DIV CX
AX=112E BX=0000 CX=000A DX=0001 SP=0000 BP=0000 SI=0000 DI=0000



AX=01B7 BX=0001 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000

13D5:000D F6F1 DIV CL
AX=092B BX=0001 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000

13D5:000F 86FC XCHG BH,AH
AX=002B BX=0901 CX=000A DX=0008 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0011 F6F1 DIV CL


13D5:0015 B104 MOV CL,04
13D5:0017 D3E0 SHL AX,CL

13D5:0019 03C3 ADD AX,BX

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001B NV UP EI PL NZ NA PE NC
13D5:001B CD01 INT 01
-q
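The byte-level juggling of Style 2 is easier to follow with the register halves written out. The following Python sketch models the program instruction by instruction; the function name style2_hex_to_bcd is an assumption of the illustration, and the variables al, ah, bx, dl and dx stand for the registers of the same names.

```python
def style2_hex_to_bcd(ax):
    """Model of Style2.asm, tracking the byte halves of AX, BX and DX."""
    ax, dx = divmod(ax, 10)                 # div cx: d0 in DX
    bx = dx                                 # mov bx, dx
    ax, dx = divmod(ax, 10)                 # div cx: d1 in DL, DH = 00
    al, ah = divmod(ax, 10)                 # div cl: quotient in AL, d2 in AH
    assert al <= 0xFF                       # the byte quotient must fit in AL
    bx = (ah << 8) | (bx & 0xFF)            # xchg bh, ah: d2 goes to BH ...
    ah = 0                                  # ... and the old BH (00) to AH
    al, ah = divmod((ah << 8) | al, 10)     # div cl: AL = d4, AH = d3
    dl, al = al, dx & 0xFF                  # xchg dl, al
    ax = (((ah << 8) | al) << 4) & 0xFFFF   # shl ax, cl with CL = 4
    return dl, (ax + bx) & 0xFFFF           # add ax, bx; DX holds 000d4


dx, ax = style2_hex_to_bcd(0xABCD)
print(f"{dx:X}:{ax:04X}")
```

For ABCD hex the model gives 4:3981, that is 43981 decimal. The assert in the middle records why the switch to byte division is safe: after two divisions by 10 the quotient is at most 655, and 655 divided by 10 still fits in a byte.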
Review and Comments on Style 2: In the program of Style 2, we see that even
though the same algorithm is used, the resource management has been very finely tuned
to the problem at hand. Perhaps in this program, it may not be possible to alter a single
instruction (and of course alter the rest of the program, if necessary, to give the proper
result) without reducing the efficiency of the program. It is something like a good piece
of poetry. Good poetry (like the Elegy Written in a Country Churchyard, by Thomas Gray),
they say, is such that a single word in the writing cannot be replaced by an alternative
word without somehow degrading the quality of the writing. I therefore call this style
the poetry style, and it is this style which good programmers normally try to develop.
A large number of variations are possible, in the programming domain, between the
simple style 1 and the style 2, to suit the taste and capabilities of any programmer. As
one approaches the reasonably perfect program, it becomes more and more difficult to
improve on the program, till at last one comes to a point where one thinks further
improvement is not worth the trouble. That will be the style 2 program. Below I give a
program which is only partially optimized, with a style between the styles 1 and 2. The
program is given without comments and without a demonstration of its working, for
your study.
A Program with a style between Style 1 and Style 2 for hex-to-BCD conversion.
; This program uses regs. BX, CX, DX and SI

codehere segment
assume cs: codehere
strt: mov si, 0ah
mov cx, 4
sub dx,dx
div si
mov bx,dx
mov dl,dh
div si
shl dx,cl
or bx,dx
mov dl,dh
div si
mov bh,dl
mov dl,dh
div si
xchg ax,dx
ror ax,cl
or ax,bx
int 01
codehere ends
end strt
Programming Style 3, extracting Full Power from the Instruction Set: This is
a very complex style of programming wherein one tries to exploit, as much as possible,
the raw power of the processor instructions and capabilities. Properly exploited, this
method would provide the best possible program for a given job. Maybe this requires a
little thinking in what is sometimes called the ‘out of the box’ fashion. It is not worth
spending too much time on this, as it is a creative type of activity, where there is no
guarantee of a solution. If you get it, you get it; else, you don’t, so leave it at that. In our
chosen example, we still follow the same method of digit separation and positioning, but
we do it in a slightly more efficient fashion. The following is the program:
Style3.asm
; Input is assumed in AX as before, and the equivalent 5-digit BCD in DX:AX.
; Registers used BX, CX and DX.
; Instead of dividing by 10 four times (twice word, and twice byte in style 2),
; here we divide by 100 twice, (once word div and one more byte div) and the
; 100s are converted to 10s and units digits using the instruction AAM, and this
; is an extended or unusual use of the instruction. It should be noted that while
; handling AAM on a two digit hex in AL, it is not necessary to have the
; register AH cleared, AAM automatically loads AH with the upper BCD digit.
code_here segment
assume cs:code_here
star: mov cx, 100; 100 decimal = 64 hex
sub dx, dx ; prepare for word division
div cx ; 00 → DH and hex eq. of d1d0 BCD → DL as hex
div cl ; 0d4 → AL and hex eq. of d3d2 BCD → AH
xchg dl,al ; 000d4 → DX and hex of d1d0 → AL; (hex of d3d2 BCD → AH)
mov bl, ah ; hex of d3d2 → BL
aam ; 0d10d0 → AX
xchg bx,ax ; 0d10d0 → BX; hex of d3d2 → AL
aam ; 0d30d2 → AX
xchg al,bh ; 0d30d1 → AX; 0d20d0 → BX
rol ax, cl ; watch this! (CL = 64 hex = 100 decimal, but rotating a
; 16-bit register by 100 has the same effect as rotating by 4.)
; d30d10 → AX
add ax, bx ; d3d2d1d0 → AX, DX already has 000d4; all set to finish
int 01 ; finish
code_here ends
end star
Program executed in debug: (i) program unassembled
-u 0 18
13D5:0000 B96400 MOV CX,0064

13D5:0005 F7F1 DIV CX
13D5:0007 F6F1 DIV CL
13D5:000B 8ADC MOV BL,AH
13D5:000D D40A AAM
13D5:000F 93 XCHG BX,AX
13D5:0010 D40A AAM
13D5:0012 86C7 XCHG AL,BH
13D5:0014 D3C0 ROL AX,CL
13D5:0016 03C3 ADD AX,BX
13D5:0018 CD01 INT 01
(ii) Program tested with data AFBE hex = 44990 decimal
-rax
AX 0000
:afbe
-r
AX=AFBE BX=0000 CX=001A DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0000 B96400 MOV CX,0064
-t c ; execute 12 (0C hex) instructions
AX=AFBE BX=0000 CX=0064 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

AX=AFBE BX=0000 CX=0064 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0005 F7F1 DIV CX
AX=01C1 BX=0000 CX=0064 DX=005A SP=0000 BP=0000 SI=0000 DI=0000

13D5:0007 F6F1 DIV CL; 5A hex = 90 decimal
AX=3104 BX=0000 CX=0064 DX=005A SP=0000 BP=0000 SI=0000 DI=0000

13D5:0009 86D0 XCHG DL,AL ; 31 hex = 49 decimal

13D5:000B 8ADC MOV BL,AH

13D5:000D D40A AAM ; note AH is over-written by this instruction.

13D5:000F 93 XCHG BX,AX; 5A hex = 90 decimal

13D5:0010 D40A AAM

13D5:0012 86C7 XCHG AL,BH; 31 hex = 49 decimal

13D5:0014 D3C0 ROL AX,CL

13D5:0016 03C3 ADD AX,BX
13D5:0018 CD01 INT 01
-q
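The two unusual steps of Style 3, AAM used as a divide-by-ten on an arbitrary byte and a rotate whose count is 100, can be tried out in a model. The following Python sketch follows the program instruction by instruction; the helper names aam, rol16 and style3_hex_to_bcd are assumptions of this illustration, and only the register contents (not the flags) are modelled.

```python
def aam(al):
    """AAM: AH = AL // 10, AL = AL % 10; the previous AH is ignored."""
    return divmod(al, 10)                    # returned as (AH, AL)


def rol16(value, count):
    """16-bit rotate left; only count mod 16 changes the result."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF


def style3_hex_to_bcd(ax):
    """Model of Style3.asm: divide by 100 twice, split each part with AAM."""
    cx = 100
    ax, dx = divmod(ax, cx)                  # div cx: DL = value of d1d0
    al, ah = divmod(ax, cx & 0xFF)           # div cl: AL = d4, AH = value of d3d2
    dl, al = al, dx                          # xchg dl, al
    bl = ah                                  # mov bl, ah
    ah, al = aam(al)                         # AX = 0d1 0d0
    bx, ax = (ah << 8) | al, bl              # xchg bx, ax (old BH is lost in AAM)
    ah, al = aam(ax)                         # AX = 0d3 0d2
    bh, bl = bx >> 8, bx & 0xFF
    al, bh = bh, al                          # xchg al, bh
    ax = rol16((ah << 8) | al, cx & 0xFF)    # rol ax, cl with CL = 100
    return dl, (ax + ((bh << 8) | bl)) & 0xFFFF


dx, ax = style3_hex_to_bcd(0xAFBE)
print(f"{dx:X}:{ax:04X}")
```

For AFBE hex the model gives 4:4990, that is 44990 decimal. The rotate by 100 behaves as a rotate by 4 because 100 = 6*16 + 4, and six full turns of a 16-bit register change nothing.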
Review and Comments on Style 3: This style 3 may not always be available as
pointed out already. But when available and properly applied, it can give the best results
by way of giving a very efficient program. It may be a little difficult to comprehend also,
and may require extensive commenting in the assembly program at every instruction
used. I would call this a power method or a creative method, requiring an extensive and
thorough knowledge of the instruction set. Normal programmers will do well not to
bother too much about this type of programming. This is also difficult to maintain and
modify if required.
Programming Style 4: This style makes use of an algorithm which is not suitable
for the processor on hand and is presented here as a style to be avoided. In terms of
literary activities, this style would correspond to using a method of presentation which is
not befitting the theme presented, like trying to write a big novel on a material suitable
only for a short story. Only consummate artists may perhaps do it
successfully.
Many processors may not be fully geared to handle certain specific types of jobs.
The Intel 8086 for example, is not very efficient for handling computations in decimal. If
we make the 4-digit hex to 5-digit BCD conversion by decimal computations in this
processor, we will have to be using very circuitous methods. All that 8086 can do in
respect of multi digit decimal handling is to handle 2-digit decimal addition/ subtraction.
The hex-to-BCD conversion can still be done and here is a way of doing it. But I repeat,
the method becomes quite complex and wasteful of resources and is recommended to be
avoided.
In the program given, the most significant digit is still computed using subtraction
for simplicity, and the remaining 4 digits, whose hex value can at most be 270F (=9999
decimal) are found by decimal computation (see discussion at the beginning of this
chapter). The method consists of finding the place value of each bit in decimal, and
adding it to the result number as a decimal number, if the corresponding bit is present in
the hex number to be converted. To give a small example, if we want to calculate the
decimal value of 10111 binary, we calculate the weights of each bit in decimal, b4 = 16
decimal, b3 = 8 decimal, b2= 4 decimal, b1 = 2 decimal and b0 = 1 decimal. In the given
number, b3 is absent, so the decimal value of the number is (16+4+2+1); addition to be
done in the decimal system, and it works out to 23 decimal. The program is given below,
with a test demo. Remember, we are not even using Horner’s rule here.
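Before reading the program, it may help to see the place-value method in a compact model. The following Python sketch is an illustration only: daa_add gives the result that ADD or ADC followed by DAA produces for valid packed BCD bytes (it is not the flag-level definition of DAA), and the names daa_add, bcd16_add and binary_to_bcd are assumptions of the example.

```python
def daa_add(a, b, carry_in=0):
    """Add two valid packed-BCD bytes; return (BCD byte, decimal carry)."""
    low = (a & 0x0F) + (b & 0x0F) + carry_in
    half_carry = 1 if low > 9 else 0
    low = low - 10 if half_carry else low
    high = (a >> 4) + (b >> 4) + half_carry
    carry = 1 if high > 9 else 0
    high = high - 10 if carry else high
    return (high << 4) | low, carry


def bcd16_add(x, y):
    """Add two 4-digit packed-BCD words a byte at a time, low byte first."""
    low, carry = daa_add(x & 0xFF, y & 0xFF)
    high, _ = daa_add(x >> 8, y >> 8, carry)
    return (high << 8) | low


def binary_to_bcd(value):
    """Convert a value below 10000 to packed BCD by summing decimal bit weights."""
    result, weight = 0x0000, 0x0001           # the roles of BX and DX
    while value:
        if value & 1:                         # current bit present: add its weight
            result = bcd16_add(result, weight)
        value >>= 1
        weight = bcd16_add(weight, weight)    # decimal doubling of the weight
    return result


print(f"{binary_to_bcd(0b10111):04X}")
```

For 10111 binary the model prints 0023, matching the small example in the text. Every addition, including the doubling of the weight, has to be done a byte at a time with a decimal correction, which is what makes the method so circuitous on this processor.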
Program Style4.asm
co segment
assume cs:co
star: mov si, -1; in si we want to get the digit d4 using successive
; subtraction of 10000 (dec) from the given number
mov bx, 10000
back: inc si
sub ax, bx
jnc back
add ax, bx; the m s digit d4 is now in SI; remaining 14-bit no. in ax
mov cx, 14; loop count
mov di, ax; the remaining 14-bit number to DI for bit checking
sub bx, bx; bx is where we are adding the decimal numbers
; which will give us the final result (along with SI)
mov dx, 1 ; in dx we have the weight of the current bit in decimal.
jmp down
loopst: mov al, dl
add al,al
daa
mov dl, al
mov al, dh
adc al,al
daa
mov dh, al ; decimal doubling of DX contents
down: shr di,1 ;
jnc loopend
mov al,bl
add al,dl
daa
mov bl, al
mov al, bh
adc al, dh
daa
mov bh, al ; decimal adding of DX to BX
loopend: or di, di ; check for any data in di
loopnz loopst; loop termination if di = 0, or count = 14.
mov ax, bx
mov dx, si
int 01
co ends
end star
-u 0 42
13D5:0000 BEFFFF MOV SI,FFFF

13D5:0003 BB1027 MOV BX,2710
13D5:0006 46 INC SI
13D5:0007 2BC3 SUB AX,BX
13D5:0009 73FB JNB 0006
13D5:000B 03C3 ADD AX,BX
13D5:000D B90E00 MOV CX,000E
13D5:0010 8BF8 MOV DI,AX
13D5:0012 2BDB SUB BX,BX
13D5:0014 BA0100 MOV DX,0001
13D5:0017 EB0F JMP 0028
13D5:0019 90 NOP ; not in the .asm; inserted by the assembler
13D5:001A 8AC2 MOV AL,DL
13D5:001C 02C0 ADD AL,AL
13D5:001E 27 DAA
13D5:001F 8AD0 MOV DL,AL
13D5:0021 8AC6 MOV AL,DH
13D5:0023 12C0 ADC AL,AL
13D5:0025 27 DAA
13D5:0026 8AF0 MOV DH,AL
13D5:0028 D1EF SHR DI,1
13D5:002A 730E JNB 003A
13D5:002C 8AC3 MOV AL,BL
13D5:002E 02C2 ADD AL,DL
13D5:0030 27 DAA
13D5:0031 8AD8 MOV BL,AL
13D5:0033 8AC7 MOV AL,BH
13D5:0035 12C6 ADC AL,DH
13D5:0037 27 DAA
13D5:0038 8AF8 MOV BH,AL
13D5:003A 0BFF OR DI,DI
13D5:003C E0DC LOOPNZ 001A
13D5:003E 8BC3 MOV AX,BX
13D5:0040 8BD6 MOV DX,SI
13D5:0042 CD01 INT 01
-rax
AX 0000
:abcd
-r
AX=ABCD BX=0000 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0000 BEFFFF MOV SI,FFFF
-t22; trace next 34 (22h) instruction executions.

AX=ABCD BX=0000 CX=0044 DX=0000 SP=0000 BP=0000 SI=FFFF DI=0000
13D5:0003 BB1027 MOV BX,2710
AX=ABCD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=FFFF DI=0000

13D5:0006 46 INC SI
AX=ABCD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL ZR AC PE NC
AX=84BD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI NG NZ NA PE NC
13D5:0009 73FB JNB 0006

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI NG NZ NA PE NC
13D5:0006 46 INC SI

AX=5DAD BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0001 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 OV UP EI PL NZ NA PO NC
13D5:0009 73FB JNB 0006

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 OV UP EI PL NZ NA PO NC
13D5:0006 46 INC SI

AX=369D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0002 DI=0000

13D5:0009 73FB JNB 0006
13D5:0006 46 INC SI

AX=0F8D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0003 DI=0000

13D5:0009 73FB JNB 0006

13D5:0006 46 INC SI

AX=E87D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0009 NV UP EI NG NZ NA PE CY
13D5:0009 73FB JNB 0006
AX=E87D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0004 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000B NV UP EI NG NZ NA PE CY
13D5:000B 03C3 ADD AX,BX

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI PL NZ NA PE CY
13D5:000D B90E00 MOV CX,000E
AX=0F8D BX=2710 CX=000E DX=0000 SP=0000 BP=0000 SI=0004 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0010 NV UP EI PL NZ NA PE CY
13D5:0010 8BF8 MOV DI,AX
AX=0F8D BX=2710 CX=000E DX=0000 SP=0000 BP=0000 SI=0004 DI=0F8D

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PE CY

13D5:0014 BA0100 MOV DX,0001

13D5:0017 EB0F JMP 0028

13D5:0028 D1EF SHR DI,1
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002A NV UP EI PL NZ NA PE CY
13D5:002A 730E JNB 003A
AX=0F8D BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002C NV UP EI PL NZ NA PE CY
13D5:002C 8AC3 MOV AL,BL
AX=0F00 BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=002E NV UP EI PL NZ NA PE CY
13D5:002E 02C2 ADD AL,DL
13D5:0030 27 DAA

13D5:0031 8AD8 MOV BL,AL

13D5:0033 8AC7 MOV AL,BH

13D5:0035 12C6 ADC AL,DH

13D5:0037 27 DAA

13D5:0038 8AF8 MOV BH,AL

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=003A NV UP EI PL ZR NA PE NC
13D5:003A 0BFF OR DI,DI

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=003C NV UP EI PL NZ NA PE NC
13D5:003C E0DC LOOPNZ 001A ; this is a trace of one run of the loop
-g 42
; 14 such runs are executed (unless there is termination due to zero flag)

13D5:0042 CD01 INT 01 ; here loop has terminated after 12 runs
; still 2 runs to go, as CX = 2.
-q
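The place-value method traced above can be checked with a small model outside the assembler. The following Python sketch is only an illustration, not a transcription of Style4.asm: the names bcd_add and to_bcd_by_place_values are made up for this sketch, and a packed BCD number is held in an ordinary integer with one decimal digit per hex nibble. The variable total plays the role of BX and weight plays the role of DX.

```python
def bcd_add(a, b):
    """Add two packed-BCD values digit by digit, as ADD/ADC followed by DAA would."""
    result, carry, shift = 0, 0, 0
    while a or b or carry:
        digit = (a & 0xF) + (b & 0xF) + carry
        carry, digit = divmod(digit, 10)   # decimal carry into the next digit
        result |= digit << shift
        a >>= 4
        b >>= 4
        shift += 4
    return result


def to_bcd_by_place_values(n):
    """Convert 0..9999 to packed BCD by summing the decimal weights of the set bits."""
    assert 0 <= n <= 9999
    total = 0       # running decimal sum (BX in the assembly program)
    weight = 1      # decimal weight of the current bit (DX in the assembly program)
    while n:        # like the LOOPNZ exit, stop once no bits remain
        if n & 1:
            total = bcd_add(total, weight)
        weight = bcd_add(weight, weight)   # decimal doubling of the weight
        n >>= 1
    return total


print(hex(to_bcd_by_place_values(0x0F8D)))   # the value left in AX in the trace above
```

For the traced input ABCD hex, the four subtractions leave 0F8D hex, which this model converts to the packed BCD value 3981, to go with the leading digit 4 held in SI.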
Review and Comments on Style 4: The method used in this program involves
separating the binary digits and summing their place values in decimal terms.
We need 4-digit BCD summation, while the processor provides only 2-digit
summation, and that too using two instructions (an add followed by DAA), with one of
the operands in AL. A good part of the program, practically the entire loop,
is devoted to this double-byte decimal addition. This is
the price paid for using an algorithm whose computational techniques are not well
supported by the processor. In this case the price is paid in execution time and
in the memory space required for the program, along with increased usage of register
resources. Notice that the loop has a lot of instructions, and all of them are normally
executed 14 times, which weighs heavily on the execution time of the program. This program
could be simplified using Horner’s Rule for polynomial evaluation, as shown in the ALP
below. The program is neither explained nor commented. These are left to the reader as
exercises.
co segment
assume cs:co
sr: mov cx,10000
mov dx,-1
back: inc dx
sub ax,cx
jnc back
add ax,cx
shl ax,1
shl ax,1
mov bx,ax
mov cx,14
sub ax,ax
star: shl bx,1
adc al,al
daa
xchg al,ah
adc al,al
daa
xchg al,ah
loop star
int 1
co ends
end sr
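As a hint for the exercise, the idea behind Horner's rule in this setting is that the binary number is consumed from its most significant bit, and at each step the decimal result is doubled and the incoming bit is added. The Python sketch below models only that idea; the names bcd_double_plus and to_bcd_horner are invented for the illustration, and the packed BCD result is held in an ordinary integer rather than in the AL and AH bytes.

```python
def bcd_double_plus(value, bit):
    """Return 2*value + bit, where value and the result are packed BCD."""
    result, carry, shift = 0, bit, 0
    while value or carry:
        digit = 2 * (value & 0xF) + carry
        carry, digit = divmod(digit, 10)   # the decimal adjustment done by DAA
        result |= digit << shift
        value >>= 4
        shift += 4
    return result


def to_bcd_horner(n, bits=14):
    """Convert a 'bits'-wide binary number to packed BCD, most significant bit first."""
    assert 0 <= n < (1 << bits)
    acc = 0
    for i in range(bits - 1, -1, -1):
        acc = bcd_double_plus(acc, (n >> i) & 1)
    return acc


print(hex(to_bcd_horner(0x0F8D)))
```

The loop always runs for all 14 bits, with a single decimal doubling per bit, which is why the assembly version needs no separate weight register.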
Programming Style 5, add your own Instructions to the Instruction Set:

While programming you may sometimes feel, “Ah! If only there were an instruction that
would do this, and this, oh! It would be wonderful!” No need to despair. You can cook
up your own instructions and use them. When a particular problem has a small set
of operations to be done again and again, it helps to bundle these
operations under a short name and invoke the bundle as many
times as you want. To illustrate this, we go back to our Programming style 1, where we
had a sort of mindless repetition of word division, shifting and adding. We can combine
all these into a macro instruction bundle and use the macro as
many times as we want. This spares us the boredom of writing similar sequences of
instructions several times, with the attendant risk of making mistakes while we
repeat the sequence. It also makes the program a lot easier to follow, and the program
looks more elegant. Let us go to the details. Macros are covered more seriously
in the next chapter.
Style5.asm
; The problem is still the same, namely converting 4-digit hex to 5-digit BCD
; The hex number to be converted is in register AX, result in DX:AX
; BX, CX and SI are the other registers used by the program.
code segment
assume cs:code
; start: ; first we define the macro bundle 'sep' with shift parameter 'num'
sep macro num ; macro with name on the left and parameter(s) on
; the right
sub dx, dx ; Clear dx preparatory to word division
div si ; si must have decimal 10 when the macro is used
mov cl,num
shl dx, cl
add bx, dx ; bx is where the decimal number is built up
endm ; end of macro bundle.
; macro over, we will now use it in the program
start: mov si,0ah ; divisor 10
sub bx, bx ; initialize bx to zero.
sep 00 ; macro used with the shift count 00 to be loaded in CL
sep 04 ; shift count 4 for the next digit and so on.
sep 08
sep 12
mov dx, ax ; the m.s.digit to dx.
mov ax, bx ; move the assembled 4 digits to ax.
int 01 ; terminate
code ends
end start
; Apart from the macro definition and the macro use, there are only five
; other instructions in the program, including the terminating INT 01.
; The assembled program is shown below from the debug. See the expanded macros!
-u 0 31
13D5:0000 BE0A00 MOV SI,000A
13D5:0003 2BDB SUB BX,BX
13D5:0005 2BD2 SUB DX,DX
13D5:0007 F7F6 DIV SI ;
13D5:0009 B100 MOV CL,00 ;macro sep 00
13D5:000B D3E2 SHL DX,CL
13D5:000D 03DA ADD BX,DX
13D5:000F 2BD2 SUB DX,DX
13D5:0011 F7F6 DIV SI ;
13D5:0013 B104 MOV CL,04 ; macro sep 04
13D5:0015 D3E2 SHL DX,CL
13D5:0017 03DA ADD BX,DX
13D5:0019 2BD2 SUB DX,DX
13D5:001B F7F6 DIV SI ;
13D5:001D B108 MOV CL,08 ; macro sep 08
13D5:001F D3E2 SHL DX,CL
13D5:0021 03DA ADD BX,DX
13D5:0023 2BD2 SUB DX,DX
13D5:0025 F7F6 DIV SI ;
13D5:0027 B10C MOV CL,0C ; macro sep 12
13D5:0029 D3E2 SHL DX,CL
13D5:002B 03DA ADD BX,DX
13D5:002D 8BD0 MOV DX,AX
13D5:002F 8BC3 MOV AX,BX
13D5:0031 CD01 INT 01
-q
; The working of this program is not shown; may be studied as an exercise.
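For readers who want to check their own trace of Style5.asm, the effect of the four macro invocations can be modelled in a few lines. This Python sketch is an illustration only; the function name hex_to_bcd_style5 is made up here, and the variables ax, dx and bx merely mirror the registers used by the 'sep' macro.

```python
def hex_to_bcd_style5(ax):
    """Model of Style5.asm: returns (DX, AX) = (most significant digit, 4 packed BCD digits)."""
    assert 0 <= ax <= 0xFFFF
    bx = 0
    for shift in (0, 4, 8, 12):     # the four invocations sep 00, sep 04, sep 08, sep 12
        ax, dx = divmod(ax, 10)     # DIV SI: quotient to AX, remainder to DX
        bx += dx << shift           # SHL DX,CL followed by ADD BX,DX
    return ax, bx                   # MOV DX,AX and MOV AX,BX


msd, low_digits = hex_to_bcd_style5(0xABCD)
print(msd, hex(low_digits))
```

With the input ABCD hex (43981 decimal) the model returns 4 for DX and the packed BCD value 3981 for AX.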
Review and Comments on Style 5: If style 1 made the program something like
a children’s story, style 5 makes it even simpler; it reduces the program almost to
child’s play! If the problem has operations that repeat several times, this
style makes it very simple to visualize the operations and write the program.
However, time and memory may not be optimized to the extent possible. For the sake
of repetition, we have to use word division throughout and byte operations
cannot be used, so the time and memory economy they would bring is lost. Further, in
the above program, the first use of the macro does unnecessary shift and add operations.
The first invocation of the macro could be replaced by:
SUB DX, DX
DIV SI
MOV BX, DX
If that were done, the 2nd instruction of our program, namely SUB BX, BX, would become
superfluous and could be omitted without harm. This sort of program is easy to write,
and its assembled (expanded) form gives us a style 1 program as a starting point
for improvement towards style 2.
In the foregoing, we have seen a few programming styles. Style 3 is the best from
the memory and time efficiency points of view, but is difficult to attempt for those with
only an average command of the instruction set. Styles 1 or 5 may form
our starting framework, to be kept at the back of our mind; using the
ideas of these styles, we can then attempt the actual program writing in style 2, which is
perhaps the normal assembly language programmer’s goal. In the initial analysis we have to
find the algorithm that best suits the given problem and the processor system we have, so
as to avoid programming in style 4. Style 3 may actually be left out, as the gain from it
is marginal and not worth the effort to be put in, or
the depth and breadth of system knowledge required. A
program written in this style is not good from the maintainability point of view either: any
modification is quite difficult, because a
change at one point may produce unanticipated side effects elsewhere in the program,
which may turn out to be very hard to catch and correct. Optimality and maintainability
are often conflicting requirements, and assembly language programmers are
advised not to use style 3 programs, but to settle for style 2 or style 5, with moderate
optimization and with adequate comments indicating the logic of the processes. We
don’t have experts available all the time to handle modifications to the
program when required. A program must be understandable, not only to the original
programmer but also to any programmer with average expertise, at any time, for
it to be maintainable.
Although style 3 programs are not to be used for commercial purposes, they are perhaps
the best for learning the art of programming, developing expertise and getting a deeper
knowledge of the instruction set and the processor.
Exercises:
1. Using the ideas presented in this chapter, write a program to convert a 4-digit BCD
number in the AX register to 4-digit hex, with the output also in the AX register.
2. Study the 6-digit hex to 8-digit BCD conversion program bincvt given last in Fig
10.35 of the Microprocessor book by Douglas Hall (2nd edition or 2nd revised
edition, TMH Publications) and identify the programming style used. Could
you think of a suitable style 1 or style 2 program in this context?
3. Given below is a Style 3 program, without comments, for converting 4-digit
BCD in AX to its equivalent hex. The output is in register AX itself. The
program uses the BX and CX registers. Figure out the logic of the program and fill
in the comments.
co segment
assume cs:co
strt: mov bx,ax
and ax,0f0f0H
mov cl,2
shr ax,cl
sub bx,ax
shr ax,1
sub bx,ax
mov ah,bh
sub al,al
shr ax,1
sub bx,ax
shr ax,cl
sub bx,ax
inc cl
shr ax,cl
add ax,bx
int 01
co ends
end strt
4. Here is another optimized program for doing the 4-digit BCD to 4-digit hex
conversion, also given without comments. Test the program and reason out how
it works. The program enters with the BCD number in AX (uses just the two
registers CX and DX) and returns the hex result also in AX.
code segment
assume cs: code
strt: mov dx, ax
mov cx, 0a04h
and ax, 0f0f0h
sub dx, ax
rol ax, cl
mov cl, dl
mov dl, ah
mul ch
add al, dh
mov dh, ah
mul ch
add ax, dx
mov dl, ch
mul dx
mov ch, dh
add ax, cx
int 1
code ends
end strt
Compare the two programs given in exercises 3 and 4 above. Both are
perhaps style 3 programs. Determine which is the worse of the two.
You may note that the program of exercise 4 uses Horner’s rule. Observe the way in
which the registers are managed and the whole process is optimized in this program.
4. MACROS AND SUBROUTINES
Macros and Subroutines appear to do similar jobs, namely,
avoiding writing the same string of instructions several times in a program. However,
there are quite a lot of differences between the two. In this chapter we shall look into
these differences and then learn about the proper use of Macros and Subroutines (or
Procedures).
Features of a Macro: We have already been introduced to macros in the previous
chapter. There, we described macros as a sort of user-defined sequence of
instructions, in which the operands can be varied through the parameters of the macro.
They have certain additional capabilities as well, like handling loops and conditional
operations, where labels are to be used. These labels have to be localized to the
particular invocation of the macro, so that they are not repeated when the macro is invoked
again. This aspect is explained below:
1. Macros can support local labels for instructions: If we want to have a
loop or a conditional jump, we need to provide a label for the loop start or
the conditional jump destination. Suppose we want a loop
inside a macro, and we have labeled the loop start
instruction as lpst. When we invoke the macro, the entire
sequence of instructions, label and all, is inserted at
the point of invocation; only the parameters of the macro are
replaced by the arguments supplied at the invocation. If the macro is invoked
only once in the program, it will work fine. But if it is invoked more
than once (which is why we bundle it as a macro), the same label lpst
is defined once for every invocation, and the program will
not assemble. To overcome this problem, such labels
have to be declared as local to the macro, right at the beginning
of the macro. The following program illustrates the use of a local
label for a conditional jump operation:
THE .ASM PROGRAM
code segment
assume cs:code
ddd macro rg,n
local lbl
mov rg,n
or rg,rg
jnz lbl
inc rg
lbl:
endm
strt: ddd ax,0
ddd bx,4
int 01
code ends
end strt
THE.LST FILE OBTAINED FROM THE ASSEMBLER

Page 1-1
0000 code segment

assume cs:code
ddd macro rg,n
local lbl
mov rg,n
or rg,rg
jnz lbl
inc rg
lbl:
endm
0000 strt: ddd ax,0 ; first invocation of macro
; followed by expansion by the
; assembler
0000 B8 0000 1 mov ax,0
0003 0B C0 1 or ax,ax
0005 75 01 1 jnz ??0000 ; first value of ‘lbl’
0007 40 1 inc ax
0008 1 ??0000: ; see how labels are localized.
ddd bx,4 ; second invocation & expansion
0008 BB 0004 1 mov bx,4
000B 0B DB 1 or bx,bx
000D 75 01 1 jnz ??0001 ; second value of the same
; label.
000F 43 1 inc bx
0010 1 ??0001:
0010 CD 01 int 01
0012 code ends
end strt
Symbols-1
Macros:
N a m e Lines
DDD . . . . . . . . . . . . . . 5
CODE . . . . . . . . . . . . . . 0012 PARA NONE
Symbols:
STRT . . . . . . . . . . . . . . L NEAR 0000 CODE
??0000 . . . . . . . . . . . . . L NEAR 0008 CODE

??0001 . . . . . . . . . . . . . L NEAR 0010 CODE
@CPU . . . . . . . . . . . . . . TEXT 0101h
@FILENAME . . . . . . . . . . . TEXT hb
@VERSION . . . . . . . . . . . . TEXT 510
15 Source Lines
25 Total Lines
10 Symbols
0 Warning Errors
0 Severe Errors
DEBUGGING THE PROGRAM
-u 0 10
13D5:0000 B80000 MOV AX,0000

13D5:0003 0BC0 OR AX,AX
13D5:0005 7501 JNZ 0008
13D5:0007 40 INC AX
13D5:0008 BB0400 MOV BX,0004
13D5:000B 0BDB OR BX,BX
13D5:000D 7501 JNZ 0010 ;see the different labels produced.
13D5:000F 43 INC BX
13D5:0010 CD01 INT 01
-r

13D5:0000 B80000 MOV AX,0000
-g 10

13D5:0010 CD01 INT 01
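The renaming of local labels seen in the .LST file can be imitated with a toy text-substitution expander. The Python sketch below is only an illustration of the idea and makes no claim about how the assembler is actually implemented: the function define_macro is made up for this purpose, and it keeps one serial counter per macro as a simplification. The generated names merely copy the ??0000, ??0001 pattern visible in the listing.

```python
import itertools
import re


def define_macro(params, local_labels, body):
    """Return an expand(*args) function for a toy text-substitution macro."""
    serial = itertools.count()           # simplification: one counter per macro

    def expand(*args):
        names = dict(zip(params, args))
        for label in local_labels:       # a fresh label name for every invocation
            names[label] = "??%04X" % next(serial)
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
        return [pattern.sub(lambda m: names[m.group(1)], line) for line in body]

    return expand


ddd = define_macro(
    params=["rg", "n"],
    local_labels=["lbl"],
    body=["mov rg,n", "or rg,rg", "jnz lbl", "inc rg", "lbl:"],
)
first = ddd("ax", "0")
second = ddd("bx", "4")
print("\n".join(first + second))
```

Each invocation gets its own jump target, so the two expansions can sit side by side in one program without a duplicate label.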
2. Defining the macros at the beginning: Macros can be defined anywhere
in the assembly language program before they are invoked. Most
programmers, however, prefer defining the macros right at the
beginning of the program. This also makes the program easier to understand and
debug, as the entire set of macros can be seen in one
place. In contrast, subroutines are anyway separate from the main
program, and they can appear anywhere in the program module.
In the program seen above, we had the macro defined at the very beginning.
It was necessary there, as the first instruction in the program
invoked the macro. But even when that is not so, it is good practice to define
the macro at the beginning. The following program is an example.
; an example (only for illustration purposes, otherwise this is replacing only a
; single instruction) of defining the macro at the start of the code segment and
; of using segment override for the parameter of the macro
co segment
assume cs:co
mox macro reg, n
mov reg,n ; macro defined here, seen only by the assembler
endm
begin: mov bx,20 ; the program starts from here with a normal instruction
mox ax, cs:[bx] ; note how the segment override is used
mox cl, 04 ; note how it is applicable to 8-bit regs also.
int 01
co ends
end begin
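Note: the first invocation expands to the instruction ‘mov ax, cs:[bx]’, with the
override written as part of the operand. Writing the override in front of the
invocation, as in ‘cs: mox ax, [bx]’, is not valid.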
The assembled program seen in the debug

13D5:0000 BB1400 MOV BX,0014 ;
13D5:0003 2E CS: ;
13D5:0004 8B07 MOV AX,[BX] ;
13D5:0006 B104 MOV CL,04 ;
13D5:0008 CD01 INT 01
3. Macros do not require maintaining the stack balance: Procedures need
to maintain stack balance. That is, the return address,
which is stored at the stack top on entry to the subroutine,
must again be at the stack top when the return instruction is
executed. This means that whatever is pushed onto the stack in the
subroutine must be popped off, and nothing further may be
popped, before the return instruction is executed. If this condition is not
satisfied, the proper return address will not be available at the stack top
and the program will behave in an unpredictable fashion. There is no such
requirement with macros, as the stack is not used at all in
managing them. The following macro illustrates this:
; the macro below just pushes 3 registers onto the stack
Pushreg macro r1, r2, r3
Push r1
Push r2
Push r3
endm
Such macros are useful at the beginning of subroutines for saving three
registers on the stack at one stroke. A similar Popreg macro, placed at the end
of the subroutine, recovers the pushed registers:
Popreg macro r1, r2, r3
Pop r3
Pop r2
Pop r1
Endm
Note the reverse order of the registers in the pop operations, so that the registers
can be listed in the same sequence as parameters of both the Pushreg and
Popreg macros. Also note that this type of operation cannot be done using
subroutines, since such a subroutine would have to return with the stack unbalanced.
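The stack behaviour described in this point can be followed with a list standing in for the stack. The Python sketch below is a simplified model under stated assumptions: the register values and the return addresses 0105h and 0200h are arbitrary numbers chosen for the illustration, and the helper pushreg_as_subroutine is a hypothetical variant written only to show what goes wrong.

```python
def pushreg(stack, regs, r1, r2, r3):
    """Model of the Pushreg macro: three PUSH instructions in order."""
    for name in (r1, r2, r3):
        stack.append(regs[name])


def popreg(stack, regs, r1, r2, r3):
    """Model of the Popreg macro: POP in the reverse order r3, r2, r1."""
    for name in (r3, r2, r1):
        regs[name] = stack.pop()


def pushreg_as_subroutine(stack, regs, r1, r2, r3, return_address):
    """Hypothetical: the same pushes done inside a called routine."""
    stack.append(return_address)       # CALL pushes the return address
    pushreg(stack, regs, r1, r2, r3)
    return stack.pop()                 # RET pops whatever is on top of the stack


def demo():
    regs = {"ax": 0x1111, "bx": 0x2222, "cx": 0x3333}
    stack = [0x0105]                   # assumed return address of the enclosing subroutine
    pushreg(stack, regs, "ax", "bx", "cx")
    regs.update(ax=0, bx=0, cx=0)      # the subroutine body clobbers the registers
    popreg(stack, regs, "ax", "bx", "cx")
    return regs, stack


regs, stack = demo()
bad_target = pushreg_as_subroutine([], {"ax": 1, "bx": 2, "cx": 3}, "ax", "bx", "cx", 0x0200)
print(regs, stack, hex(bad_target))
```

With the macros, the registers come back intact and the return address is again at the stack top. In the subroutine variant the value popped by the return is the saved third register, not the address 0200h.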
4. The parameters of the macro are more flexible than those of the
subroutines: In assembly language, the parameters of a subroutine
are passed in registers or through the stack. This makes the
parameters of a fixed size. The option of using either word or byte
size parameters is not normally available for subroutines, whereas for
macros any size that makes sense in an instruction is valid. In
the program given under Para 2 above, we see the macro
mox invoked at two places. In the first instance, the reg parameter is the register
AX, and the n parameter is the CS segment overridden indirect
addressing through register BX, that is, the data in memory at address
CS:BX. In the next invocation, the parameter reg is the 8-bit
register CL, while the parameter n is just the simple number 04. Such
wide flexibility is unthinkable in procedures. In section 7 of this chapter
and also in chapter 6 we will see that processor opcodes can also be used as
parameters of a macro, so that a single macro can execute
different operations depending on the opcode parameter used at its
invocation.
5. Macros are expanded in the executable programs, while subroutines
are executed by a returnable jump: When a macro is invoked, the
sequence of instructions making up the macro is directly inserted at the
place of invocation, with the parameters properly substituted. This
implies two things. Firstly, the macros increase the size of the executable
machine language program every time they are invoked as compared to
procedures, and secondly the program executes faster than with a
procedure doing the same job. The overhead of storing the return address
and taking an initial jump to the subroutine and a final jump back to
return to the stored address is not there in macros.
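The size side of this trade-off can be put into numbers. The Python sketch below is a rough estimate under stated assumptions, not a measurement: it uses the 10-byte expansion of the 'sep' macro from the previous chapter, and it assumes a hypothetical subroutine version in which the shift count is loaded into CL at each call site (2 bytes) before a near CALL (3 bytes), with an 8-byte body ending in a 1-byte near RET. The function names are made up for the illustration.

```python
CALL_NEAR = 3   # bytes: opcode E8 plus a 16-bit displacement
RET_NEAR = 1    # bytes: opcode C3


def macro_size(body, uses):
    """Bytes of code when the macro body is expanded at every invocation."""
    return body * uses


def subroutine_size(body, uses, setup=0):
    """Bytes of code for one copy of the body plus a CALL (and setup) per use."""
    return body + RET_NEAR + uses * (CALL_NEAR + setup)


print(macro_size(10, 4))                 # four expansions of 'sep'
print(subroutine_size(8, 4, setup=2))    # assumed subroutine form of 'sep'
```

Under these assumptions the four macro expansions take 40 bytes against 29 bytes for the subroutine form, while the macro version saves the time of four calls and returns.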
6. Macros exist only in the ALP and not at the Machine Language level:
Having said all the 5 points above, we have to note a fundamental
difference between macros and subroutines. Macros exist only at the
assembly language level, while the subroutines are seen at the machine
language level also. This means there are hardware provisions for
handling the subroutines by way of storing the return address in the stack,
while at the machine language level there are no macros visible. Macros
are only short cuts at the assembly language level and are handled by the
assembler software, but do not appear as separate entities in the
executable machine language programs.
Normally, for small operations, it is common practice to write macros, while large
and complex operations repeated several times are handled through subroutines, as a
result of the point 6 indicated above. An example of a good macro for improving the
DIV instruction is given below:
Macro Smart-div: The divide instruction is rather restrictive, as shown below.
It takes a double size dividend (double word for word division, or double
byte for byte division) and a single size divisor, and produces a single size quotient and a
single size remainder. It is not always possible to have the quotient limited to single size
when dividing a double size dividend by a single size divisor. Whenever the quotient
exceeds single size, the processor does not carry out the division, but simply indicates
the divide overflow by producing an internal interrupt in the processor. This can be
taken care of by the programmer, like we did in case of style 2 and elsewhere in Chapter
3. But that may not always be possible. Sometimes the exact size of the quotient may
not be known beforehand. In such cases, to ensure that the program does not get caught at
this point, it is possible to think of a macro which will carry out the division properly,
producing a double size quotient instead of a single size one. The macro given here follows
almost the same register allocation for the division inputs, that is, DX:AX for the dividend
of word division, or only AX for the dividend of byte division. The same registers carry
the quotient after the division. The divisor is specified as a parameter of the macro. An
additional single size parameter is provided for the remainder.
The macro is defined below with examples of its use.
THE PROGRAM
code segment
assume cs:code
smdiv macro d1,d2, dv, rr;; d1d2:double size dividend, dv : divisor,
;; rr: remainder
local down
sub rr,rr ;; clear remainder register
cmp d1,dv
jb down ;; if below, only one div is enough, so go down
xchg rr, d1 ;; else, save d1 in rr and load 00 in d1
xchg rr, d2 ;; net result of the 2 instns: 0 -> d1, d1 -> d2, d2 -> rr
div dv ;; first division, rem -> d1, quot. -> d2
xchg rr, d2 ;; m.s.quotient -> rr, l.s. part of dividend -> d2
down: div dv
xchg rr, d1
endm
; examples of smart divide macro use
start: mov dx, 0abch
mov ax, 1234h
mov bx, 45abh
smdiv dx,ax,bx,cx
mov bx, 0dabch ; a hex constant must begin with a digit
mov ax, 2345h
mov cl, 35
smdiv ah,al, cl, bl
int 01
code ends
end start
DEBUG OPERATIONS
-u 0 33
13D5:0000 BABC0A MOV DX,0ABC

13D5:0003 B83412 MOV AX,1234
13D5:0006 BBAB45 MOV BX,45AB
13D5:0009 2BC9 SUB CX,CX
13D5:000B 3BD3 CMP DX,BX
13D5:000D 7206 JB 0015
13D5:000F 87CA XCHG CX,DX
13D5:0011 91 XCHG CX,AX ; 1st expansion of smdiv
13D5:0012 F7F3 DIV BX
13D5:0014 91 XCHG CX,AX
13D5:0015 F7F3 DIV BX
13D5:0017 87CA XCHG CX,DX
13D5:0019 BBBCDA MOV BX,DABC; this data will be destroyed by smdiv
13D5:001C B84523 MOV AX,2345
13D5:001F B123 MOV CL,23
13D5:0021 2ADB SUB BL,BL
13D5:0023 3AE1 CMP AH,CL
13D5:0025 7208 JB 002F
13D5:0027 86DC XCHG BL,AH
13D5:0029 86D8 XCHG BL,AL ; 2nd expansion of smdiv
13D5:002B F6F1 DIV CL
13D5:002D 86D8 XCHG BL,AL
13D5:002F F6F1 DIV CL
13D5:0031 86DC XCHG BL,AH
13D5:0033 CD01 INT 01
-r

13D5:0000 BABC0A MOV DX,0ABC
-t 14
AX=0000 BX=0000 CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000

13D5:0003 B83412 MOV AX,1234
AX=1234 BX=0000 CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000

13D5:0006 BBAB45 MOV BX,45AB
AX=1234 BX=45AB CX=0035 DX=0ABC SP=0000 BP=0000 SI=0000 DI=0000

13D5:0009 2BC9 SUB CX,CX ; 1st smdiv

13D5:000B 3BD3 CMP DX,BX

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000D NV UP EI NG NZ NA PE CY
13D5:000D 7206 JB 0015

13D5:0015 F7F3 DIV BX
AX=2771 BX=45AB CX=0000 DX=44B9 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0017 87CA XCHG CX,DX
AX=2771 BX=45AB CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0019 BBBCDA MOV BX,DABC
; result: quotient 00002771 hex, remainder 44B9 hex.
AX=2771 BX=DABC CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001C NV UP EI NG NZ NA PE CY
13D5:001C B84523 MOV AX,2345
AX=2345 BX=DABC CX=44B9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=001F NV UP EI NG NZ NA PE CY
13D5:001F B123 MOV CL,23
AX=2345 BX=DABC CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0021 2ADB SUB BL,BL ; 2nd smdiv
AX=2345 BX=DA00 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
13D5:0023 3AE1 CMP AH,CL

13D5:0025 7208 JB 002F


13D5:0029 86D8 XCHG BL,AL

13D5:002B F6F1 DIV CL

13D5:002D 86D8 XCHG BL,AL

13D5:002F F6F1 DIV CL


13D5:0033 CD01 INT 01
; result: quotient 0101 hex, remainder 22 hex.
-q
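The two-step division performed by the macro can be verified with a short model. The following Python sketch is an illustration only; the function name smdiv is borrowed from the macro, while the bits argument (16 for word division, 8 for byte division) is an assumption of this sketch and not a parameter of the macro.

```python
def smdiv(dividend, divisor, bits=16):
    """Divide a 2*bits-wide dividend by a bits-wide divisor using DIV-sized steps.

    Returns (double size quotient, single size remainder).
    """
    mask = (1 << bits) - 1
    assert 0 < divisor <= mask and 0 <= dividend < (1 << (2 * bits))
    hi, lo = dividend >> bits, dividend & mask
    q_hi, rem = 0, hi
    if hi >= divisor:                        # JB not taken: two divisions are needed
        q_hi, rem = divmod(hi, divisor)      # first DIV, on 0:hi
    q_lo, rem = divmod((rem << bits) | lo, divisor)   # second DIV, on rem:lo
    assert q_lo <= mask                      # cannot overflow, because rem < divisor
    return (q_hi << bits) | q_lo, rem


print([hex(v) for v in smdiv(0x0ABC1234, 0x45AB)])       # word division from the trace
print([hex(v) for v in smdiv(0x2345, 0x23, bits=8)])     # byte division from the trace
```

Both calls reproduce the results noted in the DEBUG session: quotient 2771 hex with remainder 44B9 hex, and quotient 0101 hex with remainder 22 hex.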
Exercise: Multiplication of two byte size data need not always produce a word
size result. For example, multiplication of the byte 32 hex by the byte 05 hex produces
a byte size result, 0FA hex. Write a macro which returns the product, if it is a single byte,
only in register AL, without altering the register AH, and indicates the fact by returning a
cleared carry flag. If the result is word size, the register AH is altered to give
the complete word result, and the fact is indicated by a set carry flag. (The carry flag
anyway indicates whether the result is a byte or a word. The idea of this macro is to save
the AH register if it is not used by the result.) The macro may not really be very useful; it
is given only as an academic exercise. A solution to the problem is given below:
THE PROGRAM
code segment
assume cs: code
smmul macro mr ; mr is the multiplier register
local lbl
push cx
push ax
mul mr
pop cx ; note the manipulations in this and the next 2 instrns.
jc lbl ; carry is set by mul instn, if the result is a full word.
mov ah,ch
lbl: pop cx
endm
start: mov ax, 5632h
mov cl, 05
smmul cl
smmul cl
int 01
code ends
end start
DEBUGGING
-u 0 19
13D5:0000 B83256 MOV AX,5632

13D5:0003 B105 MOV CL,05
13D5:0005 51 PUSH CX
13D5:0006 50 PUSH AX
13D5:0007 F6E1 MUL CL
13D5:0009 59 POP CX
13D5:000A 7202 JB 000E
13D5:000C 8AE5 MOV AH,CH
13D5:000E 59 POP CX
13D5:000F 51 PUSH CX
13D5:0010 50 PUSH AX
13D5:0011 F6E1 MUL CL
13D5:0013 59 POP CX
13D5:0014 7202 JB 0018
13D5:0016 8AE5 MOV AH,CH
13D5:0018 59 POP CX
13D5:0019 CD01 INT 01
-r
AX=0000 BX=0000 CX=001B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0000 B83256 MOV AX,5632
-t2
AX=5632 BX=0000 CX=001B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0003 B105 MOV CL,05

13D5:0005 51 PUSH CX
-g f
AX=56FA BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=000F NV UP EI PL NZ NA PO NC
13D5:000F 51 PUSH CX
;32 * 05 = FA. This is in AL; AH returned unchanged; carry is clear.
-g 19
AX=04E2 BX=0000 CX=0005 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0019 OV UP EI PL NZ NA PO CY
13D5:0019 CD01 INT 01
; FA * 05 = 4E2; this is in AH:AL; AH is occupied by the result; carry set.
-q
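The behaviour expected of the macro can be stated compactly as a model. This Python sketch is only an illustration of the specification given in the exercise; it returns the new AX value together with the carry flag, and the function name smmul is borrowed from the macro.

```python
def smmul(ax, mr):
    """Model of smmul: multiply AL by the byte mr, keeping AH if the product fits in a byte."""
    al, ah = ax & 0xFF, (ax >> 8) & 0xFF
    product = al * (mr & 0xFF)
    carry = product > 0xFF             # MUL sets the carry when AH is needed for the result
    if not carry:
        product |= ah << 8             # byte result: the original AH is kept
    return product, carry


print(smmul(0x5632, 0x05))   # first invocation in the program
print(smmul(0x56FA, 0x05))   # second invocation in the program
```

The two calls correspond to the two invocations in the program above and give 56FA hex with carry clear, then 04E2 hex with carry set.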
A macro to take in 4 BCD digits input from the keyboard, using the DOS
interrupt 21h, function 1: Many times we need to take a 4-digit BCD number
from the keyboard. Here is a simple macro to do the job. It takes just 4 BCD
digits from the keyboard, ignoring the non-BCD keys. It can be improved by
making the “$” key a terminating key, so that numbers of fewer than 4
digits can be entered, and so that, if we make a mistake in the entry, we can
re-enter all 4 keys to get the correct number.
bcd4 macro reg ; register in which the number is to be returned.
local again
xor bx, bx
mov cx, 0404h
mov ah, 1
again: int 21h
cmp al, 30h
jb again
cmp al, 39h
ja again
sub al, 30h
shl bx, cl
add bl, al
dec ch
jnz again
mov reg, bx
endm
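The filtering and packing done by the macro can be tried out without a keyboard by feeding it a string of key codes. The Python sketch below is a model under that assumption: the string argument stands in for the successive characters returned by the DOS call, and the error raised when the input runs out is an addition of this sketch, since the macro itself simply keeps waiting for keys.

```python
def bcd4(keys):
    """Pack the first four digit characters of 'keys' into a 16-bit packed BCD number."""
    bx, remaining = 0, 4                  # BX accumulates; CH counts the digits left
    for ch in keys:
        if not ("0" <= ch <= "9"):        # CMP AL,30h / CMP AL,39h: ignore other keys
            continue
        bx = ((bx << 4) & 0xFFFF) + (ord(ch) - 0x30)   # SHL BX,CL then ADD BL,AL
        remaining -= 1
        if remaining == 0:
            return bx
    raise ValueError("fewer than four digit keys supplied")


print(hex(bcd4("1a2b3c4d")))
```

Here the letters are skipped and the digits 1, 2, 3, 4 are packed into 1234 hex.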
7. Power of the macros to realize variable operations: The following example
illustrates how a single macro can achieve either addition or subtraction of two items of
data, each made up of a large number of words. The trick is in using the opcode also as a
parameter of the macro. The program takes 2 large numbers of identical word lengths, does
either addition or subtraction of the two multi-word data, and stores the
result at a third place. If the two numbers are n words in
size, the result space must be n+1 words in size.
The .ASM program
data segment
dat dw 234h, 5678h, 89abh, 7604h, 0abc0h, 3 dup(0)
daat dw 0abcdh, 2348h, 253h, 4589h, 0fb23h, 3 dup (0)
dat1 dw 8 dup(?)
dat2 dw 8 dup(?)
num dw 5
data ends
;
code segment
assume cs: code, ds: data, es: data
multas macro src1, src2, res, n, as
local bak
mov si, offset src1
mov di, offset res
mov bx, offset src2 - src1-2;; see the comment on statement below.
mov cx, n ;;
cld ;;
clc ;;
bak: lodsw ;;
as ax, [si + bx] ;; note the manipulations here
stosw
loop bak
as cx, cx ;; note, cx=0 here
mov [di], cx
endm
;
strt: mov ax, data
mov ds, ax
mov es, ax
multas dat, daat, dat1, num, adc
multas dat, daat, dat2, num, sbb
int 1
code ends
end strt
-u 0 3b
13E1:0000 B8DC13 MOV AX,13DC

13E1:0003 8ED8 MOV DS,AX
13E1:0005 8EC0 MOV ES,AX ; from here only the two macros
13E1:0007 BE0000 MOV SI,0000
13E1:000A BF2000 MOV DI,0020
13E1:000D BB0E00 MOV BX,000E
13E1:0010 8B0E4000 MOV CX,[0040]
13E1:0014 FC CLD
13E1:0015 F8 CLC
13E1:0016 AD LODSW
13E1:0017 1300 ADC AX,[BX+SI]
13E1:0019 AB STOSW
13E1:001A E2FA LOOP 0016
13E1:001C 13C9 ADC CX,CX
13E1:001E 890D MOV [DI],CX
13E1:0020 BE0000 MOV SI,0000
13E1:0023 BF3000 MOV DI,0030
13E1:0026 BB0E00 MOV BX,000E
13E1:0029 8B0E4000 MOV CX,[0040]
13E1:002D FC CLD
13E1:002E F8 CLC
13E1:002F AD LODSW
13E1:0030 1B00 SBB AX,[BX+SI]
13E1:0032 AB STOSW
13E1:0033 E2FA LOOP 002F
13E1:0035 1BC9 SBB CX,CX
13E1:0037 890D MOV [DI],CX
13E1:0039 CD01 INT 01
13E1:003B 83EC08 SUB SP,+08
-g
AX=B09D BX=000E CX=FFFF DX=0000 SP=0000 BP=0000 SI=000A DI=003A
DS=13DC ES=13DC SS=13DC CS=13E1 IP=003B NV UP EI NG NZ AC PE CY
13E1:003B 83EC08 SUB SP,+08
-d0 4f
13DC:0000 34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00 4.xV...v........
13DC:0010 CD AB 48 23 53 02 89 45-23 FB 00 00 00 00 00 00 ..H#S..E#.......
13DC:0020 01 AE C0 79 FE 8B 8D BB-E3 A6 01 00 00 00 00 00 ...y............
13DC:0030 67 56 2F 33 58 87 7B 30-9D B0 FF FF 00 00 00 00 gV/3X.{0........
13DC:0040 05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
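The same opcode-as-parameter trick is not limited to ADC and SBB. As a minimal sketch (the macro name logop, its parameter names and the label nxt are hypothetical and not part of the program above), a word-by-word logical operation between two arrays can be written once and invoked with and, or or xor. It assumes DS points to the data segment, as in the program above, and it overwrites the destination array in place.

```asm
logop macro dst, src, n, opc    ;; hypothetical macro: dst[i] = dst[i] opc src[i]
        local nxt
        mov si, offset src
        mov di, offset dst
        mov cx, n               ;; n is a word variable holding the word count
nxt:    mov ax, [si]            ;; fetch one source word
        opc [di], ax            ;; the opcode itself comes from the parameter
        add si, 2
        add di, 2
        loop nxt
        endm
;
; possible invocations, placed in the code segment after the multas calls:
        logop dat1, dat2, num, xor
        logop dat1, dat2, num, and
```

Since CLC is not needed for logical operations, the macro is shorter than multas; the design point is only that the assembler substitutes the opcode text exactly as it substitutes an operand.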
8. Here is a beautiful example of the use of macros for realizing hardware
functions and using them in an easily understandable way: The example below
uses a matrix keyboard as shown in the circuit, with a flow chart for a simple key
identification subroutine and a program that follows the flow chart. It should be
understood that different methods are possible for interpreting the keys: some permit
only one key press at a time, and some permit several keys to be pressed together. The
program given is for the simple interpretation of one key pressed at a time.
A program to identify the key pressed, written for the above hardware and
following the flowchart given. Note how the use of macros simplifies the programming.
; this subroutine keyid below, does the following:1. wait for clearing of all
; previous keys; 2. wait for a new key press. 3. Give a debounce delay. 4. if
; the key is still remaining pressed, then identify and return with the key
; number in reg AH. The procedure assumes a hardware circuit as shown in the
; figure above, and a flow chart, also as shown above.
;
assume cs:code
code segment
; First we start with macro definitions
anykey macro
mov al, 00 ;; all rows to be 0's
out dx, al ;; dx has address of port A of 8255
add dx,2
in al, dx ;; column in, through port B
sub dx,2
and al, 0Fh
cmp al, 0Fh
endm
;;
rows macro patn, num
mov ah, num
mov al, patn
out dx, al
add dx,2
in al, dx
sub dx,2
and al, 0Fh
cmp al, 0Fh
jnz colchk ;;key pressed in the row, so check columns
endm
;;
cols macro ;;find the column of the pressed key, and return.
local back
mov cx, 4
back: ror al,1
jnc found
dec ah
loop back
jmp err
endm
;;
;; debounce macro is a standard delay macro using cx as counter
;; (20n – 8) clocks of delay will be produced normally by this macro
debounce macro n
local lup
mov cx, n
lup: nop
loop lup
endm
strt:call keyfind
int 01
;
keyfind proc near
start: anykey
jnz start ; some key pressed, so wait for its release, else
; all previous keys are cleared, now look for fresh key.
again: anykey
jz again ; if no, try again. Else debounce
debounce 5000 ; will produce about 20msec delay on a 5 MHz clock.
rows 0Eh, 3
rows 0Dh, 7
rows 0Bh, 0Bh
rows 07, 0Fh
err: stc ; no row has a key pressed; indicate the error through the CY flag.
found: ret
colchk: cols ; find the column having the key pressed & return
keyfind endp
code ends
end strt
The List file for the above program as obtained from the assembler MASM: note
how the macros are expanded.
;KEYID routine
; this subroutine keyid below, does the following:1. wait for clearing of all
; previous keys; 2. wait for a new key press. 3. Give a debounce delay. 4. if
; the key is still remaining pressed, then identify and return with the key
; number in reg AH. The procedure assumes a hardware circuit as shown in the
; figure above, and a flow chart, also as shown above.
;
assume cs:code
0000 code segment
; First we start with macro definitions
anykey macro
mov al, 00 ;; all rows to be 0's
out dx, al ;; dx has address of port A of 8255
add dx,2
in al, dx ;; column in, through port B
sub dx,2
and al, 0Fh
cmp al, 0Fh
endm
;;
rows macro patn, num
mov ah, num
mov al, patn
out dx, al
add dx,2
in al, dx
sub dx,2
and al, 0Fh
cmp al, 0Fh
jnz colchk;;key pressed in the row, so check
;;columns
endm
;;
cols macro
local back
mov cx, 4
back: ror al,1
jnc found
dec ah
loop back
jmp err
endm
;;
;; debounce macro is a standard delay macro using cx as counter
;; now the subroutine using these macros
;
debounce macro n
local lup
mov cx, n
lup: nop
loop lup
endm
0000 E8 0005 R strt:call keyfind
0003 CD 01 int 01
;
0005 keyfind proc near
0005 start: anykey
0005 B0 00 1 mov al, 00 ;
0007 EE 1 out dx, al ;
0008 83 C2 02 1 add dx,2
000B EC 1 in al, dx ;
000C 83 EA 02 1 sub dx,2
000F 24 0F 1 and al, 0Fh
0011 3C 0F 1 cmp al, 0Fh
0013 75 F0 jnz start
0015 again: anykey
0015 B0 00 1 mov al, 00 ;
0017 EE 1 out dx, al ;
0018 83 C2 02 1 add dx,2
001B EC 1 in al, dx ;
001C 83 EA 02 1 sub dx,2
001F 24 0F 1 and al, 0Fh
0021 3C 0F 1 cmp al, 0Fh
0023 74 F0 jz again
debounce 5000
0025 B9 1388 1 mov cx, 5000
0028 90 1 ??0000: nop
0029 E2 FD 1 loop ??0000
rows 0Eh, 3
002B B4 03 1 mov ah, 3
002D B0 0E 1 mov al, 0Eh
002F EE 1 out dx, al
0030 83 C2 02 1 add dx,2
0033 EC 1 in al, dx
0034 83 EA 02 1 sub dx,2
0037 24 0F 1 and al, 0Fh
0039 3C 0F 1 cmp al, 0Fh
003B 75 38 1 jnz colchk ;
rows 0Dh, 7
003D B4 07 1 mov ah, 7
003F B0 0D 1 mov al, 0Dh
0041 EE 1 out dx, al
0042 83 C2 02 1 add dx,2
0045 EC 1 in al, dx
0046 83 EA 02 1 sub dx,2
0049 24 0F 1 and al, 0Fh
004B 3C 0F 1 cmp al, 0Fh
004D 75 26 1 jnz colchk ;
rows 0Bh, 0Bh
004F B4 0B 1 mov ah, 0Bh
0051 B0 0B 1 mov al, 0Bh
0053 EE 1 out dx, al
0054 83 C2 02 1 add dx,2
0057 EC 1 in al, dx
0058 83 EA 02 1 sub dx,2
005B 24 0F 1 and al, 0Fh
005D 3C 0F 1 cmp al, 0Fh
005F 75 14 1 jnz colchk ;
rows 07, 0Fh
0061 B4 0F 1 mov ah, 0Fh
0063 B0 07 1 mov al, 07
0065 EE 1 out dx, al
0066 83 C2 02 1 add dx,2
0069 EC 1 in al, dx
006A 83 EA 02 1 sub dx,2
006D 24 0F 1 and al, 0Fh
006F 3C 0F 1 cmp al, 0Fh
0071 75 02 1 jnz colchk ;
0073 F9 err: stc
0074 C3 found: ret
0075 colchk: cols
0075 B9 0004 1 mov cx, 4
0078 D0 C8 1 ??0001: ror al,1
007A 73 F8 1 jnc found
007C FE CC 1 dec ah
007E E2 F8 1 loop ??0001
0080 EB F1 1 jmp err
0082 keyfind endp
0082 code ends
end strt
Macros:
N a m e Lines
ANYKEY . . . . . . . . . . . . . 7
COLS . . . . . . . . . . . . . . 6
DEBOUNCE . . . . . . . . . . . . 3
ROWS . . . . . . . . . . . . . . 9
CODE . . . . . . . . . . . . . . 0082 PARA NONE
Symbols:
AGAIN . . . . . . . . . . . . . L NEAR 0015 CODE
COLCHK . . . . . . . . . . . . . L NEAR 0075 CODE
ERR . . . . . . . . . . . . . . L NEAR 0073 CODE
FOUND . . . . . . . . . . . . . L NEAR 0074 CODE
KEYFIND . . . . . . . . . . . . N PROC 0005 CODE Length = 007D
START . . . . . . . . . . . . . L NEAR 0005 CODE
STRT . . . . . . . . . . . . . . L NEAR 0000 CODE
??0000 . . . . . . . . . . . . . L NEAR 0028 CODE
??0001 . . . . . . . . . . . . . L NEAR 0078 CODE
@CPU . . . . . . . . . . . . . . TEXT 0101h
@FILENAME . . . . . . . . . . . TEXT keyid
@VERSION . . . . . . . . . . . . TEXT 510
74 Source Lines
133 Total Lines
19 Symbols
0 Warning Errors
0 Severe Errors
A note on the key debounce operation: The debounce delay ensures that a key
being released is not seen as a repeated key press. When a key is released, the contacts
bounce: they open briefly and close again briefly several times before finally breaking
off. This would appear as the same key being pressed a few times. The illusion is
avoided by accepting a key only if it is still detected after a delay long enough for the
vibrations to cease; a key that survives the delay is a new key and not a key bounce. In
a similar way, a key being pressed would also appear as multiple presses of the same
key, which is avoided if we consider the key only after the debounce delay. Look at the
flowchart from this point of view.
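The idea of accepting a key only if it is still present after the delay can be sketched with the anykey and debounce macros defined earlier. The procedure name waitkey and the label wk1 are hypothetical; DX is assumed to hold the address of port A of the 8255, as in keyfind, and AL and CX are destroyed.

```asm
waitkey proc near
wk1:    anykey          ; ZF = 1 when no key is down (all columns read as 1)
        jz   wk1        ; keep polling until some key is seen
        debounce 5000   ; about 20 msec on a 5 MHz clock, as in keyfind
        anykey          ; look again after the contacts have settled
        jz   wk1        ; key vanished during the delay: treat it as a bounce
        ret             ; key still down: accept it as a genuine press
waitkey endp
```

This is only a sketch of the debounce step on its own; the keyfind procedure gets the same effect because a key that has disappeared after the delay fails all four rows tests and returns with CY set.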
ALP Procedures or Subroutines: The features of procedures as compared
to those of macros have already been discussed in connection with the discussion of
macros. We will now make this comparison clearer by doing the smart divide as a
subroutine. This procedure is given in the book on microprocessors by Douglas V.
Hall; I am giving it here as a near procedure (that is, a procedure in the same code
segment) with just a small modification. A procedure can be called several times
without repeating its sequence of instructions in the ALP or in the machine language
(executable) version of the program.
It is always a good practice to indicate the following, by way of comments, at the
beginning of the procedure: (i) which registers, stack locations or memory locations
should hold the input variables at the time of calling the procedure; (ii) where the output
variables are located when the procedure returns; (iii) which registers are used by the
procedure and have their contents destroyed; and, more importantly, (iv) what the
procedure actually does. When this information is provided, the user can call the
procedure confidently, saving the contents of any register that the procedure destroys
before the call and restoring the saved data to the appropriate place after the return.
So, here is an example of a main program calling a procedure to perform a smart word
division.
THE PROGRAM
data segment
dta dw 4567h, 0abcdh, 789ah, 1234h, 5678h, 89abh
rlt dw 6 dup (?)
data ends
; the first 2 words of dta are the dividend words, while the third
; word is the divisor. The 4th and 5th words form the dividend for the next
; trial and the sixth is the next divisor
; the first two words of rlt are for the first quotient and the next
; is the remainder. Similarly, fourth, fifth are quotient words of
; second division and the sixth is the remainder.
code segment
assume cs:code, ds:data, es:data
; we are using macros here to make the data loading and storing simpler
movdata macro; load data into registers
lodsw
mov bx,ax
lodsw
mov dx,ax
lodsw
xchg ax,bx
endm
strlt macro ; store result in memory
stosw
mov ax,dx
stosw
mov ax,cx
stosw
endm
start: mov ax, data ; main program starts from here
mov ds, ax
mov es, ax
mov si, offset dta
cld ; for string operations
lea di, rlt
movdata
call smart_div
strlt
movdata
call smart_div
strlt
int 01
smart_div proc near
; the procedure takes the dividend from DX:AX, and divisor from BX, returns
; the quotient in DX:AX and the remainder in CX, the divisor is returned
; unaltered. CX is used along with DX and AX. All other registers are
; returned unaltered. Note the procedure provides no flexibility in the use
; of registers. Moreover, this procedure is not useful for byte division.
sub cx, cx
cmp dx, bx
jb down
xchg ax, cx
xchg ax, dx
div bx
xchg ax, cx
down: div bx
xchg dx, cx
ret
;
smart_div endp
code ends
end start
TESTING IN DEBUG
-u 0 44
13D7:0000 B8D513 MOV AX,13D5
13D7:0003 8ED8 MOV DS,AX
13D7:0005 8EC0 MOV ES,AX
13D7:0007 BE0000 MOV SI,0000
13D7:000A FC CLD
13D7:000B 8D3E0C00 LEA DI,[000C]
13D7:000F AD LODSW
13D7:0010 8BD8 MOV BX,AX
13D7:0012 AD LODSW
13D7:0013 8BD0 MOV DX,AX ; load input
13D7:0015 AD LODSW
13D7:0016 93 XCHG BX,AX
13D7:0017 E81B00 CALL 0035
13D7:001A AB STOSW
13D7:001B 8BC2 MOV AX,DX
13D7:001D AB STOSW ; store result
13D7:001E 8BC1 MOV AX,CX
13D7:0020 AB STOSW
13D7:0021 AD LODSW
13D7:0022 8BD8 MOV BX,AX
13D7:0024 AD LODSW
13D7:0025 8BD0 MOV DX,AX ; load input
13D7:0027 AD LODSW
13D7:0028 93 XCHG BX,AX
13D7:0029 E80900 CALL 0035
13D7:002C AB STOSW
13D7:002D 8BC2 MOV AX,DX
13D7:002F AB STOSW ; store result
13D7:0030 8BC1 MOV AX,CX
13D7:0032 AB STOSW
13D7:0033 CD01 INT 01
13D7:0035 2BC9 SUB CX,CX
13D7:0037 3BD3 CMP DX,BX
13D7:0039 7205 JB 0040
13D7:003B 91 XCHG CX,AX
13D7:003C 92 XCHG DX,AX
13D7:003D F7F3 DIV BX
13D7:003F 91 XCHG CX,AX
13D7:0040 F7F3 DIV BX
13D7:0042 87D1 XCHG DX,CX
13D7:0044 C3 RET
-g 17
AX=4567 BX=789A CX=0065 DX=ABCD SP=0000 BP=0000 SI=0006 DI=000C

13D7:0017 E81B00 CALL 0035 ; before call
-d 0 17
13D5:0000 67 45 CD AB 9A 78 34 12-78 56 AB 89 00 00 00 00 gE...x4.xV......
13D5:0010 00 00 00 00 00 00 00 00 ;data input in memory ........
-g 1a
AX=6CAE BX=789A CX=54BB DX=0001 SP=0000 BP=0000 SI=0006 DI=000C
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=001A OV UP EI PL NZ NA PE NC
13D7:001A AB STOSW ; after return
-g 29
AX=1234 BX=89AB CX=54BB DX=5678 SP=0000 BP=0000 SI=000C DI=0012
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0029 OV UP EI PL NZ NA PE NC
13D7:0029 E80900 CALL 0035 ; before call
-g 2c
AX=A0CB BX=89AB CX=079B DX=0000 SP=0000 BP=0000 SI=000C DI=0012
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=002C OV UP EI NG NZ AC PO CY
13D7:002C AB STOSW ; after return
-g 33
AX=079B BX=89AB CX=079B DX=0000 SP=0000 BP=0000 SI=000C DI=0018
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0033 OV UP EI NG NZ AC PO CY
13D7:0033 CD01 INT 01
-d 0 17
13D5:0000 67 45 CD AB 9A 78 34 12-78 56 AB 89 AE 6C 01 00 gE...x4.xV...l..
13D5:0010 BB 54 CB A0 00 00 9B 07 ; input & output data. .T......
-q
If the smart divide procedure is compared with the corresponding smart divide
macro, one can easily recognize the flexibility provided by the macro. A single macro
could undertake either byte or word division, but the procedure given above cannot be
used for byte division; a separate procedure has to be written for smart byte division.
Not only that, a macro can take the divisor from any register, simply by invoking the
macro with that register as the parameter, whereas a procedure has no such flexibility.
Only one register, BX in the above program, can hold the divisor, and only one register,
CX, can be used to return the remainder.
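To make the point concrete, here is a minimal sketch of what the separate byte version might look like. It is not from the original text: the name smart_divb, the label dn and the choice of registers are assumptions, made to mirror smart_div one size down. The header comments follow the four-point convention recommended earlier.

```asm
smart_divb proc near
; (i)   input:  dividend in AX, divisor in BL (must be non-zero)
; (ii)  output: 16-bit quotient in AX, remainder in CL
; (iii) destroys: CX (CH is returned as 0); BL is returned unaltered
; (iv)  divides a word by a byte without divide overflow, by dividing
;       the high byte first when AH is not below the divisor
        sub  cx, cx     ; CX = 0
        cmp  ah, bl
        jb   dn         ; quotient fits in a byte: one division is enough
        xchg al, cl     ; CL = low byte of dividend, AL = 0
        xchg al, ah     ; AL = high byte of dividend, AH = 0
        div  bl         ; AL = high byte of quotient, AH = partial remainder
        xchg al, cl     ; AL = low byte of dividend, CL = high quotient byte
dn:     div  bl         ; AL = low byte of quotient, AH = remainder
        xchg ah, cl     ; AH = high quotient byte (or 0), CL = remainder
        ret
smart_divb endp
```

The structure is the same as in smart_div with AH, AL, CL and BL playing the parts of DX, AX, CX and BX, which is exactly the duplication that the macro version avoids.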
Passing Parameters to Subroutines: How are the parameters of a subroutine
specified? A macro is invoked with its parameters stated explicitly. A procedure is
simply called, without any argument being specified; all its parameters or arguments
are implied. If the parameters are not many in number, they can be passed through
specific registers. But if a relatively large number of parameters, say five or more, are
to be sent to the procedure, the registers may be needed for manipulations and
computations in the subroutine, and the parameters will have to be stored in memory,
possibly as an array. Parameters may have to be handled in a random order in the
procedure, so plain push and pop access to the stack, which is a last-in first-out array,
does not suit them. Parameters can be stored in memory as an array whose starting
address is specified in an address register. It is then possible to retrieve any parameter
from this array, any number of times, by indexed addressing. On return from the
subroutine, these memory locations should be made free for other uses by the program.
Recursive and re-entrant programs may pose further difficulty in parameter passing.
A recursive program is one which calls itself. A simple example is a program
calculating the factorial of a number. The recursive equation n! = n*(n – 1)! can be
used with the terminating condition 1! = 1. So the procedure for calculating n! first
checks if n = 1. If so, it returns with the value 1 for the factorial; otherwise it stores n in
memory, decrements n and calls itself again. The process continues till n is reduced to
1, when the value 1 is returned; this is used to calculate 2!, from which 3! is computed,
and so on, till the value of n! is computed and returned.
A re-entrant program is one which starts executing for one set of parameters and, part
way through, is called again to repeat the computation for another set of parameters.
This sort of requirement may come about as follows. Consider a floating point add
subroutine being executed by a program. In the middle of this execution, the processor
is interrupted by some system hardware. As per the interrupt handling operations, the
interrupt service routine now starts executing, leaving the on-going floating point ADD
routine in a suspended condition. If the interrupt service routine also requires a floating
point ADD operation, it calls the same procedure, but with a different set of parameters,
as per the requirements of the interrupt service process. The floating point ADD routine
is now said to have been re-entered with different parameters. On return from the
interrupt service, the suspended floating point ADD routine should resume from where
it left off, which means that its parameters should not have been disturbed by the
re-entered procedure. Many system programs are required to be re-entrant.
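The memory-array method described above can be sketched as follows. This is an illustration only: the block parm, the procedure calc and the arithmetic it performs are hypothetical. BX carries the starting address of the array, and the procedure picks up the parameters out of order by indexed addressing.

```asm
data segment
parm  dw 1234h, 0abcdh, 0020h, ?   ; hypothetical block: a, b, c and a result word
data ends
;
code segment
assume cs:code, ds:data
strt:   mov  ax, data
        mov  ds, ax
        mov  bx, offset parm   ; BX = starting address of the parameter array
        call calc
        int  1
calc proc near
; input:  BX points to the words a, b, c; output: a - c + b stored at [BX+6]
; destroys: AX
        mov  ax, [bx]          ; a
        sub  ax, [bx+4]        ; c, taken out of order
        add  ax, [bx+2]        ; b
        mov  [bx+6], ax        ; result goes back into the same block
        ret
calc endp
code ends
end strt
```

Because the block sits at a fixed address, this form suits ordinary calls only; a recursive or re-entrant call would need a fresh block for every invocation.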
Both recursion and re-entrance require different, non-overlapping locations for the
parameters every time the procedure is called. There are several ways of meeting
these requirements of random access to the parameters and a non-overlapping region of
memory for every new invocation of the routine. It may sometimes be possible to avoid
overlapping of parameters by suitably adjusting the recursive equation, as the first
example below shows; or, if required, the stack can be used for temporarily storing the
parameters, as the second example below shows.
EXAMPLE 1; computing n! without using the stack for parameter store
A RECURSIVE PROGRAM FOR FACTORIAL; PASSING PARAMETERS IN REGISTERS.
The program below uses the recursive equation, n! = n*(n-1)!, with the
terminating condition defined for 1! = 0! = 1. The parameters passed are: the
value of n in BX, and the identity element for multiplication, namely 1, is
stored in register AX as a parameter to be passed to the subroutine.
code segment
assume cs:code
strt: mov ax, 1 ;
mov bx, 8 ; parameters to be passed in ax and bx; n = 8
call fa
int 1
fa proc near
cmp bx,1
jna return
mul bx ; multiplication is done first, so no need to store n
dec bx
call fa
return: ret
fa endp
code ends
end strt
-u 0 16
13DC:0000 B80100 MOV AX,0001
13DC:0003 BB0800 MOV BX,0008
13DC:0006 E80200 CALL 000B
13DC:0009 CD01 INT 01
13DC:000B 83FB01 CMP BX,+01
13DC:000E 7606 JBE 0016
13DC:0010 F7E3 MUL BX
13DC:0012 4B DEC BX
13DC:0013 E8F5FF CALL 000B
13DC:0016 C3 RET
-r

13DC:0000 B80100 MOV AX,0001
-g

DS=13CC ES=13CC SS=13DC CS=13DC IP=000B NV UP EI PL ZR NA PE NC
13DC:000B 83FB01 CMP BX,+01
-q
EXAMPLE 2; using the stack for temporary parameter store
; the program below uses the recursion slightly differently,
; still passing parameters through registers. In this program
; the input parameter n is passed through reg BX; output in ax.
; registers used: ax, bx and dx (for word multiplication)
; this program uses the recursive equation n! = (n-1)!*n.
; the terminating condition is 1! = 0! = 1; there is no
; need to pass the value 1 in a register as in the earlier
; program. However, this program needs the stack
; to hold the value of the parameter n across the call.
; Reason out why.
assume cs:code
code segment
strt:call fac
int 1
jmp strt
fac proc near
mov ax, 1
cmp ax, bx
jae return
push bx ;store n temporarily (till ‘ret’ from next ‘call’)
dec bx
call fac
pop bx ;retrieve the stored n for multiplication
mul bx ;multiply by n.
return: ret
fac endp
code ends
end strt
-u 0 16
13DB:0000 E80400 CALL 0007
13DB:0003 CD01 INT 01
13DB:0005 EBF9 JMP 0000
13DB:0007 B80100 MOV AX,0001
13DB:000A 3BC3 CMP AX,BX
13DB:000C 7308 JNB 0016
13DB:000E 53 PUSH BX
13DB:000F 4B DEC BX
13DB:0010 E8F4FF CALL 0007
13DB:0013 5B POP BX
13DB:0014 F7E3 MUL BX
13DB:0016 C3 RET
-r

DS=13CB ES=13CB SS=13DB CS=13DB IP=0000 NV UP EI PL NZ NA PO NC
13DB:0000 E80400 CALL 0007
-rbx
BX 0000
:8
-g

DS=13CB ES=13CB SS=13DB CS=13DB IP=0005 NV UP EI PL ZR NA PE NC
13DB:0005 EBF9 JMP 0000
In Example 1 above, we can notice an interesting feature. In the subroutine
there, the instruction CALL FA is immediately followed by the instruction RET (with the
label RETURN). A CALL followed immediately by a RET can always be replaced by
a simple JMP instruction. You may reason it out and make sure that it is so. If we make
that change, the end of the FA subroutine becomes:
JMP FA
RETURN: RET
In this form it is no longer a recursive routine, and the working of the program
is shown below.
-u 0 15
13DC:0000 B80100 MOV AX,0001
13DC:0003 BB0800 MOV BX,0008
13DC:0006 E80200 CALL 000B
13DC:0009 CD01 INT 01
13DC:000B 83FB01 CMP BX,+01
13DC:000E 7605 JBE 0015
13DC:0010 F7E3 MUL BX
13DC:0012 4B DEC BX
13DC:0013 EBF6 JMP 000B
13DC:0015 C3 RET
-r
13DC:0000 B80100 MOV AX,0001
-g

DS=13CC ES=13CC SS=13DC CS=13DC IP=000B NV UP EI PL ZR NA PE NC
13DC:000B 83FB01 CMP BX,+01
We have seen above how parameters can be passed through registers, with the
stack used to keep them temporarily so that the parameters of one call are not erased by
those of the next. However, the most common and versatile method that can be used in
the 8086 processor for recursion is passing the parameters directly through the stack,
instead of using the stack only to store parameters inside the subroutine. This is the
standard method used by the C-compiler, for example, for any general subroutine
handling. The stack has a limitation, of course: push and pop do not allow the
parameters to be accessed randomly, as the operations of the program may require. To
make random access possible, a separate register, other than the stack pointer, is
provided. This is the BP or the base pointer. Addresses formed with BP default to the
stack segment, precisely so that a parameter array for a subroutine can be made in the
stack. The main or calling program pushes the parameters onto the stack; the subroutine
accesses the parameters randomly as required, using the BP register. On return, the
calling program can retrieve the results from the stack array through pop operations.
We make a separate structure (in the stack) called the stack frame, and put in it the
parameters to be passed, including space for the output desired from the subroutine. To
do this, we first provide space for the output variable, by subtracting a suitable number
from the stack pointer. Then we push the input variables. Having done this in the main
program, we call the subroutine. In the subroutine, the first thing we do is to push BP
onto the stack and copy the SP value into BP. BP now becomes the frame pointer; the
space starting from the return address, down to, and including, the output space in the
stack, will be the stack frame. The frame can be further expanded by providing space
for the local variables of the subroutine, which may have to be referred to a number of
times. This space is provided by subtracting an appropriate number from the stack
pointer. The space above this in the stack (that is, at lower addresses) is now available
for use in the subroutine as a regular stack. This separation of the stack space into
frame and stack gives a disciplined approach to the parameter passing problem of
subroutines.
According to this, the recursive subroutine for factorial will be as shown. Note that the
input and output parameters in the stack frame are referred to in the subroutine by
indexed addressing using BP with positive displacement, while the local variables are
referred to with negative displacement. The stack beyond the frame is available to the
subroutine to be used like an ordinary stack with the LIFO operation. While returning,
the procedure simply moves BP to SP, pops BP to get back the old BP, and then
executes ret n, where n is the number of input bytes to be discarded from the stack. (In
the program shown, a plain ret is used instead of ret n, and the call is followed by
add SP, 2 in the calling program, which is another way of doing it.) Now the output of
the subroutine can be simply popped off the stack in the main program. In effect, the
parameters are pushed in the calling program and recalled using BP relative addressing
in the called program; on return, the results are popped off in the main program. Based
on this philosophy, the recursive factorial program can be seen to be as follows:
Details of Passing Parameters to a Subroutine
Using Stack Arrays or Stack Frames.
Code segment
Assume cs:code
Main: Mov AX, n ; (Choose n in the range 0 to 8 only.)
Sub SP,2 ; make space for one word output (Factorial value)
Push AX ; input parameter to the stack.
Call Fact ; call the recursive routine.
Add SP,2 ; clear the stack of the input
; to undo the Push AX above
Pop AX ; get the output result in reg. AX.
Int 01 ; return control to DEBUG.
Fact proc near ; the recursive procedure here.
Push BP ; BP is the frame pointer (defaulting to Stack Segment)
; so, save its old value belonging to the calling
; process, and make space to have its new value.
Mov BP,SP ; BP is now the (new) frame pointer for the subroutine.
; no local variable required, so no Sub SP needed
; in this subroutine.
Push AX ; stack space of subroutine utilized to save Regs. used.
Push BX ;
Push DX ; save Regs. used.
Mov BX, [BP+4]; the input parameter, n, is at [BP+4]; see table 4.1
Mov AX,1 ; prepare for checking termination condition.
Cmp AX, BX ; check for termination.
Jae Term ; if AX is above or equal to BX, that is, if n = 0 or 1 go
; to terminate; the result is already in the register AX.
Dec BX ; else, prepare to recursively call Fact (n-1)
Sub SP,2 ; make space for output of the procedure for Fact (n-1).
Push BX ; input to the procedure to find Fact (n-1).
Call Fact ; recursive call.
Add SP,2 ; discard the input variable of the called procedure.
Pop AX ; get Fact (n-1) in AX, output of called procedure.
Mov BX,[BP+4] ; get the input variable to this procedure, n, in BX.
Mul BX ; the product n* Fact (n-1) goes to DX:AX; note, DX will
; be zero here as we are limiting the result to 16-bits.
; however, DX is overwritten by this instruction, and hence
; we need to save it across the procedure; result now in AX.
Term:Mov [BP+6], AX ; store the result from AX in the output space provided in
; the Stack Frame. Note, in case n is 1 or 0, we directly
; come here with the result 1, in AX.
Pop DX ; termination ritual starts from here, retrieve saved
; registers,
Pop BX ; from the stack,
Pop AX ; and clean the Procedure stack.
Mov SP, BP ; not really necessary here, as the stack is already clean.
; but it is a good practice to have this in the Procedure
; to make it doubly sure that the stack is cleaned.
Pop BP ; get back the original BP of the calling program
ret ; Procedure over, go back to the calling program.
fact endp
code ends
end main
The Intel 8086 instruction RET n, which adds the value n to the stack pointer
after popping the return address, is meant to be used in this situation. If we use Ret 2
instead of Ret in the above program, the instruction Add SP, 2 following the call
instruction, in both the main program and the subroutine procedure, can be eliminated.
The table below shows the stack frame for this program.
Table 4.1: Stack Frame after the 2nd Instruction of the Procedure
Memory Address Pointer    Memory Contents
New BP = SP               Old BP of calling program
New BP + 2                Return address of calling program (for a near call)
New BP + 4                Input to the called subroutine, n
New BP + 6                Space for output value of Fact
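The Ret 2 variant mentioned just before the table can be sketched as below. This is the factorial program from above with only that change applied; the line n equ 5 is an assumed test value added so that the sketch assembles on its own, and the comments mark the places where Add SP,2 has disappeared. The frame layout of Table 4.1 is unchanged.

```asm
code segment
assume cs:code
n equ 5                 ; assumed test value, to be kept in the range 0 to 8
main:   mov  ax, n
        sub  sp, 2      ; space for one word of output
        push ax         ; input parameter
        call fact
        pop  ax         ; RET 2 has already discarded the input word
        int  01
fact proc near
        push bp
        mov  bp, sp
        push ax
        push bx
        push dx
        mov  bx, [bp+4] ; input n
        mov  ax, 1
        cmp  ax, bx
        jae  term       ; n = 0 or 1: result 1 is already in AX
        dec  bx
        sub  sp, 2      ; output space for Fact (n-1)
        push bx         ; input for Fact (n-1)
        call fact
        pop  ax         ; Fact (n-1); no Add SP,2 needed here either
        mov  bx, [bp+4]
        mul  bx         ; AX = n * Fact (n-1)
term:   mov  [bp+6], ax ; store the result in the output space
        pop  dx
        pop  bx
        pop  ax
        mov  sp, bp
        pop  bp
        ret  2          ; return and drop the 2 bytes of input
fact endp
code ends
end main
```

With the assumed value n = 5, the word popped into AX in the main program is 0078h, that is, 120.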
It may be noted here that no local variable is required for this subroutine,
so there is nothing in the stack frame above the old BP (that is, at addresses below BP).
If local variables were stored there, the instruction mov SP,BP in the termination
part of the procedure would clear the stack of those variables. Hence, as a general rule,
it is safe to use that instruction during the termination process. The space above is
usable in the routine as a normal stack, for saving registers, for further calls to nested
routines, etc. The stack frame as shown in this example remains as compact as possible,
and to determine the offsets of the parameters we need to consider only one
frame, without bothering about the frames of nested calls.
Table 4.2 explains, step by step, the operation of passing parameters through the
use of a stack frame.
Re-entrant programs, too, have no way other than going through a process
similar to what we discussed with recursion. A stack frame is the best way of meeting
the requirements of re-entrant programs. For every entry a stack frame is created,
which preserves the parameters as well as the local variables of the procedure in a
memory region separate from (not overlapping) that of the next instance of the
call to the same procedure.
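The difference between a fixed scratch location and a frame-local one can be seen in a small sketch. Everything here is hypothetical (the names tmp, f1 and f2, and the arithmetic AX*AX + AX chosen only to need one temporary); it is meant to show why the first form is not re-entrant while the second is.

```asm
data segment
tmp   dw ?                 ; one fixed scratch word shared by every call of f1
data ends
;
code segment
assume cs:code, ds:data
; NOT re-entrant: if an interrupt routine calls f1 between the first and
; the last instruction, tmp is overwritten and the suspended call resumes
; with the wrong value.
f1 proc near               ; AX = AX*AX + AX (low 16 bits); DX destroyed
        mov  tmp, ax
        mul  ax
        add  ax, tmp
        ret
f1 endp
; Re-entrant: the temporary lives in the stack frame of this particular call.
f2 proc near               ; AX = AX*AX + AX (low 16 bits); DX destroyed
        push bp
        mov  bp, sp
        sub  sp, 2         ; one word of local variable at [bp-2]
        mov  [bp-2], ax
        mul  ax
        add  ax, [bp-2]
        mov  sp, bp        ; discard the local variable
        pop  bp
        ret
f2 endp
strt:   mov  ax, data
        mov  ds, ax
        mov  ax, 5
        call f1            ; AX = 001Eh (30)
        mov  ax, 5
        call f2            ; AX = 001Eh again, without touching tmp
        int  1
code ends
end strt
```

Run on their own, the two procedures give the same answer; the difference shows up only when a second entry occurs before the first has finished.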
Table 4.2: Passing parameters through stack frames to subroutines
Calling program                      Called subroutine
Step 1: Decrement the stack pointer
suitably to accommodate the result
output.
Step 2: Push all the input
parameters.
Step 3: Call the subroutine.   →     Step 1: Push the frame pointer.
                                     Step 2: Move SP to the frame pointer.
                                     Step 3: Decrement the stack pointer suitably
                                     to provide space for temporary or local
                                     variables of the subroutine (which may be
                                     required to be invoked in a random fashion)
                                     to complete the stack frame.
                                     Step 4: Do the subroutine job, using the
                                     stack space above the frame as the stack for
                                     the subroutine. Use indexed addressing with
                                     the frame pointer as the base register to
                                     obtain the parameters as required in the
                                     subroutine, and to store the output variables
                                     of the subroutine in the space provided in
                                     the stack frame.
                                     Step 5: When the subroutine job is done,
                                     clear the subroutine stack by moving the
                                     frame pointer value to the stack pointer.
                                     Get back the original frame pointer.
                               ←     Step 6: Return to the calling program.
Step 4: On return from the
subroutine, clear the parameters
input to the subroutine from the
stack by incrementing the stack
pointer appropriately, to undo
step 2 above.
Step 5: Get the results of the
subroutine by popping them off
the stack into registers or
memory locations as required,
and use them as required.
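The steps of Table 4.2 can be followed in a small non-recursive sketch that has two inputs and one local variable. The procedure name dsq, the values 7 and 3 and the computation (a + b)*(a - b) are assumptions chosen only to exercise every step; RET 4 is used so that step 4 of the calling program is done by the return itself.

```asm
code segment
assume cs:code
strt:   sub  sp, 2          ; caller step 1: space for the output
        mov  ax, 7
        push ax             ; caller step 2: input a
        mov  ax, 3
        push ax             ;                input b
        call dsq            ; caller step 3
        pop  ax             ; caller step 5: AX = 0028h, i.e. (7+3)*(7-3) = 40
        int  1
dsq proc near
; frame: [bp+4] = b, [bp+6] = a, [bp+8] = output, [bp-2] = local variable
        push bp             ; step 1
        mov  bp, sp         ; step 2
        sub  sp, 2          ; step 3: one word of local variable
        push ax             ; step 4: ordinary stack use above the frame
        push dx
        mov  ax, [bp+6]     ; a
        add  ax, [bp+4]     ; a + b
        mov  [bp-2], ax     ; keep a + b in the local variable
        mov  ax, [bp+6]
        sub  ax, [bp+4]     ; a - b
        mul  word ptr [bp-2] ; DX:AX = (a - b)*(a + b)
        mov  [bp+8], ax     ; low word to the output space
        pop  dx
        pop  ax
        mov  sp, bp         ; step 5: clear the local variable
        pop  bp
        ret  4              ; step 6, also discarding the 4 bytes of input
dsq endp
code ends
end strt
```

Note that the input pushed last (b) has the smallest positive displacement, and that the local variable is the only item reached with a negative displacement.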
It should however be noted that the program we have taken, uses only two or three
registers and hence, parameters can well be passed through registers in this case. Below,
we have a recursive program for factorial calculation using this idea.
[Flowchart: PROC FACT; create stack frame; is N ≤ 1? If yes, FACT = 1. If no, N = N – 1, CALL FACT to get (N – 1)!, then FACT = N*FACT. Undo the stack frame; RETURN.]
Fig 4.1: Flowchart for the recursive Factorial Procedure for N!
Example: Write a recursive program to compute nCr, the number of combinations
of n items taken r at a time, passing the parameters through the stack. Use the relation
nCr = (n-1)Cr + (n-1)C(r-1), with the terminating conditions nCn = nC0 = 1.
SOLUTION: A single stacked word is used to define input parameters n and r.
The parameter n = 18 (12 hex) is in the high byte of the word and r = 9 is in
the low byte. The input is chosen to limit the output to 16 bits. One word of
space is left for the output in the stack frame, by the instruction sub sp,2 on line
no. 2. Try to understand how the program works with the help of the comments
given.
Note that the format of this program is not for assembling using the MASM. It
is the .lst listing obtained from a 32 bit assembler, NASM, downloadable from
the net. However, the difference is not much, and the .asm version required for
MASM can be easily visualized from this listing. See also Q5 at the end of
chapter exercises for a brief introduction to NASM, and macros in NASM.
1 ; Main Program for finding nCr recursively.
2 00000000 81EC0200 sub sp,2
3 00000004 B80912 mov ax,1209h ; n = 12h or 18, and r = 9.
4 00000007 50 push ax
5 00000008 E80300 call ncrp ; call to the routine at line 8.
6 0000000B 58 pop ax ; get result in ax.
7 0000000C CD01 int 01 ; return control to DEBUG.
; procedure ncrp
8 0000000E 55 ncrp:push bp ; recursive routine here.
9 0000000F 89E5 mov bp,sp
10 00000011 81EC0200 sub sp,2 ; space for temp variable
; of the procedure.
11 00000015 50 push ax
12 00000016 53 push bx
13 00000017 8B5E04 mov bx,[bp+4] ; parameters passed from
; the calling program.
14 0000001A B80100 mov ax,1
15 0000001D 38DF cmp bh,bl
16 0000001F 7427 jz over
17 00000021 08DB or bl,bl ; is bl = 0?
18 00000023 7423 jz over
19 00000025 81EC0200 sub sp,2
20 00000029 FECF dec bh
21 0000002B 53 push bx
22 0000002C E8DFFF call ncrp ; calculate n-1Cr
23 0000002F 5B pop bx
24 00000030 895EFE mov [bp-2],bx ; store partial result
; temporarily.
25 00000033 8B5E04 mov bx,[bp+4] ; recall the parameters
26 00000036 FECF dec bh
27 00000038 FECB dec bl
28 0000003A 81EC0200 sub sp,2
29 0000003E 53 push bx
30 0000003F E8CCFF call ncrp ; compute n-1Cr-1
31 00000042 58 pop ax
32 00000043 8B5EFE mov bx, [bp-2]
33 ; partial result stored in
; temp variable space of
; stack frame
34 00000046 01D8 add ax, bx
35 00000048 894606 over:mov [bp+6],ax ; store result in stack space
36 0000004B 5B pop bx
37 0000004C 58 pop ax
38 0000004D 89EC mov sp,bp ; clean the stack
39 0000004F 5D pop bp
40 00000050 C20200 ret 2
The subroutine leaves space for one word of local variable (sub sp,2 in line no.
10). The sub sp,2 instructions in lines 19 and 28 leave space for the output of each
recursive call. Lines 15 to 18 check for the termination conditions. The rest of the
program can be clearly identified in terms of the flowchart of Fig. 4.1.
For counting the number of times the subroutine is called, you can use SI:CX as
counters set to 0, initially in the main program, and incremented before the return
instruction in the subroutine ncrp. You may be surprised to see the result! It comes to as
much as 97239 decimal, whereas the value 18C9 is only 48620 decimal.
The use of an additional set of terminal conditions, namely, nCn-1 = nC1 = n,
will certainly improve the execution time and stack memory requirement (the count here
comes to only 25836 decimal).
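The call count quoted for the basic recursion can be cross-checked with a short high-level model. The sketch below is in Python rather than assembly, and the names ncr_recursive and calls are ours, not part of the listing; it follows the same relation and the same terminating conditions as ncrp, and counts one for every entry to the routine.

```python
def ncr_recursive(n, r, counter):
    """High-level mirror of ncrp: returns nCr, counting every call in counter[0]."""
    counter[0] += 1
    if r == 0 or r == n:          # terminating conditions nCn = nC0 = 1
        return 1
    # nCr = (n-1)Cr + (n-1)C(r-1), exactly as in lines 19 to 34 of the listing
    return ncr_recursive(n - 1, r, counter) + ncr_recursive(n - 1, r - 1, counter)


calls = [0]
value = ncr_recursive(18, 9, calls)
print(value, calls[0])
```

Every call that does not terminate makes two further calls, so the number of calls is always twice the result minus one, which is why the count is so large compared with the value computed.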
In normal non-recursive and non-re-entrant situations, it is possible to pass
parameters through registers. The following is an example of a non-recursive program
using registers for parameter passing. The program is written as a near program, and
does a 32 bit by 32 bit multiplication. The program is given below with adequate
comments.
An Example of passing parameters to subroutines through Registers
;The process in terms of 16 bit data (a:b)*(c:d) = acH : (acL + bcH + adH) :
; (bcL + adL + bdH) : bdL
; the numbers to be multiplied are first made available in the registers as
; indicated in the comments at the start of the procedure.
; Steps are: 1. Save registers used; 2. Do the job and 3. Retrieve saved regs.
code segment
assume cs:code
start: call dmult
int 01
dmult proc near
; input parameters, a in dx, b in ax, c in bx and d in cx
; numbers multiplied a:b and c:d; i.e. (dx:ax)*(bx:cx).
; output in dx:cx:bx:ax ; all other regs. are saved across the procedure
push si
push di
push bp ; save extra registers used
mov si, ax ; dx:ax has a:b, so b -> si
mov di, dx ; a -> di
mul cx ; (bd) -> dx:ax
xchg ax, si ; si <- bdL, final; and b -> ax
mov bp, dx ; bdH -> bp
mul bx ; (bc) -> dx:ax
add bp, ax ; bcL + bdH -> bp, carry1 is not disturbed & used later
mov ax, cx ; d -> ax
mov cx, 0 ; 0 -> cx without disturbing the carry flag
adc cx, dx ; bcH + carry1 -> cx
mul di ; (ad) -> dx:ax
add bp, ax ; bcL + bdH + adL -> bp; carry2 could be there
adc cx, dx ; bcH + carry1 + adH + carry2 -> cx; carry3 could be there
mov ax, di ; a -> ax
mov di, 0 ; carry3 not disturbed, 0 -> di
adc di, di ; carry3 -> di
mul bx ; (ac) -> dx:ax
add cx, ax ; bcH + carry1 + adH + carry2 + acL -> cx; carry4, may be
; cx now has the final result
adc dx, di ; acH + carry3 + carry4 -> dx
mov bx, bp ; arrange the results in bx
mov ax, si ; and in ax as required
pop bp ; retrieve saved registers
pop di
pop si
ret
dmult endp
code ends
end start
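The partial-product scheme used by dmult can be checked independently of the registers. The following is a minimal Python sketch, not a translation of the procedure instruction by instruction; the names dmult_model and product64 are ours. It forms the four 16 bit by 16 bit products and adds them column by column, carrying into the next word, as the comments of the procedure describe.

```python
MASK16 = 0xFFFF


def dmult_model(a, b, c, d):
    """Return the four 16-bit result words (highest first) of (a:b) * (c:d)."""
    bd, bc, ad, ac = b * d, b * c, a * d, a * c
    w0 = bd & MASK16                                    # bdL, the lowest word
    s1 = (bd >> 16) + (bc & MASK16) + (ad & MASK16)     # bdH + bcL + adL
    w1 = s1 & MASK16
    s2 = (s1 >> 16) + (bc >> 16) + (ad >> 16) + (ac & MASK16)  # carries + bcH + adH + acL
    w2 = s2 & MASK16
    w3 = ((s2 >> 16) + (ac >> 16)) & MASK16             # carries + acH
    return w3, w2, w1, w0


def product64(x, y):
    """Multiply two 32-bit numbers through dmult_model and rejoin the words."""
    w3, w2, w1, w0 = dmult_model(x >> 16, x & MASK16, y >> 16, y & MASK16)
    return (w3 << 48) | (w2 << 32) | (w1 << 16) | w0


print(hex(product64(0xFFFFFFFF, 0xFFFFFFFF)))
```

The four words returned correspond to the registers dx, cx, bx and ax in which the procedure leaves its result.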
Non recursively solving problems which are susceptible to recursive solution:

Recursive problems may also be solved non-recursively. For example, the factorial
of n can be computed by starting from 1 and multiplying by successive integers until
we reach n or, alternatively, by starting with n and multiplying by successively
decremented values until we reach 1. A skeleton program using the second approach
(for n = 8) is shown below:
-a
1377:0100 mov ax,1

1377:0103 mov cx,8
1377:0106 mul cx
1377:0108 loop 106
1377:010A int 01
-r
1377:0100 B80100 MOV AX,0001
-t12

1377:0103 B90800 MOV CX,0008

1377:0106 F7E1 MUL CX

1377:0108 E2FC LOOP 0106

1377:0106 F7E1 MUL CX

1377:0108 E2FC LOOP 0106

1377:0106 F7E1 MUL CX

1377:0108 E2FC LOOP 0106

1377:0106 F7E1 MUL CX

1377:0108 E2FC LOOP 0106

1377:0106 F7E1 MUL CX
AX=1A40 BX=0000 CX=0004 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0108 E2FC LOOP 0106
AX=1A40 BX=0000 CX=0003 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0106 F7E1 MUL CX
AX=4EC0 BX=0000 CX=0003 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0108 E2FC LOOP 0106
AX=4EC0 BX=0000 CX=0002 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000

1377:0106 F7E1 MUL CX
AX=9D80 BX=0000 CX=0002 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
1377:0108 E2FC LOOP 0106

1377:0106 F7E1 MUL CX

1377:0108 E2FC LOOP 0106

DS=1377 ES=1377 SS=1377 CS=1377 IP=010A NV UP EI PL NZ NA PO NC
1377:010A CD01 INT 01
-q
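The register values seen in the trace can be reproduced with a small model of the mul and loop pair. This is a Python sketch under our own naming (factorial_loop); it assumes n is at least 1 and that the result fits in 16 bits, since the skeleton program ignores the high word left in DX by each multiplication.

```python
def factorial_loop(n):
    """Model of: mov ax,1 / mov cx,n / mul cx / loop. Returns (dx, ax)."""
    ax, dx, cx = 1, 0, n
    while True:
        product = ax * cx                 # mul cx: dx:ax = ax * cx
        ax, dx = product & 0xFFFF, product >> 16
        cx = (cx - 1) & 0xFFFF            # loop: decrement cx ...
        if cx == 0:                       # ... and repeat only while cx is non-zero
            break
    return dx, ax


dx, ax = factorial_loop(8)
print(hex(ax))
```

For n = 8 the intermediate values of AX are 1A40, 4EC0 and 9D80 hex at the points shown in the trace, and the final 9D80 hex is 40320, that is, 8!.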
The advantage of considering recursion is that the recursion logic is normally
relatively simple. Where the non-recursive logic is also simple, it is better to solve the
problem non-recursively. Certain problems, however, are difficult to visualize without
recursion. The Tower of Hanoi is a very good example of such a problem; with recursion,
it is very simple to tackle. In such cases it becomes necessary to resort to recursion.
EXERCISES
1. Study the method of handling local labels in macros as demonstrated in section 1
on macros at the beginning of this chapter. Based on your study, answer the
following question: A program uses 2 macros. Macro 1 has three local labels,
while macro 2 has five. The main program invokes macro 1 5000 times. How
many times can the program invoke macro 2 without getting into problems with
identifiers for the local labels?
2. Here is another example of the power of macros: the operation to be performed
is itself passed as a parameter. Study the program
carefully and note how the different issues of the shift left and shift right of multi
word data by a variable number of bits are handled in the macro.
data segment
dat dw 234h, 5678h, 89abh, 7604h, 0abc0h, 3 dup(0)
dat1 dw 8 dup(?)
dat2 dw 8 dup(?)
num dw 5
data ends
;
code segment
assume cs: code, ds: data, es: data
sflr macro opr, mem, n, m, df
local bak, again
df ;; either 'std' for right shift
;; or 'cld' for left shift
mov dx, m
again: mov cx, n
lea di, mem
clc
bak: mov ax, [di]
opr ax, 1
stosw
loop bak
dec dx
jnz again
endm
;
; In the above macro, n is the number of data words, m is the number of bit
; positions to be shifted, and opr is rcl or rcr
;
strt: mov ax, data
mov ds, ax
mov es, ax
cld
mov cx, num
mov di, offset dat1
mov si, offset dat
rep movsw ; to copy data
mov cx, num
mov di, offset dat2
mov si, offset dat
rep movsw ; to copy data
sflr rcl, dat1, num, 4, cld ; data start address for 'rcl'
sflr rcr, dat2+8, num, 4, std ; data end address for 'rcr'
int 1
code ends
end strt
STUDY IN DEBUG
-u 0 4e
13E0:0000 B8DC13 MOV AX,13DC
13E0:0003 8ED8 MOV DS,AX
13E0:0005 8EC0 MOV ES,AX
13E0:0007 FC CLD
13E0:0008 8B0E3000 MOV CX,[0030]
13E0:000C BF1000 MOV DI,0010
13E0:000F BE0000 MOV SI,0000
13E0:0012 F3 REPZ
13E0:0013 A5 MOVSW
13E0:0014 8B0E3000 MOV CX,[0030]
13E0:0018 BF2000 MOV DI,0020
13E0:001B BE0000 MOV SI,0000
13E0:001E F3 REPZ
13E0:001F A5 MOVSW
13E0:0020 FC CLD
13E0:0021 BA0400 MOV DX,0004
13E0:0024 8B0E3000 MOV CX,[0030]
13E0:0028 8D3E1000 LEA DI,[0010]
13E0:002C F8 CLC
13E0:002D 8B05 MOV AX,[DI]
13E0:002F D1D0 RCL AX,1
13E0:0031 AB STOSW
13E0:0032 E2F9 LOOP 002D
13E0:0034 4A DEC DX
13E0:0035 75ED JNZ 0024
13E0:0037 FD STD
13E0:0038 BA0400 MOV DX,0004
13E0:003B 8B0E3000 MOV CX,[0030]
13E0:003F 8D3E2800 LEA DI,[0028]
13E0:0043 F8 CLC
13E0:0044 8B05 MOV AX,[DI]
13E0:0046 D1D8 RCR AX,1
13E0:0048 AB STOSW
13E0:0049 E2F9 LOOP 0044
13E0:004B 4A DEC DX
13E0:004C 75ED JNZ 003B
13E0:004E CD01 INT 01
-g
AX=8023 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=000A DI=001E
DS=13DC ES=13DC SS=13DC CS=13E0 IP=0050 NV DN EI PL ZR NA PE NC
13E0:0050 FFFF ??? DI
-d0 3f
13DC:0000 34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00 4.xV...v........
13DC:0010 40 23 80 67 B5 9A 48 60-07 BC 00 00 00 00 00 00 @#.g..H`........
13DC:0020 23 80 67 B5 9A 48 60 07-BC 0A 00 00 00 00 00 00 #.g..H`.........
13DC:0030 05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;
; the original word is: ABC0 7604 89AB 5678 0234;
; on 4 bit left shift, we get: BC07 6048 9AB5 6780 2340;
; and on right shift, we get: 0ABC 0760 489A B567 8023; as can be seen from
; the result above.
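The memory dump above can be cross-checked with a small model of the macro. The sketch below is in Python and the name shift_words is ours; it shifts one bit per pass, clearing the carry before each pass and rotating every word through the carry, from the l.s. word upwards for a left shift and from the m.s. word downwards for a right shift.

```python
def shift_words(words, bits, left=True):
    """words[0] is the least significant word; shift the whole number by `bits`."""
    words = list(words)
    for _ in range(bits):                       # outer count, dx in the macro
        carry = 0                               # clc before each pass
        order = range(len(words)) if left else reversed(range(len(words)))
        for i in order:
            w = words[i]
            if left:                            # rcl ax, 1
                words[i] = ((w << 1) | carry) & 0xFFFF
                carry = w >> 15
            else:                               # rcr ax, 1
                words[i] = (w >> 1) | (carry << 15)
                carry = w & 1
    return words


dat = [0x0234, 0x5678, 0x89AB, 0x7604, 0xABC0]
print([hex(w) for w in shift_words(dat, 4, left=True)])
print([hex(w) for w in shift_words(dat, 4, left=False)])
```

The two printed lists correspond to the words at offsets 0010 and 0020 in the dump, l.s. word first.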
3. In the above program of Q2, the responsibility of the programmer in regard to
invoking the macro is a bit complex. If the opr chosen is RCL, then the address
of the l.s. word of data is to be given as mem and CLD is to be given as the df
parameter. If the opr chosen is RCR, then mem should point to the m.s.
word address of the data and df should be given as STD. These have to be
coordinated properly; else the program will not work. Can you suggest a method
to simplify the coordination requirement? One way of doing it is suggested
below. Study the method and determine how it will work. Suggest alternatives
if you have any.
sflr macro opr, k, mem, n, m
local bak, again
mov bx, n
mov dx, m
mov si, offset mem
cld ;; unless changed for RCR
mov ax, k
or ax, ax
jz again
std ;; for RCR operation.
mov cx, bx
dec cx
shl cx, 1
add si, cx ;; this works out to the m.s. word address for RCR op.
again: mov cx, bx
mov di, si
clc
bak: mov ax, [di] ;; from now on, proceed as in the original program.
In this program, we bundle the parameters (opr, k) as (rcl, 0) for left shift or as
(rcr, 1) for right shift, which is easier on the programmer. The mem value can be the start
address in both the cases, and the direction flag is adjusted based on the control parameter
k. Sometimes the value of n may not be known at the time of writing the program; it may
be a computed and stored word during operation. Such cases will also be accommodated
in this modified macro.
4. The following program is given without any comments. Find out what it does
and test the working of the program.
data segment
mnd dw 8 dup (1578h)
spare dw 0F8h dup (?)
sud dw 1234h, 0abcdh, 2389h, 5874h
dw 9876h, 4567h, 0bcdh
dw 9567h
spare2 dw 10h dup (?)
rest dw 9 dup (?)
data ends
code segment
subt macro minuend, subtrahend, result, n
local lup
mov si, offset minuend
mov di, offset result
mov cx, n
clc
cld
lup: lodsw
sbb ax, [subtrahend-minuend-2][si]
stosw
loop lup
sbb ax, ax
mov [di], ax
endm
strt: mov ax, data
mov ds, ax
mov es, ax
subt mnd, sud, rest, 8
int 1
code ends
end strt
5. Below is given an assembly language version of the smart divide using a macro,
which we saw under point no.6 earlier in this chapter. This version can be
assembled using the freely available assembler NASM (Netwide Assembler), which
can be directly downloaded from the net. The user manual is also freely
downloadable. The program as given here matches the program for assembly by
MASM, which we saw earlier. Note the features of using macros in NASM and
the overall simplicity of the .asm program. List the differences in assembly
programs for the two assemblers; in particular, find the interesting way of handling
the parameters for the macro. Note also how the local labels for the macro are
handled.
THE SMARTDIV.ASM FILE
%macro smdiv 4
sub %4, %4
cmp %1, %3
jb %%down
xchg %4, %1
xchg %4, %2
div %3
xchg %4, %2
%%down: div %3
xchg %4, %1
%endmacro
;
start: mov dx, 0x0abc
mov ax, 0x1234
mov bx, 0x45ab
smdiv dx, ax, bx, cx
mov bx, 0xdabc
mov ax, 0x2345
mov cl, 35
smdiv ah, al, cl, bl
int 01
THE SMARTDIV.LST FILE OBTAINED FROM THE SMARTDIV.ASM FILE
The command used is: nasm -l smartdiv.lst smartdiv.asm. The file also shows
the assembled program starting at the origin 0000 in the code segment.
1 %macro smdiv 4
2 sub %4, %4
3 cmp %1, %3
4 jb %%down
5 xchg %4, %1
6 xchg %4, %2
7 div %3
8 xchg %4, %2
9 %%down: div %3
10 xchg %4, %1
11 %endmacro
12 ;
13 00000000 BABC0A start: mov dx, 0x0abc
14 00000003 B83412 mov ax, 0x1234
15 00000006 BBAB45 mov bx, 0x45ab
16 smdiv dx, ax, bx, cx
17 00000009 29C9 <1> sub %4, %4
18 0000000B 39DA <1> cmp %1, %3
19 0000000D 7206 <1> jb %%down
20 0000000F 87CA <1> xchg %4, %1
21 00000011 91 <1> xchg %4, %2
22 00000012 F7F3 <1> div %3
23 00000014 91 <1> xchg %4, %2
24 00000015 F7F3 <1> %%down: div %3
25 00000017 87CA <1> xchg %4, %1
26 00000019 BBBCDA mov bx, 0xdabc
27 0000001C B84523 mov ax, 0x2345
28 0000001F B123 mov cl, 35
29 smdiv ah, al, cl, bl
30 00000021 28DB <1> sub %4, %4
31 00000023 38CC <1> cmp %1, %3
32 00000025 7208 <1> jb %%down
33 00000027 86DC <1> xchg %4, %1
34 00000029 86D8 <1> xchg %4, %2
35 0000002B F6F1 <1> div %3
36 0000002D 86D8 <1> xchg %4, %2
37 0000002F F6F1 <1> %%down: div %3
38 00000031 86DC <1> xchg %4, %1
39 00000033 CD01 int 01
40
The program can be assembled using the command: nasm -o smartdiv.com
smartdiv.asm. The resulting machine language program can be executed
in debug. But then it will be at the offset 100h in the debug
environment as shown below. That the two machine language programs
(what we saw earlier and what we see here) have developed in identical
fashion can be easily verified.
-u 100 133
13CC:0100 BABC0A MOV DX,0ABC

13CC:0103 B83412 MOV AX,1234
13CC:0106 BBAB45 MOV BX,45AB
13CC:0109 29C9 SUB CX,CX
13CC:010B 39DA CMP DX,BX
13CC:010D 7206 JB 0115
13CC:010F 87CA XCHG CX,DX
13CC:0111 91 XCHG CX,AX
13CC:0112 F7F3 DIV BX
13CC:0114 91 XCHG CX,AX
13CC:0115 F7F3 DIV BX
13CC:0117 87CA XCHG CX,DX
13CC:0119 BBBCDA MOV BX,DABC
13CC:011C B84523 MOV AX,2345
13CC:011F B123 MOV CL,23
13CC:0121 28DB SUB BL,BL
13CC:0123 38CC CMP AH,CL
13CC:0125 7208 JB 012F
13CC:0127 86DC XCHG BL,AH
13CC:0129 86D8 XCHG BL,AL
13CC:012B F6F1 DIV CL
13CC:012D 86D8 XCHG BL,AL
13CC:012F F6F1 DIV CL
13CC:0131 86DC XCHG BL,AH
13CC:0133 CD01 INT 01
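The idea behind the smdiv macro can be stated compactly in a high-level form. The following Python sketch uses our own names (smart_div, bits) and is not a line-by-line translation; it assumes the same scheme as the macro, namely that when the high part of the dividend is not smaller than the divisor, the high part is divided first and its remainder is carried into a second division with the low part.

```python
def smart_div(high, low, divisor, bits=16):
    """Divide (high:low) by divisor; returns (quotient_high, quotient_low, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("a zero divisor still faults, as on the 8086")
    q_high, rem = 0, high
    if high >= divisor:                        # a single div would overflow here
        q_high, rem = divmod(high, divisor)    # first div: 0:high / divisor
    # second div: rem:low / divisor; rem < divisor, so the quotient fits in `bits`
    q_low, rem = divmod((rem << bits) | low, divisor)
    return q_high, q_low, rem


print(smart_div(0x0ABC, 0x1234, 0x45AB))        # word case: dx:ax / bx
print(smart_div(0x23, 0x45, 35, bits=8))        # byte case: ah:al / cl
```

In the word case the three values returned correspond to dx, ax and cx after the macro; in the byte case they correspond to ah, al and bl.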
5. SOME SIMPLE NUMBER CRUNCHING and INTERRUPT PROGRAMS
Having studied the basics of programming in the previous chapters, we shall
now look into some simple number-crunching programs. Many of these routines could be
of general use. In such cases they can be converted into procedures. The first program, for
finding the GCD, is written directly as a procedure. You can develop your own method of
testing it.
The science of programming lies in making non-working programs work.
Problems could arise in the ALP itself, which will be indicated while assembling. Many
times, the error indication by the assembler may be difficult to understand. The error is
tersely noted as Error no. xxxx, with a short description. Sometimes these descriptions
may even be misleading to the uninitiated. To give a simple example, suppose in the data
segment you are trying to define the dividend as a word of value ABCDH, by labeling
it as dd, as shown below:
; the ALP try.asm
data segment
dd dw abcdh ; line 2 of try.asm ; the intention is to have the symbol ‘dd’
; defined by the word abcdh. dd(symbol) dw(defining word) abcdh
data ends
code segment
assume cs:code, ds: data
mov ax, data
mov ds, ax
mov ax, dd ; line 8 of try.asm
int 1
code ends
end
; Result of assembling try.asm using MASM

Microsoft (R) Macro Assembler Version 5.10A
Copyright (C) Microsoft Corp 1981, 1989. All rights reserved.
try.ASM(2): error A2009: Symbol not defined: DW

try.ASM(8): error A2009: Symbol not defined: DD
48212 Bytes symbol space free
0 Warning Errors
2 Severe Errors
What has happened is that MASM has read the ‘dd’ of line 2 as the reserved word that
defines a double word (32-bit word), and expects the double word value to follow.
What it sees instead is ‘dw’, which it interprets as a label defined elsewhere. No such
label is found, and this is reported as an error. The inexperienced user will find this
difficult to make out. The user would like an indication that the error lies in the use of
the reserved word ‘dd’ as a symbol for a data word. Instead, MASM accepts the
first word as a valid reserved word and hence treats the second word as an undefined
symbol. If you correct this error by using the symbol dvd for the dividend, you will find
that line 2 still has an error. Try this out in the laboratory, until you get the program
assembling without error! However, most of the time, the error indications can be easily
understood. When they are difficult to understand, one has to try different possible
alternatives on the erroneous line and on other lines related to it till the fault is correctly
identified. It requires some practice before these aspects are properly understood.
On getting the machine language program using LINK after MASM, the program
can be debugged. At the debug stage also there could be, or quite likely there will be,
problems, which have to be solved by using the ‘t’, ‘g’, ‘p’, ‘d’ and other commands of
debug judiciously to assess the faults. The procedures given below are tested and the
working test results are shown. Still, when you work these problems in the laboratory, there
could be errors in your program entry. Until you have turned possibly wrong programs into
solid working programs, you will not have learnt programming well. The purpose of
the microprocessor laboratory is to impart this sort of training. Here are a few working
procedures and programs. When you are working in a team in the laboratory, it could be
arranged that one of the team members deliberately introduces errors in the assembly or
debug version of the program, unseen by the other team members, who may then try to
identify the error. This may be played as a game, and slowly you will find your interest in
programming picking up. In any case, whenever you notice errors in your program, make
a record of the error and the way you got it corrected. Programming is learnt in
the laboratory only by studying such errors, avoiding them in the future, and becoming
confident about handling errors through awareness of the common errors possible.
1. A procedure for finding the GCD of two 16 bit numbers in AX and BX
registers: The procedure assumes the numbers are unsigned. It tests for a
zero in either input and returns with the carry flag set to indicate invalid data
if a zero is found as one or both of the inputs. For valid data, the GCD
is returned in register AX, with the carry cleared to indicate valid data.
Program for finding the GCD of two 16 bit numbers
; Input: numbers in ax and bx registers. Output in ax register, when carry is
; returned clear. If carry is returned set, it indicates either one or both of
; the input data are zeros. The program uses dx register in addition to the
; input registers ax and bx. Rest of the registers are left intact.
GCD proc near

Cmp ax, bx
Jae down
Xchg ax, bx ;
Down: or bx, bx ; bx will now be the smaller of the two data
Jz invalid ; if it is zero the data is invalid
Push dx ; else, save register used and
; Note, Push is done only for valid data
Again:Sub dx, dx ; prepare for word division
Div bx ; No divide overflow possible as dividend itself is 16-bits
Mov ax, bx
Mov bx, dx
Or dx,dx ; note this will clear the carry
Jnz again
Pop dx
Ret
Invalid:Stc
Ret
GCD endp
Testing in Debug
-u 0 1e
13D5:0000 E80200 CALL 0005
13D5:0003 CD01 INT 01
13D5:0005 3BC3 CMP AX,BX ; procedure from here; Euclid’s algo.
13D5:0007 7301 JNB 000A
13D5:0009 93 XCHG BX,AX
13D5:000A 0BDB OR BX,BX ; bx carries the smaller number here.
13D5:000C 740F JZ 001D ; if bx = 0, invalid data
13D5:000E 52 PUSH DX ; save register used
13D5:000F 2BD2 SUB DX,DX ; prepare for word division
13D5:0011 F7F3 DIV BX
13D5:0013 8BC3 MOV AX,BX
13D5:0015 8BDA MOV BX,DX
13D5:0017 0BD2 OR DX,DX ; remainder = 0?; this also clears carry
13D5:0019 75F4 JNZ 000F ; if not, repeat; else job over, gcd is in AX
13D5:001B 5A POP DX ; retrieve saved DX before return
13D5:001C C3 RET
13D5:001D F9 STC ; DX not saved and not used in this path
13D5:001E C3 RET
Test on normal data

-rax
AX 0000
:1234
-rbx
BX 0000
:1324
-rdx
DX 0000
:1111 ; data used to test if it is saved across the routine
-r
AX=1234 BX=1324 CX=001F DX=1111 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0000 E80200 CALL 0005
-g 3

13D5:0003 CD01 INT 01
Test on invalid data (BX = 0)

-rip
IP 0003
:0
-r
13D5:0000 E80200 CALL 0005
-g 3
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL ZR NA PE CY
13D5:0003 CD01 INT 01
-q
Certain features that we have introduced in this program are of general interest.
One such feature is the check on the input data, done at the beginning of this program.
Normally, programs operate correctly as long as the data is within some specific limits,
and if the input data transgresses these boundaries, the program will get into problems. It
is, therefore, good practice to check the input data and confine it within
certain boundaries. Data beyond the allowable boundary has to be handled by
not working on the data at all, but giving an indication in the output that the data is
invalid. This indication can be given in several ways. Normally a message is
presented on the screen or printed out indicating the fact. A simple way of handling this
requirement is through one of the flags in the flag register. One usual choice is the carry
flag, as in this case. When the procedure returns, all that needs to be done in
the main program is to check the carry flag. If a jump-on-carry instruction is used, the
program flow can be conveniently altered to account for this error. This method is very
common when using procedures or interrupt routines. Another feature of this program is
that the push and pop of the register DX are done only when the data is valid. If the
data is invalid, no push or pop is done, as no calculation is done.
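The logic of the GCD procedure, including the validity indication, can be summarized in a short high-level model. This is a Python sketch with our own names (gcd16 and its returned carry value); the boolean stands in for the carry flag and is an illustrative device, not a feature of the procedure itself.

```python
def gcd16(ax, bx):
    """Model of the GCD procedure. Returns (carry, ax); carry=True means invalid data."""
    if ax < bx:
        ax, bx = bx, ax            # xchg, so that bx holds the smaller number
    if bx == 0:
        return True, ax            # stc: invalid data, no computation is done
    while True:
        dx = ax % bx               # div bx leaves the remainder in dx
        ax, bx = bx, dx            # mov ax, bx / mov bx, dx
        if dx == 0:                # or dx, dx also clears the carry
            return False, ax


print(gcd16(0x1234, 0x1324))
print(gcd16(0x1234, 0))
```

The caller tests the first value returned before using the second, just as the main program would use a jump-on-carry instruction after the call.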
Exercise: Extend the above program, to give the LCM of the input data numbers,
using the well known relation: LCM (n1, n2) = (n1)*(n2)/GCD (n1, n2)
Hint: There are enough registers available, so that the following steps can be followed:
1. Save n1 and n2 in, say, si and di registers
2. Find GCD of n1 and n2 in ax as has been shown, and move it to bx.
3. Take n1 in ax (from si), make dx = 0 and word divide n1 by the GCD
4. The result will now be in ax with nothing in dx (why?), multiply this result by di.
The LCM will be in dx:ax and the GCD will be in bx.
This is perhaps the best method for finding LCM, even when GCD is not needed.
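The order of operations in the hint matters: dividing by the GCD before multiplying keeps the intermediate value within 16 bits. A minimal Python sketch of the four steps follows; the name lcm16 is ours, and non-zero inputs are assumed since zero inputs are rejected by the GCD procedure.

```python
from math import gcd


def lcm16(n1, n2):
    """Follow the hinted steps; returns (lcm, gcd) for non-zero 16-bit inputs."""
    g = gcd(n1, n2)            # step 2: the GCD, which would be moved to bx
    quotient = n1 // g         # step 3: word division; the remainder is zero
    return quotient * n2, g    # step 4: the product fits in dx:ax


print(lcm16(0x1234, 0x1324))
```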
2. A program to produce a list of Fibonacci numbers not exceeding 16-bits:
This is written here as a main program. Fibonacci numbers start with 0 and 1,
and grow according to the equation F(n) = F(n-1) + F(n-2), where F(n), F(n-1) and
F(n-2) are three consecutive numbers in the sequence (with F(0) = 0 and F(1) = 1). The
requirement to store a list or array of numbers is best met by
using the string instructions stosw and stosb, the former for word store and the
latter for byte store, of a data string in memory. In both cases, the segment
register used is ES. However, while displaying the result in the debug
environment, the default segment used is DS. So, unless there is a specific
reason to have different values in DS and ES, it is always convenient to
have identical segment addresses in DS and ES, as done in this program.
The assembly language program
data segment
fibo dw 200 dup (0); unknown number of entries possible, so liberal provision.
data ends
code segment
assume cs:code, ds:data, es:data ; DS and ES are same as discussed above.
start: mov ax,data
mov ds, ax
mov es, ax
sub ax, ax ; the first number
lea di, fibo
stosw ; stored
mov bx, ax ; first word goes to bx
inc ax ; second word in ax
back: stosw ; stored
xchg ax, bx ; two consecutive words now in ax, bx
add ax, bx ; add them to get the next word
jnc back ; does it go beyond the 16-bit limit?, if not go back.
int 01
code ends
end start
Testing in debug
-u 0 17
13EE:0000 B8D513 MOV AX,13D5
13EE:0003 8ED8 MOV DS,AX
13EE:0005 8EC0 MOV ES,AX
13EE:0007 2BC0 SUB AX,AX
13EE:0009 8D3E0000 LEA DI,[0000]
13EE:000D AB STOSW
13EE:000E 8BD8 MOV BX,AX
13EE:0010 40 INC AX
13EE:0011 AB STOSW
13EE:0012 93 XCHG BX,AX
13EE:0013 03C3 ADD AX,BX
13EE:0015 73FA JNB 0011
13EE:0017 CD01 INT 01
-g 7
AX=13D5 BX=0000 CX=01A9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=13D5 ES=13D5 SS=13D5 CS=13EE IP=0007 NV UP EI PL NZ NA PO NC
13EE:0007 2BC0 SUB AX,AX
-d 0 3f
13D5:0000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 17
AX=2511 BX=B520 CX=01A9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0032
DS=13D5 ES=13D5 SS=13D5 CS=13EE IP=0017 NV UP EI PL NZ NA PE CY
13EE:0017 CD01 INT 01
-d 0 3f
13D5:0000 00 00 01 00 01 00 02 00-03 00 05 00 08 00 0D 00 ................

13D5:0010 15 00 22 00 37 00 59 00-90 00 E9 00 79 01 62 02 ..".7.Y.....y.b.
13D5:0020 DB 03 3D 06 18 0A 55 10-6D 1A C2 2A 2F 45 F1 6F ..=...U.m..*/E.o
13D5:0030 20 B5 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ...............
-q ; see the comments following the programs to understand the significance of the
; direction flag status (UP) in the flag register display above.
; An interesting alternative to this program, without using the string
; instruction stosw, is presented below (in its essentials, without segment
; definitions etc.).
Xor ax, ax
Mov bx, 1 ; the first two numbers 0 and 1
lea si, list_start ; start of the list
Mov [si], ax
Up: Mov [si+2], bx
Add si, 4
Add ax, bx
Jc down ; to terminate on carry
Mov [si], ax
Add bx, ax ; note, bx is made the destination now
Jnc up ; don’t terminate if no carry
Down: int 01 ; terminate
This program also gives the same result as above.
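The list seen in the memory dump can be regenerated with a small model of the first program. The sketch is in Python and the name fibonacci_16bit is ours; it keeps two consecutive numbers, as the registers ax and bx do, and stops when the addition produces a carry out of 16 bits.

```python
def fibonacci_16bit():
    """Model of the stosw version: list every Fibonacci number that fits in 16 bits."""
    numbers = [0]                    # the first number, stored before the loop
    ax, bx = 1, 0                    # mov bx, ax / inc ax
    while True:
        numbers.append(ax)           # stosw
        ax, bx = bx, ax              # xchg ax, bx
        ax += bx                     # add ax, bx
        if ax > 0xFFFF:              # carry set: the next number needs more than 16 bits
            return numbers


fib = fibonacci_16bit()
print(len(fib), hex(fib[-1]))
```

The list has 25 entries, which agrees with DI = 0032 hex (50 bytes) in the register display, and its last entry is B520 hex.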
Comments on the first program above (using the stosw instruction): It is to be
noted that we have missed an important point in the above program: we have
forgotten to set the direction flag so that the array address is incremented, that is, to have
the direction ‘up’. Fortunately, the direction happened to be up, as can be seen from the
flag register display in debug (UP), so the error went unnoticed. The rule,
however, is as follows:
Before using string instructions, always set the direction flag as required.
The instruction STD (for decrementing the string address) or CLD (for
incrementing the string address) must appear before the use of the string instructions. In
our program, the first use of stosw is as the 6th instruction from the start label. So the CLD
instruction should be inserted anywhere in the block of 5 instructions from start. In
the above program this has not been done, but the result has come out OK because the D
flag is cleared when we freshly enter debug, as we have done here.
3. A Program to find the sum of n numbers stored in an array in the
memory: This requirement is best met by the use of string or array handling
instructions for handling the addresses. Consider an array of numbers to be
summed up. This program requires the array data to be read from memory.
The instructions applicable are lodsw and lodsb. These instructions load
into the register AX or AL and then update the address in the SI register.
The default segment is DS. We need not use the segment ES in this
program. The .asm program is given below.
Program Sum.asm for adding an array of up to 64K of unsigned bytes
data segment
array db 24h, 0a4h, 0bbh, 0fah, 58h, 23h
asize dw $-array ; notice the way of defining the array size
data ends
code segment
assume cs: code, ds:data
start: mov ax,data
mov ds,ax
cld ; before we forget, we clear the D flag
lea si, array ; point SI at the array start, as lodsb reads from DS:SI
mov cx, asize; array size in number of bytes
sub dx,dx
mov bx,dx ; bx and dx will be used for summing operation
mov ax,dx ; essentially to make AH register = 0.
back: lodsb
add bx,ax ; collect the sum in BX
adc dx,0 ; any overflow from bx beyond the word size will go to DX
loop back
int 01
code ends
end start
The program above is simple enough to understand. But certain questions may
arise. The data added are bytes, so why is word addition (add bx, ax) done
in the program? Why is 64K chosen as the maximum array size? What is the meaning
of $-array, as used to define the array size (label asize) in the data segment? We will
answer these questions one by one.
Add bx, ax: the data in al is added to the sum in register bx as a word because, when
several bytes are added, the sum will sooner or later overflow the byte size and
become word size. So, by putting 0’s in the AH register, we make the byte into a
word in the AX register, which is added to the sum collected so far in BX.
Size limit of 64K for the array: The number of offset addresses that can be accommodated in
SI is 2^16 or 64K (65536). Hence we cannot easily handle a byte array of more than this size.
Please note, if we are handling word arrays, the maximum size easily handled is 32K words
only. Also we have to take note that when we add more than 257 byte sized numbers, our
result may exceed a 16 bit value, and our addition must then handle numbers up to 3 bytes
in size. If we add 64K bytes of data we need to provide for full 24 bit
addition.
Meaning of $ - array in the data segment: $ is the symbol for the current memory
address and array represents the address at the label marked array, so
the expression $-array computes the array size in bytes. If a word array is to be handled, the
size in bytes has to be halved, which can easily be done in the program by a single
right shift of the array byte count, or you might specify the size as ($-array)/2. MASM will take
care of this computation during assembly of the program.
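The way the sum grows from byte size through word size into DX can be followed in a short model. This is a Python sketch with our own name sum_bytes; each byte is treated as a word with a zero high byte, added into bx, and any carry out of bx is collected in dx.

```python
def sum_bytes(data):
    """Model of the summing loop. Returns (dx, bx), the high and low words of the sum."""
    bx, dx = 0, 0
    for al in data:                 # lodsb, with AH kept at 0
        bx += al                    # add bx, ax
        dx += bx >> 16              # adc dx, 0 picks up the carry out of bx
        bx &= 0xFFFF
    return dx, bx


print(sum_bytes([0x24, 0xA4, 0xBB, 0xFA, 0x58, 0x23]))
print(sum_bytes([0xFF] * 257), sum_bytes([0xFF] * 258))
```

For the six bytes of the example array the sum is 02F8 hex with DX = 0. The second line shows that 257 bytes of FF hex still fit in BX, while 258 of them carry into DX.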
4. A program to find the approximate square root of a 16-bit number:

Finding the square root is not an easy process. However, squaring a number is
a relatively simple process, and it can be used to find the square root (at least
approximately) as the following simple program shows.
Basic AL Program for 16-bit number root finding
; In this program, the number whose root is to be found is put in the dx register
; and the exact root if it is a perfect square, or an approximate root otherwise, is
; found in the register cl. The square of the number in the cl register is shown
; in the ax register, so an approximate idea of the square root of the number in dx
; can be had. The essentials of the program are given below without comments. Try
; and work out the logic of the program.
OR DX, DX
JNZ SKIP
MOV CL, DL
MOV AX, DX
INT 01
SKIP: MOV CX, 1
UP: MOV AL, CL
MUL AL
CMP AX, DX
JAE DOWN
INC CL
JNZ UP
DEC CL
DOWN: INT 01
The program above tries the square of every number from 1 onwards, until the
square of the number equals or exceeds the given value in the DX register. The process can be
speeded up by increasing initially in steps of 16 and then refining the result by
increasing in steps of 1. Use of macros will help here. The following .asm program
shows how:
The Program Sqrt.asm
code segment
assume cs: code
approx macro n
local ddd, dddd, uu
uu: mov al, cl
mul al
cmp ax, dx
ja ddd ; square exceeds the number, so back off by n
jz dddd
add cl, n
jz ddd
jmp uu
ddd: sub cl, n
dddd:
endm
start: or dh, dh
mov cl,0
jz down
approx 10h
down: approx 1
jz dn1 ; exact root found, ax already holds its square
mov al, cl
mul al
dn1: int 01
code ends
end start
It should be noted that this program, when assembled, will require more memory
space than the earlier basic program, but will generally execute faster.
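The two-stage search of Sqrt.asm can be checked over the whole 16 bit range with a high-level model. The following Python sketch uses our own names (approx mirrors the macro, root16 mirrors the main program); it steps the trial root by n until its square passes the number, then backs off by n, first with n = 16 when the high byte is non-zero and then with n = 1.

```python
from math import isqrt


def approx(cl, dx, n):
    """Model of the approx macro: step cl by n until cl*cl reaches or passes dx."""
    while True:
        square = cl * cl                 # mov al, cl / mul al
        if square > dx:                  # ja ddd: gone too far, back off by n
            return (cl - n) & 0xFF
        if square == dx:                 # jz dddd: exact root
            return cl
        cl = (cl + n) & 0xFF             # add cl, n
        if cl == 0:                      # jz ddd: cl wrapped past 255, back off by n
            return (cl - n) & 0xFF


def root16(dx):
    """Model of the main program: coarse pass in steps of 16, then steps of 1."""
    cl = 0
    if dx >> 8:                          # or dh, dh: coarse pass only when dh is non-zero
        cl = approx(cl, dx, 0x10)
    return approx(cl, dx, 1)


print(root16(0xE100), root16(300), root16(0xFFFF))
```

Under this model the result left in cl is the largest number whose square does not exceed the input.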
5. Bubble Sort with Flagged Exchange: We will now look into a standard
bubble sort operation on an array.
The Bubble sort (ascend sort, as unsigned numbers) ALP using a macro Bigb,
which bubbles the biggest element of the array down to the bottom of the array
data segment
aray dw 75c2h, 8d29h,3bfbh,3bfbh,72f0h
data ends
code segment
assume cs: code, ds:data
start : mov ax,data
mov ds,ax
Mov cx, 5 ; array count n
lea si, aray ; array start address
sub dx,dx ; exchange flag
mov di, si ; save array start address
dec cx ; only n-1 bubbling necessary
up: mov bp, cx ; save this count for the next round
bigb macro ; big-to-the-bottom macro
local back, down
mov ax,[si]
back: mov bx, 2[si]
cmp ax, bx ; is ax > bx?
jbe down ; if no, go down
xchg ax, bx ; else, exchange ax, bx
inc dx ; indicate the array is altered, making dx non-zero
down: mov [si], ax ; store appropriate value at [si]
mov ax, bx ; adjust registers for the next bubble
add si,2 ; point to the next address and
loop back ; do the next bubble. cx = 0 here last
mov [si], ax ; store the last data, which is biggest
endm
bigb ; call to the macro
cmp dx, cx ; is dx = 0? (any alteration in the array?)
jz over ; if no, job over
mov dx, cx ; arrange to repeat; dx flag = 0
mov cx, bp ; bubble count is 1 less
mov si, di ; start address of array is same
loop up ; repeat bubbling now
over: int 01 ; terminate the process
code ends
end start
Testing the program in the debug
-u 0 35
13D6:0000 B8D513 MOV AX,13D5

13D6:0005 B90500 MOV CX,0005
13D6:0008 8D360000 LEA SI,[0000]
13D6:000C 2BD2 SUB DX,DX
13D6:000E 8BFE MOV DI,SI
13D6:0010 49 DEC CX
13D6:0011 8BE9 MOV BP,CX
13D6:0013 8B04 MOV AX,[SI]
13D6:0015 8B5C02 MOV BX,[SI+02]
13D6:0018 3BC3 CMP AX,BX
13D6:001A 7602 JBE 001E
13D6:001C 93 XCHG BX,AX
13D6:001D 42 INC DX ; macro expanded here
13D6:001E 8904 MOV [SI],AX
13D6:0022 83C602 ADD SI,+02
13D6:0025 E2EE LOOP 0015
13D6:0027 8904 MOV [SI],AX
13D6:0029 3BD1 CMP DX,CX
13D6:002B 7408 JZ 0035
13D6:002D 8BD1 MOV DX,CX
13D6:002F 8BCD MOV CX,BP
13D6:0031 8BF7 MOV SI,DI
13D6:0033 E2DC LOOP 0011
13D6:0035 CD01 INT 01
-g 5
DS=13D5 ES=13C5 SS=13D5 CS=13D6 IP=0005 NV UP EI PL NZ NA PO NC
13D6:0005 B90500 MOV CX,0005
-d 0 f
13D5:0000 C2 75 29 8D FB 3B FB 3B-F0 72 00 00 00 00 00 00 .u)..;.;.r......

; initial unsorted array – watch the five unsorted words: 75C2, 8D29, 3BFB,
; 3BFB and 72F0
-g
AX=72F0 BX=72F0 CX=0000 DX=0000 SP=0000 BP=0002 SI=0004 DI=0000

DS=13D5 ES=13C5 SS=13D5 CS=13D6 IP=0037 NV UP EI PL ZR NA PE NC
13D6:0037 C6061D01FF MOV BYTE PTR [011D],FF DS:011D=E8
-d 0 f
13D5:0000 FB 3B FB 3B F0 72 C2 75-29 8D 00 00 00 00 00 00 .;.;.r.u).......

; final sorted array, see the rearranged, sorted array of the same words.
-q
The bubble sort handles an array both for reading from and for writing to
the memory. Therefore the string instruction lods, or the instruction stos, or both could be
used. Using both may be difficult, as it requires two registers for the inner loop and these
will have to be renewed from the memory every time the outer loop is initialized. Two
example programs are given below, using only the lods instruction. It can be seen that
there is some difficulty in handling the registers where the data comparison is made, both
in selecting the item to be stored in the memory and in keeping track of the array
modification in the inner loop using the DX register. The two programs present
two ways of keeping proper track. It may be simpler if only stos is used. The reader may
try this alternative as an exercise. The program area managing the difficult part,
the tracking of array modification using the DX register, is highlighted in both
the programs for identification and study by the reader.
Bubble Sort Program Without using macro, Version 1
data segment ; this program is tested and works OK

array dw 1234h, 0abcdh,1234h, 0fdcah, 2345h
count dw 5
data ends
code segment
assume cs: code, ds: data
start: mov ax, data
mov ds, ax
mov cx, count
lea di, array
dec cx
cld
sub dx, dx
up: mov si,di
mov bp, cx
lodsw
mov bx, ax
back: lodsw
inc dx
cmp ax,bx
jb down
xchg ax,bx
dec dx
down: mov [si-4], ax
loop back
mov [si-2], bx
cmp dx, cx
mov dx,cx
mov cx, bp
loopnz up
int 01
code ends
end start
Bubble Sort Program without macro, Version 2
data segment ; this program is tested and works OK

array dw 1234h, 0abcdh,1234h, 0fdcah, 2345h
count dw 5
data ends
code segment
start: mov ax, data
mov ds, ax
mov cx, count
lea di, array
dec cx
cld
sub dx, dx
up: mov si,di
mov bp, cx
lodsw
back: mov bx, ax
lodsw
cmp ax,bx
jae down
xchg ax,bx
inc dx
down: mov [si-4], bx
loop back
mov [si-2], ax
cmp dx, cx
mov dx,cx
mov cx, bp
loopnz up
int 01
code ends
end start
Bubble sort program version 3 using an inner loop within the outer
instead of a macro
code segment para

assume cs: code, ds:code, es:code
strt: mov ax, code
mov ds, ax
mov es, ax
cld
mov si, offset arrstrt
mov cx, count
dec cx ; n data items give n-1 pairs for comparison
back1: mov bp, cx ; save count in bp for the outer loop
mov di, si ; si remains constant, di is used in the loop
sub dx, dx ; ‘exchange’ track flag, initialized to zero
mov ax, [di]
back: mov bx, [di+2] ; inner loop starts
cmp ax, bx
jle down ; smaller of the two goes to memory at [di]
; numbers interpreted as signed integers
inc dx ; alter dx from zero
xchg ax, bx
down: stosw
mov ax, bx ; the other data used in ax for next compare
loop back ; inner loop over
mov [di], ax ; the last item is also stored in memory
mov cx, bp ; recall the outer loop count
or dx, dx ; any exchange in the inner loop?
loopnz back1 ; if no, or if loop count is zero then
int 1 ; terminate
jmp strt ; given for facilitating testing again
nop ; to misalign the start of the data part.
Align 2 ; assembler directed for word alignment
arrstrt dw 1234h, 2348h, 8086h, 0abcdh, 0ffabh, 23ach
count dw ($ - arrstrt)/2
code ends
end strt
TESTING IN DEBUG
-u 0 34
13C0:0000 B8C013 MOV AX,13C0

13C0:0003 8ED8 MOV DS,AX
13C0:0005 8EC0 MOV ES,AX
13C0:0007 FC CLD
13C0:0008 BE3400 MOV SI,0034
13C0:000B 8B0E4000 MOV CX,[0040]
13C0:000F 49 DEC CX
13C0:0010 8BE9 MOV BP,CX
13C0:0012 8BFE MOV DI,SI
13C0:0014 2BD2 SUB DX,DX
13C0:0016 8B05 MOV AX,[DI]
13C0:0018 8B5D02 MOV BX,[DI+02]
13C0:001B 3BC3 CMP AX,BX
13C0:001D 7E02 JLE 0021
13C0:001F 42 INC DX
13C0:0020 93 XCHG BX,AX
13C0:0021 AB STOSW
13C0:0022 8BC3 MOV AX,BX
13C0:0024 E2F2 LOOP 0018
13C0:0026 8905 MOV [DI],AX
13C0:0028 8BCD MOV CX,BP
13C0:002A 0BD2 OR DX,DX
13C0:002C E0E2 LOOPNZ 0010
13C0:002E CD01 INT 01
13C0:0030 EBCE JMP 0000
13C0:0032 90 NOP
13C0:0033 90 NOP ; this by the assembler for aligning.
13C0:0034 3412 XOR AL,12
-d cs:34 41
13C0:0030 34 12 48 23-86 80 CD AB AB FF AC 23 4.H#.......#

13C0:0040 06 00 ..
-g
AX=1234 BX=1234 CX=0002 DX=0000 SP=0000 BP=0003 SI=0034 DI=003A

DS=13C0 ES=13C0 SS=13C0 CS=13C0 IP=0030 NV UP EI PL ZR NA PE NC
13C0:0030 EBCE JMP 0000
-d 34 3f
13C0:0030 86 80 CD AB-AB FF 34 12 48 23 AC 23 ......4.H#.#

-q
Note: The program terminated because no exchange took place. The passes with
cx = 5 and cx = 4 made exchanges; in the pass with cx = 3 there was no exchange, and the
program terminated after the loopnz instruction decremented cx to 2.
The result for the pass with cx = 5 is 1234, 8086, abcd, ffab, 2348, 23ac
With cx = 4 it is 8086, abcd, ffab, 1234, 2348, 23ac
With cx = 3 it is 8086, abcd, ffab, 1234, 2348, 23ac
with no exchange taking place for cx = 3. Hence the loop
terminates with cx becoming 2, as seen.
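The pass-by-pass results quoted in the note can be reproduced with a small model. This is a Python sketch under the assumption that the words are compared as signed 16-bit integers, as jle does in version 3; the names to_signed and bubble_passes are illustrative only.

```python
def to_signed(word):
    """Interpret a 16-bit word as a signed integer, as jle does."""
    return word - 0x10000 if word & 0x8000 else word


def bubble_passes(words):
    """Model the outer loop: stop when a pass exchanges nothing or CX hits 0."""
    a = list(words)
    cx = len(a) - 1                     # n items give n-1 comparisons
    history = []
    while cx > 0:
        exchanged = False               # the DX flag, cleared for each pass
        for i in range(cx):
            if to_signed(a[i]) > to_signed(a[i + 1]):
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanged = True
        history.append((cx, list(a)))
        cx -= 1                         # loopnz decrements CX first
        if not exchanged:               # ZF set by 'or dx, dx' ends the loop
            break
    return a, cx, history


final, cx_left, history = bubble_passes(
    [0x1234, 0x2348, 0x8086, 0xABCD, 0xFFAB, 0x23AC])
for count, state in history:
    print(count, " ".join(format(w, "04x") for w in state))
```

Each printed line gives the comparison count used for the pass and the array as it stands after that pass.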
6. Program to Copy an Array of Data: Copying a data array may have to be
done in several situations. Even a file copy can be done by a data array copy
program if we input the file as a collection of data words. This program is
also of interest because it can use ‘movsw’, a rare instance of a direct
memory to memory data transfer done by the 8086 processor. Recall at this
point that the 8086 processor is designed as a register/memory or
register/register processor, and not as a memory/memory processor. So this is
an important exception to that design feature. It uses the string move
instruction movsb or movsw with no operands, that is, with all operands
implied. The instruction movsb will do the following operation:
Copy the byte at DS:SI to the location ES:DI. The source data is in the data
segment at the address in the Source Index register SI. The destination of the
move operation is the extra segment at the offset address in the Destination
Index register DI. The provision of different segment registers for the source
and destination of the move operation permits the full range of memory to be
used. The source data as well as the destination address can be anywhere in
the total memory of 1 MB. Use of the same segment register for both source
and destination would confine the move to a single segment of
64 KB. With separate segment registers, different and remote memory
locations may be reached for the copy operation. The string move
instructions also update the addresses after the move is done.
If the direction flag D is clear, the addresses in SI and DI will be incremented
(by 1 for byte move, and by 2 for word move), while, if the direction flag D is
set, both SI and DI will be decremented appropriately.
Why this choice of address increment/decrement? In case there is overlap
between the source and destination data blocks, there could be problems as
illustrated below. Suppose the source block is 100h to 1FFh, and the destination
block is 180h onwards in the same segment, with an overlap of the blocks in
the range 180h to 1FFh. If we now start with the start of the source block at
100h and do the movsb operation, we would be copying the byte at 100h into
180h. But remember, there is another source data byte sitting there at
memory location 180h, which gets lost by this operation. Continuing will make us lose
the data in the block from 180h to 1FFh, which is overwritten with the data from
100h to 17Fh. Consider, on the contrary, that we start the transfer of data from the
other end, namely from 1FFh of the source, take it to 27Fh of the destination, and
keep decrementing the addresses to continue the copying. All our data will then be
safely transferred without loss. This requires operation with the D flag set.
With overlap, if the source start address is larger than the destination start
address, it can be checked easily that the data will be safely copied when the
addresses increase every time, that is, with the D flag cleared. In case there is no
overlap between the source and destination data blocks, working with the D
flag either set or cleared will be OK. Combining all these, we can see that if the
absolute physical address of the source array start is lower than the destination
start address, the data transfer should be done starting from the array end with
the D flag set, irrespective of whether there is overlap or not. If the source
start address is greater than the destination start, the data transfer can be done
beginning from the start of the array with the D flag cleared, irrespective of
overlap. In the trivial case where the start addresses of source and destination are
the same, no move is needed. The Block Move Program 1 below indicates
the operations when the source address is lower than the destination address
and with ES = DS. Program 2 considers the case where DS and ES happen to
be different.
Repeat Prefix: The rep prefix can be used in this context instead of writing a
transfer loop. After initializing the SI, DI and the DS, ES registers, initialize
the CX register with the byte count for movsb or the word count for movsw,
and then use the respective instruction with the rep prefix. The
move will be repeated, decrementing CX every time, until CX becomes zero.
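The effect of the D flag on an overlapping copy can be seen in a small model. The following is a Python sketch, assuming a single segment modelled as a bytearray and the 100h/180h blocks of the discussion above; rep_movsb and make_memory are illustrative names, not 8086 or DOS facilities.

```python
def rep_movsb(mem, si, di, cx, d_flag):
    """Model of 'rep movsb' in one segment; d_flag=True means D is set."""
    step = -1 if d_flag else 1
    while cx:
        mem[di] = mem[si]           # copy one byte from [SI] to [DI]
        si += step
        di += step
        cx -= 1                     # rep decrements CX until it is zero
    return si, di


def make_memory():
    """Source block 100h..1FFh holds the byte values 00h..FFh."""
    mem = bytearray(0x300)
    for i in range(0x100):
        mem[0x100 + i] = i
    return mem


# D flag clear: copy upwards from the start of the block (data is lost).
mem_up = make_memory()
rep_movsb(mem_up, 0x100, 0x180, 0x100, d_flag=False)
forward = bytes(mem_up[0x180:0x280])

# D flag set: copy downwards from the end of the block (data is safe).
mem_down = make_memory()
rep_movsb(mem_down, 0x1FF, 0x27F, 0x100, d_flag=True)
backward = bytes(mem_down[0x180:0x280])
```

In this model the upward copy leaves the first half of the source repeated twice in the destination, while the downward copy leaves a faithful copy of all 100h bytes.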
Exercise: The data block from 13D5:1322h to 13D5:1351h is to be moved to
13D7:1320h onwards. What should be the D flag setting? Give an assembly
language program in the debug environment to do the job.
Below is program 1, which copies a data array starting at the memory
location labeled blok to the offset stored in the word labeled dest, in the same
segment. In this example the destination address is above the source address and
the source and destination data blocks overlap. It is therefore necessary to
move the data from the bottom end upwards, setting the D flag. The program is
given below.
Example: Block Move Program 1:

data segment
blok dw 1234h, 5678h, 9abch, 0cdefh, 2345h, 789ah
count dw ($-blok)/2
dw 10 dup (0)
dest dw 3
data ends
code segment
assume cs:code, ds:data, es: data
strt: mov ax, data
mov ds, ax
mov es, ax
mov si, offset blok
mov di, dest
mov cx, count
mov ax, cx
dec ax
shl ax, 1
add si, ax
add di, ax
std
rep movsw
int 01
code ends
end strt
-u 0 1e
13D8:0000 B8D513 MOV AX,13D5

13D8:0007 BE0000 MOV SI,0000
13D8:000A 8B3E2200 MOV DI,[0022]
13D8:000E 8B0E0C00 MOV CX,[000C]
13D8:0014 48 DEC AX
13D8:0015 D1E0 SHL AX,1
13D8:0017 03F0 ADD SI,AX
13D8:0019 03F8 ADD DI,AX
13D8:001B FD STD
13D8:001C F3 REPZ
13D8:001D A5 MOVSW
13D8:001E CD01 INT 01
-g 1b
AX=000A BX=0000 CX=0006 DX=0000 SP=0000 BP=0000 SI=000A DI=000D
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=001B NV UP EI PL NZ NA PO NC
13D8:001B FD STD
-d 0 f
13D5:0000 34 12 78 56 BC 9A EF CD-45 23 9A 78 06 00 00 00 4.xV....E#.x....
-t
AX=000A BX=0000 CX=0006 DX=0000 SP=0000 BP=0000 SI=000A DI=000D
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=001C NV DN EI PL NZ NA PO NC
13D8:001C F3 REPZ
13D8:001D A5 MOVSW
-g
AX=000A BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=FFFE DI=0001
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=0020 NV DN EI PL NZ NA PO NC
13D8:0020 0000 ADD [BX+SI],AL DS:FFFE=DB
-d 0 f
13D5:0000 34 12 78 34 12 78 56 BC-9A EF CD 45 23 9A 78 00 4.x4.xV....E#.x.
-q
Certain features of the above program may need explanation. Firstly, the entry
for the count in the data segment (highlighted) defines the count as a word size data and
gives a simple expression: the number of data bytes = $-blok, this divided by 2 is the
number of words of the blok. The assembler will compute this value during assembly.
Secondly, it could easily be worked out in this case, that the data transfer is to be done
starting from the tail end of the data block. Using the word count in the data block it is
necessary to get the tail addresses of the source and the destination blocks. The
highlighted part of the code represents this calculation and the setting of the D-flag. If
data could be transferred starting from the head end, all this is not needed, and a simple
CLD will suffice to ensure address incrementing.
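The tail-address calculation and the downward word move can be traced with a short model. This is a Python sketch, assuming the data segment contents shown in the first dump above and a destination offset of 3 as stored in dest; rep_movsw_down is an illustrative name.

```python
def rep_movsw_down(mem, si, di, cx):
    """Model of 'std' followed by 'rep movsw' inside one segment."""
    while cx:
        mem[di:di + 2] = mem[si:si + 2]     # copy one word
        si = (si - 2) & 0xFFFF              # D flag set: addresses go down
        di = (di - 2) & 0xFFFF
        cx -= 1
    return si, di


# First 16 bytes of the data segment before the move (from the dump).
mem = bytearray.fromhex("34 12 78 56 BC 9A EF CD 45 23 9A 78 06 00 00 00")

blok, dest_offset, count = 0x0000, 0x0003, 6
tail = (count - 1) * 2                      # dec ax / shl ax, 1
si = blok + tail                            # add si, ax
di = dest_offset + tail                     # add di, ax
start_si, start_di = si, di
end_si, end_di = rep_movsw_down(mem, si, di, count)
```

The starting values of SI and DI and the final memory contents in this model can be compared with the register and memory dumps of the DEBUG session above.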
Example: Block Move Program 2:
In this example, the ES and DS segments are deliberately made different, and the
actual physical memory addresses of the source and destination of the block
move may or may not overlap. The program first computes the physical address difference
between the source and destination and then decides the setting of the
direction flag. This gives an example of a very general block move operation. In the
program below, the source array is moved to the destination location. The program is
assembled and tested as shown. The segments and data are so arranged that the move
starts from the end address with the D flag set. The result of executing the program in
the debug is shown.
; The Block2.asm program
data segment
;
arr db 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
db 16, 17, 18, 19, 20 ; source array
aln db ?
n dw ($-arr-1), 51 dup (0)
data ends
;
extra segment
ddi dw ?
dw 32 dup(0)
extra ends
;
code segment
assume cs:code, ds:data, es:extra
strt: mov ax, data
mov ds, ax
mov ax, extra
mov es, ax
cld
; the macro below compares the ds:si address with es:di address
cmpadr macro ;; this macro compares the physical address of
;; source and destination addresses; can be used
;; generally in such block move situations.
local cmp1
push dx
push cx
sub cx,cx
mov ax, ds
mov dx, es
sub ax, dx
mov dx, 16
imul dx
mov di, offset [ddi]
sub ax, di
sbb dx, cx
mov si, offset [arr]
add ax, si
adc dx, cx
or ax, dx
jz cmp1
inc cl
rol dx,1
jnc cmp1
neg cx
cmp1: mov ax, cx
pop cx
pop dx
endm
;
cmpadr
mov si, offset [arr]
mov di, offset [ddi]
or ax, ax
jz over
mov cx, n
jns down
mov ax, cx
dec ax
add si, ax
add di, ax
std
down: rep movsb
cld
over: int 1
code ends
end strt
; The program occupied 54 hex bytes of code.

; Execution of the above program gave the following results
-g
AX=0014 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=FFFF DI=FFFF
DS=13DC ES=13E4 SS=13DC CS=13E9 IP=0055 NV UP EI PL NZ NA PE NC
13E9:0055 47 INC DI
-d 0 bf
13DC:0000 00 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F ................

13DC:0010 10 11 12 13 14 00 15 00-00 00 00 00 00 00 00 00 ................
13DC:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; Note: Extra segment starts from here (13DC:0080 is same as 13E4:0000)

13DC:0080 00 01 02 03 04 05 06 07-08 09 0A 0B 0C 0D 0E 0F ................
13DC:0090 10 11 12 13 14 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; original source data : copied data
The macro cmpadr given above assumes that the ES and DS segments are different, that the
source array starts at offset arr in the data segment, and that the destination array is in the
extra segment and is to start at offset ddi. The macro preserves DX and CX, returns its
result in AX, and leaves SI and DI loaded with the two offsets. If the absolute address of
the destination array is less than that of the source array,
data can be moved from start to end of the array irrespective of the overlap to get the
correct result. For this condition, the value in AX at the end of the macro is 1. If the
destination address is greater than the source start address, then AX will have -1. In the
very rare case that they happen to be the same, in which case no move need be done at all,
AX will be 0 and the zero flag will be set at the end of the macro. However, if the data and
extra segments are the same,
the cmpadr macro need not be used and the program of Block move 1 given earlier will
be adequate.
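The comparison made by cmpadr amounts to forming the two 20-bit physical addresses and taking the sign of their difference. The sketch below is a Python model under that reading; the function names are illustrative, and wrap-around beyond 1 MB is assumed not to occur.

```python
def physical(segment, offset):
    """8086 physical address: segment * 16 + offset."""
    return (segment << 4) + offset


def cmpadr(ds, si, es, di):
    """Return 1, 0 or -1 as source is above, equal to or below destination."""
    src = physical(ds, si)
    dst = physical(es, di)
    return (src > dst) - (src < dst)


def d_flag_for_move(ds, si, es, di):
    """Pick the D flag as Block2.asm does: set it only when AX is -1."""
    return cmpadr(ds, si, es, di) < 0


# Segment values taken from the DEBUG run above: DS=13DC, ES=13E4.
result = cmpadr(0x13DC, 0x0000, 0x13E4, 0x0000)
```

With the segment values of the test run the source lies below the destination, so the model selects the downward move, in agreement with the final SI and DI values of FFFF in the dump.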
A special example of block handling – Block reversal in situ: A requirement
may sometimes arise to have an array reversed and put back in the same location without
using additional memory. The macro below will do the job. Before invoking
the macro, the register SI must be loaded with the start address offset of the array in the
data segment, and the number of entries in the array must be given as n. The
parameter bw in the invocation refers to the type of the array, byte or word: for a byte array
bw should be 1, and for a word array bw should be 2. For example, invocation of the macro
as ‘rev 2, 50’ will reverse an array of 50 words. The rev1 loop makes the offset address in
DI point to the end of the array, while the rev2 loop interchanges the start
element with the end element and moves similarly inwards towards the
middle of the array. Note how this loop is terminated at the middle of the array. Satisfy
yourself that the rev2 loop will handle the reversing operation correctly when the value of
n is even as well as when it is odd. The macro uses the DI, SI, AX and CX registers. Note
that the rev2 loop as listed exchanges data with lodsw and AX, which suits a word array;
a byte array would need the byte forms (lodsb and AL) instead.
rev macro bw, n
local rev1, rev2
mov di, si
mov ax, n
dec ax
mov cx, bw
rev1: add di, ax
loop rev1
cld
rev2: lodsw
xchg ax, [di]
mov [si-bw], ax
sub di, bw
cmp si, di
jb rev2
endm
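The two-pointer idea behind the rev2 loop can be tried out in a few lines. This is a Python sketch of the same technique on a list, assuming element indices in place of the SI and DI offsets; reverse_in_place is an illustrative name.

```python
def reverse_in_place(a):
    """Swap the ends and move inwards until the two pointers meet or cross."""
    lo, hi = 0, len(a) - 1          # SI at the start, DI at the last element
    while lo < hi:                  # cmp si, di / jb rev2
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1
    return a


even = reverse_in_place([0x1111, 0x2222, 0x3333, 0x4444])
odd = reverse_in_place([0x1111, 0x2222, 0x3333])
```

For an odd count the middle element is left where it is, which is why the loop can stop as soon as the pointers meet.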
7. Checking if a given 16-bit number is a Prime: This program is slightly less
simple compared to the other programs we have been studying. It will provide a
good example for the use of macros in a program. The logic of the program is as
follows: Step 1. Check that the number is a valid number. We consider the numbers 0
and 1 as invalid inputs for this program. All other numbers are valid. Step 2.
Check if the number is even, that is, divisible by 2. This check can be easily done
using the rotate instructions. Rotate right, followed by rotate left, will put the least
significant bit of the number in the carry flag and still keep the original number
unaltered. This is a simple way to check the odd/even feature of any number
while saving the number without a change. Step 3. Try if the number is divisible by
odd numbers one by one, by successively dividing the number by the odd numbers
and testing if the remainder is zero. Stop whenever the remainder becomes zero
and declare the number not prime. The process can be improved if, after seeing
that the number is not divisible by 3, we skip division by odd numbers which are
multiples of 3. The sequence of numbers used as trial divisors is thus 3, 5, 7, 11,
13, 17 etc. Step 4. Terminate the process and declare the number a prime when
the square of the trial divisor exceeds the given number. The logic of this step is
that if a given number has no factor less than or equal to its square root, then it
cannot have a factor bigger than this square root either, because a factor bigger than the
square root implies that the quotient of the division is definitely less than the
square root, and this quotient is also a factor of the given number. If we have not found such a
number evenly dividing the given number, there cannot be a factor greater than the
square root. In the program given below, the process of checking if the next trial
divisor is greater than the square root and dividing to see if it is a factor of the given
number is bundled into a macro. The program follows.
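The four steps above can be tried out in a high level language before reading the assembly listing. The following Python sketch assumes a 16-bit input and an 8-bit trial divisor, as in the program that follows; check_prime and its return values are illustrative and are not the register conventions of the listing.

```python
from itertools import chain, cycle


def check_prime(n):
    """Return (verdict, factor) for a 16-bit number n."""
    if n < 2:                               # Step 1: 0 and 1 are invalid
        return "invalid", None
    if n == 2:
        return "prime", None
    if n % 2 == 0:                          # Step 2: even numbers
        return "not prime", 2
    d = 2
    # Step 3: trial divisors 3, 5, then 7, 11, 13, 17, ... (steps of 2 and 4)
    for step in chain((1, 2), cycle((2, 4))):
        d += step
        if d > 0xFF or d * d > n:           # Step 4: past the square root
            return "prime", None
        if n % d == 0:
            return "not prime", d


verdict_ffdf = check_prime(0xFFDF)
verdict_ffef = check_prime(0xFFEF)
```

Because the trial divisors are taken in increasing order, the factor reported for a composite number in this model is its smallest prime factor.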
The Assembly language program
; This program considers an input no. in the register ax. The output from the
; program is in register ax. If ax = 0 at output, then the number is a prime,
; else if ax = -1 (FFFF H), then the number is not a prime. In this case, the
; smallest prime factor of the number is in cx register. Else, if ax has the
; number ABCD H, then the input number is invalid (0 or 1). In this case, the
; CY flag also will be set. In all other cases, the CY flag would be reset.
; The input will be found in bx at the output stage.
code segment
assume cs:code
strt: mov bx, ax; the number input in ax, is saved in bx
mov cx, 2
cmp cx, bx
ja invalid
jz prime
ror bx, 1
rol bx, 1 ; lsb is now in CY also, and bx is unaltered
jnc nprime
checkp macro n
add cl,n ; get the next trial factor
jc prime ; if cl exceeds 8 bits, the number is prime
mov ax, cx
mul cx ; get the square of the number; this will also make dx = 0
cmp ax, bx
jz nprime ; if the number equals the square, then it is not prime
ja prime ; if square is greater, then it is prime
mov ax, bx
div cx
or dx,dx ; if no remainder, then
jz nprime ; the number is not a prime
endm
checkp 1 ; testing trial factor 3, adding 1 to cx register

checkp 2 ; testing trial factor 5, add 2 to cx now
back: checkp 2 ; testing trial factors 7, 13, 19, etc
checkp 4 ; testing trial factors 11, 17, 23, etc
jmp back ; loop to increase the trial factor by 6
invalid: mov ax, 0abcdh
stc
jmp finish
prime: mov ax, 00
jmp next
nprime: mov ax, -1
next: clc
finish: int 01
jmp strt ; used for repeated testing in the debug
code ends
end strt
Testing in debug
-u 0 82

13D5:0002 B90200 MOV CX,0002
13D5:0005 3BCB CMP CX,BX
13D5:0007 7766 JA 006F
13D5:0009 746B JZ 0076
13D5:000B D1CB ROR BX,1
13D5:000D D1C3 ROL BX,1
13D5:000F 736B JNB 007C
13D5:0011 80C101 ADD CL,01
13D5:0014 7260 JB 0076
13D5:0018 F7E1 MUL CX
13D5:001A 3BC3 CMP AX,BX
13D5:001C 745E JZ 007C ; macro checkp 1, expanded
13D5:001E 7756 JA 0076
13D5:0022 F7F1 DIV CX
13D5:0024 0BD2 OR DX,DX
13D5:0026 7454 JZ 007C
13D5:0028 80C102 ADD CL,02
13D5:002B 7249 JB 0076
13D5:002D 8BC1 MOV AX,CX
13D5:002F F7E1 MUL CX
13D5:0033 7447 JZ 007C ; macro checkp 2, expanded
13d5:0035 773F JA 0076
13D5:0039 F7F1 DIV CX
13D5:003B 0BD2 OR DX,DX
13D5:003D 743D JZ 007C
13D5:003F 80C102 ADD CL,02
13D5:0042 7232 JB 0076
13D5:0046 F7E1 MUL CX
13D5:004A 7430 JZ 007C ; macro checkp 2, expanded
13D5:004C 7728 JA 0076
13D5:004E 8BC3 MOV AX,BX
13D5:0050 F7F1 DIV CX
13D5:0052 0BD2 OR DX,DX
13D5:0054 7426 JZ 007C
13D5:0056 80C104 ADD CL,04

13D5:0059 721B JB 0076
13D5:005B 8BC1 MOV AX,CX
13D5:005D F7E1 MUL CX
13D5:005F 3BC3 CMP AX,BX
13D5:0061 7419 JZ 007C ; macro checkp 4, expanded
13D5:0063 7711 JA 0076
13D5:0067 F7F1 DIV CX
13D5:0069 0BD2 OR DX,DX
13D5:006B 740F JZ 007C
13D5:006D EBD0 JMP 003F
13D5:006F B8CDAB MOV AX,ABCD
13D5:0072 F9 STC
13D5:0073 EB0B JMP 0080
13D5:0075 90 NOP
13D5:0076 B80000 MOV AX,0000
13D5:0079 EB04 JMP 007F
13D5:007B 90 NOP
13D5:007C B8FFFF MOV AX,FFFF
13D5:007F F8 CLC
13D5:0080 CD01 INT 01
13D5:0082 E97BFF JMP 0000
Executing the program in the debug
-rax
AX 0000
:ffdf
-g
AX=FFFF BX=FFDF CX=001F DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

13D5:0082 E97BFF JMP 0000; 1F is the smallest prime factor of FFDF
-rax
AX FFFF
:ffef
-g
AX=0000 BX=FFEF CX=0001 DX=00F5 SP=0000 BP=0000 SI=0000 DI=0000

DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0082 NV UP EI PL NZ AC PO NC
13D5:0082 E97BFF JMP 0000; FFEF is a Prime
-g
; note 0 is in ax register for this run
AX=ABCD BX=0000 CX=0002 DX=00F5 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0082 NV UP EI PL NZ NA PO CY
13D5:0082 E97BFF JMP 0000; The input is invalid in this case
Note: In the above program, there are a lot of unnecessary operations, since division is
tested by all numbers which are not multiples of 2 or 3. The checks by
the extra numbers could be easily avoided if a table of prime numbers less than 256 is
provided. Such a list is given below, in the data segment of the program listing, and it is
seen that there are only 53 such numbers (excluding 2); it is thus adequate if only these
numbers are tested for division of the given number. In the program given earlier,
we check 2 numbers in a group of 6, which means we check about 84 numbers in all up to
256, if we cover the full range. The program here requires only a very slight
modification from the program given earlier, and is presented below.
Here is the program:
data segment
; here is the table of primes ending with -1.
prlst db 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61
db 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
db 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193
db 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 255
data ends
code segment
assume cs:code, ds:data
strt: mov bx, ax; the number input in ax, is saved in bx
mov ax, data
mov ds, ax
mov cx, 2
cmp cx, bx
ja invalid
jz prime
ror bx, 1
rol bx, 1 ; lsb is now in CY also, and bx is unaltered
jnc nprime
cld
sub ch, ch
lea si, prlst
checkp macro ; this macro checks if the next prime

; is a factor of the given number
sub dx, dx ; prepare for word division
lodsb ; get the next prime from the table
cmp al, -1 ; check for end of table
jz prime ; if yes, the number is prime
mov cl, al
mul cl
cmp ax, bx
jz nprime
ja prime
mov ax, bx
div cx ; word division; reason out why?
or dx, dx
jz nprime
endm
;
back: checkp
jmp back ;in this loop we check if the next prime is a factor.
invalid: mov ax, 0abcdh
stc
jmp finish
prime: mov ax, 00
jmp next
nprime: mov ax, -1
next: clc
finish: int 01
code ends
end strt
The reader should be able to check the validity of the program.
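The two counts quoted in the note before this program, 53 table entries against about 84 trial divisors, can be checked with a short calculation. This is a Python sketch using a simple sieve; primes_below is an illustrative helper and not part of the listing.

```python
def primes_below(limit):
    """Sieve of Eratosthenes: list of all primes less than limit."""
    sieve = [True] * limit
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit, p):
                sieve[multiple] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


# Entries of the table prlst, without the terminating 255.
odd_primes = [p for p in primes_below(256) if p != 2]

# Trial divisors of the earlier program: 3, then numbers of the form 6k-1, 6k+1.
candidates = [3] + [d for d in range(5, 256) if d % 6 in (1, 5)]

print(len(odd_primes), len(candidates))
```

The difference between the two lengths is the number of divisions saved, at most, when the full range has to be covered.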
8. A program for counting the leading 0’s in a 32 bit data in regs. dx:ax
While handling the normalization process in floating point numbers, it
becomes necessary to count the leading 0’s in double length numbers. The
program is given below without any comment or demonstration of its
working. Interested readers may take it as an exercise and write
appropriate comments and also study its working by assembling it and
testing it in DEBUG. The program returns the input data intact in dx:ax,
and the count of leading zeros in cx.
THE PROGRAM FOR COUNTING THE LEADING ZEROS OF DATA IN DX:AX REGISTERS
code segment
assume cs:code
strt: Push dx
push ax
sub cx, cx
or dx, dx
jnz d16
mov cx, 16
mov dx, ax
d16: or dh, dh
jnz d8
add cx, 8
mov dh, dl
d8: test dh, 0f0h
jnz d4
add cx, 4
shl dh, 1
shl dh, 1
shl dh, 1
shl dh, 1
d4: test dh, 0C0h
jnz d2
add cx, 2
shl dh, 1
shl dh, 1
d2: test dh, 080h
jnz d1
add cx, 1
d1: or dh, dh
jnz d0
add cx, 1
d0: pop ax
pop dx
int 01
jmp strt ; to repeat with a new set of data
code ends
end strt
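Readers who take up the exercise may find it useful to compare their reading of the program with a model. The sketch below is a Python rendering of the same halving steps (16, 8, 4, 2, 1), written as one interpretation of the listing; clz32 is an illustrative name and the variable names simply mirror the registers.

```python
def clz32(dx, ax):
    """Count leading zeros of the 32-bit value DX:AX by successive halving."""
    cx = 0
    if dx == 0:                     # upper word empty: 16 zeros so far
        cx = 16
        dx = ax
    dh, dl = dx >> 8, dx & 0xFF
    if dh == 0:                     # upper byte empty: 8 more
        cx += 8
        dh = dl
    if (dh & 0xF0) == 0:            # upper nibble empty: 4 more
        cx += 4
        dh = (dh << 4) & 0xFF
    if (dh & 0xC0) == 0:            # top two bits empty: 2 more
        cx += 2
        dh = (dh << 2) & 0xFF
    if (dh & 0x80) == 0:            # top bit empty: 1 more
        cx += 1
    if dh == 0:                     # nothing set at all: the input was zero
        cx += 1
    return cx


zeros = clz32(0x0000, 0x0100)       # value 00000100h
```

The final test on DH only adds to the count when the whole input is zero, which brings the result to 32 in that case.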
9. A slightly different program for counting leading zeros in a 64 bit data
stored in registers AX, BX, DX, BP with AX being the most
significant word: Here is a different approach for the whole process.
The intention in this program is to make the program relatively small and
free of any serious logical complexity. However, it does take more
time compared to the earlier program studied. The program and partial
testing of it are given below without much comment. The program shifts
the data so that the first non-zero bit becomes the leading bit of reg. AX, with the leading
zeros brought round as trailing zeros.
; This program counts the leading 0's in the quad word stored in regs.
; ax, bx, dx, bp with high word in ax. The program also shifts the data
; so that the first non zero bit comes as msb in ax with the entire data
; shifted in all the 4 registers inserting as many trailing zeros as required.
; the count of the leading zeros is available in cx.
;
code segment
assume cs: code
Strt: mov cx, 41h ; 1 extra count - the loop starts with loopz instn.
jmp down
back: add bp, bp
adc dx, dx
adc bx, bx
adc ax, ax
down: test ah, 80h
loopz back
sub cx, 40h
neg cx
int 1
jmp strt
code ends
end strt
-u 0 18
13DC:0000 B94100 MOV CX,0041

13DC:0003 EB09 JMP 000E
13DC:0005 90 NOP
13DC:0006 03ED ADD BP,BP
13DC:0008 13D2 ADC DX,DX
13DC:000A 13DB ADC BX,BX
13DC:000C 13C0 ADC AX,AX
13DC:000E F6C480 TEST AH,80
13DC:0011 E1F3 LOOPZ 0006
13DC:0013 83E940 SUB CX,+40
13DC:0016 F7D9 NEG CX
13DC:0018 CD01 INT 01
-r bp
BP 0000 ;initially ax, bx, dx, bp are all 0’s. bp is made = 0011h.
:11
-g
AX=8800 BX=0000 CX=003B DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001A NV UP EI PL NZ AC PO CY
13DC:001A EBE4 JMP 0000
-rax
AX 8800
:0
-g
AX=0000 BX=0000 CX=0040 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001A NV UP EI PL NZ NA PO CY
13DC:001A EBE4 JMP 0000
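The counting scheme with the extra count of 41h can be followed in a small model. This Python sketch treats the four registers as one 64-bit integer and imitates loopz as "decrement CX, then loop while CX is not zero and the msb is clear"; normalize64 is an illustrative name.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def normalize64(value):
    """Return (leading zero count, shifted value) as left in CX and AX:BX:DX:BP."""
    cx = 0x41                               # one extra count for the first test
    while True:
        msb_clear = (value >> 63) == 0      # test ah, 80h
        cx -= 1                             # loopz decrements CX first
        if not (msb_clear and cx != 0):
            break
        value = (value << 1) & MASK64       # add bp,bp and the three adc's
    return 0x40 - cx, value                 # sub cx, 40h / neg cx


count_11, shifted_11 = normalize64(0x0011)  # the BP = 0011h test case
count_0, shifted_0 = normalize64(0)
```

The two sample calls correspond to the two DEBUG runs shown above, so the counts and the shifted value can be compared with CX and AX in the register dumps.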
10. A program for taking in 4 hex digits from the keyboard using the
DOS interrupt 21h, function 1: In the following program, we take a 4-digit
hex input from the keyboard. To keep the program applicable
in a variety of situations, any non-hex key
pressed accidentally is ignored. Also, if a wrong entry is made, as seen by the echo
on the monitor, all the 4 hex digits can be entered again without having
to complete and restart the entry. The program takes in only the last 4 hex
digits entered. The entry is terminated by the character ‘$’. The program
and the assembled version are presented below. A macro is used and it is
highlighted in the listing.
assume cs:code
0000 code segment
0000 B9 0004 strt: mov cx,4 ; shift count
0003 BB 3030 mov bx,3030h
0006 8B D3 mov dx, bx
0008 B4 01 here: mov ah,1 ;
000A CD 21 int 21h ; input from keyboard with echo
000C 3C 24 cmp al, '$'
000E 74 1C jz next ; end of input
0010 3C 30 cmp al, 30h
0012 72 F4 jb here
0014 3C 3A cmp al,3ah
0016 72 0A jb next1
0018 24 DF and al, 0dfh; convert lower-to-upper case
001A 3C 41 cmp al, 41h
001C 72 EA jb here
001E 3C 46 cmp al, 46h
0020 77 E6 ja here
0022 8A F2 next1: mov dh, dl ; rotate regs to make way for
; fresh ascii input
0024 8A D7 mov dl, bh
0026 8A FB mov bh, bl
0028 8A D8 mov bl, al
002A EB DC jmp here ; get fresh ascii input
a2h macro r1 ;; ascii-to-hex conversion
local down0
cmp r1, 40h
jb down0
sub r1,7
down0:sub r1, 30h
endm
002C next: a2h bl
002C 80 FB 40 1 cmp bl, 40h
002F 72 03 1 jb ??0000
0031 80 EB 07 1 sub bl,7
0034 80 EB 30 1 ??0000:sub bl, 30h
a2h bh
0037 80 FF 40 1 cmp bh, 40h
003A 72 03 1 jb ??0001
003C 80 EF 07 1 sub bh,7
003F 80 EF 30 1 ??0001:sub bh, 30h
0042 D2 C7 rol bh, cl
0044 0A DF or bl, bh
a2h dl
0046 80 FA 40 1 cmp dl, 40h
0049 72 03 1 jb ??0002
004B 80 EA 07 1 sub dl,7
004E 80 EA 30 1 ??0002:sub dl, 30h
0051 8A FA mov bh, dl
a2h dh
0053 80 FE 40 1 cmp dh, 40h
0056 72 03 1 jb ??0003
0058 80 EE 07 1 sub dh,7
005B 80 EE 30 1 ??0003:sub dh, 30h
005E D2 C6 rol dh, cl
0060 0A FE or bh, dh
0062 CD 01 int 1
0064 EB 9A jmp strt
0066 code ends
end strt
Result of executing the program

-g
12344Bxcdyo$; the input list
AX=0124 BX=4BCD CX=0004 DX=A00B SP=0000 BP=0000 SI=0000 DI=0000

DS=13B1 ES=13B1 SS=13C1 CS=13C1 IP=0064 NV UP EI NG NZ NA PO NC
Notice that the non-hex inputs are ignored and the last 4 hex keys are presented
as a 4-digit hex number in the register BX. The ASCII code (24h) of the
character ‘$’, which caused the program to terminate, is seen in the
register AL.
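The filtering and shifting done by the input loop can be modelled in a few lines of Python. This is only an illustrative sketch and not part of the original listing: the function name last_four_hex is an assumption introduced here, and the input is assumed to be plain ASCII. It mirrors the comparisons made in the loop (ignore anything below '0', accept '0' to '9', fold letters to upper case and accept 'A' to 'F') and keeps the last four accepted keys.

```python
def last_four_hex(keys):
    """Model of the keyboard loop: keep the last four valid hex keys before '$'."""
    digits = ["0", "0", "0", "0"]          # BX = DX = 3030h: four ASCII zeros
    for ch in keys:
        if ch == "$":                      # end of input
            break
        code = ord(ch)
        if code < 0x30:                    # below '0': ignored
            continue
        if code >= 0x3A:                   # not '0'..'9': try a letter
            code &= 0xDF                   # fold lower case to upper case
            if code < 0x41 or code > 0x46:
                continue                   # not 'A'..'F': ignored
        digits = digits[1:] + [chr(code)]  # shift the new digit in, drop the oldest
    return int("".join(digits), 16)


print(hex(last_four_hex("12344Bxcdyo$")))  # 0x4bcd
```

With fewer than four valid keys, the initial ASCII zeros remain in the leading positions, which is why the program preloads BX and DX with 3030h.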
11. Interrupt 21h function 7: Often we need to pause in the
middle of a program, perhaps until we finish reading the material already
displayed, before producing further display. Function 07
of DOS interrupt 21h is helpful here. This function causes a program
to wait until a key is pressed and only then allows the program to proceed.
The ASCII code of the key pressed is available in the AL register, as in
function 1 of interrupt 21h, but function 7 does not echo the character
pressed to the standard output of the system (that is, the monitor). A
simple program to demonstrate this function is given below:
; A simple program - waits for a key press and
; returns ‘OK’ on the monitor when any key is pressed.
; File name: ok.asm

assume cs:code
0000 code segment
0000 B4 07 start: mov ah, 7
0002 CD 21 int 21h ; wait for key press
0004 B4 02 mov ah, 2
0006 B2 4F mov dl, 'O' ; note: single or double
0008 CD 21 int 21h ; quotes are okay for speci-
000A B2 4B mov dl, "K" ; fying display 'O’ “K”
000C CD 21 int 21h
000E B4 4C mov ah, 4ch ; function call to terminate
0010 CD 21 int 21h ; and return to the system
0012 code ends
end start
; This program could be directly tested in the DOS environment as follows:

; First the ok.asm file is assembled and then linked to produce an executable
; file ok.exe; then it is executed in the Dos environment by the simple command
; ok, the program is seen to wait till a key is pressed and when it happens,
; the word ‘OK’ is displayed on the monitor. Below is a demo:
C:\DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1> ok ; no result for this
; command until any
; key is pressed.
; Then we get
OK ; ‘OK’ as seen (left),
; and the program
C:\DOCUME~1\acer\MYDOCU~1\MYFILE~1\REF~1.MAT\DOSPRO~1> ; returns control to
; DOS (terminates).
There are several other interrupt 21H functions, many of which are useful
for controlling different input/output devices. Information on these functions is readily
available on the internet. They simplify I/O operations such as disk reading/writing,
video display handling etc. It is not the purpose here to go into these ready-made
routines and their use.
_____xxxx_____
EXERCISES
1. Find the logic of the following 4-digit BCD to hex converter program. Input is a 4-
digit BCD in reg AX, and output in reg DX.
Hint: this is a divide by 2 operation to get bits of the result.
code segment
assume cs: code
strt: mov cx, 16
sub dx, dx
next: mov bx, ax
and bx, 1110h
shr bx, 1
shr bx, 1
sub ax, bx
shr bx, 1
sub ax, bx
shr ax, 1
rcr dx, 1
loop next
int 1
code ends
end strt
TESTING IN DEBUG
-u 0 1d
13DC:0000 B91000 MOV CX,0010

13DC:0003 2BD2 SUB DX,DX
13DC:0005 8BD8 MOV BX,AX
13DC:0007 81E31011 AND BX,1110
13DC:000B D1EB SHR BX,1
13DC:000D D1EB SHR BX,1
13DC:000F 2BC3 SUB AX,BX
13DC:0011 D1EB SHR BX,1
13DC:0013 2BC3 SUB AX,BX
13DC:0015 D1E8 SHR AX,1
13DC:0017 D1DA RCR DX,1
13DC:0019 E2EA LOOP 0005
13DC:001B CD01 INT 01
13DC:001D 90 NOP
-rax
AX 0000
:9999
-g
AX=0000 BX=0000 CX=0000 DX=270F SP=0000 BP=0000 SI=0000 DI=0000

DS=13CC ES=13CC SS=13DC CS=13DC IP=001D NV UP EI PL ZR NA PE NC
13DC:001D 90 NOP
The program can be improved as shown; also this gives a partial hint as
to the operations done:
; This program converts 4 digit BCD to binary. Input in reg AX
; output is also returned in AX; uses regs BX, CX and DX
; the program continuously divides the BCD data by 2
; to get the 10 lsb's. The 4 msb's are then got simply
; by rotating and ORing as is, and further rotated right by 2 more
; bits (these bits are just 0’s) to properly align the hex result.
code segment
assume cs: code
strt: mov cx, 10
sub dx, dx
next: mov bx, ax
and bx, 1110h
shr bx, 1
shr bx, 1
sub ax, bx
shr bx, 1
sub ax, bx
shr ax, 1
rcr dx, 1
loop next
mov cl, 6
ror ax, cl
ror dx, cl
or ax, dx
int 1
code ends
end strt
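For readers who wish to experiment with the divide-by-2 idea before working out the logic, the 16-iteration program can be modelled in Python. This is a sketch only: the function name bcd4_to_binary is an assumption made here, and the 16-bit registers are imitated by masking with 0FFFFh. Each pass halves the BCD number in AX and rotates the remainder bit into DX.

```python
def bcd4_to_binary(ax):
    """Model of the 16-pass loop: AX holds 4-digit BCD, result is built in DX."""
    dx = 0
    for _ in range(16):
        bx = ax & 0x1110            # lsb of each of the three upper digits
        bx >>= 2                    # shr bx,1 twice
        ax = (ax - bx) & 0xFFFF     # sub ax, bx
        bx >>= 1                    # shr bx,1
        ax = (ax - bx) & 0xFFFF     # sub ax, bx (6 removed per set lsb in total)
        carry = ax & 1              # bit that shr ax,1 moves into the carry flag
        ax >>= 1                    # shr ax, 1
        dx = (dx >> 1) | (carry << 15)   # rcr dx, 1
    return dx


print(hex(bcd4_to_binary(0x9999)))  # 0x270f
```

The correction of 6 per set bit reflects the fact that a 1 in the lsb of a higher BCD digit is worth 10 in the next lower digit, whereas a plain binary shift would treat it as 16.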
2. The program below reverses an array in-situ, using the array start and array end
addresses in regs SI and DI. Study the logic and the clever use of the string instructions
in the program; also study the loop control adopted in the program without using reg cx.
Check that the program works both for even number of elements in the array as well as
odd number of elements. Check that the central element in an array with odd elements is
left as it is and not handled at all by the program.
; This program reverses an array in-situ
; watch the clever use of string instructions here
; watch also the array loop control without using reg. cx
data segment
array db 1, 2, 3, 4, 5, 6
arr_end db 7
Data ends
;
code segment
assume cs:code, ds: data, es: data
start: mov ax, data
mov ds, ax
mov es, ax
mov si, offset array ; start address of the array
mov di, offset arr_end ; end address of the array
std
back: mov al, [di]
xchg al, [si]
stosb
inc si
cmp si, di
jb back
int 1
code ends
end start
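The loop control of the reversing program can be checked for odd and even array lengths with a small Python model. This is a sketch under stated assumptions: the name reverse_in_place is introduced here for illustration, the list mem stands for the data segment, and si and di are list indices standing for the two registers.

```python
def reverse_in_place(mem, si, di):
    """Reverse mem[si..di] the way the SI/DI loop does."""
    while True:
        al = mem[di]                   # mov al, [di]
        al, mem[si] = mem[si], al      # xchg al, [si]
        mem[di] = al                   # stosb stores AL at [di] ...
        di -= 1                        # ... and decrements DI because DF = 1
        si += 1                        # inc si
        if si >= di:                   # cmp si, di / jb back
            break
    return mem


print(reverse_in_place([1, 2, 3, 4, 5, 6, 7], 0, 6))  # [7, 6, 5, 4, 3, 2, 1]
```

For an odd number of elements the two indices meet at the central element and the loop stops without touching it; for an even number they cross after the last swap.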
3. Write an appropriate 8086 assembly language program to test the array reversing
macro under section 6 of this Chapter. Test the working of the program.
4. Study the various int 21H functions from the internet, and write small programs to
use some of them. The site at:
bbc.nvg.org/doc/Master%20512%20Technical%20Guide/m512techb_int21.htm
for example, gives good information.
6. ILLUSTRATING THE POWER OF THE 8086 PROCESSOR
Introduction to Handling Complex Programs: As discussed in Chapter 3, it is
necessary to have a clear idea of the steps involved in the program and to take care to
allocate proper registers for handling the different variables in the program. The
algorithms should be chosen so that the capabilities of the registers and of the instruction
set can be fully exploited. In this chapter, we shall see some reasonably complex
programs of the number crunching type. We will start with multi-word hex multiplication
and then progress to multi-word decimal multiplication. The decimal facility
provided in the 8086 is limited, so we will use a hybrid hex/decimal system to do the
job. Add/sub operations on multi-word numbers are relatively simple. Division is
difficult, as we shall see. We will also see a factorial computation for large decimal
numbers and the building up of a prime number table.
1. Multiplying a multi-word Hex number by a single word Hex number: In
this problem, the basic operation is a simple word-by-word multiplication. Doing the
multi-word by multi-word type of operation involves two nested rounds, as in handling
matrices. In the first round, a single word of the multiplier multiplies the
complete multiplicand word string, which we see in this section. In the second round,
this operation is repeated as many times as there are words in the multiplier string, as
we will see in the next section. The first round of multiplication can be easily
encapsulated in a macro as indicated below. The assembly language version, the
machine language version and the testing of the program in debug are all shown
below, for a multi-word by single word multiplication.
code segment
Assume cs: code, ds:code, es:code
;
mmul macro
local again
xor bp,bp
again: lodsw
mul bx
xchg dx,bp
add ax, dx
adc bp, 0
stosw
loop again
mov [di], bp
endm
;
start: mov ax, cs
mov ds, ax
mov es, ax
mov si, offset mpd
mov di, offset prd
mov bx, 0abcdh ; multiplier
cld
mmul ; cx will have to be loaded manually during execution
int 1
;
align 2
mpd dw 1234h, 56feh, 67abh, 89cdh
prd dw 5 dup (0)
;
code ends
end start
Testing in debug
-u 0 21
13DC:0000 8CC8 MOV AX,CS
13DC:0002 8ED8 MOV DS,AX
13DC:0004 8EC0 MOV ES,AX
13DC:0006 BE2400 MOV SI,0024
13DC:0009 BF2C00 MOV DI,002C
13DC:000C BBCDAB MOV BX,ABCD
13DC:000F FC CLD
13DC:0010 33ED XOR BP,BP
13DC:0012 AD LODSW
13DC:0013 F7E3 MUL BX
13DC:0015 87D5 XCHG DX,BP
13DC:0017 03C2 ADD AX,DX
13DC:0019 83D500 ADC BP,+00
13DC:001C AB STOSW
13DC:001D E2F3 LOOP 0012
13DC:001F 892D MOV [DI],BP
13DC:0021 CD01 INT 01
-r
-d cs:24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 00 00 00 00 4..V.g......
13DC:0030 00 00 00 00 00 00 ......
-rcx
CX 0036
:4 ; number of words in the multiplicand loaded manually.
-r
AX=0000 BX=ABCD CX=0004 DX=0000 SP=0000 BP=0000 SI=0024 DI=002C
-g
AX=8DBB BX=ABCD CX=0000 DX=4592 SP=0000 BP=5C7A SI=002C DI=0034
DS=13DC ES=13DC SS=13DC CS=13DC IP=0023 NV UP EI PL NZ NA PO NC
13DC:0023 90 NOP
-d 24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 A4 4F 9D 5F 4..V.g...O._
13DC:0030 50 77 BB 8D 7A 5C Pw..z\
It can easily be checked that multiplication of hex no: ‘89cd 67ab 56fe 1234’ by
hex no: ‘abcd’ is equal to hex no: ‘5c7a 8dbb 7750 5f9d 4fa4’, using the
scientific calculator of the system in the hex mode in 2 rounds.
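The same check can be made with a short Python model of the mmul macro. This is an illustrative sketch and not the author's code: the function name mul_words_by_word is an assumption, the word list is given least significant word first as in the mpd array, and the variable carry plays the role of register BP.

```python
def mul_words_by_word(words, multiplier):
    """Model of the mmul macro: multi-word number (low word first) times one word."""
    carry = 0                              # xor bp, bp
    out = []
    for w in words:                        # lodsw
        prod = w * multiplier              # mul bx -> DX:AX
        low = (prod & 0xFFFF) + carry      # xchg dx, bp / add ax, dx
        carry = (prod >> 16) + (low >> 16) # adc bp, 0
        out.append(low & 0xFFFF)           # stosw
    out.append(carry)                      # mov [di], bp
    return out


result = mul_words_by_word([0x1234, 0x56FE, 0x67AB, 0x89CD], 0xABCD)
print(" ".join(f"{w:04X}" for w in reversed(result)))  # 5C7A 8DBB 7750 5F9D 4FA4
```

The words printed, most significant first, agree with the bytes stored at offsets 002C to 0035 in the debug dump above.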
2. Multiplication of m-word hex number by n-word hex number: To
do this operation, we need more registers than the 8086 can provide. So we
may use the stack frame as an extension of the register set. We may pass the parameters
through the registers, but in the subroutine we may put them in a stack frame so that they
can be used whenever there is a need without being lost. The parameters required are
m, n, and the start addresses of the multiplicand array of m words and of the multiplier
array of n words. The operations involved are obtaining the multi-word partial product
of each multiplier word with the complete multiplicand, and adding these partial products
with proper alignment. An additional temporary word array of (m+1) words is required to
store the partial result of multiplying the complete multiplicand by a single multiplier
word. Encapsulating the (word * multi-word) multiplication in a macro is not needed here
(the macro for this was used only once in example 1 above), and we shall write the
complete operation as a subroutine,
with the parameters passed through the registers of the processor.
data segment
mpd dw 9fedh, 8abch, 7efah, 0fdabh ; multiplicand
dw 252 dup (0) ; multiplicand can go upto a total of 256 words.
mpr dw 0f123h, 9cdeh, 8754h, 1156h, 3478h, 73fbh ; multiplier
dw 250 dup (0) ; multiplier can also be upto 256 words
prod dw 512 dup (0) ; product array has a space of 512 words
temp dw 257 dup (?) ; temporary use, (word*256 word) = 257 words
dw 15 dup (0) ; extra space
m dw 4
n dw 6
data ends
;
code segment
strt: mov ax, data
mov ds, ax
mov es, ax
mov ax, n
mov cx, m
mov bx, offset mpr ; addresses
mov si, offset mpd
mov dx, offset temp
mov di, offset prod
call mmmult
int 1
mmmult proc near
;initialising
;prepare the stack frame
push dx ; address of temp [bp + 12]
push bx ; address of mpr [bp + 10]
push si ; address of mpd [bp + 8]
push di ; address of prod [bp + 6]
push ax ; value of n [bp + 4]
push cx ; value of m [bp + 2]
push bp
mov bp, sp
sub ax, ax
push ax
push ax ; 2 word locations, for local variables in the stack
; frame: [bp - 2] partial prod, [bp - 4] mpr word
; position (outer loop index).
cld
; now proceed to clear the temp space
; outer loop starts here
olup: sub ax,ax
mov cx, 257
mov di, offset temp
rep stosw
mov bx, [bp + 10]
add bx, [bp - 4]
mov bx, [bx]
mov di, [bp + 12]
mov si, [bp + 8]
mov cx,[bp + 2]
; inner loop starts now.
ilup: lodsw
mul bx
xchg dx, [bp - 2]
add ax, dx
adc word ptr[bp - 2], 0
stosw
loop ilup
mov ax, [bp - 2]
stosw
; inner loop over
mov cx, [bp + 2]
inc cx
mov si, [bp + 12]
mov di, [bp + 6]
add di, [bp - 4]
clc
; another loop nested inside the outer loop
comp: lodsw
adc ax, [di]
stosw
loop comp
; nested loop completed
mov [bp - 2], cx ;clear [bp - 2], note cx = 0 here.
add word ptr [bp - 4], 2
mov cx, word ptr[bp + 4]
add cx, cx
cmp cx, word ptr [bp - 4]
jnz olup
; outer loop over - prepare to return
mov sp, bp ; unwind the stack frame and clear the stack
pop bp
pop cx
pop ax
pop di
pop si
pop bx
pop dx
ret ; and return
mmmult endp
code ends
end strt
-u 0 89
147F:0000 B8DC13 MOV AX,13DC

147F:0003 8ED8 MOV DS,AX
147F:0005 8EC0 MOV ES,AX
147F:0007 A1220A MOV AX,[0A22]
147F:000A 8B0E200A MOV CX,[0A20]
147F:000E BB0002 MOV BX,0200
147F:0011 BE0000 MOV SI,0000
147F:0014 BA0008 MOV DX,0800
147F:0017 BF0004 MOV DI,0400
147F:001A E80200 CALL 001F
147F:001D CD01 INT 01
147F:001F 52 PUSH DX
147F:0020 53 PUSH BX
147F:0021 56 PUSH SI
147F:0022 57 PUSH DI
147F:0023 50 PUSH AX
147F:0024 51 PUSH CX
147F:0025 55 PUSH BP
147F:0026 8BEC MOV BP,SP
147F:0028 2BC0 SUB AX,AX
147F:002A 50 PUSH AX
147F:002B 50 PUSH AX
147F:002C FC CLD
147F:002D 2BC0 SUB AX,AX
147F:002F B90101 MOV CX,0101
147F:0032 BF0008 MOV DI,0800
147F:0035 F3 REPZ
147F:0036 AB STOSW
147F:0037 8B5E0A MOV BX,[BP+0A]
147F:003A 035EFC ADD BX,[BP-04]
147F:003D 8B1F MOV BX,[BX]
147F:003F 8B7E0C MOV DI,[BP+0C]
147F:0042 8B7608 MOV SI,[BP+08]
147F:0045 8B4E02 MOV CX,[BP+02]
147F:0048 AD LODSW
147F:0049 F7E3 MUL BX
147F:004B 8756FE XCHG DX,[BP-02]
147F:004E 03C2 ADD AX,DX
147F:0050 8356FE00 ADC WORD PTR [BP-02],+00
147F:0054 AB STOSW
147F:0055 E2F1 LOOP 0048
147F:0057 8B46FE MOV AX,[BP-02]
147F:005A AB STOSW
147F:005B 8B4E02 MOV CX,[BP+02]
147F:005E 41 INC CX
147F:005F 8B760C MOV SI,[BP+0C]
147F:0062 8B7E06 MOV DI,[BP+06]
147F:0065 037EFC ADD DI,[BP-04]
147F:0068 F8 CLC
147F:0069 AD LODSW
147F:006A 1305 ADC AX,[DI]
147F:006C AB STOSW
147F:006D E2FA LOOP 0069
147F:006F 894EFE MOV [BP-02],CX
147F:0072 8346FC02 ADD WORD PTR [BP-04],+02
147F:0076 8B4E04 MOV CX,[BP+04]
147F:0079 03C9 ADD CX,CX
147F:007B 3B4EFC CMP CX,[BP-04]
147F:007E 75AD JNZ 002D
147F:0080 8BE5 MOV SP,BP
147F:0082 5D POP BP
147F:0083 59 POP CX
147F:0084 58 POP AX
147F:0085 5F POP DI
147F:0086 5E POP SI
147F:0087 5B POP BX
147F:0088 5A POP DX
147F:0089 C3 RET
-g 1a
DS=13DC ES=13DC SS=13DC CS=147F IP=001A NV UP EI PL NZ NA PO NC
147F:001A E80200 CALL 001F
-d 0 f ; multiplicand - size 4 words - as below
13DC:0000 ED 9F BC 8A FA 7E AB FD-00 00 00 00 00 00 00 00 .....~..........
-d 200 20f ; multiplier - size 6 words

13DC:200 23 F1 DE 9C 54 87 56 11-78 34 FB 73 00 00 00 00 #...T.V.x4.s....
-d a20 a2f ; input – data sizes in words of mpd and mpr

13DC:0A20 04 00 06 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g
DS=13DC ES=13DC SS=13DC CS=147F IP=001F NV UP EI PL ZR NA PE NC
-d 400 41f ; product output, 10 (or m+n) words in size

13DC:0400 67 FA DD A5 A7 EE A3 5F-7D 71 54 30 8C 4D 55 DB g......_}qT0.MU.
13DC:0410 2D F5 EC 72 00 00 00 00-00 00 00 00 00 00 00 00 -..r............
The main and the sub-program above use 8AH (138 decimal) bytes of memory,
and the subroutine uses only 107 bytes of memory with less than 60 instructions.
The results indicated above say that FDAB7EFA8ABC9FED multiplied by
73FB3478115687549CDEF123 is
72ECF52DDB554D8C3054717D5FA3EEA7A5DDFA67. Using the scientific calculator
of the system, this result can be verified, though not so easily as in the earlier case, of course!
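Since a desk calculator is awkward at this size, the structure of the subroutine can instead be cross-checked with a Python model that compares the word-by-word result against ordinary big-integer multiplication. This is a sketch only: the names mul_multiword and from_words are assumptions introduced here, and word lists are taken least significant word first, as they are stored in the data segment.

```python
def from_words(words):
    """Combine 16-bit words (low word first) into one integer."""
    return sum(w << (16 * i) for i, w in enumerate(words))


def mul_multiword(mpd, mpr):
    """Model of mmmult: m-word by n-word product, low word first."""
    prod = [0] * (len(mpd) + len(mpr))
    for i, b in enumerate(mpr):            # outer loop: one multiplier word
        carry = 0
        temp = []
        for a in mpd:                      # inner loop: word * multi-word
            t = a * b + carry
            temp.append(t & 0xFFFF)
            carry = t >> 16
        temp.append(carry)                 # (m+1)-word partial product
        c = 0
        for j, t in enumerate(temp):       # add with alignment at word i
            s = prod[i + j] + t + c
            prod[i + j] = s & 0xFFFF
            c = s >> 16
    return prod


mpd = [0x9FED, 0x8ABC, 0x7EFA, 0xFDAB]
mpr = [0xF123, 0x9CDE, 0x8754, 0x1156, 0x3478, 0x73FB]
product = mul_multiword(mpd, mpr)
print(f"{from_words(product):040X}")
print(from_words(product) == from_words(mpd) * from_words(mpr))
```

The first line printed can be compared digit by digit with the product quoted above; the second line confirms that the word-level procedure agrees with direct multiplication.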
The following allocation of the data segment may be noted:
0000-01FF hex : space for the multiplicand words
0200-03FF hex : space for the multiplier words
0400-07FF hex : space for the product words
0800-0A01 hex : space for the temporary product of single word multiplication of
all the multiplicand words
0A02-0A1F hex : not used
0A20 hex : multiplicand word count
0A22 hex : multiplier word count
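These offsets follow directly from the word counts declared in the data segment. A small Python sketch can recompute them; the function name layout and the label spare for the unnamed 15-word block are assumptions made here for illustration.

```python
def layout(items):
    """Return {name: (first_offset, last_offset)} for consecutive word arrays."""
    offset = 0
    table = {}
    for name, words in items:
        size = 2 * words                   # each word occupies 2 bytes
        table[name] = (offset, offset + size - 1)
        offset += size
    return table


# word counts taken from the data segment of the 4-word by 6-word example
DATA_SEGMENT = [("mpd", 256), ("mpr", 256), ("prod", 512),
                ("temp", 257), ("spare", 15), ("m", 1), ("n", 1)]

for name, (first, last) in layout(DATA_SEGMENT).items():
    print(f"{name:5s} {first:04X}-{last:04X}")
```

The 257 words of temp occupy 0800 to 0A01 hex, which is why the unused block starts at 0A02 hex.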
It can therefore be observed that this program, as it is, will be useful for
multiplication of up to 256-word by 256-word hex numbers, that is, binary 4096-bit by
4096-bit numbers. Operations of this magnitude are needed in cryptography and
other applications. An example of 255 x 255 word (or 4080 x 4080 bit) multiplication is
shown below. (However, nothing prevents us from using the entire data segment, in
which case we can easily go up to 32000 digit hex numbers for our multiplier and
multiplicand.)
data segment
mpd dw 255 dup (0ffffh) ; multiplicand
dw 0
mpr dw 255 dup (0ffffh) ; multiplier
dw 0
prod dw 512 dup (0) ;.... .... . ; product array
temp dw 257 dup (?) ; temporary use
dw 15 dup (0)
m dw 255
n dw 255
data ends
;
code segment
; this and the un-assembled program are the same as shown earlier.
TESTING IN DEBUG
-g ;execute the program
AX=00FF BX=0200 CX=00FF DX=0800 SP=0000 BP=0000 SI=0000 DI=0400

DS=13DC ES=13DC SS=13DC CS=147F IP=001F NV UP EI PL ZR NA PE NC
-d0 9ff ; The displayed data are separated and labeled for the sake of clarity
; multiplicand below- 255 words
13DC:0000 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
;The green highlighted words here and below are not in the data or results
; multiplier below - 255 words:

13DC:03F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
;Product below - 510 words
13DC:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

13DC:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0440 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0450 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0460 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0470 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0500 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0510 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0520 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0530 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0540 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0550 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0560 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0570 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0580 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0590 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:05F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 FE FF ................
13DC:07F0 FF FF FF FF FF FF FF FF-FF FF FF FF 00 00 00 00 ................
; Data in Temp. location - 256 words - last word-multiply result

13DC:0800 01 00 FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:09F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FE FF ................
-q
3. Handling Large BCD Numbers: Intel 8086 provides a minimal facility for
handling decimal numbers, but for large decimal numbers in the
BCD representation these facilities are very inadequate. However, it is possible to
handle BCD directly, while keeping the computations in hex at word level and using the
decimal power for inter-word operations. I call this the method of power BCD. The
meaning will become clear if we consider an example. Suppose we want to multiply the
BCD numbers 1234 and 5678. We may multiply the hex equivalent of 1234
by the hex equivalent of 5678, and divide the result by 10000 decimal (10^4 for 4-digit BCD
computations) to get two hex words. If we convert these two hex words to their
BCD equivalents, we get the complete product in BCD. Our first step is to convert
1234 and 5678 into their hex equivalents. These turn out to be 4D2 and 162E. These two
hex numbers can be directly multiplied to obtain the hex product 6AE9BC. Dividing this
by 10000 (equivalent to 2710 hex) we obtain the quotient 2BC and the remainder 19FC.
We can now convert these two hex numbers to BCD to get 0700 6652, the complete
product in the BCD form. This can be called the BCD4 method, as the 4-digit BCD is
handled in terms of 4-digit hex at a time. Two conversion processes become necessary
here. First, we convert the BCD words to hex; and second, we convert the hex words
less than 10000 (decimal) to BCD. We have done these conversions already, and in this
context they can be frozen into macros in an optimized fashion to be invoked whenever
needed. The two programs are shown below:
Condh macro ;; macro to convert BCD decimal word to hex word
mov bx, ax ;; the data for conversion is assumed in ax
and ax, 0f0f0h
ror ax, 1
ror ax, 1
sub bx, ax
ror ax, 1
sub bx, ax ;; the given number is now a concatenation of two hex
;; bytes; BCD 9863, for example, becomes the hybrid
;; (62)(3F), where 62H = 98 BCD and 3FH = 63 BCD
mov al, 100
mul bh
sub bh, bh
add ax, bx ;; result of conversion in ax.
Endm
Conhd macro ;; macro to convert hex word < 2710H to BCD word.
;; input hex word, assumed in ax
mov bl, 100
div bl
mov bl, ah
aam
xchg ax, bx
aam
xchg ah, bl
shl bx, 1
shl bx, 1
shl bx, 1
shl bx, 1
add ax, bx ;; output BCD in ax. Macro uses only bx with ax.
Endm
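The two conversions and the worked example of 1234 multiplied by 5678 can be traced in Python. The sketch below is illustrative only; the function names simply reuse the macro names, and the register steps are modelled with ordinary integers rather than real 16-bit registers.

```python
def condh(ax):
    """Model of the Condh macro: 4-digit BCD word -> hex (binary) word."""
    bx = ax
    ax &= 0xF0F0          # keep the tens digit of each byte (16 * tens)
    ax >>= 2              # two rotates right: 4 * tens
    bx -= ax              # each byte: 16t + u - 4t
    ax >>= 1              # third rotate: 2 * tens
    bx -= ax              # each byte now holds 10t + u
    bh, bl = bx >> 8, bx & 0xFF
    return 100 * bh + bl  # mov al,100 / mul bh / add ax, bx (with bh cleared)


def conhd(ax):
    """Model of the Conhd macro: hex word below 10000 -> 4-digit BCD word."""
    q, r = divmod(ax, 100)                 # div bl with bl = 100
    return ((q // 10) << 12) | ((q % 10) << 8) | ((r // 10) << 4) | (r % 10)


a, b = condh(0x1234), condh(0x5678)        # 4D2h and 162Eh
high, low = divmod(a * b, 10000)           # 6AE9BCh divided by 2710h
print(f"{a:X} {b:X} {a * b:X} {high:X} {low:X}")   # 4D2 162E 6AE9BC 2BC 19FC
print(f"{conhd(high):04X} {conhd(low):04X}")       # 0700 6652
```

Because the masked value has zeros in the low nibble of each byte, the three rotates in the macro never wrap a bit around, so plain right shifts model them exactly.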
Both the above macros are reasonably optimal and use only the BX register as
additional working storage for the conversion. The input and the converted output are
both in register AX. Now we should look at the multiplication of two such words to
produce a result which contains hex words such that converting each hex word to
BCD and concatenating produces the decimal value string of the product. This
corresponds to the operation in the inner loop of the previous program. That program can
easily be modified with an additional temporary storage in the stack frame for the number
2710H at the location, say, [BP - 6]. The inner loop will now become:
ilup: lodsw
mul bx
add ax, word ptr [bp - 2] ; [bp - 2] contains the number of 10000’s carried
; from the previous multiplication
adc dx, 0
div word ptr [bp - 6] ; [bp - 6] has 10000 decimal (2710h)
xchg ax, dx
mov word ptr [bp - 2], dx ; 10000’s saved for the next word multiplication
stosw
loop ilup
mov ax, word ptr [bp - 2]
stosw
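The effect of this modified inner loop can be followed in a small Python model. It is a sketch under stated assumptions: the function name bcd4_mul_word is introduced here, the word list holds BCD4 words (values 0 to 9999) least significant first, and carry stands for the local variable at [bp - 2].

```python
def bcd4_mul_word(words, multiplier, base=10000):
    """Multiply a BCD4 word string (low word first) by one BCD4 word."""
    carry = 0                              # [bp - 2], cleared before the loop
    out = []
    for w in words:                        # lodsw
        t = w * multiplier + carry         # mul bx / add ax,[bp-2] / adc dx,0
        carry, rem = divmod(t, base)       # div word ptr [bp - 6]
        out.append(rem)                    # remainder stored by stosw
    out.append(carry)                      # final carry word
    return out


print(bcd4_mul_word([9998, 9999, 9999], 9999))  # [2, 9998, 9999, 9998]
```

Read most significant word first, the output is decimal 9998 9999 9998 0002, which is 999999999998 multiplied by 9999. The quotient of the division never exceeds 9999, so it always fits in a 16-bit register.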
Simple addition of two words in this Power BCD4 system can be seen in the
following program:
BCD4_add: add ax, bx
add ax, 0d8f0h; 0d8f0h is the negative of 2710h (10000 decimal)
jc down
sub ax, 0d8f0h
down:
It is easy to see the above program leaves the data in ax corrected for BCD4
addition along with proper carry in the flag register for BCD4 add operation.
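The behaviour of this fragment, including the carry it leaves behind, can be traced with a Python sketch. The function name bcd4_add is taken from the label above; the carry_in parameter is a hypothetical addition made here so that words can be chained, and it is not part of the fragment itself.

```python
def bcd4_add(ax, bx, carry_in=0):
    """Add two BCD4 words (0..9999); return (sum word, carry out)."""
    ax = ax + bx + carry_in      # add ax, bx (carry_in is an assumed extra)
    ax += 0xD8F0                 # add ax, 0d8f0h, i.e. add 65536 - 10000
    if ax > 0xFFFF:              # carry flag set: the sum was 10000 or more
        return ax & 0xFFFF, 1    # AX already holds sum - 10000
    ax -= 0xD8F0                 # sub ax, 0d8f0h restores the sum, no borrow
    return ax, 0


print(bcd4_add(1234, 5678))      # (6912, 0)
print(bcd4_add(9999, 9999))      # (9998, 1)
```

Adding 0D8F0h tests for a sum of 10000 or more and subtracts 10000 in the same step, so the carry flag comes out exactly as a decimal word adder would set it.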
The highlighted portion in the above programs can be seen as the extra for
decimal operation in this loop and in the addition program. Everything else will remain
the same as the hex program, excepting the original BCD data conversion at the
beginning and the final conversion of the result in BCD4 form to BCD form at the end.
If other operations also are required on the BCD numbers then the power BCD
representation can be conveniently used right through without much difficulty.
The decimal m-word by n-word multiplication is done in the program below,
almost on the same lines as the hex multiplication already considered. In this program,
the multiplicand BCD input is first converted to BCD4 and stored elsewhere in the
memory and used for multiplication. The multiplier conversion is done one word at
a time; each multiplier word is converted on the fly when it is taken up for multiplication.
The program keeps both the multiplier and multiplicand inputs undisturbed in the
memory. The multiplicand and multiplier considered in this demo program are both
16384 decimal digits long (4096 words of 4 digits each) and the product is double this
size. It should be noted that the
data stored in memory is hexadecimal (or binary). The word stored in the memory for
(BCD) 9998 is 1001 1001 1001 1000 in binary, and it is our interpretation that its value is
not hex 9998 but decimal 9998. When converted to BCD4 form this word becomes 270E.
This word is, of course, hex.
data segment
mpdraw dw 9998h ; raw or BCD array for the multiplicand starts here.
dw 4095 dup (9999h) ; continuation of mpd
mpr dw 9997h, 4095 dup (9999h)
prod dw 8192 dup (0) ; cleared product array
mpd dw 4096 dup (?) ; mpd array in BCD4 form used for the computations.
temp dw 4097 dup (?) ; reserved for temporary use
dw 15 dup (0)
m dw 4096 ; no. of mpd words
n dw 4096 ; no. of mpr words
data ends
;
code segment
condh macro
mov bx, ax
and ax, 0f0f0h ;; for digit separation
ror ax, 1
ror ax, 1
sub bx, ax
ror ax, 1
sub bx, ax
mov al, 100
mul bh
sub bh, bh
endm ;; note the last add ax, bx is omitted in this
;
conhd macro
mov bl, 100
div bl
mov bl, ah
aam
xchg ax, bx
aam
xchg ah, bl
shl bx, 1
shl bx, 1
shl bx, 1
shl bx, 1
add ax, bx
endm
;
strt: mov ax, data
mov ds, ax
mov es, ax
mov ax, n ; count of words of mpr
mov cx, m ; count of words of mpd
mov bx, offset mpr ; addresses
mov si, offset prod
mov dx, offset temp
mov di, offset mpd
mov bp, offset mpdraw
call dmmult
int 1
dmmult proc near
;initialising
;prepare the stack frame
push dx ; address of temp @ [bp + 12]
push bx ; address of mpr @ [bp + 10]
push di ; address of mpd @ [bp + 8]
push si ; address of prod @ [bp + 6]
push ax ; value of n @ [bp + 4]
push cx ; value of m @ [bp + 2]
push bp ; address of mpdraw @ [bp]
mov bp, sp
sub ax, ax
push ax
push ax ; 2 word locations, for local variables in the stack
; frame: [bp - 2] partial prod, [bp - 4] mpr word
; position (outerloop index).
mov ax, 2710h
push ax ; [bp - 6] stores decimal 10000
cld
;
; conversion of raw mpd data to the BCD4 form
mov si, [bp] ; address of mpdraw
conlup1: lodsw ; note di and cx are properly loaded at entry to the proc.
condh
add ax, bx ; BCD4 in ax now
stosw
loop conlup1 ; conversion complete for mpd
; now clear the temp space

olup: sub ax,ax ; outer loop
mov cx, 257
mov di, [bp + 12]
rep stosw
; The next multiplier word is taken up
mov bx, [bp + 10]
add bx, [bp - 4]
mov ax, [bx] ; this is now to be converted to BCD4 form
condh
add bx, ax ; BCD4 in bx now
mov di, [bp + 12]
mov si, [bp + 8]
mov cx,[bp + 2]
;
ilup: lodsw ; inner loop
mul bx
add ax, [bp - 2]
adc dx, 0
div word ptr [bp - 6] ; div by 10000
xchg ax, dx
mov [bp - 2], dx
stosw
loop ilup
mov ax, [bp - 2]
stosw
; inner loop over
mov cx, [bp + 2]
inc cx
mov si, [bp + 12]
mov di, [bp + 6]
add di, [bp - 4]
clc
; another loop nested inside the outer loop
; note the modification for powerBCD addition as below
comp: lodsw
adc ax, [di]
add ax, 0d8f0h; 2's complement of 2710h
jc ddown
sub ax, 0d8f0h ;there will not be any carry from this!
ddown:stosw
loop comp
; inner nested loop completed
mov [bp - 2], cx ;clear [bp - 2]
add word ptr [bp - 4], 2
mov cx, word ptr[bp + 4]
add cx, cx
cmp cx, word ptr [bp - 4]
jnz olup
; outer loop over - prepare to convert result to BCD
;
mov di, [bp+6] ; prod address
mov cx, [bp + 2]
add cx, [bp + 4]
conlup2: mov ax, [di]
conhd
stosw
loop conlup2 ; conversion over here, now prepare to return
mov sp, bp
pop bp
pop cx
pop ax
pop si
pop di
pop bx
pop dx
ret ; and return
dmmult endp
code ends
end strt
;TESTING IN THE DEBUG

-u 0 f6
149F:0000 B8DC13 MOV AX,13DC
149F:0003 8ED8 MOV DS,AX
149F:0005 8EC0 MOV ES,AX
149F:0007 A122C0 MOV AX,[C022]
149F:000A 8B0E20C0 MOV CX,[C020]
149F:000E BB0020 MOV BX,2000
149F:0011 BE0040 MOV SI,4000
149F:0014 BA00A0 MOV DX,A000
149F:0017 BF0080 MOV DI,8000
149F:001A BD0000 MOV BP,0000
149F:001D E80200 CALL 0022
149F:0020 CD01 INT 01
149F:0022 52 PUSH DX
149F:0023 53 PUSH BX
149F:0024 57 PUSH DI
149F:0025 56 PUSH SI
149F:0026 50 PUSH AX
149F:0027 51 PUSH CX
149F:0028 55 PUSH BP
149F:0029 8BEC MOV BP,SP
149F:002B 2BC0 SUB AX,AX
149F:002D 50 PUSH AX
149F:002E 50 PUSH AX
149F:002F B81027 MOV AX,2710
149F:0032 50 PUSH AX
149F:0033 FC CLD
149F:0034 8B7600 MOV SI,[BP+00]
149F:0037 AD LODSW
149F:0038 8BD8 MOV BX,AX
149F:003A 25F0F0 AND AX,F0F0
149F:003D D1C8 ROR AX,1
149F:003F D1C8 ROR AX,1
149F:0041 2BD8 SUB BX,AX
149F:0043 D1C8 ROR AX,1
149F:0045 2BD8 SUB BX,AX
149F:0047 B064 MOV AL,64
149F:0049 F6E7 MUL BH
149F:004B 2AFF SUB BH,BH
149F:004D 03C3 ADD AX,BX
149F:004F AB STOSW
149F:0050 E2E5 LOOP 0037
149F:0052 2BC0 SUB AX,AX
149F:0054 B90101 MOV CX,0101
149F:0057 8B7E0C MOV DI,[BP+0C]
149F:005A F3 REPZ
149F:005B AB STOSW
149F:005C 8B5E0A MOV BX,[BP+0A]
149F:005F 035EFC ADD BX,[BP-04]
149F:0062 8B07 MOV AX,[BX]
149F:0064 8BD8 MOV BX,AX
149F:0066 25F0F0 AND AX,F0F0
149F:0069 D1C8 ROR AX,1
149F:006B D1C8 ROR AX,1
149F:006D 2BD8 SUB BX,AX
149F:006F D1C8 ROR AX,1
149F:0071 2BD8 SUB BX,AX
149F:0073 B064 MOV AL,64
149F:0075 F6E7 MUL BH
149F:0077 2AFF SUB BH,BH
149F:0079 03D8 ADD BX,AX
149F:007B 8B7E0C MOV DI,[BP+0C]
149F:007E 8B7608 MOV SI,[BP+08]
149F:0081 8B4E02 MOV CX,[BP+02]
149F:0084 AD LODSW
149F:0085 F7E3 MUL BX
149F:0087 0346FE ADD AX,[BP-02]
149F:008A 83D200 ADC DX,+00
149F:008D F776FA DIV WORD PTR [BP-06]
149F:0090 92 XCHG DX,AX
149F:0091 8956FE MOV [BP-02],DX
149F:0094 AB STOSW
149F:0095 E2ED LOOP 0084
149F:0097 8B46FE MOV AX,[BP-02]
149F:009A AB STOSW
149F:009B 8B4E02 MOV CX,[BP+02]
149F:009E 41 INC CX
149F:009F 8B760C MOV SI,[BP+0C]
149F:00A2 8B7E06 MOV DI,[BP+06]
149F:00A5 037EFC ADD DI,[BP-04]
149F:00A8 F8 CLC
149F:00A9 AD LODSW
149F:00AA 1305 ADC AX,[DI]
149F:00AC 05F0D8 ADD AX,D8F0
149F:00AF 7203 JB 00B4
149F:00B1 2DF0D8 SUB AX,D8F0
149F:00B4 AB STOSW
149F:00B5 E2F2 LOOP 00A9
149F:00B7 894EFE MOV [BP-02],CX
149F:00BA 8346FC02 ADD WORD PTR [BP-04],+02
149F:00BE 8B4E04 MOV CX,[BP+04]
149F:00C1 03C9 ADD CX,CX
149F:00C3 3B4EFC CMP CX,[BP-04]
149F:00C6 758A JNZ 0052
149F:00C8 8B7E06 MOV DI,[BP+06]
149F:00CB 8B4E02 MOV CX,[BP+02]
149F:00CE 034E04 ADD CX,[BP+04]
149F:00D1 8B05 MOV AX,[DI]
149F:00D3 B364 MOV BL,64
149F:00D5 F6F3 DIV BL
149F:00D7 8ADC MOV BL,AH
149F:00D9 D40A AAM
149F:00DB 93 XCHG BX,AX
149F:00DC D40A AAM
149F:00DE 86E3 XCHG AH,BL
149F:00E0 D1E3 SHL BX,1
149F:00E2 D1E3 SHL BX,1
149F:00E4 D1E3 SHL BX,1
149F:00E6 D1E3 SHL BX,1
149F:00E8 03C3 ADD AX,BX
149F:00EA AB STOSW
149F:00EB E2E4 LOOP 00D1
149F:00ED 8BE5 MOV SP,BP
149F:00EF 5D POP BP
149F:00F0 59 POP CX
149F:00F1 58 POP AX
149F:00F2 5E POP SI
149F:00F3 5F POP DI
149F:00F4 5B POP BX
149F:00F5 5A POP DX
149F:00F6 C3 RET
-g 1d
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=001D NV UP EI PL NZ NA PO NC
1FDF:001D E80200 CALL 0022; this is before entry to subroutine
Partial view of the data segment at entry to the subroutine

-d 0 f
13DC:0000 98 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
-d 2000 200f
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
-d 4000 400f
13DC:4000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 6000 600f
13DC:6000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 8000 800f
13DC:8000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d a000 a00f
13DC:A000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d c000 c02f
13DC:C000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C020 00 10 00 10 00 00 00 00-00 00 00 00 00 00 00 00 ................
m n
-g ;go and complete the subroutine and stop at ‘int 1’.
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=0022 NV UP EI NG NZ NA PE NC
1FDF:0022 52 PUSH DX ; this is after exit from subroutine
-d 0 c02f ; full display from this command, but only some portion is shown
; below
; view of the relevant portion of the data segment
; at first, the multiplicand in BCD, (1000h words = 16384 decimal digits)
13DC:0000 98 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:0010
; the memory from here to the location shown filled entirely with 99
13DC:1FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
; multiplicand up to this.
; now the multiplier in BCD, also the same size as multiplicand
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:2010 ; multiplier is also filled with data 99 up to the line shown below.
13DC:3FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
;excepting the highlighted words in mpd and mpr, rest of the words are all
;decimal 9999
; now the result

13DC:4000 06 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:4010 ; this part of the result area is all 0’s
13DC:5FF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:6000 95 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:6010 ; the intervening memory contains all 99’s.
13DC:7FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
;the highlighted words in the above result indicate the correctness of the
;calculation; the rest of the words are 0000 in the first half and 9999 in the
;last half, as they should be.
; below is the BCD4 conversion of the multiplicand

13DC:8000 0E 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
; the data stored in the intervening memory are all 270f’s.
13DC:9FF0 0F 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
; below is the result of the last multiplier word multiplication of the
; multiplicand (4097 word result).
13DC:A000 02 00 0E 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 ...'.'.'.'.'.'.'
13DC:A010 ; the data stored in the intervening memory are all 270f’s.
13DC:BFF0 0F 27 0F 27 0F 27 0F 27-0F 27 0F 27 0F 27 0F 27 .'.'.'.'.'.'.'.'
13DC:C000 0E 27 00 00 00 00 00 00-00 00 00 00 00 00 00 00 .'..............
;
13DC:C010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C020 00 10 00 10 00 00 00 00-00 00 00 00 00 00 00 00 ................
;
; The word at C000 address is the last word (4097th word) of the last partial
; product.
; The 2 words at C020 and C022 represent m and n values
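The check mentioned in the comments above can be reproduced arithmetically. The short sketch below is in Python rather than 8086 assembly and is only a model of the test data: it assumes that the two operands are the decimal numbers 99...98 and 99...97, each of N = 16384 digits, as the dump suggests (N is inferred from the 2000h bytes of packed BCD occupied by each operand).

```python
# Model of the test data: each operand occupies 2000h bytes of packed BCD,
# i.e. N = 2 * 0x2000 = 16384 decimal digits (an inference from the dump).
N = 2 * 0x2000

multiplicand = 10**N - 2      # 99...98
multiplier = 10**N - 3        # 99...97
product = multiplicand * multiplier

# (10^N - 2)(10^N - 3) = 10^(2N) - 5*10^N + 6, so splitting the product at
# digit N gives a low half of 00...06 and a high half of 99...95.
high_half, low_half = divmod(product, 10**N)

print(low_half)                    # 6
print(high_half == 10**N - 5)      # True
```

This agrees with the dump: the low half of the result (4000 to 5FFF) holds 06 followed by zeros, and the high half (6000 to 7FFF) holds 95 followed by 99's.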
4. Factorial of large numbers in decimal displayed in the Big Endian Fashion:
Utilising the ideas presented so far, the program below is devised for finding the
factorial of numbers up to 7DB1 h (32177 decimal) and displaying them in the big
endian fashion in decimal. The multiplication part here is simpler, as at any time we
have to multiply only one word with a BCD4 word string. If numbers larger than 7DB1 h
are used, we end up with results which do not fit into the 64 KB of the data segment.
If we use the hexadecimal number system instead, we can certainly handle factorials of
larger numbers. The end of the result in the data segment is indicated by the word
FFFF h appearing after the result. This can be done by initially filling all the result
space with FFFF h and overwriting it with the results as we proceed. The process is
improved if, at the end, the part of the unused data segment immediately following the
result is filled with FFFF just once, as step 3 of the program below does. The program
is given below with a simple demonstration of its working. At the end, to get the
big-endian display, the array is simply reversed in situ, an operation we have already
studied.
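The arithmetic of the method can be modelled in a few lines before reading the assembly listing. The sketch below is Python, not 8086 code, and the names BASE, split_words, factorial_bcd4 and to_big_endian_bcd are hypothetical helpers introduced only for this illustration. It keeps the number as little-endian words of four decimal digits each (BCD4), multiplies by n-1, n-2, ..., 2 with a carry running from word to word, and finally packs the words into BCD bytes, most significant first.

```python
BASE = 10000   # one BCD4 word holds four decimal digits (0..9999)

def split_words(value):
    """Split a non-negative integer into little-endian base-10000 words."""
    words = [value % BASE]
    while value >= BASE:
        value //= BASE
        words.append(value % BASE)
    return words

def factorial_bcd4(n):
    """Return n! as little-endian BCD4 words, multiplying by n-1, ..., 2."""
    words = split_words(max(n, 1))            # 0! and 1! are both 1
    for multiplier in range(n - 1, 1, -1):
        carry = 0
        for i, word in enumerate(words):
            carry, words[i] = divmod(word * multiplier + carry, BASE)
        if carry:
            words.extend(split_words(carry))  # new high-order words
    return words

def to_big_endian_bcd(words):
    """Pack each word into two BCD bytes, most significant word first."""
    out = bytearray()
    for w in reversed(words):
        for pair in divmod(w, 100):           # high two digits, then low two
            out.append((pair // 10) << 4 | (pair % 10))
    return bytes(out)

print(to_big_endian_bcd(factorial_bcd4(47)).hex())
```

For 47 this prints a 60 digit string beginning 25862324, which can be compared byte by byte with the data segment dump obtained for 2F h in the debug test. The model does not reproduce the register usage or the FFFF end marker of the assembly program.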
The .asm file for finding the factorial
data segment
dw 8000h dup(?) ; space reserved for result
data ends
stack segment stack

dw 256 dup(?)
tos label word
stack ends
code segment
assume cs:code, ds:data, es:data, ss:stack
start: mov ax, data
mov es,ax
mov ds,ax
mov ax,stack
mov ss,ax
mov sp,offset tos ;stack initialization
again1: int 1 ;now, load any number whose factorial is to be found
;into reg. ax (no. to be less than or = 7db1 hex).
call fact
jmp again1 ;get factorial of another number; load number in ax.
fact proc near
;This procedure can find the factorial of any number up to 7db1 hex or
;32177 decimal. It is basically in two steps, followed by a concluding step 3.
;Step 1 stores the data input in ax, if necessary,
;in 2 memory words in the BCD4 format in the little endian fashion.
;This operation is also common to the multiplication of step 2, and so it
;comes at the last part of step 2.
;Step 2 consists of multiplying successively by a number one less than
;that used in the previous multiplication.
;Before concluding, step 3 marks the end of the result with a flag word FFFF
;and converts the BCD4 to normal BCD format, so
;the final result is in BCD in the big endian fashion.
;the procedure assumes ds and es point to the same segment in memory
;step 1
mov bp,ax ; save ax in bp
cld
sub di, di ;initialize di to 0
mov cx, di ;clear cx, so that store of initial data value is OK
mov bx,10000 ;for BCD4 handling
or ax,ax ;ax = 0?
jnz check ;if no,do further check (in step 2 later part)
inc ax ; ax is 0, so put its factorial in ax.
jmp store ;go to store the result (in step 2)
;step 2
;di has 0, the start address of the multiplicand; bp has the
;multiplier in hex; both multiplier and multiplicand are thus in BCD4
;form. so, multiplication in BCD4 form is carried out. cx has the number
;of words in the result (initialised to 0). si has 0 initially.
mult2: sub ax,ax

repz scasw ;this avoids MUL on initial 0's in the mult. process
inc cx ;this and the next instruction compensate for one extra-
Sub di,2 ;operation done on these 2 regs. by the repz instruction.
mult: mov ax, word ptr[di]
mult1: mul bp
add ax, si ;si has residue from previous mult
adc dx, 0 ;higher word of product
div bx ;BCD4 conversion
mov si,ax
mov ax, dx
stosw
loop mult ;on exit from the loop cx = 0.
mov ax, si ;check if higher part is there in the result
or ax, ax
jz process ;if there is none, go on to process.
;else check ax as shown next.
;store from now on, is the same as the initial input store
; so this forms the part 2 of step 1.
check: cmp ax, bx ;greater than 10,000?

jb down ;if no go down
sub ax, bx ;the check loop is essentially,a divide by 10,000.
inc cx ;note, cx is initialised to 0 in the mult loop (or at
;start before storing the given data).
jmp check
down: stosw
jcxz process ;if cx = 0, data store over, go to further process
mov ax, cx ;else do the store of the next BCD4 digit
store: stosw
process:cmp bp,2 ;check for the termination of the process

jbe step3 ;if below or equal to 2, process over
;go to convert BCD4 to BCD and do big endian store.
dec bp ;else continue to dec and multiply
mov cx, di ;get word count in cx
ror cx, 1 ;di has the memory byte address, which is double
;the word count, so divide by 2 using a right rotate
sub di, di ;get start address in di
mov si, di ;make initial residue 0.
jmp mult2 ;proceed to multiply
;step 3. This step is used to convert the result in BCD4 to regular BCD
;and store it in the big-endian fashion so as to make it easy to view.
;note di at this point has an address 2 more than the last word stored
step3: mov ax, -1 ;to flag the end of data

mov [di], ax
convert:mov bx, 100 ;arranging for conversion
mov cx, 4 ;shift count for conversion
sub si, si ;get start address in si
again: lodsw ;get the BCD4 number
or ax, ax
jz skip ;if ax is 0, skip conversion
call conv ;BCD4 to BCD conversion routine
skip: sub di,2 ;get the last unconverted word address
cmp si,di
xchg ax, [di]
ja finish
call conv
mov [si - 2], ax
cmp si,di
jnz again
finish:add si,di ; get the final address of result where FFFF is stored
ret
fact endp
conv proc near
div bl
xchg ah,bh
aam
rol al,cl
ror ax,cl
xchg bh,al
aam
rol al,cl
rol ax,cl
xchg al,bh
ret
conv endp
;total memory used by the executable program = 172 bytes
;total no. of instructions used = 85
code ends
end start
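The effect of the conv procedure can be summarised with a small model. The sketch below is Python, not 8086 code, and reflects a hand trace of the listing (div by 100, two aam instructions and the 4 bit rotates), so it should be read as an interpretation rather than as part of the program. The function name conv is reused only for convenience.

```python
def conv(word):
    """Model of conv: BCD4 word (0..9999) -> the 16-bit value left in AX."""
    high, low = divmod(word, 100)              # div bl, with bl = 100

    def pack(v):
        # aam splits v into tens and units; the rotates put them in one byte
        return (v // 10) << 4 | (v % 10)

    al = pack(high)                            # thousands and hundreds digits
    ah = pack(low)                             # tens and units digits
    return ah << 8 | al

ax = conv(2586)
print(f"{ax:04X}")                             # 8625
print(ax.to_bytes(2, "little").hex())          # 2586
```

Because the 8086 stores AL at the lower address, the word 2586 appears in memory as the bytes 25 86, which is why the converted result reads naturally from left to right in a memory dump.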
Testing in debug
-g
AX=23DC BX=0001 CX=02A7 DX=0000 SP=0200 BP=0000 SI=0000 DI=0000
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PO NC
23FC:0011 E80200 CALL 0016
-rax
AX 23DC
:2f ; (2f hex = 47 decimal)
-g
AX=14F1 BX=0064 CX=0004 DX=0A1A SP=0200 BP=0002 SI=001E DI=000E
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PE NC
23FC:0011 E80200 CALL 0016
; help for finding the termination of the result
-d0 2f
13DC:0000 25 86 23 24 15 11 16 81-80 64 29 64 35 51 53 61 %.#$.....d)d5QSa
13DC:0010 19 79 96 91 97 63 23 89-12 00 00 00 00 00 FF FF .y...c#.........
; this result can be verified in the scientific calculator.
; the word FFFF at address 001E flags the end of the result.
-rax
AX 14F1
:7db1 ; this is the maximum input possible- limited by the data segment size.
-g
AX=17FA BX=0064 CX=0004 DX=118C SP=0200 BP=0002 SI=FFFE DI=7FFE
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI NG NZ NA PO NC
23FC:0011 E80200 CALL 0016; location FFFE is where the result ends
-d0
13DC:0000 44 92 60 64 35 41 31 94-14 42 51 01 76 89 57 78 D.`d5A1..BQ.v.Wx
13DC:0010 61 91 34 55 16 28 36 58-31 77 63 99 74 35 01 00 a.4U.(6X1wc.t5..
13DC:0020 32 73 17 02 13 31 81 84-14 30 27 95 19 99 90 37 2s...1...0'....7
13DC:0030 95 41 58 53 15 06 58 26-14 94 02 11 78 98 01 64 .AXS..X&....x..d
13DC:0040 93 45 78 83 32 67 60 39-09 31 74 21 27 98 88 43 .Ex.2g`9.1t!'..C
13DC:0050 85 94 64 18 99 56 02 57-67 69 88 66 04 39 88 25 ..d..V.Wgi.f.9.%
13DC:0060 71 42 58 06 97 36 78 12-57 89 63 16 15 96 84 35 qBX..6x.W.c....5
13DC:0070 90 71 06 01 34 12 73 32-65 39 49 62 85 55 61 40 .q..4.s2e9Ib.Ua@
;initial significant part of the result
-d f000
13DC:F000 63 00 16 22 26 47 69 72-27 27 92 14 45 15 37 86 c.."&Gir''..E.7.

13DC:F010 03 91 25 24 99 87 19 12-04 36 67 91 04 70 28 29 ..%$.....6g..p()
13DC:F020 85 43 18 33 10 92 79 52-67 42 70 59 12 54 93 68 .C.3..yRgBpY.T.h
13DC:F030 99 52 81 80 36 06 08 26-12 38 28 75 16 27 89 41 .R..6..&.8(u.'.A
13DC:F040 92 46 92 06 14 21 83 01-44 00 00 00 00 00 00 00 .F...!..D.......
13DC:F050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:F060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:F070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; from here till the end, the entries are all zeros.
-d fff0
13DC:FFF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 FF FF ................
; The result covers the full data segment of 65536 bytes (excepting the last two
;bytes). The result can be verified to a good extent using the scientific
;calculator. Only portions of the result in the data seg. are shown above.
5. Modular multiplication: In many cryptographic processes, there is a need to
multiply two large numbers and obtain the modular value of the product to the base of
another large number. The modular value of a number to a base is the remainder obtained
on dividing the number by the base. The numbers we are speaking about here are of the
order of 1024 bits or 2048 bits, or maybe even more. What we wish to compute is
X*Y mod M, with X, Y and M all of the order indicated above.
Algorithm: Interleaved Modular Multiplication. The algorithm shown below does the
job reasonably well and is easy to understand. The product is built up bit by bit,
multiplying X by the bits of Y from the most significant bit downwards, and at each
stage M is removed from the partial product. Each stage requires at most 2 subtractions
of M. There are other algorithms available, but here we give only the interleaved
modular multiplication.
INPUT: X, Y, M with 0 <= X, Y <= M
X = ∑ x_i * 2^i ; i = 0, 1, ..., n-1; and similarly Y and M in terms of bits.
OUTPUT: P = X * Y mod M
n: number of bits of each of X, Y and M
y_i: ith bit of Y
1. P = 0;
2. for (i = n - 1; i >= 0; i--)
3.    P = 2 * P;
4.    if (P >= M) P = P - M;
5.    I = y_i * X;
6.    P = P + I;
7.    if (P >= M) P = P - M;
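The steps of the algorithm can be tried out directly with a short model. The sketch below is Python, not 8086 code, and the function name interleaved_modmul is introduced only for illustration. It assumes X is smaller than M, so that at most one subtraction of M is needed after doubling and one after adding X. The test values are the X, Y and M used in the data segment of the assembly program that follows.

```python
def interleaved_modmul(x, y, m, n):
    """Return x*y mod m, scanning the n bits of y from the top down.

    Assumes 0 <= x < m; y may be any n-bit value.
    """
    p = 0
    for i in range(n - 1, -1, -1):
        p = 2 * p                 # step 3
        if p >= m:                # step 4
            p -= m
        if (y >> i) & 1:          # steps 5 and 6: add x when bit i of y is 1
            p += x
        if p >= m:                # step 7
            p -= m
    return p

X = Y = 0x000FFFFFFFFFFFFF
M = 0x0010000000000001
print(interleaved_modmul(X, Y, M, 1280))   # 4
```

The answer 4 is easy to confirm by hand: here X = Y = M - 2, so X*Y mod M = (-2)*(-2) mod M = 4.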
THE ASSEMBLY LANGUAGE PROGRAM MODMUL.ASM
data segment
x dw 65535, 65535, 65535, 15, 76 dup(0); X = 000F FFFF FFFF FFFF hex
dw 65535, 65535, 65535, 15, 75 dup(0); Y = 000F FFFF FFFF FFFF hex
y dw 0 ; msw of Y
m dw 1, 0, 0, 16, 76 dup (0) ; M = 0010 0000 0000 0001 hex
p1 dw 80 dup(0) ; P of the algorithm
p2 dw 80 dup(0) ; Scratch pad for temporary use
xn dw 80
p1n dw 80 ; 80 words: data of up to 1279 bits (1 bit margin left to accommodate
; carry on addition) can be had for each of X, Y and M
;
;
data ends
;
code segment
;
; since there are several variables involved,
; it is better to use the stack for variable store
;
start: push ax
push bx
push di
push dx
push cx ; used regs saved in stack
mov ax, data
mov ds, ax
mov es, ax ; segments initialized
; parameters stored in stack frame
;
mov bx, offset x ; [bp+10]
push bx
mov bx, offset y ; [bp+8]
push bx
mov bx, offset m ; [bp+6]
push bx
mov bx, offset p1 ; [bp+4]
push bx
mov bx, offset p2 ; [bp+2]
push bx
push bp
mov bp, sp
sub sp, 2 ; temp at [bp-2]
cld
;
mov bx, [bp+8]
lup1: mov ax, [bx]
sub bx, 2
push bx ; bx points to next lower word of Y
mov [bp-2], ax ; the current word of y
mov dx, 16
lup2: mov di, [bp+4]; p1 is doubled
mov si, di
call addchk ; 2*p1 --> p1; (p1 – M) --> p2; if borrow, ignore p2
; else interchange pointers p1 and p2
mov ax, [bp-2]
shl ax, 1
mov [bp-2], ax
jnc down1
mov di, [bp+4]; address p1
mov si, [bp+10]
call addchk ; p1 + X --> p1; p1 – M --> p2; if borrow, ignore p2
; else, interchange the pointers p1 and p2
down1: dec dx
jnz lup2
pop bx
dec [p1n]
jnz lup1
mov si, [bp+4]
mov sp, bp
pop bp
add sp, 10; clear the stack frame
pop cx
pop dx
pop di
pop bx
pop ax ; retrieve the stacked parameters
int 1
;
;
addchk proc near
;
;this procedure adds [di] to [si] and subtracts m from the sum
;and puts the result in p2. If the result ends in a borrow, on
;subtraction, the result in p2 is ignored.
;Else, if no borrow, the result of subtraction goes to p1. This
;is achieved by simply interchanging p1 and p2 pointers,
;in case of no borrow on subtraction.
mov cx, xn
clc
back10: lodsw
adc ax, [di]
stosw ; [si] + [di] --> [di]
loop back10 ;
mov si, [bp+4] ; address p1
mov di, [bp+2] ; address p2
mov bx, [bp+6] ; address m
mov cx, xn
clc
back11: lodsw ; Computation p1-m --> p2
sbb ax, [bx]
inc bx
inc bx ; so that carry will not change
stosw
loop back11
jc down10
mov ax, [bp+2]
xchg ax, [bp+4]
mov [bp+2], ax ; pointers to p1 and p2 exchanged
down10: ret
addchk endp
code ends
end start
The program takes a space of 152 bytes apart from the data space of the data
segment. It is arranged that all the registers are saved across the program, excepting
the register SI, which points to the result of the modular multiplication. The data
segment should have the indicated labels for the program to work. If these labels are
retained, the whole program can be used as a subroutine. Alternatively, the offset
addresses of the data may be passed to the subroutine through registers, and these
registers may be stacked directly. That will be helpful in the cryptographic situations.
The program works only when at least one of the two numbers to be multiplied is
smaller than M, and the smaller one should be taken as X. The size of Y does not matter.
In the case of cryptographic processes like the RSA algorithm, both X and Y will be
smaller than M. The reader can easily see why this is so by looking at the algorithm.
6. Division of large numbers: Division is a more complex problem than
multiplication when large numbers are involved. In floating point routines handled by
software, division by a number a is done by starting from an approximate value of 1/a
taken from a table of a vs 1/a. This value is improved to the desired accuracy in a few
steps of iteration using the Newton-Raphson method. The theory of the method is given
below.
Let x be an approximate inverse of a, such that ax = 1 - d, where d is the (signed)
error in the value of x (per unit). That is, if x0 is the correct inverse of a, then
x = x0*(1 - d). We are trying to solve the equation a = 1/x, or f(x) = a - (1/x) = 0.
Then x’ (the next approximation for x) = x - [f(x) / f’(x)]; Newton-Raphson
Or x’ = x - [(a - 1/x) / (1/x^2)] = x - ax^2 + x = x*(2 - ax), which becomes
x’ = x*(2 - 1 + d), since ax = 1 - d. Thus, x’ = x*(1 + d).
ax’ will now be ax*(1 + d) = (1 - d)*(1 + d) = 1 - d^2, since ax = 1 - d.
The last equation above indicates that the error in the iterated value x’ is the
square of the error in x. The calculation in each iteration is seen to involve only
multiplication and subtraction, because x’ = x*(2 - a*x) needs 2 multiplications and
one subtraction. Since the error is squared, the number of accurate bits doubles with
each iteration. If our original approximation has 4 bit accuracy, the next iteration
will give 8 bit accuracy, the next 16 bit accuracy, and so on. Even starting from 4 bit
accuracy we can reach 64 bit accuracy in 4 iterations (4, 8, 16, 32, 64). The table
below is guaranteed to be accurate to 6 bits and is thus capable of giving better than
32 bit accuracy in 3 rounds of iteration (actually 48 bits), good enough for single
precision floating point calculations. One more round will give better than what is
required for the extended double precision format of IEEE standard 754.
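The squaring of the error can be watched in exact arithmetic. The sketch below is Python, not 8086 code. It uses exact fractions so that no rounding hides the effect, and the values a = 3 and x = 5/16 are arbitrary choices made only for this illustration (they give ax = 15/16, that is, d = 1/16).

```python
from fractions import Fraction

def refine(a, x):
    """One Newton-Raphson step for 1/a: two multiplications, one subtraction."""
    return x * (2 - a * x)

a = Fraction(3)
x = Fraction(5, 16)            # rough guess for 1/3
for step in range(4):
    d = 1 - a * x              # error before this step
    x = refine(a, x)
    # After the step, the new error 1 - a*x equals d squared exactly.
    print(step, d, 1 - a * x == d * d)
```

Starting from d = 1/16, the errors printed before the four steps are 1/16, 1/256, 1/65536 and 1/4294967296, and the error left after the fourth step is 2^-64.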
The table below is prepared as follows: The inverse of 1.xxxyyy1 (where each x and
y is either bit 1 or bit 0) is taken, its value correct to 8 binary digits is computed,
and it is placed in the table against the position xxxyyy, where xxx corresponds to the
row and yyy to the column. For example, the entry in the row 100 at column 110
corresponds to the inverse of 1.1001101. This inverse, correct to 8 bit accuracy, turns
out to be 0.1010 0000, or A0 H with a leading binary point. The maximum error of an
entry occurs exactly at the point against which it is marked, since the entry is
calculated for 1.xxxyyy1 but marked against 1.xxxyyy; at points nearer the middle of
the interval, the true inverse is closer to the tabulated value. For example, the value
marked at 1.100110 is valid over the range 1.100 110 to very nearly 1.100 111. Taking
it as corresponding to 1.100 1101 produces the maximum error at the two extremities of
the range, and it becomes increasingly accurate towards the middle of the interval. The
worst error occurs at the value 1.000 000, and we see that the value there is correct
up to 7 bits. With this we can get an accuracy of 56 bits, which corresponds to the
double precision IEEE standard, with three iterations; this is good enough accuracy for
any normal FP calculations. Once the inverse is obtained, division is carried out as
multiplication by the inverse.
       000  001  010  011  100  101  110  111
000    1FC  1F4  1EC  1E5  1DE  1D7  1D0  1CA
001    1C3  1BD  1B7  1B2  1AC  1A6  1A1  19C
010    197  192  18D  188  183  17F  17A  176
011    172  16E  16A  166  162  15E  15A  157
100    153  150  14C  149  146  142  13F  13C
101    139  136  133  130  12E  12B  128  125
110    123  120  11E  11B  119  116  114  112
111    10F  10D  10B  109  107  105  103  101
7 bit accuracy table of inverses (row = xxx, column = yyy, entries in hex)
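The entries of the table can be regenerated with a few lines of code. The sketch below is Python, not 8086 code, and rests on two inferences drawn from the printed entries rather than from the text: that each entry is the inverse of 1.xxxyyy1 scaled by 2^9, and that the fraction is truncated rather than rounded. The function name inverse_table is hypothetical.

```python
def inverse_table():
    """Rebuild the 8 x 8 table: entry = floor(2**9 / 1.xxxyyy1 binary)."""
    table = []
    for row in range(8):            # xxx
        entries = []
        for col in range(8):        # yyy
            k = row * 8 + col
            # 1.xxxyyy1 in binary equals (128 + 2k + 1) / 128, so
            # 2**9 / value = 65536 / (129 + 2k); integer division truncates.
            entries.append(65536 // (129 + 2 * k))
        table.append(entries)
    return table

for row, entries in enumerate(inverse_table()):
    print(f"{row:03b}", " ".join(f"{e:03X}" for e in entries))
```

Under these assumptions the output agrees with the printed table. The first entry, 1FC, divided by 512 is 0.9921875, that is, 1 - 2^-7, which is consistent with the 7 bit accuracy quoted for the worst case at 1.000 000.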
We shall not go into the details of this process here.
7. Division using the method followed for modular multiplication: Here we
present a modification of the modular multiplication process that we saw in the previous
section, to carry out division of arbitrarily long numbers. The process is the same as
the modular multiplication given, choosing X = 1 and Y as the dividend; the modular base
is now the divisor. The program is given below. It handles up to a 1280 bit dividend
and up to a 1279 bit divisor, because 80 words are provided as the data size. The
working memory space can be increased by providing more words. We need 5 times the data
size, and we can comfortably accommodate 3000 hex bytes (i.e., over 12000 decimal bytes
or over 96000 bits) of data with this program. Once a provision has been made in the
program, arbitrarily smaller data can be handled by making the leading words zero. The
efficiency of the program can be increased by using the actual word size, found with the
data size determining macro of the program given earlier. We have used it to size the
divisor data, but not for sizing the dividend. The size of the dividend can be
accommodated by giving this size at the label ndd (standing for the number of words of
the dividend) in the program. The program will need no alteration other than replacing
the value of ndd by the size value found for the dividend.
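The method amounts to the familiar shift and subtract division, taken one dividend bit at a time. The sketch below is a Python model of that idea, not a translation of the assembly program, and the function name long_divide is introduced only for illustration. It is tried on the second set of test data given with the program's results, with the words listed least significant first.

```python
def long_divide(dividend, divisor, nbits):
    """Bit-serial division: returns (quotient, remainder)."""
    quotient = 0
    remainder = 0
    for i in range(nbits - 1, -1, -1):
        quotient <<= 1                       # double the quotient
        # double the remainder and bring in the next dividend bit
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:             # trial subtraction succeeds
            remainder -= divisor
            quotient += 1                    # lsb is 0 after the doubling
    return quotient, remainder

dividend = 23567 + (34239 << 16) + (12345 << 32)
divisor = 0xABCD + (123 << 16)
q, r = long_divide(dividend, divisor, 80 * 16)
print(hex(q), hex(r))    # 0x63d350 0x38b4ff
```

These values correspond to the bytes 50 D3 63 and FF B4 38 seen in the quotient and remainder areas of the second debug test.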
; Division of arbitrary long numbers
data segment
dvd dw 40 dup(4096), 65530, 39 dup(65535) ;dividend
dr dw 65534, 39 dup(65535), 40 dup(0) ;divisor
qt dw 80 dup(?) ; both quotient and remainder spaces are
r1 dw 80 dup(?) ; provided the same as dividend space
r2 dw 80 dup(?)
ndd dw 80 ; same space, as dividend at start
spare dw 7 dup(?);
data ends
;
code segment
assume cs:code, ds:data, es:data ;
;
strt: mov ax, data
mov ds, ax
mov es, ax
; preparing stack frame
mov ax, offset dvd ; [bp+14]
push ax
mov ax, offset dr ; [bp+12]
push ax
mov ax, offset qt ; [bp+10]
push ax
mov ax, offset r1 ; [bp+8]
push ax
mov ax, offset r2 ; [bp+6]
push ax
mov ax, ndd ; [bp+4]
push ax
call longdiv ; ret address [bp+2]
;
int 1
;
; now macros used
;
clr macro offs, n ;; fills n words at offs with 0 (ax is cleared here)
mov di, offs
mov cx, n
sub ax, ax
rep stosw
endm
;
double macro offsd, n
local dub
mov si, offsd
mov di, si
mov cx, n
clc
dub: lodsw
adc ax, ax
stosw
loop dub
endm
;
; to handle variable size divisors, it is necessary to find the
; exact word size of the divisor. note the dividend size is
; flexible and accommodated in the process.
drsize macro ofset, nsize
local size1
mov si, ofset
mov cx, nsize
add si, cx
add si, cx
size1: sub si, 2 ; msw address of data at ofset
mov ax, [si]
or ax, ax
loopz size1
rol ax, 1
adc cx,0
endm
subt macro offs1, offs2, offs3, n ;; [offs1] - [offs2] --> offs3

local subt1
mov si, offs1
mov di, offs3
mov bx, offs2
mov cx, n
clc
subt1: lodsw
sbb ax, [bx]
inc bx
inc bx ;; inc is used so that carry remains same.
stosw
loop subt1
endm
;
longdiv proc near
push bp
mov bp, sp
sub sp, 4 ; [bp-2] for loop count, [bp-4] for partially
; shifted data
cld
;
; clear quotient space
clr [bp+10], [bp+4]
;
; clear remainder space
clr [bp+8], [bp+4]
;
; now, the real works! First double quotient and rem. r1
drsize [bp+12], [bp+4]
mov ax, [bp+4]
mov [bp-2], ax ; current word count for loop
mov bx, [bp+14] ; dividend start address
dec ax
shl ax, 1
add bx, ax ; dividend end address
lup1: push bx
mov ax, [bx] ; msw of dividend
mov [bp-4], ax ; dividend partial word, (current)
mov dx, 16
lup2: double [bp+10], [bp+4]
double [bp+8], [bp+4]
mov ax, [bp-4]
shl ax, 1
mov [bp-4], ax
mov bx, [bp+8]
adc word ptr [bx], 0 ; inc r1, on carry
subt [bp+8], [bp+12], [bp+6], 41 ; r1-divr = r2
jc down
mov bx, [bp+10]
inc word ptr[bx] ; quotient to be incremented
; the lsw of the quotient will end with a
; 0, hence incrementing the lsw is all
; that is required to increment the quotient
; now interchange r1 and r2 pointers
mov ax, [bp+8]
xchg ax, [bp+6]
mov [bp+8], ax
down: dec dx
jnz lup2
pop bx
sub bx, 2
dec word ptr[bp-2]
jnz lup1
mov si, [bp+8] ; pointer to remainder
mov sp, bp
pop bp
ret 12
longdiv endp
code ends
end strt
; The program takes 211 bytes of memory in the code segment and about 810
; bytes of memory in the data segment, with about 30 bytes in the stack segment.
; On testing the program with the above data, the following results are obtained,
; as seen from the relevant data segment area.
; The result of testing the program in debug after assembling and
; linking is presented below.
-g
AX=FFFF BX=FFFE CX=0000 DX=0000 SP=0000 BP=0000 SI=01E0 DI=02D2
DS=13DC ES=13DC SS=13DC CS=140F IP=0024 NV UP EI PL ZR NA PE CY
140F:0024 55 PUSH BP; First instn of proc(after INT 1).
-d 0 32f
; The dividend
13DC:0000 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0010 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0020 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0030 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0040 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0050 FA FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; the divisor
13DC:00A0 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0100 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0120 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; the quotient
13DC:0140 FC FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; The remainder is here in this case, as pointed to by the SI reg.

13DC:01E0 F8 0F 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:01F0 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0200 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0210 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0220 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0230 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0240 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0250 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0260 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0270 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; r2, the scratch pad area.

13DC:0280 FA 0F 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0290 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:02A0 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:02B0 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:02C0 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:02D0 FF FF 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0300 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0320 50 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 P...............
I have verified the result above by using the principles of Vedic
Mathematics. The yellow highlighted regions are helpful in the verification.
; The same program tested with another set of data to indicate the data size
; flexibility of the program.
data segment
dvd dw 23567, 34239, 12345, 77 dup(0) ;dividend
dr dw 0abcdh, 0123, 78 dup(0) ;divisor
qt dw 80 dup(?) ; both quotient and remainder spaces are
r1 dw 80 dup(?) ; provided the same space as dividend space
r2 dw 80 dup(?)
ndd dw 80 ; same space, as dividend at start
spare dw 7 dup(?);
data ends
;
; Results of test
-g
AX=FFFF BX=FFFE CX=0000 DX=0000 SP=0000 BP=0000 SI=0280 DI=0232

DS=13DC ES=13DC SS=13DC CS=140F IP=0024 NV UP EI PL ZR NA PE CY
140F:0024 55 PUSH BP
-d 0 32f
; dividend
13DC:0000 0F 5C BF 85 39 30 00 00-00 00 00 00 00 00 00 00 .\..90..........
13DC:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; divisor
13DC:00A0 CD AB 7B 00 00 00 00 00-00 00 00 00 00 00 00 00 ..{.............
13DC:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0100 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0120 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; quotient
13DC:0140 50 D3 63 00 00 00 00 00-00 00 00 00 00 00 00 00 P.c.............
13DC:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; scratchpad r2 changed to this place

13DC:01E0 32 09 BD FF FF FF FF FF-FF FF FF FF FF FF FF FF 2...............
13DC:0230 FF FF 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0240 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0250 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0260 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0270 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; the remainder r1 changed to this place, see the register si for the
; starting address of remainder.
13DC:0280 FF B4 38 00 00 00 00 00-00 00 00 00 00 00 00 00 ..8.............
13DC:0290 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0300 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0320 50 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 P...............
This result can be checked with the scientific calculator.
8. Some Macros that can be used for Large Number handling: Large numbers
can be treated as single data entities by allotting a data array to each large number.
The array can be of fixed length, just as word data is handled in fixed-size registers. A
register is nominally 16 bits wide, but that does not prevent us from storing and handling
any data of up to 16 bits in it. The number 1, for instance, is stored in a 16 bit register as
0001 hex and can be handled as a 16 bit number without any computational problem. In a
similar way we can allot, say, 256 bytes of memory to store any data of up to 256 bytes
(2048 bits). The data is referenced by the start address of the array and by the number of
data words used. The array is filled from the start address, extends as far as significant
bits exist in the data, and the remaining words are filled with 0's. For example, see the
remainder value stored in the array just above (memory locations 13DC:0280 to
13DC:031F). It occupies 3 bytes, and can be considered as 3 byte data, 2 word data or
160 byte data without any ambiguity. The macros presented below handle data in this
fashion. It is a good method to store data so that all numbers are arrays of equal size; the
algorithms can normally handle zero data in the leading words without serious problems,
other than the extra time spent processing the zero words as valid data. If one is
particular, one can extract the exact word count and limit the computations to that size.
But this is not necessary, as some of the programs above indicate. However, the set of
macros given below includes a macro to find the actual number of significant words of a
data array.
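The storage scheme described above can be modelled in a few lines of Python. This is only an illustrative sketch, not 8086 code; the helper names `to_words`, `from_words` and `significant_words` are hypothetical and are not part of the macros in this chapter.

```python
WORD_BITS = 16

def to_words(value, n_words):
    """Split a non-negative integer into n_words 16-bit words, least significant first."""
    if value < 0 or value >> (WORD_BITS * n_words):
        raise ValueError("value does not fit in the array")
    return [(value >> (WORD_BITS * i)) & 0xFFFF for i in range(n_words)]

def from_words(words):
    """Rebuild the integer from a little endian word array."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def significant_words(words):
    """Count the words up to and including the highest non-zero word (0 if all are zero)."""
    count = len(words)
    while count and words[count - 1] == 0:
        count -= 1
    return count

# The remainder bytes FF B4 38 at 13DC:0280, held in a 160 byte (80 word) array.
remainder = to_words(0x38B4FF, 80)
print(len(remainder) * 2)                  # 160 bytes allotted
print(significant_words(remainder))        # 2 significant words
print(from_words(remainder) == 0x38B4FF)   # True: the padding zeros do not change the value
```

The padding zeros cost only processing time, which is the point made in the paragraph above.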
Here is a study of some macros useful in handling arbitrarily long integers.

MACROS USEFUL FOR HANDLING LARGE NUMBERS
data segment
sc1 dw 02, 31 dup(0), 0fffeh, 30 dup (0ffffh), 0fffh, 64 dup (0)
sc2 dw 32 dup (0ffffh), 96 dup (0)
sc3 dw 128 dup (0)
des dw 128 dup (?)
des2 dw 128 dup (?)
n dw 128
data ends
;
code segment
assume cs:code, ds:data, es:data
;
; arbitrarily long integer handling macros
; (i) end address calculation
;
lend macro src, reg, n
mov reg, offset [src]
mov cx, n
dec cx
add reg, cx
add reg, cx
inc cx
endm
;
; (ii) [src1].as.[src2] --> [dest]; direction UP assumed
; 'as' is 'adc' or 'sbb'
;
las macro src1, src2, dest, n, as
local las1
mov si, offset[src1]
mov bx, offset[src2]
mov di, offset[dest]
mov cx, n
clc
las1: lodsw
as ax, [bx]
stosw
inc bx
inc bx
loop las1
endm
;
; (iii) mov arr1 to arr2; non overlapping arrays
;
movarr macro src, dest, n
mov si, offset[src]
mov di, offset[dest]
mov cx, n
rep movsw
endm
;
; (iv) lsig obtain the significant number of words of a long integer
;
lsig macro src, n
local lsig1
mov si, offset [src]
mov cx, n
dec cx
add si, cx
add si, cx
add cx, 2
std
lsig1: lodsw
or ax, ax
loopz lsig1
cld
endm
;
strt: mov ax, data
mov ds, ax
mov es, ax
lend sc1, si, n
las sc1, sc2, des, n, adc
movarr des, des2, n
las des2, sc1, des2, n, sbb
lsig sc1, n
int 1
code ends
end strt
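Before stepping through the program in DEBUG, the expected results of the two `las` calls can be worked out with a small Python model. This is a sketch for cross-checking only: the function `las` below imitates the macro's word-by-word loop, and the Python lists stand in for the arrays sc1, sc2, des and des2.

```python
MASK = 0xFFFF

def las(src1, src2, op):
    """Model of the 'las' macro: word-by-word adc or sbb with the carry cleared at the start."""
    carry = 0                               # clc
    dest = []
    for a, b in zip(src1, src2):            # lodsw / as ax,[bx] / stosw
        if op == "adc":
            total = a + b + carry
        elif op == "sbb":
            total = a - b - carry
        else:
            raise ValueError("op must be 'adc' or 'sbb'")
        dest.append(total & MASK)
        carry = 1 if (total < 0 or total > MASK) else 0   # carry or borrow into the next word
    return dest, carry

# The data arrays exactly as declared in the data segment (128 words each).
sc1 = [0x0002] + [0] * 31 + [0xFFFE] + [0xFFFF] * 30 + [0x0FFF] + [0] * 64
sc2 = [0xFFFF] * 32 + [0] * 96

des, carry_out = las(sc1, sc2, "adc")       # las sc1, sc2, des, n, adc
des2, borrow_out = las(des, sc1, "sbb")     # las des2, sc1, des2, n, sbb (des2 is a copy of des)

print(hex(des[0]), carry_out)               # 0x1 0
print(des2 == sc2)                          # True
```

The first word of the sum is 0001 with no final carry, and subtracting sc1 from the sum gives back sc2, which is what the memory dumps at [300] and [400] should show.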
TESTING IN DEBUG
-u 0 64
; INITIALISATION
0B96:0000 B8450B MOV AX,0B45
0B96:0003 8ED8 MOV DS,AX
0B96:0005 8EC0 MOV ES,AX
;
; TESTING END ADDRESS COMPUTATION OF DATA [0] OR SC1
0B96:0007 BE0000 MOV SI,0000
0B96:000A 8B0E0005 MOV CX,[0500]
0B96:000E 49 DEC CX
0B96:000F 03F1 ADD SI,CX
0B96:0011 03F1 ADD SI,CX
0B96:0013 41 INC CX
;
; TESTING LAS FOR ADD; SC1 + SC2 --> [300] OR [0] + [100] --> [300]
0B96:0014 BE0000 MOV SI,0000
0B96:0017 BB0001 MOV BX,0100
0B96:001A BF0003 MOV DI,0300
0B96:001D 8B0E0005 MOV CX,[0500]
0B96:0021 F8 CLC
0B96:0022 AD LODSW
0B96:0023 1307 ADC AX,[BX]
0B96:0025 AB STOSW
0B96:0026 43 INC BX
0B96:0027 43 INC BX
0B96:0028 E2F8 LOOP 0022
;
; TESTING MOVE; [300] --> [400]
0B96:002A BE0003 MOV SI,0300
0B96:002D BF0004 MOV DI,0400
0B96:0030 8B0E0005 MOV CX,[0500]
0B96:0034 F3 REPZ
0B96:0035 A5 MOVSW
;
; TESTING LAS FOR SUBTRACT; [400] - [0] --> [400]
0B96:0036 BE0004 MOV SI,0400
0B96:0039 BB0000 MOV BX,0000
0B96:003C BF0004 MOV DI,0400
0B96:003F 8B0E0005 MOV CX,[0500]
0B96:0043 F8 CLC
0B96:0044 AD LODSW
0B96:0045 1B07 SBB AX,[BX]
0B96:0047 AB STOSW
0B96:0048 43 INC BX
0B96:0049 43 INC BX
0B96:004A E2F8 LOOP 0044
;
; TESTING LSIG; SIGNIFICANT NUMBER OF WORDS OF [0] --> CX; SIGN FLAG INDICATES
; THE LEADING BIT OF THE LEADING WORD
0B96:004C BE0000 MOV SI,0000
0B96:004F 8B0E0005 MOV CX,[0500]
0B96:0053 49 DEC CX
0B96:0054 03F1 ADD SI,CX
0B96:0056 03F1 ADD SI,CX
0B96:0058 83C102 ADD CX,+02
0B96:005B FD STD
0B96:005C AD LODSW
0B96:005D 0BC0 OR AX,AX
0B96:005F E1FB LOOPZ 005C
0B96:0061 FC CLD
0B96:0062 CD01 INT 01
0B96:0064 2AE4 SUB AH,AH
-g 7 ;LOAD THE SEGMENT REGISTERS
AX=0B45 BX=0000 CX=0574 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000

DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0007 NV UP EI PL NZ NA PO NC
0B96:0007 BE0000 MOV SI,0000
-d 0 500 ; the original data in the data arrays

0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0040 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0140 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0200 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0210 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0220 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0230 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0240 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0250 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0260 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0270 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0280 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0290 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0300 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0340 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0350 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0360 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0370 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0400 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0440 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0450 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0460 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0470 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0500 80 ; the value of n .
-g 14
;COMPUTE END ADDRESS OF SOURCE1
AX=0B45 BX=0000 CX=0080 DX=0000 SP=0000 BP=0000 SI=00FE DI=0000
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0014 NV UP EI PL NZ AC PO NC
0B96:0014 BE0000 MOV SI,0000
-g 2a
;TEST LAS FOR ADD;
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=002A NV UP EI PL NZ AC PE NC
0B96:002A BE0003 MOV SI,0300
-d 0 3ff
; DATA [0] + [100] --> [300]
; DATA [0]
0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;DATA [100]
0B45:0140 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;
; DATA [300]
0B45:0300 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 36
;DATA [300] --> [400]

DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0036 NV UP EI PL NZ AC PE NC
0B96:0036 BE0004 MOV SI,0400
-d 300 4ff
0B45:0300 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 4c; TESTING LAS FOR SUBTRACTION

;
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=004C NV UP EI PL NZ AC PE NC
0B96:004C BE0000 MOV SI,0000
-d
; DATA [400] - DATA [0] --> DATA [400]
; DATA [400] BEFORE EXECUTION OF LAS
0B45:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;DATA [0]
0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; DATA [400] AFTER SUBTRACTION

0B45:0440 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0450 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0460 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0470 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g
; VERIFICATION OF LSIG, THE PL FLAG SHOWS THE LEADING DIGIT OF THE MS WORD IS 0
AX=0FFF BX=0100 CX=0040 DX=0000 SP=0000 BP=0000 SI=007C DI=0500
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0064 NV UP EI PL NZ NA PE NC
0B96:0064 2AE4 SUB AH,AH
-q
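The final register values of the `lsig` test (CX=0040, SI=007C, AX=0FFF) can be reproduced by following the macro one instruction at a time. The Python function below is a hypothetical model written for this check. It assumes that a word read below the start of the array is 0, which only matters when the whole array is zero; the real macro would read whatever lies at that address.

```python
def lsig(words, base_offset=0):
    """Follow the lsig macro step by step; return (CX, SI, AX) as DEBUG would show them."""
    n = len(words)
    si = base_offset + 2 * (n - 1)          # offset of the last word of the array
    cx = n + 1                              # mov cx, n / dec cx / add cx, 2
    while True:
        index = (si - base_offset) // 2
        # lodsw with the direction flag set; out-of-range reads are modelled as 0 (assumption)
        ax = words[index] if 0 <= index < n else 0
        si -= 2                             # SI steps down by one word
        cx -= 1                             # loopz decrements CX first ...
        if cx == 0 or ax != 0:              # ... and falls through when CX = 0 or ZF = 0
            return cx, si & 0xFFFF, ax

# sc1 as declared in the data segment, located at offset 0000.
sc1 = [0x0002] + [0] * 31 + [0xFFFE] + [0xFFFF] * 30 + [0x0FFF] + [0] * 64

cx, si, ax = lsig(sc1)
print(f"CX={cx:04X} SI={si:04X} AX={ax:04X}")   # CX=0040 SI=007C AX=0FFF
```

CX is left holding the number of significant words (64 here), and AX holds the most significant word, whose top bit sets the sign flag through `or ax, ax`.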
These macros, together with ‘condh’ and ‘conhd’ that we discussed in section 3 of
this chapter, can conveniently be used in the large number programs we have
been seeing. Sometimes we need to convert little endian data to big endian data
without changing its location. This is the same as the in-situ array reversal program given
under section 6 of Chapter 5. A slightly modified macro, tailored to large number
handling in terms of word arrays, is given below without detailed comments, for your
study. The macro assumes that the direction flag is clear, so that the string operations
work in the address incrementing mode.
data segment
arr dw 1, 2, 3, 4, 5, 7
n dw ($ - arr)/2
data ends
;
;
code segment
assume cs:code, ds:data, es: data
;
; now the macro revar is defined
revar macro src, n
local rev1, rev2
mov si, offset [src]
mov di, si
add si, n
add si, n
jmp rev1
rev2: mov ax, [di]
xchg ax, [si]
stosw
rev1: sub si, 2
cmp si, di
ja rev2
endm
;
Strt: mov ax, data
mov ds, ax
mov es, ax
revar arr, n
int 1
code ends
end strt
The test program and the macro are given without test results. The verification of
the program is left to the reader.
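As an aid to that verification, the expected contents of arr after the macro has run can be predicted with the following Python sketch. It is a model only: list indices stand in for the registers DI and SI, and each statement is matched to the macro instruction it imitates.

```python
def revar(words):
    """In-place reversal following the revar macro; indices stand in for DI and SI."""
    di = 0                                  # mov di, si      (start of the array)
    si = len(words)                         # add si, n twice (one word past the end)
    while True:
        si -= 1                             # rev1: sub si, 2
        if si <= di:                        # cmp si, di / ja rev2: continue only while SI > DI
            break
        ax = words[di]                      # rev2: mov ax, [di]
        ax, words[si] = words[si], ax       # xchg ax, [si]
        words[di] = ax                      # stosw stores AX at [DI] ...
        di += 1                             # ... and advances DI by one word
    return words

arr = [1, 2, 3, 4, 5, 7]
print(revar(arr))                           # [7, 5, 4, 3, 2, 1]
```

In DEBUG, a dump of the data segment after the `int 1` should therefore show the words 0007, 0005, 0004, 0003, 0002, 0001 in that order.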
9. A Table of first 16K prime numbers in Decimal: In section 7 of chapter 5, we
saw how we can check if a given number is a prime, using a table of primes limited
to the square root of the number being checked. We may start with a table which
has just a couple of prime numbers, 2 and 3, and build the table and also use it as
we proceed (satisfy yourself about this statement). Below, we give a program
which builds its own prime number table as it progresses.
All the computations are done in hexadecimal, and the first 16K (16384) prime
numbers are found. Each number is stored as a 4 byte double word, and
these double words are then converted to decimal numbers using a 4 byte hex to BCD
conversion routine.
The numbers fill the entire 64K data segment and are stored in the big endian fashion
in decimal. The largest number in the table is the 6-digit decimal prime number
180503. The entire table is not presented; only samples of the output at
the start and at the end are shown. The complete list can be got by copying the
program, assembling it and running it in the DOS (DEBUG) environment.
Exercise: Study the program in respect of the algorithm, register usage and
optimizations done in the program.
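The idea of a table that is built and used at the same time can be tried out first in a high level language. The Python sketch below follows the same plan as the program (only odd candidates, trial division by the primes already in the table, stopping once the square of the divisor exceeds the candidate). The function name `build_prime_table` is hypothetical, and the sketch makes no attempt to imitate the register usage.

```python
def build_prime_table(count):
    """Grow a prime table by trial division against the table itself."""
    table = [2, 3]                          # the two primes the table starts with
    candidate = 3
    while len(table) < count:
        candidate += 2                      # only odd numbers are tried
        i = 1                               # start dividing with 3; the divisor 2 is never needed
        while True:
            p = table[i]
            if p * p > candidate:           # no divisor up to the square root: it is a prime
                table.append(candidate)
                break
            if candidate % p == 0:          # p divides it (this also covers candidate == p*p)
                break
            i += 1
    return table

primes = build_prime_table(16 * 1024)
print(primes[:8])                           # [2, 3, 5, 7, 11, 13, 17, 19]
print(primes[-1])                           # 180503
```

The inner loop never runs off the end of the table, because the table always contains a prime larger than the square root of the candidate by the time it is needed.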
A DECIMAL LIST OF FIRST 16K PRIME NUMBERS AND THE PROGRAM THAT PRODUCED THEM
data segment
table dw 32768 dup(?)
data ends
;
stak segment stack
dw 256 dup(?)
tos label word
stak ends
;
code segment
assume cs:code, ds: data, es:data, ss:stak
start: mov ax, data
mov ds, ax
mov es, ax
mov ax, stak
mov ss, ax
lea ax, tos
mov sp, ax
mov ax, 2
sub dx, dx
mov cx, dx
cld
sub di, di
stosw ; first prime number 2, stored
xchg ax,dx
stosw ; stored as a double word
xchg ax, dx
inc ax ; next prime, 3.
stosw
xchg ax, dx
stosw
xchg ax,dx ; stored as a double word
mov bx,ax ;cx:bx is the number to be checked if prime
nextp: or di, di ;di is address for storing next prime
jz finish
nextp2: add bx, 2 ;try if the next odd number is a prime
adc cx, 0 ; if carry, increment cx
mov si,4
next: lodsw
cmp ax, 65535; what is this check? This ensures we do not get too large
; a number as the prime. Actually this is unnecessary.
jz finish
mov bp, ax
add si, 2
mul ax
cmp cx, dx
jz proc1
jnb procm ; if number-now is > bp*bp, then divide number-now by bp
over: mov ax, bx
stosw
mov ax, cx ; yellow part of the program checks the next odd number
stosw
jmp nextp
proc1: cmp bx, ax ; square is more than the number-now prime
jb over ; the number is prime, store it
jz nextp2 ; the number is a square, so not prime.
procm: mov ax, bx
mov dx, cx
div bp
or dx, dx
jnz next
jmp nextp2
finish: mov cx, 8
sub si, si
mov di, si
bak: lodsw
mov dx, ax
lodsw ;first 8 numbers converted to decimal
stosw
mov ax, dx
cmp ax, 10
jb dwn
add ax, 6
dwn: xchg ah, al
stosw
loop bak
lup2: lodsw
mov dx, ax
lodsw
xchg ax, dx
mov bp, 10000
div bp
cmp ax, 10
jb dn1
add ax, 6
dn1: xchg ah, al
dn: stosw
mov ax, dx
mov dx, 100 ; remaining double hex words converted to BCD
div dl
mov cx, 4
xchg ah, ch
aam
ror al, cl
ror ax, cl
xchg al, ch
aam
rol al, cl
rol ax, cl
mov al, ch
stosw
cmp si, 0000
jnz lup2
ok: int 1
code ends
end start
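The hex to BCD conversion in the loop lup2 can be checked by hand with the following Python sketch. It is a model of the arithmetic only, under the assumption that the number is below 200000, so that the quotient after dividing by 10000 is at most 19 and the "add 6" correction is enough. The name `to_packed_bcd` is hypothetical.

```python
def to_packed_bcd(value):
    """Return the four bytes stored for one table entry (big endian packed BCD)."""
    if not 0 <= value < 200000:
        raise ValueError("the add-6 correction only works for quotients 0..19")
    high, low = divmod(value, 10000)        # div bp with BP = 10000
    if high >= 10:                          # cmp ax, 10 / add ax, 6: binary 10..19 becomes 10h..19h
        high += 6
    hundreds, units = divmod(low, 100)      # div dl with DL = 100

    def pack(two_digits):                   # aam followed by the rotates: two digits in one byte
        tens, ones = divmod(two_digits, 10)
        return (tens << 4) | ones

    return bytes([0x00, high, pack(hundreds), pack(units)])

print(to_packed_bcd(180503).hex(" "))       # 00 18 05 03
print(to_packed_bcd(311).hex(" "))          # 00 00 03 11
```

These two results correspond to the last entry of the table and to the 64th prime, so they can be compared directly with the memory dumps of the data segment.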
Testing in the Debug Environment
-u 0 b3
23F6:0000 B8D613 MOV AX,13D6

23F6:0003 8ED8 MOV DS,AX
23F6:0005 8EC0 MOV ES,AX
23F6:0007 B8D623 MOV AX,23D6
23F6:000A 8ED0 MOV SS,AX
23F6:000C 8D060002 LEA AX,[0200]
23F6:0010 8BE0 MOV SP,AX
23F6:0012 B80200 MOV AX,0002
23F6:0015 2BD2 SUB DX,DX
23F6:0017 8BCA MOV CX,DX
23F6:0019 2BFF SUB DI,DI
23F6:001B AB STOSW
23F6:001C 92 XCHG DX,AX
23F6:001D AB STOSW
23F6:001E 92 XCHG DX,AX
23F6:001F 40 INC AX
23F6:0020 AB STOSW
23F6:0021 92 XCHG DX,AX
23F6:0022 AB STOSW
23F6:0023 92 XCHG DX,AX
23F6:0024 8BD8 MOV BX,AX
23F6:0026 0BFF OR DI,DI
23F6:0028 7436 JZ 0060
23F6:002A 83C302 ADD BX,+02
23F6:002D 83D100 ADC CX,+00
23F6:0030 BE0400 MOV SI,0004
23F6:0033 AD LODSW
23F6:0034 3D0100 CMP AX,0001
23F6:0037 7427 JZ 0060
23F6:0039 8BE8 MOV BP,AX
23F6:003B 83C602 ADD SI,+02
23F6:003E F7E0 MUL AX
23F6:0040 3BCA CMP CX,DX
23F6:0042 740A JZ 004E
23F6:0044 730E JNB 0054
23F6:0046 8BC3 MOV AX,BX
23F6:0048 AB STOSW
23F6:0049 8BC1 MOV AX,CX
23F6:004B AB STOSW
23F6:004C EBD8 JMP 0026
23F6:004E 3BD8 CMP BX,AX
23F6:0050 72F4 JB 0046
23F6:0052 74D6 JZ 002A
23F6:0054 8BC3 MOV AX,BX
23F6:0056 8BD1 MOV DX,CX
23F6:0058 F7F5 DIV BP
23F6:005A 0BD2 OR DX,DX
23F6:005C 75D5 JNZ 0033
23F6:005E EBCA JMP 002A
23F6:0060 B90800 MOV CX,0008
23F6:0063 2BF6 SUB SI,SI
23F6:0065 8BFE MOV DI,SI
23F6:0067 AD LODSW
23F6:0068 8BD0 MOV DX,AX
23F6:006A AD LODSW
23F6:006B AB STOSW
23F6:006C 8BC2 MOV AX,DX
23F6:006E 3D0A00 CMP AX,000A
23F6:0071 7203 JB 0076
23F6:0073 050600 ADD AX,0006
23F6:0076 86E0 XCHG AH,AL
23F6:0078 AB STOSW
23F6:0079 E2EC LOOP 0067
23F6:007B AD LODSW
23F6:007C 8BD0 MOV DX,AX
23F6:007E AD LODSW
23F6:007F 92 XCHG DX,AX
23F6:0080 BD1027 MOV BP,2710
23F6:0083 F7F5 DIV BP
23F6:0085 3D0A00 CMP AX,000A
23F6:0088 7203 JB 008D
23F6:008A 050600 ADD AX,0006
23F6:008D 86E0 XCHG AH,AL
23F6:008F AB STOSW
23F6:0090 8BC2 MOV AX,DX
23F6:0092 BA6400 MOV DX,0064
23F6:0095 F6F2 DIV DL
23F6:0097 B90400 MOV CX,0004
23F6:009A 86E5 XCHG AH,CH
23F6:009C D40A AAM
23F6:009E D2C8 ROR AL,CL
23F6:00A0 D3C8 ROR AX,CL
23F6:00A2 86C5 XCHG AL,CH
23F6:00A4 D40A AAM
23F6:00A6 D2C0 ROL AL,CL
23F6:00A8 D3C0 ROL AX,CL
23F6:00AA 8AC5 MOV AL,CH
23F6:00AC AB STOSW
23F6:00AD 83FE00 CMP SI,+00
23F6:00B0 75C9 JNZ 007B
23F6:00B2 CD01 INT 01
; 92 instructions and 179 memory locations used.
-g
AX=0305 BX=C117 CX=0504 DX=0064 SP=0200 BP=2710 SI=0000 DI=0000

DS=13D6 ES=13D6 SS=23D6 CS=23F6 IP=00B4 NV UP EI PL ZR NA PE NC
23F6:00B4 0000 ADD [BX+SI],AL DS:C117=71
-d 0 FF ; first 64 prime numbers in decimal

13D6:0000 00 00 00 02 00 00 00 03-00 00 00 05 00 00 00 07 ................
13D6:0010 00 00 00 11 00 00 00 13-00 00 00 17 00 00 00 19 ................
13D6:0020 00 00 00 23 00 00 00 29-00 00 00 31 00 00 00 37 ...#...)...1...7
13D6:0030 00 00 00 41 00 00 00 43-00 00 00 47 00 00 00 53 ...A...C...G...S
13D6:0040 00 00 00 59 00 00 00 61-00 00 00 67 00 00 00 71 ...Y...a...g...q
13D6:0050 00 00 00 73 00 00 00 79-00 00 00 83 00 00 00 89 ...s...y........
13D6:0060 00 00 00 97 00 00 01 01-00 00 01 03 00 00 01 07 ................
13D6:0070 00 00 01 09 00 00 01 13-00 00 01 27 00 00 01 31 ...........'...1
13D6:0080 00 00 01 37 00 00 01 39-00 00 01 49 00 00 01 51 ...7...9...I...Q
13D6:0090 00 00 01 57 00 00 01 63-00 00 01 67 00 00 01 73 ...W...c...g...s
13D6:00A0 00 00 01 79 00 00 01 81-00 00 01 91 00 00 01 93 ...y............
13D6:00B0 00 00 01 97 00 00 01 99-00 00 02 11 00 00 02 23 ...............#
13D6:00C0 00 00 02 27 00 00 02 29-00 00 02 33 00 00 02 39 ...'...)...3...9
13D6:00D0 00 00 02 41 00 00 02 51-00 00 02 57 00 00 02 63 ...A...Q...W...c
13D6:00E0 00 00 02 69 00 00 02 71-00 00 02 77 00 00 02 81 ...i...q...w....
13D6:00F0 00 00 02 83 00 00 02 93-00 00 03 07 00 00 03 11 ................
-d FF00 FFFF ; last 64 of the 16K Prime numbers in decimal

13D6:FF00 00 17 98 07 00 17 98 13-00 17 98 19 00 17 98 21 ...............!
13D6:FF10 00 17 98 27 00 17 98 33-00 17 98 49 00 17 98 97 ...'...3...I....
13D6:FF20 00 17 98 99 00 17 99 03-00 17 99 09 00 17 99 17 ................
13D6:FF30 00 17 99 23 00 17 99 39-00 17 99 47 00 17 99 51 ...#...9...G...Q
13D6:FF40 00 17 99 53 00 17 99 57-00 17 99 69 00 17 99 81 ...S...W...i....
13D6:FF50 00 17 99 89 00 17 99 99-00 18 00 01 00 18 00 07 ................
13D6:FF60 00 18 00 23 00 18 00 43-00 18 00 53 00 18 00 71 ...#...C...S...q
13D6:FF70 00 18 00 73 00 18 00 77-00 18 00 97 00 18 01 37 ...s...w.......7
13D6:FF80 00 18 01 61 00 18 01 79-00 18 01 81 00 18 02 11 ...a...y........
13D6:FF90 00 18 02 21 00 18 02 33-00 18 02 39 00 18 02 41 ...!...3...9...A
13D6:FFA0 00 18 02 47 00 18 02 59-00 18 02 63 00 18 02 81 ...G...Y...c....
13D6:FFB0 00 18 02 87 00 18 02 89-00 18 03 07 00 18 03 11 ................
13D6:FFC0 00 18 03 17 00 18 03 31-00 18 03 37 00 18 03 47 .......1...7...G
13D6:FFD0 00 18 03 61 00 18 03 71-00 18 03 79 00 18 03 91 ...a...q...y....
13D6:FFE0 00 18 04 13 00 18 04 19-00 18 04 37 00 18 04 63 ...........7...c
13D6:FFF0 00 18 04 73 00 18 04 91-00 18 04 97 00 18 05 03 ...s............
-q
Only the first and the last 256 bytes of displayed results (64 numbers in each block
of 256 bytes) are shown above. The last prime number indicated in the table is the
decimal number 180503. The result occupies the entire data segment. Note the
separate stack segment used in this program.
Conclusion: In this chapter we have seen how large integer numbers can be
handled in 8086 using only assembly language programming, without using any
sophisticated tools. These programs illustrate the capability of the processor hardware
and its instruction set. In the previous chapters we learnt about the processor
register set and instruction set architecture, and looked at some simple programs. In
this last chapter, we have seen how those simple programs can be used to work out
more complex number handling routines. When we get into designing big
programs, the basic principles we have learnt in the simpler programs of the earlier
chapters are still useful, but the complex programs do require careful management
of the resources available and proper tracking of the algorithm that we are
employing. Essentially, it is all about balancing the available resources against the
requirements of our algorithm, and this can be greatly helped by making adequate
comments so that the program can be easily understood and debugged without much
difficulty.
==00==
EXERCISES
1. Modify any one of the large number handling programs given in this Chapter
so that it uses the macros given at the end in section 8.
2. Write a program to invert a 20 byte number using the method of section 6.
APPENDIX A
In this book I have indicated the working and the results produced while running a
.exe program in the debug environment. It is, as you have seen, quite useful to produce a
permanent copy of the working with results for later use and demonstration.
In this appendix I shall explain the method that I have used for obtaining an MSWord file
showing the working and results from the debug. The method is as follows:
In the DOS environment invoke the debug with the parameter [filename.exe]. If
this is followed by pressing the <enter> key, the debug will work as usual. If instead
the parameter is followed by [>filename.dem], the output from the debug will go to
the .dem file. However, the response of the debug will not appear on the screen; it will
go directly into the .dem file. (In Unix OS, there is a Tee operation which makes the
result go both to the screen, which is the standard output device, and to a named file. If
you are using a Unix system, it can be quite convenient. It is not so straightforward to do
this in the windows system, though I am sure there are tricks to overcome this deficiency.
In the following site (see under the FAQ index number 94)
http://www.netikka.net/tsneti/info/tscmd.htm you
will find some tricks to do this job in windows which may be useful.) I have not used
any of these methods; I have operated (blindly) in the debug environment by
redirecting the result directly to a .dem file without getting any visual feedback on the
screen. The .dem file can then be copied into an MSWord file and manipulated like a
.doc file. As a .dem file it cannot be easily manipulated, as the output file created is not a
regular MSWord file; it is a plain text file, something like a notepad file.
For example, if we want to test the program style1.exe, the command sequence
could be as follows (refer Chapter 3, programming style 1):
From the DOS screen, give the command “debug style1.exe > style1.dem”. Press
the <enter> key as usual after the command. You will notice the debug prompt “-”. But
from then on, none of the debug responses will be seen on the screen. These responses
will go directly to the style1.dem file. The debug commands will have to be given
correctly as required, without any visible feedback on the responses of the debug. In the
case of the demo on style1, the following sequence of commands was given.
u 0 30 <enter>
rax <enter>
ffef <enter>
r <enter>
t16 <enter>
q <enter>
These command sequences have to be worked out initially in the debug mode for
the required operations. Once the .dem file is obtained, it can conveniently be copied into
an MSWord file and edited wherever necessary. In this way, almost hands-on type of
feature can be had for studying assembly language programs, with a hard copy of the
working of the different programs.
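Since the commands have to be typed without feedback, one possible refinement is to prepare the command sequence as a text file beforehand. The Python sketch below only builds the text of such a file from the commands listed above. The file name style1.scr and the idea of feeding it to the debug through DOS input redirection (for example, debug style1.exe < style1.scr > style1.dem) are assumptions for illustration and are not the method used in this book.

```python
# Commands used in the style1 demo, one per line, in the order they were typed.
commands = ["u 0 30", "rax", "ffef", "r", "t16", "q"]

def make_debug_script(lines):
    """Join the debug commands into script text, ending each with a DOS line ending."""
    return "".join(line + "\r\n" for line in lines)

script = make_debug_script(commands)
print(repr(script))

# Hypothetical use: save 'script' as style1.scr, then at the DOS prompt try
#   debug style1.exe < style1.scr > style1.dem
```

The last command must be q; otherwise the debug would keep waiting for input that never arrives.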

Copyright © 2008 K M Hebbar
