The Intel 8086, introduced in 1978, is one of the most successful microprocessors.
The later Intel processors based on the IA-32 architecture are upward compatible with it,
and it is the processor which almost every beginner in microprocessor studies goes
through. It has a fairly complex and quite powerful instruction set. A good understanding
of the register capabilities as well as of the instruction set is needed to program the
processor and get the best from it.
In the present book, care is taken to make the background required for
programming very clear in the first two chapters. In the rest of the book, examples
are discussed to nail down the finer aspects of programming. Text books on assembly
language normally give one working program and leave it at that; no alternatives are
discussed. The logic of selecting a particular algorithm and of allotting registers to the
variables to be handled is seldom discussed seriously. In this book, proper register
selection and proper algorithm selection are discussed at length. In many instances
more than one program is presented for solving a given problem.
Note that we have used only the simplest programming mode of MASM. Aids such as
the tiny, small, medium and large memory models are not used, as the focus here is on
remaining as close as possible to the processor hardware rather than on the advanced
features of the tool, MASM.
Another important and unique feature of the book is that the programs given here
are fully tested, using MASM version 5.10 for assembling and the debug environment
for testing. The working of each program in the debug is illustrated with study
documents produced from the debug during its operation. One can almost get
hands-on experience while going through these documents.
The book presents programs at various levels of difficulty, from simple to complex.
The stress, however, is on number crunching programs, although some basic I/O
programs like keyboard handling and simple screen displays are included. The
concentration throughout the book is on assembly language programming at the
hardware level. This means that MASM feature based programming, like the use of
macro libraries or modular programming, which is not a function of the processor
hardware, is not treated seriously. Even so, at the end of chapter 6, the reader should
be in a position to create his/her own library of macros useful in programs handling
large numbers. The individual macros are discussed at length, as they serve to
introduce the concept of generating our own instructions to add to the instruction set
of the processor. The entire book is organized as follows:
Thorough descriptions of the register set and of the instruction set of the 8086
are presented in Chapter 2. They have to be fully understood before proceeding to
program in the assembly language. Just as a person learning chess must be thorough
with the rules of the game before starting to play, an assembly language programmer
must be completely confident of the register set and register capabilities, along with the
instruction set of the processor, before beginning to write programs. Examples are given
in this chapter for studying single instructions using the debug all by itself.
Chapter 4 discusses the use of macros and subroutines, which are quite useful in a
programming environment. Although they both serve almost similar purposes, namely
reducing repetitions from the point of view of the programmer, and freezing specific type
of useful tasks to reusable tested program units, they have their own differences and these
differences are clearly brought out and illustrated in this chapter.
Chapter 5 is devoted to simple example programs which help beginners
understand the various aspects of developing working programs. In this process, one
should observe that a program which works the first time in the lab is not good
learning material. Only when one finds and rectifies the errors that are inevitable
with any program is the learning complete.
Chapter 6 illustrates the power of the Intel 8086 processor in number crunching.
Very large numbers, including large BCD numbers, are handled in this chapter. These
programs are not suitable as a beginner's learning material. They are there to show
the capabilities of the processor and to motivate the reader into getting enthusiastic
about assembly language programming. One need not necessarily go into the details; it
is enough to believe that the details can be worked out with patience, and to see the
results and verify them.
The author is grateful to the Nitte Education Trust and to the Principal and the staff
of the NMAM Institute of Technology for providing an encouraging atmosphere in which
the author can peacefully pursue his interest. The staff and students of the National
Institute of Technology, Karnataka, where the author worked earlier and where he was
introduced to the 8086 processor, are gratefully acknowledged for motivating him to
study the intricacies of assembly language programming. I cannot, of course, fail to
mention the constant support of my family in all my endeavors. I believe the book
will be useful to staff and students in understanding the basics of assembly language
programming.
K M Hebbar
Copyright © 2008 K M Hebbar
1. ASSEMBLY LANGUAGE PROGRAMMING
The programmer using the AL (assembly language) must still have a complete
knowledge of the register set of the processor and of the capabilities of the registers
before he can write an efficient program. Writing an ALP (assembly language program)
to solve a problem requires thinking at two distinct levels, one at the processor
hardware level and another at the problem level. Let us take the simple operation of
multiplying two variables. At the problem or algorithmic level, all that is to be done is
to multiply two numbers. At the processor or hardware level, one has to worry about
where the variables are to be placed (in the processor registers or in the memory) and
where the result is to be put. The HLLs (high-level languages) differ from AL in this
respect. When using an HLL, one need not be concerned about where, in the hardware,
the variables are located. One can program directly in terms of the variables and the
operations to be done on them, that is, think only in terms of the algorithm and not
about implementation details in terms of the processor system hardware. The
conversion of HLL to ML (machine language), to use the hardware facilities available
in a given processor, is done by the HLL compiler specific to the target processor.
Different compilers may have different levels of program optimization. However, a
specific problem may have special features which the general purpose optimization of
an optimizing compiler cannot fully exploit. An efficient human programmer may be
better at exploiting such special features. Of course, it requires more effort on the part
of the programmer, but once this effort is put in, the resulting gain in the speed of
execution is available every time the program is used. An ALP can have this advantage
over a compiled program. The raw power of the processor can be best handled at the
AL level and not so much at the HLL level. Further, when using an HLL, the
programmer is bound by the compiler in respect of the data types he may use. For
example, integers longer than the types supported by the compiler cannot be used
directly. For further discussions on this, you may refer to the website
http://webster.cs.ucr.edu/Articles/GreatDebate/index.html
This 3-line 8085 program converts the single hex digit in register A to its ASCII
equivalent. The corresponding 8086 program given below will not do this conversion:
CMP AL, 0AH
SBB AL, 2FH
DAA
If we investigate the reason for this difference, we will discover quite a lot
about how the ALUs of the two processors differ in their design. (See Appendix 1.A for
a brief discussion on these programs.)
To study the hardware details to any extent, we should be as close to the hardware as
possible. The closest one can get without serious involvement with the machine language
is through the assembly language. The only other language which can be considered in
this context is the C language, which has some features of the assembly language. But
the assembly language gives the best possible approach to develop an insight into the
hardware of the processors.
does it relate to the problem or the algorithm at hand is not very clear. After a couple of
weeks, or even days, the programmer himself may not be able, perhaps, to understand
why a certain instruction is present or what it does in the program. To overcome this
difficulty, writing proper comments is a necessity. The instruction itself clearly gives
the hardware action; what the comment should state is the algorithmic reason for that
hardware action. A comment for comment’s sake, as in the example shown below,
says nothing beyond what the mnemonic says and should be strictly avoided:
    MOV BX, AX ; move the data from reg. AX, to reg. BX
Note that comments are separated from the instructions by a semicolon, “;”. In
any line, the assembler ignores whatever comes after the semicolon. If the above
instruction has to be commented, the comment, depending on the algorithmic context,
can be something like:
    MOV BX, AX ; save AX in BX for later use.
Writing the instructions with relevant comments at the problem level also makes
the process of thinking at two levels easier.
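As an illustration of comments written at the problem level, the short program below adds up a small table of words. It is only a sketch under stated assumptions: the names marks, nmarks and total are hypothetical and do not come from the programs of this book, and the return to DOS through function 4Ch of int 21h is assumed as the way to end the program. Each comment states why the instruction is there, not what the mnemonic does.

```asm
data    segment
marks   dw  10, 20, 30, 40      ; hypothetical table of four word values
nmarks  equ ($ - marks) / 2     ; number of words in the table
total   dw  ?                   ; the sum is stored here
data    ends

stak    segment stack
        dw  64 dup (?)          ; small stack, needed by the int instruction
stak    ends

code    segment
        assume cs:code, ds:data, ss:stak
start:  mov ax, data
        mov ds, ax              ; make the table addressable through ds
        lea si, marks           ; si walks through the table
        mov cx, nmarks          ; one pass of the loop per table entry
        xor ax, ax              ; running sum starts at zero
next:   add ax, [si]            ; accumulate the current entry
        add si, 2               ; step to the next word
        loop next               ; repeat until every entry is added
        mov total, ax           ; keep the sum for later use
        mov ah, 4Ch             ; assumed exit: DOS function 4Ch
        int 21h
code    ends
        end start
```

Read without the mnemonics, the comments alone give the algorithm; read without the comments, the instructions give only the hardware actions.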
Tools used for the ALP Studies: The basic tools required for studying the 8086
processor at the ALP level are (i) a macro assembler, MASM, for example, (ii) the
associated linker, and (iii) a debugger, DEBUG, for example. We shall be using
MASM (version 5.10), which assembles files with the .asm extension to produce a
.obj file (and a .lst file also, if required), and the associated linker, which produces a
.exe file from the .obj file(s). The .exe file can be studied in the DEBUG. As we show
later, simple studies (like single instruction studies, for example) which do not need
more than a few instructions can be done directly in the DEBUG itself. We shall look
into these tools one by one.
The DEBUG: The Debug is a low level facility which allows programs to be
assembled as well as executed, either step by step, displaying the entire register
contents after the execution of each instruction, or up to a specified break point. The
trace display also indicates the next instruction to be executed, along with any relevant
memory data associated with that instruction. When execution runs to a break point,
the register contents etc. are displayed after the execution of the final instruction
before the break. In the case of a subroutine, the trace through the subroutine can be
suppressed, and the result of the execution of the subroutine can be seen at the return
from the subroutine. The format of the trace display in the debug is shown below:
-t
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 8B4140 MOV AX,[BX+DI+40] DS:0040=0050
-
The different fields of the trace display are described below:
-t          The trace command; note that the “-” here is the debug prompt. This
            prompt is seen again after the trace operation is completed (in the
            fifth line of the display above), prompting for a fresh command.
AX=0000:    The thirteen registers other than the flag register are displayed, and
            their contents after the execution of the previous instruction are
            indicated in hex.
NV:         The eight flag conditions existing after the execution of the previous
            instruction are indicated explicitly, as follows:
            Overflow flag:    NV- No oVerflow; OV- OVerflow
            Direction flag:   UP- address increasing (string instructions);
                              DN- address decreasing
            Interrupt flag:   EI- Enable Interrupt; DI- Disable Interrupt
            Sign flag:        PL- positive or PLus; NG- NeGative
            Zero flag:        ZR- ZeRo; NZ- Not Zero
            Auxiliary carry:  AC- Auxiliary Carry present; NA- No Auxiliary carry
            Parity:           PE- Parity Even; PO- Parity Odd
            Carry flag:       CY- CarrY present; NC- No Carry present
1377:0100 8B4140
            The address of the next instruction and its machine language coding.
MOV AX,[BX+DI+40]
            The next instruction in the assembly language, ready for execution if a
            ‘t’ command is given next.
DS:0040=0050
            The relevant word data: the memory operand at DS:[BX+DI+40], which is
            DS:0040 here as BX and DI are both zero, holds 0050 hex.
General features of the debug: The debug prompt is the ‘–’ sign, as we have
already seen. All commands of debug are single letter commands. A command
normally has one or two parameters (sometimes a list of numbers), representing
addresses, data or register names. The parameters are given as hex numbers or register
names following the command letter, with a space to separate them if there are two.
The commands of debug are not case sensitive: ‘A’ or ‘a’ will carry out the same
command. A200, A 200, a200 and a 200 are all the same in the debug. Similarly, u200
210 and U 200 210 are the same; at least one blank space must separate two
parameters of a command, but a space between the command character and the first
parameter is optional. Only 16 bit register names may be used as parameters. For
example, ‘ax’ can be used with the register command, but not ‘ah’ or ‘al’; ‘rax’ or
‘r ax’ is a valid command, but not ‘ral’ or ‘r ah’. We shall now look into some of the
commands.
Table 1. Some Debug Commands
Command       Letter   Parameters
assemble      a        [address]
dump          d        [address range]
enter         e        address [list]
go            g        [=address] [list of alternative addresses]
proceed       p        [=address] [number]
quit debug    q        none
register      r        [register]
trace         t        [=address] [value]
unassemble    u        [range]
help          ?        none
Note:
1. All commands are single characters as shown in column 2 of the table.
The third column shows the parameters. Optional parameters are shown
within square brackets.
2. When optional parameters are not given, a default value based on the
current conditions will be taken.
3. Full details of the commands can be had in debug using the command ‘?’.
Several examples in the next chapter will clarify the use of these commands. An
example of the study of the g command is shown here. In this exercise you can see the
method of using not only the g command but several other commands as well.
-a ; assemble at the default address (no parameter given)
1377:0100 mov ax, 1234
1377:0103 ja 10a ; jump if above (no carry and not zero), to 10a hex location
1377:0105 jb 110 ; jmp if carry, to 110
1377:0107 jz 120 ; jmp if zero flag is set, to 120
1377:0109 ; simply press ‘enter’ to exit from ‘assemble’ command
-r ; display registers
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B83412 MOV AX,1234
-g =103 10a 110 120 ; start from 103 and halt at 10a, 110 or 120
-rf
NV UP EI PL NZ NA PO NC -cy ; set carry flag
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO CY
1377:0100 B83412 MOV AX,1234
-g = 103 10a 110 120
-rf
NV UP EI PL NZ NA PO CY -zr nc ; set zero and clear carry.
-r
Also see Appendix 1.B for a study of the proceed command of the debug.
The Macro Assembler MASM, and the Linker, LINK: The assembly language
program is written using the edit command in the DOS environment and saved with
the file extension .asm. It can then be assembled using the command MASM
filename; for example, to assemble the file hex_to_bcd.asm, the command will be:
Masm hex_to_bcd;
The ; at the end makes the assembler ask no further questions about the files to be
generated. It will generate only the object file, a machine language file with the file
extension .obj; for the above command, the file generated (provided there are no
errors in the ALP) will be:
Hex_to_bcd.obj
The object file has the machine codes for the ALP, but the segment addresses are not
yet fixed. This means the file is relocatable: every effective address in the program or
for data is relative to the base address of one of the segments (cs, ds, es or ss). The
link command makes the program completely executable, with all segment addresses
resolved and, if it is a multi-module program, with all the modules properly linked up.
For a single-module program such as the one indicated above, the link command is:
link hex_to_bcd;
The link operation produces an executable file, hex_to_bcd.exe, if everything is OK.
This executable file can be studied step by step, or using break points and so on, in
the debug environment. It can also be executed directly to the finish as a command
under DOS.
The Assembler Directives: The following skeleton of an ALP shows the main features
of a simple .asm file, indicating many of the assembler directives used.
 1  data    segment
 2  val_hex dw 1234h, 567h, 0abch
 3  val_dec dw 3 dup (?)
 4  data    ends
 5  stak    segment stack
 6          dw 256 dup(?)
 7  tos     label word
 8  stak    ends
 9  code    segment
10          assume cs:code, ds:data, es:data, ss:stak
11  start:  mov ax, data
 a.         mov ds, ax
 b.         mov es, ax
 c.         mov ax, stak
 d.         mov ss, ax
 e.         mov sp, offset tos
 f.         ;program instructions
12  code    ends
13          end start
We shall now look at the above skeleton program line-by-line.
Line 1: The assembler directive used is the segment. This makes the assembler
open up a new segment. The word data is the name given to the segment.
The format for the segment opening is name_of_segment followed by the
word segment.
Line 2: val_hex: name of the first variable stored.
        dw: the assembler directive define word, signifying that the variable val_hex
        and the rest of the data on that line are word or 16-bit data.
        db or dd in that position would indicate define byte and define double word
        (32 bits) respectively.
        The numbers in the line initialize the 3 words at the beginning of the
        segment named ‘data’ to the values given. Values can be given in binary
        (suffix b), decimal (the default) or hexadecimal (suffix h).
Line 3: val_dec: name of the variable stored after the 3 words of line 2
dw: assembler directive define word
3 dup (?) : this indicates 3 items (here 3 words) of data are provided
here, without being initialized. The directive means 3 duplicate (any data
word). Such un-initialized locations are kept for storing the results of the
program. If required, these locations can all be loaded with any data like all
0s by simply changing this part to 3 dup (0).
Line 4: data: The variable (segment name), as we have seen already.
ends: the assembler directive to end (or close) the segment (data segment in
this case).
Line 5: stak: the name of the segment.
        segment: segment directive, opening a new segment as we have
        already seen (and naming it stak).
        stack: the combine type, marking this segment as the stack segment. The
        linker uses it to record the starting values of ss and sp in the .exe file.
        The skeleton nevertheless loads ss and sp explicitly in lines c to e, so
        that the stack in use is evident to the reader of the program.
Line 6: dw: define word directive, which we have already seen.
        256 dup (?): 256 (= 100 hex) un-initialized words. This is the memory
        provided for the stack in this program.
Line 7: tos: label name (top of stack).
        label: directive to consider tos as a label name at this location, without
        reserving any memory.
        word: this indicates to the assembler that the label tos is a word pointer.
Line 8: stak ends: marks the end of the segment named stak.
Line 9: code segment: indicates the start of a new segment named code.
Line 10: assume: this is an assembler directive. It tells the assembler which
        segment each segment register is to be taken as pointing to while the
        program is assembled. It generates no instructions and loads no register.
        The program is written in the code segment, and to start the program the
        first instruction has to be fetched from the location pointed to by cs:ip.
        This requires both cs and ip to be available at the beginning. cs cannot be
        managed by the program, because the program itself cannot start without cs
        being defined; the starting cs:ip is fixed at link and load time from the end
        directive, as we shall see in line 13. The other segment registers can be
        loaded in the program itself and, even though they are indicated in the
        assume directive, they are to be specifically managed in the program.
        Some assemblers do take care of loading the other segments also, using
        slightly different directives.
        cs:code: tells the assembler to take the segment named code as the cs.
        ds:data, es:data, ss:stak: these entries do not load ds, es or ss. They let
        the assembler check that variables referred to by name can be reached
        through the assumed register, and insert a segment override where needed.
        Some more ideas on this can be had by looking at the pr.asm program
        discussed in Appendix 1.B at the end of this chapter.
Line 11: start: this is a label used for reference purposes.
        mov ax, data: the first instruction of the program. The segment address of
        the segment named data has to be loaded into the ds and es segment
        registers. A segment register cannot be loaded with an immediate value, so
        the address is moved to register ax and from there to ds and es in the
        succeeding instructions.
Lines a. to e: These instructions take care of initializing the segment registers ds, es
        and ss, and also the stack pointer sp.
Line f: From line f onwards, the real useful operations of the program are written.
Line 12: code ends: tells the assembler that the segment named code ends here,
        using the ends (end segment) directive.
Line 13: end: The end directive tells the assembler that it is the end of the assembly.
        start: is a reference to the start label. It records the offset address of the
        start label as the entry point, so that ip is loaded with it and execution
        starts from that address.
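The data definition directives met in lines 2 and 3 can be tried out in a small file of their own. The sketch below is illustrative only: the variable names bits, limit, tbl, big and res are hypothetical. It shows db, dw and dd together with values written in binary, decimal and hex, and a dup with an initial value.

```asm
data    segment
bits    db  10110001b           ; one byte, written in binary
limit   db  200                 ; one byte, written in decimal (the default)
tbl     dw  1234h, 0abch        ; two words; a hex value starting with a letter
                                ; needs a leading 0
big     dd  12345678h           ; one double word (32 bits)
res     dw  3 dup (0)           ; three words, each initialized to zero
data    ends
        end
```

Assembling such a file with a listing (.lst) requested shows the offset the assembler assigns to each variable and the bytes it generates, which is a quick way of checking one's reading of the directives.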
The Segment Definitions and Segment Integrity Aspects: With the assembler
directives segment and ends, it may look as though MASM will take care of
maintaining the segment limits and prevent operations on one segment from
overwriting another and damaging its integrity. Unfortunately, it is not so. The
assembler simply converts the program as given into a machine language program. If
instructions inside the program violate segment integrity, either by going beyond the
area defined for a segment or by crossing over into regions defined for other
segments, the assembler cannot check this, as it happens at run time and is not known
at assembly time. This means the programmer should work out in advance the
maximum requirement for the data, extra and stack segments and make adequate
provision for these requirements in his program. The hardware of the 8086 does not
check these aspects while the program is running either. Later versions of the
processor, from the 286 onwards, guard against these eventualities in the protected
mode of operation, by defining the segment limits clearly and providing hardware to
prevent such infringements on segment integrity and segment boundaries. A
demonstration on the unprotected 8086 system is given below:
data    segment
base    dw 5 dup(3)
data    ends            ; segment as defined has only 5 words in it with data 0003
;
code    segment
        assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        lea bx, base
        mov cx, 9
back:   mov ax, [bx]
        add ax, 10
        mov [bx], ax
        add bx, 2
        loop back       ; this loop forces 9 words into the data segment
        int 01
code    ends
        end start
;Testing in debug
-u 0 18
0B46:0000 B8450B MOV AX,0B45 ; note cs = 0B46, is just (ds + 1)
0B46:0003 8ED8 MOV DS,AX
0B46:0005 8D1E0000 LEA BX,[0000]
0B46:0009 B90900 MOV CX,0009 ; this forces 9 iterations of loop.
0B46:000C 8B07 MOV AX,[BX]
0B46:000E 050A00 ADD AX,000A
0B46:0011 8907 MOV [BX],AX
0B46:0013 83C302 ADD BX,+02
0B46:0016 E2F4 LOOP 000C
0B46:0018 CD01 INT 01
-g 0e
;defined data segment, extra space undefined, space defined as code segment.
-g 18
; data segment now has 9 words, last word over written on code segment
; changing the program itself as the listing after execution shows below:
-u 0 18
0B46:0000 C2450B RET 0B45 ; code segment over written
0B46:0003 8ED8 MOV DS,AX
0B46:0005 8D1E0000 LEA BX,[0000]
0B46:0009 B90900 MOV CX,0009
0B46:000C 8B07 MOV AX,[BX]
0B46:000E 050A00 ADD AX,000A
0B46:0011 8907 MOV [BX],AX
0B46:0013 83C302 ADD BX,+02
0B46:0016 E2F4 LOOP 000C
-q
The program above brings out one of the reasons for incorporating protection
features in the processor. In the absence of protection, a user program may destroy
itself during execution. It is not difficult to see that one user’s program may also
destroy another’s. Advanced processors, starting from the 80286, have protection
features included in the hardware design of the processor.
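Since neither the assembler nor the 8086 hardware guards the segment, the guarding has to come from the program itself. One possible habit, sketched below, is to derive the loop count from the data definition instead of writing it as a separate number. This is only one way of doing it, and the name nwords is hypothetical; the rest follows the demo program above.

```asm
data    segment
base    dw  5 dup (3)
nwords  equ ($ - base) / 2      ; evaluates to 5, the number of words defined
data    ends

code    segment
        assume cs:code, ds:data
start:  mov ax, data
        mov ds, ax
        lea bx, base
        mov cx, nwords          ; count follows the definition of base
back:   mov ax, [bx]
        add ax, 10
        mov [bx], ax
        add bx, 2
        loop back               ; touches only the 5 defined words
        int 01                  ; stop here when run under debug, as in the demo
code    ends
        end start
```

If the size of base is changed later, the count changes with it at assembly time, so the loop cannot run past the words actually defined.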
EXERCISES
==00==
APPENDIX 1.A
8086 operations of this program: Shown below is a demo of the 8086 program on
the same lines, along with results of execution of the program, step by step
CASE 1: LISTING 13D5:0000 B008 MOV AL,08
13D5:0002 3C0A CMP AL,0A
13D5:0004 1C2F SBB AL,2F
13D5:0006 27 DAA
WORKING OF THE PROGRAM CASE 1:
13D5:0000 B008 MOV AL,08
AX=0008 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0006 27 DAA
AX=003E BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY
CASE 3: LISTING 13D5:0000 B00F MOV AL,0F
13D5:0002 3C0A CMP AL,0A
13D5:0004 1C2F SBB AL,2F
13D5:0006 27 DAA
13D5:0006 27 DAA
AX=0040 BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ NA PO CY
Two of the 3 cases are studied in this debug demonstration: case 1, where the entry
is in the range 0 to 9 hex, and case 3, for the number 0F hex. In case 1 we get 3E
instead of 38 (6 more), and in case 3 we get 40 instead of 46 (6 less). In case 2
(entries 0A to 0E hex), the AL register holds DB to DF hex after SBB (this case is not
shown above), and 66 is added by DAA to give the correct ASCII code. When the result
goes wrong, the culprit is seen to be the auxiliary carry flag left by the SBB
instruction in the demo shown above.
The interesting feature to be noted here is that when subtraction is performed by 2’s
complement addition, the carry of this ADD operation needs to be complemented to get
the real borrow of the subtraction at every bit, as can be easily verified. However,
neither the 8085 nor the 8086 indicates the carry at each bit stage. Only the carry at
the half byte stage (auxiliary carry) and the carry at the final byte stage are available.
In the 8085 processor, adjustment for decimal subtraction is not provided, while the
8086 provides for this operation. Because of this, the 8085 ALU (arithmetic logic unit)
does not bother about correcting the auxiliary carry for subtraction, since as a rule the
auxiliary carry is not used after subtraction in the 8085. We have used it here, sort of
illegally. The 8086 keeps the auxiliary carry at the correct value, to accommodate the
DAS operation. Due to this feature, we have 6 added to the correct value for the case 1
numbers, and 6 less for the case 3 number. An equivalent program for the 8086 would
work if we complemented the auxiliary carry after the SBB instruction to simulate the
8085 ALU behavior. However, this would require 3 additional instructions:
LAHF ; load the lower byte of the flag register into the AH register
XOR AH, 10H ; complement the auxiliary carry flag (bit 4 of the flag byte)
SAHF ; store the AH register as the lower byte of the flag register
This makes the program a little bigger. Using the ideas of this program, a more
efficient assembly language program can certainly be designed. A program more
suitable for hex to ASCII conversion in 8086, based on these ideas, can be:
CMP AL, 0AH
CMC
ADC AL, 30H
DAA
The reader can easily make out how this program works. The program can
conveniently be used as a macro (see chapter 4 for macros) to translate a hex digit in
AL to its ASCII equivalent. The program avoids the conditional jump that would
normally be used for converting hex to ASCII, and a conditional jump generally takes
more time to execute. The conventional hex to ASCII program is as follows:
CMP AL, 0AH
JB DOWN
ADD AL, 7
DOWN: ADD AL, 30H
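The four-instruction CMP, CMC, ADC, DAA sequence can be wrapped as a macro along the following lines. This is a sketch under stated assumptions, not the macro developed in chapter 4: the macro name hexasc is hypothetical, and the DOS functions 02h (display the character in dl) and 4Ch (return to DOS) of int 21h are assumed for showing the result and ending the program.

```asm
hexasc  macro                   ; hypothetical macro name
        cmp al, 0Ah             ; carry set only for digits 0 to 9
        cmc                     ; carry now set only for digits A to F
        adc al, 30h             ; 0-9 give 30h-39h; A-F give 3Bh-40h
        daa                     ; adds 6 to the A-F results, giving 41h-46h
        endm

stak    segment stack
        dw  64 dup (?)
stak    ends

code    segment
        assume cs:code, ss:stak
start:  mov al, 0Ch             ; hex digit C
        hexasc                  ; al = 43h, the ASCII code of 'C'
        mov dl, al
        mov ah, 02h             ; assumed: DOS function 02h displays dl
        int 21h
        mov ah, 4Ch             ; assumed exit: DOS function 4Ch
        int 21h
code    ends
        end start
```

For the digit C the steps are: CMP leaves the carry clear, CMC sets it, ADC gives 0C + 30 + 1 = 3D hex, and DAA adds 6 because the low nibble exceeds 9, giving 43 hex.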
Exercises of this type can give a lot of insight into the design aspects of the processor
sub-units.
APPENDIX 1.B
Note: the file is named pr.asm. To understand the segment relations, see the list file also
(pr.lst file).
The pr.asm program studied
data_here segment
asc     db 16 dup(0)
data_here ends
code    segment
        assume cs:code      ; note other segments not indicated. It will
                            ; make the program rather difficult to follow or debug.
start:  mov ax,data_here
        mov es, ax          ; ‘data_here’ now becomes the extra segment, es.
                            ; ds segment will be separate now.
        mov di,offset asc   ; offset in the ‘data_here’ (es) segment
        cld
        mov cx,16
        mov bl,0
back:   mov al, bl
        call hasc
        stosb
        inc bl
        loop back
        int 1
hasc    proc near
        cmp al,10
        cmc
        adc al,30h
        daa
        ret
hasc    endp
code    ends
        end start
-g 10
;the display at the end of the above line, indicates the characters printed
-q
Symbols:
@FILENAME . . . . . . . . . . . TEXT pr
@VERSION . . . . . . . . . . . . TEXT 510
26 Source Lines
26 Total Lines
11 Symbols
0 Warning Errors
0 Severe Errors
2. REGISTER SET AND INSTRUCTION SET OF 8086
In this chapter, we shall look at the register set of 8086, as accessible to the
programmer, and then we shall have a detailed look at the instructions; some of the
instructions do require a little bit of appreciation of the actual situation where the
instructions become useful (this is a general feature of all CISC – complex instruction set
computing – type of processors). Where required, such situations are examined with
worked out examples.
Note that in 8086, data of 16 bits are called words (words are used to
represent data or addresses), data of 8 bits are called bytes or half words (bytes
represent data or ASCII characters) and data of 4 bits are called nibbles (nibbles can
be used to represent BCD digits or hex digits). Also note that register names are not
case sensitive in ALP: ax and AX both indicate the same register, so registers may be
written in capital letters or in lower case in assembly language programs.
Discussion on the use of registers: We will start with the general purpose
registers. Although the registers ax, bx, cx and dx are called general purpose registers for
handling data (either in 8 bits or in 16 bits), these have some special capabilities. The
registers ax (16 bits), and al (8 bits) are used as accumulators, capable of doing certain
specific operations. When the registers AX or AL are used this way, they are implied in
the instruction without being specifically indicated. These registers act as one of the
source operands as well as the destinations for the result of the instruction. For example:
MUL CX will mean the word in ax (implied, and not directly specified in the
instruction) is to be multiplied by the word in cx (specified in the instruction) to get a
double word product and the result will go to implied registers dx (high word of product
in the extended accumulator) and ax (low word of product in the accumulator).
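The implied use of ax and dx in MUL can be sketched with a short program in the simple MASM style used in this book. The operand values and the DOS exit call (int 21h, function 4ch) are our own assumptions, chosen only for illustration.

```asm
code segment
assume cs:code
start:  mov ax, 1234h   ; multiplicand goes in the implied accumulator ax
        mov cx, 0100h   ; multiplier in cx, the only operand named in the instruction
        mul cx          ; dx:ax <- ax * cx (unsigned)
                        ; 1234h * 0100h = 00123400h, so dx = 0012h and ax = 3400h
        mov ah, 4ch     ; assumed way of returning to DOS
        int 21h
code ends
end start
```

On tracing this in debug, the high word of the product appears in dx and the low word in ax, although neither register is named in the MUL instruction.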
There are many such instructions which use ax, and al as implied accumulators, as
explained later while discussing the instructions in detail. In case of multiplication and
division of word size data, register dx is used as the high word extension of the
accumulator for the double word product in multiplication and for double word dividend
in division. The registers ax/ al are used as the accumulator in string instructions like
lodsb/w or stosb/w etc.
The register bx, is called the base register. As a 16 bit register, it can be used to
store an address. In the instruction XLAT (translate), it is used as the implied offset
address in the data segment, where the look up table for translation is located. The other
register associated with the XLAT instruction is the register al; al stores the byte pointer
in the look up table before execution of XLAT and after execution the data in the table
goes to al. The instruction can be used to realize any arbitrary Boolean function with up to
eight inputs and eight outputs, by storing its truth table as a 256-byte look up table.
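A minimal sketch of XLAT is given below, assuming a 16-byte table that converts a hex digit into its ASCII character. The table name hexch, the placing of the table in the code segment and the DOS exit call are our assumptions for illustration.

```asm
code segment
assume cs:code, ds:code
hexch   db '0123456789ABCDEF'   ; look up table: hex digit -> ASCII character
start:  mov ax, cs
        mov ds, ax              ; the table is in the code segment here, so ds = cs
        mov bx, offset hexch    ; bx = offset address of the table (implied by xlat)
        mov al, 0bh             ; al = byte pointer into the table (hex digit B)
        xlat                    ; al <- byte at ds:[bx + al] = 42h, the ASCII code of 'B'
        mov ah, 4ch             ; assumed way of returning to DOS
        int 21h
code ends
end start
```

Both bx and al are implied here; only the table contents decide what "translation" is performed.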
The general purpose registers ax (ah, al), bx (bh, bl), cx (ch, cl) and dx (dh, dl) are
used for 16 or 8-bit data handling and are capable of performing arithmetic and logic
operations, shift and rotate operations on the data stored in them. In this sense, they are
all general purpose data handling registers.
Pointers and Index registers: We shall now look at the next set of registers,
which are five in number, and which are used for handling mainly addresses. They are
all 16-bit registers which can store the 16-bit offset address in a segment. Two of them
are index registers: si and di (source index and destination index); and the other three are
pointer registers: bp, sp and ip (base pointer, stack pointer and instruction pointer).
The registers si and di normally carry addresses of data in the data segment; in
case of string instructions, however, si refers to the source address in the data segment
and di refers to the destination address in the extra segment. We shall later discuss the
method of addressing data using a segment with an offset address in registers.
The registers bp (normally) and sp (always) refer to address of data in the stack
segment, while the ip refers always to the address of the instruction in the code segment.
The registers si, di, bp and sp can all handle 16-bit arithmetic and logic
operations, like the registers, ax, bx, cx and dx. Although arithmetic addition and
subtraction will be useful for handling addresses, it is difficult to see how multiply, divide
and logical operations could be used for address handling. It simply means that these
registers can also serve as data registers for 16-bit data handling, when they are not used
for address handling. They have no arrangement for handling two separate 8-bit data,
unlike al, ah of ax.
The register ip, is meant exclusively for pointing to the next instruction to be
executed. As all instructions are in the segment cs, ip is always used with cs to generate
the instruction address. The ip does take part in 16-bit addition/ subtraction, but only
implicitly, through instructions (such as jumps) which do not appear to be doing this
operation; ip cannot enter into any
other arithmetic (like multiply) or logic (like ex-or) operations. In short, it cannot at all
be used to handle data. Whatever add/ subtract it can do is simply limited to getting the
address of the next instruction by adding or subtracting an integer from the current
contents of ip. We shall see further about this while discussing the jump instructions.
Segment registers: There are four segment registers: cs (code segment), ds (data
segment), es (extra segment) and ss (stack segment). The register cs, as we have already
seen, indicates where the program instructions are located. The data and the extra
segments are indicators of the locations for storing the data used by the program
including results. The need for two segments for data storage will be brought out when
we discuss string instructions. The register SS indicates the memory area used for
stack purposes. As we shall see later, stack is a very useful data structure which makes it
convenient to perform certain operations, during the execution of programs.
The flag register: The flags are single bits of information based on the nature of
the result of the immediately preceding arithmetic or logic operation. The
8086 updates six flags when any arithmetic or logic instruction is executed.
interpret the data as signed 2’s complement number of 4-bits, it becomes negative 7,
which is not correct. The correct result is 01001 and requires 5-bits or more to be
represented correctly. An overflow has occurred, but carry bit is not set. It is thus seen
that 2’s complement overflow and carry are not the same; therefore a separate indication
for 2’s complement add/ subtract overflow is needed. This is the overflow flag, and
when it gets set, it indicates that the preceding add/ subtract operation has resulted in a
number which cannot be represented completely by the result register, if the numbers are
interpreted as signed numbers in the 2’s complement notation.
The carry flag indicates the same feature, for numbers interpreted as unsigned.
This is the carry resulting from the normal binary add. It indicates that the numbers
added/ subtracted, taken as unsigned numbers, have produced a result which cannot be
represented fully in the destination register.
The zero flag indicates that add/ subtract operation has produced a result which is
zero. The zero applies only to the data stored in the destination register, and not to the
actual result of the arithmetic operation. To clarify this statement, consider adding 80
hex to 80 hex. The result is 100 hex. But what is stored in the 8-bit register will only be
00 hex, and the zero flag will be set, along with carry flag. The following experiment in
debug environment shows this fact.
-a ; start assembling at default address of 100 in the CS
-rax
AX 0000
:8080 ; load ax with 8080, that is, ah and al with 80 hex each
-q ; quit debug
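The same experiment can be sketched as a small program. We assume here that the instruction under study is add al, ah (the register loading follows the debug session above); the DOS exit call is also our assumption.

```asm
code segment
assume cs:code
start:  mov ax, 8080h   ; ah = 80h and al = 80h, as loaded in the debug session
        add al, ah      ; 80h + 80h = 100h, but only 00h fits in al
                        ; ZF = 1 because al is zero, CF = 1 because of the carry out of bit 7
        mov ah, 4ch     ; assumed way of returning to DOS
        int 21h
code ends
end start
```

After the add, debug should show ZR and CY among the flags, even though the true sum 100 hex is not zero. The overflow flag is also set (OV), since 80 hex taken as a signed byte is -128 and the sum -256 cannot be held in 8 bits.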
The sign flag indicates that the result in the register is storing a negative number,
if interpreted in the 2’s complement mode. That is, the leading or the leftmost bit in the
result register is 1.
The auxiliary carry flag indicates, in case of an 8-bit or 16-bit arithmetic operation,
the presence or absence of a carry out of the L.S. (least significant) digit or L.S. nibble.
This flag is useful in applying corrections, as we shall see later, in connection with
decimal arithmetic instructions.
The parity flag indicates if the number of 1’s in the result register is odd or even.
It is used in data communication type of application for carrying out parity check on the
data received, and also for producing parity bits during transmission. For this purpose,
the parity flag indicates the parity of the lower byte of the result only; even a 16-bit add
operation produces a parity flag which corresponds only to the lower 8 bits of the
result, as communication normally uses 8-bit operations. See the
demonstration below:
-a
-rax
AX 0000
:00f0
-r
AX=00F0 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 050000 ADD AX,0000; on adding AX has even parity
; AL also even parity
-t5
AX=0F00 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=010B NV UP EI PL NZ NA PE NC
1377:010B 0000 ADD [BX+SI],AL DS:0000=CD
; AX has even parity
; AL also has even parity
-q
In addition, there are two more flags that the user can manipulate. These are: the
direction flag and the interrupt flag. The direction or the D flag is used to control the
direction of the string operation in the string type of instructions. The interrupt or I flag
is used for interrupt control purposes as we shall see later.
There is still one more flag which is not accessible to the user, and that is the
trace or the T flag, which is essentially controlled by the system.
The Flag register details are shown below bitwise (each column indicates a bit):
Bit:   15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
Flag:   X   X   X   X   O   D   I   T   S   Z   X   A   X   P   X   C
The Flag register has 16 bits which are shown above. X’s are don’t
cares.
The experiment suggested below is an attempt to study the flag register details in
the debug environment.
-a ; assemble at cs:100
;Watch the highlighted flag register contents, see how they match the
;indications in register AX. Watch the change in parity bit on execution
;of this instruction. Reason out why.
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0105 NV UP EI PL NZ NA PE NC
1377:0105 50 PUSH AX
AX=3CD7 BX=0000 CX=0000 DX=0000 SP=FFEC BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0106 NV UP EI PL NZ NA PE NC
1377:0106 9D POPF
-q
Explain the program and the flag conditions at the highlighted places.
Exercises: There are in all 8 flags which can be user controlled, and the ‘xor’ing
above reverses all these 8 flag bits. Use the method of PUSH AX followed by POPF to
identify one by one, which flag register bit corresponds to which flag. Find also which
value of the flag bit represents which condition of the flagged entity.
Although the computer basically works on unsigned binary number system, there
are instructions which can manipulate the data in registers/ memory as signed numbers,
decimal (binary coded or BCD) numbers or as ASCII characters for display purposes and
so on. We will now look into the details of these different data types handled by the
instruction set of 8086 processor. The data size handled by the processor is 8-bits and
16-bits as we have already seen.
The data can be simple unsigned binary numbers. In this case, carry and zero
flags will be of interest.
Question: In case of subtraction of unsigned numbers, do you think the sign flag
can give any meaningful indication about which of the two numbers is bigger? [Hint:
No. Can you support it with examples? Also try to reason out which flag(s) give this
information about the greater of the two source numbers, in the subtraction.]
The processor 8086 can also handle signed binary numbers of 16-bits or 8-bits. In
this case, the overflow, sign and zero flags will be of interest.
Question: Reason out which flags will be required to find out if the minuend is
greater than, the same as or less than the subtrahend in case of subtraction/ comparison of
two signed numbers. [Hint: All three flags indicated above. Give the logic for the
comparison in terms of these three flags]
Two 2-digit BCD (binary coded decimal) numbers can be handled for addition or
subtraction, in which, the result of binary addition or subtraction should be in the register
AL. In general, the operation for BCD number addition/ subtraction requires two
instructions to be executed. The first indicating the normal add, subtract binary operation
with the result in register AL, and the second to correct or adjust the result of the binary
operation in AL to be consistent with the result of BCD operation. If correction for
operation of addition is required, the add must be followed by the instruction DAA
(decimal adjust AL for addition), and if correction for subtraction is desired, the subtract
instruction must be followed by DAS (decimal adjust AL for subtraction). This BCD
correction process involves the use of the carry and the auxiliary carry flags as we shall
see later.
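The two-instruction sequence for BCD addition described above can be sketched as follows. The BCD values 38 and 45 and the DOS exit call are our assumptions, chosen only for illustration.

```asm
code segment
assume cs:code
start:  mov al, 38h     ; packed BCD 38
        mov bl, 45h     ; packed BCD 45
        add al, bl      ; normal binary add: al = 7dh, which is not a valid BCD byte
        daa             ; low nibble d is more than 9, so 6 is added: al = 83h (BCD 83)
                        ; no decimal carry out of the byte, so CF = 0
        mov ah, 4ch     ; assumed way of returning to DOS
        int 21h
code ends
end start
```

The binary result 7D hex becomes 83 hex after DAA, which reads as the decimal sum 38 + 45 = 83.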
Question: In the above description, we have not included the comparison
operation. Comparison, as you know, is based on the result of subtraction. Do you think
we should use DAS after the comparison instruction while comparing two BCD
numbers? [Hint: The answer is ‘No’. Try to reason out this issue.]
It should be noted that provision is made only for addition and subtraction of
2-digit BCD numbers, and these numbers are to be unsigned only. No direct way is
provided for handling signed BCD numbers or handling larger BCD numbers. Direct
multiplication and division of BCD numbers is not provided for in 8086. The
instructions AAA, AAS, AAM and AAD provide for decimal addition, subtraction,
multiplication and division essentially at single digit level, in two stage operations, as we
shall see later while discussing these instructions.
There are some (actually very little) provisions in 8086 for handling ASCII
(American Standard Code for Information Interchange) characters. The console
keyboard and the monitor or other input/ output devices handle characters in the ASCII
code, as they have to take care of numbers, as well as textual material. However,
interpretation of data as ASCII is mainly at the operating systems level. Certain interrupt
operations interpret the data in the registers AL, AH and AX as ASCII characters.
Details of these operations we shall look into later.
If the operation involves only a single operand like increment or rotate etc., the
result naturally replaces that operand.
Other instruction set architectures used in other processors can be of the zero
address type (operands, top two in stack, result replacing the stack top operand, used in
calculator type of systems), accumulator type or single address architecture(one operand
assumed to be in a special register called accumulator, the other source operand specified
by the instruction, and the result replacing the accumulator data – used mainly in 8-bit
processors), or three address architecture, where three separate locations in registers or
memory are specified, two for the source operands and one for the result. The data
sources, in three address machines will normally be only in registers (this is known as
load/ store or register/ register architecture, where memory data can only be loaded to a
register or register data can only be stored in memory, while only register data can
participate in arithmetic logic operations). This architecture is used mainly in RISC
(reduced instruction set computer) type of machines. Other types of architectures are also
there, but they are less commonly used.
Exercise: Study register direct and indirect addressing in the debug environment.
Based or indexed addressing: Based or indexed addressing gives a useful method
of addressing data arrays stored in memory. Although both do the same thing, the two
different names reflect two different situations where this method of addressing can be
used. An example of this type of addressing is: INC BYTE PTR [BX+2] which can also be
written as INC BYTE PTR 2[BX] (BYTE PTR tells the assembler that the memory operand
is a byte). This means read the data from memory at the address which is 2 more than
the address contained in register BX; increment the data, and write it back to the same
location. Consider a byte array starting at address 1200 hex. The first element of the
array is available by indirect addressing through the register BX, with BX having the
address 1200 hex, while any ith byte of the array is addressable with the index i-1 in the
array using the address [BX+i-1]. Here we have the base address of the array in BX and
the index number is specified as an unsigned integer in the instruction. This is known as
based addressing. Now consider another situation where we have 2 different byte arrays;
one starting at 1200 hex and the other, say, at 1380 hex. And we want to handle, say, the
5th byte entry of each of the arrays. Then we will store 5 -1, or the index number 4 in the
BX register, and use the address 1200h[BX] to refer to the 5th byte of the first array and
1380h[BX] to refer to the 5th byte of the second array. In this case we have stored the
index number in the register and the base address of the array is specified as an
immediate data to be added to the BX register content to get the address of the data in
memory. Since the index number is now in the register BX, this method of addressing is
known as indexed addressing. Note that the processing required to get the memory
address in both based as well as indexed addressing is the same. The difference is only in
our interpretation in terms of the problem requirement.
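The two interpretations can be sketched side by side. The array names arr1 and arr2, their contents, their placement in the code segment and the DOS exit call are all our assumptions for illustration; the arrays are not placed at 1200 hex and 1380 hex as in the discussion above.

```asm
code segment
assume cs:code, ds:code
arr1    db 11h, 12h, 13h, 14h, 15h, 16h   ; first byte array
arr2    db 21h, 22h, 23h, 24h, 25h, 26h   ; second byte array
start:  mov ax, cs
        mov ds, ax              ; the arrays are in the code segment here, so ds = cs
        ; based addressing: base address in bx, index number in the instruction
        mov bx, offset arr1     ; bx = base address of the first array
        mov al, [bx+4]          ; al = 5th byte of arr1 (index 4) = 15h
        ; indexed addressing: index number in bx, base address in the instruction
        mov bx, 4               ; bx = index number 4
        mov cl, arr1[bx]        ; cl = 5th byte of arr1 = 15h
        mov dl, arr2[bx]        ; dl = 5th byte of arr2 = 25h
        mov ah, 4ch             ; assumed way of returning to DOS
        int 21h
code ends
end start
```

On unassembling in debug, all three memory operands appear in the same [BX+number] form, which confirms that the difference lies only in our interpretation.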
Based and indexed addressing: Intel 8086 permits a combination of based and
indexed addressing with an immediate number in the instruction. For this purpose, BX
and BP are considered as base registers, while SI and DI are considered as index
registers. Any base register and any index register along with an additional offset
number can be used for the addressing in this mode. The address 2[BP+DI] is a valid
address in this mode; this address will correspond to the address obtained by adding the
contents of BP and DI registers and then adding the number 2 to this sum. The address
2[BX+BP] will be invalid and so will the address 2[SI+DI] as in both these cases we do
not satisfy the combination of one base and one index register within the square brackets.
There are several ways in which the based indexed instructions with displacements can
be written in an assembly language program. Exercise: Use the debug environment and
try to find four valid methods of writing this based indexed addressing instruction, with
displacement, in the assembly language. (Hint: Try various forms like 2[bx][di], 2[bx,
di], 2[bx]di etc and find which gets rightly unassembled as [bx+di+2]).
The role of the segment registers in memory addressing: The Intel 8086
processor provides for addressing of memory with 20 bits of address, 00000 hex to fffff
hex. We have so far been seeing that addresses can be contained in 16-bit registers (like
in register indirect as well as based or indexed addressing etc.). Then how is the 20-bit
address produced? The answer lies in the fact that not just one register, but two 16-bit
values are used in producing the 20 bit address. Of these, we have already seen in the
above section on addressing modes, how the 16 bit memory address is generated based
on the instruction. The address thus obtained is known as the EA (effective address) or
the offset address of the data. This address is now combined with a value derived from
one of the segment registers to produce the 20-bit absolute address of the data. The
derivation of the 20-bit address is done by extending the 16-bit segment register to 20 bits by
simply appending four binary 0’s or just one hex zero at the end of the segment register
content. The 16-bit address, EA, obtained from the instruction is now added to the 20-bit
address derived from the segment register to get the 20-bit absolute memory address.
Any carry resulting from this addition is simply ignored by the processor. Which of the
four segment registers goes with which effective address? IP is the EA of the instruction
to be fetched. It always goes with the code segment. Normally EA of any data goes with
the data segment DS. However, in case of string instructions the EA of the source is
associated with the SI register, and this goes with the data segment. The destination is
associated with the DI register and this goes with the extra segment ES. The use of
separate segments with the source and destination addresses permits the movement of data from
any source address to any destination address in the full 20 bit address range of the
memory. Attaching both source and destination addresses to the same segment register
will give a total address range of only 16 bits from the segment base, as the effective
address is only 16 bits. Please note that the offset address in any segment can only be in
terms of 16-bits, which means a segment can accommodate 65536 bytes of data or
program.
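The arithmetic done by the processor in forming the 20-bit address can be imitated in a program, as a sketch. Here we take the segment value 1377 hex and the offset 0200 hex as assumed sample data, and build the 20-bit result in the register pair dx:ax; the DOS exit call is also our assumption.

```asm
code segment
assume cs:code
start:  mov ax, 1377h   ; sample segment register content
        mov bx, 0200h   ; sample 16-bit effective (offset) address
        mov dx, ax
        mov cl, 4
        shl ax, cl      ; ax = low 16 bits of segment * 16 = 3770h
        mov cl, 12
        shr dx, cl      ; dx = top 4 bits of segment * 16 = 0001h
        add ax, bx      ; add the offset: 3770h + 0200h = 3970h
        adc dx, 0       ; take any carry into the upper bits
        and dx, 000fh   ; keep 20 bits only, as the 8086 ignores a carry beyond bit 19
                        ; dx:ax = 0001:3970h, that is, absolute address 13970h
        mov ah, 4ch     ; assumed way of returning to DOS
        int 21h
code ends
end start
```

The same result is obtained by hand: 1377 hex with one hex zero appended is 13770 hex, and adding 0200 hex gives 13970 hex.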
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 BB0002 MOV BX,0200
-t3
AX=0000 BX=0200 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 2E CS:
1377:0104 C7073400 MOV WORD PTR [BX],0034 CS:0200=75C2
Exercise: The DS register in an 8086 has the hex number 1234. What memory
address in 20 bits is indicated at the effective or offset address of 123A hex? If the
segment override prefix, ES: is used, which effective address will point to the same
physical memory location when ES has the hex address 123B? [Ans: 1357A hex and
11CA hex]
Instruction set: The instructions of 8086 are discussed in detail below. The
instruction description and the operation details of the instructions are taken essentially
from the Intel IA-32 Software Developer’s manual, vol. 2. To start with, the various
types of 8086 instructions available are listed, and then the instructions available in each
type and the details of their operation are presented. There are several types of
instructions as listed below:
Instruction Details
1. Data Transfer instructions (including I/O transfers): Data Transfer
instructions essentially copy the data and do not affect any flags (except of
course the POPF instruction which modifies all the flags as per the stack top
word)
• MOV: stands for move, it is actually copy, that is, when data is moved
from one register source to a destination register, source is not destroyed,
only, there will be a copy of this data in the destination register.
Examples: mov ax, bx ; data in bx is copied into ax
mov dx, [bx]
mov [si + 34], cx
mov bx, 1234h; 1234 hex goes to reg. bx
mov word ptr [bx+si + 2], 23 ; 23 decimal or 0017 h is moved.
• XCHG: Exchange instruction exchanges data between registers or
between register and memory
Examples: xchg bx, dx ; reg-reg exchange
xchg ax, [bx] ; reg-memory exchange
xchg bx, [1234] ; reg-memory with direct addressing
• PUSH: Push causes the data in the source register to be copied on to the
stack top.
Examples: push ax
push [bx] ; push memory word at address in bx, on to the stack.
push [1234] ; push word at effective address 1234 to the stack
pushf ; push the flag register on to the stack top.
• POP: Pop causes the stack top to be moved to (that is, removed from the stack
and loaded onto) the destination register or memory specified by the
instruction.
Examples: pop bx ; stack top moved to reg. bx
pop [bx + si + 4] ; stack top to memory at the address given
pop [1234] ; pop to memory at effective address 1234
popf ; pop the stack top onto the flag register
Among all the data transfer instructions popf is the only instruction that
affects and modifies the flags.
• IN: The IN instruction reads from an input port into AL or AX (only these
two registers), to be specified in the instruction. The port address is
generally in the register DX. But if it is 8-bits or less, then it can also be
directly given in the instruction.
Example: in ax, 28h; read 16-bit port at address 28 hex into reg ax
in ax, dx; read 16-bit port at address in dx into reg ax
in al, 15h; read 8-bit port at address 15 hex into reg al
in al, dx; read 8-bit port at address in dx
• OUT: The OUT instruction outputs the data in register AX or AL (only)
to be specified in the instruction to the output port indicated directly in the
instruction if the port address is 8 bits or less, or in the register DX (for
addresses 16 bits or less)
Examples: out 16h, ax; write to output port at address 16h from reg. ax
out dx, ax; write to output port at address in dx from reg. ax
out 23h, al; write to output port at address 23h from reg al
out dx, al; write to output port at address in dx from reg al
• CBW: Convert byte to word. The source register al and the destination ax
are both implied and not specifically mentioned in this instruction. This
instruction is used to extend the 8-bit integer (signed number) in reg. al to
16-bit integer in ax. The process is called sign extension. If the number in
al is positive, ah will be loaded with 00 hex, else ah will be loaded with ff
hex.
Example: cbw
Exercise: Study the instruction in debug
• CWD: Convert word in reg. ax (implied and not stated in the instruction)
to double word in regs. dx:ax (also implied and not stated). That is, sign
extend from ax into dx:ax.
Example: cwd
Exercise: Study the instruction in debug
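A minimal sketch of sign extension with CBW and CWD is given below, followed by one typical use: preparing a double word dividend for a signed word division. The data values and the DOS exit call are our assumptions for illustration.

```asm
code segment
assume cs:code
start:  mov al, 25h     ; positive byte
        cbw             ; sign bit of al is 0, so ah = 00h: ax = 0025h
        mov al, 0f9h    ; al = -7 as a signed byte
        cbw             ; sign bit of al is 1, so ah = ffh: ax = fff9h (-7)
        cwd             ; sign bit of ax is 1, so dx = ffffh: dx:ax = fffffff9h (-7)
        mov bx, 3
        idiv bx         ; dx:ax / bx: quotient ax = fffeh (-2), remainder dx = ffffh (-1)
        mov ah, 4ch     ; assumed way of returning to DOS
        int 21h
code ends
end start
```

Without the cwd, dx would hold whatever was left in it earlier, and the signed division would work on a wrong dividend.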
DEST ← DEST – (SRC + CF);
The OF, SF, ZF, AF, PF, and CF flags are set according to the result.
• CMP: Compare two operands.
Compares the first source operand with the second source operand and
sets the status flags in the FLAGS register according to the results. The
comparison is performed by subtracting the second operand from the
first operand and then setting the status flags in the same manner as the
SUB instruction. When an immediate value is used as an operand, it is
sign-extended to the length of the first operand. Operation:
temp ← SRC1 − SRC2;
In case an immediate value is used, then
temp ← SRC1 − Sign Extend (SRC2);
Modify Status Flags; (* Modify status flags in the same manner as the
SUB instruction*)
Flags Affected:
The CF, OF, SF, ZF, AF, and PF flags are set according to the result.
Examples: CMP AX, 24H; (24 hex is sign extended to 16 bits before
subtraction, because AX is a 16-bit register)
CMP BYTE PTR [BX], -24H; (no sign extension done, data in
bytes are being handled)
CMP BX, SI
CMP AL, [BX]; (BX will be taken only as a byte pointer, as
AL is a byte register)
The meaning of sign extension is seen in the following debug study:
-a
-u 100 10B
ELSE (* word operation *)
DX:AX ← AX ∗ SRC
Flags Affected
The OF and CF flags are set to 0 if the upper half of the result is 0;
otherwise, they are set to 1.
The SF, ZF, AF, and PF flags are undefined.
Examples: MUL BX
MUL WORD PTR [BX + DI + 48H]
MUL BYTE PTR [SI]
MUL CL
• IMUL: Integer (signed) multiply. Similar to MUL, except the data are
considered as signed integers.
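The OF and CF flags are set to 0 if the upper half of the result is only the
sign extension of the lower half; otherwise, they are set to 1. The SF, ZF,
AF, and PF flags are undefined, as for MUL.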
• DIV: Divide the unsigned integer dividend in the accumulator by the
unsigned integer divisor specified in the instruction. If the divisor
specified is a word register or word memory, the dividend is considered to
be the double word in DX:AX and the quotient of division will be in
register AX, with the remainder in register DX and the divisor word
specified in the instruction will not be altered. If the divisor specified in
the instruction is a byte register or byte memory, then the accumulator will
be the word register AX. The quotient of the division will be in AL
register, and AH register will have the remainder. In case of word
division, if the divisor word is not greater than the part of the dividend
word in DX, then the quotient obviously will not fit into the register AX.
Execution of DIV instruction in such a case will cause a division overflow
exception to be generated, and operating system should take care of this
exception. We shall see later, what ‘exception’ means. Similarly if the
byte divisor specified in the byte divide instruction is not greater than the
byte part of the dividend contained in the register AH, division overflow
exception will be generated.
Examples of DIV instruction: DIV BX;
DIV WORD PTR [DI];
DIV CL;
DIV BYTE PTR [SI];
Exercises: Test the DIV instruction in the debug environment. See what
happens when the data is such as to generate division overflow exception.
[Hint: here is an example of such a study.
-a
1377:0100 div bl
1377:0102
-rax
AX 0000
:1234
-rbx
BX 0000
:000f
-r
AX=1234 BX=000F CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 F6F3 DIV BL
; Note AH > BL and division overflow exception will occur.]
The CF, OF, SF, ZF, AF and PF flags are undefined, when DIV is
executed
• IDIV: Integer divide, same as DIV but the data and the results are
considered as signed integers.
The CF, OF, SF, ZF, AF, and PF flags are undefined, when IDIV is
executed.
There is an interesting doubt that can come up with signed division.
Suppose we divide -7 by +3, there is no doubt about the sign of the
quotient, here the quotient can only be negative. The confusion is about
the magnitude of the quotient and the sign of the remainder. In the given
example, one can say the quotient is -3 and the remainder is +2, or the
quotient is -2 and the remainder is -1, as both these solutions satisfy the
basic requirement that (quotient)*(divisor) + remainder = dividend, and
that the magnitude of the remainder is less than the magnitude of the
divisor;
(-3)*(+3) + (+2) = -7; also (-2)*(+3) + (-1) = -7; which is correct?
Exercise: Try to see, in the debug environment, what the processor
actually gives; try to reason out logically if that is alright. [Hint: You will
find the processor gives a result corresponding to doing the division of the
magnitudes involved, and then attach signs as necessary, based on the
signs of the given data. In the given example, the division of magnitude 7
by magnitude 3 is done to get the result 2 for the quotient, and 1 for the
remainder. Since the dividend and the divisor have opposite signs as
given, the quotient becomes negative, while the remainder will carry the
same sign as that of the dividend. It is logically OK, because you are
distributing negative numbers to three people, and after distributing 2
negative to each, you have remaining with you, 1 negative. So the
quotient is -2 and the remainder is -1.]
Below, we see the demonstration in the debug environment.
-a
1377:0100 idiv bl
1377:0102
-rax
AX 0000
:fff9 ; this makes AX, the dividend = -7
-rbx
BX 0000
:803 ; this makes the divisor in BL = +3; (we ignore BH)
-r
AX=FFF9 BX=0803 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 F6FB IDIV BL
-t
-q ; quit debug.
Note: In the above experiment if initially you load into AX some number
like FABC H, and keep BL the same, you will see divide overflow
occurring.
• INC: Increment register or memory. This involves only a single operand.
Adds 1 to the destination operand, while preserving the state of the CF
flag. The destination operand can be a register or a memory location. This
instruction allows a loop counter to be updated without disturbing the CF
flag. (Use an ADD instruction with an immediate operand of 1 to
perform an increment operation that does update the CF flag.) Operation:
DEST ← DEST + 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set
according to the result.
• DEC: Similar to INC, this instruction does the operation:
DEST ← DEST – 1;
The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set
according to the result.
• NEG: Replaces the value of operand (the destination operand) with its
two’s complement. (This operation is equivalent to subtracting the
operand from 0.) The destination operand is located in a general-purpose
register or a memory location.
DEST ← – (DEST)
Flags Affected
The CF flag set to 0 if the source operand is 0; otherwise it is set to 1. The
OF, SF, ZF, AF, and PF flags are set according to the result.
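The remark under INC, that pointers or counters can be updated without disturbing the CF flag, can be sketched with a multi-word addition. The names num1 and num2, the 48-bit data values, the placement of data in the code segment and the DOS exit call are our assumptions for illustration.

```asm
code segment
assume cs:code, ds:code
num1    dw 0ffffh, 0ffffh, 0001h   ; 0001ffffffffh, low word first
num2    dw 0001h, 0000h, 0002h     ; 000200000001h, low word first
start:  mov ax, cs
        mov ds, ax          ; the data is in the code segment here, so ds = cs
        mov si, offset num1
        mov di, offset num2
        mov cx, 3           ; three words to be added
        clc                 ; no carry into the lowest word
next:   mov ax, [si]
        adc ax, [di]        ; add with the carry from the previous word
        mov [si], ax        ; the sum replaces num1
        inc si              ; inc leaves CF untouched, so the carry survives
        inc si
        inc di
        inc di
        loop next           ; loop does not affect the flags either
                            ; num1 now holds 0000h, 0000h, 0004h (000400000000h)
        mov ah, 4ch         ; assumed way of returning to DOS
        int 21h
code ends
end start
```

If add si, 2 were used in place of the two inc si instructions, CF would be overwritten before the next adc, and the carry between words would be lost.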
3. Decimal (BCD, ASCII) arithmetic instructions: 8086 provides for handling
decimal digits in byte size, either as 2-digit BCD (unsigned) or single-digit ASCII
character byte ( 30-39 Hex in order, standing for 0-9 BCD ). The instructions
DAA and DAS are for handling 2-digit BCD for addition and subtraction, while
the instructions AAA, AAS, AAM and AAD are for handling single digit ASCII
data for BCD digits for addition, subtraction, multiplication and division
respectively. It is to be noted that inputs from the keyboard or other input
devices, as well as outputs to monitor, printer and other output devices will
usually be in ASCII code, so the four ASCII adjust instructions above, starting
with the characters AA, facilitate the handling of the BCD digits in the ASCII
character code, for add, subtract, multiply and divide operations. It is also to be
noted that all the six instructions stated above have an A as the middle character.
This A stands for ADJUST. This implies the operation of add, subtract, multiply
and divide are not done by these instructions; they are done separately by the
normal ADD, SUB, MUL and DIV instructions considering the data as normal
binary. What these instructions do is, to adjust the result of binary operation, to
match the result of decimal operation. We now look into the details.
• DAA: Decimal adjust accumulator for addition
Description: Adjusts the sum of two packed BCD values to create a
packed BCD result. The AL register is the implied source and destination
operand. The DAA instruction is only useful when it follows an ADD
instruction that adds (binary addition) two 2-digit, packed BCD values and
stores a byte result in the AL register. The DAA instruction then adjusts
the contents of the AL register to contain the correct 2-digit, packed BCD
result. If a decimal carry is detected, the CF and AF flags are set
accordingly. Operation: A complete description of the operation is as
follows:
old_AL ← AL;
old_CF ← CF;    (* AL and CF are saved in temporary registers *)
CF ← 0;
IF (((AL AND 0FH) > 9) OR AF = 1)
   THEN
      AL ← AL + 6;
      CF ← old_CF OR (Carry from AL ← AL + 6);
      AF ← 1;
   ELSE
      AF ← 0;    (* the first IF ends here *)
IF ((old_AL > 99H) OR (old_CF = 1))
   THEN
      AL ← AL + 60H;
      CF ← 1;
   ELSE
      CF ← 0;
Flags affected: The CF and AF flags are set if the adjustment of the value
results in a decimal carry in either digit of the result (see the “Operation”
section above). The SF, ZF, and PF flags are set according to the result.
The OF flag is undefined.
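The DAA operation given above can be transcribed almost line for line into an executable model. The sketch below is in Python and assumes byte operands; add8_flags and daa are illustrative names, and the nine augends are the ones used in the experiment in this section, each added to 87H.

```python
def add8_flags(a, b):
    """Binary ADD of two bytes; returns (AL, CF, AF)."""
    total = a + b
    af = int((a & 0x0F) + (b & 0x0F) > 0x0F)   # carry out of the low nibble
    return total & 0xFF, int(total > 0xFF), af


def daa(al, cf, af):
    """Decimal adjust AL after addition; returns (AL, CF, AF)."""
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        cf = old_cf | int(al + 6 > 0xFF)
        al = (al + 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al + 0x60) & 0xFF
        cf = 1
    else:
        cf = 0
    return al, cf, af


for augend in (0x12, 0x19, 0x91, 0x32, 0x16, 0x96, 0x69, 0x99, 0x67):
    al, cf, af = daa(*add8_flags(augend, 0x87))
    print(f"87H + {augend:02X}H -> AL={al:02X}H CF={cf} AF={af}")
```

The AL values printed should agree with the low byte of AX in the debug trace of the same nine cases (99, 06, 78, 19, 03, 83, 56, 86, 54).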
An experimental study of DAA follows. The assembly language program (ALP)
given below was converted into an executable program with MASM and LINK and
then executed in the debug environment. (Note the nine data bytes chosen to
be added to the byte 87H [1000 0111] held in the BL register.)
code    segment
        assume  cs:code
start:  mov     bl, 87h         ; this is the addend
        mov     cx, 9           ; 9 different augends are chosen
        assume  ds:code
        mov     ax, cs
        mov     ds, ax          ; initialise data segment; note this method
        cld
        lea     si, augends
back:   lodsb                   ; string load byte, without 'rep' prefix.
                                ; note cx (count reg) is not relevant here
        add     al, bl          ; get the binary sum
        daa                     ; correct the sum for decimal addition
                                ; note, data in ah is unaffected by this inst.
        loop    back
        int     01
;
augends db      12h             ; no cy, no ac, no 'abcdef' hex in the sum
        db      19h             ; no cy, ac, 'a' in msd (sum is A0h)
        db      91h             ; cy, no ac, no 'abcdef' in the sum
        db      32h             ; 'b' in msd, no cy, no ac
        db      16h             ; 'd' in lsd, no cy, no ac
        db      96h             ; cy and 'd' in lsd
        db      69h             ; ac and 'f' in msd (sum is F0h)
        db      99h             ; ac and cy and no 'abcdef' in the sum
        db      67h             ; sum becomes 'ee'
code    ends
        end     start
The execution of the program in the debug environment:
(In the listing unassembled under the 'u 0 1e' command, the bytes after the
INT 01 instruction are actually data sitting in the code segment and
interpreted as instructions.)
-g 11 ; execute until (and excluding) the instruction at 11 hex.
; Stop just before DAA for the first data, that is, after ADD AL, BL.
; From here, the program is traced for every data.
13D5:0011 27 DAA
AX=1399 BX=0087 CX=0009 DX=0000 SP=0000 BP=0000 SI=0017 DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI NG NZ NA PE NC
13D5:0011 27 DAA
AX=1306 BX=0087 CX=0008 DX=0000 SP=0000 BP=0000 SI=0018 DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY
13D5:0011 27 DAA
AX=1378 BX=0087 CX=0007 DX=0000 SP=0000 BP=0000 SI=0019 DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI PL NZ NA PE CY
13D5:0011 27 DAA
AX=1319 BX=0087 CX=0006 DX=0000 SP=0000 BP=0000 SI=001A DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ NA PO CY
13D5:000F 02C3 ADD AL,BL
AX=139D BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0011 NV UP EI NG NZ NA PO NC
13D5:0011 27 DAA
AX=1303 BX=0087 CX=0005 DX=0000 SP=0000 BP=0000 SI=001B DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY
; 66 added to binary sum; why?
13D5:0011 27 DAA
AX=1383 BX=0087 CX=0004 DX=0000 SP=0000 BP=0000 SI=001C DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY
13D5:0011 27 DAA
AX=1356 BX=0087 CX=0003 DX=0000 SP=0000 BP=0000 SI=001D DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PE CY
; 66 added, reason out why
13D5:0011 27 DAA
AX=1386 BX=0087 CX=0002 DX=0000 SP=0000 BP=0000 SI=001E DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 OV UP EI NG NZ AC PO CY
; also 66 added; why?
13D5:0011 27 DAA
AX=1354 BX=0087 CX=0001 DX=0000 SP=0000 BP=0000 SI=001F DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D5 IP=0012 NV UP EI PL NZ AC PO CY
; 66 again! Why?
After: AL=EEH BL=47H FLAGS(OSZAPC)=010111
DAS Before: AL=EEH BL=47H FLAGS(OSZAPC)=010111
After: AL=88H BL=47H FLAGS(OSZAPC)=X10111
Flags Affected by the DAS instruction:
The CF and AF flags are set if the adjustment of the value results in a
decimal borrow in either digit of the result (see the “Operation”
section above). The SF, ZF, and PF flags are set according to the result.
The OF flag is undefined.
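The DAS figures quoted above (AL=EEH with CF and AF set, adjusted to AL=88H) can be reproduced with a short model. This is only a sketch in Python; it assumes that DAS mirrors the DAA operation with subtraction in place of addition, which is how Intel's instruction reference describes it, and the name das is illustrative.

```python
def das(al, cf, af):
    """Decimal adjust AL after subtraction; returns (AL, CF, AF).

    Assumed operation: the mirror image of DAA, subtracting 6 and 60H.
    """
    old_al, old_cf = al, cf
    cf = 0
    if (al & 0x0F) > 9 or af:
        cf = old_cf | int(al < 6)      # borrow out of AL - 6
        al = (al - 6) & 0xFF
        af = 1
    else:
        af = 0
    if old_al > 0x99 or old_cf:
        al = (al - 0x60) & 0xFF
        cf = 1
    return al, cf, af


al, cf, af = das(0xEE, 1, 1)           # state left by the SUB in the example
print(f"AL={al:02X}H CF={cf} AF={af}")  # AL=88H CF=1 AF=1
```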
• AAA: ASCII adjust AL after addition.
The AAA instruction adjusts the sum of two unpacked BCD values (or
even ASCII values, as AAA clears the upper nibble of the result in the
AL register and does not depend on the CY flag for its operation) to create an
unpacked BCD result. The AL register is the implied source and
destination operand for this instruction. The AAA instruction is only
useful when it follows an ADD instruction that adds (binary addition) two
unpacked BCD values and stores a byte result in the AL register. The
AAA instruction then adjusts the contents of the AL register to contain the
correct 1-digit unpacked BCD result. If the addition produces a decimal
carry, the AH register increments by 1, and the CF and AF flags are set. If
there was no decimal carry, the CF and AF flags are cleared and the AH
register is unchanged. In either case, bits 4 through 7 of the AL register are
set to 0. The operational details are as follows:
13D5:0104 02C3 ADD AL,BL
13D5:0106 37 AAA
-t4
;execute and trace four instructions
AX=0036 BX=0000 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0002 NV UP EI PL NZ NA PO NC
13D5:0106 37 AAA
AX=0105 BX=0039 CX=0009 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0007 NV UP EI PL NZ AC PO CY
-q ; quit debug
Note: It helps if AH is zero before executing AAA; any decimal carry
(sum > 9) then appears directly in AH as 1. If AH already holds data, it
is simply incremented. You can try this in the debug environment as
an additional experiment.
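The description of AAA above can be written as a small executable model. The sketch below is in Python and assumes the 8086 behaviour (6 is added to AL and 1 to AH on a decimal carry); the name aaa is illustrative.

```python
def aaa(al, ah, af):
    """ASCII adjust AL after addition; returns (AL, AH, CF, AF)."""
    if (al & 0x0F) > 9 or af:
        al = al + 6
        ah = (ah + 1) & 0xFF
        af = cf = 1
    else:
        af = cf = 0
    return al & 0x0F, ah, cf, af     # bits 4 to 7 of AL are always cleared


# ASCII '6' + ASCII '9' = 36H + 39H = 6FH, with AH = 0 beforehand
print(aaa(0x36 + 0x39, 0x00, 0))     # (5, 1, 1, 1): AX = 0105H
# The same sum with AH = 20H: AH is simply incremented
print(aaa(0x6F, 0x20, 0))            # (5, 33, 1, 1): AH = 21H
# ASCII '3' + ASCII '4' = 67H: no decimal carry
print(aaa(0x33 + 0x34, 0x00, 0))     # (7, 0, 0, 0)
```

The first call corresponds to the traced case in which AX changes to 0105 after AAA.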
• AAS: ASCII adjust AL after subtraction:
Adjusts the result of the subtraction of two unpacked BCD values (or
ASCII values as in AAA) to create an unpacked BCD result. The AL
register is the implied source and destination operand for this instruction.
The AAS instruction is only useful when it follows a SUB instruction that
subtracts (binary subtraction) one unpacked BCD value from another and
stores a byte result in the AL register. The AAS instruction then adjusts the
contents of the AL register to contain the correct 1-digit unpacked BCD
result. If the subtraction produced a decimal borrow, the AH register
is decremented by 1, and the CF and AF flags are set. If no decimal borrow
occurred, the CF and AF flags are cleared, and the AH register is
unchanged. In either case, the AL register is left with its top nibble set to
0.
Operation:
IF ((AL AND 0FH) > 9) OR (AF = 1)
   THEN
      AL ← AL – 6;
      AH ← AH – 1;
      AF ← 1;
      CF ← 1;
   ELSE
      CF ← 0;
      AF ← 0;
AL ← AL AND 0FH;
Flags Affected
The AF and CF flags are set to 1 if there is a decimal borrow; otherwise,
they are set to 0. The OF, SF, ZF, and PF flags are undefined.
Exercise: study the AAS instruction in the debug.
• AAM: ASCII adjust AX after multiply:
Adjusts the result of the multiplication of two unpacked BCD values to
create a pair of unpacked (base 10) BCD values. The AX register is the
implied source and destination operand for this instruction. The AAM
instruction is only useful when it follows a MUL instruction that
multiplies (binary multiplication) two unpacked BCD values and stores a
word result in the AX register. The AAM instruction then adjusts the
contents of the AX register to contain the correct 2-digit unpacked (base
10) BCD result.
The generalized version of this instruction allows adjustment of the
contents of the AX to create two unpacked digits of any number base (see
the “Operation” section below). Here, the imm8 byte is set to the selected
number base (for example, 08H for octal, 0AH for decimal, or 0CH for
base 12 numbers). The AAM mnemonic is interpreted by all assemblers to
mean adjust to ASCII (base 10) values. To adjust to values in another
number base, the instruction must be hand coded in machine code (D4
imm8).
Operation:
tempAL ← AL;
AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the
instruction.
Flags Affected:
The SF, ZF, and PF flags are set according to the resulting binary value in
the AL register. The OF, AF, and CF flags are undefined.
The following is an example of hand coding for the base 12 conversion in
the debug environment:
-rax
AX 0000
:2070 ; AH has an irrelevant data 20H and AL has the data 70h (= 94
; in the base 12 system, = 112 in the decimal system)
-e cs:100
; enter the hand code D40C at the default assembly
; address 100H (in debug) in the code segment (cs)
1377:0100 07.d4 BB.c
-u 100 101
; unassemble the first 2 bytes of code
1377:0100 D40C AAM 0C ; Note how the hand coded instruction
; gets unassembled, but if we try to
; give this as an instruction, AAM 0C
; it will produce an error in debug or
; when assembled in MASM.
-t
-q ; quit debug
Exercise: study the regular AAM instruction (D4 0A) in the debug.
Note: The AAM instruction can be used to obtain a 2-digit unpacked
BCD number in register AX from a binary (hex) number less than 64H (=
100 decimal) in register AL. Using the general form of the hand-coded
instruction, the same conversion can be applied to other number bases. You
may also check what happens when this instruction is used with, say, FFH
in register AL.
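The AAM operation is simply a division of AL by the base, with the quotient going to AH and the remainder to AL. A minimal sketch in Python follows; the name aam and the default base of 10 are illustrative choices.

```python
def aam(al, base=10):
    """Returns (AH, AL) after AAM: AH = AL / base, AL = AL MOD base."""
    return divmod(al, base)


print(aam(0x70, 12))   # (9, 4): 70H = 112 decimal = 94 in base 12
print(aam(63))         # (6, 3): the regular AAM, giving AX = 0603H
```

The first call reproduces the hand-coded base 12 case (D4 0C) with AL = 70H; the earlier content of AH plays no part in the result.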
• AAD: ASCII adjust AX before division:
Adjusts two unpacked BCD digits (the least-significant digit in the AL
register and the most significant digit in the AH register) so that a division
operation performed on the result will yield a correct unpacked BCD
value. The AAD instruction is only useful when it precedes a DIV
instruction that divides (binary division) the adjusted value in the AX
register by an unpacked BCD value. The AAD instruction sets the value
in the AL register to (AL + (10 * AH)), and then clears the AH register to
00H. The value in the AX register is then equal to the binary equivalent of
the original unpacked two-digit (base 10) number in registers AH and AL.
The generalized version of this instruction allows adjustment of two
unpacked digits of any number base (see the “Operation” section below),
by setting the imm8 byte to the selected number base (for example, 08H
for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAD
mnemonic is interpreted by all assemblers to mean adjust ASCII (base 10)
values. To adjust values in another number base, the instruction must be
hand coded in machine code (D5 imm8).
Operation:
tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH ∗ imm8)) AND FFH; (* imm8 is set to 0AH
for the AAD mnemonic *)
AH ← 0
The immediate value (imm8) is taken from the second byte of the
instruction.
Flags Affected:
The SF, ZF, and PF flags are set according to the resulting binary value in
the AL register; the OF, AF, and CF flags are undefined.
Note: This instruction can be used to convert 2-digit unpacked BCD in
AX to 2-digit packed hex in AL. The generalized hand coded version will
be useful for doing the same in any base less than 16 decimal (Why less
than 16? Try to reason out).
Exercise: Check the regular and the hand coded versions in the debug.
Hand coding in the .asm file is demonstrated below.
HAND CODING IN THE .ASM FILE
code    segment
        assume  cs:code
start:  mov     ax, 050AH
        dw      0BD5H           ; hand coded AAD instruction, with 0B or
                                ; 11 decimal after code D5
                                ; note the instruction word is D50B
                                ; but loaded in memory with LS byte first.
                                ; the base of conversion is now 0B or 11 decimal
        int     01              ; return control to debug
code    ends
        end     start
The above file can now be assembled and linked using masm and link
programs to produce a .exe file which can be executed and seen in the debug
environment, as demonstrated below.
-u 0 6
; unassemble the first 6 bytes of the code segment
13D5:0000 B80A05 MOV AX,050A
13D5:0003 D50B AAD 0B
13D5:0005 CD01 INT 01
-r ; display registers
-t2
; trace execution of two instructions
AX=050A BX=0000 CX=0007 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL NZ NA PO NC
13D5:0003 D50B AAD 0B
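The value that AAD leaves in AX can be predicted from its operation. The sketch below is in Python; the name aad and the default base of 10 are illustrative choices.

```python
def aad(ah, al, base=10):
    """Returns (AH, AL) after AAD: AL = (AL + AH * base) AND FFH, AH = 0."""
    return 0, (al + ah * base) & 0xFF


ah, al = aad(0x05, 0x0A, 0x0B)      # AX = 050AH, hand-coded AAD 0B
print(f"AX={ah:02X}{al:02X}")       # AX=0041: 5 * 11 + 10 = 65 = 41H
ah, al = aad(0x07, 0x02)            # unpacked BCD 72, regular AAD
print(f"AX={ah:02X}{al:02X}")       # AX=0048: 72 decimal = 48H
```

The first call models the MOV AX,050AH and AAD 0B pair traced above, so AX should read 0041 once the second instruction has been traced.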
This instruction does a bitwise AND of the two operands, but does not place the
result in the destination register. The nature of the result goes to the flag register.
There are many logical operations available. Why are only these four functions,
AND, OR, NOT and EX-OR, chosen? The reason is that these functions
give the programmer the capability to handle individual bits of a word
selectively. Suppose we want to selectively set the 4th bit from the left in a byte. We
can use a mask with a 1 in the 4th bit from the left and 0 in all other bits; the mask
will then be 0001 0000. If we OR the data byte with this mask, no
other bit is changed, but the 4th bit is set irrespective of the condition of that bit in
the original data byte. Similarly, AND can be used to selectively clear a specific
bit irrespective of its original condition. The mask required will be the
complement of the mask we used for ORing above: a 1 does not alter a data bit on
ANDing, but a 0 clears it. An EX-OR is similarly
useful for selective toggling of the data: a 1 toggles the data bit but a 0 does not.
NOT is useful for finding the 1’s complement of a full data
word. The logic functions AND, OR and NOT form a universal logic
group, which means that any logic function can be generated using these three
functions appropriately on a bitwise basis, and hence no further logic functions
are needed. The EX-OR function is also useful in data comparisons.
If we EX-OR two bytes or words, the result will be all zeros (every bit is
zero and the zero flag is set to indicate this condition clearly) when the two
data bytes or words are equal. With this introduction we will now look at the four
instructions in detail.
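The masking ideas above can be tried out directly. The following is a sketch in Python on byte values; the mask value and the helper names are arbitrary choices made for illustration.

```python
MASK = 0b0001_0000                 # a single 1 bit; illustrative choice


def set_bits(data, mask):
    """OR: bits under a 1 in the mask are set, the others are unchanged."""
    return data | mask


def clear_bits(data, mask):
    """AND with the complemented mask: bits under a 1 in the mask are cleared."""
    return data & ~mask & 0xFF


def toggle_bits(data, mask):
    """EX-OR: bits under a 1 in the mask are inverted."""
    return data ^ mask


def is_equal(a, b):
    """EX-OR of equal operands is all zeros (the zero flag would be set)."""
    return (a ^ b) == 0


data = 0b1010_0101
print(f"{set_bits(data, MASK):08b}")            # 10110101
print(f"{clear_bits(0b1011_0101, MASK):08b}")   # 10100101
print(f"{toggle_bits(data, MASK):08b}")         # 10110101
print(is_equal(0x5A, 0x5A), is_equal(0x5A, 0x5B))   # True False
```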
• AND: Performs a bitwise AND operation on the destination (first) and
source (second) operands and stores the result in the destination operand
location. The source operand can be an immediate, a register, or a memory
location; the destination operand can be a register or a memory location.
(However, two memory operands cannot be used in one instruction.) Each
bit of the result is set to 1 if both corresponding bits of the first and second
operands are 1; otherwise, it is set to 0.
Operation:
DEST ← DEST AND SRC;
Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.
• OR: Performs a bitwise inclusive OR operation between the destination
(first) and source (second) operands and stores the result in the destination
operand location. The source operand can be an immediate, a register, or a
memory location; the destination operand can be a register or a memory
location. (However, two memory operands cannot be used in one
instruction.) Each bit of the result of the OR instruction is set to 0 if both
corresponding bits of the first and second operands are 0; otherwise it is set to 1.
Operation:
DEST ← DEST OR SRC;
Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.
• XOR: Performs a bitwise exclusive OR (XOR) operation on the
destination (first) and source (second) operands and stores the result in the
destination operand location. The source operand can be an immediate, a
register, or a memory location; the destination operand can be a register or
a memory location. (However, two memory operands cannot be used in
one instruction.) Each bit of the result is 1 if the corresponding bits of the
operands are different; each bit is 0 if the corresponding bits are the same.
Operation:
DEST ← DEST XOR SRC;
Flags Affected:
The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.
• NOT: Performs a bitwise NOT operation (each 1 is set to 0, and each 0 is
set to 1) or does 1’s complementing on the destination operand and stores
the result in the destination operand location. The destination operand can
be a register or a memory location.
Operation:
DEST ← NOT DEST;
Flags Affected:
None.
Exercise on Logical instructions: The register AX has some unknown
data. Give a single instruction that will produce a 1 in the 4th and the 12th
bit from the left in AX without altering the other bits. If the mask used in
the above case is used with AND instruction, what will happen to the data
in AX?
Give an instruction using XOR logic that will produce the same result on a
register as the NOT instruction does.
• TEST: Bitwise ANDs the two source operands, ignores the outcome, but
preserves the nature of the result in the flag register.
This instruction computes the bit-wise logical AND of first operand
(source 1 operand) and the second operand (source 2 operand) and sets the
SF, ZF, and PF status flags according to the result. The result is then
discarded.
Operation:
TEMP ← SRC1 AND SRC2;
SF ← MSB(TEMP);
IF TEMP = 0
THEN ZF ← 1;
ELSE ZF ← 0;
PF ← Parity of the lower 8-bits of TEMP;
CF ← 0;
OF ← 0;
(*AF is Undefined*)
Flags Affected
The OF and CF flags are set to 0. The SF, ZF, and PF flags are set according to the result.
The state of the AF flag is undefined.
5. Shift and Rotate Instructions: Shift and rotate instructions shift the data by one
or more bits towards either left or right, straight or in a circular fashion. The carry
flag is always involved in these operations. There are in all 3 shift instructions
and 4 rotate instructions. The shift/rotate count can be a single bit, specified as
the immediate value 1, or multiple bits, taken from the contents of register CL. The
8086 processor performs multi-bit shifts for the full count held in the CL register, taking
4 clocks for each bit shift. The higher-end processors, from the 286 onwards, mask the
upper 3 bits of CL and use only the lower 5 bits as the shift count. These
processors also permit a multi-bit shift count to be specified as immediate data
in the instruction, while the 8086 allows only a single-bit shift to be directly specified
in the instruction. SHL BX, 15H is an invalid instruction in the 8086 (only SHL BX,
1 is valid), but is valid in the higher-end processors starting from the 80286. We will
now go to the details.
• SAL/SHL/SAR/SHR: The shift instructions, although shown with four
separate mnemonics, are only three separate instructions. SAL and SHL
are the same, but SAR and SHR are not so. (The debug will only accept
the code SHL and indicates a fault on SAL. But MASM accepts both and
produces the same code for both.) These instructions shift the bits in the
first operand (destination operand) to the left or right by the number of bits
specified in the second operand (count operand). Bits shifted beyond the
destination operand boundary are first shifted into the CF flag, and then
discarded. At the end of the shift operation, the CF flag contains the last
bit shifted out of the destination operand. The destination operand can be a
register or a memory location. The count operand can be the immediate
value of 1, or it can be any 8-bit value in register CL for multiple shifts.
The shift arithmetic left (SAL) and shift logical left (SHL) instructions
perform the same operation; they shift the bits in the destination operand
to the left (toward more significant bit locations). For each shift count, the
most significant bit of the destination operand is shifted into the CF flag,
and the least significant bit is cleared.
The shift arithmetic right (SAR) and shift logical right (SHR) instructions
are different instructions as described below. They do the right shift of the
bits of the destination operand (toward less significant bit locations). For
each shift count, the least significant bit of the destination operand is
shifted into the CF flag, and the most significant bit is either set or cleared
depending on the instruction type. The SHR instruction clears the most
significant bit, and the SAR instruction sets or clears the most significant
bit to correspond to the sign (most significant bit) of the original value in
the destination operand. In effect, the SAR instruction fills the empty bit
position’s shifted value with the sign of the unshifted value.
The SAR and SHR instructions can be used to perform signed or unsigned
division, respectively, of the destination operand by powers of 2. For
example, using the SAR instruction to shift a signed integer 1 bit to the
right divides the value by 2. Using the SAR instruction to perform a
division operation does not produce the same result as the IDIV
instruction. The quotient from the IDIV instruction is rounded toward
zero, whereas the “quotient” of the SAR instruction is rounded toward
negative infinity. This difference is apparent only for negative numbers.
For example, when the IDIV instruction is used to divide -9 by 4, the
result is -2 with a remainder of -1. If the SAR instruction is used to shift -9
right by two bits, the result is -3 and the “remainder” is +3; however, the
SAR instruction stores only the most significant bit of the remainder (in
the CF flag). The OF flag is affected only on 1-bit shifts. For left shifts,
the OF flag is set to 0 if the most significant bit of the result is the same as
the CF flag (that is, the top two bits of the original operand were the
same); otherwise, it is set to 1. For the SAR instruction, the OF flag is
cleared for all 1-bit shifts. For the SHR instruction, the OF flag is set
to the most-significant bit of the original operand.
(The use of the OF flag in these instructions is not indicated in the
manual.)
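The difference between SAR and IDIV for negative numbers, and the effect of an unmasked count on the 8086, can be seen with a bit-by-bit model. The sketch below is in Python for 16-bit operands; the function names are illustrative, and the OF flag is left out of the model.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement number."""
    return x - 0x10000 if x & 0x8000 else x


def shl16(value, count):
    """SHL/SAL: returns (result, CF); CF holds the last bit shifted out."""
    cf = 0
    for _ in range(count):
        cf = (value >> 15) & 1
        value = (value << 1) & 0xFFFF
    return value, cf


def shr16(value, count):
    """SHR: zeros enter at the most significant bit."""
    cf = 0
    for _ in range(count):
        cf = value & 1
        value >>= 1
    return value, cf


def sar16(value, count):
    """SAR: the sign bit is copied into the vacated position."""
    cf = 0
    for _ in range(count):
        cf = value & 1
        value = (value >> 1) | (value & 0x8000)
    return value, cf


result, cf = sar16(-9 & 0xFFFF, 2)
print(to_signed16(result), cf)       # -3 1  (rounded toward negative infinity)
print(int(-9 / 4))                   # -2    (IDIV rounds toward zero)
print(shr16(-9 & 0xFFFF, 2))         # (16381, 1): FFF7H treated as unsigned
print(shl16(0x1234, 0x12))           # (0, 0): a count above 16 empties the word
```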
-u 100 10b
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B83412 MOV AX,1234
-t5
AX=1234 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 B112 MOV CL,12
;NOTE: Shift count is more than 16 and hence 16 0’s are shifted into AX.
-q
• RCL, RCR, ROL, and ROR: Rotate instructions, rotate including carry
(RCL, RCR) and rotate only the register (ROR, ROL): These instructions
rotate the bits of the first operand (destination operand) by the number
of bit positions specified in the second operand (count operand) and store
the result in the destination operand. The destination operand can be a
register or a memory location; the count operand is either the immediate
value 1 or a value in the CL register.
The rotate left (ROL) and rotate through carry left (RCL) instructions shift
all the bits toward more-significant bit positions, except for the most-
significant bit, which is rotated to the least significant bit location. The
rotate right (ROR) and rotate through carry right (RCR) instructions shift
all the bits toward less significant bit positions, except for the least-
significant bit, which is rotated to the most-significant bit location.
The RCL and RCR instructions include the CF flag in the rotation. The
RCL instruction shifts the CF flag into the least-significant bit and shifts
the most-significant bit into the CF flag. The RCR instruction shifts the
CF flag into the most-significant bit and shifts the least-significant bit into
the CF flag. For the ROL and ROR instructions, the original value of the
CF flag is not a part of the result, but the CF flag receives a copy of the bit
that was shifted from one end to the other.
The OF flag is defined only for the 1-bit rotates; it is undefined in all other
cases (except that a zero-bit rotate does nothing, that is affects no flags).
For left rotates, the OF flag is set to the exclusive OR of the CF bit (after
the rotate) and the most-significant bit of the result. For right rotates, the
OF flag is set to the exclusive OR of the two most-significant bits of the
result.
The 8086 does not mask the rotation count. However, all other IA-32
processors (starting with the Intel 286 processor) do mask the rotation
count to 5 bits, resulting in a maximum count of 31. This masking is done
in all operating modes (including the virtual-8086 mode) to reduce the
maximum execution time of the instructions.
The SF, ZF, AF and PF flags are not affected by the rotate instructions.
Exercises: Study the rotate instructions in the debug.
For the 8086 processor, show that for the ROL and ROR instructions on a
16-bit operand, the result of a rotation that uses the CL register for the count
is independent of the upper nibble of CL. This upper nibble only increases
the execution time of the instruction.
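A bit-by-bit model of the rotates gives a quick numerical check of the statements above (a check, not a proof). The sketch is in Python for 16-bit operands, with no masking of the count, as on the 8086; the function names are illustrative.

```python
def rol16(value, count):
    """ROL: returns (result, CF); CF is None when the count is 0 (unaffected)."""
    cf = None
    for _ in range(count):
        msb = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | msb
        cf = msb                     # CF receives a copy of the rotated bit
    return value, cf


def rcl16(value, cf, count):
    """RCL: CF takes part in the rotation, giving a 17-bit ring."""
    for _ in range(count):
        msb = (value >> 15) & 1
        value = ((value << 1) & 0xFFFF) | cf
        cf = msb
    return value, cf


print(rol16(0x1234, 0x04))           # (9025, 1): 2341H
print(rol16(0x1234, 0x14))           # (9025, 1): same result, 16 extra rotations
print(rcl16(0x1234, 1, 17))          # (4660, 1): back to 1234H after 17 steps
```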
referred to as a short jump. The CS register is not changed on near and
short jumps.
An absolute offset is specified indirectly in a general-purpose register or a
memory location (r/m16), or it may directly be specified in the instruction.
The following is a study of jump instructions in the debug environment.
-a
1377:0100 jmp 112             ; coded as short relative
1377:0102 jmp 1234            ; coded as relative, but with a 16 bit displacement
1377:0105 jmp bx              ; register direct; bx has the address
1377:0107 jmp [bx]            ; register indirect (near) jump
1377:0109 jmp word ptr [bx]   ; same as above
1377:010B jmp dword ptr [bx]  ; interpreted same as above? See unassembly.
1377:010D jmp far [bx]        ; register indirect far jump
1377:010F jmp near [bx]       ; register indirect near jump
1377:0111 jmp far bx          ; error, as far jump requires 32 bits of
              ^ Error         ; address, while BX can only store 16 bits.
1377:0111 jmp far [1234]      ; far jump to address @ DS:1234
1377:0115
-u100 114
instructions having single mnemonic, identify equivalent mnemonics and
check that they represent the same condition. Identify also opcodes which
are having a single mnemonic.
     Opcode  Instruction  Description
 1.  77 cb   JA rel8      Jump short if above (CF=0 and ZF=0)
 2.  73 cb   JAE rel8     Jump short if above or equal (CF=0)
 3.  72 cb   JB rel8      Jump short if below (CF=1)
 4.  76 cb   JBE rel8     Jump short if below or equal (CF=1 or ZF=1)
 5.  72 cb   JC rel8      Jump short if carry (CF=1)
 6.  E3 cb   JCXZ rel8    Jump short if CX register is 0
 7.  74 cb   JE rel8      Jump short if equal (ZF=1)
 8.  7F cb   JG rel8      Jump short if greater (ZF=0 and SF=OF)
 9.  7D cb   JGE rel8     Jump short if greater or equal (SF=OF)
10.  7C cb   JL rel8      Jump short if less (SF ≠ OF)
11.  7E cb   JLE rel8     Jump short if less or equal (ZF=1 or SF ≠ OF)
12.  76 cb   JNA rel8     Jump short if not above (CF=1 or ZF=1)
13.  72 cb   JNAE rel8    Jump short if not above or equal (CF=1)
14.  73 cb   JNB rel8     Jump short if not below (CF=0)
15.  77 cb   JNBE rel8    Jump short if not below or equal (CF=0 and ZF=0)
16.  73 cb   JNC rel8     Jump short if not carry (CF=0)
17.  75 cb   JNE rel8     Jump short if not equal (ZF=0)
18.  7E cb   JNG rel8     Jump short if not greater (ZF=1 or SF ≠ OF)
19.  7C cb   JNGE rel8    Jump short if not greater or equal (SF ≠ OF)
20.  7D cb   JNL rel8     Jump short if not less (SF=OF)
21.  7F cb   JNLE rel8    Jump short if not less or equal (ZF=0 and SF=OF)
22.  71 cb   JNO rel8     Jump short if not overflow (OF=0)
23.  7B cb   JNP rel8     Jump short if not parity (PF=0)
24.  79 cb   JNS rel8     Jump short if not sign (SF=0)
25.  75 cb   JNZ rel8     Jump short if not zero (ZF=0)
26.  70 cb   JO rel8      Jump short if overflow (OF=1)
27.  7A cb   JP rel8      Jump short if parity (PF=1)
28.  7A cb   JPE rel8     Jump short if parity even (PF=1)
29.  7B cb   JPO rel8     Jump short if parity odd (PF=0)
30.  78 cb   JS rel8      Jump short if sign (SF=1)
31.  74 cb   JZ rel8      Jump short if zero (ZF=1)
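The flag conditions in the table separate the unsigned tests (above, below) from the signed ones (greater, less). The sketch below, in Python, models the flags left by a CMP on two bytes and evaluates a few of the conditions; cmp8, signed8 and the CONDITION table are illustrative names, and only five of the mnemonics are included.

```python
def signed8(x):
    """Interpret a byte as a two's-complement number."""
    return x - 0x100 if x & 0x80 else x


def cmp8(a, b):
    """Flags left by CMP a, b on byte operands (a - b, result discarded)."""
    diff = (a - b) & 0xFF
    return {
        "CF": int(a < b),                                       # unsigned borrow
        "ZF": int(diff == 0),
        "SF": diff >> 7,
        "OF": int(not -128 <= signed8(a) - signed8(b) <= 127),  # signed overflow
    }


CONDITION = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JE": lambda f: f["ZF"] == 1,
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
}

flags = cmp8(0x80, 0x01)       # 128 against 1 unsigned, -128 against 1 signed
for name, taken in CONDITION.items():
    print(name, taken(flags))  # JA True, JB False, JE False, JG False, JL True
```

The same pair of bytes is "above" when read as unsigned and "less" when read as signed, which is why both families of jumps exist.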
The following examples in debug show sample instructions, along with their unassembly:
-a
1377:0100 ja 114
1377:0102 jnb 1234
^ Error; relative address more than 8 bits
1377:0102 jae 85
1377:0104
-u 100 103
Exercise: All the conditional jumps are only possible with displacements
in the range -128 to +127 from the current location. If a longer range of
conditional jump is required how can you arrange for that? [Hint: try using
a simple jump with a longer or 16-bit relative address (in addition to the
conditional jump) – at the destination of the conditional jump]
• LOOP, LOOPZ (LOOPE) and LOOPNZ (LOOPNE): These are the
unconditional and conditional loop instructions. Note that there are only
two conditional loops, both based on the condition of the zero flag: loop
on zero (or loop on equal) and loop on not zero (or loop if unequal, that
is, when the comparison of two data items shows that they are unequal).
Description:
The loop instruction performs a loop operation using the CX register as a counter. Each
time the LOOP instruction is executed, the count register is decremented, then checked
for 0. If the count is 0, the loop is terminated and program execution continues with the
instruction following the LOOP instruction. If the count is not zero, a near jump is
performed to the destination (target) operand, which is presumably the instruction at the
beginning of the loop. If the address-size attribute is 32 bits, the ECX register is used as
the count register; otherwise the CX register is used. The target instruction is specified
with a relative offset (a signed offset relative to the current value of the instruction
pointer in the IP register). This offset is generally specified as a label in assembly code,
but at the machine code level, it is encoded as a signed, 8-bit immediate value, which is
added to the instruction pointer. Offsets of –128 to +127 are allowed with this instruction.
Conditional loop instructions (LOOPcc) accept the ZF flag as a condition for terminating
the loop before the count reaches zero. With these forms of the instruction, a condition
code (cc) is associated with each instruction to indicate the condition being tested for.
Here, the LOOPcc instruction itself does not affect the state of the ZF flag; the ZF flag is
changed by other instructions in the loop. LOOPZ stands for loop if zero, LOOPNZ for loop
if not zero.
Opcode Instruction Description
E2 cb LOOP rel8 Decrement count; jump short if count ≠ 0
E1 cb LOOPE rel8 Decrement count; jump short if count ≠ 0 and ZF=1
E1 cb LOOPZ rel8 Decrement count; jump short if count ≠ 0 and ZF=1
E0 cb LOOPNE rel8 Decrement count; jump short if count ≠ 0 and ZF=0
E0 cb LOOPNZ rel8 Decrement count; jump short if count ≠ 0 and ZF=0
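The decrement-then-test order in the table has a consequence worth noting: a loop entered with CX = 0 does not fall through but runs 65536 times. A small sketch in Python follows; loop_step and iterations are illustrative names, and the ZF value is supplied by the caller because the loop instructions themselves do not change it.

```python
def loop_step(cx, zf, kind="LOOP"):
    """One execution of LOOP/LOOPE/LOOPNE; returns (new CX, jump taken?)."""
    cx = (cx - 1) & 0xFFFF                 # CX is decremented first, flags untouched
    if kind == "LOOP":
        taken = cx != 0
    elif kind in ("LOOPE", "LOOPZ"):
        taken = cx != 0 and zf == 1
    elif kind in ("LOOPNE", "LOOPNZ"):
        taken = cx != 0 and zf == 0
    else:
        raise ValueError(kind)
    return cx, taken


def iterations(cx):
    """Number of times a loop body ending in LOOP runs for a starting CX."""
    count = 0
    while True:
        count += 1                         # the body runs, then LOOP is reached
        cx, taken = loop_step(cx, 0)
        if not taken:
            return count


print(iterations(9))                       # 9
print(iterations(0))                       # 65536
print(loop_step(5, 0, "LOOPE"))            # (4, False): ZF=0 ends the loop early
```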
• CALL: Call instruction is a returnable jump to the destination or the target
address provided in the instruction. This instruction can be used to
execute two different types of calls:
Near call—A call to a procedure within the current code segment (the
segment currently pointed to by the CS register), sometimes referred to as
an intrasegment call.
Far call—A call to a procedure located in a different segment than the
current code segment, sometimes referred to as an intersegment call.
Near Call: When executing a near call, the processor pushes the value of
the IP register (which contains the offset of the instruction following the
CALL instruction) onto the stack (for use later as a return-instruction
pointer). The processor then branches to the address in the current code
segment specified with the target operand. The target operand specifies
either an absolute offset in the code segment (that is an offset from the
base of the code segment) or a relative offset (a signed displacement
relative to the current value of the instruction pointer in the IP register,
which points to the instruction following the CALL instruction). The CS
register is not changed on near calls.
For a near call, an absolute offset is specified indirectly in a general-
purpose register or a memory location (r/m16 ). Absolute offsets are
loaded directly into the IP register. (When accessing an absolute offset
indirectly using the stack pointer [SP] as a base register, the base value
used is the value of the SP before the instruction executes.)
Far Calls: When executing a far call, the processor pushes the current
value of both the CS and IP registers onto the stack for use as a return-
instruction pointer. The processor then performs a “far branch” to the code
segment and offset specified with the target operand for the called
procedure. Here the target operand specifies an absolute far address either
directly with a pointer (ptr16:16) or indirectly with a memory location
(m16:16 ). With the pointer method, the segment and offset of the called
procedure is encoded in the instruction, using a 4-byte far address
immediate. With the indirect method, the target operand specifies a
memory location that contains a 4-byte far address. On the 8086 the
offset is always 16 bits, and the far address is loaded directly into the CS
and IP registers. (The 32-bit offsets and the EIP register mentioned in
IA-32 literature apply only to the 80386 and later processors.)
Exercises: In the debug, check and see how the following instructions are
machine coded on unassembling:
CALL 156
CALL SHORT 156
CALL 6789
CALL BX
CALL SHORT BX
CALL [BX]
CALL NEAR [BX]
CALL SHORT [BX]
CALL FAR [BX]
CALL FAR 1234:5678
CALL SP
CALL [AX]
That will give you a fair idea of the machine codes used, as well as of the
different modes of call instructions. However, by far the most common
method used for call is by directly giving the address of the procedure in
the instruction. In ALP (assembly language programming) this is done by
using the label name used for the procedure or subroutine.
• RET: The RET (return) instruction returns control back from the
procedure to the program that has called the procedure. The control will
be returned to the instruction following the procedure call.
This instruction transfers the program control to a return address located
on the top of the stack. The address is usually placed on the stack by a
CALL instruction, and the return is made to the instruction that follows
the CALL instruction. The optional source operand specifies the number
of stack bytes to be released after the return address is popped; the default
is none. This operand can be used to release parameters from the stack that
were passed to the called procedure and are no longer needed.
Exercises: Study the following RET instructions in the debug and see their
machine codes.
RET
RET NEAR
RETF ; this stands for return far
RET 120
RETF 120
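The stack traffic of CALL and RET described above can be followed with a small model. The sketch below is Python, used only to mimic the pushes and pops; the class and function names, the starting SP of FFEE and the sample addresses are all assumptions for illustration, and the stack segment is ignored.

```python
class Stack8086:
    """Word stack growing downwards, as on the 8086 (offsets only)."""

    def __init__(self, sp=0xFFEE):
        self.sp = sp
        self.words = {}                 # offset -> 16-bit word

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.words[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.words[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value


def near_call(stack, ip_next, target_ip):
    """Near call: only the return IP is pushed; CS is not changed."""
    stack.push(ip_next)
    return target_ip


def far_call(stack, cs, ip_next, target_cs, target_ip):
    """Far call: CS is pushed first, then the return IP."""
    stack.push(cs)
    stack.push(ip_next)
    return target_cs, target_ip


def ret_near(stack, release=0):
    """RET n: pop IP, then release n further bytes of parameters."""
    ip = stack.pop()
    stack.sp = (stack.sp + release) & 0xFFFF
    return ip


def ret_far(stack, release=0):
    """RETF n: pop IP, pop CS, then release n further bytes."""
    ip = stack.pop()
    cs = stack.pop()
    stack.sp = (stack.sp + release) & 0xFFFF
    return cs, ip


stack = Stack8086()
stack.push(0x1111)                      # one word parameter for the procedure
near_call(stack, 0x0103, 0x0200)        # CALL at 0100, procedure at 0200
print(hex(ret_near(stack, release=2)))  # 0x103, the instruction after the CALL
print(hex(stack.sp))                    # 0xffee, the parameter is released too
```

The operand of RET is what allows the called procedure to clean up its own parameters, as the last two lines show.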
• INT n, INTO and INT 3: These instructions are software interrupt
procedure calls. Software interrupts are special procedures that can be
invoked or called using an 8-bit number, known as the interrupt number.
Many system services are rendered using software interrupts. The
interrupt invoked procedures are normally known as interrupt service
routines. I/O devices also can obtain system services using these calls.
They first get the attention of the processor by activating the interrupt pin
of the processor. When the pin is activated, the processor goes through a
sequence of operations to which the interrupting I/O device responds by
inputting an 8-bit number n. The processor then invokes the service
routine for INT n. This process is known as the hardware interrupt
operation. Once an interrupt is invoked, the processor pushes the FLAGS,
the CS and the IP value (corresponding to the instruction immediately
following the interrupt call). With this, the processor is ready to accept a
returnable far jump (new values in CS and IP, returnable because the old
values of CS and IP are stored in the stack along with the old flags). The
destination operand n in the instruction specifies an interrupt vector
number from 0 to 255, encoded as an 8-bit unsigned immediate value.
Each interrupt vector number n is an index into an array of 4-byte
entries – the interrupt vector array – each entry storing the far call
address associated with the particular n. In all, there is provision for
256 interrupts, and with each interrupt having a 4-byte address (far call
address, CS and IP), the interrupt vector array occupies the lowest 1 KB
of the memory space. The first
32 interrupt vector numbers are reserved by Intel for system use. Some of
these interrupts are used for internally generated exceptions.
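The lookup in the interrupt vector array amounts to simple arithmetic: entry n starts at address 4n in segment 0, with the IP stored first and the CS after it, each low byte first. The sketch below is a Python model of that lookup; the function names are assumptions, and the four bytes 8B 01 70 00 are the ones shown for vector 4 in the debug dump of this section.

```python
def vector_address(n):
    """Offset (in segment 0000) of the 4-byte entry for interrupt n."""
    if not 0 <= n <= 255:
        raise ValueError("interrupt number must be 0..255")
    return 4 * n


def read_vector(memory, n):
    """Return (CS, IP) of the service routine for interrupt n."""
    base = vector_address(n)
    ip = memory[base] | (memory[base + 1] << 8)        # low word: IP
    cs = memory[base + 2] | (memory[base + 3] << 8)    # high word: CS
    return cs, ip


low_memory = bytearray(1024)            # the lowest 1 KB: 256 entries of 4 bytes
low_memory[0x10:0x14] = bytes([0x8B, 0x01, 0x70, 0x00])

cs, ip = read_vector(low_memory, 4)
print("%04X:%04X" % (cs, ip))           # 0070:018B
```

This agrees with the trace of INT 4 in the debug experiment, where the IP becomes 018B and the CS becomes 0070.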
The INT n instruction is the general mnemonic for executing a software-
generated call to an interrupt handler, with the vector number n.
The INTO instruction is a special mnemonic for calling overflow
exception, interrupt vector number 4. The INTO instruction checks the
OF flag in the FLAGS register and calls the overflow interrupt handler,
that is, the interrupt with a vector number 4, if the OF flag is set to 1.
The INT 3 instruction generates a special one byte opcode (CC) that is
intended for calling the debug exception handler. This one byte form is
valuable because it can be used to replace the first byte of any instruction
with a breakpoint, including other one byte instructions, without over-
writing other code.
Exercise: 1. Unassemble the following instructions in the debug:
INTO
INT 3
INT 4
INT 73
2. Although INTO and INT 4 appear to be the same, INTO is a
conditional execution of vector 4 interrupt based on the overflow flag, but
INT 4 is unconditional software interrupt at vector 4, occurring even
without there being an overflow, as seen in the following debug
experiment. The experiment is first done with the OF reset and, as
can be seen in the execution, INT 4 actually branches to the interrupt
routine, while INTO does not. After setting the OF, executing INTO
invokes the interrupt at vector 4.
-a
1377:0100 int 4
1377:0102 into
1377:0103
-d 0000:0010 0013
;the data below shows the interrupt 4 vector
0000:0010 8B 01 70 00 ..p.
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 CD04 INT 04
-t
IP 018B
:102
-rcs
CS 0070
:1377 ; get back to cs:ip = original 1377:102 (to INTO instruction)
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0102 NV UP DI PL NZ NA PO NC
1377:0102 CE INTO
-t
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFE8 BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP DI PL NZ NA PO NC
1377:0103 0000 ADD [BX+SI],AL ; INTO has not occurred, as
;OF is not set, NV = no o’flow.
-rip
IP 0103
:102 ; again get back to INTO instruction
-rf
NV UP DI PL NZ NA PO NC – ov ; set overflow flag
-r
Exercise: check if the codes CD 03 and CC, both standing for interrupt 3
have any difference in execution. [Hint: the code CC is only a
convenience for debug purposes, for break point provision]
• IRET: Return from interrupt: the IRET instruction performs a far return to
the interrupted program or procedure. During this operation, the processor
pops the return instruction pointer, return code segment selector, and
FLAGS image from the stack to the IP, CS, and FLAGS registers,
respectively, and then resumes execution of the interrupted program or
procedure.
Exercise: Why should the flags be saved at entry to the interrupt service
routine and why should they be restored on return? What about the other
registers used by the interrupt routine? How is their integrity maintained
on return? [Hint: It is the responsibility of the interrupt program to return
them intact]
Why are there no instructions like IRET NEAR, IRETF or IRET n? [Hint:
consider hardware interrupts by I/O devices]
upward direction or downward direction (SI, DI increasing or decreasing by 2 or
1, depending on word or byte operation). When the direction flag D is 0, the
upward direction (address increasing) is taken. When it is 1, the downward
direction (address decreasing) is taken for address modification. The direction
flag can be controlled by the instruction CLD (clear direction flag D) or STD (set
direction flag D). These instructions can be used with REP prefix to repeat a
certain number of times as per the array length. The array length should be in CX
before invoking the REP action.
• MOVSB, MOVSW: String byte, string word move. It is only in these
two instructions and the two CMPS instructions discussed next, that we
can have both source and destination operands in memory, although
implicitly specified. All other instructions will have at least one operand
specified by a register. As with MOV elsewhere, the move here is really
a copy (of a string of bytes or a string of words); the source is left
unchanged. These instructions copy the byte or word at the source
operand to the location of the destination operand. Both the source and
destination operands are located in memory. The address of the source
operand is read from the DS:SI registers. With the segment override
prefix ES: the source is ES:SI. The destination operand is always the
memory location at ES:DI; this segment cannot be overridden.
Note that these operands are not explicitly mentioned in the instruction,
but implied. The instructions are just MOVSB, MOVSW all by
themselves. After the data move, the addresses in SI and DI are
appropriately modified depending on the D flag and on whether it is a byte
or a word move. What makes it necessary to use up or down addressing?
Study the debug experiment below.
-e 250
;enter the source array values at address DS:250 onwards
1377:0250 74.1 03.2 E9.3 6A.4 FF.5 B8.
; source array is: 01 02 03 04 05
-a ; assemble at 100
1377:0100 mov cx,5 ; number of bytes to be transferred
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B90500 MOV CX,0005 ;note, es and ds are both same.
-t8
AX=0000 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0103 NV UP EI PL NZ NA PO NC
1377:0103 BE5002 MOV SI,0250
AX=0000 BX=0000 CX=0005 DX=0000 SP=FFEE BP=0000 SI=0250 DI=0252
DS=1377 ES=1377 SS=1377 CS=1377 IP=0109 NV UP EI PL NZ NA PO NC
1377:0109 F3 REPZ
1377:010A A4 MOVSB
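One answer to the question of why both up and down addressing are needed shows up when the source and destination arrays overlap, as they do here (source at 0250, destination at 0252, five bytes). The sketch below is a Python model of REP MOVSB with DS and ES equal; the function name is an assumption and the model is meant only to show the order in which bytes are copied.

```python
def rep_movsb(mem, si, di, cx, df):
    """Model REP MOVSB with DS = ES; mem is a bytearray for that segment.

    df = 0 moves upwards (SI, DI incremented), df = 1 moves downwards.
    Returns the final (SI, DI, CX).
    """
    step = -1 if df else 1
    while cx != 0:
        mem[di] = mem[si]               # copy one byte
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


def fresh_memory():
    mem = bytearray(0x300)
    mem[0x250:0x255] = bytes([1, 2, 3, 4, 5])   # source array 01 02 03 04 05
    return mem


# Upward copy from the start of the arrays: the source is overwritten
# before it has been completely read.
up = fresh_memory()
rep_movsb(up, 0x250, 0x252, 5, df=0)
print(up[0x252:0x257].hex(" "))         # 01 02 01 02 01

# Downward copy from the end of the arrays: the copy is correct.
down = fresh_memory()
rep_movsb(down, 0x254, 0x256, 5, df=1)
print(down[0x252:0x257].hex(" "))       # 01 02 03 04 05
```

When the destination lies above the source and the two overlap, the downward direction has to be used; when it lies below, the upward direction is the safe one.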
compared, even if CX has not reached 0. It may be noted that REP and
REPZ or REPE have the same opcode. This means the REP, REPE,
REPZ prefixes can be used with MOVSB or MOVSW and when so used,
the zero flag will not be checked nor modified during execution, but the
instruction will continue repeating until CX becomes 0. When the REPZ
(REPE) or REPNZ (REPNE) prefixes are used with CMPSB or CMPSW,
the repeat action is not decided by the value the zero flag held before
the instruction started. Each comparison produces its own result, and it
is this result that decides whether to repeat the comparison or not. At
exit (either because the condition is not satisfied or because CX has
reached zero), the result of the comparison that caused the exit from
the loop is seen in the zero flag.
The following assembled program and the tracing of its execution in the
debug, clearly brings out this fact.
It should also be noted that only the CMPS instruction and the SCAS
instruction discussed next distinguish between the two prefixes, REPZ
and REPNZ. The other string instructions do not distinguish
between the two prefixes.
STUDY OF REP CMPSB INSTRUCTION
ASSEMBLY LANGUAGE PROGRAM:
data segment
13D6:0016 A6 CMPSB
13D6:0017 75F9 JNZ 0012
13D6:0019 EBEC JMP 0007
13D6:001B CD01 INT 01
-d 0 f
; Group 1 Group 2
13D5:0000 01 02 2D 04 05 06 07 08-01 02 03 04 05 06 07 08 ..-.............
-g 15 ; execute till CS:15
AX=13D5 BX=0000 CX=0001 DX=0000 SP=0000 BP=0000 SI=0007 DI=000F
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0015 NV UP EI PL NZ NA PO NC
13D6:0015 F3 REPZ
13D6:0016 A6 CMPSB
AX=13D5 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=0008 DI=0010
DS=13D5 ES=13D5 SS=13D5 CS=13D6 IP=0017 NV UP EI PL ZR NA PE NC
13D6:0017 75F9 JNZ 0012; second exit from loop (CX= 0)
; Note result from the comparison is put in the zero flag only at exit from the
; REP loop in both loop exit situations. WHY? What is your conclusion from the
experiment?
-q ; quit
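The register and flag values seen at the two exits of the repeat loop can be reproduced with a small model. The sketch below is Python and follows the usual description of the prefix, in which every comparison sets the zero flag and the prefix tests it after each step; it claims only to reproduce the state visible at exit. The function name and the starting values SI = 0, DI = 8, CX = 8 are assumptions; the two groups of data are those shown in the dump above.

```python
def rep_cmpsb(mem, si, di, cx, zf, while_equal=True, df=0):
    """Model REPE CMPSB (while_equal=True) or REPNE CMPSB, with DS = ES.

    zf is the zero flag on entry; it is returned unchanged if CX is 0.
    Returns the final (SI, DI, CX, ZF).
    """
    step = -1 if df else 1
    while cx != 0:
        zf = 1 if mem[si] == mem[di] else 0     # result of this comparison
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1
        if (zf == 1) != while_equal:            # condition contradicted
            break
    return si, di, cx, zf


data = bytearray([0x01, 0x02, 0x2D, 0x04, 0x05, 0x06, 0x07, 0x08,   # group 1
                  0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])  # group 2

# First exit: the third bytes (2D and 03) differ while CX is still not 0.
first = rep_cmpsb(data, 0, 8, 8, zf=0)
print(first)                            # (3, 11, 5, 0)

# Resuming from there, the remaining bytes match and the exit is on CX = 0.
second = rep_cmpsb(data, first[0], first[1], first[2], first[3])
print(second)                           # (8, 16, 0, 1)
```

The second result corresponds to the last register display of the experiment: SI=0008, DI=0010, CX=0000 and the zero flag set (ZR).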
• SCASB, SCASW: Scan string byte, scan string word: This instruction is
the same as the earlier CMPSB, CMPSW instructions we saw in the
previous section, except for the fact that the source for comparison is the
register AL for SCASB, or AX for SCASW. The destination is the same,
namely, ES:DI, and on execution, DI will point to the next byte or word,
based on the instruction and the direction flag as in the earlier cases of
MOVS and CMPS instructions. With CX initialized to the length of the
destination array, and DI initialized to the array start address when D flag
is reset, or to the end address of the array when the D flag is set, we can
use the conditional instruction prefixes REPE (REPZ) or REPNE
(REPNZ). The repetition will then go on until the condition gets
contradicted or until the register CX reaches zero (that is, the destination
array is completed). The zero flag in the flag register is updated only
when the loop exits, exactly as we saw in case of the CMPS instruction.
• LODSB, LODSW: Load string byte, load string word: These instructions
are similar to MOVSB, MOVSW except the destination of the move
becomes AL for LODSB, and AX for LODSW. The source is DS:SI. On
execution the data will come to the register AL or AX, and SI will be
properly modified. The REP or REPE (REPZ) prefix may be used as with
the MOVS instructions.
The example below shows that the repeat prefix produces the same result
for this instruction whether it is used as REPE or as REPNE. (See
discussion in connection with the instruction CMPS)
THE .asm PROGRAM
code segment
assume cs:code
Start: mov ax, cs
mov ds, ax ; ds is made same as cs
mov cx,6
mov si, offset array
repe lodsb
mov cx, 6
repne lodsb
int 1
jmp start
array db 00,11h,22h,33h,44h,55h,66h,77h,88h,99h,0aah,0bbh
code ends
end start
-u 0 14
-d cs:15 20
-r
-g
Exercise: From the example shown, try to prove that as far as executing the
LODSB instruction is concerned, the prefix REPE behaves the same as the
prefix REPNE, although they are coded differently. Intel literature gives only
the REP (REPE) coding for the purpose, which is the authentic coding for this
repeat operation here.
8. Flag Control Instructions: There are two types of instructions which control the
flags. The first type controls specific flags, like the C flag, the D flag and the I
flag. The other moves either the entire flag register or the lower significant byte
of the register. The details are given below:
• STC, CLC and CMC: These instructions control the carry flag in the
flag register. They stand for set carry, clear carry and complement carry.
No other flags are affected by these instructions.
STC operation: CF ← 1;
CLC operation: CF ← 0;
CMC operation: CF ← NOT CF.
• STD and CLD: These instructions control the direction flag in the flag
register. They stand for set and clear the direction flag. Other flags are
not affected by these instructions. The need for controlling the D flag is
already seen in connection with the string instructions.
STD operation: DF ← 1; enables string addresses to be decremented
CLD operation: DF ← 0; enables string addresses to be incremented
• STI and CLI: These instructions modify the Interrupt control flag in the
flag register. When this I flag is set, the processor is enabled to accept the
hardware interrupts. Otherwise when it is reset, the processor will not be
interrupted by activating the interrupt pin of the processor from the
external hardware. Software interrupts are not disabled by clearing the I
flag, as the debug experiment below shows. Other flags are not affected
by these instructions.
STI operation: IF ← 1; Hardware interrupts enabled.
CLI operation: IF ← 0; Hardware interrupts disabled.
The debug experiment:
-a ; assemble at 100 onwards
1377:0100 cli
1377:0101 int 20
1377:0103
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 FA CLI
-t2
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0101 NV UP DI PL NZ NA PO NC
1377:0101 CD20 INT 20
• LAHF: Load the lower byte of the flag register into register AH. This is a
flag register move instruction.
Description:
Moves the low byte of the FLAGS register (which includes the status flags SF, ZF, AF, PF,
and CF) to the AH register. Reserved bits 1, 3, and 5 of the FLAGS register are set in the
AH register as shown in the “Operation” section below.
Operation:
AH ← FLAGS (SF:ZF:0:AF:0:PF:1:CF);
Flags Affected:
None (that is, the state of the flags in the FLAGS register is not affected).
• SAHF: Store the contents of AH register into the lower byte of Flag
register.
Description:
Loads the SF, ZF, AF, PF, and CF flags of the FLAGS register with values from the
corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5
of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the FLAGS
register remain as shown in the “Operation” section below.
Operation:
FLAGS (SF: ZF: 0: AF: 0: PF: 1: CF) ← AH;
Flags Affected:
The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register. Bits 1, 3,
and 5 of the FLAGS register are unaffected, with the values remaining 1, 0, and 0,
respectively.
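The two "Operation" lines above can be written out as bit masks. The sketch below is a Python model of the layout SF:ZF:0:AF:0:PF:1:CF as given in this section; the function and constant names are assumptions made for illustration.

```python
STATUS_BITS = 0xD5      # SF, ZF, AF, PF, CF = bits 7, 6, 4, 2 and 0
RESERVED_ONE = 0x02     # bit 1 always reads as 1; bits 3 and 5 read as 0


def lahf(flags):
    """Return the byte placed in AH by LAHF for a 16-bit FLAGS value."""
    return (flags & STATUS_BITS) | RESERVED_ONE


def sahf(flags, ah):
    """Return the new 16-bit FLAGS value after SAHF.

    The upper byte of FLAGS (OF, DF, IF, TF and the rest) is kept;
    bits 1, 3 and 5 of AH are ignored.
    """
    return (flags & 0xFF00) | (ah & STATUS_BITS) | RESERVED_ONE


print(hex(lahf(0x0041)))            # 0x43: ZF and CF set, plus the fixed bit 1
print(hex(sahf(0x0802, 0xFF)))      # 0x8d7: OF kept, all five status flags set
```

The second line shows that SAHF cannot touch the overflow flag, which lives in the upper byte of the flag register.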
• PUSHF and POPF: These instructions have already been discussed in
connection with data transfer instructions (classified under type 1
instructions). It may be noted here that when we pop from the stack into
the FLAG register, only the bits that represent the flags will be transferred,
but the other bits (marked as don’t cares in the description of the Flag
register) will not be altered. The debug program below indicates this
feature of the flag register.
-a
1377:0100 pushf        ; flag register -> stack top
1377:0101 pop ax       ; stack top -> AX
1377:0102 xor ax,f02a  ; the non-flag bits are complemented in AX
1377:0105 push ax      ; modified AX -> stack top
1377:0106 popf         ; stack top -> flag register
1377:0107 pushf        ; the modified flag register -> stack top
1377:0108 pop ax       ; and thence to the register AX
1377:0109
-u 100 108
1377:0100 9C PUSHF
1377:0101 58 POP AX
1377:0102 352AF0 XOR AX,F02A
1377:0105 50 PUSH AX
1377:0106 9D POPF
1377:0107 9C PUSHF
1377:0108 58 POP AX
-r
-t7
; Note: Non-flag bits, specially, the bits of the M S nibble of the flag
;reg were later used for identifying different x86 series of processors.
;what we see in the MS nibble of flag register here is not 1111 which is
;the ID for 8086. Here, I have a Pentium mobile processor operating in
;the real mode. Hence the nibble here is an unchangeable 0011 or 3 hex.
-q
we have to use one of these 8 registers as an intermediate register. MOV AX, DS
followed by MOV ES, AX will be a valid operation for copying DS in ES.
• LDS and LES: LDS stands for Load DS and an address register indicated
as the first operand in the instruction. LES stands for load ES and an
address register indicated as the first operand for the instruction. The
second operand is a memory pointer, where the far address of 4 bytes is
stored. The following are examples of valid instructions.
LDS SI, [BX +1234]
LES SP, [1234]; It may be noted this instruction is only given to
; indicate a very rare possibility, but may not be
; normally having any serious use other than for
; simultaneously loading both the registers ES
; and SP with a single instruction, provided the
; memory data is suitably manipulated for this
; requirement.
• CS:, DS:, ES: and SS: : These are segment override prefixes which we
have already seen.
• NOP: No operation: This is a one byte instruction doing nothing except
incrementing the IP by 1. It can also be coded in the assembly language as
XCHG AX, AX, which also does nothing really. The machine code for
XCHG AX, AX and NOP are the same; both are 90 H.
• HLT: Go to the HALT state by stopping the cyclic fetch and execute
operations of the processor. The processor will remain in this HALT state
until this state is interrupted by a hardware activation (taken to logic high)
on one of its pins: INTR (interrupt request), NMI (non maskable interrupt)
or RESET.
• WAIT or FWAIT: Wait for Test signal or hardware interrupt – INTR,
NMI or RESET. This instruction takes the processor to an idle state until
one of the following happens: 1. The TEST pin of the processor goes low,
or 2. INTR, NMI or RESET goes logic high. With test pin going low, the
processor comes out of the wait state and proceeds normally. In case of
accepting the interrupt, the processor executes the interrupt service routine
and returns to the wait state again. The return address pushed to the stack
is the address of the wait instruction itself, and not of the following
instruction. This instruction is used for synchronizing the math or I/O
coprocessor operations with the 8086.
• LOCK prefix: This instruction prefix is useful when there are instructions
which do both read and write operation at a memory address while
executing a single instruction. This happens when a memory address is
used as the destination operand of an instruction; instructions like XCHG
AX, [BX], or INC or DEC of a memory operand, for example. These
instructions first read the memory at the destination address and, at the
end of execution of the instruction, write the result back to the same
memory location. When 8086 is used in a
multi processor, parallel processing environment, it will be necessary to
have these read and write operations to follow successively without
allowing other processors to use the common data transfer bus. The bus is
then supposed to be locked to the processor for the duration of the read
followed by the write operation, in fact for the duration of the execution of
the entire instruction. What the instruction actually does is to activate
(drive to a logic low voltage) a processor signal at the LOCK pin of the
processor, for the entire duration of the execution of the instruction
prefixed by the LOCK prefix. The system bus should be designed to
ensure that the data transfer bus – DTB – (the DTB consists of the lines
handling the data, address and read/ write and other control lines
associated with the transfer of data between the processor and other
system units like memory etc.) control cannot be taken up by any other
processor as long as the LOCK# (the # symbol is used to indicate an active
low signal) remains activated. For example, if we have the instruction,
LOCK INC [BX], the LOCK# pin of the processor will remain active all
through the execution of this instruction. That is, no other processor will
be able to access and control the DTB during the period of the reading of
the original data at the memory location at DS:BX, and subsequent write
back of the incremented value to the same memory location, while
executing this LOCK prefixed INC instruction. We can only say here that
parallel processor systems do require this type of control.
Exercise: Find out and list the type of instruction that can take a LOCK
prefix. We have already indicated the instructions XCHG and INC/ DEC
type. List other instruction types if any.
In this chapter, we have studied in detail, the instruction set of 8086. The
instruction set of all the advanced Intel processors of the IA-32 Architecture, namely,
80x86 and the Pentium processors are all supersets of this basic set; any assembly
language program (ALP) written using the instructions we have studied here, can
normally be executed in these advanced processors. That is why it is very necessary to
understand this instruction set very well if we are to work on these processors at the
assembly level. In this chapter, the 8086 instruction set is studied to a sufficient depth,
with examples of actual programs in the debug and .asm environments. This study is
made in my system, which uses a Pentium mobile processor, working in the real 8086
mode. We have already seen the advantages of working at the assembly level. Later
chapters will give examples of ALPs at a serious level.
EXERCISES
1. Use the DEBUG to study the following instructions after loading different
segment addresses in CS, DS, ES and SS: (i) mov [si], 58 followed by mov ax,
[si]; (ii) stosw; (iii) mov [di], 54 (iv) mov ax, 24; (v) add ax, 38; (vi) daa;
(vii) mov ax, [bp]; (viii) try another 5 instructions of your choice.
2. Write a program directly in the DEBUG to manipulate the unsigned data
available in the registers ax, bx and cx so that ax has the largest of the three and
cx has the smallest. Check the working of the program. What will be the
modification required to the program if the data are considered as signed
numbers?
3. Write a program directly in the DEBUG to shift the 32-bit data in registers
dx:ax by one bit to the left. Check the working of the program.
4. Repeat the question 3 with these modifications: (i) one bit shift to the right;
(ii) one bit rotate to the right; (iii) one bit rotate through carry to the right
and (iv) one bit arithmetic shift to the right.
-rdx
DX 0000
:eeee ; load dx with EEEE hex; 32-bit data is now EEEE7777 hex
137B:0100 shl ax, 1
137B:0102 rcl dx, 1 ; watch carefully the two instructions used.
137B:0104 ; program over.
-t
AX=EEEE BX=0000 CX=0000 DX=EEEE SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0102 OV UP EI NG NZ NA PE NC
137B:0102 D1D2 RCL DX,1
-t
AX=EEEE BX=0000 CX=0000 DX=DDDC SP=FFEE BP=0000 SI=0000 DI=0000
DS=137B ES=137B SS=137B CS=137B IP=0104 NV UP EI NG NZ NA PE CY
137B:0104 0000 ADD [BX+SI],AL DS:0000=CD
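The pairing of SHL on the low word with RCL on the high word works because the bit shifted out of AX is carried into DX through the carry flag. The sketch below is a Python model of the two instructions for a shift count of 1; the function names are assumptions, and the data is the EEEE7777 hex used in the trace above.

```python
def shl16(value):
    """SHL reg,1: returns (result, carry out of bit 15)."""
    carry = (value >> 15) & 1
    return (value << 1) & 0xFFFF, carry


def rcl16(value, carry_in):
    """RCL reg,1: the old carry enters bit 0, bit 15 becomes the new carry."""
    carry_out = (value >> 15) & 1
    return ((value << 1) & 0xFFFF) | carry_in, carry_out


def shift_left_32(dx, ax):
    """Shift the 32-bit value DX:AX left by one bit; returns (DX, AX, CF)."""
    ax, carry = shl16(ax)           # shl ax, 1
    dx, carry = rcl16(dx, carry)    # rcl dx, 1
    return dx, ax, carry


dx, ax, cf = shift_left_32(0xEEEE, 0x7777)
print("%04X %04X CF=%d" % (dx, ax, cf))     # DDDC EEEE CF=1
```

The result agrees with the trace: AX=EEEE, DX=DDDC and the carry set (CY) by the bit shifted out of DX.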
3. PROGRAMMING BASICS
What is it that we are looking for in a program; when can we say, a program is
good? We should answer this question first, before attempting to write programs. We
could put forth a few criteria for a good program. First, it must solve the given problem
completely and for all sets of data. Sometimes the input to the program may not be a
valid data, like an input of 0 for the divisor in a division program. In such cases, the
program should exit giving an indication of the data invalidity. A good program must be
easy to follow. Above all, it should be efficient in resource or register usage, efficient in
terms of time of execution and efficient in terms of memory usage. In the context of
continuously increasing memory and resource availability, it may look like the most
important entity to be economized is the time of execution of the program. However,
over indulgence in time optimization at the cost of simplicity may not be worthwhile.
The way things are evolving, most of the features including the speed of systems are
continuously improving and in this environment, a good program can be the one which is
easier to understand and modify if needed, by any programmer with average expertise.
This implies simplicity of the program may be a primary concern, more important than
resource usage or the time taken for execution.
The following sets of alternative programs illustrate that even a simple function
may be achieved in so many ways. Consider we wish to round off an eight bit number to
only seven significant bits, that is, if the last bit is 0, we leave it unaltered, but if it is a 1,
we increment the number so that the number will become the next number,
approximately equal to the original eight bit number to seven bit accuracy. The
following programs consider the number in AL register at input as well as after
modification. They also do not use any other registers. Four possibilities (among
several) are given below.
Alternative 3: INC AL
AND AL, 0FE h; kill the last bit after incrementing
Alternative 4: TEST AL, 01; Test does not destroy the data tested
JZ DOWN
INC AL
DOWN:
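That Alternatives 3 and 4 really give the same result can be checked exhaustively, since AL has only 256 possible values. The sketch below is a Python model of the two instruction sequences on an 8-bit register; the function names are assumptions made for illustration.

```python
def alternative_3(al):
    """INC AL, then AND AL, 0FEh."""
    al = (al + 1) & 0xFF            # INC AL wraps around at 8 bits
    return al & 0xFE                # kill the last bit after incrementing


def alternative_4(al):
    """TEST AL, 01 / JZ DOWN / INC AL."""
    if al & 0x01:                   # last bit is 1: do not jump, increment
        al = (al + 1) & 0xFF
    return al


same = all(alternative_3(v) == alternative_4(v) for v in range(256))
print(same)                         # True
print(hex(alternative_3(0x07)))     # 0x8
print(hex(alternative_3(0xFF)))     # 0x0, the increment wraps around
```

The last line also brings out a limitation shared by both alternatives: an input of FF hex rounds to 00, since the result no longer fits in eight bits.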
The fact, that even such a simple operation has so many possible ways of
programming, indicates that programming is something where different individuals may
come up with different versions for doing the same job. The programming language is
thus quite flexible, almost similar to our normal languages like English.
We shall now take a little more serious problem, and see how we can program it
in different styles, with different levels of goodness, or efficiency.
The problem we take is a 4-digit HEX to 5-digit BCD conversion. BCD to hex
and hex to BCD conversions are useful in many situations. We understand BCD or
decimal numbers better, but the processor is more at home with hex (actually binary, but
binary is practically same as hex and we think of it as hex, for hex is more compact
compared to binary). Because of this, at human-machine interaction level, this
conversion from hex to BCD as well as BCD to hex will be necessary to make the
systems user-friendly. So this program has a serious application.
Basics of number Base Conversions: There are two basic methods of number
conversion from one base to another. The first method consists of separating the digits of
the given number first, then multiplying the digits with the powers of the base and
adding. Suppose we want to convert a hexadecimal number 12A to decimal. What we
can do is to separate the digits 1, 2, and A, and multiply in decimal, each of these digits
with the appropriate powers of 16 and add the results in decimal. Accordingly digit 1 is
multiplied by 16^2 = 256 decimal to get 256, digit 2 is multiplied by 16 decimal to get 32
decimal, and the digit A, which is 10 decimal is multiplied by 1 to get 10. All these are
now decimal added 256 + 32 + 10 to give the value 298 decimal for the number 12A hex.
Horner’s rule can be used to simplify these calculations:
12A hex = (1*16 + 2)*16 + 10 decimal
        = 18*16 + 10
        = 288 + 10 = 298 decimal.
Note, in this method all calculations (multiplications and additions) are to be in decimal.
An alternative method to do this is to divide the hexadecimal number by 10 decimal, that
is by 0A hex (using completely hex as the base of computation) successively to get the
decimal digits as remainders every time and then putting these digits in proper sequence.
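According to this method, using hex computation, we have:
    0A ) 12A
    0A )  1D   remainder 8
           2   remainder 9
The digits are read from the bottom upwards: the last quotient 2, then the remainders 9 and 8.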
Here we are using the hexadecimal calculation to separate the decimal digits from the
given number and then we can assemble the digits properly. In our example as shown by
the division above, we see the decimal equivalent of 12A as 2-9-8 digit wise, which,
assembled, gives the decimal number 298. The calculations done here are all in
hexadecimal to get the digits and then it is only a question of assembling the digits
properly. We, being conversant with decimal calculations, will find the first method
(method with calculations in decimal) more convenient, but in computers it is always the
method using hexadecimal computations, that is, the second method, here, of separating
the digits by hexadecimal division which is simpler to use. If we want to convert BCD
to hex, we would find decimal division successively by 16, to separate the digits to be
more convenient, while in the computer, multiplying the decimal digits by powers 0A in
the hexadecimal system and adding the hex results in the hex base would be convenient.
While going from an arbitrary base to another arbitrary base, we may find it convenient
to go via decimal system using decimal computations, and in the computers it will be
convenient to go through hex system using hex calculations.
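The two methods can be stated compactly in code. The sketch below is in Python, whose integers are neither decimal nor hex internally, so it only illustrates the two procedures (Horner's rule for assembling a value from digits, and successive division for separating digits); the function names are assumptions.

```python
def value_by_horner(digits, base):
    """First method: combine digits (most significant first) by Horner's rule."""
    value = 0
    for digit in digits:
        value = value * base + digit
    return value


def digits_by_division(value, base):
    """Second method: separate digits by successive division by the new base."""
    if value == 0:
        return [0]
    digits = []
    while value != 0:
        value, remainder = divmod(value, base)
        digits.append(remainder)    # remainders come out least significant first
    return digits[::-1]             # put the digits in proper sequence


print(value_by_horner([1, 2, 10], 16))      # 298, from the digits 1, 2, A
print(digits_by_division(0x12A, 10))        # [2, 9, 8]
```

Going from one arbitrary base to another is then a matter of using the first function with the old base and the second with the new one.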
Exercise: Convert the number AB5 in base 13 to its equivalent in base 12, as
people using the decimal system would do it, and also as a computer using the hex
system would do it.
[Hint: in decimal computation: AB5 in base 13 = (10*13 + 11)*13 + 5 = 1838 decimal.
12 ) 1838
12 )  153 – 2 ↑
12 )   12 – 9 |
        1 – 0 |
Hence the result: AB5 in base 13 = 1092 in base 12.
In hexadecimal computation: AB5 = (A*D + B)*D + 5 = 72E hex.
C ) 72E
C )  99 – 2 ↑
C )   C – 9 |
      1 – 0 |
Hence the result: AB5 in base D = 1092 in base C.]
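The two steps used in the hint, evaluating the digits in the source base and then dividing repeatedly by the target base, can be sketched in a few lines of Python. This is only an illustrative model and not part of the 8086 programs of this chapter; the names DIGITS, parse_digits and to_base are chosen here for the illustration.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def parse_digits(text, base):
    """Evaluate a digit string in the given base: ((d2*base) + d1)*base + d0."""
    value = 0
    for ch in text.upper():
        value = value * base + DIGITS.index(ch)
    return value

def to_base(value, base):
    """Repeated division; the remainders come out least significant digit first."""
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, rem = divmod(value, base)
        out.append(DIGITS[rem])
    return "".join(reversed(out))   # put the digits in the proper sequence

print(to_base(0x12A, 10))                    # 298
print(to_base(parse_digits("AB5", 13), 12))  # 1092
```

The machine holds the intermediate value in binary, which is why the division by the target base is the natural operation for it, whatever notation we use to write the value down.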
With this background, we shall now study different programs in different styles
for the conversion of 4-digit hex to 5-digit BCD.
Programming Style 1: The first style we use is the simple-minded approach of
successively dividing the hex number by 0A hex and properly assembling the
decimal digits that we get. We use word-sized division throughout, although some
divisions could be byte sized (we are simple minded in this respect). The method,
involving hex operations, is ideally suited for a binary computer. We take the
number to be in register AX in hexadecimal to begin with, and we want the 5 output
digits in DX:AX, with the most significant digit in DX and the rest of the digits in AX.
We use one register to store each digit, as we have adequate registers. CX is used
first for the divisor 0A and later for the different shift counts (required for
positioning the digits properly). Registers BX, SI and DI store three of the digits, while
the last two digits happen to be in DX and AX, where we finally need them.
No further registers are needed. The program and its execution are given below.
The program is simple and self-explanatory.
The style1.asm program
-u 0 30
13D6:0019 92 XCHG DX,AX
13D6:001A B90C00 MOV CX,000C
13D6:001D D3E0 SHL AX,CL
13D6:001F B90800 MOV CX,0008
13D6:0022 D3E7 SHL DI,CL
13D6:0024 B90400 MOV CX,0004
13D6:0027 D3E6 SHL SI,CL
13D6:0029 03C6 ADD AX,SI
13D6:002B 03C7 ADD AX,DI
13D6:002D 03C3 ADD AX,BX
13D6:002F CD01 INT 01
(ii) Testing of the program with data FFEF hex = 65519 decimal
-rax
AX 0000
:ffef
; initialize ax with the hex data FFEF
-r
; display initial register contents
AX=FFEF BX=0000 CX=0031 DX=1234 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0000 NV UP EI PL NZ NA PO NC
13D6:0000 2BD2 SUB DX,DX ;execute the 1st instn.
-t16
; trace 16 hex (that is 22 decimal) instructions.
AX=0041 BX=0009 CX=000A DX=0005 SP=0000 BP=0000 SI=0001 DI=0005
DS=13C6 ES=13C6 SS=13D6 CS=13D6 IP=0015 NV UP EI PL ZR NA PE NC
13D6:0015 2BD2 SUB DX,DX ;11th
-q
Review and Comments on the style of the above Program: As already stated,
the program is simple. The only attempt at economic register management is the
use of the single register CX, first to hold the divisor and later,
after the digit separation, to hold the shift counts. For the shift count, no other
register will do. For the divisor 0A, however, another register such as BP could have
been used, which shows that the demands of the program are much less than the register
resources available. The operations are repeated mindlessly as many times as
required, without any attempt to optimize: first, digit separation using word division,
and then positioning of the words for the final assembly. The data is handled throughout in
terms of words, while byte handling at places could have simplified the operation. The
style reminds me of the children’s story style, with repetitions of identical stuff many
times. It is tolerable perhaps in a beginner’s program.
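The arithmetic of style 1 can be checked with a short Python model. It is a sketch of the calculation only, not of the register traffic; the function name hex_to_bcd_style1 is introduced here for the illustration.

```python
def hex_to_bcd_style1(n):
    """Return (dx, ax) holding the five BCD digits of a 16-bit value n."""
    assert 0 <= n <= 0xFFFF
    digits = []
    for _ in range(4):           # four word divisions by 0A hex
        n, rem = divmod(n, 10)
        digits.append(rem)       # d0, d1, d2, d3 in turn
    d0, d1, d2, d3 = digits
    d4 = n                       # what is left is the top digit (0 to 6)
    # position each digit with a shift, then add everything together
    ax = (d3 << 12) + (d2 << 8) + (d1 << 4) + d0
    return d4, ax

dx, ax = hex_to_bcd_style1(0xFFEF)
print(f"DX={dx:04X} AX={ax:04X}")   # DX=0006 AX=5519
```

The test value FFEF hex is the one used in the debug session above, and the model gives 65519 as expected.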
codeseg segment
assume cs:codeseg
begin: mov cx,0ah ; decimal 10
sub dx,dx ; preparing for word division
div cx
mov bx,dx ; 000d0 -> bx
sub dx,dx ; for word division again
div cx ; 00 -> dh, 0d1 -> dl
div cl ; byte division now
xchg bh, ah ; 0d20d0 -> bx; 00 -> ah for next byte division
div cl ; 0d30d4 -> ax
xchg dl, al ; 000d4 -> dx; 0d30d1 -> ax
mov cl, 04
shl ax,cl ; d30d10 -> ax
add ax,bx ; d3d2d1d0 -> ax; dx already has 000d4
; hence the result is ready
int 01
codeseg ends
end begin
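The program above switches from word division to byte division once the number is small enough. Whether a byte division is safe depends on the quotient fitting in AL, which the following Python sketch checks explicitly. The helper div_byte and the function hex_to_bcd_style2 are names assumed for this illustration; they model the results of the instructions, not their encoding.

```python
def div_byte(ax, divisor):
    """Model of 8086 DIV r8: AX / r8 gives quotient in AL, remainder in AH."""
    q, r = divmod(ax, divisor)
    if q > 0xFF:
        raise OverflowError("divide overflow: quotient does not fit in AL")
    return (r << 8) | q

def hex_to_bcd_style2(n):
    n, d0 = divmod(n, 10)        # word division 1
    n, d1 = divmod(n, 10)        # word division 2; n is at most 655 now
    ax = div_byte(n, 10)         # AL = n // 10 (at most 65), AH = d2
    d2, al = ax >> 8, ax & 0xFF
    ax = div_byte(al, 10)        # AH was cleared first, so AX = AL here
    d3, d4 = ax >> 8, ax & 0xFF
    return d4, (d3 << 12) | (d2 << 8) | (d1 << 4) | d0

dx, ax = hex_to_bcd_style2(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")   # DX=0004 AX=3981
```

Running the model over every 16-bit input never raises the overflow error, which is the condition that lets the last two divisions be byte sized.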
-u 0 1b
13D5:0011 F6F1 DIV CL
13D5:0013 86D0 XCHG DL,AL
13D5:0015 B104 MOV CL,04
13D5:0017 D3E0 SHL AX,CL
13D5:0019 03C3 ADD AX,BX
13D5:001B CD01 INT 01
(ii) Program executed for the data ABCD hex = 43981 decimal
-rax
AX 0000
:abcd
-r
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0017 NV UP EI PL ZR NA PE NC
13D5:0017 D3E0 SHL AX,CL
Review and Comments on Style 2: In the program of Style 2, we see that even
though the same algorithm is used, the resource management has been very finely tuned
to the problem at hand. In this program it is perhaps not possible to alter a single
instruction (altering the rest of the program, if necessary, to keep the result correct)
without reducing the efficiency of the program. It is something like a good piece of
poetry. Good poetry (like the Elegy Written in a Country Churchyard, by Thomas Gray),
they say, is such that a single word in the writing cannot be replaced by an alternative
word without somehow degrading the quality of the writing. I therefore call this style
the poetry style, and it is this style which good programmers normally try to develop.
A large number of variations are possible in the programming domain between the
simple style 1 and style 2, to suit the taste and capabilities of any programmer. As one
approaches the reasonably perfect program, it becomes more and more difficult to
improve on the program, till at last one comes to a point where one thinks further
improvement is not worth the trouble. That will be the style 2 program. Below I give a
program which is only partially optimized, with a style between styles 1 and 2. The
program is given without comments and without a demonstration of its working, for
your study.
A Program with a style between Style 1 and Style 2 for hex-to-BCD conversion.
end strt
Programming Style 3, extracting Full Power from the Instruction Set: This is
a very complex style of programming wherein one tries to exploit, as much as possible,
the raw power of the processor instructions and capabilities. Properly exploited, this
method would provide the best possible program for a given job. It may require a
little thinking in what is sometimes called ‘out of the box’ fashion. It is not worth
spending too much time on this, as it is a creative type of activity in which there is no
guarantee of a solution. If you get it, you get it; else, you don’t, so leave it at that.
In our chosen example, we still follow the same method of digit separation and
positioning, but we do it in a slightly more efficient fashion. The following is the program:
Style3.asm
code_here segment
assume cs:code_here
star: mov cx, 100 ; 100 decimal = 64 hex
sub dx, dx ; prepare for word division
div cx ; 00 -> DH and hex eq. of d1d0 BCD -> DL
div cl ; 0d4 -> AL and hex eq. of d3d2 BCD -> AH
xchg dl,al ; 000d4 -> DX and hex of d1d0 -> AL; (hex of d3d2 BCD in AH)
mov bl, ah ; hex of d3d2 -> BL
aam ; 0d10d0 -> AX
xchg bx,ax ; 0d10d0 -> BX; hex of d3d2 -> AL
aam ; 0d30d2 -> AX
xchg al,bh ; 0d30d1 -> AX; 0d20d0 -> BX
rol ax, cl ; watch this! (CL=64 hex; but effective rotation is only 4).
; d30d10 -> AX
add ax, bx ; d3d2d1d0 -> AX, DX already has 000d4; all set to finish
int 01 ; finish
code_here ends
end star
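Two ideas carry the style 3 program: dividing by 100 so that AAM can split each two-digit group, and reusing the 100 left in CL as a rotate count, since rotating a 16-bit register 100 times has the same effect as rotating it 4 times. The Python sketch below models both; aam, rol16 and hex_to_bcd_style3 are names assumed for the illustration.

```python
def aam(al):
    """Model of AAM: AH = AL // 10, AL = AL % 10; returns the new AX."""
    return ((al // 10) << 8) | (al % 10)

def rol16(value, count):
    """Rotate a 16-bit value left; only count mod 16 affects the result."""
    count %= 16
    return ((value << count) | (value >> (16 - count))) & 0xFFFF

def hex_to_bcd_style3(n):
    hi, lo = divmod(n, 100)      # word DIV by 100: lo = binary value of d1d0
    d4, mid = divmod(hi, 100)    # byte DIV by 100: mid = binary value of d3d2
    bx = aam(lo)                 # 0 d1 0 d0
    ax = aam(mid)                # 0 d3 0 d2
    # XCHG AL,BH: AX becomes 0 d3 0 d1 and BX becomes 0 d2 0 d0
    ax, bx = (ax & 0xFF00) | (bx >> 8), ((ax & 0xFF) << 8) | (bx & 0xFF)
    ax = rol16(ax, 100)          # CL still holds 100, and 100 mod 16 = 4
    return d4, ax + bx

dx, ax = hex_to_bcd_style3(0xAFBE)
print(f"DX={dx:04X} AX={ax:04X}")   # DX=0004 AX=4990
```

The value AFBE hex (44990 decimal) is the one entered in the debug session that follows the listing.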
-u 0 18
13D5:0018 CD01 INT 01
-rax
AX 0000
:afbe
-r
AX=4990 BX=0900 CX=0064 DX=0004 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0018 NV UP EI PL NZ NA PE NC
13D5:0018 CD01 INT 01
-q
Review and Comments on Style 3: As already pointed out, style 3 may not always be
available. But when available and properly applied, it can give the best results,
in the form of a very efficient program. It may also be a little difficult to comprehend,
and may require extensive commenting in the assembly program at every instruction
used. I would call this a power method or a creative method, requiring an extensive and
thorough knowledge of the instruction set. Normal programmers will do well not to
bother too much about this type of programming. Such a program is also difficult to
maintain and to modify if required.
Programming Style 4: This style makes use of an algorithm which is not suitable
for the processor on hand and is presented here as a style to be avoided. In terms of
literary activities, this style would correspond to using a method of presentation which
does not befit the theme presented, like trying to write a big novel on material suitable
only for a short story. Only consummate artists may perhaps do it successfully.
Many processors are not fully geared to handle certain specific types of jobs.
The Intel 8086, for example, is not very efficient at handling computations in decimal. If
we make the 4-digit hex to 5-digit BCD conversion by decimal computations in this
processor, we have to use very circuitous methods. All that the 8086 can do in
respect of multi-digit decimal handling is 2-digit decimal addition and subtraction.
The hex-to-BCD conversion can still be done, and here is a way of doing it. But I repeat,
the method becomes quite complex and wasteful of resources, and it is recommended
that it be avoided.
In the program given, the most significant digit is still computed using subtraction
for simplicity, and the remaining 4 digits, whose hex value can at most be 270F (= 9999
decimal), are found by decimal computation (see the discussion at the beginning of this
chapter). The method consists of finding the place value of each bit in decimal and
adding it, as a decimal number, to the result if the corresponding bit is set in
the hex number to be converted. To give a small example, if we want to calculate the
decimal value of 10111 binary, we calculate the weight of each bit in decimal: b4 = 16,
b3 = 8, b2 = 4, b1 = 2 and b0 = 1. In the given number, b3 is absent, so the decimal
value of the number is (16 + 4 + 2 + 1), with the addition done in the decimal system,
and it works out to 23 decimal. The program is given below,
with a test demo. Remember, we are not even using Horner’s rule here.
Program Style4.asm
co segment
assume cs:co
star: mov si, -1 ; in si we want to get the digit d4 using successive
; subtraction of 10000 (dec) from the given number
mov bx, 10000
back: inc si
sub ax, bx
jnc back
add ax, bx; the m s digit d4 is now in SI; remaining 14-bit no. in ax
mov cx, 14; loop count
mov di, ax; the remaining 14-bit number to DI for bit checking
sub bx, bx; bx is where we are adding the decimal numbers
; which will give us the final result (along with SI)
mov dx, 1 ; in dx we have the weight of the current bit in decimal.
jmp down
loopst: mov al, dl
add al,al
daa
mov dl, al
mov al, dh
adc al,al
daa
mov dh, al ; decimal doubling of DX contents
down: shr di,1 ;
jnc loopend
mov al,bl
add al,dl
daa
mov bl, al
mov al, bh
adc al, dh
daa
mov bh, al ; decimal adding of DX to BX
loopend: or di, di ; check for any data in di
loopnz loopst; loop termination if di = 0, or count = 14.
mov ax, bx
mov dx, si
int 01
co ends
end star
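The loop of the program keeps two four-digit packed BCD quantities, the running result and the decimal weight of the current bit, and both are updated with byte-wide additions followed by DAA. A Python model of that bookkeeping is sketched below. The function names are assumptions made for this illustration, and daa_add models only the outcome of ADD or ADC followed by DAA for valid packed BCD operands.

```python
def daa_add(a, b, carry_in=0):
    """Add two packed BCD bytes as ADD/ADC followed by DAA would.
    Returns (bcd_byte, carry_out). Both inputs must be valid packed BCD."""
    lo = (a & 0x0F) + (b & 0x0F) + carry_in
    hi = (a >> 4) + (b >> 4)
    if lo > 9:
        lo -= 10
        hi += 1
    carry = 0
    if hi > 9:
        hi -= 10
        carry = 1
    return (hi << 4) | lo, carry

def bcd_add16(x, y):
    """Four-digit packed BCD addition built from two byte-wide steps."""
    low, c = daa_add(x & 0xFF, y & 0xFF)
    high, _ = daa_add(x >> 8, y >> 8, c)
    return (high << 8) | low

def hex_to_bcd_style4(n):
    d4 = 0
    while n >= 10000:             # top digit by repeated subtraction
        n -= 10000
        d4 += 1
    result, weight = 0, 0x0001    # both are packed BCD
    while n:                      # at most 14 passes, since n < 10000
        if n & 1:
            result = bcd_add16(result, weight)
        n >>= 1
        if n:
            weight = bcd_add16(weight, weight)   # decimal doubling
    return d4, result

dx, ax = hex_to_bcd_style4(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")   # DX=0004 AX=3981
```

Even in this compact form, most of the code is the two-byte decimal addition, which mirrors what happens in the assembly loop.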
-u 0 42
13D5:0030 27 DAA
13D5:0031 8AD8 MOV BL,AL
13D5:0033 8AC7 MOV AL,BH
13D5:0035 12C6 ADC AL,DH
13D5:0037 27 DAA
13D5:0038 8AF8 MOV BH,AL
13D5:003A 0BFF OR DI,DI
13D5:003C E0DC LOOPNZ 001A
13D5:003E 8BC3 MOV AX,BX
13D5:0040 8BD6 MOV DX,SI
13D5:0042 CD01 INT 01
-rax
AX 0000
:abcd
-r
AX=369D BX=2710 CX=0044 DX=0000 SP=0000 BP=0000 SI=0002 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0006 NV UP EI PL NZ NA PO NC
13D5:0006 46 INC SI
AX=0F01 BX=0000 CX=000E DX=0001 SP=0000 BP=0000 SI=0004 DI=07C6
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0030 NV UP EI PL NZ NA PO NC
13D5:0030 27 DAA
Review and Comments on Style 4: The method used in this program involves
separating the binary digits and summing their place values in decimal terms.
We are using 4-digit BCD summation, while the processor provides only 2-digit
summation, and that too using two instructions. One of the operands has to be in AL for
the 2-digit decimal add to be successful. A good part of the program, practically the
entire loop, is devoted to these double-byte decimal additions. This is
the price paid for using an algorithm whose computational techniques are not well
supported by the processor. In this case the price is paid in execution time and in the
memory space required for the program, along with increased usage of register resources.
Notice that the loop has a lot of instructions and that all of them are normally executed
14 times, which weighs heavily on the execution time of the program. This program
could be simplified using Horner’s Rule for polynomial evaluation, as shown in the ALP
below. The program is neither explained nor commented; these are left to the reader as
exercises.
co segment
assume cs:co
sr: mov cx,10000
mov dx,-1
back: inc dx
sub ax,cx
jnc back
add ax,cx
shl ax,1
shl ax,1
mov bx,ax
mov cx,14
sub ax,ax
star: shl bx,1
adc al,al
daa
xchg al,ah
adc al,al
daa
xchg al,ah
loop star
int 1
co ends
end sr
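Without giving away the commenting exercise, the arithmetic idea of Horner's rule can be shown in Python: the accumulator is doubled in decimal and the next bit of the number, taken from the most significant end, is added in. The names bcd_double_plus_bit and hex_to_bcd_horner are assumptions of this sketch; the assembly program works a byte at a time with ADC and DAA, whereas the model below works a digit at a time, which gives the same result.

```python
def bcd_double_plus_bit(acc, bit):
    """Return 2*acc + bit, with acc held as four packed BCD digits."""
    carry, out = bit, 0
    for shift in (0, 4, 8, 12):          # digit by digit, lowest first
        d = 2 * ((acc >> shift) & 0xF) + carry
        carry, d = divmod(d, 10)
        out |= d << shift
    return out

def hex_to_bcd_horner(n):
    d4, rest = divmod(n, 10000)          # the program uses repeated subtraction
    acc = 0
    for i in range(13, -1, -1):          # 14 bits, most significant first
        acc = bcd_double_plus_bit(acc, (rest >> i) & 1)
    return d4, acc

dx, ax = hex_to_bcd_horner(0xABCD)
print(f"DX={dx:04X} AX={ax:04X}")   # DX=0004 AX=3981
```

Only one BCD quantity is maintained here, against the two (result and weight) of the earlier style 4 program.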
; macro over, we will now use it in the program
start: mov si,0ah ; divisor 10
sub bx, bx ; initialize bx to zero.
sep 00 ; macro used with the shift count 00 to be loaded in CL
sep 04 ; shift count 4 for the next digit and so on.
sep 08
sep 12
mov dx, ax ; the m.s.digit to dx.
mov ax, bx ; move the assembled 4 digits to ax.
int 01 ; terminate
code ends
end start
; Apart from the macro definition and the macro use, there are only four
; other instructions in the program.
; The assembled program is shown below from the debug. See the expanded macros!
-u 0 31
Review and Comments on Style 5: If style 1 made the program something like
a children’s story, style 5 makes it even simpler; it reduces the program almost to
child’s play! If the problem permits, by having operations that repeat several times, this
style makes it very simple to visualize the operations and write the program.
However, time and memory may not be optimized to the extent possible. In the
interest of repetition, we have to use word division throughout, and byte operations
cannot be used, so the time and memory economy they would bring is lost. Further, in
the above program, the first use of the macro does unnecessary shift and add operations.
The first use of the macro could be replaced by:
SUB DX, DX
DIV SI
MOV BX, DX
If that were done, the 2nd instruction of our program, namely SUB BX, BX, would become
superfluous and could be omitted without harm. This sort of program is easy to write,
and the assembled program gives us a starting style 1 program to be taken towards
style 2.
In the foregoing, we have seen a few programming styles. Style 3 is the best from
the memory and time efficiency points of view, but is difficult to venture into for those
with average ability in the use of the instruction set. Styles 1 or 5 may form our
starting framework, to be kept at the back of our mind; using the
ideas of these styles, we could attempt, on the fly, actual program writing in style 2,
which is perhaps the normal assembly language programmer’s goal. In the initial analysis
we have to find the algorithm that best suits the given problem and the processor system
we have, so as to avoid programming in style 4. Style 3 may actually be left out, as the
gain from it will be marginal and not worth the effort to be put in or
the depth and breadth of system knowledge required. A
program written in this style is not good from the maintainability point of view either:
any modification or alteration will be quite difficult, since a
change at one point may produce unanticipated side effects elsewhere in the program,
which may turn out to be very hard to catch and correct. Optimality and maintainability
are often conflicting requirements, and assembly language programmers are
advised not to use style 3 programs, but to settle for style 2 or style 5, with moderate
optimization and with adequate comments indicating the logic of the processes. We
don’t have experts available all the time to handle modifications or alterations to the
program when required. A program must be understandable, not only to the original
programmer, but also to any programmer with average expertise, at any time, for
it to be maintainable.
Although style 3 programs are not to be used for commercial purposes, they are perhaps
the best from the point of view of learning the art of programming, developing expertise
and getting a deeper knowledge of the instruction set and the processor.
Exercises:
1. Using the ideas presented in this chapter, write a program to convert a 4-digit BCD
number in the AX register to 4-digit hex, with the output also in the AX register.
2. Study the 6-digit hex to 8-digit BCD conversion program bincvt given last in Fig
10.35 of the Microprocessor book by Douglas Hall (2nd edition or 2nd revised
edition, TMH Publications) and identify the programming style used. Could
you think of suitable style 1 or style 2 programs in this context?
3. Given below is a Style 3 program, without comments, for converting 4-digit
BCD in AX to its equivalent hex. The output is in register AX itself. The
program uses the BX and CX registers. Figure out the logic of the program and fill
in the comments.
co segment
assume cs:co
strt: mov bx,ax
and ax,0f0f0H
mov cl,2
shr ax,cl
sub bx,ax
shr ax,1
sub bx,ax
mov ah,bh
sub al,al
shr ax,1
sub bx,ax
shr ax,cl
sub bx,ax
inc cl
shr ax,cl
add ax,bx
int 01
co ends
end strt
4. Here is another optimized program for doing the 4-digit BCD to 4-digit hex
conversion, also given without comments. Test the program and reason out how
it works. The program enters with the BCD number in AX (uses just the two
registers CX and DX) and returns the hex result also in AX.
code segment
assume cs: code
strt: mov dx, ax
mov cx, 0a04h
and ax, 0f0f0h
sub dx, ax
rol ax, cl
mov cl, dl
mov dl, ah
mul ch
add al, dh
mov dh, ah
mul ch
add ax, dx
mov dl, ch
mul dx
mov ch, dh
add ax, cx
int 1
code ends
end strt
Compare the two programs given in exercises 3 and 4 above. Both are
perhaps style 3 programs. Determine which is the worse of the two.
You may note that the program of exercise 4 uses Horner’s rule. Observe the way in
which the registers are managed and the whole process is optimized in this program.
Copyright © 2008 K M Hebbar
4. MACROS AND SUBROUTINES
Macros and Subroutines normally appear to do a similar type of job, namely,
avoiding writing the same string of instructions several times in a program. However,
there are quite a lot of differences between the two. We shall be looking into these
differences and then learning about the proper use of Macros and Subroutines (or
Procedures) in this chapter.
end strt
Macros:
N a m e Lines
DDD . . . . . . . . . . . . . . 5
Symbols:
15 Source Lines
25 Total Lines
10 Symbols
0 Warning Errors
0 Severe Errors
-u 0 10
co segment
assume cs:co
mox macro reg, n
mov reg,n ; macro defined here, seen only by the assembler
endm
begin: mov bx,20 ; the program starts from here with a normal instruction
mox ax, cs:[bx] ; note how the segment override is used
mox cl, 04 ; note how it is applicable to 8-bit regs also.
int 01
co ends
end begin
Note: direct instruction ‘mov ax, cs:[bx]’ will not be valid, and ‘cs: mox
ax, [bx]’ will also be not valid.
Note the reverse order of registers in the pop operation, so that the register
sequence can be given identically as parameters in both the pushreg and
popreg macros. Also note that this type of operation cannot be done using
subroutines, because it would unbalance the stack.
4. The parameters of the macro are more flexible than those of the
subroutines: In assembly language, the parameters of a subroutine
are passed in registers or through the stack. This makes the
parameters of a fixed size. The option of using either word or byte
size parameters is not normally available for subroutines, whereas, for
macros, any size that makes sense in an instruction is valid. In
the program given under Para 2 above, we see the macro
mox invoked at two places. In the first instance, the reg parameter is the
register AX, and the n parameter is an indirect address through register BX
with a CS segment override, that is, the data in memory at address
CS:BX. In the next invocation, the parameter reg is the 8-bit
register CL, while the parameter n is just the simple number 04. Such
wide flexibility is unthinkable in procedures. In section 7 of this chapter
and also in chapter 6 we will see that processor opcodes can also be used as
macro parameters, which means that a single macro can execute
different operations depending on the opcode parameter used at its
invocation.
6. Macros exist only in the ALP and not at the Machine Language level:
Having said all the 5 points above, we have to note a fundamental
difference between macros and subroutines. Macros exist only at the
assembly language level, while the subroutines are seen at the machine
language level also. This means there are hardware provisions for
handling the subroutines by way of storing the return address in the stack,
while at the machine language level there are no macros visible. Macros
are only short cuts at the assembly language level and are handled by the
assembler software, but do not appear as separate entities in the
executable machine language programs.
Normally, for small operations, it is common practice to write macros, while large
and complex operations repeated several times are handled through subroutines, as a
result of the point 6 indicated above. An example of a good macro for improving the
DIV instruction is given below:
Macro Smart-div: The divide instruction is rather restrictive, as shown below.
It takes a double size dividend (double word for word division, or double
byte for byte division) and a single size divisor, and produces a single size quotient and
a single size remainder. It is not always possible to limit the quotient to single size
when dividing a double size dividend by a single size divisor. Whenever the quotient
exceeds single size, the processor does not carry out the division, but simply indicates
the divide overflow by producing an internal interrupt. The programmer can
take care of this, as we did in the case of style 2 and elsewhere in Chapter
3. But that may not always be possible, since the size of the quotient may
not be known beforehand. In such cases, to ensure that the program does not get caught at
this point, it is possible to think of a macro which carries out the division properly,
producing a double size quotient instead of a single size one. The macro given here follows
almost the same register allocation for the division inputs, that is, DX:AX for the dividend
of word division, or only AX for the dividend of byte division. The same registers carry
the quotient after the division. The divisor is specified as a parameter of the macro. An
additional single size parameter, also specified in the macro invocation, receives the
remainder. The macro is defined below with examples of its use.
THE PROGRAM
code segment
assume cs:code
smdiv macro d1,d2, dv, rr;; d1d2:double size dividend, dv : divisor,
;; rr: remainder
local down
sub rr,rr ;; clear remainder register
cmp d1,dv
jb down ;; if below, only one div is enough, so go down
xchg rr, d1 ;; else, save d1 in rr and load 00 in d1
xchg rr, d2 ;; net result of the 2 instns: 0 -> d1, d1 -> d2, d2 -> rr
div dv ;; first division, rem -> d1, quot. -> d2
xchg rr, d2 ;; m.s.quotient -> rr, l.s. part of dividend -> d2
down: div dv
xchg rr, d1
endm
; examples of smart divide macro use
start: mov dx, 0abch
mov ax, 1234h
mov bx, 45abh
smdiv dx,ax,bx,cx
mov bx, 0dabch ; a hex constant must begin with a digit in MASM
mov ax, 2345h
mov cl, 35
smdiv ah,al, cl, bl
int 01
code ends
end start
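The logic of the smdiv macro is long division in two steps: if the upper half of the dividend is not below the divisor, it is divided first on its own, and the remainder of that step becomes the upper half for the second division. A Python model is sketched below; the function name smart_div and the bits parameter (16 for word division, 8 for byte division) are assumptions of this illustration.

```python
def smart_div(d1, d2, dv, bits=16):
    """Divide the double-size value d1:d2 by dv.
    Returns (q_high, q_low, remainder), each `bits` wide."""
    mask = (1 << bits) - 1
    q_high = 0
    if d1 >= dv:                       # a single DIV would overflow here
        q_high, d1 = divmod(d1, dv)    # first DIV: 0:d1 / dv
    q_low, rem = divmod((d1 << bits) | d2, dv)   # second DIV: rem:d2 / dv
    assert q_low <= mask               # guaranteed, because d1 < dv at this point
    return q_high, q_low, rem

# the byte example of the program: AX = 2345h divided by 35 decimal
print(smart_div(0x23, 0x45, 35, bits=8))   # (1, 1, 34)
```

In the byte example the true quotient is 0101 hex, which would not fit in AL, so a plain DIV CL would have caused a divide overflow.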
DEBUG OPERATIONS
-u 0 33
13D5:0023 3AE1 CMP AH,CL
13D5:0025 7208 JB 002F
13D5:0027 86DC XCHG BL,AH
13D5:0029 86D8 XCHG BL,AL ; 2nd expansion of smdiv
13D5:002B F6F1 DIV CL
13D5:002D 86D8 XCHG BL,AL
13D5:002F F6F1 DIV CL
13D5:0031 86DC XCHG BL,AH
13D5:0033 CD01 INT 01
-r
AX=2345 BX=DA00 CX=4423 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0023 NV UP EI PL ZR NA PE NC
13D5:0023 3AE1 CMP AH,CL
mov ah,ch
lbl: pop cx
endm
start: mov ax, 5632h
mov cl, 05
smmul cl
smmul cl
int 01
code ends
end start
DEBUGGING
-u 0 19
13D5:000F 51 PUSH CX
13D5:0010 50 PUSH AX
13D5:0011 F6E1 MUL CL
13D5:0013 59 POP CX
13D5:0014 7202 JB 0018
13D5:0016 8AE5 MOV AH,CH
13D5:0018 59 POP CX
13D5:0019 CD01 INT 01
-r
-g f
A macro to take in 4 BCD digits input from the keyboard, using the DOS
interrupt 21h, function 1: Many times we need to take a 4-digit BCD number
from the keyboard. Here is a simple macro to do the job. It takes just 4 BCD
digits from the keyboard, ignoring the non-BCD keys. It can be improved by
making the “$” key a terminating key, so that numbers of fewer than 4
digits can be entered, and so that, if we go wrong in making the entry, we can
re-enter all 4 keys to get the correct number.
bcd4 macro reg ; register in which the number is to be returned.
local again
xor bx, bx
mov cx, 0404h
mov ah, 1
again: int 21h
cmp al, 30h
jb again
cmp al, 39h
ja again
sub al, 30h
shl bx, cl
add bl, al
dec ch
jnz again
mov reg, bx
endm
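The filtering and packing done by the macro can be followed in a small Python model, in which the string argument stands in for the sequence of characters that DOS function 1 would return, one per call. The function name bcd4 is reused here purely for the illustration.

```python
def bcd4(keys):
    """Consume characters from `keys` until four decimal digits are seen,
    and return them packed as four BCD digits."""
    bx = 0
    remaining = 4
    for ch in keys:
        if not ("0" <= ch <= "9"):
            continue                   # non-digit keys are ignored
        # SHL BX,CL with CL = 4, then the new digit goes into the low nibble
        bx = ((bx << 4) & 0xFFFF) | (ord(ch) - 0x30)
        remaining -= 1
        if remaining == 0:
            return bx
    raise ValueError("fewer than four digits were entered")

print(f"{bcd4('1a2-3x4'):04X}")   # 1234
```

Unlike this model, the macro never gives up; it simply keeps waiting at the keyboard until the fourth digit arrives.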
bak: lodsw ;;
as ax, [si + bx] ;; note the manipulations here
stosw
loop bak
as cx, cx ;; note, cx=0 here
mov [di], cx
endm
;
strt: mov ax, data
mov ds, ax
mov es, ax
multas dat, daat, dat1, num, adc
multas dat, daat, dat2, num, sbb
int 1
code ends
end strt
-u 0 3b
-g
AX=B09D BX=000E CX=FFFF DX=0000 SP=0000 BP=0000 SI=000A DI=003A
DS=13DC ES=13DC SS=13DC CS=13E1 IP=003B NV UP EI NG NZ AC PE CY
13E1:003B 83EC08 SUB SP,+08
-d0 4f
13DC:0000 34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00 4.xV...v........
13DC:0010 CD AB 48 23 53 02 89 45-23 FB 00 00 00 00 00 00 ..H#S..E#.......
13DC:0020 01 AE C0 79 FE 8B 8D BB-E3 A6 01 00 00 00 00 00 ...y............
13DC:0030 67 56 2F 33 58 87 7B 30-9D B0 FF FF 00 00 00 00 gV/3X.{0........
13DC:0040 05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
8. Here is a beautiful example of the use of Macros for realizing hardware
functions and using them in an easily understandable way: The example below
uses a matrix keyboard as shown in the circuit, with a flow chart for a simple key
identification subroutine and a program following that flow chart. It should be
understood that different methods are possible for interpreting the keys, some with only
one key press permitted at a time, and some with multiple keys permitted at a time.
The program given is for the simple interpretation of one key pressed at a time.
A Program to identify the key pressed to work on the above hardware, and
following the flowchart given: Note how the use of macros simplifies the programming.
; this subroutine keyid below, does the following:1. wait for clearing of all
; previous keys; 2. wait for a new key press. 3. Give a de bounce delay. 4. if
; the key is still remaining pressed, then identify and return with the key
; number in reg AH. The procedure assumes a hardware circuit as shown in the
; figure above, and a flow chart, also as shown above.
;
assume cs:code
code segment
anykey macro
mov al, 00 ;; all rows to be 0's
out dx, al ;; dx has address of port A of 8255
add dx,2
in al, dx ;; column in, through port B
sub dx,2
and al, 0Fh
cmp al, 0Fh
endm
;;
rows macro patn, num
mov ah, num
mov al, patn
out dx, al
add dx,2
in al, dx
sub dx,2
and al, 0Fh
cmp al, 0Fh
jnz colchk ;;key pressed in the row, so check columns
endm
;;
cols macro ;;find the column of the pressed key, and return.
local back
mov cx, 4
back: ror al,1
jnc found
dec ah
loop back
jmp err
endm
;;
;; debounce macro is a standard delay macro using cx as counter
;; (20n – 8) clocks of delay will be produced normally by this macro
debounce macro n
local lup
mov cx, n
lup: nop
loop lup
endm
strt:call keyfind
int 01
;
keyfind proc near
start: anykey
jnz start ; some key pressed, so wait for its release, else
; all previous keys are cleared, now look for fresh key.
again: anykey
jz again ; if no, try again. Else debounce
debounce 5000 ; will produce about 20msec delay on a 5 MHz clock.
rows 0Eh, 3
rows 0Dh, 7
rows 0Bh, 0Bh
rows 07, 0Fh
err: stc ; no rows have key pressed, indicate error thro’ CY flag.
found: ret
colchk: cols ; find the column having the key pressed & return
keyfind endp
code ends
end strt
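The key numbering that results from the rows and cols macros can be traced with a Python model of the scan. The wiring is an assumption of this sketch, since the circuit figure is not reproduced here: bit r of the row pattern drives row r, bit c of the column input reads column c, and a column reads 0 when a pressed key ties it to a row that is driven low. The function read_columns is a hypothetical stand-in for the two port accesses.

```python
# (row pattern, starting key number) exactly as passed to the rows macro
ROWS = [(0x0E, 3), (0x0D, 7), (0x0B, 0x0B), (0x07, 0x0F)]

def read_columns(row_pattern, pressed):
    """Hypothetical model of OUT to the row port followed by IN from the
    column port. `pressed` is a set of (row, col) pairs."""
    cols = 0x0F
    for row, col in pressed:
        if not (row_pattern >> row) & 1:     # this row is driven low
            cols &= ~(1 << col)
    return cols & 0x0F

def key_id(pressed):
    """Return the key number, or None where the routine would set CY."""
    for pattern, num in ROWS:
        cols = read_columns(pattern, pressed)
        if cols != 0x0F:                     # some key in this row
            for _ in range(4):
                if not cols & 1:             # ROR AL,1 followed by JNC found
                    return num
                cols >>= 1
                num -= 1                     # DEC AH
            return None                      # JMP err
    return None                              # no row had a key pressed

print(key_id({(0, 0)}), key_id({(2, 1)}), key_id({(3, 0)}))   # 3 10 15
```

Under this assumed wiring, the key at row r and column c is reported as 4r + 3 - c.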
The List file for the above program as obtained from the assembler MASM: note
how the macros are expanded.
;KEYID routine
; this subroutine keyid below, does the following:1. wait for clearing of all
; previous keys; 2. wait for a new key press. 3. Give a debounce delay. 4. if
; the key is still remaining pressed, then identify and return with the key
; number in reg AH. The procedure assumes a hardware circuit as shown in the
; figure above, and a flow chart, also as shown above.
;
assume cs:code
0000 code segment
anykey macro
mov al, 00 ;; all rows to be 0's
out dx, al ;; dx has address of port A of 8255
add dx,2
in al, dx ;; column in, through port B
sub dx,2
and al, 0Fh
cmp al, 0Fh
endm
;;
rows macro patn, num
mov ah, num
mov al, patn
out dx, al
add dx,2
in al, dx
sub dx,2
and al, 0Fh
cmp al, 0Fh
jnz colchk;;key pressed in the row, so check
;;columns
endm
;;
cols macro
local back
mov cx, 4
back: ror al,1
jnc found
dec ah
loop back
jmp err
endm
;;
;; debounce macro is a standard delay macro using cx as counter
;; now the subroutine using these macros
;
debounce macro n
local lup
mov cx, n
lup: nop
loop lup
endm
0000 E8 0005 R strt:call keyfind
0003 CD 01 int 01
;
0005 keyfind proc near
0005 start: anykey
0005 B0 00 1 mov al, 00 ;
0007 EE 1 out dx, al ;
0008 83 C2 02 1 add dx,2
000B EC 1 in al, dx ;
000C 83 EA 02 1 sub dx,2
000F 24 0F 1 and al, 0Fh
0011 3C 0F 1 cmp al, 0Fh
0013 75 F0 jnz start
0015 again: anykey
0015 B0 00 1 mov al, 00 ;
0017 EE 1 out dx, al ;
0018 83 C2 02 1 add dx,2
001B EC 1 in al, dx ;
001C 83 EA 02 1 sub dx,2
001F 24 0F 1 and al, 0Fh
0021 3C 0F 1 cmp al, 0Fh
0023 74 F0 jz again
debounce 5000
0025 B9 1388 1 mov cx, 5000
0028 90 1 ??0000: nop
0029 E2 FD 1 loop ??0000
rows 0Eh, 3
002B B4 03 1 mov ah, 3
002D B0 0E 1 mov al, 0Eh
002F EE 1 out dx, al
0030 83 C2 02 1 add dx,2
0033 EC 1 in al, dx
0034 83 EA 02 1 sub dx,2
0037 24 0F 1 and al, 0Fh
0039 3C 0F 1 cmp al, 0Fh
003B 75 38 1 jnz colchk ;
rows 0Dh, 7
003D B4 07 1 mov ah, 7
003F B0 0D 1 mov al, 0Dh
0041 EE 1 out dx, al
0042 83 C2 02 1 add dx,2
0045 EC 1 in al, dx
0046 83 EA 02 1 sub dx,2
0049 24 0F 1 and al, 0Fh
004B 3C 0F 1 cmp al, 0Fh
004D 75 26 1 jnz colchk ;
rows 0Bh, 0Bh
004F B4 0B 1 mov ah, 0Bh
0051 B0 0B 1 mov al, 0Bh
0053 EE 1 out dx, al
0054 83 C2 02 1 add dx,2
0057 EC 1 in al, dx
0058 83 EA 02 1 sub dx,2
005B 24 0F 1 and al, 0Fh
005D 3C 0F 1 cmp al, 0Fh
005F 75 14 1 jnz colchk ;
rows 07, 0Fh
0061 B4 0F 1 mov ah, 0Fh
0063 B0 07 1 mov al, 07
0065 EE 1 out dx, al
0066 83 C2 02 1 add dx,2
0069 EC 1 in al, dx
006A 83 EA 02 1 sub dx,2
006D 24 0F 1 and al, 0Fh
006F 3C 0F 1 cmp al, 0Fh
0071 75 02 1 jnz colchk ;
0073 F9 err: stc
0074 C3 found: ret
0075 colchk: cols
0075 B9 0004 1 mov cx, 4
0078 D0 C8 1 ??0001: ror al,1
007A 73 F8 1 jnc found
007C FE CC 1 dec ah
007E E2 F8 1 loop ??0001
0080 EB F1 1 jmp err
0082 keyfind endp
0082 code ends
end strt
Macros:
N a m e Lines
ANYKEY . . . . . . . . . . . . . 7
COLS . . . . . . . . . . . . . . 6
DEBOUNCE . . . . . . . . . . . . 3
ROWS . . . . . . . . . . . . . . 9
Symbols:
KEYFIND . . . . . . . . . . . . N PROC 0005 CODE Length = 007D
74 Source Lines
133 Total Lines
19 Symbols
0 Warning Errors
0 Severe Errors
A note on the key debounce operation: The debounce delay ensures that a key being
released is not read as a series of repeated presses. When a key is released, the contacts
bounce: they open briefly and close again several times before finally breaking. Without
precautions, this appears as the same key being pressed a few times. The illusion is
avoided by checking the key again after a delay long enough for the vibrations to cease; a
key that is still detected after this delay is a genuine key press and not a bounce. In the
same way, a key being pressed also bounces and would appear as multiple presses of the
same key, which is avoided by accepting the key only after the debounce delay. Look at
the flowchart from this point of view.
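The effect of the delay can be tried out on paper with a small model. The sketch below is in Python rather than assembly, and it is only an illustration: the list of samples, the settle count of 3 and the function names are assumptions made here, not part of the keypad program. A reading of None stands for "no key".

```python
def raw_presses(samples):
    """Count every open-to-closed transition, as a scan without debounce would."""
    count, previous = 0, None
    for reading in samples:
        if reading is not None and previous is None:
            count += 1
        previous = reading
    return count


def debounced_presses(samples, settle):
    """Accept a change of state only if it still holds `settle` samples later."""
    accepted, held, i = [], None, 0
    while i < len(samples):
        if samples[i] != held:
            i += settle                      # the debounce delay
            if i < len(samples) and samples[i] != held:
                held = samples[i]            # the new state is genuine
                if held is not None:
                    accepted.append(held)
        i += 1
    return accepted


# One press of key '5' with contact bounce on both make and break (hypothetical data).
samples = ['5', None, '5', '5', '5', '5', None, '5', None, None, None, None]
print(raw_presses(samples))            # bounces counted as separate presses
print(debounced_presses(samples, 3))   # the single genuine press
```

The settle count plays the role of the debounce delay; in a real system it has to be chosen longer than the bounce time of the contacts used.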
dta dw 4567h, 0abcdh, 789ah, 1234h, 5678h, 89abh
rlt dw 6 dup (?)
data ends
; the first 2 words of dta are the dividend words, while the third
; word is the divisor. The 4th and 5th words form the dividend for the next
; trial and the sixth is the next divisor
; the first two words of rlt are for the first quotient and the next
; is the remainder. Similarly, fourth, fifth are quotient words of
; second division and the sixth is the remainder.
code segment
assume cs:code, ds:data, es:data
; we are using macros here to make the data loading and storing simpler
movdata macro; load data into registers
lodsw
mov bx,ax
lodsw
mov dx,ax
lodsw
xchg ax,bx
endm
strlt macro ; store result in memory
stosw
mov ax,dx
stosw
mov ax,cx
stosw
endm
start: mov ax, data ; main program starts from here
mov ds, ax
mov es, ax
mov si, offset dta
cld ; for string operations
lea di, rlt
movdata
call smart_div
strlt
movdata
call smart_div
strlt
int 01
smart_div proc near
; the procedure takes the dividend from DX:AX, and divisor from BX, returns
; the quotient in DX:AX and the remainder in CX, the divisor is returned
; unaltered. CX is used along with DX and AX. All other registers are
; returned unaltered. Note the procedure provides no flexibility in the use
; of registers. Moreover, this procedure is not useful for byte division.
sub cx, cx
cmp dx, bx
jb down
xchg ax, cx
xchg ax, dx
div bx
xchg ax, cx
down: div bx
xchg dx, cx
ret
;
smart_div endp
code ends
end start
TESTING IN DEBUG
-u 0 44
13D7:0000 B8D513 MOV AX,13D5
13D7:0003 8ED8 MOV DS,AX
13D7:0005 8EC0 MOV ES,AX
13D7:0007 BE0000 MOV SI,0000
13D7:000A FC CLD
13D7:000B 8D3E0C00 LEA DI,[000C]
13D7:000F AD LODSW
13D7:0010 8BD8 MOV BX,AX
13D7:0012 AD LODSW
13D7:0013 8BD0 MOV DX,AX ; load input
13D7:0015 AD LODSW
13D7:0016 93 XCHG BX,AX
13D7:0017 E81B00 CALL 0035
13D7:001A AB STOSW
13D7:001B 8BC2 MOV AX,DX
13D7:001D AB STOSW ; store result
13D7:001E 8BC1 MOV AX,CX
13D7:0020 AB STOSW
13D7:0021 AD LODSW
13D7:0022 8BD8 MOV BX,AX
13D7:0024 AD LODSW
13D7:0025 8BD0 MOV DX,AX ; load input
13D7:0027 AD LODSW
13D7:0028 93 XCHG BX,AX
13D7:0029 E80900 CALL 0035
13D7:002C AB STOSW
13D7:002D 8BC2 MOV AX,DX
13D7:002F AB STOSW ; store result
13D7:0030 8BC1 MOV AX,CX
13D7:0032 AB STOSW
13D7:0033 CD01 INT 01
13D7:0035 2BC9 SUB CX,CX
13D7:0037 3BD3 CMP DX,BX
13D7:0039 7205 JB 0040
13D7:003B 91 XCHG CX,AX
13D7:003C 92 XCHG DX,AX
13D7:003D F7F3 DIV BX
13D7:003F 91 XCHG CX,AX
13D7:0040 F7F3 DIV BX
13D7:0042 87D1 XCHG DX,CX
13D7:0044 C3 RET
-g 17
DS=13D5 ES=13D5 SS=13D5 CS=13D7 IP=0029 OV UP EI PL NZ NA PE NC
13D7:0029 E80900 CALL 0035 ; before call
-g 2c
If the smart divide procedure is compared with the corresponding smart divide
macro, the flexibility provided by the macro is easily recognized. With the macro we
could undertake byte or word division with a single macro, but the procedure indicated
above cannot be used for byte division; a separate procedure has to be written for
smart byte division. Moreover, with macros any register can hold the divisor, simply by
invoking the macro with that register as the parameter; procedures offer no such
flexibility. Only one register, BX in the above program, can hold the divisor and only
one register, CX, can be used to store the remainder.
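The two-step division done by smart_div can be cross-checked with a short model. The sketch below is in Python, not assembly; the function name and the tuple it returns are choices made for this illustration. It follows the same steps as the procedure: divide the high word first if it is not smaller than the divisor, then divide the remainder combined with the low word.

```python
def smart_div(dx, ax, bx):
    """Model of smart_div: DX:AX / BX -> (quotient high, quotient low, remainder)."""
    cx = 0                                   # sub cx, cx
    if dx >= bx:                             # cmp dx, bx / jb down not taken
        cx, dx = divmod(dx, bx)              # first div: high word by the divisor
    ax, remainder = divmod((dx << 16) | ax, bx)   # down: div bx
    return cx, ax, remainder                 # returned in DX, AX and CX


# The two data sets of the program: dividend low word, high word, then divisor.
for low, high, divisor in [(0x4567, 0xABCD, 0x789A), (0x1234, 0x5678, 0x89AB)]:
    q_high, q_low, rem = smart_div(high, low, divisor)
    print(f"{q_high:04X}:{q_low:04X} remainder {rem:04X}")
```

Because DX is always smaller than BX when the second division is done, its quotient always fits in 16 bits, which is the whole point of the smart divide.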
program is one which starts executing for one set of parameters and part way through, it
is called again to repeat the computation for another set of parameters. This sort of
requirement may come about as follows. Consider a floating point add subroutine which a
program is executing. In the middle of this execution, the processor is
interrupted by some system hardware. As per the interrupt handling operations, the
interrupt service routine now starts executing, putting the on-going floating point ADD
routine in a suspended condition. If the interrupt service routine also requires a
floating point ADD operation, it will call the same procedure, but with a different set of
parameters, as per the requirements of the interrupt service process. The
floating point ADD routine is now said to have been re-entered with different parameters.
On return from the interrupt service, the suspended floating point ADD routine should
resume operation from where it left off; that means its parameters should not be disturbed
by the re-entered procedure. Many system programs are required to be re-entrant.
Both recursion and re-entrance require different, non-overlapping locations for the
parameters every time the procedures are called. There are several ways of meeting
the above requirements of random access to the parameters and a non-overlapping
region of memory for every new invocation of the routines. It may sometimes be
possible to avoid overlapping of parameters by suitably arranging the recursive
equation, as the first example below shows; or, if required, the stack can be used for
temporarily storing the parameters, as shown in the next example.
The program below uses the recursive equation, n! = n*(n-1)!, with the
terminating condition defined for 1! = 0! = 1. The parameters passed are: the
value of n in BX, and the identity element for multiplication, namely 1, is
stored in register AX as a parameter to be passed to the subroutine.
code segment
assume cs:code
strt: mov ax, 1 ;
mov bx, 8 ; parameters to be passed in ax and bx; n = 8
call fa
int 1
fa proc near
cmp bx,1
jna return
mul bx ; multiplication is done first, so no need to store n
dec bx
call fa
return: ret
fa endp
code ends
end strt
-u 0 16
13DC:0003 BB0800 MOV BX,0008
13DC:0006 E80200 CALL 000B
13DC:0009 CD01 INT 01
13DC:000B 83FB01 CMP BX,+01
13DC:000E 7606 JBE 0016
13DC:0010 F7E3 MUL BX
13DC:0012 4B DEC BX
13DC:0013 E8F5FF CALL 000B
13DC:0016 C3 RET
-r
assume cs:code
code segment
strt:call fac
int 1
jmp strt
fac proc near
mov ax, 1
cmp ax, bx
jae return
push bx ;store n temporarily (till ‘ret’ from next ‘call’)
dec bx
call fac
pop bx ;retrieve the stored n for multiplication
mul bx ;multiply by n.
return: ret
fac endp
code ends
end strt
-u 0 16
13DB:000F 4B DEC BX
13DB:0010 E8F4FF CALL 0007
13DB:0013 5B POP BX
13DB:0014 F7E3 MUL BX
13DB:0016 C3 RET
-r
BX 0000
:8
-g
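The difference between the two factorial procedures can be seen in a small model. The sketch below is in Python, not assembly, and the function names are chosen only to mirror the labels fa and fac. Only the AX part of each product is kept, as in the procedures, so the results are correct up to n = 8.

```python
def fa(ax, bx):
    """First version: multiply first, then recurse, so n never needs saving."""
    if bx <= 1:                        # cmp bx, 1 / jna return
        return ax
    return fa((ax * bx) & 0xFFFF, bx - 1)    # mul bx / dec bx / call fa


def fac(bx):
    """Second version: n waits on the stack until the inner call returns."""
    if bx <= 1:                        # mov ax, 1 / cmp ax, bx / jae return
        return 1
    return (bx * fac(bx - 1)) & 0xFFFF       # push bx ... pop bx / mul bx


print(hex(fa(1, 8)), hex(fac(8)))      # both give 8! = 40320 = 9D80h
```

In the first version the pending work is carried along in AX, so nothing is left to be done after the inner call returns; in the second, each level still has a multiplication to do and therefore must keep its own n.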
We have seen above the passing of parameters through registers, with the stack
used to keep the parameters temporarily so that they are not overwritten from one call to
the next. However, the most common and versatile method that can be used in the 8086
processor for recursion is passing the parameters directly through the stack, instead of
using the stack to store parameters in the subroutines. This is the standard method used
by the C compiler, for example, for any general subroutine handling.
The stack has a limitation, of course. It does not allow the parameters to be
accessed randomly, as the operations of the program would require. To make random
access possible, a separate register other than the stack pointer is provided. This is the
BP or base pointer. Addressing through BP defaults to the stack segment, precisely so
that a parameter array can be kept in the stack for the subroutines. The main or calling
program pushes the parameters onto the stack; the subroutine accesses them randomly as
required, using the BP register. On return, the calling program can retrieve the results
through pop operations from the stack array.
We make a separate structure in the stack, called the stack frame, and put in it the
parameters to be passed, including the output desired from the subroutine. To do this,
we first provide space for the output variable by subtracting the required number of bytes
from the stack pointer. Then we push the input variables. Having done this in the main
program, we call the subroutine. In the subroutine, the first thing we do is to push BP
onto the stack and copy the SP value into BP. BP now becomes the frame pointer; the
space starting from the return address, down to and including the output space in the
stack, will be the stack frame. The frame could be further expanded by providing space
for the local variables of the subroutine, which may have to be referred to a number of
times. This space is provided by subtracting the appropriate number from the stack
pointer. Space above this in the stack is now available for use in the subroutine as a
regular stack. This separation of stack space into frame and stack gives a disciplined
approach to the parameter passing problem of subroutines.
According to this, the recursive subroutine for factorial will be as shown. Note
that the input and output parameters in the stack frame are referred to in the subroutine
by indexed addressing using BP with positive displacement, while the local variables are
referred to with negative displacement. Stack beyond the frame is available to the
subroutine to be used like an ordinary stack with the LIFO operation. While returning,
the procedure simply moves BP to SP, pops BP to restore the old BP, and then executes
ret n, where n is the number of input bytes to be discarded from the stack. (In the
program shown, ret alone is used instead of ret n, and it is followed by add SP, 2 in the
calling program, which is another way of doing it.) Now the output of the subroutine can
be simply popped off the stack in the main program. In effect, the parameters are pushed
in the calling program and recalled using BP relative addressing in the called program.
On return, the results can be popped off in the main program. Based on this philosophy,
the recursive factorial program can be seen to be as follows:
Details of Passing Parameters to a Subroutine
Using Stack Arrays or Stack Frames.
Code segment
Assume cs:code
Main: Mov AX, n ; (Choose n in the range 0 to 8 only.)
Sub SP,2 ; make space for one word output (Factorial value)
Push AX ; input parameter to the stack.
Call Fact ; call the recursive routine.
Add SP,2 ; clear the stack of the input
; to undo the Push AX above
Pop AX ; get the output result in reg. AX.
Int 01 ; return control to DEBUG (single-step interrupt).
Fact proc near ; the recursive procedure here.
Table 4.1: Stack Frame after the 2nd Instruction of the Procedure

Memory Address Pointer    Memory Contents
----------------------    ---------------------------------------------------
New BP = SP               Old BP of calling program
New BP + 2                Return address of calling program (for a near call)
New BP + 4                Input to the called subroutine, n
New BP + 6                Space for output value of Fact
It may be noted here, that there is no local variable required for this subroutine,
so, there is nothing in the stack frame above the old BP. In case there are local variables
stored, above the old BP in the stack frame, the instruction mov SP,BP in the termination
part of the procedure, would clear the stack of those variables. Hence as a general rule it
is safe to use that instruction during the termination process. The space above is usable
in the routine as a normal stack, for saving registers, and for further call of nested
routines, etc. The stack frame as shown in this example, remains as compact as possible,
and for determining the offsets for variable parameters we need to consider only one
frame, without bothering about the nesting frames.
Table 4.2 explains, step by step, the operation of passing parameters through the
use of a stack frame.
Re-entrant programs, however, have no alternative but to go through a process
similar to the one we discussed for recursion. A stack frame is the best way of meeting
their requirements. For every entry a stack frame is created, which preserves the
parameters as well as the local variables of the procedure in a memory region separate
from (not overlapping) that of the next instance of the call to the same procedure.
Calling program:
Step 1: Decrement the stack pointer suitably to accommodate the result output.
Step 2: Push all the input parameters.
Step 3: Call the subroutine. →

Called subroutine:
Step 1: Push the frame pointer.
Step 2: Move SP to the frame pointer.
Step 3: Decrement the stack pointer suitably to provide space for temporary or local
variables of the subroutine (which may be required to be invoked in a random
fashion), to complete the stack frame.
Step 4: Do the subroutine job; use the stack space above the frame as the stack for the
subroutine. Use indexed addressing with the frame pointer as the base register to
obtain the parameters as required in the subroutine. Use indexed addressing with
the frame pointer to store the output variables of the subroutine in the space
provided in the stack frame.
Step 5: When the subroutine job is done, clear the subroutine stack by moving the
frame pointer value to the stack pointer. Get back the original frame pointer.
Step 6: Return to the calling program. ←

Calling program (continued):
Step 4: On return from the subroutine, clear the parameters input to the subroutine
from the stack by incrementing the stack pointer appropriately, to undo step 2 above.
Step 5: Get the results of the subroutine by popping them off the stack into registers
or memory locations as required, and use them as required.
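The steps of the table can be followed through on a toy model of the stack. The sketch below is in Python, not assembly, and is only an illustration under stated assumptions: the class Machine, the starting SP value of 0100h and the placeholder return address are invented here, and the body of fact is written from the steps of Table 4.2 and the offsets of Table 4.1, not copied from the Fact procedure.

```python
RETURN_ADDRESS = 0x0000     # placeholder word standing in for the pushed IP


class Machine:
    """Toy model: a stack of words growing downwards, with SP and BP."""
    def __init__(self, sp=0x0100):
        self.mem, self.sp, self.bp = {}, sp, 0

    def push(self, word):
        self.sp -= 2
        self.mem[self.sp] = word & 0xFFFF

    def pop(self):
        word = self.mem[self.sp]
        self.sp += 2
        return word


def fact(m):
    m.push(RETURN_ADDRESS)             # effect of the near call
    m.push(m.bp)                       # subroutine step 1: push bp
    m.bp = m.sp                        # subroutine step 2: mov bp, sp
    n = m.mem[m.bp + 4]                # input parameter at [bp+4]
    if n <= 1:
        result = 1
    else:
        m.sp -= 2                      # calling step 1: space for the output
        m.push(n - 1)                  # calling step 2: push the input
        fact(m)                        # calling step 3: call
        m.sp += 2                      # calling step 4: discard the input
        result = n * m.pop()           # calling step 5: pop the output
    m.mem[m.bp + 6] = result & 0xFFFF  # store the output at [bp+6]
    m.sp = m.bp                        # subroutine step 5: mov sp, bp
    m.bp = m.pop()                     #                    pop bp
    m.pop()                            # subroutine step 6: ret


def factorial(n):
    m = Machine()
    top = m.sp
    m.sp -= 2                          # space for one word of output
    m.push(n)                          # input parameter
    fact(m)
    m.sp += 2                          # clear the input from the stack
    result = m.pop()                   # get the output
    assert m.sp == top and m.bp == 0   # stack and frame pointer fully restored
    return result


print(factorial(8))
```

Every level of the recursion sees its own input at [bp+4] and its own output slot at [bp+6], whatever the depth of nesting, which is why only one frame needs to be considered when working out the offsets.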
It should however be noted that the program we have taken, uses only two or three
registers and hence, parameters can well be passed through registers in this case. Below,
we have a recursive program for factorial calculation using this idea.
[Flowchart: PROC FACT. Boxes: CREATE STACK FRAME; decision N ≤ 1? (No: N = N – 1,
then FACT = N*FACT; Yes: UNDO THE STACK FRAME); RETURN.]
Note that the listing below is not in the format for assembling using the MASM. It
is the .lst file obtained from a 32 bit assembler, NASM, downloadable from
the net. However, the difference is not much, and the .asm version required for
MASM can be easily visualized from this listing. See also Q5 in the exercises at the
end of the chapter for a brief introduction to NASM, and macros in NASM.
; procedure ncrp
8 0000000E 55 ncrp:push bp ; recursive routine here.
9 0000000F 89E5 mov bp,sp
10 00000011 81EC0200 sub sp,2 ; space for temp variable
; of the procedure.
11 00000015 50 push ax
12 00000016 53 push bx
13 00000017 8B5E04 mov bx,[bp+4] ; parameters passed from
; the calling program.
14 0000001A B80100 mov ax,1
15 0000001D 38DF cmp bh,bl
16 0000001F 7427 jz over
17 00000021 08DB or bl,bl ; is bl = 0?
18 00000023 7423 jz over
19 00000025 81EC0200 sub sp,2
20 00000029 FECF dec bh
21 0000002B 53 push bx
22 0000002C E8DFFF call ncrp ; calculate n-1Cr
23 0000002F 5B pop bx
24 00000030 895EFE mov [bp-2],bx ; store partial result
; temporarily.
25 00000033 8B5E04 mov bx,[bp+4] ; recall the parameters
26 00000036 FECF dec bh
27 00000038 FECB dec bl
28 0000003A 81EC0200 sub sp,2
29 0000003E 53 push bx
30 0000003F E8CCFF call ncrp ; compute n-1Cr-1
31 00000042 58 pop ax
32 00000043 8B5EFE mov bx, [bp-2]
33 ; partial result stores in
; temp variable space of
; stack frame
34 00000046 01D8 add ax, bx
35 00000048 894606 over:mov [bp+6],ax ; store result in stack space
36 0000004B 5B pop bx
37 0000004C 58 pop ax
38 0000004D 89EC mov sp,bp ; clean the stack
39 0000004F 5D pop bp
40 00000050 C20200 ret 2
The subroutine leaves space for one word of local variable (sub sp,2 in line no.
10). The sub sp,2 in lines 19 and 28 leaves the space for the output variable of each
recursive call. Lines 15 to 18 check for the termination condition. The rest of the
program can be clearly identified in terms of the flow chart of Fig. 4.1.
For counting the number of times the subroutine is called, you can use SI:CX as
counters set to 0, initially in the main program, and incremented before the return
instruction in the subroutine ncrp. You may be surprised to see the result! It comes to as
much as 97239 decimal, whereas the value 18C9 is only 48620 decimal.
The use of an additional set of terminal conditions, namely, nCn-1 = nC1 = n,
will certainly improve the execution time and stack memory requirement (the count here
comes to only 25836 decimal).
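The first count quoted above can be cross-checked without running the assembly program. The sketch below is in Python and models only the recursion nCr = (n-1)Cr + (n-1)C(r-1) with the two termination conditions tested in lines 15 to 18 of the listing (r = n and r = 0); the function name and the way the calls are counted are choices made for this illustration.

```python
def ncr_with_call_count(n, r):
    """Return (nCr, number of calls) for the plain recursive definition."""
    calls = 0

    def ncr(n, r):
        nonlocal calls
        calls += 1                     # one more entry into the subroutine
        if r == n or r == 0:           # termination conditions
            return 1
        return ncr(n - 1, r) + ncr(n - 1, r - 1)

    value = ncr(n, r)
    return value, calls


print(ncr_with_call_count(18, 9))
```

Every terminating call contributes exactly 1 to the result, and every other call makes two further calls, so the number of calls is always twice the value less one. This is why the call count is so large compared with what one might expect.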
In normal non-recursive and non-re-entrant situations, it is possible to pass
parameters through registers. The following is an example of a non-recursive program
using registers for parameter passing. The procedure is written as a near procedure, and
does a 32 bit by 32 bit multiplication. The program is given below with adequate
comments.
;The process in terms of 16 bit data (a:b)*(c:d) = acH : (acL + bcH + adH) :
; (bcL + adL + bdH) : bdL
; the numbers to be multiplied are first made available in the registers as
; indicated in the comments at the start of the procedure.
; Steps are: 1. Save registers used; 2. Do the job and 3. Retrieve saved regs.
code segment
assume cs:code
start: call dmult
int 01
dmult proc near
; input parameters, a in dx, b in ax, c in bx and d in cx
; numbers multiplied a:b and c:d; i.e. (dx:ax)*(bx:cx).
; output in dx:cx:bx:ax ; all other regs. are saved across the procedure
push si
push di
push bp ; save extra registers used
mov si, ax ; dx:ax has a:b, so b -> si
mov di, dx ; a -> di
mul cx ; (bd) -> dx:ax
xchg ax, si ; si <- bdL, final; and b -> ax
mov bp, dx ; bdH -> bp
mul bx ; (bc) -> dx:ax
add bp, ax ; bcL + bdH -> bp, carry1 is not disturbed & used later
mov ax, cx ; d -> ax
mov cx, 0 ; 0 -> cx without disturbing the carry flag
adc cx, dx ; bcH + carry1 -> cx
mul di ; (ad) -> dx:ax
add bp, ax ; bcL + bdH + adL -> bp; carry2 could be there
adc cx, dx ; bcH + carry1 + adH + carry2 -> cx; carry3 could be there
mov ax, di ; a -> ax
mov di, 0 ; carry3 not disturbed, 0 -> di
adc di, di ; carry3 -> di
mul bx ; (ac) -> dx:ax
add cx, ax ; bcH + carry1 + adH + carry2 + acL -> cx; carry4, may be
; cx now has the final result
adc dx, di ; acH + carry3 + carry4 -> dx
mov bx, bp ; arrange the results in bx
mov ax, si ; and in ax as required
pop bp ; retrieve saved registers
pop di
pop si
ret
dmult endp
code ends
end start
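The carry handling in dmult is the part most easily got wrong, so it is worth checking against ordinary arithmetic. The sketch below is in Python, not assembly; the helper names mul16 and add16 are invented for this illustration. It performs the four 16 bit multiplications and the additions in the same order as the procedure, with the carries named as in the comments.

```python
MASK = 0xFFFF


def mul16(x, y):
    """Model of the 16 bit MUL: returns (high word, low word) of the product."""
    product = x * y
    return product >> 16, product & MASK


def add16(x, y, carry_in=0):
    """Model of ADD/ADC on 16 bit registers: returns (sum, carry out)."""
    total = x + y + carry_in
    return total & MASK, total >> 16


def dmult(a, b, c, d):
    """(a:b) * (c:d) -> the four result words, most significant first."""
    bd_h, bd_l = mul16(b, d)
    bc_h, bc_l = mul16(b, c)
    word1, carry1 = add16(bd_h, bc_l)            # add bp, ax
    word2, _ = add16(0, bc_h, carry1)            # adc cx, dx (cannot overflow)
    ad_h, ad_l = mul16(a, d)
    word1, carry2 = add16(word1, ad_l)           # add bp, ax
    word2, carry3 = add16(word2, ad_h, carry2)   # adc cx, dx
    ac_h, ac_l = mul16(a, c)
    word2, carry4 = add16(word2, ac_l)           # add cx, ax
    word3, _ = add16(ac_h, carry3, carry4)       # adc dx, di
    return word3, word2, word1, bd_l             # dx : cx : bx : ax


def product_of(x, y):
    """Multiply two 32 bit numbers through dmult and reassemble the 64 bit result."""
    w3, w2, w1, w0 = dmult(x >> 16, x & MASK, y >> 16, y & MASK)
    return (w3 << 48) | (w2 << 32) | (w1 << 16) | w0


print(hex(product_of(0xFFFFFFFF, 0xFFFFFFFF)))
```

The largest possible operands are the useful test case here, since they generate every one of the four carries' worst cases at once.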
-r
AX=0000 BX=0000 CX=0000 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0100 NV UP EI PL NZ NA PO NC
1377:0100 B80100 MOV AX,0001
-t12
AX=9D80 BX=0000 CX=0002 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=1377 ES=1377 SS=1377 CS=1377 IP=0108 NV UP EI PL NZ NA PO NC
1377:0108 E2FC LOOP 0106
EXERCISES
clc
bak: mov ax, [di]
opr ax, 1
stosw
loop bak
dec dx
jnz again
endm
;
; In the above macro, n is the number of data words, m is the number of bit
; positions by which the data is shifted, and opr is rcl or rcr
;
strt: mov ax, data
mov ds, ax
mov es, ax
cld
mov cx, num
mov di, offset dat1
mov si, offset dat
rep movsw ; to copy data
mov cx, num
mov di, offset dat2
mov si, offset dat
rep movsw ; to copy data
sflr rcl, dat1, num, 4, cld ; data start address for 'rcl'
sflr rcr, dat2+8, num, 4, std ; data end address for 'rcr'
int 1
code ends
end strt
STUDY IN DEBUG
-u 0 4e
13E0:0000 B8DC13 MOV AX,13DC
13E0:0003 8ED8 MOV DS,AX
13E0:0005 8EC0 MOV ES,AX
13E0:0007 FC CLD
13E0:0008 8B0E3000 MOV CX,[0030]
13E0:000C BF1000 MOV DI,0010
13E0:000F BE0000 MOV SI,0000
13E0:0012 F3 REPZ
13E0:0013 A5 MOVSW
13E0:0014 8B0E3000 MOV CX,[0030]
13E0:0018 BF2000 MOV DI,0020
13E0:001B BE0000 MOV SI,0000
13E0:001E F3 REPZ
13E0:001F A5 MOVSW
13E0:0020 FC CLD
13E0:0021 BA0400 MOV DX,0004
13E0:0024 8B0E3000 MOV CX,[0030]
13E0:0028 8D3E1000 LEA DI,[0010]
13E0:002C F8 CLC
13E0:002D 8B05 MOV AX,[DI]
13E0:002F D1D0 RCL AX,1
13E0:0031 AB STOSW
13E0:0032 E2F9 LOOP 002D
13E0:0034 4A DEC DX
13E0:0035 75ED JNZ 0024
13E0:0037 FD STD
13E0:0038 BA0400 MOV DX,0004
13E0:003B 8B0E3000 MOV CX,[0030]
13E0:003F 8D3E2800 LEA DI,[0028]
13E0:0043 F8 CLC
13E0:0044 8B05 MOV AX,[DI]
13E0:0046 D1D8 RCR AX,1
13E0:0048 AB STOSW
13E0:0049 E2F9 LOOP 0044
13E0:004B 4A DEC DX
13E0:004C 75ED JNZ 003B
13E0:004E CD01 INT 01
-g
AX=8023 BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=000A DI=001E
DS=13DC ES=13DC SS=13DC CS=13E0 IP=0050 NV DN EI PL ZR NA PE NC
13E0:0050 FFFF ??? DI
-d0 3f
13DC:0000 34 02 78 56 AB 89 04 76-C0 AB 00 00 00 00 00 00 4.xV...v........
13DC:0010 40 23 80 67 B5 9A 48 60-07 BC 00 00 00 00 00 00 @#.g..H`........
13DC:0020 23 80 67 B5 9A 48 60 07-BC 0A 00 00 00 00 00 00 #.g..H`.........
13DC:0030 05 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;
; the original word is: ABC0 7604 89AB 5678 0234;
; on 4 bit left shift, we get: BC07 6048 9AB5 6780 2340;
; and on right shift, we get: 0ABC 0760 489A B567 8023; as can be seen from
; the result above.
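The result in the dump can be reproduced with a short model of the multi-word shift. The sketch below is in Python, not assembly, and the function name is chosen for this illustration. It does what the macro does: for each bit of shift the carry is cleared, and then every word is rotated through the carry by one bit, starting from the low word for rcl and from the high word for rcr.

```python
def shift_words(words, bits, left=True):
    """Shift a multi-word number (low word first) by `bits` bit positions."""
    words = list(words)
    order = range(len(words)) if left else range(len(words) - 1, -1, -1)
    for _ in range(bits):
        carry = 0                                   # clc
        for i in order:
            w = words[i]
            if left:                                # rcl ax, 1
                words[i] = ((w << 1) | carry) & 0xFFFF
                carry = w >> 15
            else:                                   # rcr ax, 1
                words[i] = (w >> 1) | (carry << 15)
                carry = w & 1
    return words


dat = [0x0234, 0x5678, 0x89AB, 0x7604, 0xABC0]      # as stored at offset 0000
print([f"{w:04X}" for w in shift_words(dat, 4, left=True)])
print([f"{w:04X}" for w in shift_words(dat, 4, left=False)])
```

The printed words appear low word first, in the same order as the memory dump at offsets 0010 and 0020.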
In this program, we bundle the parameters (opr, k) as (rcl, 0) for left shift or as
(rcr, 1) for right shift, which is easier on the programmer. The mem value can be the start
address in both cases, and the direction flag is adjusted based on the control parameter
k. Sometimes the value of n may not be known at the time of writing the program; it may
be a word computed and stored during operation. Such cases are also accommodated
in this modified macro.
4. The following program is given without any comments. Find out what it does
and test the working of the program.
data segment
mnd dw 8 dup (1578h)
spare dw 0F8h dup (?)
sud dw 1234h, 0abcdh, 2389h, 5874h
dw 9876h, 4567h, 0bcdh
dw 9567h
spare2 dw 10h dup (?)
rest dw 9 dup (?)
data ends
code segment
assume cs:code, ds:data, es:data
subt macro minuend, subtrahend, result, n
local lup
mov si, offset minuend
mov di, offset result
mov cx, n
clc
cld
lup: lodsw
sbb ax, [subtrahend-minuend-2][si]
stosw
loop lup
sbb ax, ax
mov [di], ax
endm
strt: mov ax, data
mov ds, ax
mov es, ax
subt mnd, sud, rest, 8
int 1
code ends
end strt
5. Below is given an assembly language version of the smart divide using macro,
which we saw under point no.6 in this chapter earlier. This version can be
assembled using the freely available assembler, NASM (Netwide Assembler), which
can be directly downloaded from the net. The user manual is also freely
downloadable. The program as given here matches the program for assembly by
MASM, which we saw earlier. Note the features of using macros for NASM and
the overall simplicity of the .asm program. List the differences in assembly
programs for the two assemblers; especially find the interesting way of handling
the parameters for the macro. Note also, how the local labels for the macro are
handled.
%macro smdiv 4
sub %4, %4
cmp %1, %3
jb %%down
xchg %4, %1
xchg %4, %2
div %3
xchg %4, %2
%%down: div %3
xchg %4, %1
%endmacro
;
start: mov dx, 0x0abc
mov ax, 0x1234
mov bx, 0x45ab
smdiv dx, ax, bx, cx
mov bx, 0xdabc
mov ax, 0x2345
mov cl, 35
smdiv ah, al, cl, bl
int 01
The command used is: nasm -l smartdiv.lst smartdiv.asm. The file also shows
the assembled program starting at the origin 0000 in the code segment.
1 %macro smdiv 4
2 sub %4, %4
3 cmp %1, %3
4 jb %%down
5 xchg %4, %1
6 xchg %4, %2
7 div %3
8 xchg %4, %2
9 %%down: div %3
10 xchg %4, %1
11 %endmacro
12 ;
13 00000000 BABC0A start: mov dx, 0x0abc
14 00000003 B83412 mov ax, 0x1234
15 00000006 BBAB45 mov bx, 0x45ab
16 smdiv dx, ax, bx, cx
17 00000009 29C9 <1> sub %4, %4
18 0000000B 39DA <1> cmp %1, %3
19 0000000D 7206 <1> jb %%down
20 0000000F 87CA <1> xchg %4, %1
21 00000011 91 <1> xchg %4, %2
22 00000012 F7F3 <1> div %3
23 00000014 91 <1> xchg %4, %2
24 00000015 F7F3 <1> %%down: div %3
25 00000017 87CA <1> xchg %4, %1
26 00000019 BBBCDA mov bx, 0xdabc
27 0000001C B84523 mov ax, 0x2345
28 0000001F B123 mov cl, 35
29 smdiv ah, al, cl, bl
30 00000021 28DB <1> sub %4, %4
31 00000023 38CC <1> cmp %1, %3
32 00000025 7208 <1> jb %%down
33 00000027 86DC <1> xchg %4, %1
34 00000029 86D8 <1> xchg %4, %2
35 0000002B F6F1 <1> div %3
36 0000002D 86D8 <1> xchg %4, %2
37 0000002F F6F1 <1> %%down: div %3
38 00000031 86DC <1> xchg %4, %1
39 00000033 CD01 int 01
40
-u 100 133
5. SOME SIMPLE NUMBER CRUNCHING and
INTERRUPT PROGRAMS
0 Warning Errors
2 Severe Errors
What has happened is that MASM has taken the ‘dd’ of line 2 as the reserved word defining
a double word (32-bit word) and expects the double word value to follow. What it sees
instead is ‘dw’, which it interprets as a label defined elsewhere. No such label is found,
and this is reported as an error. The inexperienced user will find this difficult to make
out. The user would like an indication that the error lies in using the reserved word
‘dd’ as a symbol for a data word. Instead, MASM treats the first word as a valid
reserved word and hence looks at the second word as an undefined symbol. If you correct
this error by using the symbol dvd for the dividend, you will find line 2 still in error.
Try this out in the laboratory, until you get the program assembling without error! Most
of the time, however, the error indications can be easily understood. When they are
difficult to understand, one has to try different possible alternatives on the erroneous
line and on the lines related to it, till the fault is correctly identified. It requires some
practice before these aspects are properly understood.
Once the machine language program is obtained using LINK after MASM, it can be
debugged. At the debug stage also there could be, or quite likely will be, problems, which
have to be solved by judicious use of the ‘t’, ‘g’, ‘p’, ‘d’ and other debug commands in
assessing the faults. The procedures given below are tested and the working test results
are shown. Still, when you work on these problems in the laboratory, there could be errors
in your program entry. Until you have turned possibly wrong programs into solid working
programs, you will not have learnt programming well. The purpose of the microprocessor
laboratory is to impart this sort of training. Here are a few working procedures and
programs. When you are working in a team in the laboratory, it could be arranged that
one team member deliberately introduces errors in the assembly or debug version of the
program, unseen by the other members, who then try to identify the error. This may be
played as a game, and slowly you will find your interest in programming picking up. In
any case, whenever you notice errors in your program, make a record of the error and of
the way you corrected it. Programming is learnt in the laboratory only by studying such
errors, avoiding them in the future, and becoming confident in handling errors through
awareness of the common ones.
Testing in Debug
-u 0 1e
13D5:0000 E80200 CALL 0005
13D5:0003 CD01 INT 01
13D5:0005 3BC3 CMP AX,BX ; procedure from here; Euclid’s algo.
13D5:0007 7301 JNB 000A
13D5:0009 93 XCHG BX,AX
13D5:000A 0BDB OR BX,BX ; bx carries the smaller number here.
13D5:000C 740F JZ 001D ; if bx = 0, invalid data
13D5:000E 52 PUSH DX ; save register used
13D5:000F 2BD2 SUB DX,DX ; prepare for word division
13D5:0011 F7F3 DIV BX
13D5:0013 8BC3 MOV AX,BX
13D5:0015 8BDA MOV BX,DX
13D5:0017 0BD2 OR DX,DX ; remainder = 0?; this also clears carry
13D5:0019 75F4 JNZ 000F ; if not, repeat; otherwise job over, gcd is in AX
13D5:001B 5A POP DX ; retrieve saved DX before return
13D5:001C C3 RET
13D5:001D F9 STC ; DX not saved and not used in this path
13D5:001E C3 RET
AX 0000
:1234
-rbx
BX 0000
:1324
-rdx
DX 0000
:1111 ; data used to test if it is saved across the routine
-r
-g 3
IP 0003
:0
-r
AX=0014 BX=0000 CX=001F DX=1111 SP=0000 BP=0000 SI=0000 DI=0000
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0000 NV UP EI PL NZ NA PO NC
13D5:0000 E80200 CALL 0005
-g 3
DS=13C5 ES=13C5 SS=13D5 CS=13D5 IP=0003 NV UP EI PL ZR NA PE CY
13D5:0003 CD01 INT 01
-q
Certain features that we have introduced in this program are of general interest.
One such feature is the check on the input data, done at the beginning of this program.
Normally, programs operate correctly as long as the data is within some specific limits;
if the input data transgresses these boundaries, the program gets into problems. It is
therefore good practice to check the input data and confine it within certain
boundaries. Data beyond the allowable boundary has to be handled by not working on it
at all, and by giving an indication in the output that the data is invalid. This indication
can be given in several ways. Normally a message is presented on the screen or printed
out. A simpler way of handling this requirement is through one of the flags in the flag
register. One usual choice is the carry flag, as in this case. When the procedure returns,
all that needs to be done in the main program is to check the carry flag; with a jump on
carry instruction, the program flow can be conveniently altered to account for the error.
This method is very common when using procedures or interrupt routines. Another
feature of this program is that the push and pop of the register DX are done only when
the data is valid; if the data is invalid, no calculation is done and hence no push or pop
either.
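The logic of the procedure, including the use of the carry flag as an error indication, can be modelled in a few lines. The sketch below is in Python, not assembly; the function name and the use of a returned True/False value to stand for the carry flag are choices made for this illustration.

```python
def gcd_proc(ax, bx):
    """Model of the GCD procedure: returns (AX on return, carry flag)."""
    if ax < bx:                  # cmp ax, bx / jnb / xchg bx, ax
        ax, bx = bx, ax
    if bx == 0:                  # or bx, bx / jz: invalid data
        return ax, True          # stc, ret (DX neither pushed nor popped)
    while True:
        dx = ax % bx             # sub dx, dx / div bx: remainder in DX
        ax, bx = bx, dx          # mov ax, bx / mov bx, dx
        if dx == 0:              # or dx, dx also clears the carry
            return ax, False     # gcd is in AX


print(gcd_proc(0x1234, 0x1324))  # the data used in the debug session
print(gcd_proc(0x0014, 0x0000))  # second run: BX = 0, so the carry is set
```

A calling program would test the second value, just as the main program would use a jump on carry after the call.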
Exercise: Extend the above program, to give the LCM of the input data numbers,
using the well known relation: LCM (n1, n2) = (n1)*(n2)/GCD (n1, n2)
Hint: There are enough registers available, so that the following steps can be followed:
1. Save n1 and n2 in, say, si and di registers
2. Find GCD of n1 and n2 in ax as has been shown, and move it to bx.
3. Take n1 in ax (from si), make dx = 0 and word divide n1 by the GCD
4. The result will now be in ax with nothing in dx (why?), multiply this result by di.
The LCM will be in dx:ax and the GCD will be in bx.
This is perhaps the best method for finding LCM, even when GCD is not needed.
assume cs:code, ds:data, es:data ; DS and ES are same as discussed above.
start: mov ax,data
mov ds, ax
mov es, ax
sub ax, ax ; the first number
lea di, fibo
stosw ; stored
mov bx, ax ; first word goes to bx
inc ax ; second word in ax
back: stosw ; stored
xchg ax, bx ; two consecutive words now in ax, bx
add ax, bx ; add them to get the next word
jnc back ; does it go beyond the 16-bit limit?, if not go back.
int 01
code ends
end start
Testing in debug
-u 0 17
13EE:0000 B8D513 MOV AX,13D5
13EE:0003 8ED8 MOV DS,AX
13EE:0005 8EC0 MOV ES,AX
13EE:0007 2BC0 SUB AX,AX
13EE:0009 8D3E0000 LEA DI,[0000]
13EE:000D AB STOSW
13EE:000E 8BD8 MOV BX,AX
13EE:0010 40 INC AX
13EE:0011 AB STOSW
13EE:0012 93 XCHG BX,AX
13EE:0013 03C3 ADD AX,BX
13EE:0015 73FA JNB 0011
13EE:0017 CD01 INT 01
-g 7
-d 0 3f
13D5:0000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13D5:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 17
AX=2511 BX=B520 CX=01A9 DX=0000 SP=0000 BP=0000 SI=0000 DI=0032
DS=13D5 ES=13D5 SS=13D5 CS=13EE IP=0017 NV UP EI PL NZ NA PE CY
13EE:0017 CD01 INT 01
-d 0 3f
Xor ax, ax
Mov bx, 1 ; the first two numbers 0 and 1
lea si, list_start ; start of the list
Mov [si], ax
Up: Mov [si+2], bx
Add si, 4
Add ax, bx
Jc down ; to terminate on carry
Mov [si], ax
Add bx, ax ; note, bx is made the destination now
Jnc up ; don’t terminate if no carry
Down: int 01 ; terminate
This program also gives the same result as above.
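The register values seen in the debug display at the end of the first program can be confirmed with a small model. The sketch below is in Python, not assembly, and the function name is chosen for this illustration. It follows the stosw version step by step: store, exchange, add, and stop on carry.

```python
def fib16():
    """Model of the stosw program: returns (stored words, final AX, final BX)."""
    words = [0]                  # sub ax, ax / stosw
    ax, bx = 1, 0                # mov bx, ax / inc ax
    while True:
        words.append(ax)         # back: stosw
        ax, bx = bx, ax          # xchg ax, bx
        total = ax + bx          # add ax, bx
        ax = total & 0xFFFF
        if total > 0xFFFF:       # jnc back not taken: carry is set
            return words, ax, bx


words, ax, bx = fib16()
print(len(words), hex(words[-1]), hex(ax), hex(bx))
```

The 25 words stored occupy 50 bytes, which agrees with DI = 0032 in the display, and the last number within the 16 bit limit is B520h (46368 decimal).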
Comments on the first program above (using the stosw instruction): It is to be
noted that we have missed an important point in the above program; we have
forgotten to set the direction flag so that the array address is incremented, that is, to
have the direction ‘up’. Fortunately, the direction happened to be up, as can be seen from
the flag register display in the debug, so the error did not show up. The rule,
however, is as follows:
Before using string instructions, always set the direction flag as required.
The instruction STD (for decrementing the string address) or CLD (for
incrementing the string address) is to be there before the use of the string instructions. In
our program, the first use of stosw is as the 6th instruction from the start label. So the
CLD instruction should be inserted anywhere among the first 5 instructions from start. In
the above program this has not been done, but the result has come out OK because the D
flag is cleared when we freshly enter the debug, as we have done here.
data segment
array db 24h, 0a4h, 0bbh, 0fah, 58h, 23h
asize dw $-array ; notice the way of defining the array size
data ends
code segment
assume cs: code, ds:data
start: mov ax,data
mov ds,ax
cld ; before we forget, we clear the D flag
mov si, offset array
mov cx, asize; array size in number of bytes
sub dx,dx
mov bx,dx ; bx and dx will be used for summing operation
mov ax,dx ; essentially to make AH register = 0.
back: lodsb
add bx,ax ; collect the sum in BX
adc dx,0 ; any overflow from bx beyond the word size, will go to
DX
loop back
int 01
code ends
end start
The program above is simple enough to understand, but certain questions may
arise. The data added are bytes, so why is word addition (add bx, ax) done
in the program? Why is 64K taken as the maximum array size? What is the meaning
of $-array, as used to define the array size (label asize) in the data segment? We will
answer these questions one by one.
Add bx, ax: The data in AL is to be added to the sum in register BX because, when
several bytes are added, the sum sooner or later overflows the byte size and
becomes word size. By putting 0’s in the AH register, we extend the byte in AL to a
word in the AX register, which is then added to the sum collected so far in BX.
Size limit of 64K for the array: The offset address held in SI is 16 bits wide, so it can
take 2^16 or 64K (65536) distinct values. Hence we cannot easily handle a byte array larger
than this. Please note that if we are handling word arrays, the maximum size easily handled
is only 32K words. Also note that when we add more than 257 byte-sized numbers, the
result may exceed a 16-bit value (257 x 255 = 65535 still fits), and the addition must then
handle numbers of up to 3 bytes. If we add 64K of byte-sized data, we need to provide for
full 24-bit addition.
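The word addition and the size limits discussed above can be checked with a short model. This is a sketch in Python, not 8086 code; the function name sum_bytes is hypothetical, and the registers are represented by ordinary integers masked to 16 bits.

```python
def sum_bytes(data):
    """Model of the 'add bx, ax' / 'adc dx, 0' loop; returns (dx, bx)."""
    bx = dx = 0
    for byte in data:                          # lodsb: AL = next byte, AH stays 0
        total = bx + byte                      # add bx, ax
        bx = total & 0xFFFF
        dx = (dx + (total >> 16)) & 0xFFFF     # adc dx, 0 absorbs the carry
    return dx, bx

array = [0x24, 0xA4, 0xBB, 0xFA, 0x58, 0x23]   # the array of the listing
dx, bx = sum_bytes(array)
print(f"DX:BX = {dx:04X}:{bx:04X}")

worst_257 = 257 * 0xFF      # largest possible sum of 257 bytes
worst_64k = 65536 * 0xFF    # largest possible sum of 64K bytes
```

For the six bytes of the listing the model gives DX:BX = 0000:02F8. The two worst-case products show why 257 bytes still fit in a word and why 64K bytes fit in 24 bits.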
Meaning of $ - array in the data segment: $ is the symbol for the current memory
address (the location counter of the assembler) and array represents the address at the
label array, so the expression $-array computes the array size in bytes. If a word array
is to be handled, the size in bytes has to be halved, which can easily be done in the
program by a single right shift of the byte count, or by specifying ($-array)/2 in the
data segment. MASM will take care of this computation during assembly of the program.
CMP AX, DX
JAE DOWN
INC CL
JNZ UP
DEC CL
DOWN: INT 01
The program above tries the square of every number from 1 onwards, until the
square of the number reaches or exceeds the given value in the DX register. The process
can be speeded up by first increasing the trial number in steps of 16 and then refining the
result in steps of 1. A macro helps here, since the same search is made twice with
different step sizes. The following .asm program shows how:
The Program Sqrt.asm
code segment
assume cs: code
approx macro n
local ddd, dddd, uu
uu: mov al, cl
mul al
cmp ax, dx
ja ddd ; The macro highlighted in yellow
jz dddd
add cl, n
jz ddd
jmp uu
ddd: sub cl, n
dddd:
endm
start: or dh, dh
mov cl,0
jz down
approx 10h
down: approx 1
jz dn1 ; The Program using the macro in green
mov al, cl
mul al
dn1: int 01
code ends
end start
It should be noted that this program, when assembled, will require more memory
space than the earlier program, because the macro body is expanded twice, but it will
generally execute faster when the value in DX is large.
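The coarse-then-fine search can be modelled to confirm that it always lands on the integer square root. The following is a Python sketch, not 8086 code; approx mirrors one expansion of the macro, including the 8-bit wrap of CL, and int_sqrt is a hypothetical name for the main program.

```python
import math

def approx(cl, n, dx):
    """One expansion of the approx macro: advance cl in steps of n."""
    while True:
        square = cl * cl                  # mul al
        if square > dx:                   # ja ddd: overshoot, step back by n
            return (cl - n) & 0xFF
        if square == dx:                  # jz dddd: exact root found
            return cl
        cl = (cl + n) & 0xFF              # add cl, n (8-bit register)
        if cl == 0:                       # jz ddd: cl wrapped past 255
            return (cl - n) & 0xFF

def int_sqrt(dx):
    """Integer square root of a 16-bit value, coarse pass first."""
    cl = 0
    if dx >> 8:                           # or dh, dh: coarse pass only if DH != 0
        cl = approx(cl, 0x10, dx)
    return approx(cl, 1, dx)

print(int_sqrt(300), int_sqrt(65535))
```

In this model a value such as 60000 needs 16 trial squares in the coarse pass and 6 in the fine pass, instead of roughly 245 when stepping by 1 throughout.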
5. Bubble Sort with Flagged Exchange: We will now look into a standard
bubble sort operation on an array.
The Bubble sort (ascend sort, as unsigned numbers) ALP using a macro Bigb,
which bubbles the biggest element of the array down to the bottom of the array
data segment
aray dw 75c2h, 8d29h,3bfbh,3bfbh,72f0h
data ends
code segment
assume cs: code, ds:data
start : mov ax,data
mov ds,ax
Mov cx, 5 ; array count n
lea si, aray ; array start address
sub dx,dx ; exchange flag
mov di, si ; save array start address
dec cx ; only n-1 bubbling necessary
up: mov bp, cx ; save this count for the next round
bigb macro ; big-to-the-bottom macro
local back, down
mov ax,[si]
back: mov bx, 2[si]
cmp ax, bx ; is ax > bx?
jbe down ; if no, go down
xchg ax, bx ; else, exchange ax, bx
inc dx ; indicate the array is altered, making dx non-zero
down: mov [si], ax ; store appropriate value at {si}
mov ax, bx ; adjust registers for the next bubble
add si,2 ; point to the next address and
loop back ; do the next bubble. cx = 0 here last
mov [si], ax ; store the last data, which is biggest
endm
bigb ; call to the macro
cmp dx, cx ; is dx = 0? (any alteration in the array?)
jz over ; if no, job over
mov dx, cx ; arrange to repeat; dx flag = 0
mov cx, bp ; bubble count is 1 less
mov si, di ; start address of array is same
loop up ; repeat bubbling now
over: int 01 ; terminate the process
code ends
end start
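The control flow of the flagged bubble sort may be easier to follow in a high-level model. This is a Python sketch rather than 8086 code; bubble_sort_flagged is a hypothetical name, the variable exchanged plays the part of the DX flag, and count plays the part of the pass length kept in BP.

```python
def bubble_sort_flagged(words):
    """Ascending bubble sort that stops early when a pass makes no exchange."""
    a = list(words)
    passes = 0
    count = len(a) - 1                 # only n-1 bubblings are necessary
    while count > 0:
        exchanged = False              # the DX flag, cleared before each pass
        for i in range(count):         # one run of the bigb macro
            if a[i] > a[i + 1]:        # unsigned compare, as with jbe
                a[i], a[i + 1] = a[i + 1], a[i]
                exchanged = True
        passes += 1
        if not exchanged:              # cmp dx, cx / jz over
            break
        count -= 1                     # loop up: the next pass is one shorter
    return a, passes

result, passes = bubble_sort_flagged([0x75C2, 0x8D29, 0x3BFB, 0x3BFB, 0x72F0])
print([f"{w:04x}" for w in result], passes)
```

For the array of the listing the model needs three passes; the third makes no exchange and ends the sort.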
-u 0 35
AX=13D5 BX=0000 CX=0047 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13D5 ES=13C5 SS=13D5 CS=13D6 IP=0005 NV UP EI PL NZ NA PO NC
13D6:0005 B90500 MOV CX,0005
-d 0 f
-g
-q
The bubble sort handles an array both for reading from and for writing to
memory. Therefore the string instruction lods, the instruction stos, or both could be
used. Using both may be difficult, as it requires two registers (SI and DI) for the inner
loop, and these have to be renewed from memory every time the outer loop is initialized.
Two example programs are given below, using only the lods instruction. It can be seen that
there is some difficulty in handling the registers where the data comparison is made, both
in selecting the item to be stored in memory and in keeping track of the array
modification in the inner loop using the DX register. The two programs present
two ways of keeping proper track. It may be simpler if only stos is used; the reader may
try this alternative as an exercise. The part of each program that handles the difficult
task of tracking array modification using the DX register is highlighted for
identification and study by the reader.
Bubble Sort Program Without using macro, Version 1
dec dx
down: mov [si-4], ax
loop back
mov [si-2], bx
cmp dx, cx
mov dx,cx
mov cx, bp
loopnz up
int 01
code ends
end start
Bubble sort program version 3 using an inner loop within the outer
instead of a macro
sub dx, dx ; ‘exchange’ track flag, initialized to zero
mov ax, [di]
back: mov bx, [di+2] ; inner loop starts
cmp ax, bx
jle down ; smaller of the two goes to memory at [di]
; numbers interpreted as signed integers
inc dx ; alter dx from zero
xchg ax, bx
down: stosw
mov ax, bx ; the other data used in ax for next compare
loop back ; inner loop over
mov [di], ax ; the last item is also stored in memory
mov cx, bp ; recall the outer loop count
or dx, dx ; any exchange in the inner loop?
loopnz back1 ; if no, or if loop count is zero then
int 1 ; terminate
jmp strt ; given for facilitating testing again
nop ; to misalign the start of the data part.
Align 2 ; assembler directive for word alignment
arrstrt dw 1234h, 2348h, 8086h, 0abcdh, 0ffabh, 23ach
count dw ($ - arrstrt)/2
code ends
end strt
TESTING IN DEBUG
-u 0 34
..
-g
-d 34 3f
Note: The program has terminated because no exchange took place. The loops for
cx = 5 and cx = 4 did their work; in the loop for cx = 3 there was no exchange, and
the program terminated after the loopnz instruction had decremented cx to 2.
The result after the loop with cx = 5 is 1234, 8086, abcd, ffab, 2348, 23ac
With cx = 4 it is 8086, abcd, ffab, 1234, 2348, 23ac
With cx = 3 it is 8086, abcd, ffab, 1234, 2348, 23ac
so no exchange takes place for cx = 3. Hence the loop
terminates with cx becoming 2, as seen.
the range 180h to 1FFh. If we now start at the beginning of the source block at
100h and do a movsb operation, we would be copying the byte at 100h into
180h. But remember, there is another source data byte sitting at memory
location 180h, which gets lost by this operation. Continuing in this way, we lose
the data in the block from 180h to 1FFh, which is overwritten with the data from
100h to 17Fh. If, on the contrary, we start the transfer from the
other end, namely from 1FFh of the source to 27Fh of the destination, and
keep decrementing the addresses as the copying continues, all our data will be
safely transferred without loss. This requires operation with the D flag set.
With overlap, if the source start address is larger than the destination start
address, it can easily be checked that the data will be safely copied when the
addresses increase every time, that is, with the D flag cleared. In case there is no
overlap between the source and destination data blocks, working with the D
flag either set or cleared will be OK. Combining all these, we can see that if the
absolute physical address of the source array start is lower than the destination
start address, the data transfer can be done starting from the array end with
the D flag set, irrespective of whether there is overlap or not. If the source
start address is greater than the destination start, the data transfer can be done
beginning from the start of the array with the D flag cleared, irrespective of
overlap. In the trivial case where the start addresses of source and destination are
the same, no move is needed. The Block Move Program 1 below shows
the operations when the source address is lower than the destination address
and ES = DS. Program 2 considers the case where DS and ES happen to
be different.
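The direction rule can be tried out on a small memory image. The sketch below is Python, not 8086 code; block_move is a hypothetical name, memory is a bytearray, and the 12 sample bytes are those of the six-word array seen in the debug dump of Program 1.

```python
def block_move(mem, src, dst, count):
    """Copy count bytes inside mem from src to dst, choosing a safe direction."""
    if src == dst or count == 0:
        return                                   # trivial case: nothing to move
    if src < dst:                                # tail first, as with std
        for i in range(count - 1, -1, -1):
            mem[dst + i] = mem[src + i]
    else:                                        # head first, as with cld
        for i in range(count):
            mem[dst + i] = mem[src + i]

mem = bytearray.fromhex("34127856BC9AEFCD45239A78") + bytearray(4)
block_move(mem, 0, 3, 12)                        # overlapping, destination above source
print(mem.hex(" ").upper())
```

The printed line is 34 12 78 34 12 78 56 BC 9A EF CD 45 23 9A 78 00, the same bytes as the final dump of Program 1. Copying these bytes head first instead would overwrite offsets 3 onwards before they had been read.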
Repeat Prefix: The rep prefix can be used in this context instead of writing a
transfer loop. After initializing the SI and DI registers and the DS and ES registers,
load the CX register with the byte count for movsb or the word count for movsw,
and then use the respective instruction with the rep prefix. The
instruction is repeated, with CX decremented every time, until CX becomes zero.
Below is program 1, which copies a data array starting at the memory
location labeled blok to the destination offset stored in the word labeled dest, in the
same segment. In this example the destination address is above the source address and
the source and destination data blocks overlap. It is therefore necessary to
move the data starting from the tail end and working backwards, with the D flag set.
The program is given below.
dest dw 3
data ends
code segment
assume cs:code, ds:data, es: data
strt: mov ax, data
mov ds, ax
mov es, ax
mov si, offset blok
mov di, dest
mov cx, count
mov ax, cx
dec ax
shl ax, 1
add si, ax
add di, ax
std
rep movsw
int 01
code ends
end strt
-u 0 1e
-g 1b
AX=000A BX=0000 CX=0006 DX=0000 SP=0000 BP=0000 SI=000A DI=000D
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=001B NV UP EI PL NZ NA PO NC
13D8:001B FD STD
-d 0 f
13D5:0000 34 12 78 56 BC 9A EF CD-45 23 9A 78 06 00 00 00 4.xV....E#.x....
-t
AX=000A BX=0000 CX=0006 DX=0000 SP=0000 BP=0000 SI=000A DI=000D
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=001C NV DN EI PL NZ NA PO NC
13D8:001C F3 REPZ
13D8:001D A5 MOVSW
-g
AX=000A BX=0000 CX=0000 DX=0000 SP=0000 BP=0000 SI=FFFE DI=0001
DS=13D5 ES=13D5 SS=13D5 CS=13D8 IP=0020 NV DN EI PL NZ NA PO NC
13D8:0020 0000 ADD [BX+SI],AL DS:FFFE=DB
-d 0 f
13D5:0000 34 12 78 34 12 78 56 BC-9A EF CD 45 23 9A 78 00 4.x4.xV....E#.x.
-q
Certain features of the above program may need explanation. Firstly, the entry
for the count in the data segment (highlighted) defines the count as word-size data using
a simple expression: the number of data bytes is $-blok, and this divided by 2 is the
number of words in blok. The assembler computes this value during assembly.
Secondly, it is easily worked out that in this case the data transfer has to be done
starting from the tail end of the data block. The tail addresses of the source and the
destination blocks must therefore be obtained from the word count. The
highlighted part of the code does this calculation, adding (count - 1) x 2 to both SI
and DI, and sets the D flag. If the data could be transferred starting from the head end,
none of this would be needed, and a simple CLD would suffice to ensure address
incrementing.
mov di, offset [ddi]
sub ax, di
sbb dx, cx
mov si, offset [arr]
add ax, si
adc dx, cx
or ax, dx
jz cmp1
inc cl
rol dx,1
jnc cmp1
neg cx
cmp1: mov ax, cx
pop cx
pop dx
endm
;
cmpadr
mov si, offset [arr]
mov di, offset [ddi]
or ax, ax
jz over
mov cx, n
jns down
mov ax, cx
dec ax
add si, ax
add di, ax
std
down: rep movsb
cld
over: int 1
code ends
end strt
The macro cmpadr given above assumes that the ES and DS segments are different, that the
source array starts at offset arr in the data segment, and that the destination array is in
the extra segment and starts at offset ddi. Further, the macro saves all registers other
than AX. If the absolute address of the destination array is less than that of the source
array, data can be moved from start to end of the array, irrespective of overlap, to get
the correct result. For this condition, the value in AX at the end of the macro is 1. If the
destination start address is greater than the source start address, AX will have -1. In the
very rare case where the two happen to be the same, no move need be done at all; the zero
flag would have been set at the end of the macro, and the main program also tests AX for
zero before moving. However, if the data and extra segments are the same,
the cmpadr macro need not be used and the program of Block move 1 given earlier will
be adequate.
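The comparison made by cmpadr amounts to comparing two physical addresses. A minimal Python sketch of that idea follows; it is not 8086 code, the function names are hypothetical, and wrap-around above 1 MB is ignored in this model.

```python
def physical(segment, offset):
    """Physical address formed from segment:offset (no 1 MB wrap in this sketch)."""
    return (segment << 4) + offset

def cmpadr(ds, arr, es, ddi):
    """Return 1 if the source is above the destination, -1 if below, 0 if equal."""
    diff = physical(ds, arr) - physical(es, ddi)
    return (diff > 0) - (diff < 0)

# Two different segment:offset pairs can name the same byte of memory.
print(cmpadr(0x1001, 0x0000, 0x1000, 0x0010))
```

A result of -1 corresponds to the case handled with std in the program above, and 1 to the case handled with cld.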
odd numbers one by one; by successively dividing the number by the odd numbers
and testing if the remainder is zero. Stop whenever the remainder becomes zero
and declare the number not prime. The process can be improved if, after seeing that
the number is not divisible by 3, we skip division by odd numbers which are
multiples of 3. The sequence of trial divisors is thus 3, 5, 7, 11,
13, 17 etc. Step 4. Terminate the process and declare the number prime when
the square of the trial divisor exceeds the given number. The logic of this step is
that if a given number has no factor less than or equal to its square root, then it
cannot have a factor bigger than the square root either: a factor bigger than the
square root implies that the quotient of the division is less than the
square root, and this quotient is also a factor of the given number. Since we have found
no such smaller number evenly dividing the given number, there cannot be a factor
greater than the square root. In the program given below, the steps of checking whether
the next trial divisor is greater than the square root and of dividing to see whether it
is a factor of the given number are bundled into a macro. The program follows.
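The steps above can be sketched in a few lines to check the reasoning. This is a Python model, not 8086 code; the names trial_divisors and classify are hypothetical, and the three outcomes are returned as strings in place of the values placed in AX by the assembly program.

```python
def trial_divisors():
    """Yield 3, 5, 7, 11, 13, 17, ...: odd numbers, skipping multiples of 3 after 3."""
    yield 3
    divisor, step = 5, 2
    while True:
        yield divisor
        divisor += step
        step = 6 - step              # steps alternate between +2 and +4

def classify(n):
    """Return ('invalid', None), ('prime', None) or ('not prime', smallest_factor)."""
    if n < 2:
        return ("invalid", None)
    if n == 2:
        return ("prime", None)
    if n % 2 == 0:
        return ("not prime", 2)
    for divisor in trial_divisors():
        if divisor * divisor > n:    # Step 4: square exceeds the number
            return ("prime", None)
        if n % divisor == 0:
            return ("not prime", divisor)

print(classify(91), classify(97))
```

Some trial divisors, such as 25 and 35, are not themselves prime, but the first divisor that succeeds is always the smallest prime factor, because any smaller prime has already been tried.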
The Assembly language program
; This program considers an input no. in the register ax. The output from the
; program is in register ax. If ax = 0 at output, then the number is a prime,
; else if ax = -1 (FFFF H), then the number is not a prime. In this case, the
; smallest prime factor of the number is in cx register. Else, if ax has the
; number ABCD H, then the input number is invalid (0 or 1). In this case, the
; CY flag also will be set. In all other cases, the CY flag would be reset.
; The input will be found in bx at the output stage.
code segment
assume cs:code
strt: mov bx, ax; the number input in ax, is saved in bx
mov cx, 2
cmp cx, bx
ja invalid
jz prime
ror bx, 1
rol bx, 1 ; lsb is now in CY also, and bx is unaltered
jnc nprime
checkp macro n
add cl,n ; get the next trial factor
jc prime ; if cl exceeds 8 bits, the number is prime
mov ax, cx
mul cx ; get the square of the number; this will also make dx = 0
cmp ax, bx
jz nprime ; if the number equals the square, then it is not prime
ja prime ; if square is greater, then it is prime
mov ax, bx
div cx
or dx,dx ; if no remainder, then
jz nprime ; the number is not a prime
endm
jmp next
nprime: mov ax, -1
next: clc
finish: int 01
jmp strt ; used for repeated testing in the debug
code ends
end strt
Testing in debug
-u 0 82
13D5:0069 0BD2 OR DX,DX
13D5:006B 740F JZ 007C
13D5:006D EBD0 JMP 003F
13D5:006F B8CDAB MOV AX,ABCD
13D5:0072 F9 STC
13D5:0073 EB0B JMP 0080
13D5:0075 90 NOP
13D5:0076 B80000 MOV AX,0000
13D5:0079 EB04 JMP 007F
13D5:007B 90 NOP
13D5:007C B8FFFF MOV AX,FFFF
13D5:007F F8 CLC
13D5:0080 CD01 INT 01
13D5:0082 E97BFF JMP 0000
-rax
AX 0000
:ffdf
-g
AX FFFF
:ffef
-g
data segment
; here is the table of primes ending with -1.
prlst db 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61
db 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131
db 137, 139, 149, 151, 157, 163, 167, 173, 179, 181, 191, 193
db 197, 199, 211, 223, 227, 229, 233, 239, 241, 251, 255
data ends
code segment
assume cs:code, ds:data
strt: mov bx, ax; the number input in ax, is saved in bx
mov ax, data
mov ds, ax
mov cx, 2
cmp cx, bx
ja invalid
jz prime
ror bx, 1
rol bx, 1 ; lsb is now in CY also, and bx is unaltered
jnc nprime
cld
sub ch, ch
lea si, prlst
8. A program for counting the leading 0’s in a 32 bit data in regs. dx:ax
While handling the normalization process in floating point numbers, it
becomes necessary to count the leading 0’s in double length numbers. The
program is given below without any comment or demonstration of its
working. Interested readers may take it as an exercise and write
appropriate comments and also study its working by assembling it and
testing it in DEBUG. The program returns the input data intact in dx:ax,
and the count of leading zeros in cx.
THE PROGRAM FOR COUNTING THE LEADING ZEROS OF DATA IN DX:AX REGISTERS
code segment
assume cs:code
strt: Push dx
push ax
sub cx, cx
or dx, dx
jnz d16
mov cx, 16
mov dx, ax
d16: or dh, dh
jnz d8
add cx, 8
mov dh, dl
d8: test dh, 0f0h
jnz d4
add cx, 4
shl dh, 1
shl dh, 1
shl dh, 1
shl dh, 1
d4: test dh, 0C0h
jnz d2
add cx, 2
shl dh, 1
shl dh, 1
d2: test dh, 080h
jnz d1
add cx, 1
d1: or dh, dh
jnz d0
add cx, 1
d0: pop ax
pop dx
int 01
jmp strt ; to repeat with a new set of data
code ends
end strt
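Readers who take up the exercise may find a high-level model useful for checking their comments. The following Python sketch (not 8086 code; leading_zeros_32 is a hypothetical name) follows the same sequence of tests as the listing, narrowing the search from 16 bits to 8, 4, 2 and 1.

```python
def leading_zeros_32(dx, ax):
    """Count leading zero bits of the 32-bit value dx:ax by successive halving."""
    cx = 0
    if dx == 0:                          # upper word empty: 16 zeros, examine ax
        cx, dx = 16, ax
    dh, dl = dx >> 8, dx & 0xFF
    if dh == 0:                          # upper byte empty: 8 more
        cx, dh = cx + 8, dl
    if (dh & 0xF0) == 0:                 # upper nibble empty: 4 more
        cx, dh = cx + 4, (dh << 4) & 0xFF
    if (dh & 0xC0) == 0:                 # upper two bits empty: 2 more
        cx, dh = cx + 2, (dh << 2) & 0xFF
    if (dh & 0x80) == 0:                 # top bit clear: 1 more
        cx += 1
    if dh == 0:                          # the whole value was zero: the 32nd zero
        cx += 1
    return cx

print(leading_zeros_32(0x0000, 0x0001), leading_zeros_32(0x0000, 0x0000))
```

The last test is what distinguishes an input of 1 (31 leading zeros) from an input of 0 (32 leading zeros).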
Strt: mov cx, 41h ; 1 extra count - the loop starts with loopz instn.
jmp down
back: add bp, bp
adc dx, dx
adc bx, bx
adc ax, ax
down: test ah, 80h
loopz back
sub cx, 40h
neg cx
int 1
jmp strt
code ends
end strt
-u 0 18
-r bp
BP 0000 ;initially ax, bx, dx, bp are all 0’s. bp is made = 0011h.
:11
-g
-rax
AX 8800
:0
-g
AX=0000 BX=0000 CX=0040 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=001A NV UP EI PL NZ NA PO CY
13DC:001A EBE4 JMP 0000
10. A program for taking in 4 hex digits from the keyboard using the
DOS interrupt 21h, function 1: In the following program, we take a 4-digit
hex input from the keyboard. To keep the program applicable
in a variety of situations, any non-hex key pressed accidentally is ignored.
Also, if a wrong entry is made, as seen from the echo on the monitor, the
4 hex digits can simply be keyed in again without having to complete and
restart the entry, because the program takes in only the last 4 hex
digits entered. The entry is terminated by the character ‘$’. The program
and the assembled version are presented below. A macro is used and it is
highlighted in the listing.
assume cs:code
0000 code segment
0000 B9 0004 strt: mov cx,4 ; shift count
0003 BB 3030 mov bx,3030h
0006 8B D3 mov dx, bx
0008 B4 01 here: mov ah,1 ;
000A CD 21 int 21h ; input from keyboard with echo
000C 3C 24 cmp al, '$'
000E 74 1C jz next ; end of input
0010 3C 30 cmp al, 30h
0012 72 F4 jb here
0014 3C 3A cmp al,3ah
0016 72 0A jb next1
0018 24 DF and al, 0dfh; convert lower-to-upper case
001A 3C 41 cmp al, 41h
001C 72 EA jb here
001E 3C 46 cmp al, 46h
0020 77 E6 ja here
0022 8A F2 next1: mov dh, dl ; rotate regs to make way for
; fresh ascii input
0024 8A D7 mov dl, bh
0026 8A FB mov bh, bl
0028 8A D8 mov bl, al
002A EB DC jmp here ; get fresh ascii input
a2h macro r1 ;; ascii-to-hex conversion
local down0
cmp r1, 40h
jb down0
sub r1,7
down0:sub r1, 30h
endm
002C next: a2h bl
002C 80 FB 40 1 cmp bl, 40h
002F 72 03 1 jb ??0000
0031 80 EB 07 1 sub bl,7
0034 80 EB 30 1 ??0000:sub bl, 30h
a2h bh
0037 80 FF 40 1 cmp bh, 40h
003A 72 03 1 jb ??0001
003C 80 EF 07 1 sub bh,7
003F 80 EF 30 1 ??0001:sub bh, 30h
0042 D2 C7 rol bh, cl
0044 0A DF or bl, bh
a2h dl
0046 80 FA 40 1 cmp dl, 40h
0049 72 03 1 jb ??0002
004B 80 EA 07 1 sub dl,7
004E 80 EA 30 1 ??0002:sub dl, 30h
0051 8A FA mov bh, dl
a2h dh
0053 80 FE 40 1 cmp dh, 40h
0056 72 03 1 jb ??0003
0058 80 EE 07 1 sub dh,7
005B 80 EE 30 1 ??0003:sub dh, 30h
005E D2 C6 rol dh, cl
0060 0A FE or bh, dh
0062 CD 01 int 1
0064 EB 9A jmp strt
0066 code ends
end strt
Notice that the non-hex inputs are ignored and the last 4 hex keys are presented as
a 4-digit hex number in the register BX. The ASCII code (24h) for the
character ‘$’ is seen in the register AL; it is this character that has caused the
program to terminate.
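The filtering and the keep-the-last-four behaviour can be modelled as follows. This is a Python sketch, not 8086 code; read_hex4 is a hypothetical name and the keystrokes are supplied as a string instead of coming from int 21h.

```python
def read_hex4(keys):
    """Return the 16-bit value of the last four hex digits typed before '$'."""
    digits = [0x30] * 4                      # BX = DX = 3030h: four ASCII '0's
    for ch in keys:
        code = ord(ch)
        if code == 0x24:                     # '$' ends the entry
            break
        if code < 0x30:
            continue                         # below '0': ignore
        if code >= 0x3A:
            code &= 0xDF                     # fold lower case to upper case
            if code < 0x41 or code > 0x46:
                continue                     # not 'A'..'F': ignore
        digits = digits[1:] + [code]         # rotate in the new digit
    value = 0
    for code in digits:                      # the a2h macro on each character
        if code >= 0x40:
            code -= 7
        value = (value << 4) | (code - 0x30)
    return value

print(f"{read_hex4('12x3g4ab$'):04X}")
```

For the keystrokes 12x3g4ab$ the keys x and g are dropped and the result is 34AB. If fewer than four hex keys are pressed, the initial ASCII zeros supply leading zero digits.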
11. Interrupt 21h function 7: Many times we may need to wait in the
middle of a program, perhaps until we finish reading the material already
displayed, before further display is produced. Function 07
of DOS interrupt 21h is helpful here. This function causes a program
to wait until a key is pressed and only then allows the program to proceed.
The ASCII code of the key pressed is available in the AL register, as in
function 1 of interrupt 21h, but function 7 does not echo the character
pressed to the standard output of the system (that is, the monitor). A
simple program to demonstrate this function is given below:
; A simple program - waits for a key press and
; returns ‘OK’ on the monitor when any key is pressed.
There are several other interrupt 21H functions, many of which are useful
for controlling different input/output devices. Information on these functions is readily
available on the Internet. They simplify I/O operations such as disk reading/writing
and video display handling. It is not the purpose here to go into these ready-made
programs and their use.
_____xxxx_____
EXERCISES
1. Find the logic of the following 4-digit BCD to hex converter program. The input is a
4-digit BCD in reg AX, and the output is in reg DX.
Hint: this is a divide by 2 operation to get bits of the result.
code segment
assume cs: code
strt: mov cx, 16
sub dx, dx
next: mov bx, ax
and bx, 1110h
shr bx, 1
shr bx, 1
sub ax, bx
shr bx, 1
sub ax, bx
shr ax, 1
rcr dx, 1
loop next
int 1
code ends
end strt
TESTING IN DEBUG
-u 0 1d
-rax
AX 0000
:9999
-g
The program can be improved as shown; also this gives a partial hint as
to the operations done:
; This program converts 4 digit BCD to binary. Input in reg AX
; output is also returned in AX; uses regs BX, CX and DX
; the program continuously divides the BCD data by 2
; to get the 10 lsb's. The 4 msb's are then got simply
; by rotating and ORing as is, and further rotated right by 2 more bits
; (these bits are just 0’s) to properly align the hex result.
code segment
assume cs: code
strt: mov cx, 10
sub dx, dx
next: mov bx, ax
and bx, 1110h
shr bx, 1
shr bx, 1
sub ax, bx
shr bx, 1
sub ax, bx
shr ax, 1
rcr dx, 1
loop next
mov cl, 6
ror ax, cl
ror dx, cl
or ax, dx
int 1
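After working out the logic, the first version of the converter (16 passes, result in DX) can be checked against a model. The sketch below is Python, not 8086 code, and bcd_to_binary is a hypothetical name. It assumes, in line with the hint, that each pass halves the BCD number and shifts the remainder bit into the result.

```python
def bcd_to_binary(ax, steps=16):
    """Halve a packed 4-digit BCD value `steps` times, collecting the bits shifted out."""
    dx = 0
    for _ in range(steps):
        bx = (ax & 0x1110) >> 2          # lsb of each upper digit, moved into the digit below
        ax -= bx                         # sub ax, bx
        ax -= bx >> 1                    # total correction of 6 before the shift, 3 after it
        carry = ax & 1
        ax >>= 1                         # shr ax, 1: BCD value is now halved
        dx = (dx >> 1) | (carry << 15)   # rcr dx, 1: remainder bit enters at the top
    return dx

print(hex(bcd_to_binary(0x9999)))
```

For the test input 9999 used in the debug session the model returns 270Fh, which is 9999 in decimal. The correction is needed because a bit that crosses from one BCD digit into the next lower digit is worth 5 there, while its binary position gives it a weight of 8.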
2. The program below reverses an array in-situ, using the array start and array end
addresses in regs SI and DI. Study the logic and the clever use of the string instructions
in the program; also study the loop control adopted in the program without using reg CX.
Check that the program works for an even number of elements in the array as well as for
an odd number of elements. Check that the central element in an array with an odd number
of elements is left as it is and not handled at all by the program.
; This program reverses an array in-situ
; watch the clever use of string instructions here
; watch also the array loop control without using reg. cx
data segment
array db 1, 2, 3, 4, 5, 6
arr_end db 7
Data ends
;
code segment
assume cs:code, ds: data, es: data
start: mov ax, data
mov ds, ax
mov es, ax
mov si, offset array
mov di, offset arr_end
std
back: mov al, [di]
xchg al, [si]
stosb
inc si
cmp si, di
jb back
int 1
code ends
end start
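The two checks asked for in this exercise can first be tried on a model of the loop. This is a Python sketch, not 8086 code; reverse_in_place is a hypothetical name and the array is an ordinary list indexed by si and di.

```python
def reverse_in_place(mem, si, di):
    """Reverse mem[si..di] in place with the SI/DI pointer scheme of the listing."""
    while True:
        al = mem[di]                 # mov al, [di]
        al, mem[si] = mem[si], al    # xchg al, [si]
        mem[di] = al                 # stosb stores AL at [di] ...
        di -= 1                      # ... and, with the D flag set, decrements DI
        si += 1                      # inc si
        if not si < di:              # jb back
            return

odd = [1, 2, 3, 4, 5, 6, 7]
reverse_in_place(odd, 0, len(odd) - 1)
even = [1, 2, 3, 4, 5, 6]
reverse_in_place(even, 0, len(even) - 1)
print(odd, even)
```

With seven elements the loop ends when si and di both point at the central element, so that element is never read or written.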
3. Write an appropriate 8086 assembly language program to test the array reversing
macro under section 6 of this Chapter. Test the working of the program.
4. Study the various int 21H functions from the internet, and write small programs to
use some of them. The site at:
bbc.nvg.org/doc/Master%20512%20Technical%20Guide/m512techb_int21.htm
for example, gives good information.
6. ILLUSTRATING THE POWER OF THE 8086 PROCESSOR
prd dw 5 dup (0)
;
code ends
end start
Testing in debug
-u 0 21
-r
AX=0000 BX=0000 CX=0036 DX=0000 SP=0000 BP=0000 SI=0000 DI=0000
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 8CC8 MOV AX,CS
-d cs:24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 00 00 00 00 4..V.g......
13DC:0030 00 00 00 00 00 00 ......
-rcx
CX 0036
:4 ; number of words in the multiplicand loaded manually.
-r
AX=0000 BX=ABCD CX=0004 DX=0000 SP=0000 BP=0000 SI=0024 DI=002C
DS=13CC ES=13CC SS=13DC CS=13DC IP=0000 NV UP EI PL NZ NA PO NC
13DC:0000 8CC8 MOV AX,CS
-g
AX=8DBB BX=ABCD CX=0000 DX=4592 SP=0000 BP=5C7A SI=002C DI=0034
DS=13DC ES=13DC SS=13DC CS=13DC IP=0023 NV UP EI PL NZ NA PO NC
13DC:0023 90 NOP
-d 24 35
13DC:0020 34 12 FE 56-AB 67 CD 89 A4 4F 9D 5F 4..V.g...O._
13DC:0030 50 77 BB 8D 7A 5C Pw..z\
It can easily be checked, using the scientific calculator of the system in hex mode
in 2 rounds, that multiplication of hex no: ‘89cd 67ab 56fe 1234’ by
hex no: ‘abcd’ is equal to hex no: ‘5c7a 8dbb 7750 5f9d 4fa4’.
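The same check can be made with a short model of the word-by-multi-word multiplication. This is a Python sketch, not 8086 code; mul_word is a hypothetical name, and the words are listed least significant first, in the order in which they appear in the memory dump above.

```python
def mul_word(multiplicand, word):
    """Multiply a little-endian list of 16-bit words by one 16-bit word."""
    product, carry = [], 0
    for w in multiplicand:
        partial = w * word + carry       # 32-bit product plus the previous high word
        product.append(partial & 0xFFFF)
        carry = partial >> 16
    product.append(carry)                # the final high word
    return product

mpd = [0x1234, 0x56FE, 0x67AB, 0x89CD]   # least significant word first
prd = mul_word(mpd, 0xABCD)
print(" ".join(f"{w:04x}" for w in reversed(prd)))
```

The printed line is 5c7a 8dbb 7750 5f9d 4fa4, in agreement with the five product words seen in the dump.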
could be used whenever there is a need without being lost. The parameters required will
be m, n, and the start addresses of the multiplicand array of m words and of the multiplier
array of n words. The operations involved are multiplying the multi-word multiplicand
by each word of the multiplier in turn, and adding these partial products with proper
alignment. An additional temporary word array of (m+1) words is required to
store the partial result of multiplying the complete multiplicand by a single multiplier
word. As we have seen in the example above, encapsulating the (word * multi-word)
multiplication in a macro may not be needed (the macro for this was used only
once in our example 1 earlier), and we shall write the complete operation as a subroutine,
with the parameters passed through the registers of the processor.
data segment
mpd dw 9fedh, 8abch, 7efah, 0fdabh ; multiplicand
dw 252 dup (0) ; multiplicand can go up to a total of 256 words.
mpr dw 0f123h, 9cdeh, 8754h, 1156h, 3478h, 73fbh ; multiplier
dw 250 dup (0) ; multiplier can also be up to 256 words
prod dw 512 dup (0) ; product array has a space of 512 words
temp dw 257 dup (?) ; temporary use, (word*256 word) = 257 words
dw 15 dup (0) ; extra space
m dw 4
n dw 6
data ends
;
code segment
assume cs:code, ds:data, es:data
strt: mov ax, data
mov ds, ax
mov es, ax
mov ax, n
mov cx, m
mov bx, offset mpr ; addresses
mov si, offset mpd
mov dx, offset temp
mov di, offset prod
call mmmult
int 1
mmmult proc near
;initialising
;prepare the stack frame
push dx ; address of temp [bp + 12]
push bx ; address of mpr [bp + 10]
push si ; address of mpd [bp + 8]
push di ; address of prod [bp + 6]
push ax ; value of n [bp + 4]
push cx ; value of m [bp + 2]
push bp
mov bp, sp
sub ax, ax
push ax
push ax ; 2 word locations, for local variables in the stack
; frame: [bp - 2] partial prod, [bp - 4] mpr word
; position (outer loop index).
cld
; now proceed to clear the temp space
; outer loop starts here
olup: sub ax,ax
mov cx, 257
mov di, offset temp
rep stosw
mov bx, [bp + 10]
add bx, [bp - 4]
mov bx, [bx]
mov di, [bp + 12]
mov si, [bp + 8]
mov cx,[bp + 2]
; inner loop starts now.
ilup: lodsw
mul bx
xchg dx, [bp - 2]
add ax, dx
adc word ptr[bp - 2], 0
stosw
loop ilup
mov ax, [bp - 2]
stosw
; inner loop over
mov cx, [bp + 2]
inc cx
mov si, [bp + 12]
mov di, [bp + 6]
add di, [bp - 4]
clc
; another loop nested inside the outer loop
comp: lodsw
adc ax, [di]
stosw
loop comp
; nested loop completed
mov [bp - 2], cx ;clear [bp - 2], note cx = 0 here.
add word ptr [bp - 4], 2
mov cx, word ptr[bp + 4]
add cx, cx
cmp cx, word ptr [bp - 4]
jnz olup
; outer loop over - prepare to return
mov sp, bp ; unwind the stack frame and clear the stack
pop bp
pop cx
pop ax
pop di
pop si
pop bx
pop dx
ret ; and return
mmmult endp
code ends
end strt
-u 0 89
147F:0020 53 PUSH BX
147F:0021 56 PUSH SI
147F:0022 57 PUSH DI
147F:0023 50 PUSH AX
147F:0024 51 PUSH CX
147F:0025 55 PUSH BP
147F:0026 8BEC MOV BP,SP
147F:0028 2BC0 SUB AX,AX
147F:002A 50 PUSH AX
147F:002B 50 PUSH AX
147F:002C FC CLD
147F:002D 2BC0 SUB AX,AX
147F:002F B90101 MOV CX,0101
147F:0032 BF0008 MOV DI,0800
147F:0035 F3 REPZ
147F:0036 AB STOSW
147F:0037 8B5E0A MOV BX,[BP+0A]
147F:003A 035EFC ADD BX,[BP-04]
147F:003D 8B1F MOV BX,[BX]
147F:003F 8B7E0C MOV DI,[BP+0C]
147F:0042 8B7608 MOV SI,[BP+08]
147F:0045 8B4E02 MOV CX,[BP+02]
147F:0048 AD LODSW
147F:0049 F7E3 MUL BX
147F:004B 8756FE XCHG DX,[BP-02]
147F:004E 03C2 ADD AX,DX
147F:0050 8356FE00 ADC WORD PTR [BP-02],+00
147F:0054 AB STOSW
147F:0055 E2F1 LOOP 0048
147F:0057 8B46FE MOV AX,[BP-02]
147F:005A AB STOSW
147F:005B 8B4E02 MOV CX,[BP+02]
147F:005E 41 INC CX
147F:005F 8B760C MOV SI,[BP+0C]
147F:0062 8B7E06 MOV DI,[BP+06]
147F:0065 037EFC ADD DI,[BP-04]
147F:0068 F8 CLC
147F:0069 AD LODSW
147F:006A 1305 ADC AX,[DI]
147F:006C AB STOSW
147F:006D E2FA LOOP 0069
147F:006F 894EFE MOV [BP-02],CX
147F:0072 8346FC02 ADD WORD PTR [BP-04],+02
147F:0076 8B4E04 MOV CX,[BP+04]
147F:0079 03C9 ADD CX,CX
147F:007B 3B4EFC CMP CX,[BP-04]
147F:007E 75AD JNZ 002D
147F:0080 8BE5 MOV SP,BP
147F:0082 5D POP BP
147F:0083 59 POP CX
147F:0084 58 POP AX
147F:0085 5F POP DI
147F:0086 5E POP SI
147F:0087 5B POP BX
147F:0088 5A POP DX
147F:0089 C3 RET
-g 1a
AX=0006 BX=0200 CX=0004 DX=0800 SP=0000 BP=0000 SI=0000 DI=0400
DS=13DC ES=13DC SS=13DC CS=147F IP=001A NV UP EI PL NZ NA PO NC
147F:001A E80200 CALL 001F
13DC:0000 ED 9F BC 8A FA 7E AB FD-00 00 00 00 00 00 00 00 .....~..........
-g
AX=0006 BX=0200 CX=0004 DX=0800 SP=0000 BP=0000 SI=0000 DI=0400
DS=13DC ES=13DC SS=13DC CS=147F IP=001F NV UP EI PL ZR NA PE NC
147F:001F 52 PUSH DX
The main program and the subroutine above together use 8AH (138 decimal) bytes of memory,
and the subroutine alone uses only 107 bytes, with fewer than 60 instructions.
It can therefore be observed that this program, as it is, will be useful for
multiplication of up to 256-word by 256-word hex numbers, that is, binary 4096-bit by
4096-bit numbers. Operations of this magnitude are needed in cryptography and
other applications. An example of 255 x 255 word (or 4080 x 4080 bit) multiplication is
shown below. (However, nothing prevents us from using the entire data segment, in
which case we can easily go up to 32000-digit hex numbers for our multiplier and
multiplicand.)
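Results of this size are impractical to verify by hand, so a model of the subroutine is useful as a reference. The following Python sketch is not 8086 code, and the names to_int and multiword_multiply are hypothetical. It follows the same scheme of an outer loop over multiplier words, an inner word-by-multi-word product held in a temporary array, and an aligned addition into the product array, and it relies on Python's unlimited-size integers only for the final comparison.

```python
def to_int(words):
    """Value of a little-endian list of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))

def multiword_multiply(mpd, mpr):
    """Schoolbook product of two little-endian word lists (len(mpd)+len(mpr) words)."""
    prod = [0] * (len(mpd) + len(mpr))
    for j, mword in enumerate(mpr):            # outer loop: one multiplier word
        temp, high = [], 0
        for w in mpd:                          # inner loop: word * multi-word
            partial = w * mword + high
            temp.append(partial & 0xFFFF)
            high = partial >> 16
        temp.append(high)                      # temp now holds m+1 words
        carry = 0
        for i, t in enumerate(temp):           # add temp into prod at word offset j
            total = prod[j + i] + t + carry
            prod[j + i] = total & 0xFFFF
            carry = total >> 16
    return prod

big = [0xFFFF] * 255                           # the 255-word operands of the example
product = multiword_multiply(big, big)
print(to_int(product) == (2**4080 - 1) ** 2)
```

The comparison in the last line checks the 255 x 255 word case against the closed form (2^4080 - 1)^2.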
data segment
mpd dw 255 dup (0ffffh) ; multiplicand
dw 0
mpr dw 255 dup (0ffffh) ; multiplier
dw 0
prod dw 512 dup (0) ; product array
temp dw 257 dup (?) ; temporary use
dw 15 dup (0)
m dw 255
n dw 255
data ends
;
code segment
; this and the un-assembled program are the same as shown earlier.
TESTING IN DEBUG
-d0 9ff ; The displayed data are separated and labeled for the sake of clarity
; multiplicand below- 255 words
13DC:0000 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0010 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0020 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0030 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0040 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0080 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0090 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0100 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0110 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0120 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0130 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0140 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0150 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0160 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0170 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0180 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0190 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:01F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
;The green highlighted words here and below are not in the data or results
13DC:02C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:02D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:02E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:02F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0300 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0310 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0320 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0330 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0340 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0350 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0360 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0370 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0380 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0390 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:03F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF 00 00 ................
13DC:0680 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0690 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:06F0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0700 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0710 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0720 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0730 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0740 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0750 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0760 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0770 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0780 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0790 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07A0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:07F0 FF FF FF FF FF FF FF FF-FF FF FF FF 00 00 00 00 ................
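The product in the dump above can be cross-checked on a host machine. What follows is only a minimal Python sketch and not part of the MASM program: the names to_int and multiply_words are hypothetical, and a plain schoolbook loop is assumed in place of the temp array used by the assembly routine. For 255 words of FFFF multiplied by 255 words of FFFF, it gives a least significant word of 0001, zeros up to word 254, FFFE at word 255 and FFFF from word 256 to word 509.

```python
WORD = 1 << 16  # one 8086 word

def to_int(words):
    """Value of a little-endian array of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))

def multiply_words(mpd, mpr):
    """Schoolbook multiplication of two little-endian 16-bit word arrays."""
    prod = [0] * (len(mpd) + len(mpr))
    for j, r in enumerate(mpr):            # outer loop: one multiplier word
        carry = 0
        for i, d in enumerate(mpd):        # inner loop: each multiplicand word
            t = d * r + prod[i + j] + carry
            prod[i + j] = t % WORD         # low word stays in place
            carry = t // WORD              # high word is carried forward
        prod[j + len(mpd)] = carry
    return prod

mpd = [0xFFFF] * 255   # same test data as the listing
mpr = [0xFFFF] * 255
prod = multiply_words(mpd, mpr)
```

Comparing to_int(prod) with to_int(mpd) * to_int(mpr) confirms the word pattern independently of the loop.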
3. Handling Large BCD Numbers: The Intel 8086 provides only minimal facilities for
handling decimal numbers, and these are quite inadequate for large decimal numbers in
the BCD representation. However, it is possible to handle BCD directly by keeping the
computations in hex at word level and using a power of ten for inter-word operations.
I call this the method of power BCD. The meaning will become clear if we consider an
example. Suppose we want to multiply the BCD numbers 1234 and 5678. We multiply the
hex equivalent of 1234 by the hex equivalent of 5678 and divide the result by 10000
decimal (10^4 for 4-digit BCD computations) to get two hex words. If we convert these
two hex words to their BCD equivalents, we get the complete product in BCD. Our first
step is to convert 1234 and 5678 into their hex equivalents, which turn out to be 4D2
and 162E. These two hex numbers can be multiplied directly to obtain the hex product
6AE9BC. Dividing this by 10000 (2710 hex), we obtain the quotient 2BC and the remainder
19FC. Converting these two hex numbers to BCD gives 0700 6652, the complete product in
the BCD form. This can be called the BCD4 method, as the 4-digit BCD is handled as
4-digit hex, one word at a time. Two conversion processes become necessary here. First,
we convert the BCD words to hex; second, we convert the hex words less than 10000
(decimal) to BCD. We have done these conversions already, and in this context they can
be frozen into optimized macros to be invoked whenever needed. The two macros are
shown below:
Condh macro ;; macro to convert BCD decimal word to hex word
mov bx, ax ;; the data for conversion is assumed in ax
and ax, 0f0f0h
ror ax, 1
ror ax, 1
sub bx, ax
ror ax, 1
sub bx, ax ;; the given number is now a concatenation of two hex
;; bytes; BCD 9863, for example, becomes the hybrid
;; (62)(3F), where 62H = 98 BCD and 3FH = 63 BCD
mov al, 100
mul bh
sub bh, bh
add ax, bx ;; result of conversion in ax.
Endm
Conhd macro ;; macro to convert hex word < 2710H to BCD word.
;; input hex word (< 2710H), assumed in ax
mov bl, 100
div bl
mov bl, ah
aam
xchg ax, bx
aam
xchg ah, bl
shl bx, 1
shl bx, 1
shl bx, 1
shl bx, 1
add ax, bx ;; output BCD in ax. Macro uses only bx with ax.
Endm
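As a cross-check of the two macros and of the worked example 1234 x 5678, the following Python sketch follows the same steps. It is only a model under stated assumptions: the function names condh_hybrid, condh and conhd are hypothetical, and the rotates are modelled as plain shifts, which is valid here because only zero bits are moved after the AND with F0F0.

```python
def condh_hybrid(ax):
    """Condh up to the hybrid word: each byte becomes 10*tens + units."""
    bx = ax
    ax &= 0xF0F0      # 16 * tens digit in each byte
    ax >>= 2          # 4 * tens (the two RORs only move zero bits)
    bx -= ax
    ax >>= 1          # 2 * tens
    bx -= ax          # 16t + u - 4t - 2t = 10t + u in each byte
    return bx

def condh(ax):
    """Packed 4-digit BCD word to its binary (hex) value."""
    bx = condh_hybrid(ax)
    return 100 * (bx >> 8) + (bx & 0xFF)    # mul bh by 100, then add bl

def conhd(ax):
    """Binary value below 10000 to a packed 4-digit BCD word."""
    q, r = divmod(ax, 100)                   # div bl with bl = 100
    return ((q // 10) << 12) | ((q % 10) << 8) | ((r // 10) << 4) | (r % 10)

product = condh(0x1234) * condh(0x5678)      # 4D2h * 162Eh = 6AE9BCh
high, low = divmod(product, 10000)           # quotient 2BCh, remainder 19FCh
print(f"{conhd(high):04X} {conhd(low):04X}") # prints 0700 6652
```

The hybrid word for BCD 9863 comes out as 623F, in agreement with the comment in the Condh macro.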
Both the above macros are reasonably optimal and use only the BX register as an
additional resource for the conversion. The input and the converted output are
both in register AX. Now we should look at the multiplication of two such words to
produce a result consisting of hex words such that converting each hex word to
BCD and concatenating the results produces the decimal digit string of the product. This
corresponds to the operation in the inner loop of the previous program. That program can
easily be modified by adding temporary storage in the stack frame for the number
2710H at the location, say, [BP - 6]. The inner loop will now become:
ilup: lodsw
mul bx
add ax, word ptr [bp - 2] ; [bp - 2] contains the number of 10000's carried
; from the previous multiplication
adc dx, 0
div word ptr [bp - 6] ; [bp - 6] has 10000 decimal (2710h)
xchg ax, dx
mov word ptr [bp - 2], dx ; 10000's saved for the next word multiplication
stosw
loop ilup
mov ax, word ptr [bp - 2]
stosw
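The effect of this loop can be modelled in a few lines of Python. This is a sketch under assumptions, not a translation of the full routine: the names bcd4_mul_word and from_bcd4 are hypothetical, the variable carry stands for the word kept at [bp - 2], and BASE stands for the constant kept at [bp - 6].

```python
BASE = 10000  # the constant held at [bp - 6]

def bcd4_mul_word(mpd_words, mpr_word):
    """Multiply a little-endian BCD4 word array by a single BCD4 word."""
    carry = 0                                # plays the role of [bp - 2]
    out = []
    for word in mpd_words:                   # lodsw
        total = word * mpr_word + carry      # mul bx, add ax, adc dx
        carry, digit = divmod(total, BASE)   # div word ptr [bp - 6]
        out.append(digit)                    # stosw stores the remainder
    out.append(carry)                        # final carry word after the loop
    return out

def from_bcd4(words):
    """Decimal value represented by a little-endian BCD4 word array."""
    return sum(w * BASE**i for i, w in enumerate(words))

partial = bcd4_mul_word([5678, 1234], 5678)  # 12345678 * 5678
```

Every word of the result stays below 10000, and the quotient of the division never exceeds 9999, so the DIV instruction cannot overflow for BCD4 operands.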
Simple addition of two words in this power BCD4 system can be seen in the
following program:
BCD4_add: add ax, bx
add ax, 0d8f0h; 0d8f0h is the negative of 2710h (10000 decimal)
jc down
sub ax, 0d8f0h
down:
It is easy to see that the above program leaves the data in ax corrected for BCD4
addition, along with the proper carry in the flag register for the BCD4 add operation.
The highlighted portions in the above programs are the extra steps needed for
decimal operation in the loop and in the addition program. Everything else remains
the same as in the hex program, except for the conversion of the original BCD data at
the beginning and the final conversion of the result from BCD4 form to BCD form at the
end. If other operations are also required on the BCD numbers, the power BCD
representation can be conveniently used right through without much difficulty.
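The behaviour of the BCD4_add routine can be checked with a small Python model of 16-bit arithmetic. This is only a sketch: the function name bcd4_add is hypothetical, both inputs are assumed to be valid BCD4 words (below 10000), and the returned pair models the contents of ax and the carry flag at the label down.

```python
NEG_10000 = 0xD8F0   # 65536 - 10000, the two's complement of 2710h

def bcd4_add(ax, bx):
    """Return (ax, carry) as left by the BCD4_add sequence."""
    ax = ax + bx                          # add ax, bx (no overflow below 10000)
    total = ax + NEG_10000                # add ax, 0d8f0h
    carry = 1 if total > 0xFFFF else 0    # carry set when the sum reached 10000
    ax = total & 0xFFFF
    if not carry:                         # jc down not taken
        ax = (ax - NEG_10000) & 0xFFFF    # sub ax, 0d8f0h restores the sum
    return ax, carry

print(bcd4_add(6652, 4000))   # prints (652, 1)
print(bcd4_add(1234, 5678))   # prints (6912, 0)
```

When the sum is 10000 or more, the addition of D8F0 wraps around and leaves the sum less 10000 with the carry set; otherwise the subtraction simply undoes the addition and leaves the carry clear.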
prod dw 8192 dup (0) ; cleared product array
mpd dw 4096 dup (?) ; mpd array in BCD4 form used for the computations.
temp dw 4097 dup (?) ; reserved for temporary use
dw 15 dup (0)
m dw 4096 ; no. of mpd words
n dw 4096 ; no. of mpr words
data ends
;
code segment
assume cs:code, ds:data, es:data
condh macro
mov bx, ax
and ax, 0f0f0h ;; for digit separation
ror ax, 1
ror ax, 1
sub bx, ax
ror ax, 1
sub bx, ax
mov al, 100
mul bh
sub bh, bh
endm ;; note the last add ax, bx is omitted in this
;
conhd macro
mov bl, 100
div bl
mov bl, ah
aam
xchg ax, bx
aam
xchg ah, bl
shl bx, 1
shl bx, 1
shl bx, 1
shl bx, 1
add ax, bx
endm
;
strt: mov ax, data
mov ds, ax
mov es, ax
mov ax, n ; count of words of mpr
mov cx, m ; count of words of mpd
mov bx, offset mpr ; addresses
mov si, offset prod
mov dx, offset temp
mov di, offset mpd
mov bp, offset mpdraw
call dmmult
int 1
dmmult proc near
;initialising
;prepare the stack frame
push dx ; address of temp @ [bp + 12]
push bx ; address of mpr @ [bp + 10]
push di ; address of mpd @ [bp + 8]
push si ; address of prod @ [bp + 6]
push ax ; value of n @ [bp + 4]
push cx ; value of m @ [bp + 2]
push bp ; address of mpdraw @ [bp]
mov bp, sp
sub ax, ax
push ax
push ax ; 2 word locations, for local variables in the stack
; frame: [bp - 2] partial prod, [bp - 4] mpr word
; position (outerloop index).
mov ax, 2710h
push ax ; [bp - 6] stores decimal 10000
cld
;
; conversion of raw mpd data to the BCD4 form
mov si, [bp] ; address of mpdraw
conlup1: lodsw ; note di and cx are properly loaded at entry to the proc.
condh
add ax, bx ; BCD4 in ax now
stosw
loop conlup1 ; conversion complete for mpd
jnz olup
; outer loop over - prepare to convert result to BCD
;
mov di, [bp+6] ; prod address
mov cx, [bp + 2]
add cx, [bp + 4]
conlup2: mov ax, [di]
conhd
stosw
loop conlup2 ; conversion over here, now prepare to return
mov sp, bp
pop bp
pop cx
pop ax
pop si
pop di
pop bx
pop dx
ret ; and return
dmmult endp
code ends
end strt
149F:004D 03C3 ADD AX,BX
149F:004F AB STOSW
149F:0050 E2E5 LOOP 0037
149F:0052 2BC0 SUB AX,AX
149F:0054 B90101 MOV CX,0101
149F:0057 8B7E0C MOV DI,[BP+0C]
149F:005A F3 REPZ
149F:005B AB STOSW
149F:005C 8B5E0A MOV BX,[BP+0A]
149F:005F 035EFC ADD BX,[BP-04]
149F:0062 8B07 MOV AX,[BX]
149F:0064 8BD8 MOV BX,AX
149F:0066 25F0F0 AND AX,F0F0
149F:0069 D1C8 ROR AX,1
149F:006B D1C8 ROR AX,1
149F:006D 2BD8 SUB BX,AX
149F:006F D1C8 ROR AX,1
149F:0071 2BD8 SUB BX,AX
149F:0073 B064 MOV AL,64
149F:0075 F6E7 MUL BH
149F:0077 2AFF SUB BH,BH
149F:0079 03D8 ADD BX,AX
149F:007B 8B7E0C MOV DI,[BP+0C]
149F:007E 8B7608 MOV SI,[BP+08]
149F:0081 8B4E02 MOV CX,[BP+02]
149F:0084 AD LODSW
149F:0085 F7E3 MUL BX
149F:0087 0346FE ADD AX,[BP-02]
149F:008A 83D200 ADC DX,+00
149F:008D F776FA DIV WORD PTR [BP-06]
149F:0090 92 XCHG DX,AX
149F:0091 8956FE MOV [BP-02],DX
149F:0094 AB STOSW
149F:0095 E2ED LOOP 0084
149F:0097 8B46FE MOV AX,[BP-02]
149F:009A AB STOSW
149F:009B 8B4E02 MOV CX,[BP+02]
149F:009E 41 INC CX
149F:009F 8B760C MOV SI,[BP+0C]
149F:00A2 8B7E06 MOV DI,[BP+06]
149F:00A5 037EFC ADD DI,[BP-04]
149F:00A8 F8 CLC
149F:00A9 AD LODSW
149F:00AA 1305 ADC AX,[DI]
149F:00AC 05F0D8 ADD AX,D8F0
149F:00AF 7203 JB 00B4
149F:00B1 2DF0D8 SUB AX,D8F0
149F:00B4 AB STOSW
149F:00B5 E2F2 LOOP 00A9
149F:00B7 894EFE MOV [BP-02],CX
149F:00BA 8346FC02 ADD WORD PTR [BP-04],+02
149F:00BE 8B4E04 MOV CX,[BP+04]
149F:00C1 03C9 ADD CX,CX
149F:00C3 3B4EFC CMP CX,[BP-04]
149F:00C6 758A JNZ 0052
149F:00C8 8B7E06 MOV DI,[BP+06]
149F:00CB 8B4E02 MOV CX,[BP+02]
149F:00CE 034E04 ADD CX,[BP+04]
149F:00D1 8B05 MOV AX,[DI]
149F:00D3 B364 MOV BL,64
149F:00D5 F6F3 DIV BL
149F:00D7 8ADC MOV BL,AH
149F:00D9 D40A AAM
149F:00DB 93 XCHG BX,AX
149F:00DC D40A AAM
149F:00DE 86E3 XCHG AH,BL
149F:00E0 D1E3 SHL BX,1
149F:00E2 D1E3 SHL BX,1
149F:00E4 D1E3 SHL BX,1
149F:00E6 D1E3 SHL BX,1
149F:00E8 03C3 ADD AX,BX
149F:00EA AB STOSW
149F:00EB E2E4 LOOP 00D1
149F:00ED 8BE5 MOV SP,BP
149F:00EF 5D POP BP
149F:00F0 59 POP CX
149F:00F1 58 POP AX
149F:00F2 5D POP BP
149F:00F3 5F POP DI
149F:00F4 5B POP BX
149F:00F5 5A POP DX
149F:00F6 C3 RET
-g 1d
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=001D NV UP EI PL NZ NA PO NC
1FDF:001D E80200 CALL 0022; this is before entry to subroutine
-d 2000 200f
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
-d 4000 400f
13DC:4000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 6000 600f
13DC:6000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d 8000 800f
13DC:8000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d a000 a00f
13DC:A000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d c000 c02f
13DC:C000 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:C020 00 10 00 10 00 00 00 00-00 00 00 00 00 00 00 00 ................
m n
-g ;go and complete the subroutine and stop at ‘int 1’.
AX=1000 BX=2000 CX=1000 DX=A000 SP=0000 BP=0000 SI=4000 DI=8000
DS=13DC ES=13DC SS=13DC CS=1FDF IP=0022 NV UP EI NG NZ NA PE NC
1FDF:0022 52 PUSH DX ; this is after exit from subroutine
-d 0 c02f ; full display from this command, but only some portion is shown
; below
; view of the relevant portion of the data segment
; at first, the multiplicand in BCD, (1000h words = 16384 decimal digits)
13DC:0000 98 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:0010
; the memory from here to the location shown filled entirely with 99
13DC:1FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
; multiplicand up to this.
; now the multiplier in BCD, also the same size as multiplicand
13DC:2000 97 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
13DC:2010
; multiplier is also filled with data 99 up to the line shown below.
13DC:3FF0 99 99 99 99 99 99 99 99-99 99 99 99 99 99 99 99 ................
;except for the highlighted words in mpd and mpr, the rest of the words are all
;decimal 9999
;the highlighted words in the above result indicate the correctness of the
;calculation; the rest of the words are 0000 in the first half and 9999 in the
;last half, as they should be.
data segment
dw 8000h dup(?) ; space reserved for result
data ends
code segment
assume cs:code, ds:data, es:data, ss:stack
start: mov ax, data
mov es,ax
mov ds,ax
mov ax,stack
mov ss,ax
mov sp,offset tos ;stack initialization
again1: int 1 ;now, load any number whose factorial is to be found
;into reg. ax (no. to be less than or = 7db1 hex).
call fact
jmp again1 ;get factorial of another number; load number in ax.
;This procedure can find the factorial of any number up to 7db1 hex or
;32177 decimal. It is basically in two steps.
;Step 1 stores the data input in ax, if necessary,
;in 2 memory words in the BCD4 format in the little endian fashion.
;This operation is also common to the multiplication of step 2, and so it
;comes at the last part of step 2.
;Step 2 consists of multiplying successively by a number one less than
;that used in the previous multiplication.
;Before concluding, step 3 marks the end of the result with a flag word FFFF and
;converts the BCD4 to normal BCD format, so
;the final result is in BCD in the big endian fashion.
;The procedure assumes ds and es point to the same segment in memory.
;step 1
mov bp,ax ; save ax in bp
cld
sub di, di ;initialize di to 0
mov cx, di ;clear cx, so that store of initial data value is OK
mov bx,10000 ;for BCD4 handling
or ax,ax ;ax = 0?
jnz check ;if no,do further check (in step 2 later part)
inc ax ; ax is 0, so put its factorial in ax.
jmp store ;go to store the result (in step 2)
;step 2
;di has 0, the start address of the multiplicand; bp has the
;multiplier in hex; both multiplier and multiplicand are thus in BCD4
;form. so, multiplication in BCD4 form is carried out. cx has the number
;of words in the result (initialised to 0). si has 0 initially.
loop mult ;on exit from the loop cx = 0.
mov ax, si ;check if higher part is there in the result
or ax, ax
jz process ;if so, go to process it.
;else check ax as shown next.
;store from now on, is the same as the initial input store
; so this forms the part 2 of step 1.
;step 3. This step is used to convert the result in BCD4 to regular BCD
;and store it in the big-endian fashion so as to make it easy to view.
;note di at this point has an address 2 more than the last word stored
div bl
xchg ah,bh
aam
rol al,cl
ror ax,cl
xchg bh,al
aam
rol al,cl
rol ax,cl
xchg al,bh
ret
conv endp
code ends
end start
Testing in debug
-g
AX=23DC BX=0001 CX=02A7 DX=0000 SP=0200 BP=0000 SI=0000 DI=0000
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PO NC
23FC:0011 E80200 CALL 0016
-rax
AX 23DC
:2f ; (2f hex = 47 decimal)
-g
AX=14F1 BX=0064 CX=0004 DX=0A1A SP=0200 BP=0002 SI=001E DI=000E
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI PL NZ NA PE NC
23FC:0011 E80200 CALL 0016
; help for finding the termination of the result
-d0 2f
13DC:0000 25 86 23 24 15 11 16 81-80 64 29 64 35 51 53 61 %.#$.....d)d5QSa
13DC:0010 19 79 96 91 97 63 23 89-12 00 00 00 00 00 FF FF .y...c#.........
13DC:0020 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; this result can be verified in the scientific calculator.
; the word FFFF at address 001E flags the end
; of the result.
-rax
AX 14F1
:7db1 ; this is the maximum input possible- limited by the data segment size.
-g
AX=17FA BX=0064 CX=0004 DX=118C SP=0200 BP=0002 SI=FFFE DI=7FFE
DS=13DC ES=13DC SS=23DC CS=23FC IP=0011 NV UP EI NG NZ NA PO NC
23FC:0011 E80200 CALL 0016; location FFFE is where the result ends
-d0
13DC:0000 44 92 60 64 35 41 31 94-14 42 51 01 76 89 57 78 D.`d5A1..BQ.v.Wx
13DC:0010 61 91 34 55 16 28 36 58-31 77 63 99 74 35 01 00 a.4U.(6X1wc.t5..
13DC:0020 32 73 17 02 13 31 81 84-14 30 27 95 19 99 90 37 2s...1...0'....7
13DC:0030 95 41 58 53 15 06 58 26-14 94 02 11 78 98 01 64 .AXS..X&....x..d
13DC:0040 93 45 78 83 32 67 60 39-09 31 74 21 27 98 88 43 .Ex.2g`9.1t!'..C
13DC:0050 85 94 64 18 99 56 02 57-67 69 88 66 04 39 88 25 ..d..V.Wgi.f.9.%
13DC:0060 71 42 58 06 97 36 78 12-57 89 63 16 15 96 84 35 qBX..6x.W.c....5
13DC:0070 90 71 06 01 34 12 73 32-65 39 49 62 85 55 61 40 .q..4.s2e9Ib.Ua@
;initial significant part of the result
-d f000
13DC:F050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:F060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:F070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; from here till the end, the entries are all zeros.
-d fff0
13DC:FFF0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 FF FF ................
; The result covers the full data segment of 65536 bytes (excepting the last two
;bytes). The result can be verified to a good extent using the Scientific
;calculator. Only portions of the result in the data seg. are shown above.
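The factorial of 47 shown in the first dump above can also be verified with a short Python model of the BCD4 scheme. This is a sketch under assumptions and not a translation of the procedure: the names factorial_bcd4 and to_decimal_string are hypothetical, the result is held as little-endian words in base 10000, and each step multiplies every word by the plain binary multiplier and carries in units of 10000.

```python
import math

BASE = 10000   # one BCD4 word holds four decimal digits

def factorial_bcd4(n):
    """Factorial of n as a little-endian list of base-10000 words."""
    words = [1]
    for k in range(2, n + 1):                # successive multipliers
        carry = 0
        for i in range(len(words)):
            carry, words[i] = divmod(words[i] * k + carry, BASE)
        while carry:                         # result grew by one or more words
            carry, w = divmod(carry, BASE)
            words.append(w)
    return words

def to_decimal_string(words):
    """Digits with the most significant word first (big-endian view)."""
    head = str(words[-1])
    return head + "".join(f"{w:04d}" for w in reversed(words[:-1]))

digits = to_decimal_string(factorial_bcd4(47))
```

The string in digits begins with 2586232415111681, matching the bytes 25 86 23 24 15 11 16 81 at the start of the dump, and agrees with math.factorial(47).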
Algorithm: Interleaved Modular Multiplication. The algorithm shown below does the
job reasonably well and is easy to understand. The product is obtained by multiplying
X by the bits of Y one at a time, most significant bit first, and M is removed at each
stage of the multiplication. Each stage requires at most 2 subtractions of M.
There are other algorithms available, but here we give the interleaved modular
multiplication.
INPUT: X, Y, M with 0 <= X, Y <= M
X = sum of x_i * 2^i, i = 0, 1, ..., n-1; and similarly Y and M in terms of bits.
OUTPUT: P = X * Y mod M
n: number of bits of each of X, Y and M
y_i: ith bit of Y
1. P = 0;
2. for (i = n - 1; i >= 0; i--)
3. P = 2 * P;
4. if (P >= M) P = P - M;
5. I = y_i * X;
6. P = P + I;
7. if (P >= M) P = P - M;
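The seven steps above can be sketched in Python as follows. This is a minimal model, not the assembly routine: the function name interleaved_modmul is hypothetical, Python integers stand in for the 80-word arrays, and X is assumed to be smaller than M so that one subtraction after each doubling and one after each addition are enough.

```python
def interleaved_modmul(x, y, m, n):
    """Return x * y mod m by scanning the n bits of y, most significant first."""
    p = 0                                # step 1
    for i in range(n - 1, -1, -1):       # step 2: i = n-1 down to 0
        p = 2 * p                        # step 3
        if p >= m:                       # step 4
            p -= m
        if (y >> i) & 1:                 # steps 5 and 6: add X when y_i = 1
            p += x
        if p >= m:                       # step 7
            p -= m
    return p

X = 0x000FFFFFFFFFFFFF                   # test data of the listing that follows
Y = 0x000FFFFFFFFFFFFF
M = 0x0010000000000001
P = interleaved_modmul(X, Y, M, 1280)    # 80 words of 16 bits
```

With this data X = M - 2, so the product reduces to 4 modulo M, which gives a quick sanity check of the loop.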
data segment
x dw 65535, 65535, 65535, 15, 76 dup(0); X = 000F FFFF FFFF FFFF hex
dw 65535, 65535, 65535, 15, 75 dup(0); Y = 000F FFFF FFFF FFFF hex
y dw 0 ; msw of Y
m dw 1, 0, 0, 16, 76 dup (0) ; M = 0010 0000 0000 0001 hex
p1 dw 80 dup(0) ; P of the algorithm
p2 dw 80 dup(0) ; Scratch pad for temporary use
xn dw 80
p1n dw 80 ; 80 words, i.e. 1279-bit data (1 bit margin left to accommodate the carry
; on addition), can be had for each of X, Y and M
;
;
data ends
;
code segment
assume cs:code, ds:data, es:data
;
; since there are several variables involved,
; it is better to use the stack for variable store
;
start: push ax
push bx
push di
push dx
push cx ; used regs saved in stack
mov ax, data
mov ds, ax
mov es, ax ; segments initialized
; parameters stored in stack frame
;
mov bx, offset x ; [bp+10]
push bx
mov bx, offset y ; [bp+8]
push bx
mov bx, offset m ; [bp+6]
push bx
mov bx, offset p1 ; [bp+4]
push bx
mov bx, offset p2 ; [bp+2]
push bx
push bp
mov bp, sp
sub sp, 2 ; temp at [bp-2]
cld
;
mov bx, [bp+8]
lup1: mov ax, [bx]
sub bx, 2
push bx ; bx points to next lower word of Y
mov [bp-2], ax ; the current word of y
mov dx, 16
lup2: mov di, [bp+4]; p1 is doubled
mov si, di
call addchk ; 2*p1 --> p1; (p1 – M) --> p2; if borrow, ignore p2
; else interchange pointers p1 and p2
mov ax, [bp-2]
shl ax, 1
mov [bp-2], ax
jnc down1
mov di, [bp+4]; address p1
mov si, [bp+10]
call addchk ; p1 + X --> p1; p1 – M --> p2; if borrow, ignore p2
; else, interchange the pointers p1 and p2
down1: dec dx
jnz lup2
pop bx
dec [p1n]
jnz lup1
mov si, [bp+4]
mov sp, bp
pop bp
add sp, 10; clear the stack frame
pop cx
pop dx
pop di
pop bx
pop ax ; retrieve the stacked parameters
int 1
;
;
addchk proc near
;
;this procedure adds [di] to [si] and subtracts m from the sum
;and puts the result in p2. If the result ends in a borrow, on
;subtraction, the result in p2 is ignored.
;Else, if no borrow, the result of subtraction goes to p1. This
;is achieved by simply interchanging p1 and p2 pointers,
;in case of no borrow on subtraction.
mov cx, xn
clc
back10: lodsw
adc ax, [di]
stosw ; [si] + [di] --> [di]
loop back10 ;
mov si, [bp+4] ; address p1
mov di, [bp+2] ; address p2
mov bx, [bp+6] ; address m
mov cx, xn
clc
back11: lodsw ; Computation p1-m --> p2
sbb ax, [bx]
inc bx
inc bx ; so that carry will not change
stosw
loop back11
jc down10
mov ax, [bp+2]
xchg ax, [bp+4]
mov [bp+2], ax ; pointers to p1 and p2 exchanged
down10: ret
addchk endp
code ends
end start
The program occupies 152 bytes, apart from the data space in the data
segment. All registers are preserved across the program except
register SI, which points to the result of the modular multiplication. The data segment
must have the indicated labels for the program to work. If these labels are adhered to,
the whole program can be used as a subroutine. Alternatively, the offset addresses of
the data may be passed to the subroutine through registers, and these
registers may be stacked directly. That will be helpful in cryptographic applications.
The program works only when at least one of the two numbers to be multiplied is
smaller than M, and the smaller one should be taken as X. The size of Y does not matter.
In cryptographic processes such as the RSA algorithm, both X and Y will be
smaller than M. The reader can easily see why this is so by looking at the algorithm.
Then x' (the next approximation for x) = x - [f(x) / f'(x)]; Newton-Raphson
Or x' = x - [(a - 1/x) / (1/x^2)] = x - a*x^2 + x = x*(2 - a*x), which becomes
x' = x*(2 - 1 + d), since a*x = 1 - d. Thus, x' = x*(1 + d).
a*x' will now be a*x*(1 + d) = 1 - d^2, since a*x = 1 - d.
The last equation above indicates that the error in the
iterated value x' is the square of the error in x. The calculation in each iteration
involves only multiplication and subtraction, because x' = x*(2 - a*x) requires 2
multiplications and one subtraction. Since the error is squared, the number of accurate
bits doubles with each iteration. If our original approximation has
4-bit accuracy, the next iteration will have 8-bit accuracy, the next 16-bit accuracy,
and so on. Even starting from 4-bit accuracy we can reach 64-bit accuracy for the result
in 4 iterations. The table below is guaranteed to be accurate to 6 bits and is thus capable
of giving better than 32-bit accuracy in 3 rounds of iteration (actually 48 bits), good
enough for single precision floating point calculations. One more round will give better
than what is required for the extended double precision format of IEEE standard 754.
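The squaring of the error can be observed exactly with rational arithmetic. The following Python sketch is an illustration only: the function name refine_reciprocal and the sample values a = 3 and x = 5/16 are assumptions chosen so that the starting error d = 1 - a*x is exactly 1/16, that is, 4-bit accuracy.

```python
from fractions import Fraction

def refine_reciprocal(a, x, rounds):
    """Apply x' = x*(2 - a*x) repeatedly; return x and the error d after each round."""
    errors = []
    for _ in range(rounds):
        x = x * (2 - a * x)          # two multiplications and one subtraction
        errors.append(1 - a * x)     # d = 1 - a*x, squared in every round
    return x, errors

a = Fraction(3)
x0 = Fraction(5, 16)                 # a*x0 = 15/16, so d = 1/16
x, errors = refine_reciprocal(a, x0, 4)
for d in errors:
    print(d)
```

For this example the errors after successive rounds are 2^-8, 2^-16, 2^-32 and 2^-64, so the number of accurate bits doubles each time.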
The table below is prepared as follows: The inverse of 1.xxxyyy1 (where each x and y
is either bit 1 or bit 0) is taken, and its value correct to 8 binary digits is computed and
placed in the table at the position xxxyyy, where xxx corresponds to the row and
yyy to the column. For example, the entry in row 100 at column 110 corresponds to
the inverse of 1.1001101. This inverse, correct to 8-bit accuracy, turns out to be 0.1010
0000, or A0 H with a leading binary point. The maximum error in these entries occurs
exactly at the points indicated, since the entries are calculated for 1.xxxyyy1 but marked
against 1.xxxyyy; at any other point, the inverse is closer to the indicated value. For
example, the value marked at 1.100110 is valid over the range 1.100 110 to very
nearly 1.100 111. Since it corresponds to 1.100 1101, it produces the maximum error
at the two extremities of the range and is increasingly accurate towards the middle of the
interval. The worst error occurs at the value 1.000 000, and we see that the value there
is correct up to 7 bits. With this we can get an accuracy of 56 bits with three iterations,
which corresponds to the double precision IEEE standard and is good enough for any
normal FP calculations. Once the inverse is obtained, division can be carried out by
multiplying by the inverse.
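A table entry of this kind can be computed with integer arithmetic, as in the Python sketch below. The function name seed_entry is hypothetical, and rounding to the nearest 8-bit value is an assumption about how "correct to 8 binary digits" was obtained, so the entries need not match the printed table in every position.

```python
def seed_entry(row, col):
    """8-bit reciprocal of binary 1.xxxyyy1, with row = xxx and col = yyy."""
    idx = (row << 3) | col               # the six bits xxxyyy
    denom = 129 + 2 * idx                # 1.xxxyyy1 scaled by 128
    # round(256 * 128 / denom) to the nearest integer, using integers only
    return (2 * 32768 + denom) // (2 * denom)

entry = seed_entry(0b100, 0b110)         # inverse of 1.1001101
print(f"{entry:02X}")                    # prints A0
```

For the worked example the sketch gives A0 H, in agreement with the value quoted in the text.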
We shall not go into the details of this process here.
7. Division using the method followed for modular multiplication: We will instead
present a modification of the modular multiplication process that we saw in the previous
section to carry out division of arbitrarily long numbers. The process is the same as the
modular multiplication given, choosing X = 1 and Y as the dividend; the modular base
will now be the divisor. The program is given below. The program handles a dividend of
up to 1280 bits and a divisor of up to 1279 bits, because 80 words are provided as the
data size. The working memory space can be increased by providing more words. We
need 5 times the data size, and we can comfortably accommodate 3000 hex bytes (i.e.,
over 12000 decimal bytes or over 96000 bits) of data with this program. Once a
provision has been made in the program, arbitrarily smaller data can be handled by
making the leading words zero. The efficiency of
the program can be increased by using the actual word size, found with the data size
determining macro of the program given earlier. We have used it to size the divisor data,
but not the dividend. The size of the dividend can be accommodated by giving
this size at the label ndd (standing for the number of words of the dividend) in the program.
The program will need no alteration other than replacing the value of ndd by the size
value obtained for the dividend.
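Before the listing, the bit-serial scheme just described can be sketched in Python. This is only a model under assumptions: the names long_divide and from_words are hypothetical, Python integers replace the 80-word arrays, and the divisor is assumed to be non-zero. In each step the quotient and the remainder are doubled, the next dividend bit is brought into the remainder, and the divisor is subtracted whenever no borrow would result.

```python
def from_words(words):
    """Value of a little-endian array of 16-bit words."""
    return sum(w << (16 * i) for i, w in enumerate(words))

def long_divide(dividend, divisor, nbits):
    """Shift-and-subtract division; returns (quotient, remainder)."""
    quotient = 0
    remainder = 0
    for i in range(nbits - 1, -1, -1):       # most significant bit first
        quotient <<= 1                       # double the quotient
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:             # subtraction leaves no borrow
            remainder -= divisor
            quotient |= 1                    # increment the doubled quotient
    return quotient, remainder

# test data of the first run in the listing that follows
dvd = from_words([4096] * 40 + [65530] + [65535] * 39)
dr = from_words([65534] + [65535] * 39)
qt, rem = long_divide(dvd, dr, 1280)
```

For this data the quotient is 2^640 - 4, that is, a least significant word of FFFC followed by 39 words of FFFF, which is the pattern to expect in the quotient area of the dump.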
; Division of arbitrary long numbers
data segment
dvd dw 40 dup(4096), 65530, 39 dup(65535)
;dividend
dr dw 65534, 39 dup(65535), 40 dup(0) ;divisor
qt dw 80 dup(?) ; both quotient and remainder spaces are
r1 dw 80 dup(?) ; provided the same as dividend space
r2 dw 80 dup(?)
ndd dw 80 ; same space, as dividend at start
spare dw 7 dup(?);
data ends
;
code segment
assume cs:code, ds:data, es:data ;
;
strt: mov ax, data
mov ds, ax
mov es, ax
; preparing stack frame
mov ax, offset dvd ; [bp+14]
push ax
mov ax, offset dr ; [bp+12]
push ax
mov ax, offset qt ; [bp+10]
push ax
mov ax, offset r1 ; [bp+8]
push ax
mov ax, offset r2 ; [bp+6]
push ax
mov ax, ndd ; [bp+4]
push ax
call longdiv ; ret address [bp+2]
;
int 1
;
; now macros used
;
clr macro offs, n ;; clears n words starting at offs; ax is zeroed inside the macro
mov di, offs
mov cx, n
sub ax, ax
rep stosw
endm
;
double macro offsd, n
local dub
mov si, offsd
mov di, si
mov cx, n
clc
dub: lodsw
adc ax, ax
stosw
loop dub
endm
;
; to handle variable size divisors, it is necessary to find the
; exact word size of the divisor. note the dividend size is
; flexible and accommodated in the process.
drsize macro ofset, nsize
local size1
mov si, ofset
mov cx, nsize
add si, cx
add si, cx
size1: sub si, 2 ; msw address of data at ofset
mov ax, [si]
or ax, ax
loopz size1
rol ax, 1
adc cx,0
endm
clr [bp+8], [bp+4]
;
; now, the real works! First double quotient and rem. r1
drsize [bp+12], [bp+4]
mov ax, [bp+4]
mov [bp-2], ax ; current word count for loop
mov bx, [bp+14] ; dividend start address
dec ax
shl ax, 1
add bx, ax ; dividend end address
lup1: push bx
mov ax, [bx] ; msw of dividend
mov [bp-4], ax ; dividend partial word, (current)
mov dx, 16
lup2: double [bp+10], [bp+4]
double [bp+8], [bp+4]
mov ax, [bp-4]
shl ax, 1
mov [bp-4], ax
mov bx, [bp+8]
adc word ptr [bx], 0 ; inc r1, on carry
subt [bp+8], [bp+12], [bp+6], 41 ; r1-divr = r2
jc down
mov bx, [bp+10]
inc word ptr[bx] ; quotient to be incremented
; the lsw of the quotient will end with a
; 0, hence incrementing the lsw is all
; that is required to increment the quotient
; now interchange r1 and r2 pointers
mov ax, [bp+8]
xchg ax, [bp+6]
mov [bp+8], ax
down: dec dx
jnz lup2
pop bx
sub bx, 2
dec word ptr[bp-2]
jnz lup1
mov si, [bp+8] ; pointer to remainder
mov sp, bp
pop bp
ret 12
longdiv endp
code ends
end strt
; The program takes 211 bytes of memory in the code segment and about 810
; bytes of memory in the data segment, with about 30 bytes in the stack segment.
; On testing the program with the above data, the following results are obtained,
; as seen from the relevant Data Segment area.
; The result of testing the program in the debug after assembling and
; linking is presented below.
-g
AX=FFFF BX=FFFE CX=0000 DX=0000 SP=0000 BP=0000 SI=01E0 DI=02D2
DS=13DC ES=13DC SS=13DC CS=140F IP=0024 NV UP EI PL ZR NA PE CY
140F:0024 55 PUSH BP; First instn of proc(after INT 1).
-d 0 32f
; The dividend
13DC:0000 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0010 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0020 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0030 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0040 00 10 00 10 00 10 00 10-00 10 00 10 00 10 00 10 ................
13DC:0050 FA FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0080 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0090 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
; the divisor
13DC:00A0 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00B0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00C0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00D0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00E0 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0100 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0120 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; the quotient
13DC:0140 FC FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0150 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0160 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0170 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0180 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
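The quotient in the dump above can be cross-checked outside the debug. The following is a minimal sketch in Python (not part of the MASM program); it rebuilds the dividend, divisor and quotient from the dump bytes, assuming the little endian byte order used throughout, and compares the result with Python's own integer division. The helper name from_le_bytes is introduced here only for illustration. The remainder is not visible in this part of the dump, so only the quotient is compared.

```python
def from_le_bytes(data: bytes) -> int:
    """Interpret a byte string the way the 8086 stores a long integer (low byte first)."""
    return int.from_bytes(data, "little")

# 13DC:0000-004F: forty words of 1000h; 13DC:0050-009F: FFFAh followed by FFFFh words
low_half = bytes([0x00, 0x10]) * 40
dividend = from_le_bytes(low_half + b"\xfa" + b"\xff" * 79)

# 13DC:00A0-00EF: FFFEh followed by FFFFh words (the zero padding adds nothing)
divisor = from_le_bytes(b"\xfe" + b"\xff" * 79)

# 13DC:0140-018F: FFFCh followed by FFFFh words
quotient_in_dump = from_le_bytes(b"\xfc" + b"\xff" * 79)

quotient, remainder = divmod(dividend, divisor)
print(quotient == quotient_in_dump)   # prints True
print(remainder < divisor)            # prints True
```

Since the divisor is 2^640 - 2 and the quotient is 2^640 - 4, the remainder works out to the low 80 bytes of the dividend less 8.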
; The same program is tested with another set of data to show the data size
flexibility of the program.
data segment
dvd dw 23567, 34239, 12345, 77 dup(0) ;dividend
dr dw 0abcdh, 0123, 78 dup(0) ;divisor; 0123 is decimal (007Bh), see the dump
qt dw 80 dup(?) ; both quotient and remainder spaces are
r1 dw 80 dup(?) ; provided the same space as dividend space
r2 dw 80 dup(?)
ndd dw 80 ; same space, as dividend at start
spare dw 7 dup(?);
data ends
;
; Results of test
-g
; dividend
13DC:0000 0F 5C BF 85 39 30 00 00-00 00 00 00 00 00 00 00 .\..90..........
13DC:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0040 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0050 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0060 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0070 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; divisor
13DC:00A0 CD AB 7B 00 00 00 00 00-00 00 00 00 00 00 00 00 ..{.............
13DC:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0100 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0110 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0120 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0130 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; quotient
13DC:0140 50 D3 63 00 00 00 00 00-00 00 00 00 00 00 00 00 P.c.............
13DC:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0220 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
13DC:0230 FF FF 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0240 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0250 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0260 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0270 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
; the remainder r1 changed to this place; see the register si for the
; starting address of the remainder.
13DC:0280 FF B4 38 00 00 00 00 00-00 00 00 00 00 00 00 00 ..8.............
13DC:0290 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:02F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0300 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
13DC:0320 50 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 P...............
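The second test can be verified by hand or with a few lines of Python. The sketch below is only a cross-check of the arithmetic, not the book's program; the helper words_to_int is a name introduced here. It builds the operands from the dw values of the data segment (least significant word first) and prints the quotient and remainder, which can be compared with the dump at 13DC:0140 and 13DC:0280.

```python
def words_to_int(words):
    """Combine 16-bit words, least significant word first, into one integer."""
    value = 0
    for i, w in enumerate(words):
        value |= (w & 0xFFFF) << (16 * i)
    return value

dividend = words_to_int([23567, 34239, 12345])  # dvd; the remaining 77 words are 0
divisor = words_to_int([0xABCD, 123])           # dr; 0123 in the source is decimal

quotient, remainder = divmod(dividend, divisor)
print(hex(quotient), hex(remainder))            # prints 0x63d350 0x38b4ff
```

The bytes 50 D3 63 at 13DC:0140 and FF B4 38 at 13DC:0280 are these two values in little endian order.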
8. Some Macros that can be used for Large Number handling: Large numbers
can be treated as single data entities. To handle them this way, we allot a data array to
each large number. The array can be of a fixed size, just as a register holds word data of
a fixed size. A register is nominally 16 bits wide, but this does not prevent us from
storing and handling in it any data of up to 16 bits. The simple number 1 can be stored
in a 16 bit register, where it is held as 0001 hex, and it can be handled as a 16 bit
number without any computational problem. In a similar way, we can allot, say, 256 bytes
of memory to store any data of up to 256 bytes (2048 bits). The data can be referenced by
the start address of the array and by the number of data words used. The data fills the
array from the start address and extends as far as its significant bits go; the remaining
data words are filled with 0’s. For example, see the remainder value presented in the
array just above (memory locations 13DC:0280 – 031F). It can be considered as 3 byte
data, 2 word data or 160 byte data without any ambiguity. The macros presented below
handle data in this fashion. It is a good method to store data so that all operands are
arrays of equal size. The algorithms can normally handle zero data in the leading words
without serious problems, other than the extra time taken to process the zero data as
valid data. If one is particular, one can extract the exact word count and limit the
computations to that size. But this is not necessary, as some of the programs above
indicate. However, the set of macros given below includes a macro to find the actual
number of significant words of a data array.
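The storage convention just described can be illustrated with a small Python sketch. It is only a model of the idea, assuming 16 bit words stored least significant word first; the names to_words, from_words and significant_words are introduced here and do not appear in the MASM programs.

```python
WORD_BITS = 16

def to_words(value, n_words):
    """Store a non-negative integer in n_words 16-bit words, low word first, zero padded."""
    if value < 0 or value >= 1 << (WORD_BITS * n_words):
        raise ValueError("value does not fit in the array")
    return [(value >> (WORD_BITS * i)) & 0xFFFF for i in range(n_words)]

def from_words(words):
    """Read the array back as one integer; leading zero words contribute nothing."""
    return sum(w << (WORD_BITS * i) for i, w in enumerate(words))

def significant_words(words):
    """Number of words up to and including the highest non-zero word."""
    count = len(words)
    while count > 0 and words[count - 1] == 0:
        count -= 1
    return count

arr = to_words(0x38B4FF, 80)            # the remainder of the test above, in an 80-word array
print([hex(w) for w in arr[:3]])        # prints ['0xb4ff', '0x38', '0x0']
print(significant_words(arr))           # prints 2
print(from_words(arr) == from_words(arr[:2]))   # prints True
```

Reading the array as 2 words or as 80 words gives the same number, which is why the fixed-size convention causes no ambiguity.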
data segment
sc1 dw 02, 31 dup(0), 0fffeh, 30 dup (0ffffh), 0fffh, 64 dup (0)
sc2 dw 32 dup (0ffffh), 96 dup (0)
sc3 dw 128 dup (0)
des dw 128 dup (?)
des2 dw 128 dup (?)
n dw 128
data ends
;
code segment
assume cs: code, ds: data
;
; arbitrarily long integer handling macros
; (i) end address calculation
;
lend macro src, reg, n
mov reg, offset [src]
mov cx, n
dec cx
add reg, cx
add reg, cx
inc cx
endm
;
; (ii) [src1].as.[src2] --> [dest]; direction UP assumed
; 'as' is 'adc' or 'sbb'
;
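las macro src1, src2, dest, n, as
local las1
mov si, offset[src1]
mov bx, offset[src2]
mov di, offset[dest]
mov cx, n
clc ; no carry or borrow into the first word
las1: lodsw ; ax <-- next word of the first source
as ax, [bx] ; adc or sbb with the word of the second source
stosw ; store the result word
inc bx ; inc does not alter the carry flag, so the
inc bx ; carry or borrow is passed on to the next word
loop las1
endm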
;
; (iii) mov arr1 to arr2; non overlapping arrays
;
movarr macro src, dest, n
mov si, offset[src]
mov di, offset[dest]
mov cx, n
rep movsw
endm
;
; (iv) lsig obtain the significant number of words of a long integer
;
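lsig macro src, n
local lsig1
mov si, offset [src]
mov cx, n
dec cx
add si, cx
add si, cx ; si --> last (most significant) word
add cx, 2 ; count + 1, as loopz decrements cx before testing it
std ; scan towards lower addresses
lsig1: lodsw
or ax, ax ; set the zero and sign flags from the word read
loopz lsig1 ; repeat while the word is zero
cld ; direction UP restored; cx = significant words
endm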
;
strt: mov ax, data
mov ds, ax
mov es, ax
lend sc1, si, n
las sc1, sc2, des, n, adc
movarr des, des2, n
las des2, sc1, des2, n, sbb
lsig sc1, n
int 1
code ends
end strt
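The working of the las macro can be followed with a simple model. The sketch below is in Python and is only an illustration of the carry chain, not a translation of the macro; the function name las is reused for convenience and the test data are the sc1 and sc2 arrays of the data segment above.

```python
def las(src1, src2, op):
    """Word-by-word 'adc' or 'sbb' of two equal-length word arrays.

    Returns (dest, carry), where carry is the final carry or borrow,
    as the carry flag would hold it after the last word.
    """
    if len(src1) != len(src2):
        raise ValueError("arrays must have the same number of words")
    if op not in ("adc", "sbb"):
        raise ValueError("op must be 'adc' or 'sbb'")
    carry = 0                                  # clc
    dest = []
    for a, b in zip(src1, src2):               # lodsw / adc or sbb / stosw
        t = a + b + carry if op == "adc" else a - b - carry
        dest.append(t & 0xFFFF)                # keep 16 bits
        carry = 1 if (t < 0 or t > 0xFFFF) else 0
    return dest, carry

sc1 = [2] + [0] * 31 + [0xFFFE] + [0xFFFF] * 30 + [0x0FFF] + [0] * 64
sc2 = [0xFFFF] * 32 + [0] * 96

des, cf = las(sc1, sc2, "adc")      # las sc1, sc2, des, n, adc
des2, bf = las(des, sc1, "sbb")     # las des2, sc1, des2, n, sbb (des2 is a copy of des)
print(hex(des[0]), hex(des[32]), hex(des[63]), cf)
print(des2 == sc2, bf)
```

In this model the sum starts with the word 0001, has zeros up to word 31 and FFFF from word 32 onwards, and subtracting sc1 from it gives back sc2 with no borrow left over.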
TESTING IN DEBUG
-u 0 64
; INITIALISATION
0B96:0000 B8450B MOV AX,0B45
0B96:0003 8ED8 MOV DS,AX
0B96:0005 8EC0 MOV ES,AX
;
; TESTING END ADDRESS COMPUTATION OF DATA [0] OR SC1
0B96:0007 BE0000 MOV SI,0000
0B96:000A 8B0E0005 MOV CX,[0500]
0B96:000E 49 DEC CX
0B96:000F 03F1 ADD SI,CX
0B96:0011 03F1 ADD SI,CX
0B96:0013 41 INC CX
;
; TESTING LAS FOR ADD; SC1 + SC2 --> [300] OR [0] + [100] --> [300]
0B96:0014 BE0000 MOV SI,0000
0B96:0017 BB0001 MOV BX,0100
0B96:001A BF0003 MOV DI,0300
0B96:001D 8B0E0005 MOV CX,[0500]
0B96:0021 F8 CLC
0B96:0022 AD LODSW
0B96:0023 1307 ADC AX,[BX]
0B96:0025 AB STOSW
0B96:0026 43 INC BX
0B96:0027 43 INC BX
0B96:0028 E2F8 LOOP 0022
;
; TESTING MOVE; [300] --> [400]
0B96:002A BE0003 MOV SI,0300
0B96:002D BF0004 MOV DI,0400
0B96:0030 8B0E0005 MOV CX,[0500]
0B96:0034 F3 REPZ
0B96:0035 A5 MOVSW
;
; TESTING LAS FOR SUBTRACT; [400] - [0] --> [400]
0B96:0036 BE0004 MOV SI,0400
0B96:0039 BB0000 MOV BX,0000
0B96:003C BF0004 MOV DI,0400
0B96:003F 8B0E0005 MOV CX,[0500]
0B96:0043 F8 CLC
0B96:0044 AD LODSW
0B96:0045 1B07 SBB AX,[BX]
0B96:0047 AB STOSW
0B96:0048 43 INC BX
0B96:0049 43 INC BX
0B96:004A E2F8 LOOP 0044
;
; TESTING LSIG; SIGNIFICANT NUMBER OF WORDS OF [0] --> CX; SIGN FLAG INDICATES
; THE LEADING BIT OF THE LEADING WORD
0B96:004C BE0000 MOV SI,0000
0B96:004F 8B0E0005 MOV CX,[0500]
0B96:0053 49 DEC CX
0B96:0054 03F1 ADD SI,CX
0B96:0056 03F1 ADD SI,CX
0B96:0058 83C102 ADD CX,+02
0B96:005B FD STD
0B96:005C AD LODSW
0B96:005D 0BC0 OR AX,AX
0B96:005F E1FB LOOPZ 005C
0B96:0061 FC CLD
0B96:0062 CD01 INT 01
0B96:0064 2AE4 SUB AH,AH
0B45:0260 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0270 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0280 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0290 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:02F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 14
;COMPUTE END ADDRESS OF SOURCE1
AX=0B45 BX=0000 CX=0080 DX=0000 SP=0000 BP=0000 SI=00FE DI=0000
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0014 NV UP EI PL NZ AC PO NC
0B96:0014 BE0000 MOV SI,0000
-g 2a
;TEST LAS FOR ADD;
AX=0000 BX=0200 CX=0000 DX=0000 SP=0000 BP=0000 SI=0100 DI=0400
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=002A NV UP EI PL NZ AC PE NC
0B96:002A BE0003 MOV SI,0300
-d 0 3ff
; DATA [0] + [100] --> [300]
; DATA [0]
0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0040 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;DATA [100]
0B45:0100 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0110 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0120 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0130 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0140 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0150 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0160 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0170 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0180 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0190 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:01F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;
; DATA [300]
0B45:0300 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0310 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0320 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0330 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0340 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0350 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0360 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0370 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g 36
;DATA [300] --> [400]
-d 300 4ff
0B45:0340 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0350 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0360 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0370 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0380 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0390 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:03F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-d
; DATA [400] - DATA [0] --> DATA [400]
; DATA [400] BEFORE EXECUTION OF LAS
0B45:0400 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0410 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0420 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0430 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0440 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0450 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0460 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0470 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0480 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0490 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:04F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
;DATA [0]
0B45:0000 02 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0010 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0020 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0030 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0040 FE FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0050 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0060 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF FF ................
0B45:0070 FF FF FF FF FF FF FF FF-FF FF FF FF FF FF FF 0F ................
0B45:0080 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:0090 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0B45:00F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
-g
; VERIFICATION OF LSIG, THE PL FLAG SHOWS THE LEADING BIT OF THE MS WORD IS 0
AX=0FFF BX=0100 CX=0040 DX=0000 SP=0000 BP=0000 SI=007C DI=0500
DS=0B45 ES=0B45 SS=0B45 CS=0B96 IP=0064 NV UP EI PL NZ NA PE NC
0B96:0064 2AE4 SUB AH,AH
-q
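The register values seen in the debug for lend and lsig can be reproduced with the following Python trace. It is a sketch that follows the two macros step by step, assuming the array starts at offset 0 as sc1 does; the return of None for a word read outside the array is a convention of this model only.

```python
def lend(offset, n):
    """Offset of the last word of an n-word array, as left in the register by lend."""
    return (offset + 2 * (n - 1)) & 0xFFFF

def lsig(words, offset=0):
    """Trace of lsig: returns (cx, si, ax) as they stand when the loopz loop ends."""
    n = len(words)
    si = lend(offset, n)
    cx = n + 1                                  # dec cx ... add cx, 2
    while True:
        idx = (si - offset) // 2
        ax = words[idx] if 0 <= idx < n else None   # None marks a read outside the array
        si = (si - 2) & 0xFFFF                  # lodsw with the direction flag set
        cx -= 1                                 # loopz decrements cx first
        if cx == 0 or ax != 0:                  # leave on cx = 0 or a non-zero word
            break
    return cx, si, ax

sc1 = [2] + [0] * 31 + [0xFFFE] + [0xFFFF] * 30 + [0x0FFF] + [0] * 64

print(hex(lend(0, 128)))                 # prints 0xfe
cx, si, ax = lsig(sc1)
print(hex(cx), hex(si), hex(ax))         # prints 0x40 0x7c 0xfff
print(lsig([0] * 128)[0])                # prints 0
```

The first two results agree with SI=00FE after lend and with CX=0040, SI=007C, AX=0FFF after lsig in the register displays above. The last line shows that, in this model, an array of all zeros gives a count of 0, but only after one extra word below the start of the array has been read.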
These macros, including ‘condh’ and ‘conhd’ that we discussed in section 3 of
this chapter, can be conveniently used in the large number programs we have
been seeing. Sometimes we need to convert little endian data to big endian data
without changing its location. This is the same as the in-situ array reversal program
given under section 6 of Chapter 5. A slightly modified macro, tailored to large number
handling in terms of word arrays, is given below without detailed comments, for your
study. The macro assumes that the direction flag is clear, so that the string operations
work in the address incrementing mode.
data segment
arr dw 1, 2, 3, 4, 5, 7
n dw ($ - arr)/2
data ends
;
;
code segment
assume cs:code, ds:data, es: data
;
; now the macro revar is defined
revar macro src, n
local rev1, rev2
mov si, offset [src]
mov di, si
add si, n
add si, n
jmp rev1
rev2: mov ax, [di]
xchg ax, [si]
stosw
rev1: sub si, 2
cmp si, di
ja rev2
endm
;
Strt: mov ax, data
mov ds, ax
mov es, ax
revar arr, n
int 1
code ends
end strt
The test program and the macro are given without test results. The verification of
the program is left to the reader.
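As an aid to that verification, the expected result can be worked out with a short Python model of the macro. This is only a sketch of the same two-pointer scheme, with list indices standing in for the si and di offsets; it is not a substitute for running the program in the debug.

```python
def revar(words):
    """Reverse a word array in place with two converging indices, as revar does."""
    di = 0                       # di starts at the first word
    si = len(words)              # si starts one word past the end
    while True:
        si -= 1                  # rev1: sub si, 2
        if si <= di:             # ja rev2 is not taken once si <= di
            break
        # rev2: mov ax,[di] / xchg ax,[si] / stosw swaps the two words
        words[di], words[si] = words[si], words[di]
        di += 1                  # stosw moves di to the next word
    return words

arr = [1, 2, 3, 4, 5, 7]
print(revar(arr))                # prints [7, 5, 4, 3, 2, 1]
```

After the program has run, the array arr in the data segment should therefore read 7, 5, 4, 3, 2, 1 as words.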
All the computations are done in hexadecimal. The first 16K (16384) prime
numbers are found and stored as 4 byte words, and these words are then
converted to decimal numbers using a 4 byte hex to BCD conversion routine.
The numbers fill the entire data segment and are stored in the big endian fashion
in decimal. The largest number in the table is the 6-digit decimal prime number
180503. The entire table is not presented; only a sample of the output at
the start and at the end is shown. The complete list can be obtained by copying the
program and running it in the DOS (DEBUG) environment after assembling.
Exercise: Study the program with respect to the algorithm, the register usage and
the optimizations done in the program.
data segment
table dw 32768 dup(?)
data ends
;
stak segment stack
dw 256 dup(?)
tos label word
stak ends
;
code segment
assume cs:code, ds: data, es:data, ss:stak
start: mov ax, data
mov ds, ax
mov es, ax
mov ax, stak
mov ss, ax
lea ax, tos
mov sp, ax
mov ax, 2
sub dx, dx
mov cx, dx
cld
sub di, di
stosw ; first prime number 2, stored
xchg ax,dx
stosw ; stored as a double word
xchg ax, dx
inc ax ; next prime, 3.
stosw
xchg ax, dx
stosw
xchg ax,dx ; stored as a double word
mov bx,ax ;cx:bx is the number to be checked if prime
nextp: or di, di ;di is address for storing next prime
jz finish
nextp2: add bx, 2 ;try if the next odd number is a prime
adc cx, 0 ; if carry, increment cx
mov si,4
next: lodsw
cmp ax, 65535; what is this check? This ensures we do not get too large
; a number as the prime. Actually this is unnecessary.
jz finish
mov bp, ax
add si, 2
mul ax
cmp cx, dx
jz proc1
jnb procm ; if number-now is > bp*bp, then divide number-now by bp
over: mov ax, bx
stosw
mov ax, cx ; yellow part of the program checks the next odd number
stosw
jmp nextp
proc1: cmp bx, ax ; square is more than the number-now prime
jb over ; the number is prime, store it
jz nextp2 ; the number is a square, so not prime.
procm: mov ax, bx
mov dx, cx
div bp
or dx, dx
jnz next
jmp nextp2
finish: mov cx, 8
sub si, si
mov di, si
bak: lodsw
mov dx, ax
lodsw ;first 8 numbers converted to decimal
stosw
mov ax, dx
cmp ax, 10
jb dwn
add ax, 6
dwn: xchg ah, al
stosw
loop bak
lup2: lodsw
mov dx, ax
lodsw
xchg ax, dx
mov bp, 10000
div bp
cmp ax, 10
jb dn1
add ax, 6
dn1: xchg ah, al
dn: stosw
mov ax, dx
mov dx, 100 ; remaining double hex words converted to BCD
div dl
mov cx, 4
xchg ah, ch
aam
ror al, cl
ror ax, cl
xchg al, ch
aam
rol al, cl
rol ax, cl
mov al, ch
stosw
cmp si, 0000
jnz lup2
ok: int 1
code ends
end start
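The prime search in the program above can be summarised by the following Python sketch. It follows the same plan as we read it from the listing (try each odd number, divide by the stored primes from 3 upwards, and stop as soon as the square of the divisor exceeds the number), but it is an illustration only; the function name is introduced here, and the 65535 guard of the listing is left out.

```python
def primes_by_trial_division(count):
    """Return the first `count` primes, testing odd candidates against stored primes."""
    table = [2, 3]                       # the program stores 2 and 3 directly
    candidate = 3
    while len(table) < count:
        candidate += 2                   # nextp2: the next odd number
        i = 1                            # divisors start at 3 (mov si, 4)
        while i < len(table):
            p = table[i]
            if p * p > candidate:        # square exceeds the number: it is prime
                table.append(candidate)  # over: store it in the table
                break
            if candidate % p == 0:       # a square or an exact multiple: not prime
                break
            i += 1
    return table[:count]

primes = primes_by_trial_division(16384)
print(primes[:8])                        # prints [2, 3, 5, 7, 11, 13, 17, 19]
print(primes[-1])                        # prints 180503
```

The table of 16384 primes, at 4 bytes each, is exactly 64K bytes, which is why the program can detect the end of the table by the wrap-around of di to zero.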
-u 0 b3
23F6:002A 83C302 ADD BX,+02
23F6:002D 83D100 ADC CX,+00
23F6:0030 BE0400 MOV SI,0004
23F6:0033 AD LODSW
23F6:0034 3D0100 CMP AX,0001
23F6:0037 7427 JZ 0060
23F6:0039 8BE8 MOV BP,AX
23F6:003B 83C602 ADD SI,+02
23F6:003E F7E0 MUL AX
23F6:0040 3BCA CMP CX,DX
23F6:0042 740A JZ 004E
23F6:0044 730E JNB 0054
23F6:0046 8BC3 MOV AX,BX
23F6:0048 AB STOSW
23F6:0049 8BC1 MOV AX,CX
23F6:004B AB STOSW
23F6:004C EBD8 JMP 0026
23F6:004E 3BD8 CMP BX,AX
23F6:0050 72F4 JB 0046
23F6:0052 74D6 JZ 002A
23F6:0054 8BC3 MOV AX,BX
23F6:0056 8BD1 MOV DX,CX
23F6:0058 F7F5 DIV BP
23F6:005A 0BD2 OR DX,DX
23F6:005C 75D5 JNZ 0033
23F6:005E EBCA JMP 002A
23F6:0060 B90800 MOV CX,0008
23F6:0063 2BF6 SUB SI,SI
23F6:0065 8BFE MOV DI,SI
23F6:0067 AD LODSW
23F6:0068 8BD0 MOV DX,AX
23F6:006A AD LODSW
23F6:006B AB STOSW
23F6:006C 8BC2 MOV AX,DX
23F6:006E 3D0A00 CMP AX,000A
23F6:0071 7203 JB 0076
23F6:0073 050600 ADD AX,0006
23F6:0076 86E0 XCHG AH,AL
23F6:0078 AB STOSW
23F6:0079 E2EC LOOP 0067
23F6:007B AD LODSW
23F6:007C 8BD0 MOV DX,AX
23F6:007E AD LODSW
23F6:007F 92 XCHG DX,AX
23F6:0080 BD1027 MOV BP,2710
23F6:0083 F7F5 DIV BP
23F6:0085 3D0A00 CMP AX,000A
23F6:0088 7203 JB 008D
23F6:008A 050600 ADD AX,0006
23F6:008D 86E0 XCHG AH,AL
23F6:008F AB STOSW
23F6:0090 8BC2 MOV AX,DX
23F6:0092 BA6400 MOV DX,0064
23F6:0095 F6F2 DIV DL
23F6:0097 B90400 MOV CX,0004
23F6:009A 86E5 XCHG AH,CH
23F6:009C D40A AAM
23F6:009E D2C8 ROR AL,CL
23F6:00A0 D3C8 ROR AX,CL
23F6:00A2 86C5 XCHG AL,CH
23F6:00A4 D40A AAM
23F6:00A6 D2C0 ROL AL,CL
23F6:00A8 D3C0 ROL AX,CL
23F6:00AA 8AC5 MOV AL,CH
23F6:00AC AB STOSW
23F6:00AD 83FE00 CMP SI,+00
23F6:00B0 75C9 JNZ 007B
23F6:00B2 CD01 INT 01
-g
Only the first and the last 256 bytes of displayed results (64 numbers in each block
of 256 bytes) are shown above. The last prime number indicated in the table is the
decimal number 180503. The result occupies the entire data segment. Note the
separate stack segment used in this program.
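The form in which each number finally sits in the table can be seen from the small Python model below. It is a sketch of the conversion as we understand it from the listing: division by 10000 and then by 100 splits the number into three two-digit groups, and each group is packed into one BCD byte. The function name is introduced here; note that the program itself handles the leading group with a simple add of 6, which is valid only for values below 20, while this model packs any two-digit group.

```python
def to_packed_bcd_bytes(n):
    """Four bytes, most significant first, holding n (below 1,000,000) as packed BCD."""
    if not 0 <= n < 1_000_000:
        raise ValueError("n must have at most six decimal digits")

    def pack(two_digits):
        # what aam and the nibble rotates achieve: tens in the high nibble
        tens, units = divmod(two_digits, 10)
        return (tens << 4) | units

    high, rest = divmod(n, 10000)        # div bp, with bp = 10000
    mid, low = divmod(rest, 100)         # div dl, with dl = 100
    return bytes([0, pack(high), pack(mid), pack(low)])

print(to_packed_bcd_bytes(180503).hex(" "))   # prints 00 18 05 03
print(to_packed_bcd_bytes(19).hex(" "))       # prints 00 00 00 19
```

The last entry of the table should thus appear in the dump as the bytes 00 18 05 03, which reads directly as the decimal number 180503.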
Conclusion: In this chapter we have seen how large integer numbers can be
handled in the 8086 using only assembly language programming, without any
elaborate tools. These programs illustrate the capability of the processor hardware
and its instruction set. In the previous chapters we learnt about the processor
register set and instruction set architecture, and looked at some simple programs. In
this last chapter, we have seen how those simple programs can be used to work out
more complex number handling routines. When we get into designing big
programs, the basic principles we have learnt from the simpler programs of the earlier
chapters are still useful, but complex programs do require careful management
of the available resources and proper tracking of the algorithm that we are
employing. Essentially, it is all about balancing the available resources against the
requirements of our algorithm, and this is greatly helped by adequate
comments, so that the program can be understood and debugged without much
difficulty.
==00==
EXERCISES
1. Modify any one of the large number handling programs given in this Chapter
so that it uses the macros given at the end in section 8.
2. Write a program to invert a 20 byte number using the method of section 6.
APPENDIX A
In this book I have shown the working and the results produced while running an
.exe program in the debug environment. As you have seen, it is quite useful to produce a
permanent copy of the working, with results, for later use and demonstration.
In this appendix I shall explain the method that I have used for obtaining an MSWord file
of the working and results from the debug. The method is as follows:
In the DOS environment, invoke the debug with the parameter [filename.exe]. If
this is followed by pressing the <enter> key, the debug will work as usual. If, instead,
the parameter is followed by [>filename.dem], the output from the debug will go to
the .dem file. However, the response of the debug will not come on the screen; it will
go directly into the .dem file. (In the Unix OS there is a tee operation that makes the
result go both to the screen, which is the standard output device, and to a named file. If
you are using a Unix system, this can be quite convenient. It is not so straightforward to
do this in the Windows system, though I am sure it can be done there also; there are tricks
to overcome this deficiency, and some that may be useful are given at the following site
under the FAQ index number 94: http://www.netikka.net/tsneti/info/tscmd.htm.) I have not
used any of these methods; I have operated (blindly) in the debug environment by
redirecting the result directly to a .dem file without getting any visual feedback on the
screen. The .dem file can then be copied into an MSWord file and manipulated like a
.doc file. As a .dem file it cannot be easily manipulated, as the output file created is not a
regular MSWord file; it is a plain text file, something like a notepad file.
For example, if we want to test the program style1.exe, the command sequence
could be as follows (refer to Chapter 3, programming style 1):
From the DOS screen, give the command “debug style1.exe > style1.dem”. Press
the <enter> key as usual after the command. You will notice the debug prompt “-”. But
from then on, none of the debug responses will be seen on the screen. These responses
will go directly to the style1.dem file. The debug commands have to be given
correctly, as required, without any visible feedback on the responses of the debug. In the
case of the demo on style1, the following sequence of commands was given.
u 0 30 <enter>
rax <enter>
ffef <enter>
r <enter>
t16 <enter>
q <enter>
This command sequence has to be worked out beforehand in the normal debug mode for
the required operations. Once the .dem file is obtained, it can conveniently be copied into
an MSWord file and edited wherever necessary. In this way, an almost hands-on
experience can be had in studying assembly language programs, with a hard copy of the
working of the different programs.
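One possible way to avoid typing the commands blindly, which is not the method used in this book, is to keep the worked-out command sequence in a text file and redirect the input of the debug from it as well. The Python sketch below only prepares the text of such a file; the file name style1.scr and the function name debug_script are assumptions made for this illustration, and the command line shown in the last comment relies on the standard DOS input redirection with `<`.

```python
def debug_script(commands):
    """Join debug commands into script text, one per line, always ending with q."""
    lines = [c.strip() for c in commands]
    if not lines or lines[-1].lower() != "q":
        lines.append("q")        # without q the debug would keep waiting for input
    return "".join(line + "\r\n" for line in lines)   # DOS line endings

# the command sequence used for the style1 demo
script = debug_script(["u 0 30", "rax", "ffef", "r", "t16", "q"])
print(script, end="")

# Save the printed text as style1.scr (a hypothetical name) and then give,
# at the DOS prompt:   debug style1.exe < style1.scr > style1.dem
```

The sequence still has to be worked out first in the normal debug mode, exactly as described above; the file only saves retyping it.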
Copyright © 2008 K M Hebbar