CP/M Assembly Language
                    Part I: Assembler Basics
                          by Eric Meyer

     I first discovered assembly language about two years ago,
when I needed to modify the source code for a public domain modem
program for an unusual application.
     Since then, I've gone on to write a number of programs in
assembler, ranging from some simple public domain utilities to
the memory resident utility PRESTO!.
     For many such applications, assembler is the language of
choice: it's very compact and fast; it's the most efficient way
to do simple tasks that deal with moving around bytes of data,
such as copying and modifying files; and it allows the most
sophisticated interfacing with the CP/M operating system, which
is itself written in assembler.
     Another nice thing is that you already have all the tools
that you need to learn and use assembly language: nothing more to
buy, unless your needs grow to be very sophisticated.
     CP/M 2.2 includes the ASM assembler; CP/M 3.0 comes with MAC
and RMAC. All you lack is instructions. Let me quickly mention
two good books on the subject: CP/M Assembly Language
Programming, and The Soul of CP/M. Both, while not complete
language references, put a lot of emphasis on programming in the
CP/M environment, which will have you doing truly useful things
(like manipulating disk files) in short order.
     Both are far more comprehensive than I can attempt to be
here; I will just present an introduction, and explain some
basic concepts for those who would like to become literate in
assembler.
     Numbers play an important role in all that follows.
Basically, everything in the computer is (or is represented as)
numbers -- such as the instructions that make up a program, or
the operating system itself; characters of text and other data
that you may be manipulating; addresses in memory where various
data or subroutines can be found; and so on.
     Only the context determines whether a particular value is to
be interpreted as a number, an ASCII character, part of an
address, or a machine instruction.
     This can be very powerful, but it's also potentially very
confusing. (Pascal aficionados may need a strong drink before
proceeding.)
     All numbers in what follows are decimal, unless followed by
an "H" (for Hexadecimal, base 16) or "B" (for Binary, base 2).
Hexadecimal is commonly used in assembly language programming, as
it's the most natural representation for the numbers from 0 to
255 (or 65535) that your computer manipulates on the most
fundamental level.
     If you're unfamiliar with these base systems, you may want
to find or make a conversion chart for reference.
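     As a rough illustration of these notations, here is a short
sketch in Python (used purely for convenience; it is not a CP/M
tool, and the helper names are made up for this example). It
shows one value in all three forms, and converts assembler-style
strings back to decimal.

```python
def to_asm_notation(value, bits=8):
    """Return (decimal, hex, binary) strings in assembler style."""
    hex_text = format(value, "0{}X".format(bits // 4))
    # Assemblers such as ASM want a leading digit, so FFH is written 0FFH.
    if hex_text[0] in "ABCDEF":
        hex_text = "0" + hex_text
    binary_text = format(value, "0{}b".format(bits))
    return str(value), hex_text + "H", binary_text + "B"


def from_asm_notation(text):
    """Convert a string like '201', '0C9H' or '11001001B' to an integer."""
    text = text.strip().upper()
    if text.endswith("H"):
        return int(text[:-1], 16)
    if text.endswith("B"):
        return int(text[:-1], 2)
    return int(text)


print(to_asm_notation(201))            # decimal, hex and binary forms
print(from_asm_notation("0FFH"))       # hexadecimal back to decimal
print(from_asm_notation("11111111B"))  # binary back to decimal
```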


1. The CPU
     The CPU (central processing unit) is the integrated circuit
at the heart of your computer. It fetches your instructions,
executes them, and keeps track in the meantime (via "interrupts")
of all the other tasks your computer needs to have done.
     Most CP/M computers today use the Z80 CPU, though some still
use the 8080 (or 8085), which are very similar but don't have
quite as many instructions.
     These "8-bit" CPUs deal primarily with "bytes", numeric
values from 0 to 255 (11111111B, or FFH); though two bytes
together can also be used as a 16-bit "word", a value from 0 to
65535 (FFFFH).
     In this manner, up to 64K (64 times 1024, or 65536, bytes)
of memory can be addressed. Part of this memory will be holding
the CP/M operating system; part will contain the transient
program that is actually running at the moment; and part will
remain available as data storage space for that program.
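     The arithmetic behind these limits can be checked with a few
lines of Python (a sketch only; make_word is a name invented for
this illustration):

```python
def make_word(high, low):
    """Combine two 8-bit bytes into one 16-bit word."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


BYTE_MAX = 0xFF                     # 255, the largest byte value
WORD_MAX = make_word(0xFF, 0xFF)    # the largest 16-bit word
ADDRESSES = WORD_MAX + 1            # addresses 0000H through FFFFH

print(BYTE_MAX, WORD_MAX, ADDRESSES, ADDRESSES // 1024)
```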


2. Assembly Language
     The CPU has a moderate number of "instructions", each of
which performs some simple but useful task: adding two values,
fetching a byte of data from memory, and so on. Each instruction
is "coded" by one (or possibly several) bytes, according to an
arbitrary system. For example, C9H (201) is the "return"
instruction, which marks the end of a subroutine.
     On the earliest microcomputers, programs were entered as a
series of such numbers, often with a row of eight mechanical
switches: thus the sequence "on, on, off, off, on, off, off, on"
would represent 11001001B, or C9H.
     This was incredibly tedious. Today, having plenty of memory
available to work with, you can write assembly language like any
other language, using an editor to create a text file; a special
program, the assembler, will translate the statements you write
(e.g., the mnemonic "RET" for return) into the appropriate
machine code.
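     To make the idea concrete, here is a toy version of that
translation step, sketched in Python. It knows only three
one-byte 8080 instructions, and the function names are invented
for this example; a real assembler handles the full instruction
set, operands, labels and much more.

```python
# Opcodes for three one-byte 8080 instructions.
OPCODES = {"NOP": 0x00, "HLT": 0x76, "RET": 0xC9}


def assemble(mnemonics):
    """Translate a list of mnemonics into machine-code bytes."""
    return bytes(OPCODES[m.upper()] for m in mnemonics)


def switch_settings(byte):
    """Show a byte as eight front-panel switches, high bit first."""
    return ["on" if byte & (1 << bit) else "off" for bit in range(7, -1, -1)]


code = assemble(["NOP", "RET"])
print(code.hex().upper())           # the machine code, in hex
print(switch_settings(code[-1]))    # switch pattern for the RET byte
```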
     The assembler functions very much like a compiler for a
higher-level language. The difference is that a language compiler
will incorporate prewritten library routines to perform many
common tasks, and allows you to do very complex things with just
a few statements. Thus when you write something like:

100 INPUT "DIAMETER:",D
110 PRINT "CIRCUMFERENCE IS:",3.14159*D

you are actually invoking a whole set of routines (part of your
BASIC interpreter or compiler) that print messages on the
screen, gets input from the keyboard, stores and retrieves data
values in memory, performs floating point arithmetic, and so on.
     When you program in assembler, you have to write every
single CPU instruction yourself. This can be a lot of work, since
the CPU can basically do two things: move a byte from one place
to another; and add, subtract, and do logical operations like
"and" and "or" with byte values from 0 to 255.
     Are you wondering how you would do floating point
multiplication (C=3.14159*D) using an instruction set so
primitive that it can only add and subtract integers from 0 to
255?  The answer is that if you are sane, you wouldn't. There are
tasks well suited to assembly language, and others best done in
higher level languages. (Somebody has already written the
floating point code that's part of your BASIC interpreter; take
advantage of it.)
     In assembler, stick to fundamentally lower level tasks, such
as talking to your computer hardware (like memory and I/O ports),
and manipulating disk files with the CP/M BDOS calls. For these
purposes there is no better "language".


3. The Assembler
     There are several common assemblers, but they all work in
similar ways. CP/M 2.2's ASM is a good example of a basic 8080
assembler. MAC is a macro assembler, meaning that it lets you
designate frequently-used blocks of code as "macros", and invoke
them with a single name, much as you would a function call in
another language -- this is just a convenience.
     RMAC is a relocatable macro assembler, meaning that it can
produce output in a format that can be installed to run in
different parts of memory as circumstances require; the usual
assembler output is code intended to run only at address 0100H,
the beginning of the TPA (transient program area) under CP/M.
(This is not something you are going to need to worry about at
first.)
     Many commercial assemblers are also available, such as
Microsoft's M80. Generally these are even more powerful, and
frequently they can also take advantage of the expanded
instruction set of the Z80 CPU.
     My personal favorites are SLR Systems' SLRMAC (8080) and
Z80ASM, both of which are incredibly fast relocatable assemblers,
and can also generate COM files directly. But unless you get as
heavily involved in assembly language as I have recently, it
won't much matter which you use. The common procedure is:

1)  Write the source code with your favorite text editor.
2)  Run the assembler, typically producing a HEX output file.
3)  Generate an executable (COM) file from the HEX file.

     The first step will require learning the assembler
instruction set. The second is usually as easy as typing A>ASM
PROG<cr>; see your computer documentation for (probably minimal)
instructions on assembler usage. The third is done using the
HEXCOM utility under CP/M 3.0, or LOAD and SAVE under CP/M 2.2
(though a fine public domain utility called MLOAD is much easier
than this combination).


4. Practical Tasks
     Before we get into real assembler programming, it's
worthwhile to note that frequently, what you need to do is not
actually to write a program from scratch, but simply to get an
existing program running the way you want. Good public domain
utilities, for example, often allow a number of features to be
changed, to allow proper operation on different computers, or
just to conform to different tastes.
     At the simplest level, the program's DOC file may just give
a list of patching addresses. For example, the instructions for
the (imaginary) XYZED text editor might include this information:

ADDRESS   VALUE
0130H     create BAKup files? (00=no, FF=yes)
0131H     copy buffer size in bytes (0...3000H)

     This indicates, for example, that you can get XYZED to
create backup files or not, as you like, by changing a particular
byte in the COM file. The easiest way to do this is to edit
XYZED.COM with a utility like EDFILE, PATCH, or DU; find the
value at address 0130H; and change it, if necessary, to what you
want. That's all you have to do; XYZED is designed to check the
value it finds at 0130H, and adjust its behavior
accordingly.
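     One detail worth spelling out: since a COM file is loaded
at address 0100H, the byte documented at address 0130H is the
one at offset 0030H from the start of the file. (Some patching
utilities do this subtraction for you.) The sketch below shows
the idea in Python; patch_byte is a hypothetical helper, and the
64 zero bytes merely stand in for the contents of the imaginary
XYZED.COM.

```python
TPA_START = 0x0100   # CP/M loads a COM file at this address


def patch_byte(image, address, value):
    """Return a copy of a COM image with one byte changed.

    `address` is the memory address given in the documentation;
    the matching file offset is address - 0100H.
    """
    offset = address - TPA_START
    if not 0 <= offset < len(image):
        raise ValueError("address lies outside the COM file")
    patched = bytearray(image)
    patched[offset] = value & 0xFF
    return bytes(patched)


image = bytes(64)                           # stand-in for XYZED.COM
patched = patch_byte(image, 0x0130, 0xFF)   # FF = create BAK files
print(format(patched[0x30], "02X"))
```

With a real file, the image would be read in binary mode, patched,
and written back out under a new name, so that the original stays
intact.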
     Sometimes the installation process can be more complex.
Modem programs, for example, typically have to have very
different basic routines to talk to the I/O hardware of different
computers. Here there will often be a whole "overlay": an
assembler source file containing an actual listing of portions of
the program.
     You will have to edit this file, then assemble it and merge
it with the rest of the COM file. This can require knowledge of
some basic assembly language, but sometimes it can also be as
simple as changing data values.
     Let's begin by considering a handful of simple assembler
directives. These are not actually CPU instructions at all; they
are merely instructions to the assembler, regarding where to put
code, and the insertion of data values. You will see these used
frequently in overlay files.


5. Assembler Directives

ORG (origin): tells the assembler the address in memory at which
     the following code, or data, should be put. Most programs,
     for example, begin with "ORG 0100H", since transient CP/M
     programs load in at address 0100H, the beginning of the TPA.

END: marks the end of an assembler source file.

EQU (equate): assigns a numerical value to a label. This isn't a
     "variable", as its value cannot change, and it generates no
     output code; it's merely a convenience.

DB, DW (define byte, define word): like the "DATA" statement in
     BASIC, instructs the assembler simply to insert the
     following numerical values at the current address in memory.
     Presumably the program is going to refer to them as data at
     some point.


     Consider the XYZED program again. Instead of merely giving a
table of patch information to go by, as described above, it might
have provided you with an overlay file XYZEDOV.ASM which would
include the following instructions:

;XYZEDOV.ASM installation overlay
YES       EQU     0FFH
NO        EQU     0
          ORG     0130H
BAKFLG:   DB      YES       ;create BAK files?
                            ;  (yes or no)
BUFSIZ:   DW      0800H     ;copy buffer size,
                            ; in bytes
          END

     The semicolon ";", like REM in BASIC, indicates that the
rest of the line is simply a comment, to be ignored by the
assembler.
     The two EQUates tell the assembler to substitute the number
FFH (255) everywhere "YES" occurs in what follows, and 0 for
"NO".
     Not only is this convenient; it also makes the code more
understandable, by making it clear that a value is logical
(yes/no), rather than just an arbitrary number (like 255).
     This kind of thing always helps in assembly language, which
is prone to be very confusing otherwise.
     The ORG statement tells the assembler that the following
code or data is to be put starting at address 0130H in memory. In
this case, XYZED.COM expects to find these data items at this
address.
     The labels "BAKFLG:" and "BUFSIZ:" are just for the purpose
of identification here, though in an actual program, labels can
function as names for variables or subroutines, as we'll see
later.
     The "DB YES" inserts one byte of data (in this case "YES",
or FFH) at the current address (in this case 0130H, set by the
ORG statement).
     The "DW 0800H" inserts a word (two bytes) of data at the
current address (now 0131H, since the previous byte went at
0130H). In fact, two-byte values are stored "backwards" or low
byte first, so the assembler is actually going to put the 00H at
address 0131H, and then the 08H at 0132H. So this file has
instructed the assembler to set up the following sequence of
three data bytes:

ADDRESS        DATA
0130H          FFH
0131H          00H
0132H          08H
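
     The same layout can be reproduced with a short Python
sketch. The functions db and dw are invented stand-ins that mimic
what the directives emit; they are not part of any assembler.

```python
def db(value):
    """One byte, as the DB directive would emit it."""
    return bytes([value & 0xFF])


def dw(value):
    """One 16-bit word, low byte first, as DW would emit it."""
    return (value & 0xFFFF).to_bytes(2, "little")


YES, NO = 0xFF, 0x00
origin = 0x0130                     # ORG 0130H
data = db(YES) + dw(0x0800)         # BAKFLG, then BUFSIZ

# List each byte next to the address where it will end up.
for offset, byte in enumerate(data):
    print("{:04X}H  {:02X}H".format(origin + offset, byte))
```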

     If you now assemble this file, with a command like

A> asm xyzedov<cr>

you will get an output file XYZEDOV.HEX which contains the HEX
version of this code, a compact (though still ASCII text) format
frequently used as an intermediary between source code and
(unreadable) machine code. If you looked at the HEX file, you
would see something like this:

:03013000FF0008C5

which can be read as "three bytes, starting at address 0130, as
follows: FF, 00, 08". (The "00" after the address marks an
ordinary data record, and the last value on the line is just a
checksum byte for safety.)
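     For the curious, the checksum is chosen so that all the
bytes on the line, checksum included, add up to zero (ignoring
any carry past 8 bits). The Python sketch below builds and checks
such a record; make_record and parse_record are names made up for
this illustration, and only ordinary data records are handled.

```python
def make_record(address, data):
    """Build one Intel HEX data record (type 00) for `data`."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, 0x00])
    body += bytes(data)
    # The checksum makes the sum of every byte in the record 0 mod 256.
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def parse_record(line):
    """Return (address, record type, data) after verifying the checksum."""
    raw = bytes.fromhex(line.strip()[1:])   # drop the leading ":"
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    count = raw[0]
    address = (raw[1] << 8) | raw[2]
    return address, raw[3], raw[4:4 + count]


record = make_record(0x0130, [0xFF, 0x00, 0x08])
print(record)
print(parse_record(record))
```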
     You can then use a utility like MLOAD to merge this HEX file
with the program XYZED.COM itself:

A> mload xyzed.com=xyzed.com,xyzedov.hex<cr>

and you will have a new copy of the XYZED program, with the
values changed accordingly.
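
     In outline, the merge works something like the Python sketch
below. This is only an approximation of the idea, not a
description of how MLOAD itself is written: merge_hex is a
hypothetical function, the 64 zero bytes stand in for XYZED.COM,
the zero-length second record is assumed as an end-of-file
marker, and checksums are not verified.

```python
TPA_START = 0x0100   # COM files load here, so file offset = address - 0100H


def merge_hex(image, hex_text):
    """Overlay the data records of an Intel HEX text onto a COM image."""
    merged = bytearray(image)
    for line in hex_text.split():
        raw = bytes.fromhex(line[1:])            # drop the leading ":"
        count, rec_type = raw[0], raw[3]
        address = (raw[1] << 8) | raw[2]
        if rec_type != 0 or count == 0:
            continue                             # nothing to load
        offset = address - TPA_START
        if offset < 0:
            raise ValueError("record lies below the start of the TPA")
        if offset + count > len(merged):         # grow the image if needed
            merged.extend(bytes(offset + count - len(merged)))
        merged[offset:offset + count] = raw[4:4 + count]
    return bytes(merged)


old_image = bytes(64)                            # stand-in for XYZED.COM
overlay = ":03013000FF0008C5\n:0000000000\n"
new_image = merge_hex(old_image, overlay)
print(new_image[0x30:0x33].hex().upper())
```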


6. Coming Up. . .
     In future installments we'll learn about the 8080 CPU and
its instruction set, and explain how to use CP/M BDOS calls.