CP/M Assembly Language
                       Part VI: Disk Files
                          by Eric Meyer

     By now you have a good understanding of the basics of 8080
assembly language. It's time to learn to do something truly
useful with it: how to read and write disk files.


1. BDOS File Access
     The CP/M BDOS is set up to make access to disk files as easy
as possible. You don't have to worry about where the file is on
the disk, or any of the mechanics of reading and writing blocks
of data. A small handful of BDOS functions take care of all these
tasks:

1.  SET DMA -- designate a 128 byte buffer to hold data;
2.  OPEN a file -- locate file and prepare to read or write; or
     MAKE a file -- create a new file to write;
3.  READ or WRITE -- a record (128 bytes) of data (repeat as many
     times as needed);
4.  CLOSE a file -- finish updating a file written to.

     Involved here are two "data structures", used to communicate
with these BDOS functions: the DMA and the FCB.
     The DMA (Direct Memory Access) is simply a small area of
memory where the BDOS is to put (or find) the data that will be
read (or written) when the file is accessed.
     Data is read in units of 128 bytes, called "records". There
are eight of these in 1K. You simply declare a region of memory
to be the DMA area, by passing its address to the BDOS with the
appropriate function call.
     As it happens, the default CP/M DMA is at address 0080h near
the bottom of memory, and you can just use that if you only need
one DMA at a time; otherwise, you need to set your own DMA, which
is as simple as:

LXI    D,MYDMA  ;point to the region you want
MVI    C,26     ;function 26 sets the DMA
CALL   BDOS     ;ask BDOS to do it

     (Remember to declare BDOS EQU 0005H.)  You can reserve space
for this (or any other) purpose with the assembler instruction
DS ("define storage"), which we may not have mentioned before.
For example,

MYDMA:  DS     128      ; one record for DMA

sets up a 128-byte buffer at that point for use as a DMA. The
initial contents of a DS block are undefined (unpredictable).


2. The FCB
     The FCB (File Control Block) is a structure used by the BDOS
to keep track of your position in a file when it's open.
Basically it is a working copy of the directory entry for the
file, and if you've ever snooped around a disk directory with DU
or some such utility, it will look very familiar:

FCB:   d F I L E N A M E T Y P e x x x
       x x x x x x x x x x x x x x x x
       c r r r

     The FCB is 36 bytes long. The first byte ("d") is the number
of the drive (00=logged drive, 01=A:, 02=B:, etc). The next 11
bytes are the filename and type. The next byte ("e") is the
current extent number (an "extent" is a unit of 16K; long files
have several extents).
     All the "x" bytes are used by the BDOS to keep track of the
physical location of data on the disk. Their values don't concern
us. The next byte ("c") is the current record number, within the
current extent. And finally, the three bytes "rrr" are for a
random record address (not ordinarily used when accessing a file
sequentially, as we will be doing here).
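     As a sketch of the layout just described, an FCB for a file
on the logged drive could be declared in your program like this.
The label MYFCB and the name OUTPUT.TXT are made up for the
example, and we assume your assembler accepts DB with quoted
strings, as ASM does:

MYFCB:  DB     0            ;"d": 00 = logged drive
        DB     'OUTPUT  '   ;filename, padded to 8 with blanks
        DB     'TXT'        ;type, 3 characters
        DB     0,0,0,0      ;"e" (extent) and 3 "x" bytes
        DS     16           ;remaining "x" bytes, for the BDOS
        DB     0            ;"c": current record
        DS     3            ;"rrr": random record address

     The pieces add up to 1+8+3+4+16+1+3 = 36 bytes. Note that
the name is in upper case and padded with blanks, because the
BDOS compares it byte for byte with the directory.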
     You can set up any number of FCBs in your own program, to
read and write files as needed. There is also a default CP/M FCB
at address 005Ch, which CP/M fills in with the first filename you
give as an argument when you invoke a program. For example, when
you enter WordStar with a command like A>ws mytext.fil<cr>, CP/M
sets up the default FCB at 005Ch in the following way:

005C:  0 M Y T E X T _ _ F I L 0 x x x
       x x x x x x x x x x x x x x x x
       0 0 0 0

     Thus WordStar can look at this location to see whether you
have given it a filename to open and edit. In this case it sees
that you have: on the logged drive ("00"), in this case A:, the
file MYTEXT.FIL. It can read this file from the beginning (note
extent 0, record 0) just by pointing to this FCB and asking the
BDOS to open the file.
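     A program of your own can make the same test in a few
instructions. When no argument is given, CP/M leaves the filename
field of the default FCB filled with blanks, so it is enough to
look at its first character, at 005Dh. This is only a sketch:
NONAME is a hypothetical label for whatever your program does
when no name was typed.

LDA    005DH   ;fetch first character of the filename into A
CPI    ' '     ;is it a blank?
JZ     NONAME  ;yes, no filename was given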
     A further complication arises when you run a program with
two arguments, typically an input and an output file, for
example:

A>pip newfil.txt=b:oldfil.txt<cr>

     The first file goes into the FCB at 005Ch; the second name,
if present, is put at 006Ch.

005C:  0 N E W F I L _ _ T X T 0 x x x
006C:  2 O L D F I L _ _ T X T 0 x x x
       0 0 0 0

     As you can see, the second name is sitting in the middle of
the default FCB!  What PIP has to do is to copy the second
filename out to another FCB, which it constructs elsewhere. Then
the default FCB at 005Ch can be used to write NEWFIL.TXT, and the
other FCB (in PIP someplace) can be used to read B:OLDFIL.TXT.
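
     A minimal sketch of that copying step follows. The labels
FCB2 and CPYNAM are invented for this example, and this is just
one way to do it, not necessarily how PIP itself does. It must
run before the file at 005Ch is opened or made, because the BDOS
then fills in the "x" bytes of the default FCB, which wipes out
the second name.

        LXI    H,006CH  ;source: second name in default FCB
        LXI    D,FCB2   ;destination: our own FCB
        MVI    B,12     ;drive + name + type = 12 bytes
CPYNAM: MOV    A,M      ;fetch a byte from the source
        STAX   D        ;store A at the address in D-E
        INX    H        ;advance both
        INX    D        ;  pointers
        DCR    B        ;count down on 12 bytes
        JNZ    CPYNAM   ;loop if more
        SUB    A        ;now get a 0
        STAX   D        ;zero the extent byte ("e")
        STA    FCB2+32  ;store 0 in current record ("c") too
;(elsewhere in the program, out of the path of execution)
FCB2:   DS     36       ;room for the second FCB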


3. Opening an Existing File
     In order to read an existing disk file, all you have to do
is "open" it. After constructing an appropriate FCB (or using the
default one, if possible), you simply:

LXI    D,FCB   ;point to the FCB with D-E reg.
MVI    C,15    ;Function 15 opens a file
CALL   BDOS    ;ask BDOS to do it

     At this point, if the specified file existed, it can now be
accessed with BDOS read/write functions, described below.
However, it's also possible that the file didn't exist, in which
case you do not want to go on and try to read or write to it!
     All the BDOS file functions leave a "return code" in the A
register, which you can then examine. For the OPEN, CLOSE, and
MAKE functions, a code from 00-03 indicates success, while FFh
indicates an error. So immediately after the BDOS call you should
check this return code, and if you see FFh, exit with an error
message of some sort. For example, you could do this:
 
CPI    0FFH    ;is the error code FF?
JZ     IOERR   ;jump to i/o error routine if so

     In the case of the OPEN function, an error generally means
the file was not found.
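
     Putting the pieces together, here is a sketch that opens the
file named on the command line and quits with a message if it
isn't there. It assumes BDOS function 9 (print the string at D-E,
up to a "$"), which this part has not covered; the labels DFCB,
NOFILE-style message NFMSG, and OPENOK are made up for the
example.

BDOS    EQU    0005H
DFCB    EQU    005CH    ;the default FCB
        LXI    D,DFCB   ;point to the default FCB
        MVI    C,15     ;function 15 opens a file
        CALL   BDOS     ;ask BDOS to do it
        CPI    0FFH     ;is the error code FF?
        JNZ    OPENOK   ;no, the file is open
        LXI    D,NFMSG  ;yes, point to the message
        MVI    C,9      ;function 9 prints a string
        CALL   BDOS     ;ask BDOS to do it
        JMP    0000H    ;warm boot: back to CP/M
NFMSG:  DB     'File not found',13,10,'$'
OPENOK: JMP    0000H    ;placeholder: reading would start here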


4. Making a New File
     If you're creating a new file, the process is very similar:
after creating the appropriate FCB, you simply:

LXI    D,FCB   ;point to the FCB with D-E
MVI    C,22    ;Function 22 makes a new file
CALL   BDOS    ;ask BDOS to do it

     Again, you had better check to see that the return code is
not FFh before continuing to write data to your new file. An
error means there was no room in the directory for another
filename.


5. Closing a File
     This is a bit out of order here, but: after you have
finished writing data to a file, you need to "close" the file, to
ensure that all the new data has actually been written to disk,
and the directory has been updated accordingly. By now you can
probably guess how this is done:

LXI    D,FCB   ;point to the FCB with D-E
MVI    C,16    ;Function 16 closes a file
CALL   BDOS    ;ask BDOS to do it

     Afterwards, check the return code. If it was FFh, the BDOS
could not find the file's directory entry to update, and some of
your data may not have been recorded on the disk.
     NOTE: you only need to close a file that has been written
to. Files opened for reading only need not (and probably
shouldn't) be closed.


6. Reading and Writing Sequentially
     Once a file is "open" (or "made") it can be read or written
to. This can be done either randomly or sequentially. Random
access is used when quick access to data is needed, and you are
going to maintain some table or index to tell you in what record
in the file a given individual entry can be found. (This is often
done by database programs.)  Sequential access is far more
common, though, and just goes from beginning to end of the file,
reading or writing the whole thing. All you do is:

LXI   D,FCB                       LXI    D,FCB
MVI   C,20   ;Read Seq.   OR      MVI    C,21   ;Write Seq.
CALL  BDOS                        CALL   BDOS

     Each call transfers one record between the file and the
current DMA (for the moment, at 0080h): READ fills the DMA from
the file, and WRITE copies the DMA to the file. The record count
in the FCB is then updated to point to the next record.
     Again there are return codes. A return code of 00 indicates
success. For READ, an error code of 01 means you've passed the
end of the file, and there are no more records to read. Any other
error code indicates a physical error (such as disk full).
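
     For instance, a loop that reads a whole file one record at a
time might look like the following sketch. It assumes the file at
FCB is already open and the default DMA at 0080h is in use;
RDLOOP, RDDONE and RECCNT are names invented here, and IOERR is
the same hypothetical error routine as before. It merely counts
the records, using LHLD and SHLD (load and store H-L at a given
address), which we may not have met yet. The count is kept in
memory because a BDOS call does not preserve the registers.

        LXI    H,0      ;start the count
        SHLD   RECCNT   ;  at zero
RDLOOP: LXI    D,FCB    ;point to the open FCB
        MVI    C,20     ;read the next record into the DMA
        CALL   BDOS     ;ask BDOS to do it
        CPI    0        ;return code 00?
        JNZ    RDDONE   ;no, stop reading
        LHLD   RECCNT   ;yes, one more record:
        INX    H        ;  add 1 to the count
        SHLD   RECCNT   ;  and save it
        JMP    RDLOOP   ;go back for the next record
RDDONE: CPI    1        ;was it just end of file?
        JNZ    IOERR    ;no, a real error
        JMP    0000H    ;yes, all done (placeholder exit)
RECCNT: DS     2        ;16-bit record count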


7. Character Input: GETCH
     Although the BDOS organizes disk i/o in terms of records of
128 bytes, this is almost never the unit of interest to you. If
you are reading a text file, for example, you will be interested
in the individual character, or possibly line, of text. Thus it
will be convenient to write a pair of functions to "hide" the
BDOS i/o activity, and just pretend that you're reading and
writing one character at a time. We will call them GETCH and
PUTCH. The usage will be:

;Reading a character            ;Writing a character
CALL  GETCH    ;get char in A   MVI   A,xx     ;put char in A
JC    IOERR    ;jump if error   CALL  PUTCH    ;write to file
CPI   1AH      ;is it EOF?      JC    IOERR    ;jump if error
JZ    ISEOF    ;jump if EOF

     For now, both routines can only be used to read/write ONE
file at a time. Here is the read routine:

;Routine to get character from open file at GCFCB
;Returns char or EOF in A, and C for Error
GETCH:  PUSH   H       ;save registers
        PUSH   D       ;  (so GETCH will be easy
        PUSH   B       ;   to use)
        LDA    GCFLG   ;check EOF flag
        CPI    0       ;is it clear?
        JNZ    GCEOF   ;if not, at EOF
        LDA    GCPOS   ;get position count
        CPI    80H     ;is it up to 128?
        JC     GCCHR   ;no, just go get char
        LXI    D,GCDMA ;yes, need to read another record
        MVI    C,26    ;set the DMA to GCDMA
        CALL   BDOS    ;ask BDOS to do it
        LXI    D,GCFCB ;use the FCB at GCFCB
        MVI    C,20    ;read a record sequentially
        CALL   BDOS    ;ask BDOS to do it
        CPI    1       ;check return code
        JZ     GCEOF   ;oops, 1: no more (end of file)
        JNC    GCERR   ;argh, >1: physical error
        STA    GCPOS   ;0: read successful, set GCPOS to 0
GCCHR:  LXI    H,GCDMA ;point to DMA with H-L
        MOV    E,A     ;move GCPOS into E
        MVI    D,0     ;now D-E is 16-bit version of GCPOS
        DAD    D       ;GCPOS+DMA points to next char
        MOV    A,M     ;get that char into A
        LXI    H,GCPOS ;point to GCPOS with H-L
        INR    M       ;increment GCPOS to point to next
        CPI    1AH     ;is it ^Z?
        JZ     GCEOF   ;if so, record it
        STC
        CMC            ;clear Carry
        JMP    GCRET   ;and return the character
GCEOF:  MVI    A,1AH   ;EOF, get a ^Z
        STA    GCFLG   ;set flag for future reference
        STC
        CMC            ;clear Carry
        JMP    GCRET   ;return the ^Z
GCERR:  MVI    A,1AH   ;ERROR, get a ^Z
        STA    GCFLG   ;set flag
        STC            ;and set Carry
GCRET:  POP    B       ;restore
        POP    D       ;  the
        POP    H       ;    registers
        RET            ;and return
GCFLG:  DS     1       ;flag says EOF reached
GCPOS:  DS     1       ;keep track of position in record
GCDMA:  DS     128     ;DMA for GETCH, 128 bytes
GCFCB:  DS     36      ;FCB for GETCH

     Let's discuss how this works. The key is the byte variable
GCPOS, which runs from 00 to 7FH to keep track of which is the
next byte in the record (in the DMA) to be read. GCPOS starts out
at 80H, which indicates that a new record must be read in.
     Once this is done, GCPOS is reset to 0. Then each time a
character is needed, it is found by adding GCPOS to the DMA
address, and GCPOS is incremented to point to the next.
     When GCPOS reaches 80H again, the whole record has been
used, and a new one must once again be read in.
     Ordinarily, GETCH simply returns with the character read in
A, and the C flag clear.
     But once the end of the file has been reached (either no
more records, or the EOF character 1Ah), GETCH returns with an
EOF. (Note the variable GCFLG, a flag that is set non-zero once
this occurs, so that GETCH will continue to return EOFs
thereafter.)  And if an error is encountered trying to read the
file, GETCH returns with the C flag set.
     You have seen most of these instructions before, but there
are a couple of new ones here.
     LDA and STA are rather like MOV A,M and MOV M,A:  they load
A from, or store A to, a memory address. The difference is that
you give the address directly in the instruction, instead of
putting it in H-L first.
     Thus STA GCPOS stores the value in A to address GCPOS, and
LDA GCPOS gets it back.
     STC and CMC are the instructions to "set Carry" and
"complement Carry"; they only affect the Carry flag.
     There is no simple "clear Carry" instruction, which is what
we really want to do here; you have to first set it, then
complement it. ("Complement" means, roughly, "reverse".)
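     A common one-instruction alternative, if you don't mind the
other flags changing too, is to OR the accumulator with itself.
This is offered as a sketch, not as a change to GETCH: the
logical instruction ORA always leaves Carry clear, and ORA A
leaves the value in A untouched.

ORA    A       ;A is unchanged, Carry is now clear

     Unlike the STC/CMC pair, this also sets the Zero, Sign and
Parity flags according to the value in A, so don't use it where
you still need the result of an earlier comparison.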


8. Character Output: PUTCH
     Here now is the complementary character output routine:

;Routine to write character in A to open file at PCFCB
;Returns C for write Error
PUTCH:  PUSH   H       ;save registers
        PUSH   D       ;  (so PUTCH will be
        PUSH   B       ;   easy to use)
        MOV    C,A     ;save the character in C
        LDA    PCPOS   ;get position count
        CPI    80H     ;is it up to 128?
        JC     PCCHR   ;no, just go write char
        PUSH   B       ;preserve character in C
        LXI    D,PCDMA ;need to write the record out
        MVI    C,26    ;set the DMA to PCDMA
        CALL   BDOS    ;ask BDOS to do it
        LXI    D,PCFCB ;use the FCB at PCFCB
        MVI    C,21    ;write a record sequentially
        CALL   BDOS    ;ask BDOS to do it
        POP    B       ;restore character in C
        CPI    0       ;check return code
        JNZ    PCERR   ;argh, >0: physical error
        STA    PCPOS   ;0: write successful, set PCPOS to 0
PCCHR:  LXI    H,PCDMA ;point to DMA with H-L
        MOV    E,A     ;move PCPOS into E
        MVI    D,0     ;now D-E is 16-bit version of PCPOS
        DAD    D       ;PCPOS+DMA points to next char
        MOV    M,C     ;put outgoing char in its place
        LXI    H,PCPOS ;point to PCPOS with H-L
        INR    M       ;increment PCPOS to point to next
        SUB    A       ;clear Carry flag, all is OK
        JMP    PCRET
PCERR:  STC            ;write error, set Carry
PCRET:  POP    B       ;restore
        POP    D       ;  the
        POP    H       ;    registers
        RET            ;and return
PCPOS:  DS     1       ;keep track of position in record
PCDMA:  DS     128     ;DMA for PUTCH, 128 bytes
PCFCB:  DS     36      ;FCB for PUTCH
;

     By and large, this is very similar to GETCH.
     Note the need to PUSH and POP the outgoing character, so it
isn't lost when we do the BDOS calls to write a record
periodically.
     Also note that PCPOS starts off at 0, not 80H, because you
don't need to write a record until it's full (whereas in GETCH,
we needed to begin by reading a record).


9. Using GETCH and PUTCH
     To use GETCH and PUTCH, we will need a few supporting
routines to open, make, and close their files. First, for
GETCH, we need a routine to open the file for reading:

;Routine to open a file for GETCH (takes DE=FCB)
;Returns C if file not found
GCOPEN: MVI    A,80H    ;initialize GCPOS to 80H
        STA    GCPOS    ;so first record will be read
        SUB    A        ;and zero the EOF flag
        STA    GCFLG
        XCHG            ;put FCB address in HL
        LXI    D,GCFCB  ;we'll move it into GCFCB
        MVI    B,12     ;the drive/filename is 12 bytes
GCL1:   MOV    A,M      ;fetch a byte from FCB
        STAX   D        ;store it in GCFCB
        INX    D        ;point to next
        INX    H        ;in both places
        DCR    B        ;count down on 12 bytes
        JNZ    GCL1     ;loop if more
        SUB    A        ;now get a 0
        STAX   D        ;and put it into extent ("e")
        STA    GCFCB+32 ;and also into current record ("c")
        LXI    D,GCFCB  ;point to start of GCFCB again
        MVI    C,15     ;function 15 opens a file
        CALL   BDOS     ;ask BDOS to do it
        CPI    0FFH     ;check return code
        CMC             ;now Carry is set if it was FFH
        RET             ;return with Carry set if error

     This routine simply copies the filename from the FCB address
given to it in the D-E register, into the GCFCB that will be used
by GETCH; then it tries to open the file, returning with the C
flag set if it could not.
     There is one new instruction here. STAX D (and its relative,
STAX B) are just like MOV M,A, except that they use the D-E (or
B-C) registers as pointers instead of H-L. Similarly, there are
instructions LDAX D and LDAX B, which load values as does MOV
A,M. (Unfortunately, these arbitrary names make them look
completely different.)
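     For comparison, here is a sketch of the same kind of copy
loop written the other way around, with the source address in D-E
and the destination in H-L, so that LDAX does the fetching. The
label CPL1 is made up, and B is assumed to hold the byte count
already.

CPL1:   LDAX   D        ;fetch a byte from the address in D-E
        MOV    M,A      ;store it at the address in H-L
        INX    D        ;point to next
        INX    H        ;in both places
        DCR    B        ;count down
        JNZ    CPL1     ;loop if more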
     The routine to make a new file for use by PUTCH will be very
similar. For now, it simply erases any pre-existing file of the
given name; later we may add code to preserve such a file
instead, possibly renaming it to a ".BAK" file.

;Routine to make a file for PUTCH (takes DE=FCB)
;Returns C if cannot make file
PCOPEN: SUB    A        ;initialize PCPOS to 0
        STA    PCPOS    ;for beginning to write with PUTCH
        PUSH   D        ;preserve FCB address
        MVI    C,19     ;function 19 ERASES a file
        CALL   BDOS     ;ask BDOS to do it
        POP    H        ;recover FCB address into HL
        LXI    D,PCFCB  ;we'll move it into PCFCB
        MVI    B,12     ;the drive/filename is 12 bytes
PCL1:   MOV    A,M      ;fetch a byte from FCB
        STAX   D        ;store it in PCFCB
        INX    D        ;point to next
        INX    H        ;in both places
        DCR    B        ;count down on 12 bytes
        JNZ    PCL1     ;loop if more
        SUB    A        ;now get a 0
        STAX   D        ;and put it into extent ("e")
        LXI    H,20     ;point ahead 20 bytes
        DAD    D        ;HL now points to current record ("c")
        MOV    M,A      ;put 0 there, too
        LXI    D,PCFCB  ;point to start of PCFCB again
        MVI    C,22     ;function 22 makes a file
        CALL   BDOS     ;ask BDOS to do it
        CPI    0FFH     ;check return code
        CMC             ;now Carry is set if it was FFH
        RET             ;return with Carry set if error

     Note the use of BDOS function 19 to delete any file with the
same name before we try to make the new file. We don't care about
any error that may result: if the file did exist, it's now gone;
if it didn't, the attempt to erase it failed, but that's OK too.
Use function 19 with caution; we'll say no more about it here.
     Finally, we need a routine to clean up after PUTCH, when
we're all done writing, and close the file. The task is
complicated by the fact that there may still be characters
sitting in the PCDMA buffer, that need to be written before we
close the file. Here is the routine:

;Routine to close a file for PUTCH (no arguments)
;Returns C if cannot close file
PCLOSE: MVI    A,1AH    ;get a 1AH (EOF char)
        CALL   PUTCH    ;and write it to the file
        JC     PCLERR   ;warn if PUTCH could not write
        LDA    PCPOS    ;now look at position in PCDMA
        CPI    0        ;is it 00?
        JZ     PCLFIL   ;if so, no data, just close file
        LXI    D,PCDMA  ;need to write the last record out
        MVI    C,26     ;set the DMA to PCDMA
        CALL   BDOS     ;ask BDOS to do it
        LXI    D,PCFCB  ;use the FCB at PCFCB
        MVI    C,21     ;write a record sequentially
        CALL   BDOS     ;ask BDOS to do it
        CPI    0        ;check return code
        JNZ    PCLERR   ;warn if error
PCLFIL: LXI    D,PCFCB  ;all set, point to the FCB
        MVI    C,16     ;function 16 closes a file
        CALL   BDOS     ;ask BDOS to do it
        CPI    0FFH     ;examine return code
        CMC             ;Carry now set if was 0FFH
        RET             ;return
PCLERR: STC             ;set Carry for write error
        RET             ;and return

     By now this should all be pretty self-explanatory.
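     As a small taste of how the pieces fit together, here is a
sketch of a main program that displays a text file on the screen.
It assumes the GCOPEN and GETCH routines above are assembled
after it in the same file, and it uses BDOS function 2 (send the
character in E to the console), which is not covered in this
part. The labels START, TLOOP and DONE are invented for the
example.

BDOS    EQU    0005H
        ORG    0100H
START:  LXI    D,005CH  ;default FCB has the name typed
        CALL   GCOPEN   ;open it for GETCH
        JC     DONE     ;quit if file not found
TLOOP:  CALL   GETCH    ;get next character in A
        JC     DONE     ;quit on read error
        CPI    1AH      ;is it EOF?
        JZ     DONE     ;yes, finished
        MOV    E,A      ;no, put the character in E
        MVI    C,2      ;function 2 writes to the console
        CALL   BDOS     ;ask BDOS to do it
        JMP    TLOOP    ;go back for more
DONE:   JMP    0000H    ;warm boot: back to CP/M

     Because GCOPEN copies the name into GCFCB, the default FCB
at 005Ch is left alone. No close is needed, since the file is
only read.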


10. Things To Come
     Take a deep breath!  Next time we'll use these routines to
construct "filter" programs that read and process text files.