CP/M Assembly Language
Part X: Debugging
by Eric Meyer

This month, we're going to take a look at a very useful utility that comes with every CP/M system, but usually has little or no documentation: your debugger.

1. What Are Debuggers For?

Up to this point, we have concentrated on the basics of writing your own assembly code. However, the quickest route to assembly language skills (as well as just getting the software you already have to do exactly what you want) is by studying and modifying code written by others. Much public domain software is available in source code, but some isn't, and of course commercial software, probably including various utilities that came with your computer, is not.

Many programs can be modified to customize their operation according to your taste. (WordStar, for example, is notorious for its complexity in this regard.) You may be given a list of "patch addresses" and values for various options, and you will have to go in and change a byte here and there.

Often there will be some task specific to your computer involving function keys, video display, clock, etc., that you would like to perform in one of your own programs. Probably one of your system utilities already does it, but you don't know how.

There are various tools that allow you to study (and change) the way code is put together, starting from just a COM file. One such tool came with your CP/M system: the "debugger" DDT (or SID, for CP/M Plus). This is basically a program to read and modify bytes of memory, and it works either on what's already in memory (like your CP/M BIOS) or on a COM file that you ask it to read in. As usual, there are more sophisticated debugging tools available, but everyone has DDT/SID, so we'll use these. The two programs are extremely similar, but differ in some respects (mostly regarding reading and writing files).

2. Examining Memory

All you need to do to satisfy your curiosity about CP/M is to enter DDT/SID and take a look around.
If you don't have a copy of the program available already, dust off your original system disk and make one; then type

    A>ddt     or     A>sid

You will see a version message, then a prompt like "-" or "#". When you get bored, just type ^C here to exit the program. Meanwhile, there is a whole alphabet of commands available. Some take numerical arguments; these are in hexadecimal by default, though you can also type # followed by decimal input.

The simplest thing to do is to DISPLAY memory. This produces a listing in both hex and ASCII, between any two addresses:

    -d0000,0007
    0000: C3 03 E3 81 00 C3 00 BD .......

(Note, while we're here, that 0000 is the warm boot vector. Here, it reads JMP E303, which tells you where your BIOS jump table is. 0003 is the IOBYTE under CP/M 2.2, controlling I/O redirection. 0005 is the BDOS vector; it reads JMP BD00.) Ordinarily you would see something like D806, your BDOS entry point. DDT has changed this, for reasons that will become apparent.

Like most commands, DISPLAY assumes natural defaults if you don't specify the addresses: start address where you left off last, end address 192 bytes (C0) later. Examples:

    -d0,ff      display from 0000 to 00FF
    -d100       display from 0100 (to about 01C0)
    -d          display the next bunch

If what you're looking at is text or other data, the display will make a fair amount of sense. If it's machine code, however, it will look like gibberish. (It can be a challenge to tell the difference between code and data, when you don't know beforehand what you're working with.)

To make sense of what may be program code, use the LIST command. This works the same way as DISPLAY, but it produces output in assembler mnemonics! Example:

    -l100       list from 0100 (about 11 lines worth)

You'll see familiar things like "RET" on screen. If what you're looking at really is code, the sequence should make some sense. If it's actually data, the "code" produced by the LIST command will be nonsense.
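
To see why data turns into plausible-looking nonsense, it helps to know that the 8080's register-to-register MOV instructions fill opcodes 40 through 7F hex (except 76, which is HLT), and that is also where ASCII letters live. The following is only a rough sketch in Python, not anything DDT itself contains; the name `list_as_mov` is made up for illustration, and it handles only that one block of opcodes.

```python
# Register order used in the 8080 opcode fields (M means "memory at HL").
REGS = "BCDEHLMA"

def list_as_mov(data):
    """Decode bytes in the range 40h-7Fh as 8080 MOV instructions."""
    lines = []
    for byte in data:
        if byte == 0x76:
            lines.append("HLT")              # the one exception in this range
        elif 0x40 <= byte <= 0x7F:
            dst = REGS[(byte >> 3) & 7]      # bits 5-3: destination register
            src = REGS[byte & 7]             # bits 2-0: source register
            lines.append(f"MOV {dst},{src}")
        else:
            lines.append(f"?? {byte:02X}")   # outside the MOV block; not decoded here
    return lines

# Feed it some ordinary lowercase text, as if LIST had wandered into a message.
for line in list_as_mov(b"abcd"):
    print(line)
```

Every lowercase letter from "a" through "g" comes out as a MOV into register H, which is why a stretch of text tends to list as a long run of similar-looking MOVs.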
(ASCII text, for example, tends to produce inane strings like "MOV H,B; MOV H,C; . . . ")

The LIST command makes it possible to reconstruct source code -- with many limitations. You have to guess what's code and what's data, where one routine starts and another ends. Z80 instructions aren't supported, so when encountered they generate several bytes of garbage. There are no labels or comments. Where someone may once have written:

    NEWLN:  CALL SPMSG      ;line is full...
            DB   CR,LF,0    ;...start a new one
            JR   LOOP       ;and look for next entry

you are now going to see merely something like

    08BC  CALL 1C04     (hmm . . . what's that?)
    08BF  DCR  C        (garbage generated by
    08C0  LDAX B         interpreting the text
    08C1  NOP            CR,LF,0 as machine code)
    08C2  ??=  18       (DDT can't understand the
    08C3  CMA            Z80 "JR" either)

But at least it's a start!

If you get confused about numerical values, the HEX VALUE command will straighten you out. Given an argument in hex, decimal, or ASCII, it tells you the rest. Example:

    -h41
    0041 #65 'A'

3. Making Changes

There is a similar pair of commands for changing what you find. Obviously you should use them with caution; it is easy to make a mistake and crash your system.

The SET MEMORY command allows you to change a byte at a time, in hex or ASCII. It takes one argument, a start address. Byte by byte, it shows the address and its current contents. You can hit RETURN to leave a value alone, or type in a new value, in hex, decimal (with "#"), or ASCII (in single quotes). To stop, type a period ".". Example:

    -s200
    0200 44             (leave this alone)
    0201 53 52          (change this)
    0202 73 'w'         (and this)
    0203 00 .           (all done)

It is also possible to input assembler mnemonics, with the ASSEMBLE command. This turns DDT/SID into an instant interactive assembler, though rather limited in features. Example:

    -a186
    0186 call 1124
    0189 .

(You don't get to see the original values; use LIST first if needed.)

You can even manipulate the individual registers of the 8080 while debugging a program!
The EXAMINE command will show you the contents of the CPU registers, or allow you to change them:

    -x
    -Z-E- A=00 B=0000 D=0000 H=0000 S=0100 P=0100 NOP

This is telling you that the Zero and Parity (Even) flags are set, while the rest are clear. The contents of registers A, BC, DE, HL, SP, and PC (the stack pointer and program counter) are as shown, followed by the instruction at the PC (the one to be executed next).

To change a value in a CPU register, just specify the register:

    -xh
    H=0000 1234

Now when you tell DDT to execute a piece of code, it will start out with register HL=1234h.

4. Running and Tracing Code

You can actually test out programs, or bits of code that you write directly with the ASSEMBLE command. (I say "bits" because it isn't practical to do large programs without the convenience of labels.) The GOTO command will jump to, and execute, any block of code. Before you do this, you should see that the outermost routine ends with a special instruction that will return control to DDT:

    RST 7

If it ends with a warm boot (like JMP 0000) it will kick you out of DDT, and if it ends with just a RET it will wind up who knows where. Try out this little GOTO routine:

    -a100               (create a small routine)
    0100 mvi e,7
    0102 mvi c,2
    0104 call 5
    0107 rst 7          (it will return to DDT)
    0108 .
    -g100               (run it)

(Do you understand what happened?)

In fact, DDT/SID can do a good deal more than this. You can set "breakpoints", so that execution of a long program will stop at certain points for you to examine what has happened, or request a "trace" that will show exactly how the contents of the CPU registers are changing, if you're having problems. This is an excellent way to learn how assembly language works.

The whole debugging system is too complex to explain here, but it's useful to know at least how to trace a routine. Let's go back and take a closer look at the little bellringer we wrote above.
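
For the curious, here is roughly what that ASSEMBLE session deposited in memory. This is a sketch in Python rather than anything you would run under CP/M; the names `ORG`, `ROUTINE` and `listing` are made up for illustration, though the opcode bytes are the standard 8080 encodings.

```python
ORG = 0x0100  # where the "a100" command started assembling

# (mnemonic, encoded bytes) for each instruction typed into DDT
ROUTINE = [
    ("MVI E,07",  bytes([0x1E, 0x07])),        # 1E = MVI E; 07 is the ASCII bell
    ("MVI C,02",  bytes([0x0E, 0x02])),        # 0E = MVI C; 02 is BDOS console output
    ("CALL 0005", bytes([0xCD, 0x05, 0x00])),  # CD = CALL; address is stored low byte first
    ("RST 7",     bytes([0xFF])),              # FF = RST 7, which hands control back to DDT
]

def listing(routine, org):
    """Return (address, mnemonic) pairs plus the address just past the code."""
    rows, addr = [], org
    for mnemonic, code in routine:
        rows.append((addr, mnemonic))
        addr += len(code)
    return rows, addr

rows, next_addr = listing(ROUTINE, ORG)
for addr, mnemonic in rows:
    print(f"{addr:04X}  {mnemonic}")
print(f"{next_addr:04X}  .")
```

The addresses it prints (0100, 0102, 0104, 0107, then 0108) are the same ones DDT prompted with, because each instruction simply starts where the previous one's bytes ended.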
To use the TRACE command, you must first set the PC to point to the code you want to run, then tell DDT how many instructions to execute. If you want to be really cautious you can just type "t" over and over again to trace a single instruction at a time; otherwise you can use something like "t10" to go 16 (10 hex) at a time. First let's make sure everything is set up:

    -l100,107
    0100 MVI E,07
    0102 MVI C,02
    0104 CALL 0005
    0107 RST 7
    -x
    -Z-E- A=00 B=0007 D=1000 H=0000 S=0100 P=0107 RST 07

So the routine is still there. Note the state the CPU was left in from running it the first time: the Zero and Parity flags are set, probably from something having zeroed the A register. There's an "07" in register C because that's where the "bell" character was moved for the BIOS CONOUT routine. The PC is at 0107, where the routine ended.

To run it again in TRACE mode, we set the PC back to 0100 and then use the "t" command. Here's what I see (and hear) on my computer:

    -xp
    P=0107 0100
    -t10
    -Z-E- A=00 B=0007 D=1000 H=0000 S=0100 P=0100 MVI E,07
    -Z-E- A=00 B=0007 D=1007 H=0000 S=0100 P=0102 MVI C,02
    -Z-E- A=00 B=0002 D=1007 H=0000 S=0100 P=0104 CALL 0005
    -Z-E- A=00 B=0002 D=1007 H=0000 S=00FE P=0005 JMP BD00
    -Z-E- A=00 B=0002 D=1007 H=0000 S=00FE P=BD00 JMP C3A4
    -Z-E- A=00 B=0002 D=1007 H=0000 S=00FE P=C3A4 XTHL
    -Z-E- A=00 B=0002 D=1007 H=0107 S=00FE P=C3A5 SHLD D6F2
    -Z-E- A=00 B=0002 D=1007 H=0107 S=00FE P=C3A8 XTHL
    -Z-E- A=00 B=0002 D=1007 H=0000 S=00FE P=C3A9 JMP D806
    (beep!)
    -Z-E- A=00 B=0007 D=1000 H=0000 S=0100 P=0107 RST 07

You can follow everything that happened, line by line. First an 07 (ASCII Bell) is put in the E register, and the PC is bumped to 0102 to point to the next instruction. Then 02 is put in C, and we point to 0104. Then we CALL 0005 (BDOS): a return to the next program address (0107) is placed on the stack, so the SP gets bumped down to 00FE (remember how the stack grows top down?), where 0107 is stored, and then the PC is made to point to 0005.
There we should find a JMP to the BDOS, which would ordinarily have been something like D806, but in this case is BD00 because we're passing through some DDT code in high memory (more on this later). The next few things are being done by DDT itself: we JMP again to C3A4; there, XTHL exchanges what's on the top of the stack (which is 0107, our return address) with the current HL register (which is 0000). The SHLD D6F2 stores this return address, for some internal DDT purpose. Then we restore it again with another XTHL, and finally JMP to the "real" BDOS at D806. (Note that DDT does NOT trace the actual workings of the BDOS, only of your code, and a bit of its own.)

At this point the bell rings, and we return. Note that the BDOS has moved the 07 from E into C, in order to call the BIOS CONOUT routine which expects to find it there. Also, popping the return address (0107) off the stack returns the SP to 0100.

(Note: DDT/SID can fill up the screen with data far faster than you can comfortably read; but like many programs, it will pause if you type a ^S, and resume on a ^Q.)

If you're still shaky on understanding some of the routines we've already written and used, tracing them is a great way to figure out how they really work.

5. Reading and Writing Files

Fortunately, you're not limited to playing transient games in memory. You can also read and write disk files, allowing you to save what you've created, or to permanently modify ("patch") an existing program. Unfortunately the methods DDT and SID use for file I/O differ, so we'll run the examples in parallel columns, DDT on the left, SID on the right. To read in an existing (usually COM) file is simple:

    -iFILENAME.COM     or     #rFILENAME.COM
    -r

You will see a message listing several addresses. In SID you can safely ignore this; in DDT you'd better remember what the "NEXT" address given was.

Once a file is read in, you can examine or change it with all the commands mentioned above. Even a simple DISPLAY can be amusing.
You may find messages and features you didn't know were there, such as "ILLEGAL ATTEMPT, NOW REFORMATTING HARD DISK". (Or just curious text left in the code by the programmer: WordStar's "Nosey, aren't you?" is a classic example.)

Writing out the file is equally simple in SID:

    #wFILENAME.COM

If this was not an already existing file, but something you just created, you will have to specify the address range you want to write out, usually starting at 0100; e.g., to save our little bell program you might use:

    #wBELL.SID,100,107

This will actually write out everything from 0100 to 017F, since files are written in whole records (128 bytes). (Note that I didn't call it "BELL.COM" because it ends with a RST 7, so it will do odd things if you try to run it under CP/M. If you first change this to a RET, you can write BELL.COM too.)

Unfortunately DDT has NO mechanism for writing a disk file! You have to EXIT from DDT (with a ^C) and then use the CP/M 2.2 SAVE command to create a disk file from what's in memory:

    -^C
    A>save 1 BELL.DDT

Hmm, what was that "1"? The SAVE command works in units of memory pages (256 bytes, 2 records). It always starts at 0100, and it needs to know HOW MANY pages to save. Obviously BELL is just one page long, but in general, you will have to do an ugly hex calculation to get the page length from that "NEXT" address that you remembered when the file was loaded into DDT. Each 100H is a page, and 1000H is 16 pages. Suppose you saw "NEXT 1B80" when you first read the file into DDT. That means the file runs from 0100 to 1B7F, so it's 1A80 bytes (1B80-0100) long. Breaking that up, we get:

    1000 = 1 x 1000 =  1 x 16 pages = 16 pages
     A00 = A x 100  = 10 x 1 page   = 10 more pages
      80 = an extra half page       =  1 more page

so you would want to tell SAVE to write 27 pages. (Phew.)

(By the way, how often have you wanted to create a 0k "file" to serve just as a disk or user area label?
Under CP/M 2, just:

    A>SAVE 0 --MAIL--.87

Under CP/M 3 the SAVE command is quite different, and the easiest thing to do is instead to type

    A>PIP --MAIL--.87=CON:

and then type ^Z.)

6. More Commands

I don't want to go into great detail on all the DDT/SID commands (like call, passpoint, untrace), but here briefly are a few more:

    -fxxxx,yyyy,vv      FILL memory xxxx-yyyy with vv
    -mxxxx,yyyy,zzzz    MOVE memory xxxx-yyyy to address zzzz
    -iTEXT . . .        INPUT line

Besides being needed before an "r" command in DDT, the INPUT line also sets up the FCBs and DMA in Page Zero just as the CCP would on encountering the arguments TEXT . . . So you can run a program that expects command line arguments under DDT/SID, by setting them up first with "i" before your "g" or "t" command.

After you've used DDT for a while, you will probably notice that despite the variety of commands, there are some pretty obvious ones that are missing. Worst, perhaps, is that there is no "search" command to find a string of bytes. I have written a small RSX (SIDRSX11 from FOG-CPM.164) that can be easily attached to SID on a CP/M 3.0 system, and that adds the commands "?" and "!":

    #?data          FIND a data string
    #!xxxx,data     WRITE a data string at address xxxx

In each case, the data can be any mixture of hex digits (like EB412C00) and ASCII strings (like "Yes"). Unfortunately, an RSX will not work with DDT under CP/M 2.2. You may want to look through an array of simple public domain programs with names like SEARCH or FIND that can do this, although it is annoying to have to switch back and forth from these to DDT.

7. How Does It Do It?

By this point, I hope you've begun to wonder how DDT/SID can allow you to load program code and run it at 0100, just as if DDT weren't there. The answer: DDT "relocates" itself to the high end of memory, right under the BDOS, in order to free up the usual program area for your debugging efforts. (Remember above, when we did a CALL BDOS, how the call passed through some extra code en route?
That was DDT, keeping its own record of the return address before passing the call along to the BDOS.)

8. Heavy Duty Disassembling

You can probably see that trying to thoroughly break apart and understand a whole program of some size with DDT takes a LOT of work. While a debugger is useful for small scale tasks, if you really want to understand a large piece of a program written by someone else, you need a "disassembler" -- a program designed to take any COM file and produce readable assembly language output. In principle, the task is still complicated by all the difficulties of interpretation mentioned above, but in practice a disassembler can do a remarkably good job.

There are both public domain and commercial programs available -- a really fine one, at least to start with, is the public domain Z80DIS. This is a Z80 disassembler, written by Kenneth Gielow; version 2.1 (Jan. 1987) is available on FOG-CPM.022 (6/87 revision). It's menu-driven, and reasonably fast. Most importantly, though, it does a very good job of guessing what's code and what's data, so that its first attempt at a reconstruction will already be nearly correct. (Many other programs get much less right the first time, meaning more work for you.)

You may still have trouble figuring out exactly what the code you're looking at DOES, but a good disassembler makes the task as easy as humanly possible. Compare what you get from DDT with the output of a disassembler:

    ORIGINAL CODE

    newln:  call spmsg
            db   cr,lf,0
            jr   loop

    DDT OUTPUT

    08BC  CALL 1C04
    08BF  DCR  C
    08C0  LDAX B
    08C1  NOP
    08C2  ??=  18
    08C3  CMA

    DISASSEMBLER OUTPUT

    J#08BC: CALL C.1C04
            DEFB 0D,0A,00
            JR   J.0821

A good disassembler will give you a source code file that can be used to understand and modify a program in ways that go beyond the capabilities of a simple debugger. (Note, if you haven't already, that you usually can't insert or move code with a debugger. This would change all the addresses, and the code would crash if you tried to run it.
But if you have source code to edit and reassemble, this isn't a problem.)

As an exercise, if you get hold of a disassembler, run it on one of the programs (like FILTER.COM) you have already created. See whether you could figure out what the program does, on the basis of that very basic source code alone.

9. Caveats

For safety's sake, keep an unmodified backup copy of all software. You may find that your brilliantly clever modifications suddenly aren't working, and it would be a shame to forget how to put the program back together again.

Some legal and ethical questions are inevitably raised by the use of debugging tools on copyrighted (both commercial, and some public domain) software. Though personal feelings may differ, it seems to me that most anything you do with a debugger in the privacy of your own home (at least, in states other than Georgia) is fine. It's no crime to try to understand code, or to modify it to suit your needs or tastes. But as a rule, don't distribute copies of what you produce by disassembling or patching somebody else's code, either under their name or your own, as source code or COM file, for free or for money.

That much said, have fun.
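
A postscript on the SAVE arithmetic from section 5: the page count can be checked mechanically instead of by hand. What follows is only a sketch in Python; the function name `save_pages` is mine, and it assumes the program sits at 0100 and that the address passed in is the "NEXT" value DDT reported when the file was loaded.

```python
def save_pages(next_addr, start=0x0100, page=0x100):
    """Number of 256-byte pages SAVE must write for code in start..next_addr-1."""
    length = next_addr - start
    # Integer division that rounds any partial page up to a whole one.
    return (length + page - 1) // page

print(save_pages(0x1B80))   # the "NEXT 1B80" example
print(save_pages(0x0108))   # the little bell routine, which ends at 0107
```

For NEXT 1B80 this gives 27, agreeing with the hand calculation, and for the bell routine it gives 1, which is the "1" in "save 1 BELL.DDT".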