CP/M Assembly Language
Part II: The 8080 CPU
by Eric Meyer

Last time we discussed the assembler itself, some basic assembler directives, and their use in patching (modifying) existing programs. Now we'll begin to investigate writing our own programs, which will first require learning something about the 8080 chip itself.

We will be using the standard Intel mnemonics for 8080 instructions. (All of what follows will apply equally well to the Z80, which simply has more registers and instructions. Unfortunately there's a further complication: the Zilog instruction set, besides being larger, also uses different names for all the CPU instructions. We may deal with this in a later installment -- for now everything applies to an 8080 assembler.)

I'm assuming you have some programming experience so won't explain such concepts as loops, subroutines, etc., though I will discuss ways in which assembly language differs from the higher level languages with which you may be familiar.

1. 8080 Registers

The first thing to become accustomed to is the idea of a register. There are no variables and types (real number, character string) in assembly language, just numerical values and the place where they are stored. At any given time, a program will have most of its data stored someplace in memory, but what it is working on immediately must be fetched into the CPU itself. The CPU contains a number of registers for this purpose. By convention they have one-letter names, and each can hold one byte of data. Often they are depicted like this:

        +-------+=======+
        |   A   |   F   |
        +-------+=======+
        |   B   |   C   |
        +-------+-------+
        |   D   |   E   |
        +-------+-------+           +-------+
        |   H   |   L   |  - - - >  |  (M)  |
        +-------+-------+           +-------+

Seven of the eight registers "A-L" are used to hold data bytes; the "F" (Flag) register serves a different purpose, which we'll discuss later.
The "A" register (accumulator) is where most of the arithmetic and logical operations take place, though it also can simply hold a byte for a while. The "B-C", "D-E", and "H-L" registers can function either separately or together in pairs to hold a word (two bytes) of data, often a memory address.

The "M" register isn't really in the CPU at all -- it represents the contents of the memory at the address in the H-L register pair. Thus you can operate with a byte of data stored in memory just as you would one stored in the CPU itself, by putting its address in the H-L registers and referring to it as "register M". Typically the H-L pair is used as a pointer in this fashion. The B, C, D, and E registers are used for a variety of tasks: as pointers, holding counter values, storing bytes temporarily.

2. The MVI and MOV Instructions

The simplest things you can do with a CPU register are put a data byte into it, and move a byte from one register to another. The MVI (move immediate) instruction puts a byte into a register:

        MVI  A,13

puts the value 13 (decimal, unless followed by "H" for hex or "B" for binary) into the A register. The MOV (move) instruction simply moves whatever's in one register into another:

        MOV  B,A

takes the contents of the A register (at the moment, 13) and puts that value in the B register too. (The A register remains unchanged.) Note that in both cases, the destination comes first, then the source. Thus you read "MOV B,A" as "MOVe into B, the value in A".

Let's consider what's going on at the gut level. The assembler will translate each "MOV" instruction into one instruction byte; e.g., "MOV B,A" turns out to be 47H. Each "MVI" is also one instruction byte, followed by a second byte that gives the value being put in the register; e.g., "MVI A," is 3EH, so the instruction "MVI A,13" will become the two bytes 3EH, 0DH. This is typical of how all the assembly language instructions wind up in the executable program.
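
As a sketch, here is how those two instructions would appear in an assembler listing, with the address and the assembled bytes on the left. The starting address 0100H is an assumption made for illustration (it is where CP/M loads a COM file); the instruction bytes are the ones just quoted.

        0100 3E0D      MVI  A,13    ;A = 13 (0DH): opcode 3EH, then the data byte
        0102 47        MOV  B,A     ;B = 13 as well; A is unchanged

The two-byte MVI occupies 0100H and 0101H, so the one-byte MOV that follows it lands at 0102H.
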
The sequence of hex bytes 3E,0D,47 in a COM file would, when the program runs, put the value 13 into the A register, and then into the B register too. While we will mostly be discussing things at the level of the mnemonics (like "MVI") that you write, it helps to know what's ultimately happening.

3. The LXI and XCHG Instructions

Similar instructions let you manipulate 16-bit values in register pairs, though the Intel mnemonics give these instructions totally different names. The LXI instruction (load index immediate) puts a word value into a register pair, referred to by the first register name:

        LXI  H,801H

loads the H-L register pair with the value 801 hex. The high byte (08H) goes into the first register (H), the low byte (01H) into the second (L). (Note that this differs from the "backward" order in which 16-bit values are stored in memory, which we discussed last time.) In fact, you could have done exactly the same thing with the pair of instructions:

        MVI  H,8
        MVI  L,1

except that the "LXI" instruction makes it clearer what you're doing (and in fact takes only 3 bytes of code, as opposed to 4 for the two "MVI"s).

When you need to move 16-bit values around, you do in fact have to use pairs of "MOV" instructions; e.g., to move this value from the H-L register pair into B-C now would require:

        MOV  B,H
        MOV  C,L

There is one exception. There are times when you would find it convenient to exchange the contents of the D-E and H-L register pairs, and this can be done by the simple instruction XCHG.

4. The INR, DCR and INX, DCX Instructions

The need to "increment" and "decrement" (add and subtract 1) is very common. The INR and DCR instructions increment and decrement a single register, e.g.,

        INR  A

adds one to whatever value was in the A register previously. A similar pair, INX and DCX, work with 16-bit values in register pairs:

        DCX  H

would subtract one from the value in the H-L register pair.
If H-L still contained the value 801H we put in a moment ago, it would now contain 800H (i.e., H would still contain 08H, and L would contain 00H). Note that this is not the same as

        DCR  H

which would decrement the H register as a single byte, and not affect the L register at all. (If H-L had contained 801H, it would now contain 701H.)

All arithmetic is cyclical here: negative numbers are represented by their two's complements. For example, if you do this:

        MVI  A,0
        DCR  A

the A register will contain the value FFH, which you may interpret as either 255 or -1, depending on the circumstances. If you increment that, of course you will get zero again. Most assemblers in fact allow you to write a statement like

        MVI  A,-1

which is actually exactly equivalent to "MVI A,255".

5. Moving Bytes Around

It's time to see how these instructions fit together to accomplish something potentially useful. Let's consider moving several bytes of data (this could easily be text) from one place to another.

                        ORG  0100H      ;code begins here
    0100 211101         LXI  H,SOURCE   ;point to source with H-L
    0103 111301         LXI  D,DEST     ;point to destination with D-E
    0106 7E             MOV  A,M        ;fetch byte from memory into A
    0107 EB             XCHG            ;exchange so H-L is now dest
    0108 77             MOV  M,A        ;store byte at destination
    0109 EB             XCHG            ;now H-L is source again
    010A 23             INX  H          ;point to the next byte
    010B 13             INX  D          ;and the next destination
    010C 7E             MOV  A,M        ;get another byte
    010D EB             XCHG            ;switch to destination again
    010E 77             MOV  M,A        ;store the byte
    010F EB             XCHG            ;switch back to source
    0110 C9             RET             ;all done, return.
    0111 6869   SOURCE: DB   'hi'       ;data: the bytes we will move,
    0113 3F3F   DEST:   DB   '??'       ;and where we'll put them
    0115                END             ;end of source file

In the middle is the source code, with comments to the right. On the left I have put the actual addresses and instruction bytes that will result. (Most assemblers can produce a "listing" [LST or PRN] file just like this, for your reference.)
If you actually assembled this "program" into a COM file, it would contain the 21 bytes of instructions and data shown on the left, at addresses 0100-0114H.

What's going on here? "SOURCE:" and "DEST:" are labels -- the assembler will figure out the actual addresses where they will wind up and will keep track of those (16-bit) values. When you refer to SOURCE, for example, the assembler will substitute the address of the label -- in this case 0111H -- so the statement "LXI H,SOURCE" is actually "LXI H,0111H". Similarly for DEST.

The "DB" instruction will accept text characters in single quotes as shown. (We could have written "DB 68H,69H" instead of "DB 'hi'", since those are in fact the ASCII codes for these letters, but it wouldn't have been as clear what we meant.)

Remember that the "M register" actually refers to whatever's in memory at the address in the H-L register pair. Putting the address SOURCE in the H-L pair automatically makes "M" refer to the byte at that address, namely the 'h'. So "MOV A,M" fetches that 'h' into the A register. Then the program exchanges D-E and H-L, so that it's now DEST in the H-L pair, and "M" refers to the byte there (the first '?'); and it stores the 'h' there. Then it exchanges back again, increments both "pointers" so that they point to the next byte of data (the 'i') and the next destination (the second '?'), and does it again. Then we're finished, and the program returns control to the operating system.

You can write, assemble, and run this little program. You might even try modifying it, e.g., extend it to move three or four bytes instead of just two. But it is ridiculously easy to crash your computer when doing assembly language programming, so don't leave disks in the machine that you don't have copies of, and don't be afraid to push the Reset button if disaster strikes.

If you had used some higher-level language to write something like:

        100 DEST$="??"
        110 SOURCE$="hi"
        120 DEST$=SOURCE$

your compiler would have generated machine instructions similar to those above, but about 10 times more. (When you program in assembly language, you don't have to include a single byte that you don't need.)

This program was no big deal and nothing visible happened. But be patient; there's a lot more to learn.
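
For readers who want to try the exercise suggested in section 5, here is one possible three-byte version. It is offered as a sketch, not as the only way to do it. The text 'hey' and the three-character '???' placeholder are made up for illustration; everything else repeats the fetch, exchange, store, exchange pattern of the two-byte program, using only instructions introduced in this installment.

                ORG  0100H      ;code begins here, as before
                LXI  H,SOURCE   ;H-L points to the source
                LXI  D,DEST     ;D-E points to the destination
                MOV  A,M        ;fetch the first byte
                XCHG            ;H-L is now the destination
                MOV  M,A        ;store it
                XCHG            ;H-L is the source again
                INX  H          ;advance both pointers
                INX  D
                MOV  A,M        ;fetch the second byte
                XCHG
                MOV  M,A        ;store it
                XCHG
                INX  H          ;advance both pointers
                INX  D
                MOV  A,M        ;fetch the third byte
                XCHG
                MOV  M,A        ;store it
                XCHG
                RET             ;all done, return
        SOURCE: DB   'hey'      ;hypothetical data: three bytes to move
        DEST:   DB   '???'      ;three bytes of room to receive them
                END

Each additional byte costs six more one-byte instructions: the two INX instructions plus another fetch, exchange, store, exchange group. The labels take care of themselves, because the assembler recomputes the addresses of SOURCE and DEST when the code ahead of them grows.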