                   UNLZH.REL and CRLZH.REL Documentation
              for Version 2.0 (with RUNTIME buffer allocation)
                                July 1991

Abstract
----------

UNLZH.REL and CRLZH.REL contain assembly language routines for the decoding
and generation of LZH-encoded files, respectively.  The routines need only be
supplied with a pointer to a large scratch area and linkages to the character
input and character output routines to be used.  There are a few easily met
functional requirements for the calling routine and I/O routines.  No Z80
opcodes are used, so these routines may be used on 8080/8085/V20/Z80 based
machines.

The coding contained within UNLZH.REL and CRLZH.REL is Copyright (c) 1989,
1991 by Roger Warren and may not be used or reproduced on a for-profit basis.
Non-profit use is freely permitted.  The author will not be responsible for
any damage, loss, etc. caused by the use of these programs.

Version History and Compatibility
---------------------------------

Version 1.1 was released in Sept. '89 and was the first public offering of
LZH encoding for CP/M.

Version 2.0 (July '91) introduces several improvements/changes:

        More efficient encoding
        Greater speed
        More compact object code

There are NO interface changes from the 1.x version.

Of greatest importance is the encoding improvement.  This change, while
generating even smaller output files, means that files compressed with
version 2.x programs cannot be decoded by old 1.x programs.  However, the
version 2.x UNLZH module DYNAMICALLY ADJUSTS for 1.x encoded input files.
Thus, version 2.x of UNLZH can be used on all LZH-encoded files regardless
of which algorithm version was used to encode them.

By extensive rewriting for size and speed, a 20% improvement in performance
was achieved.  Since the LZH algorithm is intrinsically slow, this will be
of great interest to many.  The recoding project allowed the incorporation
of the version 2.x extensions to the original algorithm while not appreciably
affecting the size of the code modules.

Care and Feeding
------------------

The information that follows documents both CRLZH.REL and UNLZH.REL.  One or
both may be in the library this file is in, depending upon the nature of the
program(s) it's bundled with...so ignore the superfluous information (if
any).

UNLZH.REL performs LZH decoding.  Its programming interface is similar to
that of the UNCR.REL it was made to mimic.  The programmer must provide the
routine with 8k of buffer space.  If RUNTIME buffer allocation is selected
(it IS selected in the version supplied with LT, FCRLZH, CRLZH, and UCRLZH),
a pointer to the buffer must be supplied in the H/L register pair when the
routine is invoked.  If RUNTIME buffer allocation is not selected, the user
must supply a PUBLIC symbol, UTABLE, which is the base of the provided buffer
area.  Once invoked, the routine allocates its own stack and 'stays in
control' until the decompression is completed (or an error is encountered).

The programmer must supply two routines, GLZHUN and PLZHUN, via which UNLZH
'GETS' bytes from the input stream and 'PUTS' bytes to the output stream,
respectively.  UNLZH *DOES NOT* compute/process checksums, etc. on the input
file.  Any support of such features must be handled externally.

GLZHUN and PLZHUN should save all registers except the A register and flags.
GLZHUN must return the next character from the input stream in the A
register.  GLZHUN should return with the CARRY flag RESET for a valid
character, or with the CARRY flag SET when the end of the input stream is
encountered (the content of the A reg should be zero in that case).
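To make these conventions concrete, here is a minimal M80-style 8080 sketch
of a caller.  It is NOT part of UNLZH.REL - just an illustration.  It assumes
the RUNTIME buffer allocation version and leans on two hypothetical host
routines, GETBYT and PUTBYT, for the actual byte-level file I/O; only the
register and flag conventions shown are taken from the description above.

        .8080
        EXTRN   UNLZH           ;decoder entry point in UNLZH.REL
        PUBLIC  GLZHUN,PLZHUN   ;I/O routines UNLZH.REL will call
        EXTRN   GETBYT,PUTBYT   ;hypothetical host file I/O (not in UNLZH.REL)

;Invoke the decoder: H/L -> 8k scratch area (RUNTIME allocation version).
;On return, carry reset means success; carry set leaves an error code
;of 1..4 in A (see the list below).
DOUNLZ: LXI     H,UBUF          ;point H/L at the 8k buffer
        CALL    UNLZH           ;decode the entire (rewound) input stream
        RET                     ;caller examines carry and A

;GLZHUN - return the next input byte in A with carry reset, or carry
;set and A = 0 at end of input.  All other registers are preserved
;(GETBYT is assumed to honor that).
GLZHUN: CALL    GETBYT          ;hypothetical: byte in A, carry set at EOF
        RET

;PLZHUN - write the byte in A to the output stream, preserving all
;registers except A and the flags.
PLZHUN: CALL    PUTBYT          ;hypothetical: writes A to the output file
        RET

UBUF:   DS      8*1024          ;8k work area required by UNLZH

        END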
Upon exit (return to the caller), UNLZH returns the following information:

        Carry reset (or A reg = 0) - Success
        Carry set, A reg = 1       - Newer version required
        Carry set, A reg = 2       - File not LZH encoded
        Carry set, A reg = 3       - Bad or corrupt file
        Carry set, A reg = 4       - Insufficient memory

UNLZH has 2 entry points, to be used as the programmer needs:

UNLZH is the 'normal' entry point, which expects the file to be completely
REWOUND.  At this entry point, the entire file is processed - the standard
header is examined, but not reported or acted upon.  By examining the return
code, the programmer can discern whether the file was, indeed, an LZH-encoded
file and act accordingly.

UNL is a secondary entry point which can be used when the programmer needs to
process the standard header information (file name and stamp) and cannot (or
doesn't want to) rewind the file.  When this entry point is invoked, the
header (down to and including the stamp/comment terminating zero) must have
been processed, so the next byte in the input stream will be the revision
level.

The revision level of the decoding UNLZH.REL performs is in the byte at
UNLZH-1.  A hex value of 11 indicates version 1.1, etc.

CRLZH.REL performs LZH encoding.  Its programming interface is similar to
that of the CRUNCH.REL it was made to mimic.  The programmer must provide the
routine with 20k of buffer space.  If RUNTIME buffer allocation is selected
(it IS selected in the version supplied with LT, FCRLZH, CRLZH, and UCRLZH),
a pointer to the buffer must be supplied in the H/L register pair when the
routine is invoked.  If RUNTIME buffer allocation is not selected, the user
must supply a PUBLIC symbol, CTABLE, which is the base of the provided buffer
area.

In addition, at invocation time the A register must contain a value for CRLZH
to install in the 'CHECKSUM FLAG' portion of the file header (see below).
This byte, to be semi-compatible with C.B. Falconer's version of CRN for the
8080, is a subset of CRN's strategy byte:

        value (hex)     meaning
        ----------------------------------------------------
          00            Standard modulo 65536 checksum is used
          10            CRC16 is used
          20,30         Unassigned

SUPPORT FOR CHECK INFORMATION MUST BE EXTERNALLY PROVIDED IN THE
USER-SUPPLIED I/O ROUTINES (see below).  THIS IS ALSO TRUE OF CRN...BUT WAS
NOT EMPHASIZED!  CRLZH merely provides the support for posting the value in
the output stream, since it happens to 'follow' some of the information
posted by CRLZH (see the header description, below).  CRLZH supports no other
features of CRN's strategy byte; all other bits are ignored.

Once invoked, the routine allocates its own stack and 'stays in control'
until the compression is completed (or an error is encountered).

The programmer must supply two routines, GLZHEN and PLZHEN, via which CRLZH
'GETS' bytes from the input stream and 'PUTS' bytes to the output stream,
respectively.  CRLZH *DOES NOT* compute/process checksums, etc. on the input
file.  Any support of such features must be handled externally.
Specifically, the GLZHEN routine must provide for the accumulation of check
information, and the caller must write that check information to the output
stream when CRLZH returns to the caller.

GLZHEN and PLZHEN should save all registers except the A register and flags.
GLZHEN must return the next character from the input stream in the A
register.  GLZHEN should return with the CARRY flag RESET for a valid
character, or with the CARRY flag SET when the end of the input stream is
encountered (the content of the A reg should be zero in that case).
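For illustration only, a matching hedged sketch of a CRLZH caller follows.
It assumes the host has already written the STANDARD HEADER (described below)
to the output stream and rewound the input; GETBYT and PUTBYT are again
hypothetical host routines, and the externally required check-information
handling is only indicated by comments.

        .8080
        EXTRN   CRLZH           ;encoder entry point in CRLZH.REL
        PUBLIC  GLZHEN,PLZHEN   ;I/O routines CRLZH.REL will call
        EXTRN   GETBYT,PUTBYT   ;hypothetical host file I/O (not in CRLZH.REL)

;Invoke the encoder.  The caller has already written the STANDARD HEADER
;(076h, 0FDh, name.ext, optional [comment], 00h - see below) to the
;output stream and rewound the input stream.
DOCRLZ: MVI     A,00H           ;checksum flag: 00h = modulo 65536 checksum
        LXI     H,CBUF          ;H/L -> 20k buffer (RUNTIME allocation version)
        CALL    CRLZH           ;compress the entire input stream
        RET                     ;carry reset = success, else A = 1..3

;GLZHEN - next input byte in A with carry reset, or carry set and A = 0
;at end of input.  The host must accumulate its check information
;(checksum or CRC16) here and write it out after CRLZH returns.
GLZHEN: CALL    GETBYT          ;hypothetical: byte in A, carry set at EOF
        RET                     ;(checksum/CRC accumulation not shown)

;PLZHEN - write the byte in A to the output stream, preserving all
;registers except A and the flags (Z flag usage is noted just below).
PLZHEN: CALL    PUTBYT          ;hypothetical: writes A to the output file
        RET

CBUF:   DS      20*1024         ;20k work area required by CRLZH

        END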
As a service to the user's output processor, every 256th call to PLZHEN is
made with the Z flag set (for monitoring).  At all other times the Z flag is
reset.

Upon exit (return to the caller), CRLZH returns the following information:

        Carry reset (or A reg = 0) - Success
        Carry set, A reg = 1       - File already LZH-encoded, CRUNCHed,
                                     or SQueezed
        Carry set, A reg = 2       - File empty
        Carry set, A reg = 3       - Insufficient memory

CRLZH has a single entry point, at the label CRLZH.  The user must have
placed the standard header information in the output stream and must have
the input stream REWOUND prior to invoking CRLZH.

The revision level of the encoding CRLZH.REL performs is in the byte at
CRLZH-1.  A hex value of 11 indicates version 1.1, etc.

Since CRLZH and UNLZH allocate their own stacks, the user is reminded not to
make too large a use of that stack in the user-supplied I/O routines.  In
addition, if the user-supplied I/O routines decide to abort the CRLZH or
UNLZH operation (due to operator keystrokes, for example), the user must take
steps to restore his own stack.  Upon a normal (or error) return from CRLZH
or UNLZH, the user's stack is properly restored.

STANDARD HEADER information
-----------------------------

LZH encoding follows Steve Greenberg's CRUNCH file format.  The header
contains information identifying the compression format, the original file
name, etc.:

field   size      value     Purpose
-----------------------------------------------------------------------
  1     1 byte    076h      Signifies compressed form.
  2     1 byte    0FDh      Signifies LZH encoding (0FFh is for SQueezed
                            and 0FEh is for CRUNCHed files).
  3     variable  User      Original file name in the form name.ext.
                  supplied  Trailing blanks on the name portion should be
                            suppressed, but a full 3 characters following
                            the '.' should be used for the extension
                            (i.e. no blank suppression).
  4     variable  User      OPTIONAL.  Used for file comment/stamp.  If
                  supplied  used, the convention is that the comment is
                            placed in square brackets [Like this].  Other
                            information may be placed here (e.g., a date
                            stamp).  The logical restriction is that a
                            binary zero must not be part of the comment
                            and/or other information.
  5     1 byte    00h       Signifies end of STANDARD HEADER.

For use of CRLZH, the user must supply all of the information above.  For
UNLZH, use of the UNLZH entry point causes UNLZH to expect to process the
above information.  It will discard the file name and optional comment/stamp,
but will examine the general form (the first 2 fields for a match, and the
general form of the rest of the header).  If the user chooses to use the UNL
entry point, UNL will expect to process the first byte following the end of
the standard header.  What follows the standard header is:

field   size      value     Purpose
-----------------------------------------------------------------------
  6     1 byte    variable  Identifies the generating program revision
                            level (11h signifies a program generated by
                            version 1.1).
  7     1 byte    variable  Significant revision level.  Indicates the
                            major revision level of the algorithm, for
                            decoding program compatibility (10h indicates
                            significant revision 1.0).
  8     1 byte    variable  Check type.  0 = checksum, 1 = CRC16, others
                            currently undefined.
  9     1 byte    05h       Currently a SPARE, set to 05h by convention.

Following this is the compressed file, itself.

What LZH compression does and how it compares
-----------------------------------------------

FIRST - It's SLOW.  Much slower than CRUNCH.  About even with the old
SQueeze.  It's the nature of the algorithm, but the current implementation
contributes somewhat (more on that later).
The most impressive aspect of the algorithm is that it compresses further
than CRUNCH.  The nature of the material being compressed is important -
prose and high level language code will compress further.  Since the
algorithm depends, in part, on patterns within the file being compressed, I
was somewhat surprised to discover that it does a better job (in general) on
.COM files than CRUNCH.  Personally, I was surprised to discover that LZH
compression of CRUNCHed files is possible (but I've disabled that ability in
this release)!

Examples:

        CRUNCH of SLR180.COM      106% ratio (actually made a larger file)
        CRLZH  of SLR180.COM       84% ratio
        CRUNCH of TYPELZ22.Z80     45% ratio
        CRLZH  of TYPELZ22.Z80     40% ratio
        CRUNCH of 'C' source       45% ratio (typical 'C' src selected at
                                              random)
        CRLZH  of 'C' source       33% ratio (same file as above)

A small history
-----------------

I am NOT the originator of LZH encoding.  The program that started my whole
involvement in the introduction of this method of compression to the 8-bit
world bears the following opening comments:

        /*
         * LZHUF.C English version 1.0
         * Based on Japanese version 29-NOV-1988
         * LZSS coded by Haruhiko OKUMURA
         * Adaptive Huffman Coding coded by Haruyasu YOSHIZAKI
         * Edited and translated to English by Kenji RIKITAKE
         */

This 'C' program implemented the compression algorithm of the LHARC program
which arrived on the US scene in the spring of '89.  Being of a curious
nature, I figured I'd play with the algorithm just to understand it (the
internal comments were, indeed, sparse - leaving MUCH to the reader's
contemplation/reverse engineering) while 'better minds' than I tackled it in
earnest.

Months passed.  I found that I was 'mastering' the algorithm (read that as
demonstrating to myself that I understood it) by converting it piece-wise to
assembly language.  After a while, I was left with a 'C' language main
program, run time library, and I/O, with the business end of the compression
and decompression implemented entirely in assembly language.

Since the expected event of one of the 'heavies' in the PD and/or compression
world releasing a CP/M version of the compression algorithm hadn't come to
pass, I set about making a version myself.  The natural choice was to prepare
an analog to the CRUNCH.REL and UNCR.REL of Mr. Steven Greenberg and Mr. C.B.
Falconer and append to/substitute in the existing, widely known programs for
handling SQueezed and/or CRUNCHed files.

I saw no reason to tamper with the format CRUNCH uses on the output file.
Therefore, with the exception of taking the 'next' file type in sequence
(SQueezed files begin with a 76h,FFh sequence; CRUNCHed files with 76h,FEh;
so LZH encoded files begin with 76h,FDh) and setting the revision levels in
the header to appropriate values, there's no difference in the output file
format.  So, you can probably coax your time/date stamping into operating on
LZH encoded files.

                                R. Warren
                                Sysop, The Elephant's Graveyard (Z-Node#9)
                                619-270-3148 (PCP area CASDI)