SPELL V2.0 DOCUMENTATION
                        Michael C. Adler
                       December 22, 1982

    (C)  1982 Michael C.  Adler

    This  program  has been released into the public domain by
    the author.   It  may  neither  be  sold  for  profit  nor
    included  in a sold software package without permission of
    the author.

    The first SPELL using this dictionary was probably written
    by  Ralph Gorin at Stanford.  It was transported to MIT by
    Wayne Mattson.  Both the program at MIT and the dictionary
    were most recently revised by  William  Ackerman  at  MIT.
    Section 5 of this document was copied from portions of Mr.
    Ackerman's documentation.

          Thanks to all for the effort spent designing the
          dictionary!

     Spell  is  a  program, written for Z80 processors running
    CP/M,  designed to detect misspellings in a document.

1.  USING SPELL

     The  minimum  configuration  of  SPELL  requires  the files
SPELL.COM and DICT.DIC (the main dictionary).  At  the  time  of
execution, DICT.DIC must be on either the default drive or drive
A:.

     The  name  of  the file to be corrected must be included on
the command line that is used to invoke spell.  If a drive  name
is  specified  as  a second file name, output is directed to the
specified drive.  Thus,

               SPELL useless.doc

will  check  the  file  "useless.doc"  and  direct output to the
default drive and

               SPELL b:useless.doc c:

will check the file "b:useless.doc" and direct output to disk c.

     Spell  will  check  the  input file for errors by comparing
each  word in the file to the dictionary.   If  a  word  is  not
found,  a  null (ASCII 0) is placed before the word.  To  change
this marking  character, see section 4, PATCHING  SPELL.   If  a
backup  version   (.BAK  file type) of the input file exists, it
will be deleted.   The input file will be renamed  to  a  backup
file and the checked  file will replace the input file.
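
     As an illustration only, the following Python sketch  lists
the  words  that  have  been marked in a checked file.  It is not
part of SPELL.  The name marked_words is hypothetical,  and  the
sketch assumes that the marker is a single byte placed immediate-
ly before the word, as described above.

```python
import re

MARKER = b"\x00"  # distribution default; section 4 explains how to change it


def marked_words(data, marker=MARKER):
    """Return the words that immediately follow a marker byte."""
    # A word is taken to be a run of letters and apostrophes (section 5.1).
    pattern = re.compile(re.escape(marker) + rb"([A-Za-z']+)")
    return [m.group(1).decode("ascii") for m in pattern.finditer(data)]


# Hypothetical contents of a file after SPELL has checked it.
sample = b"This is a \x00smaple of \x00mispelled text.\r\n"
print(marked_words(sample))
```

If the marker has been patched to another character, pass it as
the second argument, for example marked_words(data, b"#").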

2.  USER DICTIONARIES

     A  user  dictionary is a list of correct words that  can  be
loaded  by  SPELL to augment the main dictionary.  Words such as
proper nouns can be placed in user dictionaries to inhibit error
marking.  User dictionary files may be formatted in any way that
the user desires, as long as words are delimited by non-alphabe-
tic characters.

     SPELL  will  automatically  search  for the user dictionary
SPELL.DIC on the default drive and on drive A: if it is  not  on
the  default one.  Its contents are then loaded and  temporarily
added to the dictionary.  It must be loaded again to be included
in subsequent executions of SPELL.

     SPELL  will also automatically search for d:file.UDC, where
file is the name of the file being corrected and d: is the drive
on which file is found.  If found, it is also loaded and  tempo-
rarily augments the dictionary.  Thus, users may create separate
dictionaries for each text file being corrected.  After locating
d:file.UDC, SPELL will search for d:file.ADD.   This   file   is
created by WordStar's ^QL command (see section 3) and is not  an
ASCII  file.  d:file.ADD contains commands generated by WordStar
to include specific words in the user dictionary associated with
d:file.  SPELL will temporarily place all of the words in it  in
the dictionary and will also save the words by copying them into
d:file.UDC.

     It  is  possible  to  load  additional user dictionaries by
specifying them on the SPELL command line.  A list of user  dic-
tionaries  must  be  preceded by a dollar sign.  A dictionary is
specified by a file name and an  optional  drive  name.   If  no
drive   is  specified,  the  default  drive is searched and then
drive A: is  checked.  Extensions are  ignored  and  default  to
.DIC.  Thus,

     SPELL useless.doc b: $dict1 c:dict2 dict3.fun

would  correct  useless.doc and direct output to drive B:.  User
dictionary DICT1.DIC would be loaded from the default  drive  or
drive  A:,  dictionary  DICT2.DIC would be loaded from drive C:,
and DICT3.DIC would be loaded from the default  drive  or  drive
A:.   Notice that the extension .fun was ignored.
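
     The  command  line  rules  above can be sketched in Python.
This is an illustration under stated assumptions, not the parsing
code in SPELL: the name parse_command is hypothetical, and  an
empty  drive in the result simply stands for "search the default
drive, then drive A:".

```python
def parse_command(tail):
    """Split a SPELL command tail into (input file, output drive, dictionaries)."""
    # Everything after the dollar sign is the list of user dictionaries.
    before, _, after = tail.partition("$")
    args = before.split()
    infile = args[0]
    out_drive = args[1] if len(args) > 1 else None  # None: default drive
    dictionaries = []
    for spec in after.split():
        drive, sep, name = spec.rpartition(":")
        base = name.split(".")[0]  # any extension given is ignored
        dictionaries.append(((drive + sep).upper(), base.upper() + ".DIC"))
    return infile, out_drive, dictionaries


print(parse_command("useless.doc b: $dict1 c:dict2 dict3.fun"))
```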

3.  WordStar's ^QL COMMAND

     Files checked by SPELL can be corrected using WordStar.  In
response  to  ^QL,  the user is asked which portions of the file
should be searched.  WordStar will then position the  cursor  on
the  first marked word and print a menu offering F (Fix word), B
(Bypass word), I (Ignore word), D (Add  to  dictionary),  and  S
(Add   to  supplemental  dictionary).   The F option deletes the
error  marker and returns to the WordStar  main  menu,  allowing
the user  to correct the word.  B will leave the word marker and
will    search   for   the   next   misspelled  word.   In  this
implementation of  SPELL, the I, D and S options all perform the
same function  (although I is easier to use because no  question
is  asked  by  WordStar).  If any of these options (I, D, S) is
chosen, the
mark  will  be  removed  and the word will be added to file.ADD.
Thus, choosing these options informs SPELL that the word is cor-
rect and should not be marked again.  The D and S options do not
add the word to SPELL's main dictionary because the  compression
method  used to store the dictionary is too complicated to allow
such modification efficiently.  After any option  except  F  is
chosen,  WordStar will automatically search for the next marked
word.

4.  PATCHING SPELL

     It  is  not  necessary  to  recompile  SPELL  to change the
character that  marks  misspelled  words.   The  byte  at  0103H
contains   the  marking  character.   Byte  0104H  contains  the
"default disk" [1 for A:, 2 for B:, etc.].  In the  distribution
version  of  SPELL,  the  bytes  are 0 and 1 [NULL and A:].  To
change either setting, alter the bytes at 0103H and 0104H.   Hex
23 ('#') is a tolerable marking character for FinalWord.
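
     For readers who prefer to patch a copy of SPELL.COM on  an-
other machine, here is a hedged Python sketch.  It relies on the
fact that CP/M loads a .COM file at 0100H, so address 0103H  is
offset 3 in the file and 0104H is offset 4.  The name patch_spell
and the sample bytes are hypothetical, and the drive range of  1
to 16 (A: to P:) is an assumption about what SPELL will accept.

```python
LOAD_ADDRESS = 0x0100  # CP/M loads .COM files at 0100H


def patch_spell(image, marker, drive):
    """Return a copy of a SPELL.COM image with bytes 0103H and 0104H changed."""
    if not 0 <= marker <= 0xFF:
        raise ValueError("marker must fit in one byte")
    if not 1 <= drive <= 16:
        raise ValueError("drive must be 1 (A:) through 16 (P:)")
    patched = bytearray(image)
    patched[0x0103 - LOAD_ADDRESS] = marker  # marking character
    patched[0x0104 - LOAD_ADDRESS] = drive   # "default disk"
    return bytes(patched)


# Stand-in bytes only; this is NOT the real start of SPELL.COM.
fake_image = bytes([0xC3, 0x00, 0x02, 0x00, 0x01, 0x00])
patched = patch_spell(fake_image, ord("#"), 2)
print(patched[3:5].hex())
```

Always work on a copy of the program so that the distribution
version can be restored.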

5.  PROGRAM AND DICTIONARY CHARACTERISTICS

5.1 Word identification algorithm

     A  word  is  any  uninterrupted  sequence  of  letters  and
apostrophes, which does not begin or  end  with  an  apostrophe.
Any  punctuation,  digit,  or control character separates words.
Any word consisting of a single letter, or any  word  more  than
40 letters long, is considered to be correctly spelled.
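
     The  rule  above  can  be approximated in Python as follows.
This is a sketch, not SPELL's scanner.  It assumes that  leading
and  trailing  apostrophes  are simply dropped from a word, and
that only letters count toward the 40 letter limit; neither point
is  stated  above.   The  name  words_to_check  is hypothetical.

```python
import re

# Letters and apostrophes, beginning and ending with a letter.
WORD = re.compile(r"[A-Za-z](?:[A-Za-z']*[A-Za-z])?")


def words_to_check(text):
    """Return the words that would need a dictionary lookup."""
    found = []
    for match in WORD.finditer(text):
        word = match.group(0)
        letters = sum(ch.isalpha() for ch in word)
        if letters == 1 or letters > 40:
            continue  # considered correctly spelled without a lookup
        found.append(word.upper())
    return found


print(words_to_check("'Tis the dog's 2nd try, a re-test."))
```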

5.2  Dictionary policy

     It  is  the  policy  of  this  program  to contain only one
spelling of a word, even if ordinary dictionaries show  two   or
more  "acceptable"  spellings.   Hence, the dictionary  contains
LABELED and LABELING,  but  not  LABELLED  or  LABELLING,   even
though  all  four are actually acceptable.  The intention  is to
enforce uniformity within each document.  The author  apologizes
for the restriction  on  creativity  and  diversity   that  this
necessitates,  but believes that it is the best policy  for this
program.

     The  dictionary  contains many technical and computer terms
such as MICROPROGRAM and DEBUGGER, but does not contain  extreme
jargon  words  such  as  CONTROLIFY  or  VALRET.  The dictionary
contains no proper names  other  than  names  of  countries  and
states  of  the  United  States.  The reason is that it would be
virtually impossible to contain all of  the  proper  names  that
commonly  arise  in  normal use.  Users should keep proper names
(and other correctly spelled words) that arise in their own work
in private dictionaries to avoid having to repeatedly tell SPELL
to accept them.

     The dictionary is significantly smaller than that found  in
other  spelling checkers, such as the DEC TOPS-20 program.   The
author believes that the larger dictionary would not reduce  the
number of false misspelling indications by very much.

[Note:  I  believe  that this dictionary is actually MUCH larger
than any dictionaries currently  available  for  microcomputers.
-Michael]

5.3  Dictionary flags

     Words in SPELL's main dictionary (but not the other dictio-
naries)  may  have  flags  associated  with them to indicate the
legality of suffixes without the need to keep the full  suffixed
words  in  the dictionary.  The flags have "names" consisting of
single letters.  Their meaning is as follows:

Let  #  and  @  be  "variables"  that  can stand for any letter.
Upper case letters are constants.  "..."  stands for any  string
of zero or more letters, but note that no word may  exist in the
dictionary which is not  at  least  2  letters  long,  so,   for
example,  FLY  may  not  be produced by placing the "Y" flag  on
"F".  Also, no  flag  is  effective  unless  the  word  that  it
creates  is  at  least 4 letters long, so, for example, WED  may
not be produced by placing the "D" flag on "WE".

"V" flag:
        ...E --> ...IVE  as in CREATE --> CREATIVE
        if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE

"N" flag:
        ...E --> ...ION  as in CREATE --> CREATION
        ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
        if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN

"X" flag:
        ...E --> ...IONS  as in CREATE --> CREATIONS
        ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
        if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS

"H" flag:
        ...Y --> ...IETH  as in TWENTY --> TWENTIETH
        if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH

"Y" FLAG:
        ... --> ...LY  as in QUICK --> QUICKLY

"G" FLAG:
        ...E --> ...ING  as in FILE --> FILING
        if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING

"J" FLAG:
        ...E --> ...INGS  as in FILE --> FILINGS
        if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS

"D" FLAG:
        ...E --> ...ED  as in CREATE --> CREATED
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IED  as in IMPLY --> IMPLIED
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ED  as in CROSS --> CROSSED
                                or CONVEY --> CONVEYED

"T" FLAG:
        ...E --> ...EST  as in LATE --> LATEST
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#EST  as in SMALL --> SMALLEST
                                or GRAY --> GRAYEST

"R" FLAG:
        ...E --> ...ER  as in SKATE --> SKATER
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ER  as in BUILD --> BUILDER
                                or CONVEY --> CONVEYER

"Z" FLAG:
        ...E --> ...ERS  as in SKATE --> SKATERS
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
                ...@# --> ...@#ERS  as in BUILD --> BUILDERS
                                or SLAY --> SLAYERS

"S" FLAG:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@IES  as in IMPLY --> IMPLIES
        if # .eq. S, X, Z, or H,
                ...# --> ...#ES  as in FIX --> FIXES
        if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
                ...# --> ...#S  as in BAT --> BATS
                                or CONVEY --> CONVEYS

"P" FLAG:
        if @ .ne. A, E, I, O, or U,
                ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
        if # .ne. Y, or @ = A, E, I, O, or U,
                ...@# --> ...@#NESS  as in LATE --> LATENESS
                                or GRAY --> GRAYNESS

"M" FLAG:
        ... --> ...'S  as in DOG --> DOG'S
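
To make the notation concrete, here is a Python sketch of  three
of the rules above ("G", "D", and "S"), together with the 4 let-
ter minimum.  It is an illustration written from the  rule  text
only,  not  code  taken from SPELL, and the name apply_flag is
hypothetical.  Roots are assumed to be upper case and  at  least
2 letters long.

```python
VOWELS = "AEIOU"


def apply_flag(root, flag):
    """Return the word a flag produces from a root, or None if too short."""
    last, prev = root[-1], root[-2]  # '#' and '@' in the rules above
    if flag == "G":
        word = root[:-1] + "ING" if last == "E" else root + "ING"
    elif flag == "D":
        if last == "E":
            word = root + "D"
        elif last == "Y" and prev not in VOWELS:
            word = root[:-1] + "IED"
        else:
            word = root + "ED"
    elif flag == "S":
        if last == "Y" and prev not in VOWELS:
            word = root[:-1] + "IES"
        elif last in "SXZH":
            word = root + "ES"
        else:
            word = root + "S"
    else:
        raise ValueError("flag not covered by this sketch")
    # No flag is effective unless the result has at least 4 letters.
    return word if len(word) >= 4 else None


for root, flag in [("FILE", "G"), ("IMPLY", "D"), ("CONVEY", "S"), ("WE", "D")]:
    print(root, flag, apply_flag(root, flag))
```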

Note:  The  existence  of a flag on a root word in the dictionary
is not by itself sufficient to  cause  SPELL  to  recognize  the
indicated  word  ending.   If  there  is  more than one root for
which a flag will indicate a given word, only one of  the  roots
is the correct one for which the flag is effective; generally it
is  the  longest  root.   For example, the "D" rule implies that
either PASS or PASSE, with a "D" flag, will yield  PASSED.   The
flag must be on PASSE; it will be ineffective on PASS.  This  is
because, when SPELL encounters the word PASSED and fails to
find  it  in its dictionary, it strips off the "D" and looks  up
PASSE.  Upon finding PASSE, it then accepts PASSED if and   only
if  PASSE  has  the  "D" flag.  Only if the word PASSE is not in
the main dictionary at all does the program strip  off  the  "E"
and  search  for  PASS.  Furthermore, some combinations of flags
are forbidden to allow for dense flag encoding  to  save  space.
For example, only one of the "P", "J", or "V" flags may be on in
any one word.
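
The lookup order described in the Note can be sketched in Python
for  the  "D"  flag.   This is a reading of the PASSE and PASS
example, not SPELL's own code.  The  name  accepts_by_d_flag  is
hypothetical,  the dictionary is modeled as a mapping from each
root to the set of its flags, and the ...@Y  -->  ...@IED  case
is deliberately left out.

```python
VOWELS = "AEIOU"


def accepts_by_d_flag(word, dictionary):
    """Return True if `word` is accepted through the "D" flag of a root."""
    if not word.endswith("ED") or len(word) < 4:
        return False
    longer = word[:-1]  # PASSED -> PASSE (rule ...E --> ...ED)
    if longer in dictionary:
        # The longer root exists, so it alone decides the matter.
        return "D" in dictionary[longer]
    shorter = word[:-2]  # PASSED -> PASS (rule ...@# --> ...@#ED)
    if len(shorter) < 2 or shorter not in dictionary:
        return False
    last, prev = shorter[-1], shorter[-2]
    # The plain "add ED" rule excludes roots in E and in consonant + Y.
    rule_applies = last != "E" and not (last == "Y" and prev not in VOWELS)
    return rule_applies and "D" in dictionary[shorter]


print(accepts_by_d_flag("PASSED", {"PASSE": {"D"}, "PASS": set()}))
print(accepts_by_d_flag("PASSED", {"PASSE": set(), "PASS": {"D"}}))
```

In the second call the flag on PASS is ineffective because PASSE
is present in the dictionary without the flag.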

6.  SPELL INTERNALS

     SPELL  uses  a  number of temporary files during execution.
The file file.D$$ is the union of file.UDC and file.ADD.  At the
end of execution, file.UDC and file.ADD are deleted and file.D$$
is renamed to file.UDC.  The file file.$$$ is the  output  file.
At  the end of execution, file.BAK is deleted, the input file is
renamed to file.BAK, and file.$$$ is renamed to the  input  file
name.   Warning:  if  you  do  not  have  room  on your disk for
file.BAK, file.DOC and file.$$$ at the same time, either use two
drives or delete file.BAK before you start.

     SPELL corrects files with two passes of the input file.  On
the  first pass, the words in the file are sorted alphabetically
and duplicate words are eliminated.  An attempt is then made  to
search  for  the  words in the dictionary.  Words that are found
are flagged in memory.  On the second pass of  the  input  file,
SPELL  determines  whether each word was found by locating it in
memory.  This method makes the operation of SPELL more efficient
because common words must be looked up only once and because the
dictionary can be searched sequentially,  minimizing  disk  head
travel.   If the whole file does not fit in memory on the  first
pass, the input file is partitioned into sections  small  enough
to  fit  into  memory  and  is  then corrected in a series of
two-pass operations until the entire file has been checked.   It
is  unlikely  that  memory will be filled in large systems by
even large text files, as 3000 distinct words should fit easily.
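
     The  two-pass  idea  can be sketched in Python as follows.
This is a simplified illustration, not SPELL's  implementation:
the dictionary is a plain set of upper case words held in memory,
suffix flags and the 40 letter limit are ignored, and  the  name
check is hypothetical.

```python
import re

WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def check(text, dictionary, marker="\x00"):
    """Return `text` with a marker placed before each word not found."""
    # Pass 1: collect the distinct words and sort them alphabetically.
    unique = sorted({w.upper() for w in WORD.findall(text)})
    # Look the sorted words up in a single forward sweep of the sorted
    # dictionary, so each common word costs one lookup at most.
    found = set()
    entries = iter(sorted(dictionary))
    entry = next(entries, None)
    for word in unique:
        while entry is not None and entry < word:
            entry = next(entries, None)
        if entry == word:
            found.add(word)

    # Pass 2: copy the text, marking each word that was not found.
    def mark(match):
        word = match.group(0)
        if len(word) == 1 or word.upper() in found:
            return word
        return marker + word

    return WORD.sub(mark, text)


print(repr(check("The cat zat on a mat", {"THE", "CAT", "ON", "MAT"})))
```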

7.  DICTIONARY INTERNALS

     The dictionary has been significantly compressed  in  order
to  save  space.   Dictionary records are all 256 bytes long and
each record contains as many  words  as  will  fit.   Individual
words  are stored in the following code:

     4 bits -- Number  of  characters to copy from  the  previous 
               word.    Because  the  dictionary  is  stored   in 
               alphabetical  order,  this saves a large number of 
               characters.   This field is 0 at the beginning  of 
               each record.

 x * 5 bits -- Characters are stored in 5 bit code.  There may be 
               any  number  of  5 bit  characters.   A  character 
               string is terminated by the following field.

     3 bits -- Set to 111 binary to indicate the end of the word.  
               Since  11100  binary  is  greater  than  26,   all 
               alphabetic characters can be stored without  using 
               this combination.

     4 bits -- Number  of  bits of flag data following the  word.  
               The bit position of the flags has been ordered  so 
               that  the flags most frequently used are earliest.  
               Flags not stored are assumed to be off.

     x bits -- Flag data.  x is determined by the previous field.  
               Each bit represents one of the 14 suffix flags.
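
     The  field  layout above can be exercised with the following
Python sketch, which encodes and decodes single entries.  Several
details  are  assumptions, because they are not stated above: the
bits are read most significant bit first, the 5 bit codes  are
A = 0 through Z = 25 with the apostrophe as 26, and the flag bits
are carried as an uninterpreted string.  All names are hypotheti-
cal, and record boundaries are not handled.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ'"  # assumed 5 bit codes 0..26


def encode_entry(word, previous, flags):
    """Return the bit string for one word; `flags` is a string of 0s and 1s."""
    keep = 0  # characters shared with the previous word (4 bit field)
    while keep < min(len(word), len(previous), 15) and word[keep] == previous[keep]:
        keep += 1
    bits = f"{keep:04b}"
    bits += "".join(f"{ALPHABET.index(ch):05b}" for ch in word[keep:])
    bits += "111"  # end of word; no assumed character code starts with 111
    return bits + f"{len(flags):04b}" + flags


def pack(bits):
    """Pad a bit string with zeros and convert it to bytes."""
    bits += "0" * (-len(bits) % 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


class BitReader:
    """Hands out fields from a byte string, most significant bit first."""

    def __init__(self, data):
        self.bits = "".join(f"{byte:08b}" for byte in data)
        self.pos = 0

    def peek(self, count):
        return self.bits[self.pos:self.pos + count]

    def take(self, count):
        field = self.peek(count)
        self.pos += count
        return field


def decode_entry(reader, previous):
    """Read one entry and return (word, flag bits)."""
    keep = int(reader.take(4), 2)
    tail = ""
    while reader.peek(3) != "111":
        tail += ALPHABET[int(reader.take(5), 2)]
    reader.take(3)  # discard the 111 terminator
    count = int(reader.take(4), 2)
    return previous[:keep] + tail, reader.take(count)


data = pack(encode_entry("DOG", "", "1") + encode_entry("DOGMA", "DOG", "001"))
reader = BitReader(data)
first = decode_entry(reader, "")
second = decode_entry(reader, first[0])
print(first, second)
```

In this example DOGMA costs only two stored characters,  because
the first three are copied from DOG.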

8.  MODIFYING THE MAIN DICTIONARY

     The  source  for the main dictionary can currently be found
in the file "[MIT-XX]SRC:<WBA>SPELL.DCT".  In order to  make  it
compatible  with  SPELL,  all of the "/" characters that delimit
flags must be converted to "%" characters so that flags will  be
considered earlier in the alphabet than apostrophes (DOG%S should be
before DOG'S).  The file must then be sorted alphabetically.  No
utilities  are provided with SPELL to accomplish either of these
tasks.  Without high capacity  disk  drives,  you  may  find  it
necessary to perform the above steps on a larger computer.
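
     On a larger computer, the two steps might be carried out by
something like the Python sketch below.  It is offered only as
an illustration: it assumes one entry per line in  the  source
file  and  assumes  that  "alphabetically" means plain ASCII
order, neither of which is stated above.   The  name  pre-
pare_source is hypothetical.

```python
def prepare_source(lines):
    """Replace the "/" flag delimiter with "%" and sort the entries."""
    converted = [line.strip().replace("/", "%") for line in lines if line.strip()]
    # Plain character-code ordering places "%" before the apostrophe.
    return sorted(converted)


print(prepare_source(["DOG'S", "DOG/S", "CAT"]))
```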

     Once  a  copy of the main dictionary has been placed on the
microcomputer, use the program DICCRE to  create  a  dictionary.
Include  the name of the source file on the DICCRE command line.
DICCRE will create the files  DICT.DIC  (compressed  dictionary)
and  SPELL0.MAC (pointer file to dictionary) ON THE DEFAULT DISK
DRIVE.   When  it  has finished converting the input file to the
dictionary file, it will execute a warm boot if the output  file
is  on the same drive as the input file.  However, if the output
file is not on the same disk, it will ask whether another  input
file  exists.   This  feature  allows the user to put the source
file  on two disks in case it does not fit on one.  DICCRE  will
combine  them into one dictionary file.  If no more files exist,
answer  N  to the question.  If another file does exist, put the
disk with  the new file in the input drive and type Y.

     After the dictionary file has been created, it is necessary
to  recompile  SPELL  with the new pointer file, SPELL0.MAC.  If
your assembler does not support the INCLUDE statement, you  will
have  to  replace  the  line  INCLUDE  SPELL0.MAC  in  the  file
SPELL.MAC  with the contents  of  SPELL0.MAC.   After  SPELL  is
recompiled, be  sure to use the correct copy of DICT.DIC with it
or you will  obtain unpredictable results.

     For more information about dictionaries, see the file:
          [MIT-XX]SS:<WBA>DICT.LETTER

Good luck and happy hacking!

Michael Adler       (MADLER@MIT-ML)
3 Sunny Knoll Terrace
Lexington, MA  02173