Text file INDEX generator (c) T.Jennings 7/21/81 Page 1

You can do anything you want with this program except sell it.
Give it to anyone who wants it. Address bugs, suggestions, etc.
to:

        Tom Jennings
        221 W. Springfield St.
        Boston MA 02118

Leave me a message at NECS CBBS.

INDEX is a utility for use with WordStar that generates an
alphabetically sorted index for a file. Words or phrases to be
put in the index are marked with control characters not used
elsewhere within WordStar (at least as of version 1.01). If a
file is later edited, invoking INDEX again will remove the old
index, produce a new one, and add it to the end of the file.

INDEX can also be used with any non-WordStar text editor that
can insert control characters into the text. No other
assumptions are made about the contents of the file, except
that the file is terminated by a control-Z character (the
correct way) or by end of file.

INDEX scans the text file for certain WordStar "dot commands",
such as page breaks, in order to maintain proper page numbers.
If no page "dot" commands are found, as with other editors,
pages are counted internally.

Text file INDEX generator (c) T.Jennings 7/21/81 Page 2

There are two different kinds of index entries: WORDS and
PHRASES. WORDS are what are normally thought of as words:
groups of characters separated by spaces, commas, carriage
returns (called CR from now on) or linefeeds (LF). PHRASES are
groups of words, including the spaces that separate the words.

Since words are easy to find, only a single marker is necessary
to identify them. This marker is a control-K character, ^K,
placed immediately before the word. Phrases must have both ends
marked, and control-P, ^P, is used for this. Below are some
examples:

        The sixth word in this ^Ksentence will be put in the index.

        ^PThis entire phrase will be there^P, also.

Since this is page 2 of the manual, the index entries for these
should look like:

        Sentence...................................... 2
        This entire phrase will be there.............. 2

These two examples are actually in the index at the end of this
manual.

WordStar dot commands

INDEX is optimized for use with WordStar. By default, it scans
the file for "dot commands", notably .PA and "..index". .PA is
used to count pages, and must be the first word on the line to
be counted as a dot command.

The "..index" line is created and used by INDEX. As defined in
the WordStar manual, any line beginning with two dots (..) is
ignored when printed. INDEX uses this to mark the beginning of
the index. When INDEX is run, if it finds the "..index" line,
it removes all text following that line. This allows creating
an index for an updated file that already has an index. If no
"..index" line is found, one is added.

CAUTION: NEVER begin a line of your own with ".." followed
immediately by the word index, as described above. All text
following such a line will be deleted from the file. A single
space after the .. is enough to avoid this, or use .IG instead.

Text file INDEX generator (c) T.Jennings 7/21/81 Page 3

Sorting

As stated before, the index generated is sorted alphabetically.
The entire phrase or word is used in sorting, except that case
is ignored. If identical entries are found, they are listed on
a single line, followed by all the page numbers on which they
were found. Unfortunately, a page number is repeated if the
entry occurs more than once on the same page.

For clarity, some examples of how things work follow.

The following two phrases are equivalent, as case is ignored,
and will be listed on one line. The first occurrence will be
the entry on the left side of the page.

        This is the first phrase
        THIS IS THE FIRST PHrAsE

Since length counts, these next are all in proper order.

        This
        This is
        This is what

Text file INDEX generator (c) T.Jennings 7/21/81 Page 4

Side effects and cautions

This is a list of implementation peculiarities, etc.

-In general, any group of one or more white-space characters
(see below) is converted into a single space character. Phrases
with embedded spaces will have all extra spaces (more than one)
removed.
A phrase may start and end on different lines (or even pages)
and will work properly. Leading spaces will be removed from the
index entry.

-The following characters are converted to and treated as a
single ASCII space character. These also mark the end of a
word:

        CR
        LF
        tab
        comma (,)
        semicolon (;)
        colon (:)
        surprise-mark (!)

-BUG NOTICE Periods are removed from the character stream. This
was a cheap way out, since the period is a sentence terminator.
The only time this is a problem is when putting things such as
filenames in the index (e.g., FILENAME.TYP). If someone
complains, it will probably get fixed.

-BUG NOTICE The buffer for the indexed words is an array in
memory. Like most of my kludges, there is minimal error
checking done. There is currently a limit of 1000 (decimal)
words/phrases per index, and there is a 32768 byte buffer made
for them. If you only have 40K of memory....

-ANNOYANCE WordStar control characters, such as ^B, count as
legal characters, but are not printed in the index. So, if you
indexed two words, ^K^Bfoo and ^Kfoo, they will get separate
entries.

-GOOD THING INDEX assumes you do not want to lose your source
file, and does all work in temporary files. When invoked, it
generates a file name.IDX, and copies the input file to it as
it looks for words. (See the note on ..index and EOF.) Then the
index is put in it, and the file is closed. Then, if all is OK,
any file name.BAK is deleted, the original name.ext is renamed
to name.BAK, and name.IDX is renamed to name.ext.

-Words and phrases will have any leading spaces removed. The
first character of any word or phrase will be converted to
upper case. Note that if a phrase consists of a single blank,
it will NOT be removed from the index. This does not apply to
words, of course, as the next word that comes along will be
indexed.

-Because of wonderful CP/M, and the fact that some of its
utilities use end-of-file instead of a control-Z character to
terminate text, INDEX cannot detect the following read errors:
unwritten random record, zero length.

Text file INDEX generator (c) T.Jennings 7/21/81 Page 5

-INDEX sorts in ASCII order. Digits, quotes, parentheses, etc.
come before letters.

-The sort routine used is horrible. It uses a bubble sort, with
extra unnecessary exchanges. Didn't require much thought,
though.

Text file INDEX generator (c) T.Jennings 7/21/81 Page 6

Colon................................... 4
Comma................................... 4
Control-Z............................... 4
CP/M.................................... 4
CR...................................... 4
Embedded spaces......................... 4
End-of-file............................. 4
Examples................................ 2
Filenames............................... 4
INDEX................................... 1
Leading spaces.......................... 4, 4
LF...................................... 4
Non-WordStar text editor................ 1
Periods................................. 4
PHRASES................................. 2
Semicolon............................... 4
Sentence................................ 2
Side effects and cautions............... 4
Surprise-mark........................... 4
Tab..................................... 4
This entire phrase will be there........ 2
White-space characters.................. 4
WORDS................................... 2
WordStar................................ 1
WordStar "dot commands"................. 1
WordStar dot commands................... 2
^B...................................... 4
^K...................................... 2
^P...................................... 2
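
Illustrative sketch

The marking, clean-up and sorting rules described in this manual
can be sketched in a few lines of Python. This is NOT the INDEX
source, only an illustration under stated assumptions: the
function names (clean, collect_entries, build_index,
format_index) are made up for this sketch, pages are counted
from .PA lines only, a phrase is assumed not to contain a .PA
line, and case is folded to upper case for comparison (which is
consistent with ^B sorting after the letters in the index
above). The bubble sort, the .IDX/.BAK file handling and the
"..index" line are left out.

```python
CTRL_K = "\x0b"   # ^K: marks the word that follows
CTRL_P = "\x10"   # ^P: marks both ends of a phrase
CTRL_Z = "\x1a"   # ^Z: CP/M end-of-text marker
SEPARATORS = set(" \r\n\t,;:!")   # all treated as a single space


def clean(raw):
    """Apply the manual's clean-up rules to one raw word or phrase."""
    # Periods are dropped; every separator becomes a space.
    chars = [" " if c in SEPARATORS else c for c in raw if c != "."]
    # Collapse runs of spaces. Assumption: trailing spaces are
    # dropped as well as leading ones.
    entry = " ".join(p for p in "".join(chars).split(" ") if p)
    return entry[:1].upper() + entry[1:]


def is_page_break(text, i):
    """True if a .PA dot command starts at position i (a line start)."""
    at_line_start = i == 0 or text[i - 1] == "\n"
    is_pa = text[i:i + 3].upper() == ".PA"
    ends = i + 3 >= len(text) or text[i + 3] in SEPARATORS
    return at_line_start and is_pa and ends


def collect_entries(text):
    """Return (entry, page) pairs in the order they appear."""
    text = text.split(CTRL_Z, 1)[0]      # stop at control-Z if present
    entries, page, i, n = [], 1, 0, len(text)
    while i < n:
        if is_page_break(text, i):
            page += 1
        c = text[i]
        if c == CTRL_K:
            j = i + 1
            while j < n and text[j] in SEPARATORS:      # skip leading blanks
                j += 1
            k = j
            while k < n and text[k] not in SEPARATORS:  # word ends here
                k += 1
            entries.append((clean(text[j:k]), page))
        elif c == CTRL_P:
            end = text.find(CTRL_P, i + 1)
            if end == -1:                # unmatched ^P: ignored in this sketch
                break
            entries.append((clean(text[i + 1:end]), page))
            i = end                      # resume after the closing ^P
        i += 1
    return entries


def build_index(text):
    """Merge identical entries (case ignored) and sort them."""
    merged = {}
    for entry, page in collect_entries(text):
        if not entry:
            continue
        # The first occurrence supplies the text shown in the index.
        _, pages = merged.setdefault(entry.upper(), (entry, []))
        pages.append(page)               # duplicates kept, as in INDEX
    return [merged[key] for key in sorted(merged)]


def format_index(index, width=40):
    """Pad each entry with dot leaders, then list its page numbers."""
    return [entry.ljust(width, ".") + " " + ", ".join(map(str, pages))
            for entry, pages in index]
```

Calling format_index(build_index(text)) on the contents of a
marked file gives lines in the same shape as the index above. A
built-in sort stands in for the bubble sort; only the resulting
order matters here.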