ZGREP11.DOC          7/20/88          Richard Brewster

ZGREP11.COM is a version of the UNIX utility GREP which searches for a
pattern in a file or group of files and prints out the lines in which
the pattern is found.  GREP is an acronym for Global Regular Expression
Print.  ZGREP11 is a sibling of ZCOPY20 and ZERA10.  In addition to
pattern searching ZGREP can copy, concatenate, list, and line number
ASCII files.

The general invocation of ZGREP is -

     A0>ZGREP "expression" afn {afn...} {>outfile} {[flags}

The arguments in braces above are optional.  (The braces themselves
should not be typed on the actual command line.)

You can rename ZGREP11.COM to ZG.COM.  This is how the examples are
shown below.

Typing the program name with no arguments prints a help screen.  The
help text can be redirected like any other output -

     A0>ZG >ZGHELP.TXT    (creates the file ZGHELP.TXT)
          -or-
     A0>ZG >LST:          (sends the help text to your printer)


1. Option Flags

[N [Q [V and [F are option flags

     [N (Number)    prints the line number of each line found
     [Q (Query)     queries the user before searching files
     [V (inVert)    prints lines which do NOT match the expression
     [F (Formfeed)  sends a formfeed to the output after each search
                    (The console bell also sounds whenever a formfeed
                    is sent.)

Flags may appear in any position on the command line.

     A0>ZG [n "exp" file
          - is equivalent to -
     A0>ZG "exp" [n file

     A0>ZG [nvf "exp" file
          - is the same as -
     A0>ZG [n "exp" [f file [v


2. Filespec List

The filespec list can be any number of filespecs, separated by spaces.
Each filespec may be ambiguous.  For example -

     A1>ZG "exp" b:text.fil c12:*.let 2:foo.doc a:read.me

would search for "exp" in TEXT.FIL on drive B1:, all files with the type
of .LET on drive C12:, in C2:FOO.DOC, and A2:READ.ME.  Drive and user
begin with the logged values, and then default to the last specified
along the list.

Messages indicate which file is currently being searched.  Filespecs are
searched in the order they appear on the command line.
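The drive and user defaulting rule described above can be sketched in a few lines of Python.  This is only an illustration of the rule as stated in this document, not ZGREP's actual code: the function name resolve_filespecs and the prefix pattern are hypothetical, and the wild user form (B*:) is not handled.

```python
import re

# Hypothetical prefix pattern: optional drive letter, optional user
# number (up to two digits), then a colon.  Not taken from ZGREP itself.
_PREFIX = re.compile(r"^([A-Za-z]?)(\d{0,2}):(.*)$")

def resolve_filespecs(specs, drive, user):
    """Return 'DU:NAME' strings, carrying drive and user along the list.

    drive and user start as the logged values; each filespec that names
    a drive or a user changes the default for the filespecs after it.
    """
    resolved = []
    for spec in specs:
        m = _PREFIX.match(spec)
        if m:
            d, u, name = m.groups()
            if d:
                drive = d.upper()   # new default drive
            if u:
                user = int(u)       # new default user area
        else:
            name = spec             # no prefix: keep current defaults
        resolved.append(f"{drive}{user}:{name.upper()}")
    return resolved

# The example from this section, logged in as A1:
print(resolve_filespecs(
    ["b:text.fil", "c12:*.let", "2:foo.doc", "a:read.me"], "A", 1))
# prints ['B1:TEXT.FIL', 'C12:*.LET', 'C2:FOO.DOC', 'A2:READ.ME']
```

Note how a:read.me resolves to user 2, not user 1: the user number set by 2:foo.doc stays in effect until another filespec changes it.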
Each filespec may match up to a maximum of 100 files.  You can increase
this number by patching ZG.COM (see below).

Wild USER number is also allowed -

     A0>ZG "exp" B*:*.DOC

will search in all .DOC files in all user areas of B:


3. Redirected Output to File or Printer

     A0>ZG "exp" text.fil >b3:out.txt

will search A0:TEXT.FIL for "exp" and place all lines in which "exp" is
found into B3:OUT.TXT

The '>' must precede the outfile spec with no intervening space.  The
output redirective may appear in any position on the command line.

     A0>ZG "exp" *.txt >lst: [f

will send the output to the CP/M list device, separating the output from
each file search with a formfeed.  CON: and PUN: are also supported, but
are probably of limited use.

If output is redirected, error and processing messages will still be
sent to the CP/M console.  Only the lines matched in the search file
will be sent to the redirected output.  This allows some special
applications such as file copying, concatenating, and line numbering -
see below how to do this.


4. "Regular Expression"

The most complex and versatile aspect of ZGREP is the "regular
expression" which specifies the search pattern.  The expression is the
first command line argument which is not preceded by a '[' or a '>'.
But this does not limit the search pattern because '\[' will start the
pattern off with '['.  In fact the backslash, '\', is one of a number of
special characters you need to know.

Double and single quotes (" and ') are command line argument delimiters,
along with space and tab.  If you enclose the entire expression in quote
marks, you may include spaces in the expression (if your CCP allows).
You can even search for a pattern that starts with a single or double
quote mark, either by preceding it with '\', or by enclosing the whole
expression in the other type of quote mark, e.g.
     A0>ZG '"hello" there' text.fil
          - or -
     A0>ZG \"hello"_there text.fil

will both search for the pattern <"HELLO" THERE> in TEXT.FIL, including
the double quote marks in the search pattern.  The only difference in
the second example is that a tab as well as a space between <"HELLO">
and <THERE> will match.

ALL COMPARISONS DISREGARD THE CASE OF ALPHABETIC CHARACTERS.

Any character not described below as special just represents itself in
the search pattern, but alphabetic case is ignored.

\    preceding a character will quote that character as it is, i.e.
     will remove any special meaning that it otherwise would have.  It
     can even be applied to itself: '\\' will enter the backslash
     character into the search pattern.

^    preceding an ALPHABETIC character, or one of [\]^_?, will enter the
     corresponding control character into the search pattern, with the
     exception of ^J, ^M, and ^Z, since all of these are line
     terminators and can never be found inside a line of ASCII text.  If
     you try to enter one of these or any other disallowed control
     character, you will receive an error message and ZGREP will abort,
     saving you from searching for an unmatchable string.

     NOTE: the combination ^\ is a special case which represents a
     control character.  Do not use ^\\ for this.  The same applies for
     ^[, ^], ^^, and ^_.  ^? represents DEL.

|    at the beginning of the expression matches the beginning of a line.

$    at the end of the expression matches the end of a line.

_    (underline char) will match a space or a tab.

?    will match any single character.

:    followed by one of the letters below will match a class (a set) of
     characters.

     :a   will match any alphabetic [a-z]
     :c   will match any control character including DEL (7FH), except
          for TAB (^I).  (Note that ^@ ^J ^M and ^Z will never be
          found.)
     :d   will match any digit [0-9]
     :n   will match any alphanumeric [a-z] or [0-9]
     :p   will match any punctuation character, i.e. the set
          !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

     These colon classes could be specified explicitly.
     For example, [a-z] matches the same set as :a, but the colon
     specifier executes much faster.

[]   delimits a class of your choosing, for example [.?!] would find the
     end punctuation of English sentences.  The order in which the
     characters appear makes no difference.  If any one of the set is
     found at the place in the general expression where the class
     appears, the line will register a match.  For example, "ZCPR[132]"
     will match "ZCPR1", "ZCPR2", or "ZCPR3".  This same set could be
     specified as a RANGE by "ZCPR[1-3]".

     For ranges some special rules apply.  The range delimiter (the
     dash) must be preceded by a character other than '[', and followed
     by something other than ']' or '\', or you will get an error
     message.  Make sure that the range goes from lower to higher in the
     ASCII set.  Inverse ranges are disallowed because, for example,
     [c-a] can never match anything.

     Ranges may appear anywhere within the class delimiters.  For
     example, [ql-pbe-h] means the same as [qlmnopbefgh], which is also
     the same as [befghlmnopq].  (Yes, think about this for a minute.)

     Control characters may be included within classes, or used as range
     starting or ending characters.  [^q-^t] and [^e^]] are valid sets.

     Note that '|', '$', '_', '?', ':', '+', and '*' have no special
     meaning inside of a class set and do not have to be preceded by
     '\', although you may if you wish.  The only need for the backslash
     prefix here is to allow entry of '^', '-', ']', or '\' (itself)
     into the set.

     Finally, if the character '!' (chosen to honor all the C
     programmers) is the first character after the opening bracket, then
     any character EXCEPT those in the class will be matched.  For
     example, [!a-z] will match any character that is not alphabetic.
     [!!] will match anything but an exclamation point!

-    (minus sign) is a special character which must FOLLOW a
     sub-expression that you want to optionally match.  That is, there
     can be zero or one occurrence of the expression.
     For example, the expression ".CQ-_" will find all lines containing
     ".C_" and also lines containing ".CQ_"

+    FOLLOWING a sub-expression will match one or more occurrences of
     the sub-expression.  For example, "_m:a+e_" will find all lines
     containing a word that begins with 'M' and ends with 'E'.  How
     about that!  However, this won't find 'ME'.

*    FOLLOWING a sub-expression will match ZERO or more occurrences of
     the sub-expression.  "_m:a*e_" will find the same lines as the
     above, plus those containing the separate word 'ME'.  As another
     example, "(?*)" will find all parenthesized expressions, including
     the expression "()".


5. General Operational Characteristics

Number of input files

As distributed, ZGREP is limited to processing 100 files for EACH
ambiguous filespec given on the command line.  If more than 100 files
match an ambiguous filespec, only the first 100 will be searched.  Files
are searched first in the order of filespecs given on the command line,
and then as they are found in the disk directory.  The names list for
each filespec is sorted first.

ZGREP operates on ASCII files, and any kind of text file can be
processed.  As characters are read from the file, the high order bit is
first reset so that only normal ASCII codes (0 to 127) are compared.
This is useful for searching WordStar(tm) format files.  Whenever a
linefeed or carriage return is found, the line is terminated.  This
means that you cannot search for a pattern that goes across a line
boundary, that is, you cannot include linefeeds or carriage returns in
your search expression.  But you CAN specify the beginning or end of a
line in the search string, using the '|' and '$' characters.  Lines as
long as 512 characters can be handled.  Control-Z is the normal End of
File marker, so you can search through binary files only as far as the
first ^Z.


6. Indirect Input

Instead of - or in addition to - the filespecs typed on the command
line, you can create an ASCII text file listing a set of files you want
searched and then command ZGREP to use it.

     ZGREP "exp" ZGREP ZGREP "July :D*, 1988" b:*.TXT [Q

Arguments in the file are ADDED to whatever else you type on the command
line.  Be careful not to put any SPACES after a filespec on a line in
the indirect file, or it will be an invalid filespec.  Lower case
letters are converted to uppercase automatically.

As a special usage of input redirection, try

     A0>ZGREP ZG "exp" file >CON:

You will see the '^' prefix used on the screen, even if you have set
video attributes for the screen.  This is because redirecting the output
to the screen (as CON:) opens a separate IO stream which uses the
default prefix '^' as if it were writing to a file.

Further patching -

There are several more locations which may be patched using DDT, EDFILE,
etc.  (The following locations are shown as displayed by DDT or EDFILE.
The actual locations in the .COM file are 0100H less.)

     0199H  00H        entry Query mode [Q flag    01 = ON, 00 = OFF
     019BH  00H        entry Number mode [N flag
     019DH  00H        entry inVert mode [V flag
     019FH  00H        entry Formfeed [F flag
     01A1H  50H ('P')  maximum drive letter, ASCII upper case.
     01A3H  0FH (15)   maximum user number.
     01A5H  64H (100)  maximum number of files per filespec.
     01A6H  00         high order byte for maximum number of files.

Maximum drive is the highest drive letter that will be accepted as valid
in a filespec.  Maximum user number is similar, but also sets the
highest user number that will be searched if a wild user spec, for
example, A*:*.COM is given.  You may want to lower the maximum user to
speed up wild user searches, since the search starts at user zero and
goes up to and includes the maximum.  The highest allowable maximum user
is 31.


9. Aborting ZGREP

If you want to abort ZGREP, the most reliable way is to type ^C.  You
will be prompted with the following message:

     Abort? (Y/N)

A 'Y' answer will abort the session.


10. Special operations on ASCII files

These applications are accomplished by trying to match only the end of a
line ('$' - which never matches), and then displaying the non-matching
lines using the [v flag, thus getting all the lines in the file.

Wildcard TYPE

     ZG $ [v file1.txt file2.doc *.not

will type all the files to the console in order.  Use control-S to pause
the display at any time.

File Copying

     A0>ZG $ [v TEXT.FIL >B:TEXT.CPY

will copy TEXT.FIL to B0:TEXT.CPY

Warning: if you try to copy a file to the same drive, user area and file
name, it will be destroyed.

Line numbering

     ZG $ [vn TEXT.FIL >TEXT.NUM

will create a copy of TEXT.FIL called TEXT.NUM which has line numbers
preceding each line of text.

Concatenating

     ZG $ [v file1 file2 file3 >file123

will concatenate the three files into file123.

Listing

     ZG $ [vf file? >lst:

will send file1, file2, and file3 to the printer in that order, with a
formfeed after each file.

Filtering

     ZG $ [v WORDSTAR.DOC >ASCII.TXT

will convert a WordStar file to an ASCII file by zeroing all the 8th
bits, and displaying all control chars with the '^' prefix.
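
The effect of the Filtering operation on a single line can be approximated with a short Python sketch.  This is not ZGREP's code, only an illustration of the two steps named above (reset the 8th bit, then show control characters with the '^' prefix).  The function name filter_line is hypothetical, and passing TAB through unchanged is an assumption; this document does not say how ZGREP prints a TAB.

```python
def filter_line(raw: bytes) -> str:
    """Approximate the WordStar-to-ASCII filtering for one line of text."""
    out = []
    for byte in raw:
        code = byte & 0x7F                 # reset the high-order (8th) bit
        if code == 0x09:
            out.append("\t")               # assumption: TAB is left as-is
        elif code < 0x20:
            # control character: '^' plus the matching letter or symbol
            out.append("^" + chr(code + 0x40))
        elif code == 0x7F:
            out.append("^?")               # DEL, written ^? in this document
        else:
            out.append(chr(code))          # ordinary printable ASCII
    return "".join(out)

# A line with one high-bit character (0xEC is 'l' with bit 7 set) and a
# Control-B, which WordStar uses as a print control code.
print(filter_line(b"Hel\xeco \x02bold\x02"))
# prints Hello ^Bbold^B
```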