;;01-06-88

FINREP.DOC    Version 2.8    05/28/88

Eric Gans
French Department, UCLA
Los Angeles, CA 90024

* MS-DOS users: FINREP now exists for DOS. MS-DOS v2.6a is    *
* (more or less) equivalent to CP/M v2.8 (including bug fix). *

Version 2.8  Corrects another couple of bugs introduced into v2.6 (sigh). It would be nice if this were the final version for a while.

Version 2.7  Corrects a bug introduced in v2.6 that cut the output file short in certain cases.

Version 2.6  Corrected a bug in reading words across sectors.

Version 2.5  Corrected a bug in the wildcard file routine (thanks to faithful FINREP user John Stensvaag). Improved the verify routine (as in the DOS version); reduced program size.

Version 2.4  The search routines have been extensively revised and debugged. FINREP should now find just about any string, however perverse.

Version 2.3  Fixed a bug that made verification incompatible with multiple (wildcard) files. Allows a wildcard (?, not ?*) at the end of the search string.

Version 2.2  Added the "V" flag to allow user verification ("Replace (y/n)?") before replacement in text files; a few minor improvements.

Version 2.1  Fixed a bug that treated wildcard filetypes as single files. Added a couple of clarifications to the DOC file.

Version 2.0  Allows wildcard searches (various options) and wildcard filetypes. Easier entry of caps (!string! instead of !s!t!r!i!n!g). Allows control characters other than letters (e.g., ^[, ^@).

*****

FINREP is a search/replace program that remedies most of the deficiencies of Wordstar's ^QA and other similar commands. Aside from being faster, it has important additional features:

- allows wildcards in the search string (v2.0)
- allows wildcard filenames (find/replace in groups of files)
- command-line entry allows batch processing by SUBMIT, etc.
- allows entry of control or hex characters (0-FF)
- can be used with object files (e.g., COM files)
- sets capitalization (first letter or whole word) and the high bit of the last character to match the old string

This last feature means that, for example, if you are writing a scenario where the characters' names appear sometimes in CAPS and sometimes just Capitalized, you don't need two search/replaces to replace one name with another: JOE will be replaced by HARRY, Joe by Harry, and even joe by harry.

*****

Format:

     finrep [d:]fn [newfn] /[switches]/ oldstring [newstring]

(Enter "finrep" alone for a brief command summary.)

If a second filename is given, the changes will be placed in that file; if not, the old filename will hold the changes and the original file will be renamed from fn.ft to fn.BAK (unless the "B" switch is used).

Wildcards (*, ?) may be used anywhere in the filename; if there are wildcards in the filetype (after the "."), the B switch will be set automatically to suppress creation of BAK files. If wildcards are used, a second filename cannot be entered. If you enter:

     A>finrep urk*.doc // "blurk" "zap"

the files urk01.doc, urk33.doc, urktty.doc would be modified as expected, and the files urk01.bak, urk33.bak, etc. would be created.

The characters "//" must be entered even if no switches are used. The switches are as follows:

B = no BAK file. This switch disables making a BAK file; the original file will be lost. (Use only if you did not enter a second filename.)

Q = allow wildcards in the search string. (The program runs faster if this switch is not used.) The various options for this switch are described below.

V = verify replacement. If this switch is used, the context of the search string will be displayed on the screen and you will be asked whether to replace it. This switch cannot be used along with the O or H switches (see below).

O = Object file. If this switch is used, the program will ignore end-of-file markers (1AH), as in PIP's "o" option.
Use it for search/replace in non-text files. WARNING: if you don't use "O" with a non-text file, the file will be cut off after the first 1AH. That's why FINREP makes BAK files!

H = keep High bit. With this switch, all bytes are searched exactly as they are; letters with the high bit set will not be identified with their standard ASCII counterparts.

W = no Whole-word search. This switch is used to search for a string whether or not it is a whole word; with it, a search for "the" will find "other", "their", etc.

NB - The program defines a "word" as anything preceded and followed by something other than a letter (space, punctuation mark, number, control character, beginning or end of file). Thus this switch is not needed if the search string is a series of words, a word preceded by a control character that is not contiguous to another word, etc.

C = respect Case. This switch allows you to distinguish capital from lower-case letters: in a search for "the", "The" will not be found/replaced. (NB: Upper-case letters cannot be entered within quotes; see below.)

In normal operation (with no switches), the search will include whole words only; it will ignore case and high bits, but will set the new string to correspond to the old in these respects, capitalizing the first letter or the whole string and setting the high bit of the last character as required. This last feature is only useful if the replacement string is one word long; if it contains more than one word, you may set the high bits when you enter the string, or let your word processor (e.g., Wordstar) do it.

If you include capitals in your replacement string, they will be respected even if the find string is not capitalized. If you want to search for a capitalized word, you must use the "C" switch (or the "H" or "O" switch); FINREP will give you an error message if you don't.

The last four switches are in the relation O > H > C > W; each "higher" switch includes the lower ones.
Thus if the "H" switch is used, capitals and lower case will be distinguished, and the search will not be limited to whole words.

*****************************************************************

String entry:

The find and replace strings must be separated by a space from the switch entry and from each other. Strings should be entered as follows:

     ASCII - in quotes: "blurk", "54%**90er @"

The following characters must NOT be placed between quotes:

     HEX - separate by commas: d,1A,cd,10,ff,3
     CAPITALS - between !!: !A!,!hello!   [NEW IN V2.0]
     CONTROL CHARACTERS - preceded by ^: ^M,^m^j,^c,^C,^[,^^
     WILDCARDS - ????, ?n (1 <= n <= 9) or ?* (indeterminate)

The "|" character is used to mark a break in the replace string (see below).

All ASCII letters entered within quotes will be treated as LOWER CASE. If you want to search for upper-case letters with the "C" switch, or to put upper-case letters in the replace string, you must surround them with "!!", unless you enter them as hex characters (A = 41, B = 42 ...). Sorry about this, but the CP/M command line cannot distinguish upper from lower case.

Any combination of characters is valid; for clarity, groups should be separated by commas, although this is only necessary for individual hex characters:

     !h!"ello",^m^j,e5,?7,32,!blurk!,^q

Quotes and !..! must be closed. To search/replace the quotation mark, enter it as a hex character (" = 22h). You can search for "!" if you keep it between quotes.

The length of the find/replace strings is limited to 30 bytes; this length applies to the strings themselves and not to the keyboard entry, which cannot exceed 127 bytes in all (blame CP/M for this). Thus ^j,cd,ff,3d is 4 bytes long. With indeterminate wildcards, the matched text may be up to 255 bytes long, but the limit of 30 still stands for the find/replace strings themselves.

If you do not enter a replace string, the searched-for string will be replaced by nothing, i.e., deleted.

WILDCARDS

The wildcard search has a great deal of flexibility.
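To make the wildcard options in this section concrete, here is a minimal Python sketch that models their matching rules with regular expressions. It is not FINREP's own code (FINREP scans its memory buffer directly), and several details are assumptions: the token encoding (a str for a quoted literal, an int n for a fixed wildcard of n bytes such as ?2 or ??, None for ?*), the treatment of ?* as the shortest run of 0 to 255 bytes, and the hypothetical function names build_regex and finrep_sub. High-bit handling (the H switch), the 30-byte string limit and the 128-byte checking granularity are left out. The "|" break is modeled as a (before, after) pair, and the returned count stands in for the number of strings replaced that FINREP reports.

```python
import re

# Hypothetical token encoding for a FINREP-style search string:
#   str  -> literal text, e.g. "d" for "d"
#   int  -> fixed wildcard of n bytes, e.g. 2 for ?2 or ??
#   None -> indeterminate wildcard ?* (assumed: shortest run of 0..255 bytes)
MAX_SPAN = 255


def build_regex(tokens, whole_word=True, respect_case=False):
    """Translate tokens into a compiled regex (a model, not FINREP's scanner)."""
    wild = [t for t in tokens if not isinstance(t, str)]
    if not tokens or not isinstance(tokens[0], str):
        raise ValueError("a wildcard cannot begin the search string")
    if None in tokens and (len(wild) > 1 or tokens[-1] is None):
        raise ValueError("?* must be the only wildcard and cannot end the string")
    if len(wild) > 4:
        raise ValueError("at most four wildcard groups are allowed")
    parts = []
    for tok in tokens:
        if isinstance(tok, str):
            parts.append(re.escape(tok))
        elif tok is None:
            # Non-greedy: take the nearest closing half, so nested pairs are not found.
            parts.append("(.{0,%d}?)" % MAX_SPAN)
        else:
            parts.append("(.{%d})" % tok)
    body = "".join(parts)
    if whole_word:
        # A "word" is bounded by anything that is not a letter (or by start/end).
        body = r"(?<![A-Za-z])" + body + r"(?![A-Za-z])"
    flags = re.DOTALL | (0 if respect_case else re.IGNORECASE)
    return re.compile(body, flags)


def finrep_sub(text, tokens, replace="", whole_word=True, respect_case=False):
    """Return (new_text, count).

    replace is a str (the whole match is replaced: options 1 and 3) or a
    (before, after) pair modeling "before"|"after" (options 2 and 4),
    in which the wildcard bytes between the two halves are kept.
    """
    rx = build_regex(tokens, whole_word, respect_case)
    if isinstance(replace, str):
        return rx.subn(lambda m: replace, text)
    if sum(not isinstance(t, str) for t in tokens) != 1:
        raise ValueError("a break needs exactly one wildcard group")
    before, after = replace
    return rx.subn(lambda m: before + m.group(1) + after, text)
```

For instance, finrep_sub("madame", ["d", 2, "e"], ("xx", "yzz"), whole_word=False), the counterpart of /qw/ "d"?2"e" "xx"|"yzz", returns ("maxxamyzz", 1), while the same call with the default whole-word search finds nothing. Regular expressions were chosen only because they make the rules easy to read; they say nothing about how FINREP itself searches.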
For obvious reasons, wildcards cannot appear at the beginning of the search string. (In versions below 2.3 they can't be at the end either. For some not-so-obvious reason this seemed a bad thing at the time.) The options are as follows:

1. Simple wildcard search: all bytes of the search string will be replaced.

     finrep zz.txt /q/ "d"?2"e" "xxyz"

will replace all words like "done", "dare", "dove", etc. by "xxyz". A maximum of four wildcard groups is allowed in this form: thus

     "a"?"cd"?4"ijk"??"nopq"?"s"

is a permissible search string.

2. Simple wildcard search with break. Only one wildcard group is allowed; the replace string is divided in two, with the first part replacing what precedes the wildcards and the second what follows; the intermediate bytes are left alone. The break CAN appear at the beginning or end of the replace string to indicate that the corresponding part of the find string is to be deleted. A blank replace string (entered simply as: | ) will delete both.

     finrep xx.txt /qw/ "d"?2"e" "xx"|"yzz"

will replace the "d" in this pattern with "xx" and the "e" with "yzz"; "done" becomes "xxonyzz", "madame" becomes "maxxamyzz", etc. (This last example only works if the "W" switch is used.)

3. Indeterminate wildcard search/replace. The indeterminate wildcard "?*" must be the only wildcard in the search string. In this option the whole string, from beginning to end, is replaced. A matching string may be at most 255 characters long; longer ones will not be found.

     finrep blurk.let /qw/ "xy"?*"zq" "garbage"

will replace all strings beginning and ending with the indicated letters: "xyrwerwerzq", "xyuu is the nbrzq", "xy ^C^Yzq" will all be replaced by "garbage".

NB - Since FINREP only looks for one thing at a time, it will not find nested pairs of strings, and will appear to miss some pairs where the second half of the search string is over 255 bytes away from the point at which the search began. (FINREP checks this only every 128 bytes.)
Thus if you are looking for "the"?*"of", FINREP will sometimes miss the apparent "hit" in a text like this:

     ... the [ ... the ... ] of ...

where the [] contain over 255 bytes. This is not a bug, but a limitation of the program.

4. Indeterminate wildcard with break. This is a very powerful option that allows you, for example, to replace PerfectWriter "fences" with WordStar control toggles (and vice versa). Here again only one wildcard group is allowed in the search string; the intermediate bytes are left unchanged.

     finrep zap.kkk /qw/ "123"?*"45" "6"|"789"

will replace "zz123blurk blurk xxxc oo45rr" by "zz6blurk blurk xxxc oo789rr";

     finrep perf.wri /qwc/ "@"!ux!"{"?*"}" ^s|^s

will replace the PW underline fence @UX{ ... } by WS's ^S ... ^S. Note that the "C" flag is used here to search for caps; if lower case as well as caps is acceptable, it could be omitted and the search string written "@ux{"?*"}". You can delete the fences altogether by replacing the ^s|^s by | in the last example.

One user thought the word "break" was misleading and should be replaced by "save", since the "|" in the replace string means that you preserve the wildcard part of the search string. In other words:

     finrep zap.txt /qw/ "<<"?*">>"

will delete the "<<" and ">>" together with everything between them, whereas:

     finrep zap.txt /qw/ "<<"?*">>" |

will just kill the "<<>>" and "save" their contents.

FINREP can be aborted at any time by typing ESC (=1B hex). I preferred this to ^C, since an extra ^C will be read by CP/M as a Warm Boot.

Except when the "V" switch is used, the only screen output is the number of strings replaced and, if you use a wildcard filename, the names and total number of files processed. If you want to see the replacement procedure in action, use a word processor!

Notes:

1. FINREP will modify files of any length; it uses the entire memory below the CCP as its buffer, and writes to disk whenever the buffer fills up. Since it doesn't overwrite the CCP, it doesn't have to end with a Warm Boot.

2.
There is no intrinsic limit on the number of files allowed under the wildcard filename option; for sanity's sake, you will get an error message if there are more than 255.

3. If you want to create a version of FINREP with some of the switches preset, run the program without a filename:

     finrep /[sw1][sw2]../

After it returns to the CP/M prompt,

     save 13 finrep1.com

will keep the switches as you like them. This procedure is NOT REVERSIBLE, so keep your original FINREP unchanged.

4. In deciding whether to capitalize a whole word/string, FINREP looks at the first two letters of the string found. If the find string has only one letter, only the first letter of the replacement string will be capitalized. If the word to be replaced has unusual capitalization (e.g., BBrrOOOmm), use the "C" switch and/or enter separate replacement strings for different variants.

5. When using indeterminate wildcards, you should use the "W" switch unless BOTH HALVES of the search string begin and end on word boundaries.

6. As for speed, FINREP's search/replace itself is somewhat faster than Wordstar's ^QA command. But if all you want to do is replace a string, FINREP is over three times faster overall, since its time already includes loading and saving the file. Measured on a long (84K) file, FINREP took 27 seconds and WS 34 for a typical search/replace. But WS needs at least 10 seconds to load and a good minute to save the file and exit. With a little practice, the command line can be entered as fast as WS's, and it can be included in SUBMIT files or reproduced by programs like SYNONYM or my SYN.COM.

*****

FINREP was written at the request of John-Mark Stensvaag of Vanderbilt University. At first I couldn't see the use for it, but he convinced me (he is a professor of Law). The wildcard features added in v1.1 and v2.0 were also his idea; the verification feature in v2.2 was suggested by J. Olsen of Chicago.

I would appreciate hearing from you about (a) bugs and (b) suggestions for further enhancements.
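The capitalization rule described under "In normal operation" and in note 4 can be pinned down with a short Python sketch. It works on bytes so that the Wordstar-style high bit on the last character can be shown. The function name propagate_case, and the reading that "the first two letters" means the first two letters of the text actually found in the file (non-letters skipped), are assumptions drawn from the description above; this is not FINREP's code.

```python
def _is_letter(b: int) -> bool:
    c = b & 0x7F  # the high bit is ignored when classifying, as in a normal search
    return 0x41 <= c <= 0x5A or 0x61 <= c <= 0x7A


def _is_capital(b: int) -> bool:
    return 0x41 <= (b & 0x7F) <= 0x5A


def _upper(b: int) -> int:
    # Lower-case ASCII letter -> capital; any high bit is preserved.
    return b - 0x20 if 0x61 <= (b & 0x7F) <= 0x7A else b


def propagate_case(found: bytes, new: bytes) -> bytes:
    """Shape the replacement 'new' after the text actually 'found' (hypothetical helper)."""
    if not new:
        return new  # deletion: nothing to adjust
    out = bytearray(new)  # capitals typed into the replacement are kept as entered
    first_two = [b for b in found if _is_letter(b)][:2]
    if first_two and _is_capital(first_two[0]):
        if len(first_two) == 2 and _is_capital(first_two[1]):
            out = bytearray(_upper(b) for b in out)  # JOE -> HARRY
        else:
            out[0] = _upper(out[0])  # Joe -> Harry; a one-letter find string also lands here
    if found and found[-1] & 0x80:
        out[-1] |= 0x80  # copy the high bit of the last character
    return bytes(out)
```

With this sketch, propagate_case(b"Joe", b"harry") gives b"Harry" and propagate_case(b"joe", b"McDonald") leaves b"McDonald" untouched, matching the statement that capitals typed into the replacement string are respected.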