==== DOCUMENTATION FOR UNIQUE.CMD (C) ====

Version 2.4
Released 12/16/1987

A dBASEII(tm) Utility To Purge Duplicate Records
From a dBASEII(tm) .DBF file

by
James A. Gronek
Phoenix, Arizona
&
Stephen Aidikonis
Streamwood, Illinois

COPYRIGHT (C) UCS, inc. 1984, 1985, 1987
ALL RIGHTS RESERVED

dBASEII is a trademark of Ashton-Tate

Incorporates routines from:
DATEIN.CMD (c) 1983 by Bernard L. Lambert
SHRINK.CMD (c) 1984 by Les Shockley

==== WHAT IS IT?? ====

UNIQUE.CMD is a utility program designed to let you purge almost ANY dBASEII .DBF file of duplicate records. It is written in the dBASEII applications language and requires dBASEII version 2.4 or later to run on any computer.

UNIQUE.CMD may be used with any database that has character fields to use as keys. It creates a file, UNIQUE.DBF, that contains the unduplicated records of your original source database. Once you have verified the validity of UNIQUE.DBF, you may rename it for use in place of your source database. It will not alter your source database in ANY WAY!!

==== HOW DO I USE IT?? ====

Place DBASE.COM, DBASEOVR.COM, and UNIQUE.CMD together with the database you wish to purge. Ensure that you have a blank disk in the default drive, or sufficient space on the disk with your database to allow for working room. See the discussion of free space requirements later under TRADEOFFS.

When you execute the program ( A>DBASE UNIQUE ), you will be prompted for a system date and a default drive. If you do not have sufficient space on the same disk as your original database, select a disk with sufficient space for the working files and the database of unique records, UNIQUE.DBF.

Next, you will be presented with a list of the .DBF files on your default disk and asked to select the database to be purged. If you selected a blank disk as your default drive, no files will be listed. Enter the PRIMARY FILENAME of the .DBF to be purged (including the drive, if it is not on the default disk).
After some status messages, you will be shown a selection of character fields in your .DBF and asked for an index key. Select the field names to be used for comparison, one at a time. Ensure that you select enough fields to guarantee record uniqueness. Watch out that you don't exceed dBASE's internal limit of 99 characters in your index key! If you do, you will be unceremoniously dumped to a dot prompt.

As you enter the field names to be used, you will see a status line appear on the bottom line of the screen, showing the syntax of the index key you are constructing. It will look something like this:

    FIELDNAME1+FIELDNAME2+FIELDNAME3+FIELDNAME4 ...

You may select up to FIVE separate fields so long as the combined length of the fields does not exceed the 99 character limit.

When you have entered the last field name to be used, enter a carriage return to continue. You will be presented with a choice of two ways to attack the problem. You may select either "A" or "B" from the menu. See the section following:

==== TRADEOFFS ====

The two methods used to purge the database differ dramatically.

Selection "A" uses a technique that is very MEMORY VARIABLE DEPENDENT and can only be used on databases with 22 or fewer total fields. The purge function will begin almost immediately, but proceeds SLOWLY from there on through your database. This method requires less free disk space than method "B", described next, but can take as much as TWICE as long to run as method "B".

Selection "B" uses a technique that requires indexing the source database on the index key before starting the actual purge function. This can take varying amounts of time to accomplish, depending on the number of records in the source database and the complexity of the index key. About the only thing I can tell you is that it will ALWAYS seem to take longer than it should.
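The key-building rules and the index-first approach of selection "B" can be illustrated with a short sketch. This is written in Python rather than dBASEII, purely for illustration, and it is NOT the code in UNIQUE.CMD. The function names, field names, and sample records are hypothetical. It also assumes that, once the records are in key order, duplicates sit next to each other and the first record of each run is the one kept; this documentation does not spell out how the purge itself works.

```python
MAX_KEY_FIELDS = 5    # the documentation allows up to five key fields
MAX_KEY_LENGTH = 99   # combined field width limit quoted for the index key


def build_key_expression(field_names, field_widths):
    """Join the chosen field names with '+', enforcing the documented limits."""
    if not field_names:
        raise ValueError("select at least one key field")
    if len(field_names) > MAX_KEY_FIELDS:
        raise ValueError("no more than five key fields may be selected")
    total = sum(field_widths[name] for name in field_names)
    if total > MAX_KEY_LENGTH:
        raise ValueError("combined key length exceeds 99 characters")
    return "+".join(field_names)


def purge_duplicates(records, field_names, field_widths):
    """Order records by the composite key, then keep the first of each run.

    Assumption: this mirrors an index-then-scan purge. Each field is padded
    to its fixed width so that adjacent fields cannot run together.
    """
    def key_of(record):
        return "".join(record[name].ljust(field_widths[name])
                       for name in field_names)

    unique = []
    previous_key = None
    # sorted() returns a new list, so the source records are left untouched.
    for record in sorted(records, key=key_of):
        current_key = key_of(record)
        if current_key != previous_key:
            unique.append(record)
            previous_key = current_key
    return unique


# Hypothetical sample data: three records, two of them identical on the key.
widths = {"LAST": 20, "FIRST": 15, "ZIP": 5}
rows = [
    {"LAST": "SMITH", "FIRST": "ANN", "ZIP": "11111"},
    {"LAST": "JONES", "FIRST": "BOB", "ZIP": "22222"},
    {"LAST": "SMITH", "FIRST": "ANN", "ZIP": "11111"},
]
key_fields = ["LAST", "FIRST"]

print(build_key_expression(key_fields, widths))          # LAST+FIRST
print(len(purge_duplicates(rows, key_fields, widths)))   # 2
```

The sort stands in for the indexing step: all of the ordering cost is paid before the purge begins, after which a single pass over the records is enough.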
Once the Purge has started, the program fairly RIPS through the database, about as fast as a terminal can update the count on the screen, faster in some cases. This method is extremely DISK SPACE DEPENDENT. IN ORDER TO ENSURE A SUCCESSFUL RUN YOU SHOULD HAVE AT LEAST FOUR (4) TIMES AS MUCH FREE SPACE AS THE SIZE OF THE SOURCE DATABASE!!! If you are running on a hard disk, you have no problems. Those on floppies may run out of room with larger databases. This method can be (by my benchmark tests, anyway) up to 2.5 times as fast as method "A".

So, which one should you use? It depends on your database. If you have a small database (few records), then selection 'A' will probably be faster overall, because you will not have to wait while UNIQUE constructs an index file on your database. If, on the other hand, you have a database of 300 or more records, 'B' is your best bet. It starts slower, but runs MUCH faster through longer databases than the 'A' routine.

While the program is processing your source .DBF, status messages will be sent to the console advising you of which record numbers are under study at any given time.

==== KNOWN BUGS AND CAUTIONS ====

UNIQUE.CMD is NOT fast. It is written entirely in the dBASEII applications language and is intensely disk-bound, hence very slow. Despite its 'underwhelming' speed of execution, its flexibility may make it a valuable addition to the dBASEII user's 'Toolkit'.

==== VERSION 2.0 UPDATE INFORMATION ====

This release of UNIQUE.CMD incorporates many improvements over the original version, primarily due to the efforts of STEPHEN AIDIKONIS of Streamwood, Illinois. Steve came up with error checking routines that greatly improved the key selection routine. He also came up with TWO different (and superior) methods to do the purge function. I studied and tested both before deciding that each had its own merits and potential applications. Well, they are both incorporated into this version; you may select from them as appropriate.
Thanks Steve, your efforts made a useful tool into an indispensable one.

==== VERSION 2.1 UPDATE INFORMATION ====

This release repairs a 'bug' found in the (A) search routine and incorporates some additional changes to reduce the size of the .CMD file.

==== VERSION 2.2 UPDATE INFORMATION ====

This release is functionally identical to version 2.1, with the exception that default drives may be in the range "A" through "P". This modification was made at the request of some users without contiguous disk drives. Version 2.2 also incorporates some additional changes to reduce the size of the command file and speed execution.

==== VERSION 2.3 UPDATE INFORMATION ====

This release incorporates NUMEROUS coding changes, mostly to eliminate unnecessary or redundant commands and to correct a couple of minor display bugs.

Version 2.3 is, again, released as an encoded file. With the plethora of Public Domain Uncrunchers available, I hold no illusions concerning the security of the file. The 'uncrunched' file will indeed demonstrate the coding techniques used, but they will be unintelligible to all but the most sophisticated of dBASE programmers without the source comments.

==== VERSION 2.4 UPDATE INFORMATION ====

This release of UNIQUE adds a routine to check for apostrophes or double quote marks in the index key string. Early versions would crash if a double quote mark (") was embedded in the index key string. Also, early versions would not accept a date later than 1985. This release will not accept a date earlier than 1987. There have been changes made to speed execution and shrink the size of the code.

====>>>> WARNING <<<<====

Finally, this program is COPYRIGHTED. I wrote it.* I am releasing it to the Public Domain for non-commercial use by other people who may find it of use. This program may be freely reproduced and distributed, as long as this file and all copyright notices remain intact and no monetary consideration is involved.
(* versions 2.0 & 2.1 co-authored with Stephen Aidikonis, Streamwood, Ill.)

UNIQUE.CMD is being distributed as an encrypted file. The file occupies less space on the disk, and executes faster, in this form. The .CMD file cannot be listed or externally altered. I have attempted to compile UNIQUE using WordTech's dBASEII Compiler, but the compiler is not sophisticated enough to allow the use of macros such as those used in UNIQUE.

Fully commented source code is available from the author. If you modify the program, I forgive you. Please upload a copy of the modified code to me at my RCP/M (see below) as a 'PRIVATE' file, and PLEASE keep the source code to yourself. If I decide to incorporate your modification in a future revision, you will receive appropriate credit in the documentation.

If you desire an 'Official' copy of the commented source code, you may purchase it for $15.00 if you pick it up from my RCP/M, or for $25.00 on 5 1/4" disk by return mail. I can support about 50 different formats; leave a message on my RCP/M for information on disk formats. Money Order or Cash orders will be shipped within 48 hours. Allow 2 weeks for personal checks to clear.

JAMES A. GRONEK
President
UCS, inc.

Telephone: The Lost Dutchman's Gold Mine PICS
           (602)247-2880
           300-1200-2400 Baud
           24 Hours 7 Days a week

Address:   The Lost Dutchman's Gold Mine PICS
           Post Office Box 23937
           Phoenix, Arizona 85063
           Attn: Unique