MULTIVARIATE ANALYSIS PACKAGE 1.6

Copyright 1985,86,87
Douglas L. Anderton
Department of Sociology
University of Chicago
1126 E. 59th Street
Chicago, IL 60637

These programs are released for distribution so long as 1) any charges
involved do not exceed the costs of media and mailing, and 2) no portion
of the programs is used for commercial resale.

Revision History:

01/12/87 - Added codebooks for variable names and missing values; see
           usage documentation below.
01/10/87 - Major optimization in the FACTOR eigen subroutines cut
           iterations by about 40% and gave the user control of tolerance.
01/09/87 - Minor revisions to DESCRPT, PLOT, CORREL, PARTIAL, CLUSTER,
           HYPOTHS and MANOVA.
01/08/87 - Fixed rollover bug in grand totals in CROSSTAB, and minor
           optimization.
01/07/87 - Substantially optimized TRANSFRM for a 16% speed increase.
01/05/87 - Fixed bug in REGRESS mean squares; converted to a Gaussian
           and LU solution.  Modified the GETCOR subroutine to get names.
08/25/86 - Modified TRANSFRM to allow leading minus signs on number
           entry and numbers up to 11 characters long.
06/25/86 - Fixed bug in group option and histograms in DESCRPT.
05/27/86 - New release.  Buffering added to TRANSFRM; added MANOVA
           program, simple 2-dimensional PLOT, and Kmeans CLUSTERing
           program.
04/21/86 - Added Spicer algorithm and weighted data to CORREL.
04/19/86 - Added (improved accuracy) Spicer algorithm, weighted data and
           'by' group computations to DESCRPT.
03/23/86 - Fixed IFS bug in TRANSFRM and sped it up considerably.
09/27/85 - Fixed critical bugs in CORREL with missing values.
09/24/85 - New release.  Transformations package, partial correlations,
           factor analysis and hypothesis tests.
09/13/85 - Fixed bug which dropped the sign of negative correlations
           from CORREL when read into REGRESS.
06/28/85 - Fixed bug in CROSSTAB (unidimensional addressing).
06/26/85 - Fixed bug in CROSSTAB (init row and col totals).
06/22/85 - New release.  CROSSTAB.
06/15/85 - First release.  DESCRPT, CORREL, REGRESS.

INTRODUCTION:

Mapstat is a very serious multivariate statistical analysis package
capable of meeting 90% or more of most users' analytical needs.  The
routines have, at this point, been well tested and provide the most
frequently used procedures of the relatively expensive statistical
packages without cost.  Source code is included for modifications and
elaborations at your own risk.  Eleven programs are included in this
sixth release of MAP:

 1) DESCRPT  - descriptive statistics and frequency histograms.
 2) CORREL   - correlation and covariance matrices.
 3) REGRESS  - multiple linear regression.
 4) CROSSTAB - n-way crosstabulation and association tests.
 5) TRANSFRM - data transformations.
 6) HYPOTHS  - simple hypothesis tests on means and variances.
 7) PARTIAL  - partial correlation coefficients.
 8) FACTOR   - principal axis factoring with rotations.
 9) CLUSTER  - kmeans clustering program.
10) PLOT     - simple 2-dimensional plots.
11) MANOVA   - multiple dependent variable analysis of variance.

Users are encouraged to REPORT BUGS and make REQUESTS for future
versions.  Do not release your own versions or modifications using the
copyrighted MAP or MAPSTAT logos, and abide by the above copyright
notice.

HARDWARE REQUIREMENTS:

MAP is written in version 2 (or 3) of Turbo Pascal ((c) Borland Intl).
It has been written to compile in less than 56k of TPA for those running
ZCPR3 or an alternative OS on 8-bit machines.  Only a few statements
must be altered to run the programs on MSDOS machines: change the
BDOS(0) calls to EXIT and try to compile.
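For example (a sketch only, since the surrounding code differs from
program to program), a CP/M exit written as

     Bdos(0);    { CP/M BDOS function 0: system reset, i.e. program exit }

would simply become

     Exit;       { leaves the main block under MSDOS Turbo Pascal }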
As I recall, only two or three other lines need to be changed out of all
the code herein for MSDOS version 3 Turbo.  PLOT contains printer control
codes for the EPSON MX80 in procedure Openfiles; modify these codes to
suit your printer.

DESIGN PHILOSOPHY:

First, MAP is written as a sequential case processor to avoid
memory-resident storage and achieve the greatest speed possible.  This
has several consequences: 1) the package contains powerful statistical
analysis programs without horrendous memory requirements; 2) however,
the cost is that for redundant functions such as histograms, regression
residuals, etc., the package currently requires multiple passes at the
data.  Even for large data sets the programs are sufficiently fast to
make such passes reasonable.

INPUT DATA REQUIREMENTS:

MAP expects to find your data in a free format with at least one blank
separating each variable and a newline at the end of each line.  All
variables for each case must be on a single line, i.e. newlines separate
records.  It will not accept alphanumeric data.  Programs assume all
data transformation has been performed (e.g. CROSSTAB expects a finite
number of values, not necessarily integer values).  These are the only
data requirements.  Codebook files containing variable names and missing
values are also allowed; see 'Running the Programs' below.

COMPILING THE PROGRAMS:

Use your Turbo Pascal ((c) Borland Intl) compiler to compile the
programs with the options set to a .COM file for MAPSTAT and a .CHN file
for all others.  Rename all except MAPSTAT to the names given in the
file MAPSTAT.PAS.  If you plan to run these programs under ZEX control
(highly recommended) then be sure to compile them under ZEX.  This is
done by putting all the distribution files along with your Turbo
compiler in a common access area and running the MINSTALL.ZEX file
included.  Alternatively, to compile one at a time, but under ZEX, just
enter:

     >ZEX :TURBO :

and then proceed as you normally would.

RUNNING THE PROGRAMS:

1. Data Input and Output Files - After invoking the programs they will
ask for the name of an input data file (or a file created from a prior
MAP run - for example, the output of CORREL is used by REGRESS), and the
name of an output file.  For printer output specify the filename as LST:
and for screen output specify CON:.  An exception is TRANSFRM, which
uses buffered output routines; it will accept LST: and CON: but will
send the output to the LIST.TMP and CONSOLE.TMP disk files respectively.

2. Codebook Variable Description Files - If the input to the program is
raw data (i.e. it is not one of the procedures which input a prior
CORREL matrix), then the program will ask for a codebook file.  The
codebook file contains three items of input for each variable in the
data file: (1) the column number, (2) a variable name of eight
characters, and (3) a missing value code.  Again, I repeat: one line
must be provided for each variable in the data file (whether it is used
in this particular analysis or not).  All three items must be provided
for each variable on a new line and separated by blanks.  For example,

     1 THISIS1  -9
     2 HERESTWO -1E37
     (etc.)

Note that eight spaces must be allowed for variable names; leave blanks
if necessary to fill out the string.  Note also that a missing value
code must be given for every variable.  The example above used MAPSTAT's
default value of -1E37 for missing data; this or another equally
implausible value may be given in the codebook.
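To illustrate (the values here are made up), a raw data file matching
the two-variable codebook above might begin:

     4.5      17
     -9       23
     6.2      -1E37

Each line is one case.  The -9 on the second line and the -1E37 on the
third would be read as missing observations of THISIS1 and HERESTWO
respectively.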
Alternatively, if the user specifies 'none' in answer to the codebook
file query, variable names will default to variable numbers and the
default missing value will be assumed.  This is not a recommended option
if you will return to your output sometime in the future.

3. Variable Column Identification - After the file names, the programs
will typically request the number of variables in the data file and then
the number of variables to be used in the present run.  For example, a
CORREL run might be made on a file containing lines for 500 cases, each
with 12 variables, only 4 of which are to be intercorrelated in the
present run.  The total number of variables would then be entered as 12
and the number for the present run as 4.  For each variable to be used
the program will request the column number of the variable (e.g. 1 for
the first variable, 2 for the second, etc.).  These are column numbers
in the raw file, not within the subset to be used.  In the above
example, if the first, third, sixth, and eleventh of the 12 variables
were to be used, the user would enter 1, 3, 6 and 11.

4. Specification of Groups, Weights and Special Variables - Occasionally
the programs will ask you to identify one of the variables for use in
weighting data, grouping data, as a dependent variable, etc.  Again,
reference is by the original column number in the input data set.  For
example, if the correlations in the example above were to be weighted by
population, which is contained as the sixth variable, you would identify
the weight as column 6, its position in the raw data file.  All of the
variables used as weights, groups, etc., must have been included in the
original number of variables to use and in the selection of columns for
the analysis.  That is, it would not be possible to specify, for
example, column 4 as a weight, since it was not included in the variable
list above.

5. Hints on Further Documentation - All other information necessary is
prompted for with what I hope are explicit prompts.  If you have
problems with the input queries, or with the interpretation of output,
refer to a statistics book.  Some of the multivariate routines are
recognizably influenced by the Fortran routines of Cooley and Lohnes in
their Multivariate Data Analysis book.  The Kmeans clustering routine is
found in almost any book on cluster analysis.  Some routines lifted from
numerical methods books, etc., have references in the source code.  The
transformation options are relatively well elaborated if you initially
specify CON: as the file from which transformations are to be input.
Once you become familiar with the program you can input transformations
from files.  Finally, I am eminently reachable for the near present at
the BBS number at the end of this file.  If you have any questions
regarding interpretation, etc., feel free to drop me a line.

6. Hints on Power Usage - There are a number of features which the
design philosophy of Mapstat precludes.  However, most of these features
are readily derivable by coupling TRANSFRM with the other programs.  For
example, many regression packages output residuals from the regression
and plots of the standardized residuals, etc.  Mapstat does not force
such a second pass through the data, since it is designed for large data
sets without retention of the data in memory.  If the user desires such
an analysis, the residuals can be readily computed using TRANSFRM and
then plotted with PLOT (a short sketch follows below).  Similarly,
FACTOR produces score coefficients which could be used to generate
factor scores for further analysis, etc.
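The residual computation just mentioned amounts to the following
arithmetic, shown here as a small Pascal sketch.  The variable names and
coefficient values are hypothetical; in practice you would reproduce the
same arithmetic with TRANSFRM, using the coefficients printed by
REGRESS, and then feed the resulting variable to PLOT.

     program ResidDemo;
     { Sketch only: residual arithmetic for one case, two predictors. }
     var
       Y, X1, X2, B0, B1, B2, YHat, Resid : real;
     begin
       B0 := 1.25;  B1 := 0.40;  B2 := -2.10;  { coefficients from a REGRESS run }
       Y  := 3.00;  X1 := 2.00;  X2 := 0.50;   { one case from the data file }
       YHat  := B0 + B1*X1 + B2*X2;            { predicted value }
       Resid := Y - YHat;                      { residual = observed - predicted }
       writeln('residual = ', Resid:8:3)
     end.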
Dummy variables can be coded through the recoding facilities in TRANSFRM
and used to compute complicated general linear model analyses of
variance (GLM/ANOVAs) through REGRESS (a small illustration appears at
the end of this file).  The list goes on, and on, and on.  The more you
know about statistics and what you are doing, the more you will find
these programs of use.  At the same time, if you are a basic user you
will probably not require more than the basic output provided by the
routines.

PROGRAM LIMITATIONS:

The addition of codebooks and transformation files makes these routines
roughly competitive with other micro statistics packages.  Given that
you have received them free of cost and, "omigosh," with the source
code, they are extremely flexible and useful tools for data analysis.

Both DESCRPT and CORREL now allow weighted data to be entered.  While
the Spicer algorithm provides good accuracy on computations in both
these programs, it is not as robust with weighted data.  The results are
sufficient for most purposes, but exercise caution with heavily weighted
data.

At this stage, with humble documentation, it is up to the user to look
at the type and variable declarations at the beginning of each program
to see what its limitations are on the number of variables, etc.  I
think if you are doing any REAL data analysis you will find the
provisions ample.

I have relied almost exclusively upon these routines in several analyses
published over the last couple of years, and they have been scrutinized
by a number of graduate students and colleagues.  While I can't
guarantee that a revision won't create some obscure bug, I can assure
you there are no subtle bugs of any significance for regular data
analysis.  As with all statistical software, you should avoid absurd or
extreme input values.

Leave messages on the LILLIPUTE ZNODE (312-649-1730).
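A small illustration of the dummy coding mentioned under 'Hints on Power
Usage' (the variable name and values are hypothetical): a three-category
variable GROUP coded 1, 2, 3 can be recoded with TRANSFRM into two 0/1
indicators, say D2 and D3,

     GROUP   D2   D3
       1      0    0
       2      1    0
       3      0    1

and a REGRESS run with D2 and D3 as the predictors then reproduces a
one-way analysis of variance on GROUP, with category 1 as the reference
group.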