OR/MS Today - February 1998 - Software Review

February 1998

Software Review:
Stat::Fit
Distribution fitting software makes simulation more
attractive, viable in many applications
By James C. Benneyan
Stat::Fit is a probability distribution fitting software package designed to
help users more easily test the fit of hypothesized statistical models to
empirical data and, ultimately, to identify the best candidate distribution
for a given scenario. Developed by Geer Mountain Software, it is intended
primarily for simulation analysts who, by the nature of their work,
frequently need to determine appropriate distributions for any number of
random events or activities. Several nice features also make Stat::Fit
helpful for other possible uses, including basic data analysis and as a
teaching tool.
While a few similar products exist [1-4], a number of things make
Stat::Fit particularly appealing, not the least of which are its flexibility,
user-friendly graphical interface, and several secondary capabilities which
fall in the bells-and-whistles category. The obvious value of such a tool,
used in conjunction with commercial or special-purpose simulation programs,
is to largely free the analyst from the burden of testing and verifying
appropriate model inputs, which otherwise can require considerable time and
statistical background. In the past a few goodness-of-fit programs
therefore have been bundled with simulation software so as to automate
these "front-end" activities as much as possible and allow a user to focus
more on other issues.
Overview
System requirements to run Stat::Fit are minimal by today's personal
computer standards, requiring "at least a 386 IBM compatible PC
running MS Windows 3.1 or higher, 4 MB of RAM and 6 MB of
available hard drive space." At the present time, Stat::Fit is not available
for the Macintosh. Software installation and self-familiarization are a
breeze. I intentionally ignored the user manual and stampeded on to
loading and executing Stat::Fit (my usual modus operandi, although here
conveniently claiming a valid excuse).
Stat::Fit has all the standard capabilities one would expect, including the
ability to import a data file, calculate several summary statistics, plot
various histograms and distributions, transform data in a number of
ways, test the goodness-of-fit of a variety of probability models, and
display results in various graphical manners. In addition, several features
have been added which make the software particularly appealing to
typical needs of simulation practitioners, including random variate
generation, a graphical distribution viewer, confidence interval and
replication estimation, tests for autocorrelation, time-series run tests, and
others.
Obvious attention to interface design makes Stat::Fit easy to use, and the
manual and electronic documentation both are clear and well-organized,
although in a few cases sufficient information may not exist for
unfamiliar users to fully understand certain technical issues (e.g.,
parameter estimation details, goodness-of-fit implementation,
ramifications of autocorrelation, and so on). It is unclear, however, to
what extent such detail should exist, as it may be reasonable to assume
that a user either is already familiar with or does not need to understand
such concerns, depending on one's philosophy towards these things. In
most cases, more than sufficient information is available for practitioners
to understand the general background issues and to interpret all results.
Running and using the software are also quick and easy, with a decent
on-line help facility, an intuitive menu structure and the usual buttons for
frequently-used functions. The overall performance, speed, accuracy and
ease-of-use all are very good. Over the past several months I used
Stat::Fit to examine dozens of data sets in the course of various typical
projects. Out of curiosity I also ran Stat::Fit on several problematic files
encountered in the past while developing special-purpose fitting code for
certain semiconductor and health care research problems. In each case I
was able to quickly get the results I needed to move forward in my work.
In the few cases where I consulted the on-line help or user manual, I was
able to answer my question without too much effort.
Specifics
Empirical data either can be manually entered into a table window or,
more typically, imported from an external file of several possible types.
A report of standard descriptive statistics can then be quickly generated,
including the count, minimum, maximum, mean, median, mode,
standard deviation, variance, coefficient of variation, skewness and
kurtosis. Several plots of the empirical data are available, the most
commonly used probably being frequency, relative frequency and
cumulative histograms using either of two user-selected algorithms or,
for example, based on a user-specified number of intervals. An example
of a Stat::Fit-generated relative frequency histogram is shown in Figure
1. Each graph also can be customized and adjusted to a user's liking
relatively painlessly and quickly. In the course of other work, in fact, I
sometimes use Stat::Fit just to take a quick look at a data set, obtain
descriptive statistics and generate a few quick histograms before
proceeding with my task at hand.
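
For readers who want to reproduce such a descriptive report outside the package, the following is a minimal sketch using only the Python standard library. It is not Stat::Fit's code: the function name `describe` and the sample values are hypothetical, and the review does not say which variance, skewness or kurtosis conventions Stat::Fit uses, so the sample (n - 1) variance and simple moment ratios are assumed here.

```python
import math
import statistics

def describe(data):
    """Return the summary statistics listed in the review for a numeric sample."""
    n = len(data)
    mean = statistics.fmean(data)
    # Assumption: sample (n - 1) variance; Stat::Fit's convention is not stated.
    var = statistics.variance(data)
    sd = math.sqrt(var)
    # Moment-ratio skewness and kurtosis from divide-by-n central moments.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return {
        "count": n,
        "min": min(data),
        "max": max(data),
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "std_dev": sd,
        "variance": var,
        "coef_of_variation": sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical data, for illustration only.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]
summary = describe(sample)
```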

Figure 1 Stat::Fit histogram of emergency room interval
times vs. fitted exponential PDF (p = 0.0615).

In terms of identifying an appropriate probability model for the data,
most of the conventional well-known discrete and continuous
distributions are available, including the beta, binomial, discrete
uniform, Erlang, exponential, extreme value, gamma, geometric, inverse
Gaussian, logistic, log-logistic, lognormal, negative binomial, Gaussian,
Pareto, Pearson 5, Pearson 6, Poisson, triangular, uniform and Weibull.
While several other probability distribution functions (PDFs) can also
arise in certain scenarios of interest, the above should be sufficient for
most anticipated discrete event simulation needs.
Any or all of three common goodness-of-fit methods, namely Pearson's
chi square test or the Kolmogorov-Smirnov and Anderson-Darling tests
[5-7], can be run. The user also has some control over which
distributions to test, how their parameters are estimated (either via the
method of moments or, by default, maximum likelihood) and how fit
methods are conducted. For example, in the chi square case for
continuous data, radio buttons control whether intervals of equal length
or equal probability are used, although the usual within-interval
expectation of 5 and other implementation considerations [1, 5, 8, 9]
cannot be over-ridden (not necessarily a bad limitation given probable
users). Upon execution, a window is displayed summarizing all
estimation and fit results for each selected distribution, as shown in
Figure 2, with a plethora of supporting detail included beneath this
window. In the top summary, the test statistic values (and in the chi
square case, the number of intervals) are displayed, rather than p values,
although these can be found within the detailed information if a user is
more accustomed to interpreting these values.
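
To make the mechanics concrete, here is a rough sketch of the steps just described for a single candidate (the exponential): a maximum likelihood estimate, the Kolmogorov-Smirnov statistic, and a Pearson chi-square statistic over equal-probability intervals. It is an illustration under stated assumptions, not Stat::Fit's implementation; all function names and the data are hypothetical, and converting the statistics to p values (which requires reference distributions adjusted for estimated parameters) is deliberately left out.

```python
import math

def fit_exponential_mle(data):
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(data) / sum(data)

def exp_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between empirical and fitted CDFs."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

def chi_square_equal_prob(data, rate, k):
    """Pearson chi-square statistic using k equal-probability intervals."""
    n = len(data)
    expected = n / k
    # Interval edges are the fitted quantiles at 1/k, 2/k, ..., (k-1)/k.
    edges = [-math.log(1.0 - j / k) / rate for j in range(1, k)]
    counts = [0] * k
    for x in data:
        idx = sum(1 for e in edges if x > e)  # index of the interval holding x
        counts[idx] += 1
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts

# Hypothetical interarrival times, for illustration only.
data = [0.3, 0.7, 1.1, 1.6, 2.4, 0.2, 0.9, 3.5, 0.5, 1.8,
        0.1, 2.9, 0.6, 1.3, 4.2, 0.4, 1.0, 2.1, 0.8, 1.5]
rate = fit_exponential_mle(data)
d_stat = ks_statistic(data, lambda x: exp_cdf(x, rate))
# k = 4 gives an expected count of 5 per interval, the usual minimum.
chi2, counts = chi_square_equal_prob(data, rate, k=4)
```

Equal-probability intervals make every expected count the same, which is why that option pairs naturally with the within-interval expectation of 5 mentioned above.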

Figure 2 Example of goodness-of-fit results table (ER
interarrival data).

As an alternative, an "AutoFit" function can be executed which
automatically selects all appropriate candidate distributions, conducts
each goodness-of-fit test, and displays a window listing general numeric
ratings of fit for each PDF (rather than the above summary report).
"Hotspots" in this window appear as the user passes the mouse over each
distribution, and clicking on any one automatically generates a plot of
that PDF or cumulative distribution function (CDF), dependent on user
preference, overlaid on top of the corresponding empirical histogram. A
second plot of cumulative residuals versus an error interval also is
generated automatically to examine the delta between the theoretical and
empirical CDFs. Other available graphical diagnostics include Q-Q and
P-P plots to help visually assess fit in the tails or center of the distribution,
respectively.
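
As a sketch of what lies behind those two diagnostics, the points for each plot can be computed as follows. The function name, the fitted rate and the data are hypothetical, and the (i - 0.5)/n plotting position is only one common convention; the review does not say which one Stat::Fit uses.

```python
import math

def pp_qq_points(data, cdf, inv_cdf):
    """Return (P-P points, Q-Q points) for a fitted distribution."""
    xs = sorted(data)
    n = len(xs)
    pp, qq = [], []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.5) / n           # assumed plotting position
        pp.append((p, cdf(x)))      # empirical vs. fitted probability
        qq.append((inv_cdf(p), x))  # fitted vs. observed quantile
    return pp, qq

rate = 0.5  # hypothetical fitted exponential rate
cdf = lambda x: 1.0 - math.exp(-rate * x)
inv_cdf = lambda p: -math.log(1.0 - p) / rate
pp, qq = pp_qq_points([0.4, 1.1, 2.5, 3.0, 6.2], cdf, inv_cdf)
```

In both plots a good fit shows up as points close to the 45-degree line. Q-Q points spread out in the tails because quantiles there are far apart, while P-P points are compressed into the unit square and so emphasize the center.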
A related function is a "Report Generator" which automatically
constructs and prints a report of graphical and tabular results. Either of
two default contents can be used (basic and detailed) or other formats
can be customized to some extent via an options window (and saved for
future re-use). Most results also are easily printed, saved as graphics
files, or copied into other documents as desired. The parameters of a
chosen distribution also can be saved to a file or copied to the clipboard
in the appropriate syntax for importing directly into any of a dozen
popular commercial simulation packages. Perhaps more useful, an
empirical distribution can also be constructed easily and then saved or
copied similarly, to simulate situations for which no good model was
identifiable as a reasonable fit.
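
One common way to simulate from such an empirical distribution is inverse-transform sampling on a piecewise-linear empirical CDF, sketched below. This is a generic technique, not necessarily the form Stat::Fit exports; the function name and data are hypothetical.

```python
import random

def make_empirical_sampler(data, rng=None):
    """Build a sampler for a continuous, piecewise-linear empirical CDF."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    rng = rng or random.Random()

    def sample():
        # Inverse transform: place u on the (n - 1) segments between
        # consecutive order statistics, then interpolate linearly.
        u = rng.random() * (n - 1)
        i = min(int(u), n - 2)
        frac = u - i
        return xs[i] + frac * (xs[i + 1] - xs[i])

    return sample

# Hypothetical observations; a fixed seed keeps the run repeatable.
sampler = make_empirical_sampler([3.0, 1.0, 2.0, 8.0], random.Random(1))
draws = [sampler() for _ in range(1000)]
```

Note that a sampler of this kind never produces values outside the observed range, which is one reason a fitted theoretical model is usually preferred when one is available.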
Some Unique Features
In addition to the above capabilities, several unique features distinguish
Stat::Fit from other products, one being a "Distribution Viewer" for
displaying and exploring the shape(s) of any selected distribution. By
changing either the parameters or moments via manipulating sliders with
the mouse, as shown in Figure 3, a user can instantly see the change in
shape, viewed either in PDF or CDF form dependent upon preference.
(While a similar feature can be found elsewhere [2], I have found the
Distribution Viewer to be a bit more user-friendly and visually
appealing.) Such a tool alone is well worth the investment for anyone
who in the past has iteratively plotted PDFs by hand, in spreadsheets, or
in other software (or even generated large data sets and then constructed
histograms of these).

Figure 3 Example of Stat::Fit probability distribution viewer.

Educators in particular may also find this feature useful, especially those
who traditionally cover this material via hardcopy, transparencies or
cumbersome software. I have found this to be a much more dynamic way
to illustrate in classroom settings the various shapes that different PDFs
can assume, with very positive feedback from students as well. A
random number generator menu also can be used to explore any of the
listed PDFs or to export simulated data to a file for use elsewhere.
Stat::Fit also has an interesting "repopulate" feature for continuous
random variables that have been rounded to integers as an artifact of data
collection, with the idea being to increase estimator accuracy by
replacing the truncated decimal portions of each datum with randomly
generated values based on the hypothesized PDF. A data file also can be
edited, transformed and algebraically manipulated in several manners,
with most open windows being immediately recalculated (again a very
nice teaching tool).
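
The repopulation idea can be sketched as follows for a hypothesized exponential model. Stat::Fit's actual algorithm is not documented in this review, so this is only one plausible reading: each recorded integer is replaced by a draw from the hypothesized distribution restricted to the interval that would have produced that integer. The function name is hypothetical, and round-to-nearest recording is assumed; for data truncated downward the interval would be [k, k + 1) instead.

```python
import math
import random

def repopulate_exponential(rounded, rate, rng=None):
    """Replace integer-rounded values with draws from an exponential(rate),
    each restricted to the interval that rounds to the recorded integer."""
    rng = rng or random.Random()
    cdf = lambda x: 1.0 - math.exp(-rate * x)
    inv = lambda p: -math.log(1.0 - p) / rate
    out = []
    for k in rounded:
        # Assumption: values were rounded to the nearest integer.
        lo, hi = max(k - 0.5, 0.0), k + 0.5
        u = rng.uniform(cdf(lo), cdf(hi))  # uniform over the CDF slice
        out.append(inv(u))                 # map back to the data scale
    return out

rounded = [0, 1, 1, 2, 5]  # hypothetical integer-recorded durations
values = repopulate_exponential(rounded, rate=0.5, rng=random.Random(7))
```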
All of the above features offer a good deal of user customization,
including the abilities to set many analysis and format defaults, to
designate a default directory for saving and retrieving files, and to insert
bookmarks and free-form text annotations in the help facility once
information of interest is located. Other conveniences include the ability
to filter a data set in several ways, general overall flexibility in options
and graph formats, and a button for calculating a table of maximum
likelihood or method-of-moments estimates for selected distributions
(without running the fit routines), perhaps necessary for some other
need. Analysis results (including graphs) can be stored in a "project" file,
saving the user from starting completely over each time a data file is
analyzed (although some formatting can be lost).
My Personal Wish List
In the course of evaluating this software it was only natural to develop a
few ideas as to what would make it even better. While I was pleasantly
surprised to find my list of grievances small, here are my top four
wishes:
1. Distribution Viewer: As nice as this function is, the ability to
superimpose it on top of a histogram of empirical data would make it
even better, perhaps using maximum likelihood estimates as default
starting values, so that by manipulating the slide bars a practitioner (or
instructor) could visually experiment, compare, or demonstrate a given
data set with various PDFs and parameter settings.

2. Distributions and Parameters: While a sufficient range of
continuous distributions is available, the list of discrete models might be
expanded to allow greater flexibility. On several occasions I have wanted
to compare count data to something other than a Poisson PDF, such as
the more general form of the negative binomial and others. Relatedly,
other estimators might be added, in particular minimum variance unbiased
estimators where these differ, especially as some
practitioners cannot or do not always collect large data sets on which to
base input assumptions. Other possibilities include improving initial
maximum likelihood estimates via Levenberg-Marquardt or other
methods [2].
3. Graph and Table Formats: While perhaps beyond its intended
purpose, on occasion I have desired even greater ability to customize
graphs, as can be done in most spreadsheet, presentation and other
software. Additionally, while the table formats of estimation and fit
results could be made more visually appealing, more frustrating is the
inability to save these results to a file or copy them to a word processing
document, presumably a frequent desire.
4. Output Analysis: Ideally, later versions also might contain more in
the way of output analysis capabilities, both statistical and graphical, and
assistance in designing effective experimental plans for typical
simulation studies. Certainly all three needs (good models, good
experimental designs and good results analyses) are important and
interrelated considerations to successful simulation. As one example, the
latest Stat::Fit upgrade contains a useful window for estimating the
number of independent replications necessary to achieve a desired
confidence interval width or, after the replications are run, the resultant
confidence interval, although more advanced features also could be
included in this general area.
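
The replication calculation mentioned in the last item can be approximated with the textbook normal-theory formulas, sketched below. This is not Stat::Fit's routine: the function names and pilot data are hypothetical, and a normal quantile is used in place of the Student t quantile (which the Python standard library does not provide), so results for small samples will be somewhat optimistic.

```python
import math
from statistics import NormalDist, mean, stdev

def replications_needed(pilot, half_width, confidence=0.95):
    """Approximate total replications for a confidence interval half-width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    s = stdev(pilot)  # standard deviation estimated from pilot replications
    return max(len(pilot), math.ceil((z * s / half_width) ** 2))

def confidence_interval(outputs, confidence=0.95):
    """Approximate confidence interval for the mean of independent replications."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    m = mean(outputs)
    h = z * stdev(outputs) / math.sqrt(len(outputs))
    return m - h, m + h

# Hypothetical average waiting times from 10 pilot replications.
pilot = [12.1, 10.4, 11.8, 13.0, 9.7, 12.5, 11.1, 10.9, 12.3, 11.6]
n_required = replications_needed(pilot, half_width=0.25)
ci_low, ci_high = confidence_interval(pilot)
```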

General Comments on Any Fitting Software
In addition to the above comments, a few more general observations
regarding any distribution fitting software seem appropriate at this time.
Foremost of these is the importance of examining data not just in a
convenient time-static sense, but also in their original longitudinal
context in order to determine whether some type of natural
non-homogeneity or unnatural behavior exists (such as trending, cyclic
behavior or a lack of statistical control). Arguably one reason queueing
and simulation output sometimes do not validate well to the real world is
incorrect assumptions of stationary or stable distributions, emphasizing
the importance of testing the simultaneous dual hypotheses of
distributional form and stability, and therefore making the ability to
examine data chronologically critical.
To its credit, Stat::Fit can test for time independence via either a plot of
the autocorrelation function, as shown in Figure 4, or two types of run
tests (above/below the median and number of turning points). While
these features are quite useful (I recommend running them regularly or
by default), complementing them with even a simple time series graph or
more advanced capabilities could be very informative. Although Figure
4 represents an ideal situation, quite often a process may not exhibit
distributional stability (e.g., non-homogeneous arrival rates). In fact, this
is a common argument for using simulation, so software which addresses
these situations only makes sense. Beyond detection, some guidance or
capability as to how to proceed also would be ideal, such as for
identifying distinct periods within the data, developing separate
estimates for each, modeling non-homogeneous Poisson processes and
other nonstationary behavior, etc.
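
For readers unfamiliar with these independence checks, the following sketch computes a lag-k sample autocorrelation and the two run tests named above using their standard large-sample formulas. It illustrates the general techniques rather than Stat::Fit's implementation; the function names are hypothetical, and the example series is a deliberately trending one.

```python
import math
from statistics import fmean, median

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    m = fmean(series)
    denom = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return num / denom

def runs_test_median(series):
    """Runs above/below the median; returns (runs, z) under independence."""
    med = median(series)
    signs = [x > med for x in series if x != med]  # drop ties with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    total = n1 + n2
    mu = 2.0 * n1 * n2 / total + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - total) / (total ** 2 * (total - 1))
    return runs, (runs - mu) / math.sqrt(var)

def turning_points_test(series):
    """Count of local peaks and troughs; returns (count, z) under independence."""
    n = len(series)
    tp = sum(1 for i in range(1, n - 1)
             if (series[i] - series[i - 1]) * (series[i] - series[i + 1]) > 0)
    mu = 2.0 * (n - 2) / 3.0
    var = (16.0 * n - 29.0) / 90.0
    return tp, (tp - mu) / math.sqrt(var)

# A steadily trending series, which should fail all three checks.
trend = [float(t) for t in range(20)]
r1 = autocorrelation(trend, lag=1)
runs, z_runs = runs_test_median(trend)
turns, z_turns = turning_points_test(trend)
```

A large positive lag-1 autocorrelation, too few runs, or too few turning points (strongly negative z values) all point to trending or other dependence, in which case fitting a single stationary distribution is questionable.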

Figure 4 Example of Stat::Fit autocorrelation function plot.

A second general observation concerns types of goodness-of-fit tests,
typically with any of the above used as a general omnibus test of whether
one particular distribution reasonably models the data versus the
non-specific alternate hypothesis that it does not. To identify the "best"
distribution among several candidates, a customary approach simply
extends this idea by repeating the same test independently on each PDF.
Multiple separate fits are now tested for each distribution individually, in
each case again versus the most general alternate "does not fit"
hypothesis (for widest applicability), as opposed to a more powerful Ha
that some specific other model is a better fit [10,11] or that the test
statistic exhibits certain characteristics when the null Ho is not true [12].
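
To illustrate the idea of testing against a specific alternative model, the sketch below compares the maximized log-likelihoods of two candidates. It is only in the spirit of the likelihood-ratio discrimination in [10]: the exponential and lognormal are substituted here because both have closed-form maximum likelihood estimates, no critical values are supplied, and the function names and data are hypothetical.

```python
import math

def loglik_exponential(data):
    """Maximized exponential log-likelihood (rate set to its MLE)."""
    n = len(data)
    rate = n / sum(data)
    return n * math.log(rate) - rate * sum(data)

def loglik_lognormal(data):
    """Maximized lognormal log-likelihood (mu, sigma^2 set to their MLEs)."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mu = sum(logs) / n
    sigma2 = sum((v - mu) ** 2 for v in logs) / n
    # At the MLE the quadratic term collapses to n / 2.
    return -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * sigma2) - 0.5 * n

# Hypothetical service times clustered tightly around 10.
data = [9.5, 10.2, 10.0, 9.8, 10.4, 10.1, 9.9, 10.3]
log_ratio = loglik_lognormal(data) - loglik_exponential(data)
```

A positive `log_ratio` favors the lognormal here, but because the two models have different numbers of parameters, a formal decision would need the sampling distribution of the ratio under each hypothesis.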
Final Comments
Products like Stat::Fit should make simulation more attractive and viable
in many applications, especially with continued growth in practitioner
use, where frequent hurdles include a lack of sufficient time and resources. I also
would recommend this software to anyone who is faced with the general
task, be it for simulation or other purposes, of analyzing an empirical
data set and identifying an appropriate probability distribution for some
random variable of interest. (While an occasional researcher may need to
develop special-purpose code for other needs, such uses are beyond the
focus of this product.) My understanding is that several of the above
comments may be addressed in a forthcoming new release, making it a
worthwhile addition to practitioner and educator tool boxes. In summary,
this is a good, useful product, and I imagine subsequent versions will
only get better. I keep it readily accessible on my PC and were it
available for the Macintosh I would have it loaded under my Apple.
Product Information
for Stat::Fit™, Version 1.10
Stat::Fit™ is distributed by Geer Mountain Software Corporation,
104 Geer Mountain Road, South Kent, CT 06785.
Contact Information:
Phone: 860-927-4328
Fax: 860-927-4328
E-mail: statfit@geerms.com
Pricing: $249 per copy. Educational packages are also available for
$149 for the professor version; $49 for the student version; and
$250 for an educational LAN license.

Wanted: Software Reviewers


Susan Palocsay, the OR/MS Today software review editor, is in the
process of updating the list of software reviewers and is looking for
new reviewers. Individuals interested in reviewing should contact
her directly at James Madison University, Computer Information
Systems/Operations Management Program, Harrisonburg, Va.,
22807; Phone: (540) 568-3061; E-mail: palocsw@jmu.edu

Vendor Comments
Editor's Note: It is the policy of OR/MS Today to allow software
developers an opportunity to clarify and/or comment on the review
article. Following are comments from John Mauer of Geer Mountain
Software Corp.
We are currently working on Version 1.2 of Stat::Fit, which we
expect to release in the fall of 1998. Many of the items on the
reviewer's wish list as well as those desires expressed by our
customers will be added. These include an expanded list of
distributions (both discrete and continuous), a working viewer to
allow the user to test fit the parameters to the data, various tests
for outliers and homogeneity, support for copy of analysis to the
clipboard, simple output analysis and more.
Further, we are working with several simulation software
companies to improve transfer of the exported distributions to
simulation software. While the expert user should find much to his
or her liking, our main goal is to provide the beginner user with as
simple a task as possible.

References
1. Swain, J.J., Venkatraman, S., Wilson, J.R. (1988), "Distribution
Selection and Validation," Journal of Statistical Computation and
Simulation, Vol. 29, pp. 271-297.
2. Sugiyama, S.O., Chow, J.W. (1997), "@Risk, RiskView, and
BestFit," OR/MS Today, April 1997, pp. 64-66.
3. Vincent, S.G., Law, A.M. (1992), "UniFit II: Total Support for
Simulation Input Modeling," Proceedings of the 1992 Winter
Simulation Conference, pp. 371-376.
4. Gottfried, B.S. (1993), "Use of Computer Graphics in Fitting
Statistical Distribution Functions to Data Representing Random
Events," Simulation, April 1993, pp. 281-286.
5. Cochran, W.G. (1952), "The χ² Test of Goodness of Fit," Annals of
Mathematical Statistics, Vol. 23, pp. 315-345.
6. Kolmogorov, A. (1941), "Confidence Limits for an Unknown
Distribution Function," Annals of Mathematical Statistics, Vol. 12, pp.
461-465.
7. Anderson, T.W., Darling, D.A. (1954), "A Test of Goodness of Fit,"
Journal of the American Statistical Association, Vol. 49, pp. 765-769.
8. Cheng, R.C.H. (1994), "Selecting Input Models," Proceedings of
the 1994 Winter Simulation Conference, pp. 184-191.
9. D'Agostino, R.B., Stephens, M.A., eds. (1986), "Goodness-of-Fit
Techniques," New York: Marcel Dekker.
10. Dumonceau, R., Antle, C.E. (1973), "Discrimination Between the
Lognormal and Weibull Distributions," Technometrics, Vol. 15, pp.
923-926.
11. Hinz, P., Gurland, J. (1970), "A Test of Fit for the Negative
Binomial and Other Contagious Distributions," Journal of the American
Statistical Association, Vol. 65, pp. 887-903.
12. Neyman, J. (1937), "Smooth Test for Goodness of Fit,"
Skandinavisk Aktuarietidskrift, Vol. 20, pp. 150-199.
James C. Benneyan is a professor of industrial engineering and
operations research at Northeastern University in Boston where he
teaches and researches in the areas of industrial statistics,
probability theory, statistical quality control and computer simulation.

OR/MS Today copyright 1998 by the Institute for Operations Research and the
Management Sciences. All rights reserved.
Lionheart Publishing, Inc.
506 Roswell Street, Suite 220, Marietta, GA 30060, USA
Phone: 770-431-0867 | Fax: 770-432-6969
E-mail: lpi@lionhrtpub.com
URL: http://www.lionhrtpub.com
Web Site Copyright 1998 by Lionheart Publishing, Inc. All rights reserved.

http://lionhrtpub.com/orms/orms-2-98/swr.html

11/17/2009
