As the historical origin of the statistical theory underlying the methods of this book has sometimes been misrepresented, and as some
misapprehensions have occasionally gained publicity, ascribing to the originality of the author
methods well known to some previous writers, or ascribing to his predecessors modern
developments of which they were quite unaware, it is hoped that the following notes on the
principal contributors to statistical theory will be of value to students who wish to see the modern
work in its historical setting.
The method of maximum likelihood was originally derived by Gauss from the theory of inverse probability, and has been attacked on this ground, but it has no real connection with inverse probability. Gauss,
further, perfected the systematic fitting of regression formulae, simple and multiple, by the
method of least squares, which, in the cases to which it is appropriate, is a particular example of
the method of maximum likelihood.
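The least-squares fitting of a simple regression line mentioned above can be sketched in a few lines of Python; the data points below are purely illustrative.

```python
# A minimal sketch of simple linear regression by ordinary least squares,
# using only the standard library. The data points are illustrative.

def least_squares_fit(xs, ys):
    """Return slope and intercept minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of centered cross-products / sum of centered squares of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]
slope, intercept = least_squares_fit(xs, ys)
```

For multiple regression the same principle applies, with the normal equations solved simultaneously for several coefficients.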
The first of the distributions characteristic of modern tests of significance, though originating
with Helmert, was rediscovered by K. Pearson in 1900, for the measure of discrepancy between
observation and hypothesis, known as χ². This, I believe, is the great contribution to statistical
methods by which the unsurpassed energy of Prof. Pearson's work will be remembered. It
supplies an exact and objective measure of the joint discrepancy from their expectations of a
number of normally distributed, and mutually correlated, variates. In its primary application to
frequencies, which are discontinuous variates, the distribution is necessarily only an approximate
one, but when small frequencies are excluded the approximation is satisfactory. The distribution
is exact for other problems solved later. With respect to frequencies, the apparent goodness of fit
is often exaggerated by the inclusion of vacant or nearly vacant classes which contribute little or
nothing to the observed χ², but increase its expectation, and by the neglect of the effect on this
expectation of adjusting the parameters of the population to fit those of the sample. The need for
correction on this score was for long ignored, and later disputed, but is now, I believe, admitted.
The chief cause of error tending to lower the apparent goodness of fit is the use of inefficient
methods of fitting. This limitation could scarcely have been foreseen in 1900, when the very
rudiments of the theory of estimation were unknown.
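The χ² measure of discrepancy described above can be computed directly; the sketch below applies it to Mendel's classic pea counts under the 9:3:3:1 hypothesis.

```python
# A sketch of Pearson's chi-squared statistic, chi2 = sum((O - E)^2 / E),
# applied to Mendel's pea data against the hypothesized 9:3:3:1 ratio.

def chi_squared(observed, expected):
    """Pearson's measure of discrepancy between observation and hypothesis."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Round-yellow, wrinkled-yellow, round-green, wrinkled-green seed counts.
observed = [315, 101, 108, 32]
total = sum(observed)                      # 556 seeds in all
expected = [total * r / 16 for r in [9, 3, 3, 1]]

chi2 = chi_squared(observed, expected)     # small value: a close fit
```

Note that, as the text warns, the expectation of χ² must be adjusted when parameters are fitted from the sample itself.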
Following Charles Booth's "Life and Labour of the People in London" (1889-1903) and Seebohm Rowntree's "Poverty, A Study
of Town Life" (1901), Bowley's key innovation consisted of the use of random sampling
techniques. His efforts culminated in his New Survey of London Life and Labour.
In 1901, with Walter Weldon, founder of biometry, and Galton, Pearson founded the journal Biometrika
as the first journal of mathematical statistics and biometry.
His work, and that of Galton, underpins many of the 'classical' statistical methods which are in
common use today, including the correlation coefficient, defined as a product-moment; the
method of moments for the fitting of distributions to samples; Pearson's system of continuous
curves, which forms the basis of the now conventional continuous probability distributions; chi
distance, a precursor and special case of the Mahalanobis distance; and the P-value, defined as the
probability measure of the complement of the ball with the hypothesized value as center point and
chi distance as radius. He also introduced the term 'standard deviation'.
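The product-moment correlation coefficient named above has a direct computational form; the sketch below uses illustrative data chosen to be exactly linear.

```python
# A minimal sketch of Pearson's product-moment correlation coefficient:
# r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2)).
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfectly linear data gives r = 1; the values are illustrative only.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
```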
He also founded the theory of statistical hypothesis testing, Pearson's chi-squared test, and principal
component analysis. In 1911 he founded the world's first university statistics department at
University College London.
Ronald Fisher has been called "a genius who almost single-handedly created the foundations for modern
statistical science".
The second wave of mathematical statistics was pioneered by Ronald Fisher who wrote two
textbooks, Statistical Methods for Research Workers, published in 1925 and The Design of
Experiments in 1935, that were to define the academic discipline in universities around the
world. He also systematized previous results, putting them on a firm mathematical footing. In his
1918 seminal paper, The Correlation between Relatives on the Supposition of Mendelian
Inheritance, he made the first use of the statistical term 'variance'. In 1919, at Rothamsted
Experimental Station he started a major study of the extensive collections of data recorded over
many years. This resulted in a series of reports under the general title Studies in Crop Variation.
In 1930 he published The Genetical Theory of Natural Selection where he applied statistics to
evolution.
Decision making:
Sometimes decisions are made under certainty, where the outcomes of the strategic options are
fully predictable. These decisions essentially deal with the management of resources, such as inventory
control, where the issue in question is to optimize the available resources to achieve the best
possible outcome. These are classic mathematical programming problems.
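One classic instance of such a resource-management decision under certainty is the economic order quantity (EOQ) model for inventory control; the cost figures below are illustrative.

```python
# A sketch of the economic order quantity (EOQ) model: with known demand
# and costs, the order size minimizing total ordering plus holding cost is
# sqrt(2 * D * S / H). All figures here are illustrative.
import math

def eoq(annual_demand, order_cost, holding_cost):
    """Order quantity minimizing total ordering plus holding cost."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)

q = eoq(annual_demand=1200, order_cost=100.0, holding_cost=6.0)
```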
Sometimes decisions are made under partial uncertainty, where knowledge of the outcomes under
different strategic options is incomplete. A classical case is acceptance testing in quality control,
where the decision is to accept or reject the batch in question; this is a problem for pre-posterior analysis.
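A simple single-sampling acceptance plan can be sketched with a binomial model; the plan parameters and defect rate below are illustrative, not a recommended standard.

```python
# A sketch of acceptance testing in quality control: draw a sample of n items
# from a batch, and accept the batch if at most c items are defective. Under
# a binomial model with true defect rate p, the acceptance probability is the
# binomial tail up to c. The plan (n=20, c=1) and p are illustrative.
from math import comb

def acceptance_probability(n, c, p):
    """P(accept) = P(at most c defectives in a sample of n)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

p_accept = acceptance_probability(n=20, c=1, p=0.05)
```

Plotting this probability against p gives the plan's operating characteristic curve, which shows how sharply the plan discriminates good batches from bad ones.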
Sometimes decisions are made under risk, where only the likelihood of the outcomes of various
strategic options is known. These are typical problems of risk and return in investment, where the
issue in question is to optimize the payoff. The usefulness of statistical methods depends very
much on the validity of the quantification of risk with the variance.
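Quantifying risk with the variance, as described above, can be sketched by comparing two investments on expected payoff and variance; the probabilities and payoffs are illustrative.

```python
# A sketch of decision under risk: compare two strategic options by the mean
# (expected payoff) and variance of their outcomes. Figures are illustrative.

def mean_and_variance(outcomes):
    """outcomes: list of (probability, payoff) pairs summing to probability 1."""
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var

safe = [(1.0, 3.0)]                     # a certain 3% return
risky = [(0.5, 10.0), (0.5, -2.0)]      # a coin flip between +10% and -2%

m_safe, v_safe = mean_and_variance(safe)
m_risky, v_risky = mean_and_variance(risky)
```

The risky option has the higher expected payoff but a much larger variance; which is preferred depends on how the decision maker trades return against risk.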
In forecasting, the future usually bears a closer relationship to the immediate past than to the distant past.
As forecasting models are often built on rather long time series, their
predictive ability is often impaired.
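One standard way of weighting the immediate past more heavily than the distant past is simple exponential smoothing; the series and smoothing constant below are illustrative.

```python
# A sketch of simple exponential smoothing: the forecast is a weighted
# average in which recent observations receive geometrically larger weights.
# The series and alpha are illustrative.

def exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after processing the whole series."""
    level = series[0]
    for x in series[1:]:
        # New level blends the latest observation with the old level.
        level = alpha * x + (1 - alpha) * level
    return level

forecast = exponential_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
```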
Predicting Disease
News reports often cite statistics about a disease. If the reporter simply
reports the number of people who have the disease or who have died from it, that is an
interesting fact, but it might not mean much for your life. When statistical analysis is involved,
however, you have a better idea of how that disease may affect you.
For example, studies have shown that 85 to 95 percent of lung cancers are smoking related. The
statistic should tell you that almost all lung cancers are related to smoking and that if you want to
have a good chance of avoiding lung cancer, you shouldn't smoke.
Sampling
When full census data cannot be collected, statisticians collect sample data by developing
specific experiment designs and survey samples. Statistics itself also provides tools for
prediction and forecasting through the use of data and statistical models. To use a sample as a guide
to an entire population, it is important that it truly represents the overall population.
Representative sampling assures that inferences and conclusions can safely extend from the
sample to the population as a whole. A major problem lies in determining the extent that the
sample chosen is actually representative. Statistics offers methods to estimate and correct for any
bias within the sample and data collection procedures.
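The idea of using a sample as a guide to a population can be sketched directly: draw a simple random sample and use its mean to estimate the population mean. The synthetic population and seed below are illustrative.

```python
# A sketch of simple random sampling: estimate the mean of a population from
# a random sample of 50 items. The synthetic population is illustrative.
import random

random.seed(42)                       # fixed seed for reproducibility
population = list(range(1, 1001))     # true population mean is 500.5
sample = random.sample(population, 50)
estimate = sum(sample) / len(sample)
```

Because every item has the same chance of selection, the sample mean is an unbiased estimator of the population mean, and its sampling variability can itself be quantified.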
Medical statistics:
Medical statistics deals with applications of statistics to medicine and the health sciences,
including epidemiology, public health, forensic medicine, and clinical research. Medical
statistics has been a recognized branch of statistics in the United Kingdom for more than 40
years but the term has not come into general use in North America, where the wider term
'biostatistics' is more commonly used.[1] However, "biostatistics" more commonly connotes all
applications of statistics to biology.[1] Medical statistics is a subdiscipline of statistics. "It is
the science of summarizing, collecting, presenting and interpreting data in medical practice, and
using them to estimate the magnitude of associations and test hypotheses. It has a central role in
medical investigations. It not only provides a way of organizing information on a wider and more
formal basis than relying on the exchange of anecdotes and personal experience, but also takes
into account the intrinsic variation inherent in most biological processes."
Application of Statistics in Engineering:
For data from natural experiments and observational studies, statisticians use a modified, more
structured estimation method (e.g., difference-in-differences estimation and instrumental
variables, among many others) that produces consistent estimators.
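The difference-in-differences idea mentioned above reduces, in its simplest form, to an arithmetic of four group means; the figures below are illustrative.

```python
# A minimal sketch of difference-in-differences estimation: the treatment
# effect is (treated after - treated before) - (control after - control before),
# so that the control group's trend is subtracted out. Figures are illustrative.

def did_estimate(treated_before, treated_after, control_before, control_after):
    return (treated_after - treated_before) - (control_after - control_before)

effect = did_estimate(treated_before=10.0, treated_after=18.0,
                      control_before=9.0, control_after=12.0)
```

The estimator is consistent only if, absent treatment, both groups would have followed parallel trends.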
Geostatistics is a branch of geography that deals with the analysis of data from
disciplines such as petroleum geology, hydrogeology, hydrology, meteorology,
oceanography, geochemistry, and geography.
Population ecology is a sub-field of ecology that deals with the dynamics of species
populations and how these populations interact with the environment.
Quality control reviews the factors involved in manufacturing and production; it can
make use of statistical sampling of product items to aid decisions in process control or in
accepting deliveries.
Actuarial science is the discipline that applies mathematical and statistical methods to
assess risk in the insurance and finance industries.
Demography is the statistical study of all populations. It can be a very general science
that can be applied to any kind of dynamic population, that is, one that changes over time
or space.
Epidemiology is the study of factors affecting the health and illness of populations, and
serves as the foundation and logic of interventions made in the interest of public health
and preventive medicine.
Statistical physics is one of the fundamental theories of physics, and uses methods of
probability theory in solving physical problems.