
STATISTICAL CONCEPTS IN MICROBIOLOGY¹

ROEBERT L. STEARMAN²
Department of Biostatistics, School of Hygiene and Public Health, The Johns Hopkins University, Baltimore, Maryland

Vol. 19, 1955

CONTENTS

I. Introduction .... 161
II. Observations, Samples and Populations .... 162
    Describing a Population .... 164
        Frequency tables .... 164
        Bar graphs .... 164
        Histograms .... 164
        Frequency polygons .... 165
        Frequency curves .... 165
    Parameters and Statistics .... 166
        Parameters .... 166
        Statistics .... 167
III. Precision and Accuracy .... 169
    Precision .... 169
        Variance as an index of precision .... 169
        Methods of increasing precision .... 170
        Coefficient of variation as an index of precision .... 171
    Accuracy .... 171
IV. The Normal Distribution .... 172
    Utility .... 172
    Parameters and statistics .... 172
    Significance Tests .... 173
        Basic principles .... 173
        Test of a single sample mean .... 176
        Test of the difference between two treatments:
            paired samples .... 178
            independent samples .... 179
        Test of two sample estimates of the variance .... 183
        Significant versus practical differences .... 184
        Interpretation of results of significance tests .... 185
        Test of the difference among more than two treatments .... 186
    Problems of Estimation .... 188
        Confidence intervals: basic principles .... 188
        Confidence interval for a population mean .... 189
        Confidence interval for a population variance .... 190
        Confidence interval for the difference between two means .... 190
        Components of variance technique .... 191
V. The Binomial Distribution .... 194
    The parent population .... 194
        Probability .... 194
        Distribution of samples .... 194
        Parameters and statistics of the binomial distribution .... 196
    Significance Tests .... 196
        Binomial test of a single sample proportion .... 196
        Normal approximation to the binomial distribution .... 197
        Normal approximation test of a single sample proportion .... 198
        Normal approximation test for two sample proportions .... 199
        Normal approximation test for more than two sample proportions .... 200
        Graphical methods for tests of sample proportions .... 201
    Problems of Estimation .... 201
        Confidence interval for a population proportion .... 201
        Sample size and power in significance tests involving proportions .... 202
        Other methods for a confidence interval for a population proportion .... 204
        Confidence interval for the difference between two population proportions .... 204
VI. The Poisson Distribution .... 204
    Derivation of the Poisson Distribution .... 205
        Limit of the binomial distribution .... 205
        Items or events randomly distributed in time or space .... 205
        General form of probability for Poisson distribution .... 205
        Mean and variance of the Poisson distribution .... 206
    Applications of the Poisson Distribution .... 206
        Bacterial counts by chamber method .... 206
        Bacterial counts by plate method .... 206
        Bacterial counts from dilution series .... 206
VII. Acknowledgments .... 207
VIII. Appendix .... 207
    Analysis of Variance Computing Table .... 207
    Tests to Supplement the Analysis of Variance .... 210
    Notes on the Application of the Chi-square Test .... 213
References .... 214

¹ Paper number 300.
² Milbank Memorial Fund Fellow.

I. INTRODUCTION

Statistical methods are being used to an increasing extent in the field of microbiology. They are employed in many studies, ranging from the estimation of bacterial densities with dilution series to the determination of better designs for vitamin assays. This increased application of statistics in microbiology is part of a more general trend which is being noted in most biological sciences. One reason for this trend lies in the fact that present biological problems are of a statistical nature. Whether we like it or not, once a science advances beyond the descriptive stage, its problems become statistical, even though we don't use formal statistical techniques. In 1943, Eisenhart and Wilson (1), in a review of statistical methods in bacteriology, concerned themselves primarily with the methodology of statistical tests. The present review will deal mainly with the basic concepts which underlie statistical tests, while the methods will be used to illustrate these principles. Due to limitations of space, the review will be restricted to the concepts underlying the elementary methods involving the three basic distributions in statistics, namely, the normal, binomial, and Poisson distributions. In addition to discussing the basic concepts, an attempt will be made to point out some of the pitfalls which should be avoided. Statements of theorems and methods will be given without the

benefit of proofs. However, in some instances an attempt will be made to show the logic underlying a procedure without resorting to involved mathematics or theory. This paper is neither intended to be a textbook nor is it designed to make statisticians of the readers. The training of a statistician is a long and involved program, just as is the training of a microbiologist. It is hoped, however, that the reader will become acquainted with some of the terminology of the field of statistics and some of the concepts underlying the various methods presented.

Before proceeding, it is appropriate to see what statistical methods can do. One of the more important advantages of statistics is their power to get the most information out of a given set of data. This ability makes it possible to obtain a given amount of information from a smaller experiment than would be needed if cruder methods of analysis were used. For example, in studies involving the effects of different factors (such as pH and temperature), statistical methods make it possible to study the effects of all of the factors in a single experiment and still obtain the same amount of information as would be given by several experiments involving more work. Another advantage of statistical methods is that they offer a standard method of judging experimental data. When different people examine a


given set of experimental data, each person usually forms his own opinion as to what the data mean, whereas statistical methods offer a common basis for evaluating the results. In some types of experiments, statistical methods offer the only satisfactory solution obtainable. Statistical methods are available for a large variety of problems, ranging from the testing of differences between two laboratory procedures to the fitting of curves to data.

Some words of caution about the use of statistical methods are also appropriate at this point. Statistical procedures, like dynamite, are beneficial if used properly, but may be dangerous when used improperly. It has been stated, by critics of statistical methods, that anything can be proved using statistics. This statement is true, if (and this is a big if) two conditions are met: (a) statistics are used improperly, and (b) the person to whom a fallacy is being "proved" does not understand statistics. In actuality, statistical methods are nothing more than the application of logic to experimental data, formalized by applied mathematics. If statistical methods led to illogical results, the field would have ceased to exist long ago. Statistical procedures, if used properly, provide a potent, very helpful and many times necessary tool, but anything can happen when the use is improper. Huff (2) has written an excellent, amusing and worthwhile book on the subject of the improper use of statistics.

A point which should always be kept in mind is that statistical methods are not a substitute for good experimental technique. The statistical results obtained from a body of data are no better than the technique involved in the experiment. Further, statistical methods are not a substitute for sound professional judgment in interpretation of the results of an experiment. They are primarily an aid to interpretation.

One final caution before proceeding: be careful that the statistical analysis fits the experimental procedure. Many experimental designs which are quite similar require different methods of analysis. Statisticians have learned that a thorough knowledge of the details of the experimental procedure must be obtained before the method of statistical analysis can be decided upon.

II. OBSERVATIONS, SAMPLES AND POPULATIONS

Statistics, like any other science, has its own vocabulary; some terms have been borrowed from other fields while other terms have originated within the field. In making the transition to statistics, the definitions of the borrowed terms have undergone changes, so that the statistical definitions do not necessarily jibe with the definitions to be found in a dictionary. This section and the next will present the basic terminology to be used in later sections.

The basic building block of the statistical method is the observation. An observation is a measurement. It may be qualitative, such as the classification of dead or alive in an experiment on the effect of botulinus toxin on an animal, or the serologic type in the classification of pneumococci. On the other hand, it may be quantitative, such as the optical density or per cent light transmission of a bacterial culture in liquid medium, the number of colonies on a plate, or the number of animals that die in a cage of rats given a fixed dose of botulinus toxin. No matter whether the unit being measured is a single entity or a group, as long as a single measurement represents the unit, the measurement is an observation.

A sample is a group of observations drawn from a population (called a universe by some). The membership of a population or sample is determined by what is being studied. To illustrate this, consider a problem in clinical bacteriology. If a patient has a septicemia, the primary interest will be to determine the etiologic agent to start therapy. Then, the population will be the organism or organisms (in mixed infections) that are in the blood of the patient. The sample will consist of the specific type or types of organisms found in the blood drawn from the patient and sent to the laboratory. On the other hand, the study might consist of finding what organisms are found in septicemia cases in a hospital during a given year. Then, the type or types of organisms of a patient will become a sample in the population of types found in all of the septicemia cases in the hospital during the year. Again, the study may be broader than this: the problem might be to determine what organisms are found in septicemia cases in general. If so, the organisms found in cases in a hospital now become a sample in this larger population. Thus, populations and samples will change with the primary interests of the study. As interests broaden, what used to be populations become samples from still larger populations.


A point to be derived from this discussion is that a population is automatically defined as soon as the problem to be studied is defined. Samples are obtained to shed light on the characteristics of the population being sampled. For example, when a physician treats a septicemia patient with one of the standard antibiotics, he is primarily interested in the well-being of his patient; he is interested in the treatment only insofar as it will improve the condition of the patient. If, however, he tests a new experimental antibiotic on the patient, he is interested not only in the well-being of this particular patient but also in the effect of the new treatment on the countless other patients who may some day have the same condition. In this way, the effect of the new antibiotic on this patient becomes an observation taken from the population made up of the effect on all patients who may suffer from the same infection. When he and other physicians testing the same new treatment gather these observations together, they have a sample which may help in shedding light on the efficacy of the new antibiotic.

The data obtained from a laboratory experiment are, in the vast majority of cases, nothing more than a sample drawn from a population defined by the materials and methods used in the experiment. If the investigator who ran the experiment (or some other investigator) attempts to reproduce the results, the data obtained represent a second sample from the same population, as long as the materials and methods remain the same. Most of the populations which are sampled in laboratory experiments are infinite populations. For example, in studying the metabolism of some particular strain of Escherichia coli, the number of possible organisms of this type would indeed be infinite. The laboratory studies are done on samples from this infinite population. Metabolic studies on E. coli are aimed at determining the characteristic metabolic processes of the entire infinite population.
In other words, we must work with samples, but our ultimate aim is the characterization of an infinite population which we cannot study in its entirety. This is the basis of experimental science. It is at the point of generalization from sample results to population characteristics that statistical methods offer a standard approach as well as a potent tool. There are, of course, problems where the population defined is

not infinite. For example, a study to determine the etiologic agent for a septicemia case without a mixed infection would define a population of one member, namely the type of organism involved. Studies to determine what organisms are found in septicemia cases in a year would also define a finite population with more than one member. The size of a sample is limited only by the size of the parent population (the population from which a sample is drawn is called the parent population). A sample may contain only one observation, or, if the population is of finite size, the sample may contain all of the members of the parent population; this is referred to as complete sampling. It is, of course, impossible to have complete sampling on an infinite population. One concept which will be needed later on is that of a population of all possible samples of a given size. To illustrate a population of this type, consider a parent population of Streptococcus pyogenes (the cocci themselves). Now, the population of all possible samples of size five will contain all of the possible combinations of five cocci which may be drawn from this parent population. It must be noted that any one particular coccus may appear in a great many different samples constructed in this manner, since the manner of construction is equivalent to returning each sample to the population before the next sample is made up. In a population of this type, the members of the population are samples of five cocci. Any sample of five cocci which could be drawn from the parent population will be a member of this population of samples. A type of sample which is used to a great extent is the random sample. A random sample is a sample taken in any manner which gives each member of the parent population an equal chance of appearing in the sample, with the additional condition that once any particular member is chosen it does not affect the chance of any other of the members appearing in the sample. 
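The population of all possible samples of a given size can be made concrete with a short sketch. The seven-coccus population below is invented for the illustration; `itertools.combinations` from the Python standard library enumerates the samples:

```python
from itertools import combinations

# Hypothetical tiny parent population of 7 cocci, labeled by number.
cocci = [1, 2, 3, 4, 5, 6, 7]

# The population of all possible samples of size five: every
# combination of five cocci that can be drawn from the parent population.
all_samples = list(combinations(cocci, 5))
n_samples = len(all_samples)                          # C(7, 5) = 21 members

# Any one particular coccus appears in a great many different samples:
n_with_first = sum(1 for s in all_samples if 1 in s)  # C(6, 4) = 15 of them

print(n_samples, n_with_first)
```

Here the members of the constructed population are themselves samples of five cocci, just as in the text; with a realistically large parent population this enumeration would, of course, be impossible, which is why the concept is used theoretically rather than computationally.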
There are many ways in which this can be done. One method which could be used is to assign a number to each member of the parent population, write these numbers on separate slips of paper, place the slips in a hat, mix them thoroughly, and then draw several slips from the hat. The members whose numbers were drawn would then appear in the sample. A sample of this type would give each of the members an equal chance of appearing in the sample, and the appearance of any particular member in the sample does not affect the chance of the other members appearing in the sample.
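The slips-in-a-hat procedure is exactly what a computer's sampling routine does. A minimal sketch, with an invented population of streptococcal diameters (in micrometers):

```python
import random

# A small hypothetical parent population: measured diameters
# (in micrometers) of ten streptococci on a slide.
population = [0.62, 0.71, 0.65, 0.70, 0.68, 0.74, 0.60, 0.66, 0.69, 0.72]

# random.sample draws without replacement, giving every member an equal
# chance of appearing and leaving the remaining chances unaffected --
# the "slips in a hat" procedure described above.
sample = random.sample(population, 5)
print(sample)
```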

Describing a Population

If all of the members of a population were alike, it would be a simple matter to describe the population by listing the number of observations in the population and the common measurement possessed by all of the members. Most populations met in research are not of this sort, however. For example, the size of streptococci will vary with species, strain and even with the individuals within a strain. Description of populations of this type presents a problem. In small populations, it is not hard to list each of the observations, but in large populations such a thing would be a hopeless task. Even though all of the members of a large population might be listed, it would be virtually impossible to determine, from the listing, just how the observations were distributed, i.e., whether the observations are clustered about some central value, what amount of variation exists and so on. The following aids may be used to help in the description of a population; they may also be used in describing samples. In fact, they are used more in the description of samples than in the description of populations, since populations are seldom completely known and since data are usually only a sample from a population. Thus, although the following descriptions will be made in reference to populations, the reader must understand that the methods may also be applied to samples.

Frequency tables. One of the aids to the description of the distribution of the observations in a population is the frequency table. The frequency table is a listing, in tabular form, of the frequency (number of observations or fraction of the total number of observations) with which observations occur for each of the possible different measurements contained in the population. For example, if the population were the pneumococcal pneumonia cases occurring in a given year, and the measurement were the serologic type, the frequency table would contain a listing of the number of cases for each of the different serologic types. It is often advantageous to collect the measurements into intervals or groups. For example, if the population were a stock colony of rats, and the measurement were the weight, the number of animals could be listed for each of several intervals of, say, five grams each. With the cases of pneumococcal pneumonia, the rare serologic types might be grouped together.

Bar graphs. Since it is often helpful to "see" a population, graphs are useful. Although innumerable types of graphs might be mentioned, only the ones that are commonly used will be described. One type is the bar graph shown in figure 1. This particular type portrays populations in which the measurements are qualitative, such as the serologic types of pneumonia, or discrete quantitative measurements, such as the number of rats in groups of a given size that die with a standard dose of botulinus toxin. Each of the possible different measurements that occurs in the population is plotted along the abscissa (horizontal axis of the graph), and their relative frequencies are represented by heights of bars (or lines) drawn above each of the measurements. Each of the members of the population is represented by equal units of height along the ordinate (vertical axis of the graph). Thus, the height of each of the bars is proportional to the relative frequency for the measurement involved.

Figure 1. Bar graph (see text for detailed explanation).

Figure 2. Histogram.

Histograms. A type of graph which is used to portray continuous measurements, such as the diameters of streptococci, is the histogram (see figure 2). The different possible measurements are


again plotted along the abscissa. Each member of the population could again be represented by a line above its particular value of the measurement. However, in the histogram, the abscissa is divided into intervals, and the relative frequencies with which the members of the population fall in each interval are represented by rectangles constructed above the intervals. Here, each member of the population is represented by equal units of area under the rectangles. Thus, the area under each rectangle is proportional to the relative frequency for the interval of measurements involved. This latter point is extremely important in the construction of histograms where the intervals along the abscissa are unequal. If the intervals along the abscissa of the graph are of equal length, the relative frequency will be proportional to the height of the rectangle; however, if the intervals are unequal in length, the heights of the rectangles must be proportional to the average number of members of the population for each unit of length of the interval. As an example, consider two adjacent intervals, both containing five members of the population. Let the first interval be five units in length and the second be one unit in length. Now, if the heights of the rectangles are made proportional to the relative frequencies, both rectangles will be five units high. Thus, the viewer will be led to believe that the relative frequencies per unit of length of the intervals are the same (see figure 3). This is obvious nonsense, since there is, on the average, only one member per unit length in the first interval and five in the second interval. Therefore, to present a true picture of the relative frequencies in the two intervals, the rectangle above the first interval must be one unit in height while the rectangle above the second interval will be five units in height (see figure 4). This will make the areas of the two rectangles the same, as they should be, since both rectangles represent the same number of members of the population.

Figure 3. Improperly plotted histogram.

Figure 4. Properly plotted histogram.

Frequency polygons. An alternative method of presentation of the information in a histogram is the frequency polygon shown in figure 5. The frequency polygon is made by connecting the centers of the tops of successive rectangles in a histogram by straight lines. The frequency polygon has the advantage of giving the viewer the impression of continuity which is inherent in the measurements being portrayed. It is for this reason that a frequency polygon should not replace the bar graph, since the measurements represented by the bar graph are not continuous. The frequency polygon does not maintain the relationship between area and the relative frequency given by the histogram.

Figure 5. Frequency polygon for histogram of figure 2.

Frequency curves. A specialization of the histogram which will be of use is the frequency curve. If a small sample were drawn from an infinite population, a histogram of the sample could be made with fairly wide intervals along the abscissa. If the number of observations in the sample is increased, a smoother picture of the distribution of the sample will be obtained by decreasing the size of the intervals. Thus, by increasing the size of the sample and decreasing the width of the intervals, the top of the histogram can be made to approach a smooth curve. The curve which is approached by a histogram by letting the size of the sample increase to infinity, that is, by approaching complete sampling, and by letting the width of the intervals become infinitesimal, is called the frequency curve of the distribution of the population (see figure 6).
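The frequency table and the unequal-interval histogram rule described above can be checked with a short sketch. All data are invented for the illustration; `Counter` from the Python standard library does the tallying:

```python
from collections import Counter

# Frequency table for a qualitative measurement: serologic types
# of hypothetical pneumococcal pneumonia cases.
types = ["I", "III", "I", "II", "I", "III", "XIV", "II", "I"]
freq_table = Counter(types)

# Histogram with unequal intervals: two adjacent intervals, each
# containing five members, but of lengths 5 and 1 units.
intervals = [(0, 5, 5), (5, 6, 5)]   # (left edge, right edge, frequency)

# Height is members per unit of interval length, so that AREA,
# not height, is proportional to the relative frequency.
heights = [freq / (right - left) for left, right, freq in intervals]
areas = [h * (right - left) for h, (left, right, freq) in zip(heights, intervals)]

print(freq_table["I"])  # number of type I cases
print(heights)          # the rectangles differ fivefold in height
print(areas)            # but have equal areas, as the text requires
```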


Figure 6. Frequency curve for normal distribution.

Here, again, the area under the curve within a given interval of the abscissa is proportional to the relative frequency with which members of the population fall within the interval.

Parameters and Statistics

Parameters. We have discussed how tables and graphs can be used to describe a population. Although these methods are frequently the most useful, distributions can also be described by the use of certain constants called parameters (the number of parameters that are required to describe a population completely will vary with the distribution). A parameter is a characteristic of a population. Actually, a parameter may be defined as any function (in the mathematical sense) of the measurements of the members of a population. For example, if a population consists of the measured diameters of the streptococci on a slide (this would, indeed, be a trivial population), the average diameter of the cocci would be a parameter, as would the smallest and largest diameters, or even the total of the diameters. Although the number of possible parameters is thus unlimited, two parameters are used to a greater extent than the others. These are the mean and the variance.

The mean of a population is nothing more than the arithmetic average of the measurements of the members of the population. Thus, to obtain the mean, add the measurements for each of the members of the population and divide by the number of members. The mean of a population is denoted by the lower case greek letter mu (μ).

To save space later on, it is advisable to pick up some mathematical notations at this time. For the sake of convenience, let us number the members of the population and let the number of members in the population be represented by N. Thus, if there are 1000 members in the population, N will be 1000 and the members will be numbered from 1 to 1000. N does not have to be finite, as it is here, but may be infinite for infinite populations. Now, let X be the measurement which is taken for each of the members. The measurement for the first member may be denoted by X1, the measurement for the second member as X2, and so on up to the measurement for the last member of the population, which will be XN. Matters can be simplified even further by letting Xi be the general term for any one of the N measurements; when i is equal to 1 we obtain X1, the measurement for the first member. With this general term, we can denote the measurements of the population as being

    Xi (i = 1, 2, ..., N)        (2.1)³

Symbol 2.1 tells us that the measurement of the ith member of the population is denoted by Xi and that i can be anything from 1 to N (... stands for all of the intervening terms between 2 and N). Now, the only thing necessary to define the population mean is a symbol which will tell us to add up all of these measurements. The capitalized greek letter sigma (Σ) is used as a summation sign. Thus, we can define the population mean by

    μ = ( Σ(i=1 to N) Xi ) / N        (2.2)

The indices i = 1 and N tell us to add all of the Xi's from i equal to 1 up to i equal to N.

The population variance, on the other hand, is a parameter which is designed to give a measure of the spread or variability of the population. The variance, denoted by the lower case greek letter sigma squared (σ²), is the average of the squared deviations, from the population mean, of the measurements which make up the population,

    σ² = ( Σ(i=1 to N) (Xi − μ)² ) / N        (2.3)

³ The following notation is used for numbering equations: the number preceding the decimal point is the section number and the number following the decimal point is the number of the equation within the section. Thus, symbol 2.1 is the first numbered symbol in the second section.
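Equations 2.2 and 2.3 translate directly into code. A sketch for a small invented population of four measurements:

```python
# A small finite population of N = 4 measurements (hypothetical values).
X = [2.0, 4.0, 6.0, 8.0]
N = len(X)

mu = sum(X) / N                               # equation 2.2, population mean
sigma_sq = sum((x - mu) ** 2 for x in X) / N  # equation 2.3, population variance

print(mu, sigma_sq)
```

Note that the divisor is N, the full population size; the sample estimate of the variance, introduced later in the paper, uses a different divisor.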
Actually, a parameter may be N defined as any function (in the mathematical x X i ( sense) of the measurements of the members of a (2.2) N population. For example, if a population consists of the measured diameters of the streptococci on The indices i = 1 and N tell us to add all of the a slide (this would, indeed, be a trivial popula- Xi's fo up to i equal to N. tion), the average diameter of the cocci would be equal rhe t1 m p m n the cter in a a parameter, as would the smallest and largest The ation variance, on the other hand, is a diaetesthetoal f the he iamtes, r venth the total of diameters, diameters . Al- rameter. parameterThe which is designed to give a measure of the of though p ameters. ith is thus the spread or variability of the population. The though the number of possible parameters unlimited, two parameters are used to a greater variance, denoted by the lower case greek letter extent than the others. These are the mean and sma square (o2),is the average of the squared the population mean, of the deviations, from the variance. which make up the population, The mean of a population is nothing more than i the arithmetic average of the measurements of the te. N members of the population. Thus, to obtain the (I mean, add the measurements for each of the (2.3) N members of the population and divide by the number of members. The mean of a population is The following notation is used for numbering denoted by the lower case greek letter mu (u). To save space later on, it is advisable to pick decimal is the section number and the number up some mathematical notations at this time. following the decimal point is the number of the For the sake of convenience, let us number the equation within the section. Thus, symbol 2.1 members of the population and let the number of is the first numbered symbol in the second section


To determine the variance of a population, subtract the population mean from each of the measurements in the population, square the resulting numbers, add them, and divide by the number of members in the population. There are, of course, other possible parameters which would give a measure of the spread of the population, and the reader may wonder why such a complex parameter should be chosen as the one to use. To give an adequate answer to this question would require a jaunt into theoretical statistics, but one reason is that the variance has several advantages not enjoyed by other parameters of spread. One, which will be put to use at a later point in the paper, is that the variance can be split into component parts, i.e., if the variation in a population arises from different sources, the variance of the population may be split into terms representing the amount of variation arising from each source. Similarly, variance terms can be combined, i.e., if we have a laboratory procedure which requires several steps, and the variance introduced by each step is known, the variances for the individual steps can be combined to obtain a variance for the complete procedure. Another reason for the choice of the variance lies in its position of importance in the normal distribution which will be discussed in later sections. Since the normal distribution is of great significance in many types of statistical problems, and since a normally distributed population can be completely defined in terms of its population mean and variance, we have another good reason for the choice of the variance as a parameter of spread or dispersion. The square root of the variance, called the standard deviation and denoted by the lower case greek letter sigma (σ), will be used to a large extent in later sections of this paper; the equation of its definition is
    σ = √[ ( Σ(i=1 to N) (Xi − μ)² ) / N ]        (2.4)

The standard deviation has an advantage over the variance in that taking the square root brings the standard deviation into the same units of measurement as that for the members of the population. Thus, if population members are expressed in grams, the population mean and the population standard deviation will also be expressed in grams.

Statistics. Populations may be characterized by their parameters. A similar characteristic for a sample is called a statistic. Thus, a statistic is a characteristic of a sample and may be defined as any function (in the mathematical sense) of the observations in a sample. There are an unlimited number of possible statistics just as there are an unlimited number of possible parameters. An example of a statistic would be the average diameter of a sample of the streptococci on our slide. The usefulness of any particular statistic will depend upon its ability to estimate a parameter of the parent population. To discuss the ability of a statistic to estimate a parameter, we will need to define and draw a distinction between the terms estimator and estimate, as applied to a statistic. An estimator is the mathematical procedure used to determine a statistic, and an estimate is the actual number obtained when this procedure is applied to a particular sample. An estimator will remain the same from sample to sample (for a given statistic), but estimates will vary from sample to sample. As an illustration, consider samples from the slide of streptococci: the average diameter of the cocci will vary from sample to sample, but the method of determining the average diameter will remain the same.

One other concept which will be needed before proceeding is the concept of an expected value. The expected value of a function of the measurements in a population is the average value of the function for the entire population. The expected value of a function is denoted by placing the capital letter E in front of the function, as in equation 2.5. To illustrate the expected value, consider two examples of its use. First, consider the expected value of the measurements of a population. The expected value is the average value of all of the measurements in the population, thus,

    E(X) = ( Σ(i=1 to N) Xi ) / N        (2.5)

But the right hand side of equation 2.5 is the definition of the population mean; therefore,

    E(X) = μ        (2.6)

Next, consider the expected value of the squared deviations of the measurements from the population mean. This would be the average value of

168

ROEBERT L. STEARMAN

[VOL. 19

these squared deviations for the entire popu- ing bias may also be used in other conditions, as will be seen later on. lation, or, The two statistics most commonly used for N - )2 estimating the population mean and population Z (Xi _ EI(X - ;9)'j =-1 (2.7) variance are the sample mean and the sample estimate of the variance. To distinguish samples N from populations in equations for defining the defi2.7 is The right hand side of equation nition of the variance for the population, there- statistics, new notations will be set up for samples. We denoted the size of a population by N and we fore, S will denote sample size by n. We denoted the (2.8) measurement for a member of a population by El(X - }s)" c Thus, the two commonly used population pa- Xi and we will denote the same measurement for rameters might have been defined by the use of an observation in a ample as x;. Thus, the observations in a sample of size n may be denoted expected values. -The concept of an expected value can also be by applied to a statistic. The application stems from (2.11) n) xz(i 1, 2, *--, the population of all possible samples of a given The sample mean is denoted by the lower case size. It has been pointed out previously that there is a population of all of the possible samples of a letter z bar (i), and may be defined by given size which could be drawn from a parent xi population. The estimates obtained by applying (2.12) a given estimator to each of the samples in this n population of samples would also constitute a population. Having thus obtained the population Thus, the sample mean is the arithmetic average of estimates, the expected value of these estimates of the observations in the sample. The sample mean is an unbiased estimate of the population can be determined. There are several criteria for judging how well mean. The sample estimate of the variance is denoted a statistic estimates a parameter. 
One criterion that will be important in later discussions i by the lower case letter s square (02) and is bias. Let the population parameter be denoted by defined by the lower case greek letter theta (0) and a member : (xof the population of estimates obtained by using the estimator by theta hat (a). Now, an estimator i(213) n- 1 is said to be unbiased, that is, the method of estimation is said to be unbiased, if The reader will note that the divisor is n - 1, (2.9) while the divisor of the population variance was E(8) = e In words, the estimator is unbiased if the mean N as shown in equation 2.3. There are several of the population of estimates is equal to the reasons for using the number n - 1 instead of n, parameter of the parent population. If the mean one of them being that 82, as defined, is an unof the population of estimates is not equal to the biased estimate of a0 for the infinite populations parameter of the parent population, then the which are so important in laboratory experiestimator is said to be biased, and the difference mentation, that is, is called the bias, i.e., (2.14) Ets') = as (2.14) bias = = E(0) - e (2.10) The number n - 1 is the number of degrees of if is the bias that is freedom of the sample estimate of the variance. In If the parameter is, negative, greater than the expected value, the estimator is mathematical parlance, the degrees of freedom said to be negatively biased. Similarly, if the bias may be defined as the number of "independent is positive, that is, if the parameter is less than variables" which go to make up 82. One method the expected value of the estimate, the estimator of determining the number of "independent is said to be positively biased. The terms concern- variables" is to answer the question: If we wish to

know the value of each of the observations in the sample, and we are given the mean of the observations, how many of the original observations must we have? The answer is one less than the total number of observations in the sample (if we know all but one of the observations, we may determine the value of the missing observation by subtraction). Another way of looking at the number of degrees of freedom is to say that when the sample was first taken, there were n degrees of freedom. One of these was used in determining the mean, so there are n − 1 degrees of freedom left for the sample estimate of the variance.

The numerator of the right hand side of equation 2.13 for s² is called the sum of squares of deviations from the mean. This term may be abbreviated as S.S. Thus,

$$S.S. = \sum_{i=1}^{n}(x_i - \bar{x})^2 \tag{2.15}$$

The denominator is the degrees of freedom, abbreviated d.f. (some authors use only f); therefore, the sample estimate of the variance may be defined as being the S.S. divided by the degrees of freedom. Thus,

$$s^2 = \frac{S.S.}{d.f.} \tag{2.16}$$

The abbreviations S.S. and d.f., as well as the definition given by equation 2.16, will be used to a great extent in the later sections of this review.

There is also a sample estimate of the population standard deviation, namely, the square root of the sample estimate of the variance, denoted by the lower case letter s. The equation of its definition is

$$s = \sqrt{\frac{S.S.}{d.f.}} \tag{2.17}$$
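The claim in equation 2.14, that dividing the sum of squares by n − 1 rather than by n gives an unbiased estimate of σ², can be checked numerically. The sketch below is not part of the original review; the population (normal, mean 0, variance 9) and sample size are invented for illustration. It averages both candidate estimators over many random samples:

```python
import random

# Monte Carlo check of equation 2.14: E(s^2) = sigma^2 when the divisor is
# n - 1, versus the downward-biased estimate obtained with divisor n.
random.seed(42)
SIGMA_SQ = 9.0      # assumed population variance
n = 5               # sample size
trials = 200_000

sum_unbiased = 0.0  # running sum of S.S. / (n - 1)
sum_biased = 0.0    # running sum of S.S. / n

for _ in range(trials):
    sample = [random.gauss(0.0, SIGMA_SQ ** 0.5) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squares, equation 2.15
    sum_unbiased += ss / (n - 1)
    sum_biased += ss / n

print(sum_unbiased / trials)  # close to 9
print(sum_biased / trials)    # close to (n - 1)/n * 9 = 7.2
```

The divisor n reliably underestimates σ² by the factor (n − 1)/n, which is exactly the degree of freedom spent on estimating the mean.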

The calculation of the sample mean and the sample estimate of the variance, as well as the sample estimate of the standard deviation, will be illustrated many times in later sections, so no illustrations will be given at this point.
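As a brief preview of those later illustrations, the quantities defined in equations 2.12 through 2.17 can be computed directly. The observations below are invented for the sketch:

```python
# Hypothetical sample of five measurements (units arbitrary).
observations = [10.2, 9.8, 10.5, 10.1, 9.9]

n = len(observations)
x_bar = sum(observations) / n                      # sample mean, equation 2.12
ss = sum((x - x_bar) ** 2 for x in observations)   # S.S., equation 2.15
df = n - 1                                         # degrees of freedom
s_squared = ss / df                                # equations 2.13 and 2.16
s = s_squared ** 0.5                               # equation 2.17

print(x_bar, ss, df, s_squared, s)
```

For these numbers, x̄ = 10.1, S.S. = 0.30, d.f. = 4, s² = 0.075, and s ≈ 0.27.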
III. PRECISION AND ACCURACY

Precision

Measurements are subject to variation. For example, if repeated Kjeldahl determinations are run on a sample of protein, or repeated determinations are done on the radioactivity of a specimen, it would indeed be rare to get the same results on every try. If there were no variation in these measurements, the measuring device or the procedure used would be said to be precise, under the usual dictionary definition of the word. However, in biological work, precise devices or procedures seldom if ever exist under this definition. Instead, the measuring devices have varying precision. The smaller the variation, the greater the precision. A more useful definition of the word precise allows the devices or procedures of great precision to be called precise, while those devices whose measurements are subject to an amount of variation which exceeds some allowable amount are spoken of as being imprecise (under the dictionary definition of these terms, all devices and procedures are imprecise). These definitions will be used in this review. Precise and imprecise are relative; their definition will vary with the unit being measured.

Variance as an index of precision. The precision of measuring devices or laboratory procedures is of the utmost importance to the worker who uses them. Precision and reliability are inseparable in laboratory work. Of even greater importance is that the worker know the precision of his tools. There is no direct measure of precision, but an index of precision is available since there is a measure of the amount of variation, namely the variance. The variance is an inverse function of the precision: the greater the precision, the less the variance.

Consider the problem of determining the average diameter of one of the species of streptococci, say Streptococcus pyogenes. It is an easy matter to obtain a sample of the population of the cocci themselves; simply inoculate a tube of broth, let it incubate for a sufficient time at the proper temperature, withdraw a loopful and make a smear on a glass microscope slide. The cocci on the slide would be a sample of the population of the cocci.

The problem of obtaining a sample of the diameters is another matter. Each of the cocci on the slide has a certain "true" diameter. The true diameters of the cocci on the slide cannot be measured, but the diameters may be estimated with measurements obtained by using an ocular micrometer on a fixed and stained preparation. Even apart from errors in measurement because of distortion of size from fixing and staining, these measurements by the ocular micrometer will be


subject to error and will not be the same as the true diameters of the cocci. A measurement obtained from a measuring device is called an estimate. The reader will note that apparently there are two definitions for the word estimate. The word estimate was used in the last paragraph to denote the measurement given by some measuring device; previously it was used to denote a statistic derived from a sample. Actually, both of these fall under the same broad definition of an estimate. With both we are trying to estimate a true value; in the first a measuring device is being used as the estimator to estimate the true value of a member of a population and in the second a mathematical procedure is used as the estimator to estimate the true value of a parameter of a population. Thus, no matter whether the estimator is a measuring device or a mathematical procedure, the numerical value obtained by its application is called an estimate. If one coccus from our slide of cocci were taken, a population of the estimates of its true diameter could be obtained by repeated measurements of its diameter using an ocular micrometer or some other similar device. The variance of this population of estimates would be an index of the precision of the measuring device. Another way of stating this is that the variance of this population of estimates is an index of the precision of an estimate, that is, the variance gives us an index of the precision of a single measurement. Methods of increasing precision. Many times, a measuring device does not give the needed precision. In this event, another measuring device which does have sufficient precision to meet our needs may be available. Often, however, a device of sufficient precision is not obtainable. To bypass this obstacle, advantage may be taken of the fact that the mean of several estimates has greater precision than a single estimate. 
The relationship of the variance of a mean of several measurements of the same object, denoted by σx̄², to the variance of a single measurement, σ², may be stated mathematically. The relationship between the variance of the mean of n measurements and the variance of a single measurement is

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \tag{3.1}$$

Thus the variance of the mean of several estimates is equal to the variance of a single measurement divided by the number of estimates which go to make up the mean. One point which must be brought out in this connection is that equation 3.1 holds only if the n measurements are taken on a single object or if a single measurement is taken on each of n individuals, and only if the n measurements thus taken are independent. By independent we mean that the size of a given measurement does not depend in any way on the size of any of the preceding measurements.

In general, if we are going to take measurements on more than one individual it is better to take more than one measurement on each, since variation among individuals introduces a new source of variation above and beyond that of the measuring device. For example, the size of streptococci will vary with species, strain, and even with individuals within a particular strain. Let us confine our attention to measurements of individuals within a particular strain. Variation exists among these cocci, in that they will not all have the same diameter. A measuring device has a certain amount of variation, and the cocci also have a certain amount of variation. Thus, the variation in means of estimates obtained by taking several measurements on each of several cocci will be due not only to the variation from the measuring device, but also to the variation among the cocci. This means that the variance of the mean taken in this manner depends not only on the variance of the measuring device but also on the variance of the true diameters of the cocci. Here again, the relationship among these variances may be stated mathematically. Denote the variance among the true diameters of the cocci within the chosen strain by σc² and the variance due to the measuring device by σd². Now, take the mean of the estimates for nc of these cocci with nd determinations per coccus (take the same number of determinations, nd, on each coccus). Then the relationship among the variances may be stated by the following equation:

$$\sigma_{\bar{x}}^2 = \frac{\sigma_c^2}{n_c} + \frac{\sigma_d^2}{n_c n_d} \tag{3.2}$$

Equation 3.2 reduces to a special case of equation 3.1 if a single measurement is taken on each coccus (nd = 1). The relationship among the variances becomes more complex when the number of measurements varies among the different cocci; this relationship will not be discussed.
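The trade-off expressed by equations 3.1 and 3.2 can be sketched numerically. The variance components below are invented for illustration; σc² is the (hypothetical) variance among cocci and σd² the variance of the measuring device:

```python
# Hypothetical variance components for the streptococcus example.
sigma_c_sq = 4.0   # assumed variance among true diameters of the cocci
sigma_d_sq = 1.0   # assumed variance of a single micrometer reading

def var_of_mean(n_c, n_d):
    # Equation 3.2: variance of the mean when n_c cocci are each
    # measured n_d times.
    return sigma_c_sq / n_c + sigma_d_sq / (n_c * n_d)

print(var_of_mean(10, 5))   # 4/10 + 1/50 = 0.42
print(var_of_mean(10, 1))   # n_d = 1 case: (sigma_c^2 + sigma_d^2)/n_c = 0.5
print(var_of_mean(50, 1))   # same 50 total readings, spread over more cocci: 0.1
```

Note the design implication: with the same total number of readings, spreading them over more cocci reduces the variance of the mean far more than repeating readings on the same cocci, because σc² is divided only by nc.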


The relationship shown in equation 3.2 is inadequate if streptococci from different species and strains are measured, since the variance among streptococci will depend upon the species, strains and individuals present. That is, the variance among the estimates would be a function of the variance among species, the variance among strains within species, the variance among individuals within the strains, and the variance due to the measuring device. Equation 3.2 can be extended to take care of the variance from all of these sources. The main point to be gained from the preceding discussion is that the precision of a method can be increased by taking the mean of several measurements, and that the precision of this mean will depend upon the origin of the measurements which go to make up the mean.

Coefficient of variation as an index of precision. Many times a laboratory worker, discussing the precision of one of his procedures, will state that his procedure gives answers that agree within, say, 5 per cent. This type of statement implies that the amount of variation is 5 per cent irrespective of the size of the true value, in contrast to that of a worker who states that weighings on a rough balance are within 2 grams. The latter statement implies that the variance is the same regardless of the size of the mean, whereas the first statement implies that the coefficient of variation is the same regardless of the size of the mean. The coefficient of variation is a measure of the amount of variation in terms of per cent of the mean. The population coefficient of variation will be denoted here by the capital letters C.V. and the sample estimate of the coefficient of variation by the lower case letters c.v. The population coefficient of variation is defined by equation 3.3:

$$C.V. = \frac{\sigma}{\mu} \times 100 \tag{3.3}$$

and the sample estimate of the coefficient of variation by:

$$c.v. = \frac{s}{\bar{x}} \times 100 \tag{3.4}$$

The standard deviation is used in the coefficient of variation rather than the variance because the standard deviation is expressed in the same units of measurement as the mean (and the original measurements). The coefficient of variation will be in per cent, since the numerator and denominator are in the same units.

If the variance remains the same regardless of the size of the mean, then the coefficient of variation will not remain the same. The opposite statement is also true, namely, if the coefficient of variation remains the same regardless of the size of the mean, then the variance will not remain the same. Sometimes neither the variance nor the coefficient of variation will remain constant with changes in the mean. In this event, it will be necessary to specify the variance or the coefficient of variation for each level of the mean to discuss the precision of the measurements. However, if either the variance or the coefficient of variation remains the same with each level of the mean, the one which remains constant may be used as an index of precision for the measuring device or procedure. It is, of course, important to specify which of these two indices is being used, since there is a definite difference between them.

Accuracy

According to the dictionary, a measuring device is accurate if the estimate obtained by the device is equal to the true value being estimated. This type of definition places a great restriction on the use of the term accurate, since even an imprecise measuring device will give the right answer part of the time; thus, it would be said to be accurate at times and inaccurate at other times (actually it would be inaccurate more times than it would be accurate). For a device to be accurate all of the time, it would have to be subject to no variation and give the right answer every time. As pointed out before, only devices which are subject to variation are available. This type of a definition is too limited to be of much use. A more practical approach is to apply the terms used in discussing bias to the problem of measuring devices. Their use makes it unnecessary to define a group of new terms.

Thus, a device is said to be unbiased if the mean of the population of estimates is equal to the true value, and it is biased if the mean of the population of estimates is not equal to the true value. The direction of its bias is defined by stating that the device is negatively biased or positively biased, and the amount can be stated. The bias or lack of bias of a measuring device or procedure is of great importance to the laboratory worker, but it is something which can't be determined directly by statistical methods. For example, if the diameters of the cocci on the slide


were measured, and then 10 millimeters added to each, no statistical procedure would pick up this obvious bias in the results if the results alone were considered. However, statistical methods will determine bias if additional information is available. Many methods may be used to determine the bias of a device or method, for example, the comparison of the device with a standard. Thus, weights are tested against standard weights to determine the bias of the weights. When a standard is used, statistical comparison tests, to be considered later, will be of use. Another method of determining the bias of a method is by the recovery of added measurable material. For example, in microbiological assays, known amounts of the metabolite being assayed are added to unknowns, and the difference, as measured by the assay, between the level of the metabolite in the unknown and the level in the unknown plus known added amount is compared with the amount of the metabolite added. Here again, statistical comparison tests will be useful.

The difference between bias and precision must be noted. Precision is concerned with the variability of the estimates about their central value, whereas bias is concerned with the difference between the central value of the estimates and the true value being estimated. Thus, procedures may be unbiased and imprecise, unbiased and precise, biased and imprecise, or biased and precise. Any of these combinations may be met in practice. The best combination would be a procedure both unbiased and precise. This would be a procedure that could be called accurate.
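The distinction between bias and precision can be sketched numerically. In this hypothetical example (the weighings are invented), the bias of equation 2.10 is estimated by comparing the mean of repeated estimates against a known standard, while precision is indexed by the sample variance:

```python
# Hypothetical repeated weighings of a standard 1000-gram weight.
true_value = 1000.0
weighings = [1002.1, 1001.8, 1002.4, 1001.9, 1002.3]

n = len(weighings)
mean = sum(weighings) / n
s_squared = sum((w - mean) ** 2 for w in weighings) / (n - 1)

# Sample analogue of equation 2.10: bias = E(estimate) - true value.
estimated_bias = mean - true_value

print(estimated_bias)  # about +2.1 grams: positively biased
print(s_squared)       # small variance: quite precise
```

A device behaving like this would fall in the "biased and precise" category of the preceding paragraph; comparison against the standard is what exposes the bias.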

IV. THE NORMAL DISTRIBUTION

Utility. The normal distribution is one of the most used and most important distributions in statistics. The usefulness of the normal distribution lies in four facts.

(a) Many of the naturally occurring populations are approximately normally distributed.

(b) The means of large random samples from naturally occurring populations are normally distributed even if their parent populations are not normally distributed. (The meaning of the word large will vary with the distribution of the parent population. The means of random samples from some distributions approach a normal distribution with small samples, while other distributions may require quite a large sample size.)

(c) Many naturally occurring populations may be made to follow an approximately normal distribution by the use of a simple transformation of the measurements of the members of the population. For example, the logarithms of counts per minute in radioactivity measurements of a specimen using an Autoscaler are approximately normally distributed.

(d) Statistical tests involving populations which are not normally distributed may be simplified by use of normal approximations. Examples of this will be given in later sections.

There is nothing particularly "normal" about the normal distribution. Although, at one time, some statisticians thought that all biological populations would be normally distributed, this was soon shown to be a fallacy. The name normal distribution has remained because of usage and not because it represents the normal state of affairs for biological populations.

Parameters and statistics. A circle may be defined by the use of its center and its radius. If we wish to graph a circle we may do so if we know its center (equivalent to a location parameter) and its radius (equivalent to a parameter of spread). In a somewhat similar way, we may define a normal distribution by its mean and variance. The frequency curve for a normally distributed population, a symmetrical, bell-shaped curve (see figure 6), is completely defined in terms of the population mean and variance. The only parameters which appear in the mathematical equation for the curve are these two (the reader may find the equation of the curve in most elementary statistics books or reference 1). Changes in the mean and variance of the population result in changes in the position and spread of the curve. Changes in the mean result in shifting the curve to left or right along the abscissa, while changes in the variance result in increasing or decreasing the spread of the curve. An excellent discussion of the effect on the curve of changing the mean and variance, complete with graphs, is given by Bross (3), starting on page 202 of his highly recommended book. An important relationship which is used to a great extent is that 95 per cent of the area under

the curve lies between the mean minus 2 standard deviations and the mean plus 2 standard deviations (to be more precise, 1.96 should be used instead of 2; however, for all practical purposes, 2 is close enough). Since the area under the curve is proportional to the relative frequency with which the measurements of the members fall within the interval, this means that 95 per cent of the members of the population will have measurements which lie between the mean minus 2 standard deviations and the mean plus 2 standard deviations. As an illustration, if the distribution of weights (estimates) obtained by weighing a 1000-gram object on a rough balance was a normal distribution with mean 1000 and variance 9, then the standard deviation of the distribution would be the square root of 9, or 3 grams. Thus,

$$\mu - 2\sigma = 1000 - 6 = 994 \text{ grams} \tag{4.1}$$

and

$$\mu + 2\sigma = 1000 + 6 = 1006 \text{ grams} \tag{4.2}$$

therefore, 95 per cent of the weights obtained in this manner would lie between 994 grams and 1006 grams.

One other relationship follows from the fact that the frequency curve for a normal distribution is symmetrical (one half of the curve is a mirror image of the other half). Hence, the remaining 5 per cent of the population is divided equally between the two "tails" of the curve; that is, 2½ per cent of the members will have measurements which are less than the mean minus 2 standard deviations and 2½ per cent of the members will have measurements which exceed the mean plus 2 standard deviations. In the example, 2½ per cent of the weights obtained from the balance will be less than 994 grams and 2½ per cent of the weights will exceed 1006 grams.

One method for denoting the relationship of the area under the curve and a particular value of the measurements along the abscissa is by use of a subscript which tells what proportion of the population will have measurements which are less than the given value. For example, X₀.₀₂₅ is the value of a measurement such that 2½ per cent (a proportion of 0.025) of the population members will have measurements which are less than this value. Thus, for the normal curve,

$$X_{0.025} = \mu - 2\sigma \tag{4.3}$$

$$X_{0.975} = \mu + 2\sigma \tag{4.4}$$

or, in the example of the rough balance,

$$X_{0.025} = 994 \text{ grams} \tag{4.5}$$

$$X_{0.975} = 1006 \text{ grams} \tag{4.6}$$

We shall use this sort of notation to a great extent in the discussion of statistical tests. The two statistics which are used to estimate the population mean and variance are the sample mean and the sample estimate of the variance. If a random sample is taken from a normal population, the expected value of the sample mean is the population mean and the expected value of the sample estimate of the variance is the population variance, thus, the two estimators are unbiased for a parent population with a normal distribution.
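The 95 per cent relationship of equations 4.1 through 4.6 can be checked by simulation. The sketch below assumes the rough-balance population described above (normal, mean 1000 grams, variance 9) and counts how often simulated weighings fall between X₀.₀₂₅ and X₀.₉₇₅:

```python
import random

# Simulated weighings from the rough balance: normal, mean 1000, sigma 3.
random.seed(1)
mu, sigma = 1000.0, 3.0
lower, upper = mu - 2 * sigma, mu + 2 * sigma   # 994 and 1006 grams

weights = [random.gauss(mu, sigma) for _ in range(100_000)]
inside = sum(lower < w < upper for w in weights) / len(weights)

print(lower, upper)
print(inside)  # slightly above 0.95, since 2 is a rounding of 1.96
```

The simulated proportion lands near 0.954 rather than exactly 0.95, reflecting the text's remark that 2 standard deviations is a convenient rounding of 1.96.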

Significance Tests

Basic principles. Certain bacteria which occur normally in throat cultures taken from healthy individuals are called the "normal flora." When a clinical bacteriologist finds only these organisms in a throat culture, and if none of the organisms are there in abnormal quantity, the culture may be reported to the physician as a normal throat culture. However, if beta-hemolytic streptococci occur in the culture in large numbers, this finding would be rare in a normal throat, so the bacteriologist would report the finding of the beta-hemolytic streptococci in the culture. Other findings, also rare in a normal throat culture, would be reported to the physician as abnormal findings. Thus, throat cultures fall into two categories: the first, those cultures showing organisms which could be expected in normal throats, and the second, cultures that exhibit findings which are rare in normal throat cultures. On receiving the laboratory findings, the physician may return a diagnosis of either a normal or an infected throat. If the diagnosis is no infection, and nothing is wrong with the patient's throat, then no error will be made. If the diagnosis is a throat infection and the patient has an infection of the throat, again, no error will be made. However, one type of error will be made if the diagnosis is an infection of the throat when none exists, and a second type of error will be made if the diagnosis is made that there is nothing wrong with the patient's throat when in truth there is an infection.

The basic principles of statistical tests are much the same as those for the diagnosis of the presence or lack of throat infection based on


laboratory findings on throat cultures. In a statistical test, a theory or proposition (called a hypothesis) is to be tested. We then determine what we would expect to find if the hypothesis is true and what we would expect to be a rare occurrence if the hypothesis is true. Having determined what would be normal and what would be rare if the hypothesis were true, the experimental data are examined to determine into which of these classes the data fall. If the data fall into the normal class, the hypothesis is not rejected; if the data fall into the rare class, the hypothesis is rejected. Here, again, there are two types of error. An error is made if the hypothesis is rejected when it is true. This is called a type I error. An error is also made if the hypothesis is not rejected when it is false. This is called a type II error. The types of errors which can be made are summarized in table 1.

TABLE 1
Types of errors which can be made in a test of a hypothesis

                                        Status of the hypothesis
Decision concerning the hypothesis      True            False
Do not reject the hypothesis            No error        Type II error
Reject the hypothesis                   Type I error    No error

One of the things to be done for a test of a hypothesis (called a significance test) is to set up some sort of criterion for judging what is a rare occurrence and what is a normal occurrence. The method used in setting up this criterion is to place a limit on the rate at which type I errors will be made when the procedure is used on all possible samples drawn from the theoretical population in which the hypothesis is true. The most commonly used test criterion is set up in such a way that if the hypothesis is true, type I errors will be made only 5 per cent of the time. This is equivalent to saying that if the hypothesis is true, a discrepancy between the observed event and the hypothesis as large as or larger than the one obtained could only happen 5 per cent of the time by chance. The per cent of type I errors which will occur is called the significance level of the test. Thus, the most commonly used test criterion has a significance level of 5 per cent. Other tests may be set up in which the significance level is something other than 5 per cent; one other level which is used at times is the 1 per cent significance level.

To illustrate a significance test, consider an artificial example with the rough balance. Assume that the problem is to test whether the balance is unbiased in weighing a 1000-gram object. The object will be weighed only once. Now, if the population of estimates which could be obtained is normally distributed and if the variance for this population is known, a test may be set up for the hypothesis that the balance is unbiased, that is, that the expected value of the estimates on a standard weight weighing 1000 grams will be 1000 grams. Let us assume that although we haven't tested for bias before, we know from past experience with the balance that the variance for estimates is 9 (this is indeed an artificial example). Thus, the standard deviation will be the square root of 9, or 3 grams (this is not part of our hypothesis).

Before setting up the test, the alternatives to the hypothesis must be considered, that is, what can happen if the hypothesis is not true? There are two possible alternatives, namely, the balance may be negatively biased so that the estimates will, on the average, be less than 1000 grams, or the balance may be positively biased so that the estimates will, on the average, be greater than 1000 grams. Now, the test must be set up so as to pick up a bias in either direction and still have a total of only 5 per cent of type I errors if the hypothesis is indeed true. To accomplish this we can use the fact that if the hypothesis is true, we have a normal distribution with mean 1000 and a standard deviation of 3. Thus, since 5 per cent of the members of such a population will either be less than 994 grams or more than 1006 grams (see equations 4.1 and 4.2), the significance level of the test will be 5 per cent if the hypothesis is rejected when the weight obtained is less than 994 grams and if the hypothesis is also rejected when the weight obtained exceeds 1006 grams. The hypothesis is rejected for these values since values as large as or larger than 1006 grams, or as small as or smaller than 994 grams, can only occur by chance 5 per cent of the time if the hypothesis is true.

Values which lead to the rejection of the hypothesis are said to lie in the critical region for the test. In the example of the rough balance, the critical region consists of two parts, namely, those values which are less than 994 grams and those values which exceed 1006 grams. The critical region in the example of the rough


balance consists of the two tails of the population curve; this type is called a two-tailed test. That it is a two-tailed test arises ultimately from the fact that there were two alternatives to the hypothesis. It is also possible to have a one-tailed test, which will arise if there is only one alternative to the hypothesis. For example, if for some reason it was known that the rough balance could only be positively biased, then a one-tailed test would be used. Thus the alternatives to the hypothesis are important in deciding whether a two-tailed or a one-tailed test will be used.
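The contrast between two-tailed and one-tailed critical regions can be sketched by simulation under a true hypothesis. The rough-balance setup is taken from the text; the constants 1.96 and 1.645 are the exact normal percentiles (the text rounds 1.96 to 2 for hand work):

```python
import random

# Simulate the rough balance when the hypothesis is true:
# weights are normal with mean 1000 and standard deviation 3.
random.seed(7)
mu, sigma = 1000.0, 3.0

def reject_two_tailed(w):
    # Bias in either direction is an alternative: 2.5 per cent in each tail.
    return w < mu - 1.96 * sigma or w > mu + 1.96 * sigma

def reject_one_tailed(w):
    # Only positive bias is an alternative: all 5 per cent in the upper tail.
    return w > mu + 1.645 * sigma

weights = [random.gauss(mu, sigma) for _ in range(100_000)]
rate_two = sum(map(reject_two_tailed, weights)) / len(weights)
rate_one = sum(map(reject_one_tailed, weights)) / len(weights)

print(rate_two, rate_one)  # both close to 0.05 when the hypothesis is true
```

Both rules achieve about a 5 per cent type I error rate; they differ only in where the critical region is placed, which is exactly the choice dictated by the alternatives to the hypothesis.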

If we continue to use a test of the sort taken for our rough balance, the critical region will change each time that a new hypothesis is chosen. It will also change each time that the variance of the population changes; that is, the critical region for a test of this type will not always be the same for every test. Each time a significance test of this sort is made, the critical region must be computed. Matters can be somewhat simplified by the use of another test (which will be called the u-test in this paper). This test will have the same critical region for each problem even if the hypothesis changes or if the variance of the population changes.

The u-test makes use of a simple transformation for normally distributed measurements. If y is an observation drawn from a normally distributed population with mean μ and variance σ², then the quantity u, where

    u = (y − μ)/σ   (4.7)

is normally distributed with mean zero and variance 1. Thus any normally distributed population can be transformed to a standard normal distribution by means of this relationship. The advantage of this transformation is that even if the variance of the original population changes from problem to problem, or if the hypothetical mean of the population changes due to a change of hypothesis, u will still have the same distribution. Thus, the critical region for a u-test will remain the same each time if the significance level and the number of alternatives to the hypothesis remain the same. Since the variance is 1, the standard deviation will be the square root of 1, which is 1. Thus

    u.025 = 0 − 2 = −2   (4.8)

and

    u.975 = 0 + 2 = +2   (4.9)

Therefore, the critical region for a two-tailed u-test, with a significance level of 5 per cent, will be values of u which are less than minus 2 and values of u which exceed plus 2.

The test criterion is now ready. Note that this has been done without once seeing the data which will be used to decide whether the hypothesis will be rejected or not, since if a test is to be objective it must not be swayed by a prior knowledge of the data. Having the criterion for the test, we now proceed to obtain the data. Let us say that we weigh a 1000-gram weight using our rough balance and obtain an estimate from the balance of 1005 grams. Now, this estimate does not fall into the critical region for the first test; therefore, the hypothesis that the balance is unbiased is not rejected. To illustrate the u-test, let us apply it to the same problem. In the example,

    u = (1005 − 1000)/3 = 5/3 = 1.67   (4.10)
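The transformation of equation 4.7 can be sketched in a few lines of Python (the code is mine, not the paper's), using the rough-balance values: hypothesized mean 1000 grams, standard deviation 3 grams, observed weight 1005 grams.

```python
def u_statistic(y, mu, sigma):
    """Standardize a normal observation (equation 4.7): u = (y - mu) / sigma."""
    return (y - mu) / sigma

# Rough-balance example: hypothesized mean 1000 g, standard deviation 3 g.
u = u_statistic(1005, 1000, 3)

# The paper's rounded two-tailed 5 per cent limits are -2 and +2.
rejected = abs(u) > 2
print(round(u, 2), rejected)
```

Since u = 5/3 is inside the limits, the hypothesis of an unbiased balance is not rejected, matching the direct test on the weight itself.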

Here again, the value obtained does not fall into the critical region; therefore the hypothesis that the balance is unbiased is not rejected. This is as it should be, since the two tests are identical. When the hypothesis is not rejected, the statement is made that the observed value is not significantly different, statistically, from the mean given in the hypothesis, and the significance level of the test is specified. Thus in the test of the rough balance, it is said that the estimate obtained is not significantly different, statistically, from 1000 grams at the 5 per cent significance level. When the hypothesis is rejected, the statement is made that the value obtained is significantly different, statistically, from the value given by the hypothesis, and again the significance level is specified. The word significant as used in this connection is strictly a statistical term and has no connection with the dictionary definition of the term. Thus, if the statement is made that results obtained in an experiment are significantly different, statistically, from the results expected if the hypothesis were true, what is meant is only that the hypothesis is rejected at the given significance level and not that the results are world-shaking

176

ROEBERT L. STEARMAN

[VOL. 19

or even that the observed difference is of any practical value. The practical aspects of the results of the test are given by a consideration of the original problem and by a consideration of the size of the difference in relation to the problem. A further discussion of significant versus practical differences will be given at a later point in this section.

Summarizing the steps which are taken in a significance test: (a) Set up the hypothesis. (b) Determine what the alternatives to the hypothesis are (this tells whether to use a two-tailed or a one-tailed test). (c) Set up the significance level for the test (this tells the rate at which type I errors will occur). (d) Using the significance level and the alternatives to the hypothesis, determine what the critical region is from the population defined by the hypothesis. (e) Check the data to see whether or not they fall into the critical region. (f) If the data fall into the critical region, reject the hypothesis; otherwise, do not reject the hypothesis. These are the basic steps which will be taken in testing hypotheses with the statistical tests to be discussed in this paper.

In setting up a significance test, the per cent of type I errors (the significance level) was used to determine the critical region for the test. Nothing was said about the type II errors for the test. The level of type I errors could be used to set up the critical region only after the hypothesis to be tested had been specified. To discuss type II errors, a particular alternative hypothesis must be specified. The rate at which type II errors will occur depends upon the difference between the alternative hypothesis and the hypothesis to be tested (called the null hypothesis) as well as on the significance level for the test. The rate at which type II errors occur increases as the mean specified by the alternative hypothesis and the mean specified by the null hypothesis get closer to each other. For example, if the null hypothesis states that the mean is 1000 grams, as in our previous example of the rough balance, an alternative hypothesis of 1005 grams will have a greater rate of type II errors than an alternative hypothesis of 1010 grams. The rate of type II errors will also increase by decreasing the significance level. For example, for a given alternative hypothesis, a test with a significance level of 5 per cent will have fewer type II errors than will a test with a 1 per cent significance level.

A term used in discussing type II errors is power. The power of a test is 100 minus the per cent of type II errors. Thus, if the level of type II errors was, say, 27 per cent, the power would be 73 per cent (100 − 27 = 73). The power of a test is, in a way, a measure of its ability to differentiate between the null hypothesis and the specified alternative hypothesis. Thus, as the power of the test increases, it is less likely that the null hypothesis will be accepted if the alternative hypothesis is the true state of affairs. The power of a test also depends upon the variance of the population specified for the test. For example, if the variance in the example of the rough balance were larger than 9, then the power of the test against a particular alternative hypothesis would be less than it would be if the variance were 9. The variance of the parent population is something which controls the power of the test. The power of the test can be increased in the same way that precision is increased, that is, by using the mean of a sample for the test rather than a single observation.

Test of a single sample mean. The population of sample means of all possible samples of size n drawn from a parent population which is normally distributed with mean μ and variance σ² is itself normally distributed with mean μ and variance σ²/n. Since these sample means have a normal distribution, the u-test (see previous section) applies. Letting y (in equation 4.7) be the sample mean, x̄, from a random sample,

    u = (x̄ − μ)/(σ/√n)   (4.11)

Here again, u has a normal distribution with mean zero and unit variance, so the test is the same as before.

In all of the significance tests so far, the variance of the parent population had to be known to set up the test, but seldom will that be known. Usually, we must rely on the sample estimate of the variance derived from a sample to estimate the variance of the parent population. If the sample estimate of the variance is substituted for the population variance in the equation for u (equation 4.7), the resulting quantity no longer follows a normal distribution.


However, the distribution of the resulting ratio (called t) is known. That is, if y is an observation drawn from a normally distributed population with mean μ, and if s² is a sample estimate of the variance of the population from which y is drawn, then the quantity t, where

    t = (y − μ)/s   (4.12)

follows the t-distribution. The frequency curve of the t-distribution is similar in shape to that for the normal distribution and is also a symmetrical curve with its center at zero. When we were dealing with the quantity u, the value of σ² was exact; therefore, the distribution of u did not vary with the size of the sample. With t, however, s² is an estimate, and the precision of this estimate will vary with the size of the sample from which it is obtained. The distribution of t will therefore depend upon the size of the sample from which the sample estimate of the variance of y was obtained. When the distribution of t is computed, however, the degrees of freedom of s² are used rather than the size of the sample from which it was obtained. The advantage of this method will be seen when the t-distribution is applied to more complex problems. The distribution of t has been worked out for various degrees of freedom and the values of t for the critical region have been tabulated.⁴

TABLE 2
Hypothetical sample of size 10 obtained by weighing a standard 100-gram weight on a rough balance

Order of       Weight in
Weighing (i)   Grams (xi)    xi − x̄    (xi − x̄)²        xi²
     1           100.1       −0.16      0.0256      10,020.01
     2           100.5       +0.24      0.0576      10,100.25
     3           100.2       −0.06      0.0036      10,040.04
     4           100.5       +0.24      0.0576      10,100.25
     5           100.4       +0.14      0.0196      10,080.16
     6           100.6       +0.34      0.1156      10,120.36
     7           100.3       +0.04      0.0016      10,060.09
     8            99.9       −0.36      0.1296       9,980.01
     9           100.2       −0.06      0.0036      10,040.04
    10            99.9       −0.36      0.1296       9,980.01
Total          1,002.6        0.00      0.5440     100,521.22

The use of the t-distribution is illustrated by the example given in table 2. A one per cent significance level and a two-tailed test will be used with a sample of size 10; therefore, d.f. = 10 − 1 = 9. Using a t-table, we find that the critical region for the test will include values of t which are less than −3.250 and values of t which exceed 3.250. From column 2 of table 2,

    Σxi = 1,002.6

therefore,

    x̄ = Σxi/n = 1,002.6/10 = 100.26   (4.13)

In column 4 of table 2,

    S.S. = Σ(xi − x̄)² = 0.5440   (4.14)

Therefore,

    s² = S.S./d.f. = 0.5440/9 = 0.0604   (4.15)

The null hypothesis is that μ = 100, and from equation 4.13, x̄ = 100.26; therefore, letting y in equation 4.12 be the sample mean,

    t = (x̄ − μ)/(s/√n)   (4.16)

    t = (100.26 − 100)/√(0.0604/10) = 0.26/0.078 = 3.333   (4.17)

This value of t lies in the critical region for the t-test; therefore the hypothesis that the balance is unbiased is rejected, and it is stated that the sample mean is significantly different, statistically, from the value given by the null hypothesis.

In table 2 and equation 4.14, the deviations of each of the observations from the sample mean were calculated, squared, and then added to obtain the S.S. The S.S. for a sample can also be obtained by the use of a mathematical identity:

    S.S. = Σ(xi − x̄)² = Σxi² − (Σxi)²/n   (4.18)

⁴ Statistical tables are included in most elementary statistical textbooks. There are also special books of tables. Included among these are those of Arkin and Colton (4), Hald (5) and Fisher and Yates (6). Unless otherwise noted, all values used in this paper are taken from Hald (5).
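The chain of computations in equations 4.13 through 4.17 can be checked with a short script. The code is mine, not the paper's; the critical value 3.250 is the tabulated t value for 9 degrees of freedom quoted in the text, and the exact t comes out near 3.34 (the text's 3.333 reflects rounding s² to 0.0604 along the way).

```python
from math import sqrt

# One-sample t-test for the table 2 weighings.
x = [100.1, 100.5, 100.2, 100.5, 100.4, 100.6, 100.3, 99.9, 100.2, 99.9]
mu0 = 100.0                                       # null hypothesis: unbiased balance
n = len(x)
xbar = sum(x) / n                                 # eq. 4.13
ss = sum((xi - xbar) ** 2 for xi in x)            # eq. 4.14
s2 = ss / (n - 1)                                 # eq. 4.15
t = (xbar - mu0) / sqrt(s2 / n)                   # eq. 4.16

# Two-tailed 1 per cent critical value for 9 degrees of freedom, from a t-table.
T_CRIT = 3.250
print(round(xbar, 2), round(ss, 4), round(t, 2), abs(t) > T_CRIT)
```

Since |t| exceeds 3.250, the hypothesis that the balance is unbiased is rejected, as in the text.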


That is, to obtain the S.S.: square the individual observations, add them, and subtract the square of the total of the observations divided by the number of observations in the sample. The reader can verify equation 4.18 by applying it to the data of table 2. Using equation 4.18,

    S.S. = 100,521.22 − (1,002.6)²/10 = 100,521.22 − 100,520.676 = 0.5440

which is the same result obtained in equation 4.14. The mathematical identity shown in equation 4.18 saves time in computing the value of the S.S. when calculating machines are available. The identity is also used in setting up tables for more advanced statistical tests. Although either method may be used for computing an S.S. value, the identity will give a more precise answer if the mean of the sample is a number with a great many digits beyond the decimal point, since rounding off the number which represents the mean introduces an error which is magnified by the procedure of squaring and totaling the deviations from the mean.

Test of the difference between two treatments: paired samples. Until now, we have been testing hypotheses concerning a single sample mean. Many times we may wish to test the difference between two treatments. For example, suppose there is a standard method for isolating a given toxin from raw material and a new method is to be tested to determine whether any improvement is obtained in the yield. Or, the recovery of a known added amount of metabolite may be tested in checking for bias in a new assay method.

One of the methods of comparing two treatments is the method of pairing, a method used to a great extent in scientific research. It attempts to make the group submitted to one treatment as nearly like the group submitted to the other treatment as possible. An example would be a study to determine the effect of two treatments on rats. If a group of rats is to be divided into two subgroups, each of which would be given a different treatment, the entire group would be separated first into sets of two animals each. Each set would be made up of two animals which were as nearly alike as possible with respect to sex, age, weight, and so on. When the two groups are selected, one animal is taken from each set of two for the first subgroup and the remaining animal from each set of two for the second subgroup. In this way, the two subgroups would be as nearly alike as we could make them.

Another method of pairing is used if a treatment has no lasting effect on the unit to which the treatment is applied. In this type of pairing, each unit is treated with one treatment and, when it recovers, is treated with the other treatment. Thus, each unit serves as its own control. An example of this would be obtained if two observers were to count the bacterial colonies on a given set of plates. Each of the observers would count the same plates. In this way, each plate would give a direct comparison of the two observers.

TABLE 3*†
Results obtained by two observers each counting the same 12 plates

                     Plate Count
Plate No.    Observer B    Observer D    Difference (B − D)
  R1            221           323              −102
  R2            141           202               −61
  R3             63            80               −17
  R4            249           198               +51
  R5            292           323               −31
  R6             79            97               −18
  P7            161           181               −20
  P8            397           416               −19
  P9            118           139               −21
  P10            93           112               −19
  P11            94            98                −4
  P12           163           161                +2
Total         2,071         2,330              −259
Σx²         467,705       577,682            19,883

* This table originally appeared in The Bacteriological Grading of Milk, by G. S. Wilson, page 105, and is reproduced here with the permission of the copyright owner, Her Majesty's Stationery Office.
† Note that in this table the summation sign, Σ, is used without the indices i = 1 and 12 and that there is no index on x. As this is the usual practice when there is no doubt about what is being summed, indices will be used from now on only when necessary.

The data in table 3 (reference 7, table LVI, p. 105) are the results obtained when each of two observers (B and D) counted the number of colonies on each of 12 plates. These will be used to test for a differential bias of the two observers. The last column of table 3 contains the differences between the counts of the two observers. In the analysis of a paired experiment, these differences (denoted by d) are treated as a sample from the population of differences. In this way the data may be analyzed by a test of a single sample mean. Letting y in equation 4.12 be d̄, the average difference for the sample, we obtain

    t = (d̄ − μd)/s_d̄   (4.19)

and, with our knowledge about the mean and variance of the population of sample means, equation 4.19 becomes

    t = (d̄ − μd)/(s_d/√n)   (4.20)

We will therefore use a t-test to test the differences. The usual hypothesis in this test is that there is no difference between the two treatments, or, in this example, the hypothesis is that both observers will, on the average, get the same plate count. With this hypothesis, μd = 0. The alternatives to the hypothesis will be that the mean is greater than zero (observer B will obtain the higher count) or that the mean is less than zero (observer D will obtain the higher count). Therefore, the test is a two-tailed test. Use the 5 per cent level of significance for the test, and since there are 12 differences in the sample, there will be 12 − 1 = 11 degrees of freedom. From the t-table, t.975 with 11 degrees of freedom is 2.201; therefore, the critical region for the test will be values of t which are less than −2.201 and values of t which exceed 2.201. From column 4 of table 3,

    Σd = −259

therefore,

    d̄ = Σd/n = −259/12 = −21.583   (4.21)

Also from the same column,

    Σd² = 19,883

therefore, using equation 4.18,

    Σ(d − d̄)² = 19,883 − (−259)²/12 = 14,292.9167   (4.22)

Since there are 11 degrees of freedom,

    s_d² = Σ(d − d̄)²/(n − 1) = 14,292.9167/11 = 1,299.3561   (4.23)

Therefore, using equation 4.20,

    t = (−21.583 − 0)/√(1,299.3561/12) = −2.073   (4.24)

The t-value obtained in equation 4.24 does not fall into the critical region of the test; therefore the mean plate counts of the two observers are not significantly different, statistically.
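The paired analysis of table 3 can be reproduced in a few lines. The code is mine, not the paper's; the exact t is −2.074, the text's −2.073 coming from rounded intermediate values.

```python
from math import sqrt

# Paired t-test for the table 3 plate counts.
b = [221, 141, 63, 249, 292, 79, 161, 397, 118, 93, 94, 163]
d_obs = [323, 202, 80, 198, 323, 97, 181, 416, 139, 112, 98, 161]
d = [bi - di for bi, di in zip(b, d_obs)]          # differences B - D
n = len(d)
dbar = sum(d) / n                                  # eq. 4.21
ss = sum(di ** 2 for di in d) - sum(d) ** 2 / n    # eq. 4.22, via identity 4.18
s2 = ss / (n - 1)                                  # eq. 4.23
t = (dbar - 0) / sqrt(s2 / n)                      # eq. 4.24

# t-table value for 11 degrees of freedom, 5 per cent, two-tailed.
T_CRIT = 2.201
print(round(t, 3), abs(t) > T_CRIT)
```

Since |t| is below 2.201, the hypothesis of no differential bias between the observers is not rejected, as in the text.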

Test of the difference between two treatments: independent samples. The means from two samples can also be compared when the samples are not paired, but to develop the test for such a case, some additional information is needed, namely, the mean and variance for the difference between two sample means. This information is obtained from a more general bit of theory. If y₁ is an observation drawn from a parent population with mean μ_y₁ and variance σ²_y₁, and y₂ is an observation drawn from a parent population with mean μ_y₂ and variance σ²_y₂, then the mean of the difference between y₁ and y₂ (denoted by μ_{y₁−y₂}) is the difference between the means. That is,

    μ_{y₁−y₂} = μ_y₁ − μ_y₂   (4.25)

The variance of the difference between y₁ and y₂ (denoted by σ²_{y₁−y₂}) is the sum of the variances. That is,

    σ²_{y₁−y₂} = σ²_y₁ + σ²_y₂   (4.26)

Now, if y₁ is a sample mean, x̄₁, for a sample of size n₁ drawn from a parent population with mean μ₁ and variance σ₁², and if y₂ is a sample mean, x̄₂, for a sample of size n₂ drawn from a parent population with mean μ₂ and variance σ₂², then

    μ_{x̄₁−x̄₂} = μ₁ − μ₂   (4.27)

and

    σ²_{x̄₁−x̄₂} = σ₁²/n₁ + σ₂²/n₂   (4.28)

Equations 4.26 and 4.28 hold only for independent samples and not for paired samples. If the two samples are drawn from parent populations with a normal distribution, a u-test


may be set up on the basis of the values of the mean and variance given in equations 4.27 and 4.28. That is,

    u = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)   (4.29)

will be normally distributed with mean 0 and variance 1. Thus, if the variances of the two parent populations were known, a u-test could be used to test some hypothesis concerning the difference between the means of the parent populations; however, this is not a common case in practice. If the variances of the parent populations are not known, as is usually so, a t-test may be used with

    t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂)   (4.30)

The t-test for the difference between two sample means falls into two categories: one type of test if the variances of the parent populations are equal and another type of test if the variances of the parent populations are not equal. The test for equal population variances will be illustrated first.

When the variances of the two parent populations are equal, the variance common to the two populations (denoted by σ²) may be defined by

    σ² = σ₁² = σ₂²   (4.31)

When σ² is substituted for the two population variances in equation 4.29,

    u = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ²(1/n₁ + 1/n₂))   (4.32)

Since the two sample variances, s₁² and s₂², are both estimates of the common variance σ², each of these estimates contains a certain amount of information about σ². Pooling all of this information about σ² would provide an even better estimate of this common variance, and a method is available for doing this. Since

    s² = S.S./d.f.   (see equation 2.16)

we can put s₁² and s₂² into this form and obtain

    s₁² = S.S.₁/d.f.₁   and   s₂² = S.S.₂/d.f.₂

The pooled estimate of σ² (denoted by sp²) may be defined by

    sp² = (S.S.₁ + S.S.₂)/(d.f.₁ + d.f.₂)   (4.33)

In other words, the pooled estimate of σ² is equal to the sum of the S.S. divided by the sum of the degrees of freedom. The degrees of freedom of sp² is d.f.₁ + d.f.₂. When sp² is substituted for s₁² and s₂² in equation 4.30,

    t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(sp²(1/n₁ + 1/n₂))   (4.34)

The degrees of freedom for this test is the degrees of freedom of sp².

To illustrate the use of the t-test shown in equation 4.34, the data from table 3 that were used to illustrate the test for paired samples will be used. Note, however, that the test for unpaired samples should not be used when the samples are paired. Although pairing is used to a great extent in scientific research, an all too common failing is that the worker doesn't take advantage of the pairing when the statistical analysis is done. Instead, he uses the t-test from equation 4.34, which is designed for unpaired samples. The reason that we will use the paired data to illustrate the test for unpaired samples is to show what happens when the wrong test is used.

Again, test the hypothesis that there is no difference in the mean plate counts for the observers, that is, test the hypothesis that μB − μD = 0. The alternatives to the hypothesis remain the same. A two-tailed test and the 5 per cent significance level will be used again. Since each observer has 12 plate counts and 11 degrees of freedom, sp² and t will have 11 plus 11, or 22, degrees of freedom. From the t-table, t.975 with 22 degrees of freedom is 2.074; therefore, the critical region for the test will consist of values of t which are less than −2.074 and values of t which exceed +2.074. From table 3,

    x̄B = 2,071/12 = 172.583   (4.35)

    S.S.B = 467,705 − (2,071)²/12 = 110,284.9167   (4.36)

    sB² = 110,284.9167/11 = 10,025.9015   (4.37)

and

    x̄D = 2,330/12 = 194.166   (4.38)

    S.S.D = 577,682 − (2,330)²/12 = 125,273.6667   (4.39)

    sD² = 125,273.6667/11 = 11,388.5152   (4.40)

Using equation 4.33,

    sp² = (110,284.9167 + 125,273.6667)/(11 + 11) = 10,707.2083   (4.41)

Substituting these values in equation 4.34,

    t = [(172.583 − 194.166) − 0] / √(10,707.2083(1/12 + 1/12)) = −21.583/√1,784.5347 = −0.511   (4.42)

As the value of t obtained does not fall into the critical region of the test, the mean plate counts of the two observers are not significantly different, statistically. Although this is the same conclusion reached with the t-test for paired samples, compare the values of t obtained using the two methods. From equation 4.42, the t-value using the test for unpaired samples is −0.511, whereas the t-value obtained using the test for paired samples, given in equation 4.24, is −2.073. Further examination of these two equations for t shows that the numerators are the same and that the reason for the difference is that the denominator for the test for unpaired data is greater than that for the paired data. Checking back to the equations for the sample estimates of the variance for the two tests (equations 4.23 and 4.41), we see that the reason for the difference in the denominators is the difference in these two estimates. The sample estimate of the variance for the test for the unpaired samples is much greater than that for the test for the paired samples. Thus precision has been lost by using the test for unpaired samples on samples which are paired.

The reason for this loss in precision is evident on examination of the original data given in table 3. The actual numbers of colonies varied from plate to plate; thus, when the plate count obtained by B increased, the plate count for D also increased, and vice versa. This means that the plate count of B is correlated with that of D, that is, a change in D's count is associated with a similar change in B's count. Thus, although there is quite a bit of variation in the plate counts obtained by each observer, the difference between their counts for the same plate does not vary nearly as much. The greater the correlation between the observations for each pair in paired samples, the greater the loss in precision will be when we apply the test for unpaired samples. It is for this reason that the test for unpaired samples should not be applied to data from paired samples.

Paired samples are not the only source of correlated observations in a test of the difference between two treatments. It is possible to have correlation between observations within the samples from each treatment. Here, again, the method for independent samples cannot be used. An example of correlation of this type will be found in the data from an experiment of Halvorson and Spiegelman (20). Yeast cells were nitrogen-starved for 80 minutes in a synthetic medium, replenished with NH₄Cl for 15 minutes, centrifuged, washed, and resuspended in buffer. Equal aliquots were placed in each of 10 flasks: 5 control flasks containing 0.3 per cent glucose and 5 flasks containing 0.3 per cent glucose with 0.5 per cent α-methyl-glucoside (the two treatments). Free amino acid extracts were prepared from cells of each flask following aerobic incubation at 30 C for 140 minutes. Glutamic acid was assayed manometrically by the decarboxylase method, with two determinations per flask. The data from the control flasks given in table 6 (see next section) are used to illustrate another type of analysis.

Now, the amount of free glutamic acid will vary from flask to flask. If a flask has a high free glutamic acid content, both determinations should be high, and if the free glutamic acid content is low for a flask, the two determinations will be low; obviously, the two determinations for any flask will be correlated, since their values will be dependent upon the free glutamic acid content of the flask. Thus the data consist of five pairs of correlated values for each treatment; hence the t-test for the difference between two means cannot be used on the data as they stand.

As previously noted, the effect of using the test for unpaired samples on samples which are really paired is to overestimate the variance of the difference between the two means, with a resulting loss in precision on the test. This in turn lowers the proportion of type I errors,


leading to the conclusion that the observed differences are not statistically significant, differences which would be statistically significant if the proper significance test were applied. The effect of using the test for independent samples on samples which have observations that are correlated within the samples is just the opposite. Here, the variance of the difference between the two means is underestimated. This raises the proportion of type I errors, which makes us state, as statistically significant, differences which would not be statistically significant if the proper test were applied.

The difficulty of having the five correlated pairs of observations in our data is readily overcome: compute the mean of the two determinations for each flask and use the resulting data (now made up of five flask means for each treatment) for the t-test. Other slightly more complex methods of analysis are available but need not be used in place of the t-test.

As noted in equation 4.33, when the population variances are the same, the sample estimates of the variance can be pooled. Pooling these estimates leads to the simplification of the equation for the t-test (from equation 4.30 to equation 4.34). If the variances of the two parent populations are not equal, the sample estimates of the variance cannot be pooled, since they are not estimates of the same quantity, and the t-test (equation 4.30) must be used without any simplification. When the population variances are not equal, the quantity t, as defined in equation 4.30, does not follow a standard t-distribution. However, its distribution can be approximated by a t-distribution with an appropriately chosen degrees of freedom; of the several approximate solutions to this problem, two will be given here. Both solutions depend upon the ratios s₁²/n₁ and s₂²/n₂; therefore, the example will be worked out before taking these up.

The data in table 4 (a statistical study of radioactivity measurements) illustrate the use of a t-test to test the difference between two sample means when the variances of the two parent populations are not equal. The index of radioactivity used was the number of counts per minute (CPM); 12 determinations were run on a sample from solution 1 and 14 determinations were run on a sample from solution 2.

TABLE 4
Radioactivity measurements on two solutions

Solution 1 (CPM): 898, 892, 860, 864, 87…, 84…, 874, 834, 842, 846, 849, …
Solution 2 (CPM): 400, 411, 425, 390, 4…, 399, 399, 404, 396, 390, 399, 395, …, …

              Solution 1      Solution 2
Σx               10,353           5,623
n                    12              14
x̄                862.75          401.64
Σx²           8,936,475       2,259,483
(Σx)²/n    8,932,050.75    2,258,437.79
S.S.           4,424.25        1,045.21
s²               402.20           80.40
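The summary rows of table 4 follow from the machine identity of equation 4.18: S.S. = Σx² − (Σx)²/n, with s² = S.S./d.f. A quick check (the code is mine, not the paper's):

```python
def summary(sum_x, sum_x2, n):
    """Mean, sum of squares (identity 4.18), and sample variance from totals."""
    ss = sum_x2 - sum_x ** 2 / n
    return sum_x / n, ss, ss / (n - 1)

mean1, ss1, var1 = summary(10_353, 8_936_475, 12)   # solution 1 totals
mean2, ss2, var2 = summary(5_623, 2_259_483, 14)    # solution 2 totals
print(round(mean1, 2), round(ss1, 2), round(var1, 2))
print(round(mean2, 2), round(ss2, 2), round(var2, 2))
```

The identity lets the table be verified from the column totals alone, without re-entering the individual readings.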

The hypothesis is that the two solutions have the same radioactivity per milliliter, that is, μ₁ = μ₂. From table 4, obviously there is a difference in the radioactivity of the two solutions; hence a very high value of t is expected. There are two alternatives to the hypothesis, namely, the radioactivity of solution 1 is higher than that of solution 2, or the radioactivity of solution 2 is higher than that of solution 1 (obviously, the former is the situation). Therefore, we will use a two-tailed test with a significance level of 5 per cent for the test. Substituting the values obtained in table 4 in equation 4.30,

    t = [(862.75 − 401.64) − 0] / √(402.20/12 + 80.40/14) = 461.11/√(33.52 + 5.74) = 461.11/6.27 = 73.54   (4.43)

To determine whether the value of t obtained falls into the critical region, the value of t.975 for the test must be known.

A simple approximate solution to this problem is to examine s₁²/n₁ and s₂²/n₂ and take the degrees of freedom for the sample estimate of the


variance for the larger ratio. If sl/n, - 91/mg, then take the smaller of the degrees of freedom for the two samples. In the example, 42 has 11 degrees of freedom and 4 has 13 degrees of freedom, and 2 402.20 A * = 33.52 (4.44) nl 12
s₂²/n₂ = 80.40/14 = 5.74    (4.45)

Since s₁²/n₁ is the larger, the degrees of freedom for s₁²/n₁, which are 11, are used for the t-table. From the t-table, t.975 for 11 degrees of freedom is 2.201; therefore, the critical region for the test will be values of t which are less than -2.201 and values of t which exceed 2.201. The value of t obtained in equation 4.43 falls into the critical region, therefore we reject the hypothesis.

The second approximate solution to be illustrated is that given by Cochran and Cox (8, page 92), in which the value of t for the critical region is determined directly. We wish to estimate t.975 for the test. Letting t₁ be t.975 for the degrees of freedom of s₁²/n₁ and t₂ be t.975 for the degrees of freedom of s₂²/n₂, then the value of t.975 for the test (which will be denoted by t') is given by

t' = (t₁·s₁²/n₁ + t₂·s₂²/n₂) / (s₁²/n₁ + s₂²/n₂)    (4.46)

From the t-table, t.975 with 11 degrees of freedom is 2.201 and t.975 with 13 degrees of freedom is 2.160. Therefore, substituting these values and the values from equations 4.44 and 4.45 into equation 4.46,

t' = [(33.52)(2.201) + (5.74)(2.160)] / (33.52 + 5.74)
   = (73.77752 + 12.39840) / 39.26
   = 86.17592 / 39.26
   = 2.195    (4.47)

From the value obtained in equation 4.47, the critical region for the test will be values of t which are less than -2.195 and values of t which exceed 2.195. Again, the value of t obtained in equation 4.43 falls into the critical region of the test, and the hypothesis is rejected.

When the degrees of freedom are the same for s₁²/n₁ and s₂²/n₂, then

t₁ = t₂ = t'    (4.48)

and both of the approximate solutions will give the same result. Under other conditions, the second approximate method gives the more exact results.

One point to be considered in designing an experiment in which a test of the difference between two treatments is to be made is that if the variances are equal, maximum precision for the test is obtained, for a given size of experiment, if the sample sizes are equal. If the variances are not equal, maximum precision is obtained, for a given size of experiment, if the sample sizes are proportional to the variances of the parent populations from which they were drawn.

Test of two sample estimates of the variance. It has been noted that the method used to test the difference between two sample means depends upon whether the parent populations of the two samples have the same variance. In general, this is not known, so a test to decide whether they are the same must be devised. The test used for this purpose is the F-test. It stems from the fact that if a random sample is drawn from each of two parent populations which are normally distributed, the distribution of F, defined by equation 4.49, is known.

F = (s₁²/σ₁²) / (s₂²/σ₂²)    (4.49)
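The weighted critical value defined in equation 4.46 and evaluated in equation 4.47 can be checked with a short calculation; the sketch below uses the weights s₁²/n₁ = 33.52 and s₂²/n₂ = 5.74 and the t-table entries 2.201 and 2.160 for 11 and 13 degrees of freedom quoted in the text.

```python
# Cochran-Cox approximate critical value (equation 4.46):
# t' is a weighted average of the two t-table entries, weighted by
# the estimated variances of the two sample means.
w1, w2 = 33.52, 5.74   # s1^2/n1 and s2^2/n2, equations 4.44-4.45
t1, t2 = 2.201, 2.160  # t.975 for 11 and 13 degrees of freedom

t_prime = (w1 * t1 + w2 * t2) / (w1 + w2)
print(round(t_prime, 3))  # 2.195, as in equation 4.47
```

Because the weights favor the sample mean with the larger estimated variance, t' always lies between the two t-table entries.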

The frequency curves of the F-distribution are not symmetrical, and there are no negative values of F, since none of the values entering into the equation for F are negative numbers. The distribution of F depends on the precision of both of the sample estimates of the variance; thus the degrees of freedom of both of these estimates are used in setting up the tables of F for the determination of the critical region of the test. All tables of F are one-tailed tables. The usual method of tabulation is to set up a separate table for each of the various percentage points of F. Then, the columns of each table are assigned to the degrees of freedom of the numerator, while the rows are assigned to the degrees of freedom of the denominator. The usual hypothesis in an F-test is that the two population variances are equal, i.e., the

184

ROEBERT L. STEARMAN

[VOL. 19

variance terms in the equation for F cancel out, leaving

F = s₁²/s₂²    (4.50)

The test involved when we are checking to see whether the variances of the parent populations are equal, in order to determine which method to use in testing the difference between two sample means, is a two-tailed test. This arises from the fact that if the variances are not equal, the variance of the first population may be greater than that of the second, or the variance of the second population may be greater than that of the first. Thus, to obtain the critical region for the test with a significance level of 5 per cent, the values of F.025 and F.975 must be found. There are no tables of F.025; all of the available tables are set up for the upper tail only, but from equation 4.50 it can be seen that for F to fall below F.025, s₂² must be greater than s₁². To use the available tables, then, simply interchange the subscripts on the two sample estimates of the variance so the larger estimate of the variance is in the numerator. In this way F-tests are made with the available tables, since values which would ordinarily fall below F.025 are now transformed into values which will exceed F.975 for the new test.

The F-test will be illustrated by testing the assumption made concerning the population variances for the t-test for unpaired samples with unequal variances. When we ran the t-test on the data on plate counts of 12 plates by each of two observers (table 3), we said that the population variances were equal. The sample estimates of the variance cannot be tested to see whether this is correct, as the F-test requires s₁² and s₂² to be independent, and since the samples are paired they are correlated and thus not independent.

When the t-test on the radioactivity measurements on the two solutions (table 4) was made, it was said that the population variances were not equal. This can be tested with an F-test. From table 4, s₁² = 402.20 with 11 degrees of freedom and s₂² = 80.40 with 13 degrees of freedom. Since s₁² is the larger, place it in the numerator for the test. Therefore, if a test is run with a significance level of 5 per cent, F.975 for 11 and 13 degrees of freedom (the number given first is the d.f. of the numerator) must be found. From the F-table, the value needed is 3.20; therefore, the critical region for the test will be values of F which exceed 3.20. Substituting the values from table 4 into equation 4.50,

F = 402.20/80.40 = 5.00    (4.51)

The value of F obtained falls into the critical region for the test, therefore the hypothesis that the population variances are equal is rejected, and our statement was not in disagreement with the data. Since, under the usual operating conditions for the t-tests for the difference between sample means for unpaired samples, it is not known whether the population variances are equal or not, it is a good idea to run an F-test to decide whether they are equal before deciding which t-test to use.

Significant versus practical differences. When the null hypothesis is rejected on a statistical test, we say that the results of an experiment are significantly different, statistically, from the results we would expect if the null hypothesis is true. The fact that the difference between observed results and expected results is significant does not mean that this difference has any practical value. It is also true that a difference of great enough size to be of practical value may exist between the true population and the population given by the null hypothesis, and still the null hypothesis will not be rejected by a statistical test of the data obtained in an experiment. The size of the difference between the true state of affairs and the state given by the null hypothesis which is necessary to obtain a statistically significant difference depends on the power of the test used.

The power of a test is dependent upon the variance of the parent population (or populations) and on the size of the sample (or samples) used for the test. The power of the test is high if the variance of the parent population is low. Power can be increased by increasing the size of the sample used for the test. Thus, if the variance is low or if too large a sample is taken, the result will be a test which is powerful enough to show, as statistically significant, differences which are of no practical value. In fact, if the sample is large enough, a statistically significant difference will be obtained between any two procedures unless they always give identical results. The larger the sample, the smaller the difference that can be detected by the test.
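Returning to the F-test of equations 4.50-4.51, the variance-ratio computation can be sketched as follows, using the table 4 figures quoted above (the critical value 3.20 is the F.975 entry for 11 and 13 degrees of freedom read from the F-table):

```python
# F-test for equality of two population variances (equation 4.50).
# The larger sample variance goes in the numerator so that the
# one-tailed F-table can serve this two-tailed test.
s1_sq, df1 = 402.20, 11  # first solution, table 4
s2_sq, df2 = 80.40, 13   # second solution, table 4

if s1_sq >= s2_sq:
    F = s1_sq / s2_sq    # numerator d.f. = 11, denominator d.f. = 13
else:
    F = s2_sq / s1_sq

F_crit = 3.20            # F.975 for 11 and 13 d.f., from the F-table
print(round(F, 2), F > F_crit)  # 5.0 True -> reject equal variances
```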

1955]

STATISTICAL CONCEPTS IN MICROBIOLOGY

185

It is a good idea to examine the data from an experiment before a statistical test is run to see whether the difference between the observed results and the results expected, if the hypothesis to be tested is true, is great enough to be of practical value. If the difference is of sufficient size to be of practical importance, go ahead with the test. If the difference is not of sufficient size to be of practical importance, there is no need to run the test. At the other extreme, if the variance of the parent population is high or if too small a sample is taken, the power of the test may be so low that a difference which is of great enough size to be of practical importance may escape detection by the test. It is for these reasons that sample size is so critical. The size of a sample must be sufficient to detect differences of practical importance but not so large that differences too small to be important are also detected. A person who uses statistical tests must never become so enamored of the test that he loses sight of the problem to which the test is being applied. It is important to examine the results of a significance test carefully and translate those results in terms of the original problem. In making this translation, we must keep the power and the significance level of the test in mind.

Interpretation of results of significance tests. When we reject a hypothesis and say that the observed results are significantly different, statistically, from the results expected if the hypothesis were true, it is always important to examine the data to see how (in which direction) the observed results differ from the expected results. For example, if a person checking for bias in an instrument rejects the hypothesis that the instrument is unbiased, he should examine the data to see whether the bias is positive or negative.
Thus, for the example of the rough balance (table 2), the hypothesis, which was rejected, was that the mean was 100 grams and, since the sample mean was 100.26 grams, this means that the balance is positively biased. If one tests the difference between two means and rejects the hypothesis that they come from populations with equal means, he should examine the data to see which population mean is the larger. Thus, in the example of the two solutions of radioactive phosphorus (table 4), the hypothesis, which was rejected, was that the two population means were equal and, since the sample mean of the first solution is higher than

the sample mean for the second solution, it is concluded that the level of radioactivity per milliliter for solution 1 is higher than that for solution 2. When we say that we do not reject a hypothesis, this does not mean that we regard the hypothesis as true. As has been pointed out, the size of the difference between the null hypothesis and the true state of affairs which is necessary for a test of hypothesis to be significant depends upon the power of the test. Thus, a difference may exist but our sample size may not be large enough to detect it. Therefore, the interpretation given when a hypothesis is not rejected is that a difference may exist, but if it does, the sample taken was not of sufficient size to detect it. For example, when the difference between the plate counts of observers B and D (table 3) was tested, the difference was found not to be statistically significant. However, this does not mean that a difference may not exist, only that if a difference does exist, the sample size is not sufficient to detect it. When the difference between observed results and expected results is of sufficient size to be of practical importance, but the difference is not statistically significant, it is a good idea to repeat the experiment, using larger samples. Thus, if the difference in plate counts between observers B and D is large enough to be of interest, the experiment should be repeated using more plates. It could be that there is a definite difference to be expected between the plate counts of these two observers. If so, it may be possible to detect it with a test in which the samples are larger. On the other hand, the observed difference may be a result of the variation in counts for the two observers and the next test may turn out to be not significant also (in fact, it could easily turn out that in the next test B's plate counts are higher than D's). 
When results of statistical tests are published, it is important for the writer to give the significance level used for the test. Many times, authors will say that results obtained in an experiment were significantly different, statistically, by a t-test for instance, but fail to mention the significance level used in the test. Since not everybody will agree on the significance level to be used, the writer should not only include the significance level used, but he should also give the value obtained in the test along


with the degrees of freedom for the test. In this way, other investigators who might wish to examine the results at a different significance level may do so. Another point which is helpful in this respect is that a writer should, if possible, give all of the original data on which the test is based so other investigators will have a chance to check the results for themselves.

Test of the difference among more than two treatments. Many times we may wish to study the effect of more than two treatments. For example, the problem may be a test of the difference in the effect on growth among each of several amino acids when they are added to cultures of a given organism. The differences could be examined by applying t-tests for the difference between all of the possible pairs of treatments, but difficulties will be encountered. The number of t-tests that will be required mounts rapidly as the number of treatments increases. Thus, 3 treatments would require 3 t-tests, but 10 treatments need 45 t-tests. When the number of t-tests increases, there may be difficulty due to the significance level of the t-test. If a significance level of 5 per cent is used, the hypothesis will be rejected, on the average, 5 per cent of the time when the hypothesis is true. Under these conditions, if 45 t-tests were run, at least some of these would be expected to turn out significant even though there was no difference among the population means of the 10 treatments being tested. Conflicting results may also arise from the t-tests. For example, consider three treatments, A, B, and C, lettered in increasing order of magnitude of the sample means; t-tests may show that A is not significantly less than B and B is not significantly less than C. In this way, there should be no difference between A and C. However, t-tests may reveal that A is significantly less than C. Another point to be considered is that when t-tests are run, the information is used from the samples of two of the treatments at a time; the information which could be contributed by the other treatments is missed.

The method developed for testing the difference among more than two treatments is the analysis of variance. The analysis of variance is based on the fact that if we have a set of populations which have the same variance, and we take a random sample from each of these populations, then, if the population means are equal, we will have two estimates of the common variance of the populations.

This will be illustrated with the data of Roth and Halvorson (9) given in table 5. These data are from an experiment designed to show the effect of oxidative rancidity on the germination of bacterial spores. There are three treatments: the control medium, medium with non-rancid lard added, and medium with rancid lard added. The observations are plate counts.

TABLE 5*
Effect of oxidative rancidity of lard on germination of spores of a putrefactive anaerobe (plate counts)

            Rancid lard   Non-rancid lard
            (Kreis +)     (Kreis -)        Control     Totals
[individual plate counts not legible in this copy]
T = ΣX      756           1,285            1,317       3,358
n           10            10               10          30
x̄           75.6          128.5            131.7
ΣX²         58,148        166,219          174,041     398,408
(ΣX)²/n     57,153.6      165,122.5        173,448.9
S.S.        994.4         1,096.5          592.1       2,683.0
s²          110.49        121.83           65.79

* This table originally appeared in the Journal of Bacteriology, page 431, Volume 63, and is reproduced here with the permission of the copyright owner.

If the variances of the populations from which the samples are taken are equal, a pooled estimate of this common variance can be obtained by the same procedure used for the pooled estimate of the variance for the test of the difference between two treatments (see equation 4.18). The pooled estimate of the variance is equal to the sum of the S.S. divided by the sum of the degrees of freedom, or

pooled s² = ΣS.S./Σd.f.    (4.52)
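The rapid growth in the number of pairwise t-tests mentioned above, and the multiple-testing problem it creates, can be checked with a short calculation. The 0.90 figure below assumes independent tests purely for illustration; pairwise t-tests on the same treatments are not independent, so it is only indicative.

```python
from math import comb

# Number of pairwise t-tests needed for k treatments: k choose 2.
for k in (3, 10):
    print(k, comb(k, 2))  # 3 -> 3 tests, 10 -> 45 tests

# Rough illustration of the multiple-testing problem: if 45 tests at
# the 5 per cent level were independent, the chance of at least one
# false rejection would be
p_any = 1 - 0.95 ** 45
print(round(p_any, 2))    # 0.9 (about 90 per cent)
```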


This gives one estimate of the common variance (call this the internal estimate of the common variance). In the example, each of the samples has 10 observations, therefore each has 9 degrees of freedom. Thus,

pooled s² = 2,683.0/27 = 99.37    (4.53)

If the population means of the treatments are equal (this is the hypothesis), the variance of the sample means is equal to the variance common to the populations divided by the size of the samples (see equation 3.1), or

σx̄² = σ²/n

An estimate of σx̄² is obtained from a sample estimate of the variance of the sample means. If x̿ is the mean of the sample means, that is,

x̿ = Σx̄/k    (4.54)

where k is the number of samples, which is 3 in the example, then the estimate of the variance of the means (denoted by sx̄²) is

sx̄² = Σ(x̄ - x̿)²/(k - 1)    (4.55)

The identity in equation 4.12 can be used to determine the S.S. for the sample means. Thus, for our example,

Σx̄² = 39,572.50

and

(Σx̄)²/k = 112,761.64/3 = 37,587.21

therefore,

S.S. = 39,572.50 - 37,587.21 = 1,985.29    (4.56)

Since there are three sample means, the degrees of freedom of sx̄² is 2. Thus,

sx̄² = 1,985.29/2 = 992.645    (4.57)

Now, sx̄² is an estimate of σ²/n; thus, n·sx̄² is an estimate of σ², giving the second estimate of the variance common to the three populations (call this the external estimate of the common variance). Thus,

n·sx̄² = (10)(992.645) = 9,926.45    (4.58)

The two estimates of the variance common to the populations can be examined by the F-test to see whether these two quantities are estimates of the same variance (statistical theory states these estimates are independent, so the F-test can be used). The hypothesis for the test is that the population means are equal. If this is true, both the pooled s² and n·sx̄² will be estimates of the same variance. If the population means are not equal, n·sx̄² will be an estimate of something larger than the variance common to the three populations, since there will be additional variation due to the difference among the population means of the treatments. Thus the test has only one alternative to the hypothesis, that is, that n·sx̄² is an estimate of something which is larger than the variance estimated by the pooled s². Therefore, the F-test is one-tailed and, since n·sx̄² is an estimate of a quantity which is equal to or larger than the variance estimated by the pooled s², n·sx̄² is always put in the numerator of F:

F = n·sx̄²/pooled s²    (4.59)

Now test the hypothesis that the population means of the three treatments for the data in table 5 are equal, using the 5 per cent significance level. Since the test is a one-tailed test, the critical region is values of F that exceed F.95 with 2 and 27 degrees of freedom. An F-table reveals that the critical region for the test will be values of F which exceed 3.35. Substituting the values obtained in equations 4.53 and 4.58 into the equation for F (equation 4.59),

F = 9,926.45/99.37 = 99.89    (4.60)

This value falls into the critical region of the test, therefore the hypothesis that the population means of the three treatments are equal is rejected.

The computations necessary for the analysis of variance are simplified by the use of an analysis of variance computing table. This table takes advantage of the type of mathematical identity shown in equation 4.13 to shorten the computing necessary to reach the F-test. The method used in the table consists of splitting the total S.S. for the experiment into its component parts.
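The internal and external estimates and the F-ratio of equations 4.53-4.60 can be reproduced from the table 5 summary figures; the sketch below uses the within-treatment S.S. values and sample means quoted above.

```python
# One-way analysis of variance from the table 5 summary figures.
ss_within = [994.4, 1096.5, 592.1]  # S.S. within each treatment, 9 d.f. each
means = [75.6, 128.5, 131.7]        # sample means, n = 10 plates per treatment
n, k = 10, 3

# Internal estimate: pooled s^2 (equation 4.53)
pooled_s2 = sum(ss_within) / (k * (n - 1))  # 2,683.0 / 27

# External estimate: n * s_xbar^2 (equations 4.54-4.58)
grand_mean = sum(means) / k
ss_means = sum((m - grand_mean) ** 2 for m in means)
s_xbar2 = ss_means / (k - 1)
external = n * s_xbar2

F = external / pooled_s2                    # equation 4.59
# 99.37 9926.43 99.89 -- agrees with equations 4.53, 4.58 and 4.60 to
# rounding (the article rounds s_xbar^2 to 992.645 before multiplying).
print(round(pooled_s2, 2), round(external, 2), round(F, 2))
```

Since 99.89 far exceeds the critical value 3.35, the hypothesis of equal treatment means is rejected, as in the text.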


One part of this total is the S.S. for the variation among the sample means (this will lead to n·sx̄²), while the other portion will be the S.S. for the variation within the samples (this will lead to the pooled s²). The total degrees of freedom for the experiment are also split into similar parts. The details of the computing instructions for the analysis of variance computing table are given in the appendix.

Several assumptions underlie an analysis of variance, three of which have been mentioned already, namely, that the populations sampled are normally distributed, the samples taken are random samples, and the populations have equal population variances. Eisenhart (10) lists and discusses these and the other assumptions, and Cochran (11) describes some of the consequences when these assumptions are not satisfied.

Statistical tests are available to check whether the assumption of equal population variances is met. When the population variances are equal, they are said to be homogeneous; thus, a test used to see whether the variances are homogeneous is called a homogeneity of variances test. Two homogeneity of variances tests are used fairly frequently: Bartlett's (12) method, which may be used if the degrees of freedom for each of the samples in the experiment is at least 4, and Box's (13) method, if the degrees of freedom for any of the samples is less than 4.

Box (14) has shown that homogeneity of variances tests are very sensitive to non-normality of the parent populations. This serves as a drawback to the use of homogeneity of variances tests unless the populations are indeed normal, since a statistically significant result may be due to non-normality or to lack of homogeneity. This fact led Box to conclude:

"It has frequently been suggested that a test of homogeneity of variances should be applied before making an analysis of variance test for homogeneity of means in which homogeneity of variances is assumed. The present research suggests that when, as is usual, little is known of the parent distribution, this practice may well lead to more wrong conclusions than if the preliminary test was omitted. It has been shown that in the commonly occurring case in which the group sizes are equal or not very different, the analysis of variance test is affected surprisingly little by variance inequalities. Since this test is also known to be very insensitive to non-normality it would be best to accept the fact that it can be used safely under most practical conditions. To make the preliminary test on variances is rather like putting to sea in a rowing boat to find out whether conditions are sufficiently calm for an ocean liner to leave port!"⁵

⁵ This paragraph originally appeared in Biometrika, page 333, Volume 40, and is reproduced here with the permission of the copyright owner.

It is emphasized that the analysis of variance is by no means completely insensitive to lack of homogeneity of the variances. Variation in the population variances can be tolerated as long as the variation does not exceed reasonable limits. If the variation among the population variances is too great, methods are available, such as those of James (15) and Welch (16), for correcting the analysis of variance procedure for the lack of homogeneity of the variances. Transformations are also of use in problems requiring the use of an analysis of variance when the variances of the original measurements are not homogeneous. For example, radioactivity measurements using an Autoscaler have a constant coefficient of variation, and therefore the variances are not equal; however, if the logarithm of the radioactivity is used, the variances will be homogeneous. Some of the more commonly used transformations are listed and discussed by Bartlett (17).

Problems of Estimation

Most experiments are run to estimate an unknown quantity and not to test some hypothesis. For example, we may wish to determine the number of bacteria in a sample of milk or the amount of a given metabolite in some source or the potency of a toxin. Here, no particular hypothesis is to be tested; instead, we have a problem of estimation. The same type of problem may arise if a hypothesis is rejected. For example, if the hypothesis that a measuring device is unbiased has been rejected, we may wish to determine the amount of bias, or if the hypothesis that two procedures give the same result is rejected, we may wish to determine the magnitude of the difference between them.

Confidence intervals: basic principles. In problems of estimation, we must, again, contend with the old problem of the variation of measurements. For example, consider the problem of


determining the bias of a balance. Now, if the balance gave exactly the same weight each time, it would be a simple matter to determine the bias of the balance; we would need only to obtain an estimate of the weight of a standard weight and the difference between the estimated weight and the true weight would give the bias of the balance. However, balances just don't do that. If a standard weight is weighed 10 times, the mean will give an estimate of the bias. If the same weight is weighed 10 more times the mean of the second set of 10 observations will give another estimate of the bias, but the second estimate probably won't agree with the first. Thus an estimate of the bias can be obtained from the mean of 10 observations but this mean, by itself, gives no idea about the reliability of the estimate. To obtain an estimate with a known amount of reliability, we utilize the information in the sample concerning the variation to which the observations are subject plus the information contained in the mean of the sample concerning the true amount of bias and come up with a range of values as an estimate of the bias of the balance. That is, all of the information the sample has to give is utilized and the statement is made that the true amount of bias lies within a certain range, the range being determined by the information in the sample. Now, we could say, with absolute assurance, that the bias of the instrument was between minus infinity grams and plus infinity grams. However, such a range would be of little value from a practical point of view. To obtain a range of values for the true bias which has greater practical usefulness, we settle for a little less assurance that the true bias lies within the given range. 
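The idea that a procedure can be arranged so that a known percentage of samples yield a range containing the true value can be illustrated by simulation. The sketch below is an illustration of my own, not part of the original article: it draws repeated samples from a normal population with known mean 0 and standard deviation 1 and uses the normal multiplier 1.96 (z.975) rather than a t-table entry.

```python
import random
import statistics

# Draw many samples, form a 95 per cent interval for the mean from
# each, and count how often the interval contains the true mean (0).
random.seed(1)
n, trials, covered = 10, 2000, 0
half_width = 1.96 / n ** 0.5  # 1.96 * sigma / sqrt(n), with sigma = 1

for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    xbar = statistics.mean(sample)
    if xbar - half_width < 0 < xbar + half_width:
        covered += 1

print(covered / trials)  # close to 0.95
```

About 95 per cent of the simulated intervals cover the true mean, which is exactly what the confidence coefficient asserts.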
Consider a parent population with a population parameter, θ; take all of the possible samples of a given size from this population; for each of the samples some procedure is used for determining a range with which to estimate θ (exactly the same procedure is used on each of the samples). The procedure for setting up the range is such that the ranges for a certain percentage of the samples will contain θ. The ranges set up in this way are called confidence intervals, and the per cent of the samples whose confidence intervals contain the population parameter is called the confidence coefficient. Thus, if the procedure is such that the confidence intervals of 95 per cent of the samples contain θ, the confidence coefficient will be 95 per cent. The procedures for setting up confidence intervals follow.

Confidence interval for a population mean. Before proceeding to the confidence interval for a population mean, a new notation will be needed, namely an inequality sign to show relative magnitudes. This sign is a V lying on its side, the open end pointing toward the larger quantity; thus, to show that A is less than B use the symbol A < B, or, stating this inequality in another way, B is greater than A, as shown by the symbol B > A. The following example may help to clarify the use of these symbols.

2 < 7 < 16    (4.61)

Symbol 4.61 says that 2 is less than 7 and 7, in turn, is less than 16. This can be stated in another way by saying that 7 lies between 2 and 16. This type of notation is used in setting up confidence intervals.

The confidence interval for a population mean is derived from the t-test shown in equation 4.16. If the 5 per cent significance level is used, the test will not be significant if

t.025 < (x̄ - μ)/√(s²/n) < t.975    (4.62)

Now, if all three members of the inequality are multiplied by √(s²/n), which is a positive number, the direction of the inequality will be unchanged (the reader can verify this by multiplying all of the members of the inequality shown in 4.61 by some positive number, say +2). Thus,

t.025·√(s²/n) < x̄ - μ < t.975·√(s²/n)    (4.63)

If all three members of the inequality are multiplied by -1, that is, if the sign of all of the members is changed, the direction of the inequality will be changed (the reader can verify this by multiplying all three members of the inequality given in 4.61 by -1 to give -2 > -7 > -16). Thus, multiplying all three members of the inequality shown in 4.63 by -1,

-t.025·√(s²/n) > μ - x̄ > -t.975·√(s²/n)    (4.64)

Now add x̄ to all three members of the inequality; this will not change the direction of the inequality (the reader can verify this by adding some number, either positive or negative, to all


three members of the inequality given in 4.61). Thus,

x̄ - t.025·√(s²/n) > μ > x̄ - t.975·√(s²/n)    (4.65)

Rearranging the inequality given in 4.65,

x̄ - t.975·√(s²/n) < μ < x̄ - t.025·√(s²/n)    (4.66)

Inequality 4.66 defines the confidence interval for a population mean with a confidence coefficient of 95 per cent.

The confidence interval will be illustrated with the data in table 2, from which the hypothesis that the rough balance was unbiased was rejected. From equation 4.13, x̄ is 100.26 and, from equation 4.17, √(s²/n) is 0.078. The number of degrees of freedom for t was 9, thus t.025 is -2.262 and t.975 is +2.262. Substituting these values in the inequality for the confidence interval,

100.26 - (2.262)(0.078) < μ < 100.26 - (-2.262)(0.078)

or

100.08 < μ < 100.44    (4.67)

Thus the confidence interval, with a 95 per cent confidence coefficient, for the mean of the estimates of the weight lies between 100.08 and 100.44. The confidence interval for the bias is obtained by subtracting the true value of the standard weight from each, thus,

0.08 < bias < 0.44    (4.68)

The confidence interval for a different confidence coefficient is determined by the appropriate choice of the significance level used in setting up inequality 4.62. For example, if t.005 and t.995 (for the 1 per cent significance level) are used, the confidence interval for the population mean with a confidence coefficient of 99 per cent results. This would be

x̄ - t.995·√(s²/n) < μ < x̄ - t.005·√(s²/n)    (4.69)

Confidence interval for a population variance. This confidence interval is based on the chi-square (χ²) distribution. The confidence interval is derived from the fact that the quantity

χ² = S.S./σ²    (4.70)

follows the chi-square distribution with the degrees of freedom associated with the S.S. This quantity can be used to test the hypothesis that the variance of a population is equal to a given quantity. The test is much the same as any other test of hypothesis, such as the t-test, so its use will not be illustrated. The test is two-tailed; therefore, for a significance level of 5 per cent, the hypothesis is not rejected if

χ².025 < S.S./σ² < χ².975    (4.71)

Now, taking the reciprocal of each member of the inequality (i.e., each member of the inequality is divided into 1), the direction of the inequality changes (the reader can verify this by taking the reciprocal of all three members of inequality 4.61 to give 1/2 > 1/7 > 1/16). Thus,

1/χ².025 > σ²/S.S. > 1/χ².975    (4.72)

Multiplying all three members of inequality 4.72 by S.S.,

S.S./χ².025 > σ² > S.S./χ².975    (4.73)

Rearranging inequality 4.73 gives the confidence interval for the 95 per cent confidence coefficient:

S.S./χ².975 < σ² < S.S./χ².025    (4.74)

Using the data in table 2 as illustration, from equation 4.14 the S.S. is 0.5440. The degrees of freedom will be 9 again, therefore χ².025 is 2.70 and χ².975 is 19.0; thus the confidence interval, for the 95 per cent confidence coefficient, will be

0.5440/19.0 < σ² < 0.5440/2.70

or

0.029 < σ² < 0.201    (4.75)
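Both interval computations above (inequalities 4.66 and 4.74) can be reproduced with the table 2 figures quoted in the text:

```python
# 95 per cent confidence intervals from the table 2 balance data
# (xbar = 100.26, sqrt(s^2/n) = 0.078, S.S. = 0.5440, 9 d.f.).
xbar, se, ss = 100.26, 0.078, 0.5440
t975 = 2.262                     # t.975 for 9 degrees of freedom
chi2_025, chi2_975 = 2.70, 19.0  # chi-square percentage points, 9 d.f.

# Mean (inequality 4.66): xbar -/+ t.975 * sqrt(s^2/n)
mean_low, mean_high = xbar - t975 * se, xbar + t975 * se
print(round(mean_low, 2), round(mean_high, 2))  # 100.08 100.44

# Variance (inequality 4.74): S.S./chi2.975 < sigma^2 < S.S./chi2.025
var_low, var_high = ss / chi2_975, ss / chi2_025
print(round(var_low, 3), round(var_high, 3))    # 0.029 0.201
```

Note that the wider tail of the variance interval reflects the asymmetry of the chi-square distribution.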

Confidence interval for the difference between two means. Two methods are available for determining the confidence interval for the difference between two population means, corresponding to the two cases for the t-test for independent samples (paired samples will use equation 4.66). If the population variances are equal for the two treatments, the equation for the t-test (equation 4.34) is used to derive the confidence interval for the difference between the two population means. The resulting inequality is

x̄₁ - x̄₂ - t.975·√(s²(1/n₁ + 1/n₂)) < μ₁ - μ₂ < x̄₁ - x̄₂ - t.025·√(s²(1/n₁ + 1/n₂))    (4.76)

where s² is the pooled estimate of the common variance defined by equation 4.33. If the population variances are not equal for the two treatments, we use the equation for the t-test (equation 4.30) along with the equation for the approximate t-value, t' (equation 4.46), to derive the confidence interval for the difference between the two population means. The resulting inequality is

x̄₁ - x̄₂ - t'.975·√(s₁²/n₁ + s₂²/n₂) < μ₁ - μ₂ < x̄₁ - x̄₂ - t'.025·√(s₁²/n₁ + s₂²/n₂)    (4.77)

Components of variance technique. In some laboratory procedures variation in results may arise from more than one source. For example, each of a group of technicians makes the same measurement on an unknown; the variation in results for the group will arise from two sources: variation in results due to the measuring device, and further variation arising from the differences among the technicians. Another example would be a procedure in which two steps are required before the final result is obtained, with variation in results from each of the steps in the procedure. When a problem arises in which it is necessary to obtain an estimate of the amount of variation arising from each of the sources, the components of variance technique can be used for this purpose. The components of variance technique is an extension of the analysis of variance and is based on the expected values of the mean squares in the analysis of variance table.

We will use the data in table 6 to illustrate the components of variance technique. These data are the control portion of the experiment of Halvorson and Spiegelman (20) discussed previously (page 181). The cells were replenished with NH₄Cl for 15 minutes, centrifuged, washed, and resuspended in buffer. Equal aliquots were placed in five flasks containing 0.3 per cent glucose. Free amino acid extracts were prepared from cells following aerobic incubation at 30°C for 140 minutes. Glutamic acid was assayed manometrically by the decarboxylase method, with two determinations per flask.

TABLE 6*
Free glutamic acid in N replenished cells (control)

Flask No.    Glutamic acid (per 100 mg dry cells)    Flask totals
1            19.6    20.4                            40.0
2            17.9    17.2                            35.1
3            18.0    17.2                            35.2
4            18.9    19.6                            38.5
5            17.3    17.5                            34.8

Grand total.......................................... 183.6

* This table originally appeared in the Journal of Bacteriology, page 605, Volume 65, and appears here with the permission of the copyright owner.

Consider the parent population for the sample given in table 6. This population will be made up of many glutamic acid determinations on each of many flasks. Denote a member of this population, say the ith glutamic acid determination on the jth flask, by Xij. Now, the glutamic acid determinations for a given flask, say the jth flask, will have a mean value which is called the flask mean and denoted by X̄.j. Thus, if there are N determinations for the jth flask,

X̄.j = ΣXij/N    (4.78)

The parent population will also have a mean value, denoted by μ, which will be the mean of all of the glutamic acid determinations for all of the flasks. Thus, if the total number of determinations in the population is N.,

μ = ΣΣXij/N.    (4.79)

-/ '2 +

nA

laboratory procedures variation in results may sayed manometrically by the decarboxylase arise from more than one source. For example, method, with two determinations per flask. each of a group of technicians makes the same Consider the parent population for the sample measurement on an unknown; the variation i given in table 6. This population will be made up results for the group will arise from two sources: of many glutamic acid determinations on each of variation in results due to the measuring device many flasks. Denote a member of this population, and further variation arising from the dif- say the ith glutamic acid determination on the ferences among the technician Another example jth flak, by Xii. Now, the glutamic acid dewould be a procedure in which two steps are terminations for a given flask, say the jth flask, required before the final result is obtained with will have a mean value which is called the flask variation in results from each of the steps in the mean and denoted by X.,. Thus if there are N procedure. When a problem arises in which it is determinations for the jth flask, necessary to obtain an estimate of the amount of variation arising from each of the sources, the (4.78) X., = Mx'' N components of variance technique can be used for this purpose. The components of variance technique is an extension of the analysis of The parent population will also have a mean variance and is based on the expected values of value, denoted by jA, which will be the mean of the mean squares in the analysis of variance all of the glutamic acid determinations for all of the flasks. Thus, if the total number of detable. terminations in the population is N, _____(479 We will use the data in table 6 to illustrate the IL = E(x1) = N479 components of variance technique. These data are the control portion of the experiment of Halvorson and Spiegelman (20) discussed previously (page 181). 
Yeast cells were nitrogen- The measurement of any member of the parent starved for 80 minutes in a synthetic medium population can now be broken down into compo-

192

ROEBERT L. STEARMAN +

[VOL. 19

nent parts, thus,


X_ij = μ + (X̄.j − μ) + (X_ij − X̄.j)    (4.80)

That is, a particular glutamic acid determination is the sum of the parent population mean plus the deviation of the flask mean from the population mean plus the deviation of the determination from the flask mean.

The variance of the population, denoted by σ², can also be broken down into component parts. Part of the over-all variation is due to the variation of the flask means about the population mean. This variation would be due to such things as the variation in the number of yeast cells in an aliquot and the variation in the aliquots themselves. The variance of the flask means about the population mean can be denoted by σ_f² and will be

σ_f² = E[(X̄.j − μ)²]    (4.81)

Another part of the over-all variation will be due to the variation of the glutamic acid determinations about their flask means. This would be the usual variation due to the procedure of obtaining estimates of the amount of glutamic acid. The variance of the glutamic acid determinations about the flask means can be denoted by σ_d² and will be

σ_d² = E[(X_ij − X̄.j)²]    (4.82)

Both σ_f² and σ_d² will be components of σ². The components of variance technique gives an estimate of each of these components of σ² from a sample taken from the population. Denote the estimate of σ_f² by s_f² and the estimate of σ_d² by s_d². Thus, s_f² gives an estimate of the variance due to variation in results among flasks, and s_d² gives an estimate of the variance for the procedure for the glutamic acid determinations.

If we take a random sample of k flasks and run n determinations on each flask, an analysis of variance computing table can be set up for the resulting data as given in the appendix. The estimates of σ_f² and σ_d² will be based on the expected values of the mean square terms in the resulting analysis of variance table. The expected values of the mean square terms for an experiment involving n determinations on each of k flasks are given in table 7.

TABLE 7
Expected values of the mean square terms in an analysis of variance table for n determinations on each of k flasks

Variation Due to: | Mean Square | E(M.S.)
Treatments (flasks) | M.S.T | σ_d² + nσ_f²
Observations (determinations) | M.S.O | σ_d²

From table 7 we see that M.S.O (equivalent to the pooled s²) is an estimate of σ_d²; that is,

s_d² = M.S.O    (4.83)

and an estimate of σ_f² is

s_f² = (M.S.T − M.S.O)/n    (4.84)

where M.S.T is equivalent to n s_x̄². The use of table 7 and the accompanying equations 4.83 and 4.84 can be illustrated with the data in table 6. Table 8 is the analysis of variance computing table for these data. From the analysis of variance table, M.S.O is 0.2300, therefore

s_d² = 0.2300    (4.85)

and, since M.S.T is 2.8185 and n is 2, the estimate of s_f² will be

s_f² = (2.8185 − 0.2300)/2 = 1.2942    (4.86)

The estimate of the variance due to the variation in glutamic acid determinations is 0.2300, while the estimate of the variance due to the variation of glutamic acid content among the flasks is 1.2942. Obviously, the contribution to the over-all variance from the difference in results for different flasks is greater than the contribution from the variation in results due to the glutamic acid determinations. The components of variance technique can also be applied to experiments in which there are unequal numbers of observations for the various treatments, for example, unequal numbers of glutamic acid determinations per flask, but the method is slightly more complicated than the method for equal numbers of observations for the various treatments.

One point to be noticed is that the F-test for the mean square terms in the analysis of variance of this type of data tests the hypothesis that σ_f² = 0. That is, if the value of F falls into the critical region, we say that s_f² is significantly different from zero, statistically. An interesting phenomenon may occur if s_f² is not significantly different from zero: if the value of F is less than 1, then M.S.T < M.S.O and s_f² will be negative.
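The arithmetic of equations 4.83 to 4.86 can be verified directly from the table 6 data. The following Python sketch is a modern illustration, not part of the original article; the variable and function names are ours.

```python
# Components of variance for the table 6 data: k = 5 flasks,
# n = 2 glutamic acid determinations per flask.
flasks = [(19.6, 20.4), (17.9, 17.2), (18.0, 17.2), (18.9, 19.6), (17.3, 17.5)]
k, n = len(flasks), len(flasks[0])
N = k * n

# Preliminary calculations, as in part 1 of table 8.
grand_total = sum(sum(f) for f in flasks)
correction = grand_total ** 2 / N                       # 3,370.896
flask_term = sum(sum(f) ** 2 for f in flasks) / n       # 3,382.170
det_term = sum(x ** 2 for f in flasks for x in f)       # 3,383.320

# Sums of squares and mean squares, as in part 2 of table 8.
ss_flasks = flask_term - correction
ss_determinations = det_term - flask_term
ms_t = ss_flasks / (k - 1)                 # M.S. for treatments (flasks)
ms_o = ss_determinations / (k * (n - 1))   # M.S. for observations

s2_d = ms_o                  # equation 4.83
s2_f = (ms_t - ms_o) / n     # equation 4.84

print(ms_t, ms_o, s2_f, s2_d)
```

The printed values agree with table 8 and with equations 4.85 and 4.86 to the number of decimal places carried in the text.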

A negative estimate is rather surprising; σ_f², by its very nature, cannot be negative. A negative value of s_f² can only occur when s_f² is not significantly different from zero. When this occurs, reference to table 7 shows that both M.S.T and M.S.O will be estimates of σ_d². Since estimates of variances are subject to variation (this is the reason for having the F-distribution to test sample estimates of variances), it is not surprising to see M.S.T less than M.S.O when both are estimates of σ_d². Therefore, we use zero as our estimate of σ_f² when s_f² turns out to be negative. An example of a negative estimate of a component of variance is given by Stearman, Ward and Webster (21).

TABLE 8
Analysis of variance computing table for the data in table 6

Part 1: Preliminary calculations

(1) Item | (2) Total of Squares | (3) No. of Items Squared | (4) Observations per Item Squared | (5) Total of Squares per Observation, (2) ÷ (4)
Correction* | 33,708.96 | 1 | 10 | 3,370.896
Flasks | 6,764.34 | 5 | 2 | 3,382.170
Determinations | 3,383.32 | 10 | 1 | 3,383.320

Part 2: Analysis of variance table

(6) Variation Due to: | (7) Sum of Squares (S.S.) | (8) Degrees of Freedom (d.f.) | (9) Mean Square, (7) ÷ (8) | (10) F
Flasks | 11.274 | 4 | 2.8185 | 12.254
Determinations | 1.150 | 5 | 0.2300 |
Total | 12.424 | 9 | |

* The correction term does not constitute a source of variation.

The components of variance technique is a very useful statistical tool in the designing of experiments. Its usefulness stems from our discussion of methods of increasing precision; equation 3.2 states that the variance of a mean of n_f flasks with n_d determinations per flask will be

σ_x̄² = σ_f²/n_f + σ_d²/(n_f n_d)    (4.87)

By substituting the estimates from the components of variance technique into the equation, an estimate of this variance is

s_x̄² = s_f²/n_f + s_d²/(n_f n_d)    (4.88)

Thus,

s_x̄² = 1.2942/n_f + 0.2300/(n_f n_d)    (4.89)

Now, the variance of the mean, and thus the precision, can be improved by increasing either the number of flasks or the number of determinations per flask or both. Where s_f² is the larger of the two components, the greatest gain in precision will be obtained by increasing the number of flasks. This is illustrated by considering an experiment with a total of four determinations. There are three ways of obtaining four determinations: one flask with four determinations, two flasks with two determinations per flask, or four flasks with one determination per flask. Which method gives the greatest precision?

If four determinations are done on one flask, the variance of the mean of the determinations will be

s_x̄² = 1.2942/1 + 0.2300/4 = 1.2942 + 0.0575 = 1.3517

With two determinations on each of two flasks, the estimate of the variance of the mean will be

s_x̄² = 1.2942/2 + 0.2300/4 = 0.6471 + 0.0575 = 0.7046

which gives a fair increase in precision. On the other hand, if one determination is made on each of four flasks, the estimate of the variance of the mean will be

s_x̄² = 1.2942/4 + 0.2300/4 = 0.3235 + 0.0575 = 0.3810

which gives an even greater increase in precision. Thus it is clear that increasing the number of flasks has a greater effect in increasing precision than increasing the number of determinations per flask. Estimates of the components can be used to determine the size of experiment and the allocation of flasks and determinations necessary to obtain a given precision. Further discussion of the use of components of variance in designing experiments will be found in Stearman, Ward and Webster (21).
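The comparison of the three allocations can be evaluated mechanically from equation 4.89. The short Python fragment below is illustrative only (not from the original article); the function name is ours.

```python
# Estimated variance of a mean of n_f flasks with n_d determinations
# per flask (equation 4.89), using s_f^2 = 1.2942 and s_d^2 = 0.2300
# from the table 6 analysis.
def var_of_mean(n_f, n_d, s2_f=1.2942, s2_d=0.2300):
    return s2_f / n_f + s2_d / (n_f * n_d)

# The three ways of spending a total of four determinations.
allocations = [(1, 4), (2, 2), (4, 1)]  # (flasks, determinations per flask)
for n_f, n_d in allocations:
    print(n_f, "flask(s) x", n_d, "determination(s):", var_of_mean(n_f, n_d))
```

As in the text, the variance of the mean falls fastest when the flasks, the larger component, are multiplied.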


V. THE BINOMIAL DISTRIBUTION

The parent population. Some problems define a population in which each member of the population falls into one of two classes. An example would be a study of the effect of a given toxin on rats, in which the classifications would be that an animal died or the animal did not die. The problem defines a population made up of the reactions of each of the animals to the toxin. The typical question to be answered in such a problem is: what proportion of rats will die, given a particular dose of the toxin? In other words, if the same dose of toxin is given to all of the population of rats from which samples are drawn in studies of this type, what proportion of the population would die?

Call one of the classes class A and the other class "not A" (denote the class "not A" by Ā). Thus, some of the members of the population will have attribute A and will fall in class A, while the remaining members of the population will not have attribute A and will fall in class Ā. (In some discussions, the two classes are referred to as "successes" and "failures"; however, since this type of nomenclature sometimes leads to confusion, we will use the classes A and Ā.) Denote the proportion of the population which falls into class A by P and the proportion of the population which falls in class Ā by Q. As an example, if three quarters of the rats given the toxin would die and the remaining quarter survive, P would be 0.75 and Q would be 0.25. Since all of the members of the population will fall into one class or the other, the total of the two proportions must be 1; thus,

P + Q = 1    (5.1)

Equation 5.1 leads to a much-used relationship, namely

Q = 1 − P    (5.2)

The assignment of the members of the population to the two classes is based on our primary interests in the problem under study. For example, if, in the toxin example, primary interest is in the proportion of deaths of the test animals, assign the deaths to class A and the survivals to class Ā; however, if primary interest is in the proportion of survivals, assign the survivals to class A and the deaths to class Ā.

Probability. The statement that a certain proportion, P, of the parent population will fall into class A is usually made by saying that the probability that a member of the parent population will fall into class A is P. In other words, the probability that a member of a population will have a certain attribute is nothing more than the proportion of the population members that have this attribute; e.g., in the example of the toxin given to rats, the probability that a rat given the toxin will die is 0.75. The term probability can also be applied to some of the points discussed previously. As an example, the significance level in a test of a hypothesis is the proportion, expressed in per cent, of type I errors which will be made if the hypothesis is correct. We can also speak of the probability of type I errors. Thus, at a significance level of 5 per cent, the probability of a type I error is 0.05.

Distribution of samples. Samples from the parent population in which members fall into one of two classes are like samples drawn from other populations in that they are subject to variation (unless all of the members of the population are alike). If a single member is drawn from the population, it will either fall into class A or into class Ā. If a sample of size two is drawn from the parent population, three outcomes are possible: two members from class A, one member each from class A and class Ā, or two members from class Ā. The probabilities for each of these outcomes can be derived by considering the order in which the observations in the sample are drawn. To obtain a sample containing two members of class A, both the first and second observation must be in class A. The probability that the first observation is in class A is P, and the probability that the second observation is in class A is also P (on the assumption that the parent population is an infinite population). The probability that both observations are in class A is the product of the two probabilities, the first and the second, or

Probability of AA = PP = P²

A sample with one member each from class A and class Ā can be obtained in two ways: a sample in which a member from class A is drawn first and then a member from class Ā, or a


sample in which first a member from class Ā is drawn and then a member from class A. The probability that the first observation comes from class A is P and the probability that the second observation comes from class Ā is Q, therefore

Probability of AĀ = PQ

The probability that the first observation comes from class Ā is Q and the probability that the second observation comes from class A is P, therefore

Probability of ĀA = QP = PQ

Now, the probability of a sample with one member each from class A and class Ā is the total of the probabilities for the different ways in which we can obtain the sample, therefore

Probability of either AĀ or ĀA = PQ + PQ = 2PQ

To obtain a sample containing two members from class Ā, both the first and second observation must be from class Ā. The probability that the first observation is from class Ā is Q and the probability that the second observation is from class Ā is Q, therefore

Probability of ĀĀ = QQ = Q²

Now, if these are all of the possible outcomes, the total of the probabilities (proportions) must be 1. The proof of this is based on the fact that the three probabilities listed are the three terms in the binomial expansion of (Q + P)², that is,

(Q + P)² = Q² + 2PQ + P²

and, since Q + P = 1 (see equation 5.1),

(Q + P)² = 1² = 1

That is, the total of the three probabilities is equal to 1.

If a sample of size 3 is drawn from the parent population, four outcomes are possible: none, one, two or three observations in the sample from class A. If the probabilities for each of the possible outcomes are computed in a method similar to that used for samples of size 2, the results are as given in table 9. The four probabilities listed in column 4 of table 9 are the four terms in the binomial expansion of (Q + P)³, that is,

(Q + P)³ = Q³ + 3PQ² + 3P²Q + P³

Here again the total of the probabilities will be 1, since

(Q + P)³ = 1³ = 1

TABLE 9
Binomial distribution for samples of size 3

No. of Observations from Class A in Sample | Order of Appearance in Sample | Probability for Order | Total Probability
0 | ĀĀĀ | QQQ = Q³ | Q³
1 | AĀĀ, ĀAĀ, ĀĀA | PQQ = PQ², QPQ = PQ², QQP = PQ² | 3PQ²
2 | AAĀ, AĀA, ĀAA | PPQ = P²Q, PQP = P²Q, QPP = P²Q | 3P²Q
3 | AAA | PPP = P³ | P³

If the method we used in determining the probabilities for the outcomes in samples of size 2 and 3 is applied to samples of greater size, it will be found that the probabilities for the different possible outcomes coincide with the terms in a binomial expansion of (Q + P) raised to a power equal to the size of the sample. It is for this reason that the distribution of possible outcomes for samples from the parent population is called the binomial distribution.

Table 10 lists the probabilities for the general case where the sample size is equal to n; these probabilities are based on the binomial expansion of (Q + P)ⁿ.

Let us consider the example of the effect of a toxin on rats for an illustration of the use of table 10. In the example, the rats had a probability of dying of 0.75. What is the probability that exactly 5 out of a sample of 12 rats will die? Here, n is 12 and k is 5, so the probability will be

[(12)(11)(10)(9)(8) / (1)(2)(3)(4)(5)] (0.75)⁵ (0.25)⁷ = (792)(0.2373046875)(0.00006103515625) = 0.01147127    (5.3)
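Equation 5.3 is simply the general term of table 10 (the binomial probability mass function) evaluated at k = 5. A short Python check, not part of the original article, using only the standard library:

```python
from math import comb

def binomial_prob(k, n, p):
    """Probability of exactly k class A members in a sample of size n
    when the parent population proportion in class A is p (table 10)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability that exactly 5 of 12 rats die when P = 0.75 (equation 5.3).
prob = binomial_prob(5, 12, 0.75)
print(prob)
```

Summing `binomial_prob` over k = 0, ..., n returns 1, mirroring the (Q + P)ⁿ = 1 argument above.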

The reader will note that the computations of the probabilities become quite involved; tables are available, however, which list the probabilities of the possible outcomes for various sizes


of samples. The tables of the National Bureau of Standards (22) list the probabilities for samples of sizes 1 to 49, and the tables of Romig (23) list the probabilities for samples of sizes 50 to 100 by steps of 5. Both sets of tables are set up for parent population probabilities by steps of 0.01.

TABLE 10
Binomial distribution for samples with n observations

No. of Observations from Class A in Sample | Probability for Sample
0 | Qⁿ
1 | nPQⁿ⁻¹
2 | [n(n − 1) / (1)(2)] P²Qⁿ⁻²
3 | [n(n − 1)(n − 2) / (1)(2)(3)] P³Qⁿ⁻³
... | ...
k | [n(n − 1) ⋯ (n − k + 1) / (1)(2)(3) ⋯ (k)] PᵏQⁿ⁻ᵏ
... | ...
n | Pⁿ

Parameters and statistics of the binomial distribution. The binomial distribution is a distribution of samples, i.e., if all the possible samples of a given size are drawn from the parent population, the distribution of the samples (table 10) will be the binomial distribution. The binomial distribution can be considered in two ways: (a) the proportion of observations from class A in the various possible samples; or (b) the number of observations from class A in the various possible samples. Table 10 shows the distribution for the number of observations from class A. A table of the distribution for the proportion of observations from class A is obtained by dividing the entries in column 1 of table 10 by the size of the sample, n.

The mean and variance of the binomial distribution depend upon whether the proportion or number of observations from class A is considered. If we are dealing with the proportion of observations from class A in the various samples,

μ = P    (5.4)

and

σ² = PQ/n    (5.5)

However, if we are dealing with the number of observations from class A in the various possible samples,

μ = nP    (5.6)

and

σ² = nPQ    (5.7)

One point which should be noted here is that equations 5.4 and 5.5 are also applicable when P and Q are given in terms of percentages.

When a sample is drawn from the parent population, it is drawn to obtain an estimate of the proportion, P, of the parent population which has some particular attribute, A. The sample estimate of P, denoted by p, is the proportion of observations in the sample from class A. A sample estimate of Q, denoted by q, will be 1 − p. Sample estimates of the variance of the distribution of samples are obtained by substituting p and q for P and Q in the appropriate equations. These estimates will be used later, so they will not be illustrated at this point.

Significance Tests

The basic principles of significance tests for the binomial distribution are much the same as those for significance tests involving the normal distribution.

Binomial test of a single sample proportion. The basic procedure used in testing the number of class A members observed in a sample against a specified parent population proportion, P, is to compute the mean (nP) for the binomial distribution with the given size of sample and the value of P specified by the hypothesis, then determine the probability of obtaining deviations from this mean as great as or greater than the deviation of the observed number of class A members from the mean. With a two-tailed test, the probabilities for deviations of this type are computed in both tails of the distribution; with

a one-tailed test, the deviations are computed for the one tail only. If the probability of the deviations falls below the significance level of the test, we reject the hypothesis.

This procedure can be illustrated with a hypothetical sample testing whether the proportion of rats which die when given the toxin in the example is 0.75. If the toxin is given to 12 rats, and if the hypothesis is correct, the mean number of rats which will die for samples of size 12 will be

nP = (12)(0.75) = 9    (5.8)

A two-tailed test will be used, since the proportion of rats which die in the parent population may be less than 0.75 or may be more than 0.75. Suppose that 6 of the 12 animals in the sample die; as the deviation of the observed number, 6, from the mean, 9, is 3, the probability of obtaining deviations of 3 or more from the mean is required. The numbers (of animals which would die) that have a deviation of 3 or more from the mean will include 0, 1, 2, 3, 4, 5, 6 and 12 (12 has a deviation of 3 and is included since the test is a two-tailed test). Now, if the total of the probabilities for these numbers falls below the significance level of the test, we will reject the hypothesis; however, if the total of the probabilities is greater than the significance level, the hypothesis will not be rejected. Let us use the 5 per cent (0.05) significance level. The probabilities for the various possible outcomes for the sample, taken from the National Bureau of Standards binomial distribution tables (22), are given in table 11.

TABLE 11
Binomial distribution for samples of size 12 with P = 0.75

No. of Rats That Die | Probability
0 | 0.0000001
1 | 0.0000021
2 | 0.0000354
3 | 0.0003541
4 | 0.0023898
5 | 0.0114713
6 | 0.0401494
7 | 0.1032415
8 | 0.1935777
9 | 0.2581036
10 | 0.2322932
11 | 0.1267054
12 | 0.0316764
Total | 1.0000000

From table 11 the total of the probabilities for 0, 1, 2, 3, 4, 5, 6 and 12 rats dying is 0.0860786; since this value exceeds 0.05, the hypothesis that the value of P for the parent population is 0.75 is not rejected.

Normal approximation to the binomial distribution. Significance tests which use the binomial distribution to test hypotheses concerning sample proportions are, at best, rather involved procedures. It would be well to have some quicker and more easily used procedure for making the tests. Such a test is available from the fact that the binomial distribution can be approximated by the normal distribution. The only parameters necessary in setting up a normal distribution are the population mean and the population variance. Thus, equations 5.4 and 5.5 are used to set up the normal approximation for the binomial distribution when considering proportions of class A members, and equations 5.6 and 5.7 when considering the number of class A members.

How well the normal distribution approximates the binomial distribution depends on two factors, namely, the value of the parent population proportion of class A members, P, and the size of sample, n. When the value of P is near or equal to 0.5, the normal distribution is quite close to the binomial distribution even for small samples; however, as P departs from 0.5, the size of sample necessary to obtain a good fit increases. No attempt will be made to give any criterion for the size of sample necessary for the use of the normal approximation to the binomial distribution, since what would be considered a good fit depends upon the consequences of discrepancies between the binomial distribution and its normal approximation.

One other point to be considered is the fact that while the binomial distribution is discrete (it has values only at 0, 1, 2, ..., n), the normal curve that approximates it has a continuous distribution. The probability for a discrete point in the binomial distribution is approximated by the probability for an interval in the normal distribution; that is, the height of a bar in a bar graph is approximated by the area under a frequency curve in an interval. The method of handling this problem, as suggested by Yates (24), is quite straightforward.


If the normal approximation for the probability of k members of class A in a sample is required, the area under the normal curve between (k − ½) and (k + ½) is used. Thus, for the probability of 6 animals dying in a sample of size 12 from a parent population with P = 0.75, use the area under the normal curve with μ = 9 (μ = nP = 9) and σ = 1.5 (σ² = nPQ = 2.25) between 5.5 and 6.5 (see figure 7). Two special cases need mentioning: if k is 0, use the area for everything less than ½ (0 + ½), and for k = n, use the area for everything exceeding n − ½.

Figure 7. Normal approximation to the binomial distribution for samples of size 12 with P = 0.75. (Abscissa: number of class A members in sample; ordinate: frequency.)

Normal approximation test of a single sample proportion. The significance test used for a single sample proportion depends on whether the number of class A members in the sample or the proportion of class A members in the sample is being considered. For either, the normal approximation is used; however, the mean and variance will depend on whether numbers or proportions are considered.

The u-test (not the t-test) is used for significance tests of a sample proportion. First consider tests that deal with the number of class A members in a sample. If the observed number of class A members in the sample, k, is less than nP, use

u = (k + 1/2 − nP) / √(nPQ)    (5.9)

and if k is greater than nP, use

u = (k − 1/2 − nP) / √(nPQ)    (5.10)

The value ½ in equations 5.9 and 5.10 is called the correction for continuity and arises from the fact that the interval between (k − ½) and (k + ½) is used to approximate the binomial probability.

This test can be illustrated with the example used for the test of a sample proportion using the binomial distribution. In the example, 6 rats died in a sample of 12 animals, and the hypothesis was that P = 0.75. Since the mean from the hypothesis is 9 (equation 5.8) and the observed number of deaths, 6, is less than the mean, equation 5.9 tests the hypothesis. Thus,

u = (6 + 1/2 − 9) / √((12)(0.75)(0.25)) = −2.5/1.5 = −1.667    (5.11)

The critical region for the u-test with a 5 per cent significance level will be values less than −2 and values exceeding +2 (see equations 4.8 and 4.9). As the value of u obtained does not fall into the critical region, the hypothesis that the probability of death is 0.75 is not rejected, the same result as in the binomial test. Some idea of how well the probabilities for the two tests agree can be gathered if the probability associated with the value of u obtained is determined. The probability obtained from the binomial test was 0.0861 (table 11); using the National Bureau of Standards tables of the normal distribution (25), we see that the probability associated with a value of u of −1.667 is 0.0955.

Now, tests that deal with the proportion of class A members in the sample can be considered. Here, again, a correction for continuity should be included; thus, if the sample proportion is less than the proportion according to the hypothesis, P, use

u = [(k + 1/2)/n − P] / √(PQ/n)    (5.12)

If the sample proportion is greater than P, use

u = [(k − 1/2)/n − P] / √(PQ/n)    (5.13)

The values of u which will be obtained using these equations will be identical with the values of u obtained using equations 5.9 and 5.10, since 5.12 and 5.13 may be obtained by dividing the numerator and denominator of equations 5.9 and 5.10 by n.
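The count and proportion forms of the u-test, and the agreement between the exact binomial probability (0.0861) and its normal approximation (0.0955), can be sketched in Python. This is a modern illustration, not part of the original article; the function names are ours, and the normal probability is computed from the error function rather than read from the NBS tables.

```python
from math import comb, erf, sqrt

def u_from_count(k, n, P):
    """u-test for the number of class A members (equations 5.9 and 5.10);
    the boundary case k exactly equal to nP is not considered here."""
    correction = 0.5 if k < n * P else -0.5
    return (k + correction - n * P) / sqrt(n * P * (1 - P))

def u_from_proportion(k, n, P):
    """u-test for the sample proportion (equations 5.12 and 5.13)."""
    correction = 0.5 if k / n < P else -0.5
    return ((k + correction) / n - P) / sqrt(P * (1 - P) / n)

u1 = u_from_count(6, 12, 0.75)        # equation 5.11
u2 = u_from_proportion(6, 12, 0.75)   # identical value, by algebra

# Two-tailed probability from the normal approximation.
phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))  # standard normal c.d.f.
p_normal = 2 * (1 - phi(abs(u1)))

# Exact two-tailed binomial probability: outcomes deviating 3 or more
# from the mean of 9, i.e., 0-6 and 12, as in table 11.
p_binomial = sum(comb(12, k) * 0.75 ** k * 0.25 ** (12 - k)
                 for k in range(13) if abs(k - 9) >= 3)

print(u1, u2, p_normal, p_binomial)
```

Both forms give u ≈ −1.667; the exact tail probability is about 0.0861 and the normal approximation is close to the 0.0955 read from the tables, so neither test rejects at the 5 per cent level.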
This u-test can also be illustrated with the data of the previous example. Here, P = 0.75 and

1955]

STATISTICAL CONCEPTS IN MICROBIOLOGY


TABLE 12

199

the sample proportion, 0.5 (6/12), which is less than 0.75, therefore equation 5.12 is used: 6 + 1/2 - 0.75
- 0.541667 - 0.75 u _______

Data from a hypothetical test of a purification

procedure for a toxin*


Test Material

1275)(.25)

N/'.-015-6

(5.14)

Toxic N~ Total ~ Propoto Of Toxi Reac- ~~~~ox tion tinReactions

-0.208333 1.667 0.12 This is identical with the value of u obtained in the previous test (see equation 5.11), therefore the decision will be the same.

Original material.. 4 Purified product ... 12 Total .16


_

8
2

12
14

0.333
0.857 0.615

10

26

* Not corrected for continuity.

Normal approximation test for two sample proportions. The extension of the u-test to the case of two sample proportions is quite straightforward. The correction for continuity is made by subtracting 1/2 from the numerator of the larger sample proportion and adding 1/2 to the numerator of the smaller sample proportion; also, a pooled estimate of the variance (similar to the pooled estimate of the common variance in the t-test for two sample means) is used. This pooled estimate of the variance comes from the hypothesis, which is usually that the parent population proportions are equal. If the parent population proportions are equal, the best estimate of the variance will come from the combined sample estimates. The following notation is needed for the purpose of setting up the u-test. Let

nL = the size of the sample with the larger sample proportion
kL = the number of class A members in the sample with the larger sample proportion
nS = the size of the sample with the smaller sample proportion
kS = the number of class A members in the sample with the smaller sample proportion

Then

pL = (kL - 1/2)/nL, the proportion of class A members in the sample with the larger sample proportion, corrected for continuity
pS = (kS + 1/2)/nS, the proportion of class A members in the sample with the smaller sample proportion, corrected for continuity
p = (kL + kS)/(nL + nS), the pooled estimate of the common parent population proportion (if the hypothesis is true)

and the test is

    u = (pS - pL)/sqrt(pq/nL + pq/nS)    (5.15)

where q = 1 - p.

The test can be illustrated with a hypothetical check on a purification procedure for a toxin; for example, starting with some original material containing a toxin, suppose we are attempting to isolate the toxin in pure form. Now, after one or more stages in the isolation technique, the purified product is tested to determine if the toxicity has increased, which would indicate that the isolation technique was succeeding. To test the toxicity of the starting material and the purified product, we could find whether the proportion of animals showing a toxic reaction (or death) with the original material is the same as the proportion for animals given the purified product. Suppose a test is made on the two materials with the results summarized in table 12. Now, the proportion of toxic reactions is larger for the purified product, therefore

    pL = (12 - 1/2)/14 = 11.5/14 = 0.821
    pS = (4 + 1/2)/12 = 4.5/12 = 0.375
    p = (12 + 4)/(14 + 12) = 16/26 = 0.615

    u = (.375 - .821)/sqrt((.615)(.385)/12 + (.615)(.385)/14)
      = -.446/sqrt(.019731 + .016912)
      = -.446/sqrt(.036643)
      = -.446/.191
      = -2.34    (5.16)
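The arithmetic of equations 5.15 and 5.16 can be collected into a short routine. The sketch below (the function name and argument order are my own) applies the continuity correction and the pooled variance exactly as described above:

```python
from math import sqrt

def u_two_proportions(k_large, n_large, k_small, n_small):
    """u-test for two sample proportions, with the correction for continuity."""
    p_l = (k_large - 0.5) / n_large   # larger proportion, 1/2 subtracted
    p_s = (k_small + 0.5) / n_small   # smaller proportion, 1/2 added
    p = (k_large + k_small) / (n_large + n_small)   # pooled estimate
    q = 1 - p
    return (p_s - p_l) / sqrt(p * q / n_large + p * q / n_small)

# Table 12: purified product, 12 toxic of 14; original material, 4 toxic of 12.
u = u_two_proportions(12, 14, 4, 12)
print(round(u, 2))  # → -2.33; the text's -2.34 reflects rounding at intermediate steps
```

Carried to full precision, u is -2.33 rather than the -2.34 obtained above with rounded intermediates; either way it falls in the 5 per cent critical region.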

ROEBERT L. STEARMAN    [VOL. 19

TABLE 13
Data from table 12 corrected for continuity

Test Material        Toxic Reaction   No Toxic Reaction   Total   Proportion of Toxic Reactions
Original material         4.5                7.5            12              0.375
Purified product         11.5                2.5            14              0.821
Total                    16                 10              26              0.615

The value of u obtained falls in the critical region for the 5 per cent significance level (u is less than -2), so the hypothesis that the parent population proportions are equal is rejected. Since the sample proportion of toxic reactions is greater for the purified product, there is a statistically significant rise in the proportion of toxic reactions between the original material and the purified product.

A table of the type shown in table 12 is called a 2 x 2 contingency table, because the original data can be shown in a table consisting of two rows and two columns. The totals shown in the third row and the fourth column are called the marginal totals. Note that the proportion shown in the third row of the fifth column is the pooled estimate of the common population proportion. However, the two sample proportions are not the ones that are used in the u-test, since the table does not contain the correction for continuity. A table that contains the correction for continuity can be set up by subtracting 1/2 from the numerator of the larger proportion and adding or subtracting 1/2 from the remaining numbers in the 2 x 2 portion of the table in such a way as to maintain the marginal totals of the original data. An example of such a table is table 13, which shows the data in table 12 corrected for continuity; the values of pL, pS, and p which are needed in the u-test appear in the fifth column of the table.

Normal approximation test for more than two sample proportions. Some problems may lead to a test of more than two sample proportions; for example, a check of the toxicity of the toxin produced by a given organism under different conditions of temperature or pH or when cultured in different media, or a test of the proportion of positive reactions to tuberculin produced by different laboratories or by different processes. If we have more than two samples, say c samples, we can set the data up in the form of a contingency table as shown in table 14. When a significance test on more than two sample proportions is made, the chi-square test is used instead of the u-test, and the correction for continuity is dropped. Yates (24) suggested that for contingency tables larger than the 2 x 2 table, there appears to be less need for the correction for continuity. The chi-square test presents a method of comparing the observed number of members in each of the classes for each of the samples (we can denote the observed number by O) with the number of class members which would be expected if the hypothesis to be tested were true (we denote the expected number by E). The usual hypothesis is that the proportion of members in each class is the same in the parent populations from which the samples were drawn. If this hypothesis is true, the pooled estimates of the proportions (p and q) will be the best estimates of the proportions which are common to the parent populations. These estimates are obtained by dividing the total number of members for the respective classes by the total number of members in the samples, as shown in table 14. Now, if the hypothesis is true, the expected proportions in each of the samples will be the same as the pooled estimates of the common proportions; thus, the expected proportion of class A members in each sample will be p and

TABLE 14
General form of a 2 x c contingency table (two rows by c columns)

Class    Sample 1   Sample 2   ...   Sample c   Total         Pooled Estimates
A        k1         k2         ...   kc         Σk            p = Σk/Σn
Ā        n1 - k1    n2 - k2    ...   nc - kc    Σn - Σk       q = (Σn - Σk)/Σn
Total    n1         n2         ...   nc         Σn

1955]    STATISTICAL CONCEPTS IN MICROBIOLOGY

the expected proportion of class Ā members will be q. To obtain the expected numbers for each sample, multiply the number of observations for the sample by p and q in turn. Thus, the expected number of class A members in any sample, say the jth sample, will be the product of p and the number of observations in the sample, or

    E(A) = nj p = nj(Σk)/Σn

Similarly, the expected number of class Ā members in any sample will be

    E(Ā) = nj q = nj(Σn - Σk)/Σn

Having obtained the expected values for the various cells in the contingency table, we are now ready to set up the test. This test is based on the fact that the quantity

    χ² = Σ (O - E)²/E    (5.17)

follows the chi-square distribution with c - 1 degrees of freedom. The test is one-tailed; the procedure for obtaining chi-square is to square the difference between the observed and expected values, divide the resulting squared term by the expected value for each cell, and then add up the resulting terms over all of the cells.

The test will be illustrated with a hypothetical example of toxin production by a given organism on each of three different media. On testing the three preparations on a group of animals, suppose the results given in table 15 are obtained. Since there are three samples, chi-square will have 3 - 1 or 2 degrees of freedom. The value of χ².95 for 2 degrees of freedom is 5.99, therefore our critical region for the test will be all values of chi-square which exceed 5.99. The details of the computations are shown in table 15.

TABLE 15
Test of data from a hypothetical experiment on toxin production by an organism on three media

                  Medium 1   Medium 2   Medium 3   Total
I. Observed values (O)
A                    3          11          9        23*
Ā                   12           7          8        27*
Total               15          18         17        50
Proportion A        0.20        0.61       0.53

II. Expected values (E)
A                    6.90        8.28       7.82     23.0
Ā                    8.10        9.72       9.18     27.0
Total               15.0        18.0       17.0

III. Differences (O - E)
A                   -3.90       +2.72      +1.18     0
Ā                   +3.90       -2.72      -1.18     0

IV. (O - E)²/E
A                    2.2043      0.8935     0.1781
Ā                    1.8778      0.7612     0.1517
                                          χ² = 6.0666

A = toxic reaction. Ā = nontoxic reaction.
* Pooled estimate for A = 0.46; for Ā = 0.54.

Since the value of chi-square obtained in the test falls into the critical region, the hypothesis that the toxin production is the same for all three

media is rejected.

Graphical methods for tests of sample proportions. Mosteller and Tukey (29) have developed a graphical method for testing hypotheses concerning one or more sample proportions. The method, being graphical, is necessarily somewhat crude; however, it is sufficiently precise for many problems and provides a quick and easy method for testing such hypotheses.

Problems of Estimation

Using the normal approximation, approximate confidence intervals can be obtained for the parent population proportions from sample estimates. The concepts of application of confidence intervals to the normal approximation to the binomial distribution are the same as those for the normal distribution. With the normal approximation to the binomial distribution, the u-test is used to derive the confidence intervals instead of the t-test as was done with the normal distribution.

Confidence interval for a population proportion. Generally, no correction is made for continuity in setting up the confidence intervals for the normal approximation to the binomial distribution. If the confidence interval is derived as was done for the normal distribution (equations 4.62 through 4.66), the following confidence interval for a population proportion with a confidence coefficient of 95 per cent is obtained:

    p - u.975 sqrt(pq/n) < P < p - u.025 sqrt(pq/n)    (5.18)

where p and q are the sample estimates of P and Q. Substituting the values of u in inequality 5.18,

    p - 2 sqrt(pq/n) < P < p + 2 sqrt(pq/n)    (5.19)

The confidence interval for a confidence coefficient of 99 per cent is determined by substituting 2.576 for 2 in inequality 5.19. In this inequality, pq/n is the sample estimate of the variance. One point to be noted is that inequality 5.19 is applicable whether p and q are given in terms of proportion or percentage.

The use of the confidence interval given in equation 5.19 can be illustrated with a hypothetical example. With a new toxin preparation, 3 out of 6 animals given the toxin showed a toxic reaction. What is the proportion of animals in the population that would show a toxic reaction? Fifty per cent of the animals in the test showed a toxic reaction, so the confidence interval would be

    50 - 2 sqrt((50)(50)/6) < P < 50 + 2 sqrt((50)(50)/6)
    50 - (2)(20.4) < P < 50 + (2)(20.4)    (5.20)
    50 - 40.8 < P < 50 + 40.8
    9.2% < P < 90.8%

In addition to illustrating the use of inequality 5.19, this example also serves to emphasize that sample estimates of parent population proportions are quite imprecise when the sample size is small. This point is also illustrated in table 16, which shows the 95 per cent confidence limits for samples of various sizes in which 50 per cent of the sample members are from class A.

TABLE 16
Ninety-five per cent confidence limits for samples of various sizes with p = 0.5

Sample Size   Lower Limit (%)   Upper Limit (%)
     4              0                100
     6              9.2               90.8
    10             18.4               81.6
    20             27.6               72.4
    50             35.8               64.2
   100             40.0               60.0
   250             43.6               56.4

Sample size and power in significance tests involving proportions. The effect of the imprecision of sample estimates of proportions based on small samples shows up not only in the confidence intervals, but also in tests of hypotheses. The imprecision of the sample estimates from small samples serves to decrease the power of the significance tests. When this happens, there exists a greater probability of accepting a hypothesis that is actually wrong. This fact can be illustrated by the determination of the sample sizes necessary to obtain a given power for tests of significance between samples taken from two populations with specified proportions. Davis and Zippin (30) have given charts which may be used to determine sample sizes for sampling from two populations with specified proportions for powers of the tests of 50 and 80 per cent. These two charts are given in figures 8 and 9. The method of finding the size of sample, as given by Davis and Zippin, is:

1. Find the vertical line whose value is that of the smaller of the two population percentages being compared.
2. Follow this vertical line up until it crosses the curved line corresponding to the other percentage.
3. From the point of intersection, read horizontally to determine the value of the horizontal line on which the intersection occurs. This value will be the size required for each sample.

To illustrate the use of the charts of Davis and Zippin, suppose an industrial firm is producing an antibiotic which, from past experience, seems to be effective in 50 per cent of cases of a particular type of disease. Further, a research unit of the firm has produced a new antibiotic which they wish to test against the antibiotic now in production to learn whether it will be economically feasible to place the new antibiotic in production. The cost of production of the new antibiotic is such that it will be economically feasible only if the new product will be effective in at least 75 per cent of the cases. The problem is to determine the size of the samples for the test necessary to detect a difference in proportions of this magnitude. The smaller of the two population percentages is 50 per cent, so in figure 8 find the vertical line corresponding to this value, then follow this vertical line to the point where it crosses the line corresponding to the other population percentage, 75 per cent. From the point of intersection, which is

[Figures 8 and 9 appear here. Each chart plots per cent in one sample against per cent in the other population, with curves giving the number of animals required in each sample.]

Figure 8. Chart for estimating number of animals required in each sample for statistical significance between percentages. Significance level, 5 per cent; power, 50 per cent. (Figures 8 and 9 originally appeared in The Journal of Wildlife Management (30). These figures are reproduced by permission of Dr. David E. Davis and the editor of that journal.)

Figure 9. The corresponding chart for a power of 80 per cent.

on one of the horizontal lines, read horizontally in either direction to determine the size of the samples-here, the size is 30. Thus, two samples each of size 30 are necessary for the test. Now, the chart used was that for a power of 50 per cent, i.e., if the sample size is 30 for each sample, and if the difference between the two population percentages is that given, then in 50 per cent of such tests the conclusion will be reached that there is a significant difference between the two population proportions using a test with a 5 per cent level of significance. On the other hand, this means that there is a 50 per cent chance that even though there is the required difference between the two population proportions, the significance test on the two samples will lead to the conclusion that there is no significant difference between the two products, and will lead to the dropping of the new product. The research group would probably wish more assurance that their labors will not go unheeded if indeed they have come up with a new antibiotic which meets the necessary specifications, so let us see what happens to the sample size when the power of the test is increased from 50 to 80 per cent. Using figure 9 to determine the sample size, we proceed as before; this time the point of intersection of the two lines is on the horizontal line corresponding to a sample size of 60. Thus, to increase the power of the test from 50 to 80 per cent the size for each of the two samples must be doubled. This example shows that increasing the power of a test requires increasing the sample size. Examination of the charts in figures 8 and 9 also shows that as the size of difference between the population proportions to be detected diminishes, the size of sample becomes increasingly large.
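The 50 per cent power figure read from the chart can be checked by simulation. The sketch below is a rough Monte Carlo (using the u-test without the correction for continuity, so the rejection rate runs slightly higher than a corrected test would give): it draws many pairs of samples of size 30 from populations with true percentages of 50 and 75 and counts how often the 5 per cent level test rejects.

```python
import random
from math import sqrt

def rejects(n1, p1, n2, p2, rng):
    # Draw one pair of binomial samples and apply the two-sided
    # u-test at the 5 per cent significance level.
    k1 = sum(rng.random() < p1 for _ in range(n1))
    k2 = sum(rng.random() < p2 for _ in range(n2))
    p_pool = (k1 + k2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    u = (k1 / n1 - k2 / n2) / se
    return abs(u) > 1.96

rng = random.Random(1)
trials = 20000
power = sum(rejects(30, 0.50, 30, 0.75, rng) for _ in range(trials)) / trials
print(round(power, 2))  # lands near the chart's nominal 0.50
```

The simulated rejection rate comes out close to one-half, in line with the 50 per cent power read from figure 8 for samples of 30.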

Other methods for a confidence interval for a population proportion. Other methods are also available for obtaining a confidence interval for a single population proportion. Clopper and Pearson (31) presented charts which can be used to obtain the exact (not normal approximation) confidence interval for a population proportion. Snedecor (32) gives tables based on the Clopper-Pearson charts. The binomial probability graph paper of Mosteller and Tukey (29) can also be used to obtain a confidence interval for a population proportion.

Confidence interval for the difference between two population proportions. The normal approximation is used to obtain a confidence interval for the difference between two population proportions, again with no correction for continuity. Since a difference between the population proportions is assumed, the pooled p and q are not used in estimating the variance. If the confidence interval is derived as before:

    p1 - p2 - (u.975)(s_{p1-p2}) < P1 - P2 < p1 - p2 - (u.025)(s_{p1-p2})    (5.21)

where

    s_{p1-p2} = sqrt(p1q1/n1 + p2q2/n2)    (5.22)

The confidence interval for the difference between two population proportions can be illustrated with the data in table 12, which were used to illustrate the u-test for the difference between two proportions. The conclusion reached was that there was a significant difference between the two proportions, so now the next step in the process can be taken and an estimate of the difference obtained. Since the data are not to be corrected for continuity, table 12 is used rather than table 13. From table 12,

    p1 = 0.333 (q1 = 0.667), n1 = 12
    p2 = 0.857 (q2 = 0.143), n2 = 14

therefore

    s_{p1-p2} = sqrt((0.333)(0.667)/12 + (0.857)(0.143)/14) = 0.165    (5.23)

Thus,

    0.333 - 0.857 - 2(0.165) < P1 - P2 < 0.333 - 0.857 + 2(0.165)
    -0.524 - 0.330 < P1 - P2 < -0.524 + 0.330    (5.24)
    -0.854 < P1 - P2 < -0.194

That is, the purification increased the proportion of toxic reactions by something between 0.194 and 0.854, or between 19.4 and 85.4 per cent.

VI. THE POISSON DISTRIBUTION

The third and final of the basic distributions of statistics to be discussed is the Poisson distribution. The discussion of the Poisson distribution along with its history and applications was ably handled by Eisenhart and Wilson (1, pages 62 to 92); therefore, the treatment here will be held to a minimum.


Derivation of the Poisson Distribution

There are two major methods for deriving the Poisson distribution. In one, the Poisson distribution becomes the limit of the binomial distribution as n becomes large and at the same time P becomes small in such a way that the mean, μ = nP, remains finite. In the other, if events or items are randomly distributed in time or space, then the number of events or items in samples taken with respect to time or space will have a Poisson distribution.

Limit of the binomial distribution. It has already been pointed out that the binomial distribution can be approximated by the normal distribution, and also that this approximation works well when the value of P is near or equal to 50 per cent and when the size of sample is large. On the other hand, if the proportion, P (or Q), of class A members in the population becomes small, say less than 0.01, and if we take large samples (of size n) so that the mean number of class A members, μ = nP, is some small number, then the distribution of the number of class A members in the population of all possible samples can be approximated by the Poisson distribution. The Poisson distribution, like the binomial distribution, is the distribution of sample outcomes. The probabilities for the various possible outcomes are given in table 17. The expression for the probability of k class A members in a sample can be simplified by the use of the notation k! = (1)(2)(3)···(k), where k! is read factorial k. As an example, 5! is (1)(2)(3)(4)(5) = 120. Then, the probability of k class A members in a sample is

    (nP)^k/k! · e^(-nP)    (6.1)

(by definition, 0! = 1).

TABLE 17
Poisson distribution

No. of Class A Members in Sample   Probability for Sample
0                                  e^(-nP)
1                                  nP e^(-nP)
2                                  (nP)²/(1)(2) · e^(-nP)
3                                  (nP)³/(1)(2)(3) · e^(-nP)
...                                ...
k                                  (nP)^k/(1)(2)···(k) · e^(-nP)

Items or events randomly distributed in time or space. As stated before, if events or items are randomly distributed in time or space, then the number of events or items in samples taken with respect to time or space will have a Poisson distribution. An example of items in space would be bacteria in a liquid such as milk or water. If the bacteria are randomly distributed in the liquid, then the number of bacteria in samples of a given volume, v, will have a Poisson distribution and the probability of obtaining k bacteria in a sample will be

    (λv)^k/k! · e^(-λv)    (6.2)

where the lower case Greek letter lambda (λ) is the density or average number of bacteria per unit volume of the liquid. An example of events randomly distributed in time would be cosmic rays. Here the number of cosmic rays counted in intervals of given length of time, t, will have a Poisson distribution and the probability of obtaining k cosmic rays in an interval of time will be

    (λt)^k/k! · e^(-λt)    (6.3)

Here, lambda is the average number of cosmic rays per unit time.

General form of probability for Poisson distribution. All expressions for the probability of k events or items in a sample can be given in one form, namely

    Probability of k events in sample = μ^k/k! · e^(-μ)    (6.4)

where μ is the mean number of events. The previous three expressions for the probability (6.1, 6.2, 6.3) can be obtained by appropriate substitution for μ. In the limit of the binomial, the mean number of class A members was nP. For the bacteria randomly distributed in a liquid, the mean number of bacteria for a given


volume, v, will be λv. For the cosmic rays, the mean number of rays for a time interval, t, will be λt. Thus, all expressions for the probability of k items or events in a sample can be lumped into one common expression, given by equation 6.4. Tables of the probabilities are given by Molina (33) and Kitagawa (34).

Mean and variance of the Poisson distribution. The mean and the variance of the binomial distribution are related; that is, if the proportion of class A members is considered, the mean is P and the variance is PQ/n. The relationship of the mean and variance of the Poisson distribution is even more striking; both the mean and the variance are equal to μ. The standard deviation of the Poisson distribution will be sqrt(μ). Randomness is essential in the distribution of the events or items in time or space to obtain the Poisson distribution. If the events or items are not random, the mean of all possible samples will still remain μ, but the variance will not be μ. The effect of non-randomness on the variance will be discussed in the subsection on applications of the Poisson distribution.
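Equation 6.4 and the equality of mean and variance are easy to verify numerically. In this sketch the probabilities are computed from equation 6.4 for an arbitrary mean (μ = 2.5, chosen only for illustration) and the mean and variance of the resulting distribution are summed directly:

```python
from math import exp, factorial

def poisson_prob(k, mu):
    # Equation 6.4: probability of k events when the mean number of events is mu.
    return mu ** k * exp(-mu) / factorial(k)

mu = 2.5
ks = range(60)  # far enough into the tail that the remainder is negligible
probs = [poisson_prob(k, mu) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
# mean and variance both come out equal to mu, as the text states
```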

Applications of the Poisson Distribution

Bacterial counts by counting chamber. The Poisson distribution for the number of bacteria per square in a counting chamber will be obtained if the bacteria are randomly distributed in the chamber. This follows from considering the volume of liquid above each square to be the sample of volume, v, in the example of bacteria in a liquid. Certain conditions must be met before the distribution of the bacterial cells in the chamber will be random: (a) cells should not repel one another, or else there must be sufficient space among the cells so that the repelling effect will be negligible; (b) the volume of the cells, relative to the volume of liquid in which they are suspended, should be small; (c) there should be no clumping of cells.

If the repelling effect among the cells is not negligible, or if the relative volume of the cells is too high, the cells in the chamber will tend to be uniformly or homogeneously distributed rather than random. That is, the numbers obtained will tend to be closer to the mean number than would be expected if the cells were randomly distributed. Here, the mean number of cells per square will remain the same, as long as the crowding is not great enough to cause some of the cells to cover others. However, the variance will be smaller than that of the Poisson distribution, since the observed numbers are clustered more closely about the mean than they would be in a Poisson distribution. If there is clumping of the cells, the numbers obtained will tend to be more divergent from the mean number than we would expect from a Poisson distribution. That is, there will be too many squares with relatively large numbers of cells and too many squares with few cells. Here, if the clumping is sufficient to cause some of the cells to be hidden under others, the mean will be decreased. Also, the clumping will bring about an increase in the variance due to the divergence of the observed numbers from the mean.

Bacterial counts by plate method. The Poisson distribution will be applicable to this count if the bacteria are randomly distributed in the liquid being sampled. Also, if the repelling effect among the cells is not negligible or if the relative volume of the cells is too high, the mean count will remain the same but the variance of the counts will be less than that expected from a Poisson distribution. If there is clumping, the additional complication obtains of a clump of bacteria giving rise to a single colony. Thus, again, the same conditions imposed for bacterial counts by the chamber method must be met.

Bacterial counts from dilution series. The estimation of bacterial densities by dilution series is one of the oldest applications of statistics to microbiology. The estimate obtained is known as the Most Probable Number (MPN). Tables for determining the MPN, published by the American Public Health Association (35, pages 220 and 221), are set up for 10-fold dilutions for certain combinations of 5 and 10 tubes per dilution.

Cochran (36) published a table, given here as table 18, which can be used to obtain confidence limits for the number of bacteria from a MPN, for dilution ratios of 2, 4, 5, and 10. The lower confidence limit is obtained by dividing the MPN by the factor and the upper confidence limit is obtained by multiplying the MPN by the factor. The use of Cochran's table can be illustrated by obtaining confidence limits for a value in one of the A. P. H. A. tables (35, page 221). This table is set up for five tubes per dilution with a dilution ratio of 10, thus, the

factor from Cochran's table will be 3.30. Now, suppose that 4 out of 5 tubes containing 0.1 ml of the liquid being tested showed growth, 3 out of 5 tubes containing 1 ml showed growth, and 5 out of 5 tubes containing 10 ml showed growth. The value of the Most Probable Number will be 59 per ml. The upper confidence limit will be (59)(3.30) = 194.7 and the lower confidence limit will be 59/3.30 = 17.9. Thus we see that our estimate of the number of bacteria per ml of the liquid will be between 17.9 and 194.7.

TABLE 18*
Factors for confidence limits for most probable number

No. of Samples      Factor for 95 per cent confidence limits
per Dilution                     Dilution ratio
                       2        4        5        10
 1                    4.00     7.14     8.32     14.45
 2                    2.67     4.00     4.47      6.61
 3                    2.23     3.10     3.39      4.68
 4                    2.00     2.68     2.88      3.80
 5                    1.86     2.41     2.58      3.30
 6                    1.76     2.23     2.38      2.98
 7                    1.69     2.10     2.23      2.74
 8                    1.64     2.00     2.12      2.57
 9                    1.58     1.92     2.02      2.43
10                    1.55     1.86     1.95      2.32

* This table originally appeared in Biometrics, Volume 6, page 115, 1950, and is reproduced here with the permission of Professor William G. Cochran and the editor of that journal.

Finney (37) has given a computational method for obtaining the Most Probable Number which has certain advantages: (a) it is readily computed for any dilution ratio; (b) the number of tubes need not be the same for all dilutions (this would be especially useful in the event of breakage through mishap); (c) confidence limits can be obtained directly from the computations.

VII. ACKNOWLEDGMENTS

I am indebted to my wife, Barbara D. Stearman (Technician in Charge, Diagnostic Bacteriology Laboratory, The Johns Hopkins Hospital), for her encouragement and for serving as a microbiologist reader of the first draft of this review, as well as to Professor William G. Cochran and Doctor Margaret Merrell (Department of Biostatistics, The Johns Hopkins University) for checking the first draft for statistical content and readability. Thanks are due to the students and staff of the School of Hygiene and Public Health, The Johns Hopkins University, as well as to various persons in the fields of microbiology and statistics for their criticisms of the "Ditto" copies of the manuscript. The "Ditto" copies of the manuscript, as well as the final publication of this review, would not have been possible without the patient help of Miss Virginia Brooke Thompson and Mrs. Adrienne Holland.

VIII. APPENDIX

There are two slightly different computing tables used in the analysis of variance. One table will be used when there are an equal number of observations in each of the samples, and another table will be used when the numbers of observations are not equal for the different samples. The method for samples of equal size can be illustrated using the data in table 5. A few new notations will be required to set up the analysis of variance computing table.

Starting with data from an experiment with k treatments and n observations per treatment (in the example, k is 3 and n is 10), let x_ij be the ith observation from the jth treatment; thus, i will run from 1 to n and j will run from 1 to k. For example, if i is 5 and j is 3, this would give x_53, which would be the fifth observation from the third treatment (in the example this would be the fifth observation in the sample from the control, or 127). The total of the observations for the jth treatment will be denoted by T.j. Thus, T.j will be the sum of all of the observations in the sample from the jth treatment, or

    T.j = Σ(i=1 to n) x_ij = x_1j + x_2j + ··· + x_nj    (8.1)

In the example, T.2 would be the sum of the observations in the second treatment, or 1,285. Let T.. stand for the grand total of all of the observations in the experiment; thus,

    T.. = Σ(j=1 to k) Σ(i=1 to n) x_ij = Σ(j=1 to k) T.j = T.1 + T.2 + ··· + T.k    (8.2)
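Anticipating tables 19 and 20, the whole computing table can be sketched in a few lines. The treatment totals (756, 1,285, 1,317) and the sum of the squared observations (398,408) are the values from the worked example; everything else follows from the formulas:

```python
k, n = 3, 10                       # treatments, observations per treatment
T_j = [756, 1285, 1317]            # treatment totals T.j
C = 398408                         # sum of the squared observations
T = sum(T_j)                       # grand total T.. = 3,358

A = T ** 2 / (n * k)               # correction term, T..^2 / nk
B = sum(t ** 2 for t in T_j) / n   # treatment line, (sum of T.j^2) / n

ss_treat, df_treat = B - A, k - 1
ss_obs, df_obs = C - B, n * k - k
F = (ss_treat / df_treat) / (ss_obs / df_obs)
print(round(F, 2))  # → 99.89, agreeing with table 20
```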

TABLE 19
Analysis of variance computing table for samples with equal numbers of observations

Part 1: Preliminary calculations

(1) Source of Variation   (2) Total of Squares   (3) No. of Items Squared   (4) Observations per Squared Item   (5) Total of Squares per Observation, (2) ÷ (4)
Correction*               T..²                   1 = a                      nk                                  T..²/nk = A
Treatments                ΣT.j²                  k = b                      n                                   ΣT.j²/n = B
Observations              ΣΣx_ij²                nk = c                     1                                   ΣΣx_ij² = C

Part 2: Analysis of variance table

(6) Variation Due to:   (7) Sum of Squares (S.S.)   (8) Degrees of Freedom (d.f.)   (9) Mean Square, (7) ÷ (8)   (10) F
Treatments              B - A                       b - a                           (B - A)/(b - a) = M.S.T      M.S.T/M.S.O
Observations            C - B                       c - b                           (C - B)/(c - b) = M.S.O
Total                   C - A                       c - a

* The correction term does not constitute a source of variation.

The reader should note that in this notation, a dot in the subscript of a T shows that the letter replaced by the dot was the letter on which summation took place to obtain the total. Starting with x_ij, the treatment total, T.j, is obtained when summation takes place on i while j remains constant; therefore the letter i is replaced by a dot in the subscript of T. The grand total, T.., results from the summation on both i and j; therefore both letters are replaced by dots. The summation notation can be simplified by using only i or j as an index to the summation sign, since the limits of both i and j are known; thus

    T.j = Σ_i x_ij    (8.3)

and

    T.. = Σ_j Σ_i x_ij = Σ_j T.j    (8.4)

In the example, T.. would be the total of the treatment totals, or 3,358.

Table 19 gives the analysis of variance computing table for an experiment with equal numbers of observations. Table 20 illustrates the use of table 19 with the data from table 5. The analysis of variance computing table has two parts: the first part consists of the preliminary calculations, while the second part is the analysis of variance table. When the results of an analysis of variance are published, only the second part is placed in the publication.

TABLE 20
Analysis of variance computing table for the data in table 5

Part 1: Preliminary calculations

(1) Source of Variation   (2) Total of Squares   (3) No. of Items Squared   (4) Observations per Squared Item   (5) Total of Squares per Observation, (2) ÷ (4)
Correction*               11,276,164             1                          30                                  375,872.1
Treatments                3,957,250              3                          10                                  395,725.0
Plate counts              398,408                30                         1                                   398,408.0

Part 2: Analysis of variance table

(6) Variation Due to:   (7) Sum of Squares   (8) d.f.   (9) Mean Square   (10) F
Treatments              19,852.9             2          9,926.45          99.89
Plate counts            2,683.0              27         99.37
Total                   22,535.9             29

* The correction term does not constitute a source of variation.

The first line of the preliminary calculations is devoted to the correction term. The correction term does not usually constitute a source of

1955]  STATISTICAL CONCEPTS IN MICROBIOLOGY  209

variation; it is placed in the table for computational convenience. The second line is for computations involving the treatment totals, while the third line involves the observations of the experiment.

The second column contains the totals of squared terms. The first line contains the square of the grand total (3,358² = 11,276,164). The second line contains the sum of the squares of the treatment totals (756² + 1,285² + 1,317² = 3,957,250). The third line of the second column contains the sum of the squares of the observations (398,408).

The third column lists the number of items which were squared to obtain the entries in the second column. In the first line, this was a single item, the grand total; in the second line, each of the k totals, so the entry here is k (k = 3); in the third line, each of the nk observations in the experiment, so the entry is nk (nk = 30).

The fourth column lists the number of observations which make up each of the items that were squared. In the first line, the grand total is the sum of the nk observations (nk = 30). In the second line, each of the treatment totals was the sum of n observations (n = 10). In the third line, each of the observations has been squared, so the entry will be 1. Note that the product of an entry in column 3 multiplied by the respective entry in column 4 is always nk, the total number of observations. This fact gives a quick check on the entries in these columns; in the example, nk is 30, so the product will be 30.

The entries in the fifth column are obtained by dividing the entry in the same line of column 2 by the respective entry in the fourth column. The entries in the third and fifth columns have been designated by letters for convenience in setting up the entries in the analysis of variance table. The entries in the third column have been denoted by lower case letters, while the entries in column 5 have been denoted by capital letters.
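As an illustrative sketch (not part of the original paper), the whole computing table can be reproduced from the treatment totals quoted in the text and the sum of squared observations; the individual plate counts of table 5 are not needed for this purpose.

```python
# Sketch of the ANOVA computing table (equal sample sizes), using the
# totals quoted in the text for the table 5 data.
treatment_totals = [756, 1285, 1317]  # T.j for the k = 3 treatments
n = 10                                # plate counts per treatment
sum_sq_obs = 398_408                  # sum of xij**2 over all observations

k = len(treatment_totals)
grand_total = sum(treatment_totals)   # T.. = 3,358

# Part 1: preliminary calculations (column 5 entries A, B, C)
A = grand_total**2 / (n * k)                  # correction term
B = sum(t**2 for t in treatment_totals) / n   # treatments
C = sum_sq_obs / 1                            # observations

# Part 2: analysis of variance table
ss_treat, df_treat = B - A, k - 1
ss_obs,   df_obs   = C - B, n * k - k
ms_treat = ss_treat / df_treat
ms_obs   = ss_obs / df_obs
F = ms_treat / ms_obs                 # about 99.89, as in table 20
```

Within rounding, this reproduces every entry of table 20.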
Turning to the analysis of variance table, the entries in the seventh and eighth columns are obtained by simple subtractions involving the entries in columns 3 and 5 of the preliminary calculations. To obtain the treatment S.S., subtract A from B (395,725.0 - 375,872.1 = 19,852.9). To obtain the S.S. for the observations (the term error is often used in place of observations), subtract B from C (398,408.0 - 395,725.0 = 2,683.0). To obtain the total S.S.,

subtract A from C (398,408.0 - 375,872.1 = 22,535.9). The entries in column 7 represent the splitting of the total S.S. into two parts which represent the S.S. for the variation among the sample means (this is the treatment S.S.) and the S.S. for the variation within the samples (this is the observation S.S.).

The entries in the eighth column are obtained by performing the same operations on the lower case letters as those which were performed on the capital letters to obtain the entries in column 7. Thus, to obtain the treatment d.f., subtract a from b (3 - 1 = 2). To obtain the observation d.f., subtract b from c (30 - 3 = 27). Subtract a from c for the total d.f. (30 - 1 = 29). The entries in the eighth column represent the splitting of the degrees of freedom into two parts which represent the degrees of freedom for the variation among sample means (the treatment d.f.) and the degrees of freedom for variation within the samples (the observation d.f.).

Having obtained the S.S. and the d.f. values, we are now ready to find the mean squares, which will be the external and internal estimates of the common variance. The mean squares are entered in column 9 and are obtained by dividing the entry in the same line of column 7 by the respective entry in column 8 (the S.S. divided by the degrees of freedom). There is no need for the total mean square, so it will not be computed. The treatment mean square (M.S.T) is obtained by dividing the treatment S.S. by the treatment d.f. (19,852.9/2 = 9,926.45). The treatment mean square is equal to the external estimate ns²x̄ (see equation 4.58). The observation mean square is equal to the observation S.S. divided by the observation d.f. (2,683.0/27 = 99.37). The observation mean square (M.S.O) is equal to the internal estimate, or the pooled s² (see equation 4.53). Now since
M.S.T = ns²x̄

and

M.S.O = pooled s²

therefore, using equation 4.59,

F = ns²x̄ / pooled s² = M.S.T / M.S.O = 99.89    (8.5)

The degrees of freedom of F are found in column 8 and will be b - a degrees of freedom for the numerator and c - b degrees of freedom for the denominator. Thus the analysis of variance

TABLE 21
Analysis of variance computing table for samples with unequal numbers of observations

Part 1: Preliminary calculations

(1) Source of     (2) Total of   (3) No. of Items   (4) Observations     (5) Total of Squares per
    Variation         Squares        Squared            per Squared Item     Observation, (2) ÷ (4)
Correction*       T..²            1 = a             Σj nj                T..²/Σj nj = A
Treatments                        k = b                                  Σj(T.j²/nj) = B
Observations      Σi Σj x²ij      Σj nj = c          1                   Σi Σj x²ij = C

Part 2: Analysis of variance table

(6) Variation     (7) Sum of     (8) Degrees of     (9) Mean Square           (10) F
    Due to:           Squares        Freedom (d.f.)     (7) ÷ (8)
Treatments        B - A           b - a             (B - A)/(b - a) = M.S.T   M.S.T/M.S.O
Observations      C - B           c - b             (C - B)/(c - b) = M.S.O
Total             C - A           c - a

* The correction term does not constitute a source of variation.

computing table gives results identical to those obtained before (see equation 4.60).

The analysis of variance computing table must be changed when the numbers of observations are not equal for the different samples. One more piece of notation is needed to set up the analysis of variance computing table for the case of unequal sample size. Let nj be the size of the sample from the jth treatment. The rest of the notation will remain the same. If the sample size for the jth treatment is nj, the total size of the experiment will be Σj nj; that is, the size of the experiment will be the sum of the sizes of the samples involved.

The analysis of variance computing table for samples of unequal size is given in table 21. The use of this table will not be illustrated. The same basic form is used; the relationship among columns remains the same and the analysis of variance table is unchanged. Only slight changes occur in the preliminary calculations. In the first and third lines of the preliminary calculations, the only change made is the replacement of nk by Σj nj as the total number of observations in the experiment. In the second line for equal numbers, we first took the total of the squares of the treatment totals and then divided the total of the squares by the number of observations common to the samples. For unequal sample sizes, square each of the treatment totals and divide each squared item by the number of observations which make up the total before summing. That is, determine T.j²/nj for each treatment and then take the sum of these quantities. Columns 2 and 4 of line 2 are left blank, and the sum of the T.j²/nj values is entered directly in column 5 of line 2. The entry in column 3 of line 2 remains the same as before since the number of items squared is still k, the number of treatments.

None of the entries in an analysis of variance computing table should be negative numbers. If any entry is negative, a mistake has been made in the computations. A point which is helpful in checking the entries in the analysis of variance table is that the treatment S.S. plus the observation S.S. is equal to the total S.S., and the total d.f. is the sum of the treatment d.f. and the observation d.f.

Tests to Supplement the Analysis of Variance
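Before turning to the supplementary tests, the unequal-sample-size computing table just described can be sketched in code. The observations below are invented purely for illustration; only the scheme of table 21 is taken from the text.

```python
# Hypothetical sketch of table 21: ANOVA computing table for samples of
# unequal size. All counts here are invented for illustration.
samples = {
    "treatment 1": [70, 75, 80],          # n1 = 3
    "treatment 2": [88, 92, 90, 86],      # n2 = 4
    "treatment 3": [81, 79, 83, 85, 82],  # n3 = 5
}

N = sum(len(obs) for obs in samples.values())        # Σj nj, total observations
grand_total = sum(sum(obs) for obs in samples.values())  # T..

A = grand_total**2 / N                               # correction term
B = sum(sum(obs)**2 / len(obs) for obs in samples.values())  # Σj T.j²/nj
C = sum(x**2 for obs in samples.values() for x in obs)       # Σi Σj x²ij

k = len(samples)
ss_treat, df_treat = B - A, k - 1
ss_obs,   df_obs   = C - B, N - k
F = (ss_treat / df_treat) / (ss_obs / df_obs)
```

Note the checks mentioned in the text hold here: no entry is negative, and the treatment and observation S.S. (and d.f.) sum to the totals C - A and N - 1.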
If the value of F obtained in the analysis of variance does not fall into the critical region for the test, the hypothesis that the population means for the treatments are the same is not rejected. If this happens, further tests for differences among the population means may be unnecessary. However, if the value of F falls into the critical region, further tests may be


needed to determine where the differences among the treatment population means lie. The analysis of variance tests the variation among the sample means of the treatments to see whether the variation among these means is greater than the variation normally expected to arise from the procedure for obtaining the observations. For example, if the three treatments in table 5 had population means which were equal, we would expect to have some variation among the sample means arising from the variation among the plate counts. However, when the analysis of variance was applied to these data, the fact that the F-value fell into the critical region told us that the variation among the sample means for these three treatments was greater than that which could be explained on the basis of the variation among the plate counts. The statistically significant value of F does not tell the source of this increased variation. If all but one of the population means were equal, the fact that one of the population means was different could introduce sufficient variation among the sample means to obtain a statistically significant value of F. Similarly, if no two population means were equal, we could again get sufficient variation among the sample means to obtain a statistically significant value of F. Thus, if the value of F falls into the critical region for the test, we know only that at least one of the population means is "out of line." In some problems, knowledge that the variation among the treatment means is greater than that which can be accounted for by the variation arising from the procedure for obtaining the observations may be sufficient and further testing may not be necessary. An example of this type of problem might arise from a group of laboratory technicians running the same laboratory procedure or some common test. Here, interest might be confined to knowing whether the different technicians agreed within the limits of their precisions. 
In this type of problem, the analysis of variance tests the variation among the technicians to see whether it is negligible with respect to the variation from the method of measurement. Here the treatments are samples and the observations are the measurements. Interest rests primarily on variation among the technicians, not in which technician gives the greatest or smallest average value,

therefore, further tests after the analysis of variance are not needed. Here, an estimate of the magnitude of the variation from each source can be obtained by the use of the components of variance technique.

In other types of problems, interest may be centered in the relationship among the population means, for example, (a) in which treatment gives the best results, (b) in ranking the treatments, or (c) in finding whether the relationship among a set of treatments is linear. In these types of problems it is necessary to follow an analysis of variance by further tests if there is a statistically significant difference among the means.

One method of testing differences among means after an analysis of variance is by the t-test. We use the usual t-test for the difference between two treatments for populations with equal variances (the variances must be equal to use the analysis of variance) except that the pooled s² for two treatments is replaced by the pooled s² from all of the treatments (the observation mean square from the analysis of variance table) as well as the degrees of freedom for this pooled s² (the observation d.f.). In this way, all of the information afforded us by the entire experiment concerning the magnitude of the common population variance is utilized.

As previously pointed out, the F-test for the analysis of variance is fairly insensitive to lack of homogeneity of variances. The t-tests which use the internal estimate of the variance, the pooled s², are sensitive to lack of homogeneity of variances. Thus, Bartlett's test (12) may give us a useful warning not to use a pooled s² for t-tests which follow the F-test. If t-tests are to be used following the analysis of variance, the homogeneity of variance test should be used. Another of the troubles with the use of the t-test is that all too often conflicting results are obtained, as pointed out earlier. Still remaining is the problem of the number of t-tests which must be run.
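As a sketch of this pooled t-test, consider comparing treatments 2 and 3 of table 5. The means are recovered from the treatment totals quoted earlier (1,285 and 1,317 over n = 10 each), and the pooled s² and its degrees of freedom come from the analysis of variance in table 20.

```python
import math

# Sketch: t-test following the analysis of variance, with the pooled s2 of
# the two treatments replaced by the observation mean square from table 20.
mean2, mean3 = 128.5, 131.7   # means from totals 1,285 and 1,317 over n = 10
n2 = n3 = 10
pooled_s2 = 99.37             # observation mean square (M.S.O), 27 d.f.

t = (mean3 - mean2) / math.sqrt(pooled_s2 * (1 / n2 + 1 / n3))
# refer t to the t table with 27 d.f. (the observation d.f.), not the
# 18 d.f. a two-sample pooled s2 would give
```

The gain over the ordinary two-sample t-test is the larger number of degrees of freedom for the variance estimate.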
Slightly more sophisticated methods have been offered for testing the differences, including the method of Nair (18), which may be combined with the t-test, and the more recent method of Duncan (19). These latter methods can be used in ranking treatments.

The most useful method, if it is applicable, for testing the differences among the treatments is the use of individual degrees of freedom. Actually, the method of individual degrees of freedom is an extension of the analysis of variance procedure. The procedure consists of a further breakdown of the treatment S.S. into parts, each of which has 1 degree of freedom. The partitioning of the treatment S.S. is done in such a way that the resulting mean squares can be used to test certain hypotheses concerning the differences among the treatments, the denominator for the tests being the observation mean square. With k treatments, there are k - 1 degrees of freedom for treatments; therefore, the treatment S.S. can be partitioned into k - 1 parts, each having 1 degree of freedom.

For example, the data in table 5 have three treatments, hence two degrees of freedom for treatments (see table 20). The treatment S.S. can be split into two parts, each having 1 degree of freedom; this can be done in many ways, including the following three. The first degree of freedom could be used to test the difference between the control (treatment 3) and the two media with lard added, with the second degree of freedom being used to test the difference between the rancid lard (treatment 1) and the lard which is not rancid (treatment 2). Or we can use the first degree of freedom to test the difference between treatment 1 and treatments 2 and 3, with the second degree of freedom for testing the difference between treatments 2 and 3. Or we can use the first degree of freedom to test the difference between treatment 2 and treatments 1 and 3, with the second degree of freedom being assigned to test the difference between treatments 1 and 3.

The way in which the treatment S.S. is partitioned must be meaningful. It should also be decided upon before the data are examined, so that the data do not bias the judgement as to which test is to be used. The researcher should have some particular hypothesis in mind before the data are obtained, in the event there is a statistically significant difference among the means.
The choice of the hypothesis may be based on some theory to be tested. For example, before taking the data in table 5 it might have been thought that rancid lard would inhibit the germination of spores while non-rancid lard would have no effect differing from the control medium. Here, the proper choice would be to use the first degree of freedom to test the difference between treatment 1 and treatments 2

and 3, then use the second degree of freedom to test the difference between treatments 2 and 3. Another theory would require a different partitioning of the treatment S.S.

It is not necessary to partition the treatment S.S. completely to test a hypothesis. For example, the hypothesis to be tested might be that the k treatments could be divided into two groups, each group having equal population means within the group but with a difference in population means between the groups. That is, there would be k1 treatments (the first group) with equal means and k2 treatments (the second group) with equal means (k1 + k2 = k), but the mean common to the first group would not be equal to the mean which was common to the second group. Here, 1 degree of freedom would be used to test the difference between the two groups, k1 - 1 degrees of freedom to test the difference among the means in the first group, and k2 - 1 degrees of freedom to test the difference among the means in the second group (treatment d.f. = 1 + (k1 - 1) + (k2 - 1) = k1 + k2 - 1 = k - 1).

Individual degrees of freedom can be used to test other types of hypotheses. For example, if the treatments consist of increasing equally spaced quantities of a metabolite or test material, the treatment S.S. can be split into units with single degrees of freedom to test whether the response is linear, quadratic, or cubic and higher. All in all, methods employing individual degrees of freedom cover a great many types of hypotheses and are very useful. The methods for partitioning the treatment S.S. for individual degrees of freedom are given in section 3.4 of Cochran and Cox (8).

The supplementary tests discussed here may be used in place of an analysis of variance. In fact, they have a definite advantage over the analysis of variance in that they are designed to test specific hypotheses whereas the analysis of variance tests a general hypothesis.
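The single-degree-of-freedom partition just described can be sketched with orthogonal contrasts. Using the treatment totals quoted earlier for table 5, the two contrast sums of squares add back to the treatment S.S. of table 20; the contrast-S.S. formula L²/(nΣc²) is the standard one for equal sample sizes and is an assumption here, since the paper defers the computing details to Cochran and Cox.

```python
# Sketch: splitting the treatment S.S. from table 20 into two orthogonal
# contrasts, each carrying 1 degree of freedom.
totals = [756, 1285, 1317]  # treatment totals T.j quoted in the text
n = 10                      # observations per treatment

def contrast_ss(coeffs, totals, n):
    # S.S. for a contrast L = sum(c_j * T.j) is L**2 / (n * sum(c_j**2))
    L = sum(c * t for c, t in zip(coeffs, totals))
    return L**2 / (n * sum(c**2 for c in coeffs))

# 1st d.f.: treatment 1 (rancid lard) vs. treatments 2 and 3
ss1 = contrast_ss([-2, 1, 1], totals, n)
# 2nd d.f.: treatment 2 vs. treatment 3
ss2 = contrast_ss([0, 1, -1], totals, n)
# ss1 + ss2 reproduces the treatment S.S., 19,852.9, within rounding;
# each would be tested against the observation mean square, 99.37
```

Dividing each contrast S.S. (1 d.f.) by the observation mean square gives an F-value for that specific comparison.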
The analysis of variance is like a shotgun, in that it covers a lot of territory and doesn't bring too much force to bear on any particular point. The supplementary tests, however, are like rifles, in that they bring all of their force to bear on a particular point. Thus the analysis of variance is a general test which tests for any and all types of divergence from equality of treatment population means with little power


against any specific type of divergence. The supplementary tests, on the other hand, test specified types of divergence and have much higher power. Therefore, in some problems the supplementary tests may be more appropriate than the straight analysis of variance.
Notes on the Chi-square Test

We can apply the chi-square test to cases where we are testing either two sample proportions or a single sample proportion. In the chi-square test of two sample proportions, use the same procedure on the 2 × 2 contingency table as was used on the 2 × c contingency table, with one change; instead of running the test on the original 2 × 2 table, run the test on the 2 × 2 table that is corrected for continuity, as in table 13. The degrees of freedom for the chi-square test will be 2 - 1, or 1 degree of freedom.

If the chi-square test is used on a single sample proportion, again correct for continuity. The expected values are obtained from the proportion given by the hypothesis. The observed values for both class A and class Ā must be corrected for continuity. The test will have 1 degree of freedom.

The decisions reached concerning the hypothesis will be the same for both the u-test and the chi-square test. This arises from the fact that although the two tests look quite different, they are indeed related. The relationship between the two tests is that u² = χ² with 1 degree of freedom. The critical regions of the two tests are such that for a given significance level, any value which falls in the critical region for one test also falls in the critical region for the other test.

The chi-square test can also be extended to results where the members of the parent populations fall into more than two classes. For example, instead of classifying members as either having a toxic reaction or no toxic reaction, the severity of the reaction might have three classifications, such as none, mild or severe toxic reaction. Another example would be the quantitation of the tuberculin reaction. If there are r classifications with c samples, an r × c contingency table results. The chi-square test proceeds as before by the determination of the expected values, the test being based on chi-square as defined by equation 5.17. With an r × c contingency table, chi-square will have (r - 1)(c - 1) degrees of freedom.

The chi-square test can also be used in tests of "goodness of fit," which are tests designed to determine whether data fit various theoretical frequencies. Tests of this sort include testing of hypotheses of genetical characters as well as testing the fit to such distributions as the normal, binomial or Poisson. The chi-square test for goodness of fit is discussed in most elementary textbooks [see also Eisenhart and Wilson (1)].

Chi-square, like the treatment S.S. in the analysis of variance, can be partitioned into individual degrees of freedom, each of which can be used to test some particular hypothesis. Cochran (26) presents several methods which are applicable to the various chi-square tests. Yates (27) has given a method, along with detailed computing instructions, for separating out a single degree of freedom of chi-square for testing the linearity of response in an r × c contingency table. Yates' method would be applicable, for example, to testing the hypothesis that the proportion of positive reactions to tuberculin tests is a linear function of increasing strengths of tuberculin.

Chi-square values are additive. Thus, in a series of chi-square values, each of which results from a test of the same hypothesis on different sets of data, the sum of these will be distributed as chi-square with degrees of freedom equal to the sum of the degrees of freedom of the individual tests. Cochran (28) pointed out the fact that the values of chi-square to be added must not be corrected for continuity, e.g., when the chi-square values for a series of tests of one or two sample proportions are added, the tests must not contain the correction for continuity.

There are certain restrictions concerning how small the expected values in the chi-square tests can be. Some authorities suggest that 10 is the minimum expected value which should be used, while others have suggested that a minimum expected value of 5 will suffice. Studying the effect of small expectations, Cochran (28) concluded that the effect of small expected values depends upon the number of degrees of freedom in the test, and later (26) presented recommendations about minimum expectations.

Yates' correction for continuity (24) is intended for use when the samples are small.
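As a sketch of the 2 × 2 chi-square test with Yates' correction described above, the counts below are invented for illustration; they are not from the paper.

```python
# Hypothetical 2 x 2 example: chi-square test of two sample proportions
# with Yates' correction for continuity.
def chi_square_2x2_yates(a, b, c, d):
    """Cells laid out as [[a, b], [c, d]]; returns chi-square with 1 d.f."""
    table = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    total = a + b + c + d
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            # Yates: shrink each |observed - expected| by 0.5 before squaring
            chi2 += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return chi2

# e.g., 12 of 30 positives under one treatment vs. 21 of 30 under another
chi2 = chi_square_2x2_yates(12, 18, 21, 9)
# refer chi2 to the chi-square table with 1 degree of freedom
```

The resulting value is referred to the chi-square table with 1 degree of freedom, just as the text prescribes.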


Such correction is less necessary when the samples used are large. However, as a working rule, the correction for continuity takes little extra time and may as well be used regardless of the size of sample involved. The correction of Yates (24) is usually limited in application to u-tests or chi-square tests of either one or two sample proportions. Cochran (28) has given a correction for continuity which has wider application.

REFERENCES

1. EISENHART, C., AND WILSON, P. W. 1943 Statistical methods and control in bacteriology. Bacteriol. Revs., 7, 57-137.
2. HUFF, D. 1954 How to lie with statistics. W. W. Norton and Co., New York, N. Y.
3. BROSS, I. D. J. 1953 Design for decision. The MacMillan Co., New York, N. Y.
4. ARKIN, H., AND COLTON, R. R. 1950 Tables for statisticians. Barnes and Noble, Inc., New York, N. Y.
5. HALD, A. 1952 Statistical tables and formulas. John Wiley and Sons, Inc., New York, N. Y.
6. FISHER, R. A., AND YATES, F. 1953 Statistical tables for biological, agricultural and medical research. 4th ed. Oliver and Boyd, London, England.
7. WILSON, G. S. 1935 The bacteriological grading of milk. His Majesty's Stationery Office, London, England.
8. COCHRAN, W. G., AND COX, G. M. 1950 Experimental designs. John Wiley and Sons, Inc., New York, N. Y.
9. ROTH, N. G., AND HALVORSON, H. O. 1952 The effect of oxidative rancidity in unsaturated fatty acids on germination of bacterial spores. J. Bacteriol., 63, 429-435.
10. EISENHART, C. 1947 The assumptions underlying the analysis of variance. Biometrics, 3, 1-21.
11. COCHRAN, W. G. 1947 Some consequences when the assumptions for the analysis of variance are not satisfied. Biometrics, 3, 22-38.
12. BARTLETT, M. S. 1937 Some examples of statistical methods of research in agriculture and applied biology. J. Roy. Stat. Soc. (Suppl.), 4, 137-170.
13. BOX, G. E. P. 1949 A general distribution theory for a class of likelihood criteria. Biometrika, 36, 317-346.
14. BOX, G. E. P. 1953 Non-normality and tests on variances. Biometrika, 40, 318-335.
15. JAMES, G. S. 1951 The comparison of several groups of observations when the ratios of the population variances are unknown. Biometrika, 38, 324-329.
16. WELCH, B. L. 1951 On the comparison of several mean values: an alternative approach. Biometrika, 38, 330-336.
17. BARTLETT, M. S. 1947 The use of transformations. Biometrics, 3, 39-52.
18. NAIR, K. R. 1948 The distribution of the extreme deviate from the sample mean and its studentized form. Biometrika, 35, 118-144.
19. DUNCAN, D. B. 1951 A significance test for differences between ranked treatments in an analysis of variance. Virginia J. Sci., 2 (N.S.), 909-913.
20. HALVORSON, H. O., AND SPIEGELMAN, S. 1953 Net utilization of free amino acids during the induced synthesis of maltozymase in yeast. J. Bacteriol., 65, 601-608.
21. STEARMAN, R. L., WARD, T. G., AND WEBSTER, R. A. 1953 Use of a "components of variance" technique in biological experimentation. Am. J. Hyg., 58, 340-351.
22. National Bureau of Standards 1949 Tables of the binomial probability distribution. United States Government Printing Office, Washington, D. C.
23. ROMIG, H. G. 1953 50-100 binomial tables. John Wiley and Sons, Inc., New York, N. Y.
24. YATES, F. 1934 Contingency tables involving small numbers and the χ² test. J. Roy. Stat. Soc. (Suppl.), 1, 217-235.
25. National Bureau of Standards 1953 Tables of normal probability functions. United States Government Printing Office, Washington, D. C.
26. COCHRAN, W. G. 1954 Some methods for strengthening the common χ² tests. Biometrics, 10, 417-451.
27. YATES, F. 1948 The analysis of contingency tables with groupings based on quantitative characters. Biometrika, 35, 176-181.
28. COCHRAN, W. G. 1942 The χ² correction for continuity. Iowa State Coll. J. Sci., 16, 421-436.
29. MOSTELLER, F., AND TUKEY, J. W. 1949 The uses and usefulness of binomial probability paper. J. Am. Stat. Assoc., 44, 174-212.
30. DAVIS, D. E., AND ZIPPIN, C. 1954 Planning wildlife experiments involving percentages. J. Wildlife Management, 18, 170-178.
31. CLOPPER, C. J., AND PEARSON, E. S. 1934 The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26, 404-413.


32. SNEDECOR, G. W. 1950 Statistical methods. 4th ed. The Iowa State College Press, Ames, Iowa.
33. MOLINA, E. C. 1949 Poisson's exponential binomial limit. D. Van Nostrand Co., Inc., New York, N. Y.
34. KITAGAWA, T. 1952 Tables of the Poisson distribution. Baifukan, Tokyo, Japan.
35. American Public Health Association 1936 Standard methods for the examination of water and sewage. 8th ed. American Public Health Association, New York, N. Y.
36. COCHRAN, W. G. 1950 Estimation of bacterial densities by means of the "most probable number." Biometrics, 6, 105-116.
37. FINNEY, D. J. 1951 The estimation of bacterial densities from dilution series. J. Hyg., 49, 26-35.
