
ANOVA: Analysis of Variance

This file is part of a program based on the Bio 4835 Biostatistics class taught at Kean University in Union, New Jersey.
The course uses the following text:
Daniel, W. W. 1999. Biostatistics: a foundation for analysis in the health sciences. New York: John Wiley and Sons.
The file follows this text very closely and readers are encouraged to consult the text for further information.

ANALYSIS OF VARIANCE (ANOVA)

Introduction

ANOVA stands for Analysis of Variance. It separates the total variation in a set of data
into two or more components, each attributed to an identified source, so that the influence of each
source on the total variation can be seen. It is also used to compare means when there are three or
more groups.

ANOVA is used to analyze data from experiments. Its purposes are to estimate and to test
hypotheses about population variances and population means. Several types of experiments and
techniques use ANOVA. These include one-way ANOVA, two-way ANOVA and multiple ANOVA,
which arise from experiments employing the completely randomized design, randomized complete
block design, repeated measures design or factorial experiment design.

One-way ANOVA is used to determine if there is any significant difference between the means
of groups of data. These groups may vary under the effect of one factor. The data are organized into
groups and presented in a data table.

Data

For ANOVA work, the data are presented in a data table. There must be at least three groups of
data although more are possible.

Sample ANOVA data table.

The sample table above shows four groups. Additional columns are added as necessary to
accommodate each group. The groups do not need to be the same size. For each group of data we
need to find Σx, Σx² and n.
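
As an illustration of the per-group totals described above, here is a minimal Python sketch. The three groups and their values are made up for the example (they are not taken from the course data), and the helper name `summarize` is hypothetical.

```python
# Hypothetical measurements for three groups of unequal size
# (illustrative values only, not data from the text).
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [7.0, 8.0, 9.0, 10.0],
    "C": [1.0, 2.0, 3.0],
}


def summarize(values):
    """Return (sum of x, sum of x squared, n) for one group."""
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return sum_x, sum_x2, len(values)


# One (sum_x, sum_x2, n) triple per group, as in the bottom rows of a data table.
summary = {name: summarize(values) for name, values in groups.items()}

for name, (sum_x, sum_x2, n) in summary.items():
    print(f"group {name}: sum_x = {sum_x}, sum_x2 = {sum_x2}, n = {n}")
```

These three numbers per group are all that the later ANOVA formulas require, so the raw measurements do not have to be revisited once the totals are stored.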

Assumptions

The assumptions in ANOVA are

• normal distribution of data
• independent simple random samples
• constant variance

Hypotheses

H0: all means are equal

HA: not all means are equal

Statistical test

Test statistic and distribution

The ANOVA test statistic is the variance ratio, V.R., which is distributed as F with the
appropriate number of numerator degrees of freedom and denominator degrees of freedom at the
chosen α level.

A value of F larger than the critical value leads to rejecting the null hypothesis. A smaller value
means the null hypothesis is not rejected.

Calculations

Basic statistical calculations are made to determine Σx, Σx² and n for each group. Also
required are N, the total number of measurements, and k, the total number of groups. Then, an
ANOVA table is made as shown below.

Sample ANOVA table.

Source    df       SS     MS     F
Group     k - 1    [C]    [E]    [G]
Error     N - k    [D]    [F]
Total     N - 1    [B]

The ANOVA table has columns for degrees of freedom (df), sums of squares (SS), mean
squares (MS) and the variance ratio (F). These values are found using a series of calculations.

For degrees of freedom, N and k are used in the following formulas.

TOTAL df = N - 1
GROUP df = k - 1
ERROR df = N - k
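
The three formulas above can be checked with a short Python sketch. The function name `degrees_of_freedom` is hypothetical. The values N = 48 and k = 6 are the ones given for the sample one-way ANOVA problem in this file.

```python
def degrees_of_freedom(N, k):
    """Return (total, group, error) degrees of freedom for one-way ANOVA.

    N is the total number of measurements and k is the number of groups.
    """
    if k < 2 or N <= k:
        raise ValueError("need at least 2 groups and more measurements than groups")
    return N - 1, k - 1, N - k


total_df, group_df, error_df = degrees_of_freedom(N=48, k=6)
print(total_df, group_df, error_df)  # prints: 47 5 42
```

Note that the group and error degrees of freedom always add up to the total, which is a quick check on a hand-built ANOVA table.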

The error term reflects how much each individual measurement differs from the population
mean of its group.

Steps for ANOVA calculations

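[A] Calculate the correction factor

CF = (Σx)² / N

[B] Calculate the Sum of Squares Total value (SS Total)

SS Total = Σx² - CF

[C] Calculate the SS Group value

SS Group = Σ[(Σx)² / n] - CF

Here the outer sum runs over the k groups, and Σx and n inside the brackets are the totals for one group.

[D] Calculate the SS Error value

SS Error = SS Total - SS Group

[E] Calculate MS Group value

MS Group = SS Group / (k - 1)

[F] Calculate MS Error value

MS Error = SS Error / (N - k)

[G] Calculate F value (V.R.)

V.R. = MS Group / MS Error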

All of the above equations are used in the ANOVA calculations. All except equation [A]
appear in the ANOVA calculation table.
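
Steps [A] through [G] can be sketched in Python as follows. This is a minimal illustration, not the course's own program: the function name `anova_from_summaries` and the three (Σx, Σx², n) triples are hypothetical, and the formulas assumed are the standard correction-factor forms (CF = (Σx)² / N, MS = SS / df).

```python
def anova_from_summaries(summaries):
    """One-way ANOVA from per-group totals.

    summaries: list of (sum_x, sum_x2, n) tuples, one tuple per group.
    Returns a dict keyed by the step letters used in the text.
    """
    k = len(summaries)                      # number of groups
    N = sum(n for _, _, n in summaries)     # total number of measurements
    grand_sum = sum(sx for sx, _, _ in summaries)

    cf = grand_sum ** 2 / N                                      # [A]
    ss_total = sum(sx2 for _, sx2, _ in summaries) - cf          # [B]
    ss_group = sum(sx ** 2 / n for sx, _, n in summaries) - cf   # [C]
    ss_error = ss_total - ss_group                               # [D]
    ms_group = ss_group / (k - 1)                                # [E]
    ms_error = ss_error / (N - k)                                # [F]
    vr = ms_group / ms_error                                     # [G]

    return {"A": cf, "B": ss_total, "C": ss_group, "D": ss_error,
            "E": ms_group, "F": ms_error, "G": vr}


# Hypothetical totals for three groups (illustrative values only).
table = anova_from_summaries([(15.0, 77.0, 3), (34.0, 294.0, 4), (6.0, 14.0, 3)])
for step, value in table.items():
    print(f"[{step}] {value:.4f}")
```

Keeping every intermediate value at full precision, as the dictionary does here, mirrors the advice to store each result in its own calculator location rather than rounding.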

Discussion and conclusions


The statistical decision in ANOVA is based on whether the value of F exceeds the critical value
from the table of F values for the appropriate numerator and denominator degrees of freedom. A large
value of F indicates that the factor is important in causing the variation. We will see that it is very easy
these days to perform ANOVA calculations using an appropriate calculator or computer software
program.

Sample one-way ANOVA problem

a. Given

For this problem, data were obtained from goldfish breathing experiments conducted in a biology
laboratory. The opercular breathing rates in counts per minute were collected in groups of 8
measurements at different temperatures ranging from 12°C to 27°C. The data are given in the table
below.

Opercular breathing rates in counts per minute of goldfish at various temperatures


(Source: Kean University biology laboratory)

N = 48 (number of measurements)
k = 6 (number of groups)

b. Assumptions

It is assumed that the data are normally distributed, that they represent independent
random samples and that the variance is constant.

c. Hypotheses

H0: all means are equal


HA: not all means are equal

d. Statistical test

Test statistic

The test statistic is the variance ratio.

Distribution

The test statistic is distributed as F with 5 numerator degrees of freedom (k-1) and 42
denominator degrees of freedom (N-k).

Decision criteria

The critical value of F with 5 numerator degrees of freedom and 42 denominator degrees of
freedom is about 2.45 at the 95% confidence level (α = 0.05). We reject H0 if V.R. > 2.45.

Graph showing the critical value of F.

e. Calculations

Recall that the calculator uses exact values. Each calculation formula has its own letter
corresponding to a cell in the ANOVA calculation table. It is suggested that the value resulting from
each calculation be stored in the corresponding storage location on the calculator. Do not round off
intermediate results, because rounding carries a strong risk of erroneous final values.

[A] Calculate Correction Factor

The formula used to calculate the correction factor is

CF = (Σx)² / N

The term (Σx)² is obtained by adding all of the values in the Σx row of the data table, then squaring
the total. Dividing by N, the total number of measurements, gives CF. Store the result in location [A].

Value for sample problem

[B] Calculate Sum of Squares Total value (SS Total)

The value for SS Total is calculated using the following formula.

SS Total = Σx² - CF

To find SS Total, the values in the Σx² row of the data table are all added together, then the value of
CF is subtracted. The result is stored in location [B]. It is assumed that the value of CF is stored in
location [A].

For the sample problem

[C] Calculate Sum of Squares Group value (SS Group)

For this calculation the formula is

SS Group = Σ[(Σx)² / n] - CF

For each group, the value of Σx from the data table is squared and divided by that group's n. Then all
of these fractions are added together. Finally, the Correction Factor [A] is subtracted. The result is
stored in [C].

For the sample data

[D] Calculate Sum of Squares Error value (SS Error)

The formula for calculating SS Error value is

SS Error = SS Total - SS Group

For the sample calculations, it is assumed that SS Total is located in [B] and SS Group is located in
[C]. The result is stored in [D].

For the sample problem

[E] Calculate Mean Square Group value (MS Group)

The value of MS Group is calculated as follows

MS Group = SS Group / (k - 1)

This value is also known as the Mean Square Factor (MS Factor). The result is stored in location [E].

For the sample problem

[F] Calculate Mean Square Error value (MS Error)

For calculating the MS Error value, the formula is

MS Error = SS Error / (N - k)

Following these procedures so far, the value of SS Error is stored in [D]. The result of this calculation
is stored in [F].

[G] Calculate Variance Ratio

The formula for V.R. is

V.R. = MS Group / MS Error

For the sample problem

Final appearance of sample ANOVA calculation table.

f. Discussion

The critical value of F at the 95% confidence level with 5 numerator degrees of freedom and 42
denominator degrees of freedom is about 2.45 as read from the F tables. The calculated value of V.R.
is 12.01, with a probability (calculator value) of 2.98 × 10⁻⁷. Because 12.01 exceeds 2.45, H0 is
rejected.
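
The probability attached to a variance ratio can be cross-checked without F tables. The sketch below is one possible approach, not the method used by the calculator: it relies on the standard identity P(F > x) = I_z(d2/2, d1/2) with z = d2 / (d2 + d1·x), where I is the regularized incomplete beta function, and evaluates the integral numerically with Simpson's rule. The function name `f_upper_tail` and the step count are assumptions made for this example.

```python
import math


def f_upper_tail(x, d1, d2, steps=2000):
    """Approximate P(F > x) for an F distribution with d1 and d2 degrees of freedom.

    Integrates t**(a-1) * (1-t)**(b-1) from 0 to z by Simpson's rule, with
    a = d2/2, b = d1/2 and z = d2 / (d2 + d1*x), then divides by Beta(a, b).
    Assumes x > 0, d2 >= 2 and an even number of steps.
    """
    a, b = d2 / 2.0, d1 / 2.0
    z = d2 / (d2 + d1 * x)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = z / steps
    total = integrand(0.0) + integrand(z)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * integrand(i * h)
    return (total * h / 3.0) / math.exp(log_beta)


# Values from the sample problem: V.R. = 12.01 with 5 and 42 degrees of freedom.
p_value = f_upper_tail(12.01, 5, 42)
# Tail area beyond the tabled critical value of about 2.45.
tail_at_critical = f_upper_tail(2.45, 5, 42)
print(p_value, tail_at_critical)
```

If the sketch is right, the first number should be of the same order as the calculator probability quoted above, and the second should be close to 0.05.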

g. Conclusions

We conclude that not all the means of the groups are equal.

Modern calculation methods

With the advent of statistical calculators, such as the TI-83, and spreadsheet programs with
built-in statistical functions, a researcher no longer needs to do any of these calculations
manually.

In the past, researchers worked with paper and pencil, and it is remarkable what they
accomplished that way. But there is no need to be overly concerned with performing manual
calculations, even with a calculator, when automatic methods can produce a complete ANOVA in
seconds.
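
As one illustration of such an automatic calculation, the following Python sketch computes a complete one-way ANOVA directly from raw measurements. It uses the definitional sums of squared deviations from the means rather than the correction-factor shortcut. The function name `one_way_anova` and the data are hypothetical.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return df, SS, MS and F for a one-way ANOVA on a list of groups.

    groups: list of lists of measurements; group sizes may differ.
    """
    all_values = [v for g in groups for v in g]
    N, k = len(all_values), len(groups)
    grand_mean = fmean(all_values)

    # Between-group variation: group means around the grand mean.
    ss_group = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: measurements around their own group mean.
    ss_error = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    ms_group = ss_group / (k - 1)
    ms_error = ss_error / (N - k)
    return {
        "df": (k - 1, N - k, N - 1),
        "SS": (ss_group, ss_error, ss_group + ss_error),
        "MS": (ms_group, ms_error),
        "F": ms_group / ms_error,
    }


# Hypothetical data for three groups (illustrative values only).
result = one_way_anova([[4, 5, 6], [7, 8, 9, 10], [1, 2, 3]])
print(result)
```

Where SciPy is installed, `scipy.stats.f_oneway` carries out the same test and also returns the p-value. The sketch above sticks to the Python standard library so that each quantity in the ANOVA table stays visible.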

It is important, however, to understand why a technique such as ANOVA is useful and the
mathematical and statistical basis underlying it. Once that basis is understood, the calculations are
relatively easy, and attention can return to the really important considerations, such as the effects of
the factors or treatments on the groups in the study under consideration.
