ANALYSIS OF VARIANCE
Analysis of variance is a technique whereby the total variation present in a set of data is partitioned into several components.
Associated with each of these components is a specific source of variation, so that in the analysis it is possible to ascertain the magnitude of the contribution of each of these sources to the total variation.
ANALYSIS OF VARIANCE
Analysis of variance is used for two different purposes:
1. to estimate and test hypotheses about population variances
2. to estimate and test hypotheses about population means
We will focus on the second use in this chapter
We should note that although we talk about means, our decisions will depend on observed variances
ANALYSIS OF VARIANCE
Here in this chapter, analysis of variance is used to analyze the results of two different experimental designs:
1. the completely randomized design
2. the randomized complete block design
The concept of factorial experiments is also introduced in relation to the completely randomized design
ANALYSIS OF VARIANCE
The nine-step procedure followed in the hypothesis testing chapter is also used in analysis of variance:
1. Description of the data
2. Assumptions
3. Hypotheses
4. Test statistic
5. Distribution of the test statistic
6. Decision rule
7. Calculation of the test statistic
8. Statistical decision
9. Conclusion
THE COMPLETELY RANDOMIZED DESIGN
In the previous chapter, the tests we saw concerned the difference (or no difference) between two population means
What if we need to compare more than two populations?
You might say: form all possible pairs and test each of them separately by means of a t-test
For example, if you have five populations, you will have 5C2 = 10 possible pairs of sample means
THE COMPLETELY RANDOMIZED DESIGN
As you can see, the number of possible combinations can be very large
However, a more serious problem with this all-possible-t-tests approach is the risk of false conclusions
For example, take the above case of five samples with equal means
Let alpha = 0.05 for each test
Then the probability of failing to reject a hypothesis of no difference in each case is 0.95
THE COMPLETELY RANDOMIZED DESIGN
Now, if the tests were independent, the probability of failing to reject a hypothesis of no difference in all 10 cases would be (0.95)^10 = 0.5987
Then, the probability of rejecting at least one hypothesis of no difference would be 1 - 0.5987 = 0.4013
Since we know that the null hypothesis is true in every case in this example, rejecting a null hypothesis constitutes the committing of a type I error.
Thus, in testing all possible pairs of means from five samples, a type I error would be committed about 40% of the time
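This familywise error-rate arithmetic can be checked with a short Python sketch (the 0.05 per-test level and the 10 pairs come from the five-sample example above):

```python
# Probability of at least one type I error across 10 independent
# pairwise tests, each run at alpha = 0.05.
alpha = 0.05
n_tests = 10  # 5 choose 2 pairs of means

p_no_rejection = (1 - alpha) ** n_tests   # 0.95^10 ~ 0.5987
p_at_least_one = 1 - p_no_rejection       # ~ 0.4013
print(round(p_no_rejection, 4), round(p_at_least_one, 4))
# 0.5987 0.4013
```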
THE COMPLETELY RANDOMIZED DESIGN
So, in summary, we need some other way of testing among several samples
Analysis of variance (ANOVA) is one such method
The simplest ANOVA is the one-way analysis of variance, in which only one source of variation is investigated
We can say that the t-test for two independent samples is a special case of one-way analysis of variance
                    TREATMENT
            1         2         3      ...      k
          x_{11}    x_{12}    x_{13}   ...    x_{1k}
          x_{21}    x_{22}    x_{23}   ...    x_{2k}
          x_{31}    x_{32}    x_{33}   ...    x_{3k}
            ..        ..        ..     ...      ..
          x_{n_1 1} x_{n_2 2} x_{n_3 3} ...   x_{n_k k}
Total      T_{.1}    T_{.2}    T_{.3}  ...    T_{.k}   |  T_{..}
Mean       \bar{x}_{.1}  \bar{x}_{.2}  \bar{x}_{.3}  ...  \bar{x}_{.k}  |  \bar{x}_{..}
THE COMPLETELY RANDOMIZED DESIGN
x_{ij} = the ith observation resulting from the jth treatment, i = 1, 2, ..., n_j; j = 1, 2, ..., k

T_{.j} = \sum_{i=1}^{n_j} x_{ij} = total of the jth treatment

\bar{x}_{.j} = T_{.j} / n_j = mean of the jth treatment

T_{..} = \sum_{j=1}^{k} T_{.j} = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij} = total of all observations

\bar{x}_{..} = T_{..} / N,  where  N = \sum_{j=1}^{k} n_j
THE COMPLETELY RANDOMIZED DESIGN
Example 7.2.1
In a study of the effect of glucose on insulin release, specimens of pancreatic tissue from experimental animals were randomly assigned to be treated with one of five different stimulants
Later, a determination was made of the amount of insulin released
The experimenters wished to know if they could conclude that there is a difference among the five treatments with respect to the mean amount of insulin released
Example 7.2.1
Sample        STIMULANT
           1       2       3       4       5
1         1.53    3.15    3.89    8.18    5.86
2         1.61    3.96    3.68    5.64    5.46
3         3.75    3.59    5.70    7.36    5.69
4         2.89    1.89    5.62    5.33    6.49
5         3.26    1.45    5.79    8.82    7.81
6                 1.56    5.33    5.26    9.03
7                                 7.10    7.49
8                                         8.98
Total    13.04   15.60   30.01   47.69   56.81  |  163.15
THE COMPLETELY RANDOMIZED DESIGN: The Model
Any observation is given as

x_{ij} = \mu_j + e_{ij},  j = 1, 2, ..., k

so that

e_{ij} = x_{ij} - \mu_j

Treatment Effect
The amount by which a group mean differs from the grand mean is referred to as the treatment effect
So the jth treatment effect is given as

\tau_j = \mu_j - \mu,  so that  \mu_j = \mu + \tau_j

This gives our model:

x_{ij} = \mu + \tau_j + e_{ij},  i = 1, 2, ..., n_j;  j = 1, 2, ..., k
Assumptions of the Model
The k sets of observed data constitute k independent random samples from the respective populations
Each of the populations from which the samples come is normally distributed with mean \mu_j and variance \sigma_j^2
Each of the populations has the same variance, that is, \sigma_1^2 = \sigma_2^2 = ... = \sigma_k^2 = \sigma^2, the common variance
The \tau_j are unknown constants and \sum \tau_j = 0, since the sum of all deviations of the \mu_j from their mean, \mu, is zero
Assumptions of the Model
The equation

e_{ij} = x_{ij} - \mu_j

has three consequences:
1. the e_{ij} have a mean of 0, since the mean of x_{ij} is \mu_j
2. the e_{ij} have a variance equal to the variance of the x_{ij}, since the e_{ij} and x_{ij} differ only by a constant; that is, the error variance is equal to \sigma^2, the common variance
3. the e_{ij} are normally and independently distributed
Hypotheses
We can set the hypotheses as

H_0: \mu_1 = \mu_2 = ... = \mu_k
H_A: not all \mu_j are equal

or, alternatively,

H_0: \tau_j = 0,  j = 1, 2, ..., k
H_A: not all \tau_j = 0
Test Statistic
The test statistic for one-way analysis of variance is a computed variance ratio (V.R.)
The V.R. is distributed as the F distribution when H_0 is true and the assumptions are met
The decision rule is: reject the null hypothesis if the computed value of V.R. is equal to or greater than the critical value of F for the selected \alpha level
The calculation of the test statistic is based on partitioning the total variation present in the data into components
The term variation here means the sum of squared deviations of the observations from their mean, in short the total sum of squares
The Total Sum of Squares
Let us first look at the total sum of squares before doing any partitioning:

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{..})^2

or, in raw-score form,

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{N}
The Total Sum of Squares
For Example 7.2.1:

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2 - \frac{T_{..}^2}{N}
    = [(1.53)^2 + (1.61)^2 + ... + (8.98)^2] - \frac{(163.15)^2}{32}
    = 994.3529 - \frac{26617.923}{32}
    = 994.3529 - 831.81008
    = 162.54282
The Total Sum of Squares
Now we partition the total sum of squares as

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} [(x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \bar{x}_{..})]^2

If we expand and group terms,

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + 2 \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})(\bar{x}_{.j} - \bar{x}_{..}) + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (\bar{x}_{.j} - \bar{x}_{..})^2

The middle term may be written as

2 \sum_{j=1}^{k} (\bar{x}_{.j} - \bar{x}_{..}) \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})

and this term is equal to zero, since the deviations of observations from their own group mean sum to zero. Then,

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n_j} (\bar{x}_{.j} - \bar{x}_{..})^2
    = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2
The Total Sum of Squares
When the number of observations is the same in each group, the last term on the right may be rewritten to give

SST = \sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{.j})^2 + n \sum_{j=1}^{k} (\bar{x}_{.j} - \bar{x}_{..})^2

where n_1 = n_2 = ... = n_k = n
The Within Groups Sum of Squares
Now we can examine the partitioned parts of the SST equation

SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 + \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2

Here the first part of the equation is the within-each-group sum of squares
The Within Groups Sum of Squares
For Example 7.2.1:

SSW = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}^2 - \sum_{j=1}^{k} \frac{T_{.j}^2}{n_j}

SSW = [(1.53)^2 + (1.61)^2 + ... + (8.98)^2] - [\frac{(13.04)^2}{5} + \frac{(15.60)^2}{6} + \frac{(30.01)^2}{6} + \frac{(47.69)^2}{7} + \frac{(56.81)^2}{8}]
    = 994.3529 - (34.00832 + 40.56 + 150.10002 + 324.90516 + 403.42201)
    = 994.3529 - 952.99551
    = 41.35739
The Among Groups Sum of Squares
The second part of the SST equation gives SSA as

SSA = \sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2 = \sum_{j=1}^{k} \frac{T_{.j}^2}{n_j} - \frac{T_{..}^2}{N}

For Example 7.2.1:

SSA = [\frac{(13.04)^2}{5} + \frac{(15.60)^2}{6} + \frac{(30.01)^2}{6} + \frac{(47.69)^2}{7} + \frac{(56.81)^2}{8}] - \frac{(163.15)^2}{32}
    = 952.99551 - 831.81008
    = 121.18543
In Summary
The SST can be given as

SST = SSW + SSA

For Example 7.2.1: 162.54 = 41.36 + 121.18

Now, with these, we can calculate two estimates of \sigma^2
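As a check, the partition SST = SSW + SSA for the insulin data of Example 7.2.1 can be reproduced with a short Python sketch using the raw-score formulas above:

```python
# Sketch: verifying SST = SSW + SSA for the insulin data of
# Example 7.2.1 with the raw-score formulas from the text.
groups = [
    [1.53, 1.61, 3.75, 2.89, 3.26],
    [3.15, 3.96, 3.59, 1.89, 1.45, 1.56],
    [3.89, 3.68, 5.70, 5.62, 5.79, 5.33],
    [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10],
    [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98],
]

all_x = [x for g in groups for x in g]
N = len(all_x)
T_dotdot = sum(all_x)                       # grand total T..

SST = sum(x ** 2 for x in all_x) - T_dotdot ** 2 / N
SSA = sum(sum(g) ** 2 / len(g) for g in groups) - T_dotdot ** 2 / N
SSW = SST - SSA

print(round(SST, 3), round(SSA, 3), round(SSW, 3))
# 162.543 121.185 41.357
```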
The First Estimate of \sigma^2
Within any sample,

s_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2}{n_j - 1}

provides an unbiased estimate of the true variance of the population from which the sample came
The First Estimate of \sigma^2
Under the assumption that the population variances are all equal, it is possible to pool the k estimates to obtain

MSW = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_{.j})^2}{\sum_{j=1}^{k} (n_j - 1)}

This is also called the within-groups variance, but is generally referred to as the within groups mean square (MSW)
The First Estimate of \sigma^2
For Example 7.2.1:

MSW = \frac{SSW}{N - k} = \frac{41.35739}{27} = 1.5317552
The Second Estimate of \sigma^2
From the equation

\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}

when rearranged,

\sigma^2 = n \sigma_{\bar{x}}^2

so that, with equal sample sizes, the second estimate of \sigma^2 is

\frac{n \sum_{j=1}^{k} (\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1}
The Second Estimate of \sigma^2
When the sample sizes are not equal, an estimate of \sigma^2 based on the variability among sample means is given by

MSA = \frac{\sum_{j=1}^{k} n_j (\bar{x}_{.j} - \bar{x}_{..})^2}{k - 1}

and is referred to as the among groups mean square (MSA)
The Second Estimate of \sigma^2
For Example 7.2.1:

MSA = \frac{SSA}{k - 1} = \frac{121.185}{5 - 1} = 30.30
THE COMPLETELY RANDOMIZED DESIGN
If the null hypothesis is true, then these two estimates of \sigma^2 are expected to be fairly close in magnitude
If the null hypothesis is false, that is, if not all population means are equal, we would expect the among groups mean square to be larger than the within groups mean square
THE COMPLETELY RANDOMIZED DESIGN
To understand analysis of variance, it must be realized that the among groups mean square provides a valid estimate of \sigma^2 when the assumption of equal population variances is met and when H_0 is true
Both conditions, a true null hypothesis and equal population variances, must be met in order for the among groups mean square to be a valid estimate of \sigma^2
The Variance Ratio
Now it is time to compare the two estimates of \sigma^2 by calculating the following ratio:

V.R. = \frac{among\ groups\ mean\ square}{within\ groups\ mean\ square} = \frac{MSA}{MSW}

If the two estimates are about equal, V.R. will be close to 1
A ratio close to 1 tends to support the hypothesis of equal population means
On the other hand, if the among groups mean square is considerably larger than the within groups mean square, V.R. will be much larger than 1
A value of V.R. sufficiently greater than 1 casts doubt on the hypothesis of equal population means
The Variance Ratio
Now, the question is: how large a value of V.R. is required for us to be willing to conclude that the observed difference between our two estimates of \sigma^2 is not the result of chance alone?
The F Test
To answer the previous question, we need to consider the sampling distribution of the ratio of two sample variances
We have learned (Chapter 5) that the quantity

\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}

follows the F distribution when the sample variances are computed from random and independently drawn samples from normal populations
The F Test
Now, if the population variances are equal (\sigma_1^2 = \sigma_2^2), then

\frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2}

which is distributed as F
The F distribution depends on
the numerator degrees of freedom
the denominator degrees of freedom
the selected significance level
The F Test
The numerator degrees of freedom is found as k - 1, and the denominator degrees of freedom as N - k
For Example 7.2.1:

k - 1 = 5 - 1 = 4
N - k = 32 - 5 = 27
Statistical Decision
To reach a decision, the computed V.R. must be compared with the critical (table) value of F from Table G with 4 and 27 degrees of freedom
For Example 7.2.1, if \alpha = 0.05, the table value of F(4,27) is 2.73
The computed value of F was

F = MSA / MSW = 30.29 / 1.53 = 19.78
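For comparison, the same one-way test can be run with SciPy's ANOVA routine; a sketch (assuming scipy is installed):

```python
# Sketch: one-way ANOVA for Example 7.2.1 via scipy.stats.f_oneway,
# plus the critical F value at alpha = 0.05.
from scipy import stats

g1 = [1.53, 1.61, 3.75, 2.89, 3.26]
g2 = [3.15, 3.96, 3.59, 1.89, 1.45, 1.56]
g3 = [3.89, 3.68, 5.70, 5.62, 5.79, 5.33]
g4 = [8.18, 5.64, 7.36, 5.33, 8.82, 5.26, 7.10]
g5 = [5.86, 5.46, 5.69, 6.49, 7.81, 9.03, 7.49, 8.98]

f_stat, p_value = stats.f_oneway(g1, g2, g3, g4, g5)
f_crit = stats.f.ppf(0.95, dfn=4, dfd=27)   # critical value at alpha = 0.05

print(round(f_stat, 2), round(f_crit, 2))
# 19.78 2.73
```

The p-value returned (about 1e-07) matches the computer output shown later in this chapter.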
Statistical Decision
Since the computed value (19.78) is greater than the table (critical) value (2.73), we need to explain the difference
There are two possible explanations
The first is that the null hypothesis is true (the sample variances are estimates of a common variance), in which case we know that the probability of getting a value as large as or larger than 2.73 is 0.05
The second explanation is that the null hypothesis is false, so we reject the null hypothesis and conclude that not all population means are equal
Statistical Decision
The p value for Example 7.2.1 is even less than 0.005, since 19.78 > F_{0.995} = 4.74
The completely randomized design is used when the units receiving the treatments are homogeneous
If the units are not homogeneous, then some alternative design must be used, such as the randomized complete block design
The completely randomized design is preferred for cases where the treatment groups have equal sizes
Finally, the computer analysis of this method is much simpler than the hand calculations we have seen so far
THE COMPLETELY RANDOMIZED DESIGN
Sample        STIMULANT
           1       2        3        4        5
1         1.53    3.15     3.89     8.18     5.86
2         1.61    3.96     3.68     5.64     5.46
3         3.75    3.59     5.70     7.36     5.69
4         2.89    1.89     5.62     5.33     6.49
5         3.26    1.45     5.79     8.82     7.81
6                 1.56     5.33     5.26     9.03
7                                   7.10     7.49
8                                            8.98
Total    13.04   15.60    30.01    47.69    56.81   |  163.15
Mean     2.608    2.60   5.00167  6.81286  7.10125  |  5.09844
Anova: Single Factor

SUMMARY
Groups      Count    Sum      Average    Variance
Column 1      5     13.04     2.608      0.99172
Column 2      6     15.60     2.6        1.20808
Column 3      6     30.01     5.001667   0.916377
Column 4      7     47.69     6.812857   2.044224
Column 5      8     56.81     7.10125    2.071841

ANOVA
Source of Variation     SS        df    MS         F          P-value     F crit
Between Groups        121.1854     4   30.29636   19.77885    1.05E-07   2.727766
Within Groups          41.35739   27    1.531755
Total                 162.5428    31
Testing for Significant Differences Between Individual Pairs of Means
When ANOVA leads to a rejection of the null hypothesis of no difference among population means, the question naturally arises as to which pairs of means are different
For Example 7.2.1 we need to look at the 10 possible pairs and find out which pair or pairs cause the rejection of the null hypothesis
There are several procedures for this, and some of them will be covered here, such as
Fisher's least significant difference (LSD)
Duncan's new multiple range test
Tukey's honestly significant difference (HSD)
We will look at HSD
Tukey's HSD Test
The HSD test makes use of a single value against which all differences are compared
It is given as

HSD = q_{\alpha, k, N-k} \sqrt{\frac{MSE}{n}}

where \alpha is the chosen level of significance, k is the number of means in the experiment, N is the total number of observations, n is the number of observations in a treatment, MSE is the error or within mean square (MSW), and q is obtained by entering Table H with \alpha, k, and N - k
Tukey's HSD Test
All possible differences between pairs of means are computed
Any difference that yields an absolute value that exceeds HSD is declared to be significant
When the samples are not all the same size, we use

HSD* = q_{\alpha, k, N-k} \sqrt{\frac{MSE}{n_j^*}}

where n_j^* is the smaller of the two sample sizes associated with the two sample means that are to be compared
Example 7.2.2
Here we will apply the HSD test to Example 7.2.1
First we form a table of all possible (ordered) differences between means:

                     \bar{x}_{.2}=2.60  \bar{x}_{.1}=2.61  \bar{x}_{.3}=5.00  \bar{x}_{.4}=6.81  \bar{x}_{.5}=7.10
\bar{x}_{.2}=2.60          -                 0.01               2.40               4.21               4.50
\bar{x}_{.1}=2.61                             -                 2.39               4.20               4.49
\bar{x}_{.3}=5.00                                                -                 1.81               2.10
\bar{x}_{.4}=6.81                                                                   -                 0.29
\bar{x}_{.5}=7.10                                                                                      -
Tukey's HSD Test
If we take \alpha = 0.05 and k = 5, N - k = 27, then the value of q is found as 4.14 (approximately, by interpolation in Table H)
The MSE (MSW) is 1.532
With these we can now calculate the HSD* values
For H_0: \mu_1 = \mu_2, where the smaller sample size is n_j^* = 5,

HSD* = q_{\alpha, k, N-k} \sqrt{\frac{MSE}{n_j^*}} = 4.14 \sqrt{\frac{1.532}{5}} = 2.29
Hypothesis            HSD*                            Statistical Decision
H_0: \mu_1 = \mu_2    4.14 \sqrt{1.532/5} = 2.29      Do not reject H_0, since 0.01 < 2.29
H_0: \mu_1 = \mu_3    4.14 \sqrt{1.532/5} = 2.29      Reject H_0, since 2.39 > 2.29
H_0: \mu_1 = \mu_4    4.14 \sqrt{1.532/5} = 2.29      Reject H_0, since 4.20 > 2.29
H_0: \mu_1 = \mu_5    4.14 \sqrt{1.532/5} = 2.29      Reject H_0, since 4.49 > 2.29
H_0: \mu_2 = \mu_3    4.14 \sqrt{1.532/6} = 2.09      Reject H_0, since 2.40 > 2.09
H_0: \mu_2 = \mu_4    4.14 \sqrt{1.532/6} = 2.09      Reject H_0, since 4.21 > 2.09
H_0: \mu_2 = \mu_5    4.14 \sqrt{1.532/6} = 2.09      Reject H_0, since 4.50 > 2.09
H_0: \mu_3 = \mu_4    4.14 \sqrt{1.532/6} = 2.09      Do not reject H_0, since 1.81 < 2.09
H_0: \mu_3 = \mu_5    4.14 \sqrt{1.532/6} = 2.09      Reject H_0, since 2.10 > 2.09
H_0: \mu_4 = \mu_5    4.14 \sqrt{1.532/7} = 1.94      Do not reject H_0, since 0.29 < 1.94
THE RANDOMIZED COMPLETE BLOCK DESIGN
This is a design in which the units (called experimental units) to which the treatments are applied are subdivided into homogeneous groups called blocks, so that the number of experimental units in a block is equal to the number (or some multiple of the number) of treatments being studied
The treatments are then assigned at random to the experimental units within each block
Thus, each treatment appears in every block, and each block receives every treatment
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance
The model is

x_{ij} = \mu + \beta_i + \tau_j + e_{ij},  i = 1, 2, ..., n;  j = 1, 2, ..., k

where \beta_i is the block effect and \tau_j the treatment effect
Block Effect
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance

BLOCKS                 TREATMENTS
          1        2        3      ...     k        Total    Mean
1       x_{11}   x_{12}   x_{13}   ...   x_{1k}     T_{1.}   \bar{x}_{1.}
2       x_{21}   x_{22}   x_{23}   ...   x_{2k}     T_{2.}   \bar{x}_{2.}
3       x_{31}   x_{32}   x_{33}   ...   x_{3k}     T_{3.}   \bar{x}_{3.}
...       ..       ..       ..     ...     ..        ...       ...
n       x_{n1}   x_{n2}   x_{n3}   ...   x_{nk}     T_{n.}   \bar{x}_{n.}
Total   T_{.1}   T_{.2}   T_{.3}   ...   T_{.k}     T_{..}
Mean    \bar{x}_{.1}  \bar{x}_{.2}  \bar{x}_{.3}  ...  \bar{x}_{.k}      \bar{x}_{..}
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance
The total sum of squares is partitioned as

\sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{..})^2 = \sum_{j=1}^{k} \sum_{i=1}^{n} (\bar{x}_{i.} - \bar{x}_{..})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n} (\bar{x}_{.j} - \bar{x}_{..})^2 + \sum_{j=1}^{k} \sum_{i=1}^{n} (x_{ij} - \bar{x}_{i.} - \bar{x}_{.j} + \bar{x}_{..})^2

that is,

SST = SSBl + SSTr + SSE   (blocks, treatments, error)
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance
The computing formulas are

SST = \sum_{j=1}^{k} \sum_{i=1}^{n} x_{ij}^2 - C

SSBl = \sum_{i=1}^{n} \frac{T_{i.}^2}{k} - C

SSTr = \sum_{j=1}^{k} \frac{T_{.j}^2}{n} - C
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance

SSE = SST - SSBl - SSTr

where the correction term is

C = \left( \sum_{j=1}^{k} \sum_{i=1}^{n} x_{ij} \right)^2 / kn = \frac{T_{..}^2}{kn}
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance
The appropriate degrees of freedom for each component of the model equation are
Total = kn - 1
Blocks = n - 1
Treatments = k - 1
Residual (error) = (n - 1)(k - 1)
THE RANDOMIZED COMPLETE BLOCK DESIGN
Two-way Analysis of Variance
Example 7.3.1
A physical therapist wished to compare three methods for teaching patients to use a certain prosthetic device
He felt that the rate of learning would be different for patients of different ages and wished to design an experiment in which the influence of age could be taken into account
Apply the randomized complete block design to the data given in the following table and analyze the data at \alpha = 0.05
Two-way Analysis of Variance
               Teaching Method
Age Group       A     B     C    Total    Mean
under 20        7     9    10     26      8.66667
20 to 29        8     9    10     27      9
30 to 39        9     9    12     30     10
40 to 49       10     9    12     31     10.3333
50 and over    11    12    14     37     12.3333
Total          45    48    58    151
Mean            9   9.6  11.6            10.0667
Two-way Analysis of Variance
Anova: Two-Factor Without Replication

SUMMARY     Count    Sum    Average    Variance
Row 1         3       26     8.666667  2.333333
Row 2         3       27     9         1
Row 3         3       30    10         3
Row 4         3       31    10.33333   2.333333
Row 5         3       37    12.33333   2.333333
Column 1      5       45     9         2.5
Column 2      5       48     9.6       1.8
Column 3      5       58    11.6       2.8
Two-way Analysis of Variance
ANOVA
Source of Variation     SS      df      MS      F       P-value    F crit
Rows                   24.93    4.00    6.23   14.38    0.00100    3.84
Columns                18.53    2.00    9.27   21.38    0.00062    4.46
Error                   3.47    8.00    0.43
Total                  46.93   14.00
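The ANOVA table above can be reproduced from the raw-score formulas; a NumPy sketch for Example 7.3.1:

```python
# Sketch: randomized complete block ANOVA for Example 7.3.1,
# using the raw-score formulas from the text.
import numpy as np

# rows = age-group blocks, columns = teaching methods A, B, C
x = np.array([
    [ 7,  9, 10],
    [ 8,  9, 10],
    [ 9,  9, 12],
    [10,  9, 12],
    [11, 12, 14],
], dtype=float)

n, k = x.shape                      # 5 blocks, 3 treatments
C = x.sum() ** 2 / (n * k)          # correction term T..^2 / kn

SST  = (x ** 2).sum() - C
SSBl = (x.sum(axis=1) ** 2).sum() / k - C   # block (row) totals
SSTr = (x.sum(axis=0) ** 2).sum() / n - C   # treatment (column) totals
SSE  = SST - SSBl - SSTr

F_tr = (SSTr / (k - 1)) / (SSE / ((n - 1) * (k - 1)))
print(round(SSE, 2), round(F_tr, 2))
# 3.47 21.38
```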
THE FACTORIAL EXPERIMENT
In the previous two cases, we were interested in only one variable, the treatments
However, in many other cases we may want to study, simultaneously, the effects of two or more variables
The variables of interest that we would like to investigate are called factors
An experiment in which two or more factors are studied simultaneously is called a factorial experiment.
THE FACTORIAL EXPERIMENT
The different designated categories of a factor are called levels in factorial experiments
For example, if we are studying the effect on reaction time of three dosages of some drug, then the drug is a factor and it is said to occur at three levels
In a factorial experiment we may study the effects of the individual factors as well as the interactions of factors, if the experiment is designed properly
The following example illustrates the concept of interaction
THE FACTORIAL EXPERIMENT
Example 7.4.1
Suppose, in terms of effect on reaction time, the true relationship between three dosage levels of some drug and the age of human subjects taking the drug is known
Suppose further that age occurs at two levels: young (under 65) and old (65 and over)
If the true relationship between the two factors is known, we will know, for the three dosage levels, the mean effect on reaction time of subjects in the two age groups
Let us assume that the effect is measured in terms of reduction in reaction time to some stimulus, with the means as shown in Table 7.4.1
THE FACTORIAL EXPERIMENT
The no-interaction case (Table 7.4.1)

                     Factor B (Drug Dosage)
Factor A (Age)       j=1            j=2            j=3
Young (i=1)       \mu_{11} = 5   \mu_{12} = 10  \mu_{13} = 20
Old (i=2)         \mu_{21} = 10  \mu_{22} = 15  \mu_{23} = 25
THE FACTORIAL EXPERIMENT
[Figure: Drug effect when no interaction is present. Reduction in reaction time vs. drug dosage (levels 1-3); the lines for Young (i=1) and Old (i=2) are parallel.]
THE FACTORIAL EXPERIMENT
[Figure: Age effect when no interaction is present. Reduction in reaction time vs. age group; the lines for dosages j=1, j=2, j=3 are parallel.]
THE FACTORIAL EXPERIMENT
[Figure: the drug-effect and age-effect panels for the no-interaction case, shown side by side.]
THE FACTORIAL EXPERIMENT
The effect of one type of interaction

                     Factor B (Drug Dosage)
Factor A (Age)       j=1            j=2            j=3
Young (i=1)       \mu_{11} = 5   \mu_{12} = 10  \mu_{13} = 20
Old (i=2)         \mu_{21} = 15  \mu_{22} = 10  \mu_{23} = 5
THE FACTORIAL EXPERIMENT
[Figure: Age and drug effects with interaction present. Reduction in reaction time vs. drug dosage; the lines for Young (i=1) and Old (i=2) cross rather than running parallel.]
THE FACTORIAL EXPERIMENT
[Figure: Age and drug effects with interaction present. Reduction in reaction time vs. age group; the lines for dosages j=1, j=2, j=3 cross rather than running parallel.]
THE FACTORIAL EXPERIMENT
[Figure: the two interaction panels (dosage on the horizontal axis and age on the horizontal axis) shown side by side.]
THE FACTORIAL EXPERIMENT
Example 7.4.2
In a study of the length of time spent on individual home visits by public health nurses, data were reported on the length of home visit, in minutes, for a sample of 80 nurses.
A record was also made of each nurse's age and the type of illness of each patient visited.
The researchers wished to obtain from their investigation answers to the following questions:
1. Does the mean length of home visit differ among different age groups of nurses?
2. Does the type of patient affect the mean length of home visit?
3. Is there interaction between nurse's age and type of patient?
THE FACTORIAL EXPERIMENT
The model

x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ijk}

i = 1, 2, ..., a;  j = 1, 2, ..., b;  k = 1, 2, ..., n
THE FACTORIAL EXPERIMENT
Hypotheses

H_0: \alpha_i = 0,  i = 1, 2, ..., a
H_A: not all \alpha_i = 0

H_0: \beta_j = 0,  j = 1, 2, ..., b
H_A: not all \beta_j = 0

H_0: (\alpha\beta)_{ij} = 0, all i, j
H_A: not all (\alpha\beta)_{ij} = 0
THE FACTORIAL EXPERIMENT
Test statistic
The total sum of squares is partitioned as

\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (x_{ijk} - \bar{x}_{...})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{...})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (x_{ijk} - \bar{x}_{ij.})^2

SST = SSTr + SSE
THE FACTORIAL EXPERIMENT
The SSTr can be broken down into three parts:

\sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{...})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{i..} - \bar{x}_{...})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{.j.} - \bar{x}_{...})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} (\bar{x}_{ij.} - \bar{x}_{i..} - \bar{x}_{.j.} + \bar{x}_{...})^2

SSTr = SSA + SSB + SSAB
THE FACTORIAL EXPERIMENT
The computing formulas are

SST = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk}^2 - C

SSTr = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{T_{ij.}^2}{n} - C

SSE = SST - SSTr
THE FACTORIAL EXPERIMENT

SSA = \sum_{i=1}^{a} \frac{T_{i..}^2}{bn} - C

SSB = \sum_{j=1}^{b} \frac{T_{.j.}^2}{an} - C

SSAB = SSTr - SSA - SSB

where the correction term is

C = \left( \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} x_{ijk} \right)^2 / abn
THE FACTORIAL EXPERIMENT
Two-way Analysis of Variance with Replicates
For the preparation of a standard reference sample for the determination of manganese in alloyed steel, a laboratory intercomparison study was carried out
Four laboratories participated, and each lab used three different analytical principles
Two-way analysis of variance is to be used to test whether there are systematic differences between the laboratories and the principles of analysis
The following table shows the analysis results
Example 2
Analytical determinations of manganese (mass %)

Analytical        Laboratories
Principles      1      2      3      4     Mean
1              2.01   1.96   1.99   2.03   2.00
2              1.97   2.05   2.04   1.99   2.01
3              2.05   2.06   2.11   2.12   2.09
Mean           2.01   2.02   2.05   2.05   2.03
Two-way analysis of variance results
Example 2
Anova: Two-Factor Without Replication

SUMMARY          Count    Sum    Average    Variance
Principle 1        4      7.99   1.9975     0.00089
Principle 2        4      8.05   2.0125     0.00149
Principle 3        4      8.34   2.085      0.00123
Laboratory 1       3      6.03   2.01       0.0016
Laboratory 2       3      6.07   2.02333    0.00303
Laboratory 3       3      6.14   2.04667    0.00363
Laboratory 4       3      6.14   2.04667    0.00443
Two-way analysis of variance results
Example 2
ANOVA
Source of Variation     SS     df    MS        F         P-value   F crit
Rows (principles)      0.02     2   0.00876   6.66596    0.03      5.14
Columns (labs)         0.00     3   0.00099   0.75264    0.56      4.76
Error                  0.01     6   0.00131
Total                  0.03    11

For the comparison of analytical principles, the critical F value at a significance level of \alpha = 0.05 is F(0.95; f1=2, f2=6) = 5.14
This value is less than the calculated value (6.67), so the conclusion is that the different analytical principles lead to systematic differences
Example 2
For the comparison of laboratories, the critical F value at a significance level of \alpha = 0.05 is F(0.95; f1=3, f2=6) = 4.76
This value is larger than the calculated value (0.75), thus the test result is statistically not significant.
The differences between the labs are random
Example 2
Let us look at the same example assuming each lab made three replicate determinations with each analytical principle

ANOVA: two factor with replication

Analytical        Laboratories
Principles      1      2      3      4
1              2.01   1.96   1.99   2.03
1              2.03   1.99   2.03   2.01    (principle mean 2.00583)
1              2.02   1.97   1.97   2.06
2              1.97   2.05   2.04   1.96
2              1.94   2.08   2.02   1.96    (principle mean 2.0075)
2              1.98   2.09   2.03   1.97
3              2.01   2.03   2.12   2.12
3              2.03   2.02   2.10   2.13    (principle mean 2.07417)
3              2.05   2.06   2.11   2.11
Mean          2.0044  2.028  2.04556 2.03889  (grand mean 2.02917)
ANOVA: two factor with replication
Anova: Two-Factor With Replication

SUMMARY        1        2       3         4        Total
Principle 1
Count          3        3       3         3         12
Sum           6.06     5.92    5.99      6.10      24.07
Average       2.02     1.973   1.996667  2.03333    2.00583
Variance      0.0001   2E-04   0.000933  0.00063    0.00092
Principle 2
Count          3        3       3         3         12
Sum           5.89     6.22    6.09      5.89      24.09
Average       1.9633   2.073   2.03      1.96333    2.0075
Variance      0.0004   4E-04   0.0001    3.3E-05    0.00257
Principle 3
Count          3        3       3         3         12
Sum           6.09     6.11    6.33      6.36      24.89
Average       2.03     2.037   2.11      2.12       2.07417
Variance      0.0004   4E-04   0.0001    0.0001     0.00203
Total
Count          9        9       9         9
Sum          18.04    18.25   18.41     18.35
Average       2.0044   2.028   2.045556  2.03889
Variance      0.0012   0.002   0.002828  0.00481
ANOVA: two factor with replication
ANOVA
Source of Variation     SS      df    MS        F         P-value   F crit
Sample (principles)    0.0365    2   0.018233  55.6271    9.7E-10   3.403
Columns (labs)         0.0088    3   0.002929   8.93503   0.00037   3.009
Interaction            0.0440    6   0.007326  22.3503    1E-08     2.508
Within                 0.0079   24   0.000328
Total                  0.0971   35
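A NumPy sketch reproducing the main entries of this table from the raw-score formulas for the replicated two-factor design (data as tabulated above):

```python
# Sketch: sums of squares for the replicated manganese data
# (3 principles x 4 laboratories, n = 3 replicates per cell).
import numpy as np

# data[i, j, :] = replicates for principle i, laboratory j
data = np.array([
    [[2.01, 2.03, 2.02], [1.96, 1.99, 1.97], [1.99, 2.03, 1.97], [2.03, 2.01, 2.06]],
    [[1.97, 1.94, 1.98], [2.05, 2.08, 2.09], [2.04, 2.02, 2.03], [1.96, 1.96, 1.97]],
    [[2.01, 2.03, 2.05], [2.03, 2.02, 2.06], [2.12, 2.10, 2.11], [2.12, 2.13, 2.11]],
])
a, b, n = data.shape
C = data.sum() ** 2 / (a * b * n)                          # correction term

SST  = (data ** 2).sum() - C
SSTr = (data.sum(axis=2) ** 2).sum() / n - C               # among the a*b cells
SSA  = (data.sum(axis=(1, 2)) ** 2).sum() / (b * n) - C    # principles
SSB  = (data.sum(axis=(0, 2)) ** 2).sum() / (a * n) - C    # laboratories
SSAB = SSTr - SSA - SSB                                    # interaction
SSE  = SST - SSTr                                          # within cells

F_AB = (SSAB / ((a - 1) * (b - 1))) / (SSE / (a * b * (n - 1)))
print(round(SSA, 4), round(SSAB, 4))
# 0.0365 0.044
```

The interaction F ratio comes out near 22.35, matching the table.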
Example
Five charges of gasoline (A, B, C, D, E) are to be compared with respect to their octane rating
To account for confounded effects from the analyst and the day of the analysis, a 5x5 Latin square is used in the experimental design

          Analysts
Days    1   2   3   4   5
1       A   B   C   D   E
2       B   C   D   E   A
3       C   D   E   A   B
4       D   E   A   B   C
5       E   A   B   C   D

Use an appropriate software package and ANOVA procedure to decide whether there are systematic differences between the five charges at a significance level of 0.05
Example
          Analysts
Days      1      2      3      4      5
1        96.6   96.6   95.7   96.1   97.0
2        96.2   96.6   95.3   96.5   95.9
3        96.2   96.0   95.8   95.8   95.8
4        94.9   96.6   95.9   96.3   95.7
5        96.0   96.2   96.1   95.2   95.9
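A NumPy sketch of the Latin square analysis (treatment layout and octane data taken from the two tables above). Because each charge appears exactly once in every row and column, the residual has (p-1)(p-2) = 12 degrees of freedom:

```python
# Sketch: Latin square ANOVA for the octane data, partitioning the
# variation into days (rows), analysts (columns), and charges
# (treatments).
import numpy as np

y = np.array([
    [96.6, 96.6, 95.7, 96.1, 97.0],
    [96.2, 96.6, 95.3, 96.5, 95.9],
    [96.2, 96.0, 95.8, 95.8, 95.8],
    [94.9, 96.6, 95.9, 96.3, 95.7],
    [96.0, 96.2, 96.1, 95.2, 95.9],
])
trt = np.array([
    list("ABCDE"), list("BCDEA"), list("CDEAB"),
    list("DEABC"), list("EABCD"),
])

p = 5
C = y.sum() ** 2 / y.size                       # correction term
SST   = (y ** 2).sum() - C
SSrow = (y.sum(axis=1) ** 2).sum() / p - C      # days
SScol = (y.sum(axis=0) ** 2).sum() / p - C      # analysts
SStrt = sum(y[trt == t].sum() ** 2 for t in "ABCDE") / p - C
SSE   = SST - SSrow - SScol - SStrt

F = (SStrt / (p - 1)) / (SSE / ((p - 1) * (p - 2)))
print(round(F, 2))
# 2.67
```

Since 2.67 is below the critical value F(0.95; 4, 12) = 3.26, no systematic difference between the five charges is indicated at the 0.05 level.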
The 2^3 Factorial Design
Effects in the 2^3 Factorial Design

A = \bar{y}_{A+} - \bar{y}_{A-}
B = \bar{y}_{B+} - \bar{y}_{B-}
C = \bar{y}_{C+} - \bar{y}_{C-}
etc.

Analysis is done via computer
Table 6.3 (p. 214): Algebraic Signs for Calculating Effects in the 2^3 Design
An Example of a 2^3 Factorial Design
Example 6.1
A 2^3 factorial design for a nitride etch process
Factors are:
Gap between electrodes (A)
Gas flow (B)
RF power (C)
The response is etch rate

A = gap, B = flow, C = power, y = etch rate
Figure 6.6 (p. 216): The 2^3 design for the plasma etch experiment of Example 6.1.
Table of - and + Signs for the 2^3 Factorial Design (p. 214)
Properties of the table:
Except for column I, every column has an equal number of + and - signs
The sum of the products of signs in any two columns is zero
Multiplying any column by I leaves that column unchanged (identity element)
The product of any two columns yields a column in the table, e.g.

A \times B = AB
AB \times BC = AB^2C = AC

This is an orthogonal design, and orthogonality is an important property shared by all factorial designs
Estimation of Factor Effects
ANOVA Summary: Full Model
The analysis can be run in MATLAB (look in the folder for the 2^k, k = 3, design):

load D.txt            % columns 1-3: factor levels for A, B, C; column 4: response
y  = D(:,4);
f1 = D(:,1);
f2 = D(:,2);
f3 = D(:,3);
[p,table,stats,terms] = anovan(y, {f1 f2 f3}, 'varnames', {'A' 'B' 'C'}, 'model', 'full')