
Analysis of Variance - One Way

I. Introduction
The ANalysis Of VAriance (or ANOVA) is a powerful and common statistical procedure in the social sciences. It can handle a
variety of situations. We will talk about the case of one between groups factor here and two between groups factors in the next
section.
The example that follows is based on a study by Darley and Latané (1969). The authors were interested in whether the presence of
other people has an influence on whether a person will help someone in distress. In this classic study, the experimenter (a female
graduate student) had the subject wait in a room with either 0, 2, or 4 confederates. The experimenter announces that the study will
begin shortly and walks into an adjacent room. In a few moments the person(s) in the waiting room hear her fall and complain of
ankle pain. The dependent measure is the number of seconds it takes the subject to help the experimenter.
How do we analyze this data? We could do a bunch of between groups t tests. However, this is not a good idea for three reasons.
1. The amount of computational labor increases rapidly with the number of groups in the study.

   Number of Groups    Number of Pairs of Means
          3                       3
          4                       6
          5                      10
          6                      15
          7                      21
          8                      28

2. We are interested in one thing -- is the number of people present related to helping behavior? -- thus it would be nice to be
able to do one test that would answer this question.
3. The type I error rate rises with the number of tests we perform.
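The first and third reasons can be made concrete with a short sketch. It counts the pairs of means among p groups as p(p-1)/2 and, as an illustrative assumption that is not part of the original notes, approximates the chance of at least one type I error over c tests as 1 - (1 - alpha)^c. That approximation treats the tests as independent, which pairwise t tests sharing groups are not, so the rate is only a rough guide. The function names are hypothetical.

```python
from math import comb

def n_pairs(p):
    """Number of distinct pairs of means among p groups: p(p-1)/2."""
    return comb(p, 2)

def familywise_rate(c, alpha=0.05):
    """Approximate P(at least one type I error) over c tests.

    Assumes the c tests are independent, which is only a rough
    approximation for pairwise t tests that share groups.
    """
    return 1 - (1 - alpha) ** c

for p in range(3, 9):
    c = n_pairs(p)
    print(f"{p} groups: {c:2d} pairs, familywise rate ~ {familywise_rate(c):.2f}")
```

Even under this rough approximation the error rate climbs quickly, which is the motivation for a single omnibus test.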
II. Logic
The reason this analysis is called ANOVA rather than multi-group means analysis (or something like that) is that it compares
group means by comparing variance estimates. Consider:
We draw three samples. Why might these means differ? There are two reasons:
1. Group Membership (i.e., the treatment effect or IV).
2. Differences not due to group membership (i.e., chance or sampling error).
The ANOVA is based on the fact that two independent estimates of the population variance can be obtained from the sample data. A
ratio is formed for the two estimates, where:
    the numerator (the between groups estimate) is sensitive to treatment effect & error
    and the denominator (the within groups estimate) is sensitive to error only.
Given the null hypothesis (in this case H0: μ1 = μ2 = μ3), the two variance estimates should be equal. That is, since the null assumes no
treatment effect, both variance estimates reflect only error and their ratio should be about 1. To the extent that this ratio is larger than 1, it
suggests a treatment effect (i.e., differences between the groups).
It turns out that the ratio of these two variance estimates is distributed as F when the null hypothesis is true.
http://www4.uwsp.edu/psych/stat/12/anova-1w.htm#V
Note:
1. F is an extended family of distributions, which varies as a function of a pair of degrees of freedom (one for each variance
estimate).
2. F is positively skewed.
3. F ratios, like the variance estimates from which they are derived, cannot have a value less than zero.
Using the F distribution, we can compute the probability of obtaining a ratio at least as large as the observed one due to chance. If this
probability is low (p ≤ α), we will reject the null hypothesis.
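A small simulation can illustrate these properties. The sketch below is an assumption-laden illustration rather than anything from the original notes: it draws three groups from one normal population (so the null hypothesis is true by construction), forms the ratio of the between and within groups variance estimates, and repeats this many times. The population mean and standard deviation (30 and 5), the seed, and the function name are arbitrary choices; the group sizes (4, 5, 5) and the tabled critical value 3.98 for (2, 11) df are borrowed from the formal example in these notes.

```python
import random

def f_ratio(groups):
    """Between groups variance estimate divided by the within groups estimate."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

random.seed(1)      # arbitrary seed so the run is repeatable
sizes = (4, 5, 5)   # group sizes borrowed from the formal example
ratios = []
for _ in range(20000):
    # Null hypothesis true: every group comes from the same normal population.
    groups = [[random.gauss(30, 5) for _ in range(n)] for n in sizes]
    ratios.append(f_ratio(groups))

f_crit = 3.98  # tabled critical value for (2, 11) df at alpha = .05
share = sum(f >= f_crit for f in ratios) / len(ratios)
print(f"smallest ratio: {min(ratios):.4f}")
print(f"share of ratios >= {f_crit}: {share:.3f}")
```

No simulated ratio falls below zero, and roughly 5% of them exceed the critical value, which is what rejecting at α = .05 means when the null is true.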
III. Notation
We already knew that:
i = any score
n = the last score (or the number of scores)
What is new here is that:
j = any group
p = the last group (or the number of groups)
Thus:

              Group
    1      2      j      p
   X11    X12    X1j    X1p
   X21    X22    X2j    X2p
   Xi1    Xi2    Xij    Xip
   Xn1    Xn2    Xnj    Xnp
   T1     T2     Tj     Tp      (group totals)
   n1     n2     nj     np      (group sizes)
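And:
1. $T_j = \sum_{i=1}^{n_j} X_{ij}$ (the total for group j)
2. $\bar{X}_j = T_j / n_j$ (the mean for group j)
3. $N = \sum_{j=1}^{p} n_j$ (the total number of scores)
4. $T = \sum_{j=1}^{p} T_j$ (the grand total)
5. $\bar{X} = T / N$ (the grand mean)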
IV. Terminology
Since we are talking about the analysis of the variance, let's review what we know about it.
$$ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{SS}{df} = MS $$
So the variance is the mean of the squared deviations about the mean (MS) or the sum of the squared deviations about the mean (SS)
divided by the degrees of freedom.
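As a quick check of this terminology, the sketch below computes SS, df, and MS by hand for the first group of scores from the formal example in these notes and compares the result with the sample variance from Python's standard `statistics` module. The variable names are illustrative only.

```python
from statistics import variance

scores = [25, 30, 20, 32]  # first group of the formal example's data

mean = sum(scores) / len(scores)
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations (SS)
df = len(scores) - 1                       # degrees of freedom
ms = ss / df                               # mean square (MS), i.e. the variance

print(f"SS = {ss:.2f}, df = {df}, MS = {ms:.4f}")
print(f"statistics.variance gives {variance(scores):.4f}")
```

The two values agree because `statistics.variance` also divides the sum of squared deviations by n - 1.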
V. Partitioning the Variance
As noted above, two independent estimates of the population variance can be obtained. Expressed in terms of the Sum of Squares:
$$ SS_T = SS_B + SS_W $$
To make this more concrete, consider a data set with 3 groups and 4 subjects in each. Thus, the possible deviations for the
score X13 are as follows:
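As you can see, there are three deviations and:
$$ (X_{13} - \bar{X}_3) + (\bar{X}_3 - \bar{X}) = (X_{13} - \bar{X}) $$
   within groups (#1) + between groups (#2) = total deviation (#3)
where $\bar{X}_3$ is the mean of group 3 and $\bar{X}$ is the grand mean.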
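To obtain the Sum of the Squared Deviations about the Mean (the SS), we can square these deviations and sum them over all the
scores.
Thus we have:
$$ SS_W = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2 $$
$$ SS_B = \sum_{j=1}^{p} n_j (\bar{X}_j - \bar{X})^2 $$
$$ SS_T = \sum_{j=1}^{p} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2 $$
Note: the nj in the formula for SSB (the between groups SS) means that the squared deviation of a group mean is counted once for each score in that group.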
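The partition can be checked numerically. The sketch below uses made-up scores for 3 groups of 4 subjects (the values are hypothetical and do not come from the notes) and computes the three sums of squares directly from their deviation definitions.

```python
# Hypothetical scores: 3 groups with 4 subjects in each (not from the notes).
groups = [[3, 5, 4, 4], [6, 7, 5, 6], [9, 8, 10, 9]]

scores = [x for g in groups for x in g]
grand_mean = sum(scores) / len(scores)
group_means = [sum(g) / len(g) for g in groups]

# Within: each score's deviation from its own group mean.
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Between: each group mean's deviation from the grand mean, counted once per score (n_j).
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Total: each score's deviation from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in scores)

print(f"within = {ss_within:.4f}, between = {ss_between:.4f}, total = {ss_total:.4f}")
```

Up to floating point rounding, the within and between sums of squares add up to the total.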
VI. The F Test
It is simply the ratio of the two variance estimates:
$$ F = \frac{MS_B}{MS_W} = \frac{SS_B / df_B}{SS_W / df_W} $$
As usual, the critical values are given by a table. Going into the table, one needs to know the degrees of freedom for both the
between and within groups variance estimates, as well as the alpha level.
For example, if we have 3 groups and 10 subjects in each, then:
dfB = p - 1 = 3 - 1 = 2
dfW = p(n - 1) = 3 × (10 - 1) = 27
    or with unequal n's: dfW = (n1 - 1) + (n2 - 1) + ... + (np - 1) = N - p
dfT = N - 1 = 30 - 1 = 29
Note that the df add up to the total (2 + 27 = 29) and with α = .05, Fcrit = 3.35.
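The degrees of freedom rules can be sketched as a small helper. The function name `anova_df` is hypothetical; it takes a list of group sizes so that the equal and unequal cases are handled the same way.

```python
def anova_df(group_sizes):
    """Return (df_between, df_within, df_total) for a one-way design."""
    p = len(group_sizes)          # number of groups
    n_total = sum(group_sizes)    # N, the total number of scores
    df_between = p - 1
    df_within = sum(n - 1 for n in group_sizes)  # equals N - p
    df_total = n_total - 1
    return df_between, df_within, df_total

print(anova_df([10, 10, 10]))  # 3 groups with 10 subjects in each
print(anova_df([4, 5, 5]))     # unequal group sizes
```

In both cases the between and within df sum to the total df, which is the check recommended in the text.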
VII. Formal Example
1. Research Question
Does the presence of others influence helping behavior?
2. Hypotheses
        In Symbols          In Words
H0      μ1 = μ2 = μ3        The presence of others does not influence helping.
HA      Not H0              The presence of others does influence helping.
3. Assumptions
1) The null hypothesis.
2) The subjects are sampled randomly.
3) The population distribution of the DV is normal in shape.
4) The groups are independent.
5) The population variances are homogeneous.
4. Decision rules
Given 3 groups with 4, 5, and 5 subjects, respectively, we have (3-1=) 2 df for the between groups variance estimate and
(3+4+4=) 11 df for the within groups variance estimate. (Note that it is good to check that the df add up to the total: 2 + 11 = 13 = N - 1.) Now
with an α level of .05, the table shows that the critical value of F is 3.98. If Fobs ≥ Fcrit, reject H0, otherwise do not reject H0.
5. Computation - [Minitab]
Here is the data (i.e., the number of seconds it took for folks to help):

        # people present
           0       2       4
          25      30      32
          30      33      39
          20      29      35
          32      40      41
                  36      44
T        107     168     191
n          4       5       5
Mean    26.8    33.6    38.2

A good way to describe this data would be to plot the means:
For the analysis, we will use a grid as usual for most of the calculations:

           0       X²        2       X²        4       X²
          25      625       30      900       32     1024
          30      900       33     1089       39     1521
          20      400       29      841       35     1225
          32     1024       40     1600       41     1681
                            36     1296       44     1936
T        107               168               191             = 466 (T)
n          4                 5                 5             = 14 (N)
Mean    26.8              33.6              38.2
ΣX²              2949              5726              7387    = 16062 (II)
T²/n          2862.25            5644.8            7296.2    = 15803.25 (III)
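Now we need the grand totals and the three intermediate quantities:
I. $T^2 / N = 466^2 / 14 = 15511.14$
II. $\sum_j \sum_i X_{ij}^2 = 2949 + 5726 + 7387 = 16062$
III. $\sum_j (T_j^2 / n_j) = 2862.25 + 5644.8 + 7296.2 = 15803.25$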
And now we can compute the SS's (remember to check that they add up):
SSB = III - I = 15803.25 - 15511.14 = 292.11
SSW = II - III = 16062 - 15803.25 = 258.75
SST = II - I = 16062 - 15511.14 = 550.86
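The same arithmetic can be sketched in a few lines of Python using the scores given in the notes. The names `q1`, `q2`, and `q3` are illustrative stand-ins for the quantities I, II, and III.

```python
# Seconds to help, keyed by the number of other people present (data from the notes).
data = {0: [25, 30, 20, 32], 2: [30, 33, 29, 40, 36], 4: [32, 39, 35, 41, 44]}

grand_total = sum(sum(g) for g in data.values())   # T
n_total = sum(len(g) for g in data.values())       # N

q1 = grand_total ** 2 / n_total                         # I   = T^2 / N
q2 = sum(x ** 2 for g in data.values() for x in g)      # II  = sum of all squared scores
q3 = sum(sum(g) ** 2 / len(g) for g in data.values())   # III = sum of T_j^2 / n_j

ss_between = q3 - q1
ss_within = q2 - q3
ss_total = q2 - q1

print(f"I = {q1:.2f}, II = {q2}, III = {q3:.2f}")
print(f"SSB = {ss_between:.2f}, SSW = {ss_within:.2f}, SST = {ss_total:.2f}")
```

Rounded to two decimals, these match the hand computations, and the between and within values sum to the total.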
Then we can create the ANOVA summary table:
Source       SS       df      MS       F       p
Between    292.11      2    146.05    6.21    <.05
Within     258.75     11     23.52
Total      550.86     13
6. Decision
Since Fobs (6.21) is > Fcrit (3.98), reject H0 and conclude that the more people present, the longer it takes to get help.
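The last two steps (forming the mean squares and applying the decision rule) can be sketched as follows. The SS and df values and the critical value 3.98 are taken from the notes; the variable names are illustrative.

```python
# Sums of squares and df taken from the summary table in the notes.
ss_between, df_between = 292.11, 2
ss_within, df_within = 258.75, 11

ms_between = ss_between / df_between   # between groups variance estimate
ms_within = ss_within / df_within      # within groups variance estimate
f_obs = ms_between / ms_within

f_crit = 3.98  # tabled value for (2, 11) df at alpha = .05
decision = "reject H0" if f_obs >= f_crit else "do not reject H0"
print(f"MSB = {ms_between:.2f}, MSW = {ms_within:.2f}, F = {f_obs:.2f}: {decision}")
```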
VIII. Comparisons Among Means
In the formal example presented above, we rejected the null and asserted that the groups were drawn from different populations. But
which groups are different from which? A "comparison" compares the means of two groups. There are two kinds of comparisons
that we can perform: "preplanned" and "post hoc". These are outlined below. Which approach is used should be based on our goals.
In reality, the post hoc approach is the one that is most often taken.
Preplanned:
    We have a theory (or some previous research) which suggests certain comparisons.
    In this case, we might not even compute the omnibus F (this approach is somewhat analogous to a one-tailed test).
Post hoc:
    We have a significant overall (or omnibus) F and then want to localize the effect.
    Are more commonly used than preplanned comparisons.
In addition, there are "simple" (involving two means) and "complex" (involving more than two means) comparisons. With three
groups (Groups 1, 2 & 3), the following 6 comparisons are possible.
Simple        Complex
1 vs. 2       (1 + 2) vs. 3
1 vs. 3       1 vs. (2 + 3)
2 vs. 3       (1 + 3) vs. 2
As the number of groups increases, so does the number of comparisons that are possible. Some of these can tell us about trend (a
description of the form of the relationship between the IV and DV).
The problem with post hoc tests is that the type I error rate increases the more comparisons we perform. This is a somewhat
controversial area and there are a number of methods currently in use to deal with this problem. We will consider one of the
simpler methods below.
The protected t test - [Minitab] [Spreadsheet]
It is performed only when the omnibus F is significant. This technique is protected because it requires the omnibus F to be
significant (which tells us there is at least one comparison between means that is significant). So, in other words, it is
protected because we are not just shooting in the dark.
It uses a more stable estimate of the population variance than the t test (i.e., MSW, which is based on all of the groups, instead of
a variance pooled from only the two groups being compared) and as a result the df is greater.
The formula is:
$$ F = \frac{(\bar{X}_j - \bar{X}_k)^2}{MS_W \left( \frac{1}{n_j} + \frac{1}{n_k} \right)} $$
Where j and k are the two groups being compared, and the df's are 1 for the numerator and dfW for the denominator. Thus, the Fcrit
for a comparison is different than it was for the omnibus F ratio (2, 11 df = 3.98) since we are now comparing a pair of means,
and there is 2-1=1 df for the numerator.
So, for our example the critical value of F (1, 11 df) is 4.84 (from the table). Applying the formula to each pair of means,
the only comparison that is significant is that between the first and third groups.
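The comparisons can be sketched in code. The formula image did not survive in this copy of the notes, so the sketch assumes the usual F form of the protected t test: the squared difference between two group means divided by MSW times (1/n_j + 1/n_k). The scores, MSW (SSW / dfW), and the critical value 4.84 come from the notes; the function name `comparison_f` is hypothetical.

```python
from itertools import combinations

# Scores from the notes, keyed by the number of other people present.
data = {0: [25, 30, 20, 32], 2: [30, 33, 29, 40, 36], 4: [32, 39, 35, 41, 44]}
ms_within = 258.75 / 11   # SSW / dfW from the summary table
f_crit = 4.84             # tabled value for (1, 11) df at alpha = .05

def comparison_f(a, b, ms_w):
    """F for one pairwise comparison, using MSW as the error term (assumed formula)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return (mean_a - mean_b) ** 2 / (ms_w * (1 / len(a) + 1 / len(b)))

significant = []
for j, k in combinations(data, 2):
    f = comparison_f(data[j], data[k], ms_within)
    if f >= f_crit:
        significant.append((j, k))
    print(f"{j} vs. {k} people present: F = {f:.2f}")

print("significant pairs:", significant)
```

Under that assumed formula, only the comparison between the first (0 present) and third (4 present) groups exceeds the critical value, in line with the conclusion stated in the text.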
IX. Relation of F to t
Since the F test is just an extension of the t test to more than two groups, they should be related and they are.
With two groups, F = t² (and this applies to both the critical and observed values).
For example, consider the critical values for df = (1, 15) with α = .05:
Fcrit (1, 15) = tcrit (15)²
Obtaining the values from the tables, we can see that this is true:
4.54 = 2.131²
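The same relation holds for observed values, which can be checked with a short sketch. It takes two of the groups from the formal example (0 and 4 other people present), computes an independent groups t with a pooled variance estimate, and then computes a one-way ANOVA F on the same two groups. Pairing these two particular groups is an illustrative choice, not something done in the notes.

```python
from math import sqrt

# Two of the groups from the formal example (0 and 4 other people present).
a = [25, 30, 20, 32]
b = [32, 39, 35, 41, 44]

mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
ss_a = sum((x - mean_a) ** 2 for x in a)
ss_b = sum((x - mean_b) ** 2 for x in b)
df_within = len(a) + len(b) - 2

# Independent groups t test with a pooled variance estimate.
pooled_var = (ss_a + ss_b) / df_within
t_obs = (mean_a - mean_b) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# One-way ANOVA on the same two groups (between groups df = 2 - 1 = 1).
grand_mean = (sum(a) + sum(b)) / (len(a) + len(b))
ss_between = len(a) * (mean_a - grand_mean) ** 2 + len(b) * (mean_b - grand_mean) ** 2
f_obs = (ss_between / 1) / ((ss_a + ss_b) / df_within)

print(f"t = {t_obs:.4f}, t squared = {t_obs ** 2:.4f}, F = {f_obs:.4f}")
```

With only two groups, the within groups variance estimate is the pooled variance of the t test, so the squared t and the F agree up to rounding.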
Copyright © 1997-2012 M. Plonsky, Ph.D.
Comments? mplonsky@uwsp.edu.
