
11/21/2013

Unit 11 Outline
Stat 101, Unit 11: Analysis of Variance
Textbook Chapter 12

- Analysis of Variance (ANOVA)
- General format and ANOVA's F-test
- Link to binary regression
- Contrast testing
- Multiple comparisons (& the Bonferroni correction)
- Two-way ANOVA (and Multi-way ANOVA)

Recall the Jumping Rats Example

As always, first visualize the data. Groups, means, and SDs:

Group           Mean   SD
1 No jumping    601    27.4
2 30 cm jump    613    19.3
3 60 cm jump    639    16.6

Analysis of Variance (ANOVA)

ANOVA extends hypotheses that we have seen already:
- With one population: H0: μ = μ0
- With two populations: H0: μ1 = μ2
- Now consider inferences for g = 3 or more populations (technically, g > 2): H0: μ1 = μ2 = . . . = μg

We'd like to do a t-test, but there's no t-test formula for 3 or more groups. The analysis technique for this situation is the solution: Analysis of Variance (ANOVA).

Main concept of ANOVA

With g populations (groups), there are two types of variability in the data:
(A) Variation of individual values around their group means (variability within groups)
(B) Variation of group means around the overall mean (variability between groups)

Main concept: if (A) is small relative to (B), this implies the group means are different.

ANOVA determines whether variability in the data is mainly from variation within groups or variation between groups. That is, does labelling the data into these groups lead to more-than-chance differences in the group means?

Think boxplots split by groups:
[Figure: two sets of boxplots. Left: within-group variability relatively large, adds noise, obscures group differences. Right: within-group variability relatively small, group differences not obscured by noise.]

One-way ANOVA: the model

The one-way ANOVA model for a quantitative variable x is

x_ij = μ_i + ε_ij

for i = 1, 2, . . . , g groups (subpopulations) of a specific factor, and j = 1, 2, . . . , n_i individuals sampled in each group.

The ε_ij are assumed to be ~ N(0, σ). The model parameters are μ1, μ2, . . . , μg and σ (one common SD!).

The model can be written as (just like regression!):

DATA = MODEL + RESIDUAL

Note: we used the symbol x here to represent the measured variable, but we could have chosen y instead (to mimic regression's response variable).

[Figure: model for one-way ANOVA with g = 3 groups: three normal curves with different means μ1, μ2, μ3 but one common SD σ. The residuals ε_ij are assumed ~ N(0, σ), so the only difference among the groups is their differing means.]
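This model is easy to simulate, which can help make DATA = MODEL + RESIDUAL concrete. A minimal sketch (not from the slides; the group means and common SD below are made-up illustration values):

```python
import random

# Sketch of the one-way ANOVA model x_ij = mu_i + eps_ij with g = 3 groups.
# The group means mus and common SD sigma are hypothetical illustration
# values, not estimates from any dataset.
random.seed(1)
mus = [600.0, 615.0, 640.0]   # mu_1, mu_2, mu_3 (the MODEL part)
sigma = 20.0                  # one common SD for every group
n_i = 10                      # individuals sampled per group

# DATA = MODEL + RESIDUAL: each observation is its group mean plus
# a N(0, sigma) residual.
data = {i + 1: [mu + random.gauss(0.0, sigma) for _ in range(n_i)]
        for i, mu in enumerate(mus)}

for group, xs in sorted(data.items()):
    print(group, round(sum(xs) / n_i, 1))
```

Each printed sample mean should land near its μ_i; the spread around it is the within-group variability that ANOVA pools.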

2

One-way ANOVA: the data

SRSs from each of the g populations (groups):

Sample sizes:  n1, n2, . . . , ng
Observations:  x_1j, x_2j, . . . , x_gj  for j = 1, 2, . . . , n_i
Sample means:  x̄1, x̄2, . . . , x̄g
Sample SDs:    s1, s2, . . . , sg

Variance within groups

Since we assumed equal σ's for all g groups, all sample SDs should be estimating the common σ. Thus we combine them in a pooled estimate (same idea as for the 2-sample pooled t-test, which we skipped). The pooled estimate of σ² (the variance within groups) is

s_W² = [(n1 − 1)s1² + (n2 − 1)s2² + . . . + (ng − 1)sg²] / (N − g) = Σᵢ (nᵢ − 1)sᵢ² / (N − g) = SSError / dfError = MSE

where N = n1 + n2 + . . . + ng is the total sample size.

The subscript W refers to this within-groups estimate. It is also known as the mean square within groups (MSW) and the mean square error (MSE).

Practical rule for examining SDs in ANOVA (checking the assumption of a common σ²): if the largest s is less than twice the smallest s, the ANOVA results will be approximately correct. Check that the ratio s_largest / s_smallest < 2 holds.
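This rule of thumb is easy to check directly; a minimal sketch using the three sample SDs from the jumping-rats example:

```python
# Rule-of-thumb check of the common-sigma assumption: ANOVA results are
# approximately correct if the largest sample SD is less than twice the
# smallest. The SDs below are from the jumping-rats example.
sds = [27.4, 19.3, 16.6]

ratio = max(sds) / min(sds)
print(round(ratio, 2), ratio < 2)   # 1.65 True
```

Here the ratio is about 1.65, so the common-σ assumption looks reasonable for the rats data.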

Variance between groups

If the null hypothesis, H0: μ1 = μ2 = . . . = μg, is true, then examining individual sample means is as if we are sampling g times from the same population, with mean μ and standard deviation σ.

Recall the sampling distribution of sample means: x̄ ~ N(μ, σ/√n), so under H0, σ² can be estimated by

s_B² = [n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + . . . + ng(x̄g − x̄)²] / (g − 1) = Σᵢ nᵢ(x̄ᵢ − x̄)² / (g − 1) = SSGroups / dfGroups = MSG

The subscript B (on s_B) refers to this between-groups estimate. It is also known as the mean square between groups (MSB or MSG) or the mean square of the model (MSM). It is only a valid estimate of σ² if H0 is true; otherwise it is inflated.

Concept behind the test

If H0: μ1 = μ2 = . . . = μg is true, then s_W² and s_B² both estimate σ² and should be of similar magnitude.

If H0 is not true, the between-groups estimate of σ² will, in general, be larger than the within-groups estimate of σ².

Therefore a test of H0: μ1 = μ2 = . . . = μg can be based on a comparison (ratio) of the between-groups and within-groups estimates of σ². Examine this ratio of variance estimates as an F test:

F = s_B² / s_W² = (MS between groups) / (MS within groups) = MSGroups / MSError

One-way ANOVA: F-test

To test H0: μ1 = μ2 = . . . = μg versus HA: not all μᵢ are equal, use the F statistic

F = MSG / MSE = (SSG / df_G) / (SSE / df_E)

This test statistic has an F distribution with g − 1 and N − g degrees of freedom when H0 is true.

Reject H0 for large values of F: if F > F*(α; g − 1, N − g), or if the corresponding p-value is less than α.

Analysis of variance (ANOVA)

As for simple and multiple linear regression, for one-way ANOVA the total sum of squares is decomposed by:

SST = SSG + SSE

The degrees of freedom are partitioned the same way:

DFT = DFG + DFE, i.e., (N − 1) = (g − 1) + (N − g)

The mean squares (MS) are formed the same way: MS = SS / df (SS = sum of squares; df = degrees of freedom).

One-way ANOVA table

Source            SS    DF       MS              F
Groups (Between)  SSG   g − 1    MSG = SSG/DFG   F = MSG/MSE
Error (Within)    SSE   N − g    MSE = SSE/DFE
Total             SST   N − 1    SST/DFT

The F statistic tests if there is a difference among any of the g population means.

MSE is still our estimate of σ²: the variance of the residuals, or the average variance of the observations from their group means (sometimes called the pooled variance estimate).

Solution (jumping rats):
H0: μ1 = μ2 = μ3; HA: at least one μᵢ is different.

SSG = Σᵢ nᵢ(x̄ᵢ − x̄)² = 10(601.1 − 617.43)² + 10(612.5 − 617.43)² + 10(638.7 − 617.43)² = 7433.867
MSG = SSG / df_G = 7433.867 / (3 − 1) = 3716.9

SSE = Σᵢ (nᵢ − 1)sᵢ² = (9 × 27.364²) + (9 × 19.329²) + (9 × 16.594²) = 12579.84
MSE = SSE / df_E = 12579.84 / (10 + 10 + 10 − 3) = 465.9

F = 3716.9 / 465.9 = 7.978
F* = 3.35. Since our F > F*, we reject H0. The rats have differing mean bone densities in the different treatment groups.
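The worked calculation can be reproduced directly from the summary statistics; a small sketch using the numbers from the slides:

```python
# Recompute the jumping-rats one-way ANOVA table from the summary statistics
# on the slide: n = 10 per group, means 601.1/612.5/638.7,
# SDs 27.364/19.329/16.594.
ns = [10, 10, 10]
means = [601.1, 612.5, 638.7]
sds = [27.364, 19.329, 16.594]

N, g = sum(ns), len(ns)
grand_mean = sum(n * m for n, m in zip(ns, means)) / N

ssg = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))  # between groups
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))             # within groups
msg, mse = ssg / (g - 1), sse / (N - g)
F = msg / mse

print(round(ssg, 3), round(sse, 2), round(mse, 1), round(F, 3))
# matches the slide: SSG = 7433.867, SSE = 12579.84, MSE = 465.9, F = 7.978
```

Note that SSG needs only the group means and sizes, and SSE needs only the group SDs, so the whole table can be built without the raw data.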


ANOVA Results from SPSS

In SPSS: Analyze → Compare Means → One-Way ANOVA.
[SPSS one-way ANOVA output shown on slide]

The link between ANOVA and Regression

ANOVA treats the groups as 'nominal' variables, i.e., variables whose coding gives the groups a name but no numerical meaning.

A naive application of regression here might use the codes no jump = 1, low jump = 2, high jump = 3 in a single predictor, but this is mathematically incorrect!

A correct way to apply regression here is to create (g − 1) binary variables (variables coded 0 or 1, sometimes called dummy variables) that recreate the groups.

A few thoughts about ANOVA

- ANOVA is easy to calculate.
- However, regression is more general because additional continuous variables can be added.
- ANOVA generalizes to several categorical predictor variables/factors (Ch 13), in two-way and higher ANOVA.
- Note: the term ANOVA is thrown around a lot. It can refer to the ANOVA setting/procedure where you are looking at a quantitative variable across multiple groups (or treatments). It can also refer to the ANOVA table (which is calculated under both regression and ANOVA settings).
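The dummy-variable recoding can be sketched in a few lines (the variable names are ours; this illustrates only the 0/1 coding, not the regression fit itself):

```python
# Recreate the g = 3 jump groups with g - 1 = 2 binary (dummy) variables,
# using "no jump" as the reference category coded (0, 0).
groups = ["no jump", "low jump", "high jump", "low jump", "no jump"]

rows = [(int(grp == "low jump"), int(grp == "high jump")) for grp in groups]

# In a regression on these two dummies, the intercept estimates the
# reference-group mean mu_1 and each dummy coefficient estimates mu_i - mu_1.
for grp, (low, high) in zip(groups, rows):
    print(grp, low, high)
```

Because "no jump" maps to (0, 0), the two dummy coefficients directly measure how each jumping group differs from the control mean.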


Steps in a Complete ANOVA Procedure

1. Examine the data, checking assumptions.
2. If possible, formulate in advance some working (alternative) hypotheses about how the population group means might differ.
3. Check the evidence against the global null hypothesis of no differences among the groups by calculating the F-test.
4. If the F-test is significant (that is, if it leads to a rejection of the null hypothesis of equal population group means):
   - Test the individual hypotheses specified at the second step.
   - If there was not enough information to formulate working hypotheses, test all pairwise comparisons of means, adjusting for multiple comparisons (example also coming).

Contrasts

After the omnibus F test has shown overall significance, other comparisons of groups can be investigated using contrasts.

A contrast is a linear combination of the μᵢ's:

ψ = Σᵢ aᵢ μᵢ, where Σᵢ aᵢ = 0 (ψ is the Greek psi).

The corresponding contrast of sample means is c = Σᵢ aᵢ x̄ᵢ.

In this rats example, what might be an interesting comparison of the 3 groups involved (no jump, low jump, high jump)? For example, consider the bone density study.

To compare control (group 1) versus treatment (groups 2 and 3) use
ψ = μ1 − (μ2 + μ3)/2 = (1)μ1 + (−1/2)μ2 + (−1/2)μ3

To compare levels of jumping (group 2 versus group 3) use
ψ = μ2 − μ3 = (0)μ1 + (1)μ2 + (−1)μ3

Contrasts: testing hypotheses

To test the hypothesis H0: ψ = 0 versus HA: ψ ≠ 0, we use a t-test much like the pooled t-test:

t = Σᵢ aᵢ x̄ᵢ / ( s_p √(Σᵢ aᵢ² / nᵢ) )

This t-test has a t distribution with the degrees of freedom associated with s_p under H0 (the test can be 1-sided or 2-sided).


Example: bone density

Analysis of variance for bone density:
Reject H0: μ1 = μ2 = μ3 if our calculated F > F*(0.05; df = 2, 27) = 3.35.
Omnibus test: P = 0.002, so reject H0 and conclude there is a difference in bone density among the three groups. (Note that s_p² = MSE = 465.91.)

Where does that difference lie? Let's look at the comparisons (via contrasts):
- (Group 1) versus (Groups 2 and 3)
- (Group 2) versus (Group 3)

Does jumping (any level) affect bone density? Set up hypotheses:

H0: μ_control − 0.5(μ_lowjump + μ_highjump) = 0
HA: μ_control − 0.5(μ_lowjump + μ_highjump) ≠ 0

Calculate the t-test:

t = [(1)601.1 + (−0.5)612.5 + (−0.5)638.7] / √( 465.91 × (1²/10 + (−0.5)²/10 + (−0.5)²/10) )
  = −24.5 / 8.36 = −2.93

Calculate the p-value: df = N − g = 30 − 3 = 27.
p-value = 2 P(t < −2.93) < 2(0.005) = 0.01.
So reject H0 and conclude any kind of jumping improves bone density over no jumping at all.

Example: bone density

Does the level of jumping affect bone density? Look at group 2 versus group 3 using the contrast:

H0: (0)μ_control + (1)μ_lowjump + (−1)μ_highjump = 0
HA: (0)μ_control + (1)μ_lowjump + (−1)μ_highjump ≠ 0

t = [(0)601.1 + (1)612.5 + (−1)638.7] / √( 465.91 × (0²/10 + 1²/10 + (−1)²/10) )
  = −26.2 / 9.653 = −2.714

So reject H0 (since our p-value < 0.02 < α) and conclude higher jumping affects bone density differently than moderate jumping (in fact, it makes bones denser).
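Both contrast t-statistics can be verified from the summary numbers alone; a minimal sketch using the slide's values (s_p² = MSE = 465.91, n_i = 10 per group):

```python
from math import sqrt

# Contrast t-tests for the bone-density example, using the slide's summary
# numbers: group means (control, low jump, high jump), s_p^2 = MSE = 465.91,
# n_i = 10 per group.
means = [601.1, 612.5, 638.7]
mse, n = 465.91, 10

def contrast_t(a):
    """t = sum(a_i * xbar_i) / (s_p * sqrt(sum(a_i**2 / n_i)))."""
    estimate = sum(ai * m for ai, m in zip(a, means))
    se = sqrt(mse * sum(ai ** 2 / n for ai in a))
    return estimate / se

t1 = contrast_t([1, -0.5, -0.5])   # control vs. average of the two jump groups
t2 = contrast_t([0, 1, -1])        # low jump vs. high jump
print(round(t1, 2), round(t2, 3))  # -2.93 -2.714
```

Because every contrast uses the same pooled SD and n_i, only the coefficient vector changes between the two tests.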


The multiple comparisons problem

To test H0: μ1 = μ2 = . . . = μg, why not simply conduct multiple two-sample t-tests?

For example, with g = 9 there are 36 possible pair-wise t-tests. What is the probability of rejecting a true H0 at least once?

P(Type I error) = 1 − (0.95)³⁶ = 0.84

This inflated Type I error is due to multiple comparisons: we have looked at multiple tests at once (each with α = 0.05), and thus it will lead to significant results that are not truly there (simply by chance).

In general, we would like the probability of a Type I error to be some fixed value (e.g., α = 0.05). This is accomplished using the overall ANOVA F-test.

What happens when we start using multiple contrasts? If we don't have any contrasts pre-specified, we can just look at all the pairwise two-sample t-tests, but adjust α.

The Bonferroni correction

A solution to the multiple comparisons problem is the adjustment of α levels using the Bonferroni correction. This correction is a conservative solution.

Suppose we wish to perform all possible pairs of comparisons among g groups. There are

C(g, 2) = g! / (2! (g − 2)!) = g(g − 1)/2

such comparisons. The Bonferroni correction: to protect the overall level α, we must perform each individual test at level

α* = α / [g(g − 1)/2]

Example: Bonferroni correction

Suppose we wish to perform pair-wise comparisons among 3 groups but still maintain an overall α = 0.05.

If g = 3, there are C(3, 2) = 3! / (2! 1!) = 3 possible comparisons:
(Group 1 versus 2), (1 versus 3), and (2 versus 3).

The Bonferroni correction says that if we want an overall level of α < 0.05, then we do each of the 3 tests at the α* = 0.05 / 3 = 0.0167 level. Thus, with each test at level α* = 0.0167, this Bonferroni correction gives an overall α < 0.05.

Bonferroni Correction in SPSS

The pairwise comparisons can be made when you run the one-way ANOVA. Just click on the Post Hoc button, and select Bonferroni.

Note: these confidence intervals are wider than those from the regression (or than if you did them separately based on the t-stats for comparing two means). Here they use t*(α*/2 = 0.0167/2; df = 27) = 2.552 (gotten from an online calculator):

x̄2 − x̄1 ± t* se(x̄2 − x̄1) = 11.4 ± 2.552 × 9.653 = (−13.24, 36.04)
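The Bonferroni arithmetic on these slides is easy to reproduce:

```python
from math import comb

alpha = 0.05

# With g = 9 groups there are C(9, 2) = 36 pairwise t-tests; the chance of
# rejecting at least one true H0 when each test is run at alpha = 0.05:
fwe = 1 - (1 - alpha) ** comb(9, 2)
print(comb(9, 2), round(fwe, 2))   # 36 0.84

# With g = 3 groups, the Bonferroni correction runs each of the
# C(3, 2) = 3 pairwise tests at level alpha* = alpha / 3:
alpha_star = alpha / comb(3, 2)
print(round(alpha_star, 4))        # 0.0167
```

The familywise-error formula assumes the 36 tests are independent, so 0.84 is an approximation, but it makes the inflation point vividly.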


Using Multiple Comparisons to your Advantage (bad use of statistics!!!)

[Slide content not shown]

Extension to Two-way ANOVA (and beyond!)

We can expand ANOVA to include an additional factor (a second grouping variable): call it two-way ANOVA.

For example, we could predict text messaging based on class year (with g = 4 groups) and sex (with g = 2 groups), and even a third variable like house (g = 12?) or a fourth like concentration (with g = ???).

Algebraically, things get a little more complicated, but we won't focus on that in this course.

Having multiple grouping variables (just like predictor variables) also allows us to include interaction terms as well.

Easy to perform in SPSS!!! Analyze → General Linear Model → Univariate

Two-way ANOVA in SPSS

The example we will use tries to predict ln_text messages by Harvard College students based on their class year and sex.

In SPSS: Analyze → General Linear Model → Univariate. Click on the Model button, select Custom, and drag the grouping variables over into the model. Click Continue then OK.


Two-way ANOVA in SPSS

Here are the results in SPSS for this sample of n = 169 students. What should be the 3 different sets of degrees of freedom here?

The Corrected Model row tests for the model overall, and the gender and classyear rows test for each variable separately. Note: you can ignore the Intercept and Total rows, and always compare to the Corrected Total row. (SSs will not add up.)

Two-way ANOVA in SPSS (with interaction)

We can also consider the interaction between the two grouping variables here (sex and class year). What would the interaction term represent?
- The effect of class year on # text messages sent may be different for men than women.
- Equivalently, the effect of sex on # text messages sent may be different in the 4 class years.
- Maybe men's average decreases over the 4 years, but women's average stays the same.

What should be the degrees of freedom for the interaction between these two variables?
(g1 − 1)(g2 − 1) = (4 − 1)(2 − 1) = 3
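The degrees-of-freedom questions on these slides reduce to simple bookkeeping; a sketch (the variable names are ours, the level counts and n = 169 are from the slides):

```python
# Degrees of freedom for the two-way ANOVA with interaction:
# class year (4 levels), sex (2 levels), n = 169 students.
n, g_year, g_sex = 169, 4, 2

df_year = g_year - 1                  # 3
df_sex = g_sex - 1                    # 1
df_interaction = df_year * df_sex     # (4 - 1)(2 - 1) = 3
df_model = df_year + df_sex + df_interaction
df_error = (n - 1) - df_model         # corrected-total df minus model df
print(df_year, df_sex, df_interaction, df_error)   # 3 1 3 161
```

The error df is what remains of the corrected total (n − 1 = 168) after the two main effects and their interaction are accounted for.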

Two-way ANOVA in SPSS (with interaction)

Here are the results in SPSS for this sample if we include the interaction between the grouping variables as well (which is sometimes called the full factorial ANOVA model; this is the default in SPSS).

ANOVA: Main Points

- Simple idea behind ANOVA: there is a significant difference among 2 or more groups if the variability between groups (differences among the group means) is significantly larger than the variability within groups (differences from the mean within each group) [F-test].
- If there is evidence of a difference among groups, then an a priori hypothesis can be tested via a contrast t-test, or some care must be taken in searching for the group pairs that are significantly different (using the Bonferroni correction).
- ANOVA can be expanded to include multiple grouping variables (Multi-way ANOVA). The algebra is hard, but SPSS does the work for us. Interactions can then be considered.
