Lecture 12

Simple Recap
T-test: tests whether the means of two groups are the same.
ANOVA: tests whether the means of two or more groups are the same.
MANOVA: tests whether the vectors of means of two or more groups are the same.
What does ANOVA do?
ANOVA tests the following hypotheses:
H0: The means of all the groups are equal.
Ha: Not all the means are equal.
  The test does not say how or which ones differ.
  We can follow up with multiple comparisons.
Assumptions of ANOVA
  Each group is approximately normal.
  The standard deviations of the groups are approximately equal.
One-Way ANOVA Partitions Total Variation
Total variation = variation due to treatment + variation due to random sampling
Variation due to treatment
  Also called: Sum of Squares Among, Sum of Squares Between, Sum of Squares Treatment (SST), Among Groups Variation
Variation due to random sampling
  Due to individual differences within groups.
  Also called: Sum of Squares Within, Sum of Squares Error (SSE), Within Groups Variation
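Total Variation
$$SS(\text{Total}) = (Y_{11} - \bar{Y})^2 + (Y_{21} - \bar{Y})^2 + \cdots + (Y_{ij} - \bar{Y})^2$$
where $Y_{ij}$ is the i-th observation in group j and $\bar{Y}$ is the overall mean.
[Figure: response Y plotted for Group 1, Group 2, Group 3, with deviations measured from the overall mean.]
Treatment Variation
$$SSB\ (SS_{trt}) = n_1(\bar{Y}_1 - \bar{Y})^2 + n_2(\bar{Y}_2 - \bar{Y})^2 + \cdots + n_p(\bar{Y}_p - \bar{Y})^2$$
[Figure: response Y plotted for Group 1, Group 2, Group 3, showing the group means $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$ and the overall mean $\bar{Y}$.]
Random (Error) Variation
$$SSW\ (SSE) = (Y_{11} - \bar{Y}_1)^2 + (Y_{21} - \bar{Y}_1)^2 + \cdots + (Y_{ij} - \bar{Y}_j)^2$$
[Figure: response Y plotted for Group 1, Group 2, Group 3, with deviations measured from each group mean $\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$.]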
One-Way ANOVA F-Test
1. Test statistic
$$F = \frac{MS_{trt}}{MSE} = \frac{SS_{trt}/(p-1)}{SSE/(n-p)}$$
   MStrt is the mean square for treatment.
   MSE is the mean square for error.
2. Degrees of freedom
   $\nu_1 = p - 1$
   $\nu_2 = n - p$
   p = number of populations, groups, or levels
   n = total sample size
One-Way ANOVA Summary Table
Source of     Degrees of   Sum of                 Mean Square             F
Variation     Freedom      Squares                (Variance)
Treatment     p-1          SSB                    MSB = SSB/(p - 1)       MSB/MSW
Error         n-p          SSW                    MSW = SSW/(n - p)
Total         n-1          SS(Total) = SSB+SSW
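As a minimal sketch of how the summary table is filled in, the following Python function computes each entry from raw data. The function name `one_way_anova` and the toy data are illustrative assumptions, not part of the lecture.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA summary quantities for a list of groups."""
    p = len(groups)                              # number of groups
    n = sum(len(g) for g in groups)              # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-groups (treatment) sum of squares: n_j * (group mean - grand mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups (error) sum of squares: (observation - its own group mean)^2
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    msb = ssb / (p - 1)                          # mean square, treatment
    msw = ssw / (n - p)                          # mean square, error
    return {"SSB": ssb, "SSW": ssw, "SS(Total)": ssb + ssw,
            "df": (p - 1, n - p), "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical toy data: three groups of three observations each.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
table = one_way_anova(toy)
for name, value in table.items():
    print(name, value)
```

For this toy data the group means are 2, 3 and 6, so SSB is 26, SSW is 6, and F is (26/2)/(6/6) = 13, up to floating-point rounding.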
One Way Analysis of Variance
Example 1
An apple juice manufacturer is planning to develop a new product: a liquid concentrate.
The marketing manager has to decide how to market the new product.
Three strategies are considered:
  Emphasize the convenience of using the product.
  Emphasize the quality of the product.
  Emphasize the product's low price.

Example 1 - continued
An experiment was conducted as follows:
  In three cities an advertising campaign was launched.
  In each city only one of the three characteristics (convenience, quality, and price) was emphasized.
  The weekly sales were recorded for twenty weeks following the beginning of the campaigns.
Weekly sales by strategy
Convenience   Quality   Price
529           804       672
658           630       531
793           774       443
514           717       596
663           679       602
719           604       502
711           620       659
606           697       689
461           706       675
529           615       512
498           492       691
663           719       733
604           787       698
495           699       776
485           572       561
557           523       572
353           584       469
557           634       581
542           580       679
614           624       532
Defining the Hypotheses
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least two means differ
Solution: Single factor ANOVA

Anova: Single Factor
SUMMARY
Groups        Count   Sum     Average   Variance
Convenience   20      11551   577.55    10775.00
Quality       20      13060   653.00    7238.11
Price         20      12173   608.65    8670.24

ANOVA
Source of Variation   SS       df   MS      F      P-value   F crit
Between Groups        57512    2    28756   3.23   0.0468    3.16
Within Groups         506984   57   8894
Total                 564496   59

SS(Total) = SST + SSE, where SST is the between-groups (treatment) sum of squares and SSE is the within-groups (error) sum of squares: 57512 + 506984 = 564496.
Since the P-value = 0.0468 < 0.05, we reject H0 and conclude that at least one of the mean sales differs from the others.
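The single-factor output can be cross-checked from the group summary statistics alone, since SSE is the sum of (count - 1) x variance over the groups. The sketch below does this for the Example 1 sales data using Python's standard `statistics` module. The variable names are illustrative, and the critical value 3.16 is simply the one quoted in the output rather than computed.

```python
from statistics import mean, variance

# Weekly sales from Example 1, one list per marketing strategy.
sales = {
    "Convenience": [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
                    498, 663, 604, 495, 485, 557, 353, 557, 542, 614],
    "Quality":     [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
                    492, 719, 787, 699, 572, 523, 584, 634, 580, 624],
    "Price":       [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
                    691, 733, 698, 776, 561, 572, 469, 581, 679, 532],
}

p = len(sales)                                   # number of groups
n = sum(len(v) for v in sales.values())          # total sample size
grand_mean = sum(sum(v) for v in sales.values()) / n

# SUMMARY block: count, sum, average, sample variance per group.
for name, v in sales.items():
    print(name, len(v), sum(v), round(mean(v), 2), round(variance(v), 2))

# Between-groups SS from the group means; within-groups SS from the variances.
ssb = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in sales.values())
ssw = sum((len(v) - 1) * variance(v) for v in sales.values())
f_stat = (ssb / (p - 1)) / (ssw / (n - p))

F_CRIT = 3.16   # critical value quoted in the output, not computed here
print("SSB", round(ssb), "SSW", round(ssw), "F", round(f_stat, 2))
print("reject H0" if f_stat > F_CRIT else "do not reject H0")
```

The computed F should agree with the reported 3.23 to two decimals, and it exceeds the quoted critical value, which matches the conclusion drawn from the P-value.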
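SAS CODES FOR ANOVA
(The line "..." stands for the remaining observations.)
data Sales;
  input strategy $ sale @@;
  cards;
C 529 Q 804 P 672
...
;
run;
proc anova data=Sales; /* or PROC GLM */
  class strategy;      /* the grouping variable read in the data step */
  model sale = strategy;
run;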
Two-Factor Analysis of Variance
Example 2
Suppose that in Example 1 two factors are to be examined:
  The effect of the marketing strategy on sales:
    Emphasis on convenience
    Emphasis on quality
    Emphasis on price
  The effect of the selected media on sales:
    Advertise on TV
    Advertise in newspapers
[Figure: four schematic plots of mean response against the levels of factor A, with one line for each level of factor B (Level 1 and Level 2). Panel titles:
  Difference between the levels of factor A; no difference between the levels of factor B.
  Difference between the levels of factor A, and difference between the levels of factor B; no interaction.
  No difference between the levels of factor A; difference between the levels of factor B.
  Interaction.]
[Figure: plot of the mean data for the four cells A1B1, A1B2, A2B1, A2B2, with the levels of A (a1, a2) on the horizontal axis and one line per level of B (b1, b2).]
Treatment A has a big effect (A2 > A1).
Treatment B: mean(B1) is very close to mean(B2), so there is no effect.
Interaction: when A = 1, B1 < B2, but when A = 2, B1 > B2.
Effects of gender (male or female) and dietary group (sv, lv, nor) on systolic blood pressure
[Figure: two panels, "Interaction" and "No Interaction", each plotting the average response against dietary group (sv, lv, nor) with separate lines for male and female.]
Interaction
1. Occurs when the effects of one factor vary according to the levels of the other factor.
2. When significant, the interpretation of the main effects (A and B) is complicated.
3. Can be detected in a graph of cell means: the lines are not parallel (they may cross).
Two-Way ANOVA
Total Variation Partitioning
Total variation, SS(Total), splits into four parts:
  Variation due to treatment A: SSA
  Variation due to treatment B: SSB
  Variation due to interaction: SS(AB)
  Variation due to random sampling: SSE
SS(Total) = SSA + SSB + SS(AB) + SSE
Two-Way ANOVA
Null Hypotheses
1. No difference in means due to factor A
   $H_0: \mu_{1\cdot} = \mu_{2\cdot} = \cdots = \mu_{a\cdot}$
2. No difference in means due to factor B
   $H_0: \mu_{\cdot 1} = \mu_{\cdot 2} = \cdots = \mu_{\cdot b}$
3. No interaction of factors A and B
   $H_0: (AB)_{ij} = 0$ for all i, j
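F tests for the Two-way ANOVA
Test for the difference between the levels of the main factors A and B
  Factor A: $F = \dfrac{MS(A)}{MSE}$, with $MS(A) = SS(A)/(a-1)$ and $MSE = SSE/(n-ab)$.
    Rejection region: $F > F_{\alpha,\ a-1,\ n-ab}$
  Factor B: $F = \dfrac{MS(B)}{MSE}$, with $MS(B) = SS(B)/(b-1)$.
    Rejection region: $F > F_{\alpha,\ b-1,\ n-ab}$
Test for interaction between factors A and B
  $F = \dfrac{MS(AB)}{MSE}$, with $MS(AB) = SS(AB)/[(a-1)(b-1)]$.
    Rejection region: $F > F_{\alpha,\ (a-1)(b-1),\ n-ab}$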
ANOVA table for two-way data (with interaction)
Here factor A has K levels, factor B has H levels, and each cell holds L observations, so n = KHL.
Source of     Sums of    Deg. of       Mean squares                   F ratio
variation     squares    freedom
Between A     SSA        K-1           MSA = SSA/(K-1)                MSA/MSE
Between B     SSB        H-1           MSB = SSB/(H-1)                MSB/MSE
Interaction   SSAB       (K-1)(H-1)    MSAB = SSAB/[(K-1)(H-1)]       MSAB/MSE
Error         SSE        KH(L-1)       MSE = SSE/[KH(L-1)]
Total         SST        n-1
Test for interaction: compare MSAB/MSE with $F_{(K-1)(H-1),\ KH(L-1)}$
Test for the factor B (block) effect: compare MSB/MSE with $F_{H-1,\ KH(L-1)}$
Test for the factor A (group) effect: compare MSA/MSE with $F_{K-1,\ KH(L-1)}$

Example 2 - continued
Weekly sales by medium and strategy
Medium      Convenience   Quality   Price
TV          491           677       575
TV          712           627       614
TV          558           590       706
TV          447           632       484
TV          479           683       478
TV          624           760       650
TV          546           690       583
TV          444           548       536
TV          582           579       579
TV          672           644       795
Newspaper   464           689       803
Newspaper   559           650       584
Newspaper   759           704       525
Newspaper   557           652       498
Newspaper   528           576       812
Newspaper   670           836       565
Newspaper   534           628       708
Newspaper   657           798       546
Newspaper   557           497       616
Newspaper   474           841       587
$Y_{ijk}$: the k-th response at level i of factor A and level j of factor B.

Example 2 - continued
Test for interaction between factors A and B
H0: No interaction
H1: There is interaction
F = MS(AB)/MSE = MS(Marketing*Media)/MSE = .09
$F_{critical} = F_{\alpha,\ (a-1)(b-1),\ n-ab} = F_{.05,\ (3-1)(2-1),\ 60-(3)(2)} = 3.17$ (p-value = .9171)
At the 5% significance level there is insufficient evidence to infer that the two factors interact to affect the mean weekly sales.
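The interaction test can be reproduced from the Example 2 data with the balanced two-way formulas. The sketch below is hedged: it assumes the sales figures are assigned to cells so that each strategy column holds ten TV weeks followed by ten newspaper weeks, and all variable names are illustrative.

```python
# cells[(strategy, medium)] = ten weekly sales figures (assumed cell layout)
cells = {
    ("Convenience", "TV"):        [491, 712, 558, 447, 479, 624, 546, 444, 582, 672],
    ("Convenience", "Newspaper"): [464, 559, 759, 557, 528, 670, 534, 657, 557, 474],
    ("Quality", "TV"):            [677, 627, 590, 632, 683, 760, 690, 548, 579, 644],
    ("Quality", "Newspaper"):     [689, 650, 704, 652, 576, 836, 628, 798, 497, 841],
    ("Price", "TV"):              [575, 614, 706, 484, 478, 650, 583, 536, 579, 795],
    ("Price", "Newspaper"):       [803, 584, 525, 498, 812, 565, 708, 546, 616, 587],
}
strategies = ["Convenience", "Quality", "Price"]   # factor A, a = 3 levels
media = ["TV", "Newspaper"]                        # factor B, b = 2 levels
a, b, r = len(strategies), len(media), 10          # r observations per cell
n = a * b * r


def avg(xs):
    return sum(xs) / len(xs)


grand = avg([y for ys in cells.values() for y in ys])
a_means = {s: avg([y for m in media for y in cells[(s, m)]]) for s in strategies}
b_means = {m: avg([y for s in strategies for y in cells[(s, m)]]) for m in media}
cell_means = {key: avg(ys) for key, ys in cells.items()}

# Sums of squares for a balanced design.
ss_a = b * r * sum((a_means[s] - grand) ** 2 for s in strategies)
ss_b = a * r * sum((b_means[m] - grand) ** 2 for m in media)
ss_ab = r * sum((cell_means[(s, m)] - a_means[s] - b_means[m] + grand) ** 2
                for s in strategies for m in media)
sse = sum((y - cell_means[key]) ** 2 for key, ys in cells.items() for y in ys)

mse = sse / (n - a * b)
f_a = (ss_a / (a - 1)) / mse
f_b = (ss_b / (b - 1)) / mse
f_ab = (ss_ab / ((a - 1) * (b - 1))) / mse

print("F(strategy)", round(f_a, 2))
print("F(media)", round(f_b, 2))
print("F(interaction)", round(f_ab, 2))
```

Under that layout the interaction F comes out close to the reported .09, far below the critical value 3.17.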
SAS CODES FOR ANOVA
proc glm;
  class strategy media;
  model sale = strategy media strategy*media; /* main effects plus interaction */
run;
proc glm;
  class strategy media;
  model sale = strategy media;                /* main effects only */
run;
Multiple Comparisons
When the null hypothesis is rejected, it may be desirable to find which mean(s) is (are) different, and in what rank order.
Three commonly used statistical inference procedures:
  Fisher's least significant difference (LSD) method
  Bonferroni adjustment
  Tukey's multiple comparison method
MANOVA
Example 1
DVs: Y1, Y2
IV: levels 1 to 5
Example 2
You might wish to test the hypothesis that sex and ethnicity interact to influence a set of job-related outcomes including attitudes toward co-workers, attitudes toward supervisors, feelings of belonging in the work environment, and identification with the corporate culture.
Example 3
You might want to test the hypothesis that three different methods of teaching writing result in significant differences in ratings of student creativity, student acquisition of grammar, and assessments of writing quality.
Why not test multiple hypotheses separately?
For example: test m independent hypotheses, each at level $\alpha$.
Type I error rate
  = P(at least one is falsely rejected)
  = 1 - P(none is falsely rejected)
  = $1 - (1 - \alpha)^m \to 1$ (as m increases)
Thus separate tests inflate the overall Type I error rate.
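To make the inflation concrete, here is a small sketch that evaluates $1 - (1 - \alpha)^m$ for a few values of m. The function name `familywise_error_rate` is illustrative, and the calculation assumes the m tests are independent, as in the argument above.

```python
def familywise_error_rate(alpha, m):
    """P(at least one false rejection) for m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


for m in (1, 3, 5, 10, 50):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

At $\alpha = .05$ the rate is already about .14 for three tests and about .40 for ten.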
ANOVA vs. MANOVA
Consider the following 2-group and 3-group scenarios, regarding two DVs, Y1 and Y2.
If we just look at the marginal distributions of the groups on each separate DV, the overlap suggests a statistically significant difference would be hard to come by for either DV.
However, considering the joint distributions of scores on Y1 and Y2 together (ellipses), we may see differences.
ANOVA vs. MANOVA (continued)
Now we can look for the greatest possible effect along some linear combination of Y1 and Y2.
The linear combination of the DVs is constructed so that the differences among group means on this new dimension look as large as possible.
So, by measuring multiple DVs you increase your chances of finding a group difference.
ANOVA
Setting:
Group 1: $X_{11}, X_{12}, \dots, X_{1n_1}$ i.i.d. $\sim N(\mu_1, \sigma^2)$
Group 2: $X_{21}, X_{22}, \dots, X_{2n_2}$ i.i.d. $\sim N(\mu_2, \sigma^2)$
  ...
Group g: $X_{g1}, X_{g2}, \dots, X_{gn_g}$ i.i.d. $\sim N(\mu_g, \sigma^2)$
The random samples from different groups are independent.
Test: $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$
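MANOVA
Setting:
Group 1: $\mathbf{X}_{11}, \mathbf{X}_{12}, \dots, \mathbf{X}_{1n_1}$ i.i.d. $\sim N_p(\boldsymbol{\mu}_1, \Sigma)$
Group 2: $\mathbf{X}_{21}, \mathbf{X}_{22}, \dots, \mathbf{X}_{2n_2}$ i.i.d. $\sim N_p(\boldsymbol{\mu}_2, \Sigma)$
  ...
Group g: $\mathbf{X}_{g1}, \mathbf{X}_{g2}, \dots, \mathbf{X}_{gn_g}$ i.i.d. $\sim N_p(\boldsymbol{\mu}_g, \Sigma)$
Test: $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g$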
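ANOVA: decomposition of the sum of squares
Setting: Group $l$ ($l = 1, \dots, g$): $X_{l1}, \dots, X_{ln_l}$ i.i.d. $\sim N(\mu_l, \sigma^2)$. Test: $H_0: \mu_1 = \mu_2 = \cdots = \mu_g$.
Since
$$X_{lj} = \mu + (\mu_l - \mu) + e_{lj}, \qquad e_{lj} \sim N(0, \sigma^2),$$
that is, (overall mean) + (treatment effect) + (random error), each observation decomposes as
$$x_{lj} = \bar{x} + (\bar{x}_l - \bar{x}) + (x_{lj} - \bar{x}_l),$$
that is, (observation) = (overall sample mean) + (estimated treatment effect) + (residual).
Squaring the deviation from the overall mean:
$$(x_{lj} - \bar{x})^2 = (\bar{x}_l - \bar{x})^2 + (x_{lj} - \bar{x}_l)^2 + 2(\bar{x}_l - \bar{x})(x_{lj} - \bar{x}_l)$$
Summing over j, the cross term vanishes because $\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l) = 0$:
$$\sum_{j=1}^{n_l} (x_{lj} - \bar{x})^2 = n_l(\bar{x}_l - \bar{x})^2 + \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)^2$$
Summing over the groups:
$$\underbrace{\sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x})^2}_{SST} = \underbrace{\sum_{l=1}^{g} n_l(\bar{x}_l - \bar{x})^2}_{SSB} + \underbrace{\sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)^2}_{SSW}$$
Here SST denotes the total sum of squares.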
ANOVA Table
______________________________________________________
Source of     Sum of squares (SS)                                               Degrees of freedom (d.f.)
variation
______________________________________________________
Treatments    $SSB = \sum_{l=1}^{g} n_l(\bar{x}_l - \bar{x})^2$                 $g - 1$
Residual      $SSW = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)^2$     $\sum_{l=1}^{g} n_l - g$
______________________________________________________
Total         $SST = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x})^2$       $\sum_{l=1}^{g} n_l - 1$
______________________________________________________
If the test statistic
$$F = \frac{SSB/(g-1)}{SSW/\left(\sum_{l=1}^{g} n_l - g\right)} > F_{g-1,\ \sum_{l=1}^{g} n_l - g}(\alpha),$$
reject $H_0$.
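MANOVA: decomposition of the sums of squares and cross products
Setting: Group $l$ ($l = 1, \dots, g$): $\mathbf{X}_{l1}, \dots, \mathbf{X}_{ln_l}$ i.i.d. $\sim N_p(\boldsymbol{\mu}_l, \Sigma)$. Test: $H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 = \cdots = \boldsymbol{\mu}_g$.
Since
$$\mathbf{X}_{lj} = \boldsymbol{\mu} + (\boldsymbol{\mu}_l - \boldsymbol{\mu}) + \mathbf{e}_{lj}, \qquad \mathbf{e}_{lj} \sim N_p(\mathbf{0}, \Sigma),$$
that is, (overall mean) + (treatment effect) + (random error), each observation decomposes as
$$\mathbf{x}_{lj} = \bar{\mathbf{x}} + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}}) + (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l),$$
that is, (observation) = (overall sample mean) + (estimated treatment effect) + (residual).
Forming the outer product of the deviation from the overall mean:
$$(\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})' = [(\bar{\mathbf{x}}_l - \bar{\mathbf{x}}) + (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)][(\bar{\mathbf{x}}_l - \bar{\mathbf{x}}) + (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)]'$$
$$= (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + (\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)' + (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$$
Summing over j, the two cross-product terms vanish because $\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l) = \mathbf{0}$:
$$\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})' = n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})' + \sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$$
Summing over the groups:
$$\underbrace{\sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})'}_{T} = \underbrace{\sum_{l=1}^{g} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'}_{B} + \underbrace{\sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'}_{W}$$
T: total sum of squares and cross products
B: between sum of squares and cross products
W: within sum of squares and cross products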
MANOVA Table
______________________________________________________________
Source of     Sum of squares (SS)                                                                                                   Degrees of freedom (d.f.)
variation
______________________________________________________________
Treatments    $B = \sum_{l=1}^{g} n_l(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})(\bar{\mathbf{x}}_l - \bar{\mathbf{x}})'$                 $g - 1$
Residual      $W = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)(\mathbf{x}_{lj} - \bar{\mathbf{x}}_l)'$      $\sum_{l=1}^{g} n_l - g$
______________________________________________________________
Total         $T = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (\mathbf{x}_{lj} - \bar{\mathbf{x}})(\mathbf{x}_{lj} - \bar{\mathbf{x}})'$          $\sum_{l=1}^{g} n_l - 1$
______________________________________________________________
The Multivariate Test of Significance
Unlike the univariate situation, where a single ratio (i.e., SSB/SSW adjusted for degrees of freedom) can be used to test the null hypothesis, in the multivariate case the test of statistical significance is based on the entire $W^{-1}B$ matrix.
Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_s > 0$ denote the s nonzero eigenvalues of $W^{-1}B$.
It can be shown that $\lambda_i/(1 + \lambda_i)$ equals SSB/SST (i.e., the proportion of explained variance) for the i-th discriminant variate $Y_i$, because $\lambda_i$ is the ratio SSB/SSW for $Y_i$.
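The quantities above can be traced on a tiny bivariate data set. The following sketch builds B, W and T, finds the eigenvalues of $W^{-1}B$ for the 2x2 case, and evaluates Wilks' lambda as $|W|/|B + W|$. The data are hypothetical toy values and the helper names are illustrative.

```python
import math

# Hypothetical toy data: g = 3 groups, p = 2 responses per observation.
groups = [
    [(2.0, 3.0), (3.0, 5.0), (4.0, 4.0)],
    [(5.0, 4.0), (6.0, 6.0), (7.0, 5.0)],
    [(4.0, 8.0), (5.0, 9.0), (6.0, 10.0)],
]


def mean_vec(rows):
    k = len(rows)
    return (sum(r[0] for r in rows) / k, sum(r[1] for r in rows) / k)


def add_outer(m, d, w=1.0):
    """Add w * d d' to the 2x2 matrix m in place."""
    for i in range(2):
        for j in range(2):
            m[i][j] += w * d[i] * d[j]


def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


grand = mean_vec([r for g in groups for r in g])
B = [[0.0, 0.0], [0.0, 0.0]]   # between-groups SSCP
W = [[0.0, 0.0], [0.0, 0.0]]   # within-groups SSCP
T = [[0.0, 0.0], [0.0, 0.0]]   # total SSCP
for g in groups:
    gm = mean_vec(g)
    add_outer(B, (gm[0] - grand[0], gm[1] - grand[1]), w=len(g))
    for r in g:
        add_outer(W, (r[0] - gm[0], r[1] - gm[1]))
        add_outer(T, (r[0] - grand[0], r[1] - grand[1]))

# W^{-1} B via the explicit inverse of the 2x2 matrix W.
dW = det(W)
Winv = [[W[1][1] / dW, -W[0][1] / dW], [-W[1][0] / dW, W[0][0] / dW]]
M = [[sum(Winv[i][k] * B[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

# Eigenvalues of a 2x2 matrix from its trace and determinant.
tr, dM = M[0][0] + M[1][1], det(M)
root = math.sqrt(max(tr * tr - 4 * dM, 0.0))
lam = [(tr + root) / 2, (tr - root) / 2]

wilks = det(W) / det(T)
print("eigenvalues of W^-1 B:", [round(x, 3) for x in lam])
print("explained proportions:", [round(x / (1 + x), 3) for x in lam])
print("Wilks' lambda:", round(wilks, 4))
```

Two identities are worth checking on any data set: T equals B + W entry by entry, and Wilks' lambda equals the product of $1/(1 + \lambda_i)$ over the eigenvalues.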

The MANOVA test of significance can be viewed as a discriminant problem. We want to find a set of variates, often called discriminant function variates, each of which represents a linear combination of the original variables that best separates the groups.
The first such variate, which we will label Y1, is the linear combination of the original variables for which SSB/SSW is maximized. In other words, SSB/SSW is smaller for any other linear combination of the original variables. That is, a new dimension is defined in the original space in such a way that group differences are maximized on this dimension.
When g and n are big, it is possible to find a second variate (i.e., dimension) that accounts for some between-group variation left unexplained by the first variate. The second such variate, Y2, maximizes SSB/SSW subject to the constraint that scores on Y2 be uncorrelated with scores on Y1.
In general, the total number of such variates that can be formed is denoted by s; together they account for all of the between-group variation in the sample.
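Recall (in discriminant analysis): find the linear combination $Y = \mathbf{a}^T\mathbf{X}$ such that the between-class variance is maximized relative to the within-class variance:
$$\max_{\mathbf{a}} \frac{\text{between-class variance of } Y}{\text{within-class variance of } Y} = \max_{\mathbf{a}} \frac{\mathbf{a}^T B \mathbf{a}}{\mathbf{a}^T W \mathbf{a}}$$
where B is the between-class covariance of X and W is the within-class covariance of X.
Result:
Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_s > 0$ denote the s nonzero eigenvalues of $W^{-1}B$, and let $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_s$ be the corresponding eigenvectors (scaled so that $\mathbf{e}' S_{pooled}\, \mathbf{e} = 1$).
Then the vector of coefficients $\mathbf{a}$ that maximizes the ratio
$$\frac{\mathbf{a}' B \mathbf{a}}{\mathbf{a}' W \mathbf{a}}$$
is given by $\mathbf{a}_1 = \mathbf{e}_1$. The linear combination $\mathbf{e}_1'\mathbf{x}$ is called the sample first discriminant. The choice $\mathbf{a}_2 = \mathbf{e}_2$ produces the sample second discriminant $\mathbf{e}_2'\mathbf{x}$, and so on.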
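As a small numerical check of this result, the sketch below uses two hypothetical 2x2 matrices B and W, chosen so that the eigenvalues of $W^{-1}B$ are exactly 4 and 1. It confirms that the ratio $\mathbf{a}'B\mathbf{a}/\mathbf{a}'W\mathbf{a}$ equals the eigenvalue at each eigenvector and that no direction on a fine grid beats the first eigenvector. The ratio does not depend on the length of $\mathbf{a}$, so the eigenvectors are left unscaled.

```python
import math

# Hypothetical matrices, chosen so the arithmetic is easy to follow by hand.
B = [[4.0, 2.0], [2.0, 3.0]]   # between-groups matrix
W = [[2.0, 0.0], [0.0, 1.0]]   # within-groups matrix


def quad(m, a):
    """Quadratic form a' m a for a 2x2 matrix m and a 2-vector a."""
    return sum(a[i] * m[i][j] * a[j] for i in range(2) for j in range(2))


def ratio(a):
    """Between-to-within ratio a'Ba / a'Wa for the linear combination a'x."""
    return quad(B, a) / quad(W, a)


# Here W^{-1}B = [[2, 1], [2, 3]], whose eigenvalues are 4 and 1 with
# eigenvectors proportional to (1, 2) and (1, -1).
e1, e2 = (1.0, 2.0), (1.0, -1.0)
print("ratio at e1:", ratio(e1))
print("ratio at e2:", ratio(e2))

# Scan directions a = (cos t, sin t) over half a turn and keep the best ratio.
steps = 3600
best = max(ratio((math.cos(k * math.pi / steps), math.sin(k * math.pi / steps)))
           for k in range(steps))
print("largest ratio on the grid:", round(best, 4))
```

The grid maximum approaches 4, the largest eigenvalue, and never exceeds it.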
Review: Factorial ANOVA
Example
Tests of Hypotheses:
(1) There is no significant main effect for education level (F(2, 58) = 1.685, p = .194, partial eta squared = .055) (red dots)
(2) There is no significant main effect for marital status (F(1, 58) = .441, p = .509, partial eta squared = .008) (green dots)
(3) There is a significant interaction effect of marital status and education level (F(2, 58) = 3.586, p = .034, partial eta squared = .110) (blue dots)
Plots of Interaction Effects
[Figure: Estimated Marginal Means of TIMENET. Vertical axis: estimated marginal means (about 4 to 9). Horizontal axis: CollegeorNot (HighSchool, SomePostHigh, CollegeorMore). Separate lines for MarriedorNot: Married/Partner and NotMarried/Partner.]
Education level is plotted along the horizontal axis and hours spent on the net along the vertical axis. The red and green lines show how marital status interacts with education level. Time spent on the Internet is highest in the Some Post High School group for single people, but lowest in that group for married people.
MANOVA Example
Let's test the hypothesis that region of the country (IV) has a significant impact on three DVs: percent of people who are Christian adherents, divorces per 1000 population, and abortions per 1000 population. The hypothesis is that there will be a significant multivariate main effect for region. Another way to put this is that the vectors of means for the three DVs differ among regions of the country.
This is done with the General Linear Model/Multivariate procedure in SPSS.
Computations are done using matrix algebra to find the ratio of the variability of B (the Between-Groups sums of squares and cross-products (SSCP) matrix) to that of W (the Within-Groups SSCP matrix).
[Figure: vectors of means (MY1, MY2, MY3) on the three DVs (Y1, Y2, Y3) for the regions South and Midwest.]
MANOVA test of Our Hypothesis
First we will look at the overall F test (over all three dependent variables). What we are most interested in is a statistic called Wilks' lambda ($\Lambda$), and the F value associated with it. In the case of our IV, REGION, Wilks' lambda is .465, with an associated F of 3.90, which is significant at p < .001. Thus the null hypothesis is rejected.
Looking at the Individual Dependent Variables
If the overall F test is significant, then it's common practice to go ahead and look at the individual dependent variables with separate ANOVA tests.
You should divide your significance level by the number of tests you intend to perform to control the Type I error rate, so in this case, if you expect to look at F tests for the three dependent variables, you should require that p < .017 (.05/3).
This procedure ignores the fact that the variables may be intercorrelated: the separate ANOVAs do not take these intercorrelations into account.
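The adjusted criterion is simple arithmetic, sketched below. The function name and the three p-values are hypothetical and serve only to show how the .017 threshold would be applied; they are not results from the lecture's data.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall level at most alpha."""
    return alpha / n_tests


alpha, n_tests = 0.05, 3
threshold = bonferroni_threshold(alpha, n_tests)
print("per-test threshold:", round(threshold, 3))

# Upper bound on the overall Type I error rate if the tests were independent.
overall = 1 - (1 - threshold) ** n_tests
print("overall rate under independence:", round(overall, 4))

# Hypothetical p-values, one per dependent variable (illustration only).
p_values = {"DV 1": 0.004, "DV 2": 0.030, "DV 3": 0.012}
for name, p in p_values.items():
    print(name, "significant" if p < threshold else "not significant")
```

With three tests the per-test threshold rounds to .017, and the overall rate under independence stays just below .05.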
Univariate ANOVA tests of Three Dependent Variables
The corresponding portion of the output table reports the ANOVA tests on the three dependent variables: abortions per 1000, divorces per 1000, and % Christian adherents. Note that only the F values for % Christian adherents and divorces per 1000 population are significant at your criterion of .017. (Note: the MANOVA procedure doesn't seem to let you set different p levels for the overall test and the univariate tests, so the power here is higher than it would be if you did these tests separately in an ANOVA procedure and set p to .017 before you did the tests.)
SAS Code
See page 314 of the textbook Applied Multivariate Statistical Analysis, by Richard A. Johnson and Dean W. Wichern, China Statistics Press, 2003.
