
Chapter 5 1

• Decide on the basis of sample data whether there really is a difference in the effectiveness of three methods of teaching a foreign language;
• Compare the average yields per acre of six varieties of wheat;
• Determine whether there is a difference in the average mileage obtained with four kinds of gasoline.

• The F distribution arises frequently as the null distribution of a test statistic, especially in likelihood-ratio tests and most notably in the analysis of variance.
• It is used to compare statistical models that have been fit to the data.
• F cannot be negative, and it is a continuous distribution.
• The F distribution is positively skewed.
• Its values range from 0 to ∞.
• As F → ∞, the curve approaches the X-axis.
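The properties listed above can be checked numerically. The following is a minimal sketch, assuming the standard density formula for the F distribution; the function name `f_pdf` and the degrees of freedom (3 and 16) are illustrative choices, not taken from the slides.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 numerator and d2 denominator d.f."""
    if x <= 0:
        # F cannot be negative; the density is treated as 0 outside x > 0.
        return 0.0
    # Work on the log scale so large degrees of freedom do not overflow.
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

# The curve rises to a single peak near zero and then falls toward the
# X-axis as x grows, which is the positive skew described above.
for x in (0.1, 0.5, 1, 2, 5, 10, 50):
    print(x, f_pdf(x, 3, 16))
```

Only the shape is illustrated here; tail areas (critical values and P-values) still have to come from an F table or a statistics package.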

• The F distribution is also used for testing whether two or more sample means came from the same or equal populations.
• Analysis of variance (ANOVA) is the approach that allows us to use sample data to see whether the values of three or more unknown population means are likely to be different.

Assumptions:
• The sampled populations follow the normal distribution.
• The populations have equal standard deviations.
• The samples are randomly selected and are independent.
Whether the differences between the groups are significant depends on:
• the difference in the means
• the standard deviations of each group
• the sample sizes

• ANOVA is a division of the overall variability in data values in order to compare means.
• Overall (or “total”) variability is divided into two components:
  • the variability “between” groups, and
  • the variability “within” groups.
• The results are summarized in an “ANOVA” table.

Two variables: 1 independent variable (categorical), 1 dependent variable (quantitative)

Main question: Does the mean of the quantitative dependent variable depend on the level of the independent variable?

If the independent variable has only 2 values:
• 2-sample t-test

ANOVA allows for 3 or more groups.

To use the one-way ANOVA test, the following assumptions must be true:
• The populations under study have normal distributions.
• The samples are drawn randomly, and each sample is independent of the other samples.
• All the populations from which the sample values are obtained have the same unknown population variance; that is, for k populations,

$$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$$
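Comparing Means of Two or More Populations

Step 1: State the hypotheses.
• H0: µ1 = µ2 = … = µk
• H1: The means are not all equal.

Step 2: Compute the test statistic from the ANOVA table:

$$F = \frac{SS(Tr)/(k-1)}{SSE/(n-k)}$$

where SS(Tr) is the treatment (between-samples) sum of squares, SSE is the error (within-samples) sum of squares, k is the number of populations and n is the total number of observations.

Step 3: Find the critical value $F_{\alpha,\,k-1,\,n-k}$.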
Step 4: Make the decision.
• Reject H0 if $F > F_{\alpha,\,k-1,\,n-k}$.

Step 5: State the conclusion.

• Only one classification factor is considered.
• Factor: the classification variable. Its levels (1, 2, …) are the treatments (samples).
• Response (outcome, dependent variable): the quantity measured on each object.
• Replicates (1, …, j): the objects subjected to a given treatment.
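Source of variation                             Df      SS       MS                        F
Treatment (between samples; factor variation)   k - 1   SS(Tr)   MS(Tr) = SS(Tr)/(k - 1)   F_test = MS(Tr)/MSE
Error (within samples; error variation)         T - k   SSE      MSE = SSE/(T - k)
Total                                           T - 1   SST

The sums of squares are computed from the sample totals $T_{i\cdot}$ and the grand total $T_{\cdot\cdot}$:

$$SS(Tr) = \frac{1}{n}\sum_{i=1}^{k} T_{i\cdot}^{2} - \frac{1}{kn}T_{\cdot\cdot}^{2}$$

$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n} x_{ij}^{2} - \frac{1}{kn}T_{\cdot\cdot}^{2}$$

$$SSE = SST - SS(Tr)$$

MS(Tr) estimates the treatment (factor) variance and MSE estimates the error variance.

Reject H0 if $F_{test} > F_{\alpha,\,k-1,\,T-k}$, where T = kn is the total number of observations (k samples of n observations each).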

• n = total number of observations
• a = number of levels
• $\bar{x}$ = mean of the entire data set

Group i has:
• $n_i$ = number of observations in level i
• $x_{ij}$ = value of observation j in level i
• $\bar{x}_i$ = mean for level i
• $s_i$ = standard deviation for level i

• An experiment was performed to determine whether the annealing temperature of ductile iron affects its tensile strength. Five specimens were annealed at each of four temperatures. The tensile strength (in ksi) was measured for each specimen. The results are presented in the following table. Can you conclude that there are differences among the mean strengths?

Temperature (°C)   Sample Values
750                19.72  20.88  19.63  18.68  17.89
800                16.01  20.04  18.10  20.28  20.53
850                16.66  17.38  14.49  18.21  15.58
900                16.93  14.49  16.15  15.53  13.25
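The ANOVA table for this example can be sketched with the sample-total shortcut formulas (SS(Tr), SST, SSE = SST - SS(Tr)) for k samples of equal size n. The function name `one_way_anova_balanced` and the dictionary keys are hypothetical, and the 5% critical value 3.24 for 3 and 16 degrees of freedom is an assumed value read from a standard F table rather than computed here.

```python
# Tensile strength (ksi) by annealing temperature (deg C), from the table above.
strength = {
    750: [19.72, 20.88, 19.63, 18.68, 17.89],
    800: [16.01, 20.04, 18.10, 20.28, 20.53],
    850: [16.66, 17.38, 14.49, 18.21, 15.58],
    900: [16.93, 14.49, 16.15, 15.53, 13.25],
}

def one_way_anova_balanced(groups):
    """ANOVA table entries for k samples of equal size n (shortcut formulas)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("these shortcut formulas assume equal sample sizes")
    total_obs = k * n                              # T = kn
    grand_total = sum(sum(g) for g in groups)      # T..
    correction = grand_total ** 2 / total_obs      # T..^2 / (kn)
    ss_tr = sum(sum(g) ** 2 for g in groups) / n - correction
    sst = sum(x ** 2 for g in groups for x in g) - correction
    sse = sst - ss_tr
    ms_tr = ss_tr / (k - 1)
    mse = sse / (total_obs - k)
    return {
        "df_tr": k - 1, "df_error": total_obs - k, "df_total": total_obs - 1,
        "ss_tr": ss_tr, "sse": sse, "sst": sst,
        "ms_tr": ms_tr, "mse": mse, "f": ms_tr / mse,
    }

table = one_way_anova_balanced(list(strength.values()))

# Assumed table value: F(0.05; 3, 16) is about 3.24.
F_CRIT_05_3_16 = 3.24
print(table)
print("Reject H0 at the 5% level:", table["f"] > F_CRIT_05_3_16)
```

With these data the F statistic is well above the assumed critical value, so the sketch points toward rejecting H0, meaning the mean strengths are not all equal.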

Subjects: 25 patients with blisters
Treatments: Treatment A, Treatment B, Placebo
Measurement: # of days until blisters heal

Data [and means]:
• A: 5,6,6,7,7,8,9,10 [7.25]
• B: 7,7,8,9,9,10,10,11 [8.875]
• P: 7,9,9,10,10,10,11,12,13 [10.11]

Are these differences significant?


Brand1 Brand2 Brand3 Brand4 Brand5
194 189 185 183 195
184 204 183 193 197
189 190 186 184 194
189 190 183 186 202
188 189 179 194 200
186 207 191 199 211
195 203 188 196 203
186 193 196 188 206
183 181 189 193 202
188 206 194 196 195
ANOVA measures two sources of variation in the data and compares their relative sizes:

• variation BETWEEN groups
  • for each data value, look at the difference between its group mean and the overall mean: $(\bar{x}_i - \bar{x})^2$
• variation WITHIN groups
  • for each data value, look at the difference between that value and the mean of its group: $(x_{ij} - \bar{x}_i)^2$
We want to measure the amount of variation due to BETWEEN group variation and WITHIN group variation.

For each data value, we calculate its contribution to:
• BETWEEN group variation: $(\bar{x}_i - \bar{x})^2$
• WITHIN group variation: $(x_{ij} - \bar{x}_i)^2$

The ANOVA F-statistic is the ratio of the between-group variation to the within-group variation:

$$F = \frac{\text{Between}}{\text{Within}} = \frac{MSA}{MSE}$$

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.
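$$SST = \sum_{obs}(x_{ij}-\bar{x})^2 = \sum_{obs} x_{ij}^2 - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$$

$$SSE = \sum_{obs}(x_{ij}-\bar{x}_i)^2$$

$$SSA = \sum_{obs}(\bar{x}_i-\bar{x})^2 = \sum_{i}\frac{\left(\sum_{j} x_{ij}\right)^2}{n_i} - \frac{\left(\sum_{obs} x_{ij}\right)^2}{n}$$

$$SST = SSA + SSE; \quad MS = \frac{SS}{DF}; \quad F = \frac{MSA}{MSE}$$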
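As a worked illustration of the between-group and within-group sums of squares, the sketch below applies the deviation forms of SSA, SSE and SST to the blister data given earlier (treatments A, B and placebo P), where the groups have unequal sizes. The names `blister_days` and `one_way_anova` are hypothetical.

```python
# Days until blisters heal, from the blister example above.
blister_days = {
    "A": [5, 6, 6, 7, 7, 8, 9, 10],
    "B": [7, 7, 8, 9, 9, 10, 10, 11],
    "P": [7, 9, 9, 10, 10, 10, 11, 12, 13],
}

def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """One-way ANOVA from deviations; group sizes may differ."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)          # total number of observations
    a = len(groups)              # number of levels
    grand_mean = mean(all_values)
    # Each observation contributes (group mean - grand mean)^2 to SSA,
    # so a group of size n_i contributes n_i times that squared difference.
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Each observation contributes (value - its group mean)^2 to SSE.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    msa = ssa / (a - 1)
    mse = sse / (n - a)
    return {"df_factor": a - 1, "df_error": n - a,
            "ssa": ssa, "sse": sse, "sst": sst,
            "msa": msa, "mse": mse, "f": msa / mse}

result = one_way_anova(list(blister_days.values()))
for name, value in result.items():
    print(name, round(value, 3))
```

The resulting F statistic is then compared with the F distribution with `df_factor` numerator and `df_error` denominator degrees of freedom, using a table or statistical software.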
One-way Analysis of Variance

Source   DF    SS            MS    F         P
Factor   a-1   SS(Between)   MSA   MSA/MSE   from the F-distribution with a-1 numerator and n-a denominator d.f.
Error    n-a   SS(Error)     MSE
Total    n-1   SS(Total)

MSA = SS(Between)/(a-1)
MSE = SS(Error)/(n-a)
n-1 = (a-1) + (n-a)
SS(Total) = SS(Between) + SS(Error)
One-way Analysis of Variance

Source   DF   SS       MS       F       P
Factor   2    2510.5   1255.3   93.44   0.000
Error    12   161.2    13.4
Total    14   2671.7

“Source” means “find the components of variation in this column”
“DF” means “degrees of freedom”
“SS” means “sums of squares”
“MS” means “mean square”
“F” means “F test statistic”
“P” is the P-value
“Factor” means “Variability between groups” or “Variability due to the factor of interest”
“Error” means “Variability within groups” or “unexplained random variation”
“Total” means “Total variation from the grand mean”
The entries in the table are related as follows:
• 14 = 2 + 12
• 2671.7 = 2510.5 + 161.2
• 1255.3 ≈ 2510.5/2
• 13.4 ≈ 161.2/12
• 93.44 ≈ 1255.3/13.4, computed from the unrounded mean squares
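The relationships among the table entries can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `complete_anova_table` is hypothetical, and the inputs are the SS and DF values from the table above.

```python
def complete_anova_table(ss_factor, ss_error, df_factor, df_error):
    """Fill in the remaining one-way ANOVA entries from SS and DF."""
    ms_factor = ss_factor / df_factor      # MSA = SS(Between)/(a-1)
    ms_error = ss_error / df_error         # MSE = SS(Error)/(n-a)
    return {
        "df_total": df_factor + df_error,  # n-1 = (a-1) + (n-a)
        "ss_total": ss_factor + ss_error,  # SS(Total) = SS(Between) + SS(Error)
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "f": ms_factor / ms_error,         # F = MSA/MSE
    }

entries = complete_anova_table(ss_factor=2510.5, ss_error=161.2,
                               df_factor=2, df_error=12)
for name, value in entries.items():
    print(name, round(value, 2))
```

Working from unrounded mean squares is what gives F = 93.44. Dividing the rounded table values 1255.3 and 13.4 gives a slightly different quotient.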
 1 1
 X1  X 2   t MSE  n  n 
1 2

 where t is obtained from the t table with degrees of


freedom (n - k).
 MSE = [SSE/(n - k)]
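The interval can be sketched as a small function. The name `mean_difference_ci` is hypothetical. The example inputs are derived from the annealing data given earlier (750 °C versus 900 °C): the sample means 19.36 and 15.27 and MSE ≈ 2.3023 with 16 degrees of freedom were computed from that table and are not stated on the slides. The value t = 2.120 is an assumed t-table entry for a 95% interval with 16 degrees of freedom.

```python
import math

def mean_difference_ci(mean1, mean2, n1, n2, mse, t_crit):
    """Confidence interval for the difference between two treatment means.

    t_crit must be read from a t table with (n - k) degrees of freedom;
    it is passed in rather than computed.
    """
    half_width = t_crit * math.sqrt(mse * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff - half_width, diff + half_width

# Illustrative inputs derived from the annealing example (see lead-in).
low, high = mean_difference_ci(mean1=19.36, mean2=15.27, n1=5, n2=5,
                               mse=2.3023, t_crit=2.120)
print(round(low, 3), round(high, 3))
```

If the interval does not contain zero, the two treatment means are judged to differ at the chosen confidence level.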

• 2 Independent Variables
• Examples

  IV #1              IV #2            DV
  Drug Level         Age of Patient   Anxiety Level
  Type of Exercise   Type of Diet     Weight Change

• Key Advantages
  • Compare relative influences on DV
  • Examine interactions between IVs

• Two classification factors are considered: Factor A, with levels 1, 2, …, i (rows), and Factor B, with levels 1, 2, …, j (columns).
The standard two-way ANOVA tests are valid under the following conditions:

• The design must be complete.
  • Observations are taken on every possible treatment.
• The design must be balanced.
  • The number of replicates is the same for each treatment.
• The number of replicates per treatment, k, must be at least 2.
• Within any treatment, the observations $x_{ij1}, \ldots, x_{ijk}$ are a simple random sample from a normal population.
• The sample observations are independent of each other (the samples are not matched or paired in any way).
• The population variance is the same for all treatments.

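Source              Df               SS     MS                             F
A (row effect)      a - 1            SSA    MSA = SSA/(a - 1)              F_test = MSA/MSE
B (column effect)   b - 1            SSB    MSB = SSB/(b - 1)              F_test = MSB/MSE
Interaction         (a - 1)(b - 1)   SSAB   MSAB = SSAB/[(a - 1)(b - 1)]   F_test = MSAB/MSE
Error               ab(n - 1)        SSE    MSE = SSE/[ab(n - 1)]
Total               abn - 1          SST

Here a and b are the numbers of levels of A and B, n is the number of replicates per treatment, and a dot in a subscript denotes a total over that index:

$$SSA = \frac{1}{bn}\sum_{i=1}^{a} x_{i\cdot\cdot}^{2} - \frac{x_{\cdot\cdot\cdot}^{2}}{abn}$$

$$SSB = \frac{1}{an}\sum_{j=1}^{b} x_{\cdot j\cdot}^{2} - \frac{x_{\cdot\cdot\cdot}^{2}}{abn}$$

$$SSAB = \frac{1}{n}\sum_{i=1}^{a}\sum_{j=1}^{b} x_{ij\cdot}^{2} - \frac{x_{\cdot\cdot\cdot}^{2}}{abn} - SSA - SSB$$

$$SST = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{n} x_{ijk}^{2} - \frac{x_{\cdot\cdot\cdot}^{2}}{abn}$$

$$SSE = SST - SSA - SSB - SSAB$$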

• A chemical engineer is studying the effects of various reagents and catalysts on the yield of a certain process. Yield is expressed as a percentage of a theoretical maximum. Four runs of the process were made for each combination of 3 reagents and 4 catalysts. Construct an ANOVA table and test whether there is an interaction effect between reagents and catalysts.

            Reagent
Catalyst    1                        2                        3
A           86.8  82.4  86.7  83.5   93.4  85.2  94.8  83.1   77.9  89.6  89.9  83.7
B           71.9  72.1  80.0  77.4   74.5  87.1  71.9  84.1   87.5  82.7  78.3  90.1
C           65.5  72.4  76.6  66.7   66.7  77.1  76.7  86.1   72.7  77.8  83.5  78.8
D           63.9  70.4  77.2  81.2   73.7  81.6  84.2  84.9   79.8  75.7  80.5  72.9
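A two-way ANOVA table for this example can be sketched as follows, taking catalyst as factor A (rows, a = 4), reagent as factor B (columns, b = 3) and n = 4 runs per combination. The function name `two_way_anova` is hypothetical, and the grouping of the twelve values in each row into three cells of four is an assumption about the layout of the original table. The interaction sum of squares is obtained by subtracting SSA and SSB from the between-cells sum of squares.

```python
# yields[catalyst] holds one list of 4 runs per reagent (1, 2, 3).
yields = {
    "A": [[86.8, 82.4, 86.7, 83.5], [93.4, 85.2, 94.8, 83.1], [77.9, 89.6, 89.9, 83.7]],
    "B": [[71.9, 72.1, 80.0, 77.4], [74.5, 87.1, 71.9, 84.1], [87.5, 82.7, 78.3, 90.1]],
    "C": [[65.5, 72.4, 76.6, 66.7], [66.7, 77.1, 76.7, 86.1], [72.7, 77.8, 83.5, 78.8]],
    "D": [[63.9, 70.4, 77.2, 81.2], [73.7, 81.6, 84.2, 84.9], [79.8, 75.7, 80.5, 72.9]],
}

def two_way_anova(cells):
    """cells[i][j] is the list of n replicates for level i of A and level j of B."""
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    correction = sum(values) ** 2 / (a * b * n)          # x...^2 / (abn)
    row_totals = [sum(sum(cell) for cell in row) for row in cells]
    col_totals = [sum(sum(cells[i][j]) for i in range(a)) for j in range(b)]
    ssa = sum(t ** 2 for t in row_totals) / (b * n) - correction
    ssb = sum(t ** 2 for t in col_totals) / (a * n) - correction
    ss_cells = sum(sum(cell) ** 2 for row in cells for cell in row) / n - correction
    ssab = ss_cells - ssa - ssb
    sst = sum(x ** 2 for x in values) - correction
    sse = sst - ssa - ssb - ssab
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    mse = sse / df["error"]
    return {
        "df": df, "ssa": ssa, "ssb": ssb, "ssab": ssab, "sse": sse, "sst": sst,
        "f_a": (ssa / df["A"]) / mse,
        "f_b": (ssb / df["B"]) / mse,
        "f_ab": (ssab / df["AB"]) / mse,
    }

anova = two_way_anova(list(yields.values()))
for name, value in anova.items():
    print(name, value)
```

To test for interaction, `f_ab` is compared with the F critical value for (a - 1)(b - 1) = 6 numerator and ab(n - 1) = 36 denominator degrees of freedom, read from a table or obtained from software.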

