STATISTICAL INFERENCE:
HYPOTHESIS TESTS
1. The Concept of Hypothesis Testing
2. The General Methodology of Hypothesis Testing
2.1. The Procedure
2.1.1. The Null Hypothesis versus the Alternative Hypothesis
2.1.2. The Type I Error versus Type II Error
2.1.3. Two-Tailed Hypothesis Tests versus One-Tailed Hypothesis Tests
2.1.3.1. Two-Tailed Tests
2.1.3.1.1. Decision Rules
2.1.3.1.2. The Relationship Between the Confidence Interval and the Acceptance Region for the TOH
2.1.3.1.3. Type I and Type II Errors Revisited
2.1.3.2. One-Tailed Tests
2.1.3.2.1. Lower Tail Test
2.1.3.2.2. Upper Tail Test
2.1.4. How to Set Up the Null and Alternative Hypotheses
3. Hypothesis Test for the Small Samples From Normal Populations
4. Test of Hypothesis on Population Proportion
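Recall the confidence interval for the population mean:
$L, U = \bar{x} \pm z_{\alpha/2}\,\mathrm{se}(\bar{x})$
where $\mathrm{se}(\bar{x}) = s/\sqrt{n}$. Note that the interval is built around $\bar{x}$, and that $z_{\alpha/2}\,\mathrm{se}(\bar{x})$ is the familiar margin of error, MOE.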
Page 1 of 24
Example 1
Casual observation of vehicle speed on a freeway indicates that most vehicles exceed the
speed limit of 70 mph. Suppose we want to test the hypothesis that mean speed is 75 mph.
Accordingly, a random sample of $n = 110$ vehicles is secretly clocked, yielding the following data:
65  74  82  73  80  86  80  69  87  83
78  83  84  66  81  80  90  77  84  80
67  62  91  69  92  84  69  64  65  84
82  68  75  65  88  86  89  85  66  76
92  76  66  85  88  78  84  83  83  81
64  91  76  88  89  69  64  79  66  78
90  81  72  66  77  84  64  65  65  87
62  83  75  78  74  92  84  87  86  89
82  87  78  72  73  68  91  76  90  87
76  72  85  71  67  86  62  89  70  73
68  83  65  89  72  73  70  62  70  72
2.1. The Procedure
The main task in performing a test of hypothesis is to find the margin of sampling error, MOE. This provides the decision rule, the criterion, to determine whether to reject the hypothesis.
$H_0: \mu = 75$
$H_1: \mu \neq 75$
The null hypothesis states that the population mean is equal to 75 mph; the alternative hypothesis states that the population mean is not equal to, or different from, 75 mph.
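The four possible outcomes of a test, with the criminal trial analogy:
$H_0$ is True (presumed innocent), $H_0$ is rejected: Type I Error. The accused is innocent and he is found guilty. Probability = $\alpha$
$H_0$ is True (presumed innocent), $H_0$ is not rejected: Correct decision (no error). The accused is innocent and he is found not guilty. Probability = $1 - \alpha$
$H_0$ is False, $H_0$ is rejected: Correct decision (no error).
$H_0$ is False, $H_0$ is not rejected: Type II Error. Probability = $\beta$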
In the hypothesis test, the burden of proof is always on the alternative hypothesis. In a
criminal court, the burden of proof is on the prosecutor. The prosecutor must convince the
jury, show beyond a reasonable doubt, that the defendant is guilty. Therefore, we want to
make it unlikely to reject the null hypothesis unless the evidence is "very strong" or
"significant". In a criminal court, significant means beyond a reasonable doubt. We
want to make it unlikely to find the defendant guilty unless guilt is established beyond a
reasonable doubt. For this reason $\alpha$, the probability of rejecting a true null hypothesis, is always assigned a small value, typically 5 percent in statistical hypothesis tests. The $\alpha$ value is also called the level of significance of the test.
Note that in a confidence interval, $\alpha$ is the proportion of all possible intervals built around sample means that do not capture the population mean. That was because $100\alpha$ percent of sample means fall outside the margin of error $\text{MOE} = z_{\alpha/2}\,\mathrm{se}(\bar{x})$. In a test of hypothesis $\alpha$ plays a similar role. If the randomly selected sample yields an $\bar{x}$ value which falls outside the prescribed margin of error, we would wrongly reject a true null hypothesis. And there is always a $100\alpha$ percent chance of doing that.
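The role of the level of significance as the long-run rate of wrongly rejecting a true null hypothesis can be checked by simulation. The following is a minimal sketch, not part of the original example: the population standard deviation `sigma = 9.0`, the number of trials, the seed, and the function name `type_one_error_rate` are all assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_one_error_rate(mu0=75.0, sigma=9.0, n=110, alpha=0.05,
                        trials=2000, seed=0):
    """Share of samples, drawn from a population where H0 is TRUE,
    whose mean falls outside the margin of error around mu0."""
    rng = random.Random(seed)                      # fixed seed: repeatable runs
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        moe = z_crit * s / sqrt(n)                 # MOE = z_{alpha/2} * se(x_bar)
        if abs(x_bar - mu0) > moe:                 # sample mean outside the MOE
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

With a large number of trials the printed share should settle near the chosen level of significance, which is the sense in which the level of significance is the probability of a Type I error.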
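Since committing a Type I Error is considered the more serious of the two errors (finding an innocent person guilty), the threshold probability (the level of significance $\alpha$) is set in advance. The probability of Type II Error ($\beta$), however, varies based on several factors, one of them being $\alpha$.
In a two-tailed test the margin of error is
$\text{MOE} = z_{\alpha/2}\,\mathrm{se}(\bar{x})$
and the interval which would contain $100(1-\alpha)$ percent of the sample means, if the null hypothesis were true, is
$L, U = \mu_0 \pm z_{\alpha/2}\,\mathrm{se}(\bar{x})$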
Here the hypothesis test is said to be a two-tailed test. The reason this is called a two-tailed test is that no matter what the value of the sample statistic $\bar{x}$, whether it is greater than or less than the hypothesized mean, there is always some evidence against the null hypothesis in terms of the difference between the value of the sample statistic and the value stated in the null hypothesis. The purpose of the test (the trial) is to gauge the significance of the difference in either direction from the null mean. The significance of the difference is judged against the MOE formula. The hypotheses take the general form:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
The vehicle speed example is a two-tail test. Test the null hypothesis that the population
mean vehicle speed is equal to 75.
$H_0: \mu = 75$ mph
$H_1: \mu \neq 75$ mph
We select $\alpha = 0.05$ (allowing for a 5% chance of committing a Type I error, that is, rejecting a true null hypothesis). Going back to the sample data shown above, the sample mean and standard deviation are obtained as $\bar{x} = 77.5$ and $s = 8.94$. To determine the margin of error, first compute the standard error of $\bar{x}$.
$\mathrm{se}(\bar{x}) = 8.94/\sqrt{110} = 0.852$
Given $\alpha = 0.05$, the other component of MOE, $z_{\alpha/2}$, is 1.96. The margin of error is then
$\text{MOE} = (1.96)(0.852) = 1.67$
The interval is then
$L, U = \mu_0 \pm \text{MOE} = 75 \pm 1.67 = (73.33, 76.67)$
2.1.3.1.2. The Relationship Between the Confidence Interval and the Acceptance Region for the TOH
We can use the above diagram to observe how the confidence interval for $\mu$ and the acceptance region for a two-tail test of hypothesis are related. The 95% confidence interval for the vehicle speed example is:
$L, U = \bar{x} \pm \text{MOE} = 77.5 \pm 1.67 = (75.83, 79.17)$
Note that this interval does not capture the null mean $\mu_0 = 75$. You can thus use a confidence interval to observe if the null mean $\mu_0$ falls within the interval. If it does not, then you reject the null hypothesis.
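The equivalence between the two views can be sketched in a few lines of Python using the summary figures of the vehicle speed example. The variable names are illustrative, and the exact critical value returned by `NormalDist().inv_cdf` differs slightly from the rounded table value 1.96.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the vehicle speed example
n, x_bar, s = 110, 77.5, 8.94
mu0, alpha = 75.0, 0.05

se = s / sqrt(n)                               # standard error of the sample mean
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, close to 1.96
moe = z_crit * se                              # margin of error

# Acceptance region: built around the hypothesized mean mu0
acceptance_region = (mu0 - moe, mu0 + moe)
# Confidence interval: built around the sample mean x_bar
confidence_interval = (x_bar - moe, x_bar + moe)

# The two checks always agree, because both compare |x_bar - mu0| with MOE
reject_by_region = not (acceptance_region[0] <= x_bar <= acceptance_region[1])
reject_by_interval = not (confidence_interval[0] <= mu0 <= confidence_interval[1])

print(f"MOE = {moe:.2f}")
print(f"Acceptance region:   ({acceptance_region[0]:.2f}, {acceptance_region[1]:.2f})")
print(f"Confidence interval: ({confidence_interval[0]:.2f}, {confidence_interval[1]:.2f})")
print("Reject H0:", reject_by_region, reject_by_interval)
```

Both checks reduce to asking whether the distance between the sample mean and the null mean exceeds the MOE, which is why they cannot disagree.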
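$\bar{x}_L, \bar{x}_U = 75 \pm 1.6 = (73.4, 76.6)$
Suppose the null hypothesis is false, yet the sample mean $\bar{x} = 76.2$ falls inside this acceptance region. If we conclude that $\bar{x} = 76.2$ belongs to the $H_0$ distribution and do not reject $H_0$, we have not rejected a false null hypothesis. We have, therefore, committed a Type II error.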
The following is a graphic representation of the four scenarios involving a hypothesis test:
[Figure not reproduced.]
The decision rule is always set up to reject the null hypothesis. There are two ways to set up the decision rule. Both are derived from the MOE formula. The role of MOE here is that, if the null hypothesis $H_0$ were true, then $100(1-\alpha)$ percent of the sample means must fall within the MOE of $\mu_0$. Thus MOE becomes the criterion for rejecting $H_0$. We will reject $H_0$, that is, we conclude the deviation $\bar{x} - \mu_0$ is significant, only when $\bar{x}$ falls outside the MOE, when the (absolute value of the) deviation of $\bar{x}$ from $\mu_0$ exceeds MOE.
We reject $H_0$ if
$|\bar{x} - \mu_0| > \text{MOE}$
Substituting for MOE, we have
$|\bar{x} - \mu_0| > z_{\alpha/2}\,\mathrm{se}(\bar{x})$
Dividing both sides of the inequality by $\mathrm{se}(\bar{x})$ gives us
$\dfrac{|\bar{x} - \mu_0|}{\mathrm{se}(\bar{x})} > z_{\alpha/2}$
The term on the left-hand side above is called the test statistic (TS) and $z_{\alpha/2}$ is the critical value (CV).
Decision Rule (a): Reject $H_0$ if
$\dfrac{|\bar{x} - \mu_0|}{\mathrm{se}(\bar{x})} > z_{\alpha/2}$
$|TS| = |z| > CV = z_{\alpha/2}$
Note that when you compute the test statistic $(\bar{x} - \mu_0)/\mathrm{se}(\bar{x})$, the result is the z score. Also note the absolute value lines around the test statistic. This means that when the test statistic is negative, to avoid the confusion arising from the negative sign regarding the direction of the inequality, use the absolute value.
Now back to the vehicle speed example. The mean obtained from the sample is $\bar{x} = 77.5$. The objective of this exercise is to see if the deviation of the sample mean from the hypothesized mean ($\bar{x} - \mu_0 = 2.50$) is significant. If this difference exceeds MOE, then the difference is significant and it will lead us to reject the null hypothesis. The difference is:
$\bar{x} - \mu_0 = 77.5 - 75 = 2.5$
Using the test statistic,
$z = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})} = \dfrac{2.5}{0.852} = 2.93$
$z_{\alpha/2} = z_{0.025} = 1.96$
Since $TS = 2.93 > CV = 1.96$, the deviation is significant. Therefore, we reject the null hypothesis that $\mu_0 = 75$. We conclude that the population mean vehicle speed is different from 75 mph.
The alternative approach for determining if the difference $\bar{x} - \mu_0$ is significant is to find the tail area associated with the value of the test statistic. That is, find $P(z > TS)$. Using the z table, this tail area is:
$P(z > TS) = P(z > 2.93) = 0.0017$
When the test is a two-tail test, double the computed probability ($2 \times 0.0017 = 0.0034$) and compare it to $\alpha = 0.05$. Note that 0.0034 is now the computed probability of Type I error. With prob value = 0.0034, there is about a 0.34% probability that we might reject a true null hypothesis. Since we are allowing 5% as the comfort zone or threshold probability for rejecting a true null, the computed probability 0.0034 is clearly within this comfort zone. This approach to the hypothesis test is the probability (prob) value approach.
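Both decision rules for the vehicle speed example can be reproduced without a z table. This is a sketch using Python's standard library; the variable names are illustrative, and because it does not round the standard error or the tail area at intermediate steps, its figures may differ from the hand calculation in the last digit.

```python
from math import sqrt
from statistics import NormalDist

# Vehicle speed example: H0: mu = 75 against H1: mu != 75
n, x_bar, s = 110, 77.5, 8.94
mu0, alpha = 75.0, 0.05

std_normal = NormalDist()                  # standard normal, mean 0 and sd 1
se = s / sqrt(n)
ts = (x_bar - mu0) / se                    # test statistic (a z score)
cv = std_normal.inv_cdf(1 - alpha / 2)     # critical value z_{alpha/2}

# Two-tail prob value: twice the tail area beyond |TS|
p_value = 2 * (1 - std_normal.cdf(abs(ts)))

reject_by_ts = abs(ts) > cv                # test statistic rule
reject_by_p = p_value < alpha              # prob value rule

print(f"TS = {ts:.2f}, CV = {cv:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_by_p else "Do not reject H0")
```

The two rules are equivalent: the prob value is below the level of significance exactly when the absolute test statistic exceeds the critical value.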
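Decision Rule (b): Reject $H_0$ if
$2P(z > |TS|) < \alpha$
p-value < level of significance
For a two-tail test, in Decision Rule (b) the prob value is twice the tail area corresponding to TS. If the p-value < $\alpha$, then reject $H_0$.
In summary, for a two-tail test:
$H_0: \mu = \mu_0$
$H_1: \mu \neq \mu_0$
a. Specify the level of significance $\alpha$.
b. Use either of the two methods to reject or not reject the null hypothesis.
Decision Rule (a): Test Statistic
i. $TS = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})}$
ii. $CV = z_{\alpha/2}$
iii. Reject $H_0$ if $|TS| > CV$
Decision Rule (b): prob value
i. $TS = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})}$
ii. p-value $= 2P(z > |TS|)$
iii. Reject $H_0$ if p-value < $\alpha$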
2.1.3.2. One-Tailed Tests
In many cases the null hypothesis is that the population mean is either at least (greater than or equal to), or is at most (less than or equal to) some value. In these cases the sample statistic $\bar{x}$ may contradict the null hypothesis in only one direction. For example, if $H_0$ is $\mu \geq 75$, the test is of interest only if $\bar{x}$ is less than 75. Only this way does the sample statistic contradict the null hypothesis and we want to test whether this is a significant contradiction. If the sample mean turns out to be greater than 75, then it confirms the null and, therefore, there is no need for the test.[1] This is why the significance of the deviation will be measured relative to the margin of error only in one direction. Regarding the level of significance $\alpha$, to maintain the same probability of rejecting a true null hypothesis as in a two-tailed test, the whole $\alpha$ must be placed in one tail. Thus, in the MOE formula we use $z_{\alpha}$ instead of $z_{\alpha/2}$.
$\text{MOE} = z_{\alpha}\,\mathrm{se}(\bar{x})$
Here the hypothesis test is said to be a one-tail test.
In the above example, we conducted a two-tail test, testing the null hypothesis $H_0: \mu = 75$ mph against the alternative $H_1: \mu \neq 75$ mph. What if the concern was that the mean vehicle speed is 75 mph or more (at least 75 mph)? In this case we would be conducting a one-tail test, testing the null hypothesis $H_0: \mu \geq 75$ mph against the alternative $H_1: \mu < 75$ mph.
The null hypothesis: $H_0: \mu \geq \mu_0$
The alternative hypothesis: $H_1: \mu < \mu_0$
For this example, the null and alternative hypotheses are written as:
$H_0: \mu \geq 75$
$H_1: \mu < 75$
This is a lower-tail test, as indicated by "<" (a strict inequality) in the alternative hypothesis.
[1] If there is no evidence the defendant has committed the crime, then there would be no trial.
Example 2
To perform a test, let us continue with the example of clocking a random sample of $n = 110$ vehicles. Suppose the sample yields $\bar{x} = 73.9$ mph and $s = 8.82$. Note that $\bar{x} = 73.9 < \mu_0 = 75$, so the sample evidence contradicts $H_0$. The question is whether the deviation is significant.
Let us compute the MOE, determine the acceptance region for the test, and see where $\bar{x}$ falls relative to the acceptance region.
$\mathrm{se}(\bar{x}) = 8.82/\sqrt{110} = 0.841$
$\text{MOE} = z_{\alpha}\,\mathrm{se}(\bar{x}) = (1.64)(0.841) = 1.38$
$\bar{x}_L = \mu_0 - \text{MOE} = 75 - 1.38 = 73.62$
The sample statistic $\bar{x} = 73.90$ falls inside the acceptance region bounded on the left by $\bar{x}_L = 73.62$ mph. We do not reject the null hypothesis and conclude that the population mean is not less than 75 mph.
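The following shows the decision rules for a lower tail test.
Decision Rule (a): Reject $H_0$ if
$\dfrac{|\bar{x} - \mu_0|}{\mathrm{se}(\bar{x})} > z_{\alpha}$
$|TS| = |z| > CV = z_{\alpha}$
For our example,
$TS = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})} = \dfrac{73.9 - 75}{0.841} = -1.31$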
To avoid the confusion arising with the negative sign of the test statistic, use the absolute value of TS to compare to the CV. Thus,
$|TS| = 1.31 < CV = z_{0.05} = 1.64$
We do not reject the null hypothesis and conclude that the population mean speed is not less than 75 mph.
Decision Rule (b): Reject $H_0$ if
$P(z < TS) < \alpha$
p-value < level of significance
$P(z < -1.31) = 0.0951$
Since this is a one-tail test, we do not double the tail area. Thus,
p-value $= 0.0951 > \alpha = 0.05$
and we do not reject $H_0$.
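The lower tail calculations of Example 2 can be gathered into one small function. This is an illustrative sketch: the function name `lower_tail_z_test` and the keys of the returned dictionary are assumptions, and the code uses the unrounded critical value (about 1.645) where the text uses the table value 1.64.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_z_test(x_bar, mu0, s, n, alpha=0.05):
    """Test H0: mu >= mu0 against H1: mu < mu0 for a large sample."""
    std_normal = NormalDist()
    se = s / sqrt(n)
    ts = (x_bar - mu0) / se                 # negative when x_bar < mu0
    cv = std_normal.inv_cdf(1 - alpha)      # z_alpha: the whole alpha in one tail
    p_value = std_normal.cdf(ts)            # lower tail area, not doubled
    return {
        "TS": ts,
        "CV": cv,
        "x_L": mu0 - cv * se,               # left bound of the acceptance region
        "p_value": p_value,
        "reject": p_value < alpha,
    }


result = lower_tail_z_test(x_bar=73.9, mu0=75.0, s=8.82, n=110)
print(f"TS = {result['TS']:.2f}, p-value = {result['p_value']:.4f}")
print("Reject H0" if result["reject"] else "Do not reject H0")
```

For an upper tail test the same steps apply with the signs mirrored: the acceptance region is bounded on the right by $\mu_0 + \text{MOE}$ and the prob value is the upper tail area.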
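i. $TS = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})} = -1.31$
ii. $CV = z_{\alpha} = z_{0.05} = 1.64$
iii. Reject $H_0$ if $TS < -CV$, or $|TS| > CV$
Do not reject $H_0$ since $|TS| = 1.31 < CV = 1.64$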
$H_0: \mu \leq \$100$
$H_1: \mu > \$100$
Let us compute the MOE, determine the acceptance region for the test, and see where $\bar{x}$ falls relative to the acceptance region.
$\mathrm{se}(\bar{x}) = 25.3/\sqrt{115} = 2.36$
$\text{MOE} = z_{\alpha}\,\mathrm{se}(\bar{x}) = (1.64)(2.36) = 3.87$
$\bar{x}_U = \mu_0 + \text{MOE} = 100 + 3.87 = 103.87$
The sample mean $\bar{x} = \$104.9$ falls outside the acceptance region. Hence, we reject the null hypothesis, $H_0: \mu \leq \$100$, and conclude that the mean reimbursement is greater than $100.
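Now let's use the test statistic and p-value decision rules.
Decision Rule (a): Reject $H_0$ if $TS > CV$
$TS = z = \dfrac{\bar{x} - \mu_0}{\mathrm{se}(\bar{x})} = \dfrac{104.9 - 100}{2.36} = 2.08$
$CV = z_{\alpha} = z_{0.05} = 1.64$
Since $TS = 2.08 > CV = 1.64$, reject $H_0$.
Decision Rule (b): Reject $H_0$ if p-value < $\alpha$
p-value $= P(z > 2.08) = 0.0188 < \alpha = 0.05$, so reject $H_0$.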
Remark:
What would the conclusion be if = 0.01?
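With $\alpha = 0.01$, $CV = z_{0.01} = 2.33$. Since $TS = 2.08 < CV = 2.33$, do not reject $H_0$.
Equivalently, since p-value $= 0.0188 > \alpha = 0.01$, do not reject $H_0$.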
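The effect of the level of significance on the conclusion can be seen by holding the sample evidence fixed and comparing the prob value with each candidate level. This sketch uses the reimbursement figures from the example above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Reimbursement example: H0: mu <= 100 against H1: mu > 100
x_bar, mu0, s, n = 104.9, 100.0, 25.3, 115

se = s / sqrt(n)
ts = (x_bar - mu0) / se
p_value = 1 - NormalDist().cdf(ts)      # upper tail area, not doubled

# The evidence (the prob value) is fixed; only the threshold changes
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01)}

for alpha, reject in decisions.items():
    verdict = "reject H0" if reject else "do not reject H0"
    print(f"alpha = {alpha}: p-value = {p_value:.4f}, {verdict}")
```

A smaller level of significance demands stronger evidence, so the same sample can lead to rejection at 5 percent and to non-rejection at 1 percent.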
$H_0: \mu \geq 1{,}000$
$H_1: \mu < 1{,}000$
$n = 105$
$s = 56$
$\alpha = 0.05$
$\mathrm{se}(\bar{x}) = 56/\sqrt{105} = 5.465$
$z_{\alpha} = z_{0.05} = 1.64$
Decision Rule (a): Reject $H_0$ if $|TS| > CV$
i. $|TS| = 1.98$
ii. $CV = z_{\alpha} = z_{0.05} = 1.64$
iii. Reject $H_0$ since $|TS| = 1.98 > CV = 1.64$
Reject the manufacturer's claim that the mean life is at least 1,000 hours and conclude that it is less than 1,000 hours.
The most important part of performing a hypothesis test is stating the correct null and
alternative hypotheses. The incorrect statement of the hypotheses will invariably lead you
to a wrong conclusion about the test. If you are confused about setting up the hypotheses,
hopefully the following guidelines will help.
Never put the equal sign in the alternative hypothesis. The following symbols should not appear in the alternative hypothesis: "=", "≤", "≥". These symbols belong to the null hypothesis. Depending on the nature of the test, the alternative hypothesis may contain any of the following: "≠", ">", "<".
Following the above directions, after you state your null and alternative hypotheses,
make certain that the sample evidence contradicts the null hypothesis (and agrees
with the alternative). Remember, the reason we conduct a hypothesis test is to
determine if the sample evidence is significant in order to reject the null. In a two tail
test, the sample evidence will always be different, or contradict, the null. So, there is
no confusion. However, in a one tail test the reason we conduct the test is that there
is evidence against the null, and we want to determine if the evidence is significant.
For example, if in a problem you set your null and alternative hypotheses as, say,
$H_0: \mu \geq \$100$
$H_1: \mu < \$100$
and the sample evidence is $\bar{x} = \$110$, then the sample evidence does not contradict the null. There is no evidence that the population mean is less than $100; there is no evidence to reject the null. This should be a warning that your hypotheses statement is incorrect. The correct statement should be,
$H_0: \mu \leq \$100$
$H_1: \mu > \$100$
Now the sample evidence, $\bar{x} = \$110$, contradicts the null. There is evidence the mean is greater than $100, but you want to determine if $\bar{x}$ is significantly greater than 100 in order for you to reject the null.
Generally, any hypothesis test which involves challenging the status quo, the
prevailing practice or belief, the challenger's viewpoint should be the alternative
hypothesis. If you want to prove the prevailing practice or belief wrong, you have to
provide significant proof, a proof which is "beyond a reasonable doubt". Consider the
following examples
o The production team of a manufacturing company has designed a new
production process which is supposed to lower the average production cost.
To implement the new process, the production team must convince the
management that the average cost is lower with the proposed process than
the current process. Suppose the current average cost is $10. The production
team must provide significant evidence that the average cost under their
proposed process is lower. Therefore, the null and alternative hypotheses
must be stated as:
$H_0: \mu \geq \$10$
$H_1: \mu < \$10$
Note that the null hypothesis states that the average cost is "no less than" $10. The
task of the production team is to show significant evidence to reject the null.
o A pharmaceutical company has developed a new drug to treat a certain type of cancer. Suppose 60% of patients who take the existing drug experience remission. To prove that the new drug is more effective than the current treatment, the company must provide significant evidence to the medical community that the new drug is better, that the remission rate is higher. The null hypothesis, to be rejected, then must be that the new drug is no better:
$H_0: \pi \leq 0.60$
$H_1: \pi > 0.60$
An interesting point to keep in mind in this example is that the medical community would require a smaller level of significance for the test, say, $\alpha = 0.01$, compared to the typical $\alpha = 0.05$. This is to reduce the probability of Type I error, to lower the likelihood of rejecting the "no better" hypothesis when it may be true.
Another issue you should keep in mind in choosing the null and alternative
hypothesis is: choose H 0 such that, if the hypothesis is true, the consequence of
rejecting it is costly, dire, etc...
In the problems that you deal with in this course, you should mainly be concerned about how the problem is stated. The problem may be stated as the null hypothesis or the alternative hypothesis. For example, if you are asked to test the hypothesis that the mean is "at least", say, $50, then you should recognize this as a null statement: $H_0: \mu \geq \mu_0$. The same problem may be stated as: test the hypothesis that the mean is "less than" $50. This is an alternative hypothesis statement: $H_1: \mu < \mu_0$. Just be careful to use the appropriate symbol corresponding to the statement of the hypothesis. Then make sure that the equality sign in any form, "=", "≤", or "≥", does not appear in the alternative hypothesis statement.
11.84  11.98  12.00  11.96  11.83  11.95
12.03  11.82  11.91  11.75  11.96  11.95
11.86  11.97  11.85  11.92  11.89  12.02
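$H_0: \mu = 12$
$H_1: \mu \neq 12$
First we must compute the sample mean and the sample standard deviation from the sample:
$\bar{x} = \dfrac{\sum x}{n} = 12.032 \qquad s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = 0.100$
$\mathrm{se}(\bar{x}) = 0.100/\sqrt{20} = 0.022$
Because the sample is small, the critical value comes from the t distribution.
$t_{\alpha/2,(n-1)} = t_{0.025,(19)} = 2.093$
$\text{MOE} = t_{\alpha/2,(n-1)}\,\mathrm{se}(\bar{x}) = (2.093)(0.022) \approx 0.05$
$L, U = \mu_0 \pm \text{MOE}$
$L = 12 - 0.05 = 11.95$
$U = 12 + 0.05 = 12.05$
Since $\bar{x} = 12.032$ falls inside this interval, do not reject $H_0$.
Decision Rule (a): Reject $H_0$ if $|TS| > CV$
Decision Rule (b): prob value
ii. Find $2P(t > |TS|)$
iii. Reject $H_0$ if p-value < $\alpha$.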
IMPORTANT NOTE
Note that to compute P(t> 1.43) you must use a computer program that finds the tail
area under the t curve for a given t score and degrees of freedom. There are no tables to
determine such areas (probabilities). However, you can estimate this probability
ordinally using the t table as shown below:
df    0.100   0.050   0.025   0.010   0.005
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
You can still correctly guess, from a given t value, whether the prob value is greater or less than a given level of significance $\alpha$ (for a one-tail test) or $\alpha/2$ (for a two-tail test). In the last example $t = 1.431$. Given $df = 19$, the t score increases as the tail area in the top row decreases (as we move to the right in the table). In the above example, $t = 1.431$ lies between 1.328 and 1.729, the t scores associated with tail areas of 0.100 and 0.050 for $df = 19$. This means that the tail area associated with a t score of 1.431 must be between 0.050 and 0.100, which is greater than $\alpha/2 = 0.025$. So, the combined tail area is definitely greater than the level of significance $\alpha$. Therefore, we do not reject the null hypothesis.
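The table lookup described above can be written as a short routine that brackets the tail area between two neighbouring column headings. This is only a sketch of the ordinal estimate: the function name `bracket_tail_area` is an assumption, and the row of t scores is copied from the $df = 19$ line of the table.

```python
from bisect import bisect_right

# Column headings (tail areas) and the df = 19 row of the t table
TAIL_AREAS = [0.100, 0.050, 0.025, 0.010, 0.005]
T_ROW_DF19 = [1.328, 1.729, 2.093, 2.539, 2.861]


def bracket_tail_area(t_abs, t_row, tail_areas=TAIL_AREAS):
    """Return (lower, upper) bounds on P(t > t_abs), for t_abs >= 0.

    A larger t score means a smaller tail area, so the tail area lies
    between the headings of the two table scores that surround t_abs.
    """
    i = bisect_right(t_row, t_abs)                       # table scores <= t_abs
    upper = tail_areas[i - 1] if i > 0 else 0.5          # left of the table
    lower = tail_areas[i] if i < len(tail_areas) else 0.0  # right of the table
    return lower, upper


alpha = 0.05
lower, upper = bracket_tail_area(1.431, T_ROW_DF19)

# Two-tail test: compare the one-tail area with alpha / 2
if lower >= alpha / 2:
    decision = "do not reject H0"
elif upper <= alpha / 2:
    decision = "reject H0"
else:
    decision = "table cannot decide"

print(f"{lower} < P(t > 1.431) < {upper}: {decision}")
```

For a one-tail test the same bounds are compared with the level of significance itself rather than with half of it.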
Example 6
A light bulb manufacturer claims the average life of its light bulbs is at least 1,000 hours. To
perform a test of hypothesis at 5 percent level of significance, a sample of 25 light bulbs
yields an average life of 992.6 hours with a sample standard deviation s=49.3 hours.
Should the manufacturer's claim be rejected?
$H_0: \mu \geq 1{,}000$
$H_1: \mu < 1{,}000$
$n = 25$
$\bar{x} = 992.6$
$s = 49.3$
$\alpha = 0.05$
Note that this is a lower tail test because $\bar{x} - \mu_0 = 992.6 - 1000 = -7.40 < 0$.
$\mathrm{se}(\bar{x}) = 49.3/\sqrt{25} = 9.86$
Decision Rule (a): Test statistic
i. $|TS| = |\bar{x} - \mu_0|/\mathrm{se}(\bar{x}) = 0.751$
ii. $CV = t_{\alpha,(n-1)} = t_{0.05,(24)} = 1.711$. [Here you must use $t_{\alpha,(n-1)}$, rather than $t_{\alpha/2,(n-1)}$, because you are performing a one-tail test.]
iii. Reject $H_0$ if $|TS| > CV$
Do not reject $H_0$ since $0.751 < 1.711$
Decision Rule (b): Reject $H_0$ if probability value < $\alpha$
NOTE: Using Excel, =T.DIST.RT(0.751,24) = 0.2300. If a computer is not available, you can
use the t table to determine if the p-value is greater than or less than the level of
significance:
df    0.100   0.050   0.025   0.010   0.005
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
Note that |t| = 0.751 is less than 1.318, the smallest of the shown t scores corresponding to
df = 24, which is associated with a tail area of 0.10, the largest of the shown tail areas.
Thus, |t| = 0.751 must be associated with a much larger tail area than 0.10, which, in turn, would exceed $\alpha$ = 0.05.
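When neither Excel nor a statistics package is at hand, the tail area under the t curve can be approximated by numerical integration of the t density. The sketch below applies Simpson's rule using only Python's standard library; the function names `t_pdf` and `t_upper_tail` and the number of integration steps are assumptions made for illustration.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """Approximate P(T > t) for t >= 0 with Simpson's rule on [0, t].

    The curve is symmetric, so the area to the right of 0 is 0.5 and
    the tail area is 0.5 minus the area between 0 and t.
    """
    if t < 0:
        raise ValueError("t must be non-negative; use symmetry for t < 0")
    if steps % 2:
        steps += 1                       # Simpson's rule needs an even count
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Example 6: |TS| = 0.751 with 24 degrees of freedom (one-tail test)
p_value = t_upper_tail(0.751, 24)
print(f"P(t > 0.751) with df = 24: {p_value:.4f}")
```

The result can be compared with the Excel figure quoted in the note above. For a two-tail test the returned area would be doubled before comparing it with the level of significance.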
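$H_0: \pi = 0.26$
$H_1: \pi \neq 0.26$
This is a two-tail test, because we are testing the hypothesis that the population proportion is 26 percent.
$n = 600$
$\bar{p} = 0.273$
$\alpha = 0.05$
$z_{\alpha/2} = 1.96$
$\mathrm{se}(\bar{p}) = \sqrt{\dfrac{\pi_0 (1 - \pi_0)}{n}}$
Note that to find $\mathrm{se}(\bar{p})$, unlike the standard error in the confidence interval problems, instead of the sample proportion $\bar{p}$ you use $\pi_0$ in the formula. This is logical because we are presuming the population proportion is the value specified in the null hypothesis.
$\mathrm{se}(\bar{p}) = \sqrt{\dfrac{0.26(1 - 0.26)}{600}} = 0.0179$
The margin of error and the acceptance region for the test are determined as follows:
$\text{MOE} = z_{\alpha/2}\,\mathrm{se}(\bar{p}) = (1.96)(0.0179) = 0.035$
$L, U = \pi_0 \pm \text{MOE} = 0.26 \pm 0.035 = (0.225, 0.295)$
Decision Rule (a): Reject $H_0$ if $|TS| > CV$
$TS = (\bar{p} - \pi_0)/\mathrm{se}(\bar{p}) = (0.273 - 0.26)/0.0179 = 0.73$
$CV = z_{\alpha/2} = z_{0.025} = 1.96$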
The test of hypothesis provides that we should not reject the null hypothesis that $H_0: \pi = 0.26$. Therefore we conclude that the proportion of all Hoosier adults in the labor force with a 4-year college degree is 26 percent.
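The steps of a test on a population proportion can be sketched as one function covering two-tail, upper tail, and lower tail tests. The function name `proportion_z_test` and its `tail` argument are assumptions introduced for illustration; the key point it encodes is that the standard error is computed from the hypothesized proportion, not from the sample proportion.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_bar, pi0, n, alpha=0.05, tail="two"):
    """z test for a proportion; returns (TS, prob value, reject?).

    tail: "two" for H1: pi != pi0, "upper" for H1: pi > pi0,
          "lower" for H1: pi < pi0.
    """
    std_normal = NormalDist()
    se = sqrt(pi0 * (1 - pi0) / n)          # uses pi0, as H0 is presumed true
    ts = (p_bar - pi0) / se
    if tail == "two":
        p_value = 2 * (1 - std_normal.cdf(abs(ts)))
    elif tail == "upper":
        p_value = 1 - std_normal.cdf(ts)
    elif tail == "lower":
        p_value = std_normal.cdf(ts)
    else:
        raise ValueError("tail must be 'two', 'upper', or 'lower'")
    return ts, p_value, p_value < alpha


# College degree example: H0: pi = 0.26 against H1: pi != 0.26
ts, p_value, reject = proportion_z_test(p_bar=0.273, pi0=0.26, n=600)
print(f"TS = {ts:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

The one-tail variants place the whole level of significance in a single tail, so their prob values are not doubled.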
Example 8
A pest control company claims that no more than 15% of its customers need repeated
treatment after a 90-day warranty period. To test the validity of this claim, a consumer
organization selected a sample of 300 customers and found that 57 needed repeated
treatment after the 90-day warranty period. Is there evidence, at the 5% level of significance, that the claim is not valid?
Here the claim is "no more than" 15%. The symbol for "no more than" or "at most" is ≤. This symbol must be stated in the null hypothesis. The alternative is then "greater than" 15%, which is shown as $\pi > 0.15$. This makes the test an upper tail test.
$H_0: \pi \leq 0.15$
$H_1: \pi > 0.15$
$n = 300$
$\bar{p} = 57/300 = 0.19$
$\mathrm{se}(\bar{p}) = \sqrt{\dfrac{0.15(1 - 0.15)}{300}} = 0.0206$
$\alpha = 0.05$
$z_{\alpha} = 1.64$
Compute MOE. Note that since this is a one-tail test, you must use $z_{\alpha}$, rather than $z_{\alpha/2}$, to obtain MOE.
$\text{MOE} = z_{\alpha}\,\mathrm{se}(\bar{p}) = (1.64)(0.0206) = 0.034$
$U = \pi_0 + \text{MOE} = 0.15 + 0.034 = 0.184$
Since $\bar{p} = 0.19 > U = 0.184$, the sample proportion falls outside the acceptance region.
Decision Rule (a): Reject $H_0$ if $TS > CV$.
i. $TS = (\bar{p} - \pi_0)/\mathrm{se}(\bar{p}) = 1.94$
ii. $CV = z_{\alpha} = z_{0.05} = 1.64$
iii. Reject $H_0$ since $TS = 1.94 > CV = 1.64$
Both methods indicate that the null hypothesis $H_0: \pi \leq 0.15$ should be rejected. The test does not support the company's claim that no more than 15% of its customers need repeated treatment after a 90-day warranty period.
Example 9
To test the hypothesis that less than 40% of drivers on a certain highway obey the legal
speed limit, in a random sample of 700 vehicles clocked secretly, 252 observed the legal
speed limit. Is there significant evidence that less than 40% of drivers observe the legal
speed limit? Perform the test at a 5 percent level of significance.
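Since the hypothesis to be tested is less than 40 percent ($\pi < 0.40$), this is a lower tail test:
$H_0: \pi \geq 0.40$
$H_1: \pi < 0.40$
Compute the sample proportion:
$\bar{p} = x/n = 252/700 = 0.36$
Since this is a lower tail test, the deviation of the sample proportion from the null value for the proportion should be a negative value.
$\bar{p} - \pi_0 = 0.36 - 0.40 = -0.04$
$\mathrm{se}(\bar{p}) = \sqrt{\pi_0 (1 - \pi_0)/n} = \sqrt{0.40(1 - 0.40)/700} = 0.0185$
$\text{MOE} = 1.64 \times 0.0185 = 0.03$
$TS = (\bar{p} - \pi_0)/\mathrm{se}(\bar{p})$
$|TS| = |0.36 - 0.40|/0.0185 = 2.16$
At $\alpha = 0.05$, $CV = z_{0.05} = 1.64$
Decision rule: reject $H_0$ if $|TS| > CV$. Since $|TS| = 2.16 > CV = 1.64$, reject $H_0$.
$P(z < -2.16) = 0.0154$
Since p-value $= 0.0154 < \alpha = 0.05$, reject $H_0$.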