
Exercise 5: Parametric hypotheses

Any claim regarding the unknown distribution of a characteristic is called a statistical hypothesis.
A hypothesis which specifies only the numerical values of unknown parameters of the
distribution of a characteristic is called a parametric hypothesis. In order to verify such
hypotheses, we use parametric tests.
A hypothesis regarding other features of the distribution of a characteristic (not only its
parameters) is called a non-parametric hypothesis. To verify such hypotheses, we use
non-parametric tests.
Hypothesis testing is used to decide whether a given hypothesis can be regarded as true or
false. We therefore specify an initial hypothesis, H0 (the null hypothesis), and formulate an
alternative hypothesis, H1, which will be regarded as true if we conclude that H0 is not true.
If only one hypothesis is formulated and the statistical test only checks whether this
hypothesis should be rejected, without examining any other hypothesis, the test is called a
test of significance.
An algorithm for parametric hypothesis testing (significance testing):
1. We formulate the hypothesis $H_0\colon \theta = \theta_0$.
2. We set the significance level $\alpha$.
3. Next, we draw an $n$-element simple random sample.
4. We calculate the value $u$ of the relevant test statistic $U$.
5. For the selected $\alpha$, we look for the critical value $u_0$ of the statistic $U$ satisfying
   $P(|U| \ge u_0) = \alpha$.
6. If $|u| \ge u_0$, we reject the hypothesis H0;
   if $|u| < u_0$, there is no reason to reject the hypothesis H0.

Note: For small samples we use the exact distribution of the statistic U; for large samples
its limiting (asymptotic) distribution is used.
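The six steps above can be sketched in Python for one concrete case, assumed here purely for illustration: a two-sided test of a mean when the population standard deviation $\sigma$ is known, so that $U = (\bar{x} - \theta_0)/(\sigma/\sqrt{n})$ is standard normal under H0. The function name `z_significance_test` and the sample values are hypothetical; only the standard library (`statistics.NormalDist`) is used.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_significance_test(sample, theta0, sigma, alpha=0.05):
    """Two-sided significance test of H0: mean = theta0, sigma assumed known.

    U = (sample mean - theta0) / (sigma / sqrt(n)) is N(0, 1) under H0
    when the data come from a normal population.
    """
    n = len(sample)
    # Step 4: observed value u of the statistic U
    u = (fmean(sample) - theta0) / (sigma / sqrt(n))
    # Step 5: critical value u0 with P(|U| >= u0) = alpha under H0
    u0 = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 6: reject H0 exactly when |u| >= u0
    return u, u0, abs(u) >= u0


# Illustrative data only (not taken from the exercise)
u, u0, reject = z_significance_test([0.5, -0.2, 1.1, 0.3], theta0=0.0, sigma=1.0)
print(f"u = {u:.3f}, u0 = {u0:.3f}, reject H0: {reject}")
```

Here $|u| = 0.85 < u_0 \approx 1.96$, so the sketch reports that there is no reason to reject H0.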

Example.
Let X be a trait with a normal distribution $N(m, 1)$ in the population of interest, where $m$ is
unknown. We believe that the unknown mean is 0, i.e., we test the hypothesis $H_0\colon m = 0$
against the alternative hypothesis $H_1\colon m \ne 0$.
The hypothesis can only be tested against data from the population, so a random sample of
10 items was drawn, with the following results:
-0,30023
-1,27768
0,244257
1,276474
1,19835
1,733133
-2,18359
-0,23418
1,095023
-1,0867
We test the null hypothesis in SPSS. Running the "Analyze / Compare Means / One-Sample T
Test" procedure, we obtain:

One-Sample Statistics
            N    Mean    Std. Deviation   Std. Error Mean
VAR00001    10   ,0465   1,29240          ,40869

One-Sample Test
Test Value = 0
            t      df   Sig. (2-tailed)   Mean Difference   95% Confidence Interval of the Difference
                                                            Lower      Upper
VAR00001    ,114   9    ,912              ,04649            -,8780     ,9710

From these results, the difference between the mean stated in the null hypothesis and the
sample mean is 0.04649. Is this difference large enough for us to reject the null hypothesis?
The test uses the t statistic, whose value (with 9 degrees of freedom) is 0.114. The p-value is
given in the Sig. (2-tailed) column: it is 0.912, the probability, computed assuming H0 is true,
of obtaining a t statistic at least as far from 0 as the one observed. This is much larger than
the usual significance level $\alpha = 0.05$. Thus, there is no reason to reject the hypothesis H0.
The same conclusion follows from the 95% confidence interval for the difference between the
true mean and the value stated in the null hypothesis. We obtained the interval
[-0.878, 0.971]. It contains 0, and since the interval is constructed so that it covers the true
difference with probability $1 - \alpha = 0.95$, a true difference of 0 is consistent with the data.
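The interval can be checked by hand: it is the mean difference plus or minus the $t$ quantile for 9 degrees of freedom times the standard error from the SPSS table ($t_{0.975;\,9} \approx 2.262$ is taken from standard $t$ tables):

$$\bar{x} - m_0 \pm t_{0.975;\,9}\,\frac{s}{\sqrt{n}} = 0.04649 \pm 2.262 \cdot 0.40869 = 0.04649 \pm 0.9245,$$

which gives $[-0.8780,\ 0.9710]$.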
Let us now divide the sample into two parts (two samples), putting the odd-numbered elements
in the first sample and the even-numbered elements in the second sample. This can be used to
check the homogeneity of the original sample: if it is homogeneous, the parameters (e.g. the
mean and standard deviation) of the two new samples should not differ significantly. Thus we
have the following two samples:
Sample 1     Sample 2
-0,30023     -1,27768
0,244257     1,276474
1,19835      1,733133
-2,18359     -0,23418
1,095023     -1,0867
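The split can be reproduced with list slicing; this is a small illustrative sketch, with variable names chosen here, that also computes the means and variances reported in the Excel outputs that follow.

```python
from statistics import fmean, variance

data = [-0.30023, -1.27768, 0.244257, 1.276474, 1.19835,
        1.733133, -2.18359, -0.23418, 1.095023, -1.0867]

# Python indices start at 0, so data[0::2] holds elements 1, 3, 5, 7, 9
first = data[0::2]
# and data[1::2] holds elements 2, 4, 6, 8, 10
second = data[1::2]

for name, s in (("Variable 1", first), ("Variable 2", second)):
    # variance() is the sample variance (divisor n - 1), as in Excel
    print(name, s, f"mean = {fmean(s):.7f}", f"variance = {variance(s):.6f}")
```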

Let us examine whether the means of these samples differ significantly. We first do this with
the testing tools in Excel.
Option I. Assume that the position an item occupies in a given sample matters. We therefore
compare the samples pair by pair: the first element of the first sample with the first element
of the second sample, the second element of the first sample with the second element of the
second sample, and so on. So we use the paired test (t-Test: Paired Two Sample for Means).

We get:
t-Test: Paired Two Sample for Means

                               Variable 1     Variable 2
Mean                           0,010762       0,0822094
Variance                       1,888100953    1,866891821
Observations                   5              5
Pearson Correlation            0,278201169
Hypothesized Mean Difference   0
df                             4
t Stat                         -0,09704149
P(T<=t) one-tail               0,463680659
t Critical one-tail            2,131846786
P(T<=t) two-tail               0,927361319
t Critical two-tail            2,776445105

In both cases (one-tailed and two-tailed), the absolute value of the t statistic is smaller than
the critical value, so we cannot reject the hypothesis that the means are equal. Moreover,
paired comparison makes it possible to calculate Pearson's coefficient of correlation between
the two samples. Here it is 0.2782, a weak correlation; with only five pairs it gives no grounds
to regard the correlation as significant. We will return to the significance of this measure.
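A sketch of how the paired t statistic and Pearson's coefficient in the Excel output arise from the data: the paired test is a one-sample t-test on the differences $d_i = x_i - y_i$ with $n - 1 = 4$ degrees of freedom. The critical value is copied from the Excel output rather than computed; the variable names are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev

first = [-0.30023, 0.244257, 1.19835, -2.18359, 1.095023]
second = [-1.27768, 1.276474, 1.733133, -0.23418, -1.0867]

# Paired t-test: a one-sample t-test of mean 0 on the differences
d = [x - y for x, y in zip(first, second)]
n = len(d)
t = fmean(d) / (stdev(d) / sqrt(n))    # df = n - 1 = 4

# Pearson correlation coefficient between the paired observations
mx, my = fmean(first), fmean(second)
sxy = sum((x - mx) * (y - my) for x, y in zip(first, second))
sxx = sum((x - mx) ** 2 for x in first)
syy = sum((y - my) ** 2 for y in second)
r = sxy / sqrt(sxx * syy)

t_crit_two_tail = 2.776445105  # copied from the Excel output above
print(f"t = {t:.5f}, df = {n - 1}, r = {r:.4f}, reject H0: {abs(t) >= t_crit_two_tail}")
```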
Option II. Now let us use a test that does not require paired comparisons. Because we do not
know the variances of the populations from which the samples come, we use the test with less
strict assumptions, in which the variances of these populations may be unequal (t-Test:
Two-Sample Assuming Unequal Variances).

We get:
t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1     Variable 2
Mean                           0,010762       0,0822094
Variance                       1,888100953    1,866891821
Observations                   5              5
Hypothesized Mean Difference   0
df                             8
t Stat                         0,082445485
P(T<=t) one-tail               0,468158987
t Critical one-tail            1,859548038
P(T<=t) two-tail               0,936317975
t Critical two-tail            2,306004135
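The Welch statistic and the degrees of freedom can be recomputed from the summary statistics. This sketch assumes Excel uses the Welch-Satterthwaite approximation for the degrees of freedom and reports it rounded to an integer.

```python
from math import sqrt

# Summary statistics copied from the Excel output above
m1, m2 = 0.010762, 0.0822094
v1, v2 = 1.888100953, 1.866891821
n1, n2 = 5, 5

a, b = v1 / n1, v2 / n2        # squared standard errors of the two means
t = (m1 - m2) / sqrt(a + b)    # Welch t statistic (Variable 1 minus Variable 2)
# Welch-Satterthwaite approximation to the degrees of freedom
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
print(f"t = {t:.6f}, df = {df:.3f}")
```

The sketch gives t ≈ -0.0824 and df ≈ 8.0. The absolute value of t matches Excel's t Stat, and only |t| matters for the two-tailed decision.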
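If it is reasonable to assume that the population variances for these two samples are equal,
we can use the pooled-variance test (t-Test: Two-Sample Assuming Equal Variances). We get
the same results, because with equal sample sizes the pooled and unpooled standard errors of
the difference coincide, and here the degrees of freedom are also the same (8):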
t-Test: Two-Sample Assuming Equal Variances

                               Variable 1     Variable 2
Mean                           0,010762       0,0822094
Variance                       1,888100953    1,866891821
Observations                   5              5
Pooled Variance                1,877496387
Hypothesized Mean Difference   0
df                             8
t Stat                         0,082445485
P(T<=t) one-tail               0,468158987
t Critical one-tail            1,859548038
P(T<=t) two-tail               0,936317975
t Critical two-tail            2,306004135
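The pooled variance and the t statistic in this output follow from the standard pooled-variance formulas. Substituting the values from the table gives the following (the sign of $t$ depends on the order of the variables; its absolute value agrees with Excel's t Stat):

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{4 \cdot 1.888101 + 4 \cdot 1.866892}{8} = 1.877496,$$

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.010762 - 0.082209}{\sqrt{1.877496 \cdot 0.4}} = \frac{-0.071447}{0.866602} \approx -0.0824.$$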

Finally, we compare the samples with a z-test, assuming that the population variances are
known and both equal to 1.8 (z-Test: Two Sample for Means):

z-Test: Two Sample for Means

                               Variable 1     Variable 2
Mean                           0,010762       0,0822094
Known Variance                 1,8            1,8
Observations                   5              5
Hypothesized Mean Difference   0
z                              -0,084201568
P(Z<=z) one-tail               0,466448086
z Critical one-tail            1,644853627
P(Z<=z) two-tail               0,932896171
z Critical two-tail            1,959963985
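Because only the standard normal distribution is needed, the z-test output can be reproduced with the Python standard library alone. This is a sketch; the known variances of 1.8 are the values assumed in the text.

```python
from math import sqrt
from statistics import NormalDist

# Sample means and sizes from the output above; 1.8 is the assumed known variance
m1, m2 = 0.010762, 0.0822094
var1 = var2 = 1.8
n1 = n2 = 5

z = (m1 - m2) / sqrt(var1 / n1 + var2 / n2)
std_normal = NormalDist()               # standard normal N(0, 1)
p_one = std_normal.cdf(-abs(z))         # one-tailed p-value
p_two = 2 * p_one                       # two-tailed p-value
z_crit_one = std_normal.inv_cdf(0.95)   # alpha = 0.05, one tail
z_crit_two = std_normal.inv_cdf(0.975)  # alpha = 0.05, two tails
print(f"z = {z:.6f}, p one-tail = {p_one:.6f}, p two-tail = {p_two:.6f}")
print(f"critical values: {z_crit_one:.6f} (one-tail), {z_crit_two:.6f} (two-tail)")
```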

Note that in all cases there is no reason to reject the hypothesis that the means of the
populations from which these samples come are equal. This bears on the question of the
homogeneity of the study population: the data give no evidence against homogeneity, although
they do not prove it.
For comparison, here is the corresponding SPSS output for the paired test:
Paired Samples Statistics
                    Mean    Std. Deviation   Std. Error Mean
Pair 1   VAR00001   ,0108   1,37408          ,61451
         VAR00003   ,0822   1,36634          ,61105

Paired Samples Correlations
                               N   Correlation   Sig.
Pair 1   VAR00001 & VAR00003   5   ,278          ,650

Paired Samples Test
Pair 1: VAR00001 - VAR00003
  Paired Differences
    Mean                                        -,07145
    Std. Deviation                              1,64632
    Std. Error Mean                             ,73626
    95% Confidence Interval of the Difference
      Lower                                     -2,11562
      Upper                                     1,97273
  t                                             -,097
  df                                            4
  Sig. (2-tailed)                               ,927

The results agree with those from Excel. In addition, SPSS reports the significance of the
coefficient of correlation between the pairs. The correlation coefficient equals 0.278, and we
cannot reject the null hypothesis that the population correlation coefficient is 0 (the p-value
for this result is 0.65). Hence, we have no grounds to claim that the coefficient of correlation
differs from zero.
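The Sig. value for the correlation comes from the usual $t$ test of $H_0\colon \rho = 0$ with $n - 2 = 3$ degrees of freedom. Substituting the SPSS values gives approximately:

$$t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.278\sqrt{3}}{\sqrt{1 - 0.278^2}} \approx \frac{0.4815}{0.9606} \approx 0.50,$$

and the two-sided p-value for $t \approx 0.50$ with 3 degrees of freedom is about 0.65, as reported.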


Finally, let us examine another aspect of the data, the variances: we compare the variances of
the two smaller samples using the F-test (F-Test Two-Sample for Variances).

We get:
F-Test Two-Sample for Variances

                      Variable 1     Variable 2
Mean                  0,010762       0,0822094
Variance              1,888100953    1,866891821
Observations          5              5
df                    4              4
F                     1,011360665
P(F<=f) one-tail      0,495763859
F Critical one-tail   6,388232909

The assumption of equal variances therefore seems justified: we cannot reject the hypothesis
that the population variances for both samples are equal. The calculated value of the F
statistic, 1.01136, is much smaller than the critical value, 6.3882.
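The F statistic is simply the ratio of the sample variances of Variable 1 and Variable 2, compared here with the one-tailed critical value $F_{0.95;\,4,\,4}$ taken from the output:

$$F = \frac{s_1^2}{s_2^2} = \frac{1.888101}{1.866892} \approx 1.0114 < F_{0.95;\,4,\,4} \approx 6.388.$$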
Additional problems.
1. A manufacturer substitutes a different engine in machines that were known to have an
average consumption of 32.5 kWh per day. The manufacturer wants to test whether the
new engine changes the consumption of the machine. A random sample of 100 trial
runs gives $\bar{x} = 29.8$ kWh and $s = 6.6$ kWh. Using a 0.05 level of significance, is the
average daily energy consumption with the new engine different from the average
consumption with the old engine?
2. According to an independent survey, the average appreciation, in percent, on stocks
was 4.2% for the five-year period ending in June 2009. An analyst tests this claim by
looking at a random sample of 50 stocks and finds a sample mean of 4.8% and a
sample standard deviation of 1.1%. Using $\alpha = 0.05$, does the analyst have sufficient
statistical evidence to reject the claim made by this survey?
3. New companies that create computer programs believe that the average age of staff at
these companies is 25. To test this using a two-tailed test, a random sample is
collected:
41, 18, 25, 36, 27, 35, 24, 30, 28, 19, 22, 22, 26, 23, 24, 31, 22, 22, 23, 26, 27, 26, 29,
28, 23, 19, 18, 18, 23, 24, 23, 25, 24, 22, 20, 21, 21, 21, 21, 32, 23, 21, 20.
Test using $\alpha = 0.05$. Then:
a) test the hypothesis that the average age of staff is greater than 25;
b) test the hypothesis that the average age of staff is smaller than 25.
4. According to an independent survey, the average house owner stays in a property for 6
years. Suppose that a random sample of 120 house owners gives the following results:

Conduct a two-tailed hypothesis test using $\alpha = 0.05$ and state your conclusion. What is
your p-value?
5. The p-value obtained in a hypothesis test for a population mean is 0.1. Select the most
precise statement about what it implies. Explain why the other statements are not
precise, or are false.
a) If H0 is rejected based on the evidence that has been obtained, the probability of
a type I error would be 10%.
b) We can be 90% confident that H0 is false.
c) There is at most a 10% chance of obtaining evidence that is even more unfavorable
to H0 when H0 is actually true.
d) If $\alpha = 1\%$, H0 will not be rejected and there will be a 10% chance of a type II error.
e) If $\alpha = 0.05$, H0 will not be rejected and no error will be committed.
f) If $\alpha = 10\%$, H0 will be rejected and there will be a 10% chance of a type I error.

7. Explain the difference between the p-value and the significance level $\alpha$.
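For problems with a large sample described only by summary statistics (such as Problems 1 and 2), one might check an answer with a large-sample z-test, using the sample standard deviation $s$ in place of $\sigma$. The function below is a hypothetical helper, and the numbers in the usage line are made up for illustration; they are not the data of any problem.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_z_test(xbar, mu0, s, n, alpha=0.05):
    """Two-sided large-sample test of H0: mu = mu0 from summary statistics.

    For large n, z = (xbar - mu0) / (s / sqrt(n)) is approximately N(0, 1)
    under H0, so the p-value is taken from the standard normal distribution.
    """
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Rejecting when p <= alpha is equivalent to |z| >= the critical value
    return z, p_value, p_value <= alpha


# Made-up numbers, not taken from any problem above
z, p, reject = large_sample_z_test(xbar=10.5, mu0=10.0, s=2.0, n=64)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```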