
STAT171

Statistical Data Analysis


(2015)
Topic 11
Sample size and Power

J&B
Chapter 8
Sections 2.1, 4.3, 5.3

1. What is the idea and use of power
2. The influence of sample size
3. Practical versus statistical significance
4. Solving for sample size
5. Power of a z-test
   one-sided tests
   two-sided tests
6. Example of an application to proportions

Most common question asked: what sample size should be used?

Two questions must be asked when determining a required sample size:
1) What is the minimum difference you'd be interested in finding?
For example, the population average used to be 0.75, and we'd be interested if we knew it was 0.73 or smaller, but not anything more (anything above 0.73 will not be actioned, such as by manufacturing a new drug).
2) How sure do you need to be that you've found such a difference? We usually want this to be fairly high, such as 95% or 99%.

The tool we need is power, and this should be borne in mind whenever doing hypothesis testing!

The Orion Nebula
[Figure: the Orion Nebula "With the naked eye" and "With a fantastic telescope"]
1300 light years away, 24 light years across. To the naked eye it is a dot, but with a good (powerful) telescope much can be distinguished.
You are using a statistical telescope when running a hypothesis test, so you need to know how strong it is.

Recall from Topic 6 that there are two types of error which can occur when testing a hypothesis:
Type I error: this occurs when H0 is true, and we wrongly declare it to be false.
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true})$$
Type II error: this occurs when H0 is false, and we wrongly declare it to be true.
$$\beta = P(\text{Type II error}) = P(\text{Retain } H_0 \mid H_0 \text{ false})$$
→ Power is the probability of NOT making a Type II error.
That is, power is the probability that we reject H0 when H0 is false: the conditional probability that we make the right decision when there truly is something different from the null happening.
$$\text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ false}) = 1 - \beta$$

We would prefer power to be 100%, but due to inherent variability, that can only happen if we always reject, no matter what (even when H0 is true).
→ This would mean our Type I error rate ($\alpha$) was also 100%.

We need to be able to evaluate power in a given situation to see how we can increase the probability of finding a true difference without also increasing the probability of finding a false difference.
That is, for a fixed Type I error rate, how can we improve the power?
We will start by investigating how the hypothesis test itself is affected by things we can control, such as sample size and significance level.

Sample Size for a z-test
Returning to the paint example (from Topic 7):
H0: $\mu = 75$
H1: $\mu < 75$
$\bar{x} = 71.5$ minutes, $\sigma = 9$, $n = 25$
$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{25}} = -1.94$$
p-value = $P(Z \le -1.94) = 0.0262$
Reject H0 at the 5% level.
We concluded that there was evidence that the new additive reduced the average drying time of the paint.
What would have happened if we had obtained the same $\bar{x}$ of 71.5, but from a different sample size?
Would it affect the conclusions?

e.g. For n = 5:
$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{5}} = -0.870$$
p-value = $P(Z \le -0.870) \approx 0.1922$
Don't reject H0.
Now (with n = 5 rather than n = 25) we cannot conclude that the additive has a significant effect on the average drying time.

e.g. For n = 10,000:
$$z_{obs} = \frac{71.5 - 75}{9/\sqrt{10000}} = -38.89$$
p-value = $P(Z \le -38.89) \approx 0.00$
Reject H0 at any conventional significance level.
Now we are very confident that the additive decreases the average drying time of the paint.
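The effect of n on the same observed mean can be checked numerically. The sketch below is a minimal illustration in Python (standard library only), assuming σ = 9 is known as in the paint example; the function name `one_sided_lower_z_test` is our own, not something from the notes or from any package.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution, N(0, 1)


def one_sided_lower_z_test(xbar, mu0, sigma, n):
    """z statistic and p-value for H0: mu = mu0 vs H1: mu < mu0, sigma known."""
    z_obs = (xbar - mu0) / (sigma / sqrt(n))
    p_value = Z.cdf(z_obs)  # lower-tail p-value, P(Z <= z_obs)
    return z_obs, p_value


# Same observed mean (71.5 minutes), three different sample sizes
for n in (5, 25, 10_000):
    z_obs, p = one_sided_lower_z_test(71.5, 75, 9, n)
    decision = "reject H0" if p <= 0.05 else "retain H0"
    print(f"n = {n:>6}: z_obs = {z_obs:8.3f}, p-value = {p:.4f} -> {decision}")
```

Because the code does not round z_obs to two decimals before finding the tail area, its p-value for n = 25 comes out near 0.0259 rather than the table-based 0.0262; the conclusions are unchanged.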

The sample size can affect your conclusions because the larger the sample size, the more confident we can be about the sample mean as an estimate of $\mu$ (its standard error $\sigma/\sqrt{n}$ shrinks as n grows).
For very small samples, it is often difficult to obtain a significant result even for large observed differences.
For very large samples, you can quite often obtain a statistically significant result even for very small, even trivial, observed differences.
For n = 10,000, what is the minimum difference that we can declare significant at the 5% level of significance (for the paint drying example)?

$$z_{obs} = \frac{\bar{x} - 75}{9/\sqrt{10000}}$$
will be declared significant if $z_{obs}$ is less than (or equal to) -1.645 (recall, H1 is $\mu < 75$).
So the cut-off value of $\bar{x}$ is given by:
$$\bar{x} - 75 \le -1.645 \times \frac{9}{\sqrt{10000}}$$
$$\bar{x} \le 75 - 1.645 \times 0.09 = 75 - 0.148 = 74.85$$
For n = 10,000, any sample mean of 74.85 minutes or less will be declared (statistically) significantly less than 75 at the 5% level of significance.
i.e. a difference of about 9 seconds (0.148 minutes) or more will be declared significant at the 5% level.
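The cut-off calculation works the same way for any n. A minimal sketch, assuming a lower-tailed z-test with known σ; `lower_cutoff_mean` is a hypothetical helper name, and the exact normal quantile is used in place of the rounded 1.645.

```python
from math import sqrt
from statistics import NormalDist


def lower_cutoff_mean(mu0, sigma, n, alpha=0.05):
    """Largest sample mean declared significant for H0: mu = mu0 vs H1: mu < mu0."""
    z_crit = NormalDist().inv_cdf(alpha)    # about -1.645 when alpha = 0.05
    return mu0 + z_crit * sigma / sqrt(n)   # reject H0 when xbar <= this value


cut = lower_cutoff_mean(75, 9, 10_000)
drop = 75 - cut
print(f"reject H0 if xbar <= {cut:.3f} minutes")
print(f"smallest significant decrease: {drop:.3f} minutes = {60 * drop:.1f} seconds")
```

With n = 25 the same function gives a cut-off of about 72.039 minutes, the rejection region used for the power calculations in this topic.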

Practical significance vs Statistical significance
If a result is statistically significant, we are saying that we are reasonably confident that the actual population mean is different from the hypothesized value.
That difference may or may not be of interest or importance to us.
We need to decide what difference is meaningful in terms of our experiment.
For the example:
We need to think in terms of what difference in average drying time is marketable.
For example, we might decide to market the paint only if the decrease in average drying time is at least 2 minutes.

Rule
Prior to the experiment, choose the sample size so that
practical significance = statistical significance
(later we may want to increase the sample size further, for other reasons)

If we observe a difference of two minutes (or more), and it is significant, we will market the paint.
If we observe a difference of less than two minutes, we won't care if it is significant or not, because we won't market the paint anyway.
So, if we observe a difference of 2 minutes or more, we want the test to be significant at the 5% level.
How large does n have to be to be able to declare such a result significant?

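We want practical significance (a difference of 2, i.e. $\bar{x} = 73$) to give statistical significance (p-value ≤ 5%):
$$z_{obs} = \frac{73 - 75}{9/\sqrt{n}} \le -1.645$$
Just a matter of solving for n:
$$\frac{-2\sqrt{n}}{9} \le -1.645 \quad\Rightarrow\quad \sqrt{n} \ge \frac{9 \times 1.645}{2} = 7.4025$$
$$n \ge 7.4025^2 = 54.80$$
Hence $n \ge 55$ (as n is an integer).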

If we have a sample size of 55 (or more) and we observe a decrease in average drying time of at least 2 minutes, we know in advance that the result will be significant at the 5% significance level.
If we have a sample size of less than 55 and we observe a decrease of exactly 2 minutes (or anything less), the result will not be significant at the 5% level.

General result
If we are able to apply a z-test ($\sigma$ is known and X is normal or approximately normal) for testing
H0: $\mu = \mu_0$ versus H1: $\mu < \mu_0$
at the 5% level, we will reject H0 if:
$$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le -1.645$$
where:
$\bar{x} - \mu_0$ is the minimum difference required for action;
$z_{crit}$ (here -1.645) comes from the tolerance (how often we are prepared to be wrong when there is no difference);
$\sigma$ needs to be known (or estimated from past work).

Solution
So, once we have decided on the values of these to be used, we solve the equation to obtain the minimum sample size that will achieve this.
→ Solve for n:
$$\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \le -1.645$$
→ In general, for a one-sided z-test, the minimum sample size needed is:
$$n \ge \left( \frac{z_{crit}\,\sigma}{\bar{x} - \mu_0} \right)^2$$
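The general formula can be wrapped in a small function. A sketch under the notes' assumptions (one-sided z-test, σ known or estimated); the name `min_sample_size_one_sided` is ours, and it uses the exact normal quantile rather than the rounded 1.645.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size_one_sided(diff, sigma, alpha=0.05):
    """Smallest n for which an observed difference of size |diff| is significant.

    One-sided z-test with known (or well-estimated) sigma: n >= (z_crit * sigma / diff)^2.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)  # about 1.645 for alpha = 0.05
    n_exact = (z_crit * sigma / diff) ** 2    # squaring removes the sign of diff
    return n_exact, ceil(n_exact)             # n must be a whole number, so round UP


# Paint example: act on a 2-minute decrease, sigma = 9, 5% level
n_exact, n = min_sample_size_one_sided(diff=-2, sigma=9)
print(f"n >= {n_exact:.2f}, so use n = {n}")
```

This gives about 54.79 rather than 54.80 (the hand calculation rounds z to 1.645); either way the required sample size is 55.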

If we do not know that X is normal, we have to assume (hope) that n will be large enough for the CLT (Central Limit Theorem) to work, so that the distribution of the sample mean will be approximately normal. Check this after the calculations are done.
Note: $\sigma$ will probably be unknown, and an estimate will have to be used instead.
We could use the t-distribution to get the critical value, but this depends on the degrees of freedom, which depend on the sample size n, which is what we are trying to determine.
To resolve the dilemma, use $z_{crit}$, as this will give a rough idea to help plan the experiment.

In designing an experiment
We should plan what sample size would be required to achieve significance before setting up the experiment.
Requirements:
1. An idea of what practical significance is desired, i.e. the minimum difference $\bar{x} - \mu_0$ that would be meaningful.
2. An idea of the likely value of chance variation $\sigma$ (from previous experiments, the literature, etc).
3. Use of the appropriate formula to determine the minimum sample size required to achieve significance.
Warning: We may observe sample statistics that are entirely different from what we are hoping/expecting to observe.

Power of a test
For any test of significance, there are two possible correct outcomes and two incorrect outcomes.
Recall from Topic 6:
                 H0 true          H0 false
H0 retained      correct          Type II ERROR
H0 rejected      Type I ERROR     correct
The errors are the outcomes of making a wrong decision. They have CONDITIONAL probabilities:
$$\alpha = P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true})$$
$$\beta = P(\text{Type II error}) = P(\text{Retain } H_0 \mid H_0 \text{ false})$$
The aim is to minimise $\beta$ for a fixed $\alpha$ (usually 5%).

The probability of making the correct decision when H0 is false is called the POWER of the test.
$$\text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ false}) = 1 - P(\text{Retain } H_0 \mid H_0 \text{ false}) = 1 - \beta$$
So, minimising $\beta$ and maximising power are the same thing.
$\beta$ (and hence power) depends on:
sample size (larger n → higher power)
$\alpha$ (higher $\alpha$ → higher power), BUT this makes the Type I error rate higher
whether the test is one- or two-tailed
the true value of the parameters.
For a fixed sample size, we cannot make $\alpha$ and $\beta$ smaller together, so we choose $\alpha$ as the maximum Type I error rate we are prepared to put up with.

For the paint example
Before carrying out the experiment, we decide we will be doing a one-sided test at the 5% significance level for testing:
H0: $\mu = 75$ versus H1: $\mu < 75$
We know: $\sigma = 9$ and n = 25.
We can determine the rejection region in terms of the value of the sample mean.
When we carry out the test, we know we will reject H0 if $z_{obs} \le -1.645$.
That is, reject if:
$$\frac{\bar{x} - 75}{9/\sqrt{25}} \le -1.645$$
$$\bar{x} - 75 \le -1.645 \times \frac{9}{\sqrt{25}} = -2.961$$
$$\bar{x} \le 72.039$$

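We know that if H0: $\mu = 75$ is true, the sampling distribution of the sample mean is:
$$\bar{X} \sim N\left(75, \frac{9^2}{25}\right) = N(75, 1.8^2)$$
[Figure: sampling distribution of $\bar{X}$ under H0, with the rejection region shaded red]
The red area is 0.05, giving the rejection region in terms of values of the sample mean.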

i.e. If we carry out a one-sided z-test at the 5% level of significance to test H0: $\mu = 75$ (vs <) with a sample size of 25, we will:
reject H0 if $\bar{X} \le 72.039$
not reject H0 if $\bar{X} > 72.039$
So we can calculate the power of the test in advance by calculating the probability that $\bar{X} \le 72.039$ for different values that $\mu$ (the true average drying time) might take.
Note that we have no idea:
what the true value of $\mu$ is, or
what value of $\bar{X}$ will be observed.

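Power of the test if $\mu = 70$
$$\text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ false}) = P(\bar{X} \le 72.039 \mid \mu = 70)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{72.039 - 70}{9/\sqrt{25}} \;\middle|\; \mu = 70\right) = P\left(Z \le \frac{2.039}{1.8}\right) = P(Z \le 1.133) \approx 0.871$$
We can drop the conditioning here, as Z has a standard normal distribution (if we have conditioned on the correct value for $\mu$).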

That is, if the true mean is 70 and we carry out a one-tailed test of H0: $\mu = 75$ at the 5% level of significance:
if H0 were true, there would be a 5% chance that we would reject it when we shouldn't (we have set this rate);
there is an 87% chance that we would reject H0 when we should, i.e. make the right decision;
→ BUT there is a 13% chance that we would retain H0, i.e. make a Type II error.

Sample mean distributions
We can plot the distribution of the sample mean:
under H0: $\mu = 75$,
or when the true $\mu = 70$ (or any other specific value).
The test statistic will be significant if $\bar{X}$ is less than (or equal to) 72.039.
[Figure: densities of $\bar{X}$ under $\mu = 75$ and $\mu = 70$, with the region "Reject H0" ($\bar{x} \le 72.039$) marked; shaded areas 5% (under H0) and 87% (power)]
What would happen to the power (area under the black density curve to the left of 72.04) if:
the true mean is less than 70?
the true mean is greater than 70 (but still less than 75)?

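Other values of $\mu$:
When $\mu = 71$:
$$\text{Power} = P(\bar{X} \le 72.039 \mid \mu = 71) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{72.039 - 71}{9/\sqrt{25}} \;\middle|\; \mu = 71\right) = P(Z \le 0.577) \approx 0.718$$
If $\mu$ is 71, there is a 72% chance of concluding it is less than 75.
When $\mu = 73$:
$$\text{Power} = P(\bar{X} \le 72.039 \mid \mu = 73) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{72.039 - 73}{9/\sqrt{25}} \;\middle|\; \mu = 73\right) = P(Z \le -0.534) \approx 0.297$$
If $\mu$ is 73, there is only a 30% chance of (correctly) concluding it is less than 75.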

When $\mu = 65$:
$$\text{Power} = P(\bar{X} \le 72.039 \mid \mu = 65) = P\left(Z \le \frac{72.039 - 65}{9/\sqrt{25}}\right) = P(Z \le 3.911) \approx 1.0000$$
(close enough)
When $\mu = 74.9$:
$$\text{Power} = P(\bar{X} \le 72.039 \mid \mu = 74.9) = P\left(Z \le \frac{72.039 - 74.9}{9/\sqrt{25}}\right) = P(Z \le -1.589) \approx 0.056$$
Power does not exist when H0 is true, that is when $\mu = 75$ here, but the power approaches 0.05 (the significance level) as $\mu$ approaches 75.
Recall,
$$P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} \le 72.039 \mid \mu = 75) = P(Z \le -1.645) = 0.05$$

Plotting Power versus possible values of $\mu$
  μ       μ - 75    Power
  65      -10       1.000
  70      -5        0.871
  71      -4        0.718
  73      -2        0.297
  74.9    -0.1      0.056
and the plot of the points (the power curve would join them).
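The whole table can be generated from one function: find the cut-off for $\bar{x}$ under H0, then compute the probability of falling at or below it under each candidate true mean. A sketch assuming the paint example settings (σ = 9, n = 25, α = 0.05); `power_lower_z_test` is a hypothetical name, and the exact quantile replaces 1.645.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_lower_z_test(mu_true, mu0, sigma, n, alpha=0.05):
    """P(reject H0) for H0: mu = mu0 vs H1: mu < mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    cutoff = mu0 + Z.inv_cdf(alpha) * se    # reject H0 when xbar <= cutoff
    return Z.cdf((cutoff - mu_true) / se)   # P(Xbar <= cutoff | mu = mu_true)


print("   mu  mu-75   power")
for mu in (65, 70, 71, 73, 74.9):
    print(f"{mu:5} {mu - 75:6.1f}   {power_lower_z_test(mu, 75, 9, 25):.3f}")
```

Setting `mu_true = 75` returns 0.05, the Type I error rate, which is why the power curve approaches $\alpha$ as $\mu$ approaches $\mu_0$.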

Using Minitab:
Stat > Power and Sample Size > 1-Sample Z...

Power curve
Plot of Power versus difference = $\mu - \mu_0$
Power increases as the true $\mu$ gets further away from the hypothesised mean $\mu_0$ → we are more likely to pick up a bigger true difference in population means than a smaller difference.

Power curves, varying n
Power increases as the sample size gets larger → we are more likely to pick up a difference with a larger sample size than with a smaller sample size.

Power curve when $\mu = 73$
We can calculate the power for various sample sizes for a particular $\mu$.
We decided to market the paint if there was a difference of 2 minutes or more.
So we can calculate the sample size required to achieve a power of 20%, 30%, etc., assuming that the true mean is 2 minutes less (i.e. true $\mu = 73$).

From the Session window

Power and Sample Size

1-Sample Z Test

Testing mean = null (versus < null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 9

      Sample  Target
Diff    Size   Power  Actual Power
  -2       3    0.10      0.103843
  -2      14    0.20      0.208002
  -2      26    0.30      0.304417
  -2      40    0.40      0.405399
  -2      55    0.50      0.501273
  -2      73    0.60      0.600180
  -2      96    0.70      0.702800
  -2     126    0.80      0.802222
  -2     174    0.90      0.900859
  -2     220    0.95      0.950655
  -2     320    0.99      0.990107

With a sample size of 96, we get a power of 70.28%.
If n were 95 (or less), the power would be less than the pre-specified value of 70%.
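The search for the smallest n that reaches each target power can be imitated with a simple loop. A sketch, assuming the same one-sided z-test formula that the Session window reports (difference -2, σ = 9, α = 0.05); `power_lower` and `smallest_n` are hypothetical helpers, and a linear search is used for clarity rather than speed.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_lower(n, diff=-2.0, sigma=9.0, alpha=0.05):
    """Power of the one-sided (H1: mu < mu0) z-test when the true mean is mu0 + diff."""
    # Reject when Z <= z_alpha; under the true mean, Z is shifted by diff * sqrt(n) / sigma
    return Z.cdf(Z.inv_cdf(alpha) - diff * sqrt(n) / sigma)


def smallest_n(target, **kwargs):
    """Smallest whole-number n whose power reaches the target (linear search)."""
    n = 1
    while power_lower(n, **kwargs) < target:
        n += 1
    return n


for target in (0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99):
    n = smallest_n(target)
    print(f"target {target:.2f}: n = {n:3d}, actual power = {power_lower(n):.6f}")
```

The `**kwargs` pass-through lets the same search be reused for other differences, standard deviations or significance levels.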

From the Session window
To get the graph, copy the Size and Actual Power values from the Session window output above into columns in the worksheet, then do Graph > Scatterplot > With Connect Line.

Different Example:
z-test power curve for different $\alpha$
[Figure: power $(1 - \beta)$ against the true value of $\mu$ (118 to 128) for H0: $\mu = 120$ vs Ha: $\mu > 120$, with curves for $\alpha = 0.10$, 0.05 and 0.01. Note: this chart was NOT created in Minitab]
As the significance level $\alpha$ is decreased, the power decreases → the problem with making the Type I error rate low is that we are also making the power low: there is LESS chance of picking up a true difference.

z-test power curve for different n
[Figure: power $(1 - \beta)$ against the true value of $\mu$ (118 to 128) for H0: $\mu = 120$ vs Ha: $\mu > 120$, $\sigma = 10$, with curves for n = 45, 90, 180 and 360. Note: this chart was NOT created in Minitab]
Power increases as n increases.
Here, power does not exist for values of $\mu \le 120$; what is actually plotted on the vertical axis for those values is P(Reject H0), not power.

Power for two-tailed z-test
Refer to Tutorial Week 8, question 2, in which we looked at the amount of paracetamol in tablets marketed as having exactly 500 mg.
It is known that the amount of paracetamol in each tablet follows a normal distribution with a population standard deviation of $\sigma = 4$.
We would be interested in a two-tailed test, as a difference in either direction would be a problem. That is,
H0: $\mu = 500$
H1: $\mu \ne 500$
At a 5% significance level, we would reject H0 for $z_{obs} \le -1.96$ or $z_{obs} \ge +1.96$.

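Power for two-tailed tests (cont ...)
We now need to convert this rejection region for $z_{obs}$ into one involving the sampling distribution of the sample average; we will first deal with a sample size of 25.
When we carry out the test, we will reject H0 if $z_{obs} \le -1.96$ or $z_{obs} \ge 1.96$:
$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \le -1.96 \;\Rightarrow\; \frac{\bar{X} - 500}{4/\sqrt{25}} \le -1.96 \;\Rightarrow\; \bar{X} \le 500 - 1.96 \times \frac{4}{\sqrt{25}} = 498.432$$
OR
$$\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \ge +1.96 \;\Rightarrow\; \frac{\bar{X} - 500}{4/\sqrt{25}} \ge +1.96 \;\Rightarrow\; \bar{X} \ge 500 + 1.96 \times \frac{4}{\sqrt{25}} = 501.568$$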

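Obtaining the power for $\mu = 499$ (say)
$$\text{Power} = P(\bar{X} \le 498.432 \mid \mu = 499) + P(\bar{X} \ge 501.568 \mid \mu = 499)$$
$$= P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le \frac{498.432 - 499}{4/\sqrt{25}} \;\middle|\; \mu = 499\right) + P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge \frac{501.568 - 499}{4/\sqrt{25}} \;\middle|\; \mu = 499\right)$$
$$= P(Z \le -0.71) + P(Z \ge +3.21) \approx 0.2389 + 0.0007 \approx 0.240$$
[Figure: sampling distribution of $\bar{X}$ when $\mu = 499$, with both rejection regions marked "Reject H0"]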
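The two-sided calculation simply adds the two tail probabilities. A minimal sketch assuming the paracetamol settings (μ0 = 500, σ = 4, n = 25, α = 0.05); `two_sided_power` is a hypothetical name, and it uses the exact 0.975 quantile (about 1.95996) instead of 1.96.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def two_sided_power(mu_true, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0; returns (lower cut-off, upper cut-off, power)."""
    se = sigma / sqrt(n)
    z_crit = Z.inv_cdf(1 - alpha / 2)              # about 1.96 for alpha = 0.05
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    p_lower = Z.cdf((lower - mu_true) / se)        # P(Xbar <= lower | mu_true)
    p_upper = 1 - Z.cdf((upper - mu_true) / se)    # P(Xbar >= upper | mu_true)
    return lower, upper, p_lower + p_upper


lower, upper, power = two_sided_power(499, 500, 4, 25)
print(f"reject H0 if xbar <= {lower:.3f} or xbar >= {upper:.3f}; power = {power:.4f}")
```

The upper tail, about 0.0007, is small but not exactly zero, so the total comes out at about 0.2395.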

Using Minitab
Specify a two-tailed alternative under Options.
The power curve is symmetric around a difference of zero, and approaches 1 the further the true mean is from $\mu_0$ (= 500 here).

Varying the sample size n
The power curve for n = 50 is above that for n = 25.
The power curve for n = 10 is below that for n = 25.
We can increase the power by increasing the sample size n (keeping $\alpha$, $\sigma$ and $\mu$ the same).
n is under our control.

Varying the sig. level
The power curve for $\alpha = 0.10$ is above that for $\alpha = 0.05$.
The power curve for $\alpha = 0.01$ is below that for $\alpha = 0.05$.
We can increase the power by increasing the significance level $\alpha$ (keeping n, $\sigma$ and $\mu$ the same).
$\alpha$ is under our control.
We have also seen that power increases as the difference between the true mean $\mu$ and the hypothesised mean $\mu_0$ increases, but the true value of $\mu$ is NOT under our control!
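The two comparisons (varying n, varying $\alpha$) can be reproduced numerically at a single true mean. A sketch assuming the paracetamol settings and a true mean of 499 mg; `two_sided_power` is a hypothetical helper, written here in terms of the true difference measured in standard-error units.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def two_sided_power(mu_true, mu0=500, sigma=4, n=25, alpha=0.05):
    """Power of the two-sided z-test of H0: mu = mu0 (defaults: paracetamol example)."""
    se = sigma / sqrt(n)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    shift = (mu_true - mu0) / se   # true difference in standard-error units
    # P(Z <= -z_crit - shift) + P(Z >= z_crit - shift)
    return Z.cdf(-z_crit - shift) + (1 - Z.cdf(z_crit - shift))


# Effect of the sample size (alpha fixed at 0.05), true mean 499
for n in (10, 25, 50):
    print(f"n = {n:2d}, alpha = 0.05: power = {two_sided_power(499, n=n):.3f}")

# Effect of the significance level (n fixed at 25), true mean 499
for alpha in (0.01, 0.05, 0.10):
    print(f"n = 25, alpha = {alpha:.2f}: power = {two_sided_power(499, alpha=alpha):.3f}")
```

At `mu_true = 500` the function returns exactly $\alpha$, matching the height of each curve at a difference of zero.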

A practical example for proportions
KidzTV wishes to conduct a survey as it is concerned it is losing viewers.
In the last poll, 30% of viewers were watching KidzTV. The station would be really worried if the proportion fell to less than 25%.
What sample size is needed in a phone survey if they are to pick up such a difference, with a probability of wrongly doing so of at most 5%?
...
We have to assume that n is large enough for the CLT (i.e. that the normal approximation to the binomial is appropriate), and that the current proportion is 30%.

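Our hypotheses are:
H0: $\pi = 0.30$ versus H1: $\pi < 0.30$
We want $P(\text{Reject } H_0 \mid H_0 \text{ true}) \le 0.05$
→ Significance level $\alpha = 0.05$
We will reject H0 if:
$$\frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} \le -1.645$$
Setting the observed proportion to $p = 0.25$:
$$\frac{0.25 - 0.3}{\sqrt{0.3 \times 0.7/n}} \le -1.645 \quad\Rightarrow\quad \frac{-0.05\sqrt{n}}{\sqrt{0.3 \times 0.7}} \le -1.645$$
$$\sqrt{n} \ge \frac{1.645\sqrt{0.3 \times 0.7}}{0.05} = 15.07667$$
$$n \ge 15.07667^2 = 227.306$$
so n = 228.
There is no continuity correction (cc) of 1/(2n) here, as we are trying to find n ... and the cc would overly complicate matters.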
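The same steps give the survey size directly. A sketch assuming the normal approximation with the null standard error and no continuity correction, as in the notes; `min_n_proportion_one_sided` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_n_proportion_one_sided(pi0, p_action, alpha=0.05):
    """Smallest n so that observing p_action is significant for H0: pi = pi0 vs H1: pi < pi0.

    Normal approximation with the null standard error sqrt(pi0 (1 - pi0) / n),
    no continuity correction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)                       # about 1.645
    root_n = z_crit * sqrt(pi0 * (1 - pi0)) / abs(p_action - pi0)  # bound on sqrt(n)
    return root_n ** 2, ceil(root_n ** 2)                          # round UP to a whole n


n_exact, n = min_n_proportion_one_sided(0.30, 0.25)
print(f"n >= {n_exact:.3f}, so survey n = {n} viewers")
```

With the exact quantile (about 1.6449) the bound is about 227.27 rather than 227.306; both round up to n = 228.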

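Finding the power of this test:
For an n of 228, we can check the CLT assumptions:
under H0: $\pi = 0.3$: $n\pi = 228 \times 0.3 = 68.4 > 15$ and $n(1-\pi) = 228 \times 0.7 = 159.6 > 15$
under H1: $\pi = 0.25$: $n\pi = 228 \times 0.25 = 57 > 15$ and $n(1-\pi) = 228 \times 0.75 = 171 > 15$
We know that
$$P \overset{\text{approx}}{\sim} N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$$
Hence, P has the following approximate distributions:
under H0: $\pi = 0.3$,
$$P \overset{\text{approx}}{\sim} N\left(0.3, \frac{0.3 \times 0.7}{228}\right) = N(0.3, 0.03034885^2)$$
under H1: $\pi = 0.25$,
$$P \overset{\text{approx}}{\sim} N\left(0.25, \frac{0.25 \times 0.75}{228}\right) = N(0.25, 0.028677^2)$$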

The rejection region in terms of the sample proportion P was determined at the start: reject H0 if $P \le 0.25$.
If the true proportion watching KidzTV was 0.25, the power would be 50% (as seen in the following diagram of the sampling distributions for P: the area under the red curve to the left of the blue line):
[Figure: sampling distributions of P under $\pi = 0.3$ and $\pi = 0.25$; Power = area to the left of the blue line at P = 0.25]

Power if the true $\pi = 0.23$
With n = 228, P is distributed approximately as
$$P \overset{\text{approx}}{\sim} N\left(0.23, \frac{0.23 \times 0.77}{228}\right) = N(0.23, 0.02787031^2)$$
CLT check: $n\pi = 228 \times 0.23 = 52.44 > 15$ and $n(1-\pi) = 228 \times 0.77 = 175.56 > 15$
The power is then the area under this approximate normal distribution to the left of P = 0.25.
[Figure: Power = area to the left of the blue line at P = 0.25]

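The power is then:
$$\text{Power} = P(\text{Reject } H_0 \mid H_1 \text{ true}) = P(P \le 0.25 \mid \pi = 0.23)$$
$$= P\left(\frac{P - \pi}{\sqrt{\pi(1-\pi)/n}} \le \frac{0.25 - 0.23}{\sqrt{0.23 \times 0.77/228}} \;\middle|\; \pi = 0.23\right) = P(Z \le 0.718) \approx 0.764$$
So, there is a 76% chance of picking up a true lower viewer proportion of 23% when this test is applied.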
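Power at other true proportions follows the same recipe. The sketch below (hypothetical helper `power_proportion_lower`, normal approximation, no continuity correction) computes the power both with the rounded cut-off of 0.25 used above and with the exact $\alpha$-level cut-off $0.3 - z_{0.95}\sqrt{0.3 \times 0.7/228} \approx 0.2501$.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_proportion_lower(pi_true, pi0, n, cutoff=None, alpha=0.05):
    """Approximate P(reject H0) for H0: pi = pi0 vs H1: pi < pi0.

    If cutoff is None, the exact alpha-level cut-off pi0 - z * sqrt(pi0 (1 - pi0) / n)
    is used; otherwise H0 is rejected when the sample proportion is <= cutoff.
    """
    if cutoff is None:
        cutoff = pi0 + Z.inv_cdf(alpha) * sqrt(pi0 * (1 - pi0) / n)
    se_true = sqrt(pi_true * (1 - pi_true) / n)  # standard error under the true proportion
    return Z.cdf((cutoff - pi_true) / se_true)


for pi_true in (0.25, 0.23):
    rounded = power_proportion_lower(pi_true, 0.30, 228, cutoff=0.25)
    exact = power_proportion_lower(pi_true, 0.30, 228)
    print(f"pi = {pi_true}: cut-off 0.25 -> {rounded:.4f}, exact cut-off -> {exact:.4f}")
```

With the cut-off at exactly 0.25 the powers are 0.5 and about 0.7635; the exact cut-off gives about 0.5011 and 0.7644, which match the Minitab Session output, so Minitab appears to use the exact cut-off.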

And checking with Minitab

Power and Sample Size

Test for One Proportion

Testing p = 0.3 (versus < 0.3)
Alpha = 0.05

Comparison p   Sample Size   Power
0.25           228           0.501121
0.23           228           0.764392

So what do we tell KidzTV about the survey?
1. A sample size of 228 (we'd probably say 250) will ensure that the probability of falsely rejecting a true viewing proportion of 0.3 is only ($\alpha$ =) 5%.
2. The probability of falsely retaining a viewing proportion of 30% when it is really as little as 23% is ($\beta$ =) 0.24.
3. The probability of correctly rejecting a viewing proportion of 30% when it is really 23% is 0.76 (the power).
KidzTV will probably not be happy with (2) and (3).
To overcome this, ask for a larger sample size, as this will lower $\beta$ and increase the power.

Power of a test: Conclusions
1. The larger the significance level, $\alpha$, the higher the power of the test (but the higher the Type I error rate).
This is our decision, but it does have a cost.
2. The larger the sample size, the higher the power of the test.
This is our decision, but it does have a cost.
3. The larger the difference between the hypothesized value and the true value of the population parameter (mean or proportion), the higher the power.
We have no control over this!!
Often the true value of the parameter is not known, and we calculate the power of the test for a number of possible true values of the parameter under study.
From this we can sketch a power curve.
