
Description

Computes the required sample size using the optimal designs with multiple constraints proposed in Mayo et al. (2010). This optimal method is designed for two-arm, randomized phase II clinical trials, and the required sample size can be optimized using either fixed or flexible randomization allocation ratios.

Sample.Size(0.3, 0.6, 0.15, 0.15, 0.15, Allratio_c = 1, Allratio_e = 3)

Specified values for parameters:

Response rates:
  control = 0.3   experiment = 0.6

Upper bounds for constraints:
  gammaC = 0.15   gammaE = 0.15   gammaDelta = 0.15

Required sample sizes:

[1] Optimal Design:
  nc = 20   ne = 20   n = 40
[2] 1 to 1 Allocation Design:
  nc = 20   ne = 20   n = 40
[3] 1 to 3 Allocation Design:
  nc = 13   ne = 39   n = 52

Arguments

pi_c         Response rate for control
pi_e         Response rate for experiment
gamma_c      Upper bound for gammaC
gamma_e      Upper bound for gammaE
gamma_delta  Upper bound for gammaDelta
Allratio_c   Allocation ratio for the control arm
Allratio_e   Allocation ratio for the experimental arm

Value

Prints the required sample sizes for the optimal design, the 1 to 1 allocation design, and the 1 to 3 allocation design.

Description

Computes the sample size for the independent and paired Student's t-test, the Student's t-test with Welch approximation, and the Wilcoxon-Mann-Whitney test with and without ties on ordinal data.

Usage

n.ttest(power = 0.8, alpha = 0.05, mean.diff = 0.8, sd1 = 0.83, sd2 = 2.65, k = 1, design = "unpaired", fraction = "balanced", variance = "equal")

Arguments

power      Power (1 - Type-II error)
alpha      Two-sided Type-I error
mean.diff  Expected mean difference
sd1        Standard deviation in group 1
sd2        Standard deviation in group 2
k          Sample fraction k
design     Type of design. May be "paired" or "unpaired"
fraction   Type of fraction. May be "balanced" or "unbalanced"
variance   Type of variance. May be homo- or heterogeneous

Value

Total sample size    Sample size for both groups together
Sample size group 1  Sample size in group 1
Sample size group 2  Sample size in group 2

n.ttest(power = 0.8, alpha = 0.05, mean.diff = 0.80, sd1 = 0.83, k = 1, design = "unpaired", fraction = "balanced", variance = "equal")

$`Total sample size`
[1] 36

$`Sample size group 1`
[1] 18

$`Sample size group 2`
[1] 18

n.ttest(power = 0.8, alpha = 0.05, mean.diff = 0.80, sd1 = 0.83, sd2 = 2.65, k = 0.7, design = "unpaired", fraction = "unbalanced", variance = "equal")

$`Total sample size`
[1] 38

$`Sample size group 1`
[1] 22

$`Sample size group 2`
[1] 16

$Fraction
[1] 0.7

n.wilcox.ord(power = 0.8, alpha = 0.05, t = 0.53, p = c(0.66, 0.15, 0.19), q = c(0.61, 0.23, 0.16))

$`total sample size`
[1] 8390

$m
[1] 3943

$n
[1] 4447

Description

Calculates the number of deaths required for Cox proportional hazards regression with two covariates for epidemiological studies. The covariate of interest should be a binary variable. The other covariate can be either binary or non-binary. The formula takes into account competing risks and the correlation between the two covariates. Some parameters will be estimated from a pilot data set.
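The package's own function is not reproduced in this excerpt. As a rough sketch of the underlying idea, and assuming the Hsieh and Lavori (2000) formula (without the competing-risks adjustment) for a binary exposure of prevalence p, log hazard ratio log(theta), and squared multiple correlation rho2 between the covariates, the required number of deaths could be computed as follows; the helper name n.deaths is hypothetical, not this package's API:

# Hypothetical helper (not the package's API): Hsieh & Lavori (2000)
# required number of deaths for a Cox model with a binary covariate
n.deaths <- function(logHR, p, rho2 = 0, alpha = 0.05, power = 0.80){
  z.a <- qnorm(1 - alpha/2)  # two-sided critical value
  z.b <- qnorm(power)
  ceiling((z.a + z.b)^2 / (logHR^2 * p * (1 - p) * (1 - rho2)))
}
n.deaths(logHR = log(2), p = 0.5, rho2 = 0.25)  # about 88 deaths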

Description

Support is provided for sample size estimation, power, testing, confidence intervals and simulation for fixed sample size trials (that is, not group sequential or adaptive) with two arms and binary outcomes. Both superiority and non-inferiority trials are considered. While all routines default to comparisons of risk-difference, options to base computations on risk-ratio and odds-ratio are also included.
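The package's own routine names are not shown in this excerpt. As a generic illustration only (using base R's power.prop.test, an assumption rather than this package's API), a fixed-sample superiority comparison of two response rates on the risk-difference scale might be sized like this:

power.prop.test(p1 = 0.65, p2 = 0.80, sig.level = 0.05, power = 0.80)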

POWER AND SAMPLE SIZE DETERMINATION

Shawn Balcome July 22, 2015

Table

                   H0 is True          H0 is False
Reject H0          Type I error        Correct decision
Do not reject H0   Correct decision    Type II error

Need to consider:

  • Confidence Level
  • Confidence Interval
  • Population
  • Sample Size

Another set of related quantities:

  • sample size
  • effect size
  • significance level = P(Type I error) = probability of finding an effect that is not there
  • power = 1 - P(Type II error) = probability of finding an effect that is there

pwr

install.packages("pwr")

library("pwr")

pwr Table

function          power calculations for
pwr.2p.test       two proportions (equal n)
pwr.2p2n.test     two proportions (unequal n)
pwr.anova.test    balanced one-way ANOVA
pwr.chisq.test    chi-square test
pwr.f2.test       general linear model
pwr.p.test        proportion (one sample)
pwr.r.test        correlation
pwr.t.test        t-tests (one sample, 2 sample, paired)
pwr.t2n.test      t-test (two samples with unequal n)

Alternative R Packages

pwr Demo

library(pwr)

For a one-way ANOVA comparing 5 groups, calculate the sample size needed in each group to obtain a power of 0.80, when the effect size is moderate (0.25) and a significance level of 0.05 is employed.

pwr.anova.test(k=5,f=.25,sig.level=.05,power=.8)

Balanced one-way analysis of variance power calculation

              k = 5
              n = 39.1534
              f = 0.25
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

What is the power of a one-tailed t-test, with a significance level of 0.01, 25 people in each group, and an effect size equal to 0.75?

pwr.t.test(n=25,d=0.75,sig.level=.01,alternative="greater")

Two-sample t test power calculation

              n = 25
              d = 0.75
      sig.level = 0.01
          power = 0.5988572
    alternative = greater

NOTE: n is number in each group

Using a two-tailed test of proportions, and assuming a significance level of 0.01 and a common sample size of 30 for each proportion, what effect size can be detected with a power of .75?

pwr.2p.test(n=30,sig.level=0.01,power=0.75)

 

Difference of proportion power calculation for binomial distribution (arcsine transformation)

              h = 0.8392269
              n = 30
      sig.level = 0.01
          power = 0.75
    alternative = two.sided

NOTE: same sample sizes

POWER OF A TEST AND SAMPLE SIZE IN COMPLEX SURVEYS

Hugo Andrés Gutiérrez Rojas

2015

There are many approaches to computing sample size. In public policy evaluation, for example, one is usually tempted to check whether there is statistical evidence of the impact of an intervention on a population of interest. This vignette is devoted to explaining the issues that you commonly find when computing sample sizes.

The power function

Note that the definition of power is tied to a hypothesis-testing process. For example, if you are interested in testing whether a difference of proportions is statistically significant, then your null hypothesis may look as follows:

\[ H_0: P_1 - P_2 = 0 \quad \text{vs.} \quad H_a: P_1 - P_2 = D > 0 \]

Where D, known as the null effect, is any value greater than zero. First, notice that this kind of test induces a power function, defined as the probability of rejecting the null hypothesis. Second, note that we should estimate P1 and P2 by using unbiased sampling estimators (e.g. Horvitz-Thompson, Hansen-Hurwitz, calibration estimators, etc.), say \( \hat{P}_1 \) and \( \hat{P}_2 \), respectively. Third, in general, in the complex sample set-up, we can define the variance of \( \hat{P}_1 - \hat{P}_2 \) as

\[ Var(\hat{P}_1 - \hat{P}_2) = \frac{DEFF}{n}\left(1 - \frac{n}{N}\right)(P_1 Q_1 + P_2 Q_2) \]

Where DEFF is defined to be the design effect, which captures the inflation of variance due to the complex sampling design. Usually the power function is denoted by \( \beta_D \):

\[ \beta_D = \Pr\left( \frac{\hat{P}_1 - \hat{P}_2}{\sqrt{\frac{DEFF}{n}\left(1 - \frac{n}{N}\right)(P_1 Q_1 + P_2 Q_2)}} > Z_{1-\alpha} \;\middle|\; P_1 - P_2 = D \right) = 1 - \Phi\left( Z_{1-\alpha} - \frac{D}{\sqrt{\frac{DEFF}{n}\left(1 - \frac{n}{N}\right)(P_1 Q_1 + P_2 Q_2)}} \right) \]

After some algebra, we find that the minimum sample size to detect a null effect D is

\[ n \ge \frac{DEFF\,(P_1 Q_1 + P_2 Q_2)}{\dfrac{D^2}{(Z_{1-\alpha} + Z_{\beta_D})^2} + \dfrac{DEFF\,(P_1 Q_1 + P_2 Q_2)}{N}} \]
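As a minimal sketch, this formula can be transcribed directly into R (this is not the samplesize4surveys implementation; the function name n.dp is illustrative):

# minimum sample size to detect a null effect D in a complex survey;
# a direct transcription of the formula above
n.dp <- function(N, P1, P2, D, DEFF = 1, conf = 0.95, power = 0.80){
  S   <- DEFF * (P1 * (1 - P1) + P2 * (1 - P2))  # DEFF * (P1Q1 + P2Q2)
  z.a <- qnorm(conf)    # Z_{1-alpha}
  z.b <- qnorm(power)   # Z_{beta_D}
  ceiling(S / (D^2 / (z.a + z.b)^2 + S / N))
}
n.dp(N = 1000, P1 = 0.5, P2 = 0.5, D = 0.03, DEFF = 2)  # 873, matching ss4dpH below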

Some comments

  • 1. As D > 0, the sample size n may take different values. Therefore, you should define the value of D before drawing any sample.

  • 2. The power function, i.e. the probability of rejecting the null hypothesis, depends on n. As n increases, \( \beta_D \) also increases. Thereby, you should define the value of \( \beta_D \) before drawing any sample.

  • 3. The variance of \( \hat{P}_1 - \hat{P}_2 \) must be defined before performing the test. If you know nothing about the phenomenon of interest, you usually suppose this variance to take the greatest value it can, which occurs when P1 = P2 = 0.5. This way, we have that:

\[ Var(\hat{P}_1 - \hat{P}_2) = \frac{DEFF}{2n}\left(1 - \frac{n}{N}\right) \]

  • 4. Taking the previous points into account, the sample size reduces to:

\[ n \ge \frac{DEFF}{\dfrac{2D^2}{(Z_{1-\alpha} + Z_{\beta_D})^2} + \dfrac{DEFF}{N}} \]

  • 5. Recall that the assumption P1 = P2 = 0.5 is not an assumption on the point estimates, but on the variance. You cannot assume the point estimates themselves: if you already knew them, why would you draw a sample, and why bother doing statistical analysis at all? If you already know the answer, you do not need to carry out an expensive study.

The ss4dpH function

The ss4dpH function may be used to plot a graphic that gives an idea of how the definition of D affects the sample size. For example, suppose that we draw a sample according to a complex design, such that DEFF = 2, for a finite population of N = 1000 units. This way, if we define the null effect to be D = 3%, then we have to draw a sample of at least n = 873 for the probability of rejecting the null hypothesis to be 80% (the default power), with a confidence of 95% (the default confidence). Notice that as the null effect increases, the sample size decreases.

ss4dpH(N = 1000, P1 = 0.5, P2 = 0.5, D = 0.03, DEFF = 2, plot = TRUE)

## [1] 873

 

The b4dp function

The b4dp function may be used to plot a figure that gives an idea of how the definition of the sample size n affects the power of the test. For example, suppose that we draw a sample according to a complex design, such that DEFF = 2, for a finite population of N = 1000 units, with a sample size of n = 873, a null effect of D = 3%, and a confidence of 95%; then the power of the test is β = 80.02%. Notice that as the sample size decreases, the power also decreases.

b4dp(N = 1000, n = 873, P1 = 0.5, P2 = 0.5, D = 0.03, DEFF = 2, plot = TRUE)

## With the parameters of this function: N = 1000 n = 873 P1 = 0.5 P2 = 0.5 D = 0.03 DEFF = 2 conf = 0.95 .
## The estimated power of the test is 80.02283 .
##
## $Power
## [1] 80.02283

 

Conclusions

You may have been fooled by people telling you that you do not need a large sample size. Sample size is an issue to which you have to pay close attention: the conclusions of your study could be misleading if you draw a sample of insufficient size. For example, from the last figure, one may conclude that with a sample size close to 600, the power of the test is as low as 30%. That is simply unacceptable in social research.

R DATA ANALYSIS EXAMPLES

POWER ANALYSIS FOR TWO-GROUP INDEPENDENT SAMPLE T-TEST

Examples

Example 1. A clinical dietician wants to compare two different diets, A and B, for diabetic patients. She hypothesizes that diet A (Group 1) will be better than diet B (Group 2) in terms of lower blood glucose. She plans to get a random sample of diabetic patients and randomly assign them to one of the two diets. At the end of the experiment, which lasts 6 weeks, a fasting blood glucose test will be conducted on each patient. She also expects that the average difference in blood glucose measure between the two groups will be about 10 mg/dl. Furthermore, she also assumes the standard deviation of blood glucose distribution for diet A to be 15 and the standard deviation for diet B to be 17. The dietician wants to know the number of subjects needed in each group assuming equal sized groups.

Example 2. An audiologist wanted to study the effect of gender on the response time to a certain sound frequency. He suspected that men were better at detecting this type of sound than were women. He took a random sample of 20 male and 20 female subjects for this experiment. Each subject was given a button to press when he/she heard the sound. The audiologist then measured the response time, i.e. the time between when the sound was emitted and when the button was pressed. Now, he wants to know what the statistical power is, based on his total of 40 subjects, to detect the gender difference.

Prelude to The Power Analysis

There are two different aspects of power analysis. One is to calculate the necessary sample size for a specified power as in Example 1. The other aspect is to calculate the power when given a specific sample size as in Example 2. Technically, power is the probability of rejecting the null hypothesis when the specific alternative hypothesis is true. For the power analyses below, we are going to focus on Example 1, calculating the sample size for a given statistical power of testing the difference in the effect of diet A and diet B. Notice the assumptions that the dietician has made in order to perform the power analysis. Here is the information we have to know or have to assume in order to perform the power analysis:

  • The expected difference in the average blood glucose; in this case it is set to 10.
  • The standard deviations of blood glucose for Group 1 and Group 2; in this case, they are set to 15 and 17 respectively.
  • The alpha level, or the Type I error rate, which is the probability of rejecting the null hypothesis when it is actually true. A common practice is to set it at the .05 level.
  • The pre-specified level of statistical power for calculating the sample size; this will be set to .8.
  • The pre-specified number of subjects for calculating the statistical power; this is the situation for Example 2.

Notice that in the first example, the dietician didn't specify the mean for each group; instead she only specified the difference of the two means. This is because she is only interested in the difference, and it does not matter what the means are as long as the difference is the same.

Power Analysis

In R, it is fairly straightforward to perform power analysis for comparing means. For example, we can use the pwr package in R for our calculation as shown below. We first specify the two means, the mean for Group 1 (diet A) and the mean for Group 2 (diet B). Since what really matters is the difference, instead of means for each group, we can enter a mean of zero for Group 1 and 10 for the mean of Group 2, so that the difference in means will be 10. Next, we need to specify the pooled standard deviation, which is the square root of the average of the two variances. In this case, it is sqrt((15^2 + 17^2)/2) = 16.03. The default significance level (alpha level) is .05. For this example we will set the power to be at .8.
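As a quick check of that pooled standard deviation arithmetic:

# pooled SD as the square root of the average of the two variances
sqrt((15^2 + 17^2)/2)
[1] 16.03122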

library(pwr)
pwr.t.test(d=(0-10)/16.03, power=.8, sig.level=.05, type="two.sample", alternative="two.sided")

Two-sample t test power calculation

              n = 41.31968
              d = 0.6238303
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

The calculation results indicate that we need 42 subjects for diet A and another 42 subjects for diet B in our sample in order to detect the specified effect. Now, let's use another pair of means with the same difference. As we have discussed earlier, the results should be the same, and they are.

pwr.t.test(d=(5-15)/16.03, power=.8, sig.level=.05, type="two.sample", alternative="two.sided")

Two-sample t test power calculation

              n = 41.31968
              d = 0.6238303
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Now the dietician may feel that a total sample size of 84 subjects is beyond her budget. One way of reducing the sample size is to increase the Type I error rate, or the alpha level. Let's say instead of using an alpha level of .05 we will use .07. Then our sample size will be reduced by about 4 for each group, as shown below.

pwr.t.test(d=(5-15)/16.03, power=.8, sig.level=.07, type="two.sample", alternative="two.sided")

Two-sample t test power calculation

              n = 37.02896
              d = 0.6238303
      sig.level = 0.07
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Now suppose the dietician can only collect data on 60 subjects with 30 in each group. What will the statistical power for her t-test be with respect to alpha level of .05?

pwr.t.test(d=(5-15)/16.03, n=30, sig.level=.05, type="two.sample", alternative="two.sided")

Two-sample t test power calculation

              n = 30
              d = 0.6238303
      sig.level = 0.05
          power = 0.6612888
    alternative = two.sided

NOTE: n is number in *each* group

As we have discussed before, what really matters in the calculation of power or sample size is the difference of the means over the pooled standard deviation. This is a measure of effect size. Let's now look at how the effect size affects the sample size, for a given power. We can simply fix the difference in means, set the standard deviation to 1, and create a table with effect size, d, varying from .2 to 1.2.

ptab <- cbind(NULL, NULL)  # initialize ptab
for (i in c(.2, .3, .4, .5, .6, .7, .8, .9, 1, 1.1, 1.2)){
  pwrt <- pwr.t.test(d=i, power=.8, sig.level=.05,
                     type="two.sample", alternative="two.sided")
  ptab <- rbind(ptab, cbind(pwrt$d, pwrt$n))
}
ptab

 

       [,1]      [,2]
 [1,]   0.2 393.40570
 [2,]   0.3 175.38467
 [3,]   0.4  99.08032
 [4,]   0.5  63.76561
 [5,]   0.6  44.58579
 [6,]   0.7  33.02458
 [7,]   0.8  25.52457
 [8,]   0.9  20.38633
 [9,]   1.0  16.71473
[10,]   1.1  14.00190
[11,]   1.2  11.94226

We can also easily display this information in a plot.

plot(ptab[,1], ptab[,2], type="b", xlab="effect size", ylab="sample size")

It shows that if the effect size is small, such as .2, then we need a very large sample size, and that the sample size drops as the effect size increases. We can also easily plot power versus sample size for a given effect size, say, d = 0.7.

pwrt <- pwr.t.test(d=.7, n=c(10,20,30,40,50,60,70,80,90,100),
                   sig.level=.05, type="two.sample", alternative="two.sided")
plot(pwrt$n, pwrt$power, type="b", xlab="sample size", ylab="power")


Discussion

An important technical assumption is the normality assumption. If the distribution is skewed, then a small sample size may not have the power shown in the results, because the value in the results is calculated using the method based on the normality assumption. We have seen that in order to compute the power or the sample size, we have to make a number of assumptions. These assumptions are used not only for the purpose of calculation, but are also used in the actual t-test itself. So one important side benefit of performing power analysis is to help us to better understand our designs and our hypotheses.

We have seen in the power calculation process that what matters in the two-independent-sample t-test is the difference in the means and the standard deviations for the two groups. This leads to the concept of effect size. In this case, the effect size will be the difference in means over the pooled standard deviation. The larger the effect size, the larger the power for a given sample size. Or, the larger the effect size, the smaller the sample size needed to achieve the same power. So, a good estimate of effect size is the key to a good power analysis. But it is not always an easy task to determine the effect size. Good estimates of effect size come from the existing literature or from pilot studies.

References

D. Moore and G. McCabe, Introduction to the Practice of Statistics, Third Edition, Section 6.4

POWER ANALYSIS

Overview

Power analysis is an important aspect of experimental design. It allows us to determine the sample size required to detect an effect of a given size with a given degree of confidence. Conversely, it allows us to determine the probability of detecting an effect of a given size with a given level of confidence, under sample size constraints. If the probability is unacceptably low, we would be wise to alter or abandon the experiment.

The following four quantities have an intimate relationship:

  • 1. sample size

  • 2. effect size

  • 3. significance level = P(Type I error) = probability of finding an effect that is not there

  • 4. power = 1 - P(Type II error) = probability of finding an effect that is there

Given any three, we can determine the fourth.

Power Analysis in R

The pwr package, developed by Stéphane Champely, implements power analysis as outlined by Cohen (1988). Some of the more important functions are listed below.

function          power calculations for
pwr.2p.test       two proportions (equal n)
pwr.2p2n.test     two proportions (unequal n)
pwr.anova.test    balanced one way ANOVA
pwr.chisq.test    chi-square test
pwr.f2.test       general linear model
pwr.p.test        proportion (one sample)
pwr.r.test        correlation
pwr.t.test        t-tests (one sample, 2 sample, paired)
pwr.t2n.test      t-test (two samples with unequal n)

For each of these functions, you enter three of the four quantities (effect size, sample size, significance level, power) and the fourth is calculated.

The significance level defaults to 0.05. Therefore, to calculate the significance level, given an effect size, sample size, and power, use the option "sig.level=NULL".
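For instance, a call along these lines (values chosen only for illustration) returns the significance level implied by the given n, d, and power:

# solve for the significance level by passing it as NULL
pwr.t.test(n = 50, d = 0.5, sig.level = NULL, power = 0.8, type = "two.sample")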

Specifying an effect size can be a daunting task. ES formulas and Cohen's suggestions (based on social science research) are provided below. Cohen's suggestions should only be seen as very rough guidelines. Your own subject matter experience should be brought to bear.

t-tests

For t-tests, use the following functions:

pwr.t.test(n = , d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired"))

where n is the sample size, d is the effect size, and type indicates a two-sample t-test, one-sample t-test or paired t-test. If you have unequal sample sizes, use

pwr.t2n.test(n1 = , n2= , d = , sig.level =, power = )

where n1 and n2 are the sample sizes. For t-tests, the effect size is assessed as

\[ d = \frac{|\mu_1 - \mu_2|}{\sigma} \]

Cohen suggests that d values of 0.2, 0.5, and 0.8 represent small, medium, and large effect sizes respectively.

You can specify alternative="two.sided", "less", or "greater" to indicate a two-tailed, or one-tailed test. A two tailed test is the default.

ANOVA

For a one-way analysis of variance use

pwr.anova.test(k = , n = , f = , sig.level = , power = )

where k is the number of groups and n is the common sample size in each group. For a one-way ANOVA, effect size is measured by f, where

\[ f = \sqrt{\frac{\sum_{i=1}^{k} p_i\,(\mu_i - \bar{\mu})^2}{\sigma^2}}, \qquad p_i = n_i/N
\]

Cohen suggests that f values of 0.1, 0.25, and 0.4 represent small, medium, and large effect sizes respectively.

Correlations

For correlation coefficients use

pwr.r.test(n = , r = , sig.level = , power = )

where n is the sample size and r is the correlation. We use the population correlation coefficient as the effect size measure. Cohen suggests that r values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes respectively.

Linear Models

For linear models (e.g., multiple regression) use

pwr.f2.test(u =, v = , f2 = , sig.level = , power = )

where u and v are the numerator and denominator degrees of freedom. We use f2 as the effect size measure.

\[ f^2 = \frac{R^2}{1 - R^2} \]

\[ f^2 = \frac{R^2_{AB} - R^2_{A}}{1 - R^2_{AB}} \]

The first formula is appropriate when we are evaluating the impact of a set of predictors on an outcome. The second formula is appropriate when we are evaluating the impact of one set of predictors above and beyond a second set of predictors (or covariates). Cohen suggests f2 values of 0.02, 0.15, and 0.35 represent small, medium, and large effect sizes.
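As a small illustration (the numbers are chosen only as an example), sizing a multiple regression with three predictors at a medium effect:

# multiple regression, 3 predictors, medium effect (f2 = 0.15):
# solve for the error degrees of freedom v
pwr.f2.test(u = 3, f2 = 0.15, sig.level = 0.05, power = 0.80)
# v comes out near 73, so n = u + v + 1, roughly 77 subjects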

Tests of Proportions

When comparing two proportions use

pwr.2p.test(h = , n = , sig.level =, power = )

where h is the effect size and n is the common sample size in each group.

\[ h = 2\arcsin\left(\sqrt{p_1}\right) - 2\arcsin\left(\sqrt{p_2}\right) \]

Cohen suggests that h values of 0.2, 0.5, and 0.8 represent small, medium, and large effect sizes respectively.
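The pwr package's ES.h() helper computes this transformation; for example (proportions chosen only for illustration):

# arcsine-transformed effect size for p1 = 0.65 vs p2 = 0.45
h <- ES.h(0.65, 0.45)                                # about 0.40
pwr.2p.test(h = h, sig.level = 0.05, power = 0.80)   # n per group, roughly 48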

For unequal n's use

pwr.2p2n.test(h = , n1 = , n2 = , sig.level = , power = )

To test a single proportion use

pwr.p.test(h = , n = , sig.level = , power = )

For both two sample and one sample proportion tests, you can specify alternative="two.sided", "less", or "greater" to indicate a two-tailed, or one-tailed test. A two tailed test is the default.

Chi-square Tests

For chi-square tests use

pwr.chisq.test(w =, N = , df = , sig.level =, power = )

where w is the effect size, N is the total sample size, and df is the degrees of freedom. The effect size w is defined as

\[ w = \sqrt{\sum_{i=1}^{m} \frac{(p_{0i} - p_{1i})^2}{p_{0i}}} \]

where \( p_{0i} \) and \( p_{1i} \) are the cell probabilities under the null and alternative hypotheses, respectively.

Cohen suggests that w values of 0.1, 0.3, and 0.5 represent small, medium, and large effect sizes respectively.
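For example (a hypothetical 2 x 3 table, so df = (2-1)(3-1) = 2), solving for the total sample size at a medium effect:

# total N for a chi-square test with a medium effect size
pwr.chisq.test(w = 0.3, df = 2, sig.level = 0.05, power = 0.80)
# N comes out near 107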

Some Examples

library(pwr)

# For a one-way ANOVA comparing 5 groups, calculate the
# sample size needed in each group to obtain a power of
# 0.80, when the effect size is moderate (0.25) and a
# significance level of 0.05 is employed.
pwr.anova.test(k=5, f=.25, sig.level=.05, power=.8)

# What is the power of a one-tailed t-test, with a
# significance level of 0.01, 25 people in each group,
# and an effect size equal to 0.75?
pwr.t.test(n=25, d=0.75, sig.level=.01, alternative="greater")

# Using a two-tailed test of proportions, and assuming a
# significance level of 0.01 and a common sample size of
# 30 for each proportion, what effect size can be detected
# with a power of .75?
pwr.2p.test(n=30, sig.level=0.01, power=0.75)

Creating Power or Sample Size Plots

The functions in the pwr package can be used to generate power and sample size graphs.

# Plot sample size curves for detecting correlations of
# various sizes.
library(pwr)

# range of correlations
r <- seq(.1, .5, .01)
nr <- length(r)

# power values
p <- seq(.4, .9, .1)
np <- length(p)

# obtain sample sizes
samsize <- array(numeric(nr*np), dim=c(nr, np))
for (i in 1:np){
  for (j in 1:nr){
    result <- pwr.r.test(n = NULL, r = r[j],
                         sig.level = .05, power = p[i],
                         alternative = "two.sided")
    samsize[j,i] <- ceiling(result$n)
  }
}

# set up graph
xrange <- range(r)
yrange <- round(range(samsize))
colors <- rainbow(length(p))
plot(xrange, yrange, type="n",
     xlab="Correlation Coefficient (r)",
     ylab="Sample Size (n)")

# add power curves
for (i in 1:np){
  lines(r, samsize[,i], type="l", lwd=2, col=colors[i])
}

# add annotation (grid lines, title, legend)
abline(v=0, h=seq(0, yrange[2], 50), lty=2, col="grey89")
abline(h=0, v=seq(xrange[1], xrange[2], .02), lty=2, col="grey89")
title("Sample Size Estimation for Correlation Studies\n Sig=0.05 (Two-tailed)")
legend("topright", title="Power", as.character(p), fill=colors)

col="grey89") title("Sample Size Estimation for Correlation Studies\n Sig=0.05 (Two-tailed)") legend("topright", title="Power", as.character(p), fill=colors)

DESIGN OF EXPERIMENTS – POWER CALCULATIONS

November 18, 2009

By Ralph

Prior to conducting an experiment researchers will often undertake power calculations to determine the sample size required in their work to detect a meaningful scientific effect with sufficient power. In R there are functions to calculate either a minimum sample size for a specific power for a test or the power of a test for a fixed sample size.

When undertaking sample size or power calculations for a prospective trial or experiment we need to consider various factors. There are two main probabilities of interest that are tied up with calculating a minimum sample size or the power of a specific test, and these are:

Type I Error: The probability that the test rejects the null hypothesis, H_0, given that the null hypothesis is actually true. This quantity is often referred to as alpha.

Type II Error: The probability that the test fails to reject the null hypothesis, H_0, given that the null hypothesis is not true. This quantity is often referred to as beta.

A decision needs to be made about what difference between the two groups being compared should be considered as corresponding to a meaningful difference. This difference is usually denoted by delta.

The base installation of R has functions for calculating power or sample sizes, which include power.t.test, power.prop.test and power.anova.test for various common scenarios.

Consider a scenario where we might be buying batteries for a GPS device and the average battery life that we want to have is 400 minutes. If we decide that the performance is not acceptable when the average is more than 10 minutes (delta) lower than this (i.e. below 390 minutes), then we can calculate the number of batteries to test:

power.t.test(delta = 10, sd = 6, power = 0.95, type = "one.sample", alternative = "one.sided")

For this example we have assumed a standard deviation of 6 minutes for batteries (which would either be assumed or estimated from previous data) and that we want a power of 95% in the test. Power is defined as 1 - beta, where beta is the Type II error probability. The default for this function is a 5% Type I error probability (alpha). The test will involve only one group, so we are considering a one-sample t test, and only a one-sided alternative is relevant as we do not mind if the batteries perform better than required.

The output from this function call is as follows:

One-sample t test power calculation

              n = 5.584552
          delta = 10
             sd = 6
      sig.level = 0.05
          power = 0.95
    alternative = one.sided

So we would need to test at least 6 batteries to obtain the required power in the test based on the other parameters that have been used.
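As a quick sanity check, plugging n = 6 back in and solving for power instead should give a value just above the requested 0.95:

# power achieved with 6 batteries under the same assumptions
power.t.test(n = 6, delta = 10, sd = 6, type = "one.sample",
             alternative = "one.sided")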

POWER CALCULATIONS RELATIONSHIP BETWEEN TEST POWER, EFFECT SIZE AND SAMPLE SIZE

January 17, 2013

By dgrapov

I was interested in modeling the relationship between power and sample size, while holding the significance level constant (alpha = 0.05), for the common two-sample t-test. Luckily R has great support for power analysis, and I found the function I was looking for in the package pwr.

To calculate the power for the two-sample T-test at different effect and sample sizes I needed to wrap the basic function power.t.test().


# Need pwr package
if(!require(pwr)){install.packages("pwr"); library("pwr")}

# t-TEST
#---------------------------------
d <- seq(.1, 2, by=.1)  # effect sizes
n <- 1:150              # sample sizes

t.test.power.effect <- as.data.frame(do.call("cbind", lapply(1:length(d), function(i){
  sapply(1:length(n), function(j){
    power.t.test(n=n[j], d=d[i], sig.level=0.05, power=NULL,
                 type="two.sample")$power
  })
})))

# some powers couldn't be calculated; set these to zero
t.test.power.effect[is.na(t.test.power.effect)] <- 0
colnames(t.test.power.effect) <- paste(d, "effect size")

The object t.test.power.effect is a 150 x 20 data frame which lists the power for sample sizes from 1 to 150 and effect sizes from 0.1 to 2 in steps of 0.1. While this is useful as a look-up table, we would optimally like to see a visualization of it. Here is some example code to plot this data using the base and ggplot2 packages.

#plot results using base
#------------------------------------------------
obj <- t.test.power.effect  # object to plot
cols <- 1:ncol(obj)
color <- rainbow(length(cols), alpha=.5)  # colors
lwd <- 5  # line thickness

# highlight important effect sizes
imp <- c(2, 5, 8)                      # columns for d = 0.2, 0.5, 0.8
cuts <- c("small", "medium", "large")  # based on Cohen 1988
color[imp] <- "black"
lty <- rep(1, length(color))
lty[imp] <- 2:(length(imp)+1)
wording <- d
wording[imp] <- cuts

par(fig=c(0, .8, 0, 1), new=TRUE)

# initialize plot
plot(1, type="n", frame.plot=FALSE, xlab="sample size", ylab="power",
     xlim=c(1, 150), ylim=c(0, 1), main="t-Test", axes=FALSE)

# add custom axis and grid
abline(v=seq(0, 150, by=10), col="lightgray", lty="dotted")
abline(h=seq(0, 1, by=.05), col="lightgray", lty="dotted")
axis(1, seq(0, 150, by=10))
axis(2, seq(0, 1, by=.05))

# plot lines
for(i in 1:length(cols)){
  lines(1:150, obj[, cols[i]], col=color[i], lwd=lwd, lty=lty[i])
}

# legend
par(fig=c(.65, 1, 0, 1), new=TRUE)
plot.new()
legend("top", legend=wording, col=color, lwd=3, lty=lty,
       title="Effect Size", bty="n")

Which makes the following graph.


Based on this graph, we can see the relationship between power, effect size and sample number. I've marked the cutoffs suggested by Cohen (1988) delineating small, medium and large effect sizes. Based on this we can see that if we are designing an experiment and are trying to select a sample size for which our test will be powered at 0.8, we need to consider the expected effect of our experimental treatment. If we think that our treatment should have a moderate effect, we should consider somewhere around 60 samples per group (see the direct calculation below). However, an even better analysis would be to directly calculate the sample number needed to achieve some power and significance level given experimentally derived effect sizes based on preliminary data!
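For example, solving directly for n at a medium effect (delta = 0.5 with sd = 1, i.e. d = 0.5):

# n per group for 80% power at a medium effect
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8,
             type = "two.sample")
# n is about 64 per group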

And just for kicks here is the same data plotted using ggplot2.


#plot using ggplot2
#------------------------------------------------
#plot results using ggplot2
library(ggplot2); library(reshape)

x11()  # graphic device on windows
obj <- cbind(size=1:150, t.test.power.effect)  # flip object for melting
melted <- cbind(melt(obj, id="size"), effect=rep(d, each=150))  # melt and bind with effect for mapping

ggplot(data=melted, aes(x=size, y=value, color=as.factor(effect))) +
  geom_line(size=2, alpha=.5) +
  ylab("power") + xlab("sample size") +
  ggtitle("t-Test") + theme_minimal()

# wow ggplot2 is amazing in its brevity
# need to tweak legend and lty, but otherwise very similar


A little tweaking and these graphs are basically the same. Wow I really need to stop using base for my plots and fully embrace learning ggplot2!

SAMPLE SIZE CALCULATIONS EQUIVALENT TO STATA FUNCTIONS

June 25, 2013

Hi everyone, I'm trying out R knitr for my blog posts now; let me know what you think! Recently, I was looking for sample size calculations in R and found that R really doesn't have good built-in functions for sample size and power. For example, for finding the sample size necessary to find a difference in two proportions, one can use the power.prop.test() function and get the following output:

power.prop.test(n = NULL, 0.1, 0.25, 0.05, 0.9, alternative = "two.sided")

##
##      Two-sample comparison of proportions power calculation
##
##              n = 132.8
##             p1 = 0.1
##             p2 = 0.25
##      sig.level = 0.05
##          power = 0.9
##    alternative = two.sided
##
## NOTE: n is number in *each* group

But there are very few options in this function; for example, there is no option for a continuity correction for the normal approximation to binary data, you can't compare one sample against null and alternative values instead of two sample proportions, and you can't change the ratio of the sizes of the two samples. Even more importantly, there's no way to find the effective sample size when data are clustered. There are other functions in R, but I found them complicated and not user friendly. The sampsi and sampclus functions in Stata are intuitive and useful, and have all of those important options. I decided to build my own R functions that had the same capabilities, and I've posted them here for others to use and comment on. Eventually I would like to make an R package that includes all of these functions for ease of use.

The first function, sampsi.prop(), calculates sample size for one and two sample tests of proportions. The required n for both are as follows:

Two-sample test of equality of proportions, with continuity correction:

\[ n_1 = \frac{n'}{4}\left(1+\left\{1+\frac{2(r+1)}{n'r|p_1-p_2|}\right\}^{1/2}\right)^2 \quad \text{and} \quad n_2 = rn_1, \]

where

\[ n' = \frac{\left(z_{1-\alpha/2}\left\{(r+1)\bar{p}\bar{q}\right\}^{1/2}+z_{1-\beta}(rp_1q_1+p_2q_2)^{1/2}\right)^2}{r(p_1-p_2)^2} \]

with \( \bar{p} = (p_1+rp_2)/(r+1) \) and \( \bar{q}=1-\bar{p} \). Without the continuity correction, \( n_1=n' \) and \( n_2=rn_1 \).

One-sample test of proportions, where the null is \( p=p_0 \) and the alternative is \( p=p_A \):

\[ n = \left(\frac{z_{1-\alpha/2}\left\{p_0(1-p_0)\right\}^{1/2} + z_{1-\beta}\left\{p_A(1- p_A)\right\}^{1/2}}{p_A-p_0}\right)^2 \]

The function sampsi.prop() takes arguments p1 and p2, which are either the null and alternative proportions, respectively, for the one-sample test, or the two proportions for the two-sample test. These arguments are required for the function to run. The rest of the arguments have preset default values, so they do not need to be specified. They include the ratio of the smaller group to the larger group (default 1), power (default 0.90), significance level (default \( \alpha=.05 \)), a logical argument for whether to include the continuity correction (default is TRUE), whether to perform a two-sided or one-sided test (default is two-sided), and whether to perform the two-sample or one-sample test (default is the two-sample test). The output is a list object.

sampsi.prop <- function(p1, p2, ratio=1, power=.90, alpha=.05, cont.corr=TRUE,
                        two.sided=TRUE, one.sample=FALSE){
  effect.size <- abs(p2-p1)
  avg.p <- (p1+ratio*p2)/(ratio+1)
  sd <- ifelse(one.sample==FALSE,
               sqrt(ratio*p1*(1-p1)+p2*(1-p2)),
               sqrt(p2*(1-p2)))
  z.pow <- qt(1-power, df=Inf, lower.tail=FALSE)
  z.alph <- ifelse(two.sided==TRUE, qt(alpha/2, df=Inf, lower.tail=FALSE),
                   qt(alpha, df=Inf, lower.tail=FALSE))
  ct <- (z.pow+z.alph)
  n1 <- (z.alph*sqrt((ratio+1)*avg.p*(1-avg.p))+z.pow*sd)^2/(effect.size^2*ratio)
  n1.cont <- ifelse(cont.corr==FALSE, n1,
                    (n1/4)*(1+sqrt(1+(2*(ratio+1))/(n1*ratio*effect.size)))^2)
  n <- (((z.alph*sqrt(p1*(1-p1)))+z.pow*sd)/effect.size)^2
  if(one.sample==FALSE){
    col1 <- c("alpha", "power", "p1", "p2", "effect size", "n2/n1", "n1", "n2")
    col2 <- c(alpha, power, p1, p2, effect.size, ratio,
              ceiling(n1.cont), ceiling(n1.cont*ratio))
  }
  else{
    col1 <- c("alpha", "power", "p", "alternative", "n")
    col2 <- c(alpha, power, p1, p2, ceiling(n))
  }
  ret <- as.data.frame(cbind(col1, col2))
  ret$col2 <- as.numeric(as.character(ret$col2))
  colnames(ret) <- c("Assumptions", "Value")
  description <- paste(ifelse(one.sample==FALSE, "Two-sample", "One-sample"),
                       ifelse(two.sided==TRUE, "two-sided", "one-sided"),
                       "test of proportions",
                       ifelse(cont.corr==FALSE, "without", "with"),
                       "continuity correction")
  retlist <- list(description, ret)
  return(retlist)
}

Now we can do the following sample size calculations that match up perfectly to the corresponding commands in Stata:

sampsi.prop(0.1, 0.25)

## [[1]]
## [1] "Two-sample two-sided test of proportions with continuity correction"
##
## [[2]]
##   Assumptions  Value
## 1       alpha   0.05
## 2       power   0.90
## 3          p1   0.10
## 4          p2   0.25
## 5 effect size   0.15
## 6       n2/n1   1.00
## 7          n1 146.00
## 8          n2 146.00

Notice that the results from the calculation above are corrected for continuity, so do not match up with the power.prop.test output from above. To get those results, specify cont.corr=FALSE like so:

sampsi.prop(0.1, 0.25, cont.corr = FALSE)

## [[1]]
## [1] "Two-sample two-sided test of proportions without continuity correction"
##
## [[2]]
##   Assumptions  Value
## 1       alpha   0.05
## 2       power   0.90
## 3          p1   0.10
## 4          p2   0.25
## 5 effect size   0.15
## 6       n2/n1   1.00
## 7          n1 133.00
## 8          n2 133.00

Here are a couple more examples with some of the other defaults changed:

sampsi.prop(0.5, 0.55, power = 0.8, one.sample = TRUE)

## [[1]]
## [1] "One-sample two-sided test of proportions with continuity correction"
##
## [[2]]
##   Assumptions  Value
## 1       alpha   0.05
## 2       power   0.80
## 3           p   0.50
## 4 alternative   0.55
## 5           n 783.00

sampsi.prop(0.5, 0.55, ratio = 2, two.sided = FALSE)

## [[1]]
## [1] "Two-sample one-sided test of proportions with continuity correction"
##
## [[2]]
##   Assumptions   Value
## 1       alpha    0.05
## 2       power    0.90
## 3          p1    0.50
## 4          p2    0.55
## 5 effect size    0.05
## 6       n2/n1    2.00
## 7          n1 1310.00
## 8          n2 2619.00

Next we can calculate sample size for a comparison of means. The required n for the one sample and two sample tests are as follows:

One-sample test of mean, where the null hypothesis is \( \mu=\mu_0 \) and the alternative is \( \mu=\mu_A \):

\[ n = \left\{\frac{(z_{1-\alpha/2} + z_{1-\beta})\sigma}{\mu_A-\mu_0}\right\}^2 \]

Two-sample test of equality of means:

\[ n_1 = \frac{(\sigma_1^2+\sigma_2^2/r)(z_{1-\alpha/2}+z_{1-\beta})^2}{(\mu_1-\mu_2)^2} \quad \text{and} \quad n_2=rn_1 \]

The sampsi.means() function again works the same way as in Stata, where you must input as arguments either a null mean and alternative mean for a one-sample test, or two means for a two-sample test. At least one standard deviation is also required. It is possible to have different standard deviations for the two-sample test, but if only the first standard deviation value is assigned and a two-sample test is designated, then the function assumes both standard deviations to be the same. The other arguments and their defaults are similar to the sampsi.prop() function from above.

sampsi.means <- function(m1, m2, sd1, sd2=NA, ratio=1, power=.90, alpha=.05,
                         two.sided=TRUE, one.sample=FALSE){
  effect.size <- abs(m2-m1)
  sd2 <- ifelse(!is.na(sd2), sd2, sd1)
  z.pow <- qt(1-power, df=Inf, lower.tail=FALSE)
  z.alph <- ifelse(two.sided==TRUE, qt(alpha/2, df=Inf, lower.tail=FALSE),
                   qt(alpha, df=Inf, lower.tail=FALSE))
  ct <- (z.pow+z.alph)
  n1 <- (sd1^2+(sd2^2)/ratio)*(ct)^2/(effect.size^2)
  n <- (ct*sd1/effect.size)^2
  if(one.sample==FALSE){
    col1 <- c("alpha", "power", "m1", "m2", "sd1", "sd2", "effect size",
              "n2/n1", "n1", "n2")
    col2 <- c(alpha, power, m1, m2, sd1, sd2, effect.size, ratio,
              ceiling(n1), ceiling(n1*ratio))
  }
  else{
    col1 <- c("alpha", "power", "null", "alternative", "n")
    col2 <- c(alpha, power, m1, m2, ceiling(n))
  }
  ret <- as.data.frame(cbind(col1, col2))
  ret$col2 <- as.numeric(as.character(ret$col2))
  colnames(ret) <- c("Assumptions", "Value")
  description <- paste(ifelse(one.sample==FALSE, "Two-sample", "One-sample"),
                       ifelse(two.sided==TRUE, "two-sided", "one-sided"),
                       "test of means")
  retlist <- list(description, ret)
  return(retlist)
}

And here are the examples:

sampsi.means(0, 10, sd1 = 15, power = 0.8)

## [[1]]
## [1] "Two-sample two-sided test of means"
##
## [[2]]
##    Assumptions Value
## 1        alpha  0.05
## 2        power  0.80
## 3           m1  0.00
## 4           m2 10.00
## 5          sd1 15.00
## 6          sd2 15.00
## 7  effect size 10.00
## 8        n2/n1  1.00
## 9           n1 36.00
## 10          n2 36.00

sampsi.means(10, 30, sd1 = 15, sd2 = 20, alpha = 0.1, ratio = 1)

## [[1]]
## [1] "Two-sample two-sided test of means"
##
## [[2]]
##    Assumptions Value
## 1        alpha   0.1
## 2        power   0.9
## 3           m1  10.0
## 4           m2  30.0
## 5          sd1  15.0
## 6          sd2  20.0
## 7  effect size  20.0
## 8        n2/n1   1.0
## 9           n1  14.0
## 10          n2  14.0

Finally, we often work with clustered data and would like to calculate sample sizes for clustered observations. Here I created a samp.clus() function that works the same way as the sampclus function in Stata; that is, it takes a sampsi.prop() or sampsi.means() object, along with an intraclass correlation coefficient (\( \rho \)) and either the number of observations per cluster or the number of clusters, and calculates the corresponding effective sample size.

samp.clus <- function(sampsi.object, rho, num.clus=NA, obs.clus=NA){
  if(is.na(num.clus) & is.na(obs.clus)) print("Either num.clus or obs.clus must be identified")
  else{
    so <- sampsi.object[[2]]
    n1 <- as.numeric(so[so$Assumptions=="n1", 2])
    n2 <- as.numeric(so[so$Assumptions=="n2", 2])
    if(!is.na(obs.clus)){
      deff <- 1+(obs.clus-1)*rho
      n1.clus <- n1*deff
      n2.clus <- n2*deff
      num.clus <- ceiling((n1.clus+n2.clus)/obs.clus)
    }
    else if(!is.na(num.clus)){
      tot <- (n1*(1-rho)+n2*(1-rho))/(1-(n1*rho/num.clus)-(n2*rho/num.clus))
      if(tot <= 0) stop("Number of clusters is too small")
      else{
        obs.clus <- ceiling(tot/num.clus)
        deff <- 1+(obs.clus-1)*rho
        n1.clus <- n1*deff
        n2.clus <- n2*deff
      }
    }
    col1 <- c("n1 uncorrected", "n2 uncorrected", "ICC", "Avg obs/cluster",
              "Min num clusters", "n1 corrected", "n2 corrected")
    col2 <- c(n1, n2, rho, obs.clus, num.clus, ceiling(n1.clus), ceiling(n2.clus))
    ret <- as.data.frame(cbind(col1, col2))
    colnames(ret) <- c("Assumptions", "Value")
    return(ret)
  }
}

Here are a couple of examples of how it works (the same function samp.clus() works for both a sampsi.prop object and a sampsi.means object):

ss <- sampsi.prop(0.1, 0.25, power = 0.8, two.sided = FALSE)
samp.clus(ss, rho = 0.05, obs.clus = 15)

##        Assumptions Value
## 1   n1 uncorrected    92
## 2   n2 uncorrected    92
## 3              ICC  0.05
## 4  Avg obs/cluster    15
## 5 Min num clusters    21
## 6     n1 corrected   157
## 7     n2 corrected   157

samp.clus(ss, rho = 0.05, num.clus = 150)

##        Assumptions Value
## 1   n1 uncorrected    92
## 2   n2 uncorrected    92
## 3              ICC  0.05
## 4  Avg obs/cluster     2
## 5 Min num clusters   150
## 6     n1 corrected    97
## 7     n2 corrected    97

ss2 <- sampsi.means(10, 15, sd1 = 15, power = 0.8)
samp.clus(ss2, rho = 0.05, obs.clus = 15)

##        Assumptions Value
## 1   n1 uncorrected   142
## 2   n2 uncorrected   142
## 3              ICC  0.05
## 4  Avg obs/cluster    15
## 5 Min num clusters    33
## 6     n1 corrected   242
## 7     n2 corrected   242

samp.clus(ss2, rho = 0.05, num.clus = 5)

## Error: Number of clusters is too small

## Note: the above won't work because not enough clusters were given as input; this error message
## will occur to warn you. You must increase the number of clusters to avoid the error.

Finally, I add one functionality that Stata doesn't have with its basic sample size calculations, which is graphing sample size as a function of power. Here, I created two functions for these purposes, for either proportions or means. The input is the starting point and ending point for the power, as well as the same inputs and defaults from the corresponding sampsi.prop or sampsi.means functions:

graph.power.prop <- function(from.power, to.power, p1, p2, ratio=1, alpha=.05,
                             cont.corr=TRUE, two.sided=TRUE, one.sample=FALSE){
  seq.p <- seq(from.power, to.power, by=.01)
  n <- rep(NA, length(seq.p))
  for(i in 1:length(seq.p)){
    ob <- sampsi.prop(p1=p1, p2=p2, power=seq.p[i], alpha=alpha, ratio=ratio,
                      cont.corr=cont.corr, two.sided=two.sided,
                      one.sample=one.sample)[[2]]
    n[i] <- as.numeric(ob[7,2])
  }
  plot(n, seq.p, ylab="Power", xlab="n (in smaller arm)", type="l",
       main=paste("Power graph for p1=", p1, "and p2=", p2))
}

graph.power.means <- function(from.power, to.power, m1, m2, sd1, sd2=NA, ratio=1,
                              alpha=.05, two.sided=TRUE, one.sample=FALSE){
  seq.p <- seq(from.power, to.power, by=.01)
  n <- rep(NA, length(seq.p))
  for(i in 1:length(seq.p)){
    ob <- sampsi.means(m1=m1, m2=m2, sd1=sd1, sd2=sd2, power=seq.p[i], alpha=alpha,
                       ratio=ratio, one.sample=one.sample, two.sided=two.sided)[[2]]
    n[i] <- as.numeric(ob[9,2])
  }
  plot(n, seq.p, ylab="Power", xlab="n (in smaller arm)", type="l",
       main=paste("Power graph for m1=", m1, "and m2=", m2))
}

And this is what it looks like. This example will graph power as a function of sample size. We restrict the graph from \( \beta=0.6 \) to \( \beta=1 \).

graph.power.prop(0.6, 1, p1 = 0.2, p2 = 0.35)

POWER AND SAMPLE SIZE ANALYSIS: Z TEST

October 17, 2012

Abstract

This article provides a brief background about power and sample size analysis. Then, power and sample size analysis is computed for the Z test. Subsequent articles will describe power and sample size analysis for:

  • one-sample and two-sample t tests;
  • p test, chi-square test, correlation;
  • one-way ANOVA;
  • DOE.

Finally, a PDF article showing both the underlying methodology and the R code provided here will be published.

Background

Power and sample size analysis are important tools for assessing the ability of a statistical test to detect when a null hypothesis is false, and for deciding what sample size is required for having a reasonable chance to reject a false null hypothesis.

The following four quantities have an intimate relationship:

  • 1. sample size

  • 2. effect size

  • 3. significance level = P(Type I error) = probability of finding an effect that is not there

  • 4. power = 1 – P(Type II error) = probability of finding an effect that is there

Given any three, we can determine the fourth.

Z test

The formula for the power computation can be implemented in R, using a function like the following:

powerZtest = function(alpha = 0.05, sigma, n, delta){
  zcr = qnorm(p = 1 - alpha, mean = 0, sd = 1)
  s = sigma/sqrt(n)
  power = 1 - pnorm(q = zcr, mean = (delta/s), sd = 1)
  return(power)
}

In the same way, the function to compute the sample size can be built.

sampleSizeZtest = function(alpha = 0.05, sigma, power, delta){
  zcra = qnorm(p = 1 - alpha, mean = 0, sd = 1)
  zcrb = qnorm(p = power, mean = 0, sd = 1)
  n = round((((zcra + zcrb)*sigma)/delta)^2)
  return(n)
}

The above code is provided for didactic purposes. In fact, the pwr package provides a function to perform power and sample size analysis.

install.packages("pwr")

library(pwr)

The function pwr.norm.test() computes parameters for the Z test. It accepts the four parameters seen above, one of them passed as NULL. The parameter passed as NULL is determined from the others.

Some examples

Power at n = 20, for \( \sigma = 15 \), testing \( H_0: \mu = 100 \) against the one-sided alternative when the true mean is \( \mu_a = 105 \):

sigma = 15
h0 = 100
ha = 105

This is the result with the self-made function:

> powerZtest(n = 20, sigma = sigma, delta = (ha-h0))
[1] 0.438749

And here the same with the pwr.norm.test() function:

> d = (ha - h0)/sigma
> pwr.norm.test(d = d, n = 20, sig.level = 0.05, alternative = "greater")

     Mean power calculation for normal distribution with known variance

              d = 0.3333333
              n = 20
      sig.level = 0.05
          power = 0.438749
    alternative = greater

The sample size of the test for power equal to 0.80 can be computed using the self-made function

> sampleSizeZtest(sigma = sigma, power = 0.8, delta = (ha-h0))
[1] 56

or with the pwr.norm.test() function:

> pwr.norm.test(d = d, power = 0.8, sig.level = 0.05, alternative = "greater")

     Mean power calculation for normal distribution with known variance

              d = 0.3333333
              n = 55.64302
      sig.level = 0.05
          power = 0.8
    alternative = greater

The power function can be drawn:

ha = seq(95, 125, l = 100)
d = (ha - h0)/sigma
pwrTest = pwr.norm.test(d = d, n = 20, sig.level = 0.05, alternative = "greater")$power
plot(d, pwrTest, type = "l", ylim = c(0, 1))

View (and download) the full code:

Download powerZtest.R

### Self-made functions to perform power and sample size analysis
powerZtest = function(alpha = 0.05, sigma, n, delta){
  zcr = qnorm(p = 1 - alpha, mean = 0, sd = 1)
  s = sigma/sqrt(n)
  power = 1 - pnorm(q = zcr, mean = (delta/s), sd = 1)
  return(power)
}

sampleSizeZtest = function(alpha = 0.05, sigma, power, delta){
  zcra = qnorm(p = 1 - alpha, mean = 0, sd = 1)
  zcrb = qnorm(p = power, mean = 0, sd = 1)
  n = round((((zcra + zcrb)*sigma)/delta)^2)
  return(n)
}

### Load pwr package to perform power and sample size analysis
library(pwr)

### Data
sigma = 15
h0 = 100
ha = 105

### Power analysis
# Using the self-made function
powerZtest(n = 20, sigma = sigma, delta = (ha-h0))
# Using the pwr package
pwr.norm.test(d = (ha - h0)/sigma, n = 20, sig.level = 0.05, alternative = "greater")

### Sample size analysis
# Using the self-made function
sampleSizeZtest(sigma = sigma, power = 0.8, delta = (ha-h0))
# Using the pwr package
pwr.norm.test(d = (ha - h0)/sigma, power = 0.8, sig.level = 0.05, alternative = "greater")

### Power function for the one-sided alternative
ha = seq(95, 125, l = 100)
d = (ha - h0)/sigma
pwrTest = pwr.norm.test(d = d, n = 20, sig.level = 0.05, alternative = "greater")$power
plot(d, pwrTest, type = "l", ylim = c(0, 1))