
SAMPLE SIZE AND POWER

Craig JACKSON, Fang Gao SMITH


Clinical trials often involve the comparison of a new treatment with an established treatment (or a placebo) in a sample of patients, and the difference between the two treatment groups is analysed using a hypothesis test. It is important that the sample size is large enough to detect a treatment effect, at a given significance level, if there is one. If the sample size is too small, a type II error is likely to occur.

Types of error in hypothesis testing


The two types of error that can occur when using hypothesis tests are summarised in Table 1.

1. Type I error. A type I error occurs if the null hypothesis is rejected, i.e. a significant result is obtained, when the null hypothesis is in fact true. A type I error is a false positive result. The probability of making a type I error is denoted as α.

2. Type II error. A type II error occurs if the null hypothesis is accepted, i.e. a non-significant result is obtained, when the null hypothesis is in fact not true (the alternative hypothesis is true). A type II error is a false negative result. The probability of making a type II error is denoted as β.
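The two error rates can be illustrated with a small simulation. The sketch below is not part of the original text: it assumes normally distributed data with a known common standard deviation, two equal groups, and a two-sided z-test, and the function names, group size of 20 and true difference of 1 SD are hypothetical choices made only for illustration.

```python
import random
from statistics import NormalDist, mean


def reject_null(group_a, group_b, sd, alpha):
    """Two-sided z-test for a difference in means (known common SD, equal group sizes)."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (sd * (2 / n) ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > z_crit


def rejection_rate(true_diff, n_per_group, trials, alpha=0.05, sd=1.0, seed=1):
    """Fraction of simulated trials in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        rejections += reject_null(a, b, sd, alpha)
    return rejections / trials


# Null hypothesis true: every rejection is a type I error (rate should be near alpha).
type_i_rate = rejection_rate(true_diff=0.0, n_per_group=20, trials=2000)

# Null hypothesis false (hypothetical true difference of 1 SD):
# every failure to reject is a type II error.
type_ii_rate = 1 - rejection_rate(true_diff=1.0, n_per_group=20, trials=2000)

print(f"estimated type I error rate:  {type_i_rate:.3f}")
print(f"estimated type II error rate: {type_ii_rate:.3f}")
```

With these assumptions the estimated type I error rate should fall close to the chosen significance level of 0.05, while the type II error rate depends on the true difference and the group size.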

The importance of sample size


In many published clinical trials, it is apparent that little consideration has been given to the sample size required to confirm or refute the investigators' hypotheses. The default sample size is sometimes far too small to detect anything but the most gross difference. A non-significant result is reported and a type II error may have occurred. On the other hand, a sample that is too large is an unnecessary waste of clinical resources and raises ethical issues, such as patient inconvenience and unnecessary discomfort. It is therefore essential to assess the optimal sample size before starting an investigation and to report the sample size calculation in the subsequent paper.

Factors affecting required sample size


In clinical trials involving, for example, the comparison of two independent groups, the sample size required depends on the power, the clinically worthwhile difference to be detected, the standard deviation of the variable and the nominated significance level. These factors are interdependent, so that it is possible to calculate any one of them given the others.

1. Power. Power is the probability that a study of a given size would detect as statistically significant a real difference of a given magnitude.

It is generally recommended that the power of a clinical trial should be at least 80% to 90%.

A high power means that there is a high chance of detecting a significant difference, if there is one, and a low chance of making a type II error. With a high power, if a result is non-significant, one can be reasonably sure, though not certain, that it is valid to accept the null hypothesis.

Since the probability of not detecting a real difference between study groups is β (i.e. the probability of a type II error), the probability of detecting a real difference (the power) is 1 - β.
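The relationship between power, sample size and the size of the difference can be sketched numerically. The function below is an illustration rather than a method given in the text: it assumes two equal independent groups, a two-sided test and the normal approximation, and it expresses the difference in units of the standard deviation. The name approx_power and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(std_diff, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) for comparing two means.

    std_diff is the true difference divided by the standard deviation.
    Assumes equal group sizes and a two-sided z-test (normal approximation).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = std_diff * sqrt(n_per_group / 2)
    # Probability of a significant result in either tail of the test.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 20, 40, 80):
    power = approx_power(std_diff=0.5, n_per_group=n)
    print(f"n per group = {n:3d}: power = {power:.2f}, type II error = {1 - power:.2f}")
```

Under these assumptions the power rises towards 1 as the group size increases, and it falls to the significance level itself when there is no true difference.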

2. Minimum clinically worthwhile difference.

(a). Hypothesised difference and sample size. If the difference between two treatments is large, then relatively small samples are likely to produce a significant result. If the difference between treatments is small, then much larger numbers are required.

(b). Statistical significance and clinical difference. When differences are expected to be small, it becomes important to distinguish between statistical significance and clinical significance.

The investigator needs to define the minimum difference between the groups that will be considered clinically relevant. For example, suppose that a new bronchodilator is believed to cause a real increase in tidal volume of 10 ml in patients with chronic bronchitis. The standard deviation of tidal volume in this population is likely to be considerably higher than this figure. It would be possible, given a huge sample, to demonstrate the real increase, but the exercise would be very expensive and pointless because with such a small (but statistically significant) difference, the drug would be of little clinical consequence.

Given a large enough sample, any difference, no matter how small and trivial, can be made statistically significant.

Hence, experience and judgement are needed when deciding the minimum treatment effect that is going to be of value to patients and which therefore justifies the time, effort and finance required to investigate it.
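How quickly the required numbers grow as the difference shrinks can be shown with the usual normal-approximation formula for two equal independent groups, n per group = 2(z_(1-α/2) + z_(1-β))² / d², where d is the difference expressed in units of the standard deviation. This formula and the function name below are supplied here as an illustrative sketch, not taken from the text; exact methods based on the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(std_diff, alpha=0.05, power=0.80):
    """Approximate patients per group for a two-sided comparison of two means.

    std_diff is the difference to be detected divided by the standard deviation.
    Uses the normal approximation with equal group sizes.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / std_diff ** 2)


for d in (1.0, 0.5, 0.2, 0.1):
    print(f"difference of {d} SD: about {n_per_group(d)} patients per group")
```

Because the required number varies with the inverse square of the difference, halving the difference to be detected roughly quadruples the sample size.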

3. Standard deviation. The larger the standard deviations of the two groups relative to the minimal clinically important difference to be detected, the larger the sample size that is going to be required; the smaller the standard deviations, the smaller the sample size required.

The ratio of the minimal clinically important difference to the standard deviation is referred to as the standardised difference and is used in the Altman nomogram.

The estimated SD. At the start of many investigations, an estimate of the standard deviation may not be readily available. There are several approaches to this problem:

- Perform a pilot study.
- Start the trial with the intention of estimating the likely standard deviation from the first patients.
- Use the standard deviation found in the investigators' own previous trials of a similar kind.
- Use the standard deviation quoted in similar trials or in similar patients in similar circumstances reported in the literature.
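As a sketch of the pilot-study approach, the standard deviation can be estimated from a handful of measurements and then combined with the minimum clinically worthwhile difference to give the standardised difference. All numbers below are invented for illustration and do not come from the text or from any real study.

```python
from statistics import stdev

# Hypothetical pilot measurements of the outcome variable (ml); not real data.
pilot_values = [480, 520, 450, 510, 540, 470, 500, 530]

# Hypothetical minimum clinically worthwhile difference (ml), chosen by judgement.
min_worthwhile_diff = 25.0

# Sample standard deviation of the pilot data (n - 1 in the denominator).
estimated_sd = stdev(pilot_values)

# Standardised difference = worthwhile difference / standard deviation.
standardised_diff = min_worthwhile_diff / estimated_sd

print(f"estimated SD: {estimated_sd:.1f} ml")
print(f"standardised difference: {standardised_diff:.2f}")
```

An estimate from so few patients is itself imprecise, so it is sensible to check how the required sample size changes if the true standard deviation is somewhat larger.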

4. Significance level. The significance level, α, has an important bearing on the sample size required. If the significance level is 0.01 rather than 0.05, a much larger sample size will be required to avoid a type II error, whereas if α is 0.1 then a much smaller sample size will be adequate, though there is then an increased risk of making a type I error.

There is a reciprocal relationship between α and β: as the nominated α decreases, the chance of a type II error increases, and vice versa. In other words, for a given sample size, if α is 0.01 rather than 0.05, there is a smaller probability of making a type I error but a greater probability of making a type II error. An α value of 0.05 implies that, when the null hypothesis is true, 1 trial in 20 will produce a type I error purely by chance.

As a rule of thumb, the probability of a type II error of a study should be approximately 4 times the significance level chosen. For example, if the significance level is 5%, the power should be at least 80%. If the significance level is 1%, the power should be 95%.
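The trade-off between α and β at a fixed sample size can be made concrete with a short calculation. The sketch below assumes two equal independent groups, a two-sided test and the normal approximation, and it ignores the negligible chance of a significant result in the wrong direction. The group size of 64 and the standardised difference of 0.5 are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(std_diff, n_per_group, alpha):
    """Approximate beta for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    power = z.cdf(std_diff * sqrt(n_per_group / 2) - z_crit)
    return 1 - power


# Same hypothetical trial analysed at three significance levels.
betas = {
    alpha: type_ii_error(std_diff=0.5, n_per_group=64, alpha=alpha)
    for alpha in (0.10, 0.05, 0.01)
}

for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}: beta = {beta:.2f}, power = {1 - beta:.2f}")
```

In this hypothetical trial, moving from a significance level of 0.05 to 0.01 without adding patients roughly doubles the probability of a type II error.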

Calculating the sample size


Mathematical methods are available for estimating the sample sizes required for every type of clinical trial, with both categorical and continuous data, comparisons of means and proportions, one, two or more groups, paired and unpaired samples, and groups of equal or unequal sizes. However, the formulae used are complex and, for many investigators, the guidance of a statistician would be essential.

Two alternative approaches are available for medical investigators with limited mathematical knowledge.

1. The Altman nomogram.

(a). Advantages. The nomogram is a simple and elegant idea that can be applied to categorical and continuous data, including paired and unpaired samples and groups of unequal sizes.

(b). Using the nomogram. Figure 1 shows the nomogram. It relates standardised difference (left scale), total sample size N at a given significance level (middle scale) and power (right scale).

As an example, the nomogram is used here to find the sample size required for a comparison of a continuous measurement between two independent groups. Let the standardised difference (smallest clinically worthwhile difference/SD) = 1.1, the significance level = 0.05 and the power = 0.85. A straight line is drawn between 1.1 on the standardised difference scale and 0.85 on the power scale. The line passes through N = 30 on the 0.05 significance scale. Therefore, 15 patients in each group will be sufficient to detect the smallest clinically important difference between the two groups with a power of 0.85 at a significance level of 0.05.

2. Computers. Investigator-friendly computer software programs are available that can calculate sample sizes quickly and automatically. Several pages on the World Wide Web are also dedicated to assisting users with automatic sample size calculations.
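The nomogram example above can be checked with a few lines of code. This is a sketch using the standard normal-approximation formula for two equal independent groups, which is an assumption added here and not a description of any particular software package; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_groups(std_diff, alpha, power):
    """Return (patients per group, total patients) for a two-sided comparison of two means.

    Normal approximation with equal group sizes:
    n per group = 2 * (z_(1 - alpha/2) + z_(power))**2 / std_diff**2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    per_group = ceil(2 * (z_alpha + z_beta) ** 2 / std_diff ** 2)
    return per_group, 2 * per_group


# Values from the nomogram example in the text.
per_group, total = sample_size_two_groups(std_diff=1.1, alpha=0.05, power=0.85)
print(f"{per_group} patients per group, {total} in total")
```

Under these assumptions the calculation gives 15 patients per group and 30 in total, in agreement with the value read from the nomogram.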

Further reading


Altman DG. Statistics and ethics in medical research. III How large a sample? British Medical Journal 1980; 281: 1336-1338.

Altman DG. Clinical trials. In: Altman DG (Ed). Practical statistics for medical research. London: Chapman & Hall, 1991, pp. 440-474.

Related topics of interest


Design of clinical trials, p. xxx; Choice of statistical tests, p. xxx.

Table 1. The types of errors associated with hypothesis tests

            Decision to accept H0                            Decision to reject H0
H0 true     Correct decision                                 Type I error (false positive), probability α
H0 false    Type II error (false negative), probability β    Correct decision
