
What is a P value?

Why do we need statistical calculations?

When analyzing data, your goal is simple: you wish to make the strongest possible conclusion from limited amounts of data. To do this, you need to overcome two problems:

- Important differences can be obscured by biological variability and experimental imprecision. This makes it hard to distinguish real differences from random variability.
- The human brain excels at finding patterns, even in random data. Our natural inclination (especially with our own data) is to conclude that differences are real and to minimize the contribution of random variability. Statistical rigor prevents you from making this mistake.

Statistical analyses are most useful when you are looking for differences that are small compared to experimental imprecision and biological variability. If you only care about large differences, you may follow these aphorisms: "If you need statistics to analyze your experiment, then you've done the wrong experiment." "If your data speak for themselves, don't interrupt!" But in many fields, scientists care about small differences and are faced with large amounts of variability, so statistical methods are necessary.

Population vs. samples


The basic idea of statistics is simple: you want to extrapolate from the data you have collected to make general conclusions. Statistical analyses are based on a simple model. There is a large population of data out there, and you have randomly sampled parts of it. You analyze your sample to make inferences about the population. Consider several situations:

- Quality control. Sample: the items you tested. Population: the entire batch of items produced.
- Political polls. Sample: the voters you polled. Population: all voters.
- Clinical studies. Sample: the subset of patients who attended the Tuesday morning clinic in August. Population: all similar patients.
- Laboratory research. Sample: the data you actually collected. Population: all the data you could have collected if you had repeated the experiment many times the same way.

The logic of statistics assumes that your sample is randomly selected from the population, and that you only want to extrapolate to that population. This works perfectly for quality control. When you apply this logic to scientific data, you encounter two problems:

First, you don't really have a random sample. It is rare for a scientist to randomly select subjects from a population. More often you just did an experiment a few times and want to extrapolate to the more general situation. The best you can do is assume that your data are representative of a larger, perhaps hypothetical, population, and extrapolate only to that population.

Second, you want to make conclusions that extrapolate beyond the population. The statistical inferences only apply to the population your samples were obtained from. Let's say you perform an experiment in the lab three times. All the experiments used the same cell preparation, the same buffers, and the same equipment. Statistical inferences let you make conclusions about what would happen if you repeated the experiment many more times with that same cell preparation, those same buffers, and that same equipment. You probably want to extrapolate further, to what would happen if someone else repeated the experiment with a different source of cells, freshly made buffers, and different instruments. Statistics can't help with this further extrapolation. You can use scientific judgment and common sense to make inferences that go beyond statistics. Statistical logic is only part of data interpretation.
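To make the sample-versus-population idea concrete, here is a minimal, illustrative simulation sketch. The population, its parameters, and the sample sizes are all invented for the illustration; the point is only that sample means scatter around the true population mean.

```python
# Illustrative sketch of the sampling model: draw repeated random samples
# from a known, simulated population and watch how the sample means
# scatter around the true population mean (all numbers are invented).
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=100.0, scale=15.0, size=100_000)  # hypothetical population

sample_means = [rng.choice(population, size=10, replace=False).mean() for _ in range(5)]

print(f"population mean: {population.mean():.1f}")
print("means of five random samples (n = 10):", np.round(sample_means, 1))
```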

Assumption of independence
It is not enough that your data are sampled from a population. Statistical tests are also based on the assumption that each subject (or each experimental unit) was sampled independently of the rest. The assumption of independence is easiest to understand by studying counterexamples:

- You are measuring blood pressure in animals. You have five animals in each group, and measure the blood pressure three times in each animal. You do not have 15 independent measurements, because the triplicate measurements in one animal are likely to be closer to each other than to measurements from the other animals. You should average the three measurements in each animal. Now you have five mean values that are independent of each other (a small code sketch of this averaging follows below).
- You have done a laboratory experiment three times, each time in triplicate. You do not have nine independent values. If you average the triplicates, you do have three independent mean values.
- You are doing a clinical study, and recruit ten patients from an inner-city hospital and ten more patients from a suburban clinic. You have not independently sampled 20 subjects from one population. The data from the ten inner-city patients may be closer to each other than to the data from the suburban patients. You have sampled from two populations and need to account for this in your analysis. This is a complicated situation, and you should probably consult a statistician.
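As a concrete illustration of the blood-pressure example, here is a small sketch with made-up numbers. It collapses the triplicate measurements for each animal into one per-animal mean, leaving five independent values for the group.

```python
# Sketch of the blood-pressure counterexample above (numbers are made up):
# average the three replicate readings within each animal so the analysis
# sees five independent per-animal means, not fifteen correlated readings.
import numpy as np

# rows = animals, columns = the three replicate measurements within one animal
control_group = np.array([[118, 121, 119],
                          [130, 128, 131],
                          [115, 117, 116],
                          [124, 126, 123],
                          [120, 122, 121]], dtype=float)

per_animal_means = control_group.mean(axis=1)   # five independent values
print(per_animal_means)                         # use these in the statistical test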

P values
Definition of a P value

Consider an experiment where you've measured values in two samples, and the means are different. How sure are you that the population means are different as well? There are two possibilities: the populations have different means; or the populations have the same mean, and the difference you observed is a coincidence of random sampling.

The P value is a probability, with a value ranging from zero to one. It is the answer to this question: if the populations really have the same mean, what is the probability that random sampling would lead to a difference between sample means as large as (or larger than) the one you observed?

How are P values calculated? There are many methods, and you'll need to read a statistics text to learn about them. The choice of statistical test depends on how you express the results of an experiment (measurement, survival time, proportion, etc.), on whether the treatment groups are paired, and on whether you are willing to assume that the measured values follow a Gaussian bell-shaped distribution.
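As one concrete illustration, the sketch below computes a P value for two small invented samples with an unpaired t test, one of many possible tests; it assumes approximately Gaussian data.

```python
# A common way to get a two-sample P value: the unpaired t test
# (assumes approximately Gaussian data; the values below are invented).
import numpy as np
from scipy import stats

group_a = np.array([4.1, 5.2, 4.8, 5.5, 4.9])
group_b = np.array([5.9, 6.3, 5.7, 6.8, 6.1])

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, two-tail P = {p_value:.4f}")
```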
Common misinterpretation of a P value

Many people misunderstand what question a P value answers. If the P value is 0.03, that means that there is a 3% chance of observing a difference as large as you observed even if the two population means are identical. It is tempting to conclude, therefore, that there is a 97% chance that the difference you observed reflects a real difference between populations and a 3% chance that the difference is due to chance. Wrong. What you can say is that random sampling from identical populations would lead to a difference smaller than you observed in 97% of experiments and larger than you observed in 3% of experiments. You have to choose. Would you rather believe in a 3% coincidence? Or that the population means are really different?
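One way to see what a P value does, and does not, mean is a small simulation: when both samples are drawn from the very same population, P values below 0.03 show up in roughly 3% of experiments. The sketch below, with invented parameters, illustrates exactly that and nothing more; it says nothing about the probability that any particular observed difference is "real".

```python
# Simulation of the null situation: both samples come from the same population,
# so any P value below 0.03 is, by construction, a coincidence of sampling.
# Roughly 3% of these null experiments produce P < 0.03.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments = 20_000
hits = 0
for _ in range(n_experiments):
    a = rng.normal(50, 10, size=8)   # same population...
    b = rng.normal(50, 10, size=8)   # ...for both samples
    if stats.ttest_ind(a, b).pvalue < 0.03:
        hits += 1

print("fraction of null experiments with P < 0.03:", hits / n_experiments)
```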
"Extremely significant" results

Intuitively, you probably think that P=0.0001 is more statistically significant than P=0.04. Using strict definitions, this is not correct. Once you have set a threshold P value for statistical significance, every result is either statistically significant or is not statistically significant. Some statisticians feel very strongly about this. Many scientists are not so rigid, and refer to results as being "very significant" or "extremely significant" when the P value is tiny. Often, results are flagged with a single asterisk when the P value is less than 0.05, with two asterisks when the P value is less than 0.01, and three asterisks when the P value is less than 0.001. This is not a firm convention, so you need to check the figure legends when you see asterisks to find the definitions the author used.
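For convenience, the asterisk convention described above can be captured in a small helper like the sketch below; keep in mind that these thresholds are common choices, not a universal standard.

```python
# Helper mirroring the common (but not universal) asterisk convention:
# * for P < 0.05, ** for P < 0.01, *** for P < 0.001.
def significance_flag(p: float) -> str:
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"   # not significant

print(significance_flag(0.0001), significance_flag(0.04), significance_flag(0.20))
```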
One- vs. two-tail P values

When comparing two groups, you must distinguish between one- and two-tail P values. Start with the null hypothesis that the two populations really are the same and that the observed discrepancy between sample means is due to chance.

The two-tail P value answers this question: assuming the null hypothesis, what is the chance that randomly selected samples would have means as far apart as observed in this experiment, with either group having the larger mean?

To interpret a one-tail P value, you must predict which group will have the larger mean before collecting any data. The one-tail P value answers this question: assuming the null hypothesis, what is the chance that randomly selected samples would have means as far apart as observed in this experiment, with the specified group having the larger mean?

A one-tail P value is appropriate only when previous data, physical limitations, or common sense tell you that a difference, if any, can only go in one direction. The issue is not whether you expect a difference to exist; that is what you are trying to find out with the experiment. The issue is whether you should interpret increases and decreases the same way. You should only choose a one-tail P value when you believe both of the following:

- Before collecting any data, you can predict which group will have the larger mean (if the means are in fact different).
- If the other group ends up with the larger mean, you are willing to attribute that difference to chance, no matter how large it is.

It is usually best to use a two-tail P value, for these reasons:

- The relationship between P values and confidence intervals is clearer with two-tail P values.
- Some tests compare three or more groups, which makes the concept of tails inappropriate (more precisely, the P values have many tails). A two-tail P value is more consistent with the P values reported by these tests.

Finally, choosing a one-tail P value can pose a dilemma. What would you do if you chose a one-tail P value but observed a large difference in the opposite direction to the experimental hypothesis? To be rigorous, you should conclude that the difference is due to chance and is not statistically significant. But most people would be tempted to switch to a two-tail P value or to reverse the direction of the experimental hypothesis. You avoid this situation by always using two-tail P values.
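The sketch below shows the practical difference using SciPy's unpaired t test; the data are invented, and the `alternative` argument (available in reasonably recent SciPy versions) encodes the direction predicted before the experiment.

```python
# One-tail vs. two-tail P values for the same (invented) data. The one-tail
# value is only meaningful if "treated > control" was predicted in advance.
import numpy as np
from scipy import stats

treated = np.array([7.9, 8.4, 8.1, 8.8, 8.3])
control = np.array([7.2, 7.5, 7.1, 7.8, 7.4])

two_tail = stats.ttest_ind(treated, control, alternative="two-sided").pvalue
one_tail = stats.ttest_ind(treated, control, alternative="greater").pvalue

print(f"two-tail P = {two_tail:.4f}")
print(f"one-tail P = {one_tail:.4f}")   # half the two-tail P when the difference goes the predicted way
```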

Statistical hypothesis testing


The P value is a fraction. In many situations, the best thing to do is report that number to summarize the results of a comparison. If you do this, you can avoid the term "statistically significant" entirely, which is often misinterpreted. In other situations, you'll want to make a decision based on a single comparison. In these situations, follow the steps of statistical hypothesis testing (a minimal code sketch follows these steps):

1. Set a threshold P value before you do the experiment. Ideally, you should set this value based on the relative consequences of missing a true difference or falsely finding a difference. In practice, however, the threshold value (called alpha) is almost always set to 0.05 by tradition.
2. Define the null hypothesis. If you are comparing two means, the null hypothesis is that the two populations have the same mean.
3. Do the appropriate statistical test to compute the P value.
4. Compare the P value to the preset threshold value. If the P value is less than the threshold, state that you "reject the null hypothesis" and that the difference is "statistically significant". If the P value is greater than the threshold, state that you "do not reject the null hypothesis" and that the difference is "not statistically significant".

Note that statisticians use the term hypothesis testing very differently than scientists do.
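Here is a minimal sketch of those four steps, with invented data and alpha fixed at the conventional 0.05 before looking at the results:

```python
# Statistical hypothesis testing, step by step (invented data).
import numpy as np
from scipy import stats

alpha = 0.05                                         # 1. threshold set in advance
# 2. null hypothesis: the two populations have the same mean
group_a = np.array([12.1, 11.8, 12.6, 12.3, 11.9])
group_b = np.array([13.0, 13.4, 12.9, 13.6, 13.2])

p_value = stats.ttest_ind(group_a, group_b).pvalue   # 3. appropriate test

if p_value < alpha:                                  # 4. compare to the threshold
    print(f"P = {p_value:.4f}: reject the null hypothesis (statistically significant)")
else:
    print(f"P = {p_value:.4f}: do not reject the null hypothesis (not statistically significant)")
```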
Statistical significance

The term significant is seductive, and it is easy to misinterpret it. A result is said to be statistically significant when the P value is less than a preset threshold value, that is, when the result would be surprising if the populations were really identical. It is easy to read far too much into the word significant, because its statistical meaning is entirely distinct from its usual meaning. Just because a difference is statistically significant does not mean that it is important or interesting. And a result that is not statistically significant (in the first experiment) may turn out to be very important.

If a result is statistically significant, there are two possible explanations:

- The populations are identical, so there really is no difference. You happened to randomly obtain larger values in one group and smaller values in the other, and the difference was large enough to generate a P value less than the threshold you set. Finding a statistically significant result when the populations are identical is called making a Type I error.
- The populations really are different, so your conclusion is correct.

There are also two explanations for a result that is not statistically significant:

- The populations are identical, so there really is no difference. Any difference you observed in the experiment was a coincidence. Your conclusion of no significant difference is correct.
- The populations really are different, but you missed the difference because of some combination of small sample size, high variability, and bad luck. The difference in your experiment was not large enough to be statistically significant. Finding a result that is not statistically significant when the populations really are different is called making a Type II error.
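The two error types described above can be made tangible with a small simulation (all parameters invented): with identical populations, about 5% of experiments are falsely "significant" at alpha = 0.05 (Type I), and with a real but modest difference and small samples, many experiments miss it (Type II).

```python
# Estimating Type I and Type II error rates by simulation (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n_experiments = 0.05, 10_000

# Type I: populations identical, yet the test declares significance.
type1 = sum(stats.ttest_ind(rng.normal(50, 10, 8), rng.normal(50, 10, 8)).pvalue < alpha
            for _ in range(n_experiments))

# Type II: populations truly differ (means 50 vs. 55), yet the test misses it.
type2 = sum(stats.ttest_ind(rng.normal(50, 10, 8), rng.normal(55, 10, 8)).pvalue >= alpha
            for _ in range(n_experiments))

print("estimated Type I error rate:", type1 / n_experiments)
print("estimated Type II error rate:", type2 / n_experiments)
```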

Confidence intervals
Statistical calculations produce two kinds of results that help you make inferences about the populations from the samples. You've already learned about P values. The second kind of result is a confidence interval.
95% confidence interval of a mean

Although the calculation of a sample mean is exact, that mean is only an estimate of the population mean. How good is the estimate? It depends on how large your sample is and how much the values differ from one another. Statistical calculations combine sample size and variability to generate a confidence interval for the population mean. You can calculate intervals for any desired degree of confidence, but 95% confidence intervals are used most commonly. If you assume that your sample is randomly selected from some population, you can be 95% sure that the confidence interval includes the population mean. More precisely, if you generate many 95% CIs from many data sets, you expect the CIs to include the true population mean in 95% of cases and to exclude it in the other 5%. Since you don't know the population mean, you'll never know for sure whether or not a particular confidence interval contains the true mean.
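A minimal sketch of the calculation, using the t distribution; this assumes an approximately Gaussian population, and the sample values are invented.

```python
# 95% confidence interval for a mean, from sample size and variability.
import numpy as np
from scipy import stats

sample = np.array([9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.5])
mean = sample.mean()
sem = stats.sem(sample)                       # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```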
Other situations

When comparing groups, calculate the 95% confidence interval for the difference between the population means. Again, interpretation is straightforward: if you accept the assumptions, there is a 95% chance that the interval you calculate includes the true difference between the population means.

Methods exist to compute a 95% confidence interval for any calculated statistic, for example the relative risk or the best-fit value in nonlinear regression. The interpretation is the same in all cases: if you accept the assumptions of the test, you can be 95% sure that the interval contains the true population value. Or, more precisely, if you repeat the experiment many times, you expect the 95% confidence interval to contain the true population value in 95% of the experiments.
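As a sketch of the first case, the example below computes a 95% confidence interval for the difference between two means using the pooled-variance approach that matches the unpaired t test; the data are invented and roughly equal variances are assumed.

```python
# 95% confidence interval for the difference between two population means.
import numpy as np
from scipy import stats

group_a = np.array([4.1, 5.2, 4.8, 5.5, 4.9])
group_b = np.array([5.9, 6.3, 5.7, 6.8, 6.1])

diff = group_b.mean() - group_a.mean()
n_a, n_b = len(group_a), len(group_b)
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / df
se_diff = np.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
t_crit = stats.t.ppf(0.975, df)

print(f"difference = {diff:.2f}, "
      f"95% CI = ({diff - t_crit * se_diff:.2f}, {diff + t_crit * se_diff:.2f})")
```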
Why 95%?

There is nothing special about 95%. It is just convention that confidence intervals are usually calculated for 95% confidence. In theory, confidence intervals can be computed for any degree of confidence. If you want more confidence, the intervals will be wider. If you are willing to accept less confidence, the intervals will be narrower.
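The trade-off is easy to see by computing intervals for the same invented sample at several confidence levels: asking for more confidence widens the interval.

```python
# Confidence interval width grows with the desired level of confidence.
import numpy as np
from scipy import stats

sample = np.array([9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.5])
mean, sem, df = sample.mean(), stats.sem(sample), len(sample) - 1

for confidence in (0.90, 0.95, 0.99):
    low, high = stats.t.interval(confidence, df, loc=mean, scale=sem)
    print(f"{int(confidence * 100)}% CI: ({low:.2f}, {high:.2f})   width = {high - low:.2f}")
```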

