Step 1. Find out the midpoint of each class, by adding its endpoints and dividing by two. Add it to the table. Call this column "x".
Step 2. Add another column, and put in it the values of

i) $\sum x^2 = 2^2 + 5^2 + 6^2 + 7^2 + 8^2 = 178$
ii) $\sum x^2 / n = 178 / 5 = 35.6$
iii) Mean $= (2 + 5 + 6 + 7 + 8) / 5 = 5.6$, so Mean$^2 = 5.6^2 = 31.36$
iv) $\sum x^2 / n - \text{mean}^2 = 35.6 - 31.36 = 4.24$
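The worked example above finds the variance as $\sum x^2 / n - \text{mean}^2$, which divides by n. The Python below is a minimal sketch of those four steps. The function names are hypothetical, and the final square-root step (standard deviation = √variance) is an assumption, because the example stops at step iv). Since it divides by n, the same function also reproduces the standard deviations of 1.414 and 11.296 quoted for data sets A and B under "Interpreting the mean and standard deviation".

```python
import math


def shortcut_variance(values):
    """Variance by the worked-example route: sum of x^2 over n, minus mean^2."""
    n = len(values)
    sum_sq = sum(x * x for x in values)   # i)   sum of x^2
    sum_sq_over_n = sum_sq / n            # ii)  sum of x^2 / n
    mean = sum(values) / n                # iii) mean (squared below)
    return sum_sq_over_n - mean ** 2      # iv)  sum of x^2 / n - mean^2


def shortcut_sd(values):
    """Standard deviation: square root of the step iv) result (assumed final step)."""
    return math.sqrt(shortcut_variance(values))


data = [2, 5, 6, 7, 8]
print(round(sum(data) / len(data), 2))    # 5.6
print(round(shortcut_variance(data), 2))  # 4.24
print(round(shortcut_sd(data), 3))        # 2.059

# Data sets A and B from "Interpreting the mean and standard deviation"
print(round(shortcut_sd([48, 49, 50, 51, 52]), 3))  # 1.414
print(round(shortcut_sd([35, 40, 50, 62, 63]), 3))  # 11.296
```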
Answering Exam Questions on Statistics Bio Factsheet
www.curriculumpress.co.uk
Interpreting the mean and standard deviation
The mean, of course, is the average - but that does not mean half the values are below and half above it, or that it is a common value. For example, the mean of the values 1, 1, 2, 3, 100 is 21.4; this is nowhere near any of the actual values, and four out of the five values are below it!

The mean also does not distinguish between these two data sets:-
A: 48, 49, 50, 51, 52
B: 35, 40, 50, 62, 63
Both sets of data have mean 50, but they are not very similar.

This is where the standard deviation comes in. This measures how spread out the data are - the bigger the standard deviation, the greater the spread. For example, for data set A above, the standard deviation is 1.414, and for set B, it is 11.296.

So, for example, suppose you know the following:
Data set 1: mean = 45.2, standard deviation = 2.13
Data set 2: mean = 43.7, standard deviation = 10.03
We know that data set 2 is more spread out than data set 1. Let's consider which would be more likely to have a value in it above 50, say.
For data set 1, 50 is more than 2 standard deviations away from the mean ($45.2 + 2 \times 2.13 = 49.46$).
For data set 2, 50 is less than 1 standard deviation away from the mean ($43.7 + 10.03 = 53.73$).
This tells us that 50 is a less "extreme" or "uncommon" value for data set 2 than for data set 1. So data set 2 is more likely to have values above 50.

Statistical tests
In the exam, you will always be told which statistical test to use if you are required to do calculations. You will be given any tables you need.
There are various types of questions:-
- understanding statistical terms like degrees of freedom, significance, etc
- interpreting results and drawing conclusions
- doing the calculations according to the test formula
- finding degrees of freedom
- using statistical tables
Some of these are the same for both the t-test and chi-squared; others are specific to the test.

Understanding statistical terms
Hypotheses: the purpose of a statistical test is to decide between the null hypothesis and the alternative hypothesis. The exact form of these hypotheses depends on the test. When you are carrying out the test, you accept the null hypothesis unless you have convincing evidence otherwise (in a court of law, the "null hypothesis" is that the person is innocent - he is only found guilty if there is enough evidence).

Test statistic: this is the value calculated from your data. The formula for it depends on the test you are doing.

Critical value: this is the value you compare the test statistic to, to decide whether you are going to accept or reject the null hypothesis. For both the t-test and the chi-squared test, you reject the null hypothesis if your test statistic is greater than the critical value. Critical values come from statistical tables.

Significance level: it is possible to reject the null hypothesis even if it is true, because "unusual" results can occur by chance (eg it is possible - although unlikely - to get 100 heads in succession when tossing a coin). The significance level is the chance of rejecting the null hypothesis when it is true. Significance levels may be written as percentages (10%, 5%, 1%) or as decimals (0.1, 0.05, 0.01).
The normal significance level in science is 5%. Use this unless you are told otherwise.

Degrees of freedom: you do not need to know the exact meaning, although you do need to know how to calculate them (see below). The idea is that the amount of data you have affects the critical value - this is because you are much more likely to get unusual results by chance if you only have a few observations than if you have a lot of observations.

Interpreting results and drawing conclusions
You must remember that if the value you calculate (the test statistic) is greater than the value from the tables (the critical value), then you reject the null hypothesis. Otherwise you accept it.
You then need to relate this back to the original hypotheses; this will be discussed in more detail for each test.

Choose your words carefully - a statistical test does not "prove" a hypothesis is true - there is always a chance that a wrong decision could be made. It is normal to say "the result is significant at the 5% level" or "the alternative hypothesis was accepted at the 5% level".

The remainder of the section is divided between the chi-squared test and the t-test.

Chi-squared test
There are two main types of chi-squared test you may have to do:
a) Testing to see if there is a difference
b) Testing to see if the theoretical ratios predicted by genetics apply
The hypotheses for the tests are:
a) H0: there is no difference between the different conditions
   H1: there is a difference between the different conditions
b) H0: the observations are in accordance with the predictions of genetics
   H1: the observations are not in accordance with the predictions of genetics

Calculations for the test formula
In chi-squared, you will need to calculate expected frequencies, and then the value of chi-squared, using the formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
O is the observed values - the data from the question
E is the expected values - the ones you calculate
Σ means "sum of"

a) To calculate expected values when you are testing for a difference, you just add up all the values and divide by the number of them.
b) To calculate expected values for genetics, you have to use the genetic ratio. The procedure is:
i) Add up all the values from the data you are given.
ii) Add up all the numbers in the genetic ratio (eg for 9:3:3:1, do 9 + 3 + 3 + 1 = 16). This tells you the number of parts you will be dividing your total from i) into.
iii) Find out how much one part is, by dividing your total from i) by your total from ii).
iv) Find out the expected frequencies, by multiplying one part by the numbers in the ratio (eg by 9, 3, 3 and 1).
Once you have calculated the expected frequencies, you substitute into the formula above to find the chi-squared value.

Finding degrees of freedom
You need to learn this formula:
For chi-squared:
degrees of freedom = number of categories - 1
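The two ways of finding expected frequencies, plus the chi-squared degrees-of-freedom formula, can be sketched in Python as below. This is an illustrative sketch only: the function names are hypothetical, and the counts are made-up example data rather than figures from this Factsheet.

```python
def expected_for_difference(observed):
    """a) Testing for a difference: add up all the values and divide by
    the number of them; every category gets that same expected value."""
    mean = sum(observed) / len(observed)
    return [mean] * len(observed)


def expected_from_ratio(observed, ratio):
    """b) Genetics: share the observed total out according to the ratio."""
    if len(ratio) != len(observed):
        raise ValueError("the ratio needs one number per category")
    total = sum(observed)                 # i)   add up all the observed values
    parts = sum(ratio)                    # ii)  add up the numbers in the ratio
    one_part = total / parts              # iii) how much one part is
    return [one_part * r for r in ratio]  # iv)  one part times each ratio number


def degrees_of_freedom(observed):
    """Chi-squared: number of categories - 1."""
    return len(observed) - 1


# Hypothetical counts in three conditions (testing for a difference)
counts = [10, 25, 40]
print(expected_for_difference(counts))  # [25.0, 25.0, 25.0]
print(degrees_of_freedom(counts))       # 2

# Hypothetical phenotype counts checked against a 9:3:3:1 ratio
phenotypes = [315, 108, 101, 32]
print(expected_from_ratio(phenotypes, [9, 3, 3, 1]))  # [312.75, 104.25, 104.25, 34.75]
print(degrees_of_freedom(phenotypes))   # 3
```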
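Substituting into $\chi^2 = \sum (O - E)^2 / E$ and then applying the rule from "Interpreting results and drawing conclusions" (reject the null hypothesis only if the test statistic is greater than the critical value) might look like the sketch below. The observed counts are hypothetical. The critical values 5.991 (2 degrees of freedom) and 7.815 (3 degrees of freedom) are the usual 5% values from chi-squared tables; in the exam, always read the critical value from the tables you are given.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("need one expected value per observed value")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_null(test_statistic, critical_value):
    """Reject H0 only if the test statistic is greater than the critical value."""
    return test_statistic > critical_value


# a) Testing for a difference: expected = mean of the (hypothetical) counts
chi_diff = chi_squared([10, 25, 40], [25, 25, 25])
# 18.0 > 5.991, so reject H0: there is a difference between the conditions
print(round(chi_diff, 2), reject_null(chi_diff, 5.991))  # 18.0 True

# b) Genetics, 9:3:3:1: expected values shared out from the ratio
chi_gen = chi_squared([315, 108, 101, 32], [312.75, 104.25, 104.25, 34.75])
# 0.47 < 7.815, so accept H0: the results are as predicted by genetics
print(round(chi_gen, 2), reject_null(chi_gen, 7.815))  # 0.47 False
```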
t-test
There are two types of t-test, paired and unpaired. The exam will always make it clear which you should do. You will always be given the relevant formulae.
The hypotheses for both tests are:
H0: mean 1 = mean 2
H1: mean 1 ≠ mean 2
(This is a 2-tailed test - you may also come across 1-tailed tests, but in the exam you will never have to choose between the two.)

Calculations for the test formula
The calculations for either type of t-test are similar to those for finding means and standard deviations. You also need to be able to substitute into a formula. Provided you can do calculations like the ones on page 1, you will not have a problem with these. Remember, you will be given any formulae you require.

The paired t-test first requires you to find the differences between each pair of values. You then work with these differences only.
Paired t-test:
$$t = \frac{\bar{x}\sqrt{n - 1}}{s}$$
where $\bar{x}$ is the mean of the differences, $n$ is the number of pairs, and $s$ is the standard deviation of the differences.

In the unpaired t-test, you will need to use these formulae:
$$s = \sqrt{\frac{\sum x_1^2 - n_1\bar{x}_1^2 + \sum x_2^2 - n_2\bar{x}_2^2}{n_1 + n_2 - 2}}$$
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples, $n_1$ and $n_2$ are the sizes of the two samples, and Σ means "sum of".

Exam questions will get you to do these calculations bit by bit, and "follow through" marks are likely to be awarded - so if you calculate s wrongly, for example, but use your value correctly to calculate the value of t, then you will get the rest of the marks.

Calculator Tips:-
- To carry out any calculation that is set out as a fraction, you must put brackets round the top and round the bottom.
- It is probably easier to work out the number inside the square root first, then take the square root, rather than trying to do it all in one go.

Common mistakes
These are some of the commonest errors candidates make:-
- Rounding errors, due to rounding too early. If in doubt, use all the figures. It is useful to keep figures in your calculator, to avoid having to keep writing down and re-entering data. Learn how to use your calculator memory.
- Calculator errors - putting the correct figures into the calculator wrongly. See the calculator tips in this Factsheet and practise using your calculator well before the exam.
- Failure to show working - hence throwing away all the marks if there is even one tiny error in calculation.
- Failure to recall the formulae for degrees of freedom - these have to be learnt. If you get them wrong, they will invalidate your tables value and your conclusion.
- Not drawing conclusions correctly - you must learn that if your calculated value is larger than the tables value, you reject the null hypothesis.
- Getting the hypotheses the wrong way round - if your calculated result is greater than the tables value, then:
  - for the t-test, there is a difference between the means
  - for testing for a difference in chi-squared, there is a difference
  - for genetics chi-squared, the results are not as predicted by genetics
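The two t-test formulas from the t-test section can be checked with the Python sketch below. It assumes, as on page 1, that s is found by dividing by n ($\sum x^2 / n - \text{mean}^2$). With that convention, $\bar{x}\sqrt{n-1}/s$ gives the same value as the common textbook form $\bar{x}\sqrt{n}/s'$, where $s'$ divides by $n - 1$. The function names and data are hypothetical, and the code follows the calculator tip by bracketing the top and bottom of each fraction separately.

```python
import math


def sd_divide_by_n(values):
    """Standard deviation as on page 1: sqrt(sum of x^2 / n - mean^2)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum(x * x for x in values) / n - mean ** 2)


def paired_t(before, after):
    """Paired t-test: t = mean_diff * sqrt(n - 1) / s, using the differences only."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)                       # number of pairs
    mean_diff = sum(diffs) / n
    return mean_diff * math.sqrt(n - 1) / sd_divide_by_n(diffs)


def unpaired_t(sample1, sample2):
    """Unpaired t-test using the pooled s from the formula above."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    top = (sum(x * x for x in sample1) - n1 * m1 ** 2
           + sum(x * x for x in sample2) - n2 * m2 ** 2)  # brackets round the top...
    s = math.sqrt(top / (n1 + n2 - 2))                    # ...and round the bottom
    return (m1 - m2) / (s * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical paired readings on the same five subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 14]
print(round(paired_t(before, after), 3))           # 6.532

# Hypothetical independent samples. The sign of t only shows which mean is
# larger; for a 2-tailed test, compare the size of t with the tables value.
print(round(unpaired_t([4, 5, 6], [7, 8, 9]), 3))  # -3.674
```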