Introduction
No physical measurement is ever exactly correct. Thus we generally make many measurements of a quantity and use the mean as our best estimate of the true value. The question
which then needs to be answered is: what is the likelihood that the true value lies within some
specified range about our mean value? Thus an estimate of the error of any result is always
necessary.
Errors are not simply mistakes, like reading from the wrong end of a vernier or reading a
stop watch in tenths of a second when it is really graduated in fifths. Such mistakes have no
place in physics. Error in physics means uncertainty, which one can reduce by being very
careful but never eliminate entirely.
Errors are of two kinds: (1) Systematic and (2) Random.
Systematic Errors
Here the result is being altered in a regular determinable way by some unsuspected cause for
which no allowance has been made. Thus the room temperature may have altered during the
experiment without the experimenter being aware of the change. When he realizes that temperature variations are affecting his results, he can either (1) remove the cause (by thermostatic
control), (2) redesign his experiment to reduce the effect of the disturbing factor, (3) measure
the change in conditions and allow for this in calculating his new results or (4) convert the
systematic error into a random error by rearranging the work, e.g. repeating the experiment
at different times of the day and night.
Since the systematic errors may be constant or vary in some regular way with the value
measured, they are not revealed by repeating the experiment, but only become evident when
the conditions of the experiment are radically altered or when the physical property which
is being measured is determined in an entirely different way. For instance, the value of the
electronic charge e measured by Millikan's oil drop experiment was affected for many years
by an unsuspected variation in the viscosity of air. The error was only discovered later when
another determination of e became possible from x-ray measurements of the spacing of crystal
lattices.
To summarize, a systematic error acts always in the same direction for the same conditions.
It is not revealed by repeated experiments and is therefore difficult to spot. Much thought and
cunning is needed to design experiments or find corrections which will eliminate it.
Random Errors
These are residual, usually small, errors of uncertain origin and irregular occurrence. To be
absolutely sure of our conclusions we would have to repeat the experiment from scratch thousands of times, correcting each result for known systematic errors. Suppose for instance, we are
measuring the length of a bench about 3 metres long. Our results could then vary from 300.01
cm to 300.07 cm.
Between 300.00 and 300.01 we would have 0 readings.
Between 300.01 and 300.02 we might have 5 readings.
Between 300.02 and 300.03 we might have 50 readings, and so on.
From these results we construct a histogram (figure 1).
Figure 1: Histogram
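As an illustrative sketch (not part of the original exercise), the tallying of readings into 0.01 cm intervals can be done programmatically; the readings below are invented for the example:

```python
from collections import Counter

# Invented measurements of a bench about 300 cm long (illustrative only).
readings = [300.02, 300.03, 300.02, 300.05, 300.03, 300.04, 300.02]

# Tally the readings into 0.01 cm bins, as described in the text.
bins = Counter(round(r, 2) for r in readings)

# Print a crude text histogram, one row per 0.01 cm interval.
for value in sorted(bins):
    print(f"{value:.2f} cm: {'#' * bins[value]}")
```

Each row corresponds to one bar of the histogram in figure 1.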
If we increase the number of measurements and further subdivide the ranges of 0.01 cm
intervals into much smaller intervals we finally approximate to a continuous curve known as a
frequency distribution or limiting distribution (figure 2). The limiting distribution can never
be measured exactly but is rather a theoretical construct. The curve also tells us the relative
probability of any particular measurement and the spread of
these measurements around the true value. Thus it gives a quantitative idea of the accuracy
or reliability of the method of measurement.
The distribution curve is not necessarily symmetrical about the most frequent value, or
mode as it is called. If the curve is asymmetrical the mean or average value, $\bar{x} = \frac{1}{n}\sum x_i$, of all $n$
of the readings $x_i$, is not the same as the mode. Nevertheless unless we have sufficient readings
to prove asymmetry (which is not often) we usually assume that mean and mode are coincident
and the mean value represents the most probable or true value.
If we took a very large number of measurements, N , where N is allowed to approach infinity,
and computed the mean of these, we would expect the mean to be the true value. We will call
this true value the population mean, $\mu$, where
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad (1)$$
The spread of values in this population is characterised by the population standard deviation, $\sigma$, where
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad (2)$$
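Equations (1) and (2) translate directly into code; in this sketch the data values are purely illustrative:

```python
import math

data = [10.1, 10.2, 10.3, 10.2, 10.2]  # illustrative "population" of readings
N = len(data)

mu = sum(data) / N                            # equation (1): population mean
var = sum((x - mu) ** 2 for x in data) / N    # equation (2): population variance
sigma = math.sqrt(var)                        # population standard deviation

print(mu, sigma)
```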
However, we have to make do with a finite sample of n measurements. From this finite
sample we then wish to estimate the true value and also estimate the likelihood that the real
true value lies within some specified range of our estimate of the true value. We can calculate
for our finite sample the following two quantities:

Sample mean $\bar{x}$

Sample standard deviation, $s$, defined by
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (3)$$
Frequently the limiting distribution is the normal (Gaussian) distribution,
$$y = A \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right] \qquad (4)$$
For the normal distribution, 68% of observations lie between $\mu \pm \sigma$, 95% between $\mu \pm 2\sigma$,
and 99% between $\mu \pm 2.6\sigma$ (figure 4). The percentages for other values may be found from
tables.
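These percentages can be checked directly rather than from tables: for a normal distribution, the fraction of observations within $\pm k\sigma$ of the mean is $\mathrm{erf}(k/\sqrt{2})$. A quick standard-library check:

```python
import math

def fraction_within(k):
    """Fraction of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# The three values quoted in the text: 1, 2 and 2.6 standard deviations.
for k in (1, 2, 2.6):
    print(f"within \u00b1{k} sigma: {fraction_within(k):.1%}")
```

This reproduces the 68%, 95% and 99% figures quoted above.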
Statistical theory predicts that the standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$, called the standard
error of the mean (SE). In practice, of course, we do not know $\sigma$ and so we use the standard
deviation of our sample, $s$, as an estimate of $\sigma$, and $s/\sqrt{n}$ as an estimate of the standard error
of the mean.
For a normally distributed quantity, any value picked at random will have a 68% chance of
lying within one standard deviation of the mean. As the sample means themselves are normally
distributed, with a standard deviation (of the means) equal to the standard error, there is a
68% chance that our sample mean lies within one standard error of the true mean. Conversely
there is a 68% chance that the true value lies within one standard error of our sample mean.
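The claim that sample means scatter with standard deviation $\sigma/\sqrt{n}$ can be illustrated by a small simulation; the population parameters below are arbitrary:

```python
import math
import random
import statistics

random.seed(1)  # make the sketch reproducible
sigma, n, trials = 2.0, 10, 5000

# Draw many samples of size n from a normal population and record each sample mean.
means = [statistics.mean(random.gauss(100.0, sigma) for _ in range(n))
         for _ in range(trials)]

spread = statistics.stdev(means)        # observed SD of the sample means
print(spread, sigma / math.sqrt(n))     # the two numbers should be close
```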
Example: 5 independent determinations of the mass, m, in grams of a piece of brass are:

    $x_i$            $\bar{x} - x_i$    $(\bar{x} - x_i)^2$
    $x_1 = 10.13$    $0.04$             $16 \times 10^{-4}$
    $x_2 = 10.20$    $-0.03$            $9 \times 10^{-4}$
    $x_3 = 10.18$    $-0.01$            $1 \times 10^{-4}$
    $x_4 = 10.17$    $0$                $0$
    $x_5 = 10.17$    $0$                $0$

The mean: $\bar{x} = \frac{1}{5}\sum_{i=1}^{5} x_i = 10.17$

The sample standard deviation: $s = \sqrt{\frac{1}{5-1}\sum_{i=1}^{5}(x_i - \bar{x})^2} = 0.0255$

The standard error of the mean: $\mathrm{SE} = s/\sqrt{5} = 0.0114$

The 95% confidence limit is $2 \times \mathrm{SE} = 0.023$, assuming a normal distribution.

The result is then correctly given as:

$m = (10.17 \pm 0.02)$ g (95% confidence limit)
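The arithmetic in this example can be checked with a short Python sketch; note that statistics.stdev uses the $n-1$ divisor, consistent with the sample standard deviation used here:

```python
import math
import statistics

masses = [10.13, 10.20, 10.18, 10.17, 10.17]  # grams, from the example above

mean = statistics.mean(masses)
s = statistics.stdev(masses)          # sample standard deviation (n-1 divisor)
se = s / math.sqrt(len(masses))       # standard error of the mean
limit95 = 2 * se                      # 95% confidence limit, assuming normality

print(f"m = ({mean:.2f} \u00b1 {limit95:.2f}) g (95% confidence limit)")
```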
it is best to observe the range for a few seconds and estimate the most likely value and
uncertainty.
3. If only one determination of a physical quantity can be made in the time available (or
for some other reason) and if the accuracy of the measuring device is unknown (e.g. the
measurement of the length of a brass rod with a cheap ruler), then a reasonable guess for
the accuracy is the limit of reading or the smallest graduated division.
Combining Errors
If two quantities $x_1$ and $x_2$ with standard deviations $s_1$ and $s_2$ are combined in a formula
$x = x_1 + x_2$ (or $x = x_1 - x_2$), then the variance $s^2$ of $x$ is given by
$$s^2 = s_1^2 + s_2^2 \qquad (5)$$
If instead the quantities are combined as $x = x_1 x_2$ (or $x = x_1/x_2$), the fractional errors add
in quadrature:
$$\frac{\Delta x}{x} = \sqrt{\frac{(\Delta x_1)^2}{x_1^2} + \frac{(\Delta x_2)^2}{x_2^2}} \qquad (8)$$
In 1st year practical classes you used a simplification of these formulae - directly adding
the errors or fractional errors. This simplification gives too large a result - it supposes that
all errors are in the same sense. In truth, the errors are as likely to cancel as to add. The
error distribution of a sum or difference is the convolution of the error distributions of the
contributing quantities; the convolution of two normal distributions with standard deviations
$\sigma_1$ and $\sigma_2$ is a normal distribution with $\sigma = \sqrt{\sigma_1^2 + \sigma_2^2}$. From this result, the above formulae follow.
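A minimal sketch of these combination rules in code (the function names are my own, not from the notes):

```python
import math

def error_of_sum(s1, s2):
    """SD of x1 + x2 (or x1 - x2): the errors add in quadrature, equation (5)."""
    return math.sqrt(s1**2 + s2**2)

def fractional_error_of_product(x1, s1, x2, s2):
    """Fractional SD of x1*x2 (or x1/x2): the fractional errors add in quadrature."""
    return math.sqrt((s1 / x1)**2 + (s2 / x2)**2)

print(error_of_sum(3.0, 4.0))                         # 5.0, not 7.0 as naive addition gives
print(fractional_error_of_product(10.0, 0.1, 5.0, 0.1))
```

The first example shows why the quadrature rule gives a smaller (and more realistic) error than directly adding the errors.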
Graphing
To determine the experimental error we can apply two approaches. Either we repeat the
measurements so we can use the statistical analysis described previously or we take data over
a range of some parameter we can easily control. Once we have such a data series we can plot
that and fit the data to whatever relationship theory predicts. Generally we will plot our data
in such a way that we have a linear relationship and fit the data to a straight line. This might
take some manipulation of the measured quantities before we can plot them.
For example when examining the penetration of gamma radiation through matter we have
the relationship,
$$I = I_0 e^{-\mu x} \qquad (9)$$
I is the intensity of the radiation (measured by the number of counts we detect), x is the
thickness of the particular material the radiation is passing through, and $\mu$ is the attenuation
coefficient for the material. We can determine $\mu$ by plotting the right graph. Plotting the
natural logarithm of the number of counts, ln(I), on the y-axis and x on the x-axis, the data
should be linear, according to the theory.
When plotting a straight line graph the best approach is to fit a straight line to your data
using the method of least squares. In this procedure the line is chosen so that the sum of the
squares of the distances (in the y-direction) from each point to the line is minimized. There
is an analytic expression that can be used to calculate the slope and y-intercept of the line
of best fit this way. This is done when you add a trendline in EXCEL for example. It won't
be necessary to calculate this from first principles; you can just use the built-in functions in
EXCEL (or whatever graphing package you prefer). The error in the slope of the line and the
y-intercept can also be determined from how far away the points are from the fitted line. Every
time you plot a straight line graph you must include a measure of the uncertainty of the fit.
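Outside EXCEL, the same least-squares fit, including uncertainties in the slope and intercept, takes only a few lines with numpy; the data points here are illustrative, not from the experiment:

```python
import numpy as np

# Illustrative (x, y) data lying near a straight line.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([7.61, 7.56, 7.53, 7.49, 7.44, 7.39])

# Fit y = m*x + c; cov=True also returns the covariance matrix of (m, c).
(m, c), cov = np.polyfit(x, y, 1, cov=True)
m_err, c_err = np.sqrt(np.diag(cov))   # standard errors of slope and intercept

print(f"slope = {m:.4f} \u00b1 {m_err:.4f}")
print(f"intercept = {c:.4f} \u00b1 {c_err:.4f}")
```

The square roots of the diagonal covariance entries are the uncertainty of the fit that must accompany every straight-line graph.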
Error Exercises
Part 1: Statistical analysis of sample data
Look at the data sheet provided. This was taken by measuring background counts with a
Geiger-Muller tube and a counter. There are 200 background counts, each taken over a 30
second period. This data will comprise your population. The population has been divided into
twenty samples, each of ten readings, by considering each consecutive series of ten readings to
be a sample.
Using the spreadsheet program EXCEL carry out the following analysis. (Look at the
separate EXCEL help notes if you require assistance with using a spreadsheet.)
1. For each sample determine: the mean, standard deviation, standard error, 68% and
95% confidence limits. Use the functions AVERAGE (mean) and STDEV (standard
deviation). Then apply the required formula given in these notes for the remaining
quantities. (2 marks)
2. For the population determine the mean and standard deviation. What proportion of
your population lies within one and two standard deviations from the mean? Use the
FREQUENCY function to sort your data. (1 mark)
3. Plot a histogram for your population (a column graph using your sorted data; show
each count value as a separate column, don't group them together, i.e. bins of 1). Plot on
this histogram the normal curve corresponding to your population mean and standard
deviation (use equation 4 to generate the data; you can put any convenient value for A.
Add this as another series to your chart and then select it and change the chart type to
xy scatter). How well does the normal distribution describe your data? (2 marks)
4. Determine the proportion of your sample means which lie within one and two standard
errors from the population mean. (Note: Each sample has a different standard error.)
Comment on how well the sample means and sample standard deviations estimate the
population mean and standard deviation. (1 mark)
ln(C) (y-axis): 7.605, 7.558, 7.528, 7.489, 7.441, 7.394, 7.370