Lectures
MODULE 25 STATISTICS II
1. Mean and standard error of sample data
2. Binomial distribution
3. Normal distribution
4. Sampling
5. Confidence intervals for means
6. Hypothesis testing
Ex 1. A new plant at a manufacturing site has to be installed and then commissioned. The times required
for the two steps depend upon different random factors, and can therefore be regarded as independent. Based
on past experience the respective distributions for X (installation time) and Y (commissioning time), both
in days, are
P (X = 3) = 0.2, P (X = 4) = 0.5, P (X = 5) = 0.3
P (Y = 2) = 0.4, P (Y = 3) = 0.6
Find the joint distribution for X and Y , and the probability that the total time will not exceed 6 days.
Since the factors are independent P (X = xi ∩ Y = yj ) = P (X = xi ) P (Y = yj ). Thus the joint probability
table is
Y \ X      3           4           5                 Y \ X    3      4      5
  2    0.2 × 0.4   0.5 × 0.4   0.3 × 0.4    i.e.      2     0.08   0.20   0.12
  3    0.2 × 0.6   0.5 × 0.6   0.3 × 0.6              3     0.12   0.30   0.18
Note that the column and row totals give the individual distributions for X and Y .
For the second part of the question
P (X + Y ≤ 6) = P (X = 3 ∩ Y = 2) + P (X = 3 ∩ Y = 3) + P (X = 4 ∩ Y = 2) :
no other combinations are possible. The joint probabilities can then be read from the table, giving
P (X + Y ≤ 6) = 0.08 + 0.12 + 0.20 = 0.40 .
It would be easy to calculate P (X + Y = wi) for each wi, and then evaluate E(X + Y ) using the expression

E(X + Y ) = Σ_{wi} wi P (X + Y = wi) .

However, we can use a more general result. Given that E(X) = Σ_{xi} xi P (X = xi), and that E(Y ) satisfies
a similar formula, it can be shown that for independent variables

E(X + Y ) = E(X) + E(Y )  and  Var(X + Y ) = Var(X) + Var(Y ).
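As a numerical check, the joint table of Ex 1 and the additivity of E and Var can be verified directly; the following Python sketch (names are illustrative, not part of the notes) enumerates the joint distribution:

```python
from itertools import product

# Distributions from Ex 1: installation time X and commissioning time Y (days).
pX = {3: 0.2, 4: 0.5, 5: 0.3}
pY = {2: 0.4, 3: 0.6}

# Joint distribution under independence: P(X=x, Y=y) = P(X=x) P(Y=y).
joint = {(x, y): pX[x] * pY[y] for x, y in product(pX, pY)}

# Probability that the total time does not exceed 6 days.
p_total_le_6 = sum(p for (x, y), p in joint.items() if x + y <= 6)

def mean(dist):
    """E(V) = sum of v * P(V = v)."""
    return sum(v * p for v, p in dist.items())

def var(dist):
    """Var(V) = sum of (v - E(V))^2 * P(V = v)."""
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

# Distribution of the sum W = X + Y, built from the joint table.
pW = {}
for (x, y), p in joint.items():
    pW[x + y] = pW.get(x + y, 0.0) + p

print(round(p_total_le_6, 2))                              # 0.4
print(round(mean(pW), 4), round(mean(pX) + mean(pY), 4))   # both 6.7
print(round(var(pW), 4), round(var(pX) + var(pY), 4))      # both 0.73
```

The last two lines confirm E(X + Y ) = E(X) + E(Y ) and Var(X + Y ) = Var(X) + Var(Y ) for this pair of independent variables.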
Given some data we usually do not know the exact distribution, but it is useful to estimate
the mean and variance from the data.
Def. For a sample {X1, X2, ..., Xn} of data the sample average is defined by X̄ = (1/n) Σ_{i=1}^{n} Xi .

Def. For sample data {X1, X2, ..., Xn} the sample variance is defined by SX² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² .
[Note that n − 1 is used in the denominator of SX² because the differences Xi − X̄ sum to zero and so are
not independent.]
Ex 2. A die was tossed six times producing the following results: 6, 2, 4, 2, 1, 5 . Find the sample average
and the sample variance.
X̄ = (1/6)(6 + 2 + 4 + 2 + 1 + 5) = 20/6 = 10/3 ∼ 3.33

SX² = (1/(6−1)) [ (6 − 10/3)² + (2 − 10/3)² + (4 − 10/3)² + (2 − 10/3)² + (1 − 10/3)² + (5 − 10/3)² ]
    = (1/5) [ (8/3)² + (−4/3)² + (2/3)² + (−4/3)² + (−7/3)² + (5/3)² ]
    = (1/5) (64 + 16 + 4 + 16 + 49 + 25)/9 = 174/45 = 58/15 ∼ 3.87  →  SX ∼ 1.97
[For an unbiased die it can be shown that the theoretical values of X̄ and SX for a large number of tosses
are 3.5 and 1.708 respectively.]
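The two definitions can be checked numerically. The sketch below reproduces Ex 2 with exact fractions; note that Python's `statistics.variance` uses the same n − 1 divisor as the definition above:

```python
import statistics
from fractions import Fraction

data = [6, 2, 4, 2, 1, 5]  # the six die tosses from Ex 2

# Sample average: Xbar = (1/n) * sum of the X_i.
xbar = Fraction(sum(data), len(data))

# Sample variance with the n-1 divisor, as in the definition above.
s2 = sum((Fraction(x) - xbar) ** 2 for x in data) / (len(data) - 1)

print(xbar, s2)                     # 10/3 58/15
print(round(float(s2) ** 0.5, 2))   # 1.97  (the sample standard deviation SX)

# statistics.variance follows the same n-1 convention.
assert abs(statistics.variance(data) - float(s2)) < 1e-9
```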
2. Binomial distribution
Specifying the exact distribution of a random variable requires a lot of information. Good estimates of mean,
variance etc. can be obtained from data. Often probability distributions can be determined by formulae
using the estimated values of the parameters.
Consider the simple coin tossing experiment where only two outcomes are possible – success (say 1) or failure
(say 0). A Bernoulli trial is a simple observation of a random variable X , say, that can take the values
1 or 0: suppose
P (X = 1) = p, P (X = 0) = 1 − p .
Then
E(X) = X̄ = 1(p) + 0(1 − p) = p + 0 = p ,
and Var(X) = E(X − X̄)² = E(X − p)² = (0 − p)² P (X = 0) + (1 − p)² P (X = 1)
= p²(1 − p) + (1 − p)² p = p(1 − p)(p + (1 − p)) = p(1 − p) .
Now let {X1 , ..., Xn } denote n independent Bernoulli trials, each with success probability p . Then
number of successes Y = X1 + X2 + . . . + Xn . Suppose Y = k , where 0 ≤ k ≤ n , then k of the Xi values
equal 1 and n − k equal 0. The probability of k 1’s at first with n − k 0’s following is pk (1 − p)n−k , since
the outcomes are independent.
The number of ways of distributing k successes among n trials is C(n, k) = n! / ((n − k)! k!) ,

hence P (Y = k) = C(n, k) p^k (1 − p)^(n−k) .
This is the general form of the binomial distribution.
Ex 3. If on average 1 in 20 of a certain type of column fails under loading, what is the probability that
among 16 such columns at most 2 will fail?
Given

P (column failing) = P (F ) = 1/20 = 0.05 ,   P (column not failing) = P (F̄ ) = 19/20 = 0.95 ,

therefore

P (0 F ) = (0.95)^16 = 0.44013 ,
P (1 F ) = C(16, 1)(0.05)^1 (0.95)^15 = (16)(0.05)(0.46329) = 0.37063 ,
P (2 F ) = C(16, 2)(0.05)^2 (0.95)^14 = ((16)(15)/2)(0.0025)(0.48767) = 0.14630 ,

→ P (at most 2 fail) = 0.44013 + 0.37063 + 0.14630 = 0.9571
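The binomial formula above is easy to evaluate directly; this short Python sketch reproduces the answer to Ex 3:

```python
from math import comb

def binom_pmf(n, k, p):
    """P(Y = k) for Y ~ Binomial(n, p): C(n, k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 16, 0.05  # 16 columns, failure probability 1/20

# P(at most 2 fail) = P(0 F) + P(1 F) + P(2 F)
p_at_most_2 = sum(binom_pmf(n, k, p) for k in range(3))
print(round(p_at_most_2, 4))  # 0.9571
```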
3. Normal distribution
This occurs very frequently in practice.
Def. A continuous random variable X has a normal distribution (or Gaussian distribution) with
mean µX and variance σX² if the probability density function satisfies

fX (x) = (1/(σX √(2π))) exp( −(x − µX)² / (2σX²) )   (−∞ < x < ∞) .

Write X ∼ N (µX , σX²) to represent a random variable with normal distribution which has mean µX and
variance σX² (see figure 3).
[figure 3: density curves fX for µX = 0, σX = 1 and for µX = 0, σX = 3]
The standard normal distribution has mean 0 and variance 1, and leads to

Def. The standard normal cumulative distribution is Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^(−x²/2) dx .
Φ(z) is usually tabulated, and a summary of values appears on the Formula Sheet. Note that Φ(z) denotes
the area under the probability distribution curve to the left of x = z.
Suppose that X is a normal variable with mean µX and variance σX²; then it can be shown that

Z = (X − µX)/σX  is also a normal variable with mean 0 and variance 1.
(The latter result is very useful in applications.)
Ex 4. The burning time X of an experimental rocket is a random variable having (approximately) a
normal distribution with mean 600s and standard deviation 25s. Find the probability that such a rocket
will burn for (a) less than 550s, (b) more than 637.5s.
(a) Given µX = 600 and σX = 25 , therefore

P (X < 550) = P ( (X − 600)/25 < (550 − 600)/25 ) = P (Z < −2) ,

where Z = (X − 600)/25 . From symmetry (look at figure 3), and then using the results on the Formula Sheet,

P (Z < −2) = P (Z > 2) = 1 − Φ(2) = 1 − 0.9772 = 0.0228 .

(b) Similarly, P (X > 637.5) = P (Z > (637.5 − 600)/25) = P (Z > 1.5) = 1 − Φ(1.5) = 1 − 0.9332 = 0.0668 .
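In place of tables, Φ can be evaluated from the error function, since Φ(z) = (1 + erf(z/√2))/2. The following sketch checks the standardisation used in Ex 4:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal cdf via the error function: Phi(z) = (1 + erf(z/sqrt(2)))/2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu, sigma = 600.0, 25.0  # burning time: mean 600 s, standard deviation 25 s

# (a) P(X < 550) = Phi((550 - 600)/25) = Phi(-2)
print(round(Phi((550 - mu) / sigma), 4))      # 0.0228

# (b) P(X > 637.5) = 1 - Phi((637.5 - 600)/25) = 1 - Phi(1.5)
print(round(1 - Phi((637.5 - mu) / sigma), 4))  # 0.0668
```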
4. Sampling
One of the major problems in statistics, estimating the properties of a large population from the properties
of a sample of individuals chosen from that population, is considered in this section.
Select at random a sample of n observations X1 , X2 , . . . , Xn taken from a population. From these n
observations you can calculate the values of a number of statistical quantities, for example the sample mean
X. If you choose another random sample of size n from the same population, a different value of the statistic
will, in general, result. In fact, if repeated random samples are taken, you can regard the statistic itself as a
random variable, and its distribution is called the sampling distribution of the statistic.
For example, consider the distribution of heights of all adult men in England, which is known to conform
very closely to the normal curve. Take a large number of samples of size four, drawn at random from the
population, and calculate the mean height of each sample. How will these mean heights be distributed? We
find that they are also normally distributed – about the same mean as the original distribution. However,
a random sample of four is likely to include men both above and below average height and so the mean of
the sample will deviate from the true mean less than a single observation will. This important general result
can be stated as follows:
If random samples of size n are taken from a distribution whose mean is µX and whose standard
deviation is σX , then the sample means form a distribution with mean µX and standard
deviation σX̄ = σX /√n .
Note that the theorem holds for all distributions of the parent population. However, if the parent distribution
is normal then it can be shown that the sampling distribution of the sample mean is also normal.
The standard deviation of the sample mean, σX defined above, is usually called the standard error of
the sample mean.
Let us now present three worked examples.
Ex 5. A random sample is drawn from a population with a known standard deviation of 2.0. Find the
standard error of the sample mean if the sample is of size (i) 9, (ii) 100. What sample size would give a
standard error equal to 0.5?
Using the result stated earlier,

(i) standard error = σX /√n = 2/√9 = 0.667, to 3 decimal places,

(ii) standard error = σX /√n = 2/√100 = 0.2 .

If the standard error equals 0.5, then 2/√n = 0.5 . Squaring then implies that 4/n = 0.25, or n = 16 (i.e.
the sample size is 16).
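The standard-error formula and the sample-size calculation in Ex 5 can be packaged as two small helper functions (the names are illustrative):

```python
from math import ceil, sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

def sample_size(sigma, target_se):
    """Smallest integer n with sigma/sqrt(n) <= target_se."""
    return ceil((sigma / target_se) ** 2)

sigma = 2.0  # known population standard deviation, as in Ex 5
print(round(standard_error(sigma, 9), 3))    # 0.667
print(round(standard_error(sigma, 100), 3))  # 0.2
print(sample_size(sigma, 0.5))               # 16
```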
Ex 6. The diameters of shafts made by a certain manufacturing process are known to be normally dis-
tributed with mean 2.500 cm and standard deviation 0.009 cm. What is the distribution of the sample mean
diameter of nine such shafts selected at random? Calculate the percentage of such sample means which can
be expected to exceed 2.506 cm.
Since the process is normal we know that the sampling distribution of the sample mean will also be normal,
with the same mean, 2.500 cm, but with a standard error (or standard deviation) σX̄ = 0.009/√9 = 0.003 cm.
In order to calculate the probability that the sample mean is bigger than 2.506, i.e. X̄ > 2.506, we standardise
in the usual way by putting Z = (X̄ − 2.500)/0.003, and then

P (X̄ > 2.506) = P ( (X̄ − 2.500)/0.003 > (2.506 − 2.500)/0.003 ) = P (Z > 2.0)
             = 1 − P (Z ≤ 2.0) = 1 − Φ(2.0)
             = 1 − 0.9772 = 0.0228, using the Formula Sheet.
Hence, 2.28% of the sample means can be expected to exceed 2.506 cm.
Ex 7. What is the probability that an observed value of a normally distributed random variable lies within
one standard deviation from the mean?
The given normally distributed random variable X has mean µX and standard deviation σX , i.e. X ∼
N (µX , σX²). We need to calculate P (µX − σX ≤ X ≤ µX + σX). Define Z = (X − µX)/σX ; then Z ∼ N (0, 1).
It follows that

P (µX − σX ≤ X ≤ µX + σX) = P ( ((µX − σX) − µX)/σX ≤ (X − µX)/σX ≤ ((µX + σX) − µX)/σX )
  = P (−1 ≤ Z ≤ 1) = 2 P (0 ≤ Z ≤ 1), by symmetry
  = 2(Φ(1) − Φ(0)) = 2(0.8413 − 0.5000)
  = 2(0.3413) = 0.6826
It was stated above that when the parent distribution is normal then the sampling distribution of the
sample mean is also normal. When the parent distribution is not normal, we obtain the following (perhaps
surprising) theorem:
Central limit theorem If a random sample of size n, (n ≥ 30), is taken from ANY distribution with
mean µX and standard deviation σX , then the sampling distribution of X̄ is approximately normal with
mean µX and standard deviation σX /√n , the approximation improving as n increases.
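The theorem can be illustrated by simulation. The sketch below (an illustration, not part of the notes) draws repeated samples of size 36 from a decidedly non-normal parent, the uniform distribution on [0, 1], and checks that the sample means cluster around the parent mean with standard deviation close to σX /√n:

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Parent: uniform on [0, 1], with mean 0.5 and standard deviation sqrt(1/12).
mu, sigma = 0.5, (1 / 12) ** 0.5

n = 36          # sample size (n >= 30, so the CLT applies)
trials = 20000  # number of repeated samples

means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

# The sample means have mean ~ mu and standard deviation ~ sigma/sqrt(n).
print(round(statistics.fmean(means), 3))   # close to 0.5
print(round(statistics.stdev(means), 3))   # close to sigma/6 ~ 0.048
```

A histogram of `means` would look bell-shaped even though the parent distribution is flat.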
Ex 8. It is known that a particular make of light bulb has an average life of 800 hrs with a standard
deviation of 48 hrs. Find the probability that a random sample of 144 bulbs will have an average life of less
than 790 hrs.
Since the number of bulbs in the sample is large, the sample mean will be normally distributed with mean
µX = 800 and standard error σX̄ = 48/√144 = 4 . Put Z = (X̄ − µX)/σX̄ = (X̄ − 800)/4 ; then

P (X̄ < 790) = P ( (X̄ − 800)/4 < (790 − 800)/4 )
            = P (Z < −2.5) = P (Z > 2.5), by symmetry
            = 1 − P (Z ≤ 2.5) = 1 − 0.9938 = 0.0062 .
To conclude this section the main results concerning the distribution of the sample mean X are summarised.
Consider a parent population with mean µX and standard deviation σX . From this population take a
random sample of size n with sample mean X̄ and standard error σX /√n . Define Z = (X̄ − µX)/(σX /√n) ; then
(i) for all n — the distribution of Z is N (0, 1) if the distribution of the parent population is normal;
(ii) n < 30 — the distribution of Z is approximately N (0, 1) if the distribution of the parent population is
approximately normal;
(iii) n ≥ 30 — the distribution of Z is a good approximation to N (0, 1) for all distributions of the parent
population.
5. Confidence intervals for means
Suppose that, for a given sample mean X̄, a number k can be found such that

P (X̄ − k ≤ µX ≤ X̄ + k) = 0.95 ;

then the interval is called the 95% confidence interval – if the interval is calculated for very many samples
then 95 out of 100 intervals would contain µX . On replacing 95 by 99 you obtain the definition of a 99%
confidence interval.
To proceed further, assume that the standard deviation σX is known and that Z = (X̄ − µX)/(σX /√n) is
distributed with the standard normal distribution N (0, 1) (the conditions for this to hold were stated at the
end of section 4).
section 4). The results on the Formula Sheet show that 95% of the standard normal distribution lies between
−1.96 and 1.96. Hence
P (−1.96 ≤ Z ≤ 1.96) = 0.95 .
Rearranging the inequalities inside the bracket to obtain conditions on µX yields

Z = (X̄ − µX)/(σX /√n) ≤ 1.96  →  X̄ − µX ≤ 1.96 σX /√n  →  µX ≥ X̄ − 1.96 σX /√n ;
Z = (X̄ − µX)/(σX /√n) ≥ −1.96  →  X̄ − µX ≥ −1.96 σX /√n  →  µX ≤ X̄ + 1.96 σX /√n .

It follows that the earlier expression can be re-written as

P ( X̄ − 1.96 σX /√n ≤ µX ≤ X̄ + 1.96 σX /√n ) = 0.95 ,

which gives the 95% confidence interval for µX .
Ex 9. The percentage of copper in a certain chemical is to be estimated by taking a series of measure-
ments on randomly chosen small quantities of the chemical and using the sample mean to estimate the true
percentage. From previous experience individual measurements of this type are known to have a standard
deviation of 2.0%. How many measurements must be made so that the standard error of the estimate is
less than 0.3%? If the sample mean w of 45 measurements is found to be 12.91%, give a 95% confidence
interval for the true percentage, ω .
Assume that n measurements are made. The standard error of the sample mean is (2/√n)%. For the
required precision we require 2/√n < 0.3 , i.e. n > (2/0.3)² = 4/0.09 = 44.4 . Since n must be an integer, at
least 45 measurements are necessary for the required precision.
With a sample of 45 measurements, you can use the central limit theorem and take the sample mean
percentage W to be distributed normally with mean ω and standard error 2/√45 . Hence, if ω is the true
percentage, it follows that Z = (W − ω)/(2/√45) is distributed as N (0, 1). Since 95% of the area under the
standard normal curve lies between Z = −1.96 and Z = 1.96 ,

P ( −1.96 ≤ (W − ω)/(2/√45) ≤ 1.96 ) = 0.95 .

Re-arranging, we obtain  P ( W − 1.96 (2/√45) ≤ ω ≤ W + 1.96 (2/√45) ) = 0.95 .
Hence, the 95% confidence interval for the true percentage is

12.91 − 1.96 (2/√45) ≤ ω ≤ 12.91 + 1.96 (2/√45) ,  i.e.  12.33% ≤ ω ≤ 13.49% (to 2 decimal places).
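The interval of Ex 9 is a one-line computation once the general formula is written down; a small Python sketch:

```python
from math import sqrt

def confidence_interval(xbar, sigma, n, z=1.96):
    """Interval xbar +/- z * sigma/sqrt(n); z = 1.96 gives the 95% interval
    when the standardised sample mean is (approximately) N(0, 1)."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Ex 9: sample mean 12.91%, known standard deviation 2.0%, 45 measurements.
lo, hi = confidence_interval(12.91, 2.0, 45)
print(round(lo, 2), round(hi, 2))  # 12.33 13.49
```

Replacing 1.96 by 2.576 would give the corresponding 99% interval.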
6. Hypothesis testing
An assumption made about a population is called a statistical hypothesis. From information contained
in a random sample we try to decide whether or not the hypothesis is true:
if evidence from the sample is inconsistent with the hypothesis, then hypothesis is rejected;
if the evidence is consistent with the hypothesis, then the hypothesis is accepted.
The hypothesis being tested is called the null hypothesis (usually denoted by H0 ) – it either specifies a
particular value of the population parameter or specifies that two or more parameters are equal.
A contrary assumption is called the alternative hypothesis (usually denoted by H1 ) – normally specifies
a range of values for the parameter.
A common example of the null hypothesis is H0 : µX = µ0 . Then three alternative hypotheses are

(i) H1 : µX > µ0 ;  (ii) H1 : µX < µ0 ;  (iii) H1 : µX ≠ µ0 .
Types (i) and (ii) are said to be one-sided (or one-tailed, see figure 6b) – type (iii) is two-sided (or
two-tailed, see figure 6a).
The result of a test is a decision to choose H0 or H1 . This decision is subject to uncertainty, and two types
of error are possible:
(i) a type I error occurs when we reject H0 on the basis of the test although it happens to be true – the
probability of this happening is called the level of significance of the test and this is prescribed before
testing – most commonly chosen values are 5% or 1%.
(ii) a type II error occurs when you accept the null hypothesis on the basis of the test although it happens
to be false.
The above ideas are now applied to determine whether or not the mean, X̄, of a sample is consistent with
a specified population mean µ0 . The null hypothesis is H0 : µX = µ0 and a suitable statistic to use is

Z = (X̄ − µ0)/(σX /√n) , where σX is the standard deviation of the population and n is the size of the sample.

The range of values of Z for which the null hypothesis would be accepted is known as the acceptance
region for the test; it depends on the pre-determined significance level and the choice of H1 . The
corresponding range of values of Z for which H0 is rejected (i.e. not accepted) is called the rejection
region.
Ex 10. A standard process produces yarn with mean breaking strength 15.8 kg and standard deviation
1.9 kg. A modification is introduced and a sample of 30 lengths of yarn produced by the new process is tested
to see if the breaking strength has changed. The sample mean breaking strength is 16.5 kg. Assuming the
standard deviation is unchanged, is it correct to say that there is no change in the mean breaking strength?
Here H0 : µX = µ0 , H1 : µX ≠ µ0 ,
where µ0 = 15.8 and µX is the mean breaking strength for the new process.
If H0 is true (i.e. µX = µ0 ), then Z = (X̄ − µ0)/(σX /√n) has approximately the N (0, 1) distribution, where
X̄ is the mean breaking strength of the 30 sample values and n = 30 .
At the 5% significance level there is a rejection region of 2.5% in each tail, as shown in figure 6a (since, under
H0 ,
P (Z < −1.96) = P (Z > 1.96) = 1 − P (Z ≤ 1.96) = 1 − Φ(1.96) = 0.025, i.e. 2.5%).

[Figure 6a: two-tailed test, with rejection regions of 2.5% below Z = −1.96 and 2.5% above Z = 1.96]

Here Z = (16.5 − 15.8)/(1.9/√30) = 0.7/0.3469 ≈ 2.02 > 1.96 , so the observed value lies in the rejection
region: H0 is rejected at the 5% level, and it is not correct to say that there is no change in the mean
breaking strength.
Let us now consider a slightly differently worded question. Suppose the modification was specifically designed
so as to increase the strength of the yarn. In this case
H0 : µX = µ0 , H1 : µX > µ0 ,
and H0 is rejected if the value of Z is unreasonably large. In this situation the test is one-sided and
acceptance and (one-tailed) rejection regions at the 5% significance level are shown below.
[Figure 6b: one-tailed test at the 5% level, with acceptance region Z ≤ 1.64 and rejection region (5%) Z > 1.64]

Since the observed value Z = (16.5 − 15.8)/(1.9/√30) ≈ 2.02 exceeds 1.64, H0 is again rejected at the 5%
level: there is evidence that the modification has increased the mean breaking strength.
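The test of Ex 10 can be reproduced in a few lines; the function name is illustrative:

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """Test statistic Z = (xbar - mu0) / (sigma / sqrt(n)) for H0: mu = mu0."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Ex 10: sample mean 16.5 kg, mu0 = 15.8 kg, sigma = 1.9 kg, n = 30 lengths.
z = z_statistic(16.5, 15.8, 1.9, 30)
print(round(z, 2))  # 2.02

# Two-sided test at the 5% level: reject H0 if |Z| > 1.96.
print(abs(z) > 1.96)  # True -> reject H0: the mean strength has changed

# One-sided test at the 5% level: reject H0 if Z > 1.645.
print(z > 1.645)      # True -> reject H0: the mean strength has increased
```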