
FACULTY OF MATHEMATICAL STUDIES

MATHEMATICS FOR PART I ENGINEERING

Lectures

MODULE 25 STATISTICS II
1. Mean and standard error of sample data
2. Binomial distribution
3. Normal distribution
4. Sampling
5. Confidence intervals for means
6. Hypothesis testing

1. Mean and standard error of sample data


Two different random variables can be measured for the same object: e.g. the height and weight of a person.
Both these random variables have distributions, mean values and variances. However, taller people are
usually heavier than shorter people so these two variables are dependent.
Variables can also be independent: independent events → P (A ∩ B) = P (A) P (B)
independent discrete random variables → P (X = xi ∩ Y = yj ) = P (X = xi ) P (Y = yj )
The latter defines a joint distribution for the two random variables.

Ex 1. A new plant at a manufacturing site has to be installed and then commissioned. The times required
for the two steps depend upon different random factors, and can therefore be regarded as independent. Based
on past experience the respective distributions for X (installation time) and Y (commissioning time), both
in days, are
P (X = 3) = 0.2, P (X = 4) = 0.5, P (X = 5) = 0.3
P (Y = 2) = 0.4, P (Y = 3) = 0.6
Find the joint distribution for X and Y , and the probability that the total time will not exceed 6 days.
Since the factors are independent P (X = xi ∩ Y = yj ) = P (X = xi ) P (Y = yj ). Thus the joint probability
table is
          X = 3      X = 4      X = 5                     X = 3  X = 4  X = 5
Y = 2   0.2 × 0.4  0.5 × 0.4  0.3 × 0.4    i.e.   Y = 2    0.08   0.20   0.12
Y = 3   0.2 × 0.6  0.5 × 0.6  0.3 × 0.6           Y = 3    0.12   0.30   0.18
Note that the column and row totals give the individual distributions for X and Y .
For the second part of the question

P (X + Y ≤ 6) = P (X = 3 ∩ Y = 2) + P (X = 3 ∩ Y = 3) + P (X = 4 ∩ Y = 2) :

no other combinations are possible. The joint distributions can then be read from the table giving

P (X + Y ≤ 6) = 0.08 + 0.12 + 0.20 = 0.40

It would be easy to calculate P (X + Y = wi ) for each wi , and then evaluate E(X + Y ) using the expression
E(X + Y ) = Σ wi P (X + Y = wi ) , the sum running over all possible values wi .

However, we can use a more general result. Given that E(X) = Σ xi P (X = xi ) (summing over all values xi ), and that E(Y ) satisfies a similar formula, it can be shown that for independent variables
E(X + Y ) = E(X) + E(Y ) and Var(X + Y ) = Var(X) + Var(Y ).
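These results can be checked numerically. The following Python sketch (an illustration only; the dictionaries pX and pY are just one way to encode the distributions of Ex 1) builds the joint table and verifies both P (X + Y ≤ 6) = 0.40 and the additivity of the mean:

```python
from itertools import product

# Distributions for installation time X and commissioning time Y (Ex 1), in days.
pX = {3: 0.2, 4: 0.5, 5: 0.3}
pY = {2: 0.4, 3: 0.6}

# Independence: P(X = x and Y = y) = P(X = x) P(Y = y).
joint = {(x, y): pX[x] * pY[y] for x, y in product(pX, pY)}

# Probability that the total time does not exceed 6 days.
p_total_le_6 = sum(p for (x, y), p in joint.items() if x + y <= 6)

# E(X + Y) computed directly from the joint distribution ...
e_direct = sum((x + y) * p for (x, y), p in joint.items())
# ... agrees with E(X) + E(Y) for independent variables.
e_additive = sum(x * p for x, p in pX.items()) + sum(y * p for y, p in pY.items())
```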
Given some data we usually do not know the exact distribution, but it would be useful to estimate the mean and variance from the data.
Def. For a sample {X1 , X2 , ..., Xn } of data the sample average is defined by X̄ = (1/n) Σ Xi , the sum running over i = 1, ..., n.

Def. For sample data {X1 , X2 , ..., Xn } the sample variance is defined by SX² = (1/(n − 1)) Σ (Xi − X̄)² , the sum again running over i = 1, ..., n.
[Note that n − 1 is used in the denominator in SX 2 because the differences Xi − X sum to zero and so are
not independent.]

Ex 2. A die was tossed six times producing the following results: 6, 2, 4, 2, 1, 5 . Find the sample average and the sample variance.
X̄ = (1/6)(6 + 2 + 4 + 2 + 1 + 5) = 20/6 = 10/3 ≈ 3.33

SX² = (1/(6 − 1)) [ (6 − 10/3)² + (2 − 10/3)² + (4 − 10/3)² + (2 − 10/3)² + (1 − 10/3)² + (5 − 10/3)² ]
    = (1/5) [ (8/3)² + (−4/3)² + (2/3)² + (−4/3)² + (−7/3)² + (5/3)² ]
    = (1/5) · (64 + 16 + 4 + 16 + 49 + 25)/9 = 174/45 = 58/15 ≈ 3.87  →  SX ≈ 1.97

[For an unbiased die it can be shown that the theoretical values of X̄ and SX for a large number of tosses are 3.5 and 1.708 respectively.]
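The calculation in Ex 2 can be reproduced in a few lines of Python (a sketch; note that the standard library's statistics.variance also uses the n − 1 denominator):

```python
import statistics

data = [6, 2, 4, 2, 1, 5]  # the six die tosses of Ex 2

n = len(data)
sample_mean = sum(data) / n                                       # 10/3 ≈ 3.33
# Sample variance with the n - 1 denominator, as in the definition above.
sample_var = sum((x - sample_mean) ** 2 for x in data) / (n - 1)  # 58/15 ≈ 3.87
sample_sd = sample_var ** 0.5                                     # ≈ 1.97

# statistics.variance uses the same n - 1 denominator.
assert abs(sample_var - statistics.variance(data)) < 1e-9
```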

2. Binomial distribution
Specifying the exact distribution of a random variable requires a lot of information. Good estimates of mean,
variance etc. can be obtained from data. Often probability distributions can be determined by formulae
using the estimated values of the parameters.
Consider the simple coin tossing experiment where only two outcomes are possible – success (say 1) or failure
(say 0). A Bernoulli trial is a simple observation of a random variable X , say, that can take the values
1 or 0: suppose
P (X = 1) = p, P (X = 0) = 1 − p .
Then
E(X) = X = 1(p) + 0(1 − p) = p + 0 = p ,
and Var(X) = E(X − X)2 = E(X − p)2 = (0 − p)2 P (X = 0) + (1 − p)2 P (X = 1)
= p2 (1 − p) + (1 − p)2 p = p(1 − p)(p + (1 − p)) = p(1 − p) .
Now let {X1 , ..., Xn } denote n independent Bernoulli trials, each with success probability p , and let the number of successes be Y = X1 + X2 + . . . + Xn . Suppose Y = k , where 0 ≤ k ≤ n ; then k of the Xi values equal 1 and n − k equal 0. Since the outcomes are independent, the probability of k 1's at first with n − k 0's following is p^k (1 − p)^(n−k) .
The number of ways of distributing k successes among n trials is the binomial coefficient C(n, k) = n!/((n − k)! k!) , hence

P (Y = k) = C(n, k) p^k (1 − p)^(n−k) .

This is the general form of the binomial distribution and leads to

mean = np , variance = np(1 − p) .

Ex 3. If on average 1 in 20 of a certain type of column fails under loading, what is the probability that
among 16 such columns at most 2 will fail?
Given

P (column failing) = P (F ) = 1/20 = 0.05 ,  P (column not failing) = P (F̄ ) = 19/20 = 0.95 ,

therefore

P (0 F ) = (0.95)^16 = 0.44013 ,
P (1 F ) = C(16, 1) (0.05)^1 (0.95)^15 = (16)(0.05)(0.46329) = 0.37063 ,
P (2 F ) = C(16, 2) (0.05)^2 (0.95)^14 = ((16)(15)/2)(0.0025)(0.48767) = (120)(0.0025)(0.48767) = 0.14630 ,

→ P (at most 2 fail) = 0.44013 + 0.37063 + 0.14630 = 0.9571
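As a quick check of Ex 3, the binomial probabilities can be computed directly (a Python sketch using math.comb for the binomial coefficient; the helper name binomial_pmf is ours):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Ex 3: 16 columns, each failing independently with probability 0.05.
p_at_most_2 = sum(binomial_pmf(k, 16, 0.05) for k in range(3))  # ≈ 0.9571

# Mean and variance of this distribution: np and np(1 - p).
mean, var = 16 * 0.05, 16 * 0.05 * 0.95
```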

3. Normal distribution
This occurs very frequently in practice.
Def. A continuous random variable X has a normal distribution (or Gaussian distribution) with mean µX and variance σX² if the probability density function satisfies

fX (x) = (1/(σX √(2π))) exp( −(x − µX )²/(2σX²) )   (−∞ < x < ∞) .

Write X ∼ N (µX , σX²) to represent a random variable with normal distribution which has mean µX and variance σX² (see figure 3).
[Figure 3: normal density curves fX for µX = 0, σX = 1 (the standard normal) and µX = 0, σX = 3.]

The standard normal distribution has mean 0 and variance 1, and leads to

Def. The standard normal cumulative distribution is Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx .
Φ(z) is usually tabulated, and a summary of values appears on the Formula Sheet. Note that Φ(z) denotes
the area under the probability distribution curve to the left of x = z.
Suppose that X is a normal variable with mean µX and variance σX² ; then it can be shown that

Z = (X − µX )/σX

is also a normal variable with mean 0 and variance 1.
(The latter result is very useful in applications.)

Ex 4. The burning time X of an experimental rocket is a random variable having (approximately) a
normal distribution with mean 600s and standard deviation 25s. Find the probability that such a rocket
will burn for (a) less than 550s, (b) more than 637.5s.
(a) Given µX = 600 and σX = 25 , therefore

P (X < 550) = P ( (X − 600)/25 < (550 − 600)/25 ) = P (Z < −2) ,

where Z = (X − 600)/25 . From symmetry (look at figure 3), and then using the results on the Formula Sheet,

P (Z < −2) = P (Z > 2) = 1 − P (Z ≤ 2) = 1 − Φ(2) = 1 − 0.9772 = 0.0228


 
(b) P (X > 637.5) = P ( (X − 600)/25 > (637.5 − 600)/25 ) = P (Z > 1.5)
                  = 1 − P (Z ≤ 1.5) = 1 − Φ(1.5) = 1 − 0.9332 = 0.0668
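Φ can be evaluated without tables via the error function, since Φ(z) = ½(1 + erf(z/√2)). The following Python sketch reproduces the answers to Ex 4:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Ex 4: burning time X ~ N(600, 25^2).
mu, sigma = 600.0, 25.0
p_less_550 = phi((550.0 - mu) / sigma)          # P(Z < -2)  ≈ 0.0228
p_more_637_5 = 1.0 - phi((637.5 - mu) / sigma)  # P(Z > 1.5) ≈ 0.0668
```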
When n > 20, it can be shown that the normal distribution with mean np and variance np(1 − p) provides a very accurate approximation to the binomial distribution.

4. Sampling
One of the major problems in statistics, estimating the properties of a large population from the properties
of a sample of individuals chosen from that population, is considered in this section.
Select at random a sample of n observations X1 , X2 , . . . , Xn taken from a population. From these n
observations you can calculate the values of a number of statistical quantities, for example the sample mean
X. If you choose another random sample of size n from the same population, a different value of the statistic
will, in general, result. In fact, if repeated random samples are taken, you can regard the statistic itself as a
random variable, and its distribution is called the sampling distribution of the statistic.
For example, consider the distribution of heights of all adult men in England, which is known to conform
very closely to the normal curve. Take a large number of samples of size four, drawn at random from the
population, and calculate the mean height of each sample. How will these mean heights be distributed? We
find that they are also normally distributed – about the same mean as the original distribution. However,
a random sample of four is likely to include men both above and below average height and so the mean of
the sample will deviate from the true mean less than a single observation will. This important general result
can be stated as follows:
If random samples of size n are taken from a distribution whose mean is µX and whose standard deviation is σX , then the sample means form a distribution with mean µX and standard deviation σX̄ = σX /√n .
Note that the theorem holds for all distributions of the parent population. However, if the parent distribution
is normal then it can be shown that the sampling distribution of the sample mean is also normal.
The standard deviation of the sample mean, σX̄ defined above, is usually called the standard error of the sample mean.
Let us now present three worked examples.

Ex 5. A random sample is drawn from a population with a known standard deviation of 2.0. Find the
standard error of the sample mean if the sample is of size (i) 9, (ii) 100. What sample size would give a
standard error equal to 0.5?
Using the result stated earlier
(i) standard error = σX /√n = 2/√9 = 0.667, to 3 decimal places,

(ii) standard error = σX /√n = 2/√100 = 0.2 .

If the standard error equals 0.5, then 2/√n = 0.5 . Squaring then implies that 4/n = 0.25 , or n = 16 (i.e. the sample size is 16).
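A sketch of the standard-error calculations of Ex 5 (the helper name standard_error is ours):

```python
from math import ceil, sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

se_9 = standard_error(2.0, 9)      # ≈ 0.667
se_100 = standard_error(2.0, 100)  # 0.2

# Smallest sample size with standard error at most 0.5: solve 2 / sqrt(n) <= 0.5.
n_needed = ceil((2.0 / 0.5) ** 2)  # 16
```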

Ex 6. The diameters of shafts made by a certain manufacturing process are known to be normally dis-
tributed with mean 2.500 cm and standard deviation 0.009 cm. What is the distribution of the sample mean
diameter of nine such shafts selected at random? Calculate the percentage of such sample means which can
be expected to exceed 2.506 cm.
Since the process is normal we know that the sampling distribution of the sample mean will also be normal, with the same mean, 2.500 cm, but with a standard error (or standard deviation) σX̄ = 0.009/√9 = 0.003 cm.
In order to calculate the probability that the sample mean is bigger than 2.506, i.e. X̄ > 2.506, we standardise in the usual way by putting Z = (X̄ − 2.500)/0.003 , and then

P (X̄ > 2.506) = P ( (X̄ − 2.500)/0.003 > (2.506 − 2.500)/0.003 ) = P (Z > 2.0)
              = 1 − P (Z ≤ 2.0) = 1 − Φ(2.0) = 1 − 0.9772 = 0.0228, using the Formula Sheet.
Hence, 2.28% of the sample means can be expected to exceed 2.506 cm.

Ex 7. What is the probability that an observed value of a normally distributed random variable lies within
one standard deviation from the mean?
The normally distributed random variable X has mean µX and standard deviation σX , i.e. X ∼ N (µX , σX²). We need to calculate P (µX − σX ≤ X ≤ µX + σX ). Define Z = (X − µX )/σX , then Z ∼ N (0, 1). It follows that

P (µX − σX ≤ X ≤ µX + σX ) = P ( ((µX − σX ) − µX )/σX ≤ (X − µX )/σX ≤ ((µX + σX ) − µX )/σX )
                           = P (−1 ≤ Z ≤ 1) = 2 P (0 ≤ Z ≤ 1), by symmetry
                           = 2(Φ(1) − Φ(0)) = 2(0.8413 − 0.5000)
                           = 2(0.3413) = 0.6826
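The same calculation works for any number of standard deviations (a Python sketch using the error-function form of Φ; the helper names are ours):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X."""
    return phi(k) - phi(-k)

p1 = within_k_sigma(1)  # ≈ 0.6826, as in Ex 7
```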

It was stated above that when the parent distribution is normal the sampling distribution of the sample mean is also normal. When the parent distribution is not normal, we obtain the following (perhaps surprising) theorem:
Central limit theorem If a random sample of size n (n ≥ 30) is taken from ANY distribution with mean µX and standard deviation σX , then the sampling distribution of X̄ is approximately normal with mean µX and standard deviation σX /√n , the approximation improving as n increases.

Ex 8. It is known that a particular make of light bulb has an average life of 800 hrs with a standard
deviation of 48 hrs. Find the probability that a random sample of 144 bulbs will have an average life of less
than 790 hrs.
Since the number of bulbs in the sample is large, the sample mean will be normally distributed with mean 800 and standard error σX̄ = 48/√144 = 4 . Put Z = (X̄ − 800)/4 , then

P (X̄ < 790) = P ( (X̄ − 800)/4 < (790 − 800)/4 )
            = P (Z < −2.5) = P (Z > 2.5), by symmetry
            = 1 − P (Z ≤ 2.5) = 1 − 0.9938 = 0.0062 .

To conclude this section the main results concerning the distribution of the sample mean X̄ are summarised.
Consider a parent population with mean µX and standard deviation σX . From this population take a random sample of size n with sample mean X̄ and standard error σX /√n . Define Z = (X̄ − µX )/(σX /√n) , then
(i) for all n — the distribution of Z is N (0, 1) if the distribution of the parent population is normal;
(ii) n < 30 — the distribution of Z is approximately N (0, 1) if the distribution of the parent population is
approximately normal;
(iii) n ≥ 30 — the distribution of Z is a good approximation to N (0, 1) for all distributions of the parent
population.
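The central limit theorem can also be illustrated by simulation. The following sketch (sample size and trial count chosen arbitrarily for illustration) draws repeated samples from a decidedly non-normal parent distribution, a fair die, and checks that the sample means have mean close to µX and standard deviation close to σX /√n :

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the illustration is reproducible

# Parent population: a fair die, mean 3.5 and standard deviation sqrt(35/12).
mu, sigma = 3.5, sqrt(35.0 / 12.0)

n, trials = 36, 20000
# Draw `trials` random samples of size n and record each sample mean.
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]

avg_of_means = sum(means) / trials
sd_of_means = sqrt(sum((m - avg_of_means) ** 2 for m in means) / (trials - 1))
# The theorem predicts avg_of_means ≈ mu and sd_of_means ≈ sigma / sqrt(n) ≈ 0.285.
```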

5. Confidence intervals for means


Choose a sample at random from a population; then the mean, X̄, of the sample is said to provide a point estimator of the population mean µX .
The accuracy of this estimate is measured by a confidence interval: an interval (X̄ − k, X̄ + k) within which you can be reasonably sure the value of the population mean µX lies.
One usually calculates k by specifying that the probability that the interval (X̄ − k, X̄ + k) contains the population mean is 0.95, or 0.99. For example, if k is calculated so that

P (X̄ − k ≤ µX ≤ X̄ + k) = 0.95

then the interval is called the 95% confidence interval – if the interval is calculated for very many samples
then 95 out of 100 intervals would contain µX . On replacing 95 by 99 you obtain the definition of a 99%
confidence interval.
To proceed further, assume that the standard deviation σX is known and that Z = (X̄ − µX )/(σX /√n) is distributed with the standard normal distribution N (0, 1) (the conditions for this to hold were stated at the end of section 4). The results on the Formula Sheet show that 95% of the standard normal distribution lies between −1.96 and 1.96. Hence

P (−1.96 ≤ Z ≤ 1.96) = 0.95 .

Rearranging the inequalities inside the bracket to obtain conditions on µX yields

Z = (X̄ − µX )/(σX /√n) ≤ 1.96  →  X̄ − µX ≤ 1.96 σX /√n  →  µX ≥ X̄ − 1.96 σX /√n ;
Z = (X̄ − µX )/(σX /√n) ≥ −1.96  →  X̄ − µX ≥ −1.96 σX /√n  →  µX ≤ X̄ + 1.96 σX /√n .
It follows that the earlier expression can be re-written as

P ( X̄ − 1.96 σX /√n ≤ µX ≤ X̄ + 1.96 σX /√n ) = 0.95 ,

and hence the interval

X̄ − 1.96 σX /√n ≤ µX ≤ X̄ + 1.96 σX /√n   is the 95% confidence interval for µX .

Similarly

X̄ − 2.58 σX /√n ≤ µX ≤ X̄ + 2.58 σX /√n   is the 99% confidence interval for µX .
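A small helper for these intervals (a sketch; z = 1.96 and z = 2.58 are the two tabulated values used above, and the function name is ours), applied to the numbers of Ex 9 below:

```python
from math import sqrt

def confidence_interval(xbar, sigma, n, z=1.96):
    """Interval xbar ± z * sigma / sqrt(n); z = 1.96 for 95%, z = 2.58 for 99%."""
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Ex 9's numbers: 45 measurements, sample mean 12.91%, sigma = 2.0%.
low, high = confidence_interval(12.91, 2.0, 45)  # ≈ (12.33, 13.49)
```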

Ex 9. The percentage of copper in a certain chemical is to be estimated by taking a series of measure-
ments on randomly chosen small quantities of the chemical and using the sample mean to estimate the true
percentage. From previous experience individual measurements of this type are known to have a standard
deviation of 2.0%. How many measurements must be made so that the standard error of the estimate is
less than 0.3%? If the sample mean W of 45 measurements is found to be 12.91%, give a 95% confidence interval for the true percentage, ω .

Assume that n measurements are made. The standard error of the sample mean is (2/√n)%. For the required precision we need 2/√n < 0.3 , i.e. n > (2/0.3)² = 4/0.09 = 44.4 . Since n must be an integer, at least 45 measurements are necessary for the required precision.
With a sample of 45 measurements, you can use the central limit theorem and take the sample mean percentage W to be distributed normally with mean ω and standard error 2/√45 . Hence, if ω is the true percentage, it follows that Z = (W − ω)/(2/√45) is distributed as N (0, 1). Since 95% of the area under the standard normal curve lies between Z = −1.96 and Z = 1.96 ,

P ( −1.96 ≤ (W − ω)/(2/√45) ≤ 1.96 ) = 0.95 .

Re-arranging, we obtain P ( W − 1.96(2/√45) ≤ ω ≤ W + 1.96(2/√45) ) = 0.95 .
Hence, the 95% confidence interval for the true percentage is

(12.91 − 1.96(0.298), 12.91 + 1.96(0.298)) = (12.33, 13.49) .

To complete this section we define the sample variance.


Def. Given a sample of n observations X1 , X2 , . . . , Xn the sample variance, s², is given by s² = (1/(n − 1)) Σ (Xi − X̄)² , summing over i = 1, . . . , n, where X̄ denotes the sample mean.
In our discussion of confidence intervals for the mean it was assumed that the population variance σX² was known. What happens when it is not known? For samples of size n > 30, a good estimate of σX² is obtained by calculating the sample variance s² and using this value. (For small samples, n < 30, we need to use the t-distribution, which is not considered in this module.)

6. Hypothesis testing
An assumption made about a population is called a statistical hypothesis. From information contained
in a random sample we try to decide whether or not the hypothesis is true:
if evidence from the sample is inconsistent with the hypothesis, then the hypothesis is rejected;
if the evidence is consistent with the hypothesis, then the hypothesis is accepted.
The hypothesis being tested is called the null hypothesis (usually denoted by H0 ) – it either specifies a
particular value of the population parameter or specifies that two or more parameters are equal.
A contrary assumption is called the alternative hypothesis (usually denoted by H1 ) – normally specifies
a range of values for the parameter.
A common example of the null hypothesis is H0 : µX = µ0 . Then three alternative hypotheses are

(i) H1 : µX > µ0 , (ii) H1 : µX < µ0 , (iii) H1 : µX ≠ µ0 .

Types (i) and (ii) are said to be one-sided (or one-tailed, see figure 6b) – type (iii) is two-sided (or
two-tailed, see figure 6a).
The result of a test is a decision to choose H0 or H1 . This decision is subject to uncertainty, and two types
of error are possible:

(i) a type I error occurs when we reject H0 on the basis of the test although it happens to be true – the
probability of this happening is called the level of significance of the test and this is prescribed before
testing – most commonly chosen values are 5% or 1%.
(ii) a type II error occurs when you accept the null hypothesis on the basis of the test although it happens
to be false.

The above ideas are now applied to determine whether or not the mean, X̄, of a sample is consistent with a specified population mean µ0 . The null hypothesis is H0 : µX = µ0 and a suitable statistic to use is

Z = (X̄ − µ0 )/(σX /√n) ,

where σX is the standard deviation of the population and n is the size of the sample.
The range of values of Z for which the null hypothesis would be accepted is known as the acceptance region for the test – it depends on the pre-determined significance level and the choice of H1 . The corresponding range of values of Z for which H0 is rejected (i.e. not accepted) is called the rejection region.

Ex 10. A standard process produces yarn with mean breaking strength 15.8 kg and standard deviation
1.9 kg. A modification is introduced and a sample of 30 lengths of yarn produced by the new process is tested
to see if the breaking strength has changed. The sample mean breaking strength is 16.5 kg. Assuming the
standard deviation is unchanged, is it correct to say that there is no change in the mean breaking strength?

Here H0 : µX = µ0 , H1 : µX ≠ µ0 ,
where µ0 = 15.8 and µX is the mean breaking strength for the new process.
If H0 is true (i.e. µX = µ0 ), then Z = (X̄ − µ0 )/(σX /√n) has approximately the N (0, 1) distribution, where X̄ is the mean breaking strength of the 30 sample values and n = 30 .
At the 5% significance level there is a rejection region of 2.5% in each tail, as shown in figure 6a (since, under H0 ,
P (Z < −1.96) = P (Z > 1.96) = 1 − P (Z ≤ 1.96) = 1 − Φ(1.96) = 0.025, i.e. 2.5%).

This is an example of a two-sided test leading to a two-tailed rejection region.

[Figure 6a: two-tailed test at the 5% significance level – acceptance region −1.96 ≤ Z ≤ 1.96, with a 2.5% rejection region in each tail.]

The test is therefore: accept H0 if −1.96 ≤ Z ≤ 1.96, otherwise reject.


From the data, Z = (16.5 − 15.8)/(1.9/√30) = 2.018 . Hence, H0 is rejected at the 5% significance level: i.e. the evidence suggests that there IS a change in the mean breaking strength.

Let us now consider a slightly differently worded question. Suppose the modification was specifically designed
so as to increase the strength of the yarn. In this case

H0 : µX = µ0 , H1 : µX > µ0 ,

and H0 is rejected if the value of Z is unreasonably large. In this situation the test is one-sided and
acceptance and (one-tailed) rejection regions at the 5% significance level are shown below.

[Figure 6b: one-tailed test at the 5% significance level – acceptance region Z ≤ 1.64, with a 5% rejection region in the upper tail.]

At the 5% significance level the test is: accept H0 if Z ≤ 1.64, otherwise reject.


From earlier work Z = 2.018 and again the null hypothesis is rejected.
[Compare the two diagrams above, which illustrate the statement that the rejection region for a test depends
on the form of both the alternative hypothesis and the significance level.]
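Both versions of the test in Ex 10 can be written as a short Python sketch (the function name z_statistic is ours):

```python
from math import sqrt

def z_statistic(xbar, mu0, sigma, n):
    """Test statistic Z = (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Ex 10: H0: mu = 15.8, with sigma = 1.9 and n = 30 lengths of yarn.
z = z_statistic(16.5, 15.8, 1.9, 30)

# Two-sided test (H1: mu != mu0) at the 5% level: reject when |Z| > 1.96.
reject_two_sided = abs(z) > 1.96  # True, since z ≈ 2.018
# One-sided test (H1: mu > mu0) at the 5% level: reject when Z > 1.64.
reject_one_sided = z > 1.64       # also True
```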

