
ST 260: Statistical Data Analysis

Normal Approximation to Binomial and Poisson Distributions
Example 1 In a digital communication channel, assume that the number of bits
received in error can be modeled by a binomial random variable, and assume that
the probability that a bit is received in error is \(1 \times 10^{-5}\). If 16 million bits are
transmitted, what is the probability that 150 or fewer errors occur? Let the random
variable X denote the number of errors. Then X is binomial and

\[ P(X \le 150) = \sum_{x=0}^{150} \binom{16{,}000{,}000}{x} (10^{-5})^{x} (1 - 10^{-5})^{16{,}000{,}000 - x} \]

Clearly, the probability above is difficult to compute. Fortunately, the normal distribution can be used to find an excellent approximation in this case.

Normal Approximation to Binomial


Definition 1 If X is a binomial random variable with parameters n and p, then

\[ Z = \frac{X - np}{\sqrt{np(1 - p)}} \]

is approximately a standard normal random variable. To approximate a binomial
probability with a normal distribution, a continuity correction is applied as
follows:

\[ P(X \le x) = P(X \le x + 0.5) \approx P\left(Z \le \frac{x + 0.5 - np}{\sqrt{np(1 - p)}}\right) \]

and

\[ P(X \ge x) = P(X \ge x - 0.5) \approx P\left(Z \ge \frac{x - 0.5 - np}{\sqrt{np(1 - p)}}\right) \]

Probabilities involving X can be approximated by using the standard normal
distribution. The above approximation is good when n is large and p is not too
close to 0 or 1; a common rule of thumb is np > 5 and n(1 - p) > 5 (see
Figure 1).

[Figure 1 panels: n = 10, p = 0.5; n = 10, p = 0.25; n = 10, p = 0.75; n = 1000, p = 0.25.]
Figure 1: Normal approximations to binomial distribution.


Example 2 The digital communication problem in the previous example is solved
as follows:

\[ P(X \le 150) = P(X \le 150.5) = P\left(\frac{X - 160}{\sqrt{160(1 - 10^{-5})}} \le \frac{150.5 - 160}{\sqrt{160(1 - 10^{-5})}}\right) \approx P(Z \le -0.75) = 0.227 \]
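As a rough check on the approximation above, the following Python sketch evaluates the exact binomial sum with a term-by-term recursion and compares it with the continuity-corrected normal value. The function names are illustrative (they do not come from these notes), and only the standard library is assumed.

```python
from math import exp, log1p, sqrt
from statistics import NormalDist


def binom_cdf_exact(x, n, p):
    """Exact P(X <= x) for X ~ Binomial(n, p), summing the pmf term by term."""
    pmf = exp(n * log1p(-p))      # P(X = 0) = (1 - p)^n, computed on the log scale
    total = pmf
    for k in range(x):
        # pmf recursion: P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n - k) / (k + 1) * p / (1 - p)
        total += pmf
    return total


def binom_cdf_normal(x, n, p):
    """Normal approximation to P(X <= x) with the +0.5 continuity correction."""
    z = (x + 0.5 - n * p) / sqrt(n * p * (1 - p))
    return NormalDist().cdf(z)


n, p = 16_000_000, 1e-5
approx = binom_cdf_normal(150, n, p)
exact = binom_cdf_exact(150, n, p)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial sum:   {exact:.4f}")
```

The recursion avoids forming the huge binomial coefficients directly, which is why the "difficult" sum becomes tractable on a computer.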
Example 3 Again consider the transmission of bits as given in the previous examples. To judge how well the normal approximation works, assume only 50 bits are
to be transmitted and the probability of an error is p = 0.1. The exact probability
that two or fewer errors occur is

\[ P(X \le 2) = \binom{50}{0} 0.9^{50} + \binom{50}{1} 0.1(0.9)^{49} + \binom{50}{2} 0.1^{2}(0.9)^{48} = 0.112 \]

Based on the normal approximation,

\[ P(X \le 2) = P\left(\frac{X - 5}{\sqrt{50(0.1)(0.9)}} < \frac{2.5 - 5}{\sqrt{50(0.1)(0.9)}}\right) \approx P(Z < -1.18) = 0.119 \]

Even for a small sample of 50 bits, the normal approximation is reasonable.
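The two numbers in Example 3 can be reproduced with a few lines of Python. This is only a sketch using the standard library; the variable names are chosen here for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 50, 0.1, 2

# Exact binomial probability P(X <= 2): sum the pmf for k = 0, 1, 2.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Normal approximation with continuity correction: evaluate the normal cdf at x + 0.5.
mu = n * p
sigma = sqrt(n * p * (1 - p))
approx = NormalDist(mu, sigma).cdf(x + 0.5)

print(f"exact:  {exact:.3f}")
print(f"approx: {approx:.3f}")
```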



Normal Approximation to Poisson


Definition 2 If X is a Poisson random variable with \(E(X) = \lambda\) and \(Var(X) = \lambda\),
then

\[ Z = \frac{X - \lambda}{\sqrt{\lambda}} \]

is approximately a standard normal random variable. The approximation is good
for \(\lambda > 5\).
[Figure 2 panels: \(\lambda = 1\); \(\lambda = 8\); \(\lambda = 10\); \(\lambda = 20\).]
Figure 2: Normal approximations to Poisson distribution.


Example 4 Assume that the number of asbestos particles in a square meter of
dust on a surface follows a Poisson distribution with a mean of 1000. If a square
meter is analyzed, what is the probability that 950 or fewer particles are found?
The probability can be expressed exactly as

\[ P(X \le 950) = \sum_{x=0}^{950} \frac{e^{-1000}\, 1000^{x}}{x!} \]

The computational difficulty is clear. The probability can be approximated as

\[ P(X \le 950) = P(X \le 950.5) \approx P\left(Z \le \frac{950.5 - 1000}{\sqrt{1000}}\right) = P(Z \le -1.5653) = 0.058756 \]
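The Poisson sum in Example 4 can also be evaluated numerically if each term is computed on the log scale, since \(e^{-1000}\) underflows to zero in ordinary floating point. The sketch below (standard-library Python, illustrative function name) compares that exact sum with the normal approximation.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def poisson_cdf_exact(x, lam):
    """Exact P(X <= x) for X ~ Poisson(lam), with each pmf term built from its log."""
    # log pmf(k) = -lam + k*log(lam) - log(k!), where log(k!) = lgamma(k + 1)
    return sum(exp(-lam + k * log(lam) - lgamma(k + 1)) for k in range(x + 1))


lam, x = 1000, 950
z = (x + 0.5 - lam) / sqrt(lam)        # continuity-corrected standard score
approx = NormalDist().cdf(z)
exact = poisson_cdf_exact(x, lam)

print(f"z = {z:.4f}")
print(f"normal approximation: {approx:.6f}")
print(f"exact Poisson sum:    {exact:.6f}")
```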

Assessing Normality
Whether or not the normal distribution is an adequate model for a given
data set is often assessed by
1. Comparing characteristics of the data with theoretical properties of
the normal distribution
2. Frequency histogram
3. Box-plots
4. Normal probability plot

Comparing Data Characteristics to Theoretical Properties


The normal distribution has several important theoretical properties:
It is symmetrical; thus, the mean and median are equal.
It is bell-shaped; thus, the empirical rule applies.
The interquartile range equals approximately 1.35 standard deviations (some texts quote 1.33).
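These properties can be checked numerically for the standard normal distribution. The short Python sketch below uses `statistics.NormalDist` from the standard library; the variable names are illustrative. Computed from the exact quartiles, the interquartile range comes out near 1.35 standard deviations, slightly above the 1.33 figure that some texts quote.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Interquartile range in units of the standard deviation.
iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)

# Empirical rule: probability within 1, 2, and 3 standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(f"IQR = {iqr:.3f} standard deviations")
for k, prob in within.items():
    print(f"P(|Z| <= {k}) = {prob:.4f}")
```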
Example 5 Consider the data shown in the histograms in Figure 3. The raw
data histograms are shown on the left-hand side, while standardized data histograms
are shown on the right.
[Figure 3 panels: Poisson (\(\lambda = 3\)); Standardized Poisson (\(\lambda = 3\)); Poisson (\(\lambda = 20\)); Standardized Poisson (\(\lambda = 20\)).]

Figure 3: Histograms of count data.

If we compute the estimated mean, median and IQR using the standardized
data (with \(\lambda = 3\)), we obtain values 0.0219, 0.0070, and 1.1570, respectively.
If we compute the estimated mean, median and IQR using the standardized
data (with \(\lambda = 20\)), we obtain values 0.0090, 0.0034, and 1.3421, respectively.
If we examine the box-plots given in Figure 4, we see that the normal distribution is not a very good approximation to the Poisson distribution with
\(\lambda = 3\).
The same conclusion can be drawn by examining the normal probability plots
given in Figure 5.
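The IQR values reported above are sample estimates, but they can be compared with the theoretical quartiles of the two Poisson distributions. The Python sketch below is one way to do this; it assumes the quantile convention "smallest k with P(X <= k) >= q" and standardizes by \(\sqrt{\lambda}\) rather than by a sample standard deviation, so small differences from the reported values are expected.

```python
from math import exp, sqrt


def poisson_quantile(q, lam):
    """Smallest integer k with P(X <= k) >= q for X ~ Poisson(lam)."""
    k = 0
    pmf = exp(-lam)       # P(X = 0)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= lam / k    # P(X = k) = P(X = k - 1) * lam / k
        cdf += pmf
    return k


def standardized_iqr(lam):
    """Theoretical IQR of Poisson(lam) divided by its standard deviation sqrt(lam)."""
    q1 = poisson_quantile(0.25, lam)
    q3 = poisson_quantile(0.75, lam)
    return (q3 - q1) / sqrt(lam)


for lam in (3, 20):
    print(f"lambda = {lam}: standardized IQR = {standardized_iqr(lam):.4f}")
```

For \(\lambda = 20\) the standardized IQR is close to the value expected under normality, while for \(\lambda = 3\) it is noticeably smaller, which agrees with the conclusion drawn from the box-plots.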

[Figure 4 panels: Poisson (\(\lambda = 3\)); Standardized Poisson (\(\lambda = 3\)); Poisson (\(\lambda = 20\)); Standardized Poisson (\(\lambda = 20\)).]

Figure 4: Box-plots of count data.

[Figure 5 panels: Standardized Poisson (\(\lambda = 3\)); Standardized Poisson (\(\lambda = 20\)). Horizontal axis: Z; vertical axis: Probability.]

Figure 5: Normal probability plots of count data.

Constructing a Normal Probability Plot


Definition 3 A normal probability plot can be constructed on ordinary axes by
plotting the standardized normal scores \(z_j\) against \(x_{(j)}\), where \(x_{(j)}\) is the \(j\)th smallest
observation in the sample, and the standardized scores \(z_j\) satisfy

\[ \frac{j - 0.5}{n} = P(Z \le z_j) = \Phi(z_j) \]

For example, if \((j - 0.5)/n = 0.05\), \(\Phi(z_j) = 0.05\) implies that \(z_j = -1.64\). Table
1 demonstrates calculation of the standardized normal scores for a data sample
containing n = 10 observations. Figure 6 shows a plot of \(x_{(j)}\) versus \(z_j\) for the
battery-life data given in Table 1.

Table 1: Example showing how to compute the \(z_j\)'s when constructing an NPP with
n = 10. The resulting \(z_j\)'s are then plotted against the \(x_{(j)}\)'s (i.e., the observations
ranked from smallest to largest).

 j    x(j)    (j - 0.5)/10     z_j
 1    176     0.05            -1.64
 2    183     0.15            -1.04
 3    185     0.25            -0.67
 4    190     0.35            -0.39
 5    191     0.45            -0.13
 6    192     0.55             0.13
 7    201     0.65             0.39
 8    205     0.75             0.67
 9    214     0.85             1.04
10    220     0.95             1.64
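The \(z_j\) column of Table 1 can be reproduced by inverting the standard normal cdf at \((j - 0.5)/n\). The Python sketch below does this with `statistics.NormalDist` from the standard library; the function name `normal_scores` is illustrative.

```python
from statistics import NormalDist

# Battery-life observations from Table 1.
battery_life = [176, 183, 185, 190, 191, 192, 201, 205, 214, 220]


def normal_scores(n):
    """Standardized normal scores z_j solving Phi(z_j) = (j - 0.5)/n, j = 1..n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]


x_sorted = sorted(battery_life)          # x_(1) <= x_(2) <= ... <= x_(n)
z = normal_scores(len(x_sorted))

# The points (x_(j), z_j) are what the normal probability plot displays.
for j, (x_j, z_j) in enumerate(zip(x_sorted, z), start=1):
    print(f"{j:2d}  {x_j:4d}  {(j - 0.5) / len(x_sorted):.2f}  {z_j:6.2f}")
```

If the points fall roughly along a straight line, the normal model is considered plausible for the data.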

[Figure 6 plot titled "NPP of Battery-Life Data": horizontal axis \(x_{(j)}\), vertical axis \(z_j\).]

Figure 6: Normal probability plot of battery-life data given in Table 1.

Sampling Distributions and the Central Limit Theorem


A sampling distribution is the probability distribution of a given statistic
based on a random sample of size n.
Definition 4 A statistic is any function of the observations in a random sample.
We have encountered statistics before (e.g., \(\bar{X}\), \(S^2\), and \(S\)).
Definition 5 Random sample. The random variables \(X_1, X_2, \ldots, X_n\) are a random sample of size n if (1) the \(X_i\)'s are independent random variables, and (2)
every \(X_i\) has the same probability distribution.
Example 6 Consider tossing a fair die two times and computing the average of
the outcomes (i.e., \(\bar{X} = (X_1 + X_2)/2\), where \(X_i\) is the outcome of the \(i\)th toss).
Then, the sampling distribution of \(\bar{X}\) is determined by looking at all samples of size
n = 2 and their means (see Table 2). Note that in Table 2 there are a total of 36
possible samples of size n = 2, and there are 11 values that \(\bar{X}\) can take on; that is
[1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]. The sampling distribution
for \(\bar{X}\) is shown in Figure 7, while Figure 8 compares the distribution of \(\bar{X}\) to that
of X. Note also that \(E(\bar{X}) = 3.5\) and \(Var(\bar{X}) = 1.46\). This implies \(E(\bar{X}) = \mu_X\)
and \(Var(\bar{X}) = \sigma_X^2/2\), where \(\mu_X\) and \(\sigma_X^2\) denote the mean and variance of X.
[Figure 7 plot titled "Sampling Distribution for Average of Two Rolls of a Fair Die": horizontal axis Xbar, vertical axis Probability.]

Figure 7: Sampling distribution of \(\bar{X}\) with n = 2 tosses of a single die.

Table 2: All possible outcomes for the average of two tosses of a fair die.

Sample    Xbar
1, 1      1.0
1, 2      1.5
1, 3      2.0
1, 4      2.5
1, 5      3.0
1, 6      3.5
2, 1      1.5
2, 2      2.0
2, 3      2.5
2, 4      3.0
2, 5      3.5
2, 6      4.0
3, 1      2.0
3, 2      2.5
3, 3      3.0
3, 4      3.5
3, 5      4.0
3, 6      4.5
4, 1      2.5
4, 2      3.0
4, 3      3.5
4, 4      4.0
4, 5      4.5
4, 6      5.0
5, 1      3.0
5, 2      3.5
5, 3      4.0
5, 4      4.5
5, 5      5.0
5, 6      5.5
6, 1      3.5
6, 2      4.0
6, 3      4.5
6, 4      5.0
6, 5      5.5
6, 6      6.0
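The enumeration of all 36 samples can be carried out programmatically. The Python sketch below (standard library only, illustrative variable names) builds the sampling distribution of the average of two tosses with exact fractions and computes its mean and variance.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Count how many of the 36 equally likely samples give each value of the average.
counts = Counter(Fraction(a + b, 2) for a, b in product(faces, repeat=2))

# Sampling distribution: value of the average -> probability.
dist = {xbar: Fraction(c, 36) for xbar, c in sorted(counts.items())}

mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum((x - mean_xbar) ** 2 * p for x, p in dist.items())

for xbar, c in sorted(counts.items()):
    print(f"{float(xbar):.1f}  {c}/36")
print(f"E(Xbar)   = {float(mean_xbar):.2f}")
print(f"Var(Xbar) = {float(var_xbar):.2f}")
```

The variance of a single toss is 35/12, so the computed variance of the average, 35/24, is exactly half of it, as the example states.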

[Figure 8 panels: "Probability Distribution for X" and "Probability Distribution for Xbar"; horizontal axis of the second panel: Xbar (n = 2); vertical axes: Probability.]

Figure 8: Probability distribution of \(X\) (i.e., outcome if die is tossed once) and \(\bar{X}\)
(i.e., average of the outcomes if die is tossed twice).


Sampling Distribution of the Mean


The probability distribution of \(\bar{X}\) is often called the sampling distribution
of the mean.
In general, the sampling distribution of a statistic depends on (1) the distribution of the population, (2) the size of the sample, and (3) the method of
sample selection.
Definition 6 Suppose each of n observations in a random sample, say, \(X_1, X_2, \ldots, X_n\),
is normally distributed with mean \(\mu_X\) and variance \(\sigma_X^2\). Then
\(\bar{X} = \sum_{i=1}^{n} X_i / n\) is also normally distributed with mean \(\mu_{\bar{X}} = \mu_X\) and variance
\(\sigma_{\bar{X}}^2 = \sigma_X^2 / n\).
If we are sampling from a population with unknown probability distribution,
the sampling distribution of the sample mean will still be approximately
normal with mean \(\mu\) and variance \(\sigma^2/n\), if the sample size n is large.
Definition 7 Central Limit Theorem. If \(X_1, X_2, \ldots, X_n\) is a random sample of
size n taken from a population (either finite or infinite) with mean \(\mu_X\) and finite
variance \(\sigma_X^2\), and if \(\bar{X}\) is the sample mean, the limiting form of the distribution of

\[ Z = \frac{\bar{X} - \mu_X}{\sigma_X / \sqrt{n}} \]

as \(n \to \infty\) is the standard normal distribution.

In many cases of practical interest, the rule of thumb is that if \(n \ge 30\), the
normal approximation given by the central limit theorem will be adequate. If
n < 30, the approximation will still be adequate if the distribution of the
population is not grossly non-normal.
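A small simulation can illustrate the theorem. The Python sketch below is an illustration under assumptions not taken from these notes: the population is exponential with mean 1 (a strongly skewed choice), the sample size is n = 30, and the seed and number of replications are arbitrary.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(260)          # arbitrary seed so the run is repeatable
n, reps = 30, 10_000      # sample size and number of simulated samples

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    fmean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)
]

mean_of_means = fmean(sample_means)
sd_of_means = stdev(sample_means)
se_theory = 1 / sqrt(n)   # sigma / sqrt(n) from the theorem

# Share of sample means within 1.96 standard errors of the population mean;
# under an exact normal distribution this would be about 0.95.
coverage = sum(abs(m - 1.0) <= 1.96 * se_theory for m in sample_means) / reps

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {se_theory:.3f})")
print(f"share within 1.96 SE: {coverage:.3f}")
```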
Example 7 The foreman of a bottling plant has observed that the amount of soda
in each 32-ounce can is normally distributed with a mean of 32.2 ounces and a
standard deviation of 0.3 ounces. If a customer purchases one can, what is the
probability that the can will contain more than 32 ounces?
We already know how to do this!
Example 8 The foreman of a bottling plant has observed that the amount of soda
in each 32-ounce can is normally distributed with a mean of 32.2 ounces and a
standard deviation of 0.3 ounces. If a customer purchases a 6-pack of soda, what is
the probability that the average amount of the 6 cans is more than 32 ounces?

To find this probability, we need to know the sampling distribution of \(\bar{X}\).
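Examples 7 and 8 differ only in the standard deviation used: \(\sigma_X\) for a single container and \(\sigma_X/\sqrt{n}\) for the average of n = 6. A minimal Python sketch of both calculations, using the standard library and illustrative variable names, is:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3     # population mean and standard deviation (ounces)

# Example 7: one container, X ~ N(mu, sigma^2).
p_single = 1 - NormalDist(mu, sigma).cdf(32)

# Example 8: average of n = 6, Xbar ~ N(mu, sigma^2 / n).
n = 6
p_sixpack = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(32)

print(f"P(X > 32)    = {p_single:.4f}")
print(f"P(Xbar > 32) = {p_sixpack:.4f}")
```

Because the average varies less than a single observation, the probability of exceeding 32 ounces is higher for the six-pack average than for one container.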


Example 9 The Dean of the College of Business claims that the average salary
of the school's graduates one year after graduation is $800 per week (\(\mu_X\)) with a
standard deviation of $100 (\(\sigma_X\)). A second-year student would like to check whether
the claim about the mean is correct. He takes a random sample of 25 people who
graduated one year ago and determines their weekly salary. He discovers the sample
mean to be $750. Is this consistent with the Dean's claim?
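One way to approach Example 9 is to ask how likely a sample mean of $750 or less would be if the Dean's claim were true. The Python sketch below assumes the sampling distribution of \(\bar{X}\) is (at least approximately) normal with n = 25; that assumption, and the variable names, are ours rather than part of the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 800, 100, 25   # claimed mean, claimed standard deviation, sample size
xbar = 750                    # observed sample mean

se = sigma / sqrt(n)          # standard error of the mean
z = (xbar - mu) / se          # standardized sample mean
p_low = NormalDist().cdf(z)   # P(Xbar <= 750) if the claim is true

print(f"standard error = {se:.1f}")
print(f"z = {z:.2f}")
print(f"P(Xbar <= 750) = {p_low:.4f}")
```

A probability this small suggests the observed sample mean would be unusual under the claim.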

Sampling Distribution of a Proportion


The estimator of a population proportion of successes is the sample proportion, i.e., we count the number of successes in a sample and then
compute

\[ \hat{P} = \frac{X}{n} \]

where X is the number of successes and n is the sample size.
The normal approximation to the binomial is often used to approximate the
sampling distribution of a proportion.
It can be shown that \(E(\hat{P}) = \mu_{\hat{P}} = p\) and \(Var(\hat{P}) = \sigma_{\hat{P}}^2 = p(1 - p)/n\) (so
that \(\sigma_{\hat{P}} = \sqrt{p(1 - p)/n}\)).
Note 1 For those interested and familiar with expectation and variance operators ...

\[ E(\hat{P}) = E(X/n) = \frac{1}{n} E(X) = \frac{np}{n} = p \]

and

\[ Var(\hat{P}) = Var(X/n) = \frac{1}{n^2} Var(X) = \frac{1}{n^2}\, np(1 - p) = \frac{p(1 - p)}{n} \]
Assuming the normal approximation to the binomial distribution is adequate,
we can standardize the sample proportion \(\hat{P}\) and reference the standard
normal distribution, i.e.,

\[ Z = \frac{\hat{P} - p}{\sqrt{p(1 - p)/n}} \sim N(0, 1) \]

Example 10 Suppose that bottle caps are manufactured for a local brewery.
Suppose you take a random sample of 100 bottle caps from the process and
determine the count of non-conforming bottle caps contained in the sample.
If the true process fraction nonconforming is p = 0.05, what is the probability
that the proportion of non-conforming bottle caps in the sample is greater than
0.10?
We are interested in the probability

\[
\begin{aligned}
P(\hat{P} > 0.1) &\approx P\left(Z > \frac{0.1 - 0.05}{\sqrt{0.05(0.95)/100}}\right) \\
&= P(Z > 2.2942) \\
&= 1 - P(Z < 2.2942) \\
&= 1 - \Phi(2.2942) \\
&= 0.0109
\end{aligned}
\]
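The calculation in Example 10 can be checked numerically, and the exact binomial tail can be computed alongside it for comparison. In the Python sketch below (standard library, illustrative names), a sample proportion greater than 0.10 with n = 100 is taken to mean more than 10 nonconforming caps.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.05          # sample size and true fraction nonconforming
threshold = 0.10          # sample proportion of interest

se = sqrt(p * (1 - p) / n)            # standard deviation of the sample proportion
z = (threshold - p) / se
approx = 1 - NormalDist().cdf(z)      # normal approximation, no continuity correction

# Exact binomial tail: proportion > 0.10 means X >= 11 nonconforming caps.
cutoff = 10
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(cutoff + 1, n + 1))

print(f"z = {z:.4f}")
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial tail:  {exact:.4f}")
```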

Sampling Distribution for Difference in Two Means


Often the sampling distribution for the difference in two means is of
interest, e.g.,
In a pharmaceutical study, one may want to determine if two different
drugs produce the same effect.
In the study of a chemical process, one may want to determine if two
different temperatures result in the same yield.
In a comparison of universities, one may be interested in comparing the
average salaries of their graduates.
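Suppose you take two independent random samples: one from a normal population
with mean \(\mu_1\) and variance \(\sigma_1^2\) and another from a normal population with
mean \(\mu_2\) and variance \(\sigma_2^2\).
Then the difference between the two sample means, or \(\bar{X}_1 - \bar{X}_2\), is also
normally distributed with

\[ E(\bar{X}_1 - \bar{X}_2) = \mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \]

and

\[ Var(\bar{X}_1 - \bar{X}_2) = \sigma_{\bar{X}_1 - \bar{X}_2}^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]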


Note that \(\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\) is also called the standard error of the
difference between two means.
Example 11 Starting salaries for MBA graduates at two universities are
normally distributed with the means and standard deviations given in the table
below. If graduates are selected at random from each university, what is the
sampling distribution of \(\bar{X}_1 - \bar{X}_2\)?

                        University 1    University 2
Mean (\(\mu\))          $62,000/yr      $60,000/yr
Std. Dev. (\(\sigma\))  $14,500/yr      $18,300/yr
Sample Size (n)         50              60
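Applying the formulas for the mean and standard error of the difference to the values in the table gives the parameters of the sampling distribution asked for in Example 11. A brief Python sketch (standard library, illustrative variable names) is:

```python
from math import sqrt

# Values from the table: mean, standard deviation, sample size for each university.
mu1, sigma1, n1 = 62_000, 14_500, 50
mu2, sigma2, n2 = 60_000, 18_300, 60

mean_diff = mu1 - mu2                              # E(Xbar1 - Xbar2)
var_diff = sigma1**2 / n1 + sigma2**2 / n2         # Var(Xbar1 - Xbar2)
se_diff = sqrt(var_diff)                           # standard error of the difference

print(f"mean of Xbar1 - Xbar2:  {mean_diff}")
print(f"standard error:         {se_diff:.1f}")
```

So the difference in sample means is normally distributed with mean $2,000/yr and a standard error of roughly $3,128/yr.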

Note that if both of the populations are NOT normally distributed, but
the sample sizes are large (> 30), then by the central limit theorem the
sampling distribution of \(\bar{X}_1 - \bar{X}_2\) is approximately normal.


Practice Problems
1. A machine is used to fill containers with a liquid product. Assume the fill
volume is normally distributed. A random sample of 10 containers is selected
and the net contents (oz) are as follows: 12.03, 12.01, 12.04, 12.02, 12.05,
11.98, 11.96, 12.02, 12.05, and 11.99, where \(\bar{x} = 12.0150\).
(a) If fill volume is known to have a mean of 12 oz and a standard deviation
of 0.03 oz, what is the probability that you will see a sample average
greater than \(\bar{x}\)?
(b) Does the assumption of normality seem appropriate for the fill volume
data? (Note: you must justify your answer by assessing the normality
of the data.)
2. The inside diameters of bearings used in an aircraft landing gear assembly
are known to have a mean of 8.25 cm and standard deviation of 0.005 cm.
Suppose you take a random sample of 15 bearings and compute \(\bar{x} = 8.2535\).
What is the probability that you will see a sample mean greater than \(\bar{x}\)?
3. A random sample of 200 printed circuit boards contains 18 defective or nonconforming units.
(a) What is the sample process fraction nonconforming?
(b) If the true process fraction nonconforming is p = 0.10, what is the
probability you will observe a sample process fraction nonconforming
less than 0.09?
4. Suppose you take random samples of sizes \(n_1 = 10\) and \(n_2 = 5\) from
two different normal populations. Suppose the first population has mean
\(\mu_1 = 25\) and variance \(\sigma_1^2 = 5\), and the second population has mean \(\mu_2 = 32\)
and variance \(\sigma_2^2 = 7\). Suppose that you compute the sample means of each
sample and obtain \(\bar{x}_1 = 24.75\) and \(\bar{x}_2 = 29.2\). What is the probability of
observing a difference in the sample means less than that observed? (i.e.,
you are looking for \(P(\bar{X}_1 - \bar{X}_2 < -4.45)\))
