Distributions
Simple Random Sampling
There are two types of sampling
Probability Sampling : A probability sample is
a sample drawn from a population in such a
way that every member of the population has
a known probability of being included in the
sample
Nonprobability Sampling
Simple Random Sampling
If a sample of size n is drawn from a
population of size N in such a way that
every possible sample of size n has the
same probability of being selected, the
sample is called a simple random sample
Simple Random Sampling
Random sampling can be done in two ways:
With replacement
Without replacement
Example 4.2.1
Consider the data in Table 4.2.1.
It shows the fasting blood sugar levels of
150 normal people.
Let us say we want to select 10 subjects from
this population.
You can use Table D from the Appendix to do
this,
or you can use Excel to select them.
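As an alternative to Table D or Excel, the same selection can be sketched in Python. This is only an illustration, not the textbook's procedure: the subject IDs 1 to 150 are hypothetical stand-ins for the rows of Table 4.2.1, and the seed is an arbitrary choice that makes the draw repeatable.

```python
import random

# Hypothetical stand-in for the 150 subjects of Table 4.2.1: IDs 1..150.
population_ids = list(range(1, 151))

rng = random.Random(42)  # arbitrary seed, only so the sketch is repeatable

# Simple random sample WITHOUT replacement: no subject can appear twice.
sample_without = rng.sample(population_ids, k=10)

# Simple random sample WITH replacement: a subject may be drawn more than once.
sample_with = rng.choices(population_ids, k=10)

print("without replacement:", sorted(sample_without))
print("with replacement:   ", sorted(sample_with))
```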
Sampling Distribution
The distribution of all possible values that
can be assumed by some statistic,
computed from samples of the same size
randomly drawn from the same population,
is called the sampling distribution of that
statistic.
Sampling Distribution
To construct a sampling distribution:
Form a population of size N
Randomly draw all possible samples of size n
Compute the statistic of interest for each sample
List in one column the distinct
observed values and in another column list
the corresponding frequencies
Distribution of the sample Mean
Example 4.4.1
Let us say we have a population of size N=5 consisting of
the ages of five children who are outpatients in a
community mental health center.
The ages are as follows:
x1=6, x2=8, x3=10, x4=12, x5=14
We can obtain the mean and variance of this population as
$$\mu = \frac{\sum x_i}{N} = 10$$
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{40}{5} = 8$$
$$S^2 = \frac{\sum (x_i - \mu)^2}{N - 1} = \frac{40}{4} = 10$$
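The population quantities above can be checked with Python's standard library. This is a small sketch; the variable names are ours. Note that `pvariance` divides by N while `variance` divides by N - 1.

```python
import statistics

ages = [6, 8, 10, 12, 14]  # the population of Example 4.4.1

mu = statistics.mean(ages)           # population mean
sigma2 = statistics.pvariance(ages)  # sum of squares divided by N
s2 = statistics.variance(ages)       # sum of squares divided by N - 1

print(mu, sigma2, s2)
```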
Distribution of the sample Mean
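Drawing all possible samples of size n = 2 with replacement
gives N^n = 25 samples. The variance of the sampling
distribution of the sample mean is
$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N^n} = \frac{(6-10)^2 + \dots + (14-10)^2}{25} = \frac{100}{25} = 4$$
As can be seen, it is not the same as the population
variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{40}{5} = 8$$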
Distribution of the sample Mean
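The variance of the sampling distribution is equal to the population variance divided by the size of the
sample used to obtain the sampling distribution:
$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{N^n} = \frac{(6-10)^2 + \dots + (14-10)^2}{25} = \frac{100}{25} = 4$$
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} = \frac{8}{2} = 4$$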
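The construction of this sampling distribution can be checked by brute force. The sketch below assumes samples of size n = 2 drawn with replacement from the five ages, which gives 25 ordered samples; the variable names are ours.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

ages = [6, 8, 10, 12, 14]
n = 2

# All N**n = 25 ordered samples of size 2 drawn with replacement.
sample_means = [mean(sample) for sample in product(ages, repeat=n)]

# Distinct observed values of the sample mean and their frequencies.
frequency = Counter(sample_means)

mean_of_means = mean(sample_means)      # mean of the sampling distribution
var_of_means = pvariance(sample_means)  # variance of the sampling distribution

for value in sorted(frequency):
    print(value, frequency[value])
print(mean_of_means, var_of_means, pvariance(ages) / n)
```

The last line prints the variance of the 25 sample means next to the population variance divided by n, so the two can be compared directly.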
Standard Error
The square root of the variance of the sampling
distribution is called the standard error of the mean, or
just the standard error:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Generalization
When sampling is from a normally distributed
population, the distribution of the sample mean will
possess the following properties:
The distribution of $\bar{x}$ will be normal
The mean, $\mu_{\bar{x}}$, of the distribution of $\bar{x}$ will be equal to
the mean of the population from which the samples were
drawn
The variance, $\sigma_{\bar{x}}^2$, of the distribution of $\bar{x}$ will be equal to
the variance of the population divided by the sample size
Generalization
For the case where sampling is from a nonnormally
distributed population, the central limit theorem is
applied.
This theorem says: given a population of any
nonnormal functional form with a mean, $\mu$, and finite
variance, $\sigma^2$, the sampling distribution of $\bar{x}$,
computed from samples of size n from this
population, will be approximately normally
distributed with mean, $\mu$, and variance, $\sigma^2/n$, when
the sample size is large.
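A small simulation can make the theorem concrete. The sketch below rests on our own assumptions, none of which come from the slides: the population is exponential with rate 1 (a skewed, nonnormal shape with mean 1 and variance 1), the sample size is n = 40, and 5000 samples are drawn with an arbitrary seed.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)  # arbitrary seed, only so the sketch is repeatable
n = 40                  # assumed "large" sample size
reps = 5000             # number of simulated samples

# Exponential population with rate 1: mu = 1, sigma^2 = 1, strongly skewed.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = mean(sample_means)      # should be close to mu = 1
var_of_means = pvariance(sample_means)  # should be close to sigma^2 / n = 0.025

print(round(mean_of_means, 3), round(var_of_means, 4))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.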
Sampling from a finite population
without replacement
In general, when drawing samples of size n from a
finite population of size N without replacement,
and ignoring the order in which the sample values
are drawn, the number of possible samples is
given by the combination of N things taken n at a
time:
$${}_N C_n = \frac{N!}{n!\,(N-n)!}$$
Sampling from a finite population
without replacement
For Example 4.4.1, with samples of size n = 2,
there are 10 possible samples:
$${}_N C_n = \frac{N!}{n!\,(N-n)!}$$
$${}_5 C_2 = \frac{5!}{2!\,(5-2)!} = \frac{5 \cdot 4 \cdot 3!}{2!\,3!} = 10$$
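Sampling from a finite population
without replacement
Then the mean of these 10 sample means is:
$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{{}_N C_n} = \frac{7 + 8 + \dots + 13}{10} = \frac{100}{10} = 10$$
and the variance of the sampling distribution is:
$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i - \mu_{\bar{x}})^2}{{}_N C_n} = \frac{30}{10} = 3$$
But this variance is different from the population
variance divided by the sample size (8/2=4)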
Sampling from a finite population
without replacement
However, an interesting relationship becomes
apparent:
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{8}{2} \cdot \frac{5-2}{5-1} = 3$$
Now, with the factor (N-n)/(N-1), we relate the variance
of the sampling distribution to the population
variance divided by the sample size
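The relationship can be verified by listing every sample drawn without replacement. This is a sketch with our own variable names; it enumerates the 10 unordered samples of size 2 from the five ages and compares the variance of their means with the corrected formula.

```python
from itertools import combinations
from math import comb
from statistics import mean, pvariance

ages = [6, 8, 10, 12, 14]
N, n = len(ages), 2

# All unordered samples of size n drawn without replacement.
samples = list(combinations(ages, n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)

# Population variance over n, multiplied by (N - n) / (N - 1).
fpc = (N - n) / (N - 1)
predicted = pvariance(ages) / n * fpc

print(len(samples), comb(N, n))
print(mean_of_means, var_of_means, predicted)
```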
Sampling from a finite population
without replacement
The factor
$$\frac{N-n}{N-1}$$
is called the finite population correction and can be
ignored when the sample size is small in
comparison with the population size
In summary
When sampling is from a normally distributed
population with known population variance:
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
3. The sampling distribution of $\bar{x}$ is normal
In summary
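When sampling is from a nonnormally distributed
population with known population variance:
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \begin{cases} \sigma / \sqrt{n}, & \text{when } n/N \le 0.05 \\ \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N-n}{N-1}}, & \text{otherwise} \end{cases}$
3. The sampling distribution of $\bar{x}$ is approximately normal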
Example 4.4.2
Suppose it is known that in a certain human
population cranial (skull) length is
approximately normally distributed with a mean of
185.6 mm and a standard deviation of 12.7 mm.
What is the probability that a random sample of
size 10 from this population will have a mean
greater than 190?
Example 4.4.2
It is assumed that the population is approximately
normally distributed.
Then
$$\mu_{\bar{x}} = \mu = 185.6 \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12.7}{\sqrt{10}} = 4.02$$
Here we assume that the population is large
enough that we do not need to apply the finite population
correction
Example 4.4.2
Now, because we have a normal distribution (at least
approximately),
we can transform it to the standard normal
distribution.
For this, we use the following equation:
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}} = \frac{190 - 185.6}{4.02} = 1.09$$
where $\mu_{\bar{x}} = \mu = 185.6$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 12.7 / \sqrt{10} = 4.02$
Example 4.4.2
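The probability we seek is the area to the right of $\bar{x}$ = 190
under the curve of the sampling distribution. It corresponds to
the area to the right of z = 1.09 under the standard normal curve:
$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma / \sqrt{n}} = \frac{190 - 185.6}{4.02} = 1.09$$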
Example 4.4.2
By consulting the standard normal table or using
Excel, you will find that the area to the right of 1.09
is 0.1379.
So the probability is 0.1379.
The Excel calculation, which keeps z unrounded, gives 0.1369:
mean         190
pmean        185.6
stdev        4.02
p.stdev      12.7
n            10
z            1.0945274
probability  0.1368619
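The same calculation can be sketched in Python with the standard library's `NormalDist`. Because the standard error is not rounded to 4.02 here, the result differs slightly in the third decimal from the table value of 0.1379; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 185.6, 12.7, 10  # population mean, population sd, sample size
x_bar = 190                     # sample mean of interest

se = sigma / sqrt(n)            # standard error of the mean
z = (x_bar - mu) / se           # standardized value of x_bar
p = 1 - NormalDist().cdf(z)     # area to the right of z

print(round(se, 2), round(z, 2), round(p, 3))
```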
Example 4.4.3
Suppose the mean and standard deviation of serum iron
values for healthy men are 120 and 15 micrograms
per 100 mL, respectively.
What is the probability that a random sample of
50 normal men will yield a mean between 115
and 125 micrograms per 100 mL?
Example 4.4.3
Here, the functional form of the population is not
specified.
However, the sample size is greater than 30,
so we can apply the central limit theorem.
Thus, we assume that the sampling distribution of $\bar{x}$ is
approximately normal,
with
$$\mu_{\bar{x}} = \mu = 120 \qquad \sigma_{\bar{x}} = \sigma / \sqrt{n} = 15 / \sqrt{50} = 2.12$$
Example 4.4.3
The probability we are looking for is:
$$\begin{aligned}
P(115 \le \bar{x} \le 125) &= P\left(\frac{115 - 120}{2.12} \le z \le \frac{125 - 120}{2.12}\right) \\
&= P(-2.36 \le z \le 2.36) \\
&= 0.9909 - 0.0091 \\
&= 0.9818
\end{aligned}$$
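As a cross-check, the interval probability can be computed directly from the sampling distribution. This is a sketch that relies on the central limit theorem approximation used above; the variable names are ours, and small rounding differences from the table values are expected.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 15, 50  # population mean, population sd, sample size

se = sigma / sqrt(n)                # standard error of the mean
sampling_dist = NormalDist(mu, se)  # approximate distribution of the sample mean

# P(115 <= x_bar <= 125) as a difference of two cumulative probabilities.
p = sampling_dist.cdf(125) - sampling_dist.cdf(115)

print(round(se, 2), round(p, 3))
```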
Distribution of the Difference
Between Two Sample Means
Often we want to know whether
two population means are the same or not
and, if they are different, how different they are.
Many other questions can be asked.
Let us look at this issue with Example 4.5.1.
Example 4.5.1
Suppose we have two populations of individuals.
One population (population 1) has experienced
some condition thought to be associated with
mental retardation.
Population 2 has not experienced the condition.
The distribution of intelligence scores in each of
the two populations is believed to be approximately
normal with a standard deviation of
20.
Example 4.5.1
Now, suppose also that we take a sample of 15
individuals from each population and compute for
each sample the mean intelligence score, obtaining
$\bar{x}_1 = 92$ and $\bar{x}_2 = 105$.
If there is no difference between the two
populations with respect to their true mean
intelligence scores,
what is the probability of observing a difference
this large or larger, $(\bar{x}_1 - \bar{x}_2)$, between the sample
means?
Example 4.5.1
Here, we would have a normal distribution with a
mean equal to:
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 0$$
and a variance equal to:
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$
Example 4.5.1
Then the standard error of the difference
between sample means would be equal to
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
Example 4.5.1
Now, if we return to our example,
we would have a normal distribution with a mean
of zero (if there is no difference between the
true population means)
and a variance of:
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{(20)^2}{15} + \frac{(20)^2}{15} = 53.33$$
Example 4.5.1
To convert this normal distribution to the standard
normal distribution, we use the following modified
equation:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
Example 4.5.1
The area under the curve of $\bar{x}_1 - \bar{x}_2$
corresponding to the probability we are looking for
is the area to the left of $\bar{x}_1 - \bar{x}_2$ = 92 - 105 = -13.
The corresponding z value
(assuming that there is no difference between
the population means) is:
$$z = \frac{(92 - 105) - 0}{\sqrt{\dfrac{(20)^2}{15} + \dfrac{(20)^2}{15}}} = -1.78$$
Example 4.5.1
Once again, by looking at Table C or using Excel,
you can find that the area under the standard normal
curve to the left of -1.78 is equal to 0.0375.
So, in answer to the question, we can say that if
there is no difference between the population means,
the probability of obtaining a difference between
sample means as large as or larger than 13 is
0.0375.
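The steps of this example can be reproduced in a few lines of Python. This is a sketch with our own variable names; it assumes, as the example does, that the true population means are equal.

```python
from math import sqrt
from statistics import NormalDist

sigma1 = sigma2 = 20        # population standard deviations
n1 = n2 = 15                # sample sizes
x_bar1, x_bar2 = 92, 105    # observed sample means
mu_diff = 0                 # mu1 - mu2 under "no difference"

variance = sigma1**2 / n1 + sigma2**2 / n2  # variance of x_bar1 - x_bar2
z = ((x_bar1 - x_bar2) - mu_diff) / sqrt(variance)
p = NormalDist().cdf(z)                     # area to the left of z

print(round(variance, 2), round(z, 2), round(p, 4))
```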
In conclusion
The procedure for this example is valid even
when the sample sizes $n_1$ and $n_2$ are different and
when the population variances $\sigma_1^2$ and $\sigma_2^2$ have
different values.
In conclusion
Given two normally distributed populations with
means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$,
respectively, the sampling distribution of the
difference, $\bar{x}_1 - \bar{x}_2$, between the means of
independent samples of size $n_1$ and $n_2$ drawn from
these populations is normally distributed with
mean $\mu_1 - \mu_2$ and variance
$(\sigma_1^2 / n_1) + (\sigma_2^2 / n_2)$.
Sampling from Nonnormal Population
Example 4.5.2: Suppose it has been established that for a
certain type of client the average length of a home visit by
a public health nurse is 45 minutes with a standard
deviation of 15 minutes, and that for a second type of
client the average home visit is 30 minutes long with a
standard deviation of 20 minutes.
If a nurse randomly visits 35 clients from the first group and 40
from the second group, what is the probability that the
average length of home visit will differ between the two
groups by 20 or more minutes?
Sampling from Nonnormal Population
This problem can be solved with the central limit theorem,
since the sample size is greater than 30 for each group.
Thus, the difference between sample means is at least
approximately normally distributed with the following
mean and variance:
$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 = 45 - 30 = 15$$
$$\sigma^2_{\bar{x}_1 - \bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{(15)^2}{35} + \frac{(20)^2}{40} = 16.4286$$
Sampling from Nonnormal Population
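The area under the curve of $\bar{x}_1 - \bar{x}_2$ that we
want to determine is the area to the right of 20.
The corresponding value of z in the standard
normal distribution is
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{20 - 15}{\sqrt{16.4286}} = 1.23$$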
Sampling from Nonnormal Population
From Table C or using Excel, it is seen that
the area to the right of z=1.23 is 1 - 0.8907 =
0.1093.
Then the probability of the nurse's random visits
resulting in a difference between the two means
as great as or greater than 20 minutes is 0.1093.
z            1.23
probability  0.1093486
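Because the sample sizes and standard deviations differ here, a small helper function keeps the calculation readable. The function name and its parameters are our own illustration, not part of the textbook; the result differs slightly from 0.1093 because z is not rounded to 1.23.

```python
from math import sqrt
from statistics import NormalDist


def prob_diff_at_least(d, mu1, sigma1, n1, mu2, sigma2, n2):
    """Approximate P(x_bar1 - x_bar2 >= d) for independent samples."""
    mean_diff = mu1 - mu2
    se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = (d - mean_diff) / se_diff
    return 1 - NormalDist().cdf(z)  # area to the right of z


# Example 4.5.2: home visit lengths for the two client types.
p = prob_diff_at_least(20, mu1=45, sigma1=15, n1=35, mu2=30, sigma2=20, n2=40)
print(round(p, 3))
```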
Distribution of the sample Proportion
Sometimes you need to deal with a sample
proportion that results from counts or
frequency data.
Here you do not work with the distribution of
measured variables.
Instead, each observation in such
cases can be assigned to one of two mutually
exclusive categories.
The following example illustrates this.
Example 4.6.1
Suppose we know that in a certain human
population 0.08 are color blind.
If we designate a population proportion by
p, we can say that here p=0.08
If we randomly select 150 individuals from
this population
What is the probability that the proportion
in the sample who are color blind will be
as great as 0.15?
Example 4.6.1
When the sample size is large, the distribution of
sample proportions is approximately normally
distributed (central limit theorem).
Thus,
the mean of all the possible sample proportions
will be equal to the true population proportion, p:
$$\mu_{\hat{p}} = p$$
Example 4.6.1
The variance of the distribution and the
corresponding z value will be equal to:
$$\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n}$$
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}}$$
Example 4.6.1
The question is, of course, how large a sample must be
to count as large.
The criterion here is that both np and n(1 - p) must be
greater than 5.
For our example:
$$np = 150 \times 0.08 = 12$$
$$n(1-p) = 150 \times (1 - 0.08) = 138$$
Thus, we see that we have a large sample.
Example 4.6.1
Now if we look at the mean and variance:
$$\mu_{\hat{p}} = p = 0.08$$
$$\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n} = \frac{0.08(1 - 0.08)}{150} = 0.00049$$
Example 4.6.1
Then the probability we want to find is the area
under the curve of $\hat{p}$ that is to the right of 0.15.
This area is equal to the area under the standard
normal curve to the right of z=3.15:
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \frac{0.15 - 0.08}{\sqrt{0.00049}} = 3.15$$
Example 4.6.1
Using Table C or Excel, we find that the probability
of observing $\hat{p} > 0.15$ in a random sample
of n=150 from a population in which p=0.08 is:
$$P(\hat{p} > 0.15) = 1 - 0.9992 = 0.0008$$
Example 4.6.1
The normal approximation can be improved by
the following correction for continuity,
where we let $x = n\hat{p}$:
$$z_c = \frac{\dfrac{x + 0.5}{n} - p}{\sqrt{pq/n}}, \quad \text{for } x < np$$
or
$$z_c = \frac{\dfrac{x - 0.5}{n} - p}{\sqrt{pq/n}}, \quad \text{for } x > np$$
where $q = 1 - p$.
Example 4.6.1
For our example, $x = n\hat{p} = 150 \times 0.15 = 22.5$, so
$$z_c = \frac{\dfrac{x - 0.5}{n} - p}{\sqrt{pq/n}} = \frac{\dfrac{22.5 - 0.5}{150} - 0.08}{\sqrt{0.00049}} = 3.01$$
Example 4.6.1
Then the probability will be
$$P(\hat{p} > 0.15) = 1 - 0.9987 = 0.0013$$
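The uncorrected and continuity-corrected calculations for this example can be compared side by side. The sketch below uses our own function and variable names. Treating the case x = np like x < np is our assumption, since the formulas above do not cover equality. Without intermediate rounding the uncorrected z comes out near 3.16 rather than 3.15.

```python
from math import sqrt
from statistics import NormalDist


def continuity_corrected_z(x, n, p):
    """z with the 0.5 correction; x = np is treated like x < np (our assumption)."""
    se = sqrt(p * (1 - p) / n)
    if x > n * p:
        return ((x - 0.5) / n - p) / se
    return ((x + 0.5) / n - p) / se


p, n, p_hat = 0.08, 150, 0.15

# Large-sample criterion: both np and n(1 - p) should exceed 5.
large_sample = n * p > 5 and n * (1 - p) > 5

z = (p_hat - p) / sqrt(p * (1 - p) / n)        # uncorrected z
z_c = continuity_corrected_z(n * p_hat, n, p)  # corrected z

prob_plain = 1 - NormalDist().cdf(z)
prob_corrected = 1 - NormalDist().cdf(z_c)

print(large_sample, round(z, 2), round(z_c, 2))
print(round(prob_plain, 4), round(prob_corrected, 4))
```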
Example 4.6.2
Suppose it is known that in a certain population of
women, 90% entering their third trimester of
pregnancy have had some prenatal care.
If a random sample of size 200 is drawn from this
population, what is the probability that the sample
proportion who have had some prenatal care will
be less than 0.85?
Example 4.6.2
Assuming that the sampling distribution of $\hat{p}$ is
approximately normally distributed with
$$\mu_{\hat{p}} = p = 0.90$$
and
$$\sigma_{\hat{p}}^2 = \frac{p(1-p)}{n} = \frac{0.90(1 - 0.90)}{200} = 0.00045$$
we have
$$z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} = \frac{0.85 - 0.90}{\sqrt{0.00045}} = \frac{-0.05}{0.0212} = -2.36$$
The area to the left of -2.36 under the standard normal curve is 0.0091
Thus the probability will be
$$P(\hat{p} \le 0.85) = P(z \le -2.36) = 0.0091$$
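A quick numerical check of this example, as a sketch with our own variable names. It uses the normal approximation without a continuity correction, as the example does.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.90, 200  # population proportion and sample size
p_hat = 0.85      # sample proportion of interest

variance = p * (1 - p) / n           # variance of the sample proportion
z = (p_hat - p) / sqrt(variance)     # standardized value of p_hat
prob = NormalDist().cdf(z)           # area to the left of z

print(round(variance, 5), round(z, 2), round(prob, 3))
```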
Distribution of The Difference
Between Two Sample Proportions
If independent random samples of size n1 and n2 are
drawn from two populations of dichotomous variables
where the proportions of observations with the
characteristic of interest in the two populations are $p_1$ and $p_2$,
the distribution of the difference between sample
proportions, $\hat{p}_1 - \hat{p}_2$, is
approximately normal with mean
$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$$
and variance
$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}$$
when n1 and n2 are large.
Distribution of The Difference
Between Two Sample Proportions
Probabilities for the difference between two
sample proportions are found from
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$
Example 4.7.1
Suppose that the proportion of moderate to heavy
users of illegal drugs in population 1 is 0.50, while
in population 2 the proportion is 0.33.
What is the probability that samples of size 100
drawn from each of the populations will yield a
value of $\hat{p}_1 - \hat{p}_2$ as large as 0.30?
Example 4.7.1
Now, assuming that the sampling distribution of
$\hat{p}_1 - \hat{p}_2$ is approximately normal with mean
$$\mu_{\hat{p}_1 - \hat{p}_2} = 0.50 - 0.33 = 0.17$$
and variance
$$\sigma^2_{\hat{p}_1 - \hat{p}_2} = \frac{0.33(1 - 0.33)}{100} + \frac{0.50(1 - 0.50)}{100} = 0.004711$$
Example 4.7.1
The area corresponding to the probability we seek
is the area under the curve of $\hat{p}_1 - \hat{p}_2$ to the right
of 0.30:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} = \frac{0.30 - 0.17}{\sqrt{0.004711}} = 1.89$$
From Table C or using Excel, you can calculate the probability as
P = 1 - 0.9706 = 0.0294
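The example can be checked numerically as follows. This is a sketch with our own variable names; since z is not rounded to 1.89 before the lookup, the result differs slightly from 0.0294.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.50, 100  # population 1 proportion and sample size
p2, n2 = 0.33, 100  # population 2 proportion and sample size
observed_diff = 0.30

mean_diff = p1 - p2
variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
z = (observed_diff - mean_diff) / sqrt(variance)
prob = 1 - NormalDist().cdf(z)  # area to the right of z

print(round(variance, 6), round(z, 2), round(prob, 3))
```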