Probability
Random Variables
Sampling Distributions
Central Limit Theorem
Problems
Review:
Two types of Normal Probability
Problems
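Type 1: given a raw score, find the probability
Given
Raw score (y) -> z score -> tail area (from tables) -> required p
using $z = \frac{y - \mu}{\sigma}$
Type 2: given a probability, find the raw score
Given
p -> tail area -> z score (from tables) -> Raw score (y)
using $y = \mu + z\sigma$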
L5.3
IQ tests
Scores on intelligence tests are normally
distributed.
Intelligence tests are often scaled to have a
mean $\mu = 100$ and standard deviation $\sigma = 15$
L5.4
Solving probability problems
Draw a diagram
L5.5
Review Exercise 1
What is the probability that a randomly chosen friend has
an IQ of at least 147?
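A minimal sketch of this calculation, assuming the IQ scaling given earlier (mean 100, standard deviation 15). It uses Python's standard-library `NormalDist` in place of printed tables, so the tail area may differ slightly from a table lookup with z rounded to 2 decimal places.

```python
from statistics import NormalDist

# IQ model from the slides: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# z score for a raw score of 147.
z = (147 - iq.mean) / iq.stdev

# Upper-tail area: P(IQ >= 147) = 1 - P(IQ < 147).
p_at_least_147 = 1 - iq.cdf(147)

print(f"z = {z:.2f}, P(IQ >= 147) = {p_at_least_147:.5f}")
```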
L5.6
Review Exercise 2
What is the value of the IQ that cuts off the top 20% of the
distribution of IQs?
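A sketch of the reverse problem (probability -> z score -> raw score), again assuming mean 100 and standard deviation 15. `inv_cdf` plays the role of reading the tables backwards: cutting off the top 20% is the same as having 80% of the area to the left.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# z score with 80% of the standard normal area to its left.
z_cut = NormalDist().inv_cdf(0.80)

# Convert back to a raw score: y = mu + z * sigma.
iq_cut = iq.mean + z_cut * iq.stdev

print(f"z = {z_cut:.2f}, IQ cutoff = {iq_cut:.1f}")
```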
L5.8
Probability
Pick a colour from the grid below!
L5.11
Probability Characteristics
1. A number representing a probability must lie between 0
(absolute impossibility) and 1 (absolute certainty)
Example:
the number that comes up in
one throw of a die is a
random variable
L5.14
Probability Distribution of a Random Variable
The collection of
all possible values,
together with the probabilities that they occur,
is the probability distribution of the random variable.
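As an illustration, the probability distribution for one throw of a die can be written out as a table of values and probabilities. The sketch below assumes the die is fair (an assumption made only for this example).

```python
from fractions import Fraction

# Probability distribution for one throw of a six-sided die.
# Assumption for illustration: the die is fair, so each face has probability 1/6.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Each probability lies between 0 and 1, and together they sum to 1.
all_valid = all(0 <= prob <= 1 for prob in die_distribution.values())
total = sum(die_distribution.values())

# Mean of the random variable: sum of value * probability.
expected_value = sum(face * prob for face, prob in die_distribution.items())

print(all_valid, total, expected_value)
```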
L5.17
The true proportion (p)
Let p represent the population proportion:
the true, usually unknown, proportion
L5.19
The Sample Proportion ($\hat{p}$)
Let $\hat{p}$ represent a sample proportion
L5.20
Sample proportions vary!
[Figure: several samples, each with an "Other Colours" segment, giving sample proportions $\hat{p}_1$, $\hat{p}_2$, $\hat{p}_3$, $\hat{p}_4$]
L5.21
The Sample Proportion: a Random Variable!
Sampling Distribution:
The distribution of sample proportions of Red
Smarties
Working out the sampling distribution by hand is too hard:
use a computer
L5.23
Simulation: Sampling Distribution of
Proportions
We can use Excel, R or some other computer program to
simulate many different random samples of Smarties,
count the number of Red Smarties in each sample, and
produce a histogram of the proportion of Red Smarties in each
sample
Simulation:
Take 3000 different random samples, all of size 100 Smarties, from a
population with p = 0.125.
We expect most of these 3000 sample proportions
to be close to p = 0.125.
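The simulation described here could be sketched as follows. This is an illustrative version in Python rather than the Excel or R used for the slides; the seed and the text histogram are arbitrary choices for the sketch.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed, only so the run is repeatable

P_RED = 0.125       # population proportion of Red Smarties (from the slide)
SAMPLE_SIZE = 100   # Smarties per sample
N_SAMPLES = 3000    # number of simulated samples

def sample_proportion(n, p):
    """Proportion of Red Smarties in one random sample of size n."""
    reds = sum(random.random() < p for _ in range(n))
    return reds / n

proportions = [sample_proportion(SAMPLE_SIZE, P_RED) for _ in range(N_SAMPLES)]

# Crude text histogram: one star per 10 samples at each observed proportion.
counts = {}
for p_hat in proportions:
    counts[p_hat] = counts.get(p_hat, 0) + 1
for p_hat in sorted(counts):
    print(f"{p_hat:.2f} {'*' * (counts[p_hat] // 10)}")

print("mean of sample proportions:", round(mean(proportions), 4))
print("sd of sample proportions:  ", round(stdev(proportions), 4))
```

The mean of the simulated proportions should land close to 0.125, with most samples within a few hundredths of it.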
L5.24
Sample proportions from a population
with p = 0.125, n=100
[Histogram: 100 Smarties in each sample]
If $np \ge 5$ and $n(1-p) \ge 5$,
the distribution of sample proportions is approximately normal.
L5.27
Standard Deviation of the Distribution of $\hat{p}$
For any normal distribution, there are two parameters:
the mean ($\mu$) and the standard deviation ($\sigma$).
In this case we know the mean:
it is $\mu = p$, and we know that $p = 0.125$.
The standard deviation of the distribution of sample proportions
depends on $p$ and $n$.
It is given by
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.125(0.875)}{100}} = 0.03307$
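The same arithmetic as a short sketch; the function name is made up for this illustration.

```python
from math import sqrt

def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

sd_100 = sd_sample_proportion(0.125, 100)
print(round(sd_100, 5))  # 0.03307
```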
L5.29
Notation
Samples estimate Populations
                     Sample         Population
Mean                 $\bar{y}$      $\mu$
Median               $\tilde{y}$    $\tilde{\mu}$
St Dev               $s$            $\sigma$
Proportion           $\hat{p}$      $p$
St dev($\hat{p}$)                   $\sqrt{p(1-p)/n}$
Taking a larger sample
What will happen to the distribution of Sample proportions if
we take a larger sample?
L5.31
Sample proportions from a population
with p = 0.125, n=400
[Histogram: 400 Smarties in each sample]
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.125(0.875)}{400}} = 0.01654$
L5.34
Larger sample smaller spread
When we increase the sample size
We obtain a similar estimate of p
The distribution is less spread out
Larger sample
Smaller spread
More accurate estimate
Less ERROR
L5.35
The Distribution of Sample Proportions is....
Approximately Normal
the mean of this Normal distribution is p, the true
(population) proportion
the standard deviation of this Normal distribution of sample
proportions is $\sqrt{\frac{p(1-p)}{n}}$
IF
The observations are independent of each other, and
the sample size is large enough
i.e. $np$ and $n(1-p)$ are both $\ge 5$
OR $n\hat{p}$ and $n(1-\hat{p})$ are both $\ge 5$
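The sample-size condition can be checked mechanically. A minimal sketch, where the function name and the n = 20 case are hypothetical, added only to show a sample that fails the rule:

```python
def normal_approx_ok(n, p, threshold=5):
    """Rule of thumb from the slide: np and n(1 - p) both at least 5."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Smarties example from the slides: np = 12.5 and n(1-p) = 87.5.
print(normal_approx_ok(100, 0.125))

# Hypothetical small sample: np = 2.5, so the rule fails.
print(normal_approx_ok(20, 0.125))
```

Independence of the observations still has to be judged from how the sample was taken; the code only covers the size condition.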
L5.36
Applications
Example 1:
The proportion of boys born in Australia is known to be 0.517.
Consider a random sample of 1600 births from Australia.
What is the probability that there will be less than 50%
boys in this sample?
50% or fewer boys means the proportion of boys in the sample
is less than or equal to 0.5
Given: p = 0.517 and n = 1600
Check: np = 827.2 and n(1-p) = 772.8
Distribution: Since $np$ and $n(1-p)$ are both $\ge 5$,
$\hat{p}$ ~ normal distribution
Find: Probability $\hat{p} < 0.5$ (or $\hat{p} \le 0.5$)
L5.38
Example 1
[Figure: normal curve showing the score and the required region]
L5.39
Normal Probability problems
Draw a diagram
Shade in the required area
Find the z score
Find the shaded area(s)
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Given
Raw score -> z score -> tail area (from tables) -> required p
L5.40
Distribution of Sample Proportion, p = 0.517
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{1600}} = 0.012493$
5.40
Standardizing z-scores
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Individual scores ($y$): $z = \frac{y - \mu}{\sigma}$
Proportions ($\hat{p}$): $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
L5.42
What is the probability of less than 50% boys in a
sample of 1600 births?
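mean of $\hat{p}$: p = 0.517
sd($\hat{p}$) = 0.012493
If $\hat{p} = 0.5$:
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{0.5 - 0.517}{0.012493} = -1.36$
[Figure: normal curve with 0.5 and 0.517 marked on the $\hat{p}$ axis, and -1.36 and 0 on the z axis]
Left Tail area = 0.0869
= probability
There is a 0.0869 probability that a random sample of 1600 births from the
population will contain fewer than 50% boys.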
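A sketch of the same steps (standard deviation, z score, left tail area) in code, with `NormalDist` standing in for the tables. The function name is made up for illustration. Because the code does not round z to 2 decimal places, the area may differ from the table value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_proportion_below(cutoff, p, n):
    """Return (z, P(p_hat < cutoff)) under the normal approximation."""
    sd = sqrt(p * (1 - p) / n)      # sd of the sampling distribution
    z = (cutoff - p) / sd           # standardise the cutoff
    return z, NormalDist().cdf(z)   # left-tail area

z_1600, prob_1600 = prob_sample_proportion_below(0.5, 0.517, 1600)
print(f"z = {z_1600:.2f}, left tail area = {prob_1600:.4f}")
```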
L5.43
Interpretation, Continued
So, for a sample of size 1600, there is about a 9% chance that
a single sample will contain fewer than 50% boys.
L5.44
Increasing the sample size
This time, we take a random sample of n=6400 births
This sample is 4 times as large as before!
The mean of the sampling distribution will not change:
p = 0.517, but
the standard deviation will change, because of the n in
the denominator of the formula for the sd:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{6400}} = 0.0062464$
L5.45
Larger sample
The std dev of the sampling distribution for n = 1600 was
0.012493.
The std dev of the sampling distribution with n=6400 is
0.0062464.
The sd based on the larger sample is exactly half that of the
smaller sample, because the sample is 4 times as large and $\sqrt{4} = 2$.
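A quick check of the halving, as a sketch (the function name is illustrative):

```python
from math import sqrt

def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

sd_1600 = sd_sample_proportion(0.517, 1600)
sd_6400 = sd_sample_proportion(0.517, 6400)

# Quadrupling n divides the standard deviation by sqrt(4) = 2.
ratio = sd_1600 / sd_6400
print(f"{sd_1600:.6f} {sd_6400:.7f} ratio = {ratio:.3f}")
```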
L5.46
Check the assumptions again
We have a random sample from the population,
and $np = 6400 \times 0.517 = 3308.8$ and
$n(1-p) = 6400 \times 0.483 = 3091.2$
i.e. since both $np$ and $n(1-p)$ are $\ge 5$, the sample size is large
enough to assume $\hat{p}$ ~ normal distribution.
L5.47
What is the probability of less than 50% boys in a
sample of 6400 births?
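mean of $\hat{p}$: p = 0.517
sd($\hat{p}$) = 0.0062464
If $\hat{p} = 0.5$:
$z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{0.5 - 0.517}{0.0062464} = -2.72$
[Figure: normal curve with 0.5 and 0.517 marked on the $\hat{p}$ axis, and -2.72 and 0 on the z axis]
Left Tail area = 0.0033
= probability
There is a 0.0033 probability that a random sample of 6400 births from the
population will contain fewer than 50% boys.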
L5.48
Larger sample Smaller spread
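n = 1600: p = 0.517, sd($\hat{p}$) = 0.012493, $\hat{p} = 0.5$, left area = 0.0869
n = 6400: p = 0.517, sd($\hat{p}$) = 0.0062464, $\hat{p} = 0.5$, left area = 0.0033
[Figure: two normal curves for $\hat{p}$, each with 0.5 and 0.517 marked on the axis]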
L5.49
Larger Sample Smaller Error
If we take a random sample of 6400 births,
then we will make the mistake of deciding that the proportion
of boy births is less than 0.5 only a very small fraction of the
time, namely 0.0033 (= 0.33%)
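To see how the chance of this mistake shrinks with sample size, the calculation can be repeated for both sample sizes. A sketch, with a made-up function name; the areas are computed without rounding z, so they can differ from the table values in the last digit.

```python
from math import sqrt
from statistics import NormalDist

P_BOYS = 0.517  # population proportion of boys, from the example

def prob_below_half(n, p=P_BOYS):
    """P(p_hat < 0.5) using the normal approximation for p_hat."""
    sd = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=sd).cdf(0.5)

for n in (1600, 6400):
    print(f"n = {n}: P(p_hat < 0.5) = {prob_below_half(n):.4f}")
```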
L5.50
Example 2
SIBT believes that 55% of the current students are of Chinese
nationality.
L5.51
Example 2, Solution
Given: p = 0.55, n = 150, $\hat{p} = 90/150 = 0.6$
Find: Probability that $\hat{p} < 0.6$
Check:
L5.52
Sampling Distribution for Means
We have seen that sample proportions are random variables
whose distribution is approximately Normal.
L5.55
Example: Distribution of Sample Means
L5.56
IQ scores: $\mu = 100$, $\sigma = 15$
[Histogram: individual scores; frequency (0 to 200) against IQ score (50 to 150)]
L5.58
Example: Distribution of Sample Means
Results of IQ tests can be
scaled so that $\mu = 100$, $\sigma = 15$.
It is known that IQs are
normally distributed
[Figure: normal curve centred at 100]
L5.59
IQ scores: $\mu = 100$, $\sigma = 15$
[Histograms, frequency (0 to 200) against IQ score (50 to 150):
individual scores; averages from samples of size 4; averages from samples of size 10]
L5.61
Descriptive Statistics
IQ scores: $\mu = 100$, $\sigma = 15$
Variable       Size    Mean ($\bar{y}$)    StDev ($s$)    Min    Max
Individuals    500     100.67              14.60          51     147
Means of 4     500      99.68               7.58          78     120
Means of 10    500      99.78               4.79          84     112
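The StDev column can be compared with what theory predicts for the mean of n independent scores, namely the population standard deviation divided by the square root of n. A sketch, with the observed values copied from the table and a function name made up for illustration:

```python
from math import sqrt

SIGMA = 15  # population standard deviation of IQ scores (from the slide)

def sd_of_sample_mean(sigma, n):
    """Theoretical standard deviation of the mean of n independent scores."""
    return sigma / sqrt(n)

# Observed StDev values from the descriptive statistics (n = 1 is individuals).
observed = {1: 14.60, 4: 7.58, 10: 4.79}

for n, obs in observed.items():
    theory = sd_of_sample_mean(SIGMA, n)
    print(f"n = {n:2d}: theory {theory:5.2f}, observed {obs:5.2f}")
```

The observed values sit close to 15, 7.5 and about 4.74, which is what the simulated samples are expected to show.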
L5.62
The Population Distributions
[Figure: distribution of individual scores and distribution of averages of 4 scores, on an axis from 50 to 150]
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
L5.63
Some Simulations
On the following slides we show some simulations of random
samples from 3 different populations;
1. normal,
2. uniform and
3. a funny (skewed) population.
L5.64
Sample Means from a Normal Population
[Figure: histograms of individual values and of sample means for n = 5, n = 15 and n = 25; horizontal axis from -4 to 4]
L5.65
Sample Means from a Uniform Population
[Figure: histograms of individual values and of sample means for n = 5, n = 15 and n = 25; horizontal axis from 0 to 1]
L5.66
Sample Means from a Skewed Population
[Figure: histograms of individual values and of sample means for n = 5, n = 15 and n = 25; horizontal axis from 0 to 1]
L5.67
The Distribution of Sample Means
From the previous examples, it is clear that the distribution of
averages is approximately normal.
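A simulation along these lines can be sketched for the uniform population. The number of samples (2000) and the seed are arbitrary choices for this illustration, not values from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2)  # arbitrary seed, only so the run is repeatable

def simulate_sample_means(n, n_samples=2000):
    """Means of n_samples random samples of size n from Uniform(0, 1)."""
    return [mean([random.random() for _ in range(n)]) for _ in range(n_samples)]

SIGMA_UNIFORM = 1 / sqrt(12)  # standard deviation of Uniform(0, 1)

results = {}
for n in (5, 15, 25):
    means = simulate_sample_means(n)
    results[n] = (mean(means), stdev(means))
    print(f"n = {n:2d}: mean {results[n][0]:.3f}, sd {results[n][1]:.3f}, "
          f"sigma/sqrt(n) = {SIGMA_UNIFORM / sqrt(n):.3f}")
```

The sample means centre on 0.5 and their spread shrinks as n grows, even though the individual values are not normally distributed.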
L5.70
Distribution of sample means
L5.74
Distribution of sample means
approx normal
Mean=m
sd = s/5
L5.76
The spread of the distribution of means
$\mathrm{se}(\bar{y}) = \frac{\sigma}{\sqrt{n}}$
L5.79
Notation
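Sample Statistics estimate Population Parameters
                   Sample                           Population
Mean               $\bar{y}$                        $\mu$
Median             $\tilde{y}$                      $\tilde{\mu}$
Std.dev            $s$                              $\sigma$
Proportion         $\hat{p}$                        $p$
se($\hat{p}$)      $\sqrt{\hat{p}(1-\hat{p})/n}$    $\sqrt{p(1-p)/n}$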
L5.80
The standard error of the distribution of sample
means and proportions
A Larger sample
Less spread
More accurate estimate
Less ERROR
Individual scores ($y$): $z = \frac{y - \mu}{\sigma}$
Averages ($\bar{y}$): $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
L5.84
Distributions
L5.85
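Individual scores: $z = \frac{y - \mu}{\sigma}$
Given
Raw score (y) -> z score -> tail area (from tables) -> required p
Sample means: $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Given
sample mean ($\bar{y}$) -> z score -> tail area (from tables) -> required p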
L5.86
Example 1 Question
Find area under the
curve
[Figure: normal curve showing the score and the required region]
L5.87
What is the probability that we will get a sample mean
of at least 120?
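Given: IQs ~ normal distribution: $\mu = 115$ and $\sigma = 8$, $n = 25$
average IQ ~ normal distribution
If $\bar{y} = 120$ -> $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 115}{8/\sqrt{25}} = 3.13$ (2dp)
[Figure: normal curve with 115 and 120 marked on the $\bar{y}$ axis, and 0 and 3.13 on the z axis]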
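The remaining step, the upper tail area, can be sketched in code. The slide does not state the final probability, so the value printed here is a computed illustration using the unrounded z score rather than a table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 115, 8, 25   # values given in the example

se = sigma / sqrt(n)        # standard deviation of the sample mean
z = (120 - mu) / se         # z score for a sample mean of 120

# Upper-tail area: P(sample mean >= 120).
p_at_least_120 = 1 - NormalDist().cdf(z)

print(f"se = {se:.2f}, z = {z:.3f}, P(mean >= 120) = {p_at_least_120:.5f}")
```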
Assume that:
The mean yearly income in the population of teachers in Australia
is $\mu = \$38,000$.
The standard deviation of yearly incomes in the population is
$\sigma = \$20,000$.
What is the probability that the average income of 30 teachers in
Australia will be less than $40,000?
L5.89
Example 2 Question
Find area under the
curve
L5.90
Applying the CLT to Income Example, n=30
We have
$\mu = \$38,000$
$\sigma = \$20,000$,
$n = 30$
[Figure: distribution of $\bar{y}$ centred at 38, axis in thousands of dollars]
L5.91
Example 2: Solution
What is the probability that the average income of a random sample of 30
teachers in Australia is less than $40,000 per year?
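Given: $\mu = \$38,000$, $\sigma = \$20,000$, $n = 30$
CLT check:
Since n = 30 is > 25, the sample is large enough and the CLT for means applies
If $\bar{y} = 40$ (in thousands of dollars) -> $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Tail area =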
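One way to fill in the numbers is sketched below. The slide leaves the z score and tail area blank, so these are computed values under the stated assumptions (n = 30 treated as large enough for the CLT), not figures taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 38_000, 20_000, 30   # values given in the example

se = sigma / sqrt(n)                # standard deviation of the sample mean
z = (40_000 - mu) / se              # z score for a sample mean of $40,000

# Left-tail area: P(sample mean < 40,000).
p_below_40k = NormalDist().cdf(z)

print(f"se = {se:.0f}, z = {z:.2f}, P(mean < 40000) = {p_below_40k:.3f}")
```

With these inputs the z score comes out a little above 0.5, so the probability is roughly 0.7.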
L5.95
Central Limit Theorem
The Central Limit Theorem applies to both means and
proportions under certain conditions.
1. The observations must be independent of each other
2. The data must be from a normal distribution OR
the sample size, n, must be large enough
Score     Statistic                                        Sample size (n)                 Reason
counts    $\hat{p}$ ~ normal($p$, $\sqrt{p(1-p)/n}$)       $np \ge 5$, $n(1-p) \ge 5$      CLT
L5.96
Standardizing z-scores
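$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Individual scores ($y$): $z = \frac{y - \mu}{\sigma}$
Mean ($\bar{y}$): $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Proportions ($\hat{p}$): $z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$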
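The three standardizing formulas side by side, as a sketch. The function names are made up for illustration; the inputs are the ones used in the examples from this lecture.

```python
from math import sqrt

def z_individual(y, mu, sigma):
    """z score for an individual score."""
    return (y - mu) / sigma

def z_mean(y_bar, mu, sigma, n):
    """z score for a sample mean: the spread is sigma / sqrt(n)."""
    return (y_bar - mu) / (sigma / sqrt(n))

def z_proportion(p_hat, p, n):
    """z score for a sample proportion: the spread is sqrt(p(1 - p) / n)."""
    return (p_hat - p) / sqrt(p * (1 - p) / n)

print(f"{z_individual(147, 100, 15):.3f}")     # IQ of 147
print(f"{z_mean(120, 115, 8, 25):.3f}")        # mean IQ of 120, n = 25
print(f"{z_proportion(0.5, 0.517, 1600):.3f}") # 50% boys, n = 1600
```

In every case the numerator is "score minus mean"; only the standard deviation in the denominator changes with the kind of score.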
L5.97