Lecture 5

Review
Probability
Random Variables
Sampling Distributions
Central Limit Theorem
Problems
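Review: Two Types of Normal Probability Problems
1. Given a raw score (y), find a probability (p):
raw score (y) -> z score -> tail area (from tables) -> required p
$z = \frac{y - \mu}{\sigma}$
2. Given a probability (p), find a raw score (y):
p -> tail area -> z score (from tables) -> raw score (y)
$y = \mu + z\sigma$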
L5.3
IQ tests
Scores on intelligence tests are normally
distributed.
Intelligence tests are often scaled to have a
mean, μ = 100, and std dev, σ = 15.
[Figure: normal curve of IQ scores, axis marked at 55, 70, 85, 100, 115, 130, 145, i.e. μ ± 1σ, 2σ, 3σ]
L5.4
Solving probability problems
Draw a diagram
Shade in the required (or given) area
Find the z score
Find the corresponding probability (or raw score)
L5.5
Review Exercise 1
What is the probability that a randomly chosen friend has
an IQ of at least 147?
L5.6
Review Exercise 2
What is the value of the IQ that cuts off the top 20% of the
distribution of IQs?
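The two review exercises are one problem of each type, so they can be checked numerically. The sketch below is only an illustration: it uses Python's standard-library `statistics.NormalDist` in place of printed tables, and the variable names are made up for this example. Because it does not round z to two decimal places, its answers can differ slightly from table look-ups.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # IQ scores: mean 100, std dev 15
standard = NormalDist()             # standard normal (z) distribution

# Exercise 1 (raw score -> z score -> tail area):
z_147 = (147 - iq.mean) / iq.stdev
p_at_least_147 = 1 - standard.cdf(z_147)      # area in the right tail

# Exercise 2 (tail area -> z score -> raw score):
# the top 20% starts where 80% of the area lies to the left.
z_top20 = standard.inv_cdf(0.80)
cutoff_top20 = iq.mean + z_top20 * iq.stdev   # y = mu + z * sigma

print(f"z = {z_147:.2f}, P(IQ >= 147) = {p_at_least_147:.5f}")
print(f"z = {z_top20:.2f}, IQ cutting off the top 20% = {cutoff_top20:.1f}")
```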
L5.8
Probability
Pick a colour from the grid below!
1. Probability you chose Red

2. Probability you chose Green
3. Probability you chose Green, Blue, Yellow or Hot Pink
4. Probability you did not choose Red
L5.11
Probability Characteristics
1. A number representing a probability must lie between 0
(absolute impossibility) and 1 (absolute certainty)
2. For any particular set of events of interest (e.g. tossing a
coin, throwing 2 dice), if you add up the probabilities
of all the possible simple events for that set (e.g. getting
a head, getting a tail) you must get 1
L5.12
Random Variables
If the outcome of a random event can be written as a
number then the variable is called a random variable
Example:
the number that comes up in
one throw of a die is a
random variable
L5.14
Probability Distribution of a Random Variable
The collection of
all possible values,
together with the probabilities that they occur,
is called the probability distribution of the random variable.
Example: one throw of a die
(1) The possible outcomes of the throw of a die are the integers 1 to 6.
(2) The probability of any one number is 1/6.
Because every outcome is equally likely, the probability distribution of the result of a die toss is
UNIFORM
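As a small illustration, the die's probability distribution can be written down and checked against the two probability characteristics stated earlier. This is a sketch only; the dictionary layout and variable names are choices made for this example.

```python
from fractions import Fraction

# Probability distribution of one throw of a fair die:
# each of the six outcomes has probability 1/6 (a uniform distribution).
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Characteristic 1: every probability lies between 0 and 1.
all_valid = all(0 <= prob <= 1 for prob in die_distribution.values())

# Characteristic 2: the probabilities of all simple events add up to 1.
total = sum(die_distribution.values())

# Probability of an event made up of several outcomes, e.g. an even number.
p_even = sum(prob for face, prob in die_distribution.items() if face % 2 == 0)

print(all_valid, total, p_even)
```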
Sampling Distributions
Sampling
Consider the population of all Smarties
... and suppose we are interested in the proportion of Red
Smarties in this population.
To estimate the proportion of Red Smarties in this population
we take a sample from the population.
One sample gives one sample proportion of Red Smarties.
L5.17
The true proportion (p)
Let p represent the population proportion:
the true, usually unknown, proportion.
In this example, there are eight different colours of Smarties
and the same number of each colour is produced.
So in this example, we know that the true population
proportion, p, of Red Smarties is 1/8 or 12.5%.
L5.19
The Sample Proportion (p̂)
Let p̂ represent a sample proportion.
Sample proportions vary!
When we obtain random samples of the same size from the
same population, we will almost certainly get different values
for the sample proportions.
Samples vary by nature, so
the sample statistics we calculate from samples vary too.
(The sample proportion of Red Smarties, p̂, in a sample is
a sample statistic, because it is calculated from the sample data.)
L5.20
Sample proportions vary!
[Figure: four random samples of Smarties, each split into Red Smarties and Other Colours, giving four different sample proportions p̂1, p̂2, p̂3, p̂4]
L5.21
The Sample Proportion: a Random Variable!
If the outcome of a random event can be written as a
number then the variable is called a random variable.
The numerical value of a sample proportion is the outcome of
the random sample we obtained, so
the sample proportion is a random variable
and
the sample proportion has a probability distribution,
called its Sampling Distribution.
The Distribution of Sample Proportions of Red Smarties
To discover the probability distribution (the sampling distribution) of sample
proportions of Red Smarties we need to obtain many, many
random samples.
Doing this by hand is too hard, so we
use a computer.
L5.23
Simulation: Sampling Distribution of Proportions
We can use Excel or R or some other computer program to
simulate many different random samples of Smarties,
count the number of Red Smarties in each sample, and
produce a histogram of the proportion of Red Smarties in each sample.
Simulation:
Take 3000 different random samples, each of 100 Smarties, from a
population with p = 0.125.
We expect most of these 3000 sample proportions
to be close to p = 0.125.
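The lecture names Excel or R for this; the sketch below shows one possible way to run the same simulation in Python with the standard library. The function name, the fixed seed and the choice to model each Smartie as Red with probability p independently are assumptions of this example, so its numbers will not match the lecture's output exactly.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Return the proportion of Red Smarties in each of num_samples simulated samples."""
    rng = random.Random(seed)  # fixed seed (arbitrary) so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each Smartie in the sample is Red with probability p, independently.
        red_count = sum(rng.random() < p for _smartie in range(n))
        proportions.append(red_count / n)
    return proportions

proportions = simulate_sample_proportions(p=0.125, n=100, num_samples=3000)
sim_mean = statistics.mean(proportions)
sim_sd = statistics.stdev(proportions)

print(f"mean of sample proportions = {sim_mean:.4f}")
print(f"sd of sample proportions   = {sim_sd:.4f}")
print(f"min = {min(proportions):.2f}, max = {max(proportions):.2f}")
```

Passing `n=400` instead of `n=100` repeats the experiment with larger samples.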
L5.24
Sample proportions from a population
with p = 0.125, n = 100
[Histogram: 3000 sample proportions, 100 Smarties in each sample]
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
0.0300  0.1000   0.1200  0.1255  0.1500   0.2400
The Distribution?
The histogram of sample proportions
is centred close to 0.125 (as expected),
ranges from about 0.03 to 0.24,
seems unimodal,
looks fairly symmetric, and
has a shape like a Normal Distribution.
The distribution of sample proportions is
approximately Normal with mean = p
L5.26
Distribution of sample proportions is approximately Normal
if
np ≥ 5 and
n(1-p) ≥ 5
L5.27
Standard Deviation of the Distribution of p̂
For any normal distribution, there are two parameters:
the mean (μ) and the standard deviation (σ).
In this case we know the mean:
it is μ = p, and we know that p = 0.125.
The standard deviation of the distribution of sample proportions
depends on p and n.
It is given by
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
To calculate $\sigma_{\hat{p}}$, substitute the known values of p and n into the
formula ...
L5.28
Standard deviation of Sample Proportion,
n = 100
The standard deviation of the distribution of sample
proportions is given by
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
So, for our example, where p = 0.125 and n = 100,
$\sigma_{\hat{p}} = \sqrt{\frac{0.125 \times 0.875}{100}} = 0.03307$
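The substitution can be checked with a few lines of code. This is a minimal sketch; the function name `sd_sample_proportion` is made up for this example.

```python
import math

def sd_sample_proportion(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)

sd_100 = sd_sample_proportion(p=0.125, n=100)
print(f"sd of sample proportions for n = 100: {sd_100:.5f}")
```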
L5.29
Notation
Samples estimate Populations
              Sample    Population
Mean          ȳ         μ
Median        ỹ         μ̃
St Dev        s         σ
Proportion    p̂         p
St dev(p̂)               $\sqrt{p(1-p)/n}$
Taking a larger sample
What will happen to the distribution of sample proportions if
we take a larger sample?
Instead of taking samples of 100 Smarties, let's take samples of
400 Smarties.
We should get a better estimate of the true proportion, p.
L5.31
Sample proportions from a population
with p = 0.125, n = 400
[Histogram: sample proportions, 400 Smarties in each sample]
Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
0.0700  0.1125   0.1250  0.1245  0.1350   0.1900
L5.32
The Distribution?
This histogram of sample proportions
is centred close to 0.125 (as expected),
ranges from about 0.07 to 0.19,
seems unimodal,
looks very symmetric, and
has a shape like a Normal Distribution.
The distribution of sample proportions is
approximately Normal with mean = p
L5.33
Standard deviation of Sample Proportion,
n = 400
The standard deviation of the distribution of sample
proportions where p = 0.125 and n = 400 is
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.125 \times 0.875}{400}} = 0.01654$
L5.34
Larger sample, smaller spread
When we increase the sample size:
we obtain a similar estimate of p, and
the distribution is less spread out.
Larger sample
Smaller spread
More accurate estimate
Less ERROR
L5.35
The Distribution of Sample Proportions is ...
Approximately Normal
the mean of this Normal distribution is p, the true
(population) proportion
the standard deviation of this Normal distribution of sample
proportions is $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
IF
the observations are independent of each other, and
the sample size is large enough,
i.e. np ≥ 5 and n(1-p) ≥ 5
L5.36
Applications
Example 1:
The proportion of boys born in Australia is known to be 0.517.
Consider a random sample of 1600 births from Australia.
What is the probability that there will be fewer than 50%
boys in this sample?
Fewer than 50% boys means the proportion of boys in the sample, p̂,
is less than 0.5. (Under a continuous Normal model, P(p̂ < 0.5) = P(p̂ ≤ 0.5).)
Given: p = 0.517 and n = 1600
Check: np = 827.2 and n(1-p) = 772.8
Distribution: Since np and n(1-p) are both ≥ 5,
p̂ ~ approximately normal distribution
Find: Probability p̂ < 0.5 (equivalently, p̂ ≤ 0.5)
L5.38
Example 1
What is the probability that
there will be fewer than 50% boys born
in a sample of 1600 births?
Find the area under the curve.
Identify in the question: n (1600 births), the score (50%, i.e. p̂ = 0.5) and the region (less than).
L5.39
Normal Probability problems
Draw a diagram
Shade in the required area
Find the z score
Find the shaded area(s)
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Given a raw score:
raw score -> z score -> tail area (from tables) -> required p
L5.40
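Distribution of Sample Proportion, p = 0.517
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{1600}} = 0.012493$
[Figure: normal curve for p̂ centred at p = 0.517, with p - σ and p + σ marked, above a z axis running from -3 to 3]
The sample proportion comes from a normal distribution:
p̂ ~ normal dist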
5.40
Standardizing z-scores
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Individual scores (y): $z = \frac{y - \mu}{\sigma}$
Proportions (p̂): $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$
L5.42
What is the probability of fewer than 50% boys in a
sample of 1600 births?
p = 0.517
sd(p̂) = 0.012493
If p̂ = 0.5:
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.5 - 0.517}{0.012493} = -1.36$
[Figure: normal curve with p̂ = 0.5 (z = -1.36) and p = 0.517 (z = 0) marked, left tail shaded]
Left tail area = 0.0869
= probability
There is a 0.0869 probability that a sample of 1600 births from the
population will contain fewer than 50% boys.
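The whole calculation (assumption check, standard deviation, z score, tail area) can be sketched in code. This is an illustration rather than the lecture's method: the function name is invented for this example, and the standard-library `statistics.NormalDist` stands in for the printed tables. Because z is not rounded to two decimals, the probability may differ from 0.0869 in the last digit.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_below(p, n, cutoff):
    """Return (sd, z, probability) for P(p_hat < cutoff) under the normal approximation."""
    # Large-sample check used in the lecture: np >= 5 and n(1-p) >= 5.
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("sample too small for the normal approximation")
    sd = math.sqrt(p * (1 - p) / n)      # sd of the sampling distribution
    z = (cutoff - p) / sd                # standardise the cutoff
    return sd, z, NormalDist().cdf(z)    # area to the left of z

sd, z, prob = prob_sample_proportion_below(p=0.517, n=1600, cutoff=0.5)
print(f"sd = {sd:.6f}, z = {z:.2f}, P(p_hat < 0.5) = {prob:.4f}")
```

Calling the same function with `n=6400` gives the corresponding values for a sample four times as large.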
L5.43
Interpretation, Continued
So, for a sample of size 1600, there is about a 9% chance that
a single sample will contain fewer than 50% boys.
What happens if we take a larger sample?
L5.44
Increasing the sample size
This time, we take a random sample of n = 6400 births.
This sample is 4 times as large as before!
The mean of the sampling distribution will not change:
p = 0.517, but
the standard deviation will change, because of the n in
the denominator of the formula for the sd:
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{6400}} = 0.0062464$
L5.45
Larger sample
The std dev of the sampling distribution for n = 1600 was
0.012493.
The std dev of the sampling distribution with n=6400 is
0.0062464.
The sd based on the larger sample is exactly half that of the
smaller sample.
To halve the sd,
we need to quadruple the sample size!
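Why quadrupling n halves the sd follows directly from the formula. A short derivation, writing $\sigma_{\hat{p}}(n)$ (notation assumed here) for the sd at sample size n:

```latex
\sigma_{\hat{p}}(4n)
  = \sqrt{\frac{p(1-p)}{4n}}
  = \frac{1}{2}\sqrt{\frac{p(1-p)}{n}}
  = \frac{1}{2}\,\sigma_{\hat{p}}(n)
```

With the lecture's numbers, 0.012493 / 2 ≈ 0.0062465, which agrees with 0.0062464 up to rounding of the n = 1600 value.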
L5.46
Check the assumptions again
We have a random sample from the population,
and np = 6400 x 0.517 = 3308.8 and
n(1-p) = 6400 x 0.483 = 3091.2
Since both np and n(1-p) are ≥ 5, the sample size is large
enough to assume p̂ ~ normal distribution.
The assumptions are satisfied so the conclusions will be
sound.
L5.47
What is the probability of fewer than 50% boys in a
sample of 6400 births?
p = 0.517
sd(p̂) = 0.0062464
If p̂ = 0.5:
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.5 - 0.517}{0.0062464} = -2.72$
[Figure: normal curve with p̂ = 0.5 (z = -2.72) and p = 0.517 (z = 0) marked, left tail shaded]
Left tail area = 0.0033
= probability
There is a 0.0033 probability that a sample of 6400 births from the
population will contain fewer than 50% boys.
L5.48
Larger sample, smaller spread
n      mean (p)   sd(p̂)       Left area = P(p̂ < 0.5)
1600   0.517      0.012493    0.0869
6400   0.517      0.0062464   0.0033
[Figure: two normal curves for p̂, both centred at 0.517, with the area to the left of p̂ = 0.5 shaded; the n = 6400 curve is narrower]
L5.49
Larger Sample Smaller Error
If we take a random sample of 6400 births,
then we will make the mistake of deciding that the proportion
of boy births is less than 0.5 only a very small fraction of the
time, namely 0.0033 (= 0.33%)
Larger Sample Smaller Error
L5.50
Example 2
SIBT believes that 55% of the current students are of Chinese
nationality.
What is the probability that, out of a random sample of 150
students, fewer than 90 students are of Chinese nationality?
Given: p = 0.55, n = 150, p̂ = 90/150 = 0.6
Find: Probability that p̂ < 0.6
L5.51
Example 2, Solution
Given: p = 0.55, n = 150, p̂ = 90/150 = 0.6
Find: Probability that p̂ < 0.6
Check:
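The slide leaves the check and the calculation open. One way to work through them, following the same steps as Example 1, is sketched below. The variable names are made up for this example, `statistics.NormalDist` replaces the tables, and no continuity correction is applied, in line with the method used in this lecture.

```python
import math
from statistics import NormalDist

p, n = 0.55, 150
p_hat = 90 / n                     # sample proportion for 90 students: 0.6

# Check: is the sample large enough for the normal approximation?
expected_yes = n * p               # np
expected_no = n * (1 - p)          # n(1-p)
conditions_ok = expected_yes >= 5 and expected_no >= 5

# Standard deviation of the sampling distribution, z score, left-tail area.
sd = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / sd
prob = NormalDist().cdf(z)         # P(p_hat < 0.6)

print(f"np = {expected_yes:.1f}, n(1-p) = {expected_no:.1f}, conditions met: {conditions_ok}")
print(f"sd = {sd:.5f}, z = {z:.2f}, P(p_hat < 0.6) = {prob:.4f}")
```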
L5.52
Sampling Distribution for Means
We have seen that sample proportions are random variables
whose distribution is approximately Normal.
Now we consider the distribution of sample means from
numerical data.
Sample means are random variables in the same way as
sample proportions are random variables.
Each time we obtain a different random sample from a
population, the sample mean will vary.
L5.55
Example: Distribution of Sample Means
Results of IQ tests can be
scaled so that μ = 100, σ = 15.
It is known that IQs are
normally distributed.
[Figure: normal curve centred at 100]
1. Consider taking many observations from the population.
L5.56
IQ scores: μ = 100, σ = 15
[Histogram: individual scores; frequency (0 to 200) against IQ score (50 to 150)]
L5.58
Example: Distribution of Sample Means
Results of IQ tests can be
scaled so that μ = 100, σ = 15.
It is known that IQs are
normally distributed.
[Figure: normal curve centred at 100]
2. Consider taking many random samples from the population,
all of size (i) n = 4, (ii) n = 10.
a. Then calculate the mean of each of these samples.
b. Construct a histogram of the means of these samples,
and notice the shape of the distribution of averages.
L5.59
IQ scores: μ = 100, σ = 15
[Histograms, frequency (0 to 200) against IQ score (50 to 150): (a) individual scores; (b) averages from samples of size 4; (c) averages from samples of size 10]
L5.61
Descriptive Statistics
IQ scores: μ = 100, σ = 15
Variable      Size   Mean (ȳ)   StDev (s)   Min   Max
Individuals   500    100.67     14.60       51    147
Means of 4    500    99.68      7.58        78    120
Means of 10   500    99.78      4.79        84    112
The descriptive statistics show that the overall means of the
individual scores and of the sample means are close to 100.
The standard deviations are decreasing: the distributions are
becoming less spread out as the size of the samples increases.
The shape of each histogram appears normal.
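A simulation along these lines can be sketched as follows. It is an illustration only: the function name and seed are arbitrary choices, n = 1 stands for individual scores, and the simulated values will be close to, but not identical with, the table above.

```python
import random
import statistics

def simulate_means(mu, sigma, n, num_samples, rng):
    """Mean of each of num_samples random samples of size n from normal(mu, sigma)."""
    return [
        sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        for _sample in range(num_samples)
    ]

rng = random.Random(2024)  # arbitrary fixed seed so the run is repeatable
summary = {}
for n in (1, 4, 10):       # n = 1 corresponds to individual scores
    means = simulate_means(100, 15, n, 500, rng)
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean = {summary[n][0]:6.2f}, sd = {summary[n][1]:5.2f}")
```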
L5.62
The Population Distributions
As n increases the distribution of
means becomes more
compressed.
[Figure: three overlaid curves on an axis from 50 to 150: individual scores (widest), averages of 4 scores, averages of 10 scores (narrowest)]
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
L5.63
Some Simulations
On the following slides we show some simulations of random
samples from 3 different populations:
1. normal,
2. uniform and
3. a skewed ("funny") population.
For each we show the individual observations and the averages (means) of
5, 15 and 25 random observations.
L5.64
Sample Means from a Normal Population
[Figure: four histograms on an axis from -4 to 4: individual observations, means (n = 5), means (n = 15), means (n = 25)]
L5.65
Sample Means from a Uniform Population
[Figure: four histograms on an axis from 0 to 1: individual observations, means (n = 5), means (n = 15), means (n = 25)]
L5.66
Sample Means from a Skewed Population
[Figure: four histograms on an axis from 0 to 1: individual observations, means (n = 5), means (n = 15), means (n = 25)]
L5.67
The Distribution of Sample Means
From the previous examples, it is clear that the distribution of
averages is approximately normal.
Also, as the sample size, n, gets larger:
The centre remains the same
The spread decreases
The shape of the histogram becomes more normal.
These examples have demonstrated the results of the
Central Limit Theorem.
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
L5.68
The Central Limit Theorem (CLT)
The central limit theorem (CLT) is one of the most important
results in statistical theory.
It states that
if we take random samples of the same size n from a
population, even one which is not normally distributed,
then the sample means will approximately follow a normal distribution,
provided n, the sample size, is large enough.
The central limit theorem is one reason why normal
distributions are so frequent in nature.
The previous simulations and applets have demonstrated the
CLT.
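The effect can also be seen numerically. The sketch below assumes an exponential population with mean 1 as a stand-in for a right-skewed population (the lecture does not say which skewed population it used) and measures how the skewness of the sample means shrinks as n grows. The function names and seed are choices made for this example.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed standardised deviation (near 0 if symmetric)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sample_means(n, num_samples, rng):
    """Means of num_samples random samples of size n from a right-skewed population."""
    # Assumption: exponential population with mean 1 and standard deviation 1.
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _sample in range(num_samples)
    ]

rng = random.Random(42)  # arbitrary fixed seed
results = {}
for n in (1, 5, 15, 25):
    means = sample_means(n, 5000, rng)
    results[n] = (statistics.mean(means), statistics.stdev(means), skewness(means))
    print(f"n = {n:2d}: centre = {results[n][0]:.3f}, "
          f"spread = {results[n][1]:.3f}, skewness = {results[n][2]:.2f}")
```

The centre stays near 1, the spread shrinks roughly like 1/√n, and the skewness moves towards 0, which is the pattern listed for larger n.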
L5.69
What is large enough?
The closer the original population is to a normal distribution,
the smaller the sample size required for the CLT effect to
apply.
Usually n = 25 will be large enough to assume an

approximate normal distribution for sample means.
If the original population is a normal distribution,
then sample means will arise from a normal distribution,

regardless of sample size (n).
L5.70
Distribution of sample means
[Figure: sampling distributions of the mean, each labelled "approx" normal]
L5.74
Distribution of sample means
approx normal
Mean = μ
sd = σ/√5 (for samples of size n = 5; in general σ/√n)
L5.76
The spread of the distribution of means
The distribution of sample
means is less spread out for averages of larger samples.
An estimate of a population mean will be more accurate for
larger samples.
The error associated with the estimate of a population
mean will be smaller for larger samples.
The measure of the spread of the distribution of
sample means is called the
standard error of the mean.
L5.77
Standard Error of the mean
The measure of the spread
of the distribution of sample means is called the
standard error of the mean.
The standard error of the distribution of averages of random
samples depends on
the standard deviation of the population, σ, and
the sample size, n.
It is given by
$se(\bar{y}) = \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$
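As a quick check of the formula against the IQ simulation earlier in the lecture (σ = 15), the sketch below computes the standard error for a few sample sizes. The function name is made up for this example.

```python
import math

def standard_error_of_mean(sigma, n):
    """Standard error of the mean: population sd divided by the square root of n."""
    return sigma / math.sqrt(n)

se_by_n = {n: standard_error_of_mean(15, n) for n in (1, 4, 10, 25)}
for n, se in se_by_n.items():
    print(f"n = {n:2d}: se = {se:.2f}")
```

For n = 4 and n = 10 the formula gives 7.5 and about 4.74, close to the simulated standard deviations of 7.58 and 4.79 reported in the Descriptive Statistics table.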
L5.78
Distributions of average IQ scores
If n increases
1/n decreases
1/√n decreases
se(ȳ) = σ/√n decreases
L5.79
Notation
              Sample Statistics                Population Parameters
              (estimate)
Mean          ȳ                                μ
Median        ỹ                                μ̃
Std. dev      s                                σ
se(ȳ)         s/√n                             σ/√n
Proportion    p̂                                p
se(p̂)         $\sqrt{\hat{p}(1-\hat{p})/n}$    $\sqrt{p(1-p)/n}$
L5.80
The standard error of the distribution of sample
means and proportions
The distribution of sample means and
proportions is less spread out than the
distribution of individual scores, y.
A larger sample
Less spread
More accurate estimate
Less ERROR
So we call the standard deviation of the distributions
of both means and proportions the standard error.
L5.81
Example 1: IQ scores
What is the probability that the mean IQ of a class of 25
university students will be greater than 120?
Assume
that the measurements are independent of each other, and
μ = 115 and σ = 8 for the population of university
students.
Let y = IQ score of a random university student.
Then ȳ = average IQ of 25 university students.
We know that y ~ normal distribution (μ = 115 and σ = 8).
Find probability (ȳ > 120)
L5.83
Standardising scores
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Individual scores (y): $z = \frac{y - \mu}{\sigma}$
Averages (ȳ): $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
L5.84
Distributions
Let y = IQ score of a university student.
Distribution of individual IQs:
shape     centre     spread
normal    μ = 115    σ = 8
Then ȳ = average IQ of 25 university students.
Distribution of average IQs:
shape     centre     spread
normal    μ = 115    se(ȳ) = 8/√25 = 1.6
L5.85

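Given a raw score (y):
raw score (y) -> z score -> tail area (from tables) -> required p
$z = \frac{y - \mu}{\sigma}$
Given a sample mean (ȳ):
sample mean (ȳ) -> z score -> tail area (from tables) -> required p
$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$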
L5.86
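Example 1 Question
What is the probability that
the average IQ of a class of 25
university students will be
greater than 120?
Find the area under the curve.
Identify in the question: n (25 students), the score (ȳ = 120) and the region (greater than).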
L5.87
What is the probability that we will get a sample mean
of at least 120?
Given: IQs ~ normal distribution: μ = 115 and σ = 8, n = 25
average IQ ~ normal distribution
If ȳ = 120:
$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 115}{8/\sqrt{25}} = 3.13$ (2dp)
[Figure: normal curve with ȳ = 115 (z = 0) and ȳ = 120 (z = 3.13) marked, right tail shaded]
Tail area = 0.00087
= probability
The probability of obtaining a
mean IQ of at least 120, from a
sample of size 25, is 0.00087.
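The same steps can be sketched in code, with `statistics.NormalDist` standing in for the tables. The variable names are illustrative. Note that the unrounded z is 3.125, so the exact tail area (about 0.00089) differs slightly from the table value for z = 3.13 (0.00087).

```python
import math
from statistics import NormalDist

mu, sigma, n = 115, 8, 25
se = sigma / math.sqrt(n)                    # standard error of the mean: 8/5
z = (120 - mu) / se                          # standardise the sample mean

tail_exact = 1 - NormalDist().cdf(z)         # right tail using the unrounded z
tail_table = 1 - NormalDist().cdf(3.13)      # right tail using z rounded to 3.13

# Equivalent route: use the sampling distribution of the mean directly.
tail_direct = 1 - NormalDist(mu, se).cdf(120)

print(f"se = {se:.2f}, z = {z:.3f}")
print(f"tail (z unrounded) = {tail_exact:.5f}, tail (z = 3.13) = {tail_table:.5f}")
```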
L5.88
Example 2: Teachers' Incomes
We know that the distribution of incomes is not Normal!
It is unimodal but not symmetric.
(In fact, income distributions are typically skewed to the right.)
Assume that
the mean yearly income in the population of teachers in Australia
is μ = $38,000, and
the standard deviation of yearly incomes in the population is
σ = $20,000.
What is the probability that the average income of 30 teachers in
Australia will be less than $40,000?
L5.89
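Example 2 Question
What is the probability that
the average income of a
random sample of 30 teachers
is less than $40,000?
Find the area under the curve.
Identify in the question: n (30 teachers), the score (ȳ = $40,000) and the region (less than).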
L5.90
Applying the CLT to Income Example, n = 30
We have
μ = $38,000
σ = $20,000,
n = 30
[Figure: sampling distribution of ȳ centred at 38 ($000s)]
The CLT tells us that, for the sampling distribution of the means,
the shape will be approximately Normal,
the mean should be $38,000, and
the standard deviation should be $20,000/√30 = $3651.5
L5.91
Example 2: Solution
What is the probability that the average income of a random sample of 30
teachers in Australia is less than $40,000 per year?
Given: μ = $38,000, σ = $20,000, n = 30
CLT check:
Since n = 30 is > 25, the CLT for means applies.
If ȳ = 40 ($000s):
$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Tail area =
The probability that the average income is
less than $40,000 is
L5.92
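The values left open on the solution slide can be computed with the same steps as in Example 1. The sketch below is one way to do it, using `statistics.NormalDist` instead of tables; the variable names are illustrative, and because z is not rounded the result may differ slightly from a table look-up.

```python
import math
from statistics import NormalDist

mu, sigma, n = 38_000, 20_000, 30

se = sigma / math.sqrt(n)          # standard error of the mean
z = (40_000 - mu) / se             # standardise a sample mean of $40,000
prob = NormalDist().cdf(z)         # area to the left of z: P(mean income < $40,000)

print(f"se = ${se:,.1f}, z = {z:.2f}, P(mean < $40,000) = {prob:.3f}")
```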
Summary
Data (= Real world) vs. Theory
The CLT does not say that
the distribution of sample data
is Normal if the sample size is
large enough.
[Figure: bar chart of sample data with categories M and F, vertical axis from 0 to 0.6]
The CLT is about the distribution
of sample proportions and
sample means of many
different samples drawn from
the same population.
L5.95
Central Limit Theorem
The Central Limit Theorem applies to both means and
proportions under certain conditions.
1. The observations must be independent of each other
2. The data must be from a normal distribution OR
the sample size, n, must be large enough
Score              Statistic                                   Sample size (n)        Reason
y ~ normal(μ, σ)   ȳ ~ normal(μ, σ/√n)                         n > 0                  Fact!
y ~ any(μ, σ)      ȳ ~ approx. normal(μ, σ/√n)                 n ≥ 25                 CLT
counts             p̂ ~ approx. normal(p, $\sqrt{p(1-p)/n}$)    np ≥ 5, n(1-p) ≥ 5     CLT
L5.96
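Standardizing z-scores
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$
Individual scores (y): $z = \frac{y - \mu}{\sigma}$
Mean (ȳ): $z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$
Proportions (p̂): $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$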
L5.97
