
Review

Probability
Random Variables
Sampling Distributions
Central Limit Theorem
Problems
Review:
Two types of Normal Probability
Problems
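Type 1: given a raw score, find a probability
Given raw score (y) → z score → tail area (from tables) → required p

$z = \frac{y - \mu}{\sigma}$

Type 2: given a probability, find a raw score
Given p → tail area → z score (from tables) → raw score (y)

$y = \mu + z\sigma$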
L5.3
IQ tests
Scores on Intelligence tests are normally
distributed.
Intelligence tests are often scaled to have a
mean, μ = 100, and standard deviation, σ = 15

[Figure: Normal curve of IQ scores, horizontal axis marked at 55, 70, 85, 100, 115, 130, 145]

L5.4
Solving probability problems

1. Draw a diagram
2. Shade in the required (or given) area
3. Find the z score
4. Find the corresponding probability (or raw score)

L5.5
Review Exercise 1
What is the probability that a randomly chosen friend has
an IQ of at least 147?
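
One way to check an answer to this exercise without tables is sketched below. It assumes the IQ scaling from the earlier slide (mean 100, standard deviation 15) and uses Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

# Assumption taken from the IQ slide: scores are Normal with mean 100, sd 15.
iq = NormalDist(mu=100, sigma=15)

# Step 1: convert the raw score to a z score.
z = (147 - 100) / 15

# Step 2: "at least 147" is the upper tail, i.e. 1 minus the area to the left.
p_at_least_147 = 1 - iq.cdf(147)

print(f"z = {z:.2f}")
print(f"P(IQ >= 147) = {p_at_least_147:.5f}")
```

The result is a very small upper-tail area, which matches the diagram: 147 lies more than three standard deviations above the mean.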

L5.6
Review Exercise 2
What is the value of the IQ that cuts off the top 20% of the
distribution of IQs?
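
This second type of problem runs the steps in reverse: start from the tail area, find z, then convert to a raw score with y = μ + zσ. The sketch below assumes the same IQ scaling (mean 100, standard deviation 15) and uses `NormalDist.inv_cdf` in place of tables; names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 100, 15  # assumed IQ scaling from the earlier slide

# Top 20% means 80% of the area lies to the left of the cut-off.
z = NormalDist().inv_cdf(0.80)

# Convert the z score back to a raw score: y = mu + z * sigma.
cutoff = mu + z * sigma

print(f"z = {z:.4f}")
print(f"IQ cutting off the top 20% = {cutoff:.1f}")
```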

L5.8
Probability
Pick a colour from the grid below!

1. Probability you chose Red
2. Probability you chose Green
3. Probability you chose Green, Blue, Yellow or Hot Pink
4. Probability you did not choose Red

L5.11
Probability Characteristics
1. A number representing a probability must lie between 0
(absolute impossibility) and 1 (absolute certainty)

2. For any particular set of events of interest (e.g. tossing a
coin, throwing 2 dice, etc.), if you add up the probabilities
of all the possible simple events for that set (e.g. getting
a head, getting a tail, etc.) you must get 1
L5.12
Random Variables
If the outcome of a random event can be written as a
number then the variable is called a random variable

Example:
the number that comes up in one throw of a die is a random variable

L5.14
Probability Distribution of a Random Variable
The collection of all possible values,
together with the probabilities that they occur,
is called the probability distribution of the random variable.

(1) All the possible outcomes of the throw of a die are the integers 1 to 6.
(2) The probability of any one number is 1/6.

The probability distribution of the results of a die toss is UNIFORM.
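
As a small illustration, the die's probability distribution can be written down as a table of values and probabilities and checked against the two probability characteristics listed earlier. This is only a sketch; the dictionary name `die` is an illustrative choice.

```python
from fractions import Fraction

# Probability distribution of one throw of a fair die:
# each possible value (1 to 6) paired with its probability.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Characteristic 1: every probability lies between 0 and 1.
all_valid = all(0 <= prob <= 1 for prob in die.values())

# Characteristic 2: the probabilities of all simple events add up to 1.
total = sum(die.values())

# Uniform: every value has the same probability.
is_uniform = len(set(die.values())) == 1

print(die)
print(all_valid, total, is_uniform)
```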
Sampling Distributions
Sampling
Consider the population of all Smarties, and suppose we are interested in the proportion of Red Smarties in this population.

To estimate the proportion of Red Smarties in this population, we would take a sample from the population.

One sample gives one sample proportion of Red Smarties.

L5.17
The true proportion (p)
Let p represent the population proportion: the true, usually unknown, proportion.

In this example, there are eight different colours of Smarties and the same number of each colour is produced.

So in this example, we know that the true population proportion, p, of Red Smarties is 1/8 or 12.5%.

L5.19
The Sample Proportion (p̂)
Let p̂ represent a sample proportion.

Sample proportions vary!

When we obtain random samples of the same size from the same population, we will almost certainly get different values for the sample proportions.
Samples vary by nature, so the sample statistics we calculate from samples vary too.

(The sample proportion of Red Smarties, p̂, in a sample is a sample statistic, because it is calculated using the sample data.)

L5.20
Sample proportions vary!

[Figure: four pie charts, one per sample, each split into Red Smarties and Other Colours, with sample proportions p̂1, p̂2, p̂3, p̂4]

L5.21
The Sample Proportion: a Random Variable!

If the outcome of a random event can be written as a number then the variable is called a random variable.

The numerical value of a sample proportion is the outcome of the random sample we obtained, so
the sample proportion is a random variable
and
the sample proportion has a probability distribution, called its sampling distribution!
The Distribution of Sample Proportions of Red Smarties

To discover the probability distribution (the sampling distribution) of sample proportions of Red Smarties we need to obtain many, many random samples.

Doing this by hand is too hard, so we use a computer.

L5.23
Simulation: Sampling Distribution of Proportions
We can use Excel or R or some other computer program to
simulate many different random samples of Smarties,
count the number of Red Smarties in each sample, and
produce a histogram showing the proportion of Red Smarties in each sample.
Simulation:
Take 3000 different random samples, all of size 100 Smarties, from a population with p = 0.125.
We expect most of these 3000 sample proportions to be close to p = 0.125.
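
The simulation described above can be sketched in a few lines. The slides mention Excel or R; Python is used here purely for illustration, and the seed and function name are arbitrary choices, so the exact numbers will not match the lecture's output.

```python
import random
import statistics

rng = random.Random(2024)   # fixed seed (arbitrary) so the run is repeatable
p_true = 0.125              # true proportion of Red Smarties
n = 100                     # Smarties in each sample
num_samples = 3000          # number of random samples


def sample_proportion(n, p, rng):
    """Draw one random sample of n Smarties and return the proportion of Reds."""
    reds = sum(1 for _ in range(n) if rng.random() < p)
    return reds / n


props = [sample_proportion(n, p_true, rng) for _ in range(num_samples)]

mean_prop = statistics.mean(props)
sd_prop = statistics.stdev(props)

print(f"min = {min(props):.3f}, max = {max(props):.3f}")
print(f"mean of sample proportions = {mean_prop:.4f}")
print(f"sd of sample proportions   = {sd_prop:.4f}")
```

Plotting a histogram of `props` (for example with a plotting library) gives the picture discussed on the next slides.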
L5.24
Sample proportions from a population
with p = 0.125, n=100

[Figure: histogram of the sample proportions, 100 Smarties in each sample]

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
0.0300  0.1000   0.1200  0.1255  0.1500   0.2400
The Distribution?
The histogram of sample proportions
is centred close to 0.125 (as expected),
ranges from about 0.025 to 0.24,
seems unimodal, and
looks very symmetric.

Its shape looks like a Normal distribution.

The distribution of sample proportions is approximately Normal with mean = p.
L5.26
Distribution of sample proportions is Normal

If
np ≥ 5 and
n(1-p) ≥ 5

L5.27
Standard Deviation of the Distribution of p̂
For any normal distribution, there are two parameters:
the mean (μ) and the standard deviation (σ).
In this case we know the mean:
it is μ = p, and we know that p = 0.125.
The standard deviation of the distribution of sample proportions
depends on p and n. It is given by

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

To calculate it, substitute the known values of p and n into the formula.
L5.28
Standard deviation of Sample Proportion,
n=100
The standard deviation of the distribution of sample
proportions is given by

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

So, for our example, where p = 0.125 and n = 100,

$\sigma_{\hat{p}} = \sqrt{\frac{0.125(0.875)}{100}} = 0.03307$
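
The substitution above is easy to check by machine. The helper names `sd_sample_proportion` and `large_enough` below are hypothetical, introduced only for this sketch; the second one encodes the np ≥ 5 and n(1-p) ≥ 5 rule of thumb used in these slides.

```python
from math import sqrt


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


def large_enough(p, n, threshold=5):
    """Rule-of-thumb check that the Normal approximation is reasonable."""
    return n * p >= threshold and n * (1 - p) >= threshold


sd_100 = sd_sample_proportion(0.125, 100)
ok_100 = large_enough(0.125, 100)   # np = 12.5, n(1-p) = 87.5

print(f"sd of sample proportion (n=100) = {sd_100:.5f}")
print(f"sample size large enough? {ok_100}")
```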
L5.29
Notation
Samples estimate Populations

             Sample            Population
Mean         ȳ                 μ
Median       ỹ                 μ̃
St Dev       s                 σ
Proportion   p̂                 p
St dev(p̂)    √(p̂(1-p̂)/n)       √(p(1-p)/n)

Taking a larger sample
What will happen to the distribution of sample proportions if
we take a larger sample?

Instead of taking samples of 100 Smarties, let's take samples of 400 Smarties.

We should get a better estimate of the true proportion, p.

L5.31
Sample proportions from a population
with p = 0.125, n=400

[Figure: histogram of the sample proportions, 400 Smarties in each sample]

Min.    1st Qu.  Median  Mean    3rd Qu.  Max.
0.0700  0.1125   0.1250  0.1245  0.1350   0.1900
L5.32
The Distribution?
This histogram of sample proportions
is centred close to 0.125 (as expected),
ranges from about 0.07 to 0.19,
seems unimodal, and
looks very symmetric.

Its shape looks like a Normal distribution.

The distribution of sample proportions is approximately Normal with mean = p.
L5.33
Standard deviation of Sample Proportion,
n=400

The standard deviation of the distribution of sample
proportions where p = 0.125 and n = 400 is

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.125(0.875)}{400}} = 0.01654$

L5.34
Larger sample, smaller spread
When we increase the sample size
we obtain a similar estimate of p, and
the distribution is less spread out.

Larger sample
→ Smaller spread
→ More accurate estimate
→ Less ERROR

L5.35
The Distribution of Sample Proportions is....
Approximately Normal
the mean of this Normal distribution is p, the true
(population) proportion
the standard deviation of this Normal distribution of sample
proportions is

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

IF
the observations are independent of each other, and
the sample size is large enough,
i.e. np ≥ 5 and n(1-p) ≥ 5
OR np̂ ≥ 5 and n(1-p̂) ≥ 5
L5.36
Applications
Example 1:
The proportion of boys born in Australia is known to be 0.517.
Consider a random sample of 1600 births from Australia.
What is the probability that there will be less than 50%
boys in this sample?
Less than 50% boys means the proportion of boys in the sample, p̂,
is less than 0.5.
Given: p = 0.517 and n = 1600
Check: np = 827.2 and n(1-p) = 772.8
Distribution: Since np and n(1-p) are both ≥ 5,
p̂ ~ normal distribution
Find: Probability p̂ < 0.5 (the same area as p̂ ≤ 0.5 under the Normal approximation)
L5.38
Example 1

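What is the probability that there will be less than 50% boys born
in a sample of 1600 births?

Find the area under the curve.

[Annotations on the question: n is the sample size (1600), the score is 50%, and the region is "less than"]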
L5.39
Normal Probability problems

1. Draw a diagram
2. Shade in the required area
3. Find the z score
4. Find the shaded area(s)

$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$

Given raw score → z score → tail area (from tables) → required p
L5.40
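Distribution of Sample Proportion, p = 0.517

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{1600}} = 0.012493$

[Figure: Normal curve of p̂ centred at p = 0.517, with p - σ and p + σ marked, above a matching z axis running from -3 to 3]

The sample proportion comes from a normal distribution:
p̂ ~ normal dist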

5.40
Standardizing z-scores
$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$

Individual scores (y):

$z = \frac{y - \mu}{\sigma}$

Proportions (p̂):

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$

L5.42
What is the probability of less than 50% boys in a
sample of 1600 births?
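p = 0.517
sd(p̂) = 0.012493

If p̂ = 0.5:

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.5 - 0.517}{0.012493} = -1.36$

[Figure: Normal curve of p̂ centred at 0.517 with the area to the left of 0.5 shaded; on the z axis, 0.5 corresponds to z = -1.36]

Left tail area = 0.0869
= probability

There is a 0.0869 probability that a random sample of 1600 births from the
population will contain fewer than 50% boys.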
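
The table lookup in this example can be cross-checked with a short calculation. This is a sketch only, using the Normal approximation from the slides and Python's `statistics.NormalDist`; because z is not rounded to two decimals here, the answer may differ from the table value in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

p = 0.517   # population proportion of boy births (given)
n = 1600    # sample size (given)

# Standard deviation of the sample proportion.
sd = sqrt(p * (1 - p) / n)

# z score for a sample proportion of 0.5.
z = (0.5 - p) / sd

# Left tail area: probability that the sample proportion is below 0.5.
prob = NormalDist(mu=p, sigma=sd).cdf(0.5)

print(f"sd(p-hat) = {sd:.6f}")
print(f"z = {z:.2f}")
print(f"P(p-hat < 0.5) = {prob:.4f}")
```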
L5.43
Interpretation, Continued
So, for a sample of size 1600, there is about a 9% chance that
a single sample will contain fewer than 50% boys.

What happens if we take a larger sample?

L5.44
Increasing the sample size
This time, we take a random sample of n = 6400 births.
This sample is 4 times as large as before!
The mean of the sampling distribution will not change
(p = 0.517), but
the standard deviation will change, because of the n in
the denominator of the formula for the sd:

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.517(1-0.517)}{6400}} = 0.0062464$
L5.45
Larger sample
The std dev of the sampling distribution for n = 1600 was
0.012493.
The std dev of the sampling distribution with n=6400 is
0.0062464.
The sd based on the larger sample is exactly half that of the
smaller sample.

To halve the sd, we need to quadruple the sample size!
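
The halving follows directly from the n under the square root: multiplying n by 4 divides √(p(1-p)/n) by √4 = 2. A quick numerical check, with a hypothetical helper name used only for this sketch:

```python
from math import sqrt


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)


p = 0.517
sd_1600 = sd_sample_proportion(p, 1600)
sd_6400 = sd_sample_proportion(p, 6400)   # sample size quadrupled

ratio = sd_6400 / sd_1600

print(f"sd for n=1600: {sd_1600:.6f}")
print(f"sd for n=6400: {sd_6400:.7f}")
print(f"ratio: {ratio:.3f}")
```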

L5.46
Check the assumptions again
We have a random sample from the population,
and np = 6400 x 0.517 = 3308.8 and
n(1-p) = 6400 x 0.483 = 3091.2
i.e. since both np and n(1-p) are ≥ 5 the sample size is large
enough to assume p̂ ~ normal distribution.

The assumptions are satisfied so the conclusions will be sound.

L5.47
What is the probability of less than 50% boys in a
sample of 6400 births?
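p = 0.517
sd(p̂) = 0.0062464

If p̂ = 0.5:

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{0.5 - 0.517}{0.0062464} = -2.72$

[Figure: Normal curve of p̂ centred at 0.517 with the area to the left of 0.5 shaded; on the z axis, 0.5 corresponds to z = -2.72]

Left tail area = 0.0033
= probability

There is a 0.0033 probability that a random sample of 6400 births from the
population will contain fewer than 50% boys.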
L5.48
Larger sample Smaller spread
n      p      sd(p̂)      p̂    Left area
1600   0.517  0.012493   0.5  0.0869
6400   0.517  0.0062464  0.5  0.0033

[Figure: two Normal curves of p̂ centred at 0.517, one for each sample size, with the area to the left of 0.5 shaded]

L5.49
Larger Sample Smaller Error
If we take a random sample of 6400 births,
then we will make the mistake of deciding that the proportion
of boy births is less than 0.5 only a very small fraction of the
time, namely 0.0033 (= 0.33%)

Larger Sample Smaller Error

L5.50
Example 2
SIBT believes that 55% of the current students are of Chinese
nationality.

What is the probability that, out of a random sample of 150
students, fewer than 90 students are of Chinese nationality?

Given: p = 0.55, n = 150, p̂ = 90/150 = 0.6

Find: Probability that p̂ < 0.6

L5.51
Example 2, Solution
Given: p = 0.55, n = 150, p̂ = 90/150 = 0.6
Find: Probability that p̂ < 0.6
Check:
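
The slide leaves the working for this example blank. One possible way to carry it out, following the same steps as Example 1 and assuming the np ≥ 5, n(1-p) ≥ 5 rule of thumb, is sketched here; it is offered as a check on hand working, not as the lecture's own solution.

```python
from math import sqrt
from statistics import NormalDist

p = 0.55        # believed population proportion (given)
n = 150         # sample size (given)
p_hat = 90 / n  # sample proportion of interest, 0.6

# Check the sample size conditions for the Normal approximation.
np_value = n * p              # expected count of Chinese students
nq_value = n * (1 - p)        # expected count of other students
conditions_ok = np_value >= 5 and nq_value >= 5

# Standard deviation of the sample proportion, then the z score.
sd = sqrt(p * (1 - p) / n)
z = (p_hat - p) / sd

# Area to the left of p_hat under the Normal approximation.
prob = NormalDist(mu=p, sigma=sd).cdf(p_hat)

print(f"np = {np_value:.1f}, n(1-p) = {nq_value:.1f}, ok = {conditions_ok}")
print(f"sd = {sd:.5f}, z = {z:.2f}, P(p-hat < 0.6) = {prob:.4f}")
```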

L5.52
Sampling Distribution for Means
We have seen that sample proportions are random variables
whose distribution is approximately Normal.

Now we consider the distribution of sample means from numerical data.

Sample means are random variables in the same way as
sample proportions are random variables.
Each time we obtain a different random sample from a
population, the sample mean will vary.

L5.55
Example: Distribution of Sample Means

Results of IQ tests can be scaled so that μ = 100, σ = 15.
It is known that IQs are normally distributed.

[Figure: Normal curve centred at 100]

1. Consider taking many observations from the population.

L5.56
IQ scores: μ = 100, σ = 15
[Figure: histogram of individual scores; frequency (0 to 200) against IQ score (50 to 150)]

L5.58
Example: Distribution of Sample Means
Results of IQ tests can be scaled so that μ = 100, σ = 15.
It is known that IQs are normally distributed.

[Figure: Normal curve centred at 100]

2. Consider taking many random samples from the population,
all of size (i) n = 4, (ii) n = 10.
a. Then calculate the mean of each of these samples.
b. Construct a histogram of the means of these samples,
and notice the shape of the distribution of averages.

L5.59
IQ scores: μ = 100, σ = 15
[Figure: three histograms on a common scale, frequency (0 to 200) against score (50 to 150): individual scores, averages from samples of size 4, and averages from samples of size 10]

L5.61
Descriptive Statistics
IQ scores: μ = 100, σ = 15

Variable     Size  Mean (ȳ)  StDev (s)  Min  Max
Individuals  500   100.67    14.60      51   147
Means of 4   500   99.68     7.58       78   120
Means of 10  500   99.78     4.79       84   112

The descriptive statistics show that the overall means of the
individual scores and of the sample means are close to 100.
The standard deviations are decreasing: the distributions are
becoming less spread out as the size of the samples increases.
The shape of each histogram appears normal.
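
A comparable table can be produced by simulation. The sketch below draws 500 values (or 500 sample means) from a Normal population with μ = 100 and σ = 15 and sets the observed spread beside σ/√n; the seed and names are arbitrary, so the figures will differ slightly from the table above.

```python
import random
import statistics
from math import sqrt

rng = random.Random(1)    # fixed seed (arbitrary) for repeatability
mu, sigma = 100, 15       # IQ population parameters from the slides
num_samples = 500         # rows in each simulated column


def sample_means(n):
    """Return num_samples means, each from a random sample of size n."""
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.mean(sample))
    return means


# For each sample size: (mean of the means, sd of the means, sigma / sqrt(n)).
results = {}
for n in (1, 4, 10):
    means = sample_means(n)
    results[n] = (statistics.mean(means), statistics.stdev(means), sigma / sqrt(n))

for n, (centre, spread, theory) in results.items():
    print(f"n={n:2d}: mean={centre:7.2f}  sd={spread:5.2f}  sigma/sqrt(n)={theory:5.2f}")
```

The observed standard deviations in the slide's table (14.60, 7.58, 4.79) sit close to 15, 15/√4 = 7.5 and 15/√10 ≈ 4.74 in the same way.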

L5.62
The Population Distributions

As n increases the distribution of means becomes more compressed.

[Figure: overlaid curves for individual scores, averages of 4 scores, and averages of 10 scores; axis from 50 to 150]

http://onlinestatbook.com/stat_sim/sampling_dist/index.html
L5.63
Some Simulations
On the following slides we show some simulations of random
samples from 3 different populations:
1. normal,
2. uniform and
3. a funny (skewed) population.

For each we show samples of averages (means) of
5 random observations
15 random observations
25 random observations

L5.64
Sample Means from a Normal Population
[Figure: four histograms on an axis from -4 to 4: individual observations, means (n=5), means (n=15), means (n=25)]
L5.65
Sample Means from a Uniform Population
[Figure: four histograms on an axis from 0 to 1: individual observations, means (n=5), means (n=15), means (n=25)]
L5.66
Sample Means from a Skewed Population
[Figure: four histograms on an axis from 0 to 1: individual observations, means (n=5), means (n=15), means (n=25)]
L5.67
The Distribution of Sample Means
From the previous examples, we can see that the distribution of
averages is approximately normal.

Also, as the sample size, n, gets larger:
The centre remains the same
The spread decreases
The shape of the histogram becomes more normal.

These examples have demonstrated the results of the Central Limit Theorem.
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
L5.68
The Central Limit Theorem (CLT)
The central limit theorem (CLT) is one of the most important
results in statistical theory.
It states that
if we take random samples of the same size n from a
population, even one which is not normally distributed,
then the sample means will approximately follow a normal distribution,
provided n, the sample size, is large enough.

The central limit theorem is one reason why normal
distributions are so frequent in nature.
The previous simulations and applets have demonstrated the
CLT.
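
The effect can also be seen numerically rather than through histograms. The sketch below is an illustration under stated assumptions: it uses an exponential population as a stand-in for a right-skewed population (the slides do not say which skewed population was used) and measures how the skewness of the sample means shrinks towards 0, the value for a Normal distribution, as n grows.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed (arbitrary) for repeatability


def skewness(values):
    """Simple moment-based skewness; roughly 0 for a symmetric distribution."""
    centre = statistics.fmean(values)
    spread = statistics.pstdev(values)
    cubes = [((v - centre) / spread) ** 3 for v in values]
    return statistics.fmean(cubes)


def means_of(n, num_samples=4000):
    """Means of num_samples random samples of size n from a skewed population."""
    result = []
    for _ in range(num_samples):
        # Assumption: exponential population, chosen only because it is right-skewed.
        sample = [rng.expovariate(1.0) for _ in range(n)]
        result.append(statistics.fmean(sample))
    return result


skew = {n: skewness(means_of(n)) for n in (1, 5, 25)}

for n, value in skew.items():
    print(f"n={n:2d}: skewness of sample means = {value:.2f}")
```

With n = 1 the "means" are just individual observations, so the first line shows the skewness of the population itself.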
L5.69
What is large enough?
The closer the original population is to a normal distribution,
the smaller the sample size required for the CLT effect to
apply.

Usually n = 25 will be large enough to assume an
approximate normal distribution for sample means.

If the original population is a normal distribution,
then sample means will arise from a normal distribution,
regardless of sample size (n).

L5.70
Distribution of sample means

approx normal
Mean = μ
sd = σ/5 (i.e. σ/√n when n = 25)

L5.76
The spread of the distribution of means

The distribution of sample means is less spread out for averages of larger samples.
An estimate of a population mean will be more accurate for
larger samples.
The error associated with the estimate of a population
mean will be smaller for larger samples.

The measure of the spread of the distribution of
sample means is called the
standard error of the mean.
L5.77
Standard Error of the mean
The measure of the spread
of the distribution of sample means is called the
standard error of the mean.

The standard error of the distribution of averages of random
samples depends on
the standard deviation of the population, σ, and
the sample size, n.
It is given by

$se(\bar{y}) = \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$
L5.78
Distributions of average IQ scores
If n increases
1/n decreases
1/√n decreases

$se(\bar{y}) = \frac{\sigma}{\sqrt{n}}$

L5.79
Notation
Sample statistics estimate population parameters

             Sample statistic   Population parameter
Mean         ȳ                  μ
Median       ỹ                  μ̃
Std. dev     s                  σ
se(ȳ)        s/√n               σ/√n
Proportion   p̂                  p
se(p̂)        √(p̂(1-p̂)/n)        √(p(1-p)/n)


L5.80
The standard error of the distribution of sample
means and proportions

The distribution of sample means and
proportions is less spread out than the
distribution of individual scores, y.

A larger sample
→ Less spread
→ More accurate estimate
→ Less ERROR

So we call the standard deviation of the distributions
of both means and proportions the standard error.
L5.81
Example 1: IQ scores
What is the probability that the mean IQ of a class of 25
university students will be greater than 120?
Assume
that the measurements are independent of each other
μ = 115 and σ = 8 for the population of university
students
Let y = IQ score of a random university student.
Then ȳ = average IQ of 25 university students.

We know that y ~ normal distribution (μ = 115 and σ = 8).

Find probability (ȳ > 120)
L5.83
Standardising scores

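Score:

$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$

Individual scores (y):

$z = \frac{y - \mu}{\sigma}$

Averages (ȳ):

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$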

L5.84
Distributions

Let y = IQ score of a university student.

Distribution of individual IQs:
shape    centre    spread
normal   μ = 115   σ = 8

Then ȳ = average IQ of 25 university students.

Distribution of average IQs:
shape    centre    spread
normal   μ = 115   se(ȳ) = 8/√25 = 1.6

L5.85

$z = \frac{y - \mu}{\sigma}$

Given raw score (y) → z score → tail area (from tables) → required p

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$

Given sample mean (ȳ) → z score → tail area (from tables) → required p

L5.86
Example 1 Question
What is the probability that the average IQ of a class of 25
university students will be greater than 120?

Find the area under the curve.

[Annotations on the question: n is the sample size (25), the score is 120, and the region is "greater than"]

L5.87
What is the probability that we will get a sample mean
of at least 120?
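Given: IQs ~ normal distribution: μ = 115 and σ = 8, n = 25
average IQ ~ normal distribution

If ȳ = 120:

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}} = \frac{120 - 115}{8/\sqrt{25}} = 3.13$ (2dp)

[Figure: Normal curve of ȳ centred at 115 with the area to the right of 120 shaded; on the z axis, 120 corresponds to z = 3.13]

Tail area = 0.00087
= probability

The probability of obtaining a mean IQ of at least 120, from a
sample of size 25, is 0.00087.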
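
The same calculation can be run without tables, as a rough check. This sketch uses `statistics.NormalDist` with the standard error as the spread; since z is kept unrounded rather than rounded to two decimals, the tail area comes out close to, but not exactly, the table value of 0.00087.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 115, 8, 25   # population mean, population sd, class size (given)

# Standard error of the mean: sigma / sqrt(n).
se = sigma / sqrt(n)

# z score for a sample mean of 120.
z = (120 - mu) / se

# Upper tail area: probability that the sample mean exceeds 120.
prob = 1 - NormalDist(mu=mu, sigma=se).cdf(120)

print(f"se = {se:.2f}, z = {z:.3f}, P(mean > 120) = {prob:.5f}")
```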
L5.88
Example 2: Teachers Incomes
We know that the distribution of incomes is not Normal!
It is unimodal but not symmetric
(in fact, it is typically skewed to the right).

Assume that
the mean yearly income in the population of teachers in Australia
is μ = $38,000, and
the standard deviation of yearly incomes in the population is
σ = $20,000.
What is the probability that the average income of 30 teachers in
Australia will be less than $40,000?

L5.89
Example 2 Question
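What is the probability that the average income of a
random sample of 30 teachers is less than $40,000?

Find the area under the curve.

[Annotations on the question: n is the sample size (30), the score is $40,000, and the region is "less than"]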

L5.90
Applying the CLT to Income Example, n=30

We have
μ = $38,000
σ = $20,000
n = 30

[Figure: sampling distribution of ȳ in $000s, centred at 38]

The CLT tells us that, for the sampling distribution of the means,
the shape will be approximately Normal
the mean should be $38,000
the standard deviation should be $20,000/√30 = $3651.5

L5.91
Example 2: Solution
What is the probability that the average income of a random sample of 30
teachers in Australia is less than $40,000 per year?
Given: μ = $38,000, σ = $20,000, n = 30
CLT check:
Since n = 30 is > 25 the CLT for means applies
If ȳ = 40 (in $000s):

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$

Tail area =

The probability that the average income is
less than $40,000 is
L5.92
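
The slide leaves the z score and the tail area to be filled in. One way to check hand working is sketched below, assuming the CLT approximation stated above (sample means approximately Normal with mean μ and standard error σ/√n); it is not the lecture's own worked solution.

```python
from math import sqrt
from statistics import NormalDist

mu = 38_000      # population mean income (given)
sigma = 20_000   # population standard deviation (given)
n = 30           # sample size (given)

# Standard error of the mean income.
se = sigma / sqrt(n)

# z score for a sample mean of $40,000.
z = (40_000 - mu) / se

# Area to the left of $40,000 under the approximating Normal curve.
prob = NormalDist(mu=mu, sigma=se).cdf(40_000)

print(f"se = {se:.1f}, z = {z:.2f}, P(mean < 40000) = {prob:.4f}")
```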
Summary
Data (= Real world) vs. Theory
[Figure: bar chart of sample data for categories M and F, vertical axis from 0 to 0.6]

The CLT does not say that the distribution of sample data
is Normal if the sample size is large enough.

The CLT is about the distribution of sample proportions and
sample means of many different samples drawn from
the same population.

L5.95
Central Limit Theorem
The Central Limit Theorem applies to both means and
proportions under certain conditions.
1. The observations must be independent of each other
2. The data must be from a normal distribution OR
the sample size, n, must be large enough

Score              Statistic                     Sample size (n)        Reason
y ~ normal (μ, σ)  ȳ ~ normal (μ, σ/√n)          n > 0                  Fact!
y ~ Any (μ, σ)     ȳ ~ normal (μ, σ/√n)          n ≥ 25                 CLT
counts             p̂ ~ normal (p, √(p(1-p)/n))   np ≥ 5, n(1-p) ≥ 5     CLT
L5.96
Standardizing z-scores
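$z = \frac{\text{score} - \text{mean}}{\text{std dev}}$

Individual scores (y):

$z = \frac{y - \mu}{\sigma}$

Mean (ȳ):

$z = \frac{\bar{y} - \mu}{\sigma/\sqrt{n}}$

Proportions (p̂):

$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}}$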
L5.97
