Inference
Use statistic to describe the parameter
Parameter is the goal
Statistic is the tool
Sampling variability
Each time we take a random sample from a population, we are
likely to get a different set of individuals and calculate a
different statistic. This is called sampling variability. If we take a
lot of random samples of the same size from a given
population, the variation from sample to sample (the
sampling distribution) will follow a predictable pattern.
Sampling Distribution
Distribution of the statistic obtained from
repeated samples (or repeated trials of an
experiment) using the same number of
observations.
Changes with parameter
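The idea of repeated samples can be tried out in a short simulation. This is only a sketch: the population proportion 0.75, the sample size 200, and the function name simulate_sample_proportions are illustrative assumptions, not values from the slides.

```python
import math
import random

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical population proportion and sample size (assumptions for illustration).
p_true, n = 0.75, 200
props = simulate_sample_proportions(p_true, n, num_samples=2000)

mean_phat = sum(props) / len(props)
sd_phat = math.sqrt(sum((x - mean_phat) ** 2 for x in props) / (len(props) - 1))
theory_sd = math.sqrt(p_true * (1 - p_true) / n)

print(f"mean of sample proportions: {mean_phat:.4f}")
print(f"SD of sample proportions:   {sd_phat:.4f}")
print(f"sqrt(p(1-p)/n):             {theory_sd:.4f}")
```

The statistic differs from sample to sample, but its center and spread should land close to p and sqrt(p(1-p)/n).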
Random Variable
Imagine the data are random variables
Statistic = function of data
Statistic is also random
Survey
Simple Random Sample
78% agree with statement
Repeat survey
77% agree
73% agree
Sample is random
Sample proportions
The proportion of successes can be more informative than
the count. In statistical sampling the sample proportion of
successes, $\hat{p}$, is used to estimate the proportion p of
successes in a population.
For any SRS of size n, the sample proportion of successes is:
$\hat{p} = \frac{X}{n} = \frac{\text{count of successes in the sample}}{n}$
$P(\hat{p} > .10) = P\left(z > \frac{.10 - .05}{\sqrt{(.05)(.95)/200}}\right) = P(z > 3.24) = 1 - .9994 = .0006$
n = 200
S: underfilled can
p = P(S) = .05
q = .95
np = 10 nq = 190
OK to use the normal
approximation
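The underfilled-can calculation can be checked numerically. This is a minimal sketch using Python's standard library; the function name prob_phat_exceeds is an illustrative assumption. Because the z-score is not rounded to 3.24 before the lookup, the tail probability differs slightly from the table-based value in the later decimals.

```python
import math
from statistics import NormalDist

def prob_phat_exceeds(threshold, p, n):
    """Normal approximation to P(p_hat > threshold) for an SRS of size n."""
    sd = math.sqrt(p * (1 - p) / n)       # standard deviation of p_hat
    z = (threshold - p) / sd              # standardize the threshold
    return z, 1 - NormalDist().cdf(z)     # upper-tail area under N(0, 1)

z, tail = prob_phat_exceeds(0.10, p=0.05, n=200)
print(f"z = {z:.2f}, P(p_hat > 0.10) = {tail:.4f}")
```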
Confidence intervals for proportions
According to the empirical rule
About 95% of sample proportions fall between $p - 2\,SE$ and $p + 2\,SE$
500 people in Washington surveyed
120 agree with recent changes to bankruptcy laws
Sample proportion = 120/500 = 0.24
Estimate: $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{(0.24)(0.76)/500} \approx 0.019$
Can we say?
24% of people in Washington agree with the
recent changes to bankruptcy laws. (No)
It is probably true that 24% of people in
Washington agree with the recent changes to
bankruptcy laws. (No)
We are 95% confident that between 24%-2*1.9%
and 24%+2*1.9% of people in Washington agree
with the recent changes to bankruptcy laws. (Yes)
95% Confidence
What does the term 95% confidence really
mean?
95% of samples of this size will produce
confidence intervals that capture the
true proportion
95% Confidence (cont.)
What does the term 95% confidence really mean?
We expect 5% of our samples to produce intervals that
fail to capture the true proportion
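The "95% of samples" interpretation can be illustrated by simulation. In this sketch the true proportion is set to 0.24 with n = 500 purely as an assumption (in practice p is unknown), and coverage_rate is a hypothetical helper name.

```python
import math
import random

def coverage_rate(p_true, n, num_samples, z_star=1.96, seed=1):
    """Fraction of simulated 95% intervals p_hat +/- z* SE that contain p_true."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p_true)
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
            captured += 1
    return captured / num_samples

# Assumed "true" proportion, used only to drive the simulation.
rate = coverage_rate(p_true=0.24, n=500, num_samples=2000)
print(f"fraction of intervals capturing p: {rate:.3f}")
```

The printed fraction should be close to 0.95; the remaining intervals are the ones that miss.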
Margin of Error
A 95% CI for a population proportion p can be
written as:
$\hat{p} \pm 2\,SE(\hat{p})$.
Half of the width of the CI is called the margin
of error, so the CI can be written as:
estimate ± ME.
Critical Values
Confidence level 95%: we've been using 2 from the
empirical rule, but if we use software or a table, we get z* =
1.96, so this is the critical value and the
95% CI is $\hat{p} \pm 1.96\,SE(\hat{p})$.
Confidence level 90%: z* = 1.645, so the
90% CI is $\hat{p} \pm 1.645\,SE(\hat{p})$.
90% Confidence Interval
Critical Values
Confidence Level    Critical Value z*
90%                 1.645
95%                 1.96
98%                 2.326
99%                 2.576
99.9%               3.29
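The critical values in the table can be reproduced from the standard normal distribution. This is a sketch using statistics.NormalDist from Python's standard library; critical_value is an illustrative helper name.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Return z* such that the central area under N(0, 1) equals confidence."""
    tail = (1 - confidence) / 2            # area left in each tail
    return NormalDist().inv_cdf(1 - tail)  # quantile that cuts off the upper tail

for level in (0.90, 0.95, 0.98, 0.99, 0.999):
    print(f"{level:.1%} confidence: z* = {critical_value(level):.3f}")
```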
Confidence Intervals
Generally:
Point estimate ± Critical value × Standard error
95% Confidence Interval
Example: Give a 95% confidence interval for p.
Suppose that 43% of 1006 people surveyed in
Washington are in favor of a financial reform.
Standard Error
Margin of error
Upper limit
Lower limit
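The four quantities for this example can be computed as follows. This is a sketch under the slide's setup (sample proportion 0.43, n = 1006, 95% confidence); proportion_ci is a hypothetical helper name, and it uses the software value of z* rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Return (SE, margin of error, lower limit, upper limit) for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    me = z_star * se                          # margin of error
    return se, me, p_hat - me, p_hat + me

se, me, lower, upper = proportion_ci(0.43, 1006)
print(f"Standard error:  {se:.4f}")
print(f"Margin of error: {me:.4f}")
print(f"Lower limit:     {lower:.4f}")
print(f"Upper limit:     {upper:.4f}")
```

The margin of error comes out at roughly 3 percentage points.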
BLS Household survey
60,000 people surveyed. 6.8% unemployed.
90% confidence interval
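A worked version of this interval, as a sketch that treats the survey as a simple random sample (an assumption made here for illustration; the slides do not describe the survey design):

```latex
\[
SE(\hat{p}) = \sqrt{\frac{(0.068)(0.932)}{60000}} \approx 0.00103
\]
\[
ME = z^* \, SE(\hat{p}) = 1.645 \times 0.00103 \approx 0.0017
\]
\[
\hat{p} \pm ME = 0.068 \pm 0.0017 \quad\Rightarrow\quad (0.0663,\ 0.0697)
\]
```

With a sample this large, the 90% interval is only about 0.17 percentage points on either side of 6.8%.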
Assumptions and Conditions
(1) Independence Assumption: Are sample observations
independent of each other?
(2) Randomization Condition: Was the sample randomly
generated?
(3) 10% Condition: If sampling is done without replacement,
then the sample size, n, must be no larger than 10% of
the population.
(4) Success/Failure Condition: The sample size must be large
enough so that both np and n(1-p) are at least 10.
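The two numeric conditions can be checked mechanically. This is a sketch with assumptions: check_conditions is a hypothetical helper, the observed counts of successes and failures stand in for np and n(1-p) because p is unknown, and the population size of 7,000,000 is a placeholder, not a figure from the slides. Independence and randomization depend on how the data were collected and cannot be checked from the counts.

```python
def check_conditions(successes, n, population_size=None):
    """Check the Success/Failure and 10% conditions from observed counts."""
    failures = n - successes
    result = {
        # Observed counts used in place of np and n(1-p) (assumption: p is unknown).
        "success_failure": successes >= 10 and failures >= 10,
    }
    if population_size is not None:
        # n must be no larger than 10% of the population.
        result["ten_percent"] = 10 * n <= population_size
    return result

# 120 of 500 respondents agree; the population size is a hypothetical placeholder.
print(check_conditions(120, 500, population_size=7_000_000))
# A sample that is too small for the normal approximation.
print(check_conditions(5, 50))
```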
Questions
Let's think about the 95% confidence interval we just computed
for the proportion of people in Washington in favor of a
financial reform.
1. If we wanted to be 99% confident, would our confidence
interval need to be wider or narrower?
2. Our margin of error was about 3%. If we wanted to reduce it
to 2%, would our level of confidence be higher or lower?
3. If we had polled more people, would the interval's margin of
error have been larger or smaller?
Sample size for a desired margin of error
You may need to choose a sample size large enough to achieve a
specified margin of error. However, because the sampling distribution
of $\hat{p}$ is a function of the population proportion p, this process
requires that you guess a likely value for p: p*.
The margin of error will be less than or equal to m if p* is chosen to be 0.5.
Remember, though, that sample size is not always stretchable at will. There are
typically costs and constraints associated with large samples.
$\hat{p} \sim N\left(p, \sqrt{p(1-p)/n}\right)$
$n = \left(\frac{z^*}{m}\right)^2 p^*(1-p^*)$
What sample size...?
How large a sample would be necessary to
estimate the true proportion of defective
products in a large population of products
within 3%, with 95% confidence?
(Assume a pilot sample yields an estimate of
p of 0.12)
What sample size...?
Solution:
For 95% confidence, use z* = 1.96
m = 0.03
p* = 0.12, so use this to estimate p
$n = \frac{(z^*)^2\, p^*(1-p^*)}{m^2} = \frac{(1.96)^2 (0.12)(1-0.12)}{(0.03)^2} \approx 450.75$
Round up, so use n = 451.
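The sample-size calculation can be sketched in code. The function name sample_size_for_margin is an illustrative assumption; the default p* = 0.5 is the conservative guess, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_for_margin(m, p_star=0.5, z_star=1.96):
    """Return (exact n, n rounded up) so the margin of error is at most m."""
    n_exact = (z_star / m) ** 2 * p_star * (1 - p_star)
    return n_exact, math.ceil(n_exact)

# Pilot estimate p* = 0.12, margin 3%, 95% confidence.
n_exact, n_needed = sample_size_for_margin(0.03, p_star=0.12)
print(f"exact n = {n_exact:.2f}, use n = {n_needed}")

# Conservative choice p* = 0.5 when no pilot estimate is available.
_, n_conservative = sample_size_for_margin(0.03)
print(f"conservative n = {n_conservative}")
```

Using p* = 0.5 gives a noticeably larger n than the pilot estimate, which is the price of not having to guess p.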