
Statistics for Social Research

Confidence Intervals I

NYU, Spring 2015
3/25/2015

Sample vs. Population


Example
Is the unemployment rate (UR) we know a
population-based rate?
- Yes or No?
Current Population Survey (CPS)
- Monthly survey of a sample of about 50,000
adults regarding job-related activities by the
Bureau of Labor Statistics
- So vital that job-related estimates from CPS
often cause fluctuations in the stock market and
influence economic policies
How confident should we be in the estimated
UR?

Sample vs. Population

Estimation
Definition
A process whereby a random sample from a
population is selected and a sample statistic is
used to estimate a population parameter
- e.g., using the % of the unemployed estimated
from the CPS data to estimate the actual % of the
unemployed
Why do this, BTW?
- In most cases, we don't know the values of the
population parameters, nor do we have enough
resources to survey the entire population
- With the use of sampling theory and statistical
inference, we can approximate the population
parameters
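
The idea of using a sample statistic to approximate a population parameter can be illustrated with a small simulation. This is only a sketch: the population rate of 5.5% is a made-up value for illustration (not a CPS figure), and the names `TRUE_RATE` and `estimate_rate` are hypothetical.

```python
import random

# Hypothetical "true" population unemployment rate (an assumption, not real data).
TRUE_RATE = 0.055
SAMPLE_SIZE = 50_000  # roughly the CPS sample size mentioned in the slides


def estimate_rate(true_rate: float, n: int, rng: random.Random) -> float:
    """Draw n simulated respondents and return the sample proportion unemployed."""
    unemployed = sum(1 for _ in range(n) if rng.random() < true_rate)
    return unemployed / n


rng = random.Random(2015)  # fixed seed so the run is repeatable
sample_rate = estimate_rate(TRUE_RATE, SAMPLE_SIZE, rng)
print(f"population rate: {TRUE_RATE:.3%}, sample estimate: {sample_rate:.3%}")
```

The sample estimate will usually be close to, but not exactly equal to, the population rate. That gap is the sampling uncertainty that the rest of the lecture tries to quantify.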

Point and Interval Estimation


Point estimates
Sample statistics used to estimate the exact
value of a population parameter
Examples
- Rate: unemployment rate, infant mortality
rate, college entrance rate,
- Mean: years of education, duration of
unemployment,
- Proportion: % of Americans watching Fox
News, MSNBC, or CNN,
- Others
Main concern
- How accurate are sample statistics?
- How can we reflect the uncertainty due to sampling?

Point and Interval Estimation


Interval estimates
A range of values within which the population
parameter may fall
Confidence Interval (CI)
A range of values defined by the confidence
level within which the population parameter is
estimated to fall
Often reported using a margin of error
CI = point estimate ± margin of error
Confidence level
- The likelihood that a specified confidence
interval will contain the population parameter
- Expressed as a percentage or a probability
- Commonly used confidence levels: 95%, 99%, 90%

CI for the Population Mean


Inferential statistics
Provide a best guess of the population
parameters based entirely on information from a
sample of the population
With the sample, we know: a sample mean,
standard deviation, and sample size
Combine these statistics with sampling theory
(i.e., central limit theorem) to produce confidence
intervals
Infer the population parameters

CI for the Population Mean


Average income of single-parent families
1. Sampling procedure
The population of single-parent families
A sample of single-parent families
2. Estimation
Mean = $36,000
n = 100
3. Inference about the population
$36,000 ± $2,000

CI for the Population Mean


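Recall the central limit theorem

$\bar{X} \sim N(\mu, \sigma_{\bar{X}})$, where $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$ is the standard error of the mean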

The 90-95-99 rule


A total of 90% of all random sample means will
fall within ±1.65 standard errors of the true
population mean
A total of 95% of all random sample means will
fall within ±1.96 (≈ 2) standard errors of the true
population mean
A total of 99% of all random sample means will
fall within ±2.58 standard errors of the true
population mean
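
The three multipliers in the rule come from the standard normal distribution. As a minimal sketch, they can be recovered with Python's standard library; the helper name `z_value` is a hypothetical choice for this example.

```python
from statistics import NormalDist


def z_value(confidence_level: float) -> float:
    """Return the two-sided critical value Z for a confidence level such as 0.95."""
    tail = (1.0 - confidence_level) / 2.0  # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level: Z = {z_value(level):.3f}")
```

The computed values are about 1.645, 1.960, and 2.576; the slides round them to 1.65, 1.96, and 2.58.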

CI for the Population Mean


e.g., 95% confidence level

95 times out of 100, the sample mean is within
±2 s.e. of the population mean
In 95% of samples, the population mean is
within ±2 s.e. of the sample mean
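
The "95 times out of 100" statement is a claim about repeated sampling, so it can be checked by simulation. The sketch below assumes a normally distributed population with made-up values (`MU`, `SIGMA`, `N` are hypothetical), draws many samples, and counts how often the interval around the sample mean contains the population mean.

```python
import random
from math import sqrt

# Hypothetical population values, chosen only for illustration.
MU, SIGMA, N = 50.0, 10.0, 100
Z = 1.96
REPS = 2_000


def coverage(rng: random.Random) -> float:
    """Share of simulated samples whose interval contains the population mean."""
    se = SIGMA / sqrt(N)  # standard error with a known population S.D.
    hits = 0
    for _ in range(REPS):
        sample_mean = sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
        if sample_mean - Z * se <= MU <= sample_mean + Z * se:
            hits += 1
    return hits / REPS


share = coverage(random.Random(0))
print(f"share of intervals containing the population mean: {share:.3f}")
```

The share should land near 0.95, though it varies a little from run to run with different seeds.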

CI for the Population Mean


Formula
CI = $\bar{X} \pm Z(\text{s.e.})$
Determining the CI
Calculate the s.e. of the mean
Decide the confidence level
Find the corresponding Z value
Calculate the CI
Interpret the result
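
The steps above can be sketched as a small function. This is an illustrative implementation under the slide's assumptions (a Z-based interval with a known or given S.D.); the function name `mean_ci` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_ci(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for a population mean: mean ± Z * (sd / sqrt(n))."""
    se = sd / sqrt(n)                              # calculate the s.e. of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # confidence level -> Z value
    margin = z * se                                # margin of error
    return mean - margin, mean + margin            # the interval itself


# Hypothetical inputs, for illustration only.
low, high = mean_ci(mean=50.0, sd=12.0, n=144, level=0.95)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

The last step, interpreting the result, is left to the reader: the function only produces the two limits.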

CI for the Population Mean


e.g., Average income of single-parent families
Mean = $36,000, n = 100
Suppose the population S.D. is $10,000

s.e. = $\sigma_X / \sqrt{n}$ = 10000 / $\sqrt{100}$ = 10000 / 10 = 1000

Say, we decide on a 95% confidence level
The corresponding Z value is 1.96, or
approximately 2
95% CI = $\bar{X} \pm Z(\text{s.e.})$ = 36000 ± 2(1000) = 36000 ± 2000

CI for the Population Mean


Correct interpretation
- We are 95% confident that
- In 95% of samples,
- 95 times out of 100,
the true population mean, μ, of the income of
single-parent families is between $34,000 and
$38,000
which means that 5 times out of 100, μ is not
included in the specified CI
Demonstration
Incorrect interpretation (think about why?)
- The probability that μ is in the specified
interval is 95%
- The probability that the sample mean is in the

CI for the Population Mean


Varying the confidence level
What happens?
90% CI = 36000 ± 1.65(1000) = 36000 ± 1650
95% CI = 36000 ± 1.96(1000) = 36000 ± 1960
99% CI = 36000 ± 2.58(1000) = 36000 ± 2580
Examine the width of CI as the confidence level
changes
- [lower limit, upper limit]
90% CI: [34350, 37650]
95% CI: [34040, 37960]
99% CI: [33420, 38580]
Trade-off between confidence and precision
- The higher the confidence level, the less
precise the estimate
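
The trade-off can be made concrete by computing the interval width at each confidence level. The sketch below reuses the single-parent income example (mean 36000, S.D. 10000, n = 100) with the rounded Z values from the slides; the variable names are hypothetical.

```python
from math import sqrt

MEAN, SD, N = 36_000, 10_000, 100            # values from the income example
Z_BY_LEVEL = {90: 1.65, 95: 1.96, 99: 2.58}  # rounded Z values used in the slides

se = SD / sqrt(N)  # standard error of the mean
intervals = {}
for level, z in Z_BY_LEVEL.items():
    margin = z * se
    intervals[level] = (MEAN - margin, MEAN + margin)
    width = 2 * margin
    print(f"{level}% CI: [{MEAN - margin:.0f}, {MEAN + margin:.0f}], width = {width:.0f}")
```

Each step up in confidence level widens the interval, which is the loss of precision described above.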

CI for the Population Mean


Estimating the population S.D.
So far, we assume we know the population S.D.
But do we really know it?

s.e. = $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$

Applying the central limit theorem in a slightly
different way, we see

$\bar{X} \sim N\left(\mu, \frac{s_X}{\sqrt{n}}\right)$, $s_{\bar{X}} = \frac{s_X}{\sqrt{n}}$

- In other words, use the estimated s.e.


- Replace the actual population S.D. by the
sample S.D.
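
When only raw sample data are available, the estimated s.e. is computed from the sample S.D. The sketch below uses an invented list of values (the `hours` data are hypothetical, not GSS data) to show the replacement of the population S.D. by the sample S.D.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of daily TV hours, invented for illustration.
hours = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8]

n = len(hours)
sample_mean = mean(hours)
sample_sd = stdev(hours)            # sample S.D. (divides by n - 1)
estimated_se = sample_sd / sqrt(n)  # used in place of sigma / sqrt(n)

print(f"n = {n}, mean = {sample_mean:.2f}, S.D. = {sample_sd:.2f}, s.e. = {estimated_se:.2f}")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` would treat the data as an entire population instead.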

CI for the Population Mean


e.g., Average TV watching hours per day, GSS
2008
n = 562, mean = 2.98 hours, S.D. = 2.66 hours
Calculate the estimated standard error (s.e.)
s.e. = $s_{\bar{X}} = s_X / \sqrt{n}$ = 2.66 / $\sqrt{562}$ = 0.11
95% CI; the Z-score is 1.96
95% CI = 2.98 ± 1.96(0.11) = 2.98 ± 0.22 = [2.76, 3.20]
Interpretation
- We are 95% confident that the actual average
TV watching time is between 2.76 and
3.20 hours per day.

CI for the Population Mean


Factors affecting CI
Look at the second part of the formula for CI
Z(s.e.) = $Z \cdot s_X / \sqrt{n}$

If the confidence level is higher, Z is larger,
so the CI is wider → less precise
If the standard deviation is larger, the CI is
wider → less precise
If n is larger, the CI is narrower → more precise
e.g., Average TV watching hours per day, GSS 2008
Mean = 2.98 hours, S.D. = 2.66 hours

n      s.e.   95% CI          Interval width
195    0.19   [2.61, 3.35]    0.74
562    0.11   [2.76, 3.20]    0.44
198    0.06   [2.86, 3.10]    0.24
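
The effect of sample size on the interval width follows from the $1/\sqrt{n}$ term. As a rough sketch, the code below keeps the S.D. from the TV example fixed at 2.66 and varies n over hypothetical sample sizes (100, 400, 1600 are chosen for illustration and are not the sizes in the table); `ci_width` is a hypothetical helper.

```python
from math import sqrt

SD = 2.66  # sample S.D. from the TV example
Z = 1.96   # 95% confidence level


def ci_width(sd: float, n: int, z: float = Z) -> float:
    """Full width of the interval: 2 * Z * sd / sqrt(n)."""
    return 2 * z * sd / sqrt(n)


# Hypothetical sample sizes: each one is four times the previous one.
widths = {n: ci_width(SD, n) for n in (100, 400, 1600)}
for n, width in widths.items():
    print(f"n = {n:>4}: width = {width:.3f}")
```

Quadrupling the sample size halves the width, so gains in precision become more expensive as n grows.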

CI for the Population Mean


Example. Earnings differential among Hispanics, 2000 Census

Group           n        Mean earnings   S.D.
Cubans          29233    $24018          $36298
Puerto Ricans   66933    $18748          $25694
Mexicans        34620    $16537          $23502

Cubans: s.e. = 36298/sqrt(29233) = 212.29
95% CI = 24018 ± 1.96(212.29) = 24018 ± 416
= [23602, 24434]
Puerto Ricans: s.e. = 25694/sqrt(66933) = 99.32
95% CI = 18748 ± 1.96(99.32) = 18748 ± 195
= [18553, 18943]
Mexicans: s.e. = 23502/sqrt(34620) = 126.31
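
The same calculation can be applied to every group in one pass. The sketch below takes n, mean earnings, and S.D. from the table in this example and computes a 95% CI for each group; the names `groups` and `ci_95` are hypothetical, and small differences from the hand-rounded figures are possible.

```python
from math import sqrt

Z = 1.96  # 95% confidence level

# (n, mean earnings, S.D.) as given in the earnings table
groups = {
    "Cubans": (29233, 24018, 36298),
    "Puerto Ricans": (66933, 18748, 25694),
    "Mexicans": (34620, 16537, 23502),
}


def ci_95(n: int, mean: float, sd: float) -> tuple[float, float]:
    """95% CI for the mean using the estimated standard error sd / sqrt(n)."""
    margin = Z * sd / sqrt(n)
    return mean - margin, mean + margin


results = {name: ci_95(*values) for name, values in groups.items()}
for name, (low, high) in results.items():
    print(f"{name}: [{low:.0f}, {high:.0f}]")
```

Printing the three intervals side by side makes it easy to check whether they overlap, which is one informal way to read the earnings differential.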
