
5-1

Sampling Theory
Chapter 5
Theory & Problems of
Probability & Statistics
Murray R. Spiegel

5-2

Outline Chapter 5
Population X
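mean and variance: $\mu$, $\sigma^2$
Sample
mean and variance: $\bar{X}$, $\hat{s}^2$
Sample Statistics
$\bar{X}$: mean and variance $\mu_{\bar{X}}$, $\sigma_{\bar{X}}^2$
$\hat{s}^2$: mean and variance $\mu_{\hat{s}^2}$, $\sigma_{\hat{s}^2}^2$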

5-3

Outline Chapter 5
Distributions
Population
Samples Statistics
Mean
Proportions
Differences and Sums
Variances
Ratios of Variances

5-4

Outline Chapter 5
Other ways to organize samples
Frequency Distributions
Relative Frequency Distributions
Computation of Statistics for Grouped Data
mean
variance
standard deviation

5-5

Population Parameters
A population: random variable X
probability distribution (function) f(x)
probability function
- discrete variable f(x)
density function
- continuous variable
f(x) is a function of several parameters, e.g.:
mean: $\mu$
variance: $\sigma^2$
want to know the parameters for each f(x)

5-6

Example of a Population
5 project engineers in department
total experience (X): 2, 3, 6, 8, 11 years
company is preparing a statistical report
on employees' expertise, based on experience
survey must include:
average experience
variance
standard deviation

5-7

Mean of Population
average experience (mean):

$\mu = \frac{2 + 3 + 6 + 8 + 11}{5} = \frac{30}{5} = 6$ years

5-8

Variance of Population

variance:

$\sigma^2 = \frac{\sum_i (x_i - \mu)^2}{N}$

$\sigma^2 = \frac{(2-6)^2 + (3-6)^2 + (6-6)^2 + (8-6)^2 + (11-6)^2}{5} = \frac{16 + 9 + 0 + 4 + 25}{5} = \frac{54}{5} = 10.8$

5-9

Standard Deviation of Population


standard deviation:

$\text{s.d.} = \sigma = \sqrt{\sigma^2}$

$\text{s.d.} = \sqrt{10.8} \approx 3.29$
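The three population parameters for the five engineers can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slides; the function names are illustrative and not part of the original material.

```python
import math

# Years of experience of the five project engineers (from the example).
experience = [2, 3, 6, 8, 11]

def population_mean(values):
    """Arithmetic mean over the whole population."""
    return sum(values) / len(values)

def population_variance(values):
    """Mean squared deviation from the population mean (divisor N)."""
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

mu = population_mean(experience)
sigma2 = population_variance(experience)
sigma = math.sqrt(sigma2)

print(mu, sigma2, round(sigma, 2))
```

The divisor is N (here 5) because the five engineers are the entire population, not a sample from it.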

5-10

Sample Statistics
What if we don't have the whole population?
Take random samples from the population
estimate population parameters
make inferences
let's see how
How much experience is in the company?
hire for feasibility study
performance study

5-11

Sampling Example
manager assigns engineers at random
each time she chooses the first engineer she sees
same engineer could do both
let's say she picks (2,2)
mean of sample: $\bar{X} = (2+2)/2 = 2$
you want to make inferences about the true mean $\mu$

5-12

Samples of 2
sampling with replacement: she goes to the project department twice
picks an engineer at random each time
potentially 25 possible teams
25 samples of size two
5 * 5 = 25
order matters: (6, 11) is different from (11, 6)

5-13

Population of Samples
All possible combinations are:
(2,2)    (2,3)    (2,6)    (2,8)    (2,11)
(3,2)    (3,3)    (3,6)    (3,8)    (3,11)
(6,2)    (6,3)    (6,6)    (6,8)    (6,11)
(8,2)    (8,3)    (8,6)    (8,8)    (8,11)
(11,2)   (11,3)   (11,6)   (11,8)   (11,11)

5-14

Population of Averages
Average experience, or sample means $\bar{X}_i$, are:
(2)      (2.5)    (4)      (5)      (6.5)
(2.5)    (3)      (4.5)    (5.5)    (7)
(4)      (4.5)    (6)      (7)      (8.5)
(5)      (5.5)    (7)      (8)      (9.5)
(6.5)    (7)      (8.5)    (9.5)    (11)

5-15

Mean of Population Means


And the mean of the sampling distribution of means is:

$\mu_{\bar{X}} = \frac{2 + 2.5 + 4 + 5 + \dots + 11}{25} = \frac{150}{25} = 6$

This confirms the theorem that states:

$E(\bar{X}) = \mu_{\bar{X}} = \mu = 6$

5-16

Variance of Sample Means


squared deviations for the variance of the sampling distribution of means, $(\bar{X}_i - \mu_{\bar{X}})^2$:
(2-6)^2     (2.5-6)^2   (4-6)^2     (5-6)^2     (6.5-6)^2
(2.5-6)^2   (3-6)^2     (4.5-6)^2   (5.5-6)^2   (7-6)^2
(4-6)^2     (4.5-6)^2   (6-6)^2     (7-6)^2     (8.5-6)^2
(5-6)^2     (5.5-6)^2   (7-6)^2     (8-6)^2     (9.5-6)^2
(6.5-6)^2   (7-6)^2     (8.5-6)^2   (9.5-6)^2   (11-6)^2

5-17

Variance of Sample Means


Calculating values:
16      12.25   4       1       0.25
12.25   9       2.25    0.25    1
4       2.25    0       1       6.25
1       0.25    1       4       12.25
0.25    1       6.25    12.25   25

5-18

Variance of Sample Means

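variance is:

$\sigma_{\bar{X}}^2 = \frac{\sum_i (\bar{X}_i - \mu_{\bar{X}})^2}{25} = \frac{135}{25} = 5.4$

Therefore the standard deviation is

$\sigma_{\bar{X}} = \sqrt{5.4} \approx 2.32$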

5-19

Variance of Sample Means


These results agree with the theorem:

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$

where n is the sample size. Then we see that:

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} = \frac{10.8}{2} = 5.40$
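The enumeration behind these results can be reproduced directly. The sketch below lists all 25 ordered samples of size two drawn with replacement and compares the variance of their means with $\sigma^2/n$; the helper names are illustrative, not from the slides.

```python
from itertools import product

experience = [2, 3, 6, 8, 11]
n = 2  # sample size

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Mean squared deviation from the mean (divisor = number of values)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# All ordered samples of size n drawn with replacement: 5 * 5 = 25.
samples = list(product(experience, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(experience)
sigma2 = variance(experience)
mu_xbar = mean(sample_means)
sigma2_xbar = variance(sample_means)

print(len(samples), mu_xbar, sigma2_xbar, sigma2 / n)
```

Because every possible sample is listed, the agreement is exact here rather than approximate, which is what the theorem predicts for sampling with replacement.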

5-20

Math Proof
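Mean of $\bar{X}$
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
$E(\bar{X}) = \frac{E(X_1) + E(X_2) + E(X_3) + \dots + E(X_n)}{n}$
$E(\bar{X}) = \frac{\mu + \mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n}$
$E(\bar{X}) = \mu$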

5-21

Math Proof
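Variance of $\bar{X}$
$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$
$\mathrm{Var}(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2 + \sigma_X^2 + \sigma_X^2 + \dots + \sigma_X^2}{n^2}$ (the $X_i$ are independent)
$= \frac{n\,\sigma_X^2}{n^2} = \frac{\sigma_X^2}{n}$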

5-22

Sampling Means No Replacement


manager picks two engineers at the same time
order doesn't matter
order (6, 11) is the same as order (11, 6)
5 choose 2:

$\binom{5}{2} = \frac{5!}{2!\,(5-2)!} = 10$

10 possible teams, or 10 samples of size two.

5-23

Sampling Means No Replacement


All possible combinations are:
(2,3)    (2,6)    (2,8)    (2,11)   (3,6)
(3,8)    (3,11)   (6,8)    (6,11)   (8,11)

corresponding sample means are:
(2.5)    (4)      (5)      (6.5)    (4.5)
(5.5)    (7)      (7)      (8.5)    (9.5)

mean of the sampling distribution of means is:

$\mu_{\bar{X}} = \frac{2.5 + 4 + 5 + \dots + 9.5}{10} = \frac{60}{10} = 6$

5-24

Sampling Variance No Replacement


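variance of sampling distribution of means is:

$\sigma_{\bar{X}}^2 = \frac{\sum_i (\bar{X}_i - \mu_{\bar{X}})^2}{10} = \frac{(2.5-6)^2 + (4-6)^2 + \dots + (9.5-6)^2}{10} = 4.05$

standard deviation is:

$\sigma_{\bar{X}} = \sqrt{\sigma_{\bar{X}}^2} = \sqrt{4.05} \approx 2.01$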

5-25

Theorems on Sampling
Distributions with No Replacements
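1. $\mu_{\bar{X}} = \mu = 6$
2. $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right) = \frac{10.8}{2}\left(\frac{5-2}{5-1}\right) = \frac{10.8}{2}\cdot\frac{3}{4} = 4.05$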
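The no-replacement case can be checked the same way. The sketch below enumerates the 10 unordered teams and compares the observed variance of the sample means with the finite population formula in the second theorem above; variable names are illustrative.

```python
from itertools import combinations

experience = [2, 3, 6, 8, 11]
N = len(experience)   # population size
n = 2                 # sample size

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Mean squared deviation from the mean (divisor = number of values)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Unordered samples without replacement: C(5, 2) = 10 teams.
teams = list(combinations(experience, n))
team_means = [mean(team) for team in teams]

sigma2 = variance(experience)
observed = variance(team_means)
# Variance of the sample mean with the finite population correction.
predicted = (sigma2 / n) * (N - n) / (N - 1)

print(len(teams), mean(team_means), observed, predicted)
```

The factor (N - n)/(N - 1) shrinks the variance from 5.4 to 4.05 because a team can no longer contain the same engineer twice.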

5-26

Sum Up Theorems on
Sampling Distributions
Theorem I:
Expected value of sample mean = population mean
$E(\bar{X}) = \mu_{\bar{X}} = \mu$
$\mu$: mean of population
Theorem II:
infinite population or sampling with replacement
variance of the sample mean is
$E[(\bar{X} - \mu)^2] = \sigma_{\bar{X}}^2 = \sigma^2/n$
$\sigma^2$: variance of population

5-27

Theorems on Sampling
Distributions
Theorem III: population size is N
sampling with no replacement
sample size is n
then the variance of the sample mean is:

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}\left(\frac{N-n}{N-1}\right)$

5-28

Theorems on Sampling
Distributions
Theorem IV: population normally distributed
mean $\mu$, variance $\sigma^2$
then sample mean normally distributed
mean $\mu$, variance $\sigma^2/n$

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)$

5-29

Theorems on Sampling
Distributions
Theorem V:
samples are taken from distribution
mean $\mu$, variance $\sigma^2$
(not necessarily normally distributed)
standardized variable

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

is asymptotically normal
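Theorem V can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the slides: it uses an exponential population (mean 1, standard deviation 1), a sample size of 50, and 2000 replications, all chosen only for illustration.

```python
import math
import random
import statistics

def standardized_means(draw, mu, sigma, n, replications, seed=0):
    """Return Z = (xbar - mu) / (sigma / sqrt(n)) for repeated samples of size n."""
    rng = random.Random(seed)
    zs = []
    for _ in range(replications):
        xbar = sum(draw(rng) for _ in range(n)) / n
        zs.append((xbar - mu) / (sigma / math.sqrt(n)))
    return zs

# Exponential population with rate 1: mean 1, standard deviation 1, clearly not normal.
zs = standardized_means(lambda rng: rng.expovariate(1.0),
                        mu=1.0, sigma=1.0, n=50, replications=2000)

# For a standard normal variable these should be close to 0 and 1.
print(round(statistics.mean(zs), 2), round(statistics.pstdev(zs), 2))
```

Increasing n brings the histogram of the standardized means closer to the standard normal curve, even though the population itself is strongly skewed.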

5-30

Sampling Distribution of
Proportions
Population properties:
* Infinite
* Binomially Distributed
( p success; q=1-p fail)
Consider all possible samples of size n
statistic for each sample
= proportion P of success

5-31

Sampling Distribution
of Proportions
The sampling distribution of proportions has:
mean:

$\mu_P = p$

std. deviation:

$\sigma_P = \sqrt{\frac{pq}{n}} = \sqrt{\frac{p(1-p)}{n}}$

5-32

Sampling Distribution of
Proportions
large values of n (n>30)
sample distribution for P
approximates normal distribution
finite population sample without replacing
standardized P is

$Z = \frac{P - p}{\sqrt{pq/n}}$

5-33

Example Proportions
Oil service company
explores for oil
according to geological department
37% chance of finding oil
drill 150 wells
P(0.4 < P < 0.6) = ?

5-34

Example Proportions
P(0.4 < P < 0.6) = ?

$Z = \frac{P - p}{\sqrt{pq/n}}$

$P\left(\frac{0.4 - 0.37}{\sqrt{0.37 \cdot 0.63/150}} < \frac{P - 0.37}{\sqrt{pq/n}} < \frac{0.6 - 0.37}{\sqrt{0.37 \cdot 0.63/150}}\right) = ?$

5-35

Example Proportions
P(0.4 < P < 0.6) = P(0.76 < Z < 5.83)
= normsdist(5.83) - normsdist(0.76) ≈ 0.22

Think about the mean, variance, and distribution of
nP, the number of successes
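The standardization in this example can be sketched in Python with the standard library's `NormalDist`, which plays the role of the spreadsheet `normsdist` function. The values of p, n, and the bounds are taken from the example; the function name is illustrative.

```python
import math
from statistics import NormalDist

def proportion_probability(p, n, low, high):
    """Normal approximation to P(low < P < high) for a sample proportion P."""
    sd = math.sqrt(p * (1 - p) / n)   # sigma_P = sqrt(pq/n)
    z_low = (low - p) / sd
    z_high = (high - p) / sd
    std_normal = NormalDist()         # mean 0, standard deviation 1
    prob = std_normal.cdf(z_high) - std_normal.cdf(z_low)
    return z_low, z_high, prob

z_low, z_high, prob = proportion_probability(p=0.37, n=150, low=0.4, high=0.6)
print(round(z_low, 2), round(z_high, 2), round(prob, 3))
```

The result is sensitive to n: the standard deviation of P shrinks like 1/sqrt(n), so the same bounds map to very different Z values for a small and a large number of wells.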

5-36

Sampling Distribution of
Sums & Differences
Suppose we have two populations.

Population           $X_A$              $X_B$
Sample of size       $n_A$              $n_B$
Compute statistic    $S_A$              $S_B$

Samples are independent

Sampling distribution for $S_A$ and $S_B$ gives
mean:                $\mu_{S_A}$        $\mu_{S_B}$
variance:            $\sigma_{S_A}^2$   $\sigma_{S_B}^2$

5-37

Sampling Distribution of Sums and Differences
combination of 2 samples from 2 populations
sampling distribution of sums and differences
$S = S_A \pm S_B$
For the new sampling distribution we have:
mean:

$\mu_S = \mu_{S_A} \pm \mu_{S_B}$

variance:

$\sigma_S^2 = \sigma_{S_A}^2 + \sigma_{S_B}^2$

5-38

Sampling Distribution of
Sums and Differences
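two populations $X_A$ and $X_B$
$S_A = \bar{X}_A$ and $S_B = \bar{X}_B$ are sample means
mean:
$\mu_{\bar{X}_A + \bar{X}_B} = \mu_{\bar{X}_A} + \mu_{\bar{X}_B} = \mu_A + \mu_B$
variance:

$\sigma_{\bar{X}_A + \bar{X}_B}^2 = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}$

(sampling from an infinite population, or sampling with replacement)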

5-39

Example Sampling Distribution of Sums
You are leasing oil fields from
two companies for two years
lease expires at end of each year
randomly assigned a new lease for next year
Company A - two oil fields
production XA: 300, 700 million barrels
Company B - two oil fields
production XB: 500, 1100 million barrels

5-40

Population Means
Average oil field size of company A:
$\mu_{X_A} = \frac{300 + 700}{2} = 500$
Average oil field size of company B:
$\mu_{X_B} = \frac{500 + 1100}{2} = 800$

$\mu_{X_A} + \mu_{X_B} = 500 + 800 = 1300$

5-41

Population Variances
Company A - two oil fields
production XA: 300, 700 million barrels
Company B - two oil fields
production XB: 500, 1100 million barrels
$\sigma_{X_A}^2 = \frac{(300 - 500)^2 + (700 - 500)^2}{2} = 40{,}000$
$\sigma_{X_B}^2 = \frac{(500 - 800)^2 + (1100 - 800)^2}{2} = 90{,}000$

5-42

Example Sampling Distribution of Sums
Interested in total production: XA + XB
Compute all possible lease assignments
Two choices for XA, two choices for XB
{XAi, XBi}
{300, 500}
{300, 1100}
{700, 500}
{700, 1100}

5-43

Example Sampling Distribution of Sums
{XAi, XBi}
{300, 500}
{300, 1100}
{700, 500}
{700, 1100}
Each of the 4 possibilities can occur in either year:
4 choices in year 1, 4 choices in year 2 = 4*4 = 16 samples

5-44

Example Sampling Distribution of Sums
Samples

Sample   Year 1 {XAi, XBi}   Year 2 {XAi, XBi}
1        {300, 500}          {300, 500}
2        {300, 500}          {300, 1100}
3        {300, 500}          {700, 500}
4        {300, 500}          {700, 1100}
5        {300, 1100}         {300, 500}
6        {300, 1100}         {300, 1100}
7        {300, 1100}         {700, 500}
8        {300, 1100}         {700, 1100}

5-45

Example Sampling Distribution of Sums
Samples

Sample   Year 1 {XAi, XBi}   Year 2 {XAi, XBi}
9        {700, 500}          {300, 500}
10       {700, 500}          {300, 1100}
11       {700, 500}          {700, 500}
12       {700, 500}          {700, 1100}
13       {700, 1100}         {300, 500}
14       {700, 1100}         {300, 1100}
15       {700, 1100}         {700, 500}
16       {700, 1100}         {700, 1100}

5-46

Compute Sum and Mean of Each Sample

Sample   Year 1 XAi+XBi   Year 2 XAi+XBi   Mean
1        800              800              800
2        800              1400             1100
3        800              1200             1000
4        800              1800             1300
5        1400             800              1100
6        1400             1400             1400
7        1400             1200             1300
8        1400             1800             1600

5-47

Compute Sum and Mean of Each Sample

Sample   Year 1 XAi+XBi   Year 2 XAi+XBi   Mean
9        1200             800              1000
10       1200             1400             1300
11       1200             1200             1200
12       1200             1800             1500
13       1800             800              1300
14       1800             1400             1600
15       1800             1200             1500
16       1800             1800             1800

5-48

Mean of Sum of Sample Means


Population of Samples
{800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300,
1200, 1500, 1300, 1600, 1500, 1800}

$\mu_{\bar{X}_A + \bar{X}_B} = \frac{800 + 1100 + 1000 + 1300 + 1100 + 1400 + 1300 + 1600 + 1000 + 1300 + 1200 + 1500 + 1300 + 1600 + 1500 + 1800}{16} = \frac{20{,}800}{16} = 1300$

5-49

Mean of Sum of Sample Means


This illustrates the theorem on means:
$\mu_{\bar{X}_A + \bar{X}_B} = 1300 = \mu_{X_A} + \mu_{X_B} = 500 + 800 = 1300$
What about the variance of $\bar{X}_A + \bar{X}_B$?

5-50

Variance of Sum of Means


Population of samples
{800, 1100, 1000, 1300, 1100, 1400, 1300, 1600, 1000, 1300,
1200, 1500, 1300, 1600, 1500, 1800}
$\sigma_{\bar{X}_A + \bar{X}_B}^2$ = {(800 - 1300)^2 + (1100 - 1300)^2 + (1000 - 1300)^2 +
(1300 - 1300)^2 + (1100 - 1300)^2 + (1400 - 1300)^2 + (1300 - 1300)^2 +
(1600 - 1300)^2 + (1000 - 1300)^2 + (1300 - 1300)^2 + (1200 - 1300)^2 +
(1500 - 1300)^2 + (1300 - 1300)^2 + (1600 - 1300)^2 + (1500 - 1300)^2 +
(1800 - 1300)^2}/16
= 65,000

5-51

Variance of Sum of Means


This illustrates the theorem on variances:

$\sigma_{\bar{X}_A + \bar{X}_B}^2 = \frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B} = \frac{40{,}000}{2} + \frac{90{,}000}{2} = 65{,}000$
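The 16 samples of the oil field example can be enumerated to confirm both theorems at once. This is a sketch of the calculation on the preceding slides; the variable names are illustrative.

```python
from itertools import product

fields_a = [300, 700]    # company A production, million barrels
fields_b = [500, 1100]   # company B production, million barrels
years = 2                # n_A = n_B = 2 (one lease per company per year)

def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Mean squared deviation from the mean (divisor = number of values)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# One lease assignment per year: a field from A paired with a field from B.
leases = list(product(fields_a, fields_b))
# A sample is one assignment for each year: 4 * 4 = 16 samples.
samples = list(product(leases, repeat=years))
# Mean over the two years of the yearly totals XA + XB.
sample_means = [mean([a + b for a, b in sample]) for sample in samples]

mu_sum = mean(sample_means)
var_sum = variance(sample_means)
theory_var = variance(fields_a) / years + variance(fields_b) / years

print(len(samples), mu_sum, var_sum, theory_var)
```

The variances add, even though the statistic is a sum of means, because the draws from company A and company B are independent.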

5-52

Normalize to Make Inferences on Means

$Z = \frac{(\bar{X}_A \pm \bar{X}_B) - (\mu_A \pm \mu_B)}{\sqrt{\frac{\sigma_A^2}{n_A} + \frac{\sigma_B^2}{n_B}}}$

5-53

Estimators for Variance


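Two choices:

$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n}$

use for populations

$\hat{S}^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}$

$E(\hat{S}^2) = \sigma^2$

unbiased, better for smaller samples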

5-54

Sampling Distribution of Variances


All possible random samples of size n
each sample has a variance
all possible variances
give sampling distribution of variances
sampling distribution of the related random variable

$\frac{nS^2}{\sigma^2} = \frac{(n-1)\hat{S}^2}{\sigma^2} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{\sigma^2}$

5-55

Example Population of Samples


All possible teams are:
(2,2)    (2,3)    (2,6)    (2,8)    (2,11)
(3,2)    (3,3)    (3,6)    (3,8)    (3,11)
(6,2)    (6,3)    (6,6)    (6,8)    (6,11)
(8,2)    (8,3)    (8,6)    (8,8)    (8,11)
(11,2)   (11,3)   (11,6)   (11,8)   (11,11)

5-56

Compute Variance for Each Sample


sample variance $\hat{s}^2$ corresponding to each of the 25 possible
choices that the manager makes:
0        0.25     4        9        20.25
0.25     0        2.25     6.25     16
4        2.25     0        1        6.25
9        6.25     1        0        2.25
20.25    16       6.25     2.25     0

e.g. for (2,11): $\hat{s}^2 = \frac{(2 - 6.5)^2 + (11 - 6.5)^2}{2} = 20.25$
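The 25 sample variances can be generated and averaged to see how the two estimators of variance behave. The sketch below computes each sample's variance with divisor n (as in the table above) and with divisor n - 1, then averages each over all samples; the names are illustrative.

```python
from itertools import product

experience = [2, 3, 6, 8, 11]
n = 2

def mean(values):
    return sum(values) / len(values)

def variance(values, divisor_offset=0):
    """Sum of squared deviations divided by len(values) - divisor_offset."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - divisor_offset)

samples = list(product(experience, repeat=n))                      # 25 ordered samples
var_n = [variance(s) for s in samples]                             # divisor n
var_n_minus_1 = [variance(s, divisor_offset=1) for s in samples]   # divisor n - 1

sigma2 = variance(experience)   # population variance
print(mean(var_n), mean(var_n_minus_1), sigma2)
```

Averaged over all samples, the divisor n version comes out at (n - 1)/n times the population variance, while the divisor n - 1 version matches it, which is the sense in which the latter is unbiased.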

5-57

Sampling Distribution of Variance


Population of Variances
mean
variance
distribution
$(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$

5-58

What if Unknown Population Variance?
X is Normal($\mu$, $\sigma^2$)
to make inferences on means we normalize

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

5-59

Unknown Population Variance


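$\chi^2_{n-1} = \frac{(n-1)\hat{S}^2}{\sigma^2}$

$\frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)\hat{S}^2}{\sigma^2}\Big/(n-1)}} = \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} \sim t_{n-1}$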

5-60

Unknown Population Variance


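$P\left(t_{n-1,c_1} \le \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} \le t_{n-1,c_2}\right)$

Use in the same way as for normal
except use different tables
$\alpha$ = 0.05
n = 25, =tinv(0.05,24) = 2.0639

$P\left(-2.0639 \le \frac{\bar{X} - \mu}{\hat{S}/\sqrt{n}} \le 2.0639\right) = 1 - 0.05$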
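Rearranging the probability statement above gives an interval for the mean. The sketch below applies it to a hypothetical sample of 25 values (generated here purely for illustration, not data from the slides) and uses the critical value 2.0639 quoted above rather than computing it.

```python
import math
import random
import statistics

T_CRITICAL = 2.0639  # two-sided 5% critical value for 24 degrees of freedom, as quoted above

def t_interval(sample, t_critical):
    """Return (low, high) = xbar -/+ t * s / sqrt(n), with s using divisor n - 1."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation, divisor n - 1
    half_width = t_critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical data: 25 draws from a normal population, for illustration only.
rng = random.Random(1)
sample = [rng.gauss(6.0, 3.0) for _ in range(25)]

low, high = t_interval(sample, T_CRITICAL)
print(round(low, 2), round(high, 2))
```

For a different sample size the critical value must be looked up again, since it depends on the n - 1 degrees of freedom.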

5-61

Uses t -statistics
Will use for testing
means, sums, and differences of means
small samples when variable is normal
substitute sample variance for the true variance

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\rightarrow\; t_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

5-62

Uses t -statistics
sums and differences of means

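$\frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

unknown variance

$\frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\cdot\frac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}$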

5-63

Uses $\chi^2$ statistic
Inference on Variance
Large sample test

$\chi^2_{n-1} = \frac{(n-1)\hat{S}^2}{\sigma^2}$

5-64
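F Statistic
Inferences

$F_{df_1,df_2} = \frac{\chi^2_{df_1}/df_1}{\chi^2_{df_2}/df_2}$

$F_{df_1,df_2} = \frac{\dfrac{(n_1-1)s_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)s_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2\,\sigma_2^2}{s_2^2\,\sigma_1^2}$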

5-65

Other tests
groups of coefficients

5-66

Other Statistics
Medians

$\sigma_{med} = \sigma\sqrt{\frac{\pi}{2n}} = \frac{1.2533\,\sigma}{\sqrt{n}}$

n > 30: sampling distribution of medians is nearly normal if X is normal

$\mu_{med} = \mu$

5-67

Frequency Distributions
If sample or population is large
difficult to compute statistics
(e.g. mean, variance, etc.)
Organizing RAW DATA is useful
arrange into CLASSES or categories
determine number in each class
Class Frequency or Frequency Distribution

5-68

Frequency Distributions - Example


Example of Frequency Distribution:
middle size oil company
portfolio of 100 small oil reservoirs
reserves vary from 89 to 300 million barrels

5-69

Frequency Distributions - Example


arrange data into categories
create table showing ranges of reservoir sizes
and number of reservoirs in each range

Reserves     Number of Fields
50-100       4
101-150      21
151-200      42
201-250      27
251-300      6
TOTAL        100

5-70

Frequency Distributions - Example


Class intervals are in ranges of 50 million barrels
Each class interval is represented by its midpoint
e.g. 200 up to 250 will be represented by 225
Can plot data
histogram
polygon
This plot represents the frequency distribution

5-71

Frequency Distributions Plotted Example

[Figure: histogram of No. of Fields (axis 0 to 45) against Reserves (mmb, class midpoints 25 to 325), shown beside the reserves table: 50-100: 4, 101-150: 21, 151-200: 42, 201-250: 27, 251-300: 6, TOTAL: 100]

5-72

Relative Frequency Distributions and Ogives
number of individuals
- frequency distribution
- empirical probability distribution
percentage of individuals
- relative frequency distribution
empirical cumulative probability distribution
- ogive

5-73

Percent Ogives
OGIVE for oil company portfolio of reservoirs

Shows the percentage of reservoirs with less than x reserves
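The relative frequencies and the cumulative percentages plotted in an ogive follow directly from the frequency table. The sketch below uses the reservoir counts from the frequency distribution example; the variable names are illustrative.

```python
# Reserves classes (million barrels) and number of fields, from the example table.
classes = ["50-100", "101-150", "151-200", "201-250", "251-300"]
frequencies = [4, 21, 42, 27, 6]

total = sum(frequencies)
relative = [f / total for f in frequencies]

# Running totals give the cumulative counts up to each class's upper limit.
cumulative_counts = []
running = 0
for f in frequencies:
    running += f
    cumulative_counts.append(running)
cumulative_percent = [100 * c / total for c in cumulative_counts]

for label, f, r, c in zip(classes, frequencies, relative, cumulative_percent):
    print(f"{label:>8}  {f:>3}  {r:.2f}  {c:5.1f}%")
```

Plotting the cumulative percentages against the upper class limits gives the percent ogive.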

5-74

Computation of Statistics for Grouped Data
can calculate mean and variance
from grouped data

5-75

Computation of Statistics for Grouped Data
take 420 samples of an ore body
measure % concentration of Zinc (Zn)
frequency distribution of lab results

5-76

Computation of Statistics for Grouped Data

% Weight   Frequency      % Weight   Frequency
1.00       2              1.55       28
1.05       5              1.60       14
1.10       11             1.65       22
1.15       21             1.70       18
1.20       33             1.75       15
1.25       41             1.80       4
1.30       53             1.85       2
1.35       42             1.90       2
1.40       38             1.95       3
1.45       31             2.00       1
1.50       34

TOTAL      420

5-77

Computation of Statistics
for Grouped Data
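mean will then be:

$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_k x_k}{n}$

$n = \sum f_i = f_1 + f_2 + \dots + f_k$

And in our example:

$\bar{x} = \frac{\sum f_i x_i}{n} = \frac{1.00 \cdot 2 + 1.05 \cdot 5 + \dots + 1.45 \cdot 31 + \dots + 2.00 \cdot 1}{420} \approx 1.40$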

5-78

Computation of Statistics for Grouped Data
variance will then be:

$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n} = \frac{f_1 (x_1 - \bar{x})^2 + f_2 (x_2 - \bar{x})^2 + \dots + f_k (x_k - \bar{x})^2}{n}$

5-79

Computation of Statistics for Grouped Data
And in our example:

$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n}$

$S^2 = \frac{2(1.00 - 1.40)^2 + 5(1.05 - 1.40)^2 + \dots + 1(2.00 - 1.40)^2}{420}$

$S^2 \approx 0.0365$
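The grouped mean and variance for the zinc example can be reproduced from the frequency table. The sketch below stores each % weight with its frequency and applies the two grouped-data formulas; the dictionary name is illustrative.

```python
# Zinc concentration (% weight) and frequency of each lab result, from the table.
grouped = {
    1.00: 2, 1.05: 5, 1.10: 11, 1.15: 21, 1.20: 33, 1.25: 41, 1.30: 53,
    1.35: 42, 1.40: 38, 1.45: 31, 1.50: 34, 1.55: 28, 1.60: 14, 1.65: 22,
    1.70: 18, 1.75: 15, 1.80: 4, 1.85: 2, 1.90: 2, 1.95: 3, 2.00: 1,
}

n = sum(grouped.values())                                    # n = sum of f_i
mean = sum(f * x for x, f in grouped.items()) / n            # sum f_i x_i / n
variance = sum(f * (x - mean) ** 2 for x, f in grouped.items()) / n

print(n, round(mean, 3), round(variance, 4))
```

The standard deviation follows as the square root of the variance, in the same % weight units as the data.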

5-80

Computation of Statistics
for Grouped Data
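Similar formulas are available for higher moments:

$m_r = \frac{\sum f_i (x_i - \bar{x})^r}{n} = \frac{f_1 (x_1 - \bar{x})^r + f_2 (x_2 - \bar{x})^r + \dots + f_k (x_k - \bar{x})^r}{n}$

$m'_r = \frac{\sum f_i x_i^r}{n} = \frac{f_1 x_1^r + f_2 x_2^r + \dots + f_k x_k^r}{n}$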

5-81

Sum up Chapter 5
Population X
mean and variance: $\mu$, $\sigma^2$
distribution
A Sample
statistic from sample
usually mean and variance: $\bar{X}$, $\hat{s}^2$

5-82

Sum up Chapter 5
Sample Statistics
$\bar{X}$: mean and variance $\mu_{\bar{X}}$, $\sigma_{\bar{X}}^2$
$\hat{s}^2$: mean and variance $\mu_{\hat{s}^2}$, $\sigma_{\hat{s}^2}^2$
Distribution

5-83

Sum Up Chapter 5
Samples Statistics
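Mean: $\bar{X} \sim (\mu,\ \sigma^2/n)$
Distribution

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\rightarrow\; t_{n-1} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$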

5-84

Sum Up Chapter 5
Samples Statistics
Proportions: $P \sim (p,\ p(1-p)/n)$
n > 30
Distribution

$Z = \frac{P - p}{\sqrt{pq/n}}$

5-85

Sum Up Chapter 5
Samples Statistics
Differences and Sums
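$\bar{X}_1 \pm \bar{X}_2 \sim (\mu_1 \pm \mu_2,\ \sigma_1^2/n_1 + \sigma_2^2/n_2)$
Distribution

$\frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1)$

$\frac{(\bar{X}_1 \pm \bar{X}_2) - (\mu_1 \pm \mu_2)}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\cdot\frac{n_1 + n_2}{n_1 n_2}}} \sim t_{n_1 + n_2 - 2}$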

5-86

Sum Up Chapter 5
Samples Statistics
Variances
Distribution

$\chi^2_{n-1} = \frac{(n-1)\hat{S}^2}{\sigma^2}$

Mean = n-1
Variance = 2(n-1)

5-87

Sum Up Chapter 5
Samples Statistics
Ratios of Variances

$F_{df_1,df_2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} = \frac{s_1^2\,\sigma_2^2}{s_2^2\,\sigma_1^2}$

5-88

Sum up Chapter 5
Other ways to organize samples
Frequency Distributions
Relative Frequency Distributions
Computation of Statistics for Grouped Data
mean
variance
standard deviation

5-89

THAT'S ALL FOR CHAPTER 5

THANK YOU!!
