
Sampling with Probability Proportional to Size (PPS Sampling)

Introduction to Sampling with Probability Proportional to Size (PPS Sampling)
Need
In SRS, every unit has the same probability of selection. SRS is therefore not well suited to populations whose units vary greatly in size: a unit with a large value of the variable Y contributes more to the estimated population total than a unit with a small value of Y.
Hence
Units of large size should have a larger probability of entering the sample.
Or
The selection probability of a unit should be proportional to its size.
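Comparison of Unit Selection Probabilities under SRS and PPS

Cluster                  1                2                  3
No. of listing units     20               100                30
SRS                      1/3 x 1/20       1/3 x 1/100        1/3 x 1/30
PPS                      20/150 x 1/20    100/150 x 1/100    30/150 x 1/30

Each entry is (probability that the cluster is selected) x (probability that a given listing unit is selected within it). Under PPS every listing unit ends up with the same overall probability, 1/150.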
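The table can be checked with exact fractions. The sketch below assumes, as the table entries suggest, that one cluster is selected and then one listing unit is drawn at random within it; the function name is illustrative only.

```python
from fractions import Fraction

def listing_unit_probabilities(cluster_sizes):
    """Overall selection probability of one listing unit in each cluster,
    when the cluster is chosen by SRS or by PPS (assumed two-step selection)."""
    n_clusters = len(cluster_sizes)
    total_units = sum(cluster_sizes)
    srs, pps = [], []
    for size in cluster_sizes:
        within = Fraction(1, size)                      # one unit drawn inside the cluster
        srs.append(Fraction(1, n_clusters) * within)    # cluster chosen with equal probability
        pps.append(Fraction(size, total_units) * within)  # cluster chosen proportional to size
    return srs, pps

srs, pps = listing_unit_probabilities([20, 100, 30])
print("SRS:", srs)
print("PPS:", pps)
```

Under SRS of clusters, a listing unit in the small cluster is five times as likely to be selected as one in the large cluster; under PPS the three probabilities coincide.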
If the values of Y are known, we use:
Sampling with probability proportional to size (PPS sampling)

If the values of Y are not known, we use:
Sampling with probability proportional to estimated size (PPES sampling)
Use an auxiliary variable X to measure the unit sizes.
X should be highly correlated with Y.
Thus:
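Population:
$Y_1, Y_2, \dots, Y_N$: not all known
$X_1, X_2, \dots, X_N$: all known

Probability of the i-th unit being in the sample: $P_i \propto X_i$

$$P_i = k X_i, \quad i = 1, \dots, N$$

$$\sum_{i=1}^{N} P_i = k \sum_{i=1}^{N} X_i = 1 \;\Rightarrow\; k = \frac{1}{\sum_{i=1}^{N} X_i}$$

$$P_i = \frac{X_i}{\sum_{i=1}^{N} X_i}, \quad i = 1, \dots, N$$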
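As a small illustration of this normalisation, the sketch below turns size measures into selection probabilities. The function name is hypothetical; the sizes are the village areas used in the cumulative total example of these slides.

```python
def pps_probabilities(sizes):
    """Return P_i = X_i / sum(X) for a list of non-negative size measures."""
    total = sum(sizes)
    if total <= 0 or any(x < 0 for x in sizes):
        raise ValueError("sizes must be non-negative with a positive total")
    return [x / total for x in sizes]

areas = [3, 11, 9, 5, 4, 6, 12]          # village areas in sq. km
probabilities = pps_probabilities(areas)
print([round(p, 2) for p in probabilities])
```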

i.e. sampling is done with a varying probability. In the following designs the probabilities are fixed:

SRSWR: $P_i = 1 - \left(1 - \frac{1}{N}\right)^n$
SRSWOR: $P_i = \frac{n}{N}$
Simple cluster: $P_i = \frac{m}{M}$

But in PPS, $P_i$ depends on $X_i$, the size of the unit, which is variable.

The larger $X_i$ is, the more probable it is that the i-th unit is chosen.
In SRS from a population of size N with population total Y, the probability of selection of a unit at any draw is 1/N.

The unbiased estimator of Y from the i-th unit is $N y_i$, or

$$\hat{Y} = N y_i = \frac{y_i}{1/N}$$

Similarly, the unbiased estimator of the population total Y in varying probability sampling from the i-th draw is

$$\hat{Y}_{pps} = \frac{y_i}{p_i},$$

where $p_i$ (which varies from unit to unit in the universe) is the probability of selection of $y_i$.
If the values $y_i$ were known before sampling and sampling were carried out with probability proportional to $y_i$,

$$p_i = \frac{y_i}{\sum_{i=1}^{N} y_i}$$

If, instead of drawing a unit with probability proportional to its actual value, we draw with probability proportional to an auxiliary variable whose size ($x_i$) is related to $y_i$ by the relation

$$x_i = k y_i,$$

where k is a positive constant, the probability of selection

$$p_{x_i} = \frac{x_i}{\sum_{i=1}^{N} x_i} = \frac{k y_i}{k \sum_{i=1}^{N} y_i} = \frac{y_i}{\sum_{i=1}^{N} y_i} = p_{y_i}$$

remains the same, and would give the same results as pps of y.
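If we draw a sample of n units with replacement out of N units, with the initial probability of selection of the i-th unit as $p_i$, the combined unbiased estimator of Y is

$$\hat{Y}_{pps} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i}$$

The sampling variance of this estimator is

$$\sigma^2(\hat{Y}_{pps}) = \frac{1}{n} \sum_{i=1}^{N} p_i \left( \frac{Y_i}{p_i} - Y \right)^2 = \frac{1}{n} \left( \sum_{i=1}^{N} \frac{Y_i^2}{p_i} - Y^2 \right)$$

The estimated variance of $\hat{Y}_{pps}$ is

$$v(\hat{Y}_{pps}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \hat{Y}_{pps} \right)^2 = \frac{1}{n(n-1)} \left[ \sum_{i=1}^{n} \left( \frac{y_i}{p_i} \right)^2 - n \hat{Y}_{pps}^2 \right]$$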
Examples
Crop survey: select villages with probability proportional to geographical area, or to cultivated area for a recent year.
Employment survey: select villages with probability proportional to village population in the last census.
Selection Method
1. List all units in the population along with their size measures.

2. Select a sample with PPS using
   the cumulative total method
   or
   Lahiri's method.

Note that we may have PPSWR or PPSWOR.
Cumulative Total Method
Example: Select a PPS sample of 4 villages from the following population, with replacement (WR) and without replacement (WOR).

(1)           (2)              (3)                (4)
Village no.   Area (sq. km)    Cumulative total   Random numbers
i             X_i              T_i                allotted
1              3                3                 01-03
2             11               14                 04-14
3              9               23                 15-23
4              5               28                 24-28
5              4               32                 29-32
6              6               38                 33-38
7             12               50                 39-50

Columns (1) and (2) are given data, column (3) is computed, and column (4) is allotted.
A table of cumulative totals of the unit sizes is made.
Let $T_i = X_1 + X_2 + \dots + X_i$, with $T_0 = 0$.
A random number, say R, is drawn between 1 and $T_N$ (the total size X).
Unit i is selected if $T_{i-1} < R \le T_i$.
The process is repeated until the required sample size is reached.
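Select two-digit random numbers between 1 and 50; numbers above 50 are discarded.

Random no.      62   33   78   21   49   39   02
Village no.      -    6    -    3    7    7    1
PPSWR draw       -    1    -    2    3    4
PPSWOR draw      -    1    -    2    3    -    4

PPSWR sample: villages 6, 3, 7, 7. PPSWOR sample: villages 6, 3, 7, 1 (the repeated village 7 is skipped).

This method is known as the Cumulative Total Method.

If N is large, cumulating can be time consuming.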
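The cumulative total method can be sketched in a few lines. The function below is an illustration, not a reference implementation: it replays a supplied list of random numbers (here the ones from the example), discards numbers outside 1 to $T_N$, and for WOR simply skips a unit that is already in the sample.

```python
from bisect import bisect_left
from itertools import accumulate

def cumulative_total_select(sizes, random_numbers, n, with_replacement=True):
    """Return the 1-based unit numbers selected by the cumulative total method."""
    totals = list(accumulate(sizes))              # T_1, ..., T_N
    sample = []
    for r in random_numbers:
        if len(sample) == n:
            break
        if not 1 <= r <= totals[-1]:
            continue                              # outside 1..T_N: discard
        unit = bisect_left(totals, r) + 1         # smallest i with R <= T_i
        if not with_replacement and unit in sample:
            continue                              # WOR: skip a repeated unit
        sample.append(unit)
    return sample

areas = [3, 11, 9, 5, 4, 6, 12]
draws = [62, 33, 78, 21, 49, 39, 2]
wr_sample = cumulative_total_select(areas, draws, 4, with_replacement=True)
wor_sample = cumulative_total_select(areas, draws, 4, with_replacement=False)
print(wr_sample, wor_sample)
```

With the random numbers of the example this gives villages 6, 3, 7, 7 for PPSWR and 6, 3, 7, 1 for PPSWOR.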
PPS Systematic Sampling
Create the cumulative totals and allot random numbers to the different units.
Compute $k = \frac{1}{n} \sum_{i=1}^{N} X_i$.
Select a random number r from 1 to k.
The sample consists of the units allotted the numbers
r, r+k, r+2k, r+3k, ..., r+(n-1)k
If any unit has size greater than k, it may be selected more than once.
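A minimal sketch of PPS systematic selection follows. It assumes that the total size is an exact multiple of n so that k is an integer; the choice n = 5 (giving k = 10 for the seven villages) and the starting values r are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(sizes, n, r):
    """Select n units at the points r, r+k, ..., r+(n-1)k of the cumulative totals."""
    totals = list(accumulate(sizes))
    total = totals[-1]
    if total % n != 0:
        raise ValueError("this sketch assumes the total size is a multiple of n")
    k = total // n                                 # sampling interval
    if not 1 <= r <= k:
        raise ValueError("r must lie between 1 and k")
    points = [r + j * k for j in range(n)]
    return [bisect_left(totals, point) + 1 for point in points]

areas = [3, 11, 9, 5, 4, 6, 12]
print(pps_systematic(areas, n=5, r=7))
print(pps_systematic(areas, n=5, r=4))
```

With r = 7 the selected villages are 2, 3, 4, 6, 7. With r = 4, village 2 (size 11, larger than k = 10) is hit by both 4 and 14 and so appears twice.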
A simpler method:

Lahiri's Method
Let $X^* \ge \max_i X_i$; e.g. $X^* \ge 12$, set $X^* = 19$.

Select a pair of random numbers (r, s):

Select r between 1 and N and look at $X_r$.
Select s between 1 and $X^*$ and compare s with $X_r$.
If $s \le X_r$, the r-th unit is chosen.
If $s > X_r$, the pair is useless: discard (r, s) and try again with a fresh pair.
Suppose we have got:

(7, 19): r = 7, $X_7 = 12$, $s = 19 > X_7$, discard (7, 19)
(3, 11): r = 3, $X_3 = 9$, $s = 11 > X_3$, discard (3, 11)
(3, 4): r = 3, $X_3 = 9$, $s = 4 \le X_3$, village 3 chosen
(1, 4): r = 1, $X_1 = 3$, $s = 4 > X_1$, discard
(5, 6): r = 5, $X_5 = 4$, $s = 6 > X_5$, discard
(4, 3): $X_4 = 5 \ge s = 3$, village 4 chosen
(7, 3): $X_7 = 12 \ge s = 3$, village 7 chosen

The sample: (3, 4, 7) with $y_3 = 8$, $y_4 = 30$, $y_7 = 32$
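The accept/reject rule of Lahiri's method can be replayed on the pairs listed above. The sketch below takes the (r, s) pairs as input rather than generating them, so the function name and interface are illustrative assumptions; repeated units are kept as drawn, which corresponds to sampling with replacement.

```python
def lahiri_select(sizes, pairs, n, x_star):
    """Replay Lahiri's method on a stream of (r, s) pairs; return 1-based units."""
    if x_star < max(sizes):
        raise ValueError("x_star must be at least the largest size")
    sample = []
    for r, s in pairs:
        if len(sample) == n:
            break
        if not (1 <= r <= len(sizes) and 1 <= s <= x_star):
            raise ValueError("pair outside the allowed ranges")
        if s <= sizes[r - 1]:
            sample.append(r)        # effective draw: unit r is chosen
        # otherwise the draw is ineffective and the pair is discarded
    return sample

areas = [3, 11, 9, 5, 4, 6, 12]
pairs = [(7, 19), (3, 11), (3, 4), (1, 4), (5, 6), (4, 3), (7, 3)]
print(lahiri_select(areas, pairs, n=3, x_star=19))
```

Only the third, sixth and seventh pairs are effective, so the sample is villages 3, 4 and 7, as in the slide.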
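Why Does Lahiri's Method Work?
Effective draw: $1 \le r \le N$ and $s \le X_r$
Ineffective draw: $1 \le r \le N$ and $s > X_r$
E.g.: suppose r is chosen equal to 5. Then s lies somewhere in $1, \dots, X_5, \dots, X^*$, so either $s \le X_5$ or $s > X_5$:

$$\Pr(s \le X_5 \mid r = 5) = \frac{X_5}{X^*}, \qquad \Pr(s > X_5 \mid r = 5) = \frac{X^* - X_5}{X^*}$$

$$\Pr(\text{an effective draw of unit } i) = \Pr(r = i \ \&\ s \le X_i) = \Pr(s \le X_i \mid r = i)\,\Pr(r = i) = \frac{1}{N} \cdot \frac{X_i}{X^*}$$

$$
\begin{aligned}
\Pr(\text{an ineffective draw}) &= \Pr(r = 1,\ s > X_1) + \dots + \Pr(r = N,\ s > X_N) \\
&= \Pr(s > X_1 \mid r = 1)\Pr(r = 1) + \dots + \Pr(s > X_N \mid r = N)\Pr(r = N) \\
&= \frac{X^* - X_1}{X^*} \cdot \frac{1}{N} + \frac{X^* - X_2}{X^*} \cdot \frac{1}{N} + \dots + \frac{X^* - X_N}{X^*} \cdot \frac{1}{N} \\
&= \frac{1}{N} \left[ \frac{X^* - X_1}{X^*} + \frac{X^* - X_2}{X^*} + \dots + \frac{X^* - X_N}{X^*} \right] \\
&= \frac{1}{N} \cdot \frac{N X^* - T(X)}{X^*} = 1 - \frac{\bar{X}}{X^*}
\end{aligned}
$$

where $T(X) = \sum_{i=1}^{N} X_i$ and $\bar{X} = T(X)/N$.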
$P_i$ = probability of selecting a certain unit i for the first time.
This can happen as follows:

1 draw: unit i is chosen at the first draw, $(r = i \ \&\ s \le X_i)$, with probability
$$\frac{1}{N} \cdot \frac{X_i}{X^*}$$

2 draws: the first draw is ineffective and unit i is chosen at the second draw, with probability
$$\left(1 - \frac{\bar{X}}{X^*}\right) \frac{1}{N} \cdot \frac{X_i}{X^*}$$

3 draws: the first and second draws are ineffective and unit i is chosen at the third draw, with probability
$$\left(1 - \frac{\bar{X}}{X^*}\right)^2 \frac{1}{N} \cdot \frac{X_i}{X^*}$$

and so on.
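$P_i$ = sum of the above terms

$$
\begin{aligned}
P_i &= \frac{1}{N} \cdot \frac{X_i}{X^*} \left[ 1 + \left(1 - \frac{\bar{X}}{X^*}\right) + \left(1 - \frac{\bar{X}}{X^*}\right)^2 + \dots \right] \\
&= \frac{1}{N} \cdot \frac{X_i}{X^*} \cdot \frac{1}{1 - \left(1 - \frac{\bar{X}}{X^*}\right)} = \frac{X_i}{N X^*} \cdot \frac{X^*}{\bar{X}} \\
&= \frac{X_i}{N \bar{X}} = \frac{X_i}{T(X)}
\end{aligned}
$$

since $1 + \left(1 - \frac{\bar{X}}{X^*}\right) + \left(1 - \frac{\bar{X}}{X^*}\right)^2 + \dots$ is a geometric series with $0 \le 1 - \frac{\bar{X}}{X^*} < 1$.

Because, for each $X_i$,

$$0 < X_i \le \max_i X_i \le X^* \;\Rightarrow\; 0 < \bar{X} \le X^* \;\Rightarrow\; 0 < \frac{\bar{X}}{X^*} \le 1 \;\Rightarrow\; 0 \le 1 - \frac{\bar{X}}{X^*} < 1$$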
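The geometric-series argument can be verified numerically with exact fractions. The sketch below assumes positive integer sizes; the function name is illustrative. It computes the one-draw effective probabilities, the probability of an ineffective draw, and the closed-form sum of the series.

```python
from fractions import Fraction

def lahiri_selection_probabilities(sizes, x_star):
    """Probability that unit i is the unit eventually selected by Lahiri's method."""
    n_units = len(sizes)
    # Pr(r = i and s <= X_i) for a single draw
    effective = [Fraction(x, n_units * x_star) for x in sizes]
    # Pr(ineffective draw) = 1 - mean(X) / X*
    ineffective = 1 - sum(effective)
    # effective_i * (1 + q + q^2 + ...) = effective_i / (1 - q)
    return [e / (1 - ineffective) for e in effective]

areas = [3, 11, 9, 5, 4, 6, 12]
probabilities = lahiri_selection_probabilities(areas, x_star=19)
print(probabilities == [Fraction(x, sum(areas)) for x in areas])
```

The result does not depend on the particular $X^*$ chosen, as long as $X^* \ge \max_i X_i$; a larger $X^*$ only increases the share of ineffective draws.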
Example 2:
We would like to estimate the total number of unemployed persons in a
region, which has 14 villages. The numbers of households in these villages are:
400, 350, 760, 432, 860, 1180, 530, 600, 1320, 490, 1040, 310, 520, 900.
Select two villages by Lahiri's method.
Solution: N = population size = 14
n = sample size = 2
$\max_i X_i = 1320$
Take $X^* = 1320$.
Draw a number r between 1 and 14 and another number s between 1 and 1320. Suppose we have r = 7 and s = 960.
Look at $X_7 = 530$ and compare it to s = 960:
$s = 960 > X_7 = 530$
This was an ineffective draw and we discard it.
Try again. Suppose r = 11, s = 870:
$s = 870 \le X_{11} = 1040$
Thus village eleven is the first sample unit.
Try again. Let r = 8, s = 392:
$s = 392 \le X_8 = 600$
So village eight is also chosen. Our sample is villages 11 and 8.
Estimation:
Population: $U = \{u_1, u_2, \dots, u_N\}$; units are distinct and identifiable
Study variable: Y with values $Y_1, Y_2, \dots, Y_N$; unknown to us
Size measures: X with values $X_1, X_2, \dots, X_N$; known to us
Let $T(X) = X_1 + X_2 + \dots + X_N$

$$P_1 = \frac{X_1}{T(X)}, \quad P_2 = \frac{X_2}{T(X)}, \quad \dots, \quad P_N = \frac{X_N}{T(X)}$$

Choose a sample of size n and measure $Y_i$ and $P_i = \frac{X_i}{T(X)}$ for $i = 1, 2, \dots, n$.
So we have $(y_1, P_1), (y_2, P_2), \dots, (y_n, P_n)$.
To estimate the population total $T(Y) = \sum_{i=1}^{N} Y_i$ we have the following rules:
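Estimation in PPSWR Sampling

$$\hat{T}(Y)_{PPS} = \hat{Y}_{PPS} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i} \quad \text{(unbiased estimator)}$$

$$\operatorname{var}\left(\hat{T}(Y)_{PPS}\right) = \frac{1}{n} \sum_{j=1}^{N} p_j \left( \frac{Y_j}{p_j} - T(Y) \right)^2 \quad \text{(an unknown value)}$$

Unbiased estimator of $\operatorname{var}\left(\hat{T}(Y)_{PPS}\right)$:

$$\widehat{\operatorname{var}}\left(\hat{T}(Y)_{PPS}\right) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \hat{T}(Y)_{PPS} \right)^2 = \frac{1}{n(n-1)} \left[ \sum_{i=1}^{n} \left( \frac{y_i}{p_i} \right)^2 - n\, \hat{T}(Y)_{PPS}^2 \right]$$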
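Therefore, the unbiased estimator of the population mean $\bar{Y}$ is

$$\bar{y}_{PPS} = \frac{1}{N} \hat{T}(Y)_{PPS}$$

with variance

$$\operatorname{var}(\bar{y}_{PPS}) = \frac{1}{N^2} \operatorname{var}\left(\hat{T}(Y)_{PPS}\right) = \frac{1}{n N^2} \left[ \sum_{j=1}^{N} \frac{Y_j^2}{p_j} - T(Y)^2 \right]$$

whose unbiased estimator is

$$\widehat{\operatorname{var}}(\bar{y}_{PPS}) = \frac{1}{N^2} \widehat{\operatorname{var}}\left(\hat{T}(Y)_{PPS}\right) = \frac{1}{n(n-1) N^2} \sum_{i=1}^{n} \left( \frac{y_i}{p_i} - \hat{T}(Y)_{PPS} \right)^2$$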
Example 3
Select a sample of 5 units from the villages given in Example 2 and estimate T(Y) and its variance.
Solution
Here, T(X) = 9692. Now, use Lahiri's method to select 5 units.

Village   Size X_i   p_i     y_i   y_i/p_i    (y_i/p_i)^2
2         350        0.036    7    194.44     37808.64
5         860        0.089   18    202.25     40903.93
7         530        0.055   11    200.00     40000.00
10        490        0.051   12    235.29     55363.32
13        520        0.054   15    277.78     77160.49
Total                              1109.76    251236.38
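$$\hat{T}(Y)_{PPS} = \frac{1109.76}{5} = 221.952 \approx 222;$$

$$\widehat{\operatorname{var}}\left(\hat{T}(Y)_{PPS}\right) = \frac{1}{5 \cdot 4} \left[ 251236.38 - 5\,(222)^2 \right] = 240.819;$$

$$SE\left(\hat{T}(Y)_{PPS}\right) = \sqrt{240.819} = 15.518$$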
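The PPSWR computation of Example 3 can be sketched as follows, using the rounded $p_i$ from the table. The function name is an assumption. Note that the code keeps $\hat{T}(Y)_{PPS}$ unrounded (about 221.95) instead of rounding it to 222 before squaring, so the variance estimate it returns (about 246.1) is somewhat larger than the hand-computed 240.819.

```python
from math import sqrt

def ppswr_total(y, p):
    """PPSWR estimate of the population total, its variance estimate and SE."""
    n = len(y)
    ratios = [yi / pi for yi, pi in zip(y, p)]          # y_i / p_i
    total = sum(ratios) / n
    variance = sum((z - total) ** 2 for z in ratios) / (n * (n - 1))
    return total, variance, sqrt(variance)

y = [7, 18, 11, 12, 15]                      # sample values from the table
p = [0.036, 0.089, 0.055, 0.051, 0.054]      # rounded selection probabilities
total, variance, se = ppswr_total(y, p)
print(round(total, 2), round(variance, 2), round(se, 2))
```

The gap between the two variance figures shows how sensitive the shortcut formula $\sum (y_i/p_i)^2 - n\hat{T}^2$ is to rounding of $\hat{T}$, because it subtracts two large numbers of similar size.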
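Relative Efficiency of the PPSWR Estimator

$$RE = \frac{V\left(\hat{T}(Y)_{SRS}\right)}{V\left(\hat{T}(Y)_{PPS}\right)} \times 100$$

This usually is not available, for the $Y_i$'s are not known. But it can be estimated by

$$\widehat{RE} = \frac{\widehat{\operatorname{var}}\left(\hat{T}(Y)_{SRS}\right)}{\widehat{\operatorname{var}}\left(\hat{T}(Y)_{PPS}\right)} \times 100$$

where

$$\widehat{\operatorname{var}}\left(\hat{T}(Y)_{SRS}\right) = \frac{1}{n^2} \left[ N \sum_{i=1}^{n} \frac{y_i^2}{p_i} - n\, \hat{T}(Y)_{PPS}^2 \right] + \frac{1}{n}\, \widehat{\operatorname{var}}\left(\hat{T}(Y)_{PPS}\right)$$

Exercise
Apply this rule to Example 3 and find the estimate of the relative efficiency.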
PPS Sampling Without Replacement (PPSWOR)
The PPSWOR sample design is more efficient than PPSWR, but selecting the sample units and computing the estimates are more complex, because the size measures have to be adjusted after each draw. So we need to take into account the order in which the units are selected.
When the sampling fraction is small, the efficiency of sampling with or without replacement does not differ significantly.
To select a PPSWOR sample:
First, a unit is selected as in WR.
The selected unit is removed from the population.
From the remaining units, another PPS sample of size one is taken.
The selected unit is removed from the population.
The process is repeated to get the required sample of size n.
Suppose n units are selected one by one with PPS measure X, at each draw, by WOR. The probability of selection at the first draw for the j-th unit is given by

$$p_j = \frac{X_j}{X}, \quad j = 1, 2, 3, \dots, N; \quad \text{where } X = \sum_{j=1}^{N} X_j$$

Similarly, the probability that the i-th unit is selected at the second draw, when the j-th unit has been selected at the first draw, is given by

$$p_{i/j} = \frac{p_i}{1 - p_j}, \quad i \ne j;$$

and so on.
The above set-up of sampling comprises an ordered set of sample values with probabilities.
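The draw-by-draw PPSWOR scheme can be sketched with the standard library. The function name and the seed are hypothetical; at each draw the weights of the units still in the population are implicitly renormalised, which is what the conditional probability $p_i / (1 - p_j)$ expresses for the second draw.

```python
import random

def ppswor_draw_by_draw(sizes, n, rng):
    """Select n distinct 1-based units; each draw is PPS among the units not yet selected."""
    if n > len(sizes):
        raise ValueError("n cannot exceed the number of units")
    remaining = list(range(len(sizes)))
    sample = []
    for _ in range(n):
        weights = [sizes[i] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        sample.append(pick + 1)       # units are reported in order of draw
        remaining.remove(pick)        # selected unit leaves the population
    return sample

rng = random.Random(2024)             # hypothetical seed, for reproducibility only
ordered_sample = ppswor_draw_by_draw([3, 11, 9, 5, 4, 6, 12], 3, rng)
print(ordered_sample)
```

The returned list keeps the order of selection, which the ordered estimators require.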
Thus, we need to use ordered estimators. One such estimator is Des Raj's ordered estimator.

Units selected in order of their draw:

Draw          first   second   ...   n-th
Unit          u_1     u_2      ...   u_n
Value         y_1     y_2      ...   y_n
Probability   p_1     p_2      ...   p_n
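Des Raj's Estimator: Estimation Procedure
Notation:
Sample = $(y_1, y_2, \dots, y_n)$
Probabilities = $(p_1, p_2, \dots, p_n)$. Define the estimators $(t_1, t_2, \dots, t_n)$:

$$t_1 = \frac{y_1}{p_1}, \qquad t_2 = y_1 + \frac{y_2 (1 - p_1)}{p_2}$$

$$t_3 = y_1 + y_2 + \frac{y_3 (1 - p_1 - p_2)}{p_3}$$

$$t_i = y_1 + y_2 + \dots + y_{i-1} + \frac{y_i (1 - p_1 - p_2 - \dots - p_{i-1})}{p_i}$$

$$t_n = y_1 + y_2 + \dots + y_{n-1} + \frac{y_n (1 - p_1 - p_2 - \dots - p_{n-1})}{p_n}$$

$$\bar{t} = \frac{t_1 + t_2 + \dots + t_n}{n}, \quad \text{the sample mean of the } t_i\text{'s}$$

$$s_t^2 = \frac{1}{n-1} \sum_{i=1}^{n} (t_i - \bar{t})^2$$

Population total, Y:

$$\hat{Y} = \bar{t}, \qquad SE(\hat{Y}) = \sqrt{\frac{s_t^2}{n}} = \frac{s_t}{\sqrt{n}}$$

This is the Des Raj estimator.

Population mean, $\bar{Y}$:

$$\hat{\bar{Y}} = \frac{\bar{t}}{N}, \qquad SE(\hat{\bar{Y}}) = \frac{s_t}{N \sqrt{n}}$$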
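Example: Construct probabilities proportional to size

Village no.                      1      2      3      4      5      6      7      Sum
Size measure X_i                 3      11     9      5      4      6      12     50
Prob. of selection X_i / T(X)    0.06   0.22   0.18   0.10   0.08   0.12   0.24   1.00
Sample values y_i                -      -      8      30     -      -      32

$$t_1 = \frac{8}{0.18} = 44.44, \qquad t_2 = 8 + \frac{30 (1 - 0.18)}{0.10} = 254$$

$$t_3 = 8 + 30 + \frac{32 (1 - 0.18 - 0.10)}{0.24} = 134$$

$$\bar{t} = 144.15, \qquad s_t^2 = 11056.06, \qquad s_t = 105.15$$

$$\hat{Y} = \bar{t} = 144.15, \qquad SE(\hat{Y}) = 60.71$$

$$\hat{\bar{Y}} = \frac{1}{N}\, \bar{t} = 20.59, \qquad SE(\hat{\bar{Y}}) = 8.67$$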
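The Des Raj computation for this example can be reproduced with a short sketch. The function name and the returned dictionary keys are illustrative; the inputs are the sample values and initial probabilities in order of draw (villages 3, 4, 7).

```python
from math import sqrt

def des_raj(y, p, n_pop):
    """Des Raj ordered estimator: y and p are given in order of draw."""
    n = len(y)
    t = []
    for i in range(n):
        # t_i = y_1 + ... + y_{i-1} + y_i (1 - p_1 - ... - p_{i-1}) / p_i
        t.append(sum(y[:i]) + y[i] * (1 - sum(p[:i])) / p[i])
    t_bar = sum(t) / n
    s2 = sum((ti - t_bar) ** 2 for ti in t) / (n - 1)
    se_total = sqrt(s2 / n)
    return {
        "t": t,
        "total": t_bar,
        "se_total": se_total,
        "mean": t_bar / n_pop,
        "se_mean": se_total / n_pop,
    }

result = des_raj(y=[8, 30, 32], p=[0.18, 0.10, 0.24], n_pop=7)
print({k: v if k == "t" else round(v, 2) for k, v in result.items()})
```

Small differences in the last digit of $s_t^2$ relative to the slide can arise because the slide rounds $t_1$ to 44.44 before squaring.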
Murthy's Unordered Estimator (PPSWOR)
Murthy gave an unordered PPSWOR estimator by suitably combining Des Raj's ordered estimators. For illustration, suppose the sample in the Des Raj method is units 1 and 2, i.e., $u_1$ and $u_2$.
First suppose the order of selection is (1, 2), which provides

$$\hat{Y}_d(1,2) = \frac{1}{2} \left[ (1 + p_1) \frac{y_1}{p_1} + (1 - p_1) \frac{y_2}{p_2} \right]$$

Now suppose the order is (2, 1), and you have

$$\hat{Y}_d(2,1) = \frac{1}{2} \left[ (1 + p_2) \frac{y_2}{p_2} + (1 - p_2) \frac{y_1}{p_1} \right]$$

The probabilities of selection for $\hat{Y}_d(1,2)$ and $\hat{Y}_d(2,1)$ are

$$\pi(1,2) = \frac{p_1 p_2}{1 - p_1}, \qquad \pi(2,1) = \frac{p_1 p_2}{1 - p_2}$$
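Combine the two ordered estimators $\hat{Y}_d(1,2)$ and $\hat{Y}_d(2,1)$ to get an unordered estimator

$$\hat{Y}_M = \frac{\hat{Y}_d(1,2)\, \pi(1,2) + \hat{Y}_d(2,1)\, \pi(2,1)}{\pi(1,2) + \pi(2,1)} = \frac{1}{2 - p_1 - p_2} \left[ (1 - p_2) \frac{y_1}{p_1} + (1 - p_1) \frac{y_2}{p_2} \right]$$

$$\operatorname{var}(\hat{Y}_M) = \frac{1}{2} \sum_{i \ne j = 1}^{N} \frac{p_i p_j (1 - p_i - p_j)}{2 - p_i - p_j} \left( \frac{Y_i}{p_i} - \frac{Y_j}{p_j} \right)^2 \quad \text{(an unknown value)}$$

$$\widehat{\operatorname{var}}(\hat{Y}_M) = \frac{(1 - p_1)(1 - p_2)(1 - p_1 - p_2)}{(2 - p_1 - p_2)^2} \left( \frac{y_1}{p_1} - \frac{y_2}{p_2} \right)^2 \quad \text{(a known estimate)}$$

The general case is somewhat complicated (cf. Hedayat, A. S. and Sinha, B. K. (1991), Design and Inference in Finite Population Sampling).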
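For n = 2, the relationship between the Des Raj and Murthy estimators can be checked numerically. The sketch below uses hypothetical function names and, for illustration, the values of villages 3 and 4 from the earlier example ($y = 8, p = 0.18$ and $y = 30, p = 0.10$).

```python
def des_raj_n2(y_first, p_first, y_second, p_second):
    """Des Raj ordered estimator for a sample of two units in the given order."""
    return 0.5 * ((1 + p_first) * y_first / p_first
                  + (1 - p_first) * y_second / p_second)

def murthy_n2(y1, p1, y2, p2):
    """Murthy's unordered estimator and its variance estimate for n = 2."""
    estimate = ((1 - p2) * y1 / p1 + (1 - p1) * y2 / p2) / (2 - p1 - p2)
    variance = ((1 - p1) * (1 - p2) * (1 - p1 - p2) / (2 - p1 - p2) ** 2
                * (y1 / p1 - y2 / p2) ** 2)
    return estimate, variance

y1, p1, y2, p2 = 8, 0.18, 30, 0.10
w12 = p1 * p2 / (1 - p1)                 # probability of the order (1, 2)
w21 = p1 * p2 / (1 - p2)                 # probability of the order (2, 1)
combined = (des_raj_n2(y1, p1, y2, p2) * w12
            + des_raj_n2(y2, p2, y1, p1) * w21) / (w12 + w21)
estimate, variance = murthy_n2(y1, p1, y2, p2)
print(round(combined, 4), round(estimate, 4), round(variance, 4))
```

The weighted combination of the two ordered estimates and the closed form agree, and swapping the two units leaves Murthy's estimate unchanged.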
Horvitz-Thompson Estimator (PPSWOR)
The Horvitz-Thompson estimator is a linear estimator of the type

$$\hat{Y}_{HT} = \sum_{i \in s} \frac{y_i}{\pi_i}$$

For selecting a PPSWOR sample of size two, the sampling design consists in selecting the first unit with probabilities $p_i$ and the second unit with probability proportional to the remaining sizes. The inclusion probabilities are defined as

$$\pi_i = p_i \left[ 1 + \sum_{j=1}^{N} \frac{p_j}{1 - p_j} - \frac{p_i}{1 - p_i} \right], \quad i = 1, 2, \dots, N$$

$$\pi_{ij} = p_i p_j \left[ \frac{1}{1 - p_i} + \frac{1}{1 - p_j} \right], \quad i \ne j = 1, 2, \dots, N$$

The Horvitz-Thompson estimator (HTE) is defined as

$$\hat{Y}_{HT} = \frac{y_1}{\pi_1} + \frac{y_2}{\pi_2}$$
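The variance of the HTE as given by Yates and Grundy is

$$V_{YG}(\hat{Y}_{HT}) = \sum_{i < j}^{N} (\pi_i \pi_j - \pi_{ij}) \left( \frac{y_i}{\pi_i} - \frac{y_j}{\pi_j} \right)^2$$

The variance estimator of the HTE as given by Yates and Grundy is

$$\hat{V}_{YG}(\hat{Y}_{HT}) = \frac{\pi_1 \pi_2 - \pi_{12}}{\pi_{12}} \left( \frac{y_1}{\pi_1} - \frac{y_2}{\pi_2} \right)^2$$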
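The inclusion probabilities for n = 2 and the resulting Horvitz-Thompson estimate can be sketched as below. Function names are hypothetical, and the two sampled units (villages 3 and 4 with y = 8 and 30) are taken from the earlier example purely for illustration.

```python
def inclusion_probabilities_n2(p):
    """First- and second-order inclusion probabilities for draw-by-draw PPSWOR, n = 2."""
    size = len(p)
    s = sum(pj / (1 - pj) for pj in p)
    pi = [p[i] * (1 + s - p[i] / (1 - p[i])) for i in range(size)]
    pij = [[0.0 if i == j else p[i] * p[j] * (1 / (1 - p[i]) + 1 / (1 - p[j]))
            for j in range(size)] for i in range(size)]
    return pi, pij

def horvitz_thompson_n2(y_a, y_b, pi_a, pi_b, pi_ab):
    """HT estimate of the total and the Yates-Grundy variance estimate for n = 2."""
    estimate = y_a / pi_a + y_b / pi_b
    variance = (pi_a * pi_b - pi_ab) / pi_ab * (y_a / pi_a - y_b / pi_b) ** 2
    return estimate, variance

p = [0.06, 0.22, 0.18, 0.10, 0.08, 0.12, 0.24]
pi, pij = inclusion_probabilities_n2(p)
estimate, variance = horvitz_thompson_n2(8, 30, pi[2], pi[3], pij[2][3])
print(round(sum(pi), 6), round(estimate, 2), round(variance, 2))
```

Two identities make a useful sanity check: the $\pi_i$ sum to the sample size 2, and for each unit the $\pi_{ij}$ over $j \ne i$ sum to $\pi_i$.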
Rao, Hartley, Cochran estimator (PPSWOR with random grouping)
The steps for sample selection are as under:
1. Split the population of size N into n random groups of sizes $N_1, N_2, \dots, N_n$ units, such that $N_1 + N_2 + \dots + N_n = N$.
2. Select one unit from each group, with probability proportional to size within the group:
   One unit from the first group of size $N_1$, call it $u_1$
   One unit from the second group of size $N_2$, call it $u_2$
   ...
   One unit from the last group of size $N_n$, call it $u_n$
3. Measure the study variable y to get $y_1, y_2, \dots, y_n$, with size measures $p_1, p_2, \dots, p_n$.
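Unbiased Rao, Hartley, Cochran (rhc) estimator of the population total Y:

$$\hat{Y}_{rhc} = \sum_{i=1}^{n} \frac{y_i}{p_i}\, \pi_i$$

with $\pi_i$ = sum of the $p_j$'s in group i.

$$\operatorname{var}(\hat{Y}_{rhc}) = \frac{\sum_{i=1}^{n} N_i^2 - N}{N(N-1)} \sum_{i=1}^{N} p_i \left( \frac{Y_i}{p_i} - Y \right)^2 \quad \text{(an unknown value)}$$

$$\widehat{\operatorname{var}}(\hat{Y}_{rhc}) = \frac{\sum_{i=1}^{n} N_i^2 - N}{N^2 - \sum_{i=1}^{n} N_i^2} \sum_{i=1}^{n} \pi_i \left( \frac{y_i}{p_i} - \hat{Y}_{rhc} \right)^2$$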
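If the groups are of equal size,

$$N_1 = N_2 = \dots = N_n = \frac{N}{n}$$

then

$$\operatorname{var}(\hat{Y}_{rhc}) = \frac{N - n}{n(N-1)} \sum_{i=1}^{N} p_i \left( \frac{Y_i}{p_i} - Y \right)^2$$

$$\widehat{\operatorname{var}}(\hat{Y}_{rhc}) = \frac{N - n}{N(n-1)} \sum_{i=1}^{n} \pi_i \left( \frac{y_i}{p_i} - \hat{Y}_{rhc} \right)^2$$

$$RE = \frac{V(\hat{Y}_{pps})}{V(\hat{Y}_{rhc})} =
\begin{cases}
\dfrac{N(N-1)}{n \left( \sum_{i=1}^{n} N_i^2 - N \right)} & \text{unequal size groups} \\[2ex]
\dfrac{N-1}{N-n} & \text{equal size groups}
\end{cases}$$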
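The RHC scheme and its estimator can be sketched as follows, assuming n is at least 2. The function names, the seed, and the y values are hypothetical: y is made exactly proportional to p so that every ratio $y_i / p_i$ is the same, in which case the estimate should equal that common ratio and the variance estimate should vanish.

```python
import random

def rhc_select(p, n, rng):
    """Split units into n random groups, then draw one unit per group
    with probability proportional to p within the group."""
    units = list(range(len(p)))
    rng.shuffle(units)
    groups = [units[g::n] for g in range(n)]       # group sizes differ by at most one
    picks = [rng.choices(g, weights=[p[i] for i in g], k=1)[0] for g in groups]
    return groups, picks

def rhc_estimate(y, p, groups, picks):
    """RHC estimate of the population total and its variance estimate."""
    pop_size = sum(len(g) for g in groups)
    group_totals = [sum(p[i] for i in g) for g in groups]     # pi_i
    ratios = [y[i] / p[i] for i in picks]                     # y_i / p_i
    total = sum(r * w for r, w in zip(ratios, group_totals))
    sum_sq = sum(len(g) ** 2 for g in groups)
    factor = (sum_sq - pop_size) / (pop_size ** 2 - sum_sq)
    variance = factor * sum(w * (r - total) ** 2
                            for r, w in zip(ratios, group_totals))
    return total, variance

p = [0.06, 0.22, 0.18, 0.10, 0.08, 0.12, 0.24]
y = [100 * pi for pi in p]              # hypothetical values, proportional to size
groups, picks = rhc_select(p, 3, random.Random(1))
total, variance = rhc_estimate(y, p, groups, picks)
print(round(total, 6), round(variance, 6))
```

Whatever groups and units are drawn, this artificial population gives an estimate of 100 with zero estimated variance (up to floating-point rounding), which is the ideal case in which the size measure is perfectly proportional to the study variable.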
Area Sampling With PPS
1. Take a map showing villages or other areas (like crop fields) as the sampling frame.
2. Select X- and Y-coordinates at random.
3. Locate the (X, Y) point on the map.
4. Choose the area unit in which this random point falls.
5. Repeat until n sampling units have been chosen.
Combination of PPS With SRS in Multistage Sampling Designs
One-stage sampling design (cluster sampling):
Clusters (PSUs) can be selected either by SRS or by PPS (we have discussed both cases).

Two-stage sampling design: various possibilities

PSUs by SRS, SSUs by SRS
PSUs by PPS, SSUs by SRS
PSUs by SRS, SSUs by PPS
PSUs by PPS, SSUs by PPS
Even more complicated designs are possible, using SRS in some SSUs and PPS in other SSUs.

Sample survey practitioners need to understand how to make an appropriate choice between SRS and PPS at the second-stage unit level when designing surveys.
THANK
YOU