Proportional to Size
(PPS sampling)
Introduction to Sampling with Probability Proportional
to Size (PPS sampling):
Need
In SRS, every unit has the same probability of selection. Therefore, when the units vary greatly in size, SRS is not suitable: units with large values of the variable Y contribute more to the estimated population total than units with small values of Y.
Hence
units with a larger size should have a larger probability of being included in the sample,
or
the probability of selecting a unit should be proportional to its size.
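Comparison of the Probability of Selecting a Unit under SRS and PPS
Cluster | 1 | 2 | 3
No. of listing units | 20 | 100 | 30
SRS | $\frac{1}{3}\cdot\frac{1}{20}$ | $\frac{1}{3}\cdot\frac{1}{100}$ | $\frac{1}{3}\cdot\frac{1}{30}$
PPS | $\frac{20}{150}\cdot\frac{1}{20}$ | $\frac{100}{150}\cdot\frac{1}{100}$ | $\frac{30}{150}\cdot\frac{1}{30}$
Each entry is (probability of selecting the cluster) × (probability of selecting a listing unit within it).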
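The entries of this table can be checked with exact fractions. The sketch below assumes, as the table suggests, that one cluster is drawn first (by SRS or by PPS) and one listing unit is then drawn by SRS within it; the function name is illustrative only.

```python
from fractions import Fraction

def listing_unit_probabilities(sizes):
    """Overall chance that a given listing unit is chosen when one cluster is
    drawn (by SRS or by PPS) and then one listing unit is drawn by SRS in it."""
    total = sum(sizes)                       # 150 listing units in all
    k = len(sizes)                           # 3 clusters
    srs = [Fraction(1, k) * Fraction(1, m) for m in sizes]       # (1/3)(1/M_c)
    pps = [Fraction(m, total) * Fraction(1, m) for m in sizes]   # (M_c/150)(1/M_c)
    return srs, pps

srs, pps = listing_unit_probabilities([20, 100, 30])
print("SRS:", [str(f) for f in srs])   # -> SRS: ['1/60', '1/300', '1/90']
print("PPS:", [str(f) for f in pps])   # -> PPS: ['1/150', '1/150', '1/150']
```

Under PPS at the first stage every listing unit ends up with the same overall probability 1/150, whereas under SRS the chance depends on the size of the cluster the unit belongs to.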
If we know the values of Y, then we use:
Sampling with Probability Proportional to Size
(PPS sampling)
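Use an auxiliary variable X to measure the unit sizes.
X should be highly correlated with Y.
Thus:
Population:
$Y_1, Y_2, \ldots, Y_N$; not all known
$X_1, X_2, \ldots, X_N$; all known
Take $P_i = k X_i$, $1 \le i \le N$. Since the probabilities must sum to 1,
$$\sum_{i=1}^{N} P_i = k\sum_{i=1}^{N} X_i = 1 \quad\Longrightarrow\quad k = \frac{1}{\sum_{i=1}^{N} X_i}$$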
$$P_i = \frac{X_i}{\sum_{j=1}^{N} X_j}, \qquad i = 1, \ldots, N$$
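A minimal sketch of the normalisation $P_i = X_i/\sum_j X_j$ in Python, using `fractions.Fraction` so that the probabilities sum to exactly 1. The sizes are the village areas from the cumulative total method example in this section, and `pps_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction

def pps_probabilities(sizes):
    """Return P_i = X_i / sum(X) for each unit, as exact fractions."""
    total = sum(sizes)
    if total <= 0 or any(x < 0 for x in sizes):
        raise ValueError("size measures must be non-negative with a positive total")
    return [Fraction(x, total) for x in sizes]

areas = [3, 11, 9, 5, 4, 6, 12]          # X_i in sq km
P = pps_probabilities(areas)
print([str(p) for p in P])
print(sum(P))                             # -> 1
```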
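For comparison, equal-probability designs use fixed probabilities:
SRSWR: $P_i = 1 - \left(1 - \frac{1}{N}\right)^{n}$
SRSWOR: $P_i = \frac{n}{N}$
Simple cluster: $P_i = \frac{m}{M}$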
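With equal probabilities, an unbiased estimator of Y from the ith unit is $N y_i$, or
$$\hat Y = N y_i = \frac{y_i}{1/N}.$$
Similarly, the unbiased estimator of the population total Y in varying probability sampling from the ith draw is
$$\hat Y_{pps} = \frac{y_i}{p_i},$$
where $p_i$ (which varies from unit to unit) is the probability of selecting the ith unit.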
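If the values $y_i$ were known before sampling and sampling were carried out with probability proportional to $y_i$, i.e.,
$$p_i = \frac{y_i}{\sum_{j=1}^{N} y_j},$$
then $\hat Y_{pps} = y_i/p_i = \sum_{j=1}^{N} y_j = Y$ would remain the same whichever unit is drawn, so the estimator would have zero variance. Sampling with probability proportional to a size measure X that is exactly proportional to y would give the same results as PPS sampling on y.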
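The zero-variance case can be checked directly: if $p_i$ is exactly proportional to $y_i$, every possible single draw returns the same value $y_i/p_i = Y$. The $y$-values below are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

y = [4, 10, 6, 20]                          # hypothetical y_i, known in advance
Y = sum(y)                                  # population total, 40
p = [Fraction(v, Y) for v in y]             # p_i proportional to y_i

# Estimate obtained from each possible single draw: y_i / p_i
estimates = [v / pi for v, pi in zip(y, p)]
# Exact variance of the single-draw estimator: sum_i p_i (y_i/p_i - Y)^2
variance = sum(pi * (e - Y) ** 2 for pi, e in zip(p, estimates))
print(estimates, variance)
```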
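If we draw a sample of n units with replacement out of N units, with the initial probability of selection of the ith unit as $p_i$, the combined unbiased estimator of Y is
$$\hat Y_{pps} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}.$$
The sampling variance of this estimator is
$$V(\hat Y_{pps}) = \frac{1}{n}\sum_{i=1}^{N} p_i\left(\frac{Y_i}{p_i} - Y\right)^2 = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{Y_i^2}{p_i} - Y^2\right).$$
The estimated variance of $\hat Y_{pps}$ is
$$v(\hat Y_{pps}) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat Y_{pps}\right)^2 = \frac{1}{n(n-1)}\left(\sum_{i=1}^{n}\frac{y_i^2}{p_i^2} - n\hat Y_{pps}^2\right).$$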
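The with-replacement estimator (commonly called the Hansen-Hurwitz estimator) and its variance estimate translate directly into code. This is a sketch, not a library routine: the function names are hypothetical, and the 3-unit population is invented purely to check, by exact enumeration of all ordered samples of size 2, that the estimator and the variance estimator are unbiased.

```python
from fractions import Fraction

def pps_wr_estimate(y, p):
    """PPSWR estimate of the total and its variance estimate.

    y, p: sampled values and their single-draw selection probabilities (n >= 2).
    Returns (estimate, v, v_alt); v and v_alt are the two equivalent forms.
    """
    n = len(y)
    z = [yi / pi for yi, pi in zip(y, p)]                  # y_i / p_i
    est = sum(z) / n
    v = sum((zi - est) ** 2 for zi in z) / (n * (n - 1))
    v_alt = (sum(zi ** 2 for zi in z) - n * est ** 2) / (n * (n - 1))
    return est, v, v_alt

def pps_wr_true_variance(Y, P, n):
    """V = (1/n) * sum_i P_i (Y_i/P_i - Y)^2 over the whole population."""
    total = sum(Y)
    return sum(Pi * (Yi / Pi - total) ** 2 for Yi, Pi in zip(Y, P)) / n

# Hypothetical 3-unit population; Fractions keep the arithmetic exact
Y = [Fraction(3), Fraction(7), Fraction(10)]
P = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]

# Average over all 9 ordered with-replacement samples of size 2,
# each occurring with probability P_i * P_j
pairs = [(i, j) for i in range(3) for j in range(3)]
results = {(i, j): pps_wr_estimate([Y[i], Y[j]], [P[i], P[j]]) for i, j in pairs}
mean_est = sum(P[i] * P[j] * results[i, j][0] for i, j in pairs)
mean_v = sum(P[i] * P[j] * results[i, j][1] for i, j in pairs)
print(mean_est == sum(Y), mean_v == pps_wr_true_variance(Y, P, 2))   # -> True True
```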
Examples
Crop survey: select villages with probability proportional to geographical area, or to cultivated area for a recent year.
Employment survey: select villages with probability proportional to village population in the last census.
Selection Method
1. List all units in the population along with size measure
Cumulative Total Method
Example: Select a PPS sample of 4 villages from the following population, by PPSWR and by PPSWOR.
(1) Village no. i | (2) Area $X_i$ (sq km) | (3) Cumulative total | (4) Random numbers allotted
1 | 3 | 3 | 01-03
2 | 11 | 14 | 04-14
3 | 9 | 23 | 15-23
4 | 5 | 28 | 24-28
5 | 4 | 32 | 29-32
6 | 6 | 38 | 33-38
7 | 12 | 50 | 39-50
Columns (1)-(2): given data; (3): computed; (4): allotted.
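Draw two-digit random numbers; a number is used only if it lies between 01 and 50, otherwise it is discarded.
Random no. | 62 | 33 | 78 | 21 | 49 | 39 | 02
Village no. | - | 6 | - | 3 | 7 | 7 | 1
PPSWR draw | | 1 | | 2 | 3 | 4 |
PPSWOR draw | | 1 | | 2 | 3 | | 4
So the PPSWR sample is villages 6, 3, 7, 7, and the PPSWOR sample is villages 6, 3, 7, 1 (the repeated village 7 is skipped).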
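The cumulative total method can be sketched as a lookup of each random number in the running totals. This assumes, as in the table, that numbers outside 01-50 are discarded and that, for PPSWOR, a village already in the sample is skipped; the function names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

areas = [3, 11, 9, 5, 4, 6, 12]             # X_i for villages 1..7
cum = list(accumulate(areas))               # cumulative totals 3, 14, ..., 50

def village_for(number):
    """Village whose allotted range (previous total + 1 .. total) holds number."""
    if not 1 <= number <= cum[-1]:
        return None                         # outside 01-50: discard
    return bisect_left(cum, number) + 1     # 1-based village number

def cumulative_total_sample(random_numbers, n, replace=True):
    sample = []
    for r in random_numbers:
        v = village_for(r)
        if v is None or (not replace and v in sample):
            continue                        # rejected number, or a repeat under WOR
        sample.append(v)
        if len(sample) == n:
            break
    return sample

draws = [62, 33, 78, 21, 49, 39, 2]
print(cumulative_total_sample(draws, 4, replace=True))    # -> [6, 3, 7, 7]
print(cumulative_total_sample(draws, 4, replace=False))   # -> [6, 3, 7, 1]
```

Skipping a repeated village and drawing again gives each remaining village a chance proportional to its area among the villages not yet selected, which is the PPSWOR rule.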
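A simpler method:
Lahiri's Method
Let $X^* \ge \max_i X_i$; e.g., for the villages above $\max_i X_i = 12$, so any $X^* \ge 12$ will do; set $X^* = 19$.
Draw a random number r between 1 and N and a random number s between 1 and $X^*$. If $s \le X_r$, unit r is selected; otherwise the pair is rejected and a new pair is drawn.
Suppose we have got: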
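Why Does Lahiri's Method Work?
Effective draw: $1 \le r \le N$ and $s \le X_r$
Ineffective draw: $1 \le r \le N$ and $s > X_r$
E.g., suppose r is chosen equal to 5. Since $1 \le s \le X^*$, either $s \le X_5$ (an effective draw) or $X_5 < s \le X^*$ (an ineffective draw).
For a single attempt,
$$\Pr(\text{an effective draw of unit } i) = \Pr(r = i \ \&\ s \le X_i) = \Pr(s \le X_i \mid r = i)\,\Pr(r = i) = \frac{X_i}{X^*}\cdot\frac{1}{N}.$$
Summing over i, an attempt is ineffective with probability $1 - \frac{\bar X}{X^*}$, where $\bar X = \frac{1}{N}\sum_{j=1}^{N} X_j$. Hence unit i is selected at the first, second, third, ... attempt with probabilities
$$\frac{1}{N}\frac{X_i}{X^*},\qquad \left(1 - \frac{\bar X}{X^*}\right)\frac{1}{N}\frac{X_i}{X^*},\qquad \left(1 - \frac{\bar X}{X^*}\right)^2\frac{1}{N}\frac{X_i}{X^*},\ \ldots$$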
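$$P_i = \text{sum of the above terms} = \frac{1}{N}\frac{X_i}{X^*}\left[1 + \left(1 - \frac{\bar X}{X^*}\right) + \left(1 - \frac{\bar X}{X^*}\right)^2 + \cdots\right] = \frac{1}{N}\frac{X_i}{X^*}\cdot\frac{1}{1 - \left(1 - \frac{\bar X}{X^*}\right)} = \frac{1}{N}\frac{X_i}{X^*}\cdot\frac{X^*}{\bar X} = \frac{X_i}{N\bar X} = \frac{X_i}{T(X)},$$
where $T(X) = \sum_{j=1}^{N} X_j = N\bar X$, since $1 + \left(1 - \frac{\bar X}{X^*}\right) + \left(1 - \frac{\bar X}{X^*}\right)^2 + \cdots$ is a geometric series with $0 \le 1 - \frac{\bar X}{X^*} < 1$.
Because, for each $X_i$,
$$0 < X_i \le \max_i X_i \le X^* \quad\Longrightarrow\quad 0 < \frac{\bar X}{X^*} \le 1 \quad\Longrightarrow\quad 0 \le 1 - \frac{\bar X}{X^*} < 1.$$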
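The geometric-series argument can be checked exactly. The sketch below, with a hypothetical helper name, computes the single-attempt probabilities $X_i/(NX^*)$ and sums the series in closed form; with the seven village areas and $X^* = 19$ it returns $X_i/T(X)$ with $T(X) = 50$.

```python
from fractions import Fraction

def lahiri_selection_probabilities(sizes, x_star):
    """Exact probability that Lahiri's method eventually selects each unit."""
    N = len(sizes)
    if x_star < max(sizes):
        raise ValueError("X* must be at least max X_i")
    # Single attempt: Pr(r = i and s <= X_i) = (1/N)(X_i/X*)
    effective = [Fraction(x, N * x_star) for x in sizes]
    fail = 1 - sum(effective)               # Pr(ineffective draw) = 1 - Xbar/X*
    # effective_i * (1 + fail + fail^2 + ...) = effective_i / (1 - fail)
    return [e / (1 - fail) for e in effective]

areas = [3, 11, 9, 5, 4, 6, 12]
probs = lahiri_selection_probabilities(areas, x_star=19)
print([str(p) for p in probs])              # X_i / 50 for each village
```

Changing $X^*$ (e.g., to 12) leaves the selection probabilities unchanged; a larger $X^*$ only raises the expected number of attempts per selected unit, which is $X^*/\bar X$.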
Example 2:
We would like to estimate the total number of unemployed persons in a region that has 14 villages. The numbers of households in these villages are:
400, 350, 760, 432, 860, 1180, 530, 600, 1320, 490, 1040, 310, 520, 900.
Select two villages by Lahiri's method.
Solution: N = population size = 14
n = sample size = 2
$X^* \ge \max_i X_i = 1320$
Take $X^* = 1320$.
Draw a number r between 1 and 14 and another number s between 1 and 1320. Suppose we have r = 7 and s = 960.
Look at $X_7 = 530$ and compare it to s = 960:
$s = 960 > X_7 = 530$
This was an ineffective draw, and we discard it.
Try again. Suppose r = 11, s = 870:
$s = 870 \le X_{11} = 1040$
Thus village eleven is the first sample unit.
Try again. Let r = 8, s = 392:
$s = 392 \le X_8 = 600$
Thus village eight is the second sample unit.
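Lahiri's method needs only two random integers per attempt. The sketch below assumes with-replacement selection (a repeated unit is kept), uses Python's `random.Random` for reproducibility, and checks the three (r, s) pairs from Example 2; the seed 2024 and the function names are arbitrary choices for illustration.

```python
import random

def is_effective(r, s, sizes):
    """Lahiri's rule: the pair (r, s) selects unit r iff s <= X_r (r is 1-based)."""
    return s <= sizes[r - 1]

def lahiri_sample_wr(sizes, n, x_star, rng):
    """Draw n units PPS with replacement by Lahiri's method."""
    if x_star < max(sizes):
        raise ValueError("X* must be at least max X_i")
    N = len(sizes)
    sample = []
    while len(sample) < n:
        r = rng.randint(1, N)               # unit number, 1..N
        s = rng.randint(1, x_star)          # size number, 1..X*
        if is_effective(r, s, sizes):
            sample.append(r)                # effective draw: keep unit r
        # otherwise the draw is ineffective: discard it and try again
    return sample

households = [400, 350, 760, 432, 860, 1180, 530, 600, 1320, 490, 1040, 310, 520, 900]
# The three pairs used in Example 2
for r, s in [(7, 960), (11, 870), (8, 392)]:
    print(r, s, is_effective(r, s, households))   # False, then True, True
print(lahiri_sample_wr(households, 2, 1320, random.Random(2024)))
```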
Estimation:
Population: $U = \{u_1, u_2, \ldots, u_N\}$; units are distinct and identifiable.
Study variable: Y with values $Y_1, Y_2, \ldots, Y_N$ (unknown to us).
Size measures: X with values $X_1, X_2, \ldots, X_N$ (known to us).
Let $T(X) = X_1 + X_2 + \cdots + X_N$ and
$$P_i = \frac{X_i}{T(X)}, \qquad i = 1, 2, \ldots, N.$$
Choose a sample of size n and measure $y_i$ and $P_i$ for $i = 1, 2, \ldots, n$. So we have $(y_1, P_1), (y_2, P_2), \ldots, (y_n, P_n)$.
To estimate the population total $T(Y) = \sum_{i=1}^{N} Y_i$, we have the following rules:
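Estimation in PPSWR Sampling
$$\hat T(Y)_{PPS} = \hat Y_{PPS} = \frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i} \quad\text{(unbiased estimator)}$$
$$\operatorname{var}\left(\hat T(Y)_{PPS}\right) = \frac{1}{n}\sum_{j=1}^{N} p_j\left(\frac{Y_j}{p_j} - T(Y)\right)^2 \quad\text{(an unknown value)}$$
Unbiased estimator of $\operatorname{var}\left(\hat T(Y)_{PPS}\right)$:
$$\widehat{\operatorname{var}}\left(\hat T(Y)_{PPS}\right) = \frac{1}{n(n-1)}\sum_{i=1}^{n}\left(\frac{y_i}{p_i} - \hat T(Y)_{PPS}\right)^2 = \frac{1}{n(n-1)}\left[\sum_{i=1}^{n}\left(\frac{y_i}{p_i}\right)^2 - n\,\hat T(Y)_{PPS}^2\right]$$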
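Therefore, an unbiased estimator of the population mean $\bar Y$ is
$$\hat{\bar y}_{PPS} = \frac{1}{N}\hat T(Y)_{PPS}$$
with variance
$$\operatorname{var}\left(\hat{\bar y}_{PPS}\right) = \frac{1}{N^2}\operatorname{var}\left(\hat T(Y)_{PPS}\right) = \frac{1}{nN^2}\sum_{j=1}^{N} p_j\left(\frac{Y_j}{p_j} - T(Y)\right)^2$$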
Example 3
Select a sample of 5 units from the villages given in Example 2 and estimate the total Y and its variance.
Solution
Here, T(X) = 9692. Now, use Lahiri's method to select 5 units.
Village | Size $X_i$ | $p_i$ | $y_i$ | $y_i/p_i$ | $(y_i/p_i)^2$
2 | 350 | 0.036 | 7 | 194.44 | 37808.64
5 | 860 | 0.089 | 18 | 202.25 | 40903.93
7 | 530 | 0.055 | 11 | 200.00 | 40000.00
10 | 490 | 0.051 | 12 | 235.29 | 55363.32
13 | 520 | 0.054 | 15 | 277.78 | 77160.49
Total | | | | 1109.76 | 251236.38
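$$\hat T(Y)_{PPS} = \frac{1109.76}{5} = 221.952 \approx 222;$$
$$\widehat{\operatorname{var}}\left(\hat T(Y)_{PPS}\right) = \frac{1}{5 \times 4}\left(251236.38 - 5 \times 222^2\right) = 240.819;$$
$$\operatorname{SE}\left(\hat T(Y)_{PPS}\right) = \sqrt{240.819} = 15.518$$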
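The arithmetic of Example 3 can be reproduced in a few lines. The sketch assumes, as the table does, that each $p_i$ is $X_i/T(X)$ rounded to three decimals; with that rounding, and with the estimate rounded to 222 before squaring, it reproduces the slide's 240.819 and 15.518. Using the unrounded estimate gives a slightly larger variance estimate.

```python
import math

T_X = 9692                                    # total households, T(X)
# (village, size X_i, y_i) for the five sampled villages of Example 3
sample = [(2, 350, 7), (5, 860, 18), (7, 530, 11), (10, 490, 12), (13, 520, 15)]

p = [round(x / T_X, 3) for _, x, _ in sample]          # p_i rounded as in the table
z = [y / pi for (_, _, y), pi in zip(sample, p)]       # y_i / p_i
n = len(z)
sum_z, sum_z2 = sum(z), sum(zi ** 2 for zi in z)

est = sum_z / n                                        # about 221.95
# The slide rounds the estimate to 222 before squaring
var_slide = (sum_z2 - n * round(est) ** 2) / (n * (n - 1))
# Using the unrounded estimate instead
var_exact = (sum_z2 - n * est ** 2) / (n * (n - 1))

print(f"sum z = {sum_z:.2f}, sum z^2 = {sum_z2:.2f}")
print(f"estimate = {est:.3f}")
print(f"var (rounded est) = {var_slide:.3f}, SE = {math.sqrt(var_slide):.3f}")
print(f"var (unrounded est) = {var_exact:.3f}")
```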
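Relative Efficiency of the PPSWR Estimator
$$RE = \frac{V\left(\hat T(Y)_{SRS}\right)}{V\left(\hat T(Y)_{PPS}\right)} \times 100$$
This is usually not available, since the $Y_i$ are not known. But it can be estimated by
$$\widehat{RE} = \frac{\widehat{\operatorname{var}}\left(\hat T(Y)_{SRS}\right)}{\widehat{\operatorname{var}}\left(\hat T(Y)_{PPS}\right)} \times 100,$$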
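where
$$\widehat{\operatorname{var}}\left(\hat T(Y)_{SRS}\right) = \frac{N}{n}\cdot\frac{1}{n}\sum_{i=1}^{n}\frac{y_i^2}{p_i} - \frac{1}{n}\left[\hat T(Y)_{PPS}^2 - \widehat{\operatorname{var}}\left(\hat T(Y)_{PPS}\right)\right]$$
Exercise
Apply this rule to Example 3 and find the estimate of relative efficiency.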
PPS Sampling Without Replacement (PPSWOR)
The PPSWOR sample design is more efficient than PPSWR, but selecting the sample units and computing the estimates are more complex, because the size measures have to be adjusted after each draw. So we need to take into account the order in which units are selected.
When the sampling fraction is small, the efficiencies of sampling with and without replacement do not differ significantly.
For selecting a PPSWOR sample, first a unit is selected with PPS, as in the WR case
The selected unit is removed from the population
From the remaining units, another PPS sample of size one is taken
The selected unit is removed from the population
The process is repeated to get the required sample of size n
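The successive-draw procedure for PPSWOR can be sketched with `random.Random.choices`, which draws with given weights. The sizes are again the village areas from the cumulative total example, the seed is arbitrary, and `ppswor_sample` is a hypothetical name.

```python
import random

def ppswor_sample(sizes, n, rng):
    """Select n distinct units, one PPS draw at a time from the units still
    in the population (units are numbered 1..N); the draw order is kept."""
    if n > len(sizes):
        raise ValueError("n cannot exceed N")
    remaining = list(range(1, len(sizes) + 1))
    sample = []
    for _ in range(n):
        weights = [sizes[u - 1] for u in remaining]    # sizes of remaining units
        unit = rng.choices(remaining, weights=weights, k=1)[0]
        sample.append(unit)
        remaining.remove(unit)                         # remove before the next draw
    return sample

areas = [3, 11, 9, 5, 4, 6, 12]
print(ppswor_sample(areas, 4, random.Random(7)))
```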
Suppose n units are selected one by one, with PPS to the measure X at each draw, without replacement. The probability of selection at the first draw for the jth unit is given by
$$p_j = \frac{X_j}{X}, \qquad j = 1, 2, \ldots, N, \qquad \text{where } X = \sum_{j=1}^{N} X_j.$$
Similarly, the probability that the ith unit is selected at the second draw, when the jth unit has been selected at the first draw, is given by
$$p_{i/j} = \frac{p_i}{1 - p_j}, \qquad i \ne j,$$
and so on.
The above set-up of sampling yields an ordered set of sample values with their probabilities:
Units: $u_1, u_2, \ldots, u_n$ (drawn first, second, ..., nth)
Values: $y_1, y_2, \ldots, y_n$
Probabilities: $p_1, p_2, \ldots, p_n$
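Multiplying the successive conditional probabilities gives the probability of an ordered sample, e.g. $p_a \cdot \frac{p_b}{1 - p_a}$ for two draws. The sketch below uses a hypothetical 3-unit population with exact fractions and checks that the probabilities of all ordered pairs add up to 1; the function name is illustrative.

```python
from fractions import Fraction

def ordered_sample_probability(p, order):
    """Probability of drawing the units in `order` (0-based indices) by
    successive PPS draws without replacement: p_a * p_b/(1 - p_a) * ..."""
    prob = Fraction(1)
    used = 0
    for unit in order:
        prob *= p[unit] / (1 - used)        # p_i / (1 - sum of p's already drawn)
        used += p[unit]
    return prob

p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
print(ordered_sample_probability(p, [0, 1]))    # (1/5) * (3/10)/(4/5) -> 3/40
total = sum(ordered_sample_probability(p, [i, j])
            for i in range(3) for j in range(3) if i != j)
print(total)                                    # -> 1
```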
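Des Raj's Estimator: Estimation Procedure
Notation:
Sample = $(y_1, y_2, \ldots, y_n)$
Probabilities = $(p_1, p_2, \ldots, p_n)$. Define the estimators $(t_1, t_2, \ldots, t_n)$:
$$t_1 = \frac{y_1}{p_1}, \qquad t_2 = y_1 + \frac{y_2(1 - p_1)}{p_2},$$
$$t_3 = y_1 + y_2 + \frac{y_3(1 - p_1 - p_2)}{p_3},$$
$$t_i = y_1 + y_2 + \cdots + y_{i-1} + \frac{y_i(1 - p_1 - p_2 - \cdots - p_{i-1})}{p_i},$$
$$t_n = y_1 + y_2 + \cdots + y_{n-1} + \frac{y_n(1 - p_1 - p_2 - \cdots - p_{n-1})}{p_n}$$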
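$$\bar t = \frac{t_1 + t_2 + \cdots + t_n}{n}, \text{ the sample mean of the } t_i\text{'s},$$
$$s_t^2 = \frac{1}{n-1}\sum_{i=1}^{n}(t_i - \bar t)^2$$
Population total, Y:
$$\hat Y = \bar t, \qquad \operatorname{SE}(\hat Y) = \sqrt{\frac{s_t^2}{n}} = \frac{s_t}{\sqrt n}$$
This is the Des Raj estimator.
Population mean, $\bar Y$:
$$\hat{\bar Y} = \frac{\bar t}{N}, \qquad \operatorname{SE}(\hat{\bar Y}) = \frac{1}{N}\frac{s_t}{\sqrt n}$$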
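Des Raj's $t_i$ can be computed with running sums of the earlier $y$'s and $p$'s. The sketch below (hypothetical function name) reproduces $t_1$ and $t_2$ of the example that follows; that example's full sample is not listed, so its $\hat Y = 144.15$ is not recomputed here. An exact enumeration on an invented 3-unit population checks that $\bar t$ is unbiased for $n = 2$.

```python
import math
from fractions import Fraction

def des_raj(y, p):
    """Des Raj estimator for an ordered PPSWOR sample.

    y, p: values and initial selection probabilities in the order drawn (n >= 2).
    Returns (list of t_i, t_bar, SE(t_bar)).
    """
    t, cum_y, cum_p = [], 0, 0
    for yi, pi in zip(y, p):
        # t_i = y_1 + ... + y_{i-1} + y_i (1 - p_1 - ... - p_{i-1}) / p_i
        t.append(cum_y + yi * (1 - cum_p) / pi)
        cum_y += yi
        cum_p += pi
    n = len(t)
    t_bar = sum(t) / n
    s2 = sum((ti - t_bar) ** 2 for ti in t) / (n - 1)     # s_t^2
    return t, t_bar, math.sqrt(s2 / n)

# First two draws of the example: y = 8, 30 with p = 0.18, 0.10
t, t_bar, se = des_raj([8, 30], [0.18, 0.10])
print([round(v, 2) for v in t])                           # -> [44.44, 254.0]

# Exact unbiasedness check on a hypothetical 3-unit population, n = 2
Y = [Fraction(3), Fraction(7), Fraction(10)]
P = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
expected = sum(P[i] * P[j] / (1 - P[i]) * des_raj([Y[i], Y[j]], [P[i], P[j]])[1]
               for i in range(3) for j in range(3) if i != j)
print(expected == sum(Y))                                 # -> True
```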
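Example: Construct probabilities proportional to size
$$t_1 = \frac{8}{0.18} = 44.44, \qquad t_2 = 8 + \frac{30(1 - 0.18)}{0.10} = 254$$
$$\hat Y = \bar t = 144.15, \qquad \operatorname{SE}(\hat Y) = 60.71$$
$$\hat{\bar Y} = \frac{1}{N}\bar t = 20.59, \qquad \operatorname{SE}(\hat{\bar Y}) = 8.67$$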
Murthy's Unordered Estimator (PPSWOR)
Murthy gave an unordered PPSWOR estimator by suitably combining Des Raj's ordered estimators. For illustration, suppose the sample in the Des Raj method is units 1 and 2, i.e., $u_1$ and $u_2$.
First suppose the order of selection is (1, 2), which provides
$$\hat Y_{d(1,2)} = \frac{1}{2}\left[(1 + p_1)\frac{y_1}{p_1} + (1 - p_1)\frac{y_2}{p_2}\right]$$
Now suppose the order is (2, 1); then
$$\hat Y_{d(2,1)} = \frac{1}{2}\left[(1 + p_2)\frac{y_2}{p_2} + (1 - p_2)\frac{y_1}{p_1}\right]$$
The probabilities of selection for $\hat Y_{d(1,2)}$ and $\hat Y_{d(2,1)}$ are
$$\pi(1,2) = \frac{p_1 p_2}{1 - p_1}, \qquad \pi(2,1) = \frac{p_1 p_2}{1 - p_2}$$
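Combine the two ordered estimators $\hat Y_{d(1,2)}$ and $\hat Y_{d(2,1)}$ to get an unordered estimator:
$$\hat Y_M = \frac{\hat Y_{d(1,2)}\,\pi(1,2) + \hat Y_{d(2,1)}\,\pi(2,1)}{\pi(1,2) + \pi(2,1)} = \frac{1}{2 - p_1 - p_2}\left[(1 - p_2)\frac{y_1}{p_1} + (1 - p_1)\frac{y_2}{p_2}\right]$$
$$\operatorname{var}(\hat Y_M) = \frac{1}{2}\sum_{i \ne j}^{N}\frac{p_i p_j(1 - p_i - p_j)}{2 - p_i - p_j}\left(\frac{Y_i}{p_i} - \frac{Y_j}{p_j}\right)^2 \quad\text{(an unknown value)}$$
$$\widehat{\operatorname{var}}(\hat Y_M) = \frac{(1 - p_1)(1 - p_2)(1 - p_1 - p_2)}{(2 - p_1 - p_2)^2}\left(\frac{y_1}{p_1} - \frac{y_2}{p_2}\right)^2 \quad\text{(a known estimate)}$$
The general case is somewhat complicated (cf. Hedayat, A. S. and Sinha, B. K. (1991), Design and Inference in Finite Population Sampling).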
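For n = 2 the combination can be verified exactly. The sketch below (function names are hypothetical; the population is invented) checks that weighting the two Des Raj orderings by $\pi(1,2)$ and $\pi(2,1)$ gives $\hat Y_M$, and, by enumerating every unordered sample, that $\hat Y_M$ is unbiased and its variance equals the formula above.

```python
from fractions import Fraction

def des_raj_pair(y1, p1, y2, p2):
    """Des Raj estimate (t1 + t2)/2 when the first unit listed is drawn first."""
    return ((1 + p1) * y1 / p1 + (1 - p1) * y2 / p2) / 2

def murthy_pair(y1, p1, y2, p2):
    """Murthy's unordered estimator for a PPSWOR sample of two units."""
    return ((1 - p2) * y1 / p1 + (1 - p1) * y2 / p2) / (2 - p1 - p2)

def murthy_pair_var_hat(y1, p1, y2, p2):
    """Variance estimate of Murthy's estimator for n = 2."""
    c = (1 - p1) * (1 - p2) * (1 - p1 - p2) / (2 - p1 - p2) ** 2
    return c * (y1 / p1 - y2 / p2) ** 2

def murthy_pair_variance(Y, P):
    """Population variance: (1/2) * sum over i != j of the formula above."""
    N = len(Y)
    return sum(P[i] * P[j] * (1 - P[i] - P[j]) / (2 - P[i] - P[j])
               * (Y[i] / P[i] - Y[j] / P[j]) ** 2
               for i in range(N) for j in range(N) if i != j) / 2

# Hypothetical population, exact arithmetic
Y = [Fraction(3), Fraction(7), Fraction(10)]
P = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
total = sum(Y)

# Weighting the two orderings of units 0 and 1 by pi(1,2) and pi(2,1)
w12 = P[0] * P[1] / (1 - P[0])
w21 = P[0] * P[1] / (1 - P[1])
combined = (w12 * des_raj_pair(Y[0], P[0], Y[1], P[1])
            + w21 * des_raj_pair(Y[1], P[1], Y[0], P[0])) / (w12 + w21)
print(combined == murthy_pair(Y[0], P[0], Y[1], P[1]))      # -> True

# Every unordered sample {i, j} and its selection probability
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
prob = {(i, j): P[i] * P[j] / (1 - P[i]) + P[i] * P[j] / (1 - P[j]) for i, j in pairs}
est = {k: murthy_pair(Y[k[0]], P[k[0]], Y[k[1]], P[k[1]]) for k in pairs}
mean = sum(prob[k] * est[k] for k in pairs)
mse = sum(prob[k] * (est[k] - total) ** 2 for k in pairs)
print(mean == total, mse == murthy_pair_variance(Y, P))     # -> True True
```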
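Horvitz-Thompson Estimator (PPSWOR)
Horvitz-Thompson estimator: $\hat Y_{HT} = \sum_{i=1}^{n}\frac{y_i}{\pi_i}$, where $\pi_i$ is the probability that unit i is included in the sample and $\pi_{ij}$ is the probability that units i and j are both included. For a fixed sample size n, its variance in the Yates-Grundy form is
$$V_{YG}(\hat Y_{HT}) = \sum_{i<j}^{N}\left(\pi_i\pi_j - \pi_{ij}\right)\left(\frac{Y_i}{\pi_i} - \frac{Y_j}{\pi_j}\right)^2$$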
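The Horvitz-Thompson estimator and the Yates-Grundy variance can be checked on a small population. This sketch assumes n = 2 draws by the successive PPSWOR scheme described earlier, from which it derives $\pi_i$ and $\pi_{ij}$; the population and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def ppswor2_inclusion(p):
    """First- and second-order inclusion probabilities for n = 2 successive
    PPS draws without replacement (units indexed 0..N-1)."""
    pi_ij = {}
    for i, j in permutations(range(len(p)), 2):
        key = (min(i, j), max(i, j))
        pi_ij[key] = pi_ij.get(key, 0) + p[i] * p[j] / (1 - p[i])   # i first, then j
    pi_i = [sum(v for k, v in pi_ij.items() if i in k) for i in range(len(p))]
    return pi_i, pi_ij

def ht_estimate(sample, y, pi_i):
    """Horvitz-Thompson estimate: sum of y_i / pi_i over the sampled units."""
    return sum(y[k] / pi_i[k] for k in sample)

def yg_variance(Y, pi_i, pi_ij):
    """Sum over pairs i < j of (pi_i pi_j - pi_ij)(Y_i/pi_i - Y_j/pi_j)^2."""
    return sum((pi_i[i] * pi_i[j] - pij) * (Y[i] / pi_i[i] - Y[j] / pi_i[j]) ** 2
               for (i, j), pij in pi_ij.items())

p = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)]
Y = [Fraction(3), Fraction(7), Fraction(10)]
pi_i, pi_ij = ppswor2_inclusion(p)
mean = sum(pij * ht_estimate(k, Y, pi_i) for k, pij in pi_ij.items())
mse = sum(pij * (ht_estimate(k, Y, pi_i) - sum(Y)) ** 2 for k, pij in pi_ij.items())
print(mean == sum(Y), mse == yg_variance(Y, pi_i, pi_ij))   # -> True True
```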
Rao-Hartley-Cochran estimator (PPSWOR with random grouping)
The steps for sample selection are as follows:
1. Split the population of size N at random into n groups of sizes $N_1, N_2, \ldots, N_n$ units, such that $N_1 + N_2 + \cdots + N_n = N$.
2. From each group, select one unit with probability proportional to its size measure (a unit with probability $p_t$ in group i is chosen with probability $p_t/P_i$, where $P_i$ is the sum of the p-values of the units in group i):
One unit from the first group of size $N_1$, call it $u_1$
One unit from the second group of size $N_2$, call it $u_2$
...
One unit from the last group of size $N_n$, call it $u_n$
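The estimator of the population total is
$$\hat Y_{RHC} = \sum_{i=1}^{n} P_i\frac{y_i}{p_i},$$
where $y_i$ and $p_i$ refer to the unit selected from group i.
$$\operatorname{var}(\hat Y_{RHC}) = \frac{\sum_{i=1}^{n} N_i^2 - N}{N(N-1)}\sum_{i=1}^{N} p_i\left(\frac{Y_i}{p_i} - Y\right)^2 \quad\text{(an unknown value)}$$
$$\widehat{\operatorname{var}}(\hat Y_{RHC}) = \frac{\sum_{i=1}^{n} N_i^2 - N}{N^2 - \sum_{i=1}^{n} N_i^2}\sum_{i=1}^{n} P_i\left(\frac{y_i}{p_i} - \hat Y_{RHC}\right)^2$$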
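If the groups are of equal size,
$$N_1 = N_2 = \cdots = N_n = \frac{N}{n},$$
then
$$\operatorname{var}(\hat Y_{RHC}) = \frac{N - n}{n(N-1)}\sum_{i=1}^{N} p_i\left(\frac{Y_i}{p_i} - Y\right)^2,$$
$$\widehat{\operatorname{var}}(\hat Y_{RHC}) = \frac{N - n}{N(n-1)}\sum_{i=1}^{n} P_i\left(\frac{y_i}{p_i} - \hat Y_{RHC}\right)^2$$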
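One RHC sample can be drawn and estimated as sketched below. Assumptions: `p` holds the single-draw PPS probabilities $p_t = X_t/X$, $P_i$ is computed as the total of `p` over group i, and there are at least two non-empty groups; the function names, seed, and data are illustrative. With y exactly proportional to p every $y_t/p_t$ equals Y, so the estimate is Y and the variance estimate is 0 whatever the random grouping.

```python
import random

def rhc_coefficient(group_sizes):
    """(sum N_i^2 - N) / (N^2 - sum N_i^2), the factor in the variance estimator."""
    N = sum(group_sizes)
    s = sum(m * m for m in group_sizes)
    return (s - N) / (N * N - s)

def rhc_sample(y, p, group_sizes, rng):
    """Draw one RHC sample; return (selected units, estimate, variance estimate)."""
    N = len(p)
    if sum(group_sizes) != N or len(group_sizes) < 2 or min(group_sizes) < 1:
        raise ValueError("need at least two non-empty groups whose sizes add up to N")
    units = list(range(N))
    rng.shuffle(units)                                  # step 1: random grouping
    chosen, parts, start = [], [], 0
    for size in group_sizes:
        group = units[start:start + size]
        start += size
        P_g = sum(p[u] for u in group)                  # P_i for this group
        # step 2: one unit from the group with probability p_u / P_i
        u = rng.choices(group, weights=[p[v] for v in group], k=1)[0]
        chosen.append(u)
        parts.append((P_g, y[u] / p[u]))
    est = sum(P_g * z for P_g, z in parts)              # sum of P_i * y_i / p_i
    var_hat = rhc_coefficient(group_sizes) * sum(P_g * (z - est) ** 2 for P_g, z in parts)
    return chosen, est, var_hat

p = [0.1, 0.2, 0.3, 0.4]
y = [1.0, 2.0, 3.0, 4.0]        # y proportional to p: every y_t / p_t is 10
chosen, est, var_hat = rhc_sample(y, p, [2, 2], random.Random(11))
print(chosen, round(est, 10), round(var_hat, 10))
```

For equal groups the coefficient reduces to $(N - n)/(N(n-1))$; for example `rhc_coefficient([7, 7])` equals 12/14.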
Area Sampling With PPS
1. Take a map showing villages or other
areas (like crop-fields) as sampling frame.
2. Select X- and Y- coordinates at random.
3. Locate (X, Y)-point on the map.
4. Choose the area unit in which this random
point falls.
5. Repeat until n sampling units are chosen.
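A toy version of area sampling: the map is replaced by a hypothetical grid in which each cell belongs to one area unit, so a uniformly random (X, Y) point selects each unit with probability equal to its share of the map area. The sketch selects with replacement, which the steps above do not specify; the grid and function names are illustrative.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical map: each character is one grid cell labelled with the
# area unit (village) that covers it; every cell has the same area.
MAP = [
    "AAABB",
    "AACBB",
    "CCCBB",
    "DDDDB",
]

def area_shares(grid):
    """Selection probability of each unit = its share of the map area."""
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    return {unit: Fraction(c, total) for unit, c in counts.items()}

def area_sample(grid, n, rng):
    """Steps 2-5: random (X, Y) point, find the unit containing it, repeat."""
    rows, cols = len(grid), len(grid[0])
    sample = []
    while len(sample) < n:
        x = rng.random() * cols                 # random X-coordinate
        y = rng.random() * rows                 # random Y-coordinate
        unit = grid[min(int(y), rows - 1)][min(int(x), cols - 1)]
        sample.append(unit)                     # with replacement in this sketch
    return sample

print(area_shares(MAP))
print(area_sample(MAP, 3, random.Random(5)))
```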
Combination of PPS With SRS in Multistage
Sampling Designs
One-stage sampling design (cluster sampling):
Clusters (PSUs) can be selected either by SRS or by PPS (we have discussed both cases).
Even more complicated designs:
using SRS in some SSUs and
PPS in other SSUs
THANK
YOU