
SOME ASPECTS OF SUCCESSIVE SAMPLING

KAUSTAV ADITYA
M.Sc. (Agricultural Statistics), Roll No. 4493

I.A.S.R.I, Library Avenue, New Delhi-110012


Chairperson: Dr. Ranjana Agarwal
Abstract: Surveys are often repeated on many occasions (over years or seasons) to
estimate the same characteristics at different points in time. Successive sampling is a
sampling scheme in which sample units are selected on different occasions such that some
units are common with the samples selected on previous occasions. Generally, the main
objective of successive surveys is to estimate the change, with a view to studying the
effects of the forces acting upon the population. For this, it is better to retain the same
sample from occasion to occasion. For populations where the basic objective is to study the
overall average or the total, it is better to select a fresh sample for every occasion. If the
objective is to estimate the average value for the most recent occasion, retaining a part of
the sample over occasions provides efficient estimates compared with the other
alternatives. In this sampling scheme, on the first occasion a simple random sample s of n
units is selected by SRSWOR from the population, and then on the second occasion a
simple random sample s1 of m units and a simple random sample s2 of (n-m) units are
selected independently from the population by SRS without replacement. Here the sample
s1 is common to both occasions.
Key words: Surveys, Sampling, SRSWOR, Successive Sampling, Efficient Estimates.
1. Introduction
Surveys are often repeated on many occasions (over years or seasons) to estimate the same
characteristics at different points in time. The information collected on previous occasions
can be used to study the change or the total value over occasions for the character, and also
to study the average value for the most recent occasion. For example, in a milk yield
survey one may be interested in estimating:
1. The average milk yield for the current season,
2. The change in milk yield between two different seasons, and
3. The total milk production for the year.
The successive method of sampling consists of selecting sample units on different
occasions such that some units are common with samples selected on previous occasions.
If sampling on successive occasions is done according to a specific rule, with partial
replacement of sampling units, it is known as successive sampling. The method of
successive sampling was developed by Jessen (1942) and extended by Patterson (1950)
and by Tikkiwal (1950, 53, 56, 64, 65, 67) and also Eckler (1955). Singh and Kathuria
(1969) investigated the application of this sampling technique in the agricultural field.
Hansen et al. (1955) and Rao and Graham (1964) have discussed rotation designs for
successive sampling. Singh and Singh (1965), Singh (1968), Singh and Kathuria (1969)
have extended successive sampling for many other sampling designs.

Generally, the main objective of successive surveys is to estimate the change with a view
to study the effects of the forces acting upon the population. For this, it is better to retain
the same sample from occasion to occasion. For populations where the basic objective is to
study the overall average or the total, it is better to select a fresh sample for every occasion.
If the objective is to estimate the average value for the most recent occasion, the retention
of a part of the sample over occasions provides efficient estimates as compared to other
alternatives.
An important question in devising efficient sampling strategies for repetitive surveys is
whether the same sample should be surveyed on all occasions, whether fresh samples should
be chosen on each occasion, or in what manner the composition of the sample should change
from occasion to occasion. The answer depends, apart from field difficulties, on the specific
estimation problem at hand. For instance, if the aim is to estimate only the difference
between the item mean on the current occasion ($\bar{Y}$) and on the previous occasion
($\bar{X}$), then the same sample on both occasions would give a better estimate than
independent samples, since the variance of the estimate in the former case is
$$V(\bar{y} - \bar{x}) = V(\bar{y}) + V(\bar{x}) - 2\,\mathrm{Cov}(\bar{y}, \bar{x}) < V(\bar{y}) + V(\bar{x}),$$
as $\bar{y}$ and $\bar{x}$ are highly correlated, so that $\mathrm{Cov}(\bar{y}, \bar{x}) > 0$.
On the contrary, for estimating the average of the means, independent samples would be
better than the same sample, in that
$$V(\bar{y} + \bar{x}) = V(\bar{y}) + V(\bar{x}) + 2\,\mathrm{Cov}(\bar{y}, \bar{x}) > V(\bar{y}) + V(\bar{x}).$$
But if the difference between the means and their average are to be estimated
simultaneously, clearly neither of these alternatives is desirable. Hence arises the idea of
retaining a part (say $S_c$) of the previous sample (say $S_1$) and supplementing it with a
set (say $S_f$) of fresh units on the current occasion, and of using the data pertaining to x
on $S_1$, x and y on $S_c$, and y on $S_f$ to build up the optimum estimator of $\bar{Y}$,
so that it, together with the estimate of $\bar{X}$, gives efficient results for the difference
between $\bar{Y}$ and $\bar{X}$ and also for their average. The questions then are how big
or small the set $S_c$ of common units, or the set $S_f$ of fresh units, should be for the
survey on the current occasion, how $S_c$ and $S_f$ should be chosen, and what procedure
should be employed for working out the estimates. These questions are interrelated and
depend ultimately on the regression of y on x. If it is known that the regression of y on x is
linear with a significant intercept, then we may choose $S_c$ from $S_1$ by SRS without
replacement and employ the regression estimator; when the intercept is not significant, the
sample $S_c$ may be chosen by SRS and the ratio estimator employed.
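The two variance comparisons made earlier in this section can be checked with a small numerical sketch. It assumes, purely for illustration, a common variance `s2` on both occasions, a sample of size `n` on each occasion, no finite population correction, and a correlation `rho` between occasions when the same units are retained. The function name and the input values are hypothetical and are not taken from the paper.

```python
def variance_of_difference_and_sum(s2, n, rho, matched):
    """Return (V(ybar - xbar), V(ybar + xbar)) for two occasions.

    Assumes equal variance s2 and sample size n on both occasions and
    ignores the finite population correction. With the same (matched)
    sample Cov(ybar, xbar) = rho * s2 / n; with independent samples it is 0.
    """
    var_mean = s2 / n
    cov = rho * s2 / n if matched else 0.0
    v_diff = 2 * var_mean - 2 * cov
    v_sum = 2 * var_mean + 2 * cov
    return v_diff, v_sum


# Hypothetical inputs, not taken from the paper.
same = variance_of_difference_and_sum(s2=100.0, n=25, rho=0.8, matched=True)
fresh = variance_of_difference_and_sum(s2=100.0, n=25, rho=0.8, matched=False)
print("same sample:        ", same)
print("independent samples:", fresh)
```

With a positive correlation the matched sample gives the smaller variance for the difference and the larger variance for the sum, which is the trade-off that motivates partial replacement.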
2. Sampling on Two Successive Occasions
It is assumed that the survey population remains unaltered from occasion to occasion. For
the purpose of generality, let the sample size for the first occasion be $n_1$ and that for the
second occasion be $n_2 = n_{12} + n_{22}$, where $n_{12}$ is the number of units common to
the 1st and the 2nd occasions and $n_{22}$ is the number of units drawn afresh on the second
occasion. The data obtained on the current (here the 2nd) occasion are denoted by y and
those on the previous (here the 1st) occasion by x. The sampling procedure consists of
the following steps:
Step (1.a): From the given survey population choose a sample $S_1$ of $n_1$ units by SRS
without replacement for the survey on the first occasion.
Step (1.b): On the second occasion choose a set $S_c$ of $n_{12}$ units from the sample taken
at step (1.a), either by SRS or by pps sampling depending on the situation at hand, and
supplement it with another set $S_f$ of $n_{22}$ units taken independently from the
unsurveyed $N - n_1$ units of the population by SRS without replacement, so that the total
sample $S_2 = S_c + S_f$ on the second occasion comprises $n_2 = n_{12} + n_{22}$ units.
As $S_1$ acts as a preliminary sample, the estimate $t_c$, based on the y and x values of
$S_c$ and the x values of $S_1$, would be a double sampling ratio, regression or pps
estimate, for example
$$t_c = \frac{1}{n_{12}} \sum_{j=1}^{n_{12}} \frac{y_j}{p_j}, \quad \text{where } p_i = \frac{x_i}{X}, \quad X = \sum_i x_i .$$
So we have
$$E(t_c) = \bar{Y} \quad \text{and} \quad V(t_c) = \frac{A}{n_1} + \frac{B}{n_{12}} - \frac{1}{N} S_y^2 \qquad (2.1)$$
where A and B are quantities based on the population x and y values; here
$$A = S_y^2, \qquad B = \sigma^2_{ppx} = \sum_i p_i \left( \frac{Y_i}{N p_i} - \bar{Y} \right)^2 .$$

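Also, in view of the selection of $S_f$ as noted in step (1.b), it is obvious that for the mean
$\bar{y}_f$ of $S_f$, where $\bar{y}_f = \frac{1}{n_{22}} \sum_{i=1}^{n_{22}} y_i$,
$$E(\bar{y}_f) = \bar{Y} \quad \text{and} \quad V(\bar{y}_f) = \left( \frac{1}{n_{22}} - \frac{1}{N} \right) S_y^2 \qquad (2.2)$$
Further, $t_c$ and $\bar{y}_f$ are correlated, in that
$$\mathrm{Cov}(t_c, \bar{y}_f) = \mathrm{Cov}\left[ E(t_c \mid S_1),\ E(\bar{y}_f \mid S_1) \right] = \mathrm{Cov}\left( \bar{y}_1,\ \frac{N\bar{Y} - n_1 \bar{y}_1}{N - n_1} \right) = -\frac{1}{N} S_y^2 \qquad (2.3)$$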
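The middle step of (2.3) can be written out as follows. This is a sketch of the standard argument, with $\bar{y}_1$ standing for the mean of the second-occasion values over $S_1$. It relies only on $S_f$ being a simple random sample from the $N - n_1$ units outside $S_1$, on $E(t_c \mid S_1) = \bar{y}_1$, and on $V(\bar{y}_1) = (1/n_1 - 1/N) S_y^2$ for a simple random sample without replacement.

```latex
\begin{aligned}
E(\bar{y}_f \mid S_1) &= \frac{N\bar{Y} - n_1\bar{y}_1}{N - n_1},\\
\operatorname{Cov}(t_c,\bar{y}_f)
  &= \operatorname{Cov}\!\left(\bar{y}_1,\ \frac{N\bar{Y}-n_1\bar{y}_1}{N-n_1}\right)
   = -\frac{n_1}{N-n_1}\,V(\bar{y}_1)\\
  &= -\frac{n_1}{N-n_1}\left(\frac{1}{n_1}-\frac{1}{N}\right)S_y^2
   = -\frac{1}{N}\,S_y^2 .
\end{aligned}
```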

Now we have to find the minimum-variance combination of $t_c$ and $\bar{y}_f$. To
determine this we use the theorem that if $t_i$, $i = 1, 2, \ldots, n$, are statistics such that
$E(t_i) = \theta$, $V(t_i) = \sigma_i^2$ and $\mathrm{Cov}(t_i, t_j) = c$, $i \neq j = 1, 2, \ldots, n$,
then the best in the class of all linear functions of $t_i$, $i = 1, 2, \ldots, n$, is given by
$$t^* = \frac{\sum_i (1/\sigma_{i*}^2)\, t_i}{\sum_i (1/\sigma_{i*}^2)}, \qquad \sigma_{i*}^2 = \sigma_i^2 - c .$$
If the linear function is $L(a) = \sum_i a_i t_i$ and n = 2, then the function
$L(a) = a t_1 + (1 - a) t_2$ has the smallest variance when the value of a is
$$a^* = \frac{\sigma_{2*}^2}{\sigma_{1*}^2 + \sigma_{2*}^2}$$
and the variance of $t^* = a^* t_1 + (1 - a^*) t_2$ is
$$V(t^*) = \frac{\sigma_{1*}^2\, \sigma_{2*}^2}{\sigma_{1*}^2 + \sigma_{2*}^2} + c .$$
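As a minimal sketch of the two-statistic case of this result, the function below takes the variances of $t_1$ and $t_2$ and their common covariance c, and returns the weight a* together with the minimum variance. The function names and inputs are hypothetical, and the starred variances are taken to be the variances minus the covariance, in the same way as $V_f^*$ and $V_c^*$ are formed in (2.6).

```python
def best_combination(var1, var2, cov):
    """Minimum-variance weights for a*t1 + (1 - a)*t2.

    var1, var2: variances of two unbiased statistics t1 and t2.
    cov: their covariance c.
    Returns (a_star, min_variance).
    """
    v1_star = var1 - cov  # starred variance of t1
    v2_star = var2 - cov  # starred variance of t2
    a_star = v2_star / (v1_star + v2_star)
    min_variance = v1_star * v2_star / (v1_star + v2_star) + cov
    return a_star, min_variance


def combined_variance(a, var1, var2, cov):
    """Variance of a*t1 + (1 - a)*t2 for an arbitrary weight a."""
    return a**2 * var1 + (1 - a)**2 * var2 + 2 * a * (1 - a) * cov


# Hypothetical inputs: check that a* does no worse than nearby weights.
a_star, v_min = best_combination(var1=4.0, var2=9.0, cov=1.0)
print(a_star, v_min)
for a in (a_star - 0.1, a_star + 0.1):
    assert combined_variance(a, 4.0, 9.0, 1.0) >= v_min
```

Evaluating `combined_variance` at `a_star` reproduces `v_min`, which is a quick way to confirm that the closed form and the direct variance formula agree.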

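So in sampling on two successive occasions, the minimum-variance combination of $t_c$
and $\bar{y}_f$ is given by
$$\bar{y}_{ss} = a\, t_c + (1 - a)\, \bar{y}_f \quad \text{with} \quad a = \frac{V_f^*}{V_f^* + V_c^*} \qquad (2.4)$$
and
$$V(\bar{y}_{ss}) = \frac{V_f^*\, V_c^*}{V_f^* + V_c^*} + \mathrm{Cov}(t_c, \bar{y}_f) \qquad (2.5)$$
where
$$V_f^* = V(\bar{y}_f) - \mathrm{Cov}(t_c, \bar{y}_f) = \frac{1}{n_{22}} S_y^2, \qquad V_c^* = V(t_c) - \mathrm{Cov}(t_c, \bar{y}_f) = \frac{A}{n_1} + \frac{B}{n_{12}} \qquad (2.6)$$
by virtue of equations (2.1)-(2.3).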


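If $n_1 = n_2 = n$, say, and $n_{22} = n\lambda$ so that $n_{12} = n(1 - \lambda)$, then the
variance of $\bar{y}_{ss}$ simplifies to
$$V_{ss} = \frac{S_y^2 \left[ 1 - (1 - \delta)\lambda \right]}{n \left[ 1 - (1 - \delta)\lambda^2 \right]} - \frac{S_y^2}{N} \qquad (2.7)$$
where $\delta = B / S_y^2$ and $A = S_y^2 - B$ for any of the estimates.
Minimizing (2.7) with respect to $\lambda$, the optimum value of the fraction of units to be
drawn afresh on the second occasion is given by
$$\lambda^* = \frac{1}{1 + \sqrt{\delta}} \qquad (2.8)$$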
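The optimum in (2.8) can be checked numerically. The sketch below assumes $n_1 = n_2 = n$, writes `lam` for the fraction of units drawn afresh and `delta` for $B / S_y^2$ (both names are chosen here for illustration), builds $V_f^*$ and $V_c^*$ from (2.6) with $A = S_y^2 - B$, and compares a grid search with the closed form $1 / (1 + \sqrt{\delta})$ obtained by minimizing the resulting variance. The input values are hypothetical.

```python
import math


def v_ss(lam, s2, delta, n, N):
    """Variance of the combined estimator when n22 = n*lam, n12 = n*(1 - lam).

    Uses V_f* = s2 / n22 and V_c* = A / n + B / n12 with B = delta * s2
    and A = s2 - B (the special case stated in the text).
    """
    b = delta * s2
    a = s2 - b
    vf_star = s2 / (n * lam)
    vc_star = a / n + b / (n * (1 - lam))
    return vf_star * vc_star / (vf_star + vc_star) - s2 / N


# Hypothetical inputs, not taken from the paper.
s2, delta, n, N = 100.0, 0.36, 50, 5000

# Grid search over fractions strictly between 0 and 1.
grid = [i / 10000 for i in range(1, 10000)]
lam_grid = min(grid, key=lambda lam: v_ss(lam, s2, delta, n, N))
lam_closed = 1 / (1 + math.sqrt(delta))

print(lam_grid, lam_closed)
print(v_ss(lam_closed, s2, delta, n, N))
```

At the optimum the two starred variances coincide, so the weight a becomes 1/2 and the combined estimate is the simple average of $t_c$ and $\bar{y}_f$.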

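For this value of $\lambda$, a = 1/2 and the estimate of $\bar{Y}$ and its variance reduce to
$$\bar{y}^*_{ss} = \frac{1}{2} \left( t_c + \bar{y}_f \right) \qquad (2.9)$$
and
$$V^*_{ss} = \left[ \frac{1 + \sqrt{\delta}}{2n} - \frac{1}{N} \right] S_y^2 \qquad (2.10)$$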
Theorem 2.1: When the sampling procedure given by steps (1.a)-(1.b) is employed for
getting a sample of n units on both occasions, using SRS without replacement together
with the ratio or regression estimate, or pps with or without replacement, or the strategy of
Rao-Hartley-Cochran, at step (1.b) for selecting $S_c$ and building up a double sampling
estimate $t_c$ of $\bar{Y}$ from $S_c$, then the best possible estimate of $\bar{Y}$ on the
second occasion is given by the average of the estimates obtained from $S_c$ and $S_f$, and
its variance is as in equation (2.10).
It may be noted that the quantity $\delta = B / S_y^2$ is always less than unity in the
particular situation under the condition that
$$B = \sum_i p_i \left( \frac{Y_i}{N p_i} - \bar{Y} \right)^2 ,$$
which is smaller than $S_y^2$.
In the above we have studied how, and under what circumstances, the estimator of the
mean for the second occasion can be improved by utilizing the information collected on
the first occasion. Another important problem in sampling on two occasions is estimating
the change in the total value of the study variate during the period. The estimation of
change, however, presents a different problem of applying the information provided by the
samples.
Here we discuss the problem of estimating the change in the population mean over a
period, based on samples taken on two occasions.
Let the sample size for the first occasion be $n_1$ and that for the second occasion be
$n_2 = n_{12} + n_{22}$, where $n_{12}$ is the number of units common to the 1st and the 2nd
occasions and $n_{22}$ is the number of units drawn afresh on the second occasion. Also let
the sample on the first occasion be $S_1$ of $n_1$ units, drawn by SRS without replacement.
On the second occasion choose a set $S_c$ of $n_{12}$ units from the sample $S_1$ chosen on
the first occasion, then supplement it with another set $S_f$ of $n_{22}$ units taken
independently from the unsurveyed $N - n_1$ units of the population by SRS without
replacement. Then an estimator of the change may be given by

$$d = \frac{n_{12}}{n_1} \left[ \bar{y}_c - \bar{x}_c \right] + \frac{n_{22}}{n_1} \left[ \bar{y}_f - \bar{x}_f \right] \qquad (a)$$
where
d = estimator of the change,
$\bar{y}_c$ = mean per unit for the second occasion for $S_c$, which is common to both occasions,
$\bar{x}_c$ = mean per unit for the first occasion for $S_c$, which is common to both occasions,
$\bar{y}_f$ = mean per unit for the second occasion for $S_f$, which is drawn afresh on the first occasion,
$\bar{x}_f$ = mean per unit for the first occasion for $S_f$, which is drawn afresh on the first occasion.
It has sampling variance
$$V(d) = 2\, \frac{S_y^2}{n_1} \left[ 1 - \frac{n_{12}}{n_1}\, \rho \right] \qquad (b)$$
where $\rho$ is the correlation coefficient between X and Y. If it is not known, it is estimated
from the sample, and the variance is then given by
$$V(d) = 2\, \frac{S_y^2}{n_1} \left[ 1 - \frac{n_{12}}{n_1}\, r \right] \qquad (c)$$
where r is the sample correlation coefficient between x and y.
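A small sketch of the variance expression for the change estimator follows. It assumes the form $V(d) = 2 (S_y^2 / n_1)(1 - (n_{12}/n_1)\rho)$, which is how expressions (b) and (c) are read here. The function name and the input values are hypothetical.

```python
def variance_of_change(s2_y, n1, n12, rho):
    """V(d) = 2 * (S_y^2 / n1) * (1 - (n12 / n1) * rho).

    s2_y: population variance S_y^2,
    n1: first-occasion sample size,
    n12: number of matched (common) units,
    rho: correlation between occasions (or its sample estimate r).
    """
    return 2.0 * (s2_y / n1) * (1.0 - (n12 / n1) * rho)


# Hypothetical inputs: with rho > 0, more matched units lower the variance.
for n12 in (0, 10, 20, 30):
    print(n12, variance_of_change(s2_y=100.0, n1=30, n12=n12, rho=0.9))
```

Under this reading, complete matching (n12 = n1) gives the smallest variance for the change whenever the correlation is positive, in line with the remark in the Introduction that retaining the same sample is preferable for estimating change.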

3. A Practical Example on Successive Sampling


In a population survey, the total number of inhabitants in 30 randomly selected villages in
a subdivision is counted for two successive years to study the average number of
inhabitants. The total number of villages in the subdivision was 2032. Estimate the average
number of persons per village over the two years and the total number of persons in the
subdivision in the second year, using the available information for the previous year.
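The table given (number of inhabitants per village):

First occasion (85-86), villages 1-30:

Village  85-86    Village  85-86    Village  85-86
   1       78       11       51       21       36
   2       17       12       62       22       49
   3       85       13       22       23       42
   4       21       14       25       24       63
   5       19       15       27       25       71
   6       25       16       21       26       12
   7        7       17       21       27       17
   8       11       18       19       28       28
   9       19       19       28       29       19
  10       31       20       32       30       23

Second occasion (86-87), villages 31-50:

Village  86-87    Village  86-87
  31       72       41       75
  32       25       42       89
  33       27       43       78
  34       29       44      108
  35       37       45       86
  36       42       46       43
  37       14       47       72
  38       25       48        3
  39       28       49       17
  40      112       50       82

Second occasion (86-87), values recorded for ten of villages 1-30, in the order listed:
30, 20, 28, 17, 21, 82, 29 (among villages 1-20); 45 (among villages 21-24);
78, 35 (among villages 25-30).
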
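Solution: As per the notation discussed, N = 2032, $n_1$ = 30, $n_{12}$ = 10, $n_{22}$ = 20
and $n_2$ = 30, with $S_y^2 = 731.068$. So A = 731.068 and B = 368.11, where
$$A = S_y^2, \qquad B = \sum_i p_i \left( \frac{Y_i}{N p_i} - \bar{Y} \right)^2 .$$
Also $t_c$ = 36.1, $\bar{y}_f$ = 42.7 and
$$\mathrm{Cov}(t_c, \bar{y}_f) = -\frac{1}{N} S_y^2 = -0.3598 .$$
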
Now the estimate of the average number of inhabitants on the current occasion, using the
information from the previous occasion, is given by
$$\bar{y}_{ss} = a\, t_c + (1 - a)\, \bar{y}_f$$
with $V_f^* = 36.55$ and $V_c^* = 61.18$. Then
$$a = \frac{V_f^*}{V_c^* + V_f^*} = 0.374$$
so that $\bar{y}_{ss} = 40.2316$, and
$$V(\bar{y}_{ss}) = \frac{V_f^*\, V_c^*}{V_f^* + V_c^*} + \mathrm{Cov}(t_c, \bar{y}_f) = 22.88 + (-0.3598) = 22.52 .$$
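The arithmetic of this step can be reproduced with a few lines of Python. This is only a sketch that plugs the values reported in the solution into (2.3)-(2.6); the variable names are chosen here for illustration.

```python
# Inputs as reported in the solution above.
N, n1, n12, n22 = 2032, 30, 10, 20
s2_y = 731.068            # S_y^2
A, B = 731.068, 368.11    # A = S_y^2 in this example
t_c, y_f = 36.1, 42.7

cov = -s2_y / N                   # (2.3): Cov(t_c, y_f)
vf_star = s2_y / n22              # (2.6): V(y_f) - Cov
vc_star = A / n1 + B / n12        # (2.6): V(t_c) - Cov

a = vf_star / (vf_star + vc_star)                       # (2.4)
y_ss = a * t_c + (1 - a) * y_f                          # combined estimate
v_yss = vf_star * vc_star / (vf_star + vc_star) + cov   # (2.5)

print(f"Cov = {cov:.4f}, Vf* = {vf_star:.2f}, Vc* = {vc_star:.2f}")
print(f"a = {a:.3f}, y_ss = {y_ss:.4f}, V(y_ss) = {v_yss:.4f}")
```

Because the script carries unrounded intermediate values, its last digits can differ slightly from a hand calculation that rounds a to three decimals.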

Now for estimating the average number of inhabitants per village over the two years, we
have $\delta = B / S_y^2 = 0.5035$, and so we get
$$\lambda^* = \frac{1}{1 + \sqrt{\delta}} = 0.5849 .$$
For this value we get the estimate of the average number of inhabitants in the two years in
the subdivision,
$$\bar{y}^*_{ss} = \frac{1}{2} \left( t_c + \bar{y}_f \right) = 39.4,$$
and the variance will be
$$V^*_{ss} = \left[ \frac{1 + \sqrt{\delta}}{2n} - \frac{1}{N} \right] S_y^2 = 20.47 .$$
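The optimum-fraction calculation can be sketched in the same way. The script below assumes that (2.8) and (2.10) have the forms $\lambda^* = 1/(1 + \sqrt{\delta})$ and $V^*_{ss} = [(1 + \sqrt{\delta})/(2n) - 1/N] S_y^2$ with $\delta = B / S_y^2$; the names `delta` and `lam_star` are chosen here for illustration, and the inputs are the values reported in the solution.

```python
import math

# Inputs as reported in the solution above.
N, n = 2032, 30
s2_y, B = 731.068, 368.11
t_c, y_f = 36.1, 42.7

delta = B / s2_y                              # ratio B / S_y^2
lam_star = 1 / (1 + math.sqrt(delta))         # (2.8): optimum fresh fraction
y_ss_star = 0.5 * (t_c + y_f)                 # (2.9): simple average
v_ss_star = s2_y * ((1 + math.sqrt(delta)) / (2 * n) - 1 / N)   # (2.10)

print(f"delta = {delta:.4f}, lambda* = {lam_star:.4f}")
print(f"y*_ss = {y_ss_star:.1f}, V*_ss = {v_ss_star:.2f}")
```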

So here we have obtained the estimate of the average number of inhabitants in the two
years in the subdivision as 39.4, with the variance of the estimator being 20.47, and the
estimate of the average number of inhabitants per village in the second year as 40.2316,
with the variance of the estimate being 22.52.
Now we will derive the estimator for sampling on several successive occasions, following
Yates (1949).
4. Sampling on Several Successive Occasions
Let $s_{k-1,f}$ be the sample of $n_{k-1,k-1}$ unmatched units drawn afresh by simple random
sampling without replacement from the remaining population units on the (k-1)th occasion,
k = 2, 3, ..., with $s_{1f} = s_1$, which is of $n_1$ units. To draw a sample of $n_k$ units on
the kth occasion adopt the following procedure:
Step (2a): From the sample $s_{k-1,f}$ of $n_{k-1,k-1}$ unmatched units drawn on the (k-1)th
occasion select a sub-sample $s_{kc}$ of $n_{k-1,k}$ units using some probability sampling
procedure;
Step (2b): In addition to the sample $s_{kc}$ selected above, choose independently by SRS
without replacement from the remaining $N - n_{k-1,k-1}$ units of the population a sample
$s_{kf}$ of $n_{k,k}$ units for investigation on the kth occasion, where
$n_{k-1,k} + n_{k,k} = n_k$ for k = 2, 3, ....
It may be noted that $s_{kf}$, k = 1, 2, ..., constitutes a simple random sample from the
population and, by step (2a), $s_{kc}$ is a sub-sample of $s_{k-1,f}$, k = 2, 3, .... If $t_{kc}$
is a statistic based on the observations from $s_{kc}$ such that
$$E(t_{kc} \mid s_{k-1,f}) = \bar{y}_{k-1,k-1} \qquad (4.1)$$
i.e. the mean of the y-observations of the $n_{k-1,k-1}$ units of $s_{k-1,f}$, then $t_{kc}$
provides a double sampling estimate of $\bar{Y}$, the population mean on the kth occasion,
k = 2, 3, .... So it may be assumed, in line with the preceding discussion of the
two-occasion case, that
$$V_{kc} = V(t_{kc}) = \frac{A(k-1,k)}{n_{k-1,k-1}} + \frac{B(k-1,k)}{n_{k-1,k}} - \frac{1}{N} S_y^2 \qquad (4.2)$$
where $A(k-1,k)$ and $B(k-1,k)$ are constants based on the population values of the study
variate relating to the kth and/or (k-1)th occasion. Let $\bar{y}_{k,k}$ denote the mean of the
item values obtained from $s_{kf}$.
Now the optimum combination of $t_{kc}$ and $\bar{y}_{k,k}$ is given by
$$\bar{Y}_{wk} = a_k\, \bar{y}_{k,k} + (1 - a_k)\, t_{kc} \qquad (4.3)$$
where
$$a_k = \frac{V^*_{kc}}{V^*_{k,k} + V^*_{kc}} \qquad (4.4)$$
and
$$V^*_{kc} = V_{kc} - \mathrm{Cov}(\bar{y}_{k,k}, t_{kc}), \qquad V^*_{k,k} = V_{k,k} - \mathrm{Cov}(\bar{y}_{k,k}, t_{kc}) \qquad (4.5)$$
with $V_{k,k} = V(\bar{y}_{k,k})$.
Further, the variance of $\bar{Y}_{wk}$ is given by


$$V_{wk} = a_k V^*_{k,k} + \mathrm{Cov}(\bar{y}_{k,k}, t_{kc}) \qquad (4.5a)$$
But clearly
$$\mathrm{Cov}(\bar{y}_{k,k}, t_{kc}) = \mathrm{Cov}\left[ E(\bar{y}_{k,k} \mid s_{k-1,f}),\ E(t_{kc} \mid s_{k-1,f}) \right],$$
since for a given $s_{k-1,f}$ the samples $s_{kc}$ and $s_{kf}$ are independent. As
$$E(\bar{y}_{k,k} \mid s_{k-1,f}) = \frac{N\bar{Y} - n_{k-1,k-1}\, \bar{y}_{k-1,k-1}}{N - n_{k-1,k-1}},$$
it follows in view of (4.1) that
$$\mathrm{Cov}\{ E(\bar{y}_{k,k} \mid s_{k-1,f}),\ E(t_{kc} \mid s_{k-1,f}) \} = \mathrm{Cov}\left[ \frac{N\bar{Y} - n_{k-1,k-1}\, \bar{y}_{k-1,k-1}}{N - n_{k-1,k-1}},\ \bar{y}_{k-1,k-1} \right] = -\frac{n_{k-1,k-1}}{N - n_{k-1,k-1}}\, V(\bar{y}_{k-1,k-1}) = -\frac{1}{N} S_y^2,$$
since the sample $s_{k-1,f}$ of unmatched units always constitutes a simple random sample
from the population for all k = 2, 3, ..., for which reason
$$V(\bar{y}_{k,k}) = \left( \frac{1}{n_{k,k}} - \frac{1}{N} \right) S_y^2 \qquad (4.6)$$

The sampling procedure given by steps (2a) and (2b), together with (4.4) to (4.7), provides
the general frame for developing an optimum strategy for sampling on several successive
occasions. Avadhani and Sukhatme (1972) worked out the detailed formulae for controlled
simple random sampling with the ratio estimator and for RHC techniques for building up
the double sampling estimator.
5. The Overview on Sampling on Several Successive Occasions with Equal and
Unequal Probabilities without Replacement Discussed by Avadhani and
Sukhatme (1972)
This method was given by Avadhani and Sukhatme (1972) in their paper on sampling on
several successive occasions with equal and unequal probabilities and without
replacement. They note that sampling on successive occasions for estimating terms and
relationships in a time series involves a number of problems, in theory and in practical
survey design, that need special attention. The problems arise because of the need to
estimate several parameters from a sample; thus estimates of aggregates or averages are
often needed at each period of time, such as for each month.
Since it is not desirable that the estimate at each period require the revision of the
preceding estimates, the problem of estimation in time series raises questions such as
whether the sampling units should be identical at different points in time; if not, what
proportion of units should be identical (i.e., matched); and how one should utilize the
information from the past occasions to improve the estimates for the current occasion. They
have proved that equal and unequal probability sampling without replacement strategies
for estimating the population mean on the current occasion are the most efficient in a
series of successive occasions, wherein the information obtained from the past occasions is
used in the best possible way.
6. Use of SRSWOR to Obtain a Better Estimate
A new scheme of selecting the unmatched part of the sample on the second occasion, using
simple random sampling without replacement (SRSWOR) on both occasions, is presented.
This procedure is discussed by Ravindra Singh (1972) in his paper "A Note on Sampling
over Two Occasions". The proposed estimate of the population mean is more efficient than
the estimate given by Pathak and Rao (1967) for the same expected cost.
Let $y_{1j}$ and $y_{2j}$ respectively denote the values of the study variable for the jth unit
of the population on the first and the second occasions (j = 1, 2, ..., N). Also let
$\bar{Y}_1$ and $\bar{Y}_2$ denote the population means and $S_1^2$ and $S_2^2$ the
population variances for the two occasions.
Now we consider the following three schemes of sampling over two occasions:
a. On the first occasion a simple random sample s of n units is selected without
replacement (SRSWOR) from the population. On the second occasion a simple random
sample s1 of m units and a simple random sample s2 of (n-m) units are selected
independently from the population without replacement.
b. Selection of the samples s and s1 is the same as in (a), but instead of s2 of (a), a sample
s2(b) of u units is selected from the population excluding s, without replacement.
c. Selection of the samples s and s1 is again the same as in (a), but instead of s2 of (a), a
sample s2(c) of u units is selected from the population excluding s1, without replacement.
Scheme (a) has been considered by many authors, such as Cochran (1963), Pathak and Rao
(1967) and Sukhatme (1970); scheme (b) is used by Kulldorff (1963) and considered by Rao
and Ghangurde (1969). Singh proposed scheme (c), which is more efficient than the
estimator given by scheme (a) but less efficient than the estimator given by scheme (b).
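The three selection schemes can be illustrated with a short sampling sketch. It makes two assumptions that the text does not spell out: the matched part s1 is taken as an SRSWOR subsample of s (so that it is common to both occasions), and under scheme (a) the unmatched part s2 is drawn from the whole population independently of s. The function name, its arguments and the toy population are hypothetical.

```python
import random


def draw_two_occasion_samples(population, n, m, u, scheme, seed=None):
    """Draw (s, s1, s2) for schemes 'a', 'b' or 'c'.

    Assumptions made for this sketch: s1 (the matched part) is an SRSWOR
    subsample of s, and under scheme 'a' the unmatched part s2 has n - m
    units drawn from the whole population independently of s.
    """
    rng = random.Random(seed)
    units = list(population)
    s = rng.sample(units, n)       # first occasion, SRSWOR
    s1 = rng.sample(s, m)          # matched units on the second occasion
    if scheme == "a":
        s2 = rng.sample(units, n - m)
    elif scheme == "b":
        s_set = set(s)
        s2 = rng.sample([x for x in units if x not in s_set], u)
    elif scheme == "c":
        s1_set = set(s1)
        s2 = rng.sample([x for x in units if x not in s1_set], u)
    else:
        raise ValueError("scheme must be 'a', 'b' or 'c'")
    return s, s1, s2


# Toy population of 100 labelled units (hypothetical).
s, s1, s2 = draw_two_occasion_samples(range(1, 101), n=20, m=8, u=12,
                                      scheme="b", seed=1)
print(sorted(s1), sorted(s2))
```

Under scheme (b) the unmatched units never overlap the first-occasion sample, while under scheme (c) they may overlap s but not the matched part s1.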
7. Application of Successive Sampling in Agriculture for Estimating the Incidence of
Pest and Diseases on the Field Crops
This work was done by T.P. Abraham, R.K. Khosla and O.P. Kathuria of the Institute of
Agricultural Research Statistics, New Delhi-12, in 1969.
Surveys to estimate the incidence of pests and diseases on field crops generally have to be
repeated because of the large variation in incidence from year to year. It is therefore of
interest to examine the partial replacement of units in such repeat surveys, especially when
keeping some of the sampling units common from one year to another is operationally
convenient. In particular we examine how far partial matching of sampling units is helpful
in obtaining a better estimate of:
1. The incidence in the second year of the survey,
2. The change in incidence from one year to the other, and
3. The overall mean incidence over the two years.
For this, a survey was conducted in Cuttack district of Orissa on the major pests of rice
(i.e. stem borer and gall fly) and the major disease, Helminthosporium oryzae. In each of
the fields, periodic observations on the various pests and diseases were taken at intervals of
about a month, up to and including harvest; the first observation was taken a month after
planting.
In each of the plots the number of plants was recorded. The number of dead hearts due to
stem borer and the number of silver shoots due to gall fly were also recorded. In the case of
Helminthosporium disease, some plants were selected, the leaves with maximum infection
were chosen, and the intensity of infection was noted by comparison with the standard
chart given by the Central Rice Research Institute, Cuttack. The manifestation of these
pests was also noted, and the field-wise average percentage incidence of pests and diseases
was worked out.
The change in incidence of stem borer and gall fly was estimated mainly for the kharif and
rabi seasons. After applying the methods of sampling on successive occasions, it was found
that the incidence of stem borer and gall fly in March (rabi season) and October (kharif
season) was higher than in any other month. It was also found that the incidence of these
pests was much higher in the kharif than in the rabi season. This shows how the sampling
scheme can be used in agricultural experiments.
References
Avadhani, M.S. and Sukhatme, B.V. (1972). Sampling on several successive occasions with
equal and unequal probabilities and without replacement. Australian Journal of
Statistics, 14, 109.
Eckler, A.R. (1955). Rotation sampling. Annals of Mathematical Statistics, 26, 664.
Avadhani, M.S. (1991). Theory of sample survey and fields of application. Patanjali
Publication.
Parimal M. (1998). Theory and methods of survey sampling. Prentice-Hall India.
Singh, D. and Chaudhary, F.S. (1997). Theory and analysis of sample survey designs. New
Age International (P) Ltd, Publishers.
Abraham, T.P., Khosla, R.K. and Kathuria, O.P. (1969). Some investigations on the use of
successive sampling in pest and diseases surveys. Journal of the Indian Society of
Agricultural Statistics, 21, 43.
Patterson, H.D. (1950). Sampling on successive occasions with partial replacement of
units. J. Roy. Stat. Soc. Ser. B, 12, 241-245.
Yates, F. (1949). Sampling methods for census and surveys. Griffin, pp. 175-182, 233-235,
260-262.
Hansen, M.H., Hurwitz, W.N. and Madow, W.G. (1953). Sample survey methods and
theory. John Wiley and Sons, vol. 1, 490-503, vol. 2, 268-279.
Singh, R. (1972). A Note on Sampling on Several Successive Occasions. Austral. J.
Statist., 14(2), 120-122.
Pathak, P.J. and Rao, T.J. (1967). Inadmissibility of customary estimators of sampling over
two occasions. Sankhya (A), 31(4), 463-472.
