:)
i=1
`
c:i)
(C
Y
1
[(X
A
.Y
1
)
)
`
i
(C
Y
1
[Y
1
)
.
34 Bueso, Angulo, Alonso
where $\lambda_i(C)$ is the $i$th eigenvalue of $C$ (eigenvalues are taken in decreasing order).
In the algorithm proposed here we start with an initial location set $S_c$ (for example, the solution obtained by the greedy algorithm) and the upper bound $UB := |C_{X_A|Y_{S_c}}|$.
Consider the initial set of active subproblems $L = \{(C_{Y_{F \cup E}}, F, E, s)\}$ and the lower bound $LB := d(C_{Y_{F \cup E}}, F, E, s)$. The algorithm consists of the following steps:
1. If $LB < UB$, remove an active subproblem $(C_{Y_{F \cup E}}, F', E', s)$ from $L$ and select $i \in E'$ as a branching index.
   (a) Consider subproblem $(C_{Y_{F \cup E}}, F', E' \setminus \{i\}, s)$.
       i. If $|F'| + |E'| - 1 > s$, append $(C_{Y_{F \cup E}}, F', E' \setminus \{i\}, s)$ to $L$ and calculate $d(C_{Y_{F \cup E}}, F', E' \setminus \{i\}, s)$.
       ii. If $|F'| + |E'| - 1 = s$, define $S := F' \cup E' \setminus \{i\}$. If $|C_{X_A|Y_S}| < UB$, replace $S_c$ with $S$ and set $UB := |C_{X_A|Y_S}|$.
   (b) Consider subproblem $(C_{Y_{F \cup E}}, F' \cup \{i\}, E' \setminus \{i\}, s)$.
       i. If $|F'| + 1 < s$, append $(C_{Y_{F \cup E}}, F' \cup \{i\}, E' \setminus \{i\}, s)$ to $L$ and calculate $d(C_{Y_{F \cup E}}, F' \cup \{i\}, E' \setminus \{i\}, s)$.
       ii. If $|F'| + 1 = s$, define $S := F' \cup \{i\}$. If $|C_{X_A|Y_S}| < UB$, replace $S_c$ with $S$ and set $UB := |C_{X_A|Y_S}|$.
   (c) Update $LB := \min_{P \in L} d(P)$.
2. Otherwise, $S_c$ is an optimal solution.
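The branching scheme in the steps above can be sketched in Python as follows. This is only an illustrative sketch under several assumptions: the names `logdet_cond` and `branch_and_bound` are hypothetical, the matrix `C` is taken to be the joint covariance of $(X_A, Y)$ indexed by position, the objective is handled on the log-determinant scale (which preserves the ordering of determinants), and the bound function $d$ of the paper is not reproduced. By default a trivial bound of minus infinity is used, so the search degenerates to full enumeration unless a valid lower bound is supplied through `bound`.

```python
import itertools
import math
import numpy as np


def logdet_cond(C, A, S):
    """log|C_{X_A|Y_S}| = log|C[A+S, A+S]| - log|C[S, S]| (Schur complement)."""
    A, S = list(A), list(S)
    joint = A + S
    ld_joint = np.linalg.slogdet(C[np.ix_(joint, joint)])[1]
    ld_obs = np.linalg.slogdet(C[np.ix_(S, S)])[1] if S else 0.0
    return ld_joint - ld_obs


def branch_and_bound(C, A, F, E, s, start, bound=None):
    """Minimise log|C_{X_A|Y_S}| over S with F in S, S in F+E and |S| = s.

    `start` is an initial feasible design (e.g. a greedy solution).
    `bound(F', E')` must return a valid lower bound for the subproblem;
    the default (an assumption of this sketch) never prunes anything.
    """
    if bound is None:
        bound = lambda Fp, Ep: -math.inf
    F, E = tuple(F), tuple(E)
    assert len(F) < s < len(F) + len(E), "initial subproblem must be non-trivial"
    best, ub = list(start), logdet_cond(C, A, start)
    active = [(bound(F, E), F, E)]
    # Every active subproblem satisfies |F'| < s < |F'| + |E'|.
    while active and min(p[0] for p in active) < ub:
        _, Fp, Ep = active.pop()
        i, rest = Ep[0], Ep[1:]          # branching index i taken from E'
        Fi = Fp + (i,)
        complete = []
        # (a) exclude site i
        if len(Fp) + len(rest) > s:
            active.append((bound(Fp, rest), Fp, rest))
        else:                            # |F'| + |E'| - 1 == s
            complete.append(Fp + rest)
        # (b) force site i into the design
        if len(Fi) < s:
            active.append((bound(Fi, rest), Fi, rest))
        else:                            # |F'| + 1 == s
            complete.append(Fi)
        for S in complete:               # update the incumbent and UB
            val = logdet_cond(C, A, S)
            if val < ub:
                best, ub = list(S), val
    return best, ub
```

For a small problem the result can be checked against brute-force enumeration over all subsets of size $s$; with a non-trivial `bound` the loop stops as soon as the smallest bound among the active subproblems reaches the incumbent value.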
3. An application to piezometric data
We illustrate the method described above by an application using piezometric data from the Vélez aquifer (Málaga, Spain), consisting of observations from 66 wells. The data were collected by the Instituto del Agua (Water Institute) at the University of Granada (Spain). The observations represent water heights in metres above sea level and are shown in Table 1. The $s_1$ and $s_2$ coordinates are in UTM (Universal Transverse Mercator). Figure 1 shows a contour-level plot of piezometric heads obtained by ordinary isotropic kriging with a linear variogram.
We assume that the random component of the piezometric random field $X$ satisfies a stochastic partial differential equation given by the following expression:
$$\frac{\partial^2 x(s)}{\partial s_1^2} + \frac{\partial^2 x(s)}{\partial s_2^2} - \alpha\, x(s) = \varepsilon(s), \qquad (10)$$
where $s = (s_1, s_2)'$ is the continuous coordinate vector, $\alpha$ is a positive parameter and $\varepsilon(s)$ is white noise with variance $\sigma_\varepsilon^2$. Equation (10) is the stochastic Laplace equation considered by Whittle (1954) for data observed on a complete grid, and by Jones (1989) and Angulo et al. (1994) for irregularly observed data (extensions of this equation have been proposed by Vecchia, 1988, and by Jones and Vecchia, 1993). A justification of the meaning and use of model (10) for the representation of piezometric head data is provided, for example, in Jones (1989).
Process $x(s)$ in equation (10) has an isotropic correlation structure in space given by (Whittle, 1954)
Optimum spatial sampling design 35
$$\rho(r) = r\sqrt{\alpha}\, K_1\!\left(r\sqrt{\alpha}\right),$$
where $r$ is the distance between points and $K_1$ is the modified Bessel function of the second kind of order 1. This equation establishes the meaning of parameter $\alpha$ with regard to spatial dependence in model (10).
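As an illustration of how this correlation function behaves, the following sketch evaluates $\rho(r) = r\sqrt{\alpha}\,K_1(r\sqrt{\alpha})$ numerically. The function names `bessel_k1` and `whittle_corr` are hypothetical, and the Bessel function is computed here from its standard integral representation $K_1(z) = \int_0^\infty e^{-z\cosh t}\cosh t\,dt$ with a trapezoidal rule, which is an implementation choice of this sketch and not the method used by the authors.

```python
import math


def bessel_k1(z, steps=4000):
    """Modified Bessel function K_1(z), z > 0, via its integral representation."""
    # Truncate where z*cosh(t) = z + 50, beyond which the integrand is negligible.
    upper = math.acosh(1.0 + 50.0 / z)
    h = upper / steps

    def f(t):
        return math.exp(-z * math.cosh(t)) * math.cosh(t)

    total = 0.5 * (f(0.0) + f(upper))
    for k in range(1, steps):
        total += f(k * h)
    return h * total


def whittle_corr(r, alpha):
    """Correlation at distance r: rho(r) = r*sqrt(alpha) * K_1(r*sqrt(alpha))."""
    z = r * math.sqrt(alpha)
    if z == 0.0:
        return 1.0          # limiting value of z*K_1(z) as z -> 0
    return z * bessel_k1(z)


if __name__ == "__main__":
    alpha = 2.8e-6          # illustrative value; distances in metres
    for r in (0.0, 100.0, 500.0, 1000.0, 3000.0):
        print(r, round(whittle_corr(r, alpha), 4))
```

The correlation equals 1 at distance zero and decreases monotonically with distance, with $1/\sqrt{\alpha}$ acting as a length scale. Where SciPy is available, `scipy.special.kv(1, z)` could be used in place of the hand-written integral.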
We consider the sample information to be given by observations at $n$ locations $s_1, \ldots, s_n$ of a process, $y(s)$, related to process $x(s)$ by the observation equation
$$y(s_i) = x(s_i) + e(s_i), \qquad i = 1, \ldots, n, \qquad (11)$$
where $e(s_i)$ ($i = 1, \ldots, n$) are the measurement errors with zero mean and variance $\sigma_e^2$. The unknown parameters are $\alpha$, $\sigma_\varepsilon^2$ and $\sigma_e^2$. We assume that $x(s)$ and $y(s)$ are Gaussian processes.
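Under the observation equation (11) and the Gaussian assumption, the covariance blocks needed for conditioning follow directly from the correlation function of the process. The sketch below builds these blocks and applies the standard Gaussian conditioning formula $C_{X|Y} = C_{XX} - C_{XY} C_{YY}^{-1} C_{YX}$. It assumes that the measurement errors are uncorrelated with each other and with the process. The function names, the exponential stand-in correlation, the prediction points and the variance values are all hypothetical choices for illustration, not quantities taken from the paper.

```python
import numpy as np


def model_covariances(pred_pts, obs_pts, corr, sigma_x2, sigma_e2):
    """Covariance blocks for x at pred_pts and y = x + e at obs_pts."""
    def R(P, Q):
        # Matrix of pairwise Euclidean distances, mapped through corr.
        d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2)
        return corr(d)

    C_xx = sigma_x2 * R(pred_pts, pred_pts)
    C_xy = sigma_x2 * R(pred_pts, obs_pts)
    # Measurement error adds sigma_e2 on the diagonal of the y-covariance.
    C_yy = sigma_x2 * R(obs_pts, obs_pts) + sigma_e2 * np.eye(len(obs_pts))
    return C_xx, C_xy, C_yy


def conditional_cov(C_xx, C_xy, C_yy):
    """Gaussian conditioning: C_{X|Y} = C_xx - C_xy C_yy^{-1} C_yx."""
    return C_xx - C_xy @ np.linalg.solve(C_yy, C_xy.T)


# Three well locations (UTM) and two hypothetical prediction points.
obs = np.array([[400713.0, 4069058.0],
                [400888.0, 4068802.0],
                [401164.0, 4068658.0]])
pred = np.array([[400900.0, 4068800.0],
                 [401000.0, 4068700.0]])
corr = lambda d: np.exp(-d / 600.0)       # stand-in correlation (assumption)
C_xx, C_xy, C_yy = model_covariances(pred, obs, corr, sigma_x2=2.0, sigma_e2=0.2)
C_cond = conditional_cov(C_xx, C_xy, C_yy)
```

The diagonal of `C_cond` gives the prediction error variances at the prediction points; they are smaller than the prior variances on the diagonal of `C_xx` whenever the observations are correlated with the process at those points.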
$s_1$ (UTM)  $s_2$ (UTM)  Water height (m above sea level)
395000 4075788 71.509
397028 4074353 44.853
396920 4074295 44.793
397028 4074318 43.334
397065 4074220 43.161
397235 4074390 41.921
397823 4073703 36.06
398108 4073688 33.845
398978 4072778 25.39
399434 4075893 47.545
399452 4075898 46.189
399359 4075228 37.063
399378 4074843 34.163
399438 4074685 32.81
399465 4074235 30.298
399540 4073320 27.369
399890 4072990 26.554
399470 4072265 23.215
399448 4072303 23.265
399640 4072320 23.117
399600 4072375 23.972
399930 4071465 16.849
399875 4070850 15.781
400423 4071088 16.069
400780 4071150 16.06
400925 4071040 17.254
400050 4070760 15.722
400710 4070098 12.976
400713 4069963 12.758
400210 4069970 12.319
400453 4069543 11.166
400458 4069448 10.476
400478 4069303 9.93
400530 4069555 11.585
400530 4069670 12.066
400578 4069333 10.093
400713 4069058 8.892
400888 4068802 7.064
401164 4068658 6.339
400865 4068540 6.27
400938 4068353 5.358
400878 4068118 4.015
400715 4068020 2.938
400978 4067983 3.191
400815 4067903 1.204
400793 4067302 0.226
400560 4067100 0.81
400726 4066843 0.043
400778 4066678 0.396
400545 4066480 2.839
400620 4066360 0.387
400753 4066213 0.268
400830 4066275 0.275
401100 4066365 0.325
401002 4065793 0.039
400790 4065330 0.051
400925 4065515 0.005
400635 4065820 0.181
400710 4065650 0.073
400662 4065490 0.205
400690 4065325 0.076
400425 4065535 0.463
400220 4065655 0.191
401645 4066125 0.254
401412 4065890 0.28
401420 4065715 0.203
Table 1. Piezometric data from 66 wells in the Vélez aquifer (Málaga, Spain)
In order to apply the method described in section 2, we need to compute the conditional covariance matrix of $x(s)$ given $y(s_i)$, $i = 1, \ldots, n$ (for $s$ in $A$), according to equation (8). We estimate the covariance matrix using the maximum-likelihood estimates obtained for the parameters from observations at the 66 wells. For the purposes of illustration, we consider reducing the pre-existing network. First, a deterministic trend is removed from the original data by fitting a quadratic surface, the residuals obtained then being considered as the values for $y(s)$ in the observation equation (11). We estimate the values of the unknown parameters using the approach in Jones (1989), which results in values of $\alpha = 0.00000280$, $\sigma_e = 0.46202$, and $\sigma_e^2/\sigma_x^2 = 0.1068$. Assuming these estimates to be true values for the parameters, a reduction of the network is performed. To that end, a FORTRAN 77 program has been developed.
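The trend-removal step can be illustrated with an ordinary least-squares fit of a quadratic surface in the two coordinates. The sketch below uses twelve of the wells listed in Table 1. The function names are hypothetical, and centring and scaling the UTM coordinates before fitting is a numerical-conditioning choice of this sketch, not something stated in the paper.

```python
import numpy as np


def quadratic_design(u, v):
    """Design matrix of a full quadratic surface in (u, v)."""
    return np.column_stack([np.ones_like(u), u, v, u**2, u * v, v**2])


def detrend_quadratic(s1, s2, z):
    """Fit a quadratic surface by least squares; return residuals and coefficients."""
    # Standardise coordinates: raw UTM values (~4e5, ~4e6) are badly conditioned.
    u = (s1 - s1.mean()) / s1.std()
    v = (s2 - s2.mean()) / s2.std()
    X = quadratic_design(u, v)
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return z - X @ beta, beta


# A subset of the wells in Table 1: s1, s2 (UTM) and water height (m).
wells = np.array([
    [395000, 4075788, 71.509], [397028, 4074353, 44.853],
    [397823, 4073703, 36.060], [398978, 4072778, 25.390],
    [399434, 4075893, 47.545], [399465, 4074235, 30.298],
    [399930, 4071465, 16.849], [400710, 4070098, 12.976],
    [400888, 4068802, 7.064],  [400793, 4067302, 0.226],
    [401100, 4066365, 0.325],  [401645, 4066125, 0.254],
], dtype=float)
resid, beta = detrend_quadratic(wells[:, 0], wells[:, 1], wells[:, 2])
```

The residuals `resid` play the role of the values $y(s_i)$ in the observation equation. Because the surface includes an intercept, they sum to zero up to rounding error.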
In all the cases studied here, a certain subregion of the aquifer domain defined by a discrete mesh (see Fig. 1) is considered as set $A$. First, we consider a sequential (optimal in each step) reduction of the network. The initial network and the prediction error standard deviations are shown in Figs 2 and 3, respectively. The resulting network after deleting 44 sites and the corresponding prediction error standard deviations are displayed in Figs 4 and 5, respectively.

Figure 1. Contour-level plot of piezometric heads in the Vélez aquifer (Málaga, Spain), and subregion of interest (coordinates are in UTM).
Figure 2. Initial network for 66 observed locations in the Vélez aquifer (Málaga, Spain).
Figure 3. Contour-level map of prediction error standard deviations for the initial network.
Figure 4. Resulting network after (sequentially) deleting 44 sites.
Figure 5. Contour-level map of prediction error standard deviations after (sequentially) deleting 44 sites.

In Fig. 6, the resulting conditional entropy (except for constant terms) and the rate of information $I(X_A, Y_{S \setminus S_d}) / I(X_A, Y_S)$ are represented with respect to the number of deleted sites. By using this ratio, we can determine the minimum number of locations that must be observed to maintain a given rate of information. For example, to retain 80% of the amount of information contained in the initial network on the region of interest, we require at least 22 locations to be observed.

Figure 6. Conditional entropy (except for constant terms) and rate of information vs. number of (sequentially) deleted sites.
Figure 7. Comparison of the conditional entropy (except for constant terms) in sequential reduction of network obtained for the entropy-based criterion and other related criteria based on alternative measurements of the covariance matrix (trace, maximum eigenvalue and maximum element of the diagonal).

Finally, in Fig. 7, we compare the entropy-based criterion with other related criteria based on alternative measurements of the covariance matrix such as the trace, the maximum eigenvalue, or the maximum element of the diagonal (see, for example, Mardia and Goodall, 1993). In our context, these criteria are respectively formulated as follows:
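$$\min_{(S_d)} \operatorname{tr} C_{X_A|Y_{S \setminus S_d}}, \qquad \min_{(S_d)} \Bigl(\max_i \lambda_i^{S \setminus S_d}\Bigr), \qquad \min_{(S_d)} \Bigl(\max_i u_i^{S \setminus S_d}\Bigr),$$
with $\lambda_i^{S \setminus S_d}$ and $u_i^{S \setminus S_d}$, for $i = 1, \ldots, \ell$, being the eigenvalues and the elements of the diagonal of $C_{X_A|Y_{S \setminus S_d}}$, respectively.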
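A sequential (one site at a time) reduction under any of these criteria, together with the rate of information, could be sketched as follows. The function names and the `CRITERIA` table are hypothetical. The entropy criterion is represented by the log-determinant of the conditional covariance matrix (that is, the conditional entropy up to constant terms), and the mutual information uses the Gaussian identity $I(X_A, Y_S) = \tfrac12\,(\log|C_{X_A}| - \log|C_{X_A|Y_S}|)$. The covariance blocks `C_xx`, `C_xy`, `C_yy` are assumed to be given.

```python
import numpy as np

# Scalar summaries of the conditional covariance matrix to be minimised.
CRITERIA = {
    "entropy": lambda M: np.linalg.slogdet(M)[1],    # log-determinant
    "trace": np.trace,
    "max_eig": lambda M: np.linalg.eigvalsh(M)[-1],  # eigvalsh sorts ascending
    "max_diag": lambda M: np.max(np.diag(M)),
}


def cond_cov(C_xx, C_xy, C_yy, S):
    """Conditional covariance of X_A given the observations at sites S."""
    S = list(S)
    if not S:
        return C_xx
    B = C_xy[:, S]
    return C_xx - B @ np.linalg.solve(C_yy[np.ix_(S, S)], B.T)


def sequential_deletion(C_xx, C_xy, C_yy, n_delete, criterion="entropy"):
    """Delete sites one by one, each time removing the least harmful site."""
    score = CRITERIA[criterion]
    kept = list(range(C_yy.shape[0]))
    deleted = []
    for _ in range(n_delete):
        j = min(kept, key=lambda k: score(
            cond_cov(C_xx, C_xy, C_yy, [t for t in kept if t != k])))
        kept.remove(j)
        deleted.append(j)
    return kept, deleted


def information_rate(C_xx, C_xy, C_yy, S):
    """I(X_A, Y_S) / I(X_A, Y_all) for Gaussian variables."""
    ld = lambda M: np.linalg.slogdet(M)[1]
    everything = range(C_yy.shape[0])
    num = ld(C_xx) - ld(cond_cov(C_xx, C_xy, C_yy, S))
    den = ld(C_xx) - ld(cond_cov(C_xx, C_xy, C_yy, everything))
    return num / den
```

Calling `sequential_deletion` repeatedly with increasing `n_delete` and evaluating `information_rate` on the retained sites gives a curve of the kind described for Fig. 6. Each step is optimal only with respect to the current network, so the final design is not guaranteed to be globally optimal.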
In the second example studied, we consider only 45 of the 66 available sites as potential locations to be observed. The non-included sites are located in the north of the aquifer, far away from the region of interest, and their influence is negligible. We force 25 predetermined sites to be in the network (see Fig. 8), completing it by addition of 15 more sites. The optimal design has been obtained using the exact algorithm presented in section 2.3. The sequential solution was taken as the starting solution for the algorithm, and in the end it turned out to be optimal. The results of this example are shown in Fig. 8.
4. Conclusion
The objective of this work is to present a methodology to design or redesign a spatial network when the underlying process of interest and the observations are defined by means of a state-space model. The main advantage of working within this framework is that in many practical situations the available data may not correspond to the variable of interest. In addition, the potentially observable locations may be different from the set of sites of interest for the variable $X$.
Figure 8. Map containing the locations: non-included sites, forced sites, non-selected sites, and selected sites in the network.
The entropy-based approach to spatial sampling design has been extensively applied in the literature. In the state-space-model context, the procedure consists in minimizing the conditional entropy. In the Gaussian case, this is equivalent to minimizing the logarithm of the determinant of a conditional covariance matrix.
A detailed formulation for extending or reducing a pre-existing network is given. When the set of locations is increased (decreased) by sequentially adding (deleting) locations, the procedure is quite simple: it only requires appropriately handling blocks of a certain covariance matrix.
When the problem is of a large dimension, the achievement of an optimal network is costly in computing time. An exact algorithm for finding an optimal design, adapting the algorithm proposed by Ko et al. (1995) to the state-space-model framework, is presented. In the example studied here the optimal solution obtained coincides with the sequential one.
The model proposed for the observed variable in the examples treated in this paper
consists in adding an observation error to the variable of interest. More complex
models may be considered as the procedure only requires, in the Gaussian case, the
covariance structure between the involved variables.
Acknowledgements
We thank the editor and the three referees for their helpful comments and suggestions,
which have significantly improved this paper.
This work has been supported in part by the Plan Nacional de I+D (Project AMB93-0932) of the Comisión Interministerial de Ciencia y Tecnología, Ministerio de Educación y Ciencia, Spain.
We are also grateful to the Instituto del Agua (Water Institute), Universidad de Granada (Spain), in particular to José L. García-Aróstegui for support in preparing the piezometric data of the Vélez aquifer and the graphics.
References
Angulo, J.M., Azari, A.S., Shumway, R.H., and Yucel, Z.T. (1994) Fourier approximations for estimation and smoothing of irregularly observed spatial processes. Stochastic and Statistical Methods in Hydrology and Environmental Engineering, 2, 353-65.
Aspie, D. and Barnes, R.J. (1990) Infill-sampling design and the cost of classification errors. Mathematical Geology, 22(8), 915-32.
Bogárdi, I., Bárdossy, A., and Duckstein, L. (1985) Multicriterion network design using geostatistics. Water Resources Research, 21(2), 199-208.
Bras, R.L. and Rodríguez-Iturbe, I. (1976) Network design for the estimation of areal mean of rainfall events. Water Resources Research, 12(6), 1185-95.
Caselton, W.F. and Hussian, T. (1980) Hydrologic networks: Information transmission. Journal of the Water Resources Planning and Management Division, A.S.C.E., 106 (WR2), 503-20.
Caselton, W.F., Kan, L., and Zidek, J.V. (1991) Quality data network designs based on entropy. In Statistics in the Environmental and Earth Sciences, P. Guttorp and A. Walden (eds), Griffin, London.
Caselton, W.F. and Zidek, J.V. (1984) Optimal monitoring network designs. Statistics and Probability Letters, 2, 223-7.
Christakos, G. (1992) Random Field Models in Earth Sciences. Academic Press, San Diego.
Cressie, N.A.C. (1991) Statistics for Spatial Data. Wiley, New York.
De Gruijter, J.J. and Ter Braak, C.J.F. (1990) Model-free estimation from spatial samples: A reappraisal of classical sampling theory. Mathematical Geology, 22(4), 407-15.
Guttorp, P., Le, N.D., Sampson, P.D., and Zidek, J.V. (1993) Using entropy in the redesign of an environmental monitoring network. In Multivariate Environmental Statistics, G.P. Patil and C.R. Rao (eds), Elsevier, New York, pp. 175-202.
Haas, T.C. (1992) Redesigning continental-scale monitoring networks. Atmospheric Environment, 26A(18), 3323-33.
Jones, R.H. (1989) Fitting a stochastic partial differential equation to aquifer data. Stochastic Hydrology and Hydraulics, 3, 85-96.
Jones, R.H. and Vecchia, A.V. (1993) Fitting continuous ARMA models to unequally spaced spatial data. Journal of the American Statistical Association, 88, 947-54.
Journel, A.G. (1994) Resampling from stochastic simulations. Environmental and Ecological Statistics, 1, 63-91.
Ko, C.-W., Lee, J., and Queyranne, M. (1995) An exact algorithm for maximum entropy sampling. Operations Research, 43, 684-91.
Mardia, K.V. and Goodall, C.R. (1993) Spatial-temporal analysis of multivariate environmental monitoring data. In Multivariate Environmental Statistics, G.P. Patil and C.R. Rao (eds), Elsevier, New York, pp. 347-86.
Samper, F.J. and Carrera, J. (1990) Geoestadística. Aplicaciones a la hidrogeología subterránea. Gráficas Torres, Barcelona.
Shannon, C.E. (1948) A mathematical theory of communication. Bell System Technical Journal, 27, 379-423.
Trujillo-Ventura, A. and Ellis, J.H. (1991) Multiobjective air pollution monitoring network design. Atmospheric Environment, 25A(2), 469-79.
Vecchia, A.V. (1988) Estimation and model identification for continuous spatial processes. Journal of the Royal Statistical Society B, 50, 292-312.
Whittle, P. (1954) On stationary processes in the plane. Biometrika, 41, 434-49.
Wu, S. and Zidek, J.V. (1992) An entropy-based analysis of data from selected NADP/NTN network sites for 1983-1986. Atmospheric Environment, 26A(11), 2089-103.
Biographical sketches
The authors are members of the Departamento de Estadística e Investigación Operativa of Universidad de Granada, Spain, and collaborate on a regular basis with the Instituto del Agua of this university on stochastic modelling and applications in hydrology, currently under project AMB93-0932, of Environment and Natural Resources Planning, of the Comisión Interministerial de Ciencia y Tecnología, Ministerio de Educación y Ciencia, Spain. José M. Angulo, who is Associate Professor, is the person responsible for the above-mentioned project, and heads the research group on space-time stochastic modelling. Francisco J. Alonso is Assistant Professor, and did his Ph.D. on estimation and prediction of spatial processes. Maria C. Bueso is Assistant Professor and her research is related to spatial sampling design problems.