Applied Geostats Intro

An introduction to applied geostatistics
Part 4 – Model validation; simulation
Overheads
D G Rossiter
Department of Earth Systems Analysis
International Institute for Geo-information Science & Earth Observation (ITC)
<http://www.itc.nl/personal/rossiter>
July 9, 2005
Model validation
With any predictive method, we would like to know how good it is. This is model
validation.
• cf. model calibration, when we are building the model
The basic idea is to compare model predictions with reality. Two main
methods:
1. Separate validation dataset
2. Cross-validation using calibration dataset
Independent validation
Simple measures of validity:
• Root mean squared error (RMSE) of the residuals: the actual vs. estimate
(from the model) in the validation dataset; lower is better:
$$\mathrm{RMSE} = \left[ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 \right]^{1/2}$$
• Bias or mean error (ME) of estimated vs. actual mean of the validation
dataset; should be zero (0)
$$\mathrm{ME} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$
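As a minimal sketch of these two measures, assuming two hypothetical vectors `actual` and `predicted` (invented values, not taken from any dataset in these notes), RMSE and ME could be computed in base R as:

```r
# Hypothetical validation data: observed values and model predictions
actual    <- c(2.0, 1.5, 3.0, 2.5)
predicted <- c(2.5, 1.0, 3.5, 3.0)

resid <- predicted - actual     # residuals: estimate minus actual
rmse  <- sqrt(mean(resid^2))    # root mean squared error; lower is better
me    <- mean(resid)            # mean error (bias); should be near zero
```

Here the predictions are too high on average, so ME is positive even though all four residuals have the same magnitude.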
Cross-validation
If we don’t have an independent data set to evaluate a model, we can use the
same sample points that were used to estimate the model to validate that same
model.
This seems a bit dubious, but with enough points, the effect of the removed point
on the model (which was estimated using that point) is minor.
N.b. this is not legitimate for non-geostatistical models, because there is no
theory of spatial correlation.
Cross–validation procedure
1. Compute experimental variogram with all sample points; model it
2. For each point
(a) Remove the point from the sample set
(b) Predict at that point using the other points and the modelled variogram
3. Summarize the deviations of the model from the actual point
Then models can be compared by their summary statistics, also by looking at
individual predictions of interest.
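The procedure can be sketched in base R without gstat. This is an illustration under stated assumptions, not the gstat implementation: the covariance model `cov_exp` and its parameters are hypothetical stand-ins for a fitted variogram model (step 1 is skipped), the helper `ok_predict` is a hand-written ordinary kriging solver, and the data are random placeholders.

```r
# Exponential covariance with nugget; parameter values are hypothetical
cov_exp <- function(h, psill = 1, a = 300, nugget = 0.1) {
  psill * exp(-h / a) + nugget * (h == 0)
}

# Ordinary kriging prediction and kriging variance at one target location
ok_predict <- function(xy, z, target) {
  n   <- nrow(xy)
  C   <- cov_exp(as.matrix(dist(xy)))                 # data-to-data covariances
  c0  <- cov_exp(sqrt(colSums((t(xy) - target)^2)))   # data-to-target covariances
  A   <- rbind(cbind(C, 1), c(rep(1, n), 0))          # kriging system with Lagrange row
  sol <- as.numeric(solve(A, c(c0, 1)))
  w   <- sol[1:n]                                     # kriging weights (sum to 1)
  mu  <- sol[n + 1]                                   # Lagrange multiplier
  list(pred = sum(w * z), var = cov_exp(0) - sum(w * c0) - mu)
}

# Placeholder sample: 30 random locations with random values
set.seed(1)
xy <- cbind(x = runif(30, 0, 1000), y = runif(30, 0, 1000))
z  <- rnorm(30)

# Step 2: remove each point in turn and predict it from the others
cv <- t(sapply(seq_len(nrow(xy)), function(i) {
  p <- ok_predict(xy[-i, , drop = FALSE], z[-i], xy[i, ])
  c(observed = z[i], pred = p$pred, var = p$var)
}))

# Step 3: summarize the deviations (observed minus predicted)
residual <- cv[, "observed"] - cv[, "pred"]
c(ME = mean(residual), RMSE = sqrt(mean(residual^2)))
```

The same covariance model is used for every left-out point, which mirrors step 1 of the procedure: the variogram is modelled once from all sample points and not re-fitted inside the loop.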
Summary statistics for cross–validation
• Root Mean Square Error (RMSE): lower is better; computed as for
independent validation
• Bias or mean error (ME): should be 0; computed as for independent validation
• Mean Squared Deviation Ratio (MSDR) of residuals with kriging variance:
should be 1
$$\mathrm{MSDR} = \frac{1}{N} \sum_{i=1}^{N} \frac{\{ z(\vec{x}_i) - \hat{z}(\vec{x}_i) \}^2}{\hat{\sigma}^2(\vec{x}_i)}$$
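A minimal sketch of the MSDR computation, assuming illustrative vectors of observed values, cross-validation predictions and kriging variances (the numbers are invented for the example). It also shows that the MSDR is the mean of the squared standardized residuals (z-scores).

```r
# Illustrative cross-validation results (hypothetical values)
observed  <- c(2.46, 2.15, 1.87, 0.96)
predicted <- c(1.48, 1.65, 1.45, 1.26)
krig_var  <- c(0.80, 0.78, 0.77, 0.81)   # kriging variance at each left-out point

residual <- observed - predicted
msdr     <- mean(residual^2 / krig_var)  # should be close to 1

# Equivalent form: mean of squared z-scores
zscore   <- residual / sqrt(krig_var)
msdr_z   <- mean(zscore^2)
```

An MSDR well above 1 suggests the kriging variances are too small for the errors actually observed; well below 1 suggests they are too large.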
Cross-validation in gstat
> # leave-one-out cross-validation; str() shows the structure of the result
> kcv <- krige.cv(log(cadmium)~1, ~x+y, meuse, model=m2); str(kcv)
‘data.frame’: 155 obs. of 8 variables:
$ x : num 181072 181025 181165 181298 181307 ...
$ y : num 333611 333558 333537 333484 333330 ...
$ var1.pred: num 1.482 1.649 1.452 1.259 0.952 ...
$ var1.var : num 0.796 0.781 0.768 0.811 0.761 ...
$ observed : Named num 2.460 2.152 1.872 0.956 1.030 ...
..- attr(*, "names")= chr "1" "2" "3" "4" ...
$ residual : num 0.9774 0.5031 0.4195 -0.3033 0.0773 ...
$ zscore : num 1.0953 0.5692 0.4786 -0.3368 0.0886 ...
$ fold : int 1 2 3 4 5 6 7 8 9 10 ...
> truehist(kcv$residual)   # truehist() is in package MASS
> # some residuals are very large, show their locations
> bubble(kcv, z="residual", fill=FALSE)
> # measures of goodness: ME = 0, MSE low, MSDR = 1
> mean(kcv$residual); mean(kcv$residual^2)
[1] 7.996923e-05
[1] 0.8644106
> mean(kcv$residual^2/kcv$var1.var)
[1] 1.129047
Residuals from cross-validation and their location
[Figure: left, density histogram of kcv$residual (about −3 to 2); right, bubble plot of the residuals at the sample locations (x from 178500 to 181500, y from 330000 to 333000), with legend values −2.787, −0.384, 0.057, 0.565, 2.424]
Spatial simulation
Simulation is the process or result of representing what reality might look like,
given a model.
In geostatistics, this reality is usually a spatial distribution (map).
What is stochastic simulation?
• “Simulation” is a general term for studying a system without physically
implementing it.
• “Stochastic” simulation means that there is a random component to the
simulation model: quantified uncertainty is included so that each simulation is
different.
• Non-spatial example: planning the number and timing of clerks in a new
branch bank; customer behaviour (arrival times, transaction length) is
stochastic and represented by probability distributions.
• Reference for spatial simulation:
Goovaerts, P., 1997. Geostatistics for natural resources evaluation. Applied
Geostatistics Series. Oxford University Press, New York; Chapter 8.
Why spatial simulation?
• Recall: the theory of regionalized variables assumes that the values we
observe come from some random process; in the simplest case, with one
expected value (first-order stationarity) with a spatially-correlated error that
is the same over the whole area (second-order stationarity).
• So we’d like to see “alternative realities”; that is, spatial patterns that, by this
theory, could have occurred in some “parallel universe”.
• In addition, kriging maps are unrealistically smooth, especially in areas
with low sampling density.
* Even if there is a high nugget effect in the variogram, this variability is not
reflected in adjacent prediction points, since they are estimated from almost
the same data.
When must simulation be used?

Goovaerts: “Smooth interpolated maps should not be used for applications
sensitive to the presence of extreme values and their patterns of continuity.”
(p. 370)
Example: ground water travel time depends on sequences of large or small
values (“critical paths”), not just on individual values.
Local uncertainty vs. spatial uncertainty
• Recall: kriging prediction also provides a prediction error; this is the best
linear unbiased predictor (BLUP) and its error for each prediction location
separately.
• So, at each prediction location we obtain a probability distribution of the
prediction, a measure of its uncertainty. This is fine for evaluating each
prediction individually.
• But, it is not valid for evaluating the set of predictions jointly! Errors are by
definition spatially-correlated (as shown by the fitted variogram model), so we
can’t simulate the error in a field by simulating the error at each point separately.
• Spatial uncertainty is a representation of the error over the entire field of
prediction locations at the same time.
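A small numerical sketch of why this matters, under simplifying assumptions: ten prediction locations on a transect, with an error covariance matrix `S` that is simply assumed to follow an exponential model with unit sill and range parameter 300 (hypothetical; in real kriging the error covariance also depends on the sample configuration). The variance of the transect mean is compared with and without the correlation between point errors.

```r
x <- seq(0, 900, by = 100)               # 10 prediction locations on a transect
S <- exp(-as.matrix(dist(x)) / 300)      # assumed covariance of the errors
n <- length(x)

# Variance of the mean over the transect, using the full covariance matrix
var_mean_joint <- sum(S) / n^2

# Same quantity if point errors are (wrongly) treated as independent
var_mean_indep <- sum(diag(S)) / n^2
```

Treating the point errors as independent understates the uncertainty of any quantity that combines several locations, which is the case spatial simulation is meant to handle.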
Practical applications of spatial simulation
• If the distribution of the target variable(s) over the study area is to be used as
input to a model, then the uncertainty is represented by a number of
simulations.
• Procedure:
1. Simulate a “large” number of realizations of the spatial field
2. Run the model on each simulation
3. Summarize the output of the different model runs
• The statistics of the output give a direct measure of the uncertainty of the
model in the light of the sample and the model of spatial variability.
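The three-step procedure can be sketched in base R as follows. Everything here is an assumption made for illustration: the covariance model, the small grid, the use of a Cholesky factor to generate unconditional zero-mean realisations (in practice conditional simulations would normally be used), and the toy function `run_model`, which stands in for whatever model consumes the simulated field.

```r
set.seed(42)

# Small hypothetical grid: 10 x 10 cells
grid <- expand.grid(x = seq(0, 900, 100), y = seq(0, 900, 100))
S    <- exp(-as.matrix(dist(grid)) / 300)   # assumed exponential covariance, sill 1
L    <- t(chol(S))                          # lower-triangular factor, S = L %*% t(L)

# 1. Simulate a "large" number of realisations (one per column)
nsim <- 200
sims <- L %*% matrix(rnorm(nrow(grid) * nsim), ncol = nsim)

# 2. Run the model on each simulation
#    (toy model: fraction of the area above a threshold)
run_model <- function(field, threshold = 1) mean(field > threshold)
out <- apply(sims, 2, run_model)

# 3. Summarize the output of the different model runs
summary_stats <- c(mean = mean(out), quantile(out, c(0.05, 0.95)))
```

The spread of `out` across realisations, here summarized by the 5% and 95% quantiles, is the measure of uncertainty in the model output.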
Unconditional simulation
In unconditional simulation, we simulate the field with no reference to the actual
sample, i.e. the data we have. (The sample is only one realisation, no more valid
than any other.)
This is mainly to visualise a random field as modelled by a variogram, not for
prediction.
What is preserved in unconditional simulation?
1. Mean over field
2. Covariance structure
Data points are not predicted exactly.
Unconditional simulation in gstat

The krige function allows the number of simulations, nsim, to be specified. For
unconditional simulation, specify no data (data=NULL) and use the
dummy=TRUE option instead.
Since there is no data with which to estimate the mean, it must be specified as
the beta parameter.
> x <- krige(log(cadmium) ~ 1, ~ x + y, data = NULL, newdata = meuse.grid,
+ model = m2, nmax = 20, nsim = 5, beta=mean(log(cadmium)), dummy = TRUE)
[using unconditional gaussian simulation]
> levelplot(z ~ x + y | name, map.to.lev(x, z=c(3:7)), aspect = mapasp(x),
+ main = "five unconditional realisations of a correlated Gaussian field")
Conditional simulation
This simulates the field, while respecting the sample. So the simulated maps look
more like the best (kriging) prediction, but usually much more spatially-variable
(depending on the magnitude of the nugget).
These are inputs into spatially-explicit models, e.g. hydrology.
What is preserved in conditional simulation?
1. Mean over field
2. Covariance structure
3. Observed data (points are predicted exactly)
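The third property can be checked with a small base R sketch. It uses "conditioning by kriging" on a one-dimensional transect, which is a different and simpler algorithm than the sequential simulation used by gstat; the covariance model (no nugget), the known mean `beta` and the three observations are all hypothetical.

```r
set.seed(7)

x_all <- seq(0, 1000, by = 50)             # 21 locations on a transect
d     <- c(3, 10, 17)                      # indices of the sampled locations
z_obs <- c(1.2, -0.4, 0.8)                 # hypothetical observations
beta  <- 0.5                               # assumed known mean (simple kriging)

S <- exp(-as.matrix(dist(x_all)) / 300)    # assumed exponential covariance, sill 1
L <- t(chol(S))                            # lower-triangular factor of S
W <- S[, d] %*% solve(S[d, d])             # simple kriging weights, all locations

# Unconditional realisation with the right mean and covariance
z_u <- beta + as.numeric(L %*% rnorm(length(x_all)))

# Condition it: add the kriged difference between data and simulated values
z_c <- z_u + as.numeric(W %*% (z_obs - z_u[d]))

# The conditional realisation reproduces the observations at the sample points
max(abs(z_c[d] - z_obs))
```

At the sampled locations the kriging weights reduce to the identity, so the correction term replaces the simulated values with the observed ones; away from the data the realisation keeps the variability of the unconditional field.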
Conditional simulation in gstat

Here the data must be named, so the dummy=TRUE option is not used. The beta
parameter may be given (usually as estimated from the data); if not it is estimated
by GLS.
> mean(log(cadmium))
[1] 0.5610659
> sims <- krige(log(cadmium) ~ 1, ~ x + y, meuse, meuse.grid,
+ model = m2, nmax = 64, nsim = 6, beta=mean(log(cadmium)))
[using conditional gaussian simulation]
> levelplot(z ~ x + y | name, map.to.lev(sims, z=c(3:8)), aspect = mapasp(sims),
+ main = "six conditional realisations of a correlated Gaussian field")
Indicator simulation
(See notes “Indicator Kriging”)
Indicator variables can also be simulated. Here the result is a 0/1 variable:
indicator false/true. This is unlike IK where the result is a probability of a 1
(indicator is true).
In gstat, the target value must be an indicator, and the indicator=TRUE argument
must be included. The mean is estimated by the proportion of true indicators in
the sample.
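> threshold <- 4
> # indicator is TRUE where cadmium is at or above the threshold
> indicator <- (cadmium >= threshold)
> # experimental indicator variogram and fitted model
> vi <- variogram(indicator ~ 1, ~x+y, meuse)
> mi.f <- fit.variogram(vi, vgm(0.09, "Gau", 500, 0.11))
> # six conditional indicator simulations; beta is the sample proportion of TRUE
> sims <- krige(indicator ~ 1, ~ x + y, meuse, meuse.grid,
+ model = mi.f,
+ nsim=6, indicator=TRUE,
+ nmax=64, beta=sum(indicator)/length(indicator))
> levelplot(z ~ x + y | name,
+ map.to.lev(sims, z=c(3:8)), aspect = mapasp(sims),
+ main = "Six conditional realisations of an indicator variable")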