Geostatistics. Many Statistical Tools Are Useful

INTRODUCTION
This book presents an introduction to the set of tools that has become
known commonly as geostatistics. Many statistical tools are useful
in developing qualitative insights into a wide variety of natural phenomena; many others can be used to develop quantitative answers
to specific questions. Unfortunately, most classical statistical methods make no use of the spatial information in earth science data sets.
Geostatistics offers a way of describing the spatial continuity that is an
essential feature of many natural phenomena and provides adaptations
of classical regression techniques to take advantage of this continuity.
The presentation of geostatistics in this book is not heavily mathematical. Few theoretical derivations or formal proofs are given; instead,
references are provided to more rigorous treatments of the material.
The reader should be able to recall basic calculus and be comfortable
with finding the minimum of a function by using the first derivative
and representing a spatial average as an integral. Matrix notation is
used in some of the later chapters since it offers a compact way of writing systems of simultaneous equations. The reader should also have
some familiarity with the statistical concepts presented in Chapters 2
and 3.
Though we have avoided mathematical formalism, the presentation
is not simplistic. The book is built around a series of case studies on
a distressingly real data set. As we soon shall see, analysis of earth
science data can be both frustrating and fraught with difficulty. We
intend to trudge through the muddy spots, stumble into the pitfalls,
and wander into some of the dead ends. Anyone who has already
tackled a geostatistical study will sympathize with us in our many
dilemmas.
Our case studies differ from those that practitioners encounter
in only one aspect: throughout our study we will have access to the
correct answers. The data set with which we perform the studies is in
fact a subset of a much larger, completely known data set. This gives
us a yardstick by which we can measure the success of several different
approaches.
A warning is appropriate here. The solutions we propose in the
various case studies are particular to the data set we use. It is not our
intention to propose these as general recipes. The hallmark of a good
geostatistical study is customization of the approach to the problem at
hand. All we intend in these studies is to cultivate an understanding of
what various geostatistical tools can do and, more importantly, what
their limitations are.
The Walker Lake Data Set

The focus of this book is a data set that was derived from a digital
elevation model of the Walker Lake area in Nevada, in the western
United States.
We will not be using the original elevation values as variables in
our case studies. The variables we do use, however, are related to the
elevation and, as we shall see, their maps exhibit features that are
related to the topographic features in Figure 1.1. For this reason, we
will be referring to specific subareas within the Walker Lake area by
the geographic names given in Figure 1.1.
The original digital elevation model contained elevations for about
2 million points on a regular grid. These elevations have been transformed to produce a data set consisting of three variables measured
at each of 78,000 points on a 260 x 300 rectangular grid. The first
two variables are continuous and their values range from zero to several thousands. The third variable is discrete and its value is either
one or two. Details on how to obtain the digital elevation model and
reproduce this data set are given in Appendix A.
We have tried to avoid writing a book that is too specific
to one field of application. For this reason the variables in the
Walker Lake data set are referred to anonymously as V, U and T. Unfortunately, a bias toward mining applications will occasionally creep
in; this reflects both the historical roots of geostatistics and the
experience of the authors. The methods discussed here, however, are
quite generally applicable to any data set in which the values are spatially continuous.

Figure 1.1 A location map of the Walker Lake area in Nevada. The small rectangle
on the outline of Nevada shows the relative location of the area within the state.
The larger rectangle shows the major topographic features within the area.

The continuous variables, V and U, could be thicknesses of a geologic horizon or the concentration of some pollutant; they could be soil
strength measurements or permeabilities; they could be rainfall measurements or the diameters of trees. The discrete variable, T, can be
viewed as a number that assigns each point to one of two possible categories; it could record some important color difference or two different
species; it could separate different rock types or different soil lithologies; it could record some chemical difference such as the presence or
absence of a particular element.
For the sake of convenience and consistency we will refer to V and
U as concentrations of some material and will give both of them units
of parts per million (ppm). We will treat T as an indicator of two
types that will be referred to as type 1 and type 2. Finally, we will
assign units of meters to our grid even though its original dimensions
are much larger than 260 x 300 m².
The Walker Lake data set consists of V, U and T measurements at
each of 78,000 points on a 1 x 1 m² grid. From this extremely dense
data set a subset of 470 sample points has been chosen to represent a
typical sample data set. To distinguish between these two data sets,
the complete set of all information for the 78,000 points is called the
exhaustive data set, while the smaller subset of 470 points is called the
sample data set.
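
The relationship between the two data sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values are synthetic random numbers rather than the actual Walker Lake values, the upper limits for V and U are arbitrary, and uniform random selection of the 470 locations is assumed purely for convenience. The names `make_exhaustive` and `draw_sample` are hypothetical.

```python
import random

N_COLS, N_ROWS = 260, 300   # grid dimensions given in the text
N_SAMPLES = 470             # size of the sample data set


def make_exhaustive(seed=0):
    """Build a synthetic stand-in for the exhaustive data set.

    Each grid node (x, y) holds a tuple (V, U, T). The values are random
    placeholders, NOT the real Walker Lake values.
    """
    rng = random.Random(seed)
    grid = {}
    for x in range(1, N_COLS + 1):
        for y in range(1, N_ROWS + 1):
            v = rng.uniform(0.0, 1500.0)   # continuous, in ppm (arbitrary range)
            u = rng.uniform(0.0, 5000.0)   # continuous, in ppm (arbitrary range)
            t = rng.choice((1, 2))         # discrete type indicator
            grid[(x, y)] = (v, u, t)
    return grid


def draw_sample(exhaustive, n=N_SAMPLES, seed=1):
    """Pick n locations at random (an assumption) and keep their values."""
    rng = random.Random(seed)
    locations = rng.sample(sorted(exhaustive), n)
    return {loc: exhaustive[loc] for loc in locations}


exhaustive = make_exhaustive()
sample = draw_sample(exhaustive)

# The exhaustive set acts as a yardstick: a statistic estimated from the
# sample can be compared with the same statistic computed from every node.
exhaustive_mean_v = sum(v for v, _, _ in exhaustive.values()) / len(exhaustive)
sample_mean_v = sum(v for v, _, _ in sample.values()) / len(sample)
print(len(exhaustive), len(sample))
print(round(exhaustive_mean_v, 1), round(sample_mean_v, 1))
```

Keeping the locations as dictionary keys makes it easy to look up the true value at any node where an estimate is later produced.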
Goals of the Case Studies

Using the 470 samples in the sample data set we will address the following problems:
1. The description of the important features of the data.
2. The estimation of an average value over a large area.
3. The estimation of an unknown value at a particular location.
4. The estimation of an average value over small areas.
5. The use of the available sampling to check the performance of an
estimation methodology.
6. The use of sample values of one variable to improve the estimation of another variable.
7. The estimation of a distribution of values over a large area.
8. The estimation of a distribution of values over small areas.
9. The estimation of a distribution of block averages.
10. The assessment of the uncertainty of our various estimates.
The first question, despite being largely qualitative, is very important. Organization and presentation is a vital step in communicating
the essential features of a large data set. In the first part of this book
we will look at descriptive tools. Univariate and bivariate description
are covered in Chapters 2 and 3. In Chapter 4 we will look at various
ways of describing the spatial features of a data set. We will then take
all of the descriptive tools from these first chapters and apply them
to the Walker Lake data sets. The exhaustive data set is analyzed in
Chapter 5 and the sample data set is examined in Chapters 6 and 7.
The remaining questions all deal with estimation, which is the topic
of the second part of the book. Using the information in the sample
data set we will estimate various unknown quantities and see how well
we have done by using the exhaustive data set to check our estimates.
Our approach to estimation, as discussed in Chapter 8, is first to consider what it is we are trying to estimate and then to adopt a method
that is suited to that particular problem. Three important considerations form the framework for our presentation of estimation in this
book. First, do we want an estimate over a large area or estimates for
specific local areas? Second, are we interested only in some average
value or in the complete distribution of values? Third, do we want our
estimates to refer to a volume of the same size as our sample data or
do we prefer to have our estimates refer to a different volume?
In Chapter 9 we will discuss why models are necessary and introduce the probabilistic models common to geostatistics. In Chapter
10 we will present two methods for estimating an average value over
a large area. We then turn to the problem of local estimation. In
Chapter 11 we will look at some nongeostatistical methods that are
commonly used for local estimation. This is followed in Chapter 12
by a presentation of the geostatistical method known as ordinary point
kriging. The adaptation of point estimation methods to handle the
problem of local block estimates is discussed in Chapter 13.
Following the discussion in Chapter 14 of the important issue of
the search strategy, we will look at cross validation in Chapter 15
and show how this procedure may be used to improve an estimation
methodology. In Chapter 16 we will address the practical problem of
modeling variograms, an issue that arises in geostatistical approaches
to estimation.
In Chapter 17 we will look at how to use related information to
improve estimation. This is a complication that commonly arises in
practice when one variable is undersampled. When we analyze the
sample data set in Chapter 6, we will see that the measurements of the
second variable, U, are missing at many sample locations. The method
of cokriging presented in Chapter 17 allows us to incorporate the more
abundant V sample values in the estimation of U, taking advantage
of the relationship between the two to improve our estimation of the
more sparsely sampled U variable.
The estimation of a complete distribution is typically of more use
in practice than is the estimation of a single average value. In many
applications one is interested not in an overall average value but in
the average value above some specified threshold. This threshold is
often some extreme value and the estimation of the distribution above
extreme values calls for different techniques than the estimation of the
overall mean. In Chapter 18 we will explore the estimation of local
and global distributions. We will present the indicator approach, one
of several advanced techniques developed specifically for the estimation
of local distributions.
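
The difference between an overall average and an average above a threshold can be seen with a small calculation. The sketch below uses a handful of made-up concentrations and a hypothetical threshold of 200 ppm; none of these numbers come from the Walker Lake data. It also codes each value as 1 if it is at or below the threshold and 0 otherwise, which is one simple way of turning a question about a distribution into a question about an average. It does not perform any spatial estimation.

```python
# Made-up concentrations in ppm; these are NOT Walker Lake values.
values = [12.0, 0.0, 85.0, 430.0, 7.0, 960.0, 55.0, 210.0]
threshold = 200.0  # hypothetical cutoff chosen for illustration

# Overall average of all values.
overall_mean = sum(values) / len(values)

# Average of only those values that exceed the threshold.
above = [v for v in values if v > threshold]
mean_above = sum(above) / len(above)

# Indicator coding: 1 if the value is at or below the threshold, else 0.
# The mean of the indicators is the proportion of values at or below it.
indicators = [1 if v <= threshold else 0 for v in values]
proportion_at_or_below = sum(indicators) / len(indicators)

print(overall_mean)            # 219.875
print(round(mean_above, 2))    # 533.33
print(proportion_at_or_below)  # 0.625
```

Here the three values above the threshold average more than twice the overall mean, so a method tuned to reproduce the overall mean says little about the upper tail.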
A further complication arises if we want our estimates to refer to
a volume different from the volume of our samples. This is commonly
referred to as the support problem and frequently occurs in practical
applications. For example, in a model of a petroleum reservoir one does
not need estimated permeabilities for core-sized volumes but rather for
much larger blocks. In a mine, one will be mining and processing volumes much larger than the volume of the samples that are typically
available for a feasibility study. In Chapter 19 we will show that the
distribution of point values is not the same as the distribution of average block values and present two methods for accounting for this
discrepancy.
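
A short numerical experiment shows why the two distributions differ. The sketch below fills a 260 x 300 grid with synthetic, positively skewed point values and averages them over non-overlapping 10 x 10 blocks; the lognormal values, the block size, and the helper name `block_averages` are all assumptions made for illustration. The synthetic values are spatially uncorrelated, unlike real earth science data, so the reduction in spread is more severe here than it would be for a spatially continuous variable.

```python
import random
import statistics


def block_averages(grid, n_rows, n_cols, block):
    """Average a 2-D list over non-overlapping block x block cells."""
    averages = []
    for r0 in range(0, n_rows, block):
        for c0 in range(0, n_cols, block):
            cell = [grid[r][c]
                    for r in range(r0, r0 + block)
                    for c in range(c0, c0 + block)]
            averages.append(sum(cell) / len(cell))
    return averages


rng = random.Random(42)
n_rows, n_cols, block = 300, 260, 10   # block size is a hypothetical choice

# Synthetic, skewed point values (NOT Walker Lake data).
points = [[rng.lognormvariate(0.0, 1.0) for _ in range(n_cols)]
          for _ in range(n_rows)]
flat = [v for row in points for v in row]
blocks = block_averages(points, n_rows, n_cols, block)

point_mean = statistics.fmean(flat)
block_mean = statistics.fmean(blocks)
point_var = statistics.pvariance(flat)
block_var = statistics.pvariance(blocks)

# The mean is unchanged by averaging over equal-sized blocks, but the
# variance of the block averages is smaller than that of the point values.
print(round(point_mean, 3), round(block_mean, 3))
print(round(point_var, 3), round(block_var, 3))
```

Because the blocks are all the same size and tile the grid exactly, the two means agree, while the extreme point values are smoothed away in the block averages.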
In Chapter 20 we will look at the assessment of uncertainty, an issue
that is typically muddied by a lack of a clear objective meaning for the
various uncertainty measures that probabilistic models can provide.
We will look at several common problems, discuss how our probabilistic
model might provide a relevant answer, and use the exhaustive data
set to check the performance of various methods.
The final chapter provides a recap of the tools discussed in the
book, recalling their strengths and their limitations. Since this book
attempts an introduction to basic methods, many advanced methods
have not been touched; however, the types of problems that require
more advanced methods are discussed and further references are given.
Before we begin exploring some basic geostatistical tools, we would
like to emphasize that the case studies used throughout the book are
presented for their educational value and not necessarily to provide a
definitive case study of the Walker Lake data set. It is our hope that
this book will enable a reader to explore new and creative combinations
of the many available tools and to improve on the rather simple studies
we have presented here.
