StatAnalysis Howarth 2010

Statistical analysis and data display at the Geochemical Prospecting
Research Centre and Applied Geochemistry Research Group,
Imperial College, London
Richard J. Howarth1,* & Robert G. Garrett2
1 Dept. of Earth Sciences, University College London, Gower Street, London WC1E 6BT, United Kingdom
2 Emeritus Scientist, Geological Survey of Canada, 601 Booth St., Ottawa, Ontario, K1A 0E8, Canada
*Corresponding author: (e-mail: r.howarth@ucl.ac.uk)
ABSTRACT: The Imperial College of Science and Technology, a constituent college
of the University of London in the 1960s, had the good fortune to be one of the first
colleges in the United Kingdom to have access to digital computing facilities. This
review traces the history of the application of computing in the Geochemical
Prospecting Research Centre and its successor, the Applied Geochemistry Research
Group, as computing moved from being a frontier research area to becoming a
commonplace tool. The three principal areas in which it was involved comprised: the
quality control, and thereby assurance, of analytical data; the production of
pioneering atlases of regional geochemical variation in Northern Ireland (1973) and
England and Wales (1978); and the application of methods introduced by workers
in pattern-recognition and statistics to the interpretation of land-based and marine
regional geochemical data.
KEYWORDS: computers, computing, applied geochemistry, history of geochemistry, history of
statistics, history of cartography, regional mapping, spatial filters, geochemical atlas, SC4020,
LGP2703, multi-element maps, data transformation, factor analysis, cluster analysis, discriminant
analysis, ridge regression, Kleiner-Hartigan trees, robust statistics, quality assurance
The Geochemical Prospecting Research Centre (GPRC) was
established in 1954, under the direction of Professor John
Stuart Webb (1920–2007), in the Mining Geology section of
the Royal School of Mines (RSM), Imperial College of Science
and Technology (ICST), London. Initial studies were concerned with mineral prospecting using soil and drainage sampling in Northern Rhodesia (Zambia), Uganda, Sierra Leone,
Bechuanaland (Botswana), Tanganyika (Tanzania), British
North Borneo (Sabah, East Malaysia), Burma (Myanmar) and
the Federation of Malaya (West Malaysia), and extended in the
1960s to Southern Rhodesia (Zimbabwe), the Philippine
Republic, Borneo (now divided between Malaysia and
Indonesia), Fiji, East Africa, Australia, and the United
Kingdom. By 1960, its studies had broadened into regional
geochemistry, based on the analysis of stream sediments. In
1963, Webb initiated the first of a series of investigations
concerning the relationship between regional geochemistry and
agricultural problems in livestock in Eire (Webb 1964; Webb &
Atkinson 1965). The application of geochemistry to marine
mineral exploration began in 1964 (Tooms 1967). Consequently, by 1963, the Centre's name was changed to the
Applied Geochemistry Research Group (AGRG) to reflect the
increasing breadth of its applications.
The work of the GPRC and AGRG was underpinned by
developments in two complementary spheres: methods and
instrumentation for chemical analysis (discussed in the paper by
Michael Thompson (2010)) and computing (Fig. 1). The latter
facilitated: (i) statistical quality-assurance in the analytical laboratory; (ii) the display of large, multi-element, data sets in map
form; and (iii) the interpretation of such multi-element data
sets.
Fig. 1. Annual numbers of GPRC/AGRG publications (total n=76) and theses (n=24) with a substantial computing and/or statistical content over the years 1954–88.
Geochemistry: Exploration, Environment, Analysis, Vol. 10 2010, pp. 289–315
DOI 10.1144/1467-7873/09-238
First steps
Many of the early studies undertaken by research students in
the GPRC included simple, manually-based, statistical analyses.
1467-7873/10/$15.00 2010 AAG/Geological Society of London
The situation in the early 1960s is summarized in Hawkes &
Webb (1962). The use of histograms to display the frequency
distributions of element concentrations was commonplace,
while probability plots of cumulative frequency distributions
were less frequently prepared. In both cases, the data for the
requisite plots were compiled by hand through the preparation
of tally tables. At that time, analytical quality assurance was
based on the use of statistical series samples. These were a
series of synthetic samples (each of which was composed of
known proportions of two natural end-members, one having a
low concentration of the element of interest, the other a high
concentration) which were included in analytical batches following the procedure developed by the ex-RSM geologist and
chemical engineer, Charles Alex Urton Craven (1918–93), with
advice from Professor George Alfred Barnard (1915–2002) of
the Mathematics Department, ICST (Craven 1954), in order to
estimate analytical accuracy and precision. For the large
amounts of photographic-plate spectrographic data generated
at the GPRC, bins for data concentration-ranges were selected
(because of a tendency of operators to unconsciously interpolate values which were biased towards those of the analytical
standards used), using a logarithmic concentration scale, and
bin boundaries were placed mid-way between the known
concentrations of the geochemical standards. A tick (the tally)
was placed in the appropriate bin for each analysis falling in that
range, every fifth count being drawn as a horizontal line
through the previous four ticks. This facilitated counting the
total numbers of analytical results falling into each bin. A book,
widely used by students at the time, was Moroney's (1960)
Facts from Figures, which gave formulae for the calculation of
means and standard deviations from such grouped data, as
accumulated in the tally tables.
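The grouped-data calculation described above can be sketched as follows. This is a minimal illustration only: the bin midpoints and tallies are hypothetical, the function name `grouped_mean_sd` is ours, and the use of the n − 1 divisor for the standard deviation is an assumption rather than a statement of the formula printed in Moroney (1960).

```python
import math

def grouped_mean_sd(midpoints, counts):
    """Mean and standard deviation from grouped (tallied) data.

    Each bin is represented by its midpoint, weighted by its tally.
    The n - 1 divisor is an assumption made for this sketch.
    """
    n = sum(counts)
    mean = sum(m * c for m, c in zip(midpoints, counts)) / n
    sum_sq = sum(c * (m - mean) ** 2 for m, c in zip(midpoints, counts))
    return mean, math.sqrt(sum_sq / (n - 1))

# Hypothetical tally table on a logarithmic concentration scale:
# bin midpoints in log10(ppm) and the number of ticks in each bin.
log_midpoints = [1.0, 1.5, 2.0, 2.5]
tallies = [4, 10, 5, 1]

mean_log, sd_log = grouped_mean_sd(log_midpoints, tallies)
geometric_mean_ppm = 10 ** mean_log  # back-transform to concentration units
```

Because the bins here are defined on a logarithmic scale, the back-transformed mean is a geometric mean of the concentrations rather than an arithmetic one.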
For those more interested in statistical analysis, Dixon &
Massey (1957) was the text of choice. However, in the early- and mid-1960s, textbooks written by geologist and statistician
co-authors started to be published on the topics of statistical
data analysis and modelling, e.g. Miller & Kahn (1962) and
Krumbein & Graybill (1965), and these, together with a
growing number of research papers, did much to expose
students to the possibilities of the application of mathematics
and statistics to applied geochemical problems. In the early
1960s such computations were carried out by means of tables
of logarithms and a six-inch (15 cm) slide-rule, with which
students were as adept as today's are with pocket calculators.
To assist in the calculations (based on a linear regression
model) required by Craven's (1954) method of estimation of
analytical accuracy and precision, preprinted work-sheets were
used; one simply followed the steps and the results were arrived
at: very much a 'black box' approach. In order to meet the
requirements of normality of residual errors in the regression
modelling, and homogeneity of variance when the concentration levels in the statistical series samples spanned over an
order-of-magnitude, it was desirable to carry out these calculations following a logarithmic transformation. This was the
subject of an MSc thesis by Stern (1959), but the routine
application of his method was computationally complex, and
essentially impractical for routine application, even using the
Monroe electro-mechanical calculator available in the GPRC.
Stern's supervisor in the Department of Mathematics, Dr. G.
M. Jenkins (1933–82), who later became an expert in time-series analysis and systems engineering, appears to have begun
work on remedying the deficiencies he recognised in Stern's
approach, in an unpublished manuscript 'A statistical problem
in geochemical prospecting' (1959?, recently found in old
AGRG files). In 1970, an ex-member of the GPRC staff,
Clifford (Cliff) Henry James (1931–2003), published a version
of Craven's method still adapted to hand-calculation, on the
grounds that 'one of the difficulties of the method as originally
described is that the calculations involved require a computer
or an electronic calculator with a memory unit . . . many
laboratories do not possess these facilities' (James 1970, B88).
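Neither Craven's work-sheets nor Stern's solution is reproduced in this review, so the following is not their procedure. It is only a generic sketch of the underlying idea stated above: a linear regression of measured on expected values for the statistical series samples, carried out after a logarithmic transformation. The function name `fit_line`, the example concentrations, and the reading of slope and intercept as indicators of bias and of residual scatter as an indicator of precision are all assumptions made for illustration.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (slope, intercept, residual standard deviation).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, resid_sd

# Hypothetical statistical series: expected concentrations (ppm) from the
# known mixing proportions, and the values reported by the laboratory.
expected_ppm = [10, 20, 40, 80, 160]
measured_ppm = [11, 19, 42, 85, 150]

# Work in log10 units, as the text recommends when the series spans
# more than an order of magnitude.
log_expected = [math.log10(v) for v in expected_ppm]
log_measured = [math.log10(v) for v in measured_ppm]
slope, intercept, resid_sd = fit_line(log_expected, log_measured)
```

In this sketch a slope near 1 and an intercept near 0 would indicate little systematic bias, while the residual standard deviation gives a rough measure of scatter in log units.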
REGIONAL MAPPING
Following extensive fieldwork over several thousand square
miles of Africa in the mid-1950s by Webb, Tooms and their
students, it became apparent that there was considerable scope
for regional geochemical surveys based on drainage reconnaissance surveys. By 1960, this hypothesis was confirmed through
further studies in what was then Northern Rhodesia, elsewhere
in Africa, and in S.E. Asia. In 1960, a suite of drainage samples
collected for a base metal drainage reconnaissance survey over
3000 mi² (7770 km²) of the Livingstone–Namwala Concession
area, Zambia, were made available to the GPRC by Namwala
Concessions Ltd. These were analysed spectrographically and
chemically for 17 elements in 1961–62. Following a study of the
association between trace element concentrations in the drainage
materials and the geology (Harden 1962), it was apparent that
the <80-mesh (<177 µm) size fraction of the sediment would be
sufficiently representative. Regional maps for 10 elements (Webb
et al. 1964a), accompanied by a geochemical interpretation
(Webb et al. 1964b), were subsequently published (Fig. 2) and, at
the time, were the most comprehensive of their kind.
Computing arrives
In early 1964, an IBM 7090/1401 installation arrived at
Imperial College and was set up in the Electrical Engineering
building, just behind the Royal School of Mines. The story was
that these machines, at that time considered to be at the leading
edge, had been in one of Britain's Atomic Weapons Research
Establishments (AWREs), and for tax purposes IBM donated
them to Imperial College rather than remove them from the
country. The number cruncher was the transistorised IBM
7090, one of the fastest machines of its day and now regarded
as the classic mainframe computer because of its architecture,
performance and financial success (Ceruzzi 1998). It had 32K
words of core memory (equivalent to c. 150 Kbytes today); a
36-bit word-length, which suited it to the accuracy required in
scientific computation; and a large memory-swapping magnetic
drum. The small IBM 1401 computer that accompanied it was
for input of programs or data from decks of 80-column
Hollerith punched cards, copying the input stream to a reel of
magnetic tape for transfer to the mainframe and output to
132-character wide lineprinters. The facility became available
for general use in the summer of 1964. Intending users had to
undertake a brief training course in the FORTRAN IV programming language and the IBSYS operating system, and had
also to become familiar with the keypunch machine used to
prepare the punched cards. Special control cards, containing
specific operating system codes punched into particular columns of the cards, were inserted into the card deck for a job
and informed the computer that what followed was the
FORTRAN program, data, or the start or end of the job, etc.
Fig. 2. Portion of a point-symbol map of the concentration of cold-extractable copper (ppm) in the <80-mesh stream sediment fraction over the Namwala concession area, Zambia. Concentration isolines (red) were interpolated by eye and hand-drawn. (from Webb et al. 1964a, Map I; original is 30 × 91 cm in size).
For his Ph.D. project, Garrett was having to deal with
13-element regional geochemical data from three schist belts in
Sierra Leone, each with up to 1300 sample sites and, by 1964,
he had become totally frustrated with tally sheets. So, courses
were taken, cards were punched, and software to plot histograms was written and punched onto cards. That first GPRC
program (Garrett 1966, 181–190) drew histograms using the
lineprinter's output characters, computed summary statistics
(with or without a logarithmic transform), and calculated the
inter-element correlation coefficients for the data and their
statistical significance using Student's [W.S. Gosset (1876–1937)] t-test (Student 1908), all as described in Moroney's
Facts from Figures. The availability of this program was well
worth the investment in time as, finally, the tally sheets were
banished from the GPRC (together with the frustration of
never appearing to have exactly the same number of analyses
for each element!).
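The two main tasks of such a program, a character-based histogram and a test of the significance of a correlation coefficient, can be sketched in a few lines. This is a modern illustration, not a reconstruction of Garrett's FORTRAN IV code; the function names, bar width and bin edges are hypothetical. The statistic used is the standard t = r√((n − 2)/(1 − r²)) with n − 2 degrees of freedom.

```python
import math

def bin_counts(values, edges):
    """Count values per bin; bins are [lower, upper), the last one closed."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for v in values:
        for k in range(len(counts)):
            upper_ok = v < edges[k + 1] or (k == last and v == edges[-1])
            if edges[k] <= v and upper_ok:
                counts[k] += 1
                break
    return counts

def text_histogram(values, edges, width=30):
    """Return one printable line per bin, lineprinter style."""
    counts = bin_counts(values, edges)
    peak = max(counts) or 1
    lines = []
    for k, c in enumerate(counts):
        bar = "*" * round(width * c / peak)
        lines.append(f"{edges[k]:>7g} to {edges[k + 1]:<7g}|{bar} {c}")
    return lines

def correlation_t(x, y):
    """Pearson r and its t statistic (n - 2 degrees of freedom).

    Undefined when |r| = 1, which this sketch does not guard against.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t

for line in text_histogram([1, 2, 2, 3, 10], [0, 2, 4, 10]):
    print(line)
```

For a logarithmic transform, the values and bin edges would simply be converted with `math.log10` before calling either function.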
The histograms were redrafted for incorporation into the
regional geochemical maps, which at that time were still plotted
by hand, using a standard set of circular symbols of gradually
increasing size and visual density to represent the geochemical
concentration levels (Fig. 2). Sheets of such symbols were later
produced commercially (Letraset Sheet S3589) and were
widely used within AGRG.
Trend-surfaces and rolling means
The advent of computing power was truly a mind-expanding
event: finally, mathematical and statistical approaches, hitherto
computationally intractable, were made possible. Encouraged
by statistical advice from Colin James Dixon (1933–2006) of
the Mining Geology section of the Department and Dr. Dennis
J.G. Farlie of the Mathematics Department, software was
written to provide Stern's (1959) solution to Craven's (1954)
statistical series calculations (Garrett 1966, 176–180), and the
work of Miller (1956), Krumbein (1959), Allen & Krumbein
(1962), and Whitten (1959, 1963) on trend-surface analysis (a
method of fitting linear, quadratic, cubic or higher polynomial
surfaces by regression analysis to spatially-distributed data) was
investigated and applied to the ultra-low density basement
survey of Sierra Leone undertaken in 1964 (Garrett, 1966,
121–125, 127–132; Garrett & Nichol 1967). Whitten's (1963)
trend-surface program was adapted to undertake the surface-fitting on the basis of log-transformed concentration values,
reconverting to natural numbers for output of the lineprinter
contour maps. The boundaries between the bands of symbols
were subsequently redrawn by hand to produce the final maps
(Fig. 3). The same approach was used by Hazelhoff-Roelfzema
(1968) in his study of detrital cassiterite distribution in Mount's
Bay, Cornwall.
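The trend-surface approach described above, polynomial regression on log-transformed concentrations with back-transformation for output, can be sketched as follows. This is a minimal illustration, not Whitten's (1963) program: the function names are hypothetical, and solving the normal equations by Gaussian elimination is simply the most compact choice for a self-contained example.

```python
import math

def poly_terms(x, y, degree):
    """All terms x**i * y**j with i + j <= degree (degree 1: 1, y, x)."""
    return [x ** i * y ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]

def solve(a, b):
    """Solve a * coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs

def fit_trend_surface(points, degree=1):
    """Least-squares polynomial surface fitted to log10(concentration).

    points: iterable of (x, y, concentration) with concentration > 0.
    """
    rows = [poly_terms(x, y, degree) for x, y, _ in points]
    logs = [math.log10(conc) for _, _, conc in points]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atz = [sum(r[i] * z for r, z in zip(rows, logs)) for i in range(p)]
    return solve(ata, atz)

def predict_ppm(coeffs, x, y, degree=1):
    """Evaluate the surface and reconvert to natural numbers."""
    return 10 ** sum(c * t for c, t in zip(coeffs, poly_terms(x, y, degree)))

# Hypothetical survey with an exact planar trend in log10 units.
survey = [(x, y, 10 ** (1 + 0.1 * x + 0.2 * y))
          for x in range(3) for y in range(3)]
coeffs = fit_trend_surface(survey, degree=1)
```

The normal equations become poorly conditioned as the polynomial degree rises, so the results of higher-degree fits are sensitive to the arithmetic precision available.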
It is easy to forget that at this time, although computer-contouring packages (such as those of McCue (1963) and McCue
& DuPrie (1965), based on values falling at the intersections of a
square grid; and of Shepard (1968) for irregularly-spaced data),
were being written for the IBM 7094, suitable plotters, other than
the Stromberg-Carlson (SC) 4020 Microfilm Recorder used by
McCue at North American Aviation Inc., were not yet widely
available and lineprinter output was more commonly used
(Shepard 1968). Even so, fast, computationally-efficient, solutions for the contouring of irregularly-spaced data sets of more
than a few hundred points did not come into use until the late
1970s (e.g. Akima 1978a, b; 1979).
In 1965 Garrett wrote a FORTRAN IV program (Garrett
1966, 191–198) to compute rolling means (Berthorsson &
Doos 1955), a term then used for moving-averages applied in a
2-dimensional spatial context, and the results were found to
compare favourably with the polynomial surface-fitting
approach (Garrett 1966, 136–139; Nichol et al. 1969) (Fig. 3).
Residuals from the computed surfaces were used as an indication of potentially anomalous element concentrations. Garrett
(1966) also plotted moving standard deviation surfaces, and
Khaleelee (1969, 484–504) extended the program to map the
ratio of within-cell variance to total variance, both as indicators
of the spatial geochemical variability.
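A rolling mean in the two-dimensional sense used here, together with the residuals taken as indicators of potentially anomalous values, can be sketched as below. The sketch assumes the data have already been averaged onto a regular grid, with `None` marking empty cells; whether Garrett's program worked from gridded values or directly from sample coordinates is not stated in this review, and the function names and window size are hypothetical.

```python
def rolling_mean(grid, half_width=1):
    """Moving average over a (2 * half_width + 1)-cell square window.

    Empty cells (None) are ignored; the window is truncated at map edges.
    """
    nrows, ncols = len(grid), len(grid[0])
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            window = [grid[i][j]
                      for i in range(max(0, r - half_width),
                                     min(nrows, r + half_width + 1))
                      for j in range(max(0, c - half_width),
                                     min(ncols, c + half_width + 1))
                      if grid[i][j] is not None]
            if window:
                out[r][c] = sum(window) / len(window)
    return out

def residuals(grid, surface):
    """Observed minus smoothed value for every cell that has data."""
    return [[None if v is None or s is None else v - s
             for v, s in zip(grid_row, surface_row)]
            for grid_row, surface_row in zip(grid, surface)]

# Hypothetical 3 x 3 map with one high value at the centre.
cells = [[1, 1, 1],
         [1, 10, 1],
         [1, 1, 1]]
smooth = rolling_mean(cells)
resid = residuals(cells, smooth)
```

A moving standard deviation surface of the kind Garrett (1966) also plotted would follow the same pattern, replacing the window mean with the window standard deviation.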
The first computer-plotted regional maps

Experiments were also made by Garrett with directly plotting
the point-source geochemical map data onto 35-mm film using
the SC 4020 plotter at AWRE Aldermaston, Berkshire. Tapes
were first written on the ICST computer for each element, with
easting- and northing-coordinates and a symbol number corresponding to the element concentration level at each sample
location. These were then taken to the computer center at
Aldermaston, where a FORTRAN IV program had been
written by Dr. J.G.T. Jones to plot the data using the standard
AGRG set of graded circular symbols to represent increasing
concentration classes (Fig. 4; Nichol et al. 1966a, b). The SC
4020 took high-resolution photographs of plots displayed on a
special cathode-ray tube, the Charactron Tube (developed by
Convair in 1948), in which the cathode-ray beam first passes
through a mask within the tube, taking on the appearance of a
designated character or symbol, before plotting on the tube
screen (which had a 1024 × 1024 raster resolution) at the
correct spatial location. This enabled complete plot symbols to
be instantaneously displayed at high speed, rather than requiring
them to be drawn individually. A glass plate carrying additional
artwork could be placed between the tube face and the camera,
so that standard backgrounds (e.g. a drainage network and
other geographic information) could be added at no additional
cost in computer time. The final image could be permanently
recorded on 16-mm film, 35-mm film, or as 8 × 8 in (20 × 20
cm) square images on a roll of sensitised vellum paper exposed
in a 9.5 in (24 cm) camera and wet-developed.
In 1963, Webb had initiated the first of a series of studies
investigating the relationship between regional geochemistry
and agricultural problems in livestock in Eire (Webb 1964;
Webb & Atkinson 1965) and, in 1965, the counties of Devonshire, Denbighshire and Derbyshire in England, based on
stream sediment surveys. The latter three studies comprised the
first regional-scale geochemical surveys to be undertaken in
Britain. Atlases with 1:0.25M scale maps in colour were
produced, based on data sets of c. 1200 samples in each case
(Nichol et al. 1970a, b; 1971).
Khaleelee, for his PhD project, was working on the Devonshire, Denbighshire and Derbyshire data sets to produce the
atlases, and interpret the data. He investigated (Khaleelee 1969,
104–122) the feasibility of posting sample values using an
offline California Computer Products Inc. CALCOMP 563
plotter which plots using a pen onto a continuous roll of paper
31 in (79 cm) wide; both chart drum and the pen are capable of
bi-directional movement. In practice, the limiting factor when
plotting large numbers of points was the number of plotting
instructions that could be written to a single magnetic tape. As
a reminder of how far things have come since these early days,
Figure 5 shows the time required to compute the plotting
instructions, and to actually draw the map, for a set of 25 × 25
in (64 × 64 cm) maps requiring 4 or 5 numerical characters for
the posted value at each sample location for a set of 100–900
data points with their spatial coordinates drawn from a uniform
random distribution. It was clear that for regional map production, use of the AWRE SC 4020 plotter would again be
essential. In addition to the production of working maps using
the AGRG symbols, as before (Fig. 6; Nichol & Webb 1967),
for the atlas maps, the plotting program was modified by
Khaleelee (1969) to produce separate solid point-symbol
images for each concentration-class level. These were then
combined, using offset-lithography printing, to make maps with
coloured point symbols overprinted on a map showing the
geology and drainage network (Fig. 7). Khaleelee (1969, 113–125)
describes the difficulties he encountered in practice in
making these maps, caused by the non-uniform stretch of the
SC 4020 vellum prints: shrinkage across the width of the paper
plus a slight shrinkage (or stretch) along the length of the
paper, caused by slight variations in the tension maintained by
the rollers which wound the vellum through the camera; in
addition, there was insufficient contrast between the darker plot
symbols and the white vellum for subsequent photographic
reproduction, and sometimes uneven development of some of
the vellum plates as a result of variations in the distribution of
the developer fluid. Two of the map areas (Denbighshire and
Devonshire) were rectangular and these required two vellum
plates each for coverage; the variable stretch frequently produced slight differences where they abutted. Consequently,
although the entire data processing was accomplished in 15 min
of IBM 7094 time on the AWRE installation to produce 236
vellum plates, a further 100 man-hours were required to
produce the final camera-ready copy for the offset-litho plates
(Khaleelee 1969, 121).
Fig. 3. Comparison of (left) cubic trend-surface and (right) rolling mean (moving-average) maps for nickel concentrations (ppm) in bedrock, soil, and the <80-mesh stream sediment fraction over eastern Sierra Leone. Percentage values in each map are a measure of goodness of fit. Nichol & Webb (1967, fig. 4; redrawn from Garrett 1966).
Fig. 4. Prototype (1965) SC4020-plotted geochemical map (Nichol et al. 1966, fig. 1).
For the production of synoptic regional maps, moving-average smoothing became the standard approach, as it effectively
reduced the inevitable variability (noise) introduced as a result of
field sampling and subsequent chemical analysis (Armour-Brown
& Nichol 1970; Howarth & Lowenstein 1971, 1972). The success
of the Devonshire, Denbighshire and Derbyshire pilot studies led
on to the production of the pioneering regional atlases of the
stream sediment geochemistry.
Geochemical atlases of Northern Ireland and England and Wales
Whereas earlier data sets had been of the order of a few
hundred to c. 1200 samples in size, the Northern Ireland (Webb
et al. 1973) and England and Wales (Webb et al. 1978)
geochemical atlases involved the production and interpretation
of data sets comprising 18 elements and c. 4 800 samples, and
22 elements and c. 50 000 samples, respectively. This increase in
scope necessitated the development of new approaches to both
analytical quality assurance (discussed below) and map display.
In both cases, the majority of elements in the <80-mesh
fraction of the oven-dried sediment samples were determined
using an ARL 29000B Quantometer (a 40-channel automatic
optical emission spectrometer) linked to an automatic typewriter and an IBM 545 card punch via a Solartron analogue
computer. Arsenic and molybdenum were determined using
rapid chemical procedures and these data were transcribed,
together with the field data records, to punched cards by hand.
As a result of the necessity to produce many large-scale working
maps, and smoothed synoptic maps at 1:1M and 1:2M scales,
rapidly and cheaply, it was decided to use lineprinter maps with
multiple-character overprints to produce a grey-scale image
with up to 10 classes (Howarth 1971a).
Fig. 5. (a) IBM7090 computation time (min) and (b) Calcomp pen plotter time (min) to plot maps of values at 100 to 1000 uniformly randomly distributed locations, in 1969. Redrawn from Khaleelee (1969, figs 17, 18).
In 1968, the Imperial College IBM 7094/1401 installation
had been replaced by a Control Data Corporation (CDC) 6400,
and a CDC 6600 supercomputer formed the nucleus of the
London University Computer Centre (a courier service transmitted card decks, tapes and lineprinter output between the two
establishments). These machines were extremely fast and,
despite precautions taken by the more careful programmers to
prevent numerical truncation and round-off errors in programs
written for the earlier generation of computers, the 60-bit
word-length of the CDC machine brought about great
improvement in numerical accuracy of calculations, because it
enabled much larger exponents for a numerical value to be
held. (This was strikingly illustrated when, as a class demonstration in 1968, a colleague in the ICST Geology Department,
John Ferguson, happened to run a widely-used trend-surface
fitting program on the CDC 6400 using the same data set with
which he had run it on the IBM machine the previous year: the
topography of the contours of the newly-fitted 3rd degree
surface was surprisingly different from its predecessor! Related
computation problems were recognized by Mancey (1980, 155)
as being present in the results of early factor analysis studies in
the literature.)
The program for production of the AGRG lineprinter
geochemical maps, PLTLP1 (Howarth 1971a), was first written
in 1969. Its objective was to be able to process a data set of
unlimited size by proceeding in a line-by-line manner from the
top to bottom of the map area, but only retaining in memory
the data required for calculation of the current strip of
map-cells (each corresponding to the 1/8 in by 1/10 in
(0.32 × 0.25 cm) printer character-size) and its immediate
neighbours:
i averaging any data points falling into the individual map cells;
ii computing the local moving-average smoothing;
iii gap-filling, i.e. filling-in small holes in the smoothed image
while preserving large ones corresponding to unsampled areas,
such as areas underlain by calcareous rocks where there was
little drainage, major bodies of inland water, or offshore of
the coastline.
Hitherto, regional geochemical maps tended to
have a small number of classes, with bounds centered on
logarithmically-increasing concentration intervals. Experimentation with the atlas data sets gave convincing evidence that,
where the concentration range and analytical resolution of the
data values were suitable, the use of 10 classes, with lower
bounds corresponding to the 0, 10, 20, 30, 40, 50, 60, 70, 80
and 90th concentration percentiles (Fig. 8; Webb et al. 1973); or
the 0, 10, 20, 40, 60, 80, 90, 95, 99 and 99.9th percentiles
(Webb et al. 1978) were very effective at portraying regional
variation, the latter being better-suited for elements where the
display of more anomalous values was of interest. Although
percentile-based classes have now been adopted as a standard
method, at that time it was felt that many users of these atlases
might find them too unfamiliar, so only a few examples of these
new-style maps were included in the Northern Ireland (Webb et
al. 1973) and England and Wales (Webb et al. 1978) atlases, in
addition to a set based on the conventional approach using
logarithmic class boundaries.
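The percentile-based class schemes quoted above can be expressed compactly. In the sketch below the two lists of lower-bound percentiles are taken from the text, while the function names and the simple rank-based rule used to locate each percentile are assumptions for illustration; the rule actually used in the atlas programs is not described in this review.

```python
from bisect import bisect_right

# Lower-bound percentiles for the ten map classes, as quoted in the text.
DECILE_CLASSES = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
ANOMALY_CLASSES = [0, 10, 20, 40, 60, 80, 90, 95, 99, 99.9]

def percentile_bounds(values, percentiles):
    """Lower concentration bound of each class.

    Uses a simple rank rule (an assumption for this sketch): the bound for
    percentile p is the sorted value at index floor(p * n / 100).
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, int(p * n / 100))] for p in percentiles]

def classify(value, bounds):
    """Class index (0 = lowest) of the last lower bound not exceeding value."""
    return max(0, bisect_right(bounds, value) - 1)

# Hypothetical data set: concentrations 1..100 ppm, one sample each.
concentrations = list(range(1, 101))
bounds = percentile_bounds(concentrations, DECILE_CLASSES)
grey_levels = [classify(v, bounds) for v in concentrations]
```

With the decile scheme each class holds roughly the same number of samples, which is why the grey-levels are used evenly; where many samples share one value (for example, at an analytical detection limit), adjacent bounds can coincide and some classes remain empty.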
It was obvious that the quality of the grey-tones of the
lineprinter maps left much to be desired for atlas production.
Fortunately, with the assistance of Peter Ferrer of Seiscom Ltd.,
Sevenoaks, Kent, the production of suitable true grey-scale
images was made possible by using their Dresser LGP 2703
Laser Graphic Plotter system: an LGP 2000 plotter, driven by
an on-line Raytheon 703 computer. This was normally used for
the production of seismic cross-sections and could produce
positive 16-level grey-scale images up to 40 in (101 cm) across,
on dimensionally-stable Kodak 2496 RAR film. The plotting
instructions for the LGP 2703 were generated by an off-line IBM
360 computer from tapes produced at the ICST, using a modified
version of the AGRG plotting program. The rectangular
lineprinter-size cells were retained for the Northern Ireland atlas
(Fig. 9; Webb et al. 1973), but for the England and Wales atlas
(Webb et al. 1978) the program was modified to produce plots
with square cells at the appropriate map-scale. The laser-plotted
map images were then contact-printed, using a suitable half-tone
screen, onto dimensionally-stable ammonia-developed Ozalid
sepia diazo pro-film for production of the final offset-lithography
plates for the atlases (Howarth & Lowenstein 1974). Figure 10
shows the 1976 Chromalin proofs of the unsmoothed and
smoothed gap-filled percentile-based maps for zinc from the
England and Wales atlas (Webb et al. 1978).
Fig. 6. Prototype SC4020-plotted map (Dec. 1965) for zinc concentrations (ppm) in the <80-mesh stream sediment fraction over Denbighshire, UK (original is 29.5 × 22 cm in size).
Multi-element maps
Experimentation into methods of map production had also
been undertaken by the AGRG in collaboration with the
Natural Environment Research Council's (NERC) Experimental Cartography Unit (ECU) at the Royal College of Art,
London, under the direction of the cartographer David Pelham
Bickmore (1917–1993). These arrived at multivariate
proportional-length symbols of the multi-arm windrose type
superimposed over a variety of ingenious monochrome geological maps (Rhind et al. 1973). Although various trial maps
based on a portion of the Northern Ireland data set were
produced (Fig. 11), they were deemed, by both AGRG staff
and others in industry to whom they were shown, to be only
suited to large-scale maps, and were thought too expensive to
produce for routine use. An exhaustive series of tests on the
efficacy of their use carried out by the ECU also suggested that
'the low level of accuracy in the use of the symbols may
suggest that graduated, rather than proportional symbols are
more suitable for use' (Rhind et al. 1973, 118). Nevertheless,
the method was subsequently taken up by the British Geological Survey for production of some of their early large-scale
geochemical atlases (e.g. Institute of Geological Sciences 1978)
and proved successful with their users.
However, colour-combined 3-component multispectral
images were at this time beginning to be used in remote-sensing, and it was decided to investigate the applicability of
such methods to regional geochemical maps. In order to
convince somewhat skeptical colleagues, using the Northern
Ireland atlas data, Howarth initially wrote a program to display
a map of three elements using only three concentration
intervals (background, near-anomalous, anomalous) each,
i.e. a total of 27 possible combinations, as the lineprinter
characters A–Z, plus an asterisk where all three fell into the
anomalous class. A translucent overlay was then hand-coloured
to produce the final map. Having established the utility of the
method, once the laser-plotter images became available, these
were used to produce subtractive colour-combined maps with
up to 10 levels per component (Webb et al. 1973, 1978). A
colour-combined map for potassium, strontium and chromium
was first shown at the Geochemical Exploration 1972 meeting
in London (Lowenstein & Howarth 1973) and formed the
cover of the Northern Ireland atlas (Fig. 12; Webb et al. 1973).
An economy version was also developed, which simply used a
series of lineprinter-produced black masks and diazo contact
printing, to produce 3-level, 3-component maps (Howarth &
Lowenstein 1976). The England and Wales atlas (Webb et al.
1978) included examples of both two-component (Fig. 13) and
three component (Fig. 14) colour-combined maps.
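The first, lineprinter-based, three-element display lends itself to a short worked example: three elements with three concentration intervals each give 3 × 3 × 3 = 27 combinations, which fit exactly into the 26 letters A–Z plus an asterisk. In the sketch below the assignment of letters to combinations (a base-3 index) and the threshold values are assumptions, as the ordering used in Howarth's program is not given here; only the asterisk for the all-anomalous case is taken from the text.

```python
import string

def level(value, near_anomalous, anomalous):
    """0 = background, 1 = near-anomalous, 2 = anomalous."""
    if value >= anomalous:
        return 2
    if value >= near_anomalous:
        return 1
    return 0

def combo_symbol(levels):
    """Map three class levels (each 0..2) to one lineprinter character.

    The base-3 ordering is an assumption; '*' marks the case in which
    all three elements fall in the anomalous class.
    """
    first, second, third = levels
    index = 9 * first + 3 * second + third
    return "*" if index == 26 else string.ascii_uppercase[index]

# Hypothetical thresholds (near-anomalous, anomalous) for three elements.
thresholds = {"K": (2.0, 3.0), "Sr": (200, 400), "Cr": (100, 250)}
sample = {"K": 3.2, "Sr": 150, "Cr": 120}

levels = tuple(level(sample[el], *thresholds[el]) for el in ("K", "Sr", "Cr"))
symbol = combo_symbol(levels)
```

Each map cell then carries a single character, and the hand-coloured overlay assigns one colour to each of the 27 symbols.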
Anomaly enhancement
Fig. 7. Prototype SC4020-plotted map (May 1967) for lead concentrations (ppm) in the <80-mesh stream sediment fraction over Derbyshire, UK (original is 20.5 × 25 cm).
In order to assist the recognition of areas of anomalously high
element concentrations in regional maps, the PLTLP1 program
was extended in 1972–73 to include a variety of image-processing
operators: the moving-average smoothing already being used was
a low-pass filter; now the option was added to apply:
i a high-pass filter (HPF), contrasting a central map-cell with
the mean of a square block of surrounding cells;
ii a picture-frame filter (PFF; Holmes 1966), contrasting the
central cell with the mean of a surrounding square annulus
of cells at some distance away;
iii a probabilistic Kolmogorov–Smirnov filter (KSF; Muerle &
Allen 1968), which computed whether the cumulative frequency
distribution of the values in a square central block of cells was
statistically greater than that of the values in a surrounding
square annulus at some distance away, based on the
Kolmogorov–Smirnov test (Kolmogorov 1933; Smirnov 1939).
These were initially investigated using data from the Northern
Ireland atlas, operating on the image grey-levels (Fig. 15),
essentially the equivalent of using a ranked statistics approach
(Howarth 1983).
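The difference between the high-pass and picture-frame operators is easiest to see on a small grid. The sketch below is an illustration only, not PLTLP1: the function names, the window sizes, and the use of a simple difference (rather than, say, a ratio) as the measure of contrast are assumptions. The Kolmogorov–Smirnov filter is not covered.

```python
def ring_mean(grid, r, c, inner, outer):
    """Mean of cells whose square-ring distance from (r, c) lies in
    [inner, outer]; empty cells (None) are skipped."""
    nrows, ncols = len(grid), len(grid[0])
    values = []
    for i in range(max(0, r - outer), min(nrows, r + outer + 1)):
        for j in range(max(0, c - outer), min(ncols, c + outer + 1)):
            distance = max(abs(i - r), abs(j - c))
            if inner <= distance <= outer and grid[i][j] is not None:
                values.append(grid[i][j])
    return sum(values) / len(values) if values else None

def contrast(grid, r, c, inner, outer):
    """Central cell minus the mean of the chosen surrounding cells."""
    background = ring_mean(grid, r, c, inner, outer)
    if grid[r][c] is None or background is None:
        return None
    return grid[r][c] - background

def high_pass(grid, r, c, half_width=1):
    """HPF: centre against the adjoining square block of cells."""
    return contrast(grid, r, c, 1, half_width)

def picture_frame(grid, r, c, inner=2, outer=3):
    """PFF: centre against a square annulus some distance away."""
    return contrast(grid, r, c, inner, outer)

# Hypothetical 7 x 7 map: a 3 x 3 block of high values (5) on a
# background of 1, centred on cell (3, 3).
cells = [[5 if 2 <= i <= 4 and 2 <= j <= 4 else 1 for j in range(7)]
         for i in range(7)]
hpf_value = high_pass(cells, 3, 3)
pff_value = picture_frame(cells, 3, 3)
```

In this constructed case the centre of the high block has no contrast with its immediate neighbours, so the high-pass value is zero, whereas the annulus of the picture-frame filter lies wholly on background and returns the full contrast. It is offered only as an intuition for why an annulus can respond to anomalies several cells wide.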
The PLTLP1 program was subsequently applied in joint
work, directed by Professor George Koch Jr. at the University
of Georgia, USA, under contract to the United States Department of Energy National Uranium Resource Evaluation
(NURE) Program, to data sets from the Hydrogeochemical and

Stream Sediment Reconnaissance (HSSR) phase of the programme for North Carolina, Virginia, and Tennessee (Koch
et al. 1979, 1981a, b; Howarth et al. 1980). For this project,
Howarth's original program was converted in 1977/78 by C.Y.
(Sam) Chork, a post-doctoral researcher in AGRG, who had
obtained his PhD at the University of New Brunswick, Canada,
under Professor Gerald (Gerry) Govett (who had obtained his
own PhD at AGRG; 2010), to use the actual concentration
values and automatic selection of percentile class intervals; and
by Steven Mancey, an AGRG doctoral student, to use interactive user-selection of classes. Mancey (1980, 255–271) applied
these techniques to the entire England and Wales atlas data set,
with output onto 35-mm microfilm, using the recently-installed
off-line Calcomp 1670 microfilm plotter at the University of
London Computer Centre, with a mapping program (MICMAP) he had written (Mancey 1980, 57–61) utilising their
PICPAC microfilm plotting software (Colvill & Kitchingman
1976). The results from all these studies showed the PFF was
more effective at identifying regional geochemical anomalies
than a comparable HPF, and that the KSF produced no better
results than the PFF, while being computationally much more
time-consuming, owing to the greater complexity of the calculations
required.
Fig. 8. Trial gap-filled moving-average lineprinter map for lead concentrations (ppm) in the <80-mesh stream sediment fraction over Northern Ireland, UK, using 10, 10-percentile classes (original is 28.5 × 22.5 cm). Upper histogram: frequency distribution for binned data values; lower histogram: grey-levels in the map.
In related studies, Steven Earle (1978, 158–167) developed a modified version of the lineprinter plotting program (STRMPLT) to produce large-scale maps for drainage sediment
geochemistry which explicitly took into account the geometry
of the stream segments and drainage basin upstream of each
sample site, so as to produce correctly-weighted smoothed
maps, again plotted using the Calcomp 1670. He also used
geostatistical (Matheron 1963; David 1977) interpolation techniques in detailed soil surveys, and to assess dispersion distances in both stream sediments and water in the Mendip Hills,
Somerset, and wrote a program (GEOSTAT) for
interactive fitting of the semi-variogram to aid this work (Earle
1978, 184–205).
DATA ANALYSIS
By the mid-1960s, the advent of the computer had made multivariate data analysis a practical proposition, opening up
the realm of working with geochemical data many elements at
a time, seeking multi-element patterns that might reflect the
regional geology and the presence of mineral occurrences.
Factor analysis
The method used to achieve this was factor analysis, first
introduced as a theoretical concept by Spearman (1904) and
subsequently developed by Thurstone (1947) and other

workers. It aims to reduce the dimensionality of a large data set,
consisting of n samples and p variables, to a much smaller
number (k) of factors, each of which is a linear function of the
p original variables, in such a way that the first of the k factors
accounts for the maximum variability in the data, then decreasingly so as one progresses through to the last (k-th) factor. The
correlations between each of the k factors and the p original
variables (the factor loadings) enable the geological or geochemical significance of the factors to be interpreted. The
factors are extracted in such a way that they themselves are also
correlated to some degree (various criteria are used to achieve
this, such as Kaiser's (1958) varimax criterion), thus yielding a
result held to be a more interpretable solution than if the
factors were to remain uncorrelated, which is the result obtained using
the alternative method of principal components analysis
(Hotelling 1933).
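As an illustration of the first step only (extraction of the component that accounts for the maximum variability), the following minimal sketch finds the leading eigenvalue and eigenvector of a correlation matrix by power iteration. The correlation matrix is hypothetical, the function names are assumptions, and no rotation is attempted.

```python
import math


def mat_vec(m, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(corr, n_iter=200):
    """Leading eigenvalue and unit eigenvector of a correlation matrix,
    found by simple power iteration from a vector of ones."""
    v = [1.0] * len(corr)
    for _ in range(n_iter):
        w = mat_vec(corr, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(corr, v)))
    return eigenvalue, v


# Hypothetical correlations between three elements: the first two are
# strongly associated, the third is independent of both.
corr = [[1.0, 0.8, 0.0],
        [0.8, 1.0, 0.0],
        [0.0, 0.0, 1.0]]

eigenvalue, vector = first_component(corr)
# Loadings: correlations between the component and the original variables.
loadings = [math.sqrt(eigenvalue) * x for x in vector]
# Share of the total variance carried by this component.
share = eigenvalue / len(corr)
```

Here the first component carries 60% of the total variance and loads equally on the two associated elements, which is the kind of pattern that would then be given a geological or geochemical interpretation.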
Fig. 9. Trial LGP2703 laser-plotted map for zinc concentrations (ppm) in the <80-mesh stream sediment fraction over Northern Ireland, UK, using empirical (logarithmic) class intervals (original is 30 × 27.5 cm).
John Imbrie, working in the research group of the mathematical geologist, Professor William Christian Krumbein
(1902–79) at Northwestern University, Illinois, USA, introduced the term 'vector analysis' (Imbrie 1963) to describe an
adapted form of factor analysis in which the final results are
determined in terms of normalised compositions. This produces unit vectors, and the measure of similarity between all of
the n samples is based on the angle (θ) between each pair of
vectors, the cos θ coefficient (Imbrie 1963; for which significance levels were given by Howarth 1977). Imbrie (1963) called
this approach 'Q-mode' analysis, in contrast to calculation of
the more familiar correlation coefficient matrix between the p
variables (which he called 'R-mode' analysis). In a conventional
R-mode factor analysis, the first step is to compute the p
uncorrelated principal components, then to retain the first k of
these and rotate them (e.g. using the varimax criterion) to
obtain the final solution. In the Q-mode approach, each of the
k factors corresponds to an actual sample of extreme composition. These are 'end-members', and the entire data set is thus
represented in terms of the relative contributions of these
end-members, the factor scores increasing from zero to unity
as a sample's composition approaches more exactly that of one
of the end-member reference vectors. This approach was first
applied at Northwestern to compositional analysis of variations
in carbonate sediments (Imbrie & Purdy 1962) and to lithostratigraphy (Krumbein & Imbrie 1963). The availability of a
computer program for the IBM 7094/1401 computer system
(Manson & Imbrie 1964) enabled Q-mode factor analysis to be
taken up by Garrett (1967), who was able to successfully apply
it to interpretation of the geochemistry of his regional stream
sediment data from Sierra Leone (Garrett 1966, 142–148;
Nichol & Webb 1967; Nichol et al. 1969) (Fig. 16). However,
because of memory-size limitations, Q-mode analysis was at
this time subject to the serious restriction of handling a
maximum of 100 samples.
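The cos θ similarity between two sample compositions, on which the Q-mode approach rests, can be sketched as follows. The sample values are invented for illustration and the function name is an assumption.

```python
import math


def cos_theta(a, b):
    """Cosine of the angle between two composition vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical three-element compositions.
sample_a = [10.0, 20.0, 30.0]
sample_b = [20.0, 40.0, 60.0]   # same proportions as A, twice the amounts
sample_c = [30.0, 20.0, 10.0]   # different proportions

same_proportions = cos_theta(sample_a, sample_b)
different_proportions = cos_theta(sample_a, sample_c)
```

Samples A and B have identical proportions and so give cos θ = 1, regardless of absolute concentration, while A and C give a lower value. Because the coefficient is computed for every pair of the n samples, the similarity matrix is n × n, which is why memory size limited the number of samples that could be handled.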
The relatively small number of samples capable of being
analysed by the Q-mode factor analysis program proved a severe
limitation where the large data sets involved in regional geochemistry were concerned. On completion of his thesis, Garrett
visited Northwestern University on a post-doctoral research
fellowship, where he investigated the utility of R-mode factor

analysis as an alternative to the Q-mode approach, based on a
study of stream sediment samples from the Nimini Hills schist
belt of Sierra Leone (Garrett & Nichol 1969). The computer
programs developed by Garrett (1966, 1967) were modified by
Khaleelee in 1969 for use within the AGRG.
In an extensive study of the regional drainage sediment
geochemistry of parts of SW England, Wales, and the English
Peak District, Khaleelee (1969, 204–445) used both the R- and
Q-mode approaches. He concluded that, quite apart from the
severe limitation on the number of samples to which the
Q-mode method could be applied, the fact that compositions
of Q-mode end-members differed markedly between different
subsets of the same regional data set made the R-mode
approach far better suited to analysis of large regional data sets.
He verified this by means of a detailed comparison of the
results of R-mode analysis applied to the geochemistry of
bedrock, soil, and stream sediment samples (n = 104, 287 and
198, respectively) from the Onecote district of NE Staffordshire, analysed for 15 elements (Khaleelee 1969, 326–445).
Tragically, Khaleelee lost his life in a helicopter accident, shortly
after taking up a new post in Australia in 1970.
The R-mode approach was subsequently used: by Ashlyn
Armour-Brown (1971; Armour-Brown & Nichol 1970) with
regional geochemical data from Zambia; by Colin Summerhayes (1971, 1972) to interpret the geochemistry of phosphatic
continental margin sediments from NW Africa; and by
Geoffrey Glasby (Glasby et al. 1974) in a study of the
geochemistry of manganese concretions from the Indian Ocean.
This approach thereafter became an established tool in AGRG
work. Mancey & Howarth (1978) used varimax-rotated factors
of the Box–Cox (1964) transformed (see below) England and
Wales atlas data set to produce a pair of colour-combined maps
which together embodied 68% of its total variance; see Mancey
(1980, 182–196) for detailed interpretation. These were printed
from enlargements of grey-scale images generated on 35-mm
microfilm using Mancey's MICMAP program (Fig. 17).
Fig. 10. Chromalin trial prints (Jan. 1976) of the LGP2703 laser-plotted percentile-based maps for zinc concentrations (ppm) in the <80-mesh stream sediment fraction over England and Wales: (a) unsmoothed; (b) moving-average smoothed and gap-filled (originals are 27 × 40 cm).
Fig. 11. Detail of 1972 trial windrose map for copper (left), zinc (vertical) and lead (right) concentrations (ppm) in the <80-mesh stream sediment fraction over a part of Northern Ireland. Geological boundaries (grey); faults (gold). AGRG-NERC ECU trial map 5a (original is 40 × 26 cm).
Fig. 12. Cover of the Northern Ireland geochemical atlas (Webb et al. 1973) showing subtractive colour-combined map for potassium (increasing concentrations (ppm), magenta), chromium (increasing, yellow) and strontium (increasing, cyan) in the <80-mesh stream sediment fraction (original is 35 × 27 cm).
Fig. 13. Subtractive colour-combined map of molybdenum (increasing concentrations (ppm), magenta) and copper (decreasing concentrations (ppm), cyan) in the <80-mesh stream sediment fraction over England and Wales. (Webb et al. 1978, fig. 67; original is 30 × 39 cm). Bovine hypocupraemia is particularly associated with areas of black shale in which the copper:molybdenum ratios are low (Leech 1984; see Thornton (2010) for discussion).
Empirical (potential function) discriminant analysis
Nevertheless, the increasingly large size of the geochemical data
sets being produced within AGRG made it imperative that
additional multivariate methods capable of giving further
insight into relationships between sample compositions were
made available for AGRG workers. Two approaches were
initially investigated: discriminant analysis, which allocates
samples into pre-defined compositional groups based on a
training set of samples for each group; and cluster analysis,
which seeks to allocate samples to natural groups of similar
composition. Although the classical techniques for accomplishing these objectives had become available to geologists since
the mid-1960s, largely through the computer programs disseminated by the Kansas Geological Survey Computer Bulletin and
Special Distribution Publication series, and textbooks, such as
Davis & Sampson (1973), new methods were being developed
by workers in pattern recognition and electrical engineering. The
first of these to be implemented in AGRG, in 1969, was the
empirical discriminant function (EDF; Specht 1967), which
combined the use of Gaussian potential (kernel) functions with a
Bayesian classifier. This proved a very effective classification
method and had the added advantage that samples sufficiently
unlike any of the training sets were classified as 'unknown',
rather than being forced into one of the pre-defined groups, as
was the case with the classical linear discriminant function
approach. In the AGRG implementation (Howarth 1973a) for
the CDC 6600, sequential backwards selection (BAKWRD
program) proved the best method to find both the optimum
combination of elements (Howarth 1973c, 1974) and the best
value of the smoothing parameter for the potential function, on
which to base the subsequent classification process. The results
of the classification itself (PRSYS1 program, written by Howarth
in 1969) were plotted as a map using an off-line Calcomp pen
plotter. The practicality of the method was initially established
using a lithogeochemical data set (Howarth 1971b), then the
Devon atlas data (Howarth 1971c, 1972), and in a more
exhaustive study by Rolando Castillo-Muñoz (1973; Castillo-Muñoz & Howarth 1976) (Fig. 18). Fong Tai Loon, an MSc
student in the Department of Computing and Control, supervised by Howarth and Dr. Francis N. Parr, examined the utility
of the Divergence (Bhattacharyya 1943) and Bhattacharyya
distance (Jeffreys 1946; Kailath 1967) criteria as measures of
class separability in a lithogeochemical context, finding divergence to be a useful separability criterion, although it required fairly
large training set sizes (Fong 1975). Although the EDF method
was found to be very effective (particularly in attracting attention
to data unlike anything present in the training sets), it became
evident that there was not much demand from AGRG users for
this approach in analysis of the regional atlas data and this work
was not pursued.
Fig. 14. Colour-combined map of lead (increasing, magenta), copper (increasing, yellow) and zinc (increasing, cyan) concentrations in the <80-mesh stream sediment fraction over England and Wales. (Webb et al. 1978, fig. 63; original is 30 × 39 cm).
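The principle of the empirical discriminant function (Gaussian potential functions centred on the training samples, a Bayesian choice between groups, and an 'unknown' category) can be sketched as below. This is not the PRSYS1 program: equal prior probabilities are assumed, the density floor used to declare a sample 'unknown' is a hypothetical device, and the training values are invented.

```python
import math


def kernel_density(x, training, sigma):
    """Average of Gaussian potential functions centred on each training sample."""
    p = len(x)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p
    total = 0.0
    for t in training:
        d2 = sum((a - b) ** 2 for a, b in zip(x, t))
        total += math.exp(-d2 / (2.0 * sigma ** 2))
    return total / (len(training) * norm)


def classify(x, groups, sigma=0.5, floor=1e-3):
    """Assign x to the group of highest estimated density (equal priors
    assumed); return 'unknown' if even that density is below the floor."""
    density = {name: kernel_density(x, tr, sigma) for name, tr in groups.items()}
    best = max(density, key=density.get)
    return best if density[best] >= floor else "unknown"


# Hypothetical two-element training sets for two lithologies.
groups = {
    "shale": [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]],
    "limestone": [[4.0, 4.0], [4.1, 3.8], [3.9, 4.2]],
}

label_near_shale = classify([1.1, 1.0], groups)
label_near_limestone = classify([4.0, 3.9], groups)
label_far_from_both = classify([10.0, 10.0], groups)
```

The smoothing parameter (sigma here) controls how far the influence of each training sample extends, which is why its value had to be optimised alongside the choice of elements.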
Cluster analysis
Hitherto, virtually all the cluster analysis methods used in
geological studies were based on agglomerative hierarchical
clustering methods (Imbrie & Purdy 1962; Parks 1970). These
were commonly used with relatively small data sets and
produced a hierarchical tree structure, with the most
compositionally-similar samples grouped at the tips of the
branches (Fig. 19). John Sammon Jr. at the Rome Air
Development Centre (RADC), New York, developed an alternative non-linear mapping (NLM) algorithm (Sammon 1969).
This projected the positions of points in high-dimensional (i.e.
multivariate) space onto a plane, adjusting the locations of the
points in the plane until the matrix of their inter-point distances

was as close as possible to that for the equivalent points in the
original high-dimensional space. Howarth (1973b) compared
the results of the NLM method with those obtained in a
number of geological studies that had previously used hierarchical cluster analysis, and found it to be extremely effective.
Application to a variety of applied geochemical data sets
(Howarth 1973b, c; Castillo-Muñoz 1973; Howarth et al. 1977;
Howarth & Johnson 1977) showed it could successfully delineate natural, i.e. separated, clusters or reveal a compositional
continuum, where such existed, as well as identify outliers. In a
set of marine manganese nodule data classified by hierarchical
clustering into a number of discrete groups (Fig. 19; Glasby et al.
1974), NLM shows that there is actually a complete lack of
discrete natural clusters (Fig. 20) although division of the cloud
of points in the NLM plot produces viable groupings in a spatial
context (Fig. 21). The success of the NLM method was such that,
with the permission of the RADC, AGRG (via the ICST
Computer Centre) distributed the program for many years.
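The quantity that non-linear mapping seeks to minimise, the mismatch between inter-point distances in the original space and in the plane, can be written down compactly. The sketch below computes the stress in the form given by Sammon (1969) for a fixed configuration; it does not perform the iterative adjustment of the points, and the coordinates are invented.

```python
import math
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_stress(high, low):
    """Mapping error between distances in the original space (high)
    and in the projected plane (low); zero means a perfect match."""
    total = 0.0
    error = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_star = dist(high[i], high[j])   # distance in the original space
        d = dist(low[i], low[j])          # distance in the plane
        total += d_star
        error += (d_star - d) ** 2 / d_star
    return error / total


# Three hypothetical samples in a three-variable space.
high = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]      # preserves all distances
collapsed = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]  # third point misplaced

stress_exact = sammon_stress(high, exact)
stress_collapsed = sammon_stress(high, collapsed)
```

An NLM program would move the points in the plane step by step so as to reduce this value; the division by the original distance gives small distances relatively more weight, which helps to preserve local structure.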
In order to cope with cluster analysis of large data sets one
could use a non-hierarchical method, such as NLM, or the
ISODATA clustering algorithm (Ball & Hall 1965), which had
first been evaluated in AGRG with data from the Denbighshire
atlas by David Crisp (1974), using a subset of data (subsampled
at a broadly even spatial density) to identify a number of
geochemical groups, which were then used as training sets for
subsequent EDF classification of the entire data set. In this way,
Mancey (1980, 207–254) was able to achieve a regional classification of the entire England and Wales atlas data into nine
spatially-coherent and geochemically-meaningful categories, plus
a small number of outlying (anomalous) data points.
Fig. 15. Trial Kolmogorov–Smirnov filtered map for zinc concentrations (ppm) in the <80-mesh stream sediment fraction over Northern Ireland. (%) Unsampled areas; (*) anomalous map-cells; (m) associated with manganese scavenging; (z) associated with mineralization. (Original is 28.5 × 23 cm).
Data transformation
It has been known for many years that many geochemical
elements have concentrations that tend to have a skewed
(asymmetrical) frequency distribution, usually with the distribution extended towards higher concentration values. From the
earliest days of geochemistry (Ahrens 1954; Hawkes & Webb
1962), it was assumed that logarithmic transformation of such
data provided an adequate transformation to symmetry, thereby
adequately approximating a normal frequency distribution.
However, as ever-larger regional data sets were investigated in
AGRG, it became apparent that distributions existed with
positive (or occasionally negative) skewness which could not be
symmetrised by simple log-transformation. Howarth & Earle
(1979) wrote a program (MINSK) that implemented the power
transform to normality of Box & Cox (1964):
\[
y = \begin{cases} (x^{\lambda} - 1)/\lambda, & \lambda \neq 0 \\ \ln(x), & \lambda = 0 \end{cases} \qquad x > 0,
\]
where x is the set of original observations; λ is the power
coefficient; and y is the transformed data set, the value of λ being chosen by minimising an
objective function of skewness and kurtosis (Fig. 22). While in
many cases the transformed distribution (y) is still not a perfect
normal distribution, it is generally symmetrical, which is the
most important requirement. Mancey (1980, 157–182) and Mancey &
Howarth (1978, 1980) illustrated its efficacy with principal
components analysis of the England and Wales atlas data.
Turner (1986, 181–253) found data transformation, and the
Box–Cox transform in particular, to be equally useful with a
503-sample, 24-element British Geological Survey stream sediment data set from the Dalradian of the Moray–Buchan area of
NE Scotland (British Geological Survey 1991), as did Neil
Coward (1986) with a suite of marine geochemical data from
the SW Pacific.
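A minimal sketch of choosing the Box–Cox power coefficient is given below. It is not the MINSK program: for simplicity the objective used here is the absolute skewness alone (the kurtosis term is omitted), the search is over a small fixed grid, and the data are synthetic.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transform of positive values for a given coefficient."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive data")
    return [math.log(v) if lam == 0 else (v ** lam - 1.0) / lam for v in x]


def skewness(y):
    """Moment coefficient of skewness."""
    n = len(y)
    m = sum(y) / n
    m2 = sum((v - m) ** 2 for v in y) / n
    m3 = sum((v - m) ** 3 for v in y) / n
    return m3 / m2 ** 1.5


def best_lambda(x, candidates):
    """Candidate coefficient giving the smallest absolute skewness
    (a simplified objective; an assumption of this sketch)."""
    return min(candidates, key=lambda lam: abs(skewness(box_cox(x, lam))))


# Synthetic positively skewed data whose logarithms are symmetrical.
x = [math.exp(t) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]

chosen = best_lambda(x, candidates)
raw_skew = skewness(x)
transformed_skew = skewness(box_cox(x, chosen))
```

For these synthetic values the search returns a coefficient of zero, i.e. the log transform. With real regional data the preferred value is often some other power, which is the situation the transform was introduced to handle.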
The problem of induced correlation in percentaged (constant-sum) data has been recognised for many years. The first attempt
to address this in the geosciences was made by the petrographer
Felix Chayes (1971; Howarth 2004). In 1982, the statistician,
John Aitchison, proposed a solution based on application of
the logratio transform (Aitchison 1982, 1986), which transforms a set of percentaged variables, $x_1, \ldots, x_k$ (with the
proviso that all $x_i > 0$ and $\sum_{i=1}^{k} x_i = 100$) to a new set of
variables $y_1, \ldots, y_{k-1}$, where $y_j = \log_e(x_j / x_k)$; $j = 1, \ldots, k-1$.
In recent years, this transform has been widely promoted for
use in the earth sciences (most recently by Buccianti et al. 2006).
Howarth tried on several occasions to apply the logratio
transform as a precursor to multivariate analysis of various
AGRG geochemical data sets but found that, in practice, the
results were often geochemically uninterpretable, and that
whenever any xi was close to zero, the transform resulted in serious outlier
problems. More research, with a wide variety of data sets, is
required on this subject.
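The logratio transform and its sensitivity to very small components can be sketched as follows. The last component is used as the divisor, and the compositions are invented for illustration.

```python
import math


def logratio(x):
    """Logratios of all components but the last, each taken relative to
    the last component; every component must be strictly positive."""
    if any(v <= 0 for v in x):
        raise ValueError("all components must be greater than zero")
    return [math.log(v / x[-1]) for v in x[:-1]]


# Hypothetical three-part compositions, each summing to 100%.
ordinary = logratio([50.0, 25.0, 25.0])
near_zero = logratio([0.001, 49.999, 50.0])
```

The first composition gives modest values (ln 2 and 0), whereas a component of 0.001% in the second produces a logratio of about -10.8, far removed from the rest. Values of that kind behave as outliers in any subsequent multivariate analysis.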
In recent years, Dennis Helsel (Helsel & Hirsch 1992,
357–376; Helsel 2005), of the US Geological Survey Water
Resources Division, has provided new approaches to the
problem of dealing with censored, i.e. below analytical detection limit (dl), data. In the days of GPRC/AGRG, any such
values in a data set were routinely set to the appropriate dl/2
for the purposes of statistical analysis and, in practice, it is
doubtful whether it brought any geologically significant bias
into the geochemical interpretations arrived at.
Robust methods
The deleterious effect of outliers present in a data set, leading to
bias in calculation of the mean, inflated variances, spurious
correlation coefficients, and so on, has long been recognised.
However, it was only in the mid-1970s that methods which
could automatically down-weight the effects of outliers to
obtain robust estimates of both univariate statistics (such as
the mean and standard deviation) and the covariance matrix
or correlation matrix (which underpin principal components,
factor and classical linear discriminant analysis) began to be
developed (Andrews et al. 1972; Huber 1981), but it was a
while before their potential utility in applied geochemistry was
pointed out (Campbell 1982; Garrett 1983; Howarth 1984).
Robust correlation matrices were calculated by Leech (1984),
using software developed by the statistician Norman Campbell of the Commonwealth Scientific and Industrial Research
Organisation, Australia (Campbell 1980), who had taken his
doctorate in the Statistics Department at Imperial College.
Turner (1986) implemented robust versions of both principal
components analysis and ridge regression software, which
proved immensely useful to AGRG research subsequently
(e.g. Coward & Cronan 1987).
Fig. 16. Q-mode factor analysis maps for the geochemistry of the <80-mesh stream sediment fraction, Nimini Hills, eastern Sierra Leone: (upper left) vector 1, (upper right) vector 2, (lower left) vector 3, end-members shown by solid dot in each case; (lower right) communality. Garrett (1966, fig. 52; original is 18 × 23 cm).
Fig. 17. Colour-combined map of the first three varimax-rotated factors of the Box–Cox transformed geochemistry of the <80-mesh stream sediment fraction over England and Wales. This map accounts for 64% of the variation of the entire data set (Mancey & Howarth 1978, sheet 1; original is 14 × 18.5 cm).
The extensive study of the application of robust principal
components and factor analysis by Turner (1986, 434–548)
concluded that factor analysis is preferable to principal components analysis because the use of a small number of factors
forces a grouping of the variables, reducing the dimensionality
of the problem and increasing interpretability. The greatest
anomaly contrast is obtained using untransformed data; prior
Box–Cox transformation of the data is best if background
associations and relationships are to be revealed.
Data displays
The arrival of the interactive statistical package MINITAB
(Ryan et al. 1976) on the College's distributed terminal system
enabled routine data analysis to be used by both staff and
students in AGRG and, because it embodied much of the
recent thinking on graphics-based Exploratory Data Analysis
(Tukey 1977), box-plots, quantile-quantile plots and other
graphical displays were soon taken up in AGRG work (Earle
1982; Howarth 1984; Turner 1986). Earle (1982, 168–183)
developed a program (GIRAF) for the interactive dissection of
probability plots into constituent sub-populations. Turner
(1986, 166–179) showed the utility of multivariate probability
plots, based on the cube-root of the Mahalanobis distance
(Healy 1968; Campbell 1979) for detection of multivariate
outliers.
Use of two new multivariate graphics to portray multi-element sample compositions for the purpose of comparison
was extensively investigated by Turner (1986), using the
Moray–Buchan data set: (i) Chernoff faces (Chernoff 1973),
which assign features of the human face (e.g. position/style of
eyes, eyebrows, nose, mouth) to different variables to make
comparative displays, each face corresponding to a sample
composition; and (ii) Kleiner–Hartigan (KH) trees (Fig. 23a;
Kleiner & Hartigan 1981; Garrett 1983).
Fig. 18. Empirical discriminant classification map of the geochemistry of Pb, Ga, V, Mo, Cu, Zn, Ti, Ni, Co, Mn, Cr and Fe2O3 in the <80-mesh stream sediment fraction over Denbighshire, UK. Training areas for five lithologies are boxed in; samples assigned to the 'unknown' group are shown solid; of these samples, 62% were related to known mineralized areas. (Castillo-Muñoz & Howarth 1976, fig. 6).
Fig. 19. (right) Q-mode weighted pair group dendrogram based on agglomerative cluster analysis of the normalized Mn, Fe, SiO2, Ti, V, Cr, Co, Ni, Cu, Zn, Zr, Mo and Pb concentrations of 180 manganese nodules and crusts from the western flanks of the Carlsberg Ridge, Indian Ocean; rectangle length is proportional to number of samples/group; (left) proportion of massive or granular crusts, or nodules in each cluster-analysis group; (centre) proportion of collection sites in massif or ridge settings (Glasby et al. 1974, fig. 4).
Fig. 20. Nonlinear mapping onto 2-dimensions of the geochemistry of manganese nodules from the Pacific Ocean on the basis of normalised Mn, Fe, Co, Ni, Cu, Pb and Ti. The compositional continuum is divided into 6 classes for purposes of interpretation. (Redrawn from Glasby et al. 1977, fig. 1).
Turner found Chernoff faces to be unsatisfactory, in that
much work was required to find the best facial features to
which a particular element should correspond (which implied
that the technique could be used to deliberately distort results
by emphasis or suppression of any variable) and that, in order
to achieve the best visual emphasis for any anomalous samples,
the analyst must have prior knowledge of which they are
(Turner 1986, 346, 355). The KH trees were far more
effective at portraying the multi-element sample compositions,
using a tree morphology based on the hierarchical cluster
analysis of a robust correlation matrix; branch-lengths are
drawn proportional to the concentrations of the elements to
which each corresponds (Fig. 23a). It was likened to performing a visual factor analysis. Although the physical size of the
plotted trees made it difficult to use them in a spatial context
with a large data set by plotting them at their corresponding
sample location on a map, nevertheless, side-by-side comparison of the trees laid out as a graphic table, in numerical order
of sample numbers (Fig. 23b), proved quite satisfactory. KH
trees were also extensively used by Coward (1986).
Ridge regression
Linear multivariate regression analysis has long been used in
applied geochemistry to correct for the effects of element
interaction (e.g. enhancement of element concentration levels
as a result of iron and manganese scavenging), and to empirically explain the behaviour of an element in terms of others.
Emphasis is often placed on the regression residuals (the
observed concentration minus that predicted by the fitted
regression model) as a means of identifying anomalous behaviour. For example, Moorby et al. (1987) fitted quadratic trend
surfaces (see above) to the residuals of Pb and Zn as predicted
by separate regression models fitted to the suite of elements
{Ca, Mg, Al, Fe, Mn} in order to delineate broad trends of
background variation in carbonate-rich marine sediments (and
hence the spatial setting of anomalous concentration values) in
two areas of the continental shelf of Greece. Stable anomaly

patterns were shown to exist off the Sounion Peninsula, a
known area of mineralisation.
However, where it is crucial that the relative importance of
a number of elements in controlling the behaviour of another is
determined, Hoerl & Kennard (1970) recognised that whenever
the supposedly independent predictors in a linear regression
model are correlated (as is always the case where geochemical
data are concerned), the coefficients of some
predictors in the fitted regression equation will be too
large, and may even be of the wrong sign. Consequently, they
introduced the ridge regression method to overcome such
undesirable features. The existence of their work was first
brought to the attention of geologists by Jones (1972). It was
programmed for use in AGRG by Turner in 1979 (Turner
1980), and the RIDGE11 program was subsequently extended
by Howarth in 1981–82, during work on the NURE contract
with the University of Georgia (see above; Howarth 1984;
Howarth & Koch 1986) to include interactive selection of the
ridge parameter, choice of variables, progressive deletion of
outliers, and resubstitution of the entire data set, using the final
fitted equation, to obtain the residuals. The method was
extensively investigated in an exploration context by Philip
Davies (1983), and proved to be equally helpful in deriving
interpretational models in relation to the occurrence of bovine
hypocupraemia (Leech et al. 1983; Leech 1984), and in the
analysis of a suite of marine mineral exploration data from the
southwest Pacific (Coward 1986; Coward & Cronan 1987).
Turner (1986, 549–593), using Ba, Pb and Zn as response
variables for the 23-element Moray–Buchan data set, demonstrated the efficacy of robust ridge regression, and showed that
anomaly (regression residual) contrast was maximised if
untransformed data were used.
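The effect that ridge regression is designed to counter can be shown with the smallest possible case: two standardised, highly correlated predictors. The sketch below solves the ridge equations in correlation form, (R + kI)b = r, in closed form. It is not the RIDGE11 program, and the correlation values and ridge parameter are invented.

```python
def ridge_two_predictors(r12, r1y, r2y, k=0.0):
    """Standardised coefficients for two predictors with mutual correlation
    r12 and correlations r1y, r2y with the response; k is the ridge
    parameter (k = 0 gives ordinary least squares)."""
    a = 1.0 + k                     # diagonal of R + kI
    det = a * a - r12 * r12
    b1 = (a * r1y - r12 * r2y) / det
    b2 = (a * r2y - r12 * r1y) / det
    return b1, b2


# Hypothetical case: both predictors correlate positively with the
# response, and very strongly with each other.
ols = ridge_two_predictors(0.95, 0.8, 0.7, k=0.0)
ridge = ridge_two_predictors(0.95, 0.8, 0.7, k=0.1)
```

With k = 0 the coefficients are about 1.38 and -0.62: one is inflated and the other has the wrong sign, although both predictors are positively related to the response. With k = 0.1 they become about 0.70 and 0.03, both positive. Choosing k is a matter of judgement, which is presumably why interactive selection of the ridge parameter was added to the program.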
Other work
Miscellaneous applications have included: analysis of variance
(ANOVA) to quantify variability attributable to both field
sampling and analysis (Garrett 1969; Howarth & Lowenstein
1971, 1972) and in the doctoral thesis by Richard Duff (1975),
and the application of robust ANOVA by Ramsey et al. (1992);
development of statistically-based criteria for the recognition of
uraniferous granitoids from NURE HSSR data (Koch et al.
1981a, b; Howarth et al. 1981); and the application of numerical
modelling in vapour geochemistry by Ruan Tianjian (Ruan
1981; Ruan et al. 1985a, b). In more recent years, Geographical
Information Systems have been applied in studies of
environmental- and urban-geochemistry by workers in the
Environmental Geochemistry Research Group, the successor
to AGRG at Imperial College (Tristan-Montero 2000; Tristan et
al. 2000; Thums & Farago 2001; Thums 2003; Li et al. 2004;
Appleton et al. 2008).
ANALYTICAL QUALITY ASSURANCE
The development of analytical methods and related quality
assurance and interpretation methods in the GPRC and AGRG
are discussed by Thompson (2010) but, for the sake of
completeness, brief details are also included here. As was
mentioned in the Introduction, Craven's statistical series
approach continued to be used into the 1970s (Stanton 1966;
James 1970), but it came to be recognized that the low- and
high-concentration end-members of a statistical series might
not be representative, so far as their nature and matrix were
concerned, of the field samples being analysed, and that the
method could only provide either an estimate of analytical
precision (repeatability) at a particular concentration, or an
average precision value over the concentration range.
Fig. 21. Spatial disposition in the Pacific Ocean (Lambert equal-area projection) of the 6 classes from the nonlinear mapping of Fig. 20. (Redrawn from Glasby et al. 1977, fig. 3).
Fig. 22. Comparison of the Box–Cox transform in reducing skewness (s) and kurtosis (k; shown as k) for a data set with the same parameters for the untransformed and log-transformed data values (Howarth & Earle 1979, fig. 8).
Fig. 23. (a) Kleiner–Hartigan (KH) tree morphology for the Moray–Buchan, Scotland, stream sediment data set based on Ward's (1963) agglomerative clustering algorithm applied to a robust correlation matrix of Box–Cox transformed data (redrawn from Turner 1986, fig. 7.76); (b) Examples of KH trees for actual samples from the Moray–Buchan data set (portion of Turner 1986, fig. 7.102).

Thompson & Howarth (1973, 1976, 1978), Howarth & Thompson (1976), and Thompson (1978, 1981, 1983) developed an alternative approach, based on duplicate analysis of randomised splits of routine field samples, in which it was assumed: (i) that analytical error could generally be well modelled by the normal distribution (Thompson & Howarth 1980); and (ii) that analytical precision varied as a linear function of concentration in the analytical system (Thompson 1988) which, it turns out, was also assumed by Jenkins in his unpublished (1959?) manuscript mentioned on p. 290. The duplicate analysis method rapidly became established within AGRG, alongside the use of classical Shewhart (1931) control charts to control analytical batch performance through the monitoring of analyte concentration levels in splits of long-term house reference materials (Thompson 1981, 1983). The Thompson–Howarth chart, as it became named, was subsequently adopted by the wider geochemical and chemical community (e.g. Analytical Methods Committee 2002) and their approach continues to be extended in scope (e.g. Stanley 2006; Stanley & Lawie 2008).
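The duplicate-analysis idea can be illustrated with a short sketch. It is not the published Thompson–Howarth procedure, nor the DUPAN3 subroutine; it relies only on the two assumptions stated in the text (normally distributed analytical error, and a standard deviation that is a linear function of concentration). The function name `duplicate_precision`, the group size of 11 pairs and the synthetic data are assumptions made here for illustration. The factor 1.048 follows from the normal assumption: the difference between two duplicates has standard deviation √2·σ, and the median absolute value of a normal variate is 0.6745 times its standard deviation, so σ ≈ median|d| / (0.6745·√2) ≈ 1.048 × median|d|.

```python
import numpy as np


def duplicate_precision(first, second, group_size=11):
    """Fit s(c) = s0 + k*c to duplicate pairs (illustrative sketch only).

    first, second : the two results for each duplicated sample.
    group_size    : pairs per concentration group (an assumed value).
    Returns (s0, k): intercept and slope of standard deviation vs concentration.
    """
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    pair_mean = (first + second) / 2.0      # concentration estimate per pair
    abs_diff = np.abs(first - second)       # spread estimate per pair

    # Sort pairs by concentration and cut them into equal-sized groups,
    # discarding any incomplete group at the high-concentration end.
    order = np.argsort(pair_mean)
    n_groups = len(order) // group_size
    if n_groups < 2:
        raise ValueError("need at least two full groups of duplicate pairs")
    keep = order[: n_groups * group_size]

    group_conc = pair_mean[keep].reshape(n_groups, group_size).mean(axis=1)
    group_median = np.median(
        abs_diff[keep].reshape(n_groups, group_size), axis=1
    )

    # Convert median absolute difference to a standard deviation
    # (normal-error assumption), then fit a straight line by least squares.
    group_sigma = 1.048 * group_median
    k, s0 = np.polyfit(group_conc, group_sigma, 1)
    return s0, k


# Synthetic duplicates with a known precision model s(c) = 2 + 0.05*c.
rng = np.random.default_rng(0)
conc = rng.uniform(10.0, 1000.0, size=11_000)
true_sd = 2.0 + 0.05 * conc
run_a = conc + true_sd * rng.standard_normal(conc.size)
run_b = conc + true_sd * rng.standard_normal(conc.size)

s0_hat, k_hat = duplicate_precision(run_a, run_b)
print(f"estimated s0 = {s0_hat:.2f}, estimated k = {k_hat:.4f}")
```

The median is used within each group because it is less affected by an occasional wild duplicate than the mean; an ordinary least-squares line is the simplest choice here, and a weighted or robust fit could be substituted.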
In other applications, simulation and regression techniques
have been applied to evaluation of matrix correction and
interference effects (Howarth 1973d; Thompson et al. 1979)
and to the comparison of analytical accuracy between analytical
methods (Thompson 1982). More recently, robust ANOVA
has been used to determine the magnitude of analytical variance
in relation to other sources of variance in geochemical data
(Ramsey et al. 1992).
When John Webb initiated the pioneering series of multi-element, multi-purpose geochemical atlases in the mid-1960s,
there was an inevitable trade-off between the analytical method
used, expected analytical precision, and rapidity of turn-round;
this was not what traditional geochemists were used to, and the
matter proved controversial. AGRG staff had to justify this
new approach (Howarth & Lowenstein 1971, 1972; Webb &
Thompson 1977; Webb et al. 1978). Even today, despite
considerable advances in analytical methods, such a fitness-for-purpose approach to analysis requires explanation (Thompson
& Fearn 1996; Fearn et al. 2002).
Looking back now, it is probably impossible for younger
geochemists to realise just how difficult it was, not only to
implement many of the statistical techniques, where we were
breaking new ground in applied geochemistry, but to convince
potential users of the utility of the results. In a broader
perspective, Garrett et al. (2008) reviewed the development of
international geochemical mapping to date; it is pleasing to
think that AGRG pioneered many of the methods that subsequently became adopted.
The development and implementation of the computer-based
methods over the years described here was enabled by many bodies.
We principally have to thank the Department of Scientific and
Industrial Research and its successor, the Natural Environment
Research Council in Britain for their support to AGRG over many
years; other contributions have come from the Anglo American
Corporation (South Africa) Ltd.; the Institute of Geological
Sciences/British Geological Survey; Roan Selection Trust Technical
Services; Sierra Leone Geological Survey; Ministerio de Economia,
Industria y Comercio de Costa Rica; Wolfson Foundation; and the
U.S. Department of Energy, National Aeronautics and Space
Administration, and Rome Air Development Center (New York).
We are grateful to them all for their assistance, whether through
research contracts, support for studentships, or other help. The
authors are most grateful to the Editor, Gwendy Hall, and the
Association of Applied Geochemists for their assistance with the
funding of the colour illustrations in this paper.
REFERENCES
Ahrens, L.H. 1954. The log-normal distribution of the elements (A fundamental law of geochemistry and its subsidiary). Geochimica et Cosmochimica Acta, 5, 49–73; 6, 121–131.
Aitchison, J. 1982. The statistical analysis of compositional data. Journal of the Royal Statistical Society, B44, 139–177.
Aitchison, J. 1986. The Statistical Analysis of Compositional Data. Chapman and Hall, London and New York.
Akima, H. 1978a. A method of bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Transactions on Mathematical Software, 4, 148–159.
Akima, H. 1978b. Algorithm 526. Bivariate interpolation and smooth surface fitting for irregularly distributed data points. ACM Transactions on Mathematical
Akima, H. 1979. Remark on Algorithm 526. ACM Transactions on Mathematical
Allen, P. & Krumbein, W.C. 1962. Secondary trend components in the Top Ashdown Pebble Bed: A case history. Journal of Geology, 70, 507–538.
ANALYTICAL METHODS COMMITTEE 2002. A simple fitness-for-purpose control chart based on duplicate results obtained from routine test materials. Analytical Methods Committee Technical Brief no. 9, at the website: www.rsc.org/Membership/Networking/InterestGroups/Analytical/AMC/TechnicalBriefs.asp
Andrews, D.F., Bickel, P.J., Hampel, F.R., Huber, P.J., Rogers, W.H. & Tukey, J.W. 1972. Robust estimates of location. Survey and advances. Princeton University Press, Princeton, NJ.
Appleton, J.D., Rawlins, B.G. & Thornton, I. 2008. National scale estimation of potentially harmful elements background concentrations in topsoil using parent material classified soil:stream sediment relationships. Applied Geochemistry, 23, 2596–2611.
Armour-Brown, A. 1971. Provincial and regional geochemical studies in Zambia. Unpublished PhD thesis, University of London, UK.
Armour-Brown, A. & Nichol, I. 1970. Regional geochemical reconnaissance and the location of metallogenic provinces. Economic Geology, 65, 312–330.
Ball, G.H. & Hall, D.J. 1965. ISODATA, a novel method of data analysis and pattern classification. Stanford Research Institute, Menlo Park, CA. Research Report, AD-699616, April 1965.
Bergthorsson, P. & Döös, B.R. 1955. Numerical weather map analysis. Tellus, 7, 1660.
Bhattacharyya, A. 1943. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, 35, 99–109.
Box, G.E.P. & Cox, D.R. 1964. An analysis of transformations. Journal of the Royal Statistical Society, B26, 211–252.
BRITISH GEOLOGICAL SURVEY 1991. Regional geochemistry of the East Grampians area. British Geological Survey, Keyworth, Nottingham.
Buccianti, A., Mateu-Figueras, G. & Pawlowsky-Glahn, V. (eds) 2006. Compositional Data Analysis in the Geosciences. From Theory to Practice. Geological Society, London, Special Publication, 264.
Campbell, N.A. 1979. Canonical variate analysis: Some practical aspects. Unpublished PhD thesis, University of London, UK.
Campbell, N.A. 1980. Robust procedures in multivariate analysis. I. Robust covariance estimation. Applied Statistics, 29, 231–237.
Campbell, N.A. 1982. Statistical treatment of geochemical data. In: Smith, R.E. (ed.) Geochemical Exploration in Deeply Weathered Terrain. CSIRO Institute of Energy and Earth Resources, Floreat Park, WA, 141–144.
Castillo-Muñoz, R. 1973. Application of discriminant and cluster analysis to regional geochemical surveys. Unpublished PhD thesis, University of London, UK.
Castillo-Muñoz, R. & Howarth, R.J. 1976. Application of the empirical discriminant function to regional geochemical data from the United Kingdom. Bulletin of the Geological Society of America, 87, 1567–1581.
Ceruzzi, P.E. 1998. A history of modern computing. MIT Press, Cambridge, MA.
Chayes, F. 1971. Ratio Correlation. A Manual for Students of Petrology and Geochemistry. The University of Chicago Press, Chicago and London.
Chernoff, H. 1973. The use of faces to represent points in K-dimensional space graphically. Journal of the American Statistical Association, 68, 361–368.
C, R. & K, P.G. 1976. Digital image processing on a microfilm plotter. Unpublished report, University of London Computer Centre, London.
C, R.N. 1986. A statistical appraisal of regional geochemical data from the south-west Pacific for mineral exploration. Unpublished PhD thesis, University of London, UK.
C, R.N. & Cronan, D.S. 1987. A statistical evaluation of geochemical data in regard to bedrock and placer mineral exploration in the S.W. Pacific. Marine Mining, 6, 205–221.
Craven, C.A.U. 1954. Statistical estimation of the accuracy of assaying. Transactions of the Institution of Mining & Metallurgy, London, 63, 551–563.
Crisp, D.A. 1974. Application of multivariate methods to regional geochemistry: the evaluation of a new technique. Unpublished MSc thesis, University of London, UK.
David, M. 1977. Geostatistical ore reserve estimation. Elsevier, Amsterdam.
D, P.R. 1983. Geochemical applications of ridge regression for tin-mineralised granitoids. Unpublished MSc thesis, University of London, UK.
Davis, J.C. & Sampson, R.J. 1973. Statistics and data analysis in geology. John Wiley & Sons, New York.
Dixon, W.J. & Massey, F.J. 1957. Introduction to Statistical Analysis. 2nd edition. McGraw-Hill Book Co, New York.
D, J.R.V. 1975. Variability in some stream sediment geochemical data from Australia. Unpublished PhD thesis, University of London, UK.
Earle, S.A.M. 1982. Geological interpretation of the geochemistry of stream sediments, waters and soils in the Bristol district, with particular reference to the Mendip Hills, Somerset. Unpublished PhD thesis, University of London, UK.
Earle, S.A.M. 1978. Spatial presentation of data from regional geochemical stream surveys. Transactions of the Institution of Mining and Metallurgy, London, B87, 61–65.
Fearn, T., Fisher, S.A., Thompson, M. & Ellison, S.L. 2002. A decision theory approach to fitness for purpose in analytical measurement. The Analyst, 127, 818–824.
F, T.L. 1975. Feature selection in multiclass pattern recognition. Unpublished MSc thesis, University of London, UK.
Garrett, R.G. 1966. Regional Geochemical Reconnaissance of Eastern Sierra Leone. Unpublished PhD thesis, University of London, UK.
Garrett, R.G. 1967. Two programs for the factor analysis of geologic and remote sensing data. National Aeronautics and Space Administration, Northwestern University Report, 12.
Garrett, R.G. 1969. The determination of sampling and analytical errors in exploration geochemistry. Economic Geology, 64, 568–569.
Garrett, R.G. 1983. Opportunities for the 80s. Mathematical Geology, 15, 385–398.
Garrett, R.G. & Nichol, I. 1967. Regional geochemical reconnaissance in eastern Sierra Leone. Transactions of the Institution of Mining & Metallurgy, London, B76, 97–B112.
Garrett, R.G. & Nichol, I. 1969. Factor analysis as an aid in the interpretation of regional geochemical stream sediment data. In: Canney, F.C. (ed.) Proceedings of the International Geochemical Exploration Symposium (April 17–20, 1968, Colorado School of Mines, Golden, Colorado). Quarterly of the Colorado School of Mines, 64, 245–264.
Garrett, R.G., Reimann, C., Smith, D.B. & Xie, X. 2008. From geochemical prospecting to international geochemical mapping: a historical overview. Geochemistry: Exploration, Environment, Analysis, 8, 205–217.
Glasby, G.P., Tooms, J.S. & Howarth, R.J. 1974. Geochemistry of manganese concretions from the northwest Indian Ocean. New Zealand Journal of Science, 17, 387–407.
Govett, G.J.S. 2010. Early years in the Geochemical Prospecting Research Centre, Imperial College of Science and Technology, London: exploration geochemistry in Zambia in the late 1950s; a personal recollection. Geochemistry: Exploration, Environment, Analysis, 10, 237–249.
Harden, G. 1962. Geochemical dispersion patterns and their relation to bedrock geology in the Nyawa area, N. Rhodesia. Unpublished PhD thesis, University of London, UK.
Hawkes, H.E. & Webb, J.S. 1962. Geochemistry in Mineral Exploration. Harper & Row, New York.
Hazelhoff-Roelfzema, B.H. 1968. Geochemical dispersion of tin in marine sediments. Mount's Bay, Cornwall. Unpublished PhD thesis, University of London, UK.
Healy, M.J.R. 1968. Multivariate normal plotting. Applied Statistics, 17, 157–161.
Helsel, D.R. 2005. Nondetects And Data Analysis. Statistics for Censored Environmental Data. Wiley-Interscience, John Wiley, Hoboken, NJ.
Helsel, D.R. & Hirsch, R.M. 1992. Statistical Methods in Water Resources. Studies in Environmental Science, 49. Elsevier, Amsterdam, London and New York.
Hoerl, A.E. & Kennard, R.W. 1970. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12, 55–67, 69–82.
Holmes, W.S. 1966. Automatic photointerpretation and target location. Proceedings of the IEEE, 54, 1679–1686.
Hotelling, H. 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441, 498–520.
Howarth, R.J. 1971a. FORTRAN IV program for grey-level mapping of spatial data. Mathematical Geology, 3, 95–121.
Howarth, R.J. 1971b. An empirical discriminant method applied to sedimentary rock classification from major-element geochemistry. Mathematical Geology, 3, 51–60.
Howarth, R.J. 1971c. Empirical discriminant classification of regional stream-sediment geochemistry in Devon and east Cornwall. Transactions of the Institution of Mining and Metallurgy, London, B80, 142–149.
Howarth, R.J. 1972. Empirical discriminant classification of regional stream-sediment geochemistry in Devon and east Cornwall. Discussion. Transactions of the Institution of Mining and Metallurgy, London, B81, 115–119.
Howarth, R.J. 1973a. FORTRAN IV programs for empirical discriminant classification of spatial data. Geocom Bulletin, 6, 1–31.
Howarth, R.J. 1973b. Preliminary assessment of a nonlinear mapping algorithm in a geological context. Mathematical Geology, 5, 39–57.
Howarth, R.J. 1973c. The pattern recognition problem in applied geochemistry. In: Jones, M.J. (ed.) Geochemical Exploration 1972. Institution of Mining and Metallurgy, London, 259–273.
Howarth, R.J. 1973d. Monte Carlo simulation of matrix correlation effects. The Analyst, 98, 777–781.
Howarth, R.J. 1974. The impact of pattern recognition methodology in geochemistry [Abstract]. Proceedings of the Second Joint Conference on Pattern Recognition. Copenhagen, August 1974, 411–412.
Howarth, R.J. 1977. Approximate levels of significance for the cos theta coefficient. Computers & Geosciences, 3, 25–30.
Howarth, R.J. 1983. Mapping. In: Howarth, R.J. (ed.) Statistics and data analysis in geochemical prospecting. Elsevier, Amsterdam, 111–205.
Howarth, R.J. 1984. Statistical applications in geochemical prospecting: A survey of recent developments. Journal of Geochemical Exploration, 21, 41–61.
Howarth, R.J. 2004. Not just a petrographer: The life and work of Felix Chayes (1916–1993). Earth Sciences History, 23, 343–364.
Howarth, R.J., Cronan, D.S. & Glasby, G.P. 1977. Non-linear mapping of regional geochemical variability of manganese nodules in the Pacific Ocean. Transactions of the Institution of Mining and Metallurgy, London, B86, 4–8.
Howarth, R.J. & Earle, S.A.M. 1979. Application of a generalised power transform to geochemical data. Mathematical Geology, 11, 45–58.
Howarth, R.J. & Johnson, R.W. 1977. Multi-element trends of variation of the South Bismarck Sea rocks as shown by the nonlinear mapping algorithm. In: Johnson, R.W. (ed.) Distribution and major-element chemistry of late Cainozoic volcanoes at the southern margin of the Bismarck Sea, Papua New Guinea. Australian Bureau of Mineral Resources, Canberra, 162–170.
Howarth, R.J. & Koch, G.S. jr 1986. Problems of using rock-volume data in predictive resource studies. Economic Geology, 81, 617–626.
Howarth, R.J., Koch, G.S., Chork, C.Y., Carpenter, R.H. & Schuenemeyer, J.H. 1980. Statistical map analysis techniques applied to regional distribution of uranium in stream sediment samples from the southeastern United States for the National Uranium Resource Evaluation program. Mathematical Geology, 12, 339–366.
Howarth, R.J., Koch, G.S. jr, Plant, J.A. & Lowery, R.K. 1981. Identification of uraniferous granitoids in the USA using stream sediment geochemical data. Mineralogical Magazine, 44, 455–470.
Howarth, R.J. & Lowenstein, P.L. 1971. Sampling variability of stream-sediments in broad-scale geochemical reconnaissance. Transactions of the Institution of Mining and Metallurgy, London, B80, 363–372.
Howarth, R.J. & Lowenstein, P.L. 1972. Sampling variability of stream-sediments in broad-scale geochemical reconnaissance. Discussion. Transactions of the Institution of Mining and Metallurgy, London, B81, 122–124.
Howarth, R.J. & Lowenstein, P.L. 1974. Data Processing for the Provisional Geochemical Atlas of Northern Ireland. Applied Geochemistry Research Group, Imperial College of Science and Technology, London. Technical Communication, 61.
Howarth, R.J. & Lowenstein, P.L. 1976. Three-component colour maps from lineprinter output. Transactions of the Institution of Mining and Metallurgy, London, B85, 234–237.
Howarth, R.J. & Thompson, M. 1976. Duplicate analysis in geochemical practice. II. Examination of the proposed method and examples of its use. The Analyst, 101, 699–709.
Huber, P.J. 1981. Robust statistics. John Wiley, New York.
Imbrie, J. 1963. Factor and vector analysis programs for analysing geologic data. Office of Naval Research, Geography Branch. Northwestern University, Evanston, Illinois. Technical Report no. 6. ONR Task no. 389-135.
Imbrie, J. & Purdy, E.G. 1962. Classification of modern Bahamian carbonate sediments. In: Ham, W.E. (ed.) Classification of carbonate rocks: a symposium. Memoir 1, American Association of Petroleum Geologists, Tulsa, OK, 253–272.
INSTITUTE OF GEOLOGICAL SCIENCES 1978. Geochemical atlas of Great Britain: Shetland Islands. Institute of Geological Sciences [British Geological Survey], London.
James, C.H. 1970. A rapid method for calculating the statistical precision of geochemical prospecting analyses. Transactions of the Institution of Mining and Metallurgy, London, B79, 88–89.
Jeffreys, H. 1946. An invariant form for the prior probability in estimations problems. Proceedings of the Royal Society, London, A186, 453–461.
Jones, T.A. 1972. Multiple regression with correlated independent variables. Mathematical Geology, 4, 203–218.
Kailath, T. 1967. The Divergence and Bhattacharyya distance measures in signal selection. IEEE Transactions on Communication Technology, 15, 52–60.
Kaiser, H.F. 1958. The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187–200.
Khaleelee, J. 1969. The application of some data processing techniques to the interpretation of geochemical data. Unpublished PhD thesis, University of London, UK.
Kleiner, B. & Hartigan, J.A. 1981. Representing points in many dimensions by trees and castles. Journal of the American Statistical Association, 76, 260–269.
Koch, G.S. Jr, Howarth, R.J., Carpenter, R.H. & Schuenemeyer, J.H. 1979. Development of data enhancement and display techniques for stream-sediment data collected in the National Uranium Resource Evaluation Program of the United States Department of Energy. U.S. Department of Energy, Grand Junction, Colorado. Open-file Report, GJBX-28(80).
Koch, G.S. Jr, Howarth, R.J. & Schuenemeyer, J.H. 1981a. Uranium resource assessment through statistical analysis of exploration geochemical and other data. Final Report. U.S. Department of Energy, Grand Junction, Colorado. Open-file Report, GJBX-140(81).
Koch, G.S. Jr, Howarth, R.J., Schuenemeyer, J.H. & Lowery, R.K. 1981b. Uranium resource assessment through statistical analysis of exploration geochemical and other data. Economic Geology, 76, 1056–1066.
Kolmogorov, A.N. 1933. Sulla determinazione empirica di una legge di distribuzione. Giornale dell'Istituto Italiano degli Attuari, Rome, 4, 83–91.
Krumbein, W.C. 1959. Trend surface analysis of contour-type maps with irregular control-point spacing. Journal of Geophysical Research, 64, 823–834.
Krumbein, W.C. & Graybill, F.A. 1965. An introduction to statistical models in geology. McGraw-Hill Book Co, New York.
Krumbein, W.C. & Imbrie, J. 1963. Stratigraphic factor maps. Bulletin of the American Association of Petroleum Geologists, 47, 698–701.
Leech, A.F. 1984. The application of regional geochemistry to the causes and predicted incidence of bovine hypocupraemia. Unpublished PhD thesis, University of London, UK.
Leech, A., Thornton, I., Howarth, R.J. & Lewis, G. 1983. The incidence of bovine hypocupraemia in England and Wales and its relationship with geochemistry. In: Suttle, N.F., Gunn, R.G., Allen, W.M., Linklater, K.A. & Wiener, G. (eds) Trace elements in animal production and veterinary practice. British Society of Animal Production. Occasional paper, 7, 130–131.
Li, X.D., Lee, S.L., Wong, S.C., Shi, W.Z. & Thornton, I. 2004. The study of metal contamination in urban soils of Hong Kong using a GIS-based approach. Environmental Pollution, 129, 113–124.
Lowenstein, P.L. & Howarth, R.J. 1973. Automated colour mapping of three-component systems and its application to regional geochemical reconnaissance. In: Jones, M.J. (ed.) Geochemical Exploration 1972. Institution of Mining and Metallurgy, London, 297–304.
Mancey, S.J. 1980. Computer-based interpretation of large regional geochemical data sets. Unpublished PhD thesis, University of London, UK.
Mancey, S.J. & Howarth, R.J. 1978. Factor score maps of regional geochemical data from England and Wales. Applied Geochemistry Research Group, Imperial College of Science, Technology and Medicine, London, 2 sheets.
Mancey, S.J. & Howarth, R.J. 1980. Power transform removal of skewness from large data sets. Transactions of the Institution of Mining and Metallurgy, London, B89, 92–97.
Manson, V. & Imbrie, J. 1964. FORTRAN program for factor and vector analysis of geologic data using an IBM 7090 or 7094/1401 computer system. Kansas Geological Survey, Lawrence, KS. Special Distribution Publication, 13.
Matheron, G. 1963. Principles of geostatistics. Economic Geology, 58, 1246–1266.
McCue, G.A. 1963. Optimization by function contouring techniques. Space and Information Systems Division, North American Aviation Inc., Downey, CA. Report SID 63-171.
McCue, G.A. & DuPrie, H.J. 1965. Improved FORTRAN IV function contouring program. Space and Information Systems Division, North American Aviation Inc., Downey, CA. Report SID 65-672.
Miller, R.L. 1956. Trend surfaces: their application to analysis and description of environments of sedimentation. I. The relation of sediment-size parameters to current-wave systems and physiography. Journal of Geology, 64, 425–466.
Miller, R.L. & Kahn, J.S. 1962. Statistical Analysis in the Geological Sciences. John Wiley & Sons, New York, USA.
Moorby, S.A., Howarth, R.J., Smith, P.A. & Cronan, D.S. 1987. An investigation of the applicability of trend surface analysis to marine exploration geochemistry. In: Teleki, P.G., Dobson, M.R., Moore, J.R. & von Stackelberg, U. (eds) Marine Minerals: Advances in Research and Resource Assessment (NATO ASI series). Series C: Mathematical and Physical Sciences, 194. D. Reidel, Dordrecht, 559–576.
Moroney, M.J. 1960. Facts from Figures. Penguin Books, Harmondsworth.
Muerle, J.L. & Allen, D.C. 1968. Experimental evaluation of techniques for automatic segmentation of objects in a complex scene. In: Cheng, G.C., Ledley, R.S., Pollock, D.K. & Rosenfeld, A. (eds) Pictorial pattern recognition. Thompson Book Co, New York, 3–13.
Nichol, I., Garrett, R.G. & Webb, J.S. 1966a. Studies in regional geochemistry. Transactions of the Institution of Mining & Metallurgy, London, B75, 106–107.
Nichol, I., Garrett, R.G. & Webb, J.S. 1966b. Automatic data plotting and mathematical and statistical interpretation of geochemical data. In: Cameron, E.M. (ed.) Proceedings of the Symposium on Geochemical Prospecting, Ottawa, April, 1964. Geological Survey of Canada Paper 66-54, 195–210.
Nichol, I., Garrett, R.G. & Webb, J.S. 1969. The role of some statistical and mathematical methods in the interpretation of regional geochemical data. Economic Geology, 64, 204–220.
Nichol, I., Thornton, I., Webb, J.S., Fletcher, W.K., Horsnail, R.F., Khaleelee, J. & Taylor, D. 1970a. Regional geochemical reconnaissance of the Derbyshire area. Report 70/2. Institute of Geological Sciences [British Geological Survey], London.
Nichol, I., Thornton, I., Webb, J.S., Fletcher, W.K., Horsnail, R.F., Khaleelee, J. & Taylor, D. 1970b. Regional geochemical reconnaissance of the Denbighshire area. Report 70/8. Institute of Geological Sciences
Nichol, I., Thornton, I., Webb, J.S., Fletcher, W.K., Horsnail, R.F. & Khaleelee, J. 1971. Regional geochemical reconnaissance of the Devon and North Cornwall area. Report 71/2. Institute of Geological Sciences
Nichol, I. & Webb, J.S. 1967. The application of computerised mathematical and statistical procedures to the interpretation of geochemical data. Proceedings of the Geological Society of London, 1642, 186–199.
Parks, J.M. 1970. FORTRAN IV program for Q-mode cluster analysis on distance function with printed dendrogram. Computer Contribution no. 46. Kansas Geological Survey, Lawrence, KS.
Ramsey, M.H., Thompson, M. & Hale, M. 1992. Objective evaluation of precision requirements for geochemical analysis using robust analysis of variance. Journal of Geochemical Exploration, 44, 23–36.
Rhind, D.W., Shaw, M.A. & Howarth, R.J. 1973. Experimental geochemical maps – a case study in cartographic techniques for scientific research. The Cartographic Journal, 10, 112–118.
Ruan, T. 1981. Some new approaches in vapour geochemistry. Unpublished PhD thesis, University of London, UK.
Ruan, T., Howarth, R.J. & Hale, M. 1985a. Numerical modelling experiments in vapour geochemistry. I: Method and FORTRAN program. Computers & Geosciences, 11, 55–67.
Ruan, T., Hale, M. & Howarth, R.J. 1985b. Numerical modelling experiments in vapour geochemistry. II: Vapour dispersion patterns and exploration implications. Journal of Geochemical Exploration, 23, 265–280.
Ryan, T.A. jr, Joiner, B.L. & Ryan, B.F. 1976. MINITAB student handbook. Duxbury Press, North Scituate, MA.
Sammon, J.W. jr 1969. A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C18, 401–409.
Shepard, D. 1968. A two-dimensional interpolation function for irregularly-spaced data. In: Proceedings of the 23rd National Conference of the Association for Computing Machinery. Brandon/Systems Press, Princeton, NJ, 517–523.
Shewhart, W.A. 1931. The Economic control of manufactured product. D. Van Nostrand Company, New York and London.
Smirnov, V.I. 1939. On the estimation of the discrepancy between empirical curves of distribution for two independent samples. Bulletin Mathématique de l'Université de Moscou, 2, fasc. 2.
Spearman, C. 1904. General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201–293.
Specht, D.F. 1967. Generation of polynomial discriminant functions for pattern recognition. IEEE Transactions on Electronic Computers, EC16, 308–319.
Stanton, R.E. 1966. Rapid methods of trace analysis for geochemical applications. Edward Arnold, London.
Stanley, C.R. 2006. On the special application of Thompson-Howarth error analysis to geochemical variables exhibiting a nugget effect. Geochemistry: Exploration, Environment, Analysis, 6, 357–368.
Stanley, C.R. & Lawie, D. 2008. Thompson-Howarth error analysis: unbiased alternatives to the large-sample method for assessing non-normally distributed measurement error in geochemical samples. Geochemistry: Exploration, Environment, Analysis, 8, 173–182.
S, J.E. 1959. A statistical problem in geochemical prospecting. Unpublished MSc thesis, Imperial College, University of London, UK.
Student [W. S. Gosset] 1908. Probable error of a correlation coefficient. Biometrika, 6, 302–310.
Summerhayes, C.P. 1971. Phosphate deposits on the northwest African continental shelf and slope. Unpublished PhD thesis, University of London, UK.
Summerhayes, C.P. 1972. Geochemistry of continental margin sediments from northwest Africa. Chemical Geology, 10, 137–156.
Thurstone, L.L. 1947. Multiple factor analysis. University of Chicago Press, Chicago.
Thompson, M. 1978. DUPAN3, a subroutine for the interpretation of duplicated data in geochemical analysis. Computers & Geosciences, 4, 333–340.
Thompson, M. 1981. Quality control in the laboratory. In: Fletcher, W.K. (ed.) Analytical methods in geochemical prospecting. Handbook of Exploration Geochemistry, 1, 25–46.
Thompson, M. 1982. Regression methods in the comparison of accuracy. The Analyst, 107, 1169–1180.
Thompson, M. 1983. Control procedures in geochemical analysis. In: Howarth, R.J. (ed.) Statistics and data analysis in geochemical prospecting. Handbook of Exploration Geochemistry, 2, 39–58.
Thompson, M. 1988. Variation of precision with concentration in an analytical system. The Analyst, 113, 1579–1587.
Thompson, M. 2010. Analytical methodology in the Applied Geochemistry Research Group (1950–1988) at the Imperial College of Science and Technology, London. Geochemistry: Exploration, Environment, Analysis, 10, 251–259.
Thompson, M. & Fearn, T. 1996. What exactly is fitness for purpose in analytical measurement? The Analyst, 121, 275–278.
Thompson, M. & Howarth, R.J. 1973. The rapid estimation and control of precision by duplicate determinations. The Analyst, 98, 153–160.
Thompson, M. & Howarth, R.J. 1976. Duplicate analysis in geochemical practice. I. Theoretical approach and estimation of analytical reproducibility. The Analyst, 101, 690–698.
Thompson, M. & Howarth, R.J. 1978. A new approach to the estimation of analytical precision. Journal of Geochemical Exploration, 9, 23–30.
Thompson, M. & Howarth, R.J. 1980. The frequency distribution of analytical error. The Analyst, 105, 1188–1195.
Thompson, M., Walton, S.J. & Wood, S.J. 1979. Statistical appraisal of interference effects in the determination of trace elements by atomic-absorption spectrophotometry in applied geochemistry. The Analyst, 104, 299–312.
Thornton, I. 2010. Research in Applied Environmental Geochemistry, with particular reference to Geochemistry and Health. Geochemistry: Exploration, Environment, Analysis, 103, 000–000.
Thums, C.R. 2003. Geochemical associations and the spatial distribution of metals in urban soils. Unpublished PhD thesis, University of London, UK.
Thums, C. & Farago, M.E. 2001. Investigating urban geochemistry using geographical information systems. Science Progress, 84, 183–204.
Tooms, J.S. 1967. The inorganic mineral potential of the sea-floor and problems in its exploration. In: Proceedings of the British National Conference on the Technology of the Sea and Seabed held at the Atomic Energy Research Establishment, Harwell, April 5th, 6th and 7th, 1967; sponsored by the Ministry of Technology. United Kingdom Atomic Energy Authority (Research Group), Harwell, Didcot, Berks. Report AERE-R 5500. Her Majesty's Stationery Office, London. Paper SB16, 1–20.
Tristán, E., Demetriades, A., Ramsey, M.H., Rosenbaum, M.S., Stavrakis, P., Thornton, I., Vassiliades, E. & Vergou, K. 2000. Spatially resolved hazard and exposure assessments: an example of lead in soil at Lavrion, Greece. Environmental Research, A82, 33–45.
Tristán-Montero, E.E. 2000. Human health risk assessment for contaminated land in historical mining areas. Unpublished PhD thesis, University of London, UK.
Tukey, J.W. 1977. Exploratory data analysis. Addison-Wesley, Reading, MA [preliminary edition printed for private circulation, 1970].
Turner, M.St.J. 1980. A comparative study of multiple regression techniques in geochemistry. Unpublished MSc thesis, University of London, UK.
Turner, M.St.J. 1986. Statistical analysis of geochemical data illustrated by reference to the Dalradian of N.E. Scotland. Unpublished PhD thesis, University of London, UK.
Ward, J.H. 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236–244.
Webb, J.S. 1964. Geochemistry and life. New Scientist, 23, 504–507.
Webb, J.S. & Atkinson, W.J. 1965. Regional geochemical reconnaissance applied to some agricultural problems in Co. Limerick, Eire. Nature, 208, 1056–1059.
Webb, J.S. & Thompson, M. 1977. Analytical requirements in exploration geochemistry. Pure and Applied Chemistry, 49, 1507–1518.
Webb, J.S., Fortescue, J., Nichol, I. & Tooms, J.S. 1964a. Regional geochemical reconnaissance in the Namwala Concession area Zambia. To accompany the Geochemical Maps of the Namwala Concession Area published by the Geological Survey of Zambia in 1964. Applied Geochemistry Research Group, Imperial College of Science and Technology, London. Technical Communication no. 47.
Webb, J.S., Fortescue, J. et al. 1964b. Regional geochemical maps of the Namwala Concession area, Zambia based on a Reconnaissance Stream Sediment Survey. Geological Survey of Zambia, Zambia.
Webb, J.S., Nichol, I., Foster, R., Lowenstein, P.L. & Howarth, R.J. 1973. Provisional Geochemical Atlas of Northern Ireland. Applied Geochemistry Research Group, Imperial College of Science and Technology, London. Technical Communication, 60.
Webb, J.S., Thornton, I., Thompson, M., Howarth, R.J. & Lowenstein, P.L. 1978. The Wolfson Geochemical Atlas of England and Wales. Clarendon Press, Oxford and London.
Whitten, E.H.T. 1959. Composition trends in granite: Modal variations and ghost stratigraphy in part of the Donegal Granite, Eire. Journal of Geophysical Research, 64, 835–848.
Whitten, E.H.T. 1963. A surface-fitting program suitable for testing geological models which involve areally-distributed data. Office of Naval Research, Geography Branch. Northwestern University, Evanston, Illinois. Technical Report No. 2, ONR Task No. 389-135, Contract No. 1228(26).
Received 29 July 2008; revised typescript accepted 8 April 2009.


factor and classical linear discriminant analysis) began to be

R. J. Howarth & R. G. Garrett

Fig. 19. (right) Q-mode weighted pair

Statistical analysis and data display at GPRC and AGRG

Fig. 20. Nonlinear mapping onto 2-dimensions of the geochemistry