
NUMERICAL ANALYSIS OF

BIOLOGICAL AND
ENVIRONMENTAL DATA
Lecture 1
Introduction
John Birks
TEACHING OF THE COURSE
Course Leader Gavin Simpson (UCL)
Lectures 1, 4, 5, 8, 10, 12 John Birks (Bergen & UCL)
Lectures 2, 3, 6, 7, 9, 11 Gavin Simpson (UCL)
Practicals 1-10 Gavin Simpson (UCL)

Course administration Adam Young (UCL)
Book list
Level of course
Aims of course
What are multivariate data?
What is multivariate data analysis?
Aims of multivariate data analysis
Why do multivariate data analysis?
Terminology
Types of variables
Geometrical models and concept of similarity
(dissimilarity or distance)
Computing
Course topics
INTRODUCTION
Approach from practical biological and geological
viewpoint, not statistical theory viewpoint.

Assume no background in matrix algebra,
eigenanalysis, or statistical theory.

Emphasis on techniques that are ecologically realistic
and useful and that are computationally feasible.
LEVEL OF THE COURSE
Truths which can be proved can also be
known by faith. The proofs are difficult and
can only be understood by the learned; but
faith is necessary also to the young, and to
those who, from practical preoccupations,
have not the leisure to learn. For them,
revelation suffices.

Bertrand Russell 1946
The History of Western Philosophy
It cannot be too strongly emphasised that a long
mathematical argument can be fully understood on first
reading only when it is very elementary indeed, relative
to the reader's mathematical knowledge. If one wants
only the gist of it, he may read such material once only,
but otherwise he may expect to read it at least once
again. Serious reading of mathematics is best done sitting
bolt upright on a hard chair at a desk. Pencil and paper
are indispensable.
L Savage 1972
The Foundations of Statistics.
BUT:
A journey of a thousand miles begins with a single step
Lao Tsu
STATUS OF MULTIVARIATE NUMERICAL DATA
ANALYSIS
Basic mathematics of correlation, regression, analysis of variance,
eigenanalysis, randomisation etc. not new, worked out in the 1920s-1930s.
Arithmetic manipulations and calculations involved so numerous and so
time consuming; virtually impossible to work with anything other than
smallest data-sets on hand calculator or early computer.
Development of numerical data analysis closely linked to development of
computers.
Now possible to do in seconds what would have taken hours, days, even
weeks.
Increased availability of computer program packages has advantages and
disadvantages.


Advantages
fast
painless
simple
Disadvantages
too fast
too easy
too simple
Need to understand a technique
well before one can critically
evaluate results. Sound
interpretation requires a good
understanding of the technique.
AIMS
Provide introductory understanding of the most appropriate
methods for the numerical analysis of complex multivariate
biological and environmental data. Recent maturation of
methods.
Provide introduction to what these methods do and do not
do.
Provide some guidance as to when and when not to use
particular methods.
Provide an outline of major assumptions, limitations,
strengths, and weaknesses of different methods.
Indicate to you when to seek expert advice.
Encourage numerical thinking (ideas, reasons, potentialities
behind the techniques). Not so concerned here with numerical
arithmetic (the numerical manipulations involved).
ON THE USES AND METHODS OF
STATISTICS
By Professor F. Y. Edgeworth, M. A., D. C. L.
Syllabus for Edgeworth's 1892 Newmarch Lectures,
University College London
I. FIRST PRINCIPLES
The extent of the subject here treated is that which is denoted by two leading definitions of statistics, viz: the study
of numerical statements relating to society, and the theory of means. The subject may be divided according as the
element of induction is more or less prevalent. First come general directions as to the acquisition of data; e.g., that
figures should be accurate, and terms unambiguous. Examples of the violation of these rules; together with other
precepts and cautions. Use of relative figures (per head, per cent, &c.). Analysis of the data.
References: Conférences sur la Statistique (Rozier Editeur), 1891; Pidgin, Practical Statistics, 1888; Giffen,
International Statistical Comparisons, Economic Journal, June, 1892.
II. GRAPHICAL METHODS
The Cartesian system of co-ordinates. Integration and interpolation. Case where several dependent variables (i.e.
diseases from different causes) are referred to one independent variable (i.e. the time). The case of one variable
dependent on two independent variables is properly represented by a surface; but curves of level and variously
coloured planes are more convenient. Methods of expressing variation of a quantity relative to its initial, or average,
value. Miscellaneous devices for exhibiting numerical relations to the eye.
References: Marey, La Méthode Graphique, 1885; Favaro, Leçons de Statique Graphique (translated into French by
Terrier), Ch. V. with appendix by the translator. Levasseur, La Statistique Graphique, Journal of the Statistical
Society, Jubilee vol., 1885; Marshall, The Graphic Method of Statistics, Ibid; Cheysson, Les Cartogrammes à
teintes graduées, Journal de la Société de Statistique de Paris, 1887; Scribner's Statistical Atlas of the United
States; Longstaff, Studies in Statistics, 1891.
III. THE DOCTRINE OF AVERAGES
The general idea of a mean comprehends innumerable species, of which the most important are, the Arithmetic
Mean, the Median, the Greatest Ordinate (or centre of greatest condensation) and the Geometric Mean. A cross
division is between simple and weighted means. Concrete instances of these varieties. Subtle distinction between
so-called objective and subjective means. Peculiar prestige attaches to the means of which the constituents are
grouped according to the Probability Curve, or law of error. A priori demonstration, and empirical verification, that
this form arises under certain conditions.
References: Venn, Logic of Chance, Third Edition, 1888, chaps. xviii. and xix.; On Averages, Journal of the
Statistical Society, 1891; Galton, Statistics by inter-comparison, Philosophical Magazine, 1875; Bertillon, Moyenne,
Dictionnaire Encyclopédique des Sciences Médicales; Edgeworth, On the Choice of Means, Phil. Mag., 1887, On
the empirical proof of the law of error, Ib., 1887.
IV. TYPES AND CORRELATIONS
The mean man has for stature, length of cubit, height of knee, &c, the respective means of the statures, lengths,
&c., of a greater number of men. Reply of the objection that such a combination of partial means may not form a
possible whole. Relation between the deviation of one organ or attribute, e.g. length of cubit, from its mean; as
established by Mr. Galton, and illustrated by Mr. H. Dickson. Abridged method of ascertaining the co-efficient
which expresses the correlation between three attributes, e.g. stature, length of cubit and height of knee. The
formula for the most probable attribute, e.g. stature corresponding to assigned values of two other attributes, e.g.
length of cubit and height of knee, may be ascertained either from three simple correlations, between stature and
cubit, stature and height of knee, cubit and height of knee; or by observations special to the case of three variables.
Correlation between any number of attributes.
References: Quetelet, Anthropométrie; Galton, Family Likeness in Stature, Proceedings of the Royal Society, 1886;
Co-relations and their measurements Ibid. 1888; Weldon, Correlated Variations, Ibid, 1892.
V. THE STATISTICAL PART OF INDUCTIVE LOGIC
Passing Insurance and other direct applications of statistics, we come to the investigation of causes. The inductive
method to which statistics lends itself, the Method of Agreement, is liable to the fallacy Post hoc propter hoc; of
which numerous examples occur. The Method of Concomitant variations is facilitated by the use of parallel curves.
The Method of Residues is exemplified when in comparing the death rates of different classes, we make allowance
for their different ages; and in similar cases.
References: Mill, Logic; Giffen, Essays on Finance, and Article in June No. of Economic Journal; Humphreys,
Value of death rates as a test of Sanitary conditions, Journal of the Statistical Society, 1874, Class Mortality
Statistics, Ibid, 1887.
VI. THE ELIMINATION OF CHANCE
One case of the Method of Residues, for which there exists a technical apparatus, is where the agency allowed for
consists of those fleeting causes called chance. The simple method of eliminating chance, described by Mill
(Logic, iii, xviii, 4) and the higher method derived from the theory of error. The latter method is particularly
applicable where the deviation from the average value of a ratio (e.g. that between male and female births)
follows the analogy of the simpler games of chance. In other cases the higher theory affords rather regulative ideas
than exact conclusions; in this respect, comparable to the use of the mathematical theory of economics.
References: Westergaard, Grundzüge der Theorie der Statistik, 1891; Duesing, Das geschlechtverhältniss in
Preussen, 1890; Edgeworth, Methods of Statistics, Journal of the Statistical Society, Jubilee vol., 1885.
[The lectures were presented on six consecutive Wednesdays
at 5:00 P.M., beginning 11 May 1892, admission free.]
At the end of the semester, could my students fully understand all of the
statistical methods used in a typical issue of Ecology? Probably not, but they did
have the foundation to consider the methods if authors clearly described their
approach. Statistics can still mislead students, but students are less apt to see all
statistics as lies and more apt to constructively criticise questionable methods.
They can dissect any approach by applying the conceptual terms used throughout
the semester. Students leave the course believing that statistics does, after all,
have relevance, and that it is more accessible than they believed at the beginning
of the semester.
At its best, statistical analysis sharpens thinking about data, reveals new
patterns, prompts creative thinking, and stimulates productive discussions in
multi-disciplinary research groups. For many scientists, these positive possibilities
of statistics are over-shadowed by negatives; abstruse assumptions, emphasis of
things one can't do, and convoluted logic based on hypothesis rejection. One
colleague's reaction to this Special Feature (on statistical analysis of ecosystem
studies) was that statistics is the scientific equivalent of a trip to the dentist.
This view is probably widespread. It leads to insufficient awareness of
the fact that statistics, like ecology, is a vital, evolving discipline with ever-
changing capabilities.
AIMS
Species #11 #12 #13 #14 #15 #16 #17 #18 #19 #20
Equisetum pratense 4 - 1 2 - 7 10 13 18 17
Rubus pubescens 11 4 13 18 4 7 17 - 13 2
R. strigosus 1 8 1 2 19 8 3 5 2 8
Cornus stolonifera 6 - - 1 - - 1 1 - 1
C. canadensis - - 2 - 12 - - 1 - -
Rosa acicularis 2 2 1 6 11 2 1 - 3 3
Galium boreale - - 12 3 22 - 2 - 1 -
Ribes oxycanthoides - 1 - 4 15 - - 8 - 3
R. triste 2 9 13 2 - 4 10 6 16 9
Mitella nuda - 6 - - 1 9 - 16 25 19
Mertensia nudicaulis - 11 6 10 - 2 10 4 1 12
Aralia nudicaulis 4 - 6 1 3 - - 1 - 1
Viburnum edule 2 15 5 6 - 7 4 5 3 4
Calamagrostis canescens 3 3 - 1 1 6 11 8 4 4
Populus balsamifera (seedling) 2 1 - 1 1 2 2 - 1 -
Prunus virginiana (seedling) - - 1 - - - - - 1 -
Populus tremuloides (seedling) - - 1 - 1 - - 1 - -
Actaea rubra - - 1 - 1 - - - - 1
Circaea alpina 4 - 1 18 1 3 - - 2 11
Thalictrum venulosum 3 - - - - 1 1 - - -
Matteuccia struthiopteris - - - - - - - - - 2
NO. OF SPECIES 12 10 14 14 12 12 12 12 13 14
July 18, 1998. Plot 6 (quadrats) (Rt. Bank, c 300 m S of mouth of Steepbank R., 40m inland)
A typical page from a field notebook. This one records observations on the ground vegetation in
Populus balsamifera woodland in the flood plain of the Athabasca River, Alberta.
TYPES OF MULTIVARIATE DATA
Discipline               Object (n)                Variable (m)
Botany (plant ecology)   Quadrat, relevé, plot     Plant species
Archaeology              Sites                     Artefacts
Geology                  Samples                   Particle-size classes
Chemistry                Stream sediments          Trace elements
Zoology                  Geographical localities   Morphometric characters
Pollen analysis          Sediment samples          Pollen types
Diatom analysis          Sediment samples          Diatom types
Palaeontology            Rock samples              Fossil taxa
...                      ...                       ...
Features in common
MANY OBJECTS n
MANY VARIABLES m
CAN BE ARRANGED IN DATA MATRIX
of SAMPLES or OBJECTS x VARIABLES
Samples (n samples) in columns 1 ... n; variables (m vars) in rows 1 ... m

Variable   Sample 1   2      3      4      ...    n
1          x_ik       *      *      *      ...    x_1n
2          *          *      *      *
3          *          *      *      *
4          *          *      *      *
...        ...
m          x_m1                            ...    x_mn

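DATA MATRIX
Matrix X with m rows (variables) and n columns (samples). By the usual rows x columns convention this is an m x n matrix, of order (m x n).

\[
X = \begin{pmatrix}
x_{11} & x_{12} & x_{13} \\
x_{21} & x_{22} & x_{23}
\end{pmatrix}
\]

\(x_{21}\): element in row two, column one.
\(x_{ik}\): element in row i, column k (i and k are the subscripts).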
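A minimal sketch of this subscript convention, written in Python purely for illustration (the course practicals use R). The 2 x 3 matrix is held as a list of rows; the numbers are borrowed from the first three samples of the pollen table later in this lecture, and the helper name `element` is hypothetical.

```python
# Data matrix X: variables in rows, samples in columns.
X = [
    [10, 12, 15],  # variable 1 (pollen Type A) in samples 1, 2, 3
    [50, 42, 47],  # variable 2 (pollen Type B) in samples 1, 2, 3
]

m = len(X)      # number of rows (variables)
n = len(X[0])   # number of columns (samples)
shape = (m, n)  # (rows, columns)


def element(matrix, i, k):
    """Return x_ik using the 1-based row/column subscripts of the slide."""
    return matrix[i - 1][k - 1]


x_21 = element(X, 2, 1)  # row two, column one
print("shape (rows, columns):", shape)
print("x_21 =", x_21)
```

The only design choice here is the 1-based helper, which keeps the code subscripts aligned with the x_ik notation rather than with Python's 0-based indexing.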
FEATURES OF MULTIVARIATE DATA
Complex
Show: Noise
Redundancy
Internal relationships
Outliers
Some information in the data is only
indirectly interpretable
BIOLOGICAL DATA
many species
+/-, quantitative, often %,
many zero values, skewed
non-linear responses to
environment
ENVIRONMENTAL DATA
fewer variables
+/-, ranks, quantitative
non-normal
linear inter-relationships, often high
correlations, some redundancy
STATISTICS AND DATA ANALYSIS
1. Hypothesis testing - confirmatory data analysis (CDA).
2. Model building
explanatory
empirical
[statistical]
Pielou (1981) Quart. Rev. Biol.
Models are often displayed with little or no effort to link them with the
real world. As a result the whole body of knowledge and theory has
grown top-heavy with models... Models are not useless but too much
should not be expected of them. Modelling is only a part, and a
subordinate part, of research.
3. Hypothesis generation - exploratory data analysis (EDA).
Detective work
CDA & EDA - different aims, philosophies, methods
We need both exploratory and confirmatory.
J W Tukey 1980
EXPLORATORY DATA ANALYSIS
Real world facts -> Observations -> Measurements -> Data -> Data analysis -> Patterns -> Information -> Hypotheses -> Decisions
CONFIRMATORY DATA ANALYSIS
Hypotheses -> Real world facts -> Observations -> Measurements -> Data -> Statistical testing -> Hypothesis testing -> Theory
[Figure: diagram linking exploratory and confirmatory data analysis. Labels: biological data Y; additional (e.g. environmental) data X; underlying statistical model (e.g. linear or unimodal response); exploratory data analysis; description; confirmatory data analysis; testable null hypothesis; rejected hypotheses.]
[Figure: the Popperian hypothetico-deductive method, after Underwood and others. Boxes: observation; theory/paradigm; prediction; scientific H0 and HA; statistical H0 and HA; conceptual design of study, choice of format (experimental, non-experimental) and classes of data; sampling or experimental design; data collection; analysis; evaluate statistical H0, HA; evaluate prediction; evaluate scientific H0, HA; evaluate theory/paradigm. Arrows are labelled induction and deduction.]
H0 = null hypothesis; HA = alternative hypothesis
EXPLORATORY DATA ANALYSIS (EDA) compared with CONFIRMATORY DATA ANALYSIS (CDA)

EDA: How can I optimally describe or explain variation in the data set?
CDA: Can I reject the null hypothesis that the species are unrelated to a particular environmental factor or set of factors?

EDA: Samples can be collected in many ways, including subjective sampling.
CDA: Samples must be representative of the universe of interest: random, stratified random, systematic.

EDA: Data-fishing permissible; post-hoc analyses, explanations, hypotheses, narrative okay.
CDA: Analysis must be planned a priori.

EDA: P-values only a rough guide.
CDA: P-values meaningful.

EDA: Stepwise techniques (e.g. forward selection) useful and valid.
CDA: Stepwise techniques not strictly valid.

EDA: Main purpose is to find pattern or structure in nature. Inherently subjective, personal activity. Interpretations not repeatable.
CDA: Main purpose is to test hypotheses about patterns. Inherently analytical and rigorous. Interpretations repeatable.
A WELL-DESIGNED MODERN ECOLOGICAL
STUDY COMBINES BOTH.
1) Two-phase study
- Initial phase is exploratory, perhaps
involving subjectively located plots or
previous data to generate hypotheses.
- Second phase is confirmatory,
collection of new data from defined
sampling scheme, planned data
analysis.
2) Split-sampling
- Large data set (>100 objects),
randomly split into two (75/25)
exploratory set and confirmatory set.
- Generate hypotheses from
exploratory set (allow data fishing);
test hypotheses with confirmatory set.
- Rarely done in ecology.
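The split-sampling idea above can be sketched as follows, in Python purely for illustration (the course practicals use R). The function name `split_sample`, the fixed seed, and the 120 numbered objects are assumptions made for the example; only the 75/25 proportion comes from the slide.

```python
import random


def split_sample(object_ids, exploratory_fraction=0.75, seed=0):
    """Randomly split objects into an exploratory and a confirmatory set."""
    rng = random.Random(seed)      # fixed seed so the split can be reproduced
    shuffled = list(object_ids)
    rng.shuffle(shuffled)
    n_exploratory = round(exploratory_fraction * len(shuffled))
    exploratory = sorted(shuffled[:n_exploratory])
    confirmatory = sorted(shuffled[n_exploratory:])
    return exploratory, confirmatory


# Hypothetical data set of 120 objects (the slide suggests more than 100).
objects = range(1, 121)
exploratory, confirmatory = split_sample(objects)

print(len(exploratory), "objects for hypothesis generation")
print(len(confirmatory), "objects held back for hypothesis testing")
```

The confirmatory set should not be inspected until the hypotheses generated from the exploratory set have been fixed; otherwise the test is no longer independent of the data fishing.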
Data diving with cross-validation: an investigation of broad-
scale gradients in Swedish weed communities.
ERIK HALLGREN, MICHAEL W. PALMER and PER MILBERG.
Journal of Ecology, 1999, 87, 1037-1051.
[Figure: flow chart for the sequence of analyses. Solid lines represent the flow of data and dashed lines the flow of analysis. Data boxes: full data set; some previously removed data; clean data set; exploratory data set; confirmatory data set; combined data set; results. Steps: remove observations with missing data; random split; hypotheses; ideas for more analysis; choice of variables; analyses for display; hypothesis tests.]
EUROPEAN FOOD
(From A Survey of Europe Today, The Reader's Digest Association Ltd.) Percentage of all
households with various foods in house at time of questionnaire. Foods by countries.
Code  Food                 D   I   F  NL   B   L  GB   P   A  CH   S  DK   N  SF   E IRL
GC    ground coffee       90  82  88  96  94  97  27  72  55  73  97  96  96  98  70  13
IC    instant coffee      49  10  42  62  38  61  86  26  31  72  13  17  17  12  40  52
TB    tea or tea bags     88  60  63  98  48  86  99  77  61  85  93  92  83  84  40  99
SS    sugarless sugar     19   2   4  32  11  28  22   2  15  25  31  35  13  20   -  11
BP    packaged biscuits   57  55  76  62  74  79  91  22  29  31   -  66  62  64  62  80
SP    soup (packages)     51  41  53  67  37  73  55  34  33  69  43  32  51  27  43  75
ST    soup (tinned)       19   3  11  43  25  12  76   1   1  10  43  32   4  10   2  18
IP    instant potatoes    21   2  23   7   9   7  17   5   5  17  39  11  17   8  14   2
FF    frozen fish         27   4  11  14  13  26  20  20  15  19  54  51  30  18  23   5
VF    frozen vegetables   21   2   5  14  12  23  24   3  11  15  45  42  15  12   7   3
AF    fresh apples        81  67  87  83  76  85  76  22  49  79  56  81  61  50  59  57
OF    fresh oranges       75  71  84  89  76  94  68  51  42  70  78  72  72  57  77  52
FT    tinned fruit        44   9  40  61  42  83  89   8  14  46  53  50  34  22  30  46
JS    jam (shop)          71  46  45  81  57  20  91  16  41  61  75  64  51  37  38  89
CG    garlic clove        22  80  88  16  29  91  11  89  51  64   9  11  11  15  86   5
BR    butter              91  66  94  31  84  94  95  65  51  82  68  92  63  96  44  97
ME    margarine           85  24  47  97  80  94  94  78  72  48  32  91  94  94  51  25
OO    olive, corn oil     74  94  36  13  83  84  57  92  28  61  48  30  28  17  91  31
YT    yoghurt             30   5  57  53  20  31  11   6  13  48   2  11   2   -  16   3
CD    crispbread          26  18   3  15   5  24  28   9  11  30  93  34  62  64  13   9
Dendrogram showing the results of minimum variance agglomerative cluster
analysis of the 16 European countries for the 20 food variables listed in the table.
Key:
Countries: A Austria, B Belgium, CH Switzerland, D West Germany, DK Denmark, E Spain, F France, GB
Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S
Sweden, SF Finland
Classification
Ordination
Correspondence analysis of percentages of households in 16
European countries having each of 20 types of food.
Key:
Countries: A Austria, B Belgium, CH Switzerland, D West Germany, DK Denmark, E Spain, F France, GB
Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S
Sweden, SF Finland
Minimum spanning tree fitted to the full 15-dimensional correspondence
analysis solution superimposed on a rotated plot of countries from
previous figure.
Percentages of people employed in nine different industry groups in Europe. (AGR = agriculture, MIN = mining, MAN = manufacturing, PS = power supplies, CON = construction, SER = service industries, FIN = finance, SPS = social and personal services, TC = transport and communications).
Country AGR MIN MAN PS CON SER FIN SPS TC
Belgium 3.3 0.9 27.6 0.9 8.2 19.1 6.2 26.6 7.2
Denmark 9.2 0.1 21.8 0.6 8.3 14.6 6.5 32.2 7.1
France 10.8 0.8 27.5 0.9 8.9 16.8 6 22.6 5.7
W. Germany 6.7 1.3 35.8 0.9 7.3 14.4 5 22.3 6.1
Ireland 23.2 1 20.7 1.3 7.5 16.8 2.8 20.8 6.1
Italy 15.9 0.6 27.6 0.5 10 18.1 1.6 20.1 5.7
Luxembourg 7.7 3.1 30.8 0.8 9.2 18.5 4.6 19.2 6.2
Netherlands 6.3 0.1 22.5 1 9.9 18 6.8 28.5 6.8
UK 2.7 1.4 30.2 1.4 6.9 16.9 5.7 28.3 6.4
Austria 12.7 1.1 30.2 1.4 9 16.8 4.9 16.8 7
Finland 13 0.4 25.9 1.3 7.4 14.7 5.5 24.3 7.6
Greece 41.4 0.6 17.6 0.6 8.1 11.5 2.4 11 6.7
Norway 9 0.5 22.4 0.8 8.6 16.9 4.7 27.6 9.4
Portugal 27.8 0.3 24.5 0.6 8.4 13.3 2.7 16.7 5.7
Spain 22.9 0.8 28.5 0.7 11.5 9.7 8.5 11.8 5.5
Sweden 6.1 0.4 25.9 0.8 7.2 14.4 6 32.4 6.8
Switzerland 7.7 0.2 37.8 0.8 9.5 17.5 5.3 15.4 5.7
Turkey 66.8 0.7 7.9 0.1 2.8 5.2 1.1 11.9 3.2
Bulgaria 23.6 1.9 32.3 0.6 7.9 8 0.7 18.2 6.7
Czechoslovakia 16.5 2.9 35.5 1.2 8.7 9.2 0.9 17.9 7
E. Germany 4.2 2.9 41.2 1.3 7.6 11.2 1.2 22.1 8.4
Hungary 21.7 3.1 29.6 1.9 8.2 9.4 0.9 17.2 8
Poland 31.1 2.5 25.7 0.9 8.4 7.5 0.9 16.1 6.9
Romania 34.7 2.1 30.1 0.6 8.7 5.9 1.3 11.7 5
USSR 23.7 1.4 25.8 0.6 9.2 6.1 0.5 23.6 9.3
Yugoslavia 48.7 1.5 16.8 1.1 4.9 6.4 11.3 5.3 4
Source: Euromonitor (1979, pp. 76-7) with the percentage employed in
finance in Spain reduced from 14.7 to the more reasonable figure of 8.5
Correspondence
analysis
Correspondence
analysis
WHY DO MULTIVARIATE DATA ANALYSIS?
1: Data simplification and data reduction - signal from noise
2: Detect features that might otherwise escape attention.
3: Hypothesis generation and prediction.
4: Data exploration as aid to further data collection.
5: Communication of results of complex data.
Ease of display of complex data.
6: Aids communication and forces us to be explicit.
The more orthodox amongst us should at least reflect that
many of the same imperfections are implicit in our own
cerebrations and welcome the exposure which numbers bring to
the muddle which words may obscure.
D Walker (1972)
7: Tackle problems not otherwise soluble. Hopefully better
science.
8: Fun!
General impressions are never to be trusted.
Unfortunately when they are of long standing they
become fixed rules of life, and assume a prescriptive
right not to be questioned. Consequently those who are
not accustomed to original inquiry entertain a hatred
and a horror of statistics. They cannot endure the idea
of submitting their sacred impressions to cold-blooded
verification. But it is the triumph of scientific men to
rise superior to their superstitions, to desire tests by
which the value of their beliefs may be ascertained,
and to feel sufficiently masters of themselves to discard
contemptuously whatever may be found untrue.

Francis Galton
Quoted from Quotes, Damned Quotes and...
compiled by J Bibby Edinburgh: John Bibby (Books)

TERMINOLOGY
Sample, object, individual sampling unit
Meaning               Statistician     Others
Single unit           Sampling unit    Sample
Collection of units   Sample           Sample set
Variable, character, attribute
Algorithms, methods, models, programs
Classification, clustering, partitioning, scaling, gradient analysis
[assignment, identification, discrimination]
[dissection]
Objective, repeatable
TYPES OF VARIABLES
1) Numeric, quantitative, continuous variables
2) Nominal and ordinal variables (qualitative multistate)
Nominal - disordered multistate (e.g. red, white, blue)
Ordinal - ordered multistate (e.g. dry, moist, wet)
3) Binary or dichotomous variables +/- (e.g. male, female)
4) Conditionally present variables
e.g. 3 species - A, B, C
Only A & B have petals
A - pink petals
B - white petals
Petal state    A  B  C
Pink petals    +  -  -
White petals   -  +  -
No petals      -  -  +
Coded in this way, petal state is a nominal (disordered) variable.
5) Mixed data - see Lecture 12
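The petal example can be sketched in code, in Python purely for illustration (the course practicals use R). The species labels and petal states come from the slide; the dictionary layout and the function name `code_as_states` are hypothetical choices for this sketch.

```python
# Petal colour recorded for each species; None means the species has no petals.
petal_colour = {"A": "pink", "B": "white", "C": None}

# Each state of the character becomes one +/- row, as on the slide.
states = [
    ("Pink petals", "pink"),
    ("White petals", "white"),
    ("No petals", None),
]


def code_as_states(observations, state_list):
    """Recode one multistate character as one +/- row per state."""
    table = {}
    for label, value in state_list:
        table[label] = {
            species: "+" if observed == value else "-"
            for species, observed in observations.items()
        }
    return table


coded = code_as_states(petal_colour, states)

for label, row in coded.items():
    print(f"{label:<13}", " ".join(row[s] for s in ("A", "B", "C")))
```

Treating "no petals" as a state of its own is what turns the conditionally present character (colour only exists where petals exist) into an ordinary nominal variable with three states.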
Pollen data - 2 pollen types x 15 samples
Depths are in centimetres, and the units for pollen frequencies may be either grains counted or percentages.
Sample Depth Type A Type B
1 0 10 50
2 10 12 42
3 20 15 47
4 30 17 38
5 40 18 43
6 50 22 37
7 60 23 35
8 70 26 26
9 80 35 23
10 90 37 22
11 100 43 18
12 110 38 17
13 120 47 15
14 130 42 12
15 140 50 10
Samples
Variables
Adam (1970)
GEOMETRICAL MODELS
(a) Palynological representation
(b) Geometrical representation
ALTERNATE REPRESENTATIONS OF THE POLLEN DATA
In (a) the data are plotted as a standard diagram, and in (b) they
are plotted using the geometric model. Units along the axes may be
either pollen counts or percentages.
Adam (1970)
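A minimal sketch of the geometric model, in Python purely for illustration (the course practicals use R). Each of the 15 samples in the pollen table becomes a point whose coordinates are its Type A and Type B values. Euclidean distance is used here only as an assumed example of a distance between points; the slide itself does not name a particular measure.

```python
import math

# Type A and Type B values for samples 1..15, copied from the pollen table.
type_a = [10, 12, 15, 17, 18, 22, 23, 26, 35, 37, 43, 38, 47, 42, 50]
type_b = [50, 42, 47, 38, 43, 37, 35, 26, 23, 22, 18, 17, 15, 12, 10]

# Geometric model: one point per sample, one axis per variable.
points = list(zip(type_a, type_b))


def euclidean(p, q):
    """Straight-line distance between two points in variable space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d_1_2 = euclidean(points[0], points[1])     # neighbouring samples 1 and 2
d_1_15 = euclidean(points[0], points[14])   # top and bottom samples 1 and 15

print("distance between samples 1 and 2: ", round(d_1_2, 2))
print("distance between samples 1 and 15:", round(d_1_15, 2))
```

With more than two or three variables the points can no longer be drawn, but distances between them are computed in exactly the same way.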
Geometrical model of a vegetation space
containing 52 records (stands).
A: A cluster within the cloud of points
(stands) occupying vegetation space.
B: 3-dimensional abstract vegetation
space: each dimension represents an
element (e.g. proportion of a certain
species) in the analysis (X Y Z axes).
A, the results of a classification approach
(here attempted after ordination) in which
similar individuals are grouped and
considered as a single cell or unit.
B, the results of an ordination approach in
which similar stands nevertheless retain
their unique properties and thus no
information is lost (X_1 Y_1 Z_1 axes).
N. B. Abstract space has no connection with
real space from which the records were
initially collected.
Concept of Similarity, Dissimilarity, Distance and Proximity
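\(s_{ij}\): how similar object i is to object j
Proximity measure: DC or SC (dissimilarity coefficient or similarity coefficient)
Dissimilarity = Distance

Convert \(s_{ij}\) to \(d_{ij}\):

\[ s_{ij} = C - d_{ij} \quad \text{where } C \text{ is a constant} \]

\[ d_{ij} = \sqrt{1 - s_{ij}} \]

\[ d_{ij} = 1 - s_{ij} \]

\[ s_{ij} = \frac{1}{1 + d_{ij}} \]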
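A minimal sketch of converting between similarity and dissimilarity, in Python purely for illustration (the course practicals use R). It assumes the standard conversions d = C - s, d = sqrt(1 - s) and s = 1 / (1 + d), with similarities scaled to lie between 0 and 1; the function names are hypothetical.

```python
import math


def dissimilarity_from_similarity(s, c=1.0):
    """d = C - s, the rearranged form of s = C - d (C = 1 by default)."""
    return c - s


def root_dissimilarity(s):
    """d = sqrt(1 - s), assuming the similarity s lies in [0, 1]."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    return math.sqrt(1.0 - s)


def similarity_from_distance(d):
    """s = 1 / (1 + d), assuming a non-negative distance d."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


s_ij = 0.75
print("d = 1 - s       :", dissimilarity_from_similarity(s_ij))
print("d = sqrt(1 - s) :", root_dissimilarity(s_ij))
print("s = 1 / (1 + d) :", similarity_from_distance(1.0))
```

The last conversion is useful when the distance has no upper bound, since it maps any non-negative distance to a similarity between 0 and 1, with identical objects (d = 0) given s = 1.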
COMPUTING
In the 10 practicals, mainly use R, a free, open-source
statistical-computing environment, rather than specific commercial
packages such as MINITAB or SYSTAT.
Relatively steep learning curve but worth it.
Recommend Fox (2002) An R and S-PLUS Companion to Applied
Regression (Sage), Crawley (2005) Statistics: An Introduction
Using R (Wiley), Crawley (2007) The R Book (Wiley), Everitt
(2005) An R and S-PLUS Companion to Multivariate Analysis
(Springer), and Verzani (2005) Using R for Introductory
Statistics (Chapman & Hall/CRC) as excellent guides.
Will also use specialised software for specific methods (e.g.
TWINSPAN, CANOCO and CANODRAW, C2, ZONE, etc.)
Computing practicals are an integral and essential part of the
course.
COURSE TOPICS
Topic                                               Lecture(s)       Practical(s)
Introduction                                        Lecture 1        -
Exploratory Data Analysis                           Lecture 2        Practical 1
Cluster Analysis                                    Lecture 3        Practical 2
Regression Analysis                                 Lectures 4 & 5   Practicals 3 & 4
Ordination (Indirect Gradient Analysis)             Lecture 6        Practical 5
Constrained Ordination (Direct Gradient Analysis)   Lecture 7        Practical 6
Calibration and Environmental Reconstructions       Lecture 8        Practical 7
Classification                                      Lecture 9        Practical 8
Analysis of Stratigraphical and Spatial Data        Lecture 10       Practical 9
Hypothesis Testing                                  Lecture 11       Practical 10
Overview and Future Developments                    Lecture 12       -
COURSE POWERPOINTS
In some of the lectures, some of the slides are
rather technical.
They are included for the sake of completeness on
the topic under discussion.
They are for reference only and are marked REF.
