
Statistics for Ecologists

Using R and Excel


Data collection, exploration,
analysis and presentation
Mark Gardener
DATA IN THE WILD SERIES
Pelagic Publishing | www.pelagicpublishing.com
Published by Pelagic Publishing
www.pelagicpublishing.com
PO Box 725, Exeter, EX1 9QU
ISBN 978-1-907807-12-1 (Pbk)
ISBN 978-1-907807-13-8 (Hbk)
Copyright 2012 Mark Gardener

All rights reserved. No part of this document may be produced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise without prior permission from the publisher.
While every effort has been made in the preparation of this book to ensure the accuracy
of the information presented, the information contained in this book is sold without
warranty, either express or implied. Neither the author, nor Pelagic Publishing, its agents
and distributors will be held liable for any damage or loss caused or alleged to be caused
directly or indirectly by this book.
Windows, Excel and Word are trademarks of the Microsoft Corporation. For more
information visit www.microsoft.com. OpenOffice.org is a trademark of Oracle. For more
information visit www.openoffice.org. Apple Macintosh is a trademark of Apple Inc. For
more information visit www.apple.com.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
Cover image istockphoto.com/dulezidar
About the author
Mark began his career as an optician but returned to science and trained as an ecologist.
His research is in the area of pollination ecology. He has worked extensively in the UK as
well as Australia and the United States. Currently he works as an associate lecturer for the
Open University and also runs courses in data analysis for ecology and environmental
science.
Acknowledgements
I am especially grateful to Nigel Massen at Pelagic Publishing for his help and
perseverance throughout the production of this book.
Thanks go to Anne Goodenough for patiently and thoroughly reviewing the manuscript;
your comments and views were most helpful.
With a book of this nature data examples are always useful. Some of the data illustrated
here were collected by students and I gratefully acknowledge their efforts and send thanks
for allowing me to use these data as examples.
Finally, my heartfelt thanks go to Christine, for putting up with me throughout the entire
process.
Software used
Various versions of Microsoft's Excel spreadsheet were used in the preparation of this
manuscript. Most of the examples presented show version 2007 for Microsoft Windows
although other versions may also be illustrated (including Excel X for Apple Macintosh).
Several versions of the R program were used and illustrated including 2.8.1 for Windows
and 2.11.1 for Macintosh: R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0, URL http://www.R-project.org/.
Downloading free code examples
Free code examples and further information from the author on using R and Excel for
statistics can be found at:
http://www.pelagicpublishing.com/statistics-for-ecologists-resources.html
Reader feedback
We welcome feedback from readers; please email us at info@pelagicpublishing.com and
tell us what you thought about this book. Please include the book title in the subject line
of your email.
Publish with Pelagic Publishing
We publish scientific books to the highest editorial standards in all life science disciplines,
with a particular focus on ecology, conservation and environment. Pelagic Publishing
produces books that set new benchmarks, share advances in research methods and encourage
and inform wildlife investigation for all.
If you are interested in publishing with Pelagic please contact editor@pelagicpublishing.com
with a synopsis of your book, a brief history of your previous written work and a
statement describing the impact you would like your book to have on readers.
Contents
Introduction
1. Planning
1.1 The scientific method
1.2 Types of experiment/project
1.3 Getting data – using a spreadsheet
1.4 Hypothesis testing
1.5 Data types
1.6 Sampling effort
1.7 Tools of the trade
1.8 The R program
1.9 Excel
2. Data recording
2.1 Collecting data – who, what, where, when
2.2 How to arrange data
3. Beginning data exploration – using software tools
3.1 Beginning to use R
3.2 Manipulating data in a spreadsheet
3.3 Getting data from Excel into R
4. Exploring data – looking at numbers
4.1 Summarising data
4.2 Distribution
4.3 A numerical value for the distribution
4.4 Statistical tests for normal distribution
4.5 Distribution type
4.6 Transforming data
4.7 When to stop collecting data? The running average
4.8 Statistical symbols
5. Exploring data – which test is right?
5.1 Hypothesis testing
5.2 Choosing the correct test
6. Exploring data – using graphs
6.1 Exploratory graphs
6.2 Graphs to illustrate differences
6.3 Graphs to illustrate links
6.4 Graphs – a summary
7. Tests for differences
7.1 Differences: t-test
7.2 Differences: U-test
7.3 Paired tests
8. Tests for linking data – correlations
8.1 Correlation: Spearman's rank test
8.2 Pearson's product moment
8.3 Correlation tests using Excel
8.4 Correlation tests using R
8.5 Curved linear correlation
9. Tests for linking data – associations
9.1 Association: Chi-squared test
9.2 Goodness of fit test
9.3 Using R for Chi-squared tests
9.4 Using Excel for Chi-squared tests
10. Differences between more than two samples
10.1 Using R for more complex statistical analyses
10.2 Analysis of variance
10.3 Kruskal-Wallis test
11. Tests for linking several factors
11.1 Multiple regression
11.2 Curved-linear regression
12. Reporting results
12.1 Presenting findings
12.2 Publishing
12.3 Reporting results of statistical analyses
12.4 Graphs
12.5 More about graphs in R
12.6 Worked example – graph data in R
12.7 Graphs: a summary
12.8 Writing papers
12.9 Plagiarism
12.10 References
12.11 Poster presentations
12.12 Giving a talk (PowerPoint)
13. Summary
Glossary
Index
Introduction
This is not just a statistics textbook! Although there are plenty of statistical analyses
here, this book is about the processes involved in looking at data. These processes involve
planning what you want to do, writing down what you found and writing up what your
analyses showed. The statistics part is also in there of course but this is not a course
in statistics. By the end I hope that you will have learnt some statistics but in a practical
way, i.e. what statistics can do for you. In order to learn about the methods of analysis, we'll
use two main tools: a Microsoft Excel spreadsheet (although Open Office will work just
as well) and a computer program called R. The spreadsheet will allow you to collect your
data in a sensible layout and also do some basic analyses (as well as a few less basic ones).
The R program will do much of the detailed statistical work (although we will also use
Excel quite a bit). Both programs will be used to produce graphs. This book is not a course
in computer programming; we'll learn just enough about the programs to get the job done.
It is important to recognise that there is a process involved. This is the scientific process
and may be summarised by four main headings:
Planning
Data recording
Data exploration
Reporting results
The book is arranged into these four broad categories. The sections are rather uneven in
size and tend to focus on the analysis. The section on reporting also covers presentation
of analyses (e.g. graphs).
Although the emphasis is on ecological work and many of the data examples are of that
sort, I hope that other scientists and students of other disciplines will see relevance to what
they do.
Mark Gardener 2011
6. Exploring data – using graphs
Graphs are useful for several reasons. They can help us to visualise the data and decide
which statistical test is the best. We may spot patterns in the data and gain a better
understanding of what we are dealing with. Graphs are also useful for summarising our final
results, especially when we present our findings to other people.
We can think of graphs as being useful for two purposes: firstly to help us decide how to
tackle the data, and secondly to present results. We will look at details of graphs and how
to produce them in Excel and R in Section 12.4 where we examine ways to present
our findings. We will also mention graphs throughout the text as we look at the various
analytical methods to examine our data. Indeed we have already seen some examples in
Chapter 4. In this short chapter we will summarise the graphs we might use to help us
explore our data.
6.1 Exploratory graphs
One of the most common analyses of a sample of data is to determine whether they are
normally distributed. This affects the kind of statistical analysis we are able to perform on
the data. There are several ways we can illustrate the distribution of a data sample. We may
use a simple tally plot or a stem-leaf plot; we can even do this right from our notebook in
the field. The following example shows a stem-leaf plot.
1 | 679
2 | 112334
2 | 5666678899
3 | 01124
3 | 6
In this example, the data are sorted in numerical order in each row but we can still gain
insights into the data distribution if the numbers are not sorted.
1 | 967
2 | 143123
2 | 9568667869
3 | 40121
3 | 6
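A stem-leaf plot like the ones above can also be produced in R with the base function stem(). The following is a minimal sketch: the vector name dat is an assumption made here, and the values are simply read back off the stem-leaf plot.

```r
# Values read back from the stem-leaf plot above (25 observations).
# The name 'dat' is an assumption made for this sketch.
dat <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
         27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# stem() prints a text stem-leaf plot to the console.
# The 'scale' argument can be used to lengthen or shorten the plot.
stem(dat)
```

Because stem() sorts the values itself, the order in which the data were typed in does not matter.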
A simpler version of a stem-leaf plot is the tally plot, and in this case we enter the data as
a simple tally mark. In Table 28, we see a tally plot of the same data as our stem-leaf plot.
Table 28. A tally plot to show data distribution
Tally Bin
x 16
x 18
x 20
xxx 22
xxx 24
xxxxx 26
xxx 28
xxx 30
xxx 32
x 34
x 36
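The tally in Table 28 can be reproduced in R by cutting the data into size classes and counting the items in each. This is a sketch only: the names dat, upper and counts are assumptions, and each bin is taken to be labelled by its upper limit (so the bin labelled 22 holds values above 20 and up to 22), which is how the tallies in Table 28 appear to be arranged.

```r
# Values read back from the stem-leaf plot (the name 'dat' is an assumption).
dat <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
         27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Upper limits of the size classes, as in the Bin column of Table 28.
upper <- seq(16, 36, by = 2)

# cut() assigns each value to a class (lower, upper]; table() counts them.
counts <- as.vector(table(cut(dat, breaks = c(14, upper))))

# Print one row of tally marks per bin.
for (i in seq_along(upper)) {
  cat(sprintf("%-6s %d\n", strrep("x", counts[i]), upper[i]))
}
```

The counts obtained this way should match the number of tally marks in each row of Table 28.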
These are simple plots but nevertheless can be extremely helpful. When we return from
the field we may decide to use a more formal histogram to illustrate the distribution
(Figure 78).
Figure 78. A histogram to illustrate the distribution of a data sample
The size of the bars in our histogram shows us the number of items (the frequency) of our
dataset that lie within each size class, represented on the x-axis. We may decide to use a
line instead of bars and the result is a density plot (Figure 79).
Figure 79. A density plot to illustrate the distribution of a data sample
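Both kinds of plot are available in base R. The sketch below uses hist() for the histogram and density() for the smoothed line. The data are the values from the stem-leaf plot in Section 6.1; the object names and the choice of size classes (width 2, as in Table 28) are assumptions made for illustration, so the result will not necessarily look identical to Figures 78 and 79.

```r
# Values read back from the stem-leaf plot (the name 'dat' is an assumption).
dat <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
         27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Histogram: the height of each bar is the frequency in that size class.
h <- hist(dat, breaks = seq(14, 36, by = 2),
          main = "", xlab = "Size class", ylab = "Frequency")

# Density plot: a smoothed line drawn instead of bars.
d <- density(dat)
plot(d, main = "", xlab = "Size class", ylab = "Density")
```

The object returned by hist() holds the counts for each size class in h$counts, which is handy for checking the bars against a tally made in the field.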
Some types of graph are useful because they show a lot of information in a compact
manner such as the box-whisker plot. A box-whisker plot shows us five pieces of information:
median, maximum, minimum and both quartiles (Figure 80).
Figure 80. A box-whisker plot can be used to illustrate data distribution as well as providing
other information, e.g. median, inter-quartiles and max/min
In Figure 80, we can see that the data appear normally distributed as the box-whiskers are
symmetrical about the median stripe. We can use the box-whisker plot to look at several
samples and illustrate not only differences between samples but their distribution as well
(Figure 82).
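The five values behind a box-whisker plot can be inspected directly in R. The sketch below uses fivenum() and boxplot() from base R on the values from the stem-leaf plot in Section 6.1; the object names are assumptions, and these data are not necessarily the ones drawn in Figure 80.

```r
# Values read back from the stem-leaf plot (the name 'dat' is an assumption).
dat <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
         27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

# Five-number summary: minimum, lower quartile (hinge), median,
# upper quartile (hinge), maximum.
five <- fivenum(dat)
print(five)

# Box-whisker plot; range = 0 makes the whiskers reach the minimum and maximum.
boxplot(dat, range = 0, ylab = "Value")
```

For these data the summary is 16, 23, 26, 29 and 36, so the box and the whiskers are evenly spaced either side of the median, which is the kind of symmetry described above.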
Another way we can visualise our data is by using a line graph to show the running average
(mean or median). We met this earlier in Section 4.7 where we used the idea to help
determine if we had collected enough data. In Figure 81, we see an example of a running mean.
Figure 81. A line graph illustrating the running mean
This is another example of a graph we can sketch whilst out in the field. We do not have to
be quite so exact when we are out in the field; the graph is simply a tool to help us make a
decision.
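A running average is straightforward to calculate in R: divide the cumulative sum by the number of observations so far. The sketch below is illustrative only. The order in which the values were collected is not given in the text, so the values from Section 6.1 are shuffled to stand in for a field order; that order, and all of the object names, are assumptions.

```r
# Values read back from the stem-leaf plot (the name 'dat' is an assumption).
dat <- c(16, 17, 19, 21, 21, 22, 23, 23, 24, 25, 26, 26, 26, 26,
         27, 28, 28, 29, 29, 30, 31, 31, 32, 34, 36)

set.seed(1)         # makes the shuffle repeatable
obs <- sample(dat)  # ASSUMED collection order, not the real field order
n <- seq_along(obs)

# Running mean: cumulative sum divided by the number of observations so far.
run_mean <- cumsum(obs) / n

# Running median: the median of the first i observations, for each i.
run_median <- sapply(n, function(i) median(obs[1:i]))

plot(n, run_mean, type = "l", ylim = range(run_mean, run_median),
     xlab = "Number of observations", ylab = "Running average")
lines(n, run_median, lty = 2)
legend("topright", legend = c("Mean", "Median"), lty = c(1, 2))
```

When the line flattens out, extra observations are changing the average very little, which is the signal used in Section 4.7 to judge whether enough data have been collected.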
6.2 Graphs to illustrate differences
When we have a project that is centred on looking at differences between samples we can
illustrate the situation using bar charts or box-whisker plots. We met the box-whisker plot
previously (Figure 80) when we used it to view a sample and check its distribution. In
Figure 82 we look at three samples.
Figure 82. A box-whisker plot illustrating differences between three samples
We can see the differences between the three samples fairly easily and in addition we can
gain some insight into the distribution. A common alternative to the box-whisker plot is
the bar chart. This is useful to show differences between items in different categories and
is therefore suitable to illustrate differences in samples. In Figure 83 we see the same data
as in Figure 82 but here we use a bar chart with standard error bars to show the variability
within each sample.
Figure 83. A bar chart illustrating differences between three samples
Both figures show that there are differences between the three samples, but unlike the
box-whisker plot in Figure 82, the bar chart in Figure 83 tells us nothing about the distribution.
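The two kinds of graph can be compared side by side in R. In the sketch below the three samples are hypothetical (the data behind Figures 82 and 83 are not given here), and the object names are assumptions. The box-whisker plot comes from boxplot(); the bar chart uses barplot() with error bars of one standard error added by arrows().

```r
# HYPOTHETICAL data: three samples invented purely for illustration.
samples <- list(A = c(12, 15, 17, 11, 14),
                B = c(22, 25, 19, 24, 21),
                C = c(31, 28, 35, 30, 33))

# Box-whisker plot: shows the differences and the spread of each sample.
boxplot(samples, xlab = "Sample", ylab = "Value")

# Mean and standard error (sd / sqrt(n)) for each sample.
means <- sapply(samples, mean)
se <- sapply(samples, function(s) sd(s) / sqrt(length(s)))

# barplot() returns the x-positions of the bar centres,
# which are needed to place the error bars.
mids <- barplot(means, ylim = c(0, max(means + se) * 1.1),
                xlab = "Sample", ylab = "Mean value")

# Error bars: vertical lines with flat caps, one standard error each way.
arrows(mids, means - se, mids, means + se,
       angle = 90, code = 3, length = 0.1)
```

The bar chart reduces each sample to a mean and an error bar, which is why it cannot show skew or outliers in the way the box-whisker plot can.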
6.3 Graphs to illustrate links
When we think of ways to link data together there are two main approaches. In one
approach, we have two sets of values, both are numeric and one represents a dependent
variable and the other an independent variable. We are looking for a correlation. In the
other kind of approach, we have categories of items and we are looking to associate one
set of categories with the other.
6.3.1 Graphs to illustrate correlations
When we are looking for correlations, we can best illustrate the situation using a scatter
plot, which allows us to see how one variable is related to the other. In Figure 84 we see a
scatter plot showing how the abundance of a freshwater invertebrate is related to the
speed of the water in which it lives.
Figure 84. A scatter plot illustrating a correlation
In this case, it appears that as the water speed increases, so does the abundance of the
invertebrate. We do not know if this relationship is statistically significant but it gives us
an impression. When we have several independent variables we can draw several scatter
plots, which may help us decide which is the most important factor to consider (Figure 85).
Figure 85. Multiple scatter plots showing one dependent variable plotted against several
independent variables
In Figure 85 we can see that two of the independent variables show a more definite trend
than the others: one shows a positive correlation and the other a negative one (although at
this point we do not know if either is statistically significant).
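Scatter plots of this sort take only a line or two in R. The sketch below uses a hypothetical data frame: the variable names (abund, speed, depth, ph) and all of the values are invented for illustration and are not the data shown in Figures 84 and 85. A single scatter plot uses plot() with a formula; the multiple plots are drawn by looping over the independent variables.

```r
# HYPOTHETICAL data: names and values invented purely for illustration.
streams <- data.frame(
  abund = c(9, 25, 15, 2, 14, 25, 24, 47),           # dependent variable
  speed = c(2, 3, 5, 9, 14, 24, 29, 34),
  depth = c(40, 35, 38, 30, 28, 20, 18, 12),
  ph    = c(7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 6.7)
)

# A single scatter plot: dependent variable on the y-axis.
plot(abund ~ speed, data = streams,
     xlab = "Water speed", ylab = "Abundance")

# One panel per independent variable, side by side.
predictors <- setdiff(names(streams), "abund")
op <- par(mfrow = c(1, length(predictors)))
for (p in predictors) {
  plot(streams[[p]], streams$abund, xlab = p, ylab = "Abundance")
}
par(op)  # restore the previous plot layout
```

As in the text, the panels only give an impression of which variables show a trend; they say nothing about statistical significance.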
6.3.2 Graphs to illustrate associations
When we have categorical variables, we have various choices. We can display the data for
each row or column category as a pie chart (e.g. Figure 86). This will usually require several
pie charts to be produced (one for each row or column category, depending on how we
want to look at the data). The pie chart shows the data proportionally: each slice of pie
shows the contribution as a proportion of the total.
Figure 86. A pie chart illustrating categorical data. The proportions of common bird species
in a garden habitat
When we have this kind of data we can always represent it in the form of a bar chart
instead. The advantage of the bar chart is that we can show several categories at one time
(Figure 87).
Figure 87. A bar chart illustrating categorical data. The number of common garden birds in
various habitats
In Figure 87 we can see various bird species and various habitats; in this case we have also
included a legend on the graph so the reader can identify the various bars more easily.
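Pie charts and grouped bar charts can both be drawn from the same table of counts in R. The sketch below uses a hypothetical matrix of bird counts; the species, habitats and numbers are invented for illustration and are not the data in Figures 86 and 87. It uses pie() for one habitat and barplot() with beside = TRUE for all habitats at once.

```r
# HYPOTHETICAL counts: rows are bird species, columns are habitats.
birds <- matrix(c(47, 19, 50,
                  10,  3,  2,
                  40,  5, 10),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("Blackbird", "Robin", "Sparrow"),
                                c("Garden", "Hedgerow", "Woodland")))

# Proportion of each species within the garden habitat.
props <- prop.table(birds[, "Garden"])

# One pie chart per habitat would be needed; here just the garden.
pie(birds[, "Garden"], main = "Garden")

# Grouped bar chart: every species in every habitat on one graph.
# legend.text = TRUE takes the legend labels from the row names.
barplot(birds, beside = TRUE, legend.text = TRUE,
        xlab = "Habitat", ylab = "Number of birds")
```

The bar chart keeps the actual counts on the y-axis, whereas the pie chart shows only the proportions within a single habitat.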
6.4 Graphs – a summary
There are quite a few different sorts of graph that we can utilise to help visualise our data
and make important decisions about the analytical approach (Table 29). We should also
use graphs to illustrate our data, which can make them more comprehensible to readers.
When we present graphs we should ensure they are fully labelled and as clear as possible.
Even when we use graphs for our own use it is good practice to label and title them fully.
Label axes and include the units.
Do not include too many different elements on a single graph; avoid clutter and if
necessary produce two graphs rather than one.
Give a main title explaining what the graph shows. Usually this is done as a caption in a
word processor. The caption should enable a reader to understand what the graph shows
without having to read the main text. If your graph is in your field notebook then make
sure you describe the graph so that someone else can understand it.
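In R, axis labels and a main title are set with the xlab, ylab and main arguments of the plotting functions. The sketch below is a minimal illustration; the data values and the units shown in the labels are hypothetical.

```r
# HYPOTHETICAL data and units, invented purely for illustration.
speed <- c(2, 3, 5, 9, 14, 24, 29, 34)
abund <- c(9, 25, 15, 2, 14, 25, 24, 47)

plot(speed, abund,
     xlab = "Water speed (cm per second)",    # axis label with units
     ylab = "Abundance (count per sample)",   # axis label with units
     main = "Invertebrate abundance against water speed")
```

When the graph is destined for a report, the main argument can be left out and the title written as a caption in the word processor instead.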
Table 29. Summary of graph types to use for different purposes
Purpose                                    Types of graph
Illustrating distribution                  Stem-leaf plot, tally plot, histogram, density chart, box-whisker plot
Illustrating differences between samples   Bar chart, box-whisker plot
Illustrating correlations                  Scatter plot
Illustrating associations                  Pie charts, bar charts
Illustrating sample sizes                  Line plot of running average (mean or median)
We will examine graphs in more detail in Chapter 12, which will also cover the presentation
of results. Sections 12.4.1 and 12.5 will deal with producing graphs in R and Section 12.4.3
will cover producing graphs in Excel. We will also make some references to graphs in each
of the sections dealing with the details of the various analytical methods. It is important to
remember that our graphical analysis should go alongside the mathematical one.
