
Appendix 2: Handling data

by Neil C Millar
THE ISSUES: Sampling and accuracy; Data analysis software; Types of data; Summarising data; Displaying data; Ecology statistics

Sampling and accuracy


In biology investigations, we are trying to find out something about the natural world, such as how nitrates affect plant growth, or whether there is a link between atmospheric pollution and species diversity, or if a drug reduces pain. To find out, we make measurements or observations, but we're never going to be able to observe every single plant or every species or every single human. Instead, we make our measurements on a small sample of the total population, and hope that our conclusion from the sample can be applied to the whole population.

How big should the sample be? Obviously a single measurement on one organism is not enough since the organism could by chance have unusual characteristics, or the measurement could simply go wrong. So we need several replicates (repeated measurements). Bigger samples (more replicates) generally give more reliable conclusions, but too many replicates make an investigation unwieldy and difficult to carry out. Indeed, one of the purposes of proper statistical analysis is to generate as much reliable information as possible from as little data as possible. A good rule of thumb is that for a reliable statistical analysis we should aim for at least 10 replicates of each measurement. If it's easy to do more, then do so, and if 10 is difficult, then smaller samples are sometimes acceptable, but the bigger the sample the more reliable the conclusion.

With investigations on living organisms, choosing the sample is also important. Since no two organisms are exactly alike, it is important to choose a random sample to even out individual differences. If possible, individuals should also be chosen so that they are similar (for example, the same age, same sex, similar body mass, and so on). If the sample is random and big enough then any measurements and conclusion we make about the sample should apply to the whole population.

The aim is for our measurements to be accurate, that is, close to the true value, which is the value of a quantity if it could be measured perfectly, with no error. In biology, all measurements have some error so we can never know a true value for certain, but we can try to get close to it by minimising errors. There are two types of error.

Random errors are due to mistakes, inaccuracies or just natural variation. Random errors are always present, but can be minimised by taking many replicates. If the replicates of a measurement are close together (so they have a small range) then the random error is small and the measurements are said to be precise.

Systematic errors are errors in one direction only due to poor technique, or faulty or poorly calibrated equipment (for example, a thermometer that always reads 1 °C too high). Systematic errors cannot be improved by taking more replicates, and can only be mitigated by careful preparation and by using properly calibrated measuring devices. If similar results are obtained independently by other techniques or by other people then the systematic error is small and the results are said to be reliable.

Data that are both precise and reliable are presumed to be accurate.

Edexcel Biology for AS Dynamic Learning CD-ROM

Hodder Education 2008


Types of data
The measurements and observations from an investigation are called data (singular: datum). Data can be quantitative or qualitative.

Quantitative data comprise any measurements in numeric form. This is the most common kind of scientific data, and includes measurements of quantities like length, mass, time, temperature, pH, concentration or speed, as well as counts of things like legs, cells, species or organisms. Quantitative data can be either continuous data (which can take any value), or discontinuous or discrete data (which can only be whole numbers). In practice, the distinction between continuous and discontinuous is rarely important, as all measurements are really discontinuous at some level (for example, a length might be measured to the nearest whole mm or µm), and the same statistics and analyses can be applied to both.

Quantitative data can also be classified as either normally distributed or non-normally distributed data. This distinction is more important since different techniques are used to analyse data in the two groups. The majority of measurements in biology are normally distributed (for example, height, mass, temperature, breathing rates, growth rate, blood cell counts, and so on), and give the familiar symmetrical bell-shaped curve of the normal distribution when replicates are plotted in a histogram (see Figure A2.12). But some kinds of data, such as arbitrary scales (like 1–5) and calculated data (like IQ scores or diversity indices), are unlikely to be normally distributed, and sometimes even simple measurements can deviate from a normal distribution curve. These non-normal data must be treated as ordinal data, where only the ranks (or order) of the values matter. While normally distributed data can be analysed using powerful parametric tests, ordinal data must be analysed using the less-powerful non-parametric statistics.

How do you know whether your data are normally distributed or not? In practice, it often doesn't matter very much, and it is usually safe to assume that your measurements are normally distributed, unless you've been told otherwise. There are tests for normality, but they require a large number of replicates and are not particularly reliable.

Qualitative (or categoric) data comprise observations using words rather than numbers. Sometimes the words can be placed in rank order, for example, observations such as small/medium/large, the pain scores used by doctors and the ACFOR abundance scale used by ecologists. These categories can therefore be assigned numerical ranks, and treated as ordinal data, analysed by non-parametric tests. Categoric data that cannot be ranked are called nominal data (for example, colours, shapes, species). It's a little surprising that we can do statistics at all on such nominal data, but if a very large number of observations are made then the number of observations of each category can be counted to give frequencies, and special frequency tests can be used. Frequencies are often just called counts of things, but this can be very misleading as counts are quantitative data, not qualitative. The different kinds of data are summarised in Figure A2.1.
Figure A2.1 Different kinds of data.
quantitative data (numerical)
    normally distributed data: use parametric tests
    non-normally distributed, or ordinal data: use non-parametric tests
qualitative data (categorical)
    can be ranked (ordinal data): use non-parametric tests
    cannot be ranked (nominal data): use frequency tests


Handling very large and very small numbers


Sometimes in science we have to deal with very large or very small numbers. There are two ways to deal with them.

1 Use prefixes. All SI units can take these prefixes in front of them to make them smaller or larger:

$10^{-3}$, milli, m
$10^{-6}$, micro, µ
$10^{-9}$, nano, n
$10^{-12}$, pico, p
$10^{3}$, kilo, k
$10^{6}$, mega, M
$10^{9}$, giga, G
$10^{12}$, tera, T

The prefixes increase or decrease by factors of a thousand, so by choosing the right prefix, all values can be in the range 1–999. For example:

10 mm instead of 0.01 m
2.56 MPa instead of 2 560 000 Pa
75 µl instead of 0.075 ml

2 Use standard notation (or scientific notation), for example, $3.2 \times 10^{6}$ cells ml$^{-1}$. This is useful for doing maths on large or small numbers.

To multiply numbers together you add the powers, for example, $10^{3} \times 10^{6} = 10^{9}$. To divide numbers you subtract the powers, for example, $10^{3} \div 10^{6} = 10^{-3}$.

You should be able to convert between standard notation and prefix forms, for example, $4 \times 10^{-8}$ m = 40 nm.
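The conversion from standard notation to prefix form can be sketched in a few lines of Python. This is only an illustration: the function name `to_prefix_form` and the prefix table are our own, and the sketch simply picks the power of a thousand that leaves the number between 1 and 999.

```python
import math

# SI prefixes that step by factors of a thousand, keyed by power of ten.
PREFIXES = {-12: "p", -9: "n", -6: "µ", -3: "m", 0: "",
            3: "k", 6: "M", 9: "G", 12: "T"}


def to_prefix_form(value, unit):
    """Return (number, prefixed unit) with the number between 1 and 999."""
    if value == 0:
        return 0.0, unit
    # Largest multiple of 3 that does not exceed the power of ten of the value.
    exponent = math.floor(math.log10(abs(value)) / 3) * 3
    # Stay within the range of prefixes listed above.
    exponent = max(-12, min(12, exponent))
    return value / 10 ** exponent, PREFIXES[exponent] + unit


number, unit = to_prefix_form(4e-8, "m")
print(round(number, 3), unit)
```

For the example in the text, $4 \times 10^{-8}$ m comes out as about 40 with the unit nm. The result is rounded for display because computer arithmetic on decimal fractions is not exact.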

Notes on using units and prefixes


Try to avoid centi (c, $10^{-2}$) or deci (d, $10^{-1}$), for example, cm, dm. They're not factors of a thousand and cause confusion.
Names of units are always spelt with a small letter, even if they're named after scientists (for example, joule).
Symbols do not need a full stop (like an abbreviation) or an s (like a plural), for example, 3 min not 3 mins.
There should be a space between the value and its symbol (for example, 6 g not 6g).
There is no space between a prefix and a symbol (for example, 7 mN not 7 m N).
Use a space to indicate thousands, not a comma (for example, 72 000 not 72,000).
Use the index $-1$ for division in units, not a slash (for example, kJ m$^{-2}$ y$^{-1}$ not kJ/m$^{2}$/y).


Using software for data analysis


Until recently all data analysis and statistical tests had to be carried out by hand using a calculator. But these days computer software can be used to carry out data analysis, more quickly and more accurately than by hand. A search of the internet will reveal a large range of statistical software available for biologists to use, from big professional packages like SPSS or Minitab, to completely free software, and many of these will be able to do the analysis described here. Many biologists also use Microsoft Excel for data analysis, which is an excellent general-purpose spreadsheet, but lacks a number of useful statistical tests and charts.

Merlin is an add-in for Excel, which allows Excel to do most of the statistical tests and charts a biology student might want to use. Functions like MEAN and ANOVA are simply typed into cells like any other Excel function. There are no calculations and no look-up tables, so hopefully no mistakes! Merlin can also be used for results tables and charts, which can be plotted quickly, accurately and neatly. Merlin also includes a test chooser, which suggests the correct test to apply following a series of simple questions. The Merlin package comes with an introduction to statistics for biology students, and worksheets with practice questions. Merlin is completely free, and is available from: www.heckgrammar.kirklees.sch.uk/index.php?p10310 If this URL should fail, try a search for merlinstatistics.

Summarising data
Having collected experimental data with a suitable number of replicates, the next step is to summarise the replicate data with one or two values, which we can plot on a chart. In general, we want the centre point of the replicates and an indication of the spread of the replicates (also known as the error) around the centre. Although some of these summary values can be calculated by hand, these days biologists usually use computer software for all their calculations.

The mean, standard deviation and confidence interval


Imagine an investigation into the effectiveness of a new fertiliser on the growth of potatoes. Twenty equal-sized plots are set up in a field with 10 similar potato plants growing in each of them. Ten plots are chosen at random to be treated with the old fertiliser (A), while the other 10 plots are treated with an equal concentration of the new fertiliser (B). So there are 10 replicates for each fertiliser. After a fixed length of time the potato crop is harvested and the total mass of potatoes from each plot (the yield) is measured. The yields for the 20 plots are shown in this Excel results table (Figure A2.2).
Figure A2.2
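Choosing the 10 plots for fertiliser A at random is easy to get wrong by eye, so it can help to let software do it. The sketch below is one possible way, assuming the plots are simply numbered 1 to 20; the function name `allocate_plots` and the numbering are our own and are not part of the original investigation.

```python
import random


def allocate_plots(n_plots=20, n_per_treatment=10, seed=None):
    """Randomly split numbered plots between fertiliser A and fertiliser B."""
    rng = random.Random(seed)          # a seed makes the allocation repeatable
    plots = list(range(1, n_plots + 1))
    plots_a = sorted(rng.sample(plots, n_per_treatment))
    plots_b = [p for p in plots if p not in plots_a]
    return plots_a, plots_b


plots_a, plots_b = allocate_plots(seed=1)
print("Fertiliser A:", plots_a)
print("Fertiliser B:", plots_b)
```

Every plot has the same chance of receiving either fertiliser, which is what evens out differences in soil, shade or drainage across the field.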


We need to summarise all these data. We can assume that the yields are normally distributed, so the central value of the replicates is given by the mean (also called the arithmetic mean, or average). To calculate the mean for fertiliser A using Merlin (see 'Using software for data analysis'), cell B13 contains the formula MEAN(B3:B12). We really need to know how reliable this calculated mean is, so we must also calculate a measure of the error of the mean. The most common measures of spread or error are the standard deviation (SD) and the 95% confidence interval (CI). Both values are calculated in this example: cell B14 contains the formula STDEV(B3:B12) and cell B15 contains the formula CI(B3:B12). These formulae are all dragged across to column C to quickly make the same calculations for fertiliser B.

The SD indicates the spread or scatter of the replicates, while the CI indicates how accurate the calculated mean is. The CI has a simple meaning: we can be 95% confident that the range mean ± CI encloses the true mean. The confidence interval illustrates an important point about sampling. It is tempting to say that the true mean yield for fertiliser A was 20.2 kg, but this would be wrong because we've only measured a fairly small sample of 10 plots, not every potato. Instead we should say that, from our sample, we are confident that the true mean yield for fertiliser A lies somewhere in the range 17.9 kg (20.2 − 2.3) to 22.5 kg (20.2 + 2.3). These values are called the lower confidence limit and the upper confidence limit respectively. The error values are often quoted together with the calculated mean, so we might say that the yield with fertiliser A was 20.2 ± 2.3 kg. It is good practice to calculate a confidence interval whenever a mean is calculated.
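For readers without Excel, the same three summary values can be sketched in Python. Two things here are assumptions rather than anything stated above: the yields in the list are invented for illustration (they are not the values in Figure A2.2), and the confidence interval is calculated in the usual textbook way, as the critical value of Student's t multiplied by the standard error of the mean. We have not checked that Merlin's CI function uses exactly this method.

```python
import math
import statistics

# Two-tailed 5% critical values of Student's t, keyed by degrees of freedom
# (standard table values; 9 degrees of freedom applies to 10 replicates).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def summarise(replicates):
    """Return the mean, sample SD and 95% CI half-width of the replicates."""
    n = len(replicates)
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)            # divides by n - 1
    ci = T_CRIT_95[n - 1] * sd / math.sqrt(n)    # t x standard error
    return mean, sd, ci


# Hypothetical yields in kg for ten plots (invented for this sketch).
yields_a = [18.5, 22.1, 19.8, 24.0, 16.9, 21.3, 20.4, 17.2, 23.5, 18.3]
mean, sd, ci = summarise(yields_a)
print(f"mean = {mean:.1f} kg, SD = {sd:.1f} kg, 95% CI = ±{ci:.1f} kg")
```

Notice that the CI shrinks as the number of replicates grows, because the SD is divided by the square root of n, whereas the SD itself does not. This is why more replicates give a more reliable mean.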

Error bars
Error values should also be shown on charts as error bars, to give a visual indication of the error of the mean. There is no standard for what error bars should be, so the figure legend should state whether the error bars are SD or CI (or something else). Again, the most useful is probably the CI, since it shows graphically the likely range of the true mean. In Figure A2.2, Merlin has been used to plot the mean yields on a bar chart, with the confidence intervals shown as error bars. This makes the differences between the fertilisers easy to see. Although the bar for fertiliser B is taller than that for A, the error bars show that there is a good deal of uncertainty in both means. Since the confidence intervals overlap, we can't be sure that fertiliser B is significantly better than fertiliser A. In the A2 Biology course, we shall learn about statistical tests that can properly test for significant differences like this, but in the meantime, the confidence intervals give us a pretty good idea.

The error bars can also be used to evaluate a conclusion, as this next example shows. It concerns an investigation into the effect of light intensity on the rate of photosynthesis. Four replicate measurements were taken at four different light intensities, and the means and 95% confidence intervals were calculated using Merlin. A line graph is drawn of the mean rates against light intensity, with error bars showing the 95% CI of the means (Figure A2.3). This graph needs a smooth line of best fit, since there must be a smooth continuous relation between light intensity and rate of photosynthesis. Now, we are confident that the true means lie somewhere within the 95% CI error bars, so any line of best fit must pass through the error bars. In this case a curved line seems to fit the mean values best (solid line), but can we draw a different line through the error bars? In fact, a straight line can also be drawn through the error bars (broken line), so a linear relationship can't be ruled out.
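The overlap check used for the two fertilisers can be written as a tiny function. In this sketch the figures for fertiliser A (20.2 ± 2.3 kg) come from the text, but the figures for fertiliser B are invented, since the values in Figure A2.2 are not reproduced here. Remember that this is only a rough guide and not a proper statistical test.

```python
def intervals_overlap(mean_a, ci_a, mean_b, ci_b):
    """Return True if the ranges mean ± CI of two samples overlap."""
    low_a, high_a = mean_a - ci_a, mean_a + ci_a
    low_b, high_b = mean_b - ci_b, mean_b + ci_b
    return low_a <= high_b and low_b <= high_a


# Fertiliser A from the text; fertiliser B values are hypothetical.
print(intervals_overlap(20.2, 2.3, 23.0, 2.6))
```

With these numbers the upper limit for A (22.5 kg) is above the lower limit for B (20.4 kg), so the intervals overlap and we cannot be sure the fertilisers differ.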



Figure A2.3

The median
This next example shows an investigation into the abundance of brown algae on two rocky shores. Eight quadrats were randomly placed on a sheltered shore and eight on an exposed shore, and the abundance of algae in each quadrat was recorded. To make the data quicker and easier to collect, abundance is measured on a five-point scale, with 5 being the most abundant. This kind of scale is an example of ordinal data, because it's not an even scale (for example, 4 is not necessarily twice as abundant as 2). With ordinal data, it's meaningless to calculate a mean or a CI, and instead the best measure of the central value is the median, which is the value with equal numbers of replicates above and below it. The spread of the data is given by the five-point summary, which divides the data into four quartiles, each containing 25% of the data. Half of the data points are located between the first and third quartiles, and this range (known as the interquartile range) can be used as a measure of the spread.
Figure A2.4

In the results table for the rocky shore investigation, the median is calculated with the Excel formula MEDIAN(B3:B10) in cell B11, and the interquartile range is calculated using the Excel formula QUARTILE(B3:B10,3)-QUARTILE(B3:B10,1) in cell B12. These formulae were then dragged across to column C to instantly make the same calculations for the sheltered shore (Figure A2.4).
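The median, the interquartile range and the rest of the five-point summary can also be sketched in Python. The abundance scores below are invented for illustration, and the function name `five_point_summary` is our own. The sketch uses the "inclusive" method of the standard `statistics.quantiles` function, which should interpolate between data points in the same way as Excel's QUARTILE function.

```python
import statistics


def five_point_summary(scores):
    """Return (minimum, 1st quartile, median, 3rd quartile, maximum)."""
    q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, q2, q3, max(scores)


# Hypothetical abundance scores (1 to 5) from eight quadrats on one shore.
scores = [3, 1, 5, 2, 4, 2, 5, 3]
minimum, q1, median, q3, maximum = five_point_summary(scores)
print("median =", median)
print("interquartile range =", q3 - q1)
```

The scores do not need to be sorted first, because the quartile calculation sorts them itself.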



Figure A2.5 A box plot. [Labels, from top to bottom: 4th quartile (maximum); 3rd quartile; 2nd quartile (median); 1st quartile; 0th quartile (minimum).]

The five-point summary is shown graphically as a box plot (or box and whisker plot), Figure A2.5. This shows the median as a central line, the interquartile range as a box, and the maximum and minimum values as whiskers. Unlike confidence intervals of the mean, the box plot need not be symmetrical, and indeed an asymmetrical box plot is a good indicator that the replicates are not normally distributed. A box plot for the rocky shore investigation has been drawn using Merlin (see 'Using software for data analysis') in Figure A2.4.

The mode
A third possible measure of the centre of a set of data is the mode, which is the most frequent replicate value. There can be more than one mode, if more than one value is equally frequent. The mode is rarely used in biology, except to indicate a bimodal distribution (one with two modes or peaks).
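Python's standard `statistics.multimode` function returns every value that ties for most frequent, so it copes with the case of more than one mode. The data below are invented for illustration.

```python
import statistics

# Hypothetical replicate values in which 2 and 3 are equally frequent.
replicates = [1, 2, 2, 3, 3, 4]
modes = statistics.multimode(replicates)
print(modes)
```

A result containing two values, as here, is a hint that the distribution may be bimodal, though with so few replicates it could easily be chance.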

Displaying data
Almost all experimental data benefit from being displayed as some kind of chart. Humans are visual animals, and we are more likely to understand a well-chosen chart than a table of numbers. The choice of a suitable chart depends on the type of data and the type of investigation.

Scatter graphs and line graphs plot two numeric variables against each other, allowing trends between them to be observed. These graphs can be used for any kind of numeric data: continuous, discrete, normally distributed or ordinal. Note that the term 'line graph' in business and spreadsheet terminology refers to a different kind of graph where a variable is plotted at equal intervals (usually in a time series).

When both variables are being measured, and you don't yet know whether there is a relation between them, plot a scatter graph (or scattergram). In Figure A2.6, mass of seeds and number of seeds produced were both measured for a number of plants to see if there was a link (or correlation) between them. Neither variable depends on the other, so there is no independent or dependent variable. The axes could be plotted either way round and no line is drawn.
Figure A2.6 A scatter graph. [Axes: number of seeds, 0 to 50 (horizontal); mean mass per seed/mg, 0 to 16 (vertical).]


When you know (or suspect) that one variable affects the other one, then plot a line graph. You usually perform a controlled experiment, varying one variable (the independent variable) while measuring the other (the dependent variable). Now the orientation does matter: the independent variable is always plotted on the horizontal (x) axis, while the dependent variable is on the vertical (y) axis. Figure A2.7 shows the results of an enzyme investigation in which the substrate concentration (independent variable) was varied while the rate of reaction (dependent variable) was measured. Here we know that there is a continuous underlying relationship between the independent and dependent variables, so we plot a smooth line of best fit through the data (in this case a curved line).
Figure A2.7 A graph with a line of best fit. [Axes: substrate concentration/mM, 0 to 10 (horizontal); rate of reaction/units, 0 to 9 (vertical).]

If we're not sure that there is a smooth relation, and so the intermediate values between the measured points are uncertain, we can't draw a smooth line of best fit. Instead, we join the points with straight lines just to indicate that the points are linked in order, but we're not making any assumptions about intermediate values. In the example in Figure A2.8, a patient's temperature was measured every two hours throughout a day. The temperature could vary considerably between measurements. Sometimes the points can be omitted completely, giving just the jagged line.
Figure A2.8 A graph with no line of best fit. [Axes: time of day, 00:00 to 00:00 in two-hour steps (horizontal); temperature/°C, 36.0 to 39.5 (vertical).]


Bar charts plot a numeric dependent variable against a categoric independent variable. Bar charts allow differences between the categories to be observed. The bars may be horizontal or vertical (a chart with vertical bars is sometimes called a column chart). There should be a gap between the bars, and the bars can be in any order, though sometimes it can help understanding if the bars are arranged in size order. In Figure A2.9, the first example is a column chart, with the dependent variable plotted on the vertical axis. In the second example the categories have long names, so it is easier to read when plotted with horizontal bars.
Figure A2.9 Vertical and horizontal bar charts. [Left: yield of potatoes/kg, 0 to 25 (vertical axis), for fertilisers A, B and C. Right: net primary productivity, 0 to 70 (horizontal axis), for swamp and marsh, tropical rain forest, temperate forest, agricultural land, coniferous forest, temperate grassland, tundra and desert.]

Box plots are used as an alternative to bar charts where you want to show the range of replicates, or the five-point summary (see 'The median'). They are especially appropriate when the dependent variable is not normally distributed. Like bar charts, box plots can be plotted horizontally or vertically. The two box plots in Figure A2.10 show pH measurements from eight samples of pond water in four different ponds, shown as vertical and horizontal box plots.
Figure A2.10 Vertical and horizontal box plots. [pH, 7.6 to 8.0, for ponds 1 to 4.]


Mosaic charts plot two nominal variables against each other, allowing associations between them to be observed. This is the equivalent of the scatter graph for numeric data. In the example in Figure A2.11, a number of moth traps were set up in three different habitats, and the number of moths trapped of three species was recorded. The width of each bar is proportional to the frequency of that species, and the height of each row is proportional to the frequency of that habitat. Large boxes indicate an association between the two categories, so in this case moth A prefers woodland, moth B prefers fields and moth C prefers mountains.
Figure A2.11 Mosaic chart. [Rows: woodland, field, mountain. Columns: species A, species B, species C.]

Histograms are used to show the distribution of many replicate measurements of a single numeric variable (continuous or discrete). Histograms look like bar charts but the variable in question is plotted on the horizontal axis and the vertical axis always shows frequency (so histograms are sometimes called frequency histograms). If the variable is continuous it is divided into classes or bins, and the number of replicates in each class (the frequency) is tallied. The class size must be chosen to optimise the number of bars and their height. The histogram has no gaps between the bars to indicate the continuous scale, and the labels denote the boundaries between each class. In the example in Figure A2.12, the body temperature of 30 individuals was measured. Body temperature is a continuous variable, and the range has been divided into 6 classes each 0.3 °C wide.
Figure A2.12 Histogram involving a continuous variable: body temperature. [Axes: body temperature/°C, with class boundaries at 35.8, 36.1, 36.4, 36.7, 37.0, 37.3 and 37.6 (horizontal); frequency, 0 to 16 (vertical).]
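Tallying a continuous variable into classes can be sketched as follows. The class boundaries (starting at 35.8 °C, six classes each 0.3 °C wide) are taken from the body temperature example, but the temperatures in the list are invented and the function name `class_frequencies` is our own. The sketch assumes each class includes its lower boundary but not its upper one, and it works in whole tenths of a degree so that a reading exactly on a boundary is not misplaced by rounding errors.

```python
def class_frequencies(values, lower, width, n_classes, decimals=1):
    """Tally values into equal-width classes; each class includes its lower boundary."""
    scale = 10 ** decimals             # work in whole tenths (for decimals=1)
    lower_int = round(lower * scale)
    width_int = round(width * scale)
    counts = [0] * n_classes
    for value in values:
        index = (round(value * scale) - lower_int) // width_int
        if 0 <= index < n_classes:     # ignore values outside the classes
            counts[index] += 1
    return counts


# Hypothetical body temperatures in degrees Celsius (invented for this sketch).
temperatures = [35.9, 36.1, 36.3, 36.4, 36.5, 36.6,
                36.7, 36.8, 36.9, 37.0, 37.2, 37.4]
print(class_frequencies(temperatures, lower=35.8, width=0.3, n_classes=6))
```

The list of counts gives the bar heights from left to right. Trying a different class width on the same data is a quick way to see how the choice of class size changes the shape of the histogram.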


If the variable is discrete, the histogram looks more like a normal bar chart, with gaps between the bars and labels in the middle of each bar. The example in Figure A2.13 shows the number of worms found in 40 randomly placed quadrats (a discrete variable).
Figure A2.13 Histogram involving a discrete variable: number of worms per quadrat. [Axes: worms per quadrat, 1 to 8 (horizontal); frequency, 0 to 12 (vertical).]

A histogram can also be plotted as a frequency polygon to show the shape of the distribution more clearly. Instead of bars this has points joined by straight lines. Figure A2.14 shows the worm distribution as a frequency polygon.
Figure A2.14 Histogram plotted as a frequency polygon. [Axes: worms per quadrat, 1 to 8 (horizontal); frequency, 0 to 12 (vertical).]

Pie charts can be used to show the distribution of many replicate measurements of a single categoric variable, for example, the frequencies of different phenotypes in a genetic cross, shown in Figure A2.15. The size of each segment is proportional to its frequency. Pie charts are not very common in biology, although they are common in business.
Figure A2.15 Pie chart. [Segments: red, white, pink.]


Kite charts (or kite diagrams) are used to display the abundance of different species along a transect. Species abundances are shown as kites, with the width of the kites being proportional to the abundance. Kite charts are often combined with line graphs and bar charts to display abiotic data along the transect as well. A good kite diagram makes it easy to identify trends along a transect and perhaps suggest which species are competing, or which species are adapted to certain abiotic factors. The abundances can be measured on any scale, such as number of individuals, percentage cover, or a 1–5 scale such as the ACFOR scale (Abundant, Common, Frequent, Occasional, Rare). It's often helpful to arrange the order so that species abundant at the start of the transect are plotted first and species abundant at the end of the transect are plotted last, and some software can do this automatically. The kite graph in Figure A2.16 shows the distribution of various marine animals and plants on a transect down a rocky shore.
Figure A2.16 Kite chart. [Species plotted: gut weed, bladder wrack, spiral wrack, coral weed, barnacles, ceramium, limpets, Irish moss, saw wrack, mussels. Horizontal axis: distance/m, 0 to 30.]

Ecology statistics
This section describes two special statistical techniques used by ecologists: estimating the size of a population of animals, and measuring the diversity of a community.

Estimating populations
One of the most common measurements in ecology is measuring the population of a certain species in a habitat. If the organisms are sessile (they don't move) then populations can easily be measured by counting the individuals in a number of randomly placed quadrats and scaling up to the whole area, but if the organisms are mobile the measurement is much more difficult. Several techniques are available, but the simplest is the mark–release–recapture technique. The population estimate using this technique is known as the Lincoln-Petersen Index.


The basic procedure is very simple: a number of animals are captured using a suitable random technique (nets or traps, for example), marked, released and some time later a second sample is captured. The Lincoln-Petersen Index is:

$$N = \frac{MC}{R}$$

where:
N = estimate of the population (the Lincoln-Petersen Index)
M = number of individuals captured and marked in the first sample
C = number of individuals captured in the second sample
R = number of marked individuals recaptured in the second sample

In fact, this simple formula is biased and tends to produce an overestimate of the true population, especially for small samples, and a slightly different version is used by some software, including Merlin.

There are several assumptions in the mark–release–recapture technique if it is to be at all accurate:

the population is closed, that is, there is no change in population between the first and second sampling
the trapping is random, that is, all animals have an equal chance of being caught
the marks must last until the second sampling, and must not affect the animals' catchability or survival
the marked animals must be randomly distributed in the population by the time of the second sampling.
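The Lincoln-Petersen Index (the number marked multiplied by the number caught in the second sample, divided by the number recaptured) is simple enough to sketch in Python. The second function shows Chapman's modification, which is one widely used bias-corrected version. The text does not say which version Merlin uses, so it should not be assumed to be this one. The sample numbers are invented for illustration.

```python
def lincoln_petersen(marked, caught, recaptured):
    """Simple Lincoln-Petersen estimate: N = M x C / R."""
    if recaptured == 0:
        raise ValueError("no marked animals were recaptured, so N cannot be estimated")
    return marked * caught / recaptured


def chapman(marked, caught, recaptured):
    """Chapman's bias-corrected estimate: N = (M + 1)(C + 1) / (R + 1) - 1."""
    return (marked + 1) * (caught + 1) / (recaptured + 1) - 1


# Hypothetical study: 50 animals marked, 40 caught later, 10 of them marked.
print(lincoln_petersen(50, 40, 10))
print(round(chapman(50, 40, 10), 1))
```

The corrected estimate is a little lower than the simple one, and the gap widens as the number of recaptures gets smaller, which is where the simple formula is least trustworthy.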

In practice, this means that the timing between the two samples is quite critical: long enough for the marked animals to mix, but not long enough for the population to change or the marks to come off. Suitable marks include rings on birds, tags on fish and mammals, and spots of paint on invertebrates (marks with UV security pens are very good since they are invisible to predators and investigators alike but show up under ultraviolet light). Even if all the criteria are met, the Lincoln-Petersen Index is still a very inaccurate estimate of the true population, so it is essential that an error value is also calculated, such as the 95% confidence interval, which is analogous to the confidence interval of the mean. There is a 95% chance that the true population size lies between the lower and upper confidence limits, where the lower confidence limit = index − 95% CI and the upper confidence limit = index + 95% CI. It can be quite alarming to see just how wide the confidence range is!
Figure A2.17 Results of a mark–release–recapture study.


The example in Figure A2.17 shows the number of woodlice captured in a mark–release–recapture study. The Merlin formula LINCOLN(B1:B3) is entered into cell B5 and the formula LINCOLNCI(B1:B3) is entered into cell B6. This shows that the Lincoln-Petersen population estimate is 217–361 woodlice (289 ± 72).

Measuring diversity
An important and interesting property of an ecosystem is its species variety or diversity. Communities with a large number of individuals of many different species are said to have high diversity, while those with only small populations of a few species have a low diversity. The concept of diversity is important to an understanding of habitats and conservation, and it can be linked to stability and degree of human impact. There are several different measures of species diversity, but one of the most common is the Simpson Diversity Index (D), which takes into account both species richness and species evenness. Simpson's Index varies from 0 up to the total number of species in the sample, with higher values indicating greater diversity.
Figure A2.18 Results of a study to calculate Simpson Diversity Index.

The example in Figure A2.18 shows the number of invertebrates captured in a three-minute kick-sample in a stream. Site 1 is upstream of a small town and site 2 is downstream, below a sewage outlet. Cell B17 contains the Excel formula SUM(B3:B16), cell B18 contains the Excel formula COUNT(B3:B16), and cell B19 contains the Merlin formula SIMPSON(B3:B16). These formulae were all dragged across to column C. In the example here, site 1 has a higher diversity than site 2 by all the different measures.
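The three measures in this example (total number of individuals, number of species, and Simpson's Index) can be sketched in Python. The formula for D is an assumption, because this appendix does not state the one Merlin uses. The sketch uses the reciprocal form, $D = 1 / \sum (n/N)^2$, where n is the number of individuals of each species and N is the total, chosen because its maximum equals the number of species, which fits the range described above. The counts are invented for illustration.

```python
def simpson_reciprocal(counts):
    """Reciprocal Simpson index: 1 / sum of squared proportions (assumed form)."""
    counts = [c for c in counts if c > 0]      # ignore species with no individuals
    total = sum(counts)
    if total == 0:
        return 0.0                             # an empty sample has no diversity
    return 1 / sum((c / total) ** 2 for c in counts)


# Hypothetical kick-sample counts for four species at two sites.
site_1 = [10, 10, 10, 10]    # evenly spread between species
site_2 = [97, 1, 1, 1]       # dominated by one species

for name, counts in (("site 1", site_1), ("site 2", site_2)):
    richness = sum(1 for c in counts if c > 0)
    print(name, "total =", sum(counts), "species =", richness,
          "D =", round(simpson_reciprocal(counts), 2))
```

Both invented sites have the same number of species, but the index is much lower where one species dominates, which shows how D responds to evenness as well as richness.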

