
Statistical Sciences

STA2007F Project 1

Regression: Macrofauna diversity of sandy beaches

Group Members
Profesor Shabangu (shbpro003), Karabo Pookgwadi (pkgkar001), Bangizwe Dlamini (dlmban006)

September 7, 2018
Abstract
This report summarizes the statistical modelling and analysis of data from a fictional study on the species richness of intertidal macrofauna found on sandy beaches of the West Coast of the Western Cape. Our task was to analyze the given data and find the statistical model(s) that best describe the relationship between the response variable (species richness) and various environmental factors believed to have a significant impact on species richness. We used multiple linear regression and model selection in R to choose our best models. According to our first model (model 1), there is a positive correlation between species richness and the width of the beach, as one would expect a greater variety of species in a larger habitat. In our second model, the explanatory variables include sea surface temperature. This also makes sense, since higher speciation rates are expected in warmer conditions as a consequence of increased metabolic rates. However, the relationship between richness and sea surface temperature is not strong, possibly because different animals prefer different temperatures.

Introduction
We study the species richness of intertidal macrofauna found on sandy beaches of the West Coast in the Western Cape, in order to identify the factors that influence this ecosystem and to see how humans affect it. We have reason to believe that undisturbed beaches with high phytoplankton biomass and a small distance to the kelp bed should have a high number of species.

Data and Method


We had 60 samples of the same size, taken from a random selection of different beaches, and recorded the species richness of each. We then collected data on the variables that we think affect species richness on the beaches. The variables are: the coarseness of the sand, width of the beach, beach face slope, average height of waves, wave period (seconds), sea surface temperature, construction disturbance, fishing/bait collection, distance to the nearest kelp bed, phytoplankton biomass (chlorophyll concentration, mg/m³) and beach type classification.

We assume that the relationships between our explanatory variables and species richness are linear, so we perform multiple regression to explain the influence of these variables on species richness. We proposed eight models which we think may explain species richness.
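Multiple regression models species richness as a linear combination of the explanatory variables, with the coefficients chosen by least squares. The report fitted its models in R; as a minimal sketch of what such a fit computes, assuming NumPy is available, the hypothetical helper fit_ols below returns the coefficients, residuals, residual sum of squares (RSS) and R². It is not the project's code, and the toy numbers in the usage note are made up.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    X: (n, p) array of explanatory variables (no intercept column).
    y: (n,) array of responses, e.g. species richness per sample.
    Returns (coefficients with intercept first, residuals, RSS, R-squared).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    rss = float(residuals @ residuals)
    # Total sum of squares about the mean of the response.
    tss = float(((y - y.mean()) ** 2).sum())
    r_squared = 1.0 - rss / tss
    return coef, residuals, rss, r_squared
```

For example, fit_ols([[1], [2], [3], [4]], [5, 8, 11, 14]) recovers an intercept of 2 and a slope of 3 with an RSS of essentially zero, because those toy points lie exactly on y = 2 + 3x.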

model 1: Since more phytoplankton biomass generally means more food for a variety of species, it should affect the number of species that can occupy the area.
Variables: beach.width + dist.to.kelp + phyto.bio

model 2:
The slope of the beach face changes which species live there. Bait collection potentially means that more species are able to live there.
Variables: slope + bait.collect + beach.type
model 3: Sea temperature affects beaches differently.
Variables: beach.type + SST + wave.period + wave.height
Wave period and wave height affect the richness of different beaches differently.
model 4: wave period, wave height and phytoplankton mass should affect richness,
and have different effects at different beaches.
Variables: beach.type + wave.period + phyto.bio + wave.height
model 5:
Richness can be explained by the width of the beach, because a wide beach offers a large space for many organisms to share; by the interaction between average wave height and wave period; by the distance to the kelp bed, since these large brown algae are the centre of a large ecosystem; by phytoplankton biomass, which attracts different organisms to places of high concentration; and by the coarseness of the sand and the beach face slope.
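Model 5 includes an interaction between average wave height and wave period, so the effect of one on richness is allowed to depend on the other. In an R formula, wave.height * wave.period expands to both main effects plus their product. The sketch below (hypothetical helper names, assuming NumPy and made-up numbers) shows that the interaction simply adds a product column to the design matrix.

```python
import numpy as np


def add_interaction(columns, a, b):
    """Return a copy of `columns` (name -> 1-D sequence) with an extra
    column "a:b" holding the element-wise product of columns a and b,
    mirroring how R expands `a * b` into a + b + a:b."""
    out = dict(columns)  # shallow copy, so the caller's dict is unchanged
    out[f"{a}:{b}"] = np.asarray(columns[a], dtype=float) * np.asarray(columns[b], dtype=float)
    return out


def design_matrix(columns, names):
    """Stack the named columns side by side (no intercept column)."""
    return np.column_stack([np.asarray(columns[n], dtype=float) for n in names])
```

The resulting matrix, with the main effects and the product column, can be passed to any least-squares routine; only the extra product column distinguishes a model with the interaction from one without it.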

model 6: Our explanatory variables for richness are sea surface temperature, because different animals prefer different temperatures, so this may add insight into the response; disturbance by human construction, which we know usually affects the ecosystem of that particular environment; and fishing or bait collection, which can also be a factor in determining the richness of a beach.

model 7: Here we pick the variables from model 1 and model 2 that are considered defining for the marine life of many organisms, i.e. phytoplankton biomass, distance to the kelp bed, beach type classification and whether the beach is disturbed.

model 8: In case model 7 has overlooked some factors that can affect richness, we added bait collection to model 7 to see whether it helps us explain more of the variation in richness.

Analysis and Results
Model 1:

Output: r.squared = 0.4706 ; adj.r.squared = 0.4423
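The adjusted R² reported by R penalizes R² for the number of slope coefficients p: adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The small sketch below assumes n = 60 samples and takes p from the residual degrees of freedom in the analysis of variance table (56 for model 1, so p = 3). The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 for a model with n observations and p slope
    coefficients (intercept not counted)."""
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - p - 1)


# Model 1: n = 60 samples, residual df 56 = 60 - 3 - 1, so p = 3.
print(round(adjusted_r_squared(0.4706, 60, 3), 4))
# prints 0.4422 (the reported 0.4423 comes from the unrounded R^2)
```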

Figure 1: Scatterplot matrix for model 1

Figure 2: Diagnostic plots for model 1

Model 5:

Output: r.squared = 0.5871462 ; adj.r.squared = 0.5223849

Figure 3: Scatterplot matrix for model 5

Output: the diagnostic plots for model 5 do not show serious problems.

Figure 4: Diagnostic plots for model 5

Model 6:

Output: r.squared = 0.4601 ; adj.r.squared = 0.3755

Figure 5: Scatterplot matrix for model 6

Figure 6: Diagnostic plots for model 6

Model 7:

Output: adj.r.squared = 0.6882257

Figure 7: Scatterplot matrix for model 7

Constant error variance: chi-squared = 1.154508; p-value = 0.2826074 (no evidence against constant variance)

Figure 8: Diagnostic plots for model 7
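The report does not say which test produced the chi-squared statistics for constant error variance. A common choice in R is the score test of Cook and Weisberg (available as ncvTest in the car package), which regresses the scaled squared residuals on the fitted values. Assuming that test, a minimal NumPy sketch is below; the helper names are hypothetical. The p-value uses the fact that a chi-squared variable with 1 degree of freedom is the square of a standard normal, so P(X > x) = erfc(√(x/2)).

```python
import math

import numpy as np


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for X ~ chi-squared(1)."""
    return math.erfc(math.sqrt(x / 2.0))


def score_test_nonconstant_variance(fitted, residuals):
    """Cook-Weisberg score test of constant error variance against the
    fitted values. Returns (chi-squared statistic, p-value) on 1 df."""
    fitted = np.asarray(fitted, dtype=float)
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    # Squared residuals scaled by the ML variance estimate RSS / n.
    u = e ** 2 / (float(e @ e) / n)
    # Regress u on the fitted values (with intercept).
    design = np.column_stack([np.ones(n), fitted])
    coef, *_ = np.linalg.lstsq(design, u, rcond=None)
    # Regression sum of squares of that auxiliary fit; half of it is the statistic.
    reg_ss = float(((design @ coef - u.mean()) ** 2).sum())
    chisq = reg_ss / 2.0
    return chisq, chi2_sf_1df(chisq)
```

As a check on the p-value helper alone, chi2_sf_1df(1.154508) and chi2_sf_1df(1.208716) give about 0.2826 and 0.2716, matching the values reported for models 7 and 8.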

Model 8:

Output: adj.r.squared = 0.6822855

Figure 9: Scatterplot matrix for model 8

Constant error variance: chi-squared = 1.208716; p-value = 0.2715865 (no evidence against constant variance)

Figure 10: Diagnostic plots for model 8

Checking normality using the Shapiro-Wilk test:
model 7: W = 0.98688, p-value = 0.7671 (no evidence against normality)
model 8: W = 0.98749, p-value = 0.7971 (likewise no evidence against normality)
From the diagnostic plots above, models 7 and 8 appear to have constant error variance, show no departure from normality and have no influential points, and they have the smallest residuals.
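Influential points are usually judged with Cook's distance, which combines each observation's residual with its leverage (the diagonal of the hat matrix). In R's default diagnostic plots for lm fits it appears as contour lines on the residuals-versus-leverage panel. A minimal NumPy sketch, with a hypothetical helper name and toy data in the check below:

```python
import numpy as np


def cooks_distance(X, y):
    """Leverages h_ii and Cook's distances D_i for an OLS fit with intercept.

    D_i = e_i^2 / (k * s^2) * h_ii / (1 - h_ii)^2, where k is the number of
    coefficients (including the intercept) and s^2 = RSS / (n - k).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    n, k = design.shape
    # Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages.
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    h = np.diag(hat)
    e = y - hat @ y  # residuals, since the fitted values are H y
    s2 = float(e @ e) / (n - k)
    d = e ** 2 / (k * s2) * h / (1.0 - h) ** 2
    return h, d
```

A common rule of thumb flags observations with D_i greater than 1 (or, more cautiously, greater than 4/n) for a closer look; the leverages always sum to the number of coefficients k.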

Akaike's Information Criterion (AIC) table:

Model   −2 × log-likelihood   K    AIC         ΔAIC        Akaike weight w     Adjusted R²
1       367.7320              5    392.64869   31.343071   1.37551 × 10⁻⁷      0.4422620
2       383.8241              6    399.8798    38.574295   3.060065 × 10⁻⁹     0.3803871
3       343.3055              7    410.1200    48.814442   1.828568 × 10⁻¹¹    0.2760084
4       343.3055              7    403.4927    42.187130   5.025897 × 10⁻¹⁰    0.3517187
5       367.7320              10   387.7320    26.426461   1.329227 × 10⁻⁶     0.5223849
6       383.8241              10   403.8241    42.518518   4.258471 × 10⁻¹⁰    0.3754657
7       343.3055              9    361.3055    0.000000    0.7278343           0.6882257
8       343.3055              10   363.2729    1.967335    0.2721643           0.6822855
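The ΔAIC and weight columns follow from the AIC column. Each model's AIC is −2 log L + 2K, ΔAIC is its distance from the smallest AIC, and the Akaike weight is w_i = exp(−ΔAIC_i/2) / Σ_j exp(−ΔAIC_j/2); the evidence ratio between two models is the ratio of their weights. A plain-Python sketch using the AIC values copied from the table (the function name is illustrative):

```python
import math


def akaike_weights(aics):
    """Return (delta AIC, Akaike weights) for a list of AIC values."""
    best = min(aics)
    deltas = [a - best for a in aics]
    rel = [math.exp(-d / 2.0) for d in deltas]  # relative likelihoods
    total = sum(rel)
    return deltas, [r / total for r in rel]


# AIC column of the table above, models 1 to 8.
aic = [392.64869, 399.8798, 410.1200, 403.4927,
       387.7320, 403.8241, 361.3055, 363.2729]
deltas, w = akaike_weights(aic)
# Evidence ratio of model 7 over model 8 (list index 6 vs 7).
ratio_7_8 = w[6] / w[7]
```

With these inputs the weights of models 7 and 8 come out at about 0.728 and 0.272, and ratio_7_8 at about 2.67, in line with the table and with the evidence ratio quoted in the conclusions.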

Analysis of variance/deviance table:

Model   Res.Df   RSS      Df   Sum of Sq   F        Pr(>F)
1       56       2067.2
2       55       2255.5    1   -188.32
3       54       2587.6    1   -332.04
4       54       2317.0    0    270.59
5       51       1612.2    3    704.80     7.4319   0.0003197
6       51       2108.1    0   -495.92
7       52       1073.0   -1   1035.07
8       51       1072.4    1      0.58     0.0185   0.8924184
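Each F value in the table is the change in RSS per extra parameter divided by an estimate of the error variance. R's anova() applied to a sequence of lm fits uses the residual mean square of the model with the fewest residual degrees of freedom, here model 5 (RSS/Res.Df = 1612.2/51), and with that choice the sketch below reproduces the printed F values up to rounding. The helper name is hypothetical.

```python
def nested_f(sum_sq, df, scale):
    """F statistic for `df` extra parameters that change the RSS by
    `sum_sq`, with `scale` (a residual mean square) as the error variance."""
    return (sum_sq / df) / scale


# Residual mean square of model 5, the first model with 51 residual df.
scale = 1612.2 / 51
f_5 = nested_f(704.80, 3, scale)  # row 5: model 4 -> model 5 (3 df)
f_8 = nested_f(0.58, 1, scale)    # row 8: model 7 -> model 8 (1 df)
```

The corresponding p-value is the upper tail of an F distribution with (Df, 51) degrees of freedom, e.g. pf(F, Df, 51, lower.tail = FALSE) in R or scipy.stats.f.sf(F, Df, 51) in Python. For rows where Df is 0, or where Df and the sum of squares have opposite signs, the two models compared are not nested and R leaves the F test blank.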

Independence: we formally check independence using the Durbin-Watson test, where a large p-value suggests no evidence against independence (i.e. no evidence of autocorrelated errors).
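The Durbin-Watson statistic compares successive residuals: d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t², with the residuals taken in observation order. Values near 2 indicate no first-order autocorrelation, values towards 0 positive and values towards 4 negative autocorrelation. The plain-Python sketch below computes only the statistic; the p-values reported for models 7 and 8 come from a separate reference distribution (for example, car::durbinWatsonTest in R obtains them by simulation), which is not reproduced here. The function name is illustrative.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of
    the residuals divided by their sum of squares."""
    e = [float(r) for r in residuals]
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(r * r for r in e)
    return num / den
```

For instance, durbin_watson([1, -1, 1, -1]) returns 3.0, reflecting strong negative autocorrelation in that toy sequence, whereas the values of about 1.99 reported for models 7 and 8 are close to 2.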

The tables give further evidence that models 7 and 8 are the better models: model 7 has the lowest AIC and the highest adjusted R², and its RSS is almost identical to that of model 8. The diagnostic plots also show that models 7 and 8 are the best models we have.

Durbin-Watson statistic = 1.9938, p-value = 0.7537: no evidence of autocorrelation, consistent with independent errors.

Figure 11: Test for independence of model 7

Durbin-Watson statistic = 1.9919, p-value = 0.7347: no evidence of autocorrelation, consistent with independent errors.

Figure 12: Test for independence of model 8

Conclusions and Discussion
Model 1 explains about 44% of the variability in species richness (adjusted R²), which suggests that beach type, phytoplankton biomass and distance (m) to the nearest kelp bed are associated with species richness. Its Akaike weight is very small, which is evidence against this model being the best one.
Model 2 suggests that bait collection and slope, together with beach type, are associated with species richness, but they are not adequate to explain most of the variability in species richness on the beaches. Models 3 and 4 differ in only one variable (SST in model 3, phyto.bio in model 4), but the adjusted R² increases noticeably from 28% in model 3 to 35% in model 4; from this we infer that SST is less strongly related to species richness than phyto.bio.
In model 5, including the interaction between wave.period and wave.height (among other variables) increased the adjusted R² from 44% in model 1 to 52%, but the interaction term has a large p-value, which suggests that it may not be a useful model for us.
In model 6, most of the parameters have large p-values; only beach type has a small p-value. The adjusted R² is also considerably smaller, even though this model has many parameters. This tells us that SST, bait collection and disturbance have little influence on beach species richness.
Models 7 and 8 are the models with the highest adjusted R², 69% and 68% respectively. Since the only parameter that differs between these models is bait collection, and adding it slightly decreased the adjusted R², this suggests that bait collection is not significant in explaining species richness (its F-test p-value in the analysis of variance table is 0.89). The p-values for the parameters of model 7 are all < 0.01, which is evidence for their influence.

From this analysis we can now conclude that model 7 is the best model. Its evidence ratios relative to the other models are $w_7/w_1 = 5.29 \times 10^{6}$ compared to model 1, $w_7/w_4 = 1.448 \times 10^{9}$ compared to model 4, $w_7/w_5 = 5.4756 \times 10^{5}$ compared to model 5, $w_7/w_6 = 1.707 \times 10^{9}$ compared to model 6 and $w_7/w_8 = 2.67$ compared to model 8. This shows that model 7 is more likely than all the other models. We thus conclude that model 7 is the best model; it explains 69% of the variability in species richness on the beaches. Since the diagnostics show no evidence against constant error variance, independence or normality of the errors, we can say this is a good regression model, and the explanatory variables phyto.bio + dist.to.kelp + beach.width + slope + beach.type + disturbed best explain species richness on the beaches.

References
The work presented is not my own; I have borrowed from many sources:

R Core Team (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

B. Erni, R. Altewegg and T. Photopoulou (2018). Regression. Course Notes for STA2007H.

