VIEWS: 5 PAGES: 64 POSTED ON: 1/22/2013 Public Domain
1.) Alpha diversity –the diversity at a single point 2.) Beta diversity – the diversity in a habitat 3.) Gamma diversity – the diversity in a landscape Organism # found pi ln(pi) pi*ln(pi) Water boatman 1 0.047619048 -3.044522438 -0.144977259 Hempitera 1 0.047619048 -3.044522438 -0.144977259 Water strider 2 0.095238095 -2.351375257 -0.223940501 Bivalvia 8 0.380952381 -0.965080896 -0.367649865 Gastropoda 3 0.142857143 -1.945910149 -0.277987164 Shiner 5 0.238095238 -1.435084525 -0.341686792 Sunfish 1 0.047619048 -3.044522438 -0.144977259 21 -1.646196099 H' 1.646196099 Simpson’s Diversity Index is 1-D where Organism n n(n-1) Water boatman 1 0 Hempitera 1 0 Water strider 2 2 Bivalvia 8 56 Gastropoda 3 6 Shiner 5 20 Sunfish 1 0 Sum 21 84 0.2 Simpson's 0.8 Diversity indices provide a summary of richness and evenness by combining these two facets of diversity into a single statistic. Common ones are Shannon-Weiner index, Simpson’s index and the log alpha index. library(BiodiversityR) data(dune) Diversity.1 <- diversityresult(dune, index = 'Shannon', method = 's') Diversity.2 <- diversityresult(dune, index = 'Simpson', method = 's') Diversity.3 <- diversityresult(dune, index = 'Logalpha', method = 's') cbind(Diversity.1, Diversity.2, Diversity.3) Note that it is not statistically meaningful to use parametric or not parametric statistics to determine if sites differ “significantly” from each other in terms of diversity n = 47 + 35 + 7 + 5 + 3 +2 Michigan fi fi log fi fi log2 fi n = 99 Oak 47 78.58859932 131.4078286 Corn 35 54.04238155 83.4451144 Blackberry 7 5.91568628 4.999334881 Beech 5 3.494850022 2.442795335 Cherry 3 1.431363764 0.682934075 Other 2 0.602059991 0.181238117 Louisiana fi fi log fi fi log2 fi Oak 48 80.69957939 135.6754607 Pine 23 31.31974023 42.64896209 Grape 11 11.45531954 11.92948597 Corn 13 14.48126358 16.1313073 Blueberry 8 7.224719896 6.524572197 Other 2 0.602059991 0.181238117 n = 48 + 23 + 11 + 13 + 8+2 n = 105 Basic diversity indices are often used to measure alpha diversity The function diversity in the vegan library finds the most commonly used diversity indices. library(vegan) data(BCI) H <- diversity(BCI) H ◦ This will give you the Shannon-Weiner diversity index for each site. To measure Simpson’s index of diversity, just use H.simpson <- diversity(BCI, index = 'simpson') H.simpson Beta diversity typically relies upon dimensionless metrics based on species dissimilarity or turnover between samples. S = # of specimens C = shared species A B A B 0.25 D E C C Site 1 Site 2 A B A B D E C C A A B A A B A A A A C C For distances/dissimilarity indices, the smaller the value, the less the difference in species composition. Distance/dissimilarity matrices provide information on ecological distances/dissimilarities between all pairs of sites within your data. Under Euclidean distance, you can use each species as an axis. Constrained to be between 0 and 1 ◦ Bray-Curtis ◦ Kulczynski Values can be larger than 1 ◦ Euclidean distance ◦ Hellinger distance ◦ Chi-square distance Presence/absence ◦ Jaccard ◦ Sorenson Most of the measures of distance/dissimilarity are sensitive to the numbers of individuals encountered In many cases, it is worthwhile to transform the data into proportions Species proportions for each site are known as species profiles. library(BiodiversityR) data(dune) community.hel <- disttransform(dune, method = 'hellinger') hellinger.distance <- vegdist(community.hel, method = 'euclidean') hellinger.distance You may want to test for a correlation among the values of an environmental gradient and ecological distance matrices. For this, you will use a Mantel test – a test of the correlation between two matrices data(dune.env) envir.euc <- vegdist(dune.env$A1, method = 'euclidean') ecology.bray <- vegdist(dune, method = 'bray') mantel(envir.euc, ecology.bray, 'pearson') The correlation is fairly low (r = 0.2379) but the results are significant (p = 0.042). The function betadiver in vegan estimates any of the 24 indices of beta diversity reviewed by Koleff et al. (2003) library(vegan) data(sipoo) What are the variables? ?sipoo Where A & B are the number of species in samples A & B and C is the number of species shared by the two samples d.sor <- betadiver(sipoo, "sor") d.sor The ANOSIM (ANalysis Of SIMilarity) test is very useful for comparing among categorical variables (e.g. management techniques) anosim(ecology.bray, dune.env$Management) Hierarchical clustering analysis can provide a method for examining how similar sites are to each other. library(cluster) distmatrix <- vegdist(dune, method = 'bray') Cluster.1 <- agnes(distmatrix, method = 'single') This is an example of agglomerative clustering It starts by treating each site as a cluster of 1 The nearest two sites are joined to form a cluster. The process continues until a cluster has been created that contains all the sites Divisive hierarchical clustering starts at the top and splits that level. The splitting continues until each observation has been split out. Cluster.4 <- diana(distmatrix) plot(Cluster.4, which.plot = 2) Often species variation follows multiple gradients rather than just one, in which case ordination is typically more appropriate The clustering techniques so far shows the ecological distance among clusters rather than among all the sites. Cophenetic correlation is the correlation between the distance of the original distance matrix and the cophenetic distance (the distance at which clusters are combined in a dendrogram). If the cophenetic correlation is large, than the distance portrayed in the dendrogram is a good representation of distances between individual sites copheneticdist <- cophenetic(Cluster.1) mantel(distmatrix, copheneticdist) Cafeillo (Faramea occidentalis) is a small tree ranging from Cuba through Trinidad and central Mexico through Brazil Imagine that we want to investigate the abundance of this species on Barro Colorado Island (Panama) We hypothesize that abundance is affected by precipitation Image from http://www.fs.fed.us/global/iitf/pdf/shrubs /Faramea%20occidentalis.pdf library(BiodiversityR) data(faramea) What are the names of the variables? Plot counts of the species against precipitation Count.model1 <- lm(Faramea.occidentalis ~ Precipitation, data = faramea, na.action = na.exclude) summary(Count.model1) fitted(Count.model1) residuals(Count.model1) Residuals are the part of the data that weren’t modeled and so shouldn’t show any pattern Download and install the library akima library(akima) surface.gam <- residualssurface(Count.model1, na.omit(faramea), 'UTM.EW', 'UTM.NS', gam = T, npol = 2, plotit = T, bubble = F, fill = T) This creates a trend-surface that models the changes in the response variable (i.e. the residuals) Residuals are highest in the center, decreasing as you move away indicating a spatially- autocorrelated population The assumptions of a linear regression model aren’t met here. Generalized linear models (GLMs) fit a wider, more general class of models and can cope with situations where observations are not normally distributed GLMs are characterized by two functions ◦ The link function describes how the mean of the response variable depends on linear predictors (the explanatory variables) ◦ The variance function captures how the variance of the response variable depends upon the mean Count.model3 <- glm(Faramea.occidentalis ~ Precipitation, family = poisson, data=faramea, na.action=na.exclude) summary(Count.model3) sum(resid(Count.model3, type = "pearson") ^2, na.rm=TRUE) /41 If this is >1, it is overdispersed Count.model4 <- glm(Faramea.occidentalis ~ Precipitation, family = quasipoisson, data=faramea, na.action=na.exclude) summary(Count.model4) The negative binomial GLM is often used where dispersion is >1 or where organisms are clumped (and not randomly distributed) This is VERY useful in ecology! library(MASS) Count.model5 <- glm.nb(Faramea.occidentalis ~ Precipitation, maxit = 5000, init.theta = 1, data = faramea, na.action = na.exclude) summary(Count.model5) Generalized additive models (GAMs) are more general than GLMs. ◦ They are based on smoothing. ◦ Because of this smoothing, the relationships with the explanatory variables are no longer linear Let’s fit a negative binomial GAM to the tree data Count.model6 <- gam(Faramea.occidentalis ~ s(Precipitation), family = poisson(), data = na.omit(faramea)) summary(Count.model6) Let’s try including precipitation squared in the negative binomial GLM Count.model7 <- glm.nb(Faramea.occidentalis ~ Precipitation + I(Precipitation^2), maxit = 5000, init.theta = 1, data = faramea, na.action = na.exclude) summary(Count.model7) Species will often have unimodel distributions, so a 2nd-order polynomial model will often be used Does the age of the forest affect the presence/absence of Cafeillo? One of the easiest ways to begin exploring hypotheses about presence/absence data is with contingency tables table1 <- table(faramea$Faramea.occidentalis>0, faramea$Age.cat) table1 Presabs.1 <- chisq.test(table1) Presabs.1$observed Presabs.1$expected Presabs.1 Presabs.model2 <- glm(Faramea.occidentalis>0 ~ Age.cat, family = binomial(link = logit), data = faramea, na.action = na.exclude) summary(Presabs.model2) Because of the logit link, you would need to calculate out y = exp(x)/(1+exp(x)) to get the estimates for each site. Or you can type ◦ predict(Presabs.model2, type = 'response', se.fit = T) The model performance where presence depends upon forest age wasn’t very impressive Let’s try a quasibinomial model Presabs.model3 <- glm(Faramea.occidentalis>0 ~ Age.cat, family = quasibinomial(link=logit), data = faramea, na.action = na.exclude) summary(Presabs.model3) Earlier, we saw that precipitation and precipitation squared may have an effect on the numbers of Cafeillo. Other variables might potentially have an effect as well – geology, age of the forest, elevation, etc. Let’s see what’s important! Presabs.model6 <- glm(Faramea.occidentalis>0 ~ Precipitation + I(Precipitation^2) + Geology + Age.cat + Elevation + I(Elevation^2), family = binomial(link = logit), data = faramea, na.action = na.exclude) summary(Presabs.model6) Rather than having a model that includes all variables (and may overfit!), it may be worthwhile to figure out which variables you can remove to better balance simplicity (fewer variables) and explained deviance drop1(Presabs.model6, test = 'Chi') Presabs.model7 <- glm(Faramea.occidentalis>0 ~ Precipitation + I(Precipitation^2) + Geology + Age.cat + Elevation, family = binomial(link = logit), data = faramea, na.action = na.exclude) summary(Presabs.model7)