Analysis of differences in species composition - Biology by hcj

VIEWS: 5 PAGES: 64

									 1.) Alpha diversity –the diversity at a
  single point
 2.) Beta diversity – the diversity in a
  habitat
 3.) Gamma diversity – the diversity in
  a landscape
Organism      # found        pi            ln(pi)       pi*ln(pi)
Water boatman       1   0.047619048    -3.044522438   -0.144977259
Hempitera           1   0.047619048    -3.044522438   -0.144977259
Water strider       2   0.095238095    -2.351375257   -0.223940501
Bivalvia            8   0.380952381    -0.965080896   -0.367649865
Gastropoda          3   0.142857143    -1.945910149   -0.277987164
Shiner              5   0.238095238    -1.435084525   -0.341686792
Sunfish             1   0.047619048    -3.044522438   -0.144977259
                   21                                 -1.646196099
                                      H'               1.646196099
   Simpson’s Diversity Index is 1-D where




               Organism            n      n(n-1)
               Water boatman            1      0
               Hempitera                1      0
               Water strider            2      2
               Bivalvia                 8     56
               Gastropoda               3      6
               Shiner                   5     20
               Sunfish                  1      0
               Sum                    21      84
                                             0.2
                               Simpson's     0.8
   Diversity indices provide a summary of
    richness and evenness by combining these
    two facets of diversity into a single statistic.
   Common ones are Shannon-Weiner index,
    Simpson’s index and the log alpha index.
   library(BiodiversityR)
   data(dune)
   Diversity.1 <- diversityresult(dune, index =
    'Shannon', method = 's')
   Diversity.2 <- diversityresult(dune, index =
    'Simpson', method = 's')
   Diversity.3 <- diversityresult(dune, index =
    'Logalpha', method = 's')
   cbind(Diversity.1, Diversity.2, Diversity.3)
   Note that it is not
    statistically
    meaningful to use
    parametric or not
    parametric statistics
    to determine if sites
    differ “significantly”
    from each other in
    terms of diversity
                                                   n = 47 + 35 + 7 + 5 + 3
                                                  +2
 Michigan     fi       fi log fi      fi log2 fi   n = 99
    Oak      47    78.58859932     131.4078286
   Corn      35    54.04238155      83.4451144
Blackberry    7     5.91568628     4.999334881
  Beech       5    3.494850022     2.442795335
  Cherry      3    1.431363764     0.682934075
   Other      2    0.602059991     0.181238117
    Louisiana    fi      fi log fi      fi log2 fi
       Oak      48    80.69957939    135.6754607
       Pine     23    31.31974023    42.64896209
      Grape     11    11.45531954    11.92948597
      Corn      13    14.48126358    16.1313073
    Blueberry    8    7.224719896    6.524572197
      Other      2    0.602059991    0.181238117
                                                n = 48 + 23 + 11 + 13 +
                                               8+2
                                                n = 105

   Basic diversity indices are often used to
    measure alpha diversity
   The function diversity in the vegan library
    finds the most commonly used diversity
    indices.
   library(vegan)
   data(BCI)
   H <- diversity(BCI)
   H
    ◦ This will give you the Shannon-Weiner diversity
      index for each site.
   To measure Simpson’s index of diversity, just
    use
   H.simpson <- diversity(BCI, index =
    'simpson')
   H.simpson
   Beta diversity typically relies upon
    dimensionless metrics based on species
    dissimilarity or turnover between samples.
                                         S = # of specimens
                                          C = shared species


      A       B                      A          B
                   0.25

                               D            E
       C                              C

     Site 1                        Site 2
A         B        A             B

              D              E
C                     C




A A       B       A A        B

    A A                A A
C                 C
   For distances/dissimilarity indices, the
    smaller the value, the less the difference in
    species composition.
   Distance/dissimilarity matrices provide
    information on ecological
    distances/dissimilarities between all pairs of
    sites within your data.
   Under Euclidean distance, you can use each
    species as an axis.
   Constrained to be between 0 and 1
    ◦ Bray-Curtis
    ◦ Kulczynski
   Values can be larger than 1
    ◦ Euclidean distance
    ◦ Hellinger distance
    ◦ Chi-square distance
   Presence/absence
    ◦ Jaccard
    ◦ Sorenson
   Most of the measures of distance/dissimilarity
    are sensitive to the numbers of individuals
    encountered
   In many cases, it is worthwhile to transform the
    data into proportions
   Species proportions for each site are known as
    species profiles.
   library(BiodiversityR)
   data(dune)
   community.hel <- disttransform(dune, method =
    'hellinger')
   hellinger.distance <- vegdist(community.hel,
    method = 'euclidean')
   hellinger.distance
   You may want to test for a correlation among
    the values of an environmental gradient and
    ecological distance matrices.
   For this, you will use a Mantel test – a test of
    the correlation between two matrices
   data(dune.env)
   envir.euc <- vegdist(dune.env$A1, method =
    'euclidean')
   ecology.bray <- vegdist(dune, method =
    'bray')
   mantel(envir.euc, ecology.bray, 'pearson')
   The correlation is fairly low (r = 0.2379) but
    the results are significant (p = 0.042).
   The function betadiver in vegan estimates any
    of the 24 indices of beta diversity reviewed by
    Koleff et al. (2003)
   library(vegan)
   data(sipoo)
   What are the variables?
   ?sipoo
   Where A & B are the number of species in
    samples A & B and C is the number of species
    shared by the two samples
   d.sor <- betadiver(sipoo, "sor")
   d.sor
   The ANOSIM (ANalysis Of SIMilarity) test is
    very useful for comparing among categorical
    variables (e.g. management techniques)
   anosim(ecology.bray, dune.env$Management)
   Hierarchical clustering analysis can provide a
    method for examining how similar sites are
    to each other.
   library(cluster)
   distmatrix <- vegdist(dune, method = 'bray')
   Cluster.1 <- agnes(distmatrix, method =
    'single')
   This is an
    example of
    agglomerative
    clustering
   It starts by
    treating each site
    as a cluster of 1
   The nearest two
    sites are joined to
    form a cluster.
    The process
    continues until a
    cluster has been
    created that
    contains all the
    sites
   Divisive hierarchical clustering starts at the
    top and splits that level.
   The splitting continues until each observation
    has been split out.
   Cluster.4 <- diana(distmatrix)
   plot(Cluster.4, which.plot = 2)
   Often species
    variation
    follows multiple
    gradients rather
    than just one, in
    which case
    ordination is
    typically more
    appropriate
   The clustering techniques so far shows the
    ecological distance among clusters rather than
    among all the sites.
   Cophenetic correlation is the correlation between
    the distance of the original distance matrix and
    the cophenetic distance (the distance at which
    clusters are combined in a dendrogram).
   If the cophenetic correlation is large, than the
    distance portrayed in the dendrogram is a good
    representation of distances between individual
    sites
   copheneticdist <- cophenetic(Cluster.1)
   mantel(distmatrix, copheneticdist)
   Cafeillo (Faramea
    occidentalis) is a small
    tree ranging from Cuba
    through Trinidad and
    central Mexico through
    Brazil
   Imagine that we want to investigate the
    abundance of this species on Barro Colorado
    Island (Panama)
   We hypothesize that abundance is affected by
    precipitation

                            Image from
                            http://www.fs.fed.us/global/iitf/pdf/shrubs
                            /Faramea%20occidentalis.pdf
   library(BiodiversityR)
   data(faramea)
   What are the names of the variables?




   Plot counts of the species against
    precipitation
   Count.model1 <- lm(Faramea.occidentalis ~
    Precipitation, data = faramea, na.action =
    na.exclude)
   summary(Count.model1)
   fitted(Count.model1)
   residuals(Count.model1)
   Residuals are the part of the data that weren’t
    modeled and so shouldn’t show any pattern
   Download and install the library akima
   library(akima)
   surface.gam <-
    residualssurface(Count.model1,
    na.omit(faramea), 'UTM.EW', 'UTM.NS', gam =
    T, npol = 2, plotit = T, bubble = F, fill = T)
   This creates a trend-surface that models the
    changes in the response variable
    (i.e. the residuals)
   Residuals are
    highest in the
    center, decreasing
    as you move away
    indicating a
    spatially-
    autocorrelated
    population
   The assumptions
    of a linear
    regression model
    aren’t met here.
   Generalized linear models (GLMs) fit a wider,
    more general class of models and can cope
    with situations where observations are not
    normally distributed
   GLMs are characterized by two functions
    ◦ The link function describes how the mean of the
      response variable depends on linear predictors (the
      explanatory variables)
    ◦ The variance function captures how the variance of
      the response variable depends upon the mean
   Count.model3 <- glm(Faramea.occidentalis ~
    Precipitation, family = poisson,
    data=faramea, na.action=na.exclude)
   summary(Count.model3)
   sum(resid(Count.model3, type = "pearson")
    ^2, na.rm=TRUE) /41

   If this is >1, it is overdispersed
   Count.model4 <- glm(Faramea.occidentalis ~
    Precipitation, family = quasipoisson,
    data=faramea, na.action=na.exclude)
   summary(Count.model4)
   The negative binomial GLM is often used
    where dispersion is >1 or where organisms
    are clumped (and not randomly distributed)
   This is VERY useful in ecology!
   library(MASS)
   Count.model5 <-
    glm.nb(Faramea.occidentalis ~ Precipitation,
    maxit = 5000, init.theta = 1, data = faramea,
    na.action = na.exclude)
   summary(Count.model5)
   Generalized additive models (GAMs) are more
    general than GLMs.
    ◦ They are based on smoothing.
    ◦ Because of this smoothing, the relationships with
      the explanatory variables are no longer linear
   Let’s fit a negative binomial GAM to the tree
    data
   Count.model6 <- gam(Faramea.occidentalis
    ~ s(Precipitation), family = poisson(), data =
    na.omit(faramea))
   summary(Count.model6)
   Let’s try including precipitation squared in
    the negative binomial GLM
   Count.model7 <-
    glm.nb(Faramea.occidentalis ~ Precipitation +
    I(Precipitation^2), maxit = 5000, init.theta =
    1, data = faramea, na.action = na.exclude)
   summary(Count.model7)
Species will often have
unimodel distributions, so
a 2nd-order polynomial
model will often be used
   Does the age of the forest affect the
    presence/absence of Cafeillo?
   One of the easiest ways to begin exploring
    hypotheses about presence/absence data is
    with contingency tables
   table1 <-
    table(faramea$Faramea.occidentalis>0,
    faramea$Age.cat)
   table1
   Presabs.1 <- chisq.test(table1)
   Presabs.1$observed




   Presabs.1$expected


   Presabs.1
   Presabs.model2 <-
    glm(Faramea.occidentalis>0 ~ Age.cat, family
    = binomial(link = logit), data = faramea,
    na.action = na.exclude)
   summary(Presabs.model2)
   Because of the logit link, you would need to
    calculate out y = exp(x)/(1+exp(x)) to get the
    estimates for each site.
   Or you can type
    ◦ predict(Presabs.model2, type = 'response', se.fit =
     T)
   The model performance where presence
    depends upon forest age wasn’t very
    impressive
   Let’s try a quasibinomial model
   Presabs.model3 <-
    glm(Faramea.occidentalis>0 ~ Age.cat, family
    = quasibinomial(link=logit), data = faramea,
    na.action = na.exclude)
   summary(Presabs.model3)
   Earlier, we saw that precipitation and
    precipitation squared may have an effect on the
    numbers of Cafeillo.
   Other variables might potentially have an effect
    as well – geology, age of the forest, elevation,
    etc.
   Let’s see what’s important!
   Presabs.model6 <- glm(Faramea.occidentalis>0
    ~ Precipitation + I(Precipitation^2) + Geology +
    Age.cat + Elevation + I(Elevation^2), family =
    binomial(link = logit), data = faramea, na.action
    = na.exclude)
   summary(Presabs.model6)
   Rather than having a model that includes all
    variables (and may overfit!), it may be
    worthwhile to figure out which variables you
    can remove to better balance simplicity (fewer
    variables) and explained deviance
   drop1(Presabs.model6, test = 'Chi')
   Presabs.model7 <-
    glm(Faramea.occidentalis>0
    ~ Precipitation +
    I(Precipitation^2) + Geology
    + Age.cat + Elevation, family
    = binomial(link = logit), data
    = faramea, na.action =
    na.exclude)
   summary(Presabs.model7)

								
To top