					Predicting food web connectivity
Phylogenetic scope, evidence thresholds, and
intelligent agents




         Cynthia Sims Parr
         Ecological Society of America
         Memphis, TN August 8, 2006
ELVIS: Ecosystem Localization,
Visualization, and Information System
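[Diagram: a species list constructor produces a species list (Bacteria, Microprotozoa, Amphithoe longimana, Caprella penantis, Cymadusa compta, Lembos rectangularis, Batea catharinensis, Ostracoda, Melanitta, Tadorna tadorna . . .), which is passed to the food web constructor, whose output is shown as "?". Featured species: Oreochromis niloticus (Nile tilapia).]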
ELVIS’s Food Web Constructor
predicts basic network structure




 Prelude to systems models
[Diagram: a food web, in which nodes (e.g., S and G) are joined by links, shown beside an evolutionary tree, in which taxa (A, G, S) are separated by steps.]
     Evolutionary Distance Weighting
1.   Set distance thresholds
2.   Find relatives of target nodes X, Y with known link status
     E.g. relative A is close to X, relative B close to Y
     where Link Value between A and B is known
3.   For each found link, compute weight based on distance
      \[ \mathrm{Weight}_{AB} = \frac{1}{1 + (\mathrm{Distance}_{XA} \times \mathrm{Penalty}_{XA}) + (\mathrm{Distance}_{YB} \times \mathrm{Penalty}_{YB})} \]
4.   Compute certainty index for a predicted link by combining
     weighted link values, with a discount for negative
     evidence
      \[ \mathrm{CertaintyIdx}_{XY} = \sum_{i=1}^{N} \frac{\mathrm{weight}_i}{\mathrm{discount}} \, (\mathrm{LinkValue}_i) \]
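Steps 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not the Food Web Constructor's own code. It assumes that distance and penalty are multiplied and the two terms summed in the denominator, that link values are +1 for an observed link and -1 for an observed absence, and that the discount divides only the weights of negative evidence. The names `Evidence`, `weight`, and `certainty_index` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """One known link status between relative A (of X) and relative B (of Y)."""
    distance_xa: float  # steps between target X and its relative A
    penalty_xa: float   # direction penalty for the X-to-A path
    distance_yb: float  # steps between target Y and its relative B
    penalty_yb: float   # direction penalty for the Y-to-B path
    link_value: int     # assumed coding: +1 = link observed, -1 = link absent


def weight(e: Evidence) -> float:
    # Step 3: evidence from closer relatives gets a larger weight.
    return 1.0 / (1.0 + e.distance_xa * e.penalty_xa + e.distance_yb * e.penalty_yb)


def certainty_index(evidence, discount=1.0):
    # Step 4: sum the weighted link values. Negative evidence is divided by
    # the discount (an assumed reading of "discount for negative evidence").
    total = 0.0
    for e in evidence:
        w = weight(e)
        if e.link_value < 0:
            w /= discount
        total += w * e.link_value
    return total


evidence = [
    Evidence(0, 1.0, 0, 1.0, +1),  # exact match: link observed
    Evidence(1, 1.0, 1, 1.0, -1),  # one step away on both sides: link absent
]
print(certainty_index(evidence, discount=2.0))
```

In this toy case the exact match contributes 1, and the negative record contributes -(1/3)/2, so the index is 5/6. A larger discount makes absences count for less.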
Food web database

         Source          Webs     Nodes        Links

 Animal Diversity Web       n/a       711          2165
 EcoWEB                    212       4503         11967
 Webs on the Web            19       1373         12056
 Interaction Web DB         26       2139          9882
 Tuesday Lake                 2       101           510
           Total           259       8827         36580
                                    4600 distinct taxa

Food web data: Cohen 1989, Dunne et al. 2006, Vazquez 2006, Jonsson et al. 2005
Evolutionary tree: Parr et al. 2004, plus plants from ITIS and a hierarchy of non-taxonomic nodes
    Testing the algorithm
    Take each web out of the database
    Attempt to predict its links
    Compare prediction with actual data

Accuracy (percentage of all predictions that are correct): 89%
Precision (percentage of predicted links that are correct): 55%
Recall (percentage of actual links that are predicted): 47%
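The leave-one-out test and the three measures can be sketched in Python as below. This is an illustration under stated assumptions, not the code used for the reported numbers: links are taken to be (predator, prey) pairs, every ordered pair of nodes in a web counts as one prediction, and `evaluate`, `leave_one_out`, and the `predict` callback are hypothetical names.

```python
def evaluate(predicted, actual, candidates):
    """Score predicted links against actual links over all candidate pairs."""
    tp = len(predicted & actual)          # links predicted and present
    fp = len(predicted - actual)          # links predicted but absent
    fn = len(actual - predicted)          # links present but not predicted
    tn = len(candidates) - tp - fp - fn   # absent links correctly left out
    accuracy = (tp + tn) / len(candidates)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def leave_one_out(webs, predict):
    """webs maps a name to (nodes, links); predict(nodes, training) returns links."""
    scores = {}
    for name, (nodes, links) in webs.items():
        # Hold the web out, then try to predict it from all the others.
        training = {k: v for k, v in webs.items() if k != name}
        candidates = {(p, q) for p in nodes for q in nodes}
        scores[name] = evaluate(predict(nodes, training), links, candidates)
    return scores


# Toy check with three nodes.
nodes = {"a", "b", "c"}
candidates = {(p, q) for p in nodes for q in nodes}
actual = {("a", "b"), ("b", "c")}
predicted = {("a", "b"), ("a", "c")}
print(evaluate(predicted, actual, candidates))
```

In the toy check, one of the two predicted links is correct and one of the two actual links is found, so precision and recall are both 0.5. Accuracy is 7/9 because six correctly omitted links also count as correct predictions, which is one way accuracy can sit well above the other two measures.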
Choosing parameters

   30 web subsample
   Representative of habitats, years, # nodes,
    percent identified to species
   Iterate over parameter settings
   Tradeoff between
      Precision: percentage of predicted links that are correct
      Recall: percentage of actual links that are predicted
        Evolutionary distance threshold
        2 steps up and 4 steps down

[Figure: two panels plotting precision (left, axis 0.4 to 0.6) and recall (right, axis 0.3 to 0.5) against steps up (1 to 4), with one series per steps-down setting (S1 to S4).]
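One way to read a threshold of "2 steps up and 4 steps down" is sketched below in Python: climb at most `max_up` steps from the target toward the root, and from each taxon reached descend at most `max_down` steps. This reading, the child-to-parent dictionary, and the function name `relatives` are assumptions made for illustration; the taxa in the toy tree are placeholders.

```python
from collections import defaultdict


def relatives(parent, target, max_up=2, max_down=4):
    """Return {taxon: (steps_up, steps_down)} for taxa within the thresholds."""
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    found = {}
    node, up = target, 0
    while node is not None and up <= max_up:
        # Breadth-first descent from the current ancestor (or the target itself).
        frontier = [(node, 0)]
        while frontier:
            taxon, down = frontier.pop(0)
            if taxon != target and taxon not in found:
                found[taxon] = (up, down)  # first hit uses the fewest steps up
            if down < max_down:
                frontier.extend((c, down + 1) for c in children[taxon])
        node, up = parent.get(node), up + 1
    return found


# Toy tree, child -> parent. O is the root.
parent = {"X": "G1", "A": "G1", "G1": "F", "G2": "F", "B": "G2", "F": "O"}
print(relatives(parent, "X", max_up=2, max_down=2))
```

Here A is a sibling of X (one step up, one down) and B is a cousin (two up, two down). The root O is three steps up and so falls outside the threshold.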
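Evolutionary direction penalty
not very sensitive

\[ \mathrm{Weight}_{AB} = \frac{1}{1 + (\mathrm{Distance}_{XA} \times \mathrm{Penalty}_{XA}) + (\mathrm{Distance}_{YB} \times \mathrm{Penalty}_{YB})} \]

[Diagram: a relative can be an ancestor, a descendant, or a sibling of the target taxon.]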
Negative evidence discount is sensitive

\[ \mathrm{CertaintyIdx}_{XY} = \sum_{i=1}^{N} \frac{\mathrm{weight}_i}{\mathrm{discount}} \, (\mathrm{LinkValue}_i) \]

[Figure: recall and precision (y-axis, 0.2 to 0.7) plotted against the negative evidence discount (x-axis, 0 to 100).]
Results over all webs
Is evolutionary distance weighting
better than strict database search?
[Figure: bar chart of accuracy, precision, and recall (%, 0 to 100) for database search versus evolutionary distance weighting. Paired t-tests, df = 251; all three comparisons are marked *** (p < 0.001).]
 Database search is more precise, but
 evolutionary distance weighting has better recall.
Older webs contribute

Recall (percentage of actual links that are predicted):
   47% overall; 48% with no EcoWEB data

Precision (percentage of predicted links that are correct):
   55% overall; 39% with no EcoWEB data
[Figure: recent webs are bigger. Number of taxa (0 to 180) versus year of study (1910 to 2010).]
[Figure: large webs have fewer unknown “taxa”. % taxa unknown (0 to 90) versus number of taxa (0 to 200).]
[Figure: large webs have better taxonomic resolution. % identified to species versus number of taxa (0 to 200).]
[Figure: …but large webs are harder to predict. Recall rate versus number of taxa (0 to 200).]
Some phyla are easier to predict
than others
[Figure: bar chart of recall rate (0 to 0.8) by phylum: Annelida, Arthropoda, Bacillariophyta, Chordata, Mollusca.]
How can we do better at predicting links?

Trait space distance weighting

Euclidean distance in natural history
N-space

Parameterize functions from the literature that might predict
links using characteristics of taxa. For example, size or
stoichiometry.
  LinkStatusAB= ƒ(α, sizeA, sizeB), ƒ(β, stoichA, stoichB) …


…need more data
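A minimal Python sketch of Euclidean distance in a natural history trait space follows. The trait names, the trait values, and the weight formula (borrowed by analogy from the evolutionary distance weight) are all assumptions for illustration; none of them comes from the SPIRE data. In practice the traits would probably need to be put on comparable scales first.

```python
import math


def trait_distance(a, b, traits):
    """Euclidean distance between two taxa over the named traits."""
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in traits))


def trait_weight(a, b, traits):
    # Assumed by analogy with the evolutionary weight: closer taxa count more.
    return 1.0 / (1.0 + trait_distance(a, b, traits))


# Hypothetical taxa described by log10 body mass and a C:N ratio.
taxon_p = {"log_mass": 0.0, "cn_ratio": 4.0}
taxon_q = {"log_mass": 3.0, "cn_ratio": 8.0}
traits = ["log_mass", "cn_ratio"]

print(trait_distance(taxon_p, taxon_q, traits))
print(trait_weight(taxon_p, taxon_q, traits))
```

Adding a trait adds a dimension to the space, so the same function covers size alone, stoichiometry alone, or any combination for which data exist.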
ETHAN
Evolutionary Trees and Natural History ontology
                Animal Diversity Web
                http://www.animaldiversity.org
                   geographic range
                   habitats
                   physical description
                   reproduction
                   lifespan
                   behavior and trophic info
                   conservation status




                Triples
                “Esox lucius” hasMaxMass “1.4 kg”
                “Esox lucius” isSubclassOf “Esox”
                “Esox” eats “Actinopterygii”
UMBC Triple Shop
Query
What are body masses of fishes that eat fishes?

Enter a SPARQL query
SELECT DISTINCT ?predator ?prey ?preymaxmass ?predatormaxmass
WHERE {
     ?link rdf:type spec:ConfirmedFoodWebLink .
     ?link spec:predator ?predator .
     ?link spec:prey ?prey .
     ?predator rdfs:subClassOf ethan:Actinopterygii .
     ?prey rdfs:subClassOf ethan:Actinopterygii .
     OPTIONAL { ?predator kw:mass_kg_high ?predatormaxmass } .
     OPTIONAL { ?prey kw:mass_kg_high ?preymaxmass }
}


                     . . . leaving out the FROM clause
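To show what the query asks for, the Python sketch below evaluates the same pattern over a small in-memory set of triples. It is not how Triple Shop works internally. The "Esox lucius" mass and the subclass chain follow the ETHAN triples shown earlier; the prey taxon, the link record, and the shortened predicate names are illustrative assumptions.

```python
triples = {
    ("Esox lucius", "subClassOf", "Esox"),
    ("Esox", "subClassOf", "Actinopterygii"),
    ("Esox lucius", "mass_kg_high", 1.4),
    ("Perca flavescens", "subClassOf", "Actinopterygii"),  # illustrative prey
    ("link1", "type", "ConfirmedFoodWebLink"),             # illustrative link
    ("link1", "predator", "Esox lucius"),
    ("link1", "prey", "Perca flavescens"),
}


def objects(subject, predicate):
    return {o for (s, p, o) in triples if s == subject and p == predicate}


def is_subclass(taxon, ancestor):
    # Follow subClassOf transitively, standing in for semantic reasoning.
    seen, stack = set(), [taxon]
    while stack:
        for parent in objects(stack.pop(), "subClassOf"):
            if parent == ancestor:
                return True
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def fish_eating_fish():
    rows = []
    links = sorted(s for (s, p, o) in triples
                   if p == "type" and o == "ConfirmedFoodWebLink")
    for link in links:
        for predator in objects(link, "predator"):
            for prey in objects(link, "prey"):
                if (is_subclass(predator, "Actinopterygii")
                        and is_subclass(prey, "Actinopterygii")):
                    # Like OPTIONAL: a missing mass gives None, not a dropped row.
                    pred_mass = next(iter(objects(predator, "mass_kg_high")), None)
                    prey_mass = next(iter(objects(prey, "mass_kg_high")), None)
                    rows.append((predator, prey, prey_mass, pred_mass))
    return rows


print(fish_eating_fish())
```

The single result row pairs "Esox lucius" with its prey and reports 1.4 for the predator mass and `None` for the prey mass, since no mass triple exists for the prey. "Esox lucius" qualifies as a fish only through the two-step subclass chain, which is why some form of reasoning is needed.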
UMBC Triple Shop
Create a dataset
Find semantic web docs that can answer query.


            Esox_lucius.owl
                 webs_publisher.php?
                 published_study=11

                         Actinopterygii.owl




                          http://swoogle.umbc.edu
UMBC Triple Shop
Get results
Apply query to dataset with semantic reasoning.




          http://sparql.cs.umbc.edu/tripleshop2/
    Summary
   Food Web Constructor uses evolutionary
    approach and large databases
   We chose parameters using subsample
   Explored results over entire database
       Evolutionary distance weighting recalls links better
        than database search
       Older webs are useful
       Large webs harder to predict
       Some phyla are easier than others to predict
   For future algorithms, we can gather and
    integrate data via ontologies and intelligent
    agents
http://spire.umbc.edu
UMBC: Tim Finin, Joel Sachs, Andriy Parafiynyk, Li
Ding, Rong Pan, Lushan Han, UMCP: David Wang,
RMBL: Neo Martinez, Rich Williams, Jennifer
Dunne, UC Davis: Jim Quinn, Allan Hollander

UMMZ Animal Diversity Web: Phil Myers, Roger
Espinosa

UMCP: Bill Fagan, Bongshin Lee, Ben Bederson
   ETHAN workflow
[Diagram: workflow that converts source data to OWL. Sources: ADW database (MySQL), ITIS, plants etc., and others. Intermediate items: filters, XSLT template, SPIRE taxon database (MySQL), animal name tree, account data as tabular text, ADW taxon account HTML, keywords HTML. OWL outputs: ETHAN taxon account, keywords, taxon path, evolutionary tree side of ontology, phylum-sized ET chunk.]
Semantic Prototypes In Ecoinformatics
[Diagram: project partners and components. Partners: UMBC, U Maryland, UC Davis, NASA Goddard, Rocky Mtn Bio Lab. Components: Info. Retrieval Agents, Food Web Constructor, Evidence Provider, Semantic Web Tools, Species List constructor, Invasive Species Forecasting System, Remote Sensing Data, Food Webs, Ecological Interaction Ontologies.]
  Food Web Constructor example
  Nile Tilapia in St. Marks
                              http://spire.umbc.edu/fwc
Question
What are potential predators and prey of Oreochromis niloticus in the St. Marks estuary in Florida?
Procedure
Submit species list for St. Marks, with Oreochromis niloticus added.
Food Web Constructor generates possible links.
Evidence provider gives details.
Nile tilapia – what organisms
     could be impacted?
    Implications:
    parameterized functions

LinkPredictedCD = ƒ(α, sizeC, sizeD) + ƒ(β, stoichC, stoichD)


       Requires good data for target species
       Can incrementally add natural history functions to
        get better estimate, try different functions from
        literature or use genetic algorithms
       Parameterizing functions: multivariate statistics,
        machine learning, fuzzy inference
       Could use evolutionary info if you localize
        parameter estimates to clades or taxonomic
        subsets
    Distance weighting options

Evolutionary
   Uses phylogeny or classification or a combination of these; assumes related organisms are like each other
   Distance could be branch length or # steps
   Does not need natural history data
[Diagram: tree with taxa X and Y and a distance of 2 steps marked.]
  Ontologies
  Richer way to design databases: instances of
  concepts that have well-defined meanings and
  formal relationships.
[Diagram: TaxonA and TaxonB are each linked to HigherTaxon by is-a. “Higher Taxon” lives in “Australia”, so “Taxon A” lives in “Australia” and “Taxon B” lives in “Australia”. “Taxon A” hasAgeOfSexualMaturity “1 year” and hasBreedingDuration “5 months”. Breeding Season and Sexual maturity are each linked to Reproductive Characteristic by is-a; Breeding Season has-a Breeding Duration; Sexual maturity has-a Age of Sexual Maturity.]

				