Quantitative aspects of DNA barcode matching in ... - Barcode of Life

					Optimising DNA
barcode regions
      Freek T. Bakker
      Nationaal Herbarium Nederland, Wageningen University branch,
      Biosystematics Group, Wageningen UR
      The Netherlands
    Structure of this talk

1. DNA barcoding, CBOL, GenBank
2. Non-COI protocol
3. Models in DNA barcode matching
DNA barcoding

“Using molecular data as
species diagnostics isn’t
new, but global
standardization and scale of
implementation are”
 CBOL Structure
        Member Organizations

Secretariat        Executive
  Office           Committee

         Working          Scientific Advisory
         Groups                 Board
   Uses of DNA Barcodes
1. Establish reference library of barcodes from
   identified voucher specimens
2. If necessary, revise species limits
3. Then:
      Identify unknowns by searching against
      reference sequences
      Look for matches (mismatches) against ‘library
      on a chip’
      Before long: Analyze relative abundance in
      multi-species samples
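
A minimal sketch of the "identify unknowns by searching against reference sequences" step, assuming pre-aligned sequences of equal length and a plain p-distance as the match criterion. The function names (`p_distance`, `best_match`), the toy library and the species labels are hypothetical illustrations, not part of any CBOL tool.

```python
def p_distance(a, b):
    """Proportion of differing sites, counting only positions where both are A/C/G/T."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def best_match(query, library):
    """Return (species, distance) for the closest reference barcode."""
    scored = ((species, p_distance(query, seq)) for species, seq in library.items())
    return min(scored, key=lambda item: item[1])


# Hypothetical reference library: species label -> aligned barcode fragment.
library = {
    "species_A": "AGCTGACGTGGACGTA",
    "species_B": "AGCTGAGCTGGACGTA",
    "species_C": "AGCCGACGAGTACGTA",
}
query = "AGCTGACGTGGACGTT"  # hypothetical unknown specimen

species, dist = best_match(query, library)
print(species, dist)
```

A closest hit is only an identification if the distance falls within the intraspecific range of that species, so in practice the returned distance would be checked against a threshold.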
 Reference versus micro-
BARCODE reference records
– Adhere to data standards
– Bidirectional reads, 500+ bp long
– Linked to voucher, species name
Query barcode records
–   Used in BLAST or other searches
–   Often single pass reads
–   Often very short – 100+ bp for good IDs
–   Can cost less than $2, take less than 6 hours
DNA barcode default
The Consortium for the Barcode of Life (CBOL) has so far
accepted the 648 base-pair ‘Folmer region’ of COI
(mitochondrial encoded cytochrome oxidase 1) as the default
DNA barcode region for vertebrates and insects and promotes
its use in as many other clades as possible.

The International Nucleotide Sequence Database Collaboration
(INSDC, consisting of GenBank, the European Molecular
Biology Laboratory and the DNA Data Bank of Japan) has
adopted the data standards proposed by CBOL for BARCODE
data records, and has empowered CBOL to decide which gene
regions can be given BARCODE status.
 CBOL and GenBank
[Diagram: CBOL and GenBank, with the labels "New CO1" and "Data standards".]

How many DNA barcodes do
we need, or, what’s ahead?
 1.7 × 10^6 described species
 10 barcodes per species
 ~20 × 10^6 barcodes of 650 bp each
 10 × 10^6 more eukaryote species to go
 ~100 × 10^6 more barcodes of 650 bp each
 In total this would be ~65,000,000,000 bp
 This is twice the total amount of bp currently in
 To be completed within the decade
                                         (Hajibabaei & al., 2005)
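
The arithmetic behind these estimates can be checked with a short calculation. This is only a sketch using the figures quoted on the slide; the variable names are illustrative.

```python
described_species = 1_700_000       # 1.7 x 10^6 described species
undescribed_species = 10_000_000    # 10 x 10^6 more eukaryote species
barcodes_per_species = 10
barcode_length_bp = 650

# 17 million barcodes, which the slide rounds to ~20 million.
barcodes_described = described_species * barcodes_per_species

# 100 million barcodes for the species still to be described.
barcodes_undescribed = undescribed_species * barcodes_per_species

# 65 billion bp: this matches the ~65,000,000,000 bp quoted on the slide.
bp_undescribed = barcodes_undescribed * barcode_length_bp

# Counting the described species as well gives a somewhat larger total.
bp_all = (barcodes_described + barcodes_undescribed) * barcode_length_bp

print(barcodes_described, barcodes_undescribed, bp_undescribed, bp_all)
```

The quoted total therefore corresponds to the 100 × 10^6 barcodes for undescribed species alone.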
Optimal DNA barcodes
Barcoding gap: high inter-specific, low intra-
specific sequence divergence
Universal amplification/sequencing with
standard primers
Technically simple to sequence
Short enough to sequence in one reaction
Easily alignable (few insertions/deletions)
Readily recoverable from museum or herbarium
samples and other degraded samples


[Figure: "Resolving Species Through DNA Barcoding": COI divergence in eukaryotes. Recovered labels: 20 % aa divergence; Land Plants.]
 CBOL and GenBank
[Diagram: CBOL and GenBank, with the label "Data standards".]

non-COI barcode regions
 COI alone will not do
 mtDNA evolution too variable across major
 Other faults (e.g. heteroplasmy, introgression,
 COI not present – e.g. Rubinoff & al. 2007)
 rDNA: ITS, D3/D4, cpDNA: rpoC1, rpoB, matK
 Multiple barcodes
CBoL’s non-CO1 protocol
 Protocol, to be used as guideline, available now
 Reject CO1 as suitable region for clade of interest
 Propose alternative region based on required
 evidence as documented
 Barcode gap?
 NJ tree
 Multiple regions?
The DNA barcode gap

From Meyer & al. PLoS Biology 2004   From Hebert & al. PLoS Biology 2004
DNA barcode gap

           From Van Velzen & al. NEV 2007
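DNA barcode gap
Barcode gap: the difference between minimum inter- and maximum intraspecific divergence.
However, in ‘paraphyletically’ clustered barcodes: intra > inter divergence!
[Figure: minimum divergence between species pairs (axis up to 20%) plotted against maximum intraspecific divergence (%). Points above the 1:1 line fall in the "Barcode Gap" region, points below it in the "No Barcode Gap" region. Shapes indicate species pair comparisons; colors indicate gene regions.]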
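
The gap test can be sketched as follows, assuming pairwise distances are already available. The function name `barcode_gaps`, the specimen identifiers and the distance values are all made up for illustration.

```python
from itertools import combinations


def barcode_gaps(dist, species_of):
    """For each species return (max_intra, min_inter, has_gap).

    dist: dict mapping frozenset({id1, id2}) -> pairwise distance
    species_of: dict mapping specimen id -> species name
    """
    result = {}
    for species in sorted(set(species_of.values())):
        members = [s for s in species_of if species_of[s] == species]
        others = [s for s in species_of if species_of[s] != species]
        intra = [dist[frozenset(p)] for p in combinations(members, 2)]
        inter = [dist[frozenset((a, b))] for a in members for b in others]
        max_intra = max(intra, default=0.0)
        min_inter = min(inter, default=float("inf"))
        # A gap exists when the nearest other species is further away
        # than the most divergent conspecific pair.
        result[species] = (max_intra, min_inter, min_inter > max_intra)
    return result


# Hypothetical specimens and distances (proportion of differing sites).
species_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
dist = {
    frozenset(("a1", "a2")): 0.01,
    frozenset(("b1", "b2")): 0.06,
    frozenset(("a1", "b1")): 0.05,
    frozenset(("a1", "b2")): 0.08,
    frozenset(("a2", "b1")): 0.04,
    frozenset(("a2", "b2")): 0.07,
}

gaps = barcode_gaps(dist, species_of)
print(gaps)
```

In this toy data species A shows a gap, while species B does not because its two specimens differ more from each other than one of them does from species A.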
Agave (Agavaceae) rpoB (Cowan & al.) and Tortella (Bryophyta) rpoC1 (Hedderson & al.)
[Figure: two trees. Left: Agave and Furcraea accessions for rpoB, scale bar 0.1 changes. Right: Tortella and Pleurochaete accessions with Hampeella pallens for rpoC1, scale bar 0.005 substitutions/site.]
      Rejection of CO1
Reject CO1 as suitable region for clade of interest
Propose alternative region based on required
evidence, i.e.
  Pattern of intra- and interspecific variation
  Resolving power

  Universality

Document the number of primer pairs needed to
successfully PCR amplify & identify species
throughout the clade of interest
Protocols will be adopted for a period of 6 months during which
CBOL is open to suggestions for their improvement from the
CBOL will normally expect publication of evidence for
effectiveness of proposed non-COI barcode region(s) in a peer-
reviewed publication prior to submission of a proposal
Prior peer review and publication will support the proposal’s
claims and will inform the community of the proposed barcode
Upon approval by CBOL’s Executive Committee, INSDC will be
informed immediately and BARCODE status can be given
Is effectiveness of DNA barcode jeopardized by
using parameter-poor models?
Is NJ too crude to provide correct matches
between closely related barcodes?
How will non-coding DNA sequences perform
when matching ‘unknowns’?
Do we need Bayesian matching for critical
species? (PP’s on match, Priors to express uncertainty
on population parameters)
Is matching of multiple barcodes a special case?
DNA barcode matching
Character-based for closely related barcodes?
Phylogenetic clustering
Distance-based matching: what models?
 – Low divergence → few parameters (JC, K2P)

 – Codon models?

 – Composite barcodes → composite models?

 – Non-coding regions: length-variation

Pragmatism: large reference libraries, speed
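
For reference, the two few-parameter distances can be computed as below. This is a sketch of the standard Jukes-Cantor and Kimura two-parameter formulas for a pair of aligned sequences; the function name and the example sequences are illustrative only.

```python
import math

PURINES = {"A", "G"}


def pairwise_distances(a, b):
    """Return (p, JC, K2P) distances for two aligned sequences."""
    n = ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if x != y:
            if (x in PURINES) == (y in PURINES):
                ts += 1  # transition: purine<->purine or pyrimidine<->pyrimidine
            else:
                tv += 1  # transversion
    if n == 0:
        raise ValueError("no comparable sites")
    p = (ts + tv) / n
    P, Q = ts / n, tv / n
    # Both corrections are undefined for saturated sequences (log of <= 0).
    jc = -0.75 * math.log(1 - 4 * p / 3)
    k2p = -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))
    return p, jc, k2p


seq1 = "AAAAAAAAAACCCCCCCCCC"
seq2 = "GAAAAAAAAATCCCCCCCCC"  # two transitions, no transversions
p, jc, k2p = pairwise_distances(seq1, seq2)
print(p, jc, k2p)
```

At low divergence the three values stay close to each other, which is the usual argument for using few-parameter models on closely related barcodes.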
 DNA barcode models
Simulate DNA barcodes using parameter-rich
models, derived from insect COI and from
cpDNA atpB data (GTR, c113)
100 replicates of simulated data sets: 60 barcode
sequences of 654nt
Distance: models simple → complex
NJ clustering of resulting distances
Semistrict consensus of 100 NJ trees
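
The replicate-generation idea can be illustrated with a much simpler stand-in. The sketch below evolves one 654-nt ancestor along a single branch under a K2P model, not the parameter-rich model or the 60-taxon tree used in the talk; the branch length, kappa value, seed and function names are assumptions chosen for illustration.

```python
import math
import random

TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k2p_probs(d, kappa):
    """Return (p_same, p_transition, p_each_transversion).

    d is the branch length in expected substitutions per site;
    kappa is the transition rate divided by the rate of each transversion.
    """
    beta_t = d / (kappa + 2)
    alpha_t = kappa * beta_t
    e1 = math.exp(-4 * beta_t)
    e2 = math.exp(-2 * (alpha_t + beta_t))
    p_same = 0.25 + 0.25 * e1 + 0.5 * e2
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_same, p_ts, p_tv


def evolve(seq, d, kappa, rng):
    """Evolve a sequence along one branch, each site independently."""
    p_same, p_ts, _ = k2p_probs(d, kappa)
    out = []
    for base in seq:
        u = rng.random()
        if u < p_same:
            out.append(base)
        elif u < p_same + p_ts:
            out.append(TRANSITION[base])
        else:
            out.append(rng.choice(TRANSVERSIONS[base]))
    return "".join(out)


rng = random.Random(1)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice("ACGT") for _ in range(654))
replicates = [evolve(ancestor, 0.05, 2.0, rng) for _ in range(100)]
print(len(replicates), len(replicates[0]))
```

A full reproduction would simulate down every branch of the model tree with the fitted parameter values, which is what a dedicated simulator is for.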
      DNA barcode models
[Diagram: model tree (TM) → simulation under a rich model, once for cpDNA atpB and once for mtDNA COI, giving 100 simulated alignments each (e.g. AGCTGACGTGGACGTA, AGCTGAGCTGGACGTA, GGCTGAGCTGGACGTA, AGCCGACGAGTACGTA).
 NJ (poor model) → 100 NJ trees → NJatpB r/p and NJCOI r/p.
 NJ (rich model) → 100 NJ trees → NJatpB r/r and NJCOI r/r.]
 DNA barcode models
Findmodel (Los Alamos National Lab.): best-fitting
model for Lepidopteran COI data set
MrBayes/Tracer: model parameter values
Simulation tree: angiosperm species-level
phylogenetic tree topology (not ultrametric)
Seq-Gen: simulate 100 reps., 654nt×60 seqs.
PAUP*: NJ and consensus analysis
TreeView: tree interpretation
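
To make the NJ step concrete, here is a compact pure-Python sketch of neighbour joining on a distance matrix. It is not PAUP*'s implementation: the function name, the internal node labels and the four-taxon example matrix are all hypothetical, and ties are broken by input order.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Return a Newick string built by neighbour joining."""
    labels = list(names)
    d = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            d[a, b] = matrix[i][j]
    newick = {name: name for name in names}
    counter = 0
    while len(labels) > 2:
        n = len(labels)
        # Net divergence of each active node.
        r = {a: sum(d[a, b] for b in labels if b != a) for a in labels}
        # Pick the pair minimising the Q criterion.
        a, b = min(combinations(labels, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        len_a = 0.5 * d[a, b] + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        u = f"_node{counter}"
        counter += 1
        newick[u] = f"({newick[a]}:{len_a:.4f},{newick[b]}:{len_b:.4f})"
        # Distances from the new node to every remaining node.
        for c in labels:
            if c not in (a, b):
                d[u, c] = d[c, u] = 0.5 * (d[a, c] + d[b, c] - d[a, b])
        labels = [c for c in labels if c not in (a, b)] + [u]
    a, b = labels
    return f"({newick[a]}:{d[a, b]:.4f},{newick[b]});"


# Hypothetical additive distances for the tree ((A:1,B:2):5,(C:3,D:4)).
names = ["A", "B", "C", "D"]
matrix = [
    [0, 3, 9, 10],
    [3, 0, 10, 11],
    [9, 10, 0, 7],
    [10, 11, 7, 0],
]
tree = neighbor_joining(names, matrix)
print(tree)
```

Because the example distances are exactly additive, the sketch recovers both cherries and their branch lengths. With estimated distances the result depends on the distance model, which is the comparison the simulations are set up to make.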
    cpDNA atpB model
Base composition   Relative subst. rates
    mtDNA COI model
Base composition   Relative subst. rates
       atpB vs. COI models
      Base composition   Relative subst. rates


[Figure: three tree panels sharing the same taxon labels (e.g. peltatum146, caylae210, boranenseZZZu, outtaxon): the model tree, the reconstructed NJ tree under K2P distances, and the reconstructed NJ tree under GTR distances.]
[Figure, labelled COI: the model tree alongside two reconstructed trees with the same taxon labels.]

                                   rapaceum130                                                    undulatumTT                              alpinum119

Model tree
                                      fissifolium129                                              petroseleniTT                            grandiflorum26
                                       asarifoliumTT                                              asarifoliumTT                            althaeoides75
                                        petroseleniTT                                             fissifolium129                           coronopifolium111
                                                undulatumTT                                       leptumTT                                 denticulatum215

                                                                          NJ tree K2P                                      NJ tree GTR
                                            auritum126                                            auritum126                               quercifolium218
                                                                                                  luteolum67                               lanceolatum135

                                           luteolum67                                             leipoldtiiTT                             laevigatum217
                                          leipoldtiiTT                                                                                     hermanniifolium225
   0.1                                                                                                                                     citronellum31
Parameter-rich models not efficient in
reconstructing ‘parameter-rich patterns’?
Parameter-poor models do better
Artefact of pairwise comparison?

Various shapes & branch lengths
Different base-composition across tree
Different omega rates across tree
Codon models?
Non-COI barcode regions will be needed and are
proposed through CBOL protocol
CBOL approval → adoption by INSDC

NJ/K2P sufficient for performance testing of
proposed region
Character-based DNA barcode matching needed
for closely related barcodes
Multiple barcodes: matched simultaneously
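
As a rough illustration of the NJ/K2P point above, the sketch below computes the Kimura 2-parameter (K2P) distance between two aligned barcode sequences and picks the closest reference. It is a minimal sketch, not the protocol used in the talk: the function names, the reference labels and the toy sequences are hypothetical, sites with gaps or ambiguity codes are simply skipped, and tree building (NJ) is left out. With P the proportion of transitions and Q the proportion of transversions, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the pair is too divergent (saturated)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def best_match(query, reference):
    """Return the label of the reference barcode closest to the query."""
    return min(reference, key=lambda name: k2p_distance(query, reference[name]))


# Hypothetical toy reference library (labels and sequences are made up)
reference = {
    "species_1": "AAAAAAAAAC",
    "species_2": "AACCAAAAAA",
}
closest = best_match("AAAAAAAAAA", reference)
```

A distance-only match like this says nothing about diagnostic characters, which is why the slide calls for character-based matching when barcodes are closely related.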
