PowerPoint Presentation - Edwards _ SDSU

Document Sample
PowerPoint Presentation - Edwards _ SDSU Powered By Docstoc
					                                        UBC, Vancouver, June 2006


THE GLOBAL MARINE VIRIOME

               Rob Edwards


               Dept. of Biology, SDSU
  Computational Sciences Research Center, SDSU
      Center for Microbial Sciences, San Diego
Fellowship for Interpretation of Genomes, Chicago, IL
The Burnham Inst. for Medical Research, San Diego
                IMEC, LLC, San Diego
                           Outline

• Forget DGGE, just sequence it
    – (Fabulous four-five-four for facile functional findings)
•   Functional analysis is a blast
•   Is community structure antiestablishment?
•   Are there viruses in the ocean?
•   Why people suck
•   Why we’re screwed
                                     Metagenomics

• Starting material
   – 200 liters water
   – 5-500 g fresh fecal matter
• Concentrate and purify viruses
   – Epifluorescent microscopy
• Extract nucleic acids
• DNA/RNA LASL (linker-amplified shotgun library)
• Sequence

Breitbart et al., multiple papers
               Pyrosequencing

• Whole genome amplification: 5-100 ng DNA to 2-5 µg DNA

 www.454.com
           454 Sequence Data
  (Only from Rohwer Lab, in one year)
• 42 libraries
   – 22 microbial, 20 phage
• 1,028,563,420 bp total
   – 33% of the human genome
   – 95% of all complete and partial bacterial genomes
   – 10% of community sequencing of JGI per year
• 9,933,184 sequences
   – Average 236,511 per library
• Average read length 103.5 bp
   – Av. read length has not increased in 12 months
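
The totals above are easy to cross-check. A minimal Python sketch, using only the figures on the slide plus one outside assumption (a human genome of roughly 3.1 Gbp, which is not stated in the talk):

```python
# Figures quoted on the slide
TOTAL_BP = 1_028_563_420
TOTAL_READS = 9_933_184

# Assumption (not from the slide): human genome of roughly 3.1 Gbp
HUMAN_GENOME_BP = 3_100_000_000


def summarize(total_bp, total_reads, reference_bp):
    """Return the mean read length and the total as a fraction of a reference genome size."""
    return {
        "mean_read_length": total_bp / total_reads,
        "fraction_of_reference": total_bp / reference_bp,
    }


stats = summarize(TOTAL_BP, TOTAL_READS, HUMAN_GENOME_BP)
print(f"mean read length: {stats['mean_read_length']:.1f} bp")
print(f"fraction of assumed human genome size: {stats['fraction_of_reference']:.0%}")
```

The reference size is the only number here that does not come from the slide, so the percentage moves if a different genome size is assumed.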
 Sampling Sites

• Marine
   – Near-shore water
   – Off-shore water
   – Near- and off-shore sediments
• Freshwater
   – Aquifer
   – Glacial lake
• Metazoan associated
   – Corals
   – Fish
   – Human blood
   – Human stool
• Terrestrial/Soil
   – Amazon rainforest
   – Konza prairie
   – Joshua Tree desert
• Air
• Extreme
   – Hot springs (84°C; 78°C)
   – Soda lake (pH 13)
   – Solar saltern (>35% salt)
 The SEED database developed
           by FIG
http://theseed.uchicago.edu/FIG/index.cgi
           Current version:

              580 Bacteria (342 complete)
              38 Archaea (26 complete)
              562 Eukarya (29 complete)
              1335 Viruses
              2 Environmental Genomes
Subsystems are not just for gene clusters

• genome context (virulence islands, prophages, conserved gene clusters)
• virulence mechanism
• enzymatic activity
• cellular localization
• predicted or measured co-regulation
• common phenotype
• combinations of criteria
         Cyanoseed:
http://cyanoseed.theFIG.info
                  Marine Seed:
http://theseed.uchicago.edu/FIG/organisms.cgi?show=marine
            Assembly of 454 sequences
     mitochondrion ca. 17 kb




                        assembled fragment ca. 10 kb
Thanks: Lutz Krause
           Community structure


   Community structure based on frequency of finding
      overlapping fragments from the sequences




2-contig
                             3-contig
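
One way to tabulate how often overlapping fragments are found is to count contigs by the number of reads they contain. The sketch below is illustrative only: the talk does not say how the counts were tabulated, a "q-contig" is assumed here to mean a contig built from q overlapping reads, and the function name and input data are hypothetical.

```python
from collections import Counter


def contig_spectrum(reads_per_contig, max_size=None):
    """Count how many contigs contain exactly q reads, for q = 1..max_size.

    reads_per_contig: one integer per contig (1 means an unassembled singleton).
    Returns a list whose element q-1 is the number of q-contigs.
    """
    counts = Counter(reads_per_contig)
    if max_size is None:
        max_size = max(counts, default=0)
    return [counts.get(q, 0) for q in range(1, max_size + 1)]


# Hypothetical assembly result: number of reads in each contig
example = [1, 1, 1, 1, 2, 2, 3, 1, 2]
spectrum = contig_spectrum(example)
print(spectrum)  # singletons, 2-contigs, 3-contigs
```

A sample dominated by a few genotypes would be expected to yield relatively more multi-read contigs than a very diverse one, which is why such counts carry information about community structure.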
Phages in the World's Oceans

• ARC: 56 samples, 16 sites, 1 year
• BBC: 85 samples, 38 sites, 8 years
• SAR: 1 sample, 1 site, 1 year
• GOM: 41 samples, 13 sites, 5 years
• LI: 4 sites, 1 year
Most Marine Phage Sequences are Novel
Phages are specific to environments




Phage Proteomic Tree v. 5 (Edwards, Rohwer)

• Groups labeled on the tree: ssDNA, λ-like, T4-like, T7-like

Thanks: Mya Breitbart
        Marine Single-Stranded DNA Viruses

•   6% of SAR sequences are from ssDNA phage (Chlamydia-like
    Microviridae)
•   40% of viral particles in SAR are ssDNA phage
•   Several full-genome sequences were recovered via de novo
    assembly of these fragments
•   Confirmed by PCR and sequencing
       SAR Aligned Against the Chlamydia phi 4 Genome


• Figure tracks: individual sequence reads; coverage; concatenated hits;
  Chlamydia phi 4 genome; Chl4 ORF calls
• 12,297 sequence fragments hit using TBLASTX over a ~4.5 kb genome
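
A coverage track like the one on this slide can be derived from hit coordinates. The following is a minimal sketch, not the method used for the figure: it assumes hits are already available as (start, end) positions on the reference genome, and the function name and example coordinates are hypothetical.

```python
def per_base_coverage(hits, genome_length):
    """Return a list with the number of hits covering each base of the genome.

    hits: iterable of (start, end) pairs, 1-based and inclusive; start may be
    greater than end for hits on the reverse strand.
    """
    # Difference array: +1 where a hit begins, -1 just past where it ends
    diff = [0] * (genome_length + 1)
    for start, end in hits:
        lo = max(1, min(start, end))
        hi = min(genome_length, max(start, end))
        if lo > hi:
            continue  # hit lies entirely outside the genome
        diff[lo - 1] += 1
        diff[hi] -= 1

    coverage = []
    depth = 0
    for delta in diff[:genome_length]:
        depth += delta
        coverage.append(depth)
    return coverage


# Toy example on a 6-base "genome"
print(per_base_coverage([(1, 3), (5, 2)], 6))
```

The difference-array approach keeps the cost proportional to the number of hits plus the genome length, which matters little at ~4.5 kb but scales to larger references.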
Phages, Reefs, and Human Disturbance

                    The Northern Line Islands
                    Expedition, 2005

• Islands shown on the map: Kingman, Palmyra, Washington, Fanning, Christmas
16S rDNA at each island
16S genes from 454 same as from cloning

• Cloned and 454-sequenced 16S are indistinguishable
• Figure legend labels: Black stuff; Cloned; Red
Christmas to Kingman Bias in No. Phage Hosts
        Negative numbers mean relatively more phage hosts at Kingman


More photosynthesis at Kingman.
No people at Kingman.




                                                More pathogens at Christmas.
                                                More people at Christmas.
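
The sign convention on this slide can be illustrated with a simple difference of relative abundances. This is only a sketch: the exact statistic behind the figure is not given in the talk, so the sketch assumes "Christmas fraction minus Kingman fraction", and the host names and counts are invented purely for illustration.

```python
def abundance_bias(christmas_counts, kingman_counts):
    """Return, per host group, the Christmas fraction minus the Kingman fraction.

    Negative values mean the group is relatively more common at Kingman.
    """
    c_total = sum(christmas_counts.values())
    k_total = sum(kingman_counts.values())
    if c_total == 0 or k_total == 0:
        raise ValueError("both samples need at least one count")

    groups = sorted(set(christmas_counts) | set(kingman_counts))
    return {
        g: christmas_counts.get(g, 0) / c_total - kingman_counts.get(g, 0) / k_total
        for g in groups
    }


# Invented counts for two placeholder host groups
christmas = {"host_a": 30, "host_b": 70}
kingman = {"host_a": 60, "host_b": 40}
bias = abundance_bias(christmas, kingman)
for group, value in bias.items():
    print(f"{group}: {value:+.2f}")
```

Working with fractions rather than raw counts keeps the comparison meaningful when the two libraries differ in size.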
     Computational Challenges
• Sequence annotations and analysis
   – What is there?
   – What is it doing?
   – How is it doing it?

• Gene predictions in unknowns
   – Lutz Krause

• Sequence comparisons
   – BLAST
   – Other ways to rapidly compare short sequences
   – What happens when everyone is using 454 sequencing?
   Sequence data from 21 libraries
• 600 million bp
• 6 million sequences



• A BLASTX search of one library takes about 1,000 CPU hours
• 42 libraries = 42,000 CPU hours, or 4.8 CPU years
• Users want
    • repeat runs,
    • TBLASTX,
    • more analysis
    • more data
    • more, more, more, more
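
The CPU estimate above is simple arithmetic and can be rerun for any number of libraries. A small sketch, assuming roughly 1,000 CPU hours per library (as the 42-library figure implies) and a 365-day year; the function name is hypothetical:

```python
HOURS_PER_YEAR = 24 * 365


def blast_cost(n_libraries, cpu_hours_per_library=1_000):
    """Return (total CPU hours, total CPU years) for searching n_libraries."""
    cpu_hours = n_libraries * cpu_hours_per_library
    return cpu_hours, cpu_hours / HOURS_PER_YEAR


hours, years = blast_cost(42)
print(f"{hours:,} CPU hours = {years:.1f} CPU years")
```

Repeat runs or an additional TBLASTX pass can be estimated by changing the library count or the per-library cost; the TBLASTX cost itself is not given in the talk.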
• SDSU: Forest Rohwer, Beltran Rodriguez-Brito, Lutz Krause
• Rohwer Lab: Linda Wegley, Florent Angly, Matt Haynes
• Also at SDSU: Anca Segall, Willow R-S, Stanley Maloy
• Math Guys@SDSU: Peter Salamon, Joe Mahaffy, James Nulton, Ben Felts,
  David Bangor, Steve Rayhawk, Jennifer Mueller
• USF: Mya Breitbart
• ANL: Rick Stevens, Bob Olsen, CI Support
• FIG: Veronika Vonstein, Ross Overbeek, Annotators
• MIT: Ed DeLong

				