Sequencing - PowerPoint

Large-scale genome projects
• Sequencing DNA molecules in the Mb size range
• All strategies employ the same underlying principles:
   Random Shotgun sequencing
Stages: Strategy > Libraries > Sequencing > Assembly > Closure > Annotation > Release
[Figure: Genomic DNA -> Shearing/Sonication -> Subclone and Sequence -> Shotgun reads -> Assembly -> Contigs -> Finishing (with finishing reads) -> Complete sequence]
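The shotgun flow in the figure can be imitated on a toy scale. The sketch below is an illustration only: the random genome, the read length and the read count are made-up values, and `shotgun_reads` and `per_base_depth` are hypothetical helpers rather than part of any real pipeline. Reads are taken from one strand at uniformly random positions, which is a crude stand-in for shearing and subcloning.

```python
import random

def shotgun_reads(genome, n_reads, read_len, seed=0):
    """Sample fixed-length reads at random start positions on one strand."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]

def per_base_depth(genome_len, reads):
    """Count how many reads cover each base of the genome."""
    depth = [0] * genome_len
    for start, seq in reads:
        for i in range(start, start + len(seq)):
            depth[i] += 1
    return depth

# Toy inputs (assumed values, chosen only to keep the example small).
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
reads = shotgun_reads(genome, n_reads=80, read_len=100)

depth = per_base_depth(len(genome), reads)
mean_depth = sum(depth) / len(depth)
uncovered = depth.count(0)
print(f"mean depth {mean_depth:.1f}x, uncovered bases {uncovered}")
```

Even at a few-fold mean depth, random sampling usually leaves some bases uncovered, which is why the flow ends with a finishing step.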
Nucleotide Database Growth
EMBL breakdown by organism
EMBL Release 65
Progress on Large Sequencing Projects
Strategies for sequencing
• How big can you go??
• Large-insert clones
   • cosmids 30-40 kb
   • BACs/PACs 50 - 100 kb
• Whole chromosomes
• Whole genomes
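Genome size and sequencing strategies
[Figure: genomes placed on a log-scale axis, "Genome size (log Mb)", running from 0 to 4, with the strategies used across that range]
Genomes shown:
   • H.sapiens (3000 Mb)
   • D.melanogaster (170 Mb)
   • C.elegans (100 Mb)
   • P.falciparum (30 Mb)
   • S.cerevisiae (14 Mb)
   • E.coli (4 Mb)
Strategies shown:
   • Whole genome shotgun (WGS)
   • Clone-by-clone
   • Whole Chromosome Shotgun (WCS)
   • Whole Genome Shotgun (WGS) with Clone ‘skims’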
Strategies for sequencing
• Size and GC composition of genome
   • Volume of data
   • Ease of cloning
   • Ease of sequencing
• Genome complexity
   • dispersed repetitive sequence
   • telomeres & centromeres
• Politics/Funding
Strategies: Clone by Clone
• Simple (0.5 - 2 K reads)
• Few problems with repeats
• Relatively simple informatics
• Scalability
• Quality of physical map
   • Fingerprint / STS maps
   • End sequencing
Strategies: Whole Chromosome shotgun (WCS)
• Requires chromosome isolation
• Moderate complexity (tens of K reads)
• Problems with repeats
• Complex informatics
• Inefficient in isolation
• Quality of physical map
   • Skims of mapped clones
Strategies: Whole Genome shotgun (WGS)
• Moderate to High complexity (tens to hundreds of K reads)
• Problems with repeats
• Complex informatics
• Quality of physical map
   • Fingerprint map
   • STS markers
   • End-sequences
   • Skims of mapped clones
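The read counts quoted for the three strategies can be sanity-checked with simple arithmetic. The sketch below is not from the slides: it assumes a usable read length of 500 bp and 8-fold redundancy, both illustrative values, and uses the standard Poisson (Lander-Waterman style) estimate for the fraction of bases covered by randomly placed reads.

```python
import math

def reads_needed(target_bp, read_len_bp, coverage):
    """Number of reads such that reads * read_len is about coverage * target."""
    return math.ceil(coverage * target_bp / read_len_bp)

def expected_fraction_covered(coverage):
    """Poisson estimate of the fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-coverage)

READ_LEN = 500   # assumed usable read length in bp
COVERAGE = 8     # assumed fold redundancy

# Target sizes taken from the clone and genome sizes quoted in the slides.
targets = {
    "cosmid (40 kb)": 40_000,
    "BAC/PAC (100 kb)": 100_000,
    "E.coli (4 Mb)": 4_000_000,
    "P.falciparum (30 Mb)": 30_000_000,
}
for name, size in targets.items():
    print(f"{name}: about {reads_needed(size, READ_LEN, COVERAGE):,} reads")
print(f"fraction covered at {COVERAGE}x: {expected_fraction_covered(COVERAGE):.4f}")
```

Under these assumptions a cosmid needs 640 reads and a 100 kb clone 1,600, inside the 0.5 - 2 K range given for clone by clone, while a 4 Mb genome needs 64,000, which falls in the tens to hundreds of K range given for WGS.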
Sequencing my genome
[Figure: project phases (Politics, Production, Finishing, Annotation) shown with the labels TIME and MONEY]
What do you get?
DATA!!, DATA!!, and more DATA!!
• Sequence
   • incomplete v complete
• First-pass annotation
   • Gene discovery
   • Full annotation
• A starting point for research
Genome annotation is central to functional genomics
[Figure: ORFeome based functional genomics, linking RNAi phenotypes, Gene Knockout and Expression Microarray]
Sequencing
• Library construction
• Colony picking
• DNA preparation
• Sequencing reactions
• Electrophoresis
• Tracking/Base calling
Libraries
• Essentially Sub-cloning
• Generation of small insert libraries in a well characterised vector.
   • Ease of propagation
   • Ease of DNA purification
   • e.g. pUC18, M13
Libraries - testing
• Simple concepts
   • Insert/Vector ratio
• Real data
   • Insert size
   • Sequence ….
   • Simple analysis
Sequence generation
• Pick colonies
• Template preparation
• Sequence reactions
   • Standard terminator chemistry
   • pUC libraries sequenced with forward and reverse primers
Sequence generation
• Electrophoresis of products
   • Old style - slab gels, 32 > 64 > 96 lanes
   • New style - capillary gels, 96 lanes
• Transfer of gel image to UNIX
   • Sequencing machines use a slave Mac/PC
   • Move data to centralised storage area for processing
Gel image processing
• Light-to-Dye estimation
• Lane tracking
• Lane editing
• Trace extraction
• Trace standardisation
   • Mobility correction
   • Background subtraction
Pre-processing
• Base calling using Phred
   • modifies SCF file
• Quality clipping
• Vector clipping
   • Sequencing vector
   • Cloning vector
• Screen for contaminants
• Feature mark up (repeats/transposons)
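Quality and vector clipping can be sketched in a few lines. This is a minimal illustration, not the pipeline described in the slides: `quality_clip` keeps the segment that maximises the sum of (quality - threshold), one common trimming heuristic, and `vector_clip` only handles an exact match to a made-up vector tail (`VECTOR_TAIL` is a hypothetical sequence). The only fact relied on is the Phred definition Q = -10 log10(P).

```python
def phred_to_error(q):
    """Error probability implied by a Phred quality value."""
    return 10 ** (-q / 10)

def quality_clip(quals, threshold=20):
    """Return (start, end) of the best-scoring segment; end is exclusive."""
    best_sum, best = 0, (0, 0)
    run_sum, run_start = 0, 0
    for i, q in enumerate(quals):
        if run_sum <= 0:          # restart the running segment here
            run_sum, run_start = 0, i
        run_sum += q - threshold
        if run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best

def vector_clip(read, vector_tail):
    """Drop everything up to and including an exact copy of vector_tail."""
    pos = read.find(vector_tail)
    return read[pos + len(vector_tail):] if pos != -1 else read

# Toy data (assumed values).
quals = [8, 10, 12, 30, 35, 40, 38, 36, 15, 9, 6]
VECTOR_TAIL = "GGATCCTCTAGA"   # hypothetical vector/insert junction
read = "TT" + VECTOR_TAIL + "ACGTACGTAC"

print(quality_clip(quals))            # (3, 8)
print(vector_clip(read, VECTOR_TAIL)) # ACGTACGTAC
print(phred_to_error(20))             # 0.01
```

Real vector screening aligns reads against the full vector sequence and tolerates mismatches; the exact-match version here only shows where the clip point falls.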
Finishing
• Assembly: Process of turning raw single-pass reads into contiguous consensus sequence
• Closure: Process of ordering and merging consensus sequences into a single contiguous sequence
• Finished is defined as sequenced on both strands using multiple clones. In the absence of multiple clones the clone must be sequenced with multiple chemistries. The overall error rate is estimated at less than 1 error per 10 kb
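The finished-sequence error bound can be put on the Phred scale with one line of arithmetic. The sketch below is an illustration; the 40 kb clone size is taken from the cosmid range quoted earlier and is used only as an example.

```python
import math

def error_rate_to_phred(p):
    """Phred-scaled quality for an error probability p."""
    return -10 * math.log10(p)

finished_rate = 1 / 10_000          # 1 error per 10 kb
clone_size = 40_000                 # example clone size in bp (a cosmid)

consensus_q = round(error_rate_to_phred(finished_rate))
max_errors = finished_rate * clone_size
print(f"1 error per 10 kb is about Q{consensus_q}")
print(f"at most about {max_errors:.0f} errors expected in a {clone_size // 1000} kb clone")
```

So the finished standard corresponds to a consensus quality of roughly Q40.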
Genome Assembly
• Pre-assembly
• Assembly
• Automated appraisal
• Manual review
Pre-Assembly
• Convert to CAF format
    • flatfile text format
    • choice of assembler
    • choice of post-assembly modules
    • choice of assembly editor
www.sanger.ac.uk/Software/CAF
Assembly
• Assemble using Phrap
• Read fasta & quality scores from CAF file
• Merge existing Phrap .ace file as necessary
• Adjust clipping
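To make the idea of assembly concrete, here is a toy greedy overlap-merge assembler. It is explicitly not Phrap and ignores quality scores, sequencing errors, reverse strands and repeats; it only shows how exact suffix/prefix overlaps join reads into a contig. The reads and the `min_len` cut-off are made-up values.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are wholly contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:                 # no remaining overlaps: done
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Toy reads drawn from one short made-up sequence.
target = "ATGCGTACGTTAGCCGATAGGCTA"
reads = ["AGCCGATAGGCTA", "ATGCGTACGT", "GTACGTTAGCCG"]
contigs = greedy_assemble(reads)
print(contigs)
```

With these three reads the result is a single contig equal to `target`. Repeats longer than the overlap cut-off would mislead a greedy approach like this, which is one reason repeats appear as a problem on the strategy slides.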
Assembly appraisal
• auto-edit
    • removes 70% of read discrepancies
• Remove cloning vector
• Mark up sequence features
• finish
    • Identify low-quality regions
    • Cover using ‘re-runs’ and ‘long-runs’
• Compare with current databases
    • plate contamination
Manual Assembly appraisal
• Use a sequence editor (GAP/consed)
• Tools to identify Internal joins
• Tools to identify and import data from overlapping projects
• Tools to check failed or mis-assembled reads for inclusion in project
Manual editing
• Sanger uses 100% edit strategy
• Where additional data is required:
   • Check clipping
   • Additional sequencing
       • Template / Primer / Chemistry
• Assemble new data into project
   • GAP4 Auto-assemble
   • Repeat whole process
Manual Quality Checks
• Force annotation tag consistency
• All unedited data is re-assembled using Phrap
• All high-quality discrepancies are reviewed
• Confirm restriction digest (clones)
• Check for inverted repeats
• Manually check:
   • Areas of high-density edits
   • Areas with no supporting unedited data
   • Areas of low read coverage
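Confirming a restriction digest amounts to comparing fragment sizes predicted from the assembled sequence with the bands seen on a gel. The sketch below is a simplified illustration: it treats the sequence as linear, uses EcoRI (GAATTC, cut after the G) as the example enzyme, and the 10% tolerance and the observed band sizes are made-up values.

```python
def digest_fragment_sizes(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths from cutting a linear sequence at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def digest_matches(predicted, observed, tolerance=0.1):
    """True if band counts agree and each size pair is within the tolerance."""
    if len(predicted) != len(observed):
        return False
    return all(abs(p - o) <= tolerance * p
               for p, o in zip(sorted(predicted), sorted(observed)))

# Toy consensus with two EcoRI sites (made-up sequence).
consensus = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
predicted = digest_fragment_sizes(consensus)
observed = [10, 11, 27]            # hypothetical gel band sizes

print(predicted)                             # [11, 26, 10]
print(digest_matches(predicted, observed))   # True
```

A mismatch in band count or size is a hint that the assembly has a collapsed repeat, a rearrangement or a missing region.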
Gap closure
• Read pairs
• PCR reactions (long-range / combinatorial)
• Small-insert libraries
• Transposon-insertion libraries
Gap closure - contig ordering
• Read pair consistency
• STS mapping
   • Physical mapping
   • Genetic mapping
   • Optical mapping
• Large-insert clone
   • skims
   • end-sequencing
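Read pairs help order contigs because the forward and reverse reads of one subclone must lie close together; when the two ends land in different contigs, those contigs are probably neighbours. The sketch below only counts such links. The read names, contig names and the `contig_links` helper are hypothetical, and orientation and insert-size checks are left out.

```python
from collections import Counter

def contig_links(read_to_contig, pairs):
    """Count read pairs whose two ends were placed in different contigs."""
    links = Counter()
    for fwd, rev in pairs:
        a, b = read_to_contig.get(fwd), read_to_contig.get(rev)
        if a is not None and b is not None and a != b:
            links[tuple(sorted((a, b)))] += 1
    return links

# Toy placement of reads in contigs (made-up names).
read_to_contig = {
    "c1.f": "contigA", "c1.r": "contigB",
    "c2.f": "contigA", "c2.r": "contigB",
    "c3.f": "contigB", "c3.r": "contigC",
    "c4.f": "contigA", "c4.r": "contigA",   # both ends in one contig: no link
}
pairs = [("c1.f", "c1.r"), ("c2.f", "c2.r"), ("c3.f", "c3.r"), ("c4.f", "c4.r")]

links = contig_links(read_to_contig, pairs)
for (a, b), n in links.most_common():
    print(f"{a} - {b}: {n} read pair(s)")
```

Here the links suggest the order contigA, contigB, contigC; links supported by a single pair deserve more caution than those supported by several.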
Annotation
• DNA features (repeats/similarities)
• Gene finding
• Peptide features
• Initial role assignment
• Others - regulatory regions
Annotation of eukaryotic genomes
[Figure: Genomic DNA -> (transcription) -> Unprocessed RNA -> (RNA processing) -> Mature mRNA with poly-A tail -> (translation) -> Nascent polypeptide -> (folding) -> Active enzyme -> Function (Reactant A to Product B). Alongside: ab initio gene prediction, comparative gene prediction, functional identification]
Genome analysis overview: C.elegans
DNA features
• Similarity features
• mapping repeats
   • simple tandem and inverted
   • repeat families
• mapping DNA similarities
   • EST/mRNAs in eukaryotes
   • Duplications
   • RNAs
• mapping peptide similarities
   • protein similarities
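Simple tandem repeats are the easiest of these DNA features to find automatically. The sketch below uses a regular-expression back-reference to report perfect tandem repeats with a unit of 2-6 bases repeated at least three times. The unit-length range, the copy threshold and the example sequence are assumptions; inverted repeats and repeat families need other methods.

```python
import re

# A unit of 2-6 bases followed by at least two more exact copies of itself.
TANDEM = re.compile(r"([ACGT]{2,6}?)\1{2,}")

def simple_tandem_repeats(seq):
    """Return (start, unit, copies) for each perfect tandem repeat found."""
    return [
        (m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
        for m in TANDEM.finditer(seq)
    ]

# Made-up sequence with a (CAG)5 and an (AT)4 repeat.
seq = "GGATC" + "CAG" * 5 + "TTGA" + "AT" * 4 + "GC"
repeats = simple_tandem_repeats(seq)
print(repeats)   # [(5, 'CAG', 5), (24, 'AT', 4)]
```

Real repeats are rarely perfect, so production tools score approximate copies rather than requiring exact ones.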
Gene finding
• ORF finding (simple but messy)
• ab initio prediction
   • Measures of codon bias
   • Simple statistical frequencies
• Comparative prediction
   • Using similarity data
   • Using cross-species similarities
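ORF finding is simple enough to show in full. The sketch below scans all six reading frames for stretches that start at ATG and run to the first in-frame stop codon (standard genetic code). The `min_codons` cut-off and the example sequence are made-up values, and coordinates on the minus strand refer to the reverse-complemented sequence. It also shows why the approach is messy: every sufficiently long ATG-to-stop stretch is reported whether or not it is a real gene, and introns are not handled at all.

```python
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_orfs(seq, min_codons=3):
    """Return (strand, frame, start, end) for each ATG..stop ORF; end is exclusive."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs

# Made-up sequence containing one short ORF on the forward strand.
seq = "CCATGAAACCCGGGTTTTAGCC"
orfs = find_orfs(seq)
print(orfs)                                   # [('+', 2, 2, 20)]
print(seq[orfs[0][2]:orfs[0][3]])             # ATGAAACCCGGGTTTTAG
```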
Peptide features
• Peptide features
   • low-complexity regions
   • trans-membrane regions
   • structural information (coiled-coil)
• Similarities and alignments
• Protein families (InterPro/COGs)
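Low-complexity regions can be flagged with a sliding-window entropy measure. The sketch below is a crude stand-in for dedicated tools such as SEG, not a reimplementation of them: the window size, the entropy cut-off and the example peptide are all made-up values.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def low_complexity_windows(peptide, size=12, max_entropy=2.2):
    """Start positions of windows whose composition entropy is at or below the cut-off."""
    return [
        i for i in range(len(peptide) - size + 1)
        if shannon_entropy(peptide[i:i + size]) <= max_entropy
    ]

# Made-up peptide with a poly-Q stretch in the middle.
peptide = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 12 + "LEERLGLIEVQAPILSRVGDGT"
flagged = low_complexity_windows(peptide)
print(flagged)
```

Windows overlapping the poly-Q stretch are flagged, while the mixed-composition start of the peptide is not. Masking such regions before similarity searches avoids spurious matches driven by composition alone.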
Initial role assignment
• Simple attempt to describe the functional identity of a peptide
• Uses data from:
   • peptide similarities
   • protein families
• Vital for data mining
• Large number of predicted genes remain hypothetical or unknown
Other regulatory features
• Ribosomal binding sites
• Promoter regions
Data Release
• DNA release
   • Unfinished
   • Finished
• Nucleotide databases
   • GENBANK/EMBL/DDBJ
• Peptide databases
   • SWISSPROT/TREMBL/GENPEPT
• Others

				