Welcome to
Introduction to Bioinformatics
Wednesday, 10 February
Genome Sequencing/Assembly




What to do for summer vacation?
Deadline: Sunday, Feb 28!
Target: Monday, Mar 1!
Deadline: ???
Deadline: Friday, Feb 26!
Global Viral Genome Project
Deadline: whenever!
Learn more about…
HHMI: http://www.vcu.edu/csbc/hhmi/
BBSI: http://www.vcu.edu/csbc/bbsi/
VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm
GVGP: http://biobike.csbc.vcu.edu (News)
Myers et al SQ2
What is the sequence (5' to 3') represented by the gel?
[Figure: sequencing gel with lanes G, A, T, C]
Dideoxy sequencing
  (= Sanger sequencing)
[Figure: dideoxy sequencing diagram; a column of nucleotides reading TCGTGTACATCGTAACACGGTTAAGT (top to bottom), with ddC-terminated fragments marked]
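A minimal sketch of how a dideoxy gel is read, assuming each band has already been assigned to its lane and to a fragment length (nt beyond the primer). The lane data are made up for illustration and are not the SQ2 gel; `read_gel` and `template_strand` are hypothetical names. The shortest fragments run farthest, so reading from the bottom of the gel up gives the newly synthesized strand 5' to 3'.

```python
# Hypothetical input: for each ddNTP lane, the lengths (nt beyond the primer) of
# the chain-terminated fragments seen as bands in that lane.
def read_gel(lanes: dict[str, list[int]]) -> str:
    """Return the newly synthesized strand, 5' to 3', read from the bottom of the gel up."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    if not bands:
        return ""
    lengths = [length for length, _ in bands]
    # Each successive length should appear in exactly one lane; otherwise the
    # ladder has a gap or an ambiguous (doubled) band.
    if lengths != list(range(lengths[0], lengths[0] + len(lengths))):
        raise ValueError("gap or ambiguous band in the ladder")
    return "".join(base for _, base in bands)


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_strand(read: str) -> str:
    """The template, written 5' to 3', is the reverse complement of the read."""
    return read.translate(COMPLEMENT)[::-1]


lanes = {"G": [3, 6], "A": [1, 5], "T": [2], "C": [4, 7]}
new_strand = read_gel(lanes)
print(new_strand)                   # ATGCAGC
print(template_strand(new_strand))  # GCTGCAT
```

Note that the gel gives the sequence of the strand being synthesized; the template it was copied from is its reverse complement.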
Sequencing process
Drosophila genome (~100 million nt)
Sequence it
Technical limitation: reads are limited to hundreds of nt
How many possible 500 nt fragments are there?
SAMPLE
How many 500 nt samples are needed to cover 100 million nt?
$\frac{100\,000\,000}{500} = \frac{1\,000\,000}{5} = 200\,000$
Is this enough?
Oversampling … coverage?
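The two counting questions can be checked with a few lines of arithmetic. This sketch assumes a linear genome of exactly 100,000,000 nt and counts fragments by their start position (one strand by default; pass `strands=2` to count both). The helper names are hypothetical, and the coverage values passed in are only illustrative.

```python
GENOME_NT = 100_000_000   # ~size of the Drosophila genome, as on the slide
READ_NT = 500             # length of each sampled fragment


def possible_windows(genome_nt: int, frag_nt: int, strands: int = 1) -> int:
    """Distinct start positions for a fragment of frag_nt in a linear sequence."""
    return strands * (genome_nt - frag_nt + 1)


def reads_for_coverage(genome_nt: int, read_nt: int, coverage: float) -> int:
    """Number of reads N such that N * read_nt = coverage * genome_nt."""
    return round(coverage * genome_nt / read_nt)


print(possible_windows(GENOME_NT, READ_NT))       # 99999501
print(reads_for_coverage(GENOME_NT, READ_NT, 1))  # 200000
print(reads_for_coverage(GENOME_NT, READ_NT, 8))  # 1600000
```

200,000 reads add up to exactly one genome's worth of sequence (1x coverage), but because the samples are random they overlap, so some of the genome is still left unread.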
Study Questions 8 & 9
"Oversampling"? "Coverage"?
Shotgun sequencing?

Paint the wall (40" × 25")
How long will this take?
Each paint ball covers 1 sq"
1000 paint balls?
[Figure: Completeness (0 to 1) vs. Oversampling (0 to 10)]
How much is painted with 1x oversampling?
What fraction won't be painted?
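A quick simulation of the paint-ball question, under simplifying assumptions of ours: the 40" × 25" wall is treated as 1000 one-square-inch cells, and every ball lands entirely within one uniformly random cell. `simulate_painting` is a hypothetical helper. The simulated fractions should track 1 - e^-c, the Poisson approximation for the fraction covered at c-fold oversampling.

```python
import math
import random


def simulate_painting(width: int, height: int, balls: int, seed: int = 0) -> float:
    """Fraction of 1-sq-inch cells hit at least once by randomly thrown paint balls."""
    rng = random.Random(seed)
    cells = width * height
    painted = set()
    for _ in range(balls):
        painted.add(rng.randrange(cells))  # each ball lands on a uniformly random cell
    return len(painted) / cells


WIDTH, HEIGHT = 40, 25  # wall from the slide, in inches
for oversampling in (1, 2, 4, 8):
    balls = oversampling * WIDTH * HEIGHT
    simulated = simulate_painting(WIDTH, HEIGHT, balls)
    expected = 1 - math.exp(-oversampling)  # Poisson approximation
    print(f"{oversampling}x: simulated {simulated:.3f}, 1 - e^-c = {expected:.3f}")
```

With 1000 balls (1x) only about 63% of the wall gets paint, so roughly 37% is missed; each further doubling of oversampling shrinks the unpainted fraction sharply but never to zero.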
Intersection of possibilities
(Rule of multiplication)
Probability that two coins both come up tails

Rule of multiplication: intersection of independent events

                        Second coin toss
                         H        T
First coin toss    H    HH       HT
                   T    TH       TT

Gets T from first AND gets T from second
P(TT) = 1/2 x 1/2 = 1/4
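The rule of multiplication can be checked by brute force: list the four equally likely outcomes of two fair coin tosses and count. A minimal sketch; the helper `prob` is our own illustrative name.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin tosses: HH, HT, TH, TT.
OUTCOMES = list(product("HT", repeat=2))


def prob(event) -> Fraction:
    """Probability of an event = favorable outcomes / total outcomes."""
    favorable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favorable, len(OUTCOMES))


both_tails = prob(lambda o: o[0] == "T" and o[1] == "T")
first_tail = prob(lambda o: o[0] == "T")
second_tail = prob(lambda o: o[1] == "T")

print(both_tails)                # 1/4
print(first_tail * second_tail)  # 1/4, matching because the tosses are independent
```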
Union of possibilities
(Rule of addition)
Probability that at least one of two coins comes up tails

1/2 x 1/2 = 1/4?
1/2 + 1/2 = 1?

Rule of addition: union of mutually exclusive outcomes

                        Second coin toss
                         H        T
First coin toss    H    HH       HT
                   T    TH       TT

Gets HT or TH or TT
P(at least 1 T) = 1/4 + 1/4 + 1/4 = 3/4
Union of possibilities
(Rule of complementation)
Probability that at least one of two coins comes up tails, found from the probability that neither does

Rule of complementation ("yin-yang"): P(A) + P(not A) adds to 1

                        Second coin toss
                         H        T
First coin toss    H    HH       HT
                   T    TH       TT

P(at least 1 T) = 1 - P(no T) = 1 - P(HH)
P(at least 1 T) = 1 - 1/4 = 3/4
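The two routes to "at least one tail" (adding the mutually exclusive outcomes, or subtracting the complement from 1) give the same answer, while naively adding P(first T) + P(second T) double-counts TT. A small sketch over the same four outcomes:

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product("HT", repeat=2))  # HH, HT, TH, TT, each with probability 1/4
P_EACH = Fraction(1, len(OUTCOMES))

# Rule of addition: sum over the mutually exclusive outcomes with at least one T.
at_least_one_t = [o for o in OUTCOMES if "T" in o]
p_addition = P_EACH * len(at_least_one_t)

# Rule of complementation: 1 minus the probability of the only outcome with no T (HH).
p_complement = 1 - P_EACH * sum(1 for o in OUTCOMES if "T" not in o)

# Naively adding P(first T) + P(second T) counts the TT outcome twice.
p_naive = Fraction(1, 2) + Fraction(1, 2)

print(p_addition, p_complement, p_naive)  # 3/4 3/4 1
```

Complementation is the route that scales: with many reads it is far easier to compute the chance that every read misses a nucleotide than to add up all the ways it could be hit.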
              Focus on one nucleotide…
 What’s the probability that it’s covered by one read?
 What’s the probability that it’s covered by two reads?
What’s the probability that it’s covered by 200,000 reads?
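One way to work these questions, assuming each read starts at a uniformly random position so that it covers a given nucleotide with probability 500 / 100,000,000 (ends of the genome ignored). "Covered by two reads" is read here as both of two specific reads covering it (rule of multiplication); "covered by 200,000 reads" as at least one of them covering it (rule of complementation). `p_covered` is an illustrative helper.

```python
import math

GENOME_NT = 100_000_000  # ~Drosophila genome size, from the slide
READ_NT = 500            # read length, from the slide

# Assumption: uniform random read starts, so one read covers a given
# nucleotide with probability READ_NT / GENOME_NT.
p_one = READ_NT / GENOME_NT


def p_covered(n_reads: int) -> float:
    """P(at least one of n_reads covers the nucleotide) = 1 - P(every read misses it)."""
    return 1 - (1 - p_one) ** n_reads


print(p_one)               # 5e-06
print(p_one ** 2)          # both of two specific reads cover it
print(p_covered(200_000))  # about 0.632
print(1 - math.exp(-200_000 * READ_NT / GENOME_NT))  # Poisson approximation, 1 - e^-c
```

At 200,000 reads (1x oversampling) the nucleotide has roughly a 37% chance of never being read, the same answer as the paint-ball wall.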
Problem Set 3, Problem 2
  Statistics of mini-plasmid assembly
     Myers et al SQ6
Why read pairs? Scaffolds?



[Figure: Contig 1 and Contig 2 placed along the DNA]


[Figure: a plasmid carrying an ~2000 nt insert, sequenced from primers at both ends of the insert to give a pair of reads (mates); x 1000's of plasmids]
Bacterial Artificial Chromosome (BAC)
P1-derived Artificial Chromosome (PAC)
~150,000 nt
[Figure: mates read from the two ends of the ~150,000 nt clone]
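A sketch of why mates help build scaffolds: if the two reads of a pair land on different contigs, the known insert size bounds the gap between those contigs. This assumes the contigs are already ordered and oriented by the mates and that the insert size is known (~2000 nt for the plasmid mates above); `MateLink` and `estimate_gap` are hypothetical names, and real scaffolders also weigh insert-size variance and conflicting links.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MateLink:
    """One mate pair whose reads landed on two different contigs (hypothetical input)."""
    dist_to_end_of_contig1: int      # nt from the start of the read on contig 1 to contig 1's right end
    dist_from_start_of_contig2: int  # nt from contig 2's left end to the start of the read on contig 2


def estimate_gap(links: list[MateLink], insert_nt: int) -> int:
    """Estimate the gap between two mate-linked contigs.

    Each link implies gap = insert size - (part of the insert on contig 1)
                                        - (part of the insert on contig 2).
    Taking the median over links damps the effect of insert-size variation.
    A negative result suggests the contigs overlap.
    """
    if not links:
        raise ValueError("need at least one mate pair spanning the gap")
    gaps = [insert_nt - link.dist_to_end_of_contig1 - link.dist_from_start_of_contig2
            for link in links]
    return round(median(gaps))


links = [MateLink(800, 700), MateLink(600, 950), MateLink(1000, 450)]
print(estimate_gap(links, insert_nt=2000))  # 500
```

The same arithmetic applies to BAC/PAC mates, only with an insert of ~150,000 nt, which lets them bridge much larger gaps (and repeats) than plasmid mates can.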
     Myers et al SQ6
Why read pairs? Scaffolds?
        Myers et al (2000)
SQ14. From figures given in the text and in
Table 1, check the accuracy of each of the
following statements:

a. "We produced 3.156 million reads that yielded
1.76 Gbp of sequence. . ."

b. ". . .trillions of overlaps between reads are
examined."

c. ". . .to produce 654,000 of the 2-kbp mates and
497,000 of the 10-kbp mates."
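Table 1 is not reproduced here, so this sketch only checks the internal arithmetic of the quoted figures. It assumes each mate pair consists of two reads, and treats "overlaps examined" as the number of read pairs that could in principle be compared; that is an upper bound for illustration, not a claim about how the assembler's overlapper actually worked.

```python
reads = 3_156_000
bases = 1_760_000_000  # 1.76 Gbp

mean_read_nt = bases / reads
read_pairs = reads * (reads - 1) // 2  # every unordered pair of reads
mates_2kbp, mates_10kbp = 654_000, 497_000
reads_in_mates = 2 * (mates_2kbp + mates_10kbp)  # assumption: two reads per mate pair

print(f"mean read length ~ {mean_read_nt:.0f} nt")         # ~558 nt
print(f"candidate read pairs ~ {read_pairs:.2e}")           # ~4.98e+12, i.e. trillions
print(f"reads used in mates = {reads_in_mates:,} of {reads:,}")
```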

								