PairSeqAlgorithms.ppt - Marquette University

					Algorithms for Pairwise
Sequence Alignment

     Craig A. Struble, Ph.D.
      Marquette University
Overview
   Pairwise Sequence Alignment
   Dynamic Programming Solution
       Global Alignment
       Local Alignment




            BIIN 200: Bioinformatics I - Pairwise Sequence Alignment   2
Goals
   Define the pairwise sequence alignment
    problem
   Understand the difference between global
    and local alignment
   Understand dot matrix analysis
   Introduce and understand dynamic
    programming and its application to pairwise
    sequence alignment


Pairwise Sequence Alignment
   Problem
       Given two sequences (DNA or AA), “line
        them up” in a biologically meaningful way.

    HEAGAWGHEE        HEAGAWGHE-E
    PAWHEAE           P-A--W-HEAE




      Origins Of Similar Sequences
   [Figure: origins of similar sequences]
       Duplication: gene a gives rise to copies a1 and a2
       Speciation: Species 1 and Species 2 each inherit a1 and a2
       Convergence
       Transfer
Why is comparing sequences
important?
   One of the fundamental phenomena explored by
    bioinformatics, around which many tools are built
       Databases, data selection, etc.
   Researchers compare sequences in order to:
       infer the function of genes
       infer the structure of genes and gene products
       infer the evolutionary history of genes and organisms
       identify variation responsible for disease and other complex
        phenotypes




Why is this a challenging
problem?
   Similar sequences contain variation
   Sequences mutate over time
       Mutations are spontaneous changes in
        sequence caused by replication (or other)
        errors. Mutation rates vary, and can be
        influenced by many factors.
   Sequence data contains errors
       Sequencing techniques are imperfect

   Four Basic Types of Mutations
A. Substitution:                             C. Insertion
  Thr Tyr Leu Leu                               Thr Tyr Leu Leu
  ACC TAT TTG CTG                               ACC TAT TTG CTG

  ACC TCT TTG CTG                               ACC TAC TTT GCT G--
  Thr Ser Leu Leu                               Thr Tyr Phe Ala

B. Deletion                                   D. Inversion
  Thr Tyr Leu Leu                                Thr Tyr Leu Leu
  ACC TAT TTG CTG                                ACC TAT TTG CTG


  ACC TAT TGC TG-                                ACC TTT ATG CTG
  Thr Tyr Cys                                    Thr Phe Met Leu
Influences on Variation
   Rates of mutations are influenced by:
       Substitution class (transition/transversion)
       Coding site (synonymous/nonsynonymous)
       Length of insertion/deletion
       Codon usage bias
       Nucleotide composition (GC content)
   Stability & fate of variation depends upon:
       Drift
       Selection (positive Darwinian/purifying, sexual, artificial)
       Other mutations (reversions are not uncommon)


Homology vs. Similarity
   Homology is a discrete state pertaining to relatedness
    - two genes are homologues if and only if they share
    a common gene ancestor
       Orthologues: in different organisms, a result of speciation
       Paralogues: in the same organism, a result of gene
        duplication
       Homologues may have the same, similar, or different
        functions
   Similarity is a continuous state describing the degree
    to which two homologues share characteristics
       Generally a percentage
       Distance estimates are also estimates of similarity

Kinds of Alignments
   The local alignment includes only regions of identity (or
    strong similarity). This favors finding conserved regions.
   The global alignment is stretched over the entire sequence
    length, including as many matches as possible.




When do you choose local vs.
global?
   Choose local alignment when
       DNA sequences encode genes with introns
       Amino acid sequences encoding proteins
   Choose a global alignment when
       Sequences can be seen to be very similar
       Similar regions are in the same order and
        orientation


Methods Of Sequence
Alignment
   Dot matrix analysis
   Dynamic programming algorithms
   Word or k-tuple methods
       BLAST, FASTA
       Discussed later in the semester




Dot Matrix Analysis
   Visualization of sequence similarity
   First technique to use on pairs of
    sequences
       Insertions/deletions
       Inverted repeats
   Does not show actual alignment
   Optimal alignment not obvious

       Simple Dot Matrix Example:
 For sequences:
 a) ATGCGTCGTT (across the top of the matrix)
 b) ATCCGCGAT (down the side of the matrix)
Steps
1. Arrange sequences on a matrix
2. Place a dot anywhere nucleotides match
3. Diagonal stretches (indicated by a line in the original figure)
    are areas of alignment
4. More than one area of alignment can appear
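
The steps above can be sketched in a few lines of Python. This is only an illustration for the two example sequences; the function name dot_matrix and the '*' and '.' characters are choices made here, not part of the original slides.

```python
def dot_matrix(a, b):
    """One row per letter of b, one column per letter of a; '*' marks a match."""
    return ["".join("*" if x == y else "." for x in a) for y in b]


a = "ATGCGTCGTT"  # sequence a, written across the top
b = "ATCCGCGAT"   # sequence b, written down the side
rows = dot_matrix(a, b)

print("  " + a)
for letter, row in zip(b, rows):
    print(letter + " " + row)
```

Diagonal runs of '*' in the printed grid correspond to the areas of alignment mentioned in step 3.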

DNA sequence matrix: Noisy
   Sequence alignment of 2
    long DNA sequences
   Many random matches make
    it difficult or impossible to
    find areas of alignment
   Using a window & stringency setting, we can eliminate
    some of the noise




     DNA sequence matrix: Less noisy
   To decrease noise of random matches, a window of 11
    nucleotides was defined, and a dot placed when at
    least 7 matches occur
   Window = 11, Stringency = 7
   Some diagonal lines begin to appear
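
A minimal sketch of the window and stringency filter, assuming the window starts at each candidate position (some programs center it instead). The function name windowed_dots and the small demonstration values are hypothetical and chosen only to keep the example short.

```python
def windowed_dots(a, b, window=11, stringency=7):
    """Return (row, column) positions where at least `stringency` of the
    `window` letters starting at b[row] and a[column] match."""
    dots = set()
    for i in range(len(b) - window + 1):
        for j in range(len(a) - window + 1):
            matches = sum(1 for k in range(window) if b[i + k] == a[j + k])
            if matches >= stringency:
                dots.add((i, j))
    return dots


a = "ATGCGTCGTT"
b = "ATCCGCGAT"
# A short window is used here because the example sequences are short.
strict = windowed_dots(a, b, window=3, stringency=3)
print(sorted(strict))
```

With window = 1 and stringency = 1 the filter reduces to the plain dot matrix; raising both values removes isolated random matches.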



      DNA sequence matrix: Less noisy
   To decrease noise of random matches, a window of 23
    nucleotides was defined, and a dot placed when at least
    15 matches occur
   Window = 23, Stringency = 15
   A clear diagonal line appears, indicating an area of alignment
   A few other areas are still apparent - probably long random matches




      Protein sequence matrix: Noisy
   Sequence comparison of
    amino acid sequence (same
    gene as previous example)
   Window = 1, stringency = 1
   To decrease noise due to random matches, conditions
    can be tightened




      Protein sequence matrix: Less
      noisy
   Same sequence
    comparison, tighter
    analysis conditions
   Window = 3, stringency = 2
   A single aligned region
    is visible, with a number
    of areas of random
    matches



             Evidence of repeats in a DNA
             sequence
   [Figures: dot plots at Window 1, stringency 1 (left) and Window 23, stringency 7 (right)]




Programs for Dot Matrix
Analysis
   DNA Strider (Macintosh)
   Dotter (Unix/Linux, X-Windows)
       In the lab
   DOT plots in EMBOSS
       In the lab
   PLALIGN (FASTA)
       Plots alignments found by DP method
   Dotlet
       http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html

Optimal Sequence Alignments
   Example
                   HEAGAWGHEE
                   PAWHEAE

HEAGAWGHE-E                                   HEAGAWGHE-E
P-A--W-HEAE                                   --P-AW-HEAE

   Which one is better?


Scoring
   To compare two sequence alignments,
    calculate a score
   Scoring matrix
       Provide a score for each match/mismatch
            Sometimes a mismatch is acceptable
       PAM, BLOSUM are two classes of scoring matrices
   Gap penalty
       Initiating a gap
   Gap extension penalty
       Extending a gap
         Scoring Matrix Example
   Scores (rows: letters of PAWHEAE; columns: letters of HEAGAWGHEE)
          A    E    G    H    W
     A    5   -1    0   -2   -3
     E   -1    6   -3    0   -3
     H   -2    0   -2   10   -3
     P   -1   -1   -2   -2   -4
     W   -3   -3   -3   -3   15

   • Gap penalty: -8
   • Gap extension: -4

     HEAGAWGHE-E
     --P-AW-HEAE
   (-8) + (-4) + (-1) + (-8) + 5 + 15 + (-8) + 10 + 6 + (-8) + 6 = 5

   Exercise: Calculate for
     HEAGAWGHE-E
     P-A--W-HEAE
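
One way to check such a calculation is to score the alignment column by column. The sketch below assumes the slide's partial matrix, a gap penalty of -8 for the first position of a gap and -4 for each further position; the names TABLE, COLUMNS, s and score_alignment are introduced here for illustration only.

```python
COLUMNS = "AEGHW"  # column letters of the slide's matrix
TABLE = {          # one row per letter of the lower sequence
    "A": [5, -1, 0, -2, -3],
    "E": [-1, 6, -3, 0, -3],
    "H": [-2, 0, -2, 10, -3],
    "P": [-1, -1, -2, -2, -4],
    "W": [-3, -3, -3, -3, 15],
}


def s(x_letter, y_letter):
    """Score for a letter of the upper sequence over a letter of the lower one."""
    return TABLE[y_letter][COLUMNS.index(x_letter)]


def score_alignment(top, bottom, gap_open=-8, gap_extend=-4):
    """Sum column scores; consecutive gaps in the same row count as one gap."""
    total = 0
    previous_gap_row = None
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            total += gap_extend if previous_gap_row == gap_row else gap_open
            previous_gap_row = gap_row
        else:
            total += s(a, b)
            previous_gap_row = None
    return total


print(score_alignment("HEAGAWGHE-E", "--P-AW-HEAE"))
print(score_alignment("HEAGAWGHE-E", "P-A--W-HEAE"))
```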

Formal Description
   Problem: PairSeqAlign
   Input: Two sequences        x,y
          Scoring matrix         s
          Gap penalty           d
          Gap extension penalty e

   Output: The optimal sequence alignment


How Difficult Is This?
   Consider two sequences of length n
   There are
   \[ \binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}} \]
    possible global alignments, and we
    need to find an optimal one from
    amongst those!

So what?
                                        So at n = 20, we have
                                         over 120 billion possible
                                         alignments
                                        We want to be able to
                                         align much, much
                                         longer sequences
                                             Some proteins have 1000
                                              amino acids
                                             Genes can have several
                                              thousand base pairs
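
The growth can be checked directly. This sketch evaluates the binomial coefficient (2n choose n) and the approximation 2^(2n) / sqrt(pi n); the function names are made up for this example, and math.comb needs Python 3.8 or later.

```python
import math


def count_alignments(n):
    """Exact value of the binomial coefficient (2n choose n)."""
    return math.comb(2 * n, n)


def approximate_count(n):
    """Approximation 2^(2n) / sqrt(pi * n)."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)


for n in (5, 10, 20):
    print(n, count_alignments(n), round(approximate_count(n)))
```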


Dynamic Programming
   General algorithmic development
    technique
   Reuses the results of previous
    computations
       Store intermediate results in a table for
        reuse
   Look up in table for earlier result to
    build from

     Global Alignment
        Needleman-Wunsch 1970
        Idea: Build up optimal alignment from optimal
         alignments of subsequences
        Three ways to align x1..i with y1..j
            Extend both strings at the same time:
                IGAxi
                LGVyj
            yj already aligned, align xi with a gap:
                AIGAxi
                GVyj--
            xi already aligned, align yj with a gap:
                GAxi--
                SLGVyj
Global Alignment
   Notation
       xi – ith letter of string x
       yj – jth letter of string y
       x1..i – Prefix of x from letters 1 through i
       F – matrix of optimal scores
            F(i,j) represents optimal score lining up x1..i
             with y1..j
       d – gap penalty
       s – scoring matrix

Global Alignment
   The work is to build up F
   Initialize: F(0,0) = 0, F(i,0) = id, F(0,j)=jd
   Fill from top left to bottom right using the
    recursive relation
    \[ F(i,j) = \max \begin{cases} F(i-1,j-1) + s(x_i, y_j) \\ F(i-1,j) + d \\ F(i,j-1) + d \end{cases} \]
    (d is a negative score, e.g. d = -8, so F(i,0) = id decreases with i)

          Global Alignment
   F(i,j) is reached from one of three neighboring cells
       From F(i-1,j-1), adding s(xi,yj): move ahead in both
       From F(i-1,j), adding d: xi aligned to gap
       From F(i,j-1), adding d: yj aligned to gap
   While building the table, keep track of where the
    optimal score came from, using reverse arrows

Example
            H     E     A      G     A     W     G      H     E       E

     0      -8    -16   -24    -32   -40   -48   -56    -64   -72     -80

 P   -8     -2    -9    -17    -25   -33   -41   -49    -57   -65     -73

 A   -16

 W   -24

 H   -32

 E   -40

 A   -48

 E   -56


Completed Table
            H     E     A      G     A     W     G      H     E       E

     0      -8    -16   -24    -32   -40   -48   -56    -64   -72     -80

 P   -8     -2    -9    -17    -25   -33   -41   -49    -57   -65     -73

 A   -16    -10   -3    -4     -12   -20   -28   -36    -44   -52     -60

 W   -24    -18   -11   -6     -7    -15   -5    -13    -21   -29     -37

 H   -32    -14   -18   -13    -8    -9    -13   -7     -3    -11     -19

 E   -40    -22   -8    -16    -16   -9    -12   -15    -7    3       -5

 A   -48    -30   -16   -3     -11   -11   -12   -12    -15   -5      2

 E   -56    -38   -24   -11    -6    -12   -14   -15    -12   -9      1


         Traceback
   Trace arrows back from the lower right to top left
       • Diagonal – both
       • Up – upper gap
       • Left – lower gap

          H     E     A       G      A     W     G      H     E     E
    0     -8    -16   -24     -32    -40   -48   -56    -64   -72   -80
P   -8    -2    -9    -17     -25    -33   -41   -49    -57   -65   -73
A   -16   -10   -3    -4      -12    -20   -28   -36    -44   -52   -60
W   -24   -18   -11   -6      -7     -15   -5    -13    -21   -29   -37
H   -32   -14   -18   -13     -8     -9    -13   -7     -3    -11   -19
E   -40   -22   -8    -16     -16    -9    -12   -15    -7    3     -5
A   -48   -30   -16   -3      -11    -11   -12   -12    -15   -5    2
E   -56   -38   -24   -11     -6     -12   -14   -15    -12   -9    1

   Resulting alignment
       HEAGAWGHE-E
       --P-AW-HEAE
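
The table fill and traceback can be sketched as follows. This is an illustrative version, not the course's reference code: it assumes the slide's partial scoring matrix, a single linear gap score d = -8, rows indexed by the lower sequence as in the tables, and ties broken in the order diagonal, up, left. All names are introduced here.

```python
COLUMNS = "AEGHW"
TABLE = {
    "A": [5, -1, 0, -2, -3],
    "E": [-1, 6, -3, 0, -3],
    "H": [-2, 0, -2, 10, -3],
    "P": [-1, -1, -2, -2, -4],
    "W": [-3, -3, -3, -3, 15],
}


def s(x_letter, y_letter):
    return TABLE[y_letter][COLUMNS.index(x_letter)]


def needleman_wunsch(x, y, d=-8):
    n, m = len(x), len(y)
    # F[j][i] is the best score for aligning x[:i] with y[:j].
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, n + 1):
        F[0][i] = i * d
    for j in range(1, m + 1):
        F[j][0] = j * d
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(F[j - 1][i - 1] + s(x[i - 1], y[j - 1]),
                          F[j][i - 1] + d,   # letter of x aligned to a gap
                          F[j - 1][i] + d)   # letter of y aligned to a gap
    # Traceback from the lower right corner to the top left corner.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[j][i] == F[j - 1][i - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif j > 0 and F[j][i] == F[j - 1][i] + d:
            top.append("-"); bottom.append(y[j - 1]); j -= 1   # up: upper gap
        else:
            top.append(x[i - 1]); bottom.append("-"); i -= 1   # left: lower gap
    return F[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, upper, lower = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
print(score)
print(upper)
print(lower)
```

A different tie-breaking order can return a different alignment with the same optimal score.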


Summary
   Uses recursion to fill in intermediate
    results table
   Uses O(nm) space and time
       O(n^2) when both sequences have length about n
       Feasible for moderate sized sequences, but
        not for aligning whole genomes.



Local Alignment
   Smith-Waterman (1981)
   Another dynamic programming solution
                                  0
                    F (i  1, j  1)  s ( x , y )
                   
                                            i   j
    F (i, j )  max       F (i  1, j )  d
                          F (i, j  1)  d
                   
                   
                   

Example
          H     E     A      G     A     W     G      H     E       E

     0    0     0     0      0     0     0     0      0     0       0

 P   0    0     0     0      0     0     0     0      0     0       0

 A   0    0     0     5      0     5     0     0      0     0       0

 W   0    0     0     0      2     0     20    12     4     0       0

 H   0    10    2     0      0     0     12    18     22    14      6

 E   0    2     16    8      0     0     4     10     18    28      20

 A   0    0     8     21     13    5     0     4      10    20      27

 E   0    0     6     13     18    12    4     0      4     16      26


        Traceback
   Start at highest score and traceback to first 0

        H    E    A     G     A     W      G     H     E     E
    0   0    0    0     0     0     0      0     0     0     0
P   0   0    0    0     0     0     0      0     0     0     0
A   0   0    0    5     0     5     0      0     0     0     0
W   0   0    0    0     2     0     20     12    4     0     0
H   0   10   2    0     0     0     12     18    22    14    6
E   0   2    16   8     0     0     4      10    18    28    20
A   0   0    8    21    13    5     0      4     10    20    27
E   0   0    6    13    18    12    4      0     4     16    26

   Resulting alignment
       AWGHE
       AW-HE
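
A corresponding sketch for local alignment, under the same assumptions as the scoring example (the slide's partial matrix, linear gap score d = -8, rows indexed by the lower sequence). The only changes from the global version are the extra 0 in the maximum, the start of the traceback at the highest cell, and the stop at the first 0. The names are again illustrative.

```python
COLUMNS = "AEGHW"
TABLE = {
    "A": [5, -1, 0, -2, -3],
    "E": [-1, 6, -3, 0, -3],
    "H": [-2, 0, -2, 10, -3],
    "P": [-1, -1, -2, -2, -4],
    "W": [-3, -3, -3, -3, 15],
}


def s(x_letter, y_letter):
    return TABLE[y_letter][COLUMNS.index(x_letter)]


def smith_waterman(x, y, d=-8):
    n, m = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # first row and column stay 0
    best, best_cell = 0, (0, 0)
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            F[j][i] = max(0,
                          F[j - 1][i - 1] + s(x[i - 1], y[j - 1]),
                          F[j][i - 1] + d,
                          F[j - 1][i] + d)
            if F[j][i] > best:
                best, best_cell = F[j][i], (j, i)
    # Traceback from the highest score until a 0 is reached.
    top, bottom = [], []
    j, i = best_cell
    while F[j][i] > 0:
        if F[j][i] == F[j - 1][i - 1] + s(x[i - 1], y[j - 1]):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif F[j][i] == F[j - 1][i] + d:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
        else:
            top.append(x[i - 1]); bottom.append("-"); i -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


best, upper, lower = smith_waterman("HEAGAWGHEE", "PAWHEAE")
print(best)
print(upper)
print(lower)
```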


Summary
   Similar to global alignment algorithm
   For this to work, expected match with
    random sequence must have negative score.
       Behavior is like global alignment otherwise
   Similar extensions for repeated and overlap
    matching
   Care must be given to gap penalties to
    maintain O(nm) time complexity

Scoring Matrices
   Substitutions
   Models of substitutions
       PAM
       BLOSUM
   Gap penalties




DNA




        Transitional and Transversional Nucleotide
        Substitutions
   Pyrimidines: C, T; purines: A, G
       Transitions (rate α): C <-> T and A <-> G
       Transversions (rate β): any change between a purine
        and a pyrimidine
   α & β are rates of transitional and transversional
    substitutions, respectively
   Generally, α > β
   Possible substitutions (total = 16):
       Identical (freq = O): 4
       Transitions (P): 4
       Transversions (Q): 8
   Giving us:
       p = P + Q
       R = P/Q
       R is usually between 0.5 and 2 for nuclear genes,
        higher for mitochondrial genes (up to 15)
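
The counts above can be reproduced by classifying every ordered pair of nucleotides, and P, Q and R can be estimated for a pair of aligned sequences. In this sketch P and Q are taken to be the proportions of transitional and transversional differences; the two short sequences are hypothetical and exist only to show the arithmetic.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label a pair of nucleotides as identical, transition or transversion."""
    if a == b:
        return "identical"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def transition_transversion(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional sites."""
    labels = [classify(a, b) for a, b in zip(seq1, seq2)]
    return (labels.count("transition") / len(labels),
            labels.count("transversion") / len(labels))


# All 16 ordered pairs of nucleotides.
all_pairs = [classify(a, b) for a in "ACGT" for b in "ACGT"]
counts = {label: all_pairs.count(label) for label in set(all_pairs)}
print(counts)

# Hypothetical aligned sequences of equal length, no gaps.
P, Q = transition_transversion("AAGGCCTT", "AGGACTTA")
print(P, Q, P + Q, P / Q)
```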


     Synonymous and Non-
     synonymous substitutions
    Synonymous                                   Non-synonymous
Thr Tyr Leu Leu                                  Thr Tyr Leu Leu
ACC TAT TTG CTG                                  ACC TAT TTG CTG

ACC TAC TTG CTG                                  ACC TCT TTG CTG
Thr Tyr Leu Leu                                  Thr Ser Leu Leu



   Synonymous substitutions more likely to occur
        Preserve AA

Categories of Amino Acids
Basic       Acidic               Polar                Nonpolar

Lys         Asp                  Ser                  Gly       Ile
Arg         Glu                  Thr                  Ala       Pro
His                              Tyr                  Val       Cys
                                 Asn                  Leu       Met
                                 Gln                  Phe       Trp

Grouped according to properties of side chain
Amino Acid Substitutions
   Tend to preserve chemical similarity
   Tend to preserve structure
   Tend to preserve function
   More frequent in non-functional
    domains



Models of Substitution
   Percent Accepted Mutation (PAM)
       Dayhoff 1978
       “Accepted mutation”: a change accepted by natural selection
       PAM1 represents evolutionary divergence where 1% of
        amino acids change
   Blocks Amino Acid Substitution Matrices (BLOSUM)
       Henikoff and Henikoff 1992
       Observed AA substitutions in conserved AA blocks
       Maximum level of identity, BLOSUM62 represents 62%
        identity



      PAM
   Markov model
       One state for each amino acid (C, S, T, P, …)
       Each arrow carries the probability of transitioning
        from one state to another
       pst = pts

PAM
   Assumes substitutions are independent
   pxy is calculated from observations
       1572 changes in 71 groups of proteins
       Organized into phylogenetic trees
       Changes counted
       Divided by normalizing factor
   The probabilities are stored in a matrix
       Probability form
   PAM1 represents 10 my (million years) of evolutionary distance
       PAMN is derived as (PAM1)^N because a Markov model is used


        PAM1 for DNA
   [Figure: state diagram over C, T, A, G with self-transition
    probability 0.99 and substitution probability 0.00333]
   Uniform Model
                A          G          T          C
       A     0.99
       G     0.00333    0.99
       T     0.00333    0.00333    0.99
       C     0.00333    0.00333    0.00333    0.99
   Higher Transitions
                A          G          T          C
       A     0.99
       G     0.006      0.99
       T     0.002      0.002      0.99
       C     0.002      0.002      0.006      0.99
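
Because the model is a Markov chain, a matrix for a larger distance is obtained by multiplying PAM1 by itself. The sketch below does this for the uniform DNA model using plain Python lists; the helper names matmul and pam_n are invented for this example, and the full symmetric matrix is assumed from the triangular table.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def pam_n(pam1, n):
    """Return PAM1 raised to the n-th power (n = 0 gives the identity)."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, pam1)
    return result


OFF_DIAGONAL = 0.00333
PAM1_UNIFORM = [[0.99 if i == j else OFF_DIAGONAL for j in range(4)]
                for i in range(4)]

pam2 = pam_n(PAM1_UNIFORM, 2)
pam250 = pam_n(PAM1_UNIFORM, 250)
print([round(value, 4) for value in pam250[0]])
```

As N grows, the diagonal entries shrink and every row drifts toward equal probabilities for the four nucleotides.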



BLOSUM
   ~2000 conserved amino acid patterns
       Blocks: ungapped patterns
       3-60 AA long
   >500 families of related proteins
   Software
       MOTIF (H. Smith et al. 1990)
       PROTOMAT (Henikoff and Henikoff)

      Computing BLOSUM Scores
   Example column from a block: A, L, A, S, A, A
   Consider all pairs (don’t know ancestor)
       fAA=3+2+1=6; fAL=4; fAS=4; fLS=1
   Calculate frequency of occurrence
       qAA=fAA/(fAA+fAL+fAS+fLS) = 0.4
   Calculate expected frequency of being in a pair
       pA=(qAA+qAS/2+qAL/2)=0.67
   Calculate expected frequency of a pair
       eAA=pA*pA=0.44
   Matrix entry for pair
       mAA = qAA/eAA = 0.9
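
The same steps can be followed in code for the single column A, L, A, S, A, A. This is a sketch of the arithmetic only, for one column and the pair AA; the variable names mirror the f, q, p, e and m symbols above and are otherwise arbitrary.

```python
from collections import Counter
from itertools import combinations

column = "ALASAA"

# f: count every unordered pair of letters in the column.
f = Counter("".join(sorted(pair)) for pair in combinations(column, 2))

# q: observed frequency of each pair.
total_pairs = sum(f.values())
q = {pair: count / total_pairs for pair, count in f.items()}


def letter_frequency(letter):
    """Expected frequency of a letter being in a pair (p in the slide)."""
    same = q.get(letter * 2, 0.0)
    mixed = sum(value for pair, value in q.items()
                if letter in pair and pair != letter * 2)
    return same + mixed / 2


p_A = letter_frequency("A")
e_AA = p_A * p_A
m_AA = q["AA"] / e_AA
print(dict(f), round(q["AA"], 2), round(p_A, 2), round(e_AA, 2), round(m_AA, 2))
```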

Log Odds Scoring
   Each of the previous matrices is converted to a log
    odds matrix
       DP algorithm based on addition
       log(xy)=log(x)+log(y)
       Compares real occurrence with random occurrence.
   BLOSUM
       sAA=log2(qAA/eAA) * 2 = -0.304 (will be rounded)
   PAM1 DNA (uniform)
       sCT = log2(pC MCT / (pC pT))
        = log2(0.25 * 0.00333 / 0.25^2)
        = -6.23
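
Both log odds values can be checked with a few lines of Python. The numbers are taken from the slides (qAA/eAA = 0.9, background frequency 0.25 for each nucleotide, substitution probability 0.00333); the variable names are illustrative.

```python
import math

# BLOSUM-style score: twice the base-2 log of the observed/expected ratio.
ratio_AA = 0.9
s_AA = 2 * math.log2(ratio_AA)

# PAM1 DNA (uniform model): probability of C changing to T, relative to chance.
p_C = 0.25
p_T = 0.25
M_CT = 0.00333
s_CT = math.log2(p_C * M_CT / (p_C * p_T))

print(round(s_AA, 3), round(s_CT, 2))
```

Working with logarithms lets the dynamic programming algorithm add scores instead of multiplying probabilities.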


            The PAM250 Matrix
   [Figure: PAM250 matrix, with amino acids arranged by group:
    sulfhydryl; small hydrophilic; acid, acid amide and hydrophilic;
    basic; small hydrophobic; aromatic]
   Note:
       High values on diagonal
       High values for similar groups
   Each matrix value is calculated by first dividing the frequency of
    change, for each amino acid pair, in related proteins separated by one
    step in an evolutionary tree by the probability of a chance alignment
    based on the frequency of the amino acids. The ratios are expressed as
    logarithms to the base 10 (approx. 1/3 bit values).

      The Blosum62 Matrix
   [Figure: BLOSUM62 matrix, with amino acids arranged by group:
    sulfhydryl; small hydrophilic; acid, acid amide and hydrophilic;
    basic; small hydrophobic; aromatic]
 Each entry is the actual frequency of occurrence of the amino acid pair in the blocks database, clustered at the
 62% level, divided by the expected probability of occurrence. The expected value is calculated from the
 frequency of occurrence of each of the two individual amino acids in the blocks database, and provides a
 measure of a chance alignment of the two amino acids. The actual/expected ratio is expressed as a log odds. A
 zero score means that the frequency of the amino acid pair in the database was as expected by chance, a
 positive score that the pair was found more often than by chance, and a negative score that the pair was found
 less often than by chance.

Selecting Matrices
   PAM
       Mutational model of evolution
       Tracks evolutionary origins of proteins/sequences
       Use lower numbers for evolutionarily close sequences,
        higher numbers for distant sequences
   BLOSUM
       No model of evolution, conserved AA motifs
       Designed to find conserved domains
       Similar sequences, use higher numbers,
       Divergent sequences, use lower numbers.



GAP Penalties
   Recall d is gap opening penalty, e is gap extension
    penalty
       Total gap penalty wx=d+e(x-1)
   In order to make things work properly, need affine
    gap function (Smith et al. 1981)
       wx ≤ dx
       Any affine function works
       For the linear function above, e ≤ d
   Typical gap penalties (Mount p.142)
       BLOSUM50 d=15, e=8-15
       PAM250 d=15, e=5-15
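
A small sketch of the total gap penalty wx = d + e(x-1). It assumes d and e are given as positive magnitudes, as in the typical values listed above (d = 15, with e = 8 picked from the quoted range); the function name is hypothetical.

```python
def affine_gap_penalty(length, d=15, e=8):
    """Total penalty for a gap of `length` positions: d + e * (length - 1)."""
    return d + e * (length - 1)


for x in (1, 2, 3, 10):
    # With e <= d the affine penalty never exceeds the linear penalty d * x.
    print(x, affine_gap_penalty(x), 15 * x)
```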


Summary and Conclusions
   Similar sequences arise naturally
   Pairwise sequence alignment used to
    compare similarity of two sequences
   Dot matrix analysis is a visual technique for
    sequence alignment
   Dynamic programming is used for global and
    local alignments
   Scoring matrices based on biological
    assumptions


				