Sequence Alignment Scoring Functions and Affine Gap

     Sequence Alignment: Scoring Functions,
     N-W and S-W Affine Gap Penalties
                  Saurabh Sinha
                    02/05/2008
        Department of Computer Science
     University of Illinois Urbana-Champaign

     Scribed By: Chandrasekar Ramachandran
Contents
   Introduction
   Interpretations
   Types of Alignments
   Techniques for Solving
       Dynamic Programming
       Probabilistic Methods
   Scoring Functions
   N-W and S-W Affine Gap Penalties
Introduction
   Sequence Alignment:
      Ways of Arranging One Sequence (DNA, RNA, Protein)
       Against Another to Determine Whether a Region Has Been
       Conserved in Evolution or Has a Common Evolutionary
       Origin
   Strings of Letters
   Matrix Representation:

       -  G  G  C  C  A  G  G  A  T  T
       G  G  G  C  C  -  G  G  -  T  T
Interpretations

   Mismatches?
       Point Mutations: Replacement of a Single Base
        Nucleotide
       Categorized as Transitions and Transversions
   Gaps?
       Indels or Insertion/Deletion Mutations
       Can Produce Frameshift Mutations Unless the
        Indel Length is a Multiple of 3
       Introduced in one or both lineages
Interpretations(Contd.)
   What about Amino Acids?
       Degree of Similarity
            Estimates Conservation
            If Conservation is High:
                  Indicates Region of High Importance

   Estimating Similar Functional Roles:
       By Assessing Similarity of Base Pairing
        Solving Sequence Alignment Problems

   Dynamic Programming
       Initialization
       Matrix Fill or Scoring
       Traceback
   Probabilistic Methods
       Bayesian Methods for HMM
       Likelihood Derivatives and Fisher Scores
       Training and Model Comparison
     Needleman-Wunsch Algorithm (Global Alignment)
   Scores for Aligned Characters Specified by a
    Similarity Matrix
   Example:
       Sequence 1: -CCGCTTACCTA
       Sequence 2: TTCCGCTTATTA
       Possible Alignments:
          Sequence 1:-CCGCTTACCTA

          Sequence 2:-CCGCTTA- - - -

   Score Matches, Mismatches and Indels Separately
          Global Alignment (Contd.)
   The Scoring Matrix is Called F-Matrix
   Each (i, j) Entry Denoted by F_ij
   Running Time:
       For Sequences of Size a and b, O(ab)
   Summary:
       Initialization: Fill in Base Cases in Topmost Row and
        Leftmost Column
       Matrix Fill: Score Partial Alignments
       Traceback: Follow the Pointer Matrix Back to the Initial
        Cell to Get the Best Solution
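
The three steps summarized above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes a simple match/mismatch score and a linear gap penalty, and the function name and default scores are hypothetical choices.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]       # F-matrix of best scores
    ptr = [[None] * (m + 1) for _ in range(n + 1)]  # pointer matrix for traceback

    # Initialization: base cases in the leftmost column and topmost row.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * gap, "left"

    # Matrix fill: each cell extends the best of three partial alignments.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + s, "diag"),
                (F[i - 1][j] + gap, "up"),
                (F[i][j - 1] + gap, "left"),
            )

    # Traceback: follow pointers from the bottom-right cell to (0, 0).
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


score, a, b = needleman_wunsch("CCGCTTACCTA", "TTCCGCTTATTA")
print(score)
print(a)
print(b)
```

Both loops visit every cell once and do constant work per cell, which is where the O(ab) running time comes from.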
Smith-Waterman Algorithm (Local Alignment)
   Involving Stretches Shorter than the Entire
    Sequence Length
   Generally involves Sequences which are
    significantly dissimilar
   Negative Scoring Matrix Cells are Set to Zero
   Backtracking starts at highest scoring cell and
    continues to a cell with zero score
   Prerequisite: Expected Score of a Random Alignment
    Must be Negative
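
A minimal Python sketch of the local alignment procedure described above. The function name and the default scores are assumptions made for illustration; with match = 2 and mismatch = -1 on uniformly distributed DNA the expected score per aligned pair is 0.25 * 2 + 0.75 * (-1) = -0.25, which is negative as the prerequisite asks.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Local alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay at zero
    best, best_pos = 0, (0, 0)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # Negative cells are set to zero: a local alignment may start anywhere.
            F[i][j] = max(0,
                          F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Backtracking starts at the highest-scoring cell and stops at a zero cell.
    row_x, row_y = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("-"); i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1]); j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```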
        Scoring Functions - Overview
   Given sequences, a number is associated with
    each alignment
       E.g. Matches: +x, Mismatches: -y, Gaps: -z
       Scoring Function: (x × #Matches) - (y × #Mismatches) -
        (z × #Gaps)
       Alignment Scores:
            Sum of Substitution Scores and Gap Penalties
   Residue-Based
   Substitution Matrices:
       Protein
       Evolutionary
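
The match/mismatch/gap scoring function above can be applied to a finished alignment as sketched below. The function name and the default values of x, y and z are assumptions chosen only for illustration.

```python
def score_alignment(row1, row2, x=1, y=1, z=2):
    """Return x * #matches - y * #mismatches - z * #gaps for two aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1          # column containing a gap character
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return x * matches - y * mismatches - z * gaps


# The alignment from the Introduction: 8 matches, 0 mismatches, 3 gaps.
print(score_alignment("-GGCCAGGATT", "GGGCC-GG-TT"))  # 8*1 - 0*1 - 3*2 = 2
```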
      Simple Substitution Matrices

   Expresses How Likely One Character in a
    Sequence is to Change into Each Other
    Character State
   N X N Matrix where: N=4 for DNA and 20
    for Amino Acids
   Another way would be to consider A, G as
    Purines and T, C as Pyrimidines
   Transversions (Purine <-> Pyrimidine) less likely
    to occur than Transitions (within one class)
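
A purine/pyrimidine-aware 4 x 4 DNA matrix can be sketched as below. It assumes that substitutions within a class (transitions) are penalized less than substitutions across classes (transversions); the numeric scores and function names are illustrative assumptions, not values from the lecture.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_score(a, b, match=1, transition=-1, transversion=-2):
    """Score one aligned pair of DNA letters (illustrative values)."""
    if a == b:
        return match
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def build_matrix(alphabet="ACGT"):
    """Build the N x N matrix as a dict keyed by (row letter, column letter)."""
    return {(a, b): substitution_score(a, b) for a in alphabet for b in alphabet}


matrix = build_matrix()
for a in "ACGT":
    print(a, [matrix[(a, b)] for b in "ACGT"])
```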
          Minimum Entropy Scoring Function
   Minimum Entropy Score:
       Sum of Entropy Scores Computed For Each Column

                     S(m_i) = -\sum_a c_{ia} \log_2 p_{ia}
   Here,
       i is a column
       c_ia is the count of letter a in column i
       p_ia is the inferred probability of letter a in column i
       Gap Characters: Treated as Residue Symbols
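
The per-column entropy score can be computed as in the sketch below. It assumes the probability of each letter is inferred as its observed frequency in the column (pseudocounts are a common alternative) and treats the gap character as just another symbol; the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy_score(column):
    """-sum_a c_a * log2(p_a) for one alignment column."""
    counts = Counter(column)
    total = len(column)
    # Assumption: p_a is estimated as the observed frequency c_a / total.
    return -sum(c * math.log2(c / total) for c in counts.values())


def alignment_entropy_score(rows):
    """Sum of the column scores over a multiple alignment given as equal-length rows."""
    return sum(column_entropy_score(col) for col in zip(*rows))


rows = ["AC", "AC", "AG", "AG"]
print(alignment_entropy_score(rows))  # column 1 is fully conserved, column 2 is split 2/2
```

A fully conserved column scores 0, so lower totals indicate better-conserved alignments.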
                     Gap Functions
   Gaps More Likely to Occur in Groups
   Examples:
       Convex Gap Scoring Functions
       Affine Gap Functions
   Convex Gap Scoring Functions:
       Penalty for Each Additional Gap Position Decreases as Gaps Get Longer
       \gamma(n): for all n, \gamma(n + 1) - \gamma(n) \le \gamma(n) - \gamma(n - 1)
       Now F(i,j) = \max \{ F(i-1,j-1) + s(x_i, y_j),
                      \max_{k=0 \ldots i-1} F(k,j) - \gamma(i-k),
                      \max_{k=0 \ldots j-1} F(i,k) - \gamma(j-k) \}
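
The recurrence above, with an arbitrary gap penalty function, can be sketched as follows. The gap function is passed in as a callable (named gamma here), and the example penalty 2 + log2(n) is an assumption picked only because its increments shrink as n grows; only the score is computed, not the alignment itself.

```python
import math


def align_general_gap(x, y, gamma, match=1, mismatch=-1):
    """Global alignment score where a gap of length n costs gamma(n)."""
    n, m = len(x), len(y)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gamma(i)  # x[0:i] aligned against one long gap
    for j in range(1, m + 1):
        F[0][j] = -gamma(j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            best = F[i - 1][j - 1] + s
            # A gap of any length i - k may end here, so every k is tried.
            for k in range(i):
                best = max(best, F[k][j] - gamma(i - k))
            for k in range(j):
                best = max(best, F[i][k] - gamma(j - k))
            F[i][j] = best
    return F[n][m]


def example_gamma(length):
    """Illustrative penalty: gamma(1) = 2, gamma(2) = 3, then slower growth."""
    return 2 + math.log2(length)


print(align_general_gap("ACGT", "AT", example_gamma))  # 2 matches - gamma(2) = -1.0
```

The two inner loops over k add a linear factor to every cell, which is the source of the cubic running time.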
               Affine Gap Functions
   Shortcomings of a general gap penalty function:
        Different Penalties for Additional Gaps
        Cubic Time for Updating Entries
   Example:




   First Gap Penalized Differently, Subsequent Gaps Penalized
    Linearly
       \gamma(x) = e + (x - 1)d, \quad x \ge 1
       (e: Penalty for the First Gap, d: Penalty for Each Subsequent Gap)
   3 Matrices Computed Simultaneously
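
A minimal sketch of the three-matrix computation for affine gaps, in the style of Gotoh's algorithm. The matrix names (M, Ix, Iy), the parameter names gap_open and gap_extend, and all default scores are assumptions for illustration: gap_open is charged for the first position of a gap and gap_extend for each further position. Only the optimal score is returned.

```python
NEG_INF = float("-inf")


def affine_global_score(x, y, match=1, mismatch=-1, gap_open=3, gap_extend=1):
    """Global alignment score with affine gap penalties."""
    n, m = len(x), len(y)
    # M: x[i] aligned to y[j]; Ix: x[i] aligned to a gap; Iy: y[j] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Iy[0][j] = -(gap_open + (j - 1) * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1]) + s
            # Opening a gap costs gap_open; continuing the same gap costs gap_extend.
            Ix[i][j] = max(M[i - 1][j] - gap_open,
                           Ix[i - 1][j] - gap_extend,
                           Iy[i - 1][j] - gap_open)
            Iy[i][j] = max(M[i][j - 1] - gap_open,
                           Iy[i][j - 1] - gap_extend,
                           Ix[i][j - 1] - gap_open)
    return max(M[n][m], Ix[n][m], Iy[n][m])


print(affine_global_score("ACGT", "AT"))  # 2 matches - (3 + 1) = -2
```

Each cell in each matrix needs only a constant number of neighbours, so the running time returns to O(ab) instead of the cubic cost of a general gap function.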
                                References
1.   http://webcourse.cs.technion.ac.il/236522/Winter2005-2006/ho/WCFiles/tutorial03.ppt
2.   http://engr.smu.edu/~saad/courses/cse8354/lectures/lecture6.pdf
3.   http://www.bioinfo.org.cn/lectures/index-13.html
4.   Needleman, S.B. and Wunsch, Ch.D. (1970) A general method applicable to the search for
     similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.
5.   Smith, T.F. and Waterman, M.S. (1981) Comparison of Biosequences. Adv. appl. Math., 2, 482-
     489.
6.   Dayhoff,M.O., Barker,W.C. and Hunt,L.T. (1983) Establishing Homologies in Protein Sequences.
     Methods Enzymol., 91, 524-545.
7.   Gotoh, O. (1982) An Improved Algorithm for Matching Biological Sequences. J. Mol. Biol., 162,
     705-708.

						