Posted on 9/24/2012. Public Domain.
Sequence comparison: Local alignment
Genome 559: Introduction to Statistical and Computational Genomics
Prof. William Stafford Noble

One-minute responses
• It would be helpful to somehow get the solutions for the sample problems in our lecture printouts.
  – You can see the solutions by visiting the class web page and opening the slides there.
• I am still a little confused about the difference between strings and lists.
  – A string is like a tuple of characters. Unlike a string, a list is (1) mutable and (2) may contain other objects besides characters.
• The Biotechniques paper went into a lot of detail -- how much of this should we understand?
  – I intend the paper to provide background for those who are interested. You should be sure you understand just what I go over in lecture.
• I am slightly worried because I never seem to do things in the most straightforward way.
  – This just takes practice. Often, there is no single best way.

One-minute responses
• There was perhaps a bit too much programming in this class.
• There was more class time for Python, which was nice.
• I really liked the sample problem times.
• Class moved at a good speed today.
• I enjoyed the pace today.
• Today's pace was good.
• The pace was good -- it was helpful for me to have more time for problems.
• Good pace.
• Programming problems were a good speed today.
• The biostats portion was a little fast but manageable.
• I had somewhat more difficulty with today's exercises. I think it was due to the inherent complexity of adding new types to the repertoire.
• Problem set is very reasonable.
• The examples and practice are most useful teaching methods for me at least. I am getting comfortable with the code through practice.
• I like the sample problems. In the last few classes I felt rushed to finish them, but this time I was able to do all 3. It's very satisfying when they work.

One-minute responses
• The cheat sheet really helped.
• I really liked the list of operations and methods on the back of the lecture notes.
• Lists of commands in slides were helpful.
• Reviewing the DP matrix was very helpful.
• I'm glad we reviewed the Needleman-Wunsch algorithm.
• The traceback review helped me realize I'd forgotten how to do it.

Local alignment
• A single-domain protein may be homologous to a region within a multi-domain protein.
• Usually, an alignment that spans the complete length of both sequences is not required.

BLAST allows local alignments
[Figure: a global alignment compared with a local alignment]

Global alignment DP
• Align sequences x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d
\end{cases}
\]

Local alignment DP
• Align sequences x and y.
• F is the DP matrix; s is the substitution matrix; d is the linear gap penalty.

\[
F(0,0) = 0, \qquad
F(i,j) = \max
\begin{cases}
F(i-1,j-1) + s(x_i, y_j) \\
F(i-1,j) + d \\
F(i,j-1) + d \\
0
\end{cases}
\]

Local DP in equation form
[Diagram: F(i,j) is reached from F(i-1,j-1) by adding s(x_i, y_j), from F(i-1,j) by adding d, from F(i,j-1) by adding d, or is set to 0.]

A simple example
Find the optimal local alignment of AAG and AGC.
Use a gap penalty of d=-5.

Substitution matrix:

         A    C    G    T
    A    2   -7   -5   -7
    C   -7    2   -7   -5
    G   -5   -7    2   -7
    T   -7   -5   -7    2

DP matrix (AAG across the top, AGC down the side), with the first row and column set to 0:

              A    A    G
         0    0    0    0
    A    0
    G    0
    C    0

First cell: the candidates are 0 + 2 = 2 (diagonal), 0 - 5 = -5 (left), 0 - 5 = -5 (above), and 0, so F(1,1) = 2.

              A    A    G
         0    0    0    0
    A    0    2
    G    0    ?
    C    0    ?

              A    A    G
         0    0    0    0
    A    0    2    ?    ?
    G    0    0    ?    ?
    C    0    0    ?    ?

Completed matrix:

              A    A    G
         0    0    0    0
    A    0    2    2    0
    G    0    0    0    4
    C    0    0    0    0

Local alignment
• Two differences with respect to global alignment:
  – No score is negative.
  – Traceback begins at the highest score in the matrix and continues until you reach 0.
• Global alignment algorithm: Needleman-Wunsch.
• Local alignment algorithm: Smith-Waterman.

A simple example
Find the optimal local alignment of AAG and AGC.
Tracing back from the highest score (4) gives the alignment:

    AG
    AG

Local alignment
Find the optimal local alignment of AAG and GAAGGC.
Use a gap penalty of d=-5 and the same substitution matrix.

DP matrix (AAG across the top, GAAGGC down the side), with the first row and column set to 0:

              A    A    G
         0    0    0    0
    G    0
    A    0
    A    0
    G    0
    G    0
    C    0

Completed matrix:

              A    A    G
         0    0    0    0
    G    0    0    0    2
    A    0    2    2    0
    A    0    2    4    0
    G    0    0    0    6
    G    0    0    0    2
    C    0    0    0    0

Tracing back from the highest score (6) gives the alignment:

    AAG
    AAG

Summary
• Local alignment finds the best match between subsequences.
• Smith-Waterman local alignment algorithm:
  – No score is negative.
  – Trace back from the largest score in the matrix.
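
The two rules in the summary can be checked against the worked examples with a short Python sketch. This is only an illustration, not code from the lecture: the names `BASES`, `SCORES`, `S` and `local_align` are made up here, and the sketch assumes the slides' convention that d is negative and is added to the score. When two traceback moves tie, it arbitrarily prefers the diagonal.

```python
# Substitution matrix from the slides, as a dict of dicts: S["A"]["G"] == -5.
BASES = "ACGT"
SCORES = [
    [2, -7, -5, -7],
    [-7, 2, -7, -5],
    [-5, -7, 2, -7],
    [-7, -5, -7, 2],
]
S = {a: {b: SCORES[i][j] for j, b in enumerate(BASES)}
     for i, a in enumerate(BASES)}


def local_align(x, y, s=S, d=-5):
    """Return (best score, aligned x, aligned y, F) for a local alignment.

    Hypothetical helper: F[i][j] is the best score of a local alignment
    ending at x[i-1] and y[j-1]; d is a negative linear gap penalty.
    """
    n, m = len(x), len(y)
    # First row and column stay 0, as in the slides.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                0,                                        # no score is negative
                F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]],  # match or mismatch
                F[i - 1][j] + d,                          # gap in y
                F[i][j - 1] + d,                          # gap in x
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)

    # Trace back from the highest score until a 0 is reached.
    i, j = best_pos
    ax, ay = [], []
    while F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s[x[i - 1]][y[j - 1]]:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay)), F


score, ax, ay, F = local_align("AAG", "AGC")
print(score, ax, ay)  # 4 AG AG
score, ax, ay, F = local_align("AAG", "GAAGGC")
print(score, ax, ay)  # 6 AAG AAG
```

Here F[i][j] uses i for the position in x (across the top of the slide matrices) and j for the position in y (down the side), so a row of a slide matrix corresponds to a fixed j.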