
Scoring Matrices

Different types of matrices are used:
PSSM = Position-Specific Scoring Matrix
PAM (Point Accepted Mutation) matrices
BLOSUM = BLOcks SUbstitution Matrix

Position-Specific Scoring Matrix

A PSSM is a motif descriptor. The descriptor includes a weight (score, probability) for each symbol occurring at each position along the motif.

Examples of motifs: protein active sites, structural elements, zinc fingers, intron/exon boundaries, transcription-factor binding sites, etc.

Constructing a PSSM is a multi-stage process:
1. Define the architecture of the matrix.
2. Create the multiple alignment from which the matrix is derived.
3. Calculate the frequencies for each position.
4. Apply BLAST using the PSSM.

Example: 10 vertebrate donor-site sequences aligned at the exon/intron boundary

seq 1   GAGGTAAAC
seq 2   TCCGTAAGT
seq 3   CAGGTTGGA
seq 4   ACAGTCAGT
seq 5   TAGGTCATT
seq 6   TAGGTACTG
seq 7   ATGGTAACT
seq 8   CAGGTATAC
seq 9   TGTGTGAGT
seq 10  AAGGTAAGT

Calculate the absolute frequency of each nucleotide at each position:

Pos  1  2  3  4  5  6  7  8  9
A    3  6  1  0  0  6  7  2  1
C    2  2  1  0  0  2  1  1  2
G    1  1  7 10  0  1  1  5  1
T    4  1  1  0 10  1  1  2  6
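The counting step can be checked with a minimal Python sketch. It tallies each aligned column of the 10 donor sites, and it also includes the relative-frequency and product steps that the slides describe next. The helper names (count_matrix, frequency_matrix, sequence_probability) are hypothetical, and the sketch assumes, like the slides, that positions are independent and that raw frequencies are used with no pseudocounts.

```python
from collections import Counter
from math import prod

# The 10 vertebrate donor-site sequences from the slide, aligned at the exon/intron boundary.
SEQS = [
    "GAGGTAAAC", "TCCGTAAGT", "CAGGTTGGA", "ACAGTCAGT", "TAGGTCATT",
    "TAGGTACTG", "ATGGTAACT", "CAGGTATAC", "TGTGTGAGT", "AAGGTAAGT",
]
ALPHABET = "ACGT"


def count_matrix(seqs):
    """Absolute frequency of each nucleotide at each aligned position."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    # One Counter per column; Counter returns 0 for nucleotides that never occur.
    columns = [Counter(s[i] for s in seqs) for i in range(length)]
    return {b: [col[b] for col in columns] for b in ALPHABET}


def frequency_matrix(counts, n_seqs):
    """Relative frequency: each count divided by the number of sequences."""
    return {b: [c / n_seqs for c in row] for b, row in counts.items()}


def sequence_probability(freqs, seq):
    """Product of the per-position frequencies (positions treated as independent)."""
    return prod(freqs[b][i] for i, b in enumerate(seq))


counts = count_matrix(SEQS)
for base in ALPHABET:
    print(base, *counts[base])

freqs = frequency_matrix(counts, len(SEQS))
print(sequence_probability(freqs, "CAGGTTGGA"))
```

The printed counts should match the absolute-frequency table row for row, and the score for CAGGTTGGA should come out at about 4.2e-05 (up to floating-point rounding). Because no pseudocounts are added, any sequence without G at position 4 or T at position 5 scores exactly 0; practical PSSMs usually add pseudocounts and use log-odds scores, which this sketch does not attempt.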
Calculate the relative frequency of each nucleotide at each position (the absolute frequency divided by the number of sequences, 10):

Pos    1    2    3    4    5    6    7    8    9
A    0.3  0.6  0.1    0    0  0.6  0.7  0.2  0.1
C    0.2  0.2  0.1    0    0  0.2  0.1  0.1  0.2
G    0.1  0.1  0.7    1    0  0.1  0.1  0.5  0.1
T    0.4  0.1  0.1    0    1  0.1  0.1  0.2  0.6

What is the probability of finding CAGGTTGGA? Treating the positions as independent, it is the product of the relative frequencies of each nucleotide at its position: C is 0.2 at position 1, A is 0.6 at position 2, and so on:

0.2 * 0.6 * 0.7 * 1 * 1 * 0.1 * 0.1 * 0.5 * 0.1 = 0.000042

HMM (Hidden Markov Model)

HMMs and Their Usage
HMMs are very common in computational linguistics:
Speech recognition (observed: acoustic signal; hidden: words)
Handwriting recognition (observed: image; hidden: words)
Part-of-speech tagging (observed: words; hidden: part-of-speech tags)
Machine translation (observed: foreign words; hidden: words in the target language)

Hidden Markov Model (HMM)
HMMs allow you to estimate the probabilities of unobserved events: given the observed surface data (for example, plain text), which underlying hidden parameters generated it? For example, in speech recognition the observed data is the acoustic signal and the words are the hidden parameters.

Markov Chains
Given a finite discrete set S of possible states, a Markov chain process occupies one of these states at each unit of time. At each step the process either stays in the same state or moves to another state in S. These moves happen stochastically rather than deterministically, and the probability of the next state depends only on the current state.

A Simple Example
Consider a 3-state Markov model of the weather.
We assume that once a day the weather is observed as being in one of the following states: rainy or snowy, cloudy, or sunny. We postulate that on day t the weather is characterized by exactly one of these three states, and we are given the transition probability matrix A (rows: today's weather; columns: tomorrow's weather; each row sums to 1):

From \ To    rain/snow  cloudy  sunny
rain/snow       0.4       0.3    0.3
cloudy          0.2       0.6    0.2
sunny           0.1       0.1    0.8

Given that the weather on day 1 is sunny, what is the probability that the weather for the next 7 days will be "sun-sun-rain-rain-sun-cloudy-sun"?

Hidden?
What if each state does not correspond to an observable (physical) event?

The Structure of a Profile HMM
Squares: main states
Diamonds: insert states
Circles: delete states (silent states)

[Figure: a hidden Markov model with nodes 1 to 6 and an insertion node]

First Three and Last Three Columns
Alignment of five sequences (- marks a gap):

A C A - - - A T G
T C A A C T A T C
A C A C - - A G C
A G A - - - A T C
A C C G - - A T C

Column 1 contains 4 A's and 1 T, so the probability for A is 0.8 and the probability for T is 0.2.

Insertions
Columns 4, 5, and 6 are the insertions. At the fourth column, 3 of the 5 sequences have insertions, so the probability of the transition from the third node to the insertion node is 0.6. The insertion node contains 1 A, 2 C's, 1 G, and 1 T, so the probabilities of A, C, G, and T are 0.2, 0.4, 0.2, and 0.2.

Two Uses of a Markov Model
Generate sequences according to the probabilities.
Compute the probability of a sequence.

[Figure: a Markov model generating random DNA sequences, with states A, C, G, and T between a begin and an end state]

A Good Introduction to HMMs
The examples in the following slides are taken from:
Anders Krogh, "An introduction to hidden Markov models for biological sequences", in Computational Methods in Molecular Biology, edited by S. L. Salzberg, D. B. Searls, and S. Kasif, pages 45-63, Elsevier, 1998.
http://www.binf.ku.dk/users/krogh/publications/ps/Krogh98a.pdf
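The two uses of a Markov model (computing the probability of a sequence and generating sequences) can be illustrated on the weather example with a minimal Python sketch. The states and the transition matrix are taken from the weather slide; the convention that rows are today's state and columns are tomorrow's state, and the helper names path_probability and generate, are assumptions of this sketch.

```python
import random

# States in the order listed on the slide: rain/snow, cloudy, sunny.
STATES = ["rain", "cloudy", "sun"]
# Assumed convention: A[i][j] = P(tomorrow = STATES[j] | today = STATES[i]).
A = [
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.1, 0.1, 0.8],
]
INDEX = {s: i for i, s in enumerate(STATES)}


def path_probability(start, path):
    """P(path | weather on day 1 is `start`): product of successive transitions."""
    p = 1.0
    prev = INDEX[start]
    for state in path:
        cur = INDEX[state]
        p *= A[prev][cur]
        prev = cur
    return p


def generate(start, n_days, rng=random):
    """Sample the weather for the next n_days, starting from `start`."""
    days = []
    current = INDEX[start]
    for _ in range(n_days):
        # Draw tomorrow's state using today's row of A as the weights.
        current = rng.choices(range(len(STATES)), weights=A[current])[0]
        days.append(STATES[current])
    return days


forecast = ["sun", "sun", "rain", "rain", "sun", "cloudy", "sun"]
print(path_probability("sun", forecast))
print(generate("sun", 7, random.Random(0)))
```

For the forecast in the slide's question, the transitions are sun to sun, sun to sun, sun to rain, rain to rain, rain to sun, sun to cloudy, and cloudy to sun, so the probability is 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 = 1.536e-4 (printed with possible floating-point rounding). Passing a seeded random.Random makes the generated sequence reproducible.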

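The column statistics on the profile HMM slides can be reproduced from the five-sequence alignment with a short Python sketch. It assumes, as the slides do, that columns 4 to 6 form the insertion, that gaps are ignored when counting residues, and that raw frequencies are used with no pseudocounts; column_emissions and insertion_stats are hypothetical helper names.

```python
from collections import Counter

# The five aligned sequences from the slide; '-' marks a gap.
ALIGNMENT = [
    "ACA---ATG",
    "TCAACTATC",
    "ACAC--AGC",
    "AGA---ATC",
    "ACCG--ATC",
]
# Columns 4, 5, 6 (0-based indices 3..5) are treated as the insertion.
INSERT_COLUMNS = range(3, 6)


def column_emissions(col):
    """Relative frequency of each residue in one main-state column (gaps ignored)."""
    residues = [s[col] for s in ALIGNMENT if s[col] != "-"]
    counts = Counter(residues)
    return {b: counts[b] / len(residues) for b in sorted(counts)}


def insertion_stats():
    """Transition probability into the insert state and its emission frequencies."""
    n = len(ALIGNMENT)
    # A sequence uses the insert state if it has any residue in the insert columns.
    entering = sum(any(s[c] != "-" for c in INSERT_COLUMNS) for s in ALIGNMENT)
    residues = [s[c] for s in ALIGNMENT for c in INSERT_COLUMNS if s[c] != "-"]
    counts = Counter(residues)
    emissions = {b: counts[b] / len(residues) for b in "ACGT"}
    return entering / n, emissions


print(column_emissions(0))   # {'A': 0.8, 'T': 0.2}
print(insertion_stats())     # (0.6, {'A': 0.2, 'C': 0.4, 'G': 0.2, 'T': 0.2})
```

Both results agree with the slides: column 1 gives 0.8 for A and 0.2 for T, 3 of 5 sequences enter the insertion node (0.6), and the insert emissions are 0.2, 0.4, 0.2, 0.2. The same column_emissions call works for the other main-state columns (1 to 3 and 7 to 9).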