
Maximum Likelihood (ML) Parameter Estimation, with Applications to Inferring Phylogenetic Trees
Computational Genomics, lecture 6a
Presentation taken from Nir Friedman's HU course, available at www.cs.huji.ac.il/~pmai. Changes made by Dan Geiger, Ydo Wexler, and finally by Benny Chor.

The Setting
We have a probabilistic model, M, of some phenomenon. We know the structure of M exactly, but not the values of its probabilistic parameters θ. Each "execution" of M produces an observation x[i], drawn according to the (unknown) distribution induced by M.
Goal: after observing x[1], ..., x[n], estimate the model parameters θ that generated the observed data.

Maximum Likelihood Estimation (MLE)
The likelihood of the observed data, given the model parameters θ, is defined as the conditional probability that the model M, with parameters θ, produces x[1], ..., x[n]:
    L(θ) = Pr(x[1], ..., x[n] | θ, M).
In MLE we seek the model parameters θ that maximize the likelihood.
The MLE principle is applicable in a wide variety of settings, from speech recognition, through natural language processing, to computational biology. We will start with the simplest example, estimating the bias of a coin, and then apply MLE to inferring phylogenetic trees. (We will later discuss MAP, i.e. Bayesian, inference.)

Example: Binomial Experiment
When tossed, a thumbtack can land in one of two positions: Head (H) or Tail (T). We denote by θ the (unknown) probability P(H).
Estimation task: given a sequence of toss samples x[1], x[2], ..., x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 - θ.

Statistical Parameter Fitting (restatement)
Consider instances x[1], x[2], ..., x[M] that are i.i.d. (independent and identically distributed), meaning:
- the set of values that x can take is known;
- each x[m] is sampled from the same distribution;
- each x[m] is sampled independently of the rest.
The task is to find the vector of parameters θ that generated the given data.
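To make the estimation task concrete, here is a minimal Python sketch that simulates the data-generating process; the true bias theta_true = 0.7 and the sample size M are hypothetical choices for illustration, not part of the slides:

```python
import random

random.seed(0)

# Hypothetical "true" bias of the thumbtack; in practice theta is unknown
# and the whole point is to recover it from the samples.
theta_true = 0.7
M = 10000

# Draw M i.i.d. tosses: 'H' with probability theta_true, 'T' otherwise.
tosses = ['H' if random.random() < theta_true else 'T' for _ in range(M)]

# The MLE turns out to be just the empirical frequency of heads.
theta_hat = tosses.count('H') / M
print(theta_hat)
```

With this many samples the estimate lands close to the true bias, which is the behavior the rest of the lecture justifies formally.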
This vector of parameters can then be used to predict future data.

The Likelihood Function
How good is a particular θ? That depends on how likely it is to generate the observed data:
    L_D(θ) = P(D | θ) = ∏_m P(x[m] | θ).
For example, the likelihood of the sequence H, T, T, H, H is
    L_D(θ) = θ (1 - θ) (1 - θ) θ θ = θ^3 (1 - θ)^2.

Sufficient Statistics
To compute the likelihood in the thumbtack example we only need N_H and N_T, the numbers of heads and tails:
    L_D(θ) = θ^{N_H} (1 - θ)^{N_T}.
N_H and N_T are sufficient statistics for the binomial distribution. In general, a sufficient statistic is a function of the data that summarizes all the information relevant to the likelihood. Formally, s(D) is a sufficient statistic if for any two datasets D and D',
    s(D) = s(D')  implies  L_D(θ) = L_{D'}(θ).

Maximum Likelihood Estimation
The MLE principle: choose the parameters that maximize the likelihood function. This is one of the most commonly used estimators in statistics, and it is intuitively appealing. One usually maximizes the log-likelihood function, defined as l_D(θ) = ln L_D(θ).

Example: MLE in Binomial Data
    l_D(θ) = N_H log θ + N_T log (1 - θ).
Taking the derivative and equating it to 0, we get
    θ̂ = N_H / (N_H + N_T),
which coincides with what one would expect. Example: for (N_H, N_T) = (3, 2), the MLE estimate is 3/5 = 0.6.

From Binomial to Multinomial
Now suppose X can take the values 1, 2, ..., K (for example, a die has K = 6 sides). We want to learn the parameters θ_1, θ_2, ..., θ_K. The sufficient statistics are N_1, N_2, ..., N_K, the number of times each outcome is observed.
    Likelihood function: L_D(θ) = ∏_{k=1..K} θ_k^{N_k}.
    MLE: θ̂_k = N_k / N (proof in assignment 3).

Example: Multinomial
Let x_1 x_2 ... x_n be a protein sequence. We want to learn the parameters q_1, q_2, ..., q_20 corresponding to the frequencies of the 20 amino acids, with N_1, N_2, ..., N_20 the number of times each amino acid is observed in the sequence.
    Likelihood function: L_D(q) = ∏_{k=1..20} q_k^{N_k}.
    MLE: q̂_k = N_k / n.

Inferring Phylogenetic Trees
Let S_1, S_2, ..., S_n be n sequences (DNA or AA).
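The closed-form binomial MLE can be sanity-checked numerically; a small Python sketch using the (N_H, N_T) = (3, 2) coin example from the slides, comparing the closed form against a brute-force grid search:

```python
import math

# Sufficient statistics for the sequence H,T,T,H,H from the slides.
n_h, n_t = 3, 2

def log_likelihood(theta):
    # l_D(theta) = N_H log(theta) + N_T log(1 - theta)
    return n_h * math.log(theta) + n_t * math.log(1 - theta)

# Closed-form MLE from setting the derivative to zero.
theta_mle = n_h / (n_h + n_t)

# Numerical sanity check: no theta on a fine grid does better.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=log_likelihood)
print(theta_mle, best)
```

Because the log-likelihood is strictly concave in θ, the grid maximizer agrees with the closed form, 0.6.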
Assume for simplicity that they all have the same length, l. We want to learn the parameters of a phylogenetic tree that maximize the likelihood. But wait: we should first specify a model.

A Probabilistic Model
Our models consist of a "regular" tree in which, in addition, every edge is assigned a substitution probability. For simplicity, assume our "DNA" has only two states, say X and Y. If edge e is assigned probability p_e, this means that the probability of a substitution (X -> Y) across e is p_e.
With p_e given, the probability of more involved patterns of substitution across e is determined and easily computed: for example, the pattern XXYXY -> YXYXX (two substitutions, three conserved sites) has probability p_e^2 (1 - p_e)^3.
Q: What if the pattern on both sides of the edge is known, but p_e is not?
A: It makes sense to seek the p_e that maximizes the probability of the observation. So far, this is identical to the coin toss example.

But a single edge is a fairly boring tree. Consider instead a tree with three leaves, labeled XXYXY, YXYXX, and YYYYX, whose edges, with parameters p_e1, p_e2, p_e3, meet at a single internal node with unknown states (?????). Now we know neither the states at the internal node(s) nor the edge parameters p_e1, p_e2, p_e3.

Two Ways to Go
1. Average (sum) over the states of the internal node(s).
2. Maximize over the states of the internal node(s).
In both cases, we also maximize over the edge parameters.
In the first version (averaging, or summing, over internal states) we are looking for the most likely setting of the tree edges. This is called maximum likelihood (ML) inference of phylogenetic trees; ML is probably the most widely (wildly?) used inference method.
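Before dealing further with internal nodes, the single-edge case above can be checked numerically: estimating p_e from an observed pattern really is the coin problem again. A minimal Python sketch for the XXYXY -> YXYXX example:

```python
def pattern_prob(p, n_sub, n_same):
    # Probability of a fixed substitution pattern across one edge:
    # each site independently flips with probability p.
    return p ** n_sub * (1 - p) ** n_same

# Pattern XXYXY -> YXYXX: 2 substitutions, 3 conserved sites.
n_sub, n_same = 2, 3

# Exactly as in the coin example, the MLE is the substitution frequency.
p_hat = n_sub / (n_sub + n_same)

# Brute-force check over a fine grid of candidate p_e values.
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: pattern_prob(p, n_sub, n_same))
print(p_hat, best)
```

Both the closed form and the grid search give p_e = 2/5, the observed fraction of substituted sites.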
In the second version (maximizing over internal states) we are looking for the most likely ancestral states. This is called ancestral maximum likelihood (AML). In some sense AML sits "between" maximum parsimony, MP (it reconstructs ancestral states), and ML (the goal is still to maximize a likelihood).
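The difference between the two versions is just sum-versus-max over the hidden internal state. A minimal Python sketch for a single site of the three-leaf tree above; the uniform 1/2 prior on the internal state and the trial edge probabilities are assumptions for illustration, since the slides leave the root distribution unspecified:

```python
def edge_prob(a, b, p):
    # Two-state model: a substitution across an edge has probability p,
    # conservation has probability 1 - p.
    return p if a != b else 1 - p

def site_likelihood(leaves, probs, mode):
    # Likelihood of one site of the three-leaf tree, whose leaves all
    # hang off a single internal node h with an assumed 1/2 prior.
    terms = []
    for h in "XY":
        t = 0.5
        for leaf, p in zip(leaves, probs):
            t *= edge_prob(h, leaf, p)
        terms.append(t)
    # ML sums over the internal state; AML maximizes over it.
    return sum(terms) if mode == "ML" else max(terms)

# First site of the leaf labels XXYXY, YXYXX, YYYYX, with trial
# (not yet optimized) edge parameters.
leaves, probs = "XYY", (0.1, 0.2, 0.3)
print(site_likelihood(leaves, probs, "ML"))
print(site_likelihood(leaves, probs, "AML"))
```

For full ML or AML inference one would multiply such terms over all l sites and then maximize the product over (p_e1, p_e2, p_e3); note that the ML value always dominates the AML value, since the sum over internal states includes the maximal term.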
