
Econ 620

Maximum Likelihood Estimation (MLE)

Definition of MLE

• Consider a parametric model in which the joint distribution of $Y = (y_1, y_2, \ldots, y_n)$ has a density $\ell(Y;\theta)$ with respect to a dominating measure $\mu$, where $\theta \in \Theta \subset \mathbb{R}^p$.

Definition 1 A maximum likelihood estimator of $\theta$ is a solution to the maximization problem
$$\max_{\theta\in\Theta} \ell(y;\theta).$$

• Since the solution to an optimization problem is invariant to a strictly monotone increasing transformation of the objective function, an MLE can also be obtained as a solution to the following problem:
$$\max_{\theta\in\Theta} \log \ell(y;\theta) = \max_{\theta\in\Theta} L(y;\theta).$$

Proposition 2 (Sufficient condition for existence) If the parameter space $\Theta$ is compact and the likelihood function $\theta \mapsto \ell(y;\theta)$ is continuous on $\Theta$, then an MLE exists.

Proposition 3 (Sufficient condition for uniqueness of MLE) If the parameter space $\Theta$ is convex and the likelihood function $\theta \mapsto \ell(y;\theta)$ is strictly concave in $\theta$, then the MLE is unique when it exists.

• If the observations on $Y$ are i.i.d. with density $f(y_i;\theta)$ for each observation, then we can write the likelihood function as
$$\ell(y;\theta) = \prod_{i=1}^{n} f(y_i;\theta) \quad\Rightarrow\quad L(y;\theta) = \sum_{i=1}^{n} \log f(y_i;\theta).$$

Properties of MLE

Proposition 4 (Functional invariance of MLE) Suppose $g:\Theta\to\Lambda$ is a bijective function, where $\Lambda\subset\mathbb{R}^q$, and $\hat\theta$ is an MLE of $\theta$. Then $\hat\lambda = g(\hat\theta)$ is an MLE of $\lambda\in\Lambda$.
⇒ By definition of MLE, we have $\hat\theta\in\Theta$ and
$$\ell(y;\hat\theta) \ge \ell(y;\theta), \quad \forall\,\theta\in\Theta,$$
or equivalently, $\hat\lambda\in\Lambda$ and
$$\ell\big(y; g^{-1}(\hat\lambda)\big) \ge \ell\big(y; g^{-1}(\lambda)\big), \quad \forall\,\lambda\in\Lambda,$$
which implies that $\hat\lambda = g(\hat\theta)$ is an MLE of $\lambda$ in the model with density $\ell\big(y; g^{-1}(\lambda)\big)$.

Proposition 5 (Relationship with sufficiency) The MLE is a function of every sufficient statistic.
⇒ Let $S(Y)$ be a sufficient statistic. By the factorization theorem, the density function can be written as $\ell(y;\theta) = \Psi\big(S(y);\theta\big)\,h(y)$, i.e., $L(y;\theta) = \log\Psi\big(S(y);\theta\big) + \log h(y)$. Since $h(y)$ does not depend on $\theta$, maximizing $\ell(y;\theta)$ with respect to $\theta$ is equivalent to maximizing $\log\Psi\big(S(y);\theta\big)$ with respect to $\theta$. Therefore, the MLE depends on $Y$ only through $S(Y)$.
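A minimal sketch of Propositions 2 to 5, assuming an i.i.d. exponential model with rate $\lambda$ (an illustrative choice, not taken from these notes). Here $L(y;\lambda) = n\log\lambda - \lambda\sum_i y_i$ is strictly concave, so by the monotone-transformation argument the MLE is unique, equal to $\hat\lambda = 1/\bar y$, and it depends on the data only through $\sum_i y_i$. The code maximizes the log-likelihood numerically by golden-section search on an assumed compact parameter space $[0.01, 100]$, then re-maximizes after reparameterizing by the mean $\mu = g(\lambda) = 1/\lambda$ to illustrate functional invariance. The helper names, seed and sample size are hypothetical.

```python
import math
import random


def golden_section_max(f, lo, hi, tol=1e-9):
    """Return an approximate maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # a maximizer lies in [a, d]
        else:
            a = c  # a maximizer lies in [c, b]
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    return (a + b) / 2


def loglik_rate(lam, ys):
    """L(y; lam) = n log(lam) - lam * sum(y_i) for i.i.d. Exponential(rate = lam) data."""
    return len(ys) * math.log(lam) - lam * sum(ys)


def loglik_mean(mu, ys):
    """Same model in the mean parameter mu = g(lam) = 1 / lam, i.e. L(y; g^{-1}(mu))."""
    return loglik_rate(1.0 / mu, ys)


rng = random.Random(0)
ys = [rng.expovariate(0.5) for _ in range(200)]  # simulated data, assumed true rate 0.5
ybar = sum(ys) / len(ys)

# Assumed compact parameter space [0.01, 100] for both parameterizations.
lam_hat = golden_section_max(lambda t: loglik_rate(t, ys), 0.01, 100.0)
mu_hat = golden_section_max(lambda m: loglik_mean(m, ys), 0.01, 100.0)

# Proposition 5: a different sample with the same sufficient statistic sum(y) gives the same MLE.
ys_same_sum = [ybar] * len(ys)
lam_hat_same_sum = golden_section_max(lambda t: loglik_rate(t, ys_same_sum), 0.01, 100.0)

print(f"lam_hat = {lam_hat:.6f}   closed form 1/ybar = {1 / ybar:.6f}")
print(f"mu_hat  = {mu_hat:.6f}   g(lam_hat) = 1/lam_hat = {1 / lam_hat:.6f}")
print(f"same sum(y), different sample: lam_hat = {lam_hat_same_sum:.6f}")
```

The numerical maximizer in the $\mu$ parameterization agrees with $g(\hat\lambda) = \bar y$ up to the search tolerance, as Proposition 4 predicts; golden-section search is used only because it needs nothing beyond the standard library and a unimodal objective.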
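• To discuss the asymptotic properties of MLE, which are the main reason we study and use MLE in practice, we need some so-called regularity conditions. These conditions should be checked, not taken for granted, before we use MLE, although in practice they are difficult, and mostly impossible, to verify.

Regularity Conditions

1. The variables $Y_i$, $i = 1, 2, \ldots$ are independent and identically distributed with density $f(y;\theta)$.
2. The parameter space $\Theta$ is compact.
3. The true but unknown parameter value $\theta_0$ is identified, i.e., $\theta_0$ is the unique solution of
$$\theta_0 = \arg\max_{\theta\in\Theta} E_{\theta_0}\log f(Y_i;\theta).$$
4. The log-likelihood function
$$L(y;\theta) = \sum_{i=1}^{n}\log f(y_i;\theta)$$
is continuous in $\theta$.
5. $E_{\theta_0}\log f(Y;\theta)$ exists.
6. The log-likelihood function is such that $\frac{1}{n}L(y;\theta)$ converges almost surely (in probability) to $E_{\theta_0}\log f(Y_i;\theta)$ uniformly in $\theta\in\Theta$, i.e.,
$$\sup_{\theta\in\Theta}\left|\frac{1}{n}L(y;\theta) - E_{\theta_0}\log f(Y_i;\theta)\right| \to 0$$
almost surely (in probability).

Proposition 6 Under 1-6, there exists a sequence of MLEs converging almost surely (in probability) to the true parameter value $\theta_0$. That is, the MLE is a consistent estimator.
⇒ Conditions 2 and 4 ensure the existence of an MLE $\hat\theta_n$ (Proposition 2). It is obtained by maximizing $L(y;\theta)$ or, equivalently, $\frac{1}{n}L(y;\theta)$. Since $\frac{1}{n}L(y;\theta) = \frac{1}{n}\sum_{i=1}^{n}\log f(y_i;\theta)$ can be interpreted as the sample mean of the i.i.d. random variables $\log f(y_i;\theta)$, the objective function converges almost surely (in probability) to $E_{\theta_0}\log f(Y;\theta)$ by the strong (weak) law of large numbers. Furthermore, the uniform convergence in condition 6 (a uniform strong law of large numbers) implies that $\hat\theta_n$, the solution to $\max_{\theta\in\Theta}\frac{1}{n}\sum_{i=1}^{n}\log f(y_i;\theta)$, converges to the solution of the limit problem
$$\max_{\theta\in\Theta} E_{\theta_0}\log f(Y;\theta), \quad\text{i.e.,}\quad \max_{\theta\in\Theta}\int_{\mathcal{Y}}\log f(y;\theta)\,f(y;\theta_0)\,dy.$$
Finally, the identifiability condition 3 ensures that $\theta_0$ is the unique solution of this limit problem, so $\hat\theta_n$ converges to $\theta_0$.

More regularity conditions for the asymptotic distribution

2'. $\theta_0\in\operatorname{Int}(\Theta)$.
7. The log-likelihood function $L(y;\theta)$ is twice continuously differentiable in a neighborhood of $\theta_0$.
8. Integration and differentiation operators are interchangeable.
9. The matrix
$$I(\theta_0) = E_{\theta_0}\left[-\frac{\partial^2\log f(Y;\theta_0)}{\partial\theta\,\partial\theta'}\right],$$
called the information matrix, exists and is non-singular.

• These additional assumptions enable us to use differential methods to obtain the MLE and its asymptotic distribution.

Lemma 7
$$E_{\theta_0}\left[\frac{\partial\log f(Y;\theta_0)}{\partial\theta}\right] = 0.$$
⇒
$$E_{\theta_0}\left[\frac{\partial\log f(Y;\theta_0)}{\partial\theta}\right] = \int\frac{\partial\log f(y;\theta_0)}{\partial\theta}\,f(y;\theta_0)\,dy = \int\frac{1}{f(y;\theta_0)}\frac{\partial f(y;\theta_0)}{\partial\theta}\,f(y;\theta_0)\,dy = \int\frac{\partial f(y;\theta_0)}{\partial\theta}\,dy.$$
However, $\int f(y;\theta)\,dy = 1$ for every $\theta$ by definition. Hence, differentiating with respect to $\theta$ at $\theta=\theta_0$ and using condition 8 gives
$$\frac{\partial}{\partial\theta}\int f(y;\theta_0)\,dy = \int\frac{\partial f(y;\theta_0)}{\partial\theta}\,dy = 0.$$

Lemma 8
$$E_{\theta_0}\left[\frac{\partial\log f(Y;\theta_0)}{\partial\theta}\,\frac{\partial\log f(Y;\theta_0)}{\partial\theta'}\right] = E_{\theta_0}\left[-\frac{\partial^2\log f(Y;\theta_0)}{\partial\theta\,\partial\theta'}\right].$$
⇒
$$\begin{aligned}
E_{\theta_0}\left[\frac{\partial^2\log f(Y;\theta_0)}{\partial\theta\,\partial\theta'}\right]
&= \int\frac{\partial^2\log f(y;\theta_0)}{\partial\theta\,\partial\theta'}\,f(y;\theta_0)\,dy
 = \int\frac{\partial}{\partial\theta'}\left(\frac{\partial\log f(y;\theta_0)}{\partial\theta}\right)f(y;\theta_0)\,dy\\
&= \int\frac{\partial}{\partial\theta'}\left(\frac{1}{f(y;\theta_0)}\frac{\partial f(y;\theta_0)}{\partial\theta}\right)f(y;\theta_0)\,dy\\
&= \int\left(-\frac{1}{f(y;\theta_0)^2}\frac{\partial f(y;\theta_0)}{\partial\theta}\frac{\partial f(y;\theta_0)}{\partial\theta'} + \frac{1}{f(y;\theta_0)}\frac{\partial^2 f(y;\theta_0)}{\partial\theta\,\partial\theta'}\right)f(y;\theta_0)\,dy\\
&= -\int\frac{1}{f(y;\theta_0)}\frac{\partial f(y;\theta_0)}{\partial\theta}\,\frac{1}{f(y;\theta_0)}\frac{\partial f(y;\theta_0)}{\partial\theta'}\,f(y;\theta_0)\,dy + \int\frac{\partial^2 f(y;\theta_0)}{\partial\theta\,\partial\theta'}\,dy\\
&= -\int\frac{\partial\log f(y;\theta_0)}{\partial\theta}\,\frac{\partial\log f(y;\theta_0)}{\partial\theta'}\,f(y;\theta_0)\,dy
 = -E_{\theta_0}\left[\frac{\partial\log f(Y;\theta_0)}{\partial\theta}\,\frac{\partial\log f(Y;\theta_0)}{\partial\theta'}\right].
\end{aligned}$$
The last line follows from the fact that $\int\frac{\partial^2 f(y;\theta_0)}{\partial\theta\,\partial\theta'}\,dy = 0$, which is obtained by differentiating $\int f(y;\theta)\,dy = 1$ twice (condition 8).

Proposition 9 Under 1, 2', 3-9, a sequence of MLEs $\hat\theta_n$ satisfies
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0,\ I(\theta_0)^{-1}\big).$$
⇒ A Taylor series expansion of the first-order condition around the true value $\theta_0$ yields
$$\frac{\partial L(\hat\theta_n)}{\partial\theta} = \frac{\partial L(\theta_0)}{\partial\theta} + \frac{\partial^2 L(\theta^*)}{\partial\theta\,\partial\theta'}\,(\hat\theta_n - \theta_0),$$
where $\theta^*$ lies on the line segment connecting $\hat\theta_n$ and $\theta_0$. From the first-order condition $\partial L(\hat\theta_n)/\partial\theta = 0$, we have
$$0 = \frac{\partial L(\theta_0)}{\partial\theta} + \frac{\partial^2 L(\theta^*)}{\partial\theta\,\partial\theta'}\,(\hat\theta_n - \theta_0).$$
Therefore,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) = \left(-\frac{1}{n}\frac{\partial^2 L(\theta^*)}{\partial\theta\,\partial\theta'}\right)^{-1}\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta}.$$
As $n\to\infty$,
$$-\frac{1}{n}\frac{\partial^2 L(\theta^*)}{\partial\theta\,\partial\theta'} = -\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^2\log f(Y_i;\theta^*)}{\partial\theta\,\partial\theta'}$$
converges almost surely to
$$I(\theta_0) = E_{\theta_0}\left[-\frac{\partial^2\log f(Y;\theta_0)}{\partial\theta\,\partial\theta'}\right]$$
by the strong law of large numbers and the fact that $\theta^*\xrightarrow{a.s.}\theta_0$.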
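Moreover,
$$\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial\log f(Y_i;\theta_0)}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left(\frac{\partial\log f(Y_i;\theta_0)}{\partial\theta} - E_{\theta_0}\left[\frac{\partial\log f(Y;\theta_0)}{\partial\theta}\right]\right),$$
which converges in distribution to $N\big(0, I(\theta_0)\big)$ by the central limit theorem. We have used Lemma 7 (the score has mean zero) and Lemma 8 (its variance equals $I(\theta_0)$) here to get the asymptotic distribution of $\frac{1}{\sqrt{n}}\frac{\partial L(\theta_0)}{\partial\theta}$. Then, by Slutsky's theorem,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0,\ I(\theta_0)^{-1} I(\theta_0)\, I(\theta_0)^{-1}\big) = N\big(0,\ I(\theta_0)^{-1}\big).$$

• The asymptotic distribution by itself cannot be used directly, since the information matrix is evaluated at the true, unknown value of the parameter. However, we can consistently estimate the asymptotic variance of the MLE by evaluating the information matrix at the MLE, which gives the large-sample approximation
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \overset{a}{\sim} N\big(0,\ I(\hat\theta_n)^{-1}\big).$$
Another expression, which is slightly misleading but commonly used in practice, is
$$\hat\theta_n \overset{a}{\sim} N\left(\theta_0,\ \frac{1}{n}\,I(\hat\theta_n)^{-1}\right) = N\left(\theta_0,\ \big(nI(\hat\theta_n)\big)^{-1}\right),$$
where $nI(\hat\theta_n)$ is computed as $-\frac{\partial^2 L(\hat\theta_n)}{\partial\theta\,\partial\theta'}$. By Lemma 8, we can also use the outer-product approximation
$$nI(\hat\theta_n) \approx \sum_{i=1}^{n}\frac{\partial\log f(y_i;\hat\theta_n)}{\partial\theta}\,\frac{\partial\log f(y_i;\hat\theta_n)}{\partial\theta'}.$$

Proposition 10 Let $g$ be a continuously differentiable function of $\theta\in\mathbb{R}^p$ with values in $\mathbb{R}^q$. Then, under the assumptions of Proposition 9,
(i) $g(\hat\theta_n)$ converges almost surely to $g(\theta_0)$;
(ii) $$\sqrt{n}\,\big(g(\hat\theta_n) - g(\theta_0)\big) \xrightarrow{d} N\left(0,\ \frac{dg(\theta_0)}{d\theta'}\,I(\theta_0)^{-1}\left(\frac{dg(\theta_0)}{d\theta'}\right)'\right).$$
⇒ The first claim is a straightforward application of Slutsky's theorem, since $g$ is continuous and $\hat\theta_n\xrightarrow{a.s.}\theta_0$. For the second claim, a Taylor expansion of $g(\hat\theta_n)$ around $\theta_0$ gives
$$g(\hat\theta_n) = g(\theta_0) + \frac{dg(\theta^*)}{d\theta'}\,(\hat\theta_n - \theta_0),$$
where $\theta^*$ lies between $\hat\theta_n$ and $\theta_0$. Hence,
$$\sqrt{n}\,\big(g(\hat\theta_n) - g(\theta_0)\big) = \frac{dg(\theta^*)}{d\theta'}\,\sqrt{n}\,(\hat\theta_n - \theta_0).$$
Note that, as $n\to\infty$, we have
$$\frac{dg(\theta^*)}{d\theta'}\xrightarrow{a.s.}\frac{dg(\theta_0)}{d\theta'} \quad\text{and}\quad \sqrt{n}\,(\hat\theta_n - \theta_0)\xrightarrow{d} N\big(0,\ I(\theta_0)^{-1}\big).$$
The claim follows immediately from Slutsky's theorem.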
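The results above can be checked numerically in a simple model. The sketch below assumes a Bernoulli($p$) model, $f(y;p) = p^y(1-p)^{1-y}$ (an illustrative choice, not from these notes), for which the MLE is $\hat p = \bar y$ and $I(p) = 1/\big(p(1-p)\big)$. It verifies Lemmas 7 and 8 exactly by summing over the support $\{0, 1\}$, checks Proposition 9 by Monte Carlo, computes the Hessian-based and outer-product estimates of $nI(\hat p)$, and applies Proposition 10 to the log-odds $g(p) = \log\big(p/(1-p)\big)$. The true value $p_0 = 0.3$, the sample sizes, the seed and all function names are hypothetical.

```python
import math
import random


def score(y, p):
    """d/dp log f(y; p) for the Bernoulli density f(y; p) = p^y (1 - p)^(1 - y)."""
    return y / p - (1 - y) / (1 - p)


def hess(y, p):
    """d^2/dp^2 log f(y; p)."""
    return -y / p**2 - (1 - y) / (1 - p) ** 2


def expect(func, p):
    """Exact expectation under Bernoulli(p): a weighted sum over the support {0, 1}."""
    return (1 - p) * func(0) + p * func(1)


p0 = 0.3
info = 1 / (p0 * (1 - p0))  # I(p0) for a single observation

# Lemma 7: the score has mean zero at the true value.
mean_score = expect(lambda y: score(y, p0), p0)
# Lemma 8: E[score^2] equals E[-second derivative]; both should equal I(p0).
outer = expect(lambda y: score(y, p0) ** 2, p0)
neg_hess = expect(lambda y: -hess(y, p0), p0)
print(f"E[score] = {mean_score:.2e}, E[score^2] = {outer:.6f}, E[-hess] = {neg_hess:.6f}")

# Proposition 9: Monte Carlo draws of sqrt(n) (p_hat - p0), where p_hat = ybar is the MLE.
rng = random.Random(42)
n, reps = 400, 2000
draws = []
for _ in range(reps):
    ys = [1 if rng.random() < p0 else 0 for _ in range(n)]
    p_hat = sum(ys) / n
    draws.append(math.sqrt(n) * (p_hat - p0))
mc_mean = sum(draws) / reps
mc_var = sum((z - mc_mean) ** 2 for z in draws) / (reps - 1)
print(f"Monte Carlo variance {mc_var:.4f} vs I(p0)^-1 = {1 / info:.4f}")

# Variance estimation on a single sample (the last one simulated above).
n_info_hess = -sum(hess(y, p_hat) for y in ys)     # minus the second derivative of L at p_hat
n_info_opg = sum(score(y, p_hat) ** 2 for y in ys)  # outer product of per-observation scores
se_hess = 1 / math.sqrt(n_info_hess)
se_opg = 1 / math.sqrt(n_info_opg)

# Proposition 10 (delta method) for g(p) = log(p / (1 - p)), with g'(p) = 1 / (p (1 - p)).
logit_hat = math.log(p_hat / (1 - p_hat))
dg = 1 / (p_hat * (1 - p_hat))
se_logit = abs(dg) * se_hess
print(f"p_hat = {p_hat:.4f} (se {se_hess:.4f}), log-odds = {logit_hat:.4f} (se {se_logit:.4f})")
```

In this particular model the Hessian and outer-product estimates coincide exactly at any $p$, because $y^2 = y$ for $y\in\{0,1\}$; in general the two only agree asymptotically, through Lemma 8.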
