# Parameter Estimation and Evaluation of Support

*Lecture 4*
## Parameter Estimation

> "The problem of estimation is of more central importance (than hypothesis testing)... for in almost all situations we know that the effect whose significance we are measuring is perfectly real, however small; what is at issue is its magnitude." (Edwards, 1992, p. 2)

> "An insignificant result, far from telling us that the effect is non-existent, merely warns us that the sample was not large enough to reveal it." (Edwards, 1992, p. 2)
## Parameter Estimation

Finding maximum likelihood estimates (MLEs):

- Local optimization (`optim`)
- Global optimization
  - Simulated annealing (`anneal`)
  - Genetic algorithms (`rgenoud`)

Evaluating the strength of evidence ("support") for different parameter estimates:

- Support intervals
  - Asymptotic support intervals
  - Simultaneous support intervals
- The shape of likelihood surfaces around MLEs
## Parameter Estimation: Finding Peaks on Likelihood "Surfaces"

The variation in likelihood across a set of parameter values defines a likelihood "surface". The goal of parameter estimation is to find the peak of that surface (optimization).

*(Figure: log-likelihood plotted against a parameter estimate; the curve rises to a single peak at the MLE.)*
## Local vs. Global Optimization

"Fast" local optimization methods:

- A large family of methods, widely used for nonlinear regression in commercial software packages

"Brute force" global optimization methods:

- Grid search
- Genetic algorithms
- Simulated annealing

*(Figure: a log-likelihood curve with a local optimum and a separate, higher global optimum; a local method started near the wrong peak will stop at the local optimum.)*

### Derivative-based (Newton-Raphson) methods

$$L(\theta \mid y) \propto p(y \mid \theta), \qquad \frac{dL}{d\theta} = 0 \ \text{at the MLE}$$

General approach: vary the parameter estimate systematically and search for zero slope in the first derivative of the likelihood function, using numerical methods to estimate the derivative, and checking the second derivative to make sure the point is a maximum, not a minimum.
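A minimal sketch of this approach in R, under stated assumptions: the data `y` are a hypothetical normal sample with known `sd = 1`, so the mean is the only free parameter.

```r
# A sketch, not a definitive implementation: find the MLE of a normal mean
# by locating the zero of the numerically estimated score. The data y and
# the known sd = 1 are hypothetical assumptions for illustration.
set.seed(1)
y <- rnorm(50, mean = 2.4, sd = 1)

loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

# numerical first derivative of the log-likelihood (the score)
score <- function(mu, h = 1e-5) (loglik(mu + h) - loglik(mu - h)) / (2 * h)

# search for zero slope, then check that the curvature there is negative
mle    <- uniroot(score, interval = c(0, 5))$root
second <- (score(mle + 1e-3) - score(mle - 1e-3)) / (2e-3)
c(mle = mle, second_derivative = second)  # second_derivative < 0: a maximum
```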

### The simplex (Nelder-Mead) method

- Much simpler to program
- Does not require calculation or estimation of a derivative
- No general theoretical proof that it works (but lots of happy practitioners...); see the sketch below
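As a minimal sketch (the data and starting values are again hypothetical assumptions), R's `optim` uses Nelder-Mead by default; setting `fnscale = -1` makes it maximize rather than minimize:

```r
# A sketch: maximize a normal log-likelihood with the Nelder-Mead simplex.
set.seed(1)
y <- rnorm(50, mean = 2.4, sd = 1)

loglik <- function(par) {          # par[1] = mean, par[2] = sd
  if (par[2] <= 0) return(-Inf)    # keep sd positive
  sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
}

fit <- optim(par = c(0, 2), fn = loglik,
             method = "Nelder-Mead",
             control = list(fnscale = -1))
fit$par          # MLEs of the mean and sd
fit$convergence  # 0 indicates the simplex converged
```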
## Global Optimization

> "Virtually nothing is known about finding global extrema in general."

> "There are tantalizing hints that so-called 'annealing methods' may lead to important progress on global (optimization)..."

Quotes from Press et al. (1986), *Numerical Recipes*.
## Global Optimization: Grid Searches

The simplest form of optimization, and rarely used in practice:

- Systematically search parameter space at a grid of points

Grid searches can, however, be useful for visualizing the broad features of a likelihood surface.
## Global Optimization: Genetic Algorithms

Based on a fairly literal analogy with evolution, applied to a population of parameter sets:

- Calculate the "fitness" (likelihood) of each individual set of parameters
- Create the next generation of parameter sets based on the fitness of the "parents", plus various rules for recombination of subsets of parameters (genes)
- Let the population evolve until fitness reaches a maximum asymptote
## Global Optimization: Simulated Annealing

Analogy with the physical process of annealing:

- Start the process at a high "temperature"
- Gradually reduce the temperature according to an annealing schedule

Always accept uphill moves (i.e. an increase in likelihood). Accept downhill moves according to the Metropolis algorithm:

$$p = e^{-\Delta \ell / t}$$

where $p$ is the probability of accepting the downhill move, $\Delta \ell$ is the magnitude of the drop in likelihood, and $t$ is the temperature.
## Effect of Temperature (t)

*(Figure: the probability of accepting a downhill move, $p = e^{-\Delta \ell / t}$, plotted against the drop in likelihood for t = 5, 3, 1, and 0.5; higher temperatures accept downhill moves of a given size far more often.)*
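A minimal sketch tabulating the same relationship: acceptance probabilities for a few drops in log-likelihood at the temperatures shown in the figure.

```r
# A sketch: Metropolis acceptance probability p = exp(-dlh / t) for several
# drops in log-likelihood (dlh) and temperatures (t).
dlh   <- c(0.5, 1, 2, 5, 10)
temps <- c(5, 3, 1, 0.5)
p <- outer(dlh, temps, function(d, t) exp(-d / t))
dimnames(p) <- list(paste("dlh =", dlh), paste("t =", temps))
round(p, 3)
# As t falls, downhill moves of a given size are accepted less and less
# often, so the search becomes increasingly greedy.
```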
## Simulated Annealing in Practice

A version with automatic adjustment of the search range (step size)...

*(Figure: the current parameter value sits between its lower and upper bounds; the search range (step size) around it is adjusted as the run proceeds.)*

References:

- Goffe, W. L., G. D. Ferrier, and J. Rogers. 1994. Global optimization of statistical functions with simulated annealing. Journal of Econometrics 60:65-99.
- Corana, A., M. Marchesi, C. Martini, and S. Ridella. 1987. Minimizing multimodal functions of continuous variables with the "simulated annealing" algorithm. ACM Transactions on Mathematical Software 13:262-280.
## Constraints: Setting Limits for the Search

Biological limits:

- Values that make no sense biologically (be careful...)

Algebraic limits:

- Values for which the model is undefined (e.g. division by zero)

Bottom line: global optimization methods let you cast your net widely, at the cost of computer time.
## Simulated Annealing: Initialization

Set the following before the run:

- Annealing schedule (typical values in parentheses)
  - Initial temperature t (3.0)
  - Rate of reduction in temperature rt (0.95)
  - Interval between drops in temperature nt (100)
  - Interval between changes in range ns (20)
- Parameter values
  - Initial values (x)
  - Upper and lower bounds (lb, ub)
  - Initial range (vm)
## Simulated Annealing: Step 1

Pick a new set of parameter values, varying just one parameter (puse) per iteration. Here vm is the search range, lb the lower bound, and ub the upper bound.

```pascal
begin {a single iteration}
  {copy the current parameter array (x) to a temporary holder (xp) for this iteration}
  xp := x;

  {choose a new value for the parameter in use (puse), within the range vm}
  xp[puse] := x[puse] + ((random*2 - 1)*vm[puse]);

  {if the new value falls outside the bounds, pull it back inside}
  if xp[puse] < lb[puse] then xp[puse] := x[puse] - (random * (x[puse]-lb[puse]));
  if xp[puse] > ub[puse] then xp[puse] := x[puse] + (random * (ub[puse]-x[puse]));
```
## Simulated Annealing: Step 2

Accept the step outright if the likelihood increases or stays the same:

```pascal
  {call the likelihood function with the new set of parameter values}
  likeli(xp, fp); {fp = new likelihood}

  {accept the new values if likelihood increases or at least stays the same}
  if (fp >= f) then
  begin
    x := xp;
    f := fp;
    nacp[puse] := nacp[puse] + 1; {count the acceptance for this parameter}
    if (fp > fopt) then {if this is a new maximum, update the maximum likelihood}
    begin
      xopt := xp;
      fopt := fp;
      opteval := eval;
      BestFit; {update the display of the maximum}
    end;
  end
```
## Simulated Annealing: Step 3

Otherwise, use the Metropolis criterion to decide whether to accept the downhill step:

```pascal
  else {use the Metropolis criterion for a downhill move}
  begin
    try {fp < f here, so the line below is a shortcut for exp(-1.0 * abs(f - fp) / t)}
      p := exp((fp - f)/t); {t = current temperature}
    except
      on EUnderflow do p := 0;
    end;
    pp := random;
    if pp < p then {accept the downhill step with probability p}
    begin
      x := xp;
      f := fp;
      nacp[puse] := nacp[puse] + 1;
    end;
  end;
```
## Simulated Annealing: Step 4

Periodically adjust the range (vm) within which new steps are chosen, so that roughly half of the candidate steps are accepted. This part is strictly ad hoc; ns is typically ~20, and the references suggest setting the adjustment constant c at 2.0.

```pascal
  {after nused * ns cycles, adjust vm so that about half of evaluations are accepted}
  if eval mod (nused*ns) = 0 then
  begin
    for i := 0 to npmax do if xvary[i] then
    begin
      ratio := nacp[i]/ns; {fraction of steps accepted for parameter i}
      {c controls the adjustment of vm (the range)}
      if ratio > 0.6 then vm[i] := vm[i]*(1.0 + c[i]*((ratio - 0.6)/0.4))
      else if ratio < 0.4 then vm[i] := vm[i]/(1.0 + c[i]*((0.4 - ratio)/0.4));
      if vm[i] > (ub[i]-lb[i]) then vm[i] := ub[i] - lb[i]; {cap the range at the bounds}
    end;
    {reset the acceptance counts (same 0-based index as the loop above)}
    for i := 0 to npmax do nacp[i] := 0;
  end;
```
## Effect of C on Adjusting the Range

*(Figure: the fractional change in range plotted against the fraction of steps accepted, for C = 1, 2, and 4; larger C makes the range respond more sharply once acceptance moves outside the 40-60% band.)*
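A minimal sketch of the same relationship, applying the Step 4 update rule directly:

```r
# A sketch: fractional change in the search range vm as a function of the
# fraction of steps accepted, using the adjustment rule from Step 4.
adjust <- function(ratio, c) {
  ifelse(ratio > 0.6, 1 + c * (ratio - 0.6) / 0.4,
         ifelse(ratio < 0.4, 1 / (1 + c * (0.4 - ratio) / 0.4), 1))
}
ratio   <- seq(0, 1, by = 0.1)
factors <- sapply(c(1, 2, 4), function(cc) adjust(ratio, cc))
colnames(factors) <- paste0("C=", c(1, 2, 4))
cbind(ratio, round(factors, 2))
# Acceptance above 60% widens the range, below 40% shrinks it, and
# between 40% and 60% it is left unchanged.
```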
## Simulated Annealing: Final Step

Reduce the "temperature" according to the annealing schedule. rt is the fractional reduction in temperature at each drop; I typically set nt = 100 (a very slow annealing).

```pascal
  {after nused * ns * nt cycles, reduce the temperature t}
  if eval mod (nused*ns*nt) = 0 then
  begin
    t := rt * t;
    {store the current maximum likelihood in a history list}
    lhist[eval div (nused*ns*nt)].iter := eval;
    lhist[eval div (nused*ns*nt)].lhood := fopt;
  end;
```

NOTE: Goffe et al. restart the search at the previous MLE estimates each time the temperature drops (I don't).
## How Many Iterations?

*(Figure: traces of maximum likelihood against iteration for two analyses. For red maple leaf litterfall (6 parameters), the trace levels off long before 500,000 iterations: 500,000 is far more than necessary. For a logistic regression of windthrow susceptibility (188 parameters), the trace is still climbing at 5 million iterations: 5 million is not enough.)*

What would constitute convergence?
## Optimization: Summary

- There are no hard and fast rules for any optimization; be willing to explore alternate options.
- Be wary of the initial values used in local optimization when the model is at all complicated.
- A common strategy: start with a global method such as simulated annealing, then switch to a local optimization...
## Evaluating the Strength of Evidence for the MLE

Now that you have an MLE, how should you evaluate it?

(Hint: think about the shape of the likelihood function, not just the MLE.)
## Strength of Evidence for Particular Parameter Estimates: "Support"

Log-likelihood = "support" (Edwards 1992). Likelihood provides an objective measure of the strength of evidence for different parameter estimates.

*(Figure: the log-likelihood curve from earlier; the height of the curve at each parameter value measures the support for that estimate.)*
## Fisher's "Score" and "Information"

"Score" (a function): the first derivative (slope) of the likelihood function.

- So S(θ) = 0 at the maximum likelihood estimate of θ.

"Information" (a number): -1 times the second derivative (curvature) of the likelihood function, evaluated at the MLE.

- A measure of how steeply the likelihood drops off as you move away from the MLE.
- In general, the inverse of the information approximates the variance of the parameter estimate (illustrated below).
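A minimal numerical illustration, reusing the hypothetical normal-mean example from earlier; for that model the information at the MLE is $n/\sigma^2$, so its inverse is the familiar variance of a sample mean, $\sigma^2/n$.

```r
# A sketch: the score is ~0 and the information is n / sd^2 at the MLE of
# a normal mean (y and sd = 1 are hypothetical, as before).
set.seed(1)
y <- rnorm(50, mean = 2.4, sd = 1)
loglik <- function(mu) sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

h   <- 1e-4
mle <- mean(y)  # the analytic MLE of a normal mean
score_at_mle <- (loglik(mle + h) - loglik(mle - h)) / (2 * h)
info_at_mle  <- -(loglik(mle + h) - 2 * loglik(mle) + loglik(mle - h)) / h^2
c(score = score_at_mle, information = info_at_mle, n = length(y))
# 1 / information approximates the variance of the estimate: sd^2 / n.
```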
## Profile Likelihood

Evaluate the support for a range of values of a given parameter by treating all other parameters as "nuisance" parameters and holding them at their MLEs.

*(Figure: a likelihood surface plotted over parameter 1 and parameter 2.)*
## Asymptotic vs. Simultaneous M-Unit Support Limits

Asymptotic support limits (based on the profile likelihood):

- Hold all other parameters at their MLE values, and systematically vary the remaining parameter until the likelihood declines by a chosen amount (m).

What should m be? 2 is a good number, and is roughly analogous to a 95% confidence interval.

*(Figure: the log-likelihood profile with the maximum likelihood estimate marked; the 2-unit support interval spans the parameter values whose log-likelihood lies within 2 units of the peak.)*
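A minimal sketch of a 2-unit asymptotic support interval for the mean of the hypothetical normal sample used above, holding sd at its MLE and scanning a grid:

```r
# A sketch: 2-unit asymptotic support limits for the mean, with the other
# parameter (sd) held at its MLE.
set.seed(1)
y <- rnorm(50, mean = 2.4, sd = 1)
sd_mle <- sqrt(mean((y - mean(y))^2))  # MLE of sd (divisor n, not n - 1)
prof <- function(mu) sum(dnorm(y, mean = mu, sd = sd_mle, log = TRUE))

mu_grid <- seq(1.5, 3.5, by = 0.001)
support <- sapply(mu_grid, prof)
inside  <- support >= max(support) - 2  # within 2 units of the peak
range(mu_grid[inside])                  # the 2-unit support limits
```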
Simultaneous support limits:

- Resampling method: draw a very large number of random sets of parameters and calculate the log-likelihood of each. The m-unit simultaneous support limits for parameter x_i are the lowest and highest values of x_i among the sets whose support falls within m units of the maximum.

In practice, this can require an enormous number of iterations if there are more than a few parameters.
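A minimal sketch of the resampling idea for the same two-parameter normal model; the sampling bounds are arbitrary assumptions:

```r
# A sketch: simultaneous 2-unit support limits for the mean, from a large
# random sample of (mean, sd) pairs.
set.seed(1)
y  <- rnorm(50, mean = 2.4, sd = 1)
ll <- function(mu, s) sum(dnorm(y, mean = mu, sd = s, log = TRUE))

n_draws <- 100000
mu <- runif(n_draws, 1, 4)
s  <- runif(n_draws, 0.5, 2)
support <- mapply(ll, mu, s)
keep <- support >= max(support) - 2  # sets within 2 units of the best draw
range(mu[keep])  # simultaneous limits: at least as wide as the asymptotic ones
```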
## Asymptotic vs. Simultaneous Support Limits

*(Figure: a hypothetical likelihood surface for two parameters, showing the contour at a 2-unit drop in support. The asymptotic 2-unit support limits for parameter 1 are read along the slice through the MLE of parameter 2; the simultaneous 2-unit support limits for parameter 1 span the full width of the 2-unit contour, and so are wider.)*
## Other Measures of Strength of Evidence for Different Parameter Estimates

Edwards (1992; Chapter 5) describes various measures of the "shape" of the likelihood surface in the vicinity of the MLE: how pointed is the peak?
## Bootstrap Methods

Bootstrap methods can be used to estimate the variances of parameter estimates. In simple terms (see the sketch below):

- Generate many replicates of the dataset by sampling with replacement (bootstraps)
- Estimate the parameters for each replicate dataset
- Use the variance of those parameter estimates as a bootstrap estimate of the variance
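A minimal sketch for the variance of the MLE of a normal mean (the same hypothetical sample as above):

```r
# A sketch: nonparametric bootstrap of the variance of a parameter estimate.
set.seed(1)
y <- rnorm(50, mean = 2.4, sd = 1)

boot_est <- replicate(1000, {
  y_star <- sample(y, replace = TRUE)  # resample the data with replacement
  mean(y_star)                         # re-estimate the parameter (here, the mean)
})
var(boot_est)  # bootstrap estimate of the variance of the estimate
sd(boot_est)   # bootstrap standard error
```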
## Evaluating Support for Parameter Estimates: A Frequentist Approach

Traditional confidence intervals and standard errors of the parameter estimates can be generated from the Hessian matrix:

- Hessian = the matrix of second partial derivatives of the likelihood function with respect to the parameters, evaluated at the maximum likelihood estimates
- Also called the "information matrix" (after Fisher)
- Provides a measure of the steepness of the likelihood surface in the region of the optimum
- Can be generated in R using `optim` and `fdHess`
## Example from R

The Hessian matrix (when maximizing a log-likelihood) is a numerical approximation of Fisher's information matrix (i.e. the matrix of second partial derivatives of the likelihood function), evaluated at the maximum likelihood estimates. Thus it measures how steeply the likelihood surface drops as you move away from the MLE.

```r
> res$hessian
            a           b       sd
a    -150.182   -2758.360   -0.201
b   -2758.360  -67984.416   -5.925
sd     -0.202      -5.926 -299.422
```

(Sample output from an analysis that estimates two parameters and a variance term.)
## More from R

Now invert the negative of the Hessian matrix to get the parameter variance-covariance matrix:

```r
> solve(-1*res$hessian)
              a             b            sd
a  2.613229e-02 -1.060277e-03  3.370998e-06
b -1.060277e-03  5.772835e-05 -4.278866e-07
sd 3.370998e-06 -4.278866e-07  3.339775e-03
```

The square roots of the diagonal of the inverted negative Hessian are the standard errors*:

```r
> sqrt(diag(solve(-1*res$hessian)))
      a        b       sd
 0.1616 0.007597  0.05779
```

(*and 1.96 * S.E. gives a 95% C.I. ...)
