Non-Analytic Derivatives — Finite Differencing

Finite Differencing — Sparsity and Symmetry

#include









Numerical Optimization

Lecture Notes #16

Calculating Derivatives — Finite Differencing





Peter Blomgren,

blomgren.peter@gmail.com

Department of Mathematics and Statistics

Dynamical Systems Group

Computational Sciences Research Center

San Diego State University

San Diego, CA 92182-7720

http://terminus.sdsu.edu/





Fall 2011



Peter Blomgren, blomgren.peter@gmail.com Calculating Derivatives — Finite Differencing — (1/19)






Outline







1 Non-Analytic Derivatives — Finite Differencing

Taylor’s Theorem ⇒ Finite Differencing

Finite Difference Gradient

Finite Difference Hessian



2 Finite Differencing — Sparsity and Symmetry



3 #include

Project Milestone #2














Derivatives Needed!!!





As we have seen (and will see), algorithms for nonlinear

optimization (and nonlinear equations) require knowledge of

derivatives:



Nonlinear Optimization          Nonlinear Equations
Gradient, vector, 1st order     Jacobian, matrix, 1st order
Hessian, matrix, 2nd order



Often it is quite trivial to provide the code which computes those
derivatives, but in some cases the analytic expressions for the
derivatives are not available and/or not practical to evaluate.

In those cases we need some other way to compute or

approximate the derivatives.








Finite Differences — The Return of Taylor’s Theorem



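We can get an approximation of the gradient $\nabla f(\bar{x})$ by evaluating
the objective $f$ at $(n + 1)$ points, using the forward difference formula

\[ \frac{\partial f(\bar{x})}{\partial x_i} \approx \frac{f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x})}{\epsilon}, \quad i = 1, 2, \dots, n, \]

where $\bar{e}_i$ is the $i$th unit vector, and $\epsilon > 0$ is small.

If $f$ is twice continuously differentiable, then by Taylor's Theorem

\[ f(\bar{x} + \bar{p}) = f(\bar{x}) + \nabla f(\bar{x})^T \bar{p} + \frac{1}{2} \bar{p}^T \nabla^2 f(\bar{x} + t\bar{p})\, \bar{p}, \quad t \in (0, 1), \]

with $\bar{p} = \epsilon\bar{e}_i$, i.e.

\[ f(\bar{x} + \epsilon\bar{e}_i) = f(\bar{x}) + \epsilon \nabla f(\bar{x})^T \bar{e}_i + \frac{1}{2} \epsilon^2 \bar{e}_i^T \nabla^2 f(\bar{x} + t\epsilon\bar{e}_i)\, \bar{e}_i, \quad i = 1, 2, \dots, n. \]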






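Forward Differences: Building the Gradient

With a bit of re-arrangement we see

\[ \underbrace{\nabla f(\bar{x})^T \bar{e}_i}_{\partial f(\bar{x}) / \partial x_i} = \underbrace{\frac{f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x})}{\epsilon}}_{\text{Finite Difference Approximation}} - \underbrace{\frac{1}{2} \epsilon\, \bar{e}_i^T \nabla^2 f(\bar{x} + t\epsilon\bar{e}_i)\, \bar{e}_i}_{\text{Approximation Error}} \]

If the Hessian $\nabla^2 f(\bar{x})$ is bounded, i.e. $\|\nabla^2 f(\bar{x})\| \le L_c$, then we have

\[ \frac{\partial f(\bar{x})}{\partial x_i} \approx \frac{f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x})}{\epsilon}, \]

where the approximation error is bounded by

\[ \frac{\epsilon L_c}{2}. \]

Since the error is proportional to $\epsilon$, this is a first-order
approximation.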






Selecting ε: Machine Epsilon / Unit Roundoff (1 of 3)

Clearly, the smaller the $\epsilon$, the smaller the approximation error. How
small can we set $\epsilon$ in finite precision?

Let $\epsilon_{\text{mach}}$ denote the value of machine epsilon, a.k.a. unit
roundoff. It is essentially the largest value for which

\[ ((1.0 + \epsilon_{\text{mach}}) - 1.0) = 0 \quad \text{in finite precision.} \]

$\epsilon_{\text{mach}} \approx 10^{-16}$ in double-precision arithmetic (IEEE
64-bit floating point: "C" double, and Matlab internals on typical Intel-based
systems).

$\epsilon_{\text{mach}}$ is a measure of how well (or badly) we can represent
any number in finite precision, and by extension a measure of the (best
case) quality of every computation.
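One way to see this number on a given machine is to measure it. The sketch below assumes IEEE double precision with round-to-nearest; the function name measure_machine_epsilon is made up here. It returns the gap between 1.0 and the next representable double, which C exposes as DBL_EPSILON (2^-52, about 2.2e-16). The unit roundoff is half of that, about 1.1e-16, so both are of order 10^-16.

```c
#include <float.h>

/* Halve a candidate until adding half of it to 1.0 no longer changes 1.0.
 * The volatile store forces rounding to double on platforms that keep
 * intermediate results in wider registers. */
double measure_machine_epsilon(void)
{
    double eps = 1.0;
    volatile double probe = 1.0 + 0.5 * eps;
    while (probe != 1.0) {
        eps *= 0.5;
        probe = 1.0 + 0.5 * eps;
    }
    return eps;  /* smallest power of two with 1.0 + eps != 1.0 */
}
```

Under the stated assumptions the returned value can be compared directly against DBL_EPSILON from float.h.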








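Selecting ε (2 of 3)

If $L_f$ is a bound on the value of $f(\bar{x})$, i.e. $|f(\bar{x})| \le L_f$, then in finite
precision we have

\[ |\mathrm{computed}(f(\bar{x})) - f(\bar{x})| \le \epsilon_{\text{mach}} L_f \]
\[ |\mathrm{computed}(f(\bar{x} + \epsilon\bar{e}_i)) - f(\bar{x} + \epsilon\bar{e}_i)| \le \epsilon_{\text{mach}} L_f. \]

Now, if we recall our finite difference approximation (with a slight
abuse of notation)

\[ \frac{\partial f(\bar{x})}{\partial x_i} \approx \frac{f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x})}{\epsilon} + \mathrm{error}\left(\frac{\epsilon L_c}{2}\right). \]

We find that the total error is

\[ \mathrm{error} \sim \frac{2\epsilon_{\text{mach}} L_f}{\epsilon} + \frac{\epsilon L_c}{2}. \]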






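Selecting ε (3 of 3)

Now,

\[ \frac{d\,\mathrm{error}}{d\epsilon} \sim -\frac{2\epsilon_{\text{mach}} L_f}{\epsilon^2} + \frac{L_c}{2} = 0 \quad \Rightarrow \quad \epsilon^2 = \frac{4\epsilon_{\text{mach}} L_f}{L_c}, \]

gives us the optimal value for $\epsilon$. Since $L_f$ and $L_c$ are unknown
in general, most software packages tend to select

\[ \epsilon = \sqrt{\epsilon_{\text{mach}}}, \]

which is close to optimal in most cases.

Hence, the error in the approximated gradient is

\[ \mathrm{error} \sim 2 L_f \sqrt{\epsilon_{\text{mach}}} + \frac{L_c}{2} \sqrt{\epsilon_{\text{mach}}}. \]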






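Central Differences: $O(h^2)$ Accuracy

At twice the cost, we can get about 2.67 extra digits of precision in the
finite difference approximation, by using central differences (with
$\epsilon_{\text{mach}} \approx 10^{-16}$, the error drops from about $10^{-8}$ to
about $10^{-10.67}$).

More Taylor expansions...

\[ f(\bar{x} + \epsilon\bar{e}_i) = f(\bar{x}) + \epsilon \frac{\partial f}{\partial x_i} + \frac{1}{2} \epsilon^2 \frac{\partial^2 f}{\partial x_i^2} + O(\epsilon^3) \]
\[ f(\bar{x} - \epsilon\bar{e}_i) = f(\bar{x}) - \epsilon \frac{\partial f}{\partial x_i} + \frac{1}{2} \epsilon^2 \frac{\partial^2 f}{\partial x_i^2} + O(\epsilon^3) \]
\[ f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x} - \epsilon\bar{e}_i) = 2\epsilon \frac{\partial f}{\partial x_i} + O(\epsilon^3) \]

We get

\[ \frac{\partial f(\bar{x})}{\partial x_i} = \frac{f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x} - \epsilon\bar{e}_i)}{2\epsilon} + O(\epsilon^2). \]

By arguments similar to the ones for the forward difference formula, we
can show that the optimal $\epsilon$ and overall error are

\[ \epsilon = \sqrt[3]{\epsilon_{\text{mach}}} \quad \Rightarrow \quad \mathrm{error} \sim O\left(\epsilon_{\text{mach}}^{2/3}\right). \]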








Approximating the Hessian: The Easy Case (1 of 5)

The easy case: Analytic Gradient given

If the analytic gradient is known, then we can get an
approximation of the Hessian by applying forward or central
differencing to each element of the gradient vector in turn.

When the second derivatives exist and are Lipschitz continuous,
Taylor's theorem says

\[ \nabla f(\bar{x} + \bar{p}) = \nabla f(\bar{x}) + \nabla^2 f(\bar{x})\, \bar{p} + O(\|\bar{p}\|^2). \]

Again, we let $\bar{p} = \epsilon\bar{e}_i$, $i = 1, 2, \dots, n$ and get

\[ \nabla^2 f(\bar{x})\, \bar{e}_i \approx \frac{\nabla f(\bar{x} + \epsilon\bar{e}_i) - \nabla f(\bar{x})}{\epsilon} + O(\epsilon), \quad \text{or} \]

\[ \nabla^2 f(\bar{x})\, \bar{e}_i \approx \frac{\nabla f(\bar{x} + \epsilon\bar{e}_i) - \nabla f(\bar{x} - \epsilon\bar{e}_i)}{2\epsilon} + O(\epsilon^2). \]








Approximating the Hessian: Symmetrize (2 of 5)

It is worth noting that this is a column-at-a-time process, which,
due to numerical roundoff and approximation errors, does not
necessarily give a symmetric Hessian.

It is often necessary to symmetrize the result

\[ H_{\text{num}}^{\text{sym}} = \frac{1}{2} \left( H_{\text{num}} + H_{\text{num}}^T \right). \]














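Approximating the Hessian: Special Case (3 of 5)

Special Case: In Newton-CG methods we do not require full
knowledge of the Hessian. Each iteration requires the
Hessian-vector product $\nabla^2 f(\bar{x})\, \bar{p}$, where $\bar{p}$ is the given search
direction. This expression can be approximated by a forward difference,

\[ \nabla^2 f(\bar{x})\, \bar{p} \approx \frac{\nabla f(\bar{x} + \epsilon\bar{p}) - \nabla f(\bar{x})}{\epsilon} + O(\epsilon), \]

or by a central difference,

\[ \nabla^2 f(\bar{x})\, \bar{p} \approx \frac{\nabla f(\bar{x} + \epsilon\bar{p}) - \nabla f(\bar{x} - \epsilon\bar{p})}{2\epsilon} + O(\epsilon^2). \]

This approximation is very cheap: only one (forward) or two (central)
extra gradient evaluations are needed.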














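Approximating the Hessian: Hard (Realistic) Case (4 of 5)

The harder case: Analytic Gradient not given

When the analytic gradient is not given we must use a finite
difference formula using only function values to approximate the
Hessian.

The first order forward difference approximation is given by

\[ \frac{\partial^2 f(\bar{x})}{\partial x_i \partial x_j} \approx \frac{f(\bar{x} + \epsilon\bar{e}_i + \epsilon\bar{e}_j) - f(\bar{x} + \epsilon\bar{e}_i) - f(\bar{x} + \epsilon\bar{e}_j) + f(\bar{x})}{\epsilon^2} \]

[Figure: 4-point forward difference stencil in the (i, j) plane, with weight +1 at $\bar{x}$ and $\bar{x} + \epsilon\bar{e}_i + \epsilon\bar{e}_j$, and weight −1 at $\bar{x} + \epsilon\bar{e}_i$ and $\bar{x} + \epsilon\bar{e}_j$.]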
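A single Hessian entry from function values only can be sketched as below. The names objective_fn, fd_hessian_entry_forward, and example_cubic are hypothetical. Because the roundoff term is divided by $\epsilon^2$ here, a step noticeably larger than $\sqrt{\epsilon_{\text{mach}}}$ is usually needed; a value near cbrt(DBL_EPSILON) is a common choice, but that is an assumption of this sketch and not something stated above.

```c
#include <stddef.h>

/* Assumed objective signature for this sketch: f(n, x) returns a scalar. */
typedef double (*objective_fn)(size_t n, const double *x);

/* One Hessian entry from function values only:
 *   H_ij ~ (f(x+eps*e_i+eps*e_j) - f(x+eps*e_i) - f(x+eps*e_j) + f(x)) / eps^2.
 * Also valid for i == j, where it becomes a one-sided second difference. */
double fd_hessian_entry_forward(objective_fn f, size_t n, double *x,
                                size_t i, size_t j, double eps)
{
    const double xi = x[i], xj = x[j];
    const double f0 = f(n, x);
    x[i] += eps;
    const double fi = f(n, x);          /* f(x + eps*e_i) */
    x[j] += eps;
    const double fij = f(n, x);         /* f(x + eps*e_i + eps*e_j) */
    x[i] = xi;
    x[j] = xj;
    x[j] += eps;
    const double fj = f(n, x);          /* f(x + eps*e_j) */
    x[j] = xj;                          /* restore the point */
    return (fij - fi - fj + f0) / (eps * eps);
}

/* Hypothetical test objective: f(x) = x0^3 + x0*x1^2,
 * exact Hessian [[6*x0, 2*x1], [2*x1, 2*x0]]. */
double example_cubic(size_t n, const double *x)
{
    (void)n;
    return x[0] * x[0] * x[0] + x[0] * x[1] * x[1];
}
```

In a full Hessian loop the values f(x) and f(x + eps*e_i) would be computed once and reused across entries; this sketch recomputes them to keep each entry self-contained.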














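Approximating the Hessian (5 of 5)

At a price of $\sim n^2$ additional function evaluations (an increase of
33%) we can use the second order central difference
approximation

\[ \frac{\partial^2 f(\bar{x})}{\partial x_i \partial x_j} \approx \frac{f(\bar{x} + \epsilon\bar{e}_i + \epsilon\bar{e}_j) - f(\bar{x} + \epsilon\bar{e}_i - \epsilon\bar{e}_j) - f(\bar{x} - \epsilon\bar{e}_i + \epsilon\bar{e}_j) + f(\bar{x} - \epsilon\bar{e}_i - \epsilon\bar{e}_j)}{4\epsilon^2} \]

[Figure: 4-point stencil in the (i, j) plane, with weights +1 and −1 at the four corners around the central point.]

Figure: The second order 4-point central difference approximation stencil for
$\frac{\partial^2 f(\bar{x})}{\partial x_i \partial x_j}$ at the central point in the stencil. Note that
the value in that point is not part of the evaluation!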
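The four-corner stencil can be written as a short double loop over the signs. This is a sketch only; objective_fn, fd_hessian_entry_central, and example_cubic are hypothetical names.

```c
#include <stddef.h>

/* Assumed objective signature for this sketch: f(n, x) returns a scalar. */
typedef double (*objective_fn)(size_t n, const double *x);

/* Central-difference estimate of H_ij from the four stencil corners
 * x + si*eps*e_i + sj*eps*e_j with si, sj in {+1, -1}; each corner is
 * weighted by si*sj and the sum is divided by 4*eps^2. For i != j the
 * central point itself is never evaluated. For i == j the two mixed-sign
 * corners coincide with x and the formula becomes a second difference
 * with step 2*eps. */
double fd_hessian_entry_central(objective_fn f, size_t n, double *x,
                                size_t i, size_t j, double eps)
{
    const double xi = x[i], xj = x[j];
    double sum = 0.0;
    for (int si = 1; si >= -1; si -= 2)
        for (int sj = 1; sj >= -1; sj -= 2) {
            x[i] = xi;
            x[j] = xj;
            x[i] += si * eps;
            x[j] += sj * eps;
            sum += (double)(si * sj) * f(n, x);
        }
    x[i] = xi;                          /* restore the point */
    x[j] = xj;
    return sum / (4.0 * eps * eps);
}

/* Hypothetical test objective: f(x) = x0^3 + x0*x1^2,
 * exact Hessian [[6*x0, 2*x1], [2*x1, 2*x0]]. */
double example_cubic(size_t n, const double *x)
{
    (void)n;
    return x[0] * x[0] * x[0] + x[0] * x[1] * x[1];
}
```

Each off-diagonal entry costs four function evaluations, none of which can be shared with the base point.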






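Sparsity and Symmetry (1 of 3)

Now that we are paying ∼ 4 function evaluations per entry in the
Hessian matrix, it is worth taking sparsity and symmetry into
account.

Ponder the extended Rosenbrock function:

#include <math.h>
double rosenbrock( int n, const double *x )
{
  double f = 0.0;
  int    i;
  /* Loop restored from the standard definition (n even, factor 100 assumed). */
  for( i=0; i<n/2; i++ )
    f += 100.0*pow(x[2*i+1]-x[2*i]*x[2*i], 2) + pow(1.0-x[2*i], 2);
  return f;
}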








Sparsity and Symmetry (2 of 3)

The fill-pattern of the Hessian of the extended Rosenbrock
function consists of 2×2 diagonal blocks:

[Figure: sparsity pattern of the Hessian, 2×2 blocks along the diagonal; nz = 16.]









There are a lot of zero-entries in this Hessian. If somehow we have

knowledge of the sparsity pattern, then we can exploit this by not

computing/touching the zeros.
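As a worked illustration of where the 2×2 blocks come from, assume the standard form of the extended Rosenbrock function, $f(\bar{x}) = \sum_{k=1}^{n/2} \left[ 100 (x_{2k} - x_{2k-1}^2)^2 + (1 - x_{2k-1})^2 \right]$. The factor 100 and the pairing of variables are assumptions taken from the usual definition. Each pair $(x_{2k-1}, x_{2k})$ appears in exactly one term, so the only nonzero second derivatives are

```latex
\begin{aligned}
\frac{\partial^2 f}{\partial x_{2k-1}^2} &= 1200\, x_{2k-1}^2 - 400\, x_{2k} + 2, \\
\frac{\partial^2 f}{\partial x_{2k-1}\, \partial x_{2k}} &= -400\, x_{2k-1}, \\
\frac{\partial^2 f}{\partial x_{2k}^2} &= 200 .
\end{aligned}
```

Every other mixed partial is identically zero. Under this assumption there are n/2 blocks with 4 entries each, which matches the nz = 16 count if n = 8.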








Sparsity and Symmetry (3 of 3)

By using the fact that the Hessian is symmetric, we can save about
half of the work.

[Figure: two sparsity patterns side by side; left panel nz = 36, right panel nz = 28.]

Figure: The entries to the left, $H_{ij}$, $j \le i$, must be computed, but using symmetry
we can fill in the missing ones, $H_{ij} = H_{ji}$, $j > i$.
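Sparsity and symmetry can be combined in one loop: visit only the entries with $j \le i$ that are flagged in a known sparsity pattern, and mirror each computed value. The sketch below assumes the pattern is supplied as a dense 0/1 mask and uses the central difference formula for each entry; all names (objective_fn, hessian_entry, fd_hessian_sparse_symmetric) are hypothetical, and the rosenbrock function follows the standard definition with factor 100, which is an assumption.

```c
#include <stddef.h>

/* Assumed objective signature for this sketch: f(n, x) returns a scalar. */
typedef double (*objective_fn)(size_t n, const double *x);

/* Central-difference estimate of one Hessian entry H_ij. */
static double hessian_entry(objective_fn f, size_t n, double *x,
                            size_t i, size_t j, double eps)
{
    const double xi = x[i], xj = x[j];
    double sum = 0.0;
    for (int si = 1; si >= -1; si -= 2)
        for (int sj = 1; sj >= -1; sj -= 2) {
            x[i] = xi;
            x[j] = xj;
            x[i] += si * eps;
            x[j] += sj * eps;
            sum += (double)(si * sj) * f(n, x);
        }
    x[i] = xi;
    x[j] = xj;
    return sum / (4.0 * eps * eps);
}

/* Fill H (row-major, n*n) using a known sparsity pattern and symmetry.
 * pattern[i*n + j] != 0 marks a structurally nonzero entry. Only entries
 * with j <= i are computed; each is mirrored into H[j][i].
 * Returns the number of entries actually computed. */
size_t fd_hessian_sparse_symmetric(objective_fn f, size_t n, double *x,
                                   const unsigned char *pattern, double eps,
                                   double *H)
{
    size_t computed = 0;
    for (size_t k = 0; k < n * n; ++k)
        H[k] = 0.0;                     /* structural zeros stay untouched */
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j <= i; ++j) {
            if (!pattern[i * n + j])
                continue;
            const double h = hessian_entry(f, n, x, i, j, eps);
            H[i * n + j] = h;
            H[j * n + i] = h;           /* fill in by symmetry */
            ++computed;
        }
    return computed;
}

/* Extended Rosenbrock function, standard form assumed (n even). */
double rosenbrock(size_t n, const double *x)
{
    double f = 0.0;
    for (size_t i = 0; i + 1 < n; i += 2) {
        const double a = x[i + 1] - x[i] * x[i];
        const double b = 1.0 - x[i];
        f += 100.0 * a * a + b * b;
    }
    return f;
}
```

For the block-diagonal pattern with n = 4, only 6 of the 16 entries are computed (3 per block) instead of all 16.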










Project Extensions Milestone #2: with linesearch+1







Add the following to your codebase:

fdhessg  Finite difference approximation to the Hessian using analytic gradient.
         (Executed when analgrad=TRUE, analhess=FALSE, cheapf=TRUE.)

fdjac    The core call from fdhessg; note that fvec in the pseudo-code
         corresponds to your analytic gradient.

fdgrad   Finite difference (forward) approximation to the gradient. (Executed
         when analgrad=FALSE.)

fdhessf  Finite difference approximation to the Hessian using only function
         values. (Executed when analgrad=FALSE, analhess=FALSE, cheapf=TRUE.)

Compare: Performance of analytic everything (from before) / analytic gradient
(fdhessg+fdjac) / finite difference everything (fdhessf+fdgrad). Try optimal and
non-optimal ε. Use 2 test problems from Dennis-Schnabel, Appendix B.

Add-on: Central differencing strategies.














Project: Milestone #0









Please let me know in the very near future what you are working

on!!!








