LECTURE 11

                                          Lecture # 11     Date: 10/13/04



   1.      Refer to pages 8-9 on graphical methods to solve LINEAR optimization problems.


                                          TODAY'S CLASS:

   1.      Basic principles of how a numerical routine works iteratively, using derivative and
           double-derivative information, for non-linear optimization.

   2.      Limitations of classical techniques (based on derivatives)

   3.      Recent non-conventional (counter-intuitive) methods: Genetic Algorithms,
           Simulated Annealing

   4.      Reading assignment: the paper handed out in Lecture # 10.

                                 Gradients and Double Derivatives

   For a univariate case, f(X̂) where X̂ = {x1}, deriving gradients and double derivatives is
   easy. In reality, problems are hardly ever univariate in nature.

   But if X̂ = {x1 x2 x3 . . . xn}^T, then come the issues of vectors, matrices, Jacobians and
   Hessians.


   Univariate                                        Multivariate
   X̂ = {x1}  (scalar)                                X̂ = {x1 x2 . . . xn}^T  (vector)
   Gradient                                          Jacobian (vector)
   Double derivative                                 Hessian (matrix)



Example:  X̂ = {x1 x2}^T,  objective F(X̂)

           The Jacobian (gradient) would be:

\[
\nabla F = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\[2mm] \dfrac{\partial F}{\partial x_2} \end{bmatrix}
\]

           and the Hessian would be:

\[
\nabla^2 F = \begin{bmatrix}
\dfrac{\partial^2 F}{\partial x_1^2} & \dfrac{\partial^2 F}{\partial x_1 \partial x_2} \\[2mm]
\dfrac{\partial^2 F}{\partial x_2 \partial x_1} & \dfrac{\partial^2 F}{\partial x_2^2}
\end{bmatrix}
\]
Now notice the Hessian matrix:

           (1) It is a square matrix

           (2) It is a symmetric matrix

           (3) The diagonal terms are the pure double derivatives ∂²F/∂xi², not mixed
               derivatives like ∂²F/∂x1∂x2.

        Earlier, when we talked about minimality requirements, we mentioned ∂²F/∂x² > 0 (for the
        univariate case). However, for Hessians the terminology now changes, as it is a
        MATRIX.

        For a minimum, the Hessian H MUST be positive definite.

       Positive Definiteness

       A matrix is positive definite when its eigen values are all positive.

            |A - λI| = 0

            where A = the matrix, I = the identity matrix, λ = eigenvalue.

            See hand-out on eigenvalues.

            Another definition of positive definiteness of a matrix: suppose A is the n x n matrix

\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\
\vdots &        &        &       & \vdots \\
a_{n1} & a_{n2} & \dots  & \dots & a_{nn}
\end{bmatrix}
\]

Then form the leading principal minors (determinants of the top-left k x k blocks):

\[
A_1 = a_{11}, \qquad
A_2 = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}, \qquad
A_3 = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},
\; \dots, \;
A_n = \det(A) \;\;(\text{the whole } n \times n \text{ matrix}).
\]

          Then A is positive definite if

A1, A2, A3, . . . , An are ALL positive.
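A minimal numpy sketch (my illustration, not from the lecture) of the two equivalent checks just described: all eigenvalues positive, or all leading principal minors A1, ..., An positive. The example matrix is arbitrary.

```python
# Positive-definiteness check two ways: eigenvalues and leading principal minors.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])      # an arbitrary symmetric example matrix

# (a) all eigenvalues positive?
eig_ok = np.all(np.linalg.eigvalsh(A) > 0)

# (b) all leading principal minors A1, A2, ..., An positive?
minor_ok = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, A.shape[0] + 1))

print(eig_ok, minor_ok)         # True True: this A is positive definite
```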

    So REMEMBER: when it comes to the multivariate case,

     X̂ = {x1 x2 . . . . . xn}^T, the 'Hessian' matrix is required to be positive definite (for a minimum).

                        SEE HAND-OUT ON EIGENVALUES

HOMEWORK # 3 PROBLEM SOLUTION

Maximum volume of a box in a sphere of unit radius:

Using the constraint:

\[
x_3 = (1 - x_1^2 - x_2^2)^{1/2}
\]

rewrite the objective function as

\[
f(x_1, x_2) = 8\, x_1 x_2\, (1 - x_1^2 - x_2^2)^{1/2}
\]

Setting the first partial derivatives to zero:

\[
\frac{\partial f}{\partial x_1} = 8 x_2 \left[ (1 - x_1^2 - x_2^2)^{1/2} - x_1^2\, (1 - x_1^2 - x_2^2)^{-1/2} \right] = 0
\]

\[
\frac{\partial f}{\partial x_2} = 8 x_1 \left[ (1 - x_1^2 - x_2^2)^{1/2} - x_2^2\, (1 - x_1^2 - x_2^2)^{-1/2} \right] = 0
\]

or,

\[
1 - 2x_1^2 - x_2^2 = 0, \qquad 1 - x_1^2 - 2x_2^2 = 0
\]

which gives

\[
\hat{x}_1 = \hat{x}_2 = \frac{1}{\sqrt{3}}, \qquad
\hat{x}_3 = \left(1 - \tfrac{1}{3} - \tfrac{1}{3}\right)^{1/2} = \frac{1}{\sqrt{3}}
\]

So it's a CUBE.
Now check the 2nd-order partial derivatives and write the Hessian, if you wish!! (A small sketch follows below.)
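As a check, here is a brief sympy sketch (my own, not part of the original solution) confirming that the gradient vanishes at x1 = x2 = 1/sqrt(3) and that the Hessian there is negative definite, which is what you want for a MAXIMUM of the volume.

```python
# Verify the HW#3 stationary point and the sign of the Hessian with sympy.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', positive=True)
f = 8 * x1 * x2 * sp.sqrt(1 - x1**2 - x2**2)      # reduced objective (box volume)

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

pt = {x1: 1/sp.sqrt(3), x2: 1/sp.sqrt(3)}

print([sp.simplify(g.subs(pt)) for g in grad])    # [0, 0]  -> stationary point
print(sp.simplify(H.subs(pt)).eigenvals())        # both eigenvalues negative -> maximum
```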
                         NON-LINEAR OPTIMIZATION (CLASSICAL)
                                      ROUTINES

    (1) They are all iterative, since F(X̂) is highly non-linear.

    (2) Start with some initial guess X̂0.

    (3) Gradually 'head' towards the optimum by trial (or search) and error, using knowledge of
        gradients and double derivatives.

    (4) Some methods just use gradients; these are called Gradient Methods.

        Some methods also use double derivatives (Hessians) for more accuracy.

    (5) Work on the concept of 'step length' and direction

    (6) Direction: downhill, i.e. along the negative gradient.

        Step length: a compromise is needed. Too small and it will take you FOREVER to climb a
        MOUNTAIN; too big and you may miss peaks.

    (7) Obviously the search direction will be -∇F(X̂) if we want to reach the minimum
        (a minimal sketch follows below).
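A minimal sketch (mine, not the lecture's algorithm) of the step-length-and-direction idea: repeatedly move downhill along the negative gradient with a fixed step length alpha. The quadratic objective and the value of alpha are just illustrative choices.

```python
# Gradient descent with a fixed step length: direction = -gradient (downhill).
import numpy as np

def F(x):                       # example objective: a simple quadratic bowl
    return x[0]**2 + 10.0 * x[1]**2

def grad_F(x):                  # its analytic gradient
    return np.array([2.0 * x[0], 20.0 * x[1]])

x = np.array([3.0, 2.0])        # initial guess x0
alpha = 0.02                    # step length: too small -> painfully slow, too big -> overshoot

for k in range(200):            # iterate x_{k+1} = x_k - alpha * grad F(x_k)
    x = x - alpha * grad_F(x)

print(x, F(x))                  # ends up close to the minimum at (0, 0)
```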

Let's write our problem in mathematical form.
Suppose 'p' is the search step (direction times step length) and x_k is the current point at
iteration k (if k = 0, x_0 is the initial point); then our next, hopefully better, approximation to the
minimum of f is

\[
f(x_k + p)
\]

which can be expanded as a Taylor series,

\[
f(x_k + p) \approx f(x_k) + p^T \nabla f_k + \tfrac{1}{2}\, p^T \nabla^2 f(x_k)\, p
\]

(orders higher than 2 neglected).
Now, (1) if the algorithm works on both ∇f_k (the Jacobian) and ∇²f_k (the Hessian),

then it will be more accurate than an algorithm using just ∇f_k (see the sketch further below).

Why? Because the Taylor series approximation is MORE ACCURATE

Figure to be drawn in class



But the problems with the Hessian are:

    (1) More computationally intensive to compute Hessian than Jacobian (more computer
        operations)

    (2) Gradient-only methods are more efficient but may not be as accurate.
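To make the trade-off concrete, here is a small numpy sketch (an illustration with my own choice of quadratic f, not from the notes): the Newton step comes from minimizing the quadratic Taylor model above, p = -(∇²f_k)⁻¹ ∇f_k, and reaches the minimum of a quadratic in one step, while a gradient-only step with a fixed step length does not.

```python
# Gradient-only step vs Newton step on a quadratic f(x) = x1**2 + 10*x2**2.
# The Newton step minimizes the quadratic Taylor model:  p = -H^{-1} grad.
import numpy as np

H = np.array([[2.0, 0.0],
              [0.0, 20.0]])                    # Hessian of the quadratic f

def grad(x):
    return H @ x                               # gradient of f

xk = np.array([3.0, 2.0])                      # current iterate

p_gradient = -0.05 * grad(xk)                  # gradient-only step, fixed step length
p_newton = -np.linalg.solve(H, grad(xk))       # Newton step from the Taylor model

print(xk + p_gradient)   # [2.7, 0.0]: still far from the minimum at (0, 0)
print(xk + p_newton)     # [0.0, 0.0]: exact minimum in one step (f is quadratic)
```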
                                SIGNS OF A GOOD ALGORITHM

   (1) Should not be sensitive to initial guess.

   (2) Should not be too sensitive to step length

   (3) Converge fast

   (4) Should not get trapped easily in LOCAL MINIMA.

                                        SO MUCH FOR
                                  NON-LINEAR OPTIMIZATION
                                 BUT WHAT IS THE PROBLEM IN
                                       HYDROSCIENCE

   A. Unfortunately, the model response (output), say the goodness of fit with observations, viewed
      as a surface over the model parameters X̂, is a very noisy, complicated, discontinuous surface,
      full of local minima, ridges, flat areas, etc. (show example in class). Refer also to saddle
      points and points of inflection.

   B. Gradient-based techniques tend to get trapped or stuck; even computing the gradient is
      difficult at times.

   C. Scaling issues, e.g. f(x) = 10^9 x1^2 + x2^2

   f(x) is very sensitive to a slight change in x1 but not nearly as sensitive to a change in x2.

   In natural systems, f(x)-type objective functions are very common. Why?

   Consider river flow together with ground water flow:

   River flow ~ km/hr

   ground water flow ~ cm/hr

   There is a scale difference of 10^5, i.e. five orders of magnitude.
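A short numpy sketch (my illustration of point C, not from the notes) showing why this matters: for f(x) = 10^9 x1^2 + x2^2 the same small perturbation changes f enormously through x1 and barely through x2, and the Hessian's condition number is about 10^9.

```python
# Why scaling hurts: identical perturbations, wildly different responses,
# and an ill-conditioned Hessian.
import numpy as np

def f(x):
    return 1e9 * x[0]**2 + x[1]**2

x = np.array([1.0, 1.0])
dx = 1e-3                                     # identical small perturbation

print(f(x + np.array([dx, 0.0])) - f(x))      # ~ 2e6   (huge response to x1)
print(f(x + np.array([0.0, dx])) - f(x))      # ~ 2e-3  (tiny response to x2)

H = np.diag([2e9, 2.0])                       # Hessian of f
print(np.linalg.cond(H))                      # condition number ~ 1e9
```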

   Given these known problems,

   hydrologists have come up with non-conventional techniques. I'll mention only two:

   A. Genetic Algorithm

   B. Simulated Annealing

                       READING ASSIGNMENT FOR FALL BREAK:

Read the paper by Duan et al. (1992)

Next class (Oct 20/22): write a ONE-PAGE summary of what you've understood from reading the
paper.

HW#4
Solve:

Compute the gradient ∇f(x) and the Hessian ∇²f(x) of the Rosenbrock function:

\[
f(x) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2
\]

Find the 'global' minimum.

(You have to show that the Hessian is +ve definite.)
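A sketch of the mechanics only (finding the minimizer itself is left to you): compute ∇f and the Hessian of the Rosenbrock function symbolically with sympy, then test positive definiteness at whatever stationary point you find. The names xs1 and xs2 below are hypothetical placeholders for your answer.

```python
# Symbolic gradient and Hessian of the Rosenbrock function, plus a helper to
# test positive definiteness of the Hessian at a candidate point.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 100 * (x2 - x1**2)**2 + (1 - x1)**2

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])   # set this to zero for stationary points
H = sp.hessian(f, (x1, x2))

print(grad)
print(H)

def hessian_pos_def_at(pt):
    """True if all eigenvalues of the Hessian are positive at the point pt."""
    return all(sp.simplify(ev) > 0 for ev in H.subs(pt).eigenvals())

# example usage once you have found a candidate stationary point (xs1, xs2):
# print(hessian_pos_def_at({x1: xs1, x2: xs2}))
```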

				