# Function Approximation Algorithms for Continuous Optimization of Multi-modal, Computationally Expensive Models

Christine Shoemaker and Rommel Regis
Civil and Environmental Engineering & Operations Research and Industrial Engineering
Cornell University

With some results from joint work with others as noted

SAMSI Workshop, September 2006
Applications of Optimization
• Optimization in environmental problems can be used to design the “best” solution to a pollution problem.
• Optimization can also be used to solve the “inverse” problem, i.e. given observations, find the values of the calibration parameters that best fit the observations.
Global versus Local Minima
Multi-modal problems have multiple local minima.

[Figure: a one-dimensional multimodal function F(x) versus the parameter value x, with a local minimum and the global minimum marked.]
Why Do We Care About Multi-Modal Functions?

• With complex nonlinear models, we typically do not know whether the model is multi-modal (i.e. has many local minima) or unimodal.
• Hence, to analyze a complex model, we need to search for a global minimum even if we do not know whether the function is multi-modal.
Why Do We Care About Optimization of Computationally Expensive Functions?
• Many engineering and physical science problems are described by complex models that take hours or longer to run.
• We need new methods for optimization that can find solutions with few evaluations of the computationally expensive function.
Many Algorithms Are Not Feasible for Computationally Expensive Functions

• Some methods for optimization or uncertainty analysis of multi-modal functions assume that we can make thousands or tens of thousands of simulations.
• Doing 5,000 evaluations of a 1-hour simulation model takes about 7 months, which is usually not feasible.
• If the simulation model is computationally expensive, we need special algorithms that do not require such a large number of simulation evaluations.
Goal:
Derivative-Free Optimization of Computationally Expensive, Black Box, Multimodal Functions
Optimization
• Our goal is to find the minimum (or maximum) of f(x), where x ∈ D. Here f(x) can be a measure of error between model predictions and observations, and x can be parameter values.
• Let Fmax be the maximum number of function evaluations (e.g. simulations).
• We want Fmax to be small because f(x) is “costly” to evaluate.
• Evaluation of the objective function f(x) requires a simulation of a computationally expensive model G(x).
Computationally Expensive Functions and Limited Evaluations
Our focus is on computationally expensive functions, but an algorithm’s performance will depend more directly on the number of function evaluations (simulations).

Fmax = total number of function simulations

Fmax = (total computer time budget in hours) / (time for one simulation in hours)
Function Approximation Methods
• A nonlinear function approximation R(x) is a continuous nonlinear multivariate approximation to f(x).

• R(x) has also been called a “response surface” or a “surrogate model”.

• We use radial basis functions for our function approximation, but other methods for non-convex surfaces could also be used.
Why Use Function Approximation Methods?

• A function approximation R(x) can be used as part of an efficient parallel optimization algorithm in order to reduce the number of evaluations of f(x).
Function Approximation of the Objective Function Versus Low-Order Approximation of the Simulation
• Note that in our approach, we are approximating (building a response surface for) the objective function f(x).
• This is distinctly different from using a “low fidelity” response surface approximation of the simulation model G(x).
• The expected advantage of this is that the approximation can focus on the factors that are important in terms of the objective function.
Experimental Design with a Symmetric Latin Hypercube (SLHD)

• To fit the first function approximation we need to have evaluated the function at several points.
• We use a symmetric Latin hypercube design (SLHD) to pick these initial points.
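The initial design can be generated as follows. This is a minimal sketch of one standard SLHD construction, not necessarily the exact one used in the slides; the function name and the scaling to the unit cube are illustrative.

```python
import numpy as np

def slhd(n, d, rng=None):
    """Sketch of a symmetric Latin hypercube design: n points in d dimensions.
    Each column is a permutation of the levels 1..n, and point i is the mirror
    of point n+1-i through the center of the domain."""
    rng = np.random.default_rng(rng)
    half = n // 2
    X = np.empty((n, d), dtype=int)
    if n % 2 == 1:
        X[half, :] = (n + 1) // 2               # odd n: a center point at the middle level
    for j in range(d):
        levels = rng.permutation(half) + 1      # levels 1..half in random order
        flip = rng.random(half) < 0.5           # keep each level or swap in its mirror
        top = np.where(flip, levels, n + 1 - levels)
        X[:half, j] = top
        X[n - half:, j] = (n + 1 - top)[::-1]   # mirrored partner rows
    return (X - 0.5) / n                        # map level k to its cell center in [0, 1]

pts = slhd(8, 3, rng=0)
```

Each column of `pts` hits all 8 levels exactly once, and `pts[i] + pts[7 - i] == 1`, which is the symmetry property that distinguishes an SLHD from an ordinary Latin hypercube.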
One-Dimensional Example of an Experimental Design to Obtain the Initial Function Approximation

[Figure: the objective function f(x), a measure of error, versus the parameter value x (one-dimensional example); each marked point is a costly function evaluation (e.g. over 0.5 hour of CPU time per evaluation).]
Function Approximation with Initial Points from Experimental Design

[Figure: an approximation of f(x) fitted through the initial design points, versus the parameters x.]
In real applications x is multidimensional since there are many parameters (e.g. 10).
Update of the Function Approximation with a New Evaluation

An update is done in each iteration: the function approximation is refit with the newly evaluated point.
[Figure: the approximation of f(x) versus the parameter value x, with the new evaluation point marked.]
The function approximation is a guess of the function value f(x) for all x.
• The following section will discuss the application of an earlier version of our function approximation algorithms to a realistic and computationally intensive simulation model.
• A later section will discuss the newer version of the algorithms, with theoretical and numerical results.
NSF Support

• The following results are from my NSF grant “Improving Calibration, Sensitivity and Uncertainty Analysis of Data-Based Models of the Environment” from the Engineering Directorate.
• This example requires the solution of a system of partial differential equations and is a truly “computationally expensive function,” requiring hours for one evaluation.
Optimization of Calibration of a Groundwater Bioremediation Model
Christine Shoemaker and Rommel Regis

Published in Dec. 2005 in Water Resources Research
Engineered Dechlorination by Injection of Hydrogen Donor and Extraction

Injected donor promotes degradation by providing hydrogen.

Groundwater decontamination can cost hundreds of millions of dollars at a single site, so optimization is important.
Complex model: 22 species at each of thousands of nodes of a finite difference model used to solve a system of partial differential equations.

[Diagram: reaction network of the model. The chlorinated ethenes PCE → TCE → DCE → VC → ethene are degraded by a dechlorinator that consumes H2. The donors lactate, propionate and butyrate are fermented to acetate and hydrogen (Lac2Prop, Lac2Ace, Prop2Ace, But2Ace), and hydrogen is also consumed in methane production (Hyd2Meth).]
Objective Function

$$\min_{\theta}\; -\mathrm{CNS}, \qquad \mathrm{CNS} \;=\; \frac{1}{S}\sum_{i=1}^{S}\left[\,1-\frac{\sum_{j,t}\left(C^{s}_{i,j,t}-C^{o}_{i,j,t}\right)^{2}}{\sum_{j,t}\left(C^{o}_{i,j,t}-C^{av}_{i}\right)^{2}}\,\right]$$

Where the decision variables θ are the model parameters and CNS is the Cumulative Nash-Sutcliffe coefficient (1 > CNS > −∞, with larger values indicating a better fit).
C^s_{i,j,t} is the simulated concentration of species i at location j and time t (obtained by substituting the current values of the parameters into the partial differential equations and solving them numerically), C^o_{i,j,t} is the corresponding observed concentration, and C^av_i is the observed average for species i over time and space.
CNS is a dimensionless transformation of the squared residual error: it weights the residuals by the squared deviation of the observed values from the mean.
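As a concrete reading of the formula, the CNS value can be computed from simulated and observed concentration arrays like this (a sketch; the per-species array layout is an assumption for illustration):

```python
import numpy as np

def cumulative_nash_sutcliffe(simulated, observed):
    """Cumulative Nash-Sutcliffe coefficient averaged over S species.
    `simulated` and `observed` are lists of equal-shape arrays, one per species,
    indexed over the observation locations j and times t."""
    terms = []
    for sim, obs in zip(simulated, observed):
        residual = np.sum((sim - obs) ** 2)       # squared residual error
        spread = np.sum((obs - obs.mean()) ** 2)  # squared deviation from the species mean
        terms.append(1.0 - residual / spread)
    return np.mean(terms)                         # 1 = perfect fit; unbounded below

obs = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 7.0])]
print(cumulative_nash_sutcliffe(obs, obs))   # a perfect model gives CNS = 1.0
```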
Optimization of Calibration of the Chlorinated Ethenes Model
• The model involves the solution of a system of partial differential equations by finite difference methods.
• The equations are highly nonlinear because of the biological reactions.
• Optimization was applied to two cases:
– hypothetical aquifer: at least 8 minutes/simulation
– field data application: 3 hours/simulation
Algorithms Used for Comparison of Optimization Performance on Calibration
•   Stochastic Greedy Algorithm
– Neighborhood defined to make the search global
– Neighbors generated from a triangular distribution around the current solution; moves only to a better solution.
•   Evolutionary Algorithms
– Derandomized evolution strategy (DES) with lambda = 10, b1 = 1/n and b2 = 1/n^0.5 (Ostermeier et al. 1992)
– Binary or real genetic algorithm (GA): population size 10, one-point crossover, mutation probability 0.1, crossover probability 1
•   RBF Function Approximation Algorithms
– RBF-Gutmann: radial basis function approach with cycle length five and an SLH space-filling design
– Stochastic RBF (Cornell): our radial basis function approach (related to CORS)
•   FMINCON
– derivative-based optimizer in Matlab with numerical derivatives

•   10 trials of 100 function evaluations were performed for the heuristic and function approximation algorithms for comparison
Comparison of Algorithm Performance on the Hypothetical Aquifer – CNS

[Figure: best objective value versus number of function evaluations for each algorithm; the experimental design phase for the RBF algorithms ends at evaluation 28.]
Mean and Standard Deviation of the Best Solution Produced After 100 Function Evaluations – Hypothetical Example

[Bar chart: −[Average NS ± S.D.] for each algorithm (the greedy, GA and DES heuristics, FMINCON, and the FA-Gutmann and FA-RBF function approximation algorithms); values range from about −1 to 7.]

• An algorithm with a low mean and a low standard deviation is desirable
• Based on 10 trials
Real Field Site: Alameda Field Data
• The next step was to work with a real field site.

• We obtained data from a DOD field site that had been studied.

• Running the simulation model takes about three hours for one run of the chlorinated ethene model at this site because of the nonlinearities in the kinetics equations.
Numerical Set-up for the Simulation
• 3-D problem with 8 layers (h = 21 ft (6.4 m))
• Finite difference grid with 17 columns × 41 rows, about 4,800 nodes
• Domain approximately 171 ft (52 m) long and 65 ft (22 m) wide
• Constant head of 21 ft on the upstream boundary, an active boundary downstream, and no-flux boundaries along the columns on either side
Why Does a Function Evaluation Take So Long?
• The partial differential equation has 22 species at each node of the finite difference model, and a total of 1,448 time steps.
• There are about 4,800 nodes in 3 dimensions.
• At each time step the values of the 22 species at each node must be computed iteratively to solve the nonlinear system of equations.
• The total number of unknowns over all time steps in this nonlinear system is about 153 million (22 × 4,800 × 1,448).
• Many groundwater problems are larger than this.
Comparison of Algorithm Performance for the Alameda Field Site – CNS

[Figure: best objective value versus number of function evaluations for each algorithm; the experimental design phase for the FA algorithms ends at evaluation 45.]
Comparison of Algorithm Performance for the Alameda Field Site After the Experimental Design – CNS

• 3 trials of 100 function evaluations were performed for each algorithm
• The average value of the objective function is shown after 45 function evaluations
Mean, Max, and Min of the Best Solution Produced After 100 Function Evaluations – CNS

[Bar chart: range of objective values (−[CNS], roughly 60 to 100) for the CNS objective function at the Alameda field site; the mean, min and max are shown for each of DES, FA-Gutmann, FA-RS and FMINCON.]

• Based on 3 trials
• An algorithm with a low mean and little spread is desirable
Conclusions on the Bioremediation Example
• Our function approximation algorithm generally outperformed the alternative algorithms considered.
• This performance was based on a limited number of function evaluations.
• The function approximation algorithm was robust in that it had very few bad results out of 3 or 10 trials.
300 Hours Is a Long Time to Wait! The Need for Parallel Algorithms

• For the field example, we ran each trial for a hundred simulations (3 hours/simulation), so the total time per trial was 300 hours!
• For this reason we have developed parallel algorithms.
• This requires a change in the algorithm to make it effective in parallel.
Parallel Function Approximation Algorithms for Continuous Optimization of Multi-modal, Computationally Expensive Functions

Funded primarily by the NSF CISE Directorate
Average Wall Clock Times of Parallel Global Optimization Methods on the Shekel10 Test Problem

[Figure: average wall clock time in hours versus number of processors (1, 2, 4, 6, 8, 12) for Parallel MG-Restart-RBF, Parallel CORS-Restart-RBF, and Parallel Multistart Fmincon; times fall from over 400 hours on one processor as the number of processors increases.]

Regis and Shoemaker, European Journal of Operational Research, in press, 2006.
Information on Our Algorithm: Metric Stochastic RBF
Rommel Regis and Christine Shoemaker

Regis and Shoemaker, INFORMS Journal on Computing, in press, 2005 (available from Shoemaker).
Function Approximation with Initial Points from Experimental Design

[Figure: an approximation of f(x) fitted through the initial design points, versus the parameters x.]
In real applications x is multidimensional since there are many parameters (e.g. 10).
Radial Basis Functions (RBF) for Function Approximation

[Figure: an RBF surface (solid purple) interpolating 3 points as a sum of three radial basis functions (dashed lines), each centered at an observation point; the vertical axis is the approximation of the objective function and the horizontal axis is the value of a parameter.]
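An RBF interpolant of this kind can be built by solving a small linear system. The sketch below uses a cubic RBF with a linear polynomial tail, a common choice in these methods, though the slides do not specify the exact basis:

```python
import numpy as np

def fit_cubic_rbf(X, f):
    """Fit s(x) = sum_i w_i * ||x - x_i||^3 + (linear tail) through the data.
    X: (n, d) evaluated points; f: (n,) costly function values.
    Returns a callable surrogate that interpolates f at the rows of X."""
    n, d = X.shape
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    Phi = dist ** 3                                   # cubic radial basis matrix
    P = np.hstack([X, np.ones((n, 1))])               # linear polynomial tail
    A = np.block([[Phi, P], [P.T, np.zeros((d + 1, d + 1))]])
    coef = np.linalg.solve(A, np.concatenate([f, np.zeros(d + 1)]))
    w, c = coef[:n], coef[n:]
    def surrogate(x):
        r = np.linalg.norm(X - np.asarray(x), axis=1)
        return w @ r ** 3 + c[:d] @ np.asarray(x) + c[d]
    return surrogate

X = np.array([[0.0], [0.5], [1.0]])
f = np.array([1.0, -0.2, 0.7])
s = fit_cubic_rbf(X, f)   # s passes exactly through the three points
```

The polynomial tail (with the matching orthogonality conditions in the bottom block of `A`) is what makes the cubic-RBF system uniquely solvable for distinct points.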
Outline of Function Approximation Optimization
• 1. Use a “space filling” experimental design (e.g. an SLHD) to select a limited number of evaluation points.
• 2. Build an approximation of the function (with radial basis function splines) over the entire domain, based on the experimental design points.
• 3. Use our algorithm to select the next point for evaluation, based on the function approximation and the locations of previously evaluated points.
• 4. Construct a new function approximation that incorporates the newly evaluated point and all prior points.
• 5. Stop if the maximum number of iterations is reached; otherwise go to Step 3.
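The five steps above can be sketched as a loop. Everything in this sketch (uniform random candidates, the toy least-squares surrogate used in the demo) is illustrative rather than the authors' algorithm, which uses an SLHD design, RBF surrogates, and a more careful point-selection rule:

```python
import numpy as np

def fa_optimize(f, lb, ub, n_init, max_evals, fit_surrogate, rng=None):
    """Sketch of the outlined loop: evaluate an initial design, then repeatedly
    (re)fit a surrogate, pick a promising candidate, and spend one costly
    evaluation of f there. `fit_surrogate(X, y)` returns a callable model."""
    rng = np.random.default_rng(rng)
    d = len(lb)
    X = lb + (ub - lb) * rng.random((n_init, d))      # Step 1: initial design
    y = np.array([f(x) for x in X])
    while len(y) < max_evals:
        s = fit_surrogate(X, y)                       # Steps 2 and 4: fit/refit
        cand = lb + (ub - lb) * rng.random((100, d))  # Step 3: candidate points
        x_new = cand[np.argmin([s(c) for c in cand])]
        X = np.vstack([X, x_new])
        y = np.append(y, f(x_new))                    # one costly evaluation
    best = np.argmin(y)                               # Step 5: stop at the budget
    return X[best], y[best]

def quad_surrogate(X, y):
    """Toy 1-D least-squares quadratic used only for the demo below."""
    A = np.hstack([X ** 2, X, np.ones_like(X)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda x: coef[0] * x[0] ** 2 + coef[1] * x[0] + coef[2]

x_best, y_best = fa_optimize(lambda x: (x[0] - 0.3) ** 2,
                             np.array([0.0]), np.array([1.0]),
                             n_init=5, max_evals=25,
                             fit_surrogate=quad_surrogate, rng=1)
```

The key cost structure is visible in the loop: the expensive function `f` is called exactly `max_evals` times, while the surrogate is evaluated hundreds of times for free.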
Convergence Theorem for Metric Stochastic RBF

Comparisons to Other Algorithms
• Since there are no theoretical results on rates of convergence or on performance for a given number of evaluations, comparison to alternative algorithms has to be done empirically.
• The results might differ from problem to problem, so we have done empirical comparisons on a variety of problems.
Global Optimization Options for Comparisons of Algorithms

• 1. Multi-start of an algorithm that finds local minima. By starting the algorithm many times at different initial points, the hope is that the global minimum will be among the local minima found.
• 2. Use of a global optimization method.

We will compare three global optimization methods and a multistart method using gradient-based optimization.
Local Optimization Methods Considered with Multi Level Single Linkage (MLSL) Multistart Methods

• SQP (fmincon in the Matlab toolbox)
• Direct search (Matlab toolbox)
• UOBYQA (Powell 2000) as implemented in Matlab by Vanden Berghen and Bersini (2004)
Global Methods Considered
• Global Stochastic RBF (ours)
• Local Stochastic RBF with multistart, restarting from a newly generated random symmetric Latin hypercube (ours)
• Simulated Annealing (with a systematic procedure for picking the initial temperature and α)
• RBF-Gutmann (2001) from the Powell group
Summary of Results
• We solved 17 test problems and one small groundwater problem.
• Multistart Local Metric Stochastic RBF (ours) is the best among the eight algorithms considered on higher dimensional problems, and is as good or better on lower dimensional test problems.
• Global Metric Stochastic RBF is competitive with the other alternatives on lower dimensional problems.
Conclusions on Metric Stochastic RBF
• The local metric stochastic RBF outperformed all of the other algorithms when looking at the suite of test problems.
• Current and future work includes:
– Combining local and global stochastic RBF
– New multi-start methods
– Derivative-free local optimization
Integrated Statistical and Optimization Analysis for Computationally Expensive Models of Complex Environmental Systems

A National Science Foundation project from the Mathematical and Physical Sciences Directorate
C. Shoemaker and D. Ruppert, PIs
N. Blizniouk (lead author), R. Regis, S. Wild, & P. Mugunthan
(submitted manuscript)
Calibration and Uncertainty Analysis
• We are interested in the relationship between the data used to calibrate a model and uncertainty.

• The goal of our project is to assess the uncertainty in calibrated parameter values and in model outputs.

• The procedure combines our optimization and function approximation procedures with Bayesian analysis.
Uncertainty and Costly Functions

• We are using function approximation in both the optimization and the statistical assessment to reduce the number of costly function evaluations.
Objective Function
• The objective (to be maximized) is an approximation of the likelihood function.
• The likelihood function includes the basic parameters as well as transformations to convert non-normal random variables into normal ones.
Role of Optimization
• We want to approximate the objective function in parameter space.
• We use our optimization search to find the global minimum and the important local minima.
• We build a function approximation of the objective function based on this information.
Optimization Problem Is Changed: Find All Important Local Optima

[Figure: a multimodal objective (likelihood) surface with its important optima marked.]
Optimization Problem Is Changed: Find All Important Local Minima

• Finding multiple local optimizers requires a change in the search procedure.
• It also requires more evaluations than finding just a single optimizer.
Function Approximation of the Likelihood Function
• Based on the “costly” function evaluations done in the optimization search (hundreds of simulations), we build a function approximation of the likelihood function.
• We then apply MCMC (Markov chain Monte Carlo) using the function approximation of the likelihood function to obtain the joint pdf (probability density function) of the parameter values (this requires 10,000+ function evaluations).
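The second step can be sketched with a random-walk Metropolis sampler whose target is the cheap approximate log-likelihood rather than the costly simulation. This is a minimal sketch assuming a flat prior; the demo target is a stand-in Gaussian, not the surrogate from the slides:

```python
import numpy as np

def metropolis(log_like, theta0, n_steps, step, rng=None):
    """Random-walk Metropolis on an (approximate) log-likelihood; with the
    surrogate in place of the simulator, each step costs almost nothing."""
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float)
    ll = log_like(theta)
    chain = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.shape)
        ll_prop = log_like(prop)                  # surrogate call, no simulation
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept/reject
            theta, ll = prop, ll_prop
        chain[k] = theta
    return chain

# Demo: a Gaussian log-"likelihood" centered at 0.5 with sd 0.1; the chain
# should recover roughly that mean and spread.
chain = metropolis(lambda t: -0.5 * np.sum((t - 0.5) ** 2) / 0.1 ** 2,
                   theta0=[0.0], n_steps=20000, step=0.15, rng=0)
```

In the slides' workflow, `log_like` would be the RBF approximation fitted to the hundreds of costly evaluations, so the 10,000+ MCMC steps never touch the simulator.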
• Below is an illustration of the environmental application problem used in this analysis.
• There is a chemical spill of mass M into a long narrow channel of water at both locations marked in red.
• The system is described by the advection-diffusion equation.
• We want to estimate the mass M, the time τ and location L of the second spill, and the diffusion coefficient D, which are the model parameters.
• The output from the model that we want is the average pollutant concentration over time at the end of the channel.
The equations describing the concentration C as a function of location s and time t and the parameters M, D, L, and τ are the standard point-source solutions of the diffusion equation:

$$C(s,t;M,D,L,\tau) \;=\; \frac{M}{\sqrt{4\pi D t}}\exp\!\left(-\frac{s^{2}}{4Dt}\right) \;+\; \mathbb{1}\{t>\tau\}\,\frac{M}{\sqrt{4\pi D (t-\tau)}}\exp\!\left(-\frac{(s-L)^{2}}{4D(t-\tau)}\right)$$
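In code, with the second spill's contribution gated by a time check, the model looks like the following sketch. The parameter values in the demo are hypothetical, chosen only for illustration; the exact model form should be taken from the submitted manuscript:

```python
import numpy as np

def concentration(s, t, M, D, L, tau):
    """Concentration at location s and time t from two spills of mass M:
    one at location 0 at time 0 and one at location L at time tau
    (point-source solutions of the diffusion equation)."""
    c = M / np.sqrt(4 * np.pi * D * t) * np.exp(-s ** 2 / (4 * D * t))
    if t > tau:                                   # second spill acts only after time tau
        dt = t - tau
        c += M / np.sqrt(4 * np.pi * D * dt) * np.exp(-(s - L) ** 2 / (4 * D * dt))
    return c

# Model output used for calibration: the average concentration over time at
# the end of the channel (all parameter values here are hypothetical).
times = np.linspace(0.3, 60.0, 200)
series = np.array([concentration(s=2.5, t=t, M=10.0, D=0.07, L=1.5, tau=30.0)
                   for t in times])
avg_concentration = series.mean()
```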
Numerical Results on the Chemical Spill Problem
• The following slides show some numerical results.
• The objective function in this case has one local minimum.
• The solid line is the true value of the marginal pdf of the parameter distribution.
• The colored lines show the pdfs obtained by MCMC on the function approximation (there are multiple RBF dashed lines for different approximations).
The graphs below are the estimates of the marginal posterior densities obtained by a) conventional MCMC analysis with 10,000 function evaluations (solid line) and b) our function approximation method with 150 function evaluations (dashed lines). Each graph is for one parameter.
Hence, with 150 function evaluations (and function approximation) we obtained densities that are very similar to those obtained with 10,000 function evaluations and conventional Bayesian statistics, which is over a 66-fold reduction in computational demands!
Uncertainty Analysis of Model Output
• Typically what we really want to know is not the uncertainty in the parameters but rather the uncertainty in the model predictions (e.g. the “output”).
• The following shows the comparison of the uncertainty of the prediction of average pollutant concentrations using a) our function approximation method with 150 function evaluations and b) conventional MCMC Bayesian analysis.
[Figure: estimates of the marginal posterior density of the OUTPUT, the average concentration of the pollutant at a fixed location over time, comparing the two approaches.]
Again we got excellent agreement between our approach with 150 evaluations and the conventional approach with 10,000 evaluations.
Significance of Results
• To obtain the pdf by MCMC normally requires 10,000 or more “costly” function evaluations.
• These results indicate we were able to get good results with much less computational effort by
– doing 150 “costly” function evaluations and then fitting a function approximation to the likelihood function, and
– evaluating 10,000 points on the function approximation surface with the MCMC (an insignificant amount of computing).
Summary

• Our Function Approximation Optimization (FAO) appears to be promising for the global optimization of computationally expensive functions when the number of simulations is limited.

• FAO can potentially also contribute to uncertainty analysis for computationally expensive functions. We reduced computational requirements to 1/66th of that required by conventional MCMC methods.
Collaboration
• We are interested in collaborating with other groups on applying our methods to computationally expensive simulation models.
References
Selected related papers (contact me <cas12@cornell.edu> for a copy):

•   Mugunthan, P., C.A. Shoemaker, and R.G. Regis, “Comparison of Function Approximation, Heuristic and Derivative-based Methods for Automatic Calibration of Computationally Expensive Groundwater Bioremediation Models,” Water Resources Research, Vol. 41, W11427, doi:10.1029/2005WR004134, Dec. 2005.

•   Regis, R.G. and C.A. Shoemaker, “A Stochastic Radial Basis Function Method for the Global Optimization of Expensive Functions,” INFORMS Journal on Computing, in press.

•   Regis, R.G. and C.A. Shoemaker, “Parallel Radial Basis Function Methods for the Global Optimization of Expensive Functions,” European Journal of Operational Research, in press.

•   Mugunthan, P. and C.A. Shoemaker, “Assessing the Impacts of Parameter Uncertainty for Computationally Expensive Groundwater Models,” Water Resources Research, in press.

•   Tolson, B.A. and C.A. Shoemaker, “The Dynamically Dimensioned Search Algorithm for Computationally Efficient Automatic Calibration of Environmental Simulation Models,” Water Resources Research, in press.

•   Some of these references are listed in the SAMSI abstract.
Selecting the Next Simulation Point Based on Conflicting Goals
• We want points that:
– have low values on the function approximation, and
– are at some distance from previously evaluated points (sets of parameter values).
• We select the next point for simulation based on a weighted value of the above two criteria.
• We cycle through weights so that the emphasis moves from local search to global search.
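A sketch of that weighted selection rule follows. The scaling of the two criteria and the weight-cycling details are illustrative here; the exact rule is given in the Regis and Shoemaker papers:

```python
import numpy as np

def select_next_point(candidates, surrogate, X_evaluated, w):
    """Score each candidate by a weighted sum of (a) its predicted value on the
    function approximation and (b) its negative distance to the nearest
    previously evaluated point, both rescaled to [0, 1]. A weight w near 1
    emphasizes local search; w near 0 emphasizes global search."""
    pred = np.array([surrogate(c) for c in candidates])
    dist = np.array([np.min(np.linalg.norm(X_evaluated - c, axis=1))
                     for c in candidates])
    def unit(v):                                  # rescale to [0, 1]
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    score = w * unit(pred) + (1 - w) * (1 - unit(dist))   # low score is good
    return candidates[np.argmin(score)]

cands = np.array([[0.1], [0.5], [0.9]])
X_done = np.array([[0.45]])
pick = select_next_point(cands, lambda c: (c[0] - 0.5) ** 2, X_done, w=1.0)
# with w = 1.0 the rule is purely local and picks [0.5], the lowest prediction
```

Cycling w through a pattern such as 0.95, 0.7, 0.4, 0.1 and back again moves the emphasis from local to global search, as the slide describes.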
Proof of Concept
• Although our eventual goal is to develop a method that can be used on computationally expensive models, in this example we have used a model that can be evaluated very quickly.
• In this way we can afford to also do the 10,000 model evaluations required by conventional Bayesian analysis on this problem.
• For truly expensive functions the conventional approach with 10,000 evaluations is not feasible, so our method is the only feasible one.
• However, we need to know whether our method will give results similar to the conventional approach.
• The numerical results on this problem do support the accuracy of the method.