# Rank-Deficient and Ill-Conditioned Nonlinear Least Squares Problems

```					Rank-Deﬁcient Nonlinear Least Squares Problems
Nonlinear Equations (time permitting)
Conclusions

Rank-Deficient and Ill-Conditioned Nonlinear
Least Squares Problems

C. T. Kelley
NC State University
tim_kelley@ncsu.edu
Joint with K. I. Dickson, S. Pope, I. C. F. Ipsen, L. Ellwein,
M. Olufsen, V. Novak

IFIP WG 2.5, Raleigh

C. T. Kelley   Rank-Deficient Problems

Outline

Rank-Deficient Nonlinear Least Squares Problems
    Theory
    Subset Selection
    Examples

Nonlinear Equations (time permitting)
    Continuation
    Bounds on Singular Values
    Example

Conclusions

Motivating Application: Pope, Olufsen, Ellwein, Novak

Compartmental Model of Cardio-Vascular System
Integrate dynamics with ode15s
Leads to nonlinear least squares problem $\min f$ where

$f(p) = R(p)^T R(p)/2$; $R : \mathbb{R}^N \to \mathbb{R}^M$

Too many fitting parameters
    nonlinear dependencies
    insensitive model output
Problems with optimization
    Levenberg-Marquardt decreases the function, then stagnates,
    BUT difference gradients at the “solution” are not small,
    so there’s no reason to believe the results.

What can you do?

Obvious thing: “Regularize” the Jacobian
    Compute the SVD of $R'$; set “small” singular values to zero;
    Use the regularized Jacobian in place of $R'$ in the
    Levenberg-Marquardt step

$(\nu I + R'(p)^T R'(p)) s = -R'(p)^T R(p) = -\nabla f(p)$

Does exactly what you want if you have
    small residual,
    clear gap in singular values, and
    highly accurate computation of $R$ and $R'$.
Otherwise, you can get very poor results.

Analysis in Ideal Case: nonlinear dependence

Assume we can factor $R$ as

$R(p) = \tilde{R}(B(p))$

where $B$, $\tilde{R}$ are Lipschitz continuously differentiable and for some $K \le N$
    $B : \mathbb{R}^N \to \mathbb{R}^K$, and the Jacobian $B'$ has full row rank,
    $\tilde{R} : \mathbb{R}^K \to \mathbb{R}^M$, and the Jacobian $\tilde{R}'$ has full column rank.
    The smallest nonzero singular values of $B'$ and $\tilde{R}'$ are uniformly
    bounded away from zero.
Note: You do not know $B$, only that it exists.

Consequences

$R' = U \Sigma V^T$ has $K$ nonzero singular values
There is $\bar{\sigma}_K > 0$ such that $\sigma_K \ge \bar{\sigma}_K$ for all $p$.
For any $\hat{b} \in \mathbb{R}^N$ the set $\{p \mid B(p) = \hat{b}\}$ consists of isolated
points.

Optimality assumptions
Assume that

$\tilde{f} = \frac{1}{2} \tilde{R}^T \tilde{R}$

has a unique minimizer $b^* \in \mathbb{R}^K$.

So $f$ is minimized on the set

$Z = \{p \mid f(p) = f^*\} = \{p \mid B(p) = b^*\}$,

where $f^* = (1/2)(R^*)^T R^*$ and $R^* = \tilde{R}(b^*)$.
We assume that

$\{p \mid g(p) \equiv R'(p)^T R(p) = 0\} = Z$,

and let

$Z_\delta = \{p \mid \|p - p^*\| \le \delta \text{ for some } p^* \in Z\}$.

Levenberg-Marquardt Method: Trial step $s_t$

From a current point $p_c$,

$s_t = -(\nu I + R'(p_c)^T R'(p_c))^{-1} R'(p_c)^T R(p_c)$

Levenberg-Marquardt is a trust region approach, with the usual
surplus of parameters:

$0 < \omega_{down} < 1 < \omega_{up}$, $\nu_0 \ge 0$, and $0 \le \mu_0 < \mu_{low} \le \mu_{high} < 1$.

A typical choice is

$\mu_0 = 0$, $\mu_{low} = 1/4$, $\mu_{high} = 3/4$, $\omega_{down} = 1/2$, and $\omega_{up} = 2$.

Levenberg-Marquardt Method: Managing $\nu$, finding $p_+$

levmar_step($p_c$, $p_t$, $p_+$, $f$, $\nu$)
1. $z = p_c$
2. Do while $z = p_c$
   2.1 $ared = f(p_c) - f(p_t)$, $s_t = p_t - p_c$, $pred = -\nabla f(p_c)^T s_t / 2$
   2.2 If $ared/pred < \mu_0$ then set $z = p_c$, $\nu = \max(\omega_{up} \nu, \nu_0)$, and
       recompute the trial point with the new value of $\nu$.
   2.3 If $\mu_0 \le ared/pred < \mu_{low}$, then set $z = p_t$ and
       $\nu = \max(\omega_{up} \nu, \nu_0)$.
   2.4 If $\mu_{low} \le ared/pred$, then set $z = p_t$.
       If $\mu_{high} < ared/pred$, then set $\nu = \omega_{down} \nu$.
       If $\nu < \nu_0$ then set $\nu = 0$.
3. $p_+ = z$.

Estimate for Levenberg-Marquardt step

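$s_t = -(\nu I + R'(p)^T R'(p))^{-1} R'(p)^T R(p)$
If $p_c \in Z_\delta$ for sufficiently small $\delta$, then

$s_t = -(\nu I + R'(p_c)^T R'(p_c))^\dagger R'(p_c)^T R'(p_c) e_c + \Delta_S$,

where

$\|\Delta_S\| \le \frac{\gamma \|e_c\|^2}{2 \sigma_K} + \frac{\gamma \|e_c\| \|R^*\|}{\nu + \bar{\sigma}_K^2}$.

Here $\gamma$ is the Lipschitz constant of $R'$.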

Convergence Analysis

Let

$d(p) = \min_{p^* \in Z} \|p - p^*\|$

The estimate for the Levenberg-Marquardt step implies

$d(p_+) = O\left( \left( \frac{\nu}{\nu + \sigma_K^2} + \|R(p^*)\| + d(p_c) \right) d(p_c) \right)$

Why is this good?

Nonlinear equations: $N = M = K$ is Newton.
Full rank case $K = N$ is Gauss-Newton.
$K < N$ leads to convergence in exact arithmetic:
    $\nu \to 0$ (so you’re getting close to Gauss-Newton).
    $s_t$ approaches the minimum norm solution of

    $R'(p_c) s_t = -R(p_c)$

    as it should.
    Levenberg-Marquardt iterates converge to a point in $Z$
    (but you can’t predict which one).

Errors in $R$ and $R'$

If
    you have small errors in $R$ and $R'$,
    $\|R^*\|$ is small, and
    you know what $K$ is (clear gap in computed $\sigma$s),
then nothing goes wrong.
Replace the computed $R'$ with $J$, where

$R'_{compute}(p) = U \Sigma V^T$, let $\Sigma_J = \mathrm{diag}(\sigma_1, \dots, \sigma_K, 0, \dots, 0)$,

and set $J = U \Sigma_J V^T$.
Then we use $J^T R$ for the gradient.

Error Analysis

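Let

$J = R' + E$, $\tilde{s} = -(\nu I + J^T J)^{-1} J^T R$, and $\eta(\nu) = \max_{\sigma_K \le \sigma \le \sigma_1} \frac{\sigma}{\nu + \sigma^2}$.

Assume that

$\gamma = \frac{2 \|E\|_F}{\sigma_k - 2 \|E\|} < 1/2$ and $\|E\| \left( 2 \eta(\nu) + \frac{\|E\|}{\nu + \sigma_k^2} \right) < 1$.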

Then

$\|s - \tilde{s}\| \le \|R\| \left( 2 \eta(\nu)(1 + \gamma + \gamma^2) + \frac{2 \|E\|}{\nu + \sigma_k^2} \right)$.

What can go wrong?

If the gap between $\sigma_K$ and $\sigma_{K+1}$ is small,
    you may have trouble identifying $K$, and, even if you know $K$,
    the span of the first $K$ singular vectors may change
    significantly with each nonlinear iteration,
    so the error $E$ in $J$ could be $\approx \sigma_K$.
If $\|R^*\|$ is too large then the convergence estimate is a problem.
Small $\|J^T R\|$ may be a poor indicator of convergence.
So there’s some confusion.

Subset Selection: Linear Least Squares

Find an “optimal” linearly independent set of $K$ columns for an $M \times N$
matrix $A$, i.e.
    the span of the columns you keep (nearly) includes the ones you discard
    the condition of the smaller $M \times K$ matrix is good
So you transform a nearly rank-deficient matrix into a full rank one.
    Golub/Klema/Stewart 1976
    Vélez-Reyes 1992
    Chandrasekaran/Ipsen 1994
    Gu/Eisenstat 1996

Subset Selection Ideas

Data: $M \times N$ matrix $A$, integer $K$
Find a permutation $P$ so that $AP = (A_1, A_2)$ and for some $\eta \ge 1$
    $A_1$ ($M \times K$) is well conditioned

    $\sigma_K(A)/\eta \le \sigma_K(A_1) \le \eta \sigma_K(A)$

    Columns of $A_2$ are “nearly spanned” by those of $A_1$

    $\sigma_{K+1}(A) \le \min_Z \|A_1 Z - A_2\| \le \eta \sigma_{K+1}(A)$.

With “optimal” $P$ you can get $\eta = \sqrt{1 + K(N - K)}$

Subset Selection for us

Assume prior knowledge of $K$
Apply to computed $R'$ at the start
    extract $K$ design variables
    set other $N - K$ to nominal values
    do full-rank computation
Query span of $K$ columns and conditioning at the end.

Example: Parameter ID for IVP

Dynamics:

$y' = F(t, y : p)$, $y(0) = y_0$, $p \in \mathbb{R}^N$.

Fit numerical solution of IVP to data vector $d \in \mathbb{R}^M$,

$f(p) = \frac{1}{2} \sum_{i=1}^{M} (\tilde{y}(t_i : p) - d_i)^2$

We compute $\tilde{y}$ with ode15s.

Jacobian and sensitivities

$R_i(p) = \tilde{y}(t_i : p) - d_i$,
and we compute the columns of the Jacobian by computing the
sensitivities,

$w_p = \partial y / \partial p$, so $R'_{ij}(p) = w_{p_j}(t_i)$.

$w_p$ is the solution of the initial value problem obtained by differentiating the dynamics with respect to $p$,

$w_p' = F_y(y, p) w_p + F_p(y, p)$, $w_p(0) = 0$.

Solve for $w$ and $y$ simultaneously, so accuracy in $R$ and $R'$ is
roughly the same.

Driven Harmonic Oscillator

$(1 + 10^{-3} \delta_m) y'' + (c_1 + c_2) y' + k y = A \sin(\omega t)$, $y(0) = y_0$, $y'(0) = y_0'$.

Here $p = (\delta_m, c_1, c_2, k)^T \in \mathbb{R}^4$. There is a small singular value from $p_1$ and
one zero singular value since

$\partial R / \partial c_1 = \partial R / \partial c_2$.

Data come from exact solution with

$p^* = (1.23, 1, 0, 1)^T$, and we use $p_0 = (0, 1, 1, .3)^T$.

Highly Accurate Integration

Accuracy tolerances to ode15s were

$\tau_a = \tau_r = 10^{-8}$

and we got

$p = (1.22, .5, .5, 1)^T$ (no SS) and $(1.23, 0, 1, 1)^T$ (with SS)

which is very good. Here SS denotes subset selection.
The singular values were

(1.13e+02, 2.16e+00, 5.57e-04, 1.68e-15)

so there is a clear gap.

Driven Oscillator: High Accuracy

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-12}$) versus iterations (0 to 35).]

Driven Oscillator: High Accuracy: SS

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-14}$) versus iterations (0 to 25).]

Large residual

Perturb the data componentwise by $1 + 10^{-4}\,\mathrm{rand}$. Results:

$p = (.636, .5, .5, .998)^T$ (no SS) and $(1.27, 0, 1, 1)^T$ (with SS)

So $\delta_m$ is completely wrong without SS.

Driven Oscillator: High Accuracy: Large $R^*$

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-8}$) versus iterations (0 to 35).]

Driven Oscillator: High Accuracy: Large $R^*$; SS

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-8}$) versus iterations (0 to 25).]

Driven Oscillator; Low Resolution

In this example we set

$\tau_a = \tau_r = 10^{-4}$

and get

$p = (.09, .5, .5, 1)^T$ (no SS) and $(.97, 0, 1, 1)^T$ (with SS)

So with SS we can recover one figure even with poor accuracy.

Driven Oscillator: Low Accuracy

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-6}$) versus iterations (0 to 14).]

Driven Oscillator: Low Accuracy: SS

[Figure: least squares error (log scale, $10^{4}$ down to $10^{-5}$) versus iterations (0 to 20).]

[Figure: two curves, “No subset selection” and “Subset Selection (5 parameters)”, on a log scale ($10^{1}$ down to $10^{-4}$) versus iteration (0 to 12).]

Parameter-Dependent Nonlinear Equations

Objective: Given $G : \mathbb{R}^{N+1} \to \mathbb{R}^N$, solve

$G(u, \lambda) = 0$

where $u \in \mathbb{R}^N$, $\lambda \in \mathbb{R}$, to recover $u$ as a function of $\lambda$.
Simple continuation (increasing $\lambda$) fails if $G_u$ is singular.

Pseudo-Arclength Continuation (Keller et al, very old)

Pseudo-arclength continuation adds an artificial parameter $s$ and
treats $x = (u, \lambda)$ as a function of $s$.

$F(x, s) = \begin{pmatrix} G(x) \\ N(x, s) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$.

Here $N$ is a normalization which makes $s$ an “arclength”.
Example:

$N(x, s) = \dot{x}_0^T (x - x_0) - (s - s_0)$

where $\dot{x}$ is an approximation of $dx/ds$.

Assumptions on Singularities

We assume that either $G_u$ is nonsingular or $(u, \lambda)$ is a simple fold.
A solution $(u_0, \lambda_0)$ of $G(u, \lambda) = 0$ is a simple fold if
    $\dim(\mathrm{Ker}(G_u(u_0, \lambda_0))) = 1$ and
    $G_\lambda(u_0, \lambda_0) \notin \mathrm{Range}(G_u(u_0, \lambda_0))$.
In this case $F_x$ is always nonsingular at a solution of $F(x, s) = 0$.
The length of the step in arclength we can take depends on
$\|F_x(x, s)^{-1}\|$, and we obtain a new bound for that.

Bound on singular values Dickson, Ipsen, K (SINUM, 2007)

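Let

$G_u(u, \lambda) = U \Sigma V^T$

be a singular value decomposition (SVD) of $G_u(u, \lambda)$ where

$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \dots, \sigma_N)$, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_N$, $u_N \equiv U e_N$.

Since we have at worst simple folds, there is $\bar{\sigma} > 0$ such that

$\sigma_{N-1} \ge \bar{\sigma} > 0$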

Simple fold via SVD

Let $(u_0, \lambda_0)$ be a solution of $G(u, \lambda) = 0$, and let $u_N(u_0, \lambda_0)$ be a
left singular vector of $G_u(u_0, \lambda_0)$ associated with $\sigma_N$.
Then $(u_0, \lambda_0)$ is a simple fold if
    $\sigma_{N-1}(u_0, \lambda_0) > 0$,
    $\dim(\mathrm{Ker}(G_u(u_0, \lambda_0))) = 1$ and
    $u_N(u_0, \lambda_0)^T G_\lambda(u_0, \lambda_0) \ne 0$.

Simple folds are worst singularities via SVD

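$\max\left( \sigma_N(u, \lambda)^2,\ |u_N(u, \lambda)^T G_\lambda(u, \lambda)|^2 \, \frac{\mathrm{gap}}{\mathrm{gap} + \xi^2} \right) \ge \alpha > 0$,

where

$\mathrm{gap} \equiv \sigma_{N-1}(u, \lambda)^2 - \sigma_N(u, \lambda)^2$,

and

$\xi \equiv |u_N(u, \lambda)^T G_\lambda(u, \lambda)| + \|(I - u_N(u, \lambda) u_N(u, \lambda)^T) G_\lambda(u, \lambda)\|$.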

Estimate of $F_x^{-1}$

On the solution path

$dG(u, \lambda)/ds = G_u \dot{u} + G_\lambda \dot{\lambda} = 0$

So near the solution path

$\|G_u \dot{u} + G_\lambda \dot{\lambda}\| \le \tau < \alpha$

and in that region

$\sigma_{\min}(F_x) \ge 1 - \tau \max\left( \frac{1}{\alpha}, 1 \right)$.

Chandrasekhar H-Equation

$H(\mu) = \left( 1 - \frac{c}{2} \int_0^1 \frac{\mu H(\nu)\, d\nu}{\mu + \nu} \right)^{-1}$.

Objective: Compute $H(\mu)$ for $\mu \in [0, 1]$ as a function of $c \ge 0$.
Simple fold at $c = 1$.
Singularity structure for the discrete problem is the same.

H as function of c

[Figure: $\|H\|_1$ (0 to 14) versus $c$ (0 to 1).]

$\sigma_{\min}(G_H)$ as function of $c$

[Figure: $\sigma_{\min}(G_H)$ (0 to 1) versus $c$ (0 to 1).]

$\sigma_{\min}(F_x)$ as function of $c$

[Figure: $\sigma_{\min}(F_{H,c})$ (0.65 to 1) versus $c$ (0 to 1).]

Conclusions

Rank-Deficient Nonlinear Least Squares
    Special structure from dependent design variables
    Great results in exact arithmetic
    Less great results with errors
    Subset selection can help
Rank-Deficient Nonlinear Equations
    Simple fold singularities
    Pseudo-arclength Continuation
    Uniform condition estimates

```
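
The Levenberg-Marquardt iteration described in the slides, with the computed Jacobian replaced by its rank-K truncated SVD J and with ν managed as in levmar_step, can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' code: the function name `levmar_truncated`, the stopping test on the norm of J^T R, the cap on rejected trial steps, and the two-parameter test problem are assumptions made for illustration.

```python
import numpy as np

def levmar_truncated(R, Rprime, p0, K, nu=1.0, nu0=1e-8, tol=1e-10, maxit=100,
                     mu0=0.0, mu_low=0.25, mu_high=0.75, w_down=0.5, w_up=2.0):
    """Levenberg-Marquardt with the Jacobian replaced by its rank-K truncated SVD.
    Assumes the K-th singular value of the Jacobian stays positive."""
    p = np.asarray(p0, dtype=float)
    f = lambda q: 0.5 * np.dot(R(q), R(q))
    for _ in range(maxit):
        r = R(p)
        U, sig, Vt = np.linalg.svd(Rprime(p), full_matrices=False)
        U, sig, Vt = U[:, :K], sig[:K], Vt[:K]        # J = U diag(sig) Vt
        utr = U.T @ r
        grad = Vt.T @ (sig * utr)                     # J^T R, used as the gradient
        if np.linalg.norm(grad) <= tol:
            break
        for _ in range(60):                           # cap on rejected trial steps
            s = -Vt.T @ (sig / (nu + sig**2) * utr)   # -(nu I + J^T J)^+ J^T R
            ratio = (f(p) - f(p + s)) / (-0.5 * grad @ s)   # ared / pred
            if ratio < mu0:                           # reject: raise nu and retry
                nu = max(w_up * nu, nu0)
                continue
            p = p + s                                 # accept the trial point
            if ratio < mu_low:
                nu = max(w_up * nu, nu0)
            elif ratio > mu_high:
                nu = w_down * nu
                if nu < nu0:
                    nu = 0.0
            break
    return p, nu

# Hypothetical test problem: R depends on p only through b = p[0] + p[1],
# so N = 2, K = 1, and the residual vanishes on the line p[0] + p[1] = 1.
R = lambda p: np.array([p[0] + p[1] - 1.0, (p[0] + p[1])**2 - 1.0])
Rprime = lambda p: np.array([[1.0, 1.0],
                             [2.0 * (p[0] + p[1]), 2.0 * (p[0] + p[1])]])

p, nu = levmar_truncated(R, Rprime, [0.3, 0.0], K=1)
print("p =", p, " final nu =", nu)
```

In this toy problem the row space of the Jacobian is always spanned by (1, 1), so every step moves along that direction and the difference p[0] - p[1] is fixed by the starting point. In general the row space changes from iteration to iteration, which is why the limit point in Z cannot be predicted in advance.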
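
The subset selection idea can be illustrated with greedy column pivoting, the selection rule used in column-pivoted QR. This is a simpler heuristic than the rank-revealing algorithms cited in the slides and it does not guarantee the bound with η = sqrt(1 + K(N − K)). The function name `select_columns` and the 5 × 4 test matrix, whose duplicated column mimics ∂R/∂c1 = ∂R/∂c2 in the oscillator example, are hypothetical.

```python
import numpy as np

def select_columns(A, K):
    """Greedily pick K columns of A: at each stage take the column with the
    largest norm after projecting out the columns already chosen.
    Assumes A has rank at least K, so the selected norm is never zero."""
    W = np.array(A, dtype=float)            # working copy, deflated in place
    chosen = []
    for _ in range(K):
        norms = np.linalg.norm(W, axis=0)
        norms[np.array(chosen, dtype=int)] = -1.0   # never pick a column twice
        j = int(np.argmax(norms))
        chosen.append(j)
        q = W[:, j] / norms[j]              # unit vector in the new direction
        W -= np.outer(q, q @ W)             # remove that direction from all columns
    return chosen

# Hypothetical 5 x 4 matrix: columns 1 and 2 coincide, column 3 is tiny.
a0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
a1 = np.array([1.0, 0.0, -1.0, 0.0, 1.0])
a3 = 1e-3 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])
A = np.column_stack([a0, a1, a1, a3])

K = 2
keep = select_columns(A, K)
drop = [j for j in range(A.shape[1]) if j not in keep]
A1, A2 = A[:, keep], A[:, drop]

sig = np.linalg.svd(A, compute_uv=False)    # singular values of the full matrix
Z = np.linalg.lstsq(A1, A2, rcond=None)[0]  # best fit of discarded columns
resid = np.linalg.norm(A1 @ Z - A2, 2)      # min over Z of ||A1 Z - A2||

print("kept columns:", keep)
print("cond(A1):", np.linalg.cond(A1))
print("sigma_{K+1}(A):", sig[K], " residual:", resid)
```

The two printed diagnostics correspond to the two requirements on the slide "Subset Selection Ideas": the kept block A1 should be well conditioned, and the discarded columns should be nearly spanned by A1, with a residual no smaller than σ_{K+1}(A).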