
									    6. OVERCONSTRAINED LINEAR EQUATIONS




•   The Pseudoinverse
•   QR Decomposition
•   The Homogeneous Case
•   Nonlinear equations: Bisection and Newton’s method


Class web site:

http://www.di.ens.fr/~brette/calculscientifique/index.htm




                                 1
The LU Decomposition


• Gaussian elimination → the LU decomposition A = LU with

        | 1                              |
        | v21   1                        |
    L = | v31   v32   1                  |
        | ...   ...   ...   ...          |
        | vn1   vn2   ...   vn,n−1   1   |

and

        | ∗  ∗  ∗  ...  ∗  ∗ |
        |    ∗  ∗  ...  ∗  ∗ |
    U = |       ∗  ...  ∗  ∗ |
        |           ...  ... |
        |                  ∗ |



                                                  2
• Cost: 2((n − 1) × (n − 1) + . . . + 2 × 2 + 1) ≈ (2/3) n^3.




                                2
Solving Ax = b



1. Compute L, U, P such that P A = LU
2. Note that P Ax = (LU )x = P b
  2.1. Solve Ly = P b
  2.2. Solve U x = y


In MATLAB:

[L,U,piv]=GEpiv(A);    ← (2/3) n^3 flops
y=LTriSol(L,b(piv));   ← n^2 flops
x=UTriSol(U,y);        ← n^2 flops


How much does solving Ax = bi for i = 1 : k cost?

    (2/3) n^3 + O(k n^2) flops.




                                 3
// Returns the LU decomposition of A with pivoting
function [L,U,piv]=ludecpiv(A)
[n,n]=size(A);
piv=[1:n];
for k=1:n-1
  [maxv,r]=max(abs(A(k:n,k)));
  q=r+k-1;
  piv([k q])=piv([q k]);
  A([k q],:)=A([q k],:);
  if A(k,k)~=0
    A(k+1:n,k)=A(k+1:n,k)/A(k,k);
    A(k+1:n,k+1:n)=A(k+1:n,k+1:n)-A(k+1:n,k)*A(k,k+1:n);
    end
  end
L=eye(n,n)+tril(A,-1);
U=triu(A);
endfunction

// Solves the linear system A*x=b via LU decomposition
// with pivoting
function x=lusolvepiv(A,b)
[L,U,piv]=ludecpiv(A);
y=lowtri(L,b(piv));
x=uptri(U,y);
endfunction




                          4
Condition number of a matrix


Define: κ1(A) := ||A||_1 ||A^−1||_1, the condition number of A.

We have:

• κ1(A) ≥ 1,

• κ1(λA) = κ1(A) for λ ≠ 0,

• The larger κ1(A), the closer A is to being singular.


Theorem: If A ∈ IR^{n×n} is non-singular, Ax = b, and ε κ1(A) < 1,
then

• Fl(A) is non-singular, and

• the solution x̂ of Fl(A) x̂ = Fl(b) verifies:

    ||x̂ − x||_1 / ||x||_1 ≤ ε κ1(A) (1 + ||x̂||_1 / ||x||_1)

Note:

• This result has nothing to do with any algorithm.

• Similar results hold for the other norms.




                                  5
Backward Stability



• If x̂ is the solution to the problem Ax = b found by Gaussian
elimination, then

    (A + E) x̂ = b

for some matrix E such that ||E|| ≈ ε ||A||.

• Note: ||Fl(A) − A|| ≈ ε ||A||

• Corollary:

    ||A x̂ − b|| ≈ ε ||A|| ||x̂||

    ||x̂ − x|| / ||x|| ≈ ε κ(A)




                                   6
Linear Systems Again...



• Square Systems (A is n × n):

    A x = b

Solution: Gaussian elimination with pivoting.


• Rectangular Systems (A is m × n):

◦ m < n (A wider than tall): an underconstrained system, with an
infinite family of solutions.

◦ m > n (A taller than wide): an overconstrained system, with no
exact solution in general.




                                    7
Be Pragmatic!


If Ax = b is an overconstrained system without an exact solution,
find x that minimizes a reasonable error.

• Ex1: Minimize ||Ax − b||_1.

• Ex2: Minimize ||Ax − b||_2^2 ← LINEAR LEAST SQUARES.




Application: Data analysis: compute the parameters of a model
that best fits the data.




• Ex3:
                          f (x) = ax + b
• Ex4:
                  f (x) = anxn + · · · + a1x + a0



• Note: The Euclidean norm yields simple formulas for derivative
computation:

    ||x||_2^2 = x_1^2 + x_2^2 + · · · + x_n^2 = x·x = x^T x




                                  8
The PseudoInverse

• Define

    e = Ax − b = [c_1 c_2 . . . c_n] [x_1; ...; x_n] − b
               = x_1 c_1 + x_2 c_2 + · · · + x_n c_n − b

where c_1, ..., c_n are the columns of A.

• We want to minimize E = ||e||_2^2 = e^T e.

• At a minimum, we have ∂E/∂x_i = 0 for i = 1, ..., n, or

    ∂E/∂x_i = (∂e^T/∂x_i) e + e^T (∂e/∂x_i) = 2 (∂e^T/∂x_i) e = 0.

But

    ∂e/∂x_i = ∂/∂x_i (x_1 c_1 + · · · + x_n c_n) = c_i,

so

    ∂e^T/∂x_i = c_i^T   and   ∂E/∂x_i = 2 c_i^T (Ax − b) = 0,

or

    0 = [c_1^T; ...; c_n^T] (Ax − b) = A^T (Ax − b)  ⇒  A^T A x = A^T b,

or

    x = A† b,  where A† = (A^T A)^{−1} A^T

is the pseudoinverse of A!
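As a small sketch of the formula at work, the line-fitting case f(x) = ax + b (Ex3 above) reduces the normal equations A^T A p = A^T y to a 2 × 2 system that can be solved by hand. The data points below are invented for illustration:

```python
# Least-squares fit of f(x) = a*x + b via the normal equations A^T A p = A^T y.
# Rows of A are [x_i, 1]; the unknown is p = (a, b). Data points are made up.

def lstsq_line(xs, ys):
    # Build the 2x2 normal matrix A^T A and right-hand side A^T y explicitly.
    sxx = sum(x * x for x in xs)
    sx  = sum(xs)
    n   = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy  = sum(ys)
    # Solve [[sxx, sx], [sx, n]] (a, b) = (sxy, sy) by Cramer's rule.
    det = sxx * n - sx * sx
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

# Points exactly on y = 2x + 1, so the residual here is zero.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = lstsq_line(xs, ys)
print(a, b)   # → 2.0 1.0
```

Forming A^T A explicitly is exactly the step criticized on the next slide; the QR route avoids it.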


                                       9
Efficient solutions to the Linear Least-Squares Problem




• Minimizing ||Ax − b||_2^2 ⇐⇒ Solving A^T A x = A^T b
  ⇓
  Gaussian elimination can be used
  ⇓
  But computing A^T A is expensive → 2mn^2 flops



• Instead, write A = QR, where Q is an m × m matrix such that
Q^T Q = Q Q^T = Id, and

    R = | R̂ |   where R̂ is an n × n upper-triangular matrix.
        | 0 |
  ⇓
  QR decomposition!




                               10
The idea of QR decomposition



Suppose we could write

    A = Q | R̂ |
          | 0 |

with Q^T Q = Q Q^T = Id (i.e., Q is orthogonal). Then

    A^T A x = A^T b ⇔ R^T Q^T Q R x = R^T Q^T b
                    ⇔ (R^T R) x = R^T Q^T b
                    ⇔ [R̂^T 0] | R̂ | x = [R̂^T 0] Q^T b
                              | 0 |

Or, if c = Q^T b = | ĉ |,
                   | c̃ |

    R̂^T R̂ x = R̂^T ĉ ⇐⇒ R̂ x = ĉ

... and if R̂ is upper triangular...




                                      11
Rotations in the plane


Consider the rotation of angle θ mapping a onto b.

What is the matrix G such that Ga = b, i.e. the matrix representing
the rotation in the (x, y) coordinate system?

    G = | g11  g12 |
        | g21  g22 |

    G |1| = |cos θ| = |g11|        G |0| = |−sin θ| = |g12|
      |0|   |sin θ|   |g21|          |1|   | cos θ|   |g22|

    =⇒ G = | c  −s |
           | s   c |

with c = cos θ, s = sin θ.

Note:

    G^T G = |  c  s | | c  −s | = | c^2+s^2     0     | = | 1  0 | = Id
            | −s  c | | s   c |   |    0     c^2+s^2  |   | 0  1 |

Note:

    | 1   0 | | 1   0 | = Id     but   | 1   0 |   is not a rotation.
    | 0  −1 | | 0  −1 |                | 0  −1 |



                                       12
A rotation can be used to zero a coordinate!


Idea: rotate a back onto the x axis.

    | bx | = G(−θ) | ax | = |  cos θ  sin θ | | ax |
    |  0 |         | ay |   | −sin θ  cos θ | | ay |

    =⇒ ax sin θ = ay cos θ

           cos θ = ax / sqrt(ax^2 + ay^2) = c
    =⇒                                             and  G = |  c  s |
           sin θ = ay / sqrt(ax^2 + ay^2) = s               | −s  c |



                                   13
Avoiding numerical instability


 
 
    cos θ = ax / sqrt(ax^2 + ay^2) = εx / sqrt(1 + (ay/ax)^2)
          = εx / sqrt(1 + tan^2 θ),   with εx = ax/|ax|

    sin θ = ay / sqrt(ax^2 + ay^2) = εy / sqrt(1 + (ax/ay)^2)
          = εy / sqrt(1 + cot^2 θ),   with εy = ay/|ay|

• Potential difficulty: ax and/or ay may be very large, leading to
potential overflows and roundoff errors.

• Solution:

    If |ay| > |ax|
      then cot θ = ax/ay,  s = 1/sqrt(1 + cot^2 θ),  c = s cot θ
      else tan θ = ay/ax,  c = 1/sqrt(1 + tan^2 θ),  s = c tan θ.




                                      14
In general



A rotation of angle −θ in the (xi, xk) plane can be represented by
G(i, k, θ): the identity matrix, except that entries (i, i) and (k, k)
are replaced by c, entry (i, k) by s, and entry (k, i) by −s, with
c = cos θ and s = sin θ.

If A′ = G(i, k, θ) ∗ A then

    A′(i, 1 : m) =  c ∗ A(i, 1 : m) + s ∗ A(k, 1 : m)
    A′(k, 1 : m) = −s ∗ A(i, 1 : m) + c ∗ A(k, 1 : m)

and all other rows of A are unchanged.

More compactly

    A′([i k], :) = [c s; −s c] ∗ A([i k], :)




                                     15
Reduction to Upper-Triangular Form
                                                                    
• Problem 1: Transform

    | ∗ ∗ ∗ |         | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |         | 0 ∗ ∗ |
    | ∗ ∗ ∗ |  into   | 0 ∗ ∗ |
    | ∗ ∗ ∗ |         | 0 ∗ ∗ |

1.) Apply a rotation in the (x3, x4) plane to zero the x4 coordinate:

    | ∗ ∗ ∗ |                | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |  G(3,4,θ34)    | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |     −→         | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |                | 0 ∗ ∗ |

2.) Apply a rotation in the (x2, x3) plane to zero the x3 coordinate:

    | ∗ ∗ ∗ |                | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |  G(2,3,θ23)    | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |     −→         | 0 ∗ ∗ |
    | 0 ∗ ∗ |                | 0 ∗ ∗ |

3.) Apply a rotation in the (x1, x2) plane to zero the x2 coordinate:

    | ∗ ∗ ∗ |                | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |  G(1,2,θ12)    | 0 ∗ ∗ |
    | 0 ∗ ∗ |     −→         | 0 ∗ ∗ |
    | 0 ∗ ∗ |                | 0 ∗ ∗ |

• Problem 2: Transform

    | ∗ ∗ ∗ |         | ∗ ∗ ∗ |
    | ∗ ∗ ∗ |         | 0 ∗ ∗ |
    | ∗ ∗ ∗ |  into   | 0 0 ∗ |
    | ∗ ∗ ∗ |         | 0 0 0 |
  −→ Proceed column by column...


                                         16
Overall Transformation


    A → A1 = G1 A → A2 = G2 A1 → ... → At = Gt At−1 = R

or

    R = Gt At−1 = (Gt Gt−1) At−2 = ...
      = (Gt Gt−1 ... G1) A = Q^T A  ⇒  QR = A


• Now define:

    Q = (Gt Gt−1 ... G1)^T = G1^T G2^T ... Gt^T


• Then:

    Q Q^T = (G1^T G2^T ... Gt^T)(Gt Gt−1 ... G1)
          = (G1^T ... Gt−1^T)(Gt−1 ... G1) = ...
          = Id = Q^T Q

• and Q can be constructed as:

    Q ← Id
    Q ← Q G1^T
    Q ← Q G2^T
    ...
    Q ← Q Gt^T




                                  17
SCILAB solution:

// Performs the QR decomposition of the matrix A.
function [Q,R]=qrdec(A)
[m,n]=size(A);
Q=eye(m,m);
for j=1:n
  for i=m:-1:j+1
    // zero A(i,j)
    [c,s]=rotate(A(i-1,j),A(i,j));
    A(i-1:i,j:n)=[c s;-s c]*A(i-1:i,j:n);
    Q(:,i-1:i)=Q(:,i-1:i)*[c s;-s c]';
    end
  end
R=triu(A);
endfunction


function [c,s]=rotate(x1,x2)
if x2==0
  c=1; s=0;
  else
  if abs(x2)>=abs(x1)
    cotg=x1/x2; s=1/sqrt(1+cotg^2); c=s*cotg;
    else
    tang=x2/x1; c=1/sqrt(1+tang^2); s=c*tang;
    end
  end
endfunction
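For experimenting outside SCILAB, the same Givens-based algorithm can be sketched in pure Python (list-of-lists matrices; the 3 × 2 test matrix is made up, not course data):

```python
import math

def rotate(x1, x2):
    # Stable computation of (c, s), as in the SCILAB rotate above.
    if x2 == 0:
        return 1.0, 0.0
    if abs(x2) >= abs(x1):
        cot = x1 / x2
        s = 1.0 / math.sqrt(1.0 + cot * cot)
        return s * cot, s
    tan = x2 / x1
    c = 1.0 / math.sqrt(1.0 + tan * tan)
    return c, c * tan

def qrdec(A):
    # QR decomposition by Givens rotations; A is a list of m rows of length n.
    m, n = len(A), len(A[0])
    A = [row[:] for row in A]                       # work on a copy
    Q = [[float(i == j) for j in range(m)] for i in range(m)]
    for j in range(n):
        for i in range(m - 1, j, -1):
            c, s = rotate(A[i - 1][j], A[i][j])     # zero A[i][j]
            for k in range(n):                      # rows i-1, i of A
                u, v = A[i - 1][k], A[i][k]
                A[i - 1][k], A[i][k] = c * u + s * v, -s * u + c * v
            for r in range(m):                      # accumulate Q <- Q * G^T
                u, v = Q[r][i - 1], Q[r][i]
                Q[r][i - 1], Q[r][i] = c * u + s * v, -s * u + c * v
    return Q, A                                     # A is now R (upper triangular)

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
Q, R = qrdec(A)
# Check: Q*R reproduces A, and R is upper triangular.
QR = [[sum(Q[i][k] * R[k][j] for k in range(3)) for j in range(2)] for i in range(3)]
print(QR)
```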



                         18
SCILAB Solution:

// computes the least squares solution of Ax=b
function [x,res]=lsq(A,b)
[m,n]=size(A);
for j=1:n
  for i=m:-1:j+1
    // zero A(i,j)
    [c,s]=rotate(A(i-1,j),A(i,j));
    A(i-1:i,j:n)=[c s;-s c]*A(i-1:i,j:n);
    b(i-1:i)=[c s;-s c]*b(i-1:i);
    end
  end
x=uptri(A(1:n,1:n),b(1:n));
if m==n
  res=0;
  else
  res=norm(b(n+1:m));
  end
endfunction




                         19
Cost:

• QR decomposition: (9m − 4n) n^2 → 5 n^3 (m = n)

• Least Squares only: (3m − 2n) n^2 → n^3 (m = n)

• Compare to Gaussian elimination: (2/3) n^3


Numerical Stability

• The old condition number κ = ||A|| ||A^−1|| does not work.
• There is another one; then, if x is the real solution and x̂ is the
computed one,

    ||x̂ − x||_2 / ||x||_2 ≈ ε (κ + κ^2 ||Ax − b||_2)

• The solution is stable: it satisfies a nearby LS problem.




                                 20
What is the residual?



Define

    c = Q^T b = | ĉ |,   ĉ = c[1 : n],  c̃ = c[n + 1 : m]
                | c̃ |

Then

    ||Ax − b||_2^2 = (Ax − b)^T (Ax − b)

        = (Ax − b)^T (Q Q^T)(Ax − b) = [(Ax − b)^T Q][Q^T (Ax − b)]

        = [Q^T (Ax − b)]^T [Q^T (Ax − b)] = ||Q^T A x − Q^T b||_2^2

        = ||Q^T (QR) x − Q^T b||_2^2 = ||R x − Q^T b||_2^2

        = || | R̂ x | − | ĉ | ||_2^2 = ||R̂ x − ĉ||_2^2 + ||c̃||_2^2
             |  0  |   | c̃ |

so at the least-squares solution (R̂ x = ĉ) the residual is ||c̃||_2.




                                21
Non-Linear Equations



What are the values of x such that f (x) = 0?




Ex 1: f (x) = ax + b

Ex 2: f (x) = ax2 + bx + c

Ex 3: f (x) = ax3 + bx2 + cx + d

Ex 4: f (x) = a4x4 + a3x3 + a2x2 + a1x + a0

Ex n: f (x) = anxn + ... + a1x + a0
Ex ′: f (x) = a1 x^3 e^{6x} + a2 cos(2x) e^{3x} + a3 (1 − x^3)/(4 + x^2)

Ex ″: the system

    f1(x1, ..., xn) = 0
    . . .
    fn(x1, ..., xn) = 0




                                  22
Why Solve Non-Linear Equations?



• Astronomy Examples
  ◦ When are planets aligned?
  ◦ Do the orbits intersect?




• Geometric Problems
  ◦ When do two curves intersect?




  ◦ What are the singularities?




  ◦ Find a path for a robot from S to G.




                                  23
Bisection



• Suppose that we know a and b ∈ IR such that f (a)f (b) < 0.
[BRACKETING]
• Can we find a zero of f ? Assume for example f (a) > 0 (so f (b) < 0).


• By continuity, f must vanish at some point x0 ∈ ]a, b[
  =⇒ Compute f (c) where c = (a + b)/2
  =⇒ If f (c) > 0, there must be a zero in ]c, b[;
     if f (c) < 0, there must be a zero in ]a, c[.
  etc...
Recursive procedure that subdivides the half-interval containing a
root: bisection.
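The procedure above can be sketched in a few lines of Python; the test equation tan(x/4) − 1 = 0 on [2, 4] is the one used on the "Bisection Test" slide:

```python
import math

def bisect(f, a, b, tol=1e-10, maxit=200):
    # Find a root of f in [a, b], assuming f(a) and f(b) have opposite signs.
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a bracketing interval"
    for _ in range(maxit):
        c = 0.5 * (a + b)
        fc = f(c)
        if fc == 0 or (b - a) < tol:
            return c
        if fa * fc < 0:        # root lies in ]a, c[
            b, fb = c, fc
        else:                  # root lies in ]c, b[
            a, fa = c, fc
    return 0.5 * (a + b)

root = bisect(lambda x: math.tan(x / 4) - 1, 2.0, 4.0)
print(root)    # converges to pi, one binary digit per iteration
```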




                                 24
Bisection (2)



• When a and b are given, bisection always converges to a root.



• Speed of convergence
Define x0 as the actual root and xi (i = 1, ..., n) as its estimates. Then

    |x1 − x0| ≤ (1/2) |b − a|

So

    |xk − x0| ≤ (1/2^k) |b − a|
  ⇓

Linear convergence.


• Problem: How do you find a bracketing interval?

• Example: If P (x) = an x^n + ... + a1 x + a0, all roots of P are in
(a, b) where

    b = −a = 1 + max_{0≤i≤n−1} |ai / an|.




                                  25
Bisection Test


Find a root of f (x) = tan(x/4) − 1 in the interval [2, 4].

    x/4 = π/4 ⇔ x = π

One new digit of accuracy every few iterations.




                                  26
Newton’s Method



• Suppose you are at a point xi on the curve.


• How can you move to a close-by root x0?



• Idea: Pretend the curve and its tangent at xi coincide and iterate...

    0 = f (xi+1) = f (xi) + (xi+1 − xi) f ′(xi) + ε (xi+1 − xi)^2

    =⇒ xi+1 ≃ xi − f (xi)/f ′(xi)




                                   27
Newton Test


Find a root of f (x) = tan(x/4) − 1 starting at x = 1.
Much faster convergence.




                                 28
Newton’s Method (2)



What is the rate of convergence? Define εi = xi − x0.
By definition

    xi+1 = xi − f (xi)/f ′(xi)  ⇒  εi+1 = εi − f (xi)/f ′(xi)     (1)

But

    f (xi)  = f (x0) + εi f ′(x0) + (1/2) εi^2 f ″(x0) + ...
    f ′(xi) = f ′(x0) + εi f ″(x0) + ...

Substituting in (1) yields

    εi+1 = (1/f ′(xi)) [ εi f ′(xi) − εi f ′(x0) − (1/2) εi^2 f ″(x0) + ... ]

         = (1/(f ′(x0) + ...)) [ εi f ′(x0) + εi^2 f ″(x0)
                                 − εi f ′(x0) − (1/2) εi^2 f ″(x0) + ... ]
⇓

    εi+1 ≈ (1/2) εi^2 f ″(x0)/f ′(x0)

⇓

Quadratic convergence!




                                     29
Newton’s Method (3)



• In general, the error is not guaranteed to decrease at every iteration.

• In fact, the method can fail near a local extremum, where f ′(xi) → 0
and the step blows up:

    f (xi)/f ′(xi) → ∞


• Theorem: When f satisfies a bunch of difficult-to-verify conditions,
then
                      |xi+1 − x0| ≤ k|xi − x0|2


⇓

• In other words, under these assumptions
    - Newton will converge to a root
    - With a quadratic rate of convergence




                                   30
Newton’s Method (4): When do you stop?


    xi+1 = xi − f (xi)/f ′(xi)

• When |xi+1 − xi| is small enough

  • Consider f (x) = tan x, so f ′(x) = 1/cos^2 x and

        (f /f ′)(x) = sin x cos x

  • Near x ≈ π/2, (f /f ′)(x) can be ≈ −10^−6 even though x is
    nowhere near a root:

        zeros of f : x = 0, π, 2π, ...
        zeros of f /f ′: x = 0, π/2, π, 3π/2, ...

  ⇓ a tiny step does not guarantee a nearby root.


• When |f (xi)| is small enough

  • Consider f (x) = (x − 1)^100

  • f (1.1) = (0.1)^100 = 10^−100

  ⇓

  but the actual root is at 1, a distance 0.1 away.




                                       31
Combining Bisection and Newton’s Method



Idea:
   • Start with a bracketing interval.
   • Try Newton.
   • If you get out of the bracketing interval, then use bisection
instead.
   • Iterate.
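The steps above can be sketched as follows (one possible interpretation in Python; the exact acceptance rule for the Newton step varies between implementations):

```python
import math

def safe_newton(f, fprime, a, b, tol=1e-12, maxit=100):
    # Newton's method safeguarded by bisection: if the Newton step would
    # leave the current bracketing interval ]a, b[, bisect instead.
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "need a bracketing interval"
    x = 0.5 * (a + b)
    for _ in range(maxit):
        fx = f(x)
        if fx == 0 or (b - a) < tol:
            return x
        # Shrink the bracket using the sign of f(x).
        if fa * fx < 0:
            b, fb = x, fx
        else:
            a, fa = x, fx
        d = fprime(x)
        if d != 0:
            xn = x - fx / d
        else:
            xn = a                       # outside ]a, b[: forces bisection
        # Accept the Newton step only if it stays inside the bracket.
        x = xn if a < xn < b else 0.5 * (a + b)
    return x

root = safe_newton(lambda x: math.tan(x / 4) - 1,
                   lambda x: 0.25 / math.cos(x / 4) ** 2, 2.0, 4.0)
print(root)
```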



When do you stop?

     • When the current bracketing interval is small enough,
or
     • When |f (xi )| is small enough,
or
     • When you have done too much work (too many iterations).




                                    32
Secant Method



Is it possible to apply Newton without supplying derivatives?

    xk+1 − xk = − f (xk)/f ′(xk)

Idea: estimate f ′(xk) from the values of f at xk−1 and xk.
Use finite differences:

    f (xk−1) = f (xk) + (xk−1 − xk) f ′(xk) + ε (xk−1 − xk)^2
  ⇓

    f ′(xk) ≈ (f (xk) − f (xk−1)) / (xk − xk−1)
  ⇓

    xk+1 = xk − f (xk) (xk − xk−1) / (f (xk) − f (xk−1))


Geometrically

    (f (xk) − f (xk−1)) / (xk − xk−1) = f (xk) / (xk − xk+1)




                                       33
Advantages of the secant method:

  • No need for analytical derivatives

  • Same order of convergence as Newton

   • Cheaper computationally: only one function evaluation per
iteration


Note: two initial values are required
  xinit and xinit + δx.




                                  34

								