# 6. OVERCONSTRAINED LINEAR EQUATIONS

•   The Pseudoinverse
•   QR Decomposition
•   The Homogeneous Case
•   Nonlinear equations: Bisection and Newton’s method

Class web site:

http://www.di.ens.fr/~brette/calculscientifique/index.htm

1
The LU Decomposition

• Gaussian elimination → the LU decomposition A = LU, with L unit lower triangular (the vij are the multipliers):

        [ 1                         ]
        [ v21  1                    ]
    L = [ v31  v32  1               ]
        [  .    .    .   .          ]
        [ vn1  vn2  . . .  vn,n−1  1 ]

and U upper triangular:

        [ ∗  ∗  ∗  . . .  ∗  ∗ ]
        [    ∗  ∗  . . .  ∗  ∗ ]
    U = [       ∗  . . .  ∗  ∗ ]
        [           .  .  .    ]
        [                    ∗ ]

• Cost: 2((n − 1) × (n − 1) + . . . + 2 × 2 + 1) ≈ (2/3)n³ flops.

2
Solving Ax = b

1. Compute L, U, P such that P A = LU
2. Note that P Ax = (LU )x = P b
2.1. Solve Ly = P b
2.2. Solve U x = y

In MATLAB:

[L,U,piv]=GEpiv(A);     ← 2n³/3 flops

y=LTriSol(L,b(piv));    ← n² flops

x=UTriSol(U,y);         ← n² flops

How much does solving Ax = bi for i = 1 : k cost?
(2/3)n³ + O(kn²) flops: one factorization, then k pairs of triangular solves.

3
// Returns the LU decomposition of A with partial pivoting
function [L,U,piv]=ludecpiv(A)
  [n,n]=size(A);
  piv=1:n;
  for k=1:n-1
    // Pick the pivot: largest entry in column k, below the diagonal
    [pivval,r]=max(abs(A(k:n,k)));
    q=r+k-1;
    piv([k q])=piv([q k]);
    A([k q],:)=A([q k],:);
    if A(k,k)~=0
      A(k+1:n,k)=A(k+1:n,k)/A(k,k);
      A(k+1:n,k+1:n)=A(k+1:n,k+1:n)-A(k+1:n,k)*A(k,k+1:n);
    end
  end
  L=eye(n,n)+tril(A,-1);
  U=triu(A);
endfunction

// Solves the linear system A*x=b via LU decomposition
// with pivoting
function x=lusolvepiv(A,b)
  [L,U,piv]=ludecpiv(A);
  y=lowtri(L,b(piv));
  x=uptri(U,y);
endfunction

4
Condition number of a matrix

Define: κ1(A) = ||A||1 ||A⁻¹||1, the condition number of A.

We have:

• κ1(A) ≥ 1,

• κ1(λA) = κ1(A) for any λ ≠ 0,

• the larger κ1(A), the closer A is to being singular.

Theorem: If A ∈ ℝ^(n×n) is non-singular, Ax = b, and εκ1(A) < 1, then

• Fl(A) is non-singular, and

• the solution x̂ of Fl(A)x̂ = Fl(b) verifies:

||x̂ − x||1 / ||x||1 ≤ εκ1(A) (1 + ||x̂||1 / ||x||1)

Note:

• This result has nothing to do with any algorithm.

• Similar results hold for the other norms.

5
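The definitions above can be exercised on a tiny example. Below is a minimal Python sketch (helper names are mine, not from the course code): it computes κ1(A) = ||A||1 ||A⁻¹||1 for a 2 × 2 matrix, where ||A||1 is the maximum absolute column sum.

```python
# Minimal sketch (helper names are mine, not from the course):
# kappa1(A) = ||A||1 * ||A^-1||1 for a 2x2 matrix A given as a list of
# rows, where ||A||1 is the maximum absolute column sum.

def norm1(A):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def inv2(A):
    """Inverse of a 2x2 matrix by the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kappa1(A):
    return norm1(A) * norm1(inv2(A))

print(kappa1([[2.0, 0.0], [0.0, 1.0]]))     # 2.0
print(kappa1([[1.0, 1.0], [1.0, 1.0001]]))  # huge: A is nearly singular
```

The second matrix illustrates the third bullet above: its condition number is on the order of 10⁴ because its columns are nearly parallel.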
Backward Stability

• If x̂ is the solution to the problem Ax = b found by Gaussian elimination, then

(A + E)x̂ = b

for some matrix E such that ||E|| ≈ ε||A||.

• Note: ||Fl(A) − A|| ≈ ε||A||

• Corollary:

||Ax̂ − b|| ≈ ε||A|| ||x̂||

||x̂ − x|| / ||x|| ≈ εκ(A)

6
Linear Systems Again...

• Square systems (A is n × n):

A x = b

Solution: Gaussian elimination with pivoting.

• Rectangular systems (A is m × n):

A x = b

◦ m < n (a short, wide A): an infinite family of solutions

◦ m > n (a tall, narrow A): an overconstrained system. In general, no solution

7
Be Pragmatic!

If Ax = b is an overconstrained system without an exact solution,
find x that minimizes a reasonable error.

• Ex1: Minimize ||Ax − b||1.

• Ex2: Minimize ||Ax − b||2² ← LINEAR LEAST SQUARES.

Application: Data analysis: compute the parameters of a model
that best fits the data.

• Ex3: f(x) = ax + b

• Ex4: f(x) = anxⁿ + · · · + a1x + a0

• Note: The Euclidean norm yields simple formulas for derivative
computation:

||x||2² = x1² + x2² + · · · + xn² = x·x = xᵀx

8
The PseudoInverse

• Define

e = Ax − b = [ c1 c2 . . . cn ] [ x1 ; . . . ; xn ] − b = x1c1 + x2c2 + · · · + xncn − b

where c1, . . . , cn are the columns of A.

• We want to minimize E = ||e||2² = eᵀe.

• At a minimum, we have ∂E/∂xi = 0 for i = 1, . . . , n, or

∂E/∂xi = (∂eᵀ/∂xi) e + eᵀ (∂e/∂xi) = 2 (∂eᵀ/∂xi) e = 0.

But

∂e/∂xi = ∂/∂xi (x1c1 + · · · + xncn) = ci,

So

∂eᵀ/∂xi = ciᵀ and ∂E/∂xi = 2ciᵀ(Ax − b) = 0,

or

0 = [ c1ᵀ ; . . . ; cnᵀ ] (Ax − b) = Aᵀ(Ax − b) ⇒ AᵀAx = Aᵀb,

or

x = A†b, where A† = (AᵀA)⁻¹Aᵀ

is the pseudoinverse of A!

9
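For the line model f(x) = ax + b of Ex3 above, AᵀA is only 2 × 2 and the normal equations AᵀAx = Aᵀb can be solved in closed form. A minimal Python sketch (function and variable names are mine), assuming data points (tᵢ, yᵢ) and the model y ≈ a·t + b:

```python
# Sketch: least-squares fit of y = a*t + b via the normal equations
# A^T A x = A^T b, where A has rows [t_i, 1]. Names are illustrative.

def fit_line(ts, ys):
    """Return (a, b) minimizing sum((a*t + b - y)^2)."""
    n = len(ts)
    # Entries of the 2x2 matrix A^T A and of the vector A^T b.
    stt = sum(t * t for t in ts)
    st = sum(ts)
    sty = sum(t * y for t, y in zip(ts, ys))
    sy = sum(ys)
    # Solve the 2x2 system by Cramer's rule.
    det = stt * n - st * st
    a = (n * sty - st * sy) / det
    b = (stt * sy - st * sty) / det
    return a, b

# Points on the line y = 2t + 1 are recovered exactly:
print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # (2.0, 1.0)
```

This is exactly x = (AᵀA)⁻¹Aᵀb specialized to two unknowns; the next slides explain why forming AᵀA explicitly is avoided for larger n.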
Eﬃcient solutions to the Linear Least-Squares Problem

• Minimizing ||Ax − b||2 ⇐⇒ Solving AᵀAx = Aᵀb
⇓
Gaussian elimination can be used
⇓
But computing AᵀA is expensive → 2mn² flops

• Instead, write A = QR, where Q is an m × m matrix such that
QᵀQ = QQᵀ = Id, and

R = [ R̂ ; 0 ], where R̂ is an n × n upper-triangular matrix.
⇓
QR decomposition!

10
The idea of QR decomposition

Suppose we could write

A = Q [ R̂ ; 0 ]

with QᵀQ = QQᵀ = Id (i.e., Q is orthogonal). Then

AᵀAx = Aᵀb ⇔ RᵀQᵀQRx = RᵀQᵀb
⇔ (RᵀR)x = RᵀQᵀb
⇔ [ R̂ᵀ 0 ] [ R̂ ; 0 ] x = [ R̂ᵀ 0 ] Qᵀb

Or, if c = Qᵀb = [ ĉ ; c̃ ],

R̂ᵀR̂x = R̂ᵀĉ ⇐⇒ R̂x = ĉ

... and R̂ is upper triangular...

11
Rotations in the plane

Consider the rotation of angle θ mapping a onto b.

What is the matrix G such that Ga = b, i.e. the matrix representing
the rotation in the (x, y) coordinate system?

G = [ g11 g12 ; g21 g22 ]

G [ 1 ; 0 ] = [ cos θ ; sin θ ] = [ g11 ; g21 ]
G [ 0 ; 1 ] = [ −sin θ ; cos θ ] = [ g12 ; g22 ]

⇒ G = [ c −s ; s c ]

with c = cos θ, s = sin θ.

Note:

GᵀG = [ c s ; −s c ] [ c −s ; s c ] = [ c² + s²  0 ; 0  c² + s² ] = [ 1 0 ; 0 1 ] = Id

Note:

[ 1 0 ; 0 −1 ] [ 1 0 ; 0 −1 ] = Id

but [ 1 0 ; 0 −1 ] is not a rotation.

12
A rotation can be used to zero a coordinate!

Idea: Map a onto the x axis, i.e. apply the rotation of angle −θ.

[ bx ; 0 ] = G(−θ) [ ax ; ay ] = [ cos θ  sin θ ; −sin θ  cos θ ] [ ax ; ay ]

⇒ ax sin θ = ay cos θ

⇒ cos θ = ax / √(ax² + ay²) = c and sin θ = ay / √(ax² + ay²) = s, with G = [ c s ; −s c ]



13
Avoiding numerical instability



cos θ = ax / √(ax² + ay²) = εx / √(1 + (ay/ax)²) = εx / √(1 + tan² θ), with εx = ax / |ax|

sin θ = ay / √(ax² + ay²) = εy / √(1 + (ax/ay)²) = εy / √(1 + cot² θ), with εy = ay / |ay|

• Potential difficulty: ax and/or ay may be very large, leading to
potential overflows and roundoff errors.

• Solution:

If |ay| > |ax|, then cot θ = ax/ay, s = 1 / √(1 + cot² θ), c = s cot θ;
else tan θ = ay/ax, c = 1 / √(1 + tan² θ), s = c tan θ.

14
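The branch above translates directly into code. Here is a minimal Python sketch (function name mine) of the overflow-safe computation of (c, s), mirroring the Scilab `rotate` used in `qrdec` later in these slides:

```python
import math

# Sketch of the overflow-safe rotation above (function name mine):
# returns (c, s) such that [c s; -s c] applied to (ax, ay) zeroes ay.
def rotate(ax, ay):
    if ay == 0.0:
        return 1.0, 0.0
    if abs(ay) >= abs(ax):
        cot = ax / ay                       # |cot| <= 1, never overflows
        s = 1.0 / math.sqrt(1.0 + cot * cot)
        return s * cot, s
    tan = ay / ax                           # |tan| < 1, never overflows
    c = 1.0 / math.sqrt(1.0 + tan * tan)
    return c, c * tan

c, s = rotate(3.0, 4.0)
print(c * c + s * s)        # ~1.0 (c = 0.6, s = 0.8)
print(-s * 3.0 + c * 4.0)   # ~0.0: the second coordinate is zeroed
```

Note that `rotate(1e300, 1.0)` works, whereas naively forming √(ax² + ay²) would overflow.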
In general

A rotation of angle −θ in the (xi, xk) plane can be represented by the Givens matrix G(i, k, θ): the identity matrix, except that rows/columns i and k carry the 2 × 2 block [ c s ; −s c ], with c = cos θ, s = sin θ.

If A′ = G(i, k, θ) ∗ A, then only rows i and k change:

A′(i, 1 : m) = c ∗ A(i, 1 : m) + s ∗ A(k, 1 : m)

A′(k, 1 : m) = −s ∗ A(i, 1 : m) + c ∗ A(k, 1 : m)

More compactly

A′([i k], :) = [c s; −s c] ∗ A([i k], :)

15
Reduction to Upper-Triangular Form
                                       
• Problem 1: Transform

[ ∗ ∗ ∗ ]        [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  into  [ 0 ∗ ∗ ]
[ ∗ ∗ ∗ ]        [ 0 ∗ ∗ ]
[ ∗ ∗ ∗ ]        [ 0 ∗ ∗ ]

1.) Apply a rotation in the (x3, x4) plane to zero the x4 coordinate

[ ∗ ∗ ∗ ]                [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  G(3,4,θ34)    [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  −→            [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]                [ 0 ∗ ∗ ]

2.) Apply a rotation in the (x2, x3) plane to zero the x3 coordinate

[ ∗ ∗ ∗ ]                [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  G(2,3,θ23)    [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  −→            [ 0 ∗ ∗ ]
[ 0 ∗ ∗ ]                [ 0 ∗ ∗ ]

3.) Apply a rotation in the (x1, x2) plane to zero the x2 coordinate

[ ∗ ∗ ∗ ]                [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  G(1,2,θ12)    [ 0 ∗ ∗ ]
[ 0 ∗ ∗ ]  −→            [ 0 ∗ ∗ ]
[ 0 ∗ ∗ ]                [ 0 ∗ ∗ ]

• Problem 2: Transform

[ ∗ ∗ ∗ ]        [ ∗ ∗ ∗ ]
[ ∗ ∗ ∗ ]  into  [ 0 ∗ ∗ ]
[ ∗ ∗ ∗ ]        [ 0 0 ∗ ]
[ ∗ ∗ ∗ ]        [ 0 0 0 ]

−→ Proceed column by column...

16
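The column-by-column sweep of Problem 2 can be sketched in a few lines of Python (names mine; for brevity, c and s are formed with `math.hypot` rather than the cot/tan branch of the previous slides):

```python
import math

# Sketch (names mine): reduce a tall matrix to upper-triangular form
# with Givens rotations, sweeping each column from the bottom up as in
# Problems 1 and 2 above. A is a list of rows, modified in place.
def givens_triangularize(A):
    m, n = len(A), len(A[0])
    for j in range(n):
        for i in range(m - 1, j, -1):   # zero A[i][j] against A[i-1][j]
            x, y = A[i - 1][j], A[i][j]
            r = math.hypot(x, y)
            if r == 0.0:
                continue
            c, s = x / r, y / r
            for k in range(j, n):       # rotate rows i-1 and i
                u, v = A[i - 1][k], A[i][k]
                A[i - 1][k] = c * u + s * v
                A[i][k] = -s * u + c * v
    return A

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
givens_triangularize(A)   # below-diagonal entries of A are now ~0
```

Each inner rotation touches only two rows, which is what makes the flop counts on the cost slide linear in m per column.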
Overall Transformation

A → A1 = G1A → A2 = G2A1 → . . . → At = GtAt−1 = R

or

R = GtAt−1 = (GtGt−1)At−2 = . . . = (GtGt−1 . . . G1)A = QᵀA ⇒ QR = A

• Now define:

Q = (GtGt−1 . . . G1)ᵀ = G1ᵀ G2ᵀ . . . Gtᵀ

• Then:

QQᵀ = (G1ᵀ G2ᵀ . . . Gtᵀ)(Gt Gt−1 . . . G1) = (G1ᵀ . . . Gt−1ᵀ)(Gt−1 . . . G1) = . . . = Id = QᵀQ

• and Q can be constructed as:

Q ← Id
Q ← Q G1ᵀ
Q ← Q G2ᵀ
. . .
Q ← Q Gtᵀ

17
SCILAB solution:

// Performs the QR decomposition of the matrix A.
function [Q,R]=qrdec(A)
  [m,n]=size(A);
  Q=eye(m,m);
  for j=1:n
    for i=m:-1:j+1
      // zero A(i,j)
      [c,s]=rotate(A(i-1,j),A(i,j));
      A(i-1:i,j:n)=[c s;-s c]*A(i-1:i,j:n);
      Q(:,i-1:i)=Q(:,i-1:i)*[c s;-s c]';
    end
  end
  R=triu(A);
endfunction

// Computes the rotation (c,s) zeroing x2 against x1,
// using the overflow-safe formulas of the previous slides.
function [c,s]=rotate(x1,x2)
  if x2==0
    c=1; s=0;
  else
    if abs(x2)>=abs(x1)
      cotg=x1/x2; s=1/sqrt(1+cotg^2); c=s*cotg;
    else
      tang=x2/x1; c=1/sqrt(1+tang^2); s=c*tang;
    end
  end
endfunction

18
SCILAB Solution:

// Computes the least squares solution of Ax=b
function [x,res]=lsq(A,b)
  [m,n]=size(A);
  for j=1:n
    for i=m:-1:j+1
      // zero A(i,j)
      [c,s]=rotate(A(i-1,j),A(i,j));
      A(i-1:i,j:n)=[c s;-s c]*A(i-1:i,j:n);
      b(i-1:i)=[c s;-s c]*b(i-1:i);
    end
  end
  x=uptri(A(1:n,1:n),b(1:n));
  if m==n
    res=0;
  else
    res=norm(b(n+1:m));
  end
endfunction

19
Cost:

• QR decomposition: (9m − 4n)n² flops → 5n³ (m = n)

• Least squares only: (3m − 2n)n² flops → n³ (m = n)

• Compare to Gaussian elimination: (2/3)n³ flops

Numerical Stability

• The old condition number κ = ||A|| ||A⁻¹|| does not work (A is not square).
• There is another one, again written κ; then, if x is the real solution and x̂ the computed one,

||x̂ − x||2 / ||x||2 ≈ ε (κ + κ² ||Ax − b||2)

• The solution is stable: it satisfies a nearby LS problem.

20
What is the residual?

Define

c = Qᵀb = [ ĉ ; c̃ ], with ĉ = c(1 : n) and c̃ = c(n + 1 : m).

Then

||Ax − b||2² = (Ax − b)ᵀ(Ax − b)

= (Ax − b)ᵀ(QQᵀ)(Ax − b) = [(Ax − b)ᵀQ][Qᵀ(Ax − b)]

= [Qᵀ(Ax − b)]ᵀ[Qᵀ(Ax − b)] = ||QᵀAx − Qᵀb||2²

= ||Qᵀ(QR)x − Qᵀb||2² = ||Rx − Qᵀb||2²

= || [ R̂x ; 0 ] − [ ĉ ; c̃ ] ||2² = ||R̂x − ĉ||2² + ||c̃||2²

At the least-squares solution, R̂x = ĉ, so the residual is ||c̃||2.

21
Non-Linear Equations

What are the values of x such that f (x) = 0?

Ex 1: f(x) = ax + b

Ex 2: f(x) = ax² + bx + c

Ex 3: f(x) = ax³ + bx² + cx + d

Ex 4: f(x) = a4x⁴ + a3x³ + a2x² + a1x + a0

Ex n: f(x) = anxⁿ + . . . + a1x + a0

Ex ′: f(x) = a1x³e^(6x²) + a2 cos(2x)c3x + a3 (1 − x³)/(4 + x)

Ex ″: a system of equations

f1(x1, . . . , xn) = 0
. . .
fn(x1, . . . , xn) = 0
22
Why Solve Non-Linear Equations?

• Astronomy Examples
◦ When are planets aligned?
◦ Do the orbits intersect?

• Geometric Problems
◦ When do two curves intersect?

◦ What are the singularities?

◦ Find a path for a robot from S to G.

23
Bisection

• Suppose that we know a and b ∈ ℝ such that f(a)f(b) < 0. [BRACKETING]

• Can we find a zero of f? Assume for example f(a) > 0 (so f(b) < 0).

• By continuity, f must be zero at some point x0 ∈ ]a, b[

⇒ Compute f(c), where c = (a + b)/2

⇒ If f(c) > 0, there must be a zero in ]c, b[; if f(c) < 0, there must be a zero in ]a, c[

etc...

Recursive procedure that subdivides the half interval containing a
root: bisection.

24
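The subdivision procedure above, as a minimal iterative Python sketch (names mine), applied to f(x) = tan(x/4) − 1 on [2, 4], whose root x = π is the test used later in these slides:

```python
import math

# Iterative sketch of bisection (names mine); requires a < b and
# f(a)f(b) < 0, i.e. a bracketing interval.
def bisect(f, a, b, tol=1e-10):
    fa = f(a)
    assert fa * f(b) < 0.0, "need a bracketing interval"
    while b - a > tol:
        c = 0.5 * (a + b)
        fc = f(c)
        if fa * fc <= 0.0:    # the sign change is in [a, c]
            b = c
        else:                 # the sign change is in [c, b]
            a, fa = c, fc
    return 0.5 * (a + b)

# f(x) = tan(x/4) - 1 on [2, 4]; the root is pi.
print(bisect(lambda x: math.tan(x / 4.0) - 1.0, 2.0, 4.0))  # ~3.14159...
```

Each pass halves the interval, which is exactly the linear-convergence bound derived on the next slide.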
Bisection (2)

• When a and b are given, bisection always converges to a root.

• Speed of convergence: define x0 as the actual root and xi (i = 1, . . . , n) as its estimates.

|x1 − x0| ≤ (1/2) |b − a|

So

|xk − x0| ≤ |b − a| / 2ᵏ

⇓

Linear convergence.

• Problem: How do you find a bracketing interval?

• Example: If P(x) = anxⁿ + . . . + a1x + a0, all roots of P are in
(a, b), where

b = −a = max over 0 ≤ i ≤ n − 1 of (1 + |ai / an|).

25
Bisection Test

Find a root of f(x) = tan(x/4) − 1 in the interval [2, 4].

tan(x/4) = 1 ⇔ x/4 = π/4 ⇔ x = π

One new digit of accuracy every few iterations.

26
Newton’s Method

• Suppose you are at a point xi on the curve.

• How can you move to a close-by root x0?

• Idea: Pretend the curve and its tangent at xi coincide, and iterate...

0 = f(xi+1) = f(xi) + (xi+1 − xi) f′(xi) + ε(xi+1 − xi)²

⇒ xi+1 ≃ xi − f(xi) / f′(xi)

27
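The iteration above, as a minimal Python sketch (names mine), applied to the same test function as for bisection:

```python
import math

# Sketch of Newton's method (names mine): iterate x <- x - f(x)/f'(x),
# stopping when the step becomes tiny.
def newton(f, fp, x, steps=20, tol=1e-12):
    for _ in range(steps):
        dx = f(x) / fp(x)
        x = x - dx
        if abs(dx) < tol:
            break
    return x

# f(x) = tan(x/4) - 1, with f'(x) = 1 / (4 cos^2(x/4)); start at x = 1.
f = lambda x: math.tan(x / 4.0) - 1.0
fp = lambda x: 1.0 / (4.0 * math.cos(x / 4.0) ** 2)
print(newton(f, fp, 1.0))   # converges to pi
```

A handful of iterations already reach machine precision, in contrast with the roughly 35 halvings bisection needs on [2, 4].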
Newton Test

Find a root of f(x) = tan(x/4) − 1 starting at x = 1.

Much faster convergence.

28
Newton’s Method (2)

What is the rate of convergence? Define εi = xi − x0.

By definition,

xi+1 = xi − f(xi)/f′(xi) ⇒ εi+1 = εi − f(xi)/f′(xi)    (1)

But, expanding around x0 (and using f(x0) = 0, since x0 is a root),

f(xi) = f(x0) + εi f′(x0) + (1/2) εi² f″(x0) + . . . = εi f′(x0) + (1/2) εi² f″(x0) + . . .

f′(xi) = f′(x0) + εi f″(x0) + . . .

Substituting in (1) yields

εi+1 = (1/f′(xi)) ( εi f′(xi) − εi f′(x0) − (1/2) εi² f″(x0) + . . . )

= (1/(f′(x0) + . . .)) ( εi f′(x0) + εi² f″(x0) − εi f′(x0) − (1/2) εi² f″(x0) + . . . )

⇓

εi+1 ≈ (1/2) εi² (f″/f′)(x0)

⇓

29
Newton’s Method (3)

• In general, the error is not guaranteed to decrease at every iteration.

• In fact, the method can fail near a local extremum, where f′(xi) ≈ 0 and the step

f(xi)/f′(xi) → ∞

• Theorem: When f satisfies a bunch of difficult-to-verify conditions,
then

|xi+1 − x0| ≤ k |xi − x0|²

⇓

• In other words, under these assumptions,
- Newton will converge to a root
- with a quadratic rate of convergence.

30
Newton’s Method (4): When do you stop?

xi+1 = xi − f(xi)/f′(xi)

• When |xi+1 − xi| is small enough?

Consider f(x) = tan x, with f′(x) = 1/cos² x. Then the step

(f/f′)(x) = sin x cos x

can be ≈ 10⁻⁶ far from any root of f:

zeros of f: x = 0, π, 2π, . . .
zeros of f/f′: x = 0, π/2, π, 3π/2, . . .

⇓

The step is tiny near π/2, where f has no zero at all.

• When |f(xi)| is small enough?

Consider f(x) = (x − 1)¹⁰⁰:

f(1.1) = (0.1)¹⁰⁰ = 10⁻¹⁰⁰

⇓

|f| is tiny at 1.1, but the actual root is at 1.

31
Combining Bisection and Newton’s Method

Idea:
• Try a Newton step.
• If it takes you out of the bracketing interval, then use a bisection step instead.
• Iterate.

When do you stop?

• When the current bracketing interval is small enough,
or
• When |f(xi)| is small enough,
or
• When you have done too much work (too many iterations).

32
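A minimal Python sketch of this safeguard (names mine): take the Newton step only when it stays inside the current bracket, otherwise fall back to a bisection step; the bracket shrinks at every iteration either way.

```python
import math

# Sketch (names mine): Newton safeguarded by bisection inside a bracket
# [a, b] with f(a)*f(b) < 0. Whenever the Newton step would leave the
# bracket (or f' vanishes), a bisection step is taken instead.
def safe_newton(f, fp, a, b, tol=1e-12, max_iter=100):
    fa = f(a)
    assert fa * f(b) < 0.0, "need a bracketing interval"
    x = 0.5 * (a + b)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol or b - a < tol:
            return x
        if fa * fx < 0.0:      # root is in [a, x]
            b = x
        else:                  # root is in [x, b]
            a, fa = x, fx
        d = fp(x)
        newton_x = x - fx / d if d != 0.0 else a
        # Fall back to bisection if Newton leaves the bracket.
        x = newton_x if a < newton_x < b else 0.5 * (a + b)
    return x

f = lambda x: math.tan(x / 4.0) - 1.0
fp = lambda x: 1.0 / (4.0 * math.cos(x / 4.0) ** 2)
print(safe_newton(f, fp, 2.0, 4.0))   # ~pi
```

This keeps Newton's fast local convergence while inheriting bisection's guarantee that the bracket never loses the root.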
Secant Method

Is it possible to apply Newton without supplying derivatives?

xk+1 − xk = − f(xk)/f′(xk)

Idea: estimate f′(xk) from the values of f at xk−1 and xk, using finite differences:

f(xk−1) = f(xk) + (xk−1 − xk) f′(xk) + ε(xk−1 − xk)²

⇓

f′(xk) ≈ (f(xk) − f(xk−1)) / (xk − xk−1)

⇓

xk+1 = xk − f(xk) (xk − xk−1) / (f(xk) − f(xk−1))

Geometrically,

(f(xk) − f(xk−1)) / (xk − xk−1) = f(xk) / (xk − xk+1)

33
Advantages of the secant method:

• No need for analytical derivatives.

• Almost the same order of convergence as Newton (superlinear).

• Cheaper computationally: only one new function evaluation per
iteration.

Note: two initial values are required,
xinit and xinit + δx.

34
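The secant iteration, as a minimal Python sketch (names mine), started from the two values xinit and xinit + δx noted above:

```python
import math

# Sketch of the secant method (names mine): Newton with f'(xk) replaced
# by the finite-difference slope through (xk-1, f(xk-1)) and (xk, f(xk)).
def secant(f, x0, x1, steps=30, tol=1e-12):
    f0, f1 = f(x0), f(x1)
    for _ in range(steps):
        if f1 == f0:
            break                      # flat secant: cannot divide
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0 = x1, f1
        x1, f1 = x2, f(x2)             # one new evaluation per iteration
        if abs(x1 - x0) < tol:
            break
    return x1

# Same test function as for Newton, started at xinit = 1, dx = 1e-4.
f = lambda x: math.tan(x / 4.0) - 1.0
print(secant(f, 1.0, 1.0 + 1e-4))   # converges to pi
```

Note that only `f(x2)` is computed in each pass; the previous value is reused, which is the one-evaluation-per-iteration advantage claimed above.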
