
                                                                      EE103 (Spring 2004-05)


                    8. Linear least-squares problems



• overdetermined sets of linear equations

• least-squares solution

• examples and applications




                                                                                        8–1




                   Overdetermined sets of linear equations

m equations in n variables

                                 a11x1 + a12x2 + · · · + a1nxn = b1
                                 a21x1 + a22x2 + · · · + a2nxn = b2
                                                  ...
                                am1x1 + am2x2 + · · · + amnxn = bm

in matrix form: Ax = b with A ∈ Rm×n, b ∈ Rm


• A is skinny (m > n); more equations than unknowns

• for most b, cannot solve for x



Linear least-squares problems                                                           8–2
                                 Least-squares solution

one approach to approximately solve Ax = b:

                                       minimize    ‖Ax − b‖

• r = Ax − b is called the residual or error
• x with smallest residual (smallest value of ‖r‖ = ‖Ax − b‖) is called the
  least-squares solution
• in Matlab: x = A\b


equivalent formulation:

                     minimize    ‖Ax − b‖²  =  Σ_{i=1}^m (aiᵀx − bi)²

where aiᵀ is the ith row of A


Linear least-squares problems                                                    8–3




example: three equations in two variables x1, x2


                           2x1 = 1,      −x1 + x2 = 0,          2x2 = −1


least-squares solution:

                     minimize (2x1 − 1)² + (−x1 + x2)² + (2x2 + 1)²

to find optimal x1, x2, set derivatives w.r.t. x1 and x2 equal to zero:

                                       10x1 − 2x2 − 4 = 0
                                      −2x1 + 10x2 + 4 = 0

solution x1 = 1/3, x2 = −1/3

(much more on solving LS problems later)
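
as a numerical check, a minimal Matlab sketch of this example (A and b
encode the three equations above):

    % least-squares solution of 2*x1 = 1, -x1 + x2 = 0, 2*x2 = -1
    A = [2 0; -1 1; 0 2];
    b = [1; 0; -1];
    x = A\b        % backslash returns the least-squares solution [1/3; -1/3]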

Linear least-squares problems                                                    8–4
[figure: surface plots of r1² = (2x1 − 1)², r2² = (−x1 + x2)², r3² = (2x2 + 1)²,
and the sum r1² + r2² + r3², each over x1, x2 ∈ [−2, 2]]

             Linear least-squares problems                                                                         8–5




                                                 Least-squares data fitting

             fit a function

                                             g(t) = x1g1(t) + x2g2(t) + · · · + xngn(t)

             to data (t1, y1), . . . , (tm, ym), i.e., we would like to have

                               g(t1) = y1,                g(t2) = y2,         ...,           g(tm) = ym


              • gi(t) : R → R are given functions (basis functions)
              • problem variables: the coefficients x1, x2, . . . , xn
 • usually m ≫ n, hence no exact solution
              • applications:
                  – extrapolation, smoothing of data
                  – developing simple, approximate model of observed data

             Linear least-squares problems                                                                         8–6
least-squares fit: minimize the function

     Σ_{i=1}^m (g(ti) − yi)²  =  Σ_{i=1}^m (x1g1(ti) + x2g2(ti) + · · · + xngn(ti) − yi)²




in matrix notation: minimize ‖Ax − b‖² where

          [ g1(t1)  g2(t1)  g3(t1)  · · ·  gn(t1) ]           [ y1 ]
      A = [ g1(t2)  g2(t2)  g3(t2)  · · ·  gn(t2) ],      b = [ y2 ]
          [   ...     ...     ...            ...  ]           [ ...]
          [ g1(tm)  g2(tm)  g3(tm)  · · ·  gn(tm) ]           [ ym ]
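
to illustrate, a minimal Matlab sketch of this setup (the basis functions
and data below are invented placeholders, not from these slides):

    % data fitting with basis functions g1, ..., gn (placeholder choices)
    g = {@(t) ones(size(t)), @(t) t, @(t) t.^2};    % example basis functions
    t = linspace(0, 1, 20)';  y = exp(t);           % example data (ti, yi)
    A = zeros(length(t), numel(g));
    for j = 1:numel(g)
        A(:, j) = g{j}(t);                          % jth column of A: gj(ti)
    end
    x = A\y;                                        % least-squares coefficients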




Linear least-squares problems                                                                  8–7




                    Example: data fitting with polynomials

fit a polynomial

                        g(t) = x1 + x2t + x3t² + · · · + xnt^(n−1)

to data (t1, y1), . . . , (tm, ym) (m ≥ n), i.e., we would like to have

                  g(t1) = y1,              g(t2) = y2,         ...,   g(tm) = ym


a set of m equations in n variables

     [ 1   t1   t1²   · · ·   t1^(n−1) ] [ x1 ]     [ y1 ]
     [ 1   t2   t2²   · · ·   t2^(n−1) ] [ x2 ]     [ y2 ]
     [ ...  ...  ...            ...    ] [ ...]  =  [ ...]
     [ 1   tm   tm²   · · ·   tm^(n−1) ] [ xn ]     [ ym ]


Linear least-squares problems                                                                  8–8
                polynomial interpolation: m = n
                if m = n, we can satisfy the equations g(ti) = yi exactly by solving a set
                of n linear equations in n variables (see page 3–3)
                 example. fit a polynomial to f(t) = 1/(1 + 25t²) on [−1, 1]

                 [figure: interpolating polynomial for n = 5 and n = 15]

                 (dashed line: f; solid line: polynomial g; circles: the points (ti, yi))

                increasing n does not improve the overall quality of the fit

                Linear least-squares problems                                                                           8–9




                polynomial approximation: m > n

                if m > n, we have more equations than variables, and (in general) it is not
                possible to satisfy the conditions g(ti) = yi exactly


                least-squares solution:
                  minimize   Σ_{i=1}^m (g(ti) − yi)²  =  Σ_{i=1}^m (x1 + x2ti + x3ti² + · · · + xnti^(n−1) − yi)²




                 in matrix notation: minimize ‖Ax − b‖² where

                      [ 1   t1   t1²   · · ·   t1^(n−1) ]        [ x1 ]        [ y1 ]
                  A = [ 1   t2   t2²   · · ·   t2^(n−1) ],   x = [ x2 ],   b = [ y2 ]
                      [ ...  ...  ...            ...    ]        [ ...]        [ ...]
                      [ 1   tm   tm²   · · ·   tm^(n−1) ]        [ xn ]        [ ym ]




                Linear least-squares problems                                                                        8–10
                 example: fit a polynomial to f(t) = 1/(1 + 25t²) on [−1, 1]

                m = 50; ti: m equally spaced points in [−1, 1]
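
                 this example can be reproduced with a short Matlab sketch (shown for
                 n = 5; the plot command is schematic):

                     % least-squares polynomial fit to f(t) = 1/(1 + 25*t^2)
                     m = 50; n = 5;
                     t = linspace(-1, 1, m)';        % m equally spaced points in [-1, 1]
                     y = 1./(1 + 25*t.^2);
                     A = ones(m, n);
                     for j = 2:n
                         A(:, j) = t.^(j-1);         % column j of A holds ti^(j-1)
                     end
                     x = A\y;                        % coefficients x1, ..., xn
                     plot(t, y, 'o', t, A*x, '-')    % data points and fitted polynomial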

                 [figure: least-squares polynomial fits for n = 5 and n = 15]

                 (dashed line: f; solid line: polynomial g; circles: the points (ti, yi))


                much better fit overall

                Linear least-squares problems                                                       8–11




                                                    Least-squares estimation


                                                             y = Ax + w

                • x is what we want to estimate or reconstruct
                • y is our measurement(s)
                • w is an unknown noise or measurement error (assumed small)
                • ith row of A characterizes ith sensor or ith measurement

                 least-squares estimation: choose as estimate the vector x̂ that minimizes

                                                ‖Ax̂ − y‖

                 i.e., minimize the deviation between what we actually observed (y), and
                 what we would observe if x = x̂ and there were no noise (w = 0)
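
                 a minimal Matlab sketch of the idea (the sizes, sensor matrix, and
                 noise level below are invented for illustration):

                     % simulate y = A*x + w and recover x by least squares
                     m = 100; n = 5;                   % example sizes
                     A = randn(m, n);                  % illustrative sensor matrix
                     x_true = randn(n, 1);
                     y = A*x_true + 0.01*randn(m, 1);  % measurements with small noise w
                     x_hat = A\y;                      % least-squares estimate
                     norm(x_hat - x_true)              % small when the noise is small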

                Linear least-squares problems                                                       8–12
               Example: navigation by range measurements

  determine position (u, v) in a plane by measuring distances to four
  beacons at known positions (pi, qi)

  [figure: unknown position (u, v) and four beacons at (p1, q1), . . . , (p4, q4);
  the arrows −a1, . . . , −a4 point from the origin toward the beacons]

  we assume that the beacons are far from the unknown position (u, v), so
  linearization around (u0, v0) = 0 (say) is nearly exact

  Linear least-squares problems                                                           8–13




  linearized equations (page 3–20):


              ai1u + ai2v + wi = ρi − √(pi² + qi²),        i = 1, 2, 3, 4

  • −(ai1, ai2): unit vector from 0 to beacon i
  • ρi: measured distance to beacon i
  • wi: measurement error in ρi plus small error due to linearization

  problem: estimate u, v, given ρ, (p1, q1), (p2, q2), (p3, q3), (p4, q4)

  example
  • beacon positions (p1, q1) = (10, 0), (p2, q2) = (−10, 2),
    (p3, q3) = (3, 9), (p4, q4) = (10, 10)
  • actual position is (2, 2)
  • measured distances ρ = (8.22, 11.9, 7.08, 11.33)


  Linear least-squares problems                                                           8–14
approximate solutions:

• ‘just enough measurements method’: two measurements suffice to find
  (u, v) (when error w = 0)
    e.g., can compute û, v̂ from

          [ ρ1 − √(p1² + q1²) ]   [ a11  a12 ] [ û ]
          [ ρ2 − √(p2² + q2²) ] = [ a21  a22 ] [ v̂ ]



    example (of previous page):

          [ −1.78 ]   [ −1.00   0.00 ] [ û ]
          [  1.72 ] = [  0.98  −0.20 ] [ v̂ ]

    solution (via Matlab): (û, v̂) = (1.78, 0.11) (norm of error: 1.90)


Linear least-squares problems                                                                 8–15




• least-squares method: compute û, v̂ by minimizing

          Σ_{i=1}^4 (ai1û + ai2v̂ − ρi + √(pi² + qi²))²




    example (of page 8–14):

              [ −1.00   0.00 ]          [ ρ1 − √(p1² + q1²) ]   [ −1.78 ]
          A = [  0.98  −0.20 ],     b = [ ρ2 − √(p2² + q2²) ] = [  1.72 ]
              [ −0.32  −0.95 ]          [ ρ3 − √(p3² + q3²) ]   [ −2.41 ]
              [ −0.71  −0.71 ]          [ ρ4 − √(p4² + q4²) ]   [ −2.81 ]

    solution: (û, v̂) = (1.97, 1.90) (norm of error: 0.10)
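
    both estimates can be computed in Matlab from the numbers above (a
    sketch; the entries are rounded as printed, so last digits may differ
    slightly from the values reported here):

        % navigation example: 'just enough measurements' vs. least squares
        A = [-1.00 0.00; 0.98 -0.20; -0.32 -0.95; -0.71 -0.71];
        b = [-1.78; 1.72; -2.41; -2.81];
        xy2  = A(1:2,:)\b(1:2)   % two measurements: approx (1.78, 0.11)
        xyls = A\b               % least squares: approx (1.97, 1.90)
        norm(xyls - [2; 2])      % distance to actual position (2, 2): approx 0.10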




Linear least-squares problems                                                                 8–16
                                   Least-squares system identification

           measure input u(t) and output y(t) for t = 0, . . . , N of an unknown system

                              u(t) −→ [ unknown system ] −→ y(t)

           example (N = 70):

           [figure: plots of the measured input u(t) and output y(t) for t = 0, . . . , 70]

           system identification problem: find reasonable model for system based
           on measured I/O data u, y

          Linear least-squares problems                                                             8–17




          a simple and widely used model:

                  ŷ(t) = h0u(t) + h1u(t − 1) + h2u(t − 2) + · · · + hnu(t − n)

           where ŷ(t) is the predicted output (or model output)

           • called a moving average (MA) model with n delays
           • predicted output is a linear combination of current and n previous inputs
           • h0, . . . , hn are parameters of the model

          least-squares identification: choose the model (i.e., h0, . . . , hn) that
          minimizes the prediction error

                              E = ( Σ_{t=n}^N (ŷ(t) − y(t))² )^(1/2)




          Linear least-squares problems                                                             8–18
  formulation as a linear least-squares problem:

       E  =  ( Σ_{t=n}^N (h0u(t) + h1u(t − 1) + · · · + hnu(t − n) − y(t))² )^(1/2)
          =  ‖Ax − b‖

           [ u(n)      u(n − 1)   u(n − 2)   · · ·   u(0)     ]
           [ u(n + 1)  u(n)       u(n − 1)   · · ·   u(1)     ]
       A = [ u(n + 2)  u(n + 1)   u(n)       · · ·   u(2)     ]
           [   ...       ...        ...                ...    ]
           [ u(N)      u(N − 1)   u(N − 2)   · · ·   u(N − n) ]

           [ h0 ]          [ y(n)     ]
           [ h1 ]          [ y(n + 1) ]
       x = [ h2 ],     b = [ y(n + 2) ]
           [ ...]          [   ...    ]
           [ hn ]          [ y(N)     ]
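
  one convenient way to build A in Matlab, exploiting its constant-diagonal
  structure (a sketch with synthetic data, since only plots of the measured
  data are given; the 'unknown system' below is invented):

      % synthetic I/O data; u(k) stores u(t) with t = k-1, k = 1, ..., N+1
      N = 70;
      u = randn(N+1, 1);
      y = filter([0.1 0.3 0.4 0.2], 1, u) + 0.05*randn(N+1, 1);  % stand-in system + noise
      n = 7;                                    % model order
      A = toeplitz(u(n+1:N+1), u(n+1:-1:1));    % row for time t: u(t), ..., u(t-n)
      b = y(n+1:N+1);
      h = A\b;                                  % MA coefficients h0, ..., hn
      E = norm(A*h - b)                         % prediction error E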


  Linear least-squares problems                                                                      8–19




  example (I/O data of page 8–17) with n = 7: least-squares solution is

        h0 = 0.0240,                    h1 = 0.2819,            h2 = 0.4176,         h3 = 0.3536,
        h4 = 0.2425,                    h5 = 0.4873,            h6 = 0.2084,         h7 = 0.4412



  [figure: actual output y(t) (solid) and model-predicted output ŷ(t) (dashed)
  for t = 0, . . . , 70]

  Linear least-squares problems                                                                      8–20
               model order selection: how large should n be?

               obviously the larger n, the smaller the prediction error on the data used to
               form the model
                [figure: relative prediction error E/‖y‖ versus n, computed on the data
                set (y, u) used to form the model]
               • suggests using largest possible n for smallest prediction error
               • a much more important question is: how good is the model at
                 predicting new data (i.e., not used to calculate the model)?
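
                curves like those on this and the following pages can be generated by
                sweeping n (a sketch continuing the code above; uv, yv denote a second,
                validation record from the same invented system):

                    % relative prediction error vs. model order n
                    uv = randn(N+1, 1);
                    yv = filter([0.1 0.3 0.4 0.2], 1, uv) + 0.05*randn(N+1, 1);
                    nmax = 20;
                    e_fit = zeros(nmax, 1); e_val = zeros(nmax, 1);
                    for n = 1:nmax
                        A  = toeplitz(u(n+1:N+1),  u(n+1:-1:1));
                        h  = A \ y(n+1:N+1);                      % fit on modeling data
                        e_fit(n) = norm(A*h - y(n+1:N+1)) / norm(y(n+1:N+1));
                        Av = toeplitz(uv(n+1:N+1), uv(n+1:-1:1));
                        e_val(n) = norm(Av*h - yv(n+1:N+1)) / norm(yv(n+1:N+1));
                    end
                    plot(1:nmax, e_fit, 1:nmax, e_val)   % e_fit keeps decreasing;
                                                         % e_val typically flattens or
                                                         % grows once n is too large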


               Linear least-squares problems                                                                                            8–21




               model validation: test model on a new data set (from the same system)
                [figure: plots of the validation input ū(t) and output ȳ(t) for
                t = 0, . . . , 70]
                [figure: relative prediction error versus n for the modeling data and
                the validation data]

                • for n too large the predictive ability of the model becomes worse!
                • plot suggests n = 10 is a good choice

               Linear least-squares problems                                                                                            8–22
                      for n = 50 the actual and predicted outputs on system identification and
                      model validation data are:

                      [figure: left, model identification I/O set: actual y(t) (solid) and
                      predicted y(t) (dashed); right, model validation I/O set: actual ȳ(t)
                      (solid) and predicted ȳ(t) (dashed)]


                      loss of predictive ability when n too large is called model overfit or
                      overmodeling


                      Linear least-squares problems                                                               8–23

								