EE103 (Spring 2004-05)

8. Linear least-squares problems

• overdetermined sets of linear equations
• least-squares solution
• examples and applications

8–1

Overdetermined sets of linear equations

m equations in n variables

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
$$

in matrix form: $Ax = b$ with $A \in \mathbf{R}^{m \times n}$, $b \in \mathbf{R}^m$

• $A$ is skinny ($m > n$); more equations than unknowns
• for most $b$, cannot solve for $x$

Linear least-squares problems 8–2

Least-squares solution

one approach to approximately solve $Ax = b$:

$$\text{minimize} \quad \|Ax - b\|$$

• $r = Ax - b$ is called the residual or error
• $x$ with smallest residual norm (smallest value of $\|r\| = \|Ax - b\|$) is called the least-squares solution
• in Matlab: x = A\b

equivalent formulation:

$$\text{minimize} \quad \|Ax - b\|^2 = \sum_{i=1}^{m} (a_i^T x - b_i)^2$$

where $a_i^T$ is the $i$th row of $A$

8–3

example: three equations in two variables $x_1$, $x_2$

$$2x_1 = 1, \qquad -x_1 + x_2 = 0, \qquad 2x_2 = -1$$

least-squares solution:

$$\text{minimize} \quad (2x_1 - 1)^2 + (-x_1 + x_2)^2 + (2x_2 + 1)^2$$

to find optimal $x_1$, $x_2$, set derivatives w.r.t. $x_1$ and $x_2$ equal to zero:

$$
\begin{aligned}
10x_1 - 2x_2 - 4 &= 0 \\
-2x_1 + 10x_2 + 4 &= 0
\end{aligned}
$$

solution $x_1 = 1/3$, $x_2 = -1/3$

(much more on solving LS problems later)

8–4

[Figure: four surface plots over $x_1, x_2 \in [-2, 2]$: $r_1^2 = (2x_1 - 1)^2$, $r_2^2 = (-x_1 + x_2)^2$, $r_3^2 = (2x_2 + 1)^2$, and $r_1^2 + r_2^2 + r_3^2$]

8–5

Least-squares data fitting

fit a function

$$g(t) = x_1 g_1(t) + x_2 g_2(t) + \cdots + x_n g_n(t)$$

to data $(t_1, y_1), \ldots, (t_m, y_m)$, i.e., we would like to have

$$g(t_1) = y_1, \quad g(t_2) = y_2, \quad \ldots, \quad g(t_m) = y_m$$

• $g_i(t) : \mathbf{R} \to \mathbf{R}$ are given functions (basis functions)
• problem variables: the coefficients $x_1, x_2, \ldots, x_n$
• usually $m \gg n$, hence no exact solution
• applications:
  – extrapolation, smoothing of data
  – developing simple, approximate model of observed data

8–6

least-squares fit: minimize the function

$$\sum_{i=1}^{m} (g(t_i) - y_i)^2 = \sum_{i=1}^{m} (x_1 g_1(t_i) + x_2 g_2(t_i) + \cdots + x_n g_n(t_i) - y_i)^2$$

in matrix notation: minimize $\|Ax - b\|^2$ where

$$
A = \begin{bmatrix}
g_1(t_1) & g_2(t_1) & g_3(t_1) & \cdots & g_n(t_1) \\
g_1(t_2) & g_2(t_2) & g_3(t_2) & \cdots & g_n(t_2) \\
\vdots & \vdots & \vdots & & \vdots \\
g_1(t_m) & g_2(t_m) & g_3(t_m) & \cdots & g_n(t_m)
\end{bmatrix},
\qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

8–7

Example: data fitting with polynomials

fit a polynomial

$$g(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}$$

to data $(t_1, y_1), \ldots, (t_m, y_m)$ ($m \geq n$), i.e., we would like to have

$$g(t_1) = y_1, \quad g(t_2) = y_2, \quad \ldots, \quad g(t_m) = y_m$$

a set of $m$ equations in $n$ variables

$$
\begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^{n-1}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

8–8

polynomial interpolation: m = n

if $m = n$, we can satisfy the equations $g(t_i) = y_i$ exactly by solving a set of $n$ linear equations in $n$ variables (see page 3–3)

example. fit a polynomial to $f(t) = 1/(1 + 25t^2)$ on $[-1, 1]$

[Figure: two panels, n = 5 and n = 15, for $t \in [-1, 1]$ (dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)]

increasing $n$ does not improve the overall quality of the fit

8–9

polynomial approximation: m > n

if $m > n$, we have more equations than variables, and (in general) it is not possible to satisfy the conditions $g(t_i) = y_i$ exactly

least-squares solution:

$$\text{minimize} \quad \sum_{i=1}^{m} (g(t_i) - y_i)^2 = \sum_{i=1}^{m} (x_1 + x_2 t_i + x_3 t_i^2 + \cdots + x_n t_i^{n-1} - y_i)^2$$

in matrix notation: minimize $\|Ax - b\|^2$ where

$$
A = \begin{bmatrix}
1 & t_1 & t_1^2 & \cdots & t_1^{n-1} \\
1 & t_2 & t_2^2 & \cdots & t_2^{n-1} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & t_m & t_m^2 & \cdots & t_m^{n-1}
\end{bmatrix},
\qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
\qquad
b = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

8–10

example: fit a polynomial to $f(t) = 1/(1 + 25t^2)$ on $[-1, 1]$

$m = 50$; $t_i$: $m$ equally spaced points in $[-1, 1]$

[Figure: two panels, n = 5 and n = 15, for $t \in [-1, 1]$ (dashed line: $f$; solid line: polynomial $g$; circles: the points $(t_i, y_i)$)]

much better fit overall

8–11

Least-squares estimation

$$y = Ax + w$$

• $x$ is what we want to estimate or reconstruct
• $y$ is our measurement(s)
• $w$ is an unknown noise or measurement error (assumed small)
• $i$th row of $A$ characterizes $i$th sensor or $i$th measurement

least-squares estimation: choose as estimate the vector $\hat{x}$ that minimizes

$$\|A\hat{x} - y\|$$

i.e., minimize the deviation between what we actually observed ($y$), and what we would observe if $x = \hat{x}$ and there were no noise ($w = 0$)

8–12

Example: navigation by range measurements

determine position $(u, v)$ in a plane by measuring distances to four beacons at known positions $(p_i, q_i)$

[Figure: four beacons at $(p_1, q_1)$, $(p_2, q_2)$, $(p_3, q_3)$, $(p_4, q_4)$ around the unknown position $(u, v)$, with the vectors $-a_1$, $-a_2$, $-a_3$, $-a_4$ pointing toward the beacons]

we assume that the beacons are far from the unknown position $(u, v)$, so linearization around $(u_0, v_0) = 0$ (say) is nearly exact

8–13

linearized equations (page 3–20):

$$a_{i1}u + a_{i2}v + w_i = \rho_i - \sqrt{p_i^2 + q_i^2}, \qquad i = 1, 2, 3, 4$$

• $-(a_{i1}, a_{i2})$: unit vector from 0 to beacon $i$
• $\rho_i$: measured distance to beacon $i$
• $w_i$: measurement error in $\rho_i$ plus small error due to linearization

problem: estimate $u$, $v$, given $\rho$, $(p_1, q_1)$, $(p_2, q_2)$, $(p_3, q_3)$, $(p_4, q_4)$

example
• beacon positions $(p_1, q_1) = (10, 0)$, $(p_2, q_2) = (-10, 2)$, $(p_3, q_3) = (3, 9)$, $(p_4, q_4) = (10, 10)$
• actual position is $(2, 2)$
• measured distances $\rho = (8.22, 11.9, 7.08, 11.33)$

8–14

approximate solutions:

• 'just enough measurements method': two measurements suffice to find $(u, v)$ (when error $w = 0$)

e.g., can compute $\hat{u}$, $\hat{v}$ from

$$
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} \hat{u} \\ \hat{v} \end{bmatrix}
=
\begin{bmatrix} \rho_1 - \sqrt{p_1^2 + q_1^2} \\ \rho_2 - \sqrt{p_2^2 + q_2^2} \end{bmatrix}
$$

example (of previous page):

$$
\begin{bmatrix} -1.00 & 0.00 \\ 0.98 & -0.20 \end{bmatrix}
\begin{bmatrix} \hat{u} \\ \hat{v} \end{bmatrix}
=
\begin{bmatrix} -1.78 \\ 1.72 \end{bmatrix}
$$

solution (via Matlab): $(\hat{u}, \hat{v}) = (1.78, 0.11)$ (norm of error: 1.90)

8–15

• least-squares method: compute $\hat{u}$, $\hat{v}$ by minimizing

$$\sum_{i=1}^{4} \left( a_{i1}\hat{u} + a_{i2}\hat{v} - \rho_i + \sqrt{p_i^2 + q_i^2} \right)^2$$

example (of page 8–14):

$$
A = \begin{bmatrix}
-1.00 & 0.00 \\
0.98 & -0.20 \\
-0.32 & -0.95 \\
-0.71 & -0.71
\end{bmatrix},
\qquad
\begin{bmatrix}
\rho_1 - \sqrt{p_1^2 + q_1^2} \\
\rho_2 - \sqrt{p_2^2 + q_2^2} \\
\rho_3 - \sqrt{p_3^2 + q_3^2} \\
\rho_4 - \sqrt{p_4^2 + q_4^2}
\end{bmatrix}
=
\begin{bmatrix} -1.77 \\ 1.72 \\ -2.41 \\ -2.81 \end{bmatrix}
$$

solution: $(\hat{u}, \hat{v}) = (1.97, 1.90)$ (norm of error: 0.10)

8–16

Least-squares system identification

measure input $u(t)$ and output $y(t)$ for $t = 0, \ldots, N$ of an unknown system

[Figure: block diagram, input $u(t)$ → unknown system → output $y(t)$]

example ($N = 70$):

[Figure: two plots, $u(t)$ and $y(t)$ versus $t$ for $t$ from 0 to 70]

system identification problem: find reasonable model for system based on measured I/O data $u$, $y$

8–17

a simple and widely used model:

$$\hat{y}(t) = h_0 u(t) + h_1 u(t-1) + h_2 u(t-2) + \cdots + h_n u(t-n)$$

where $\hat{y}(t)$ is the predicted output (or model output)

• called a moving average (MA) model with $n$ delays
• predicted output is a linear combination of current and $n$ previous inputs
• $h_0, \ldots, h_n$ are parameters of the model

least-squares identification: choose the model (i.e., $h_0, \ldots, h_n$) that minimizes the prediction error

$$E = \left( \sum_{t=n}^{N} (\hat{y}(t) - y(t))^2 \right)^{1/2}$$

8–18

formulation as a linear least-squares problem:

$$E = \left( \sum_{t=n}^{N} (h_0 u(t) + h_1 u(t-1) + \cdots + h_n u(t-n) - y(t))^2 \right)^{1/2} = \|Ax - b\|$$

$$
A = \begin{bmatrix}
u(n) & u(n-1) & u(n-2) & \cdots & u(0) \\
u(n+1) & u(n) & u(n-1) & \cdots & u(1) \\
u(n+2) & u(n+1) & u(n) & \cdots & u(2) \\
\vdots & \vdots & \vdots & & \vdots \\
u(N) & u(N-1) & u(N-2) & \cdots & u(N-n)
\end{bmatrix}
$$

$$
x = \begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_n \end{bmatrix},
\qquad
b = \begin{bmatrix} y(n) \\ y(n+1) \\ y(n+2) \\ \vdots \\ y(N) \end{bmatrix}
$$

8–19

example (I/O data of page 8–17) with $n = 7$: least-squares solution is

$h_0 = 0.0240$, $h_1 = 0.2819$, $h_2 = 0.4176$, $h_3 = 0.3536$,
$h_4 = 0.2425$, $h_5 = 0.4873$, $h_6 = 0.2084$, $h_7 = 0.4412$

[Figure: actual output $y(t)$ (solid) and output $\hat{y}(t)$ predicted from the model (dashed), versus $t$ for $t$ from 0 to 70]

8–20

model order selection: how large should $n$ be?

obviously the larger $n$, the smaller the prediction error on the data used to form the model

[Figure: relative prediction error $E/\|y\|$ versus $n$ for the data set $y$, $u$]

• suggests using largest possible $n$ for smallest prediction error
• a much more important question is: how good is the model at predicting new data (i.e., not used to calculate the model)?

8–21

model validation: test model on a new data set (from the same system)

[Figure: two plots, validation input $\bar{u}(t)$ and validation output $\bar{y}(t)$ versus $t$ for $t$ from 0 to 70]

[Figure: relative prediction error versus $n$, one curve for the modeling data and one for the validation data]

• for $n$ too large the predictive ability of the model becomes worse!
• plot suggests $n = 10$ is a good choice

8–22

for $n = 50$ the actual and predicted outputs on system identification and model validation data are:

[Figure: two panels versus $t$. Model identification I/O set: $y(t)$ (solid) and predicted $y(t)$ (dashed). Model validation I/O set: $\bar{y}(t)$ (solid) and predicted $\bar{y}(t)$ (dashed)]

loss of predictive ability when $n$ too large is called model overfit or overmodeling

8–23
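
numerical check of the three-equation example (page 8–4): setting the derivatives of $\|Ax - b\|^2$ to zero is equivalent to solving $A^TAx = A^Tb$. The sketch below does this in Python with exact fractions instead of Matlab's x = A\b. The helper name solve_normal_equations is an assumption made for illustration, and this is not necessarily the solution method the course develops later.

```python
from fractions import Fraction

def solve_normal_equations(A, b):
    """Least-squares solution of Ax ~ b from the normal equations A^T A x = A^T b.

    Illustrative helper (hypothetical name); assumes A has independent columns.
    """
    m, n = len(A), len(A[0])
    # Augmented system [A^T A | A^T b], one row per variable.
    G = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [sum(A[k][i] * b[k] for k in range(m))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(G[r][col]))
        G[col], G[pivot] = G[pivot], G[col]
        for r in range(n):
            if r != col:
                factor = G[r][col] / G[col][col]
                G[r] = [gr - factor * gc for gr, gc in zip(G[r], G[col])]
    return [G[i][n] / G[i][i] for i in range(n)]

# The equations 2*x1 = 1, -x1 + x2 = 0, 2*x2 = -1 from page 8-4.
A = [[Fraction(2), Fraction(0)],
     [Fraction(-1), Fraction(1)],
     [Fraction(0), Fraction(2)]]
b = [Fraction(1), Fraction(0), Fraction(-1)]

x = solve_normal_equations(A, b)
print(x)  # [Fraction(1, 3), Fraction(-1, 3)]
```

Exact fractions are used only so the result can be compared directly with $x_1 = 1/3$, $x_2 = -1/3$; with floating-point data the same function works unchanged.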
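
numerical sketch of the navigation example (pages 8–14 to 8–16): the code below rebuilds $A$ and the right-hand side from the beacon positions and the measured distances as printed on page 8–14, then compares the two-measurement estimate with the least-squares estimate. The function names solve_2x2 and error_norm are hypothetical. Because the slide prints $\rho$ rounded, the computed numbers need not match the slide's in every digit, and the two-measurement estimate is particularly sensitive to that rounding.

```python
import math

beacons = [(10.0, 0.0), (-10.0, 2.0), (3.0, 9.0), (10.0, 10.0)]
rho = [8.22, 11.9, 7.08, 11.33]   # measured distances, as printed on page 8-14
actual = (2.0, 2.0)               # actual position, used only to report the error

# Row i of A is minus the unit vector from the origin to beacon i;
# entry i of b is rho_i - sqrt(p_i^2 + q_i^2).
A = [(-p / math.hypot(p, q), -q / math.hypot(p, q)) for p, q in beacons]
b = [r - math.hypot(p, q) for r, (p, q) in zip(rho, beacons)]

def solve_2x2(m11, m12, m21, m22, r1, r2):
    """Solve [[m11, m12], [m21, m22]] z = (r1, r2) by Cramer's rule."""
    det = m11 * m22 - m12 * m21
    return ((r1 * m22 - m12 * r2) / det, (m11 * r2 - m21 * r1) / det)

def error_norm(estimate):
    """Euclidean distance between an estimate and the actual position."""
    return math.hypot(estimate[0] - actual[0], estimate[1] - actual[1])

# 'Just enough measurements': use the equations for beacons 1 and 2 only.
just_enough = solve_2x2(A[0][0], A[0][1], A[1][0], A[1][1], b[0], b[1])

# Least squares over all four beacons: solve the 2x2 system (A^T A) z = A^T b.
g11 = sum(a1 * a1 for a1, _ in A)
g12 = sum(a1 * a2 for a1, a2 in A)
g22 = sum(a2 * a2 for _, a2 in A)
h1 = sum(a1 * bi for (a1, _), bi in zip(A, b))
h2 = sum(a2 * bi for (_, a2), bi in zip(A, b))
least_squares = solve_2x2(g11, g12, g12, g22, h1, h2)

print("just enough  :", just_enough, "error", error_norm(just_enough))
print("least squares:", least_squares, "error", error_norm(least_squares))
```

The least-squares estimate uses all four measurements, so a single noisy distance has less influence on it than on the two-measurement estimate.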
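
sketch of the matrix construction for the MA model (page 8–19): the code below builds $A$ and $b$ from input and output sequences and evaluates the prediction error $E = \|Ax - b\|$ for a given parameter vector. The data and the coefficients h_true are made up for illustration (they are not the I/O data of page 8–17), and the function names are hypothetical.

```python
import math

def ma_regression_data(u, y, n):
    """Build A and b for an MA model with n delays.

    Row t - n of A is (u(t), u(t-1), ..., u(t-n)) and b holds y(t), for t = n, ..., N.
    """
    N = len(u) - 1
    A = [[u[t - k] for k in range(n + 1)] for t in range(n, N + 1)]
    b = [y[t] for t in range(n, N + 1)]
    return A, b

def prediction_error(A, b, h):
    """E = ||A h - b||, the prediction error of the model with parameters h."""
    return math.sqrt(sum((sum(a * hk for a, hk in zip(row, h)) - bi) ** 2
                         for row, bi in zip(A, b)))

# Made-up example: a short integer input and a hypothetical system with 2 delays.
u = [1, -2, 3, 0, 1, 4, -1, 2]
h_true = [2, -1, 3]
n = len(h_true) - 1
# Outputs for t < n are not used by the model, so they are set to 0 here.
y = [0] * n + [sum(h_true[k] * u[t - k] for k in range(n + 1))
               for t in range(n, len(u))]

A, b = ma_regression_data(u, y, n)
print(len(A), len(A[0]))                # 6 3, i.e. N - n + 1 rows and n + 1 columns
print(prediction_error(A, b, h_true))   # 0.0 for this noise-free data
```

With measured (noisy) data, no $h$ gives zero error in general, and the least-squares model is the $h$ that makes this $E$ as small as possible.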