ADAPTIVE FILTERS

Now consider the expectation of Eq. (37):

    E[w(n + 1)] = E[w(n)] + 2µE[d(n)x(n)] − 2µE[x(n)x^T(n)]E[w(n)]                    (38)

We have assumed that the filter weights are uncorrelated with the input signal. This is not strictly satisfied, because the weights depend on x(n); but we can assume that this correlation has small values, because it is associated with a slow trajectory. So, subtracting the optimum solution from both sides of Eq. (38), and substituting the autocorrelation matrix R and cross-correlation vector p, we get

    E[w(n + 1)] − R^{-1}p = E[w(n)] − R^{-1}p + 2µR{R^{-1}p − E[w(n)]}                 (39)

Next, defining

    ξ(n + 1) = E[w(n + 1)] − R^{-1}p                                                   (40)

from Eq. (39) we obtain

    ξ(n + 1) = (I − 2µR)ξ(n)                                                           (41)

This process is equivalent to a translation of coordinates. Next, we define R in terms of an orthogonal transformation (7):

    R = K^T QK                                                                         (42)

where Q is a diagonal matrix consisting of the eigenvalues (λ_0, λ_1, . . ., λ_{N−1}) of the correlation matrix R, and K is the unitary matrix consisting of the eigenvectors associated with these eigenvalues.

Substituting Eq. (42) in Eq. (41), we have

    ξ(n + 1) = (I − 2µK^T QK)ξ(n) = K^T(I − 2µQ)Kξ(n)                                  (43)

Multiplying both sides of Eq. (43) by K and defining

    v(n + 1) = Kξ(n + 1) = (I − 2µQ)v(n)                                               (44)

we may rewrite Eq. (44) in matrix form as

    v(n) = diag[(1 − 2µλ_0)^n, (1 − 2µλ_1)^n, . . ., (1 − 2µλ_{N−1})^n] v(0)           (45)

For stable convergence each term in Eq. (45) must be less than one in magnitude, so we must have

    0 < µ < 1/λ_max                                                                    (46)

where λ_max is the largest eigenvalue of the correlation matrix R, though this is not a sufficient condition for stability under all signal conditions. The final convergence rate of the algorithm is determined by the value of the smallest eigenvalue. An important characteristic of the input signal is therefore the eigenvalue spread or disparity, defined as

    λ_max/λ_min                                                                        (47)

So, from the point of view of convergence speed, the ideal value of the eigenvalue spread is unity; the larger the value, the slower the final convergence. It can be shown (3) that the eigenvalues of the autocorrelation matrix are bounded by the maximum and minimum values of the power spectral density of the input.

It is therefore concluded that the optimum signal for the fastest convergence of the LMS algorithm is white noise, and that any form of coloring in the signal will increase the convergence time. This dependence of convergence on the spectral characteristics of the input signal is a major problem with the LMS algorithm, as discussed in Ref. 6.

LMS-Based Algorithms

The Normalized LMS Algorithm. The normalized LMS (NLMS) algorithm is a variation of the ordinary LMS algorithm. Its objective is to overcome the gradient noise amplification problem, which is due to the fact that in the standard LMS the correction 2µe(n)x(n) is directly proportional to the input vector x(n). Therefore, when x(n) is large, the LMS algorithm amplifies the noise.

Consider the LMS algorithm defined by

    w(n + 1) = w(n) + 2µe(n)x(n)                                                       (48)

Now consider the difference between the optimum vector w* and the current weight vector w(n):

    v(n) = w* − w(n)                                                                   (49)

Assume that the reference signal and the error signal are

    d(n) = w*^T x(n)                                                                   (50)
    e(n) = d(n) − w^T(n)x(n)                                                           (51)

Substituting Eq. (50) in Eq. (51), we obtain

    e(n) = w*^T x(n) − w^T(n)x(n) = [w* − w(n)]^T x(n) = v^T(n)x(n)                    (52)

We decompose v(n) into its rectangular components

    v(n) = v_o(n) + v_p(n)                                                             (53)

Figure 13. Geometric interpretation of the NLMS algorithm: v(n) is resolved into a component v_p(n) parallel to x(n) and a component v_o(n) orthogonal to x(n).

where v_o(n) and v_p(n) are the orthogonal component and the parallel component of v(n) with respect to the input vector. This implies

    v_p(n) = Cx(n)                                                                     (54)

where C is a constant. Then, substituting Eq. (53) and Eq. (54) in Eq. (52), we get

    e(n) = [v_o(n) + v_p(n)]^T x(n)                                                    (55)
    e(n) = [v_o(n) + Cx(n)]^T x(n)                                                     (56)

Because v_o(n) is orthogonal to x(n), the scalar product is

    v_o^T(n)x(n) = 0                                                                   (57)

Then solving for C from Eqs. (56) and (57) yields

    C = e(n)/[x^T(n)x(n)]                                                              (58)

and

    v_p(n) = e(n)x(n)/[x^T(n)x(n)]                                                     (59)

The target now is to make v(n) as orthogonal as possible to x(n) in each iteration, as shown in Fig. 13. This can be done by setting

    v(n + 1) = v(n) − αv_p(n)                                                          (60)

Finally, substituting Eq. (49) and Eq. (59), we get

    w* − w(n + 1) = w* − w(n) − αe(n)x(n)/[x^T(n)x(n)]                                 (61)

    w(n + 1) = w(n) + αe(n)x(n)/[x^T(n)x(n)]                                           (62)

where, in order to reach the target, α must satisfy (9)

    0 < α < 2                                                                          (63)

In this way

    w(n + 1) = w(n) + βe(n)x(n)                                                        (64)

    β = α/[x^T(n)x(n)]                                                                 (65)

Therefore, the NLMS algorithm given by Eq. (64) is equivalent to the LMS algorithm if

    2µ = α/[x^T(n)x(n)]                                                                (66)

NLMS Algorithm

    Parameters:      M = filter order
                     α = step size
    Initialization:  Set w(0) = 0
    Computation:     For n = 0, 1, 2, . . ., compute
                     y(n) = w^T(n)x(n)
                     e(n) = d(n) − y(n)
                     β = α/[x^T(n)x(n)]
                     w(n + 1) = w(n) + βe(n)x(n)

Time-Variant LMS Algorithms. In the classical LMS algorithm there is a tradeoff between the validity of the final solution and the convergence speed, which limits its use in several practical applications: a small error in the coefficient vector requires a small convergence factor, whereas a high convergence rate requires a large convergence factor.

The search for an optimal solution to the problem of obtaining a high convergence rate together with a small error in the final solution has been an arduous task in recent years. Various algorithms have been reported in which time-variable convergence coefficients are used. These coefficients are chosen so as to meet both requirements: high convergence rate and low MSE. Interested readers may refer to Refs. 9–14.

Recursive Least-Squares Algorithm

The recursive least-squares (RLS) algorithm is required for rapidly tracking adaptive filters when neither the reference-signal nor the input-signal characteristics can be controlled. An important feature of the RLS algorithm is that it utilizes information contained in the input data extending back to the instant of time when the algorithm is initiated. The resulting convergence is therefore typically an order of magnitude faster than for the ordinary LMS algorithm.

In this algorithm the mean squared value of the error signal is directly minimized by a matrix inversion. Consider the FIR filter output

    y(n) = w^T x(n)                                                                    (67)

where x(n) is the input vector given by x(n) = [x(n), x(n − 1), . . ., x(n − M + 1)]^T and w is the weight vector. The optimum weight vector is computed in such a way that the mean squared error E[e²(n)] is minimized, where

    e(n) = d(n) − y(n) = d(n) − w^T x(n)                                               (68)

    E[e²(n)] = E[{d(n) − w^T x(n)}²]                                                   (69)

To minimize E[e²(n)], we can use the orthogonality principle in the estimation of the minimum. That is, we select the weight vector in such a way that the output error is orthogonal to the input vector. Then from Eqs. (67) and (68), we obtain

    E[x(n){d(n) − x^T(n)w}] = 0                                                        (70)

    E[x(n)x^T(n)w] = E[d(n)x(n)]                                                       (71)

Assuming that the weight vector is not correlated with the input vector, we obtain

    E[x(n)x^T(n)]w = E[d(n)x(n)]                                                       (72)

which can be rewritten as

    Rw = p                                                                             (73)

where R and p are the autocorrelation matrix of the input signal and the correlation vector between the reference signal d(n) and the input signal x(n), respectively. Next, assuming ergodicity, p can be estimated in real time as

    p(n) = Σ_{k=0}^{n} λ^{n−k} d(k)x(k)                                                (74)

    p(n) = Σ_{k=0}^{n−1} λ^{n−k} d(k)x(k) + d(n)x(n)
         = λ Σ_{k=0}^{n−1} λ^{n−k−1} d(k)x(k) + d(n)x(n)                               (75)

    p(n) = λp(n − 1) + d(n)x(n)                                                        (76)

where λ is the forgetting factor. In a similar way, we can obtain

    R(n) = λR(n − 1) + x(n)x^T(n)                                                      (77)

Then, multiplying Eq. (73) by R^{-1}(n) and substituting Eq. (76) and Eq. (77), we get

    w(n) = [λR(n − 1) + x(n)x^T(n)]^{-1}[λp(n − 1) + d(n)x(n)]                         (78)

Next, applying the matrix inversion lemma

    (A + BCD)^{-1} = A^{-1} − A^{-1}B(DA^{-1}B + C^{-1})^{-1}DA^{-1}                   (79)

with A = λR(n − 1), B = x(n), C = 1, and D = x^T(n), we obtain

    w(n) = {(1/λ)R^{-1}(n − 1) − (1/λ)R^{-1}(n − 1)x(n)[(1/λ)x^T(n)R^{-1}(n − 1)x(n) + 1]^{-1}
            × (1/λ)x^T(n)R^{-1}(n − 1)} [λp(n − 1) + d(n)x(n)]                         (80)

    w(n) = (1/λ){R^{-1}(n − 1) − [R^{-1}(n − 1)x(n)x^T(n)R^{-1}(n − 1)] / [λ + x^T(n)R^{-1}(n − 1)x(n)]}
           × [λp(n − 1) + d(n)x(n)]                                                    (81)

Next, for convenience of computation, let

    Q(n) = R^{-1}(n)                                                                   (82)

    K(n) = R^{-1}(n − 1)x(n) / [λ + x^T(n)R^{-1}(n − 1)x(n)]                           (83)

Then from Eq. (81) we have

    w(n) = (1/λ)[Q(n − 1) − K(n)x^T(n)Q(n − 1)][λp(n − 1) + d(n)x(n)]                  (84)

    w(n) = Q(n − 1)p(n − 1) + (1/λ)d(n)Q(n − 1)x(n)
           − K(n)x^T(n)Q(n − 1)p(n − 1) − (1/λ)d(n)K(n)x^T(n)Q(n − 1)x(n)              (85)

    w(n) = w(n − 1) + (1/λ)d(n)Q(n − 1)x(n)
           − [Q(n − 1)x(n)x^T(n)w(n − 1)] / [λ + x^T(n)Q(n − 1)x(n)]
           − (1/λ)[d(n)Q(n − 1)x(n)x^T(n)Q(n − 1)x(n)] / [λ + x^T(n)Q(n − 1)x(n)]      (86)

    w(n) = w(n − 1) + (1/λ){Q(n − 1)x(n) / [λ + x^T(n)Q(n − 1)x(n)]}
           × [λd(n) + d(n)x^T(n)Q(n − 1)x(n) − λx^T(n)w(n − 1) − d(n)x^T(n)Q(n − 1)x(n)]   (87)

    w(n) = w(n − 1) + (1/λ){Q(n − 1)x(n) / [λ + x^T(n)Q(n − 1)x(n)]}
           × λ[d(n) − x^T(n)w(n − 1)]                                                  (88)

Finally, we have

    w(n) = w(n − 1) + K(n)ε(n)                                                         (89)

where

    K(n) = Q(n − 1)x(n) / [λ + x^T(n)Q(n − 1)x(n)]                                     (90)

and ε(n) is the a priori estimation error, based on the old least-squares estimate of the weight vector that was made at time n − 1, and defined by

    ε(n) = d(n) − w^T(n − 1)x(n)                                                       (91)

Then Eq. (89) can be written as

    w(n) = w(n − 1) + Q(n)ε(n)x(n)                                                     (92)

where Q(n) is given by

    Q(n) = (1/λ){Q(n − 1) − [Q(n − 1)x(n)x^T(n)Q(n − 1)] / [λ + x^T(n)Q(n − 1)x(n)]}   (93)

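The RLS recursion of Eqs. (89)–(93) can likewise be sketched in a few lines. This is an illustrative simulation only; the plant, the forgetting factor, and the initialization constant below are assumptions for the demo. Q(0) is set to a large multiple of the identity, a common way to meet the initialization requirement on Q(0).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative RLS run on the same kind of 4-tap identification problem;
# the plant and the lambda/delta values are assumed for the demo.
M = 4
w_star = np.array([0.8, -0.4, 0.2, -0.1])
N = 200
x_sig = rng.standard_normal(N)
d_sig = np.convolve(x_sig, w_star)[:N]         # d(n) = w*^T x(n)

lam = 0.99                                     # forgetting factor lambda
Q = 1e3 * np.eye(M)                            # Q(0): large, keeps R(n) well conditioned
w = np.zeros(M)

for n in range(M - 1, N):
    x = x_sig[n - M + 1:n + 1][::-1]           # input vector x(n)
    k = Q @ x / (lam + x @ Q @ x)              # gain K(n), Eq. (90)
    eps = d_sig[n] - w @ x                     # a priori error, Eq. (91)
    w = w + k * eps                            # weight update, Eq. (89)
    Q = (Q - np.outer(k, x @ Q)) / lam         # inverse-matrix update, Eq. (93)
```

In this noiseless sketch the estimate matches `w_star` to high precision after only 200 samples, consistent with RLS converging roughly an order of magnitude faster than LMS.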
The applicability of the RLS algorithm requires that the recursion of Q(n) be initialized by choosing a starting value Q(0) that ensures the nonsingularity of the correlation matrix R(n) (3).

RLS Algorithm

    Initialization:  Set Q(0)
                     w(0) = 0
    Computation:     For n = 1, 2, . . ., compute
                     K(n) = Q(n − 1)x(n) / [λ + x^T(n)Q(n − 1)x(n)]
                     ε(n) = d(n) − w^T(n − 1)x(n)
                     w(n) = w(n − 1) + K(n)ε(n)
                     Q(n) = (1/λ)[Q(n − 1) − K(n)x^T(n)Q(n − 1)]

IMPLEMENTATIONS OF ADAPTIVE FILTERS

In the last few years many adaptive filter architectures have been proposed for reducing the convergence time without significantly increasing the computational cost. The digital implementations of adaptive filters are the most widely used. They yield good performance in terms of adaptivity, but consume considerable area and power. Several implementations achieve power reduction by dynamically minimizing the order of the digital filter (15) or by employing parallelism and pipelining (16). On the other hand, high-speed and low-power applications require both parallelism and reduced complexity (17).

It is well known that analog filters offer advantages of small area, low power, and higher-frequency operation over their digital counterparts, because analog signal-processing operations are normally much more efficient than digital ones. Moreover, since continuous-time adaptive filters do not need analog-to-digital conversion, quantization-related problems are avoided.

Gradient-descent adaptive learning algorithms are commonly used for analog adaptive learning circuits because of their simplicity of implementation. The LMS algorithm is often used to implement adaptive circuits. The basic elements used for implementing the LMS algorithm are delay elements (which are implemented with all-pass first-order sections), multipliers (based on a square law), and integrators. The techniques utilized to implement these circuits are discrete-time approaches, as discussed in Refs. 18 to 21, and continuous-time implementations (22–24).

Several proposed techniques involve the implementation of the RLS algorithm, which is known to have very low sensitivity to additive noise. However, a direct analog implementation of the RLS algorithm would require a considerable effort. To overcome this problem, several techniques have been proposed, such as structures based on Hopfield neural networks (23,25–27).

BIBLIOGRAPHY

 1. S. U. H. Qureshi, Adaptive equalization, Proc. IEEE, 73: 1349–1387, 1985.
 2. J. Makhoul, Linear prediction: A tutorial review, Proc. IEEE, 63: 561–580, 1975.
 3. S. Haykin, Adaptive Filter Theory, 3rd ed., Upper Saddle River, NJ: Prentice-Hall, 1996.
 4. B. Friedlander, Lattice filters for adaptive processing, Proc. IEEE, 70: 829–867, 1982.
 5. J. J. Shynk, Adaptive IIR filtering, IEEE ASSP Mag., 6 (2): 4–21, 1989.
 6. P. Hughes, S. F. A. Ip, and J. Cook, Adaptive filters—a review of techniques, BT Technol. J., 10 (1): 28–48, 1992.
 7. B. Widrow and S. D. Stearns, Adaptive Signal Processing, Englewood Cliffs, NJ: Prentice-Hall, 1985.
 8. B. Widrow and M. E. Hoff, Jr., Adaptive switching circuits, IRE WESCON Conv. Rec., part 4, 1960, pp. 96–104.
 9. J. Nagumo and A. Noda, A learning method for system identification, IEEE Trans. Autom. Control, AC-12: 282–287, 1967.
10. R. H. Kwong and E. W. Johnston, A variable step size LMS algorithm, IEEE Trans. Signal Process., 40: 1633–1642, 1992.
11. I. Nakanishi and Y. Fukui, A new adaptive convergence factor with constant damping parameter, IEICE Trans. Fundam. Electron. Commun. Comput. Sci., E78-A (6): 649–655, 1995.
12. T. Aboulnasr and K. Mayas, A robust variable step size LMS-type algorithm: Analysis and simulations, IEEE Trans. Signal Process., 45: 631–639, 1997.
13. F. Casco et al., A variable step size (VSS-CC) NLMS algorithm, IEICE Trans. Fundam., E78-A (8): 1004–1009, 1995.
14. M. Nakano et al., A time varying step size normalized LMS algorithm for adaptive echo canceler structures, IEICE Trans. Fundam., E78-A (2): 254–258, 1995.
15. J. T. Ludwig, S. H. Nawab, and A. P. Chandrakasan, Low-power digital filtering using approximate processing, IEEE J. Solid-State Circuits, 31: 395–400, 1996.
16. C. S. H. Wong et al., A 50 MHz eight-tap adaptive equalizer for partial-response channels, IEEE J. Solid-State Circuits, 30: 228–234, 1995.
17. R. A. Hawley et al., Design techniques for silicon compiler implementations of high-speed FIR digital filters, IEEE J. Solid-State Circuits, 31: 656–667, 1996.
18. M. H. White et al., Charge-coupled device (CCD) adaptive discrete analog signal processing, IEEE J. Solid-State Circuits, 14: 132–147, 1979.
19. T. Enomoto et al., Monolithic analog adaptive equalizer integrated circuit for wide-band digital communications networks, IEEE J. Solid-State Circuits, 17: 1045–1054, 1982.
20. F. J. Kub and E. W. Justh, Analog CMOS implementation of high frequency least-mean square error learning circuit, IEEE J. Solid-State Circuits, 30: 1391–1398, 1995.
21. Y. L. Cheung and A. Buchwald, A sampled-data switched-current analog 16-tap FIR filter with digitally programmable coefficients in 0.8 µm CMOS, Int. Solid-State Circuits Conf., February 1997.
22. J. Ramirez-Angulo and A. Díaz-Sánchez, Low voltage programmable FIR filters using voltage follower and analog multipliers, Proc. IEEE Int. Symp. Circuits Syst., Chicago, May 1993.
23. G. Espinosa F.-V. et al., Ecualizador adaptivo BiCMOS de tiempo continuo, utilizando una red neuronal de Hopfield [Continuous-time BiCMOS adaptive equalizer using a Hopfield neural network], CONIELECOMP'97, UDLA, Puebla, Mexico, 1997.
24. L. Ortíz-Balbuena et al., A continuous time adaptive filter structure, IEEE Int. Conf. Acoust., Speech Signal Process., Detroit, 1995, pp. 1061–1064.
25. M. Nakano et al., A continuous time equalizer structure using Hopfield neural networks, Proc. IASTED Int. Conf. Signal Image Process., Orlando, FL, November 1996, pp. 168–172.
26. G. Espinosa F.-V., A. Díaz-Méndez, and F. Maloberti, A 3.3 V CMOS equalizer using Hopfield neural network, 4th IEEE Int. Conf. Electron., Circuits, Syst., ICECS97, Cairo, 1997.
27. M. Nakano-Miyatake and H. Perez-Meana, Analog adaptive filtering based on a modified Hopfield network, IEICE Trans. Fundam., E80-A: 2245–2252, 1997.

Reading List
M. L. Honig and D. G. Messerschmitt, Adaptive Filters: Structures,
   Algorithms, and Applications, Norwell, MA: Kluwer, 1988.
B. Mulgrew and C. F. N. Cowan, Adaptive Filters and Equalisers,
   Norwell, MA: Kluwer, 1988.
S. Proakis et al., Advanced Signal Processing, Singapore: Macmillan.

GUILLERMO ESPINOSA FLORES
JOSÉ ALEJANDRO DÍAZ MÉNDEZ
National Institute for Research in
  Astrophysics, Optics and
  Electronics