Basis Expansion and Regularization

Prof. Liqing Zhang
Dept. Computer Science & Engineering,
Shanghai Jiaotong University
Outline
• Piece-wise Polynomials and Splines
• Wavelet Smoothing
• Smoothing Splines
• Automatic Selection of the Smoothing Parameters
• Nonparametric Logistic Regression
• Multidimensional Splines
• Regularization and Reproducing Kernel Hilbert Spaces

2012/11/6       Basis Expansion and Regularization   2
Piece-wise Polynomials and Splines
• Linear basis expansion: $f(x) = \sum_{m=1}^{M} \beta_m h_m(x)$
• Some basis functions that are widely used:
    $h_m(x) = x$
    $h_m(x) = x^p$
    $h_m(x) = \log(x)$
    $h_m(x) = \sin(mx),\ \cos(mx)$
Regularization
• Three approaches for controlling the complexity of the model:
  – Restriction: limit the class of functions in advance, e.g.
      $f(x) = \sum_{k=1}^{K} \beta_k h_k(x)$
  – Selection: adaptively include only those basis functions that
    contribute significantly to the fit of
      $y = \sum_{k=1}^{m} \beta_k h_k(x) + \epsilon$
  – Regularization: use the full basis but restrict the coefficients,
      $\min_{\beta} \sum_{i=1}^{N} \epsilon_i^2 = \min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{k=1}^{m} \beta_k h_k(x_i) \Big)^2$
      $\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \sum_{k=1}^{m} \beta_k h_k(x_i) \Big)^2 + \lambda J(\beta), \qquad J(\beta) = \|\beta\|^2\ ?$
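The regularization approach can be sketched in a few lines of numpy. This is an illustrative example (the data, the monomial basis, and the value of λ are all assumptions), using the ridge penalty J(β) = ‖β‖²:

```python
import numpy as np

# Hypothetical 1-D data: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

# Polynomial basis h_m(x) = x^m, m = 0..9.
H = np.vander(x, 10, increasing=True)

lam = 1e-6
# Penalized least squares: beta = (H'H + lam I)^{-1} H'y.
beta = np.linalg.solve(H.T @ H + lam * np.eye(10), H.T @ y)

fitted = H @ beta
print(round(float(np.mean((y - fitted) ** 2)), 4))
```

As λ → 0 this recovers the least-squares fit; larger λ shrinks the coefficients toward zero.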
Piecewise Polynomials and Splines
• Piecewise constant (knots $\xi_1, \xi_2$):
    $h_1(X) = I(X < \xi_1),\quad h_2(X) = I(\xi_1 \le X < \xi_2),\quad h_3(X) = I(\xi_2 \le X)$
• Piecewise linear: the same indicators, together with
    $h_{m+3}(X) = h_m(X)\,X$
• Piecewise linear, continuous at the knots:
    $h_1(X) = 1,\quad h_2(X) = X,\quad h_3(X) = (X - \xi_1)_+,\quad h_4(X) = (X - \xi_2)_+$
Piecewise Cubic Polynomials
• Increasing orders of continuity at the knots.
• A cubic spline with knots at $\xi_1$ and $\xi_2$:
    $1,\ X,\ X^2,\ X^3,\ (X - \xi_1)^3_+,\ (X - \xi_2)^3_+$
• This is the cubic-spline truncated power basis.
Piecewise Cubic Polynomials
• An order-M spline with knots $\xi_j$, j = 1,…,K is a piecewise
  polynomial of order M with continuous derivatives up to order M−2.
• A cubic spline has M = 4.
• Truncated power basis set:
    $h_j(X) = X^{j-1}, \qquad j = 1, \dots, M$
    $h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \qquad l = 1, \dots, K$
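The truncated power basis is straightforward to construct directly. A small numpy sketch (the knot locations are illustrative):

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Order-M truncated power basis: x^0..x^{M-1} plus (x - xi_l)_+^{M-1}."""
    x = np.asarray(x, dtype=float)
    cols = [x ** j for j in range(M)]                       # h_j(x) = x^{j-1}
    cols += [np.maximum(x - k, 0.0) ** (M - 1) for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 7)
B = truncated_power_basis(x, knots=[0.33, 0.66])  # cubic spline, K = 2 knots
print(B.shape)  # (7, 6): M + K = 4 + 2 columns
```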
Natural cubic spline
• A natural cubic spline adds additional constraints, namely that the
  function is linear beyond the boundary knots.

  [Figure: between the knots $\xi_1$, $\xi_2$, $\xi_3$ the function is
  cubic; beyond the boundary knots it is linear (natural boundary
  constraints).]
B-spline
• The augmented knot sequence $\tau$:
    $\tau_1 \le \tau_2 \le \dots \le \tau_M \le \xi_0$;
    $\tau_{j+M} = \xi_j, \qquad j = 1, \dots, K$;
    $\xi_{K+1} \le \tau_{K+M+1} \le \dots \le \tau_{K+2M}$
• $B_{i,m}(x)$, the i-th B-spline basis function of order m for the
  knot sequence $\tau$, $m \le M$:
    $B_{i,1}(x) = \begin{cases} 1 & \tau_i \le x < \tau_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad i = 1, \dots, K + 2M - 1$
    $B_{i,m}(x) = \dfrac{x - \tau_i}{\tau_{i+m-1} - \tau_i}\, B_{i,m-1}(x) + \dfrac{\tau_{i+m} - x}{\tau_{i+m} - \tau_{i+1}}\, B_{i+1,m-1}(x), \qquad i = 1, \dots, K + 2M - m$
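The Cox–de Boor recursion above can be implemented directly. A numpy sketch with an illustrative uniform knot sequence; on the interior of the knot range the order-4 basis functions form a partition of unity:

```python
import numpy as np

def bspline_basis(x, tau, m):
    """B_{i,m}(x) for knot sequence tau via the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    # Order 1: indicator of [tau_i, tau_{i+1}).
    B = [np.where((tau[i] <= x) & (x < tau[i + 1]), 1.0, 0.0)
         for i in range(len(tau) - 1)]
    for k in range(2, m + 1):            # build order k from order k-1
        Bk = []
        for i in range(len(tau) - k):
            left = right = 0.0
            if tau[i + k - 1] > tau[i]:
                left = (x - tau[i]) / (tau[i + k - 1] - tau[i]) * B[i]
            if tau[i + k] > tau[i + 1]:
                right = (tau[i + k] - x) / (tau[i + k] - tau[i + 1]) * B[i + 1]
            Bk.append(left + right)
        B = Bk
    return np.column_stack(B)

# Uniform knots around [0, 1]; evaluate the cubic (order-4) basis.
tau = np.linspace(-0.3, 1.3, 17)
x = np.linspace(0.05, 0.95, 50)
basis = bspline_basis(x, tau, 4)
print(np.allclose(basis.sum(axis=1), 1.0))  # partition of unity on the interior
```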
B-spline
• The sequence of B-splines up to order 4, with ten knots evenly
  spaced from 0 to 1.
• B-splines have local support: each is nonzero on an interval spanned
  by M+1 knots.
Smoothing Splines
• Based on the spline basis method: $f(x) = \sum_{k=1}^{m} \beta_k h_k(x)$
• So $y = \sum_{k=1}^{m} \beta_k h_k(x) + \epsilon$, where $\epsilon$ is the noise.
• Minimize the penalized residual sum of squares
    $\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2 \, dt$
  where $\lambda$ is a fixed smoothing parameter:
    $\lambda = 0$: f can be any function that interpolates the data
    $\lambda = \infty$: the simple least-squares line fit
Smoothing Splines
• The solution is a natural spline: $f(x) = \sum_{j=1}^{N} N_j(x)\, \theta_j$
• Then the criterion reduces to
    $\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta$
  where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t) N_k''(t) \, dt$
• So the solution is
    $\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y$
• The fitted smoothing spline:
    $\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\, \hat{\theta}_j$
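A numerical sketch of solving $\hat{\theta} = (N^T N + \lambda\Omega_N)^{-1}N^T y$. Note the assumptions: instead of the exact natural-spline basis and $\Omega_N$, this uses the identity basis with a second-difference roughness penalty (a Whittaker-type smoother, not a true smoothing spline), which keeps the linear algebra identical:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(40)

# Stand-in for the natural-spline basis: evaluate at the observations
# themselves (identity basis), with a second-difference penalty as a
# discrete approximation to Omega_jk = int N_j'' N_k'' dt.
N = np.eye(40)
D = np.diff(np.eye(40), n=2, axis=0)      # second-difference operator
Omega = D.T @ D

lam = 1.0
theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
fhat = N @ theta

# The penalized fit is smoother (less rough) than the raw data.
print(float(np.sum(np.diff(fhat, 2) ** 2)) < float(np.sum(np.diff(y, 2) ** 2)))
```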
Smoothing Splines
• The relative change in spinal bone mineral density (BMD) in
  adolescents, as a function of age.
• Separate smoothing splines are fit to the males and the females,
  with $\lambda \approx 0.00022$.
• This corresponds to about 12 degrees of freedom.
Smoothing Matrix
• $\hat{f}$, the N-vector of fitted values:
    $\hat{f} = N (N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y$
• The finite linear operator $S_\lambda$ is the smoother matrix.
• Compare with the linear operator in LS fitting on M cubic-spline
  basis functions with knot sequence $\xi$:
    $\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi y$, where $B_\xi$ is an $N \times M$ matrix
• Similarities and differences:
  – Both are symmetric, positive semidefinite matrices
  – $H_\xi H_\xi = H_\xi$ (idempotent); $S_\lambda S_\lambda \preceq S_\lambda$ (shrinking)
  – rank: $\mathrm{rank}(S_\lambda) = N$, $\mathrm{rank}(H_\xi) = M$
Smoothing Matrix
• Effective degrees of freedom of a smoothing spline:
    $df_\lambda = \mathrm{trace}(S_\lambda)$
• $S_\lambda$ in the Reinsch form: $S_\lambda = (I + \lambda K)^{-1}$
• Since $\hat{f} = S_\lambda y$, it solves $\min_f \|y - f\|^2 + \lambda f^T K f$
• $S_\lambda$ is symmetric and has a real eigendecomposition
    $S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \qquad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k}$
  – $d_k$ is the corresponding eigenvalue of $K$
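The relation between the Reinsch form and the eigenvalues can be checked numerically. The penalty matrix K below is an illustrative second-difference stand-in, not the exact smoothing-spline penalty:

```python
import numpy as np

n, lam = 30, 0.5
D = np.diff(np.eye(n), n=2, axis=0)
K = D.T @ D                                  # stand-in penalty matrix
S = np.linalg.inv(np.eye(n) + lam * K)       # Reinsch form S_lambda

d = np.linalg.eigvalsh(K)                    # eigenvalues d_k of K
rho = 1.0 / (1.0 + lam * d)                  # eigenvalues rho_k(lambda)

df = float(np.trace(S))
print(np.isclose(df, rho.sum()))             # df_lambda = sum_k 1/(1 + lam d_k)
```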
• Smoothing spline fit of ozone concentration versus Daggot pressure
  gradient.
• Two fits, with smoothing parameters chosen to give df = 5 and
  df = 10.
• The 3rd to 6th eigenvectors of the spline smoother matrices.
• The smoother matrix for a
  smoothing spline is nearly
  banded, indicating an
  equivalent kernel with local
  support.
Bias-Variance Tradeoff
• Example: $Y = f(X) + \epsilon$, with
    $f(X) = \dfrac{\sin(12(X + 0.2))}{X + 0.2}$
• For $\hat{f}_\lambda = S_\lambda y$, we have
    $\mathrm{cov}(\hat{f}_\lambda) = S_\lambda \mathrm{cov}(y) S_\lambda^T = \sigma^2 S_\lambda S_\lambda^T$
• The diagonal contains the pointwise variances at the training $x_i$
• The bias is given by
    $\mathrm{Bias}(\hat{f}_\lambda) = f - \mathrm{E}(\hat{f}_\lambda) = f - S_\lambda f$
• $f$ is the (unknown) vector of evaluations of the true f
Bias-Variance Tradeoff
• df = 5: the bias is high and the standard-error band is narrow.
• df = 9: the bias is slight and the variance has not increased
  appreciably.
• df = 15: overfitting; the standard-error band widens.
Bias-Variance Tradeoff
• The integrated squared prediction error (EPE) combines both bias and
  variance in a single summary:
    $\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}(Y - \hat{f}_\lambda(X))^2 = \mathrm{Var}(Y) + \mathrm{E}\big[\mathrm{Bias}^2(\hat{f}_\lambda(X)) + \mathrm{Var}(\hat{f}_\lambda(X))\big] = \sigma^2 + \mathrm{MSE}(\hat{f}_\lambda)$
• N-fold (leave-one-out) cross-validation:
    $\mathrm{CV}(\hat{f}_\lambda) = \sum_{i=1}^{N} \big(y_i - \hat{f}_\lambda^{(-i)}(x_i)\big)^2 = \sum_{i=1}^{N} \left( \frac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i,i)} \right)^2$
• The EPE and CV curves have a similar shape, and overall the CV curve
  is approximately unbiased as an estimate of the EPE curve.
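The leave-one-out shortcut in the last equality can be verified on a concrete linear smoother. The sketch below assumes a ridge-type smoother as a stand-in for $S_\lambda$; for such smoothers the identity holds exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
x = np.linspace(0, 1, n)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)

X = np.vander(x, 5, increasing=True)   # small polynomial basis
lam = 1e-2

def ridge_fit(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Hat matrix of the linear smoother.
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T)
resid = y - S @ y

# Shortcut: leave-one-out residual = residual / (1 - S_ii).
cv_shortcut = np.sum((resid / (1.0 - np.diag(S))) ** 2)

# Explicit leave-one-out refits, for comparison.
cv_explicit = 0.0
for i in range(n):
    mask = np.arange(n) != i
    beta_i = ridge_fit(X[mask], y[mask], lam)
    cv_explicit += (y[i] - X[i] @ beta_i) ** 2

print(np.isclose(cv_shortcut, cv_explicit))
```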
Logistic Regression
• Logistic regression with a single quantitative input X:
    $\log \dfrac{\Pr(Y=1 \mid X=x)}{\Pr(Y=0 \mid X=x)} = f(x)$
    $\Pr(Y=1 \mid X=x) = \dfrac{e^{f(x)}}{1 + e^{f(x)}}$
• The penalized log-likelihood criterion:
    $l(f; \lambda) = \sum_{i=1}^{N} \big[ y_i \log p(x_i) + (1 - y_i) \log(1 - p(x_i)) \big] - \tfrac{1}{2} \lambda \int \{f''(t)\}^2 \, dt$
    $\qquad\quad\; = \sum_{i=1}^{N} \big[ y_i f(x_i) - \log(1 + e^{f(x_i)}) \big] - \tfrac{1}{2} \lambda \int \{f''(t)\}^2 \, dt$
Multidimensional Splines
• Tensor product basis
  – The M1 × M2 dimensional tensor product basis:
      $g_{jk}(X) = h_{1j}(X_1)\, h_{2k}(X_2), \qquad j = 1, \dots, M_1,\ k = 1, \dots, M_2$
  – $h_{1j}(X_1)$: basis functions for coordinate $X_1$
  – $h_{2k}(X_2)$: basis functions for coordinate $X_2$
      $g(X) = \sum_{j=1}^{M_1} \sum_{k=1}^{M_2} \theta_{jk}\, g_{jk}(X)$
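Building the tensor product basis amounts to a per-row outer product of the two univariate design matrices. A minimal sketch (the per-coordinate bases here are illustrative polynomials rather than splines):

```python
import numpy as np

def tensor_basis(H1, H2):
    """g_{jk}(x) = h_{1j}(x1) * h_{2k}(x2), flattened to N x (M1*M2)."""
    return np.einsum('nj,nk->njk', H1, H2).reshape(H1.shape[0], -1)

# Hypothetical per-coordinate bases: low-order polynomials in x1 and x2.
rng = np.random.default_rng(3)
X = rng.uniform(size=(20, 2))
H1 = np.vander(X[:, 0], 3, increasing=True)   # M1 = 3
H2 = np.vander(X[:, 1], 4, increasing=True)   # M2 = 4

G = tensor_basis(H1, H2)
print(G.shape)  # (20, 12)
```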
Tensor product basis of B-splines, some selected pairs.
Multidimensional Splines
• High-dimensional smoothing splines:
    $\min_f \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda J[f], \qquad x_i \in \mathbb{R}^d$
  – J is an appropriate penalty function, e.g.
      $J[f] = \iint_{\mathbb{R}^2} \left[ \Big(\tfrac{\partial^2 f(x)}{\partial x_1^2}\Big)^2 + 2 \Big(\tfrac{\partial^2 f(x)}{\partial x_1 \partial x_2}\Big)^2 + \Big(\tfrac{\partial^2 f(x)}{\partial x_2^2}\Big)^2 \right] dx_1 \, dx_2$
    which leads to a smooth two-dimensional surface, the thin-plate
    spline.
• The solution has the form
    $f(x) = \beta_0 + \beta^T x + \sum_{j=1}^{N} \alpha_j h_j(x)$
Multidimensional Splines
• The decision boundary of an additive logistic regression model,
  using natural splines in each of the two coordinates.
• df = 1 + (4 − 1) + (4 − 1) = 7
Multidimensional Splines
• The results of using a tensor product of natural-spline bases in
  each coordinate.
• df = 4 × 4 = 16
Multidimensional Splines
• A thin-plate spline fit to the heart disease data.
• The data points are indicated, as well as the lattice of points used
  as knots.
Reproducing Kernel Hilbert Space
• A regularization problem has the form
    $\min_{f \in H} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda J(f), \qquad \text{e.g. } J(f) = \int \frac{|\tilde{f}(s)|^2}{\tilde{G}(s)} \, ds$
  – $L(y, f(x))$ is a loss function.
  – $J(f)$ is a penalty functional, and H is a space of functions on
    which $J(f)$ is defined.
• The solution has the form
    $f(x) = \sum_{k=1}^{K} \beta_k \phi_k(x) + \sum_{i=1}^{N} \alpha_i G(x - x_i)$
  – the $\phi_k$ span the null space of the penalty functional J
Spaces of Functions Generated by Kernels
• An important subclass is generated by a positive definite kernel
  $K(x, y)$.
• The corresponding space of functions $H_K$ is called a reproducing
  kernel Hilbert space (RKHS).
• Suppose that K has an eigen-expansion
    $K(x, y) = \sum_{i=1}^{\infty} \gamma_i\, \phi_i(x) \phi_i(y), \qquad \gamma_i \ge 0, \quad \sum_{i=1}^{\infty} \gamma_i^2 < \infty$
• Elements of $H_K$ have an expansion
    $f(x) = \sum_{i=1}^{\infty} c_i \phi_i(x), \qquad \text{with } \|f\|_{H_K}^2 = \sum_{i=1}^{\infty} c_i^2 / \gamma_i < \infty$
Spaces of Functions Generated by Kernels
• The regularization problem becomes
    $\min_{f \in H_K} \sum_{i=1}^{N} L(y_i, f(x_i)) + \lambda \|f\|_{H_K}^2$
    $= \min_{\{c_j\}} \sum_{i=1}^{N} L\big(y_i, \textstyle\sum_j c_j \phi_j(x_i)\big) + \lambda \sum_j c_j^2 / \gamma_j$
• The solution is finite-dimensional (Wahba, 1990):
    $f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$
• Reproducing property of the kernel:
    $\langle K(\cdot, x_i), f \rangle_{H_K} = f(x_i), \qquad \langle K(\cdot, x_i), K(\cdot, x_j) \rangle_{H_K} = K(x_i, x_j)$
Spaces of Functions Generated by Kernels
• With $f(x) = \sum_{i=1}^{N} \alpha_i K(x, x_i)$, the penalty
  functional is
    $J(f) = \sum_{i=1}^{N} \sum_{j=1}^{N} K(x_i, x_j)\, \alpha_i \alpha_j$
• The regularization problem reduces to a finite-dimensional criterion
    $\min_{\alpha} L(y, K\alpha) + \lambda\, \alpha^T K \alpha$
  – $K$ is the $N \times N$ matrix with ij-th entry $K(x_i, x_j)$
RKHS
• Penalized least squares:
    $\min_{\alpha} (y - K\alpha)^T (y - K\alpha) + \lambda\, \alpha^T K \alpha$
• The solution for $\alpha$: $\hat{\alpha} = (K + \lambda I)^{-1} y$
• The fitted values:
    $\hat{f}(x) = \sum_{k=1}^{N} \hat{\alpha}_k K(x, x_k)$
• The vector of N fitted values is given by
    $\hat{f} = K \hat{\alpha} = K (K + \lambda I)^{-1} y = (I + \lambda K^{-1})^{-1} y$
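A minimal kernel ridge regression sketch of $\hat{\alpha} = (K + \lambda I)^{-1} y$; the kernel, its bandwidth, and the data are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 30)
y = np.sin(4 * x) + 0.1 * rng.standard_normal(30)

def rbf_kernel(a, b, sigma=0.3):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

K = rbf_kernel(x, x)
lam = 1e-2
alpha = np.linalg.solve(K + lam * np.eye(30), y)   # alpha = (K + lam I)^{-1} y

fhat = K @ alpha                                   # fitted values f = K alpha
print(round(float(np.mean((y - fhat) ** 2)), 4))
```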
Example of RKHS
• Polynomial regression
  – Suppose $h(x): \mathbb{R}^p \to \mathbb{R}^M$, with M huge.
  – Given $x_1, x_2, \dots, x_N$, with $M \gg N$, let $H = \{h_j(x_i)\}$
    (an $N \times M$ matrix).
  – Loss function: $R(\beta) = (y - H\beta)^T (y - H\beta) + \lambda\, \beta^T \beta$
  – Setting $\partial R / \partial \beta = 0$:
      $-H^T (y - H\hat{\beta}) + \lambda \hat{\beta} = 0$
    so the fitted values satisfy
      $H\hat{\beta} = (HH^T + \lambda I)^{-1} HH^T y$
  – $\{HH^T\}_{ij} = \langle h(x_i), h(x_j) \rangle = K(x_i, x_j)$
  – Hence
      $\hat{f}(x) = h(x)^T \hat{\beta} = \sum_{i=1}^{N} \hat{\alpha}_i K(x, x_i), \qquad \hat{\alpha} = (K + \lambda I)^{-1} y$
Penalized Polynomial Regression
• Kernel: $K(x, x') = (1 + \langle x, x' \rangle)^d$ has $M = \binom{p + d}{d}$ eigenfunctions.
• E.g. d = 2, p = 2, so M = 6:
    $K(x, x') = (1 + x_1 x_1' + x_2 x_2')^2$
    $\phantom{K(x, x')} = 1 + 2 x_1 x_1' + 2 x_2 x_2' + (x_1 x_1')^2 + (x_2 x_2')^2 + 2 x_1 x_1' x_2 x_2'$
    $h(x) = (1,\ \sqrt{2}\, x_1,\ \sqrt{2}\, x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\, x_1 x_2)$
• The penalized polynomial regression:
    $\min_{\{\beta_m\}_1^M} \sum_{i=1}^{N} \Big( y_i - \sum_{m=1}^{M} \beta_m h_m(x_i) \Big)^2 + \lambda \sum_{m=1}^{M} \beta_m^2$
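The d = 2, p = 2 expansion can be checked numerically: the explicit feature map h reproduces the kernel exactly.

```python
import numpy as np

def h(x):
    """Explicit feature map for K(x, x') = (1 + <x, x'>)^2 with p = 2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

rng = np.random.default_rng(5)
x, z = rng.standard_normal(2), rng.standard_normal(2)

kernel = (1.0 + x @ z) ** 2
inner = h(x) @ h(z)
print(np.isclose(kernel, inner))  # the two agree
```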
RBF kernel & SVM kernel
• Gaussian radial basis functions:
    $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
    $h_j(x) = K(x, x_j), \qquad j = 1, \dots, M$
• Support vector machines:
    $f(x) = \beta_0 + \sum_{j=1}^{N} \alpha_j K(x, x_j)$
    $\min_{\beta_0, \alpha} \sum_{i=1}^{N} \big[ 1 - y_i f(x_i) \big]_+ + \frac{\lambda}{2}\, \alpha^T K \alpha$
Wavelet Smoothing
• Another type of basis: wavelet bases.
• Wavelet bases are generated by translations and dilations of a
  single scaling function $\phi(x)$.
• If $\phi(x) = I(x \in [0, 1])$, then $\phi_{0,k}(x) = \phi(x - k)$
  generates an orthonormal basis for functions with jumps at the
  integers.
• The $\phi_{0,k}(x)$ span a space called the reference space $V_0$.
• The dilations $\phi_{1,k}(x) = \sqrt{2}\, \phi(2x - k)$ form an
  orthonormal basis for a space $V_1 \supset V_0$.
• Generally, we have $\cdots \supset V_1 \supset V_0 \supset V_{-1} \supset \cdots$
$\phi_{j,k}(x) = 2^{j/2}\, \phi(2^j x - k)$

$V_0 \subset V_1 \subset V_2 \subset \cdots$

$V_{j+1} = V_j \oplus W_j$, where $W_j$ carries the detail of the
signal and is orthogonal to $V_j$.

$\psi(x) = \phi(2x) - \phi(2x - 1)$: the wavelet basis of $W_0$.
$\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$: the wavelet basis of $W_j$.
Wavelet Smoothing
• The $L^2$ space splits as
    $V_{j+1} = V_j \oplus W_j = V_{j-1} \oplus W_{j-1} \oplus W_j = \cdots$
    $\phantom{V_{j+1}} = V_0 \oplus W_0 \oplus W_1 \oplus \cdots \oplus W_j$
  with the nested sequence $\cdots \subset V_{-1} \subset V_0 \subset V_1 \subset \cdots$ and $W_0 = V_1 \ominus V_0$.
• The mother wavelet $\psi(x) = \phi(2x) - \phi(2x - 1)$ generates the
  functions $\psi_{0,k} = \psi(x - k)$, which form an orthonormal basis
  for $W_0$. Likewise $\psi_{j,k} = 2^{j/2}\, \psi(2^j x - k)$ form a
  basis for $W_j$.
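For the Haar scaling function, the split $V_{j+1} = V_j \oplus W_j$ is just an average/difference decomposition of the samples, and the original signal is exactly recoverable. A minimal sketch:

```python
import numpy as np

# One Haar refinement step: a signal in V1 (even length) splits into
# averages (the coarse part in V0) and details (in W0).
rng = np.random.default_rng(6)
v1 = rng.standard_normal(16)

avg = (v1[0::2] + v1[1::2]) / 2.0       # projection onto V0
det = (v1[0::2] - v1[1::2]) / 2.0       # detail coefficients in W0

# Reconstruct: even samples = avg + det, odd samples = avg - det.
recon = np.empty_like(v1)
recon[0::2] = avg + det
recon[1::2] = avg - det
print(np.allclose(recon, v1))  # V1 = V0 (+) W0: exact reconstruction
```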
Wavelet Smoothing
• Wavelet basis on $W_0$:
    $\psi(x) = \phi(2x) - \phi(2x - 1)$
• Wavelet basis on $W_j$:
    $\psi_{j,k}(x) = 2^{j/2}\, \psi(2^j x - k)$
• The symmlet-p wavelet:
  – A support of 2p − 1 consecutive intervals.
  – p vanishing moments:
      $\int \psi(x)\, x^j \, dx = 0, \qquad j = 0, \dots, p - 1$
Adaptive Wavelet Filtering
• Wavelet transform: $y^* = W^T y$
  – y: the response vector; W: the $N \times N$ orthonormal wavelet
    basis matrix
• Stein Unbiased Risk Estimation (SURE) criterion:
    $\min_{\theta} \|y - W\theta\|_2^2 + 2\lambda \|\theta\|_1$
  – The solution is soft thresholding:
      $\hat{\theta}_j = \mathrm{sign}(y_j^*) \big( |y_j^*| - \lambda \big)_+$
  – The fitted function is given by the inverse wavelet transform
    $\hat{f} = W\hat{\theta}$: the least-squares coefficients are
    translated toward zero and truncated at zero.
  – A simple choice for $\lambda$: $\lambda = \sigma \sqrt{2 \log N}$
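Soft thresholding with the universal threshold $\lambda = \sigma\sqrt{2\log N}$ takes only a few lines. In this illustrative example the coefficients are pure noise, so most of them should be set to zero:

```python
import numpy as np

def soft_threshold(u, lam):
    """theta_j = sign(u_j) (|u_j| - lam)_+ : solution of the SURE criterion."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(7)
N, sigma = 64, 1.0
ystar = rng.standard_normal(N) * sigma   # wavelet coefficients W^T y (here: pure noise)

lam = sigma * np.sqrt(2 * np.log(N))     # universal threshold
theta = soft_threshold(ystar, lam)

print(int(np.count_nonzero(theta)))      # few pure-noise coefficients survive
```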

				