Summer Course: Data Mining

Support Vector Machines
and other penalization classifiers

Presenter: Georgi Nalbantov

August 2009
Contents

- Purpose
- Linear Support Vector Machines
- Nonlinear Support Vector Machines
- (Theoretical justifications of SVM)
- Marketing examples
- Other penalization classification methods
- Conclusion and Q&A
- (Some extensions)
Purpose

Task to be solved (the classification task):
- Classify cases (customers) into "type 1" or "type 2" on the basis of some known attributes (characteristics).

Chosen tool to solve this task:
- Support Vector Machines
The Classification Task

Given data on explanatory and explained variables, where the explained variable can take two values {±1}, find a function that gives the "best" separation between the "-1" cases and the "+1" cases:

    Given:  (x_1, y_1), ..., (x_m, y_m) ∈ R^n × {±1}

    Find:   f : R^n → {±1}

    "Best" function = the expected error on unseen data (x_{m+1}, y_{m+1}), ..., (x_{m+k}, y_{m+k}) is minimal.

Existing techniques to solve the classification task:
- Linear and Quadratic Discriminant Analysis
- Logit choice models (Logistic Regression)
- Decision trees, Neural Networks, Least Squares SVM
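A minimal sketch of the classification task above, written in Python with scikit-learn (an assumption: the slides do not prescribe any library or data set). A function f is fitted on labelled pairs (x_i, y_i) with y_i ∈ {-1, +1} and judged by its error on held-out, "unseen" cases.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # attributes (e.g. customer characteristics)
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)  # hypothetical "type 1" / "type 2" labels

# (x_1, y_1), ..., (x_m, y_m) are the training data; the rest play the role of unseen cases
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

f = SVC(kernel="linear").fit(X_train, y_train)  # the chosen tool: a (linear) SVM
print("error on unseen data:", np.mean(f.predict(X_test) != y_test))
```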
Support Vector Machines: Definition

- Support Vector Machines are a non-parametric tool for classification/regression.
- Support Vector Machines are used for prediction rather than description purposes.
- Support Vector Machines have been developed by Vapnik and co-workers.
Linear Support Vector Machines

A direct marketing company wants to sell a new book: "The Art History of Florence"
(Nissan Levin and Jacob Zahavi in Lattin, Carroll and Green, 2003).

Problem: How to identify buyers and non-buyers using the two variables:
- Months since last purchase
- Number of art books purchased

[Figure: scatter plot of buyers (∆) and non-buyers (●); x-axis: months since last purchase, y-axis: number of art books purchased]
Linear SVM: Separable Case

Main idea of SVM: separate the groups by a line.

However: there are infinitely many lines that have zero training error...
... which line shall we choose?

[Figure: scatter plot of buyers (∆) and non-buyers (●), now linearly separable]
Linear SVM: Separable Case

SVM use the idea of a margin around the separating line.

The thinner the margin, the more complex the model.

The best line is the one with the largest margin.

[Figure: the separable scatter plot with a separating line and the margin around it]
Linear SVM: Separable Case

The line having the largest margin is:

    w1*x1 + w2*x2 + b = 0

where
    x1 = months since last purchase
    x2 = number of art books purchased

Note:
    w1*x_i1 + w2*x_i2 + b ≥ +1   for i ∈ ∆
    w1*x_j1 + w2*x_j2 + b ≤ –1   for j ∈ ●

[Figure: the separating line with its normal vector w and the margin; ∆ points on one side, ● points on the other]
Linear SVM: Separable Case

The width of the margin is given by:

    margin = (1 – (–1)) / sqrt(w1^2 + w2^2) = 2 / ||w||

Note:
    maximize the margin 2/||w||   <=>   minimize ||w||   <=>   minimize ||w||^2

[Figure: the separating line with a margin of width 2/||w||]
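A small numerical check of the margin formula above, using scikit-learn and a made-up separable data set (both are assumptions; the slides name neither). For a fitted linear SVM the geometric margin equals 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0],   # ∆ "buyers"
              [3.0, 1.0], [4.0, 1.5], [5.0, 2.0]])  # ● "non-buyers"
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # very large C ~ (almost) hard margin
w = clf.coef_[0]
print("w =", w, " b =", clf.intercept_[0])
print("margin width = 2 / ||w|| =", 2.0 / np.linalg.norm(w))
```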
Linear SVM: Separable Case

maximize the margin 2/||w||   <=>   minimize ||w||   <=>   minimize ||w||^2

The optimization problem for SVM is:

    minimize   L(w) = ||w||^2 / 2

    subject to:
        w1*x_i1 + w2*x_i2 + b ≥ +1   for i ∈ ∆
        w1*x_j1 + w2*x_j2 + b ≤ –1   for j ∈ ●
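A sketch of this separable-case problem posed directly as a quadratic program, using cvxpy (an assumption: the slides do not name a solver, and the toy data are hypothetical). It minimizes ||w||^2 / 2 subject to y_i (w·x_i + b) ≥ 1, which combines both inequality groups above.

```python
import cvxpy as cp
import numpy as np

X = np.array([[1.0, 3.0], [2.0, 4.0], [3.0, 5.0],   # ∆ class, y = +1
              [3.0, 1.0], [4.0, 1.5], [5.0, 2.0]])  # ● class, y = -1
y = np.array([1, 1, 1, -1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]   # y_i (w . x_i + b) >= 1 for all i
prob = cp.Problem(objective, constraints)
prob.solve()
print("w =", w.value, " b =", b.value, " margin =", 2 / np.linalg.norm(w.value))
```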
Linear SVM: Separable Case

"Support vectors" are those points that lie on the boundaries of the margin.

The decision surface (line) is determined only by the support vectors; all other points are irrelevant.

[Figure: the separating line with the support vectors highlighted on the margin boundaries]
Linear SVM: Nonseparable Case

Training set: 1000 targeted customers.

Non-separable case: there is no line that separates the two groups without error.

Here, SVM minimize L(w, C):

    L(w, C) = ||w||^2 / 2 + C * Σ_i ξ_i
              (maximize the margin)   (minimize the training errors)

    L(w, C) = Complexity + Errors

    subject to:
        w1*x_i1 + w2*x_i2 + b ≥ +1 – ξ_i   for i ∈ ∆
        w1*x_j1 + w2*x_j2 + b ≤ –1 + ξ_j   for j ∈ ●
        ξ_i, ξ_j ≥ 0

[Figure: overlapping scatter of buyers (∆) and non-buyers (●) with the separating line and margin; misclassified points incur slack]
Linear SVM: The Role of C

    Bigger C   =>  increased complexity (thinner margin), smaller number of errors (better fit on the data)
    Smaller C  =>  decreased complexity (wider margin), bigger number of errors (worse fit on the data)

Vary both complexity and empirical error via C, by affecting the optimal w and the optimal number of training errors.

[Figure: the same data fitted with C = 5 (thinner margin) and with C = 1 (wider margin)]
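A hedged illustration of the role of C: refit the same data with different C values and observe how the margin width and the number of training errors change. The data are hypothetical and scikit-learn is an assumed tool, not one named by the slides.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([1, 3], 1.0, size=(30, 2)),    # ∆ "buyers"
               rng.normal([3, 1], 1.0, size=(30, 2))])   # ● "non-buyers"
y = np.array([1] * 30 + [-1] * 30)

for C in (5.0, 1.0, 0.1):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    errors = np.sum(clf.predict(X) != y)
    print(f"C={C:>4}: margin width = {margin:.2f}, training errors = {errors}")
```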
                  Bias – Variance trade-off
From Regression into Classification

We have a linear model, such as

    y = b * x + const

We have to estimate this relation using our training data set, having in mind the so-called "accuracy", or "0–1", loss function (our evaluation criterion).

The training data set we have consists of only finitely many observations, for instance:

    Training data:
        Output (y)   Input (x)
        -1            0.2
         1            0.5
         1            0.7
        ...           ...
        -1           -0.7
From Regression into Classification

We have a linear model, y = b*x + const, to be estimated from the training data above under the "accuracy" (0–1) loss function.

[Figure: the training observations plotted against x, with outputs at the y = +1 and y = –1 levels; the fitted line, the support vectors, and the "margin" are marked]
From Regression into Classification: Support Vector Machines

Flatter line => greater penalization; equivalently: smaller slope => bigger margin.

[Figure: the fitted line y = b*x + const through the ±1 observations, with the support vectors and the "margin" marked]
From Regression into Classification: Support Vector Machines

    y = b1*x1 + b2*x2 + const

Flatter line => greater penalization; equivalently: smaller slope => bigger margin.

[Figure: the model y = b1*x1 + b2*x2 + const shown over (x1, x2) on the left, and its "margin" in the (x1, x2) plane on the right]
Nonlinear SVM: Nonseparable Case

Mapping into a higher-dimensional space:

    (x_i1, x_i2)  ->  (x_i1^2, sqrt(2)*x_i1*x_i2, x_i2^2),   i = 1, ..., l

Optimization task: minimize L(w, C)

    L(w, C) = ||w||^2 / 2 + C * Σ_i ξ_i

    subject to:
        w1*x_i1^2 + w2*sqrt(2)*x_i1*x_i2 + w3*x_i2^2 + b ≥ +1 – ξ_i   for i ∈ ∆
        w1*x_j1^2 + w2*sqrt(2)*x_j1*x_j2 + w3*x_j2^2 + b ≤ –1 + ξ_j   for j ∈ ●

[Figure: the nonseparable scatter of ∆ and ● in the original (x1, x2) space]
Nonlinear SVM: Nonseparable Case

Map the data into a higher-dimensional space:  R^2 -> R^3

    (x1, x2)  ->  (x1^2, sqrt(2)*x1*x2, x2^2)

    ( 1,  1)  ->  (1,  sqrt(2), 1)   ∆
    (-1, -1)  ->  (1,  sqrt(2), 1)   ∆
    (-1,  1)  ->  (1, -sqrt(2), 1)   ●
    ( 1, -1)  ->  (1, -sqrt(2), 1)   ●

[Figure: left, the XOR-like pattern of ∆ and ● in the original (x1, x2) space; right, the mapped points in the (x1^2, sqrt(2)*x1*x2) plane, now linearly separable]
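A small sketch of this explicit feature map (scikit-learn and the tiny XOR-like data set are assumptions for illustration): the four points are not linearly separable in R^2, but their images under φ are separable in R^3.

```python
import numpy as np
from sklearn.svm import SVC

def phi(x):
    """Map (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

X = np.array([[1, 1], [-1, -1], [-1, 1], [1, -1]], dtype=float)
y = np.array([1, 1, -1, -1])            # ∆ = +1, ● = -1

Z = np.array([phi(x) for x in X])       # mapped data in R^3
clf = SVC(kernel="linear", C=1e6).fit(Z, y)
print("predictions in the mapped space:", clf.predict(Z))  # all four are correct
```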
Nonlinear SVM: Nonseparable Case

Find the optimal hyperplane in the transformed space.

[Figure: the mapped points in the (x1^2, sqrt(2)*x1*x2) plane with the maximal-margin separating line between ∆ and ●]
Nonlinear SVM: Nonseparable Case

Observe the decision surface in the original space (optional).

[Figure: the linear boundary from the transformed space mapped back as a nonlinear decision surface in the original (x1, x2) space]
Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem:

    Primal:
        min_{w,b,ξ}   ||w||^2 / 2 + C * Σ_i ξ_i
        subject to:   y_i (w · x_i + b) ≥ 1 – ξ_i
                      ξ_i ≥ 0
                      y_i ∈ {±1}

    Dual:
        max_α   Σ_i α_i – (1/2) * Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)
        subject to:   0 ≤ α_i ≤ C
                      Σ_i α_i y_i = 0
                      y_i ∈ {±1}
Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem:

    Feature map:  φ(x) = (x1^2, sqrt(2)*x1*x2, x2^2)

    φ(x_i) · φ(x_j) = (x_i1^2, sqrt(2)*x_i1*x_i2, x_i2^2) · (x_j1^2, sqrt(2)*x_j1*x_j2, x_j2^2)
                    = ( (x_i1, x_i2) · (x_j1, x_j2) )^2
                    = ( x_i · x_j )^2

    K(x_i, x_j) = φ(x_i) · φ(x_j)     (kernel function)

    Dual:
        max_α   Σ_i α_i – (1/2) * Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j)
        subject to:   0 ≤ α_i ≤ C,   Σ_i α_i y_i = 0,   y_i ∈ {±1}
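A quick numerical check of the kernel identity above: for the map φ used in the slides, φ(x_i)·φ(x_j) equals (x_i·x_j)^2, so the inner product in the transformed space can be computed in the original space. The two example points are arbitrary.

```python
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

xi = np.array([0.3, -1.2])   # arbitrary example points
xj = np.array([2.0,  0.5])

lhs = phi(xi) @ phi(xj)      # explicit map, then inner product
rhs = (xi @ xj) ** 2         # kernel evaluated in the original space
print(lhs, rhs)              # the two numbers coincide (up to rounding)
```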
Nonlinear SVM: Nonseparable Case

Dual formulation of the (primal) SVM minimization problem:

    φ(x_i) · φ(x_j) = (x_i · x_j)^2,   K(x_i, x_j) = φ(x_i) · φ(x_j)     (kernel function)

    Dual, in the transformed space:
        max_α   Σ_i α_i – (1/2) * Σ_i Σ_j α_i α_j y_i y_j ( φ(x_i) · φ(x_j) )

    Equivalently, via the kernel:
        max_α   Σ_i α_i – (1/2) * Σ_i Σ_j α_i α_j y_i y_j ( x_i · x_j )^2

    subject to:   0 ≤ α_i ≤ C,   Σ_i α_i y_i = 0,   y_i ∈ {±1}
Strengths and Weaknesses of SVM

Strengths of SVM:
- Training is relatively easy
- No local minima
- It scales relatively well to high-dimensional data
- The trade-off between classifier complexity and error can be controlled explicitly via C
- Robustness of the results
- The "curse of dimensionality" is avoided

Weaknesses of SVM:
- What is the best trade-off parameter C?
- Need a good transformation of the original space
The Ketchup Marketing Problem

Two types of ketchup: Heinz and Hunts.

Seven attributes:
- Feature Heinz
- Feature Hunts
- Display Heinz
- Display Hunts
- Feature & Display Heinz
- Feature & Display Hunts
- Log price difference between Heinz and Hunts

Training data: 2498 cases (89.11% Heinz is chosen)
Test data: 300 cases (88.33% Heinz is chosen)
The Ketchup Marketing Problem

Choose a kernel mapping:

    K(x_i, x_j) = (x_i · x_j)                          Linear kernel
    K(x_i, x_j) = (x_i · x_j + 1)^d                    Polynomial kernel
    K(x_i, x_j) = exp( –||x_i – x_j||^2 / (2σ^2) )     RBF kernel

Do a (5-fold) cross-validation procedure to find the best combination of the manually adjustable parameters (here: C and σ).

[Figure: heat map of cross-validation mean squared errors for an SVM with RBF kernel over a grid of C and σ values, from min to max]
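A hedged sketch of this tuning step: 5-fold cross-validation over a grid of C and RBF-width values with scikit-learn. The data below are placeholders standing in for the 2498 training cases with 7 attributes; the actual ketchup data set is not reproduced here, and scikit-learn parametrizes the RBF kernel by gamma = 1 / (2σ^2) rather than σ.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data with the same shape as the ketchup training set.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2498, 7))
y_train = rng.choice([-1, 1], size=2498, p=[0.11, 0.89])   # ~89% "Heinz"

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)   # 5-fold CV over the grid
search.fit(X_train, y_train)
print("best (C, gamma):", search.best_params_)
```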
The Ketchup Marketing Problem – Training Set

Model: Linear Discriminant Analysis                          Hit rate: 89.51%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts              68         204         272
              Heinz              58        2168        2226
    %         Hunts          25.00%      75.00%     100.00%
              Heinz           2.61%      97.39%     100.00%
The Ketchup Marketing Problem – Training Set

Model: Logit Choice Model                                    Hit rate: 77.79%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts             214          58         272
              Heinz             497        1729        2226
    %         Hunts          78.68%      21.32%     100.00%
              Heinz          22.33%      77.67%     100.00%
The Ketchup Marketing Problem – Training Set

Model: Support Vector Machines                               Hit rate: 99.08%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts             255          17         272
              Heinz               6        2220        2226
    %         Hunts          93.75%       6.25%     100.00%
              Heinz           0.27%      99.73%     100.00%
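For reference, a tiny sketch of how the hit rate in these tables follows from the confusion-matrix counts, using the SVM training-set figures above.

```python
# Counts copied from the SVM training-set table: (actual, predicted) -> count.
confusion = {("Hunts", "Hunts"): 255, ("Hunts", "Heinz"): 17,
             ("Heinz", "Hunts"): 6,   ("Heinz", "Heinz"): 2220}

correct = confusion[("Hunts", "Hunts")] + confusion[("Heinz", "Heinz")]
total = sum(confusion.values())
print(f"hit rate = {correct / total:.2%}")   # 99.08%
```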
The Ketchup Marketing Problem – Training Set

Model: Majority Voting                                       Hit rate: 89.11%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts               0         272         272
              Heinz               0        2226        2226
    %         Hunts           0.00%     100.00%     100.00%
              Heinz           0.00%     100.00%     100.00%
The Ketchup Marketing Problem – Test Set

Model: Linear Discriminant Analysis                          Hit rate: 88.33%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts               3          32          35
              Heinz               3         262         265
    %         Hunts           8.57%      91.43%     100.00%
              Heinz           1.13%      98.87%     100.00%
The Ketchup Marketing Problem – Test Set

Model: Logit Choice Model                                    Hit rate: 77.00%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts              29           6          35
              Heinz              63         202         265
    %         Hunts          82.86%      17.14%     100.00%
              Heinz          23.77%      76.23%     100.00%
The Ketchup Marketing Problem – Test Set

Model: Support Vector Machines                               Hit rate: 95.67%

                              Predicted group membership
    Original group            Hunts       Heinz       Total
    Count     Hunts              25          10          35
              Heinz               3         262         265
    %         Hunts          71.43%      28.57%     100.00%
              Heinz           1.13%      98.87%     100.00%
Part II: Penalized Classification and Regression Methods

- Support Hyperplanes
- Nearest Convex Hull classifier
- Soft Nearest Neighbor
- Application: an example Support Vector Regression financial study
- Conclusion
Classification: Support Hyperplanes

Consider a (separable) binary classification case: training data (+, –) and a test point x.

There are infinitely many hyperplanes that are semi-consistent with the training data (i.e. that commit no error on it).

[Figure: two panels of + and – training points with a test point x; the right panel shows several semi-consistent hyperplanes]
Classification: Support Hyperplanes

For the classification of the test point x, use the farthest-away hyperplane that is semi-consistent with the training data: the "support hyperplane" of x.

The SH decision surface: each point on it has two support hyperplanes.

[Figure: left, the support hyperplane of the test point x; right, the resulting SH decision surface between the two classes]
Classification: Support Hyperplanes

Toy-problem experiment with Support Hyperplanes and Support Vector Machines.

[Figure: six panels comparing decision boundaries on the same toy data: SH with linear kernel, SH with RBF kernel (σ = 5), SH with RBF kernel (σ = 35); SVM with linear kernel, SVM with RBF kernel (σ = 5), SVM with RBF kernel (σ = 35)]
Classification: Support Vector Machines and Support Hyperplanes

[Figure: side-by-side decision boundaries of Support Vector Machines (left) and Support Hyperplanes (right) on the same data]
Classification: Support Vector Machines and Nearest Convex Hull classification

[Figure: side-by-side decision boundaries of Support Vector Machines (left) and Nearest Convex Hull classification (right) on the same data]
Classification: Support Vector Machines and Soft Nearest Neighbor

[Figure: side-by-side decision boundaries of Support Vector Machines (left) and Soft Nearest Neighbor (right) on the same data]
Classification: Support Hyperplanes

[Figure: Support Hyperplanes decision boundaries under bigger versus smaller penalization]
Classification: Nearest Convex Hull classification

[Figure: Nearest Convex Hull classification decision boundaries under bigger versus smaller penalization]
Classification: Soft Nearest Neighbor

[Figure: Soft Nearest Neighbor decision boundaries under bigger versus smaller penalization]
Classification: Support Vector Machines, Nonseparable Case

[Figure: Support Vector Machines decision boundary on nonseparable data]
Classification: Support Hyperplanes, Nonseparable Case

[Figure: Support Hyperplanes decision boundary on nonseparable data]
Classification: Nearest Convex Hull classification, Nonseparable Case

[Figure: Nearest Convex Hull classification decision boundary on nonseparable data]
Classification: Soft Nearest Neighbor, Nonseparable Case

[Figure: Soft Nearest Neighbor decision boundary on nonseparable data]
Summary: Penalization Techniques for Classification

Penalization methods for classification: Support Vector Machines (SVM), Support Hyperplanes (SH), Nearest Convex Hull classification (NCH), and Soft Nearest Neighbour (SNN). In all cases, the classification of a test point x is determined using the hyperplane h. Equivalently, x is labelled +1 (–1) if it is farther away from the set S– (S+).
Conclusion

- Support Vector Machines (SVM) can be applied to binary and multi-class classification problems
- SVM behave robustly in multivariate problems
- Further research in various marketing areas is needed to justify or refute the applicability of SVM
- Support Vector Regressions (SVR) can also be applied

http://www.kernel-machines.org
Email: nalbantov@few.eur.nl

				