							Second order cone programming
approaches for handling missing
      and uncertain data
   P. K. Shivaswamy, C. Bhattacharyya
             and A. J. Smola

                               Discussion led by
                               Qi An
                               Mar 30th, 2007
                  Outline
•   Missing and uncertain data problem
•   Problem formulation
•   Classification with uncertainty
•   Extensions
•   Experimental results
•   Conclusions
        Missing data problem
• Consider a classification or regression problem
  with missing data (missing values occur only in
  the features).
  – Traditional method: simple imputation
  – Proposed method: robust estimation
• The classification or regression problem can be
  formulated as an optimization problem as long as
  the first and second moments of the data are
  available
Classification problem with missing
                data
• Compute the sample mean and covariance for
  each class (binary here) from the available
  observations.
• Impute the missing data with their conditional
  mean.
• Alternatively, the classification problem can be
  formulated as an optimization problem, as
  shown in the next slide.
       Classification problem with missing data
Suppose we are given n data points. The first c points have complete
feature vectors and the last (n-c) points have feature vectors with missing
values.
First, compute the sample mean and covariance for each class (binary
here) from the available observations.
Then, for any data vector with missing values, we can find the imputed mean
and covariance, given the sample mean and sample covariance decomposed
into blocks for the available and the missing features.
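The imputation step can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian model per class so that the conditional mean is mu_m + S_ma S_aa^{-1} (x_a - mu_a) and the conditional covariance is S_mm - S_ma S_aa^{-1} S_am, where a and m index the available and missing features. The function name `conditional_impute` and the use of NaN to mark missing entries are assumptions made for this example, and at least one feature is assumed to be observed.

```python
import numpy as np

def conditional_impute(x, mu, sigma):
    """Impute NaN entries of x from class mean mu and covariance sigma.

    Returns (x_filled, cov_missing): x with NaNs replaced by their
    conditional mean, and the conditional covariance of the missing block.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = np.isnan(x)   # mask of missing features
    a = ~m            # mask of available features
    if not m.any():
        return x.copy(), np.zeros((0, 0))
    # Blocks of the decomposed covariance matrix.
    s_aa = sigma[np.ix_(a, a)]
    s_ma = sigma[np.ix_(m, a)]
    s_mm = sigma[np.ix_(m, m)]
    # gain = S_ma S_aa^{-1}, via a linear solve instead of an explicit inverse.
    gain = np.linalg.solve(s_aa, s_ma.T).T
    x_filled = x.copy()
    x_filled[m] = mu[m] + gain @ (x[a] - mu[a])
    cov_missing = s_mm - gain @ s_ma.T
    return x_filled, cov_missing

# Toy example: second feature missing, features positively correlated.
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
x_filled, cov_missing = conditional_impute([2.0, np.nan], mu, sigma)
```

In the toy example the observed first feature lies above its mean, so the imputed second feature is pulled above its mean as well, and the conditional variance is smaller than the marginal variance.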
Then the classification problem is equivalent to
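The equation on this slide did not survive extraction. As a hedged sketch, the problem presumably has the following shape, written in the notation used on the later slides: \xi_i are slack variables, \bar{x}_i and \Sigma_i are the imputed mean and covariance of point i, W bounds the norm of w, and \gamma_i is a constant set by the required confidence level. The symbols and the exact arrangement are assumptions, not a copy of the original slide.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad
& y_i\left(\langle w, x_i\rangle + b\right) \ge 1 - \xi_i, && 1 \le i \le c \\
& y_i\left(\langle w, \bar{x}_i\rangle + b\right) \ge 1 - \xi_i
  + \gamma_i \left\lVert \Sigma_i^{1/2} w \right\rVert, && c+1 \le i \le n \\
& \lVert w \rVert \le W, \qquad \xi_i \ge 0, && 1 \le i \le n
\end{aligned}
```

Points with complete features enter through ordinary SVM constraints, while points with missing features enter through their imputed mean plus a penalty that grows with the imputation uncertainty along w.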
Once the weights w and b have been estimated from the dataset, we
can make a prediction for any new data vector x:
1. If x has no missing values, go to step 4.
2. If x has missing values, fill in the missing values xm using the
parameters of each class to get two imputed vectors, x+ and x-.
3. Find the distances d+ and d- of the imputed vectors from the
hyper-plane, and choose the imputed vector with the larger distance.
4. Classify the data using sgn(w’x+b).
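The four prediction steps can be sketched as follows. This is an illustrative sketch only: missing values are marked with NaN, each class is described by a hypothetical `(mu, sigma)` pair, and the helper names `impute_mean` and `predict` are made up for the example.

```python
import numpy as np

def impute_mean(x, mu, sigma):
    """Fill NaN entries of x with their conditional mean under (mu, sigma)."""
    m = np.isnan(x)
    a = ~m
    filled = x.copy()
    s_aa = sigma[np.ix_(a, a)]
    s_ma = sigma[np.ix_(m, a)]
    filled[m] = mu[m] + s_ma @ np.linalg.solve(s_aa, x[a] - mu[a])
    return filled

def predict(x, w, b, params_pos, params_neg):
    """Steps 1-4: impute with each class if needed, keep the candidate
    farther from the hyper-plane, then return the sign of w'x + b."""
    x = np.asarray(x, dtype=float)
    if np.isnan(x).any():
        # Step 2: one imputed vector per class (x+ and x-).
        candidates = [impute_mean(x, *params_pos), impute_mean(x, *params_neg)]
        # Step 3: distance from the hyper-plane is |w'x + b| / ||w||.
        dists = [abs(w @ c + b) / np.linalg.norm(w) for c in candidates]
        x = candidates[int(np.argmax(dists))]
    # Step 4: classify with the sign of the decision function.
    return int(np.sign(w @ x + b))

# Toy setup: two well-separated classes with identity covariance.
w = np.array([1.0, 1.0])
b = 0.0
params_pos = (np.array([2.0, 2.0]), np.eye(2))
params_neg = (np.array([-2.0, -2.0]), np.eye(2))
label_missing = predict([1.0, np.nan], w, b, params_pos, params_neg)
label_complete = predict([-1.0, -1.0], w, b, params_pos, params_neg)
```

Note that a point lying exactly on the hyper-plane yields 0 in this sketch; a real implementation would need a tie-breaking rule.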
Classification with certainty (SVM)
      Linearly separable
[Figure: classes +1 and -1 separated by the hyperplane ⟨w, x⟩ + b = 0, with margin hyperplanes ⟨w, x⟩ + b = 1 and ⟨w, x⟩ + b = -1]

                             The classification problem can be solved by
                             solving this optimization problem
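The slide's equation was lost in extraction. For reference, the standard hard-margin SVM problem for the separable case is shown here, on the assumption that it matches what the slide displayed:

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\lVert w \rVert^2 \\
\text{s.t.}\quad & y_i\left(\langle w, x_i\rangle + b\right) \ge 1, \qquad 1 \le i \le n
\end{aligned}
```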
Classification with certainty (SVM)
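  Linearly non-separable
[Figure: overlapping classes +1 and -1, with hyperplanes ⟨w, x⟩ + b = 1, ⟨w, x⟩ + b = 0 and ⟨w, x⟩ + b = -1]

Two equivalent formulations: one penalizes the slack variables with a
weight C, the other bounds the norm of w by W. The bound ||w|| ≤ W is a
second order cone constraint.

   The two formulations are equivalent for suitably chosen C and W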
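As a sketch of what the two formulations look like, here are the standard soft-margin SVM with penalty C and its norm-bounded counterpart with bound W. The slide's own equations were lost, so the exact notation is an assumption.

```latex
\begin{aligned}
\text{(penalty form)}\qquad
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
& \text{s.t.}\quad & y_i\left(\langle w, x_i\rangle + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0 \\[4pt]
\text{(norm-bounded form)}\qquad
\min_{w,\,b,\,\xi}\quad & \sum_{i=1}^{n} \xi_i
& \text{s.t.}\quad & y_i\left(\langle w, x_i\rangle + b\right) \ge 1 - \xi_i,\;\; \xi_i \ge 0,\;\; \lVert w \rVert \le W
\end{aligned}
```

The second form has a linear objective, linear constraints, and a single cone constraint on w, which is what makes it a second order cone program.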
  Classification with uncertainty
• Here, uncertainty means that for each pair (xi,yi)
  we only have a distribution over xi instead of a
  value for xi. As a result, xi is a random variable.
• In this case, we rewrite the constraint in a
  probabilistic form
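A sketch of the probabilistic constraint, assuming the same slack variables \xi_i as in the SVM formulation and a per-point confidence threshold \kappa_i:

```latex
\Pr\left\{ y_i\left(\langle w, x_i\rangle + b\right) \ge 1 - \xi_i \right\} \ge \kappa_i,
\qquad 1 \le i \le n
```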

   In other words, we require that the random variable xi lies on
   the correct side of the decision hyper-plane with some
   probability greater than a pre-set threshold κi
Robust formulation
   Assume only that each xi has mean x̄i and covariance Σi. We want to
   classify correctly even for the worst distribution in this class.
   The previous constraint becomes


Normal formulation
   Assume that each xi has mean x̄i and covariance Σi and follows a
   normal distribution. This should give tighter bounds, since we
   know exactly how xi is distributed.
   The previous constraint becomes


       It can be shown that both formulations lead to an optimization
       problem of the same form, differing only in a constant. The robust
       case relies on the multivariate Chebyshev inequality, which is
       summarized in Theorem 1.
This optimization problem can be solved efficiently using various optimization
methods
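To make the difference between the two cases concrete, the sketch below computes the constant that multiplies the uncertainty term in the resulting cone constraint y(⟨w, x̄⟩ + b) ≥ 1 - ξ + γ ||Σ^{1/2} w||. It assumes the usual forms γ = sqrt(κ / (1 - κ)) for the Chebyshev-based robust case and γ = Φ^{-1}(κ), the standard normal quantile, for the Gaussian case. The function names are hypothetical, and the Gaussian constant is non-negative only for κ ≥ 0.5.

```python
import math
from statistics import NormalDist

def gamma_robust(kappa):
    """Chebyshev-based constant: holds for any distribution with the
    given mean and covariance."""
    return math.sqrt(kappa / (1.0 - kappa))

def gamma_normal(kappa):
    """Gaussian constant: the standard normal quantile at kappa."""
    return NormalDist().inv_cdf(kappa)

def cone_constraint_holds(w, b, x_bar, sigma, y, gamma, xi=0.0):
    """Check y(<w, x_bar> + b) >= 1 - xi + gamma * sqrt(w' Sigma w)."""
    d = len(w)
    margin = y * (sum(w[j] * x_bar[j] for j in range(d)) + b)
    quad = sum(w[j] * sigma[j][k] * w[k] for j in range(d) for k in range(d))
    return margin >= 1.0 - xi + gamma * math.sqrt(quad)

# Toy point: mean at distance 3 from the hyper-plane, standard deviation 0.5.
w, b = [1.0, 0.0], 0.0
x_bar = [3.0, 0.0]
sigma = [[0.25, 0.0], [0.0, 0.25]]
ok_normal = cone_constraint_holds(w, b, x_bar, sigma, 1, gamma_normal(0.99))
ok_robust = cone_constraint_holds(w, b, x_bar, sigma, 1, gamma_robust(0.99))
```

For the same κ the robust constant is larger than the Gaussian one, so the distribution-free constraint demands a wider margin. In the toy example the point satisfies the Gaussian constraint at κ = 0.99 without slack but not the robust one.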
       Geometric interpretation of
              constraint
• Constraint 10(b) can be interpreted geometrically
  – Assume x takes values in an ellipsoid

  – The robustness constraint 10(b) is then equivalent to the
    geometric constraint below
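A sketch of the geometric statement, assuming the ellipsoid is centred at the mean \bar{x}_i with shape \Sigma_i and radius \gamma_i (the symbol \mathcal{E} is notation chosen for this example):

```latex
\begin{gathered}
\mathcal{E}(\bar{x}_i, \Sigma_i, \gamma_i)
  = \left\{ x : (x - \bar{x}_i)^\top \Sigma_i^{-1} (x - \bar{x}_i) \le \gamma_i^2 \right\} \\[4pt]
y_i\left(\langle w, x\rangle + b\right) \ge 1 - \xi_i
  \;\; \text{for all } x \in \mathcal{E}(\bar{x}_i, \Sigma_i, \gamma_i)
\;\Longleftrightarrow\;
y_i\left(\langle w, \bar{x}_i\rangle + b\right) \ge 1 - \xi_i
  + \gamma_i \left\lVert \Sigma_i^{1/2} w \right\rVert
\end{gathered}
```

The equivalence follows because the smallest value of y_i \langle w, x \rangle over the ellipsoid is y_i \langle w, \bar{x}_i \rangle - \gamma_i \lVert \Sigma_i^{1/2} w \rVert, so requiring the margin at the worst point of the ellipsoid gives exactly the cone constraint.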
            Error measure
• When classifying a point
  – Worst case error
  – Expected error
                      Extensions
• The optimization problem can be extended
  – Regression problem
  – Multi-class classification/regression
  – Some different constraints
  – Kernelized formulation


 Go back to the missing feature problem
                 Experiments
• In an OCR data recognition problem, compare
  SVM with simple imputation against the proposed
  approach


         Some samples were misclassified by SVM but were correctly
         classified by the robust classifier
Ionosphere regression problem
               Conclusions
• This paper proposes a second order cone
  programming formulation for designing robust
  linear prediction functions.
• This approach is capable of tackling uncertainty
  in the data vectors in both classification and
  regression settings
• It is applicable to any uncertainty distribution
  provided the first two moments are computable