How the SVM classifies instances: The SVM constructs a decision function that is represented in "dual space" by:

    D(x) = Σ_{k=1..p} α_k K(x_k, x) + b

Where:
- D(x) is the decision function.
- p is the number of training examples in the training set.
- α_k is a learned parameter associated with the k'th training example.
- K is the kernel function, which takes the k'th training example and the current input x.
- b is a learned bias which is the same across all examples.

So here α and b are the parameters which are learned. The kernel function used in the paper looks like this:

    K(x_k, x) = Φ(x_k) · Φ(x)

So we are taking the dot product of mapped versions of both the current training example vector and the given input vector to be classified. But let's just leave out the mappings for simplicity. We then have the dot product of x_k and x, which can be written as:

    x_k · x = |x_k| |x| cos(θ)

where θ is the angle between the two vectors and |·| gives the magnitude of a vector. This is standard linear algebra, but it may help those who have not seen it. If two vectors point in completely opposite directions, you have cos(180°) = -1, so you get a negative result with the maximum magnitude. Conversely, if the two vectors are very similar, the angle is near 0, and cos(0) = 1. The kernel function then gives a positive result if the given input example should receive the same label as the current training example, and a negative result if they are in opposite classes. The absolute value of K(x_k, x) gives the degree of this similarity or difference.

How is this translated into the decision function? If you consider two classes, A and B, an instance is in class A if the decision function yields a positive result; otherwise it is in class B. So the decision function must be adjusted so that every training example with label B gives a negative result (and a positive one for an example with an A label). This is accomplished through the adjusting of the α parameters.
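To make the dual decision function concrete, here is a minimal sketch in Python. The kernel is the plain dot product (i.e. the mapping Φ is taken to be the identity), and the α values and b are hand-picked for illustration, not the output of an actual training run; note that the B-class examples get negative α, as described above.

```python
def linear_kernel(xk, x):
    # K(xk, x) = Φ(xk) · Φ(x); with the identity mapping this is just the dot product
    return sum(a * b for a, b in zip(xk, x))

def decision(x, train, alphas, b, kernel=linear_kernel):
    # D(x) = Σ_k α_k K(x_k, x) + b
    return sum(a_k * kernel(x_k, x) for a_k, x_k in zip(alphas, train)) + b

# Illustrative (hand-picked, not learned) parameters:
# class A examples carry positive α, class B examples negative α
train  = [[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.0]]  # first two: A, last two: B
alphas = [0.5, 0.5, -0.5, -0.5]
b = 0.0

print(decision([2.0, 2.0], train, alphas, b))    # positive → class A
print(decision([-2.0, -2.0], train, alphas, b))  # negative → class B
```

An input pointing the same way as the A examples yields a positive D(x), and one pointing the opposite way yields a negative D(x), matching the cosine intuition above.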
For a training instance x_k that has a label y_k = B, its α_k should be negative, so that it contributes a negative term when compared to a similar instance x (K(x_k, x) would then be positive). If instead the input instance were very different, the kernel function would be negative, and the resulting quantity added to the sum would be positive (because the input is then considered part of class A, not B, which means D(x) should be positive).

How the SVM learns: The α parameters are adjusted in such a way as to maximize the distance between the hyperplane D(x) = 0 and the nearest examples in each class. I will describe how the margin of the decision function is learned in direct space, where the decision function is:

    D(x) = w · Φ(x) + b

where w is a vector of weights with the same dimensionality as the mapped input vectors (this is just like the activation function for a node in a neural network). The distance between the hyperplane and the example x is:

    D(x) / |w|

The objective is to find w such that |w| = 1 and the margin M is maximized, subject to the constraint that the distance for every example, which we can now express as y_k D(x_k), is greater than or equal to M (y_k here is used to compensate for the sign, depending on the class, so y_A = 1 and y_B = -1). The support vectors are the examples whose distance equals the margin. So the bound M* for this maximal margin equals the distance of the closest instance, and this all becomes a minimax problem:

    M* = max_{w, |w|=1} min_k y_k D(x_k)

It is possible to derive an identical expression in dual space, where the decision function relies on a kernel, as I discussed above, but I don't feel that I have the background to explain it. The article that explains all of this is "A Training Algorithm for Optimal Margin Classifiers" by Boser, Guyon, and Vapnik. You can find it at portal.acm.org.

Michael Lawrence
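As a small appendix to the margin discussion: the geometric margin for a fixed hyperplane can be sketched directly from the definitions above. This takes Φ as the identity mapping, and the weights w and bias b are hand-picked for illustration rather than learned; the margin is the smallest signed distance y_k D(x_k) / |w|, and the examples achieving it are the support vectors.

```python
import math

def decision_direct(x, w, b):
    # D(x) = w · Φ(x) + b, with Φ taken as the identity mapping
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(train, labels, w, b):
    # The distance of x_k from the hyperplane D(x) = 0 is D(x_k) / |w|;
    # multiplying by y_k (+1 for class A, -1 for class B) makes the signed
    # distance positive for correctly classified points. The margin is the
    # smallest of these distances.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(yk * decision_direct(xk, w, b) / norm
               for xk, yk in zip(train, labels))

train  = [[2.0, 1.0], [1.0, 1.0], [-1.0, -1.0], [-2.0, -1.0]]
labels = [1, 1, -1, -1]          # +1 for class A, -1 for class B
w, b = [1.0, 1.0], 0.0           # hand-picked, not learned

print(geometric_margin(train, labels, w, b))  # distance of the closest example
```

Here the closest examples on each side ([1.0, 1.0] and [-1.0, -1.0]) both sit at the minimum distance, so they would play the role of the support vectors for this hyperplane.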
