
Support Vector Machine
Chan-Gu Gang, MK Hasan and Ha-Yong, Jung
2004/11/17

Introduction: Learning Theory
Objective: given two classes of objects and a new object, assign the new object to one of the two classes. This is binary pattern recognition (binary classification).
- xi: pattern, case, input, instance, ...
- X: domain (where the values of xi are taken from)
- yi: label, target, output
In order to map xi values to yi values, we need a notion of similarity in X and in the label set. For yi this is trivial; for X it is not.

Similarity Measure
A similarity measure is a function k that, given two patterns x and x', returns a real number characterizing their similarity. Such a k is called a kernel.

Simple Example of a Similarity Measure: the Dot Product
The dot product of vectors is a natural similarity measure. However, the patterns xi are not yet in the form of vectors; they do not live in a dot product space and can be any kind of object. To use the dot product as a similarity measure, we transform the patterns into vectors in a dot product space H via a mapping Φ. (A small code sketch of such a feature map appears at the end of this part.) The transformation has three benefits:
- it defines a similarity measure from the dot product in H, k(x, x') = Φ(x)·Φ(x');
- it lets us deal with the patterns geometrically, so we can apply linear algebra and analytic geometry;
- the freedom to choose the mapping Φ enables a large variety of similarity measures and learning algorithms, and lets us change the representation into one that is more suitable for the given problem.

Simple Pattern Recognition Algorithm
Basic idea: assign a previously unseen pattern to the class with the closer mean. The means of the two classes (+, -) are

  c+ = (1/m+) Σ{i: yi = +1} xi,   c- = (1/m-) Σ{i: yi = -1} xi.

When c is the midpoint of c+ and c-, a new pattern x is classified into the class with the closer mean.

Decision Function
Substituting the class means into the midpoint rule gives

  f(x) = sgn( (1/m+) Σ{i: yi = +1} x·xi - (1/m-) Σ{i: yi = -1} x·xi + b ),   b = (1/2)( ‖c-‖² - ‖c+‖² ).

Parzen Windows Estimators
Conditions for the resulting decision function to be the Bayes classifier:
- the class means have the same distance to the origin, so b = 0;
- k is a probability density function.
The new sample is then labeled according to which of p+(x) and p-(x) is larger.

Generalization
The decision function can be generalized to an arbitrary kernel and arbitrary weights:

  f(x) = sgn( Σi αi yi k(x, xi) + b ).

(This generalized rule is sketched in code after this part.)

To Make the Classification Technique More Sophisticated
There are two ways to make the technique more sophisticated:
- Selection of the patterns on which the kernels are centered: remove the influence of patterns that are very far away from the decision boundary, because we expect that they will not improve the generalization error of the decision function, or in order to reduce the computational cost of evaluating the decision function.
- Choice of the weights αi that are placed on the individual kernels in the decision function: above, the weights are only 1/m+ or 1/m-; a greater variety of weights is possible.

Some Insights from Statistical Learning Theory
- Some exceptions ("outliers") are allowed; the boundary is ambiguous.
- An almost linear separation of the classes misclassifies the two outliers, and also other "easy" points which are so close to the decision boundary that the classifier really should be able to get them right.
- A compromise gets most points right, without putting too much trust in any individual point.

More into Statistical Learning Theory
We now put the above intuitive arguments into a mathematical framework.
- Assumption: the data are generated independently from a probability distribution P(x, y), i.e. they are iid (independent and identically distributed).
- Goal: find a function f that will correctly classify unseen examples (x, y).
- Measurement of correctness: the zero-one loss function C(x, y, f(x)) := 0.5·|f(x) - y|.
Without a restriction on the set of functions from which we choose our estimate f, the estimate might not generalize well.
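To make the feature-map idea concrete, here is a minimal Python sketch, not taken from the slides: the patterns are strings, which have no dot product of their own, and a letter-count map (our own choice of Φ) embeds them into a dot product space so that the dot product becomes a similarity measure.

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def phi(x: str) -> np.ndarray:
    """Feature map: embed an arbitrary object (a string) into R^26."""
    return np.array([x.count(c) for c in ALPHABET], dtype=float)

def k(x: str, x_prime: str) -> float:
    """Kernel: similarity of two patterns via the dot product in H."""
    return float(phi(x) @ phi(x_prime))

print(k("kernel", "kernels"))  # 8.0 -- similar strings score high
print(k("kernel", "zzzz"))     # 0.0 -- dissimilar strings score low
```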
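The simple pattern recognition algorithm and its generalization can be sketched directly from the formulas above; the toy data and helper names below are illustrative assumptions, not the authors' code. With the plain dot product, the rule is exactly the nearest-class-mean classifier; with a Gaussian density as kernel (and b = 0) it compares the two Parzen-window estimates p+(x) and p-(x).

```python
import numpy as np

def dot_kernel(x, xp):
    return x @ xp

def gaussian_kernel(x, xp, sigma=1.0):
    d = x - xp
    return np.exp(-(d @ d) / (2 * sigma ** 2))

def decision(x, X, y, kernel, b=0.0):
    """f(x) = sgn((1/m+) sum_{yi=+1} k(x,xi) - (1/m-) sum_{yi=-1} k(x,xi) + b)."""
    s_pos = np.mean([kernel(x, xi) for xi in X[y == +1]])
    s_neg = np.mean([kernel(x, xi) for xi in X[y == -1]])
    return np.sign(s_pos - s_neg + b)

# Toy data: two Gaussian blobs around (+2, +2) and (-2, -2).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

# For the dot-product version, b = (1/2)(||c-||^2 - ||c+||^2).
c_pos, c_neg = X[y == +1].mean(axis=0), X[y == -1].mean(axis=0)
b = 0.5 * (c_neg @ c_neg - c_pos @ c_pos)

x_new = np.array([1.5, 1.0])
print(decision(x_new, X, y, dot_kernel, b))    # nearest-mean rule: +1
print(decision(x_new, X, y, gaussian_kernel))  # Parzen-window rule: +1
```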
Training Error & Test Error
Minimizing the training error (the empirical risk) does not imply a small test error (the risk). We must restrict the set of functions from which f is chosen to one that has a capacity suitable for the amount of available training data.

VC Dimension
- Each function of the class separates the patterns in a certain way. With labels in {+1, -1}, there are at most 2^m different labelings of m patterns.
- "Shatter": a function class shatters m points when it can realize all 2^m separations of them.
- VC dimension: the largest m such that there exists a set of m points which the class can shatter, and infinity if no such m exists. It is a one-number summary of a learning machine's capacity. (A small shattering demonstration in code appears after this part.)

Example of a VC Bound
If h < m is the VC dimension of the class of functions that the learning machine can implement, then for all functions of that class, independent of the underlying probability distribution generating the data, with probability at least 1 - δ,

  R[f] ≤ R_emp[f] + φ(h, m, δ),   φ(h, m, δ) = sqrt( ( h(ln(2m/h) + 1) + ln(4/δ) ) / m ).

To reproduce a random labeling by correctly separating all training examples, a machine requires a large VC dimension h, so φ(h, m, δ) will be large: a small training error does not guarantee a small test error. To get nontrivial predictions from the bound, the function class must be restricted so that its capacity is small enough, yet the class should be large enough to provide functions that can model the dependencies hidden in P(x, y). The choice of the set of functions is therefore crucial for learning from data.

Kernel Machine Classifier: the Hyperplane Classifier
We have a set of points; each point belongs to class +1 or class -1, and the points are linearly separable.

[Figure sequence: a point set; a ball grows around each point; at first several separating hyperplanes exist; with bigger balls, fewer hyperplanes remain; finally a single hyperplane is left, and the points whose balls touch it are the support vectors.]

Why Maximum Margin?
- Generalization capability increases with increasing margin (we skip the proof of this statement).
- The problem can be solved using quadratic programming techniques, which are quite efficient, and a single global optimum exists. This was a turning point for choosing the Support Vector Machine instead of a neural network as a tool.

How to Get the Hyperplane with Maximum Margin: Formulation
The maximum-margin hyperplane is found by solving (sketched in code after this part)

  minimize (1/2)‖w‖²   subject to   yi( w·xi + b ) ≥ 1 for all i,

with decision function f(x) = sgn( w·x + b ).

What If the Points Are Not Linearly Separable?
We can map the points into some higher-dimensional space using a nonlinear transformation, so that the points become linearly separable in the higher dimension.

How to Avoid the Computation of the Map into the Higher Dimension
We always use only the dot product of the input vectors. Let Φ(x) be the function that maps an input vector to the higher-dimensional space. If we can compute k(xi, xj) = Φ(xi)·Φ(xj) without calculating Φ(xi) and Φ(xj) individually, then we save the time needed to map the input vectors into the higher dimension, and at the same time we can use the previous formulation in the input space with a nonlinear decision boundary. This k(x, y) is called the kernel function. (See the numerical check after this part.)

Formulation Using the Kernel Function
Replacing every dot product by the kernel gives the dual problem

  maximize   Σi αi - (1/2) Σi Σj αi αj yi yj k(xi, xj)
  subject to αi ≥ 0  and  Σi αi yi = 0,

with decision function f(x) = sgn( Σi αi yi k(x, xi) + b ).
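To make "shattering" concrete, here is a small sketch, assuming NumPy and SciPy are available; the point sets are our own examples. Linear classifiers sgn(w·x + b) in the plane have VC dimension 3: they realize all 2^3 labelings of 3 points in general position, but never all 2^4 labelings of 4 points. Separability of a single labeling is tested as a linear-programming feasibility problem: find w, b with yi(w·xi + b) ≥ 1 for all i.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def linearly_separable(X, y):
    """Feasibility LP: does some (w1, w2, b) satisfy yi*(w.xi + b) >= 1?"""
    # Rewrite as -yi*(w.xi + b) <= -1, with variables (w1, w2, b).
    A = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    res = linprog(c=[0, 0, 0], A_ub=A, b_ub=-np.ones(len(X)),
                  bounds=[(None, None)] * 3)
    return res.success

def shattered(X):
    """True if lines realize all 2^m labelings of the m points in X."""
    return all(linearly_separable(X, np.array(labels))
               for labels in product([-1, 1], repeat=len(X)))

three = np.array([[0, 0], [1, 0], [0, 1]])
four = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(shattered(three))  # True:  3 points in general position are shattered
print(shattered(four))   # False: the XOR labeling of 4 points is not separable
```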
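For the maximum-margin problem itself, here is a hedged sketch using scikit-learn's SVC, which solves the quadratic program internally, rather than a hand-written solver; the toy blobs and the large-C hard-margin approximation are our own illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated blobs, so a hard margin exists.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.array([+1] * 20 + [-1] * 20)

# A very large C approximates: minimize (1/2)||w||^2  s.t.  yi(w.xi + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("support vectors:\n", clf.support_vectors_)  # points on the margin
print("margin width:", 2 / np.linalg.norm(w))      # geometric margin 2/||w||
print("prediction:", clf.predict([[1.0, 1.5]]))    # classify a new point
```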
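The kernel trick can also be checked numerically. A minimal sketch with our own example vectors: for x in R², the degree-2 homogeneous polynomial kernel k(x, y) = (x·y)² equals the dot product of the explicit feature maps Φ(x) = (x1², x2², √2·x1·x2) in R³, so the kernel computes a dot product in the higher-dimensional space without ever forming Φ(x).

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map into R^3."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def k(x, y):
    """Same dot product, computed entirely in the input space."""
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
print(phi(x) @ phi(y))  # 1.0 -- explicit map in R^3
print(k(x, y))          # 1.0 -- kernel trick, no mapping needed
```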
Some Applications: Text Categorization
Why is it needed? As the volume of electronic information increases, there is growing interest in developing tools to help people better find, filter, and manage these resources.
What is it? The assignment of natural-language texts to one or more predefined categories based on their contents. The ingredients:
- Representing text: a bag of the words in the document.
- Feature selection: necessary because the feature dimension is otherwise too big.
- Machine learning.
Extended applications: patent classification, spam-mail filtering, categorization of Web pages, automatic essay grading(?), ...

Text Categorization with SVM
Conventional learning methods: Naïve Bayes classifier, Rocchio algorithm, decision tree classifier, k-nearest neighbors.
Experiments, test collections:
- Reuters-21578 dataset: 9603 training documents, 3299 test documents, 90 categories, 9947 distinct terms; direct correspondence between documents and categories, a single category per document.
- Ohsumed corpus: 10000 training documents, 10000 test documents, 23 MeSH "diseases" categories, 15561 distinct terms; less direct correspondence, multiple categories per document.
Results: almost all SVMs perform better than the conventional methods, independent of the choice of parameters, and show no overfitting. On the 90 Reuters categories, the SVM is better than k-NN on 62 categories (with 20 ties), which is a significant improvement; according to the binomial sign test, the best SVM outperforms k-NN on all 23 Ohsumed categories.

Why Should SVMs Work Well for Text Categorization?
- High-dimensional input space: SVMs use overfitting protection which does not necessarily depend on the number of features.
- Few irrelevant features: even the features ranked lowest still contain considerable information and are somewhat relevant, so a good classifier should combine many features.
- Document vectors are sparse: results in the mistake-bound model show that additive algorithms, which have an inductive bias similar to that of SVMs, are well suited for problems with dense concepts and sparse instances.
- Most text categorization problems are linearly separable, and the idea of SVMs is to find such linear (or polynomial, RBF, etc.) separators.
(A minimal bag-of-words pipeline of this kind is sketched in code after this part.)

TREC11: Kernel Methods for Document Filtering (MIT)
Rankings: 1st in adaptive filtering (T11F/U-assessor), 1st in batch filtering (T11F/U-assessor/intersection), 1st in routing (assessor/intersection).
Features: the words in the documents, filtering out digits and words occurring fewer than two times; the title is given double weight.
Various kernel methods were applied: a second-order perceptron, an SVM with uneven margin, and an SVM with a new threshold-selection method.
Conclusion: good rankings except on the intersection topics; more complex kernels gave poorer results; performance varied by category.
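Here is a minimal bag-of-words pipeline in the spirit of the experiments above, assuming scikit-learn is available; the four-document corpus and the two category names are invented for illustration and are not from Reuters-21578 or Ohsumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_docs = [
    "wheat corn harvest grain exports",
    "grain shipments wheat prices rise",
    "bank interest rates lending profits",
    "central bank raises interest rates",
]
train_labels = ["grain", "grain", "finance", "finance"]

# tf-idf bag-of-words vectors are exactly the sparse, high-dimensional
# representation that the slides argue linear SVMs handle well.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_docs, train_labels)

print(clf.predict(["wheat grain prices"]))       # expected: ['grain']
print(clf.predict(["bank lending profits up"]))  # expected: ['finance']
```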
Face Detection
We can define the face-detection problem as follows:
1. Given as input an arbitrary image, which could be a digitized video signal or a scanned photograph,
2. determine whether there are any human faces in the image,
3. and if there are, return an encoding of their location.
4. The encoding in this system is to fit each face in a bounding box defined by the image coordinates of the corners.
The technique can be extended to many applications: face recognition, HCI, surveillance systems, ...

Applying SVMs to Face Detection
Overview of the overall process:
- Training on a database of face and non-face patterns (of fixed size) using an SVM.
- Testing candidate image locations for local patterns that appear like faces, using a classification procedure that determines whether a given local image pattern is a face.
This turns the face-detection problem into a classification problem: faces or non-faces.

The SVM Face-Detection System
1. Rescale the input image several times.
2. Cut 19x19 window patterns out of the scaled image.
3. Preprocess the window using masking, light correction and histogram equalization.
4. Classify the pattern using the SVM.
5. If the class corresponds to a face, draw a rectangle around the face in the output image.
(A code sketch of this pipeline follows the summary below.)

Experimental Results on Static Images
- Set A: 313 high-quality images, with the same number of faces.
- Set B: 23 mixed-quality images, with a total of 155 faces.

Extension to a Real-Time System
An example is the skin-detection module implemented using SVMs in a PC-based color real-time face-detection system.

Summary
Single-layer neural networks have a simple and efficient learning algorithm, but very limited expressive power. Multilayer networks, on the other hand, are much more expressive, but are very hard to train. The kernel machine overcomes this problem: it can be trained very easily and, at the same time, it can represent complex nonlinear functions. Kernel machines are very effective in handwriting recognition, text categorization, and face recognition.
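Finally, a hedged code sketch of the five-step sliding-window pipeline from the face-detection section. Everything here is a simplified stand-in: svm_classify is a hypothetical placeholder for a trained face/non-face SVM, preprocessing is reduced to mean-variance normalization instead of masking, light correction and histogram equalization, and rescaling is nearest-neighbour to keep the sketch dependency-free.

```python
import numpy as np

WINDOW = 19  # the system classifies 19x19 window patterns

def preprocess(window):
    """Stand-in for masking / light correction / histogram equalization."""
    w = window.astype(float)
    return (w - w.mean()) / (w.std() + 1e-8)

def svm_classify(features):
    """Placeholder: a real system calls the trained face/non-face SVM here."""
    return False

def detect_faces(image, scales=(1.0, 0.8, 0.64), step=4):
    boxes = []
    for s in scales:                      # 1. rescale the image several times
        h, w = int(image.shape[0] * s), int(image.shape[1] * s)
        rows = (np.arange(h) / s).astype(int)
        cols = (np.arange(w) / s).astype(int)
        scaled = image[rows][:, cols]     # nearest-neighbour rescaling
        for i in range(0, h - WINDOW + 1, step):
            for j in range(0, w - WINDOW + 1, step):
                patch = scaled[i:i + WINDOW, j:j + WINDOW]  # 2. cut windows
                feats = preprocess(patch).ravel()           # 3. preprocess
                if svm_classify(feats):                     # 4. classify
                    # 5. record a box, in original-image coordinates
                    boxes.append((int(i / s), int(j / s), int(WINDOW / s)))
    return boxes

image = np.zeros((60, 80), dtype=np.uint8)  # dummy grayscale input
print(detect_faces(image))                  # [] with the placeholder SVM
```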
