Lecture 4: Homework Discussion,
and more on Classification
CS 175, Fall 2007
Padhraic Smyth
Department of Computer Science
University of California, Irvine
Outline
• Discussion of Assignment 1
• Classification revisited
• Discussion of Assignment 2
– Due Wednesday (tomorrow) at noon
CS 175, Fall 2007: Professor Padhraic Smyth Slide Set 4: Discussion, Classification 2
Grading of Assignment 1
• 40 points total
– Each MATLAB function = 10 points
– euclidean.m, nearest_neighbor.m, maxvalue.m
• functioning correctly on the test cases: +6 points
• comments: +2 points
• error-checking: +2 points
– test case example:
• x: random vector of length 100
• A: random matrix with 100 rows and 100 columns
Comments on Grading
• Common mistakes
– incorrect definition of Euclidean distance
• $d_E(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
– no error-checking
• nearest_neighbor(x, A) => check that
– cols(A) = cols(x)
– rows(x) = 1
– no comments
• no comments in header
• no comments in body of nearest_neighbor.m
• If you find any errors in the grading of your assignment please
see Nathan during lab hours (or email him to make an
appointment)
– no “grade negotiating”!
Suggestions
• Improve the performance by vectorization
– can speed up your code significantly
– e.g., calculate the vector distance in the euclidean.m function
• Do not output input / intermediate / output variables to screen
– can increase your run-time significantly
– use a semicolon at the end of each line
• Helpful commands:
– To learn more about the function:
• help
– To find a built-in function:
• lookfor
Suggestions (2)
• Test your code (!)
– Some .m functions that were submitted did not run
– 2 types of errors:
• Simple syntax errors (understandable)
• Systematic errors
– Incorrect calculations (e.g., for euclidean.m)
– Incorrect logic in finding the minimum vector
– Sloppy assignment of variables to values
– How to address this:
• Define a set of simple test cases
• Run your code and compare with manual calculation
• Check that the results make intuitive sense
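As a minimal sketch of the advice above, the script below compares a distance calculation against values worked out by hand. The anonymous function dist_fn is a stand-in for your own euclidean.m (an assumption made only so that the script runs on its own), and the test vectors are made up for illustration.

```matlab
% Stand-in for euclidean.m (hypothetical; replace with a call to your own function)
dist_fn = @(x, y) sqrt((x - y) * (x - y)');

% Test case 1: a 3-4-5 right triangle, so the distance should be 5
d1 = dist_fn([0 0], [3 4]);

% Test case 2: identical vectors, so the distance should be 0
d2 = dist_fn([1 2 3], [1 2 3]);

% Test case 3: the distance should be symmetric in its arguments
d3a = dist_fn([1 5 2], [4 1 2]);
d3b = dist_fn([4 1 2], [1 5 2]);

% Compare with the manual calculations, allowing for floating-point round-off
tol = 1e-12;
ok = abs(d1 - 5) < tol && d2 == 0 && abs(d3a - d3b) < tol;
fprintf('all tests passed: %d\n', ok);
```

Small cases like these are easy to verify by hand, which is the point: if one fails, the error is in the code rather than in the expected value.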
Example of a euclidean.m function
function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
% A. Student, CS 175
% Inputs:
% x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
% dist: the Euclidean distance between x and y
% error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end
if (xr ~= 1 || yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end
% calculate a vector of component_by_component distances
delta = x - y;
% now calculate the Euclidean distance
% (note the use of vectorization: delta*delta' sums the squared components)
dist = sqrt(delta*delta');
Min.m function in MATLAB
» help min
MIN Smallest component.
For vectors, MIN(X) is the smallest element in X. For matrices,
MIN(X) is a row vector containing the minimum element from each
column. For N-D arrays, MIN(X) operates along the first
non-singleton dimension.
[Y,I] = MIN(X) returns the indices of the minimum values in vector I.
If the values along the first non-singleton dimension contain more
than one minimal element, the index of the first one is returned.
MIN(X,Y) returns an array the same size as X and Y with the
smallest elements taken from X or Y. Either one can be a scalar.
[Y,I] = MIN(X,[],DIM) operates along the dimension DIM.
When complex, the magnitude MIN(ABS(X)) is used. NaN's are ignored
when computing the minimum.
Example: If X = [2 8 4; 7 3 9] then min(X,[],1) is [2 3 4],
min(X,[],2) is [2; 3], and min(X,5) is [2 5 4; 5 3 5].
See also MAX, MEDIAN, MEAN, SORT.
Example of a maxvalue.m function
function [maxvalue, rmax, cmax] = maxvalue(A);
% function [maxvalue, rmax, cmax] = maxvalue(A);
% brief description of function here
% Your Name, CS 175
%
% Inputs
% A: a matrix of size r x c, with r rows and c columns
%
% Outputs
% maxvalue: largest entry in A
% rmax, cmax: integers specifying the (row,column)
% location of the max value
% Get a row vector containing the maximum value within each column
% Store idx_row - a vector containing the location of
% the maximum within the column
[mx_row, idx_row] = max(A);
% find the maximum within this vector
[maxvalue, cmax] = max(mx_row);
% Use the idx_row to find the row location of the max
rmax = idx_row(cmax);
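One way to test a function like this is to cross-check it against a second method. The sketch below repeats the column-wise approach on a made-up matrix and compares it with a search over all entries using linear indexing and ind2sub; the matrix and the variable names ending in 2 are illustrative assumptions.

```matlab
A = [3 9 2; 7 1 8];            % made-up 2 x 3 test matrix

% Approach from the slide: column-wise maxima, then the max of those
[mx_row, idx_row] = max(A);
[maxval, cmax] = max(mx_row);
rmax = idx_row(cmax);

% Cross-check: max over all entries via linear indexing,
% then convert the linear index to a (row, column) pair
[maxval2, lin_idx] = max(A(:));
[rmax2, cmax2] = ind2sub(size(A), lin_idx);

fprintf('max %g at (%d,%d); cross-check %g at (%d,%d)\n', ...
        maxval, rmax, cmax, maxval2, rmax2, cmax2);
```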
Example of a nearest_neighbor.m function
function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A
% that is closest in Euclidean distance to row vector x.
% A. Student, CS 175
%
% Inputs:
% x: a vector of numbers of size 1 x n
% A: k vectors of size 1 x n, "stacked" in a k x n matrix
%
% Outputs:
% y: the closest vector in A to x (of size 1 x n)
% i: the integer (row) index of y in A
% d: the Euclidean distance between x and y
% error checking
[xr, xc] = size(x);
[Ar, Ac] = size(A);
if (xc ~= Ac)
error('input vector x and matrix A must have the same number of columns');
end
if (xr ~= 1)
error('input vector x must be a row vector');
end
“For loop” version of nearest_neighbor.m function
function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
..
..
[xr, xc] = size(x);
[Ar, Ac] = size(A);
..
% "for loop" version of code
distances = zeros(Ar,1); % preallocate storage for distances
for j=1:Ar % loop over rows in A
y = A(j,:);
distances(j) = euclidean(x,y);
end
% find the minimum distance and its location
[d i] = min(distances);
% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
Repmat.m function in MATLAB
» help repmat
REPMAT Replicate and tile an array.
B = REPMAT(A,M,N) replicates and tiles the matrix A to produce the
M-by-N block matrix B.
B = REPMAT(A,[M N]) produces the same thing.
B = REPMAT(A,[M N P ...]) tiles the array A to produce a
M-by-N-by-P-by-... block array. A can be N-D.
REPMAT(A,M,N) when A is a scalar is commonly used to produce
an M-by-N matrix filled with A's value. This can be much faster
than A*ONES(M,N) when M and/or N are large.
Example:
repmat(magic(2),2,3)
repmat(NaN,2,3)
See also MESHGRID.
» repmat([1 2], 3, 1)
ans =
1 2
1 2
1 2
Vectorized version of nearest_neighbor.m function
function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
..
..
% VECTORIZED VERSION OF THE CODE
% create a matrix of size Ar x xc, where each row consists of x
xmatrix = repmat(x,Ar,1);
% subtract the components of xmatrix and A, by matrix subtraction
delta = xmatrix - A;
% now square the differences, by component multiplication
squaredelta = delta.*delta;
% sum up the squared differences along each row (dimension 2); unlike
% sum(squaredelta')', this also works when the vectors have only 1 component
distances = sqrt(sum(squaredelta, 2));
% find the minimum distance and its location (as before)
[d i] = min(distances);
% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
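A quick way to gain confidence in a vectorized rewrite is to run it next to the loop version on the same small input. The script below is a sketch under that idea; the vector x and matrix A are made up, and the distance is computed inline so that the script does not depend on euclidean.m.

```matlab
x = [1 2];                       % made-up query vector (1 x n)
A = [4 6; 1 3; 0 0; 2 2];        % made-up data, one row vector per row (k x n)
[Ar, Ac] = size(A);

% "for loop" version
dist_loop = zeros(Ar, 1);
for j = 1:Ar
    delta_j = x - A(j, :);
    dist_loop(j) = sqrt(delta_j * delta_j');
end

% vectorized version
delta = repmat(x, Ar, 1) - A;
dist_vec = sqrt(sum(delta .* delta, 2));

% both versions should agree and pick the same nearest row;
% rows 2 and 4 are tied at distance 1, and min returns the first one
[d_loop, i_loop] = min(dist_loop);
[d_vec, i_vec] = min(dist_vec);
y = A(i_vec, :);
```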
Nearest-Neighbor Classification (revisited)
Example of Data from 2 Classes
[Figure: scatter plot of two-class data in a two-dimensional feature space; axes: Feature 1, Feature 2]
Classifiers and Decision Boundaries
• What is a Classifier?
– A classifier is a mapping from feature space (a d-dimensional
vector) to the class labels {1, 2, … m}
– Thus, a classifier partitions the feature space into m decision
regions
– The line or surface separating any 2 classes is the decision
boundary
2-Class Data with a Linear Decision Boundary
[Figure: two-class data in a two-dimensional feature space (axes: Feature 1, Feature 2), with a linear decision boundary separating Decision Region 1 from Decision Region 2]
Classification Problem with Overlap
[Figure: two-class data with overlapping classes; axes: Feature 1, Feature 2]
[Figure: the data with the minimum error decision boundary drawn in; axes: Feature 1, Feature 2]
Classifiers = functions or mappings
[Diagram: feature values a, b, d, ..., z (which are known, measured) are input to the classifier, which outputs a predicted class value c (the true class is unknown to the classifier)]
We want a mapping or function which takes any combination of
values x = (a, b, d, ..., z) and will produce a prediction c,
i.e., a function c = f(a, b, d, ..., z) which produces a value c=1, c=2, ..., c=m
The problem is that we don’t know this mapping: we have to learn it from data!
Classification Accuracy
• Say we have N feature vectors
• Say we know the true class label for each feature vector
• We can measure how accurate a classifier is by how many
feature vectors it classifies correctly
• Accuracy = percentage of feature vectors correctly classified
– training accuracy = accuracy on training data
– test accuracy = accuracy on new data not used in training
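As a small illustration of this definition, accuracy can be computed by comparing a vector of predictions with the vector of true labels. Both vectors below are made up for the example.

```matlab
true_labels = [1 1 2 2 1 2 2 1]';   % made-up true class labels
predicted   = [1 2 2 2 1 1 2 1]';   % made-up classifier predictions

% count the feature vectors whose predicted label matches the true label
num_correct = sum(predicted == true_labels);
accuracy = 100 * num_correct / length(true_labels);
fprintf('accuracy = %.1f%%\n', accuracy);
```

The same calculation gives training accuracy or test accuracy, depending on which data the predictions were made on.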
Some Notation
• Training Data
– Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }
– N pairs of feature vectors and class labels
• Feature Vectors and Class Labels:
– x(i) is the ith training data feature vector
– in MATLAB this could be the ith row of an N x d matrix
– c(i) is the class label of the ith feature vector
– in general, c(i) can take m different class values, e.g., c = 1, c =
2, ...
– Let y be a new feature vector whose class label we do not know,
i.e., we wish to classify it.
Example
[Figure: training points labeled 1 or 2 in a two-dimensional feature space; axes: Feature 1, Feature 2]
kNN Decision Boundary (k=1)
[Figure: training points labeled 1 or 2 with the k=1 decision boundary drawn in; axes: Feature 1, Feature 2]
In general: a nearest-neighbor classifier produces piecewise linear decision boundaries
K-Nearest Neighbor (kNN) Classifier
• Find the k-nearest neighbors to y in Dtrain
– i.e., rank the feature vectors according to Euclidean distance
– select the k vectors that have the smallest distance to y
• Classification
– ranking yields k feature vectors and a set of k class labels
– pick the class label which is most common in this set (“vote”)
– classify y as belonging to this class
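The two steps above (rank by distance, then vote) can be sketched for a single new vector y as follows. This is an illustrative sketch rather than the assignment solution: the training data, labels, and value of k are made up, and ties in the vote are resolved by mode, which returns the smallest of the tied labels.

```matlab
% made-up training data: 6 feature vectors in 2 dimensions, with class labels
traindata   = [0 0; 1 0; 0 1; 5 5; 6 5; 5 6];
trainlabels = [1; 1; 1; 2; 2; 2];
y = [4 4];      % new feature vector to classify
k = 3;

% Euclidean distance from y to every training vector
N = size(traindata, 1);
delta = traindata - repmat(y, N, 1);
distances = sqrt(sum(delta .* delta, 2));

% rank by distance and keep the labels of the k nearest neighbors
[sorted_dist, order] = sort(distances);
nearest_labels = trainlabels(order(1:k));

% majority vote: the most common label among the k neighbors
predicted_class = mode(nearest_labels);
fprintf('predicted class: %d\n', predicted_class);
```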
K-Nearest Neighbor (kNN) Classifier
• Notes:
– In effect, the classifier uses the nearest k feature vectors from
Dtrain to “vote” on the class label for y
– the single-nearest neighbor classifier is the special case of k=1
– for two-class problems, if we choose k to be odd (i.e., k=1, 3, 5,…)
then there will never be any “ties”
– “training” is trivial for the kNN classifier, i.e., we just use Dtrain as
a “lookup table” when we want to classify a new feature vector
• Extensions of the Nearest Neighbor classifier
– weighted distances
• e.g., if some of the features are more important
• e.g., if some of the features are irrelevant
– fast search techniques (indexing) to find k-nearest neighbors in d-
space
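One possible form of a weighted distance, shown here only as a sketch, multiplies each squared component difference by a non-negative weight before summing. The weight vector w and the data are assumptions for illustration; a weight of 0 removes a feature from the distance entirely.

```matlab
x = [1 2 10];
y = [2 4 50];
w = [1 1 0];    % hypothetical weights: the third feature is treated as irrelevant

delta = x - y;
d_unweighted = sqrt(sum(delta .^ 2));         % ordinary Euclidean distance
d_weighted   = sqrt(sum(w .* delta .^ 2));    % weighted version
fprintf('unweighted: %.3f, weighted: %.3f\n', d_unweighted, d_weighted);
```

With these numbers the third feature dominates the unweighted distance, while the weighted distance depends only on the first two features.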
Assignment 2
• Due Wednesday…..
• 4 parts
1. Plot classification data in two-dimensions
2. Implement a nearest-neighbor classifier
3. Plot the errors of a k-nearest-neighbor classifier
4. Test the effect of the value k on the accuracy of the classifier
Data Structure
simdata1 =
shortname: 'Simulated Data 1'
numfeatures: 2
classnames: [2x6 char]
numclasses: 2
description: [1x66 char]
features: [200x2 double]
classlabels: [200x1 double]
Plotting Function
function classplot(data, x, y);
% function classplot(data, x, y);
%
% brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% data: (a structure with the same fields as described above:
% your comment header should describe the structure explicitly)
% Note that if you are only using certain fields in the structure
% in the function below, you need only define these fields in the input comments
-------- Your code goes here -------
[Figure: first simulated data set, simdata1]
[Figure: second simulated data set, simdata2]
Nearest Neighbor Classifier
function [class_predictions] = knn(traindata,trainlabels,k, testdata)
% function [class_predictions] = knn(traindata,trainlabels,k, testdata)
%
% a brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of classlabels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
%
% Outputs
% class_predictions: N2 x 1 vector of predicted class values
-------- Your code goes here -------
Plotting k-NN Errors
function knn_plot(traindata,trainlabels,k,testdata,testlabels);
% function knn_plot(traindata,trainlabels,k,testdata,testlabels);
%
% Predicts class-labels for the data in testdata using the k nearest
% neighbors in traindata, and then plots the data (using the first
% 2 dimensions or first 2 features), displaying the data from each
% class in different colors, and overlaying circles on the points
% that were incorrectly classified.
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of classlabels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: an N2 x 1 vector of classlabels for testdata
Accuracy of kNN Classifier as k is varied
function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
% function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
%
% a brief description of what the function does
% ......
% Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: an N1 x 1 vector of classlabels for traindata
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: an N2 x 1 vector of classlabels for testdata
% kmax: an odd positive integer indicating the maximum number of neighbors
% plotflag: (optional argument) if 1, the error-rates versus k is plotted,
% otherwise no plot.
%
% Outputs
% errors: r x 1 vector of error-rates on testdata, where r is the
% number of values of k that are tested.
-------- Your code goes here -------
Training Data and Test Data
• Training data
– labeled data used to build a classifier
• Test data
– new data, not used in the training process, to evaluate how well a
classifier does on new data
• Memorization versus Generalization
– better training_accuracy
• “memorizing” the training data:
– better test_accuracy
• “generalizing” to new data
– in general, we would like our classifier to perform well on new test
data, not just on training data,
• i.e., we would like it to generalize well to new data
• Test accuracy is more important than training accuracy
Test Accuracy and Generalization
• The accuracy of our classifier on new unseen data is a
fair/honest assessment of the performance of our classifier
• Why is training accuracy not good enough?
– Training accuracy is optimistic
– a classifier like nearest-neighbor can construct boundaries which
always separate all training data points, but which do not separate
new points
• e.g., what is the training accuracy of kNN, k = 1?
– A flexible classifier can “overfit” the training data
• in effect it just memorizes the training data, but does not learn
the general relationship between x and c
• Generalization
– We are really interested in how our classifier generalizes to new
data
– test data accuracy is a good estimate of generalization
performance
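The question about kNN with k = 1 can be explored with a short experiment. In the sketch below (made-up data, with all feature vectors distinct), each training point is classified using the training set itself: its nearest neighbor is then the point itself, at distance 0, so the training accuracy comes out at 100% even though the two classes overlap.

```matlab
% made-up training set: distinct feature vectors, overlapping classes
traindata   = [1 1; 2 1; 1.5 1.2; 2 2; 1.2 1.8; 1.8 1.1];
trainlabels = [1; 2; 1; 2; 2; 1];
N = size(traindata, 1);

predictions = zeros(N, 1);
for j = 1:N
    % distance from training point j to every training point (including itself)
    delta = traindata - repmat(traindata(j, :), N, 1);
    distances = sqrt(sum(delta .* delta, 2));
    [dmin, nearest] = min(distances);   % the nearest neighbor is point j itself
    predictions(j) = trainlabels(nearest);
end
training_accuracy = 100 * mean(predictions == trainlabels);
fprintf('training accuracy with k = 1: %.0f%%\n', training_accuracy);
```

This number says nothing about how the classifier would do on new points, which is why test accuracy is needed.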
Another Example
[Figure: two-class data in a two-dimensional feature space (axes: Feature 1, Feature 2), with a decision boundary between Decision Region 1 and Decision Region 2]
A More Complex Decision Boundary
[Figure: two-class data in a two-dimensional feature space (axes: Feature 1, Feature 2), with a more complex decision boundary between Decision Region 1 and Decision Region 2]
Example: The Overfitting Phenomenon
[Figure: data points plotted as Y versus X]
A Complex Model
Y = high-order polynomial in X
[Figure: the complex model plotted with the data; axes: X, Y]
The True (simpler) Model
Y = a X + b + noise
[Figure: the simpler model plotted with the data; axes: X, Y]
How Overfitting affects Prediction
[Figure: predictive error versus model complexity, with one curve for error on training data and one for error on test data; the low-complexity side is labeled underfitting, the high-complexity side overfitting, and the ideal range for model complexity is marked between them]
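The effect can be explored numerically with polyfit and polyval. The script below is a sketch under stated assumptions: the data come from a made-up linear model, the "noise" is a deterministic stand-in (so the numbers are the same on every run), and the polynomial degrees are arbitrary choices. Training error can only go down as the degree increases, whereas test error need not.

```matlab
% made-up data from a linear model plus a deterministic "noise" term
a = 2; b = 1;
x_train = linspace(0, 1, 10);
x_test  = linspace(0.05, 0.95, 10);
y_train = a * x_train + b + 0.2 * cos(50 * (1:10));
y_test  = a * x_test  + b + 0.2 * sin(70 * (1:10));

degrees = [1 3 7];                     % model complexity = polynomial degree
train_err = zeros(size(degrees));
test_err  = zeros(size(degrees));
for m = 1:length(degrees)
    p = polyfit(x_train, y_train, degrees(m));                  % fit on training data only
    train_err(m) = mean((polyval(p, x_train) - y_train) .^ 2);  % mean squared error
    test_err(m)  = mean((polyval(p, x_test)  - y_test)  .^ 2);
end

% rows: degree, training error, test error
disp([degrees; train_err; test_err]);
```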
Linear Classifiers
Decision Boundaries
• What is a Classifier?
– A classifier is a mapping from feature space (a d-dimensional
vector) to the class labels {1, 2, … m}
– Thus, a classifier partitions the feature space into m decision
regions
– A line or curve separating the classes is a decision boundary
• in more than 2 dimensions this is a surface (e.g., a
hyperplane)
• Linear Classifiers
– a linear classifier is a mapping which partitions feature space using
a linear function (a straight line, or a hyperplane)
– it is one of the simplest classifiers we can imagine
• “separate the two classes using a straight line in feature space”
– in 2 dimensions the decision boundary is a straight line
2-Class Data with a Linear Decision Boundary
[Figure: two-class data in a two-dimensional feature space (axes: Feature 1, Feature 2), with a linear decision boundary separating Decision Region 1 from Decision Region 2]
Non-Linearly Separable Data, with Decision Boundary
[Figure: two-class data in a two-dimensional feature space (axes: Feature 1, Feature 2), with a decision boundary between Decision Region 1 and Decision Region 2]
Convex Hull of a Set of Points
• Convex Hull of a set Q of points:
– Intuitively
• think of each point in Q as a nail sticking out from a 2d board
• the convex hull = the shape formed by a tight rubber band
that surrounds all the nails
– Formally: “the convex hull is the smallest convex polygon P for
which each point in Q is either on the boundary of P or in its
interior”
• (p.898, Cormen, Leiserson, and Rivest, Introduction to Algorithms)
• can be found (for n points) in O(n log n) time
• Relation to Class Overlap
– define convex hulls of data points D1 and D2 as P1 and P2
– If P1 and P2 do not intersect => D1 and D2 are linearly separable
– if P1 and P2 intersect, then we have overlap
– If P1 and P2 intersect then D1 and D2 are not linearly separable
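In MATLAB, convex hulls of 2-d points can be computed with convhull, and inpolygon tests whether points fall inside a polygon. The sketch below uses made-up data and implements only a partial check: if any point of one class lies inside (or on) the other class's hull, the hulls certainly intersect. Two hulls can also cross without either containing a data point of the other, so a complete test would check the hull edges for intersections as well.

```matlab
% made-up 2-d data for two classes (one point per row)
D1 = [0 0; 2 0; 2 2; 0 2; 1 1];
D2 = [1.5 1.5; 4 1; 4 4; 1 4];

% convhull returns the indices of the hull vertices in order,
% with the first vertex repeated at the end to close the polygon
h1 = convhull(D1(:, 1), D1(:, 2));
h2 = convhull(D2(:, 1), D2(:, 2));
P1 = D1(h1, :);
P2 = D2(h2, :);

% partial check: does any point of one class fall inside the other's hull?
in1 = inpolygon(D2(:, 1), D2(:, 2), P1(:, 1), P1(:, 2));
in2 = inpolygon(D1(:, 1), D1(:, 2), P2(:, 1), P2(:, 2));
hulls_intersect = any(in1) || any(in2);
fprintf('hulls intersect (partial check): %d\n', hulls_intersect);
```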
Convex Hull Example
[Figure: a set of points with its convex hull P1 drawn around them; axes: Feature 1, Feature 2]
Data from 2 Classes: Linearly Separable?
[Figure: data from two classes with their convex hulls P1 and P2; axes: Feature 1, Feature 2]
The 2 hulls intersect => data from each class are not linearly separable
Different data that is linearly separable
[Figure: data from two classes with convex hulls P1 and P2 that do not intersect; axes: Feature 1, Feature 2]
Some Theory
Let N be the number of data points
Let d be the dimension of the data points
Consider N points in general position and assume each point is labeled
as belonging to class 1 or class 2
There are $2^N$ possible labelings
Let F(N, d) = the fraction of labelings of N points in d dimensions
that are linearly separable
It can be shown that:
$$F(N, d) = \begin{cases} 1 & \text{if } d > N-2 \\ \dfrac{1}{2^{N-1}} \sum_{i=0}^{d} \dfrac{(N-1)!}{(N-1-i)!\, i!} & \text{if } N > d \end{cases}$$
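The formula for F(N, d) is easy to evaluate for small cases. The sketch below does so for N = 4 points in d = 2 dimensions (values chosen for illustration), using nchoosek for the term (N-1)! / [(N-1-i)! i!].

```matlab
N = 4;      % number of points (illustrative value)
d = 2;      % dimension (illustrative value)

if d > N - 2
    % enough dimensions: every labeling is linearly separable
    F = 1;
else
    % sum of binomial coefficients C(N-1, i) for i = 0, ..., d
    total = 0;
    for i = 0:d
        total = total + nchoosek(N - 1, i);
    end
    F = total / 2^(N - 1);
end
fprintf('F(%d, %d) = %.3f\n', N, d, F);
```

Here the sum is 1 + 3 + 3 = 7, so F(4, 2) = 7/8: of the 16 possible labelings of 4 points in general position in the plane, 14 are linearly separable.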
Fraction of Labelings in d-space that are Linearly Separable
[Figure: F(N,d), the fraction of labelings that are linearly separable, plotted against N/(d+1) for d = 1, d = 10, and d = infinity]
Note that for N <= d+1, any labeling of N points in d dimensions is linearly separable (e.g., N=3, d=2 or N=50, d=100)
A Linear Classifier in 2 Dimensions
Let Feature 1 be called X
Let Feature 2 be called Y
A linear classifier is a linear function of X and Y,
i.e., it computes f(X,Y) = aX + bY + c
Here a, b, and c are the “weights” of the classifier
Define the output of the linear classifier to be
T(f) = -1, if f <= 0
T(f) = +1, if f > 0
if f(X,Y) <= 0, the classifier produces a “-1” (Decision Region 1)
if f(X,Y) > 0, the classifier produces a “+1” (Decision Region 2)
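A minimal sketch of this classifier in MATLAB, applied to several points at once, is given below. The feature values are made up; the weights a = 1, b = -1, c = 0 are the ones used in a later example slide.

```matlab
a = 1; b = -1; c = 0;          % example weights of the linear classifier
X = [2 5 3 0];                 % made-up Feature 1 values, one per point
Y = [5 2 3 1];                 % made-up Feature 2 values, one per point

f = a * X + b * Y + c;         % linear function of the features
T = ones(size(f));             % start with +1 (Decision Region 2) everywhere
T(f <= 0) = -1;                % -1 (Decision Region 1) wherever f <= 0
disp([f; T]);
```

Note that the third point lies exactly on the line X = Y, where f = 0, and is therefore assigned the output -1.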
Decision Boundaries for a 2d Linear Classifier
Depending on whether f(X,Y) is > or < 0, the features (X,Y) get
classified into class 1 or class 2
Thus, f(X,Y) = 0 must define the decision boundary between class 1 and 2
What is the equation for this decision boundary?
f(X,Y) = aX + bY + c = 0 OR Y = -(aX + c)/b (for b not equal to 0)
Thus, defining a, b, and c automatically locates the decision boundary in
X,Y space
In summary:
- a classifier defines decision boundaries between classes
- for a linear classifier, this boundary is a line or a plane
- the equation of the plane is defined by the parameters of
the classifier
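Solving aX + bY + c = 0 for Y gives Y = -(aX + c)/b whenever b is nonzero, which can be checked numerically. The weights below are made up for illustration.

```matlab
a = 1; b = 1; c = -4;            % made-up weights, with b nonzero
X = linspace(-4, 14, 5);         % a few Feature 1 values
Y = -(a * X + c) / b;            % points on the decision boundary
f = a * X + b * Y + c;           % should be 0 at every boundary point
disp([X; Y; f]);
```

Passing X and Y to plot would draw the boundary on top of a scatter plot of the data.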
An Example of a Linear Decision Boundary
[Figure: plot showing Decision Region 1 and Decision Region 2, separated by the decision boundary defined by a = 1, b = -1, c = 0]
A Better Linear Decision Boundary
[Figure: plot showing Decision Region 1 and Decision Region 2, separated by the decision boundary defined by a = 1, b = 1, c = 0]
The Perceptron Classifier (for 2 features)
[Diagram: the inputs X, Y, and a constant 1 are multiplied by the weights w1, w2, w3 and summed to give f = w1 X + w2 Y + w3 (the weighted sum of the inputs); the threshold function T(f) then produces the output in {-1, +1}, which is the class decision]
Note: weights w1, w2, w3 are the same as a, b, c in the
previous slides, i.e., f = aX + bY + c
Perceptrons
• Perceptron = a linear classifier
– The w’s are the weights (denoted as a, b,c, earlier)
• real-valued constants (can be positive or negative)
– Define an additional constant input “1” (allows an intercept in
decision boundary)
• A perceptron calculates 2 quantities:
– 1. A weighted sum of the input features
– 2. This sum is then thresholded by the T function
• A simple artificial model of human neurons
• weights = “synapses”
• threshold = “neuron firing”
Notation
• Inputs:
– $x_1, x_2, \ldots, x_d, x_{d+1}$
– $x_1, x_2, \ldots, x_{d-1}, x_d$ are the values of the d features
– $x_{d+1} = 1$ (a constant input)
– $x = (x_1, x_2, \ldots, x_d, x_{d+1})$
• Weights:
– $w_1, w_2, \ldots, w_d, w_{d+1}$
– we have d+1 weights
– one for each feature + one for the constant
– $w = (w_1, w_2, \ldots, w_d, w_{d+1})$
Perceptron Operation
• Equations of operation:
$$o(x_1, x_2, \ldots, x_d) = \begin{cases} 1 & \text{if } w_1 x_1 + \ldots + w_{d+1} x_{d+1} > 0 \\ -1 & \text{otherwise} \end{cases}$$
Note that
$w = (w_1, \ldots, w_{d+1})$, the “weight vector” (row vector, 1 x d+1)
and $x = (x_1, \ldots, x_{d+1})$, the “feature vector” (row vector, 1 x d+1)
=> $w_1 x_1 + w_2 x_2 + \ldots + w_{d+1} x_{d+1} = w \cdot x'$
and $w \cdot x'$ is the “vector inner product” (w*x' or sum(w.*x) in MATLAB)
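The inner product and the thresholding step can be sketched in a few lines of MATLAB. The weight vector below is the one used in the later example slides; the feature values are made up. Note that w.*x on its own gives a vector of component-wise products, which still has to be summed.

```matlab
w = [1 -1 0];            % example weight vector (1 x d+1)
x = [3 2 1];             % made-up features x1 = 3, x2 = 2, plus the constant input 1

s1 = w * x';             % inner product: row vector times column vector
s2 = sum(w .* x);        % same value: sum of the component-wise products
s3 = dot(w, x);          % same value, using the built-in dot function

% perceptron output: +1 if the weighted sum is positive, -1 otherwise
if s1 > 0
    o = 1;
else
    o = -1;
end
fprintf('weighted sum = %g, output = %d\n', s1, o);
```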
Vector Inner Product
Note that
$$w \cdot x' = (w_1, \ldots, w_{d+1}) \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \\ x_{d+1} \end{pmatrix} = w_1 x_1 + w_2 x_2 + \ldots + w_{d+1} x_{d+1}$$
Here x' is the transpose of the row vector x (it becomes a column vector)
Perceptron Decision Boundary
• Equations of operation (in vector form):
$$o(x_1, x_2, \ldots, x_d, x_{d+1}) = \begin{cases} 1 & \text{if } w \cdot x' > 0 \\ -1 & \text{otherwise} \end{cases}$$
The perceptron represents a hyperplane decision surface
in d-dimensional space
e.g., a line in 2d, a plane in 3d, etc
The equation of the hyperplane is
w . x’ = 0
This is the equation for points in x-space that are on the
boundary
Example of Perceptron Decision Boundary
$w = (w_1, w_2, w_3) = (1, -1, 0)$
$w \cdot x' = 0$
=> $1 \cdot x_1 - 1 \cdot x_2 + 0 \cdot 1 = 0$
=> $x_1 - x_2 = 0$
=> $x_1 = x_2$
This is the equation for the decision boundary
$w \cdot x' < 0$
=> $x_1 - x_2 < 0$
=> $x_1 < x_2$ (this is the equation for decision region -1)
$w \cdot x' > 0$
=> $x_1 - x_2 > 0$
=> $x_1 > x_2$ (this is the equation for decision region +1)
[Figure: the line x1 = x2 in the (x1, x2) plane, with decision region -1 on the side where x1 < x2 and decision region +1 on the side where x1 > x2]
Representational Power of Perceptrons
• What mappings can a perceptron represent perfectly?
– A perceptron is a linear classifier
– thus it can represent any mapping that is linearly separable
– some Boolean functions like AND (on left)
– but not Boolean functions like XOR (on right)
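[Figure: the four Boolean input points plotted for AND (left) and XOR (right), each marked 0 or x according to the function's output]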
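The difference between AND and XOR can be illustrated with a short experiment. In the sketch below, the outputs are coded as -1 / +1 to match the threshold function T, the weights for AND are one hand-picked choice among many, and the search over a coarse grid of weights for XOR is an illustration only (finding no solution on a grid is not a proof, although in this case no solution exists for any weights).

```matlab
% the four Boolean input combinations, with the constant input 1 appended
inputs = [0 0 1; 0 1 1; 1 0 1; 1 1 1];
and_targets = [-1; -1; -1; 1];     % AND, coded as -1 / +1
xor_targets = [-1; 1; 1; -1];      % XOR, coded as -1 / +1

% hand-picked weights (an illustrative choice) that represent AND
w = [1 1 -1.5];
out = ones(4, 1);
out(inputs * w' <= 0) = -1;
and_ok = isequal(out, and_targets);

% brute-force search over a coarse grid of weights for XOR
wgrid = -2:0.5:2;
xor_found = false;
for w1 = wgrid
    for w2 = wgrid
        for w3 = wgrid
            o = ones(4, 1);
            o(inputs * [w1 w2 w3]' <= 0) = -1;
            if isequal(o, xor_targets)
                xor_found = true;
            end
        end
    end
end
fprintf('AND represented: %d, XOR weights found: %d\n', and_ok, xor_found);
```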
Summary
• Review of Assignment 1
• K-nearest-neighbor classifiers
– Basic concepts
– Assignment 2
• Training and test accuracy
• Linear classifiers
– Perceptron classifier
• Next lecture
– How we can learn the weights of a perceptron