# Lecture 4: Homework Discussion, and more on Classification


CS 175, Fall 2007

Padhraic Smyth
Department of Computer Science
University of California, Irvine
Outline

• Discussion of Assignment 1

• Classification revisited

• Discussion of Assignment 2
– Due Wednesday (tomorrow) at noon

CS 175, Fall 2007: Professor Padhraic Smyth           Slide Set 4: Discussion, Classification 2
Grading of Assignment 1

• 40 points total
– Each MATLAB function = 10 points

– euclidean.m, nearest_neighbor.m, maxvalue.m
• functioning correctly on the test cases: +6 points
• comments: +2 points
• error-checking: +2 points

– test case example:
• x: random vector of length 100
• A: random matrix with 100 rows and 100 columns

CS 175, Fall 2007: Professor Padhraic Smyth                         Slide Set 4: Discussion, Classification 3
Comments on Grading

• Common mistakes
– incorrect definition of Euclidean distance

• dE(x, y) = sqrt( sum_i (xi - yi)^2 )
– no error-checking
• nearest_neighbor(x, A) => check that
– cols(A) = cols(x)
– rows(x) = 1
– no comments
• no comments in header
• no comments in body of nearest_neighbor.m

• If you find any errors in the grading of your assignment please
see Nathan during lab hours (or email him to make an
appointment)
– no “grade negotiating”!

CS 175, Fall 2007: Professor Padhraic Smyth                       Slide Set 4: Discussion, Classification 4
Suggestions

• Improve performance by vectorization
– can speed things up significantly
– e.g., calculate the vector distance in euclidean.m without a loop (see the small example at the end of this slide)

• Do not print input / intermediate / output variables to the screen
– this can increase your run time significantly
– use a semicolon at the end of each line

• Helpful commands:
– To learn more about the function:
• help
– To find a built-in function:
• lookfor
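As a small illustration of the vectorization point above (a sketch only; x and y are assumed to be row vectors of the same length, and the variable names are illustrative):

% loop version: accumulate the squared differences one component at a time
total = 0;
for i = 1:length(x)
    total = total + (x(i) - y(i))^2;
end
dist_loop = sqrt(total);

% vectorized version: one line, no explicit loop
dist_vec = sqrt(sum((x - y).^2));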

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 5
Suggestions (2)

• Test your code (!)
– Some .m functions that were submitted did not run

– 2 types of errors:
• Simple syntax errors (understandable)

• Systematic errors
– Incorrect calculations (e.g., for euclidean.m)
– Incorrect logic in finding the minimum vector
– Sloppy assignment of variables to values

– How to address this:
• Define a set of simple test cases
• Run your code and compare with a manual calculation (a small test sketch follows below)
• Check that the results make intuitive sense
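For example, a minimal test script along these lines could be used for euclidean.m (the expected answer 5 comes from the 3-4-5 right triangle; the tolerance and format are just a sketch, not a required style):

% simple test case for euclidean.m: the distance between (0,0) and (3,4) should be 5
x = [0 0];
y = [3 4];
d = euclidean(x, y);
if abs(d - 5) < 1e-10
    disp('euclidean.m test passed');
else
    disp('euclidean.m test FAILED');
end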

CS 175, Fall 2007: Professor Padhraic Smyth                            Slide Set 4: Discussion, Classification 6
Example of a Euclidean.m function
function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%                            A. Student, CS 175
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 7
Example of a Euclidean.m function
function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%                            A. Student, CS 175
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end

if (xr ~= 1 | yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 8
Example of a Euclidean.m function
function dist = euclidean(x,y)
% function dist = euclidean(x,y)
%
% Calculates the Euclidean distance between two vectors x and y
%                            A. Student, CS 175
% Inputs:
%    x, y: 2 vectors of real numbers, each of size 1 x n
% Outputs:
%    dist: the Euclidean distance between x and y

% error checking
[xr, xc] = size(x);
[yr, yc] = size(y);
if (xc ~= yc)
    error('input vectors must be the same length');
end

if (xr ~= 1 | yr ~= 1)
    error('inputs must both be row vectors (1 row, n columns)');
end

% calculate a vector of component-by-component differences
% (note the use of vectorization: no explicit loop is needed)
delta = x - y;

% now calculate the Euclidean distance
dist = sqrt(delta*delta');
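For instance, calling the function from the command window (illustrative output only):

» euclidean([0 0], [3 4])

ans =

     5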

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 9
Min.m function in MATLAB
» help min
MIN Smallest component.
For vectors, MIN(X) is the smallest element in X. For matrices,
MIN(X) is a row vector containing the minimum element from each
column. For N-D arrays, MIN(X) operates along the first
non-singleton dimension.

[Y,I] = MIN(X) returns the indices of the minimum values in vector I.
If the values along the first non-singleton dimension contain more
than one minimal element, the index of the first one is returned.

MIN(X,Y) returns an array the same size as X and Y with the
smallest elements taken from X or Y. Either one can be a scalar.

[Y,I] = MIN(X,[],DIM) operates along the dimension DIM.
When complex, the magnitude MIN(ABS(X)) is used. NaN's are ignored
when computing the minimum.

Example: If X = [2 8 4; 7 3 9], then min(X,[],1) is [2 3 4],
min(X,[],2) is [2; 3], and min(X,5) is [2 5 4; 5 3 5].

See also MAX, MEDIAN, MEAN, SORT.

CS 175, Fall 2007: Professor Padhraic Smyth                                        Slide Set 4: Discussion, Classification 10
Example of a maxvalue.m function
function [maxvalue, rmax, cmax] = maxvalue(A);
% function [maxvalue, rmax, cmax] = maxvalue(A);
% brief description of function here
%                            Your Name, CS 175
%
% Inputs
%     A: a matrix of size r x c, with r rows and c columns
%
% Outputs
%     maxvalue: largest entry in A
%     rmax, cmax: integers specifying the (row, column)
%     location of the max value

% Get a row vector containing the maximum value within each column
% Store idx_row - a vector containing the location of
% the maximum within the column
[mx_row, idx_row] = max(A);

% find the maximum within this vector
[maxvalue, cmax] = max(mx_row);

% Use the idx_row to find the row location of the max
rmax = idx_row(cmax);
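As an illustrative check (not part of the assignment text), applying the function to a small matrix:

» A = [1 7; 9 3];
» [m, r, c] = maxvalue(A)

m =

     9

r =

     2

c =

     1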

CS 175, Fall 2007: Professor Padhraic Smyth                       Slide Set 4: Discussion, Classification 11
Example of a nearest_neighbor.m function
function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A
% that is closest in Euclidean distance to row vector x.
%                             A. Student, CS 175
%
%   Inputs:
%     x: a vector of numbers of size 1 x n
%     A: k vectors of size 1 x n, "stacked" in a k x n matrix
%
%   Outputs:
%     y: the closest vector in A to x (of size 1 x n)
%     i: the integer (row) index of y in A
%     d: the Euclidean distance between x and y

CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 12
Example of a nearest_neighbor.m function
function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
% Find the row vector y from a matrix of row vectors A
% that is closest in Euclidean distance to row vector x.
%                             A. Student, CS 175
%
%   Inputs:
%     x: a vector of numbers of size 1 x n
%     A: k vectors of size 1 x n, "stacked" in a k x n matrix
%
%   Outputs:
%     y: the closest vector in A to x (of size 1 x n)
%     i: the integer (row) index of y in A
%     d: the Euclidean distance between x and y

% error checking
[xr, xc] = size(x);
[Ar, Ac] = size(A);

if (xc ~= Ac)
    error('input vector x and matrix A must have the same number of columns');
end

if (xr ~= 1)
    error('input vector x must be a row vector');
end
CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 13
“For loop” version of nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
..
..
[xr, xc] = size(x);
[Ar, Ac] = size(A);
..

% "for loop" version of code
distances = zeros(Ar,1);   % preallocate storage for distances
for j=1:Ar    % loop over rows in A
    y = A(j,:);
    distances(j) = euclidean(x,y);
end
% find the minimum distance and its location
[d i] = min(distances);
% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);

CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 14
Repmat.m function in MATLAB

» help repmat
REPMAT Replicate and tile an array.
B = REPMAT(A,M,N) replicates and tiles the matrix A to produce the
M-by-N block matrix B.
B = REPMAT(A,[M N]) produces the same thing.
B = REPMAT(A,[M N P ...]) tiles the array A to produce a
M-by-N-by-P-by-... block array. A can be N-D.

REPMAT(A,M,N) when A is a scalar is commonly used to produce
an M-by-N matrix filled with A's value. This can be much faster
than A*ONES(M,N) when M and/or N are large.

Example:
repmat(magic(2),2,3)
repmat(NaN,2,3)

See also MESHGRID.

» repmat([1 2], 3, 1)

ans =

     1     2
     1     2
     1     2

CS 175, Fall 2007: Professor Padhraic Smyth                                     Slide Set 4: Discussion, Classification 16
Vectorized version of nearest_neighbor.m function

function [y, i, d] = nearest_neighbor(x, A)
% function [y, i, d] = nearest_neighbor(x, A)
%
..
..
% VECTORIZED VERSION OF THE CODE
% create a matrix of size Ar x xc, where each row consists of x
xmatrix = repmat(x,Ar,1);

% subtract the components of xmatrix and A, by matrix subtraction
delta = xmatrix - A;

% now square the differences, by component multiplication
squaredelta = delta.*delta;

% sum up the squared differences, row by row (note use of transpose: ')
distances = sqrt(sum(squaredelta')');

% find the minimum distance and its location (as before)
[d i] = min(distances);
% find the vector (the row in A) corresponding to the minimum distance
y = A(i,:);
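An equivalent way to write the row-by-row sum, avoiding the two transposes, is to give sum an explicit dimension argument (an alternative sketch, not the required form):

% sum the squared differences along each row (dimension 2) directly
distances = sqrt(sum(squaredelta, 2));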

CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 17
Nearest-Neighbor Classification (revisited)
Example of Data from 2 Classes

[Figure: scatter plot of two-class data in a two-dimensional feature space; Feature 1 on the horizontal axis, Feature 2 on the vertical axis]

CS 175, Fall 2007: Professor Padhraic Smyth                                              Slide Set 4: Discussion, Classification 19
Classifiers and Decision Boundaries

• What is a Classifier?
– A classifier is a mapping from feature space (a d-dimensional
vector) to the class labels {1, 2, … m}

– Thus, a classifier partitions the feature space into m decision
regions

– The line or surface separating any 2 classes is the decision
boundary

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 20
2-Class Data with a Linear Decision Boundary

[Figure: two-class data in a two-dimensional feature space with a linear decision boundary separating Decision Region 1 from Decision Region 2; Feature 1 on the horizontal axis, Feature 2 on the vertical axis]

CS 175, Fall 2007: Professor Padhraic Smyth                                                  Slide Set 4: Discussion, Classification 21
Classification Problem with Overlap

[Figure: two-class data with overlapping classes in a two-dimensional feature space; FEATURE 1 on the horizontal axis, FEATURE 2 on the vertical axis]

CS 175, Fall 2007: Professor Padhraic Smyth                                       Slide Set 4: Discussion, Classification 22
[Figure: the same overlapping two-class data with the minimum error decision boundary overlaid; FEATURE 1 on the horizontal axis, FEATURE 2 on the vertical axis]
CS 175, Fall 2007: Professor Padhraic Smyth                           Slide Set 4: Discussion, Classification 23
Classifiers = functions or mappings

[Diagram: measured feature values a, b, d, ..., z are fed into the classifier, which outputs a predicted class value c; the true class is unknown to the classifier]

We want a mapping or function which takes any combination of
values x = (a, b, d, ..., z) and produces a prediction c,
i.e., a function c = f(a, b, d, ..., z) which produces a value c = 1, c = 2, ..., or c = m

The problem is that we don’t know this mapping: we have to learn it from data!

CS 175, Fall 2007: Professor Padhraic Smyth                      Slide Set 4: Discussion, Classification 24
Classification Accuracy

• Say we have N feature vectors
• Say we know the true class label for each feature vector

• We can measure how accurate a classifier is by how many
feature vectors it classifies correctly

• Accuracy = percentage of feature vectors correctly classified

– training accuracy = accuracy on training data

– test accuracy = accuracy on new data not used in training
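Assuming the predicted and true class labels are stored in two N x 1 vectors (the variable names below are hypothetical), accuracy can be computed in one line:

% percentage of feature vectors classified correctly
accuracy = 100 * sum(predicted_labels == true_labels) / length(true_labels);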

CS 175, Fall 2007: Professor Padhraic Smyth                               Slide Set 4: Discussion, Classification 25
Some Notation

• Training Data
– Dtrain = { [x(1), c(1)] , [x(2), c(2)] , …………[x(N), c(N)] }
– N pairs of feature vectors and class labels

• Feature Vectors and Class Labels:
– x(i) is the ith training data feature vector
– in MATLAB this could be the ith row of an N x d matrix

– c(i) is the class label of the ith feature vector
– in general, c(i) can take m different class values, e.g., c = 1, c = 2, ..., c = m

– Let y be a new feature vector whose class label we do not know,
i.e., we wish to classify it.

CS 175, Fall 2007: Professor Padhraic Smyth                           Slide Set 4: Discussion, Classification 26
Example

[Figure: training data in a two-dimensional feature space, each point marked with its class label (1 or 2); Feature 1 on the horizontal axis, Feature 2 on the vertical axis]
CS 175, Fall 2007: Professor Padhraic Smyth                                             Slide Set 4: Discussion, Classification 27
kNN Decision Boundary (k=1)

[Figure: the same labeled training data with the k=1 nearest-neighbor decision boundary overlaid]

In general: the nearest-neighbor classifier produces piecewise linear decision boundaries.
CS 175, Fall 2007: Professor Padhraic Smyth                                                    Slide Set 4: Discussion, Classification 28
K-Nearest Neighbor (kNN) Classifier

• Find the k-nearest neighbors to y in Dtrain
– i.e., rank the feature vectors according to Euclidean distance
– select the k vectors which have the smallest distance to y

• Classification
– ranking yields k feature vectors and a set of k class labels
– pick the class label which is most common in this set (“vote”)
– classify y as belonging to this class
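A minimal sketch of this procedure for a single new feature vector y, assuming the training vectors are stacked in a matrix Dtrain with labels in trainlabels (hypothetical names), and reusing euclidean.m from Assignment 1:

% compute the distance from y to every training vector
ntrain = size(Dtrain, 1);
dists = zeros(ntrain, 1);
for j = 1:ntrain
    dists(j) = euclidean(y, Dtrain(j,:));
end

% sort the distances and keep the labels of the k closest vectors
[sorted_dists, order] = sort(dists);
klabels = trainlabels(order(1:k));

% "vote": predict the most common label among the k neighbors
predicted_class = mode(klabels);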

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 29
K-Nearest Neighbor (kNN) Classifier

• Notes:
–  In effect, the classifier uses the nearest k feature vectors from
Dtrain to “vote” on the class label for y
– the single-nearest neighbor classifier is the special case of k=1
– for two-class problems, if we choose k to be odd (i.e., k=1, 3, 5,…)
then there will never be any “ties”
– “training” is trivial for the kNN classifier, i.e., we just use Dtrain as
a “lookup table” when we want to classify a new feature vector

• Extensions of the Nearest Neighbor classifier
– weighted distances
• e.g., if some of the features are more important
• e.g., if features are irrelevant
– fast search techniques (indexing) to find k-nearest neighbors in d-space

CS 175, Fall 2007: Professor Padhraic Smyth                             Slide Set 4: Discussion, Classification 30
Assignment 2

• Due Wednesday…..

• 4 parts
1.    Plot classification data in two-dimensions
2.    Implement a nearest-neighbor classifier
3.    Plot the errors of a k-nearest-neighbor classifier
4.    Test the effect of the value k on the accuracy of the classifier

CS 175, Fall 2007: Professor Padhraic Smyth                              Slide Set 4: Discussion, Classification 31
Data Structure

simdata1 =
shortname: 'Simulated Data 1'
numfeatures: 2
classnames: [2x6 char]
numclasses: 2
description: [1x66 char]
features: [200x2 double]
classlabels: [200x1 double]
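For reference, a couple of illustrative lines showing how fields of this structure can be accessed (a sketch of the syntax only, not part of the required functions):

% feature vectors belonging to class 1
class1_points = simdata1.features(simdata1.classlabels == 1, :);

% number of examples and number of features
[n, d] = size(simdata1.features);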

CS 175, Fall 2007: Professor Padhraic Smyth   Slide Set 4: Discussion, Classification 32
Plotting Function
function classplot(data, x, y);
% function classplot(data, x, y);
%
% brief description of what the function does
% ......
%                  Your Name, CS 175, date
%
% Inputs
% data: (a structure with the same fields as described above:
%        your comment header should describe the structure explicitly)
%        Note that if you are only using certain fields in the structure
%        in the function below, you need only define these fields in the input comments

-------- Your code goes here -------

CS 175, Fall 2007: Professor Padhraic Smyth                                Slide Set 4: Discussion, Classification 33
First simulated data set, simdata1

CS 175, Fall 2007: Professor Padhraic Smyth   Slide Set 4: Discussion, Classification 34
Second simulated data set, simdata2

CS 175, Fall 2007: Professor Padhraic Smyth   Slide Set 4: Discussion, Classification 35
Nearest Neighbor Classifier

function [class_predictions] = knn(traindata,trainlabels,k, testdata)
% function [class_predictions] = knn(traindata,trainlabels,k, testdata)
%
% a brief description of what the function does
% ......
%                  Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: a N1 x 1 vector of classlabels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
%
% Outputs
% class_predictions: N2 x 1 vector of predicted class values

-------- Your code goes here -------

CS 175, Fall 2007: Professor Padhraic Smyth                                   Slide Set 4: Discussion, Classification 36
Plotting k-NN Errors

function knn_plot(traindata,trainlabels,k,testdata,testlabels);
% function knn_plot(traindata,trainlabels,k,testdata,testlabels);
%
% Predicts class-labels for the data in testdata using the k nearest
% neighbors in traindata, and then plots the data (using the first
% 2 dimensions or first 2 features), displaying the data from each
% class in different colors, and overlaying circles on the points
% that were incorrectly classified.
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: a N1 x 1 vector of classlabels for traindata
% k: an odd positive integer indicating the number of neighbors to use
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: a N2 x 1 vector of classlabels for testdata

CS 175, Fall 2007: Professor Padhraic Smyth                                   Slide Set 4: Discussion, Classification 37
Accuracy of kNN Classifier as k is varied
function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
% function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
%
% a brief description of what the function does
% ......
%                  Your Name, CS 175, date
%
% Inputs
% traindata: an N1 x d matrix of feature data (the "memory" for kNN)
% trainlabels: a N1 x 1 vector of classlabels for traindata
% testdata: an N2 x d matrix of feature data for testing the knn classifier
% testlabels: a N2 x 1 vector of classlabels for testdata
% kmax: an odd positive integer indicating the maximum number of neighbors
% plotflag: (optional argument) if 1, the error rates versus k are plotted,
%                      otherwise no plot.
%
% Outputs
% errors: r x 1 vector of error-rates on testdata, where r is the
%           number of values of k that are tested.

-------- Your code goes here -------

CS 175, Fall 2007: Professor Padhraic Smyth                                 Slide Set 4: Discussion, Classification 38
Training Data and Test Data

• Training data
– labeled data used to build a classifier
• Test data
– new data, not used in the training process, to evaluate how well a
classifier does on new data

• Memorization versus Generalization
– better training accuracy: “memorizing” the training data
– better test accuracy: “generalizing” to new data
– in general, we would like our classifier to perform well on new test
data, not just on training data
• i.e., we would like it to generalize well to new data
• Test accuracy is more important than training accuracy

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 39
Test Accuracy and Generalization

• The accuracy of our classifier on new unseen data is a
fair/honest assessment of the performance of our classifier

• Why is training accuracy not good enough?
– Training accuracy is optimistic
– a classifier like nearest-neighbor can construct boundaries which
always separate all training data points, but which do not separate
new points
• e.g., what is the training accuracy of kNN, k = 1?
– A flexible classifier can “overfit” the training data
• in effect it just memorizes the training data, but does not learn
the general relationship between x and C

• Generalization
– We are really interested in how our classifier generalizes to new
data
– test data accuracy is a good estimate of generalization
performance

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 40
Another Example

[Figure: two-class data in a two-dimensional feature space with a decision boundary separating Decision Region 1 from Decision Region 2; Feature 1 on the horizontal axis, Feature 2 on the vertical axis]

CS 175, Fall 2007: Professor Padhraic Smyth                                                    Slide Set 4: Discussion, Classification 41
A More Complex Decision Boundary

[Figure: the same two-class data with a more complex, highly non-linear decision boundary separating Decision Region 1 from Decision Region 2]

CS 175, Fall 2007: Professor Padhraic Smyth                                                   Slide Set 4: Discussion, Classification 42
Example: The Overfitting Phenomenon

[Figure: a scatter plot of data points, Y plotted against X]

CS 175, Fall 2007: Professor Padhraic Smyth   Slide Set 4: Discussion, Classification 43
A Complex Model

Y = high-order polynomial in X

[Figure: the same data fit by a high-order polynomial in X, which follows the training points very closely]

CS 175, Fall 2007: Professor Padhraic Smyth          Slide Set 4: Discussion, Classification 44
The True (simpler) Model

Y = a X + b + noise

[Figure: the same data with the true model Y = aX + b (a straight line) overlaid]

CS 175, Fall 2007: Professor Padhraic Smyth   Slide Set 4: Discussion, Classification 45
How Overfitting affects Prediction

[Figure: predictive error plotted against model complexity, showing the error on training data]

CS 175, Fall 2007: Professor Padhraic Smyth            Slide Set 4: Discussion, Classification 46
How Overfitting affects Prediction

[Figure: predictive error plotted against model complexity, showing the error on training data and the error on test data]

CS 175, Fall 2007: Professor Padhraic Smyth            Slide Set 4: Discussion, Classification 47
How Overfitting affects Prediction

[Figure: predictive error plotted against model complexity, showing the error on training data and the error on test data; underfitting on the left, overfitting on the right, and the ideal range for model complexity in between]

CS 175, Fall 2007: Professor Padhraic Smyth                                        Slide Set 4: Discussion, Classification 48
Linear Classifiers
Decision Boundaries

• What is a Classifier?
– A classifier is a mapping from feature space (a d-dimensional
vector) to the class labels {1, 2, … m}
– Thus, a classifier partitions the feature space into m decision
regions
– A line or curve separating the classes is a decision boundary
• in more than 2 dimensions this is a surface (e.g., a
hyperplane)

• Linear Classifiers
– a linear classifier is a mapping which partitions feature space using
a linear function (a straight line, or a hyperplane)
– it is one of the simplest classifiers we can imagine
• “separate the two classes using a straight line in feature space”
– in 2 dimensions the decision boundary is a straight line

CS 175, Fall 2007: Professor Padhraic Smyth                           Slide Set 4: Discussion, Classification 50
2-Class Data with a Linear Decision Boundary

[Figure: two-class data in a two-dimensional feature space with a linear decision boundary separating Decision Region 1 from Decision Region 2 (repeated from earlier)]

CS 175, Fall 2007: Professor Padhraic Smyth                                                  Slide Set 4: Discussion, Classification 51
Non-Linearly Separable Data, with Decision Boundary

[Figure: two-class data that are not linearly separable, shown with a non-linear decision boundary separating Decision Region 1 from Decision Region 2]

CS 175, Fall 2007: Professor Padhraic Smyth                                                    Slide Set 4: Discussion, Classification 52
Convex Hull of a Set of Points

•      Convex Hull of a set Q of points:
– Intuitively
• think of each point in Q as a nail sticking out from a 2d board
• the convex hull = the shape formed by a tight rubber band
that surrounds all the nails
– Formally: “the convex hull is the smallest convex polygon P for
which each point in Q is either on the boundary of P or in its
interior”
• (p.898, Cormen, Leiserson, and Rivest, Introduction to Algorithms)
• can be found (for n points) in time n log n

• Relation to Class Overlap
–    define convex hulls of data points D1 and D2 as P1 and P2
–    If P1 and P2 do not intersect => D1 and D2 are linearly separable
–    if P1 and P2 intersect, then we have overlap
–    If P1 and P2 intersect then D1 and D2 are not linearly separable

CS 175, Fall 2007: Professor Padhraic Smyth                                  Slide Set 4: Discussion, Classification 53
Convex Hull Example

[Figure: a set of points plotted in the plane; Feature 1 on the horizontal axis, Feature 2 on the vertical axis]

CS 175, Fall 2007: Professor Padhraic Smyth               Slide Set 4: Discussion, Classification 54
Convex Hull Example

[Figure: the same set of points with their convex hull P1 drawn around them]

CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 55
Data from 2 Classes: Linearly Separable?

[Figure: data points from two classes (one class marked with x) plotted in the Feature 1 / Feature 2 plane]

CS 175, Fall 2007: Professor Padhraic Smyth                   Slide Set 4: Discussion, Classification 56
Data from 2 Classes: Linearly Separable?

[Figure: the same two-class data with the convex hull P1 of one class drawn]

CS 175, Fall 2007: Professor Padhraic Smyth                       Slide Set 4: Discussion, Classification 57
Data from 2 Classes: linearly separable?

[Figure: the same two-class data with the convex hulls P1 and P2 of the two classes drawn. The two hulls intersect, so the data from the two classes are not linearly separable.]

CS 175, Fall 2007: Professor Padhraic Smyth                                              Slide Set 4: Discussion, Classification 58
Different data that is linearly separable

[Figure: a different data set in which the convex hulls P1 and P2 do not intersect, so the two classes are linearly separable]

CS 175, Fall 2007: Professor Padhraic Smyth                       Slide Set 4: Discussion, Classification 59
Some Theory
Let N be the number of data points

Let d be the dimension of the data points

Consider N points in general position and assume each point is labeled
as belonging to class 1 or class 2

There are 2^N possible labelings

Let F(N, d) = the fraction of labelings of N points in d dimensions
that are linearly separable

It can be shown that:

F(N, d) = 1                                                          if d > N - 2  (i.e., N <= d + 1)

F(N, d) = (1 / 2^(N-1)) * sum_{i=0..d}  (N-1)! / [ (N-1-i)! i! ]     if N > d
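A small MATLAB sketch of this formula (the function name is made up for illustration; nchoosek(N-1, i) computes the binomial coefficient (N-1)! / [ (N-1-i)! i! ]):

function F = fraction_separable(N, d)
% fraction of the 2^N labelings of N points in general position
% in d dimensions that are linearly separable
if d > N - 2
    F = 1;
else
    s = 0;
    for i = 0:d
        s = s + nchoosek(N-1, i);
    end
    F = s / 2^(N-1);
end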

CS 175, Fall 2007: Professor Padhraic Smyth                                     Slide Set 4: Discussion, Classification 60
Fraction of Labellings in d-space that are Linearly Separable

[Figure: F(N,d), the fraction of labelings that are linearly separable, plotted against N/(d+1) for d = 1, d = 10, and d = infinity]

CS 175, Fall 2007: Professor Padhraic Smyth                                Slide Set 4: Discussion, Classification 61
Fraction of Labellings in d-space that are Linearly Separable

[Figure: the same plot of F(N,d), the fraction of labelings that are linearly separable, against N/(d+1)]

Note that for N <= d+1, any labeling of N points in d dimensions is linearly separable
(e.g., N = 3, d = 2, or N = 50, d = 100).

CS 175, Fall 2007: Professor Padhraic Smyth                      Slide Set 4: Discussion, Classification 62
A Linear Classifier in 2 Dimensions

Let Feature 1 be called X
Let Feature 2 be called Y

A linear classifier is a linear function of X and Y,
i.e., it computes f(X,Y) = aX + bY + c

Here a, b, and c are the “weights” of the classifier

Define the output of the linear classifier to be
T(f) = -1, if f <= 0
T(f) = +1, if f > 0

if f(X,Y) <= 0, the classifier produces a “-1” (Decision Region 1)

if f(X,Y) > 0, the classifier produces a “+1” (Decision Region 2)
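In MATLAB this classifier is only a few lines; here is a hedged sketch for a single point with feature values X and Y (the weights are arbitrary illustrative values):

a = 1; b = -1; c = 0;          % weights of the linear classifier
f = a*X + b*Y + c;             % linear function of the two features
if f > 0
    class = +1;                % Decision Region 2
else
    class = -1;                % Decision Region 1
end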

CS 175, Fall 2007: Professor Padhraic Smyth                           Slide Set 4: Discussion, Classification 63
Decision Boundaries for a 2d Linear Classifier

Depending on whether f(X,Y) is > or < 0, the features (X,Y) get
classified into class 1 or class 2

Thus, f(X,Y) = 0 must define the decision boundary between class 1 and 2

CS 175, Fall 2007: Professor Padhraic Smyth                         Slide Set 4: Discussion, Classification 64
Decision Boundaries for a 2d Linear Classifier

Depending on whether f(X,Y) is > or < 0, the features (X,Y) get
classified into class 1 or class 2

Thus, f(X,Y) = 0 must define the decision boundary between class 1 and 2

What is the equation for this decision boundary?

f(X,Y) = aX + bY + c = 0,   i.e.,   Y = -(aX + c)/b   (for b nonzero; see the short plotting sketch at the end of this slide)

Thus, defining a, b, and c automatically locates the decision boundary in
X,Y space

In summary:
- a classifier defines decision boundaries between classes
- for a linear classifier, this boundary is a line or a plane
- the equation of the plane is defined by the parameters of
the classifier
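For instance, a short sketch that draws this boundary over an arbitrary range of X values (assumes b is nonzero; a, b, c as above):

X = -4:0.1:14;                 % range of Feature 1 values to plot
Y = -(a*X + c) / b;            % points on the decision boundary aX + bY + c = 0
plot(X, Y);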

CS 175, Fall 2007: Professor Padhraic Smyth                         Slide Set 4: Discussion, Classification 65
An Example of a Linear Decision Boundary

[Figure: two-class data with the decision boundary defined by a = 1, b = -1, c = 0, separating Decision Region 1 from Decision Region 2]

CS 175, Fall 2007: Professor Padhraic Smyth                                        Slide Set 4: Discussion, Classification 66
A Better Linear Decision Boundary

[Figure: the same data with a better linear decision boundary, defined by a = 1, b = 1, c = 0, separating Decision Region 1 from Decision Region 2]

CS 175, Fall 2007: Professor Padhraic Smyth                                                 Slide Set 4: Discussion, Classification 67
The Perceptron Classifier (for 2 features)

[Diagram: inputs X and Y (with weights w1 and w2) and a constant input 1 (with weight w3) are combined into the weighted sum f = w1 X + w2 Y + w3; f is passed through the threshold function T(f), whose output in {-1, +1} is the class decision]

CS 175, Fall 2007: Professor Padhraic Smyth                               Slide Set 4: Discussion, Classification 68
The Perceptron Classifier (for 2 features)

[Diagram: the same perceptron as on the previous slide]

Note: the weights w1, w2, w3 are the same as a, b, c in the
previous slides, i.e., f = aX + bY + c

CS 175, Fall 2007: Professor Padhraic Smyth                                     Slide Set 4: Discussion, Classification 69
Perceptrons

• Perceptron = a linear classifier
– The w’s are the weights (denoted as a, b, c earlier)
• real-valued constants (can be positive or negative)
– Define an additional constant input “1” (allows an intercept in
decision boundary)

• A perceptron calculates 2 quantities:
– 1. A weighted sum of the input features
– 2. This sum is then thresholded by the T function

• A simple artificial model of human neurons
• weights = “synapses”
• threshold = “neuron firing”

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 70
Notation

• Inputs:
– x1, x2, ..., xd, xd+1
– x1, x2, ..., xd are the values of the d features
– xd+1 = 1 (a constant input)
– x = (x1, x2, ..., xd, xd+1)

• Weights:
– w1, w2, ..., wd, wd+1
– we have d+1 weights
– one for each feature + one for the constant
– w = (w1, w2, ..., wd, wd+1)

CS 175, Fall 2007: Professor Padhraic Smyth                         Slide Set 4: Discussion, Classification 71
Perceptron Operation

• Equations of operation:

o(x1, x2, ..., xd, xd+1) =  1,  if w1 x1 + w2 x2 + ... + wd+1 xd+1 > 0
                         = -1,  otherwise

Note that
w = (w1, ..., wd+1), the “weight vector” (row vector, 1 x d+1)

and x = (x1, ..., xd+1), the “feature vector” (row vector, 1 x d+1)

=>             w1 x1 + w2 x2 + ... + wd+1 xd+1 = w . x’

and w . x’ is the “vector inner product” (w*x' or sum(w.*x) in MATLAB); a short MATLAB sketch follows below
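Putting this together, a minimal sketch of perceptron classification for a whole data set at once (hypothetical names: data is an N x d matrix of feature vectors, w is a 1 x (d+1) weight vector):

[N, d] = size(data);
X = [data ones(N,1)];          % append the constant input "1" to each feature vector
f = X * w';                    % N x 1 vector of weighted sums w . x'
outputs = ones(N,1);           % threshold: +1 where f > 0, -1 otherwise
outputs(f <= 0) = -1;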

CS 175, Fall 2007: Professor Padhraic Smyth                            Slide Set 4: Discussion, Classification 72
Vector Inner Product

Note that

w . x’ = (w1, ..., wd+1) (x1, x2, ..., xd, xd+1)’

       = w1 x1 + w2 x2 + ... + wd+1 xd+1

where x’ is the transpose of the row vector x (it becomes a column vector).

CS 175, Fall 2007: Professor Padhraic Smyth                                Slide Set 4: Discussion, Classification 73
Perceptron Decision Boundary

• Equations of operation (in vector form):

o(x1, x2, ..., xd, xd+1) =  1,  if w . x’ > 0
                         = -1,  otherwise

The perceptron represents a hyperplane decision surface
in d-dimensional space
e.g., a line in 2d, a plane in 3d, etc

The equation of the hyperplane is
w . x’ = 0

This is the equation for points in x-space that are on the
boundary

CS 175, Fall 2007: Professor Padhraic Smyth                             Slide Set 4: Discussion, Classification 74
Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)

[Figure: the x1-x2 plane, in which the decision boundary for this weight vector is drawn on the following slides]

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 75
Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)

w . x’ = 0
=> 1·x1 - 1·x2 + 0·1 = 0
=> x1 - x2 = 0
=> x1 = x2

[Figure: the line x1 = x2 drawn in the x1-x2 plane]

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 76
Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)

w . x’ = 0
=> 1·x1 - 1·x2 + 0·1 = 0
=> x1 - x2 = 0
=> x1 = x2

This is the equation for the decision boundary.

[Figure: the decision boundary x1 = x2 in the x1-x2 plane]

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 77
Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)

w . x’ < 0
=> x1 - x2 < 0
=> x1 < x2   (this is the equation for decision region -1)

[Figure: the half-plane x1 < x2, on one side of the boundary x1 = x2]

CS 175, Fall 2007: Professor Padhraic Smyth                          Slide Set 4: Discussion, Classification 78
Example of Perceptron Decision Boundary
w = (w1, w2, w3) = (1, -1, 0)

w . x’ > 0
=> x1 - x2 > 0
=> x1 > x2   (this is the equation for decision region +1)

[Figure: the boundary x1 = x2, with the region where w . x’ < 0 on one side and the region where w . x’ > 0 on the other]

CS 175, Fall 2007: Professor Padhraic Smyth                                  Slide Set 4: Discussion, Classification 79
Representational Power of Perceptrons

• What mappings can a perceptron represent perfectly?
–    A perceptron is a linear classifier
–    thus it can represent any mapping that is linearly separable
–    some Boolean functions like AND (on left)
–    but not Boolean functions like XOR (on right)

[Figure: left, the AND function: only the point (1,1) is positive (x), and it can be separated from the other three points by a line; right, the XOR function: the positive points (0,1) and (1,0) cannot be separated from (0,0) and (1,1) by any line]

CS 175, Fall 2007: Professor Padhraic Smyth                            Slide Set 4: Discussion, Classification 80
Summary

• Review of Assignment 1

• K-nearest-neighbor classifiers
– Basic concepts
– Assignment 2

• Training and test accuracy

• Linear classifiers
– Perceptron classifier

• Next lecture
– How we can learn the weights of a perceptron

CS 175, Fall 2007: Professor Padhraic Smyth                       Slide Set 4: Discussion, Classification 81
