        Lecture 4: Homework Discussion, and more on Classification



            CS 175, Fall 2007




             Padhraic Smyth
     Department of Computer Science
      University of California, Irvine
        Outline

          • Discussion of Assignment 1



          • Classification revisited



          • Discussion of Assignment 2
                 – Due Wednesday (tomorrow) at noon




CS 175, Fall 2007: Professor Padhraic Smyth           Slide Set 4: Discussion, Classification 2
        Grading of Assignment 1

           • 40 points total
                   – Each MATLAB function = 10 points

                   – euclidean.m, nearest_neighbor.m, maxvalue.m
                      • functioning correctly on the test cases: +6 points
                      • comments: +2 points
                      • error-checking: +2 points

                   – test case example:
                       • x: random vector of length 100
                       • A: random matrix with 100 rows and 100 columns




        Comments on Grading

          • Common mistakes
                 – incorrect definition of Euclidean distance

                         • $d_E(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
                 – no error-checking
                    • nearest_neighbor(x, A) => check that
                         – cols(A) = cols(x)
                         – rows(x) = 1
                 – no comments
                    • no comments in header
                    • no comments in body of nearest_neighbor.m


          • If you find any errors in the grading of your assignment please
            see Nathan during lab hours (or email him to make an
            appointment)
                 – no “grade negotiating”!



        Suggestions

          • Improve performance through vectorization
                 – can speed up your code significantly
                 – e.g., compute the distance with vector operations in euclidean.m


          • Do not print input / intermediate / output variables to the screen
                 – printing can increase your run-time significantly
                 – end each statement with a semicolon to suppress output


          • Helpful commands:
                 – To learn more about the function:
                    • help
                 – To find a built-in function:
                    • lookfor
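
A minimal sketch of the vectorization suggestion, assuming two row vectors of the same length (the variable names and the length 100 are illustrative, not taken from the assignment). Both versions compute the same distance; the vectorized one replaces the loop with a single vector operation.

```matlab
% Illustrative data (hypothetical): two random row vectors of length 100
x = rand(1, 100);
y = rand(1, 100);

% Loop version: accumulate squared differences one component at a time
total = 0;
for j = 1:length(x)
    total = total + (x(j) - y(j))^2;
end
dist_loop = sqrt(total);

% Vectorized version: subtract whole vectors, then take an inner product
delta = x - y;
dist_vec = sqrt(delta * delta');

% The two results should agree up to floating-point round-off
difference = abs(dist_loop - dist_vec);
```

Every statement ends with a semicolon, so nothing is echoed to the screen while the script runs.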




        Suggestions (2)

          • Test your code (!)
                 – Some .m functions that were submitted did not run

                 – 2 types of errors:
                     • Simple syntax errors (understandable)

                         • Systematic errors
                             – Incorrect calculations (e.g., in euclidean.m)
                             – Incorrect logic in finding the minimum vector
                             – Sloppy assignment of variables to values

                 – How to address this:
                    • Define a set of simple test cases
                    • Run your code and compare with manual calculation
                    • Check that the results make intuitive sense
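
One way to follow this advice is sketched below, assuming a distance function with the same formula as euclidean.m. The anonymous function `euclid` is a hypothetical stand-in (no error checking) so that the script runs on its own; the test vectors are chosen so the answers are easy to work out by hand.

```matlab
% Stand-in for euclidean.m (assumption: same formula, no error checking)
euclid = @(x, y) sqrt((x - y) * (x - y)');

% Case 1: 3-4-5 right triangle, so the distance worked out by hand is 5
d1 = euclid([0 0], [3 4]);

% Case 2: identical vectors, so the distance must be 0
d2 = euclid([1 2 3], [1 2 3]);

% Case 3: distance is symmetric, so swapping the arguments changes nothing
d3a = euclid([1 5], [4 1]);
d3b = euclid([4 1], [1 5]);

% Compare against the manual calculations (with a small tolerance)
tol = 1e-12;
ok = (abs(d1 - 5) < tol) && (d2 == 0) && (abs(d3a - d3b) < tol);
```

If `ok` comes out false, one of the hand calculations disagrees with the code and is worth inspecting before moving on to larger random inputs.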




        Example of a euclidean.m function
               function dist = euclidean(x,y)
               % function dist = euclidean(x,y)
               %
               % Calculates the Euclidean distance between two vectors x and y
               %                            A. Student, CS 175
               % Inputs:
               %    x, y: 2 vectors of real numbers, each of size 1 x n
               % Outputs:
               %    dist: the Euclidean distance between x and y

               % error checking
               [xr, xc] = size(x);
               [yr, yc] = size(y);

               if (xc ~= yc)
                  error('input vectors must be the same length');
               end

               if (xr ~= 1 || yr ~= 1)
                  error('inputs must both be row vectors (1 row, n columns)');
               end

               % calculate a vector of component-by-component differences
               delta = x - y;

               % now calculate the Euclidean distance (note the use of
               % vectorization: the inner product sums the squared differences)
               dist = sqrt(delta*delta');


        Min.m function in MATLAB
        » help min
        MIN Smallest component.
          For vectors, MIN(X) is the smallest element in X. For matrices,
          MIN(X) is a row vector containing the minimum element from each
          column. For N-D arrays, MIN(X) operates along the first
          non-singleton dimension.

           [Y,I] = MIN(X) returns the indices of the minimum values in vector I.
           If the values along the first non-singleton dimension contain more
           than one minimal element, the index of the first one is returned.

           MIN(X,Y) returns an array the same size as X and Y with the
           smallest elements taken from X or Y. Either one can be a scalar.

           [Y,I] = MIN(X,[],DIM) operates along the dimension DIM.
           When complex, the magnitude MIN(ABS(X)) is used. NaN's are ignored
           when computing the minimum.

           Example: If X = [2 8 4; 7 3 9] then min(X,[],1) is [2 3 4],
           min(X,[],2) is [2; 3] and min(X,5) is [2 5 4; 5 3 5].

           See also MAX, MEDIAN, MEAN, SORT.

        Example of a maxvalue.m function
               function [maxvalue, rmax, cmax] = maxvalue(A);
                 % function [maxvalue, rmax, cmax] = maxvalue(A);
                 % brief description of function here
                 %                            Your Name, CS 175
                 %
                 % Inputs
                 %     A: a matrix of size r x c, with r rows and c columns
                 %
                 % Outputs
                 %     maxvalue: largest entry in A
                 %     rmax, cmax: integers specifying the (row,column)
                 %                 location of the max value

               % Get a row vector containing the maximum value within each column
               % Store idx_row - a vector containing the location of
               % the maximum within the column
               [mx_row, idx_row] = max(A);

               % find the maximum within this vector
               [maxvalue, cmax] = max(mx_row);

               % Use the idx_row to find the row location of the max
               rmax = idx_row(cmax);
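
An alternative sketch of the same computation, offered as a variation rather than the graded solution: take the maximum over all entries at once and convert the resulting linear index back to a (row, column) pair with the built-in `ind2sub`. The matrix `A` and the name `mx` are illustrative.

```matlab
% Illustrative matrix (hypothetical example data)
A = [2 8 4; 7 3 9];

% A(:) stacks the columns of A into one long column vector, so max
% returns the largest entry and its linear (column-major) index
[mx, idx] = max(A(:));

% Convert the linear index back to a (row, column) location
[rmax, cmax] = ind2sub(size(A), idx);
```

For this matrix the largest entry is 9, located in row 2, column 3.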




        Example of a nearest_neighbor.m function
      function [y, i, d] = nearest_neighbor(x, A)
      % function [y, i, d] = nearest_neighbor(x, A)
      %
      % Find the row vector y from a matrix of row vectors A
      % that is closest in Euclidean distance to row vector x.
      %                             A. Student, CS 175
      %
      %   Inputs:
      %     x: a vector of numbers of size 1 x n
      %     A: k vectors of size 1 x n, "stacked" in a k x n matrix
      %
      %   Outputs:
      %     y: the closest vector in A to x (of size 1 x n)
      %     i: the integer (row) index of y in A
      %     d: the Euclidean distance between x and y

      % error checking
      [xr, xc] = size(x);
      [Ar, Ac] = size(A);

      if (xc ~= Ac)
         error('input vector x and matrix A must have the same number of columns');
      end

      if (xr ~= 1)
         error('input vector x must be a row vector');
      end
        “For loop” version of nearest_neighbor.m function

      function [y, i, d] = nearest_neighbor(x, A)
      % function [y, i, d] = nearest_neighbor(x, A)
      %
      ..
      ..
      [xr, xc] = size(x);
      [Ar, Ac] = size(A);
      ..

      % "for loop" version of code
      distances = zeros(Ar,1);  % preallocate storage for distances
      for j=1:Ar    % loop over rows in A
         y = A(j,:);
         distances(j) = euclidean(x,y);
      end
      % find the minimum distance and its location
      [d i] = min(distances);
      % find the vector (the row in A) corresponding to the minimum distance
      y = A(i,:);




        Repmat.m function in MATLAB

         » help repmat

          REPMAT Replicate and tile an array.
            B = REPMAT(A,M,N) replicates and tiles the matrix A to produce the
            M-by-N block matrix B.

            B = REPMAT(A,[M N]) produces the same thing.

            B = REPMAT(A,[M N P ...]) tiles the array A to produce a
            M-by-N-by-P-by-... block array. A can be N-D.

            REPMAT(A,M,N) when A is a scalar is commonly used to produce
            an M-by-N matrix filled with A's value. This can be much faster
            than A*ONES(M,N) when M and/or N are large.

            Example:
              repmat(magic(2),2,3)
              repmat(NaN,2,3)

            See also MESHGRID.

         » repmat([1 2], 3, 1)

         ans =

              1    2
              1    2
              1    2




        Vectorized version of nearest_neighbor.m function

      function [y, i, d] = nearest_neighbor(x, A)
      % function [y, i, d] = nearest_neighbor(x, A)
      %
      ..
      ..
      % VECTORIZED VERSION OF THE CODE
      % create a matrix of size Ar x xc, where each row consists of x
      xmatrix = repmat(x,Ar,1);

      % subtract the components of xmatrix and A, by matrix subtraction
      delta = xmatrix - A;

      % now square the differences, by component multiplication
      squaredelta = delta.*delta;

      % sum up the squared differences, row by row (sum along dimension 2,
      % which also works when the vectors have only one component)
      distances = sqrt(sum(squaredelta, 2));

      % find the minimum distance and its location (as before)
      [d i] = min(distances);
      % find the vector (the row in A) corresponding to the minimum distance
      y = A(i,:);
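
A small worked case can help confirm the vectorized steps by hand. The sketch below uses made-up values for `x` and `A` and repeats the steps above in a standalone script (no function wrapper, no error checking).

```matlab
% Illustrative inputs (hypothetical): one query vector and three stored vectors
x = [0 0];
A = [3 4; 1 1; -2 0];
Ar = size(A, 1);

% Replicate x so that it can be subtracted from every row of A at once
delta = repmat(x, Ar, 1) - A;

% Squared differences summed along each row, then the square root
distances = sqrt(sum(delta .* delta, 2));

% Minimum distance, its row index, and the corresponding row of A
[d, i] = min(distances);
y = A(i, :);
```

By hand, the three distances are 5, sqrt(2) and 2, so the nearest neighbor is the second row, [1 1].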




Nearest-Neighbor Classification (revisited)
        Example of Data from 2 Classes

        [Figure: two-class data in a two-dimensional feature space; horizontal axis: Feature 1, vertical axis: Feature 2]

        Classifiers and Decision Boundaries

          • What is a Classifier?
                 – A classifier is a mapping from feature space (a d-dimensional
                   vector) to the class labels {1, 2, … m}

                 – Thus, a classifier partitions the feature space into m decision
                   regions

                 – The line or surface separating any 2 classes is the decision
                   boundary
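
To make the idea of a mapping concrete, here is a minimal sketch of a two-class classifier in two dimensions whose decision boundary is a straight line. The weights `w`, the offset `b`, and the name `classify` are made-up values for illustration and do not come from the course data.

```matlab
% Hypothetical linear classifier: the decision boundary is the line
% where x*w' + b equals 0 (w and b are invented for this illustration)
w = [1 -1];
b = 0;

% Map a 1 x 2 feature vector to a class label, 1 or 2
classify = @(x) 1 + (x * w' + b > 0);

c1 = classify([0 3]);   % 0 - 3 = -3, not positive: decision region 1
c2 = classify([5 1]);   % 5 - 1 =  4, positive:     decision region 2
```

Every point on one side of the line receives label 1 and every point on the other side receives label 2, which is exactly the partition of feature space into decision regions described above.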




        2-Class Data with a Linear Decision Boundary

        [Figure: two-class data in a two-dimensional feature space, with a decision boundary separating Decision Region 1 from Decision Region 2; horizontal axis: Feature 1, vertical axis: Feature 2]

        Classification Problem with Overlap


        [Figure: classification data with overlapping classes; horizontal axis: FEATURE 1, vertical axis: FEATURE 2, both from 0 to 8]

        [Figure: plot over the same axes, annotated "Minimum Error Decision Boundary"]

          Classifiers = functions or mappings

            Feature values (which are known, measured): a, b, d, ..., z
                  -> Classifier ->
            Predicted class value c (the true class is unknown to the classifier)


        We want a mapping or function which takes any combination of
        values x = (a, b, d, ..., z) and produces a prediction c,
        i.e., a function c = f(a, b, d, ..., z) which produces a value c = 1, c = 2, ..., c = m


        The problem is that we don’t know this mapping: we have to learn it from data!


        Classification Accuracy

          • Say we have N feature vectors
          • Say we know the true class label for each feature vector

          • We can measure how accurate a classifier is by how many
            feature vectors it classifies correctly

          • Accuracy = percentage of feature vectors correctly classified

                 – training accuracy = accuracy on training data

                 – test accuracy = accuracy on new data not used in training
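
A minimal sketch of the accuracy calculation, assuming the true labels and the classifier's predictions are stored as column vectors of the same length (the label values below are made up).

```matlab
% Hypothetical true labels and predicted labels for N = 5 feature vectors
truelabels  = [1; 1; 2; 2; 1];
predictions = [1; 2; 2; 2; 1];

% Comparing the vectors gives 1 where the prediction is correct and 0
% elsewhere; the mean of that vector is the fraction classified correctly
accuracy = 100 * mean(predictions == truelabels);
```

Here 4 of the 5 predictions match, so the accuracy is 80 percent. The same line gives training accuracy or test accuracy depending on which data the predictions were made on.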




        Some Notation


          • Training Data
                 – Dtrain = { [x(1), c(1)], [x(2), c(2)], ..., [x(N), c(N)] }
                 – N pairs of feature vectors and class labels


          • Feature Vectors and Class Labels:
                 – x(i) is the ith training data feature vector
                 – in MATLAB this could be the ith row of an N x d matrix

                 – c(i) is the class label of the ith feature vector
                 – in general, c(i) can take m different class values, e.g., c = 1,
                   c = 2, ..., c = m

                 – Let y be a new feature vector whose class label we do not know,
                   i.e., we wish to classify it.




        Example

        [Figure: scatter plot of training points labeled by class (1 or 2); horizontal axis: Feature 1, vertical axis: Feature 2]

        kNN Decision Boundary (k=1)


        [Figure: scatter plot of points labeled by class (1 or 2) with the k=1 decision boundary; horizontal axis: Feature 1, vertical axis: Feature 2]

        In general: the nearest-neighbor classifier produces piecewise linear
        decision boundaries

        K-Nearest Neighbor (kNN) Classifier

          • Find the k-nearest neighbors to y in Dtrain
                 – i.e., rank the feature vectors according to Euclidean distance
                 – select the k vectors which have the smallest distance to y


          • Classification
                 – ranking yields k feature vectors and a set of k class labels
                 – pick the class label which is most common in this set (“vote”)
                 – classify y as belonging to this class
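
The two steps above can be sketched for a single new feature vector as follows. This is an illustration under stated assumptions, not the knn.m required for the assignment: the data values are made up, only one vector `y` is classified, and the built-in `mode` is used for the vote (on a tie it returns the smallest label).

```matlab
% Hypothetical training data: 6 feature vectors in 2 dimensions, 2 classes
traindata   = [0 0; 1 0; 0 1; 5 5; 6 5; 5 6];
trainlabels = [1; 1; 1; 2; 2; 2];
y = [1 1];      % new feature vector to classify
k = 3;          % number of neighbors that vote

% Euclidean distance from y to every training vector
N = size(traindata, 1);
delta = traindata - repmat(y, N, 1);
distances = sqrt(sum(delta .* delta, 2));

% Rank the training vectors by distance and keep the labels of the k nearest
[~, order] = sort(distances);
neighborlabels = trainlabels(order(1:k));

% Vote: the most common label among the k neighbors
predicted = mode(neighborlabels);
```

In this example the three nearest training vectors all belong to class 1, so `predicted` is 1.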




        K-Nearest Neighbor (kNN) Classifier

          • Notes:
                 –  In effect, the classifier uses the nearest k feature vectors from
                   Dtrain to “vote” on the class label for y
                 – the single-nearest neighbor classifier is the special case of k=1
                 – for two-class problems, if we choose k to be odd (i.e., k=1, 3, 5,…)
                   then there will never be any “ties”
                 – “training” is trivial for the kNN classifier, i.e., we just use Dtrain as
                   a “lookup table” when we want to classify a new feature vector


          • Extensions of the Nearest Neighbor classifier
                 – weighted distances
                     • e.g., if some of the features are more important
                     • e.g., if some features are irrelevant
                 – fast search techniques (indexing) to find k-nearest neighbors in
                   d-dimensional space
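
One possible form of a weighted distance is sketched below: each squared difference is multiplied by a non-negative weight before summing. The weight vector and the data are hypothetical, chosen so that the second feature is treated as irrelevant (weight 0); the slides do not prescribe a particular weighting scheme.

```matlab
% Hypothetical feature vectors; feature 2 has a much larger scale
x = [1 10];
y = [2 30];

% Hypothetical weights: feature 1 counts fully, feature 2 is ignored
weights = [1 0];

delta = x - y;

% Weighted Euclidean distance: weight each squared difference, then sum
dist_weighted = sqrt(sum(weights .* delta .* delta));

% Ordinary (unweighted) Euclidean distance, for comparison
dist_plain = sqrt(delta * delta');
```

With these values the weighted distance is 1, while the unweighted distance is dominated by the second feature.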




        Assignment 2

          • Due Wednesday

          • 4 parts
                 1.    Plot classification data in two-dimensions
                 2.    Implement a nearest-neighbor classifier
                 3.    Plot the errors of a k-nearest-neighbor classifier
                 4.    Test the effect of the value k on the accuracy of the classifier




        Data Structure

          simdata1 =
              shortname: 'Simulated Data 1'
              numfeatures: 2
              classnames: [2x6 char]
              numclasses: 2
              description: [1x66 char]
              features: [200x2 double]
              classlabels: [200x1 double]
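
A small sketch of how a structure with these fields might be built and used. The structure `demo` and its contents are invented for illustration (4 points rather than 200, and no description field); only the field names follow the listing above.

```matlab
% Hypothetical structure with the same field names as simdata1
demo.shortname   = 'Demo Data';
demo.numfeatures = 2;
demo.classnames  = ['class1'; 'class2'];   % 2 x 6 char array, one name per row
demo.numclasses  = 2;
demo.features    = [0 0; 1 1; 5 5; 6 6];   % one feature vector per row
demo.classlabels = [1; 1; 2; 2];           % one label per feature vector

% Select the feature vectors that belong to class 2 via logical indexing
class2 = demo.features(demo.classlabels == 2, :);
n2 = size(class2, 1);
```

Selecting rows by class label in this way is one route to plotting each class in its own color.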




        Plotting Function
       function classplot(data, x, y);
        % function classplot(data, x, y);
        %
        % brief description of what the function does
        % ......
        %                  Your Name, CS 175, date
        %
        % Inputs
        % data: (a structure with the same fields as described above:
        %        your comment header should describe the structure explicitly)
        %        Note that if you are only using certain fields in the structure
        %        in the function below, you need only define these fields in the input comments

            -------- Your code goes here -------




        First simulated data set, simdata1

        Second simulated data set, simdata2

        Nearest Neighbor Classifier

                     function [class_predictions] = knn(traindata,trainlabels,k, testdata)
                      % function [class_predictions] = knn(traindata,trainlabels,k, testdata)
                      %
                      % a brief description of what the function does
                      % ......
                      %                  Your Name, CS 175, date
                      %
                      % Inputs
                      % traindata: an N1 x d matrix of feature data (the "memory" for kNN)
                      % trainlabels: an N1 x 1 vector of class labels for traindata
                      % k: an odd positive integer indicating the number of neighbors to use
                      % testdata: an N2 x d matrix of feature data for testing the knn classifier
                      %
                      % Outputs
                      % class_predictions: N2 x 1 vector of predicted class values

                          -------- Your code goes here -------




        Plotting k-NN Errors

                   function knn_plot(traindata,trainlabels,k,testdata,testlabels);
                   % function knn_plot(traindata,trainlabels,k,testdata,testlabels);
                   %
                   % Predicts class-labels for the data in testdata using the k nearest
                   % neighbors in traindata, and then plots the data (using the first
                   % 2 dimensions or first 2 features), displaying the data from each
                   % class in different colors, and overlaying circles on the points
                   % that were incorrectly classified.
                   %
                   % Inputs
                   % traindata: an N1 x d matrix of feature data (the "memory" for kNN)
                   % trainlabels: an N1 x 1 vector of class labels for traindata
                   % k: an odd positive integer indicating the number of neighbors to use
                   % testdata: an N2 x d matrix of feature data for testing the knn classifier
                   % testlabels: an N2 x 1 vector of class labels for testdata




        Accuracy of kNN Classifier as k is varied
          function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
          % function [errors] = knn_error_rates(traindata,trainlabels, testdata, testlabels,kmax,plotflag)
          %
          % a brief description of what the function does
          % ......
          %                  Your Name, CS 175, date
          %
          % Inputs
          % traindata: an N1 x d matrix of feature data (the "memory" for kNN)
          % trainlabels: an N1 x 1 vector of class labels for traindata
          % testdata: an N2 x d matrix of feature data for testing the knn classifier
          % testlabels: an N2 x 1 vector of class labels for testdata
          % kmax: an odd positive integer indicating the maximum number of neighbors
          % plotflag: (optional argument) if 1, the error-rates versus k is plotted,
          %                      otherwise no plot.
          %
          % Outputs
          % errors: r x 1 vector of error-rates on testdata, where r is the
          %           number of values of k that are tested.

              -------- Your code goes here -------


        Training Data and Test Data

          • Training data
                 – labeled data used to build a classifier
          • Test data
                 – new data, not used in the training process, to evaluate how well a
                   classifier does on new data


          • Memorization versus Generalization
                 – better training accuracy
                     • “memorizing” the training data
                 – better test accuracy
                     • “generalizing” to new data
                 – in general, we would like our classifier to perform well on new test
                   data, not just on training data
                     • i.e., we would like it to generalize well to new data
                     • test accuracy is more important than training accuracy




        Test Accuracy and Generalization

          • The accuracy of our classifier on new unseen data is a
            fair/honest assessment of the performance of our classifier

          • Why is training accuracy not good enough?
                 – Training accuracy is optimistic
                 – a classifier like nearest-neighbor can construct boundaries which
                   always separate all training data points, but which do not separate
                   new points
                     • e.g., what is the training accuracy of kNN, k = 1?
                 – A flexible classifier can “overfit” the training data
                     • in effect it just memorizes the training data, but does not learn
                        the general relationship between x and c


          • Generalization
                 – We are really interested in how our classifier generalizes to new
                   data
                 – test data accuracy is a good estimate of generalization
                   performance
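
The optimism of training accuracy can be seen in a small experiment, sketched here with made-up data. Assuming no two training vectors are identical, each training point is its own nearest neighbor (distance 0), so a k = 1 classifier evaluated on its own training data reproduces every label, even labels that look like noise.

```matlab
% Hypothetical training data; the third point sits next to a class-2 point
% but carries label 1, as a mislabeled or noisy point might
traindata   = [0 0; 1 0; 0.9 0.1; 5 5];
trainlabels = [1; 2; 1; 2];

N = size(traindata, 1);
predictions = zeros(N, 1);   % preallocate storage for the predictions

for j = 1:N
    % distances from training point j to all training points (itself included)
    delta = traindata - repmat(traindata(j, :), N, 1);
    distances = sqrt(sum(delta .* delta, 2));

    % the nearest point is point j itself, at distance 0
    [~, nearest] = min(distances);
    predictions(j) = trainlabels(nearest);
end

training_accuracy = 100 * mean(predictions == trainlabels);
```

The result is 100 percent on the training data, which says nothing about how the classifier would label a new point near [0.9 0.1].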

        Another Example

        [Figure: two-class data in a two-dimensional feature space (Feature 1 versus Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary between them]

        A More Complex Decision Boundary

        [Figure: the same two-class data in a two-dimensional feature space (Feature 1 versus Feature 2), with Decision Region 1 and Decision Region 2 separated by a more complex decision boundary]

        Example: The Overfitting Phenomenon




        [Figure: plot of Y versus X]

        A Complex Model

                                              Y = high-order polynomial in X

        [Figure: Y versus X, showing the curve of a high-order polynomial in X]

        The True (simpler) Model


                                                  Y = a X + b + noise
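
A minimal sketch of the overfitting phenomenon, under stated assumptions: the true model is taken to be Y = 2X + 1, the noise is a deterministic alternating +/-0.5 (chosen so the example is reproducible), and the "complex model" is the polynomial that passes exactly through all training points. None of these specifics come from the lecture; `fit_line`, `interpolate`, and `mse` are illustrative helper names.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def interpolate(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial through all n points (Lagrange form)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given points."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Assumed true model: Y = 2X + 1, observed with alternating noise of +/-0.5.
train_x = [float(i) for i in range(8)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Test points lie between the training points and follow the noise-free line
# (a simplification made for this sketch).
test_x = [i + 0.5 for i in range(7)]
test_y = [2 * x + 1 for x in test_x]

a, b = fit_line(train_x, train_y)
line = lambda x: a * x + b                           # simple model
poly = lambda x: interpolate(train_x, train_y, x)    # complex model

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
poly_train, poly_test = mse(poly, train_x, train_y), mse(poly, test_x, test_y)
print(line_train, line_test, poly_train, poly_test)
```

The interpolating polynomial reproduces the training data exactly (zero training error) but swings widely between the training points, so its test error is far larger than that of the straight line, which has nonzero training error but stays close to the true model.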




        How Overfitting affects Prediction

        [Figure: Predictive Error (vertical axis) versus Model Complexity (horizontal axis), with two curves: Error on Training Data and Error on Test Data. The low-complexity side is labeled "Underfitting", the high-complexity side is labeled "Overfitting", and the region between them is labeled "Ideal Range for Model Complexity".]

Linear Classifiers
        Decision Boundaries

          • What is a Classifier?
                 – A classifier is a mapping from feature space (a d-dimensional
                   vector) to the class labels {1, 2, … m}
                 – Thus, a classifier partitions the feature space into m decision
                   regions
                 – A line or curve separating the classes is a decision boundary
                     • in more than 2 dimensions this is a surface (e.g., a
                        hyperplane)



          • Linear Classifiers
                 – a linear classifier is a mapping which partitions feature space using
                   a linear function (a straight line, or a hyperplane)
                 – it is one of the simplest classifiers we can imagine
                      • “separate the two classes using a straight line in feature space”
                 – in 2 dimensions the decision boundary is a straight line




        2-Class Data with a Linear Decision Boundary

        [Figure: two-class data in a two-dimensional feature space (Feature 1 from -4 to 14, Feature 2 from -4 to 8), showing Decision Region 1, Decision Region 2, and a straight-line decision boundary]

        Non-Linearly Separable Data, with Decision Boundary


        [Figure: two-class data in a two-dimensional feature space (Feature 1 versus Feature 2), showing Decision Region 1, Decision Region 2, and the decision boundary between them]

        Convex Hull of a Set of Points

          •      Convex Hull of a set Q of points:
                 – Intuitively
                     • think of each point in Q as a nail sticking out from a 2d board
                     • the convex hull = the shape formed by a tight rubber band
                       that surrounds all the nails
                 – Formally: “the convex hull is the smallest convex polygon P for
                   which each point in Q is either on the boundary of P or in its
                   interior”
                         • (p.898, Cormen, Leiserson, and Rivest, Introduction to Algorithms)
                         • can be found (for n points) in O(n log n) time


          • Relation to Class Overlap
                 –    define convex hulls of data points D1 and D2 as P1 and P2
                 –    If P1 and P2 do not intersect => D1 and D2 are linearly separable
                 –    if P1 and P2 intersect, then we have overlap
                 –    If P1 and P2 intersect then D1 and D2 are not linearly separable
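
As an illustration of the O(n log n) bound, here is a minimal sketch of one standard convex hull method, Andrew's monotone chain. The lecture does not say which algorithm is intended, so this choice is an assumption, and the point set `Q` below is made up. Sorting dominates the running time; each point is pushed and popped at most once per pass.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))          # O(n log n) sort, duplicates removed
    if len(pts) <= 2:
        return pts

    lower = []
    for p in pts:                      # build the lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()                # drop points that do not make a left turn
        lower.append(p)

    upper = []
    for p in reversed(pts):            # build the upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

# Hypothetical point set: the corners of a square, one interior point,
# and one point on an edge.
Q = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(Q)
print(hull)
```

Only the four corners survive: the interior point (1, 1) and the edge point (1, 0) are not hull vertices. Testing whether two hulls P1 and P2 intersect is a separate step that this sketch does not cover.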




        Convex Hull Example

        [Figure: a set of points in the Feature 1 / Feature 2 plane, shown first on their own and then enclosed by their convex hull P1]

        Data from 2 Classes: Linearly Separable?

        [Figure: data from two classes in the Feature 1 / Feature 2 plane; the points of the second class are marked "x". Convex Hull P1 encloses the first class and Convex Hull P2 encloses the second class.]

        The 2 hulls intersect
        => data from each class are not linearly separable

        Different data that is linearly separable

        [Figure: data from two classes in the Feature 1 / Feature 2 plane; the points of the second class are marked "x". Convex Hull P1 and Convex Hull P2 are shown around the two classes.]

        Some Theory
          Let N be the number of data points

          Let d be the dimension of the data points

          Consider N points in general position and assume each point is labeled
          as belonging to class 1 or class 2

          There are 2^N possible labelings

          Let F(N, d) = the fraction of labelings of N points in d dimensions
                        that are linearly separable

          It can be shown that:

          F(N, d) = 1,                                                                if d > N-2

          F(N, d) = \frac{1}{2^{N-1}} \sum_{i=0}^{d} \frac{(N-1)!}{(N-1-i)! \, i!},   if N > d
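
The formula for F(N, d) can be evaluated directly. This is a small sketch in which `fraction_separable` is a name chosen for illustration; the term (N-1)! / [(N-1-i)! i!] is the binomial coefficient "N-1 choose i", computed here with `math.comb`.

```python
from math import comb

def fraction_separable(N, d):
    """Fraction of the 2^N labelings of N points (in general position,
    in d dimensions) that are linearly separable."""
    if d > N - 2:
        return 1.0                      # first case: every labeling is separable
    # second case: (1 / 2^(N-1)) * sum_{i=0}^{d} C(N-1, i)
    return sum(comb(N - 1, i) for i in range(d + 1)) / 2 ** (N - 1)

print(fraction_separable(3, 2))     # N <= d+1
print(fraction_separable(4, 2))     # 4 points in the plane
print(fraction_separable(22, 10))   # N = 2(d+1)
```

For N = 4 and d = 2 the value is 7/8: of the 16 labelings of four points in the plane, 14 are linearly separable, and the two that are not are the XOR-style labelings. At N = 2(d+1) the fraction is exactly 0.5.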



        Fraction of Labellings in d-space that are Linearly Separable

        [Figure: F(N,d) = fraction that are linearly separable (vertical axis, marked at 0.5 and 1) versus N/(d+1) (horizontal axis, marked at 0, 1, 2, 3), with curves for d = 1, d = 10, and d = infinity]

        Note that for N <= d+1,
        any labeling of N points in
        d-dimensions is linearly separable
        (e.g., N=3, d = 2 or N=50, d=100)

        A Linear Classifier in 2 Dimensions

              Let Feature 1 be called X
              Let Feature 2 be called Y

              A linear classifier is a linear function of X and Y,
              i.e., it computes f(X,Y) = aX + bY + c

              Here a, b, and c are the “weights” of the classifier

              Define the output of the linear classifier to be
                       T(f) = -1, if f <= 0
                       T(f) = +1, if f > 0


                 if f(X,Y) <= 0, the classifier produces a “-1” (Decision Region 1)

                 if f(X,Y) > 0, the classifier produces a “+1” (Decision Region 2)
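
The definitions of f and T translate almost line for line into code. This is a minimal sketch; the weights a = 1, b = -1, c = 0 and the sample points are example values picked for illustration.

```python
def f(X, Y, a, b, c):
    """Linear function of the two features."""
    return a * X + b * Y + c

def T(value):
    """Threshold function: -1 if value <= 0, +1 if value > 0."""
    return 1 if value > 0 else -1

def classify(X, Y, a, b, c):
    """Output of the linear classifier for the feature pair (X, Y)."""
    return T(f(X, Y, a, b, c))

a, b, c = 1, -1, 0                 # example weights (an assumption of this sketch)
print(classify(3, 1, a, b, c))     # f = 2 > 0, so +1 (Decision Region 2)
print(classify(1, 3, a, b, c))     # f = -2 <= 0, so -1 (Decision Region 1)
print(classify(2, 2, a, b, c))     # f = 0, so -1 by the definition of T
```

Points with f exactly equal to 0 are assigned to -1, because T is defined with "<=" on the -1 side.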




        Decision Boundaries for a 2d Linear Classifier

              Depending on whether f(X,Y) is > or < 0, the features (X,Y) get
              classified into class 1 or class 2

              Thus, f(X,Y) = 0 must define the decision boundary between class 1 and 2


              What is the equation for this decision boundary?

                      f(X,Y) = aX + bY + c = 0   OR    Y = -(aX + c)/b   (for b not equal to 0)

              Thus, defining a, b, and c automatically locates the decision boundary in
              X,Y space

              In summary:
                     - a classifier defines decision boundaries between classes
                     - for a linear classifier, this boundary is a line or a plane
                     - the equation of the line or plane is defined by the parameters of
                                the classifier
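
Solving aX + bY + c = 0 for Y gives Y = -(aX + c)/b whenever b is nonzero. A small sketch that computes points on the boundary and checks them against f; the function name `boundary_y` and the weights used are illustrative assumptions.

```python
def boundary_y(X, a, b, c):
    """Y coordinate of the decision boundary aX + bY + c = 0 at a given X."""
    if b == 0:
        # With b = 0 the boundary cannot be written as Y in terms of X.
        raise ValueError("b must be nonzero to solve for Y")
    return -(a * X + c) / b

a, b, c = 2, 4, -8                 # hypothetical weights
for X in (0, 2, 4):
    Y = boundary_y(X, a, b, c)
    # Every point on the boundary satisfies f(X, Y) = 0.
    print(X, Y, a * X + b * Y + c)
```

With these weights the boundary is the line Y = 2 - X/2, so X = 2 gives Y = 1.0 and f(2, 1) = 0.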


        An Example of a Linear Decision Boundary

        [Figure: two-class data with Decision Region 1 and Decision Region 2, both axes running from -4 to 14; the decision boundary is defined by a = 1, b = -1, c = 0]

        A Better Linear Decision Boundary

        [Figure: the same two-class data with Decision Region 1 and Decision Region 2, both axes running from -4 to 14; the decision boundary is defined by a = 1, b = 1, c = 0]

        The Perceptron Classifier (for 2 features)

        [Diagram: the inputs X, Y, and a constant 1 are multiplied by the weights w1, w2, and w3 and combined into the weighted sum of the inputs, f = w1 X + w2 Y + w3. The sum f passes through the threshold function T(f), whose output in {-1, +1} is the class decision.]

        Note: weights w1, w2, w3 are the same as a, b, c in the
        previous slides, i.e., f = aX + bY + c

        Perceptrons

          • Perceptron = a linear classifier
                 – The w’s are the weights (denoted as a, b, c earlier)
                    • real-valued constants (can be positive or negative)
                 – Define an additional constant input “1” (allows an intercept in
                   decision boundary)



          • A perceptron calculates 2 quantities:
                 – 1. A weighted sum of the input features
                 – 2. This sum is then thresholded by the T function



          • A simple artificial model of human neurons
                         • weights = “synapses”
                         • threshold = “neuron firing”




        Notation

          • Inputs:
                 – x_1, x_2, ..., x_d, x_{d+1}
                 – x_1, x_2, ..., x_{d-1}, x_d are the values of the d features
                 – x_{d+1} = 1 (a constant input)
                 – x = (x_1, x_2, ..., x_d, x_{d+1})


          • Weights:
                 – w_1, w_2, ..., w_d, w_{d+1}
                 – we have d+1 weights
                 – one for each feature + one for the constant
                 – w = (w_1, w_2, ..., w_d, w_{d+1})




        Perceptron Operation

          • Equations of operation:

                o[x_1, x_2, ..., x_{d-1}, x_d] =  1   (if w_1 x_1 + ... + w_{d+1} x_{d+1} > 0)
                                               = -1   (otherwise)

                Note that
                       w = (w_1, ..., w_{d+1}), the “weight vector” (row vector, 1 x d+1)

                    and x = (x_1, ..., x_{d+1}), the “feature vector” (row vector, 1 x d+1)

                 =>             w_1 x_1 + w_2 x_2 + ... + w_{d+1} x_{d+1} = w . x’

                and w . x’ is the “vector inner product” (w*x' or sum(w.*x) in MATLAB;
                w.*x on its own gives only the element-by-element products)

        Vector Inner Product

                Note that

                       w . x’ = (w_1, ..., w_{d+1}) (x_1, x_2, ..., x_d, x_{d+1})’

                              = w_1 x_1 + w_2 x_2 + ... + w_{d+1} x_{d+1}

                Here x’ is the transpose of the row vector x (it becomes a column vector)

        Perceptron Decision Boundary

          • Equations of operation (in vector form):

                 o(x_1, x_2, ..., x_d, x_{d+1}) =  1   (if w . x’ > 0)
                                                = -1   (otherwise)

                 The perceptron represents a hyperplane decision surface
                 in d-dimensional space
                      e.g., a line in 2d, a plane in 3d, etc

                 The equation of the hyperplane is
                                     w . x’ = 0

                 This is the equation for points in x-space that are on the
                 boundary
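
The vector form of the perceptron can be sketched for any number of features d. The function name `perceptron_output` and the example weight vector are illustrative assumptions. The constant input 1 is appended to the d feature values before the inner product is taken.

```python
def perceptron_output(w, features):
    """Return +1 if w . x' > 0 and -1 otherwise, where x is the feature
    vector extended with the constant input 1."""
    x = list(features) + [1]                     # x_{d+1} = 1
    if len(w) != len(x):
        raise ValueError("w must have one weight per feature plus one for the constant")
    inner = sum(wi * xi for wi, xi in zip(w, x)) # vector inner product w . x'
    return 1 if inner > 0 else -1

w = (1, -1, 0)                                   # example weights for d = 2
print(perceptron_output(w, (3, 1)))              # 3 - 1 > 0, so +1
print(perceptron_output(w, (1, 3)))              # 1 - 3 < 0, so -1
print(perceptron_output(w, (2, 2)))              # on the boundary, so -1
```

Points that lie exactly on the hyperplane w . x’ = 0 fall into the "otherwise" case and get the output -1.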

        Example of Perceptron Decision Boundary

               w = (w_1, w_2, w_3)
                 = (1, -1, 0)

               Decision boundary:
                    w . x’ = 0
                    => 1·x_1 - 1·x_2 + 0·1 = 0
                    => x_1 - x_2 = 0
                    => x_1 = x_2
                    This is the equation for the decision boundary

               Decision region -1:
                    w . x’ < 0
                    => x_1 - x_2 < 0
                    => x_1 < x_2

               Decision region +1:
                    w . x’ > 0
                    => x_1 - x_2 > 0
                    => x_1 > x_2

        [Figure: the (x_1, x_2) plane divided by the line w . x’ = 0, with w . x’ < 0 on the side where x_1 < x_2 and w . x’ > 0 on the side where x_1 > x_2]

        Representational Power of Perceptrons

          • What mappings can a perceptron represent perfectly?
                 –    A perceptron is a linear classifier
                 –    thus it can represent any mapping that is linearly separable
                 –    some Boolean functions like AND (on left)
                 –    but not Boolean functions like XOR (on right)




                          0                   x                  0             x


                           0                  0                  x               0
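
The AND versus XOR contrast can be illustrated by a brute-force search over a small grid of integer weights. This is only a sketch: the grid, the 0/1 input encoding, and the -1/+1 target encoding are assumptions made here. Failing to find weights on a finite grid is not a proof, but for XOR it agrees with the fact that the data are not linearly separable.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def output(w, x1, x2):
    """Perceptron output for two inputs plus the constant input 1."""
    return 1 if w[0] * x1 + w[1] * x2 + w[2] > 0 else -1

def find_weights(targets, grid):
    """Return the first weight triple on the grid that reproduces all targets, or None."""
    for w in product(grid, repeat=3):
        if all(output(w, x1, x2) == t for (x1, x2), t in zip(INPUTS, targets)):
            return w
    return None

GRID = [-2, -1, 0, 1, 2]           # hypothetical search grid
AND = [-1, -1, -1, 1]              # +1 only for input (1, 1)
XOR = [-1, 1, 1, -1]               # +1 when exactly one input is 1

and_w = find_weights(AND, GRID)
xor_w = find_weights(XOR, GRID)
print(and_w, xor_w)
```

A weight vector such as (1, 1, -1) represents AND, since the weighted sum is positive only when both inputs are 1. For XOR the search returns None, as expected for a mapping that no single line can separate.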




        Summary


          • Review of Assignment 1

          • K-nearest-neighbor classifiers
                 – Basic concepts
                 – Assignment 2


          • Training and test accuracy

          • Linear classifiers
                 – Perceptron classifier


          • Next lecture
                 – How we can learn the weights of a perceptron



