# Lecture 13 – Perceptrons and Neural Networks - CUNY

```
Lecture 13 – Perceptrons

Machine Learning
Last Time
• Hidden Markov Models
– Sequential modeling represented in a Graphical
Model

Today
• Perceptrons
– Neural Networks
– aka Multilayer Perceptron Networks
– But more accurately: Multilayer Logistic
Regression Networks

Review: Fitting Polynomial Functions
• Fitting nonlinear 1-D functions
• Polynomial: f(x; θ) = θ0 + θ1 x + θ2 x^2 + … + θD x^D

• Risk (squared error): R(θ) = (1/N) Σi (yi − f(xi; θ))^2

Review: Fitting Polynomial Functions
• Order-D polynomial regression for 1D
variables is the same as D-dimensional linear
regression
• Extend the feature vector from a scalar: x → [1, x, x^2, …, x^D]

• More generally: x → [ϕ0(x), ϕ1(x), …, ϕD(x)] for basis functions ϕd

Neuron inspires Regression
• Graphical Representation of linear regression
– McCulloch-Pitts Neuron
– Note: Not a graphical model

Neuron inspires Regression
• Edges multiply the signal (xi) by some weight (θi).
• Nodes sum inputs
• Equivalent to Linear Regression

Introducing Basis functions
• Graphical representation of feature extraction

• Edges multiply the signal by a weight.
• Nodes apply a function, ϕd.

Extension to more features
• Graphical representation of feature extraction

Combining function
• How do we construct the neural output?

Linear Neuron

Combining function
• Sigmoid function or Squashing function: g(z) = 1 / (1 + exp(−z))

Logistic Neuron

Classification using the same metaphor

Logistic Neuron optimization
• Minimizing R(θ) is more difficult

Bad News: There is no “closed-form” solution.

Good News: It’s convex.

Aside: Convex Regions
• Convex: for any pair of points xa and xb within
a region, every point xc on a line between xa
and xb is in the region

Aside: Convex Functions
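• Convex: for any pair of points xa and xb, the line segment between
(xa, f(xa)) and (xb, f(xb)) lies on or above the function:
f(λxa + (1−λ)xb) ≤ λf(xa) + (1−λ)f(xb) for all λ in [0, 1]
• Equivalently, the region above the graph of f is a convex region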

Aside: Convex Functions
• For a convex function, every local minimum is a global minimum
(and a strictly convex function has at most one minimizer)
• How does this help us?
• (nearly) Guaranteed optimality of Gradient Descent

Gradient Descent
• The gradient ∇R(θ) is defined (though we can’t solve ∇R(θ) = 0
directly)
• The gradient points in the direction of fastest increase
• To minimize R, move in the opposite direction: θ ← θ − η ∇R(θ),
with step size η

• Initialize Randomly
• Update with small steps
• (nearly) guaranteed to converge to the
minimum

• Initialize Randomly
• Update with small steps
• Can oscillate if η is too large

• Initialize Randomly
• Update with small steps
• Can stall if the gradient ∇R(θ) is ever 0 at a point that is not the minimum

Back to Neurons

Linear Neuron

Logistic Neuron

Perceptron
• Classification squashing function: a hard threshold (step function)
in place of the sigmoid

Perceptron

• Strictly classification error

Classification Error
• Only count errors when a classification is
incorrect.

The logistic neuron has greater than zero error even on correct
classifications.

Perceptrons use strictly classification error.

Perceptron Error
• Can’t do gradient descent on this: the classification error is a step
function of θ, so its gradient is zero wherever it is defined.

Perceptron Loss
• With classification loss, the risk is a step function of θ: it only
counts the misclassified points.
• Define Perceptron Loss.
– Loss calculated for each misclassified data point, assuming labels
yi in {−1, +1}: R(θ) = −(1/N) Σ yi θᵀxi, summed over misclassified xi
– Now piecewise linear risk rather than step function

Perceptron vs. Logistic Regression
• Logistic Regression has a hard
time eliminating all errors
– Errors: 2

• Perceptrons often do better.
Increased importance of
errors.
– Errors: 0

• Rather than computing the gradient over all points, and then updating:

• Update θ for each misclassified point during training

• Update rule (assuming labels yi in {−1, +1}): θ ← θ + η yi xi

Online Perceptron Training
• Online Training
– Update weights for each data point.
• Iterate over the points xi:
– If xi is correctly classified: leave θ unchanged
– Else: update θ ← θ + η yi xi (labels yi in {−1, +1})
• Theorem: If xi in X are linearly separable, then
this process will converge to a θ* which leads
to zero error in a finite number of steps.

Linearly Separable
• Two classes of points are linearly separable
iff there exists a line (in more than two dimensions, a hyperplane)
such that all the points of one class fall on one side of it, and all
the points of the other class fall on the other side

Next
• Multilayer Neural Networks

```
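The gradient descent slides say to update with small steps and warn that the iterates can oscillate if η is too large. The sketch below illustrates that behaviour under an assumption that is not in the lecture: a toy risk R(θ) = θ², chosen only because its gradient 2θ is easy to check by hand. The function name `descend` and the three step sizes are illustrative choices.

```python
def descend(eta, theta0=1.0, steps=6):
    """Run gradient descent on the toy risk R(theta) = theta**2.

    The gradient of this risk is 2 * theta, so each update is
    theta <- theta - eta * 2 * theta.
    """
    theta = theta0
    path = [theta]
    for _ in range(steps):
        grad = 2.0 * theta          # gradient of theta**2
        theta = theta - eta * grad  # move against the gradient
        path.append(theta)
    return path


small = descend(eta=0.1)      # each step shrinks theta toward the minimum at 0
large = descend(eta=0.9)      # each step overshoots 0, so the sign flips
too_large = descend(eta=1.1)  # overshoots by more than it started with

for name, path in [("small", small), ("large", large), ("too_large", too_large)]:
    print(name, [round(t, 3) for t in path])
```

On this particular risk each update multiplies θ by (1 − 2η), so the three regimes can be read directly off that factor. Other risks will have different thresholds for η.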
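For the logistic neuron the slides note that there is no closed-form minimizer but that the risk is convex, which is why gradient descent is used. The sketch below is one way this might look. It assumes the cross-entropy risk with labels in {0, 1} and a bias feature fixed to 1; the transcript does not show the lecture's own formula. The helper names, toy data, step size, and seed are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic squashing function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(theta, x):
    return sum(t * xd for t, xd in zip(theta, x))


def risk(theta, xs, ys):
    """Average cross-entropy loss (an assumed form of R(theta)); labels in {0, 1}."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(dot(theta, x)), 1e-12), 1.0 - 1e-12)  # keep log finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)


def gradient(theta, xs, ys):
    """Gradient of the risk above: the average of (sigmoid(theta . x) - y) * x."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        err = sigmoid(dot(theta, x)) - y
        for d, xd in enumerate(x):
            grad[d] += err * xd
    return [g / len(xs) for g in grad]


def gradient_descent(xs, ys, eta=0.5, steps=200, seed=0):
    rng = random.Random(seed)
    theta = [rng.uniform(-0.1, 0.1) for _ in xs[0]]  # initialize randomly
    history = [risk(theta, xs, ys)]
    for _ in range(steps):
        grad = gradient(theta, xs, ys)
        theta = [t - eta * g for t, g in zip(theta, grad)]  # small step downhill
        history.append(risk(theta, xs, ys))
    return theta, history


# Toy 1-D data (first feature is the bias); the classes overlap on purpose.
XS = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
YS = [0, 0, 0, 1, 1, 1]

theta, history = gradient_descent(XS, YS)
print("risk before:", history[0], "risk after:", history[-1])
```

The overlapping classes keep the minimizer finite, so the risk settles instead of decreasing forever as it would on separable data.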
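The online perceptron training procedure from the slides can be sketched as follows. Several details are assumptions, since the transcript has lost the update equations: labels in {−1, +1}, a bias feature fixed to 1, zero initialization in place of random initialization, a point exactly on the boundary counted as misclassified, and the standard update θ ← θ + η·y·x. The AND-style dataset and all function names are illustrative.

```python
def score(theta, x):
    """Linear activation theta . x before the hard threshold."""
    return sum(t * xd for t, xd in zip(theta, x))


def is_misclassified(theta, x, y):
    """True when the label y in {-1, +1} disagrees with the sign of the score.

    A score of exactly 0 is treated as a mistake (an assumption of this sketch).
    """
    return y * score(theta, x) <= 0


def count_errors(theta, xs, ys):
    return sum(1 for x, y in zip(xs, ys) if is_misclassified(theta, x, y))


def train_perceptron(xs, ys, eta=1.0, max_epochs=100):
    """Online training: visit the points one at a time, update only on mistakes."""
    theta = [0.0] * len(xs[0])
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, y in zip(xs, ys):
            if is_misclassified(theta, x, y):
                # Perceptron update: nudge theta toward classifying x as y.
                theta = [t + eta * y * xd for t, xd in zip(theta, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: zero training error
            return theta, epoch
    return theta, max_epochs


# Linearly separable toy data: logical AND, with the bias feature first.
XS = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
YS = [-1, -1, -1, 1]

theta, epochs = train_perceptron(XS, YS)
print("theta:", theta, "epochs:", epochs, "errors:", count_errors(theta, XS, YS))
```

Because the toy data is linearly separable, the convergence theorem quoted in the slides applies and the loop ends with zero training errors. On data that is not separable the loop would stop only at `max_epochs`.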