Neural Networks

```					Neural Networks

Teacher:          Assistant:
Elena Marchiori     Kees Jong
R4.47             S2.22
elena@cs.vu.nl    cjong@cs.vu.nl
Neural Net types

Basics of neural network theory and practice for
supervised and unsupervised learning.

Most popular Neural Network models:
• architectures
• learning algorithms
• applications
Neural Networks

• A NN is a machine learning approach inspired by the way
in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form
of examples.

– Inter neuron connection strengths (weights) are used to
store the acquired information (the training examples).

– During the learning process the weights are modified
in order to model the particular learning task correctly
on the training examples.
Connectionism
• Connectionist techniques (a.k.a. neural networks) are inspired
by the strong interconnectedness of the human brain.
• Neural networks are loosely modeled after the biological
processes involved in cognition:
1. Information processing involves many simple elements
called neurons.
2. Signals are transmitted between neurons using connecting
links.
3. Each link has a weight that controls the strength of its signal.
4. Each neuron applies an activation function to the input that it
receives from other neurons. This function determines its
output.
What is a neural network

• A NN is a machine learning approach inspired by the way in
which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of
examples.

– Inter neuron connection strengths (weights) are used to
store the acquired information (the training examples).

– During the learning process the weights are modified in
order to model the particular learning task correctly on the
training examples.
What is a Neural Network

• A neural network is characterized by three things:
1. Its architecture: the pattern of nodes and
connections between them.
2. Its learning algorithm, or training method: the
method for determining the weights of the
connections.
3. Its activation function: the function that produces
an output based on the input values received by a
node.
Learning
• Supervised Learning
– Recognizing hand-written digits, pattern recognition,
regression.
– Labeled examples
(input , desired output)
– Neural Network models: perceptron, feed-forward, radial
basis function, support vector machine.
• Unsupervised Learning
– Find similar groups of documents in the web, content
– Unlabeled examples
(different realizations of the input alone)
– Neural Network models: self organizing maps, Hopfield
networks.
Network architectures

• Three different classes of network architectures

– single-layer feed-forward
– multi-layer feed-forward
(in both feed-forward classes, neurons are organized in acyclic layers)
– recurrent

• The architecture of a neural network is linked with the learning
algorithm used to train it.
Single Layer Feed-forward

[Figure: an input layer of source nodes feeding an output layer of neurons]
Multi layer feed-forward

[Figure: a 3-4-2 network with an input layer, one hidden layer, and an output layer]
Recurrent network
Recurrent network with hidden neuron(s): the unit delay operator z^{-1} implies
a dynamic system.

[Figure: recurrent network with input, hidden, and output units and z^{-1} unit delays]
Neural Network Architectures
The Neuron
• The neuron is the basic information processing unit of a NN. It consists
of:
1 A set of connecting links (synapses), each characterized by a weight:
w_1, w_2, …, w_m
2 An adder function (linear combiner) which computes
the weighted sum of the inputs:
u = \sum_{j=1}^{m} w_j x_j
3 An activation function (squashing function) for limiting
the amplitude of the output of the neuron:
y = \varphi(u + b)
The Neuron
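[Figure: model of a neuron. The input signals x_1, x_2, …, x_m are multiplied by the synaptic weights w_1, w_2, …, w_m; the summing function combines them with the bias b into the local field v; the activation function \varphi(·) maps v to the output y.]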
Bias as extra input
• Bias is an external parameter of the neuron. It can be
modeled by adding an extra input x_0 = +1 with weight w_0 = b:
v = \sum_{j=0}^{m} w_j x_j
[Figure: the neuron model with the bias drawn as an extra input x_0 = +1 weighted by w_0 = b]
Bias of a Neuron

• Bias b has the effect of applying an affine transformation to u:
v = u + b, where u = \sum_{j=1}^{m} w_j x_j
• v is the induced field of the neuron.
[Figure: plot of v against u]
Dimensions of a Neural Network

• Various types of neurons

• Various network architectures

• Various learning algorithms

• Various applications
Face Recognition

90% accurate learning head pose, and recognizing 1-of-20 faces
Handwritten digit recognition
A Multilayer Net for XOR
The XOR problem

A single-layer neural network cannot solve the XOR problem.
Input            Output
00      ->       0
01      ->       1
10      ->       1
11      ->       0
To see why this is true, we can try to express the problem as a
linear equation: aX + bY = Z
a·0 + b·0 = 0
a·0 + b·1 = 1 -> b = 1
a·1 + b·0 = 1 -> a = 1
a·1 + b·1 = 0 -> a = -b
The last row contradicts a = b = 1, so no such a and b exist.
But adding a third bit makes it doable. Here the third bit is the
AND of the first two.

Input             Output
000      ->       0
010      ->       1
100      ->       1
111      ->       0
We can try to express the problem as a linear equation: aX + bY +
cZ = W
a·0 + b·0 + c·0 = 0
a·0 + b·1 + c·0 = 1 -> b = 1
a·1 + b·0 + c·0 = 1 -> a = 1
a·1 + b·1 + c·1 = 0 -> a + b + c = 0 -> 1 + 1 + c = 0 -> c = -2
So the equation X + Y - 2Z = W solves the problem.
Hidden Units

• Hidden units are a layer of nodes that are situated
between the input nodes and the output nodes.
• Hidden units allow a network to learn non-linear
functions.
• The hidden units allow the net to represent
combinations of the input features.
• Given too many hidden units, however, a net will
simply memorize the input patterns.
• Given too few hidden units, the network may not be
able to represent all of the necessary generalizations.
Backpropagation Nets

• Backpropagation networks are among the most popular and
widely used neural networks because they are relatively simple
and powerful.
• Backpropagation was one of the first general techniques
developed to train multilayer networks, which do not have many
of the inherent limitations of the earlier, single-layer neural nets
criticized by Minsky and Papert.
• Backpropagation networks use a gradient descent method to
minimize the total squared error of the output.
• A backpropagation net is a multilayer, feedforward network that
is trained by backpropagating the errors using the generalized
delta rule.
Training a backpropagation net

• Feedforward training of input patterns
– Each input node receives a signal, which is broadcast to all of
the hidden units.
– Each hidden unit computes its activation, which is broadcast to
all of the output nodes.
• Backpropagation of errors
– Each output node compares its activation with the desired
output.
– Based on this difference, the error is propagated back to all
previous nodes.
– The weights of all links are updated simultaneously based on
the errors that were propagated backwards.
Terminology
Input vector: X = (x_1, x_2, ..., x_n)
Target vector: Y = (y_1, y_2, ..., y_m)
Input unit: X_i
Hidden unit: Z_i
Output unit: Y_i
v_ij: weight on link from X_i to Z_j
w_ij: weight on link from Z_i to Y_j
v_0j: bias on Z_j
w_0j: bias on Y_j
δ_i: error correction term for output unit Y_i
α: learning rate
The Feedforward Stage

1. Initialize the weights with small, random values.
2. While the stopping condition is not true:
   For each training pair (input/output):
   – Each input unit broadcasts its value to all of the hidden
     units.
   – Each hidden unit sums its input signals and applies its
     activation function to compute its output signal.
   – Each hidden unit sends its signal to the output units.
   – Each output unit sums its input signals and applies its activation
     function to compute its output signal.
Backpropagation

1. Each output unit updates its weights and bias:
w_ij(new) = w_ij(old) + Δw_ij
2. Each hidden unit updates its weights and bias:
v_ij(new) = v_ij(old) + Δv_ij

Check stopping conditions.
Each training cycle is called an epoch. Typically, many
epochs are needed (often thousands). The weights
are updated in each cycle.
The learning rate
w_ij(new) = α δ_j z_i + w_ij(old)
• The learning rate, α, controls how big the weight
changes are for each iteration.
•   Ideally, the learning rate should be infinitesimally
small, but then learning is very slow.
•   If the learning rate is too high then the system can
suffer from severe oscillations.
•   You want the learning rate to be as large as possible
(for fast learning) without resulting in oscillations.
(0.02 is common)
An Example: One Layer
Multi-Layer
The numbers

I1    1
I2    1
W13   0.1
W14   -0.2
W23   0.3
W24   0.4
W35   0.5
W45   -0.4
θ3    0.2
θ4    -0.3
θ5    0.4
Output!
Where N = output of node
O = activation function
θ = threshold value of node
Backpropagating!
How long should you train the net?

• The goal is to achieve a balance between correct responses for
the training patterns and correct responses for new patterns.
(That is, a balance between memorization and generalization.)
• If you train the net for too long, then you run the risk of
overfitting to the training data.
• In general, the network is trained until it reaches an acceptable
accuracy (e.g., 95% correct).
• One approach to avoid overfitting is to break up the data into a
training set and a held-out test set. The weight adjustments are
based on the training set only. At regular intervals, however, the test
set is evaluated to see if the error is still decreasing. When the
error begins to increase on the test set, training is terminated.

```
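The worked example in the slides (inputs I1 = I2 = 1, weights W13 to W45, thresholds θ3 to θ5) lost its formula images, so the following is only a sketch of how those numbers could be pushed through the network and adjusted once with the generalized delta rule. It assumes a logistic sigmoid activation and that each threshold θ is added to the weighted sum as a bias. The target output of 1 is hypothetical, and the learning rate of 0.02 is simply the value the slides call common. None of these choices is confirmed by the source.

```python
import math

def sigmoid(x):
    """Logistic activation (an assumption; the slides do not name one)."""
    return 1.0 / (1.0 + math.exp(-x))

# Numbers taken from the slide "The numbers".
inputs = {1: 1.0, 2: 1.0}
weights = {(1, 3): 0.1, (1, 4): -0.2, (2, 3): 0.3,
           (2, 4): 0.4, (3, 5): 0.5, (4, 5): -0.4}
theta = {3: 0.2, 4: -0.3, 5: 0.4}

def forward(inputs, weights, theta):
    """Feedforward stage: hidden nodes 3 and 4, then output node 5."""
    out = dict(inputs)
    for node, sources in ((3, (1, 2)), (4, (1, 2)), (5, (3, 4))):
        net = sum(weights[(s, node)] * out[s] for s in sources) + theta[node]
        out[node] = sigmoid(net)
    return out

def backprop_step(inputs, weights, theta, target, alpha):
    """One weight update: w_ij(new) = w_ij(old) + alpha * delta_j * z_i."""
    out = forward(inputs, weights, theta)
    # Error term of the output unit (squared error, sigmoid derivative).
    delta = {5: (target - out[5]) * out[5] * (1.0 - out[5])}
    # Error terms of the hidden units, propagated back through the old weights.
    for h in (3, 4):
        delta[h] = delta[5] * weights[(h, 5)] * out[h] * (1.0 - out[h])
    new_weights = {(i, j): w + alpha * delta[j] * out[i]
                   for (i, j), w in weights.items()}
    new_theta = {j: t + alpha * delta[j] for j, t in theta.items()}
    return new_weights, new_theta

out = forward(inputs, weights, theta)
# target=1.0 is a hypothetical desired output, not a value from the slides.
new_weights, new_theta = backprop_step(inputs, weights, theta,
                                       target=1.0, alpha=0.02)
new_out = forward(inputs, new_weights, new_theta)
print("output before update:", out[5])
print("output after update: ", new_out[5])
```

Under these assumptions one update moves the output slightly toward the target. Repeating the step over many epochs is what the slides describe as training.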
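The XOR slides note that the problem becomes solvable once a third bit, the AND of the two inputs, is available, and that hidden units let a net represent such combinations of input features. As a minimal sketch of that idea, the hand-wired network below uses one hidden threshold unit to compute the AND bit and then applies the slide's equation X + Y - 2Z = W. The hidden unit's weights (1, 1) and bias (-1.5), and the function names, are illustrative assumptions rather than values from the slides.

```python
def step(v):
    """Threshold activation: 1 if the local field is positive, else 0."""
    return 1 if v > 0 else 0

def hidden_and(x, y):
    """Hidden unit computing Z = X AND Y (weights 1, 1 and bias -1.5 are assumed)."""
    return step(1 * x + 1 * y - 1.5)

def xor_net(x, y):
    """Output unit applying the slide's equation W = X + Y - 2Z."""
    z = hidden_and(x, y)
    return x + y - 2 * z

# Without the hidden unit, a = b = 1 fits the first three rows but not the last.
linear_only = {(x, y): 1 * x + 1 * y for x in (0, 1) for y in (0, 1)}
with_hidden = {(x, y): xor_net(x, y) for x in (0, 1) for y in (0, 1)}

for (x, y), w in with_hidden.items():
    print(x, y, "->", w)
```

The single linear combination gives 2 for input 11, whereas the version with the hidden unit reproduces the XOR table.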