CPN Distance/Similarity Functions
In the hidden layer, the neuron whose weight vector is
most similar to the current input vector is the “winner.”
There are different ways of defining such maximal
similarity, for example:

(1) Maximal cosine similarity (same as net input):

    s(w, x) = w · x

(2) Minimal Euclidean distance:

    d(w, x) = Σᵢ (wᵢ − xᵢ)²

    (no square root necessary for determining the winner)

November 28, 2012         Introduction to Artificial Intelligence     1
Lecture 18: Neural Network Application Design I
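The two winner criteria can be sketched in a few lines of NumPy; this is an illustrative sketch, not code from the lecture:

```python
import numpy as np

def winner_by_net_input(W, x):
    # Maximal net input: dot product of each weight row with x.
    return int(np.argmax(W @ x))

def winner_by_distance(W, x):
    # Minimal squared Euclidean distance; the square root is
    # monotonic, so omitting it leaves the argmin unchanged.
    return int(np.argmin(np.sum((W - x) ** 2, axis=1)))
```

For weight vectors of equal length the two criteria pick the same winner; in general they can differ.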
Counterpropagation – Euclidean Distance
Example of competitive learning with three hidden neurons:

[Figure: a sequence of scatter plots showing three clusters of sample points (“+”, “o”, and “x”) and the weight vectors of hidden neurons 1, 2, and 3; frame by frame, each weight vector moves step by step toward the cluster whose inputs it wins.]
Counterpropagation – Euclidean Distance

… and so on,
possibly with reduction of the learning rate…
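The update illustrated by these frames is the standard winner-take-all rule: move only the winning weight vector a fraction of the way toward the input. A minimal sketch (the learning-rate schedule is chosen here for illustration):

```python
import numpy as np

def competitive_step(W, x, lr):
    # Winner-take-all: only the closest weight vector moves,
    # a fraction lr of the way toward the input x.
    winner = int(np.argmin(np.sum((W - x) ** 2, axis=1)))
    W[winner] += lr * (x - W[winner])
    return winner

rng = np.random.default_rng(0)
W = rng.random((3, 2))                  # three hidden neurons, 2-D inputs
for t, x in enumerate(rng.random((200, 2)), start=1):
    competitive_step(W, x, lr=0.5 / t)  # learning rate reduced over time
```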

Counterpropagation – Euclidean Distance
Example of competitive learning with three hidden neurons:

[Figure: the final frame of the sequence, with each weight vector (1, 2, 3) settled near the center of one of the three clusters.]
The Counterpropagation Network
After the first phase of training, each hidden-layer
neuron is associated with a subset of input vectors.
The training process minimized the average angle
difference or Euclidean distance between the weight
vectors and their associated input vectors.
In the second phase of training, we adjust the
weights in the network’s output layer in such a way
that, for any winning hidden-layer unit, the network’s
output is as close as possible to the desired output for
the winning unit’s associated input vectors.
The idea is that when we later use the network to
compute functions, the output of the winning hidden-
layer unit is 1, and the output of all other hidden-layer
units is 0.
Counterpropagation – Euclidean Distance
At the end of the output-layer learning process, the outputs of
the network are at the center of gravity of the desired outputs
associated with the winner neuron.

[Figure: the trained configuration, with each weight vector (1, 2, 3) at the center of gravity of its cluster.]

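This center-of-gravity property can be verified in a tiny sketch: if the winner's output weights are repeatedly moved toward the desired outputs with a learning rate of 1/t, they converge exactly to the mean of those outputs (the target values below are made up):

```python
import numpy as np

# Made-up desired outputs for the input vectors won by one hidden unit:
targets = np.array([[1., 0.], [0., 1.], [1., 1.]])

V = np.zeros(2)                  # output weights of that hidden unit
for t, y in enumerate(targets, start=1):
    V += (y - V) / t             # learning rate 1/t: exact running mean
```

After the loop, V equals targets.mean(axis=0), the center of gravity of the desired outputs.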

Neural Network
Application Design

NN Application Design
Now that we have gained some insight into the theory of
artificial neural networks, how can we design networks
for particular applications?
Designing NNs is basically an engineering task.
As we discussed before, for example, there is no
formula that would allow you to determine the optimal
number of hidden units in a BPN for a given task.

NN Application Design
We need to address the following issues for a
successful application design:
• Choosing an appropriate data representation
• Performing an exemplar analysis
• Training the network and evaluating its performance

We are now going to look into each of these topics.

Data Representation
• Most networks process information in the form of
input pattern vectors.
• These networks produce output pattern vectors
that are interpreted by the embedding application.
• All networks process one of two types of signal
components: analog (continuously variable) signals
or discrete (quantized) signals.
• In both cases, signals have a finite amplitude; their
amplitude has a minimum and a maximum value.

Data Representation
[Figure: two signal traces between a minimum and a maximum amplitude: an analog signal varying continuously, and a discrete signal that only takes quantized values.]

Data Representation
The main question is:
How can we appropriately capture these signals and
represent them as pattern vectors that we can feed
into the network?
We should aim for a data representation scheme that
maximizes the ability of the network to detect (and
respond to) relevant features in the input pattern.
Relevant features are those that enable the network to
generate the desired output pattern.

Data Representation

Similarly, we also need to define a set of desired
outputs that the network can actually produce.
Often, a “natural” representation of the output data
turns out to be impossible for the network to produce.
We are going to consider internal representation
and external interpretation issues as well as specific
methods for creating appropriate representations.

Internal Representation Issues
As we said before, in all network types, the amplitude
of input signals and internal signals is limited:
• analog networks: values usually between 0 and 1
• binary networks: only values 0 and 1 allowed
• bipolar networks: only values –1 and 1 allowed
Without this limitation, patterns with large amplitudes
would dominate the network’s behavior.
A disproportionately large input signal can activate a
neuron even if the relevant connection weight is very
small.
External Interpretation Issues
From the perspective of the embedding application,
we are concerned with the interpretation of input and
output signals.
These signals constitute the interface between the
embedding application and its NN component.
Often, these signals only become meaningful when
we define an external interpretation for them.
This is analogous to biological neural systems: the
same signal takes on a completely different meaning
when it is interpreted by different brain areas (motor
cortex, visual cortex, etc.).

External Interpretation Issues
Without any interpretation, we can only use standard
methods to define the difference (or similarity)
between signals.
For example, for binary patterns x and y, we could…
• … treat them as binary numbers and compute
their difference as | x – y |
• … treat them as vectors and use the cosine of the
angle between them as a measure of similarity
• … count the numbers of digits that we would have
to flip in order to transform x into y (Hamming
distance)
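The three measures can be compared on a small pair of patterns (the bit strings here are made up, shorter than the ones on the next slide):

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y = np.array([1, 1, 0, 1, 0, 0, 0, 1])

# 1) As binary numbers: absolute numeric difference.
as_int = lambda bits: int("".join(map(str, bits)), 2)
num_diff = abs(as_int(x) - as_int(y))

# 2) As vectors: cosine of the angle between them.
cosine = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# 3) Hamming distance: number of bits to flip.
hamming = int(np.sum(x != y))
```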
External Interpretation Issues
Example: Two binary patterns x and y:
x = 00010001011111000100011001011001001
y = 10000100001000010000100001000011110
These patterns seem to be very different from each
other. However, given their external interpretation…

[Figure: x and y rendered as two-dimensional images under their external interpretation.]

…x and y actually represent the same thing.
Creating Data Representations
The patterns that can be represented by an ANN most
easily are binary patterns.
Even analog networks “like” to receive and produce
binary patterns – we can simply round values < 0.5 to
0 and values ≥ 0.5 to 1.
To create a binary input vector, we can simply list all
features that are relevant to the current task.
Each component of our binary vector indicates
whether one particular feature is present (1) or
absent (0).

Creating Data Representations
With regard to output patterns, most binary-data
applications perform classification of their inputs.
The output of such a network indicates to which class
of patterns the current input belongs.
Usually, each output neuron is associated with one
class of patterns.
For any input, only one output neuron should be
active (1) and the others inactive (0), indicating the
class of the current input.
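In practice, an analog network rarely produces an exact 1-of-n pattern, so the class is usually read off as the index of the most active output neuron. A sketch with made-up output values:

```python
import numpy as np

outputs = np.array([0.1, 0.05, 0.8, 0.05])   # one neuron per class
predicted = int(np.argmax(outputs))          # index of the winning class
one_hot = np.zeros_like(outputs)
one_hot[predicted] = 1.0                     # idealized 1-of-n pattern
```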

Creating Data Representations
In other cases, classes are not mutually exclusive,
and more than one output neuron can be active at the
same time.
Another variant would be the use of binary input
patterns and analog output patterns for
“classification”.
In that case, again, each output neuron corresponds
to one particular class, and its activation indicates the
probability (between 0 and 1) that the current input
belongs to that class.

Creating Data Representations
Ternary (and n-ary) patterns can cause more
problems than binary patterns when we want to format
them for an ANN.
For example, imagine the tic-tac-toe game.
Each square of the board is in one of three different
states:
• occupied by an X,
• occupied by an O,
• empty

Creating Data Representations
Let us now assume that we want to develop a network
that plays tic-tac-toe.
This network is supposed to receive the current
game configuration as its input.
Its output is the position where the network wants to
place its next symbol (X or O).

Obviously, it is impossible to represent the state of
each square by a single binary value.

Creating Data Representations
Possible solution:
• Use multiple binary inputs to represent non-binary
states.
• Treat each feature in the pattern as an individual
subpattern.
• Represent each subpattern with as many positions
(units) in the pattern vector as there are possible
states for the feature.
• Then concatenate all subpatterns into one long
pattern vector.
Creating Data Representations
Example:
• X is represented by the subpattern 100
• O is represented by the subpattern 010
• <empty> is represented by the subpattern 001
• The squares of the game board are enumerated as
follows:
1 2 3
4 5 6
7 8 9
Creating Data Representations
Then consider the following board configuration (“.” = empty):

X X .
O O X
. . O

It would be represented by the following binary string:
100 100 001 010 010 100 001 001 010

Consequently, our network would need a layer of 27
input units.
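The encoding described above can be written as a short helper; this sketch (the function name and “.”-for-empty notation are my own) reproduces the 27-bit string for the board shown:

```python
import numpy as np

CODE = {"X": [1, 0, 0], "O": [0, 1, 0], ".": [0, 0, 1]}  # "." = empty

def encode_board(squares):
    # squares: the 9 board positions in enumeration order 1..9;
    # each is replaced by its 3-bit one-hot subpattern.
    return np.array([bit for s in squares for bit in CODE[s]])

board = ["X", "X", ".",
         "O", "O", "X",
         ".", ".", "O"]
v = encode_board(board)         # 27-component input vector
```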
Creating Data Representations
And what would the output layer look like?
Well, applying the same principle as for the input, we would use
nine units to represent the 9-ary output possibilities.
Considering the same enumeration scheme:
1 2 3
4 5 6
7 8 9

Our output layer would have nine neurons, one for each
position.
To place a symbol in a particular square, the corresponding
neuron, and no other neuron, would fire (1).
Creating Data Representations
But…

Would it not lead to a smaller, simpler network if we
used a shorter encoding of the non-binary states?
We do not need 3-digit strings such as 100, 010, and
001, to represent X, O, and the empty square,
respectively.
We can achieve a unique representation with 2-digit
strings such as 10, 01, and 00.

Creating Data Representations
Similarly, instead of nine output units, four would
suffice, using the following output patterns to indicate
a square:

0000 0001 0010

0100 0101 0110

1000 1001 1010

Creating Data Representations
The problem with such representations is that the
meaning of the output of one neuron depends on the
output of other neurons.
This means that each neuron does not represent
(detect) a certain feature, but groups of neurons do.
In general, such functions are much more difficult to
learn.
Such networks usually need more hidden neurons
and longer training, and their ability to generalize is
weaker than for the one-neuron-per-feature-value
networks.

Creating Data Representations
On the other hand, sets of orthogonal vectors (such
as 100, 010, 001) can be processed by the network
more easily.
This becomes clear when we consider that a neuron’s
input signal is computed as the inner product of the
input and weight vectors.
The geometric interpretation of these vectors shows
that orthogonal vectors are especially easy to
discriminate for a single neuron.
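A small sketch of this point (the weight values are made up): with one-hot codes a single neuron can respond to exactly one pattern, whereas with the compact 2-bit code from the previous slides the all-zero pattern yields zero net input for every possible weight vector:

```python
import numpy as np

one_hot = np.array([[1., 0., 0.],   # X
                    [0., 1., 0.],   # O
                    [0., 0., 1.]])  # empty

compact = np.array([[1., 0.],       # X
                    [0., 1.],       # O
                    [0., 0.]])      # empty

w = np.array([1., 0., 0.])          # a neuron "tuned" to the X code
responses = one_hot @ w             # nonzero for exactly one pattern

# The all-zero compact code produces net input 0 no matter
# what the weights are:
zero_net = compact[2] @ np.array([0.4, -0.7])
```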

Creating Data Representations
Another way of representing n-ary data in a neural
network is using one neuron per feature, but scaling
the (analog) value to indicate the degree to which a
feature is present.
Good examples:
• the brightness of a pixel in an input image
• the output of an edge filter
Poor examples:
• the letter (1 – 26) of a word
• the type (1 – 6) of a chess piece
Creating Data Representations
This can be explained as follows:
The way NNs work (both biological and artificial ones)
is that each neuron represents the
presence/absence of a particular feature.
Activations 0 and 1 indicate absence or presence of
that feature, respectively, and in analog networks,
intermediate values indicate the extent to which a
feature is present.
Consequently, a small change in one input value
leads to only a small change in the network’s
activation pattern.

Creating Data Representations
Therefore, it is appropriate to represent a non-binary feature by
a single analog input value only if this value is scaled, i.e., it
represents the degree to which a feature is present.
This is the case for the brightness of a pixel or the output of an
edge detector.
It is not the case for letters or chess pieces.
For example, assigning values to individual letters (a = 0, b =
0.04, c = 0.08, …, z = 1) implies that a and b are in some way
more similar to each other than are a and z.
Obviously, in most contexts, this is not a reasonable
assumption.

Creating Data Representations
It is also important to notice that, in artificial (not
natural!), completely connected networks the order of
features that you specify for your input vectors does
not influence the outcome.
For the network performance, it is not necessary to
represent, for example, similar features in
neighboring input units.
All units are treated equally; the adjacency of two
neurons does not suggest to the network that they
represent similar features.
Of course, once you have specified a particular order,
you cannot change it anymore during training or testing.
Creating Data Representations
If you wanted to represent the state of each square on
the tic-tac-toe board by one analog value, which
would be the better way to do this?

<empty> = 0                               X       = 0
X       = 0.5                             <empty> = 0.5
O       = 1                               O       = 1

Not a good scale!                         More natural scale!
Goes from “neutral” to                    Goes from “friendly” to
“friendly” and then “hostile”.            “neutral” and then “hostile”.
Representing Time
So far we have only considered static data, that is,
data that do not change over time.
How can we format temporal data to feed them into
an ANN in order to detect spatiotemporal patterns or
even predict future states of a system?
The basic idea is to treat time as another input
dimension.
Instead of just feeding the current data (time t0) into
our network, we expand the input vectors to contain
n data vectors measured at t0, t0 − Δt, t0 − 2Δt, t0 − 3Δt,
…, t0 − (n − 1)Δt.
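Building such expanded input vectors from a recorded series is a simple sliding-window operation; a sketch (the function name and newest-first ordering are my own choice):

```python
import numpy as np

def time_windows(series, n):
    # Each row holds the n most recent values, newest first:
    # [x(t0), x(t0 - dt), ..., x(t0 - (n - 1) dt)].
    return np.array([series[i - n + 1 : i + 1][::-1]
                     for i in range(n - 1, len(series))])

prices = np.array([1., 2., 3., 4., 5.])
X = time_windows(prices, 3)
```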
Representing Time
For example, if we want to predict stock prices based
on their past values (although other factors also play a
role):

[Figure: a stock-price chart (between $0 and $1,000) sampled at t0 − 6Δt through t0; the value at t0 + Δt, marked “?”, is to be predicted.]

