# Decision-Based NN (DBNN)

## Supervised Learning: Linear Perceptron NN

## Distinction Between Approximation-Based and Decision-Based NNs

• The teacher in an approximation-based NN is quantitative: real or complex values.

• The teacher in a decision-based NN is symbolic: class labels.
## Decision-Based NN (DBNN)

• Linear perceptron
• Discriminant function (score function)
• Reinforced and anti-reinforced learning rules
• Hierarchical and modular structures

[Figure: DBNN training loop. Discriminant functions f1(x,w), ..., fM(x,w) score the input; the teacher indicates whether the winning class is correct or incorrect, and the next pattern is then presented.]
## Two Classes: Linear Perceptron Learning Rule

The linear discriminant function, with the bias absorbed into the augmented pattern $\mathbf{z} = [\mathbf{x}^T\ 1]^T$ and augmented weight $\hat{\mathbf{w}}_j = [\mathbf{w}_j^T\ w_0]^T$ (written simply as $\mathbf{w}$ below):

$$f_j(\mathbf{x}, \mathbf{w}_j) = \mathbf{x}^T \mathbf{w}_j + w_0 = \mathbf{z}^T \hat{\mathbf{w}}_j, \qquad \nabla_{\hat{\mathbf{w}}} f_j(\mathbf{z}, \mathbf{w}_j) = \mathbf{z}$$

Upon the presentation of the m-th training pattern $\mathbf{z}^{(m)}$, the weight vector $\mathbf{w}^{(m)}$ is updated as

$$\mathbf{w}^{(m+1)} = \mathbf{w}^{(m)} + \eta\,\bigl(t^{(m)} - d^{(m)}\bigr)\,\mathbf{z}^{(m)}$$

where $\eta$ is a positive learning rate, $t^{(m)}$ is the teacher value, and $d^{(m)}$ is the network's decision.
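As a minimal sketch (not from the slides), the two-class rule in NumPy, assuming teacher values t ∈ {0, 1} and the decision d = step(zᵀw):

```python
import numpy as np

def perceptron_train(X, t, eta=0.1, max_epochs=100):
    """Two-class linear perceptron: w <- w + eta * (t - d) * z.

    X: (N, n) patterns; t: (N,) teacher labels in {0, 1}.
    z is the augmented pattern [x, 1], so w_0 is absorbed into w.
    """
    Z = np.hstack([X, np.ones((len(X), 1))])   # augmented patterns z
    w = np.zeros(Z.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for z_m, t_m in zip(Z, t):
            d_m = 1.0 if z_m @ w > 0 else 0.0  # network decision d(m)
            if d_m != t_m:
                w += eta * (t_m - d_m) * z_m   # update only on mistakes
                errors += 1
        if errors == 0:                        # no mistakes: converged
            break
    return w
```

With t, d ∈ {0, 1} the update is ±ηz on mistakes only, which is exactly the behavior the convergence theorem below refers to.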
## Linear Perceptron: Convergence Theorem (Two Classes)

If a set of training patterns is linearly separable, then the linear perceptron learning algorithm

$$\mathbf{w}^{(m+1)} = \mathbf{w}^{(m)} + \eta\,\bigl(t^{(m)} - d^{(m)}\bigr)\,\mathbf{z}^{(m)}$$

converges to a correct solution in a finite number of iterations, provided the learning rate $\eta$ is small enough.
## Multiple Classes

[Figure: two multi-class configurations, one strongly linearly separable and one (merely) linearly separable.]
## Linear Perceptron: Convergence Theorem (Multiple Classes)

If the given multiple-class training set is linearly separable, then the linear perceptron learning algorithm converges to a correct solution after a finite number of iterations.
## Multiple Classes: Linear Perceptron Learning Rule

Multi-class linear separability can be reduced to the two-class case by forming composite patterns: for a pattern $\mathbf{z}$ of class 1 and each competing class $j \neq 1$,

$$\mathbf{p}_{1j} = [\,\mathbf{z}^T\ \ \mathbf{0}\ \ \cdots\ \ -\mathbf{z}^T\ \ \cdots\ \ \mathbf{0}\,]^T$$

with $\mathbf{z}$ in the block of class 1 and $-\mathbf{z}$ in the block of class $j$. The reduction is made explicit below.
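To spell out why this construction works (our notation, assuming the class weight vectors are stacked into one long vector):

```latex
% Stack the M class weights into one vector w = [w_1^T, ..., w_M^T]^T.
% Correct classification of z (true class 1) requires, for every j != 1,
%   z^T w_1 > z^T w_j,
% and since p_{1j} places z in block 1 and -z in block j, this is
\[
  \mathbf{p}_{1j}^{T}\,\mathbf{w}
  \;=\; \mathbf{z}^{T}\mathbf{w}_{1} - \mathbf{z}^{T}\mathbf{w}_{j} \;>\; 0 .
\]
% The multi-class set is therefore linearly separable iff the composite
% patterns {p_{ij}} can all be driven to positive scores, and the
% two-class perceptron rule applies to them unchanged.
```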
## DBNN Structure for Nonlinear Discriminant Functions

[Figure: the input x feeds discriminant functions f1(x,w), f2(x,w), f3(x,w); a MAXNET selects the largest score to produce the output y.]

## DBNN

[Figure: DBNN with weight vectors w1, w2, w3 feeding a MAXNET; the teacher triggers training only when it indicates the need, i.e., when the decision is wrong.]
The decision-based learning rule is based on a minimal updating principle: it avoids or minimizes unnecessary side effects due to overtraining. Two scenarios arise (see the sketch after the RBF discriminant below):

• If the pattern is already correctly classified by the current network, no update is attributed to that pattern, and the learning process proceeds to the next training pattern.

• If the pattern is incorrectly classified to another, winning class, the parameters of two classes must be updated: the score of the winning class is reduced by the anti-reinforced learning rule, while the score of the correct (but not winning) class is enhanced by the reinforced learning rule.
## Reinforced and Anti-Reinforced Learning

Suppose the m-th training pattern $\mathbf{x}^{(m)}$ is known to belong to the i-th class, and the leading challenger is

$$j = \arg\max_{j \neq i}\ \varphi\bigl(\mathbf{x}^{(m)}, \Theta_j\bigr)$$

Reinforced learning: $\Delta \mathbf{w}_i = +\eta\, \nabla_{\mathbf{w}_i} f_i(\mathbf{x}, \mathbf{w}_i)$

Anti-reinforced learning: $\Delta \mathbf{w}_j = -\eta\, \nabla_{\mathbf{w}_j} f_j(\mathbf{x}, \mathbf{w}_j)$
## For a Simple RBF Discriminant Function

$$f_j(\mathbf{x}, \mathbf{w}_j) = -\tfrac{1}{2}\,\|\mathbf{x} - \mathbf{w}_j\|^2, \qquad \nabla_{\mathbf{w}_j} f_j(\mathbf{x}, \mathbf{w}_j) = \mathbf{x} - \mathbf{w}_j$$

(The minus sign makes the score largest for the closest centroid, consistent with the gradient shown.) Upon the presentation of the m-th training pattern, the centroids are updated as

Reinforced learning: $\Delta \mathbf{w}_i = +\eta\,(\mathbf{x} - \mathbf{w}_i)$

Anti-reinforced learning: $\Delta \mathbf{w}_j = -\eta\,(\mathbf{x} - \mathbf{w}_j)$
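Putting the two rules together with the RBF discriminant above, a minimal NumPy sketch (the function names and the one-centroid-per-class simplification are ours, not from the slides):

```python
import numpy as np

def rbf_scores(x, W):
    """Score of each class: f_j(x) = -0.5 * ||x - w_j||^2, one centroid per class."""
    return -0.5 * np.sum((W - x) ** 2, axis=1)

def dbnn_train(X, labels, W, eta=0.05, epochs=50):
    """Decision-based learning: update only on misclassified patterns.

    X: (N, n) patterns; labels: (N,) true class indices; W: (M, n) centroids.
    """
    W = W.astype(float)
    for _ in range(epochs):
        for x, i in zip(X, labels):
            j = int(np.argmax(rbf_scores(x, W)))  # winner (MAXNET)
            if j == i:
                continue                    # correctly classified: no update
            W[i] += eta * (x - W[i])        # reinforced: pull correct class closer
            W[j] -= eta * (x - W[j])        # anti-reinforced: push winner away
    return W
```

Only misclassified patterns trigger updates, so the loop implements the minimal updating principle directly.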
## Decision-Based Learning Rule

The learning scheme of the DBNN consists of two phases:
• locally unsupervised learning;
• globally supervised learning.
## Locally Unsupervised Learning via VQ or EM Clustering

Several approaches can be used to estimate the number of hidden nodes, and the initial clustering can be determined by VQ or EM clustering methods; a sketch of this initialization follows the bullet below.

• EM allows the final decision to incorporate prior information. This can be instrumental for multiple-expert or multiple-channel information fusion.
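As an illustration, using scikit-learn's KMeans and GaussianMixture as stand-ins for the VQ and EM methods named above (the function name and expert count are our assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def init_local_experts(X_class, n_experts=4):
    """Locally unsupervised phase for one class: VQ seeds, then EM refinement.

    Returns the means, covariances, and priors of the class-conditional mixture.
    """
    # VQ (k-means) gives initial centroids for the hidden nodes.
    km = KMeans(n_clusters=n_experts, n_init=10).fit(X_class)
    # EM refines them into a Gaussian mixture, adding covariances and priors.
    gmm = GaussianMixture(n_components=n_experts,
                          means_init=km.cluster_centers_).fit(X_class)
    return gmm.means_, gmm.covariances_, gmm.weights_
```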
[Figure: clustering result plotted against the first two principal components, with cluster centers labeled 1-5.]
## Globally Supervised Learning Rules

• The objective of learning is minimum classification error, not maximum-likelihood estimation.
• Inter-class mutual information is used to fine-tune the decision boundaries (i.e., the globally supervised learning).
• In this phase, the DBNN applies the reinforced/anti-reinforced learning rule [Kung95] or the discriminative learning rule [Juang92] to adjust the network parameters. Only misclassified patterns are involved in this training phase.
## Pictorial Presentation of Hierarchical DBNN

[Figure: 2-D scatter of three classes (a, b, c), each spread over several local clusters, motivating the use of multiple subclusters or local experts per class.]
## Discriminant Function (Score Function)

• LBF function (or mixture of LBFs)
• RBF function (or mixture of RBFs)
• Prediction error function
• Likelihood function: HMM
## Hierarchical and Modular DBNN

• Subcluster DBNN
• Probabilistic DBNN
• Local experts via K-means or EM
• Reinforced and anti-reinforced learning
## Subcluster DBNN

[Figure: subcluster DBNN; the subcluster discriminants of all classes feed a MAXNET.]

## Subcluster Decision-Based Learning Rule

## Probabilistic DBNN

[Figure: probabilistic DBNN; each class subnetwork's probabilistic score feeds a MAXNET.]
## Subnetwork of a Probabilistic DBNN

The k-th subnetwork is basically a mixture of local experts: RBF units produce the expert likelihoods $P(\mathbf{y}|\mathbf{x}, \theta_1)$, $P(\mathbf{y}|\mathbf{x}, \theta_2)$, $P(\mathbf{y}|\mathbf{x}, \theta_3)$, which are combined into the subnetwork output $P(\mathbf{y}|\mathbf{x}, \phi_k)$. A sketch of such a score follows.
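A minimal sketch of a subnetwork score, assuming Gaussian local experts with means, covariances, and priors (the notation and function name are ours):

```python
import numpy as np

def subnetwork_log_score(x, means, covs, priors):
    """Log-likelihood of x under the k-th subnetwork's mixture of local experts:
    log p(x | class k) = log sum_r P(Theta_r) * N(x; mu_r, Sigma_r).
    """
    log_terms = []
    for mu, cov, pi in zip(means, covs, priors):
        d = len(mu)
        diff = x - mu
        log_gauss = -0.5 * (diff @ np.linalg.inv(cov) @ diff
                            + np.log(np.linalg.det(cov))
                            + d * np.log(2 * np.pi))
        log_terms.append(np.log(pi) + log_gauss)
    return np.logaddexp.reduce(log_terms)   # numerically stable log-sum-exp

# The MAXNET then picks the class whose subnetwork scores highest:
#   y = argmax_k subnetwork_log_score(x, means_k, covs_k, priors_k)
```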
## Probabilistic Decision-Based Neural Networks

### Training of the Probabilistic DBNN

• Selection of initial local experts: intra-class, unsupervised, EM (probabilistic) training.
• Training of the experts: inter-class, supervised, reinforced and anti-reinforced learning.
### Training Procedure

[Flowchart: in the locally unsupervised phase, feature vectors x(t) are clustered by K-means, refined by K-NNs, and then by EM until convergence, yielding the parameters {μ_j, Σ_j, P(θ_j)}. In the globally supervised phase, feature vectors with class IDs are classified; misclassified vectors drive reinforced/anti-reinforced learning until convergence, yielding {μ_j, Σ_j, P(θ_j), T}, where T is the decision threshold.]
### 2-D Vowel Problem

[Figure: decision boundaries on the 2-D vowel data (F1 vs. F2, in Hz) for the classes hid, hod, heard, who'd, hawed, hud, heed, hood. Left panel: GMM; right panel: PDBNN.]
## Difference Between MOE and DBNN

In the MOE, the influence of the training patterns on each expert is regulated by the gating network (which is itself under training), so that as training proceeds, the training patterns exert greater influence on nearby experts and less influence on far-away ones. The MOE updates all the classes.

Unlike the MOE, the DBNN makes use of both unsupervised (EM-type) and supervised (decision-based) learning rules. The DBNN uses only misclassified training patterns for its globally supervised learning, and it updates only the "winner" class and the class to which the misclassified pattern actually belongs. Its training strategy abides by a "minimal updating principle".
## DBNN/PDBNN Applications

• OCR (DBNN)
• Texture segmentation (DBNN)
• Mammogram diagnosis (PDBNN)
• Face detection (PDBNN)
• Face recognition (PDBNN)
• Money recognition (PDBNN)
• Multimedia library (DBNN)
## OCR Classification (DBNN)

## Image Texture Classification (DBNN)

## Face Detection (PDBNN)

## Face Recognition (PDBNN)

## Multimedia Library (PDBNN)
## MatLab Assignment #4: DBNN to Separate Two Classes

Data ratio = 2:1.

• RBF DBNN with 4 centroids per class.
• RBF DBNN with 4 and 6 centroids for the green and blue classes, respectively.

A sketch of such an experiment follows below.
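The assignment itself calls for MATLAB; purely as an illustration, here is a Python sketch of the 4-centroids-per-class case on synthetic 2:1 data (the data distribution, seeds, and all names are our assumptions):

```python
import numpy as np
rng = np.random.default_rng(0)

# Illustrative 2-class data with a 2:1 sample ratio (assumed setup).
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 200 + [1] * 100)

# 4 centroids per class, seeded from random class samples.
W = [X[y == c][rng.choice(np.sum(y == c), 4, replace=False)] for c in (0, 1)]

def score(x, Wc):
    """Subcluster RBF score of one class: the closest centroid wins."""
    return -0.5 * np.min(np.sum((Wc - x) ** 2, axis=1))

eta = 0.05
for _ in range(50):
    for x, i in zip(X, y):
        j = int(np.argmax([score(x, Wc) for Wc in W]))
        if j != i:                      # decision-based: update only on errors
            for c, sgn in ((i, +1), (j, -1)):
                # reinforce/anti-reinforce the closest subcluster of class c
                r = int(np.argmin(np.sum((W[c] - x) ** 2, axis=1)))
                W[c][r] += sgn * eta * (x - W[c][r])
```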
## RBF-BP NN for Dynamic Resource Allocation

• Use content to determine the renegotiation time.
• Use content/ST-traffic to estimate how much resource to request.

A neural-network traffic predictor yields smaller prediction MSE and higher link utilization.
## Intelligent Media Agent

Modern information technology in the internet era should support interactive and intelligent processing that transforms and transfers information. The integration of signal processing and neural network techniques could be a versatile tool for a broad spectrum of multimedia applications.
## EM Applications

• Uncertain clustering/model
[Figure: a pattern (*) claimed by two overlapping local experts, Expert 1 and Expert 2.]

• Channel confidence
[Figure: the same pattern (*) observed through Channel 1 and Channel 2.]
## Channel Fusion

Sensor = Channel = Expert.

[Figure: classes-in-channel network; each channel hosts discriminants for all classes, and the channel outputs are fused.]

One way to write the fused likelihood appears below.
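Following the classes-in-channel idea, the fusion might be written as follows (the notation is our assumption, not taken from the slides):

```latex
% Each channel c produces its own class likelihood, and the channel
% confidences P(c) weight the combination:
\[
  p(\mathbf{x} \mid \omega_k) \;=\; \sum_{c} P(c)\, p(\mathbf{x}_c \mid \omega_k, c),
\]
% so a low-confidence channel contributes little to the fused score.
```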
## Sensor Fusion

[Figure: audio-visual fusion. Human sensory modalities presented with an auditory "Ba" and a visual "Ga" perceive "Da"; computer sensory modalities fuse the corresponding channels analogously.]
## Fusion Example: Toy Car Recognition
## References

[1] Lin, S.H., Kung, S.Y., and Lin, L.J. (1997). "Face recognition/detection by probabilistic decision-based neural network," IEEE Trans. on Neural Networks, 8(1), pp. 114-132.

[2] Mak, M.W., et al. (1994). "Speaker Identification using Multi Layer Perceptrons and Radial Basis Functions Networks," Neurocomputing, 6(1), pp. 99-118.
