					              ARTIFICIAL NEURAL NETWORKS




                             A

                    SEMINAR REPORT

                            ON

     ARTIFICIAL NEURAL NETWORKS




                                  From the department of Computer Science

                                  College of Natural Sciences (COLNAS)

                                  University of Agriculture, Abeokuta (UNAAB)



ARTIFICIAL INTELLIGENCE          BY: OBI KENNETH ABANG, COURTESY OF 2010 SETS


A SEMINAR REPORT ON ARTIFICIAL NEURAL NETWORKS CSC(427) ............................ 1
INTRODUCTION ................................................................................................................................ 4
THE BRAIN, NEURAL NETWORKS AND COMPUTERS .......................................................... 6
   The human brain ............................................................................................................................... 6
   How the human brain learns ............................................................................................................ 6
   ARTIFICIAL NEURONES ............................................................................................................. 7
   Neural network behaviour ................................................................................................................ 8
   Unsupervised learning ...................................................................................................................... 8
   Human neural networks versus conventional computers .............................................................. 9
APPLICATION OF ARTIFICIAL NEURAL NET ......................................................................... 10
   Real life applications ...................................................................................................................... 10
   Neural network software ................................................................................................................ 10
   Learning paradigms ........................................................................................................................ 10
CURRENT RESEARCH ............................................................................................................... 12
   Culture and Research...................................................................................................................... 12
   Fundamentals of wavelet theory .................................................................................................... 13
   Wavelet neural networks ................................................................................................................ 13
   Wavelet back propagation neural networks .................................................................................. 14
   Competitive neural networks ........................................................................................................ 15
   Parallel wavelet back propagation neural networks ..................................................................... 16
   ERROR ESTIMATION ................................................................................................................. 16
TYPES OF MODELS ........................................................................................................................ 17
   THE MULTI LAYER FEED FORWARD NEURAL NETWORK (MLFFNN) ...................... 17
   Initializing the weights and training sample ................................................................................. 18
   The modified back propagation algorithm .................................................................................... 19
   Different learning rate annealing schedules .................................................................................. 19
   Terminating condition .................................................................................................................... 20
   The test sample ............................................................................................................................... 20
   SELF ORGANIZING MAPS ........................................................................................................ 20

   Unsupervised learning .................................................................................................................... 20
   GENERAL IDEA OF THE SOM MODEL.................................................................................. 20
   THE SOM ALGORITHM ............................................................................................................. 21
   SELF- ORGANISING MAP learning........................................................................................... 21
   Map quality measures ..................................................................................................................... 22
   Mapping precision .......................................................................................................................... 23
   VISUALIZING THE SOM............................................................................................................ 23
   Applications of SOM...................................................................................................................... 23
   Disadvantages of SOM................................................................................................................... 23
LEARNING ALGORITHMS ............................................................................................................ 24
   Simulated annealing ....................................................................................................................... 25
   The basic iteration .......................................................................................................................... 25
   Evolutionary computation .............................................................................................................. 28
   Evolutionary algorithms ................................................................................................................. 28
   Expectation maximization algorithm ............................................................................................ 29
   Description ...................................................................................................................................... 29
   Applications .................................................................................................................................... 29
NEURAL NETWORK SOFTWARE ............................................................................................... 31
   Simulators ....................................................................................................................................... 31
   SNNS research neural network simulator ..................................................................................... 31
   Data analysis simulators ................................................................................................................. 31
STRENGTHS AND WEAKNESSES OF NEURAL NETWORK MODELS .............................. 32
CONCLUSION ................................................................................................................................... 34








                       INTRODUCTION
One type of network treats its nodes as 'artificial neurons'. Such networks are called artificial
neural networks (ANNs). An artificial neuron is a computational model inspired by natural
neurons. Natural neurons receive signals through synapses located on the dendrites or membrane
of the neuron. When the signals received are strong enough (they surpass a certain threshold), the
neuron is activated and emits a signal through its axon. This signal might be sent to another
synapse and might activate other neurons.




Figure 1. Natural neurons (artist's conception).

The complexity of real neurons is highly abstracted when modelling artificial neurons. These
basically consist of inputs (like synapses), which are multiplied by weights (the strength of the
respective signals) and then computed by a mathematical function which determines the
activation of the neuron. Another function (which may be the identity) computes the output of
the artificial neuron (sometimes depending on a certain threshold). ANNs combine artificial
neurons in order to process information.

The higher the weight of an artificial neuron, the stronger the input that is multiplied by it.
Weights can also be negative, in which case we say the signal is inhibited by the negative weight.
Depending on the weights, the computation of the neuron will differ. By adjusting the weights of
an artificial neuron we can obtain the output we want for specific inputs. When an ANN has
hundreds or thousands of neurons, however, it would be quite complicated to find all the
necessary weights by hand; instead, algorithms can adjust the weights of the ANN so as to obtain
the desired output from the network. This process of adjusting the

weights is called learning or training. The number of types of ANNs and their uses is very large:
since the first neural model by McCulloch and Pitts (1943), hundreds of different models
considered as ANNs have been developed. When creating a functional model of the biological
neuron, there are three basic components of importance. First, the synapses of the neuron are
modelled as weights. The strength of the connection between an input and a neuron is denoted by
the value of the weight. Negative weight values reflect inhibitory connections, while positive
values designate excitatory connections [Haykin]. The next two components model the actual
activity within the neuron cell. An adder sums all the inputs modified by their respective
weights; this activity is referred to as linear combination. Finally, an activation function controls
the amplitude of the output of the neuron.
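
The three components just described, weights, an adder forming the linear combination, and an
activation function, can be sketched in a few lines of Python (the weights and inputs below are
illustrative values, not taken from any particular model):

```python
import math

def neuron(inputs, weights, bias=0.0):
    """One artificial neuron: a weighted sum (the linear combination)
    passed through a sigmoid activation that bounds the output."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias   # adder
    return 1.0 / (1.0 + math.exp(-s))                        # activation

# A positive weight is excitatory, a negative weight inhibitory:
out = neuron([1.0, 0.5], [2.0, -1.0])   # weighted sum = 1.5
```

Learning, discussed above, amounts to adjusting the entries of `weights` until the output matches
a desired target.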








            THE BRAIN, NEURAL NETWORKS AND COMPUTERS
       Artificial neural networks may either be used to gain an understanding of biological
neural networks, or to solve artificial intelligence problems without necessarily creating a
model of a real biological system. If we desire to build an intelligent machine, what better way
to start than by imitating the human brain, the product of evolution's most intelligent species.

                                      The human brain
The human brain contains about 10 billion nerve cells, or neurons. On average, each neuron is
connected to other neurons through about 10,000 connections, or synapses: the most massively
connected network yet known. The brain's network of neurons forms a massively parallel
information-processing system. The popular claim that humans use less than 10 percent of their
brains' potential power is anecdotal and has never been scientifically proven; it remains another
mystery. Traditionally, the neuron has been regarded as simply a switch of some sort, giving an
output for a particular combination of inputs, very like a computer logic gate. This view is,
however, very wrong. Recent research has shown that neurons perform considerable processing
in both space and time; the neuron's output is the result of a vast computation, perhaps
equivalent to one of our own supercomputers. A neuron is itself a cell, and some researchers now
believe each to contain microtubule 'computers' (thousands per cell, each operating at perhaps
10 million cycles per second).

                               How the human brain learns
Much is still unknown about how the brain trains itself to process information, so theories
abound. In the human brain, a typical neuron collects signals from others through a host of fine
structures called dendrites (the input zone). The neuron sends out spikes of electrical activity
through a long, thin strand known as an axon, which splits into thousands of branches. This
spiking event is also called depolarization, and is followed by a refractory period, during which
the neuron is unable to fire. At the end of each branch (the output zone), a structure called a
synapse converts the activity from the axon into electrical effects that inhibit or excite activity
in the connected neurones. Transmission of an electrical signal from one neuron to the next is
effected by neurotransmitters, chemicals which are released from the first neuron and which
bind to receptors in the second. When a neuron receives excitatory input that is sufficiently large
compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning
occurs by changing the effectiveness of the synapses so that the influence of one neuron on

another changes, i.e., by altering the strengths of connections between neurons and by adding or
deleting connections between neurons. Furthermore, neurons learn "on-line", based on experience.







                               ARTIFICIAL NEURONES
We construct these neural networks by first trying to deduce the essential features of neurones
and their interconnections. We then typically program a computer to simulate these features.
However, because our knowledge of neurones is incomplete and our computing power is limited,
our models are necessarily gross idealisations of real networks of neurones.




It is possible to create a universal computer with a neural network architecture, but this is well
beyond current capabilities, so let us start with something simple. Any computation requires an
input, a process and an output. This three-stage design can be emulated by having a set of input
neurons (connected to a sensing device), these in turn connected to a set or sets of hidden
neurons that process the inputs, which are themselves connected to a set of output neurons
(driving a display device). Each set of neurons is called a layer. The number of neurons used for
each layer, their interconnections and the optimum number of layers for any particular task are
subject to much debate.
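
As a rough sketch of this input-hidden-output layering (the weight values here are arbitrary
placeholders, chosen only to make the example run):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Each row of `weights` feeds one neuron in the next layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

hidden_w = [[0.5, -0.2], [0.3, 0.8]]   # 2 inputs -> 2 hidden neurons
output_w = [[1.0, -1.0]]               # 2 hidden -> 1 output neuron

def forward(inp):
    return layer(layer(inp, hidden_w), output_w)

y = forward([1.0, 0.0])   # one output value between 0 and 1
```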




                                  Neural network behaviour
Now let us compare this human activity with neural networks. Whenever we create a new neural
network, it is like giving birth to a child; afterwards, we start to train the network. Not
surprisingly, we may have created the neural network for certain applications or purposes. Here
the difference between childbirth and neural networks is obvious: first we decide why we need a
neural net, then we create it. In the same way that a child becomes an expert in an area, we train
the neural network to become expert in an area. Several techniques have been suggested for this,
broadly grouped into two classes. The first assumes that we know what the result should be (like
a teacher instructing a pupil). In this case we can present the input, check what the output shows
and then adjust the strengths/connections until the correct output is given. This can be repeated
with all available test inputs until the network gets as close to error-free as possible. This type
of technique is known as back-propagation (and, in its normal recognition mode, as feedforward).
Many other forms of the technique are also used, with varying degrees of support and success.
These supervised learning methods use manual reinforcement (strengthening of correct
connections, weakening of poor ones) but are slow to train and have many other drawbacks,
including an inability to innovate (to go beyond what is known).
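
The "present the input, check the output, adjust the strengths" loop can be sketched with a
single sigmoid neuron trained by gradient descent on a made-up toy task; full back-propagation
applies the same idea through hidden layers as well:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy task (invented for illustration): output 1 when x1 > x2, else 0.
samples = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0),
           ([0.9, 0.2], 1.0), ([0.1, 0.8], 0.0)]
w, lr = [0.0, 0.0], 1.0
for _ in range(2000):                     # repeat over all available inputs
    for x, target in samples:
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        err = target - y                  # check what the output shows
        # strengthen/weaken each connection in proportion to its
        # contribution to the error (gradient of the squared error)
        w = [wi + lr * err * y * (1 - y) * xi for wi, xi in zip(w, x)]
```

After training, the neuron's output is close to 1 for inputs with x1 > x2 and close to 0
otherwise.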

In our daily life, in many instances we have already transferred decision-making processes to
computers. For example, say you attempt to purchase a product using a credit card over the
Internet. For some reason, the billing address does not match the mailing address; it may be due
to missing letters or misspelled words or other reasons. Although you are the correct person
using a valid credit card, the purchase does not go through because the seller's computer does not
allow transactions with a mismatch in the address. Although instances such as this happen daily
in our lives, we tend to forget the computer's role in the decision.

                                    Unsupervised learning
        The complexity of our own brains means that we can achieve multiple categorisation: we
recognise many aspects of any object at the same time. As yet, neural network systems are very
limited in comparison, but simple network structures are known to have the ability to self-
organise. The second class of techniques makes use of this idea. This type of unsupervised
learning mimics the more interesting aspects of human behaviour: our ability to learn for
ourselves, to add one and one and make three. In these cases we need the network to recognise
features of the input data itself (to categorise it) and to display its findings in some way that is
of use (which may include movement or other actions).

Kohonen developed an algorithm (the Self-Organizing Map, or SOM) to mimic the brain's ability
to self-organise, and this forms the basis of most types of self-learning neural network. In this
method, arrays of weights (initially random) are compared to the input signal, and the closest
match found is adjusted slightly to improve the fit. This is repeated for all input options,
gradually leading the network weights to converge upon the set of input options encountered.
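
A minimal sketch of this compare-and-adjust idea (winner-take-all only; a full SOM also pulls the
winner's neighbours along, which this simplification omits):

```python
import random

random.seed(0)
# Four weight vectors start random and drift toward the input options.
nodes = [[random.random(), random.random()] for _ in range(4)]
inputs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

def dist2(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

for _ in range(500):
    x = random.choice(inputs)
    best = min(range(len(nodes)), key=lambda i: dist2(nodes[i], x))
    # adjust the closest match slightly to improve the fit
    nodes[best] = [n + 0.1 * (xi - n) for n, xi in zip(nodes[best], x)]
```

After enough repetitions, the winning weight vectors sit close to the input options they
encountered.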




             Human neural networks versus conventional computers
It is known that both can learn and become expert in an area, and both are mortal, with an energy
usage of around 30 W. Their differences can be classified under several headings:

   1. Processing elements: even though the elements are of comparable size, the brain has up to
      10^14 synapses, compared with about 10^8 transistors in a computer.
   2. Processing speed: consider the time taken for each elementary operation: neurons typically
      operate at a maximum rate of about 100 Hz, while a conventional CPU carries out several
      hundred million machine-level operations per second.
   3. Style of computation: the brain is a parallel, distributed system composed of a large number
      of highly interconnected neurones working together to solve a specific problem. Neural
      networks learn by example; they cannot be programmed to perform a specific task. The
      disadvantage is that because the network finds out how to solve the problem by itself, its
      operation can be unpredictable.
      Conventional computers, on the other hand, use a cognitive approach to problem solving: the
      way the problem is to be solved must be known and stated as small, unambiguous instructions,
      which are then converted to a high-level language program and then into machine code that the
      computer can understand. These machines are totally predictable.
   4. Fault tolerance: humans can forget, but neural networks cannot. Once fully trained, a neural
      net will not forget; whatever a neural network learns is hard-coded and becomes permanent,
      whereas a human's knowledge is volatile and may not become permanent. The other difference is
      accuracy. Once a particular application or process is automated through a neural network, the
      results are repeatable and accurate: however many times the process is replicated, the results
      will be the same and as accurate as the first time. Human beings are not like that; the first
      ten runs may be accurate, but mistakes may happen later.
   5. Learning: the brain can learn (reorganize itself) from experience. This means that partial
      recovery from damage is possible if healthy units can learn to take over the functions
      previously carried out by the damaged areas. Without 'thorough' supervised learning, by
      contrast, the computer works only within what it knows. Scientists have managed to develop
      genetic algorithms and fuzzy logic to create systems relying on probabilistic matching. That
      is, we cannot be certain of the results we obtain; each result is merely more probable than
      the alternatives, and the system simply chooses the result with the highest likelihood. That
      may seem a drawback, yet it almost certainly relates more closely to the actual workings of
      our own brains.
   6. Intelligence and consciousness: this property of the human brain is what makes humans the
      highest animals and the most dynamic of all living organisms. For artificial neural networks,
      how a computer could have 'a mind of its own' remains the subject of ongoing research; it has
      not been achieved yet.








            APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS
The utility of artificial neural network models lies in the fact that they can be used to infer a
function from observations and also to use it. This is particularly useful in applications where the
complexity of the data or task makes the design of such a function by hand impractical.

                                    Real life applications
The tasks to which artificial neural networks are applied tend to fall within the following broad
categories:

Function approximation, or regression analysis, including time series prediction and modelling.

Classification, including pattern and sequence recognition, novelty detection and sequential
decision making.

Data processing, including filtering, clustering, blind signal separation and compression.

Application areas of ANNs include system identification and control (vehicle control, process
control), game-playing and decision making (backgammon, chess, racing), pattern recognition
(radar systems, face identification, object recognition, etc.), sequence recognition (gesture,
speech, handwritten text recognition), medical diagnosis, financial applications, data mining (or
knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

Moreover, some brain diseases, e.g. Alzheimer's disease, are apparently, and essentially,
diseases of the brain's natural neural network: they damage necessary prerequisites for the
functioning of the mutual interconnections between neurons and/or glia.

                                  Neural network software
      Neural network software is used to simulate, research, develop and apply artificial neural
networks, biological neural networks and in some cases a wider array of adaptive systems.

                                    Learning paradigms
There are three major learning paradigms, each corresponding to a particular abstract learning
task. These are supervised learning, unsupervised learning and reinforcement learning. Usually
any given type of network architecture can be employed in any of those tasks.

Supervised learning: In supervised learning, we are given a set of example pairs and the aim is to
find a function f in the allowed class of functions that matches the examples. In other words, we
wish to infer the mapping implied by the data; the cost function is related to the mismatch
between our mapping and the data.
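
In miniature, with an invented data set: let the allowed class of functions be lines
f(x) = a*x + b, and let the cost be the squared mismatch between our mapping and the example
pairs:

```python
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # generated by y = 2x + 1

def cost(a, b):
    return sum((a * x + b - y) ** 2 for x, y in pairs)

# Crude search over the function class; gradient descent would do the
# same job on larger problems.
best = min(((a / 10, b / 10) for a in range(-50, 51) for b in range(-50, 51)),
           key=lambda ab: cost(*ab))   # -> (2.0, 1.0), zero mismatch
```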

Unsupervised learning: In unsupervised learning we are given some data x and a cost function to
be minimized, which can be any function of x and the network's output f. The cost function is
determined by the task formulation. Most applications fall within the domain of estimation
problems such as statistical modeling, compression, filtering, blind source separation and
clustering.

Reinforcement learning: In reinforcement learning, data x is usually not given but generated by
an agent's interactions with the environment. At each point in time t, the agent performs an
action y_t and the environment generates an observation x_t and an instantaneous cost c_t,
according to some (usually unknown) dynamics. The aim is to discover a policy for selecting
actions that minimizes some measure of long-term cost, i.e. the expected cumulative cost. The
environment's dynamics and the long-term cost for each policy are usually unknown, but can be
estimated. ANNs are frequently used in reinforcement learning as part of the overall algorithm.
Tasks that fall within the paradigm of reinforcement learning are control problems, games and
other sequential decision-making tasks.
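
A stripped-down sketch of the agent-environment loop, with a made-up two-action environment; the
agent keeps a running mean of each action's observed cost and mostly picks the action whose
estimated long-term cost is lowest:

```python
import random

random.seed(1)

def environment(action):                 # dynamics unknown to the agent
    return random.random() * (0.5 if action == 0 else 2.0)   # cost c_t

est, counts = [0.0, 0.0], [0, 0]
for t in range(2000):
    if random.random() < 0.2:            # occasionally explore
        a = random.randrange(2)
    else:                                # otherwise act greedily
        a = min((0, 1), key=lambda i: est[i])
    c = environment(a)
    counts[a] += 1
    est[a] += (c - est[a]) / counts[a]   # running mean of observed cost

policy = min((0, 1), key=lambda a: est[a])   # learned action choice
```

Here action 0 has the lower expected cost, so the learned policy settles on it; an ANN would
replace the two-entry table `est` when states and actions are too numerous to enumerate.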








                  CURRENT RESEARCH
        In recent years, neural networks have come to be regarded as appropriate techniques for
solving complex and time-consuming problems, and they are broadly utilized in civil and
structural engineering applications. Determining the dynamic time-history responses of structures
under earthquake loading is one such time-consuming problem, with a huge computational burden. In
the present study, neural networks are employed to predict the time-history responses of
structures. Neural networks such as radial basis function (RBF), generalized regression (GR),
counter-propagation (CP), back-propagation (BP) and wavelet back-propagation (WBP) networks have
been used in civil and structural engineering applications [1-3]. As has been shown, the
performance generality of WBP for approximating structural time-history responses is better than
that of the RBF, GR, CP and BP networks; therefore, this study focuses on WBP neural networks and
their improvements. The most important phase in neural network training is data generation. As
emphasized in the relevant professional literature, there is no explicit method for selecting the
training samples, so this job is usually done on a random basis. Current research areas in neural
networks include:
-the function block for programmable logic controller library functions
-wavelet back-propagation neural networks for structural dynamic analysis, amongst others
For the purposes of this paper we will take a deeper look at research in the second area.

                                  Culture and Research.
         In large-scale problems, therefore, the selection of proper training data may require
significant computational effort, and many training samples must be selected to train a robust
neural network. In the present paper, we introduce a new neural system that eliminates the main
difficulties occurring in the training mode of WBP neural networks. The new system is designed in
two main phases. In the first phase, the input space is classified according to one criterion
using a competitive neural network. In the second phase, one distinct WBP neural network is
trained for each class using the data located in it. In this manner, a set of parallel WBP neural
networks is substituted for a single WBP neural network. The neural system is called parallel
wavelet back-propagation (PWBP) neural networks. The numerical results indicate that the
performance generality of PWBP is better than that of a single WBP neural network.





                              Fundamentals of wavelet theory
       Wavelet theory is the outcome of multi-disciplinary endeavours that brought together
mathematicians, physicists and engineers. This relationship creates a flow of ideas that goes
well beyond the construction of new bases or transforms. The term wavelet means a little wave. A
function h ∈ L2(R) (the set of all square-integrable, or finite-energy, functions) is called a
wavelet if it has zero average on (−∞, +∞):

                        +∞
                        ∫ h(t) dt = 0                                          (1)
                       −∞
This little wave must have at least a minimum oscillation and a fast decay to zero in both the
positive and negative directions of its amplitude. These properties are the Grossmann-Morlet
admissibility conditions required of a function for the wavelet transform. The wavelet transform
is an operation which transforms a function by integrating it with modified versions of some
kernel function. The kernel function is called the mother wavelet and the modified versions are
its daughter wavelets. A function h ∈ L2(R) is admissible if:

                               +∞
                        c_h = ∫ |H(ω)|² / |ω| dω < ∞                           (2)
                              −∞

where H(ω) is the Fourier transform of h(t) and c_h is the admissibility constant of h(t). For a
given h(t), the condition c_h < ∞ holds only if H(0) = 0. The wavelet transform of a function
f ∈ L2(R) with respect to a given admissible mother wavelet h(t) is defined as:

                               +∞
                        W_f(a, b) = ∫ f(t) h_{a,b}*(t) dt                      (3)
                              −∞

where * denotes the complex conjugate. However, most wavelets are real-valued.
Sets of wavelets are employed for the approximation of a signal, and the goal is to find the set
of daughter wavelets, constructed by dilating and translating the mother wavelet, that best
represents the signal. The daughter wavelets are generated from a single mother wavelet h(t) by
dilation and translation:

                        h_{a,b}(t) = (1/√a) h((t − b)/a)                       (4)

where a > 0 is the dilation factor and b is the translation factor.
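
The dilation-and-translation recipe of equation (4) is easy to exercise numerically. The sketch
below uses the Mexican-hat function as the mother wavelet (an assumption for illustration; it is
not the wavelet used later in this report) and checks the zero-average condition of equation (1)
with a simple Riemann sum:

```python
import math

def mother(t):
    """Mexican-hat mother wavelet: zero average and fast decay."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def daughter(a, b):
    """h_{a,b}(t) = (1/sqrt(a)) * h((t - b) / a), as in eq. (4)."""
    return lambda t: mother((t - b) / a) / math.sqrt(a)

# Numerical check of eq. (1): the average over the real line is ~0.
dt = 0.01
avg = sum(mother(-20.0 + i * dt) for i in range(4000)) * dt
```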


                                     Wavelet neural networks
       Wavelet neural networks (WNNs), employing wavelets as the activation functions, have
recently been researched as an alternative to neural networks with sigmoidal activation
functions. The combination of wavelet theory and neural networks has led to the development of
WNNs. WNNs are feed-forward neural networks using wavelets as the activation function. In WNNs,
both the position and the dilation of the wavelets are optimized, besides the weights. Wavenet is
another term used to describe WNNs, although originally wavenets referred to networks in which
the position and dilation of the wavelets are fixed and only the weights are optimized.



                     Wavelet back propagation neural networks
        The BP network is now the most popular mapping neural network, but it has problems
such as trapping into local minima and slow convergence. Wavelets are powerful signal-analysis
tools: they can approximately realize time-frequency analysis using a mother wavelet. The
mother wavelet has a square window in the time-frequency space, whose size can be
varied freely by two parameters. Thus, wavelets can identify the localization of unknown
signals at any level. The activation function of the hidden-layer neurons in a BP neural network is a
sigmoidal function, shown in Fig.1a. To design the wavelet back propagation (WBP) neural network
we substitute the hidden-layer sigmoidal activation function of BP with the POLYWOG1 wavelet:

                      H_POLYWOG1(t) = exp(1/2) · t · exp(-t²/2)                (5)

A plot of POLYWOG1 with a = 1 and b = 0 is shown in Fig.1b. In the resulting WBP neural network,
the position and dilation of the wavelets used as the activation function of the hidden-layer neurons
are fixed, and the weights of the network are optimized using the SCG algorithm. In this study, good
results are obtained with b = 0 and a = 2.5. The activation function of the hidden-layer
neurons is then as in (6).

              H_POLYWOG1(t) = exp(1/2) · (t/2.5) · exp(-(t/2.5)²/2)            (6)

Therefore, WBP is a modified back propagation neural network with the POLYWOG1 activation
function in its hidden-layer neurons, and the weights of the network are adjusted using the
SCG algorithm. A typical topology of WBP is shown in Fig.2.





                                   Competitive neural networks




Fig.1:a) Sigmoidal function, b) POLYWOG1 wavelet




Fig.2: Typical topology of WBP
        Some applications need to group data that may, or may not, be clearly definable.
Competitive neural networks can learn to detect regularities and correlations in their input and
adapt their future responses to that input accordingly. The neurons of competitive networks learn
to recognize groups of similar input vectors; a competitive network automatically learns to
classify input vectors. However, the classes obtained by the competitive network depend only on
the distance between input vectors. If two input vectors are very similar, the competitive
network will probably put them in the same class. There is no mechanism in a strictly competitive
network design to say whether or not any two input vectors are in the same class or different
classes; a competitive network simply tries to identify the groups as best it can. Training of a
competitive network is based on the Kohonen [10] self-organization algorithm. A key difference
between this network and many other networks is that the competitive network learns without
supervision. During training the weights of the winning neuron are updated according to:



               w_ij(k + 1) = w_ij(k) + α[x_j(k) - w_ij(k)]             (7)

where w_ij is the weight of the competitive layer from input j to neuron i, x_j is the jth component of the
input vector, α is the learning rate and k is discrete time.
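One winner-take-all training step following equation (7) can be sketched in Python; the function and variable names are illustrative, not from the text.

```python
def competitive_step(W, x, alpha):
    # One Kohonen competitive-learning step: find the winning neuron
    # (the one whose weight vector is closest to input x), then move
    # only the winner's weights toward the input: w <- w + alpha*(x - w).
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(W)), key=lambda j: dist2(W[j]))
    W[winner] = [wi + alpha * (xi - wi) for wi, xi in zip(W[winner], x)]
    return winner

W = [[0.0, 0.0], [1.0, 1.0]]            # two neurons, 2-D inputs (toy data)
w_idx = competitive_step(W, [0.9, 1.1], alpha=0.5)
print(w_idx, W[w_idx])                  # neuron 1 wins and moves halfway toward the input
```

Repeating this step over many input vectors makes each neuron settle on one cluster of similar inputs, with no supervision involved.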


                 Parallel wavelet back propagation neural networks
         As mentioned previously, in the case of large problems, to train a neural network with
acceptable performance generality it is necessary to select an adequate number of training data.
Therefore, much computational effort is required in the training phase. To attain appropriate
generalization at low cost we propose the PWBP neural networks.
At first, the selected input-target training pairs are classified into some classes based on a specific
criterion. In other words, the input and target spaces are divided into subspaces such that the
data located in each subspace have similar properties. A small WBP neural network can then be
trained for each subspace using its assigned training data. With this simple strategy, a single WBP
neural network trained over the whole input space is substituted with a set of parallel WBP
neural networks, each of which is trained for one segment of the classified input space.
In PWBP, each WBP neural network has a specific dilation factor, which may differ from those of
the other WBP neural networks. Therefore the performance generality of PWBP neural networks
is higher than that of a single WBP neural network.
Improving the generalization of PWBP is performed very rationally and economically in
comparison with a single WBP neural network. In other words, improving generalization and
retraining some small parallel WBP neural networks requires little effort relative to the single
WBP neural network. Furthermore, it is very probable that some of the parallel WBP neural
networks of PWBP require no generalization improvement at all.
Selecting a proper criterion for classification of the input space depends on the nature of the
problem and its variables; thus recognition of the effective arguments for selecting an efficient
criterion has a very significant influence on the generality of PWBP. Determining the number of
classes depends on the complexity and size of the input space, and there are no special
criteria for this purpose.



                                  ERROR ESTIMATION
      In the present study, to evaluate the error between exact and approximate results, the root
mean squared error (RMSE) is calculated.

                        RMSE = √( (1/n) Σ_{i=1..n} (x_i - x̂_i)² )              (8)

where x_i and x̂_i are the ith components of the exact and approximated vectors, respectively, and n is
the dimension of the vectors. To measure how successful a fit is achieved between the exact and
approximate responses, the R-square statistic is employed. A value closer to 1
indicates a better fit.



                      Rsquare = 1 - Σ_{i=1..n} (x_i - x̂_i)² / Σ_{i=1..n} (x_i - x̄)²     (9)

where x̄ is the mean of the exact vector's components.
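Equations (8) and (9) can be sketched in Python as follows; the function names and the toy vectors are illustrative.

```python
import math

def rmse(exact, approx):
    # Root mean squared error, equation (8):
    # sqrt of the mean of the squared componentwise differences.
    n = len(exact)
    return math.sqrt(sum((x - xh) ** 2 for x, xh in zip(exact, approx)) / n)

def r_square(exact, approx):
    # R-square statistic, equation (9): 1 - (residual sum of squares /
    # total sum of squares about the mean of the exact vector).
    mean = sum(exact) / len(exact)
    ss_res = sum((x - xh) ** 2 for x, xh in zip(exact, approx))
    ss_tot = sum((x - mean) ** 2 for x in exact)
    return 1.0 - ss_res / ss_tot

exact = [1.0, 2.0, 3.0, 4.0]
approx = [1.1, 1.9, 3.2, 3.8]
print(rmse(exact, approx), r_square(exact, approx))
```

An R-square close to 1 (as in this toy fit) indicates that the approximation explains almost all of the variation in the exact response.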




                         TYPES OF MODELS
THE MULTI LAYER FEED FORWARD NEURAL NETWORK (MLFFNN)
        This is perhaps the most popular network architecture in use today, due originally to
Rumelhart and McClelland. In this type of network, each unit performs a biased weighted sum of
its inputs and passes this activation level through a transfer function to produce its output, and
the units are arranged in a layered feed-forward topology. The network thus has a simple
interpretation as a form of input-output model, with the weights and thresholds (biases) as the
free parameters of the model. Such networks can model functions of almost arbitrary
complexity, with the number of layers, and the number of units in each layer, determining the
function's complexity. Important issues in Multilayer Perceptron (MLP) design include
specification of the number of hidden layers and the number of units in these layers.

The number of input and output units is defined by the problem, although there may be some
uncertainty about precisely which inputs to use, a point to which we will return later. For the
moment, however, we will assume that the input variables are intuitively selected and are all
meaningful. The number of hidden units to use is far from clear. As good a starting point as any is
to use one hidden layer, with the number of units equal to half the sum of the number of input and
output units. Again, we will discuss how to choose a sensible number later.








Figure 1: Architecture or Topology



The example is a 3-layer feed forward Artificial Neural Network (ANN) using the back propagation
algorithm as a framework to classify the customers of a company into different categories. The
MLFFNN shown in Figure 1 above for the classification of customer invoicing data is a
two-layer feed forward neural network with a variable architecture. The 5 input nodes
correspond to the 3 sets of 5 values each (of the customer invoicing data) applied to the
neural network as training samples. The 3 nodes in the output layer correspond to the 3
levels of classification of customers: good, average and below average. For the purpose
of this 3-fold classification, a code has been used: 100 is used as the "Desired Output" on the
3 output nodes for a good customer, whereas a code of 010 is used for an average customer.
The code of 001 is used as the desired output for the category of below-average customers.

                     Initializing the weights and training sample
       The weights in the network are initialized to small random numbers. Each unit has a bias
associated with it; the biases are similarly initialized to small random numbers. The training
samples X are:

       (0.4301266, 0.4888733, 0.308413, 0.80125, 0.485115, A, 1, 0, 0)

       (0.0802533, 0.09396, 0.012233, 0.701267, 0.098404, B, 0, 1, 0)

       (0.04794, 0.0460733, 0.002493, 0.306189, 0.040798, C, 0,0, 1)



       where A stands for the training data for good customers, B stands for the training data
for average customers and C stands for the training data for below-average customers. The
processing of the above samples in training mode is given below.

                       The modified back propagation algorithm
(1) Initialize all weights and biases in the network;

(2) While the terminating condition is not satisfied {

(3) For each training sample X in samples {

(4) // Propagate the inputs forward:

(5) For each hidden or output layer unit j {

(6) I_j = Σ_i w_ij·O_i + θ_j; // compute the net input I_j of unit j with respect to the previous layer, i

(7) O_j = a / (1 + e^(-b·I_j)); } // where a and b are control parameters used for the purpose of epoch
control and analysis

(8) // Back propagate the errors:

(9) For each unit j in the output layer

(10) Err_j = O_j·(a - O_j)·(b/a)·(T_j - O_j); // compute the error

(11) For each unit j in the hidden layers, from the last to the first hidden layer

(12) Err_j = O_j·(a - O_j)·(b/a)·Σ_k Err_k·w_jk; // compute the error with respect to the next
higher layer, k

(13) For each weight w_ij in the network {

(14) Δw_ij = (l)·Err_j·O_i; // weight increment

(15) w_ij = w_ij + Δw_ij; } // weight update

(16) For each bias θ_j in the network {

(17) Δθ_j = (l)·Err_j; // bias increment

(18) θ_j = θ_j + Δθ_j; } // bias update }}
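The forward pass and output-error step of the modified algorithm can be sketched in Python. This is a minimal illustration of the per-layer computations, not the full training loop; the function names, the random initialization and the choice a = b = 1 are assumptions.

```python
import math
import random

def mod_sigmoid(I, a, b):
    # Modified logistic activation: O = a / (1 + e^(-b*I)).
    return a / (1.0 + math.exp(-b * I))

def forward(x, W, theta, a, b):
    # One layer pass: I_j = sum_i w_ij * O_i + theta_j, then O_j = f(I_j).
    return [mod_sigmoid(sum(w * xi for w, xi in zip(row, x)) + th, a, b)
            for row, th in zip(W, theta)]

def output_error(O, T, a, b):
    # Output-layer error term: Err_j = O_j*(a - O_j)*(b/a)*(T_j - O_j),
    # using the derivative of the modified sigmoid, f'(I) = (b/a)*O*(a - O).
    return [o * (a - o) * (b / a) * (t - o) for o, t in zip(O, T)]

random.seed(0)
# 5 inputs -> 3 outputs, small random weights and biases as in the text.
W = [[random.uniform(-0.5, 0.5) for _ in range(5)] for _ in range(3)]
theta = [random.uniform(-0.5, 0.5) for _ in range(3)]
x = [0.4301266, 0.4888733, 0.308413, 0.80125, 0.485115]  # the class-A sample
O = forward(x, W, theta, a=1.0, b=1.0)
err = output_error(O, [1, 0, 0], a=1.0, b=1.0)           # desired output 100
```

With a = b = 1 this reduces to the ordinary sigmoid and the familiar delta rule for the output layer.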

                       Different learning rate annealing schedules
       The variable l is the learning rate parameter, a constant typically having a value between
0.0 and 1.0. We have used variable learning-rate annealing schedules for the learning rate
parameter. Ideally, we would like the MLFFNN to learn initially at a rapid rate (l = constant,
where the value of the constant is around 0.8), but as the MLFFNN proceeds through a number of
epochs, we would want it to slow down its learning. This slow-down in the learning process
is achieved by using a search-then-converge schedule, defined by Darken and Moody
(1992) as l = η0 / (1 + (n/τ)), where η0 is given a value of 0.8 and analysis is performed
to find the optimal value of τ (a value of 10000 causes convergence in the error space in an
optimal number of epochs for the customer invoicing data).
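The search-then-converge schedule can be sketched directly, using the stated values η0 = 0.8 and τ = 10000:

```python
def learning_rate(n, eta0=0.8, tau=10000.0):
    # Search-then-converge schedule (Darken and Moody, 1992):
    # l = eta0 / (1 + n/tau), where n counts epochs.
    return eta0 / (1.0 + n / tau)

print(learning_rate(0))      # 0.8: rapid learning in the early epochs
print(learning_rate(10000))  # 0.4: rate is halved once n reaches tau
```

For n much smaller than τ the rate stays near η0 (the "search" phase); for n much larger it decays roughly like η0·τ/n (the "converge" phase).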

Terminating condition
• The mean squared error is below a threshold value.

• A pre-specified number of epochs has expired.

                                      The test sample
        A testing sample (as input) is applied once the network has been trained adequately and
the mean squared error has fallen below the desired level; it serves as a test of the effectiveness
of the analysis model. A test case is prepared from any other unused source system or location
(BSS is used in our invoicing example). The effectiveness is measured on the basis of how closely
the MLFFNN is able to classify a test sample correctly. A typical test sample T for the purpose of
analysis is given below:

(0.3001266, 0.4088733, 0.208413, 0.85125, 0.65115)

For this test sample T, the MLFFNN should correctly classify the customer invoicing data in the
category of good customers.

                              SELF ORGANIZING MAPS


                                  Unsupervised learning

       The complexity of our own brains means that we can achieve multiple categorisation:
we recognise many aspects of any object at the same time. As yet, neural network systems are
very limited in comparison, but simple network structures are known to have the ability to self-
organise.

                     GENERAL IDEA OF THE SOM MODEL
      The Self-Organizing Map (SOM) was introduced by Teuvo Kohonen in 1982. In contrast
to many other neural networks that use supervised learning, the SOM is based on unsupervised
learning. A self-organizing map (SOM) or self-organizing feature map (SOFM) is a type of
artificial neural network that is trained using unsupervised learning to produce a low-dimensional
(typically two-dimensional) representation of the input space of the training samples, called a
map. The SOM can thus serve as a clustering tool for high-dimensional data. Because of its
typical two-dimensional shape, it is also easy to visualize.

        Another important feature of the SOM is its capability to generalize. In other words, it
can interpolate between previously encountered inputs.

                                 THE SOM ALGORITHM
        In a self-organizing map, the neurons are placed at the nodes of a lattice, and they
become selectively tuned to various input patterns (vectors) in the course of a competitive
learning process.
In reality, the SOM belongs to the class of vector-coding algorithms. That is, a fixed number of
codewords are placed into a higher-dimensional input space, thereby facilitating data
compression.
An integral feature of the SOM algorithm is the neighbourhood function centered around a
neuron that wins the competitive process.

The algorithm exhibits two distinct phases in its operation:
1. Ordering phase, during which the topological ordering of the weight vectors takes place
2. Convergence phase, during which the computational map is fine-tuned

        The SOM algorithm exhibits the following properties:
1. Approximation of the continuous input space by the weight vectors of the discrete lattice.
2. Topological ordering exemplified by the fact that the spatial location of a neuron in the lattice
corresponds to a particular feature of the input pattern.
3. The feature map computed by the algorithm reflects variations in the statistics of the input
distribution.
4. SOM may be viewed as a nonlinear form of principal components analysis.


                          SELF- ORGANISING MAP learning
        The basic idea of the SOM is simple yet effective. The SOM defines a mapping from the high-
dimensional input data space onto a regular two-dimensional array of neurons. Every neuron i of
the map is associated with an n-dimensional reference vector m_i, where n denotes the dimension of
the input vectors. The reference vectors together form a codebook. The neurons of the map are
connected to adjacent neurons by a neighbourhood relation, which dictates the topology, or the
structure, of the map. The most common topologies in use are rectangular and hexagonal.

        Adjacent neurons belong to the neighbourhood N_i of the neuron i. In the basic SOM
algorithm, the topology and the number of neurons remain fixed from the beginning. The number
of neurons determines the granularity of the mapping, which has an effect on the accuracy and
generalization of the SOM.
During the training phase, the SOM forms an elastic net that folds onto the "cloud" formed by
the input data. The algorithm controls the net so that it strives to approximate the density of the data.
The reference vectors in the codebook drift to the areas where the density of the input data is
high. Eventually, only a few codebook vectors lie in areas where the input data is sparse.

       The learning process of the SOM goes as follows:

   1. One sample vector x is randomly drawn from the input data set and its similarity (distance) to the
      codebook vectors is computed using e.g. the common Euclidean distance measure. The best-matching
      unit (BMU) c is the neuron whose reference vector is closest to x:

              ||x - m_c|| = min_i ||x - m_i||
   2. After the BMU has been found, the codebook vectors are updated. The BMU itself as well as its
      topological neighbours are moved closer to the input vector in the input space, i.e. the input vector
      attracts them. The magnitude of the attraction is governed by the learning rate. As the learning
      proceeds and new input vectors are given to the map, the learning rate gradually decreases to zero
      according to the specified learning rate function type.

   3. The update rule for the reference vector of unit i is the following:

              m_i(t+1) = m_i(t) + α(t)[x(t) - m_i(t)]   if i ∈ N_c(t)
              m_i(t+1) = m_i(t)                         otherwise

      where t denotes time, α(t) is the learning rate and N_c(t) is the neighbourhood of the BMU c at time t.
   4. Steps 1-3 together constitute a single training step and they are repeated until the training
      ends. The number of training steps must be fixed prior to training the SOM because the rate of
      convergence of the neighbourhood function and the learning rate are calculated accordingly.

       After the training is over, the map should be topologically ordered. This means that n
topologically close (using some distance measure e.g. Euclidean) input data vectors map to n
adjacent map neurons or even to the same single neuron.
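The learning steps above can be sketched as a toy SOM in Python. This is a deliberately simplified illustration: the 1-D lattice, the nearest-neighbour neighbourhood and the linearly decaying learning rate are assumptions, not Kohonen's full 2-D algorithm.

```python
import random

def train_som(data, n_units, steps, eta0=0.5):
    # Toy 1-D SOM: units sit on a line; the neighbourhood of the BMU is
    # the BMU plus its immediate lattice neighbours; the learning rate
    # decays linearly to zero over a fixed number of training steps.
    dim = len(data[0])
    codebook = [[random.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(steps):
        x = random.choice(data)                      # step 1: draw a sample
        bmu = min(range(n_units), key=lambda i:      # step 1: Euclidean BMU
                  sum((c - xi) ** 2 for c, xi in zip(codebook[i], x)))
        eta = eta0 * (1.0 - t / steps)               # decaying learning rate
        for i in (bmu - 1, bmu, bmu + 1):            # steps 2-3: update N_c
            if 0 <= i < n_units:
                codebook[i] = [c + eta * (xi - c)
                               for c, xi in zip(codebook[i], x)]
    return codebook

random.seed(1)
data = [[0.1, 0.1], [0.9, 0.9], [0.1, 0.9]]
cb = train_som(data, n_units=5, steps=200)
```

After training, the codebook vectors have drifted toward the three data clusters, and lattice-adjacent units hold similar reference vectors, which is the topological ordering described above.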




                                     Map quality measures
         After the SOM has been trained, it is important to know whether it has properly adapted
itself to the training data. Because it is obvious that one optimal map for the given input data
must exist, several map quality measures have been proposed. Usually, the quality of the SOM is
evaluated based on the mapping precision and the topology preservation.

                                       Mapping precision
        The mapping precision measure describes how accurately the neurons 'respond' to the
given data set. For example, if the reference vector of the BMU calculated for a given testing
vector x_i is exactly the same as x_i, the precision error is 0. Normally, the number of data
vectors exceeds the number of neurons, and the precision error is thus always different from 0.

       A common measure that calculates the precision of the mapping is the average
quantization error over the entire data set:

              E_q = (1/N) Σ_{i=1..N} ||x_i - m_c(x_i)||

where N is the number of input data vectors and m_c(x_i) is the reference vector of the BMU of x_i.

                                 VISUALIZING THE SOM
        The SOM is easy to visualize, and over the years several visualization techniques have
been devised. Due to the inherently intricate nature of the SOM, however, no visualization
method discovered so far has proven to be superior to the others. At times, several
different visualizations of the same SOM are needed to fully see the state of the map. From this,
it can be concluded that every existing visualization method has its merits and demerits. The unified
distance matrix, or U-matrix, is perhaps the most popular method of displaying SOMs.

                                     Applications of SOM
        The most important practical applications of SOMs are in exploratory data analysis,
pattern recognition, speech analysis, robotics, industrial and medical diagnostics, instrumentation
and control. The SOM can also be applied to hundreds of other tasks where large amounts of
unclassified data are available.

                                    Disadvantages of SOM
        One major problem with SOMs is getting the right data: you need a value for each
dimension of each sample in order to generate a map. Sometimes this simply is not possible,
and often it is very difficult to acquire all of this data, so this is a limiting factor in the use of
SOMs, often referred to as the missing-data problem. Another problem is that every SOM
is different and finds different similarities among the sample vectors. SOMs organize sample
data so that, in the final product, the samples are usually surrounded by similar samples; however,
similar samples are not always near each other. If you have a lot of shades of purple, you will not
always get one big cluster containing all the purples; sometimes the clusters will be split and
there will be two groups of purple. Using colours we could tell that those two groups are in
reality similar and were merely split, but with most data those two clusters will look
totally unrelated. So many maps need to be constructed in order to get one final good map.





                 LEARNING ALGORITHMS
        There are many algorithms for training neural networks; most of them can be viewed as a
straightforward application of optimization theory and statistical estimation. They include back
propagation by gradient descent, Rprop, BFGS, CG, etc. Evolutionary computation methods,
simulated annealing, expectation maximization and non-parametric methods are among the other
commonly used methods for training neural networks. All of this is closely related to machine learning.
Recent developments in this field have also seen particle swarm optimization and other
swarm intelligence techniques used in the training of neural networks.

       Machine learning is a scientific discipline concerned with the design and
development of algorithms that allow computers to evolve behaviours based on empirical data,
such as sensor data or databases. A major focus of machine learning research is to
automatically learn to recognize complex patterns and make intelligent decisions based on data;
the difficulty lies in the fact that the set of all possible behaviours given all possible inputs is too
complex to describe generally in programming languages, so that in effect programs must
automatically describe programs. Artificial intelligence is a closely related field, as are
probability theory and statistics, data mining, pattern recognition, adaptive control, and
theoretical computer science.

                                      Simulated annealing
       Simulated annealing (SA) is a generic probabilistic metaheuristic for the global
optimization problem of applied mathematics, namely locating a good approximation to the
global optimum of a given function in a large search space. It is often used when the search
space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated
annealing may be more effective than exhaustive enumeration, provided that the goal is
merely to find an acceptably good solution in a fixed amount of time, rather than the best
possible solution.

         The name and inspiration come from annealing in metallurgy, a technique involving
heating and controlled cooling of a material to increase the size of its crystals and reduce their
defects. The heat causes the atoms to become unstuck from their initial positions (a local
minimum of the internal energy) and wander randomly through states of higher energy; the slow
cooling gives them more chances of finding configurations with lower internal energy than the
initial one.

        By analogy with this physical process, each step of the SA algorithm replaces the current
solution by a random "nearby" solution, chosen with a probability that depends on the difference
between the corresponding function values and on a global parameter T (called the temperature),
that is gradually decreased during the process. The dependency is such that the current solution
changes almost randomly when T is large, but increasingly "downhill" as T goes to zero. The
allowance for "uphill" moves saves the method from becoming stuck at local optima—which are
the bane of greedier methods.

         In the simulated annealing (SA) method, each point s of the search space is analogous to
a state of some physical system, and the function E(s) to be minimized is analogous to the
internal energy of the system in that state. The goal is to bring the system, from an arbitrary
initial state, to a state with the minimum possible energy.

                                       The basic iteration
At each step, the SA heuristic considers some neighbour s' of the current state s, and
probabilistically decides between moving the system to state s' or staying in state s. The
probabilities are chosen so that the system ultimately tends to move to states of lower energy.



Typically this step is repeated until the system reaches a state that is good enough for the
application, or until a given computation budget has been exhausted.

The neighbours of a state

The neighbours of a state are new states of the problem that are produced after altering the given
state in some particular way. For example, in the traveling salesman problem, each state is
typically defined as a particular permutation of the cities to be visited. The neighbours of some
particular permutation are the permutations that are produced for example by interchanging a
pair of adjacent cities. The action taken to alter the solution in order to find neighbouring
solutions is called "move" and different "moves" give different neighbours. These moves usually
result in minimal alterations of the solution, as the previous example depicts, in order to help an
algorithm to optimize the solution to the maximum extent and also to retain the already optimum
parts of the solution and affect only the suboptimum parts. In the previous example, the parts of
the solution are the parts of the tour.

Searching for neighbours to a state is fundamental to optimization because the final solution will
come after a tour of successive neighbours. Simple heuristics move by finding best neighbour
after best neighbour and stop when they have reached a solution which has no neighbours that
are better solutions. The problem with this approach is that a solution that does not have any
immediate neighbours that are better solutions is not necessarily the optimum. It would be the
optimum if it was shown that any kind of alteration of the solution does not give a better solution
and not just a particular kind of alteration. For this reason it is said that simple heuristics can
only reach local optima and not the global optimum. Metaheuristics, although they also optimize
through the neighbourhood approach, differ from heuristics in that they can move through
neighbours that are worse solutions than the current solution. Simulated Annealing in particular
doesn't even try to find the best neighbour. The reason for this is that the search can no longer
stop in a local optimum and in theory, if the metaheuristic can run for an infinite amount of time,
the global optimum will be found.

The annealing schedule

Another essential feature of the SA method is that the temperature is gradually reduced as the
simulation proceeds. Initially, T is set to a high value (or infinity), and it is decreased at each step
according to some annealing schedule—which may be specified by the user, but must end with T
= 0 towards the end of the allotted time budget. In this way, the system is expected to wander
initially towards a broad region of the search space containing good solutions, ignoring small
features of the energy function; then drift towards low-energy regions that become narrower and
narrower; and finally move downhill according to the steepest descent heuristic.

It can be shown that for any given finite problem, the probability that the simulated annealing
algorithm terminates with the global optimal solution approaches 1 as the annealing schedule is
extended. This theoretical result, however, is not particularly helpful, since the time required to
ensure a significant probability of success will usually exceed the time required for a complete
search of the solution space.

Pseudocode

The following pseudocode implements the simulated annealing heuristic, as described above,
starting from state s0 and continuing to a maximum of kmax steps or until a state with energy
emax or less is found. The call neighbour(s) should generate a randomly chosen neighbour of a
given state s; the call random() should return a random value in the range [0, 1]. The annealing
schedule is defined by the call temp(r), which should yield the temperature to use, given the
fraction r of the time budget that has been expended so far.
s ← s0; e ← E(s)                                 // Initial state, energy.
sbest ← s; ebest ← e                             // Initial "best" solution.
k ← 0                                            // Energy evaluation count.
while k < kmax and e > emax                      // While time left & not good enough:
    snew ← neighbour(s)                          // Pick some neighbour.
    enew ← E(snew)                               // Compute its energy.
    if enew < ebest then                         // Is this a new best?
        sbest ← snew; ebest ← enew               // Save 'new neighbour' to 'best found'.
    if P(e, enew, temp(k/kmax)) > random() then  // Should we move to it?
        s ← snew; e ← enew                       // Yes, change state.
    k ← k + 1                                    // One more evaluation done.
return sbest                                     // Return the best solution found.
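The pseudocode above can be turned into a runnable Python sketch. The acceptance probability P and the linear temp() schedule are assumptions, since the pseudocode leaves them unspecified; the Metropolis rule used here is the usual choice.

```python
import math
import random

def simulated_annealing(E, neighbour, s0, kmax, emax=-float("inf"), t0=1.0):
    # Direct transcription of the pseudocode, minimizing E(s).
    # Acceptance (assumed Metropolis rule): always move downhill,
    # move uphill with probability exp(-(enew - e) / T).
    s, e = s0, E(s0)
    sbest, ebest = s, e
    k = 0
    while k < kmax and e > emax:
        snew = neighbour(s)                     # pick some neighbour
        enew = E(snew)                          # compute its energy
        if enew < ebest:
            sbest, ebest = snew, enew           # save new best
        T = t0 * (1.0 - k / kmax)               # assumed linear annealing schedule
        if enew < e or random.random() < math.exp(-(enew - e) / T):
            s, e = snew, enew                   # move to the neighbour
        k += 1
    return sbest, ebest

random.seed(0)
best, ebest = simulated_annealing(
    E=lambda x: x * x,                          # toy energy with minimum at 0
    neighbour=lambda x: x + random.uniform(-1, 1),
    s0=10.0, kmax=5000)
```

Early on (large T) uphill moves are frequently accepted, letting the search escape local minima; as T falls toward zero the search behaves increasingly like greedy descent.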




Actually, the "pure" SA algorithm does not keep track of the best solution found so far: it does
not use the variables sbest and ebest, it lacks the first if inside the loop, and, at the end, it
returns the current state s instead of sbest. While saving the best state is a standard
optimization, that can be used in any metaheuristic, it breaks the analogy with physical annealing
— since a physical system can "store" a single state only.


In strict mathematical terms, saving the best state is not necessarily an improvement, since one
may have to specify a smaller kmax in order to compensate for the higher cost per iteration.
However, the step sbest ← snew happens only on a small fraction of the moves. Therefore, the
optimization is usually worthwhile, even when state-copying is an expensive operation.
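As a concrete illustration, the pseudocode above translates almost line for line into Python. The Metropolis rule exp(-(enew - e)/T) is used here as one common choice for the acceptance probability P, and the toy energy function, neighbour move and linear cooling schedule are illustrative assumptions rather than part of the algorithm itself.

```python
import math
import random

def simulated_annealing(s0, energy, neighbour, kmax, emax, temp):
    """Minimise `energy`, keeping track of the best state seen (sbest)."""
    s, e = s0, energy(s0)                    # initial state and energy
    s_best, e_best = s, e                    # best solution found so far
    for k in range(kmax):                    # while time budget remains...
        if e <= emax:                        # ...and the state is not yet good enough
            break
        s_new = neighbour(s)                 # pick some neighbour
        e_new = energy(s_new)                # compute its energy
        if e_new < e_best:                   # new best? save it
            s_best, e_best = s_new, e_new
        # Metropolis rule for P: always accept a downhill move; accept an
        # uphill move with probability exp(-(e_new - e) / T).
        t = temp(k / kmax)
        if e_new < e or random.random() < math.exp(-(e_new - e) / t):
            s, e = s_new, e_new              # move to the neighbour
    return s_best

# Toy usage: minimise f(x) = x^2 over the integers by unit steps.
random.seed(1)                               # for reproducibility
best = simulated_annealing(
    s0=10,
    energy=lambda x: x * x,
    neighbour=lambda x: x + random.choice([-1, 1]),
    kmax=10_000,
    emax=0,
    temp=lambda r: max(1e-3, 1.0 - r),       # simple linear cooling schedule
)
print(best)
```

Because sbest is tracked separately, the function returns the best state visited even if the current state has drifted uphill by the time the budget runs out.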

                                 Evolutionary computation
         In computer science, evolutionary computation is a subfield of artificial intelligence (more
particularly, of computational intelligence) that involves combinatorial optimization problems.
Evolutionary computation uses iterative progress, such as growth or development in a
population. This population is then selected in a guided random search using parallel processing
to achieve the desired end. Such processes are often inspired by biological mechanisms of
evolution. The use of Darwinian principles for automated problem solving originated in the
fifties. It was not until the sixties that three distinct interpretations of this idea started to be
developed in three different places.

        Evolutionary programming was introduced by Lawrence J. Fogel in the USA, while John
Henry Holland called his method a genetic algorithm. In Germany, Ingo Rechenberg and Hans-
Paul Schwefel introduced evolution strategies. These areas developed separately for about 15
years. From the early nineties on, they have been unified as different representatives ("dialects")
of one technology, called evolutionary computing. Also in the early nineties, a fourth stream
following the general ideas emerged: genetic programming. These terminologies denote the field
of evolutionary computing and consider evolutionary programming, evolution strategies, genetic
algorithms, and genetic programming as sub-areas.

                                  Evolutionary algorithms
        Evolutionary algorithms form a subset of evolutionary computation in that they generally
only involve techniques implementing mechanisms inspired by biological evolution such as
reproduction, mutation, recombination, natural selection and survival of the fittest. Candidate
solutions to the optimization problem play the role of individuals in a population, and the cost
function determines the environment within which the solutions "live" (see also fitness function).
Evolution of the population then takes place after the repeated application of the above operators.

         In this process, there are two main forces that form the basis of evolutionary systems:
Recombination and mutation create the necessary diversity and thereby facilitate novelty, while
selection acts as a force increasing quality. Many aspects of such an evolutionary process are
stochastic. The pieces of information changed by recombination and mutation are chosen at
random. Selection operators, on the other hand, can be either deterministic or stochastic. In the
latter case, individuals with a higher fitness have a higher chance to be selected than individuals
with a lower fitness, but typically even the weak individuals have a chance to become a parent or
to survive.
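To make the operators above concrete, here is a minimal sketch of an evolutionary algorithm in Python. The bit-string representation, tournament selection, one-point crossover and per-bit mutation rate are illustrative choices for this sketch, not the only possible ones.

```python
import random

def evolve(fitness, length=20, pop_size=30, generations=100,
           mutation_rate=0.05, tournament_k=3):
    """A minimal genetic algorithm over fixed-length bit strings."""
    # Start from a random population of candidate solutions (the "individuals").
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # Selection: tournament -- the fittest of k random individuals is a parent.
            p1 = max(random.sample(pop, tournament_k), key=fitness)
            p2 = max(random.sample(pop, tournament_k), key=fitness)
            # Recombination: one-point crossover mixes the two parents.
            cut = random.randrange(1, length)
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability.
            child = [b ^ 1 if random.random() < mutation_rate else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Usage: the "OneMax" toy problem -- maximize the number of 1 bits.
random.seed(0)
best = evolve(fitness=sum)
print(sum(best))  # close to the maximum of 20
```

Recombination and mutation supply the diversity described above, while tournament selection acts as the quality-increasing force: even weak individuals can win a tournament if their opponents happen to be weaker still.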



                          Expectation maximization algorithm
        In statistics, an expectation-maximization (EM) algorithm is a method for finding
maximum likelihood estimates of parameters in statistical models, where the model depends on
unobserved latent variables. The EM algorithm was explained and given its name in a classic
1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. They pointed out that the method
had been "proposed many times in special circumstances" by earlier authors. EM is an iterative
method which alternates between an expectation (E) step, which computes the expectation of the
log-likelihood evaluated using the current estimate of the parameters, and a maximization (M)
step, which computes parameters maximizing the expected log-likelihood found in the E step.
These parameter estimates are then used to determine the distribution of the latent variables in
the next E step.

        However, the convergence analysis of the Dempster-Laird-Rubin paper was flawed. A
correct convergence analysis was published by C. F. Jeff Wu in 1983. Wu's proof established the
EM method's convergence outside of the exponential family, as Dempster, Laird and Rubin had
claimed.

                                          Description
        Given a likelihood function L(θ; x, z), where θ is the parameter vector, x is the observed
data and z represents the unobserved latent data or missing values, the maximum likelihood
estimate (MLE) is determined by the marginal likelihood of the observed data, L(θ; x); however,
this quantity is often intractable.

The EM algorithm seeks to find the MLE of the marginal likelihood by iteratively applying the
following two steps:

Expectation step: Calculate the expected value of the log-likelihood function with respect to the
conditional distribution of z given x under the current estimate of the parameters θ(t):

        Q(θ | θ(t)) = E_{z | x, θ(t)} [ log L(θ; x, z) ]

Maximization step: Find the parameters that maximize this quantity:

        θ(t+1) = argmax_θ Q(θ | θ(t))
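As a sketch of the two steps above, the following Python code runs EM for a simple two-component, one-dimensional Gaussian mixture. The initialisation from the data range and the fixed number of iterations are simplifying assumptions; practical implementations typically iterate until the log-likelihood stops improving.

```python
import math
import random

def em_gaussian_mixture(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture model."""
    # Crude initialisation from the data range (an assumption of this sketch).
    mu = [min(x), max(x)]        # component means
    var = [1.0, 1.0]             # component variances
    w = [0.5, 0.5]               # mixing weights
    for _ in range(iters):
        # E step: responsibilities r[i][k] = P(component k | x_i, current parameters).
        r = []
        for xi in x:
            p = [w[k] * math.exp(-(xi - mu[k]) ** 2 / (2 * var[k]))
                 / math.sqrt(2 * math.pi * var[k]) for k in range(2)]
            s = p[0] + p[1]
            r.append([p[0] / s, p[1] / s])
        # M step: re-estimate the parameters that maximize the expected log-likelihood.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(x)
            mu[k] = sum(ri[k] * xi for ri, xi in zip(r, x)) / nk
            var[k] = max(sum(ri[k] * (xi - mu[k]) ** 2
                             for ri, xi in zip(r, x)) / nk, 1e-6)  # guard vs. collapse
    return w, mu, var

# Usage: data drawn from two well-separated clusters around 0 and 5.
random.seed(0)
data = ([random.gauss(0, 1) for _ in range(200)]
        + [random.gauss(5, 1) for _ in range(200)])
w, mu, var = em_gaussian_mixture(data)
print(sorted(mu))  # one mean near 0, the other near 5
```

The responsibilities computed in the E step play the role of the conditional distribution of z given x, and the weighted averages in the M step are exactly the parameters that maximize Q for a Gaussian mixture.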

                                         Applications
        EM is frequently used for data clustering in machine learning and computer vision. In
natural language processing, two prominent instances of the algorithm are the Baum-Welch
algorithm (also known as forward-backward) and the inside-outside algorithm for unsupervised
induction of probabilistic context-free grammars.

         In psychometrics, EM is almost indispensable for estimating item parameters and latent
abilities of item response theory models. With its ability to deal with missing data and
unobserved variables, EM is becoming a useful tool for pricing and managing the risk of a
portfolio. The EM algorithm (and its faster variant OS-EM) is also widely used in medical image
reconstruction, especially in positron emission tomography and single-photon emission
computed tomography.





         NEURAL NETWORK SOFTWARE
                                          Simulators
        Neural network simulators are software applications that are used to simulate the
behavior of artificial or biological neural networks. They focus on one or a limited number of
specific types of neural networks. They are typically stand-alone and not intended to produce
general neural networks that can be integrated in other software. Simulators usually have some
form of built-in visualization to monitor the training process. Some simulators also visualize the
physical structure of the neural network.



                       SNNS research neural network simulator
       Historically, the most common type of neural network software was intended for
researching neural network structures and algorithms. The primary purpose of this type of
software is, through simulation, to gain a better understanding of the behavior and properties of
neural networks. Today in the study of artificial neural networks, simulators have largely been
replaced by more general component based development environments as research platforms.

Commonly used artificial neural network simulators include the Stuttgart Neural Network
Simulator (SNNS), Emergent, JavaNNS and Neural Lab. In the study of biological neural
networks, however, simulation software is still the only available approach. In such simulators
the physical, biological and chemical properties of neural tissue, as well as the electromagnetic
impulses between the neurons, are studied. Commonly used biological network simulators
include Neuron, GENESIS, NEST and Brian. Other simulators are XNBC and the BNN Toolbox
for MATLAB.

                                  Data analysis simulators
        Unlike the research simulators, the data analysis simulators are intended for practical
applications of artificial neural networks. Their primary focus is on data mining and forecasting.
Data analysis simulators usually have some form of preprocessing capabilities. Unlike the more
general development environments, data analysis simulators use a relatively simple static neural
network that can be configured. A majority of the data analysis simulators on the market use
self-organizing maps as their core. The advantage of this type of software is that it is relatively
easy to use. This however comes at the cost of limited capability.





     STRENGTHS AND WEAKNESSES OF
            NEURAL NETWORK MODELS
       Philosophers are interested in neural networks because they may provide a new
framework for understanding the nature of the mind and its relation to the brain (Rumelhart and
McClelland 1986, Chapter 1). Connectionist models seem particularly well matched to what we
know about neurology. The brain is indeed a neural net, formed from massively many units
(neurons) and their connections (synapses). Furthermore, several properties of neural network
models suggest that connectionism may offer an especially faithful picture of the nature of
cognitive processing. Neural networks exhibit robust flexibility in the face of the challenges
posed by the real world. Noisy input or destruction of units causes graceful degradation of
function. The net's response is still appropriate, though somewhat less accurate.

        In contrast, noise and loss of circuitry in classical computers typically result in
catastrophic failure. Neural networks are also particularly well adapted for problems that require
the resolution of many conflicting constraints in parallel. There is ample evidence from research
in artificial intelligence that cognitive tasks such as object recognition, planning, and even
coordinated motion present problems of this kind. Although classical systems are capable of
multiple constraint satisfaction, connectionists argue that neural network models provide much
more natural mechanisms for dealing with such problems.

        Over the centuries, philosophers have struggled to understand how our concepts are
defined. It is now widely acknowledged that trying to characterize ordinary notions with
necessary and sufficient conditions is doomed to failure. Exceptions to almost any proposed
definition are always waiting in the wings. For example, one might propose that a tiger is a large
black and orange feline. But then what about albino tigers? Philosophers and cognitive
psychologists have argued that categories are delimited in more flexible ways, for example via a
notion of family resemblance or similarity to a prototype. Connectionist models seem especially
well suited to accommodating graded notions of category membership of this kind. Nets can
learn to appreciate subtle statistical patterns that would be very hard to express as hard and fast
rules. Connectionism promises to explain flexibility and insight found in human intelligence
using methods that cannot be easily expressed in the form of exception free
principles (Horgan and Tienson 1989, 1990), thus avoiding the brittleness that arises
from standard forms of symbolic representation.

       Despite these intriguing features, there are some weaknesses in connectionist models that
bear mentioning. First, most neural network research abstracts away from many interesting and
possibly important features of the brain. For example, connectionists usually do not attempt to
explicitly model the variety of different kinds of brain neurons, nor the effects of
neurotransmitters and hormones. Furthermore, it is far from clear that the brain contains the kind
of reverse connections that would be needed if the brain were to learn by a process like
backpropagation, and the immense number of repetitions needed for such training methods
seems far from realistic. Attention to these matters will probably be necessary if convincing
connectionist models of human cognitive processing are to be constructed. A more serious
objection must also be met. It is widely felt, especially among classicists, that neural networks
are not particularly good at the kind of rule-based processing that is thought to undergird
language, reasoning, and higher forms of thought. (For a well known critique of this kind see
Pinker and Prince 1988.) We will discuss the matter further when we turn to the systematicity
debate.

        Another common criticism of neural networks, particularly in robotics, is that they
require a large diversity of training for real-world operation. A. K. Dewdney, a former Scientific
American columnist, wrote in 1997, "Although neural nets do solve a few toy problems, their
powers of computation are so limited that I am surprised anyone takes them seriously as a
general problem-solving tool."

        Arguments for Dewdney's position are that, to implement large and effective software
neural networks, considerable processing and storage resources must be committed. While the
brain has hardware tailored to the task of processing signals through a graph of neurons,
simulating even a highly simplified form on Von Neumann technology may compel a NN
designer to fill many millions of database rows for its connections, which can lead to excessive
RAM and disk requirements. Furthermore, the designer of NN systems will often need to
simulate the transmission of signals through many of these connections and their associated
neurons, which often demands enormous amounts of CPU processing power and time. While
neural networks often yield effective programs, they too often do so at the cost of time and
money efficiency.

        Arguments against Dewdney's position are that neural nets have been successfully used
to solve many complex and diverse tasks, ranging from autonomously flying aircraft to detecting
credit card fraud. Some other criticisms come from proponents of hybrid models (combining
neural networks and symbolic approaches). They advocate the intermix of these two approaches
and believe that hybrid models can better capture the mechanisms of the human mind.





                                CONCLUSION
         As was shown in this write-up, artificial neural networks have a broad field of
applications. They can do classification, clustering, experimental design, modeling, mapping,
etc. ANNs are quite flexible for adaptation to different types of problems and can be custom-
designed for almost any type of data representation. A warning, however, should be issued here:
the reader should not get over-excited about a new tool just because it is new. A method itself,
no matter how powerful it may seem, can easily fail: first, if the data do not represent, or are not
correlated well enough with, the information sought; second, if the user does not know exactly
what should be achieved; and third, if other standard methods have not been tried as well, just in
order to gain as much insight into the measurement and information space of the data set as
possible. Neural networks require a lot of study, good knowledge of the theory behind them and,
above all, a lot of experimental work before they are applied to their full extent and power.

        The computing world has a lot to gain from neural networks. Their ability to learn by
example makes them very flexible and powerful. Furthermore there is no need to devise an
algorithm in order to perform a specific task; i.e. there is no need to understand the internal
mechanisms of that task. They are also very well suited for real time systems because of their
fast response and computational times which are due to their parallel architecture.

        Neural networks also contribute to other areas of research such as neurology and
psychology. They are regularly used to model parts of living organisms and to investigate the
internal mechanisms of the brain. Perhaps the most exciting aspect of neural networks is the
possibility that some day 'conscious' networks might be produced. There are a number of
scientists arguing that consciousness is a 'mechanical' property and that 'conscious' neural
networks are a realistic possibility. I would like to state that even though neural networks have a
huge potential we will only get the best of them when they are integrated with computing, AI,
fuzzy logic and related subjects.

        Finally, although neural networks are not perfect in their predictions, they often
outperform other methods and provide hope that one day we can more fully understand dynamic,
chaotic systems such as the stock market.



