Chapter 3: Introduction to Neural Networks
     Before we start…

Information processing technology
  inspired by studies of brain and
        the nervous system.
              Brain's Capabilities

   its performance tends
    to degrade gracefully under
    partial damage.
   it can learn (reorganize itself)
    from experience.
   it performs massively parallel computations
    extremely efficiently.
   it supports our intelligence and self-
 What Is A Neural Network?
"...a computing system made up of a number of simple,
   highly interconnected processing elements, which
   process information by their dynamic state response to
   external inputs."

An ANN is a network of many very simple processors
  ("units"), each possibly having a (small amount of)
  local memory. The units are connected by
  unidirectional communication channels
  ("connections"), which carry numeric (as opposed to
  symbolic) data. The units operate only on their local
  data and on the inputs they receive via the connections.
   1943 --- McCulloch and Pitts (start of the modern era of
    neural networks). Logical calculus of neural networks. A
    network consists of sufficient number of neurons (using a
    simple model) and properly set synaptic connections can
    compute any computable function.

   1949 --- Hebb's book "The organization of behavior". An
    explicit statement of a physiological learning rule for
    synaptic modification was presented for the first time.

   Hebb proposes that the connectivity of the brain is
    continually changing as an organism learns differing
    functional tasks, and that neural assemblies are created by
    such changes.
   Hebb's work was immensely influential among

   1958 --- Rosenblatt introduced Perceptron A novel method
    of supervised learning.
               Historical … Cont’d
   Perceptron convergence theorem.

   Least mean-square (LMS) algorithm
   1969 --- Minsky and Papert showed limits on perceptron computation.
    Minsky and Papert showed that there are fundamental limits on what
    single-layer perceptrons can compute.

   They speculated that the limits could not be overcome for the multi-
    layer version

   1982 --- Hopfield's networks. Hopfield showed how to use an "Ising spin
    glass" type of model to store information in dynamically stable

        His work paved the way for physicists to enter neural modeling,
         thereby transforming the field of neural networks.
             HISTORICAL (cont..)
   1982 --- Kohonen's self-organizing maps (SOM). Kohonen's self-organizing
    maps are capable of reproducing important aspects of the structure of
    biological neural nets: Data representation using topographic maps (which
    are common in the nervous systems). SOM also has a wide range of

   SOM shows how the output layer can pick up the correlational structure
    (from the inputs) in the form of the spatial arrangement of units.

   1985 --- Ackley, Hinton, and Sejnowski, developed Boltzmann machine,
    which was the first successful realization of a multilayer neural network.

   1986 --- Rumelhart, Hinton, and Williams developed the back-propagation
    algorithm --- the most popular learning algorithm for the training of
    multilayer perceptrons. It has been the workhorse for many neural network
             Why Neural Nets?
   Adaptive learning: An ability to learn how to do tasks
    based on the data given for training or initial experience.
   Self-Organisation: An ANN can create its own
    organisation or representation of the information it
    receives during learning time.
   Real Time Operation: ANN computations may be carried
    out in parallel, and special hardware devices are being
    designed and manufactured which take advantage of this
   Fault Tolerance via Redundant Information Coding:
    Partial destruction of a network leads to the
    corresponding degradation of performance. However,
    some network capabilities may be retained even with
    major network damage.
              Before we start..

            Processing   Element   Energy   Processing   Style of       Fault      Learns     Intelligent,
            elements     size      use      speed        computation    tolerant              conscious

Brain       10^14        10^-6     30 W     100 Hz       parallel,      Yes        Yes        Usually
            synapses                                     distributed
Computer    10^8         10^-6     30 W     10^9 Hz      serial,        No         A little   Not (yet)
                                                         centralized

Differences between the brain and a computer
Neuron Vs   ANN
           Relationships between
       biological & artificial networks
i.     Soma                 i.     Node
ii.    Dendrites            ii.    Input
iii.   Axon                 iii.   Output
iv.    Synapse              iv.    Weight
v.     Slow Speed           v.     Fast Speed
vi.    Many Neurons - 10^9  vi.    Few Neurons - a dozen to hundreds of
    Summary of selected biophysical mechanisms and their
     corresponding possible neural operations they could

         Biophysical Mechanism                         Neural Operation
   Action potential initiation                Analog OR/AND 1-bit A/D converter
   Repetitive spiking activity                Current-to-frequency transducer
   Action potential conduction                Impulse transmission
   Chemically mediated synaptic               Sigmoid threshold or nonreciprocal
    transduction                               2-port negative resistance
   Electrically mediated synaptic             Reciprocal 1-port resistance
   Distributed excitatory synapses in         Linear addition
    dendritic tree
   Excitatory and inhibitory synapses of      Local AND-NOT “presynaptic
    dendritic spine                             inhibition”
   Long distance action of                    Modulating and routing transmission
                                               of signals
    Neural Network Fundamentals
   Components and Structures
      Composed of processing elements organized in different
       ways to form the network’s structures
   Processing Elements
      Artificial neurons = Processing Elements (PEs)

      Each PE receives inputs, processes them, and delivers a single
       output (refer to diagram)
      Input can be raw data or the output of other processing elements
        Neural Network Fundamentals…
   The Network
       Composed of a collection of neurons grouped
        in layers (input, intermediate, output)
   Network Structure
       Can be organized in several different ways –
        neurons can be connected in different ways
   Network Information Processing
       After structure is determined, information can
        be processed
    Neural Network Fundamentals…
   Input
       Corresponds to a single attribute.
       Input can be text, pictures, voice
       Preprocessing needed to convert this data to meaningful inputs
   Output
       Contains the solution to a problem
       Post-processing is required
   Weights
       Express the relative strength (mathematical value) of the input
       Crucial in that they store learned patterns of information.

                   Neural Network Fundamentals…

                  Summation Function
                     Computes the weighted sum of all the input elements
                      entering each processing element
                     Multiplies each input value by its weight and totals the
                      values for a weighted sum Y.
                     The formula, for n inputs x_i with weights w_i, is

                        $$ Y = \sum_{i=1}^{n} x_i w_i $$

                     And for the jth neuron of several in a layer

                        $$ Y_j = \sum_{i=1}^{n} x_i w_{ij} $$

                      The summation function computes the internal stimulation
                       or activation level of the neuron. The neuron may or may
                       not produce an output
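
A minimal Python sketch of the summation function described above. The function names and the sample numbers are illustrative assumptions, not part of the slides.

```python
def weighted_sum(inputs, weights):
    """Return Y, the sum of each input multiplied by its weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def layer_sums(inputs, weight_columns):
    """Return one weighted sum Y_j per neuron j.

    weight_columns[j][i] plays the role of w_ij (input i into neuron j).
    """
    return [weighted_sum(inputs, column) for column in weight_columns]


# Illustrative values only.
x = [3.0, 1.0, 2.0]
w = [0.2, 0.4, 0.1]
y = weighted_sum(x, w)  # 3*0.2 + 1*0.4 + 2*0.1, about 1.2
two_neurons = layer_sums([1.0, 2.0], [[1.0, 0.0], [0.5, 0.5]])
```

Each neuron in a layer sees the same inputs but applies its own set of weights, which is why the second function simply repeats the first once per neuron.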
    Neural Network Fundamentals…
   Transformation (Transfer) Function
       This function produces the output after the summation
        function has been computed (if necessary).
       A popular and useful nonlinear transfer function is the
        sigmoid function:

        $$ Y_T = \frac{1}{1 + e^{-Y}} $$

       Y_T = transformed (normalized) value of Y
       The transformation modifies the output level to be within
        reasonable values (0 to 1)
       This is performed before the output reaches the next level
       Without the transformation, the values can become very large,
        especially when there are several layers of neurons
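
A short Python sketch of the transfer step, assuming the sigmoid meant here is the standard logistic function 1 / (1 + e^(-Y)). The function name and the sample values of Y are illustrative assumptions.

```python
import math


def sigmoid(y):
    """Logistic transfer function: squashes any real Y to a value between 0 and 1."""
    # Two branches so that math.exp never receives a large positive argument.
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


# Large weighted sums are pulled back toward 0 or 1.
for y in (-5.0, 0.0, 1.2, 5.0):
    print(f"Y = {y:5.1f} -> Y_T = {sigmoid(y):.4f}")
```

Because every output stays between 0 and 1, the values passed on to the next layer cannot grow without bound, however many layers follow.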
           Learning Algorithm
   There are many learning algorithms,
    classified as supervised learning and
    unsupervised learning.
   Supervised learning – uses a set of inputs
    for which the appropriate (desired) outputs
    are known
   Unsupervised Learning – only input stimuli
    are shown to the network. The network is
    2 Main Types of ANN

Supervised              Unsupervised

e.g:                    e.g:
 Adaline               Competitive learning networks
 Perceptron              - SOM
 MLP                     - ART families
 RBF                     - neocognitron
 Fuzzy ARTMAP            - etc.
 etc.
Supervised Network
[Figure: supervised network; diagram labels: error, +]

Unsupervised ANN
[Figure: unsupervised network]
              How does an ANN learn
   [Figure: neurons arranged in an input layer, a middle layer, and an output layer, producing outputs]
   Neurons are connected by links; each link has a numerical weight
   Weight
       basic means of long-term memory in ANNs
       expresses the strength
   Learns through repeated adjustments of these weights
        Learning Process of ANN
   Learn from experience
       Learning algorithms
       Recognize pattern of
   Involves 3 tasks
       Compute outputs
       Compare outputs with desired targets
       Adjust the weights and repeat the process
   [Flowchart: Compute output -> "Is Output" decision -> No: Adjust Weight and compute again; Yes: Stop]
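
The three tasks above can be sketched in Python as follows. This is a minimal illustration using the classic perceptron update rule on a single neuron; the rule, the step threshold, the function names, and the AND-gate data are all assumptions made for the example, not something the slides specify.

```python
def step(y):
    """Threshold transfer function: fire (1) when the weighted sum is non-negative."""
    return 1 if y >= 0 else 0


def predict(inputs, weights, bias):
    return step(sum(x * w for x, w in zip(inputs, weights)) + bias)


def train(samples, alpha=0.5, max_epochs=100):
    """Repeat compute / compare / adjust until every sample is classified correctly."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            output = predict(inputs, weights, bias)  # 1. compute the output
            delta = target - output                  # 2. compare with the desired target
            if delta != 0:
                errors += 1
                # 3. adjust the weights in proportion to the error
                weights = [w + alpha * delta * x for w, x in zip(weights, inputs)]
                bias += alpha * delta
        if errors == 0:  # the "Stop" branch of the flowchart
            break
    return weights, bias


# Hypothetical training set: the logical AND of two inputs.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(and_gate)
predictions = [predict(inputs, weights, bias) for inputs, _ in and_gate]
```

The learned knowledge ends up entirely in `weights` and `bias`, which matches the idea that weights are the network's long-term memory.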
    NN Application Development
   Similar to the structured design methodologies of
    traditional computer-based IS
   There are 9 steps (Turban and Aronson, 2001)
       Collect data
       Separate into training and test sets
       Define a network structure
       Select a training algorithm
       Set parameters and values; initialize weights
       Transform data to network inputs
       Start training and determine and revise weights
       Stop and test
       Implementation; use the network with new cases
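
Two of the steps above, separating the data and transforming it into network inputs, can be sketched in Python as below. The helper names, the 80/20 split, and the min-max scaling are illustrative assumptions rather than anything prescribed by the methodology.

```python
import random


def split_train_test(cases, test_fraction=0.2, seed=0):
    """Shuffle a copy of the cases and return (training set, test set)."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def min_max_scale(values, low, high):
    """Map raw attribute values linearly so that low -> 0 and high -> 1."""
    span = high - low
    return [(v - low) / span for v in values]


cases = list(range(10))  # stand-in for ten collected cases
train_set, test_set = split_train_test(cases)
scaled = min_max_scale([20, 35, 50], low=20, high=50)
```

Keeping the test set out of training is what makes the later "stop and test" step a fair check on new cases.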
  What Applications Should
Neural Networks Be Used For?

   capturing associations or discovering
    regularities within a set of patterns;
   where the volume, number of variables or
    diversity of the data is very great;
   the relationships between variables are
    vaguely understood; or,
   the relationships are difficult to describe
    adequately with conventional approaches.
Mathematic Relate
    Neural Network Architecture
   Feedforward Flow
       Algorithms – Backpropagation, Madaline III
       Neuron output is fed forward to the subsequent layer
       Solving problem – static pattern recognition,
        classification and generalization problems
        (eg: quality control, loan
   Recurrent Structure
       Algorithms – TrueTime Algorithm
       Neuron output is fed back as neuron input
       Solving problem – dynamic time-dependent problems
        (e.g: sales forecasting, process analysis, sequence
        recognition, and sequence generation)
              Topologies of ANN
   [Figure: three topologies: fully-connected feed-forward network, partially recurrent network, fully recurrent network]
            Advantages of ANN
   Parallel processing
   Distributed representations
   Online (i.e., incremental) algorithm
   Simple computations
   Robust with respect to noisy data
   Robust with respect to node failure
   Empirically shown to work well for many
    problem domains
            Disadvantages of ANN
   Slow training
   Poor interpretability
   Network topology layouts ad hoc
   Hard to debug because distributed
    representations preclude content checking
   May converge to a local, not global,
    minimum of error
   Not known how to model higher-level
    cognitive mechanisms
   May be hard to describe a problem in
    terms of features with numerical values
            Limitation of ANN
   Lack of explanation capability
   Do not produce an explicit model
   Do not perform well on tasks that people
    do not perform well on
   Require extensive training and testing of
              Applications of NN
   Because neural networks are best at identifying patterns
    or trends in data, they are well suited for prediction or
    forecasting needs, including:
       sales forecasting
       industrial process control
       customer research
       data validation
       risk management
       target marketing
       Example of Applications

   NETtalk (Sejnowski and Rosenberg, 1987)
      Maps character strings into phonemes for learning
       speech from text.
   Neurogammon (Tesauro and Sejnowski, 1989)
      Backgammon learning program

   Speech recognition (Waibel, 1989)
      Converts sound to text

   Character recognition (Le Cun et al., 1989)
   Face Recognition (Mitchell)
   ALVINN (Pomerleau, 1988)
                   Other Issues
   How to Set Alpha, the Learning Rate Parameter?
    Use a tuning set or cross-validation to train using several
    candidate values for alpha, and then select the value that
    gives the lowest error
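
A minimal Python sketch of choosing alpha with a tuning set. To stay short it trains a single linear unit with the LMS rule as a stand-in for a full network; the model, the data, the candidate values, and all function names are assumptions made for illustration.

```python
def train_linear(train, alpha, epochs=50):
    """Fit t ~ w*x + b with the LMS rule; a stand-in for any ANN trainer."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in train:
            err = t - (w * x + b)
            w += alpha * err * x
            b += alpha * err
    return w, b


def mse(model, data):
    """Mean squared error of the model (w, b) on a data set."""
    w, b = model
    return sum((t - (w * x + b)) ** 2 for x, t in data) / len(data)


def select_alpha(train, tuning, candidates):
    """Train once per candidate alpha and keep the one with the lowest tuning error."""
    scored = [(mse(train_linear(train, a), tuning), a) for a in candidates]
    return min(scored)[1]


# Hypothetical data from t = 2x + 1, split into training and tuning sets.
train = [(x / 10, 2 * (x / 10) + 1) for x in range(0, 10, 2)]
tuning = [(x / 10, 2 * (x / 10) + 1) for x in range(1, 10, 2)]
candidates = [0.001, 0.01, 0.1, 0.5]
best_alpha = select_alpha(train, tuning, candidates)
```

The tuning set is kept separate from the training data so that the chosen alpha reflects performance on cases the model did not train on.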

   How to Estimate the Error?
    Use cross-validation (or some other evaluation method)
    multiple times with different random initial weights. Report
    the average error rate.

   How many Hidden Layers and How many Hidden Units per Layer?
    Usually just one hidden layer is used (i.e., a 2-layer
    network). How many units should it contain? Too few =>
    can't learn. Too many => poor generalization. Determine
    experimentally using a tuning set or cross-validation to
    select number that minimizes error.
            Other Issues (cont..)
   How many examples in the Training Set?
    Under what circumstances can I be assured that a net
    that is trained to classify 1 - e/2 of the training set
    correctly, will also classify 1 - e of the testing set
    correctly? Clearly, the larger the training set the better
    the generalization, but the longer the training time
    required. But to obtain 1 - e correct classification on the
    testing set, training set should be of size approximately
    n/e, where n is the number of weights in the network and
    e is a fraction between 0 and 1. For example, if e=.1 and
    n=80, then a training set of size 800 that is trained until
    95% correct classification is achieved on the training set,
    should produce 90% correct classification on the testing set.
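
The arithmetic of this rule of thumb, worked in Python with the numbers from the example above. The function name is an illustrative assumption.

```python
def training_set_size(n_weights, e):
    """Rule of thumb from the text: about n / e training examples."""
    if not 0 < e < 1:
        raise ValueError("e must be a fraction between 0 and 1")
    return round(n_weights / e)


n, e = 80, 0.1
size = training_set_size(n, e)  # about n / e = 800 examples
train_target = 1 - e / 2        # train until 95% of the training set is correct
test_target = 1 - e             # then expect about 90% on the testing set
```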
            Other Issues (cont..)
When to Stop?

   Too much training "overfits" the data, and hence the
    error rate will go up on the testing set. Hence it is not
    usually advantageous to continue training until the MSE is
    minimized. Instead, train the network until the error rate
    on a tuning set starts to increase.
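
A small Python sketch of this stopping rule. It assumes the tuning-set error has already been measured after each training epoch; the function name, the `patience` parameter, and the error values are illustrative assumptions.

```python
def early_stopping_epoch(tuning_errors, patience=1):
    """Return the epoch with the lowest tuning error seen before the error rises.

    Training is considered finished once the tuning error has failed to
    improve for `patience` consecutive epochs.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(tuning_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # the error has started to increase: stop training
    return best_epoch


# Hypothetical tuning-set error after each epoch: it falls, then rises.
errors = [0.40, 0.31, 0.25, 0.22, 0.24, 0.27, 0.33]
stop_at = early_stopping_epoch(errors)  # epoch 3, where the error is 0.22
```

In practice the weights saved at the returned epoch are the ones to keep, since later epochs fit the training data more closely but do worse on unseen data.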
