Neural Nets, Genetic Algorithms and Fuzzy Logic

Key Tools for AI Programming
                               By Jason Kulhanek

Solving real-world problems using only one AI technology, such as Neural Nets, Genetic Algorithms or Fuzzy Logic, can be difficult due to their unique characteristics and inherent weaknesses. They all have their strengths: Neural Nets are dynamic and robust, Genetic Algorithms are highly adaptable and Fuzzy Logic breaks down normal barriers in defining everyday problems. Unfortunately, these technologies also have their weaknesses: Neural Nets may freeze in local minimums (not finding the optimal solution), Genetic Algorithms are limited to specific problem types and Fuzzy Logic is only adaptable within its own domain. Developers can overcome most of these weaknesses, but only after sufficient programming and customizing.

The simplest approach for creating a complete, robust, self-adapting system is to use hybrids: combinations of these technologies. This approach offers faster development time, producing products that overcome the inherent weaknesses of each individual technology. Reusable code via classes and inheritance makes this task much easier than it may first appear. These systems were not economical a few years ago, with applications running for days or even weeks to fully optimize. One of the systems I developed uses over 300 MB of memory to run efficiently, something that was very expensive only a few years ago. Today's standard desktop can run these solutions in a few hours, or even minutes. With the availability of increased computing power, AI can now begin to achieve its real potential.

After creating numerous AI systems covering a variety of fields, including human genome classification systems, credit card debt systems and financial forecast systems (my specialty), I have developed a hybrid approach for solving most problems. How does one create a new system in a domain in which one does not have the expertise (I'm no Bio-Technician, for instance)? With modern AI technology and hybrid systems, you do not have to be an expert; just be creative and understand the AI technology and tools.

Well-designed hybrid systems are self-adapting, making it easier than one would expect to solve problems. This is extremely important for the new challenges facing AI technology today. Building a simple back-prop neural net or a Kohonen classification system to solve simple problems is no longer a technical challenge; the Internet has tons of code related to these basic systems. Since the novice and the professional both have access to similar code, what separates them is that the professional understands what the code is actually doing and knows how to apply it efficiently and correctly. Top professionals know how to combine these systems to create ultimate self-adapting hybrid systems.

Shortcomings of Specialized Systems
A common problem to which AI is often applied is stock market prediction, where a standard back-prop neural network gives satisfactory results on the test and evaluation sets. However, when used in real life, these systems usually fail, typically doing not much better than the 50/50 you get by flipping a coin (or we would have many more millionaires). What is missing? In simple terms, the dynamics of the stock market are too complicated for a simple neural network to capture. In addition, the optimization of any single neural network will only be as good as its domain allows (its input vs. output relationships). Instead, the solution demands a more complex, hybrid system, in which each neural network captures something a little different and the system takes it all into account, combining them into a single output. This article presents a walk-through of the basic steps for creating dynamic, self-adapting hybrid systems that go beyond those normally seen today.

Beginning a Project
When beginning any project, two key areas can make or break a system, and focusing on their details is important. Although they seem basic, these concepts form the basis for most AI systems:
• What is the system's expected output (that is, what is the goal)?
• What variables are available, and how do they interrelate? (Note: use all available variables, since they can be optimized later.)

PCAI                                                             18                                                                17.2

Project Output
Possibly an AI system's most important consideration is the expected output. Understanding the available variables and how they relate to each other defines how the data will be preprocessed and presented to the network. A key component of any successful AI system is data preprocessing, a fact that cannot be overstressed. Preprocessing normalizes the data, increases relevancy and removes anomalies that might cause learning problems in the system. Whether to use a simple linear transform, a complex wavelet transform or some other operation depends on the data, the system used and the expected output. This could be the subject of an entirely separate article.

Project Feasibility
Another crucial design decision is determining whether building this system is even possible and, once it is built, whether any optimization is possible. Knowing the target output and its relationship to the inputs is critical in designing a successful system: the solution to any problem solved with this technology must be available within the input data.

Consider a simple example of a loan approval program with inputs that include income, debt, own/rent, etc. These are all relevant to the loan approval output (it's the same information found on the loan application). With this information, it is possible to make a valid credit/risk determination.

This concept is not always clear. I have had requests to build a system that predicts the lottery (wouldn't that be nice?). The output of this system would be the next lottery pick based upon past lottery picks. It is easy to see that every time the lottery is drawn, the odds are the same. Therefore, this lottery-predicting program becomes an expensive random number generator, since the information in the input cannot predict the output. After careful examination of a potential system, if the inputs are not relevant to the output solution, the system will never be successful. Determine the relevancy or move on; understanding the relevancy will help in determining the final tests involved in optimizing the system.

Information Flow
Combining these two concepts establishes the foundation for the system's information flow. Analyzing this information flow helps in selecting an appropriate core AI system or combination of systems, and determining the expected output is critical in deciding the core system. For example, the loan program is best implemented as a Classification system, classifying each person as approved or declined. A stock-predicting program would be exactly as stated: a Prediction system. Both of those choices follow from looking at the expected outputs of the system. The main choices are: Classifier, Predictor, Associative or Fuzzy Logic.

After determining the foundation of the system, document all variables involved in the system: the input variables as well as the system's structure variables. This information supplies the details needed beyond the foundation. For example, consider a standard back-propagation
(Continued on Page 21)




   Sidebar On Neural Networks
   Neural Nets are fascinating mathematical models
capable of adapting to a large variety of problems.
Available in an assortment of structures and sizes, the
majority of nets follow the basic principle of a
neuron/node attached to other neurons/nodes via
weights. Each node sums its weighted inputs and passes the result through some form of activation function (typically sigmoidal); repeating this layer by layer generates a final output/result.
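To make the layer-by-layer summation concrete, here is a minimal Python sketch of a forward pass. It is an illustration, not code from any particular package: each layer is assumed to be a (weights, biases) pair, where weights[j][i] connects node i of the previous layer to node j, and the bias term is an assumption the sidebar does not mention. The weight values in the usage lines are arbitrary.

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Propagate inputs through the net one layer at a time.

    layers: list of (weights, biases); weights[j][i] is the weight from
    node i of the previous layer to node j of this layer.
    """
    activations = list(inputs)
    for weights, biases in layers:
        # Each node sums its weighted inputs (plus a bias), then applies the activation.
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical 2-input, 2-hidden-node, 1-output net with arbitrary weights.
net = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [0.05]),                   # output layer
]
print(forward([1.0, 0.5], net))
```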

Types of Training
     Learning takes place in several forms, depending on whether the system is performing supervised training (the expected output is known and used during training) or unsupervised training (more of a clustering/classification solution, where the output may not be known beforehand). Understanding the problem along with the associated variables will dictate which type of system to create.
Figure 1: Typical Back-Propagation Neural Net
     Understand the limitations of Neural Nets before implementing any complete system. A key concept to recognize is that the system cannot give out more information than it receives: if it does not receive the proper input, it cannot produce the proper output. Although often referred to as “Black Boxes”, Neural Nets are not “Magic Boxes”. Knowing the relevancy of the inputs vs. outputs before developing a system greatly increases the success rate. Additional limitations of neural networks are: Convergence (failing to find a solution), Local Minimums (finding a poor solution when a better one exists) and Over Training (the network learns to match only the training inputs, similar to memorization).

Learning Algorithm
     The next area to examine, the Learning Algorithm, is the second most powerful system component. Specialized Learning Algorithms help overcome an assortment of problems known to plague Neural Networks. One important aspect, a variable learning rate, improves convergence without decreasing performance. However, a variable learning rate does not mean a single learning rate that varies for the entire system. When dealing with multi-dimensional data, the search space is multi-dimensional as well, so one particular dimension (variable) may not do as well using a learning rate optimized for a different dimension. A variable learning rate at the “Layer” level is good, at the “Neuron/Node” level it is even better, and at the “Weight” level it is best, provided a stable algorithm to handle the training is available (these systems can oscillate wildly if improperly constructed).
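The author's own learning algorithm is proprietary and is not reproduced here. As a stand-in, the following Python sketch shows one well-known way to give every weight its own learning rate: a simple sign-agreement rule, similar in spirit to delta-bar-delta and SuperSAB, that speeds a weight up while its gradient keeps the same sign and slows it down when the sign flips. The growth and shrink factors, the rate cap and the two-weight error surface are hypothetical choices for illustration.

```python
def train_per_weight_rates(grad_fn, weights, steps=200, init_rate=0.01,
                           up=1.2, down=0.5, max_rate=1.0):
    """Gradient descent where every weight keeps its own learning rate.

    A weight's rate grows by `up` while its gradient keeps the same sign
    (steady progress) and shrinks by `down` when the sign flips (overshoot).
    """
    w = list(weights)
    rates = [init_rate] * len(w)
    prev_grad = [0.0] * len(w)
    for _ in range(steps):
        grad = grad_fn(w)
        for i, g in enumerate(grad):
            agreement = g * prev_grad[i]
            if agreement > 0:        # same direction as last step: speed up
                rates[i] = min(rates[i] * up, max_rate)
            elif agreement < 0:      # sign flipped: we overshot, slow down
                rates[i] *= down
            w[i] -= rates[i] * g
        prev_grad = grad
    return w, rates

# Hypothetical error surface 0.5 * (50 * w0**2 + w1**2): steep along w0, shallow along w1.
def grad_fn(w):
    return [50.0 * w[0], 1.0 * w[1]]

final_w, final_rates = train_per_weight_rates(grad_fn, [1.0, 1.0])
print(final_w, final_rates)
```

On this surface the steep weight settles with a small rate while the shallow weight's rate grows to the cap, which is the per-weight behavior the text argues for. A single global rate would have to stay small enough for the steep direction and would crawl along the shallow one.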

Local Minimums
    Neural Nets often get stuck in local minimums, especially when the search space is multi-dimensional. Picture a mountain terrain in which a huge ball is rolling around, looking for the lowest spot. Depending on the terrain and the weight of the ball, there may be hills it cannot overcome to reach that lowest point. To help overcome these, a momentum term is added to the ball each time it rolls, equivalent to giving it a little extra push. This extra momentum carries it over small hills that it could not otherwise overcome. With too large a push, though, it could overshoot the desired lowest spot and become stuck behind another hill; with too small a push, it will not overcome the hills it should.
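The "extra push" is usually implemented as a velocity that carries a fraction of the previous step into the next one. The Python sketch below is a minimal illustration under assumed values (learning rate 0.01, momentum 0.9) on a simple bowl-shaped error f(x) = x^2 / 2. A bowl has no local minimums, so it only shows the push accumulating; whether momentum clears a particular hill depends on the terrain, exactly as the text says.

```python
def descend(grad, x0, lr=0.01, momentum=0.0, steps=300):
    """1-D gradient descent; momentum > 0 adds a fraction of the previous
    step (the 'extra push') to every new step."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad(x)
        x += velocity
    return x

# Hypothetical shallow valley f(x) = x**2 / 2, whose gradient is x; minimum at 0.
plain = descend(lambda x: x, 1.0)
pushed = descend(lambda x: x, 1.0, momentum=0.9)
print(f"without momentum: {plain:.2e}  with momentum: {pushed:.2e}")
```

With momentum, x also swings past 0 and back before settling, a small-scale version of the overshoot the text warns about when the push is too large.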

Over Training
      Just about anybody who works with Neural Nets is familiar with the term “over training”: a Neural Network learns the training data itself and does not generalize well outside of it. Consider an example with polynomials. A 2- or 3-degree polynomial on a 50-point data set will generalize well, maybe even too much. On the other hand, a 40-degree polynomial on the same 50-point data set will fit the data very closely but will do poorly on data outside of the samples. Weight decay algorithms prevent large weights, which are known to cause over-trained systems because in some cases they let the net fit the data too easily. By forcing the neural net to spread the information throughout its weights using weight decay (which introduces a slight bit of noise as a side effect), you end up with a more generalized neural net that performs much better in the real world. Combining these techniques creates a very aggressive learning algorithm, balanced by the weight decay.
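Weight decay amounts to adding a term proportional to each weight to its gradient, so every update also shrinks the weight slightly toward zero. The Python sketch below uses standard L2 weight decay on a one-feature linear model rather than the polynomial example or the author's exponential weight decay; the data points, learning rate and decay value are hypothetical.

```python
def train_linear(xs, ys, lr=0.1, decay=0.0, epochs=500):
    """Fit y ~ w.x by batch gradient descent on half the mean squared error.
    decay > 0 applies L2 weight decay, pulling every weight toward zero."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / n
        # Weight decay: add decay * w to each gradient, shrinking every weight.
        w = [wi - lr * (gi + decay * wi) for wi, gi in zip(w, grad)]
    return w

# Hypothetical data lying exactly on y = 2x.
xs, ys = [[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0]
print(train_linear(xs, ys), train_linear(xs, ys, decay=1.0))
```

Without decay the weight converges to 2.0, the exact fit; with decay=1.0 it settles at 28/17 ≈ 1.65, the minimum of the squared error plus the decay penalty.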
      I've designed a proprietary algorithm that adjusts the learning rate at the weight level based upon past data vs. past error, along with individual momentum terms and an exponential weight decay algorithm, for solid learning/generalization capabilities in almost any environment (it actually scales the learning algorithm itself according to past training data, making it self-adapting). This algorithm is used in ANNI, an Artificial Neural Network Investing program found at http://NeuralInvesting.com.
      To find more information on Neural Networks, see the “Neural Network Primer” in Volume 16.3 of PC AI Online Magazine or visit the PC AI home page at www.pcai.com and search for Neural Networks. For a 3D picture of a learning algorithm and its theoretical search space (comparing my old standard gradient descent learning algorithm with my new auto-scaling, fully optimized gradient descent learning algorithm), visit http://NeuralInvesting.com/LearningAlgorithm.htm. It is very interesting to see a learning algorithm visually rather than just its equation.

(Continued from p. 19)
neural network, where system variables could be:
• Inputs used (using all supplied inputs may not be desirable)
• Number of hidden neurons
• Type of activation function
• Learning rate
• Momentum term, etc.

These become the Chromosomes in the Genome part of the genetic algorithm and the variables to be optimized (the hybrid part), so they all need to be documented. After completing this list of variables, make one more pass down the list to supply a range for each variable. Consider inputs, for example: the range could be a minimum of 3 and a maximum of all that are available. The system operates and optimizes within these defined ranges. Multiplying the sizes of all the ranges together produces the mathematical optimization universe, the total number of possible combinations. If this number is excessively large, consider reducing the ranges of some variables or using steps (e.g., 1,2,3…25 becomes 1,3,5…25). These ranges do not have to be precise at first. Evaluating the system after a few application runs will help fine-tune these ranges (more on this later).

These systems do not start with one neural net but with an entire population of neural nets, set up differently based upon the genetic genome's ranges (the structure described above that holds the variables/chromosomes). This supplies a population of comparable nets whose results are used to further optimize the variables via the Genetic Crossover. However, before establishing the Genetic Crossover, it is important to define the tests that determine the best networks. This is critical to the system's optimization, since it determines which nets survive and reproduce and which do not.

Network Evaluation
The next step is deciding how to evaluate each neural network's performance (which tests to perform), and this must be done carefully and with a lot of deliberation. Relate these tests to the goals of the system as closely as possible, as they will make up the Genetic Algorithm's Fitness Function. In a financial forecast system, these may be the results from a correlation analysis, the average deviation of forecasted from actual values, etc. Perform these tests on an out-of-sample group, not a group used in the training or testing of the nets during their learning phase.

Collect all tests associated with how the inputs relate to the outputs, placing them together, since they will make up the entire system evaluation. If the developers are not domain experts, this is where working closely with a real expert is important, making sure the tests are relevant to the desired results. When performing the tests, rank each test result separately and combine them later in a master ranking. Combine them independently of their ranking, since mediocre tests can be included and later downplayed; this method helps decide ties in the system. Use the master rank value when performing the genetic crossover.

This approach enables a proper balance between the importance of each test and its relationship to the dynamic variables in the neural network structure. By doing this, the system generates nets that produce the best output without requiring the developer to know which variables are important. In addition, the average results from the top-ranked nets are available for other uses (e.g., “early stopping” algorithms during training, or even dynamic Genetic Fitness Functions).
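The range bookkeeping and the master ranking described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the ranges, test names and scores are hypothetical, each test is ranked separately (0 = best), and the per-test ranks are summed with optional weights so that a mediocre test can be downplayed. The article does not specify how its master ranking combines the tests, so treat this as one reasonable scheme.

```python
from math import prod

# Hypothetical ranges for the back-prop variables listed above.
GENOME_RANGES = {
    "num_inputs": list(range(3, 26)),         # minimum of 3, up to 25 available inputs
    "hidden_neurons": list(range(1, 26, 2)),  # stepped: 1, 3, 5 ... 25
    "activation": ["sigmoid", "tanh"],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "momentum": [0.0, 0.5, 0.9],
}

def universe_size(ranges):
    """Size of the optimization universe: product of every range's length."""
    return prod(len(values) for values in ranges.values())

def rank(scores, higher_is_better):
    """Rank nets on a single test (0 = best)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=higher_is_better)
    ranks = [0] * len(scores)
    for position, net in enumerate(order):
        ranks[net] = position
    return ranks

def master_ranking(tests, weights=None):
    """tests: {name: (scores per net, higher_is_better)}.
    Ranks each test separately, adds the (optionally weighted) ranks and
    returns net indices ordered best first."""
    weights = weights or {}
    n_nets = len(next(iter(tests.values()))[0])
    total = [0.0] * n_nets
    for name, (scores, higher_is_better) in tests.items():
        for net, r in enumerate(rank(scores, higher_is_better)):
            total[net] += weights.get(name, 1.0) * r
    return sorted(range(n_nets), key=lambda net: total[net])

def top_average(values, ranking, k):
    """Average of a per-net value (e.g. hidden-neuron count) over the k best nets."""
    best = ranking[:k]
    return sum(values[net] for net in best) / len(best)

print(universe_size(GENOME_RANGES))
tests = {
    "correlation": ([0.9, 0.4, 0.7], True),     # higher is better
    "avg_deviation": ([1.2, 0.8, 1.0], False),  # lower is better
}
ranking = master_ranking(tests, weights={"avg_deviation": 0.5})  # downplay one test
print(ranking, top_average([10, 20, 30], ranking, k=2))
```

Here the universe holds 23 × 13 × 2 × 4 × 3 = 7,176 combinations; without the stepped hidden-neuron range (all 25 values from 1 to 25) it would be 13,800. The top_average helper is one possible reading of using the averages from the top-ranked nets to set otherwise static variables.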




Typically, the developer does not understand, or even know, what is considered good or best when determining a value for these static variables. Using the averages from the top nets allows the variables to be set dynamically, which enables the system to adapt to any training environment.

System Tuning
After some initial trial runs, check and fine-tune the ranges used in the GA Genomes to reduce the variables that are not needed and to increase the ones of greatest importance. When inspecting the net population, it is easy to identify the ranges the system requires, even after only one run. The goal is to guide the system into collecting near the center of the selected ranges. If it collects too much to one side, adjust the range accordingly. Additional adjustment runs will ensure the new ranges are correct.

Since the genetic algorithms optimize the neural nets' structures to produce the best output possible with the given data, these systems produce astonishing results. Tracking the performance of all nets from generation to generation easily shows this. During the first few generations, some nets perform poorly. This illustrates that if you were limited to a standard system and had unfortunately picked that combination of variables, the solution would have been doomed from the start.

As if a fully optimized system structure were not enough of a reward for generating a hybrid system, the final reward is the ability to implement group decision making. Just as in a business, important decisions are made by a group of individuals, not just one person. The combined knowledge of the group is less likely to make a mistake than one individual's knowledge. Applying the same concept to this system: after many generations, there is a population of nets that perform as well as possible, yet may differ in structure and results by some small amount. To take advantage of this population, combine the outputs of the best nets into one by averaging them, weighting them or using their outputs as inputs to a single master net. The potential is limited only by one's imagination.

Conclusion
Combining AI technologies reduces development time, increases performance and robustness, and opens up an entire new world of powerful systems to solve tomorrow's problems. Some say that finding the right combinations of variables to use is an art. If that is true, then hybrid AI systems are dynamic art.

Many of these concepts and more are used in a program called ANNI (Artificial Neural Network Investing) found at http://NeuralInvesting.com.

For consultation or contract work regarding specialized systems as described above, please contact Jason@NeuralInvesting.com

For more in-depth discussion, see the Neural Networks Sidebar on page 20 and the Genetic Algorithms Sidebar on page 23.
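The group decision-making step (averaging or weighting the outputs of the best nets) is straightforward to sketch. This minimal Python example assumes each net's predictions are a plain list of equal length; the forecasts and weights are hypothetical (weights could, for example, be derived from the master ranking). The third option the article mentions, feeding these outputs into a single master net, is not shown.

```python
def combine_outputs(net_outputs, weights=None):
    """Group decision: merge several nets' predictions into one.

    net_outputs: one list of predictions per net (all the same length).
    weights: optional per-net weights; equal weights give a plain average.
    """
    if weights is None:
        weights = [1.0] * len(net_outputs)
    total = sum(weights)
    return [
        sum(w * preds[j] for w, preds in zip(weights, net_outputs)) / total
        for j in range(len(net_outputs[0]))
    ]

# Hypothetical forecasts from the three best nets for the next two periods.
best_nets = [[101.0, 102.0], [99.0, 100.5], [100.0, 101.5]]
print(combine_outputs(best_nets))                   # simple average
print(combine_outputs(best_nets, [3.0, 2.0, 1.0]))  # weight better-ranked nets more
```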

Genetic Algorithms Sidebar
     Genetic Algorithms (GA) are great for optimization problems. Many people think of a binary string when GAs are mentioned, and in some cases they are correct. However, most of the time a binary string would place an extreme limitation on a GA's capabilities. With sufficient imagination and skill, developers create some incredible programs using GAs. A simple example program could optimize two moving averages to create Buy/Sell Signals in the stock market.

Figure 2: Sample Genetic Algorithm architecture (Variables/Chromosomes form each Genome; Genomes form the Population).

Chromosomes and Genomes
    A GA consists of variables to be optimized, called the Chromosomes. For our example program, one Chromosome is the first moving average's length. The Chromosomes are part of a single Genome, the functional structure; in our example, that would be the length and possibly the type of both moving averages involved. Each Genome starts life with a set of random Chromosomes. A collection of Genomes makes up the entire population.
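The Chromosome/Genome/Population structure for the moving-average example might look like the following minimal Python sketch. The ranges (fast length 2 to 20, slow length 21 to 100) and the two moving-average types are hypothetical choices, not values from the article.

```python
import random
from dataclasses import dataclass

# Hypothetical chromosome ranges for the two-moving-average example.
FAST_RANGE = range(2, 21)     # length of the first (fast) moving average
SLOW_RANGE = range(21, 101)   # length of the second (slow) moving average
MA_TYPES = ("simple", "exponential")

@dataclass
class Genome:
    """One candidate solution; each field is a Chromosome."""
    fast_len: int
    slow_len: int
    fast_type: str
    slow_type: str

def random_genome(rng):
    """A new Genome with randomly chosen Chromosomes inside their ranges."""
    return Genome(
        fast_len=rng.choice(FAST_RANGE),
        slow_len=rng.choice(SLOW_RANGE),
        fast_type=rng.choice(MA_TYPES),
        slow_type=rng.choice(MA_TYPES),
    )

def random_population(size, seed=None):
    """The initial Population: a collection of random Genomes."""
    rng = random.Random(seed)
    return [random_genome(rng) for _ in range(size)]

population = random_population(20, seed=42)
print(population[0])
```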

Fitness Function
     Defining the structure is the easy part of generating a GA. The toughest part is the fitness function, because it decides which Genomes to keep for crossover and which to replace (survival of the fittest). The choice of fitness function must relate directly to the system's goal: what the system is attempting to achieve. An incorrect fitness function will cause the project to create irrelevant results right from the start and cause the system to fail. In the example program, the fitness function could simply be profit. Additional tests could be the number of trades, wins vs. losses, max drawdown, etc.: anything related to the problem and the goals of the system.
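A profit-based fitness function for the example can be sketched as follows. This minimal Python version is an illustration under several assumptions: it uses simple moving averages only, goes long when the fast average is above the slow one and exits when it drops below, ignores trading costs, and omits the additional tests (number of trades, wins vs. losses, max drawdown) mentioned above.

```python
def sma(prices, length):
    """Simple moving average; None until enough prices are available."""
    out = [None] * len(prices)
    for i in range(length - 1, len(prices)):
        out[i] = sum(prices[i - length + 1:i + 1]) / length
    return out

def crossover_profit(prices, fast_len, slow_len):
    """Fitness: profit from buying when the fast average rises above the
    slow one and selling when it falls back below."""
    fast, slow = sma(prices, fast_len), sma(prices, slow_len)
    profit, entry = 0.0, None
    for i, price in enumerate(prices):
        if fast[i] is None or slow[i] is None:
            continue
        if fast[i] > slow[i] and entry is None:        # buy signal
            entry = price
        elif fast[i] < slow[i] and entry is not None:  # sell signal
            profit += price - entry
            entry = None
    if entry is not None:  # close any open position at the last price
        profit += prices[-1] - entry
    return profit

# Hypothetical price series: a steady rise, then a rise and fall.
print(crossover_profit([float(p) for p in range(1, 11)], 2, 3))
print(crossover_profit([1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0], 2, 3))
```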
Crossover
     After ranking the Genomes, crossover takes place: the system selects two Genomes from the top group, which become the Parents, and crosses them over. Crossover algorithms are quite diverse; the simplest approach handles each variable independently, with an equal chance of either Parent 1's or Parent 2's variable being passed down to the child. After all available variables have been reviewed, the result is a child that is a mixture of both Parents. However, this method alone is insufficient, since it sometimes causes the system to get stuck in a local minimum.
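The simplest crossover described above, where each variable is handled independently with an equal chance of coming from either Parent, is commonly called uniform crossover. Here is a minimal Python sketch, assuming Genomes are plain dictionaries and the population is already ranked best first; the Genome values are hypothetical.

```python
import random

def select_parents(ranked_population, top_n, rng):
    """Pick two different Genomes from the top group of a best-first ranking."""
    return rng.sample(ranked_population[:top_n], 2)

def uniform_crossover(parent1, parent2, rng):
    """Handle each variable independently: equal chance of inheriting it
    from Parent 1 or Parent 2."""
    return {key: (parent1[key] if rng.random() < 0.5 else parent2[key])
            for key in parent1}

# Hypothetical Genomes, already ranked best first.
ranked = [
    {"fast_len": 5, "slow_len": 30, "fast_type": "simple"},
    {"fast_len": 12, "slow_len": 60, "fast_type": "exponential"},
    {"fast_len": 8, "slow_len": 45, "fast_type": "simple"},
]
rng = random.Random(1)
parent_a, parent_b = select_parents(ranked, top_n=2, rng=rng)
print(uniform_crossover(parent_a, parent_b, rng))
```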
Mutation Algorithm
     Applying a Mutation Algorithm helps avoid local minimums by randomly mutating any child variable with a low probability (usually around 2.5%). This approach allows new combinations to occur that would not normally have been possible. A combination of these strategies is very powerful for optimizing systems. We need look no further than a mirror to see the potential of Genetic optimization.
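Mutation, crossover and selection can be combined into a single generation step. The Python sketch below is one simple arrangement under stated assumptions: it keeps the top group unchanged (elitism, so the best fitness never gets worse), refills the population with uniformly crossed-over children, and mutates each child variable with the 2.5% probability mentioned above. The ranges and the toy fitness function, which simply rewards lengths near 10 and 50, are hypothetical stand-ins for a real fitness test such as profit.

```python
import random

MUTATION_RATE = 0.025  # "usually around 2.5%"

def mutate(child, ranges, rng, rate=MUTATION_RATE):
    """Each variable has a `rate` chance of being replaced by a random value from its range."""
    return {key: (rng.choice(ranges[key]) if rng.random() < rate else value)
            for key, value in child.items()}

def next_generation(population, fitness, ranges, rng, top_n=4, rate=MUTATION_RATE):
    """Keep the top group, then refill the population with mutated crossover children."""
    parents = sorted(population, key=fitness, reverse=True)[:top_n]
    children = []
    while len(parents) + len(children) < len(population):
        p1, p2 = rng.sample(parents, 2)
        # Uniform crossover: each variable comes from either parent with equal chance.
        child = {key: (p1[key] if rng.random() < 0.5 else p2[key]) for key in p1}
        children.append(mutate(child, ranges, rng, rate))
    return parents + children

# Hypothetical ranges and a toy fitness that prefers fast_len == 10, slow_len == 50.
RANGES = {"fast_len": list(range(2, 21)), "slow_len": list(range(21, 101))}

def toy_fitness(genome):
    return -abs(genome["fast_len"] - 10) - abs(genome["slow_len"] - 50)

rng = random.Random(7)
population = [{key: rng.choice(values) for key, values in RANGES.items()} for _ in range(12)]
start_best = max(map(toy_fitness, population))
for _ in range(30):
    population = next_generation(population, toy_fitness, RANGES, rng)
print(start_best, max(map(toy_fitness, population)))
```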

						