Neural Nets, Genetic
Algorithms and Fuzzy Logic:
Key Tools for AI Programming
By Jason Kulhanek
Solving real-world problems using only one AI technology, such as Neural Nets, Genetic Algorithms or Fuzzy Logic, can be difficult due to their unique characteristics and inherent weaknesses. They all have their strengths: Neural Nets are dynamic and robust, Genetic Algorithms are highly adaptable and Fuzzy Logic breaks down normal barriers in defining everyday problems. Unfortunately, these technologies also have their weaknesses: Neural Nets may freeze in local minimums (not finding the optimal solution), Genetic Algorithms are limited to specific problem types and Fuzzy Logic is only adaptable within its own domain. Developers can overcome most of these weaknesses — after sufficient programming and customizing.
The simplest approach for creating a complete, robust, self-adapting system is using hybrids – a combination of these technologies. This approach offers faster development time, producing products that overcome the inherent weaknesses of each individual technology. Reusable code via classes and inheritance makes this task much easier than it may first appear. These systems were not economical a few years ago, with applications running for days or even weeks to fully optimize. One of the systems I developed uses over 300MB of memory to run efficiently – something that was very expensive only a few years ago. Today's standard desktop systems can run these solutions in a few hours — or even minutes. With the availability of increased computing power, AI can now begin to achieve its real potential.
After creating numerous AI systems covering a variety of fields, including human genome classification systems, credit card debt systems and financial forecast systems (my specialty), I have developed a hybrid approach for solving most problems. How does one create a new system in a domain in which one does not have the expertise (i.e., I'm no Bio-Technician)? With modern AI technology and hybrid systems, you do not have to be an expert — just be creative and understand the AI technology and tools.
Well-designed hybrid systems are self-adapting, making it easier than one would expect to solve problems. This is extremely important for the new challenges facing AI technology today. Building a simple back-prop neural net or a Kohonen classification system to solve simple problems is no longer a technical challenge. The Internet has tons of code related to these basic systems. What separates the novice from the professional, since they both have access to similar code, is that the professional understands what the code is actually doing and knows how to apply it efficiently and correctly. Top professionals know how to combine these to create ultimate self-adapting hybrid systems.

Shortcomings of AI
A common problem to which AI is often applied is stock market prediction — where a standard back-prop neural network gives satisfactory results on the test and evaluation sets. However, when used in real life, these systems usually fail – typically not doing much better than the 50/50 you get with flipping a coin (or we would have many more millionaires). What is missing? In simple terms, the dynamics of the stock market are too complicated for a simple neural network to capture. In addition, the optimization of any given single neural network will only be as good as its domain allows (its input vs. output relationships). Instead, the solution demands a more complex/hybrid system, where each neural network captures something a little different, with the system taking it all into account — combining them into a single output. This article presents a walk-through of the basic steps for creating dynamic, self-adapting hybrid systems that go beyond those normally seen today.

PCAI 18 17.2

Beginning a Project
When beginning any project, two key areas can make or break a system and
focusing on their details is important. Although they seem basic, these concepts form the basis for most AI systems:
What is the system's expected output (i.e., what is the goal)?
And, what variables are available and how do they interrelate? (Note: use all variables that are available as they can be optimized later).

Project Output
Possibly an AI system's most important consideration is the expected output. Understanding the available variables and how the variables relate to each other defines how the data will be preprocessed and presented to the network. A key component of any successful AI system is the data preprocessing – a fact that cannot be overstressed. It helps normalize the data throughout and increases relevancy, while removing anomalies in the data that might cause learning problems in the system. Whether it is a simple linear transform, a complex wavelet transform, or some other operation depends on the data, the system used and the expected output. This could be the subject for an entirely separate article.

Project Feasibility
Another crucial design decision is determining whether building this system is even possible and, once built, is there any possible optimization? Knowing the target output and its relationship to the inputs is critical in designing a successful system. The solution to any problem solved with this technology must be available within the input data.
Consider a simple example of a loan approval program with inputs that include income, debt, own/rent, etc. These are all relevant to the loan approval output (it's the same information on the loan application). With this information, it is possible to make a valid credit/risk determination.
This concept is not always clear – I have had requests to build a system that predicts the lottery. (Wouldn't that be nice?) The output of this system would be the next lottery pick based upon past lottery picks. It is easy to see that every time the lottery is drawn the odds are the same. Therefore, this lottery-predicting program becomes an expensive random number generator since the information in the input cannot predict the output. After careful examination of a potential system, if the inputs are not relevant to the output solution, the system will never be successful. Determine the relevancy or move on — understanding the relevancy will help in determining the final tests involved in optimizing the system.

Information Flow
Combining these two concepts establishes the foundation for the system's information flow. The process of analyzing this information flow assists in selecting an appropriate core AI system or combination of systems. Determining the expected output is critical in deciding the core system. For example: the loan program is best implemented as a Classification system — classifying each person as approved or declined. A stock predicting program would be exactly as stated — a Prediction system. Both of those examples could be determined by looking at the expected outputs of the system. The main choices are: Classifier, Predictor, Associative or Fuzzy Logic.
After determining the foundation of the system, document all variables involved in the system – input variables as well as the system's structure variables. This information supplies details needed beyond the foundation. For example, consider a standard back-propagation
(Continued on Page 21)
Sidebar On Neural Networks
Neural Nets are fascinating mathematical models
capable of adapting to a large variety of problems.
Available in an assortment of structures and sizes, the
majority of nets follow the basic principle of a
neuron/node attached to other neurons/nodes via
weights. Summing them together layer by layer, using
some form of activation function (typically sigmoidal),
generates a final output/result.
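The neuron-and-weights principle described above can be sketched in a few lines of code. This is a minimal illustration rather than production code; the layer shapes and weight values below are arbitrary examples:

```python
import math

def sigmoid(x):
    # The typical sigmoidal activation function.
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    # Each layer is a (weights, biases) pair: sum each node's weighted
    # inputs, add the bias, apply the activation, and feed the result
    # to the next layer.
    activations = inputs
    for weights, biases in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations
```

A 2-2-1 net is then just a two-element list of layers, and `forward([1.0, 0.0], net)` produces a single value between 0 and 1.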
Types of Training
Learning takes place in several forms — depending
on whether the system is performing supervised
training (it knows the expected output which is used
during training) or unsupervised training (more of a
clustering/classification solution where the output may
not be known beforehand). Understanding the problem along with the associated variables will dictate which type of system to create.

Figure 1: Typical Back-Propagation Neural Net
Understand the limitations to Neural Nets before implementing any complete system. A key concept to recognize is that the
system cannot give out more information than it receives. If it does not receive the proper input, it cannot produce the proper
output. Although often referred to as “Black Boxes”, Neural Nets are not “Magic Boxes”. Knowing the relevancy of the inputs vs.
outputs before developing a system greatly increases the success rate. Additional limitations to neural networks are: Convergence (not
finding a solution), Local Minimums (finding a poor solution when a better one exists) and Over Training (the network is trained to
only match one set of inputs – similar to memorization).
The next area to examine, the Learning Algorithm, is the second most powerful system component. Specialized Learning
Algorithms help overcome an assortment of problems known to plague Neural Networks. One important aspect, the learning rate,
improves convergence without decreasing performance. However, the term variable learning rates does not mean a single variable
learning rate for the entire system. When dealing with multi-dimensional data, the search space is multi-dimensional, as well.
Therefore, one particular dimension (variable) may not do as well using a learning rate optimized for a different dimension. A
variable learning rate at the “Layer” level is good but at the “Neuron/Node” level is even better. Having it at the “Weight” level is the
best — if a stable algorithm to handle the training is available (these systems can oscillate wildly if improperly constructed).
Neural Nets are often stuck in local minimums, especially when the search space is multi-dimensional. Picture a mountain terrain
in which a huge ball is rolling around — looking for the lowest spot. Depending on the mountain terrain and the weight of the ball,
there may be hills that it cannot overcome to reach that lowest point. To help overcome these, a momentum term is added to the ball
each time it rolls – equivalent to giving it a little extra push. This extra momentum overcomes those small hills that it normally could
not overcome without the extra push. Too large of a push though, and it could go past the desired lowest spot and be stuck behind
another hill. Too small of a push and it will not be able to overcome hills that it should using this technique.
Just about anybody who works with Neural Nets is familiar with the term “over training” — where a Neural Network learns the
data and does not generalize well outside of the training data. Consider an example with polynomials. A 2- or 3-degree polynomial
on a 50-point data set will generalize well — maybe even too much. On the other hand, a 40-degree polynomial on a 50-point data
set will fit the data great — but will do poorly on data outside of the samples. Weight decay algorithms prevent large weights, which are known to cause over-trained systems by enabling the net to fit the data too easily in some cases. By forcing the neural nets to spread the information throughout their weights using weight decay (which introduces a slight bit of noise as a side effect), you end up with a more generalized neural net that performs much better in the real world. Combining these techniques creates a very intense learning algorithm, balanced by the weight decay.
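An exponential weight decay term is easy to bolt onto any update rule. A minimal sketch, where the decay constant is an arbitrary example value:

```python
def decay_step(weight, gradient, lr=0.1, decay=0.001):
    # Besides the usual gradient step, shrink the weight slightly toward
    # zero on every update, discouraging the large weights associated
    # with over-training.
    return weight - lr * gradient - lr * decay * weight
```

With the gradient held at zero, repeated calls shrink a weight geometrically by a factor of (1 - lr * decay) per step, which is what pushes information into many small weights instead of a few large ones.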
I’ve designed a proprietary algorithm that adjusts the learning rate based upon past data vs. past error at the weight level along
with individual momentum terms and an exponential weight decay algorithm for solid learning/generalization capabilities in almost
any environment (it actually scales the learning algorithm itself according to past training data, making it self-adapting). This
algorithm is used in ANNI, an Artificial Neural Network Investing program found at http://NeuralInvesting.com
To find more information on Neural Networks, see the "Neural Network Primer" in Volume 16.3 of PC AI Online Magazine or visit the PC AI home page at www.pcai.com and search for Neural Networks. For a 3D picture of a learning algorithm and its theoretical search space (it's actually my old standard gradient descent learning algorithm vs. my new auto-scaling fully optimized gradient descent learning algorithm) visit http://NeuralInvesting.com/LearningAlgorithm.htm. It's very interesting to see a learning algorithm visually rather than just its equation.
(Continued from page 19)
neural network, where system variables could be:
Inputs used (using all inputs supplied may not be desired)
Number of hidden neurons
Type of activation function
Learning rate
Momentum term, etc.
These become the Chromosomes in the Genome part of the genetic algorithm and the variables to be optimized — the hybrid part — so they all need to be documented. After completing this list of variables, make one more pass down this list to supply a range to each variable. Consider inputs, for example. The range could be a minimum of 3 and a maximum of all that is available. The system operates and optimizes within these defined ranges. Quickly multiplying all the ranges together produces the mathematical optimization universe. If this number is excessively large, consider reducing the ranges in some variables or use steps (i.e. 1,2,3…25 becomes 1,3,5…25). These ranges do not have to be precise at first. Evaluating the system after a few application runs will help fine-tune these ranges at a later point (more on this later).
These systems do not start with one neural net but with an entire population of neural nets, set up differently based upon the genetic genome's range (the structure described above that holds the variables/chromosomes). This supplies a population of comparable nets and results used to further optimize the variables involved via the Genetic Crossover. However, before establishing the Genetic Crossover, it is important to define tests that determine the best networks, which is critical to the system's optimization — it determines which nets survive and reproduce and which do not.

Network Evaluation
This next step is deciding how to evaluate each neural network's performance (which tests to perform), and this must be done carefully and with a lot of deliberation. Relate these tests to the goals of the system as much as possible, as they will make up the Genetic Algorithm's Fitness Function. In a financial forecast system, these may be the results from a correlation analysis, average deviation from actual vs. forecasted value, etc. Perform these tests on an out-of-sample group and not a group used in the training or testing of the nets during their learning phase.
Collect all tests associated with how the inputs relate to the outputs, placing them together since they will make up the entire system evaluation. If the developers are not domain experts, this is where working closely with the real expert is important — making sure the tests are relevant to the desired results. When performing the tests, rank each test result separately and combine them later in a master ranking. Combine them independent of their ranking since mediocre tests can be included and later downplayed — this method helps decide ties in the system. Use the master rank value when performing the genetic crossover.
This approach enables a proper balance between the importance of each test and its relationship to the dynamic variables in the neural network structure. By doing this, the system generates nets that produce the best output without requiring the developer to know what variables are important. In addition, the average results from the top ranked nets are available for other uses (i.e. "early stopping" algorithms during training, even dynamic Genetic Fitness Functions).
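The variable list, the ranges with steps, and the master ranking described above could be sketched as follows. The variable names and ranges here are hypothetical examples, not values from any of the article's actual systems:

```python
import random

# Hypothetical ranges for each structure variable (chromosome):
# (minimum, maximum, step). Steps shrink the optimization universe,
# e.g. 1,2,3...25 becomes 1,3,5...25.
GENE_RANGES = {
    "num_inputs":     (3, 25, 1),
    "hidden_neurons": (1, 25, 2),
    "learning_rate":  (1, 100, 5),   # scaled elsewhere, e.g. /1000
}

def random_genome():
    # Draw each chromosome uniformly from its allowed steps.
    return {name: random.randrange(lo, hi + 1, step)
            for name, (lo, hi, step) in GENE_RANGES.items()}

def master_rank(test_results):
    # test_results: {net_id: [score_per_test, ...]}, higher = better.
    # Rank each test separately, then sum the per-test ranks into a
    # master rank (lower total = better), so a mediocre test can be
    # included yet still be downplayed by the others.
    ids = list(test_results)
    n_tests = len(next(iter(test_results.values())))
    totals = {i: 0 for i in ids}
    for t in range(n_tests):
        ordered = sorted(ids, key=lambda i: test_results[i][t], reverse=True)
        for rank, i in enumerate(ordered):
            totals[i] += rank
    return sorted(ids, key=lambda i: totals[i])
```

The ordering returned by `master_rank` is what the genetic crossover would consume when deciding which nets survive and reproduce.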
Typically, the developer does not understand, or even know, what is considered good or best when determining a value for these static variables — using the averages from the top nets allows the variables to be dynamically set. This enables the system to adapt to any training environment.

System Tuning
After some initial trial runs, check and fine-tune the ranges used in the GA Genomes to reduce the variables that are not needed and to increase the ones of greatest importance. When inspecting the net population, it is easy to identify the necessary ranges required for the system, even after only one run. The goal is to guide the system into collecting near the center of the ranges selected. If it collects too much to one side, adjust the range accordingly. Additional adjustment runs will ensure the new ranges are correct.
Since the genetic algorithms optimize the neural nets' structures to produce the best output possible with the given data, these systems produce astonishing results. Tracking the performance of all nets from generation to generation easily shows this. During the first few generations, some nets perform poorly. This illustrates that if you were limited to a standard system and had unfortunately picked that combination of variables, the solution would have been doomed from the start.
As if a fully optimized system structure was not enough of a reward for generating a hybrid system, the final reward is the ability to implement group decision making. Just as in a business, when it comes to making important decisions it is a group of individuals that makes the decisions, not just one person. The combined knowledge from the group is less likely to make a mistake than one individual's knowledge. Applying this same concept to this system, after many generations, there is a population of nets that perform as well as possible, yet may differ in structure and results by some small amount. To take advantage of this and the population of nets, combine the outputs of the best nets into one by averaging them, weighting them or using their outputs as inputs to a single master net. The potential is limited only by one's imagination.

Conclusion
Combining AI technologies reduces development time, increases performance and robustness, and opens up an entire new world of powerful systems to solve tomorrow's problems. Some say that it is an art finding the right combinations of variables to use. If that is true, then hybrid AI systems are dynamic art.
Many of these concepts and more are used in a program called ANNI (Artificial Neural Network Investing) found at http://NeuralInvesting.com.

For more in-depth discussion, see the Sidebar on Neural Networks (page 20) and the Genetic Algorithms Sidebar (page 23).

For consultation or contract work regarding specialized systems as described above, please contact Jason@NeuralInvesting.com
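The group-decision idea, averaging or weighting the outputs of the top nets, can be sketched as below. The nets here are stand-in functions, since the real ones would come from the trained population:

```python
def committee_output(nets, inputs, weights=None):
    # Combine the population's best nets into one group decision: a
    # plain average, or a weighted average if weights are supplied.
    outputs = [net(inputs) for net in nets]
    if weights is None:
        return sum(outputs) / len(outputs)
    return sum(w * o for w, o in zip(weights, outputs)) / sum(weights)
```

The third option mentioned above, feeding the outputs into a single master net, simply replaces this averaging with one more trained network.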
Genetic Algorithms Sidebar
Genetic Algorithms (GA) are great for optimization problems. Many people think of a binary string when GAs are mentioned, and in some cases they are correct. However, most of the time, that would place an extreme limitation on GAs' capabilities. With sufficient imagination and skill, developers create some incredible programs using GAs. A simple example program could optimize two moving averages to create Buy/Sell Signals in the stock market.

Chromosomes and Genomes
A GA consists of variables to be optimized – called the Chromosomes. For our example program, one Chromosome is the first Moving Average's length. The Chromosomes are part of a single Genome – the functional structure. In the case of our example, that would be the length and possibly the type of both moving averages involved. Each Genome starts life at the very beginning with a set of initially random Chromosomes. A collection of Genomes makes up the entire population.

Figure 2: Sample Genetic Algorithm architecture.

Defining the structure is the easy part of generating a GA. The toughest part is the fitness function, because it decides which Genomes to keep for crossover and which to replace (survival of the fittest). The choice of fitness function must relate directly to the system's goal – what the system is attempting to achieve. An incorrect fitness function will cause the project to create irrelevant results, right from the start, and cause the system to fail. In the example program, the fitness function could simply be profit. Additional tests could be the number of trades, wins vs. losses, max drawdown, etc. — anything related to the problem and the goals of the system.
After ranking the Genomes, crossover takes place – the system selects two Genomes from the top group and they crossover – they become the Parents. Crossover algorithms are quite diverse, with the simplest approach having each variable handled independently. There is an equal chance of either Parent 1 or Parent 2's variable being passed down to the child. After reviewing all available variables, a child is produced that is a mixture of both Parents. However, this method alone is insufficient since it sometimes causes the system to stick in a local minimum.

Mutation Algorithm
Applying a Mutation Algorithm prevents local minimums by randomly mutating any child variable based on a low probability ratio (usually around 2.5%). This approach allows new combinations to occur that normally would not have been possible. A combination of these strategies is very powerful for optimizing systems. We need look no further than a mirror to see the potential of Genetic optimization.