									                                                                            Stephen Hurley
                                                                         CS 178 Term Paper

            The Current Involvement of Genetic Algorithms in Cryptography


        Cryptography has been in use since the time of the ancient Romans with simple
monoalphabetic substitution. It has gradually developed into a much more sophisticated
field with techniques such as knapsacks, modular arithmetic, and public key
cryptography. Modern computers have also allowed the use of more complicated
algorithms that are very difficult to decipher, allowing for incredibly secure transfer of
data between parties.
        As algorithms have become more powerful through the use of advanced computing,
however, techniques for decryption have also become more effective. One such
method currently being studied is the genetic algorithm. Genetic algorithms
imitate natural selection to home in on an optimal solution for a given problem.
        People have been using genetic algorithms on a practical basis since the late
1970s with the study of cellular automata.1 With the advent of more powerful
computers, people were able to simulate more complex genetic processes and develop
more specialized theories. Genetic algorithms today are used in a variety of fields
including business analysis, robotics, search engines, and materials analysis.

Description of a Genetic Algorithm

         A genetic algorithm begins with a set of data, called the population, that is usually
at least pseudo-randomized. Each piece of data, called an individual, lies within
the limits of the problem space. Each individual is passed to a function that estimates
how close it is to the optimal answer. The function usually returns a number, called the
fitness; depending on the specific implementation, either a lower or a higher value
indicates a better individual. After each individual has been given a fitness, the
algorithm creates a new generation from the original population. Individuals are
randomly selected, weighted by fitness, and combined with others via crossovers and
mutations to produce the individuals of the new population. The process then starts over
with the new population, which produces a third generation based on the fitness of each
of its individuals.
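
The loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure of any cited study: the names evolve, one_point_crossover, and flip_one_bit are hypothetical, a higher fitness is assumed to be better, and the toy problem (maximizing the number of 1 bits in a list) is chosen purely for demonstration.

```python
import random

def evolve(population, fitness, crossover, mutate,
           generations=50, mutation_rate=0.1, rng=None):
    """Run a simple generational genetic algorithm (higher fitness is better)."""
    rng = rng or random.Random()
    for _ in range(generations):
        # Score every individual; the small offset keeps every weight positive.
        scores = [fitness(ind) for ind in population]
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        next_gen = []
        while len(next_gen) < len(population):
            # Fitness-weighted random selection of two parents.
            mother, father = rng.choices(population, weights=weights, k=2)
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

def one_point_crossover(a, b, rng):
    """Take the head of one parent and the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one_bit(bits, rng):
    """Return a copy of `bits` with one randomly chosen bit flipped."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]

# Toy problem (assumption): the fitness of a bit list is its number of 1s.
rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
best = evolve(start, sum, one_point_crossover, flip_one_bit, rng=rng)
print(sum(best), "of 20 bits set in the best individual")
```

The selection weights here are simply the fitness scores shifted to be positive; real implementations often use other schemes, such as tournaments or rank-based weights.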
         This process roughly mimics natural selection, producing a highly optimized
population after a number of generations have passed. Because the process depends on
gradual improvement guided by the fitness, genetic algorithms work well only on
certain problems. These fall into a category
where solutions cannot easily be found in any deterministic way, such as by a formula or a
specific algorithm. Problems that can easily be solved deterministically are usually
solved faster using the predetermined solution than with a genetic algorithm. The
problem also cannot be too random, in the sense that there is no easy way to tell whether a given
candidate is close to the optimal solution. In such cases, the fitness function could not
be computed in any efficient way.

Analysis of the Genetic Algorithm in Simple Cases of Cryptanalysis

        Monoalphabetic substitution ciphers are typically viewed as the simplest of all
methods of encryption. Consequently, they are also the simplest to break.
Methods using cribs or frequency analysis can quickly decipher a given text. In the
encryption, each instance of a letter is replaced by another letter. For example, every 'a'
in a given text might be replaced by an 'x'.
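
As a small illustration of such a cipher, the following Python sketch encrypts and decrypts with a 26-letter key. The function names and the key (which swaps only 'a' and 'x' and leaves every other letter unchanged) are hypothetical and chosen to match the example above.

```python
import string

ALPHABET = string.ascii_lowercase

def encrypt(plaintext, key):
    """Replace each letter a..z with the letter at the same position in `key`."""
    return plaintext.lower().translate(str.maketrans(ALPHABET, key))

def decrypt(ciphertext, key):
    """Invert the substitution by mapping each key letter back to a..z."""
    return ciphertext.lower().translate(str.maketrans(key, ALPHABET))

# Hypothetical key: 'a' becomes 'x', 'x' becomes 'a', all other letters stay put.
key = "xbcdefghijklmnopqrstuvwayz"

print(encrypt("a banana", key))
print(decrypt(encrypt("attack at dawn", key), key))
```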
        In one study in which a genetic algorithm was applied to monoalphabetic
substitution,2 each individual represents a substitution of all 26 letters. For example,
(q,w,e,r,t,y,u,i,o,p,a,s,d,f,g,h,j,k,l,z,x,c,v,b,n,m) would be an individual and a key to
decrypt some text. The fitness function counts pairs of adjacent letters and compares them to the
average frequency of adjacent letter pairs in a larger body of known text. The closer the
adjacent-letter frequencies of the ciphertext decrypted with the key are to those
of the known text, the higher the fitness rating. The mutation
swaps two random letters in the individual. Crossover is more difficult, however,
because no letter can appear more than once in an individual. The
process makes a copy of one of the parents and then takes a subsection of the other parent.
The child then has its letters swapped around in a minimal fashion until the subsection of
the other parent appears in the child at the same location.
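
The three operators described above might look roughly like the following Python sketch. It is a reading of the prose, not the code from the cited study: the exact fitness formula is not given here, so the negated sum of absolute differences between adjacent-letter frequencies is an assumption, as are all function names and the convention that key[i] is the plaintext letter assigned to the i-th ciphertext letter.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase

def bigram_freqs(text):
    """Relative frequency of each adjacent letter pair in `text` (letters only)."""
    letters = [c for c in text.lower() if c in ALPHABET]
    pairs = Counter(zip(letters, letters[1:]))
    total = sum(pairs.values()) or 1
    return {pair: n / total for pair, n in pairs.items()}

def fitness(key, ciphertext, reference):
    """Higher is better; 0 means the decrypted bigram profile equals `reference`."""
    # Assumption: key[i] is the plaintext letter for ciphertext letter ALPHABET[i].
    decrypted = ciphertext.lower().translate(str.maketrans(ALPHABET, "".join(key)))
    observed = bigram_freqs(decrypted)
    pairs = set(observed) | set(reference)
    return -sum(abs(observed.get(p, 0.0) - reference.get(p, 0.0)) for p in pairs)

def mutate(key, rng):
    """Swap two randomly chosen letters of the key."""
    child = list(key)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def crossover(mother, father, rng):
    """Copy `mother`, then swap letters until a random slice of `father` is in place."""
    child = list(mother)
    start = rng.randrange(len(child))
    end = rng.randrange(start, len(child)) + 1
    for pos in range(start, end):
        wanted = father[pos]
        if child[pos] != wanted:
            # Swapping keeps every letter present exactly once in the child.
            other = child.index(wanted)
            child[pos], child[other] = child[other], child[pos]
    return child
```

Because both operators only ever swap letters, every child remains a valid substitution of all 26 letters, which is the constraint that makes ordinary cut-and-splice crossover unusable here.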
        This method, and the similar variants discussed in the paper,2 do not work very
well. Even when they do work, the standard methods of decryption are still much faster and more
efficient. One reason is that the problem is a highly mathematical one when the
frequency of adjacent letters is used. It has also already been solved in a variety of ways, so a
genetic algorithm cannot match the efficiency of the other solutions.

Analysis of the Genetic Algorithm in Complex Cases of Cryptanalysis

        The Merkle-Hellman knapsack cipher has also been studied from the point of
view of genetic algorithms. The knapsack cipher uses a superincreasing sequence of
numbers, b, that is reordered in a secret way and then modified using the equation
                a_i = W b_i mod M.
W and M are also kept private, so that only the sequence a is public. To
encrypt a message, its binary form is divided into blocks, each the length of the
public key sequence. The inner
product of each binary block with the sequence a is then sent to the recipient as the encrypted
code for that block. This is repeated for each block of the message.3
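
A toy version of this scheme is sketched below in Python. It is a simplified illustration with small numbers, not a secure implementation. The function names are hypothetical, the secret reordering of b is omitted for brevity, and decrypt_block (the legitimate receiver's side, which is not described above) is included only so the round trip can be checked. It requires Python 3.8 or later for the modular inverse computed with pow.

```python
import math
import random

def make_keys(n, rng):
    """Build a toy private key (b, W, M) and public sequence a, for n >= 2."""
    b, total = [], 0
    for _ in range(n):
        # Each term exceeds the sum of all earlier terms (superincreasing).
        term = total + rng.randint(1, 10)
        b.append(term)
        total += term
    M = total + rng.randint(1, 10)        # modulus larger than sum(b)
    W = rng.randrange(2, M)
    while math.gcd(W, M) != 1:            # W must be invertible mod M
        W = rng.randrange(2, M)
    a = [(W * bi) % M for bi in b]        # a_i = W * b_i mod M
    return (b, W, M), a

def encrypt(bits, a):
    """Zero-pad `bits`, split into blocks of len(a), return one sum per block."""
    n = len(a)
    padded = bits + [0] * (-len(bits) % n)
    return [sum(x * ai for x, ai in zip(padded[i:i + n], a))
            for i in range(0, len(padded), n)]

def decrypt_block(c, private):
    """Recover one block of bits using the private key."""
    b, W, M = private
    target = (c * pow(W, -1, M)) % M      # undo the multiplication by W
    bits = []
    for bi in reversed(b):                # greedy works: b is superincreasing
        if bi <= target:
            bits.append(1)
            target -= bi
        else:
            bits.append(0)
    return bits[::-1]

private, public = make_keys(8, random.Random(7))
message = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
ciphertext = encrypt(message, public)
print(ciphertext, [decrypt_block(c, private) for c in ciphertext])
```

An attacker sees only the public sequence and the sums, so recovering a block without the private key means finding which terms of the public sequence add up to each sum.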
        Efficient methods already exist that can decipher the encrypted text using only the
public key. A genetic algorithm has been used in an attempt to
create a more efficient method of cryptanalysis. Each individual is a binary sequence,
with each element being 0 or 1 to indicate whether or not the corresponding term is included
in the knapsack sum. The fitness function is very complicated3 and essentially measures
how close the sum of the terms selected by the individual's sequence is to the actual sum of
the knapsack. The crossover process is just a swap of a block of elements between the
sequences of two individuals. Mutation is even simpler: a random bit in the
individual's sequence is flipped.
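
The pieces of this attack can be sketched as follows in Python. Since the actual fitness function is described only as very complicated, the one below is a deliberately simple stand-in that measures how close the selected sum is to the target, scaled so that 1 means an exact match. The function names and the small public sequence are illustrative assumptions.

```python
import random

def knapsack_fitness(bits, a, target):
    """Closeness of the selected terms' sum to `target`, in [0, 1]; 1 is exact."""
    total = sum(x * ai for x, ai in zip(bits, a))
    worst = max(target, sum(a) - target) or 1   # largest possible error
    return 1 - abs(target - total) / worst

def swap_block(p, q, rng):
    """Crossover: exchange one contiguous block of bits between two parents."""
    i = rng.randrange(len(p))
    j = rng.randrange(i, len(p)) + 1
    return p[:i] + q[i:j] + p[j:], q[:i] + p[i:j] + q[j:]

def flip_bit(bits, rng):
    """Mutation: return a copy of `bits` with one random bit flipped."""
    k = rng.randrange(len(bits))
    return bits[:k] + [1 - bits[k]] + bits[k + 1:]

# Illustrative public sequence and an intercepted sum (toy values).
a = [295, 592, 301, 14, 28, 353, 120, 236]
target = 592 + 301 + 236
secret = [0, 1, 1, 0, 0, 0, 0, 1]

print(knapsack_fitness(secret, a, target))      # the true block scores 1.0
print(knapsack_fitness([0] * 8, a, target))     # selecting no terms scores 0.0
```

A fitness of this kind can be misleading, because a candidate whose sum is numerically close to the target may share few bits with the true block.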
        This method was successful at solving simpler knapsack problems.
Unfortunately, although there may have been some slight gain from searching a smaller key
space, the overhead of the genetic algorithm canceled out any possible efficiencies.
Thus, the traditional methods of solving the Merkle-Hellman knapsack problem are still
more efficient than the use of genetic algorithms.


        A number of other cryptographic systems have been studied by others.3 A
common trend among nearly all of them is that either the genetic algorithm did not solve
the problem or, when it did, it was not nearly as efficient as preexisting methods of
cryptanalysis. This makes a strong argument that nearly all cryptographic systems that
have known solutions are not well solved using genetic algorithms. Further study on this
topic should include systems that do not have known, or efficient, solutions.
Genetic algorithms generally do very well with these types of problems and may greatly
further the field of cryptanalysis.


1.     Biesbrock, R. History of GA's.

2.     Gester, J. Solving Substitution Ciphers with Genetic Algorithm.

3.     Delman, B. Genetic Algorithms in Cryptography.
