Capstone

					   Cracking the Code:
Foundations of Cryptology

            A brief introduction to the
            underlying terms and
            concepts of cryptography
            and cryptanalysis




                            Martina Weber
Project Definition & Requirements

   Design and implement tools that allow you to
    quickly crack XOR-encryption schemes.
   General Requirements:
    –    XOR-Encrypt a text using a key.
    –   Given an encrypted message, produce the
        original message.
    –   Analyze the “quality” of various techniques and
        solutions.
    –   Create a Human Computer Interface for the
        system.
The Story Line

 Alice needs to send a classified message to
 Bob; however, she does not want her
 archrival, Eve, to learn the confidential
 information. Therefore, Alice and Bob agree
 to disguise their message by
 employing an encryption scheme with an
 agreed-upon key - but Eve is clever and
 devious...
Defining the Terms

   Plain text - the text that Alice wishes to
    transmit to Bob, in its original form
   Cipher text - the result of Alice encrypting the
    text with the key
   Decrypt - reconstructing the plaintext using
    the cipher text and the key
The Conventions

   To distinguish the original text from the
    encoded text, plaintext characters will be
    written in lower case and cipher text
    characters in upper case.

   Generally, it is standard to omit all punctuation
    and spaces from the plaintext. This is done to
    prevent analysis based on sentence structure
    and word length in the cipher text.
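A minimal Python sketch of this convention, assuming that only the 26 English letters are kept (digits and accented letters are simply dropped); the function name `normalize_plaintext` is hypothetical.

```python
import string

def normalize_plaintext(text: str) -> str:
    """Keep only the English letters a-z, in lower case; drop spaces and punctuation."""
    return "".join(ch.lower() for ch in text if ch in string.ascii_letters)

print(normalize_plaintext("Having fun yet?"))  # havingfunyet
```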
Eve’s Attack: Cryptanalysis

              At first, Eve is baffled, but then
              she realizes that Alice and Bob
              know only two encryption
              schemes. Better yet, Eve is
              confident in her ability to
              cryptanalyze these schemes
              and knows she will be able to
              crack the code.
      The Encryption Schemes
Monoalphabetic Substitution Cipher
    Each plaintext character in a message is
     substituted with a unique alternate character to
     obtain the cipher text; thus any given letter of
     the alphabet is always enciphered by the same
     cipher text letter.
Polyalphabetic Substitution Cipher
    The plaintext message is encoded with a keyword
     of length m. Thus, a given character in the
     original text may be enciphered with any of the
     m letters of the keyword, so the same plaintext
     character can produce different cipher text
     characters.
A Closer Look at Monoalphabetic
Substitution Ciphers

   When a monoalphabetic substitution cipher is
    used, there is a one-to-one correspondence
    between the characters in the plaintext and
    the characters in the cipher text
 A Simple Example
 Using a Monoalphabetic Substitution Cipher


    The following is the key used:

Plaintext   a b c d e f g h i j k l m n o p q r s t u v w x y z
Ciphertext  J K L M N O P Q R S T U V W X Y Z A B C D E F G H I



    Example using the key:
     Plaintext: thisiseasy
     Cipher text: CQRBRBNJBH
    To decrypt, simply look up the encrypted character in the
     table and use the plaintext character listed directly above
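The table lookup can be sketched in Python with `str.maketrans`, using the two rows of the key table above; the function names are hypothetical. Decryption simply swaps the roles of the two rows.

```python
# Key table from the slide: plaintext row (lower case) over cipher text row (upper case).
PLAIN_ROW = "abcdefghijklmnopqrstuvwxyz"
CIPHER_ROW = "JKLMNOPQRSTUVWXYZABCDEFGHI"

ENCRYPT_TABLE = str.maketrans(PLAIN_ROW, CIPHER_ROW)
DECRYPT_TABLE = str.maketrans(CIPHER_ROW, PLAIN_ROW)

def mono_encrypt(plaintext: str) -> str:
    # Each plaintext letter always maps to the same cipher text letter.
    return plaintext.translate(ENCRYPT_TABLE)

def mono_decrypt(ciphertext: str) -> str:
    # Look up each cipher text letter and take the plaintext letter above it.
    return ciphertext.translate(DECRYPT_TABLE)

print(mono_encrypt("thisiseasy"))  # CQRBRBNJBH
print(mono_decrypt("CQRBRBNJBH"))  # thisiseasy
```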
A Closer Look at Polyalphabetic
Substitution Ciphers

   When a polyalphabetic substitution cipher is
    used, there is NO one-to-one correspondence
    between the characters in the plaintext and
    the characters in the cipher text; a character
    could have been encoded using any of the m
    letters of the keyword.
Understanding Polyalphabetic Substitution Ciphers
will Require a “New” Alphabet...

   Instead of using alphabetic characters, the
    new notation will be using the numerical
    position (0 to 25) of a given letter
   For example, A = 0, B = 1, ..., Y = 24, Z = 25

     A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z
     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
    ...And a Modest Mathematical Background in
    Modular Arithmetic

       (x mod m) is evaluated as the remainder when
        dividing x by m
        Examples:  6 mod 3 = 0      5 mod 3 = 2
                   20 mod 7 = 6     10 mod 7 = 3
       Modular addition ([x + y] mod m) is performed by
        first adding x and y and then reducing the result
        modulo m. Adding two numbers in the range 0 to
        m-1 and reducing modulo m will yield a number in
        the range 0 to m-1
        Examples, letting m = 26:
                   (7 + 8) mod 26 = 15
                   (20 + 6) mod 26 = 0
                   (17 + 11) mod 26 = 2
                   (23 + 25) mod 26 = 22
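The examples above can be checked with Python's `%` operator. This is a sketch only; `add_mod` is a hypothetical helper name. The last lines build the same addition table modulo 26 as the cheat sheet that follows.

```python
M = 26

def add_mod(x: int, y: int, m: int = M) -> int:
    # Add first, then reduce modulo m; the result always lies in 0..m-1.
    return (x + y) % m

print(6 % 3, 5 % 3, 20 % 7, 10 % 7)  # 0 2 6 3
print(add_mod(7, 8), add_mod(20, 6), add_mod(17, 11), add_mod(23, 25))  # 15 0 2 22

# Python's % also maps negative numbers into 0..m-1, which helps with subtraction.
print(-3 % M)  # 23

# The full addition table modulo 26 (row r, column c holds (r + c) mod 26).
table = [[add_mod(r, c) for c in range(M)] for r in range(M)]
print(table[25][1], table[3][24])  # 0 1
```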
 Here is a “Cheat Sheet” for
 Arithmetic Modulo 26
Addition mod 26    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
              0    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25
              1    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0
              2    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1
              3    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2
              4    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3
              5    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4
              6    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5
              7    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6
              8    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7
              9    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8
             10   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9
             11   11   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10
             12   12   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11
             13   13   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12
             14   14   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13
             15   15   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
             16   16   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
             17   17   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
             18   18   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
             19   19   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
             20   20   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
             21   21   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20
             22   22   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21
             23   23   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22
             24   24   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23
             25   25    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24
A Simple Example
Using a Polyalphabetic Substitution Cipher


   First take the text to be encoded and convert it character
    by character into the respective numerical equivalent.
   Then choose a key to use on the text. Convert this key
    into its numerical representation as well.
   Next, write the converted key under the converted
    plaintext, repeating it as necessary, and add the
    two rows together character by character, modulo 26.
   Finally, convert the encoded numerical text back to
    alphabetic text (if you so wish).
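The four steps above can be sketched in Python as follows, assuming the plaintext and key are already normalized to the letters a-z; the function name `vigenere_encrypt` is hypothetical.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Add the repeated key to the plaintext, letter by letter, modulo 26."""
    p = [ord(ch) - ord("a") for ch in plaintext.lower()]        # step one
    k = [ord(ch) - ord("a") for ch in key.lower()]              # step two
    c = [(p[i] + k[i % len(k)]) % 26 for i in range(len(p))]    # step three
    return "".join(chr(ord("A") + x) for x in c)                # step four

print(vigenere_encrypt("havingfunyet", "yes"))  # FENGRYDYFWIL
```

The index `i % len(k)` is what "repeating the key as necessary" amounts to.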


         Step One: Converting plaintext to numerical equivalent
          Plaintext                 h   a   v   i   n   g   f   u   n   y   e   t
          Numerical equivalent      7   0  21   8  13   6   5  20  13  24   4  19

         Step Two: Converting key to numerical equivalent
          Key                       y   e   s
          Numerical equivalent     24   4  18

         Step Three: Adding the plaintext with the key, modulo 26
          Plaintext                 7   0  21   8  13   6   5  20  13  24   4  19
          + Key                    24   4  18  24   4  18  24   4  18  24   4  18
          =                        31   4  39  32  17  24  29  24  31  48   8  37
          Mod 26                    5   4  13   6  17  24   3  24   5  22   8  11

         Step Four: Converting cipher text to its alphabetic equivalent
          Ciphertext                5   4  13   6  17  24   3  24   5  22   8  11
          Alphabetic ciphertext     F   E   N   G   R   Y   D   Y   F   W   I   L


        Thus, the plaintext “havingfunyet” is encoded with the key “yes” to
         the cipher text “FENGRYDYFWIL”
        Had Alice actually sent this message to Bob, he would decode it
         using the inverse procedure: subtract the key from the cipher text
         mod 26
         Ciphertext                5   4  13   6  17  24   3  24   5  22   8  11
         + Inverse of key          2  22   8   2  22   8   2  22   8   2  22   8
         =                         7  26  21   8  39  32   5  46  13  24  30  19
         Mod 26                    7   0  21   8  13   6   5  20  13  24   4  19
         Plaintext                 h   a   v   i   n   g   f   u   n   y   e   t

        Note that subtracting modulo 26 means adding the additive
         inverse of an element. The inverse of a number x is
         (26 - x) mod 26; for the key “yes” (24, 4, 18) the inverses
         are 2, 22, and 8, as seen in the “+ Inverse of key” row
         of the above decryption.
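The decryption above can be sketched in Python by adding the additive inverse of each key letter, exactly as in the table; `vigenere_decrypt` is a hypothetical name, and the input is assumed to contain only the letters A-Z.

```python
def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Subtract the repeated key mod 26 by adding its additive inverse."""
    c = [ord(ch) - ord("A") for ch in ciphertext.upper()]
    # Additive inverse of each key letter: (26 - k) mod 26, e.g. "yes" -> 2, 22, 8
    inv = [(26 - (ord(ch) - ord("a"))) % 26 for ch in key.lower()]
    p = [(c[i] + inv[i % len(inv)]) % 26 for i in range(len(c))]
    return "".join(chr(ord("a") + x) for x in p)

print(vigenere_decrypt("FENGRYDYFWIL", "yes"))  # havingfunyet
```

The outer `% 26` on the inverse handles the key letter "a" (0), whose inverse is 0 rather than 26.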
Initial Assumptions

   Assumptions about the Language:
    –   The plaintext will be based on the English
        language
    –   When doing frequency analysis to determine the
        key used, I will assume the key is an actual word
   Assumptions about the Method Used:
    –   I will be doing analysis on the XOR polyalphabetic
        substitution cipher
    –   XOR encryption will be modeled as addition mod 26,
        as used previously in the example (i.e. A = 0,
        B = 1, ..., Z = 25)
A Note on the Method

 Using addition mod 26 (instead of converting letters to
 binary representations and doing XOR bit by bit) does not
 take away from the learning experience. This is because
 in this type of cryptanalysis, the algorithm analyzes one
 character at a time without regard to the actual character,
 noting only that it is a distinct character. Addition
 mod 26 simply provides an easier medium for my peers
 and me to understand and convey the information, as we
 can talk about specific characters and need not be
 concerned with the abnormal or unprintable characters
 that XOR encryption would otherwise produce.
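To illustrate the point about unprintable characters, here is a small Python comparison (function names hypothetical) of XOR on ASCII character codes versus addition mod 26, applied to the first letter of the earlier example (h) with key letter y.

```python
def xor_char(p: str, k: str) -> str:
    # Bitwise XOR of the two ASCII character codes
    return chr(ord(p) ^ ord(k))

def add_mod26_char(p: str, k: str) -> str:
    # Addition mod 26 on letter positions (a = 0, ..., z = 25), upper-case result
    return chr(ord("A") + (ord(p) - ord("a") + ord(k) - ord("a")) % 26)

x = xor_char("h", "y")
print(repr(x), x.isprintable())  # '\x11' False  (a control character)
print(xor_char(x, "y"))          # h  (XOR with the same key undoes itself)
print(add_mod26_char("h", "y"))  # F  (always a printable letter)
```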
Exploiting the Weaknesses

After Eve determines that Alice
used a polyalphabetic cipher (after
all, a monoalphabetic substitution
cipher is too simple, even for
Alice), she remembers the strategy
for cracking the code: find the key
length and then use a frequency
analysis to determine either the
plaintext or the key used for
encryption.
    Applicable Theories and Terms

   Kasiski Test: In a polyalphabetic cipher text message, two identical
    segments of plaintext will be encrypted to the same cipher text
    whenever the distance between their occurrences in the plaintext is
    a multiple of the length of the keyword; therefore, if a string of
    characters appears repeatedly in the cipher message, it is possible
    that the distance between the occurrences is a multiple of the
    length of the keyword
   Friedman Test: Used to determine whether a cipher text has been
    enciphered using a monoalphabetic or polyalphabetic substitution
    cipher. If the cipher used is polyalphabetic, the test also suggests
    the length of the keyword using the Index of Coincidence




        Index of Coincidence: The probability of two letters
         randomly selected from a text being equal
          – The expected frequencies of the letters A through Z in
            the English language are known. Using these
            probabilities, the index of coincidence for the language is
            approximately 6.5%. Hence, if two letters are arbitrarily
            chosen from an English text, nearly 6.5% of the time the
            letters would be the same.
          – In a purely random text, the letters would occur with
            roughly the same frequency, resulting in the index of
            coincidence being about 3.8%.
Conventions and Abbreviations Employed


   n = the length of the cipher text being
    cryptanalyzed
   IC = Index of Coincidence, as discussed
    previously and represented by the following
    formula

        $$IC = \frac{\sum_{i=0}^{25} f_i (f_i - 1)}{n(n - 1)}$$

Where f_i represents the frequency (number of occurrences) of the
i-th letter of the alphabet (A = 0, ..., Z = 25) in the cipher text
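A minimal Python sketch of this formula, assuming that any non-letter characters are ignored and that the IC is reported as 0.0 when fewer than two letters are present (the formula itself is undefined there); `index_of_coincidence` is a hypothetical name.

```python
from collections import Counter

def index_of_coincidence(ciphertext: str) -> float:
    """IC = sum f_i (f_i - 1) / (n (n - 1)), counting only the letters A-Z."""
    letters = [ch for ch in ciphertext.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        return 0.0  # undefined for fewer than two letters; 0.0 is an arbitrary choice
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

print(index_of_coincidence("AABB"))  # 0.3333333333333333
print(index_of_coincidence("ABCD"))  # 0.0
```

On a long English text the result should land near 0.065; on random-looking text, near 0.038.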
Let the Code Breaking Begin...

                 Armed with this bank of
                 knowledge, Eve can
                 proceed to cryptanalyze
                 Alice’s message to Bob.
                 What are the methods she
                 can use and how effective
                 are the various techniques?
                 What is the best approach?
And Now Onto the Fun Part...

Applying the theories and principles!
Determining the Key Length

   I employed four distinct, yet related
    algorithms for finding the key length. These
    algorithms are outlined on the following
    slides.


   Note: These algorithms can stand alone; however,
    for increased accuracy, they can be combined.
(Formula taken from Cryptology by Albrecht Beutelspacher, page 39.)



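 Algorithm One: “Plug it in”

    Simply plug data into the following formula:

        $$\text{Key Length} = \frac{0.027\,n}{(n-1)\,IC - 0.038\,n + 0.065}$$

     Where n is the length of the text and IC is the Index of Coincidence for
     that text. The constants are the index of coincidence for English
     (0.065) and for random text (0.038); 0.027 is their difference.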
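The formula translates directly into Python. This is a sketch: the name `estimate_key_length` is hypothetical, and raising an error for a non-positive denominator is an added assumption, since the formula breaks down there.

```python
def estimate_key_length(n: int, ic: float) -> float:
    """Key length estimate from the slide: 0.027 n / ((n - 1) IC - 0.038 n + 0.065)."""
    denominator = (n - 1) * ic - 0.038 * n + 0.065
    if denominator <= 0:
        # Happens when IC drops slightly below the random-text value 0.038.
        raise ValueError("IC too low for this estimate")
    return 0.027 * n / denominator

# Sanity checks: an English-like IC suggests a key of length 1,
# a random-looking IC suggests a key as long as the text itself.
print(round(estimate_key_length(500, 0.065), 6))  # 1.0
print(round(estimate_key_length(500, 0.038), 6))  # 500.0
```

The two sanity checks show why the estimate is most trustworthy for short keys: as the IC approaches 0.038, small changes in IC cause large jumps in the estimate.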
(Algorithm taken from Introduction to Cryptography with Coding
Theory by Wade Trappe and Lawrence C. Washington, page 19.)


 Algorithm Two: Shift and Count

 1. Make a duplicate copy of the cipher text.
 2. Align the copy under the original, only
    shifted by x places.
 3. Record x and the number of coincidences.
      (i.e. where the letters match)
 4. Increase x (up to some chosen maximum shift)
      and go to step two.
 5. The shift with the most coincidences is a
     likely guess for the key length (or a multiple of it).
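A minimal Python sketch of Shift and Count. The maximum shift of 20 is an assumed default, not a value from the source, and the function names are hypothetical.

```python
def coincidences(ciphertext: str, shift: int) -> int:
    # Compare the text with a copy of itself displaced by `shift` places.
    return sum(1 for a, b in zip(ciphertext, ciphertext[shift:]) if a == b)

def shift_and_count(ciphertext: str, max_shift: int = 20) -> list[tuple[int, int]]:
    """Return (shift, coincidences) pairs, sorted with the most coincidences first."""
    counts = [(x, coincidences(ciphertext, x)) for x in range(1, max_shift + 1)]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)

print(shift_and_count("ABCABCABC", max_shift=4)[0])  # (3, 6)
```

Returning the whole sorted list, rather than only the top shift, makes it easy to see when several multiples of the key length score similarly.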
(Algorithm adapted from Cryptological Mathematics by Robert
Edward Lewand, pages 90 - 92.)


 Algorithm Three: Friedman Test

1. for m = 1 to n
2.     Fill the ROWS of a rectangular array with
       dimensions (n/m) x m with consecutive substrings
       of length m from the cipher text.
3.     Compute the IC of each COLUMN.
4.     Find the average of all the column ICs.
5.     If the average IC is approximately 0.065, break;
       m is the likely keyword length. Else continue loop.
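The loop above can be sketched in Python as follows. The cutoff of 0.06 used to interpret "approximately 0.065" is an assumption, as are the function names; column j of the array is simply every m-th character starting at position j.

```python
from collections import Counter
from typing import Optional

def index_of_coincidence(s: str) -> float:
    n = len(s)
    if n < 2:
        return 0.0  # IC is undefined for fewer than two letters
    return sum(f * (f - 1) for f in Counter(s).values()) / (n * (n - 1))

def average_column_ic(ciphertext: str, m: int) -> float:
    # Writing the text in rows of length m, column j holds characters j, j+m, j+2m, ...
    columns = [ciphertext[j::m] for j in range(m)]
    return sum(index_of_coincidence(col) for col in columns) / m

def friedman_key_length(ciphertext: str, threshold: float = 0.06,
                        max_m: Optional[int] = None) -> Optional[int]:
    """Return the first m whose average column IC reaches `threshold`, else None."""
    limit = max_m if max_m is not None else len(ciphertext)
    for m in range(1, limit + 1):
        if average_column_ic(ciphertext, m) >= threshold:
            return m
    return None
```

Because every candidate m requires a full pass over the text, this approach is slower than the single formula of Algorithm One.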
(Algorithm adapted from Cryptological Mathematics by Robert
Edward Lewand, pages 90 - 92, and Cryptography Theory and
Practice by Douglas R. Stinson, page 31.)

 Algorithm Four: Kasiski Test

 1. Determine repeating strings of characters in
     the cipher text (of length at least three).
 2. Tabulate the distances between occurrences.
 3. The probable key length is a divisor of the
     greatest common divisor (GCD) of all the
     distances.
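A minimal Python sketch of the Kasiski Test. It uses repeated strings of exactly three characters (longer repeats also contain repeated trigrams), and it measures distances between consecutive occurrences; both choices and the function names are assumptions.

```python
from functools import reduce
from math import gcd

def repeated_distances(ciphertext: str, length: int = 3) -> list[int]:
    # Record every starting position of each substring of the given length.
    positions: dict[str, list[int]] = {}
    for i in range(len(ciphertext) - length + 1):
        positions.setdefault(ciphertext[i:i + length], []).append(i)
    distances = []
    for pos in positions.values():
        # Distance between each pair of consecutive occurrences
        distances.extend(b - a for a, b in zip(pos, pos[1:]))
    return distances

def kasiski_gcd(ciphertext: str, length: int = 3) -> int:
    # The key length is probably a divisor of this GCD (0 means no repeats were found).
    return reduce(gcd, repeated_distances(ciphertext, length), 0)

print(repeated_distances("ABCXXABCYYABC"))  # [5, 5]
print(kasiski_gcd("ABCXXABCYYABC"))         # 5
```

A single accidental repeat can pull the GCD down to 1, which is one reason the output often needs interpreting.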
Theory Behind the Kasiski Test

   If a string of characters is repeated in a
    plaintext message at a distance that is a
    multiple of the length of the keyword, then the
    cipher text representations of these characters
    will be identical in each occurrence
And the Winner is...

   The Friedman Test is the most accurate, but also the
    slowest algorithm
   The Shift and Count algorithm is also very accurate,
    taking less time than the Friedman Test
   The “Plug it in” algorithm runs the fastest, but is
    only accurate on small keys
   The Kasiski Test almost always outputs the correct
    key length or a multiple thereof, but how many
    possible lengths must the user try before finding
    the correct one?
Determining the Plain Text/Key

   I used three distinct, yet related algorithms for
    finding the plain text/key. These algorithms are
    outlined on the following slides.

   These algorithms all require the key length as
    input. Knowing the key length, the cipher text
    can be split into rows of that length. Looking
    down a column, all letters are encrypted by the
    same key letter, resulting in a Monoalphabetic
    Substitution cipher (specifically, a shift cipher)!
(Algorithm taken from Beutelspacher and Lewand.)

 Algorithm One: Basic Frequency
 Analysis

 1. Split text into rows of the same length as the
   key.
 2. For each column, determine the frequencies
   of each letter.
 3. Compare to expected English frequencies
   (these values are known and tabulated) and
   "guess" at encryption.
 4. Repeat process on next column.
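One simple way to "guess" at each column's encryption is to assume that its most common letter is an encrypted "e" (position 4). The Python sketch below uses that heuristic as an assumption; the source does not say exactly how the comparison was made, and the function name is hypothetical. The cipher text is assumed to contain only the letters A-Z.

```python
from collections import Counter

def guess_key_by_most_common(ciphertext: str, key_length: int) -> str:
    """Assume the most common letter in each column is an encrypted 'e'."""
    key = []
    for j in range(key_length):
        column = ciphertext[j::key_length]             # letters encrypted by key letter j
        top_letter, _ = Counter(column).most_common(1)[0]
        shift = (ord(top_letter) - ord("A") - 4) % 26  # 'e' = 4
        key.append(chr(ord("a") + shift))
    return "".join(key)

print(guess_key_by_most_common("FIFIFIXI", 2))  # be
```

With short columns, the most common letter is often not "e", which fits the observation that this method works best for small key lengths.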
(Algorithm taken from Introduction to Cryptography with Coding
Theory by Wade Trappe and Lawrence C. Washington, pages 22 - 23.)
 Algorithm Two: Permute through
 All Shifts

1. Split text into rows of the same length as the
  key.
2. For each column, determine the frequencies of
  each letter.
3. Take the dot product of the column frequencies
  with every possible shift of the standard
  English letter frequencies.
4. The largest value is the most likely shift.
5. Repeat the process on the next column.
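A Python sketch of this dot-product search. The `ENGLISH` table holds approximate, commonly cited English letter frequencies and is an assumption rather than the exact table used in this project; the function names are hypothetical, and the cipher text is assumed to contain only the letters A-Z.

```python
# Approximate English letter frequencies for a..z (commonly cited values; assumption)
ENGLISH = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.020, 0.061, 0.070, 0.002,
           0.008, 0.040, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060, 0.063, 0.091,
           0.028, 0.010, 0.023, 0.001, 0.020, 0.001]

def best_shift(column: str) -> int:
    counts = [0] * 26
    for ch in column:
        counts[ord(ch) - ord("A")] += 1

    # If the column was shifted by g, cipher letter (i + g) mod 26 came from plaintext letter i.
    def score(g: int) -> float:
        return sum(ENGLISH[i] * counts[(i + g) % 26] for i in range(26))

    return max(range(26), key=score)  # try all 26 shifts, keep the largest dot product

def find_key(ciphertext: str, key_length: int) -> str:
    return "".join(chr(ord("a") + best_shift(ciphertext[j::key_length]))
                   for j in range(key_length))

print(find_key("FIFIFIFI", 2))  # be
```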
(Algorithm taken from Stinson pages 33 - 36.)

 Algorithm Three: Find Relative
 Shifts between Key Letters

1. Split text into rows of the same length as the key.
2. For each column, determine the frequencies of each
   letter.
3. Find the MIc of each column with every other column.
4. Search for the MIcs closest to 0.065; each yields the
   relative shift from column i to column j.
5. Form a system of equations and solve in terms of one
   key letter.
6. The keyword is one of the 26 results obtained by
   shifting every letter of the solution by the same amount.


         MI_c is represented by the equation below, where
          n and m are the lengths of substrings f and h, f_i
          is the frequency of letter i in f, and h_{i-g} is the
          frequency of letter i - g (taken mod 26) in h, where
          0 <= g <= 25:

            $$MI_c(f, h^g) = \frac{\sum_{i=0}^{25} f_i \, h_{i-g}}{n \cdot m}$$
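A simplified Python sketch of the relative-shift idea. Two choices here are assumptions that differ from the steps above. First, the sketch picks the g with the largest MI_c, a common stand-in for "closest to 0.065". Second, instead of solving a full system of equations, it compares every column only with the first column. The function names are hypothetical, and the cipher text is assumed to contain only the letters A-Z.

```python
from collections import Counter

def mutual_ic(f: str, h: str, g: int) -> float:
    """MI_c(f, h^g) = sum_i f_i * h_{(i - g) mod 26} / (n * m)."""
    fc, hc = Counter(f), Counter(h)
    total = sum(fc[chr(65 + i)] * hc[chr(65 + (i - g) % 26)] for i in range(26))
    return total / (len(f) * len(h))

def relative_shift(f: str, h: str) -> int:
    # The g with the largest MI_c estimates k_f - k_h (mod 26).
    return max(range(26), key=lambda g: mutual_ic(f, h, g))

def candidate_keys(ciphertext: str, key_length: int) -> list[str]:
    columns = [ciphertext[j::key_length] for j in range(key_length)]
    # Shift of each key letter relative to the first: k_j = k_0 + s_j (mod 26)
    shifts = [relative_shift(col, columns[0]) for col in columns]
    # One candidate keyword for each of the 26 possible values of k_0
    return ["".join(chr(ord("a") + (k0 + s) % 26) for s in shifts) for k0 in range(26)]
```

The user (or a dictionary check) still has to choose among the 26 candidates, since only the differences between key letters are recovered.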
And the Winner is...

   The Permute through All Shifts algorithm is the logical
    winner, since all possibilities are attempted
   The Basic Frequency Analysis works okay
    for small key lengths
   What about the Relative Shifts Algorithm?
    –   I need far more computing power (or patience) to
        test this algorithm.
    –   Yields accurate results when the matrix can be
        solved
Down the Road: Unaddressed
Issues and Enhancements to
Implement

   When the key length is equal to the plaintext length and
    the key is perfectly random, this XOR encryption method
    is considered perfectly secure. But does the key length really
    have to equal the plaintext length for the encryption to
    be secure? Where exactly is the critical point?
   What if a random key is used instead of
    an actual word? How will this affect
    the frequency analysis to determine the
    key?

      • I used a ciphertext-only attack (the only resource available to
      analyze is the encrypted cipher text). Consideration should be
      given to other types of attacks, such as cribbing
      (knowledge that certain words appear in the plaintext) and
      taking advantage of multiple cipher texts encrypted with the same
      key (additional information is gained under these
      circumstances because you KNOW the keys overlap
      starting at the beginning of each cipher text; however, how do
      you initially determine that the same key was used?).

         My final code requires “slimming down” to
          increase efficiency.
         A spell checker/dictionary could be added to
          increase accuracy
          –    Instead of giving the user all 26 shifted versions of
               the keyword from the Find Relative Shifts between Key
               Letters Algorithm, only give the user actual words
          –    When using the other two algorithms, a spelling
               auto-corrector would improve accuracy

         In the first two find plain text algorithms, allow
          the user to select specific letters in the
          keyword or plaintext to change and display
          the effect of these changes.
         The key length algorithm that computes the GCD
          (the Kasiski Test) could be altered to throw out
          “bad” data
          –    i.e. find the distance(s) that prevent the other
               distances from sharing a common divisor greater
               than 1 and ignore them

         Combine the various algorithms so they can
          share their results and build on one
          another.
         Finally, how about considering a new method
          of encryption?
Strategies & Knowledge

   Research, research, research!
    –   Understand everything you read, even how the
        author got from one step to the next
   Trial and error, but try it.
   Do an example first - ON PAPER (but make
    sure you do your math right)
   No single part of the project was difficult to
    code, but implementation required an in-
    depth understanding of the problem
Advice to Next Year’s Seniors

   Start EARLY! It goes by FAST.
   It is almost impossible to stay on target with your
    first schedule, second schedule, third schedule...
   Lofty aspirations at the beginning, but reality
    will hit
   ASK QUESTIONS!
    –   Different professors have different “specialty”
        areas; take advantage of them
    –   Your classmates can provide great insight
    –   Don’t re-invent the wheel; check out other
        solutions first
QUESTIONS

				
DOCUMENT INFO
posted:8/22/2011
language:English
pages:46