Cryptographic Potential of the Playstation 3

					Kenneth Roe (kennethroe@sbcglobal.net)
Cryptography on the Playstation 3: Brute force AES Attack
Computer Science Masters Project
University of New Haven


    Cryptography on the Playstation 3: Brute
              Force AES Attack
Introduction
Adults typically categorize game consoles as toys. This is not surprising, since that is their
intended purpose. Recently, however, the technology used in game consoles has become
extremely advanced, even bleeding edge. While the intended purpose of these devices is
entertainment, modern game console hardware such as the Sony Playstation 3 has more in
common with high end supercomputers than with the toys it shares a shelf with.
Unfortunately most of this power goes to waste: these machines sit idle most of the day.
Stanford University’s Pande group recognized this wasted potential and found a way to utilize it.
This group runs the Folding @ Home network, a distributed computing network where people
can donate idle processing power to scientists so they can run complex calculations [6]. The
group worked with Sony to develop a Folding @ Home client for the Playstation 3. Since this
client went live, the processing power of the network has more than doubled, making it one of
the fastest distributed computing networks in the world. The overwhelming success of the
Playstation 3 in the protein folding project was the inspiration for this project. Since it does
so well at these complex scientific calculations, how would it do at cryptography? Perhaps more
importantly, can a typical programmer unlock the potential of the Cell, or is this a beast that
can only be tamed by a handful of specialists?

Strategy
This project could have examined the technical specifications of the Playstation 3 and its Cell
processor and marveled at how great it would be at cryptography, but that would leave the reader
wondering whether theory translates to reality, and more importantly it would not be much fun. To
see what the Playstation 3 can really do, this project implemented an interesting, computationally
intensive application on both the Playstation 3 and commodity PC hardware and compared their
performance. The chosen example was a known plaintext brute force attack on the AES
encryption algorithm. This paper does not aim to shoot holes in AES; that would be naive and
unachievable. The primary goal of this paper is to teach the reader about a unique new breed of
processor and to show an interesting application that highlights the Cell's performance
potential.


AES
AES is a symmetric block cipher that is the current FIPS standard for protecting electronic data in
business and government (FIPS 197) [1]. It is based on the Rijndael cipher but uses a fixed 128 bit
block size and only supports 128, 192, or 256 bit keys (Rijndael allows variable key length and
block size) [7].

CryptographyOnThePS3.doc
4/15/2007 5:00 PM
Page | 1



Cryptanalysis
Every cryptographic algorithm is vulnerable to cryptanalysis. There are many different
cryptanalysis techniques, but one is possible with every cryptographic algorithm: brute force.
The theoretical vulnerability of an algorithm is determined by the effort required to search the
full key space; if every possible key is tried, one of them has to be the correct key. Because of
their simplicity many would consider brute force attacks primitive. Primitive or not, brute
force is often the most effective cryptanalytic technique. An algorithm is considered “broken”
when an attack is found that requires fewer operations than a brute force attack would to
recover the plaintext. These “breaks” show weakness in the algorithm, but are often nothing
more than theoretical weaknesses due to unrealistic constraints on the attack, such as requiring a
huge number of known plaintext/ciphertext pairs. It is hard for an attacker to get a single
known plaintext/ciphertext pair, let alone a large number of them.


The Cell
The heart of the Playstation 3 is the Cell Broadband Engine microprocessor. This processor was
jointly developed by Sony, Toshiba, and IBM (STI) [8]. While the purpose of this paper is not to
marvel at the microarchitecture of the Cell, understanding what makes the Cell different from
typical CPUs is essential to unlocking its power. Traditional microprocessors have a single
general purpose processing core. Recently multi-core processors have reached the mainstream
market. These are essentially multiple (identical) general purpose processing cores packaged
together so they can be installed in a single socket. They are a consolidated version of the shared
memory multiprocessor systems that preceded them.

The Cell is different, very different. Instead of trying to build a faster processor by cramming
more transistors into general purpose processors, STI decided to improve performance through
specialization. The Cell consists of a single general purpose core called the Power Processing
Element (PPE) and eight highly specialized 128 bit vector processing units called Synergistic
Processing Elements (SPEs). The PPE is capable of general purpose computing; it is the heart of
the Cell. The PPE is a 64 bit RISC processing unit based on IBM’s POWER architecture that is
capable of running two threads in parallel [9]. The SPEs are the workhorses of the Cell. These
are specialized processing units that are built to perform a limited set of operations very quickly.
The general programming strategy recommended by IBM is to control the SPEs with the code
running on the PPE, and offload all compute intensive work to the SPEs [3].



The PPE and SPEs are not set up in a traditional shared memory multiprocessor configuration.
The PPE is linked directly to main memory (256MB). The SPEs each have their own private
memory, referred to as the Local Store (256KB). The PPE and SPEs are connected via a high
capacity interconnect called the Element Interconnect Bus (EIB). While the PPE and SPEs are
connected, the SPEs cannot access main memory directly; they must do so via DMA [9]. This is
a unique memory configuration. The SPEs can access their local store very quickly, and since a
local store is dedicated to a single SPE it does not need to worry about contention. When SPEs
need to work with data that is in main memory, the data must be transferred across the EIB. The
PPE and SPEs can both initiate DMA requests, but for efficiency reasons it is preferable to
initiate DMA from the SPEs [3]. If an SPE modifies data and wants the PPE to be able to see the
change, it needs to write the updated data back to main memory via DMA.

The Cell processor is not exclusive to the Playstation 3; it is also available in high performance
servers such as the IBM QS20 [10], dedicated processing boards such as those produced by
Mercury Computer [11], and in the Department of Energy’s next supercomputer, the IBM
Roadrunner [12]. The Cell processor in the Playstation 3 is a full featured Cell, the only
restriction being that one of the eight SPEs is disabled. This was done to improve chip yields;
many more processors can pass QC if one of the SPEs is permitted to be defective [13]. The
Playstation 3 provides a unique opportunity to gain access to supercomputer technology without
paying supercomputer prices (even the relatively cheap QS20 blade server is around $20,000
[10]).

SPE Concepts
The SPEs are what make the Cell a monster. Their wide 128 bit registers and SIMD instruction
set allow huge volumes of data to be processed quickly. The fact that the Cell processor contains
eight SPEs does not hurt either (six are available through Linux [13]). To harness the power of
the SPEs programmers must understand a few key concepts:

Vector Data Types

Vector data types allow multiple sub-quadword values to be stored in a single 128 bit quadword.
The number of values that fit in a vector varies with the type of scalar being aggregated. A
vector unsigned char holds sixteen 8 bit unsigned chars, a vector unsigned int holds four 32 bit
unsigned ints, and so on. The utility of vector data types is not immediately obvious, but will
become so in the next section.

SIMD Operations

Vector data types do not do much on their own; they just provide a way to structure data in
quadwords so it can be processed using SIMD operations. SIMD stands for Single Instruction
Multiple Data. SIMD operations allow one CPU operation to be applied to multiple values in
parallel. The figure below shows two lists of four 32 bit unsigned ints being stored in two 128
bit vector unsigned ints. These are added using a SIMD add operation to produce a 128 bit
vector unsigned int result. With a single CPU instruction four pairs of values have been added.




                [Figure: SIMD addition of two vector unsigned ints. Concept derived from [2]]
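
The lane-wise behaviour described above can be imitated in portable C for readers without a Cell
toolchain. The sketch below is only an emulation: the vec4_u32 type and the simd_add_u32 function
are hypothetical names introduced for illustration, and the loop merely models the result that an
SPE would produce with one instruction.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128 bit "vector unsigned int": four 32 bit lanes. */
typedef struct {
    uint32_t lane[4];
} vec4_u32;

/* Emulates a SIMD add: each lane of a is added to the matching lane of b.
 * On an SPE this would be one instruction; here the loop only models the result. */
vec4_u32 simd_add_u32(vec4_u32 a, vec4_u32 b)
{
    vec4_u32 sum;
    for (int i = 0; i < 4; i++) {
        /* Lanes are independent: a lane that overflows wraps around
         * and never carries into its neighbour. */
        sum.lane[i] = a.lane[i] + b.lane[i];
    }
    return sum;
}
```

The point of the layout is that the four additions have no dependency on each other, which is
what lets the hardware perform them at the same time.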

Intrinsics

Typically a high level language like C would be used to implement an algorithm, and the
compiler would be responsible for mapping the C operations to CPU instructions. This works
well for some target platforms. A good C compiler can normally come very close to the
performance of an assembly implementation. This is not the case with the Cell, however. For the
Cell to perform, data needs to be vectorized and SIMD operations need to be used. I expected the
compiler to automatically vectorize data (such as scalar arrays) and use SIMD operations, but
plain C code compiled for the Cell performed dismally. I am not sure whether this is due to
the immaturity of the compiler or whether effective auto-vectorization is too much to ask.

Optimizing for the Cell requires the programmer to convert data to vectors and convert
operations to intrinsics calls. A vector is a 128 bit chunk of data. The SPEs have 128 bit
registers, so any of the vector types can fit in a single register. SPE CPU instructions operate on
vectors. IBM has provided a library of SPE intrinsics that allow the programmer to make near
direct calls to the SPEs' SIMD instructions [14]. This allows the programmer to take back
some control from the compiler without resorting to raw assembly.

       Vector/Intrinsics Example: XOR two 128 bit chunks of data.

       Standard C implementation:
       unsigned char chunk1[16] = {0xA0,0x03,0x00,0x04,0x13,0xB4,0x00,0x05,0x80,0x66,0xDF,0x01,0x34,0x06,0x80,0x10};
       unsigned char chunk2[16] = {0xF4,0x50,0x01,0xA4,0x57,0x23,0x60,0x40,0xF0,0xAA,0x12,0x40,0x01,0xEF,0xC4,0x08};
       unsigned char result[16];
       int i;

       /* XOR the two chunks one byte at a time: 16 loop iterations. */
       for (i = 0; i < 16; i++)
       {
                 result[i] = chunk1[i] ^ chunk2[i];
       }


       Vectorized implementation using intrinsics:
       #include <spu_intrinsics.h>   /* SPE intrinsics; needs the SPU compiler from the Cell SDK */

       vector unsigned char chunk1 = {0xA0,0x03,0x00,0x04,0x13,0xB4,0x00,0x05,0x80,0x66,0xDF,0x01,0x34,0x06,0x80,0x10};
       vector unsigned char chunk2 = {0xF4,0x50,0x01,0xA4,0x57,0x23,0x60,0x40,0xF0,0xAA,0x12,0x40,0x01,0xEF,0xC4,0x08};
       vector unsigned char result;

       /* XOR all 16 bytes with a single intrinsic call. */
       result = spu_xor(chunk1, chunk2);



The standard C implementation represents the 128 bits of data as an array of unsigned chars. It
then loops through each position and XORs the chunks to produce the result. The compiler will
need to interpret the code and decide how to map this to CPU instructions. On a machine with 32
or 64 bit registers multiple CPU instructions would typically be required to XOR two 128 bit
chunks of data.

The vectorized version using SPE intrinsics takes control away from the compiler. The 128 bits
of data are represented with a vector unsigned char. This is a data type that contains sixteen
8 bit unsigned chars aligned so that they can be stored in a single register. The XOR operation
is performed using the spu_xor intrinsic. The documentation for this intrinsic indicates that
it maps to a single CPU instruction, XOR [14]. Intrinsics allow the programmer to make low
level calls easily from a high level language.

Performance Comparison Strategy
To evaluate the performance of the Playstation 3 and its Cell processor I implemented a brute
force attack on AES on the Playstation 3 as well as on commodity PC hardware. A range of keys
to search was chosen, and the search was performed on both sets of hardware. Timing was taken
for both versions and the results were compared.

Software Design
The x86 and Cell versions of the attack use the same basic design: a single controller thread
and many worker threads.

Controller

The controller is responsible for managing the attack. It breaks up the work to be done into
chunks. Worker threads are spawned to process the chunks. The optimal number of worker
threads is the number of cores available on the target CPU.

               Pseudocode:
               Build a known plaintext, ciphertext pair.

               for (each core)
               {
                      Allocate a portion (1/# cores) of the keyspace to be searched.

                      Spawn a worker thread to search the keyspace passing it the range to search
                      and the known plaintext, ciphertext pair.
               }

               Exit when a worker thread finds the key OR all worker threads finish searching
               their chunk of the keyspace.
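
The allocation step of the controller can be sketched in C as follows. This is a minimal
illustration rather than the project's code: the key_chunk type and split_keyspace function are
hypothetical names, and the sketch assumes the range to be searched fits in 64 bits (true for
the test range used later in this paper, not for the full 128 bit keyspace). Thread creation is
left out so that the arithmetic stands on its own.

```c
#include <stdint.h>

/* Hypothetical description of one worker's share: keys in [first, first + count). */
typedef struct {
    uint64_t first;
    uint64_t count;
} key_chunk;

/* Split `total` keys starting at `start` into `workers` contiguous chunks.
 * Any remainder is spread over the first chunks, so sizes differ by at most one.
 * Returns the number of chunks written to `out`, or 0 if `workers` is not positive. */
int split_keyspace(uint64_t start, uint64_t total, int workers, key_chunk *out)
{
    if (workers <= 0) {
        return 0;
    }
    uint64_t base = total / (uint64_t)workers;
    uint64_t extra = total % (uint64_t)workers;
    uint64_t next = start;
    for (int i = 0; i < workers; i++) {
        out[i].first = next;
        out[i].count = base + ((uint64_t)i < extra ? 1 : 0);
        next += out[i].count;
    }
    return workers;
}
```

Called with a total of 0x3C000000 keys and six workers, each chunk holds 0x0A000000 keys, which
matches the worker boundaries printed in the Results section.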



Worker


The worker loops through the range of keys and tests each candidate key. If the key is found it
returns the key. When the entire range has been searched the worker returns indicating that the
key was not found.

               Pseudocode:
               for (start of key range TO end of key range)
               {
                      Encrypt known plaintext to produce candidate ciphertext.

                       if (candidate ciphertext = actual ciphertext)
                       {
                              return the key;
                       }

                       Move to next key.
               }
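
The worker loop above can be sketched in portable C. The names encrypt_fn, next_key,
search_range, and toy_encrypt are hypothetical and introduced only for this illustration. The
sketch assumes the key is stored as a 16 byte big-endian counter, and toy_encrypt is a
deliberately trivial stand-in, not AES, so that the loop can be exercised without a cipher
implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for "encrypt one 16 byte block under one 16 byte key". */
typedef void (*encrypt_fn)(const uint8_t key[16], const uint8_t in[16], uint8_t out[16]);

/* Treat the 16 byte key as a big-endian 128 bit counter and add one. */
void next_key(uint8_t key[16])
{
    for (int i = 15; i >= 0; i--) {
        if (++key[i] != 0) {
            break;              /* stop once a byte does not wrap around to zero */
        }
    }
}

/* Try `count` keys starting at `key`. Returns 1 and leaves the matching key in `key`
 * if one of them maps `plain` to `cipher`; returns 0 once the range is exhausted. */
int search_range(uint8_t key[16], uint64_t count, const uint8_t plain[16],
                 const uint8_t cipher[16], encrypt_fn encrypt)
{
    uint8_t candidate[16];
    for (uint64_t n = 0; n < count; n++) {
        encrypt(key, plain, candidate);
        if (memcmp(candidate, cipher, 16) == 0) {
            return 1;
        }
        next_key(key);
    }
    return 0;
}

/* Toy stand-in cipher (NOT AES): XORs key and plaintext, only to exercise the loop. */
void toy_encrypt(const uint8_t key[16], const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = key[i] ^ in[i];
    }
}
```

Passing the cipher as a function pointer keeps the search loop identical across platforms, so
only the encrypt routine would need a platform specific version.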



AES Implementation - Cell
AES can be optimized for a wide range of architectures. To maximize performance, software
implementations need to be designed with the target platform in mind. Optimized software
implementations are widely available for many types of hardware, from smart cards to
supercomputers. At the time of writing, however, there was no freely available Cell optimized
AES implementation, so I needed to create one.

The AES implementation created for this project is not a full fledged implementation. Only
encrypt and key scheduling operations needed to be implemented for the attack. In addition the
attack works on 128 bit keys, so the AES implementation only supports 128 bit keys. It would be
trivial to turn this into a full fledged Cell optimized AES implementation (if someone else is
looking for a project).

The process began with evaluating the available AES implementations compiled for the Cell to
determine how well the attack would run without optimization. Performance was much slower
than on x86 hardware with all the implementations tested. It became clear that realizing the
Cell's potential would require a custom AES implementation. The only question remaining was
which of the existing AES implementations to use as a starting point. The “optimized” reference
implementation [5] performed significantly better than the base reference implementation [4], but
there was not a clear division of the different steps of an AES round, so I found it easier to start
my optimization attempts with the base reference implementation. It is necessary to understand
how AES works to understand what was done to optimize it for the Cell.

AES Encrypt Operation




The AES encrypt operation takes a plaintext and an expanded key and returns a ciphertext. The
key steps in encryption (SubBytes, ShiftRows, MixColumns, and AddRoundKey) are outlined in
more detail below.

              Pseudocode: [figure not reproduced; see the cipher pseudocode in [1]]


AddRoundKey

       Summary: The AddRoundKey step applies a round key that is derived from the 128 bit
       encryption key to the input. The 128 bit key is applied to the 128 bit input by XORing
       them together.




              [Figure: the AddRoundKey transformation, from [1]]

       Reference Implementation: The reference implementation loops through each byte of
       the input XORing one at a time with the corresponding byte of the round key.


      Cell Optimized Implementation: This operation is very easy to optimize for the Cell. The
      input is 128 bits, the round key is 128 bits, and the key is applied to the input via the XOR
      operation to produce a 128 bit output. Conveniently the Cell has a vector intrinsic
      (spu_xor) that takes two 128 bit vectors, XORs them, and returns an output vector. The
      intrinsic maps to a single CPU instruction (XOR) so the entire AddRoundKey step can be
      performed in one operation.


SubBytes

      Summary: The SubBytes step takes each byte of the input, uses it as the key to a table
      lookup, and replaces the byte with the value from the table.




              [Figure: the SubBytes transformation, from [1]]

      Reference Implementation: The s-box is implemented as a 256 entry byte array. The
      input array is looped through one byte at a time with the current byte used as the key to
      the 256 entry byte array. The value returned from the table lookup is substituted in one
      byte at a time.

      Cell Optimized Implementation: This step is tricky to optimize. It is necessary to look at
      what is going on from a higher level without being biased by the reference
      implementation's approach. The s-box is implemented as a 16 entry quadword array.
      Lookups are performed using the 5 least significant bits of each byte of the input data as a
      key to index two of the 16 byte entries of the s-box array at a time. This is repeated 8
      times to cover the whole 16 entry s-box. The result of these 32 byte s-box lookups is 8
      intermediate vectors. These vectors contain the valid substitution values, but they also
      contain many invalid values, since the most significant 3 bits of each input byte were
      ignored during the lookup. The invalid values need to be eliminated and the 8
      intermediate vectors consolidated into a single result vector containing only the
      valid values. This is done using binary tree pruning. The bytes of the intermediate vectors
      are pruned down in stages, based on the value of the three most significant bits that were
      ignored previously, eventually leaving a single vector containing the substituted values.
      This seems overly complex and wasteful compared to the single byte substitutions the
      reference implementation uses. It is not, however, because the SIMD operations this
      technique uses perform lookups on 16 bytes at a time. This technique for SIMD table
      lookups was illustrated in [2].

      [Figure: SIMD table lookup using binary tree pruning, from [2]]
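
The lookup-then-prune idea can be emulated in portable C to check the logic. This is a sketch
under stated assumptions, not the project's SPE code: lookup32, select_by_bit, and
table_lookup16 are hypothetical names, each inner loop stands in for one 16 byte wide SPE
operation, and the table is passed in as a parameter so that any 256 entry table (the AES s-box
included) can be used.

```c
#include <stdint.h>

/* Emulates one two-register byte shuffle: each of the 16 input bytes selects,
 * by its low 5 bits, one byte out of a 32 byte slice of the table. */
static void lookup32(const uint8_t slice[32], const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = slice[in[i] & 0x1F];
    }
}

/* Emulates a lane-wise select: take b where `bit` is set in the input byte, else a.
 * `out` may be the same array as `a` because each lane only reads its own position. */
static void select_by_bit(const uint8_t a[16], const uint8_t b[16],
                          const uint8_t in[16], uint8_t bit, uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = (in[i] & bit) ? b[i] : a[i];
    }
}

/* 256 entry table lookup on 16 bytes at once: 8 slice lookups produce 8 candidate
 * vectors, then a three level binary tree of selects (bits 5, 6, 7) prunes them to one. */
void table_lookup16(const uint8_t table[256], const uint8_t in[16], uint8_t out[16])
{
    uint8_t v[8][16];
    for (int j = 0; j < 8; j++) {
        lookup32(table + 32 * j, in, v[j]);
    }
    for (int j = 0; j < 4; j++) {
        select_by_bit(v[2 * j], v[2 * j + 1], in, 0x20, v[j]);   /* 8 candidates -> 4 */
    }
    for (int j = 0; j < 2; j++) {
        select_by_bit(v[2 * j], v[2 * j + 1], in, 0x40, v[j]);   /* 4 -> 2 */
    }
    select_by_bit(v[0], v[1], in, 0x80, out);                    /* 2 -> 1 */
}
```

In this emulation the work per block is 8 lookups and 7 selects, each covering all 16 bytes,
which is the trade the paragraph above describes: more operations in total, but every one of
them is 16 lanes wide.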

ShiftRows

      Summary: The ShiftRows step arranges the input as a 4x4 byte array and circular left
      shifts each row of the array by a varying number of bytes. The first row is not shifted at
      all, the second by one byte, the third by two bytes, and the fourth by three bytes.




                        [Figure: the ShiftRows transformation, from [1]]

      Reference Implementation: In the reference implementation the ShiftRows step is
      performed by treating the input as a 4x4 array as in the conceptual ShiftRows step. Each
      of the rows except the first gets looped through. For each row each byte is copied to the
      destination position in a temporary array. The destination position for each byte is
      determined by performing a lookup in a table that contains the destination index. After
      the copies (effectively a circular shift) are performed on a row in the temporary array the
      row is copied back over the input to form the output.

      Cell Optimized Implementation: This is another step that can be performed in a simple
      and efficient manner on the SPEs. Instead of treating the 16 bytes of input as a 4x4 array
      like the conceptual operation and reference implementation do, the 16 bytes are kept in
      vector unsigned char form. The row by row variable shifts can be performed by
      rearranging the bytes. A constant holds the shuffle pattern, which dictates where each byte
      of the input gets moved to produce the output. A single spu_shuffle intrinsic call can
      perform the entire ShiftRows operation. The spu_shuffle intrinsic maps to a single
      SHUFB CPU instruction.
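
A shuffle pattern of this kind can be written down and checked in portable C. The sketch below
assumes the column-major state layout defined in [1] (byte index = row + 4 * column); the
pattern used in the project's SPE code may be laid out differently. The names
SHIFT_ROWS_PATTERN, shift_rows, and shift_rows_by_definition are hypothetical.

```c
#include <stdint.h>

/* Shuffle pattern for ShiftRows on a column-major state:
 * output byte i is taken from input byte SHIFT_ROWS_PATTERN[i]. */
static const uint8_t SHIFT_ROWS_PATTERN[16] = {
     0,  5, 10, 15,
     4,  9, 14,  3,
     8, 13,  2,  7,
    12,  1,  6, 11
};

/* Emulates a single byte shuffle: one pass over the pattern, no per-row loops. */
void shift_rows(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++) {
        out[i] = in[SHIFT_ROWS_PATTERN[i]];
    }
}

/* Row-by-row definition, used only to cross-check the pattern:
 * row r is rotated left by r columns. */
void shift_rows_by_definition(const uint8_t in[16], uint8_t out[16])
{
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
        }
    }
}
```

Because the pattern is a compile-time constant, the whole step reduces to one data movement
operation, which is why it suits a byte shuffle instruction so well.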

MixColumns

      The MixColumns step arranges the 16 input bytes in a 4x4 byte array and applies a
      function to each column of the array. The function performs a transformation on the bytes
      where each of the input bytes of a column affects all of the output bytes of that column.
      This step cannot be explained without getting into some heavy math. The purpose of this
      paper is to help the reader understand the potential of the Cell and what it takes to
      unlock it, so a detailed explanation is out of scope. This step, especially in the Cell
      optimized version, is too complex to be a good example of unlocking the power of the Cell,
      so it will be skipped. The MixColumns function was heavily optimized for the Cell and the
      implementation is creative and interesting. This was the most time consuming part of AES
      to optimize. If you want more detail have a look at the included Cell optimized AES code;
      it is well documented.
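
For readers who still want a concrete picture of the step, the scalar definition from [1] is
short. The sketch below is the plain textbook form, not the Cell optimized version described
above; the function names xtime, mix_column, and mix_columns are chosen for illustration, and
the 16 byte state is assumed to be column-major as in [1].

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reducing by the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Mix one 4 byte column in place: a matrix multiply over GF(2^8) with rows
 * (2 3 1 1), (1 2 3 1), (1 1 2 3), (3 1 1 2). Multiplying by 3 is xtime(a) ^ a. */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
    col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
}

/* Apply mix_column to each of the four columns of a 16 byte, column-major state. */
void mix_columns(uint8_t state[16])
{
    for (int c = 0; c < 4; c++) {
        mix_column(state + 4 * c);
    }
}
```

Every output byte depends on all four input bytes of its column, which is the mixing property
mentioned in the summary.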

AES Key Expansion

      Summary: The key expansion process converts the 128 bit key provided by the user to a
      set of round keys based off of the user key. One 128 bit round key needs to be generated
      for each round, plus one additional round key that is applied before the rounds begin. The
      number of rounds is dependent on key size. Since a 128 bit key is being tested there will
      be 10 rounds, and therefore 11 round keys.

      Cell Optimized Implementation: Key expansion was difficult to optimize due to data
      dependencies. The key provided by the user is used to generate the next 128 bit round
      key, that round key is used to generate the one after it, and so on. Because each 128 bit
      round key is based on the previous one, round keys cannot be generated in parallel. In
      addition, each 32 bits of a single round key depends on the previous 32 bits of that round
      key, so the parts of a single round key cannot be generated in parallel either. This makes
      it very difficult to take advantage of SIMD operations. After much agony I came up with an
      interesting solution. If SIMD vector operations cannot be used to expand a single key, the
      only effective option is to expand four keys at a time! This technique would not be very
      useful if a single key were used to encrypt many blocks, as symmetric ciphers are typically
      used, but it is a perfect fit for a key search since a key expansion is performed for every
      encrypt operation.
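
The four keys at a time idea can be sketched in portable C. This is an emulation under stated
assumptions and not the project's SPE code: the function names and the w[word][lane] layout are
hypothetical, the inner loop over lanes stands in for one 4 x 32 bit SIMD operation, and the
s-box is computed from its definition in [1] only to keep the example self-contained.

```c
#include <stdint.h>

#define LANES 4    /* four independent keys, one per 32 bit lane */
#define WORDS 44   /* 11 round keys x 4 words for a 128 bit key */

static uint8_t sbox[256];

/* Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0));
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the s-box from its definition: field inverse, then the affine transform. */
static void build_sbox(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;   /* 0 has no inverse and is mapped to 0 */
        for (int y = 1; y < 256; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) { inv = (uint8_t)y; break; }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* RotWord followed by SubWord on a big-endian 32 bit word. */
static uint32_t sub_rot_word(uint32_t w)
{
    w = (w << 8) | (w >> 24);
    return ((uint32_t)sbox[w >> 24] << 24) | ((uint32_t)sbox[(w >> 16) & 0xFF] << 16) |
           ((uint32_t)sbox[(w >> 8) & 0xFF] << 8) | (uint32_t)sbox[w & 0xFF];
}

/* Expand four 128 bit keys in lockstep. w[i][lane] is word i of that lane's key
 * schedule; the caller fills w[0..3] with the four keys. Word i still has to wait
 * for word i-1, but the four lanes at each i are independent of one another. */
void expand_four_keys(uint32_t w[WORDS][LANES])
{
    uint32_t rcon = 0x01000000u;
    build_sbox();
    for (int i = 4; i < WORDS; i++) {
        for (int lane = 0; lane < LANES; lane++) {   /* the "SIMD" dimension */
            uint32_t t = w[i - 1][lane];
            if (i % 4 == 0) t = sub_rot_word(t) ^ rcon;
            w[i][lane] = w[i - 4][lane] ^ t;
        }
        if (i % 4 == 0) rcon = (uint32_t)gf_mul((uint8_t)(rcon >> 24), 2) << 24;
    }
}
```

The serial chain along i is untouched; the parallelism comes entirely from processing
neighbouring candidate keys side by side, which a key search has in unlimited supply.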








AES Implementation – x86
There are many x86 optimized AES implementations available. For the comparison I chose the
Rijndael Optimized C Code version 3.0 [5]. This implementation was chosen because it is in the
public domain, performs well, and is frequently referenced and cited in comparisons. The
reference implementation of AES could have been used for the comparison, but this would have
heavily favored the Cell.

Test Configuration
The goal of this comparison was to compare the Playstation 3/Cell to commodity PC hardware.
Test environments were set up as closely to each other as possible given the vast differences in
architecture between the platforms.

Environment

       Commodity PC System:

               Hardware:
                    Athlon 64 x2 3800+ CPU
                    2 GB Corsair PC 3200 RAM
                    Abit AV8 Motherboard

               Software:
                     Fedora Core 6 Linux
                     GCC 4.1.2 Compiler

       Cell System:

               Hardware:
                    Sony Playstation 3

               Software:
                     Fedora Core 6 Linux
                     IBM Cell SDK 2.1
                     GCC 4.1.2 Compiler

Test Parameters
Each version of the software was configured to search the range of keys from
0x00000000000000000000000000000000 to 0x0000000000000000000000003C000000. This is
1,006,632,960 keys. Yes, over a billion. The Cell version of the program breaks the keyspace up
into six chunks, one for each SPE. The x86 version breaks the keyspace up into two chunks, one
for each core. Threads are spawned to process the chunks in parallel until the keyspace has been
searched. Arbitrary data was chosen for the plaintext to encrypt. An arbitrary key outside the
search range was chosen to ensure that the entire range was searched.

Results

       x86:
       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
       0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0
       0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.

       Successfully executed in 133.00 seconds.

       Keyspace Searched



       Cell:
       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0
       0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x0 0x0
       0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x14 0x0
       0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x1e 0x0
       0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x28 0x0
       0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0 0x0 0x0.

       Worker started to search from 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x32 0x0
       0x0 0x0 to 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3c 0x0 0x0 0x0.

       Successfully executed in 47.00 seconds.

       Keyspace Searched




Analysis
The Cell optimized AES attack searched the key range in 35% of the time that the x86 version
took (47 seconds versus 133 seconds). The Cell optimized version was nearly three times as fast!
Performance per dollar is also a factor. A 60 GB Playstation 3 and a PC with similar
specifications to the test system cost roughly the same (around $500). This allows the
Playstation 3 to keep its nearly 3:1 performance advantage when cost is considered. Depending on
the application it may not be necessary to purchase hardware to harness the power of the
Playstation 3. As the Folding @ Home team found, there are plenty of idle CPU cycles that can be
utilized. Folding @ Home gets users to donate CPU cycles to a good cause. Perhaps a generic
massively parallel computing network of Playstation 3s would work? Businesses and scientists
could lease time on the network, and users with idle hardware could get paid to put the hardware
they already own to use. The Cell processor is a truly disruptive technology. Its power is just
beginning to be recognized. Hopefully this paper provided a good introduction to the Cell
processor, exposed its potential for compute intensive applications, and provided insight into
programming the Cell.
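
The throughput behind these figures can be worked out with a few lines of C. The function names
below are hypothetical, and the extrapolation assumes the measured rate stays constant over an
entire 128 bit keyspace, which is a simplification made only to give a sense of scale.

```c
#include <stdint.h>

/* Keys tested per second for a run that covered `keys` keys in `seconds` seconds. */
double keys_per_second(uint64_t keys, double seconds)
{
    return (double)keys / seconds;
}

/* Years needed to try every key of a `bits` bit keyspace at a constant rate.
 * The power of two is built by repeated doubling to avoid depending on libm. */
double years_to_exhaust(int bits, double rate)
{
    double keys = 1.0;
    for (int i = 0; i < bits; i++) {
        keys *= 2.0;
    }
    return keys / rate / (365.25 * 24.0 * 3600.0);
}
```

With the measured times, 1,006,632,960 keys in 47 seconds is roughly 21.4 million keys per
second on the Playstation 3, against roughly 7.6 million keys per second for the x86 system in
133 seconds. At the Playstation 3 rate, exhausting a 128 bit keyspace would still take on the
order of 10^23 years, which is why this attack is a benchmark and not a threat to AES.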


Acknowledgements
I’d like to thank a handful of people/resources that provided assistance and guidance with this
paper/project.

       Dr. Barun Chandra (Project Advisor)

       Neil Costigan (Editor, provider of priceless advice)

       IBM (The Cell programming resources they put in the public domain are amazing).

       http://ps2dev.org/ (One of the few places to find Cell programmers)

       My wife Michele (Put up with me working on this non-stop)


References

[1]    National Institute of Standards and Technology, Federal Information Processing
       Standards Publication 197: Advanced Encryption Standard, 2001.

[2]    IBM, Cell Broadband Engine Programming Handbook Version 1.1, 2007.

[3]    IBM, Cell Broadband Engine Programming Tutorial Version 2.1, 2007.

[4]    P. Barreto, V. Rijmen, "Reference ANSI C code," 2002 March (Version 2.2),
       Available HTTP:
       http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndaelref.zip.

[5]    V. Rijmen, A. Bosselaers, P. Barreto, "Optimized ANSI C code for the Rijndael
       cipher (now AES)," 2000 December (Version 3.0), Available HTTP:
       http://www.iaik.tugraz.at/research/krypto/AES/old/~rijmen/rijndael/rijndael-fst-3.0.zip.

[6]    V. Pande, Stanford University, "Folding@Home Distributed Computing," 2000-2006.
       [Online]. Available: http://folding.stanford.edu/. [Accessed August 29 2007].

[7]    J. Daemen, V. Rijmen, The Design of Rijndael. Springer-Verlag, 2002.

[8]    J. Kahle et al., "Introduction to the Cell Multiprocessor," IBM J. Research and
       Development, Sept. 2005.

[9]    C.R. Johns, D.A. Brokenshire, "Introduction to the Cell Broadband Engine
       Architecture," IBM J. Research and Development, Sept. 2007.

[10]   IBM, "IBM BladeCenter QS20 blade with new Cell BE processor offers unique
       capabilities for graphic-intensive, numeric applications," September 2006,
       http://www-306.ibm.com/common/ssi/rep_ca/7/897/ENUS106-677/index.html.

[11]   Mercury Computer Systems, "Cell Broadband Engine (BE) Processor Solutions," 2007,
       http://www.mc.com/microsites/cell.

[12]   IBM, "IBM to Build World's First Cell Broadband Engine Based Supercomputer,"
       http://www-03.ibm.com/press/us/en/pressrelease/20210.wss.

[13]   S. Siewert, "The Cell Broadband Engine chip: High-speed offload for the masses,"
       April 2007,
       http://www.ibm.com/developerworks/linux/library/pa-soc12/index.html?ca=drs-.

[14]   IBM, C/C++ Language Extensions for Cell Broadband Engine Architecture Version 2.4,
       2007.



