Towards FPGA Architectures Optimized For Cryptographic Algorithms


                  Nazar Abbas Saqib



        Supervisor: Dr. Arturo Díaz-Pérez

 Computer Science Section, Electrical Engineering Department.
   Centro de Investigación y de Estudios Avanzados del IPN
   Av. Instituto Politécnico Nacional No. 2508, México D.F.
                                                                Cinvestav-IPN
Table of Contents
   Antecedents
   Motivation
   General and Specific Objectives
   State of the art
   Results
   Publications
   Future Work
   Conclusions
Antecedents
Cryptographic algorithms can be implemented through

      Software
      ASIC
      FPGAs

   Choice of platform depends upon

      Algorithm performance
      Cost
      Flexibility
Antecedents (continued)

 Platform   Advantages                                   Drawbacks
 Software   Most flexible, low cost                      Low performance
 ASIC       High performance                             No flexibility at all, high cost
 FPGAs      Most flexible, low cost, high performance
  Motivation

 FPGAs: potential features
 Cryptographic algorithms: basic functions
FPGA: Field Programmable Gate Arrays
Configurable Logic Block

[Figure: a configurable logic block shown in two modes. Logic Mode: two 4-input combinational logic blocks, each followed by a 1-bit register. Memory Mode: two 16x1 RAMs with 4-bit addresses, each followed by a 1-bit register.]
Virtex-II Pro

Feature/Product                             XC2VP2  XC2VP4  XC2VP7  XC2VP20  XC2VP30   XC2VP40   XC2VP50   XC2VP70   XC2VP100   XC2VP125
EasyPath cost reduction                     -       -       -       -        XCE2VP30  XCE2VP40  XCE2VP50  XCE2VP70  XCE2VP100  XCE2VP125
Logic Cells                                 3,168   6,768   11,088  20,880   30,816    43,632    53,136    74,448    99,216     125,136
Slices                                      1,408   3,008   4,928   9,280    13,696    19,392    23,616    33,088    44,096     55,616
BRAM (Kbits)                                216     504     792     1,584    2,448     3,456     4,176     5,904     7,992      10,008
18x18 Multipliers                           12      28      44      88       136       192       232       328       444        556
Digital Clock Management Blocks             4       4       4       8        8         8         8         8         12         12
Config (Mbits)                              1.31    3.01    4.49    8.21     11.36     15.56     19.02     25.6      33.65      42.78
PowerPC Processors                          0       1       1       2        2         2         2         2         2          4
Max Available Multi-Gigabit Transceivers*   4       4       8       8        8         12*       16*       20        20*        24*
Max Available User I/O*                     204     348     396     564      644       804       852       996       1164       1200

1 Logic Cell = (1) 4-input LUT + (1) FF + (1) Carry Logic
1 CLB = (4) Slices
Source: http://www.xilinx.com/products/tables/fpga.htm#v2p
Cryptographic algorithms on FPGAs
Cryptographic algorithms contain:
   Simple logical operations at the bit level
   Replicated blocks
   Long block lengths
 They can benefit from FPGAs because:
   FPGAs operate natively at the bit level
   Blocks can simply be replicated
   Parallelism is possible (high number of I/Os)
   More physical security
   Flexibility
   High density
  Objectives
General
  To achieve optimized implementations for
  cryptographic algorithms

Specific Objectives
 DES: Data Encryption Standard
 AES: Advanced Encryption Standard
 ECC: Elliptic Curve Cryptography
AES: Advanced Encryption Standard

[Figure: the AES block takes a 128-bit plain text and a 128-bit key and produces a 128-bit cipher text.]

AES Processes
 Key Scheduling
 Encryption
 Decryption
AES: Advanced Encryption Standard

Input = 128 bits = 16 bytes b_0, b_1, ..., b_15, arranged as a 4x4 matrix:

\[
\begin{pmatrix}
b_0 & b_1 & b_2 & b_3 \\
b_4 & b_5 & b_6 & b_7 \\
b_8 & b_9 & b_{10} & b_{11} \\
b_{12} & b_{13} & b_{14} & b_{15}
\end{pmatrix}
\]
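Key Scheduling

The round keys are 4x4 byte matrices, from Round Key 0 to Round Key 10:

\[
\underbrace{\begin{pmatrix}
k_0 & k_1 & k_2 & k_3 \\
k_4 & k_5 & k_6 & k_7 \\
k_8 & k_9 & k_{10} & k_{11} \\
k_{12} & k_{13} & k_{14} & k_{15}
\end{pmatrix}}_{\text{Round Key 0}}
\quad
\underbrace{\begin{pmatrix}
k_{16} & k_{17} & k_{18} & k_{19} \\
k_{20} & k_{21} & k_{22} & k_{23} \\
k_{24} & k_{25} & k_{26} & k_{27} \\
k_{28} & k_{29} & k_{30} & k_{31}
\end{pmatrix}}_{\text{Round Key 1}}
\quad \cdots \quad
\underbrace{\begin{pmatrix}
k_{160} & k_{161} & k_{162} & k_{163} \\
k_{164} & k_{165} & k_{166} & k_{167} \\
k_{168} & k_{169} & k_{170} & k_{171} \\
k_{172} & k_{173} & k_{174} & k_{175}
\end{pmatrix}}_{\text{Round Key 10}}
\]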
AES Encryption Algorithm Flow

[Figure: IN -> ARK (user key) -> { BS -> SR -> MC -> ARK (sub key) }, repeated for the first ROUND-1 rounds -> BS -> SR -> ARK (sub key) -> OUT]

 BS:   Byte Substitution
 SR:   Shift Rows
 MC:   Mix Column
 ARK:  Add Round Key
Byte Substitution

Each byte a_{i,j} of the state matrix is passed through the 16x16 S-BOX to give b_{i,j}:

\[
b_{i,j} = \text{S-BOX}(a_{i,j}), \qquad i, j = 0, 1, 2, 3
\]
ShiftRow (SR)

 Row   Offset   Before           After
 0     0        a  b  c  d       a  b  c  d
 1     1        e  f  g  h       f  g  h  e
 2     2        i  j  k  l       k  l  i  j
 3     3        m  n  o  p       p  m  n  o
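
The ShiftRow table above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `shift_rows` and the list-of-rows state layout are assumptions made for this example, not part of the presented design.

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state to the left by r positions (offsets 0..3)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


# Same letters as on the slide, one list per row.
state = [list("abcd"), list("efgh"), list("ijkl"), list("mnop")]
shifted = shift_rows(state)

for row in shifted:
    print(" ".join(row))
```

In hardware this step costs no logic at all, since it is just a fixed rewiring of the byte positions.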
MixColumn(MC) & Inv MixColumn(IMC)

MC:
\[
\begin{pmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{pmatrix}
=
\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}
\begin{pmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{pmatrix},
\qquad i = 0, 1, 2, 3
\]

IMC:
\[
\begin{pmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{pmatrix}
=
\begin{pmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{pmatrix}
\begin{pmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{pmatrix},
\qquad i = 0, 1, 2, 3
\]

      **Every entry is represented in GF(2^8)
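
As a rough illustration of the MC matrix product, the sketch below applies it to a single column using only the `xtime` operation (multiplication by 02). The reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is the standard AES one and is assumed here, since the slides do not state it; the function names and the sample column are likewise chosen for this example only.

```python
def xtime(x):
    """Multiply a byte by 02 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11B
    return x


def mix_column(col):
    """Apply the MC matrix to one column; 03*x is computed as xtime(x) ^ x."""
    c0, c1, c2, c3 = col
    return [
        xtime(c0) ^ (xtime(c1) ^ c1) ^ c2 ^ c3,
        c0 ^ xtime(c1) ^ (xtime(c2) ^ c2) ^ c3,
        c0 ^ c1 ^ xtime(c2) ^ (xtime(c3) ^ c3),
        (xtime(c0) ^ c0) ^ c1 ^ c2 ^ xtime(c3),
    ]


column = [0xDB, 0x13, 0x53, 0x45]  # sample column, chosen for illustration
mixed = mix_column(column)
print([f"{b:02x}" for b in mixed])
```

Each output byte needs only XORs and at most two `xtime` calls, which is why MC maps cheaply onto LUT-based logic.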
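AddRoundKey(ARK)

The state b is combined with the round key k by a bytewise XOR to give the new state a:

\[
\begin{pmatrix}
b_{0,0} & b_{0,1} & b_{0,2} & b_{0,3} \\
b_{1,0} & b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,0} & b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,0} & b_{3,1} & b_{3,2} & b_{3,3}
\end{pmatrix}
\oplus
\begin{pmatrix}
k_{0,0} & k_{0,1} & k_{0,2} & k_{0,3} \\
k_{1,0} & k_{1,1} & k_{1,2} & k_{1,3} \\
k_{2,0} & k_{2,1} & k_{2,2} & k_{2,3} \\
k_{3,0} & k_{3,1} & k_{3,2} & k_{3,3}
\end{pmatrix}
=
\begin{pmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & a_{0,3} \\
a_{1,0} & a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,0} & a_{2,1} & a_{2,2} & a_{2,3} \\
a_{3,0} & a_{3,1} & a_{3,2} & a_{3,3}
\end{pmatrix}
\]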
           Our Contributions
   Design 1: Encryptor Core
       Sequential vs. Pipelined Architecture
   Design 2: Encryptor/Decryptor Core
       MixColumn & Inv. MixColumn modified
   Design 3: Encryptor/Decryptor Core
       S-Box & Inv. S-Box
          Our Contributions



   Design 1: Encryptor Core
       Sequential vs. Pipelined Architecture
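AES Algorithm Implementation: Sequential Approach

[Figure: sequential data path. PLAIN TEXT and USER-KEY enter RND 0; a selector (S) feeds RND 1-9, which uses the ROUND-KEY and loops through a clocked LATCH; RND 10 applies the last ROUND-KEY and outputs the CIPHER TEXT.]

[Figure: sequential key generation. A selector (S) chooses between the USER KEY and the latched value; KGEN, fed with RCON, produces the ROUND KEY through a clocked LATCH.]

AES Algorithm Implementation: Pipelined Approach

[Figure: pipelined data path. The USER-KEY passes through an input register and a chain of KGEN blocks producing RK 0 to RK 10. The data passes through IN REG and then through the stages RND 0 to RND 10, each using the corresponding round key, before leaving at OUT.]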
            Our Contributions



   Design 2: Encryptor/Decryptor Core
       MixColumn & Inv. MixColumn Modified
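BS and Inverse BS

[Figure: S-BOX = MI followed by AF; INV S-BOX = IAF followed by MI.]

[Figure: merged data path. An E/D select signal routes IN either directly (encryption) or through IAF (decryption) into a shared MI block; the MI output is either passed through AF (S-BOX) or taken directly (INV S-BOX).]

 MI:   Multiplicative Inverse
 AF:   Affine Transformation
 IAF:  Inverse Affine Transformation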
MixColumn(MC) & Inv MixColumn(IMC) Revisited

MC:
\[
\begin{pmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{pmatrix}
=
\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}
\begin{pmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{pmatrix}
\]

IMC:
\[
\begin{pmatrix} c'_{0,i} \\ c'_{1,i} \\ c'_{2,i} \\ c'_{3,i} \end{pmatrix}
=
\begin{pmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{pmatrix}
\begin{pmatrix} c_{0,i} \\ c_{1,i} \\ c_{2,i} \\ c_{3,i} \end{pmatrix}
\]

        **Every entry is represented in GF(2^8)
MixColumn(MC) & Inv MixColumn(IMC) Cont…

For MC, the biggest coefficient is 03:

\[
03 \cdot x = \mathrm{xtime}(x) \oplus x, \qquad \text{where } 02 \cdot x = \mathrm{xtime}(x)
\]

For IMC, the coefficients are larger, for example 0D:

\[
0D \cdot x = \mathrm{xtime}(\mathrm{xtime}(\mathrm{xtime}(x))) \oplus \mathrm{xtime}(\mathrm{xtime}(x)) \oplus x
\]

 The coefficients for IMC have a higher Hamming weight.
 IMC is therefore a costlier operation than MC.
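MixColumn(MC) & Inv MixColumn(IMC) Cont…

We observe that the IMC matrix factors as

\[
\begin{pmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{pmatrix}
=
\underbrace{\begin{pmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{pmatrix}}_{(1)}
\underbrace{\begin{pmatrix}
05 & 00 & 04 & 00 \\
00 & 05 & 00 & 04 \\
04 & 00 & 05 & 00 \\
00 & 04 & 00 & 05
\end{pmatrix}}_{(2)}
\]

 The biggest coefficient for Eq. 2 is 05:

\[
05 \cdot x = \mathrm{xtime}(\mathrm{xtime}(x)) \oplus x
\]

Eq. 1 is the MC matrix, which is already available. For IMC, the Eq. 2 computation is performed first and its result is then fed to Eq. 1.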
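
The factorization of the IMC matrix into the MC matrix (Eq. 1) and the sparse matrix of Eq. 2 can be checked numerically. The sketch below is a plain software check, not the hardware data path; it assumes the standard AES reduction polynomial 0x11B, and the helper names `xtime`, `gf_mul` and `mat_mul` are chosen for this example.

```python
def xtime(x):
    """Multiply a byte by 02 in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11B
    return x


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add built on xtime."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def mat_mul(x, y):
    """Multiply two square matrices whose entries live in GF(2^8)."""
    n = len(x)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                out[i][j] ^= gf_mul(x[i][k], y[k][j])
    return out


MC = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]    # Eq. 1
M2 = [[5, 0, 4, 0], [0, 5, 0, 4], [4, 0, 5, 0], [0, 4, 0, 5]]    # Eq. 2
IMC = [[0x0E, 0x0B, 0x0D, 0x09], [0x09, 0x0E, 0x0B, 0x0D],
       [0x0D, 0x09, 0x0E, 0x0B], [0x0B, 0x0D, 0x09, 0x0E]]

product = mat_mul(MC, M2)
print(product == IMC)
```

Because the Eq. 2 matrix has only two non-zero entries per row (04 and 05), the extra logic needed for decryption is small compared with a direct IMC implementation.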
Data Path for Encryption/Decryption

Encryption: MI + AF + SR + MC + ARK
Decryption: ISR + IAF + MI + ModM + MC + ARK
            Our Contributions


   Design 3: Encryptor/Decryptor Core
       S-Box & Inv. S-Box
Byte Substitution (Revisited)

[Figure: IN feeds a shared MI block. S-BOX path: MI followed by AF. INV S-BOX path: IAF followed by MI.]

[Figure: each byte a_{i,j} of the state matrix is mapped through the 16x16 S-BOX to b_{i,j}, for i, j = 0, 1, 2, 3.]
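MI: 1st Approach

[Figure: E/D-controlled data path around a shared MI block. Encryption: IN -> MI -> AF -> SR -> MC -> ARK -> OUT. Decryption: IN -> ISR -> IAF -> MI -> IMC -> IARK -> OUT.]

        MI with Lookup Table
        Same S-Box (MI) for encryption/decryption
            Memory requirements are halved
        BRAMs are used for storing MI values.
            No initial time is needed to prepare them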
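
The idea of sharing one MI table between encryption and decryption can be sketched in software as follows. This is a minimal illustration, not the BRAM-based design itself: the affine constants (0x63, 0x05) and the reduction polynomial 0x11B are the standard AES ones and are assumed here, and all function and table names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate a byte left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def affine(x):
    """AF: the affine transformation applied after MI in the S-Box."""
    return x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3) ^ rotl8(x, 4) ^ 0x63


def inv_affine(x):
    """IAF: the inverse affine transformation applied before MI."""
    return rotl8(x, 1) ^ rotl8(x, 3) ^ rotl8(x, 6) ^ 0x05


# One shared 256-entry table of multiplicative inverses (0 maps to 0).
MI = [0] * 256
for a in range(1, 256):
    for b in range(1, 256):
        if gf_mul(a, b) == 1:
            MI[a] = b
            break


def sub_byte(x):
    """S-Box: MI followed by AF."""
    return affine(MI[x])


def inv_sub_byte(x):
    """Inverse S-Box: IAF followed by MI."""
    return MI[inv_affine(x)]


print(hex(sub_byte(0x53)), hex(inv_sub_byte(sub_byte(0x53))))
```

Only the `MI` table needs memory; AF and IAF are small XOR networks, which is where the halving of the memory requirement comes from.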
MI: 2nd Approach

MI Three-Stage Strategy (S. Morioka and A. Satoh, CHES 2002):

 Stage 1: first transformation (M), from GF(2^8) to field F
 Stage 2: MI manipulation in GF(2^4)
 Stage 3: second transformation (M^-1), from field F to GF(2^8)

MI with Composite Fields GF((2^2)^2) & GF((2^4)^2)
1. Map the element A ∈ GF(2^8) to a composite field F
2. Compute the multiplicative inverse over the field F
3. Map back from field F to GF(2^8)
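MI Implementation

Let A ∈ F with A = A_H y + A_L. Then it can be shown that:

\[
A^{16} = A_H y + (A_H + A_L)
\]
\[
A^{17} = A^{16} \cdot A = 0 \cdot y + \left( \lambda A_H^{2} + (A_H + A_L) A_L \right)
\]

where λ denotes the constant term of the field polynomial y^2 + y + λ that defines F.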
AES Algorithm Implementations



        Results
   Metrics to measure:

   1. Throughput = (Clock frequency x No. of bits) / No. of rounds

   2. FPGA resources used
        CLB slices
        BRAMs
        etc.
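
A quick worked instance of the throughput formula, as a sketch only. The clock frequency below is a made-up value for illustration, not a measured result from this work, and the function name is hypothetical. It assumes a sequential design that spends one clock cycle per round.

```python
def throughput_mbits(clock_mhz, block_bits, rounds):
    """Throughput in Mbits/s = clock frequency (MHz) * bits per block / rounds."""
    return clock_mhz * block_bits / rounds


# Hypothetical figures: 100 MHz clock, 128-bit AES block, 10 rounds.
example = throughput_mbits(100.0, 128, 10)
print(example)
```

For a fully pipelined design that accepts a new block every cycle, the same formula applies with the divisor set to 1.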
Sequential vs. Pipelined Design

                      Sequential Design
                      Device   Area           Throughput   Throughput/Area
                      (XCV)    (CLB slices)   (Mbits/s)
Gaj et al. [1]        1000     2902           331.5        0.11
Dandalis et al. [2]   1000     5673           353          0.06
Nazar et al.          812      2744           258.5        0.09

                      Pipelined Design
                      Device   Area           Throughput   Throughput/Area
                      (XCV)    (CLB slices)   (Mbits/s)
Elbirt et al. [3]     1000     9004           1940         0.22
Nazar et al.          2600     2136           2868         1.29
       MixColumn vs. Inv MixColumn
                 Device      BRAMs   CLB slices (S)   Throughput (T)   T/S
                                                      (Mbits/s)
McLoone et al.   XCV3200E    102     7576             3239             0.43
This design      XCV2600E    80      5677             4121             0.73

               Two approaches for MC/IMC
               Fewer BRAMs
               Fewer slices
               Highest throughput reported to date
       S-Box vs. Inv S-Box
                 Device      BRAMs      CLB slices (S)   Throughput (T)   T/S
                                                         (Mbits/s)
McLoone et al.   XCV3200E    102        7576             3239             0.43
E/D GF(2^8)      XCV2600E    80         6676             3840             0.58
E/D GF(2^4)      XCV2600E    No BRAMs   13416            3136             0.24

    Two approaches for MI
    Key scheduling included
    No initial delay
    The first design uses a look-up table for MI: fast, but with high memory requirements.
    The second design uses the composite field approach for MI: slower, with lower memory requirements.
    Both are efficient compared to the reported design.
 Our Contributions



Elliptic Curve Cryptography
Elliptic Curve Cryptography

[Figure: hierarchy of ECC operations, from top to bottom]
  Elliptic curve operations:   Scalar multiplication Q = kP
                               Point doubling Q = 2P, point addition R = P + Q
  GF(2^m) arithmetic:          Multiplication, squaring, addition, etc.
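
To show how the top layer rests on the one below it, the sketch below computes Q = kP from point doublings and point additions using the textbook double-and-add method. The slides do not say which scalar multiplication algorithm is intended, so this is an assumption. The group operations are passed in as functions, and the demonstration uses ordinary integers under addition as a toy stand-in for curve points.

```python
def scalar_mult(k, point, add, double, identity):
    """Left-to-right double-and-add: returns k*point for an integer k >= 0."""
    result = identity
    for bit in bin(k)[2:]:
        result = double(result)          # one doubling per bit of k
        if bit == "1":
            result = add(result, point)  # one addition per set bit of k
    return result


# Toy group for illustration only: integers under addition, so k*P is k times P.
q = scalar_mult(13, 5, add=lambda a, b: a + b, double=lambda a: 2 * a, identity=0)
print(q)
```

With real curve points, `add` and `double` would be the point addition and point doubling operations, which in turn are built from GF(2^m) multiplications, squarings and additions.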
GF(2^191) Arithmetic: Squaring

\[
A = a_3 x^3 + a_2 x^2 + a_1 x + a_0
\]
\[
A^2 = a_3 x^6 + a_2 x^4 + a_1 x^2 + a_0
\]

            A   = 1111
            A^2 = 1010101
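
Squaring over GF(2) therefore amounts to inserting a zero between consecutive coefficient bits. A small sketch of that bit spreading is shown below, using Python integers as coefficient vectors. The function name is an assumption, and the modular reduction step that must follow in GF(2^191) is not shown.

```python
def gf2_square(a):
    """Square a GF(2) polynomial: bit i of the input moves to bit 2*i."""
    result = 0
    i = 0
    while a:
        if a & 1:
            result |= 1 << (2 * i)
        a >>= 1
        i += 1
    return result


squared = gf2_square(0b1111)
print(bin(squared))
```

In hardware this is pure wiring, so the cost of a squaring is dominated by the reduction that follows it.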
GF(2^191) Arithmetic: Reduction

Karatsuba Multiplier GF(2^191)

\[
A = \sum_{i=0}^{m-1} a_i x^i
  = \sum_{i=\frac{m}{2}}^{m-1} a_i x^i + \sum_{i=0}^{\frac{m}{2}-1} a_i x^i
  = x^{\frac{m}{2}} \sum_{i=0}^{\frac{m}{2}-1} a_{i+\frac{m}{2}} x^i + \sum_{i=0}^{\frac{m}{2}-1} a_i x^i
  = x^{\frac{m}{2}} A_H + A_L
\]
\[
B = \sum_{i=0}^{m-1} b_i x^i
  = \sum_{i=\frac{m}{2}}^{m-1} b_i x^i + \sum_{i=0}^{\frac{m}{2}-1} b_i x^i
  = x^{\frac{m}{2}} \sum_{i=0}^{\frac{m}{2}-1} b_{i+\frac{m}{2}} x^i + \sum_{i=0}^{\frac{m}{2}-1} b_i x^i
  = x^{\frac{m}{2}} B_H + B_L
\]

Then the polynomial multiplication of A and B is:

\[
C = x^{m} A_H B_H + (A_H B_L + A_L B_H)\, x^{\frac{m}{2}} + A_L B_L
\]

The idea of the Karatsuba algorithm is that the above product can be written as:

\[
C = x^{m} A_H B_H + \left( A_H B_H + A_L B_L + (A_H + A_L)(B_H + B_L) \right) x^{\frac{m}{2}} + A_L B_L
  = x^{m} C_H + C_L
\]
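
The recursion above can be sketched in software with Python integers as GF(2) coefficient vectors. This is an illustration under stated assumptions, not the hardware multiplier: operands are padded to a power-of-two length (256 bits for 191-bit inputs), the recursion stops at 8-bit blocks, and the names `clmul` and `karatsuba` are chosen for this example. The result is the unreduced product; reduction modulo the field polynomial is a separate step.

```python
def clmul(a, b):
    """Schoolbook carry-less multiplication of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba(a, b, m):
    """Multiply two GF(2) polynomials of degree < m, with m a power of two."""
    if m <= 8:
        return clmul(a, b)
    half = m // 2
    mask = (1 << half) - 1
    a_l, a_h = a & mask, a >> half
    b_l, b_h = b & mask, b >> half
    hh = karatsuba(a_h, b_h, half)               # A_H * B_H
    ll = karatsuba(a_l, b_l, half)               # A_L * B_L
    mid = karatsuba(a_h ^ a_l, b_h ^ b_l, half)  # (A_H + A_L)(B_H + B_L)
    return (hh << m) ^ ((hh ^ ll ^ mid) << half) ^ ll


# Two arbitrary 191-bit operands, chosen only for illustration.
a = int("5A" * 24, 16) & ((1 << 191) - 1)
b = int("C3" * 24, 16) & ((1 << 191) - 1)
product = karatsuba(a, b, 256)
print(product == clmul(a, b))
```

Each level replaces four half-size multiplications with three, at the price of a few extra XORs.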
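Point addition GF(2^191)

Hessian Form

\[
P_3(x_3, y_3, z_3) = P_1(x_1, y_1, z_1) + P_2(x_2, y_2, z_2)
\]
\[
\lambda_1 = y_1 x_2, \qquad \lambda_2 = x_1 y_2, \qquad \lambda_3 = x_1 z_2
\]
\[
\lambda_4 = z_1 x_2, \qquad \lambda_5 = z_1 y_2, \qquad \lambda_6 = z_2 y_1
\]
\[
p_1 = \lambda_1 \lambda_6, \qquad p_2 = \lambda_2 \lambda_3, \qquad p_3 = \lambda_5 \lambda_4
\]
\[
q_1 = \lambda_2 \lambda_5, \qquad q_2 = \lambda_1 \lambda_4, \qquad q_3 = \lambda_6 \lambda_3
\]
\[
x_3 = p_1 + q_1, \qquad y_3 = p_2 + q_2, \qquad z_3 = p_3 + q_3
\]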
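Point doubling GF(2^191)

Hessian Form

\[
P_2(x_2, y_2, z_2) = P_1(x_1, y_1, z_1) + P_1(x_1, y_1, z_1)
\]
\[
\lambda_1 = x_1^2, \qquad \lambda_2 = y_1^2, \qquad \lambda_3 = z_1^2
\]
\[
\lambda_4 = x_1 \lambda_1, \qquad \lambda_5 = y_1 \lambda_2, \qquad \lambda_6 = z_1 \lambda_3
\]
\[
\lambda_7 = \lambda_5 + \lambda_6, \qquad \lambda_8 = \lambda_6 + \lambda_4, \qquad \lambda_9 = \lambda_4 + \lambda_5
\]
\[
x_2 = y_1 \lambda_8, \qquad y_2 = x_1 \lambda_7, \qquad z_2 = z_1 \lambda_9
\]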
    Performance results
             Tool:   Xilinx Foundation F4.1i
             Device: XCV2600E

                                  No. of CLB slices   Timings (ns)
Karatsuba multiplier GF(2^191)    8721                43.123
Point addition GF(2^191)          9894                863.9
Point doubling GF(2^191)          8531                422.02

         For ECC scalar multiplication
  Maximum reported timing := 170 µs [Gerardo, CHES 2000]
  Estimated timing        := <100 µs
          Publications
1.   Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez,
     “Sequential and pipelined architectures for AES implementation,” Proceedings
     of the IASTED International Conference on Computer Science and Technology,
     pp. 159-163, May 19-21, 2003, Cancun, Mexico.
2.   F. Rodriguez-Henriquez, N. A. Saqib, and A. Diaz-Perez, “4.2 Gbit/s single-chip
     FPGA implementation of AES algorithm,” Electronics Letters, Vol. 39, No. 15,
     July 24, 2003.
3.   Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez, “Two
     Approaches for a Single-Chip FPGA Implementation of an Encryptor/Decryptor
     AES Core,” FPL 2003, Lecture Notes in Computer Science 2778, pp. 303-312,
     2003 (FPL 2003, Sep. 1-3, Lisbon, Portugal).
4.   Nazar A. Saqib, Francisco Rodriguez-Henriquez, and Arturo Diaz-Perez, “AES
     Algorithm Implementation: An efficient approach for Sequential and Pipeline
     architectures,” Fourth Mexican International Conference on Computer Science,
     ENC'03, pp. 126-130, Sep. 8-12, 2003, Tlaxcala, Mexico.
    Publications Planned

 RAW 2004: ECC implementations using an efficient methodology
 Microprocessors and Microsystems: efficient arithmetic for ECC
 Karatsuba multiplier for GF(2^191) and GF(2^193)
Future work

[Figure: the ECC operation hierarchy with its status]
     Elliptic curve operations:   Scalar multiplication Q = kP                       (pending)
                                  Point doubling Q = 2P, point addition R = P + Q    (done)
     GF(2^m) arithmetic:          Multiplication, squaring, addition, etc.           (done)

1. Elliptic Curve Operation: Scalar Multiplication
2. Thesis writing and defense
  Conclusions
 A promising AES encryptor/decryptor core (contributions for AES S-Box/Inv S-Box)
     Using a look-up table for the S-Box
     Using composite fields GF(2^4)

 An optimized AES encryptor/decryptor core (contributions for AES MC/IMC)
     Using a modified version of IMC

 Efficient arithmetic for ECC (squaring, multiplication, point addition, point doubling)

 Future work: completion of ECC scalar multiplication
 Thesis writing and defense

				