Docstoc

RSA Encryption and Decryption using the Redundant Number System

Document Sample
RSA Encryption and Decryption using the Redundant Number System Powered By Docstoc
					     RSA Encryption and Decryption using the Redundant Number System on the
                                     FPGA

                                    Koji Nakano, Kensuke Kawakami, and Koji Shigemoto
                                 Department of Information Engineering, Hiroshima University
                                       Kagamiyama 1-4-1, Higashi-Hiroshima, JAPAN


                                 Abstract                                                    redundant radix-64K number system that accelerates arith-
                                                                                             metic operations.
   The main contribution of this paper is to present effi-                                       The Montgomery modulo multiplication is used to speed
cient hardware algorithms for the modulo exponentiation                                      the modulo multiplication      ¡ ¡  Ê              ¾ ÑÓ
                                                                                                                                             Å for Ê-bit
È     ÑÓ   Å used in RSA encryption and decryption, and                                      numbers , , and Å . The idea of Montgomery mod-
implement them on the FPGA. The key ideas to acceler-                                        ulo multiplication is not to use direct modulo computa-
ate the modulo exponentiation are to use the Montgomery                                      tion, which is very costly in terms of the computing time
modulo multiplication on the redundant radix-64K num-                                        and hardware resources. By iterative computation of Mont-
ber system in the FPGA, and to use embedded        ¢ -bit          ½        ½                gomery modulo multiplication, the modulo exponentiation
multipliers and embedded 18k-bit block RAMs in effective                                     È     ÑÓ   Å can be computed, which is a key operation
way. Our hardware algorithms for the modulo exponen-                                         for RSA encryption and decryption [2, 10]. In our previ-
tiation for Ê-bit numbers È , , and Å can run in less                                        ous paper [6], we have presented an efficient implementa-
than Ê´¾ · µ´ ½ · ½µ
                 Ê            clock cycles and in expected                                   tion of the Montgomery modulo multiplication in an FPGA.
´½   Ê · µ´ ½ · ½µ
              Ê          clock cycles. We have implemented                                   The key ideas are to use 18-bit embedded multiplier in the
our modulo exponentiation hardware algorithms on Xilinx                                      FPGA and to perform the computation based on the redun-
VirtexII Pro family FPGA XC2VP30-6. The implementa-                                          dant radix-64K number system. The experimental results
tion results shows that our hardware algorithm for 1024-bit                                  in [6] show that 1024-bit Montgomery multiplication can
modulo exponentiation can be implemented to run in less                                      be performed in 1.54 × using 6824 slices and 129 multipli-
than 2.521ms and in expected 1.892ms.                                                        ers on a Xilinx Virtex II Family FPGA. However, this im-
                                                                                             plementation needs a lot of multipliers and has long critical
                                                                                             path through multipliers. We have improved this result us-
                                                                                             ing 18k-bit block RAMs as look up tables [11]. The experi-
1 Introduction                                                                               mental results in [11] show that 1024-bit Montgomery mul-
                                                                                             tiplication can be performed in 1.23 × using 7883 slices, 64
   It is well known that the addition of two Ò-bit numbers                                   multipliers, and 29 block RAMs.
can be done using a ripple carry adder with the cascade of                                      The main contribution of this paper is to present hard-
Ò full adders [4]. The ripple carry adder has a carry chain                                  ware algorithms for the modulo exponentiation used in
through all the Ò full adders. Thus, the delay time to com-                                  RSA encryption and decryption [10] based on our previous
plete the addition is proportional to Ò. The carry look-ahead                                work [6, 11]. Both hardware algorithms using redundant-
adder [4, 9] which computes the carry bits using the prefix                                   64K number system run in Ê                        ½ ·½
                                                                                                                                          clock cycles to
computation can reduce the depth of the circuit. Although                                    complete the Montgomery modulo multiplication           ¡ ¡
the delay time is Ç        ´ÐÓ µ
                          Ò , its constant factor is large and                               ¾ ÑÓ
                                                                                                Ê        Å for Ê-bit numbers , , and Å . We have
the circuit is much more complicated than the ripple carry                                   used these hardware algorithms to complete the modulo ex-
adder. Hence, it is not often to use the carry look-ahead                                              ÑÓ
                                                                                             ponentiation È          Å for Ê-bit numbers È , , and Å .
adder for actual implementations. On the other hand, re-                                     Our implementations for both hardware algorithms run in
dundant number systems can be used to accelerate addi-                                            ´¾ · µ´ ½ · ½µ
                                                                                             less than Ê          Ê          clock cycles and in expected
tion. Using redundant number systems, we can remove long                                     ´½ · µ´ ½ ·½µ
                                                                                                  Ê       Ê         clock cycles. Thus, the 1024-bit mod-
carry chains in the addition. The readers should refer to [9]                                ulo exponentiation can be done in less than 133380 clock
(Chapter 3) for comprehensive survey of redundant num-                                       cycles and in expected 100100 clock cycles. We have im-
ber systems. In our previous paper [6], we have presented                                    plemented our modulo exponentiation hardware algorithms




       Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
on Xilinx VirtexII Pro family FPGA XC2VP30-6. The im-                                       ¼¼¼¼¼¼µ and ´¼¼½½¼½, ½½¼¼¼¼, ¼¼¼¼¼¼, ¼¼¼¼¼¼µ are over-
plementation results show that our hardware algorithm for                                   flows, because their values are greater than ¾½   ½. We as-
1024-bit modulo exponentiation can be implemented to run                                    sume that, if the resulting value of an operation is a -digit
in 2.521ms and in expected 1.892ms.                                                         redundant radix-¾Ö number and it is greater than ¾ Ö   ½, it
    There are a lot of works that have been presented for                                   is not necessary for a circuit or a program performing the
the modulo exponentiation (i.e. RSA encryption and de-                                      operation to guarantee the correct result due to the overflow
cryption) using the Montgomery modulo multiplication.                                       error. Clearly, the redundant bits  ½ Ö                    ·½
                                                                                                                                            Ö of the most
Blum et al. [3] showed that 1024-bit modulo exponentiation                                  significant digit  ½ of a -digit redundant radix- Ö num-            ¾
can be done in 11.95ms using 6633 CLBs on XC40150XV-                                        ber    are not zero, then the value of      is overflow. Note
8. Amanor et al. [1] presented a hardware algorithm for                                     that can be overflow even if  ½ Ö               Ö is zero.·½
the Montgomery modulo multiplication and estimated the                                         In our previous paper [6], we have presented hardware
running time if it is used for 1024-bit modulo exponenti-                                   algorithms for various arithmetic operations for redundant
ation. From their estimation, it runs in expected 22.7ms                                             ¾
                                                                                            radix- Ö numbers. For the reader’s benefit, we will review
on XCV2000E-6. Thus, our implementation runs more                                           the hardware algorithms for arithmetic operations. The
than 10 times faster. Mazzeo et al. [7] have presented that                                 reader should refer to [6] for the details.
RSA encryption È     ÑÓ    Å for 1024-bit numbers È and
Å , and          ½¾ ·½  can be done in 2.99ms using 2,902                                   ¾º½                  Ø ÓÒ ÓÖ Ê ÙÒ                    ÒØ ÆÙÑ            Ö×
slices on XCV2000E. Garg and Vig [5] have shown RSA
encryption for the same instance in 0.167ms using 28,891
                                                                                               Let us see the computation of the sum of two redun-
slices on a Xilinx Virtex 4 family FPGA. In our imple-
                                                                                            dant numbers. For two 4-digit redundant radix- num-                ¾
mentation, the RSA encryption can be done in less than
´½ · µ´½¼¾ ½ ·½µ ½ ¿¼                                                                       bers      ´¼¼¼½¼½ ½½¼¼½½ ½½½½½½ ½¼½½½½µ         and
for    ¾ ·½½
                                clock cycles and in 0.027ms
                   . Consequently, our implementation runs
                                                                                            ´¼¼¼¼½½ ½¼½½½½ ¼½½½½½ ¼½¼¼¼½µ     , their sum                               ·
                                                                                            can be computed by the position sum as follows:
much faster than known implementations.
                                                                                                                  11             11               10
                                         Ö
2 Redundant Radix-¾ Numbers and Arith-                                                                          0101           0011             1111        1111
  metic Operations                                                                                                10             01               01
                                                                                                         +      0011           1111             1111        0001
                                                                                                              001101         010110           100001      010000
    In this paper, we use the following notation to repre-
sent the consecutive bits in a number. For a number , let                                   Clearly, the addition has no block carry. Let us see the ad-
          (        ) be consecutive bits from -th to -th bits,
                                           ¼                                                                                  ¾
                                                                                            dition of two -digit redundant radix- Ö numbers and .
where the least significant bits is -th bit. For example,
      ¾ ½½½¼¼         for           ½½½½¼¼¼¼
                                           .
                                                                                            The sum          ·     can be computed as follows:

                                      ¾
    A -digit redundant radix- Ö number is a sequence of                                                  ¼ Ö ½ ¼ · ¼ Ö ½ ¼
    ´ ·¾µ
    Ö È     -bit numbers      ´  ½,  ¾ , , ¼ . The value of   µ                               ¼
                                                                                                           ½ Ö · ½ Ö · Ö   ½ ¼ ·  ½ Ö · ½ Ö
             ½ ¡ Ö We call, for each with Ö
                      ¾                                          ·¾
    is
      ½¼
     Ö 
             ¼
                 and      Ö   ·½
                                                              bits,
                                 Ö , principal bits and redundant                                     · Ö   ½ ¼ ´½             µ
bits, respectively. For example,                         ´¼¼¼½¼½ ¼½¼¼½½
                                                         ,        ,
                                                                                            Hence, ¼ ¾Ö · ¾Ö ¾Ö·½ and               · ¾Ö · · ¾ Ö
½½½½½½ ½¼½½½½µ
         ,             is a 4-digit redundant radix- number,   ¾                            ¾Ö·¾ holds if Ö ¾. Thus, is a correct redundant radix-¾Ö
where underlined binary numbers are redundant bits. If
all the redundant bits of this redundant radix- number         ¾                            number.
                                                                                               Let us design a combinational circuit to compute the sum
are zero, it can be converted to the non-redundant radix-
¾                                                                                                        ·. Let                  ´¾ ¾ µ
                                                                                                                          Ö Ö denote an adder circuit
    number by just removing the redundant bits. Also, the
                                                                                                                                       ¾
                                                                                            that computes the sum of two -bit and two Ö-bit integers.
non-redundant numbers can be converted to the equivalent
                                                                    ¼¼                      Also, let              ´               µ
                                                                                                                         denote the resulting value of
redundant numbers by attaching redundant bits              to each
                                                                                                             ¾
                                                                                            the sum of -bit numbers and , and Ö-bit numbers
digit.
                                                                                                                                ¼ Ö ´¼ ¼    ¼ Ö        ½¼            ½ ¼µ
                                             È
                                                                                            and . Clearly, ¼
    From the definition, the value of a -digit redundant
      ¾
radix- Ö number
                                          ½ Ö·¾   ¡ Ö
                                                    ´¾            ½µ ¾                      and                    ½ Ö ´    Ö  ½ Ö ·½        Ö      Ö   ·½
    Ö  ½µ´¾Ö·¾  ½µ
                           is up to        ¼                                                ½¼      Ö         ½ ¼µ
                                                                                                              . Thus we have,
´¾
       ¾Ö  ½              ¾Ö . However, we assume that the valid
                          ¾       ½
value of is up to Ö   . If the value of is greater than                                     Lemma 1 The addition of two -digit redundant radix- Ö                           ¾
¾      ½
   Ö   , it is regarded as overflow. For example, 4-digit                                    numbers can be computed using adders             ÖÖ               ´¾ ¾              µ
                      ¾
redundant radix- numbers                  ´¼½¼¼¼¼ ¼¼¼¼¼¼ ¼¼¼¼¼¼
                                             ,          ,         ,                         without block carries, whenever Ö .                  ¾




      Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
¾º¾ ÅÙÐØ ÔÐ                      Ø ÓÒ Ó          Ê       ÙÒ        ÒØ ÆÙѹ                     each digit of the sum Ì can be computed as follows.
     Ö×
                                                                                                           ̼             ȼ Ö   ½ ¼ · ¼ Ö   ½ ¼
                                              ¿
   We show that the multiplication of -digit and 1-digit re-                                               ̽             ȼ ¾Ö   ½ Ö · Ƚ Ö   ½ ¼
                      ¾
dundant radix- numbers can be computed without block                                                                      · ¼ Ö·½ Ö · ½ Ö ½ ¼
carry. Let                      ´¼½¼¼½½ ½¼¼¼½½ ½¼½½½½µ
                                                and                                                        Ì              È  ¾ ¾Ö · ¿ ¾Ö · È  ½ ¾Ö   ½ Ö
´½¼¼½¼½µ . The product     ¡ can be computed using 6-                                                                     ·È Ö   ½ ¼ ·  ½ Ö · ½ Ö
bit¢6-bit 12-bit multiplications as follows.
                                                                                                                          · Ö   ½ ¼ (¾               ½)
                                         010011          101001         010001                             Ì              È  ¾ ¾Ö · ¿ ¾Ö · È  ½ ¾Ö   ½ Ö
     ¢                                                                  100101
                                                                                                                          ·  ½ Ö · ½ Ö · Ö   ½ ¼
                                                                                                                          È  ½ ¾Ö · ¿ ¾Ö · Ö · ½ Ö
                                           0010             0111          0101
                                0101       1110             1101                                       Ì   ·½
     +         0010             1011       1111                                                Clearly, each Ì can be computed using                 ÖÖÖ,      ´¾           µ
             000010           010000     011111          010100         000101                 and the resulting value has no more than Ö     bits if Ö . ·¾
Clearly, we do not have the block carries. Let us formally
                                                                                               Thus, Ì is a         ´ · ¾µ
                                                                                                                     -digit redundant radix- Ö number and  ¾
                                                                                               we have,
confirm that the multiplication of -digit and 1-digit redun-
                  ¾
dant radix- Ö numbers can be computed without block car-                                       Lemma 3 For a -digit redundant radix-¾Ö number , a
ries. Let and be -digit and 1-digit redundant radix- Ö                             ¾           ½-digit redundant Öradix-¾Ö number , and a ´ · ½µ-digit
numbers. Also, let È           ¡ (                 ¼
                                                 ) be the                ½                     redundant radix-¾ number , the product sum ¡ ·
partial multiplication. Since both   and has Ö         bits,             ·¾                    can be computed using ÅÍÄ´Ö · ¾ Ö · ¾µs, · ¾
È has Ö   ¾·   bits. We can compute the product Ë       ¡                                            ´¾ Ö Ö Öµs, and a ´ ·½µ´Ö ·¾µ-bit registers, when-
as follows.                                                                                    ever Ö           .

             ˼               ȼ Ö   ½ ¼                                                           Let Ì            ÈË´            µ
                                                                                                                           denote the circuit (or function)
             ˽               ȼ ¾Ö   ½ Ö · Ƚ Ö   ½ ¼                                         for Lemma 3. Using                 ÈË´            µ
                                                                                                                                        we can compute the
                                                                                               sum      of two -digit redundant radix radix- Ö numbers         ¾
             Ë                È  ¾ ¾Ö · ¿ ¾Ö · È  ½ ¾Ö   ½ Ö                                       and . Let                 ½  ¾ ´             ¼ and          µ
                              ·È Ö   ½ ¼ (¾              ½)                                    ´    ½  ¾                     µ
                                                                                                                    ¼ be two -digit redundant radix radix-
             Ë                È  ¾ ¾Ö · ¿ ¾Ö · È  ½ ¾Ö   ½ Ö                                   ¾ Ö numbers. We will show how to compute the product
         Ë                    È  ½ ¾Ö · ¿ ¾Ö                                                          ´
                                                                                               È È¾  ½ , Ⱦ  ¾ , , ȼ                   µ
                                                                                                                                    ¡ using                ÈË´
                                                                                                                                                             .              µ
             ·½
                                                                                               We compute partial products ¡ ¼ , ¡ ½ , , ¡  ½
             ˼       ¾Ö , ˽
                          ¾Ö · ¾Ö ¾Ö·½, Ë ¾Ö , and                                             in turn. We use´                µ
                                                                                                                               ½           ¼ to denote regis-
Hence,
Ë ·½ ¾ hold. Also, if Ö ¿ then Ë ¾ · ¾Ö · ¾Ö                                                                 ´ · ½µ
                                                                                               ters storing a interim                   ¾
                                                                                                                            -digit redundant radix radix- Ö
¾Ö·¾ holds. Thus, Ë ´Ë ·½ Ë             ˼ µ is a redundant                                                       ÈË´
                                                                                               number. We first compute      ¼µ        ¼     . Then, ȼ is the
radix-¾Ö number.                                                                                              ÈË´
                                                                                               least significant digit ¼µ · ½ ¼¼      Ö         . We store the
   Let ÅÍÄ´Ö · ¾ Ö · ¾µ and           ´ Ö Öµ denote com-                                       remaining·½    ÈË´     ¼µ ´ ·½µ´ ·¾µ ½ ·¾
                                                                                                                 digits       ¼               Ö   Ö
binational circuits to compute the ´¾Ö · µ-bit product of
                                                                                                                    ÈË´
                                                                                               in . After that, we compute       µ       ½    . Clearly, Ƚ is
two ´Ö · ¾µ-bit numbers and the ´Ö · ¾µ-bit sum of one
                                                                                                                ÈË´
                                                                                               the least significant digit  µ ·½ ¼ ½       Ö        holds, and
4-bit and two Ö-bit numbers. Each of the partial products
                                                                                                                    ·½
                                                                                               then we store the remaining   ÈË´     µ´ ·
                                                                                                                                  digits           ½
È  ½         ½ ¡ , È  ¾       ¾ ¡ , , ȼ         ¼¡
                                                                                               ½µ´ · ¾µ ½ · ¾
                                                                                                   Ö         Ö          in . Continuing similarly, we can
be computed using       Ö      ÅÍÄ´ · ¾ · ¾µ
                               Ö
                                                       can
                                     . After that, each Ë
                                                                                               obtain the product Å        ¡ . Thus we have,
can be computed using                ´ µ
                              Ö Ö . Thus, we have                                              Lemma 4 For two -digit redundant radix- Ö numbers           ¾
Lemma 2 The product of -digit and 1-digit redundant
                                                                                               and , the product        ¡ in the redundant radix- Ö rep-                ¾
         ¾
radix- Ö numbers can be computed using     Ö    Ö               ÅÍÄ´ ·¾ ·                      resentation can be computed in
                                                                                               ÅÍÄ´ · ¾ · ¾µ
                                                                                                      Ö      Ö
                                                                                                                                     clock cycles using
                                                                                                                                      ·¾             ´¾
                                                                                                                                           Ö Ö Ö s, and a           µ
¾µs, and                  ´       µ
                  Ö Ö s, whenever Ö .                       ¿                                  ´ · ½µ´ · ¾µ
                                                                                                        Ö
                                                                                                                     s,
                                                                                                              -bit register, whenever Ö    .
   Next, to show a circuit to compute two -digit redundant
numbers, we will show how to add a                     ´ · ½µ
                                             -digit radix- Ö                       ¾           3 Montgomery Modulo Multiplication
number to the product ¡ . More specifically, we will
show how to compute Ë           ¡                 ·
                                       . Later, is used to                                       In the RSA encryption/decryption, the modulo expo-
store interim results of the product sum. We can compute                                       nentiation     È        Å or È    ÑÓ          Å are             ÑÓ




         Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
computed, where È and are plain and cypher text, and                                        ¾ Ê ÑÓ         Å         È Ê ½ Ê ¾ ¡¡¡                ¾Ê ÑÓ
                                                                                                                                              If  ½
                                                                                                                                              ¼           Å.         ¼
´      µ
    Å and          ´          µ
                    Å are encryption and decryption keys.                                   then the Montgomery modulo multiplication in line 6 is
Usually, the number of bits in È , , , and Å is 1024                                        not executed. Hence,           ´
                                                                                                                           È Ê ½ Ê ¾ ¡¡¡  ½ Ê          Å ¾ ÑÓ        µ
or larger. Also, the modulo exponentiation is repeatedly                                    holds. If  ½             ½
                                                                                                                  then it is executed, and thus, we have
computed for fixed , , and Å , and various È and .                                                     ´
                                                                                                     È Ê ½ Ê ¾ ¡¡¡ ¼ Ê  ¾ ÑÓ   Å ¡ È ¡  Ê          µ Å ¾ ÑÓ
Since modulo operation is very costly in terms of the com-                                  È Ê ½ Ê ¾ ¡¡¡ ½ Ê      ¾ ÑÓ
                                                                                                                      Å          È Ê ½ Ê ¾ ¡¡¡  ½ Ê        ¾ ÑÓ
puting time and hardware resources, we use Montgomery                                       Å . This completes the proof of the induction. Thus, af-
modulo multiplication [8], which does not use direct mod-                                   ter terminating the for-loop, we have         È Ê          Å.¾ ÑÓ
ulo operations. In the Montgomery modulo multiplica-                                        Finally, by the Montgomery modulo multiplication in line
tion, three Ê-bit numbers , , and Å are given, and                                          8, we have            È Ê   ´ ¾ ÑÓ Å ¡ ¡  Ê            µ ½ ¾ ÑÓ
                                                                                                                                                     Å
´   ¡      ·   ¡ Å ¡  Ê  µ ¾ ÑÓ Å is computed, where an                                     È        ÑÓÅ.
integer is selected such that the least significant Ê bits                                       In the worst case,          ½for all . If this is the case,
of    ¡     ·    ¡ Å are zero. The value of can be com-                                     the Montgomery modulo multiplication is executed no more
                                  ´             µ
puted as follows. Let  Å  ½ denote the minimum non-                                         than Ê   ¾ ·¾ times. Also, since                        ½
                                                                                                                                     with probability     ,         ½¾
                                       ´
negative number such that  Å  ½ ¡ Å   or Ê          µ ½´ ¾ ½µ                               it is executed expected     Ê     ½       ·¾
                                                                                                                                times. Thus we have,
´ÑÓ ¾ µ Ê . If Å is odd, then  Å  ½     ´       µ ¾
                                           Ê always holds.
We can select such that                  ´´ µ ´
                                   ¡ ¡  Å  ½ Ö   .      µµ ½ ¼                              Lemma 5 The modulo exponentiation          È         Å             ÑÓ
               ´  ¡       ·           µ ½¼
                          ¡ Å Ö   are zero.                                                 for Ê-bit numbers È , , and Å can be computed by execut-
                                                                                                                                                          ¾ ·¾
For such ,
           ¼               Å     Ê and     ¾      ¼       ¾
                                                      Ê , we                                ing the Montgomery modulo multiplication Ê times and
                                                                                                           ½      ·¾                   ¾ ÑÓ                 ¾ ÑÓ
   Since
can guarantee that       ¡´     ¡Å ¡  ·  Ê     µ ¾      ¾
                                                 Å . Thus,                                  expected    Ê      times. if Ê      Å and ¾Ê         Å
by subtracting Å from         ¡     ´        ·      µ ¾
                                      ¡ Å ¡  Ê , we can                                     are given.
obtain ´   ¡           ·
                     ¡ Å ¡  Ê     µ ¾ ÑÓ
                                    Å if it is not less than
Å . Since ¡              ·
                        ¡Å       ¡               ´ÑÓ µ
                                             Å , we write                                   5 Hardware Algorithms for Montgomery
´   ¡   ·    ¡ Å ¡  Ê   µ ¾   Å    ÑÓ ¡ ¡  Ê         ¾ ÑÓ
                                                       Å.                                     Modulo Multiplication

4 Modulo Exponentiation using                                           Mont-                  The main purpose of this section is to review hardware
  gomery Modulo Multiplication                                                              algorithms [6, 11] for Montgomery modulo multiplication.

    Let us see how Montgomery modulo multiplication is                                          º½     ÐÓ ¹ ÖÖݹ Ö ÁÑÔÐ Ñ ÒØ Ø ÓÒ Ó
used to compute         È                  ÑÓ
                                  Å . Since Ê and Å are                                               ÅÓÒØ ÓÑ ÖÝ ÅÓ ÙÐÓ ÅÙÐØ ÔÐ Ø ÓÒ
fixed, we can assume that Ê                 ¾ ÑÓ
                                      Å and ¾Ê        Å          ¾ ÑÓ
are computed beforehand. Let the binary representation of                                      Recall that in the Montgomery modulo multiplication,
    be Ê ½ Ê ¾ ¡ ¡ ¡ ½ . The modulo exponentiation                                          Ê-bit numbers      , , and Å are given. In this subsection,
È    ÑÓ   Å can be computed using the following algo-                                       we assume and are a -digit redundant radix- Ö num-                 ¾
rithm based on the right-to-left method:                                                                                                           ¾
                                                                                            ber and a 1-digit redundant radix- Ö number, respectively.
1.        ʾ ÑÓ  Å;                                                                         We will show a circuit to compute the Montgomery mod-
                                                                                                                    ¡  ´           ·
                                                                                                                              ¡ Å ¡  Ö for such , , µ ¾
2. È         ´¾ ÑÓ
        È ¡ ¾Ê        Å ¡  Ê          µ ¾ ÑÓ             Å;                                 ulo multiplication
                                                                                            and Å . We assume that the value of and are given to
3. for          ½
            Ê   downto 0 do                                                                 the circuit as inputs, Å is fixed and  Å  ½ is computed   ´     µ
4.    begin
5.                   ¾ ÑÓ
                ¡ ¡  Ê       Å;                                                             beforehand. This assumption makes sense if Montgomery
6.       if        ½
                 then       ¡È ¡                        ¾ Ê ÑÓ    Å;                        modulo multiplication is used to compute the modulo expo-
                                                                                            nentiation for RSA encryption and decryption.
                                                                                               Recall that, using the circuit for Lemma 2, ¡ can be
7.    end
8.              ½ ¾ ÑÓ
            ¡ ¡  Ê     Å;                                                                   computed using       ÅÍÄ´ ·¾ ·¾µ
                                                                                                                       Ö     Ö s and  ´ µ          Ö Ö s.
The underlined formulas are computed by the Montgomery                                      After computing ¡ , we need to compute such that the
modulo multiplication. Let us confirm that         È                         ÑÓ              least significant Ö bits of´   · ¡     µ  ¡ Å are zero. We
Å        È Ê ½ Ê ¾ ¡¡¡ ¼              ÑÓ
                             Å holds. Clearly, È                                            can compute          ´´    µ ½¼ ´       µµ ½ ¼
                                                                                                                      ¡ Ö   ¡  Å   ½ Ö  
È Ê ¾ ÑÓ   Å holds. Let us show that, at the end of the for-                                using a       ÅÍÄ´ µ
                                                                                                            Ö Ö . Once is obtained, the product ¡ Å is
loop for      ,     È Ê ½ Ê ¾ ¡¡¡ Ê                 ¾ ÑÓ
                                            Å holds by in-                                  computed using the circuit for Lemma 2. Finally, the sum
duction on . Suppose that       È Ê ½ Ê ¾ ¡¡¡ Ê          Å        ¾ ÑÓ                      ´   ¡         ·    µ
                                                                                                         ¡ Å is computed by the circuit for Lemma 1.
holds at the end of the for-loop for           . After ex-                                  Note that both ¡ and ¡ Å are   ´ · ½µ       -digit redundant
ecuting line 5 of the following for-loop, we have                                                    ¾
                                                                                            radix- Ö numbers. However, since the least significant digit
´È Ê ½ Ê ¾ ¡¡¡ Ê       ¾ ÑÓ                µ´
                         Å ¡ È Ê ½ Ê ¾ ¡¡¡ Ê           Å ¡       ¾ ÑÓ            µ          of ¡ and ¡ Å are zero, we can omit the addition of the




      Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
least significant digit. The readers should refer to Figure 1
for illustrating the circuit for Lemma 6.
                                                                                                                       Multiplier
                                                                                                                                                           ´ ¡ µÖ ½ ¼
                                                                                                    Adder
                            Multiplier                                                                       ´ ¡ µ
                                                            ´ ¡ µÖ ½ ¼                              ¡ · ´ ¡ µ
      Adder
              ¡Å
                            Multiplier           Multiplier
                                                                                                Figure 2. Circuit to compute                              ¡ ¡· ´ ¡ µ
                                 Å
          ¡ · ¡Å                                        ´ Å  ½µ                                 using a memory

    Figure 1. Circuit to compute                     ´ ¡ · ¡ ŵ                             Lemma 7 [11] Montgomery modulo multiplication          ¡                    ´
    using multipliers                                                                           ·       µ ¾
                                                                                                    ¡ Å ¡  Ö for -digit and Å , and 1-digit
                                                                                                             ¾
                                                                                            of redundant radix- Ö representation can be computed us-
                                                                                            ing      ÅÍÄ´ · ¾ · ¾µ
                                                                                                         Ö      Ö     s,                      ·¾
                                                                                                                                          Ö Ö Ö s,            ´¾       µ
   To compute the multiplication ¡ , we can use a cir-                                              ´¾ ¾ µ      ¾
                                                                                                       Ö Ö , and a Ö -word Ö-bit memory, without
cuit for Lemma 2 which uses           ÅÍÄ´ · ¾ · ¾µ
                                      Ö     Ö      s and                                    block carries, whenever Ö    .
      ´        µ
          Ö Ö . To compute , we use a      ÅÍÄ´ µ
                                              Ö Ö . After
                                                                                               If Ö        ½
                                                                                                          and Ö                ½¼¾
that to compute the multiplication ¡ Å , we also use a cir-
                                                                                                                          , then the circuit for Lemma 7
                                                                                            needs 64Ã -word 1024-bit memory of size 64Å bits. Since
cuit for Lemma 2 and the addition ¡          · ¡ Å can be
computed using                   ´¾ ¾ µ
                              Ö Ö by Lemma 1.                                               the size of block memory of current FPGAs is up to few
                                                                                            mega bits, this circuit cannot be implemented in FPGAs.
   Therefore, we have,
Lemma 6 [6] Montgomery modulo multiplication     ¡              ´ ·                           º¿ ÅÓÒØ ÓÑ ÖÝ ÅÓ ÙÐÓ ÅÙÐØ ÔÐ                                             Ø ÓÒ
   µ¾
  ¡ Å ¡  Ö for -digit and 1-digit of redundant radix-                                            Í× Ò    Û Ö Å ÑÓÖÝ
¾Ö representation can be computed using            Ö     ¾ · ½ ÅÍÄ´ ·
¾ ·¾µ ¾
   Ö s,                 ´
                      Ö Ö s, and     µ  Ö Ö , without   ´¾ ¾ µ                                  We will reduce the size of memory to compute the func-
block carries, whenever Ö     .                                                             tion . Recall that, is a Ö-bit number such that the least
                                                                                            significant Ö bits of ¡               ·
                                                                                                                           ¡ Å are zero. Let Ö-bit num-
    º¾ ÅÓÒØ ÓÑ ÖÝ ÅÓ ÙÐÓ ÅÙÐØ ÔÐ                                           Ø ÓÒ                                                 ¾
                                                                                            ber partition into two Ö bits such that               Ö  Ö                 ½ ¾
       Í× Ò  Å ÑÓÖÝ                                                                         and          Ö       ¾ ½¼. We can compute the values of
                                                                                            and separately as follows. Let  Å  ½ be the mini-     ´            µ
    The circuit for Lemma 6 has a cascade of three multi-                                   mum non-negative integer such that  Å  ½ ¡ Å          ½   ´            µ
pliers, which can be a long critical path. Also, it needs too                               ´ÑÓ ¾ µ  Ö ¾ . Also, let                          ½ ¾
                                                                                                                                              ´
                                                                                                                                       ¡ Ö   Ö and        µ
many multipliers. We remove multipliers for computing                                            ´  µ ¾ ½¼
                                                                                                         ¡ Ö   . We set                        ´  µ
                                                                                                                                                  ¡  Å  ½ .
to improve the circuit for Lemma 6. The key idea is to use                                  Then, the least significant Ö bits of ¡¾        ·   ¡ Å are zero.
a memory to look up the value of ¡ Å .                                                      Let be a function such that            ´ µ ´´ ¾ ½ ¼ µ
                                                                                                                                          Ö   ¡Å Ö 
    Let be a function such that                   ´ µ
                                                   Ö        ¡  ´           ½¼               ½ ¾ ´
                                                                                               Ö ¡  Å  ½            µ·
                                                                                                                    and         if  ¼ ´   µ ¾ ½¼ ¼
                                                                                                                                        ¡Å Ö  
´       µµ         ½¼
  Å  ½ Ö   ¡ Å . The function can be computed                                               and           ½
                                                                                                         otherwise. Function can be computed using
       ¾
using a Ö word     ´ · ½µ Ö-bit memory as follows. The value                                                                         ¾          ¾
                                                                                            a combinational circuit with Ö input bits and Ö out put
of   ´µ ¼
        (           ¾ ½
                    Ö   ) is stored in address of the mem-                                  bits. We set               ´        · ´ µµ ¾ ½ ¼
                                                                                                                                     Ö           . Then, the
ory in advance. Then, by reading address Ö             of the       ½¼                      least significant digit of ¡             · ·      ¾
                                                                                                                                  ¡ Å ¡ Å ¡ Ö ¾ is zero.
memory, we can obtain the value of                   ´ µ
                                             in one clock cy-                                   We will implement this idea in the same way as
cle. Using this memory,              ´       µ
                                  ¡ can be computed in one                                  Lemma 7. Instead of computing and , we compute ¡ Å
clock cycle. After that, the addition ¡              ¡ can  · ´ µ                           and ¡ Å using a memory. Let be a function such that
be computed using               ·½         ´¾ ¾         µ
                                        Ö Ö s from Lemma 1.                                  ´ µ ´        Ö       ¾ ½¼ ´    µµ ¾ ½ ¼
                                                                                                                      ¡  Å  ½ Ö   ¡ Å . Sim-
Figure 2 illustrates the circuit to compute ¡           ¡ .   · ´ µ                                                                  ¾
                                                                                            ilarly to , function can be computed using Ö ¾ -word
                                                                                            ´ ´ ·¾µ·
                                                                                                Ö                ¾µ
                                                                                                         Ö -bit memory. Then, ¡ Å   ´ µ             ¡ and
    Note that the least significant digit of                 ¡ · ´ ¡ µ                         ¡Å   ´              · ´ µµ             · ¾
                                                                                                                       holds. Thus, ¡Å               ¡Ö¾
is always zero. Hence, we can omit the computation of the                                    ´ µ· ¡              ´ · ´ µµ ¾   ¡    Ö ¾ . The readers should
least significant digit of and the following addition. Thus,                                 refer to Figure 3 for illustrating the circuit to compute
           ¾
we use a Ö word Ö-bit memory for computing             ¡            ´             µ            ¡·      ¡Å        ¡    · ´ µ· ´ · ´ µµ ¾               ¡ Ö ¾.
and            ´¾ ¾         µ
                   Ö Ö s to compute the sum ¡          ¡ .         ·´            µ          Since a FPGAs has dual port memories, two modules to
Therefore, we have,                                                                                                                      ¾
                                                                                            compute in Figure 3 can be computed by a single Ö ¾ -




      Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
                                                                                               shown hardware algorithms for Montgomery modulo mul-
                                                                                               tiplication for ¡   ´         ·          µ¾
                                                                                                                     ¡ Å ¡  Ö for -digit and Å , and
                                  Multiplier                                                   1-digit are shown for Lemmas 6 and 8. Using the same

                                               ¡                          ¡                    technique, we can obtain hardware algorithms for -digit
                                                                                               numbers. More specifically, from Lemma 6, we have,

                                                                                               Theorem 9 Montgomery modulo multiplication        ¡                         ´           ·
                                                   Adder                                                µ ¾
                                                                                                 ¡ Å ¡   Ö for three -digit redundant radix- Ö num-                            ¾
                                                                                               bers ,       and Å can be computed in clock cycles
                                                                                               using    ¾ · ½ ÅÍÄ´ · ¾ · ¾µ ¾
                                                                                                                     Ö     Ö   s,             Ö Ö s,                   ´           µ
                Adder                                                                                  ´¾ ¾ ¾   µ     ´ ·½µ´ ·¾µ
                                                                                                             Ö Ö Ö , and a     Ö   -bit register, with-
                                                                                               out block carries, whenever Ö .

     Figure 3. Circuit to compute  ¡ · ´ ¡ µ·                                                  Further, from Lemma 8, we have,
   ´ ¡ · ´ ¡ µµ ¡ ¾Ö ¾
                                                                                               Theorem 10 Montgomery modulo multiplication             ¡                   ´           ·
word ´ Ö · Ö ¾µ-bit dual port memory in the same time.                                                  µ ¾
                                                                                                  ¡ Å ¡   Ö for three -digit redundant radix- Ö num-                        ¾
The readers may think that a combinational circuit to com-                                     bers ,      and Å can be computed in clock cycles us-
pute is not necessary. However, block RAMs in most                                             ing      ÅÍÄ´ · ¾ · ¾µ · ¾
                                                                                                             Ö      Ö      s,                     Ö Ö Ö s,    ´¾                   µ
FPGAs to implement a memory support only synchronous                                                   ´¾ ¾ ¾     µ ¾
                                                                                                              Ö Ö Ö , a Ö ¾-word Ö-bit dual port memory,
read. Thus, one clock cycle is necessary to read a memory.                                                                 ¾
                                                                                               and a combinational circuit with Ö -bit input and Ö -bit                            ¾
It follows that, if we use a memory to implement the com-                                      output, and a  ´ · ½µ´ · ¾µ
                                                                                                                       Ö    -bit register, without block car-
putation of , two clock cycles are necessary to compute                                        ries, whenever Ö      .
 ´        · ´        .µµ                                                                       The readers should refer to [6, 11] for the details.
    Let us evaluate the hard aware resources necessary to
compute ¡            ·´¡          ¡µ· ´               ·´
                                            ¡ ¡ Ö ¾. The            µµ ¾
multiplication      ¡ can be computed using            Ö              ÅÍÄ´ ·                   6 Modulo Exponentiation on the FPGA
¾ · ¾µ
   Ö      s and                   ´ µ
                            Ö Ö s from Lemma 2. Function
 ´ µ   ¡ can be computed using a combinational circuits                                            The hardware algorithms for the Montgomery modulo
with 8 input bits and 8 output bits and addition     ¡                                ·        multiplication shown in Section 5 compute          ¡         ¡      ´               ·
 ´ µ   ¡ can be computed                           ´ ¾ ¾µ
                                    Ö Ö . After that the                                           µ¾
                                                                                               Å ¡   Ö . Recall that ¡           ´         ·
                                                                                                                                  ¡ Å ¡   Ö can be largerµ¾
value of function for two arguments can be computed us-                                        than Å . Thus, we need to subtract Å if it is no less than Å
   ¾
ing a Ö ¾ -word Ö-bit dual-port memory. Finally, the sum                                       to obtain ¡             ÑÓ
                                                                                                                     Å . In other words, we need to check
´ µ· ´
     ¡            ¡          µ· ´
                              ¡        · ´
                                        ¡ ¡ Ö ¾ can be          µµ ¾                           if ´  ¡       ·          µ¾
                                                                                                             ¡ Å ¡   Ö is less than Å . Unfortunately, the
computed using                  ´¾ ¾ ¾     µ
                                  Ö Ö Ö by straightforward                                                                                       ¾
                                                                                               comparison of two redundant radix- Ö numbers is not obvi-
generalization of Lemma 1. Consequently, we have,                                              ous. To perform the comparison, we need to convert them
Lemma 8 [11] Montgomery modulo multiplication           ¡                         ´            into the non-redundant numbers and the conversion is very
                                                                                                                                         ¡           ´
                                                                                                                                                 ¡Å ¡   Ö      ·               µ¾
     ·      µ ¾
        ¡ Å ¡  Ö for -digit and Å , and and 1-                                                 costly. Therefore, we do not check if
                                                                                               is less than Å . Alternatively, we check if the redundant
                        ¾
digit of redundant radix- Ö representation can be com-
puted using    ÅÍÄ´ · ¾ · ¾µ
                    Ö     Ö      ´
                                 s,            Ö Ö s, one                     µ                bits of the most significant digit are not zero. If this is the
                                                                                               case, we add  Å to         ¡     ´           ·
                                                                                                                                    ¡ Å ¡   Ö . Since theµ ¾
         ´ ¾ ¾µ
       Ö Ö ,         ´¾ ¾ ¾  µ ¾
                               Ö Ö Ö , a Ö ¾-word Ö-bit
dual-port memory, and a combinational circuit with Ö -bit                         ¾            redundant bits of the most significant digit are either 00 or
             ¾
input and Ö -bit output, without block carries, whenever
                                                                                               01, we can guarantee that, after the addition, they are 00.
                                                                                               Note that,  Å may not be added if       ¡         ´
                                                                                                                                                 ¡Å ¡   Ö     ·                µ¾
Ö .                                                                                            is no less than than Å . However, since we can guarantee
                                                                                               that the redundant bits of the most significant digit are 00,
 º        ÅÓÒØ ÓÑ ÖÝ ÅÓ ÙÐÓ ÅÙÐØ ÔÐ                                               Ø ÓÒ         we can avoid the overflow. Thus, the Montgomery modulo
          ÓÖ ÌÛÓ ¹   Ø ÆÙÑ Ö×                                                                  multiplication hardware algorithms for Theorems 10 and 9
                                                                                               can be modified to obtain the resulting value with the re-
   Recall that we have shown a hardware algorithm for the                                      dundant bits of the most significant digit being 00 in                                   ·½
                                                      ¾
product of -digit and 1-digit radix- Ö numbers is shown                                        clock cycles.
for Lemma 2. Using this hardware algorithm iteratively, the                                        Suppose that these modified circuits for Montgomery
                                           ¾
product of two -digit radix- Ö numbers can be computed                                         modulo multiplication are used to compute the modulo ex-
as shown in Lemma 4. In the previous subsection, we have                                       ponentiation algorithm based on the algorithm in Section 4.




         Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
At the end of the algorithm the value is stored as a -digit                                 in 18.46 MHz for non-redundant numbers. Hence, the total
                     ¾
redundant radix- Ö number. We first convert it to the -digit                                 computing time for redundant numbers is 2.521 ms, while
                            ¾
non-redundant radix- Ö number. For this purpose, we add                                     that for non-redundant numbers is 7.218 ms. Thus, we have
zero times. After that, all that redundant bits. Note that                                  achieved the speedup factor of 2.86 using redundant number
the resulting value can be no less than Å . Hence we check                                  systems.
if it is no less than Å . If this is the case, we add  Å and
perform the iterations of addition zero again to convert it                                 8 Conclusion
to non-redundant number. If Å                      ¾
                                           Ö ½ , we can guaran-
tee that the value thus obtained is less than Å . In this way,
we can obtain          È         ÑÓ
                                  Å in non-redundant number
                                                                                               We have presented hardware algorithms for the mod-
                                                                                            ulo exponentiation used in RSA encryption and decryption.
system.
    Let us evaluate the clock cycles to obtain        È                     ÑÓ              The best algorithm runs in 2.67ms and in expected 1.99ms
                                                                                            to compute È           ÑÓ
                                                                                                                  Å for 1024-bit numbers È , , and Å
Å using Theorems 9 or 10. The Montgomery modulo
multiplication takes            ·½
                              clock cycles, and it is executed
                                                                                            on Xilinx Virtex II Pro Family FPGA XCVP30-6. It also
                                                                                                                       ½        ¾ ·½
at most Ö¾ ·¾        times from Lemma 5. After that, it is
                                                                                            runs in 0.027ms if               . Our hardware algorithms
                                                                                            run faster than previously presented algorithms.
converted to non-redundant number in clock cycles. If
the resulting value is no less than Å ,  Å is added and
                              ·½
then zero is added times to it in totally             clock cy-                             References
cles. Thus, the modulo exponentiation can be computed in
´¾ · ¾µ´ · ½µ · ¾ · ½ ´¾ · µ´ · ½µ
     Ö                                 Ê        Ê Ö       clock                              [1] D. N. Amanor, C. Paar, J. Pelzl, V. Bunimov, and
                 ´½ · µ´ · ½µ
cycles and in expected        Ê         Ê Ö       clock cycles.
                                                                                                 M. Schimmler. Efficient hardware architectures for modular
                                                                                                 multiplication on FPGAs. In Proc. of International Confer-
Theorem 11 The modulo exponentiation             È ÑÓ                                            ence on Field Programmable Logic and Applications, pages
Å for Ê-bit numbers can be computed using hardware al-                                           539–542, 2005.
gorithms for Theorems 9 or 10 in less than ´¾Ê · µ´Ê Ö ·
                                                                                             [2] T. Blum and C. Paar. High-radix montgomery modular ex-
½µ clock cycles.                                                                                 ponentiation on reconfigurable hardware. IEEE Trans. on
                                                                                                 Computers, 50(7):759–764, 2001.
                                                                                             [3] T. Blum and C. Paar. High-radix montgomery modular ex-
   If we use non-redundant numbers, the conversion from
                                                                                                 ponentiation on reconfigurable hardware. IEEE Transac-
the redundant numbers is not necessary. If this is the
case, the modulo exponentiation can be computed in Ê            ´¾ ·                             tions on Computers, 50(7):759–764, 2001.

¾µ´     ·½µ                                              ´½ ·¾µ´ ·
                                                                                             [4] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduc-
   Ê Ö      clock cycles and in expected   Ê      Ê Ö                                            tion to Algorithms. MIT Press, 1990.
½µ clock cycles.                                                                             [5] R. Garg and R. Vig. An efficient montgomery multiplica-
                                                                                                 tion algorithm and RSA cryptographic processor. In Proc.
                                                                                                 of International Conference on Computational Intelligence
7 Experimental Results                                                                           and Multimedia Applications, pages 188–195, 2007.
                                                                                             [6] K. Kawakami, K. Shigemoto, and K. Nakano. Redundant
   We have implemented our hardware algorithms for the                                           radix-¾Ö number system for accelerating arithmetic opera-
modulo exponentiation on Virtex II Pro Family FPGA                                               tions on the FPGAs. In Proc. of International Conference on
XC2VP30-6, which has 13,696 slices, 136 ¢ -bit mul-        ½ ½                                   Parallel and Distributed Computing, Applications and Tech-
tipliers, and 136 18k-bit dual-port block RAMs. We have                                          nologies (PDCAT), pages 370–377, 2008.
used XST in ISE Foundation 10.1i for logic synthesis and                                     [7] A. Mazzeo, L. Romano, G. P. Saggese, and N. Mazzocca.
analysis. Since this FPGA has 18-bit multipliers as building                                     FPGA-based implementation of a serial RSA processor. In
blocks, it makes sense to let Ö                ½
                                     . Thus, we use redun-
                                                                                                 Proc. of Design, Automation and Test in Europe Conference
                                   ¾
dant radix-64K (i.e. radix- ½ ) number system.
                                                                                                 and Exhibition, 2003.
                                                                                             [8] P. L. Montgomery. Modular multiplication without trial divi-
   Table 1 shows the performance of the experimental re-                                         sion. Mathematics of Computation, 44(170):519–521, 1985.
sults of the modulo exponentiation shown in Theorem 9                                        [9] B. Parhami. Computer Arithmetic - Algorithm and Hard-
and Theorem 11, which use               ½          ½
                                  ¢ -bit multipliers. Ta-                                        ware Designs. Oxford University Press, 2000.
ble 2 shows that for Theorem 10 and Theorem 11, which use                                   [10] R. L. Rivest, A. Shamir, and L. Adleman. A method for
½ ½¢ multipliers and 18k-bit block RAMs. In both table,                                          obtaining digital signatures and public-key cryptosystems.
                                                                                                 Communications of the ACM, 21:120 – 126, 1978.
the performance are evaluated for -digit redundant radix-
                                                                                            [11] K. Shigemoto, K. Kawakami, and K. Nakano. Accelerat-
64K numbers and non-redundant numbers. Clearly, the
                                                                                                 ing montgomery modulo multiplication for redundant radix-
clock frequency for redundant numbers are fixed, while it                                         64k number system on the FPGA using dual-port block
decreases as the number of bits increases for non-redundant                                      RAMs. In Proc. of International Conference On Embedded
numbers. For example, in Table 2, the 1024-bit modulo ex-                                        and Ubiquitous Computing(EUC), pages 44–51, 2008.
ponentiation runs in 52.9MHz for redundant numbers and




      Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.
                  Table 1. Modulo exponentiation using 18-bit multipliers (Theorems 9 and 11)
                                  bits                       64   128    256    512      1024
                   redundant      clock(MHz)              40.80 41.50 40.38 40.30       40.21
                                  clock cycles (worst)      660 2340 8772 33924 133380
                                  time (ms)               0.016 0.056 0.217 0.841       3.317
                                  clock cycles (expected)   500 1764 6596 25476 100100
                                  time (ms)               0.012 0.042 0.161 0.632       2.489
                                  slices                    865 1708 2905      5811    13467
                                  multipliers                 9    17     33     65       129
                   non-redundant clock(MHz)               47.43 41.75 33.61 25.12       16.38
                                  clock cycles (worst)      650 2322 8738 33858 133250
                                  time (ms)               0.013 0.055 0.259 1.347       8.134
                                  clock cycles (expected)   480 1746 6562 25344        99970
                                  time (ms)               0.010 0.041 0.195 1.008       6.103
                                  slices                    530   974 1971     3909      7549
                                  multipliers                 9    17     31     63       123


Table 2. Modulo exponentiation using 18-bit multipliers and 18k-bit block RAMs(Theorems 10 and 11)


                                              bits                                      64         128          256          512             1024
                    redundant                 clock(MHz)                             53.20       52.51        52.77        52.57            52.90
                                              clock cycles (worst)                     660        2340         8772        33924          133380
                                              time (ms)                              0.012       0.044        0.166        0.645            2.521
                                              clock cycles (expected)                  500        1764         6596        25476          100100
                                              time (ms)                              0.009       0.033        0.124        0.484            1.892
                                              slices                                   896        1652         3204         5868           11589
                                              multipliers                                4           8           16           32               64
                                              block RAMs                                 2           4            8           15               29
                    non-redundant             clock(MHz)                             68.06       58.83        44.47        30.40            18.46
                                              clock cycles (worst)                     650        2322         8738        33858          133250
                                              time (ms)                               0.09       0.039        0.196        1.113            7.218
                                              clock cycles (expected)                  480        1746         6562        25344           99970
                                              time (ms)                               0.07       0.029        0.147        0.833            5.415
                                              slices                                   613        1086         2034         3911             7708
                                              multipliers                                4           8           15           31               61
                                              block RAMs                                 2           4            8           15               29




  Authorized licensed use limited to: Bharat University. Downloaded on September 1, 2009 at 10:32 from IEEE Xplore. Restrictions apply.

				
DOCUMENT INFO
Shared By:
Categories:
Stats:
views:117
posted:3/5/2010
language:English
pages:8
Description: RSA Encryption and Decryption using the Redundant Number System is a security protection algorithm that many banks are using now for the data encryption and decryption.