Integer factorization


     1 Prime decomposition
     2 Practical applications
     3 Current state of the art

In number theory, the integer factorization problem is the problem of finding a
non-trivial divisor of a composite number; for example, given a number like 91,
the challenge is to find a number such as 7 that divides it. When the numbers
are very large, no efficient algorithm is known; a recent effort that factored a
200-digit number (RSA-200) took eighteen months and used over half a century of
computer time. The presumed difficulty of this problem is at the heart of certain
algorithms in cryptography such as RSA. Many areas of mathematics and
computer science have been brought to bear on the problem, including elliptic
curves, algebraic number theory, and quantum computing.
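For small inputs such as 91, the problem can be illustrated with plain trial division. The Python sketch below is only an illustration: trial division is not one of the methods discussed in this article and is hopeless for numbers of cryptographic size, and the function name find_nontrivial_divisor is chosen here for the example.

```python
from math import isqrt

def find_nontrivial_divisor(n):
    """Return the smallest d with 1 < d < n that divides n, or None if there is none."""
    # A composite n always has a divisor no larger than its square root.
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None  # n is prime, or too small to have a nontrivial divisor

print(find_nontrivial_divisor(91))  # 7
```

The loop runs up to the square root of n, so its cost grows exponentially in the number of digits of n.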
Prime decomposition
By the fundamental theorem of arithmetic, every integer greater than 1 has a
unique prime factorization (up to the order of the factors). Given an algorithm
for integer factorization, one can factor any such integer down to its constituent
primes by repeated application of this algorithm.
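The repeated application can be sketched in Python as follows. Here find_divisor is a hypothetical stand-in for "an algorithm for integer factorization" (implemented with trial division purely for illustration); any routine that returns one nontrivial divisor, or None for a prime, could take its place.

```python
from math import isqrt

def find_divisor(n):
    """Stand-in factoring routine (assumption: trial division) returning one nontrivial divisor."""
    for d in range(2, isqrt(n) + 1):
        if n % d == 0:
            return d
    return None

def prime_factors(n):
    """Split n > 1 into primes by applying find_divisor to each piece until all are prime."""
    d = find_divisor(n)
    if d is None:
        return [n]                      # no nontrivial divisor: n is prime
    # Otherwise n = d * (n // d); factor both pieces recursively.
    return sorted(prime_factors(d) + prime_factors(n // d))

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```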

Practical applications
The hardness of this problem is at the heart of several important cryptographic
systems. A fast integer factorization algorithm would mean that the RSA
public-key algorithm is insecure. Some cryptographic systems, such as the Rabin
public-key algorithm and the Blum Blum Shub pseudo-random number generator,
can make a stronger guarantee: any means of breaking them can be used to build
a fast integer factorization algorithm, so if integer factorization is hard then they
are strong. In contrast, it may turn out that there are attacks on the RSA problem
more efficient than integer factorization, though none are currently known.
A similar hard problem with cryptographic applications is the discrete logarithm
problem.
Current state of the art
As of 2005, the largest semiprime factored using general-purpose methods as part
of public research is RSA-200, which is 663 bits long. This was done using a
network of computers running in parallel between Christmas 2003 and May 2005,
and used roughly 75 years of total computer time. The method used was
the general number field sieve.

Difficulty and complexity
If a large, n-bit number is the product of two primes that are roughly the same size,
then no algorithm is known that can factor it in polynomial time. That means there
is no known algorithm that can factor it in time O(n^k) for any constant k. There are
algorithms, however, that are faster than Θ(e^n). In other words, the best known
algorithms are sub-exponential, but super-polynomial. In particular, the best
known asymptotic running time is for the general number field sieve (GNFS)
algorithm, which, for an n-bit number N (so that ln N ≈ n ln 2), is heuristically:

    \exp\left( \left( (64/9)^{1/3} + o(1) \right) (\ln N)^{1/3} (\ln \ln N)^{2/3} \right)

For an ordinary computer, GNFS is the best known algorithm for large n. For a
quantum computer, however, Peter Shor discovered an algorithm in 1994 that
solves the problem in polynomial time. This will have significant implications for
cryptography if a large quantum computer is ever built. To factor a number N,
Shor's algorithm takes only O((log N)^3) time and O(log N) space, which is
polynomial in the number of bits of N. In 2001, a 7-qubit quantum
computer became the first to run Shor's algorithm. It factored the number 15.
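To get a feel for "sub-exponential but super-polynomial", the Python sketch below compares the natural logarithms of three cost functions for an n-bit number N: a polynomial n^k (with k = 3 picked arbitrarily), the usual heuristic GNFS estimate exp((64/9)^(1/3) (ln N)^(1/3) (ln ln N)^(2/3)) with its o(1) term dropped, and the exponential e^n. The function names are illustrative, and the numbers indicate growth rates only; they are not running-time predictions.

```python
from math import log

def polynomial_exponent(bits, k=3):
    """ln of bits**k; the degree k is an arbitrary illustrative choice."""
    return k * log(bits)

def gnfs_exponent(bits):
    """ln of the heuristic GNFS cost for a bits-bit number, o(1) term dropped."""
    ln_n = bits * log(2)                      # ln N for an n-bit number N
    return (64 / 9) ** (1 / 3) * ln_n ** (1 / 3) * log(ln_n) ** (2 / 3)

def exponential_exponent(bits):
    """ln of e**bits, a fully exponential cost."""
    return float(bits)

# Columns: bit length, ln(polynomial), ln(GNFS estimate), ln(exponential)
for bits in (256, 512, 663, 1024):
    print(bits,
          round(polynomial_exponent(bits), 1),
          round(gnfs_exponent(bits), 1),
          exponential_exponent(bits))
```

For each bit length shown, the GNFS column lies between the polynomial and the exponential columns.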
Shor's algorithm
     1 Procedure
        1.1 Classical part
        1.2 Quantum part: Period-finding subroutine
     2 Explanation of the algorithm
        2.1 I. Obtaining factors from period
        2.2 II. Finding the period
     3 Modifications to Shor's Algorithm
Shor's algorithm is a quantum algorithm for factoring a number N in O((log N)^3)
time and O(log N) space, named after Peter Shor.
The algorithm is significant because it implies that public key cryptography might
be easily broken, given a sufficiently large quantum computer. RSA, for example,
uses a public key N which is the product of two large prime numbers. One way to
crack RSA encryption is by factoring N, but with classical algorithms, factoring
becomes increasingly time-consuming as N grows large; more specifically, no
classical algorithm is known that can factor in time O((log N)^k) for any k. By
contrast, Shor's algorithm can crack RSA in polynomial time. It has also been
extended to attack many other public key cryptosystems.
Like many quantum algorithms, Shor's algorithm is probabilistic: it gives
the correct answer with high probability, and the probability of failure can be
decreased by repeating the algorithm.
Shor's algorithm was demonstrated in 2001 by a group at IBM, which factored 15
into 3 and 5, using a quantum computer with 7 qubits.
Shor's algorithm reduces the factorization problem to the problem of finding the
period of a function. It uses quantum parallelism to prepare a superposition of all
values of the function in one step. It then computes the quantum Fourier
transform (QFT) of that superposition, which concentrates the amplitudes at
multiples of the fundamental frequency, the reciprocal of the period.

The chain of reductions is: integer factoring → order finding → phase
estimation → quantum Fourier transform.
The problem we are trying to solve is this: given a composite integer N, find
another integer p strictly between 1 and N that divides N.
Shor's algorithm consists of two parts:
  1. A reduction of the factoring problem to the problem of order-finding, which
     can be done on a classical computer.
  2. A quantum algorithm to solve the order-finding problem.
Classical part
  1. Pick a pseudo-random number a with 1 < a < N.
  2. Compute gcd(a, N). This may be done using the Euclidean algorithm.
  3. If gcd(a, N) ≠ 1, then it is a nontrivial factor of N, so we are done.
  4. Otherwise, use the period-finding subroutine (below) to find r, the period of
     the following function:

         f(x) = a^x \bmod N,

     i.e. the smallest positive integer r for which f(x + r) = f(x).
  5. If r is odd, go back to step 1.
  6. If a^(r/2) ≡ -1 (mod N), go back to step 1.
  7. The factors of N are gcd(a^(r/2) ± 1, N). We are done.
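The seven steps above can be sketched in Python. In this sketch the quantum period-finding subroutine is replaced by a brute-force classical loop (find_period), which is an assumption made purely so that the example runs on an ordinary computer; it takes time exponential in the size of N and gives none of the quantum speedup. The function names are illustrative.

```python
import random
from math import gcd

def find_period(a, N):
    """Brute-force stand-in for the quantum subroutine: smallest r > 0 with a^r = 1 (mod N)."""
    r, value = 1, a % N
    while value != 1:
        value = (value * a) % N
        r += 1
    return r

def shor_classical(N, rng=random):
    """Follow steps 1-7 for a composite N; loops forever if N is prime."""
    while True:
        a = rng.randrange(2, N)          # step 1: 1 < a < N
        d = gcd(a, N)                    # step 2
        if d != 1:
            return d, N // d             # step 3: a shares a factor with N
        r = find_period(a, N)            # step 4
        if r % 2 == 1:
            continue                     # step 5: odd period, try another a
        half = pow(a, r // 2, N)         # a^(r/2) mod N
        if half == N - 1:
            continue                     # step 6: a^(r/2) = -1 (mod N)
        p = gcd(half - 1, N)             # step 7
        return p, N // p

print(shor_classical(15))
```

The returned pair always multiplies to N; which factor comes first depends on the random choice of a.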
Euclidean algorithm

The Euclidean algorithm (also called Euclid's algorithm) is an algorithm to
determine the greatest common divisor (gcd) of two integers. It is one of the
oldest algorithms known, since it appeared in Euclid's Elements around 300 BC.
However, the algorithm probably was not discovered by Euclid and it may have
been known up to 200 years earlier. It was almost certainly known by Eudoxus of
Cnidus (about 375 BC); and Aristotle (about 330 BC) hinted at it in his Topics,
158b, 29-35. The algorithm does not require factoring the two integers.

Algorithm and implementation
Given two natural numbers a and b, check if b is zero. If yes, a is the gcd. If not,
repeat the process using b and the remainder after integer division of a by b
(written a modulo b below). The algorithm can be naturally expressed using tail
recursion.
By keeping track of the quotients occurring during the algorithm, one can also
determine integers p and q with ap + bq = gcd(a, b). This is known as the
extended Euclidean algorithm.
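A minimal Python sketch of the extended algorithm follows. It tracks two pairs of coefficients so that, at every step, the current values of a and b are expressed as integer combinations of the original inputs; the function name extended_gcd and the variable names are chosen for this example.

```python
def extended_gcd(a, b):
    """Return (g, p, q) with a*p + b*q == g == gcd(a, b)."""
    p0, q0 = 1, 0      # current a = (original a)*p0 + (original b)*q0
    p1, q1 = 0, 1      # current b = (original a)*p1 + (original b)*q1
    while b != 0:
        quotient = a // b
        a, b = b, a - quotient * b
        p0, p1 = p1, p0 - quotient * p1
        q0, q1 = q1, q0 - quotient * q1
    return a, p0, q0

g, p, q = extended_gcd(1071, 1029)
print(g, p, q)  # 21 -24 25
```

Indeed 1071 × (−24) + 1029 × 25 = 21.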
Euclid originally formulated the problem geometrically, as the problem of finding
a common "measure" for two line lengths, and his algorithm proceeded by
repeated subtraction of the shorter from the longer segment. This is equivalent to
the following implementation, which is considerably less efficient than the
method explained above:
 function gcd(a, b)
     while a ≠ b
         if a > b
              a := a - b
         else
              b := b - a
     return a
Proof of correctness
Suppose a and b are the numbers whose gcd has to be determined. And suppose
the remainder of the division of a by b is t. Therefore a = qb + t where q is the
quotient of the division. Now any common divisor of a and b also divides t (since
t can be written as t = a − qb); similarly, any common divisor of b and t will also
divide a. Thus the greatest common divisor of a and b is the same as the greatest
common divisor of b and t. Therefore it is enough if we continue the process with
the numbers b and t. Since t is smaller in absolute value than b, we will reach t = 0
after finitely many steps.
Running time

[Figure: plot of the running time for gcd(x, y)]
When analyzing the running time of Euclid's algorithm, it turns out that the inputs
requiring the most divisions are two successive Fibonacci numbers, and the worst
case requires O(n) divisions, where n is the number of digits in the input. However,
it must be noted that the divisions themselves are not atomic operations (if the
numbers are larger than the natural size of the computer's arithmetic operations),
since the size of the operands could be as large as n digits. The actual running
time is therefore O(n²).
This is, nevertheless, considerably better than Euclid's original algorithm, in
which the modulus operation is effectively performed using repeated subtraction
in O(2^n) steps. Consequently, the subtraction-based version requires O(n 2^n) time
for n-digit numbers, or O(m log m) time for the number m.
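The difference between the two versions can be made concrete by counting loop iterations. The Python sketch below (with illustrative function names) counts remainder operations for the modulo-based algorithm and subtractions for the repeated-subtraction algorithm.

```python
def division_steps(a, b):
    """Count remainder operations used by the modulo-based algorithm."""
    steps = 0
    while b != 0:
        a, b = b, a % b
        steps += 1
    return steps

def subtraction_steps(a, b):
    """Count subtractions used by the repeated-subtraction algorithm (a, b > 0)."""
    steps = 0
    while a != b:
        if a > b:
            a -= b
        else:
            b -= a
        steps += 1
    return steps

# Consecutive Fibonacci numbers are the worst case for the division version.
print(division_steps(89, 55))          # 9
# A very unbalanced input is cheap with division but costly with subtraction.
print(division_steps(1000000, 1))      # 1
print(subtraction_steps(1000000, 1))   # 999999
```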
Euclid's algorithm is widely used in practice, especially for small numbers, due to
its simplicity. An alternative algorithm, the binary GCD algorithm, exploits the
binary representation used by computers to avoid divisions and thereby increase
efficiency, although it too is O(n²); it merely shrinks the constant hidden by the
big-O notation on many real machines.
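One common formulation of the binary GCD algorithm is sketched below in Python; other variants exist, and the function name binary_gcd is chosen for this example. It uses only shifts, parity tests and subtraction, relying on the facts that gcd(2a, 2b) = 2 gcd(a, b) and that a factor of 2 in only one argument can be discarded.

```python
from math import gcd

def binary_gcd(a, b):
    """GCD of non-negative integers using shifts, parity tests and subtraction."""
    if a == 0:
        return b
    if b == 0:
        return a
    shift = 0
    while (a | b) & 1 == 0:   # both even: factor out a common 2
        a >>= 1
        b >>= 1
        shift += 1
    while a & 1 == 0:         # make a odd
        a >>= 1
    while b != 0:
        while b & 1 == 0:     # remaining factors of 2 in b are not common
            b >>= 1
        if a > b:
            a, b = b, a       # keep a <= b
        b -= a                # difference of two odd numbers is even
    return a << shift         # restore the common factors of 2

print(binary_gcd(1071, 1029))  # 21
# Cross-check against the standard library on a small range.
print(all(binary_gcd(a, b) == gcd(a, b) for a in range(60) for b in range(60)))  # True
```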
Continued fractions
The quotients that appear when the Euclidean algorithm is applied to the inputs a
and b are precisely the numbers occurring in the continued fraction representation
of a/b. Take for instance a = 1071 and b = 1029. Here is the calculation, in which
the quotients 1, 24 and 2 appear as the second factor on each line:
    1071 = 1029 × 1 + 42
    1029 = 42 × 24 + 21
    42 = 21 × 2 + 0
From this, one can read off that

    \frac{1071}{1029} = 1 + \cfrac{1}{24 + \cfrac{1}{2}}

This method can even be used for real inputs a and b; if a/b is irrational, then the
Euclidean algorithm will not terminate, but the computed sequence of quotients
still represents the (now infinite) continued fraction representation of a/b.
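For integer inputs, the quotients can be collected with a small variation of the gcd loop. The following Python sketch (the function name continued_fraction is illustrative) returns them as a list.

```python
def continued_fraction(a, b):
    """Return the continued-fraction quotients of a/b for positive integers a, b."""
    quotients = []
    while b != 0:
        quotients.append(a // b)   # the quotient of this division step
        a, b = b, a % b            # continue with b and the remainder
    return quotients

print(continued_fraction(1071, 1029))  # [1, 24, 2]
```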
C/C++ code
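int gcd(int a, int b) {
  /* Base case: gcd(a, 0) = a. */
  if (b == 0)
    return a;
  /* Otherwise recurse on b and the remainder of a divided by b. */
  return gcd(b, a % b);
}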
This can be rewritten iteratively as follows (in Pascal-style pseudocode):
function gcd(a, b)
  while b ≠ 0
    var t := b
    b := a modulo b
    a := t
  return a
For example, the gcd of 1071 and 1029 is computed by this algorithm to be 21
with the following steps:
                                   a     b    t
                                1071  1029   42
                                1029    42   21
                                  42    21    0
                                  21     0
Quantum part: Period-finding subroutine:
 1. Start with a pair of input and output qubit registers with log_2 N qubits each,
    and initialize them to

        N^{-1/2} \sum_{x} |x\rangle |0\rangle

    where x runs from 0 to N - 1.
 2. Construct f(x) as a quantum function and apply it to the above state, to obtain

        N^{-1/2} \sum_{x} |x\rangle |f(x)\rangle

 3. Apply the quantum Fourier transform on the input register. The quantum
    Fourier transform on N points is defined by:

        U_{QFT} |x\rangle = N^{-1/2} \sum_{y} e^{2\pi i x y / N} |y\rangle

    This leaves us in the following state:

        N^{-1} \sum_{x} \sum_{y} e^{2\pi i x y / N} |y\rangle |f(x)\rangle

 4. Perform a measurement. We obtain some outcome y in the input register and
    f(x0) in the output register. Since f is periodic, the probability of measuring
    this pair is given by

        \left| N^{-1} \sum_{x :\, f(x) = f(x_0)} e^{2\pi i x y / N} \right|^2
          = N^{-2} \left| \sum_{b} e^{2\pi i (x_0 + r b) y / N} \right|^2

    where b ranges over the integers for which x0 + rb lies between 0 and N - 1.
    Analysis now shows that this probability is higher, the closer yr/N is to an
    integer.
 5. Turn y/N into an irreducible fraction, and extract the denominator r′, which is
    a candidate for r.
 6. Check if f(x) = f(x + r′). If so, we are done.
 7. Otherwise, obtain more candidates for r by using values near y, or multiples
    of r′. If any candidate works, we are done.
 8. Otherwise, go back to step 1 of the subroutine.
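The measurement statistics of this subroutine can be reproduced on an ordinary computer for tiny inputs. The Python sketch below computes the outcome probabilities directly from the Fourier sums and then extracts candidate periods from the peaks. It rests on assumptions not taken from the text above: the input register has Q points, with Q = 256 chosen as a power of two of at least N^2, rather than N points; candidate denominators are found with the standard library's Fraction.limit_denominator; and all function names are illustrative. The simulation takes time exponential in the number of qubits, so it shows the statistics only, not the speedup.

```python
import cmath
from fractions import Fraction

def measurement_distribution(a, N, Q):
    """Probability of each outcome y after a Fourier transform over Q points."""
    # Group the inputs x by their function value f(x) = a^x mod N.
    preimages = {}
    for x in range(Q):
        preimages.setdefault(pow(a, x, N), []).append(x)
    probs = []
    for y in range(Q):
        total = 0.0
        for xs in preimages.values():
            # Amplitude of the pair (y, f-value): (1/Q) * sum of phases.
            amplitude = sum(cmath.exp(2j * cmath.pi * x * y / Q) for x in xs) / Q
            total += abs(amplitude) ** 2
        probs.append(total)
    return probs

def candidate_period(y, Q, N):
    """Denominator of the fraction with denominator below N that is closest to y/Q."""
    return Fraction(y, Q).limit_denominator(N - 1).denominator

a, N, Q = 7, 15, 256
probs = measurement_distribution(a, N, Q)
peaks = [y for y, p in enumerate(probs) if p > 0.1]
print(peaks)                                        # [0, 64, 128, 192]
print([candidate_period(y, Q, N) for y in peaks])   # [1, 4, 2, 4]
```

Two of the four equally likely outcomes give the true period 4 directly; the outcome 128 gives the divisor 2, which is why steps 6 and 7 test candidates and their multiples.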
Explanation of the algorithm
The algorithm is composed of two parts. The first part of the algorithm turns the
factoring problem into the problem of finding the period of a function, and may be
implemented classically. The second part finds the period using the quantum
Fourier transform, and is responsible for the quantum speedup.

I. Obtaining factors from period
The integers less than N and coprime with N form a finite group under
multiplication modulo N, which is typically denoted (Z/NZ)×. By the end of step 3
of the classical part, we have an integer a in this group. Since the group is finite,
a must have a finite order r, the smallest positive integer such that

    a^r \equiv 1 \pmod{N}

Therefore, N | (a^r − 1). Suppose we are able to obtain r, and it is even. Then

    a^r - 1 = (a^{r/2} - 1)(a^{r/2} + 1) \equiv 0 \pmod{N}

so N | (a^(r/2) − 1)(a^(r/2) + 1). Since r is the smallest positive integer such that
a^r ≡ 1, N cannot divide (a^(r/2) − 1). If N also does not divide (a^(r/2) + 1), then
N must have a nontrivial common factor with each of (a^(r/2) − 1) and (a^(r/2) + 1).
Proof: For simplicity, denote (a^(r/2) − 1) and (a^(r/2) + 1) by u and v respectively.
N | uv, so kN = uv for some integer k. Suppose gcd(u, N) = 1; then mu + nN = 1 for
some integers m and n (this is a property of the greatest common divisor).
Multiplying both sides by v, we find that mkN + nvN = v, so N | v. This contradicts
the assumption that N does not divide v, so gcd(u, N) ≠ 1. By a similar argument,
gcd(v, N) ≠ 1. Neither gcd equals N, because N divides neither u nor v, so both
are nontrivial factors of N.
This supplies us with a factorization of N. If N is the product of two primes, this is
the only possible factorization.
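As a worked example of this argument, take N = 15 and a = 7 (values chosen here for illustration). The powers of 7 modulo 15 give the order r, and u and v then yield the two factors:

```latex
\begin{align*}
7^1 &\equiv 7, \quad 7^2 \equiv 4, \quad 7^3 \equiv 13, \quad 7^4 \equiv 1 \pmod{15}
  \quad\Longrightarrow\quad r = 4 \\
u &= 7^{2} - 1 = 48, \qquad v = 7^{2} + 1 = 50, \qquad uv = 2400 = 160 \cdot 15 \\
\gcd(48, 15) &= 3, \qquad \gcd(50, 15) = 5, \qquad 3 \cdot 5 = 15
\end{align*}
```

Here 15 divides uv but neither u nor v, so each gcd is a nontrivial factor.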

II. Finding the period
Shor's period-finding algorithm relies heavily on the ability of a quantum
computer to be in many states simultaneously. Physicists call this behavior a
"superposition" of states. To compute the period of a function f, we evaluate the
function at all points simultaneously.
Quantum physics does not allow us to access all this information directly, though.
A measurement will yield only one of all possible values, destroying all others.
Therefore we have to carefully transform the superposition to another state that
will return the correct answer with high probability. This is achieved by the
quantum Fourier transform.
Shor thus had to solve three "implementation" problems. All of them had to be
implemented "fast", which means that they can be implemented with a number of
quantum gates that is polynomial in log N.
  1. Create a superposition of states. This can be done by applying Hadamard
     gates to all qubits in the input register. Another approach would be to use the
     quantum Fourier transform (see below).
  2. Implement the function f as a quantum transform. To achieve this, Shor used
     repeated squaring for his modular exponentiation transformation.
  3. Perform a quantum Fourier transform. By using controlled NOT gates and
     single qubit rotation gates Shor designed a circuit for the quantum Fourier
     transform that uses just O((log N)^2) gates.
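The repeated-squaring idea mentioned in item 2 can be illustrated classically. The Python sketch below is not a quantum circuit (a quantum implementation has to use reversible arithmetic); it only shows why the number of multiplications is proportional to the number of bits of the exponent. The function name mod_exp is chosen for this example.

```python
def mod_exp(base, exponent, modulus):
    """Compute base**exponent % modulus by repeated squaring (exponent >= 0)."""
    result = 1 % modulus
    square = base % modulus              # holds base^(2^i) mod modulus
    while exponent > 0:
        if exponent & 1:                 # this binary digit of the exponent is 1
            result = (result * square) % modulus
        square = (square * square) % modulus
        exponent >>= 1                   # move to the next binary digit
    return result

print(mod_exp(7, 4, 15))   # 1
print(mod_exp(7, 2, 15))   # 4
```

The loop runs once per binary digit of the exponent, so an exponent below N needs about log_2 N squarings.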
After all these transformations a measurement will yield an approximation to the
period r. For simplicity assume that r divides N, so that there are values of y for
which yr/N is an integer. Then a measurement yields such a y with probability 1.
To see this, notice that for such a y
    e^{2πi b y r / N} = 1
for all integers b. Therefore the sum over b that determines the probability of
measuring y is N/r, since b takes roughly N/r values. Squaring and dividing by N^2
gives probability 1/r^2 for each pair of y and output value, and adding the r
possible output values gives probability 1/r for y. There are r values of y such
that yr/N is an integer, so these probabilities sum to 1. Note: another way to
explain Shor's algorithm is by noting that it is just the quantum phase estimation
algorithm in disguise.

Modifications to Shor's Algorithm
There have been many modifications to Shor's algorithm. For example, whereas
Shor's original algorithm and some of its modifications require on the order of
twenty to thirty runs on a quantum computer, the modification by David McAnally
at the University of Queensland requires on the order of only four to eight runs.
Realizations: 7-qubit NMR

•   Nuclear magnetic resonance (IBM): 15 = 3 × 5 (Shor's algorithm)
[Figure panels: initial state; state after the Hadamard transform on all bits of a;
state after modular exponentiation, b = x^a (mod N); state after the Fourier
transform]
•   The modular exponentiation produces a periodic register (target and ancilla),
    from which the period is read off.
Efficient quantum Fourier transform