
U.C. Berkeley CS174: Randomized Algorithms


Professor Luca Trevisan

Lecture Note 13
May 6, 2003

Primality Testing
Consider the following problem: given an n-bit integer N , we want to find out whether N
is prime or not.
The problem is primarily motivated by cryptographic applications, where one needs to
generate large random primes. It is known that a random n-bit number has a $\Theta(1/n)$
probability of being prime, and so a typical method to generate random primes is to keep
picking random numbers until a prime one is found. For this method to work, one needs to
be able to recognize prime numbers.
More fundamentally, the study of prime numbers and prime factorizations of integers is as
old as arithmetic itself, and so the existence of an efficient algorithm for checking primality
has been an open question since ancient times (although formulated in such modern terms
only more recently).
In the late Seventies, two probabilistic polynomial time algorithms for checking primality
were found (by Rabin and by Miller), and their discovery was the first strong evidence of the
usefulness of randomness in computation. Until August 2002, no deterministic polynomial
time algorithm was known.
In these notes we discuss a more recent probabilistic polynomial time algorithm due to
Agrawal and Biswas (from 1999). The deterministic algorithm by Agrawal, Kayal and
Saxena is a de-randomization of the Agrawal-Biswas algorithm.

A Probabilistic Algorithm

The following property of prime numbers has been known for a while.
Lemma 1 (Fermat's Little Theorem) If N is prime, then for every $a \neq 0$ we have
$a^N \equiv a \pmod{N}$

This suggests the following idea for a probabilistic algorithm: pick a at random, and check
that $a^N \equiv a \pmod{N}$. The algorithm will always accept primes, and the question is what
happens with numbers that are not prime. Ideally, we would like to show that if N is not
prime then the equivalence fails with reasonably large probability on a random a.
Unfortunately there are numbers that are composite and for which the equivalence is true
for every a (the Carmichael numbers, such as 561). The idea, however, is not completely
useless and, in fact, Miller's algorithm and Rabin's algorithm use it, along with ways of
handling the nasty composite numbers for which the naive idea fails.
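The naive test and its failure can be tried out directly. The following is a minimal Python sketch; the function names are illustrative and not part of the notes. It checks the congruence of Lemma 1 for every a between 1 and N - 1, and shows that the composite number 561 = 3 * 11 * 17 passes for all of them.

```python
def fermat_check(N: int, a: int) -> bool:
    """True if a^N is congruent to a modulo N (the congruence in Lemma 1)."""
    return pow(a, N, N) == a % N


def passes_for_all_a(N: int) -> bool:
    """True if the Fermat congruence holds for every a in 1..N-1."""
    return all(fermat_check(N, a) for a in range(1, N))


if __name__ == "__main__":
    for N in (13, 15, 561):
        print(N, passes_for_all_a(N))
```

The built-in three-argument `pow` uses repeated squaring, so a single check takes time polynomial in the number of digits of N; only the loop over all a is slow, and it is there purely for illustration.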
The Agrawal-Biswas algorithm, instead, is based on the following more general result, that
characterizes prime numbers.
Lemma 2 Let a and N be such that gcd(a, N) = 1. Then N is prime if and only if
$(x - a)^N \equiv x^N - a \pmod{N}$

Before proving it, let us see what the Lemma is saying. In the expression above, a is a
constant and x is a variable. Consider the polynomial $(x - a)^N$, and expand it into a sum of
N + 1 monomials, that is, write it as $\sum_{k=0}^{N} (-1)^{N-k} \binom{N}{k} x^k a^{N-k}$. Then reduce
all the coefficients modulo N. The Lemma says that if N is prime, then the coefficients of all
the terms $x^k$ with $1 \le k \le N-1$ become zero, and we also have $(-a)^N \equiv -a$; if N is not
prime, then some of these coefficients are non-zero.
For example, for N = 5 and $a = -1$ we have
$(x + 1)^5 = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1$
and if we reduce all coefficients modulo 5 we are only left with $x^5 + 1$.
For N = 6 and $a = -1$ we have
$(x + 1)^6 = x^6 + 6x^5 + 15x^4 + 20x^3 + 15x^2 + 6x + 1$
and if we reduce all coefficients modulo 6 we are left with $x^6 + 3x^4 + 2x^3 + 3x^2 + 1$.
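The two expansions can be reproduced mechanically. The sketch below is illustrative only (the helper names are assumptions, not from the notes): it computes the coefficients of $(x - a)^N$ reduced modulo N from the binomial formula and compares them with those of $x^N - a$. It takes time linear in N, so it is a check of the Lemma on small inputs rather than an efficient test.

```python
from math import comb


def expansion_mod_n(N: int, a: int) -> list[int]:
    """Coefficients of (x - a)^N reduced mod N; index k holds the coefficient of x^k."""
    return [(comb(N, k) * (-a) ** (N - k)) % N for k in range(N + 1)]


def lemma2_identity_holds(N: int, a: int) -> bool:
    """True if (x - a)^N and x^N - a have the same coefficients modulo N."""
    target = [(-a) % N] + [0] * (N - 1) + [1]
    return expansion_mod_n(N, a) == target


if __name__ == "__main__":
    print(expansion_mod_n(5, -1))  # coefficients of (x + 1)^5 mod 5
    print(expansion_mod_n(6, -1))  # coefficients of (x + 1)^6 mod 6
```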
Let us now prove the Lemma.
Proof: [Of Lemma 2] Suppose that N is prime. We want to show that, for $1 \le k \le N-1$,
the coefficient of $x^k$ in the polynomial $(x - a)^N$ is a multiple of N and also that
$(-a)^N \equiv -a \pmod{N}$. The second equivalence follows immediately from Fermat's Little
Theorem. For the other equivalence, the coefficient of $x^k$ is $(-a)^{N-k} \binom{N}{k}$. In the
expression for $\binom{N}{k}$, N occurs in the numerator, and numbers smaller than N occur in
the denominator: if N is prime, the N on the numerator cannot be canceled, and so
$\binom{N}{k}$ is a multiple of N, and it is zero modulo N.
Suppose that N is composite. Then let p be a prime factor of N and say that $p^i$ is the
largest power of p that divides N. Then consider the coefficient of $x^p$ in $(x - a)^N$. The
coefficient is $(-a)^{N-p} \binom{N}{p}$, and we want to argue that the coefficient cannot be a
multiple of $p^i$ and, for a stronger reason, cannot be a multiple of N. First of all, a and N
have no common factor, so a does not contain p as a factor. Therefore, if
$(-a)^{N-p} \binom{N}{p}$ were indeed a multiple of $p^i$, then $\binom{N}{p}$ would have to be a
multiple of $p^i$. Let us see why this is impossible. We have
$\binom{N}{p} = \frac{N (N-1) \cdots (N-p+1)}{p (p-1) \cdots 2}$
Now, none of $(N-1), \ldots, (N-p+1)$ can be a multiple of p, and N/p is a multiple of $p^{i-1}$
but not of $p^i$. So the whole expression cannot be a multiple of $p^i$.
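The key step of the composite case can be checked numerically on small inputs. This is a sketch under the assumption that trial division is good enough for the sizes tried; the helper names are hypothetical. For a prime factor p of N, `witness` returns the exponent i and the residue of $\binom{N}{p}$ modulo $p^i$, which the argument above says is never zero.

```python
from math import comb


def prime_factors(N: int) -> list[int]:
    """Distinct prime factors of N by trial division (fine for small N)."""
    factors, q = [], 2
    while q * q <= N:
        if N % q == 0:
            factors.append(q)
            while N % q == 0:
                N //= q
        q += 1
    if N > 1:
        factors.append(N)
    return factors


def witness(N: int, p: int) -> tuple[int, int]:
    """Return (i, C(N, p) mod p^i), where p^i is the largest power of p dividing N."""
    i, m = 0, N
    while m % p == 0:
        m //= p
        i += 1
    return i, comb(N, p) % p ** i


if __name__ == "__main__":
    # N = 12 = 2^2 * 3: C(12, 2) = 66 is not a multiple of 4.
    print(witness(12, 2))
```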

The Lemma seems to give an algorithm to check primality: compute the polynomial $(x+1)^N$,
reduce all coefficients modulo N , and see if any intermediate coefficient is non-zero. The
problem is that we want an algorithm running in time polynomial in the number of digits of
N , that is, polynomial in log N , while the above procedure would take time at least linear
in N .
We may also be tempted to apply results about polynomial identity testing. We want to check
whether the polynomial identity $(x + 1)^N = x^N + 1$ holds when we do operations modulo
N. Considering the results seen in the last two lectures, we could think of picking a few
random values for x and seeing if the identity holds relative to these values. There are at
least two problems with this approach. One is that the analysis of the polynomial identity
testing algorithm works assuming that operations are done in a field. In this case, we would
have to assume that N is prime, which is what we are trying to decide in the first place. The
other problem is that the analysis of the testing algorithm requires the field size (in our
case, N) to be bigger than the degree of the polynomial, while in this case N is both the
size of the range of values for the variable x and the degree of the polynomial.
It turns out that the right approach is via fingerprinting. Remember that if we have two
n-bit integers A and B, with $A \neq B$, and we pick a random prime r of value O(n log n),
then there is a high probability that $A \not\equiv B \pmod{r}$. On the other hand, if A = B then
also $A \equiv B \pmod{r}$ no matter how we choose r. If we pick O(log n) random numbers r
of value O(n log n), then there is a good probability that at least one of them is prime, and
so our algorithm to check if A = B or not will be to pick O(log n) numbers r in the range,
say, 20n log n to 40n log n, and to check if $A \equiv B \pmod{r}$ for all of them. What does this
have to do with our problem? The analogous approach would be to pick a random prime
polynomial r(x) and then check if
$((x + 1)^N \bmod r(x)) \equiv ((x^N + 1) \bmod r(x)) \pmod{N}$
except that it is not clear what it means to divide one polynomial by another and what
a prime polynomial is.
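The integer version of fingerprinting recalled above can be sketched as follows. The constant 4 in the number of trials and the use of the bit length of n as a stand-in for log n are assumptions made for illustration, as is the function name.

```python
import random


def fingerprints_agree(A: int, B: int, n: int, rng: random.Random) -> bool:
    """Compare A and B modulo O(log n) random r in [20 n log n, 40 n log n]."""
    log_n = max(1, n.bit_length())        # stand-in for log n
    lo, hi = 20 * n * log_n, 40 * n * log_n
    for _ in range(4 * log_n):            # "O(log n)" trials; the 4 is arbitrary
        r = rng.randint(lo, hi)
        if A % r != B % r:
            return False                  # A and B are certainly different
    return True                           # equal, or no sampled r separated them


if __name__ == "__main__":
    A = 3 ** 200
    rng = random.Random(1)
    print(fingerprints_agree(A, A, A.bit_length(), rng))
    print(fingerprints_agree(A, A + 1, A.bit_length(), rng))
```

A `False` answer is always correct; a `True` answer can be wrong only when A and B differ and every sampled r happens to divide A - B.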
Given two polynomials p(x) and q(x), their sum and their product are well defined
operations. What about division? It turns out that if a(x) and b(x) are polynomials, and the
degree of a() is bigger than the degree of b(), then there always exist two polynomials q(x)
and r(x) such that
a(x) = b(x)q(x) + r(x)
and such that the degree of r() is smaller than the degree of b(). Furthermore, q() and r()
are unique (see footnote 1). If a(), b(), q(), r() are as above, we write a(x) mod b(x) = r(x)
and $a(x) \equiv r(x) \pmod{b(x)}$.
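Division with remainder can be carried out exactly like long division of integers. Below is a minimal sketch over $Z_p$ for a prime p, with polynomials stored as coefficient lists, lowest degree first. The representation and the names `trim` and `poly_divmod` are choices made here for illustration; `pow(c, -1, p)` (modular inverse) needs Python 3.8 or later.

```python
def trim(poly: list[int]) -> list[int]:
    """Drop zero coefficients of the highest-degree terms, in place."""
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def poly_divmod(a: list[int], b: list[int], p: int) -> tuple[list[int], list[int]]:
    """Return (q, r) with a = b*q + r over Z_p (p prime) and deg r < deg b."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(b[-1], -1, p)          # inverse of the leading coefficient of b
    q = [0] * max(len(a) - len(b) + 1, 0)
    r = a[:]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        factor = (r[-1] * inv_lead) % p   # cancels the leading term of r
        q[shift] = factor
        for i, c in enumerate(b):
            r[shift + i] = (r[shift + i] - factor * c) % p
        trim(r)
    return q, r


if __name__ == "__main__":
    # (x^3 - x^2 + x - 1) divided by (x - 1) over Z_7
    print(poly_divmod([-1, 1, -1, 1], [-1, 1], 7))
```

The empty list stands for the zero polynomial, so a remainder of `[]` means that b(x) divides a(x).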
We say that b(x) divides a(x) if there is a polynomial q(x) such that a(x) = b(x)q(x).
We say that a polynomial a(x) is irreducible if its only divisors are constants and constant
multiples of itself. An irreducible polynomial is the analog of prime numbers in the case
of polynomials. Finally, it is possible to prove that every polynomial can be factored as
a product of irreducible polynomials, and the factorization is unique up to multiplication
by constants. For example, assuming that we do operations over the real numbers, the
polynomial $x^3 - x^2 + x - 1$ can be written as $(x^2 + 1)(x - 1)$, where both $x^2 + 1$ and
$x - 1$ are irreducible, or as $(2x^2 + 2)(\frac{x}{2} - \frac{1}{2})$ and so on.
This issue of constants is a little annoying, so we will use a trick to avoid it. We say that
a polynomial a(x) is monic if the coefficient of the highest degree term is 1. For example
$x^2 + 1$ is monic but $2x^2 + 2$ is not. Now, a monic polynomial has a unique factorization as
a product of monic irreducible polynomials.
Finally, the whole theory holds no matter what field we do operations in: the complex
numbers, the real numbers, or even residue classes modulo a prime N . (Whether or not
a polynomial is irreducible depends on the field, but within a field the theory is always
perfectly consistent.)
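To see how irreducibility depends on the field, one can look at $x^2 + 1$ modulo different primes. The sketch below relies on a standard fact that is not stated in the notes: a polynomial of degree 2 or 3 over a field is reducible exactly when it has a root in that field. The function names are illustrative.

```python
def roots_mod_p(coeffs: list[int], p: int) -> list[int]:
    """Roots in Z_p of the polynomial with these coefficients (lowest degree first)."""
    return [x for x in range(p)
            if sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p == 0]


def is_irreducible_low_degree(coeffs: list[int], p: int) -> bool:
    """Valid only for degree 2 or 3, where reducible means 'has a root'."""
    assert len(coeffs) in (3, 4) and coeffs[-1] % p != 0
    return not roots_mod_p(coeffs, p)


if __name__ == "__main__":
    x2_plus_1 = [1, 0, 1]
    for p in (3, 5):
        print(p, roots_mod_p(x2_plus_1, p), is_irreducible_low_degree(x2_plus_1, p))
```

Over $Z_3$ the polynomial has no roots and is irreducible, while over $Z_5$ it has the roots 2 and 3, so it factors as (x - 2)(x - 3).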
Footnote 1: Note the analogy with the case of integers: if $a \ge b > 0$ are integers, then
there are integers q, r with $0 \le r < b$ such that a = bq + r.

Now that we have come this far, we can finally describe the algorithm. The algorithm takes
care of certain special cases separately, then it picks a random polynomial r() of degree
O(log N), and it checks whether dividing $(x + 1)^N - (x^N + 1)$ by r(x) leaves a non-zero
remainder.
1. Input N
2. if $N \in \{2, 3, 5, 7, 11, 13\}$ return PRIME
3. if N is a multiple of an element of {2, 3, 5, 7, 11, 13} return COMPOSITE
4. if $N = a^b$ for some integers a, b with $b \ge 2$, return COMPOSITE
5. Repeat 40 log N times
   Pick a random monic polynomial r() of degree log N with coefficients in $Z_N$
   If r() is not a divisor of $(x + 1)^N - x^N - 1$, doing operations mod N, return COMPOSITE
6. return PRIME
Let us now analyze the algorithm. First of all, you should convince yourself that it can be
implemented in time polynomial in log N. A tricky part is how to check if N is a perfect
power (one way is to binary search, for each b from 2 to log N, for an integer a with
$a^b = N$). For the rest, the polynomial $(x + 1)^N \bmod r(x) \bmod N$ can be computed in time
polynomial in log N and in the degree of r(x) using repeated squaring.
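The six steps can be put together as in the following Python sketch. It is one possible reading of the listing, not a reference implementation: the bit length of N stands in for "log N", inputs below 2 are rejected (the notes do not discuss them), and all function names are hypothetical. Because r(x) is monic, reducing modulo r(x) never needs a division in $Z_N$.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13)


def is_perfect_power(N: int) -> bool:
    """True if N = a^b with a >= 2, b >= 2 (binary search on a for each b)."""
    for b in range(2, N.bit_length() + 1):
        lo, hi = 2, 1 << (N.bit_length() // b + 1)
        while lo <= hi:
            mid = (lo + hi) // 2
            power = mid ** b
            if power == N:
                return True
            if power < N:
                lo = mid + 1
            else:
                hi = mid - 1
    return False


def mulmod(f, g, r, N):
    """f(x)*g(x) mod (r(x), N); r monic of degree d, f and g of length d, low degree first."""
    d = len(r) - 1
    prod = [0] * (2 * d - 1)
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                prod[i + j] = (prod[i + j] + fi * gj) % N
    for k in range(2 * d - 2, d - 1, -1):   # cancel x^k by subtracting c * x^(k-d) * r(x)
        c = prod[k]
        if c:
            for j in range(d + 1):
                prod[k - d + j] = (prod[k - d + j] - c * r[j]) % N
    return prod[:d]


def powmod(base, e, r, N):
    """base(x)^e mod (r(x), N) by repeated squaring."""
    result = [1] + [0] * (len(r) - 2)
    while e:
        if e & 1:
            result = mulmod(result, base, r, N)
        base = mulmod(base, base, r, N)
        e >>= 1
    return result


def agrawal_biswas(N: int, rng=None) -> bool:
    """True for PRIME, False for COMPOSITE, following steps 1 to 6 above."""
    rng = rng or random.Random()
    if N in SMALL_PRIMES:
        return True
    if N < 2 or any(N % q == 0 for q in SMALL_PRIMES) or is_perfect_power(N):
        return False
    d = N.bit_length()                       # assumption: stands in for log N (d >= 5 here)
    x_plus_1 = [1, 1] + [0] * (d - 2)
    x = [0, 1] + [0] * (d - 2)
    for _ in range(40 * d):
        r = [rng.randrange(N) for _ in range(d)] + [1]   # random monic, degree d
        lhs = powmod(x_plus_1, N, r, N)
        rhs = powmod(x, N, r, N)
        rhs[0] = (rhs[0] + 1) % N
        if lhs != rhs:                       # r(x) does not divide (x+1)^N - x^N - 1
            return False
    return True
```

On a prime input the function always returns `True`, since the identity holds modulo N before any reduction by r(x). On a composite input the answer depends on the random choices, with the error probability analyzed in the rest of this section.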
Regarding correctness, it should be clear that if N is prime then the algorithm will output
PRIME with probability 1. It remains to see that if N is composite there is a reasonably
large probability that the algorithm outputs COMPOSITE.
If N is composite, then it is either identified in steps 3 and 4, or otherwise it has at
least two different prime factors, and they are both at least 17.
Now let p be a prime factor of N, and let $p^i$ be the largest power of p that divides N. Then
$\binom{N}{p^i}$ is not a multiple of p, and so the polynomials $(x + 1)^N$ and $x^N + 1$ are not
only different modulo N, but also different modulo p (see footnote 2).
We show that there is a reasonable probability that when we pick r at random
$(x + 1)^N \not\equiv x^N + 1 \pmod{r(x), p}$
When this happens, then, for a stronger reason, $(x + 1)^N \not\equiv x^N + 1 \pmod{r(x), N}$, and
so the algorithm outputs COMPOSITE.
First of all, if we pick a monic r(x) of degree d = log N with coefficients in $Z_N$, and
then we do operations modulo p, where p is prime and divides N, then this is the same as
picking a random monic polynomial r(x) of degree d with coefficients in $Z_p$. Now consider
the non-zero polynomial $a(x) = (x + 1)^N - (x^N + 1) \bmod p$ with coefficients in $Z_p$.
Dividing a(x) by its leading coefficient makes it monic without changing its divisors, so we
know that a(x) has a unique factorization as a product of monic irreducible polynomials
(times a constant). Since a(x) has degree less than N, at most N/d of them can have degree d.
If r(x) happens to be irreducible, then it is either one of those N/d factors of a(x), or
otherwise a(x) mod r(x) mod p is not zero and the algorithm outputs COMPOSITE. So, how many
irreducible polynomials of a certain degree are there? The following result from number
theory gives an answer.
Footnote 2: Note that this is not true if $N = p^i$.

Theorem 3 If p is prime, there are at least $p^d/d - \sqrt{p^d}$ monic irreducible polynomials
of degree d over $Z_p$.
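The bound can be compared with the exact count. The sketch below uses the classical counting formula (due to Gauss, and not stated in these notes) that the number of monic irreducible polynomials of degree d over $Z_p$ is $\frac{1}{d} \sum_{e \mid d} \mu(d/e) p^e$, where $\mu$ is the Möbius function. The function names are illustrative.

```python
def mobius(n: int) -> int:
    """Moebius function: 0 if n has a squared prime factor, else (-1)^(number of primes)."""
    result, q = 1, 2
    while q * q <= n:
        if n % q == 0:
            n //= q
            if n % q == 0:
                return 0
            result = -result
        q += 1
    if n > 1:
        result = -result
    return result


def count_monic_irreducible(p: int, d: int) -> int:
    """Exact number of monic irreducible polynomials of degree d over Z_p."""
    total = sum(mobius(d // e) * p ** e for e in range(1, d + 1) if d % e == 0)
    return total // d


if __name__ == "__main__":
    p = 17
    for d in range(1, 6):
        exact = count_monic_irreducible(p, d)
        bound = p ** d / d - (p ** d) ** 0.5
        print(d, exact, bound)
```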

Now we are done. We know that p > 16, and so the Theorem implies that there are at least
$p^d/(2d)$ monic irreducible polynomials of degree d (because we have $\sqrt{p^d} < p^d/(2d)$).
Since there are $p^d$ monic polynomials of degree d in total, this gives us a probability at
least 1/(2d) that r is irreducible. We need to subtract the probability that r() is
an irreducible factor of a(x), but this is at most $N/(d p^d) < 1/(4d)$, and so our overall
probability of finding an r that is irreducible and does not divide a() is at least 1/(4d).
Choosing d = log N, we get a probability at least 1/(4 log N) that r(x) is irreducible and is
none of the degree-d factors of a(x). Since we try 40d times, there is a probability at least
$1 - 1/e^{10}$ of succeeding, which is about .99995.
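The last step uses the inequality $1 - t \le e^{-t}$: if each of the 40 log N repetitions of step 5 succeeds with probability at least 1/(4d), with d = log N, then all of them fail with probability at most $(1 - 1/(4d))^{40d} \le e^{-10}$. A short numerical check of this bound (the function name is illustrative):

```python
import math


def all_trials_fail_bound(d: int) -> float:
    """Upper bound on the chance that 40*d independent trials all fail,
    when each trial succeeds with probability at least 1/(4*d)."""
    return (1 - 1 / (4 * d)) ** (40 * d)


if __name__ == "__main__":
    for d in (1, 5, 64, 1000):
        print(d, all_trials_fail_bound(d), math.exp(-10))
```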

A Deterministic Algorithm

The deterministic algorithm is somewhat more complicated in the analysis but not in the
structure.
The probabilistic algorithm used only the property that $(x + 1)^N \not\equiv x^N + 1 \bmod N$ when
N is composite. The deterministic algorithm uses the fact that, if N is composite, we also
have $(x - a)^N \not\equiv x^N - a \bmod N$ for every integer a that has no common factor with N.
Instead of checking the inequality for a random polynomial r(), the algorithm selects an
integer r with certain special properties, and then checks whether
$(x - a)^N \not\equiv x^N - a \bmod (x^r - 1) \bmod N$.
Here is the algorithm
1. Input N
2. If $N = a^b$ with $b \ge 2$ output COMPOSITE
3. r = 2
4. repeat
   if $\gcd(N, r) \neq 1$ output COMPOSITE
   if r is prime
      let q be the largest prime factor of $r - 1$
      if ($q > 4\sqrt{r} \log N$) and ($N^{(r-1)/q} \not\equiv 1 \pmod{r}$) then BREAK
   r = r + 1
5. For a = 1 to $2\sqrt{r} \log N$
   If $(x - a)^N \not\equiv x^N - a \pmod{x^r - 1} \pmod{N}$ output COMPOSITE
6. output PRIME

So we do operations modulo a fixed polynomial but we change a.
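The test in step 5 is cheap because multiplication modulo $x^r - 1$ is a cyclic convolution: exponents simply wrap around modulo r. The following sketch implements only that congruence check for given N, a and r (with r at least 2); the search for r in step 4 is not included, and the function names are assumptions made for illustration.

```python
def cyclic_mul(f: list[int], g: list[int], r: int, N: int) -> list[int]:
    """f(x)*g(x) mod (x^r - 1, N); f and g are length-r lists, lowest degree first."""
    out = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                k = (i + j) % r              # x^(i+j) wraps around because x^r = 1
                out[k] = (out[k] + fi * gj) % N
    return out


def cyclic_pow(f: list[int], e: int, r: int, N: int) -> list[int]:
    """f(x)^e mod (x^r - 1, N) by repeated squaring."""
    result = [1 % N] + [0] * (r - 1)
    while e:
        if e & 1:
            result = cyclic_mul(result, f, r, N)
        f = cyclic_mul(f, f, r, N)
        e >>= 1
    return result


def congruence_holds(N: int, a: int, r: int) -> bool:
    """True if (x - a)^N is congruent to x^N - a modulo (x^r - 1, N); needs r >= 2."""
    base = [0] * r
    base[0] = (-a) % N
    base[1] = (base[1] + 1) % N              # base = x - a
    lhs = cyclic_pow(base, N, r, N)
    rhs = [0] * r
    rhs[N % r] = 1 % N                       # x^N reduces to x^(N mod r)
    rhs[0] = (rhs[0] - a) % N
    return lhs == rhs


if __name__ == "__main__":
    print(all(congruence_holds(13, a, 5) for a in range(1, 5)))
    print(congruence_holds(15, 1, 17))
```

Each call performs O(log N) cyclic multiplications of cost O(r^2) with this naive convolution, so the running time is polynomial in log N whenever r is.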


The polynomial is $x^r - 1$, where r is a prime number such that $r - 1$ has a big prime factor
and such that another property holds with respect to N. We look for r sequentially, and, in
order to prove that the algorithm is efficient, it is necessary to show that if N is composite
and not a prime power a proper r will be found quickly. It turns out that such an r can
be found of value at most $O((\log N)^6)$. Therefore, the time complexity of the algorithm is
polynomial in log N.
The correctness is more difficult to establish and beyond the scope of this course, but the
argument is similar in spirit to the one of the previous section: we need to understand what
the factors of $(x - a)^N - (x^N - a)$ look like when we do operations modulo $x^r - 1$ and
modulo a prime that divides N.
