5.1 Pollard's

Next: 5.2 Group theoretic method Up: 5 Factorisation and Certificates Previous: 5 Factorisation and Certificates

5.1 Pollard's $\rho$

We saw that the trial division technique was undermined by the requirement of a large number of primes and the number of trials that need to be performed. While this made it slow for ``testing'' primality or compositeness, we have not offered any alternative to it (so far) for the purpose of finding prime factors. The method now presented is quite a practical alternative; this speed has some theoretical basis as well; unfortunately, that theoretical basis is incomplete, so the algorithm may be slower than trial division in some cases.

The idea behind this method is that iterated self-maps of finite sets must cycle. Let S be a finite set, f : S $\to$ S be any map and x₀ $\in$ S be some starting point. We define x_{k + 1} = f (x_k) for k $\geq$ 0. By the finiteness of S there is some pair (p, q) so that x_{p + q} = x_p; but then by applying f repeatedly to both sides it is clear that x_{r + q} = x_r for all r $\geq$ p. The smallest p so that x_p is repeated is called the pre-period M; the smallest q is called the period T (these depend to some extent on x₀ as well as f and S). The points x₀, ..., x_M are the ``tail'' and the points x_{M + r}, r $\geq$ 0 are the ``head'' the Pollard's $\rho$ (the name is given for the shape of the letter). Clearly, determining M and T (given S, f and x₀) is an interesting computational problem. Before that let us see what this has to do with factoring.

Now suppose S = S₁×S₂ and f = (f₁, f₂), then f₁ (respectively) f₂) will have its own (M₁, T₁) (respectively (M₂, T₂)) as pre-period and period. Each will (in general) be less than that for S; certainly those for S are upper bounds.

In particular, let us look at the case where S = $\mathbb {Z}$ /N $\mathbb {Z}$ , when N is composite; we know that there are S₁ = $\mathbb {Z}$ /a $\mathbb {Z}$ and S₂ = $\mathbb {Z}$ /b $\mathbb {Z}$ , where N = ab with a and b co-prime. Thus we should look for T₁ (or T₂). We know we will have found a period when gcd(x_{p + q} - x_p, N) > 1. If this GCD is less than N then we have found a factor (and T₁ is a multiple of q) otherwise we have only found a multiple of T; hopefully we will not be so unlucky!

Another aspect to examine is what kind of maps f are ``good'' from the point of view of finding M and T. Clearly, we can divide S into the set of repeating points S_cyclic and the set of transients S_transient (which are never repeated). If the latter set is very large, then M is likely to be very large. On the other hand if the former set is very large it is also likely that T is large. Heuristic analysis asserts that for a ``randomly chosen map'' f (i. e. a ``random'' element of Hom(S, S)) M and T are bounded by the condition M + T $\leq$ $\sqrt{\char93 (S)}$ for #(S) large.

Randomly chosen maps may not be good for us since we need the map to have the form (f₁, f₂). In the case when S = $\mathbb {Z}$ /N $\mathbb {Z}$ this condition f = (f₁, f₂) can be easily ensured by taking f to be a polynomial map (Chinese Remainder Theorem once again!). However, every map on S is a polynomial map when N is prime so we can expect that polynomial maps are adequate for our purposes.

Now let us see how we can determine M and T (actually we are looking for T₁ but that aspect has been explained earlier so we will ignore it here). Clearly, storing all the iterates x_k and comparing them until we find a match is impractical when M and T are large.

The original method suggested by Pollard and Floyd was as follows. Let us compute also the sequence y₀ = x₀ and y_{k + 1} = f (f (y_k)). It is clear, by induction that y_k = x_2k. Thus, if k is a multiple of T, we will get y_k = x_k. By checking for this identity at each iterative step we can find a multiple of T. Of course, because of the transient M it is unlikely that we will find T, but if M and T are of comparable size then we will find a small multiple of T this way. Another problem with this approach is that we need to compute the function f three times at each iterate and that can sometimes be considered expensive.

Another approach was suggested by Brent. Let us first try to look for the ``size'' of M and T. Thus, if M and T have n bits, then we should find a repetition for k = 2ⁿ - 1 and k + T = 2ⁿ + T - 1, the latter being less than 2^{n + 1} - 1. Thus, we store y_n = x_{2ⁿ - 1} and compare it with x_k when k lies between 2ⁿ and 2^{n + 1} - 2. It is clear how we can iterate over this procedure. This procedure has only one computation of f for each iteration. On the other hand, we are over-estimating M by a (worst-case) factor of about 2, which means we are making twice as many tests as in the Pollard-Floyd method. Clearly, the choice between the two methods depends on which is more time consuming--function computation or comparison.

A further improvement to the Brent method is possible if we note that when we are checking for repetitions between k = 2ⁿ - 1 and some k between 2ⁿ and 2^{n + 1} - 2, we have already checked for periods of size n - 1 bits (ignoring M for the moment). Thus we can start making comparisons only after we cross the half-way mark 2ⁿ + 2^{n - 1} - 1. Because of M (transients again!) we may actually not have checked the periods and so we will only obtain multiples of T if we do this; but we will have saved exactly half the comparisons in return!

This observation also fits in well with Pollard's idea of reducing the number of comparisons in his factorisation method as follows. Instead of computing gcd(x_k - y_n, N) at each iteration, he takes the product of x_k - y_n over (say) 10 iterations of k and computing GCD only in time in 10. This reduces the number of comparisons as well.

To make an algorithm we must choose algebraic self-maps f on $\mathbb {Z}$ /N $\mathbb {Z}$ . It is clear, that linear maps will have periods equal to the size of the prime factors so we may as well have used trial division. We take the next thing that comes to hand which is a map like f (x) = x² + 1 and hope it is a ``random enough'' choice! Powering maps like x $\mapsto$ x², are better studied via the (p - 1) method which we will see later--in particular, the periodicity of these maps has to do with a factorisation of (p - 1) when p is a factor of N. Thus, we will stick with quadratic maps and hope that this is good enough²; this is similar to the choice of ``small'' numbers as a base in the Miller-Rabin test with one crucial difference--the outcome of the Pollard $\rho$ will be a ``real'' factorisation, not a probabilistic one. Finally, if we are unsuccessful (in finding a factorisation) with a given f and x₀ we need to vary both and not just x₀.

After all that verbiage (which is used to disguise the fact that we are not really sure of the justifications!), let us come to the algorithm. Pick a small constant c other than 0 and 2 and consider the function f (x) = x² - c. Pick a point x₀ in $\mathbb {Z}$ /N $\mathbb {Z}$ (usually one of small size). Pick a small number s of steps (usually s = 20). Let y₀ = 0, e = 0 and P = 1 (e will count the number of bits in M and T). While k is between 2^e and 2^e + 2^{e - 1} - 1, we set x_{k + 1} = f (x_k). While k is between 2^e + 2^{e - 1} and 2^{e + 1} - 2, we set x_{k + 1} = f (x_k) and multiply P by (x_k - y_e). Every s steps we compute gcd(P, N). If this is greater than 1, then we have found a period (somewhere in the last s steps); otherwise we set P to be 1 again and continue. If we found a GCD, we set z₀ = y_e and iteratively compute (at most s times) z_{l + 1} = f (z_l) and gcd(x_k - z_l, N). We will find a non-trivial GCD for some l between 0 and s - 1. If this GCD is less than N then we have found a factor; else we have been unsuccessful, so we change c and x₀. If we found a factor a then we can continue, replacing N with N/a, starting with the given tuple (e, k, x_k, y_e); we need not start at the beginning since periods smaller than the one found have already been (essentially) excluded. Note that all arithmetic operations (except GCD and subscript arithmetic!) are to be done modulo N/a in this continuation.

Next: 5.2 Group theoretic method Up: 5 Factorisation and Certificates Previous: 5 Factorisation and Certificates

Kapil Hari Paranjape 2002-10-20