General number field sieve: Difference between revisions

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search
imported>Americanastronaut
Attempted to clarify the paragraph about choosing polynomials using expansions of n in different bases.
imported>Jakubisek
mNo edit summary
 
Line 32: Line 32:
== Method ==
== Method ==
{{Confusing|section|reason=there are no examples or pseudocode|date=May 2021}}
{{Confusing|section|reason=there are no examples or pseudocode|date=May 2021}}
=== Polynomial choice ===
Two [[polynomial]]s ''f''(''x'') and ''g''(''x'') of small [[degree of a polynomial|degrees]] ''d'' and ''e'' are chosen, which have integer coefficients, which are [[irreducible polynomial|irreducible]] over the [[rational number|rationals]], and which, when interpreted [[modular arithmetic|mod ''n'']], have a common integer [[root of a function|root]] ''m''. An optimal strategy for choosing these polynomials is not known; one simple method is to obtain ''f'' from the [[radix|base-''m'']] expansion of ''n'' for an appropriate choice of ''m''. More precisely: for any choice of ''m'',  
Two [[polynomial]]s ''f''(''x'') and ''g''(''x'') of small [[degree of a polynomial|degrees]] ''d'' and ''e'' are chosen, which have integer coefficients, which are [[irreducible polynomial|irreducible]] over the [[rational number|rationals]], and which, when interpreted [[modular arithmetic|mod ''n'']], have a common integer [[root of a function|root]] ''m''. An optimal strategy for choosing these polynomials is not known; one simple method is to obtain ''f'' from the [[radix|base-''m'']] expansion of ''n'' for an appropriate choice of ''m''. More precisely: for any choice of ''m'',  
writing ''n'' in base ''m'' is, by definition, finding digits <math display=inline> a_0, a_1, \ldots, a_d </math> where <math display=inline> 0 \leq a_i < m </math> for each ''i'', such that  
writing ''n'' in base ''m'' is, by definition, finding digits <math display=inline> a_0, a_1, \ldots, a_d </math> where <math display=inline> 0 \leq a_i < m </math> for each ''i'', such that  
Line 39: Line 41:
which in turn means that ''m'' is a root of the polynomial <math display=inline> f(x) = a_dx^d + \cdots + a_1x + a_0 </math> modulo ''n''. For the purposes of the general number field sieve, we first fix an appropriate degree ''d'' and then perform the above expansion for a number of values ''m'' of order ''n''<sup>1/''d''</sup>, after which we choose the polynomial ''f'' to be the one whose coefficients are overall the smallest among the candidates obtained in this way. We then simply set <math display=inline> g(x) = x - m </math>.
which in turn means that ''m'' is a root of the polynomial <math display=inline> f(x) = a_dx^d + \cdots + a_1x + a_0 </math> modulo ''n''. For the purposes of the general number field sieve, we first fix an appropriate degree ''d'' and then perform the above expansion for a number of values ''m'' of order ''n''<sup>1/''d''</sup>, after which we choose the polynomial ''f'' to be the one whose coefficients are overall the smallest among the candidates obtained in this way. We then simply set <math display=inline> g(x) = x - m </math>.


Consider the number field rings '''Z'''[''r''<sub>1</sub>] and '''Z'''[''r''<sub>2</sub>], where ''r''<sub>1</sub> and ''r''<sub>2</sub> are roots of the polynomials ''f'' and ''g''. Since ''f'' is of degree ''d'' with integer coefficients, if ''a'' and ''b'' are integers, then so will be ''b''<sup>''d''</sup>·''f''(''a''/''b''), which we call ''r''. Similarly, ''s'' = ''b''<sup>''e''</sup>·''g''(''a''/''b'') is an integer. The goal is to find integer values of ''a'' and ''b'' that simultaneously make ''r'' and ''s'' [[smooth number|smooth]] relative to the chosen basis of primes. If ''a'' and ''b'' are small, then ''r'' and ''s'' will be small too, about the size of ''m'', and we have a better chance for them to be smooth at the same time.  The current best-known approach for this search is [[lattice sieving]]; to get acceptable yields, it is necessary to use a large factor base.
==== Improving polynomial choice ====
 
The choice of polynomial can dramatically affect the time to complete the remainder of the algorithm. The method of choosing polynomials based on the expansion of {{mvar|n}} in base {{mvar|m}} shown above is suboptimal in many practical situations, leading to the development of better methods.
 
One such method was suggested by Murphy and Brent;<ref>{{citation |first1=B. |last1=Murphy |first2=R. P. |last2=Brent |title=On quadratic polynomials for the number field sieve |journal=Australian Computer Science Communications |volume=20 |date=1998 |pages=199–213 |url=http://maths-people.anu.edu.au/~brent/pub/pub178.html }}</ref> they introduce a two-part score for polynomials, based on the presence of roots modulo small primes and on the average value that the polynomial takes over the sieving area.
 
The best reported results<ref>{{citation | last=Franke |first=Jens |year=2006 |title=On RSA 200 and larger projects |url=http://www.hyperelliptic.org/tanja/SHARCS/talks06/Jens_Franke.pdf }}</ref> were achieved by the method of [[Thorsten Kleinjung]],<ref>{{cite journal | last=Kleinjung |first=Thorsten |date=October 2006 |title=On polynomial selection for the general number field sieve |journal=Mathematics of Computation |volume=75 |pages=2037–2047 |url=https://www.ams.org/mcom/2006-75-256/S0025-5718-06-01870-9/S0025-5718-06-01870-9.pdf |access-date=2007-12-13 |doi=10.1090/S0025-5718-06-01870-9 |issue=256|bibcode=2006MaCom..75.2037K |doi-access=free }}</ref> which allows {{math|''g''(''x'') {{=}} ''ax''&nbsp;+&nbsp;''b''}}, and searches over {{mvar|a}} composed of small prime factors congruent to 1 modulo 2{{math|''d''}} and over leading coefficients of {{mvar|f}} which are divisible by 60.
 
=== Generation of relation pairs (sieving) ===
 
Consider the number field rings '''Z'''[''r''<sub>1</sub>] and '''Z'''[''r''<sub>2</sub>], where ''r''<sub>1</sub> and ''r''<sub>2</sub> are roots of the polynomials ''f'' and ''g''. Since ''f'' is of degree ''d'' with integer coefficients, if ''a'' and ''b'' are integers, then so will be ''b''<sup>''d''</sup>·''f''(''a''/''b''), which we call ''r''. Similarly, ''s'' = ''b''<sup>''e''</sup>·''g''(''a''/''b'') is an integer. The goal is to find integer values of ''a'' and ''b'' that simultaneously make ''r'' and ''s'' [[smooth number|smooth]] relative to the chosen basis of primes. If ''a'' and ''b'' are small, then ''r'' and ''s'' will be small too, about the size of ''m'', and we have a better chance for them to be smooth at the same time.  The current best-known approach for this search is [[lattice sieving]]; to get acceptable yields, it is necessary to use a large factor base. These pairs are also called "relations".<ref name=msieve_readme_nfs>{{cite web |title=readme.nfs from msieve |url=https://escatter11.fullerton.edu/nfs/forum_thread.php?id=549}}</ref>
 
=== Postprocessing ===


Having enough such pairs, using [[Gaussian elimination]], one can get products of certain ''r'' and of the corresponding ''s'' to be squares at the same time. A slightly stronger condition is needed—that they are [[field norm|norms]] of squares in our number fields, but that condition can be achieved by this method too.  Each ''r'' is a norm of ''a''&nbsp;−&nbsp;''r''<sub>1</sub>''b'' and hence that the product of the corresponding factors ''a''&nbsp;−&nbsp;''r''<sub>1</sub>''b'' is a square in '''Z'''[''r''<sub>1</sub>], with a "square root" which can be determined (as a product of known factors in '''Z'''[''r''<sub>1</sub>])—it will typically be represented as an irrational [[algebraic number]].  Similarly, the product of the factors ''a''&nbsp;−&nbsp;''r''<sub>2</sub>''b'' is a square in '''Z'''[''r''<sub>2</sub>], with a "square root" which also can be computed.  It should be remarked that the use of Gaussian elimination does not give the optimal run time of the algorithm.  Instead, sparse matrix solving algorithms such as [[Block Lanczos algorithm for nullspace of a matrix over a finite field|Block Lanczos]] or [[Block Wiedemann algorithm|Block Wiedemann]] are used.  
Having enough such pairs, using [[Gaussian elimination]], one can get products of certain ''r'' and of the corresponding ''s'' to be squares at the same time. A slightly stronger condition is needed—that they are [[field norm|norms]] of squares in our number fields, but that condition can be achieved by this method too.  Each ''r'' is a norm of ''a''&nbsp;−&nbsp;''r''<sub>1</sub>''b'' and hence that the product of the corresponding factors ''a''&nbsp;−&nbsp;''r''<sub>1</sub>''b'' is a square in '''Z'''[''r''<sub>1</sub>], with a "square root" which can be determined (as a product of known factors in '''Z'''[''r''<sub>1</sub>])—it will typically be represented as an irrational [[algebraic number]].  Similarly, the product of the factors ''a''&nbsp;−&nbsp;''r''<sub>2</sub>''b'' is a square in '''Z'''[''r''<sub>2</sub>], with a "square root" which also can be computed.  It should be remarked that the use of Gaussian elimination does not give the optimal run time of the algorithm.  Instead, sparse matrix solving algorithms such as [[Block Lanczos algorithm for nullspace of a matrix over a finite field|Block Lanczos]] or [[Block Wiedemann algorithm|Block Wiedemann]] are used.  
Line 45: Line 59:
Since ''m'' is a root of both ''f'' and ''g'' mod ''n'', there are [[homomorphism]]s from the rings '''Z'''[''r''<sub>1</sub>] and '''Z'''[''r''<sub>2</sub>] to the ring '''Z'''/''n'''''Z''' (the integers [[Modular arithmetic|modulo ''n'']]), which map ''r''<sub>1</sub> and ''r''<sub>2</sub> to ''m'', and these homomorphisms will map each "square root" (typically not represented as a rational number) into its integer representative. Now the product of the factors ''a''&nbsp;−&nbsp;''mb'' mod ''n'' can be obtained as a square in two ways—one for each homomorphism. Thus, one can find two numbers ''x'' and ''y'', with ''x''<sup>2</sup>&nbsp;−&nbsp;''y''<sup>2</sup> divisible by ''n'' and again with probability at least one half we get a factor of ''n'' by finding the [[greatest common divisor]] of ''n'' and ''x''&nbsp;−&nbsp;''y''.
Since ''m'' is a root of both ''f'' and ''g'' mod ''n'', there are [[homomorphism]]s from the rings '''Z'''[''r''<sub>1</sub>] and '''Z'''[''r''<sub>2</sub>] to the ring '''Z'''/''n'''''Z''' (the integers [[Modular arithmetic|modulo ''n'']]), which map ''r''<sub>1</sub> and ''r''<sub>2</sub> to ''m'', and these homomorphisms will map each "square root" (typically not represented as a rational number) into its integer representative. Now the product of the factors ''a''&nbsp;−&nbsp;''mb'' mod ''n'' can be obtained as a square in two ways—one for each homomorphism. Thus, one can find two numbers ''x'' and ''y'', with ''x''<sup>2</sup>&nbsp;−&nbsp;''y''<sup>2</sup> divisible by ''n'' and again with probability at least one half we get a factor of ''n'' by finding the [[greatest common divisor]] of ''n'' and ''x''&nbsp;−&nbsp;''y''.


== Improving polynomial choice ==
The post-processing step involves large amounts of data generated from the two proceeding steps. It is practically done by breaking it down into three phases:<ref name=msieve_readme_nfs/>
# Filtering, which looks through the relations to find a collection that is sufficient for constructing a matrix. It is possible for this step to fail if not enough relations are given and it is hard to estimate how many relations are required beforehand. Using more relations than needed tends to make the later steps easier by producing a smaller matrix.
# Linear algebra, which looks for a group of vectors that lie in the nullspace of the very large matrix produced in the preceding step.
# Square root, which can be done over any solution from the last step.


The choice of polynomial can dramatically affect the time to complete the remainder of the algorithm. The method of choosing polynomials based on the expansion of {{mvar|n}} in base {{mvar|m}} shown above is suboptimal in many practical situations, leading to the development of better methods.
=== Distributed computing ===
GNFS involves large amounts of computing and requires some form of [[distributed computing]] to complete in practical timelines given large numbers such as [[RSA-768]]. The following steps can be done in parallel:<ref name=rsa768>{{cite web |title=We are pleased to announce the factorization of RSA768, the following 768-bit, 232-digit number from RSA's challenge list: |url=https://caramba.loria.fr/rsa768.txt}}</ref>
* Sieving, which could produce duplicate relations and requires massive amounts of computation
* Deduplication of relations, removal of singleton relations
* Linear algebra, which requires massive amounts of computation


One such method was suggested by Murphy and Brent;<ref>{{citation |first1=B. |last1=Murphy |first2=R. P. |last2=Brent |title=On quadratic polynomials for the number field sieve |journal=Australian Computer Science Communications |volume=20 |date=1998 |pages=199–213 |url=http://maths-people.anu.edu.au/~brent/pub/pub178.html }}</ref> they introduce a two-part score for polynomials, based on the presence of roots modulo small primes and on the average value that the polynomial takes over the sieving area.
The quadratic character test can be applied to the results of matrix solving to identify true solutions. In the case of RSA-768, 460 out of 512 were true. Eight of them were selected for the square step, producing 5 identical factorizations at the final step after a comparatively small amount of computation.<ref name=rsa768/>


The best reported results<ref>{{citation | last=Franke |first=Jens |year=2006 |title=On RSA 200 and larger projects |url=http://www.hyperelliptic.org/tanja/SHARCS/talks06/Jens_Franke.pdf }}</ref> were achieved by the method of [[Thorsten Kleinjung]],<ref>{{cite journal | last=Kleinjung |first=Thorsten |date=October 2006 |title=On polynomial selection for the general number field sieve |journal=Mathematics of Computation |volume=75 |pages=2037–2047 |url=https://www.ams.org/mcom/2006-75-256/S0025-5718-06-01870-9/S0025-5718-06-01870-9.pdf |access-date=2007-12-13 |doi=10.1090/S0025-5718-06-01870-9 |issue=256|bibcode=2006MaCom..75.2037K |doi-access=free }}</ref> which allows {{math|''g''(''x'') {{=}} ''ax''&nbsp;+&nbsp;''b''}}, and searches over {{mvar|a}} composed of small prime factors congruent to 1 modulo 2{{math|''d''}} and over leading coefficients of {{mvar|f}} which are divisible by 60.
A more in-depth description of a distributed effort can be found for RSA-240, DLP-240, and RSA-250, using CADO-NFS software. Full reproduction files are provided in a repository linked from the paper's appendix.<ref name=Boudot240>F. Boudot et al, [https://eprint.iacr.org/2020/697 "Comparing the difficulty of factorization and discrete logarithm: a 240-digit experiment,"] June 10, 2020.</ref>


== Implementations ==<!-- linked from [[NFSNet]] -->
== Implementations ==<!-- linked from [[NFSNet]] -->
Some implementations focus on a certain smaller class of numbers. These are known as [[special number field sieve]] techniques, such as used in the [[Cunningham project]].
Some implementations focus on a certain smaller class of numbers. These are known as [[special number field sieve]] (SNFS) techniques, such as used in the [[Cunningham project]].
 
Until 2007, the gold-standard<!-- mathematical term?? --> implementation was a suite of software developed and distributed by [[Centrum Wiskunde & Informatica|CWI]] in the Netherlands, which was available only under a relatively restrictive license.{{Citation needed|date=March 2017}} In 2007, [[Jason Papadopoulos]] developed a faster implementation of final processing as part of msieve, which is in the public domain.  Both implementations feature the ability to be distributed among several nodes in a cluster with a sufficiently fast interconnect.
 
* [http://www.math.ttu.edu/~cmonico/software/ggnfs/ GGNFS] (GNU GPL) by Chris Monico, last updated 2005. Can perform all steps. Handles about 160 digits for SNFS and 135 digits for GNFS. Includes these parts also under GNU GPL:
** pol5: Polynomial selection by Kleinjung 2005
** lasieve4: Lattice sieving by Franke and Kleinjung 2001&ndash;2004
* [https://sourceforge.net/projects/factor-by-gnfs/ factor by gnfs], C++ code by Chris DiBona and Chris Card, last updated 2024. GNU GPL.
* [http://cado-nfs.inria.fr/ CADO-NFS], C/C++ implementation of all steps by a large team at [[INRIA]]. Last updated 2025 (as of 2025). GNU LGPL.
* [https://sourceforge.net/projects/msieve/ msieve]. Contains final-processing code, a polynomial selection optimized for smaller numbers, and an implementation of the line sieve. Public domain.
* [http://kmgnfs.cti.gr kmGNFS]. Basic implementation of all steps by Christos Bakogiannis and Nikolaos Karapanos, last updated 2009.
* [https://github.com/bbuhrow/yafu YAFU]. Implementation of all steps by Ben Buhrow. Last updated 2025 (as of 2025). Public domain.
 
=== Volunteer computing ===
A project called NFSNET ran from 2002<ref>{{cite web |title= NFSNET: the first year |author= Paul Leyland |work= Presentation at EIDMA-CWI Workshop on Factoring Large Numbers |date= December 12, 2003 |url= http://homepages.cwi.nl/~herman/Leyland.ppt |access-date= August 9, 2011 }}</ref> through at least 2007. It used volunteer distributed computing on the [[Internet]].<ref>{{cite web |title= Welcome to NFSNET |date= April 23, 2007 |url-status= dead |url= http://www.nfsnet.org/ |archive-url= https://web.archive.org/web/20071022032617/http://www.nfsnet.org/ |archive-date= October 22, 2007 |access-date= August 9, 2011 }}</ref>
A project called NFSNET ran from 2002<ref>{{cite web |title= NFSNET: the first year |author= Paul Leyland |work= Presentation at EIDMA-CWI Workshop on Factoring Large Numbers |date= December 12, 2003 |url= http://homepages.cwi.nl/~herman/Leyland.ppt |access-date= August 9, 2011 }}</ref> through at least 2007. It used volunteer distributed computing on the [[Internet]].<ref>{{cite web |title= Welcome to NFSNET |date= April 23, 2007 |url-status= dead |url= http://www.nfsnet.org/ |archive-url= https://web.archive.org/web/20071022032617/http://www.nfsnet.org/ |archive-date= October 22, 2007 |access-date= August 9, 2011 }}</ref>
[[Paul Leyland]] of the [[United Kingdom]] and Richard Wackerbarth of Texas were involved.<ref>{{cite web |title=About NFSNET |url-status= dead |url= http://www.nfsnet.org/aboutus.html |archive-url= https://web.archive.org/web/20080509131653/http://www.nfsnet.org/aboutus.html |archive-date= May 9, 2008 |access-date= August 9, 2011 }}</ref>
[[Paul Leyland]] of the [[United Kingdom]] and Richard Wackerbarth of Texas were involved.<ref>{{cite web |title=About NFSNET |url-status= dead |url= http://www.nfsnet.org/aboutus.html |archive-url= https://web.archive.org/web/20080509131653/http://www.nfsnet.org/aboutus.html |archive-date= May 9, 2008 |access-date= August 9, 2011 }}</ref>


Until 2007, the gold-standard<!-- mathematical term?? --> implementation was a suite of software developed and distributed by [[Centrum Wiskunde & Informatica|CWI]] in the Netherlands, which was available only under a relatively restrictive license.{{Citation needed|date=March 2017}} In 2007, [[Jason Papadopoulos]] developed a faster implementation of final processing as part of msieve, which is in the public domain.  Both implementations feature the ability to be distributed among several nodes in a cluster with a sufficiently fast interconnect.
A newer attempt called [https://escatter11.fullerton.edu/nfs/ NFS@Home] is still running as of September 2025. It has historically used modified versions of msieve. It generally uses special number field sieves.
 
Polynomial selection is normally performed by [[GPL]] software written by Kleinjung, or by msieve, and lattice sieving by GPL software written by Franke and Kleinjung; these are distributed in GGNFS.
* [http://escatter11.fullerton.edu/nfs/ NFS@Home]
* [http://www.math.ttu.edu/~cmonico/software/ggnfs/ GGNFS]
* [https://sourceforge.net/projects/factor-by-gnfs/ factor by gnfs]
* [http://cado-nfs.inria.fr/ CADO-NFS]
* [http://sourceforge.net/projects/msieve/ msieve] (which contains final-processing code, a polynomial selection optimized for smaller numbers and an implementation of the line sieve)
* [http://kmgnfs.cti.gr kmGNFS]


== See also ==
== See also ==

Latest revision as of 15:28, 14 September 2025

Template:Short description In number theory, the general number field sieve (GNFS) is the most efficient classical algorithm known for factoring integers larger than [[googol|Template:Math]]. Heuristically, its complexity for factoring an integer Template:Mvar (consisting of Template:Math bits) is of the form

exp(((64/9)1/3+o(1))(logn)1/3(loglogn)2/3)=Ln[1/3,(64/9)1/3]

in O and L-notations.[1] It is a generalization of the special number field sieve: while the latter can only factor numbers of a certain special form, the general number field sieve can factor any number apart from prime powers (which are trivial to factor by taking roots).

The principle of the number field sieve (both special and general) can be understood as an improvement to the simpler rational sieve or quadratic sieve. When using such algorithms to factor a large number Template:Mvar, it is necessary to search for smooth numbers (i.e. numbers with small prime factors) of order Template:Math. The size of these values is exponential in the size of Template:Mvar (see below). The general number field sieve, on the other hand, manages to search for smooth numbers that are subexponential in the size of Template:Mvar. Since these numbers are smaller, they are more likely to be smooth than the numbers inspected in previous algorithms. This is the key to the efficiency of the number field sieve. In order to achieve this speed-up, the number field sieve has to perform computations and factorizations in number fields. This results in many rather complicated aspects of the algorithm, as compared to the simpler rational sieve.

The size of the input to the algorithm is Template:Math or the number of bits in the binary representation of Template:Mvar. Any element of the order Template:Math for a constant Template:Mvar is exponential in Template:Math. The running time of the number field sieve is super-polynomial but sub-exponential in the size of the input.

Number fields

Script error: No such module "Labelled list hatnote". Suppose Template:Mvar is a Template:Mvar-degree polynomial over (the rational numbers), and Template:Mvar is a complex root of Template:Mvar. Then, Template:Math, which can be rearranged to express Template:Math as a linear combination of powers of Template:Mvar less than Template:Mvar. This equation can be used to reduce away any powers of Template:Math with exponent Template:Math. For example, if Template:Math and Template:Mvar is the imaginary unit Template:Mvar, then Template:Math, or Template:Math. This allows us to define the complex product:

(a+bi)(c+di)=ac+(ad+bc)i+(bd)i2=(acbd)+(ad+bc)i.

In general, this leads directly to the algebraic number field [r], which can be defined as the set of complex numbers given by:

ak1rk1++a1r1+a0r0, where a0,,ak1.

The product of any two such values can be computed by taking the product as polynomials, then reducing any powers of Template:Math with exponent Template:Math as described above, yielding a value in the same form. To ensure that this field is actually Template:Mvar-dimensional and does not collapse to an even smaller field, it is sufficient that Template:Mvar is an irreducible polynomial over the rationals. Similarly, one may define the ring of integers 𝕆[r] as the subset of [r] which are roots of monic polynomials with integer coefficients. In some cases, this ring of integers is equivalent to the ring [r]. However, there are many exceptions.[2]

Method

Script error: No such module "Unsubst".

Polynomial choice

Two polynomials f(x) and g(x) of small degrees d and e are chosen, which have integer coefficients, which are irreducible over the rationals, and which, when interpreted mod n, have a common integer root m. An optimal strategy for choosing these polynomials is not known; one simple method is to obtain f from the base-m expansion of n for an appropriate choice of m. More precisely: for any choice of m, writing n in base m is, by definition, finding digits a0,a1,,ad where 0ai<m for each i, such that

n=admd++a1m+a0,

which in turn means that m is a root of the polynomial f(x)=adxd++a1x+a0 modulo n. For the purposes of the general number field sieve, we first fix an appropriate degree d and then perform the above expansion for a number of values m of order n1/d, after which we choose the polynomial f to be the one whose coefficients are overall the smallest among the candidates obtained in this way. We then simply set g(x)=xm.

Improving polynomial choice

The choice of polynomial can dramatically affect the time to complete the remainder of the algorithm. The method of choosing polynomials based on the expansion of Template:Mvar in base Template:Mvar shown above is suboptimal in many practical situations, leading to the development of better methods.

One such method was suggested by Murphy and Brent;[3] they introduce a two-part score for polynomials, based on the presence of roots modulo small primes and on the average value that the polynomial takes over the sieving area.

The best reported results[4] were achieved by the method of Thorsten Kleinjung,[5] which allows Template:Math, and searches over Template:Mvar composed of small prime factors congruent to 1 modulo 2Template:Math and over leading coefficients of Template:Mvar which are divisible by 60.

Generation of relation pairs (sieving)

Consider the number field rings Z[r1] and Z[r2], where r1 and r2 are roots of the polynomials f and g. Since f is of degree d with integer coefficients, if a and b are integers, then so will be bd·f(a/b), which we call r. Similarly, s = be·g(a/b) is an integer. The goal is to find integer values of a and b that simultaneously make r and s smooth relative to the chosen basis of primes. If a and b are small, then r and s will be small too, about the size of m, and we have a better chance for them to be smooth at the same time. The current best-known approach for this search is lattice sieving; to get acceptable yields, it is necessary to use a large factor base. These pairs are also called "relations".[6]

Postprocessing

Having enough such pairs, using Gaussian elimination, one can get products of certain r and of the corresponding s to be squares at the same time. A slightly stronger condition is needed—that they are norms of squares in our number fields, but that condition can be achieved by this method too. Each r is a norm of a − r1b and hence that the product of the corresponding factors a − r1b is a square in Z[r1], with a "square root" which can be determined (as a product of known factors in Z[r1])—it will typically be represented as an irrational algebraic number. Similarly, the product of the factors a − r2b is a square in Z[r2], with a "square root" which also can be computed. It should be remarked that the use of Gaussian elimination does not give the optimal run time of the algorithm. Instead, sparse matrix solving algorithms such as Block Lanczos or Block Wiedemann are used.

Since m is a root of both f and g mod n, there are homomorphisms from the rings Z[r1] and Z[r2] to the ring Z/nZ (the integers modulo n), which map r1 and r2 to m, and these homomorphisms will map each "square root" (typically not represented as a rational number) into its integer representative. Now the product of the factors a − mb mod n can be obtained as a square in two ways—one for each homomorphism. Thus, one can find two numbers x and y, with x2 − y2 divisible by n and again with probability at least one half we get a factor of n by finding the greatest common divisor of n and x − y.

The post-processing step involves large amounts of data generated from the two proceeding steps. It is practically done by breaking it down into three phases:[6]

  1. Filtering, which looks through the relations to find a collection that is sufficient for constructing a matrix. It is possible for this step to fail if not enough relations are given and it is hard to estimate how many relations are required beforehand. Using more relations than needed tends to make the later steps easier by producing a smaller matrix.
  2. Linear algebra, which looks for a group of vectors that lie in the nullspace of the very large matrix produced in the preceding step.
  3. Square root, which can be done over any solution from the last step.

Distributed computing

GNFS involves large amounts of computing and requires some form of distributed computing to complete in practical timelines given large numbers such as RSA-768. The following steps can be done in parallel:[7]

  • Sieving, which could produce duplicate relations and requires massive amounts of computation
  • Deduplication of relations, removal of singleton relations
  • Linear algebra, which requires massive amounts of computation

The quadratic character test can be applied to the results of matrix solving to identify true solutions. In the case of RSA-768, 460 out of 512 were true. Eight of them were selected for the square step, producing 5 identical factorizations at the final step after a comparatively small amount of computation.[7]

A more in-depth description of a distributed effort can be found for RSA-240, DLP-240, and RSA-250, using CADO-NFS software. Full reproduction files are provided in a repository linked from the paper's appendix.[8]

Implementations

Some implementations focus on a certain smaller class of numbers. These are known as special number field sieve (SNFS) techniques, such as used in the Cunningham project.

Until 2007, the gold-standard implementation was a suite of software developed and distributed by CWI in the Netherlands, which was available only under a relatively restrictive license.Script error: No such module "Unsubst". In 2007, Jason Papadopoulos developed a faster implementation of final processing as part of msieve, which is in the public domain. Both implementations feature the ability to be distributed among several nodes in a cluster with a sufficiently fast interconnect.

  • GGNFS (GNU GPL) by Chris Monico, last updated 2005. Can perform all steps. Handles about 160 digits for SNFS and 135 digits for GNFS. Includes these parts also under GNU GPL:
    • pol5: Polynomial selection by Kleinjung 2005
    • lasieve4: Lattice sieving by Franke and Kleinjung 2001–2004
  • factor by gnfs, C++ code by Chris DiBona and Chris Card, last updated 2024. GNU GPL.
  • CADO-NFS, C/C++ implementation of all steps by a large team at INRIA. Last updated 2025 (as of 2025). GNU LGPL.
  • msieve. Contains final-processing code, a polynomial selection optimized for smaller numbers, and an implementation of the line sieve. Public domain.
  • kmGNFS. Basic implementation of all steps by Christos Bakogiannis and Nikolaos Karapanos, last updated 2009.
  • YAFU. Implementation of all steps by Ben Buhrow. Last updated 2025 (as of 2025). Public domain.

Volunteer computing

A project called NFSNET ran from 2002[9] through at least 2007. It used volunteer distributed computing on the Internet.[10] Paul Leyland of the United Kingdom and Richard Wackerbarth of Texas were involved.[11]

A newer attempt called NFS@Home is still running as of September 2025. It has historically used modified versions of msieve. It generally uses special number field sieves.

See also

Notes

Template:Reflist

References

Template:Refbegin

  • Arjen K. Lenstra and H. W. Lenstra, Jr. (eds.). "The development of the number field sieve". Lecture Notes in Math. (1993) 1554. Springer-Verlag.
  • Richard Crandall and Carl Pomerance. Prime Numbers: A Computational Perspective (2001). 2nd edition, Springer. Template:ISBN. Section 6.2: Number field sieve, pp. 278–301.

Template:Refend

  • Matthew E. Briggs: An Introduction to the General Number Field Sieve, 1998

Template:Number theoretic algorithms

  1. Script error: No such module "citation/CS1".
  2. Script error: No such module "citation/CS1".
  3. Script error: No such module "citation/CS1".
  4. Script error: No such module "citation/CS1".
  5. Script error: No such module "Citation/CS1".
  6. a b Script error: No such module "citation/CS1".
  7. a b Script error: No such module "citation/CS1".
  8. F. Boudot et al, "Comparing the difficulty of factorization and discrete logarithm: a 240-digit experiment," June 10, 2020.
  9. Script error: No such module "citation/CS1".
  10. Script error: No such module "citation/CS1".
  11. Script error: No such module "citation/CS1".