Prime Fields

This page compares the performance of galois to native NumPy when performing finite field multiplication in \(\mathrm{GF}(p)\). Native NumPy can perform finite field multiplication in \(\mathrm{GF}(p)\) because prime fields are very simple. Multiplication is simply \(xy\ \textrm{mod}\ p\).

Lookup table performance

This section tests galois when using the "jit-lookup" compilation mode. For finite fields with order less than or equal to \(2^{20}\), galois uses lookup tables by default for efficient arithmetic.

Below are examples computing 10 million multiplications in the prime field \(\mathrm{GF}(31)\).

In [1]: import galois

In [2]: GF = galois.GF(31)

In [3]: GF.ufunc_mode
Out[3]: 'jit-lookup'

In [4]: a = GF.Random(10_000_000, seed=1, dtype=int)

In [5]: b = GF.Random(10_000_000, seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In [6]: a * b
Out[6]: GF([ 9, 27,  7, ..., 14, 21, 15], order=31)

In [7]: %timeit a * b
36 ms ± 1.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The equivalent operation using native NumPy ufuncs is ~1.8x slower.

In [8]: import numpy as np

In [9]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)

In [10]: %timeit (aa * bb) % GF.order
65.3 ms ± 1.26 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Explicit calculation performance

This section tests galois when using the "jit-calculate" compilation mode. For finite fields with order greater than \(2^{20}\), galois will use explicit arithmetic calculation by default rather than lookup tables. Even in these cases, galois is faster than NumPy!

Below are examples computing 10 million multiplications in the prime field \(\mathrm{GF}(2097169)\).

In [1]: import galois

In [2]: GF = galois.GF(2097169)

In [3]: GF.ufunc_mode
Out[3]: 'jit-calculate'

In [4]: a = GF.Random(10_000_000, seed=1, dtype=int)

In [5]: b = GF.Random(10_000_000, seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In [6]: a * b
Out[6]: GF([1879104, 1566761,  967164, ...,  744769,  975853, 1142138], order=2097169)

In [7]: %timeit a * b
32.7 ms ± 1.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The equivalent operation using native NumPy ufuncs is ~2.5x slower.

In [8]: import numpy as np

In [9]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)

In [10]: %timeit (aa * bb) % GF.order
78.8 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Runtime floor

The galois ufunc runtime has a floor, however. This is due to a requirement of the ufuncs to .view() the output array and convert its dtype with .astype(). Also the galois ufuncs must perform input verification that NumPy ufuncs don’t.

For example, for small array sizes, NumPy is faster than galois. This is true whether using lookup tables or explicit calculation.

In [1]: import galois

In [2]: GF = galois.GF(2097169)

In [3]: GF.ufunc_mode
Out[3]: 'jit-calculate'

In [4]: a = GF.Random(10, seed=1, dtype=int)

In [5]: b = GF.Random(10, seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In [6]: a * b
Out[6]:
GF([1879104, 1566761,  967164, 1403108,  100593,  595358,  852783,
    1035698, 1207498,  989189], order=2097169)

In [7]: %timeit a * b
7.62 µs ± 390 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The equivalent operation using native NumPy ufuncs is ~6x faster. However, in absolute terms, the difference is only ~6 µs.

In [8]: import numpy as np

In [9]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)

In [10]: %timeit (aa * bb) % GF.order
1.29 µs ± 12.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Linear algebra performance

Linear algebra performance in prime fields is comparable to the native NumPy implementations, which use BLAS/LAPACK. This is because galois uses the native NumPy ufuncs when possible.

If overflow is prevented, dot products in \(\mathrm{GF}(p)\) can be computed by first computing the dot product in \(\mathbb{Z}\) and then reducing modulo \(p\). In this way, the efficient BLAS/LAPACK implementations are used to keep finite field linear algebra fast, whenever possible.

Below are examples computing the matrix multiplication of two \(100 \times 100\) matrices in the prime field \(\mathrm{GF}(2097169)\).

In [1]: import galois

In [2]: GF = galois.GF(2097169)

In [3]: GF.ufunc_mode
Out[3]: 'jit-calculate'

In [4]: A = GF.Random((100,100), seed=1, dtype=int)

In [5]: B = GF.Random((100,100), seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In [6]: A @ B
Out[6]:
GF([[1147163,   59466, 1841183, ...,  667877, 2084618,  799166],
    [ 306714, 1380503,  810935, ..., 1932687, 1690697,  329837],
    [ 325274,  575543, 1327001, ...,  167724,  422518,  696986],
    ...,
    [ 862992, 1143160,  588384, ...,  668891, 1285421, 1196448],
    [1026856, 1413416, 1844802, ...,   38844, 1643604,   10409],
    [ 401717,  329673,  860449, ..., 1551173, 1766877,  986430]],
order=2097169)

In [7]: %timeit A @ B
708 µs ± 1.48 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

The equivalent operation using native NumPy ufuncs is slightly faster. This is because galois has some internal overhead before invoking the same NumPy calculation.

In [8]: import numpy as np

In [9]: AA, BB = A.view(np.ndarray), B.view(np.ndarray)

In [10]: %timeit (AA @ BB) % GF.order
682 µs ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Last update: May 18, 2022