Performance compared with native NumPy
To compare the performance of galois and native NumPy, we'll use a prime field GF(p). This is because it is the simplest field. Namely, addition, subtraction, and multiplication are modulo p, which can be simply computed with NumPy arrays as (x + y) % p. For extension fields GF(p^m), the arithmetic is computed using polynomials over GF(p) and can't be so tersely expressed in NumPy.
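To make the prime-field case concrete, here is a minimal sketch of element-wise GF(p) arithmetic in plain NumPy (no galois). The variable names are illustrative only:

```python
import numpy as np

p = 31  # a prime, so the integers {0, 1, ..., p - 1} form the field GF(p)

x = np.array([10, 20, 30])
y = np.array([25, 6, 30])

add = (x + y) % p  # field addition
sub = (x - y) % p  # field subtraction (NumPy's % always returns a non-negative result)
mul = (x * y) % p  # field multiplication

print(add, sub, mul)
```

Note that NumPy's `%` operator returns values in `[0, p)` even for negative intermediate results, which is exactly what subtraction over GF(p) requires.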
Lookup performance
For fields with order less than or equal to 2^20, galois uses lookup tables for efficiency. Here is an example of multiplying two arrays in GF(31) using native NumPy and galois with ufunc_mode="jit-lookup".
In [1]: import numpy as np
In [2]: import galois
In [3]: GF = galois.GF(31)
In [4]: GF.ufunc_mode
Out[4]: 'jit-lookup'
In [5]: a = GF.Random(10_000, dtype=int)
In [6]: b = GF.Random(10_000, dtype=int)
In [7]: %timeit a * b
79.7 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)
# Equivalent calculation of a * b using native numpy implementation
In [9]: %timeit (aa * bb) % GF.order
96.6 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
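As an illustration of the lookup-table technique (a sketch of the general idea, not galois' actual implementation), multiplication in GF(31) can be done with log/antilog tables built from a primitive element. Here 3 is used as a generator of GF(31)'s multiplicative group:

```python
import numpy as np

p = 31     # prime field GF(31)
alpha = 3  # a primitive element (generator) of GF(31)'s multiplicative group

# Build antilog (EXP) and log (LOG) tables: EXP[i] = alpha**i mod p, LOG is its inverse.
EXP = np.ones(p - 1, dtype=np.int64)
for i in range(1, p - 1):
    EXP[i] = (EXP[i - 1] * alpha) % p
LOG = np.zeros(p, dtype=np.int64)
LOG[EXP] = np.arange(p - 1)

def lookup_mul(a, b):
    """Multiply nonzero arrays over GF(31) via a*b = alpha**(log(a) + log(b))."""
    return EXP[(LOG[a] + LOG[b]) % (p - 1)]

a = np.array([5, 7, 30])
b = np.array([6, 9, 30])
print(lookup_mul(a, b))  # same result as (a * b) % p for nonzero inputs
```

With the tables precomputed, each multiplication becomes two table reads, one addition, and one reduction modulo p - 1, which is why lookup mode is attractive for small fields. (A full implementation would also need to special-case zero, which has no logarithm.)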
The galois ufunc runtime has a floor, however, due to the need to view the output array and convert its dtype with astype(). For small array sizes, NumPy is faster than galois because it doesn't need to perform these conversions.
In [4]: a = GF.Random(10, dtype=int)
In [5]: b = GF.Random(10, dtype=int)
In [6]: %timeit a * b
45.1 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)
# Equivalent calculation of a * b using native numpy implementation
In [8]: %timeit (aa * bb) % GF.order
1.52 µs ± 34.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
However, for large N, galois is strictly faster than NumPy.
In [10]: a = GF.Random(10_000_000, dtype=int)
In [11]: b = GF.Random(10_000_000, dtype=int)
In [12]: %timeit a * b
59.8 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [13]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)
# Equivalent calculation of a * b using native numpy implementation
In [14]: %timeit (aa * bb) % GF.order
129 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Calculation performance
For fields with order greater than 2^20, galois will use explicit arithmetic calculation rather than lookup tables. Even in these cases, galois is faster than NumPy! Here is an example multiplying two arrays in GF(2097169) using NumPy and galois with ufunc_mode="jit-calculate".
In [1]: import numpy as np
In [2]: import galois
In [3]: GF = galois.GF(2097169)
In [4]: GF.ufunc_mode
Out[4]: 'jit-calculate'
In [5]: a = GF.Random(10_000, dtype=int)
In [6]: b = GF.Random(10_000, dtype=int)
In [7]: %timeit a * b
68.2 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)
# Equivalent calculation of a * b using native numpy implementation
In [9]: %timeit (aa * bb) % GF.order
93.4 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
And again, the runtime comparison with NumPy improves with large N because the time spent viewing and type-converting the output is small compared to the computation time. galois achieves better performance than NumPy because the multiplication and modulo operations are compiled together into one ufunc rather than two.
In [10]: a = GF.Random(10_000_000, dtype=int)
In [11]: b = GF.Random(10_000_000, dtype=int)
In [12]: %timeit a * b
51.2 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [13]: aa, bb = a.view(np.ndarray), b.view(np.ndarray)
# Equivalent calculation of a * b using native numpy implementation
In [14]: %timeit (aa * bb) % GF.order
111 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
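The fusion idea above can be sketched in pure Python (illustrative only; galois JIT-compiles loops like this rather than running them in the interpreter):

```python
import numpy as np

def fused_mul_mod(a, b, p):
    """Multiply and reduce in a single pass per element, never materializing
    the intermediate full-size product array that (a * b) % p would create."""
    out = np.empty_like(a)
    for i in range(a.size):
        out[i] = (int(a[i]) * int(b[i])) % p
    return out

a = np.array([10, 20, 30])
b = np.array([25, 6, 30])
print(fused_mul_mod(a, b, 31))
```

The native NumPy expression (a * b) % p makes two passes over the data and allocates a temporary array for the product; the fused version touches each element once, which is where the large-N advantage comes from once the loop itself is compiled.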
Linear algebra performance
Linear algebra over Galois fields is highly optimized. For prime fields GF(p), the performance is comparable to the native NumPy implementation (which uses BLAS/LAPACK).
In [1]: import numpy as np
In [2]: import galois
In [3]: GF = galois.GF(31)
In [4]: A = GF.Random((100,100), dtype=int)
In [5]: B = GF.Random((100,100), dtype=int)
In [6]: %timeit A @ B
720 µs ± 5.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [7]: AA, BB = A.view(np.ndarray), B.view(np.ndarray)
# Equivalent calculation of A @ B using the native numpy implementation
In [8]: %timeit (AA @ BB) % GF.order
777 µs ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
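One caveat with the (AA @ BB) % GF.order approach shown above: NumPy accumulates the full integer dot products before reducing, so for large primes the int64 accumulator can overflow, whereas galois performs field arithmetic throughout. The following sketch (illustrative, with hypothetical helper names) checks the equivalence for GF(31), where overflow cannot occur:

```python
import numpy as np

p = 31
rng = np.random.default_rng(0)
AA = rng.integers(0, p, size=(4, 4))
BB = rng.integers(0, p, size=(4, 4))

# Reduce after the full integer matmul -- valid only while n * (p - 1)**2
# fits in the accumulator dtype (int64 here).
C = (AA @ BB) % p

# Element-by-element reference: reduce each dot product explicitly.
ref = np.array([[sum(int(AA[i, k]) * int(BB[k, j]) for k in range(4)) % p
                 for j in range(4)] for i in range(4)])
print((C == ref).all())
```

For GF(31) and 100x100 matrices, the worst-case dot product is 100 * 30**2 = 90,000, far below the int64 limit, so the benchmark above is a fair comparison.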
For extension fields GF(p^m), the performance of galois is close to native NumPy linear algebra (about 10x slower). However, for extension fields each multiplication is equivalently a convolution (polynomial multiplication) of two m-length arrays followed by polynomial remainder division by the irreducible polynomial, so it's not an apples-to-apples comparison.
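To illustrate what a single extension-field multiplication involves, here is a sketch of GF(2^8) multiplication as polynomial multiplication over GF(2) reduced by an irreducible polynomial. The AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) is used purely for familiarity; galois' default polynomial for GF(2^8) may differ, so results below are for this polynomial choice only:

```python
def gf256_mul(a, b, poly=0x11B):
    """Multiply a and b as polynomials over GF(2), reducing modulo the
    degree-8 irreducible polynomial 'poly' (here the AES polynomial)."""
    product = 0
    while b:
        if b & 1:      # if the low coefficient of b is set, add (XOR) a
            product ^= a
        a <<= 1        # multiply a by x
        if a & 0x100:  # degree reached 8: subtract (XOR) the irreducible poly
            a ^= poly
        b >>= 1
    return product

print(hex(gf256_mul(0x53, 0xCA)))  # 0x53 and 0xCA are multiplicative inverses in the AES field
```

Every GF(2^8) product requires this shift-XOR-reduce loop (or an equivalent table lookup), whereas an integer multiply is a single CPU instruction, which is why the 10x gap above is expected.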
Below is a comparison of galois computing the correct matrix multiplication over GF(2^8) and NumPy computing a normal integer matrix multiplication (which does not produce the correct result!). This comparison is provided only as a performance reference.
In [1]: import numpy as np
In [2]: import galois
In [3]: GF = galois.GF(2**8)
In [4]: A = GF.Random((100,100), dtype=int)
In [5]: B = GF.Random((100,100), dtype=int)
In [6]: %timeit A @ B
7.13 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: AA, BB = A.view(np.ndarray), B.view(np.ndarray)
# Native numpy matrix multiplication, which doesn't produce the correct result!!
In [8]: %timeit AA @ BB
651 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)