# Binary Extension Fields

This page compares the performance of galois performing finite field multiplication in $$\mathrm{GF}(2^m)$$ with that of native NumPy performing only integer multiplication followed by modular reduction.

Native NumPy cannot easily perform finite field multiplication in $$\mathrm{GF}(2^m)$$ because it involves polynomial multiplication (convolution) followed by reducing modulo the irreducible polynomial. To make a similar comparison, NumPy will perform integer multiplication followed by integer remainder division.
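To make the two operations concrete, here is a minimal pure-Python sketch of true $$\mathrm{GF}(2^m)$$ multiplication: a carry-less (polynomial) multiply followed by reduction modulo the irreducible polynomial. The AES polynomial $$x^8 + x^4 + x^3 + x + 1$$ (`0x11B`) is used purely for illustration; it is not galois's default irreducible polynomial for $$\mathrm{GF}(2^8)$$.

```python
def gf2m_mul(a, b, irreducible, m):
    """Multiply a and b in GF(2^m): carry-less (polynomial) multiply,
    then reduce modulo the degree-m irreducible polynomial."""
    # Carry-less multiplication: XOR a shifted copy of `a` for each set bit of `b`
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    # Reduction: cancel each bit at degree >= m with a shifted copy of the modulus
    for shift in range(product.bit_length() - 1, m - 1, -1):
        if (product >> shift) & 1:
            product ^= irreducible << (shift - m)
    return product

# 0x53 and 0xCA are multiplicative inverses in the AES field,
# so their product is 1 (this would NOT hold under integer % arithmetic).
gf2m_mul(0x53, 0xCA, 0x11B, 8)  # -> 1
```

The key point: both the multiply and the reduction are XOR-based polynomial operations, which is why NumPy's integer `*` and `%` cannot reproduce them.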

> **Important:** Native NumPy is not computing the correct result! This is not a fair fight!

These are not fair comparisons because NumPy is not computing the correct product. However, they are included here to provide a performance reference point.

## Lookup table performance

This section tests galois when using the "jit-lookup" compilation mode. For finite fields with order less than or equal to $$2^{20}$$, galois uses lookup tables by default for efficient arithmetic.
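For intuition, the following is a rough sketch of the log/antilog-table approach that makes lookup-mode multiplication so cheap: two table lookups and one addition modulo $$2^m - 1$$ replace the polynomial arithmetic entirely. The field polynomial `0x11B` and generator `0x03` (the AES field) are illustrative assumptions; galois uses a different default polynomial and also maintains Zech logarithm tables internally.

```python
# Illustrative choices (the AES field); NOT galois's defaults.
POLY, GEN = 0x11B, 0x03

# Build log/antilog tables by repeated multiplication by the generator.
EXP = [0] * 255  # EXP[i] = GEN**i
LOG = [0] * 256  # LOG[x] = i such that GEN**i == x (undefined for x == 0)
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    # Multiply x by GEN = 0x03, i.e. (x << 1) ^ x, reducing on overflow
    x = (x << 1) ^ x
    if x & 0x100:
        x ^= POLY

def lut_mul(a, b):
    """Multiply in GF(2^8) with two table lookups and one modular addition."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]
```

Because `GEN` is primitive, every nonzero element appears in `EXP`, so `a * b = GEN**(log a + log b)` holds for all nonzero operands.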

Below are examples computing 10 million multiplications in the binary extension field $$\mathrm{GF}(2^8)$$.

```ipython
In : import galois

In : GF = galois.GF(2**8)

In : GF.ufunc_mode
Out: 'jit-lookup'

In : a = GF.Random(10_000_000, seed=1, dtype=int)

In : b = GF.Random(10_000_000, seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In : a * b
Out: GF([181,  92, 148, ..., 255, 220, 153], order=2^8)

In : %timeit a * b
33.9 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


Even though it computes an incorrect result, native NumPy is ~1.9x slower than galois. This is because galois uses lookup tables instead of explicitly performing the polynomial multiplication and division.

```ipython
In : import numpy as np

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

In : pp = int(GF.irreducible_poly)

# This does not produce the correct result!
In : %timeit (aa * bb) % pp
64 ms ± 747 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
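To see concretely why the NumPy version is wrong, compare integer multiply-and-remainder against a true carry-less field product for a single pair of elements. The modulus `0x11B` (the AES polynomial) is an illustrative stand-in; galois's default polynomial for $$\mathrm{GF}(2^8)$$ differs, but the disagreement between the two operations is the same in any case.

```python
def gf256_mul(a, b, poly=0x11B):
    """True GF(2^8) product: carry-less multiply, then polynomial reduction.
    poly=0x11B is the AES polynomial, an illustrative (non-default) choice."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    for shift in range(product.bit_length() - 1, 7, -1):
        if (product >> shift) & 1:
            product ^= poly << (shift - 8)
    return product

a, b = 0x53, 0xCA
wrong = (a * b) % 0x11B   # integer multiply + remainder, as in the NumPy timing
right = gf256_mul(a, b)   # carry-less multiply + polynomial reduction
print(wrong, right)       # 69 1 -- the two results disagree
```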


## Explicit calculation performance

This section tests galois when using the "jit-calculate" compilation mode. For finite fields with order greater than $$2^{20}$$, galois will use explicit arithmetic calculation by default rather than lookup tables.
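Conceptually, explicit calculation performs a shift-and-reduce ("Russian peasant") multiply on every element, with no tables. A rough vectorized NumPy sketch of that per-element algorithm is below, shown for $$\mathrm{GF}(2^8)$$ for brevity; the `0x11B` polynomial is an illustrative assumption, and galois actually JIT-compiles per-element code with Numba rather than running whole-array NumPy passes like this.

```python
import numpy as np

AES_POLY = 0x11B  # illustrative irreducible polynomial (the AES field);
                  # galois's default polynomial for GF(2^8) is different

def gf256_mul_vec(a, b):
    """Element-wise GF(2^8) multiply over whole arrays by shift-and-reduce.
    Eight rounds: XOR in a copy of `a` for each bit of `b`, multiplying `a`
    by x each round and reducing whenever it overflows 8 bits."""
    a = a.astype(np.uint32)
    b = b.astype(np.uint32)
    result = np.zeros_like(a)
    for _ in range(8):
        # Accumulate `a` wherever the low bit of `b` is set
        result = np.where((b & 1) != 0, result ^ a, result)
        b = b >> 1
        # Multiply `a` by x (left shift); reduce mod the polynomial on overflow
        carry = (a & 0x80) != 0
        a = (a << 1) & np.uint32(0xFF)
        a = np.where(carry, a ^ np.uint32(AES_POLY & 0xFF), a)
    return result
```

The loop runs $$m$$ rounds per multiplication, which is why explicit calculation is slower than a table lookup but still practical for large fields like $$\mathrm{GF}(2^{32})$$, where tables would be prohibitively large.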

Below are examples computing 10 million multiplications in the binary extension field $$\mathrm{GF}(2^{32})$$.

```ipython
In : import galois

In : GF = galois.GF(2**32)

In : GF.ufunc_mode
Out: 'jit-calculate'

In : a = GF.Random(10_000_000, seed=1, dtype=int)

In : b = GF.Random(10_000_000, seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In : a * b
Out:
GF([1174047800, 3249326965, 3196014003, ..., 3195457330,  100242821,
     338589759], order=2^32)

In : %timeit a * b
386 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```


The galois library, when using explicit calculation, is only ~3.9x slower than native NumPy, which isn't even computing the correct product.

```ipython
In : import numpy as np

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

In : pp = int(GF.irreducible_poly)

# This does not produce the correct result!
In : %timeit (aa * bb) % pp
100 ms ± 718 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```


## Linear algebra performance

Linear algebra performance in extension fields is noticeably slower than native NumPy. This is because, unlike with prime fields, the BLAS/LAPACK implementations cannot be used. Instead, entirely new JIT-compiled ufuncs are generated, which are not as optimized for parallelism or hardware acceleration as BLAS/LAPACK.
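The reason BLAS cannot be reused is visible in a naive sketch of the matrix product over $$\mathrm{GF}(2^m)$$: the inner product's additions are XORs and its multiplications are polynomial products, neither of which maps onto integer or floating-point dot products. Below is a minimal pure-Python illustration for $$\mathrm{GF}(2^8)$$, again using the AES polynomial `0x11B` as an illustrative (non-default) choice.

```python
def gf256_mul(a, b, poly=0x11B):
    """Scalar GF(2^8) multiply: carry-less multiply, then polynomial reduction.
    poly=0x11B is the AES polynomial, an illustrative (non-default) choice."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    for shift in range(product.bit_length() - 1, 7, -1):
        if (product >> shift) & 1:
            product ^= poly << (shift - 8)
    return product

def gf256_matmul(A, B):
    """Matrix product over GF(2^8). The accumulator uses XOR, not integer
    addition, so BLAS/LAPACK dot-product kernels cannot be applied."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0
            for t in range(k):
                acc ^= gf256_mul(A[i][t], B[t][j])
            C[i][j] = acc
    return C
```

galois generates a JIT-compiled kernel with this same structure, which is far faster than pure Python but still lacks the cache blocking and SIMD tuning of BLAS.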

Below are examples computing the matrix multiplication of two $$100 \times 100$$ matrices in the binary extension field $$\mathrm{GF}(2^{32})$$.

```ipython
In : import galois

In : GF = galois.GF(2**32)

In : GF.ufunc_mode
Out: 'jit-calculate'

In : A = GF.Random((100, 100), seed=1, dtype=int)

In : B = GF.Random((100, 100), seed=2, dtype=int)

# Invoke the ufunc once to JIT compile it, if necessary
In : A @ B
Out:
GF([[4203877556, 3977035749, 2623937858, ..., 3721257849, 4250999056,
     4026271867],
    [3120760606, 1017695431, 1111117124, ..., 1638387264, 2988805996,
     1734614583],
    [2508826906, 2800993411, 1720697782, ..., 3858180318, 2521070820,
     3906771227],
    ...,
    [ 624580545,  984724090, 3969931498, ..., 1692192269,  473079794,
     1029376699],
    [1232183301,  209395954, 2659712274, ..., 2967695343, 2747874320,
     1249453570],
    [3938433735,  828783569, 3286222384, ..., 3669775257,   33626526,
     4278384359]], order=2^32)

In : %timeit A @ B
3.88 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```


The galois library is ~5.5x slower than native NumPy (which, again, isn't computing the correct product).

```ipython
In : import numpy as np

In : AA, BB = A.view(np.ndarray), B.view(np.ndarray)

In : pp = int(GF.irreducible_poly)

# This does not produce the correct result!
In : %timeit (AA @ BB) % pp
703 µs ± 1.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```


Last update: Aug 06, 2023