# Performance compared with native NumPy¶

To compare the performance of `galois` and native NumPy, we’ll use a prime field `GF(p)`. This is because it is the simplest field. Namely, addition, subtraction, and multiplication are modulo `p`, which can be simply computed with NumPy arrays `(x + y) % p`. For extension fields `GF(p^m)`, the arithmetic is computed using polynomials over `GF(p)` and can’t be so tersely expressed in NumPy.

## Lookup performance¶

For fields with order less than or equal to `2^20`, `galois` uses lookup tables for efficiency. Here is an example of multiplying two arrays in `GF(31)` using native NumPy and `galois` with `ufunc_mode="jit-lookup"`.

```In : import numpy as np

In : import galois

In : GF = galois.GF(31)

In : GF.ufunc_mode
Out: 'jit-lookup'

In : a = GF.Random(10_000, dtype=int)

In : b = GF.Random(10_000, dtype=int)

In : %timeit a * b
79.7 µs ± 1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

# Equivalent calculation of a * b using native numpy implementation
In : %timeit (aa * bb) % GF.order
96.6 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

The `galois` ufunc runtime has a floor, however. This is due to a requirement to `view` the output array and convert its dtype with `astype()`. For example, for small array sizes NumPy is faster than `galois` because it doesn’t need to do these conversions.

```In : a = GF.Random(10, dtype=int)

In : b = GF.Random(10, dtype=int)

In : %timeit a * b
45.1 µs ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

# Equivalent calculation of a * b using native numpy implementation
In : %timeit (aa * bb) % GF.order
1.52 µs ± 34.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

However, for large N `galois` is strictly faster than NumPy.

```In : a = GF.Random(10_000_000, dtype=int)

In : b = GF.Random(10_000_000, dtype=int)

In : %timeit a * b
59.8 ms ± 1.64 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

# Equivalent calculation of a * b using native numpy implementation
In : %timeit (aa * bb) % GF.order
129 ms ± 8.01 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

## Calculation performance¶

For fields with order greater than `2^20`, `galois` will use explicit arithmetic calculation rather than lookup tables. Even in these cases, `galois` is faster than NumPy!

Here is an example multiplying two arrays in `GF(2097169)` using NumPy and `galois` with `ufunc_mode="jit-calculate"`.

```In : import numpy as np

In : import galois

In : GF = galois.GF(2097169)

In : GF.ufunc_mode
Out: 'jit-calculate'

In : a = GF.Random(10_000, dtype=int)

In : b = GF.Random(10_000, dtype=int)

In : %timeit a * b
68.2 µs ± 2.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

# Equivalent calculation of a * b using native numpy implementation
In : %timeit (aa * bb) % GF.order
93.4 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

And again, the runtime comparison with NumPy improves with large N because the time of viewing and type converting the output is small compared to the computation time. `galois` achieves better performance than NumPy because the multiplication and modulo operations are compiled together into one ufunc rather than two.

```In : a = GF.Random(10_000_000, dtype=int)

In : b = GF.Random(10_000_000, dtype=int)

In : %timeit a * b
51.2 ms ± 1.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In : aa, bb = a.view(np.ndarray), b.view(np.ndarray)

# Equivalent calculation of a * b using native numpy implementation
In : %timeit (aa * bb) % GF.order
111 ms ± 1.48 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

## Linear algebra performance¶

Linear algebra over Galois fields is highly optimized. For prime fields `GF(p)`, the performance is comparable to the native NumPy implementation (using BLAS/LAPACK).

```In : import numpy as np

In : import galois

In : GF = galois.GF(31)

In : A = GF.Random((100,100), dtype=int)

In : B = GF.Random((100,100), dtype=int)

In : %timeit A @ B
720 µs ± 5.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In : AA, BB = A.view(np.ndarray), B.view(np.ndarray)

# Equivalent calculation of A @ B using the native numpy implementation
In : %timeit (AA @ BB) % GF.order
777 µs ± 4.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

For extension fields `GF(p^m)`, the performance of `galois` is close to native NumPy linear algebra (about 10x slower). However, for extension fields, each multiplication operation is equivalently a convolution (polynomial multiplication) of two `m`-length arrays and polynomial remainder division with the irreducible polynomial. So it’s not an apples-to-apples comparison.

Below is a comparison of `galois` computing the correct matrix multiplication over `GF(2^8)` and NumPy computing a normal integer matrix multiplication (which is not the correct result!). This comparison is just for a performance reference.

```In : import numpy as np

In : import galois

In : GF = galois.GF(2**8)

In : A = GF.Random((100,100), dtype=int)

In : B = GF.Random((100,100), dtype=int)

In : %timeit A @ B
7.13 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In : AA, BB = A.view(np.ndarray), B.view(np.ndarray)

# Native numpy matrix multiplication, which doesn't produce the correct result!!
In : %timeit AA @ BB
651 µs ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```