GF(p) speed tests

[1]:
import numpy as np
import galois
[2]:
prime = galois.next_prime(8000)

GF = galois.GF(prime)
print(GF.properties)
GF(8009):
  characteristic: 8009
  degree: 1
  order: 8009
  irreducible_poly: Poly(x + 8006, GF(8009))
  is_primitive_poly: True
  primitive_element: GF(3, order=8009)
  dtypes: ['uint16', 'uint32', 'int16', 'int32', 'int64']
  ufunc_mode: 'jit-lookup'
  ufunc_target: 'cpu'
[3]:
modes = GF.ufunc_modes
targets = GF.ufunc_targets
targets.remove("cuda")  # No GPU on this machine, so skip the CUDA target
print(modes)
print(targets)
['jit-lookup', 'jit-calculate']
['cpu', 'parallel']
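
The two modes listed above trade memory for arithmetic. As a rough illustration, and not galois's actual implementation, the pure-Python sketch below multiplies in GF(8009) two ways: through exponential/logarithm tables built from the primitive element 3 reported by GF.properties, and by direct modular arithmetic. It also includes a brute-force discrete logarithm, which is one plausible reason a calculated logarithm costs far more than a table read. All names (P, ALPHA, EXP, LOG, multiply_lookup, multiply_calculate, log_bruteforce) are hypothetical.

```python
# Illustrative sketch only; these names are not part of the galois API.
P = 8009   # Field order, taken from the GF.properties output above
ALPHA = 3  # Primitive element, taken from the GF.properties output above

# Exponential table: EXP[i] = ALPHA**i mod P for i = 0 .. P-2
EXP = [1] * (P - 1)
for i in range(1, P - 1):
    EXP[i] = (EXP[i - 1] * ALPHA) % P

# Logarithm table: LOG[ALPHA**i mod P] = i
LOG = {x: i for i, x in enumerate(EXP)}


def multiply_lookup(a, b):
    """Multiply via tables: a * b = ALPHA**(log(a) + log(b))."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % (P - 1)]


def multiply_calculate(a, b):
    """Multiply with explicit modular arithmetic, no tables."""
    return (a * b) % P


def log_bruteforce(a):
    """Find i with ALPHA**i == a (mod P) by trying every exponent in turn."""
    x = 1
    for i in range(P - 1):
        if x == a:
            return i
        x = (x * ALPHA) % P
    raise ValueError(f"{a} has no discrete logarithm in GF({P})")
```

Both multiplication routines agree on every pair of field elements; the difference is that the table version needs memory proportional to the field order, while the brute-force logarithm may loop up to P - 1 times per element.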
[4]:
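def speed_test(GF, N):
    """Time ufuncs on N random elements of GF for each pair in the global `targets` and `modes`."""
    a = GF.Random(N)
    b = GF.Random(N, low=1)  # Nonzero elements only: 0 has no reciprocal or logarithm

    # Binary operations
    for operation in [np.add, np.multiply]:
        print(f"Operation: {operation.__name__}")
        for target in targets:
            for mode in modes:
                GF.compile(mode, target)  # Recompile the ufuncs for this mode and target
                print(f"Target: {target}, Mode: {mode}", end="\n    ")
                %timeit operation(a, b)
        print()

    # Unary operations
    for operation in [np.reciprocal, np.log]:
        print(f"Operation: {operation.__name__}")
        for target in targets:
            for mode in modes:
                GF.compile(mode, target)  # Recompile the ufuncs for this mode and target
                print(f"Target: {target}, Mode: {mode}", end="\n    ")
                %timeit operation(b)
        print()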
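
%timeit is an IPython magic, so speed_test only runs inside IPython or Jupyter. For a plain Python script, a small helper built on the standard-library timeit module could report similar statistics. This is a sketch; time_call and its parameters are hypothetical names, not part of galois or the original notebook.

```python
import statistics
import timeit


def time_call(func, *args, repeat=7, number=1000):
    """Return (mean, stdev) in seconds per call.

    Runs `repeat` batches of `number` calls each, mirroring the
    "7 runs, N loops each" layout that %timeit prints.
    """
    timer = timeit.Timer(lambda: func(*args))
    # Each entry of timer.repeat() is the total time for one batch
    per_call = [total / number for total in timer.repeat(repeat=repeat, number=number)]
    return statistics.mean(per_call), statistics.stdev(per_call)
```

For example, `time_call(np.multiply, a, b)` would stand in for `%timeit operation(a, b)`, at the cost of choosing `number` by hand instead of letting %timeit calibrate it automatically.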

N = 10,000

[5]:
speed_test(GF, 10_000)
Operation: add
Target: cpu, Mode: jit-lookup
    104 µs ± 1.57 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    71.3 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-lookup
    The slowest run took 436.22 times longer than the fastest. This could mean that an intermediate result is being cached.
10.2 ms ± 18.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-calculate
    The slowest run took 24.46 times longer than the fastest. This could mean that an intermediate result is being cached.
3.41 ms ± 2.38 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Operation: multiply
Target: cpu, Mode: jit-lookup
    93.1 µs ± 1.33 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    72.1 µs ± 1.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-lookup
    163 µs ± 19.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-calculate
    2.5 ms ± 680 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Operation: reciprocal
Target: cpu, Mode: jit-lookup
    67.3 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    6.01 ms ± 61.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Target: parallel, Mode: jit-lookup
    152 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-calculate
    11.1 ms ± 1.05 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Operation: log
Target: cpu, Mode: jit-lookup
    75.3 µs ± 242 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    149 ms ± 846 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-lookup
    175 µs ± 16.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-calculate
    56.2 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)