GF(2^m) speed tests

[1]:
import numpy as np
import galois
[2]:
GF = galois.GF(2**13)
print(GF.properties)
GF(2^13):
  characteristic: 2
  degree: 13
  order: 8192
  irreducible_poly: Poly(x^13 + x^4 + x^3 + x + 1, GF(2))
  is_primitive_poly: True
  primitive_element: GF(2, order=2^13)
  dtypes: ['uint16', 'uint32', 'int16', 'int32', 'int64']
  ufunc_mode: 'jit-lookup'
  ufunc_target: 'cpu'
[3]:
modes = GF.ufunc_modes
targets = GF.ufunc_targets
targets.remove("cuda")  # No GPU on this machine, so skip the "cuda" target
print(modes)
print(targets)
['jit-lookup', 'jit-calculate']
['cpu', 'parallel']
[4]:
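def speed_test(GF, N):
    """Time ufuncs on N random elements for every target/mode pair.

    Uses the `modes` and `targets` lists defined above and the IPython %timeit magic.
    """
    a = GF.Random(N)
    b = GF.Random(N, low=1)  # Nonzero elements, so reciprocal and log are defined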

    for operation in [np.add, np.multiply]:
        print(f"Operation: {operation.__name__}")
        for target in targets:
            for mode in modes:
                GF.compile(mode, target)
                print(f"Target: {target}, Mode: {mode}", end="\n    ")
                %timeit operation(a, b)
        print()

    for operation in [np.reciprocal, np.log]:
        print(f"Operation: {operation.__name__}")
        for target in targets:
            for mode in modes:
                GF.compile(mode, target)
                print(f"Target: {target}, Mode: {mode}", end="\n    ")
                %timeit operation(b)
        print()

N = 10k

[5]:
speed_test(GF, 10_000)
Operation: add
Target: cpu, Mode: jit-lookup
    188 µs ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Target: cpu, Mode: jit-calculate
    95.6 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: parallel, Mode: jit-lookup
    61.4 ms ± 4.82 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-calculate
    61.8 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Operation: multiply
Target: cpu, Mode: jit-lookup
    148 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    646 µs ± 73.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Target: parallel, Mode: jit-lookup
    59.4 ms ± 4.54 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-calculate
    59.9 ms ± 4.29 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Operation: reciprocal
Target: cpu, Mode: jit-lookup
    113 µs ± 7.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    9.86 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
Target: parallel, Mode: jit-lookup
    The slowest run took 189.29 times longer than the fastest. This could mean that an intermediate result is being cached.
    4.88 ms ± 6.96 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-calculate
    59.4 ms ± 4.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Operation: log
Target: cpu, Mode: jit-lookup
    138 µs ± 19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Target: cpu, Mode: jit-calculate
    63.1 ms ± 3.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-lookup
    58.2 ms ± 1.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Target: parallel, Mode: jit-calculate
    69.4 ms ± 2.95 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
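
The `%timeit` magic used in `speed_test` only works inside IPython or Jupyter. To rerun these measurements from a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper name `time_call` and its return format are assumptions made for illustration; it is not part of galois or of the original notebook, and its loop-count heuristic is not guaranteed to match the one `%timeit` uses.

```python
import statistics
import timeit


def time_call(func, repeat=7):
    """Return (mean, stdev, loops) for the per-call time of func, in seconds.

    Hypothetical helper: roughly mirrors the "mean ± std. dev. of 7 runs"
    summary that %timeit prints, using only the standard library.
    """
    timer = timeit.Timer(func)
    # autorange() picks a loop count so that one run takes at least 0.2 s
    loops, _ = timer.autorange()
    # Each entry is the total time for `loops` calls; divide to get per-call time
    totals = timer.repeat(repeat=repeat, number=loops)
    per_call = [total / loops for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call), loops
```

For example, `time_call(lambda: np.add(a, b))` would stand in for `%timeit operation(a, b)`. The returned times are in seconds, so multiply by 1e6 to compare against the µs figures above.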