# Numba Just-in-Time Compilation for Python

<br/>
<div align="center">17th of June, 2021</div>
<br/>
<div align="center">
Thomas Arildsen<br/>
<a href="mailto:tari@its.aau.dk">tari@its.aau.dk</a>
</div>
<br/>
<div align="center">
CLAAUDIA<br/>
Aalborg University
</div>

# Additional Numba features

- We have seen in previous slides how to enable Numba JIT compilation for Python functions.
- Here we introduce additional Numba features that can further help in writing high-performance Python code.

## Parallel execution

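- The `@jit` (and `@njit`) option `parallel=True` enables Numba to attempt to run the computations in parallel, for example by automatically parallelizing array expressions such as element-wise arithmetic on NumPy arrays: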

In [None]:
from numba import jit, njit, prange

# Element-wise product of X and Y, compiled without parallelization
@njit(parallel=False)
def sq_plus_const(X, Y):
    return X * Y

In [None]:
@njit(parallel=True)
def sq_plus_const_par(X, Y):
    # Same element-wise product; Numba may now split the work across threads
    return X * Y

In [None]:
import numpy as np

rng = np.random.default_rng()
A = rng.random((10000,10000))
B = rng.random((10000,10000))

In [None]:
%timeit sq_plus_const(A, B)

In [None]:
%timeit sq_plus_const_par(A, B)

## NumPy "ufuncs"

Numba provides a way of defining a special kind of function that works like NumPy's ufuncs (universal functions).
- Ufuncs are functions that define a scalar operation, which is automatically broadcast to operate element-wise on all the entries of the input arrays.
- On the one hand, they may be simpler to specify where applicable, because it is not necessary to handle indexing into the arrays manually.
- On the other hand, for eager compilation we have to include function signatures that specify the input and output types the function can handle (without signatures, Numba compiles lazily for the types it encounters at call time).

### Specifying "ufuncs"

These "ufunc-like" kernels are defined using Numba's `vectorize` decorator:

In [None]:
from numba import vectorize

@vectorize(['float32(float32, float32, float32)',
            'float64(float64, float64, float64)'])
def multiply_add(a, b, c):
    return a * b + c

Executing the "ufunc-like" kernel:

In [None]:
N = int(1e+4)
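# float16 has no compiled loop in multiply_add; NumPy casts it safely to
# float32, the first matching signature, so the result D is float32
dtype = np.float16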

# prepare the input
A = np.array(np.random.sample(N), dtype=dtype)
B = np.random.sample(N).astype(dtype=dtype)
C = np.array(np.random.sample(N), dtype=dtype)

In [None]:
B.dtype

In [None]:
D = multiply_add(A, B, C)
D

## Generalised "ufuncs"

Whereas the preceding "ufunc-like" functions specify a scalar operation that is automatically broadcast across an array, Numba also provides a "generalised" version of such functions, where the kernel operates on sub-arrays rather than on single scalars. A layout string (here `'(n),()->(n)'`) declares the dimensions of each input and output. Example borrowed from Numba's documentation:

In [None]:
from numba import guvectorize, float64

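@guvectorize([(float64[:], float64, float64[:])], '(n),()->(n)')
def gufunc(x, y, res):
    # x: 1-D array of length n, y: scalar, res: output of length n.
    # The kernel returns nothing; it fills the output array in place.
    for i in range(x.shape[0]):
        res[i] = x[i] + y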

Executing the generalised "ufunc-like" kernel:

In [None]:
a = np.arange(10000, dtype=np.float64)
%timeit gufunc(a, 2)

In [None]:
a

In [None]:
c = gufunc(a, 2)

In [None]:
c

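Automatic broadcasting: with a 2-D input, the kernel is applied to each row, since the last axis is the core dimension `n` and the leading axis is looped over: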

In [None]:
a = a.reshape(100, 100)
c = gufunc(a, 2)

In [None]:
a

In [None]:
c

## Stencil Kernels

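Numba also supports another kind of special function: the so-called stencil kernel.
- This may be familiar from image processing, where blurring and filtering kernels work the same way.
- A stencil kernel defines, for a given entry in an array, a function of a "neighborhood" of surrounding entries, using indices relative to the current entry.
- Stencil kernels are defined with the `@stencil` decorator.
- By default, output entries whose neighborhood extends beyond the array boundary are set to zero.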

In [None]:
from numba import stencil

@stencil
def kernel(a):
    # Sum of the four direct neighbors; indices are offsets from the current entry
    return a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0]

In [None]:
A = np.ones((10,10))
A

In [None]:
kernel(A)