### Numba

Numba is an alternative to Cython that does a similar job, with a twist: Numba uses the LLVM toolchain to compile Python code straight to machine code, bypassing the C step altogether. For more details, see [this talk](https://www.youtube.com/watch?v=QpaapVaL8Fw) by Travis Oliphant.

Our workhorse here will be the `jit` function. We'll also use NumPy below:

In [14]:
from numba import jit
import numpy as np  # the cells below refer to NumPy as `np`

`jit` takes a function as input and returns a JIT-compiled version of that same function.

For our first example, we will reconsider the `fib` function from our Cython notebook:

In [15]:
def fib(n):
    a, b = 1, 1
    for i in range(n):
        a, b = a+b, a

    return a

To JIT-compile it, we call `jit` on the function, producing a new function:

In [16]:
fib_numba = jit(fib)

In [17]:
time_fib = %timeit -o fib(10)
time_numba = %timeit -o fib_numba(10)
print(time_fib.best/time_numba.best)

1000000 loops, best of 3: 1.02 µs per loop
The slowest run took 220699.94 times longer than the fastest. This could mean that an intermediate result is being cached 
10000000 loops, best of 3: 166 ns per loop
6.152283773751462


That's roughly a 6x speedup: good, but not nearly as good as the best case we saw in the Cython notebook!

To see where `jit` really shines, let's instead consider a function that calculates the Euclidean distance between every pair of rows in a matrix:

In [18]:
def pdist_numpy(xs):
    return np.sqrt(((xs[:,None,:] - xs)**2).sum(-1))
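
The one-liner relies on NumPy broadcasting. A small sketch with a 3×2 matrix (values chosen only for illustration) shows the intermediate shapes:

```python
import numpy as np

xs = np.arange(6, dtype=np.float64).reshape(3, 2)  # 3 rows (points), 2 columns (features)

# xs[:, None, :] has shape (3, 1, 2); xs has shape (3, 2) and broadcasts as (1, 3, 2).
diff = xs[:, None, :] - xs
print(diff.shape)  # (3, 3, 2), with diff[i, j] == xs[i] - xs[j]

# Squaring and summing over the last axis gives squared distances, shape (3, 3).
D = np.sqrt((diff ** 2).sum(-1))
print(D)
```

Note that `diff` holds n·n·p numbers, so this version trades memory for vectorization.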

In [19]:
pdist_numba = jit(pdist_numpy)

In [23]:
%load_ext Cython

In [26]:
%%cython
import numpy as cnp
def pdistx(xs):
    return cnp.sqrt(((xs[:,None,:] - xs)**2).sum(-1))

In [30]:
time_pdist_numpy = %timeit -o pdist_numpy(np.random.randn(5, 100))
time_pdist_numba = %timeit -o pdist_numba(np.random.randn(5, 100))
time_pdistx = %timeit -o pdistx(np.random.randn(5, 100))
print(time_pdist_numpy.best/time_pdist_numba.best)
print(time_pdist_numpy.best/time_pdistx.best)

The slowest run took 5.37 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 29.6 µs per loop
10000 loops, best of 3: 30.1 µs per loop
10000 loops, best of 3: 28.5 µs per loop
0.9829359330921231
1.0364290550227209
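
Each timed statement above also calls `np.random.randn`, so every measurement includes the cost of generating the input. A sketch using the standard-library `timeit` module that compares that against building the input once (the variable names are illustrative):

```python
import timeit
import numpy as np

def pdist_numpy(xs):
    return np.sqrt(((xs[:, None, :] - xs) ** 2).sum(-1))

X = np.random.randn(5, 100)  # input built once, outside the timed call

# Best of 3 repeats, 1000 calls each, converted to seconds per call.
per_call_with_randn = min(timeit.repeat(lambda: pdist_numpy(np.random.randn(5, 100)),
                                        number=1000, repeat=3)) / 1000
per_call_fixed_input = min(timeit.repeat(lambda: pdist_numpy(X),
                                         number=1000, repeat=3)) / 1000

print(f"randn inside the timed call: {per_call_with_randn * 1e6:.1f} µs")
print(f"input built once:            {per_call_fixed_input * 1e6:.1f} µs")
```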


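Wrapping the vectorized function in `jit`, or compiling it with Cython, buys essentially nothing here (ratios of about 1.0): the heavy lifting already happens inside NumPy's compiled routines. Consider instead writing the most naive Python implementation you can think of: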

In [32]:
def pdist_python(xs):
    n, p = xs.shape
    D = np.empty((n, n), dtype=np.float64)  # np.float was removed in NumPy 1.24
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(p):
                tmp = xs[i,k] - xs[j,k]
                s += tmp * tmp
            D[i, j] = s**0.5
    return D
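
Before comparing timings, it is worth confirming that the loop version computes the same matrix as the broadcasting one. A minimal check, using a seeded generator so the input is reproducible:

```python
import numpy as np

def pdist_numpy(xs):
    return np.sqrt(((xs[:, None, :] - xs) ** 2).sum(-1))

def pdist_python(xs):
    n, p = xs.shape
    D = np.empty((n, n), dtype=np.float64)
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(p):
                tmp = xs[i, k] - xs[j, k]
                s += tmp * tmp
            D[i, j] = s ** 0.5
    return D

X = np.random.default_rng(0).standard_normal((5, 100))
# Floating-point sums may differ in the last bits, so compare with a tolerance.
assert np.allclose(pdist_python(X), pdist_numpy(X))
```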

In [33]:
time_pdist_python = %timeit -o pdist_python(np.random.randn(5, 100))
print(time_pdist_python.best/time_pdist_numpy.best)

1000 loops, best of 3: 1.18 ms per loop
39.92996463672706


In [34]:
pdist_python_numba = jit(pdist_python)

In [37]:
time_pdist_python_numba = %timeit -o pdist_python_numba(np.random.randn(5, 100))
print(time_pdist_numpy.best/time_pdist_python_numba.best)

The slowest run took 5.75 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 19.1 µs per loop
1.5457968737729024


In [39]:
from scipy.spatial.distance import cdist

In [44]:
X = np.random.randn(5, 100)
time_cdist = %timeit -o cdist(X, X)
print(time_pdist_python_numba.best/time_cdist.best)

10000 loops, best of 3: 17.7 µs per loop
1.0827706972957536
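
`cdist(X, X)` fills in every entry of the square matrix, so each distance is computed twice and the diagonal is computed too. SciPy's `pdist` computes each unordered pair once and returns a condensed 1-D vector, which `squareform` expands back into the full symmetric matrix. A sketch checking that the two routes agree (both default to the Euclidean metric):

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

X = np.random.default_rng(0).standard_normal((5, 100))

full = cdist(X, X)              # (5, 5): every ordered pair, diagonal included
condensed = pdist(X)            # (10,): the 5*4/2 distances with i < j
square = squareform(condensed)  # back to a symmetric (5, 5) matrix with zero diagonal

print(full.shape, condensed.shape, square.shape)  # (5, 5) (10,) (5, 5)
assert np.allclose(full, square)
```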
