# Using numba
It is difficult to avoid loops when doing multi-body simulations.  However, loops in Python are SSSLLLLOOOOOWWWWW.  To illustrate this, let's implement matrix multiplication as a Python function.

In [1]:
import numpy as np
A = np.random.randn(100,100)
B = np.random.randn(100,100)

def matmat(A,B):
    # Naive O(n^3) product; assumes A and B are square with the same shape
    C = np.zeros_like(A)
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i,k] += A[i,j]*B[j,k]
    return C
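
Before timing anything, it is worth confirming that the triple loop actually computes a matrix product. The following is a minimal sketch of such a check: it restates the same loop under the illustrative name `matmat_loops`, and the matrix size (20) and random seed are arbitrary choices made here to keep the pure-Python loop quick.

```python
import numpy as np

def matmat_loops(A, B):
    """Triple-loop matrix product, using the same loop order as matmat."""
    C = np.zeros_like(A)
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, k] += A[i, j] * B[j, k]
    return C

# Small matrices keep the pure-Python loop fast; size and seed are arbitrary.
rng = np.random.default_rng(0)
A_small = rng.standard_normal((20, 20))
B_small = rng.standard_normal((20, 20))

# np.allclose tolerates the tiny floating-point differences that arise
# because NumPy may sum the products in a different order.
matches = np.allclose(matmat_loops(A_small, B_small), A_small @ B_small)
print(matches)
```

Comparing with `np.allclose` rather than `==` is deliberate: two correct implementations can differ in the last few bits of each entry.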


We can time it with the IPython magic command %timeit.

In [6]:
%timeit matmat(A,B)

968 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


Let's keep that number in mind.  Now let's try NumPy's built-in matrix multiplication, np.dot.

In [10]:
%timeit np.dot(A,B)

104 µs ± 17.4 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


That is roughly 10,000 times faster (968 ms versus 104 µs), which is bad news for our hand-written loop.  Now let's try using numba to compile it.

In [14]:
import numba
# nopython=True compiles the whole function to machine code, with no fallback to Python objects
@numba.jit(nopython=True)
def matmat_numba(A,B):
    C = np.zeros_like(A)
    n = A.shape[0]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i,k] += A[i,j]*B[j,k]
    return C

In [15]:
%timeit matmat_numba(A,B)

2.32 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


Still slower than the optimized BLAS routines that NumPy calls under the hood (hand-tuned, cache-blocked, vectorized kernels that are often multithreaded as well), but about 400 times faster than uncompiled Python.
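
The speedups can be checked directly from the three timings. This small sketch uses only the numbers printed by %timeit above, converted to seconds; the variable names are illustrative.

```python
# Timings copied from the %timeit outputs above, in seconds.
t_python = 968e-3   # matmat, pure Python loops
t_numpy = 104e-6    # np.dot
t_numba = 2.32e-3   # matmat_numba

print(f"np.dot vs. Python loops: {t_python / t_numpy:,.0f}x")
print(f"numba vs. Python loops:  {t_python / t_numba:,.0f}x")
print(f"np.dot vs. numba:        {t_numba / t_numpy:,.0f}x")
```

On these particular measurements the ratios come out to roughly 9,300, 417, and 22 respectively; the exact values will vary from run to run and from machine to machine.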

You should employ this whenever you can!  For a little bit more depth, check out the numba tutorial: https://numba.pydata.org/numba-doc/latest/user/5minguide.html