# `Numba` Tutorial

In this tutorial, we will learn how to use Numba to speed up Python loops.

To install Numba with conda, type `conda install numba`.

In [None]:
import numpy as np
import numba as na
from numba import jit, njit, prange, set_num_threads

### 1. Numba's JIT decorator, `@jit`

First, let's consider a nested loop in Python.

Nested loops are very common in computational physics problems (e.g. the acceleration calculation in the N-body problem).

In [None]:
def native_python(N):
    value = 0
    for _ in range(N):
        for _ in range(N):
            # some physical calculations, such as acceleration. 
            value += np.tanh(123)
    return value

In [None]:
test_size = 3000

In [None]:
%timeit ans = native_python(N=test_size)

In [None]:
ans1 = native_python(N=test_size)
print(ans1)

The above function takes ~6.43 s with `N=3000` (measured on Kuo-Chuan's desktop computer).

In the above example, the calculation simply adds `np.tanh(123)` a total of `N*N` times. This is equivalent to

In [None]:
ans2 = np.sum(np.tanh(123)*np.ones(test_size*test_size))

In [None]:
print(ans1==ans2)

In [None]:
%timeit np.sum(np.tanh(123)*np.ones(test_size**2))

The same calculation takes only 19.5 ms with `np.sum()` (about a 330x speedup).

In earlier lectures, we learned that we should use `numpy` and `scipy` to avoid loops in native Python.\
However, the calculation inside a for loop may have no counterpart in `numpy` or `scipy` (or no straightforward one).
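
A recurrence is a typical case: each iteration needs the result of the previous one, so no single `numpy` call replaces the loop. The sketch below uses the logistic map purely as an illustration; the function name `logistic_map` and its parameters are hypothetical and not part of this course's code.

```python
def logistic_map(x0, r, n_steps):
    """Iterate x -> r * x * (1 - x) for n_steps steps, starting from x0."""
    x = x0
    for _ in range(n_steps):
        # Each step depends on the previous value of x,
        # so the loop cannot be rewritten as one array operation.
        x = r * x * (1.0 - x)
    return x


# x = 0.5 is a fixed point of the map for r = 2
print(logistic_map(x0=0.5, r=2.0, n_steps=10))
```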

Numba's just-in-time (JIT) decorators are one good solution.


In [None]:
@jit(nopython=True)
def numba_jit(N):
    value = 0
    for _ in range(N):
        for _ in range(N):
            value += np.tanh(123)
    return value

In [None]:
ans3 = numba_jit(N=test_size)
print(ans3)
print(ans1==ans3)

In [None]:
%timeit ans = numba_jit(N=test_size)

With `jit`, it now takes 6.47 ms, just by adding one line of code!
Note that performance can still be a bottleneck when `test_size` is large.

In [None]:
%timeit ans = numba_jit(N=(test_size*10))

The calculation time increases as N^2.
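
As a rough check of this scaling, one can extrapolate from the 6.47 ms measured at `N=3000`: a tenfold larger `N` means 100 times as many iterations. The snippet below is only back-of-the-envelope arithmetic with the numbers quoted in this tutorial, not a new measurement; the variable names are made up for the illustration.

```python
t_small_ms = 6.47    # measured above for N = 3000 with `jit`
scale = 10           # N grows from 3000 to 30000

# The work grows as N**2, so the estimated run time grows by scale**2.
t_large_ms = t_small_ms * scale**2
print(f"estimated time for N = 30000: {t_large_ms:.0f} ms")
```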

We can improve it further with `njit(parallel=True)` and `prange` (`njit` itself is just shorthand for `jit(nopython=True)`).

In [None]:
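@njit(parallel=True)
def numba_njit_parallel(N):
    value = 0
    # Numba parallelizes only the outermost prange loop
    for i in prange(N):
        # the inner loop runs serially within each thread
        for j in range(N):
            value += np.tanh(123)
    return value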

Note that in the above example, we could not use `for _ in prange(N)`, because `_` is not recognized by Numba in parallel loops.

In [None]:
ans4 = numba_njit_parallel(N=test_size)
print(ans1==ans4)

In [None]:
set_num_threads(4)

In [None]:
%timeit ans = numba_njit_parallel(N=(test_size*10))

It took 161 ms with 4 threads (about a 4x speedup over the single-threaded `jit` version).

# Exercises

## Exercise 1: Use Numba `jit` and `njit` to speed up the Pi calculation.

Compare your solutions with `numpy`.

## Exercise 2: Speed up your N-body solver.

Now, move back to `2_nbody.ipynb`. Let's speed up our `nbody.py` solver with numba.