## Numba

### What is Numba?

Numba is a just-in-time (JIT) compiler for Python that:
- generates optimized machine code using the LLVM compiler
- integrates well with the scientific Python stack (NumPy)

<img align="right" src="https://i.ibb.co/C2qPRJY/img1.png" width="400" height="400">

<div style="text-align: left"> 

    Interpretation: 
        1. Compile to bytecode
        2. Interpret in a virtual machine
            Ex: Python, Java
    
    Ahead of time compilation:
        1. Compile to binary code
        2. Execute on hardware
            Ex: C, C++

</div>
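As a rough illustration of the "compile to bytecode" step listed above, the sketch below uses the standard-library `dis` module to display the bytecode CPython produces for a small function. The function `add` is a hypothetical example that is not part of the original material, and the exact instructions printed vary between Python versions.

```python
import dis

# Hypothetical example function, used only to have something to disassemble.
def add(a, b):
    return a + b

# CPython has already compiled the function body to bytecode,
# which is stored as a bytes object on the code object.
bytecode = add.__code__.co_code
print(type(bytecode), len(bytecode))

# Disassemble to see the instructions the virtual machine interprets.
dis.dis(add)

# The same information as a list of instruction names.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

A JIT compiler such as Numba takes the alternative route for selected functions: it translates them to machine code at run time, so they are no longer interpreted one instruction at a time.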
    

### Just-in-time compilation
<img src="https://i.ibb.co/nmtzCfP/img2.png" width="400" height="400">


#### Array sum
The function below is a naive implementation that sums a 2D array with nested Python loops.

In [1]:
import numpy as np

def sum_array(arr):
    r, c = arr.shape
    
    mysum = 0
    for i in range(r):
        for j in range(c):
            mysum += arr[i, j]
            
    return mysum

In [2]:
arr = np.random.random((300, 300))

In [3]:
sum_array(arr)

45094.22268032044

In [4]:
plain = %timeit -o sum_array(arr)

18.1 ms ± 182 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [5]:
from numba import jit

### As a function call

In [6]:
sum_array_numba = jit()(sum_array)

In [7]:
jitted = %timeit -o sum_array_numba(arr)

81.7 µs ± 6.83 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
plain.best / jitted.best

131.7665637399548

### As a decorator

In [9]:
# Same function as before, compiled by applying jit as a decorator
@jit
def sum_array(arr):
    r, c = arr.shape

    mysum = 0
    for i in range(r):
        for j in range(c):
            mysum += arr[i, j]

    return mysum


In [10]:
sum_array(arr)

45028.875783920965

In [11]:
%timeit sum_array(arr)

83.4 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


### Defining ufuncs using vectorize

* ufuncs (universal functions) are functions that operate element-wise, applying the same operation at every position of an array.
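To make the element-wise behaviour concrete, here is a small sketch that uses only NumPy's built-in ufuncs. The array `x` and the variable names are illustrative and do not come from the original notebook.

```python
import numpy as np

x = np.array([[0.0, 1.0],
              [2.0, 3.0]])

# NumPy's built-in math functions are ufunc objects.
print(isinstance(np.exp, np.ufunc))

# Applied to an array, a ufunc computes one output per input element.
elementwise = np.exp(x)
print(elementwise)

# Scalars and compatible shapes are broadcast against array arguments.
shifted = np.add(x, 10.0)
print(shifted)
```

Numba's `vectorize` builds the same kind of object from a scalar Python function, which is why the function `f` in the next cells is written for single numbers rather than arrays.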

In [5]:
import math

def f(a, b):
    return math.sin(a**2) * math.exp(b)


In [13]:
f(1,1)

2.2873552871788423

In [9]:
a = np.ones((5,5))
b = np.ones((5,5))

In [10]:
from numba import vectorize

In [11]:
vec_f = vectorize()(f)

vec_f(a,b)

array([[2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
       [2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
       [2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
       [2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529],
       [2.28735529, 2.28735529, 2.28735529, 2.28735529, 2.28735529]])

#### How does it compare to just using NumPy? 

In [17]:
def numpy_f(a, b):
    return np.sin(a**2) * np.exp(b)


In [18]:
a = np.random.random((1000, 1000))
b = np.random.random((1000, 1000))

In [19]:
%timeit f_a_b = [[math.sin(1) * math.exp(1)] * 5 for _ in range(5)]

1.56 µs ± 59.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [20]:
%timeit vec_f(a, b)

14.6 ms ± 834 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
%timeit numpy_f(a, b)

What happens if we specify a signature?

In [21]:
vec_f = vectorize('float64(float64, float64)')(f)

In [22]:
%timeit vec_f(a, b)

14.1 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [23]:
vec_f = vectorize('float64(float64, float64)', target='parallel')(f)

In [24]:
%timeit vec_f(a, b)

3.7 ms ± 76.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
