## Lecture 6  Part 1
## Introduction to Numba
### March 8, 2021

## Reminder

**Before next week**: send me email with 5 members of your project group!

**Next week**: project proposal presentations (likely remote only), 5 min for each group (+ a few min for questions)

Part of this lecture is based on the material at [https://nyu-cds.github.io/python-numba/](https://nyu-cds.github.io/python-numba/).

You will need the Numba package for this lecture (the Anaconda distribution already includes it): 

[https://anaconda.org/numba/numba](https://anaconda.org/numba/numba)

----
Numba provides the ability to speed up applications with high performance functions written directly in Python, rather than using language extensions such as Cython.

Numba allows the compilation of selected portions of pure Python code to native code, and generates optimized machine code.

With a few simple annotations, array-oriented and math-heavy Python code can be **just-in-time (JIT)** optimized to achieve performance similar to C and C++, without having to switch languages or Python interpreters.

Numba’s main features are:

- On-the-fly code generation (at import time or runtime, at the user’s preference)  
- Native code generation for the CPU (default) and GPU hardware  
- Integration with the Python scientific software stack (thanks to NumPy)  



Numba’s central feature is the **numba.jit()** decorator, which marks a function for optimization by Numba’s JIT compiler (take a moment to recall the function decorators we covered earlier).
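As a quick refresher on decoration, the sketch below uses a hypothetical `count_calls` decorator (it is not part of Numba) to show that writing `@decorator` above a definition is shorthand for rebinding the function name to the decorator's return value. `@jit` is applied the same way, except that the object it returns is a compiled dispatcher rather than a plain wrapper.

```python
import functools

def count_calls(func):
    """Hypothetical decorator: wraps func and counts how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def square(x):
    return x * x

def cube(x):
    return x * x * x

# Equivalent to placing @count_calls above the definition of cube
cube = count_calls(cube)

print(square(3), square.calls)
print(cube(2), cube.calls)
```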

Let's start with a simple example:

In [None]:
import numpy as np

# an array of 1000 floats 0.0 to 9.99
original_array = np.arange(0.0, 10.0, 0.01, dtype='float')

shuffled_array = original_array.copy()
np.random.shuffle(shuffled_array)

sorted_array = shuffled_array.copy()

In [None]:
# bubblesort as pure python code

def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [None]:
%timeit -n 10 sorted_array[:] = shuffled_array[:]; bubblesort(sorted_array)

print("length:", len(original_array))
print("original: ", original_array[:10])
print("shuffled: ", shuffled_array[:10])
print("sorted:   ", sorted_array[:10])

Now we simply use the **@jit** decorator and let Numba decide when and how to optimize:

In [None]:
from numba import jit

@jit
def bubblesort_numba(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [None]:
%timeit -n 10 sorted_array[:] = shuffled_array[:]; bubblesort_numba(sorted_array)

### Function Signature

- It is also possible to specify the signature of the Numba function. A function signature describes **the types of the arguments** and **the return type of the function**. 

- This can produce **slightly** faster code as the compiler does not need to infer the types. 

- However, the function can then no longer accept other argument types. The types specified within @jit are called the function _signature_.

In [None]:
from numba import jit, int32, float64

@jit(float64(int32, int32))
def f(x, y):
    # A somewhat trivial example
    return (x + y) / 3.14

In [None]:
f(2, 2)

In [None]:
from numba import jit, float64

@jit("void(float64[:])")
def bubblesort_numba_argtypes(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [None]:
%timeit -n 10 sorted_array[:] = shuffled_array[:]; bubblesort_numba_argtypes(sorted_array)

---
### Compilation Modes
Numba has two compilation modes: 
- nopython mode 
- object mode

**nopython mode**: 

The Numba compiler generates code that does not access the Python C API. This mode produces the highest-performance code, but requires that the native types of all values in the function can be inferred.


**object mode**:

The Numba compiler generates code that handles all values as Python objects and uses the Python C API to perform all operations on those objects. Code compiled in object mode will often run no faster than interpreted Python code. This mode is used when the types of some variables cannot be inferred.


A typical approach is to force the **nopython** mode, so that Numba raises an error when nopython compilation is not possible.

In [None]:
from numba import jit, float64

@jit("void(float64[:])", nopython=True)
def bubblesort_nopython_flag(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [None]:
%timeit -n 10 sorted_array[:] = shuffled_array[:]; bubblesort_nopython_flag(sorted_array)

Notice that this code compiles cleanly. However, if we introduce an object whose type cannot be inferred, Numba raises an error.

In [None]:
from decimal import Decimal

@jit("void(float64[:])", nopython=True)
def bubblesort_nopython_flag(X):
    N = len(X)
    val = Decimal(100)  # just to force an error
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

In [None]:
%timeit -n 10 sorted_array[:] = shuffled_array[:]; bubblesort_nopython_flag(sorted_array)

### Calling other functions
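Numba functions can call other Numba functions. Both functions must have the **@jit** decorator: an undecorated callee cannot be compiled into the caller, so the code will be much slower in object mode and will fail to compile in nopython mode.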

In [None]:
import numpy as np

original = np.arange(0.0, 10.0, 0.01, dtype='float')
shuffled = original.copy()
np.random.shuffle(shuffled)

# named sorted_arr to avoid shadowing the built-in sorted()
sorted_arr = shuffled.copy()

In [None]:
from numba import jit, float64

@jit("void(float64[:])", nopython=True)
def bubblesort_ff(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

@jit("void(float64[:])", nopython=True)
def do_sort(X):
    # calls another jitted function from compiled code
    bubblesort_ff(X)

In [None]:
%timeit -n 10 sorted_arr[:] = shuffled[:]; do_sort(sorted_arr)

---
###  NumPy Universal Functions ([ufunc](https://docs.scipy.org/doc/numpy-1.10.0/reference/ufuncs.html#universal-functions-ufunc))


---
- Examples of NumPy ufuncs include add(), multiply(), and sin().  

- These functions **operate on ndarrays** in an **element-by-element** fashion, supporting array broadcasting, type casting, and several other standard features.
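The features listed above can be seen with NumPy's built-in ufuncs alone. This short sketch (array names are illustrative) shows element-by-element evaluation with broadcasting, type casting of mixed inputs, and a ufunc method.

```python
import numpy as np

a = np.arange(3)                    # shape (3,), integer dtype
col = np.array([[10.0], [20.0]])    # shape (2, 1), float64

# Element-by-element with broadcasting: (2, 1) and (3,) give (2, 3)
summed = np.add(col, a)
print(summed.shape, summed.dtype)

# Type casting: the integer input is promoted to float64
halved = np.multiply(a, 0.5)
print(halved.dtype)

# ufuncs also carry methods such as reduce
print(np.add.reduce(a))
```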

---
- Numba’s **@vectorize** decorator allows Python functions taking scalar input arguments to be used as **NumPy ufuncs**.   


- (Creating a traditional NumPy ufunc is not the most straightforward process and involves writing some C code. Numba makes this easy.)  


- Using the @vectorize decorator, Numba can compile a pure Python function into a ufunc that operates over NumPy arrays as fast as traditional ufuncs written in C.  

The @vectorize decorator has two modes of operation:

- **Eager**, or decoration-time, compilation. If you pass one or more type signatures to the decorator, you will be building a NumPy ufunc. We’re just going to consider eager compilation here.
- **Lazy**, or call-time, compilation. When not given any signatures, the decorator will give you a Numba dynamic universal function (DUFunc) that dynamically compiles a new kernel when called with a previously unsupported input type.  


**Using @vectorize, you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs.**   


In [None]:
import numpy as np
from numba import vectorize, int64

@vectorize([int64(int64, int64)])
def vec_add_vectorize(x, y):
    return x + y


In [None]:
a = np.arange(6, dtype=np.int64)
b = np.linspace(0, 10, 6, dtype=np.int64)

print("a    : ", a)
print("b    : ", b)

print("-" * 80)
print("a + a: ", vec_add_vectorize(a, a))
print("b + b: ", vec_add_vectorize(b, b))

In [None]:
@jit("int64[:](int64[:], int64[:])")
def vec_add_jit(x, y):
    return x + y

In [None]:
print(vec_add_jit(a, a))
print(vec_add_jit(b, b))

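The difference between **@vectorize** and **@jit** is that **@vectorize** compiles a scalar function and wraps it in a generated, efficient for-loop, producing a new ufunc with the usual ufunc features such as broadcasting. **@jit** compiles the function as written: here it receives whole arrays and relies on Numba's implementation of NumPy array addition.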

[http://numba.pydata.org/numba-doc/latest/user/vectorize.html](http://numba.pydata.org/numba-doc/latest/user/vectorize.html)