<center><h2><strong><font color="blue"> Advanced Programming for Data Science (APDS)</font></strong></h2></center>

<center><img alt="" src="images/covers/taudata-cover.jpg" style="height: 200px;" /></center>

<center><h2><strong><font color="blue">APDS-06: Just-In-Time (JIT) Compiler in Python</font></strong></h2></center>

<center><h3>(C) Taufik Sutanto</h3></center>

# <center><font color="blue"> Outline </font></center>

* Interpreter VS Compiler
* Introduction to Just-In-Time (JIT) VS Ahead of Time (AoT) Compiler
* Introduction to Numba
* Python Decorator
* Numba autojit
* Numba jit
* Case Study

# <center><font color="blue"> Level of Abstraction in Computer Language </font></center>

<center><img alt="" src="images/Programming-Language-level-of-Abstraction.jpg" style="height: 480px;" /></center>

* Source: https://www.slideshare.net/slideshow/high-level-languages-representation/81223329#3

# <center><font color="blue"> Interpreter VS Compiler - 01 </font></center>

<center><img alt="" src="images/compiler-vs-interpreter.jpg" style="height: 480px;" /></center>

# <center><font color="blue"> Interpreter VS Compiler - 02 </font></center>

<center><img alt="" src="images/how-compiler-interpreter-works.jpg" style="height: 200px;" /></center>

# <center><font color="blue"> Introduction to Just-In-Time (JIT) VS Ahead of Time (AoT) </font></center>

**Just-in-time (JIT)** compilation is a dynamic compilation technique in which code is translated into machine code during program execution rather than beforehand, often converting bytecode into native instructions that run directly on the processor. A JIT compiler continuously analyzes the running program to identify code segments where the performance benefits of compiling or recompiling outweigh the compilation overhead. Combining elements of **ahead-of-time (AOT)** compilation and interpretation, JIT offers the speed of compiled code while maintaining the flexibility of interpretation, though it incurs both interpreter overhead and the additional cost of on-the-fly compilation. This approach enables adaptive optimizations such as dynamic recompilation and microarchitecture-specific performance improvements.

<center><img alt="" src="images/jit-vs-aot.jpg" style="height: 400px;" /></center>
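To make the word "bytecode" concrete, here is a minimal sketch using the standard-library `dis` module. It assumes standard CPython, and the function `add` is only an illustrative name. CPython compiles the source into the bytecode instructions printed below and then interprets them one at a time; a JIT compiler would instead translate hot code like this into native machine instructions while the program runs.

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode that CPython generated for add().
# The exact opcodes differ between Python versions.
dis.dis(add)

# Collect the instruction names for a more compact view.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```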

# <center><font color="blue"> Introduction to Numba </font></center>

Numba is an open-source just-in-time (JIT) compiler that uses the LLVM compiler library to translate a subset of Python and NumPy code into highly optimized machine code at runtime. This enables Python programs—especially those involving numerical and scientific computing—to run at speeds comparable to low-level languages like C or Fortran, all while allowing developers to remain within the ease and flexibility of Python.

Numba provides several key advantages, including substantial speed improvements, simple integration through decorators, strong compatibility with NumPy, and support for parallel execution across multiple CPU cores. It also enables GPU acceleration through NVIDIA CUDA, allowing users to write high-performance GPU code directly in Python. Numba works on Windows, macOS, and Linux, supports x86 and x86_64 architectures, integrates with the latest NumPy versions, and runs on standard CPython.

* https://numba.pydata.org/
* https://numba.readthedocs.io/en/stable/index.html

<center><img alt="" src="images/numba.jpg" style="height: 200px;" /></center>

# <center><font color="blue"> Installing Numba </font></center>

* `pip install numba`, or
* `conda install numba`

# <center><font color="blue"> Numba Limitation </font></center>


While Numba is a powerful tool, it’s important to understand its limitations to use it effectively:

* **Limited Scope**: Numba excels at accelerating numerical code that involves loops, mathematical operations, and array manipulations (especially with NumPy). It’s less effective for code that relies heavily on Python objects, string manipulations, or I/O operations.
* **Compilation Overhead**: The first time you call a Numba-decorated function, there’s a compilation step. This can introduce a slight delay, but subsequent calls are much faster.
* **Not Always Faster**: In some cases, Numba might not offer a significant speedup, or it might even be slower than pure Python due to the compilation overhead. It’s crucial to profile your code to determine if Numba is beneficial.
* **Reduced Flexibility**: While Numba supports many Python features, there are some restrictions, particularly with dynamic typing and advanced language constructs.

<center><img alt="" src="images/Numba-Limitations.jpg" style="height: 200px;" /></center>

# <center><font color="blue"> When to  Use Numba </font></center>

* **Numerical Code**: Numba shines when your code primarily deals with numerical calculations, especially with NumPy arrays.
* **Loop-Heavy Code**: Functions that have loops iterating over large datasets often see significant speedups with Numba.
* **Computational Bottlenecks**: If profiling reveals that a particular function is consuming a large portion of your program’s runtime, Numba might be a good candidate for optimization.
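One way to find such a bottleneck is the standard-library profiler. The sketch below is only an illustration: `slow_sum_of_squares` and `pipeline` are hypothetical stand-ins for your own code. It profiles a small pipeline and prints the functions with the largest cumulative time, which are the natural candidates for Numba.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Loop-heavy numerical work: a typical candidate for JIT compilation.
    total = 0.0
    for i in range(n):
        total += i * i
    return total

def pipeline():
    # Hypothetical program entry point that calls the heavy function.
    return slow_sum_of_squares(200_000)

profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

If the report shows that one function dominates the runtime and that function is numerical and loop-heavy, it is worth trying Numba on it.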

# <center><font color="blue"> When it is Not Recommended to  Use Numba </font></center>

* **I/O-Bound Code**: Numba won’t help much if your code’s bottleneck is input/output operations (e.g., reading/writing files, network communication).
* **Non-Numerical Code**: If your code mostly involves string manipulations, complex data structures, or Python object interactions, Numba might not be the right fit.
* **Small Functions**: For very small functions, the compilation overhead might outweigh the speed gains from Numba.

# <center><font color="blue"> Python Decorator </font></center>

> *Before we discuss Numba, we need to talk about Python Decorator Function*

A decorator in Python is a function that takes another function as input and extends or alters its behavior without modifying its source code. In other words, decorators act as wrappers around functions. This approach follows the DRY (Don't Repeat Yourself) principle by enabling code reuse in scenarios such as logging, access control, timing, and caching.

<center><img alt="" src="images/python-decorator-function.jpg" style="height: 400px;" /></center>

In [1]:
# Assume you have a function inside another function

def f1(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

def f2():
    print("Hi UIII Students!")

f3 = f1(f2)
f3()

Something is happening before the function is called.
Hi UIII Students!
Something is happening after the function is called.


# Alternatively We can simply write it as

In [4]:
@f1
def f2():
    print("Hi UIII Students!")

f2() # pay attention we are calling decorated f2 and NOT f3

Something is happening before the function is called.
Hi UIII Students!
Something is happening after the function is called.


# What if our function has parameter(s)?

## use args and kwargs

In [5]:
def f1(func):
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        func(*args, **kwargs)
        print("Something is happening after the function is called.")
    return wrapper

In [6]:
@f1
def f2(n):
    for i in range(n):
        print("Hi UIII Students!")

f2(3)

Something is happening before the function is called.
Hi UIII Students!
Hi UIII Students!
Hi UIII Students!
Something is happening after the function is called.


# Returning value(s) when using decorator

In [8]:
def f1(func):
    def wrapper(*args, **kwargs):
        print("Something is happening before the function is called.")
        x = func(*args, **kwargs)
        print("Something is happening after the function is called.")
        return x
    return wrapper

In [10]:
@f1
def f2(n):
    print("processing ...")
    s = 0
    for i in range(n):
        s = s+i
    return s

f2(5)

Something is happening before the function is called.
processing ...
Something is happening after the function is called.


10
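The same pattern can be used for timing, one of the use cases mentioned earlier. The sketch below is an illustration and not part of the original module: `timed` and `total` are hypothetical names. It also uses `functools.wraps`, which copies the name and docstring of the decorated function onto the wrapper; without it, the decorated function would report its name as `wrapper`.

```python
import functools
import time

def timed(func):
    """Print how long each call to func takes."""
    @functools.wraps(func)  # keep func's __name__ and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print("{} took {:.6f} seconds".format(func.__name__, elapsed))
        return result
    return wrapper

@timed
def total(n):
    """Return 0 + 1 + ... + (n - 1)."""
    s = 0
    for i in range(n):
        s = s + i
    return s

value = total(5)
print(value)           # 10
print(total.__name__)  # total
```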

# <center><font color="red"> Exercise </font></center>

Create a decorator function that **wraps the Pandas "read_csv" function** so that it adds the following messages:
1. Initial message: "Loading Data ..."
2. ... the actual loading process ...
3. Final message: "Data Loaded."

* Use any arbitrary CSV file.
* The wrapped function returns the DataFrame loaded from the CSV file.

<center><img alt="" src="images/exercise-question.jpg" style="height: 250px;" /></center>

In [13]:
from pandas import read_csv

file = "data/price.csv"
df = read_csv(file)
df.head(1)

|   | Observation | Dist_Taxi | Dist_Market | Dist_Hospital | Carpet | Builtup | Parking | City_Category | Rainfall | House_Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 9796.0 | 5250.0 | 10703.0 | 1659.0 | 1961.0 | Open | CAT B | 530 | 6649000 |


In [17]:
def f1(func):
    def wrapper(*args, **kwargs):
        print("Loading Data ....")
        x = func(*args, **kwargs)
        print(x.head())
        print("Data Loaded.")
        return x
    return wrapper

In [18]:
@f1
def f2(file_):
    df = read_csv(file_)  # use the parameter, not the global variable `file`
    return df

In [20]:
df = f2(file)


Loading Data ....
   Observation  Dist_Taxi  Dist_Market  Dist_Hospital  Carpet  Builtup  \
0            1     9796.0       5250.0        10703.0  1659.0   1961.0   
1            2     8294.0       8186.0        12694.0  1461.0   1752.0   
2            3    11001.0      14399.0        16991.0  1340.0   1609.0   
3            4     8301.0      11188.0        12289.0  1451.0   1748.0   
4            5    10510.0      12629.0        13921.0  1770.0   2111.0   

        Parking City_Category  Rainfall  House_Price  
0          Open         CAT B       530      6649000  
1  Not Provided         CAT B       210      3982000  
2  Not Provided         CAT A       720      5401000  
3       Covered         CAT B       620      5373000  
4  Not Provided         CAT B       450      4662000  
Data Loaded.


# <center><font color="blue"> Numba </font></center>

> *Now we are ready for Numba*

## Numba offers two compilation modes:

1. **Nopython mode** is the default compilation mode in recent Numba releases and aims to achieve the highest performance gains. When a function is compiled in nopython mode, Numba tries to generate machine code without relying on the Python runtime. In this mode, the function and its dependencies must be written in a subset of Python that can be fully compiled to machine code.
2. **Object mode**, on the other hand, provides more flexibility at the cost of potential performance optimizations. In this mode, Numba retains the full Python runtime semantics and falls back to using Python objects and runtime calls when necessary. Object mode is useful when working with code that cannot be fully compiled to machine code due to dynamic or unsupported Python features.

In [None]:
!pip install numba --quiet

In [21]:
# Importing Some Python Modules
import warnings; warnings.simplefilter('ignore')
import pandas as pd, numpy as np, seaborn as sns
import matplotlib.pyplot as plt
import numba
plt.style.use('bmh'); sns.set()
np.random.seed(420)

# <center><font color="blue"> Understanding the Inner Workings of Numba </font></center>

<center><img alt="" src="images/numba.png" style="height: 300px;" /></center>

Image Source: Continuum Analytics
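* IR: Intermediate Representation, the form of the code a compiler works on between source code and machine code
* Bytecode: intermediate code that is more abstract than machine code; Numba starts by analyzing the function's Python bytecode
* LLVM: originally "Low Level Virtual Machine", an infrastructure for building compilers
* NVVM: an LLVM-based compiler IR designed to represent GPU kernels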

## Let us create a simple function

In [22]:
x = np.random.rand(10**6)
x.shape, x[:8]

((1000000,),
 array([0.31564591, 0.45303068, 0.26698226, 0.10892818, 0.86816648,
        0.62972852, 0.35251871, 0.0675376 ]))

In [23]:
def square(x):
    s = 0.0
    for i in x:
        s = s + i**2 # or s+=i**2
    return s

# First let us calculate the performance if we are not using Numba/JIT

In [24]:
import time

In [25]:
start = time.time()
dSum = square(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, dSum))

Elapsed Time = 0.231207 seconds, result = 332849.5462


# Alternatively

## Magic Command

In [26]:
%timeit dSum = square(x)

222 ms ± 28.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


# Let us use Numba

In [27]:
from numba import jit
 
# Use nopython=True where possible for best performance
@jit(nopython=True)
def square_numba(x):
    s = 0.0
    for i in x:
        s = s + i**2 # or s+=i**2
    return s

In [28]:
start = time.time()
dSum = square_numba(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, dSum))

Elapsed Time = 0.693063 seconds, result = 332849.5462


# <center><font color="red"> Wait... what? Why is it slower? </font></center>

* Because the first call to a Numba function includes the JIT compilation time on top of the execution time
* To measure the real execution time, we run the function one more time

In [29]:
start = time.time()
dSum = square_numba(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, dSum))

Elapsed Time = 0.001008 seconds, result = 332849.5462


# <center><font color="green"> ~ 200 times faster :) </font></center>
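The gap between the first and the second call can be measured directly. The sketch below is a minimal illustration with hypothetical names (`sum_of_squares`, `time_call`). As an assumption for portability, it falls back to a no-op decorator when Numba is not installed, in which case both timings are plain Python timings and no compilation effect is visible.

```python
import time

try:
    from numba import njit
except ImportError:
    # Assumption: no-op fallback so the sketch still runs without Numba
    # (there is then no compilation step and no speedup).
    def njit(func):
        return func

@njit
def sum_of_squares(n):
    s = 0.0
    for i in range(n):
        s += i * i
    return s

def time_call(func, *args):
    # Return the function result together with the elapsed wall-clock time.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start

# The first call includes JIT compilation; the second reuses the compiled code.
first_result, first_time = time_call(sum_of_squares, 1_000_000)
second_result, second_time = time_call(sum_of_squares, 1_000_000)

print("First call : {:.6f} seconds".format(first_time))
print("Second call: {:.6f} seconds".format(second_time))
```

With Numba installed, the first timing is typically much larger than the second, which is why benchmarks should exclude the first call or use a warm-up call.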

# <center><font color="red"> Exercise </font></center>

> Create a JIT function that calculates (returns) the *average* **and** *variance* of *x*

* Measure the performance of the original and the JIT function
* Do not use NumPy built-in functions to calculate the mean and variance

<center><img alt="" src="images/exercise-question.jpg" style="height: 250px;" /></center>

In [32]:
x

array([0.31564591, 0.45303068, 0.26698226, ..., 0.78158989, 0.882921  ,
       0.67562139])

In [34]:
np.mean(x), np.var(x)

(0.49961053296528146, 0.08323886158890453)

In [35]:
def myFunc(x):
    N = len(x)
    ave = 0.0
    for data in x:
        ave = ave + data
    ave = ave/N
    var = 0.0
    for data in x:
        var = var + (data - ave)**2
    var = var / (N-1)
    return ave, var    

In [37]:
start = time.time()
a, v = myFunc(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}, {:.4f}".format(end-start, a, v))

Elapsed Time = 0.418646 seconds, result = 0.4996, 0.0832


In [38]:
@jit(nopython=True)
def myFunc_jit(x):
    N = len(x)
    ave = 0.0
    for data in x:
        ave = ave + data
    ave = ave/N
    var = 0.0
    for data in x:
        var = var + (data - ave)**2
    var = var / (N-1)
    return ave, var  

In [40]:
start = time.time()
a, v = myFunc_jit(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}, {:.4f}".format(end-start, a, v))

Elapsed Time = 0.001988 seconds, result = 0.4996, 0.0832
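Note that the solution above divides by `N-1` (the sample variance), whereas `np.var` divides by `N` by default (`ddof=0`); with one million values the two agree to four decimals, and `np.var(x, ddof=1)` matches the `N-1` version. The small sketch below illustrates the difference on a tiny list using only the standard library. The helper `variance` and its `ddof` argument are illustrative names, not part of the original exercise.

```python
import statistics

def variance(values, ddof=0):
    # Two-pass variance: ddof=0 divides by N, ddof=1 divides by N - 1.
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_var = variance(data, ddof=0)  # same convention as np.var(data)
sample_var = variance(data, ddof=1)      # same convention as the solution above

print(population_var, statistics.pvariance(data))
print(sample_var, statistics.variance(data))
```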


# <center><font color="blue"> Numba Explicit Type declaration </font></center>

In Python, variables are dynamically typed, meaning their types can change at runtime. While this flexibility is convenient, it can also result in performance overhead. Numba mitigates this by allowing developers to explicitly specify the types of variables, enabling the compiler to generate specialized machine code tailored to those types.

Explicitly typing variables in Numba offers several advantages:
* **Specialized Machine Code**: Numba generates machine code specialized to the declared types, bypassing Python’s dynamic typing system. Note that Numba infers the same kind of type information automatically on the first call, so the run-time speed of the compiled code is usually similar either way.
* **Eager Compilation**: With an explicit signature, Numba compiles the function as soon as the decorator is applied (when the function is defined) rather than when it is first called, so the first call no longer pays the compilation delay.
* **Code Safety**: Only the declared signature is compiled, so calling the function with other argument types raises an error, which helps you identify type-related issues early in the development process.

In [41]:
def go_fast(a): # Plain Python baseline: no decorator, so nothing is compiled here
    trace = 0.0
    for i in a:   # Numba likes loops
        trace += np.tanh(i) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

In [42]:
start = time.time()
dSum = go_fast(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

Elapsed Time = 0.903439 seconds, result = 433510567960.0078


# NoPython Version

In [43]:
@jit(nopython=True)
def go_fast_nopython(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in a:   # Numba likes loops
        trace += np.tanh(i) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

_ = go_fast_nopython(x[:100]) # initialize, so that Numba will compile the function

In [44]:
start = time.time()
dSum = go_fast_nopython(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

Elapsed Time = 0.010271 seconds, result = 433510567960.0079


# `@njit` is shorthand for `@jit(nopython=True)`

In [None]:
from numba import njit

In [None]:
@njit
def go_fast_njit(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in a:   # Numba likes loops
        trace += np.tanh(i) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

_ = go_fast_njit(x[:100]) # initialize, so that Numba will compile the function

In [None]:
start = time.time()
dSum = go_fast_njit(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

# Adding variable type information

In [45]:
@jit('float64[:](float64[:])', nopython=True)
def go_fast_explicit(a):
    trace = 0.0
    for i in a:   # Numba likes loops
        trace += np.tanh(i) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

# We do NOT need a warm-up call: given the signature, Numba compiles the function when it is defined
# float64[:] denotes a 1D float64 array, so the function takes a 1D array and returns another 1D array

In [46]:
start = time.time()
dSum = go_fast_explicit(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

Elapsed Time = 0.009524 seconds, result = 433510567960.0079


# <center><font color="blue"> Numba Parallelization  </font></center>

Numba also excels at optimizing loops and array operations. By using Numba’s capabilities to parallelize execution, you can further boost performance by leveraging multiple threads. Here’s an example showcasing the parallelization with Numba:

<center><img alt="" src="images/serial-vs-parallel.png" style="height: 200px;" /></center>


In [47]:
def original_sum(a):
    result = 0
    for i in range(len(a)):
        result += a[i]
    return result

In [48]:
start = time.time()
dSum = original_sum(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

Elapsed Time = 0.137925 seconds, result = 499610.5330


In [49]:
from numba import prange  # int64 was imported but never used
 
@jit(parallel=True, nopython=True)
def parallel_sum(a):
    result = 0
    for i in prange(len(a)):
        result += a[i]
    return result
 
_ = parallel_sum(x[:100]) # initialize, so that Numba will compile the function

In [50]:
start = time.time()
dSum = parallel_sum(x)
end = time.time()
print("Elapsed Time = {:.6f} seconds, result = {:.4f}".format(end-start, np.sum(dSum)))

Elapsed Time = 0.001002 seconds, result = 499610.5330


# <center><font color="red"> Exercise </font></center>

> The Prime

* We are going to do this exercise in the Lab interactively

<center><img alt="" src="images/prime-definition.jpg" style="height: 250px;" /></center>

In [None]:
def isPrime(x):
    # TODO: return True if x is prime and False if x is not prime
    raise NotImplementedError

In [None]:
def myPrime(n):
    # first you need a mechanism to decide whether a number is prime
    # loop until you have n prime numbers
    # TODO: return the first n prime numbers
    raise NotImplementedError
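For reference after the lab, here is one possible sketch of a solution, not the official one. It uses simple trial division, and the names `is_prime` and `first_primes` are illustrative. As an assumption for portability, it falls back to a no-op decorator when Numba is not installed, and only the primality test is JIT-compiled; the outer loop stays in plain Python to keep the sketch simple.

```python
try:
    from numba import njit
except ImportError:
    # Assumption: no-op fallback so the sketch still runs without Numba.
    def njit(func):
        return func

@njit
def is_prime(x):
    # Trial division: test divisors d while d * d <= x.
    if x < 2:
        return False
    d = 2
    while d * d <= x:
        if x % d == 0:
            return False
        d += 1
    return True

def first_primes(n):
    # Collect primes in increasing order until we have n of them.
    primes = []
    candidate = 2
    while len(primes) < n:
        if is_prime(candidate):
            primes.append(candidate)
        candidate += 1
    return primes

print(first_primes(10))
```

Moving the outer loop into a compiled function as well is a natural next step to try in the lab, together with timing both versions.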

<center><h2><strong><font color="blue">End of Module</font></strong></h2></center>
<hr>
<center><img alt="" src="images/meme-cartoon/numba-meme.jpg"/></center>