<a href="https://colab.research.google.com/github/vuhung16au/KidsLearnsAlgorithms/blob/main/Supercharge_Your_Data_Preprocessing_with_Numba.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

import numpy as np
from numba import jit
import time

# Baseline: plain NumPy version (the @jit decorator is deliberately left off).
def custom_normalize(data, lower=0, upper=1):
    min_val = np.min(data)
    max_val = np.max(data)

    numerator = (data - min_val) * (upper - lower)
    denominator = max_val - min_val

    return numerator / denominator + lower

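# The timed region covers both array generation and the normalization call.
start_time = time.time()

data = np.random.rand(100000000)  # 100 million random floats in [0, 1)
normalized_data = custom_normalize(data)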

end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")

# Same function, now compiled with Numba (this redefines custom_normalize).
@jit(nopython=True)
def custom_normalize(data, lower=0, upper=1):
    min_val = np.min(data)
    max_val = np.max(data)

    numerator = (data - min_val) * (upper - lower)
    denominator = max_val - min_val

    return numerator / denominator + lower

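# The timed region covers array generation, JIT compilation (triggered by the
# first call), and the normalization itself.
start_time = time.time()

data = np.random.rand(100000000)  # 100 million random floats in [0, 1)
normalized_data = custom_normalize(data)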

end_time = time.time()
execution_time = end_time - start_time
print(f"Execution time: {execution_time} seconds")


"""
Numba can speed up numerical Python and NumPy code in your data preprocessing pipeline.
It uses just-in-time (JIT) compilation to translate a subset of Python and NumPy into optimized machine code.


You can apply Numba to CPU-bound operations that slow down your preprocessing.
It's particularly effective for numerical algorithms and explicit loops that can't be easily vectorized.


The code above applies Numba to a custom data normalization function: first as a plain NumPy baseline, then with the @jit decorator.
"""

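"""

The @jit(nopython=True) decorator compiles the function to machine code the first time it is called.
In nopython mode the compiled code runs without the Python interpreter, and Numba raises an error instead of silently falling back to slower object mode.

Compilation is a one-time cost per argument type, so the first call is slower than later ones.
The timing code above measures that first call together with array generation, so it does not show the steady-state speed of the compiled function.
The function is also already vectorized NumPy code running on a 100-million-element array, so the gain from Numba is typically smaller here than for explicit Python loops, which are slow in the interpreter.


Numba works best with NumPy arrays and scalar values.
It supports only a subset of Python and NumPy, so keep your jitted functions simple.


Measure the performance gain using %timeit in Jupyter, after calling the function once so that compilation time is excluded.
Loop-heavy functions often see 10x to 100x speedups; already-vectorized NumPy code usually gains much less.

"""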
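
As a minimal sketch of the kind of loop that is hard to vectorize, the following applies exponential smoothing, where each output value depends on the previous one. The function name `exponential_smooth`, the `alpha` parameter, and the sample series are illustrative assumptions, not part of the notebook above. The `try`/`except` fallback is only there so the sketch still runs where Numba is not installed.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def jit(*args, **kwargs):
        return lambda func: func


@jit(nopython=True)
def exponential_smooth(values, alpha):
    # Hypothetical helper: each output depends on the previous output,
    # so the loop cannot be replaced by a single NumPy array expression.
    # Assumes `values` is a non-empty 1-D float array.
    out = np.empty_like(values)
    out[0] = values[0]
    for i in range(1, values.size):
        out[i] = alpha * values[i] + (1.0 - alpha) * out[i - 1]
    return out


series = np.array([1.0, 2.0, 3.0, 4.0])
smoothed = exponential_smooth(series, 0.5)
print(smoothed)  # values: 1.0, 1.5, 2.25, 3.125
```

The function takes only a NumPy array and a scalar, which is the input style Numba handles most reliably.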

Execution time: 2.416228771209717 seconds
Execution time: 7.608553647994995 seconds
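
The two timings above each include generating the array, and the second one also includes Numba's one-time compilation. The sketch below is one way to separate those costs: it builds the data once, then times the first (compiling) call, a second call, and the plain NumPy version. The names `normalize_numpy`, `normalize_numba`, and `time_call` are hypothetical, the array is shrunk to one million elements to keep the run short, and the `try`/`except` fallback is an assumption so the code runs without Numba. No timings are quoted because they depend on the machine.

```python
import time

import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption: fall back to a no-op decorator so the sketch runs without Numba.
    def jit(*args, **kwargs):
        return lambda func: func


def normalize_numpy(data, lower=0.0, upper=1.0):
    # Same min-max scaling as custom_normalize in the notebook.
    min_val = np.min(data)
    max_val = np.max(data)
    return (data - min_val) * (upper - lower) / (max_val - min_val) + lower


# Compile a separate Numba version and keep the plain one for comparison.
normalize_numba = jit(nopython=True)(normalize_numpy)


def time_call(func, *args):
    # Hypothetical helper: return the result and the elapsed wall-clock seconds.
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Generate the data outside the timed region (smaller than the original array).
data = np.random.rand(1_000_000)

result_numba, first_call = time_call(normalize_numba, data)  # includes compilation
_, second_call = time_call(normalize_numba, data)            # compiled code only
result_numpy, numpy_call = time_call(normalize_numpy, data)

print(f"Numba, first call (with compilation): {first_call:.4f} seconds")
print(f"Numba, second call:                   {second_call:.4f} seconds")
print(f"NumPy baseline:                       {numpy_call:.4f} seconds")
```

In Jupyter, `%timeit normalize_numba(data)` after one warm-up call gives a more stable estimate than a single `perf_counter` measurement.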
