# Using `numba.jit` to speed up the computation of the Cityblock distance matrix


In this notebook we implement a function to compute the Cityblock distance matrix using Numba's *just-in-time* compilation decorator. We compare its performance to that of the corresponding non-decorated NumPy function.

We will use two Numba features here: the decorator `@numba.jit` and `numba.prange`.

In [None]:
import numpy as np
import numba

In [None]:
def cityblock_python(x, y):
    """Naive python implementation."""

    num_samples, num_feat = x.shape
    dist_matrix = np.empty((num_samples, num_samples))
    for i in range(num_samples):
        for j in range(num_samples):
            r = 0.0
            for k in range(num_feat):
                r += np.abs(x[i][k] - y[j][k])
            dist_matrix[i][j] = r

    return dist_matrix


@numba.jit(nopython=True)
def cityblock_numba1(x, y):
    """Implementation with numba."""

    num_samples, num_feat = x.shape
    dist_matrix = np.empty((num_samples, num_samples))
    for i in range(num_samples):
        for j in range(num_samples):
            r = 0.0
            for k in numba.prange(num_feat):
                r += np.abs(x[i][k] - y[j][k])
            dist_matrix[i][j] = r

    return dist_matrix


@numba.jit(nopython=True)
def cityblock_numba2(x, y):
    """Implementation with numba and numpy."""

    num_samples, num_feat = x.shape
    dist_matrix = np.empty((num_samples, num_samples))
    for i in range(num_samples):
        for j in numba.prange(num_samples):
            dist_matrix[i][j] = np.linalg.norm(x[i] - y[j], 1)

    return dist_matrix

## Note
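Observe that in `cityblock_numba1` the inner loop, which is a reduction, is written with `numba.prange`. When a function is compiled with `parallel=True`, `numba.prange` automatically takes care of data privatization and reductions. Without that option, as in the functions above, `numba.prange` behaves like the built-in `range` and the loop runs serially.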

In [None]:
# Let's check that they all give the same result
a = 10. * np.random.random([100, 10])

print(np.abs(cityblock_python(a, a) - cityblock_numba1(a, a)).max())
print(np.abs(cityblock_python(a, a) - cityblock_numba2(a, a)).max())

In [None]:
nsamples = 200
nfeat = 25

x = 10. * np.random.random([nsamples, nfeat])

%timeit cityblock_python(x, x)
%timeit cityblock_numba1(x, x)
%timeit cityblock_numba2(x, x)

## Exercise 1
How do you explain the difference in execution times?

## Conclusions

In cases where an implementation with NumPy vectorized operations is not possible, it is worth trying Numba. It offers a significant improvement in performance compared to pure Python, especially in situations where loops are unavoidable.

As we have seen, the speedup doesn't come completely for free: the way the Python function is implemented is crucial to obtaining good performance from Numba. Consider different implementations with and without NumPy operations and measure their execution time.
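For this particular problem, one pure-NumPy candidate to include in such a comparison is a broadcasting version. The sketch below is only one possible baseline, and the name `cityblock_broadcast` is hypothetical. It builds an intermediate array of shape `(n, m, num_feat)`, so its memory use grows quickly with the number of samples.

```python
import numpy as np


def cityblock_broadcast(x, y):
    """Cityblock distance matrix via NumPy broadcasting (hypothetical helper)."""
    # x[:, None, :] has shape (n, 1, f) and y[None, :, :] has shape (1, m, f).
    # Their difference broadcasts to (n, m, f); summing over the last axis
    # gives the (n, m) matrix of distances.
    return np.abs(x[:, None, :] - y[None, :, :]).sum(axis=-1)


# Tiny example: the two rows differ by |0 - 1| + |0 - 2| = 3.
pts = np.array([[0.0, 0.0], [1.0, 2.0]])
print(cityblock_broadcast(pts, pts))
```

It can be timed with `%timeit` in the same way as the functions above.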