# C performance in Python, or: can we be faster than NumPy?

In [16]:
import numpy as np
import math
import numba
from numba import jit, njit, vectorize

## NumPy vs Numba JIT

In [18]:
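# njit is shorthand for jit(nopython=True): compilation fails loudly
# instead of falling back to slow object mode.
@njit
def sum_array(arr):
    M, N = arr.shape
    result = 0.0
    # Explicit loops are fine here: Numba compiles them to machine code.
    for i in range(M):
        for j in range(N):
            result += arr[i, j]
    return result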

In [19]:
arr = np.random.random((3000, 3000))

In [20]:
sum_array_jit = %timeit -o sum_array(arr)

10.4 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [21]:
sum_array_numpy = %timeit -o arr.sum()

4.91 ms ± 32.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [22]:
# Speedup relative to NumPy (a value below 1 means the JIT version is slower)
sum_array_numpy.average/sum_array_jit.average

0.4705997204745707

Our JIT-compiled sum runs at roughly half the speed of NumPy's `sum` (ratio ≈ 0.47). :) But can we be faster?

## NumPy vs Numba Vectorize

In [36]:
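# An explicit signature is required for target='parallel'; the scalar
# function below is compiled into a multithreaded ufunc.
@vectorize('float64(float64,float64)', target='parallel')
def trig(a, b):
    return math.sin(a**2) * math.exp(b)

# Reference implementation using NumPy's own ufuncs.
def numpy_trig(a, b):
    return np.sin(a**2) * np.exp(b)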

a = np.random.random((10000, 10000))
b = np.random.random((10000, 10000))

In [37]:
trig_vec_parallel = %timeit -o trig(a, b)

2.42 s ± 185 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [31]:
trig_numpy = %timeit -o numpy_trig(a,b)

5.55 s ± 17.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [33]:
# Speedup relative to NumPy (a value above 1 means the Numba version is faster)
trig_numpy.average/trig_vec_parallel.average

2.273445613256256

Now our parallelized version is about 2.3 times faster than NumPy.

Automatic multicore operations!

**Note**: `target='parallel'` is not always the best option. There is overhead in setting up the threading, so if the individual scalar operations that make up a `ufunc` are simple, you'll probably get better performance in serial. If the individual operations are more expensive (like trig!), then parallel is (usually) a good option.