<a href="https://colab.research.google.com/drive/1xp1hk0gnvFQgVD5Avvi7DqoMMlLAJptu?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Compiling Python with `numba` and `cython`

First, we reproduce the Python function from the lecture and measure its execution time:

In [1]:
def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop(2, 10**6)

CPU times: user 20.6 ms, sys: 608 μs, total: 21.2 ms
Wall time: 20.7 ms


inf
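The `inf` result above is expected: multiplying by 2.5 a million times exceeds the range of a double-precision float. The short check below is only a sketch to make that concrete. The variable names `small` and `big` are illustrative and do not appear in the lecture code.

```python
import math

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

# Small r: the result matches the closed form x * 2.5**r.
small = loop(2, 10)
assert math.isclose(small, 2 * 2.5**10)

# Large r: the running product leaves the float64 range and saturates at inf.
big = loop(2, 10**6)
print(small, math.isinf(big))
```

Because the value saturates early, most iterations simply multiply `inf` by 2.5. The timings in this notebook therefore measure loop overhead more than arithmetic.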

## Using `numba`

First, let's try compiling "Just in Time" using `numba`:

In [None]:
from numba import jit

# jit compiles the function the first time it is called
# nopython=True compiles the function to run without the Python interpreter
@jit(nopython=True)
def loop_jit(x, r):
    for i in range(r):
        x *= 2.5
    return x

%time loop_jit(2, 10**6) # includes compilation time

CPU times: user 1.55 s, sys: 132 ms, total: 1.69 s
Wall time: 5.27 s


inf

In [None]:
%time loop_jit(2, 10**6) # much faster after compilation

CPU times: user 1.47 ms, sys: 62 µs, total: 1.53 ms
Wall time: 1.54 ms


inf
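Outside a notebook, the first-call versus second-call comparison can be sketched with `time.perf_counter`. The helper `time_call` is a hypothetical name introduced here. The plain `loop` function stands in for `loop_jit` so that the sketch runs without `numba` installed, which means both timings will be similar here. Passing a jitted function instead would show the compilation cost in the first measurement only.

```python
import time

def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-in for loop_jit (assumption: swap in the jitted function when numba is available).
def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

result_first, first = time_call(loop, 2, 10**5)
result_second, second = time_call(loop, 2, 10**5)
print(f"first call: {first:.6f} s, second call: {second:.6f} s")
```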

In [None]:
%timeit loop(3, 10**6) # better to time across multiple runs using `timeit`

The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached.
81.1 ms ± 55.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
%timeit loop_jit(3, 10**6)

1.67 ms ± 83.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
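The `%timeit` magic is only available in IPython. In a plain script, a similar measurement can be sketched with the standard-library `timeit` module. The repeat and loop counts below are arbitrary choices for illustration, not the ones `%timeit` picks automatically.

```python
import timeit

def loop(x, r):
    for i in range(r):
        x *= 2.5
    return x

NUMBER = 10  # calls per timing run
REPEAT = 5   # independent timing runs

# Each entry in `runs` is the total time for NUMBER calls.
runs = timeit.repeat(lambda: loop(3, 10**5), repeat=REPEAT, number=NUMBER)
per_call = [t / NUMBER for t in runs]

# The minimum is usually the least noisy estimate of the achievable time.
best = min(per_call)
print(f"best of {REPEAT}: {best * 1e3:.3f} ms per call")
```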


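With JIT compilation, the first call pays the compilation cost. To get the speed-up on the very first call, we can compile the code ahead of time (AOT) instead. `numba` supports this through `numba.pycc` (note that recent Numba releases emit a deprecation warning for this module):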


In [None]:
from numba.pycc import CC

# name of compiled module to create:
cc = CC('test_aot')

# exported function name and signature; explicit types are required (f4 = 32-bit float, i4 = 32-bit int)
@cc.export('loop_aot', 'f4(f4,i4)')
def loop_aot(x, r):
    for i in range(r):
        x *= 2.5
    return x

cc.compile()

Note that we now have a compiled shared-object file (`.so`) in our current directory. This is a compiled extension module that contains our function.

In [None]:
ls

sample_data/  test_aot.cpython-311-x86_64-linux-gnu.so*


To use our function, we just need to import our pre-compiled module, as we would any other Python module:

In [None]:
import test_aot
%time test_aot.loop_aot(2, 10**6) # the first call is already fast, since nothing is compiled at run time

CPU times: user 1.56 ms, sys: 0 ns, total: 1.56 ms
Wall time: 1.56 ms


inf
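A script that should also run on machines where the module has not been built can check for it before importing. The following is a minimal sketch, assuming the module is named `test_aot` as above. The name `loop_fast` is hypothetical, and the pure-Python fallback is an illustrative design choice rather than something `numba` requires.

```python
import importlib.util

def loop(x, r):
    """Pure-Python fallback with the same behavior as the compiled function."""
    for i in range(r):
        x *= 2.5
    return x

# Use the ahead-of-time compiled function if the module can be found on the import path.
if importlib.util.find_spec("test_aot") is not None:
    import test_aot
    loop_fast = test_aot.loop_aot
else:
    loop_fast = loop

print(loop_fast(2.0, 3))
```

Keep in mind that the compiled version was exported with 32-bit types, so its results can differ in precision from the pure-Python fallback for values that are not exactly representable as a 32-bit float.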

In [None]:
%timeit test_aot.loop_aot(2, 10**6) # same overall performance as before

1.54 ms ± 17.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


## Using `cython`

Another common, if slightly less elegant, way to compile Python code is to add explicit static types with `cython`. Here we use the IPython `cython` extension to compile the cell:

In [None]:
%load_ext cython

In [None]:
%%cython
# the cell magic must be the first line of the cell
# cython translates the Python code to C, which is then compiled to machine code

# explicitly add static types to function itself:
def loop_cython(float x, int r):
    cdef int i
    for i in range(r):
        x *= 2.5
    return x

In [None]:
%timeit loop_cython(2, 10**6) # comparable performance to numba

1.54 ms ± 25.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
