# Scale out calculations with Numba and Dask

In [1]:
# Parameters
M = 10  # number of realisations
N = int(100e6 // 8)  # number of steps; one float64 realisation takes 100 MB

In [6]:
%matplotlib inline

import numpy as np
from numba import jit

## NumPy and Numba (no Dask for now)

We'll define a function that performs a random walk based on an array of random steps. First, we'll use pure NumPy. Then, we'll just-in-time (JIT) compile it with Numba.

We'll create `M` walks with `N` steps and time the calculation of the standard deviation over all positions.

_Note that we use an inefficient loop on purpose._

In [7]:
def walk(steps):
    x = np.zeros_like(steps)
    for l in range(steps.shape[-1]):
        x[..., l] = x[..., l-1] + steps[..., l]
    return x

In [8]:
%time walk(np.random.normal(size=(M, N))).std()

CPU times: user 28.2 s, sys: 642 ms, total: 28.9 s
Wall time: 28.9 s


2238.5909093976024
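The loop in `walk` is a cumulative sum along the last axis, which is why the note above calls it inefficient. The sketch below checks this on a small array against `np.cumsum`. The array size and the seed are illustrative choices, not values from the timings above.

```python
import numpy as np


def walk(steps):
    # Same loop as above: position l is position l-1 plus step l.
    # For l = 0, x[..., -1] is still zero, so the walk starts at steps[..., 0].
    x = np.zeros_like(steps)
    for l in range(steps.shape[-1]):
        x[..., l] = x[..., l-1] + steps[..., l]
    return x


rng = np.random.default_rng(seed=0)       # seed only for reproducibility
small_steps = rng.normal(size=(3, 1000))  # small, illustrative sizes

looped = walk(small_steps)
vectorised = np.cumsum(small_steps, axis=-1)

print(np.allclose(looped, vectorised))
```

The explicit loop is kept in the rest of this notebook because it gives Numba something to speed up.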

Now with Numba:

In [9]:
@jit
def walk_jit(steps):
    x = np.zeros_like(steps)
    for l in range(steps.shape[-1]):
        x[..., l] = x[..., l-1] + steps[..., l]
    return x

In [10]:
%time walk_jit(np.random.normal(size=(M, N))).std()

CPU times: user 7.96 s, sys: 706 ms, total: 8.66 s
Wall time: 10.3 s


1896.2455231247081

## Now with a local Dask cluster

We'll create a Dask array that is chunked along the first dimension (the one denoting the realisation), apply the pure-NumPy and the Numba versions of the random walk, and time the calculation of the standard deviation as above.

In [12]:
from dask import array as da
from dask.distributed import Client, wait
import os

In [13]:
client = Client(n_workers=5, threads_per_worker=1, memory_limit=2e9,
                ip=os.environ["HOSTNAME"])
client

Client   Scheduler: tcp://10.8.0.38:35348   Dashboard: http://10.8.0.38:8787/status
Cluster  Workers: 5   Cores: 5   Memory: 10.00 GB


In [26]:
steps = da.random.normal(size=(M, N), chunks=(1, N))
print(steps)
print(steps.nbytes / 1e9, "GB")

dask.array<normal, shape=(10, 12500000), dtype=float64, chunksize=(1, 12500000)>
1.0 GB


In [15]:
%time da.apply_along_axis(walk, -1, steps).std().compute()

CPU times: user 3.42 s, sys: 442 ms, total: 3.86 s
Wall time: 33 s


2876.7555800514215

In [16]:
%time da.apply_along_axis(walk_jit, -1, steps).std().compute()

CPU times: user 591 ms, sys: 63.6 ms, total: 655 ms
Wall time: 2.74 s


2876.7555800514215

_Note that `.compute()` returns the result to the client and then releases all intermediate data. Hence, the difference in speed above is **not** due to caching._
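Because every chunk holds exactly one realisation, the chunk-wise application can be checked on a small array before scaling up. The sketch below uses `map_blocks` as an alternative to `da.apply_along_axis` and compares the result with plain NumPy. The array sizes, the seed, and the use of `da.from_array` are assumptions made for this illustration. No `Client` is created here, so Dask falls back to its default local scheduler.

```python
import numpy as np
from dask import array as da


def walk(steps):
    # Same pure-NumPy walk as above; works on any array along its last axis.
    x = np.zeros_like(steps)
    for l in range(steps.shape[-1]):
        x[..., l] = x[..., l-1] + steps[..., l]
    return x


small = np.random.default_rng(0).normal(size=(4, 1000))  # illustrative sizes
steps_small = da.from_array(small, chunks=(1, 1000))     # one realisation per chunk

# Apply walk to each (1, 1000) chunk independently; the output has the same shape.
walks = steps_small.map_blocks(walk, dtype=steps_small.dtype)

result = walks.std().compute()
expected = np.cumsum(small, axis=-1).std()

print(np.isclose(result, expected))
```

Chunking along the realisation axis matters here: a walk needs all of its steps in order, so splitting the last axis across chunks would break the cumulative sum.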

## And with a SLURMCluster

We'll use `100 * M` realisations.

In [22]:
from dask_jobqueue import SLURMCluster

cluster = SLURMCluster(
    cores=24,
    processes=12,  # Do you notice a difference to the other examples?
    memory="100GB",
    shebang='#!/usr/bin/env bash',
    queue="batch",
    walltime="00:30:00",
    local_directory='/tmp',
    death_timeout="15s",
    interface="ib0",
    log_directory=f'{os.environ["SCRATCH_cecam"]}/{os.environ["USER"]}/dask_jobqueue_logs/',
    project="ecam")

client = Client(cluster)
client

Client   Scheduler: tcp://10.80.32.38:36853   Dashboard: http://10.80.32.38:44557/status
Cluster  Workers: 0   Cores: 0   Memory: 0 B


In [23]:
cluster.scale(48)  # scale to 48 workers, i.e. four jobs of 12 processes each
                   # The batch queue may not grant them right away.

In [27]:
steps = da.random.normal(size=(100 * M, N), chunks=(1, N))
print(steps)
print(steps.nbytes / 1e9, "GB")

dask.array<normal, shape=(1000, 12500000), dtype=float64, chunksize=(1, 12500000)>
100.0 GB


In [30]:
%time da.apply_along_axis(walk, -1, steps).std().compute()

CPU times: user 2min 22s, sys: 19.8 s, total: 2min 42s
Wall time: 5min 54s


2596.980546174684

In [31]:
%time da.apply_along_axis(walk_jit, -1, steps).std().compute()

CPU times: user 10.1 s, sys: 1.35 s, total: 11.5 s
Wall time: 13.4 s


2596.980546174684

In [32]:
client

Client   Scheduler: tcp://10.80.32.38:36853   Dashboard: http://10.80.32.38:44557/status
Cluster  Workers: 48   Cores: 96   Memory: 399.84 GB
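The totals reported by the client follow from the `SLURMCluster` arguments, where `cores`, `processes` and `memory` describe a single job and `processes` splits that job into worker processes. The arithmetic below is a sketch that only restates the numbers used above. That one job corresponds to one node is an assumption based on the "four nodes" comment.

```python
# Values copied from the SLURMCluster(...) call and cluster.scale(48) above.
cores_per_job = 24
processes_per_job = 12
memory_per_job_gb = 100
workers_requested = 48

threads_per_worker = cores_per_job // processes_per_job       # threads in each worker process
jobs = workers_requested // processes_per_job                 # SLURM jobs needed
total_cores = workers_requested * threads_per_worker          # cores across all workers
memory_per_worker_gb = memory_per_job_gb / processes_per_job  # memory share of one worker
total_memory_gb = workers_requested * memory_per_worker_gb    # memory across all workers

print(f"{jobs} jobs, {threads_per_worker} threads per worker")
print(f"{total_cores} cores, about {total_memory_gb:.0f} GB in total")
```

Each worker here runs two threads, whereas the local cluster above used `threads_per_worker=1`. The computed memory of roughly 400 GB is consistent with the 399.84 GB shown by the client.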


## Outlook

This is only a taste. For more, check the [Numba docs](https://numba.pydata.org/) and read this [blog post](https://blog.dask.org/2019/04/09/numba-stencil) on the [Dask blog](https://blog.dask.org/).

## Complete listing of software used here

In [33]:
%pip list

Package            Version          
------------------ -----------------
asciitree          0.3.3            
aspy.yaml          1.2.0            
backcall           0.1.0            
bokeh              1.1.0            
certifi            2019.3.9         
cfgv               1.6.0            
cftime             1.0.3.4          
Click              7.0              
cloudpickle        1.0.0            
cycler             0.10.0           
cytoolz            0.9.0.1          
dask               1.2.0            
dask-jobqueue      0.4.1+32.g9c3371d
decorator          4.4.0            
distributed        1.27.1           
docrep             0.2.5            
fasteners          0.14.1           
heapdict           1.0.0            
identify           1.4.3            
importlib-metadata 0.13             
ipykernel          5.1.1            
ipython            7.5.0            
ipython-genutils   0.2.0            
jedi               0.13.3           
Jinja2             2.10.1           
j

In [34]:
%conda list --explicit

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/git-lfs-2.7.2-0.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2019.3.9-hecc5488_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-8.2.0-hdf63c60_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgfortran-ng-7.3.0-hdf63c60_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-8.2.0-hdf63c60_1.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/bzip2-1.0.6-h14c3975_1002.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/expat-2.2.5-hf484d3e_1002.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/icu-58.2-hf484d3e_1000.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/jpeg-9c-h14c3975_1001.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/libffi-3.2.1-he1b5a44_1006.tar.bz2
https://conda.anaconda.org/conda-forge/linux-64/