# `dask.delayed`: processes vs threads

Here we use a (quite inefficient) pure-Python implementation of the Euclidean distance matrix to understand how `dask.delayed` behaves with Python code. Recall that previously we ran SciPy's `cdist` function with `dask.delayed`.

In [1]:
import dask
import numpy as np

In [2]:
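def euclidean_distance_matrix(x, y):
    """Pairwise Euclidean distances between the rows of x and the rows of y."""
    # One row per sample in x, one column per sample in y.
    dist_matrix = np.empty((x.shape[0], y.shape[0]))
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            diff = xi - yj
            # Euclidean distance: square root of the sum of squared differences.
            dist_matrix[i, j] = np.sqrt((diff ** 2).sum())
    return dist_matrix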

In [3]:
x = np.random.random([1000, 50])

In [4]:
%%time
edm = euclidean_distance_matrix(x, x)

CPU times: user 3.15 s, sys: 5.76 ms, total: 3.16 s
Wall time: 3.16 s
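
As a sanity check on a loop-based distance matrix like the one above, its output can be compared with a vectorized NumPy formula on a small input. The sketch below is self-contained; `loop_edm` and `vectorized_edm` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import numpy as np

def loop_edm(x, y):
    """Pairwise Euclidean distances computed with explicit Python loops."""
    out = np.empty((x.shape[0], y.shape[0]))
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i, j] = np.sqrt(((xi - yj) ** 2).sum())
    return out

def vectorized_edm(x, y):
    """Same quantity via broadcasting: (n, 1, d) - (1, m, d) -> (n, m, d)."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Small random inputs with different numbers of rows.
rng = np.random.default_rng(0)
a = rng.random((5, 3))
b = rng.random((4, 3))

loop_result = loop_edm(a, b)
vectorized_result = vectorized_edm(a, b)

# Prints whether both implementations agree up to floating-point tolerance.
print(np.allclose(loop_result, vectorized_result))
```

The vectorized version builds an intermediate array of shape `(n, m, d)`, so it trades memory for speed and is meant here only as a reference for small inputs.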


<mark>Question</mark>: The following dask graph runs `euclidean_distance_matrix` twice on the same input data. Based on the time measured in the previous cell, estimate how long the graph will take to run. Then run the cells and check your answer.

In [5]:
graph = [dask.delayed(euclidean_distance_matrix)(x, x),
         dask.delayed(euclidean_distance_matrix)(x, x)]

In [6]:
%%time
edm = dask.compute(graph, scheduler='threads')
# Pure-Python loops hold the GIL, so the two tasks do not actually run in parallel.

CPU times: user 6.35 s, sys: 31.4 ms, total: 6.38 s
Wall time: 6.37 s
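
The same effect can be reproduced without dask. The following is a minimal sketch using only the standard library, assuming a standard CPython build with the GIL enabled; `busy_sum` and `N` are hypothetical names chosen for illustration. It runs a CPU-bound pure-Python function twice, first serially and then in two threads, and prints both timings.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def busy_sum(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 2_000_000  # arbitrary workload size, chosen only for illustration

# Run the function twice, one call after the other.
start = time.perf_counter()
serial = [busy_sum(N), busy_sum(N)]
serial_time = time.perf_counter() - start

# Run the same two calls in a pool of two threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    threaded = list(pool.map(busy_sum, [N, N]))
threaded_time = time.perf_counter() - start

print(f"serial:   {serial_time:.2f} s")
print(f"threaded: {threaded_time:.2f} s")
```

With the GIL in place, the two timings are expected to be similar, because only one thread executes Python bytecode at a time. Exact numbers depend on the machine and the interpreter build.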


<mark>Question</mark>: Estimate how long it will take to run the following cell. Then run it and check your answer.

In [7]:
%%time
edm = dask.compute(graph, scheduler='processes')
# Each process has its own interpreter and its own GIL, so the two tasks run in parallel.

CPU times: user 37.7 ms, sys: 87.8 ms, total: 126 ms
Wall time: 4.31 s


<mark>Question</mark>: Could you explain the results?
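
One ingredient worth looking at when reasoning about the process-based run is data transfer: worker processes do not share memory with the parent, so inputs and results have to be serialized and copied between processes. The sketch below is only a rough estimate, assuming a pickle-like serializer; it measures the serialized size of an input array and of a result-sized array with the same shapes as in the cells above. The variable names are hypothetical.

```python
import pickle

import numpy as np

# Same shapes as in the notebook: 1000 samples with 50 features each,
# and a 1000 x 1000 distance matrix as the result.
x = np.random.random([1000, 50])
result = np.zeros((1000, 1000))

# Size in bytes of each array once serialized with pickle.
input_bytes = len(pickle.dumps(x))
output_bytes = len(pickle.dumps(result))

print(f"pickled input:  {input_bytes / 1e6:.2f} MB")
print(f"pickled result: {output_bytes / 1e6:.2f} MB")
```

Serialization is likely only part of the overhead; starting the worker processes also takes time. Note as well that `%%time` reports the CPU time of the parent process only, which is why the CPU time in the process-based run is so small compared with the wall time.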