<img src="https://raw.githubusercontent.com/dask/dask/main/docs/source/images/dask_horizontal.svg"
     width="60%"
     alt="Dask logo" />

# Dask Delayed

Dask DataFrames, Dask Arrays and Dask-ML are parallel versions of PyData libraries you likely know and love. But sometimes we encounter problems that could benefit from parallel computing but do not fit neatly into a DataFrame, Array or machine-learning workflow.

Dask delayed is an interface that can be used to parallelize existing Python code and custom algorithms. 

A first step in deciding whether we can use `dask.delayed` is to identify whether the code contains some level of parallelism that we haven't exploited yet. If it does, `dask.delayed` can often take care of it.

The following two functions perform simple computations, and we use `sleep` to simulate work.

In [None]:
from time import sleep

def inc(x):
    """Increments x by one"""
    sleep(1)
    return x + 1

def add(x, y):
    """Adds x and y"""
    sleep(1)
    return x + y

Let's do some operations and time these functions using the `%%time` magic at the beginning of the cell. 

In [None]:
%%time

x = inc(1)
y = inc(2)
z = add(x, y)

The cell above took about three seconds to run because we call each function sequentially, one after the other. The computations above can be represented by the following graph:

<img src="https://raw.githubusercontent.com/dask/dask/main/docs/source/images/inc-add.svg" 
     width="55%"
     alt="Dask graph" />

Looking at the task graph, the opportunity for parallelization is more evident, since the two calls to the `inc` function are completely independent of one another. Let's explore how `dask.delayed` can help us with this.


### `dask.delayed` 

We can use `dask.delayed` to transform the `inc` and `add` functions into "lazy" versions of themselves. 

In [None]:
from dask import delayed

In [None]:
%%time

# x = inc(1)
# y = inc(2)
# z = add(x, y)

a = delayed(inc)(1)
b = delayed(inc)(2)
c = delayed(add)(a, b)

When we call the `delayed` version of a function and pass it arguments, the original function isn't actually called yet, which is why the cell finishes very quickly. Instead, a `Delayed` object is created, which keeps track of the function to call and the arguments to pass to it.
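
As a minimal sketch of this laziness, the snippet below uses a hypothetical helper, `record_inc`, that logs every real call (it is not part of this notebook). Building the `Delayed` object should leave the log empty, and only `.compute()` should trigger the call.

```python
from dask import delayed

calls = []  # records every real execution of `record_inc`


def record_inc(x):
    """Hypothetical helper: increments x and logs that it actually ran."""
    calls.append(x)
    return x + 1


lazy = delayed(record_inc)(10)  # builds a Delayed object; nothing runs yet
calls_before = list(calls)      # snapshot of the log before computing

value = lazy.compute()          # the wrapped function is called here
calls_after = list(calls)       # snapshot of the log after computing

print(calls_before, value, calls_after)
```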

If we inspect `c`, we will notice that instead of the value five, we have what is called a `Delayed` object.

In [None]:
print(c)

We can visualize this object by doing:

In [None]:
c.visualize(format="svg")

At this point the object `c` holds all the information we need to compute the result. We can evaluate the result with `.compute()`.

In [None]:
%%time

c.compute()

Notice that the computation now took about 2s instead of 3s. This is because the two `inc` computations run in parallel.
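
To reproduce this comparison outside IPython or Jupyter, where `%%time` is not available, the sketch below times both versions with `time.perf_counter`. The helpers `slow_inc` and `slow_add` are hypothetical stand-ins that sleep for 0.2 s instead of 1 s, and passing `scheduler="threads"` explicitly is an assumption about which scheduler you want to use.

```python
from time import perf_counter, sleep

from dask import delayed


def slow_inc(x):
    """Hypothetical stand-in for `inc` with a shorter sleep."""
    sleep(0.2)
    return x + 1


def slow_add(x, y):
    """Hypothetical stand-in for `add` with a shorter sleep."""
    sleep(0.2)
    return x + y


# Serial version: three calls, one after the other.
start = perf_counter()
serial_result = slow_add(slow_inc(1), slow_inc(2))
serial_time = perf_counter() - start

# Lazy version: build the graph first, then time only the computation.
lazy = delayed(slow_add)(delayed(slow_inc)(1), delayed(slow_inc)(2))
start = perf_counter()
parallel_result = lazy.compute(scheduler="threads")
parallel_time = perf_counter() - start

print(f"serial:   {serial_result} in {serial_time:.2f}s")
print(f"parallel: {parallel_result} in {parallel_time:.2f}s")
```

The exact timings depend on the machine, but the parallel run should take roughly the time of two sleeps rather than three.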

## Parallelizing a `for`-loop

When we perform the same group of operations multiple times in a `for` loop, there is a good chance that we can perform these computations in parallel. For example, the following serial code can be parallelized using `delayed`:

In [None]:
data = list(range(8))

#### Sequential code

In [None]:
%%time
results = []
for i in data:
    y = inc(i)         # do something here
    results.append(y)
    
total = sum(results)  # do something here

In [None]:
print(f'{total = }')

### Exercise 1 

Notice that the `inc` calls are independent of one another and can run in parallel. Use `delayed` to parallelize the sequential code above, compute the `total`, and time it using `%%time`.

Run the cell below to see the solution.

In [None]:
results = []
for i in data:
    y = delayed(inc)(i)    
    results.append(y)
    
total = delayed(sum)(results)

In the code above, the `sum` step does not run in parallel with the `inc` steps, because it depends on each of them. That is why it needs to be wrapped in `delayed` too: the `inc` steps are parallelized, then aggregated by the `sum` step.

Notice that we can apply `delayed` to built-in functions, as we did with `sum` in the code above.
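
The same pattern should work for other built-ins. The sketch below is an illustration rather than part of the original exercise: it wraps `max` and `sorted`, and the helper `square` and all variable names are hypothetical.

```python
from dask import delayed


def square(x):
    """Hypothetical helper: returns x squared."""
    return x * x


# A list of Delayed objects, one per input value.
squares = [delayed(square)(i) for i in range(5)]

# Built-ins can be wrapped just like our own functions,
# and keyword arguments are passed through to the wrapped call.
largest = delayed(max)(squares)
ordered = delayed(sorted)(squares, reverse=True)

largest_value = largest.compute()
ordered_values = ordered.compute()

print(largest_value)
print(ordered_values)
```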

In [None]:
total

In [None]:
total.visualize(format="svg")

In [None]:
%%time
total.compute()

###  The `@delayed` syntax 

`delayed` can also be used as a decorator, by placing `@delayed` above the definition of the function you want to parallelize.

In [None]:
@delayed                    
def double(x):
    """Doubles x"""
    sleep(1)
    return 2*x 

Then, when we call this new `double` function, we obtain a `Delayed` object:

In [None]:
d = double(4)
print(d)

In [None]:
%%time
d.compute()

### Exercise 2

Using the `delayed` decorator, create parallel versions of `inc` and `add`:

```python
def inc(x):
    """Increments x by one"""
    sleep(1)
    return x + 1

def add(x, y):
    """Adds x and y"""
    sleep(1)
    return x + y
```

In [None]:
@delayed
def inc(x):
    """Increments x by one"""
    sleep(1)
    return x + 1

@delayed
def add(x, y):
    """Adds x and y"""
    sleep(1)
    return x + y

``Delayed`` objects support several standard Python operations, each of which creates another ``Delayed`` object representing the result:

- Arithmetic operators, e.g. `*`, `-`, `+`
- Item access and slicing, e.g. `x[0]`, `x[1:3]`
- Attribute access, e.g. `x.size`
- Method calls, e.g. `x.index(0)`

For example you can do:

In [None]:
result = (inc(5) * inc(7)) + (inc(3) * inc(2))
result.visualize(format="svg")

In [None]:
%%time
result.compute()
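
The cell above exercises the arithmetic operators. Item access, slicing and method calls can be sketched the same way. The helper `make_values` below is hypothetical and only exists to produce a list wrapped in a `Delayed` object.

```python
from dask import delayed


@delayed
def make_values(n):
    """Hypothetical helper: returns the list [0, 10, 20, ...] with n items."""
    return [10 * i for i in range(n)]


values = make_values(5)      # Delayed wrapping [0, 10, 20, 30, 40]
second = values[1]           # item access -> Delayed
middle = values[1:4]         # slicing     -> Delayed
where = values.index(30)     # method call -> Delayed
combined = second + where    # arithmetic  -> Delayed

combined_value = combined.compute()
middle_values = middle.compute()

print(combined_value)
print(middle_values)
```

Each intermediate name is itself a `Delayed` object, so nothing is evaluated until one of the `.compute()` calls.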

## Another for-loop example 

Let's say we want to perform some operations like `inc`, `double` and `add` on a list of data, and finally aggregate all the results. We can use our `delayed`-decorated functions to perform these computations faster.
The serial version of the code below would take approximately 24 seconds. Let's see how long the parallel version takes:

In [None]:
data = list(range(8))

output = []
for x in data:
    a = inc(x)     # parallel version
    b = double(x)  # parallel version
    c = add(a, b)  # parallel version
    output.append(c)

total = delayed(sum)(output)
total

Notice that `inc`, `double` and `add` in the code above are already the parallel versions, since we decorated them with `@delayed`.

In [None]:
total.visualize(format="svg")

### Exercise: How long will this task graph take to compute on a machine with 2 cores?

In [None]:
%%time
total.compute()
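
One way to check an estimate like this is to limit the number of worker threads yourself. The sketch below rebuilds the same graph shape with hypothetical `quick_*` helpers that sleep for 0.05 s instead of 1 s, and assumes the threaded scheduler, whose `num_workers` keyword sets the size of the thread pool.

```python
from time import perf_counter, sleep

from dask import delayed


@delayed
def quick_inc(x):
    """Hypothetical stand-in for `inc` with a shorter sleep."""
    sleep(0.05)
    return x + 1


@delayed
def quick_double(x):
    """Hypothetical stand-in for `double` with a shorter sleep."""
    sleep(0.05)
    return 2 * x


@delayed
def quick_add(x, y):
    """Hypothetical stand-in for `add` with a shorter sleep."""
    sleep(0.05)
    return x + y


# Same structure as the loop above: inc and double feed add, then sum.
lazy_output = [quick_add(quick_inc(x), quick_double(x)) for x in range(8)]
lazy_total = delayed(sum)(lazy_output)

timings = {}
for workers in (1, 2, 4):
    start = perf_counter()
    value = lazy_total.compute(scheduler="threads", num_workers=workers)
    timings[workers] = perf_counter() - start
    print(f"{workers} worker(s): total={value}, {timings[workers]:.2f}s")
```

Comparing the timings for one and two workers gives a rough idea of how the 24 one-second tasks in the exercise would be shared between two cores.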

## Extra resources

For more examples on `dask.delayed` check:
- Main Dask tutorial: [Delayed lesson](https://github.com/dask/dask-tutorial/blob/main/01_dask.delayed.ipynb)
- More examples on Delayed: [PyData global - Dask tutorial - Delayed](https://github.com/coiled/pydata-global-dask/blob/master/1-delayed.ipynb)
- Short screencast on Dask delayed: [How to parallelize Python code with Dask Delayed (3min)](https://www.youtube.com/watch?v=-EUlNJI2QYs)
- [Dask Delayed documentation](https://docs.dask.org/en/latest/delayed.html)
- [Delayed Best Practices](https://docs.dask.org/en/latest/delayed-best-practices.html)
