## Initialisation

We start by creating a cluster. Dask offers several ways to start one; see [this notebook](06.dask_cluster_examples.ipynb) for more information.  
For now, we just start a simple local cluster.

In [None]:
from dask.distributed import Client

client = Client(n_workers=4)

## Exercise: Parallelizing a for-loop code with control flow

Often we want to delay only *some* functions and run the others immediately. This is especially helpful when the immediate functions are fast and help us determine which slower functions to call. This decision, to delay or not to delay, is usually where we need to be thoughtful when using `dask.delayed` and Dask futures.

In the example below we iterate through a list of inputs. If an input is even, we want to call `double`. If it is odd, we want to call `inc`. This `is_even` decision to call `double` or `inc` has to be made immediately (not lazily) so that our graph-building Python code can proceed.

In [None]:
import time
import dask

def double(x):
    time.sleep(1)
    return 2 * x

def inc(x):
    time.sleep(1)
    return x + 1

def is_even(x):
    return not x % 2

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [None]:
%%time
# Sequential code

results = []
for x in data:
    if is_even(x):
        y = double(x)
    else:
        y = inc(x)
    results.append(y)
    
total = sum(results)
print(total)

In [None]:
%%time
# Your parallel code here...
# TODO: parallelize the sequential code above using dask.delayed
# You will need to delay some functions, but not all

In [None]:
def define_delayed(data):
    """Build one lazy task per input.

    `is_even` runs immediately; only `double` and `inc` are delayed.
    """
    results_delayed = []

    for x in data:
        if is_even(x):
            y = dask.delayed(double)(x)

        else:
            y = dask.delayed(inc)(x)

        results_delayed.append(y)

    return results_delayed

In [None]:
print("Using delayed functions ....")
results_delayed = define_delayed(data)
    
total = dask.delayed(sum)(results_delayed)

In [None]:
%time total.compute()

In [None]:
total.visualize()

## Exercise: Parallelizing a for-loop code with control flow (with futures)

In [None]:
%%time
# Your parallel code here...
# TODO: parallelize the sequential code above using dask.futures
# You will need to delay some functions, but not all

In [None]:
def complete_function(x):
    """Define the whole processing
    """

    if is_even(x):
        y = double(x)
    else:
        y = inc(x)

    return y


In [None]:
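print("Using futures ....")
# Submit one task per input; the even/odd decision runs inside each task
total_futures = client.map(complete_function, data)
# Block until every task has finished and bring the results back locally
results = client.gather(total_futures)
total = sum(results)
print(total)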

### Some questions to consider:

-  What are other examples of control flow where we can't use delayed?
-  What would have happened if we had delayed the evaluation of `is_even(x)` in the example above?
-  What are your thoughts on delaying `sum`?  This function is both computational and fast to run.
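
For the second question, a minimal sketch can help. It assumes only that `dask` is installed and redefines `is_even` so the cell runs on its own. The names `lazy_check` and `raised` are illustrative and do not appear in the exercise. The sketch wraps the predicate in `dask.delayed` and then tries to use the result in an `if` statement, which is what the graph-building loop would have to do.

```python
import dask

def is_even(x):
    return not x % 2

# Wrapping the predicate makes its result lazy: we get a Delayed object, not a bool.
lazy_check = dask.delayed(is_even)(4)
print(type(lazy_check).__name__)

# Python's `if` needs a concrete truth value, which a Delayed cannot provide,
# so branching on it raises a TypeError while the graph is still being built.
try:
    if lazy_check:
        pass
    raised = False
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Computing the value first yields an ordinary bool that `if` can use.
print(lazy_check.compute())
```

This is why the exercise evaluates `is_even` eagerly: the branch has to be resolved in plain Python before Dask knows which task to add to the graph.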

## Close the Client

Before moving on to the next exercise, make sure to close your client or stop this kernel.

In [None]:
client.close()