# 5. Dask Client

## Setting up an efficient Dask Client

#### Find your number of cores & RAM

You can find your number of CPUs and RAM using the `multiprocessing` and `psutil` libraries as follows:

In [None]:
import multiprocessing
import psutil

# Cores
cpus = multiprocessing.cpu_count()

# RAM in GiB (bytes divided by 1024 three times)
ram = psutil.virtual_memory().total / 1024 / 1024 / 1024

print(f"Cores: {cpus}")
print(f"RAM: {ram:.1f} GiB")

#### Setting up an efficient Dask Client

This is a difficult process, and there are no hard-and-fast rules for it. The following worked for me on my macOS machine:

#### For Dask Data Structures (DataFrame, Bag, Array)

* `n_workers` = `multiprocessing.cpu_count()` / 4 (rounded down, but at least 1)
* `memory_limit` = `psutil.virtual_memory().total / 1024 / 1024 / 1024` / `n_workers` (GiB per worker)
* `threads_per_worker` = 2 - 4 (depending on the process)

#### For Dask Parallel Processing (Delayed)
* `n_workers` = `multiprocessing.cpu_count()`
* `memory_limit` = `psutil.virtual_memory().total / 1024 / 1024 / 1024` / `n_workers` (GiB per worker)
* `threads_per_worker` = 1 - 4 (depending on the process - always start with 1!)


This should provide a good starting point for you. Note that I would recommend only changing the `threads_per_worker` value depending upon your process until you're more comfortable with Dask.
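The rules of thumb above can be sketched as a small helper. This is only a minimal sketch: `suggest_client_settings` and its `workload` argument are hypothetical names of my own, not part of Dask. It takes the core count and total RAM as plain arguments, so it runs without `psutil` installed, and it assumes 2 threads per worker for data structures and 1 for Delayed as starting values.

```python
def suggest_client_settings(cpus, total_ram_bytes, workload="collections"):
    """Return starting-point Client settings from the rules of thumb above.

    `workload` is a hypothetical switch: "collections" for Dask data
    structures (DataFrame, Bag, Array), "delayed" for Dask Delayed.
    """
    if workload == "collections":
        # A quarter of the cores, but never fewer than one worker
        n_workers = max(1, cpus // 4)
        threads_per_worker = 2
    elif workload == "delayed":
        # One worker per core, starting with a single thread each
        n_workers = max(1, cpus)
        threads_per_worker = 1
    else:
        raise ValueError(f"Unknown workload: {workload!r}")

    # Split total RAM (converted to GiB) evenly across the workers
    ram_gib = total_ram_bytes / 1024 / 1024 / 1024
    memory_limit = f"{ram_gib / n_workers:.2f} GiB"

    return {
        "n_workers": n_workers,
        "threads_per_worker": threads_per_worker,
        "memory_limit": memory_limit,
    }


# Example: a machine with 8 cores and 16 GiB of RAM
settings = suggest_client_settings(cpus=8, total_ram_bytes=16 * 1024**3)
print(settings)
```

The returned dictionary uses the same keyword names as the `Client` arguments in this section, so it could be passed on as `Client(**settings)`.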

Note that Mac and Linux machines are more susceptible to memory errors. If you get a lot of memory errors, reduce the number of threads.

In my experience of running locally...

1. For Dask data structures, RAM is a larger limiting factor.
2. For parallelisation with Dask Delayed, CPUs are a larger limiting factor.

## Dask Data Structures Client

In [None]:
from dask.distributed import Client
import multiprocessing
import psutil

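# Use a quarter of the cores, but never fewer than one worker
# (with fewer than 4 cores, int(cpu_count() / 4) would be 0 and the division below would fail)
n_workers = max(1, multiprocessing.cpu_count() // 4)
threads_per_worker = 2
# memory_limit applies per worker, so split total RAM (in GiB) evenly
memory_limit = f"{psutil.virtual_memory().total / 1024 / 1024 / 1024 / n_workers:.2f} GiB"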

client = Client(
    n_workers=n_workers,
    threads_per_worker=threads_per_worker,
    memory_limit=memory_limit
)

client

## Dask Delayed Client

In [None]:
from dask.distributed import Client
import multiprocessing
import psutil

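# One worker per core, starting with a single thread each
n_workers = multiprocessing.cpu_count()
threads_per_worker = 1
# memory_limit applies per worker, so split total RAM (in GiB) evenly
memory_limit = f"{psutil.virtual_memory().total / 1024 / 1024 / 1024 / n_workers:.2f} GiB"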

client = Client(
    n_workers=n_workers,
    threads_per_worker=threads_per_worker,
    memory_limit=memory_limit
)

client

#### Useful Client Commands

Bring up the Cluster interface

In [None]:
client.cluster

Close the client

In [None]:
client.close()