# Using **dask**

[dask](https://dask.org/) is a Python package built upon the scientific stack that lets Python scale from interactive sessions to multi-core and multi-node computing.

Of particular relevance to **SEGY-SAK** is that an `xarray.Dataset` loads naturally into `dask`.
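As a minimal sketch of what "loads naturally" means, an `xarray.Dataset` can wrap a `dask.array` directly, and arithmetic on it stays lazy. The cube below is a hypothetical synthetic stand-in (250 inlines × 10 crosslines × 20 samples), not SEGY-SAK output.

```python
import numpy as np
import xarray as xr
import dask.array as da

# Hypothetical synthetic cube held as a lazy dask array (nothing is allocated yet)
lazy_values = da.zeros((250, 10, 20), chunks=(100, 10, 20), dtype=np.float32)

ds = xr.Dataset(
    {"data": (("iline", "xline", "twt"), lazy_values)},
    coords={
        "iline": np.arange(10000, 10250),
        "xline": np.arange(2300, 2310),
        "twt": np.arange(20) * 4.0,
    },
)

# Arithmetic on the Dataset stays lazy: the result still wraps a dask array
scaled = ds["data"] * 2.0
```

Calling `float(scaled.sum())` (or `.compute()`) is what finally triggers evaluation.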

## Imports and Setup

Here we import the plotting tools and `numpy`, and set up the `dask.distributed.Client`, which will automatically start a `LocalCluster`. Printing the client returns details about the dashboard link and resources.

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import numpy as np
from segysak import open_seisnc, segy

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
from dask.distributed import Client

client = Client()
client

We can also scale the cluster to be a bit smaller.

In [None]:
client.cluster.scale(2, memory='0.5gb')
client
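Instead of scaling afterwards, a cluster can also be sized when it is created. This is a sketch assuming two single-threaded workers with a 500 MB memory limit each (example values, not a recommendation); `processes=False` runs the workers as threads in the current process, and the `small_*` names are hypothetical so they don't overwrite the `client` above.

```python
from dask.distributed import Client, LocalCluster

small_cluster = LocalCluster(
    n_workers=2,             # two workers ...
    threads_per_worker=1,    # ... each with a single thread
    memory_limit="500MB",    # per-worker memory limit (assumed value)
    processes=False,         # run workers as threads in this process
    dashboard_address=":0",  # let the OS pick a free port for the dashboard
)
small_client = Client(small_cluster)

n_workers = len(small_cluster.workers)

# Release the resources when finished
small_client.close()
small_cluster.close()
```

While the client is open, `small_client.dashboard_link` gives the dashboard URL.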

## Lazy loading from SEISNC using chunking

If your data is in SEG-Y format, it must be converted to SEISNC before it can be used with `dask`. If you do this with the CLI, it only needs to happen once.

In [None]:
segy_file = '../data/volve10r12-full-twt-sub3d.sgy'
seisnc_file = '../data/volve10r12-full-twt-sub3d.seisnc'
segy.segy_converter(segy_file, seisnc_file, iline=189, xline=193, cdpx=181, cdpy=185, silent=True)
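Because the conversion is a one-off, a notebook can skip it when the output already exists. The sketch below uses a hypothetical helper, `convert_once`, and a stand-in converter so it runs without any SEG-Y data; neither is part of SEGY-SAK.

```python
from pathlib import Path
import tempfile


def convert_once(src, dst, converter, **kwargs):
    """Hypothetical helper: call converter(src, dst, **kwargs) only if dst is missing.

    Returns True if a conversion ran, False if the output was already present.
    """
    dst = Path(dst)
    if dst.exists():
        return False
    converter(str(src), str(dst), **kwargs)
    return True


# Demonstration with a stand-in converter that records its arguments and writes a file
calls = []


def fake_converter(src, dst, **kwargs):
    calls.append(kwargs)
    Path(dst).write_text("converted")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "cube.seisnc"
    first = convert_once("cube.sgy", target, fake_converter, iline=189, xline=193)
    second = convert_once("cube.sgy", target, fake_converter, iline=189, xline=193)
```

In this notebook the call would look like `convert_once(segy_file, seisnc_file, segy.segy_converter, iline=189, xline=193, cdpx=181, cdpy=185, silent=True)`.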

By passing the `chunks` argument to `open_seisnc`, we can ask `dask` to fetch the data in chunks of size *n*. In this example, the `iline` dimension is chunked in groups of 100. The valid keys for `chunks` depend on the dataset, but any of its dimensions can be used.
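The same `{dimension: size}` mapping idea can be tried with xarray's own `Dataset.chunk` on a small synthetic cube (hypothetical sizes). Chunking 250 inlines in groups of 100 leaves a final partial chunk of 50, and several dimensions can be chunked at once.

```python
import numpy as np
import xarray as xr

# Hypothetical synthetic cube: 250 inlines x 10 crosslines x 20 samples
ds = xr.Dataset(
    {"data": (("iline", "xline", "twt"),
              np.random.default_rng(0).standard_normal((250, 10, 20)))},
    coords={
        "iline": np.arange(10000, 10250),
        "xline": np.arange(2300, 2310),
        "twt": np.arange(20) * 4.0,
    },
)

# Chunk along iline only, then along iline and twt together
by_iline = ds.chunk({"iline": 100})
by_iline_and_twt = ds.chunk({"iline": 100, "twt": 8})

print(by_iline["data"].chunks)          # block sizes along each dimension
print(by_iline_and_twt["data"].chunks)
```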

Even though the size of the dataset is `2.14GB`, it hasn't yet been loaded into memory, nor will `dask` load it entirely unless an operation demands it.
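A minimal sketch of this behaviour with plain `dask.array`: the hypothetical array below describes 4 GB of `float32` values, but only the single chunk needed for the first slice is ever materialised.

```python
import dask.array as da

# 1000 x 1000 x 1000 float32 values = 4 GB described lazily; nothing is allocated yet
big = da.zeros((1000, 1000, 1000), chunks=(1, 1000, 1000), dtype="float32")
nominal_gb = big.nbytes / 1e9

# Only the chunk holding index 0 along the first axis is computed here (about 4 MB)
first_slice = big[0].compute()
```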

In [None]:
seisnc = open_seisnc(seisnc_file, chunks={'iline': 100})
seisnc.seis.humanbytes

Let's see what our dataset looks like. Note that the data variables are `dask.array`s, which means they are references to the data on disk. The dimension coordinates, however, are loaded into memory so that `dask` knows how to manage your dataset.

In [None]:
seisnc
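The split between lazy data variables and in-memory dimension coordinates can be checked on a small synthetic dataset (hypothetical values, chunked along `iline` as above).

```python
import numpy as np
import xarray as xr
import dask.array as da

# Hypothetical synthetic cube, chunked along iline
ds = xr.Dataset(
    {"data": (("iline", "xline", "twt"), np.zeros((250, 10, 20), dtype=np.float32))},
    coords={
        "iline": np.arange(10000, 10250),
        "xline": np.arange(2300, 2310),
        "twt": np.arange(20) * 4.0,
    },
).chunk({"iline": 100})

# The data variable is a lazy dask array ...
data_is_lazy = isinstance(ds["data"].data, da.Array)
# ... while the dimension coordinates are plain in-memory NumPy arrays
coords_in_memory = all(isinstance(ds[dim].data, np.ndarray) for dim in ds.dims)
```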

## Operations on SEISNC using `dask`

In this simple example, we calculate the mean of the entire cube. If you check the dashboard while running this example yourself, you can see the task graph and the task stream as they execute.

In [None]:
mean = seisnc.data.mean()
mean

Whoa-oh, the mean is what? Yeah, `dask` won't calculate anything until you ask it to. This means you can chain computations together into a task graph for lazy evaluation. To get the mean, try this:

In [None]:
mean.compute().values
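Chaining can be sketched with plain `dask.array` on hypothetical random amplitudes: several reductions are built lazily, then `dask.compute` evaluates them together in one pass over the shared graph, using whichever scheduler is active (the `Client` above, if one is running).

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical synthetic amplitudes standing in for the seismic cube
rng = np.random.default_rng(42)
cube_np = rng.standard_normal((250, 10, 20))
cube = da.from_array(cube_np, chunks=(100, 10, 20))

# Each line only extends the task graph; nothing is computed yet
mean = cube.mean()
rms = da.sqrt((cube ** 2).mean())
peak = abs(cube).max()

# One call evaluates all three results, sharing the chunk reads between them
mean_v, rms_v, peak_v = dask.compute(mean, rms, peak)
```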

## Plotting with `dask`

Lazy loading means we can plot whatever we want using `xarray`-style slicing, and `dask` will fetch only the data it needs.
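To see that only the needed chunks are read, this sketch builds a hypothetical cube with `dask.array.map_blocks`, where the block function (`load_block`, a stand-in for reading from disk) records each chunk it produces. Selecting one inline then triggers a single chunk load. `scheduler="synchronous"` is used so the recording list lives in this process.

```python
import numpy as np
import dask.array as da

loaded_blocks = []  # records which chunks were actually "read"


def load_block(block_info=None):
    # block_info[None] describes the output chunk being produced
    info = block_info[None]
    loaded_blocks.append(tuple(info["chunk-location"]))
    (i0, i1), (x0, x1), (t0, t1) = info["array-location"]
    # Fill each trace with its global inline index so the slice is easy to check
    return np.broadcast_to(
        np.arange(i0, i1, dtype=np.float32)[:, None, None],
        (i1 - i0, x1 - x0, t1 - t0),
    ).copy()


# 250 inlines chunked in groups of 100, as in the example above
cube = da.map_blocks(
    load_block,
    chunks=((100, 100, 50), (10,), (20,)),
    dtype=np.float32,
    meta=np.empty((0, 0, 0), dtype=np.float32),
)

# Inline 150 lives in the second chunk along the first axis
one_inline = cube[150].compute(scheduler="synchronous")
```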

In [None]:
fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(20, 10))

# Select one inline, one crossline and one time slice; dask reads only the chunks each needs
iline = seisnc.sel(iline=10100).transpose('twt', 'xline').data
xline = seisnc.sel(xline=2349).transpose('twt', 'iline').data
zslice = seisnc.sel(twt=2900, method='nearest').transpose('iline', 'xline').data

# Symmetric colour limit from the 0.1% and 99.9% amplitude quantiles of the inline
q = iline.quantile([0, 0.001, 0.5, 0.999, 1]).values
rq = np.max(np.abs([q[1], q[-2]]))

# Top row: xarray's built-in plotting with robust colour scaling
iline.plot(robust=True, ax=axs[0, 0], yincrease=False)
xline.plot(robust=True, ax=axs[0, 1], yincrease=False)
zslice.plot(robust=True, ax=axs[0, 2])

# Bottom row: matplotlib imshow with a symmetric seismic colour map
imshow_kwargs = dict(
    cmap='seismic', aspect='auto', vmin=-rq, vmax=rq, interpolation='bicubic'
)

axs[1, 0].imshow(iline.values, **imshow_kwargs)
axs[1, 0].set_title('iline')
axs[1, 1].imshow(xline.values, **imshow_kwargs)
axs[1, 1].set_title('xline')
axs[1, 2].imshow(zslice.values, origin='lower', **imshow_kwargs)
axs[1, 2].set_title('twt')
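The `rq` colour limit above takes the larger magnitude of the 0.1% and 99.9% quantiles, so the `seismic` colour map is centred on zero while ignoring extreme outliers. As a sketch, the same calculation can be wrapped in a hypothetical helper, `symmetric_limit`, and checked on evenly spaced values.

```python
import numpy as np


def symmetric_limit(values, tail=0.001):
    """Hypothetical helper mirroring `rq`: the larger magnitude of the lower and
    upper `tail` quantiles, for use as vmin=-limit, vmax=limit."""
    lo, hi = np.quantile(np.asarray(values), [tail, 1.0 - tail])
    return float(np.max(np.abs([lo, hi])))


# Evenly spaced values from -2 to 1 in steps of 0.003
rq = symmetric_limit(np.linspace(-2.0, 1.0, 1001))
```

Here the lower tail dominates, so the limit is set by the 0.1% quantile (-1.997) rather than the 99.9% quantile (0.997).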