<img src="http://hilpisch.com/tpq_logo.png" alt="The Python Quants" width="35%" align="right" border="0"><br>

# Mathematics Basics

**With `NumPy, pandas & dask`**

&copy; Dr. Yves J. Hilpisch | The Python Quants GmbH

http://tpq.io | [training@tpq.io](mailto:training@tpq.io) | [@dyjh](http://twitter.com/dyjh)

## `dask`

From https://dask.org:

> Dask natively scales Python &mdash; Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love<br><br>
> Dask is open source and freely available. It is developed in coordination with other community projects like NumPy, pandas, and scikit-learn.

## `dask` + `NumPy`

### The Client

In [None]:
!git clone https://github.com/tpq-classes/mathematics_basics.git
import sys
sys.path.append('mathematics_basics')


In [None]:
from dask.distributed import Client

In [None]:
client = Client(processes=False, threads_per_worker=4,
                n_workers=1, memory_limit='4GB')

In [None]:
client

### `dask` Array

In [None]:
import dask.array as da

In [None]:
x = da.random.standard_normal((15000, 15000),  # total size
                              chunks=(1500, 1500))  # size of NumPy arrays
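
The `chunks` argument splits the 15,000 × 15,000 array into NumPy blocks of 1,500 × 1,500. The plain-Python arithmetic below gives the resulting block grid and memory footprint, assuming the float64 dtype that `standard_normal` produces. On the dask side, `x.numblocks` and `x.nbytes` should report the same figures.

```python
import math

# Settings used for x above
shape = (15000, 15000)   # total size
chunks = (1500, 1500)    # size of each NumPy block
itemsize = 8             # bytes per float64 value (assumed dtype)

# Number of blocks per axis; ceil covers a ragged last block
blocks_per_axis = tuple(math.ceil(s / c) for s, c in zip(shape, chunks))
n_blocks = math.prod(blocks_per_axis)

# Memory needed for the full array and for one block
total_bytes = math.prod(shape) * itemsize
chunk_bytes = math.prod(chunks) * itemsize

print(blocks_per_axis, n_blocks)             # (10, 10) 100
print(total_bytes / 1e9, chunk_bytes / 1e6)  # 1.8 18.0
```

At roughly 1.8 GB for the full array, `y = x + x.T` has the same size, so persisting `y` later on should fit within the `memory_limit='4GB'` given to the client.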

In [None]:
x

In [None]:
x[:2, :5]

In [None]:
x[:2, :5].compute()  # gives result as ndarray object

In [None]:
y = x + x.T

In [None]:
z = y[::2, 5000:].mean(axis=1)
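
`y[::2, 5000:]` takes every second row and all columns from index 5000 on, so the mean over `axis=1` leaves one value per selected row: 7,500 values in the 15,000 × 15,000 case. A small NumPy stand-in (hypothetical size `n = 10`, with `start` playing the role of 5000) makes the shapes easy to check:

```python
import numpy as np

# Small stand-in for x (hypothetical size n instead of 15000)
n, start = 10, 4               # start plays the role of 5000
rng = np.random.default_rng(0)
x_small = rng.standard_normal((n, n))

y_small = x_small + x_small.T                 # symmetric, like y = x + x.T
z_small = y_small[::2, start:].mean(axis=1)   # every 2nd row, columns start..n-1

print(z_small.shape)            # (5,)
print(len(range(0, 15000, 2)))  # 7500 rows selected in the full-size case
```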

In [None]:
z

In [None]:
%time a = z.compute()  # gives the result ...

In [None]:
a  # ... as NumPy ndarray object
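
Each row mean in `z` spans several column chunks, so it cannot be finished inside a single block. A simplified NumPy-only sketch of the idea: compute per-chunk partial sums and counts, then combine them. Dask's reductions work along these lines but organize the combine step as a tree; the chunk width `col_chunk` is a hypothetical value.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((6, 12))
col_chunk = 4  # hypothetical chunk width along axis 1

# Step 1: partial results per column chunk (row sums and number of columns)
partials = [
    (data[:, j:j + col_chunk].sum(axis=1), data[:, j:j + col_chunk].shape[1])
    for j in range(0, data.shape[1], col_chunk)
]

# Step 2: combine the partials into the final row means
total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
row_means = total / count

print(np.allclose(row_means, data.mean(axis=1)))  # True
```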

In [None]:
y = y.persist()  # persisting the data in memory (if enough is available)

In [None]:
y[0, 0]

In [None]:
%time y[0, 0].compute()

In [None]:
%time y.sum()

In [None]:
%time y.sum().compute()
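
`y.sum()` returns almost immediately because it only builds a task graph; `.compute()` executes it. A toy illustration in plain Python, loosely modeled on dask's documented graph format (a dict mapping keys to literals or to `(function, *args)` tuples). `toy_get` is a hypothetical helper, not dask's scheduler; passing a shared `cache` dict mimics, very loosely, how persisted results are reused instead of recomputed.

```python
from operator import add

# Toy task graph: keys map to literal values or to tuples of
# (function, argument, ...), where arguments may name other keys
graph = {
    'x': 1,
    'y': 2,
    'z': (add, 'x', 'y'),
    'w': (sum, ['x', 'y', 'z']),
}


def _resolve(dsk, arg, cache):
    """Replace key names (also inside lists) by their computed values."""
    if isinstance(arg, list):
        return [_resolve(dsk, a, cache) for a in arg]
    if isinstance(arg, str) and arg in dsk:
        return toy_get(dsk, arg, cache)
    return arg


def toy_get(dsk, key, cache=None):
    """Evaluate `key`; nothing in the graph runs before this is called."""
    cache = {} if cache is None else cache
    if key in cache:            # already computed (cf. persisted results)
        return cache[key]
    task = dsk[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        result = func(*(_resolve(dsk, a, cache) for a in args))
    else:
        result = task           # literal value
    cache[key] = result
    return result


print(toy_get(graph, 'z'))  # 3
print(toy_get(graph, 'w'))  # 6
```

Calling `toy_get(graph, 'w', cache)` repeatedly with the same `cache` dict evaluates each key only once.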

In [None]:
client.shutdown()

## `dask` + `pandas` 

### The Client

In [None]:
from dask.distributed import Client, progress

In [None]:
client = Client(processes=False, threads_per_worker=4,
                n_workers=2, memory_limit='4GB')

In [None]:
client

### `dask DataFrame`

In [None]:
import pandas as pd
import dask.dataframe as dd

In [None]:
path = '../../../data/'  # adjust the path to your own path
fnc = path + 'data.csv'
fnp = path + 'data.pq'

In [None]:
ls -n $path

In [None]:
%time df = pd.read_parquet(fnp)

In [None]:
df.info()

In [None]:
ddf = dd.from_pandas(df, chunksize=1000000)
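
`dd.from_pandas` with `chunksize=1000000` cuts the in-memory `df` into partitions of roughly that many rows and records the index boundaries as `divisions`. A pandas-only sketch of the splitting step on a hypothetical 2,500-row frame with `chunksize = 1000` (dask may shift boundaries slightly, for example to keep duplicate index values together):

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 2,500 daily rows
idx = pd.date_range('2020-01-01', periods=2500, freq='D')
df_small = pd.DataFrame({'A': np.arange(2500.0)}, index=idx)
chunksize = 1000

# Consecutive row blocks of at most `chunksize` rows
parts = [df_small.iloc[i:i + chunksize]
         for i in range(0, len(df_small), chunksize)]

# Divisions: first index value of each part plus the last index value
divisions = [p.index[0] for p in parts] + [parts[-1].index[-1]]

print(len(parts), math.ceil(len(df_small) / chunksize))  # 3 3
print([len(p) for p in parts])                           # [1000, 1000, 500]
```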

In [None]:
ddf.info()

In [None]:
ddf.index

In [None]:
%time ddf.index.compute()

In [None]:
ddf = ddf.repartition(freq='1M')
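
`repartition(freq='1M')` rearranges the partitions along the DatetimeIndex so that each covers about one month. As a rough pandas analog, the sketch groups hypothetical daily data by calendar month; dask derives its boundaries from a date range with the given frequency, so the exact edges can differ from calendar months.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data spanning three calendar months (2020 is a leap year)
idx = pd.date_range('2020-01-01', '2020-03-31', freq='D')
df_small = pd.DataFrame({'A': np.arange(len(idx), dtype=float)}, index=idx)

# One piece per calendar month, similar in spirit to repartition(freq='1M')
monthly = [part for _, part in df_small.groupby(df_small.index.to_period('M'))]

print([len(p) for p in monthly])  # [31, 29, 31]
```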

In [None]:
ddf

In [None]:
%time df.head()

In [None]:
%time ddf.head()

In [None]:
%time df[df['A'] > 0]

In [None]:
%time ddf[ddf['A'] > 0]

In [None]:
%time ddf[ddf['A'] > 0].compute()
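
`ddf[ddf['A'] > 0]` is cheap on its own because nothing is evaluated until `.compute()`. Conceptually, the filter then runs independently on every partition and the pieces are concatenated into one pandas DataFrame. A pandas-only sketch with three hypothetical partitions shows that this yields the same rows as filtering the full frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df_small = pd.DataFrame(rng.standard_normal((9, 2)), columns=['A', 'B'])

# Hypothetical partitioning into three row blocks
parts = [df_small.iloc[i:i + 3] for i in range(0, len(df_small), 3)]

# Apply the same boolean filter independently to each partition ...
filtered_parts = [p[p['A'] > 0] for p in parts]
# ... and concatenate the pieces into a single pandas DataFrame
result = pd.concat(filtered_parts)

print(result.equals(df_small[df_small['A'] > 0]))  # True
```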

In [None]:
%time df[(df['A'] > 0) & (df['B'] < 0)]

In [None]:
%time ddf[(ddf['A'] > 0) & (ddf['B'] < 0)]

In [None]:
%time ddf[(ddf['A'] > 0) & (ddf['B'] < 0)].compute()

In [None]:
%time df.query('A > 0 & B < 0')

In [None]:
%time ddf.query('A > 0 & B < 0')

In [None]:
%time ddf.query('A > 0 & B < 0').compute()
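
Inside `query` strings, pandas gives `&` the precedence of a boolean `and`, so `'A > 0 & B < 0'` reads as `(A > 0) & (B < 0)`. In plain Python `&` binds tighter than `>`, which is why the boolean-mask version needs the parentheses. A quick pandas check on hypothetical random data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
df_small = pd.DataFrame(rng.standard_normal((20, 2)), columns=['A', 'B'])

via_query = df_small.query('A > 0 & B < 0')                    # no parentheses needed
via_mask = df_small[(df_small['A'] > 0) & (df_small['B'] < 0)]  # parentheses required

print(via_query.equals(via_mask))  # True
```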

In [None]:
%time df.groupby('M')['C'].mean()

In [None]:
%time ddf.groupby('M')['C'].mean()

In [None]:
%time ddf.groupby('M')['C'].mean().compute()
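
A group mean cannot be assembled from per-partition means directly, since partitions hold different numbers of rows per group. Carrying sums and counts instead works. The sketch shows that decomposition with pandas on hypothetical data (the labels in `M` and the four-row partitions are assumptions); dask's own groupby mean follows a similar sum/count approach, with more machinery around it.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
df_small = pd.DataFrame({
    'M': rng.integers(0, 3, size=12),  # hypothetical group labels
    'C': rng.standard_normal(12),
})
# Three hypothetical partitions of four rows each
parts = [df_small.iloc[i:i + 4] for i in range(0, len(df_small), 4)]

# Per partition: sum and count of C for every group present
partials = [p.groupby('M')['C'].agg(['sum', 'count']) for p in parts]

# Combine: add sums and counts group-wise, then divide once
combined = pd.concat(partials).groupby(level=0).sum()
means = combined['sum'] / combined['count']

expected = df_small.groupby('M')['C'].mean()
print(np.allclose(means.sort_index(), expected.sort_index()))  # True
```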

In [None]:
%time df[['D', 'E']].resample('1w').mean().head()

In [None]:
%time ddf[['D', 'E']].resample('1w').mean().head()

### Reading Data 

In [None]:
%time ddf = dd.read_parquet(fnp, chunksize=1000000)

In [None]:
ddf.info()

In [None]:
ddf.columns

In [None]:
ddf.dtypes

In [None]:
%time df = pd.read_csv(fnc)

In [None]:
%time df[['B', 'D']].mean()

In [None]:
%time ddf = dd.read_csv(fnc)

In [None]:
ddf.columns

In [None]:
ddf.dtypes

In [None]:
ddf.head()

In [None]:
%time ddf[['B', 'D']].mean()

In [None]:
%time ddf[['B', 'D']].mean().compute()

In [None]:
%time ddf = ddf.set_index('Unnamed: 0')

In [None]:
ddf.index = ddf.index.rename('Date')
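
The `'Unnamed: 0'` column appears because the CSV file was presumably written with `to_csv` and its default `index=True`, which stores an unnamed index as the first column. A small round trip through an in-memory buffer (hypothetical data) reproduces the name and the repair; the `pd.to_datetime` step is an addition not in the cells here, since index values read from CSV come back as strings.

```python
import io

import pandas as pd

# Tiny frame with an unnamed DatetimeIndex (hypothetical data)
idx = pd.date_range('2020-01-01', periods=3, freq='D')
df_small = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=idx)

# Writing with the default index=True stores the index as an unnamed column
buf = io.StringIO()
df_small.to_csv(buf)
buf.seek(0)
raw = pd.read_csv(buf)
print(list(raw.columns))  # ['Unnamed: 0', 'A']

# Same repair as in the notebook: set that column as index and name it
fixed = raw.set_index('Unnamed: 0')
fixed.index = pd.to_datetime(fixed.index).rename('Date')
print(fixed.index.name)  # Date
```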

In [None]:
ddf.info()

In [None]:
ddf.head()

In [None]:
ddf.index.compute()

In [None]:
client.shutdown()

In [None]:
ls -n $path

In [None]:
!rm $path/data.*
