# Array Multicore

Instead of running trivially parallel, independent tasks, here we want multiple threads to process different parts of the same array simultaneously. `dask` provides this almost automatically: we replace `numpy` functions with their `dask.array` counterparts. The key concept is the chunk. The array is split into chunks, and each chunk is processed separately, possibly by a different thread. For a matrix, for example, we define a 2D block size; each block can be processed independently, and the per-block results are then combined to obtain the final answer. See <http://dask.pydata.org/>
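Here is a hand-rolled sketch of the chunk idea that uses only NumPy and Python's `concurrent.futures.ThreadPoolExecutor`, with no dask. It is not how dask is implemented internally. The names `f` and `blockwise_sum_of_f`, the chunk shape and `max_workers=4` are illustrative choices. The threads can overlap because NumPy releases the GIL inside most of its ufunc loops, which covers operations such as `sin`, `log` and `*`.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def f(block):
    # Same elementwise expression used later in this notebook
    return block**2 + np.sin(block) * block * np.log(block)

def blockwise_sum_of_f(A, chunks=(1000, 2000), max_workers=4):
    """Hypothetical helper: apply f to each 2D block in a thread pool,
    write the results into one output array, and accumulate block sums."""
    rows, cols = A.shape
    r, c = chunks
    # Slices for every block of the 2D grid; edge blocks may be smaller
    blocks = [
        (slice(i, min(i + r, rows)), slice(j, min(j + c, cols)))
        for i in range(0, rows, r)
        for j in range(0, cols, c)
    ]
    out = np.empty_like(A)

    def work(idx):
        # Each thread processes one block, writes a disjoint region of `out`,
        # and returns a partial sum for the final reduction
        res = f(A[idx])
        out[idx] = res
        return res.sum()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_sums = list(pool.map(work, blocks))
    # Accumulate the per-block partial results into the final answer
    return out, sum(partial_sums)

A = np.random.rand(2000, 1000)
out, total = blockwise_sum_of_f(A, chunks=(500, 300))
assert np.allclose(out, f(A))
```

Because every block writes to a different region of `out`, the threads never touch the same memory and no locking is needed. Only the small list of partial sums is combined at the end.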

In [1]:
import numpy as np
import dask.array as da

In [13]:
A = np.random.rand(20000,10000)

In [14]:
A.size / 1e6

200.0

In [15]:
A

array([[0.5410205 , 0.76296078, 0.08037006, ..., 0.46609799, 0.86788782,
        0.68013651],
       [0.29760242, 0.44926126, 0.82084149, ..., 0.32585922, 0.58017776,
        0.09203796],
       [0.54445827, 0.43079164, 0.77506465, ..., 0.30376399, 0.00583804,
        0.42287794],
       ...,
       [0.76752098, 0.23760431, 0.8231055 , ..., 0.52568007, 0.54125255,
        0.63238691],
       [0.21266485, 0.90136273, 0.08967402, ..., 0.22139702, 0.47259033,
        0.60459065],
       [0.09859459, 0.33416604, 0.37573758, ..., 0.6811173 , 0.14592635,
        0.99696078]])

In [16]:
%time B = A**2 + np.sin(A) * A * np.log(A)

CPU times: user 2.79 s, sys: 1.44 s, total: 4.22 s
Wall time: 2.68 s


In [17]:
A_dask = da.from_array(A, chunks=(1000, 2000))

In [18]:
A_dask.numblocks

(20, 5)
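The block grid comes from ceiling division of each dimension by the chunk size. Here that gives 20000 / 1000 = 20 blocks along the rows and 10000 / 2000 = 5 along the columns. Each 1000 x 2000 float64 chunk holds 2 million values, which is about 16 MB. When a dimension does not divide evenly, the last chunk along that axis is smaller. The following sketch shows this on a hypothetical 10 x 7 array:

```python
import math
import numpy as np
import dask.array as da

x = da.from_array(np.arange(70).reshape(10, 7), chunks=(4, 3))

print(x.chunks)     # ((4, 4, 2), (3, 3, 1)): last chunk on each axis is smaller
print(x.numblocks)  # (3, 3)
print(x.chunksize)  # (4, 3): the largest chunk along each axis

# numblocks is the ceiling of shape / chunk size along each axis
expected = tuple(math.ceil(n / c) for n, c in zip(x.shape, (4, 3)))
assert expected == x.numblocks
```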

In [19]:
%time B_dask = (A_dask**2 + da.sin(A_dask) * A_dask * da.log(A_dask)).compute()

CPU times: user 8.1 s, sys: 2.05 s, total: 10.1 s
Wall time: 3.24 s
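The timed cell above combines two steps. Building `A_dask**2 + ...` only records a task graph, and `.compute()` then executes that graph. The sketch below separates the two steps on a smaller array (2000 x 1000, an arbitrary test size). It also passes the threaded scheduler and a worker count explicitly. `num_workers=4` is an assumption you should tune for your machine. In the run above, dask used more total CPU time than NumPy (10.1 s vs 4.22 s) but did not beat NumPy's wall time (3.24 s vs 2.68 s), so chunk size and worker count are worth experimenting with.

```python
import numpy as np
import dask.array as da

A = np.random.rand(2000, 1000)
A_dask = da.from_array(A, chunks=(500, 500))

# Building the expression is cheap: it only records a task graph
expr = A_dask**2 + da.sin(A_dask) * A_dask * da.log(A_dask)
print(type(expr))  # a dask Array; nothing has been computed yet

# compute() runs the graph; "threads" is already the default scheduler for
# dask arrays, spelled out here together with an explicit worker count
B_dask = expr.compute(scheduler="threads", num_workers=4)

assert isinstance(B_dask, np.ndarray)
assert np.allclose(B_dask, A**2 + np.sin(A) * A * np.log(A))
```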


In [20]:
assert np.allclose(B, B_dask)
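The next sketch shows two related patterns on a small test array. First, `map_blocks` applies an ordinary NumPy function to each chunk, so `f` is called once per chunk on a plain NumPy block. Second, a reduction such as `da.sum` shows the accumulation step: dask computes partial sums per chunk and then combines them into one result. The helper `f` and the array and chunk sizes are illustrative choices.

```python
import numpy as np
import dask.array as da

def f(block):
    # Plain NumPy function applied to one chunk at a time
    return block**2 + np.sin(block) * block * np.log(block)

A = np.random.rand(2000, 1000)
A_dask = da.from_array(A, chunks=(500, 500))

# f runs independently on each NumPy chunk
B_blocks = A_dask.map_blocks(f).compute()
assert np.allclose(B_blocks, f(A))

# Reduction: per-chunk partial sums are combined into one scalar
total = da.sum(A_dask.map_blocks(f)).compute()
assert np.isclose(total, f(A).sum())
```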