# Array

Dask array provides a parallel, larger-than-memory, n-dimensional array using blocked algorithms. 

**Distributed NumPy**.


![](images/dask_array.png)
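To make "blocked algorithms" concrete, here is a minimal sketch in plain NumPy of how a column mean can be computed block by block. It only illustrates the idea and is not Dask's actual implementation; the names `blocks`, `partial_sums`, and `row_counts` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0, 1, size=(15, 10))

# Split the rows into blocks of 5, mirroring chunks=(5, 10)
blocks = [data[i:i + 5] for i in range(0, data.shape[0], 5)]

# Each block contributes a partial column sum and a row count ...
partial_sums = [block.sum(axis=0) for block in blocks]
row_counts = [block.shape[0] for block in blocks]

# ... and the partial results are combined into the final mean
blocked_mean = np.sum(partial_sums, axis=0) / sum(row_counts)

print(np.allclose(blocked_mean, data.mean(axis=0)))
```

Each block can be processed independently, so the per-block work can run in parallel and only one block per worker needs to fit in memory at a time.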

In [None]:
import numpy as np
import dask.array as da

#x = da.random.normal(0, 1, size=(15, 10),   
#                              chunks=(5, 10))

x = da.random.normal(0, 1, size=(20000000, 30),   # 600 million element array 
                              chunks=(10000, 30))   

y = x.mean(axis=0)   # lazy: this only builds a task graph, nothing is computed yet
y
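Since nothing runs until `compute()` is called, the chunk layout can be inspected cheaply beforehand. The sketch below uses the small `(15, 10)` array from the commented-out lines above and the default local scheduler; the names `x_small`, `y_small`, and `result` are illustrative only.

```python
import dask.array as da

# Small array: 15 rows split into 3 chunks of 5 rows each
x_small = da.random.normal(0, 1, size=(15, 10), chunks=(5, 10))

print(x_small.numblocks)   # (3, 1)
print(x_small.chunks)      # ((5, 5, 5), (10,))

# Still lazy: y_small is a Dask array describing the reduction
y_small = x_small.mean(axis=0)

# compute() runs the graph and returns a NumPy array
result = y_small.compute()
print(result.shape)        # (10,)
```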

In [None]:
# Create a client, either local or on the SGE grid
from dask.distributed import Client
n_workers = 10

# On our SGE grid
def scale_to_sge(n_workers):
    # Imported here so that local runs do not require dask_jobqueue
    from dask_jobqueue import SGECluster

    queue = "q_1day"
    queue_resource_spec = "q_1day=TRUE"
    memory = "4GB"
    sge_log = "./logs"
    cluster = SGECluster(queue=queue, memory=memory, cores=1, processes=1,
                         log_directory=sge_log,
                         local_directory=sge_log,
                         resource_spec=queue_resource_spec)
    cluster.scale(n_workers)  # request n_workers workers (one SGE job each)
    return Client(cluster)    # connect a client to the SGE cluster


#### SWITCH THIS IF YOU WANT TO RUN LOCALLY OR ON OUR SGE GRID ###

# Local client
#client = Client(n_workers=n_workers)

# SGE client
client = scale_to_sge(n_workers)



In [None]:
%%time
y.compute(scheduler=client)

# Several linear algebra functions are already implemented in parallel

For instance, the example below computes a compressed (randomized) SVD, following:

N. Halko, P. G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, Survey and Review section, Vol. 53, No. 2, pp. 217-288, June 2011.


In [None]:
%%time
u, s, v = da.linalg.svd_compressed(x, k=5)
u.compute(scheduler=client)
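To get a feel for what `svd_compressed` returns, the following sketch runs it on a small matrix of exact rank 5 and checks the reconstruction with the default local scheduler. The test matrix and the names `left`, `right`, `a`, and `rel_error` are assumptions made for illustration and are not part of the example above.

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical test matrix of exact rank 5, built from two thin factors
left = da.random.random((1000, 5), chunks=(100, 5))
right = da.random.random((5, 30), chunks=(5, 30))
a = left @ right   # shape (1000, 30)

# Rank-5 approximate SVD; u, s, v are lazy Dask arrays
u, s, v = da.linalg.svd_compressed(a, k=5)

# Compute everything in one call so shared intermediate tasks run only once
a_np, u_np, s_np, v_np = dask.compute(a, u, s, v)

print(u_np.shape, s_np.shape, v_np.shape)

# Relative reconstruction error of (u * s) @ v against the original matrix
rel_error = np.linalg.norm(a_np - (u_np * s_np) @ v_np) / np.linalg.norm(a_np)
print(rel_error)
```

Because the matrix has rank exactly 5, the rank-5 reconstruction should match it up to floating-point error. On a full-rank matrix such as the random `x` above, the result is only an approximation of the leading singular triplets.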

# Always shut down your client

In [None]:
client.shutdown()
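One way to make sure cleanup always happens, even when a computation raises, is to use the cluster and client as context managers. This is a sketch on a local, thread-based cluster rather than the SGE setup above; the cluster parameters and the name `total` are chosen only for this example.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster

# processes=False keeps the workers as threads inside this process;
# dashboard_address=None disables the diagnostics dashboard
with LocalCluster(n_workers=2, threads_per_worker=1, processes=False,
                  dashboard_address=None) as cluster:
    with Client(cluster) as client:
        # Inside the block, this client is the default scheduler
        total = da.ones((100, 10), chunks=(10, 10)).sum().compute()

# Both the client and the cluster are closed here, even if compute() had raised
print(total)
```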

# Most of the NumPy API is there

Check out the [Dask array documentation](https://docs.dask.org/en/latest/array.html).