# Using `ipyparallel` clusters

This is a terse intro to using `ipyparallel` for HAT work on the `shrike` cluster. The following packages are installed in the virtualenv on every machine in the cluster:

- ipyparallel
- joblib
- dask
- numpy
- scipy
- matplotlib
- astropy
- astroquery
- astrobase

plus all of their dependencies.

All machines have identical home directories and identical virtualenvs. Alternatively (something I should actually do), we could install the venv once in the cluster-wide `/nfs/shrike/ar0` directory and run it from there.

The cluster is live as of 2016-12-08, and consists of the following machines:

- `shrike`: head node with Xeon D-1520 4-core (8-thread) CPU at 2.0 GHz, 64 GB ECC DDR4 RAM, 2 x 300 GB DC3500 SSDs (one for / and one for /nfs/shrike/ar0)
- `cluster-i7-one`: worker node with i7-6700 4-core (8-thread) at 3.4 GHz, 16 GB DDR4 RAM, 1 x 120 GB SSD for /
- `cluster-i7-two`: worker node with i7-6700 4-core (8-thread) at 3.4 GHz, 16 GB DDR4 RAM, 1 x 120 GB SSD for /
- `cluster-i5-one`: worker node with i5-6500 4-core (4-thread) at 3.2 GHz, 16 GB DDR4 RAM, 1 x 120 GB SSD for /
- `cluster-i5-two`: worker node with i5-6500 4-core (4-thread) at 3.2 GHz, 16 GB DDR4 RAM, 1 x 120 GB SSD for /

I plan to add three more cluster members to bring this up to an even eight nodes. One of these will likely be some sort of GPU machine, using a GTX1060 6GB card, so we can figure out GPU period-finding.

## Mapping functions across the whole cluster

The following explains how to run map operations across the whole cluster. This dispatches a function to all nodes, runs it on each, and returns the results, much like the `multiprocessing.Pool.map` we use all the time. Everything below is done on the head node, in an IPython terminal console or in a Jupyter notebook.
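As a rough illustration of that analogy, the sketch below runs a function through a local pool and notes the cluster equivalent in a comment. `square` and `run_map` are hypothetical names made up for this example, and a thread pool stands in for the usual process pool so the snippet runs anywhere.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    """Toy stand-in for a real per-light-curve function."""
    return x * x


def run_map(mapper, func, items):
    """Apply func to items using any map-like callable and return a list."""
    return list(mapper(func, items))


# local version: a pool of workers on one machine
pool = ThreadPool(2)
try:
    local_results = run_map(pool.map, square, range(5))
finally:
    pool.close()
    pool.join()

print(local_results)

# cluster version (assumes a DirectView called dview, created as in the
# 'set up the cluster view' step):
# cluster_results = run_map(dview.map_sync, square, range(5))
```

The calling pattern is the same in both cases: a function, a list of inputs, and a list of results back in input order.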

### connect to the cluster

The cluster head node is `shrike`. SSH in and activate the virtualenv to get started.

```bash
[user@shrike]$ source venv/bin/activate
```

Then start ipython on the terminal using: 

```bash
(venv) [user@shrike]$ ipython
```

Or use the Jupyter notebook. For this, you'll have to first set up an SSH tunnel between your machine and `shrike`:

```bash
# start the tunnel
[user@local]$ ssh -L localhost:8888:localhost:8888 user@shrike

# on shrike: activate the virtualenv, then start the notebook server
[user@shrike]$ source venv/bin/activate
(venv) [user@shrike]$ jupyter notebook --no-browser --port=8888
```

Then on your local machine, browse to http://localhost:8888.
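If the page doesn't load, a quick check is whether anything is listening on the forwarded port at all. The following is a minimal sketch in Python 3 using only the standard library; `port_is_open` is a hypothetical helper, not part of any of the packages above.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # connection refused, timed out, or host not reachable
        return False
    conn.close()
    return True
```

Run on your local machine, `port_is_open()` should return `True` once both the SSH tunnel and the notebook server on `shrike` are up.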

### set up the cluster view

In [1]:
from ipyparallel import Client
rc = Client()
print ('cluster nodes visible: %s' % rc.ids)

cluster nodes visible: [0, 1, 2, 3]


In [2]:
# DirectView object to interact with the cluster
dview = rc[:]

### an example of running period finding across the whole cluster

In [4]:
import os.path

In [7]:
pwd

u'/home/wbhatti'

In [8]:
cd /nfs/shrike/ar0/work/wbhatti/scratch/

/nfs/shrike/ar0/work/wbhatti/scratch


In [9]:
# we're sitting in the /nfs/shrike/ar0/work/wbhatti/scratch directory
# this is an NFS volume shared across the whole cluster
lclist = !ls *.sqlite.gz

In [11]:
lclist = [os.path.abspath(x) for x in lclist]

In [13]:
lclist

['/nfs/shrike/ar0/work/wbhatti/scratch/HAT-432-0007388-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-553-0087416-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0212353-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0215246-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0215592-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0219865-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0302504-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0554686-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0562164-V0-DR0-hatlc.sqlite.gz',
 '/nfs/shrike/ar0/work/wbhatti/scratch/HAT-777-0058978-V0-DR0-hatlc.sqlite.gz']

In [14]:
# first: make all nodes import the things we need
with dview.sync_imports():
    from astrobase import periodbase
    from astrobase import hatlc
    import gzip
    import cPickle
    
# note: 'import os.path' fails under sync_imports; this looks like a limitation of
# the import hook with dotted module names. use 'from os import path' instead

# 'import cPickle as pickle' also fails: the ipyparallel docs note that sync_imports
# can't see the 'as' alias. in this case, just import cPickle directly

# I think these only fail when broadcasting imports across the cluster with sync_imports;
# doing the same imports inside the imported modules themselves appears to work OK

# finally, you can send objects to all nodes directly; push takes a dict of name -> object
# (the object has to be picklable, so this probably won't work for modules)
# >>> dview.push({'an_object': an_object})

importing periodbase from astrobase on engine(s)
importing hatlc from astrobase on engine(s)
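One way to sidestep these import quirks is to put the imports inside the function that gets mapped, so each engine runs them itself when the function is called. A minimal sketch, where `lc_basename` is a hypothetical helper used only for illustration:

```python
def lc_basename(lcf):
    """Return the file name of a light curve path, without its directory."""
    # imported inside the function body, so the import runs on whichever
    # engine executes the function; no sync_imports needed
    import os.path
    return os.path.basename(lcf)


# runs locally as an ordinary function
print(lc_basename('/nfs/shrike/ar0/work/wbhatti/scratch/HAT-432-0007388-V0-DR0-hatlc.sqlite.gz'))

# on the cluster (assumes the dview object from the cluster view step):
# names = dview.map_sync(lc_basename, lclist)
```

Another option that should work is `dview.execute('import cPickle as pickle')`, which runs the statement as a string on every engine and so keeps the alias.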


In [20]:
# this function will be mapped across the whole cluster
def make_lsp(lcf):
    lcd, msg = hatlc.read_and_filter_sqlitecurve(lcf)
    normlcd = hatlc.normalize_lcdict(lcd)
    # use the normalized light curve for period finding
    times, mags, errs = normlcd['rjd'], normlcd['aep_000'], normlcd['aie_000']
    lsp = periodbase.aov_periodfind(times, mags, errs)
    # this writes to the NFS shared storage
    outpkl = lcf.replace('hatlc.sqlite.gz', 'aov-lsp.pkl.gz')
    with gzip.open(outpkl, 'wb') as outfd:
        cPickle.dump(lsp, outfd, protocol=cPickle.HIGHEST_PROTOCOL)
    # returning large objects back to the head node is probably slow;
    # better to return the output filename as a string instead
    return lcf, lsp
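The suggestion in that last comment can be sketched as follows: write the full result to shared storage and send back only the output path and the best period. This is an illustration under assumptions, not the actual pipeline code. `save_and_summarize` and `load_result` are hypothetical helpers, the result dict is made up, and the snippet uses Python 3's `pickle` where the notebook uses `cPickle`.

```python
import gzip
import os
import pickle
import tempfile


def save_and_summarize(infile, result):
    """Write result next to infile as a gzipped pickle; return a small summary."""
    outpkl = infile.replace('hatlc.sqlite.gz', 'aov-lsp.pkl.gz')
    with gzip.open(outpkl, 'wb') as outfd:
        pickle.dump(result, outfd, protocol=pickle.HIGHEST_PROTOCOL)
    # only a path and one number travel back to the head node
    return outpkl, result['bestperiod']


def load_result(pklfile):
    """Read a full result back from a gzipped pickle when it is needed."""
    with gzip.open(pklfile, 'rb') as infd:
        return pickle.load(infd)


# local demonstration with a made-up result dict in a temporary directory
workdir = tempfile.mkdtemp()
fake_lc = os.path.join(workdir, 'EXAMPLE-hatlc.sqlite.gz')
fake_result = {'bestperiod': 1.2345, 'periods': list(range(1000))}
outpkl, bestperiod = save_and_summarize(fake_lc, fake_result)
print(outpkl, bestperiod)
```

Since the pickles sit on the NFS volume, the head node can load any of them later with `load_result` without the engines shipping them over the network.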

In [22]:
# this function call maps the function above to all worker nodes
# and blocks until we get results back
results = dview.map_sync(make_lsp, lclist)
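A `DirectView` map splits the input list into one chunk per engine up front. With a mix of i7 and i5 nodes and light curves of different sizes, a load-balanced view may keep the engines busier, because it hands out items as engines become free. A minimal sketch, assuming `client` is a connected `ipyparallel.Client` such as `rc`; `map_balanced` is a hypothetical wrapper:

```python
def map_balanced(client, func, items):
    """Map func over items, giving each item to whichever engine is free."""
    lview = client.load_balanced_view()
    # blocks until all results are back, returned in input order
    return lview.map_sync(func, items)


# usage on the head node (needs a running cluster):
# results = map_balanced(rc, make_lsp, lclist)
```

Note that a load-balanced view has no `sync_imports`, so the imports still need to be set up through the `DirectView` first, or done inside the mapped function.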

In [23]:
# make sure we got everything
len(results) == len(lclist)

True

In [24]:
for lcf, lsp in results:
    print('%s: %.5f' % (lcf, lsp['bestperiod']))

/nfs/shrike/ar0/work/wbhatti/scratch/HAT-432-0007388-V0-DR0-hatlc.sqlite.gz: 7.30451
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-553-0087416-V0-DR0-hatlc.sqlite.gz: 0.16940
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0212353-V0-DR0-hatlc.sqlite.gz: 345.35128
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0215246-V0-DR0-hatlc.sqlite.gz: 518.02692
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0215592-V0-DR0-hatlc.sqlite.gz: 70.24094
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0219865-V0-DR0-hatlc.sqlite.gz: 8.23900
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0302504-V0-DR0-hatlc.sqlite.gz: 2.07004
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0554686-V0-DR0-hatlc.sqlite.gz: 3.08579
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-772-0562164-V0-DR0-hatlc.sqlite.gz: 6.06767
/nfs/shrike/ar0/work/wbhatti/scratch/HAT-777-0058978-V0-DR0-hatlc.sqlite.gz: 0.64831
