# From Jupyter to CCV

This notebook shows how a scheduled job can be launched on the CCV cluster from a local Jupyter notebook.

It uses `zizibee` to determine which functions have been defined in the running notebook and bundles them, together with the other requirements, into a single .py file that is then uploaded to CCV using sftp.
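As a rough illustration of the first step, the sketch below picks out the plain Python functions defined in a given namespace (for a notebook, `globals()`). The helper name `find_user_functions` and the filtering rule are assumptions made for this example; they are not taken from `zizibee`.

```python
import os.path
import types


def find_user_functions(namespace, module_name="__main__"):
    """Return the sorted names of functions in `namespace` defined in `module_name`."""
    return sorted(
        name
        for name, obj in namespace.items()
        # Keep only plain functions, and skip the ones imported from other modules.
        if isinstance(obj, types.FunctionType) and obj.__module__ == module_name
    )


def demo_fun(i, j):
    return i + j


# A small stand-in for the notebook's globals(): one user function,
# one imported function, and one non-function value.
namespace = {"demo_fun": demo_fun, "join": os.path.join, "n": 3}
print(find_user_functions(namespace, module_name=demo_fun.__module__))
```

Only `demo_fun` is reported, because `os.path.join` belongs to a different module and `n` is not a function.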

For this to work, one needs to be either on campus or connected through the VPN.

It assumes that there is a function to be executed over an enumerated set of parameters. To provide these input values to the function of interest, a helper function must be defined that takes an integer and returns a tuple with the corresponding parameters.
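A minimal sketch of such a helper is given here, assuming the parameters form a Cartesian grid. The names `AXES` and `index_to_params` are illustrative only. The index is decoded with repeated `divmod`, so the full list of combinations never has to be built, and the ordering matches that of `itertools.product`.

```python
from itertools import product

# Hypothetical parameter axes; each job receives one combination of values.
AXES = (range(1, 5), range(1, 5), range(1, 5), range(1, 5))


def index_to_params(idx, axes=AXES):
    """Map a job index to a tuple of parameters, with the last axis varying fastest."""
    params = []
    for axis in reversed(axes):
        idx, pos = divmod(idx, len(axis))
        params.append(axis[pos])
    return tuple(reversed(params))


# Enumerating itertools.product over the same axes gives the same ordering.
all_params = list(product(*AXES))
print(index_to_params(0))    # (1, 1, 1, 1)
print(index_to_params(255))  # (4, 4, 4, 4)
```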

The output of the function can be a numpy array of arbitrary shape.

To give the function of interest a command-line interface, which is useful when setting up the sbatch shell script, `zizibee` uses [fire](https://github.com/google/python-fire).
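To make the role of the command-line interface concrete, the following sketch composes an sbatch script for a Slurm job array. The function `compose_sbatch`, its arguments, and the resource values are assumptions for illustration, and the script that `zizibee` actually writes may differ. It also assumes that the uploaded .py file ends with a call to `fire.Fire()`, so that `python script.py fun_name 3` runs `fun_name(3)`.

```python
def compose_sbatch(job_name, script_name, fun_name, num_jobs,
                   num_cores=1, mem_gb=2, hours=1):
    """Return the text of an sbatch script that runs one array task per job index."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH -J {job_name}",
        f"#SBATCH -n {num_cores}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH -t {hours}:00:00",
        # Array indices run from 0 to num_jobs - 1, one per parameter set.
        f"#SBATCH --array=0-{num_jobs - 1}",
        "",
        # Slurm sets SLURM_ARRAY_TASK_ID for each task; it is passed on as the job index.
        f"python {script_name} {fun_name} $SLURM_ARRAY_TASK_ID",
    ]
    return "\n".join(lines) + "\n"


print(compose_sbatch("reboot", "reboot.py", "complex_fun", 4**4))
```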

To execute some of the remote commands necessary to set this up, `zizibee` establishes an SSH connection to CCV using [paramiko](https://www.paramiko.org).

At CCV, the function saves its output in .h5 format.

As an additional convenience, `zizibee` also provides a function that calls rsync to pull all the files from a remote directory at CCV.
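A hedged sketch of what such an rsync call might look like from Python follows. The helper names `build_rsync_command` and `pull_from_ccv`, and the example username and paths, are hypothetical; only the `rsync -avz` invocation itself is standard.

```python
import subprocess


def build_rsync_command(username, host, remote_dir, local_dir):
    """Build an rsync command that copies the contents of remote_dir into local_dir."""
    # The trailing slash tells rsync to copy the directory contents, not the directory itself.
    remote = f"{username}@{host}:{remote_dir.rstrip('/')}/"
    return ["rsync", "-avz", remote, local_dir]


def pull_from_ccv(username, host, remote_dir, local_dir):
    """Run rsync; relies on passwordless SSH login being configured."""
    subprocess.run(build_rsync_command(username, host, remote_dir, local_dir), check=True)


# Only print the command here, so nothing is transferred.
print(build_rsync_command("someuser", "sshcampus.ccv.brown.edu", "/path/to/run", "results"))
```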

Most of this assumes that passwordless SSH login to CCV has already been configured.

## run_at_ccv

In [153]:
# These are the imports required for running the function at CCV.
# Make sure that these dependencies are available in the remote environment.
# Any custom import can be satisfied by providing the appropriate path
# in the extra_py argument to send_to_ccv.
imports = '''
from itertools import product
import numpy as np
import h5py
import fire
import os
'''
%load_ext autoreload
%autoreload 2
import zizibee as zzb
from time import sleep
import ast
exec(imports)

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [159]:
def extract_function_data(node):
    """Extract the function name, parameters and docstring."""
    name = node.name
    params = [arg.arg for arg in node.args.args]
    docstring = ast.get_docstring(node)
    
    return name, params, docstring
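`extract_function_data` expects an `ast.FunctionDef` node rather than a function object. The sketch below shows one way to obtain such nodes by parsing source text with `ast.parse`. The two sample functions in `source` are made up for the example.

```python
import ast


def extract_function_data(node):
    """Extract the function name, parameters and docstring."""
    name = node.name
    params = [arg.arg for arg in node.args.args]
    docstring = ast.get_docstring(node)
    return name, params, docstring


source = '''
def add(a, b):
    """Add two numbers."""
    return a + b

def shout(text):
    return text.upper()
'''

# Parse the source and keep only the top-level function definitions.
tree = ast.parse(source)
functions = [extract_function_data(node)
             for node in tree.body
             if isinstance(node, ast.FunctionDef)]
print(functions)
# [('add', ['a', 'b'], 'Add two numbers.'), ('shout', ['text'], None)]
```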

In [156]:
def fun1(i,j,k,l):
    '''
    This one and the next are just to emphasize the fact
    that all the defined functions are packaged and uploaded,
    so that any functional dependencies of the target function
    are met.
    '''
    return i - j + k - l

def fun2(i,j,k,l):
    return i + j + k + l

def input_params(idx):
    '''
    This is a function that maps the job index to the input parameters
    of the function to be run.
    '''
    ranger = range(1,5)
    inputs = list(product(ranger,ranger,ranger,ranger))
    return inputs[idx]

def complex_fun(job_index):
    '''
    A silly function to debug things.
    The important thing is that the function is made so that the 
    function value is saved to scratch_dir in h5 format.
    This function assumes that scratch_dir will be an available
    symbol when the function is run at CCV. This part is taken care of by run_at_ccv.
    '''
    ins = input_params(job_index)
    (i, j, k, l) = ins
    zrange = fun1(i,j,k,l) + fun2(i,j,k,l)
    out = np.linspace(0, abs(zrange), 100)
    fname = '%d.h5' % job_index
    fname = os.path.join(scratch_dir, fname)
    with h5py.File(fname, 'w') as f:
        f.create_dataset('out', data=out, compression='gzip')
        f.create_dataset('in', data=ins, compression='gzip')

In [160]:
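# extract_function_data expects an ast.FunctionDef node, not a function object,
# so the source of the function is parsed first.
import inspect
node = ast.parse(inspect.getsource(complex_fun)).body[0]
extract_function_data(node)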

In [151]:
job_config = {'username':'jlizaraz',
              'numCores':1, 
              'numJobs': 4**4, 
              'memInGB': 2,
              'import_block': imports,
              'extra_py': [],
              'theglobals': globals(),
              'fun_name': 'complex_fun',
              'job_name': 'reboot'}
job_config = zzb.run_at_ccv(job_config, verbose=False)
# at this point the
mac_folder, ccv_folder = job_config['scratch_dir_at_mac'], job_config['scratch_dir_at_CCV']
numJobs = job_config['numJobs']
ccv_fruit = zzb.get_ccv_values(mac_folder, ccv_folder, numJobs)

Establishing an SSH connection to CCV and launching a shell ...
Saving Python script to file ...
Uploading script to CCV ...
Composing the sbatch script ...
Writing sbatch script ...
Sending sbatch script ...
Mac folder does not exist, creating ...
Progress: |██████████████████████████████| 100.0% Complete Elapsed Time: 63.15s Remaining Time: -0.25s


In [152]:
ccv_fruit(3, 1, 1, 2)

array([0.        , 0.08080808, 0.16161616, 0.24242424, 0.32323232,
       0.4040404 , 0.48484848, 0.56565657, 0.64646465, 0.72727273,
       0.80808081, 0.88888889, 0.96969697, 1.05050505, 1.13131313,
       1.21212121, 1.29292929, 1.37373737, 1.45454545, 1.53535354,
       1.61616162, 1.6969697 , 1.77777778, 1.85858586, 1.93939394,
       2.02020202, 2.1010101 , 2.18181818, 2.26262626, 2.34343434,
       2.42424242, 2.50505051, 2.58585859, 2.66666667, 2.74747475,
       2.82828283, 2.90909091, 2.98989899, 3.07070707, 3.15151515,
       3.23232323, 3.31313131, 3.39393939, 3.47474747, 3.55555556,
       3.63636364, 3.71717172, 3.7979798 , 3.87878788, 3.95959596,
       4.04040404, 4.12121212, 4.2020202 , 4.28282828, 4.36363636,
       4.44444444, 4.52525253, 4.60606061, 4.68686869, 4.76767677,
       4.84848485, 4.92929293, 5.01010101, 5.09090909, 5.17171717,
       5.25252525, 5.33333333, 5.41414141, 5.49494949, 5.57575758,
       5.65656566, 5.73737374, 5.81818182, 5.8989899 , 5.97979
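The call above takes parameter values rather than a job index. One way such a lookup could work is to invert the mapping defined by `input_params`, as sketched here. This is an assumption made for illustration, not necessarily how `zzb.get_ccv_values` is implemented, and the names `index_of` and `job_index_for` are hypothetical.

```python
from itertools import product

# Same enumeration as input_params: four parameters, each ranging over 1..4.
ranger = range(1, 5)
inputs = list(product(ranger, repeat=4))

# Invert the index -> parameters mapping.
index_of = {params: idx for idx, params in enumerate(inputs)}


def job_index_for(i, j, k, l):
    """Return the job index whose input parameters are (i, j, k, l)."""
    return index_of[(i, j, k, l)]


print(job_index_for(3, 1, 1, 2))  # 129
```

With the file naming used in `complex_fun` (`'%d.h5' % job_index`), the result for `(3, 1, 1, 2)` would then be stored in `129.h5`.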

## Uploading files to CCV

In [134]:
?zzb.upload_to_ccv

Signature:
zzb.upload_to_ccv(
    filename,
    folder,
    username='jlizaraz',
    transferhost='sshcampus.ccv.brown.edu',
    verbose=False,
)
Docstring:
This function can be used to upload local files to a folder
at CCV.

Parameters
----------
filename (str): path of the file to be uploaded; can be a
full or relative path.
folder (str): file will be uploaded to this folder
username (str): username at CCV
transferhost (str): the hostname of the transfer node at CCV
verbose (bool): whether to print out the command executed to
upload the file

Returns
-------
None
File:      ~/ZiaLab/Code

In [140]:
import time

def progress_bar(iteration, total, prefix = '', suffix = '', decimals = 1, length = 50, fill = '█'):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required : current iteration (Int)
        total       - Required : total iterations (Int)
        prefix      - Optional : prefix string (Str)
        suffix      - Optional : suffix string (Str)
        decimals    - Optional : positive number of decimals in percent complete (Int)
        length      - Optional : character length of bar (Int)
        fill        - Optional : bar fill character (Str)
    """
    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filled_length = int(length * iteration // total)
    bar = fill * filled_length + '-' * (length - filled_length)
    print(f'\r{prefix} |{bar}| {percent}% {suffix}', end = '\r')
    # Print New Line on Complete
    if iteration == total: 
        print()

# Test
for i in range(101):
    time.sleep(0.1)
    progress_bar(i, 100, prefix = 'Progress:', suffix = 'Complete', length = 20)


Progress: |████████████████████| 100.0% Complete


In [139]:
import time

def progress_bar(iteration, total, prefix = '', suffix = '', decimals = 1, length = 50, fill = '█'):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required : current iteration (Int)
        total       - Required : total iterations (Int)
        prefix      - Optional : prefix string (Str)
        suffix      - Optional : suffix string (Str)
        decimals    - Optional : positive number of decimals in percent complete (Int)
        length      - Optional : character length of bar (Int)
        fill        - Optional : bar fill character (Str)
    """
    # Calculate elapsed time and estimated remaining time
    global start_time
    if iteration == 0:
        start_time = time.time()
    elapsed_time = time.time() - start_time
    # Extrapolate from the average time per completed iteration; this reaches 0 on completion
    remaining_time = elapsed_time * (total - iteration) / iteration if iteration > 0 else 0

    percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
    filled_length = int(length * iteration // total)
    bar = fill * filled_length + '-' * (length - filled_length)
    print(f'\r{prefix} |{bar}| {percent}% {suffix} Elapsed Time: {elapsed_time:.2f}s Remaining Time: {remaining_time:.2f}s', end = '\r')
    # Print New Line on Complete
    if iteration == total: 
        print()

# Test
for i in range(101):
    time.sleep(0.1)
    progress_bar(i, 100, prefix = 'Progress:', suffix = 'Complete', length = 50)


Progress: |██████████████████████████████████████████████████| 100.0% Complete Elapsed Time: 10.33s Remaining Time: 0.00s
