# Tutorial for running many independent jobs in parallel

### 1. Create some input data

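```python
import numpy as np
import pickle as pkl

# Seeded generator so the input data is reproducible
rng = np.random.default_rng(12345)

N = 100  # number of independent inputs
d = 2    # dimension of each input
data = list(rng.normal(size=(N, d)))  # list of N arrays, each of shape (d,)

# Save the inputs where the run named "run_1" expects them
with open('hpc/run_1/data.pkl', 'wb') as f:
    pkl.dump(data, f)
```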
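
Before pushing anything to the cluster, it can be worth confirming that the pickle file reads back exactly as written. The following is a minimal sketch of such a round-trip check, not part of the tutorial's own scripts. It assumes the same data layout as above and writes to a temporary directory (a stand-in for `hpc/run_1/`) so that it runs anywhere.

```python
import pickle as pkl
import tempfile
from pathlib import Path

import numpy as np

# Same data layout as in the tutorial: a list of 100 arrays of shape (2,)
rng = np.random.default_rng(12345)
data = list(rng.normal(size=(100, 2)))

# Assumption: a temporary directory stands in for hpc/run_1/
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.pkl"
    with open(path, "wb") as f:
        pkl.dump(data, f)
    with open(path, "rb") as f:
        loaded = pkl.load(f)

# The file is good if every array survives the round trip unchanged
roundtrip_ok = len(loaded) == len(data) and all(
    np.array_equal(a, b) for a, b in zip(loaded, data)
)
print(roundtrip_ok)
```

To check the real file instead, point `path` at `hpc/run_1/data.pkl` and skip the write step.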

### 2. Run processing on remote cluster

For example, to run on the Bell cluster:

1. Fill in `hpc/config.sh` with the correct values. 

    - Optional: Add a `hpc-ignore` file to the parent directory to avoid moving unwanted files/directories to the cluster.

2. Run the following commands:

```bash
./hpc/push_to_remote.sh  # Put files onto the cluster
ssh <username>@<cluster>.rcac.purdue.edu  # Log into cluster
cd /scratch/<cluster>/<username>/<project_name>/v<version>  # These values are pulled from hpc/config.sh
source hpc/submit.sh <run_name>  # Submit the job(s) to the cluster; <run_name> is, for example, "run_1"
```



3. Wait for the job(s) to finish.

4. Pull the results back to your local machine:


```bash
# (From your local machine)
./hpc/pull_from_remote.sh <run_name>
```
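
For step 3, one way to tell when the jobs have finished is to poll the scheduler from the cluster's login node. The sketch below assumes the cluster uses Slurm and that `squeue` is on the `PATH`; this tutorial does not say which scheduler `hpc/submit.sh` targets, so treat that as an assumption. The helper names `count_job_lines` and `jobs_remaining` are hypothetical.

```python
import subprocess


def count_job_lines(squeue_output: str) -> int:
    """Count non-empty lines in header-less `squeue` output.

    Each line is one queue entry. A pending job array may appear as a
    single entry, so this is a lower bound on the number of tasks left.
    """
    return sum(1 for line in squeue_output.splitlines() if line.strip())


def jobs_remaining(username: str) -> int:
    """Ask the scheduler how many of `username`'s entries are still queued or running."""
    # Assumption: Slurm's `squeue` is available.
    # -h suppresses the header line, -u filters by user.
    out = subprocess.run(
        ["squeue", "-h", "-u", username],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return count_job_lines(out)
```

When `jobs_remaining("<username>")` returns 0, nothing of yours is left in the queue and the results should be ready to pull.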

And that's it! Now open the results file and compare with the expected output:

```python
with open('hpc/run_1/results.pkl', 'rb') as f:
    results = pkl.load(f)

# As it is, the processing script just returns the input data, so we can simply
# check that the results are the same as the input data.
for i in range(N):
    assert np.all(results[i][1] == data[i])

print("All tests passed!")
```

```
All tests passed!
```
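
The loop above stops at the first failing assertion and does not check that every job produced a result. A slightly more informative check might look like the sketch below. The helper name `find_mismatches` is hypothetical. The only thing taken from the tutorial is that element `[1]` of each result holds the output; the `(index, output)` tuples in the demo are stand-in data, not the real file format.

```python
import numpy as np


def find_mismatches(results, data):
    """Return the indices i where results[i][1] differs from data[i]."""
    if len(results) != len(data):
        raise ValueError(f"expected {len(data)} results, got {len(results)}")
    return [
        i
        for i, (res, x) in enumerate(zip(results, data))
        if not np.array_equal(res[1], x)
    ]


# Stand-in data: five inputs, with one result deliberately corrupted
rng = np.random.default_rng(0)
demo_data = list(rng.normal(size=(5, 2)))
demo_results = [(i, x.copy()) for i, x in enumerate(demo_data)]
demo_results[3] = (3, demo_data[3] + 1.0)

print(find_mismatches(demo_results, demo_data))  # → [3]
```

Calling `find_mismatches(results, data)` on the real results should return an empty list while the processing script still returns its input unchanged.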
