# Submitting (Serial or Parallel) Jobs on Great Lakes

Submitting jobs to a cluster can be complicated. To learn the basics of how Great Lakes, the UM computing cluster, is structured, please refer to the [ITS intro slides](https://docs.google.com/presentation/d/1ONH2dwnR75qnJ3RHEK3VQseGkbAWO9pVdBWxl6UoXpc/edit?usp=sharing).

To submit your jobs to the cluster, you need to write a batch script, typically saved as an `xxx.sbat` file. An example of a batch script is given below.

Let's now go through it part by part so that you know what to modify when you create a batch script for your own purposes.

# Requesting resources with batch scripts

The `#SBATCH` lines at the top of the batch script specify the following:
1. account and partition: PhD students in the Department of Statistics typically use this combination; if you are an undergraduate, you may also have access to [an LSA account](https://arc.umich.edu/document/lsa-public-great-lakes-accounts/).
2. `ntasks`, `nodes`, `ntasks-per-node`: how tasks are distributed over nodes. The simplest scheme, and the one used here, is to parallelize over n nodes and run one task on each node, in which case `ntasks` and `nodes` are both n and `ntasks-per-node` is 1.
3. `cpus-per-task`: depends on whether each task is itself parallelized. If you parallelize within your code, e.g. an inference algorithm that computes the p-values for different parameters in parallel, each task needs more than one CPU. If instead each task is serial, 1 CPU per task is enough. For example, suppose a simulation program splits 500 iterations into 500 subtasks of 1 simulation each and assigns them to 500 nodes, and within each simulation the p-values for the parameters are constructed one after another rather than simultaneously. Nothing inside a task is parallelized, so you only need 1 CPU on each node.
4. `mem-per-cpu`: the amount of RAM requested per CPU. Programs used in our group rarely use up the 5GB set here; something like 500MB should be sufficient.
5. `time`: the running time requested for the job. If you don't know how long a large-scale job (e.g. 500 simulations) will take, first submit a small-scale job (e.g. 1 simulation) that requests far fewer resources (running time, cores, etc.) to get a sense of the running time, and then submit the large-scale job with an appropriate resource request.
6. **Always keep in mind that it is good practice not to request more resources than you need: your job's queuing time depends on the resources you request, and requesting unnecessary cores/nodes/time could delay its start, even by weeks.**
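
The pilot-run advice in item 5 comes down to simple arithmetic, sketched below in Python. The 240-second pilot timing, the block of 5 simulations per task, and the 1.5 safety factor are illustrative assumptions, not values from this project. The helper formats the result as `days-hours:minutes:seconds`, one of the formats Slurm accepts for `time`.

```python
import math


def slurm_time(seconds):
    """Format a duration in seconds as a Slurm time string (D-HH:MM:SS)."""
    seconds = math.ceil(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{secs:02d}"


def estimate_request(pilot_seconds, sims_per_task, safety_factor=1.5):
    """Scale the timing of one pilot simulation up to a full task.

    safety_factor is an assumed margin for run-to-run variation;
    choose it based on how variable your simulations are.
    """
    return slurm_time(pilot_seconds * sims_per_task * safety_factor)


# Hypothetical pilot: one simulation took 240 seconds, each task runs 5.
estimate = estimate_request(pilot_seconds=240, sims_per_task=5)
print(estimate)
```

The printed string can then be used as the value of `time` in the batch script.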

# Activating the Virtual Environment for Cluster Jobs

This part of the batch script does the following:
1. Changes the current working directory to the directory of the git repository (modify the path to point to your own repo)
2. Activates the virtual environment we created for the project, which has the necessary Python packages installed
3. Sets the Python version to 3.10. This is necessary to avoid version incompatibilities in some projects

# Arranging Parallel Jobs (Please Modify this as Needed)

This part of the batch script is only one easy way to parallelize jobs, and you may find more effective ways to do it. Essentially, it partitions 50 simulations into 10 blocks of 5. The vector `start_end` collects the endpoints of the simulation indices. We then loop over this index array and submit one job for each segment defined by consecutive endpoints.
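
The partitioning itself can be sketched in Python as follows. This is an illustration of the idea, not the code in the batch script: the helper name `block_endpoints` is hypothetical, and the indices are assumed to start at 0 with half-open blocks, which matches how `range(start, end)` is used in the Python entry point.

```python
def block_endpoints(n_sims, block_size):
    """Return the endpoints 0, block_size, ..., n_sims of half-open blocks."""
    endpoints = list(range(0, n_sims, block_size))
    endpoints.append(n_sims)  # close the last block, even if it is shorter
    return endpoints


# 50 simulations in blocks of 5 gives 11 endpoints and 10 blocks.
start_end = block_endpoints(50, 5)

# Pair consecutive endpoints: each (start, end) pair is one job's index range.
blocks = list(zip(start_end[:-1], start_end[1:]))

for start, end in blocks:
    print(f"job covers simulations {start} to {end - 1}")
```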

The command inside the loop submits a task to a node, where the task is to use `python3` to run the script `selectinf/Tests/nbd_simulation_vary_signal.py`.

Here, the script's `main()` function takes 4 arguments from the command line:
1. the starting simulation index
2. the ending simulation index
3. `0`, an argument specific to this project that may not be needed for other projects
4. `16`, which specifies that we have 16 CPU cores on each node. This is needed for this particular project because I internally parallelized the inference algorithm over parameters. Again, if your method is purely serial, or does not proceed parameter by parameter (e.g. a joint MLE estimation approach), internal parallelization is not needed.

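The ampersand `&` at the end of the command runs each task in the background, so the loop moves on to launch the next task right away and the tasks run simultaneously on different nodes. The `done` keyword closes the loop, and the `wait` command keeps the batch script running until all background tasks have finished. Removing any of these may cause problems in job submissions.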
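
The launch-then-wait pattern can also be shown in Python, purely as an analogy: the batch script does this in shell, and the child command here is a placeholder that only prints its block rather than running a real simulation. `subprocess.Popen` returns immediately, like a trailing `&`, and `communicate()` blocks until the child exits, like `wait`.

```python
import subprocess
import sys


def launch_blocks(blocks):
    """Start one child process per (start, end) block, then wait for all."""
    procs = []
    for start, end in blocks:
        # Placeholder command: a real job would run the simulation script.
        cmd = [sys.executable, "-c",
               f"print('simulations {start} to {end - 1}')"]
        # Popen does not block, so every child starts before any finishes.
        procs.append(subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True))
    # Collect output in launch order; each call waits for that child to exit.
    return [p.communicate()[0].strip() for p in procs]


outputs = launch_blocks([(0, 5), (5, 10)])
print(outputs)
```

Without the final waiting step the parent could exit while its children are still running, which is the problem `wait` prevents in the batch script.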

To make this concrete, I have attached the `main()` block of `nbd_simulation_vary_signal.py` below, so that you know how to write an entry point that takes these 4 arguments.

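```python
import sys

if __name__ == '__main__':
    # Called as: python3 nbd_simulation_vary_signal.py start end logic_tf ncores
    argv = sys.argv
    # range(start, end) below covers start, ..., end - 1
    start, end = int(argv[1]), int(argv[2])
    logic_tf = int(argv[3])  # project-specific argument (0 in the batch script)
    ncores = int(argv[4])    # CPU cores used for internal parallelization

    # The simulation function is assumed to be defined or imported above.
    nbd_simulations_vary_signal(range_=range(start, end), logic_tf=logic_tf,
                                ncores=ncores, m=2)
```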

`nbd_simulations_vary_signal()` is the simulation function.
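
If you prefer named, validated arguments over reading `sys.argv` by position, the same four arguments can be parsed with the standard-library `argparse` module. This is an alternative sketch, not the code used in the project; the function name `parse_args` and the help texts are my own.

```python
import argparse


def parse_args(argv=None):
    """Parse the four positional arguments passed by the batch script."""
    parser = argparse.ArgumentParser(description="Run a block of simulations.")
    parser.add_argument("start", type=int, help="first simulation index")
    parser.add_argument("end", type=int, help="index after the last simulation")
    parser.add_argument("logic_tf", type=int, help="project-specific argument")
    parser.add_argument("ncores", type=int, help="cores for internal parallelization")
    # argv=None makes argparse read sys.argv[1:], as in a real job.
    return parser.parse_args(argv)


# Same values as in the batch script example: a block of 5 simulations, 0, 16.
args = parse_args(["0", "5", "0", "16"])
sim_range = range(args.start, args.end)
print(list(sim_range), args.logic_tf, args.ncores)
```

With this version a missing or non-integer argument stops the job immediately with a usage message, instead of failing later with an `IndexError` or `ValueError`.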