# Questions about `gmxapi`
In this notebook, we check whether `gmxapi` can launch one or more expanded ensemble simulations with MPI-enabled GROMACS on Bridges-2. Starting from the same input file `sys.tpr`, I demonstrate three ways of running expanded ensemble simulations and describe the issues I ran into with each. Each folder `case_*` contains a Python script `case_*.py` and a copy of the common `tpr` file `sys.tpr` used to start the simulation. All cells in this notebook were executed on an interactive node with 128 cores on Bridges-2. The necessary modules and the virtual environment were activated before the notebook was launched.

For the record, we are using `gmxapi` 0.2.3 here. (I had issues installing version 0.3.0, so I could only install from the source code in the root directory of GROMACS 2021.4, which is version 0.2.3.)

In [1]:
import sys
sys.path.append('/jet/home/wehs7661/gmxapi_21.4/lib/python3.8/site-packages')
import gmxapi as gmx
print(gmx.__version__)

0.2.3


## Case 1: Running one expanded ensemble using `gmx.commandline_operation`

Let's first take a look at the content of `case_1.py`:

In [2]:
%%bash
cat case_1/case_1.py

import gmxapi as gmx

md = gmx.commandline_operation('gmx_mpi',
                               arguments=['mdrun', '-deffnm', 'sys'],
                               input_files={'-s': 'sys.tpr'})
md.run()

print(f'Return code of the process: {md.output.returncode.result()}\n')

if md.output.returncode.result() != 0:
   print(f'Error of the process:\n\n {md.output.erroroutput.result()}')



And below we run `case_1.py`:

In [3]:
%%bash
cd case_1/
mpirun -np 128 python -m mpi4py case_1.py

Return code of the process: 0

(The same line is repeated many more times; the captured output is truncated.)

In [4]:
%%bash
cd case_1/ && tail *log


NOTE: 11 % of the run time was spent communicating energies,
      you might want to increase some nst* mdp options

               Core t (s)   Wall t (s)        (%)
       Time:      155.358        1.214    12796.2
                 (ns/day)    (hour/ns)
Performance:      355.965        0.067
Finished mdrun on rank 0 Mon May  2 17:54:01 2022



As shown above, with 128 cores, the performance was over 300 ns/day and no errors occurred.
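
The return-code line in the output above is printed by more than one process, because every process started by `mpirun` executes the whole script. Below is a minimal sketch of printing such a message only once. It reads the rank from environment variables set by the launcher, which keeps the example free of extra dependencies; with `mpi4py`, `MPI.COMM_WORLD.Get_rank()` provides the same information. The helper names `get_rank` and `report` are hypothetical, and which environment variable is present depends on the MPI launcher in use.

```python
import os

# Environment variables that common launchers set to the rank of each process.
# Which one is present depends on the launcher (an assumption to verify on
# your system; Open MPI's mpirun sets OMPI_COMM_WORLD_RANK).
RANK_VARIABLES = ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "SLURM_PROCID")


def get_rank(environ=None):
    """Return the rank found in the environment, or 0 if none is set."""
    environ = os.environ if environ is None else environ
    for name in RANK_VARIABLES:
        value = environ.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 0


def report(returncode, environ=None):
    """Return the message to print on rank 0, and None on every other rank."""
    if get_rank(environ) != 0:
        return None
    return f"Return code of the process: {returncode}"


# Only the process with rank 0 prints; all other ranks stay silent.
message = report(0)
if message is not None:
    print(message)
```

In `case_1.py`, the same guard could wrap the two `print` calls so that the return code and any error output appear once rather than once per process.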

Regarding the first case, I have the following questions:
- When I first familiarized myself with `gmxapi` (version 0.3.0) using GROMACS built with thread-MPI, I found that whenever `gmx.commandline_operation` was used, a new folder named in the form of `gmxapi.commandline.cli*_i*` was always created. However, in this case, where we are using `gmxapi` 0.2.3 with MPI-enabled GROMACS, there is no such folder and the outputs of the command are saved in the working directory where the command was executed. I'm wondering what the reason for this is. Does it come from the different versions of `gmxapi`, or from thread-MPI vs. MPI?
- Also, with `gmxapi` 0.3.0 and thread-MPI GROMACS, the object created by `gmx.commandline_operation` has attributes like `stderr` and `stdout`, so the user can print STDERR and STDOUT by printing `stderr.result()` and `stdout.result()`. However, in the case shown in this notebook, the object `md` does not have such attributes and I could only print the return code. How can I print STDOUT and STDERR in this case? Are they only available in version 0.3.0?

(The examples I set up when exploring `gmxapi` 0.3.0 can be found [here](https://github.com/wehs7661/gmxapi_practice/blob/master/tutorial_nonMPI/gmxapi_nonMPI.ipynb).)
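
Until the second question is answered, one possible fallback is to capture the two streams with the standard library instead of `gmxapi`. The sketch below is not a `gmxapi` feature and does not tell us whether 0.2.3 exposes these streams; it only shows how `subprocess.run` returns the return code, STDOUT, and STDERR of a command. The helper name `run_and_capture` is hypothetical, and a stand-in command is used so that the example runs without GROMACS.

```python
import subprocess
import sys


def run_and_capture(command, cwd=None):
    """Run a command and return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
    )
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in command so the sketch runs anywhere. For case 1, the list would be
# something like ['gmx_mpi', 'mdrun', '-deffnm', 'sys'].
demo = [sys.executable, "-c",
        "import sys; print('out'); print('err', file=sys.stderr)"]
returncode, stdout, stderr = run_and_capture(demo)
print(f"Return code: {returncode}")
print(f"STDOUT:\n{stdout}")
print(f"STDERR:\n{stderr}")
```

Whether launching `gmx_mpi` this way from a script that is itself started by `mpirun` behaves well on Bridges-2 is an open assumption that would need to be tested.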

## Case 2: Running one expanded ensemble using `gmx.mdrun`

Again, here is the content of `case_2.py`.

In [5]:
%%bash
cat case_2/case_2.py

import gmxapi as gmx

simulation_input = gmx.read_tpr('sys.tpr')
md = gmx.mdrun(simulation_input)
md.run()

if md.output.returncode.result() != 0:
   print(f'Error of the process:\n\n {md.output.erroroutput.result()}')




And now we execute `case_2.py` to run the simulation. 

In [6]:
%%bash 
cd case_2/
mpirun -np 128 python -m mpi4py case_2.py

--------------------------------------------------------------------------
A process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          [[28461,1],75] (PID 21234)

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
Reading file sys.tpr, VERSION 2021.4-plumed-2.8.0 (single precision)
Reading file sys.tpr, VERSION 2021.4-plumed-2.8.0 (single precision)
Reading file /ocean/projects/cts160011p/wehs7661/gmxapi_practice/Issues/case_2/mdrun_0_1405803258414_0/topol.tpr, VERSION 

Process is interrupted.


As shown above, an error occurred and the simulation hung instead of crashing (I interrupted it manually). It is not entirely clear what triggered the error, but to my understanding it might be related to running the simulation as a single process on an allocation with multiple cores, which is why the error says that "127 more processes have sent help message help-opal-runtime.txt".

Regarding this case, I have the following questions:
- While `gmx.commandline_operation` can finish the simulation without crashing, it seems that `gmx.mdrun` was not able to correctly launch the simulation. I'm wondering the reason for this, especially since I'm more likely to rely on `gmx.mdrun` instead of `gmx.commandline_operation` for setting up ensembles of simulations.
- When using `gmxapi` 0.3.0 with thread-MPI GROMACS, I found that `gmx.mdrun` always generated a folder named in the form of `mdrun_*_i*_*` (e.g. `mdrun_0_i0_0`). In our case here, the name of the newly created folder (`mdrun_0_1444030601518_0`) does not follow this convention. I'm wondering what the reason behind this is and whether we can ensure a more convenient naming convention like `mdrun_0_i0_0`.
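
While the naming question is open, a post-processing script does not have to depend on the middle part of the folder name. The sketch below assumes only that the name starts with `mdrun_` and ends with `_<index>`, which holds for both `mdrun_0_i0_0` and `mdrun_0_1444030601518_0`. Reading the trailing number as the index of the simulation is an assumption based on the folder names seen in this notebook, and `member_directories` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

# Assumed layout: mdrun_<opaque middle part>_<member index>. This matches both
# 'mdrun_0_i0_0' and 'mdrun_0_1444030601518_0'; what the middle part encodes
# is not something this sketch relies on.
WORKDIR_PATTERN = re.compile(r"^mdrun_.+_(?P<member>\d+)$")


def member_directories(root):
    """Map member index -> work directory found directly under root."""
    found = {}
    for path in Path(root).iterdir():
        match = WORKDIR_PATTERN.match(path.name)
        if path.is_dir() and match:
            # If several runs left directories for the same member, the one
            # iterated last wins; clean up old runs to avoid ambiguity.
            found[int(match.group("member"))] = path
    return dict(sorted(found.items()))


# Demonstration on a throwaway directory that mimics both naming styles.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("mdrun_0_1444030601518_0", "mdrun_0_i0_1", "not_a_workdir"):
        (Path(tmp) / name).mkdir()
    for member, path in member_directories(tmp).items():
        print(member, path.name)
```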

## Case 3: Running multiple expanded ensemble simulations simultaneously 

To simultaneously run multiple expanded ensemble simulations, it seems that `gmx.mdrun` is the only choice since `gmx.commandline_operation` doesn't seem to support an ensemble of simulations. As such, we write `case_3.py` as follows to launch 16 simulations.

In [7]:
%%bash
cat case_3/case_3.py

import gmxapi as gmx

tpr_list = ['sys.tpr' for i in range(16)]
simulation_input = gmx.read_tpr(tpr_list)
md = gmx.mdrun(simulation_input)
md.run()


Since there are 16 simulations, we run `case_3.py` with 16 processes:

In [8]:
%%bash 
cd case_3/
mpirun -np 16 python -m mpi4py case_3.py

Reading file sys.tpr, VERSION 2021.4-plumed-2.8.0 (single precision)
(The same line is repeated many more times; the captured output is truncated.)

In [9]:
%%bash
cd case_3/ && ls

case_3.py
mdrun_0_1398987389751_9
mdrun_0_1399939571782_14
mdrun_0_1401919121463_2
mdrun_0_1402919887927_6
mdrun_0_1404236386359_7
mdrun_0_1408559651892_5
mdrun_0_1419182768710_10
mdrun_0_1436694189620_3
mdrun_0_1438002348852_4
mdrun_0_1439905044806_11
mdrun_0_1450819293495_8
mdrun_0_1458568374598_15
mdrun_0_1458861203782_12
mdrun_0_1459643762246_13
mdrun_0_1462404481076_1
mdrun_0_1463586672180_0
sys.tpr


In [10]:
%%bash
cd case_3 && ls mdrun_*_0/*  

mdrun_0_1463586672180_0/confout.gro
mdrun_0_1463586672180_0/dhdl.xvg
mdrun_0_1463586672180_0/ener.edr
mdrun_0_1463586672180_0/md.log
mdrun_0_1463586672180_0/state.cpt
mdrun_0_1463586672180_0/topol.tpr
mdrun_0_1463586672180_0/traj_comp.xtc


In [11]:
%%bash
cd case_3 && ls mdrun_*_1/*   # the simulation was not performed so there is only a tpr file, which is the same as sys.tpr

mdrun_0_1462404481076_1/topol.tpr


As shown above, a call to the `fork()` system call was still involved. This time, 16 folders corresponding to the 16 simulations were created, but their names are still not in the form of `mdrun_*_i*_*`. Importantly, at least the first simulation (index 0) was executed successfully by `gmx.mdrun`, but the process then ended without running the other 15 simulations. Unlike in case 2, no simulation hung.
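
The manual `ls` checks above can be turned into a small script that reports which folders contain a finished run. This is a sketch under one assumption: the presence of `confout.gro`, which appears in the listing for the simulation that ran and is absent from the one that did not, is used as the marker of completion. The helper name `ensemble_status` is hypothetical.

```python
import tempfile
from pathlib import Path


def ensemble_status(root, marker="confout.gro"):
    """Return {work directory name: True if the marker file exists}."""
    status = {}
    for path in sorted(Path(root).glob("mdrun_*")):
        if path.is_dir():
            status[path.name] = (path / marker).exists()
    return status


# Demonstration that mimics the listings above: one folder holds a finished
# run, the other only received its input file.
with tempfile.TemporaryDirectory() as tmp:
    finished = Path(tmp) / "mdrun_0_1463586672180_0"
    pending = Path(tmp) / "mdrun_0_1462404481076_1"
    for directory in (finished, pending):
        directory.mkdir()
        (directory / "topol.tpr").touch()
    (finished / "confout.gro").touch()

    for name, done in ensemble_status(tmp).items():
        print(f"{name}: {'finished' if done else 'not run'}")
```

Calling `ensemble_status('case_3')` on the real folder would list all 16 work directories in one pass.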

Regarding this case, I have the following questions:
- What is the reason that only one simulation got executed? Using `gmxapi` 0.3.0 with thread-MPI GROMACS, I was able to use a script like `case_3.py` with `mpiexec` to run multiple simulations simultaneously without any problems. 
- When only running a single simulation, we can use the following script to modify parameters by reading the `tpr` file:
```
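import gmxapi as gmx

simulation_input = gmx.read_tpr('sys.tpr')
# Override nsteps in the input read from the tpr file
modified_input = gmx.modify_input(input=simulation_input, parameters={'nsteps': 1000})
# Run with the modified input (not the original simulation_input)
md = gmx.mdrun(input=modified_input)
md.run()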
```
However, what if each of the 16 simulations needs a different value for one or more parameters? I'm asking because our first-stage goal is to run an ensemble of expanded ensemble simulations, where we might need to repeatedly restart or extend the simulations with updated weights. Executing the ensemble should be as simple as launching a single command like `mpirun -np 16 python -m mpi4py ensemble_EE.py`, where the algorithm for how the weights change and how the simulations interact with each other is specified in `ensemble_EE.py`.

I know that it is possible to write a custom Python script that parses the `mdp` file and generates an updated `tpr` file by running `grompp` through `gmx.commandline_operation`. However, running `grompp` only requires 1 process (`-np 1`), which means that we can't update the `tpr` file in `ensemble_EE.py`, which is always executed with 16 processes. In short, it's not clear to me how we can automate the adjustment of MD parameters for an ensemble of simulations using `gmxapi` in a single script like `ensemble_EE.py`.
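
To make the `mdp`-parsing route concrete, here is a minimal sketch of the text-processing part only: it takes a template `mdp` and produces one modified copy per simulation. It does not call `grompp` or `gmxapi`. The helper names (`normalize`, `update_mdp`, `per_member_mdp`) and the parameter values are hypothetical, and the sketch assumes the usual `key = value` layout with `;` comments, with `-` and `_` treated as interchangeable in keys.

```python
def normalize(key):
    """Treat '-' and '_' in mdp keys as the same character."""
    return key.strip().replace("_", "-")


def update_mdp(text, updates):
    """Return mdp text with the given parameters replaced or appended."""
    pending = {normalize(k): str(v) for k, v in updates.items()}
    lines = []
    for line in text.splitlines():
        body = line.split(";", 1)[0]  # ignore a trailing comment
        if "=" in body:
            key = normalize(body.split("=", 1)[0])
            if key in pending:
                # The rewritten line drops any comment the original had.
                line = f"{key} = {pending.pop(key)}"
        lines.append(line)
    # Parameters that were not present in the template are appended.
    lines.extend(f"{key} = {value}" for key, value in pending.items())
    return "\n".join(lines) + "\n"


def per_member_mdp(template, member_updates):
    """Build one mdp text per simulation from a shared template."""
    return [update_mdp(template, updates) for updates in member_updates]


# Hypothetical template and per-simulation values, for illustration only.
template = "nsteps = 500 ; number of steps\ninit_lambda_weights = 0 0 0\n"
members = [
    {"nsteps": 1000, "init-lambda-weights": "0 1.5 3.0"},
    {"nsteps": 2000},
]
for index, text in enumerate(per_member_mdp(template, members)):
    print(f"--- simulation {index} ---")
    print(text, end="")
```

One arrangement that might fit a 16-process script is for each process to write the `mdp` for the simulation matching its own rank and run `grompp` on it as a child process, but whether that works with `gmx.commandline_operation` in this setup is exactly the open question above.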