# GROMACS Resume - Start a long GROMACS run that times out, then resume it

On the supercomputers, a maximum runtime of 24 hours is enforced. If a longer GROMACS run is needed, it can be resumed. One of the outputs of the Rush `gmx` module is designed to be used as the input to the `gmx_resume` module, which resumes the run from the latest checkpoint. The outputs of `gmx_resume` are identical to those of `gmx`, so a run can be resumed as many times as necessary to finish it.

## 0.0) Imports

In [None]:
import os  # needed below to read RUSH_TOKEN from the environment
from pathlib import Path
import time

import rush

In [None]:
# |hide
# Users won't generally create a workspace
# We nuke to ensure run is reproducible
import os

WORK_DIR = Path.home() / "qdx" / "tutorial-gmx-resume"

if WORK_DIR.exists():
    client = rush.Provider(workspace=WORK_DIR)
    await client.nuke(remote=False)

os.makedirs(WORK_DIR, exist_ok=True)

2024-04-01 14:31:15,462 - rush - INFO - Not restoring by default via default
                                Use `.update_modules()` to update the lock file


In [None]:
RUSH_TOKEN = os.getenv("RUSH_TOKEN") or "YOUR_TOKEN_HERE"
client = rush.build_blocking_provider_with_functions(access_token=RUSH_TOKEN)

2024-04-01 14:31:15,487 - rush - INFO - Not restoring by default via default
                                Use `.update_modules()` to update the lock file


In [None]:
# |hide
# We hide this because users will generally not set a workspace, and won't restore by default
client = rush.build_blocking_provider_with_functions(
    batch_tags=["tutorial-resume-gmx"],
    workspace=WORK_DIR,
)

2024-04-01 14:31:16,781 - rush - INFO - Not restoring by default via default


## 0.1) Input Download and Selection

In [None]:
!pdb_fetch '1B39' | pdb_selchain -A | pdb_delhetatm > '1B39_A_nohet.pdb'

## 1) Input Preparation

In [None]:
_, prepared_protein_pdb = client.prepare_protein(
    Path.cwd() / "1B39_A_nohet.pdb", None, None
)

## 2.1) Run GROMACS (modules: gmx, gmx_resume)
Next, we will run a molecular dynamics simulation on our protein using GROMACS via the `gmx` module.

We'll set `timeout_duration_mins = 1` so that the run times out before it finishes, and then resume via the `gmx_resume` module using the first output, which is the archive that contains all the necessary data for resuming the run from the last saved checkpoint.

We'll set `checkpoint_interval_mins = 1.0/60` (one sixtieth of a minute) so that a checkpoint is written once per second.

We can restart as many times as we need to in order for the run to finish. Use a unique tag for each restarted run so that each sequential restart will be tagged and cached appropriately. See below for an example.

For each restarted run, please pass the same config file that was passed to the original `gmx` call. There is no support for shortening or extending runs, or changing any other run parameters, of runs that have already been started. Passing the same config ensures that progress is reported properly and that there are no other inconsistencies. So, the initial run config should specify the full desired run.
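One way to guard against accidentally passing a modified config to a resume call is to fingerprint the config when the run starts and compare before every resume. The sketch below is not part of Rush: `config_fingerprint` is a hypothetical helper built only on the Python standard library, and the two dicts are trimmed stand-ins for the real `gmx_config`.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable config dict."""
    # sort_keys makes the digest independent of dict insertion order
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Trimmed stand-in for the config passed to the original `gmx` call
original_config = {
    "params_overrides": {"md": {"nsteps": 150000}},
    "timeout_duration_mins": 1,
}
original_digest = config_fingerprint(original_config)

# Same content, different key order: should produce the same digest
resume_config = {
    "timeout_duration_mins": 1,
    "params_overrides": {"md": {"nsteps": 150000}},
}
if config_fingerprint(resume_config) != original_digest:
    raise ValueError("config changed since the original gmx call")
print("config unchanged:", original_digest[:12])
```

Storing the digest next to the resume archive makes the check possible even when the resume happens in a later session.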

One current limitation is that frame selection can only operate on the data generated by the last call to `gmx` or `gmx_resume`. To select frames from earlier segments, the output xtc files from all the calls must be joined and processed manually.
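Joining the xtc segments manually could look like the sketch below, which assumes a local GROMACS installation and uses its `gmx trjcat` tool with the `-f` (input files) and `-o` (output file) options. The file names and the `build_trjcat_command` helper are hypothetical; the snippet only builds and prints the command so it can be inspected before running.

```python
import shlex
from pathlib import Path


def build_trjcat_command(xtc_parts: list[Path], joined: Path) -> list[str]:
    """Build a `gmx trjcat` command that concatenates xtc segments in order."""
    if not xtc_parts:
        raise ValueError("at least one xtc segment is required")
    return ["gmx", "trjcat", "-f", *(str(p) for p in xtc_parts), "-o", str(joined)]


# Hypothetical file names: one downloaded xtc per gmx / gmx_resume call
segments = [Path("run_0.xtc"), Path("run_1.xtc"), Path("run_2.xtc")]
cmd = build_trjcat_command(segments, Path("joined.xtc"))
print(shlex.join(cmd))

# To actually run it (requires GROMACS on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the segments in the order the calls were made keeps the joined trajectory chronological.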

In [None]:
gmx_config = {
    "params_overrides": {
        "nvt": {"nsteps": 2000},
        "npt": {"nsteps": 2000},
        "md": {"nsteps": 150000},
    },
    "frame_sel": {
        "start_time_ps": 290,
        "end_time_ps": 300,
        "delta_time_ps": 1,
    },
    "checkpoint_interval_mins": 1.0 / 60,
    "timeout_duration_mins": 1,
    "num_gpus": 1,
    "save_wets": False,
}
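
As a sanity check on the numbers in this config, the sketch below relates step counts to simulated time. It assumes a 2 fs time step, which is not stated in the config but is inferred from the frame header `t= 299.00000 step= 149500` printed at the end of this tutorial; `ps_to_step` is a hypothetical helper.

```python
# Step counts copied from the config in this tutorial
nvt_steps, npt_steps, md_steps = 2000, 2000, 150000

# Assumption: 2 fs per step, inferred from "t= 299.00000 step= 149500"
DT_FS = 2


def ps_to_step(time_ps: int, dt_fs: int = DT_FS) -> int:
    """Convert a simulation time in picoseconds to an MD step index."""
    return time_ps * 1000 // dt_fs


total_steps = nvt_steps + npt_steps + md_steps
md_length_ps = md_steps * DT_FS / 1000

# Frame selection window from the config
start_time_ps, end_time_ps = 290, 300

print("total steps across nvt, npt and md:", total_steps)
print("production run length in ps:", md_length_ps)
print("frame window fits in the run:", end_time_ps <= md_length_ps)
print("first selected step:", ps_to_step(start_time_ps))
```

Under this assumption the total matches the `n_expected=154000` value reported in the resume log, and the frame selection window covers the last 10 ps of the production run.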

In [None]:
resume_files_first, streaming_outputs, static_outputs, *rest = client.gmx(
    None,
    prepared_protein_pdb,
    None,
    gmx_config,
    resources={"gpus": 1, "storage": 1, "storage_units": "GB"},
)

## Checking progress of GROMACS run
To determine whether your GROMACS run is done or needs to be resumed, look at the progress output, as demonstrated below.

The following example progress event indicates that the job has not completed and will need to be resumed.
`n` is the number of execution steps in the GMX module. If `n` is less than `n_expected`, or `done` is false, the run is not yet complete.


`gmx_resume_tengu progress: {
  "n": 600,
  "n_expected": 601,
  "n_max": 601,
  "done": false
}`
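
The completion rule above can be sketched in a few lines. The `Progress` dataclass here is a hypothetical stand-in for the progress object Rush returns; only the field names are taken from the example event, and the JSON string is that event's payload.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Progress:
    """Hypothetical mirror of the fields in a progress event."""

    n: int
    n_expected: int
    n_max: int
    done: bool

    @property
    def needs_resume(self) -> bool:
        # Not finished if `done` is false or fewer steps ran than expected
        return (not self.done) or self.n < self.n_expected


def parse_progress(payload: str) -> Progress:
    """Parse the JSON body of a progress event."""
    return Progress(**json.loads(payload))


example = '{"n": 600, "n_expected": 601, "n_max": 601, "done": false}'
progress = parse_progress(example)
print("needs resume:", progress.needs_resume)
print(f"fraction complete: {progress.n / progress.n_expected:.1%}")
```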

We keep calling `gmx_resume` until the progress reports that the run is done.

In [None]:
help(client.gmx_resume)

Help on function gmx_resume in module rush.provider:

gmx_resume(*args: *tuple[RushObject[bytes], Record], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes]]
    Runs a molecular dynamics simluation using GROMACS from either protein, resuming from a checkpoint.
    Uses GMX 2023.3 https://doi.org/10.5281/zenodo.10017686 and Acpype https://doi.org/10.1186/1756-0500-5-367

    Module version:
    `github:talo/tengu-gmx/eaaa2472bd2dc67eed931fa1816fd0b46c509599#gmx_resume_tengu`

    QDX Type Description:

        resume_files: Object[@$Bytes];
        gmx_config: GMXTenguConfig {
            water_box_size: f32?,
            ligand_charge: i8?,
            save_wets: bool?,
            frame_sel: FrameSelConfig {
                start_time_ps: u32,
                del

In [None]:
done = False
resumes = 0
resume_files = resume_files_first
while not done:
    resume_files, _, _, xtc_dry, pdb_dry, _, _ = client.gmx_resume(
        resume_files,
        gmx_config,
        tags=[f"gmx-resume-{resumes}"],
        restore=False,
        resources={"gpus": 1, "storage": 1, "storage_units": "GB"},
    )
    # wait for module to finish
    resume_files.get()
    progress = client.module_instance_blocking(resume_files.source).progress
    print(progress)
    done = progress.done
    resumes += 1

2024-04-01 14:31:19,696 - rush - INFO - Argument f680d54f-dc12-4eb0-8b54-156b914816d7 is now ModuleInstanceStatus.RESOLVING
2024-04-01 14:35:39,659 - rush - INFO - Argument f680d54f-dc12-4eb0-8b54-156b914816d7 is now ModuleInstanceStatus.ADMITTED
2024-04-01 14:36:13,472 - rush - INFO - Argument f680d54f-dc12-4eb0-8b54-156b914816d7 is now ModuleInstanceStatus.DISPATCHED
2024-04-01 14:36:20,285 - rush - INFO - Argument f680d54f-dc12-4eb0-8b54-156b914816d7 is now ModuleInstanceStatus.RUNNING
2024-04-01 14:37:24,915 - rush - INFO - Argument f680d54f-dc12-4eb0-8b54-156b914816d7 is now ModuleInstanceStatus.AWAITING_UPLOAD
n=101100 n_expected=154000 n_max=154000 done=False
2024-04-01 14:38:15,234 - rush - INFO - Argument 33b4853c-9b35-4a1a-9b1f-3117b8b36114 is now ModuleInstanceStatus.RESOLVING
2024-04-01 14:38:16,344 - rush - INFO - Argument 33b4853c-9b35-4a1a-9b1f-3117b8b36114 is now ModuleInstanceStatus.ADMITTED
2024-04-01 14:38:28,593 - rush - INFO - Argument 33b4853c-9b35-4a1a-9b1f-3117b

## Downloading Results
To download the extracted frames and unpack their PDB files, we can do the following:

In [None]:
pdb_dry.download("dry_frames.tar.gz")

PosixPath('/home/machineer/qdx/tutorial-gmx-resume/objects/dry_frames.tar.gz')

In [None]:
import tarfile

with tarfile.open(client.workspace / "objects" / "dry_frames.tar.gz", "r") as tf:
    selected_frame_pdbs = [
        tf.extractfile(member).read()
        for member in tf
        if "pdb" in member.name and member.isfile()
    ]
    for i, frame in enumerate(selected_frame_pdbs):
        with open(
            client.workspace / "objects" / f"gmx_output_frame_{i}.pdb", "w"
        ) as pf:
            print(frame.decode("utf-8"), file=pf)

In [None]:
with open(client.workspace / "objects" / "gmx_output_frame_0.pdb", "r") as f:
    print("".join(f.readlines()[0:10]))

REMARK    GENERATED BY TRJCONV
TITLE     Protein in water t= 299.00000 step= 149500
REMARK    THIS IS A SIMULATION BOX
CRYST1   87.942   87.942   87.941  60.00  60.00  90.00 P 1           1
MODEL       10
ATOM      1  N   MET     1      49.440  60.400   7.670  1.00  0.00           N
ATOM      2  H1  MET     1      49.050  59.970   8.500  1.00  0.00           H
ATOM      3  H2  MET     1      48.590  60.570   7.150  1.00  0.00           H
ATOM      4  H3  MET     1      49.910  59.650   7.180  1.00  0.00           H
ATOM      5  CA  MET     1      50.310  61.580   7.860  1.00  0.00           C
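
To confirm that an extracted frame falls inside the requested selection window, the time and step can be read back from the `TITLE` line shown above. This is a minimal sketch using a regular expression; `frame_time_and_step` is a hypothetical helper, and the header text is copied from the output above.

```python
import re

# Matches e.g. "t= 299.00000 step= 149500" in a trjconv-generated TITLE line
TITLE_RE = re.compile(
    r"t=\s*(?P<time_ps>[-+]?\d+(?:\.\d+)?)\s+step=\s*(?P<step>\d+)"
)


def frame_time_and_step(pdb_text: str) -> tuple[float, int]:
    """Return (time in ps, step) parsed from the first matching TITLE line."""
    for line in pdb_text.splitlines():
        if line.startswith("TITLE"):
            match = TITLE_RE.search(line)
            if match:
                return float(match["time_ps"]), int(match["step"])
    raise ValueError("no TITLE line with 't=' and 'step=' found")


header = (
    "REMARK    GENERATED BY TRJCONV\n"
    "TITLE     Protein in water t= 299.00000 step= 149500\n"
    "REMARK    THIS IS A SIMULATION BOX\n"
)
time_ps, step = frame_time_and_step(header)
print(time_ps, step)
```

The same function can be applied to each decoded entry in `selected_frame_pdbs` to list the times of all extracted frames.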

