# rush-py Full Walkthrough

Below we walk through building and running a drug discovery workflow while explaining the concepts in the Rush SDK: we prepare a protein and ligand for molecular dynamics simulation, run the molecular dynamics, and perform an MM-PBSA energy calculation.

First, install the following modules via pip; we require Python > 3.9. Only `rush-py` is necessary for interacting with the API, but we use additional packages in this notebook for fetching and visualizing data.
```
pip install rush-py pdb-tools py3Dmol requests
```

# 0) Setup
This is where we prepare the rush client, directories, and input data we'll be working with.

## 0.0) Imports

In [None]:
import os
import tarfile
from datetime import datetime
from pathlib import Path

from pdbtools import (
    pdb_fetch,
    pdb_delhetatm,
    pdb_selchain,
    pdb_rplresname,
    pdb_keepcoord,
    pdb_selresname,
)
import py3Dmol

import rush

## 0.1) Credentials
Retrieve your API token from the [Rush UI](https://rush.qdx.co/dashboard/settings).

You can either set the RUSH_TOKEN and RUSH_URL environment variables, or provide them as variables to the client directly. 

To see how to set environment variables, see the extensive [Wikipedia](https://en.wikipedia.org/wiki/Environment_variable) article.

In [None]:
RUSH_TOKEN = os.getenv("RUSH_TOKEN") or "YOUR_TOKEN_HERE"
RUSH_URL = os.getenv("RUSH_URL") or "https://tengu.qdx.ai"
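
The fallback logic above silently substitutes a placeholder token when `RUSH_TOKEN` is unset. As a minimal sketch, assuming you would rather fail early, the lookup could be wrapped in a small helper. `resolve_credentials` and `DEFAULT_URL` are hypothetical names used for illustration and are not part of rush-py.

```python
import os

# Same default URL as the cell above
DEFAULT_URL = "https://tengu.qdx.ai"


def resolve_credentials(env=os.environ):
    """Return (token, url) from a mapping, raising early if no token is set.

    Hypothetical helper; rush-py does not provide this function.
    """
    token = env.get("RUSH_TOKEN")
    if not token:
        raise RuntimeError(
            "RUSH_TOKEN is not set; export it or pass access_token= explicitly"
        )
    url = env.get("RUSH_URL") or DEFAULT_URL
    return token, url


# Example with an explicit mapping instead of the real environment
token, url = resolve_credentials({"RUSH_TOKEN": "example-token"})
print(url)
```

Passing a mapping rather than reading `os.environ` directly keeps the helper easy to exercise without touching your real environment.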

## 0.2) Configuration
Let's set some global variables that define our project. These are not required, but they are good practice for organizing the jobs that will be persisted under your account.

Make sure you create a unique set of tags for each run.
Good practice is to include at least the experiment name and the system name as tags.

In [None]:
EXPERIMENT = "rush-py-v2-explainer"
SYSTEM = "1B39"
LIGAND = "ATP"
TAGS = ["qdx", EXPERIMENT, SYSTEM, LIGAND]
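
One way to keep tag sets unique per run is to append a timestamp tag. The sketch below assumes that convention; `make_run_tags` and the `run-` prefix are hypothetical and not something rush-py prescribes.

```python
from datetime import datetime, timezone


def make_run_tags(experiment, system, ligand, now=None):
    """Build a tag list ending in a UTC timestamp tag (hypothetical convention)."""
    now = now or datetime.now(timezone.utc)
    run_stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return ["qdx", experiment, system, ligand, f"run-{run_stamp}"]


print(make_run_tags("rush-py-v2-explainer", "1B39", "ATP"))
```

Because `restore` matches on tags, a timestamp tag would make every run distinct. Leave it out if you want earlier runs to be restored.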

In [None]:
# |hide
# Users won't generally create a workspace
# We nuke to ensure run is reproducible
WORK_DIR = Path.home() / "qdx" / EXPERIMENT

if WORK_DIR.exists():
    client = rush.build_blocking_provider_with_functions()
    client.nuke(remote=False)

os.environ["RUSH_RESTORE_BY_DEFAULT"] = "False"
os.makedirs(WORK_DIR, exist_ok=True)

2024-03-21 18:47:27,997 - rush - INFO - Restoring by default via env


## 0.3) Build your client
Get our client, which we use for calling modules and using the Rush API.

As mentioned earlier, `access_token` and `url` are optional if you have set the environment variables RUSH_TOKEN and RUSH_URL.

`batch_tags` will be applied to each run that is spawned by this client.

A folder called `.rush` will be created in your workspace directory (defaults to the current working directory; can be overridden by passing `workspace=` to the provider builder).

In [None]:
# By using the `build_blocking_provider_with_functions` method,
# we will also build helper functions calling each module
client = rush.build_blocking_provider_with_functions(
    access_token=RUSH_TOKEN, url=RUSH_URL, batch_tags=TAGS
)

2024-03-21 18:47:29,869 - rush - INFO - Not restoring by default via env


## 0.5) Input selection
Set where we want to save our inputs.


In [None]:
SYSTEM_PDB_PATH = client.workspace / f"{SYSTEM}.pdb"
PROTEIN_PDB_PATH = client.workspace / f"{SYSTEM}_P.pdb"
LIGAND_SMILES_STR = "c1nc(c2c(n1)n(cn2)[C@H]3[C@@H]([C@@H]([C@H](O3)CO[P@@](=O)(O)O[P@](=O)(O)OP(=O)(O)O)O)O)N"
LIGAND_FILE_PATH = client.workspace / f"{SYSTEM}L.smi"
LIGAND_PDB_PATH = client.workspace / f"{LIGAND}_L.pdb"

Fetch data files from the RCSB.

In [None]:
complex = list(pdb_fetch.fetch_structure(SYSTEM))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(complex, "A"))
# select the ATP residue
ligand = pdb_selresname.filter_residue_by_name(complex, LIGAND)
# we require ligands to be labelled as UNL
ligand = pdb_rplresname.rename_residues(ligand, LIGAND, "UNL")
# we don't want to repeat all of the remark / metadata that is already in the
# protein
ligand = pdb_keepcoord.keep_coordinates(ligand)
# write our files to the locations defined in the config block
with open(SYSTEM_PDB_PATH, "w") as f:
    for l in complex:
        f.write(str(l))
with open(PROTEIN_PDB_PATH, "w") as f:
    for l in protein:
        f.write(str(l))
with open(LIGAND_PDB_PATH, "w") as f:
    for l in ligand:
        f.write(str(l))

## 0.6) View rush modules
Rush modules are "functions" that perform various computational chemistry tasks and can be run on HPC infrastructure. We maintain multiple versions of these functions so that your scripts stay stable across upgrades.

In [None]:
# Get our latest modules as a dict[module_name, module_path]
# If a lock file exists, load it so that the run is reproducible
# This will be done automatically if you use the `build_provider_with_functions`
# method
modules = client.get_latest_module_paths_blocking()

In [None]:
module_name = "hermes_energy"
module_path = modules[module_name]
print(module_path)

github:talo/tengu-prelude/4a16cf12be35f0a40c113d4410046de865f1906f#hermes_energy


  - `module_name` is a descriptive string and indicates the "function" the module is calling;
  - `module_path` is a versioned rush "endpoint" for a module accessible via the client.

Using the same `module_path` string across multiple runs provides reproducibility.
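
To make the parts of a `module_path` explicit, here is a minimal sketch that splits one into its pieces. It assumes the `host:owner/repo/revision#name` shape seen in the printed example; that shape is inferred from the output above, not from a documented format, and `ModulePath` and `parse_module_path` are hypothetical names.

```python
from typing import NamedTuple


class ModulePath(NamedTuple):
    host: str
    owner: str
    repo: str
    revision: str
    name: str


def parse_module_path(path):
    """Split 'host:owner/repo/revision#name' into fields (assumed layout)."""
    location, hash_sep, name = path.partition("#")
    host, colon, rest = location.partition(":")
    parts = rest.split("/")
    if not hash_sep or not colon or len(parts) != 3:
        raise ValueError(f"unexpected module path: {path!r}")
    owner, repo, revision = parts
    return ModulePath(host, owner, repo, revision, name)


parsed = parse_module_path(
    "github:talo/tengu-prelude/4a16cf12be35f0a40c113d4410046de865f1906f#hermes_energy"
)
print(parsed.name, parsed.revision[:7])
```

Recording the revision alongside your results is one way to document which module version produced them.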

## 0.7) Use module functions
Next, we'll use the helper functions for the modules that we've fetched.

If we have built a provider with functions, we can use the Python `help()` function to describe their usage.

The QDX Type Description is a standard type definition across multiple programming languages to assist in interoperability.
`@` indicates that the type is stored in a file, which will be synced to cloud storage.

In [None]:
help(client.convert)

Help on function convert in module rush.provider:

convert(*args: *tuple[EnumValue, RushObject[bytes]], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[list[Record]]]
    Convert biomolecular and chemical file formats to the QDX file format. Supports PDB and SDF

    Module version:
    `github:talo/tengu-prelude/b345a0b0077225c63d904d0e03fb9ca1acec55ed#convert`

    QDX Type Description:

        format: Format[PDB | SDF];
        input: Object[@$Bytes]
        ->
        output: Object[[Conformer]]


    :param format: the format of the input file
    :param input: the input file
    :return output: the output conformers



# 1) Running Rush Modules
Below we'll call modules using the functions created on the client.

The parameters to a rush module function look like the following:

  - `*args`: The values or ids passed to the module:
    1. For @Objects: a `pathlib.Path` or a file-like object such as `BufferedReader`, `FileIO`, `StringIO`, etc.:
         Loads the data in the file as an argument.
         **NOTE**: The uploaded value isn't just the string of the file,
         so don't pass the string directly; pass the path or wrap it in `StringIO`.
    2. A rush `Provider.Argument` or `ArgId` returned by a previous call to a rush module via `client.[some_module_name]()`:
         The `ArgId` type wraps data for use within rush. It may refer to an object already
         uploaded to rush storage, such as outputs of other run calls.
         See below for more details. It's easier to understand when you see an example.
    3. A parameter, i.e. a value of any other type, including `None`:
         Ensure the values match what is outlined in the `*args` list.
  - `**kwargs`:
      - `target`: The machine we want to run on (e.g. `NIX_SSH` for a cluster, `GADI` for a supercomputer).
      - `resources`: The resources to use on the target. The most commonly provided are `{"gpus": n, "storage": storage_in_units, "storage_units": "B" | "MB" | "GB", "walltime": mins}`.
      - `tags`: Tags to associate with our run, so we can easily look up our runs. By default, they are populated from the `batch_tags` passed to the client on construction.
      - `restore`: If this is set to `True`, the function will check whether a single module instance exists for the same version of the function with the same tags, and return that instead of re-running.

The return value is a tuple of `Provider.Argument`s. You can wait for them to resolve by calling `your_argument.get()` (`await your_argument.get()` with the async provider), or pass the arguments directly to subsequent functions, which will cause Rush to do the waiting for you.

You can see the status of all the jobs submitted for your workspace or session by calling `client.status()`.
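
The note about not passing raw file contents as a string can be illustrated with the standard library alone. This sketch shows the two forms described for @Objects, a file-like wrapper and a `pathlib.Path`. The SMILES line and the file name are illustrative values, and no rush call is made here.

```python
import tempfile
from io import StringIO
from pathlib import Path

# Illustrative content: a SMILES string followed by a name
smiles_line = "CCO ethanol\n"

# Option 1: wrap in-memory text in a file-like object
as_file_like = StringIO(smiles_line)

# Option 2: write the text to disk and hand over the Path instead
with tempfile.TemporaryDirectory() as tmp:
    smi_path = Path(tmp) / "example.smi"
    smi_path.write_text(smiles_line)
    on_disk = smi_path.read_text()

# Both forms carry the same file contents
print(as_file_like.getvalue() == on_disk)
```

Either object could then be passed in the position where a module expects an @Object, whereas the bare string `smiles_line` would be treated as an ordinary parameter.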

We will now demonstrate how this works in action

## 1.1) Input Preparation

### 1.1.1) Prep the protein
First we will run the protein preparation routine (using pdbfixer and pdb2pqr internally) to prepare the protein for molecular dynamics.

In [None]:
# we can check the arguments and outputs for prepare_protein with help()
help(client.prepare_protein)

Help on function prepare_protein in module rush.provider:

prepare_protein(*args: *tuple[RushObject[bytes]], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[list[Record]], RushObject[bytes]]
    Prepare a PDB for downstream tasks: protonate, fill missing atoms, etc.

    Module version:
    `github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu`

    QDX Type Description:

        input_pdb: Object[@$Bytes]
        ->
        output_qdxf: Object[[Conformer]];
        output_pdb: Object[@$Bytes]


    :param input_pdb: An input protein as a file; one PDB file
    :return output_qdxf: An output protein a vec: one qdxf per model in pdb
    :return output_pdb: An output protein as a file: one PDB file



In [None]:
# Here we run the function; it returns Provider.Args which you can use to
# fetch the results
# Passing restore=True would restore a previous run of the same module
# version with the same tags instead of re-running
(prepared_protein_qdxf, prepared_protein_pdb) = client.prepare_protein(
    PROTEIN_PDB_PATH
)
# This initially only has the id of your result; we will show how to fetch the
# actual value later
print(f"{datetime.now().time()} | Running protein prep!")

18:47:32.581235 | Running protein prep!


### 1.1.2) Checking results

#### 1.1.2.1) Run statuses
This will show the status of all of your runs.

In [None]:
client.status()

{'14519ca4-3e02-444d-86f7-16fdc68a3f54': (<ModuleInstanceStatus.RESOLVING: 'RESOLVING'>,
  'prepare_protein',
  1)}

#### 1.1.2.2) Run Logs
If any of our runs fail, we can check their logs with `client.logs()` or view them in the Rush UI.

In [None]:
for instance_id, (status, name, count) in (client.status()).items():
    if status.value == "FAILED":
        client.logs(instance_id, "stderr")
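
The loop above relies on the shape of the dictionary returned by `client.status()`: instance id mapped to a `(status, name, count)` tuple. As a minimal sketch of the same filtering on stand-in data, the block below uses a local `Status` enum in place of rush's `ModuleInstanceStatus`. `failed_runs` is a hypothetical helper, and the example ids are made up.

```python
from enum import Enum


class Status(Enum):
    """Stand-in for rush's ModuleInstanceStatus, for illustration only."""

    RESOLVING = "RESOLVING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


def failed_runs(statuses):
    """Return {instance_id: module_name} for every run whose status is FAILED."""
    return {
        instance_id: name
        for instance_id, (status, name, _count) in statuses.items()
        if status.value == "FAILED"
    }


example = {
    "id-1": (Status.COMPLETED, "prepare_protein", 1),
    "id-2": (Status.FAILED, "auto3d", 1),
}
print(failed_runs(example))
```

Collecting the failures first makes it easy to decide which logs to fetch.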

#### 1.1.2.3) Run Values
This will return the "value" of the output from the function. For files you will receive a URL that you can download; otherwise you will receive the values as Python types.

In [None]:
prepared_protein_pdb.get()

2024-03-21 18:47:32,865 - rush - INFO - Argument e44b7804-4d3b-4cf8-b7b1-afffb6950ae9 is now ModuleInstanceStatus.RESOLVING
2024-03-21 18:47:36,180 - rush - INFO - Argument e44b7804-4d3b-4cf8-b7b1-afffb6950ae9 is now ModuleInstanceStatus.ADMITTED
2024-03-21 18:47:49,594 - rush - INFO - Argument e44b7804-4d3b-4cf8-b7b1-afffb6950ae9 is now ModuleInstanceStatus.DISPATCHED
2024-03-21 18:47:56,198 - rush - INFO - Argument e44b7804-4d3b-4cf8-b7b1-afffb6950ae9 is now ModuleInstanceStatus.RUNNING
2024-03-21 18:48:06,886 - rush - INFO - Argument e44b7804-4d3b-4cf8-b7b1-afffb6950ae9 is now ModuleInstanceStatus.AWAITING_UPLOAD


'https://storage.googleapis.com/rush_store_default/f68d91e0-bd46-4582-9e5c-8a7cbb08d356?x-goog-signature=555eeb8f540982caf4e4c854a0ed5c5a57f49036e38b4a8eca22b63ca78189a12e2efc83c1bb6651491bd5893a194c46ceffc6dfe273d15de94559b7a63104a2c63acd260096c7c646c156d0cbfb8c5743a207c54eafcf0f1136c1db40a82aa233a54458dfa30d60944cd8b5e5ae685fef5f01b35cf6176007f5461f3e8d34a90b92e0ee5c01ea860c025bf3f8257ab6bb823e26dacc50399b174fb5692a6b58b33adf92c17347624036d6f5e7349b21499b193e34bd626b7bc84cbe531bd431858cfff8864058a9197a99f81fc4d2683ee1125e72d48d96e4199e1f2124820b3d6191b2dfb2446ef875a63e7ae5739e8f27b683c77728b91c32c604dc354263&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=qdx-store-user%40humming-bird-321603.iam.gserviceaccount.com%2F20240321%2Fasia-southeast1%2Fstorage%2Fgoog4_request&x-goog-date=20240321T104835Z&x-goog-expires=3600&x-goog-signedheaders=host'
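
The returned URL is a signed Google Cloud Storage link, so it is only valid for a limited time. The `x-goog-date` and `x-goog-expires` query parameters visible in the output give the signing time and the lifetime in seconds. Here is a minimal sketch that computes the expiry from those two parameters. `signed_url_expiry` is a hypothetical helper, and the example URL is shortened to the relevant parameters.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the UTC time at which a V4 signed URL stops being valid."""
    # Lower-case the keys so both X-Goog-* and x-goog-* spellings work
    params = {k.lower(): v[0] for k, v in parse_qs(urlparse(url).query).items()}
    issued = datetime.strptime(params["x-goog-date"], "%Y%m%dT%H%M%SZ")
    issued = issued.replace(tzinfo=timezone.utc)
    return issued + timedelta(seconds=int(params["x-goog-expires"]))


example_url = (
    "https://storage.googleapis.com/bucket/object"
    "?x-goog-date=20240321T104835Z&x-goog-expires=3600"
)
print(signed_url_expiry(example_url).isoformat())  # 2024-03-21T11:48:35+00:00
```

If the link has expired, calling `.get()` again is presumably the way to obtain a fresh one; the source does not state this explicitly.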

#### 1.1.2.4) Downloads
We provide a utility to download files into your workspace. You can either provide a filename, in which case the file is saved to `workspace/objects/[filename]`, or provide your own filepath, which the client will use as-is.

In [None]:
downloaded_protein_path = prepared_protein_pdb.download(
    filename="01_prepared_protein.pdb", overwrite=True
)

We can read our prepared protein PDB like this:

In [None]:
with open(downloaded_protein_path, "r") as f:
    print(f.readline(), "...")

REMARK   1 CREATED WITH OPENMM 8.0, 2024-03-21
 ...


You should visualize your prepared protein to spot-check any incorrectly transformed residues.

In [None]:
view = py3Dmol.view()
with open(client.workspace / "objects" / "01_prepared_protein.pdb", "r") as f:
    view.addModel(f.read(), "pdb")
    view.setStyle({"cartoon": {"color": "spectrum"}})
    view.zoomTo()
    view.show()

### 1.1.3) Prep the ligand
Next we will prepare the ligand (using auto3d internally).

In [None]:
# we can check the inputs for auto3d with help()
help(client.auto3d)

Help on function auto3d in module rush.provider:

auto3d(*args: *tuple[RushObject[bytes], str, Record], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[None], RushObject[list[Record]]]
    Generate 3D conformers from SMILES strings and other inputs

    Module version:
    `github:talo/tengu-auto3d/bd31770c27581753010b6623cdd4bd82b0628e79#auto3d_tengu`

    QDX Type Description:

        molecule_file: Object[@$Bytes];
        molecule_file_type: string;
        options: Auto3dOptions {
            enumerate_isomer: bool?,
            threshold: f32?,
            gpu_idx: [u32]?,
            capacity: u32?,
            k: i32?,
            verbose: bool?,
            optimizing_engine: Auto3dOptimizingEngines[ANI2x | ANI2xt | AIMNET]?,
            enumerate_tautomer: bool?,
            mpi_np: u32?,
            job_name: string?,
            opt_steps: u32?,
            patience: u

In [None]:
with open(LIGAND_FILE_PATH, "w") as f:
    print(f"{LIGAND_SMILES_STR} {LIGAND_SMILES_STR}", file=f)

# takes a path with the SMILES
(ligand_sdf, ligand_qdxf) = client.auto3d(
    LIGAND_FILE_PATH,
    "smi",
    {"k": 2, "use_gpu": True},
    resources={"gpus": 1, "storage": "5", "storage_units": "MB"},
)

print(f"{datetime.now().time()} | Running ligand prep!")

18:48:39.815126 | Running ligand prep!


In [None]:
# we can check the status again
client.status()

{'651bbdba-54ac-4d83-bb3a-f82a4655cdfe': (<ModuleInstanceStatus.RESOLVING: 'RESOLVING'>,
  'auto3d',
  1),
 '14519ca4-3e02-444d-86f7-16fdc68a3f54': (<ModuleInstanceStatus.COMPLETED: 'COMPLETED'>,
  'prepare_protein',
  1)}

In [None]:
# we can download our outputs
ligand_sdf.download(filename="01_prepped_ligand.sdf", overwrite=True)

print(f"{datetime.now().time()} | Downloaded prepped ligand!")

2024-03-21 18:48:40,035 - rush - INFO - Argument e9593046-116f-4504-8d94-cee03515cca0 is now ModuleInstanceStatus.RESOLVING
2024-03-21 18:48:41,134 - rush - INFO - Argument e9593046-116f-4504-8d94-cee03515cca0 is now ModuleInstanceStatus.ADMITTED
2024-03-21 18:48:56,944 - rush - INFO - Argument e9593046-116f-4504-8d94-cee03515cca0 is now ModuleInstanceStatus.DISPATCHED
2024-03-21 18:49:02,460 - rush - INFO - Argument e9593046-116f-4504-8d94-cee03515cca0 is now ModuleInstanceStatus.RUNNING
2024-03-21 19:01:14,606 - rush - INFO - Argument e9593046-116f-4504-8d94-cee03515cca0 is now ModuleInstanceStatus.AWAITING_UPLOAD
19:01:48.602422 | Downloaded prepped ligand!


In [None]:
# we can read our outputs
with open(client.workspace / "objects" / "01_prepped_ligand.sdf", "r") as f:
    print(f.readline(), f.readline(), "...")

c1nc(c2c(n1)n(cn2)[C@H]3[C@@H]([C@@H]([C@H](O3)CO[P@@](=O)(O)O[P@](=O)(O)OP(=O)(O)O)O)O)N
      RDKit          3D
 ...


## 1.2) Run GROMACS (module: gmx)
Next we will run a molecular dynamics simulation on our protein and ligand, using GROMACS (gmx).

In [None]:
help(client.gmx)

Help on function gmx in module rush.provider:

gmx(*args: *tuple[Optional[RushObject[Record]], Optional[RushObject[bytes]], Optional[RushObject[bytes]], Record], target: 'Target | None' = None, resources: 'Resources | None' = None, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes], RushObject[bytes]]
    Runs a molecular dynamics simluation using GROMACS from either protein, ligand pdbs or conformers as inputs.
    Uses GMX 2023.3 https://doi.org/10.5281/zenodo.10017686 and Acpype https://doi.org/10.1186/1756-0500-5-367

    Module version:
    `github:talo/tengu-gmx/04cff2931b995c33263dfdb477d7f09c8bbd75a7#gmx_tengu`

    QDX Type Description:

        conformer: Object[Conformer];
        protein: Object[@$Bytes]?;
        ligand: Object[@$Bytes]?;
        gmx_config: GMXTenguConfig {
            ignore_hydrogens: bool?,
            ligand_charge: i8

In [None]:
import json
import qdx_py

PREPARED_LIGAND_PATH = client.workspace / "prepared_ligand.pdb"
with open(ligand_qdxf.download(overwrite=True)) as f:
    ligand_qdxf_out = json.load(f)
prepared_ligand_str = qdx_py.conformer_to_pdb(
    json.dumps(ligand_qdxf_out[0])
).replace("LIG", "UNL")
with open(PREPARED_LIGAND_PATH, "w") as f:
    f.write(prepared_ligand_str)

In [None]:
gmx_config = {
    "params_overrides": {
        "em": {"nsteps": 5000},
        "nvt": {"nsteps": 2000},
        "npt": {"nsteps": 2000},
        "md": {"nsteps": 5000},
        "ions": {},
    },
    "num_gpus": 0,
    "num_replicas": 1,
    "ligand_charge": None,
    "save_wets": False,
    "frame_sel": {
        "start_time_ps": 0,
        "end_time_ps": 10,
        "delta_time_ps": 1,
    },
}
# we pass the outputs from our prior runs directly, instead of their values,
# to prevent them from being re-uploaded
_, _, gmx_static_outputs, _, gmx_dry_frames, _, _ = client.gmx(
    None,
    prepared_protein_pdb,
    PREPARED_LIGAND_PATH,
    gmx_config,
    resources={
        "gpus": 1,
        "storage": 1,
        "storage_units": "GB",
    },
)
print(f"{datetime.now().time()} | Running GROMACS simulation!")

19:01:50.286922 | Running GROMACS simulation!


In [None]:
# we can check the status again
client.status()

{'07051549-ee09-4430-956e-927927b95173': (<ModuleInstanceStatus.RESOLVING: 'RESOLVING'>,
  'gmx',
  1),
 '651bbdba-54ac-4d83-bb3a-f82a4655cdfe': (<ModuleInstanceStatus.COMPLETED: 'COMPLETED'>,
  'auto3d',
  1),
 '14519ca4-3e02-444d-86f7-16fdc68a3f54': (<ModuleInstanceStatus.COMPLETED: 'COMPLETED'>,
  'prepare_protein',
  1)}

In [None]:
print("Fetching gmx results")

gmx_static_outputs.download(
    filename="02_gmx_static_outputs.tar.gz", overwrite=True
)
gmx_dry_frames.download(filename="02_gmx_dry_frames.tar.gz", overwrite=True)

print(f"{datetime.now().time()} | Downloaded GROMACS output!")

Fetching gmx results
2024-03-21 19:01:50,486 - rush - INFO - Argument 0f63a5cb-69b7-498e-ba69-6715964d61c9 is now ModuleInstanceStatus.RESOLVING
2024-03-21 19:01:54,902 - rush - INFO - Argument 0f63a5cb-69b7-498e-ba69-6715964d61c9 is now ModuleInstanceStatus.ADMITTED
2024-03-21 19:02:11,131 - rush - INFO - Argument 0f63a5cb-69b7-498e-ba69-6715964d61c9 is now ModuleInstanceStatus.DISPATCHED
2024-03-21 19:02:16,962 - rush - INFO - Argument 0f63a5cb-69b7-498e-ba69-6715964d61c9 is now ModuleInstanceStatus.RUNNING
2024-03-21 19:06:59,319 - rush - INFO - Argument 0f63a5cb-69b7-498e-ba69-6715964d61c9 is now ModuleInstanceStatus.AWAITING_UPLOAD
19:07:50.529147 | Downloaded GROMACS output!


In [None]:
# Extract the "dry" (i.e. non-solvated) pdb frames we asked for
with tarfile.open(
    client.workspace / "objects" / "02_gmx_dry_frames.tar.gz", "r"
) as tf:
    selected_frame_pdbs = [
        tf.extractfile(member).read()
        for member in tf
        if "pdb" in member.name and member.isfile()
    ]
    for i, frame in enumerate(selected_frame_pdbs):
        with open(
            client.workspace / "objects" / f"02_gmx_output_frame_{i}.pdb", "w"
        ) as pf:
            print(frame.decode("utf-8"), file=pf)

In [None]:
# Extract the ligand.gro file
with tarfile.open(
    client.workspace / "objects" / "02_gmx_static_outputs.tar.gz", "r"
) as tf:
    gro = [
        tf.extractfile(member).read()
        for member in tf
        if "md.ligand_in.0.gro" in member.name
    ][0]
    with open(client.workspace / "objects" / "md.ligand_in.0.gro", "w") as pf:
        print(gro.decode("utf-8"), file=pf)

In [None]:
(client.workspace / "objects" / "md.ligand_in.0.gro").read_text()[0:50]

'Protein in water\n47824\n    1MET      N    1   4.80'
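
The first 50 characters shown above follow the GROMACS `.gro` layout: a title line, then a line holding the number of atoms, then one line per atom. A minimal sketch for reading those two header lines is given below. `read_gro_header` is a hypothetical helper, and the sample string is the snippet printed above.

```python
def read_gro_header(text):
    """Return (title, atom_count) from the first two lines of a .gro file."""
    lines = text.splitlines()
    title = lines[0].strip()
    atom_count = int(lines[1])
    return title, atom_count


sample = "Protein in water\n47824\n    1MET      N    1   4.80"
print(read_gro_header(sample))  # ('Protein in water', 47824)
```

Checking the atom count is a quick sanity test that the extracted file matches the system you expected to simulate.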