# rush-py

> Python SDK for the QDX Rush quantum computational chemistry workflow management system

# Quickstart
This document will walk through executing jobs on the Rush platform. For a comprehensive guide on the concepts and constructing a full workflow, see the [full rush-py explainer](https://talo.github.io/rush-py/full-rush-py-explainer.html) document.

First, install the following packages with pip (Python ≥ 3.9 is required):
```
pip install rush-py pdb-tools
```

# 0) Code Sample
The sample below runs the quickstart end to end; each step is broken down in the sections that follow.

*_NOTE_*: This assumes that you are running the code in a Jupyter notebook, which allows top-level `await` calls. If you are writing a normal Python script, you will need to wrap your code in something like the following:
``` python
import asyncio

async def main():
    ...  # your code here

asyncio.run(main())
```

In [None]:
# Get a PDB file to work with. We use the pdb-tools CLI here,
# but you can also download it directly from rcsb.org
!pdb_fetch '1b39' | pdb_selchain -A | pdb_delhetatm > '1B39_A_nohet.pdb'

In [None]:
# ...import the dependencies and set your configuration
import os
from pathlib import Path
import rush
import asyncio

RUSH_TOKEN = os.getenv("RUSH_TOKEN") or "YOUR_TOKEN_HERE"

# 1.3 Build your client
client = await rush.build_provider_with_functions(access_token=RUSH_TOKEN)

# 2.0 Prepare the protein
prepared_protein_qdxf, prepared_protein_pdb = await client.prepare_protein(
    Path("1B39_A_nohet.pdb")
)

# 2.2 Return run values
protein_qdxf_value = await prepared_protein_qdxf.get()

2024-02-29 12:13:37,765 - rush - INFO - Argument a940d387-80be-4b30-8deb-b8039a19f959 is now ModuleInstanceStatus.RESOLVING
2024-02-29 12:13:40,000 - rush - INFO - Argument a940d387-80be-4b30-8deb-b8039a19f959 is now ModuleInstanceStatus.ADMITTED
2024-02-29 12:13:53,525 - rush - INFO - Argument a940d387-80be-4b30-8deb-b8039a19f959 is now ModuleInstanceStatus.DISPATCHED
2024-02-29 12:14:00,314 - rush - INFO - Argument a940d387-80be-4b30-8deb-b8039a19f959 is now ModuleInstanceStatus.AWAITING_UPLOAD


# 1) Setup
This is where we prepare the rush client, directories, and input data we'll be working with.

## 1.0) Imports

In [None]:
import json
import os
import tarfile
from datetime import datetime
from pathlib import Path

import py3Dmol
import requests
from pdbtools import pdb_delhetatm, pdb_fetch, pdb_selchain

import rush

## 1.1) Credentials
Retrieve your API token from the [Rush UI](https://rush.qdx.co/dashboard/settings).

You can either set the `RUSH_URL` and `RUSH_TOKEN` environment variables or pass the values to the client directly as arguments.

For how to set environment variables, see the extensive [Wikipedia](https://en.wikipedia.org/wiki/Environment_variable) article.

In [None]:
RUSH_URL = os.getenv("RUSH_URL") or "https://tengu.qdx.ai"
RUSH_TOKEN = os.getenv("RUSH_TOKEN") or "YOUR_TOKEN_HERE"
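
One way to avoid silently sending the `YOUR_TOKEN_HERE` placeholder is to fail fast when no token is configured. The sketch below is a minimal illustration, assuming the same `RUSH_URL` and `RUSH_TOKEN` variable names and the default URL used above; `load_rush_config` is a hypothetical helper, not part of rush-py.

```python
import os
from typing import Mapping, Optional

DEFAULT_RUSH_URL = "https://tengu.qdx.ai"  # default taken from the cell above


def load_rush_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the Rush URL and token, failing if no token is set."""
    env = os.environ if env is None else env
    token = env.get("RUSH_TOKEN", "").strip()
    if not token or token == "YOUR_TOKEN_HERE":
        raise RuntimeError("Set the RUSH_TOKEN environment variable first")
    return {"url": env.get("RUSH_URL") or DEFAULT_RUSH_URL, "access_token": token}


# Example with an explicit mapping instead of the real environment
config = load_rush_config({"RUSH_TOKEN": "example-token"})
print(config["url"])
```

The dictionary uses the same keyword names (`url`, `access_token`) that the client builder is called with in section 1.3.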

## 1.2) Configuration
Let's set some global variables that define our project. These are not required, but they are good practice for organizing the jobs that are persisted under your account.

Make sure you create a unique set of tags for each run.
It is good practice to include at least the experiment name and the system name as tags.

In [None]:
EXPERIMENT = "rush-py-quickstart"
SYSTEM = "1B39"
TAGS = ["qdx", EXPERIMENT, SYSTEM]
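
Since tags are what distinguish one run from another, one option is to append a per-run label, such as a UTC timestamp, to the base tags. The sketch below is only an illustration; `make_run_tags` and the timestamp format are assumptions, not a rush-py convention.

```python
from datetime import datetime, timezone
from typing import Optional


def make_run_tags(experiment: str, system: str, run_label: Optional[str] = None) -> list:
    """Hypothetical helper: base tags plus a label that is unique per run."""
    if run_label is None:
        # Default to a UTC timestamp, e.g. "20240229T041606Z"
        run_label = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return ["qdx", experiment, system, run_label]


tags = make_run_tags("rush-py-quickstart", "1B39", run_label="run-001")
print(tags)  # ['qdx', 'rush-py-quickstart', '1B39', 'run-001']
```

Keep in mind that a fresh label on every run changes the tag set, so pass a fixed `run_label` when you want to refer back to an earlier run by its tags.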

In [None]:
# |hide
WORK_DIR = Path.home() / "qdx" / EXPERIMENT

if WORK_DIR.exists():
    client = rush.Provider(workspace=WORK_DIR)
    await client.nuke(remote=True)

os.makedirs(WORK_DIR, exist_ok=True)
import sys

os.chdir(WORK_DIR)

## 1.3) Build your client
Get our client, which we'll use for calling modules and generally for using the Rush API.

As mentioned earlier, `url` and `access_token` are optional if you have set the env variables `RUSH_URL` and `RUSH_TOKEN` respectively.

`batch_tags` will be applied to each run that is spawned by this client.

A folder called `.rush` will be created in your workspace directory (defaults to the current working directory, can be overridden by passing `workspace=` to the provider builder).

In [None]:
# By using the `build_provider_with_functions` method,
# we will also build helper functions calling each module
client = await rush.build_provider_with_functions(
    url=RUSH_URL, access_token=RUSH_TOKEN, batch_tags=TAGS
)

In [None]:
# |hide
client = await rush.build_provider_with_functions(
    url=RUSH_URL,
    access_token=RUSH_TOKEN,
    batch_tags=TAGS,
    restore_by_default=True,
)

## 1.4) Input selection
Fetch data files from RCSB to pass as input to the modules:

In [None]:
PROTEIN_PDB_PATH = client.workspace / f"{SYSTEM}_P.pdb"

structure = list(pdb_fetch.fetch_structure(SYSTEM))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(structure, "A"))
with open(PROTEIN_PDB_PATH, "w") as f:
    for line in protein:
        f.write(str(line))
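
To make the two filtering steps concrete, here is a simplified pure-Python sketch of what selecting chain A and dropping `HETATM` records means at the level of PDB lines. It relies on the fixed-column PDB format (record name in columns 1-6, chain identifier in column 22) and is not how pdb-tools is implemented; `keep_protein_chain` is a hypothetical name and the sample lines are made up.

```python
def keep_protein_chain(lines, chain_id="A"):
    """Simplified sketch: keep ATOM/TER records of one chain, drop HETATM records."""
    kept = []
    for line in lines:
        record = line[:6].strip()  # record name occupies columns 1-6
        if record == "HETATM":
            continue  # ligands, waters, ions
        if record in ("ATOM", "TER") and line[21:22] != chain_id:
            continue  # coordinate record from another chain (column 22)
        kept.append(line)
    return kept


sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n",
    "ATOM      2  N   MET B   1      17.340  14.430   1.614  1.00  9.67           N\n",
    "HETATM    3  O   HOH A 301      10.000  10.000  10.000  1.00 20.00           O\n",
    "END\n",
]
print(len(keep_protein_chain(sample)))  # 2: the chain A atom and the END record
```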

In [None]:
help(client.convert)

Help on function convert in module rush.provider:

async convert(*args: *tuple[EnumValue, RushObject[bytes]], target: 'Target | None' = None, resources: 'Resources | None' = {'storage': 10, 'storage_units': 'MB', 'gpus': 0}, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[list[Record]]]
    Convert biomolecular and chemical file formats to the QDX file format. Supports PDB and SDF

    Module version:
    `github:talo/tengu-prelude/f506c7ead174cdb7e8d1725139254bb85c6b62f8#convert`

    QDX Type Description:

        format: Format[PDB | SDF];
        input: Object[@$Bytes]
        ->
        output: Object[[Conformer]]


    :param format: the format of the input file
    :param input: the input file
    :return output: the output conformers



# 2) Running Rush Modules
You can view which modules are available, alongside their documentation, in the [API Documentation](https://talo.github.io/rush-py/api/).

## 2.0) Prep the protein
First we will run the protein preparation routine (using pdbfixer and pdb2pqr internally) to prepare the protein for a molecular dynamics simulation.

In [None]:
# we can check the arguments and outputs for prepare_protein with help()
help(client.prepare_protein)

Help on function prepare_protein in module rush.provider:

async prepare_protein(*args: *tuple[RushObject[bytes]], target: 'Target | None' = None, resources: 'Resources | None' = {'storage': 138, 'storage_units': 'MB', 'gpus': 1}, tags: 'list[str] | None' = None, restore: 'bool | None' = None) -> tuple[RushObject[list[Record]], RushObject[bytes]]
    Prepare a PDB for downstream tasks: protonate, fill missing atoms, etc.

    Module version:
    `github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu`

    QDX Type Description:

        input_pdb: Object[@$Bytes]
        ->
        output_qdxf: Object[[Conformer]];
        output_pdb: Object[@$Bytes]


    :param input_pdb: An input protein as a file; one PDB file
    :return output_qdxf: An output protein a vec: one qdxf per model in pdb
    :return output_pdb: An output protein as a file: one PDB file



In [None]:
# Running the function returns a Provider.Arg for each output, which you can
# use to fetch the results.
# Passing restore=True (or building the client with restore_by_default=True)
# restores a previous run with the same module path and the same tags
prepared_protein_qdxf, prepared_protein_pdb = await client.prepare_protein(
    PROTEIN_PDB_PATH,
)
# This initially only has the id of your result; we will show how to fetch the
# actual value later
prepared_protein_qdxf

2024-02-29 12:14:44,224 - rush - INFO - Trying to restore job with tags: ['qdx', 'rush-py-quickstart', '1B39'] and path: github:talo/prepare_protein/947cdbc000031e192153a20a9b4a8fbb12279102#prepare_protein_tengu


Arg(id=2a211718-9b94-4fc8-bd35-0026a8abd6d4, value=None)

## 2.1) Run statuses
This will show the status of all of your runs. You can also view run statuses on the [Rush UI](https://rush.qdx.co/dashboard/jobs).

In [None]:
await client.status()

{'07909fb9-c568-4c37-85bb-ef67401444f3': (<ModuleInstanceStatus.RESOLVING: 'RESOLVING'>,
  'prepare_protein',
  1)}
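
The dictionary returned by `client.status()` can be summarized with ordinary Python. The sketch below assumes only the shape visible in the output above (a run ID mapped to a tuple whose first two items are a status and a module name) and uses a stand-in enum so that it runs without a Rush connection; `count_by_status` and `FakeStatus` are hypothetical names.

```python
from collections import Counter
from enum import Enum


class FakeStatus(Enum):
    """Stand-in for the status enum in the output above (not the rush-py class)."""
    RESOLVING = "RESOLVING"
    QUEUED = "QUEUED"


def count_by_status(statuses: dict) -> Counter:
    """Count runs per (module name, status name) from a status()-shaped dict."""
    counts = Counter()
    for status, module_name, *_rest in statuses.values():
        counts[(module_name, status.name)] += 1
    return counts


example = {
    "run-1": (FakeStatus.RESOLVING, "prepare_protein", 1),
    "run-2": (FakeStatus.QUEUED, "prepare_protein", 1),
    "run-3": (FakeStatus.RESOLVING, "convert", 1),
}
summary = count_by_status(example)
print(summary[("prepare_protein", "RESOLVING")])  # 1
```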

## 2.2) Run Values
This returns the "value" of the function's output. For files you will receive a URL that you can download; other outputs are returned as Python types:

In [None]:
protein_qdxf_info = await prepared_protein_qdxf.get()
protein_qdxf_info

2024-02-29 12:14:44,587 - rush - INFO - Argument 2a211718-9b94-4fc8-bd35-0026a8abd6d4 is now ModuleInstanceStatus.RESOLVING
2024-02-29 12:14:46,812 - rush - INFO - Argument 2a211718-9b94-4fc8-bd35-0026a8abd6d4 is now ModuleInstanceStatus.ADMITTED
2024-02-29 12:15:01,429 - rush - INFO - Argument 2a211718-9b94-4fc8-bd35-0026a8abd6d4 is now ModuleInstanceStatus.DISPATCHED
2024-02-29 12:15:09,328 - rush - INFO - Argument 2a211718-9b94-4fc8-bd35-0026a8abd6d4 is now ModuleInstanceStatus.QUEUED
2024-02-29 12:15:33,948 - rush - INFO - Argument 2a211718-9b94-4fc8-bd35-0026a8abd6d4 is now ModuleInstanceStatus.AWAITING_UPLOAD


'https://storage.googleapis.com/rush_store_default/50b50c06-776a-45b7-90d6-c4f9c82f6789?x-goog-signature=82a7783e39fa0fef7a7c3f6775adba197124cf59a00ab4a459184b9cf7580c0d5277c698c9db357ed32edb3d16a1ccfd92c14174c992f4fd45609a73a61b53330d91f4bdf652998a2f6c67386e3d257315260bda8327def894e99c1b65f747fc359a9ba5e27f5e90d55b95899d221de9226dc24fd31db6095e668911ea757d662670770f9151e88337a988262c491e73e129fa922ed5bb1c73b61419d09bbd96559292fe6df6c2d969e560b3d5a42bdf77425506ba699873aa90fff0086cc657ec0f6bbb623697534560276394a5c30c205f4211c806ec465f72a5154e5456994b68877103dec717839a5bd18253214fcd0ab9e36b010ef7a64db95d6b6f8114&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=qdx-store-user%40humming-bird-321603.iam.gserviceaccount.com%2F20240229%2Fasia-southeast1%2Fstorage%2Fgoog4_request&x-goog-date=20240229T041606Z&x-goog-expires=3600&x-goog-signedheaders=host'
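
The URL returned for a file is a signed Google Cloud Storage link, so it is only valid for a limited time. Its query string carries the signing time (`x-goog-date`) and the lifetime in seconds (`x-goog-expires`, 3600 in the output above). The sketch below reads those two parameters with the standard library; `signed_url_expiry` is a hypothetical helper, and the shortened example URL is made up apart from those two parameter values.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def signed_url_expiry(url: str) -> datetime:
    """Return the UTC time at which a V4-style signed URL expires."""
    params = parse_qs(urlsplit(url).query)
    signed_at = datetime.strptime(params["x-goog-date"][0], "%Y%m%dT%H%M%SZ")
    lifetime = timedelta(seconds=int(params["x-goog-expires"][0]))
    return signed_at.replace(tzinfo=timezone.utc) + lifetime


# Shortened, made-up URL that reuses the date and lifetime shown above
url = (
    "https://storage.googleapis.com/example-bucket/example-object"
    "?x-goog-date=20240229T041606Z&x-goog-expires=3600"
)
print(signed_url_expiry(url).isoformat())  # 2024-02-29T05:16:06+00:00
```

Parameter names are matched in lowercase, exactly as they appear in the output above.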

## 2.3) Downloads
We provide a utility to download files into your workspace. You can either provide a filename, in which case the file is saved to `workspace/objects/[filename]`, or provide your own filepath, which the client will use as-is:

In [None]:
protein_qdxf_file = await prepared_protein_qdxf.download(overwrite=True)
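
The two naming options can be pictured as a small path rule. The sketch below only illustrates the behavior described above and is not the client's actual code; `resolve_download_path` and its parameter names are hypothetical. The no-argument case used in the cell above is left out, because this document does not describe how the default name is chosen.

```python
from pathlib import Path
from typing import Optional


def resolve_download_path(
    workspace: Path, filename: Optional[str] = None, filepath: Optional[Path] = None
) -> Path:
    """Hypothetical illustration of where a downloaded file would end up."""
    if filepath is not None:
        return Path(filepath)  # an explicit path is used as-is
    if filename is None:
        raise ValueError("this sketch needs either a filename or a filepath")
    return workspace / "objects" / filename  # default location in the workspace


print(resolve_download_path(Path("workspace"), filename="01_prepared_protein.pdb"))
```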

In [None]:
# qdxf files can be loaded as json
with open(protein_qdxf_file) as f:
    protein_qdxf_data = json.load(f)[0]
protein_qdxf_data["amino_acid_seq"][:10]

['MET', 'GLU', 'ASN', 'PHE', 'GLN', 'LYS', 'VAL', 'GLU', 'LYS', 'ILE']
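
For a more compact view, the three-letter residue names in `amino_acid_seq` can be converted to the usual one-letter codes. The sketch below uses the standard 20-residue mapping and marks anything else as `X`; `to_one_letter` is a hypothetical helper, and the input list is copied from the output above.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(residues):
    """Convert three-letter residue names to a one-letter sequence string."""
    # Names outside the standard 20 (e.g. protonation variants) become "X"
    return "".join(THREE_TO_ONE.get(name.upper(), "X") for name in residues)


first_ten = ["MET", "GLU", "ASN", "PHE", "GLN", "LYS", "VAL", "GLU", "LYS", "ILE"]
print(to_one_letter(first_ten))  # MENFQKVEKI
```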

In [None]:
await prepared_protein_pdb.download(
    filename="01_prepared_protein.pdb", overwrite=True
)

PosixPath('/home/machineer/qdx/rush-py-quickstart/objects/01_prepared_protein.pdb')

In [None]:
# we can read our prepared protein pdb like this
with open(client.workspace / "objects" / "01_prepared_protein.pdb", "r") as f:
    print(f.readline(), "...")

REMARK   1 CREATED WITH OPENMM 8.0, 2024-02-29
 ...
