# rush-py

> Python SDK for the QDX Quantum Chemistry workflow management system

# Quickstart
This document walks through executing jobs on the Rush platform. For a comprehensive guide to the concepts and to constructing a full workflow, see the [full rush-py explainer](./Tutorials/full-rush-py-explainer.ipynb).

First, install the following modules via pip (we require Python > 3.10):
```
pip install rush-py pdb-tools
```

# 0) Setup
This is where we prepare the rush client, directories, and input data we'll be working with

## 0.0) Imports

In [None]:
import os
import tarfile
from datetime import datetime
from pathlib import Path

from pdbtools import pdb_fetch, pdb_delhetatm, pdb_selchain, pdb_rplresname, pdb_keepcoord, pdb_selresname
import requests
import py3Dmol

import rush

## 0.1) Credentials

In [None]:
# Set our token: export RUSH_TOKEN in your shell, or replace the os.getenv call with your token
TOKEN = os.getenv("RUSH_TOKEN")
# You might have a custom deployment URL; by default https://tengu.qdx.ai is used
URL = os.getenv("RUSH_URL") or "https://tengu.qdx.ai"
# These env variables are read by default, so you can skip this step in the future
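
Because `os.getenv` returns `None` when a variable is unset, a missing token may only surface later, when the client is first used. Below is a minimal sketch of failing early instead. The `resolve_credentials` helper is hypothetical and not part of rush-py; it only mirrors the two lookups above.

```python
import os

DEFAULT_URL = "https://tengu.qdx.ai"  # default deployment URL named above


def resolve_credentials(env=os.environ):
    """Return (token, url), raising early if the token is missing.

    Hypothetical helper for illustration; not part of rush-py.
    """
    token = env.get("RUSH_TOKEN")
    if not token:
        raise RuntimeError("RUSH_TOKEN is not set; export it in your shell first")
    # Fall back to the default URL when RUSH_URL is unset or empty
    url = env.get("RUSH_URL") or DEFAULT_URL
    return token, url
```

Passing a plain dict as `env` makes the helper easy to check without touching your real environment.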

## 0.2) Configuration
Let's set some global variables that define our project. These are not required, but they are good practice for organizing the jobs that will be persisted under your account.

In [None]:
# Make sure you create a unique set of tags for each run.
# Good practice is to have at least each of the experiment name and system name as a tag.
EXPERIMENT = "tengu-py-v2-quickstart"
SYSTEM = "cdk2"
TAGS = ["qdx", EXPERIMENT, SYSTEM]
# Set our inputs
WORK_DIR = Path.home() / "qdx" / EXPERIMENT
PROTEIN_PDB_PATH = WORK_DIR / "test_P.pdb"
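
One way to follow the advice about unique tags is to append a per-run identifier. The sketch below is an assumption about how you might organize tags, not a rush-py convention, and `make_run_tags` is a hypothetical helper.

```python
from datetime import datetime, timezone


def make_run_tags(experiment, system, now=None):
    """Build a tag list with a UTC timestamp tag so each run is distinguishable."""
    now = now or datetime.now(timezone.utc)
    run_id = now.strftime("run-%Y%m%dT%H%M%SZ")
    return ["qdx", experiment, system, run_id]


print(make_run_tags("tengu-py-v2-quickstart", "cdk2"))
```

A timestamp tag makes every run distinct, so leave it out when you want a later run to carry the same tags as an earlier one.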

In [None]:
# |hide
if WORK_DIR.exists():
    client = rush.Provider(workspace=WORK_DIR)
    await client.nuke(remote=True)

Ensure your workdir exists

In [None]:
# exist_ok=True avoids a FileExistsError if the directory is already there
os.makedirs(WORK_DIR, exist_ok=True)

## 0.3) Build your client
Get our client, for calling modules and using the Rush API.

In [None]:
# Note: access_token and url are optional if you have set the env variables RUSH_TOKEN and RUSH_URL
# workspace sets the location where we will store our session history file and module lock file
# Using the `build_provider_with_functions` method also builds helper functions for calling each module
client = await rush.build_provider_with_functions(
    access_token=TOKEN, url=URL, workspace=WORK_DIR, batch_tags=TAGS
)

## 0.4) Input selection
Fetch data files from RCSB to pass as input to the modules

In [None]:
# Fetch the 1B39 structure, keep only chain A, and drop HETATM records
structure = list(pdb_fetch.fetch_structure("1B39"))
protein = pdb_delhetatm.remove_hetatm(pdb_selchain.select_chain(structure, "A"))
with open(PROTEIN_PDB_PATH, "w") as f:
    for line in protein:
        f.write(str(line))
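
Before sending the file to a module, it can help to confirm that the filtering did what we expect: only chain A remains and no HETATM records are left. The sketch below relies only on the fixed-column PDB format (record name in columns 1-6, chain ID in column 22). `summarize_pdb_lines` is a hypothetical helper, and the sample lines are made up for illustration.

```python
from collections import Counter


def summarize_pdb_lines(lines):
    """Count record types and collect chain IDs from ATOM/HETATM records."""
    records = Counter()
    chains = set()
    for line in lines:
        record = line[:6].strip()
        if not record:
            continue
        records[record] += 1
        if record in ("ATOM", "HETATM") and len(line) > 21:
            chains.add(line[21])  # chain ID is column 22 (index 21)
    return records, chains


# Made-up sample records, not taken from 1B39
sample = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n",
    "END\n",
]
records, chains = summarize_pdb_lines(sample)
print(records["ATOM"], records["HETATM"], chains)
```

For the real file, an open file handle on `PROTEIN_PDB_PATH` can be passed as `lines`.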

In [None]:
help(client.convert)

Help on function convert in module rush.provider:

async convert(*args: [list[typing.Union[str, ~T]], <class 'pathlib.Path'>], target: rush.graphql_client.enums.ModuleInstanceTarget | None = <ModuleInstanceTarget.NIX: 'NIX'>, resources: rush.graphql_client.input_types.ModuleInstanceResourcesInput | None = ModuleInstanceResourcesInput(gpus=0, gpu_mem=None, gpu_mem_units=None, cpus=None, nodes=None, mem=None, mem_units=None, storage=10, storage_units=<MemUnits.MB: 'MB'>, walltime=None, storage_mounts=None), tags: list[str] | None = None, restore: bool | None = None) -> [<class 'pathlib.Path'>]
    Convert biomolecular and chemical file formats to the QDX file format. Supports PDB and SDF
    
    Module version: github:talo/tengu-prelude/efc6d8b3a8cc342cd9866d037abb77dac40a4d56#convert
    
    QDX Type Description:
    
        format: PDB|SDF;
    
        input: @bytes 
    
    ->
    
        output: @[Conformer]
    
    
    
    :param format: the format of the input file
    :pa

# 1) Running Rush Modules
You can view which modules are available, alongside their documentation, in the [API Documentation](./api/index.html).

## 1.1) Prep the protein
First we will run the protein preparation routine (using pdbfixer internally) to prepare the protein for molecular dynamics

In [None]:
# we can check the arguments and outputs for prepare_protein with help()
help(client.prepare_protein)

Help on function prepare_protein in module rush.provider:

async prepare_protein(*args: [<class 'pathlib.Path'>], target: rush.graphql_client.enums.ModuleInstanceTarget | None = <ModuleInstanceTarget.NIX_SSH_2: 'NIX_SSH_2'>, resources: rush.graphql_client.input_types.ModuleInstanceResourcesInput | None = ModuleInstanceResourcesInput(gpus=1, gpu_mem=None, gpu_mem_units=None, cpus=None, nodes=None, mem=None, mem_units=None, storage=138, storage_units=<MemUnits.MB: 'MB'>, walltime=None, storage_mounts=None), tags: list[str] | None = None, restore: bool | None = None) -> [<class 'pathlib.Path'>, <class 'pathlib.Path'>]
    Prepare a PDB for downstream tasks: protonate, fill missing atoms, etc.
    
    Module version: github:talo/pdb2pqr/ff5abe87af13f31478ede490d37468a536621e9c#prepare_protein_tengu
    
    QDX Type Description:
    
        input_pdb: @bytes 
    
    ->
    
        output_qdxf: @[Conformer];
    
        output_pdb: @bytes
    
    
    
    :param input_pdb: An input 

In [None]:
# Here we run the function; it returns a Provider.Arg for each output, which you can use to fetch the results
# Setting restore=True lets us restore a previous run to the same path with the same tags; it is not set in this call
(prepared_protein_qdxf, prepared_protein_pdb) = await client.prepare_protein(
    PROTEIN_PDB_PATH
)
print(f"{datetime.now().time()} | Running protein prep!")
prepared_protein_qdxf  # initially this only has the id of your result; we will show how to fetch the actual value later

17:52:21.654575 | Running protein prep!


Arg(id=bd64bd44-118f-435b-be7a-64d25f76c5dc, value=None)

## 1.3) Run statuses
This will show the status of all of your runs

In [None]:
await client.status()

{'687a9367-9866-4999-9586-7480ba581b54': (<ModuleInstanceStatus.RESOLVING: 'RESOLVING'>,
  'prepare_protein',
  1)}
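
The status mapping above pairs each run id with a tuple whose first element is the run's status. To wait for a run, you can poll until that status leaves the states you consider in-progress. The sketch below uses a stand-in `fake_status` coroutine instead of `client.status()` so that it runs offline. The state names other than `RESOLVING` (including `COMPLETED`) are assumptions, so check the API documentation for the real values.

```python
import asyncio

# Assumed in-progress state names; only RESOLVING is shown in this document
IN_PROGRESS = {"RESOLVING", "QUEUED", "RUNNING"}


async def wait_until_settled(fetch_status, interval=0.01, max_polls=100):
    """Poll fetch_status() until no run reports an in-progress state."""
    for _ in range(max_polls):
        statuses = await fetch_status()
        # Each value is a tuple whose first element is the status
        if all(entry[0] not in IN_PROGRESS for entry in statuses.values()):
            return statuses
        await asyncio.sleep(interval)
    raise TimeoutError("runs did not settle within the polling budget")


async def demo():
    # Stand-in for client.status(): reports RESOLVING twice, then a hypothetical final state
    replies = iter(["RESOLVING", "RESOLVING", "COMPLETED"])

    async def fake_status():
        return {"run-1": (next(replies), "prepare_protein", 1)}

    return await wait_until_settled(fake_status)


result = asyncio.run(demo())
print(result)
```

With the real client, the first tuple element is a `ModuleInstanceStatus` enum rather than a string, so the comparison would need adapting, for example by comparing its `.value`. Inside a notebook you would `await` the coroutine directly instead of calling `asyncio.run`.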

## 1.4) Run Values
This will return the "value" of the output from the function. For files you will receive a URL that you can download; otherwise you will receive the values as Python types.

In [None]:
protein_qdxf_value = await prepared_protein_qdxf.get()
len(protein_qdxf_value[0]["topology"]["symbols"])

4852
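
Since the fetched value is plain Python data, ordinary tools apply to it. As a sketch, the element composition can be tallied from the `symbols` list. The tiny conformer below is invented for illustration and only assumes the `[0]["topology"]["symbols"]` shape used above.

```python
from collections import Counter

# Made-up stand-in for the fetched value: a list of conformers,
# each with a topology holding one element symbol per atom
example_value = [{"topology": {"symbols": ["N", "C", "C", "O", "H", "H", "H"]}}]

symbols = example_value[0]["topology"]["symbols"]
composition = Counter(symbols)
print(len(symbols), composition.most_common())
```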

## 1.5) Downloads
We provide a utility to download files into your workspace. You can either provide a filename, in which case the file is saved to `workspace/objects/[filename]`, or provide your own filepath, which the client will use as-is.

In [None]:
try:
    await prepared_protein_pdb.download(filename="01_prepared_protein.pdb")
except FileExistsError:
    # an error is raised if you try to overwrite an existing file; you can force an overwrite
    # by passing an absolute filepath instead
    pass
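
The two ways of naming a download target can be summarized in a few lines. This is only an illustration of the rule described above, not the client's actual implementation; `resolve_download_path` and its parameters are hypothetical.

```python
from pathlib import Path


def resolve_download_path(workspace, filename=None, filepath=None):
    """Mirror the documented rule: filename -> workspace/objects/filename, filepath -> as-is."""
    if (filename is None) == (filepath is None):
        raise ValueError("provide exactly one of filename or filepath")
    if filepath is not None:
        return Path(filepath)
    return Path(workspace) / "objects" / filename


print(resolve_download_path("/tmp/ws", filename="01_prepared_protein.pdb"))
```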

In [None]:
# we can read our prepared protein pdb like this
with open(client.workspace / "objects" / "01_prepared_protein.pdb", "r") as f:
    print(f.readline(), "...")

REMARK   1 PDBFIXER FROM: /home/ubuntu/.cache/tengu_store/run/687a9367-9866-4999-9586-7480ba581b54/.tmp/m2_protein.pdb
 ...
