# rush-py
> Python SDK for the Rush computational chemistry workflow management system 

# Install

Install the package from the command line (Python ≥ 3.11 is required):

``` bash
pip install rush-py
```

# Constructing a Minimal Workflow in rex
### From Molecular Input to Binding Affinity Prediction

This section provides a minimal example of how to define and execute a rex workflow in Rush. The workflow applies three modules in sequence (auto3d, p2rank, and gnina) to process molecular data and predict binding affinity. Each module plays a specific role:

1. Molecule Preparation (auto3d) – Converts a SMILES string into a 3D molecular structure.
2. Binding Site Prediction (p2rank) – Identifies a binding pocket on a given protein structure.
3. Molecular Docking (gnina) – Uses the binding pocket and small molecule structure to predict the binding affinity.
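
The data flow between the three stages can be sketched in plain Python. This is only an illustration of the ordering described above: `run_pipeline` and the `fake_*` functions are hypothetical placeholders, not part of rush-py, and the values they return are made up so that the sketch runs offline.

```python
from typing import Any, Callable


def run_pipeline(
    smiles: str,
    protein: Any,
    prepare_molecule: Callable[[str], Any],
    predict_pocket: Callable[[Any], Any],
    dock: Callable[[Any, Any, Any], Any],
) -> Any:
    """Chain the three stages in the order listed above."""
    ligand_3d = prepare_molecule(smiles)      # stage 1: SMILES -> 3D structure
    pocket = predict_pocket(protein)          # stage 2: protein -> binding pocket
    return dock(protein, pocket, ligand_3d)   # stage 3: dock ligand into pocket


# Placeholder stages (assumptions for illustration, NOT the Rush modules).
calls: list[str] = []


def fake_auto3d(smiles: str) -> dict:
    calls.append("auto3d")
    return {"smiles": smiles, "has_3d_coordinates": True}


def fake_p2rank(protein: Any) -> dict:
    calls.append("p2rank")
    return {"pocket_for": protein}


def fake_gnina(protein: Any, pocket: dict, ligand: dict) -> dict:
    calls.append("gnina")
    return {"affinity": -7.1, "ligand": ligand["smiles"]}


result = run_pipeline("CCO", "example-protein", fake_auto3d, fake_p2rank, fake_gnina)
print(calls, result)
```

The ligand preparation and the pocket prediction do not depend on each other; only the docking stage needs both results.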

The following cells set up a client and define the minimal rex script required to compute and benchmark binding affinity:

```python
# Notebook setup
import os
import pathlib
import shutil

WORK_DIR = pathlib.Path("~/qdx/benchmark_notebook").expanduser()
# Start from a clean working directory so that runs are deterministic
if WORK_DIR.exists():
    shutil.rmtree(WORK_DIR)
os.makedirs(WORK_DIR, exist_ok=True)
os.chdir(WORK_DIR)
PUT_YOUR_TOKEN_HERE = os.environ["RUSH_TOKEN"]
PUT_YOUR_PREFERRED_WORKING_DIRECTORY_HERE = WORK_DIR
os.environ["RUSH_RESTORE_BY_DEFAULT"] = "False"
```

```python
from rush import build_blocking_provider_with_functions

# Pass the token read from the environment instead of hard-coding a credential
client = build_blocking_provider_with_functions(
    access_token=PUT_YOUR_TOKEN_HERE,
    url="https://tengu-server-staging-edh4uref5a-as.a.run.app/",
)
benchmark = client.benchmark_blocking(name="OpenFF CDK2 RMSD17 Benchmark")
```


```haskell
let
    runspec = RunSpec {
        target = 'Bullet',
        resources = Resources {
            storage = some 10,
            storage_units = some "MB",
            gpus = some 1
        }
    },

    runspec_nogpu = RunSpec {
        target = 'Bullet',
        resources = Resources {
            storage = some 10,
            storage_units = some "MB",
            gpus = none
        }
    },

    auto3d = \smi ->
        let
            result = get 0 (auto3d_rex_s runspec { k = 1 } [smi]),
            make_virtual_object = \index ->
                VirtualObject {
                    path = get "path" (get index result),
                    size = get "size" (get index result),
                    format = "json"
                }
        in
            (make_virtual_object 0, make_virtual_object 1),

    p2rank = \prot_conf ->  p2rank_rex_s runspec_nogpu {} prot_conf,

    gnina = \prot_conf -> \bounding_box -> \smol_conf ->
        get 0 (get 0 (gnina_rex_s runspec {} [prot_conf] [bounding_box] smol_conf [])),

in
\input ->
    let
        protein = load (id (get 0 input)) 'ProteinConformer',
        smol_id = id (get 1 input),
        smiles = smi (load smol_id 'Smol'),

        structure = load (structure_id protein) 'Structure',
        trc = [
            topology structure,
            residues structure,
            chains structure
        ],

        bounding_box = get 0 (get 0 (p2rank trc)),

        smol_structure = auto3d smiles,

        docked_structure = gnina trc bounding_box [smol_structure],

        min_affinity = list_min (map (get "affinity") (get "scores" docked_structure)),

        binding_affinity = BindingAffinity {
            affinity = min_affinity,
            affinity_metric = 'kcal/mol',
            protein_id = protein_id protein,
            smol_id = smol_id,
            metadata = Metadata {
                name = 'binding affinity for smol:' + smol_id + ' and protein ' + (protein_id protein),
                description = none,
                tags = []
            }
        }
    in
        [BenchmarkArg {
            entity = "BindingAffinity",
            id = save binding_affinity
        }]
```
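
The `min_affinity` line in the rex script above picks the lowest affinity among the docking scores; for docking scores in kcal/mol, a more negative value means stronger predicted binding. A minimal Python sketch of the same selection, assuming a docking result shaped like the one the script reads (a `"scores"` list whose entries carry an `"affinity"` field). The sample values are made up for illustration.

```python
def min_affinity(docked_structure: dict) -> float:
    """Python equivalent of: list_min (map (get "affinity") (get "scores" docked_structure))."""
    affinities = [pose["affinity"] for pose in docked_structure["scores"]]
    if not affinities:
        raise ValueError("docking result contains no scored poses")
    return min(affinities)


# Hypothetical docking result with three scored poses (illustrative values only).
example_docked_structure = {
    "scores": [
        {"affinity": -6.2},
        {"affinity": -8.4},
        {"affinity": -7.9},
    ]
}

best = min_affinity(example_docked_structure)
print(best)
```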

```python
# rex_code_above holds the rex script shown above as a Python string
submission = client.run_benchmark(
    benchmark.id,
    rex_code_above,
    "simple submission",
    sample=0.2,
)
```

# Module: auto3d

`auto3d` generates a **3D molecular structure** from a **SMILES** string. It is based on [Auto3D](https://auto3d.readthedocs.io/en/latest/usage.html).

Parameters
----------
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `smi` | `String` | `None` | _(Required)_ The **SMILES** representation of the molecule. |
| `k` | `Integer` | `1` | _(Optional)_ Output top `k` structures for each molecule. |
| `window` | `Float` | `None` | _(Optional)_ Outputs structures whose energies are within `window` kcal/mol of the lowest-energy conformer. |
| `memory` | `Integer` | `None` | _(Optional)_ Memory in GB. |
| `capacity` | `Integer` | `42` | _(Optional)_ Number of SMILES the model handles per 1GB of memory. |
| `enumerate_tautomer` | `Boolean` | `False` | _(Optional)_ When `True`, enumerates tautomers for the input. |
| `max_confs` | `Integer` | `None` | _(Optional)_ Maximum number of isomers per SMILES. Defaults to a dynamic value (`heavy_atoms - 1`). |
| `enumerate_isomer` | `Boolean` | `True` | _(Optional)_ When `True`, cis/trans and R/S isomers are enumerated. |
| `mpi_np` | `Integer` | `4` | _(Optional)_ Number of MPI processes. |
| `optimizing_engine` | `{ANI2x \| ANI2xt \| AIMNET}` | `AIMNET` | _(Optional)_ The engine used for optimization. |
| `opt_steps` | `Integer` | `5000` | _(Optional)_ Maximum number of optimization steps. |
| `convergence_threshold` | `Float` | `0.003` | _(Optional)_ Optimization is considered converged if maximum force is below this threshold. |
| `patience` | `Integer` | `1000` | _(Optional)_ If force does not decrease for `patience` steps, conformer drops out of optimization loop. |
| `threshold` | `Float` | `0.3` | _(Optional)_ If RMSD between two conformers is within this threshold, one is removed as a duplicate. |
| `verbose` | `Boolean` | `False` | _(Optional)_ When `True`, saves all metadata while running. |
| `job_name` | `String` | `None` | _(Optional)_ Custom job name. |
| `batchsize_atoms` | `Integer` | `1024` | _(Optional)_ Number of atoms in one optimization batch per 1GB memory. |


Returns
-------
| Output | Type | Description |
| --- | --- | --- |
| `smol_structure` | `SmolStructure` | A **3D molecular structure** generated from the input **SMILES** string. In the rex workflow above, the result is wrapped as `VirtualObject` references (`path`, `size`, `format = "json"`); the exact return type is still to be documented. |





# Module: p2rank

`p2rank` identifies **binding sites** on a given **protein structure**.  
It predicts pockets based on **machine learning models** trained on structural features. This function is based on:  
[Identifying ligand-binding sites using machine learning](https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0285-8).

Parameters
----------
None of the parameters of `p2rank` are exposed at this point.


Returns
-------
| Output | Type | Description |
| --- | --- | --- |
| `bounding_box` | `BoundingBox` | A predicted binding site on the protein structure. |
