# Dubai: quantum-accurate bond inference and partial charge calculations

This notebook walks through quantum-accurate bond inference and Mulliken partial charge calculation for aspirin. It uses aspirin's SMILES string to download a 3D structure from PubChem.

## 0) Setup
See [Quickstart](../index.ipynb#imports) for more details on the setup.

## 0.0) Imports

In [1]:
import os
import json
from pathlib import Path

import requests
import rush

## 0.1) Credentials


In [2]:
TOKEN = os.getenv("RUSH_TOKEN")
# If you use a custom deployment, set RUSH_URL; otherwise the client
# defaults to https://tengu.qdx.ai
RUSH_URL = os.getenv("RUSH_URL")
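The cell above passes `None` to the client whenever a variable is unset. You may prefer to fail early when the token is missing and make the default URL explicit. Here is a minimal sketch of that. The helper name `resolve_rush_settings` is hypothetical, and the fallback URL is the default named in the comment above. We assume that default is still current.

```python
import os
from typing import Mapping, Optional, Tuple

# Default deployment URL as stated in the notebook comment (assumption: still current)
DEFAULT_RUSH_URL = "https://tengu.qdx.ai"


def resolve_rush_settings(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: read RUSH_TOKEN / RUSH_URL, failing early if the token is missing."""
    env = os.environ if env is None else env
    token = env.get("RUSH_TOKEN")
    if not token:
        raise RuntimeError("RUSH_TOKEN is not set; export it before building the client")
    # Fall back to the default deployment when RUSH_URL is unset or empty
    url = env.get("RUSH_URL") or DEFAULT_RUSH_URL
    return token, url


# Usage with an explicit mapping instead of the real environment
token, url = resolve_rush_settings({"RUSH_TOKEN": "example-token"})
print(url)  # https://tengu.qdx.ai
```

If you use this helper, pass the two returned values to the client as `access_token` and `url`.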

## 0.2) Configuration
Let's set some global variables that define our project.

In [3]:
DESCRIPTION = "quantum-bond-inference-notebook"
TAGS = ["rush-py", "dubai", "convert"]
WORK_DIR = Path.home() / "qdx" / "dubai-rush-py-demo"

## 0.3) Build your Rush client

In [4]:
# |hide
if WORK_DIR.exists():
    client = rush.Provider(workspace=WORK_DIR, access_token=TOKEN, url=RUSH_URL)
    await client.nuke(remote=False)

In [5]:
os.makedirs(WORK_DIR, exist_ok=True)

client = await rush.build_provider_with_functions(
    access_token=TOKEN, url=RUSH_URL, workspace=WORK_DIR, batch_tags=TAGS
)

In [6]:
#| hide
client = await rush.build_provider_with_functions(
    access_token=TOKEN, url=RUSH_URL, workspace=WORK_DIR, batch_tags=TAGS, restore_by_default=True
)

## 0.4) Download Aspirin SDF from PubChem

In [7]:
# Download a 3D SDF record for aspirin from PubChem, looked up by its SMILES string
SMILES_STRING = "CC(=O)OC1=CC=CC=C1C(=O)O"
SDF_LINK = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/{SMILES_STRING}/record/SDF?record_type=3d"

response = requests.get(SDF_LINK, timeout=30)
response.raise_for_status()  # stop early if PubChem returns an error instead of an SDF

file_path = WORK_DIR / "aspirin.sdf"
file_path.write_bytes(response.content)
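Aspirin's SMILES happens to be safe to paste into a URL path, but other SMILES are not. A `#` (triple bond) would be read as the start of a URL fragment and never sent to the server. The sketch below percent-encodes the SMILES first. The helper name is our own. For SMILES containing `/` or `\` stereo bonds, also check the PubChem PUG REST documentation on passing identifiers in a POST body rather than in the URL path.

```python
from urllib.parse import quote

PUBCHEM_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles"


def pubchem_sdf_url(smiles: str) -> str:
    """Build a PubChem PUG REST URL for a 3D SDF record, percent-encoding the SMILES."""
    # safe="" also encodes "/" and "#", which would otherwise be read as a
    # path separator or the start of a URL fragment
    encoded = quote(smiles, safe="")
    return f"{PUBCHEM_BASE}/{encoded}/record/SDF?record_type=3d"


url = pubchem_sdf_url("CC(=O)OC1=CC=CC=C1C(=O)O")
print(url)
```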

## 0.5) Convert SDF file to QDXF format
QDXF is the central molecule format of the Rush API, so before we use the Dubai module to infer connectivity (bonds) for our molecule, we must convert our SDF file to QDXF.

In [8]:
# Request more storage than the size of the input file so that the
# convert module has enough space allocated
(ligand,) = await client.convert(
    "SDF", Path(file_path), resources={"storage": 5000}
)

ligand = await ligand.get()

2024-02-09 02:11:10,934 - rush - INFO - Trying to restore job with tags: ['rush-py', 'dubai', 'convert'] and path: github:talo/tengu-prelude/efc6d8b3a8cc342cd9866d037abb77dac40a4d56#convert
2024-02-09 02:11:11,195 - rush - INFO - Restoring job from previous run with id e99b41a1-2426-45e6-be51-e11bda1cac3c


## 0.6) Remove connectivity
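We remove the existing connectivity because Dubai will infer the bonds (quantum-accurately) in the next step. We keep a copy in `EXPECTED_CONNECTIVITY` so we can check Dubai's output against it. Dubai also requires fragment multiplicities, so we declare aspirin as a single closed-shell (singlet) fragment.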

In [9]:
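ligand = ligand[0]  # take the first (and only) conformer returned by convert
# Keep the SDF-derived bonds so we can check Dubai's output against them later
EXPECTED_CONNECTIVITY = ligand["topology"].pop("connectivity")

# One fragment with spin multiplicity 1 (closed-shell singlet)
ligand["topology"]["fragment_multiplicities"] = [1]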
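A spin multiplicity (2S + 1) is only possible if the electrons left after the unpaired ones can pair up. The sketch below uses that rule to sanity-check the choice of `[1]`. It works on a plain list of element symbols rather than the QDXF topology. The helper name and input format are assumptions, not part of the Rush API. The check only rules out impossible values: for an even electron count it cannot distinguish a singlet from a triplet.

```python
# Atomic numbers for the elements in aspirin (extend as needed)
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def multiplicity_is_consistent(elements, charge: int, multiplicity: int) -> bool:
    """Check that a spin multiplicity (2S + 1) is possible for the given electron count."""
    if multiplicity < 1:
        return False
    n_electrons = sum(ATOMIC_NUMBERS[e] for e in elements) - charge
    n_unpaired = multiplicity - 1
    # Unpaired electrons cannot exceed the total, and the rest must form pairs
    return n_unpaired <= n_electrons and (n_electrons - n_unpaired) % 2 == 0


# Aspirin, C9H8O4: 9*6 + 8*1 + 4*8 = 94 electrons when neutral
aspirin = ["C"] * 9 + ["H"] * 8 + ["O"] * 4
print(multiplicity_is_consistent(aspirin, charge=0, multiplicity=1))  # True
print(multiplicity_is_consistent(aspirin, charge=0, multiplicity=2))  # False
```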

## 1.0) Set Dubai module-specific configuration
Here we set the resources for the Dubai run and save the QDXF aspirin conformer to disk, because the Dubai module takes the file itself as input.


In [10]:
DUBAI_RESOURCES = {
    "gpus": 1,
    "storage": 1_024_000,
    "walltime": 60,
}
LIGAND_FILEPATH = WORK_DIR / "aspirin.qdxf.json"
# Use a context manager so the file is flushed and closed before Dubai reads it
with open(LIGAND_FILEPATH, "w") as f:
    json.dump(ligand, f)
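Because Dubai reads the conformer from disk, it can be worth confirming that the file is complete, valid JSON before submitting the job. The sketch below does this with a hypothetical conformer dict that holds only a field touched in this notebook; real QDXF conformers carry more. The helper names are our own.

```python
import json
import tempfile
from pathlib import Path


def save_conformer(conformer: dict, path: Path) -> None:
    """Write a conformer dict as JSON, closing the file handle deterministically."""
    with open(path, "w") as f:
        json.dump(conformer, f)


def load_conformer(path: Path) -> dict:
    """Read a conformer dict back from a JSON file."""
    with open(path) as f:
        return json.load(f)


# Hypothetical minimal conformer: not a complete QDXF document
conformer = {"topology": {"fragment_multiplicities": [1]}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.qdxf.json"
    save_conformer(conformer, path)
    # Reading the file back confirms it parses and matches what we wrote
    reloaded = load_conformer(path)

print(reloaded == conformer)  # True
```

In the notebook, the same check would be `load_conformer(LIGAND_FILEPATH) == ligand`.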

## 1.1) Run Dubai
Finally, we run Dubai to perform quantum-accurate bond inference and to calculate Mulliken partial charges.
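Dubai's internals are not shown in this notebook, so the following is background only. The textbook Mulliken scheme splits the electron density among atoms using the density matrix P and the basis-function overlap matrix S. Atom A's charge is `q_A = Z_A - sum over basis functions mu on A of (PS)_{mu mu}`. The numpy sketch below implements that definition for a toy two-atom system. It illustrates the formula, not Dubai's implementation, and the argument layout is our own choice.

```python
import numpy as np


def mulliken_charges(P, S, Z, basis_atom):
    """Textbook Mulliken charges: q_A = Z_A - sum over mu on atom A of (PS)_{mu mu}.

    P: density matrix (n_basis x n_basis), S: overlap matrix,
    Z: nuclear charges per atom, basis_atom: atom index for each basis function.
    """
    gross_populations = np.diag(P @ S)  # electrons assigned to each basis function
    populations = np.zeros(len(Z))
    np.add.at(populations, basis_atom, gross_populations)  # sum per atom
    return np.asarray(Z, dtype=float) - populations


# Toy H2 in a minimal basis: one 1s function per atom, overlap s between them
s = 0.6
S = np.array([[1.0, s], [s, 1.0]])
# Doubly occupied bonding orbital c = (1, 1) / sqrt(2 (1 + s)) gives P = 2 c c^T
c = np.array([1.0, 1.0]) / np.sqrt(2 * (1 + s))
P = 2 * np.outer(c, c)
q = mulliken_charges(P, S, Z=[1, 1], basis_atom=[0, 1])
print(q)  # both charges are ~0: the symmetric molecule shares its electrons equally
```

The charges always sum to `sum(Z) - trace(PS)`, which is the molecule's total charge when P describes its electron count.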

In [11]:
help(client.dubai)

Help on function dubai in module rush.provider:

async dubai(*args: *tuple[RushObject[Conformer]], target: Optional[Target] = 'NIX_SSH', resources: Optional[Resources] = {'storage': 10, 'storage_units': 'MB', 'gpus': 0}, tags: list[str] | None = None, restore: bool | None = None) -> tuple[RushObject[Conformer]]
    Perform quantum accurate bond inference and partial charge calculation on a Conformer
    
    Module version:  
    `github:talo/Dubai/f9c420ba9e7fc04e520c67e62201a5f32d174a77#dubai_tengu`
    
    QDX Type Description:
    
        input_conformer: @Conformer
        ->
        output_conformer: @Conformer
    
    :param input_conformer: A Conformer. The Conformer's Topology requires fragment charges and fragment charge multiplicities
    :return output_conformer: Output Conformer including partial charges and bond recalculation



In [12]:
(ligand_with_bonds,) = await client.dubai(
    LIGAND_FILEPATH, resources=DUBAI_RESOURCES, target="NIX_SSH_3"
)

2024-02-09 02:11:21,035 - rush - INFO - Trying to restore job with tags: ['rush-py', 'dubai', 'convert'] and path: github:talo/Dubai/f9c420ba9e7fc04e520c67e62201a5f32d174a77#dubai_tengu
2024-02-09 02:11:21,477 - rush - INFO - Restoring job from previous run with id f97f26e8-0cb9-4db5-9f37-e0f0b5565e90


In [13]:
output_ligand = await ligand_with_bonds.get()
output_connectivity = output_ligand["topology"]["connectivity"]

# zip() silently stops at the shorter list, so compare bond counts first
assert len(EXPECTED_CONNECTIVITY) == len(output_connectivity)

for expected_bond, outputted_bond in zip(EXPECTED_CONNECTIVITY, output_connectivity):
    # Check that the start atoms match
    assert expected_bond[0] == outputted_bond[0]
    # Check that the end atoms match
    assert expected_bond[1] == outputted_bond[1]
    # NB: we don't compare the third item of each bond (the bond type).
    # Dubai labels ring bonds with a dedicated 'RINGBOND' type, whereas
    # the SDF encodes the aspirin ring as alternating single and double bonds.
print("Bond inference performed correctly!")

Bond inference performed correctly!
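The loop above compares bonds position by position. It therefore assumes that Dubai returns bonds in the same order and with the same atom direction as the SDF. A looser check treats each bond as an unordered atom pair. Here is a sketch. The helper name is hypothetical, and the bond-type values in the example are placeholders (only `'RINGBOND'` comes from the comment above).

```python
def compare_connectivity(expected, actual):
    """Compare bond lists as unordered atom pairs, ignoring bond type and list order.

    Each bond is assumed to be a sequence whose first two items are atom indices,
    matching how the cell above reads bond[0] and bond[1].
    Returns (missing, extra): pairs only in `expected`, and pairs only in `actual`.
    """
    expected_pairs = {frozenset(bond[:2]) for bond in expected}
    actual_pairs = {frozenset(bond[:2]) for bond in actual}
    return expected_pairs - actual_pairs, actual_pairs - expected_pairs


# Hypothetical bonds: same atom pairs, different order, direction, and type labels
expected = [[1, 2, 1], [2, 3, 2], [3, 4, 1]]
actual = [[3, 2, "RINGBOND"], [4, 3, "RINGBOND"], [1, 2, 1]]
missing, extra = compare_connectivity(expected, actual)
print(missing, extra)  # set() set()
```

For this notebook, `compare_connectivity(EXPECTED_CONNECTIVITY, output_ligand["topology"]["connectivity"])` should return two empty sets if Dubai recovers the same bonds as the SDF.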
