# Dubai - Quantum-accurate bond inference and partial charge calculations

This notebook walks through quantum-accurate bond inference and Mulliken partial charge calculation for aspirin, starting from the 3D SDF record that PubChem returns for its SMILES string.

## 0) Setup
See [Quickstart](../index.ipynb#imports) for more details on the setup.

## 0.0) Imports

In [59]:
import os
import json
from pathlib import Path

import requests
import tengu

## 0.1) Credentials


In [60]:
TOKEN = os.getenv("TENGU_TOKEN")
# You might have a custom deployment URL; if TENGU_URL is not set, we fall back to https://tengu.qdx.ai
TENGU_URL = os.getenv("TENGU_URL") or "https://tengu.qdx.ai"

## 0.2) Configuration
Let's set some global variables that define our project.

In [61]:
DESCRIPTION = "quantum-bond-inference-notebook"
TAGS = ["tengu-py", "dubai", "convert"]
WORK_DIR = Path.home() / "qdx" / "dubai-tengu-py-demo"

## 0.3) Build your tengu client

In [62]:
#|hide
client = tengu.Provider(workspace=WORK_DIR, access_token=TOKEN, url=TENGU_URL)
await client.nuke()

In [63]:
os.makedirs(WORK_DIR, exist_ok=True)

client = await tengu.build_provider_with_functions(
    access_token=TOKEN, url=TENGU_URL, workspace=WORK_DIR, batch_tags=TAGS
)

## 0.4) Download Aspirin SDF from PubChem

In [64]:
# Download a 3D SDF record for aspirin from PubChem; we convert it to QDXF in the next step
SMILES_STRING = "CC(=O)OC1=CC=CC=C1C(=O)O"
SDF_LINK = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles/{SMILES_STRING}/record/SDF?record_type=3d"

file_path = f'{WORK_DIR}/aspirin.sdf'
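response = requests.get(SDF_LINK)
# Fail early on an HTTP error instead of writing an error page to disk
response.raise_for_status()
with open(file_path, 'wb') as f:
    f.write(response.content)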
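
Aspirin's SMILES can be embedded in the URL path as-is, but other SMILES strings may contain characters such as `#` (triple bond) or `/` (stereo bond) that change the meaning of a URL. Below is a minimal sketch of one way to guard against that. It assumes the endpoint used above accepts a percent-encoded SMILES in the path, which should be verified against PubChem's PUG REST documentation. `build_sdf_link` is a hypothetical helper, not part of tengu or PubChem.

```python
from urllib.parse import quote

PUBCHEM_SMILES_URL = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/smiles"


def build_sdf_link(smiles: str) -> str:
    """Hypothetical helper: build the 3D SDF record URL for a SMILES string."""
    # safe="" percent-encodes every reserved character, including "/" and "#"
    encoded = quote(smiles, safe="")
    return f"{PUBCHEM_SMILES_URL}/{encoded}/record/SDF?record_type=3d"
```

If the encoded path form turns out not to be accepted, the same helper is the natural place to switch to whichever input style the PubChem documentation recommends.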

## 0.5) Convert SDF file to QDXF format
QDXF is the central molecule format of the Tengu API, so before we use the Dubai module to infer connectivity (bonds) for our molecule, we must convert our SDF file to QDXF.

In [65]:
# Request more storage than the file size so that the convert module has enough resources allocated
(ligand,) = await client.convert("SDF", Path(file_path), resources={"storage": 5000})

# Fetch the converted QDXF result; it is indexed as a list in the next step
ligand = await ligand.get()

## 0.6) Remove connectivity
We remove the connectivity that came from the SDF file, keeping a copy for later comparison, because Dubai will infer the bonds (quantum-accurately) in the next step.

In [66]:
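# Keep the SDF-derived bonds so we can compare them with Dubai's output later
EXPECTED_CONNECTIVITY = ligand[0]['topology']['connectivity']
ligand = ligand[0]
# Drop the bonds from the molecule; Dubai will infer them
del ligand['topology']['connectivity']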

ligand['topology']['fragment_multiplicities'] = [1]
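
To make this step concrete, here is a minimal sketch on a toy dictionary. The layout is an assumption inferred from the keys accessed in this notebook (`topology`, `connectivity`, bonds as `[start, end, type]` triples). The real QDXF schema may carry more fields, and the `symbols` entry and all values below are made up for illustration.

```python
import copy

# Toy stand-in for one QDXF entry (hypothetical values, not real output)
toy_ligand = {
    "topology": {
        "symbols": ["C", "O", "H"],
        "connectivity": [[0, 1, 1], [1, 2, 1]],
    }
}

# Take an independent copy first so later edits cannot alter the reference bonds
expected_connectivity = copy.deepcopy(toy_ligand["topology"]["connectivity"])

# Remove the bonds; pop() with a default does not raise if the key is already gone
toy_ligand["topology"].pop("connectivity", None)
```

Using `pop` with a default makes the cell safe to re-run, whereas `del` raises a `KeyError` the second time.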

## 1.0) Set Dubai module-specific configuration
In this stage, we set the configuration for the Dubai module and save our QDXF aspirin to disk, because the Dubai module takes the file itself as input.


In [67]:
DUBAI_RESOURCES = {
    "gpus": 1,
    "storage": 1024_000,
    "walltime": 60,
}
LIGAND_FILEPATH = Path(f'{WORK_DIR}/aspirin.qdxf.json')
# Use a context manager so the file is flushed and closed before Dubai reads it
with open(LIGAND_FILEPATH, 'w') as f:
    json.dump(ligand, f)
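
Before submitting the file, it can be worth confirming that it reads back unchanged. Below is a minimal sketch on a toy dictionary and a temporary directory. Both are stand-ins, so nothing in `WORK_DIR` is touched.

```python
import json
import tempfile
from pathlib import Path

# Toy stand-in for the ligand dictionary (hypothetical content)
toy_ligand = {"topology": {"fragment_multiplicities": [1]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = Path(tmp_dir) / "toy.qdxf.json"
    # Write with a context manager so the file is flushed and closed
    with open(toy_path, "w") as f:
        json.dump(toy_ligand, f)
    # Read it back to confirm the round trip preserves the structure
    with open(toy_path) as f:
        round_tripped = json.load(f)
```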

## 1.1) Run Dubai
Finally, we run Dubai to perform quantum-accurate bond inference, as well as the calculation of Mulliken partial charges.

In [68]:
(ligand_with_bonds,) = await client.dubai(LIGAND_FILEPATH, resources=DUBAI_RESOURCES)

In [69]:
output_ligand = await ligand_with_bonds.get()

for expected_bond, outputted_bond in zip(EXPECTED_CONNECTIVITY, output_ligand['topology']['connectivity']):
    # Check start atoms are the same
    assert expected_bond[0] == outputted_bond[0]
    # Check ending atoms are the same
    assert expected_bond[1] == outputted_bond[1]
    # NB: we don't check the third item of the bond (the bond type), because Dubai outputs ring bonds as
    # a specific 'RINGBOND' type, whereas the SDF for aspirin alternates single and double bonds.
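
One caveat with the loop above: `zip` stops at the shorter sequence, so a missing or extra bond would go unnoticed. Below is a minimal sketch of a stricter check. It assumes bonds are `[start, end, type]` triples listed in the same order in both inputs. `same_bonded_pairs` is a hypothetical helper, not part of tengu, and the bond values are made up.

```python
def same_bonded_pairs(expected, actual):
    """Return True if both bond lists connect the same atom pairs, in order.

    The third item of each bond (the bond type) is deliberately ignored.
    """
    # zip() alone would silently truncate, so compare lengths first
    if len(expected) != len(actual):
        return False
    return all(e[0] == a[0] and e[1] == a[1] for e, a in zip(expected, actual))


# Toy data: same atom pairs, different bond types (values are made up)
sdf_bonds = [[0, 1, 1], [1, 2, 2]]
dubai_bonds = [[0, 1, 5], [1, 2, 5]]
print(same_bonded_pairs(sdf_bonds, dubai_bonds))
```

In the notebook, this could be called as `same_bonded_pairs(EXPECTED_CONNECTIVITY, output_ligand['topology']['connectivity'])`, provided both lists really do share an ordering.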