## Virtual screen protocol

In this tutorial, we demonstrate how to use `rush-py` to run a large-scale virtual screen against a protein target.
This is useful when you want to explore drugging a protein target that has no reference ligand.

We use the FDA-approved drug subset of the ZINC20 database as our sample screening library, but the same protocol could scale on Rush to screen tens of millions of ligands.

## 0.0) Imports

In [1]:
import rush
import csv

import requests
from pathlib import Path

## 0.1) Project configuration

In [2]:
# |hide
import os
import pathlib

WORK_DIR = pathlib.Path("~/qdx/virtual-screen/").expanduser()
if WORK_DIR.exists():
    !rm -r $WORK_DIR
os.makedirs(WORK_DIR)
os.chdir(WORK_DIR)

In [3]:
# Define our project information
DESCRIPTION = "rush-py virtual screen"
TAGS = ["qdx", "rush-py-v2", "demo", "virtual-screen"]
WORK_DIR = Path.home() / "qdx" / "virtual-screen"

In [4]:
client = await rush.build_provider_with_functions()

## 0.3) Input files for virtual screen

For more details about the input structure of our virtual screen module, we can run:

In [5]:
help(client.virtual_screen_pdb)

Help on function virtual_screen_pdb in module rush.provider:

async virtual_screen_pdb(*args: *tuple[RushObject[bytes], Optional[RushObject[bytes]], RushObject[bytes], Record], target: 'Target | None' = None, resources: 'Resources | None' = {'storage': 138, 'storage_units': 'MB', 'gpus': 1}, tags: 'list[str] | None' = None, output_tags: 'list[list[str] | None] | None' = None, restore: 'bool | None' = None) -> tuple[list[None]]
    Run a virtual screen on a library of molecules

    Please see:
    GNINA 1.0: Molecular docking with deep learning
    A McNutt, P Francoeur, R Aggarwal, T Masuda, R Meli, M Ragoza, J Sunseri, DR Koes. J. Cheminformatics, 2021


    Module version:
    `github:talo/tengu-virtualscreen/dc2dc37ff95f935708c50f6d04500a84b7102b5b#tengu_virtual_screen_pdb`

    QDX Type Description:

        in: Object[@$Bytes];
        in: Object {size: u64, format: ObjectFormat[json | bin]?, path: @$Bytes}?;
        in: Object[@$Bytes,];
        in: VirtualScreenOptions {
      

## 0.4) Download protein target
For this example, we use CDK2 (PDB entry 3PXY) as the protein target.
In this step, we also remove the ligand from the protein PDB file by deleting its HETATM records.

In [6]:
PROTEIN_TARGET_FILEPATH = Path.cwd() / "3pxy_cleaned.pdb"
VIRTUAL_SCREEN_LIBRARY_URL = (
    "https://zinc20.docking.org/substances/subsets/fda.csv?count=all"
)
VIRTUAL_SCREEN_FILEPATH = Path.cwd() / "vs.txt"

In [None]:
!pdb_fetch '3pxy' |  pdb_delhetatm > 3pxy_cleaned.pdb
!ls

3pxy_B_JWS.sdf	3pxy_cleaned.pdb
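
As a rough illustration of what the ligand-removal step amounts to, here is a minimal sketch that drops every `HETATM` record from PDB text. It is not the pdb-tools implementation, which may treat related records differently; the function name `strip_hetatm` and the two-atom PDB fragment are made up for this example.

```python
def strip_hetatm(pdb_text: str) -> str:
    """Return PDB text with every HETATM record removed."""
    kept = [
        line
        for line in pdb_text.splitlines()
        if not line.startswith("HETATM")
    ]
    return "\n".join(kept) + "\n"


# Tiny made-up PDB fragment, for illustration only (not taken from 3PXY).
example = (
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00  0.00           N\n"
    "HETATM    2  C1  JWS A 301      12.000  12.000  12.000  1.00  0.00           C\n"
    "END\n"
)
cleaned = strip_hetatm(example)
print(cleaned)
```

Note that this also removes waters and ions, since they are stored as `HETATM` records too.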


## 0.5) Prepping virtual screen library
The virtual screen module expects a file of SMILES strings, one per line, so we need to download the ZINC20 dataset and then write only the SMILES strings to a new file.


In [None]:
with requests.get(VIRTUAL_SCREEN_LIBRARY_URL, stream=True) as response:
    response.raise_for_status()

    lines = (line.decode("utf-8") for line in response.iter_lines())
    reader = csv.DictReader(lines)

    with open(VIRTUAL_SCREEN_FILEPATH, 'w') as f:
        # also write the reference ligand to the screen for sanity checking purposes
        f.write('COc1ccc(O)c(-c2nc(N)nc(N)n2)c1' + '\n')
        for row in reader:
            if "smiles" in row:
                f.write(row['smiles'] + '\n')

In [None]:
# print the first 3 lines of the file for verification purposes 
with open(VIRTUAL_SCREEN_FILEPATH, 'r') as file:
    for _ in range(3):
        print(file.readline(), end="")

COc1ccc(O)c(-c2nc(N)nc(N)n2)c1
C[C@@H](S)C(=O)NCC(=O)O
COc1ccccc1OC[C@H](O)CO
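
If the downloaded CSV might contain blank or repeated SMILES entries, one option is to filter them out before writing the library file. The sketch below is an optional variation on the cell above, not part of the original protocol; the helper name `smiles_from_csv` and the ZINC IDs in the sample are hypothetical, and it assumes only that the CSV has a `smiles` column.

```python
import csv
import io


def smiles_from_csv(csv_text: str, column: str = "smiles") -> list[str]:
    """Collect unique, non-empty SMILES strings from CSV text, keeping order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen: set[str] = set()
    smiles: list[str] = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            smiles.append(value)
    return smiles


# Made-up sample with the same "smiles" column as the download.
sample_csv = (
    "zinc_id,smiles\n"
    "ZINC000000000001,C[C@@H](S)C(=O)NCC(=O)O\n"
    "ZINC000000000002,\n"
    "ZINC000000000003,C[C@@H](S)C(=O)NCC(=O)O\n"
    "ZINC000000000004,COc1ccccc1OC[C@H](O)CO\n"
)
library = smiles_from_csv(sample_csv)
print("\n".join(library))
```

The resulting list can be written out with one SMILES per line, matching the file format the module expects.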


## 1.0) Run p2rank to find potential pockets to target


## 1.0) Run virtual screen module
We are only screening the first 100 molecules as a preflight check before running the whole screen.

In [None]:
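# Preflight configuration: screen only the first 100 molecules of the library
VS_CONFIG = {'templated_docking': True, 'screen_n_molecules': 100}

# NOTE: REFERENCE_LIGAND_FILEPATH is not defined in the cells above; point it at
# your reference ligand file before running this cell.
(results,) = await client.virtual_screen_pdb(
    PROTEIN_TARGET_FILEPATH,
    REFERENCE_LIGAND_FILEPATH,
    VIRTUAL_SCREEN_FILEPATH,
    VS_CONFIG,
    target="BULLET",
    resources={'gpus': 1, 'storage': 50, 'storage_units': 'GB'},
)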


## 1.1) Viewing and downloading results
The virtual screen module returns a number of hits ordered by gnina CNNScore by default, though this is configurable.
We can also download any of our top screen hits, which are stored as SDF files.

In [None]:
screen_results = await results.get()
screen_results

2024-06-14 01:43:13,780 - rush - INFO - Argument a8630dc2-babe-4fcc-90d5-ead5c3466c87 is now ModuleInstanceStatus.ADMITTED
2024-06-14 01:43:26,501 - rush - INFO - Argument a8630dc2-babe-4fcc-90d5-ead5c3466c87 is now ModuleInstanceStatus.DISPATCHED
2024-06-14 01:44:31,426 - rush - INFO - Argument a8630dc2-babe-4fcc-90d5-ead5c3466c87 is now ModuleInstanceStatus.RUNNING
2024-06-14 01:44:43,973 - rush - INFO - Argument a8630dc2-babe-4fcc-90d5-ead5c3466c87 is now ModuleInstanceStatus.AWAITING_UPLOAD


[['CCCN[C@H]1CCc2nc(N)sc2C1',
  {'path': 'ea787e0d-4268-4e94-965e-901c44033913',
   'size': 0,
   'format': 'json'},
  [{'mode': 1,
    'affinity': -5.97556,
    'cnn_score': 0.84076226,
    'cnn_affinity': 1.6136252}]],
 ['CCCN[C@H]1CCc2nc(N)sc2C1',
  {'path': '48a20872-96ca-4eae-98c9-84203e43e3bb',
   'size': 0,
   'format': 'json'},
  [{'mode': 1,
    'affinity': -6.14681,
    'cnn_score': 0.8310019,
    'cnn_affinity': 1.6622521}]],
 ['CCCN[C@H]1CCc2nc(N)sc2C1',
  {'path': 'e0b4e72d-d183-4d1c-980e-71a21bd333fc',
   'size': 0,
   'format': 'json'},
  [{'mode': 1,
    'affinity': -5.95758,
    'cnn_score': 0.79241043,
    'cnn_affinity': 2.0833986}]],
 ['CCCN[C@H]1CCc2nc(N)sc2C1',
  {'path': 'da54ce66-e4d9-4a35-9d6c-2388bb8eab7f',
   'size': 0,
   'format': 'json'},
  [{'mode': 1,
    'affinity': -6.28759,
    'cnn_score': 0.7663172,
    'cnn_affinity': 2.135249}]],
 ['C[C@@H](CCc1ccccc1)NC[C@H](O)c1ccc(O)c(C(N)=O)c1',
  {'path': '307006c2-5232-46f8-a1ec-8e1788462755',
   'size': 0,
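
The output above lists the same SMILES string more than once, so a summary may want only the best entry per molecule. The following is a minimal sketch, assuming each hit has the `[smiles, object, poses]` shape shown above and that a higher value of the chosen metric is better (true for `cnn_score`, but not for `affinity`, where lower is better). The helper name `best_per_smiles` is hypothetical and not part of `rush-py`.

```python
def best_per_smiles(hits, metric="cnn_score"):
    """Keep the highest-scoring entry per SMILES, sorted best first."""
    best = {}
    for smiles, obj, poses in hits:
        if not poses:
            continue  # skip hits that carry no pose information
        top_pose = max(poses, key=lambda pose: pose[metric])
        current = best.get(smiles)
        if current is None or top_pose[metric] > current[2][metric]:
            best[smiles] = (smiles, obj["path"], top_pose)
    return sorted(best.values(), key=lambda hit: hit[2][metric], reverse=True)


# Two entries copied from the output above.
sample_hits = [
    ["CCCN[C@H]1CCc2nc(N)sc2C1",
     {"path": "ea787e0d-4268-4e94-965e-901c44033913", "size": 0, "format": "json"},
     [{"mode": 1, "affinity": -5.97556, "cnn_score": 0.84076226,
       "cnn_affinity": 1.6136252}]],
    ["CCCN[C@H]1CCc2nc(N)sc2C1",
     {"path": "48a20872-96ca-4eae-98c9-84203e43e3bb", "size": 0, "format": "json"},
     [{"mode": 1, "affinity": -6.14681, "cnn_score": 0.8310019,
       "cnn_affinity": 1.6622521}]],
]
ranked = best_per_smiles(sample_hits)
for smiles, path, pose in ranked:
    print(smiles, path, pose["cnn_score"])
```

In the notebook, the same helper could be applied to `screen_results` instead of `sample_hits`.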


In [None]:
path_to_best_pose = screen_results[0][1]['path']
print(path_to_best_pose)

out_path = await client.download_object(
    path_to_best_pose,
    filename=f'{screen_results[0][0]}.sdf',
    decode=True,
    overwrite=True,
)
print(out_path)

ea787e0d-4268-4e94-965e-901c44033913


objects/CCCN[C@H]1CCc2nc(N)sc2C1.sdf


In [None]:
with open(out_path, 'r') as file:
    for _ in range(20):
        print(file.readline(), end="")




 17 18  0  0  0  0  0  0  0  0999 V2000
   65.2011   80.0207  -82.6061 C   0  0  0  0  0  0  0  0  0  0  0  0
   65.6572   79.9666  -81.1486 C   0  0  0  0  0  0  0  0  0  0  0  0
   65.9533   78.5724  -80.5956 C   0  0  0  0  0  0  0  0  0  0  0  0
   66.5680   77.6469  -81.5908 C   0  0  0  0  0  0  0  0  0  0  0  0
   67.1377   76.4650  -81.2394 N   0  0  0  0  0  0  0  0  0  0  0  0
   67.5501   75.7858  -82.3319 C   0  0  0  0  0  0  0  0  0  0  0  0
   68.1939   74.5197  -82.2610 N   0  0  0  0  0  0  0  0  0  0  0  0
   67.2207   76.6009  -83.7689 S   0  0  0  0  0  0  0  0  0  0  0  0
   66.4940   77.8802  -82.9553 C   0  0  0  0  0  0  0  0  0  0  0  0
   66.0108   79.1497  -83.5678 C   0  0  0  0  0  0  0  0  0  0  0  0
  -10.3952   56.8908   27.0933 H   0  0  0  0  0  0  0  0  0  0  0  0
  -10.3952   56.8908   27.0933 H   0  0  0  0  0  0  0  0  0  0  0  0
   63.7957   79.6549  -82.6836 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.1012   11.3715   15.3196 H   0  0  0  0  