# COVID-19: Main protease binding site defined by XChem ligands

## Aim of this notebook
    
- Load all XChem structures in Biopython
- Get residues within a radius cutoff of each ligand centroid
- Find residues shared across structures (filtered by a residue coverage threshold)
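The first two aims rest on a small piece of geometry: averaging the ligand's atom coordinates to get its centroid. The notebook does this through Biopython inside the imported `binding_site_definition` helpers, whose code is not shown here. As a dependency-free sketch, the function below reads HETATM records directly using the fixed-column PDB layout. The function name `ligand_centroid` and the default residue name `LIG` are assumptions for illustration.

```python
from math import fsum
from typing import Iterable, Tuple


def ligand_centroid(pdb_lines: Iterable[str], ligand_resname: str = "LIG") -> Tuple[float, float, float]:
    """Return the (x, y, z) centroid of all HETATM atoms with the given residue name.

    Hypothetical helper: parses PDB fixed columns instead of using Biopython.
    """
    coords = []
    for line in pdb_lines:
        # PDB columns (1-based): record name 1-6, residue name 18-20, x 31-38, y 39-46, z 47-54
        if line.startswith("HETATM") and line[17:20].strip() == ligand_resname:
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    if not coords:
        raise ValueError(f"No HETATM records for residue {ligand_resname!r}")
    n = len(coords)
    # Average each coordinate axis separately
    return tuple(fsum(c[axis] for c in coords) / n for axis in range(3))
```

With Biopython, the same coordinates would come from the `coord` attribute of the atoms in the ligand residue; the manual parsing here only makes the arithmetic explicit.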

## Data

> To contribute to the global effort to combat COVID-19, Diamond has been able to solve a new structure of the SARS-CoV-2 main protease (MPro) at high resolution (PDB ID: 6YB7), and complete a large XChem crystallographic fragment screen against it (detailed below). Data have been deposited with the PDB, but we are making the results available immediately to the world on this page; additional work is ongoing, and updates will be continually posted here in coming days and weeks.

https://www.diamond.ac.uk/covid-19/for-scientists/Main-protease-structure-and-XChem.html 

## Binding site definition

Convert the following cell to a code cell if you need to install packages.

In [1]:
from pathlib import Path

from binding_site_definition import mulitple_binding_sites_residue_ids, binding_site_residues_by_coverage
from binding_site_definition import binding_site_in_pymol, binding_site_in_probis

### Parameters

- `STRUCTURE_PATHS`: Paths to the structure files from Diamond's XChem screen
- `LIGAND_NAME`: Ligand name in the dataset (the same in all structures, thanks!): `H_LIG`
- `distance_cutoff`: Radius (in Å) around the ligand centroid that delimits the binding site
- `coverage_cutoff`: Fraction of structures (between 0 and 1) in which a residue ID must appear in the binding site
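The coverage filter reduces to counting: a residue ID is kept if the fraction of structures whose binding site contains it reaches `coverage_cutoff`. The sketch below assumes plain integer residue numbers and an inclusive `>=` comparison. Whether `binding_site_residues_by_coverage` uses `>=` or `>` is not shown in this notebook, so treat the boundary case as an assumption; `residues_by_coverage` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, List


def residues_by_coverage(per_structure_residue_ids: List[Iterable[int]], coverage_cutoff: float = 0.5) -> List[int]:
    """Keep residue IDs found in at least `coverage_cutoff` of all structures (hypothetical helper)."""
    n_structures = len(per_structure_residue_ids)
    # Count each residue ID at most once per structure
    counts = Counter(rid for ids in per_structure_residue_ids for rid in set(ids))
    return sorted(rid for rid, count in counts.items() if count / n_structures >= coverage_cutoff)


sites = [[41, 145, 166], [41, 145], [41, 189], [41, 166]]
print(residues_by_coverage(sites, 0.5))  # [41, 145, 166]
```

With four structures and `coverage_cutoff=0.5`, a residue found in two of them (coverage 0.5) is kept, while one found in a single structure (coverage 0.25) is dropped.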

In [2]:
# Path to folder with structures
STRUCTURE_FOLDER = Path('.') / '..' / 'data' / 'Mpro_All_PDBs'

# Get path to all structure files
STRUCTURE_PATHS = sorted(STRUCTURE_FOLDER.glob('*.pdb'))
print(f'Number of structures: {len(STRUCTURE_PATHS)}')

In [4]:
LIGAND_NAME = 'H_LIG'
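Why `H_LIG` rather than `LIG`? In Biopython's `Bio.PDB`, a residue's `id` is a tuple `(hetfield, resseq, icode)`. For HETATM residues other than water the hetfield is `'H_'` followed by the residue name, waters get `'W'`, and standard residues get a blank `' '`. `H_LIG` is therefore presumably the hetfield of a ligand whose residue name is `LIG`. The hypothetical helper below only mimics that naming rule, without Biopython.

```python
def biopython_style_hetfield(record_name: str, resname: str) -> str:
    """Mimic the first element of a Bio.PDB residue id (hypothetical helper)."""
    if record_name == "HETATM":
        # Waters are flagged "W"; other hetero residues get "H_" + residue name
        return "W" if resname in ("HOH", "WAT") else "H_" + resname
    # Standard residues from ATOM records get a blank flag
    return " "


print(biopython_style_hetfield("HETATM", "LIG"))  # H_LIG
```

With a structure parsed by Biopython, the ligand residues could then be selected with something like `[res for res in structure.get_residues() if res.id[0] == LIGAND_NAME]`.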

### Main function

In [7]:
def binding_site_definition(structure_paths, ligand_name, distance_cutoff=15, coverage_cutoff=0.5):
    """
    Define the binding site as the residues found within a radius of the ligand centroid
    in at least a given fraction of the structures, and print the result.
    
    Parameters
    ----------
    structure_paths : list of pathlib.Path
        Paths to PDB structures from Diamond's XChem screen.
    ligand_name : str
        Ligand name as it appears in all structures (here: `H_LIG`).
    distance_cutoff : float
        Radius (in Å) around the ligand centroid within which residues are considered part of the binding site.
    coverage_cutoff : float
        Fraction of structures (between 0 and 1) in which a residue must appear in the binding site to be kept (i.e. coverage).
    
    Returns
    -------
    None
        Prints the number of binding site residues and the PyMOL and ProBiS selection commands.
    """
    
    n_structures = len(structure_paths)
    
    residue_ids = mulitple_binding_sites_residue_ids(structure_paths, ligand_name, distance_cutoff)
    
    binding_site = binding_site_residues_by_coverage(residue_ids, n_structures, coverage_cutoff)

    print(
        f'--- Number of binding site residues: '
        f'{len(binding_site)}\n'
    )
    print(
        f'--- PyMol command to visualize binding site in 6LU7:\n\n'
        f'{binding_site_in_pymol(binding_site)}\n'
    )
    print(
        f'--- ProBis command to select binding site:\n\n'
        f'{binding_site_in_probis(binding_site)}\n'
    )
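The radius step inside `mulitple_binding_sites_residue_ids` is not shown here. One plausible reading, sketched below as an assumption, is that a residue belongs to a structure's binding site if any of its atoms lies within `distance_cutoff` Å of the ligand centroid. The helper could equally use Cα atoms or residue centroids, which would give a somewhat different residue set; `residues_near_centroid` is a hypothetical name.

```python
from math import dist
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]


def residues_near_centroid(
    atoms: Iterable[Tuple[int, Coord]],
    centroid: Coord,
    distance_cutoff: float = 15.0,
) -> List[int]:
    """Return sorted residue numbers with at least one atom within distance_cutoff (Å) of centroid.

    `atoms` holds (residue_number, (x, y, z)) pairs for the protein atoms.
    Hypothetical helper: the any-atom criterion is an assumption.
    """
    # A set keeps each residue once, even if several of its atoms are in range
    return sorted({resseq for resseq, xyz in atoms if dist(xyz, centroid) <= distance_cutoff})


atoms = [(41, (3.0, 4.0, 0.0)), (41, (30.0, 0.0, 0.0)), (145, (0.0, 0.0, 16.0)), (166, (0.0, 15.0, 0.0))]
print(residues_near_centroid(atoms, (0.0, 0.0, 0.0)))  # [41, 166]
```

Running this once per structure yields the per-structure residue lists that the coverage filter then combines.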

### Parameter set 1

In [6]:
binding_site_definition(
    structure_paths=STRUCTURE_PATHS, 
    ligand_name=LIGAND_NAME, 
    distance_cutoff=15, 
    coverage_cutoff=0.5
)

--- Number of binding site residues: 68

--- PyMol command to visualize binding site in 6LU7:

fetch 6LU7
remove solvent
select pocket, 6LU7 and resi 20+21+22+23+24+25+26+27+28+29+37+38+39+40+41+42+43+44+45+46+47+48+49+50+51+52+54+57+61+66+85+86+87+116+117+118+119+120+139+140+141+142+143+144+145+146+147+161+162+163+164+165+166+167+168+170+171+172+173+174+175+181+186+187+188+189+190+192
show cartoon
hide lines
color blue, pocket
show lines, pocket

--- ProBis command to select binding site:

:A and (20,21,22,23,24,25,26,27,28,29,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,54,57,61,66,85,86,87,116,117,118,119,120,139,140,141,142,143,144,145,146,147,161,162,163,164,165,166,167,168,170,171,172,173,174,175,181,186,187,188,189,190,192)
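The two printed selections differ only in syntax: PyMOL joins residue numbers with `+` after `resi`, while the ProBiS selection lists them comma-separated inside `:A and (...)`. The hypothetical functions below rebuild the `select` line and the ProBiS string in the format shown above; they are not the notebook's `binding_site_in_pymol` or `binding_site_in_probis`, and the chain `A` default is taken from the output. `compact_ranges` is an optional extra that uses PyMOL's range syntax (e.g. `resi 20-29`) to shorten the selection.

```python
from typing import Iterable


def pymol_select_line(residue_ids: Iterable[int], pdb_id: str = "6LU7") -> str:
    """Build a PyMOL 'select pocket' line with '+'-joined residue numbers."""
    resi = "+".join(str(r) for r in sorted(residue_ids))
    return f"select pocket, {pdb_id} and resi {resi}"


def probis_selection(residue_ids: Iterable[int], chain: str = "A") -> str:
    """Build a ProBiS-style selection with comma-separated residue numbers."""
    resi = ",".join(str(r) for r in sorted(residue_ids))
    return f":{chain} and ({resi})"


def compact_ranges(residue_ids: Iterable[int]) -> str:
    """Collapse residue numbers into PyMOL ranges, e.g. [1, 2, 3, 7] -> '1-3+7'."""
    ids = sorted(set(residue_ids))
    if not ids:
        return ""
    parts = []
    start = prev = ids[0]
    for r in ids[1:]:
        if r == prev + 1:
            # Still inside a consecutive run
            prev = r
            continue
        parts.append(f"{start}-{prev}" if prev > start else str(start))
        start = prev = r
    parts.append(f"{start}-{prev}" if prev > start else str(start))
    return "+".join(parts)
```

Applied to the 68 residues above, `compact_ranges` yields `20-29+37-52+54+57+61+66+85-87+116-120+139-147+161-168+170-175+181+186-190+192`. Negative residue numbers or insertion codes would need extra handling.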

