# T015 · Binding site detection

Authors:
- adapted from Abishek Laxmanan Ravi Shankar, 2019, Volkamer Lab
- Andrea Volkamer, 2020, [Volkamer lab, Charité](https://volkamerlab.org/)
- Dominique Sydow, 2020, [Volkamer lab, Charité](https://volkamerlab.org/)

## Aim of this talktorial

The binding site of a protein is the key to its function. In this talktorial, we introduce the concepts of computational binding site detection tools, exemplified on an EGFR structure using DoGSiteScorer from the [proteins.plus](https://proteins.plus/) web server.
Additionally, we compare the results to the pre-defined KLIFS binding site by calculating the percentage of residues shared between the two sets.

### Contents in *Theory*

* Protein binding sites
* Binding site detection
    * Methods overview
    * DoGSiteScorer
* Comparison to KLIFS pocket

### Contents in *Practical*

* Binding site detection using DoGSiteScorer
    * Submit job to DoGSiteScorer and get job location
    * Get DoGSiteScorer pocket metadata
    * Sort pockets based on their drugScore
    * Get best binding site file content
    * Get residue IDs and names of best pocket
* Get KLIFS pocket
    * Get KLIFS structure ID for a PDB ID of interest
    * Get pocket from KLIFS ID and extract residues
* Comparison DoGSiteScorer and KLIFS pocket
    * Get DoGSiteScorer coverage with KLIFS results

### References
* Prediction, Analysis, and Comparison of Active Sites (Journal of Chemical Information and Modeling 2018 6,7)
* DoGSiteScorer paper
* NAR: proteins.plus
* KLIFS database (KLIFS Website)
* KLIFS: a structural kinase-ligand interaction database (KLIFS Pubmed 2016, Nucleic Acids Research, 44, 365-371)
* KLIFS: A Knowledge-Based Structural Database To Navigate Kinase−Ligand Interaction Space (KLIFS Journal of Medicinal Chemistry 2013, 57, 249-257)

## Theory

### Protein binding sites

Most biological processes are guided by the reversible or irreversible binding of molecules. Given a therapeutic target associated with a specific disease, knowing its binding site(s), i.e. the key to the protein's function, is of utmost importance for designing new drugs.

Binding site detection algorithms come into play when, for example, no protein-ligand complex structure (X-ray) is available or when one is interested in allosteric sites. Binding sites, called active sites in the case of enzymes, are defined as cavities in 3-dimensional space, mostly on the surface of a protein structure, that serve as binding (docking) regions for ligands, peptides, or proteins. The two binding partners need to be complementary with respect to shape and physicochemical properties (like a key and a lock).


**Note**: TODO: Include images like this (place images in `images/`):

![ChEMBL web service schema](images/chembl_webservices_schema_diagram.jpg)

*Figure 1:* 
Describe figure and add reference.
Figure and description taken from: [<i>Nucleic Acids Res.</i> (2015), <b>43</b>, 612-620](https://academic.oup.com/nar/article/43/W1/W612/2467881).

### Binding site detection

#### Methods overview

If ligand information is available (protein-ligand complex), the ligand-surrounding protein region can simply be defined as the pocket (e.g. using all protein residues within a predefined radius of the ligand atoms, such as 6 Å). If the ligand is absent, detection tools can be used for in silico pocket detection. These methods can be broadly grouped, on the one hand, into geometry- and energy-based methods and, on the other hand, into grid-based and grid-free approaches, as outlined in Figure 2. Note that in recent years, more and more machine learning (ML) based methods have been developed.
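
The ligand-based pocket definition described above can be sketched in a few lines. This is a minimal illustration, assuming atom coordinates and per-atom residue IDs are already available as arrays; the function name `residues_near_ligand` and the toy coordinates are hypothetical and not part of this talktorial's workflow.

```python
import numpy as np


def residues_near_ligand(protein_coords, protein_residue_ids, ligand_coords, radius=6.0):
    """Return sorted residue IDs with at least one atom within `radius` (Å) of any ligand atom."""
    protein_coords = np.asarray(protein_coords, dtype=float)
    ligand_coords = np.asarray(ligand_coords, dtype=float)
    # Pairwise distances, shape (n_protein_atoms, n_ligand_atoms)
    distances = np.linalg.norm(
        protein_coords[:, None, :] - ligand_coords[None, :, :], axis=-1
    )
    # A protein atom belongs to the pocket if it is close to any ligand atom
    is_close = (distances <= radius).any(axis=1)
    return sorted(set(np.asarray(protein_residue_ids)[is_close].tolist()))


# Hypothetical toy coordinates (Å), not taken from a real structure
protein_coords = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [10.0, 0.0, 0.0], [4.0, 4.0, 0.0]]
protein_residue_ids = [701, 701, 702, 703]
ligand_coords = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0]]

pocket_residue_ids = residues_near_ligand(protein_coords, protein_residue_ids, ligand_coords)
print(pocket_residue_ids)
```

A residue is included as soon as one of its atoms lies within the radius, which is one common convention; other choices (e.g. residue centroids) are equally possible.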

**TODO**: Figure 2: Binding site detection methods can be grouped into geometry-based and energy-based approaches as well as grid-based and grid-free approaches. Figure from Prediction, Analysis, and Comparison of Active Sites. Volkamer et al., 2018 (Journal of Chemical Information and Modeling 2018 6,7)

**Geometry-based approaches** analyze the shape of a molecular surface to locate cavities. They are based on the 3D spatial arrangement of the atoms on the protein surface. **Energy-based approaches** record interactions of probes or molecular fragments with the protein. Favorable energetic responses are thereby assigned to pockets. Both strategies can be performed on a Cartesian **grid-based** representation of the protein (i.e. checking the environment per grid point) or without one (i.e. **grid-free**). In the following, one example for each of the four categories is briefly introduced:
* *Geometric, grid-based approach*: In **LIGSITE** (Journal of Molecular Graphics and Modelling 1997 6 359-363), a Cartesian grid (e.g. 1 Å grid spacing) is spanned over the protein of interest. Each grid point is then scanned in seven directions (along the X, Y and Z axes as well as the four cubic diagonals) and the number of protein-solvent-protein (PSP) events per point is stored, i.e. the number of rays that are bounded on both sides by the protein. Finally, grid points that are buried (= have a high PSP value) are clustered into pockets.
* *Geometric, grid-free approach*: In **SURFNET** (Journal of Molecular Graphics 1995, 13, 323-330), probe spheres are placed directly midway between pairs of atoms on the protein surface. If a probe clashes with any nearby atom, its radius is reduced until no overlap occurs. The resulting probes define the cavities.
* *Energy, grid-based approach*: In **DrugSite** (Genome Informatics 2004 15 31–41), the protein is embedded in a Cartesian grid and carbon probes are placed on each grid point. Then, van der Waals energies between the probe and the protein environment within 8 Å distance are calculated. Grid points with unfavorable energies, i.e. above an energy cut-off based on the mean energy and standard deviation over the whole grid, are discarded. Finally, grid points fulfilling this cut-off are merged into pockets.
* *Energy, grid-free approach*: In **docking**-based methods, fragments (or small molecules) are docked against the protein of interest (placed and scored, for more info on docking see talktorial Txxx). Pockets are then assigned based on the number of fragments that bind to a specific area.
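
To make the grid-based geometric idea more tangible, the following sketch counts PSP events on a toy occupancy grid in the spirit of LIGSITE. It is a simplified illustration, not the original implementation: the grid, the cube-shaped "protein", and the burial threshold of 5 are assumptions chosen for this example.

```python
import numpy as np

# Seven scan directions: the three axes and the four cubic diagonals
DIRECTIONS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
              (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1)]


def hits_protein(occupied, point, step):
    """Walk from `point` along `step` and report whether an occupied grid point is reached."""
    position = np.array(point) + step
    while np.all(position >= 0) and np.all(position < occupied.shape):
        if occupied[tuple(position)]:
            return True
        position = position + step
    return False


def psp_counts(occupied):
    """Count protein-solvent-protein events (0 to 7) for every free grid point."""
    counts = np.zeros(occupied.shape, dtype=int)
    for point in np.argwhere(~occupied):
        for direction in DIRECTIONS:
            step = np.array(direction)
            # A PSP event requires protein on both sides of the free point
            if hits_protein(occupied, point, step) and hits_protein(occupied, point, -step):
                counts[tuple(point)] += 1
    return counts


# Toy "protein": a solid 7x7x7 block with one hollow voxel, embedded in a 9x9x9 grid
occupied = np.zeros((9, 9, 9), dtype=bool)
occupied[1:8, 1:8, 1:8] = True
occupied[4, 4, 4] = False  # buried cavity

counts = psp_counts(occupied)
buried = counts >= 5  # hypothetical burial threshold
print(counts[4, 4, 4], counts[0, 0, 0], int(buried.sum()))
```

In a full method, the buried grid points would subsequently be clustered into pockets; that step is omitted here.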

#### DoGSiteScorer

In this talktorial, we will use the DoGSiteScorer functionality, available within proteins.plus, to detect and score the pockets of a protein of interest. Thus, its algorithm will be explained in a bit more detail.

* Pocket detection: DoGSiteScorer incorporates a **geometric** and **grid-based** algorithm to detect pockets. The protein is embedded in a Cartesian grid, and each grid point is labeled as either 0 (free) or 1 (occupied), depending on whether it lies within any protein atom's van der Waals radius. Then, an edge-detection algorithm from image processing, a **Difference of Gaussian filter** (hence the name DoGSite), is invoked to identify positions on the protein surface where the location of a sphere-like object is favorable. Based on specific cut-off criteria, grid points with the highest intensity are selected and first clustered into subpockets, then merged into pockets.
* Descriptor calculation: TODO
* Druggability estimates: TODO
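
The Difference of Gaussian idea can be illustrated on a toy occupancy grid. The sketch below is not DoGSiteScorer's implementation: the Gaussian widths, the sign convention (wide minus narrow blur, so that free points enclosed by protein get high values), and the cube-shaped toy protein are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    half_width = int(np.ceil(3 * sigma))
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_smooth(grid, sigma):
    """Smooth a 3D grid by convolving with a 1D Gaussian along each axis in turn."""
    kernel = gaussian_kernel(sigma)
    smoothed = grid.astype(float)
    for axis in range(grid.ndim):
        smoothed = np.apply_along_axis(np.convolve, axis, smoothed, kernel, mode="same")
    return smoothed


# Toy occupancy grid: 1 = occupied by protein, 0 = free
occupied = np.zeros((21, 21, 21))
occupied[3:18, 3:18, 3:18] = 1.0   # solid block as "protein"
occupied[9:12, 9:12, 9:12] = 0.0   # small enclosed cavity

# Difference of Gaussians: wide blur minus narrow blur (assumed sign convention)
dog = gaussian_smooth(occupied, 2.0) - gaussian_smooth(occupied, 1.0)

# Among free grid points, pick the one with the strongest response
free = occupied == 0
flat_index = np.argmax(np.where(free, dog, -np.inf))
best_point = tuple(int(i) for i in np.unravel_index(flat_index, dog.shape))
print(best_point)
```

The free grid point with the strongest response sits in the middle of the cavity, while free points in the bulk solvent score close to zero.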

TODO: Figure


### Comparison
Once we obtain the binding site of interest from DoGSiteScorer, we can compare the results with any other method in order to validate it. Here we compare it with the KLIFS binding pocket for our target kinase structure using the KLIFS API (see **Talktorial T012** for more detail).

**KLIFS pocket definition** (in a nutshell)
The KLIFS (Kinase-Ligand Interaction Fingerprints and Structures) database is a structural repository of information on over 3600 human and mouse kinase structures. KLIFS thereby enables systematic comparison and analysis of the structures, their chemical features, bound ligands, and kinase-ligand interactions across all available structures. KLIFS comes with a nomenclature of typical structural motifs within kinases (such as DFG-in/out, hinge region, ...) and maps the binding site of all known kinases to 85 residues, defined via an elaborate multiple sequence alignment. It is possible to compare the interaction patterns of kinase inhibitors to each other to, for example, identify crucial interactions determining kinase-inhibitor selectivity.

For example, the KLIFS API returns the binding pocket of a specific kinase structure of interest for further analysis.

## Practical

In this practical section, we submit an EGFR structure (PDB ID 3w32) to DoGSiteScorer, select the best-scoring pocket, extract its residues, and retrieve the corresponding KLIFS pocket for comparison.

In [2]:
from pathlib import Path
import time

import pandas as pd
from biopandas.pdb import PandasPdb
import requests

from opencadd.databases.klifs import setup_remote

pd.set_option('display.max_columns', 50)

Add globals to this talktorial's path (`HERE`) and its data folder (`DATA`).

In [2]:
HERE = Path(_dh[-1])
DATA = HERE / "data"

### Define kinase structure of interest

In [3]:
pdb_id = "3w32"
chain_id = "A"

### Binding site detection using DoGSiteScorer

In [4]:
def dogsitescorer_submit_with_pdbid(pdb_code, chain_id, ligand=''):
    """
    Submit PDB ID to DoGSiteScorer webserver using their API and get back URL for job location.
    
    Parameters
    ----------
    pdb_code : str 
        4-letter valid PDB ID, e.g. '3w32'.
    chain_id : str
        Chain ID, e.g. 'A'.
    ligand : str
        Name of ligand bound to the PDB structure with pdb_code, e.g. 'W32_A_1101'. 
        Currently, the ligand name must be checked manually on the DoGSiteScorer website. 
        
    Returns
    -------
    str
        Job location URL for submitted query.
        
    References
    ----------
        Function is adapted from: https://github.com/volkamerlab/TeachOpenCADD/pull/3 (@jaimergp)
    """
    
    # Submit job to proteins.plus
    r = requests.post("https://proteins.plus/api/dogsite_rest",
        json={
            "dogsite": {
                "pdbCode": pdb_code,
                "analysisDetail": "0",
                "bindingSitePredictionGranularity": "1",
                "ligand": ligand,
                "chain": chain_id
            }
        },
        headers= {'Content-type': 'application/json', 'Accept': 'application/json'}
    )

    r.raise_for_status()
    
    return r.json()['location']

In [5]:
# Submit the job and get its location (URL) on the web server
job_location = dogsitescorer_submit_with_pdbid(pdb_id, chain_id)
job_location

'https://proteins.plus/api/dogsite_rest/nCrcoastyQTs7MnaGLesMNYF'

In [6]:
# Wait a bit so that job can finish
time.sleep(30)
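
A fixed waiting time may be too short for large structures or a busy server. As an alternative, the job can be polled until it is finished. The sketch below is a generic polling helper; the name `wait_for_job` is hypothetical, and it assumes that the server answers with status code 202 while the job is running and 200 once it is finished, which should be checked against the proteins.plus API documentation.

```python
import time


def wait_for_job(fetch_status, attempts=30, delay=20.0):
    """
    Call `fetch_status()` repeatedly until it reports a finished job.

    `fetch_status` is a callable returning an HTTP-like status code:
    200 (finished, assumed) or 202 (still running, assumed).
    Any other code is treated as an error.
    Returns the number of polls that were needed.
    """
    for attempt in range(1, attempts + 1):
        status = fetch_status()
        if status == 200:
            return attempt
        if status != 202:
            raise RuntimeError(f"Unexpected status code: {status}")
        time.sleep(delay)
    raise TimeoutError(f"Job not finished after {attempts} attempts")


# Stand-in for a real request: pretend the job finishes on the third poll
fake_statuses = iter([202, 202, 200])
attempts_needed = wait_for_job(lambda: next(fake_statuses), attempts=5, delay=0.0)
print(attempts_needed)
```

With a real job, `fetch_status` could be `lambda: requests.get(job_location).status_code`, under the status code assumption stated above.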

#### Get DoGSiteScorer pocket metadata

In [7]:
def get_dogsitescorer_metadata(job_location):
    """
    Get results from a DoGSiteScorer query, i.e. the binding sites which are found over the protein surface, 
    in the form of a table with the details about all detected pockets.

    Parameters
    ----------
    job_location : str
        Consists of the location of a finished DoGSiteScorer job on the proteins.plus web server.
    
    Returns
    -------
    pandas.DataFrame
        Table with metadata on detected binding sites. 
    """
    
    # Get job results
    result = requests.get(job_location)
    
    # Get URL of result table file
    result_file = result.json()['result_table']
    
    # Get result table
    result_table = requests.get(result_file).text
    
    # Split string into list of lists (=table)
    result_table_split = [i.split('\t') for i in result_table[:-1].split('\n')]

    # Remove spaces
    result_table_split = [[j.replace(' ', '') for j in i] for i in result_table_split]

    # Extract column names, index names, table body
    column_names = result_table_split[0]
    index_names = [i[0] for i in result_table_split[1:]]
    table = [i[1:] for i in result_table_split[1:]]

    # Convert to DataFrame
    result_table_df = pd.DataFrame(
        table,
        columns=column_names[1:],
        index=index_names
    )
    result_table_df.index.name = 'name'
    
    # Convert number strings to numeric values
    # (`DataFrame.items` replaces `iteritems`, which was removed in pandas 2.0)
    for name, data in result_table_df.items():
        try:
            result_table_df[name] = pd.to_numeric(data)
        except ValueError:
            pass
    
    return result_table_df
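
The manual splitting and type conversion above could alternatively be delegated to pandas. The following sketch parses a small tab-separated string with `pandas.read_csv`; the two-pocket excerpt and its header (including a first column called `name`) are hypothetical and only mimic the layout of the DoGSiteScorer result table.

```python
import io

import pandas as pd

# Hypothetical excerpt in a tab-separated layout (values for illustration only)
result_table = (
    "name\tvolume\tlig_name\tdrugScore\n"
    "P_0\t1422.66\t-\t0.810023\n"
    "P_1\t708.99\t-\t0.755915\n"
)

# read_csv splits the columns, sets the index, and infers numeric types in one step
table = pd.read_csv(
    io.StringIO(result_table), sep="\t", index_col="name", skipinitialspace=True
)
print(table)
```

Whether this works unchanged on the real server response depends on details such as the header of the first column and padding spaces, so the explicit parsing above remains the reference.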

In [8]:
# Retrieving all the guessed pockets from DoGSiteScorer Web server 
metadata = get_dogsitescorer_metadata(job_location)
metadata

name,lig_cov,poc_cov,lig_name,volume,enclosure,surface,depth,surf/vol,lid/hull,ellVol,ellc/a,ellb/a,siteAtms,accept,donor,hydrophobic_interactions,hydrophobicity,metal,Cs,Ns,Os,Ss,Xs,negAA,posAA,polarAA,apolarAA,ALA,ARG,ASN,ASP,CYS,GLN,GLU,GLY,HIS,ILE,LEU,LYS,MET,PHE,PRO,SER,THR,TRP,TYR,VAL,simpleScore,drugScore
P_0,0.0,0.0,"""""",1422.66,0.1,1673.75,19.26,1.176493,-,-,0.13,0.67,288,86,40,71,0.36,0,198,45,41,4,0,0.1,0.13,0.24,0.53,4,5,2,5,2,2,1,5,0,3,12,3,2,3,3,1,2,1,1,5,0.63,0.810023
P_1,0.0,0.0,"""""",708.99,0.13,1030.19,14.32,1.453039,-,-,0.14,0.59,140,44,13,34,0.37,0,98,17,25,0,0,0.14,0.11,0.36,0.39,3,1,1,0,0,0,4,4,0,1,4,2,0,1,1,2,2,0,1,1,0.46,0.755915
P_2,0.0,0.0,"""""",286.21,0.18,462.29,11.83,1.615213,-,-,0.23,0.44,70,20,9,13,0.31,0,50,9,9,2,0,0.12,0.06,0.18,0.65,1,0,0,2,0,1,0,0,0,1,1,1,2,1,2,0,1,0,1,3,0.09,0.537137
P_3,0.0,0.0,"""""",244.16,0.04,514.94,14.34,2.109027,-,-,0.12,0.31,92,18,7,24,0.49,0,67,12,13,0,0,0.12,0.06,0.41,0.41,0,0,1,1,1,0,1,1,1,0,1,0,0,2,3,2,1,0,1,1,0.12,0.572013
P_4,0.0,0.0,"""""",169.28,0.21,373.47,11.49,2.206226,-,-,0.12,0.16,59,14,3,24,0.59,0,44,7,7,1,0,0.08,0.08,0.17,0.67,0,0,0,1,0,1,0,0,0,3,0,1,2,1,2,0,1,0,0,0,0.07,0.424397
P_5,0.0,0.0,"""""",166.59,0.16,347.46,11.99,2.085719,-,-,0.14,0.24,58,18,8,10,0.28,0,39,8,11,0,0,0.15,0.08,0.46,0.31,0,0,1,1,0,2,1,1,1,1,1,0,1,0,1,1,0,0,1,0,0.0,0.401384
P_6,0.0,0.0,"""""",155.14,0.0,146.79,11.39,0.946178,-,-,0.37,0.62,81,16,6,1,0.04,0,55,10,15,1,0,0.09,0.14,0.23,0.55,3,2,0,2,0,0,0,0,1,1,1,0,2,0,1,3,1,2,1,2,0.0,0.399009
P_7,0.0,0.0,"""""",116.03,0.27,227.72,8.04,1.962596,-,-,0.26,0.57,31,12,8,4,0.17,0,20,6,5,0,0,0.3,0.1,0.2,0.4,0,1,0,0,0,1,3,0,0,1,1,0,0,0,1,0,0,1,1,0,0.0,0.224937
P_8,0.0,0.0,"""""",105.98,0.23,206.04,10.16,1.94414,-,-,0.15,0.2,37,6,4,11,0.52,0,28,5,4,0,0,0.18,0.18,0.09,0.55,0,1,0,1,0,0,1,0,0,2,1,1,0,0,2,0,0,1,1,0,0.0,0.308046
P_9,0.0,0.0,"""""",104.32,0.09,202.13,9.2,1.937596,-,-,0.3,0.4,42,10,5,8,0.35,0,29,7,5,1,0,0.1,0.2,0.2,0.5,1,1,0,0,0,0,1,0,1,2,0,0,1,0,0,1,0,0,1,1,0.0,0.267262


#### Sort pockets based on their drugScore

In [9]:
# Sort the detected pockets by descending drugScore
metadata.sort_values(by='drugScore', ascending=False)

name,lig_cov,poc_cov,lig_name,volume,enclosure,surface,depth,surf/vol,lid/hull,ellVol,ellc/a,ellb/a,siteAtms,accept,donor,hydrophobic_interactions,hydrophobicity,metal,Cs,Ns,Os,Ss,Xs,negAA,posAA,polarAA,apolarAA,ALA,ARG,ASN,ASP,CYS,GLN,GLU,GLY,HIS,ILE,LEU,LYS,MET,PHE,PRO,SER,THR,TRP,TYR,VAL,simpleScore,drugScore
P_0,0.0,0.0,"""""",1422.66,0.1,1673.75,19.26,1.176493,-,-,0.13,0.67,288,86,40,71,0.36,0,198,45,41,4,0,0.1,0.13,0.24,0.53,4,5,2,5,2,2,1,5,0,3,12,3,2,3,3,1,2,1,1,5,0.63,0.810023
P_1,0.0,0.0,"""""",708.99,0.13,1030.19,14.32,1.453039,-,-,0.14,0.59,140,44,13,34,0.37,0,98,17,25,0,0,0.14,0.11,0.36,0.39,3,1,1,0,0,0,4,4,0,1,4,2,0,1,1,2,2,0,1,1,0.46,0.755915
P_3,0.0,0.0,"""""",244.16,0.04,514.94,14.34,2.109027,-,-,0.12,0.31,92,18,7,24,0.49,0,67,12,13,0,0,0.12,0.06,0.41,0.41,0,0,1,1,1,0,1,1,1,0,1,0,0,2,3,2,1,0,1,1,0.12,0.572013
P_2,0.0,0.0,"""""",286.21,0.18,462.29,11.83,1.615213,-,-,0.23,0.44,70,20,9,13,0.31,0,50,9,9,2,0,0.12,0.06,0.18,0.65,1,0,0,2,0,1,0,0,0,1,1,1,2,1,2,0,1,0,1,3,0.09,0.537137
P_4,0.0,0.0,"""""",169.28,0.21,373.47,11.49,2.206226,-,-,0.12,0.16,59,14,3,24,0.59,0,44,7,7,1,0,0.08,0.08,0.17,0.67,0,0,0,1,0,1,0,0,0,3,0,1,2,1,2,0,1,0,0,0,0.07,0.424397
P_5,0.0,0.0,"""""",166.59,0.16,347.46,11.99,2.085719,-,-,0.14,0.24,58,18,8,10,0.28,0,39,8,11,0,0,0.15,0.08,0.46,0.31,0,0,1,1,0,2,1,1,1,1,1,0,1,0,1,1,0,0,1,0,0.0,0.401384
P_6,0.0,0.0,"""""",155.14,0.0,146.79,11.39,0.946178,-,-,0.37,0.62,81,16,6,1,0.04,0,55,10,15,1,0,0.09,0.14,0.23,0.55,3,2,0,2,0,0,0,0,1,1,1,0,2,0,1,3,1,2,1,2,0.0,0.399009
P_8,0.0,0.0,"""""",105.98,0.23,206.04,10.16,1.94414,-,-,0.15,0.2,37,6,4,11,0.52,0,28,5,4,0,0,0.18,0.18,0.09,0.55,0,1,0,1,0,0,1,0,0,2,1,1,0,0,2,0,0,1,1,0,0.0,0.308046
P_9,0.0,0.0,"""""",104.32,0.09,202.13,9.2,1.937596,-,-,0.3,0.4,42,10,5,8,0.35,0,29,7,5,1,0,0.1,0.2,0.2,0.5,1,1,0,0,0,0,1,0,1,2,0,0,1,0,0,1,0,0,1,1,0.0,0.267262
P_7,0.0,0.0,"""""",116.03,0.27,227.72,8.04,1.962596,-,-,0.26,0.57,31,12,8,4,0.17,0,20,6,5,0,0,0.3,0.1,0.2,0.4,0,1,0,0,0,1,3,0,0,1,1,0,0,0,1,0,0,1,1,0,0.0,0.224937


In [10]:
def select_best_pocket(metadata, by='drugScore'):
    """
    Select the best pocket among the detected pockets, by default the one with the highest drugScore.
    
    Parameters
    ----------
    metadata : pd.DataFrame
        Pockets retrieved from the DoGSiteScorer web server.
    by : str
        Column name to sort the table by: 'drugScore' (default) or 'volume'.
        
    Returns 
    -------
    str
        Best binding site name.
    """
    
    by_methods = ['drugScore', 'volume']
    
    if by not in by_methods:
        raise ValueError(f'Selection method not in list: {", ".join(by_methods)}')
    
    # Sort pockets by the chosen column, largest value first
    sorted_pocket = metadata.sort_values(by=by, ascending=False)
                         
    # Get name of best pocket (index label of the first row)
    best_pocket_name = sorted_pocket.index[0]
        
    return best_pocket_name

In [11]:
# Get the name of the best pocket
best_pocket = select_best_pocket(metadata, by='drugScore')
best_pocket

'P_0'

#### Get best binding site file content

In [12]:
def get_pocket_locations(job_location):
    """
    Get all pocket file locations for a finished DoGSiteScorer job.
    
    Parameters
    ----------
    job_location : str
        URL of finished job submitted to the DoGSiteScorer web server.
    
    Returns
    -------
    list
        List of all pocket file locations on the DoGSiteScorer web server.
    """
    
    # Get job results
    result = requests.get(job_location)
    
    # Get list of URLs pointing to the pocket residue files
    return result.json()['residues']

In [13]:
def get_best_pocket_location(pocket_files, best_pocket):
    """
    Get the best binding site file location.
    
    Parameters
    ----------
    pocket_files : list
        List of all pocket file locations on the DoGSiteScorer web server.
    best_pocket : str
        Best binding site name.

    Returns
    -------
    str
        Best pocket file location on the DoGSiteScorer web server.
    """
    # Keep only the file(s) whose name contains the best pocket's name
    result = [pocket_file for pocket_file in pocket_files if f'{best_pocket}_res' in pocket_file]
            
    # Exactly one file is expected per pocket
    if len(result) > 1:
        raise ValueError(f'Multiple strings detected: {", ".join(result)}.')
    elif len(result) == 0:
        raise ValueError('No string detected.')
            
    return result[0]

In [14]:
# Get URL for all PDB files
pocket_files = get_pocket_locations(job_location)
pocket_files

['https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_8_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_0_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_1_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_2_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_3_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_4_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_5_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_6_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_7_res.pdb',
 'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_9_res.pdb']

In [15]:
# Get URL for PDB file containing the best pocket
best_pocket_location = get_best_pocket_location(pocket_files, best_pocket)
best_pocket_location

'https://proteins.plus/results/dogsite/nCrcoastyQTs7MnaGLesMNYF/3w32_P_0_res.pdb'

#### Get residue IDs and names of best pocket

In [16]:
def get_pocket_residues(pocket_location):
    """
    Gets residue IDs and names of a pocket.
    
    Parameters
    ----------
    pocket_location : str
        Best pocket file location on the DoGSiteScorer web server.
        
    Returns
    -------
    pandas.DataFrame
        Table of residue IDs and names for the pocket (one row per atom, so residues are repeated).
    """
    
    # Retrieve PDB file content from URL
    result = requests.get(pocket_location)

    # Get content of PDB file  
    pdb_residues = result.text
    
    # Load PDB format as DataFrame (note: `_construct_df` is a private biopandas method)
    ppdb = PandasPdb()
    pdb_df = ppdb._construct_df(pdb_residues.splitlines(True))['ATOM']
    
    return pdb_df[['residue_number', 'residue_name']]

In [17]:
# Get residues of best pocket
pocket_residues = get_pocket_residues(best_pocket_location)
pocket_residues

,residue_number,residue_name
0,701,GLN
1,701,GLN
2,701,GLN
3,701,GLN
4,702,ALA
...,...,...
283,1017,LEU
284,1017,LEU
285,1017,LEU
286,1017,LEU
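
The table above contains one row per atom, so each residue appears several times. For a residue-level comparison, the duplicates can be dropped. The sketch below uses a hypothetical per-atom excerpt with the same two columns; on the real data, the same call could be applied to `pocket_residues`.

```python
import pandas as pd

# Hypothetical per-atom excerpt shaped like the table above
atoms = pd.DataFrame(
    {
        "residue_number": [701, 701, 701, 702, 702, 1017],
        "residue_name": ["GLN", "GLN", "GLN", "ALA", "ALA", "LEU"],
    }
)

# Keep one row per residue and renumber the index
residues = atoms.drop_duplicates().reset_index(drop=True)
print(residues)
```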


### Get KLIFS pocket

We are using the `opencadd.databases.klifs` module to extract the binding site residues (PDB residue numbering) as defined in the KLIFS database.

Please refer to __Talktorial T012__ for detailed information about KLIFS and the `opencadd.databases.klifs` module usage.

In [18]:
session = setup_remote()


In [None]:
# Get first structure KLIFS ID associated with PDB ID
structures = session.structures.by_structure_pdb_id(pdb_id)
structure_klifs_id = structures["structure.klifs_id"].iloc[0]
# Get the structure's pocket
pocket = session.pockets.by_structure_klifs_id(structure_klifs_id)
pocket
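
The coverage of the KLIFS pocket by the DoGSiteScorer pocket, i.e. the percentage of KLIFS residues that are also part of the detected pocket, can be sketched with plain set operations. This is a minimal illustration, assuming both pockets are available as collections of PDB residue numbers; the function name `pocket_coverage` and the residue IDs below are hypothetical, and extracting the residue IDs from the KLIFS `pocket` table is not shown because it depends on the column names returned by `opencadd`.

```python
def pocket_coverage(query_residue_ids, reference_residue_ids):
    """Percentage of reference residues that also occur in the query pocket."""
    query = set(query_residue_ids)
    reference = set(reference_residue_ids)
    if not reference:
        raise ValueError("Reference pocket is empty.")
    return 100 * len(query & reference) / len(reference)


# Hypothetical residue IDs, for illustration only
dogsite_residue_ids = [701, 702, 718, 719, 745, 1017]
klifs_residue_ids = [718, 719, 745, 790]

coverage = pocket_coverage(dogsite_residue_ids, klifs_residue_ids)
print(f"{coverage:.1f}%")
```

Using sets makes the result independent of the per-atom duplicates in the DoGSiteScorer residue table.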

## Discussion

Wrap up the talktorial's content here and discuss pros/cons and open questions/challenges.

## Quiz

Ask three questions that the user should be able to answer after doing this talktorial. Choose important take-aways from this talktorial for your questions.

1. Question
2. Question
3. Question