# T016 · Protein-ligand interactions

Authors:

- Jaime Rodríguez-Guerra, [Volkamer lab](https://volkamerlab.org)
- Dominique Sydow, [Volkamer lab](https://volkamerlab.org)
- Michele Wichmann, 2019-2020, Charité/FU Berlin
- Talia B. Kimber, [Volkamer lab](https://volkamerlab.org)
- Andrea Volkamer, [Volkamer lab](https://volkamerlab.org)

## Aim of this talktorial

We use the Protein–Ligand Interaction Profiler, or PLIP, to get insight into protein-ligand interactions for any sample complex and visualize the interactions in 3D using NGLView.

### Contents in _Theory_

- Protein-ligand interactions
- Visualization: complex and interactions

### Contents in _Practical_

- Profiling protein-ligand interactions using PLIP
- Visualization with NGLView

### References

- Review on protein-ligand interactions ([_Int. J. Mol. Sci._ (2016), __17__, 144](https://www.mdpi.com/1422-0067/17/2/144))
- A systematic analysis of non-covalent interactions in the PDB database ([_Med. Chem. Commun._ (2017), __8__, 1970-1981](https://pubs.rsc.org/en/content/articlelanding/2017/md/c7md00381a#!divAbstract))
- A chapter about how protein–ligand interactions are key for drug action (in [Klebe G. (eds) Drug Design. Springer, Berlin, Heidelberg.](https://link.springer.com/referenceworkentry/10.1007%2F978-3-642-17907-5_4))
- NGLView, the interactive visualizer for Jupyter notebooks ([_Bioinformatics_ (2018), __34__, 1241–124](https://doi.org/10.1093/bioinformatics/btx789))
- PLIP, the Protein–Ligand Interaction Profiler ([_Nucl. Acids Res._ (2015), __43__, W1, W443-W447](https://academic.oup.com/nar/article/43/W1/W443/2467865))

## Theory

### Protein-ligand interactions

Ligand binding is mainly governed by non-covalent interactions between the ligand and the surface of the protein pocket or protein-protein interface. This process depends on electrostatic and shape complementarity, induced fit, desolvation, and more.

Some quotes from the literature:

Adapted from [José L. Medina-Franco, Oscar Méndez-Lucio, Karina Martinez-Mayorga](https://www.sciencedirect.com/science/article/pii/S1876162314000029):

> Understanding protein–ligand interactions (PLIs) and protein–protein interactions (PPIs) is at the core of molecular recognition and has a fundamental role in many scientific areas. PLIs and PPIs have a broad area of practical applications in drug discovery including but not limited to molecular docking, structure-based design, virtual screening of molecular fragments, small molecules, and other types of compounds, clustering of complexes, and structural interpretation of activity cliffs, to name a few.

These interactions can be rationalized in several ways, which opens the door to systematic analysis of the docking solutions. For example, as adapted from [_Med. Chem. Commun._ (2017), __8__, 1970-1981](https://pubs.rsc.org/en/content/articlelanding/2017/md/c7md00381a#!divAbstract):

> We extracted from the PDB all X-ray structures of small-molecules in complex with proteins, with a resolution ≤2.5 Å, resulting in a collection of 11,016 complexes. To be considered as a ligand, the compound had to meet several criteria such as being a small molecule and be of interest for medicinal chemistry applications (buffers or part of crystallization cocktails were excluded). This collection contained 750,873 ligand–protein atom pairs, where a pair of atoms is defined as two atoms separated by 4 Å or less. The top-100 most frequent ligand–protein atom pairs can be clustered into seven interaction types (see figure below). Among the most frequently observed are interactions that are well known and widely used in ligand design such as hydrophobic contacts, hydrogen bonds and π-stacking. These are followed by weak hydrogen bonds, salt bridges, amide stacking, and cation–π interactions.

![protein ligand non-covalent interactions](images/protein_ligand_non_covalent.gif)

_Figure 1_ : Frequency of non-covalent interactions in the PDB database. Figure taken from the paper by Renato Ferreira de Freitas and Matthieu Schapira, [A systematic analysis of atomic protein–ligand interactions in the PDB](https://doi.org/10.1039/C7MD00381A).
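
The atom-pair definition quoted above (two atoms separated by 4 Å or less) can be illustrated with a minimal sketch. The function name `count_contact_pairs` and the coordinates are made up for illustration; this is not the code used in the cited study.

```python
from math import dist


def count_contact_pairs(ligand_atoms, protein_atoms, cutoff=4.0):
    """Count ligand-protein atom pairs whose distance is <= cutoff (in Å)."""
    return sum(
        1
        for lig_xyz in ligand_atoms
        for prot_xyz in protein_atoms
        if dist(lig_xyz, prot_xyz) <= cutoff
    )


# Made-up coordinates (in Å), for illustration only
ligand_atoms = [(0.0, 0.0, 0.0)]
protein_atoms = [(3.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

n_pairs = count_contact_pairs(ligand_atoms, protein_atoms)
print(n_pairs)  # 2: the atoms at 3 Å and 4 Å count, the one at 5 Å does not
```

A brute-force double loop like this is fine for a single binding site; for whole-database analyses, a spatial index would likely be needed.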

There are several programs to assess protein-ligand interactions in an automated way. [PLIP](https://plip.biotec.tu-dresden.de/plip-web/plip/index) is one of the most popular thanks to its publicly available webserver and free-to-use Python library. The [supporting information](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489249/bin/supp_gkv315_nar-00254-web-b-2015-File003.pdf) accompanying the manuscript describes protein-ligand interactions in a simple way that is very easy to understand. As an introduction, this paragraph is enough:

> PLIP uses a rule-based system for detection of non-covalent interactions between protein
residues and ligands. Information on chemical groups able to participate in a specific interaction
(e.g. requirements for hydrogen bond donors) and interaction geometry (e.g. distance and
angle thresholds) from literature are used to detect characteristics of non-covalent interactions
between contacting atoms of protein and ligands. For each binding site, the algorithm searches
first for atoms or atom groups in the protein and ligand which could possibly be partner in
specific interactions. Subsequently, geometric rules are applied to match groups in protein
and ligand forming an interaction.

![aromatic interaction](images/aromatic_interaction.png)

_Figure 2_ : Aromatic stacking in protein-ligand interactions. The figure is taken from the Supplementary Data in the [PLIP paper](https://doi.org/10.1093/nar/gkv315).
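
To make the idea of a rule-based geometric check more concrete, here is a minimal sketch for a hydrogen bond, combining a distance threshold with an angle threshold. The threshold values and the helper names (`angle_at`, `looks_like_hbond`) are assumptions chosen for illustration; PLIP's actual rules and thresholds are described in its supporting information.

```python
from math import acos, degrees, dist

# Illustrative thresholds (assumptions for this sketch, not read from PLIP)
MAX_DONOR_ACCEPTOR_DIST = 4.1  # Å
MIN_DONOR_ANGLE = 100.0  # degrees


def angle_at(center, a, b):
    """Return the angle a-center-b in degrees."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos_angle = dot / (dist(a, center) * dist(b, center))
    # Clamp to [-1, 1] to guard against floating point round-off
    return degrees(acos(max(-1.0, min(1.0, cos_angle))))


def looks_like_hbond(donor, hydrogen, acceptor):
    """Apply a distance rule and an angle rule to three atom positions."""
    close_enough = dist(donor, acceptor) <= MAX_DONOR_ACCEPTOR_DIST
    straight_enough = angle_at(hydrogen, donor, acceptor) >= MIN_DONOR_ANGLE
    return close_enough and straight_enough


# Made-up coordinates (in Å)
linear = looks_like_hbond((0, 0, 0), (1, 0, 0), (2.9, 0, 0))  # short and straight
bent = looks_like_hbond((0, 0, 0), (1, 0, 0), (0, 2, 0))  # short but strongly bent
far = looks_like_hbond((0, 0, 0), (1, 0, 0), (5, 0, 0))  # straight but too far
print(linear, bent, far)  # True False False
```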

For more details, have a look at the PDF document shown below.

In [1]:
from teachopencadd.utils import show_pdf

In [2]:
pdf = "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4489249/bin/supp_gkv315_nar-00254-web-b-2015-File003.pdf"
show_pdf(pdf)

### Visualization: complex and interactions

We will use `nglview` for visualization. It is a [web-based molecular viewer](https://doi.org/10.1093/bioinformatics/btx789) that runs in Jupyter notebooks. We will first use it in a basic way to visualize a complex of interest, and then make use of `ipywidgets` layouts to visualize protein-ligand interactions.

## Practical

As a very first step, let's install a working version of PLIP:

In [3]:
# !pip install -U https://github.com/pharmai/plip/archive/development.tar.gz --no-deps

In [4]:
# imports
from pathlib import Path
from warnings import filterwarnings
import time
import pandas as pd
import nglview as nv
import openbabel

from plip.structure.preparation import PDBComplex
from plip.exchange.report import BindingSiteReport



In [5]:
HERE = Path(_dh[-1])
DATA = HERE / "data"

As a showcase for this notebook, we choose the EGFR kinase, using the PDB structure with the ID `3poz`. Let's use `nglview` to visualize the structure in a notebook cell.

_Note_: the complex can easily be changed by adapting the PDB ID in the cell below.

In [6]:
pdb_id = "3poz"

In [7]:
ngl_viewer = nv.show_pdbid(pdb_id)
ngl_viewer

NGLWidget()

### Profiling protein-ligand interactions using PLIP

PLIP offers [a webserver](https://projects.biotec.tu-dresden.de/plip-web/plip) for automated analysis, but unfortunately there is no API. We could try to use the HTML forms as if we were using the standard web UI, but since the library itself is Python-3 ready and very easy to install with `pip`, we can just use it locally for simplicity.
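
Running PLIP locally requires the structure file on disk. The following is a minimal sketch of how the file could be fetched from the RCSB PDB file server; the helper names `pdb_download_url` and `fetch_pdb` are hypothetical and not part of PLIP or this notebook.

```python
from pathlib import Path
from urllib.request import urlretrieve


def pdb_download_url(pdb_id):
    """Build the RCSB download URL for a PDB ID."""
    return f"https://files.rcsb.org/download/{pdb_id.lower()}.pdb"


def fetch_pdb(pdb_id, directory="."):
    """Download <pdb_id>.pdb into `directory` unless it already exists."""
    target = Path(directory) / f"{pdb_id}.pdb"
    if not target.exists():
        urlretrieve(pdb_download_url(pdb_id), str(target))
    return target


url = pdb_download_url("3POZ")
print(url)
# pdb_path = fetch_pdb("3poz")  # uncomment to download (needs network access)
```

Skipping the download when the file already exists keeps repeated notebook runs from hitting the server again.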

PLIP reads the complex from a PDB file, so we pass it the file path (here `3poz.pdb`, expected in the working directory) and let it detect the interactions. The `BindingSiteReport` class processes each detected binding site in `PDBComplex` and creates an object with the eight fields we are interested in, namely

- hydrophobic interaction : `hydrophobic`
- hydrogen bond : `hbond`
- water bridge : `waterbridge`
- salt bridge : `saltbridge`
- $\pi$-stacking (parallel and perpendicular) : `pistacking`
- $\pi$-cation : `pication`
- halogen bond : `halogen`
- metal complexation : `metal`


Each field is split into `<field>_features` (containing the column names) and `<field>_info` (containing the actual records). If we iterate over the field names and retrieve the matching attributes with `getattr()`, we can compose a dictionary that can be passed to a `pandas.DataFrame` for a clear overview.

In [8]:
def analyze_interactions(pdbfile):
    """
    Retrieves the interactions from PLIP.

    Parameters
    ----------
    pdbfile : str
        Path to the PDB file of the complex.

    Returns
    -------
    dict
        A dictionary of the binding sites and their interactions.
    """
    protlig = PDBComplex()
    protlig.load_pdb(pdbfile)  # load the pdb
    for ligand in protlig.ligands:
        protlig.characterize_complex(ligand)  # find ligands and analyze interactions
    sites = {}
    for key, site in sorted(protlig.interaction_sets.items()):
        binding_site = BindingSiteReport(site)  # collect data about interactions
        # tuples of *_features and *_info will be converted to pandas data frame
        keys = (
            "hydrophobic",
            "hbond",
            "waterbridge",
            "saltbridge",
            "pistacking",
            "pication",
            "halogen",
            "metal",
        )
        interactions = {
            k: [getattr(binding_site, k + "_features")] + getattr(binding_site, k + "_info")
            for k in keys
        }
        sites[key] = interactions
    return sites

In [9]:
interactions_by_site = analyze_interactions(f"{pdb_id}.pdb")
#interactions_by_site = analyze_interactions(DATA/"3poz.pdb")
# TODO : error when running cell using DATA folder
interactions_by_site  # TODO: CI

{'03P:A:1023': {'hydrophobic': [('RESNR',
    'RESTYPE',
    'RESCHAIN',
    'RESNR_LIG',
    'RESTYPE_LIG',
    'RESCHAIN_LIG',
    'DIST',
    'LIGCARBONIDX',
    'PROTCARBONIDX',
    'LIGCOO',
    'PROTCOO'),
   (745,
    'LYS',
    'A',
    1023,
    '03P',
    'A',
    '3.91',
    2399,
    320,
    (18.317, 32.25, 10.052),
    (20.469, 34.989, 8.267)),
   (788,
    'LEU',
    'A',
    1023,
    '03P',
    'A',
    '3.66',
    2384,
    595,
    (18.404, 30.743, 6.486),
    (18.317, 33.573, 4.169)),
   (790,
    'THR',
    'A',
    1023,
    '03P',
    'A',
    '3.80',
    2398,
    611,
    (16.476, 34.203, 10.862),
    (12.875, 33.449, 9.914)),
   (854,
    'THR',
    'A',
    1023,
    '03P',
    'A',
    '3.82',
    2383,
    1138,
    (18.135, 32.543, 11.422),
    (17.798, 28.992, 12.797)),
   (858,
    'LEU',
    'A',
    1023,
    '03P',
    'A',
    '3.93',
    2384,
    1167,
    (18.404, 30.743, 6.486),
    (22.084, 30.736, 5.093))],
  'hbond': [('RESNR',
    'RESTYPE',


This dictionary is composed of two levels:

- The first level holds the detected binding sites.

- For each binding site, a sub-dictionary holds eight lists, one per interaction type. Each list contains the column names as its first element, followed by the data records (if any).
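
As a minimal sketch of how this two-level layout can be navigated, the snippet below counts the records per interaction type and binding site. The helper `count_interactions` and the toy data are made up for illustration; only the layout mirrors the dictionary above.

```python
def count_interactions(sites):
    """Return {site: {interaction_type: number_of_records}}."""
    return {
        site: {
            # The first element holds the column names, so it is not a record
            interaction_type: len(rows) - 1
            for interaction_type, rows in interactions.items()
        }
        for site, interactions in sites.items()
    }


# Toy data mimicking the layout (made-up values, only two interaction types)
toy_sites = {
    "LIG:A:1": {
        "hydrophobic": [
            ("RESNR", "RESTYPE", "DIST"),
            (745, "LYS", "3.91"),
            (788, "LEU", "3.66"),
        ],
        "hbond": [("RESNR", "RESTYPE", "DIST_D-A")],
    }
}

counts = count_interactions(toy_sites)
print(counts)  # {'LIG:A:1': {'hydrophobic': 2, 'hbond': 0}}
```

Applied to the real `interactions_by_site`, such a summary gives a quick view of which interaction types occur at each site.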

Let's see how many binding sites are detected using our complex of interest.

In [10]:
print(f"Number of binding sites detected in {pdb_id} : \n "
      f"{len(interactions_by_site)}"
     )  # TODO: CI

Number of binding sites detected in 3poz : 
 4


We can construct a `pandas.DataFrame` for a binding site and particular interaction type.

In [11]:
def create_df_from_binding_site(binding_site_id, interaction_type="hbond"):
    """
    Creates a data frame from a binding site and interaction type.
    
    Parameters
    ----------
    binding_site_id : str
        The binding site of interest.
    interaction_type : str, optional
        The interaction type of interest (default set to hydrogen bond).
        
    Returns
    -------
    DataFrame : 
        Data frame with information retrieved from PLIP.
    """
    df = pd.DataFrame.from_records(
        # data is stored AFTER the column names
        interactions_by_site[binding_site_id][interaction_type][1:],
        # column names are always the first element
        columns=interactions_by_site[binding_site_id][interaction_type][0],
    )
    return df

In the next cell, we show the hydrogen bonds from the third detected binding site.

In [12]:
create_df_from_binding_site(list(interactions_by_site.keys())[2])

| | RESNR | RESTYPE | RESCHAIN | RESNR_LIG | RESTYPE_LIG | RESCHAIN_LIG | SIDECHAIN | DIST_H-A | DIST_D-A | DON_ANGLE | PROTISDON | DONORIDX | DONORTYPE | ACCEPTORIDX | ACCEPTORTYPE | LIGCOO | PROTCOO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 996 | ASN | A | 2 | SO4 | A | False | 1.98 | 2.89 | 151.35 | True | 2223 | Nam | 2368 | O3 | (12.465, 35.968, 31.608) | (10.739, 37.795, 30.186) |
| 1 | 997 | PHE | A | 2 | SO4 | A | False | 2.12 | 3.0 | 149.36 | True | 2231 | Nam | 2365 | O2 | (13.708, 36.129, 29.523) | (12.221, 38.072, 27.779) |
| 2 | 994 | ASP | A | 2 | SO4 | A | False | 3.34 | 3.93 | 121.05 | False | 2368 | O3 | 2212 | O2 | (12.465, 35.968, 31.608) | (8.716, 36.654, 32.562) |


### Visualization with NGLView

Now, let's represent those interactions in the NGL viewer. We can draw cylinders between the interaction points (`LIGCOO` and `PROTCOO` in the `pandas.DataFrame`) and color-code them as shown in `color_map`, which uses RGB tuples.

In [13]:
color_map = {
    "hydrophobic": [0.90, 0.10, 0.29],
    "hbond": [0.26, 0.83, 0.96],
    "waterbridge": [1.00, 0.88, 0.10],
    "saltbridge": [0.67, 1.00, 0.76],
    "pistacking": [0.75, 0.94, 0.27],
    "pication": [0.27, 0.60, 0.56],
    "halogen": [0.94, 0.20, 0.90],
    "metal": [0.90, 0.75, 1.00],
}


def show_interactions_3D(pdb=pdb_id):
    """
    3D visualization of protein-ligand interactions.
    
    Parameters
    ----------
    pdb : str
        The pdb ID of interest.
    
    Returns
    -------
    NGL viewer with explicit interactions given by PLIP.
    
    """

    viewer = nv.NGLWidget(height="600px", default=True, gui=True)
    prot_component = viewer.add_pdbid(pdb)  # add protein (use the function argument)
    time.sleep(1)

    # Add interactions
    interactions_by_site = analyze_interactions(f"{pdb}.pdb")
    interacting_residues = []
    for site, interactions in interactions_by_site.items():
        for interaction_type, interaction_list in interactions.items():
            color = color_map[interaction_type]
            if len(interaction_list) == 1:
                continue
            df_interactions = pd.DataFrame.from_records(
                interaction_list[1:], columns=interaction_list[0]
            )
            for _, interaction in df_interactions.iterrows():
                name = interaction_type
                viewer.shape.add_cylinder(
                    interaction["LIGCOO"], interaction["PROTCOO"], color, [0.1], name,
                )
                interacting_residues.append(interaction["RESNR"])
    # Display interacting residues
    res_sele = " or ".join([f"({r} and not _H)" for r in interacting_residues])
    res_sele_nc = " or ".join(
        [f"({r} and ((_O) or (_N) or (_S)))" for r in interacting_residues]
    )
    prot_component.add_ball_and_stick(
        sele=res_sele, colorScheme="chainindex", aspectRatio=1.5
    )
    prot_component.add_ball_and_stick(
        sele=res_sele_nc, colorScheme="element", aspectRatio=1.5
    )

    return viewer

In [14]:
show_interactions_3D()

NGLWidget()


## Discussion

In this talktorial, we have learned about protein-ligand interactions, more specifically in the context of the Protein–Ligand Interaction Profiler, or PLIP for short. We created a data frame to show the interactions in a table. Furthermore, we used the NGL viewer to visualize these interactions in 3D, which requires a good amount of web technologies, mainly based around the NGL viewer itself and `ipywidgets` layouts.

## Quiz

- Do some interactions seem more important than others?
- What's the main difference between hydrophobic interactions and hydrogen bonds? How are they similar?
- What can be a considerable advantage of using PLIP over KLIFS?