# Talktorial 11 (part C)

# CADD web services that can be used via a Python API

__Developed at AG Volkamer, Charité__

Dr. Jaime Rodríguez-Guerra

## Aim of this talktorial

> This is part C of the "Online webservices" talktorial:
>
> - 11a. Querying KLIFS & PubChem for potential kinase inhibitors
> - 11b. Docking the candidates against the target obtained in 11a
> - __11c. Assessing the results and comparing against known data__

After obtaining the input structures and docking them, we will assess whether the docking results are plausible.

## Learning goals

### Theory

- Protein-ligand interactions
- False positives in docking

### Practical

- Visualize the results
- Run automated analysis


## References

Pending.


***

## Theory

### Protein-ligand interactions

Pending.

### False positives

Pending.

***

## Practical

### Visualize the results

We will use `nglview` for that! It is a web-based molecular viewer that runs inside Jupyter notebooks, and it can read `PDBQT` files out of the box. However, it only loads the first model of a multi-model file; we will see how to deal with that below.

To install `nglview`, run `conda install -c conda-forge nglview` (or `pip install nglview`).

We will use `ipywidgets` to create an interactive GUI in the notebook. That way, we can click on the different poses and the viewer will be refreshed accordingly. In particular, we want our little GUI to:

- Show the list of poses and their affinities, as reported in the Vina output
- Show the protein structure with a ribbon representation, the ligand with ball and stick, and the surrounding residues with licorice (stick-only)
- Update the 3D visualization when the user selects a different pose in the list

So, this means that we need to:

1. Invoke the NGL viewer with the appropriate representations
2. Build an interactive table of results (hint: use `ipywidgets.Select`)
3. Write an event handler that can communicate with the NGL Viewer: when the user clicks on a new entry, update the ligand in display, along with the surrounding residues. The ribbon should not need to be updated.
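The plan above can be sketched without any widget library, which makes the event logic easy to follow. This is a minimal sketch under a few assumptions: the `(label, value)` option tuples mirror what `ipywidgets.Select` accepts, the `change` dictionary only mimics the `name`/`old`/`new` keys that `ipywidgets` passes to observers, and `RecordingViewer` and `handle_change` are hypothetical names (not part of `nglview`) for a stand-in that only records which components are visible.

```python
def make_options(affinities):
    """Build (label, value) pairs; values start at 1 to match ligand component indices."""
    return [(f"#{i} {aff} kcal/mol", i) for i, aff in enumerate(affinities, 1)]


class RecordingViewer:
    """Hypothetical stand-in for the NGL widget: it only tracks visible components."""

    def __init__(self, n_ligands):
        # Component 0 is the protein; components 1..n_ligands are the poses
        self.visible = set(range(n_ligands + 1))

    def hide(self, indices):
        self.visible -= set(indices)

    def show(self, index):
        self.visible.add(index)


def handle_change(viewer, n_ligands, change):
    """Show only the selected pose; ignore other traits and repeated clicks."""
    if change["name"] != "value" or change["new"] == change["old"]:
        return False
    viewer.hide(range(1, n_ligands + 1))  # hide every ligand at once
    viewer.show(change["new"])            # then reveal the chosen one
    return True


affinities = [-8.4, -7.6, -6.0, -5.8]
options = make_options(affinities)
viewer = RecordingViewer(len(affinities))
handle_change(viewer, len(affinities), {"name": "value", "old": None, "new": 1})
print(options[0])              # ('#1 -8.4 kcal/mol', 1)
print(sorted(viewer.visible))  # [0, 1]
```

The protein (component 0) is never hidden, which is why the ribbon does not need to be redrawn when the selection changes.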

In [1]:
import pandas as pd
import nglview as nv
from IPython.display import display
# ipywidgets must be v7.5+ to provide AppLayout
from ipywidgets import AppLayout, Layout, Select


The PDBQT file created by Vina contains several models, but `nglview` will only parse the first one. The workaround is simple: split the file into individual models whenever an `ENDMDL` line is found.

In [2]:
def split_pdbqt(path):
    """
    Split a multimodel PDBQT into separate files.
    """
    files = []
    with open(path) as f:
        lines = []
        i = 0
        for line in f:
            lines.append(line)
            if line.strip() == 'ENDMDL':
                fn = f'data/results.{i}.pdbqt'
                with open(fn, 'w') as o:
                    o.write(''.join(lines))
                files.append(fn)
                i += 1
                lines = []
    return files
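The same splitting logic can be expressed without writing to disk, which makes it easy to check on a toy input. This is a sketch assuming that each model ends with an `ENDMDL` record, as in Vina output; the helper name `split_models` is hypothetical, and the toy records are shortened (no atom lines) for illustration.

```python
def split_models(text):
    """Return a list of model blocks, each ending with its ENDMDL line."""
    models, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "ENDMDL":
            models.append("".join(current))
            current = []
    # Lines after the last ENDMDL (if any) are discarded, as in split_pdbqt
    return models


toy = (
    "MODEL 1\n"
    "REMARK VINA RESULT:      -8.4      0.000      0.000\n"
    "ENDMDL\n"
    "MODEL 2\n"
    "REMARK VINA RESULT:      -7.6      1.475      1.941\n"
    "ENDMDL\n"
)
models = split_models(toy)
print(len(models))                # 2
print(models[1].splitlines()[0])  # MODEL 2
```

Writing each block to its own file, as `split_pdbqt` does, is then just a loop over the returned list.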

The Vina output is a plain text file that contains the table of results, so parsing it is relatively straightforward. We return a pandas DataFrame, which is easy to display and to select columns from.

In [3]:
def parse_output(out):
    """
    Create a DataFrame out of the Vina output file
    """
    with open(out) as f:
        data = []
        for line in f:
            if line.startswith('-----+'):
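                # Result rows follow the dashed separator; stop at the first
                # line that is blank or does not start with a mode number
                line = next(f, '')
                while line.split() and line.split()[0].isdigit():
                    index, *floats = line.split()
                    data.append([int(index)] + list(map(float, floats)))
                    line = next(f, '')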
    return pd.DataFrame.from_records(data, 
                                     columns=['Mode', 'Affinity (kcal/mol)', 'RMSD (l.b.)', 'RMSD (u.b.)'], 
                                     exclude=['Mode'])

With this function we can convert this text file:

In [14]:
!cat data/vina.out

#################################################################
# If you used AutoDock Vina in your work, please cite:          #
#                                                               #
# O. Trott, A. J. Olson,                                        #
# AutoDock Vina: improving the speed and accuracy of docking    #
# with a new scoring function, efficient optimization and       #
# multithreading, Journal of Computational Chemistry 31 (2010)  #
# 455-461                                                       #
#                                                               #
# DOI 10.1002/jcc.21334                                         #
#                                                               #
# Please see http://vina.scripps.edu for more information.      #
#################################################################

Reading input ... done.
Setting up the scoring function ... done.
Analyzing the binding site ... done.
Using random seed: -4

... into this nicely formatted `pandas` DataFrame:

In [15]:
parse_output("data/vina.out")

|   | Affinity (kcal/mol) | RMSD (l.b.) | RMSD (u.b.) |
|---|---------------------|-------------|-------------|
| 0 | -8.4                | 0.0         | 0.0         |
| 1 | -7.6                | 1.475       | 1.941       |
| 2 | -6.0                | 5.211       | 9.34        |
| 3 | -5.8                | 1.457       | 2.153       |


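Besides the mode number, the Vina results table contains three columns:

1. The estimated binding affinity, in kcal/mol (more negative values mean stronger predicted binding)
2. The RMSD (in Å) with respect to the best-scoring pose (the most negative affinity), computed as a lower bound that accounts for symmetry-equivalent atoms (`RMSD (l.b.)`)
3. The same RMSD computed as an upper bound, matching each atom only with itself and therefore ignoring symmetry (`RMSD (u.b.)`)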
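The difference between the two RMSD columns can be illustrated with a small sketch of the definitions given in the AutoDock Vina documentation, as we understand them: the upper bound compares each atom with itself, while the lower bound matches each atom with the closest atom of the same element in the other pose and takes the larger of the two directions. The toy coordinates are invented, and Vina itself restricts the calculation to movable heavy atoms, which this sketch ignores.

```python
import math


def rmsd_ub(pose_a, pose_b):
    """Upper bound: atom i in pose_a is compared with atom i in pose_b."""
    total = sum(math.dist(a, b) ** 2 for (_, a), (_, b) in zip(pose_a, pose_b))
    return math.sqrt(total / len(pose_a))


def _rmsd_closest(pose_a, pose_b):
    """Match each atom in pose_a with the closest same-element atom in pose_b."""
    total = 0.0
    for element, a in pose_a:
        total += min(math.dist(a, b) for el, b in pose_b if el == element) ** 2
    return math.sqrt(total / len(pose_a))


def rmsd_lb(pose_a, pose_b):
    """Lower bound: the larger of the two closest-atom RMSDs."""
    return max(_rmsd_closest(pose_a, pose_b), _rmsd_closest(pose_b, pose_a))


# Toy symmetric ligand: the two oxygens simply swap places between poses
pose_1 = [("C", (0.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0))]
pose_2 = [("C", (0.0, 0.0, 0.0)), ("O", (-1.0, 0.0, 0.0)), ("O", (1.0, 0.0, 0.0))]
print(rmsd_ub(pose_1, pose_2))  # about 1.633: the swap looks like a large move
print(rmsd_lb(pose_1, pose_2))  # 0.0: symmetry-equivalent atoms overlap
```

This is why the lower bound can never exceed the upper bound: matching to the closest same-element atom is at least as good as matching each atom with itself.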

We are only interested in the affinity, so the following cells only keep that column of the DataFrame.
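To see what `parse_output` expects to find in the file, here is a stdlib-only sketch that parses the same kind of table from a string. It assumes the usual Vina log layout (a dashed `-----+` separator followed by one row per mode); the shortened log is illustrative, with values copied from the DataFrame shown above, and the name `parse_vina_table` is hypothetical.

```python
def parse_vina_table(text):
    """Return (mode, affinity, rmsd_lb, rmsd_ub) tuples from a Vina log string."""
    rows = []
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("-----+"):
            # The inner loop consumes the same iterator, row by row
            for row in lines:
                fields = row.split()
                # Stop at the first line that does not start with a mode number
                if not fields or not fields[0].isdigit():
                    break
                mode, *values = fields
                rows.append((int(mode), *map(float, values)))
    return rows


log = """mode |   affinity | dist from best mode
     | (kcal/mol) | rmsd l.b.| rmsd u.b.
-----+------------+----------+----------
   1         -8.4      0.000      0.000
   2         -7.6      1.475      1.941
Writing output ... done.
"""
for row in parse_vina_table(log):
    print(row)
# (1, -8.4, 0.0, 0.0)
# (2, -7.6, 1.475, 1.941)
```

Keeping only the affinity is then a matter of taking the second element of each tuple.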

***

Now we create the NGL viewer instance. Instead of creating a new one for each protein-pose pair, we reuse the same canvas throughout, hiding or showing ligands as needed. We load everything first, labeling each ligand with its affinity. Each molecule loaded into the viewer is called a "component". The protein is loaded first, so it will be `component_0`; the ligands follow as `component_1`, `component_2`, and so on.

In [4]:
def create_viewer(protein, ligands, affinities):
    """
    Create an nglview widget with the protein and all the ligands labeled by affinity
    """
    viewer = nv.show_file(protein)
    # Select first atom in molecule (@0) so it holds the affinity label
    label_kwargs = dict(labelType="text", sele="@0", showBackground=True, backgroundColor="black")
    for ligand, affinity in zip(ligands, affinities):
        ngl_ligand = viewer.add_component(ligand)
        ngl_ligand.add_label(labelText=[str(affinity)], **label_kwargs)
    return viewer
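The component numbering produced by `create_viewer` can be written down explicitly. This is a hypothetical helper (not part of `nglview`) that assumes exactly one protein component loaded first, followed by one component per pose; it returns the attribute names used in this notebook (`component_0`, `component_1`, ...) and the indices that would be passed when hiding all ligands.

```python
def component_layout(n_poses):
    """Map the loading order used here to nglview-style component names."""
    protein = "component_0"  # the protein file is loaded first
    ligands = {pose: f"component_{pose}" for pose in range(1, n_poses + 1)}
    ligand_indices = list(ligands)  # 1..n_poses, the indices to hide or show
    return protein, ligands, ligand_indices


protein, ligands, ligand_indices = component_layout(4)
print(ligands[1])      # component_1
print(ligand_indices)  # [1, 2, 3, 4]
```

Because pose numbers start at 1, the pose number doubles as the component index, so no extra lookup table is needed in the GUI.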

Finally, in the cell below we build the actual GUI!

It is composed of two widgets arranged horizontally with the `ipywidgets.AppLayout` layout:

- The selector (`ipywidgets.Select`)
- The NGL viewer itself

When the user clicks on a new entry in the selector, `_on_selection_change` is called, which will:

1. Check whether the new value differs from the previous one. If it does:
2. Hide all ligands (the simplest way to hide the previous one, with no need to track it individually)
3. Show the new one and center the camera on it with a cool 500 ms animation
4. Execute some JavaScript in the NGL viewer to display the residues within 5 Å of the center of the new pose.
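Step 4 delegates the geometry to NGL (`atomCenter`, `getAtomSetWithinPoint`, `getAtomSetWithinGroup`). The same idea in plain Python: compute the ligand center, find protein atoms within the cutoff, and expand the hits to whole residues. This is only an approximation of the JavaScript, assuming the center is the mean of the ligand coordinates (NGL may compute its center differently); `residues_around` is a hypothetical name and the atoms and residue IDs below are invented toy data.

```python
import math


def residues_around(ligand_xyz, protein_atoms, radius=5.0):
    """Return IDs of residues with at least one atom within `radius` of the ligand center.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples, coordinates in Å.
    """
    n = len(ligand_xyz)
    center = tuple(sum(coord[k] for coord in ligand_xyz) / n for k in range(3))
    # Any atom within the cutoff selects its whole residue (like getAtomSetWithinGroup)
    return sorted({res for res, xyz in protein_atoms if math.dist(xyz, center) <= radius})


ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # center at (1, 0, 0)
protein = [
    ("ALA12", (4.0, 0.0, 0.0)),  # 3 Å from the center
    ("ALA12", (9.0, 0.0, 0.0)),  # far away, but same residue as a close atom
    ("GLY40", (1.0, 7.0, 0.0)),  # 7 Å away
    ("LYS33", (1.0, 0.0, 4.5)),  # 4.5 Å away
]
print(residues_around(ligand, protein))  # ['ALA12', 'LYS33']
```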

In [20]:
# JavaScript code needed to update residues around the ligand
# because this part is not exposed in the Python widget
# Based on: http://nglviewer.org/ngl/api/manual/snippets.html
_RESIDUES_AROUND = """
var protein = this.stage.compList[0];
var ligand_center = this.stage.compList[{index}].structure.atomCenter();
var around = protein.structure.getAtomSetWithinPoint(ligand_center, {radius});
var around_complete = protein.structure.getAtomSetWithinGroup(around);
var last_repr = protein.reprList[protein.reprList.length-1];
protein.removeRepresentation(last_repr);
protein.addRepresentation("licorice", {{sele: around_complete.toSeleString()}});
"""

def show_docking(protein, ligands, vina_output):
    # Split the multi PDBQT ligand file into separate files
    ligands_files = split_pdbqt(ligands)
    # Retrieve affinities (we only need that column of the dataframe)
    affinities = parse_output(vina_output)['Affinity (kcal/mol)']
                                
    # Create viewer widget
    viewer = create_viewer(protein, ligands_files, affinities)
    
    # Create selection widget
    #   Options is a list of (text, value) tuples. When we click on select, the value will be passed
    #   to the callable registered in `.observe(...)`
    selector = Select(options=[(f"#{i} {aff} kcal/mol", i) for (i, aff) in enumerate(affinities, 1)],
                      description="",  rows=len(ligands_files), layout=Layout(width="auto"))
                 
    # Arrange GUI elements
    # The selection box will be on the left, the viewer will occupy the rest of the window
    display(AppLayout(left_sidebar=selector, center=viewer, pane_widths=[1, 6, 1]))
    
    # This is the event handler - action taken when the user clicks on the selection box
    # We need to define it here so it can "see" the viewer variable
    def _on_selection_change(change):
        # Update only if the user clicked on a different entry
        if change['name'] == 'value' and (change['new'] != change['old']):
            viewer.hide(list(range(1,len(ligands_files) + 1)))  # Hide all ligands
            component = getattr(viewer, f"component_{change['new']}")
            component.show()  # Display the selected one
            component.center(duration=500)  # Zoom in on the pose with a 500 ms animation
            # Call the JS code to show sidechains around ligand
            viewer._execute_js_code(_RESIDUES_AROUND.format(index=change['new'], radius=5))
    
    # Register event handler
    selector.observe(_on_selection_change)
    # Trigger event manually to focus on the first solution
    _on_selection_change({'name': 'value', 'new': 1, 'old': None})

    return viewer

In the function above we are doing some advanced Python. If you are interested, you can click on the arrow below to show some more details.


<details>
    
<summary>
   Advanced Python explanation
</summary>
    
> The first thing you might have noticed is that we are mixing JavaScript and Python code! This is possible thanks to the `ipywidgets` infrastructure, as exposed in `nglview.NGLWidget._execute_js_code`. You can pass it a string containing JavaScript code, and it will be executed in the scope of the NGL widget (`this` refers to the widget, so `this.stage` is the NGL stage that owns the interactive canvas). To make the code parameterizable, we have added template placeholders (`{index}` and `{radius}`) that are filled in with `str.format` on each call to `_on_selection_change`; this is also why the literal JavaScript braces are doubled (`{{ }}`).
>
> This function, `_on_selection_change`, is the glue that ties user interactions (clicks on the selection box) to the Python world. It will be called each time you click on the selection box, with a single argument `change`: a dictionary containing, among other things, the name of the changed attribute and both the old and new values (so you can compare them and act accordingly). However, we also need `viewer` to be available in that function... and [we cannot pass additional values](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Events.html#Traitlet-events)!
>
> One way to make `viewer` available in that function is to nest its definition within our `show_docking` function. That way, the handler can access `viewer` from the enclosing scope (this is called a closure) instead of relying on a global variable in the notebook. You can find [more information about scopes and closures in this post](https://medium.com/@dannymcwaves/a-python-tutorial-to-understanding-scopes-and-closures-c6a3d3ba0937).

</details>
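Both tricks from the explanation above fit in a few lines. In this sketch, `make_handler` is a hypothetical factory (not part of `ipywidgets` or `nglview`): the inner function closes over `run_js`, a stand-in for the viewer's JavaScript hook, and fills a simplified, made-up template with `str.format`, where doubled braces come out as the literal braces JavaScript needs. The JavaScript is only collected as strings, never executed.

```python
TEMPLATE = "var lig = this.stage.compList[{index}]; show({{radius: {radius}}});"


def make_handler(run_js, radius=5):
    """Return an observer that can use `run_js` although observers only receive `change`."""

    def _on_change(change):
        if change["name"] == "value" and change["new"] != change["old"]:
            # {index} and {radius} are substituted; {{ and }} become { and }
            run_js(TEMPLATE.format(index=change["new"], radius=radius))

    return _on_change


sent = []  # collects the generated JavaScript instead of executing it
handler = make_handler(sent.append)
handler({"name": "value", "old": 1, "new": 3})
print(sent[0])
# var lig = this.stage.compList[3]; show({radius: 5});
```

The factory plays the same role as nesting `_on_selection_change` inside `show_docking`: each handler keeps its own reference to `run_js` and `radius`.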



In [13]:
viewer = show_docking("data/protein.mol2", "data/results.pdbqt", "data/vina.out")


## Discussion

Pending.

## Quiz

Pending.