<div class="alert alert-block alert-info">

<b>Thank you for contributing to TeachOpenCADD!</b>

</div>

<div class="alert alert-block alert-info">

<b>Set up your PR</b>: Please check out our <a href="https://github.com/volkamerlab/teachopencadd/issues/41">issue</a> on how to set up a PR for new talktorials, including standard checks and TODOs.

</div>

# · Diffusion-based docking models

**Note:** This talktorial is a part of TeachOpenCADD, a platform that aims to teach domain-specific skills and to provide pipeline templates as starting points for research projects.

Authors:

- Hamza Ibrahim, CADD seminars 2023, Universität des Saarlandes (UdS)
- Michael Bockenköhler, 2023,  [Volkamer lab](https://volkamerlab.org), Universität des Saarlandes (UdS)
- Andrea Volkamer, 2023,  [Volkamer lab](https://volkamerlab.org), Universität des Saarlandes (UdS)

## Aim of this talktorial

This talktorial presents two state-of-the-art classes of generative models. You will learn what generative models are and know the basics of two powerful classes of generative models. We explore then their potential application in  molecular docking.

### Contents in *Theory*

* Generative models
    * Denoising diffusion probabilistic model (DDPM).
        1. Forward process
        2. Reverse process
        3. DDPM training
            - Loss function
            - Network architecture
    * Score-based generative model
        1. Score model training
        2. Score model with stochastic differential equations (SDEs)

* Diffusion-based docking models.
    1. Ligand pose manifold
    2. Product space diffusion
    3. Model architecture

### Contents in *Practical*

* Data preparation.
    - Download PDB structure
    - Prepare input file
* DiffDock implementation
* Denoising visualization

### References

* Score-based generative modeling through stochastic differential equations: [<i>arXiv</i> (2021)](https://arxiv.org/pdf/2011.13456.pdf) 
* Equivariant Graph Neural Networks: [<i>arXiv</i> (2022)](https://arxiv.org/pdf/2102.09844.pdf)
* Structure-based Drug Design with Equivariant Diffusion Models: [<i>arXiv</i> (2022)](https://arxiv.org/pdf/2210.13695.pdf)
* DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking: [<i>arXiv</i> (2023)](https://arxiv.org/pdf/2210.01776v2.pdf)
* [Diffusion Model Clearly Explained!](https://medium.com/@steinsfu/diffusion-model-clearly-explained-cd331bd41166)
* Deep Unsupervised Learning using Nonequilibrium Thermodynamics: [Sohl-Dickstein et al. <i>arXiv</i> (2021)](https://arxiv.org/pdf/1503.03585.pdf)
* Generative Modeling by Estimating Gradients of the Data Distribution: [Song et al. <i>arXiv</i> (2019)](https://arxiv.org/abs/1907.05600)
* Denoising Diffusion Probabilistic Models [Ho et al. <i>arXiv</i> (2020)](https://arxiv.org/abs/2006.11239)

## Theory

## Generative models

Generative models are a category of machine learning models that have the capability to generate new data by learning the data distribution of a given training data set by injecting noise to the input. In a nutshell ***"Creating noise from data is easy; creating data from noise is generative modeling."*** [Song et al 2021](https://arxiv.org/abs/2011.13456).

In this section, we are going to discuss two advanced techniques used in generative modelling,denoising diffusion probabilistic model and score-based generative models.

### Denoising diffusion probabilistic model (DDPM).

DDPM or so called "diffusion model" is inspired from Physics by non-equilibrium thermodynamics[DDPM paper]. It learns to generate new data depending on two main reciprocal processes that represent two sets of random variables organized in the form of Markov chains.

1. Forward Diffusion Process → add noise to input data.
2. Reverse Diffusion Process → denoise noised data.

![DGM processes figure](images/basics_dgm.png)

*Figure 1:* 
Black arrows represent the forward diffusion process, while blue arrow represents the reverse diffusion process
Figure is taken from: [Medium article](https://medium.com/@steinsfu/diffusion-model-clearly-explained-cd331bd41166).

#### 1. Forward process

The first process adds guassian noise sequentially to the input data $x_0$ by $T$ steps. As $T → \infty $, $x_T$ becomes a complete static noise image as in figure [1]. So every successive state 
$\mathbb{x}_{t + 1}$ could be computed as the following : 
$$
q(\mathbb{x}_{t}|{x}_{t-1}) = \mathcal{N(\mathbb{x}_{t};\mathbb{\mu}_t = \sqrt{1 - \beta}{x}_{t-1}, \Sigma_t = \beta_t \mathbf{I})}, \tag{1}
$$
Where $q(\mathbb{x}_{t}|{x}_{t-1})$ denotes the distribution of the next state $\mathbb{x}_{t}$.

 $\mathbb{\mu}_t$ and $\Sigma_t({x}_t, t)$ represent the mean and covariance of next state distribution, respectively.

Utilizing [Reparametrization Trick](https://medium.com/@steinsfu/diffusion-model-clearly-explained-cd331bd41166#228f) closed-form formula could be derived, which prompts us to sample ${x}_{t}$ at any time step using ${x}_{0}$. It makes forward diffusion process much faster as following:
$$
{x}_{t} = \sqrt{{\bar{\alpha}}_t} {x}_0 + \sqrt{1 - {\bar\alpha}_t} {\epsilon}_0, \tag{2}
$$

Where ${\bar\alpha}_t = \prod_{s = 0}^{t}{1 - {\beta}_s}$ , and ${\epsilon}_0, ... , {\epsilon}_{t-2}, {\epsilon}_{t-1} \sim \mathcal{N (0 , \mathbf{I})}$

#### 2. Reverse process

Unfortunetaly, It's not possible to sample ${x}_{0}$ from ${x}_{t}$ using $q(\mathbb{x}_{t-1}|{x}_{t})$ as in forward process, because reversing the noise is intractable, therefore **reverse diffusion process** is employed. As a solution $q(\mathbb{x}_{t-1}|{x}_{t})$ could be approximated by using a deep learning model (e.g. neural network), which predicts an approximation to the conditional probability distribution $\mathbb{p}_{\theta}(\mathbb{x}_{t-1}|{x}_{t})$, which modeled as a Gaussian distribution:

$$
\mathbb{p}_{\theta}(\mathbb{x}_{t-1}|{x}_{t}) = \mathcal{N(\mathbb{x}_{t-1};\mathbb{\mu}_\theta({x}_t, t), \Sigma_t({x}_t, t))}, \tag{3}
$$

By learning the conditional probability densities using deep learning model the original image $x_0$ is reconstructed from the noisy image $\mathbb{x}_t$ as illustrated in _figure 1_. Allowing for the extraction of meaningful information from the noisy representation.



After explaining the two main processes of diffusion models, we start now with training the model.

#### 3. Train a diffusion model

The objective of training a diffusion model is to learn data distribution of the input data from the noised version. 

In order to effectively train a generative model, It's necessary to define an optimized loss function and the architecture of the deep learning model. In this section we will explain briefly the loss function and the network archietecture, which commonly employed in diffusion models then we'll explain the training process of diffusion models.

##### - Loss function

As illusterated before, diffusion model has some similarity with variational autoencoders (VAEs). They are both generative models used to learn data distribution to generate new data. Mximizing the log-liklihood guides the model towards capturing patterns and statistical properties in noised data. 

$$
\underset{\theta}{\text{max}}\sum_{i=1}^{N}\log{p_\theta(x_i)} \tag{4}
$$

In diffusion models the log likelihood is intractable. However, we can indirectly optimize it by optimizing the lower variational bound. By skipping mathimatical details, [Ho et al. (2020)](https://arxiv.org/pdf/2006.11239.pdf) has simplified the loss function to:
$$
{L}_{t}^{simple} = \mathbb{E}_{t \sim [1,T] ,x_0, \epsilon}[|| \epsilon - {\epsilon}_\theta(\sqrt{\bar{a}} x_0 + \sqrt{1 - \bar{a}} \epsilon,t)||^2] \tag{5}
$$ 
where: 

$\epsilon \sim \mathcal{N}(0, \mathbb{I})$ is the actual noise added, whch follows a standard normal distribution.

${\epsilon}_\theta(\sqrt{\bar\alpha} x_0 + \sqrt{1 - \bar\alpha} \epsilon,t) = {\epsilon}_\theta(x_t,t) $ denotes the approximated noise from neural network using reparamarization trick that mentioned before.

In case of DGMs the true value corresponds to the distribution of added noise that introduced to an image and the model's objective is to learn the original data distribution from the added noise on the inp.
Usually the loss function is the difference between predicted and true values.  As observed, the loss function is the mean square error (MSE) of the added noise and predicted noise.

Once the loss function has been chosen, we can go to the next step, which is selecting an appropriate network architecture and training the diffusion model.

##### - Network architecture

The most important requirment of the network is to have the identical dimensionality for the input and the output. Therefore, usually [U-Net](https://theaisummer.com/unet-architectures/) is commonly used for prediction tasks in DGMs as a network architecture.

The U-Net architecture based on an encoder-decoder structure. In encoding, the spatial dimensions decreases while the number of channels increases keeping the important features of the input data. On the contrary, in decoding the spatial dimensions increase while number of channels decrease to produce the same spatial dimensions as the input data as illustrated in Figure 2.


<img src="images/Unet-architecture.png"  style="margin-left: auto; margin-right: auto;">

*Figure 2:* 
An overview of U-Net architecture.
Figure is taken from: [AI Summer](https://theaisummer.com/static/fa507fda71846a516801bccb19474aec/0012b/Unet-architecture.png).

By optimizing the gradient descent of the loss function, the model can be trained until it converges.

Now we got a clear idea of the outline of DDPM. In the next section, we'll explain another type of generative models, which is score-based generative models.

### Score-based generative model.

Suppose a given data set {$x_1, ..., x_{N-1}, x_N$} which follows a certain probability distribution, denoted as $p(x)$. The primary goal of the score model is to fit a model to the given distribution $p(x)$ so that new data points can be generated by sampling from the learned distribution.

However, first we need to think of a way to represent the probability distribution. One way is to model probability density function (PDF) directly. So, let $f_\theta(x) \isin \mathbb{R}$ parameterized by $\theta$, which is learnable parameter. PDF of a probabilistic model $f_\theta(x)$ is defined as follow : 

$$
p_\theta(x)= \frac{e^{-f_\theta(x)}}{\mathbb{Z_\theta}}, \tag{6}
$$

As $\mathbb{Z_\theta} > 0$ is the normalizing constant, depend on $\theta$. The model can be trained by maximizing the log-likelihood of the PDF as in equation (4).

However, a general normalizing constant $\mathbb{Z_0}$ is intractable and we can't compute $p_\theta(x)$. To overcome the intractibility problem of $\mathbb{Z_0}$, **score function** is modelled instead of the PDF, which is defined as :

$$
s_\theta(x) = \nabla_x \log {p(x)} \tag{7}
$$

$s_\theta(x)$ is score-based model and can be parametrized without the need to evaluate $\mathbb{Z_0}$. By taking the gradient of the distribution, the normalizing constant becomes zero and we can ignore it as in equation 8.

$$
s_\theta(x) = - \nabla_x f_\theta(x) - \nabla_x \log{\mathbb{Z_0}} =  - \nabla_x f_\theta(x) \tag{8}
$$

As shown in figure 3, there is no need to use normalization while parameterizing score functions. On the contrary, changes in data distribution undergo normalization, as the area under the curve (AUC) must integrate to one.
<p float="left">
  <img src="images/ebm.gif" height="280" width="400" />
  <img src="images/score.gif" height="280" width="400" /> 
</p>

*Figure 3:* 
Parameterization of probability density functions (on the left) and score functions (on left). 
Figures are taken from: [Yang-Song blogpost](https://yang-song.net/blog/2021/score/#connection-to-diffusion-models-and-others).

#### - Score model training

In order to train a score-based model, we need to compare between the model and the actual data distribution. This is done by minimizing a function that computes the distance between ground-truth data score and score-based model like Fisher divergence, which defined as:

$$
\mathbb{E_{p(x)}}[|| \nabla_x \log{p_{(x)}} - \mathbb{s_\theta(x)}||_2^2] \tag{9}
$$

The ground-truth data score is unknown, which makes it infeasible to compute fisher divergence indirectly. However, **score matching** makes it feasible, because it can minimize fisher divergence without the estimation of the ground-truth data score and allow us to train the model.

The model can be trained, but the main objective of the score-based model, which is generating new data, is not achieved. [Langevin dynamics](https://en.wikipedia.org/wiki/Langevin_dynamics) is used as a sampling method to generate new data, accessing only its score function as shown in figure 4.

<img src="images/smld.jpg"  style="margin-left: auto; margin-right: auto;">

*Figure 4:* 
An overview of score-based generative modeling with score matching and langevin dynamics.
Figure is taken from: [Yang-Song blogpost](https://yang-song.net/blog/2021/score/#connection-to-diffusion-models-and-others).

#### - Score model with stochastic differential equations (SDEs)

Adding multiple noise scale at different scales has shown an improvement to the model's ability to generate high quality samples. By increasing the noise scale to infinity, exact log-likelihood can be also obtained. By injecting noise to the data, the concept becomes similar to DDPM. However, the main difference in this scenario is that the noise perturbation is a continous-time [stochastic process](https://en.wikipedia.org/wiki/Stochastic_process#:~:text=A%20stochastic%20process%20is%20defined,measurable%20with%20respect%20to%20some) as shown in figure 5.

<img src="images/sde_schematic.jpeg" height=400 width =1000  style="margin-left: auto; margin-right: auto;">

*Figure 5:* 
An overview of a forward and reverse SDE in general.
Figure is taken from: [Yang-Song blogpost](https://yang-song.net/blog/2021/score/#connection-to-diffusion-models-and-others).

Like DDPM we get now forward process and reverse process of SDE. Keep in mind that using SDE is not a unique approach and there are different ways to add noise perturbations, one way is showed in the example. However, Score function is used to generate data in the reverse SDE, as demonstrated in figure 5. which will use score matching for training the score-based model. For more more comprehensive understanding, [This article](https://yang-song.net/blog/2021/score/#connection-to-diffusion-models-and-others) explains every tiny detail.

By now, two main state-of-the-art generative models are well-covered. In the next section, we discuss a novel tool called _DiffDock_. It has employed a generative model to tackle the challenge of molecular docking in the field of cheminformatics.

### Diffusion-based docking model.

The main concepts of creating a generative models are explained. It's important to note that implementing generative model in molecular docking will not be the same as we explained. In this section we discuss the challenges encountered in applying generative model, especially score-based generative model, in molecular docking and how these obstacles have been overcome in a real-case application.

#### 1. Ligand pose manifold

In order to have diffusion-based docking model, you have to think of a manifold that suits ligand poses first where $L \isin \mathbb{R}^{3n} $ as $n$ is the number of atoms. If we start forward diffusing without setting any limitations for the degree of freedom, it becomes absurd and ligands will have unreasonable bond lengths and angles as in figure 6.

<img src="images/absurd_ligand.png"  style="margin-left: auto; margin-right: auto;">

*Figure 6:* 
Randomizing bond length and angles without keeping local structures fixed.

A solution to this problem is presented in [DiffDock paper](https://arxiv.org/pdf/2210.01776v2.pdf). They are inspired from traditional docking approches by taking already embedded ligand in a 3D space using RDKit , which instantiates the angles and bond length of the atoms. Instead of thinking of a ligand as an element in an eucledian space, they described ligand pose by four main parameters. 

1. Local structures like bond lengths, bond angles, chirality and ring structure are generated using RDKit and kept fixed in order to maintain integrity of the predicted ligand and the model.

2. Position of ligand with 3D translastion group were left flexible to find the pocket and fit in it $\mathbb{R}^3$.

3. Rotation parameterization, where $Rotation \isin {SO(3)}$ correspnds to 3D rigid rotation around the mass centre of the ligand.

<img src="images/rotation.gif"  style="width:300px;">


*Figure 7:* 
GIF shows an example of the rotaion of a methyl group in an Ethane structure. Figure taken from [Proteopedia](https://proteopedia.org/wiki/index.php/Dihedral/Index)

4. Flexibility of torsion angles to fit in the pocket, where: $ \mathit{Torsions} \isin \mathbb{T}^m$, which represent the changes in torsion angles around rotatable bonds in a ligand with a copy of 2D rotation group ${SO(2)}$ .  

<img src="images/Phipsi-AH.gif"  style="width:400px;">

*Figure 8:* 
GIF illustrartes the torsion angles and changes in it. As shown a torsion angle ϕ is defined by  a four covalently bonded atoms. Every three atoms defines a half plane and when these planes intersect the angle between them is torsion angle ϕ. Figure taken from [Proteopedia](https://proteopedia.org/wiki/index.php/Dihedral/Index)

These four parameters have introduced a new challenge. The problem arises from the fact that there are several valid possibilities for making changes through rotations and alterations in torsion angles together. The used strategy in DiffDock is to _disentangle_ the degrees of freedom involved in docking, which aims to isolate the modifcation of torsion angles from other transformations such as rotation and translations.

To make sure that the changes in torsion were totally independent during docking, post-torsion RMSD alignment was performed to confirm that rotations and translations were orthogonal to torsion modifications.

By utilizing those parameters, it was possible to map ligand poses into submanifold $\mathcal{M}_c \subset \mathbb{R}^{3n}$, where they can easily diffuse over. This submanifold $\mathcal{M}_c$ facilitates diffusion over a space where ligand poses are represented in $(m + 6)$ dimensions, where $m$ denotes number of rotatable bonds.

Fortunately, the ligand pose submanifold establishes a smooth mapping with the product space. As a result, we can now map displacements within the manifold of ligand poses to the product space. This product space, denoted as : $\mathbb{P} = \mathbb{R}^3 * \isin {SO(3)} * \isin \mathbb{T}^m$

#### 2. Product space diffusion

After mapping ligand pose manifold to product space, score-based generative model with SDE is trained with **score matching** according to [Song et al. 2019](https://arxiv.org/abs/1907.05600) to compute the score of the diffusion kernel on the product space and sample from it. But here appears another problem which is how the score model will be diffused on the product space.

The problem is that most of existing score-based generative models are designed for a data on an eucalidean space. However, [De Bortoli et al. 2022](https://arxiv.org/pdf/2202.02763.pdf) has developed Riemannian score-based generative model (SGM) which based on Riemannian manifold which gives the possibility to create SGMs of a various manifolds.

The main concept is to consider the score model not as a vector field on the eucledian space, but rather as a vector field on the manifold where score and the score model are elements of the tangent space of every possible point on the manifold as represented in _figure 8_. 

<img src="images/tangent_space.png"  style="width:400px;">

*Figure 8:* The tangent space, denoted as ${T_xM}$ represents the set of all possible tangent vectors $v$ at $x$ as $x \isin \mathcal{M}$.


As mentioned before product space is the product of three manifolds. So, in order to proceed the forward diffusion process on the product space, every manifold will be diffused independently according to [Rodol`a et al., 2019](https://arxiv.org/abs/1809.10940) and the tangent space will become a direct sum of every manifold:

$$
T_g \mathbb{P} = T_r\mathbb{T^3} \oplus T_RSO(3) \oplus T_\theta SO(2)^m 
$$
Therefore, we can sample from diffusion kernel and perform regression independently against its true score within each group.

#### 3. Model architecture

A confidence model, besides a score model, are constructed using [E(3)NN](https://arxiv.org/abs/2207.09453). Score-based generative model is used to simulate the reverse diffusion starting from the "noisy" version of the ligand-portein interaction using reverse SDE to denoising and find the right binding pocket

While confidence model is responisble for ranking the arbitrary generated number of ligands. It's trained as a classifier specially to rank the poses and find best generated conformers as demonestrated in figure 10.

![ChEMBL web service schema](images/DiffDock.png)

*Figure 10:* 
Overview of DiffDock workflow. Left: The model takes as input the separate ligand and protein
structures. Center: Score-based generative model where random initial poses are denoised via a reverse SDE over trans-
lational, rotational, and torsional degrees of freedom. Right:. The sampled poses are ranked by the
confidence model to produce a final prediction and confidence score.
Figure and discription taken from: [arXiv 2023](https://arxiv.org/pdf/2210.01776v2.pdf).

DiffDock has shown an significant improvement in comparison to tradition docking. It achieved a $38%%$% success rate in making predictions with RMSD below $2$, whereas the best traditional docking tool used got $28%$%.

## Practical

* Data preparation.
    - Download PDB structure
    - Prepare input file
* DiffDock implementation
* Denoising visualization

In the practical part we are going to implement _DiffDock_, which used in molecular docking trained using score-based generative model. It's now open-source and published on [Github](https://github.com/gcorso/DiffDock).

#### Import dependancies

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import os 
#import nglview as nv
import urllib
from pathlib import Path

In [2]:
from pathlib import Path
HERE = Path(_dh[-1])
DATA = HERE / "data"

### Data preparation

Before starting _DiffDock_ implementation, input data has to be prepared. We start with the protein structure.

#### Prepare protein structure

If you want to download the protein structure. Set `protein_pdb` to your PDB code of the structure you want to use. You can use your own protein, but you need to place it inside data/ directory and set protein_pdb to your protein pdb structure.

In [3]:
#By default, it's set to apixaban-bound ABLE crystal structure

protein_pdb = '6w70.pdb'

In [4]:
# Download the PDB structure if it's not in DATA

if protein_pdb not in os.listdir('data'):
    urllib.request.urlretrieve(f'http://files.rcsb.org/download/{protein_pdb}', f'data/{protein_pdb}')
else:
    print('PDB structure already downloaded.')

PDB structure already downloaded.


#### Specify your quary ligand

In this step, you need to specify your ligand input. There are different input types can be used. SDF file or SMILES string. In case you want to use sdf file, change `ligand` variable to the path of SDF file.

Unlike _GNINA_, _DiffDock_ use one quary per protein. Therefore, if you want to dock more than one ligand to the same protein, you can add it to `ligand_input` list.

In [5]:
# Give ligand SMILES here
ligand = "O=C(O)c1cc(/OCc2cccc3ccccc23)ccc1O"

molecule_id = ['Molecule_1']
ligand_input = [ligand]

### DiffDock implementation

First step is to download _DiffDock_ software from its [Github repository](https://github.com/gcorso/DiffDock.git).

In [6]:
if "DiffDock" not in os.listdir(HERE):
    #specifiy version
    !git clone https://github.com/gcorso/DiffDock.git
else:
    print(f"DiffDock is alreay cloned.")

DiffDock is alreay cloned.


After cloning the repository, you can create anaconda environment for DiffDock using the next cell, in order to avoid any running issues.

Note: If you didn't install anaconda on your system, more information can be found [here](https://www.anaconda.com).

In [7]:
!conda env create -f DiffDock/environment.yml
!conda activate diffdock

zsh:1: command not found: conda


zsh:1: command not found: conda


#### Configure your docking settings 

In [7]:
samples_per_complex = "5"
inference_steps = "10"
actual_steps = "18"
batch_size = "5"
print(f'{DATA}/{protein_pdb}')

/home/hamza/Desktop/Bioinformatics_master/SS23/CADDSeminar_2023/notebook/T02_DiffusionBasedDocking/data/6w70.pdb


In [9]:
os.chdir("DiffDock/")
for id, smiles in zip(molecule_id, ligand_input):
    diffdock_cmd = f"python -m inference --protein_path {DATA}/{protein_pdb} --ligand '{smiles}' --out_dir {DATA}/{id} --inference_steps {inference_steps} --samples_per_complex {samples_per_complex} --save_visualisation --batch_size {batch_size} --actual_steps {actual_steps} --no_final_step_noise"
    os.system(diffdock_cmd)
os.chdir("../")

Generating ESM language model embeddings




Processing 1 of 1 batches (2 sequences)
HAPPENING | confidence model uses different type of graphs than the score model. Loading (or creating if not existing) the data for the confidence model now.
Size of test dataset:  1


0it [00:00, ?it/s]

Failed on ['1a0q'] index 10 is out of bounds for axis 0 with size 10
Failed for 1 complexes <torch_geometric.loader.dataloader.DataLoader object at 0x7fbc0b37c160>
Skipped 0 complexes
Results are in /home/hamza/Desktop/Bioinformatics_master/SS23/CADDSeminar_2023/notebook/T02_DiffusionBasedDocking/data/Molecule_1


1it [01:05, 65.16s/it]


In [7]:
import nglview as nv

view = nv.show_pdbid(f"{DATA}/{id}/rank1_reverseprocess")  # Or use nv.show_file("path/to/your/file.pdb")

# Display the molecular viewer
display(view)

InvalidURL: URL can't contain control characters. '/pdb/files//Users/hamzaibrahim/Github/CADDSeminar_2023/notebook/T02_DiffusionBasedDocking/data/<built-in function id>/rank1_reverseprocess.cif' (found at least ' ')

In [2]:
import py3Dmol
from rdkit import Chem

# Read the PDB file
with open(f'{DATA}/{protein_pdb}', 'r') as pdb_file:
    pdb_data = pdb_file.read()

# Read the SDF file
suppl = Chem.SDMolSupplier(f'{DATA}/results/Molecule_1/1a0q/rank1.sdf')
print(suppl)
# Create a viewer
viewer = py3Dmol.view(width=800, height=600)

# Add the PDB data to the viewer
viewer.addModel(pdb_data, 'pdb')

# Iterate over the molecules in the SDF file
for mol in suppl:
    # Skip invalid molecules

    if mol is None:
        continue

    # Add the SDF molecule to the viewer
    viewer.addModel(Chem.MolToMolBlock(mol), 'sdf')

# Set the style and visualization options
# viewer.addModel(Chem.MolToMolBlock(suppl), 'sdf')
viewer.setStyle({'cartoon': {'color': 'spectrum'}})
viewer.zoomTo()

# Display the viewer
viewer.show()

ModuleNotFoundError: No module named 'py3Dmol'

## Discussion

Throughout this talktorial, the fundementals of two main types of generative models are explained, highlighting their powerful capibility to generate new data from using minimal information as an input. We've presented a case study in the field of cheminformatics and how it's used in molecular docking. While some problems have been solved, new questions raised that need to be addressed. Proving that molecular docking remains a vital field that offers room for more improvement using robust and well-structured models.


## Quiz

Ask three questions that the user should be able to answer after doing this talktorial. Choose important take-aways from this talktorial for your questions.

1. From presenting two main types of generative models, address their benefits and differences between them.
2. What is the unit of the predicted DiffDock output? In comparison to traditional molecular docking tools, is it any better to get this output? and why?
3. What are the limitations of DiffDock? and can it be overcome in the future?