# Incompatible `plotly` functionality with molplotly

`Plotly` is a graphing library that does far more than scatter plots. Several of its features unfortunately clash with how `molplotly` implements the hover box (for now at least). Here are some known incompatibilities, all of which are still very useful data visualisations in vanilla `plotly`!

## Imports and Data Loading

Import `pandas` for data manipulation, `plotly` for plotting, and `molplotly` for visualising structures!

In [1]:
import pandas as pd
import plotly.express as px
import molplotly


INFO:rdkit:Enabling RDKit 2021.09.4 jupyter extensions


Let's load the ESOL dataset from [ESOL: Estimating Aqueous Solubility Directly from Molecular Structure](https://doi.org/10.1021/ci034243x) - helpfully hosted by the [deepchem](https://github.com/deepchem/deepchem) team but also included as `example.csv` in the repo.

In [2]:
# df_esol = pd.read_csv('example.csv')
df_esol = pd.read_csv(
    'https://raw.githubusercontent.com/deepchem/deepchem/master/datasets/delaney-processed.csv')
df_esol['y_pred'] = df_esol['ESOL predicted log solubility in mols per litre']
df_esol['y_true'] = df_esol['measured log solubility in mols per litre']


## Marginals on scatter plots 

I like having marginals on the sides by default because the data density within a dataset can vary a lot. Anything involving histogram or violin plots doesn't work with `molplotly` yet.

In [None]:
fig_marginal = px.scatter(df_esol,
                 x="y_true",
                 y="y_pred",
                 title='ESOL Regression (with violin and histogram marginals)',
                 labels={'y_pred': 'Predicted Solubility',
                         'y_true': 'Measured Solubility'},
                 marginal_x='violin',
                 marginal_y='histogram',
                 width=1200,
                 height=800)
fig_marginal.show()


## Violin plots

Violin plots look nice, especially when there are a lot of datapoints. When there isn't much data (often the case in drug discovery!), those nice smooth KDE curves can be misleading, so I usually prefer strip plots. `plotly` has cool mouseover data on violin plots, which is incompatible with `molplotly`. But if there's enough data that I'd prefer a violin plot, it's probably too memory-consuming to run a strip plot with `molplotly` anyway!
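To make the point about misleading KDE curves concrete, here is a minimal sketch of a Gaussian kernel density estimate written with the standard library only. The data values and the bandwidth are made up for illustration, and `plotly` chooses its own bandwidth for violin plots, so the exact shape it draws will differ. The idea is the same though: with only a handful of points, the smooth curve assigns substantial density to a region that contains no data at all.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE built from samples."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, then normalise.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up values: two tight clusters with nothing in between.
few_points = [-4.1, -4.0, -3.9, -1.1, -1.0, -0.9]
bandwidth = 1.0  # hypothetical choice, for illustration only

density = gaussian_kde(few_points, bandwidth)

gap_x = -2.5  # the middle of the empty region
gap_density = density(gap_x)
cluster_density = density(-4.0)
nearest = min(abs(gap_x - s) for s in few_points)

print(f"nearest datapoint to x={gap_x} is {nearest:.1f} units away")
print(f"density in the gap / density at a cluster: "
      f"{gap_density / cluster_density:.2f}")
```

A strip plot of the same six values would show the two clusters and the empty gap directly, which is why it can be the safer choice for small datasets.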

<a name="violin"></a>

In [None]:
fig_violin = px.violin(df_esol,
                       y="y_true",
                       title='ESOL violin plot of measured solubility',
                       labels={'y_true': 'Measured Solubility'},
                       box=True,
                       points='all',
                       width=1200,
                       height=800)
fig_violin.show()
