<a href="https://colab.research.google.com/github/ustojiljkoff/baybe/blob/main/Walter_3D_1_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Welcome to Walter_1.0.** This notebook uses RDKit to compute descriptors for every molecule in an uploaded dataset of SMILES strings. The descriptors are:

- molecular weight
- topological polar surface area (tPSA)
- number of rotatable bonds
- number of H-bond donors
- number of H-bond acceptors
- fraction of sp3-hybridized carbons (Fsp3)
- LogP
- number of aromatic rings
- number of aliphatic rings
- number of saturated rings
- number of heteroatoms
- QED (quantitative estimate of drug-likeness)

The descriptors are standardized and reduced to three principal components, and the dataset is shown in a 3D PCA plot. You can then enter SMILES strings for your own molecules. They are projected onto the same principal components and plotted together with the dataset, so you can see where in "chemical space" your molecules lie compared with the dataset (e.g., FDA-approved drugs).
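
The descriptor-then-PCA workflow described above can be sketched in plain NumPy. This is a hedged illustration, not the notebook's own code, which uses scikit-learn's `StandardScaler` and `PCA`. The helper `standardize_and_pca` is hypothetical, and the input is random placeholder data, not real descriptor values. The sketch shows two things. First, each descriptor is z-scored before PCA; otherwise large-valued descriptors such as molecular weight would dominate the variance. Second, the percentages shown on the plot axes follow from the singular values.

```python
import numpy as np


def standardize_and_pca(x, n_components=3):
    """Z-score each descriptor column, then compute principal-component scores.

    Returns the scores (n_samples x n_components) and the percentage of the
    total variance explained by each returned component.
    """
    mean = x.mean(axis=0)
    std = x.std(axis=0)      # population standard deviation (ddof=0), as StandardScaler uses
    std[std == 0] = 1.0      # leave constant columns unscaled instead of dividing by zero
    z = (x - mean) / std
    # Rows of vt are the principal axes (sorted by decreasing singular value);
    # the squared singular values are proportional to the variance along each axis.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained_pct = 100 * s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, explained_pct


# Placeholder data: 100 "molecules" x 12 "descriptors" on very different scales,
# standing in for real descriptors such as molecular weight versus Fsp3.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 12)) * np.logspace(-1, 2, 12)

scores, pct = standardize_and_pca(x)
for i, p in enumerate(pct, start=1):
    print(f"Principal Component {i} ({p:.2f}%)")
```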

Datasets of FDA-approved drugs, veterinary drugs, drugs containing phenols, and drugs containing phenolic ethers are available on the following GitHub page: https://github.com/SculpturatusLabs/FDA-approved_SMILES. Download a dataset as a .csv file and upload it when prompted. Any CSV file can be used as long as it contains a column named `SMILES`.
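
The notebook stops with an error unless the uploaded CSV file has a column named `SMILES`. To upload your own molecules instead, you can write a file in that format with Python's standard `csv` module, as sketched below. The file name `my_molecules.csv`, the extra `Name` column and the three example molecules are illustrative assumptions. Only the `SMILES` header is required by the notebook.

```python
import csv

# Illustrative molecules (name, SMILES); replace with your own
molecules = [
    ("aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
    ("caffeine", "Cn1cnc2c1c(=O)n(C)c(=O)n2C"),
    ("ibuprofen", "CC(C)Cc1ccc(cc1)C(C)C(=O)O"),
]


def write_smiles_csv(path, rows):
    """Write a CSV whose header row contains the required 'SMILES' column."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Name", "SMILES"])
        writer.writerows(rows)


def has_smiles_column(path):
    """Mirror the notebook's check: the header must contain a column named exactly 'SMILES'."""
    with open(path, newline="") as fh:
        header = next(csv.reader(fh), [])
    return "SMILES" in header


write_smiles_csv("my_molecules.csv", molecules)
print(has_smiles_column("my_molecules.csv"))  # True
```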

**To Run the Code:**
  1. At the top, click "Runtime" and then "Run all".
  2. Scroll to the bottom of the screen. Once the installation cell has finished, you will be prompted to upload a dataset. Upload the dataset you want to use. When the upload finishes, you will be prompted to name the dataset; this name is used in the legend of the plot.
  3. A 3D PCA plot of this dataset is then generated. Scroll past it.
  4. Two text boxes appear below the plot. Paste one or more SMILES strings, separated by spaces, into the "SMILES" box, and enter a name for your compound(s) in the "Name" box. This name is used in the legend of the PCA plot.
  5. Click "Add SMILES" to update the plot with your compound(s).

In [1]:
!pip install pandas rdkit scikit-learn plotly

Collecting rdkit
  Downloading rdkit-2024.3.5-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.9 kB)
Downloading rdkit-2024.3.5-cp310-cp310-manylinux_2_28_x86_64.whl (33.1 MB)
Installing collected packages: rdkit
Successfully installed rdkit-2024.3.5


In [None]:
import pandas as pd
from rdkit import Chem
from rdkit.Chem import Descriptors, rdMolDescriptors
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import plotly.express as px
from google.colab import files
import ipywidgets as widgets
from IPython.display import display, clear_output

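# Function to calculate the 12 descriptors for one SMILES string
# (returns 12 Nones if the SMILES cannot be parsed)
def calculate_descriptors(smiles):
    # Empty cells in the CSV are read by pandas as NaN (a float), which RDKit cannot parse
    if not isinstance(smiles, str):
        return (None,) * 12
    mol = Chem.MolFromSmiles(smiles)
    if mol is not None: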
        mw = Descriptors.MolWt(mol)
        fraction_sp3 = rdMolDescriptors.CalcFractionCSP3(mol)
        logp = Descriptors.MolLogP(mol)
        h_donors = Descriptors.NumHDonors(mol)
        h_acceptors = Descriptors.NumHAcceptors(mol)
        tpsa = Descriptors.TPSA(mol)
        num_rotatable_bonds = Descriptors.NumRotatableBonds(mol)
        num_aromatic_rings = rdMolDescriptors.CalcNumAromaticRings(mol)
        num_aliphatic_rings = rdMolDescriptors.CalcNumAliphaticRings(mol)
        num_saturated_rings = rdMolDescriptors.CalcNumSaturatedRings(mol)
        num_heteroatoms = Descriptors.NumHeteroatoms(mol)
        qed = Descriptors.qed(mol)
        return mw, fraction_sp3, logp, h_donors, h_acceptors, tpsa, num_rotatable_bonds, num_aromatic_rings, num_aliphatic_rings, num_saturated_rings, num_heteroatoms, qed
    else:
        return None, None, None, None, None, None, None, None, None, None, None, None

# Upload the CSV file
uploaded = files.upload()

# Prompt for the name of the uploaded CSV file
csv_name = input("Enter a name for the uploaded CSV file dataset: ")

# Load the CSV file into a pandas dataframe
df = pd.read_csv(next(iter(uploaded)))

# Ensure the CSV contains a column named 'SMILES'
if 'SMILES' not in df.columns:
    raise ValueError("The CSV file must contain a 'SMILES' column.")

# Descriptor column names, in the same order as the values returned by calculate_descriptors
features = ['MolecularWeight', 'FractionSP3', 'LogP', 'NumHDonors', 'NumHAcceptors', 'TPSA',
            'NumRotatableBonds', 'NumAromaticRings', 'NumAliphaticRings', 'NumSaturatedRings',
            'NumHeteroatoms', 'QED']

# Apply the function to every SMILES string in the dataframe
df[features] = df['SMILES'].apply(lambda x: pd.Series(calculate_descriptors(x)))

# Drop rows whose SMILES could not be processed. Only the descriptor columns are checked,
# so missing values in other columns of the CSV do not discard valid molecules.
n_rows = len(df)
df = df.dropna(subset=features).reset_index(drop=True)
print(f"Dropped {n_rows - len(df)} of {n_rows} rows whose SMILES could not be parsed.")

# Descriptor matrix used for PCA
x = df[features]

# Standardize each descriptor to zero mean and unit variance, so that descriptors with large values (e.g. molecular weight) do not dominate the PCA
scaler = StandardScaler()
x_normalized = scaler.fit_transform(x)

# Perform PCA
pca = PCA(n_components=3)
principal_components = pca.fit_transform(x_normalized)
pca_df = pd.DataFrame(data=principal_components, columns=['Principal Component 1', 'Principal Component 2', 'Principal Component 3'])

# Get percentage of variance explained by each component
explained_variance = pca.explained_variance_ratio_ * 100

# Initial 3D plot of the uploaded dataset alone
fig = px.scatter_3d(
    pca_df, x='Principal Component 1', y='Principal Component 2', z='Principal Component 3',
    color_discrete_sequence=['blue'], opacity=0.5
)
# Small markers for the dataset; name the trace so the dataset name appears in the legend
fig.update_traces(marker=dict(size=3), name=csv_name, showlegend=True)
# title.font replaces the deprecated titlefont property, which newer Plotly versions reject
axis_title_font = dict(size=12, family='Arial Black', color='black')
fig.update_layout(
    scene=dict(
        xaxis=dict(title=dict(text=f'Principal Component 1<br>({explained_variance[0]:.2f}%)', font=axis_title_font), tickfont=dict(size=12)),
        yaxis=dict(title=dict(text=f'Principal Component 2<br>({explained_variance[1]:.2f}%)', font=axis_title_font), tickfont=dict(size=12)),
        zaxis=dict(title=dict(text=f'Principal Component 3<br>({explained_variance[2]:.2f}%)', font=axis_title_font), tickfont=dict(size=12))
    ),
    title='Initial 3D PCA Plot',
    width=1000,
    height=800
)
fig.show()

# Function to add new SMILES strings: project them onto the existing PCA and re-plot
def add_smiles(smiles_input, smiles_name):
    print(f"Adding SMILES: {smiles_input} with name: {smiles_name}")  # Debug statement
    smiles_list = smiles_input.split()  # SMILES strings are separated by whitespace
    if not smiles_list:
        print("Please enter at least one SMILES string.")
        return
    new_df = pd.DataFrame({'SMILES': smiles_list})
    new_df[features] = new_df['SMILES'].apply(lambda x: pd.Series(calculate_descriptors(x)))
    new_df = new_df.dropna(subset=features)
    if new_df.empty:
        print("None of the entered SMILES strings could be parsed by RDKit.")
        return

    # Reuse the scaler and PCA fitted on the dataset (transform only, no refitting)
    new_x = new_df[features]
    new_x_normalized = scaler.transform(new_x)
    new_principal_components = pca.transform(new_x_normalized)
    new_pca_df = pd.DataFrame(data=new_principal_components, columns=['Principal Component 1', 'Principal Component 2', 'Principal Component 3'])

    # Clear previous output
    clear_output(wait=True)

    # Plot PCA in 3D using plotly: dataset in blue, new molecules in red
    axis_title_font = dict(size=12, family='Arial Black', color='black')
    fig = px.scatter_3d(
        pca_df, x='Principal Component 1', y='Principal Component 2', z='Principal Component 3',
        color_discrete_sequence=['blue'], opacity=0.5
    )
    # Small markers for the dataset; name the trace so the dataset name appears in the legend
    fig.update_traces(marker=dict(size=3), name=csv_name, showlegend=True)
    fig.add_scatter3d(
        x=new_pca_df['Principal Component 1'], y=new_pca_df['Principal Component 2'], z=new_pca_df['Principal Component 3'],
        mode='markers', marker=dict(color='red', size=6),  # Larger markers for the new molecules
        name=smiles_name or 'New molecules'
    )
    # title.font replaces the deprecated titlefont property, which newer Plotly versions reject
    fig.update_layout(
        scene=dict(
            xaxis=dict(title=dict(text=f'Principal Component 1<br>({explained_variance[0]:.2f}%)', font=axis_title_font), tickfont=dict(size=12)),
            yaxis=dict(title=dict(text=f'Principal Component 2<br>({explained_variance[1]:.2f}%)', font=axis_title_font), tickfont=dict(size=12)),
            zaxis=dict(title=dict(text=f'Principal Component 3<br>({explained_variance[2]:.2f}%)', font=axis_title_font), tickfont=dict(size=12))
        ),
        title='Updated 3D PCA Plot',
        width=1000,
        height=800
    )
    fig.show()

# Text entry widget
smiles_text = widgets.Text(
    value='',
    placeholder='Enter SMILES strings separated by spaces',
    description='SMILES:',
    disabled=False
)
smiles_name_text = widgets.Text(
    value='',
    placeholder='Enter a name for the new SMILES strings',
    description='Name:',
    disabled=False
)
display(smiles_text)
display(smiles_name_text)

# Button to submit SMILES strings
button = widgets.Button(description="Add SMILES")
display(button)

def on_button_click(b):
    print("Button clicked!")  # Debug statement
    try:
        add_smiles(smiles_text.value, smiles_name_text.value)
    except Exception as e:
        print(f"An error occurred: {e}")

button.on_click(on_button_click)

# Initial output to ensure widgets display correctly
print("Enter SMILES strings and click 'Add SMILES' to update the plot.")
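
In `add_smiles`, the new molecules are passed through `scaler.transform` and `pca.transform`. These calls reuse the mean, scale and principal axes fitted on the uploaded dataset rather than refitting. The NumPy sketch below illustrates why this matters. It uses hypothetical helper names (`fit_reference`, `project`) and random placeholder data instead of real descriptors. It also omits details of scikit-learn's `PCA`, such as subtracting its own fitted mean and fixing component signs. Projecting with stored parameters keeps the dataset's coordinates exactly as in the first plot. Refitting with the new molecules included would move the axes themselves.

```python
import numpy as np


def fit_reference(x, n_components=3):
    """Fit z-scoring and PCA on the reference dataset only; return the fitted parameters."""
    mean = x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    _, _, vt = np.linalg.svd((x - mean) / std, full_matrices=False)
    return mean, std, vt[:n_components]


def project(x_new, mean, std, components):
    """Place rows in the reference PC space using the stored parameters (no refitting)."""
    return ((x_new - mean) / std) @ components.T


rng = np.random.default_rng(42)
reference = rng.normal(size=(200, 12))         # placeholder for the dataset's 12 descriptors
new_mols = rng.normal(loc=3.0, size=(2, 12))   # placeholder for user-entered molecules

mean, std, comps = fit_reference(reference)
ref_scores = project(reference, mean, std, comps)  # dataset coordinates, as in the first plot
new_scores = project(new_mols, mean, std, comps)   # new molecules on the same axes

# Refitting with the new molecules included changes the principal axes themselves,
# so the dataset's coordinates would no longer match the first plot.
_, _, comps_refit = fit_reference(np.vstack([reference, new_mols]))
print(new_scores)
```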


