# maxsmi
## Analysis of results

This notebook analyses the results of the simulations run on the Curta cluster at the Freie Universität Berlin.

### Maxsmi Models

### Goal

Determine the Maxsmi models for all three tasks:

- ESOL,
- FreeSolv,
- Lipophilicity.

In [1]:
import os
from pathlib import Path
import pickle
import numpy as np
import matplotlib.pyplot as plt

# Path to this notebook
HERE = Path(_dh[-1])

path_to_output = HERE.parents[0]

In [2]:
def load_data(path, task):
    """
    Loads the result data from the Maxsmi models.

    Parameters
    ----------
    path : str or pathlib.Path
        The path to the folder containing the `output` directory.
    task : str
        The data with associated task, e.g. "ESOL", "FreeSolv"

    Returns
    -------
    data : pd.DataFrame
        Pandas data frame with performance metrics (on train and test sets), such as r2 score and time.
    """
    if task == "FreeSolv":
        ml_model = "CONV1D"
        augmentation_strategy = "augmentation_with_duplication"
        augmentation_number = 70
    elif task == "ESOL":
        ml_model = "CONV1D"
        augmentation_strategy = "augmentation_with_reduced_duplication"
        augmentation_number = 70
    elif task == "Lipophilicity":
        ml_model = "CONV1D"
        augmentation_strategy = "augmentation_without_duplication"
        augmentation_number = 80
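    else:
        # Fail early instead of hitting a NameError when building the path below.
        raise ValueError(f"Unknown task: {task}")
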
    with open(
        f"{path}/output/"
        f"{task}_smiles_{augmentation_strategy}_"
        f"{augmentation_number}_{augmentation_strategy}_"
        f"{augmentation_number}_{ml_model}/"
        f"results_metrics.pkl",
        "rb",
    ) as f:
        data = pickle.load(f)
        print(ml_model, augmentation_strategy, augmentation_number)
        return data
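
The task-to-settings mapping and the results path used by `load_data` can also be written as a dictionary lookup. The following is only a minimal sketch: the names `MAXSMI_SETTINGS` and `results_path` are hypothetical and not part of the notebook, and the settings and folder pattern are copied from the function above.

```python
from pathlib import Path

# Settings per task, copied from the branches of load_data above:
# (ml_model, augmentation_strategy, augmentation_number).
# MAXSMI_SETTINGS is a hypothetical name used for illustration only.
MAXSMI_SETTINGS = {
    "FreeSolv": ("CONV1D", "augmentation_with_duplication", 70),
    "ESOL": ("CONV1D", "augmentation_with_reduced_duplication", 70),
    "Lipophilicity": ("CONV1D", "augmentation_without_duplication", 80),
}


def results_path(path, task):
    """Hypothetical helper: build the path to results_metrics.pkl for a task."""
    try:
        ml_model, strategy, number = MAXSMI_SETTINGS[task]
    except KeyError:
        raise ValueError(f"Unknown task: {task!r}") from None
    # Same folder pattern as in load_data: the strategy and number appear twice.
    folder = f"{task}_smiles_{strategy}_{number}_{strategy}_{number}_{ml_model}"
    return Path(path) / "output" / folder / "results_metrics.pkl"


print(results_path(".", "ESOL"))
```

With this layout, adding a task means adding one dictionary entry, and an unknown task name raises a `ValueError` before any file is opened.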

In [3]:
for task in ["FreeSolv", "ESOL", "Lipophilicity"]:
    print(task)
    maxsmi_model = load_data(path_to_output,
                             task)
    print(f"{maxsmi_model.test[0][1]:.3f}\n")

FreeSolv
CONV1D augmentation_with_duplication 70
1.032

ESOL
CONV1D augmentation_with_reduced_duplication 70
0.569

Lipophilicity
CONV1D augmentation_without_duplication 80
0.593



These values indeed correspond to the minimum values shown in the `results_tables` notebooks.