# maxsmi
## Analysis of results

This notebook analyses the results of the simulations run on the Curta cluster at Freie Universität Berlin.

### Early stopping

Simulations can be run using the following command:
```
(maxsmi) $ python maxsmi/full_workflow_earlystopping.py --task lipophilicity --string-encoding smiles --aug-strategy-train augmentation_without_duplication --aug-strategy-test augmentation_without_duplication --aug-nb-train 5 --aug-nb-test 5 --ml-model CONV1D --eval-strategy True
```


### Goal

The aim of this notebook is to compare the results for a subset of models that were trained with and without early stopping.

The models compared are listed below as task: (model, augmentation strategy, augmentation number):

- ESOL: (CONV2D, Augmentation with duplication, 4),
- FreeSolv: (RNN, Augmentation with reduced duplication, 3),
- Lipophilicity: (CONV1D, Augmentation without duplication, 5).
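The three configurations above could also be kept in a single mapping, so that the same comparison can be repeated per task without editing variables by hand. This is only a sketch: the name `CONFIGS` and its keys are hypothetical and not part of maxsmi. The values are the ones listed above.

```python
# Hypothetical container (not part of maxsmi) for the three configurations listed above.
CONFIGS = {
    "ESOL": {
        "ml_model": "CONV2D",
        "augmentation_strategy": "augmentation_with_duplication",
        "augmentation_number": 4,
    },
    "FreeSolv": {
        "ml_model": "RNN",
        "augmentation_strategy": "augmentation_with_reduced_duplication",
        "augmentation_number": 3,
    },
    "Lipophilicity": {
        "ml_model": "CONV1D",
        "augmentation_strategy": "augmentation_without_duplication",
        "augmentation_number": 5,
    },
}

# Print one line per task to check the settings at a glance.
for task, cfg in CONFIGS.items():
    print(
        f"{task}: {cfg['ml_model']}, "
        f"{cfg['augmentation_strategy']}, {cfg['augmentation_number']}"
    )
```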

In [None]:
import os
from pathlib import Path
import pickle

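# Path to this notebook (`_dh` is IPython's directory history)
HERE = Path(_dh[-1])

# Parent folder, expected to contain the `output` directory read by `load_results`
path_to_output = HERE.parents[0]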

In [None]:
def load_results(
    path,
    task,
    augmentation_strategy_train,
    train_augmentation,
    augmentation_strategy_test,
    test_augmentation,
    ml_model,
    string_encoding="smiles",
    early_stopping=False
):
    """
    Loads the result data from simulations.

    Parameters
    ----------
    path : str or pathlib.Path
        The path to the folder containing the `output` directory.
    task : str
        The data with associated task, e.g. "ESOL", "FreeSolv".
    augmentation_strategy_train : str
        The augmentation strategy used on the train set.
    train_augmentation : int
        The number of augmentations on the train set.
    augmentation_strategy_test : str
        The augmentation strategy used on the test set.
    test_augmentation : int
        The number of augmentations on the test set.
    ml_model : str
        The machine learning model, e.g. "CONV1D".
    string_encoding : str, default "smiles"
        The molecular encoding.
    early_stopping : bool, default False
        Whether the training was done with early stopping.

    Returns
    -------
    data : pd.DataFrame
        Pandas data frame with performance metrics (on train and test sets), such as r2 score and time.
    """

    # Runs with early stopping are stored in a folder with an "_earlystopping" suffix.
    suffix = "_earlystopping" if early_stopping else ""
    results_file = (
        f"{path}/output/{task}_{string_encoding}_{augmentation_strategy_train}_"
        f"{train_augmentation}_{augmentation_strategy_test}_"
        f"{test_augmentation}_{ml_model}{suffix}/"
        f"results_metrics.pkl"
    )

    with open(results_file, "rb") as f:
        data = pickle.load(f)

    return data
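
The folder naming convention that `load_results` relies on can be checked in isolation, without reading any file. The sketch below rebuilds only the path. The helper name `results_file` and its shortened parameter names are hypothetical, and `PurePosixPath` is used purely so that the printed path looks the same on every operating system.

```python
from pathlib import PurePosixPath


def results_file(path, task, strategy_train, n_train, strategy_test, n_test,
                 ml_model, string_encoding="smiles", early_stopping=False):
    """Hypothetical helper: build the path that `load_results` would open."""
    # Early stopping runs live in a folder ending with "_earlystopping".
    suffix = "_earlystopping" if early_stopping else ""
    folder = (
        f"{task}_{string_encoding}_{strategy_train}_{n_train}_"
        f"{strategy_test}_{n_test}_{ml_model}{suffix}"
    )
    return PurePosixPath(path) / "output" / folder / "results_metrics.pkl"


example = results_file(
    "..", "ESOL",
    "augmentation_with_duplication", 4,
    "augmentation_with_duplication", 4,
    "CONV2D",
    early_stopping=True,
)
print(example)
```

Printing the path before loading makes it easier to spot a mismatch between the variables set in the notebook and the folders actually present in `output`.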

In [None]:
# Run only one of the three configuration cells: each one overwrites the same variables.
TASK = "ESOL"
ML_MODEL = "CONV2D"
AUGMENTATION_STRATEGY = "augmentation_with_duplication"
AUGMENTATION_NUMBER = 4
STRING_ENCODING = "smiles"

In [None]:
TASK = "Lipophilicity"
ML_MODEL = "CONV1D"
AUGMENTATION_STRATEGY = "augmentation_without_duplication"
AUGMENTATION_NUMBER = 5
STRING_ENCODING = "smiles"

In [None]:
TASK = "FreeSolv"
ML_MODEL = "RNN"
AUGMENTATION_STRATEGY = "augmentation_with_reduced_duplication"
AUGMENTATION_NUMBER = 3
STRING_ENCODING = "smiles"

In [6]:
test_rmse_no_earlystopping = load_results(path_to_output,
                                          TASK, AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
                                          AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
                                          ML_MODEL, STRING_ENCODING,
                                          early_stopping=False).test[0][1]
print(f"Test RMSE with no early stopping: {test_rmse_no_earlystopping:.3f}")

Test RMSE with no early stopping: 2.692


In [7]:
test_rmse_earlystopping = load_results(path_to_output,
                                       TASK, AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
                                       AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
                                       ML_MODEL, STRING_ENCODING,
                                       early_stopping=True).test[0][1]
print(f"Test RMSE with early stopping: {test_rmse_earlystopping:.3f}")

Test RMSE with early stopping: 1.920
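
To put the two values printed above side by side, the absolute and relative change in test RMSE can be computed directly. This is a minimal sketch: the function name `relative_improvement` is hypothetical, and the two numbers are copied from the outputs above rather than reloaded from the result files.

```python
def relative_improvement(baseline_rmse, new_rmse):
    """Fraction by which the RMSE decreased relative to the baseline."""
    return (baseline_rmse - new_rmse) / baseline_rmse


# Values copied from the two cell outputs above.
rmse_no_earlystopping = 2.692
rmse_earlystopping = 1.920

absolute_change = rmse_no_earlystopping - rmse_earlystopping
relative_change = relative_improvement(rmse_no_earlystopping, rmse_earlystopping)

print(f"Absolute RMSE reduction: {absolute_change:.3f}")
print(f"Relative RMSE reduction: {relative_change:.1%}")
```

For these two values the reduction is 0.772 in absolute terms, or about 28.7% of the RMSE obtained without early stopping.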
