# maxsmi
## Analysis of results

This notebook analyses the results of the simulations run on the Curta cluster at the Freie Universität Berlin.

### Early stopping

Simulations with early stopping can be run using a command such as the following:
```
(maxsmi) $ python maxsmi/full_workflow_earlystopping.py --task lipophilicity --string-encoding smiles --aug-strategy-train augmentation_without_duplication --aug-strategy-test augmentation_without_duplication --aug-nb-train 5 --aug-nb-test 5 --ml-model CONV1D --eval-strategy True
```


### Goal

The aim of this notebook is to compare the results for a subset of models that were trained with and without early stopping.

The trained models, listed per task as (model, augmentation strategy, number of augmentations), are:

- ESOL: (CONV2D, Augmentation with duplication, 4),
- FreeSolv: (RNN, Augmentation with reduced duplication, 3),
- Lipophilicity: (CONV1D, Augmentation without duplication, 5).
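
The three configurations listed above could also be kept in a single mapping, so that switching task does not depend on which configuration cell was executed last. The following is a minimal sketch: the name `CONFIGS` and its keys are hypothetical and not part of maxsmi, while the values are copied from the configuration cells of this notebook.

```python
# Hypothetical container (not part of maxsmi): one entry per task,
# with values copied from the configuration cells of this notebook.
CONFIGS = {
    "ESOL": {
        "ml_model": "CONV2D",
        "augmentation_strategy": "augmentation_with_duplication",
        "augmentation_number": 4,
    },
    "FreeSolv": {
        "ml_model": "RNN",
        "augmentation_strategy": "augmentation_with_reduced_duplication",
        "augmentation_number": 3,
    },
    "Lipophilicity": {
        "ml_model": "CONV1D",
        "augmentation_strategy": "augmentation_without_duplication",
        "augmentation_number": 5,
    },
}

# Show the configuration of every task.
for task, config in CONFIGS.items():
    print(
        task,
        config["ml_model"],
        config["augmentation_strategy"],
        config["augmentation_number"],
    )
```

With such a mapping, the analysis could loop over all tasks instead of being rerun after editing the constants.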

In [1]:
import os
from pathlib import Path
import pickle
import numpy as np

# Directory containing this notebook (`_dh` is IPython's directory history)
HERE = Path(_dh[-1])

path_to_output = HERE.parents[0]

In [2]:
TASK = "FreeSolv"
ML_MODEL = "RNN"
AUGMENTATION_STRATEGY = "augmentation_with_reduced_duplication"
AUGMENTATION_NUMBER = 3
STRING_ENCODING = "smiles"

In [3]:
TASK = "ESOL"
ML_MODEL = "CONV2D"
AUGMENTATION_STRATEGY = "augmentation_with_duplication"
AUGMENTATION_NUMBER = 4
STRING_ENCODING = "smiles"

In [4]:
TASK = "Lipophilicity"
ML_MODEL = "CONV1D"
AUGMENTATION_STRATEGY = "augmentation_without_duplication"
AUGMENTATION_NUMBER = 5
STRING_ENCODING = "smiles"

In [5]:
def test_RMSE(path,
              task,
              augmentation_strategy_train,
              train_augmentation,
              augmentation_strategy_test,
              test_augmentation,
              ml_model,
              string_encoding="smiles",
              trial=1):
    """
    Load and print the test RMSE of simulations run with and without early stopping.

    If a results file is missing, the corresponding RMSE is reported as NaN.

    Parameters
    ----------
    path : str or pathlib.Path
        The path to the output folder.
    task : str
        The data with associated task, e.g. "ESOL", "FreeSolv".
    augmentation_strategy_train : str
        The augmentation strategy used on the train set.
    train_augmentation : int
        The number of augmentations on the train set.
    augmentation_strategy_test : str
        The augmentation strategy used on the test set.
    test_augmentation : int
        The number of augmentations on the test set.
    ml_model : str
        The machine learning model, e.g. "CONV1D".
    string_encoding : str, default "smiles"
        The molecular encoding.
    trial : int, default 1
        Iteration of the training process.

    Returns
    -------
    None
        The RMSE values are printed, not returned.
    """

    try:
        with open(
                f"{path}/output_early_stopping/output_{trial}/"
                f"{task}_{string_encoding}_{augmentation_strategy_train}_"
                f"{train_augmentation}_{augmentation_strategy_test}_"
                f"{test_augmentation}_{ml_model}_earlystopping/"
                f"results_metrics.pkl",
                "rb",
        ) as f:
            data = pickle.load(f)
            test_rmse_earlystopping = data.test[0][1]
    except FileNotFoundError:
        test_rmse_earlystopping = np.nan
    print(f"Test RMSE with early stopping: \t{test_rmse_earlystopping:.3f}")
    
    try:
        with open(
                f"{path}/output_early_stopping/output_{trial}/"
                f"{task}_{string_encoding}_{augmentation_strategy_train}_"
                f"{train_augmentation}_{augmentation_strategy_test}_"
                f"{test_augmentation}_{ml_model}/"
                f"results_metrics.pkl",
                "rb",
        ) as f:
            data = pickle.load(f)
            test_rmse_no_earlystopping = data.test[0][1]
    except FileNotFoundError:
        test_rmse_no_earlystopping = np.nan
    print(f"Test RMSE no early stopping: \t{test_rmse_no_earlystopping:.3f}")
    print("\n")
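
Because `test_RMSE` only prints its results, the values cannot be reused for further analysis. A possible variant returns the RMSE instead. The sketch below is an assumption, not part of maxsmi: the helper name `load_test_rmse` is hypothetical, and the temporary file uses a stand-in object with made-up placeholder values that only mimics the `data.test[0][1]` access pattern used above.

```python
import math
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def load_test_rmse(results_file):
    """Return the test RMSE stored in a results file, or NaN if the file is missing."""
    try:
        with open(results_file, "rb") as f:
            data = pickle.load(f)
    except FileNotFoundError:
        return math.nan
    # Same access pattern as in test_RMSE.
    return data.test[0][1]


# Stand-in for a real results file: the stored object is assumed to expose
# a `test` attribute whose first entry holds the RMSE at index 1.
# The numbers are placeholders, not simulation results.
with tempfile.TemporaryDirectory() as tmp:
    results_file = Path(tmp) / "results_metrics.pkl"
    with open(results_file, "wb") as f:
        pickle.dump(SimpleNamespace(test=[(0.5, 0.7, 0.9)]), f)

    found = load_test_rmse(results_file)
    missing = load_test_rmse(Path(tmp) / "does_not_exist.pkl")

print(found, missing)
```

Returning the value keeps the NaN fallback for missing files while allowing the results of all trials to be collected in a list.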

In [6]:
print(TASK)
for i in range(1, 11):
    print(i)
    test_RMSE(path_to_output,
              TASK,
              AUGMENTATION_STRATEGY,
              AUGMENTATION_NUMBER,
              AUGMENTATION_STRATEGY,
              AUGMENTATION_NUMBER,
              ML_MODEL, STRING_ENCODING,
              trial=i)

Lipophilicity
1
Test RMSE with early stopping: 	0.850
Test RMSE no early stopping: 	0.859


2
Test RMSE with early stopping: 	0.898
Test RMSE no early stopping: 	0.857


3
Test RMSE with early stopping: 	0.853
Test RMSE no early stopping: 	0.852


4
Test RMSE with early stopping: 	0.872
Test RMSE no early stopping: 	0.856


5
Test RMSE with early stopping: 	0.858
Test RMSE no early stopping: 	0.846


6
Test RMSE with early stopping: 	0.883
Test RMSE no early stopping: 	0.847


7
Test RMSE with early stopping: 	0.869
Test RMSE no early stopping: 	0.845


8
Test RMSE with early stopping: 	0.869
Test RMSE no early stopping: 	0.849


9
Test RMSE with early stopping: 	0.876
Test RMSE no early stopping: 	0.857


10
Test RMSE with early stopping: 	0.847
Test RMSE no early stopping: 	0.858
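
To compare the two settings across the ten trials, the printed values can be summarized by their mean and standard deviation. The following sketch copies the Lipophilicity test RMSE values from the output above; the variable names are illustrative.

```python
import statistics

# Test RMSE values copied from the output above (trials 1 to 10).
rmse_early_stopping = [
    0.850, 0.898, 0.853, 0.872, 0.858, 0.883, 0.869, 0.869, 0.876, 0.847,
]
rmse_no_early_stopping = [
    0.859, 0.857, 0.852, 0.856, 0.846, 0.847, 0.845, 0.849, 0.857, 0.858,
]

mean_es = statistics.mean(rmse_early_stopping)
std_es = statistics.stdev(rmse_early_stopping)  # sample standard deviation
mean_no_es = statistics.mean(rmse_no_early_stopping)
std_no_es = statistics.stdev(rmse_no_early_stopping)

# Number of trials in which early stopping gave the lower test RMSE.
trials_es_better = sum(
    es < no_es for es, no_es in zip(rmse_early_stopping, rmse_no_early_stopping)
)

print(f"With early stopping: mean {mean_es:.4f}, std {std_es:.4f}")
print(f"No early stopping:   mean {mean_no_es:.4f}, std {std_no_es:.4f}")
print(f"Trials where early stopping is better: {trials_es_better} of {len(rmse_early_stopping)}")
```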


