# maxsmi
## Analysis of results

This notebook analyses the results of the simulations run on the Curta cluster at the Freie Universität Berlin.

### Early stopping

Simulations can be run using the following command:
```
(maxsmi) $ python maxsmi/full_workflow_earlystopping.py --task lipophilicity --string-encoding smiles --aug-strategy-train augmentation_without_duplication --aug-strategy-test augmentation_without_duplication --aug-nb-train 5 --aug-nb-test 5 --ml-model CONV1D --eval-strategy True
```


### Goal

The aim of this notebook is to compare the results for a subset of models that were trained with and without early stopping.

The following models were trained, listed as task: (model, augmentation strategy, augmentation number):

- ESOL: (CONV2D, Augmentation with duplication, 4),
- FreeSolv: (RNN, Augmentation with reduced duplication, 3),
- Lipophilicity: (CONV1D, Augmentation without duplication, 5).

Moreover, we use a t-test to determine whether the means of the two groups differ significantly.

In [1]:
import os
from pathlib import Path
import pickle
from scipy import stats

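# Path to this notebook (`_dh` is IPython's directory history, so this only works inside IPython/Jupyter)
HERE = Path(_dh[-1])

# The output folders (output1, ...) are read from the parent directory of the notebook
path_to_output = HERE.parents[0]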

In [2]:
def load_results(
    path,
    task,
    augmentation_strategy_train,
    train_augmentation,
    augmentation_strategy_test,
    test_augmentation,
    ml_model,
    string_encoding="smiles",
    trial=1,
    early_stopping=False,
):
    """Load the pickled metrics of one run, trained with or without early stopping."""
    # Early-stopping runs are stored in a folder with an "_earlystopping" suffix.
    suffix = "_earlystopping" if early_stopping else ""
    with open(
        f"{path}/output{trial}/{task}_{string_encoding}_{augmentation_strategy_train}_"
        f"{train_augmentation}_{augmentation_strategy_test}_"
        f"{test_augmentation}_{ml_model}{suffix}/"
        f"results_metrics.pkl",
        "rb",
    ) as f:
        data = pickle.load(f)

    return data

In [3]:
TASK = "ESOL"
ML_MODEL = "CONV2D"
AUGMENTATION_STRATEGY = "augmentation_with_duplication"
AUGMENTATION_NUMBER = 4
STRING_ENCODING = "smiles"

In [4]:
TASK = "Lipophilicity"
ML_MODEL = "CONV1D"
AUGMENTATION_STRATEGY = "augmentation_without_duplication"
AUGMENTATION_NUMBER = 5
STRING_ENCODING = "smiles"

In [5]:
TASK = "FreeSolv"
ML_MODEL = "RNN"
AUGMENTATION_STRATEGY = "augmentation_with_reduced_duplication"
AUGMENTATION_NUMBER = 3
STRING_ENCODING = "smiles"
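
The three cells above assign the same names, so if the notebook is run from top to bottom only the last assignment stays in effect. One possible way to make the choice explicit is sketched below. The names `CONFIGS` and `SELECTED_TASK` are hypothetical and not part of the original notebook; the values are the ones listed in the Goal section.

```python
# Hypothetical container (not part of the original notebook) holding the
# three configurations listed in the Goal section.
CONFIGS = {
    "ESOL": {
        "ml_model": "CONV2D",
        "augmentation_strategy": "augmentation_with_duplication",
        "augmentation_number": 4,
    },
    "FreeSolv": {
        "ml_model": "RNN",
        "augmentation_strategy": "augmentation_with_reduced_duplication",
        "augmentation_number": 3,
    },
    "Lipophilicity": {
        "ml_model": "CONV1D",
        "augmentation_strategy": "augmentation_without_duplication",
        "augmentation_number": 5,
    },
}

# Pick one task; change to "ESOL" or "Lipophilicity" as needed.
SELECTED_TASK = "FreeSolv"
config = CONFIGS[SELECTED_TASK]

# Same variable names as in the cells above.
TASK = SELECTED_TASK
ML_MODEL = config["ml_model"]
AUGMENTATION_STRATEGY = config["augmentation_strategy"]
AUGMENTATION_NUMBER = config["augmentation_number"]
STRING_ENCODING = "smiles"
```

With this layout, switching task means editing a single line instead of re-running a specific cell.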

In [6]:
test_rmse_no_earlystopping = load_results(
    path_to_output, TASK, AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
    AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER, ML_MODEL, STRING_ENCODING,
    trial=1, early_stopping=False,
).test[0][1]
test_rmse_no_earlystopping

1.9367644946937959

In [7]:
test_rmse_earlystopping = load_results(
    path_to_output, TASK, AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER,
    AUGMENTATION_STRATEGY, AUGMENTATION_NUMBER, ML_MODEL, STRING_ENCODING,
    trial=1, early_stopping=True,
).test[0][1]
test_rmse_earlystopping

2.1166053526382274
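
To put the two test RMSE values side by side, the short sketch below computes their absolute and relative difference. The numbers are copied from the two outputs above; the variable names are chosen for illustration only. This compares a single pair of values and is not a statistical test.

```python
# Test RMSE values copied from the outputs of In [6] and In [7] above.
rmse_no_earlystopping = 1.9367644946937959
rmse_earlystopping = 2.1166053526382274

# Positive values mean the early-stopping run has the higher test RMSE.
absolute_difference = rmse_earlystopping - rmse_no_earlystopping
relative_difference = absolute_difference / rmse_no_earlystopping

print(f"absolute difference: {absolute_difference:.4f}")
print(f"relative difference: {relative_difference:.1%}")
```

For this trial, the early-stopping run has a test RMSE about 0.18 higher, roughly 9% above the run without early stopping.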

### T-test to determine the statistical difference

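_Note_: A p-value larger than the chosen significance threshold (e.g. 5% or 1%) means that the null hypothesis of equal means cannot be rejected: a difference at least as large as the one observed would not be unusual if the two groups had the same mean.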
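
As a rough sketch of what the test computes: with its default equal-variance setting, `stats.ttest_ind` is based on Student's pooled-variance statistic, which can be written out with the standard library alone. The function name `pooled_t_statistic` is hypothetical and the sample values are made up for illustration; they are not results from this notebook. Turning the statistic into a p-value additionally requires the t-distribution, which is left to SciPy.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Student's t statistic for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Degrees of freedom of the pooled estimate.
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / dof
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, dof


# Made-up numbers for illustration only; not results from this notebook.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 3.0, 4.0, 5.0, 6.0]
t_stat, dof = pooled_t_statistic(group_a, group_b)
print(t_stat, dof)
```

The further the statistic lies from zero for a given number of degrees of freedom, the smaller the resulting p-value.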

In [8]:
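# Illustration on two synthetic samples of size 80, both drawn from the same
# normal distribution (mean 5, standard deviation 10).
rvs1 = stats.norm.rvs(loc=5, scale=10, size=80)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=80)
stats.ttest_ind(rvs1, rvs2)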

Ttest_indResult(statistic=1.1893322976106901, pvalue=0.23609318910730553)
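
The decision rule from the note above can be applied to this output as follows. This is a minimal sketch: the threshold of 5% is one of the example values mentioned in the note, and the names `ALPHA` and `reject_null` are chosen for illustration.

```python
# Significance threshold; 5% is one of the example values from the note above.
ALPHA = 0.05

# p-value copied from the Ttest_indResult printed above.
p_value = 0.23609318910730553

# Reject the null hypothesis of equal means only if the p-value is below the threshold.
reject_null = p_value < ALPHA
if reject_null:
    print("Reject the null hypothesis: the means differ significantly.")
else:
    print("Cannot reject the null hypothesis of equal means.")
```

Here the p-value is well above 5%, which is consistent with both samples having been drawn from the same distribution.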