<h1>Evaluation of raw results</h1>

Results can be in one of two states when they are evaluated:
1) 'Raw' - the results have not yet been processed.
2) 'Metric results' - the raw results have been turned into metric results over n folds.

This notebook looks at 'raw' results; see evaluation_metric_results.ipynb for metric results.

<h2> Raw results </h2>
Raw results are those yet to be processed. A result for an estimator in the 'raw' format
is a csv file with the following layout:

<b>File name</b>: \<split>Resample\<resample number>.csv e.g. testResample0.csv
<b>First line</b>: \<dataset name>,\<estimator name>,\<split>,\<run time value>,\<run time unit>,\<extra info>
<b>Second line</b>: \<dict containing parameters passed to estimator>
<b>Third line</b>: \<TBD>
<b>Fourth line and onwards</b>: \<true y class>,\<predicted y class>,\<num classes>,\<proba class 1>,\<proba class 2>,...,\<proba class n>
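
As an illustration of the first-line layout above, the following sketch splits a header line into named fields. `RawResultHeader` and `parse_header` are hypothetical names introduced here and are not part of `sktime_estimator_evaluation`. Treating everything after the fifth field as extra info is an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RawResultHeader:
    """Fields of the first line of a raw results file (hypothetical helper)."""

    dataset_name: str
    estimator_name: str
    split: str
    run_time: float
    run_time_unit: str
    extra_info: str


def parse_header(line: str) -> RawResultHeader:
    # Use the csv module so that quoted fields containing commas are handled.
    fields = next(csv.reader(io.StringIO(line)))
    dataset, estimator, split, run_time, unit, *extra = fields
    # Assumption: anything after the fifth field is free-form extra info.
    return RawResultHeader(
        dataset, estimator, split, float(run_time), unit, ",".join(extra)
    )


header = parse_header(
    "ACSF1,kmeans-ddtw,test,0,MILLISECONDS,"
    "Generated by clustering_experiments on 2022-05-01 16:30:44.347516"
)
print(header.estimator_name, header.run_time, header.run_time_unit)
```

A dataclass keeps the field order of the file format visible in one place, which may make it easier to spot a malformed header early.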

<h3>Example file</h3>
See notebooks/example_data/testResample0.csv

In [1]:
import csv

import pandas as pd

line_one = None
line_two = None
line_three = None
results = []
with open("./example_data/testResample0.csv", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter=",")
    line_one = next(reader)
    line_two = next(reader)
    line_three = next(reader)
    for row in reader:
        results.append(row)

print(f"First line: {line_one}\n")
print(f"Second line: {line_two}\n")
print(f"Third line: {line_three}\n")
print(f"Results: {pd.DataFrame(results)}")

First line: ['ACSF1', 'kmeans-ddtw', 'test', '0', 'MILLISECONDS', 'Generated by clustering_experiments on 2022-05-01 16:30:44.347516']

Second line: ["{'average_params': {'averaging_distance_metric': 'ddtw'}", " 'averaging_method': 'mean'", " 'distance_params': {'window': 1.0", " 'epsilon': 0.05", " 'g': 0.05", " 'c': 1}", " 'init_algorithm': 'random'", " 'max_iter': 300", " 'metric': 'ddtw'", " 'n_clusters': 10", " 'n_init': 10", " 'random_state': 1", " 'tol': 1e-06", " 'verbose': False}"]

Third line: ['0', '19307', '19283', '-1', '-1', '10', '10']

Results:    0  1  2    3    4    5    6    7    8    9    10   11   12
0   9  9     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
1   9  9     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
2   9  9     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
3   9  9     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
4   9  9     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  1.0
.. .. .. ..  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
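
The second line printed above comes back as a list of fragments because `csv.reader` splits the parameter dict on its commas. One possible way to recover the dict is to rejoin the fragments and parse them with `ast.literal_eval`, as sketched below. `parse_params` is a hypothetical helper, and the input is a shortened copy of the fragments shown in the output. The sketch assumes the line holds only Python literals and no double-quoted fields.

```python
import ast

# Fragments as returned by csv.reader for the second line
# (shortened from the printed output above).
line_two = [
    "{'average_params': {'averaging_distance_metric': 'ddtw'}",
    " 'averaging_method': 'mean'",
    " 'n_clusters': 10",
    " 'tol': 1e-06",
    " 'verbose': False}",
]


def parse_params(fields):
    """Rebuild the parameter dict from comma-split fields (hypothetical helper)."""
    # Rejoining with the delimiter restores the original text of the line.
    text = ",".join(fields)
    # literal_eval only evaluates Python literals, so it does not execute code.
    return ast.literal_eval(text)


params = parse_params(line_two)
print(params["n_clusters"], params["average_params"])
```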

<h2>Evaluating raw results</h2>

In [6]:
from sktime_estimator_evaluation.evaluation import (
    CLUSTER_METRIC_CALLABLES,
    evaluate_raw_results,
)

results = evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=CLUSTER_METRIC_CALLABLES,  # List of metrics to evaluate the results with
)
# The return value will take the format of:
#     List[Dict]
#         A list of metric results. Each metric will take the form:
#         {
#             'metric_name': str,
#             'test_estimator_results': [
#                 {
#                     'estimator_name': str,
#                     'result': pd.DataFrame
#                 },
#                 {
#                     'estimator_name': str,
#                     'result': pd.DataFrame
#                 }
#             ],
#             'train_estimator_results': [
#                 {
#                     'estimator_name': str,
#                     'result': pd.DataFrame
#                 },
#                 {
#                     'estimator_name': str,
#                     'result': pd.DataFrame
#                 }
#             ],
#         }
#         Each result dataframe will be in the following format:
#         | folds    | 0   | 1   | 2   | 3   | ... |
#         ------------------------------------------
#         | dataset1 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
#         | dataset2 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
#         | dataset3 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
#         | dataset4 | 0.0 | 0.0 | 0.0 | 0.0 | ... |

evaluating estimator:  kmeans_dba
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-msm
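
The return structure described in the comments above can be traversed with plain loops. The sketch below builds a small mock list in that shape (the scores are made up for illustration) and averages each dataset's scores over the folds. `mean_over_folds` is a hypothetical helper and not part of the library.

```python
import pandas as pd

# Mock data following the documented return structure; the values are made up.
mock_results = [
    {
        "metric_name": "ACC",
        "test_estimator_results": [
            {
                "estimator_name": "kmeans-dtw",
                "result": pd.DataFrame(
                    {"folds": ["dataset1", "dataset2"], 0: [0.5, 0.8], 1: [0.7, 0.6]}
                ),
            }
        ],
        "train_estimator_results": [],
    }
]


def mean_over_folds(results, split="test"):
    """Map (metric, estimator) to a Series of per-dataset means (hypothetical helper)."""
    summary = {}
    for metric in results:
        for entry in metric[f"{split}_estimator_results"]:
            # The 'folds' column holds dataset names; the rest are per-fold scores.
            scores = entry["result"].set_index("folds")
            key = (metric["metric_name"], entry["estimator_name"])
            summary[key] = scores.mean(axis=1)
    return summary


summary = mean_over_folds(mock_results)
print(summary[("ACC", "kmeans-dtw")])
```

Passing `split="train"` reads the `train_estimator_results` list instead.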


In [3]:
# If we want to write the results to a file we can specify an output directory.
results_output = evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=CLUSTER_METRIC_CALLABLES,  # List of metrics to evaluate the results with
    output_dir="./example_data/output/clustering_results",  # Path to the output directory
)

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean


In [4]:
# If we only want specific built-in metrics we can pass their names as strings in the metrics list.
results_str_metric = evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=["ACC", "RI", "AMI"],  # List of metrics to evaluate the results with
)

# The full list of available clustering metrics is:
# RI (rand index), AMI (adjusted mutual information), ACC (accuracy), NMI (normalized mutual information), ARI (adjusted rand index), MI (mutual information)
valid_cluster_metrics = ["RI", "AMI", "ACC", "NMI", "ARI", "MI"]
# The full list of available classification metrics is:
# ACC (accuracy), F1 (f1 score), Precision (precision score), Recall (recall score),
# Jacard (jaccard score), ROC_AUC (roc auc score), Brier (brier score),
# Log_Loss (log loss score), Balanced_Accuracy (balanced accuracy score),
# Top_k_Accuracy (top k accuracy score), Average_Precision (average precision score)
valid_classification_metrics = [
    "ACC",
    "F1",
    "Precision",
    "Recall",
    "Jacard",
    "ROC_AUC",
    "Brier",
    "Log_Loss",
    "Balanced_Accuracy",
    "Top_k_Accuracy",
    "Average_Precision",
]

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean
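
Because metric names are passed as plain strings, a typo may only surface once evaluation starts. The sketch below checks names against the list of clustering metrics given above before anything is run. `check_metric_names` is a hypothetical helper. It uses exact, case-sensitive matching, and whether the library itself is case-sensitive is not stated here.

```python
VALID_CLUSTER_METRICS = ["RI", "AMI", "ACC", "NMI", "ARI", "MI"]


def check_metric_names(metrics, valid=VALID_CLUSTER_METRICS):
    """Raise ValueError if a string metric name is not in `valid` (hypothetical helper)."""
    # Only strings are checked; callables and dicts are passed through untouched.
    unknown = [m for m in metrics if isinstance(m, str) and m not in valid]
    if unknown:
        raise ValueError(f"Unknown metric name(s): {unknown}. Valid names: {valid}")
    return list(metrics)


print(check_metric_names(["ACC", "RI", "AMI"]))
```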


In [5]:
# Custom metrics, or custom parameters for existing metrics
# A custom metric function must accept exactly two parameters: the true labels and the predicted labels.
from sklearn.metrics import normalized_mutual_info_score
from sktime_estimator_evaluation.evaluation import MetricCallable


# In this example we want to use the normalized mutual info score with an additional
# parameter, so we wrap it in a function like so:
def example_custom_NMI(true_labels, predicted_labels):
    return normalized_mutual_info_score(
        true_labels, predicted_labels, average_method="min"
    )


# We then construct a custom MetricCallable object OR dict.
custom_metric = MetricCallable(name="custom_NMI", callable=example_custom_NMI)
dict_custom_metric = {"name": "custom_dict_NMI", "callable": example_custom_NMI}
# Both are valid

result_custom_metric = evaluate_raw_results(
    experiment_name="example_experiment_name",
    path="./example_data/clustering_results",
    metrics=[
        custom_metric,
        dict_custom_metric,
        "ACC",
    ],  # We can still use other metrics
)

print(result_custom_metric)
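
A custom metric does not have to wrap an existing scikit-learn function. As a minimal sketch, the cluster purity score below is written from scratch with the same two-parameter signature (true labels, predicted labels). `purity_score` is a name introduced here for illustration, and the dict at the end mirrors the `name`/`callable` keys used in the cell above.

```python
from collections import Counter, defaultdict


def purity_score(true_labels, predicted_labels):
    """Cluster purity: fraction of items in the majority class of their cluster."""
    # For each predicted cluster, count how often each true class occurs in it.
    clusters = defaultdict(Counter)
    for true, pred in zip(true_labels, predicted_labels):
        clusters[pred][true] += 1
    total = sum(sum(counts.values()) for counts in clusters.values())
    if total == 0:
        raise ValueError("purity is undefined for empty input")
    # Sum the size of the largest true class within each cluster.
    majority = sum(max(counts.values()) for counts in clusters.values())
    return majority / total


# Dict form of a metric, mirroring the keys used in the cell above.
purity_metric = {"name": "purity", "callable": purity_score}

print(purity_score([0, 0, 1, 1], [1, 1, 0, 1]))
```

Assuming the dict form is accepted as shown in the cell above, `purity_metric` could be added to the `metrics` list alongside names such as `"ACC"`.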

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean
[{'metric_name': 'custom_NMI', 'test_estimator_results': [{'estimator_name': 'kmeans-dtw', 'result':        folds         0         1         2         3         4         5  \
0      ACSF1  0.471553  0.511391  0.496259  0.503164  0.536588  0.519174   
1      Adiac  0.687087  0.671689  0.689086  0.699429  0.693124  0.697099   
2  ArrowHead  0.271769  0.247246  0.247246  0.247246  0.247246  0.247246   

          6         7         8  ...        20        21        22        23  \
0  0.513126  0.486407  0.498182  ...  0.553775  0.510917  0.521075  0.502281   
1  0.705582  0.705026  0.693871  ...  0.698392  0.698325  0.703904  0.684360   
2  0.271769  0.247246  0.278141  ...  0.278141  0.247246  0.278141  0.278141   

         24        25        26        27        28        29  
0  0.498182  0.504014  0.488849  0.455637  0.480972  0.468552  
1  0.700289  0.