<h1>Evaluation of metric results</h1>

Results can be in one of two states when they are evaluated:
1) 'Raw' - the results have not yet been processed.
2) 'Metric results' - the raw results have been turned into metric results over n folds.

This notebook looks at 'Metric results'; see evaluation_raw_results.ipynb for 'Raw' results.

<h2> Metric results </h2>
Metric results are those where the raw results have been processed and a csv file in the
following format has been produced:

<h3>Example file</h3>
see notebooks/example_data/ACC.csv

In [1]:
import csv

import pandas as pd

with open("./example_data/ACC.csv", newline="") as csvfile:
    results = list(csv.reader(csvfile, delimiter=","))

print(pd.DataFrame(results))

# Each result csv will be in the following format:
# | folds    | 0   | 1   | 2   | 3   | ... |
# ------------------------------------------
# | dataset1 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
# | dataset2 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
# | dataset3 | 0.0 | 0.0 | 0.0 | 0.0 | ... |
# | dataset4 | 0.0 | 0.0 | 0.0 | 0.0 | ... |

                0                     1                    2   \
0            folds                     0                    1   
1            ACSF1                  0.12                 0.14   
2            Adiac  0.020460358056265986  0.05115089514066496   
3        ArrowHead                  0.08  0.29714285714285715   
4             Beef   0.23333333333333334  0.26666666666666666   
..             ...                   ...                  ...   
104          Wafer     0.373134328358209    0.373134328358209   
105           Wine    0.5185185185185185   0.5185185185185185   
106          Worms   0.22077922077922077   0.1038961038961039   
107  WormsTwoClass    0.5714285714285714  0.42857142857142855   
108           Yoga                 0.536                0.536   

                      3                     4                    5   \
0                      2                     3                    4   
1                   0.05                  0.02                 0.13   
2    0
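
The frame printed above holds every cell as a string, because `csv.reader` does not convert types. As a minimal sketch, assuming the layout described in the cell comment (a `folds` header row, one row per dataset, one column per fold), a file of this kind can also be loaded with `pandas.read_csv` so that the fold scores are numeric. The inline csv text, dataset names and scores are made up for illustration and are not taken from ACC.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for a metric results file such as ACC.csv;
# the dataset names and scores below are illustrative only.
csv_text = """folds,0,1,2
dataset1,0.10,0.20,0.30
dataset2,0.50,0.50,0.80
"""

# Use the first column (dataset names) as the index so that every
# remaining column is a numeric fold score.
scores = pd.read_csv(io.StringIO(csv_text), index_col="folds")

# Mean and standard deviation over the folds for each dataset.
summary = pd.DataFrame(
    {"mean": scores.mean(axis=1), "std": scores.std(axis=1)}
)
print(summary)
```

Reading the dataset names into the index keeps them out of the numeric columns, so row-wise statistics need no extra filtering.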

<h2>Metric results from 'raw' results</h2>

In [2]:
# Processing 'raw' results with this package's functionality returns a
# List[Dict]. See evaluation_raw_results.ipynb for more information.
from sktime_estimator_evaluation.evaluation import (
    CLUSTER_METRIC_CALLABLES,
    evaluate_raw_results,
)

returned_metric_results = evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=CLUSTER_METRIC_CALLABLES,  # List of metrics to evaluate the results with
)

# The returned format can now be used across this package

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean


<h2>Metric results from output directory from 'raw' results</h2>

In [3]:
# The function above can also write the metric results to an output directory.
# See evaluation_raw_results.ipynb for more information.
from sktime_estimator_evaluation.evaluation import evaluate_metric_results

evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=CLUSTER_METRIC_CALLABLES,  # List of metrics to evaluate the results with
    output_dir="./example_data/output/clustering_results",  # Path to the output directory
)

# This will output a file we can read in using the following functionality
metric_results_from_file = evaluate_metric_results(
    path="./example_data/output/clustering_results/"
)
print(metric_results_from_file)

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean
[{'metric_name': 'ACC', 'test_estimator_results': [{'estimator_name': 'kmeans-dtw', 'result':        folds        0         1         2         3         4         5  \
0      ACSF1  0.12000  0.140000  0.050000  0.020000  0.130000  0.060000   
1      Adiac  0.02046  0.051151  0.038363  0.030691  0.048593  0.046036   
2  ArrowHead  0.08000  0.297143  0.188571  0.371429  0.297143  0.114286   

          6        7         8  ...        20        21        22        23  \
0  0.140000  0.06000  0.000000  ...  0.110000  0.010000  0.150000  0.210000   
1  0.015345  0.01023  0.030691  ...  0.035806  0.071611  0.040921  0.015345   
2  0.371429  0.44000  0.542857  ...  0.148571  0.371429  0.542857  0.428571   

         24        25        26        27        28        29  
0  0.110000  0.010000  0.050000  0.030000  0.080000  0.120000  
1  0.025575  0.005115  0.07161
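
The printed output shows the shape of the returned value: a list with one dictionary per metric, each holding a list of per-estimator dictionaries whose `result` is a dataframe of fold scores. The sketch below walks a hand-built structure of that shape. Only the keys visible in the output above (`metric_name`, `test_estimator_results`, `estimator_name`, `result`) are used; the scores are made up, and any further keys the package may return are not covered.

```python
import pandas as pd

# Hand-built stand-in mirroring the printed structure; scores are made up.
metric_results = [
    {
        "metric_name": "ACC",
        "test_estimator_results": [
            {
                "estimator_name": "kmeans-dtw",
                "result": pd.DataFrame(
                    {
                        "folds": ["ACSF1", "Adiac"],
                        "0": [0.1, 0.2],
                        "1": [0.3, 0.4],
                    }
                ),
            },
        ],
    },
]

rows = []
for metric in metric_results:
    for estimator in metric["test_estimator_results"]:
        frame = estimator["result"]
        # Every column other than the dataset name column is a fold score.
        fold_means = frame.drop(columns="folds").mean(axis=1)
        for dataset, mean_score in zip(frame["folds"], fold_means):
            rows.append(
                (
                    metric["metric_name"],
                    estimator["estimator_name"],
                    dataset,
                    mean_score,
                )
            )

for row in rows:
    print(row)
```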

<h2>Metric result from an unformatted directory </h2>

In [4]:
# If you have a directory of metric csvs that is not in the format output by
# this package, you need to define how the estimator name, the metric name and
# the split are derived from each file. This is done with a function that takes
# a single path parameter, the path to the csv, and returns three values:
# estimator_name, metric_name and split.
import platform


def custom_classification(path: str):
    # Check os to determine split value
    if "Windows" in platform.platform():
        split_subdir = path.split("\\")
    else:
        split_subdir = path.split("/")
    metric_name = "ACC"
    file_name_split = split_subdir[-1].split("_")
    estimator_name = file_name_split[0]
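    # Assumes file names of the form <estimator>_<SPLIT>FOLDS..., so the split is
    # taken from the second part of the name (index 0 is the estimator name).
    split = file_name_split[1].split("FOLDS")[0].lower()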
    return estimator_name, metric_name, split


classification_results = evaluate_metric_results("../results/", custom_classification)

print(classification_results)

[{'metric_name': 'ACC', 'test_estimator_results': [{'estimator_name': 'Arsenal', 'result':             folds:         0         1         2         3         4  \
0            ACSF1  0.890000  0.780000  0.790000  0.840000  0.800000   
1            Adiac  0.777494  0.772379  0.749361  0.746803  0.795396   
2        ArrowHead  0.817143  0.862857  0.845714  0.897143  0.857143   
3             Beef  0.833333  0.800000  0.733333  0.666667  0.900000   
4        BeetleFly  0.900000  0.850000  0.800000  0.950000  0.850000   
..             ...       ...       ...       ...       ...       ...   
107           Wine  0.833333  0.944444  0.944444  0.907407  0.870370   
108   WordSynonyms  0.742947  0.769592  0.755486  0.747649  0.763323   
109          Worms  0.727273  0.662338  0.740260  0.753247  0.688312   
110  WormsTwoClass  0.779221  0.805195  0.831169  0.753247  0.792208   
111           Yoga  0.905333  0.916333  0.905667  0.921333  0.897667   

            5         6         7         8 
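
To make the contract of the custom function concrete, here is a small standalone sketch of a parser that can be run without any result files. It assumes file names of the form `<estimator>_<SPLIT>FOLDS.csv`; that naming scheme, the example paths and the name `parse_result_path` are hypothetical and chosen only for illustration. It uses `pathlib.PureWindowsPath`, which accepts both `\` and `/` as separators, instead of checking the operating system.

```python
from pathlib import PureWindowsPath
from typing import Tuple


def parse_result_path(path: str) -> Tuple[str, str, str]:
    """Return (estimator_name, metric_name, split) for one csv path."""
    # PureWindowsPath treats both "\\" and "/" as separators, so the file
    # name is extracted the same way on any operating system.
    stem = PureWindowsPath(path).stem  # file name without the extension
    estimator_name, split_part = stem.split("_", maxsplit=1)
    # "TESTFOLDS" -> "test", "TRAINFOLDS" -> "train"
    split = split_part.split("FOLDS")[0].lower()
    # The metric is fixed here, as in the cell above.
    return estimator_name, "ACC", split


# Hypothetical paths, one per separator style.
print(parse_result_path("../results/Arsenal_TESTFOLDS.csv"))
print(parse_result_path("..\\results\\Arsenal_TRAINFOLDS.csv"))
```

Checking the parser on a few literal paths like this before passing it to `evaluate_metric_results` helps catch indexing mistakes early.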

<h2>Summary data</h2>

In [6]:
# The above gives a list of dictionaries which, while fine for storage, is not
# very easy to pass through functions. We can summarise all of the above in one
# dataframe by aggregating over the folds.
# This dataframe will be of the format:
#         -----------------------------------------------
#         | estimator | dataset | metric1  | metric2 |
#         | cls1      | data1   | 1.2      | 1.2     |
#         | cls2      | data2   | 3.4      | 1.4     |
#         | cls1      | data2   | 1.4      | 1.3     |
#         | cls2      | data1   | 1.3      | 1.2     |
#         -----------------------------------------------
# How to produce this is shown below.
from sktime_estimator_evaluation.evaluation import metric_result_to_summary

# We can read the results in from file or use the value returned by the evaluate function
metrics_from_raw_results = evaluate_raw_results(
    experiment_name="example_experiment_name",  # The name of the experiment
    path="./example_data/clustering_results",  # Path to the raw files
    metrics=CLUSTER_METRIC_CALLABLES,  # List of metrics to evaluate the results with
)

metric_results_from_file = evaluate_metric_results(
    path="./example_data/output/clustering_results/"
)

df_from_raw_results = metric_result_to_summary(metrics_from_raw_results)
print(df_from_raw_results)

df_from_file = metric_result_to_summary(metric_results_from_file)
print(df_from_file)

evaluating estimator:  clustering_results
----> evaluating experiment:  kmeans-dtw
----> evaluating experiment:  kmeans-euclidean
          estimator    dataset        RI       AMI       NMI       ARI  \
0        kmeans-dtw      ACSF1  0.660040  0.258515  0.376401  0.100716   
1        kmeans-dtw      Adiac  0.934916  0.475209  0.626178  0.240566   
2        kmeans-dtw  ArrowHead  0.634654  0.245863  0.254177  0.208196   
3  kmeans-euclidean      ACSF1  0.682424  0.236444  0.358955  0.098812   
4  kmeans-euclidean      Adiac  0.947642  0.427568  0.611580  0.244806   
5  kmeans-euclidean  ArrowHead  0.566174  0.271135  0.282123  0.178624   

         MI       ACC  
0  0.696623  0.089667  
1  2.042572  0.027451  
2  0.268425  0.320190  
3  0.673076  0.140000  
4  2.068850  0.010230  
5  0.246564  0.165714  
          estimator    dataset       ACC       AMI       ARI        MI  \
0        kmeans-dtw      ACSF1  0.089667  0.258515  0.100716  0.696623   
1        kmeans-dtw      Adiac  0.0
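
The summary layout can be reproduced with plain pandas. The sketch below is not the implementation of `metric_result_to_summary`; it assumes the aggregation is a mean over folds, which the source does not state, and it uses made-up scores. It builds one long row per (estimator, dataset, metric) and then pivots the metrics into columns.

```python
import pandas as pd

# Made-up per-fold scores keyed by (metric, estimator); the names follow
# the output above but the numbers are illustrative only.
fold_scores = {
    ("ACC", "kmeans-dtw"): pd.DataFrame(
        {"folds": ["ACSF1", "Adiac"], "0": [0.10, 0.02], "1": [0.14, 0.04]}
    ),
    ("ARI", "kmeans-dtw"): pd.DataFrame(
        {"folds": ["ACSF1", "Adiac"], "0": [0.10, 0.20], "1": [0.30, 0.40]}
    ),
}

long_rows = []
for (metric_name, estimator_name), frame in fold_scores.items():
    # Assumption: folds are aggregated with a mean per dataset.
    means = frame.set_index("folds").mean(axis=1)
    for dataset, value in means.items():
        long_rows.append(
            {
                "estimator": estimator_name,
                "dataset": dataset,
                "metric": metric_name,
                "value": value,
            }
        )

long_df = pd.DataFrame(long_rows)

# One row per (estimator, dataset) and one column per metric.
summary = long_df.pivot_table(
    index=["estimator", "dataset"], columns="metric", values="value"
).reset_index()
summary.columns.name = None
print(summary)
```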