# Demo: Binary Classification Engine

This notebook demonstrates how to use `BinaryClassificationEngine`, from `artifact_core` (the core library of Artifact-ML), to evaluate the results of a binary classification experiment.

We'll walk through:

1. Loading the classification results
2. Specifying the label metadata
3. Setting up the validation engine
4. Computing validation artifacts (scores, score collections, arrays, and plots)

## Setup

First, we'll set up our environment and import the necessary libraries.

In [None]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
from pathlib import Path

import pandas as pd
from artifact_core.binary_classification import (
    BinaryClassificationArrayCollectionType,
    BinaryClassificationArrayType,
    BinaryClassificationEngine,
    BinaryClassificationPlotCollectionType,
    BinaryClassificationPlotType,
    BinaryClassificationScoreCollectionType,
    BinaryClassificationScoreType,
    BinaryFeatureSpec,
)
from scipy.special import logit as logt_fn

## Loading the Data

We'll load the classification results from `assets/binary_classification.csv` in the `artifact_core` project directory.

In [None]:
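# Assumes the working directory (by default, the notebook's folder) is two levels below the artifact_core root.
artifact_core_root = Path().absolute().parent.parent

df_classification_results = pd.read_csv(artifact_core_root / "assets/binary_classification.csv")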

Let's examine the classification results to understand their structure and content:

In [None]:
df_classification_results

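Next, we extract the inputs the engine needs: the true labels, the predicted labels, and the estimated positive-class probabilities. We also compute the corresponding logits (log-odds) for reference.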

In [None]:
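# Each .to_dict() call returns {row_index: value}, so all inputs share the same keys.
true = df_classification_results["arthritis_true"].to_dict()
predicted = df_classification_results["arthritis_pred"].to_dict()
probs_pos = df_classification_results["arthritis_prob_est"].to_dict()
# Log-odds of the positive-class probability; not passed to the engine below.
logits = df_classification_results["arthritis_prob_est"].apply(logt_fn).to_dict()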
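The `logits` dictionary holds the log-odds of each estimated probability, logit(p) = log(p / (1 - p)), which is what `scipy.special.logit` computes element-wise. The pure-Python sketch below mirrors that transform on a small, made-up dictionary keyed by row index, the same shape `Series.to_dict()` produces. Unlike SciPy, which returns ±inf at exactly 0 or 1, this sketch raises an error there.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1); mirrors scipy.special.logit inside that range."""
    # math.log / division raise at p == 0 or p == 1, where SciPy would return -inf / +inf.
    return math.log(p / (1.0 - p))


# Hypothetical toy column, keyed by row index as pandas' Series.to_dict() would produce.
probs_pos_toy = {0: 0.5, 1: 0.8, 2: 0.1}
logits_toy = {idx: logit(p) for idx, p in probs_pos_toy.items()}
print(logits_toy)
```

A probability of 0.5 maps to a logit of 0, probabilities above 0.5 map to positive logits, and those below 0.5 to negative ones.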

## Resource Specification Setup

Before evaluating the results, we need to specify the label metadata: the list of categories and which of them is the positive class.

The engine uses this specification to interpret the labels and the positive-class probabilities correctly.

In [None]:
class_spec = BinaryFeatureSpec(ls_categories=["0", "1"], positive_category="1")

class_spec
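Conceptually, declaring a positive category reduces every label to "positive or not". The sketch below illustrates that step with a hypothetical helper, `to_positive_flags`, which is not part of `artifact_core`. One detail worth noting: `pd.read_csv` parses a column of 0/1 values as integers, while the spec above lists the categories as the strings `"0"` and `"1"`. The sketch compares string forms so both representations match; whether the engine itself needs such a conversion is not shown in this notebook.

```python
from typing import Any, Dict, Hashable, List


def to_positive_flags(
    labels: Dict[Hashable, Any], ls_categories: List[str], positive_category: str
) -> Dict[Hashable, bool]:
    """Map each label to True if it is the positive category (hypothetical helper)."""
    flags = {}
    for idx, label in labels.items():
        label_str = str(label)  # integer labels read from CSV become "0" / "1"
        if label_str not in ls_categories:
            raise ValueError(f"Unknown category {label!r} at index {idx!r}")
        flags[idx] = label_str == positive_category
    return flags


print(to_positive_flags({0: 1, 1: 0, 2: 1}, ls_categories=["0", "1"], positive_category="1"))
# {0: True, 1: False, 2: True}
```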

## Initializing the Validation Engine

Now we'll initialize a `BinaryClassificationEngine` with our resource specification.

This engine produces every validation artifact computed in the rest of the notebook.

In [None]:
engine = BinaryClassificationEngine(resource_spec=class_spec)

## Computing Validation Artifacts

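Let's use the engine to compute a range of validation artifacts that capture the quality of the classification results. Each kind of artifact has its own method: `produce_classification_score` for single scores, `produce_classification_score_collection` for groups of scores, `produce_classification_array` and `produce_classification_array_collection` for arrays such as confusion matrices, and `produce_classification_plot` and `produce_classification_plot_collection` for plots. Every call takes the same `true`, `predicted`, and `probs_pos` inputs.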

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.ACCURACY,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.PRECISION,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.RECALL,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
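To make the first three scores concrete, here is a plain-Python sketch that derives accuracy, precision, and recall from confusion counts using their textbook definitions. The helpers and toy labels are illustrative and not part of `artifact_core`; we assume, without having checked the engine's source, that it treats label 1 as positive and aligns `true` and `predicted` by their shared keys, as this sketch does.

```python
from typing import Dict, Hashable, Tuple


def confusion_counts(
    true: Dict[Hashable, int], predicted: Dict[Hashable, int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), aligning labels by their shared index keys."""
    tp = fp = tn = fn = 0
    for idx, y in true.items():
        y_hat = predicted[idx]
        if y == positive:
            if y_hat == positive:
                tp += 1
            else:
                fn += 1
        elif y_hat == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def safe_ratio(num: int, den: int) -> float:
    # Convention for this sketch: report 0.0 when the denominator is zero.
    return num / den if den else 0.0


# Hypothetical toy labels.
true_toy = {0: 1, 1: 1, 2: 0, 3: 0, 4: 1}
predicted_toy = {0: 1, 1: 0, 2: 0, 3: 1, 4: 1}
tp, fp, tn, fn = confusion_counts(true_toy, predicted_toy)
accuracy = safe_ratio(tp + tn, tp + fp + tn + fn)  # share of all samples labelled correctly
precision = safe_ratio(tp, tp + fp)  # share of predicted positives that are truly positive
recall = safe_ratio(tp, tp + fn)  # share of actual positives that were found
print(accuracy, precision, recall)  # 0.6 0.6666666666666666 0.6666666666666666
```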

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.FNR,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.FPR,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.TNR,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
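FNR, FPR, and TNR are the complementary error and specificity rates. A short sketch computing them from confusion counts with their textbook definitions (assumed, not confirmed, to match the engine's) shows how they relate: FNR is 1 - recall, and TNR is 1 - FPR.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """False negative, false positive, and true negative rates from confusion counts."""

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # 0.0 for an empty class (sketch convention)

    return {
        "fnr": ratio(fn, fn + tp),  # missed positives / actual positives = 1 - recall
        "fpr": ratio(fp, fp + tn),  # false alarms / actual negatives
        "tnr": ratio(tn, tn + fp),  # correct rejections / actual negatives = 1 - FPR
    }


rates = error_rates(tp=2, fp=1, tn=1, fn=1)
print(rates)  # {'fnr': 0.3333333333333333, 'fpr': 0.5, 'tnr': 0.5}
```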

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.MCC,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
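MCC combines all four confusion counts into a single correlation-style number between -1 and +1, which makes it more informative than accuracy on imbalanced data. A sketch of the standard formula, MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)):

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """MCC from confusion counts: -1 is total disagreement, 0 is chance level, +1 is perfect."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # If any marginal total is zero, MCC is undefined; returning 0.0 matches scikit-learn's choice.
    return numerator / denominator if denominator else 0.0


print(matthews_corrcoef(tp=2, fp=1, tn=1, fn=1))  # 0.16666666666666666
```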

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.GROUND_TRUTH_PROB_MEAN,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.GROUND_TRUTH_PROB_STATS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
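The names suggest that the two ground-truth probability artifacts above summarize the probability the model assigned to each sample's *actual* class: `probs_pos` for positives and `1 - probs_pos` for negatives. That reading is our assumption and is not confirmed by the notebook. Under it, the mean and a few summary statistics can be sketched as follows (the helper and toy data are hypothetical).

```python
import statistics
from typing import Dict, Hashable


def ground_truth_probs(
    true: Dict[Hashable, int], probs_pos: Dict[Hashable, float], positive: int = 1
) -> Dict[Hashable, float]:
    """Probability assigned to each sample's actual class (assumed definition)."""
    return {
        idx: probs_pos[idx] if y == positive else 1.0 - probs_pos[idx]
        for idx, y in true.items()
    }


gt = ground_truth_probs({0: 1, 1: 0, 2: 1, 3: 0}, {0: 0.9, 1: 0.2, 2: 0.6, 3: 0.7})
values = list(gt.values())
summary = {
    "mean": statistics.mean(values),
    "std": statistics.pstdev(values),  # population std; the engine may use another estimator
    "min": min(values),
    "max": max(values),
}
print(summary)
```

A higher mean indicates that the model is, on average, more confident in the correct class.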

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.BINARY_PREDICTION_SCORES,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.ROC_AUC,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
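ROC AUC equals the probability that a randomly chosen positive sample receives a higher `probs_pos` than a randomly chosen negative one, with ties counting as half. That equivalence gives a short, if quadratic-time, way to sanity-check an AUC value on small data. The helper below is our own sketch, not part of `artifact_core`.

```python
from typing import Dict, Hashable


def roc_auc(
    true: Dict[Hashable, int], probs_pos: Dict[Hashable, float], positive: int = 1
) -> float:
    """ROC AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    pos_scores = [probs_pos[i] for i, y in true.items() if y == positive]
    neg_scores = [probs_pos[i] for i, y in true.items() if y != positive]
    if not pos_scores or not neg_scores:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos_scores:  # O(n_pos * n_neg): fine for a sanity check, not for large data
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


print(roc_auc({0: 1, 1: 1, 2: 0, 3: 0}, {0: 0.9, 1: 0.4, 2: 0.35, 3: 0.8}))  # 0.75
```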

In [None]:
engine.produce_classification_score(
    score_type=BinaryClassificationScoreType.PR_AUC,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
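For PR AUC, one common summary is average precision: the step-wise sum of precision weighted by each increase in recall, as computed by scikit-learn's `average_precision_score`. We have not checked whether the engine uses this or trapezoidal integration of the precision-recall curve, which can give slightly different numbers, so treat the sketch below as an illustration of the idea rather than a reproduction of the engine's output.

```python
from typing import Dict, Hashable


def average_precision(
    true: Dict[Hashable, int], probs_pos: Dict[Hashable, float], positive: int = 1
) -> float:
    """Step-wise area under the precision-recall curve: sum of (R_k - R_{k-1}) * P_k."""
    ranked = sorted(
        ((probs_pos[i], y == positive) for i, y in true.items()),
        key=lambda pair: pair[0],
        reverse=True,
    )
    n_pos = sum(1 for _, is_pos in ranked if is_pos)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive sample")
    ap, tp, fp, prev_recall, k = 0.0, 0, 0, 0.0, 0
    while k < len(ranked):
        score = ranked[k][0]
        # All samples tied at this score cross the threshold together.
        while k < len(ranked) and ranked[k][0] == score:
            if ranked[k][1]:
                tp += 1
            else:
                fp += 1
            k += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


ap_toy = average_precision({0: 0, 1: 0, 2: 1, 3: 1}, {0: 0.1, 1: 0.4, 2: 0.35, 3: 0.8})
print(ap_toy)  # approximately 0.8333
```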

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.THRESHOLD_VARIATION_SCORES,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.NORMALIZED_CONFUSION_COUNTS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.SCORE_MEANS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_score_collection(
    score_collection_type=BinaryClassificationScoreCollectionType.POSITIVE_CLASS_SCORE_STATS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_array(
    array_type=BinaryClassificationArrayType.CONFUSION_MATRIX,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_array_collection(
    array_collection_type=BinaryClassificationArrayCollectionType.CONFUSION_MATRICES,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
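`CONFUSION_MATRIX` returns a single 2x2 array, while `CONFUSION_MATRICES` returns several. Judging from the `"TRUE"` key used with the confusion-matrix plot collection further down, we assume the collection holds differently normalized versions, similar to the `normalize` options (`'true'`, `'pred'`, `'all'`) of scikit-learn's `confusion_matrix`. The sketch below builds the raw matrix with actual classes as rows and predicted classes as columns, both in `[negative, positive]` order (scikit-learn's layout; the engine's orientation is not shown here), and applies each normalization. Helper names and mode strings are hypothetical.

```python
from typing import Dict, Hashable, List

Matrix = List[List[float]]


def confusion_matrix(
    true: Dict[Hashable, int], predicted: Dict[Hashable, int], positive: int = 1
) -> Matrix:
    """Rows: actual [negative, positive]; columns: predicted [negative, positive]."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for idx, y in true.items():
        row = 1 if y == positive else 0
        col = 1 if predicted[idx] == positive else 0
        m[row][col] += 1
    return m


def normalize(m: Matrix, mode: str) -> Matrix:
    if mode == "none":
        return [row[:] for row in m]
    if mode == "true":  # divide by actual-class totals: each row sums to 1
        return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in m]
    if mode == "pred":  # divide by predicted-class totals: each column sums to 1
        col_sums = [m[0][j] + m[1][j] for j in range(2)]
        return [[m[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(2)] for i in range(2)]
    if mode == "all":  # divide by the number of samples: all cells sum to 1
        total = sum(sum(row) for row in m)
        return [[v / total if total else 0.0 for v in row] for row in m]
    raise ValueError(f"Unknown normalization mode: {mode!r}")


m = confusion_matrix({0: 1, 1: 1, 2: 0, 3: 0, 4: 1}, {0: 1, 1: 0, 2: 0, 3: 1, 4: 1})
print(normalize(m, "all"))  # [[0.2, 0.2], [0.2, 0.4]]
```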

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.CONFUSION_MATRIX_PLOT,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
dict_plots = engine.produce_classification_plot_collection(
    plot_collection_type=BinaryClassificationPlotCollectionType.CONFUSION_MATRIX_PLOTS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

print(dict_plots)

dict_plots["TRUE"]

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.ROC_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.PR_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.DET_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.RECALL_THRESHOLD_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.PRECISION_THRESHOLD_CURVE,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
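The ROC, DET, recall-threshold, and precision-threshold plots all come from sweeping a decision threshold across `probs_pos` and recording rates at each step: ROC plots TPR against FPR, DET plots FNR against FPR (usually on normal-deviate axes), and the threshold curves plot recall or precision against the threshold itself. A minimal sketch of that sweep, assuming a sample is predicted positive when its score is at least the threshold (the helper is hypothetical, not the engine's code):

```python
from typing import Dict, Hashable, List


def threshold_sweep(
    true: Dict[Hashable, int], probs_pos: Dict[Hashable, float], positive: int = 1
) -> List[dict]:
    """Rates at each distinct score threshold, from strictest to most lenient."""
    n_pos = sum(1 for y in true.values() if y == positive)
    n_neg = len(true) - n_pos
    rows = []
    for t in sorted(set(probs_pos.values()), reverse=True):
        # Assumed decision rule: predict positive when the score is >= the threshold.
        tp = sum(1 for i, y in true.items() if y == positive and probs_pos[i] >= t)
        fp = sum(1 for i, y in true.items() if y != positive and probs_pos[i] >= t)
        tpr = tp / n_pos if n_pos else 0.0  # also the recall at this threshold
        fpr = fp / n_neg if n_neg else 0.0
        rows.append({
            "threshold": t,
            "tpr": tpr,
            "fpr": fpr,
            "fnr": 1.0 - tpr,
            # t is an observed score, so at least one sample is predicted positive.
            "precision": tp / (tp + fp),
        })
    return rows


sweep = threshold_sweep({0: 0, 1: 0, 2: 1, 3: 1}, {0: 0.1, 1: 0.4, 2: 0.35, 3: 0.8})
for row in sweep:
    print(row)
```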

In [None]:
dict_plots = engine.produce_classification_plot_collection(
    plot_collection_type=BinaryClassificationPlotCollectionType.THRESHOLD_VARIATION_CURVES,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

print(dict_plots)

In [None]:
dict_plots["roc"]

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.SCORE_PDF,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)
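The score PDF plots show how `probs_pos` is distributed. The `"NEGATIVE"` key used with the plot collection further down suggests the scores are split by actual class; assuming so, a crude density estimate for each class is a histogram scaled so that its total area is 1. The engine may well use a different estimator (a kernel density estimate, for instance), and the helper names and `"positive"`/`"negative"` keys below are hypothetical.

```python
from typing import Dict, Hashable, List


def density_histogram(
    values: List[float], n_bins: int = 5, lo: float = 0.0, hi: float = 1.0
) -> List[float]:
    """Histogram heights scaled so that the total area is 1 (a crude PDF estimate)."""
    if not values:
        raise ValueError("need at least one value")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} outside [{lo}, {hi}]")
        k = min(int((v - lo) / width), n_bins - 1)  # v == hi falls into the last bin
        counts[k] += 1
    return [c / (len(values) * width) for c in counts]


def class_score_densities(
    true: Dict[Hashable, int], probs_pos: Dict[Hashable, float], positive: int = 1, n_bins: int = 5
) -> Dict[str, List[float]]:
    """Separate score densities for actual positives and actual negatives."""
    pos = [probs_pos[i] for i, y in true.items() if y == positive]
    neg = [probs_pos[i] for i, y in true.items() if y != positive]
    return {"positive": density_histogram(pos, n_bins), "negative": density_histogram(neg, n_bins)}


densities = class_score_densities({0: 1, 1: 1, 2: 0, 3: 0}, {0: 0.9, 1: 0.95, 2: 0.1, 3: 0.15})
print(densities)
```

Well-separated classes show up as densities concentrated at opposite ends of the [0, 1] range, as in this toy example.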

In [None]:
dict_plots = engine.produce_classification_plot_collection(
    plot_collection_type=BinaryClassificationPlotCollectionType.SCORE_PDF_PLOTS,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)

print(dict_plots)

In [None]:
dict_plots["NEGATIVE"]

In [None]:
engine.produce_classification_plot(
    plot_type=BinaryClassificationPlotType.GROUND_TRUTH_PROB_PDF,
    true=true,
    predicted=predicted,
    probs_pos=probs_pos,
)