# Implementing Custom Metrics in ValidMind

ValidMind offers a comprehensive set of built-in metrics to help you evaluate and document your models and datasets. However, there will always be cases where a model or dataset is not supported, or where you need to document specific metrics that are not part of the default set. For these cases, ValidMind supports custom metric functions, which extend the default metrics and let you use your own code to document any type of model or use case.

In this notebook, we demonstrate how to implement custom metrics, register them with ValidMind, run them individually and view the results in the ValidMind platform, and finally add them to your model documentation template.

#### Prerequisites

We assume that you are familiar with Python and have a basic understanding of defining functions and using decorators. If you are new to these concepts, we recommend that you familiarize yourself with them before proceeding.


#### Key Concepts

- **Documentation Templates**: Documentation templates are used to define the structure of your model documentation. They specify the tests that should be run, and how the results should be displayed. In the context of this tutorial, you will not need to know how templates work, merely how to add custom metrics to them via the ValidMind Platform.
- **Tests**: Tests are the building blocks of ValidMind. They are used to evaluate and document models and datasets. Tests can be run individually or as part of a suite that is defined by your model documentation template.
- **Metrics**: Metrics are a subset of tests that do not have thresholds. In the context of this notebook, you can think of metrics and tests as interchangeable concepts.
- **Custom Metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with ValidMind to be used in the platform.
- **Inputs**: In the ValidMind framework, inputs are objects to be evaluated and documented. They can be any of the following:
    - **model**: A single model that has been initialized in ValidMind with `vm.init_model()`. See the [Model Documentation](https://docs.validmind.ai/validmind/validmind.html#init_model) for more information.
    - **dataset**: A single dataset that has been initialized in ValidMind with `vm.init_dataset()`. See the [Dataset Documentation](https://docs.validmind.ai/validmind/validmind.html#init_dataset) for more information.
    - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.
    - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. See this [example](https://docs.validmind.ai/notebooks/how_to/run_tests_that_require_multiple_datasets.html) for more information.
- **Parameters**: Parameters are additional arguments that can be passed when running a ValidMind test. These can be used to pass additional information to a metric, customize its behavior, or provide additional context.
- **Outputs**: Custom metrics can return any number of the following elements (in any order):
    - **table**: Either a list of dictionaries where each dictionary represents a row in the table, or a pandas DataFrame.
    - **plot**: A matplotlib or plotly figure.

#### Custom Metric Overview

A custom metric is any function that takes as arguments a set of inputs and optionally some parameters and returns one or more outputs. That's it! The function can be as simple or as complex as you need it to be. It can use external libraries, make API calls, or do anything else that you can do in Python. The only requirement is that the function signature and return values can be "understood" and handled by the ValidMind developer framework.
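To make the shape of such a function concrete, here is a minimal sketch of one that takes an input and an optional parameter and returns a table as a list of row dictionaries. It is plain Python and is not registered with ValidMind. The names `class_balance`, `positive_label`, and `toy_dataset` are hypothetical, and the stand-in dataset only mimics the `.y` attribute that is used later in this notebook.

```python
from types import SimpleNamespace


def class_balance(dataset, positive_label=1):
    """Return a table (list of row dicts) with the class counts of the target."""
    y = list(dataset.y)
    positives = sum(1 for label in y if label == positive_label)
    negatives = len(y) - positives
    return [
        {"Class": "positive", "Count": positives, "Share": positives / len(y)},
        {"Class": "negative", "Count": negatives, "Share": negatives / len(y)},
    ]


# Stand-in for a dataset input (an assumption for illustration only).
toy_dataset = SimpleNamespace(y=[1, 0, 0, 1, 0])

table = class_balance(toy_dataset)
for row in table:
    print(row)
```

Here `dataset` plays the role of an input and `positive_label` the role of a parameter, while the returned list of dictionaries is one of the supported table forms.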

Now that you are familiar with what custom metrics are and the key concepts involved in creating and using them, let's dive into some hands-on examples!

## Before you begin

::: {.callout-tip}

### New to ValidMind?

To access the ValidMind Platform UI, you'll need an account.

Signing up is FREE — **[Create your account](https://app.prod.validmind.ai)**.
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).


## Install the client library


In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.


## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Binary classification** as the template and **Marketing/Sales - Attrition/Churn Management** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [2]:
# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "...",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

2024-04-01 17:31:13,552 - INFO(validmind.api_client): Connected to ValidMind. Project: [Demo] Customer Churn Model - Initial Validation (clo0f697t003dn8rycwynwlox)


## Implement a Custom Metric

Let's start by creating a simple custom metric that produces a confusion matrix for a binary classification model. We will use the `sklearn.metrics.confusion_matrix` function to calculate the confusion matrix and then display it as a heatmap using `sklearn.metrics.ConfusionMatrixDisplay`, which draws with `matplotlib`. (This is already a built-in metric in ValidMind, but we will use it as an example to demonstrate how to create custom metrics.)

In [10]:
import matplotlib.pyplot as plt
from sklearn import metrics

@vm.metric("my_custom_metrics.ConfusionMatrix")
def confusion_matrix(dataset, model):
    """The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.

    The confusion matrix is a 2x2 table that contains 4 values:

    - True Positive (TP): the number of correct positive predictions
    - True Negative (TN): the number of correct negative predictions
    - False Positive (FP): the number of incorrect positive predictions
    - False Negative (FN): the number of incorrect negative predictions

    The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
    """
    y_true = dataset.y
    y_pred = dataset.y_pred(model_id=model.input_id)

    # use a distinct name so the local variable does not shadow this function
    cm = metrics.confusion_matrix(y_true, y_pred)

    cm_display = metrics.ConfusionMatrixDisplay(
        confusion_matrix=cm,
        display_labels=[False, True]
    )
    cm_display.plot()

    plt.close()  # close the plot to avoid displaying it
    
    return cm_display.figure_  # return the figure object itself

Registering custom metric with ID: my_custom_metrics.ConfusionMatrix


That's our custom metric defined and ready to go. Let's take a look at what's going on here:

- The function `confusion_matrix` takes two arguments, `dataset` and `model`. These are a `VMDataset` and a `VMModel` object, respectively.
- The function docstring provides a description of what the metric does. This will be displayed along with the result in this notebook as well as in the ValidMind platform.
- The function body calculates the confusion matrix using the `sklearn.metrics.confusion_matrix` function and then plots it using `sklearn.metrics.ConfusionMatrixDisplay`.
- The function then returns the `ConfusionMatrixDisplay.figure_` object - this is important as the ValidMind framework expects the output of the custom metric to be a plot or a table.
- The `@vm.metric` decorator creates a wrapper around the function that allows it to be run by the ValidMind framework. It also registers the metric so it can be found by the ID `my_custom_metrics.ConfusionMatrix` (see the section below on how test IDs work in ValidMind and why this format is important).
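
For readers less familiar with decorators, the registration idea in the last bullet can be sketched in plain Python. This is an illustration of the general pattern only, not ValidMind's implementation: `METRIC_REGISTRY`, `register_metric`, `run_registered`, and the metric ID used here are all hypothetical names. Unlike the wrapper described above, the sketch stores and returns the function unchanged.

```python
# Illustrative only: NOT how ValidMind implements `@vm.metric`.
METRIC_REGISTRY = {}


def register_metric(metric_id):
    """Return a decorator that stores the decorated function under `metric_id`."""
    def decorator(func):
        METRIC_REGISTRY[metric_id] = func
        return func  # the function itself is returned unchanged
    return decorator


@register_metric("my_custom_metrics.Example")
def example_metric(values):
    """Return a one-row table holding the mean of the values."""
    return [{"mean": sum(values) / len(values)}]


def run_registered(metric_id, **kwargs):
    """Look up a registered function by ID and call it with keyword arguments."""
    return METRIC_REGISTRY[metric_id](**kwargs)


result = run_registered("my_custom_metrics.Example", values=[1, 2, 3])
print(result)
```

The point of the pattern is that, once registered, a function can be looked up and run by its string ID alone.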

## Run the Custom Metric

Now that we have defined and registered our custom metric, let's see how we can run it and properly use it in the ValidMind platform.

### Set up the Model and Dataset

First, let's set up an example model and dataset to run our custom metric against. Since this is a confusion matrix, we will use the Customer Churn dataset that ValidMind provides and train a simple XGBoost model.

In [4]:
import xgboost as xgb
from validmind.datasets.classification import customer_churn

raw_df = customer_churn.load_data()
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)

Easy enough! Now we have a dataset set up and a model trained. One last thing to do is bring the dataset and model into the ValidMind framework:

In [5]:
# for now, we'll just use the test dataset
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    target_column=customer_churn.target_column,
    input_id="test_dataset",
)

vm_model = vm.init_model(model, input_id="model")

# link the model to the dataset
vm_test_ds.assign_predictions(model=vm_model)

2024-04-01 17:31:13,827 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-04-01 17:31:14,629 - INFO(validmind.vm_models.dataset): Running predict()... This may take a while


Great, now let's see how we can run our metric against this model and dataset.

In [11]:
from validmind.tests import run_test


result = run_test("my_custom_metrics.ConfusionMatrix", dataset=vm_test_ds, model=vm_model)

VBox(children=(HTML(value='<h1>Confusion Matrix</h1>'), HTML(value='<p>The confusion matrix is a table that is…
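
One way to sanity-check the figure this metric renders is to count the four cells by hand. The sketch below uses made-up labels (not the Customer Churn data) and plain Python instead of `sklearn`. The helper name `count_confusion_cells` is hypothetical, and labels are assumed to be encoded as 0 and 1.

```python
def count_confusion_cells(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels encoded as 0/1."""
    cells = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            cells["TP" if actual == 1 else "FP"] += 1
        else:
            cells["TN" if actual == 0 else "FN"] += 1
    return cells


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cells = count_confusion_cells(y_true, y_pred)
precision = cells["TP"] / (cells["TP"] + cells["FP"])
recall = cells["TP"] / (cells["TP"] + cells["FN"])

print(cells)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Precision and recall follow directly from the four counts, which is why the docstring above describes the matrix as a compact summary of classifier performance.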