# Implement custom tests

Custom tests or metrics extend the functionality of ValidMind, allowing you to document any model or use case with added flexibility.

ValidMind provides a comprehensive set of metrics out-of-the-box to evaluate and document your models and datasets. We recognize there will be cases where the default metrics do not support a model or dataset, or specific documentation is needed. In these cases, you can create and use your own custom code to accomplish what you need. To streamline custom code integration, we support the creation of custom metric functions.

This interactive notebook provides a step-by-step guide for implementing and registering custom metrics with ValidMind, running them individually, viewing the results on the ValidMind platform, and incorporating them into your model documentation template.



#### Prerequisites

We assume that you are familiar with Python and have a basic understanding of defining functions and using decorators. If you are new to these concepts, we recommend that you familiarize yourself with them before proceeding.


#### Key Concepts

- **Documentation Templates**: Documentation templates are used to define the structure of your model documentation. They specify the tests that should be run, and how the results should be displayed. In the context of this tutorial, you will not need to know how templates work, merely how to add custom metrics to them via the ValidMind Platform.
- **Tests**: Tests are the building blocks of ValidMind. They are used to evaluate and document models and datasets. Tests can be run individually or as part of a suite that is defined by your model documentation template.
- **Metrics**: Metrics are a subset of tests that do not have thresholds. In the context of this notebook, you can think of metrics and tests as interchangeable concepts.
- **Custom Metrics**: Custom metrics are functions that you define to evaluate your model or dataset. These functions can be registered with ValidMind to be used in the platform.
- **Inputs**: In the ValidMind framework, inputs are objects to be evaluated and documented. They can be any of the following:
    - **model**: A single model that has been initialized in ValidMind with `vm.init_model()`. See the [Model Documentation](https://docs.validmind.ai/validmind/validmind.html#init_model) for more information.
    - **dataset**: A single dataset that has been initialized in ValidMind with `vm.init_dataset()`. See the [Dataset Documentation](https://docs.validmind.ai/validmind/validmind.html#init_dataset) for more information.
    - **models**: A list of ValidMind models - usually this is used when you want to compare multiple models in your custom metric.
    - **datasets**: A list of ValidMind datasets - usually this is used when you want to compare multiple datasets in your custom metric. See this [example](https://docs.validmind.ai/notebooks/how_to/run_tests_that_require_multiple_datasets.html) for more information.
- **Parameters**: Parameters are additional arguments that can be passed when running a ValidMind test. These can be used to pass additional information to a metric, customize its behavior, or provide additional context.
- **Outputs**: Custom metrics can return any number of the following elements (in any order):
    - **table**: Either a list of dictionaries where each dictionary represents a row in the table, or a pandas DataFrame.
    - **plot**: A matplotlib or plotly figure.
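
As a small illustration of the table format described in the list above, the sketch below builds a table as a list of dictionaries, one dictionary per row. The column names and values are invented for this example and do not come from any ValidMind test.

```python
# Illustrative only: the column names and values are made up for this sketch.
rows = [
    {"Metric": "accuracy", "Value": 0.91},
    {"Metric": "recall", "Value": 0.78},
]

# Every row should expose the same keys so the table has consistent columns.
columns = list(rows[0].keys())
assert all(list(row.keys()) == columns for row in rows)

print(columns)    # ['Metric', 'Value']
print(len(rows))  # 2
```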

#### Custom metric overview

A custom metric is any function that takes as arguments a set of inputs and optionally some parameters and returns one or more outputs. That's it! The function can be as simple or as complex as you need it to be. It can use external libraries, make API calls, or do anything else that you can do in Python. The only requirement is that the function signature and return values can be "understood" and handled by the ValidMind developer framework.
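
To make the shape of such a function concrete, here is a minimal sketch that runs without ValidMind. `FakeDataset` is a hypothetical stand-in that only exposes a `y` attribute, and `class_balance` and `min_fraction` are names invented for this example. The function takes one input and one parameter and returns a table as a list of dictionaries.

```python
from dataclasses import dataclass


@dataclass
class FakeDataset:
    """Hypothetical stand-in for a ValidMind dataset; it only exposes `y`."""
    y: list


def class_balance(dataset, min_fraction=0.2):
    """Return one table row per class with its share of the target column."""
    labels = list(dataset.y)
    total = len(labels)
    table = []
    for label in sorted(set(labels)):
        fraction = labels.count(label) / total
        table.append(
            {
                "Class": label,
                "Fraction": round(fraction, 3),
                "Below Minimum": fraction < min_fraction,
            }
        )
    return table


# `dataset` plays the role of an input, `min_fraction` the role of a parameter.
balance_table = class_balance(FakeDataset(y=[0, 0, 0, 1]), min_fraction=0.3)
print(balance_table)
```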

Now that you are familiar with what custom metrics are and the key concepts involved in creating and using them, let's dive into some hands-on examples!

## Before you begin

::: {.callout-tip}

### New to ValidMind?

To access the ValidMind Platform UI, you'll need an account.

Signing up is FREE — **[Create your account](https://app.prod.validmind.ai)**.
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).


## Install the client library


In [None]:
%pip install -q validmind

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Binary classification** as the template and **Marketing/Sales - Attrition/Churn Management** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [None]:
# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "...",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

## Implement a Custom Metric

Let's start off by creating a simple custom metric that creates a Confusion Matrix for a binary classification model. We will use the `sklearn.metrics.confusion_matrix` function to calculate the confusion matrix and then display it as a heatmap using `sklearn.metrics.ConfusionMatrixDisplay`, which draws with `matplotlib`. (This is already a built-in metric in ValidMind, but we will use it as an example to demonstrate how to create custom metrics.)
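
For readers who want to see what a binary confusion matrix contains before it is plotted, the following sketch counts the four cells by hand and derives accuracy, precision, recall, and F1 from them. It uses plain Python rather than scikit-learn, the labels are made up, and it assumes classes are encoded as 0 and 1 and that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels encoded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def summary_scores(counts):
    """Derive the usual scores; assumes every denominator is non-zero."""
    tp, tn, fp, fn = counts["TP"], counts["TN"], counts["FP"], counts["FN"]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up labels for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
scores = summary_scores(counts)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
print(scores)
```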

In [None]:
import matplotlib.pyplot as plt
from sklearn import metrics


@vm.metric("my_custom_metrics.ConfusionMatrix")
def confusion_matrix(dataset, model):
    """The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.

    The confusion matrix is a 2x2 table that contains 4 values:

    - True Positive (TP): the number of correct positive predictions
    - True Negative (TN): the number of correct negative predictions
    - False Positive (FP): the number of incorrect positive predictions
    - False Negative (FN): the number of incorrect negative predictions

    The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
    """
    y_true = dataset.y
    y_pred = dataset.y_pred(model_id=model.input_id)

    confusion_matrix = metrics.confusion_matrix(y_true, y_pred)

    cm_display = metrics.ConfusionMatrixDisplay(
        confusion_matrix=confusion_matrix,
        display_labels=[False, True]
    )
    cm_display.plot()

    plt.close()  # close the plot to avoid displaying it
    
    return cm_display.figure_  # return the figure object itself

That's our custom metric defined and ready to go. Let's take a look at what's going on here:

- The function `confusion_matrix` takes two arguments, `dataset` and `model`. These are a `VMDataset` and a `VMModel` object, respectively.
- The function docstring provides a description of what the metric does. This will be displayed along with the result in this notebook as well as in the ValidMind platform.
- The function body calculates the confusion matrix using the `sklearn.metrics.confusion_matrix` function and then plots it using `sklearn.metrics.ConfusionMatrixDisplay`.
- The function then returns the `ConfusionMatrixDisplay.figure_` object. This is important, as the ValidMind framework expects the output of the custom metric to be a plot or a table.
- The `@vm.metric` decorator is doing the work of creating a wrapper around the function that will allow it to be run by the ValidMind framework. It also registers the metric so it can be found by the ID `my_custom_metrics.ConfusionMatrix` (see the section below on how test IDs work in ValidMind and why this format is important).

## Run the Custom Metric

Now that we have defined and registered our custom metric, let's see how we can run it and properly use it in the ValidMind platform.

### Set up the Model and Dataset

First, let's set up an example model and dataset to run our custom metric against. Since this is a Confusion Matrix, we will use the Customer Churn dataset that ValidMind provides and train a simple XGBoost model.

In [None]:
import xgboost as xgb
from validmind.datasets.classification import customer_churn

raw_df = customer_churn.load_data()
train_df, validation_df, test_df = customer_churn.preprocess(raw_df)

x_train = train_df.drop(customer_churn.target_column, axis=1)
y_train = train_df[customer_churn.target_column]
x_val = validation_df.drop(customer_churn.target_column, axis=1)
y_val = validation_df[customer_churn.target_column]

model = xgb.XGBClassifier(early_stopping_rounds=10)
model.set_params(
    eval_metric=["error", "logloss", "auc"],
)
model.fit(
    x_train,
    y_train,
    eval_set=[(x_val, y_val)],
    verbose=False,
)

Easy enough! Now we have a dataset set up and a model trained. One last thing to do is bring the dataset and model into the ValidMind framework:

In [None]:
# for now, we'll just use the test dataset
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    target_column=customer_churn.target_column,
    input_id="test_dataset",
)

vm_model = vm.init_model(model, input_id="model")

# link the model to the dataset
vm_test_ds.assign_predictions(model=vm_model)

### Run the Custom Metric

Now that we have our model and dataset set up, we have everything we need to run our custom metric. We can do this by importing the `run_test` function from the `validmind.tests` module and passing in the test ID of our custom metric along with the model and dataset we want to run it against.

>Notice how the `inputs` dictionary is used to map an `input_id` which we set above to the `model` and `dataset` keys that are expected by our custom metric function. This is how the ValidMind framework knows which inputs to pass to different metrics and is key when using many different datasets and models.
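
As a rough mental model of the mapping described in the note above, the sketch below resolves `input_id` strings to registered objects. This is a hypothetical illustration, not ValidMind's code: `resolve_inputs` and the `registered` dictionary are invented names, and the placeholder strings stand in for real model and dataset objects.

```python
def resolve_inputs(inputs, registered):
    """Replace each input_id string with the object registered under it."""
    missing = [input_id for input_id in inputs.values() if input_id not in registered]
    if missing:
        raise KeyError(f"Unknown input_id(s): {missing}")
    return {arg_name: registered[input_id] for arg_name, input_id in inputs.items()}


# Placeholder strings stand in for the objects created by init_model/init_dataset.
registered = {"model": "<model object>", "test_dataset": "<dataset object>"}

resolved = resolve_inputs({"model": "model", "dataset": "test_dataset"}, registered)
print(resolved)  # {'model': '<model object>', 'dataset': '<dataset object>'}
```

The keys of `inputs` are the argument names of the metric function, and the values are the `input_id` strings chosen at initialization, so a typo in either shows up as a lookup failure rather than a silently wrong input.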

In [None]:
from validmind.tests import run_test

result = run_test("my_custom_metrics.ConfusionMatrix", inputs={"model": "model", "dataset": "test_dataset"})

You'll notice that the docstring becomes a markdown description of the test. The figure is then displayed as the test result. What you see above is how it will look in the ValidMind platform as well. Let's go ahead and log the result to see how that works.

In [None]:
result.log()

## Adding Custom Metrics to Model Documentation

To add the logged metric to your documentation, go to the documentation page of the model you registered above and navigate to the `Model Development` -> `Model Evaluation` section. Then hover between any two existing content blocks to reveal the `+` button as shown in the screenshot below.

![screenshot showing insert button for test-driven blocks](../../images/insert-test-driven-block.png)

Now click on the `+` button and select the `Test-Driven Block` option. This will open a dialog where you can select `Metric` as the type of test and the `My Custom Metrics Confusion Matrix` from the list of available metrics. You can preview the result and then click `Insert Block` to add it to the documentation.

![screenshot showing how to insert a test-driven block](../../images/insert-test-driven-block-custom.png)

The test should match the result you see above. It is now part of your documentation and will be run every time you run `vm.run_documentation_tests()` for your model. Let's do that now.

In [None]:
vm.reload()

If you preview the template, it should show the custom metric in the `Model Development`->`Model Evaluation` section:

In [None]:
vm.preview_template()

Just so we can run all of the tests in the template, let's initialize the train and raw dataset.

(see the `quickstart_customer_churn_full_suite.ipynb` notebook and the ValidMind docs for more information on what we are doing here)

In [None]:
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column=customer_churn.target_column,
    class_labels=customer_churn.class_labels,
)

vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=customer_churn.target_column,
)
vm_train_ds.assign_predictions(model=vm_model)

To run all the tests in the template, you can use `vm.run_documentation_tests()` and pass it the demo config from our `customer_churn` module, which refers to the inputs we initialized above by their `input_id`. We will have to add a section to the config for our new test to tell it which inputs it should receive. This is done by adding a new element to the config dictionary where the key is the ID of the test and the value is a dictionary with the following structure:
```python
{
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    }
}
```

In [None]:
from validmind.utils import preview_test_config

test_config = customer_churn.get_demo_test_config()
test_config["my_custom_metrics.ConfusionMatrix"] = {
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    }
}
preview_test_config(test_config)

In [None]:
full_suite = vm.run_documentation_tests(config=test_config)

## Some More Custom Metrics

Now that you understand the entire process of creating custom metrics and using them in your documentation, let's create a few more to see different ways you can utilize custom metrics.

### Custom Metric: Table of Model Hyperparameters

This custom metric will display a table of the hyperparameters used in the model:

In [None]:
@vm.metric("my_custom_metrics.Hyperparameters")
def hyperparameters(model):
    """The hyperparameters of a machine learning model are the settings that control the learning process.
    These settings are specified before the learning process begins and can have a significant impact on the
    performance of the model.

    The hyperparameters of a model can be used to tune the model to achieve the best possible performance
    on a given dataset. By examining the hyperparameters of a model, you can gain insight into how the model
    was trained and how it might be improved.
    """
    hyperparameters = model.model.get_xgb_params() # dictionary of hyperparameters

    # turn the dictionary into a table where each row contains a hyperparameter and its value
    return [{"Hyperparam": k, "Value": v} for k, v in hyperparameters.items() if v]


result = run_test("my_custom_metrics.Hyperparameters", inputs={"model": "model"})
result.log()

Since the metric has been run and logged, you can add it to your documentation using the same process as above. It should look like this:

![screenshot showing hyperparameters metric](../../images/hyperparameters-custom-metric.png)

For our simple toy model, there aren't really any proper hyperparameters, but you can see how this could be useful for more complex models that have gone through hyperparameter tuning.
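
One detail of the list comprehension in the hyperparameters metric is worth a closer look: filtering with `if v` drops every falsy value, which includes `0`, `False`, and empty strings as well as `None`. The sketch below contrasts that filter with an explicit `is not None` check. The hyperparameter names and values are made up for illustration.

```python
# Made-up values chosen to include several falsy entries.
hyperparams = {"max_depth": 0, "learning_rate": 0.1, "gamma": None, "booster": ""}

# Filter used in the metric: keeps only truthy values.
truthy_rows = [{"Hyperparam": k, "Value": v} for k, v in hyperparams.items() if v]

# Alternative: drop only entries that are unset (None).
not_none_rows = [
    {"Hyperparam": k, "Value": v} for k, v in hyperparams.items() if v is not None
]

print([row["Hyperparam"] for row in truthy_rows])    # ['learning_rate']
print([row["Hyperparam"] for row in not_none_rows])  # ['max_depth', 'learning_rate', 'booster']
```

Which filter is appropriate depends on whether a value of zero is meaningful for the model being documented.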

### Custom Metric: External API Call

This custom metric will make an external API call to get the current BTC price and display it as a table. This demonstrates how you might integrate external data sources into your model documentation in a programmatic way. You could, for instance, set up a pipeline that runs a metric like this every day to keep your model documentation in sync with an external system.

In [None]:
import requests


@vm.metric("my_custom_metrics.ExternalAPI")
def external_api():
    """This metric calls an external API to get the current BTC price. It then creates
    a table with the relevant data so it can be displayed in the documentation.

    The purpose of this metric is to demonstrate how to call an external API and use the
    data in a metric. A metric like this could even be set up to run in a scheduled
    pipeline to keep your documentation in sync with an external data source.
    """
    url = "https://api.coindesk.com/v1/bpi/currentprice.json"
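    response = requests.get(url, timeout=10)  # avoid hanging if the API is unreachable
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    data = response.json()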

    # extract the time and the current BTC price in USD
    return [
        {
            "Time": data["time"]["updated"],
            "Price (USD)": data["bpi"]["USD"]["rate"],
        }
    ]


result = run_test("my_custom_metrics.ExternalAPI")
result.log()

Again, you can add this to your documentation to see how it looks:

![screenshot showing BTC price metric](../../images/btc-price-custom-metric.png)
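
A metric that depends on an external service may fail when that service is unavailable. The following sketch shows one possible way to keep the table shape stable in that case: the network call is passed in as a function, and any failure is turned into an "unavailable" row. `price_table`, `fetch_json`, and the two fake fetchers are hypothetical names used only for this example, and the response layout is assumed to match the one used in the metric above.

```python
def price_table(fetch_json):
    """Build a one-row table from `fetch_json()`, or an error row if it fails."""
    try:
        data = fetch_json()
        return [
            {
                "Time": data["time"]["updated"],
                "Price (USD)": data["bpi"]["USD"]["rate"],
            }
        ]
    except (OSError, KeyError, ValueError) as exc:
        # OSError covers network failures; KeyError and ValueError cover bad payloads.
        return [{"Time": "unavailable", "Price (USD)": f"error: {type(exc).__name__}"}]


def fake_ok():
    """Stand-in for a successful API call; the values are invented."""
    return {"time": {"updated": "2024-01-01 00:00 UTC"}, "bpi": {"USD": {"rate": "42,000.00"}}}


def fake_down():
    """Stand-in for an unreachable API."""
    raise ConnectionError("network unreachable")


ok_rows = price_table(fake_ok)
down_rows = price_table(fake_down)
print(ok_rows)
print(down_rows)
```

Whether to log a placeholder row or let the test fail is a design choice. A failing test is more visible, while a placeholder keeps a scheduled run from stopping.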

### Custom Metric: Passing Parameters

Custom metric functions, as stated earlier, can take both inputs and params. When you define your function, there is no need to distinguish between the two, because the ValidMind framework will handle that for you. You simply need to add both to the function as arguments and the framework will pass in the correct values.

So for instance, if you wanted to parameterize the first custom metric we created, the confusion matrix, you could do so like this:

```python
def confusion_matrix(dataset: VMDataset, model: VMModel, my_param: str = "Default Value"):
    pass
```

And then when you run the test, you can pass in the parameter like this:

```python
run_test(
    "my_custom_metrics.ConfusionMatrix",
    inputs={"model": "model", "dataset": "test_dataset"},
    params={"my_param": "My Value"},
)
```

Or if you are running the entire documentation template, you would update the config like this:

```python
test_config["my_custom_metrics.ConfusionMatrix"] = {
    "inputs": {
        "dataset": "test_dataset",
        "model": "model",
    },
    "params": {
        "my_param": "My Value",
    },
}
```
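
To give a feel for how a framework could tell inputs from parameters without extra annotations, here is a hypothetical sketch based on `inspect.signature`. It assumes, purely for illustration, that arguments named `model`, `dataset`, `models`, or `datasets` are inputs and everything else is a parameter. ValidMind's actual rules may differ, and `split_signature` and `INPUT_NAMES` are invented names.

```python
import inspect

# Assumption for this sketch: these argument names are treated as inputs.
INPUT_NAMES = {"model", "dataset", "models", "datasets"}


def split_signature(func):
    """Split a function's arguments into input names and parameter defaults."""
    parameters = inspect.signature(func).parameters
    inputs = [name for name in parameters if name in INPUT_NAMES]
    params = {
        name: parameter.default
        for name, parameter in parameters.items()
        if name not in INPUT_NAMES
    }
    return inputs, params


def confusion_matrix(dataset, model, my_param="Default Value"):
    pass


inputs, params = split_signature(confusion_matrix)
print(inputs)  # ['dataset', 'model']
print(params)  # {'my_param': 'Default Value'}
```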

Let's go ahead and create a toy metric that takes a parameter and uses it in the result:

In [None]:
import plotly.express as px


@vm.metric("my_custom_metrics.ParameterExample")
def parameter_example(plot_title="Default Plot Title", x_col="sepal_width", y_col="sepal_length"):
    """This metric takes three parameters and creates a scatter plot based on them.

    The purpose of this metric is to demonstrate how to create a metric that takes
    parameters and uses them to generate a plot. This can be useful for creating
    metrics that are more flexible and can be used in a variety of scenarios.
    """
    # plot the chosen columns of the iris dataset, colored by species
    return px.scatter(px.data.iris(), x=x_col, y=y_col, color="species", title=plot_title)


result = run_test(
    "my_custom_metrics.ParameterExample",
    params={
        "plot_title": "My Cool Plot",
        "x_col": "sepal_width",
        "y_col": "sepal_length",
    },
)
result.log()

Play around with this and see how you can use parameters, default values and other features to make your custom metrics more flexible and useful.

Here's how this one looks in the documentation:
![screenshot showing parameterized metric](../../images/parameterized-custom-metric.png)

### Custom Metric: Multiple Tables and Plots in a Single Metric

Custom metric functions, as stated earlier, can return more than just one table or plot. In fact, any number of tables and plots can be returned. Let's see an example of this:

In [None]:
import numpy as np
import plotly.express as px

@vm.metric("my_custom_metrics.ComplexOutput")
def complex_output():
    """This metric demonstrates how to return many tables and figures in a single metric"""
    # create a couple tables
    table = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
    table2 = [{"C": 5, "D": 6}, {"C": 7, "D": 8}]

    # create a few figures showing some random data
    fig1 = px.line(x=np.arange(10), y=np.random.rand(10), title="Random Line Plot")
    fig2 = px.bar(x=["A", "B", "C"], y=np.random.rand(3), title="Random Bar Plot")
    fig3 = px.scatter(x=np.random.rand(10), y=np.random.rand(10), title="Random Scatter Plot")

    return {
        "My Cool Table": table,
        "Another Table": table2,
    }, fig1, fig2, fig3


result = run_test("my_custom_metrics.ComplexOutput")
result.log()

Notice how you can return the tables as a dictionary where the key is the title of the table and the value is the table itself. You could also just return the tables by themselves but this way you can give them a title to more easily identify them in the result.

![screenshot showing multiple tables and plots](../../images/multiple-tables-plots-custom-metric.png)
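
The sketch below illustrates the return shapes discussed above by labelling each element of a mixed return value. It is a hypothetical helper for building intuition, not how ValidMind processes results: `describe_outputs` is an invented name, and a placeholder string stands in for a real figure object.

```python
def describe_outputs(outputs):
    """Label each returned element as a titled table, an untitled table, or a figure."""
    if not isinstance(outputs, tuple):
        outputs = (outputs,)  # a single return value is treated as a one-element tuple
    labels = []
    for item in outputs:
        if isinstance(item, dict):
            # dictionary keys are the table titles
            labels.extend(f"table: {title}" for title in item)
        elif isinstance(item, list):
            labels.append("table: (untitled)")
        else:
            labels.append("figure")
    return labels


table = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]

# One titled table, one untitled table, and a placeholder standing in for a figure.
labels = describe_outputs(({"My Cool Table": table}, table, "figure-placeholder"))
print(labels)  # ['table: My Cool Table', 'table: (untitled)', 'figure']
```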

### Custom Metric: Images

If you are using a plotting library that isn't supported by ValidMind (i.e. not `matplotlib` or `plotly`), you can still return the image directly as a bytes-like object. This could also be used to bring any type of image into your documentation in a programmatic way. For instance, you may want to include a diagram of your model architecture or a screenshot of a dashboard that your model is integrated with. As long as you can produce the image with Python or open it from a file, you can include it in your documentation.

In [None]:
import io
import matplotlib.pyplot as plt


@vm.metric("my_custom_metrics.Image")
def image():
    """This metric demonstrates how to return an image in a metric"""

    # create a simple plot
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4])
    ax.set_title("Simple Line Plot")

    # save the plot as a PNG image (in-memory buffer)
    img_data = io.BytesIO()
    fig.savefig(img_data, format="png")
    img_data.seek(0)

    plt.close()  # close the plot to avoid displaying it

    return img_data.read()


result = run_test("my_custom_metrics.Image")
result.log()

Adding this custom metric to your documentation will display the image:

![screenshot showing image custom metric](../../images/image-in-custom-metric.png)
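
For the case where the image already exists on disk, a minimal sketch of reading it into a bytes object is shown below. `read_png_bytes` is a hypothetical helper, and the temporary file contains only the PNG signature followed by placeholder bytes, so it demonstrates the check rather than a real image.

```python
import tempfile
from pathlib import Path

# The first eight bytes of every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_bytes(path):
    """Read a file and return its bytes after checking the PNG signature."""
    data = Path(path).read_bytes()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError(f"{path} does not look like a PNG file")
    return data


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.png"
    # Not a real image: just the signature plus placeholder content.
    sample.write_bytes(PNG_SIGNATURE + b"placeholder")
    image_bytes = read_png_bytes(sample)

print(len(image_bytes))  # 19
```

Inside a custom metric, the function body could then return the result of `read_png_bytes(...)` for whichever file you want to include.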

## Conclusion

In this notebook, we have demonstrated how to create custom metrics in ValidMind. We have shown how to define custom metric functions, register them with the ValidMind framework, run them against models and datasets, and add them to model documentation templates. We have also shown how to return tables and plots from custom metrics and how to use them in the ValidMind platform. We hope this tutorial has been helpful in understanding how to create and use custom metrics in ValidMind.

## Next steps

You can look at the results of this test plan right in the notebook where you ran the code, as you would expect. But there is a better way: view the test results as part of your model documentation in the ValidMind Platform UI:

1. In the [Platform UI](https://app.prod.validmind.ai), go to the **Documentation** page for the model you registered earlier.

2. Expand **Model Development**.

What you can see now is a more easily consumable version of the model diagnosis tests you just performed, along with other parts of your model documentation that still need to be completed.

### Find your next how-to guide

If you want to learn more about where you are in the model documentation process, take a look at [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html).

As always, to learn more about ValidMind, please visit our [documentation](https://docs.validmind.ai/).