# ValidMind Introduction for Model Developers

This interactive notebook guides you through the process of documenting a model with the ValidMind Developer Framework. It uses a binary classification model as an example, but the same principles apply to other model types.

As part of the notebook, you will learn how to start documenting a model from the perspective of the **Model Developer** persona. The notebook assumes that a [Model Documentation template](https://docs.validmind.com/guide/swap-documentation-templates.html#view-current-templates) has already been defined in the platform.

## Overview of the Notebook

1. Initializing the ValidMind Developer Framework:

   ValidMind’s developer framework provides a rich collection of documentation tools and test suites, from documenting descriptions of your dataset to validation testing your models for weak spots and overfit areas.

2. Start the model development process with raw data, run out-of-the-box tests, and add evidence to model documentation

   In this stage the notebook shows you how to access individual tests in ValidMind's test repository and use them as building blocks to ensure a model is being built appropriately. The goal is to show how to run tests, investigate results, and add tests and evidence to the documentation.

   For a full list of out-of-box tests please refer to: https://docs.validmind.com/guide/test-descriptions.html

3. Build on the previous step by implementing custom tests

   In this stage the notebook shows how to implement custom tests. Model developers usually have many custom tests of their own, and it is important to include them in the model documentation. We will show you how to include custom tests and how to add their results to the documentation as additional evidence.

4. Ensure the documentation is complete

   In this stage the notebook shows how to check that the model documentation and its sections have been built out, and how to update the results of existing tests when testing changes because of additional data processing or data analysis requirements.


## ValidMind at a glance

ValidMind's platform enables organizations to identify, document, and manage model risks for all types of models, including AI/ML models, LLMs, and statistical models. As a model developer, you use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

If this is your first time trying out ValidMind, you can make use of the following resources alongside this notebook:

- [Get started](https://docs.validmind.ai/guide/get-started.html) — The basics, including key concepts, and how our products work
- [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html) — The path for developers, more code samples, and our developer reference

Note that you connect to the Developer Framework through our APIs using Python.


## Before you begin

::: {.callout-tip}

### New to ValidMind?

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — [**Sign up now**](https://app.prod.validmind.ai)
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).


## 1. Initializing the ValidMind Developer Framework


## Install the client library

The following Python versions are recommended:

- Python version 3.7 < x <= 3.11
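
A quick interpreter check before installing can save a failed install. The sketch below reads the range above as Python 3.8 through 3.11 inclusive, which is an interpretation of the stated bounds rather than an official compatibility matrix. `is_supported_python` is a hypothetical helper name, not part of ValidMind.

```python
import sys


def is_supported_python(version=None):
    """Return True if (major, minor) falls in the assumed range 3.7 < x <= 3.11."""
    major_minor = tuple(version or sys.version_info)[:2]
    return (3, 7) < major_minor <= (3, 11)


if not is_supported_python():
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the recommended range"
    )
```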

The client library provides Python support for the ValidMind Developer Framework. To install it run:


In [None]:
%pip install -q validmind

## Register a new model in ValidMind UI and initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

   For example, to register a model for use with this notebook, select:

   - Documentation template: `Binary classification`
   - Use case: `Marketing/Sales - Attrition/Churn Management`

   You can fill in other options according to your preference.

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [None]:
# Replace with your code snippet

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="...",
)
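
Hard-coding credentials in a notebook makes it easy to leak them when the notebook is shared. One option is to read them from environment variables and pass them to `vm.init()`. In this sketch the variable names (`VM_API_KEY`, `VM_API_SECRET`, `VM_PROJECT`) and the helper `load_vm_credentials` are assumptions chosen for illustration, not names taken from your code snippet.

```python
import os


def load_vm_credentials(env=None):
    """Collect vm.init() keyword arguments from environment variables.

    The variable names below are illustrative assumptions.
    """
    env = os.environ if env is None else env
    names = {
        "api_key": "VM_API_KEY",
        "api_secret": "VM_API_SECRET",
        "project": "VM_PROJECT",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: env[var] for arg, var in names.items()}


# Usage (commented out because it needs real credentials):
# vm.init(api_host="https://api.prod.validmind.ai/api/v1/tracking", **load_vm_credentials())
```

Failing early with the list of missing variables tends to be easier to debug than an authentication error from the API.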

## Verify & preview the documentation template

Here we want to verify that we have connected with ValidMind and that the appropriate template is selected. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.

You will upload documentation and test results into this template later on. For now, take a look at the structure that the template provides with the `vm.preview_template()` function from the ValidMind library and note the empty sections:


In [None]:
vm.preview_template()

Finally, let's look at the list of all available tests in the ValidMind Developer Framework:


In [None]:
vm.tests.list_tests()

## 2. Start the model development process with raw data, run out-of-the-box tests, and add evidence to model documentation

In this section we will provide details on how to understand individual tests available in ValidMind, how you can access each test, run it and change parameters if necessary. You will be using an example dataset provided by ValidMind.


In [None]:
from validmind.datasets.classification import customer_churn as demo_dataset

df_raw = demo_dataset.load_data()
df_raw.head()

Let's do some data quality assessments by running a few individual tests related to data assessment. You will be using the `vm.tests.list_tests()` function above in combination with `vm.tests.list_tags()` and `vm.tests.list_task_types()` to find which prebuilt tests are relevant for data quality assessment.


In [None]:
# Get the list of available tags
sorted(vm.tests.list_tags())

In [None]:
# Get the list of available task types
sorted(vm.tests.list_task_types())

We can pass `task` and `tags` as parameters to the `vm.tests.list_tests()` function to filter the tests based on the task type and tags. For example, to find tests related to tabular data quality for classification models, you can call `list_tests()` like this:


In [None]:
vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])

### Initialize the ValidMind datasets

Now we assume we have identified some tests we want to run on the data we intend to use. The next step is to connect your data with a ValidMind dataset object. This step is necessary whenever you want to connect a dataset to documentation and run tests through ValidMind, but you only need to do it once per dataset.

You can initialize a ValidMind dataset object using the [`init_dataset`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) function from the ValidMind (`vm`) module.

This function takes a number of arguments:

- `dataset` — the raw dataset that you want to provide as input to tests
- `input_id` - a unique identifier that allows tracking what inputs are used when running each individual test
- `target_column` — a required argument if tests require access to true values. This is the name of the target column in the dataset


In [None]:
# vm_raw_dataset is now a VMDataset object that you can pass to any ValidMind test
vm_raw_dataset = vm.init_dataset(
    dataset=df_raw,
    input_id="raw_dataset",
    target_column="Exited",
)

### Run some tabular data tests

Individual tests can be easily run by calling the `run_test` function provided by the `validmind.tests` module. The function takes the following arguments:

- `test_id`: The ID of the test to run. To find a particular test and get its ID, refer to the [explore_tests](../how_to/explore_tests.ipynb) notebook. The output of `vm.tests.list_tests()` above also shows the ID of each test.
- `params`: A dictionary of parameters for the test. These will override any `default_params` set in the test definition. Refer to the [explore_tests](../how_to/explore_tests.ipynb) notebook to find the default parameters for a test. See below for examples.

The inputs expected by a test can also be found in the test definition. Let's take `validmind.data_validation.DescriptiveStatistics` as an example. Note that the output of the `describe_test()` function below shows that this test expects a `dataset` as input:


In [None]:
vm.tests.describe_test("validmind.data_validation.DescriptiveStatistics")

Now, let's run a few tests to assess the quality of the dataset.


In [None]:
test = vm.tests.run_test(
    test_id="validmind.data_validation.DescriptiveStatistics",
    inputs={"dataset": vm_raw_dataset},
)

In [None]:
test2 = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_raw_dataset},
    params={"min_percent_threshold": 30},
)

You can see that the class imbalance test did not pass according to the value of `min_percent_threshold` we have set. Here is how you can re-run the test on some processed data to address this data quality issue. In this case we apply a very simple rebalancing technique to the dataset.


In [None]:
import pandas as pd

df_raw_new = df_raw.sample(frac=1)  # Create a shuffled copy of the raw dataset

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = df_raw_new.loc[df_raw_new["Exited"] == 1]
not_exited_df = df_raw_new.loc[df_raw_new["Exited"] == 0].sample(n=exited_df.shape[0])

new_df = pd.concat([exited_df, not_exited_df])
new_df_raw = new_df.sample(frac=1, random_state=42)
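
Before re-running the test, it can help to check the class shares by hand. The following is a rough stand-in for what `min_percent_threshold` means, assuming the threshold is applied to each class's percentage of all rows. It is not ValidMind's implementation, and the helper names are hypothetical.

```python
from collections import Counter


def class_percentages(labels):
    """Return the share of each class as a percentage of all rows."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * count / total for label, count in counts.items()}


def passes_min_percent(labels, min_percent_threshold=30):
    """True when every class makes up at least the threshold percentage."""
    return all(
        share >= min_percent_threshold
        for share in class_percentages(labels).values()
    )


imbalanced = [0] * 8 + [1] * 2  # 80% / 20%
balanced = [0] * 5 + [1] * 5  # 50% / 50%
print(passes_min_percent(imbalanced))  # False
print(passes_min_percent(balanced))  # True
```

The same helper accepts any iterable of labels, so it could be called on `new_df_raw["Exited"]` in this notebook.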

With this new raw dataset you can re-run the individual test to see if it passes the class imbalance test requirement. Remember to register a new VM dataset object since that is the type of input required by `run_test()`:


In [None]:
# Register new data and now 'vm_raw_dataset_new' is the new dataset object of interest
vm_raw_dataset_new = vm.init_dataset(
    dataset=new_df_raw,
    input_id="new_df_raw",
    target_column="Exited",
)

In [None]:
test = vm.tests.run_test(
    test_id="validmind.data_validation.ClassImbalance",
    inputs={"dataset": vm_raw_dataset_new},
    params={"min_percent_threshold": 30},
)

### Utilize Test Output

Below is an example of how you can use the output from a ValidMind test for further processing. For example, if you want to remove highly correlated features, you can run the Pearson correlation test and use its output to reduce the feature list for modeling:


In [None]:
corr_results = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_new},
)
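
To make `max_threshold` concrete, here is a hand-rolled sketch that flags column pairs whose absolute Pearson coefficient exceeds a threshold. It assumes the threshold is compared against the absolute value of the coefficient. The toy data and the function names are made up for illustration, and this is not ValidMind's implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def high_correlation_pairs(columns, max_threshold=0.3):
    """Return (name_a, name_b, r) for pairs with |r| above the threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(columns.items(), 2):
        r = pearson(xs, ys)
        if abs(r) > max_threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Toy columns: "age" and "tenure" move together, "noise" does not
toy = {
    "age": [25, 32, 47, 51, 62],
    "tenure": [1, 3, 5, 7, 9],
    "noise": [3, 1, 2, 1, 3],
}
print(high_correlation_pairs(toy))
```

Note that the sketch does not guard against constant columns, which have zero variance and would cause a division by zero.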

Let's assume we want to remove highly correlated features from the dataset. `corr_results` is an object of type `ThresholdTestResult`, and we can inspect its individual `results` to get access to the features that failed the test. In general, ValidMind tests return one of two result types:

- [MetricResult](https://docs.validmind.ai/validmind/validmind/vm_models.html#MetricResult): most metrics return this type of result
- [ThresholdTestResult](https://docs.validmind.ai/validmind/validmind/vm_models.html#ThresholdTest): tests that compare a metric to a threshold return this type of result


In [None]:
print(corr_results.test_results)
print("test_name: ", corr_results.test_results.test_name)
print("params: ", corr_results.test_results.params)
print("passed: ", corr_results.test_results.passed)
print("results: ", corr_results.test_results.results)

Let's inspect the `results` and extract a list of features that failed the test:


In [None]:
corr_results.test_results.results

Remove the highly correlated features and create a new VM dataset object. Note the use of different `input_id`s. This allows tracking the inputs used when running each individual test.


In [None]:
# Collect the columns whose correlation check did not pass
high_correlation_features = [
    result.column
    for result in corr_results.test_results.results
    if not result.passed
]
high_correlation_features

In [None]:
# Remove the highly correlated features from the dataset
new_df_raw.drop(columns=high_correlation_features, inplace=True)

# Re-initialize the dataset object
vm_raw_dataset_new = vm.init_dataset(
    dataset=new_df_raw,
    input_id="new_df_raw_no_age",
    target_column="Exited",
)

Re-running the test with the reduced feature set should pass the test. You can also plot the correlation matrix to visualize the new correlation between features:


In [None]:
corr_results = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_new},
)

In [None]:
corr_results = vm.tests.run_test(
    test_id="validmind.data_validation.PearsonCorrelationMatrix",
    inputs={"dataset": vm_raw_dataset_new},
)

### Documenting the results based on two datasets

We have now done some analysis on two different datasets, and we should be able to document why certain things were done to the raw data, with testing to support it. Every test result returned by the `run_test()` function has a `.log()` method that can be used to log the test results to ValidMind. When logging individual results to ValidMind, you will need to manually add those results to a specific section of the model documentation.

When using `run_documentation_tests()`, it's possible to automatically populate a section with the results of all tests that were registered in the documentation template.

To populate the data preparation section of the documentation, you will now complete the following steps:

1. Run `run_documentation_tests()` using `vm_raw_dataset_new` as input
2. Log the individual high correlation test result using `vm_raw_dataset` (no data cleanup)
3. Log the individual high correlation test result using `vm_raw_dataset_new` (balanced classes and reduced features)

After adding test driven blocks for steps #2 and #3 you will be able to explain the changes made to the raw data by editing the default description of the test result.


#### Run `run_documentation_tests()` using `vm_raw_dataset_new` as input

`run_documentation_tests()` allows you to run multiple tests at once and log the results to the documentation. The function takes the following arguments:

- `inputs`: any inputs to be passed to the tests
- `config`: a dictionary `<test_id>:<test_config>` that allows configuring each test individually. Each test config has the following form:
  - `params`: individual test parameters
  - `inputs`: individual test inputs. When passed, this overrides any inputs passed from the `run_documentation_tests()` function


In [None]:
test_config = {
    "validmind.data_validation.ClassImbalance": {
        "params": {"min_percent_threshold": 30},
    },
    "validmind.data_validation.HighPearsonCorrelation": {
        "params": {"max_threshold": 0.3},
    },
}

tests_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_raw_dataset_new,
    },
    config=test_config,
    section=["data_preparation"],
)

#### Log the individual high correlation test result using `vm_raw_dataset` (no data cleanup)

Here you can use a custom `result_id` to tag the individual result with a unique identifier. This `result_id` can be appended to `test_id` with a `:` separator.


In [None]:
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation:vm_raw_dataset",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset},
)
result.log()
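
The `<test_id>:<result_id>` convention can be illustrated with a small string helper. `split_test_id` is a hypothetical function written for this note, not part of the ValidMind library.

```python
def split_test_id(full_id):
    """Split '<test_id>:<result_id>' into its two parts; result_id may be None."""
    test_id, sep, result_id = full_id.partition(":")
    return test_id, (result_id if sep else None)


print(split_test_id("validmind.data_validation.HighPearsonCorrelation:vm_raw_dataset"))
# ('validmind.data_validation.HighPearsonCorrelation', 'vm_raw_dataset')
print(split_test_id("validmind.data_validation.HighPearsonCorrelation"))
# ('validmind.data_validation.HighPearsonCorrelation', None)
```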

#### Log the individual high correlation test result using `vm_raw_dataset_new` (balanced classes and reduced features)

Repeat the same process as above but with the new dataset, using a new `result_id`.


In [None]:
result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation:vm_raw_dataset_new",
    params={"max_threshold": 0.3},
    inputs={
        "dataset": vm_raw_dataset_new,
    },
)
result.log()

### Add individual test results to model documentation

You can now visit the documentation page for the model you connected to at the beginning of this notebook and add a new content block in the relevant section.

To do this, go to the documentation page of your model and navigate to the `Data Preparation` -> `Correlations and Interactions` section. Then hover after the "Pearson Correlation Matrix" content block to reveal the `+` button as shown in the screenshot below.

![screenshot showing insert button for test-driven blocks](../images/insert-test-driven-block.png)

Click on the `+` button and select `Test-Driven Block`. This will open a dialog where you can select `Threshold Test` as the type of the test-driven content block, and then select the `High Pearson Correlation Vm Raw Dataset Test` metric. This will show a preview of the result and it should match the results shown above.

![screenshot showing the selected test result in the dialog](../images/selecting-high-pearson-correlation-test.png)

Finally, click on the `Insert block` button to add the test result to the documentation. You'll now see two individual results for the high correlation test in the `Correlations and Interactions` section of the documentation. To finalize the documentation, you can edit the test result's description block to explain the changes made to the raw data and the reasons behind them as we can see in the screenshot below.

![screenshot showing the high pearson correlation block](../images/high-pearson-correlation-block.png)


### Model Testing

So far we have focused on the data assessment and pre-processing that usually occurs before any models are built. Now we assume that we have built a model and want to incorporate some model results in our documentation.

Let's train a simple logistic regression model on the dataset and evaluate its performance. You will use the `LogisticRegression` class from the `sklearn.linear_model` module and use ValidMind tests to evaluate the model's performance.

Before training the model, we need to encode the categorical features in the dataset. You will use the `get_dummies` function from `pandas` to one-hot encode the categorical features. The categorical features in the dataset are `Geography` and `Gender`.


In [None]:
new_df_raw.head()

In [None]:
new_df_raw = pd.get_dummies(
    new_df_raw, columns=["Geography", "Gender"], drop_first=True
)
new_df_raw.head()

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Split the input and target variables
X = new_df_raw.drop("Exited", axis=1)
y = new_df_raw["Exited"]
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42,
)

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(X_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_

### Initialize model evaluation objects and assign predictions

The last step for evaluating the model's performance is to initialize the ValidMind dataset and model objects and assign model predictions to each dataset. You will use the `init_dataset`, `init_model` and `assign_predictions` functions to initialize these objects.


In [None]:
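# Copy the feature frames so that adding the target column does not modify X_train and X_test
TRAIN = X_train.copy()
TRAIN["Exited"] = y_train
TEST = X_test.copy()
TEST["Exited"] = y_test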

vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=TRAIN,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=TEST,
    target_column="Exited",
)

# Register the model
vm_model = vm.init_model(log_reg, input_id="log_reg_model_v1")

Once the model has been registered you can assign model predictions to the training and test datasets. The `assign_predictions()` method from the `Dataset` object can link existing predictions to any number of models. If no prediction values are passed, the method will compute predictions automatically:


In [None]:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)

### Run the model evaluation tests

In this part, we focus on running the tests within the model development section of the model documentation. Passing `section=["model_development"]` to `run_documentation_tests()` executes only the tests associated with that section and updates the corresponding section in the model documentation.

Note the additional config that is passed to `run_documentation_tests()`. This allows you to override inputs or params in certain tests. In our case, we want to explicitly use the `vm_train_ds` for the `validmind.model_validation.sklearn.ClassifierPerformance:in_sample` test, since it's supposed to run on the training dataset and not the test dataset.


In [None]:
test_config = {
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {
            "dataset": vm_train_ds,
            "model": vm_model,
        },
    }
}
results = vm.run_documentation_tests(
    section=["model_development"],
    inputs={
        "dataset": vm_test_ds,  # Any test that requires a single dataset will use vm_test_ds
        "model": vm_model,
        "datasets": (
            vm_train_ds,
            vm_test_ds,
        ),  # Any test that requires multiple datasets will use vm_train_ds and vm_test_ds
    },
    config=test_config,
)

## 3. Custom metrics implementation

This section assumes that model developers already have a repository of custom-made tests that they consider critical to include in the documentation. Here we will provide details on how to easily integrate your custom tests with ValidMind.

For a more in-depth introduction to custom metrics, refer to this [notebook](../code_samples/custom_tests/implementing_custom_tests.ipynb).

A custom metric is any function that takes as arguments a set of inputs and optionally some parameters and returns one or more outputs. The function can be as simple or as complex as you need it to be. It can use external libraries, make API calls, or do anything else that you can do in Python. The only requirement is that the function signature and return values can be "understood" and handled by the ValidMind developer framework. As such, custom metrics offer added flexibility by extending the default metrics provided by ValidMind, enabling you to document any type of model or use case.

In the following example, you will learn how to implement a custom metric that calculates the confusion matrix for a binary classification model. You will see that the custom metric function is just a regular Python function that can include and require any Python library as you see fit.


#### Create a confusion matrix plot

To understand how to create a custom metric from anything, let's first create a confusion matrix plot using the `confusion_matrix` function from the `sklearn.metrics` module.


In [None]:
import matplotlib.pyplot as plt
from sklearn import metrics

# Get the predicted classes
y_pred = log_reg.predict(X_test)

confusion_matrix = metrics.confusion_matrix(y_test, y_pred)

cm_display = metrics.ConfusionMatrixDisplay(
    confusion_matrix=confusion_matrix, display_labels=[False, True]
)
cm_display.plot()

We will now use the `@vm.metric` decorator to create a reusable custom metric. Note the following changes in the code below:

- The function `confusion_matrix` takes two arguments, `dataset` and `model`. These are a `VMDataset` and a `VMModel` object respectively.
  - `VMDataset` objects allow you to access the dataset's true (target) values through the `.y` attribute.
  - `VMDataset` objects allow you to access the predictions for a given model by calling the `.y_pred()` method.
- The function docstring provides a description of what the metric does. This will be displayed along with the result in this notebook as well as in the ValidMind platform.
- The function body calculates the confusion matrix using the `sklearn.metrics.confusion_matrix` function as we just did above.
- The function then returns the `ConfusionMatrixDisplay.figure_` object. This is important because the ValidMind framework expects the output of the custom metric to be a plot or a table.
- The `@vm.metric` decorator creates a wrapper around the function that allows it to be run by the ValidMind framework. It also registers the metric so it can be found by the ID `my_custom_metrics.ConfusionMatrix` (see the section below on how test IDs work in ValidMind and why this format is important).


In [None]:
@vm.metric("my_custom_metrics.ConfusionMatrix")
def confusion_matrix(dataset, model):
    """The confusion matrix is a table that is often used to describe the performance of a classification model on a set of data for which the true values are known.

    The confusion matrix is a 2x2 table that contains 4 values:

    - True Positive (TP): the number of correct positive predictions
    - True Negative (TN): the number of correct negative predictions
    - False Positive (FP): the number of incorrect positive predictions
    - False Negative (FN): the number of incorrect negative predictions

    The confusion matrix can be used to assess the holistic performance of a classification model by showing the accuracy, precision, recall, and F1 score of the model on a single figure.
    """
    y_true = dataset.y
    y_pred = dataset.y_pred(model_id=model.input_id)

    confusion_matrix = metrics.confusion_matrix(y_true, y_pred)

    cm_display = metrics.ConfusionMatrixDisplay(
        confusion_matrix=confusion_matrix, display_labels=[False, True]
    )
    cm_display.plot()

    plt.close()  # close the plot to avoid displaying it

    return cm_display.figure_  # return the figure object itself

You can now run the newly created custom metric on both the training and test datasets using the `run_test()` function:


In [None]:
# Training dataset
result = vm.tests.run_test(
    "my_custom_metrics.ConfusionMatrix:training_dataset",
    inputs={"model": vm_model, "dataset": vm_train_ds},
)

In [None]:
# Test dataset
result = vm.tests.run_test(
    "my_custom_metrics.ConfusionMatrix:test_dataset",
    inputs={"model": vm_model, "dataset": vm_test_ds},
)

#### Adding parameters to custom metrics

Custom metrics can take parameters just like any other function.
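
As a sketch of what that can look like, the plain function below takes a `positive_label` parameter with a default value and returns the confusion counts as a one-row table. The `@vm.metric` decorator is left out so that the snippet runs without a ValidMind connection. The function name, the parameter, and the idea that `params={"positive_label": ...}` in `run_test()` would be forwarded to it are assumptions to verify against the custom tests notebook linked above.

```python
def confusion_counts(y_true, y_pred, positive_label=1):
    """Count TP, TN, FP and FN, treating `positive_label` as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive_label:
            counts["TP" if actual == positive_label else "FP"] += 1
        else:
            counts["FN" if actual == positive_label else "TN"] += 1
    return [counts]  # a list of row dictionaries, i.e. a one-row table


y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0, 0]

# Default parameter: 1 is the positive class
print(confusion_counts(y_true, y_pred))
# Same data, with 0 treated as the positive class
print(confusion_counts(y_true, y_pred, positive_label=0))
```

Giving the parameter a default keeps the function callable without any extra arguments, which mirrors how `default_params` behave for the built-in tests.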


### Register external test providers (custom test)

We will now declare a local filesystem test provider that allows loading tests from a local folder. For this to work, we just need to specify the root folder under which the provider class will look for tests. For this demo, it is the `./tests/` directory.



In [None]:
from validmind.tests import LocalTestProvider

# First we define a namespace so that we can always refer back to and find our custom tests. In this example "gbc_test_provider" is the identifier
gbc_namespace = "gbc_test_provider"

# Setting up the connection to where the custom testing code lives.
local_test_provider = LocalTestProvider(root_folder="./tests/")

# Now let's register the test under the name we defined above
vm.tests.register_test_provider(
    namespace=gbc_namespace,
    test_provider=local_test_provider,
)
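
To picture how a test ID relates to a file under the root folder, the sketch below maps `<namespace>.<TestName>` to `<root_folder>/<TestName>.py`. This mirrors the file layout described in this notebook. Treating further dots as subfolders is an assumption, and `resolve_test_path` is a hypothetical helper rather than the provider's actual lookup code.

```python
from pathlib import Path


def resolve_test_path(test_id, namespace, root_folder):
    """Map '<namespace>.<TestName>' to '<root_folder>/<TestName>.py'."""
    prefix = namespace + "."
    if not test_id.startswith(prefix):
        raise ValueError(f"{test_id!r} is not in namespace {namespace!r}")
    # Assumption: any remaining dots denote subfolders under the root folder
    relative = test_id[len(prefix):].replace(".", "/")
    return Path(root_folder) / f"{relative}.py"


print(
    resolve_test_path(
        "gbc_test_provider.MarginalBadRateTest", "gbc_test_provider", "./tests/"
    )
)
```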

### Implementing & Executing Custom Test in Model Documentation

Let's now build a sample custom test that includes the outputs from a demo function called `get_marginal_bad_rates`. Inside the `tests/` directory next to this notebook you will find a file called `MarginalBadRateTest.py`. This file contains the custom test definition that we will run in the next cell. If you open that file you'll see how we invoke the `get_marginal_bad_rates` function from the `run()` method provided by the test interface.


In [None]:
# The custom test is found by searching for the name space created above with the Python file name 'MarginalBadRateTest'
# This runs the test on the dataset object 'vm_train_ds' with model object 'vm_model'
test = vm.tests.run_test(
    test_id=f"{gbc_namespace}.MarginalBadRateTest",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model,
    },
)
test.log()

#### Change the parameters and implement in Model Documentation

Note how we have defined the following property in the custom test class (i.e. parameter in custom test):

```python
default_params = {"bins": 10}
```

This allows you to pass parameters to the test when running it. Let's re-run the test with 15 bins instead. In this custom test, the number of bins affects the figures and the table output.


In [None]:
# Same call as before, with an additional 'params={"bins": 15}' argument that overrides the default value of 10 bins
test = vm.tests.run_test(
    test_id=f"{gbc_namespace}.MarginalBadRateTest",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model,
    },
    params={"bins": 15},
)
test.log()

#### Using another dataset

The inputs to the test can also be changed. Let's re-run the test with the test dataset instead of the training dataset.



In [None]:
# Use the namespace registered above; here the test runs on 'vm_test_ds'
test = vm.tests.run_test(
    test_id=f"{gbc_namespace}.MarginalBadRateTest",
    inputs={
        "dataset": vm_test_ds,
        "model": vm_model,
    },
)
test.log()

### Incorporate Custom Test in Model Documentation

Now, let's try visualizing these results in the ValidMind dashboard. Since we have called `test.log()` when running these tests their results are automatically logged to the ValidMind platform.

Go to the ValidMind UI and select your model in the registry. Open the documentation page of your model and navigate to the `Model Development` -> `Model Evaluation` section. Then hover between any two existing content blocks to reveal the `+` button as shown in the screenshot below.

![screenshot showing insert button for test-driven blocks](images/insert-test-driven-block.png)

Click on the `+` button and select `Test-Driven Block`. This will open a dialog where you can select `Metric` as the type of the test-driven content block, and then select the `GBC Test Provider Marginal Bad Rate Test` metric. This will show a preview of the test result, which should match the results shown above.

![screenshot showing the selected composite metric in the dialog](images/selecting-bad-rates-metric.png)

Finally, click on the `Insert block` button to add the test to the documentation. You'll see its results displayed in the documentation, and from now on, any time you run `run_documentation_tests()`, the Marginal Bad Rate test will be run as part of the test suite. Let's go ahead and connect to the documentation project and run the tests.


### 4. Finalize Testing and Documentation

In this section we show how to finalize testing and documentation, covering the following:

1. How to run the documentation tests and update the configuration so that custom and additional tests can be included in documentation sections
2. How to overwrite individual test results with new data or a new model
3. How to configure parameters in more depth for the model diagnosis tests


#### 4.1 Programmatically change the documentation configuration

You can first preview the current configuration using the `vm.get_test_suite().get_default_config()` interface.


In [None]:
import json

project_test_suite = vm.get_test_suite()
config = project_test_suite.get_default_config()
print("Suite Config: \n", json.dumps(config, indent=2))

##### Updating config

You can update the test configuration to fit your use case and requirements. The example below sets the inputs for each test, including tests that take more than one dataset.



In [None]:
config = {
    "validmind.data_validation.DatasetSplit": {
        "inputs": {"datasets": (vm_train_ds, vm_test_ds)},
    },
    "validmind.model_validation.sklearn.PopulationStabilityIndex": {
        "inputs": {"model": vm_model, "datasets": (vm_train_ds, vm_test_ds)},
    },
    "validmind.model_validation.sklearn.ConfusionMatrix": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:in_sample": {
        "inputs": {"model": vm_model, "dataset": vm_train_ds},
    },
    "validmind.model_validation.sklearn.ClassifierPerformance:out_of_sample": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.PrecisionRecallCurve": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.ROCCurve": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.TrainingTestDegradation": {
        "inputs": {"model": vm_model, "datasets": (vm_train_ds, vm_test_ds)},
    },
    "validmind.model_validation.sklearn.MinimumAccuracy": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.MinimumF1Score": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.MinimumROCAUCScore": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.PermutationFeatureImportance": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.SHAPGlobalImportance": {
        "inputs": {"model": vm_model, "dataset": vm_test_ds},
    },
    "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
        "inputs": {"model": vm_model, "datasets": (vm_train_ds, vm_test_ds)},
    },
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "inputs": {"model": vm_model, "datasets": (vm_train_ds, vm_test_ds)},
    },
    "validmind.model_validation.sklearn.RobustnessDiagnosis": {
        "inputs": {"model": vm_model, "datasets": (vm_train_ds, vm_test_ds)},
    },
}

##### Run documentation tests

You can now run all documentation tests and pass an extra `config` parameter that overrides input and parameter configuration for the tests specified in the object.


In [None]:
full_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_raw_ds,
        "model": vm_model,
        "datasets": (vm_train_ds, vm_test_ds),
    },
    config=config,
).log()

#### 4.2 Overwrite a test that has been documented

This example shows how you can overwrite a test result. Suppose you did some initial testing and logged the results, but then had to change the data used for model training and testing. As a consequence, the tests have to be re-run with the updated data.
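
One way to carry out this overwrite is to re-run the affected test with the new inputs and log the result again. The sketch below wraps the `run_test(...)` and `.log()` calls used earlier in this notebook in a small helper. The helper name `rerun_and_log` is hypothetical, and the sketch assumes that logging a result under the same test ID replaces the earlier result in the documentation, which is worth confirming for your version of the framework.

```python
def rerun_and_log(vm, test_id, dataset, model, params=None):
    """Re-run a single test with new inputs and log the fresh result.

    `vm` is the imported ValidMind module (or any object exposing
    `tests.run_test`), passed in explicitly so the helper has no hidden state.
    """
    kwargs = {
        "test_id": test_id,
        "inputs": {"dataset": dataset, "model": model},
    }
    if params:
        # Only forward params when the caller wants to override defaults
        kwargs["params"] = params
    result = vm.tests.run_test(**kwargs)
    result.log()
    return result


# Example usage in this notebook, assuming `vm_new_test_ds` is a dataset you
# re-initialized after changing the data (hypothetical name):
# result = rerun_and_log(
#     vm,
#     "validmind.model_validation.sklearn.ConfusionMatrix",
#     dataset=vm_new_test_ds,
#     model=vm_model,
# )
```

Keeping the test ID unchanged is the important part: it ties the new result to the same entry that the earlier run produced.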


#### 4.3 Configure parameters for model diagnosis tests

Each test comes with default parameters, but the most suitable values depend on the use case you are trying to solve. ValidMind's developer framework exposes these parameters at the user level so that they can be adjusted based on requirements.

The config can be applied to a specific test to override the default configuration parameters.

The format of the config is:

```
config = {
    "<test1_id>": {
        "params": {
            "<default_param_1>": value,
            "<default_param_2>": value,
        },
    },
    "<test2_id>": {
        "params": {
            "<default_param_1>": value,
            "<default_param_2>": value,
        },
    },
}
```

You can pass this configuration to `run_documentation_tests()` and `run_test_suite()` through the **`config`** argument to fine-tune the suite to your requirements.
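
Sections 4.1 and 4.3 each build a separate `config` dictionary, one carrying `inputs` and one carrying `params`. If you want a single run to use both, one option is to merge them per test ID. The helper below is a minimal sketch; `merge_test_configs` is a hypothetical name, not part of the ValidMind API, and the strings stand in for the `vm_model`, `vm_train_ds`, and `vm_test_ds` objects.

```python
def merge_test_configs(*configs):
    """Combine per-test config dicts; later configs win on conflicting keys."""
    merged = {}
    for config in configs:
        for test_id, settings in config.items():
            entry = merged.setdefault(test_id, {})
            for section, values in settings.items():  # e.g. "inputs" or "params"
                # Copy into a fresh dict so the original configs are not mutated
                entry.setdefault(section, {}).update(values)
    return merged


# Placeholder strings stand in for the real ValidMind objects
inputs_config = {
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "inputs": {"model": "vm_model", "datasets": ("vm_train_ds", "vm_test_ds")},
    },
}
params_config = {
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "params": {"cut_off_percentage": 3},
    },
}

config = merge_test_configs(inputs_config, params_config)
print(sorted(config["validmind.model_validation.sklearn.OverfitDiagnosis"]))
```

The merged dictionary can then be passed as `config=` in the same way as the dictionaries shown in this notebook.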


In [None]:
# The example below restricts the diagnosis tests to specific columns. For example, Weak Spots Diagnosis runs only on the Age and Balance features.

config = {
    "validmind.model_validation.sklearn.OverfitDiagnosis": {
        "params": {
            "cut_off_percentage": 3,
            "feature_columns": ["Age", "Balance", "Tenure", "NumOfProducts"],
        },
    },
    "validmind.model_validation.sklearn.WeakspotsDiagnosis": {
        "params": {
            "features_columns": ["Age", "Balance"],
            "accuracy_gap_threshold": 85,
        },
    },
    "validmind.model_validation.sklearn.RobustnessDiagnosis": {
        "params": {
            "features_columns": ["Balance", "Tenure"],
            "scaling_factor_std_dev_list": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
            "accuracy_decay_threshold": 4,
        },
    },
}

full_suite = vm.run_documentation_tests(
    inputs={
        "dataset": vm_train_ds,
        "datasets": (vm_train_ds, vm_test_ds),
        "model": vm_model,
    },
    config=config,
    section="model_diagnosis",
)

### Next steps

You can look at the results of this test suite right in the notebook where you ran the code, as you would expect. But there is a better way: view the test results as part of your model documentation right in the ValidMind Platform UI:

1. In the [Platform UI](https://app.prod.validmind.ai), go to the **Documentation** page for the model you registered earlier.

2. Expand **Model Development**

What you can see now is a more easily consumable version of the model diagnosis tests you just performed, along with other parts of your model documentation that still need to be completed.

If you want to learn more about where you are in the model documentation process, take a look at [How do I use the framework?](https://docs.validmind.ai/guide/get-started-developer-framework.html#how-do-i-use-the-framework).
