# ValidMind for model validation 3 — Developing a potential challenger model

Learn how to use ValidMind for your end-to-end model validation process with our series of four introductory notebooks. In this third notebook, develop a potential challenger model and then pass your model and its predictions to ValidMind.

A *challenger model* is an alternate model that attempts to outperform the champion model, ensuring that the best performing fit-for-purpose model is always considered for deployment. Challenger models also help avoid over-reliance on a single model, and allow testing of new features, algorithms, or data sources without disrupting the production lifecycle.

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Learn by doing</b></span>
<br></br>
Our course tailor-made for validators new to ValidMind combines this series of notebooks with a more in-depth introduction to the ValidMind Platform: <a href="https://docs.validmind.ai/training/validator-fundamentals/validator-fundamentals-register.html" style="color: #DE257E;"><b>Validator Fundamentals</b></a></div>

::: {.content-hidden when-format="html"}
## Contents    
- [Prerequisites](#toc1_)    
- [Setting up](#toc2_)    
  - [Initialize the ValidMind Library](#toc2_1_)    
  - [Import the sample dataset](#toc2_2_)    
    - [Preprocess the dataset](#toc2_2_1_)    
  - [Split the preprocessed dataset](#toc2_3_)    
- [Import the champion model](#toc3_)    
- [Training a potential challenger model](#toc4_)    
  - [Random forest classification model](#toc4_1_)    
- [Initializing the model objects](#toc5_)    
  - [Initialize the model objects](#toc5_1_)    
  - [Assign predictions](#toc5_2_)    
- [Running model evaluation tests](#toc6_)    
  - [Run model performance tests](#toc6_1_)    
    - [Evaluate performance of the champion model](#toc6_1_1_)    
    - [Log a model finding](#toc6_1_2_)    
    - [Evaluate performance of challenger model](#toc6_1_3_)    
  - [Run diagnostic tests](#toc6_2_)    
  - [Run feature importance tests](#toc6_3_)    
- [In summary](#toc7_)    
- [Next steps](#toc8_)    
  - [Finalize validation and reporting](#toc8_1_)    

:::
<!-- jn-toc-notebook-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=4
	/jn-toc-notebook-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

<a id='toc1_'></a>

## Prerequisites

In order to develop potential challenger models with this notebook, you'll need to first have:

- [x] Registered a model within the ValidMind Platform and granted yourself access to the model as a validator
- [x] Installed the ValidMind Library in your local environment, allowing you to access all its features
- [x] Learned how to import and initialize datasets for use with ValidMind
- [x] Understood the basics of how to run and log tests with ValidMind
- [x] Run data quality tests on the datasets used to train the champion model, and logged the results of those tests to ValidMind
- [x] Inserted your logged test results into your validation report

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Need help with the above steps?</b></span>
<br></br>
Refer to the first two notebooks in this series:

- <a href="1-set_up_validmind_for_validation.ipynb" style="color: #DE257E;"><b>1 — Set up the ValidMind Library for validation</b></a>
- <a href="2-start_validation_process.ipynb" style="color: #DE257E;"><b>2 — Start the model validation process</b></a>

</div>

<a id='toc2_'></a>

## Setting up

This section should be quite familiar to you — as we performed the same actions in the previous notebook, **[2 — Start the model validation process](2-start_validation_process.ipynb)**.

<a id='toc2_1_'></a>

### Initialize the ValidMind Library

As usual, let's first connect the ValidMind Library to the model we previously registered in the ValidMind Platform:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Inventory** and select the model you registered for this "ValidMind for model validation" series of notebooks.

3. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:

In [None]:
# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)

<a id='toc2_2_'></a>

### Import the sample dataset

Next, we'll load the sample [Bank Customer Churn Prediction](https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction) dataset that was used to develop the champion model, which we'll preprocess independently:

In [None]:
# Load the sample dataset
from validmind.datasets.classification import customer_churn as demo_dataset

print(
    f"Loaded demo dataset with: \n\n\t• Target column: '{demo_dataset.target_column}' \n\t• Class labels: {demo_dataset.class_labels}"
)

raw_df = demo_dataset.load_data()

<a id='toc2_2_1_'></a>

#### Preprocess the dataset

We’ll apply a simple rebalancing technique to the dataset before continuing:

In [None]:
import pandas as pd

raw_copy_df = raw_df.sample(frac=1)  # Create a shuffled copy of the raw dataset (raw_df is left untouched)

# Create a balanced dataset with the same number of exited and not exited customers
exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 1]
not_exited_df = raw_copy_df.loc[raw_copy_df["Exited"] == 0].sample(n=exited_df.shape[0])

balanced_raw_df = pd.concat([exited_df, not_exited_df])
balanced_raw_df = balanced_raw_df.sample(frac=1, random_state=42)
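
To see what this undersampling step does in isolation, here is a minimal sketch on a toy DataFrame. The column values are made up for illustration, and the fixed `random_state` is an assumption added here so the sketch is repeatable; it is not part of the notebook's own cell.

```python
import pandas as pd

# Toy, imbalanced data standing in for the churn dataset (values are made up)
toy_df = pd.DataFrame({
    "Balance": range(10),
    "Exited": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
})

# Keep every minority-class row
minority_df = toy_df.loc[toy_df["Exited"] == 1]

# Undersample the majority class down to the minority class size
majority_df = toy_df.loc[toy_df["Exited"] == 0].sample(
    n=len(minority_df), random_state=42
)

# Combine and shuffle so the classes are interleaved
balanced_df = pd.concat([minority_df, majority_df]).sample(frac=1, random_state=42)

class_counts = balanced_df["Exited"].value_counts().to_dict()
print(class_counts)
```

Both classes end up with the same number of rows, at the cost of discarding some majority-class observations.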

Let’s also quickly remove highly correlated features from the dataset using the output from a ValidMind test.

As you know, before we can run tests we need to initialize a ValidMind dataset object with the [`init_dataset` function](https://docs.validmind.ai/validmind/validmind.html#init_dataset):

In [None]:
# Register new data and now 'balanced_raw_dataset' is the new dataset object of interest
vm_balanced_raw_dataset = vm.init_dataset(
    dataset=balanced_raw_df,
    input_id="balanced_raw_dataset",
    target_column="Exited",
)

With our balanced dataset initialized, we can then run our test and utilize the output to help us identify the features we want to remove:

In [None]:
# Run HighPearsonCorrelation test with our balanced dataset as input and return a result object
corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_balanced_raw_dataset},
)
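
For intuition about what a correlation threshold of `0.3` means, the sketch below flags column pairs whose absolute Pearson correlation exceeds a threshold using plain pandas. It is an illustration on toy data only and is not ValidMind's implementation of `HighPearsonCorrelation`; the column names `a`, `b`, and `c` are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Toy data: b is a linear function of a, c is only weakly related to either
toy_df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "b": [3, 5, 7, 9, 11, 13, 15, 17, 19, 21],  # b = 2a + 1
    "c": [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
})

max_threshold = 0.3
corr = toy_df.corr(method="pearson")

# Collect every column pair whose absolute correlation exceeds the threshold
flagged_pairs = [
    (left, right)
    for left, right in combinations(corr.columns, 2)
    if abs(corr.loc[left, right]) > max_threshold
]
print(flagged_pairs)
```

Only the `a` and `b` pair is flagged here, since the alternating column `c` has a weak correlation with both.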

In [None]:
# From result object, extract table from `corr_result.tables`
features_df = corr_result.tables[0].data
features_df

In [None]:
# Extract list of features that failed the test
high_correlation_features = features_df[features_df["Pass/Fail"] == "Fail"]["Columns"].tolist()
high_correlation_features

In [None]:
# Extract feature names from the list of strings
high_correlation_features = [feature.split(",")[0].strip("()") for feature in high_correlation_features]
high_correlation_features
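
The string handling in the cell above can be checked on its own. This sketch assumes the `Columns` values look like `"(Age, Exited)"`; the example strings are hypothetical, so compare them against your own `features_df` output before relying on the pattern.

```python
# Hypothetical "Columns" values; the real strings come from the test's result table
failed_pairs = ["(Age, Exited)", "(Balance, NumOfProducts)"]

# Take the text before the first comma and strip the surrounding parentheses
first_features = [pair.split(",")[0].strip("()") for pair in failed_pairs]
print(first_features)

# Guard: the target column should never end up in the list of columns to drop
target_column = "Exited"
assert target_column not in first_features
```

Because only the first feature of each pair is kept, a quick check that the target column is not among the features to drop can prevent a confusing error later on.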

We can then re-initialize the dataset with a different `input_id` and the highly correlated features removed, and re-run the test for confirmation:

In [None]:
# Remove the highly correlated features from the dataset
balanced_raw_no_age_df = balanced_raw_df.drop(columns=high_correlation_features)

# Re-initialize the dataset object
vm_raw_dataset_preprocessed = vm.init_dataset(
    dataset=balanced_raw_no_age_df,
    input_id="raw_dataset_preprocessed",
    target_column="Exited",
)

In [None]:
# Re-run the test with the reduced feature set
corr_result = vm.tests.run_test(
    test_id="validmind.data_validation.HighPearsonCorrelation",
    params={"max_threshold": 0.3},
    inputs={"dataset": vm_raw_dataset_preprocessed},
)

<a id='toc2_3_'></a>

### Split the preprocessed dataset

With our raw dataset rebalanced and highly correlated features removed, let's now **split our dataset into train and test** in preparation for model evaluation testing:

In [None]:
# Encode categorical features in the dataset
balanced_raw_no_age_df = pd.get_dummies(
    balanced_raw_no_age_df, columns=["Geography", "Gender"], drop_first=True
)
balanced_raw_no_age_df.head()
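
If the effect of `drop_first=True` is unfamiliar, the following sketch shows it on a toy DataFrame. The rows are made up; only the column names mirror the ones used in the cell above.

```python
import pandas as pd

# Toy rows using the same categorical column names as the notebook
toy_df = pd.DataFrame({
    "Geography": ["France", "Germany", "Spain", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Exited": [0, 1, 0, 1],
})

# One-hot encode, dropping the first category of each feature
encoded_df = pd.get_dummies(toy_df, columns=["Geography", "Gender"], drop_first=True)

encoded_columns = list(encoded_df.columns)
print(encoded_columns)
```

The first category of each feature (`France` and `Female` in this toy data) gets no column of its own and becomes the implicit baseline, which avoids perfectly collinear dummy columns in a linear model such as logistic regression.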

In [None]:
from sklearn.model_selection import train_test_split

# Split the dataset into train and test
train_df, test_df = train_test_split(balanced_raw_no_age_df, test_size=0.20)

X_train = train_df.drop("Exited", axis=1)
y_train = train_df["Exited"]
X_test = test_df.drop("Exited", axis=1)
y_test = test_df["Exited"]
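
Note that the split above is unseeded, so each run selects different rows. If you want repeatable results, one option is to pass `random_state` (and optionally `stratify`) to `train_test_split`. The sketch below is a suggestion on toy data, not a change the notebook itself makes.

```python
from sklearn.model_selection import train_test_split

# Toy rows and perfectly balanced labels
rows = list(range(20))
labels = [0, 1] * 10

# Two splits with the same seed and stratification settings
split_a = train_test_split(rows, labels, test_size=0.20, random_state=42, stratify=labels)
split_b = train_test_split(rows, labels, test_size=0.20, random_state=42, stratify=labels)

# Each result is [rows_train, rows_test, labels_train, labels_test]
same_split = split_a[1] == split_b[1]
test_labels = split_a[3]

print(same_split, sorted(test_labels))
```

With a fixed seed the two splits are identical, and stratifying keeps the class ratio of the test set equal to that of the full data.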

In [None]:
# Initialize the split datasets
vm_train_ds = vm.init_dataset(
    input_id="train_dataset_final",
    dataset=train_df,
    target_column="Exited",
)

vm_test_ds = vm.init_dataset(
    input_id="test_dataset_final",
    dataset=test_df,
    target_column="Exited",
)

<a id='toc3_'></a>

## Import the champion model

With our raw dataset assessed and preprocessed, let's go ahead and import the champion model submitted by the model development team in the format of a `.pkl` file: **[lr_model_champion.pkl](lr_model_champion.pkl)**

In [None]:
# Import the champion model
import pickle as pkl

with open("lr_model_champion.pkl", "rb") as f:
    log_reg = pkl.load(f)

<a id='toc4_'></a>

## Training a potential challenger model

We're curious how an alternate model compares to our champion model, so let's train a challenger model as a basis for our testing.

Our champion *logistic regression model* is a simpler, parametric model that assumes a linear relationship between the independent variables and the log-odds of the outcome. While logistic regression may not capture complex patterns as effectively, it offers a high degree of interpretability and is easier to explain to stakeholders. However, model risk is not assessed on any single factor in isolation, but by weighing trade-offs in predictive performance, ease of interpretability, and overall alignment with business objectives.

<a id='toc4_1_'></a>

### Random forest classification model

A *random forest classification model* is an ensemble machine learning algorithm that uses multiple decision trees to classify data. In ensemble learning, multiple models are combined to improve prediction accuracy and robustness.

Random forest classification models often achieve higher accuracy because they can capture complex, non-linear relationships, but as a result their predictions are harder to interpret.

In [None]:
# Import the Random Forest Classification model
from sklearn.ensemble import RandomForestClassifier

# Create the model instance with 50 decision trees
rf_model = RandomForestClassifier(
    n_estimators=50,
    random_state=42,
)

# Train the model
rf_model.fit(X_train, y_train)
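
To illustrate how the trees in the ensemble are combined, the sketch below checks that a scikit-learn random forest's predicted probabilities equal the average of its individual trees' probabilities. It uses synthetic data from `make_classification` rather than the churn dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=42).fit(X, y)

# Probabilities from each tree: shape (n_trees, n_samples, n_classes)
per_tree = np.stack([tree.predict_proba(X) for tree in forest.estimators_])

# Average across trees and compare with the forest's own output
averaged = per_tree.mean(axis=0)
matches_forest = np.allclose(averaged, forest.predict_proba(X))
print(per_tree.shape, matches_forest)
```

Averaging over many trees is what smooths out the quirks of any single tree, and it is also why no single decision path explains a given prediction.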

<a id='toc5_'></a>

## Initializing the model objects

<a id='toc5_1_'></a>

### Initialize the model objects

In addition to the initialized datasets, you'll also need to initialize a ValidMind model object (`vm_model`) for each of our two models, which can then be passed to other functions for analysis and tests on the data.

You simply initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):

In [None]:
# Initialize the champion logistic regression model
vm_log_model = vm.init_model(
    log_reg,
    input_id="log_model_champion",
)

# Initialize the challenger random forest classification model
vm_rf_model = vm.init_model(
    rf_model,
    input_id="rf_model",
)

<a id='toc5_2_'></a>

### Assign predictions

With our models initialized, we'll move on to assigning both the predicted probabilities that come directly from each model and the binary predictions obtained by applying a cutoff threshold to those probabilities.

- The [`assign_predictions()` method](https://docs.validmind.ai/validmind/validmind/vm_models.html#assign_predictions) from the `Dataset` object can link existing predictions to any number of models.
- This method links the model's class prediction values and probabilities to our `vm_train_ds` and `vm_test_ds` datasets.

If no prediction values are passed, the method will compute predictions automatically:

In [None]:
# Champion — Logistic regression model
vm_train_ds.assign_predictions(model=vm_log_model)
vm_test_ds.assign_predictions(model=vm_log_model)

# Challenger — Random forest classification model
vm_train_ds.assign_predictions(model=vm_rf_model)
vm_test_ds.assign_predictions(model=vm_rf_model)
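
To make the distinction between probabilities and binary predictions concrete, here is a minimal sketch in plain Python. The probabilities are hypothetical, and the `0.5` cutoff with a greater-than-or-equal comparison is an assumption for illustration, not a statement about how `assign_predictions()` works internally.

```python
# Hypothetical positive-class probabilities for four customers
probabilities = [0.12, 0.50, 0.49, 0.91]

# Assumed cutoff; probabilities at or above it map to the positive class
cutoff = 0.5

binary_predictions = [int(p >= cutoff) for p in probabilities]
print(binary_predictions)
```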

<a id='toc6_'></a>

## Running model evaluation tests

With our setup complete, let's run the rest of our validation tests. Since we have already verified the data quality of the dataset used to train our champion model, we will now focus on comprehensive performance evaluations of both the champion and challenger models.

<a id='toc6_1_'></a>

### Run model performance tests

Let's run some performance tests, beginning with independent testing of our champion logistic regression model, then moving on to our potential challenger model.

Use [`vm.tests.list_tests()`](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to identify all the model performance tests for classification:

In [None]:

vm.tests.list_tests(tags=["model_performance"], task="classification")

We'll isolate the specific tests we want to run in `mpt`:

- [`ClassifierPerformance`](https://docs.validmind.ai/tests/model_validation/sklearn/ClassifierPerformance.html)
- [`ConfusionMatrix`](https://docs.validmind.ai/tests/model_validation/sklearn/ConfusionMatrix.html)
- [`MinimumAccuracy`](https://docs.validmind.ai/tests/model_validation/sklearn/MinimumAccuracy.html)
- [`MinimumF1Score`](https://docs.validmind.ai/tests/model_validation/sklearn/MinimumF1Score.html)
- [`ROCCurve`](https://docs.validmind.ai/tests/model_validation/sklearn/ROCCurve.html)

As we learned in the previous notebook [2 — Start the model validation process](2-start_validation_process.ipynb), you can use a custom `result_id` to tag the individual result with a unique identifier by appending this `result_id` to the `test_id` with a `:` separator. We'll append an identifier for our champion model here:

In [None]:
mpt = [
    "validmind.model_validation.sklearn.ClassifierPerformance:logreg_champion",
    "validmind.model_validation.sklearn.ConfusionMatrix:logreg_champion",
    "validmind.model_validation.sklearn.MinimumAccuracy:logreg_champion",
    "validmind.model_validation.sklearn.MinimumF1Score:logreg_champion",
    "validmind.model_validation.sklearn.ROCCurve:logreg_champion"
]
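
If you prefer not to type each tagged ID by hand, the `test_id:result_id` strings can be built programmatically. This sketch uses only string operations on two of the test IDs listed above.

```python
# Base test IDs taken from the list above
base_tests = [
    "validmind.model_validation.sklearn.ClassifierPerformance",
    "validmind.model_validation.sklearn.MinimumAccuracy",
]
result_id = "logreg_champion"

# Append the result_id with a ":" separator
tagged_tests = [f"{test_id}:{result_id}" for test_id in base_tests]

# Split a tagged ID back into its two parts
test_id, _, suffix = tagged_tests[0].partition(":")
print(tagged_tests)
print(test_id, suffix)
```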

<a id='toc6_1_1_'></a>

#### Evaluate performance of the champion model

Now, let's run and log our batch of model performance tests using our testing dataset (`vm_test_ds`) for our champion model:

- The test set serves as a proxy for real-world data, providing an unbiased estimate of model performance since it was not used during training or tuning.
- The test set also acts as protection against selection bias and model tweaking, giving a final, more unbiased checkpoint.

In [None]:
for test in mpt:
    vm.tests.run_test(
        test,
        inputs={
            "dataset": vm_test_ds, "model" : vm_log_model,
        },
    ).log()

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for some test IDs. </b></span>
<br></br>
That's expected: when we run validation tests, the logged results need to be manually added to your report as part of your compliance assessment process within the ValidMind Platform.</div>

<a id='toc6_1_2_'></a>

#### Log a model finding

As we can observe from the output above, our champion model doesn't pass the `MinimumAccuracy` test based on the default thresholds of the out-of-the-box test, so let's log a model finding in the ValidMind Platform ([Need more help?](https://docs.validmind.ai/guide/model-validation/add-manage-model-findings.html)):

1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.

2. In the left sidebar that appears for your model, click **Validation Report** under Documents.

3. Locate the Data Preparation section and click on **2.2.2. Model Performance** to expand that section.

4. Under the Model Performance Metrics section, locate Findings then click **Link Finding to Report**:

    <img src= "link-finding.png" alt="Screenshot showing the validation report with the link finding option highlighted" style="border: 2px solid #083E44; border-radius: 8px; border-right-width: 2px; border-bottom-width: 3px;">
    <br><br>

5. Select **Validation Issue** as the type of finding.

6. Click **+ Add Validation Issue Finding** to add a finding.

7. Enter the details for your finding, for example:

    - **TITLE** — Champion Logistic Regression Model Fails Minimum Accuracy Threshold
    - **RISK AREA** — Model Performance
    - **DOCUMENTATION SECTION** — 3.2. Model Evaluation
    - **DESCRIPTION** — The logistic regression champion model was subjected to a Minimum Accuracy test to determine whether its predictive accuracy meets the predefined performance threshold of 0.7. The model achieved an accuracy score of 0.6136, which falls below the required minimum. As a result, the test produced a Fail outcome.

8. Click **Save**.

9. Select the finding you just added to link to your validation report and click **Update Linked Findings** to insert your finding.

10. Click on the finding to expand the finding, where you can adjust details such as severity, owner, due date, status, etc. as well as include proposed remediation plans or supporting documentation as attachments.
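
The pass/fail logic behind the example finding above can be expressed in a few lines. The accuracy (`0.6136`) and threshold (`0.7`) come from the example description. The helper function name and the greater-than-or-equal comparison are assumptions made for illustration, not ValidMind's implementation.

```python
def minimum_accuracy_outcome(accuracy: float, min_threshold: float) -> str:
    """Return "Pass" when accuracy meets the threshold, otherwise "Fail" (illustrative helper)."""
    return "Pass" if accuracy >= min_threshold else "Fail"


# Values taken from the example finding description
outcome = minimum_accuracy_outcome(accuracy=0.6136, min_threshold=0.7)
print(outcome)
```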

<a id='toc6_1_3_'></a>

#### Evaluate performance of challenger model

We've now conducted tests similar to those run by the model development team for our champion model, with the aim of verifying their test results.

Next, let's see how our challenger model compares. We'll use the same batch of tests here as we did in `mpt`, but append a different `result_id` to indicate that these results should be associated with our challenger model:

In [None]:
mpt_chall = [
    "validmind.model_validation.sklearn.ClassifierPerformance:champion_vs_challenger",
    "validmind.model_validation.sklearn.ConfusionMatrix:champion_vs_challenger",
    "validmind.model_validation.sklearn.MinimumAccuracy:champion_vs_challenger",
    "validmind.model_validation.sklearn.MinimumF1Score:champion_vs_challenger",
    "validmind.model_validation.sklearn.ROCCurve:champion_vs_challenger"
]

We'll run each test once for each model with the same `vm_test_ds` dataset to compare them:

In [None]:
for test in mpt_chall:
    vm.tests.run_test(
        test,
        input_grid={
            "dataset": [vm_test_ds], "model" : [vm_log_model,vm_rf_model]
        }
    ).log()
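
Conceptually, an input grid describes every combination of the listed inputs. The sketch below expands a grid with `itertools.product`, using the `input_id` strings from this notebook as stand-ins for the actual ValidMind objects. How ValidMind expands the grid internally is not shown here, so treat this as a mental model only.

```python
from itertools import product

# Strings stand in for the ValidMind dataset and model objects
input_grid = {
    "dataset": ["test_dataset_final"],
    "model": ["log_model_champion", "rf_model"],
}

# Build one inputs dictionary per combination of values
keys = list(input_grid)
input_combinations = [
    dict(zip(keys, values)) for values in product(*input_grid.values())
]

for inputs in input_combinations:
    print(inputs)
```

One dataset and two models yield two combinations, which matches running each test once per model on the same testing dataset.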

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Based on the performance metrics, our challenger random forest classification model passes the <code>MinimumAccuracy</code> where our champion did not.</b></span>
<br></br>
In your validation report, support your recommendation in your finding's <b>Proposed Remediation Plan</b> to investigate the usage of our challenger model by inserting the performance tests we logged with this notebook into the appropriate section.</div>

<a id='toc6_2_'></a>

### Run diagnostic tests

Next, let's compare the robustness and stability of our champion and challenger models.

Use `list_tests()` to list all available diagnosis tests applicable to classification tasks:

In [None]:
vm.tests.list_tests(tags=["model_diagnosis"], task="classification")

Let’s now assess the models for potential signs of *overfitting* and identify any sub-segments where performance may be inconsistent with the [`OverfitDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/OverfitDiagnosis.html).

Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but also noise and random fluctuations. The result is excellent performance on the training dataset but poor generalization to new, unseen data:

- Since the training dataset (`vm_train_ds`) was used to fit the model, we use this set to establish a baseline performance for how well the model performs on data it has already seen.
- The testing dataset (`vm_test_ds`) was never seen during training, and here simulates real-world generalization, or how well the model performs on new, unseen data. 

In [None]:
vm.tests.run_test(
    test_id="validmind.model_validation.sklearn.OverfitDiagnosis:champion_vs_challenger",
    input_grid={
        "datasets": [[vm_train_ds,vm_test_ds]],
        "model" : [vm_log_model,vm_rf_model]
    }
).log()
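
As a rough illustration of the idea behind an overfitting check, the sketch below compares training and testing accuracy for a deliberately unconstrained decision tree on noisy synthetic data. It only computes one overall gap and is not the `OverfitDiagnosis` implementation, which also looks at sub-segments of the data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% label noise to encourage overfitting
X, y = make_classification(n_samples=400, n_features=8, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A fully grown tree can memorize its training data
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_accuracy = accuracy_score(y_train, model.predict(X_train))
test_accuracy = accuracy_score(y_test, model.predict(X_test))
gap = train_accuracy - test_accuracy

print(f"train={train_accuracy:.3f} test={test_accuracy:.3f} gap={gap:.3f}")
```

A large positive gap between the two scores is the typical signature of a model that has memorized noise rather than learned a pattern that generalizes.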

Let's also conduct *robustness* and *stability* testing of the two models with the [`RobustnessDiagnosis` test](https://docs.validmind.ai/tests/model_validation/sklearn/RobustnessDiagnosis.html). Robustness refers to a model's ability to maintain consistent performance when its inputs are perturbed, and stability refers to a model's ability to produce consistent outputs over time and across different data subsets.

Again, we'll use both the training and testing datasets to establish baseline performance and to simulate real-world generalization:

In [None]:
vm.tests.run_test(
    test_id="validmind.model_validation.sklearn.RobustnessDiagnosis:champion_vs_challenger",
    input_grid={
        "datasets": [[vm_train_ds,vm_test_ds]],
        "model" : [vm_log_model,vm_rf_model]
    },
).log()
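
One simple way to think about robustness is to add increasing amounts of random noise to the inputs and watch how accuracy changes. The sketch below does this with Gaussian noise scaled by each feature's standard deviation on synthetic data. The noise scales are arbitrary choices, and this is an illustration of the general idea rather than the `RobustnessDiagnosis` implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic data and a simple model to perturb
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

rng = np.random.default_rng(0)
feature_std = X.std(axis=0)

# Measure accuracy at increasing noise levels (0.0 means no perturbation)
accuracy_by_noise = {}
for scale in (0.0, 0.5, 1.0):
    noisy_X = X + rng.normal(0.0, 1.0, size=X.shape) * feature_std * scale
    accuracy_by_noise[scale] = accuracy_score(y, model.predict(noisy_X))

for scale, accuracy in accuracy_by_noise.items():
    print(f"noise scale {scale}: accuracy {accuracy:.3f}")
```

A model whose accuracy degrades slowly as the noise grows is generally considered more robust than one whose accuracy drops sharply.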

<a id='toc6_3_'></a>

### Run feature importance tests

We also want to verify the relative influence of different input features on our models' predictions, as well as inspect the differences between our champion and challenger models to see if one of them offers more understandable or logical importance scores for features.

Use `list_tests()` to identify all the feature importance tests for classification:

In [None]:
# Store the feature importance tests
FI = vm.tests.list_tests(tags=["feature_importance"], task="classification", pretty=False)
FI

We'll only use our testing dataset (`vm_test_ds`) here, to provide a realistic, unseen sample that mimics future or production data, as the training dataset has already influenced our model during learning:

In [None]:
# Run and log our feature importance tests for both models for the testing dataset
for test in FI:
    vm.tests.run_test(
        f"{test}:champion_vs_challenger",
        input_grid={
            "dataset": [vm_test_ds], "model" : [vm_log_model,vm_rf_model]
        },
    ).log()
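
Permutation importance is one common, model-agnostic way to score features: shuffle one feature at a time on held-out data and measure how much performance drops. The sketch below uses scikit-learn's `permutation_importance` on synthetic data. It is offered as background on the technique and does not claim to reproduce what each listed ValidMind test computes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data where only two of the five features are informative
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Shuffle each feature on the held-out set and record the mean drop in score
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)

# Feature indices ordered from most to least important
ranking = result.importances_mean.argsort()[::-1]
print(ranking, result.importances_mean.round(3))
```

Because the scores are computed on held-out data, they reflect which features the model relies on when generalizing, which is the same reason the notebook uses only the testing dataset here.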

<a id='toc7_'></a>

## In summary

In this third notebook, you learned how to:

- [x] Initialize ValidMind model objects
- [x] Assign predictions and probabilities to your ValidMind model objects
- [x] Use tests from ValidMind to evaluate the potential of models, including comparative tests between champion and challenger models
- [x] Log a model finding in the ValidMind Platform

<a id='toc8_'></a>

## Next steps

<a id='toc8_1_'></a>

### Finalize validation and reporting

Now that you're familiar with the basics of using the ValidMind Library to run and log validation tests, let's learn how to implement some custom tests and wrap up our validation: **[4 — Finalize validation and reporting](4-finalize_validation_reporting.ipynb)**