# ValidMind for model validation — 114 Finalize testing and reporting

Learn how to use ValidMind for your end-to-end model validation process with our series of four introductory notebooks. In this last notebook, you'll configure and run some custom tests, then add logged test results to your validation report.

As we concluded in [113 Perform validation tests](113-perform_validation_tests.ipynb), our challenger random forest classification model was not a viable candidate for our use case and was eliminated as a contender. We'll finish up by comparing our champion application scorecard model against our remaining challenger logistic regression model, then use the ValidMind Platform to draft our validation report supplemented by our logged test results as evidence.

::: {.content-hidden when-format="html"}
## Contents    
- [Prerequisites](#toc1_)    
- [Setting up](#toc2_)    
  - [Initialize the ValidMind Library](#toc2_1_)    
  - [Import the champion model](#toc2_2_)    
  - [Train the challenger model](#toc2_3_)    
  - [Extract predicted probabilities](#toc2_4_)    
  - [Initialize the ValidMind objects](#toc2_5_)    
  - [Assign predictions](#toc2_6_)    
  - [Enable use case context](#toc2_7_)    
- [Adjust a ValidMind test](#toc3_)    
- [Run diagnostic tests](#toc4_)    
- [Run feature importance tests](#toc5_)    
- [Implement a custom test](#toc6_)    
- [Verify test runs](#toc7_)    
- [Add test results to reporting](#toc8_)    
- [In summary](#toc9_)    
- [Next steps](#toc10_)    
  - [Work with your validation report](#toc10_1_)    
  - [Learn more](#toc10_2_)    
    - [More how-to guides and code samples](#toc10_2_1_)    
    - [Discover more learning resources](#toc10_2_2_)    

:::

<a id='toc1_'></a>

## Prerequisites

In order to finalize the validation testing and reporting for your sample model, you'll need to first have:

- [ ] Registered a model within the ValidMind Platform and granted yourself access to the model as a validator
- [ ] Installed the ValidMind Library in your local environment, allowing you to access all its features
- [ ] Learned how to import and initialize datasets for use with ValidMind
- [ ] Learned how to enable custom context for test descriptions generated by ValidMind
- [ ] Understood the basics of how to identify and run validation tests
- [ ] Run data quality and model performance tests for your champion and challenger models, and logged the results of those tests to the ValidMind Platform


<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Need help with the above steps?</b></span>
<br></br>
Refer to the first three notebooks in this series:

<ol>
    <li><a href="111-import_champion_model.ipynb" style="color: #DE257E;"><b>111 Import the champion model</b></a></li>
    <li><a href="112-develop_challenger_models.ipynb" style="color: #DE257E;"><b>112 Develop potential challenger models</b></a></li>
    <li><a href="113-perform_validation_tests.ipynb" style="color: #DE257E;"><b>113 Perform validation tests</b></a></li>
</ol>

</div>

<a id='toc2_'></a>

## Setting up

This section should be very familiar by now, as we performed the same actions in the previous two notebooks in this series.

<a id='toc2_1_'></a>

### Initialize the ValidMind Library

As usual, let's first connect the ValidMind Library to the model we previously registered in the ValidMind Platform:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Inventory** and select the model you registered for this "ValidMind for model validation" series of notebooks.

3. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:

In [None]:
# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)

<a id='toc2_2_'></a>

### Import the champion model

Next, we'll import the champion model submitted by the model development team, the same one we used in the previous notebooks (**[xgb_model_champion.pkl](xgb_model_champion.pkl)**), and load in the same sample [Lending Club](https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data) dataset:

In [None]:
import xgboost as xgb

# Load the saved model
xgb_model = xgb.XGBClassifier()
xgb_model.load_model("xgb_model_champion.pkl")
xgb_model

# Ensure the dataset features follow the same order as the champion model's feature names
cols_when_model_builds = xgb_model.get_booster().feature_names

In [None]:
# Import the Lending Club dataset from Kaggle
from validmind.datasets.credit_risk import lending_club

df = lending_club.load_data(source="offline")
df.head()

# Preprocess the dataset for data quality testing purposes
preprocess_df = lending_club.preprocess(df)

# Apply feature engineering to the dataset
fe_df = lending_club.feature_engineering(preprocess_df)
fe_df.head()

In [None]:
# Split our dataset into train and test to start the validation testing process
train_df, test_df = lending_club.split(fe_df, test_size=0.2)

x_train = train_df.drop(lending_club.target_column, axis=1)
y_train = train_df[lending_club.target_column]

x_test = test_df.drop(lending_club.target_column, axis=1)
y_test = test_df[lending_club.target_column]

# Now let's apply the order of features from the champion model construction
x_train = x_train[cols_when_model_builds]
x_test = x_test[cols_when_model_builds]

In [None]:
cols_use = ['annual_inc_woe',
 'verification_status_woe',
 'emp_length_woe',
 'installment_woe',
 'term_woe',
 'home_ownership_woe',
 'purpose_woe',
 'open_acc_woe',
 'total_acc_woe',
 'int_rate_woe',
 'sub_grade_woe',
 'grade_woe','loan_status']


train_df = train_df[cols_use]
test_df = test_df[cols_use]
test_df.head()

<a id='toc2_3_'></a>

### Train the challenger model

As we eliminated the random forest classification model as a challenger, we'll only train our logistic regression model here:

In [None]:
# Import the Logistic Regression model
from sklearn.linear_model import LogisticRegression

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
from sklearn.model_selection import GridSearchCV

grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(x_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
log_reg

<a id='toc2_4_'></a>

### Extract predicted probabilities

With our challenger model trained, let's extract the predicted probabilities from our two models and convert them into binary predictions:

In [None]:
# Champion — Application scorecard model
train_xgb_prob = xgb_model.predict_proba(x_train)[:, 1]
test_xgb_prob = xgb_model.predict_proba(x_test)[:, 1]

# Challenger — Logistic regression model
train_log_prob = log_reg.predict_proba(x_train)[:, 1]
test_log_prob = log_reg.predict_proba(x_test)[:, 1]

In [None]:
# Classify as positive (1) when the predicted probability exceeds 0.3
cut_off_threshold = 0.3

# Champion — Application scorecard model
train_xgb_binary_predictions = (train_xgb_prob > cut_off_threshold).astype(int)
test_xgb_binary_predictions = (test_xgb_prob > cut_off_threshold).astype(int)

# Challenger — Logistic regression model
train_log_binary_predictions = (train_log_prob > cut_off_threshold).astype(int)
test_log_binary_predictions = (test_log_prob > cut_off_threshold).astype(int)

<a id='toc2_5_'></a>

### Initialize the ValidMind objects

Let's initialize the ValidMind `Dataset` and `Model` objects in preparation for assigning model predictions to each dataset:

In [None]:
# Initialize the raw dataset
vm_raw_dataset = vm.init_dataset(
    dataset=df,
    input_id="raw_dataset",
    target_column=lending_club.target_column,
)

# Initialize the preprocessed dataset
vm_preprocess_dataset = vm.init_dataset(
    dataset=preprocess_df,
    input_id="preprocess_dataset",
    target_column=lending_club.target_column,
)

# Initialize the feature engineered dataset
vm_fe_dataset = vm.init_dataset(
    dataset=fe_df,
    input_id="fe_dataset",
    target_column=lending_club.target_column,
)

# Initialize the training dataset
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    input_id="train_dataset",
    target_column=lending_club.target_column,
)

# Initialize the test dataset
vm_test_ds = vm.init_dataset(
    dataset=test_df,
    input_id="test_dataset",
    target_column=lending_club.target_column,
)

In [None]:
# Initialize the champion application scorecard model
vm_xgb_model = vm.init_model(
    xgb_model,
    input_id="xgb_model_developer_champion",
)

# Initialize the challenger logistic regression model
vm_log_model = vm.init_model(
    log_reg,
    input_id="log_model",
)

<a id='toc2_6_'></a>

### Assign predictions

With our ValidMind objects initialized, we'll move on to assigning both the predicted probabilities coming directly from each model and the binary predictions obtained after applying the cut-off threshold:

In [None]:
# Champion — Application scorecard model
vm_train_ds.assign_predictions(
    model=vm_xgb_model,
    prediction_values=train_xgb_binary_predictions,
    prediction_probabilities=train_xgb_prob,
)

vm_test_ds.assign_predictions(
    model=vm_xgb_model,
    prediction_values=test_xgb_binary_predictions,
    prediction_probabilities=test_xgb_prob,
)

# Challenger — Logistic regression model
vm_train_ds.assign_predictions(
    model=vm_log_model,
    prediction_values=train_log_binary_predictions,
    prediction_probabilities=train_log_prob,
)

vm_test_ds.assign_predictions(
    model=vm_log_model,
    prediction_values=test_log_binary_predictions,
    prediction_probabilities=test_log_prob,
)

In [None]:
# Compute the scores
train_xgb_scores = lending_club.compute_scores(train_xgb_prob)
test_xgb_scores = lending_club.compute_scores(test_xgb_prob)
train_log_scores = lending_club.compute_scores(train_log_prob)
test_log_scores = lending_club.compute_scores(test_log_prob)

# Assign scores to the datasets
vm_train_ds.add_extra_column("xgb_scores", train_xgb_scores)
vm_test_ds.add_extra_column("xgb_scores", test_xgb_scores)
vm_train_ds.add_extra_column("log_scores", train_log_scores)
vm_test_ds.add_extra_column("log_scores", test_log_scores)
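
The scores above come from `lending_club.compute_scores()`, whose internals are not shown in this notebook. For intuition only, scorecards often map a default probability to a score with *points-to-double-odds* scaling. The sketch below illustrates that general technique; the function name and all three scaling parameters are hypothetical and are not claimed to match what `compute_scores()` does:

```python
import math

# Hypothetical scaling parameters, chosen only for illustration
TARGET_SCORE = 600  # score assigned at the target odds
TARGET_ODDS = 50    # good:bad odds at the target score
PDO = 20            # points needed to double the odds


def probability_to_score(p_default: float) -> float:
    """Map a default probability to a score using points-to-double-odds scaling."""
    if not 0.0 < p_default < 1.0:
        raise ValueError("p_default must be strictly between 0 and 1")
    factor = PDO / math.log(2)
    offset = TARGET_SCORE - factor * math.log(TARGET_ODDS)
    good_bad_odds = (1.0 - p_default) / p_default
    return offset + factor * math.log(good_bad_odds)


for p in (0.02, 0.10, 0.30):
    print(f"p(default)={p:.2f} -> score={probability_to_score(p):.1f}")
```

Under this scaling, a lower default probability always yields a higher score, and every `PDO` points correspond to a doubling of the good:bad odds.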

<a id='toc2_7_'></a>

### Enable use case context

We'll also adjust the use case context to focus on comparison between our models for tests going forward:

In [None]:
import os
os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "1"

context = """
FORMAT FOR THE LLM DESCRIPTIONS: 
    **<Test Name>** is designed to <begin with a concise overview of what the test does and its primary purpose, 
    extracted from the test description>.

    The test operates by <write a paragraph about the test mechanism, explaining how it works and what it measures. 
    Include any relevant formulas or methodologies mentioned in the test description.>

    The primary advantages of this test include <write a paragraph about the test's strengths and capabilities, 
    highlighting what makes it particularly useful for specific scenarios.>

    Users should be aware that <write a paragraph about the test's limitations and potential risks. 
    Include both technical limitations and interpretation challenges. 
    If the test description includes specific signs of high risk, incorporate these here.>

    **Key Insights:**

    The test results reveal:

    - **<insight title>**: <comprehensive description of one aspect of the results>
    - **<insight title>**: <comprehensive description of another aspect>
    ...

    Based on these results, <conclude with a brief paragraph that ties together the test results with the test's 
    purpose and provides any final recommendations or considerations.>

ADDITIONAL INSTRUCTIONS:

    The champion model as the basis for comparison is called "xgb_model_developer_champion" and emphasis should be on the following:
    - The metrics for the champion model compared against the challenger models
    - Which model potentially outperforms the champion model based on the metrics, this should be highlighted and emphasized


    For each metric in the test results, include in the test overview:
    - The metric's purpose and what it measures
    - Its mathematical formula
    - The range of possible values
    - What constitutes good/bad performance
    - How to interpret different values

    Each insight should progressively cover:
    1. Overall scope and distribution
    2. Complete breakdown of all elements with specific values
    3. Natural groupings and patterns
    4. Comparative analysis between datasets/categories
    5. Stability and variations
    6. Notable relationships or dependencies

    Remember:
    - Champion model (xgb_model_developer_champion) is the selection and challenger models are used to challenge the selection
    - Keep all insights at the same level (no sub-bullets or nested structures)
    - Make each insight complete and self-contained
    - Include specific numerical values and ranges
    - Cover all elements in the results comprehensively
    - Maintain clear, concise language
    - Use only "- **Title**: Description" format for insights
    - Progress naturally from general to specific observations

""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context

<a id='toc3_'></a>

## Adjust a ValidMind test

Let's dig deeper into the [`MinimumF1Score` test](https://docs.validmind.ai/tests/model_validation/sklearn/MinimumF1Score.html) we ran previously in [113 Perform validation tests](113-perform_validation_tests.ipynb) to ensure that the models maintain a minimum acceptable balance between *precision* and *recall*. Precision is the share of the model's positive predictions that were actually correct, and recall is the share of actual positive cases that the model correctly identified.
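
To make these two definitions concrete, here is a small worked example in plain Python. The labels are made up, the helper `precision_recall_f1` is our own illustration rather than a ValidMind function, and the `min_threshold` value is a hypothetical floor:

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up labels: 4 actual positives, 5 predicted positives, 3 of them correct
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# A minimum-score check passes only when F1 reaches the chosen floor
min_threshold = 0.5  # hypothetical floor
print("passed" if f1 >= min_threshold else "failed")
```

Because F1 is the harmonic mean of precision and recall, it stays low whenever either of the two is low.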

Use [`run_test()`](https://docs.validmind.ai/validmind/validmind/tests.html#run_test) with our testing dataset (`vm_test_ds`) to run the test in isolation again for our two remaining models. We won't log this result; it simply gives us a baseline to compare against the next iteration:

In [None]:
vm.tests.run_test(
    "validmind.model_validation.sklearn.MinimumF1Score:xgboost_champion_vs_challengers",
    input_grid={
        "dataset": [vm_test_ds],
        "model": [vm_xgb_model, vm_log_model]
    },
)

As `MinimumF1Score` allows us to customize parameters and thresholds for performance standards, let's adjust the minimum threshold to see how it changes the test outcome:

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.MinimumF1Score:AdjThreshold",
    input_grid={
        "dataset": [vm_test_ds],
        "model": [vm_xgb_model, vm_log_model],
    },
    # Test parameters are passed separately from the inputs being compared
    params={"min_threshold": 0.35},
)
result.log()

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Note the output returned indicating that a test-driven block doesn't currently exist in your model's documentation for some test IDs. </b></span>
<br></br>
That's expected: when we run validation tests, the logged results need to be manually added to your report as part of your compliance assessment process within the ValidMind Platform.</div>
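
The `input_grid` argument in the `run_test()` calls above lists several candidate values per input. A rough mental model, assuming the test is run once for every combination of the listed values, is a Cartesian product. The helper `expand_grid` below is our own illustration and not part of the ValidMind Library, and plain strings stand in for the dataset and model objects:

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed values (a Cartesian product)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Strings stand in for the ValidMind dataset and model objects
grid = {
    "dataset": ["test_dataset"],
    "model": ["xgb_model_developer_champion", "log_model"],
}

for combination in expand_grid(grid):
    print(combination)
```

With one dataset and two models, this yields two combinations, which matches the two sets of results we expect when comparing the champion against the challenger.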

<a id='toc4_'></a>

## Run diagnostic tests

Next, let's compare the robustness and stability of our champion and challenger models.

Use [`list_tests()`](https://docs.validmind.ai/validmind/validmind/tests.html#list_tests) to identify all the model diagnosis tests for classification:

In [None]:
vm.tests.list_tests(tags=["model_diagnosis"], task="classification")

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Want to learn more about navigating ValidMind tests?</b></span>
<br></br>
Refer to our notebook outlining the utilities available for viewing and understanding available ValidMind tests: <a href="https://docs.validmind.ai/notebooks/how_to/explore_tests.html" style="color: #DE257E;"><b>Explore tests</b></a></div>

Let's see whether the models show signs of *overfitting*, and whether any sub-segments of the data are particularly affected. Overfitting occurs when a model learns the training data too well, capturing not only the true pattern but also noise and random fluctuations, which results in excellent performance on the training dataset but poor generalization to new, unseen data.

We'll select the following two tests:

- **[`TrainingTestDegradation`](https://docs.validmind.ai/tests/model_validation/sklearn/TrainingTestDegradation.html):** Evaluates how much the models' performance degrades when moving from the training to the test datasets.
- **[`OverfitDiagnosis`](https://docs.validmind.ai/tests/model_validation/sklearn/OverfitDiagnosis.html):** Dives into detecting overfitting beyond performance degradation by examining learning curves, model complexity vs. generalization, and cross-validation results.
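
The idea of performance degradation can be sketched with a simple ratio. This is an illustration of the concept under our own assumptions, not the library's implementation: the model names, scores, and the `MAX_DEGRADATION` tolerance are all hypothetical:

```python
def relative_degradation(train_score: float, test_score: float) -> float:
    """Fraction of the training score lost on the test set (negative if the test score is higher)."""
    if train_score == 0:
        raise ValueError("train_score must be non-zero")
    return (train_score - test_score) / train_score


# Hypothetical scores and tolerance, for illustration only
MAX_DEGRADATION = 0.10
scores = {
    "model_a": {"train": 0.95, "test": 0.70},
    "model_b": {"train": 0.74, "test": 0.72},
}

for name, s in scores.items():
    drop = relative_degradation(s["train"], s["test"])
    flag = "possible overfit" if drop > MAX_DEGRADATION else "ok"
    print(f"{name}: degradation={drop:.1%} -> {flag}")
```

A model with a higher training score is not necessarily the better one: a large gap between training and test performance suggests the model has memorized the training data.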

In [None]:
# Store our desired 2 tests
overfit_testing = [
    "validmind.model_validation.sklearn.TrainingTestDegradation:Champion_vs_LogRegression",
    "validmind.model_validation.sklearn.OverfitDiagnosis:Champion_vs_LogRegression"
]

In [None]:
# Run and log our 2 tests for both models for the training and testing datasets
for test in overfit_testing:
    vm.tests.run_test(
        test,
        input_grid={
            "datasets": [[vm_train_ds,vm_test_ds]],
            "model" : [vm_xgb_model,vm_log_model]
        }
    ).log()

Let's also conduct *robustness* and *stability* testing of the two models. Robustness refers to a model's ability to maintain consistent performance when its inputs change, and stability refers to a model's ability to produce consistent outputs over time and across different data subsets.

We'll select the following two tests:

- **[`RobustnessDiagnosis`](https://docs.validmind.ai/tests/model_validation/sklearn/RobustnessDiagnosis.html):** Evaluates whether the models are resilient to small *perturbations* (small, controlled modifications) or *variations* (broader subgroup-level or distributional changes) in input data.
- **[`WeakspotsDiagnosis`](https://docs.validmind.ai/tests/model_validation/sklearn/WeakspotsDiagnosis.html):** Identifies specific conditions where the models consistently underperform.
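
To build intuition for what a perturbation check looks like, here is a toy sketch that adds Gaussian noise to a single feature and watches accuracy decay. It assumes noise scaled by the feature's own standard deviation; the data, the fixed decision rule, and the helper names are all hypothetical, and this is not how `RobustnessDiagnosis` is implemented:

```python
import random
import statistics


def accuracy(model, xs, ys):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def perturb(xs, scale, rng):
    """Add zero-mean Gaussian noise whose standard deviation is `scale` times the feature's own."""
    sigma = scale * statistics.pstdev(xs)
    return [x + rng.gauss(0.0, sigma) for x in xs]


def model(x):
    """Toy decision rule standing in for a trained classifier."""
    return int(x > 0.5)


# Toy data: one numeric feature whose label follows the same rule as the model
data_rng = random.Random(0)
xs = [data_rng.uniform(0.0, 1.0) for _ in range(500)]
ys = [int(x > 0.5) for x in xs]

for scale in (0.0, 0.1, 0.3, 0.5):
    noisy = perturb(xs, scale, random.Random(1))
    print(f"noise scale={scale:.1f} -> accuracy={accuracy(model, noisy, ys):.3f}")
```

A robust model loses accuracy gradually as the noise grows, while a fragile one drops sharply even at small noise scales.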

In [None]:
# Store our desired 2 tests
stab_robust = [
    "validmind.model_validation.sklearn.RobustnessDiagnosis:Champion_vs_LogRegression",
    "validmind.model_validation.sklearn.WeakspotsDiagnosis:Champion_vs_LogRegression"
    ]

In [None]:
# Run and log our 2 tests for both models for the training and testing datasets
for test in stab_robust:
    vm.tests.run_test(
        test,
        input_grid={
            "datasets": [[vm_train_ds,vm_test_ds]],
            "model" : [vm_xgb_model,vm_log_model]
        },
    ).log()

<a id='toc5_'></a>

## Run feature importance tests

We want to verify the relative influence of different input features on our models' predictions, as well as inspect the differences between our champion and challenger models to see if one of them offers more understandable or logical importance scores for features.
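
One common way to measure a feature's influence is *permutation importance*: shuffle one feature at a time and record how much the model's accuracy drops. The sketch below is a minimal pure-Python illustration with toy data and a hypothetical decision rule; the ValidMind tests we run next may rely on this or on other techniques:

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Accuracy lost when each feature column is shuffled on its own."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only feature j replaced by its shuffled value
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops


def model(row):
    """Toy decision rule that only looks at feature 0."""
    return int(row[0] > 0.5)


# Toy data: the label depends only on feature 0, feature 1 is pure noise
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(400)]
labels = [int(r[0] > 0.5) for r in rows]

for j, drop in enumerate(permutation_importance(model, rows, labels, n_features=2)):
    print(f"feature {j}: accuracy drop = {drop:.3f}")
```

Shuffling the feature the model relies on hurts accuracy, while shuffling the ignored feature changes nothing, which is the kind of pattern we can check against domain expectations.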

Use `list_tests()` to identify all the feature importance tests for classification:

In [None]:
# Store the feature importance tests
FI = vm.tests.list_tests(tags=["feature_importance"], task="classification",pretty=False)
FI

In [None]:
# Run and log our feature importance tests for both models for the testing dataset
for test in FI:
    vm.tests.run_test(
        f"{test}:Champion_vs_LogisticRegression",
        input_grid={
            "dataset": [vm_test_ds],
            "model": [vm_xgb_model, vm_log_model],
        },
    ).log()

<a id='toc6_'></a>

## Implement a custom test

Let's finish up testing by implementing a custom *inline test* that outputs a FICO-style score. An inline test is a test written and executed within the same environment as the code being tested (in this case, right in this Jupyter Notebook), without requiring a separate test file or framework.

The [`@vm.test` decorator](https://docs.validmind.ai/validmind/validmind.html#test) allows you to create a reusable test:

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go

@vm.test("my_custom_tests.ScoreToOdds")
def score_to_odds_analysis(dataset, score_column='score', score_bands=[410, 440, 470]):
    """
    Analyzes the relationship between score bands and odds (good:bad ratio).
    Good odds = (1 - default_rate) / default_rate
    
    Higher scores should correspond to higher odds of being good.

    If multiple score columns are provided through score_column, each column holds the scores from a different model.

    In that case, focus the assessment on the differences between the scores and indicate through evidence which model is preferred.
    """
    # Work on a copy so the added score_band column never touches the dataset's own DataFrame
    df = dataset.df.copy()
    
    # Create score bands
    df['score_band'] = pd.cut(
        df[score_column],
        bins=[-np.inf] + score_bands + [np.inf],
        labels=[f'<{score_bands[0]}'] + 
               [f'{score_bands[i]}-{score_bands[i+1]}' for i in range(len(score_bands)-1)] +
               [f'>{score_bands[-1]}']
    )
    
    # Calculate metrics per band
    results = df.groupby('score_band').agg({
        dataset.target_column: ['mean', 'count']
    })
    
    results.columns = ['Default Rate', 'Total']
    results['Good Count'] = results['Total'] - (results['Default Rate'] * results['Total'])
    results['Bad Count'] = results['Default Rate'] * results['Total']
    results['Odds'] = results['Good Count'] / results['Bad Count']
    
    # Create visualization
    fig = go.Figure()
    
    # Add odds bars
    fig.add_trace(go.Bar(
        name='Odds (Good:Bad)',
        x=results.index,
        y=results['Odds'],
        marker_color='blue'
    ))
    
    fig.update_layout(
        title='Score-to-Odds Analysis',
        yaxis=dict(title='Odds Ratio (Good:Bad)'),
        showlegend=False
    )
    
    return fig

With the custom test available, run and log the test for our champion and challenger models with our testing dataset (`vm_test_ds`):

In [None]:
result = vm.tests.run_test(
    "my_custom_tests.ScoreToOdds:Champion_vs_Challenger",
    inputs={
        "dataset": vm_test_ds,
    },
    param_grid={
        "score_column": ["xgb_scores","log_scores"],
        "score_bands": [[500, 540, 570]],
    },
).log()

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Want to learn more about custom tests?</b></span>
<br></br>
Refer to our in-depth introduction to custom tests: <a href="https://docs.validmind.ai/notebooks/code_samples/custom_tests/implement_custom_tests.html" style="color: #DE257E;"><b>Implement custom tests</b></a></div>
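
The odds calculation inside `score_to_odds_analysis` can also be traced by hand. The sketch below uses made-up scores and outcomes with the same band edges as the run above, and bins scores into right-closed bands, which is the default behavior of `pandas.cut`. The helper `odds_by_band` is our own illustration:

```python
from bisect import bisect_left
from collections import Counter


def odds_by_band(scores, defaults, band_edges):
    """Good:bad odds per score band, where band i covers (edge[i-1], edge[i]]."""
    totals, bads = Counter(), Counter()
    for score, is_default in zip(scores, defaults):
        band = bisect_left(band_edges, score)  # right-closed bands
        totals[band] += 1
        bads[band] += is_default
    odds = {}
    for band in sorted(totals):
        goods = totals[band] - bads[band]
        # Odds are unbounded when a band contains no defaults
        odds[band] = goods / bads[band] if bads[band] else float("inf")
    return odds


# Made-up scores and outcomes (1 = default)
scores = [480, 495, 510, 520, 530, 545, 550, 560, 565, 580, 590, 600, 610, 620]
defaults = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]

for band, odds in odds_by_band(scores, defaults, band_edges=[500, 540, 570]).items():
    print(f"band {band}: odds (good:bad) = {odds:.1f}")
```

In a well-behaved scorecard, the odds rise from the lowest band to the highest, as they do in this toy data.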

<a id='toc7_'></a>

## Verify test runs

Our final task is to verify that all the tests provided by the model development team were run and reported accurately. Note the appended `result_ids` to delineate which dataset we ran the test with for the relevant tests.

Here, we'll specify all the tests we'd like to independently rerun in a dictionary called `test_config`:

In [None]:
test_config = {
    # Run with the raw dataset
    'validmind.data_validation.DatasetDescription:raw_data': {
        'inputs': {'dataset': 'raw_dataset'}
    },
    'validmind.data_validation.DescriptiveStatistics:raw_data': {
        'inputs': {'dataset': 'raw_dataset'}
    },
    'validmind.data_validation.MissingValues:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'min_threshold': 1}
    },
    'validmind.data_validation.ClassImbalance:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'min_percent_threshold': 10}
    },
    'validmind.data_validation.Duplicates:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'min_threshold': 1}
    },
    'validmind.data_validation.HighCardinality:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {
            'num_threshold': 100,
            'percent_threshold': 0.1,
            'threshold_type': 'percent'
        }
    },
    'validmind.data_validation.Skewness:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'max_threshold': 1}
    },
    'validmind.data_validation.UniqueRows:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'min_percent_threshold': 1}
    },
    'validmind.data_validation.TooManyZeroValues:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'max_percent_threshold': 0.03}
    },
    'validmind.data_validation.IQROutliersTable:raw_data': {
        'inputs': {'dataset': 'raw_dataset'},
        'params': {'threshold': 5}
    },
    # Run with the preprocessed dataset
    'validmind.data_validation.DescriptiveStatistics:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'}
    },
    'validmind.data_validation.TabularDescriptionTables:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'}
    },
    'validmind.data_validation.MissingValues:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'},
        'params': {'min_threshold': 1}
    },
    'validmind.data_validation.TabularNumericalHistograms:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'}
    },
    'validmind.data_validation.TabularCategoricalBarPlots:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'}
    },
    'validmind.data_validation.TargetRateBarPlots:preprocessed_data': {
        'inputs': {'dataset': 'preprocess_dataset'},
        'params': {'default_column': 'loan_status'}
    },
    # Run with the training and test datasets
    'validmind.data_validation.DescriptiveStatistics:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']}
    },
    'validmind.data_validation.TabularDescriptionTables:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']}
    },
    'validmind.data_validation.ClassImbalance:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']},
        'params': {'min_percent_threshold': 10}
    },
    'validmind.data_validation.UniqueRows:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']},
        'params': {'min_percent_threshold': 1}
    },
    'validmind.data_validation.TabularNumericalHistograms:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']}
    },
    'validmind.data_validation.MutualInformation:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']},
        'params': {'min_threshold': 0.01}
    },
    'validmind.data_validation.PearsonCorrelationMatrix:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']}
    },
    'validmind.data_validation.HighPearsonCorrelation:development_data': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset']},
        'params': {'max_threshold': 0.3, 'top_n_correlations': 10}
    },
    # Model input IDs must match the `input_id` values passed to `vm.init_model()` earlier
    'validmind.model_validation.ModelMetadata': {
        'input_grid': {'model': ['xgb_model_developer_champion', 'log_model']}
    },
    'validmind.model_validation.sklearn.ModelParameters': {
        'input_grid': {'model': ['xgb_model_developer_champion', 'log_model']}
    },
    'validmind.model_validation.sklearn.ROCCurve': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset'], 'model': ['xgb_model_developer_champion']}
    },
    'validmind.model_validation.sklearn.MinimumROCAUCScore': {
        'input_grid': {'dataset': ['train_dataset', 'test_dataset'], 'model': ['xgb_model_developer_champion']},
        'params': {'min_threshold': 0.5}
    }
}

Then batch run and log our tests in `test_config`:

In [None]:
for test_id, config in test_config.items():
    print(test_id)
    try:
        # Each entry's keys (`inputs` or `input_grid`, plus optional `params`)
        # map directly onto the keyword arguments of `run_test()`
        vm.tests.run_test(test_id, **config).log()
    except Exception as e:
        print(f"Error running test {test_id}: {e}")
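
Because the loop above catches exceptions and only prints them, a mistyped `input_id` in the configuration is easy to miss in a long output. A pre-flight check along the following lines can surface such problems before anything runs. This is a sketch under our own assumptions: the helper `find_unknown_inputs`, the shortened `example_config`, and the `my_model` ID are hypothetical, and the check only handles input IDs given as strings or flat lists of strings:

```python
def find_unknown_inputs(test_config, known_input_ids):
    """Return {test_id: [unknown input_id, ...]} for IDs missing from known_input_ids."""
    problems = {}
    for test_id, config in test_config.items():
        referenced = []
        for value in config.get("inputs", {}).values():
            referenced.extend(value if isinstance(value, list) else [value])
        for values in config.get("input_grid", {}).values():
            referenced.extend(values)
        unknown = [i for i in referenced if i not in known_input_ids]
        if unknown:
            problems[test_id] = unknown
    return problems


# Hypothetical, shortened configuration for illustration
example_config = {
    "validmind.data_validation.Skewness:raw_data": {
        "inputs": {"dataset": "raw_dataset"},
        "params": {"max_threshold": 1},
    },
    "validmind.model_validation.sklearn.ROCCurve": {
        "input_grid": {"dataset": ["train_dataset", "test_dataset"], "model": ["my_model"]},
    },
}
known_ids = {"raw_dataset", "train_dataset", "test_dataset"}

print(find_unknown_inputs(example_config, known_ids))
```

An empty result means every referenced ID is known; anything else points to the entries worth fixing first.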

<a id='toc8_'></a>

## Add test results to reporting

With all the test results logged, let's head to the model we connected to at the beginning of this notebook and insert some test results into our validation report ([Need more help?](https://docs.validmind.ai/guide/model-validation/assess-compliance.html#link-validator-evidence)):

1. From the **Inventory** in the ValidMind Platform, go to the model you connected to earlier.

2. In the left sidebar that appears for your model, click **Validation Report**.

3. Locate the Data Preparation section and click on **2.2.2. Model Performance** to expand that section.

4. Under the Model Performance Metrics section, locate Validator Evidence then click **Link Evidence to Report**:

    <img src= "link-validator-evidence.png" alt="Screenshot showing the validation report with the link validator evidence to report option highlighted" style="border: 2px solid #083E44; border-radius: 8px; border-right-width: 2px; border-bottom-width: 3px;">
    <br><br>

5. Select the Minimum F1 Score test results we logged:

    - **ValidMind Model Validation Sklearn Minimum F 1 Score Adj Threshold**
    - **ValidMind Model Validation Sklearn Minimum F 1 Score Xgboost Champion**
    - **ValidMind Model Validation Sklearn Minimum F 1 Score Xgboost Champion Vs Challengers**

    <img src= "selecting-minimum-f1-scores.png" alt="Screenshot showing the MinimumF1Score tests selected" style="border: 2px solid #083E44; border-radius: 8px; border-right-width: 2px; border-bottom-width: 3px;">
    <br><br>

6. Finally, click **Update Linked Evidence** to add the test results to the validation report.

    Confirm that the results for the minimum F1 score tests have been correctly inserted into section **2.2.2. Model Performance** of the report:

    <img src= "inserted-minimum-f1-scores.png" alt="Screenshot showing the MinimumF1Score tests inserted into the validation report" style="border: 2px solid #083E44; border-radius: 8px; border-right-width: 2px; border-bottom-width: 3px;">
    <br><br>

7. Continue to work on your validation report by:

    - **Adding risk assessment notes:** Click under **Risk Assessment Notes** in any validation report section to access the text editor and content editing toolbar, including an option to generate a draft with AI. (Learn more: [Work with content blocks](https://docs.validmind.ai/guide/model-documentation/work-with-content-blocks.html#content-editing-toolbar))
    - **Adding findings:** Click on **Link Finding to Report** in any validation report section, then click **+ Create New Finding**. (Learn more: [Add and manage model findings](https://docs.validmind.ai/guide/model-validation/add-manage-model-findings.html))
    - **Assessing compliance:** Under the Guideline for any validation report section, click **ASSESSMENT** and select the compliance status from the drop-down menu. (Learn more: [Assess compliance](https://docs.validmind.ai/guide/model-validation/assess-compliance.html#provide-compliance-assessments))


<a id='toc9_'></a>

## In summary

In this final notebook, you learned how to:

- [ ] Adjust an out-of-the-box ValidMind test
- [ ] Implement a custom inline test
- [ ] Re-run tests provided by your model development team to verify that they were run and reported accurately
- [ ] Link your logged test results as evidence on your validation report

With our ValidMind for model validation series of notebooks, you learned how to validate a model end-to-end with the ValidMind Library by running through some common scenarios in a typical model validation setting:

- Verifying the data quality steps performed by the model development team
- Independently replicating the champion model's results and conducting additional tests to assess performance, stability, and robustness
- Setting up test inputs and challenger models for comparative analysis
- Running validation tests, analyzing results, and logging findings to ValidMind

<a id='toc10_'></a>

## Next steps

<a id='toc10_1_'></a>

### Work with your validation report

Now that you've logged all your test results and verified the work done by the model development team, head to the ValidMind Platform to wrap up your validation report. Include risk assessment notes, add findings, and assess compliance, then submit your report for review when it's ready. **Learn more:** [Preparing validation reports](https://docs.validmind.ai/guide/model-validation/preparing-validation-reports.html)

<a id='toc10_2_'></a>

### Learn more

Now that you're familiar with the basics, you can explore the following notebooks to get a deeper understanding of how the ValidMind Library assists you in streamlining model validation:

<a id='toc10_2_1_'></a>

#### More how-to guides and code samples

- [Explore available tests in detail](../../how_to/explore_tests.ipynb)
- [In-depth guide on running dataset-based tests](../../how_to/run_tests/1_run_dataset_based_tests.ipynb)
- [In-depth guide for running comparison tests](../../how_to/run_tests/2_run_comparison_tests.ipynb)
- [In-depth guide for implementing custom tests](../../code_samples/custom_tests/implement_custom_tests.ipynb)

<a id='toc10_2_2_'></a>

#### Discover more learning resources

All notebook samples can be found in the following directories of the ValidMind Library GitHub repository:

- [Code samples](https://github.com/validmind/validmind-library/tree/main/notebooks/code_samples)
- [How-to guides](https://github.com/validmind/validmind-library/tree/main/notebooks/how_to)