# ValidMind for model validation — 113 Perform model validation tests

Learn how to use ValidMind for your end-to-end model validation process with our series of four introductory notebooks. In this third notebook, you'll identify relevant validation tests, run them, then add the test results to your validation report.

You'll become familiar with the individual tests available in ValidMind, as well as how to run them and change parameters as necessary. Using ValidMind's repository of individual tests as building blocks helps you ensure that a model was built appropriately.

**For a full list of out-of-the-box tests,** refer to our [Test descriptions](https://docs.validmind.ai/developer/model-testing/test-descriptions.html) or try the interactive [Test sandbox](https://docs.validmind.ai/developer/model-testing/test-sandbox.html).

## Prerequisites

To log test results or evidence to your validation report with this notebook, you'll first need to have:

- [ ] Registered a model within the ValidMind Platform and granted yourself access to the model as a validator
- [ ] Installed the ValidMind Library in your local environment, allowing you to access all its features
- [ ] Learned how to import and initialize datasets for use with ValidMind
- [ ] Understood the basics of how to enable custom context for test descriptions generated by ValidMind


<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>Need help with the above steps?</b></span>
<br></br>
Refer to the first two notebooks in this series:

<ol>
    <li><a href="111-import_champion_model.ipynb" style="color: #DE257E;"><b>111 Import the champion model</b></a></li>
    <li><a href="112-develop_challenger_models.ipynb" style="color: #DE257E;"><b>112 Develop potential challenger models</b></a></li>
</ol>

</div>

## Setting up

This section should be familiar to you, as we performed the same actions in the previous notebook, **[112 Develop potential challenger models](112-develop_challenger_models.ipynb)**.

### Initialize the ValidMind Library

As usual, let's first connect the ValidMind Library to the model we previously registered in the ValidMind Platform:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Inventory** and select the model you registered for this "ValidMind for model validation" series of notebooks.

3. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:

In [None]:
# Make sure the ValidMind Library is installed

%pip install -q validmind

# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
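
If you load credentials from a `.env` file, it can help to confirm which values are actually present before calling `vm.init()`. The sketch below is a hypothetical helper, not part of the ValidMind Library, and the variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, and `VM_API_MODEL` are assumptions; confirm them against the credentials guide linked above. It maps those variables onto the `api_host`, `api_key`, `api_secret`, and `model` arguments shown in the cell and reports any that are missing:

```python
import os

# Assumed mapping from .env variable names to vm.init() keyword arguments;
# confirm the variable names against the ValidMind credentials guide.
ENV_TO_INIT_ARG = {
    "VM_API_HOST": "api_host",
    "VM_API_KEY": "api_key",
    "VM_API_SECRET": "api_secret",
    "VM_API_MODEL": "model",
}


def init_kwargs_from_env(environ=None):
    """Hypothetical helper: build vm.init() kwargs from environment variables.

    Returns the kwargs that were found and a list of expected variables that
    are missing or empty.
    """
    environ = os.environ if environ is None else environ
    kwargs = {arg: environ[var] for var, arg in ENV_TO_INIT_ARG.items() if environ.get(var)}
    missing = [var for var in ENV_TO_INIT_ARG if not environ.get(var)]
    return kwargs, missing


# Example with a stand-in environment (placeholder values, not real credentials)
fake_env = {"VM_API_HOST": "https://example.invalid", "VM_API_KEY": "key"}
kwargs, missing = init_kwargs_from_env(fake_env)
print(kwargs)   # {'api_host': 'https://example.invalid', 'api_key': 'key'}
print(missing)  # ['VM_API_SECRET', 'VM_API_MODEL']
```

When `missing` is empty, `vm.init(**kwargs)` passes the same values explicitly instead of relying on the placeholder arguments.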

### Import the champion model

Next, we'll import the champion model submitted by the model development team, the same one we used in the previous notebooks (**[xgb_model_champion.pkl](xgb_model_champion.pkl)**), and load the same sample [Lending Club](https://www.kaggle.com/datasets/devanshi23/loan-data-2007-2014/data) dataset:

In [None]:
import xgboost as xgb

# Load the saved champion model
xgb_model = xgb.XGBClassifier()
xgb_model.load_model("xgb_model_champion.pkl")
xgb_model

# Capture the feature names in the order the champion model was trained on,
# so we can apply the same column order to the dataset
cols_when_model_builds = xgb_model.get_booster().feature_names

In [None]:
# Load the Lending Club dataset (originally from Kaggle) from its offline copy
from validmind.datasets.credit_risk import lending_club

df = lending_club.load_data(source="offline")
df.head()

# Preprocess the dataset for data quality testing purposes
preprocess_df = lending_club.preprocess(df)

# Apply feature engineering to the dataset
fe_df = lending_club.feature_engineering(preprocess_df)
fe_df.head()

We'll also split our dataset into training and test sets to start the validation testing process:

In [None]:
# Split the data
train_df, test_df = lending_club.split(fe_df, test_size=0.2)

x_train = train_df.drop(lending_club.target_column, axis=1)
y_train = train_df[lending_club.target_column]

x_test = test_df.drop(lending_club.target_column, axis=1)
y_test = test_df[lending_club.target_column]

# Reorder the features to match the order used when the champion model was built
x_train = x_train[cols_when_model_builds]
x_test = x_test[cols_when_model_builds]
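
Selecting `x_train[cols_when_model_builds]` both reorders the columns and drops any the champion model wasn't trained on; a missing feature surfaces only as a generic `KeyError`. A minimal sketch of a more explicit check follows, where `align_to_model_features` is a hypothetical helper, not part of the original notebook or the ValidMind Library:

```python
import pandas as pd


def align_to_model_features(df, feature_names):
    """Hypothetical helper: return df with columns in the model's feature order.

    Raises a ValueError naming any model features absent from df, rather than
    the bare KeyError that plain column selection would raise.
    """
    if feature_names is None:
        # XGBoost typically stores no feature names when trained without column names
        raise ValueError("The model has no stored feature names to align against")
    missing = [name for name in feature_names if name not in df.columns]
    if missing:
        raise ValueError(f"Dataset is missing model features: {missing}")
    return df[list(feature_names)]


# Toy example: columns arrive in a different order than the model expects
toy = pd.DataFrame({"b": [1, 2], "a": [3, 4], "extra": [5, 6]})
aligned = align_to_model_features(toy, ["a", "b"])
print(list(aligned.columns))  # ['a', 'b']
```

In this notebook, the equivalent call would be `x_train = align_to_model_features(x_train, cols_when_model_builds)`.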

In [None]:
# Keep only the WoE-encoded features and the `loan_status` target column
cols_use = [
    'annual_inc_woe',
    'verification_status_woe',
    'emp_length_woe',
    'installment_woe',
    'term_woe',
    'home_ownership_woe',
    'purpose_woe',
    'open_acc_woe',
    'total_acc_woe',
    'int_rate_woe',
    'sub_grade_woe',
    'grade_woe',
    'loan_status',
]


train_df = train_df[cols_use]
test_df = test_df[cols_use]
test_df.head()
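
The `_woe` suffix on the columns kept above suggests weight-of-evidence (WoE) encoding, a common transformation for credit scorecards. As an illustration only, and not the implementation used by `lending_club.feature_engineering`, here is a minimal sketch of WoE for a categorical feature. It assumes a binary target where 1 marks the event (for example, a default) and uses the convention WoE = ln(share of non-events / share of events); `woe_table` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd


def woe_table(feature: pd.Series, target: pd.Series) -> pd.Series:
    """Weight of evidence per category: ln(share of non-events / share of events).

    Assumes a binary target where 1 marks the event. Categories with zero
    events or zero non-events would produce +/-inf; real implementations
    usually smooth or merge bins to avoid that.
    """
    counts = pd.crosstab(feature, target)  # rows: categories, columns: 0 and 1
    non_event_share = counts[0] / counts[0].sum()
    event_share = counts[1] / counts[1].sum()
    return np.log(non_event_share / event_share)


# Toy data: grade C has the highest default share, so it gets a negative WoE
grade = pd.Series(["A", "A", "A", "B", "B", "B", "B", "C", "C", "C"])
default = pd.Series([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])
print(woe_table(grade, default).round(3))
```

Each category is then replaced by its WoE value, which puts every feature on a common log-odds scale.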

### Train challenger models

Just like in the last notebook, we'll train two challenger models as the basis for our testing:

In [None]:
# Import the Random Forest Classification model
from sklearn.ensemble import RandomForestClassifier

# Create the model instance with 50 decision trees
rf_model = RandomForestClassifier(
    n_estimators=50,
    random_state=42,
)

# Train the model
rf_model.fit(x_train, y_train)

In [None]:
# Import the Logistic Regression model and the grid search utility
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Logistic Regression grid params
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

# Grid search for Logistic Regression
grid_log_reg = GridSearchCV(LogisticRegression(), log_reg_params)
grid_log_reg.fit(x_train, y_train)

# Logistic Regression best estimator
log_reg = grid_log_reg.best_estimator_
log_reg
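
`GridSearchCV` fits every combination in the grid once per cross-validation fold, so the cost of the search is easy to estimate up front. A small sketch of that arithmetic, assuming scikit-learn's default 5-fold cross-validation (the cell above doesn't set `cv`):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the cell above
log_reg_params = {
    "penalty": ["l1", "l2"],
    "C": [0.001, 0.01, 0.1, 1, 10, 100, 1000],
    "solver": ["liblinear"],
}

n_candidates = len(ParameterGrid(log_reg_params))  # 2 * 7 * 1 = 14
n_folds = 5  # GridSearchCV's default cv in current scikit-learn releases
n_fits = n_candidates * n_folds  # plus one final refit of the best candidate

print(n_candidates, n_fits)  # 14 70
```

After fitting, `grid_log_reg.best_params_` and `grid_log_reg.best_score_` report the selected combination and its mean cross-validated score, which is accuracy by default for a classifier since no `scoring` is set.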

### Extract predicted probabilities

With our challenger models trained, let's extract the predicted probabilities from all three models and convert them into binary predictions using a cut-off threshold:

In [None]:
# Champion — Application scorecard model
train_xgb_prob = xgb_model.predict_proba(x_train)[:, 1]
test_xgb_prob = xgb_model.predict_proba(x_test)[:, 1]

# Challenger — Random forest classification model
train_rf_prob = rf_model.predict_proba(x_train)[:, 1]
test_rf_prob = rf_model.predict_proba(x_test)[:, 1]

# Challenger — Logistic regression model
train_log_prob = log_reg.predict_proba(x_train)[:, 1]
test_log_prob = log_reg.predict_proba(x_test)[:, 1]
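
`predict_proba` returns one column per class, ordered as in the fitted model's `classes_` attribute; with labels 0 and 1, column 1 holds the probability of class 1, which is what `[:, 1]` selects above. A minimal sketch on toy data (not the Lending Club dataset) that looks up the positive-class column explicitly instead of hard-coding the index:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data: one feature, labels 0 and 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
proba = model.predict_proba(X)  # shape (n_samples, n_classes)

# Columns follow model.classes_ (sorted labels), so find the positive class
positive_col = list(model.classes_).index(1)
positive_proba = proba[:, positive_col]

print(model.classes_)     # [0 1]
print(proba.sum(axis=1))  # each row sums to 1
```

The explicit lookup matters mainly when labels aren't 0 and 1 (for example, strings), where the positive class may not land in column 1.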

In [None]:
# Predict 1 (positive) when the probability exceeds the cut-off threshold, else 0
cut_off_threshold = 0.3

# Champion — Application scorecard model
train_xgb_binary_predictions = (train_xgb_prob > cut_off_threshold).astype(int)
test_xgb_binary_predictions = (test_xgb_prob > cut_off_threshold).astype(int)

# Challenger — Random forest classification model
train_rf_binary_predictions = (train_rf_prob > cut_off_threshold).astype(int)
test_rf_binary_predictions = (test_rf_prob > cut_off_threshold).astype(int)

# Challenger — Logistic regression model
train_log_binary_predictions = (train_log_prob > cut_off_threshold).astype(int)
test_log_binary_predictions = (test_log_prob > cut_off_threshold).astype(int)
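
Because the comparison is strict (`>`), a probability exactly equal to 0.3 is labelled 0. A minimal sketch with made-up probabilities and a hypothetical `to_binary` helper, showing how the cut-off changes the predicted labels; lowering the threshold flags more cases as positive, trading more false positives for fewer missed positives:

```python
import numpy as np


def to_binary(probabilities, threshold=0.3):
    """Map probabilities to 1 when strictly greater than threshold, else 0."""
    return (np.asarray(probabilities) > threshold).astype(int)


probs = np.array([0.05, 0.29, 0.30, 0.31, 0.85])
print(to_binary(probs))       # [0 0 0 1 1]
print(to_binary(probs, 0.5))  # [0 0 0 0 1]
```

Applied to this notebook, `to_binary(test_rf_prob, cut_off_threshold)` would reproduce `test_rf_binary_predictions`.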

## In summary

## Next steps

### Finalize testing and reporting