# Ongoing Monitoring for Application Scorecard 

TBC.

## Install the ValidMind Library

To install the library:

In [None]:
#%pip install -q validmind
%pip install -q -e ../../../../developer-framework

## Initialize the ValidMind Library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

<a id='toc3_1_'></a>

### Get your code snippet

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register Model**.

3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))

   For example, to register a model for use with this notebook, select:

   - Documentation template: `Binary classification`
   - Use case: `Marketing/Sales - Attrition/Churn Management`

   You can fill in other options according to your preference.

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:

In [None]:
# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
    monitoring=True,
)
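If you take the `.env` route, it can help to confirm that the credentials were actually loaded before calling `vm.init()`. The following is a minimal sketch, not part of the ValidMind Library: `missing_credentials` is a hypothetical helper, and the variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET` and `VM_API_MODEL` are assumptions that you should confirm against the linked guide.

```python
import os

# Assumed variable names; confirm them against the ValidMind documentation.
REQUIRED_VARS = ("VM_API_HOST", "VM_API_KEY", "VM_API_SECRET", "VM_API_MODEL")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing credentials:", ", ".join(missing))
else:
    print("All credentials found in the environment.")
```

Keeping credentials in the environment rather than in the notebook also avoids committing secrets to version control.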

## Initialize the Python environment

Next, let's import the necessary libraries and set up your Python environment for data analysis:

In [3]:
import xgboost as xgb

from validmind.tests import run_test
from validmind.datasets.credit_risk import lending_club

%matplotlib inline

### Preview the monitoring template

A template predefines sections for your model monitoring documentation and provides a general outline to follow, making the documentation process much easier.

You will upload documentation and test results into this template later on. For now, take a look at the structure that the template provides with the `vm.preview_template()` function from the ValidMind Library and note the empty sections:

In [None]:
vm.preview_template()

## Load the reference and monitoring datasets

The sample dataset used here is provided by the ValidMind Library. For demonstration purposes, we'll use the training and test dataset splits as the `reference` and `monitoring` datasets, respectively.

In [None]:
df = lending_club.load_data(source="offline")
df.head()

In [None]:
preprocess_df = lending_club.preprocess(df)
preprocess_df.head()

In [None]:
fe_df = lending_club.feature_engineering(preprocess_df)
fe_df.head()

## Train the model

In this section, we focus on constructing our predictive model.

- We begin by dividing our data, which is based on Weight of Evidence (WoE) features, into training and testing sets (`train_df`, `test_df`).
- `lending_club.split` performs a simple random split, allocating data points to each set at random so that both contain a mix of examples.
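
For readers new to WoE, the sketch below shows how the value for a single bin is commonly computed: the log of the ratio between the share of non-defaults and the share of defaults that fall into that bin. It is illustrative only. The function name, the toy counts and the sign convention are assumptions, and `lending_club.feature_engineering` may compute WoE differently.

```python
import math


def weight_of_evidence(good_counts, bad_counts):
    """Compute WoE per bin as ln(share of goods / share of bads).

    Assumes every bin has at least one good and one bad observation.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    return [
        math.log((good / total_good) / (bad / total_bad))
        for good, bad in zip(good_counts, bad_counts)
    ]


# Hypothetical counts of non-defaults (good) and defaults (bad) in three bins
good = [100, 300, 600]
bad = [50, 30, 20]
woe_values = weight_of_evidence(good, bad)
for bin_index, woe in enumerate(woe_values):
    print(f"bin {bin_index}: WoE = {woe:.3f}")
```

Under this convention, a negative WoE marks a bin with relatively more defaults than the population as a whole, and a positive WoE marks a bin with relatively fewer.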

In [None]:
# Split the data
train_df, test_df = lending_club.split(fe_df, test_size=0.2)

x_train = train_df.drop(lending_club.target_column, axis=1)
y_train = train_df[lending_club.target_column]

x_test = test_df.drop(lending_club.target_column, axis=1)
y_test = test_df[lending_club.target_column]
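
As a rough illustration of what a simple random split does, the sketch below shuffles row indices and holds out a fraction of them. It is not the implementation of `lending_club.split`. The helper name `simple_random_split` and the `seed` parameter are hypothetical.

```python
import random


def simple_random_split(rows, test_size=0.2, seed=42):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_size))
    test_idx = set(indices[:n_test])
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(10))  # stand-in for dataset rows
train_rows, test_rows = simple_random_split(rows, test_size=0.2)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split reproducible, which matters when the same split later serves as the reference and monitoring datasets.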

In [9]:
# Define the XGBoost model
xgb_model = xgb.XGBClassifier(
    n_estimators=50, 
    random_state=42, 
    early_stopping_rounds=10
)
xgb_model.set_params(
    eval_metric=["error", "logloss", "auc"],
)

# Fit the model
xgb_model.fit(
    x_train, 
    y_train,
    eval_set=[(x_test, y_test)],
    verbose=False
)

# Compute probabilities
train_xgb_prob = xgb_model.predict_proba(x_train)[:, 1]
test_xgb_prob = xgb_model.predict_proba(x_test)[:, 1]

# Compute binary predictions
cut_off_threshold = 0.3
train_xgb_binary_predictions = (train_xgb_prob > cut_off_threshold).astype(int)
test_xgb_binary_predictions = (test_xgb_prob > cut_off_threshold).astype(int)
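
The cut-off threshold determines how many observations are labeled as positive. The sketch below applies the same strict comparison (`>`) used above to a handful of hypothetical probabilities. `apply_cut_off` is an illustrative helper and the values are made up.

```python
def apply_cut_off(probabilities, threshold):
    """Label an observation 1 when its probability is strictly above the threshold."""
    return [int(p > threshold) for p in probabilities]


# Hypothetical predicted probabilities
probabilities = [0.05, 0.25, 0.30, 0.45, 0.80]
for threshold in (0.3, 0.5):
    labels = apply_cut_off(probabilities, threshold)
    print(f"threshold={threshold}: labels={labels}, positives={sum(labels)}")
```

Note that a probability exactly equal to the threshold is labeled `0`, because the comparison is strict.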

### Initialize the ValidMind datasets

Before you can run tests, you must first initialize a ValidMind dataset object using the [`init_dataset`](https://docs.validmind.ai/validmind/validmind.html#init_dataset) function from the ValidMind (`vm`) module.

This function takes a number of arguments:

- `dataset` — The raw dataset that you want to provide as input to tests.
- `input_id` - A unique identifier that allows tracking what inputs are used when running each individual test.
- `target_column` — A required argument if tests require access to true values. This is the name of the target column in the dataset.

With all datasets ready, you can now initialize the raw, preprocessed and feature-engineered datasets, along with the reference (`train_df`) and monitoring (`test_df`) datasets created earlier, into their own dataset objects using [`vm.init_dataset()`](https://docs.validmind.ai/validmind/validmind.html#init_dataset):

In [10]:
vm_raw_dataset = vm.init_dataset(
    dataset=df,
    input_id="raw_dataset",
    target_column=lending_club.target_column,
)

vm_preprocess_dataset = vm.init_dataset(
    dataset=preprocess_df,
    input_id="preprocess_dataset",
    target_column=lending_club.target_column,
)

vm_fe_dataset = vm.init_dataset(
    dataset=fe_df,
    input_id="fe_dataset",
    target_column=lending_club.target_column,
)

vm_reference_ds = vm.init_dataset(
    dataset=train_df,
    input_id="reference_dataset",
    target_column=lending_club.target_column,
)

vm_monitoring_ds = vm.init_dataset(
    dataset=test_df,
    input_id="monitoring_dataset",
    target_column=lending_club.target_column,
)

### Initialize a model object

You will also need to initialize a ValidMind model object (`vm_xgb_model`) that can be passed to other functions for analysis and tests on the data. Initialize this model object with [`vm.init_model()`](https://docs.validmind.ai/validmind/validmind.html#init_model):

In [11]:
vm_xgb_model = vm.init_model(
    xgb_model,
    input_id="xgb_model",
)

### Assign prediction values and probabilities to the datasets

With our model now trained, we'll assign both the predicted probabilities coming directly from the model and the binary predictions obtained after applying the cut-off threshold described in the previous steps.

- These tasks are achieved with the `assign_predictions()` method of the VM `dataset` object.
- This method links the model's class prediction values and probabilities to our VM reference and monitoring datasets.

In [12]:
vm_reference_ds.assign_predictions(
    model=vm_xgb_model,
    prediction_values=train_xgb_binary_predictions,
    prediction_probabilities=train_xgb_prob,
)

vm_monitoring_ds.assign_predictions(
    model=vm_xgb_model,
    prediction_values=test_xgb_binary_predictions,
    prediction_probabilities=test_xgb_prob,
)

### Compute credit risk scores

In this phase, we translate model predictions into actionable scores using probability estimates generated by our trained model.

In [None]:
train_xgb_scores = lending_club.compute_scores(train_xgb_prob)
test_xgb_scores = lending_club.compute_scores(test_xgb_prob)

# Assign scores to the datasets
vm_reference_ds.add_extra_column("xgb_scores", train_xgb_scores)
vm_monitoring_ds.add_extra_column("xgb_scores", test_xgb_scores)
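
To give a sense of how probabilities can be turned into scores, the sketch below uses the points-to-double-odds (PDO) scaling that is common in scorecard development. It is an assumption-laden illustration: the function name and the parameter values (`target_score`, `target_odds`, `pdo`) are hypothetical, and `lending_club.compute_scores` may use a different scaling or sign convention.

```python
import math


def probability_to_score(prob_default, target_score=600, target_odds=50, pdo=20):
    """Map a default probability to a score using points-to-double-odds scaling.

    Odds are expressed as good:bad, so lower default probabilities score higher.
    """
    factor = pdo / math.log(2)
    offset = target_score - factor * math.log(target_odds)
    odds = (1 - prob_default) / prob_default
    return offset + factor * math.log(odds)


for p in (0.01, 0.05, 0.20):
    print(f"P(default)={p:.2f} -> score={probability_to_score(p):.1f}")
```

With these parameters, good:bad odds of 50:1 map to a score of 600, and each doubling of the odds adds 20 points.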

### Adding custom context to the LLM descriptions

To add custom context to the LLM-generated test descriptions, set the `VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED` environment variable to `1` and store the context text in `VALIDMIND_LLM_DESCRIPTIONS_CONTEXT`. This is a global setting that affects all tests.

In [14]:
import os
os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED"] = "1"

context = """
FORMAT FOR THE LLM DESCRIPTIONS: 
    **<Test Name>** is designed to <begin with a concise overview of what the test does and its primary purpose, 
    extracted from the test description>.

    The test operates by <write a paragraph about the test mechanism, explaining how it works and what it measures. 
    Include any relevant formulas or methodologies mentioned in the test description.>

    The primary advantages of this test include <write a paragraph about the test's strengths and capabilities, 
    highlighting what makes it particularly useful for specific scenarios.>

    Users should be aware that <write a paragraph about the test's limitations and potential risks. 
    Include both technical limitations and interpretation challenges. 
    If the test description includes specific signs of high risk, incorporate these here.>

    **Key Insights:**

    The test results reveal:

    - **<insight title>**: <comprehensive description of one aspect of the results>
    - **<insight title>**: <comprehensive description of another aspect>
    ...

    Based on these results, <conclude with a brief paragraph that ties together the test results with the test's 
    purpose and provides any final recommendations or considerations.>

ADDITIONAL INSTRUCTIONS:
    Present insights in order from general to specific, with each insight as a single bullet point with bold title.

    For each metric in the test results, include in the test overview:
    - The metric's purpose and what it measures
    - Its mathematical formula in LaTeX notation
    - The range of possible values
    - What constitutes good/bad performance
    - How to interpret different values

    Each insight should progressively cover:
    1. Overall scope and distribution
    2. Complete breakdown of all elements with specific values
    3. Natural groupings and patterns
    4. Comparative analysis between datasets/categories
    5. Stability and variations
    6. Notable relationships or dependencies

    Remember:
    - Keep all insights at the same level (no sub-bullets or nested structures)
    - Make each insight complete and self-contained
    - Include specific numerical values and ranges
    - Cover all elements in the results comprehensively
    - Maintain clear, concise language
    - Use only "- **Title**: Description" format for insights
    - Progress naturally from general to specific observations

""".strip()

os.environ["VALIDMIND_LLM_DESCRIPTIONS_CONTEXT"] = context
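
Because the setting is global, you may prefer to scope a custom context to a few tests and restore the previous state afterwards. The sketch below is one way to do that with a context manager. `llm_context` is a hypothetical helper, and it assumes the library reads these environment variables when each test runs rather than once at import time, which is not confirmed here.

```python
import os
from contextlib import contextmanager


@contextmanager
def llm_context(text):
    """Temporarily set the LLM description context, then restore the previous state."""
    names = (
        "VALIDMIND_LLM_DESCRIPTIONS_CONTEXT_ENABLED",
        "VALIDMIND_LLM_DESCRIPTIONS_CONTEXT",
    )
    previous = {name: os.environ.get(name) for name in names}
    os.environ[names[0]] = "1"
    os.environ[names[1]] = text
    try:
        yield
    finally:
        # Restore each variable to its earlier value, or remove it if it was unset
        for name, value in previous.items():
            if value is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = value


with llm_context("Keep each description under 200 words."):
    pass  # call run_test(...) here
```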

### Conduct target and feature drift testing

Next, we investigate the distributional characteristics of predictions and features to determine whether the underlying data has changed. These tests are crucial for assessing whether the model's accuracy can be expected to hold.

1. **Target drift:** We compare the reference data with the monitoring data. This helps to identify any shifts in the target variable distribution.
2. **Feature drift:** We compare the training dataset, which serves as the reference dataset in this notebook, with the monitoring data. Since these features were used to train the model, any drift in them could indicate potential issues, as the underlying patterns that the model was trained on may have changed.

We can also examine the correlation between features and predictions. Significant changes in these correlations may trigger a deeper assessment.

In [None]:
run=True
if run:

    run_test(
        "validmind.ongoing_monitoring.TargetPredictionDistributionPlot",
        inputs={
            "datasets": [vm_reference_ds, vm_monitoring_ds],
            "model": vm_xgb_model,
        },
        params={
            "drift_pct_threshold": 5
        },
    )
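
One plausible reading of `drift_pct_threshold` is a limit on the percentage change of summary statistics between the two prediction distributions. The sketch below illustrates that idea with the mean and standard deviation. It is an assumption about the mechanism, not the implementation of `TargetPredictionDistributionPlot`, and the helper name and sample values are hypothetical.

```python
import statistics


def pct_drift(reference, monitoring):
    """Percentage change of the mean and standard deviation versus the reference."""
    summary = {}
    for name, func in (("mean", statistics.fmean), ("std", statistics.stdev)):
        ref_value = func(reference)
        mon_value = func(monitoring)
        summary[name] = 100 * (mon_value - ref_value) / abs(ref_value)
    return summary


# Hypothetical predicted probabilities
reference_probs = [0.10, 0.20, 0.30, 0.40, 0.50]
monitoring_probs = [0.12, 0.22, 0.32, 0.42, 0.52]
drift_pct_threshold = 5

drift_summary = pct_drift(reference_probs, monitoring_probs)
for name, drift in drift_summary.items():
    status = "drift" if abs(drift) > drift_pct_threshold else "stable"
    print(f"{name}: {drift:+.2f}% ({status})")
```

In this toy example the monitoring predictions are shifted upward by a constant, so the mean moves while the spread stays the same.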

Now we want to see the difference in correlation pairs between model predictions and features.

In [None]:
run=True
if run:

    run_test(
        "validmind.ongoing_monitoring.PredictionCorrelation",
        inputs={
            "datasets": [vm_reference_ds, vm_monitoring_ds],
            "model": vm_xgb_model,
        },
        params={
            "drift_pct_threshold": 5
        },
    )
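
To make the idea concrete, the sketch below computes the Pearson correlation between one feature and the predictions in each dataset, then expresses the change as a percentage of the reference value. It is a simplified illustration with hypothetical data. The actual `PredictionCorrelation` test may compute and compare correlations differently.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature values and predictions
ref_feature = [1, 2, 3, 4, 5]
ref_pred = [0.1, 0.2, 0.3, 0.4, 0.5]
mon_feature = [1, 2, 3, 4, 5]
mon_pred = [0.1, 0.3, 0.2, 0.5, 0.4]

ref_corr = pearson(ref_feature, ref_pred)
mon_corr = pearson(mon_feature, mon_pred)
drift_pct = 100 * (mon_corr - ref_corr) / abs(ref_corr)
print(f"reference={ref_corr:.3f}, monitoring={mon_corr:.3f}, drift={drift_pct:+.1f}%")
```

A relative measure like this becomes unstable when the reference correlation is close to zero, so small reference correlations deserve a separate check.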

Finally, for target drift, let's plot the prediction values against each feature in a grid, with the two datasets side by side.

In [None]:
run=True
if run:

    run_test(
        "validmind.ongoing_monitoring.PredictionQuantilesAcrossFeatures",
        inputs={
            "datasets": [vm_reference_ds, vm_monitoring_ds],
            "model": vm_xgb_model,
        },
    )

#### Feature drift tests

Next, let's run a test to investigate whether, and how, the features have drifted. In this instance, we want to compare the training data with the monitoring data.

In [None]:
run=True
if run:

    run_test(
        "validmind.ongoing_monitoring.FeatureDrift",
        inputs={
            "datasets": [vm_reference_ds, vm_monitoring_ds],
            "model": vm_xgb_model,
        },
        params={
            "psi_threshold": 0.2,
        },
    )
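
The `psi_threshold` parameter refers to the Population Stability Index (PSI), which sums `(actual - expected) * ln(actual / expected)` over the bins of a feature. The sketch below is a minimal, self-contained version with hypothetical data and bin edges. The binning strategy and the `eps` floor are assumptions, and the `FeatureDrift` test may bin features differently.

```python
import math


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between two samples binned on shared, ascending bin edges.

    Values outside the edges are ignored when counting.
    """

    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                last = i == len(counts) - 1
                if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                    counts[i] += 1
                    break
        # Floor each share at eps so the logarithm is defined for empty bins
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature values in the reference and monitoring datasets
reference_feature = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
monitoring_feature = [3, 6, 7, 8, 9, 10, 6, 7, 8, 9]
psi = population_stability_index(reference_feature, monitoring_feature, edges=[0, 5, 10])
print(f"PSI = {psi:.3f}, drift flagged: {psi > 0.2}")
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.2 as a significant shift, which is consistent with the `0.2` threshold used above.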
