# Operational Deposit Model Documentation Demo

0. Explore ValidMind developer framework
1. Data quality tests
2. Segmentation of data
3. Custom tests
4. Review model documentation

<a id='toc2_'></a>

## About ValidMind

ValidMind is a platform for managing model risk, including risk associated with AI and statistical models.

You use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

<a id='toc2_1_'></a>

### Before you begin

This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook, but we recommend further familiarizing yourself with the language.

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

<a id='toc2_2_'></a>

### New to ValidMind?

If you haven't already seen our [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/developer/get-started-developer-framework.html), we recommend you explore the available resources for developers at some point. There, you can learn more about documenting models, find code samples, or read our developer reference.

<div class="alert alert-block alert-info" style="background-color: #f7e4ee; color: black; border: 1px solid black;"><b>For access to all features available in this notebook, create a free ValidMind account.</b>
<br></br>
Signing up is FREE — <a href="https://docs.validmind.ai/guide/configuration/register-with-validmind.html"><b>Register with ValidMind</b></a></div>

<a id='toc2_3_'></a>

![Dataset based test architecture](./dataset_image.png)
![Model based test architecture](./model_image.png)

# Prerequisites

Let's go ahead and install the `validmind` library if it's not already installed.

In [None]:
%pip install -q validmind

In [None]:
import os

os.environ["VM_OVERRIDE_METADATA"] = "true"
os.environ["VALIDMIND_LLM_DESCRIPTIONS_ENABLED"] = "true"

<a id='toc4_'></a>

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Time Series Forecasting** as the template and **Credit Risk - Underwriting - Loan** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [None]:
# Replace with your code snippet


import validmind as vm

vm.init(api_host="...", api_key="...", api_secret="...", model="...")

Before learning how to run tests, let's explore the list of all available tests in the ValidMind Developer Framework. You can see that the documentation template for this model has references to some of the test IDs listed below.


In [None]:
vm.tests.list_tests()

Next, let's assess data quality by running a few individual tests. To find which prebuilt tests are relevant for data quality assessment, combine the `vm.tests.list_tests()` function introduced above with `vm.tests.list_tags()` and `vm.tests.list_tasks()`.


In [None]:
# Get the list of available tags
sorted(vm.tests.list_tags())

In [None]:
# Get the list of available tasks
sorted(vm.tests.list_tasks())

You can pass `tags` and `tasks` as parameters to the `vm.tests.list_tests()` function to filter the tests based on the tags and task types. For example, to find tests related to tabular data quality for classification models, you can call `list_tests()` like this:


In [None]:
vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])

## Data preparation


In [None]:
import pandas as pd

raw_df = pd.read_csv("./datasets/odm_data_example/synthetic_data.csv")
print(f"Columns {list(raw_df.columns)}")
print(f"Size {list(raw_df.shape)}")

raw_df.head(4)

# Data validation

Now that we have loaded our dataset, we can run data validation tests right away to start assessing and documenting its quality. Since this is a tabular dataset, we can use ValidMind's built-in tabular data quality tests to check for duplicates, missing values, high cardinality, skewness, excessive zero values, and highly correlated features.

## ValidMind objects


In [None]:
vm_raw_ds = vm.init_dataset(
    dataset=raw_df, input_id="raw_dataset", target_column="cust_ipid_nm"
)

## Available data validation tests

In [None]:
vm.tests.list_tests(filter="data_validation")

### Dataset summary

In [None]:
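# Summarize every column except the date column
vm_ds_summary = vm.init_dataset(
    dataset=raw_df.drop("bal_date", axis=1),
    input_id="raw_dataset_summary",  # unique ID so it does not clash with vm_raw_ds
    target_column="cust_ipid_nm",
)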
result = vm.tests.run_test(
    "validmind.data_validation.DatasetDescription", dataset=vm_ds_summary
).log()

### Duplicates

First, let's check for duplicates in our dataset. We can use the `validmind.data_validation.Duplicates` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.Duplicates", dataset=vm_raw_ds
).log()
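As a rough illustration of what this check measures, the plain-pandas sketch below counts rows that exactly repeat an earlier row. It is an approximation under our own assumptions, not ValidMind's implementation; the built-in `Duplicates` test may apply its own thresholds and reporting.

```python
import pandas as pd


def count_duplicate_rows(df: pd.DataFrame) -> dict:
    """Count rows whose values in every column repeat an earlier row."""
    n_duplicates = int(df.duplicated().sum())
    pct = 100 * n_duplicates / len(df) if len(df) else 0.0
    return {"n_duplicates": n_duplicates, "pct_duplicates": pct}


# Toy data with made-up values: the second row repeats the first
toy = pd.DataFrame({"cust": ["a", "a", "b"], "EOD": [10, 10, 5]})
dup_summary = count_duplicate_rows(toy)
print(dup_summary)
```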

### Missing values

Next, let's check for missing values in our dataset. We can use the `validmind.data_validation.MissingValues` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.MissingValues", dataset=vm_raw_ds
).log()
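A comparable plain-pandas sketch for missing values computes per-column counts and percentages. The `max_pct` pass threshold is a hypothetical parameter chosen for illustration; the built-in `MissingValues` test defines its own threshold.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df: pd.DataFrame, max_pct: float = 0.0) -> pd.DataFrame:
    """Per-column missing counts and percentages; `max_pct` is a hypothetical cutoff."""
    counts = df.isna().sum()
    pct = 100 * counts / len(df)
    return pd.DataFrame({"n_missing": counts, "pct_missing": pct, "passed": pct <= max_pct})


# Toy data with made-up values: one missing EOD balance
toy = pd.DataFrame({"EOD": [1.0, np.nan, 3.0, 4.0], "client": ["x", "y", "y", "z"]})
missing = missing_value_summary(toy)
print(missing)
```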

### Unique rows

Next, let's check for unique rows in our dataset. We can use the `validmind.data_validation.UniqueRows` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.UniqueRows", dataset=vm_raw_ds
).log()
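One way to read a uniqueness check is the percentage of distinct values in each column. The sketch below assumes that interpretation and uses a hypothetical `min_pct` threshold; consult the `UniqueRows` test description for its exact definition.

```python
import pandas as pd


def unique_value_pct(df: pd.DataFrame, min_pct: float = 1.0) -> pd.DataFrame:
    """Percentage of distinct values per column; columns below `min_pct` fail."""
    pct = 100 * df.nunique() / len(df)
    return pd.DataFrame({"pct_unique": pct, "passed": pct >= min_pct})


# Toy data with made-up values: "cust" repeats one value
toy = pd.DataFrame({"cust": ["a", "a", "b", "c"], "EOD": [1, 2, 3, 4]})
uniq = unique_value_pct(toy, min_pct=80.0)
print(uniq)
```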

### High cardinality

Next, let's check for high cardinality in our dataset. We can use the `validmind.data_validation.HighCardinality` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.HighCardinality", dataset=vm_raw_ds
).log()
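High cardinality usually refers to non-numeric columns with many distinct values, such as customer identifiers, which are hard to use as categorical features. This sketch flags such columns using a hypothetical `max_unique` cutoff; the built-in test may use different thresholds (for example, percentages rather than counts).

```python
import pandas as pd


def high_cardinality_columns(df: pd.DataFrame, max_unique: int = 10) -> pd.Series:
    """Distinct-value counts of non-numeric columns that exceed `max_unique`."""
    categorical = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
    n_unique = df[categorical].nunique()
    return n_unique[n_unique > max_unique]


# Toy data with made-up values: an ID column, a 2-level segment, a numeric balance
toy = pd.DataFrame({
    "cust_id": [f"c{i}" for i in range(20)],
    "segment": ["retail", "corporate"] * 10,
    "EOD": range(20),
})
flagged = high_cardinality_columns(toy)
print(flagged)
```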

### Skewness

Next, let's check for skewness in our dataset. We can use the `validmind.data_validation.Skewness` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.Skewness", dataset=vm_raw_ds
).log()
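Skewness measures the asymmetry of a numeric distribution: values far from zero indicate a long tail, which is common for balances and outflows. The sketch below uses pandas' sample skewness and a hypothetical cutoff of 1 in absolute value; ValidMind's `Skewness` test may use a different threshold.

```python
import pandas as pd


def skewed_columns(df: pd.DataFrame, max_abs_skew: float = 1.0) -> pd.Series:
    """Sample skewness of numeric columns whose absolute value exceeds `max_abs_skew`."""
    skew = df.select_dtypes("number").skew()
    return skew[skew.abs() > max_abs_skew]


# Toy data with made-up values: one symmetric and one long-tailed column
toy = pd.DataFrame({
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
    "right_tail": [1.0, 1.0, 1.0, 1.0, 20.0],
    "client": ["x", "y", "x", "y", "x"],  # non-numeric, ignored
})
skewed = skewed_columns(toy)
print(skewed)
```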

### Zero values

Next, let's check for zero values in our dataset. We can use the `validmind.data_validation.TooManyZeroValues` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.TooManyZeroValues", dataset=vm_raw_ds
).log()
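For this dataset, zeros deserve attention because a zero `Total_Outflow` makes the `EOD / Total_Outflow` ratio computed later infinite. A minimal pandas sketch of a zero-value check, with a hypothetical `max_pct` threshold, might look like this:

```python
import pandas as pd


def zero_value_pct(df: pd.DataFrame, max_pct: float = 50.0) -> pd.DataFrame:
    """Percentage of exact zeros per numeric column; `max_pct` is a hypothetical cutoff."""
    numeric = df.select_dtypes("number")
    pct = 100 * (numeric == 0).sum() / len(numeric)
    return pd.DataFrame({"pct_zeros": pct, "passed": pct <= max_pct})


# Toy data with made-up values: most outflows are zero
toy = pd.DataFrame({"Total_Outflow": [0.0, 0.0, 0.0, 5.0], "EOD": [1.0, 2.0, 0.0, 4.0]})
zeros = zero_value_pct(toy)
print(zeros)
```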

### Descriptive statistics

Next, let's compute descriptive statistics for our dataset. We can use the `validmind.data_validation.DescriptiveStatistics` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.DescriptiveStatistics", dataset=vm_raw_ds
).log()

### High Pearson correlation

Next, let's check for highly correlated feature pairs in our dataset. We can use the `validmind.data_validation.HighPearsonCorrelation` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.HighPearsonCorrelation", dataset=vm_raw_ds
)
result.log()
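The high-correlation check can be sketched as scanning the upper triangle of the Pearson correlation matrix for pairs whose absolute correlation exceeds a cutoff. The 0.3 cutoff below is an illustrative assumption, not necessarily the test's default.

```python
import pandas as pd


def high_correlation_pairs(df: pd.DataFrame, max_abs_corr: float = 0.3) -> list:
    """(col_a, col_b, r) for numeric column pairs with |Pearson r| above the cutoff."""
    corr = df.select_dtypes("number").corr(method="pearson")
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, no self-pairs
            r = float(corr.iloc[i, j])
            if abs(r) > max_abs_corr:
                pairs.append((cols[i], cols[j], round(r, 3)))
    # Strongest correlations first
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Toy data with made-up values: a balance and a smoothed copy of it, plus noise
toy = pd.DataFrame({
    "EOD": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rolling_eod": [1.1, 2.0, 2.9, 4.2, 5.0],
    "noise": [2.0, 5.0, 1.0, 4.0, 3.0],
})
pairs = high_correlation_pairs(toy)
print(pairs)
```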

### Pearson correlation matrix

Next, let's compute the Pearson correlation matrix of our dataset. We can use the `validmind.data_validation.PearsonCorrelationMatrix` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.PearsonCorrelationMatrix", dataset=vm_raw_ds
).log()

## Segmentation of clients

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

In [None]:
# Drop identifier, name, date and grouping columns before clustering
cluster_df = raw_df.drop(
    columns=[
        "LOB_data",
        "cust_id",
        "bal_date",
        "ult_parent_cust_ipid_no",
        "ult_parent_cust_nm",
        "client",
        "subclient",
    ],
)
target_column = "cust_ipid_nm"
cluster_df.head(2)

### Clustering
Let's build a k-means model.

In [None]:
from validmind.datasets.cluster import digits as demo_dataset

cluster_df = cluster_df.dropna()
# Split into training, validation and test sets with the demo dataset's preprocess helper
train_df, validation_df, test_df = demo_dataset.preprocess(cluster_df)

x_train = train_df.drop(target_column, axis=1)
y_train = train_df[target_column]
x_val = validation_df.drop(target_column, axis=1)
y_val = validation_df[target_column]
x_test = test_df.drop(target_column, axis=1)
y_test = test_df[target_column]


x_train = pd.concat([x_train, x_val], axis=0)
y_train = pd.concat([y_train, y_val], axis=0)

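# Optional feature scaling (disabled by default). Fit the scaler on the
# training data only and reuse it for the other splits to avoid leakage.
scale = False
if scale:
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)
    x_test = scaler.transform(x_test)


n_clusters = 4
# Pass random_state=0 to KMeans for reproducible cluster assignments
model = KMeans(init="k-means++", n_clusters=n_clusters, n_init=4)
model = model.fit(x_train)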
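The sketch below shows the usual pattern for scaling before k-means, on synthetic data: fit `StandardScaler` on the training data once, then reuse it to transform the other splits, so the test set never influences the scaling statistics. The synthetic blobs stand in for client features; `random_state=0` is set only to make the example reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs standing in for client features
x_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(10, 1, (50, 2))])
x_test = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(10, 1, (10, 2))])

scaler = StandardScaler().fit(x_train)  # learn mean/std from training data only
x_train_s = scaler.transform(x_train)
x_test_s = scaler.transform(x_test)     # reuse the training statistics

model = KMeans(n_clusters=2, init="k-means++", n_init=4, random_state=0).fit(x_train_s)
test_clusters = model.predict(x_test_s)
print(test_clusters)
```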

Let's prepare the ValidMind dataset objects:

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df, target_column=target_column, input_id="training_seg_dataset"
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, target_column=target_column, input_id="test_seg_dataset"
)

In [None]:
vm_model = vm.init_model(model, input_id="kmean_model")

### Prediction
Attach the model's predictions to each dataset with the `assign_predictions()` method:

In [None]:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)

### Confusion matrix - training data (manual vs. predicted)

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:training",
    inputs={"dataset": vm_train_ds, "model": vm_model},
).log()

### Confusion matrix - test data

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:test",
    inputs={"dataset": vm_test_ds, "model": vm_model},
).log()
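Because k-means assigns arbitrary integer IDs to clusters, comparing manual segments with predicted clusters is easiest to read as a cross-tabulation, where a good match shows one dominant cell per row. The sketch below uses hypothetical segment labels and also computes cluster purity, a common summary of such a table; it is not necessarily how the `ConfusionMatrix` test treats cluster IDs.

```python
import pandas as pd

# Hypothetical manual segments and predicted cluster IDs for five clients
manual = pd.Series(["retail", "retail", "corporate", "corporate", "corporate"], name="manual_segment")
cluster = pd.Series([1, 1, 0, 0, 1], name="cluster_id")

# Rows: manual segments, columns: cluster IDs, cells: client counts
table = pd.crosstab(manual, cluster)

# Purity: for each cluster take its most common manual segment, then divide by all rows
purity = table.max(axis=0).sum() / table.to_numpy().sum()
print(table)
print(f"purity = {purity:.2f}")
```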

### Hyperparameter tuning

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.HyperParametersTuning",
    inputs={"dataset": vm_train_ds, "model": vm_model},
    params={"param_grid": {"n_clusters": range(3, 6)}},
).log()

### Cluster performance metrics

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ClusterPerformanceMetrics",
    inputs={"datasets": (vm_train_ds, vm_test_ds), "model": vm_model},
).log()
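Label-based clustering metrics compare predicted clusters with reference labels while ignoring how the clusters are numbered. As an assumption about the kind of metrics a clustering performance test reports, the sketch below computes several standard scikit-learn scores on a relabeled but otherwise identical grouping; all of them equal 1.0 up to floating-point rounding.

```python
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    v_measure_score,
)

labels_true = [0, 0, 1, 1, 2, 2]
labels_pred = [1, 1, 0, 0, 2, 2]  # same grouping, different cluster IDs

scores = {
    "homogeneity": homogeneity_score(labels_true, labels_pred),
    "completeness": completeness_score(labels_true, labels_pred),
    "v_measure": v_measure_score(labels_true, labels_pred),
    "adjusted_rand": adjusted_rand_score(labels_true, labels_pred),
}
print(scores)
```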

### Optimal number of clusters


In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.KMeansClustersOptimization",
    inputs={"dataset": vm_train_ds, "model": vm_model},
    params={
        "n_clusters": range(2, 8),
    },
).log()
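Choosing the number of clusters is commonly done by fitting k-means for a range of `k` and comparing inertia (the elbow method) and silhouette scores. The sketch below does this on three synthetic, well-separated blobs, where the silhouette score should peak at `k = 3`; it illustrates the idea rather than reproducing the `KMeansClustersOptimization` test.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Three synthetic, well-separated blobs of 40 points each
centers = [(0, 0), (8, 0), (0, 8)]
X = np.vstack([rng.normal(c, 0.8, (40, 2)) for c in centers])

results = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=4, random_state=0).fit(X)
    results[k] = {
        "inertia": km.inertia_,  # within-cluster sum of squares, for an elbow plot
        "silhouette": silhouette_score(X, km.labels_),  # higher is better
    }

best_k = max(results, key=lambda k: results[k]["silhouette"])
print(f"best k by silhouette: {best_k}")
```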

## Operational deposit model

### Operational deposit model computation

In [None]:
operational_deposit_df = raw_df.copy()
target_column = "cust_ipid_nm"


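# eod_outflow_ratio
# Step 4: Statistical analysis
def calculate_eod_outflow_ratio(df):
    # Ratio of end-of-day balance to total outflow. Rows where Total_Outflow
    # is 0 yield inf (or NaN when EOD is also 0); dropna() does not remove inf.
    df["eod_outflow_ratio"] = df["EOD"] / df["Total_Outflow"]

    return df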


operational_deposit_df = calculate_eod_outflow_ratio(operational_deposit_df)


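# Step 5: Model implementation
def rolling_average(df, window=30):
    # Per-customer rolling means over `window` rows. The window is row-based,
    # so rows are assumed to be ordered by bal_date within each customer;
    # the first window-1 rows of each customer are NaN.
    df["rolling_eod_balance"] = (
        df.groupby("cust_ipid_nm")["EOD"]
        .rolling(window=window)
        .mean()
        .reset_index(level=0, drop=True)
    )
    df["rolling_daily_outflow"] = (
        df.groupby("cust_ipid_nm")["Total_Outflow"]
        .rolling(window=window)
        .mean()
        .reset_index(level=0, drop=True)
    )
    return df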


operational_deposit_df = rolling_average(operational_deposit_df)

# # Step 6: Output Generation
# def generate_outputs(df):
#     output_df = df.groupby(['cust_ipid_nm', 'subclient']).agg({
#         'rolling_eod_balance': 'last',
#         'rolling_daily_outflow': 'last'
#     }).reset_index()
#     output_df['operational_core'] = output_df['rolling_eod_balance'] / output_df['rolling_daily_outflow']
#     return output_df

# raw_df = generate_outputs(raw_df)
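`rolling(window=30)` counts rows, not days, and depends on the row order within each customer, so it matches a 30-day average only if each customer has one row per day sorted by `bal_date`. The toy sketch below, with made-up values, sorts by date first and converts infinite ratios from zero outflows to `NaN` so that `dropna()` can remove them; both steps are suggestions, not part of the original computation. It uses a 2-row window to keep the data small.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; rows are deliberately out of date order
df = pd.DataFrame({
    "cust_ipid_nm": ["A", "A", "A", "B", "B", "B"],
    "bal_date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"] * 2),
    "EOD": [30.0, 10.0, 20.0, 5.0, 5.0, 5.0],
    "Total_Outflow": [3.0, 1.0, 2.0, 0.0, 1.0, 1.0],
})

# Order rows by date within each customer before any row-based window
df = df.sort_values(["cust_ipid_nm", "bal_date"])

# Zero outflow gives inf; turn it into NaN so dropna() removes it later
df["eod_outflow_ratio"] = (df["EOD"] / df["Total_Outflow"]).replace([np.inf, -np.inf], np.nan)

# Same groupby/rolling pattern as rolling_average(), with a 2-row window
df["rolling_eod_balance"] = (
    df.groupby("cust_ipid_nm")["EOD"]
    .rolling(window=2)
    .mean()
    .reset_index(level=0, drop=True)
)
print(df)
```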

### Prepare VM dataset for the model

In [None]:
# Drop rows with missing values, including the first window-1 rows of each
# customer, where the rolling means are undefined
operational_deposit_df = operational_deposit_df.dropna()

x_train = operational_deposit_df.drop(target_column, axis=1)
y_train = operational_deposit_df[target_column]

vm_od_ds = vm.init_dataset(
    dataset=operational_deposit_df, input_id="od_dataset", target_column="cust_ipid_nm"
)

### VM model
ValidMind lets you wrap any prediction logic as a model to suit the use case. Here the model is simple: its prediction is the value of the `rolling_daily_outflow` column.

In [None]:
def operational_deposit(row):
    # The "prediction" is simply the rolling daily outflow for this row
    return row["rolling_daily_outflow"]


vm_od_model = vm.init_model(
    input_id="operational_deposit", predict_fn=operational_deposit
)
vm_od_ds.assign_predictions(
    model=vm_od_model, prediction_column="rolling_daily_outflow"
)
print(vm_od_ds)
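The exact calling convention of `predict_fn` is not shown in this notebook. Assuming ValidMind calls it once per row with a dict-like record, the function reduces to a column lookup, which the sketch below checks with plain pandas on toy data. Because `assign_predictions()` is given `prediction_column`, the existing column values appear to be used directly as predictions.

```python
import pandas as pd


def operational_deposit(row):
    # Assumption: `row` is one record of the dataset, keyed by column name
    return row["rolling_daily_outflow"]


# Toy data with made-up values
df = pd.DataFrame({"rolling_daily_outflow": [1.5, 2.0], "EOD": [10.0, 20.0]})

# Apply the function row by row, the way a per-row predict_fn would be used
preds = [operational_deposit(row) for row in df.to_dict("records")]
print(preds)
```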

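### External test provider

Register the local `tests` folder as a test provider under the `demo_test_provider` namespace, so that the custom tests it contains can be run by ID: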

In [None]:
from validmind.tests import LocalTestProvider

tests_folder = "tests"
# Initialize the test provider with the local `tests` folder that holds the custom test files
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="demo_test_provider",
    test_provider=my_test_provider,
)
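The source of the custom `TimeseriesGroupbyPlot` test in the `tests` folder is not included in this notebook. As a hypothetical sketch of the data shaping such a test needs, the helper below pivots the dataset so that each group becomes one column indexed by date, summing `y_column` when a group has several rows on the same date (an assumption). Calling `.plot()` on the result would draw one line per group.

```python
import pandas as pd


def timeseries_groupby_frame(df, date_column, groupby_column, y_column):
    """Hypothetical helper: one column per group, indexed by date, ready to plot."""
    return df.pivot_table(
        index=date_column, columns=groupby_column, values=y_column, aggfunc="sum"
    ).sort_index()


# Toy data with made-up values: client X has two rows on 2024-01-02
toy = pd.DataFrame({
    "bal_date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "client": ["X", "Y", "X", "X"],
    "Total_Outflow": [1.0, 2.0, 3.0, 4.0],
})
wide = timeseries_groupby_frame(toy, "bal_date", "client", "Total_Outflow")
print(wide)
```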

### Simple custom test
Let's plot a time series line chart of one column, grouped by another column in the dataset:

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "cust_ipid_nm",
        "y_column": "Total_Outflow",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:Total_Outflow",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "client",
        "y_column": "Total_Outflow",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:eod_outflow_ratio",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "eod_outflow_ratio",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:rolling_eod_balance",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "rolling_eod_balance",
    },
).log()

<a id='toc8_'></a>

## Where to go from here

In this notebook you have learned the end-to-end process to document a model with the ValidMind Developer Framework, running through some very common scenarios in a typical model development setting:

- Running out-of-the-box data quality and model tests
- Documenting your model by logging test results as evidence in the model documentation
- Extending the capabilities of the Developer Framework by implementing custom tests and registering them through an external test provider

As a next step, you can explore the following notebooks to get a deeper understanding of how the developer framework allows you to generate model documentation for any use case:

<a id='toc8_1_'></a>

### Use cases

- [Application scorecard demo](../code_samples/credit_risk/application_scorecard_demo.ipynb)
- [Linear regression documentation demo](../code_samples/regression/quickstart_regression_full_suite.ipynb)
- [LLM model documentation demo](../code_samples/nlp_and_llm/foundation_models_integration_demo.ipynb)

<a id='toc8_2_'></a>

### More how-to guides and code samples

- [Explore available tests in detail](../how_to/explore_tests.ipynb)
- [In-depth guide for implementing custom tests](../code_samples/custom_tests/implement_custom_tests.ipynb)
- [In-depth guide to external test providers](../code_samples/custom_tests/integrate_external_test_providers.ipynb)
- [Configuring dataset features](../how_to/configure_dataset_features.ipynb)
- [Introduction to unit and composite metrics](../how_to/run_unit_metrics.ipynb)

<a id='toc8_3_'></a>

### Discover more learning resources

All notebook samples can be found in the following directories of the Developer Framework GitHub repository:

- [Code samples](https://github.com/validmind/developer-framework/tree/main/notebooks/code_samples)
- [How-to guides](https://github.com/validmind/developer-framework/tree/main/notebooks/how_to)
