# Operational Deposit Model Documentation Demo

::: {.content-hidden when-format="html"}
## Contents    
- [About ValidMind](#toc1__)    
  - [Before you begin](#toc1_1__)    
  - [New to ValidMind?](#toc1_2__)    
- [Initialize the ValidMind Library](#toc2__)    
  - [Register sample model](#toc2_1__)    
  - [Apply documentation template](#toc2_2__)    
  - [Get your code snippet](#toc2_3__)    
- [Data preparation](#toc3__)    
- [ValidMind objects](#toc4__)    
- [Data Validation](#toc5__)    
  - [Dataset summary](#toc5_1__)    
  - [Duplicates](#toc5_2__)    
  - [Missing values](#toc5_3__)    
  - [Unique rows](#toc5_4__)    
  - [High cardinality](#toc5_5__)    
  - [Skewness](#toc5_6__)    
  - [Zero Values](#toc5_7__)    
  - [Descriptive statistics](#toc5_8__)    
  - [High Pearson correlation](#toc5_9__)    
  - [Pearson correlation matrix](#toc5_10__)    
- [Segmentation of clients](#toc6__)    
  - [Clustering](#toc6_1__)    
  - [Prediction](#toc6_2__)    
  - [Compare Manual vs predicted](#toc6_3__)    
  - [Confusion matrix - test data](#toc6_4__)    
  - [Hyper parameter tuning](#toc6_5__)    
  - [Cluster performance metrics](#toc6_6__)    
  - [No of clusters optimization](#toc6_7__)    
- [Operational deposit model](#toc7__)    
  - [Operational deposit model computation](#toc7_1__)    
  - [Prepare VM dataset for the model](#toc7_2__)    
  - [VM model](#toc7_3__)    
  - [External test provider](#toc7_4__)    
  - [Simple custom test](#toc7_5__)    
- [Where to go from here](#toc8__)    
  - [Use cases](#toc8_1__)    
  - [More how-to guides and code samples](#toc8_2__)    
  - [Discover more learning resources](#toc8_3__)    
- [Upgrade ValidMind](#toc9__)    

:::
<!-- jn-toc-notebook-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=4
	/jn-toc-notebook-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

<a id='toc1__'></a>

## About ValidMind

ValidMind is a suite of tools for managing model risk, including risk associated with AI and statistical models.

You use the ValidMind Library to automate documentation and validation tests, and then use the ValidMind Platform to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

<a id='toc1_1__'></a>

### Before you begin

This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook, but we recommend further familiarizing yourself with the language.

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

<a id='toc1_2__'></a>

### New to ValidMind?

If you haven't already seen our documentation on the [ValidMind Library](https://docs.validmind.ai/developer/validmind-library.html), we recommend you begin by exploring the available resources in this section. There, you can learn more about documenting models and running tests, as well as find code samples and our Python Library API reference.

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;"><span style="color: #083E44;"><b>For access to all features available in this notebook, you'll need access to a ValidMind account.</b></span>
<br></br>
<a href="https://docs.validmind.ai/guide/configuration/register-with-validmind.html" style="color: #DE257E;"><b>Register with ValidMind</b></a></div>

![Dataset based test architecture](./dataset_image.png)
![Model based test architecture](./model_image.png)

# Pre-requisites

Let's go ahead and install the `validmind` library if it's not already installed.

In [None]:
%pip install -q validmind

In [None]:
import os

os.environ["VM_OVERRIDE_METADATA"] = "true"
os.environ["VALIDMIND_LLM_DESCRIPTIONS_ENABLED"] = "true"

<a id='toc2__'></a>

## Initialize the ValidMind Library

<a id='toc2_1__'></a>

### Register sample model

Let's first register a sample model for use with this notebook:

1. In a browser, [log in to ValidMind](https://docs.validmind.ai/guide/configuration/log-in-to-validmind.html).

2. In the left sidebar, navigate to **Inventory** and click **+ Register Model**.

3. Enter the model details and click **Next >** to continue to assignment of model stakeholders. ([Need more help?](https://docs.validmind.ai/guide/model-inventory/register-models-in-inventory.html))

   For example, to register a model for use with this notebook, select the following use case: `Credit Risk - Underwriting - Loans`

4. Select your own name under the **MODEL OWNER** drop-down.

5. Click **Register Model** to add the model to your inventory.

<a id='toc2_2__'></a>

### Apply documentation template

Once you've registered your model, let's select a documentation template. A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.

1. In the left sidebar that appears for your model, click **Documents** and select **Documentation**.

2. Under **[template]{.smallcaps}**, select `Time Series Forecasting`.

3. Click **Use Template** to apply the template.

<a id='toc2_3__'></a>

### Get your code snippet

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the ValidMind Library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

1. On the left sidebar that appears for your model, select **Getting Started** and click **Copy snippet to clipboard**.
2. Next, [load your model identifier credentials from an `.env` file](https://docs.validmind.ai/developer/model-documentation/store-credentials-in-env-file.html) or replace the placeholder with your own code snippet:

In [None]:
# Load your model identifier credentials from an `.env` file

%load_ext dotenv
%dotenv .env

# Or replace with your code snippet

import validmind as vm

vm.init(
    # api_host="...",
    # api_key="...",
    # api_secret="...",
    # model="...",
)
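
If you take the `.env` route, a quick check that the expected variables are present can save a confusing failure at `vm.init()`. The following is a minimal sketch, assuming the variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, and `VM_API_MODEL`; adjust them to match your own `.env` file. `missing_credentials` is a hypothetical helper, not part of the ValidMind Library:

```python
import os

# Assumed variable names; change these to match your own .env file.
REQUIRED_VARS = ("VM_API_HOST", "VM_API_KEY", "VM_API_SECRET", "VM_API_MODEL")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing credentials: {', '.join(missing)}")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching your real environment.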

Before learning how to run tests, let's explore the list of all available tests in the ValidMind Library. You can see that the documentation template for this model has references to some of the test IDs listed below.

In [None]:
vm.tests.list_tests()

Let's do some data quality assessments by running a few individual tests related to data assessment. You will use the `vm.tests.list_tests()` function introduced above in combination with `vm.tests.list_tags()` and `vm.tests.list_tasks()` to find which prebuilt tests are relevant for data quality assessment.

In [None]:
# Get the list of available tags
sorted(vm.tests.list_tags())

In [None]:
# Get the list of available tasks
sorted(vm.tests.list_tasks())

You can pass `tags` and `task` as parameters to the `vm.tests.list_tests()` function to filter the tests based on the tags and task types. For example, to find tests related to tabular data quality for classification models, you can call `list_tests()` like this:

In [None]:
vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])

<a id='toc3__'></a>

## Data preparation

In [None]:
import pandas as pd

raw_df = pd.read_csv("./datasets/odm_data_example/synthetic_data.csv")
print(f"Columns {list(raw_df.columns)}")
print(f"Size {list(raw_df.shape)}")

raw_df.head(4)

# Data validation

Now that we have loaded our dataset, we can go ahead and run some data validation tests right away to start assessing and documenting the quality of our data. Since we are using a tabular dataset, we can use ValidMind's built-in data validation tests to check for duplicates, missing values, high cardinality, skewness, and other common data quality issues.

<a id='toc4__'></a>

## ValidMind objects

In [None]:
vm_raw_ds = vm.init_dataset(
    dataset=raw_df, input_id="raw_dataset", target_column="cust_ipid_nm"
)

<a id='toc5__'></a>

## Data Validation

In [None]:
vm.tests.list_tests(filter="data_validation")

<a id='toc5_1__'></a>

### Dataset summary

In [None]:
vm_ds_summary = vm.init_dataset(
    dataset=raw_df.drop("bal_date", axis=1),
    input_id="raw_dataset",
    target_column="cust_ipid_nm",
)
result = vm.tests.run_test(
    "validmind.data_validation.DatasetDescription", dataset=vm_ds_summary
).log()

<a id='toc5_2__'></a>

### Duplicates

First, let's check for duplicates in our dataset. We can use the `validmind.data_validation.Duplicates` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.Duplicates", dataset=vm_raw_ds
).log()

<a id='toc5_3__'></a>

### Missing values

Next, let's check for missing values in our dataset. We can use the `validmind.data_validation.MissingValues` test and pass our dataset:

In [None]:
result = vm.tests.run_test("validmind.data_validation.MissingValues", dataset=vm_raw_ds)
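
To get a feel for what the duplicate and missing-value checks above look at, here is a small pandas sketch on a hypothetical toy frame. It only illustrates the underlying quantities; the ValidMind tests may compute and threshold them differently:

```python
import pandas as pd

# Hypothetical toy frame; the column names mirror the demo data.
toy = pd.DataFrame(
    {
        "cust_ipid_nm": ["A", "A", "B", "B"],
        "EOD": [10.0, 10.0, None, 5.0],
    }
)

# Rows that are identical to an earlier row
n_duplicates = int(toy.duplicated().sum())

# Fraction of missing values in each column
missing_share = toy.isna().mean()

print(n_duplicates)
print(missing_share)
```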

<a id='toc5_4__'></a>

### Unique rows

Next, let's check for unique rows in our dataset. We can use the `validmind.data_validation.UniqueRows` test and pass our dataset:

In [None]:
result = vm.tests.run_test("validmind.data_validation.UniqueRows", dataset=vm_raw_ds)

<a id='toc5_5__'></a>

### High cardinality

Next, let's check for high cardinality in our dataset. We can use the `validmind.data_validation.HighCardinality` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.HighCardinality", dataset=vm_raw_ds
).log()

<a id='toc5_6__'></a>

### Skewness

Next, let's check for skewness in our dataset. We can use the `validmind.data_validation.Skewness` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.Skewness", dataset=vm_raw_ds
).log()
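
For reference, skewness measures how asymmetric a distribution is. The sketch below computes the population coefficient, the third central moment divided by the variance raised to the power 1.5. The helper name `skewness` and the sample values are illustrative, and the ValidMind test may use a different estimator or threshold:

```python
def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant column: no asymmetry to measure
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness(symmetric))
print(skewness(right_tailed))
```

A value near zero indicates a roughly symmetric column, while a large positive value points to a long right tail, which is common for balances and outflows.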

<a id='toc5_7__'></a>

### Zero Values

Next, let's check for zero values in our dataset. We can use the `validmind.data_validation.TooManyZeroValues` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.TooManyZeroValues", dataset=vm_raw_ds
).log()

<a id='toc5_8__'></a>

### Descriptive statistics

Next, let's check statistics of our dataset. We can use the `validmind.data_validation.DescriptiveStatistics` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.DescriptiveStatistics", dataset=vm_raw_ds
).log()

<a id='toc5_9__'></a>

### High Pearson correlation

Next, let's check the Pearson correlation of our dataset. We can use the `validmind.data_validation.HighPearsonCorrelation` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.HighPearsonCorrelation", dataset=vm_raw_ds
)
result.log()
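
As a reminder of what is being measured, the Pearson coefficient is the covariance of two columns divided by the product of their standard deviations. Here is a dependency-free sketch with hypothetical values; the `pearson` helper is illustrative and not part of the ValidMind Library:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns: outflow is roughly proportional to balance.
balance = [100.0, 200.0, 300.0, 400.0]
outflow = [12.0, 19.0, 33.0, 41.0]

r = pearson(balance, outflow)
print(f"r = {r:.3f}")
```

Pairs of features with a coefficient close to 1 or -1 carry largely the same information, which is why they are worth reviewing before modeling.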

<a id='toc5_10__'></a>

### Pearson correlation matrix

Next, let's check the Pearson correlation matrix of our dataset. We can use the `validmind.data_validation.PearsonCorrelationMatrix` test and pass our dataset:

In [None]:
result = vm.tests.run_test(
    "validmind.data_validation.PearsonCorrelationMatrix", dataset=vm_raw_ds
).log()

<a id='toc6__'></a>

## Segmentation of clients

In [None]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

In [None]:
cluster_df = raw_df.drop(
    columns=[
        "LOB_data",
        "cust_id",
        "bal_date",
        "ult_parent_cust_ipid_no",
        "ult_parent_cust_nm",
        "client",
        "subclient",
    ],
    axis=1,
)
target_column = "cust_ipid_nm"
cluster_df.head(2)

<a id='toc6_1__'></a>

### Clustering
Let's build a KMeans model:

In [None]:
from validmind.datasets.cluster import digits as demo_dataset

cluster_df = cluster_df.dropna()
train_df, validation_df, test_df = demo_dataset.preprocess(cluster_df)

x_train = train_df.drop(target_column, axis=1)
y_train = train_df[target_column]
x_val = validation_df.drop(target_column, axis=1)
y_val = validation_df[target_column]
x_test = test_df.drop(target_column, axis=1)
y_test = test_df[target_column]


x_train = pd.concat([x_train, x_val], axis=0)
y_train = pd.concat([y_train, y_val], axis=0)

scale = False
if scale:
    # Fit the scaler on the training data only, then reuse it for the other splits
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)
    x_test = scaler.transform(x_test)


n_clusters = 4
model = KMeans(init="k-means++", n_clusters=n_clusters, n_init=4)  # random_state=0
model = model.fit(x_train)
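
When `scale` is enabled, the scaling parameters should come from the training data and be reused for every other split, so that all splits share one coordinate system for the distance computations in KMeans. A dependency-free sketch of that idea follows; the helper names are illustrative, and `StandardScaler` offers the same behaviour through `fit` and `transform`:

```python
import statistics


def fit_scaler(column):
    """Learn the mean and (population) standard deviation from training data."""
    return statistics.fmean(column), statistics.pstdev(column)


def apply_scaler(column, mean, std):
    """Standardize a column with previously learned parameters."""
    return [(v - mean) / std for v in column]


train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

mean, std = fit_scaler(train)
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)  # reuse the training parameters
print(train_scaled, test_scaled)
```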

Let's prepare the VM dataset objects:

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df, target_column=target_column, input_id="training_seg_dataset"
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, target_column=target_column, input_id="test_seg_dataset"
)

In [None]:
vm_model = vm.init_model(model, input_id="kmean_model")

<a id='toc6_2__'></a>

### Prediction
Prediction values can be attached to a dataset with the `assign_predictions()` interface.

In [None]:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)

<a id='toc6_3__'></a>

### Compare Manual vs predicted

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:training",
    inputs={"dataset": vm_train_ds, "model": vm_model},
).log()

<a id='toc6_4__'></a>

### Confusion matrix - test data

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:test",
    inputs={"dataset": vm_test_ds, "model": vm_model},
).log()
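
KMeans cluster ids are arbitrary integers, so comparing them with manually assigned labels amounts to cross-tabulating the two. A minimal sketch with hypothetical labels and cluster ids, using only the standard library:

```python
from collections import Counter

# Hypothetical manual segments and cluster ids for six clients.
manual = ["retail", "retail", "corporate", "corporate", "corporate", "retail"]
predicted = [0, 0, 1, 1, 0, 0]

# Count every (manual label, cluster id) pair.
pair_counts = Counter(zip(manual, predicted))

labels = sorted(set(manual))
clusters = sorted(set(predicted))
for label in labels:
    # One row per manual label, one entry per cluster id
    row = [pair_counts[(label, c)] for c in clusters]
    print(label, row)
```

A clean segmentation shows each manual label concentrated in a single cluster column, regardless of which integer that cluster happens to receive.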

<a id='toc6_5__'></a>

### Hyper parameter tuning

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.HyperParametersTuning",
    inputs={"dataset": vm_train_ds, "model": vm_model},
    params={"param_grid": {"n_clusters": range(3, 6)}},
).log()

<a id='toc6_6__'></a>

### Cluster performance metrics

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ClusterPerformanceMetrics",
    inputs={"datasets": (vm_train_ds, vm_test_ds), "model": vm_model},
).log()

<a id='toc6_7__'></a>

### No of clusters optimization

In [None]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.KMeansClustersOptimization",
    inputs={"dataset": vm_train_ds, "model": vm_model},
    params={
        "n_clusters": range(2, 8),
    },
).log()

<a id='toc7__'></a>

## Operational deposit model

<a id='toc7_1__'></a>

### Operational deposit model computation

In [None]:
operational_deposit_df = raw_df.copy()
target_column = "cust_ipid_nm"


# Eod_outflow_ratio
# Step 4: Statistical Analysis
def calculate_eod_outflow_ratio(df):
    df["eod_outflow_ratio"] = df["EOD"] / df["Total_Outflow"]

    return df


operational_deposit_df = calculate_eod_outflow_ratio(operational_deposit_df)


# Step 5: Model Implementation
def rolling_average(df, window=30):
    df["rolling_eod_balance"] = (
        df.groupby("cust_ipid_nm")["EOD"]
        .rolling(window=window)
        .mean()
        .reset_index(level=0, drop=True)
    )
    df["rolling_daily_outflow"] = (
        df.groupby("cust_ipid_nm")["Total_Outflow"]
        .rolling(window=window)
        .mean()
        .reset_index(level=0, drop=True)
    )
    return df


operational_deposit_df = rolling_average(operational_deposit_df)

# # Step 6: Output Generation
# def generate_outputs(df):
#     output_df = df.groupby(['cust_ipid_nm', 'subclient']).agg({
#         'rolling_eod_balance': 'last',
#         'rolling_daily_outflow': 'last'
#     }).reset_index()
#     output_df['operational_core'] = output_df['rolling_eod_balance'] / output_df['rolling_daily_outflow']
#     return output_df

# raw_df = generate_outputs(raw_df)
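
Two details of the computation above are easy to miss: a zero `Total_Outflow` yields an infinite ratio that `dropna()` does not remove, and the rolling mean is undefined until a customer has a full window of rows. The sketch below illustrates both on hypothetical toy data, with `window=2` instead of 30 for readability. Replacing zero outflows with `NaN` is one possible handling, shown here as an assumption rather than as the notebook's method:

```python
import pandas as pd

# Hypothetical toy data, assumed to be sorted by date within each customer.
toy = pd.DataFrame(
    {
        "cust_ipid_nm": ["A", "A", "A", "B", "B", "B"],
        "EOD": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
        "Total_Outflow": [5.0, 0.0, 10.0, 1.0, 1.0, 2.0],
    }
)

# Turn zero outflows into NaN so the ratio is missing rather than infinite.
safe_outflow = toy["Total_Outflow"].replace(0, float("nan"))
toy["eod_outflow_ratio"] = toy["EOD"] / safe_outflow

# With window=2, the first row of every customer has no full window yet.
toy["rolling_eod_balance"] = (
    toy.groupby("cust_ipid_nm")["EOD"]
    .rolling(window=2)
    .mean()
    .reset_index(level=0, drop=True)
)

print(toy)
print(len(toy.dropna()))
```

With the default `window=30`, each customer's first 29 rows have no rolling value, so a later `dropna()` removes them from the dataset.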

<a id='toc7_2__'></a>

### Prepare VM dataset for the model

In [None]:
from validmind.datasets.cluster import digits as demo_dataset

operational_deposit_df = operational_deposit_df.dropna()

x_train = operational_deposit_df.drop(target_column, axis=1)
y_train = operational_deposit_df[target_column]

vm_od_ds = vm.init_dataset(
    dataset=operational_deposit_df, input_id="od_dataset", target_column="cust_ipid_nm"
)

<a id='toc7_3__'></a>

### VM model
ValidMind lets you define a model to suit your use case. Here the model is deliberately simple: the prediction is the value of the `rolling_daily_outflow` column.

In [None]:
def operational_deposit(input):

    return input["rolling_daily_outflow"]


vm_od_model = vm.init_model(
    input_id="operational_deposit", predict_fn=operational_deposit
)
vm_od_ds.assign_predictions(
    model=vm_od_model, prediction_column="rolling_daily_outflow"
)
print(vm_od_ds)

<a id='toc7_4__'></a>

### External test provider

In [None]:
from validmind.tests import LocalTestProvider

tests_folder = "tests"
# Initialize the test provider with the local folder that holds the custom tests
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="demo_test_provider",
    test_provider=my_test_provider,
)

<a id='toc7_5__'></a>

### Simple custom test
Let's draw a time series line plot, grouped by a specific column in the dataset:

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "cust_ipid_nm",
        "y_column": "Total_Outflow",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:Total_Outflow",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "client",
        "y_column": "Total_Outflow",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:eod_outflow_ratio",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "eod_outflow_ratio",
    },
).log()

In [None]:
from validmind.tests import run_test

result = run_test(
    "demo_test_provider.TimeseriesGroupbyPlot:rolling_eod_balance",
    inputs={"dataset": vm_od_ds, "model": vm_od_model},
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "rolling_eod_balance",
    },
).log()

<a id='toc8__'></a>

## Where to go from here

In this notebook you have learned the end-to-end process to document a model with the ValidMind Library, running through some very common scenarios in a typical model development setting:

- Running out-of-the-box tests
- Documenting your model by adding evidence to model documentation
- Extending the capabilities of the ValidMind Library by implementing custom tests
- Ensuring that the documentation is complete by running all tests in the documentation template

As a next step, you can explore the following notebooks to get a deeper understanding of how the ValidMind Library allows you to generate model documentation for any use case:

<a id='toc8_1__'></a>

### Use cases

- [Application scorecard demo](../code_samples/credit_risk/application_scorecard_demo.ipynb)
- [Linear regression documentation demo](../code_samples/regression/quickstart_regression_full_suite.ipynb)
- [LLM model documentation demo](../code_samples/nlp_and_llm/foundation_models_integration_demo.ipynb)

<a id='toc8_2__'></a>

### More how-to guides and code samples

- [Explore available tests in detail](../how_to/explore_tests.ipynb)
- [In-depth guide for implementing custom tests](../code_samples/custom_tests/implement_custom_tests.ipynb)
- [In-depth guide to external test providers](../code_samples/custom_tests/integrate_external_test_providers.ipynb)
- [Configuring dataset features](../how_to/configure_dataset_features.ipynb)
- [Introduction to unit and composite metrics](../how_to/run_unit_metrics.ipynb)

<a id='toc8_3__'></a>

### Discover more learning resources

All notebook samples can be found in the following directories of the ValidMind Library GitHub repository:

- [Code samples](https://github.com/validmind/validmind-library/tree/main/notebooks/code_samples)
- [How-to guides](https://github.com/validmind/validmind-library/tree/main/notebooks/how_to)

<a id='toc9__'></a>

## Upgrade ValidMind

<div class="alert alert-block alert-info" style="background-color: #B5B5B510; color: black; border: 1px solid #083E44; border-left-width: 5px; box-shadow: 2px 2px 4px rgba(0, 0, 0, 0.2);border-radius: 5px;">After installing ValidMind, you’ll want to periodically make sure you are on the latest version to access any new features and other enhancements.</div>

Retrieve the information for the currently installed version of ValidMind:

In [None]:
%pip show validmind

If the version returned is lower than the version indicated in our [production open-source code](https://github.com/validmind/validmind-library/blob/prod/validmind/__version__.py), restart your notebook and run:

```bash
%pip install --upgrade validmind
```

You may need to restart your kernel after upgrading the package for the changes to be applied.