# Operational Deposit Model Documentation Demo

<a id='toc2_'></a>

## About ValidMind

ValidMind is a platform for managing model risk, including risk associated with AI and statistical models.

You use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

<a id='toc2_1_'></a>

### Before you begin

This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook but we recommend further familiarizing yourself with the language. 

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

<a id='toc2_2_'></a>

### New to ValidMind?

If you haven't already seen our [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html), we recommend you explore the available resources for developers at some point. There, you can learn more about documenting models, find code samples, or read our developer reference.

::: {.callout-tip}

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — [**Sign up now!**](https://app.prod.validmind.ai)

:::

<a id='toc2_3_'></a>

![Dataset based test architecture](./dataset_image.png)
![Model based test architecture](./model_image.png)

# Pre-requisites

Let's go ahead and install the `validmind` library if it's not already installed.

In [1]:
# %pip install -q validmind

In [2]:
import os
os.environ["VM_OVERRIDE_METADATA"] = "true"
os.environ["VALIDMIND_LLM_DESCRIPTIONS_ENABLED"] = "true"

<a id='toc4_'></a>

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Time Series Forecasting** as the template and **Credit Risk - Underwriting - Loan** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [3]:
import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "f316498929be8215fef76fcb3e814194",
  api_secret = "be2e1476c8f7f3837bf142f2a1b7ba60e6c1cdbfd42930bbfb7aa40f42691868",
  project = "clwhu1lol00pz23iio6n4w8k4"
)

2024-05-23 16:52:02,511 - INFO(validmind.api_client): Connected to ValidMind. Project: Operational deposit model - Initial Validation (clwhu1lol00pz23iio6n4w8k4)


Before learning how to run tests, let's explore the list of all available tests in the ValidMind Developer Framework. You can see that the documentation template for this model has references to some of the test IDs listed below.


In [4]:
vm.tests.list_tests()

ID,Name,Test Type,Description,Required Inputs,Params
validmind.prompt_validation.Bias,Bias,ThresholdTest,Evaluates bias in a Large Language Model based on the order and distribution of exemplars in a prompt....,['model.prompt'],{'min_threshold': 7}
validmind.prompt_validation.Clarity,Clarity,ThresholdTest,Evaluates and scores the clarity of prompts in a Large Language Model based on specified guidelines....,['model.prompt'],{'min_threshold': 7}
validmind.prompt_validation.Specificity,Specificity,ThresholdTest,"Evaluates and scores the specificity of prompts provided to a Large Language Model (LLM), based on clarity,...",['model.prompt'],{'min_threshold': 7}
validmind.prompt_validation.Robustness,Robustness,ThresholdTest,Assesses the robustness of prompts provided to a Large Language Model under varying conditions and contexts....,['model'],{'num_tests': 10}
validmind.prompt_validation.NegativeInstruction,Negative Instruction,ThresholdTest,"Evaluates and grades the use of affirmative, proactive language over negative instructions in LLM prompts....",['model.prompt'],{'min_threshold': 7}
validmind.prompt_validation.Conciseness,Conciseness,ThresholdTest,Analyzes and grades the conciseness of prompts provided to a Large Language Model....,['model.prompt'],{'min_threshold': 7}
validmind.prompt_validation.Delimitation,Delimitation,ThresholdTest,Evaluates the proper use of delimiters in prompts provided to Large Language Models....,['model.prompt'],{'min_threshold': 7}
validmind.model_validation.BertScore,Bert Score,Metric,Evaluates the quality of machine-generated text using BERTScore metrics and visualizes the results through histograms...,"['dataset', 'model']",{}
validmind.model_validation.RegardScore,Regard Score,Metric,"Computes and visualizes the regard score for each text instance, assessing sentiment and potential biases....","['dataset', 'model']",{}
validmind.model_validation.BleuScore,Bleu Score,Metric,Evaluates the quality of machine-generated text using BLEU metrics and visualizes the results through histograms...,"['dataset', 'model']",{}


Next, let's find the prebuilt tests that are relevant for data quality assessment. You will use the `vm.tests.list_tests()` function introduced above in combination with `vm.tests.list_tags()` and `vm.tests.list_task_types()`.


In [5]:
# Get the list of available tags
sorted(vm.tests.list_tags())

['AUC',
 'anomaly_detection',
 'binary_classification',
 'categorical_data',
 'correlation',
 'credit_risk',
 'data_distribution',
 'data_quality',
 'data_validation',
 'feature_importance',
 'few_shot',
 'forecasting',
 'frequency_analysis',
 'kmeans',
 'llm',
 'logistic_regression',
 'model_comparison',
 'model_diagnosis',
 'model_interpretation',
 'model_metadata',
 'model_performance',
 'model_selection',
 'multiclass_classification',
 'nlp',
 'numerical_data',
 'qualitative',
 'rag_performance',
 'ragas',
 'retrieval_performance',
 'risk_analysis',
 'seasonality',
 'senstivity_analysis',
 'sklearn',
 'stationarity',
 'statistical_test',
 'statsmodels',
 'tabular_data',
 'text_data',
 'text_embeddings',
 'time_series_data',
 'unit_root_test',
 'visualization',
 'zero_shot']

In [6]:
# Get the list of available task types
sorted(vm.tests.list_task_types())

['classification',
 'clustering',
 'feature_extraction',
 'nlp',
 'regression',
 'text_classification',
 'text_generation',
 'text_qa',
 'text_summarization']

You can pass `task` and `tags` as parameters to the `vm.tests.list_tests()` function to filter the tests by task type and tag. For example, to find tests related to tabular data quality for classification models, you can call `list_tests()` like this:


In [7]:
vm.tests.list_tests(task="classification", tags=["tabular_data", "data_quality"])

ID,Name,Test Type,Description,Required Inputs,Params
validmind.data_validation.MissingValuesRisk,Missing Values Risk,Metric,Assesses and quantifies the risk related to missing values in a dataset used for training an ML model....,['dataset'],{}
validmind.data_validation.Skewness,Skewness,ThresholdTest,Evaluates the skewness of numerical data in a machine learning model and checks if it falls below a set maximum...,['dataset'],{'max_threshold': 1}
validmind.data_validation.Duplicates,Duplicates,ThresholdTest,"Tests dataset for duplicate entries, ensuring model reliability via data quality verification....",['dataset'],{'min_threshold': 1}
validmind.data_validation.MissingValuesBarPlot,Missing Values Bar Plot,Metric,Creates a bar plot showcasing the percentage of missing values in each column of the dataset with risk...,['dataset'],"{'threshold': 80, 'fig_height': 600}"
validmind.data_validation.HighCardinality,High Cardinality,ThresholdTest,Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting....,['dataset'],"{'num_threshold': 100, 'percent_threshold': 0.1, 'threshold_type': 'percent'}"
validmind.data_validation.MissingValues,Missing Values,ThresholdTest,Evaluates dataset quality by ensuring missing value ratio across all features does not exceed a set threshold....,['dataset'],{'min_threshold': 1}
validmind.data_validation.HighPearsonCorrelation,High Pearson Correlation,ThresholdTest,Identifies highly correlated feature pairs in a dataset suggesting feature redundancy or multicollinearity....,['dataset'],{'max_threshold': 0.3}


## Data preparation


In [8]:
import pandas as pd
raw_df = pd.read_csv("./datasets/odm_data_example/synthetic_data.csv")
print(f"Columns {list(raw_df.columns)}")
print(f"Size {list(raw_df.shape)}")

raw_df.head(4)

Columns ['bal_date', 'cust_id', 'LOB_data', 'cust_ipid_nm', 'ult_parent_cust_ipid_no', 'ult_parent_cust_nm', 'client', 'subclient', 'EOD', 'Total_Outflow', 'Total_Inflow', 'Total_Outflow_Volume', 'Total_Inflow_Volume']
Size [100000, 13]


Unnamed: 0,bal_date,cust_id,LOB_data,cust_ipid_nm,ult_parent_cust_ipid_no,ult_parent_cust_nm,client,subclient,EOD,Total_Outflow,Total_Inflow,Total_Outflow_Volume,Total_Inflow_Volume
0,09/20/22,193226329,ASSET SERVICING INTERNAL,1,849333769,Company of 1,BANK 8,INVESTMENT BANK,620588441,6045789727,6056785345,65,69
1,12/28/22,147845695,ASSET SERVICING INTERNAL,2,846403000,Company of 2,BANK 2,COMMERCIAL BANK,749542305,7668359015,7821262323,79,71
2,12/09/22,972808766,ASSET SERVICING INTERNAL,0,898029133,Company of 0,BANK 9,COMMERCIAL BANK,587273414,5734508268,5808149422,55,54
3,11/11/22,569050631,ASSET SERVICING INTERNAL,0,579834489,Company of 0,BANK 8,COMMERCIAL BANK,569466714,5938051746,5959450622,55,57


# Data validation

Now that we have loaded our dataset, we can run some data validation tests right away to start assessing and documenting the quality of our data. Since we are using a tabular dataset, we can use ValidMind's built-in tabular data quality tests to check for duplicates, missing values, high cardinality, skewness, and other common data issues.
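For intuition, the kinds of issues these tests look for can be spot-checked directly in pandas. The following is a minimal sketch on a made-up toy frame, not the ValidMind implementation; `quick_quality_summary` and `toy_df` are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical toy frame: column names mirror the demo dataset, values are made up.
toy_df = pd.DataFrame(
    {
        "cust_id": [1, 2, 2, 3],
        "EOD": [100.0, 250.0, 250.0, None],
        "Total_Outflow": [10.0, 0.0, 0.0, 40.0],
    }
)


def quick_quality_summary(df: pd.DataFrame) -> dict:
    """Return simple counts of common tabular data issues."""
    numeric = df.select_dtypes(include="number")
    return {
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Number of missing values per column
        "missing_by_column": df.isna().sum().to_dict(),
        # Share of zero values per numeric column
        "zero_share_by_column": (numeric == 0).mean().to_dict(),
    }


summary = quick_quality_summary(toy_df)
print(summary)
```

A summary like this is only a quick look at the data; the ValidMind tests below additionally apply thresholds and log the results to the model documentation.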

## ValidMind objects


In [9]:
vm_raw_ds = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_dataset",
    target_column="cust_ipid_nm"
)

2024-05-23 16:52:04,891 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


## Data Validation

In [10]:
vm.tests.list_tests(filter="data_validation")

ID,Name,Test Type,Description,Required Inputs,Params
validmind.data_validation.nlp.Hashtags,Hashtags,ThresholdTest,"Assesses hashtag frequency in a text column, highlighting usage trends and potential dataset bias or spam....","['dataset', 'dataset.text_column']",{'top_hashtags': 25}
validmind.data_validation.HighCardinality,High Cardinality,ThresholdTest,Assesses the number of unique values in categorical columns to detect high cardinality and potential overfitting....,['dataset'],"{'num_threshold': 100, 'percent_threshold': 0.1, 'threshold_type': 'percent'}"
validmind.data_validation.EngleGrangerCoint,Engle Granger Coint,Metric,Validates co-integration in pairs of time series data using the Engle-Granger test and classifies them as...,['dataset'],{'threshold': 0.05}
validmind.data_validation.nlp.TextDescription,Text Description,Metric,"Performs comprehensive textual analysis on a dataset using NLTK, evaluating various parameters and generating...","['dataset', 'dataset.text_column']","{'unwanted_tokens': {'s', 'us', 'dollar', ' ', ""'s"", 'dr', 'mr', '``', 'mrs', ""s'"", ""''"", 'ms'}, 'num_top_words': 3, 'lang': 'english'}"
validmind.data_validation.TooManyZeroValues,Too Many Zero Values,ThresholdTest,"Identifies numerical columns in a dataset that contain an excessive number of zero values, defined by a threshold...",['dataset'],{'max_percent_threshold': 0.03}
validmind.data_validation.TimeSeriesOutliers,Time Series Outliers,ThresholdTest,Identifies and visualizes outliers in time-series data using z-score method....,['dataset'],{'zscore_threshold': 3}
validmind.data_validation.ANOVAOneWayTable,ANOVA One Way Table,Metric,Applies one-way ANOVA (Analysis of Variance) to identify statistically significant numerical features in the...,['dataset'],"{'features': None, 'p_threshold': 0.05}"
validmind.data_validation.DescriptiveStatistics,Descriptive Statistics,Metric,Performs a detailed descriptive statistical analysis of both numerical and categorical data within a model's...,['dataset'],{}
validmind.data_validation.RollingStatsPlot,Rolling Stats Plot,Metric,This test evaluates the stationarity of time series data by plotting its rolling mean and standard deviation....,['dataset'],{'window_size': 12}
validmind.data_validation.nlp.PolarityAndSubjectivity,Polarity And Subjectivity,Metric,Analyzes the polarity and subjectivity of text data within a dataset....,['dataset'],{}


### Dataset summary

In [11]:
vm_ds_summary = vm.init_dataset(
    dataset=raw_df.drop('bal_date', axis=1),
    input_id="raw_dataset",
    target_column="cust_ipid_nm"
)
result = vm.tests.run_test(
    "validmind.data_validation.DatasetDescription",
    dataset=vm_ds_summary
).log()

2024-05-23 16:52:05,761 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


VBox(children=(HTML(value='<h1>Dataset Description</h1>'), HTML(value="<p><strong>Dataset Description</strong>…

### Duplicates

First, let's check for duplicates in our dataset. We can use the `validmind.data_validation.Duplicates` test and pass our dataset:

In [12]:
result = vm.tests.run_test(
    "validmind.data_validation.Duplicates",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='\n            <h1>Duplicates ✅</h1>\n            <p><strong>Duplicates</strong> cal…

### Missing values

Next, let's check for missing values in our dataset. We can use the `validmind.data_validation.MissingValues` test and pass our dataset:

In [13]:
result = vm.tests.run_test(
    "validmind.data_validation.MissingValues",
    dataset=vm_raw_ds
)

VBox(children=(HTML(value='\n            <h1>Missing Values ✅</h1>\n            <p><strong>Missing Values</str…

### Unique rows

Next, let's check for unique rows in our dataset. We can use the `validmind.data_validation.UniqueRows` test and pass our dataset:

In [14]:
result = vm.tests.run_test(
    "validmind.data_validation.UniqueRows",
    dataset=vm_raw_ds
)

VBox(children=(HTML(value='\n            <h1>Unique Rows ✅</h1>\n            <p><strong>Unique Rows</strong> c…

### High cardinality

Next, let's check for high cardinality in our dataset. We can use the `validmind.data_validation.HighCardinality` test and pass our dataset:

In [15]:
result = vm.tests.run_test(
    "validmind.data_validation.HighCardinality",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='\n            <h1>High Cardinality ✅</h1>\n            <p><strong>High Cardinality<…

### Skewness

Next, let's check for skewness in our dataset. We can use the `validmind.data_validation.Skewness` test and pass our dataset:

In [16]:
result = vm.tests.run_test(
    "validmind.data_validation.Skewness",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='\n            <h1>Skewness ✅</h1>\n            <p><strong>Skewness</strong> evaluat…
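To make the idea of a skewness threshold concrete, here is a minimal pandas sketch on hypothetical values. It assumes the threshold is compared against the absolute skewness of each numeric column, which may differ from how the ValidMind test applies it; the value `1` matches the `max_threshold` default shown in the test listing earlier.

```python
import pandas as pd

# Hypothetical values: one symmetric column and one strongly right-skewed column.
toy_df = pd.DataFrame(
    {
        "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0],
        "right_skewed": [1.0, 1.0, 1.0, 2.0, 50.0],
    }
)

max_threshold = 1  # assumption: compared against the absolute skewness

# Sample skewness per numeric column
skewness = toy_df.skew(numeric_only=True)

# Columns whose absolute skewness exceeds the threshold
flagged = skewness[skewness.abs() > max_threshold].index.tolist()

print(skewness)
print("Columns above threshold:", flagged)
```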

### Zero values

Next, let's check for zero values in our dataset. We can use the `validmind.data_validation.TooManyZeroValues` test and pass our dataset:

In [17]:
result = vm.tests.run_test(
    "validmind.data_validation.TooManyZeroValues",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='\n            <h1>Too Many Zero Values ✅</h1>\n            <p><strong>Too Many Zero…

### Descriptive statistics

Next, let's review the descriptive statistics of our dataset. We can use the `validmind.data_validation.DescriptiveStatistics` test and pass our dataset:

In [18]:
result = vm.tests.run_test(
    "validmind.data_validation.DescriptiveStatistics",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='<h1>Descriptive Statistics</h1>'), HTML(value="<p><strong>Descriptive Statistics</s…

### High Pearson correlation

Next, let's check the Pearson correlation between features in our dataset. We can use the `validmind.data_validation.HighPearsonCorrelation` test and pass our dataset:

In [19]:
result = vm.tests.run_test(
    "validmind.data_validation.HighPearsonCorrelation",
    dataset=vm_raw_ds
)
result.log()

VBox(children=(HTML(value='\n            <h1>High Pearson Correlation ❌</h1>\n            <p><strong>High Pear…
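When a correlation check like this fails, it helps to see which feature pairs are responsible. The sketch below lists pairs whose absolute Pearson correlation exceeds a threshold, using plain pandas on hypothetical data. `high_correlation_pairs` is an illustrative helper, not part of ValidMind, and the `0.3` default simply mirrors the `max_threshold` shown in the test listing.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two columns that move together and one unrelated column.
toy_df = pd.DataFrame(
    {
        "Total_Outflow": [10.0, 20.0, 30.0, 40.0, 50.0],
        "Total_Inflow": [11.0, 19.0, 32.0, 41.0, 49.0],
        "noise": [3.0, -1.0, 2.0, -1.0, 3.0],
    }
)


def high_correlation_pairs(df: pd.DataFrame, max_threshold: float = 0.3) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation exceeds the threshold."""
    corr = df.corr(numeric_only=True)
    # Keep only the upper triangle so each pair appears once and the diagonal is dropped
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    # NaN entries (masked cells) never satisfy the comparison, so they are filtered out
    flagged = pairs[pairs.abs() > max_threshold]
    return flagged.sort_values(key=lambda s: s.abs(), ascending=False)


pairs = high_correlation_pairs(toy_df)
print(pairs)
```

In the demo dataset, columns such as `Total_Outflow` and `Total_Inflow` look closely related in the sample rows, so a failed check here is not surprising; whether that matters depends on how the features are used downstream.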

### Pearson correlation matrix

Next, let's check the Pearson correlation matrix of our dataset. We can use the `validmind.data_validation.PearsonCorrelationMatrix` test and pass our dataset:

In [20]:
result = vm.tests.run_test(
    "validmind.data_validation.PearsonCorrelationMatrix",
    dataset=vm_raw_ds
).log()

VBox(children=(HTML(value='<h1>Pearson Correlation Matrix</h1>'), HTML(value="<p><strong>Pearson Correlation M…

## Segmentation of clients

In [21]:
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

In [22]:
cluster_df = raw_df.drop(columns=['LOB_data', 'cust_id', 'bal_date', 'ult_parent_cust_ipid_no', 'ult_parent_cust_nm', 'client', 'subclient'])
target_column = 'cust_ipid_nm'
cluster_df.head(2)

Unnamed: 0,cust_ipid_nm,EOD,Total_Outflow,Total_Inflow,Total_Outflow_Volume,Total_Inflow_Volume
0,1,620588441,6045789727,6056785345,65,69
1,2,749542305,7668359015,7821262323,79,71


### Clustering
Let's build a KMeans model:

In [23]:
from validmind.datasets.cluster import digits as demo_dataset
cluster_df = cluster_df.dropna()
train_df, validation_df, test_df = demo_dataset.preprocess(cluster_df)

x_train = train_df.drop(target_column, axis=1)
y_train = train_df[target_column]
x_val = validation_df.drop(target_column, axis=1)
y_val = validation_df[target_column]
x_test = test_df.drop(target_column, axis=1)
y_test = test_df[target_column]


x_train = pd.concat([x_train, x_val], axis=0)
y_train = pd.concat([y_train, y_val], axis=0)

scale = False
if scale:
    # Fit the scaler on the training data only, then reuse it for the other splits
    scaler = StandardScaler()
    x_train = scaler.fit_transform(x_train)
    x_val = scaler.transform(x_val)
    x_test = scaler.transform(x_test)


n_clusters = 4
model = KMeans(init="k-means++", n_clusters=n_clusters, n_init=4) # random_state=0
model = model.fit(x_train)
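If scaling is enabled, a common convention is to learn the scaling statistics from the training split only and reuse them for every other split, so that all splits are transformed consistently. The following is a minimal sketch with synthetic arrays; `x_train_demo` and `x_test_demo` are hypothetical stand-ins for the real splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the real train/test feature matrices
rng = np.random.default_rng(0)
x_train_demo = rng.normal(loc=100.0, scale=20.0, size=(200, 2))
x_test_demo = rng.normal(loc=100.0, scale=20.0, size=(50, 2))

scaler = StandardScaler()
# Learn mean and standard deviation from the training data
x_train_scaled = scaler.fit_transform(x_train_demo)
# Reuse the training statistics; do not refit on the test data
x_test_scaled = scaler.transform(x_test_demo)

print(x_train_scaled.mean(axis=0).round(6))
print(x_test_scaled.mean(axis=0).round(3))
```

The scaled training data has zero mean by construction, while the test data is only approximately centered, because it is measured against the training statistics.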

Let's prepare the VM dataset objects:

In [24]:
vm_train_ds = vm.init_dataset(
    dataset=train_df,
    target_column=target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df,
    target_column=target_column
)

2024-05-23 16:53:47,181 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


2024-05-23 16:53:47,340 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


In [25]:
vm_model = vm.init_model(
    model,
    input_id="kmean_model"
)

### Prediction
Prediction values can be attached to a dataset using the `assign_predictions()` method.

In [26]:
vm_train_ds.assign_predictions(model=vm_model)
vm_test_ds.assign_predictions(model=vm_model)

2024-05-23 16:53:47,620 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2024-05-23 16:53:47,620 - INFO(validmind.vm_models.dataset.utils): Not running predict_proba() for unsupported models.
2024-05-23 16:53:47,621 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may take a while

X does not have valid feature names, but KMeans was fitted with feature names

2024-05-23 16:53:47,623 - INFO(validmind.vm_models.dataset.utils): Done running predict()
2024-05-23 16:53:47,624 - INFO(validmind.vm_models.dataset.dataset): No probabilities computed or provided. Not adding probability column to the dataset.
2024-05-23 16:53:47,625 - INFO(validmind.vm_models.dataset.utils): Running predict_proba()... This may take a while
2024-05-23 16:53:47,626 - INFO(validmind.vm_models.dataset.utils): Not running predict_proba() for unsupported models.
2024-05-23 16:53:47,626 - INFO(validmind.vm_models.dataset.utils): Running predict()... This may

### Compare manual vs. predicted

In [27]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:training",
    inputs={
        "dataset":vm_train_ds,
        "model":vm_model
    }
).log()

VBox(children=(HTML(value='<h1>Confusion Matrix Training</h1>'), HTML(value="<p><strong>Confusion Matrix</stro…

### Confusion matrix - test data

In [28]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ConfusionMatrix:test",
    inputs={
        "dataset":vm_test_ds,
        "model":vm_model
    }
).log()

VBox(children=(HTML(value='<h1>Confusion Matrix Test</h1>'), HTML(value="<p><strong>Confusion Matrix</strong> …
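One caveat when reading a confusion matrix for a clustering model: KMeans cluster IDs are arbitrary, so cluster `2` need not correspond to manual segment `2`. A cross-tabulation together with a permutation-invariant score such as the adjusted Rand index can make the comparison easier to interpret. The sketch below uses hypothetical labels and is not part of the ValidMind test.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels: the clusters mostly recover the segments, under different IDs.
manual_segment = [0, 0, 0, 1, 1, 1, 2, 2, 2]
cluster_label = [2, 2, 2, 0, 0, 0, 1, 1, 0]

# Rows are manual segments, columns are cluster IDs
table = pd.crosstab(
    pd.Series(manual_segment, name="manual"),
    pd.Series(cluster_label, name="cluster"),
)

# Naive label-by-label agreement is misleading when IDs are permuted
naive_agreement = sum(m == c for m, c in zip(manual_segment, cluster_label)) / len(manual_segment)

# The adjusted Rand index ignores how the clusters are numbered
ari = adjusted_rand_score(manual_segment, cluster_label)

print(table)
print("Naive agreement:", naive_agreement)
print("Adjusted Rand index:", round(ari, 3))
```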

### Hyperparameter tuning

In [29]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.HyperParametersTuning",
    inputs={
        "dataset":vm_train_ds,
        "model":vm_model
    },
    params={
         "param_grid": {"n_clusters": range(3, 6)}
    }
).log()

VBox(children=(HTML(value='<h1>Hyper Parameters Tuning</h1>'), HTML(value='<p><strong>Hyper Parameters Tuning<…

### Cluster performance metrics

In [30]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.ClusterPerformanceMetrics",
    inputs={
        "datasets": (vm_train_ds, vm_test_ds),
        "model":vm_model
    }
).log()

VBox(children=(HTML(value='<h1>Cluster Performance Metrics</h1>'), HTML(value='<p><strong>Cluster Performance …

### Number of clusters optimization


In [31]:
result = vm.tests.run_test(
    "validmind.model_validation.sklearn.KMeansClustersOptimization",
    inputs={
        "dataset": vm_train_ds,
        "model":vm_model
    },
    params={
        "n_clusters": range(2, 8),
    }

).log()

VBox(children=(HTML(value='<h1>K Means Clusters Optimization</h1>'), HTML(value="<p><strong>K-Means Clusters O…
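Two common heuristics for choosing the number of clusters are the inertia curve (the "elbow") and the silhouette score. The sketch below computes both over the same `range(2, 8)` on synthetic, well-separated blobs. It illustrates the general technique under those assumptions and is not necessarily what the ValidMind test computes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs in two dimensions
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

scores = {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, init="k-means++", n_init=4, random_state=0)
    labels = km.fit_predict(X)
    scores[k] = {
        # Sum of squared distances to the nearest centroid (lower is tighter)
        "inertia": km.inertia_,
        # Mean silhouette coefficient (higher means better-separated clusters)
        "silhouette": silhouette_score(X, labels),
    }

best_k = max(scores, key=lambda k: scores[k]["silhouette"])

for k, s in scores.items():
    print(k, round(s["inertia"], 1), round(s["silhouette"], 3))
print("Best k by silhouette:", best_k)
```

Fixing `random_state` keeps the comparison reproducible across runs; the model in this notebook is fit without it, so its results can vary slightly from run to run.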

## Operational deposit model

### Operational deposit model computation

In [32]:
operational_deposit_df = raw_df.copy()
target_column = 'cust_ipid_nm'

# Eod_outflow_ratio
# Step 4: Statistical Analysis
def calculate_eod_outflow_ratio(df):
    df['eod_outflow_ratio'] = df['EOD'] / df['Total_Outflow']
    
    return df

operational_deposit_df = calculate_eod_outflow_ratio(operational_deposit_df)

# Step 5: Model Implementation
def rolling_average(df, window=30):
    df['rolling_eod_balance'] = df.groupby('cust_ipid_nm')['EOD'].rolling(window=window).mean().reset_index(level=0, drop=True)
    df['rolling_daily_outflow'] = df.groupby('cust_ipid_nm')['Total_Outflow'].rolling(window=window).mean().reset_index(level=0, drop=True)
    return df

operational_deposit_df = rolling_average(operational_deposit_df)

# # Step 6: Output Generation
# def generate_outputs(df):
#     output_df = df.groupby(['cust_ipid_nm', 'subclient']).agg({
#         'rolling_eod_balance': 'last',
#         'rolling_daily_outflow': 'last'
#     }).reset_index()
#     output_df['operational_core'] = output_df['rolling_eod_balance'] / output_df['rolling_daily_outflow']
#     return output_df

# raw_df = generate_outputs(raw_df)
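A rolling mean is taken over rows in their current order, so the result depends on how the rows are sorted within each customer group. The sketch below shows a per-group rolling mean that first sorts by date. It assumes the windows are meant to be chronological and that `bal_date` uses the `%m/%d/%y` format seen in the sample rows; `rolling_mean_by_group` and `toy_df` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy data, deliberately out of date order for customer 0
toy_df = pd.DataFrame(
    {
        "bal_date": ["01/03/22", "01/01/22", "01/02/22", "01/01/22", "01/02/22", "01/03/22"],
        "cust_ipid_nm": [0, 0, 0, 1, 1, 1],
        "EOD": [30.0, 10.0, 20.0, 100.0, 200.0, 300.0],
    }
)


def rolling_mean_by_group(df, group_col, value_col, date_col, window):
    """Add a chronological rolling mean of value_col within each group."""
    out = df.copy()
    # Assumption: dates are month/day/two-digit-year strings
    out[date_col] = pd.to_datetime(out[date_col], format="%m/%d/%y")
    # Sort so that each window covers consecutive dates within a group
    out = out.sort_values([group_col, date_col])
    # transform keeps the original index, so the result aligns row by row
    out[f"rolling_{value_col}"] = out.groupby(group_col)[value_col].transform(
        lambda s: s.rolling(window=window).mean()
    )
    return out


result = rolling_mean_by_group(toy_df, "cust_ipid_nm", "EOD", "bal_date", window=2)
print(result)
```

The first `window - 1` rows of each group have no rolling value, which is why the notebook drops rows with missing values before initializing the dataset.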


### Prepare VM dataset for the model

In [33]:
from validmind.datasets.cluster import digits as demo_dataset
operational_deposit_df = operational_deposit_df.dropna()

x_train = operational_deposit_df.drop(target_column, axis=1)
y_train = operational_deposit_df[target_column]

vm_od_ds = vm.init_dataset(
    dataset=operational_deposit_df,
    input_id="od_dataset",
    target_column="cust_ipid_nm"
)

2024-05-23 16:57:00,172 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


### VM model
ValidMind lets you define a model to suit your use case. Here the model is simple: a prediction function that returns the value of the `rolling_daily_outflow` column.

In [34]:
def operational_deposit(input):
    return input["rolling_daily_outflow"]

vm_od_model = vm.init_model(input_id="operational_deposit", predict_fn=operational_deposit)
vm_od_ds.assign_predictions(model=vm_od_model, prediction_column="rolling_daily_outflow")
print(vm_od_ds)

2024-05-23 16:57:00,591 - INFO(validmind.vm_models.dataset.dataset): No probabilities computed or provided. Not adding probability column to the dataset.


VMDataset object: 
Input ID: od_dataset
Target Column: cust_ipid_nm
Feature Columns: ['bal_date', 'cust_id', 'LOB_data', 'ult_parent_cust_ipid_no', 'ult_parent_cust_nm', 'client', 'subclient', 'EOD', 'Total_Outflow', 'Total_Inflow', 'Total_Outflow_Volume', 'Total_Inflow_Volume', 'eod_outflow_ratio', 'rolling_eod_balance']
Text Column: None
Extra Columns: ExtraColumns(extras=set(), group_by_column=None, prediction_columns={'operational_deposit': 'rolling_daily_outflow'}, probability_columns={})
Target Class Labels: None
Columns: ['bal_date', 'cust_id', 'LOB_data', 'cust_ipid_nm', 'ult_parent_cust_ipid_no', 'ult_parent_cust_nm', 'client', 'subclient', 'EOD', 'Total_Outflow', 'Total_Inflow', 'Total_Outflow_Volume', 'Total_Inflow_Volume', 'eod_outflow_ratio', 'rolling_eod_balance', 'rolling_daily_outflow', 'rolling_daily_outflow']
Index: [  105   106   109 ... 99997 99998 99999]



### External test provider

In [35]:
from validmind.tests import LocalTestProvider

tests_folder = "tests"
# initialize the test provider with the local folder that contains the custom tests
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="my_test_provider",
    test_provider=my_test_provider,
)

### Simple custom test
Let's draw a time series line plot of one column, grouped by another column in the dataset:

In [36]:
from validmind.tests import run_test

result = run_test(
    "my_test_provider.TimeseriesGroupbyPlot:Total_Outflow",
    inputs={
        "dataset": vm_od_ds,
        "model": vm_od_model
    },
    params={
        "date_column": "bal_date",
        "groupby_column": "cust_ipid_nm",
        "y_column": "Total_Outflow",
    },
).log()


The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.



VBox(children=(HTML(value='<h1>Timeseries Groupby Plot Total Outflow</h1>'), HTML(value='<p><strong>Timeseries…
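The deprecation warning above is raised by pandas when `mean()` is called on a grouped frame that still contains non-numeric columns. The source of the custom test is not shown in this notebook, so the following is only a sketch of how such an aggregation step might avoid the warning by selecting the value column before averaging; `mean_by_date_and_group` is a hypothetical helper, and the date format is assumed from the sample rows.

```python
import pandas as pd

# Hypothetical toy data with a non-numeric column that cannot be averaged
toy_df = pd.DataFrame(
    {
        "bal_date": ["01/01/22", "01/01/22", "01/02/22", "01/02/22"],
        "client": ["BANK 1", "BANK 2", "BANK 1", "BANK 2"],
        "subclient": ["A", "B", "A", "B"],
        "Total_Outflow": [10.0, 30.0, 20.0, 50.0],
    }
)


def mean_by_date_and_group(df, date_column, groupby_column, y_column):
    """Average y_column per date and group, returned with one column per group."""
    out = df.copy()
    # Assumption: dates are month/day/two-digit-year strings
    out[date_column] = pd.to_datetime(out[date_column], format="%m/%d/%y")
    # Selecting y_column first means no other column is ever averaged
    return (
        out.groupby([date_column, groupby_column])[y_column]
        .mean()
        .unstack(groupby_column)
    )


wide = mean_by_date_and_group(toy_df, "bal_date", "client", "Total_Outflow")
print(wide)
```

The resulting frame has one row per date and one column per group, which is a convenient shape for drawing one line per group.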

In [37]:
from validmind.tests import run_test
result = run_test(
    "my_test_provider.TimeseriesGroupbyPlot:Total_Outflow",
    inputs={
        "dataset": vm_od_ds,
        "model": vm_od_model
    },
    params={
        "date_column": "bal_date",
        "groupby_column": "client",
        "y_column": "Total_Outflow",
    },
).log()


The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.



VBox(children=(HTML(value='<h1>Timeseries Groupby Plot Total Outflow</h1>'), HTML(value='<p><strong>Time Serie…

In [38]:
from validmind.tests import run_test

result = run_test(
    "my_test_provider.TimeseriesGroupbyPlot:eod_outflow_ratio",
    inputs={
        "dataset": vm_od_ds,
        "model": vm_od_model
    },
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "eod_outflow_ratio",
    },
).log()


The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.



VBox(children=(HTML(value='<h1>Timeseries Groupby Plot Eod Outflow Ratio</h1>'), HTML(value='<p><strong>Time S…

In [39]:
from validmind.tests import run_test

result = run_test(
    "my_test_provider.TimeseriesGroupbyPlot:rolling_eod_balance",
    inputs={
        "dataset": vm_od_ds,
        "model": vm_od_model
    },
    params={
        "date_column": "bal_date",
        "groupby_column": "subclient",
        "y_column": "rolling_eod_balance",
    },
).log()


The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.



VBox(children=(HTML(value='<h1>Timeseries Groupby Plot Rolling Eod Balance</h1>'), HTML(value='<p><strong>Time…

<a id='toc8_'></a>

## Where to go from here

In this notebook you have learned the end-to-end process to document a model with the ValidMind Developer Framework, running through some very common scenarios in a typical model development setting:

- Running out-of-the-box tests
- Documenting your model by adding evidence to model documentation
- Extending the capabilities of the Developer Framework by implementing custom tests
- Ensuring that the documentation is complete by running all tests in the documentation template

As a next step, you can explore the following notebooks to get a deeper understanding of how the Developer Framework allows you to generate model documentation for any use case:

<a id='toc8_1_'></a>

### Use cases

- [Application scorecard demo](../code_samples/credit_risk/application_scorecard_demo.ipynb)
- [Linear regression documentation demo](../code_samples/regression/quickstart_regression_full_suite.ipynb)
- [LLM model documentation demo](../code_samples/nlp_and_llm/foundation_models_integration_demo.ipynb)

<a id='toc8_2_'></a>

### More how-to guides and code samples

- [Explore available tests in detail](../how_to/explore_tests.ipynb)
- [In-depth guide for implementing custom tests](../code_samples/custom_tests/implement_custom_tests.ipynb)
- [In-depth guide to external test providers](../code_samples/custom_tests/integrate_external_test_providers.ipynb)
- [Configuring dataset features](../how_to/configure_dataset_features.ipynb)
- [Introduction to unit and composite metrics](../how_to/run_unit_metrics.ipynb)

<a id='toc8_3_'></a>

### Discover more learning resources

All notebook samples can be found in the following directories of the Developer Framework GitHub repository:

- [Code samples](https://github.com/validmind/developer-framework/tree/main/notebooks/code_samples)
- [How-to guides](https://github.com/validmind/developer-framework/tree/main/notebooks/how_to)
