# Deposits Forecast Model with Seasonality using PyMC and Random Forest

0. Explore ValidMind developer framework
1. Data quality tests
2. Seasonality adjustment
3. Custom tests
4. Random forest model
5. Model validation test
6. Review model document

<a id='toc2_'></a>

## About ValidMind

ValidMind is a platform for managing model risk, including risk associated with AI and statistical models.

You use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on model documentation. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

<a id='toc2_1_'></a>

### Before you begin

This notebook assumes you have basic familiarity with Python, including an understanding of how functions work. If you are new to Python, you can still run the notebook, but we recommend further familiarizing yourself with the language.

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).

<a id='toc2_2_'></a>

### New to ValidMind?

If you haven't already seen our [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html), we recommend you explore the available resources for developers at some point. There, you can learn more about documenting models, find code samples, or read our developer reference.

::: {.callout-tip}

For access to all features available in this notebook, create a free ValidMind account.

Signing up is FREE — [**Sign up now!**](https://app.prod.validmind.ai)

:::

<a id='toc2_3_'></a>

![Dataset based test architecture](./images/dataset_image.png)
![Model based test architecture](./images/model_image.png)

# Prerequisites

Let's go ahead and install the `validmind` library if it's not already installed.

In [None]:
%pip install -q validmind

In [None]:
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import plotly.express as px
import plotly.graph_objects as go

<a id='toc4_'></a>

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Time Series Forecasting** as the template and **Credit Risk - Underwriting - Loan** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:


In [None]:
# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

Before learning how to run tests, let's explore the list of all available tests in the ValidMind Developer Framework. You can see that the documentation template for this model has references to some of the test IDs listed below.

In [None]:
vm.tests.list_tests()

Next, run a few individual data quality tests. To find which prebuilt tests are relevant for data quality assessment, use the `vm.tests.list_tests()` function introduced above in combination with `vm.tests.list_tags()` and `vm.tests.list_task_types()`.

In [None]:
# Get the list of available tags
sorted(vm.tests.list_tags())

In [None]:
# Get the list of available task types
sorted(vm.tests.list_task_types())

You can pass `task` and `tags` as parameters to the `vm.tests.list_tests()` function to filter the tests by task type and tag. For example, to find tests related to time series data for regression models, you can call `list_tests()` like this:

In [None]:
vm.tests.list_tests(task="regression", tags=["time_series_data"])

# Data preparation

## Load data

In [None]:
from validmind.datasets.regression import fred_deposits as demo_dataset

deposits_df, deposits_seasonality_df, fedfunds_df, tb3ms_df, gs10_df, gs30_df = demo_dataset.load_data()

raw_df = deposits_seasonality_df.copy()

raw_df["FEDFUNDS"] = fedfunds_df["FEDFUNDS"]
raw_df["TB3MS"] = tb3ms_df["TB3MS"]
raw_df["GS10"] = gs10_df["GS10"]
raw_df["GS30"] = gs30_df["GS30"]

target_column = demo_dataset.target_column

raw_df.head()
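The rate columns above are attached by plain column assignment, which relies on pandas index alignment: values are matched by index label (here, the date), not by position, and any date missing from a rate series becomes `NaN`. A minimal sketch with made-up monthly values (the numbers are illustrative only):

```python
import pandas as pd

# Two monthly series with partially overlapping date indexes (illustrative values)
deposits = pd.DataFrame(
    {"deposits": [100.0, 101.0, 102.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)
rates = pd.Series(
    [1.5, 1.6],
    index=pd.date_range("2020-02-01", periods=2, freq="MS"),
    name="FEDFUNDS",
)

# Column assignment aligns on the index: January has no rate, so it becomes NaN
deposits["FEDFUNDS"] = rates
print(deposits)
```

Running `raw_df.isna().sum()` after the assignments is a quick way to check whether any of the rate series has gaps relative to the deposits index.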

## Run data validation tests

In [None]:
vm_raw_dataset = vm.init_dataset(
    dataset=raw_df,
    input_id="raw_ds",
    target_column=target_column,
)

In [None]:
test = vm.tests.run_test(
    "validmind.data_validation.TimeSeriesLinePlot",
    inputs={
        "dataset": vm_raw_dataset,
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.data_validation.TimeSeriesFrequency",
    inputs={
        "dataset": vm_raw_dataset,
    }
)
test.log()
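To cross-check what `TimeSeriesFrequency` reports, pandas can infer the spacing of a `DatetimeIndex` directly: `pd.infer_freq` returns a frequency alias such as `"MS"` (month start) for a regular index and `None` when the spacing is irregular. The sketch uses a synthetic month-start index; running `pd.infer_freq(raw_df.index)` on the real data assumes `raw_df` is indexed by dates.

```python
import pandas as pd

# A regular month-start index
monthly_index = pd.date_range("2000-01-01", periods=24, freq="MS")
print(pd.infer_freq(monthly_index))  # "MS" = month start

# Dropping one month makes the spacing irregular, so no frequency can be inferred
irregular_index = monthly_index.delete(5)
print(pd.infer_freq(irregular_index))  # None
```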

In [None]:
test = vm.tests.run_test(
    "validmind.data_validation.SeasonalDecompose",
    inputs={
        "dataset": vm_raw_dataset,
    }
)
test.log()
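The `SeasonalDecompose` test splits the target into trend, seasonal and residual parts. For intuition, here is a hand-rolled classical additive decomposition in pandas: a centered moving average for the trend, then per-position averages of the detrended values for the seasonal part. It is an illustration only and may differ in detail from the method ValidMind uses internally; `period=12` assumes monthly data.

```python
import numpy as np
import pandas as pd


def classical_additive_decompose(series: pd.Series, period: int = 12) -> pd.DataFrame:
    """Split a series into trend + seasonal + residual (classical additive method)."""
    # Centered moving average: a 2 x period MA for even periods, a plain MA for odd ones
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    trend = series.rolling(window=len(weights), center=True).apply(
        lambda window: float(np.dot(window, weights)), raw=True
    )

    # Average the detrended values at each position in the cycle, then center them on zero
    positions = np.arange(len(series)) % period
    detrended = series - trend
    seasonal_means = detrended.groupby(positions).mean()
    seasonal_means -= seasonal_means.mean()
    seasonal = pd.Series(seasonal_means.to_numpy()[positions], index=series.index)

    residual = series - trend - seasonal
    return pd.DataFrame(
        {"observed": series, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Synthetic monthly series: linear trend plus a yearly sine wave
index = pd.date_range("2010-01-01", periods=120, freq="MS")
months = np.arange(120)
true_seasonal = 10.0 * np.sin(2 * np.pi * months / 12)
series = pd.Series(0.5 * months + true_seasonal, index=index)

parts = classical_additive_decompose(series, period=12)
print(parts.dropna().head())
```

The moving average leaves `period / 2` undefined trend values at each end of the series, which is why the edges of a decomposition plot are typically blank.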

# Seasonality adjustment with PyMC

## Fit linear PyMC model
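For reference, the first model is a straight-line trend with Gaussian noise on the rescaled data, where $t_i \in [0, 1]$ is scaled time and $y_i$ is the target divided by its maximum. PyMC's `sigma` arguments are standard deviations, so the priors in the code below read as:

$$
\begin{aligned}
\alpha &\sim \mathcal{N}(0,\, 0.5^2), \qquad \beta \sim \mathcal{N}(0,\, 0.5^2), \qquad \sigma \sim \operatorname{HalfNormal}(0.5) \\
\mathrm{trend}_i &= \alpha + \beta\, t_i \\
y_i &\sim \mathcal{N}(\mathrm{trend}_i,\, \sigma^2)
\end{aligned}
$$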

In [None]:
pymc_df = raw_df.copy()
pymc_df["Month"] = pymc_df.index

# Time as days since a fixed origin, min-max scaled to [0, 1]
t = (pymc_df["Month"] - pd.Timestamp("1900-01-01")).dt.days.to_numpy()
t_min = np.min(t)
t_max = np.max(t)
t = (t - t_min) / (t_max - t_min)

# Target divided by its maximum so that the largest value is 1
y = pymc_df[target_column].to_numpy()
y_max = np.max(y)
y = y / y_max
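Both inputs are rescaled so that the `Normal(0, 0.5)` priors in the next cell sit on a sensible scale: `t` lands in [0, 1] and the maximum of `y` becomes 1. Because `y` is only divided by `y_max`, the trend and likelihood draws are multiplied by `y_max` before plotting to return to original units. A small numeric illustration with made-up values:

```python
import numpy as np

# Hypothetical day offsets and deposit values standing in for `t` and `y`
days = np.array([0.0, 31.0, 59.0, 90.0])
deposits = np.array([1200.0, 1250.0, 1240.0, 1300.0])

# Min-max scale time to [0, 1]; divide the target by its maximum
t_scaled = (days - days.min()) / (days.max() - days.min())
y_max = deposits.max()
y_scaled = deposits / y_max

# Quantities on the scaled level convert back to original units by multiplying by y_max
y_restored = y_scaled * y_max
print(t_scaled, y_scaled.round(3))
```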

In [None]:
with pm.Model(check_bounds=False) as linear:
    # Priors for the intercept, slope and noise scale on the rescaled data
    alpha = pm.Normal("alpha", mu=0, sigma=0.5)
    beta = pm.Normal("beta", mu=0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=0.5)
    # Linear trend over scaled time
    trend = pm.Deterministic("trend", alpha + beta * t)
    pm.Normal("likelihood", mu=trend, sigma=sigma, observed=y)

    linear_prior = pm.sample_prior_predictive()

with linear:
    linear_trace = pm.sample(return_inferencedata=True)
    # Keep posterior predictive draws separate instead of overwriting the prior draws
    linear_posterior = pm.sample_posterior_predictive(trace=linear_trace)

In [None]:
# Take 100 posterior draws and rescale them to the original units of the target
likelihood = az.extract(linear_posterior, group="posterior_predictive", num_samples=100)["likelihood"] * y_max
trend = az.extract(linear_trace, group="posterior", num_samples=100)["trend"] * y_max
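By default `az.extract` combines chains and draws into a single `sample` dimension, and `num_samples=100` keeps a random subset of those samples, which is why `likelihood` and `trend` end up with one row per time point and 100 columns. The numpy sketch below mimics that reshaping on synthetic draws (for example, 4 chains of 1,000 draws); it illustrates the array layout only and is not ArviZ's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_chains, n_draws, n_obs = 4, 1000, 24

# Posterior draws laid out as (chain, draw, time), like an InferenceData group
draws = rng.normal(size=(n_chains, n_draws, n_obs))

# Stack chain and draw into one "sample" axis and move it last: (time, sample)
stacked = draws.reshape(n_chains * n_draws, n_obs).T

# Keep a random subset of 100 samples, similar in spirit to num_samples=100
keep = rng.choice(stacked.shape[1], size=100, replace=False)
subset = stacked[:, keep]
print(stacked.shape, subset.shape)  # (24, 4000) (24, 100)
```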

## External test provider

In [None]:
from validmind.tests import LocalTestProvider

# Folder containing the custom test modules, relative to the current working directory
tests_folder = "tests"
# Initialize the test provider with the tests folder
my_test_provider = LocalTestProvider(tests_folder)

vm.tests.register_test_provider(
    namespace="bny_test_provider",
    test_provider=my_test_provider,
)
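The contents of the `tests` folder are not shown in this notebook. As a rough sketch of what one of its modules might contain, assuming `LocalTestProvider` resolves `bny_test_provider.SomeTest` to a file `tests/SomeTest.py` that defines a function of the same name (see the in-depth guide to external test providers linked at the end of this notebook for the exact convention), a custom test is an ordinary Python function whose arguments match the `inputs` and `params` passed to `run_test`. The function below is hypothetical: it returns a summary table of posterior draws rather than the plot that `PyMCPlot` produces, assumes the framework renders a returned DataFrame as a table, and uses a `SimpleNamespace` that only mimics a ValidMind dataset's `.df` attribute so the sketch runs without ValidMind.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd


def PosteriorSummaryTable(dataset, pymc_output, month_column: str, interval: float = 0.94) -> pd.DataFrame:
    """Hypothetical custom test: per-month mean and equal-tailed interval of draws.

    Assumes `dataset.df` is a DataFrame and `pymc_output` is an (n_obs, n_samples)
    array, like the `az.extract` output used in this notebook.
    """
    draws = np.asarray(pymc_output)
    tail = (1.0 - interval) / 2.0
    return pd.DataFrame(
        {
            month_column: dataset.df[month_column].to_numpy(),
            "mean": draws.mean(axis=1),
            "lower": np.quantile(draws, tail, axis=1),
            "upper": np.quantile(draws, 1.0 - tail, axis=1),
        }
    )


# Stand-in objects so the sketch runs without ValidMind
months = pd.date_range("2020-01-01", periods=12, freq="MS")
fake_dataset = SimpleNamespace(df=pd.DataFrame({"Month": months}))
fake_draws = np.random.default_rng(0).normal(size=(12, 100))

table = PosteriorSummaryTable(fake_dataset, fake_draws, month_column="Month")
print(table.head())
```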

## Run custom tests

In [None]:
vm_pymc_ds = vm.init_dataset(
    dataset=pymc_df,
    input_id="pymc_ds",
    target_column=target_column,
)

In [None]:
from validmind.tests import run_test

result = run_test(
    "bny_test_provider.PyMCPlot:Posterior_Likelihood",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": likelihood,
        "month_column": "Month",
        "title": "Posterior Predictive",
    },
).log()

In [None]:
result = run_test(
    "bny_test_provider.PyMCPlot:Posterior_Trend",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": trend,
        "month_column": "Month",
        "title": "Posterior Trend Lines",
    },
).log()

## Fit seasonality PyMC model

### Create Fourier features

In [None]:
n_order = 10
periods = (pymc_df["Month"] - pd.Timestamp("1900-01-01")).dt.days / 365.25

fourier_features = pd.DataFrame(
    {
        f"{func}_order_{order}": getattr(np, func)(2 * np.pi * periods * order)
        for order in range(1, n_order + 1)
        for func in ("sin", "cos")
    }
)
fourier_features
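Each pair of columns is a sine and cosine at an integer multiple $k$ of the yearly frequency, with $p_i$ given by `periods` (time in years since 1900-01-01). The model in the next cell combines them linearly into an additive seasonal term $s_i$; $\alpha$ and $\beta$ keep the priors of the linear model, and the coefficient indices below are 1-based, matching the `sin_order_1`, `cos_order_1`, `sin_order_2`, ... column order:

$$
s_i = \sum_{k=1}^{N} \Big[ \beta^{(F)}_{2k-1} \sin(2\pi k\, p_i) + \beta^{(F)}_{2k} \cos(2\pi k\, p_i) \Big], \qquad N = 10
$$

$$
y_i \sim \mathcal{N}\big(\alpha + \beta\, t_i + s_i,\ \sigma^2\big), \qquad \beta^{(F)}_j \sim \mathcal{N}(0,\, 0.1^2), \qquad \sigma \sim \operatorname{HalfNormal}(0.1)
$$

Every term repeats with period $1/k$ years, so $s_i$ repeats every year; a larger $N$ lets the seasonal shape change more sharply within a year, at the cost of $2N$ coefficients.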

In [None]:
coords = {"fourier_features": np.arange(2 * n_order)}
with pm.Model(check_bounds=False, coords=coords) as linear_with_seasonality:
    alpha = pm.Normal("alpha", mu=0, sigma=0.5)
    beta = pm.Normal("beta", mu=0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=0.1)
    beta_fourier = pm.Normal("beta_fourier", mu=0, sigma=0.1, dims="fourier_features")
    seasonality = pm.Deterministic(
        "seasonality", pm.math.dot(beta_fourier, fourier_features.to_numpy().T)
    )
    trend = pm.Deterministic("trend", alpha + beta * t)
    mu = trend + seasonality
    pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y)

    linear_seasonality_prior = pm.sample_prior_predictive()
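`pm.sample_prior_predictive()` draws parameters from their priors and pushes them forward through the model. The same generative process can be written out by hand in numpy, which is a useful sanity check on what the prior plots should look like. This is a sketch of the forward-sampling idea on synthetic inputs (48 monthly points, with `years` standing in for `periods`), not PyMC's implementation. Note that the arrays here are laid out draws-first, whereas `az.extract` returns time-first arrays.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for the notebook's inputs (hypothetical sizes)
n_obs, n_order, n_draws = 48, 10, 100
t = np.linspace(0.0, 1.0, n_obs)  # scaled time, as in `t`
years = np.arange(n_obs) / 12.0  # time in years, assuming monthly spacing (like `periods`)

# Fourier features with columns ordered sin_1, cos_1, sin_2, cos_2, ...
features = np.stack(
    [func(2 * np.pi * k * years) for k in range(1, n_order + 1) for func in (np.sin, np.cos)],
    axis=1,
)  # shape (n_obs, 2 * n_order)

# Draw parameters from the same priors as `linear_with_seasonality`
alpha = rng.normal(0.0, 0.5, size=n_draws)
beta = rng.normal(0.0, 0.5, size=n_draws)
sigma = np.abs(rng.normal(0.0, 0.1, size=n_draws))  # |N(0, 0.1)| is HalfNormal(0.1)
beta_fourier = rng.normal(0.0, 0.1, size=(n_draws, 2 * n_order))

# Push the draws through the model: one row per draw, one column per time point
trend = alpha[:, None] + beta[:, None] * t[None, :]
seasonality = beta_fourier @ features.T
y_prior = rng.normal(trend + seasonality, sigma[:, None])
print(y_prior.shape)  # (100, 48)
```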

In [None]:
# Rescale prior draws to the original units of the target
likelihood = az.extract(linear_seasonality_prior, group="prior_predictive", num_samples=100)["likelihood"] * y_max
trend = az.extract(linear_seasonality_prior, group="prior", num_samples=100)["trend"] * y_max
seasonality = az.extract(linear_seasonality_prior, group="prior", num_samples=100)["seasonality"] * y_max

## Run custom tests

In [None]:
result = run_test(
    "bny_test_provider.PyMCPlot:Prior_Likelihood",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": likelihood,
        "month_column": "Month",
        "title": "Prior Predictive",
    },
).log()

In [None]:
result = run_test(
    "bny_test_provider.PyMCPlot:Prior_Trend",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": trend,
        "month_column": "Month",
        "title": "Prior Trend Lines",
    },
).log()

In [None]:
result = run_test(
    "bny_test_provider.PyMCSeasonalityPlot:Prior_Seasonality_Lines",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "seasonality": seasonality,
        "month_column": "Month",
        "title": "Prior Seasonality Lines",
    },
).log()

## Posterior seasonality checks

In [None]:
with linear_with_seasonality:
    linear_seasonality_trace = pm.sample(return_inferencedata=True)
    linear_seasonality_posterior = pm.sample_posterior_predictive(trace=linear_seasonality_trace)

In [None]:
# Rescale posterior draws of the seasonality model to the original units of the target
likelihood = az.extract(linear_seasonality_posterior, group="posterior_predictive", num_samples=100)["likelihood"] * y_max
trend = az.extract(linear_seasonality_trace, group="posterior", num_samples=100)["trend"] * y_max
seasonality = az.extract(linear_seasonality_trace, group="posterior", num_samples=100)["seasonality"] * y_max

## Run custom tests

In [None]:
result = run_test(
    "bny_test_provider.PyMCPlot:Posterior_Predictive_Seasonality",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": likelihood,
        "month_column": "Month",
        "title": "Posterior Predictive Seasonality",
    },
).log()

In [None]:
result = run_test(
    "bny_test_provider.PyMCPlot:Posterior_Trend_Seasonality",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "pymc_output": trend,
        "month_column": "Month",
        "title": "Posterior Trend Lines",
    },
).log()

In [None]:
result = run_test(
    "bny_test_provider.PyMCSeasonalityPlot:Posterior_Seasonality_Lines",
    inputs={
        "dataset": vm_pymc_ds,
    },
    params={
        "seasonality": seasonality,
        "month_column": "Month",
        "title": "Posterior Seasonality Lines",
    },
).log()

# Random Forest model

## Prepare data

In [None]:
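# Posterior mean of the seasonality component over all chains and draws,
# rescaled from the normalized target back to its original units
seasonality_posterior_mean = (
    linear_seasonality_trace.posterior["seasonality"].mean(dim=("chain", "draw")).values * y_max
)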

In [None]:
preprocessed_df = raw_df.copy()

# Adjust the target variable by removing the seasonality component
preprocessed_df[target_column] = preprocessed_df[target_column] - seasonality_posterior_mean
preprocessed_df
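Because the PyMC model is additive on the scaled target (`y / y_max = trend + seasonality + noise`), the seasonal component in original units is the scaled seasonality multiplied by `y_max`. Removing it means averaging the posterior draws at each time point and subtracting the result. A small synthetic sketch of that arithmetic, with the draws laid out as `(chain, draw, time)` like `InferenceData.posterior` (all values are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n_chains, n_draws, n_obs = 4, 250, 36
y_max = 1500.0  # hypothetical maximum of the original target

# Stand-in for posterior["seasonality"]: draws on the scaled (y / y_max) level
true_pattern = 0.02 * np.sin(2 * np.pi * np.arange(n_obs) / 12)
seasonality_draws = true_pattern + rng.normal(0.0, 0.001, size=(n_chains, n_draws, n_obs))

# Average over chains and draws, then convert back to original units
seasonality_mean = seasonality_draws.mean(axis=(0, 1)) * y_max

# Synthetic target = trend + seasonality in original units; subtracting leaves the trend
target = 1000.0 + 2.0 * np.arange(n_obs) + true_pattern * y_max
deseasonalized = target - seasonality_mean
print(np.round(deseasonalized[:6], 2))
```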

## Fit the model

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# random_state makes the (shuffled) split reproducible
train_df, test_df = train_test_split(preprocessed_df, test_size=0.20, random_state=0)

X_train = train_df.drop(target_column, axis=1)
y_train = train_df[target_column]
X_test = test_df.drop(target_column, axis=1)
y_test = test_df[target_column]

model = RandomForestRegressor(random_state=0)
model.fit(X_train, y_train)

# R² (coefficient of determination) on the training and test sets
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(f"R² of Random Forest Regressor on training set: {r2_train:.3f}")
print(f"R² of Random Forest Regressor on test set: {r2_test:.3f}")
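`model.score` on a scikit-learn regressor returns the coefficient of determination, `R² = 1 - SS_res / SS_tot`. It is 1 for a perfect fit, 0 for a model that always predicts the mean of the evaluated targets, and can be negative on held-out data. The same quantity computed by hand:

```python
import numpy as np


def r2_score_manual(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot


print(r2_score_manual([3.0, 5.0, 7.0], [3.0, 5.0, 7.0]))  # perfect fit -> 1.0
print(r2_score_manual([3.0, 5.0, 7.0], [5.0, 5.0, 5.0]))  # predicting the mean -> 0.0
```

A large gap between training and test R² usually points to overfitting.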

### Create VM datasets and model

In [None]:
vm_train_ds = vm.init_dataset(
    dataset=train_df, input_id="train_dataset", target_column=target_column
)

vm_test_ds = vm.init_dataset(
    dataset=test_df, input_id="test_dataset", target_column=target_column
)

vm_model = vm.init_model(
    model,
    input_id="random_forest_regressor",
)

## Assign predictions 

In [None]:
vm_train_ds.assign_predictions(
    model=vm_model,
)

vm_test_ds.assign_predictions(
    model=vm_model,
)

In [None]:
vm_test_ds.df.head()

## Run model validation tests

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.ModelMetadata",
    inputs={
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.data_validation.DatasetSplit",
    inputs={
        "datasets": [vm_train_ds, vm_test_ds]
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.sklearn.RegressionErrors",
    inputs={
        "datasets": [vm_train_ds, vm_test_ds],
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.RegressionResidualsPlot:train_dataset",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.RegressionResidualsPlot:test_dataset",
    inputs={
        "dataset": vm_test_ds,
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.sklearn.RegressionR2Square",
    inputs={
        "datasets": [vm_train_ds, vm_test_ds],
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.sklearn.PermutationFeatureImportance:train_dataset",
    inputs={
        "dataset": vm_train_ds,
        "model": vm_model
    }
)
test.log()

In [None]:
test = vm.tests.run_test(
    "validmind.model_validation.sklearn.PermutationFeatureImportance:test_dataset",
    inputs={
        "dataset": vm_test_ds,
        "model": vm_model
    }
)
test.log()
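Permutation feature importance measures how much a model's score drops when one feature's values are randomly shuffled, breaking that feature's relationship with the target. scikit-learn provides this as `sklearn.inspection.permutation_importance`; the hand-written numpy version below only shows the mechanics, using R² as the score and a toy `predict` function. ValidMind's test may use a different scoring metric or number of repeats.

```python
import numpy as np


def permutation_importance_manual(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R² when each column of X is shuffled independently."""
    rng = np.random.default_rng(seed)

    def r2(y_true, y_pred):
        return 1.0 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    baseline = r2(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle column j only, breaking the link between feature j and y
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - r2(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances


# Toy example: the target depends only on the first column
rng = np.random.default_rng(123)
X = rng.normal(size=(500, 2))
y = 3.0 * X[:, 0]
importances = permutation_importance_manual(lambda data: 3.0 * data[:, 0], X, y)
print(importances)
```

The ignored second column gets an importance of exactly zero, while shuffling the first column destroys the fit.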

<a id='toc8_'></a>

## Where to go from here

In this notebook, you learned how to document a time series model with the ValidMind Developer Framework, working through several common scenarios in a typical model development setting:

- Running out-of-the-box data validation and model validation tests
- Documenting your model by logging test results as evidence in the model documentation
- Extending the capabilities of the Developer Framework with custom tests registered through an external test provider

As a next step, you can make sure the documentation is complete by running all tests in the documentation template, and explore the following notebooks to get a deeper understanding of how the Developer Framework allows you to generate model documentation for any use case:

<a id='toc8_1_'></a>

### Use cases

- [Application scorecard demo](../code_samples/credit_risk/application_scorecard_demo.ipynb)
- [Linear regression documentation demo](../code_samples/regression/quickstart_regression_full_suite.ipynb)
- [LLM model documentation demo](../code_samples/nlp_and_llm/foundation_models_integration_demo.ipynb)

<a id='toc8_2_'></a>

### More how-to guides and code samples

- [Explore available tests in detail](../how_to/explore_tests.ipynb)
- [In-depth guide for implementing custom tests](../code_samples/custom_tests/implement_custom_tests.ipynb)
- [In-depth guide to external test providers](../code_samples/custom_tests/integrate_external_test_providers.ipynb)
- [Configuring dataset features](../how_to/configure_dataset_features.ipynb)
- [Introduction to unit and composite metrics](../how_to/run_unit_metrics.ipynb)

<a id='toc8_3_'></a>

### Discover more learning resources

All notebook samples can be found in the following directories of the Developer Framework GitHub repository:

- [Code samples](https://github.com/validmind/developer-framework/tree/main/notebooks/code_samples)
- [How-to guides](https://github.com/validmind/developer-framework/tree/main/notebooks/how_to)
