# Corporate Deposits Forecast Model Demo

<a id='toc2_'></a>

## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.
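The note above means the newly installed package may not be importable until the kernel restarts. This is a minimal way to check which `validmind` version the environment reports. It uses only the standard library's `importlib.metadata`, and the helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# After restarting the kernel, report the validmind version installed in this environment
print("validmind version:", installed_version("validmind"))
```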


<a id='toc3_'></a>

## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log in to the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

   For example, to register a model for use with this notebook, select:

   - Documentation template: `Baseline Template`
   - Use case: `Analytics/Analytics`

   You can fill in other options according to your preference.

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [2]:
import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

2024-05-28 22:05:58,534 - INFO(validmind.api_client): Connected to ValidMind. Project: Corporate Deposits Forecast Model - Initial Validation (clwqql88401i022iknur86lpr)
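The code snippet places the API key and secret directly in the notebook, where they are easy to commit by accident. The sketch below shows one alternative: export the values as environment variables before starting Jupyter and collect them at runtime. The variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, and `VM_PROJECT` and the helper `validmind_init_kwargs` are hypothetical choices for this example. They are not names ValidMind is documented here to read.

```python
import os
from typing import Dict


def validmind_init_kwargs(prefix: str = "VM_") -> Dict[str, str]:
    """Collect vm.init() keyword arguments from environment variables.

    The environment variable names are assumptions made for this sketch.
    """
    names = {
        "api_host": f"{prefix}API_HOST",
        "api_key": f"{prefix}API_KEY",
        "api_secret": f"{prefix}API_SECRET",
        "project": f"{prefix}PROJECT",
    }
    # Fail early with a clear message instead of passing empty credentials
    missing = [env for env in names.values() if not os.environ.get(env)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: os.environ[env] for arg, env in names.items()}


# Usage, assuming the variables are set in the shell that launched Jupyter:
# vm.init(**validmind_init_kwargs())
```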


<a id='toc4_'></a>

## Initialize the Python environment

Next, let's import the necessary libraries and set up your Python environment for data analysis:

In [3]:
import arviz as az
import numpy as np
import pandas as pd
import pymc as pm
import plotly.express as px
import plotly.graph_objects as go



<a id='toc4_1_'></a>

### Preview the documentation template

A template predefines sections for your model documentation and provides a general outline to follow, making the documentation process much easier.

You'll upload documentation and test results into this template later on. For now, take a look at the structure that the template provides with the `vm.preview_template()` function from the ValidMind library and note the empty sections:

In [4]:
vm.preview_template()


<a id='toc5_'></a>

## Load the sample dataset

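The sample dataset used here is provided by the ValidMind library. To use it, import the dataset module and load the data into a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), a two-dimensional tabular data structure with labeled rows and columns. The cell below loads the deposits series and four interest rate series (`FEDFUNDS`, `TB3MS`, `GS10`, `GS30`), then combines them into a single date-indexed DataFrame, `df`: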

In [5]:
from validmind.datasets.regression import fred_deposits as demo_dataset

deposits_df, deposits_seasonality_df, fedfunds_df, tb3ms_df, gs10_df, gs30_df = demo_dataset.load_data()

df = deposits_seasonality_df.copy()

df["Month"] = df.index
df["FEDFUNDS"] = fedfunds_df["FEDFUNDS"]
df["TB3MS"] = tb3ms_df["TB3MS"]
df["GS10"] = gs10_df["GS10"]
df["GS30"] = gs30_df["GS30"]

target_column = demo_dataset.target_column
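Each `df[...] = ..._df[...]` assignment above matches rows by the `DATE` index, not by position. If a rate series is missing a month, that month becomes `NaN` in `df`. Dates that appear only in the rate series are dropped. The sketch below demonstrates this with small synthetic frames that stand in for the real data:

```python
import pandas as pd

# Synthetic monthly target series (stand-in for the deposits data)
dates = pd.date_range("2010-01-01", periods=4, freq="MS")
target = pd.DataFrame({"deposits": [100.0, 101.0, 102.5, 103.0]}, index=dates)
target.index.name = "DATE"

# Synthetic rate series that skips 2010-03-01 and adds 2010-05-01
rates = pd.DataFrame(
    {"FEDFUNDS": [0.11, 0.13, 0.20, 0.25]},
    index=pd.DatetimeIndex(
        ["2010-01-01", "2010-02-01", "2010-04-01", "2010-05-01"], name="DATE"
    ),
)

merged = target.copy()
# Assignment aligns on the index: 2010-03-01 gets NaN, 2010-05-01 is not added
merged["FEDFUNDS"] = rates["FEDFUNDS"]

missing = merged["FEDFUNDS"].isna()
print(merged)
print("Dates without a rate:", list(merged.index[missing.to_numpy()].strftime("%Y-%m-%d")))
```

Checking `df.isna().sum()` after the real merge is a quick way to confirm that all series cover the same months.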

<a id='toc6_'></a>

## Train the seasonality model

In [6]:
# Time as days since 1900-01-01, min-max scaled to [0, 1]
t = (df["Month"] - pd.Timestamp("1900-01-01")).dt.days.to_numpy()
t_min = np.min(t)
t_max = np.max(t)
t = (t - t_min) / (t_max - t_min)

In [7]:
# Scale the target by its maximum; multiply by y_max to return to the original units
y = df[target_column].to_numpy()
y_max = np.max(y)
y = y / y_max
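The two cells above put both inputs on a unit scale. `t` is days since 1900-01-01, min-max scaled to [0, 1], and `y` is divided by its maximum. Because this is a forecasting model, future dates must be mapped with the *training* `t_min` and `t_max`, which gives values above 1. Model outputs must also be multiplied by `y_max` to return to the original units.

The sketch below uses illustrative dates and target values, and `fit_time_scaler` and `scale_time` are hypothetical helper names:

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timestamp("1900-01-01")  # same reference date as the notebook


def days_since_epoch(dates: pd.Series) -> np.ndarray:
    """Whole days between EPOCH and each date."""
    return (dates - EPOCH).dt.days.to_numpy()


def fit_time_scaler(dates: pd.Series):
    """Return (t_min, t_max) in days, computed on the training dates only."""
    days = days_since_epoch(dates)
    return days.min(), days.max()


def scale_time(dates: pd.Series, t_min, t_max) -> np.ndarray:
    """Min-max scale with training bounds: training dates land in [0, 1], later dates above 1."""
    return (days_since_epoch(dates) - t_min) / (t_max - t_min)


# Illustrative monthly training dates and a short forecast horizon
train_dates = pd.Series(pd.date_range("2010-01-01", "2022-11-01", freq="MS"))
future_dates = pd.Series(pd.date_range("2022-12-01", periods=3, freq="MS"))

t_min, t_max = fit_time_scaler(train_dates)
t_train = scale_time(train_dates, t_min, t_max)
t_future = scale_time(future_dates, t_min, t_max)

# Target scaling with illustrative values: divide by the max, multiply back afterwards
y = np.array([7700.0, 7750.0, 7800.0])
y_max = y.max()
y_scaled = y / y_max
y_restored = y_scaled * y_max
```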

In [8]:
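# Baseline: Bayesian linear trend on the rescaled data
with pm.Model(check_bounds=False) as linear:
    alpha = pm.Normal("alpha", mu=0, sigma=0.5)  # intercept
    beta = pm.Normal("beta", mu=0, sigma=0.5)  # slope per unit of scaled time
    sigma = pm.HalfNormal("sigma", sigma=0.5)  # observation noise
    trend = pm.Deterministic("trend", alpha + beta * t)
    pm.Normal("likelihood", mu=trend, sigma=sigma, observed=y)

    # Draws from the priors, before conditioning on the data
    linear_prior = pm.sample_prior_predictive()

with linear:
    linear_trace = pm.sample(return_inferencedata=True)
    # Posterior predictive draws, kept separate from the prior predictive draws
    linear_posterior = pm.sample_posterior_predictive(trace=linear_trace)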

Sampling: [alpha, beta, likelihood, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma]



Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 18 seconds.
Sampling: [likelihood]


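In equation form, the `linear` model is a Bayesian linear regression of the rescaled target $y_i$ on the rescaled time $t_i$. Here $\mathcal{N}(\mu, s)$ is written with its standard deviation $s$, which matches PyMC's `sigma` argument:

$$
\begin{aligned}
\alpha &\sim \mathcal{N}(0,\ 0.5) \\
\beta &\sim \mathcal{N}(0,\ 0.5) \\
\sigma &\sim \operatorname{HalfNormal}(0.5) \\
\text{trend}_i &= \alpha + \beta\, t_i \\
y_i &\sim \mathcal{N}(\text{trend}_i,\ \sigma)
\end{aligned}
$$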

In [9]:
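n_order = 10  # number of Fourier harmonics
# Time in years since 1900-01-01, so order 1 has a one-year period
periods = (df["Month"] - pd.Timestamp("1900-01-01")).dt.days / 365.25

# One sine and one cosine column per order: sin_order_1, cos_order_1, ..., cos_order_10
fourier_features = pd.DataFrame(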
    {
        f"{func}_order_{order}": getattr(np, func)(2 * np.pi * periods * order)
        for order in range(1, n_order + 1)
        for func in ("sin", "cos")
    }
)
fourier_features

| DATE | sin_order_1 | cos_order_1 | sin_order_2 | cos_order_2 | sin_order_3 | cos_order_3 | sin_order_4 | cos_order_4 | sin_order_5 | cos_order_5 | sin_order_6 | cos_order_6 | sin_order_7 | cos_order_7 | sin_order_8 | cos_order_8 | sin_order_9 | cos_order_9 | sin_order_10 | cos_order_10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-01-01 | -0.008601 | 0.999963 | -0.017202 | 0.999852 | -0.025801 | 0.999667 | -0.034398 | 0.999408 | -0.042993 | 0.999075 | -0.051584 | 0.998669 | -0.060172 | 0.998188 | -0.068755 | 0.997634 | -0.077334 | 0.997005 | -0.085906 | 0.996303 |
| 2010-02-01 | 0.500931 | 0.865487 | 0.867099 | 0.498137 | 0.999995 | -0.003225 | 0.863867 | -0.503720 | 0.495337 | -0.868701 | -0.006451 | -0.999979 | -0.506504 | -0.862238 | -0.870294 | -0.492533 | -0.999953 | 0.009676 | -0.860600 | 0.509282 |
| 2010-03-01 | 0.844881 | 0.534955 | 0.903946 | -0.427646 | 0.122261 | -0.992498 | -0.773138 | -0.634237 | -0.949449 | 0.313921 | -0.242687 | 0.970105 | 0.689796 | 0.724004 | 0.980707 | -0.195486 | 0.359472 | -0.933156 | -0.596104 | -0.802907 |
| 2010-04-01 | 0.999514 | 0.031174 | 0.062318 | -0.998056 | -0.995628 | -0.093402 | -0.124395 | 0.992233 | 0.987873 | 0.155266 | 0.185987 | -0.982552 | -0.976277 | -0.216527 | -0.246857 | 0.969052 | 0.960885 | 0.276946 | 0.306767 | -0.951785 |
| 2010-05-01 | 0.884725 | -0.466114 | -0.824765 | -0.565476 | -0.115856 | 0.993266 | 0.932769 | -0.360475 | -0.753698 | -0.657221 | -0.230151 | 0.973155 | 0.968251 | -0.249981 | -0.672480 | -0.740116 | -0.341347 | 0.939938 | 0.990692 | -0.136120 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2022-08-01 | -0.476544 | -0.879150 | 0.837909 | 0.545811 | -0.996751 | -0.080549 | 0.914679 | -0.404181 | -0.611530 | 0.791221 | 0.160575 | -0.987024 | 0.329192 | 0.944263 | -0.739392 | -0.673275 | 0.970882 | 0.239557 | -0.967711 | 0.252062 |
| 2022-09-01 | -0.857296 | -0.514823 | 0.882712 | -0.469915 | -0.051584 | 0.998669 | -0.829598 | -0.558361 | 0.905777 | -0.423755 | -0.103031 | 0.994678 | -0.799691 | -0.600412 | 0.926430 | -0.376467 | -0.154204 | 0.988039 | -0.767655 | -0.640864 |
| 2022-10-01 | -0.999694 | -0.024726 | 0.049437 | -0.998777 | 0.997250 | 0.074117 | -0.098753 | 0.995112 | -0.992366 | -0.123328 | 0.147827 | -0.989013 | 0.985056 | 0.172236 | -0.196540 | 0.980496 | -0.975336 | -0.220724 | 0.244772 | -0.969581 |
| 2022-11-01 | -0.873453 | 0.486908 | -0.850583 | -0.525841 | 0.045141 | -0.998981 | 0.894542 | -0.446983 | 0.825979 | 0.563701 | -0.090190 | 0.995925 | -0.913808 | 0.406147 | -0.799691 | -0.600412 | 0.135055 | -0.990838 | 0.931210 | -0.364483 |
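Each order $k$ contributes $\sin(2\pi k p)$ and $\cos(2\pi k p)$, where $p$ is time in years since 1900-01-01. Order 1 therefore repeats once a year, and order $k$ repeats $k$ times a year. The same matrix can be built in vectorized NumPy with `np.outer`, keeping the notebook's column order.

This is a sketch: the helper name `fourier_matrix` is hypothetical, and the dates are an illustrative monthly range:

```python
import numpy as np
import pandas as pd


def fourier_matrix(dates: pd.Series, n_order: int) -> np.ndarray:
    """Return an (n_dates, 2 * n_order) matrix ordered sin_1, cos_1, ..., sin_n, cos_n."""
    years = (dates - pd.Timestamp("1900-01-01")).dt.days.to_numpy() / 365.25
    orders = np.arange(1, n_order + 1)
    angles = 2 * np.pi * np.outer(years, orders)  # shape (n_dates, n_order)
    features = np.empty((len(years), 2 * n_order))
    features[:, 0::2] = np.sin(angles)  # even columns: sine terms
    features[:, 1::2] = np.cos(angles)  # odd columns: cosine terms
    return features


dates = pd.Series(pd.date_range("2010-01-01", "2022-11-01", freq="MS"))
X = fourier_matrix(dates, n_order=10)
```

Each sine/cosine pair at the same order satisfies $\sin^2 + \cos^2 = 1$, which makes a convenient sanity check on the features.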


In [10]:
coords = {"fourier_features": np.arange(2 * n_order)}
with pm.Model(check_bounds=False, coords=coords) as linear_with_seasonality:
    alpha = pm.Normal("alpha", mu=0, sigma=0.5)
    beta = pm.Normal("beta", mu=0, sigma=0.5)
    sigma = pm.HalfNormal("sigma", sigma=0.1)
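    # One coefficient per Fourier column, with tight priors around zero
    beta_fourier = pm.Normal("beta_fourier", mu=0, sigma=0.1, dims="fourier_features")
    # Seasonal component: weighted sum of the Fourier columns at each time step
    seasonality = pm.Deterministic(
        "seasonality", pm.math.dot(beta_fourier, fourier_features.to_numpy().T)
    )
    trend = pm.Deterministic("trend", alpha + beta * t)
    # Additive model: trend plus seasonality, both on the rescaled target
    mu = trend + seasonality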
    pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y)

    linear_seasonality_prior = pm.sample_prior_predictive()

with linear_with_seasonality:
    linear_seasonality_trace = pm.sample(return_inferencedata=True)
    linear_seasonality_posterior = pm.sample_posterior_predictive(trace=linear_seasonality_trace)

Sampling: [alpha, beta, beta_fourier, likelihood, sigma]
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, beta, sigma, beta_fourier]



Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 23 seconds.
Sampling: [likelihood]


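In equation form, `linear_with_seasonality` keeps the linear trend and its $\mathcal{N}(0,\ 0.5)$ priors on $\alpha$ and $\beta$, and adds a Fourier seasonal term $s_i$. Here $p_i$ is time in years since 1900-01-01, $\gamma_j$ corresponds to `beta_fourier` (ordered sin, cos per order), and standard deviations are written as in PyMC's `sigma`:

$$
\begin{aligned}
\gamma_j &\sim \mathcal{N}(0,\ 0.1), \quad j = 1, \dots, 20 \\
s_i &= \sum_{k=1}^{10} \left[ \gamma_{2k-1} \sin(2\pi k\, p_i) + \gamma_{2k} \cos(2\pi k\, p_i) \right] \\
\mu_i &= \alpha + \beta\, t_i + s_i \\
y_i &\sim \mathcal{N}(\mu_i,\ \sigma), \quad \sigma \sim \operatorname{HalfNormal}(0.1)
\end{aligned}
$$

Because $s_i$ is on the scale of $y / y_{\max}$, multiplying it by $y_{\max}$ expresses the seasonal effect in the target's original units.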

In [11]:
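# 100 posterior draws per variable, rescaled from y / y_max back to the target's units
likelihood = az.extract(linear_seasonality_posterior, group="posterior_predictive", num_samples=100)["likelihood"] * y_max
# Note: trend comes from the baseline linear model, not the seasonal one
trend = az.extract(linear_trace, group="posterior", num_samples=100)["trend"] * y_max
seasonality = az.extract(linear_seasonality_trace, group="posterior", num_samples=100)["seasonality"] * y_max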

In [14]:
# Posterior mean of the seasonal component for each month (averaged over the sample axis)
seasonality_posterior_mean = seasonality.mean(axis=1)
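`az.extract(..., num_samples=100)` stacks chains and draws into a single `sample` dimension. As a result, `seasonality` holds 100 posterior draws per month, and `mean(axis=1)` averages over them. The mean alone hides the posterior uncertainty.

The NumPy sketch below runs on synthetic draws and also computes an equal-tailed 94% interval from percentiles. This percentile interval is a swapped-in summary, not an HDI, and is not part of the original notebook:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the extracted draws: n_months x n_samples
n_months, n_samples = 155, 100
signal = np.sin(np.linspace(0, 12 * 2 * np.pi, n_months))[:, None]
draws = rng.normal(loc=signal, scale=0.1, size=(n_months, n_samples))

# Summaries over the sample axis (axis=1), one value per month
posterior_mean = draws.mean(axis=1)
lower, upper = np.percentile(draws, [3, 97], axis=1)  # equal-tailed 94% interval
```

Plotting `lower` and `upper` as a band around `posterior_mean` shows which seasonal peaks the model is confident about.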

In [18]:
vm_seasonality_model = vm.init_model(
    input_id="seasonality_model",
    attributes={
        "architecture": "PyMC",
        "language": "Python",
    }
)

In [15]:
vm_raw_ds = vm.init_dataset(
    dataset=df,
    input_id="raw_ds"
)


2024-05-28 22:10:12,645 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


In [19]:
vm_raw_ds.assign_predictions(
    model=vm_seasonality_model,
    # Pass the posterior-mean seasonality as a plain NumPy array
    prediction_values=seasonality_posterior_mean.values,
)

2024-05-28 22:14:22,766 - INFO(validmind.vm_models.dataset.dataset): No probabilities computed or provided. Not adding probability column to the dataset.


In [20]:
vm_raw_ds.df.head()

| DATE | DPSACBW027NBOG | Month | FEDFUNDS | TB3MS | GS10 | GS30 | seasonality_model_prediction |
|---|---|---|---|---|---|---|---|
| 2010-01-01 | 7692.7155 | 2010-01-01 | 0.11 | 0.06 | 3.73 | 4.6 | 31.229875 |
| 2010-02-01 | 7737.06 | 2010-02-01 | 0.13 | 0.11 | 3.69 | 4.62 | 35.660255 |
| 2010-03-01 | 7798.83494 | 2010-03-01 | 0.16 | 0.15 | 3.73 | 4.64 | 61.238766 |
| 2010-04-01 | 7811.242 | 2010-04-01 | 0.2 | 0.16 | 3.85 | 4.69 | 118.964207 |
| 2010-05-01 | 7743.18854 | 2010-05-01 | 0.2 | 0.16 | 3.42 | 4.29 | 86.079921 |


In [22]:
vm_raw_ds.add_extra_column(
    column_name=f"{target_column}_seasonal_adjusted",
    column_values=df[target_column] - seasonality_posterior_mean,
)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
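Only the final `ValueError` line is shown, so it is unclear whether the error comes from `add_extra_column` itself or from mixing a pandas Series with an xarray DataArray in the subtraction. The seasonal adjustment itself is plain arithmetic: observed values minus the posterior-mean seasonal component, matched month by month.

The sketch below uses synthetic data and does the subtraction on NumPy arrays, so neither pandas index alignment nor xarray coordinate alignment is involved. The column name `deposits_seasonal_adjusted` is hypothetical, and passing such an array to `add_extra_column` has not been verified here:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for df[target_column] and the posterior-mean seasonality
months = np.arange(12)
dates = pd.date_range("2010-01-01", periods=12, freq="MS")
seasonal_mean = 50 * np.sin(2 * np.pi * months / 12)  # one value per month, same order
observed = pd.Series(7700 + 10 * months + seasonal_mean, index=dates, name="deposits")

# Subtract plain arrays; both must already be in the same time order
adjusted = observed.to_numpy() - np.asarray(seasonal_mean)

frame = observed.to_frame()
frame["deposits_seasonal_adjusted"] = adjusted
```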