# Time Series Forecasting Model Tutorial

This notebook guides model developers through the process of automatically documenting and testing time series forecasting models. It shows you how to use the ValidMind Developer Framework to import and prepare data, run a data validation test suite before and after preprocessing, and then load pre-trained models and run a model validation test suite.

As part of the notebook, you will learn how to:

- Step 1: Import raw data
- Step 2: Run data validation test suite on raw data
- Step 3: Preprocess data
- Step 4: Run data validation test suite on processed data
- Step 5: Load pre-trained models
- Step 6: Run model validation test suite on models

## ValidMind at a glance

ValidMind's platform enables organizations to identify, document, and manage model risks for all types of models, including AI/ML models, LLMs, and statistical models. As a model developer, you use the ValidMind Developer Framework to automate documentation and validation tests, and then use the ValidMind AI Risk Platform UI to collaborate on documentation projects. Together, these products simplify model risk management, facilitate compliance with regulations and institutional standards, and enhance collaboration between yourself and model validators.

If this is your first time trying out ValidMind, we recommend going through the following resources first:

- [Get started](https://docs.validmind.ai/guide/get-started.html) — The basics, including key concepts, and how our products work
- [Get started with the ValidMind Developer Framework](https://docs.validmind.ai/guide/get-started-developer-framework.html) —  The path for developers, more code samples, and our developer reference

## Before you begin

::: {.callout-tip}
### New to ValidMind? 
To access the ValidMind Platform UI, you'll need an account.

Signing up is FREE — **[Create your account](https://app.prod.validmind.ai)**.
:::

If you encounter errors due to missing modules in your Python environment, install the modules with `pip install`, and then re-run the notebook. For more help, refer to [Installing Python Modules](https://docs.python.org/3/installing/index.html).
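
As a quick pre-flight check, the following sketch reports which modules cannot be found in the current environment. The `REQUIRED_MODULES` list and the `find_missing_modules` helper are illustrative assumptions, not part of the ValidMind Developer Framework; adjust the list to the packages your notebook actually uses. Note that a module's import name does not always match its `pip` package name.

```python
import importlib.util

# Hypothetical list of top-level modules this notebook relies on.
REQUIRED_MODULES = ["validmind", "pandas"]


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Install before continuing: pip install " + " ".join(missing))
else:
    print("All required modules are available.")
```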

## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.


## Initialize the client library

Every documentation project in the Platform UI comes with a _code snippet_ that lets the client library associate your documentation and tests with the right project on the Platform UI when you run this notebook.

Get your code snippet by creating a documentation project:

1. In a browser, log into the [Platform UI](https://app.prod.validmind.ai).

2. Go to **Documentation Projects** and click **Create new project**.

3. Select **`[Demo] Interest Rate Time Series Forecasting Model`** and **`Initial Validation`** for the model name and type, give the project a unique name to make it yours, and then click **Create project**.

4. Go to **Documentation Projects** > **YOUR_UNIQUE_PROJECT_NAME** > **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [2]:
## Replace with code snippet from your documentation project ##

import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)
  

2024-01-19 14:13:41,601 - INFO(validmind.api_client): Connected to ValidMind. Project: [Demo] Interest Rate Time Series Forecasting Model - Initial Validation (clr4sqrao0042uqy6ujfkd3rr)
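
If you prefer not to paste credentials into the notebook, one option is to read them from environment variables and pass them to `vm.init()` as keyword arguments. The sketch below is a minimal illustration of that idea: the `load_validmind_credentials` helper and the environment variable names are assumptions made for this example, and only the four keyword names come from the snippet above.

```python
import os


def load_validmind_credentials(env=None):
    """Collect keyword arguments for vm.init() from environment variables.

    The variable names below are hypothetical; choose any names you like.
    """
    env = os.environ if env is None else env
    names = {
        "api_host": "VM_API_HOST",
        "api_key": "VM_API_KEY",
        "api_secret": "VM_API_SECRET",
        "project": "VM_API_PROJECT",
    }
    unset = [var for var in names.values() if not env.get(var)]
    if unset:
        raise RuntimeError("Missing environment variables: " + ", ".join(unset))
    return {arg: env[var] for arg, var in names.items()}


# Demonstration with a stand-in mapping instead of the real environment.
example_env = {
    "VM_API_HOST": "http://localhost:3000/api/v1/tracking",
    "VM_API_KEY": "...",
    "VM_API_SECRET": "...",
    "VM_API_PROJECT": "...",
}
init_kwargs = load_validmind_credentials(example_env)
print(sorted(init_kwargs))
```

With the variables set in your shell, the call would then look like `vm.init(**load_validmind_credentials())`.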


### Explore available test suites

In this notebook, we will run a collection of test suites that are available in the ValidMind Developer Framework. A test suite groups together tests that are relevant for a specific use case. In our case, we will run different test suites for time series forecasting models. Once a test suite runs successfully, its results are automatically uploaded to the ValidMind platform.

In [3]:
vm.test_suites.list_suites()

ID,Name,Description,Tests
classifier_model_diagnosis,ClassifierDiagnosis,Test suite for sklearn classifier model diagnosis tests,"validmind.model_validation.sklearn.OverfitDiagnosis, validmind.model_validation.sklearn.WeakspotsDiagnosis, validmind.model_validation.sklearn.RobustnessDiagnosis"
classifier_full_suite,ClassifierFullSuite,Full test suite for binary classification models.,"validmind.data_validation.DatasetMetadata, validmind.data_validation.DatasetDescription, validmind.data_validation.DescriptiveStatistics, validmind.data_validation.PearsonCorrelationMatrix, validmind.data_validation.ClassImbalance, validmind.data_validation.Duplicates, validmind.data_validation.HighCardinality, validmind.data_validation.HighPearsonCorrelation, validmind.data_validation.MissingValues, validmind.data_validation.Skewness, validmind.data_validation.UniqueRows, validmind.data_validation.TooManyZeroValues, validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.sklearn.ConfusionMatrix, validmind.model_validation.sklearn.ClassifierInSamplePerformance, validmind.model_validation.sklearn.ClassifierOutOfSamplePerformance, validmind.model_validation.sklearn.PermutationFeatureImportance, validmind.model_validation.sklearn.PrecisionRecallCurve, validmind.model_validation.sklearn.ROCCurve, validmind.model_validation.sklearn.PopulationStabilityIndex, validmind.model_validation.sklearn.SHAPGlobalImportance, validmind.model_validation.sklearn.MinimumAccuracy, validmind.model_validation.sklearn.MinimumF1Score, validmind.model_validation.sklearn.MinimumROCAUCScore, validmind.model_validation.sklearn.TrainingTestDegradation, validmind.model_validation.sklearn.ModelsPerformanceComparison, validmind.model_validation.sklearn.OverfitDiagnosis, validmind.model_validation.sklearn.WeakspotsDiagnosis, validmind.model_validation.sklearn.RobustnessDiagnosis"
classifier_metrics,ClassifierMetrics,Test suite for sklearn classifier metrics,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.sklearn.ConfusionMatrix, validmind.model_validation.sklearn.ClassifierInSamplePerformance, validmind.model_validation.sklearn.ClassifierOutOfSamplePerformance, validmind.model_validation.sklearn.PermutationFeatureImportance, validmind.model_validation.sklearn.PrecisionRecallCurve, validmind.model_validation.sklearn.ROCCurve, validmind.model_validation.sklearn.PopulationStabilityIndex, validmind.model_validation.sklearn.SHAPGlobalImportance"
classifier_model_validation,ClassifierModelValidation,Test suite for binary classification models.,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.sklearn.ConfusionMatrix, validmind.model_validation.sklearn.ClassifierInSamplePerformance, validmind.model_validation.sklearn.ClassifierOutOfSamplePerformance, validmind.model_validation.sklearn.PermutationFeatureImportance, validmind.model_validation.sklearn.PrecisionRecallCurve, validmind.model_validation.sklearn.ROCCurve, validmind.model_validation.sklearn.PopulationStabilityIndex, validmind.model_validation.sklearn.SHAPGlobalImportance, validmind.model_validation.sklearn.MinimumAccuracy, validmind.model_validation.sklearn.MinimumF1Score, validmind.model_validation.sklearn.MinimumROCAUCScore, validmind.model_validation.sklearn.TrainingTestDegradation, validmind.model_validation.sklearn.ModelsPerformanceComparison, validmind.model_validation.sklearn.OverfitDiagnosis, validmind.model_validation.sklearn.WeakspotsDiagnosis, validmind.model_validation.sklearn.RobustnessDiagnosis"
classifier_validation,ClassifierPerformance,Test suite for sklearn classifier models,"validmind.model_validation.sklearn.MinimumAccuracy, validmind.model_validation.sklearn.MinimumF1Score, validmind.model_validation.sklearn.MinimumROCAUCScore, validmind.model_validation.sklearn.TrainingTestDegradation, validmind.model_validation.sklearn.ModelsPerformanceComparison"
cluster_full_suite,ClusterFullSuite,Full test suite for clustering models.,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.sklearn.HomogeneityScore, validmind.model_validation.sklearn.CompletenessScore, validmind.model_validation.sklearn.VMeasure, validmind.model_validation.sklearn.AdjustedRandIndex, validmind.model_validation.sklearn.AdjustedMutualInformation, validmind.model_validation.sklearn.FowlkesMallowsScore, validmind.model_validation.sklearn.ClusterPerformanceMetrics, validmind.model_validation.sklearn.ClusterCosineSimilarity, validmind.model_validation.sklearn.SilhouettePlot, validmind.model_validation.ClusterSizeDistribution, validmind.model_validation.sklearn.HyperParametersTuning, validmind.model_validation.sklearn.KMeansClustersOptimization"
cluster_metrics,ClusterMetrics,Test suite for sklearn clustering metrics,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.sklearn.HomogeneityScore, validmind.model_validation.sklearn.CompletenessScore, validmind.model_validation.sklearn.VMeasure, validmind.model_validation.sklearn.AdjustedRandIndex, validmind.model_validation.sklearn.AdjustedMutualInformation, validmind.model_validation.sklearn.FowlkesMallowsScore, validmind.model_validation.sklearn.ClusterPerformanceMetrics, validmind.model_validation.sklearn.ClusterCosineSimilarity, validmind.model_validation.sklearn.SilhouettePlot"
cluster_performance,ClusterPerformance,Test suite for sklearn cluster performance,validmind.model_validation.ClusterSizeDistribution
embeddings_full_suite,EmbeddingsFullSuite,Full test suite for embeddings models.,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.embeddings.DescriptiveAnalytics, validmind.model_validation.embeddings.CosineSimilarityDistribution, validmind.model_validation.embeddings.ClusterDistribution, validmind.model_validation.embeddings.EmbeddingsVisualization2D, validmind.model_validation.embeddings.StabilityAnalysisRandomNoise, validmind.model_validation.embeddings.StabilityAnalysisSynonyms, validmind.model_validation.embeddings.StabilityAnalysisKeyword, validmind.model_validation.embeddings.StabilityAnalysisTranslation"
embeddings_metrics,EmbeddingsMetrics,Test suite for embeddings metrics,"validmind.model_validation.ModelMetadata, validmind.data_validation.DatasetSplit, validmind.model_validation.embeddings.DescriptiveAnalytics, validmind.model_validation.embeddings.CosineSimilarityDistribution, validmind.model_validation.embeddings.ClusterDistribution, validmind.model_validation.embeddings.EmbeddingsVisualization2D"


For our example use case we will run the following test suites:

- `time_series_dataset`
- `time_series_model_validation`

## Step 1: Import raw data

### Import FRED dataset

Federal Reserve Economic Data, or FRED, is a comprehensive database maintained by the Federal Reserve Bank of St. Louis. It offers a wide array of economic data from various sources, including U.S. government agencies and international organizations. The dataset encompasses numerous economic indicators across various categories such as employment, consumer price indices, money supply, and gross domestic product, among others.

FRED provides a valuable resource for researchers, policymakers, and anyone interested in understanding economic trends and conducting economic analysis. The platform also includes tools for data visualization, which can help users interpret complex economic data and identify trends over time.

The following code snippet imports a sample FRED dataset into a Pandas dataframe:

In [4]:
from validmind.datasets.regression import fred as demo_dataset

target_column = demo_dataset.target_column
feature_columns = demo_dataset.feature_columns

df = demo_dataset.load_data()
df.tail(10)

DATE,MORTGAGE30US,FEDFUNDS,GS10,UNRATE
2023-03-02,6.65,,,
2023-03-09,6.73,,,
2023-03-16,6.6,,,
2023-03-23,6.42,,,
2023-03-30,6.32,,,
2023-04-01,,,3.46,
2023-04-06,6.28,,,
2023-04-13,6.27,,,
2023-04-20,6.39,,,
2023-04-27,6.43,,,
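
The tail of the dataset hints at why the raw data needs preprocessing: the mortgage rate is observed weekly, while the other series appear on different dates, so most cells in these rows are empty. The sketch below rebuilds the ten rows shown above by hand (it does not call `demo_dataset.load_data()`) and counts missing values per column and the spacing between mortgage rate observations. The variable names are illustrative.

```python
import pandas as pd

nan = float("nan")

# The ten rows printed by df.tail(10), re-entered by hand for illustration.
tail = pd.DataFrame(
    {
        "MORTGAGE30US": [6.65, 6.73, 6.60, 6.42, 6.32, nan, 6.28, 6.27, 6.39, 6.43],
        "FEDFUNDS": [nan] * 10,
        "GS10": [nan, nan, nan, nan, nan, 3.46, nan, nan, nan, nan],
        "UNRATE": [nan] * 10,
    },
    index=pd.DatetimeIndex(
        [
            "2023-03-02", "2023-03-09", "2023-03-16", "2023-03-23", "2023-03-30",
            "2023-04-01", "2023-04-06", "2023-04-13", "2023-04-20", "2023-04-27",
        ],
        name="DATE",
    ),
)

# Missing values per column.
missing_counts = tail.isna().sum()
print(missing_counts)

# Days between consecutive mortgage rate observations.
mortgage_dates = tail["MORTGAGE30US"].dropna().index.to_series()
mortgage_gaps = mortgage_dates.diff().dt.days.dropna()
print(mortgage_gaps.value_counts())
```

In these rows every gap between mortgage rate observations is seven days, which is consistent with a weekly series mixed into a frame that also holds lower-frequency data.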


## Step 2: Run data validation test suite on raw data

### Explore the time series dataset test suites

Let's see what tests are included in each test suite:

In [5]:
vm.test_suites.describe_suite("time_series_data_quality")

ID,Name,Description,Tests
time_series_data_quality,TimeSeriesDataQuality,Test suite for data quality on time series datasets,"validmind.data_validation.TimeSeriesOutliers, validmind.data_validation.TimeSeriesMissingValues, validmind.data_validation.TimeSeriesFrequency"


In [6]:
vm.test_suites.describe_suite("time_series_univariate")

ID,Name,Description,Tests
time_series_univariate,TimeSeriesUnivariate,"This test suite provides a preliminary understanding of the target variable(s)  used in the time series dataset. It visualizations that present the raw time  series data and a histogram of the target variable(s).  The raw time series data provides a visual inspection of the target variable's  behavior over time. This helps to identify any patterns or trends in the data,  as well as any potential outliers or anomalies. The histogram of the target  variable displays the distribution of values, providing insight into the range  and frequency of values observed in the data.","validmind.data_validation.TimeSeriesLinePlot, validmind.data_validation.TimeSeriesHistogram, validmind.data_validation.ACFandPACFPlot, validmind.data_validation.SeasonalDecompose, validmind.data_validation.AutoSeasonality, validmind.data_validation.AutoStationarity, validmind.data_validation.RollingStatsPlot, validmind.data_validation.AutoAR, validmind.data_validation.AutoMA"


### Initialize the dataset

Use the ValidMind Developer Framework to initialize the dataset object:

In [7]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
)

2024-01-19 14:13:41,668 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


### Run time series dataset test suite on raw dataset

Next, use the ValidMind Developer Framework to run the test suite for time series datasets:

In [15]:
config = {
    # TIME SERIES DATA QUALITY PARAMS
    "validmind.data_validation.TimeSeriesOutliers": {
        "zscore_threshold": 3,
    },
    "validmind.data_validation.TimeSeriesMissingValues": {
        "min_threshold": 2,
    },

    # TIME SERIES UNIVARIATE PARAMS
    "validmind.data_validation.RollingStatsPlot": {
        "window_size": 12
    },
    "validmind.data_validation.SeasonalDecompose": {
        "seasonal_model": 'additive'
    },
    "validmind.data_validation.AutoSeasonality": {
        "min_period": 1,
        "max_period": 3
    },
    "validmind.data_validation.AutoStationarity": {
        "max_order": 3,
        "threshold": 0.05
    },
    "validmind.data_validation.AutoAR": {
        "max_ar_order": 2
    },
    "validmind.data_validation.AutoMA": {
        "max_ma_order": 2
    },

    # TIME SERIES MULTIVARIATE PARAMS
    "validmind.data_validation.LaggedCorrelationHeatmap": {
        "target_col": demo_dataset.target_column,
        "independent_vars": demo_dataset.feature_columns
    },
    "validmind.data_validation.EngleGrangerCoint": {
        "threshold": 0.05
    },
}

full_suite = vm.run_test_suite(
    "time_series_dataset",
    inputs = {"dataset": vm_dataset},
    config=config,
)


No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.

2024-01-19 14:16:09,886 - INFO(validmind.tests.data_validation.SeasonalDecompose): Frequency of MORTGAGE30US: MS
2024-01-19 14:16:09,898 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'seasonal

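
To give a feel for what a setting such as `"zscore_threshold": 3` controls, here is a minimal sketch of z-score outlier flagging in plain Python. It assumes a standard global z-score (distance from the mean in units of the population standard deviation); the `TimeSeriesOutliers` test may compute this differently, and the `zscore_outliers` helper and the data are made up for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value, z-score) for points whose |z| exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    scored = [(i, v, (v - mu) / sigma) for i, v in enumerate(values)]
    return [item for item in scored if abs(item[2]) > threshold]


# Synthetic series: 19 calm points followed by one spike.
series = [5.0] * 19 + [50.0]
flagged = zscore_outliers(series, threshold=3)
for index, value, z in flagged:
    print(f"index={index} value={value} z={z:.2f}")
```

Raising the threshold flags fewer points and lowering it flags more, so the value is a trade-off between missing genuine anomalies and reporting ordinary variation.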

## Step 3: Preprocess data

### Handle frequencies, missing values, and stationarity

In [9]:
# Resample to monthly frequency, keeping the last observation in each month
resampled_df = df.resample("MS").last()

# Remove all rows with missing values
nona_df = resampled_df.dropna()

# Take the first difference across all variables
preprocessed_df = nona_df.diff().dropna()
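
To see what each of these three calls does, the sketch below applies the same pandas operations to a small made-up frame with one weekly and one monthly series. The data and the `toy_*` names are assumptions for illustration only; the operations mirror the cell above.

```python
import pandas as pd

# Made-up data: a weekly series and a monthly series observed on different dates.
weekly = pd.Series(
    [round(6.0 + 0.1 * i, 1) for i in range(14)],
    index=pd.date_range("2023-01-05", periods=14, freq="7D"),
    name="weekly_rate",
)
monthly = pd.Series(
    [3.5, 3.7, 3.4],
    index=pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    name="monthly_rate",
)
toy_df = pd.concat([weekly, monthly], axis=1).sort_index()

# 1. Resample to month start, keeping the last non-missing value in each month.
toy_resampled = toy_df.resample("MS").last()

# 2. Drop months where any series is still missing (April has no monthly value).
toy_nona = toy_resampled.dropna()

# 3. First difference: month-over-month change; the first row is NaN and is dropped.
toy_diff = toy_nona.diff().dropna()

print(len(toy_df), len(toy_resampled), len(toy_nona), len(toy_diff))
print(toy_diff)
```

The row count shrinks from 17 to 4 to 3 to 2: resampling aligns both series on one row per month, `dropna()` removes the incomplete month, and differencing costs one more row.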

## Step 4: Run data validation test suite on processed data

In [16]:
vm_dataset = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=demo_dataset.target_column,
)

full_suite = vm.run_test_suite(
    "time_series_dataset",
    inputs = {"dataset":vm_dataset},
    config=config,
)

2024-01-19 14:16:53,294 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...



No artists with labels found to put in legend.  Note that artists whose label start with an underscore are ignored when legend() is called with no argument.

The default method 'yw' can produce PACF values outside of the [-1,1] interval. After 0.13, the default will change tounadjusted Yule-Walker ('ywm'). You can use this method now by setting method='ywm'.

2024-01-19 14:16:55,108 - INFO(validmind.tests.data_validation.SeasonalDecompose): Frequency of MORTGAGE30US: MS
2024-01-19 14:16:55,117 - ERROR(validmind.vm_models.test_suite.test): Failed to run test 'seasonal


## Step 5: Load pre-trained models

### Load pre-trained models

In [11]:
from validmind.datasets.regression import fred as demo_dataset

model_A, train_df_A, test_df_A = demo_dataset.load_model('fred_loan_rates_model_3')
model_B, train_df_B, test_df_B = demo_dataset.load_model('fred_loan_rates_model_4')

### Initialize ValidMind models



In [12]:
# Initialize training and testing datasets for model A
vm_train_ds_A = vm.init_dataset(
    dataset=train_df_A, target_column=demo_dataset.target_column)
vm_test_ds_A = vm.init_dataset(
    dataset=test_df_A, target_column=demo_dataset.target_column)

# Initialize training and testing datasets for model B
vm_train_ds_B = vm.init_dataset(
    dataset=train_df_B, target_column=demo_dataset.target_column)
vm_test_ds_B = vm.init_dataset(
    dataset=test_df_B, target_column=demo_dataset.target_column)

# Initialize model A
vm_model_A = vm.init_model(
    model=model_A,
    train_ds=vm_train_ds_A,
    test_ds=vm_test_ds_A)

# Initialize model B
vm_model_B = vm.init_model(
    model=model_B,
    train_ds=vm_train_ds_B,
    test_ds=vm_test_ds_B)


models = [vm_model_A, vm_model_B]

2024-01-19 14:15:11,444 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-01-19 14:15:11,447 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-01-19 14:15:11,449 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...
2024-01-19 14:15:11,449 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


## Step 6: Run model validation test suite on models

### Explore the time series model validation test suite

In [13]:
vm.test_suites.describe_test_suite("time_series_model_validation")

ID,Name,Description,Tests
time_series_model_validation,TimeSeriesModelValidation,Test suite for time series model validation.,"validmind.data_validation.DatasetSplit, validmind.model_validation.ModelMetadata, validmind.model_validation.statsmodels.RegressionModelsCoeffs, validmind.model_validation.statsmodels.RegressionModelsPerformance, validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels, validmind.model_validation.statsmodels.RegressionModelSensitivityPlot"


### Run model validation test suite on a list of models

In [18]:
config = {
    "validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels": {
        "transformation": "integrate",
    },
    "validmind.model_validation.statsmodels.RegressionModelSensitivityPlot": {
        "transformation": "integrate",
        "shocks": [0.3],
    }
}

full_suite = vm.run_test_suite(
    "time_series_model_validation",
    inputs = {
        "model": vm_model_B,
        "models": models,
    },
    config=config,
)


2024-01-19 14:18:07,834 - INFO(validmind.tests.model_validation.statsmodels.RegressionModelSensitivityPlot): {'transformation': 'integrate', 'shocks': [0.3]}


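
The `"transformation": "integrate"` setting is not explained in this notebook. A plausible reading, given that Step 3 first-differences the data, is that it maps differenced values back to levels before plotting. The sketch below shows that inverse operation in plain Python under this assumption; the `integrate` helper and the numbers are made up, and this is not ValidMind's implementation.

```python
from itertools import accumulate

# Made-up level series and its first differences.
levels = [6.3, 6.7, 7.2, 7.0]
diffs = [later - earlier for earlier, later in zip(levels, levels[1:])]


def integrate(differences, start_value):
    """Rebuild levels from first differences with a running sum."""
    return list(accumulate(differences, initial=start_value))


rebuilt = integrate(diffs, start_value=levels[0])
print([round(value, 2) for value in rebuilt])
```

The starting level is needed because differencing discards it; in pandas, the equivalent for a differenced series would be a cumulative sum added to the first observed level.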

## Next steps

You can review the results of this test suite directly in the notebook where you ran the code. A better option is to view the test results as part of your model documentation in the ValidMind Platform UI:

1. Log back into the [Platform UI](https://app.prod.validmind.ai).

2. Go to **Documentation Projects** > **YOUR_DOCUMENTATION_PROJECT** > **Documentation**.

3. Expand **3. Model Development** to review all test results.

What you see now is a more easily consumable version of the validation tests you just ran, along with other parts of your documentation project that still need to be completed.

If you want to learn more about where you are in the model documentation process, take a look at [How do I use the framework?](https://docs.validmind.ai/guide/get-started-developer-framework.html#how-do-i-use-the-framework).

