# Time Series Forecasting Model Tutorial

## Introduction

This tutorial shows **Model Developers** how to use and configure the **Developer Framework** and the **ValidMind Platform**. The following steps guide you through **automatically** documenting and testing time series forecasting models:

- Step 1: Import raw data
- Step 2: Run data validation test suite on raw data
- Step 3: Preprocess data
- Step 4: Run data validation test suite on processed data
- Step 5: Load pre-trained models
- Step 6: Run model validation test suite on models

## Before you begin

To use the ValidMind Developer Framework with a Jupyter notebook, you need to install and initialize the client library first, along with getting your Python environment ready.

If you don't already have one, you should also [create a documentation project](https://docs.validmind.ai/guide/create-your-first-documentation-project.html) on the ValidMind platform. You will use this project to upload your documentation and test results.

## Install the client library

In [None]:
%pip install -q validmind

## Initialize the client library

In a browser, go to the **Client Integration** page of your documentation project and click **Copy to clipboard** next to the code snippet. This code snippet gives you the API key, API secret, and project identifier to link your notebook to your documentation project.

::: {.column-margin}
::: {.callout-tip}
This step requires a documentation project. [Learn how you can create one](https://docs.validmind.ai/guide/create-your-first-documentation-project.html).
:::
:::

Next, replace this placeholder with your own code snippet:

In [None]:
## Replace with code snippet from your documentation project ##

import validmind as vm

vm.init(
    api_host="https://api.prod.validmind.ai/api/v1/tracking",
    api_key="...",
    api_secret="...",
    project="..."
)
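
Hard-coded credentials in a notebook are easy to leak through version control. One option, sketched here under assumptions, is to read them from environment variables and pass the result to `vm.init`. The helper `load_vm_credentials` and the variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, and `VM_API_PROJECT` are hypothetical choices for this illustration; only the four keyword arguments come from the snippet above.

```python
import os

# Hypothetical environment variable names chosen for this sketch,
# mapped to the keyword arguments used by vm.init in the cell above.
_ENV_TO_ARG = {
    "VM_API_HOST": "api_host",
    "VM_API_KEY": "api_key",
    "VM_API_SECRET": "api_secret",
    "VM_API_PROJECT": "project",
}


def load_vm_credentials(environ=None):
    """Return keyword arguments for vm.init, read from environment variables."""
    environ = os.environ if environ is None else environ
    # Fail early with a clear message if anything is unset or empty.
    missing = [name for name in _ENV_TO_ARG if not environ.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {arg: environ[name] for name, arg in _ENV_TO_ARG.items()}
```

With the variables set, the initialization cell could then be written as `vm.init(**load_vm_credentials())`.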

#### Explore available test suites

In this notebook we will run a collection of test suites that are available in the ValidMind Developer Framework. A test suite groups together tests that are relevant for a specific use case. In our case, we will run test suites for time series forecasting models. Once a test suite runs successfully, its results are automatically uploaded to the ValidMind platform.

In [None]:
vm.test_suites.list_suites()

For our example use case we will run the following test suites:

- `time_series_dataset`
- `time_series_model_validation`

## Step 1: Import raw data

#### Import FRED dataset

Federal Reserve Economic Data, or FRED, is a comprehensive database maintained by the Federal Reserve Bank of St. Louis. It offers a wide array of economic data from various sources, including U.S. government agencies and international organizations. The dataset encompasses numerous economic indicators across various categories such as employment, consumer price indices, money supply, and gross domestic product, among others.

FRED provides a valuable resource for researchers, policymakers, and anyone interested in understanding economic trends and conducting economic analysis. The platform also includes tools for data visualization, which can help users interpret complex economic data and identify trends over time.

In the following code snippet we will import a sample FRED dataset into a Pandas dataframe.

In [None]:
from validmind.datasets.regression import fred as demo_dataset

target_column = demo_dataset.target_column
feature_columns = demo_dataset.feature_columns

df = demo_dataset.load_data()
df.tail(10)

## Step 2: Run data validation test suite on raw data

#### Explore the time series dataset test suites

Let's see which tests are included in each test suite.

In [None]:
vm.test_suites.describe_suite("time_series_data_quality")

In [None]:
vm.test_suites.describe_suite("time_series_univariate")

##### Connect Raw Dataset to ValidMind Platform

In [None]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
)

##### Run Time Series Dataset Test Suite on Raw Dataset

In [None]:
config = {
    # TIME SERIES DATA QUALITY PARAMS
    "validmind.data_validation.TimeSeriesOutliers": {
        "zscore_threshold": 3,
    },
    "validmind.data_validation.TimeSeriesMissingValues": {
        "min_threshold": 2,
    },

    # TIME SERIES UNIVARIATE PARAMS
    "validmind.data_validation.RollingStatsPlot": {
        "window_size": 12
    },
    "validmind.data_validation.SeasonalDecompose": {
        "seasonal_model": "additive"
    },
    "validmind.data_validation.AutoSeasonality": {
        "min_period": 1,
        "max_period": 3
    },
    "validmind.data_validation.AutoStationarity": {
        "max_order": 3,
        "threshold": 0.05
    },
    "validmind.data_validation.AutoAR": {
        "max_ar_order": 2
    },
    "validmind.data_validation.AutoMA": {
        "max_ma_order": 2
    },

    # TIME SERIES MULTIVARIATE PARAMS
    "validmind.data_validation.LaggedCorrelationHeatmap": {
        "target_col": demo_dataset.target_column,
        "independent_vars": demo_dataset.feature_columns
    },
    "validmind.data_validation.EngleGrangerCoint": {
        "threshold": 0.05
    },
}

full_suite = vm.run_test_suite(
    "time_series_dataset",
    dataset=vm_dataset,
    config=config,
)
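
The `zscore_threshold` parameter passed to `TimeSeriesOutliers` suggests that the test flags observations whose z-score magnitude exceeds the threshold. ValidMind's own implementation is not shown in this notebook, so the following is only a dependency-free sketch of that idea, assuming the sample standard deviation is used; the function name `zscore_outliers` is hypothetical.

```python
from statistics import mean, stdev


def zscore_outliers(values, zscore_threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (an assumption of this sketch)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [
        i for i, v in enumerate(values)
        if abs((v - mu) / sigma) > zscore_threshold
    ]


# Twenty ordinary readings followed by one extreme reading.
series = [10.0] * 20 + [100.0]
flagged = zscore_outliers(series, zscore_threshold=3)
print(flagged)
```

In this toy series only the final point is flagged. Note that very short series can never exceed a high threshold, because the largest possible z-score is bounded by the number of observations.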

## Step 3: Preprocess data

##### Handle Frequencies, Missing Values and Stationarity

In [None]:
# Resample to month-start frequency, keeping the last observation in each month
resampled_df = df.resample("MS").last()

# Remove all missing values
nona_df = resampled_df.dropna()

# Take the first difference of every variable
preprocessed_df = nona_df.diff().dropna()
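
To make the three preprocessing steps concrete, here is a dependency-free sketch of what they do to a single series. It is an illustration only, not a replacement for the pandas calls above: the helper names are hypothetical, missing values are represented as `None`, and only one variable is handled, whereas the dataframe version works on every column at once.

```python
from datetime import date


def resample_month_start_last(observations):
    """Keep the last non-missing value of each month, labelled by the month's first day.

    `observations` is a list of (date, value) pairs sorted by date.
    """
    monthly = {}
    for day, value in observations:
        key = day.replace(day=1)
        # Later non-missing values overwrite earlier ones within the same month.
        if value is not None or key not in monthly:
            monthly[key] = value
    return sorted(monthly.items())


def drop_missing(rows):
    """Remove rows whose value is missing."""
    return [(d, v) for d, v in rows if v is not None]


def first_difference(rows):
    """Replace each value with its change from the previous row; the first row is dropped."""
    return [(d1, v1 - v0) for (_, v0), (d1, v1) in zip(rows, rows[1:])]


raw = [
    (date(2020, 1, 10), 1.0),
    (date(2020, 1, 25), 1.5),
    (date(2020, 2, 14), None),
    (date(2020, 3, 3), 2.5),
    (date(2020, 4, 20), 2.0),
]
processed = first_difference(drop_missing(resample_month_start_last(raw)))
print(processed)
```

Differencing removes the level of the series and keeps only period-to-period changes, which is a common way to reduce non-stationarity before modeling.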

## Step 4: Run data validation test suite on processed data

In [None]:
vm_dataset = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=demo_dataset.target_column,
)

full_suite = vm.run_test_suite(
    "time_series_dataset",
    dataset=vm_dataset,
    config=config,
)

## Step 5: Load pre-trained models

#### Load pre-trained models

In [None]:
from validmind.datasets.regression import fred as demo_dataset

model_A, train_df_A, test_df_A = demo_dataset.load_model('fred_loan_rates_model_3')
model_B, train_df_B, test_df_B = demo_dataset.load_model('fred_loan_rates_model_4')

##### Initialize VM Models

In [None]:
# Initialize training and testing datasets for model A
vm_train_ds_A = vm.init_dataset(dataset=train_df_A, target_column=demo_dataset.target_column)
vm_test_ds_A = vm.init_dataset(dataset=test_df_A, target_column=demo_dataset.target_column)

# Initialize training and testing datasets for model B
vm_train_ds_B = vm.init_dataset(dataset=train_df_B, target_column=demo_dataset.target_column)
vm_test_ds_B = vm.init_dataset(dataset=test_df_B, target_column=demo_dataset.target_column)

# Initialize model A
vm_model_A = vm.init_model(
    model=model_A,
    train_ds=vm_train_ds_A,
    test_ds=vm_test_ds_A,
)

# Initialize model B
vm_model_B = vm.init_model(
    model=model_B,
    train_ds=vm_train_ds_B,
    test_ds=vm_test_ds_B,
)


models = [vm_model_A, vm_model_B]

## Step 6: Run model validation test suite on models

#### Explore the time series model validation test suite

In [None]:
vm.test_suites.describe_suite("time_series_model_validation")

#### Run model validation test suite on a list of models

In [None]:
config = {
    "validmind.model_validation.statsmodels.RegressionModelForecastPlotLevels": {
        "transformation": "integrate",
    },
    "validmind.model_validation.statsmodels.RegressionModelSensitivityPlot": {
        "transformation": "integrate",
        "shocks": [0.3],
    }
}

full_suite = vm.run_test_suite(
    "time_series_model_validation",
    model=vm_model_B,
    models=models,
    config=config,
)
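
Both tests above are configured with `"transformation": "integrate"`. If the models were fitted on first-differenced data, as produced in Step 3, that setting presumably maps values back to levels before plotting. ValidMind's exact implementation is not shown in this notebook; the following is only a stand-alone sketch of the inverse of first differencing, using a hypothetical helper `integrate` and assuming the starting level is known.

```python
from itertools import accumulate


def integrate(differences, start_level):
    """Rebuild a level series from first differences and a known starting level."""
    # Cumulative sum seeded with the starting level (requires Python 3.8+).
    return list(accumulate(differences, initial=start_level))


levels = [1.5, 2.5, 2.0, 3.0]
diffs = [b - a for a, b in zip(levels, levels[1:])]
recovered = integrate(diffs, levels[0])
print(recovered)
```

Because integration is a cumulative sum, any error in one differenced prediction carries forward into every later level, which is worth keeping in mind when reading forecast plots in levels.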