# Time Series Model Development Tutorial 

## Setup ValidMind Platform
Prepare the environment for the analysis. First, **import** the required libraries and modules. Next, **connect** to the ValidMind MRM platform, which provides tools and services for model validation.

Finally, **configure** the use case by setting the parameters, data sources, and other settings used throughout the analysis.

### ValidMind Project Information
**Model**: FRED Loan Rates  
**Project Type**: Initial Validation   
**Project ID**: `clhuctiea0000j3y6fkeyk73f`  
**Template**: `time_series_forecasting_v3.yaml` 

### Import Libraries

In [None]:
# Load API key and secret from environment variables
%load_ext dotenv
%dotenv .env

# Load ValidMind utils  
from validmind.datasets.regression import (
    identify_frequencies, 
    resample_to_common_frequency
)

### Connect to ValidMind Project

In [None]:
import validmind as vm

vm.init(
  api_host = "http://localhost:3000/api/v1/tracking",
  api_key = "2494c3838f48efe590d531bfe225d90b",
  api_secret = "4f692f8161f128414fef542cab2a4e74834c75d01b3a8e088a1834f2afcfe838",
  project = "clhuctiea0000j3y6fkeyk73f"
) 
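
The cell above passes the API key and secret as literals, even though the first cell already loads a `.env` file. As a minimal sketch, the connection settings could instead be read from the environment. The variable names `VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, and `VM_API_PROJECT` are assumptions, and `load_vm_settings` is a hypothetical helper; adjust both to match the keys in your own `.env` file.

```python
import os

# Assumed variable names; change them to match the keys in your .env file.
REQUIRED_VARS = ("VM_API_HOST", "VM_API_KEY", "VM_API_SECRET", "VM_API_PROJECT")


def load_vm_settings(env=None):
    """Collect connection settings from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Keys mirror the keyword arguments used in the vm.init call above
    return {
        "api_host": env["VM_API_HOST"],
        "api_key": env["VM_API_KEY"],
        "api_secret": env["VM_API_SECRET"],
        "project": env["VM_API_PROJECT"],
    }
```

Because the returned dictionary uses the same keyword names as the call above, it could be passed as `vm.init(**load_vm_settings())`, which keeps credentials out of the notebook itself.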

### Available Test Plans

In [None]:
vm.test_plans.list_plans()

## Data Engineering 

### Data Collection

In [None]:
from validmind.datasets.regression import fred as demo_dataset

target_column = demo_dataset.target_column
feature_columns = demo_dataset.feature_columns

# Load the raw dataset
df = demo_dataset.load_data()

### Connect Datasets to ValidMind Platform

In [None]:
vm_dataset = vm.init_dataset(
    dataset=df,
    target_column=demo_dataset.target_column,
)

### Data Description

### Data Quality

**Run the Time Series Data Quality Test Plan**

In [None]:
# TIME SERIES DATA QUALITY PARAMS
config = {
    "time_series_outliers": {
        "zscore_threshold": 3,
    },
    "time_series_missing_values": {
        "min_threshold": 2,
    },
}

vm.run_test_plan("time_series_data_quality",
                 config=config, 
                 dataset=vm_dataset)
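
To make the `zscore_threshold` parameter concrete, the sketch below flags observations whose absolute z-score exceeds a threshold. It illustrates the general z-score rule only and is not necessarily how the `time_series_outliers` test is implemented; the function name and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def find_zscore_outliers(values, threshold=3.0):
    """Return the positions whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Illustrative series: a stable rate with a single spike at position 10
rates = [5.0, 5.1, 4.9, 5.2, 4.8] * 4
rates[10] = 9.0

outliers = find_zscore_outliers(rates, threshold=3)
print(outliers)
```

A lower threshold flags more points. Note that in short series a single extreme value also inflates the standard deviation, which limits how large any z-score can become.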

## Exploratory Data Analysis

### Univariate Analysis

**Run the Time Series Univariate Test Plan**

In [None]:
# TIME SERIES UNIVARIATE PARAMS 
config = {
    "rolling_stats_plot": {
        "window_size": 12,
    },
    "seasonal_decompose": {
        "seasonal_model": "additive",
    },
    "auto_seasonality": {
        "min_period": 1,
        "max_period": 3,
    },
    "auto_stationarity": {
        "max_order": 3,
        "threshold": 0.05,
    },
    "auto_ar": {
        "max_ar_order": 4,
    },
    "auto_ma": {
        "max_ma_order": 3,
    },
}

vm.run_test_plan("time_series_univariate",
                 config=config,
                 dataset=vm_dataset)
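
The `window_size` of 12 in `rolling_stats_plot` presumably corresponds to a 12-period rolling window, which spans one year of monthly data. As a rough sketch of that idea in plain pandas (not the test's own code, and using a synthetic series):

```python
import pandas as pd

# Synthetic monthly series with a linear trend (illustrative values only)
index = pd.date_range("2020-01-01", periods=24, freq="MS")
series = pd.Series(range(1, 25), index=index, dtype="float64")

window_size = 12
rolling_mean = series.rolling(window=window_size).mean()
rolling_std = series.rolling(window=window_size).std()

# The first window_size - 1 entries are NaN because the window is incomplete
print(rolling_mean.dropna().head())
print(rolling_std.dropna().head())
```

A rolling mean that drifts over time, as it does for this trending series, is a visual hint that the series may not be stationary.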

### Multivariate Analysis

**Run the Time Series Multivariate Test Plan**

In [None]:
# TIME SERIES MULTIVARIATE PARAMS 
config = {
    "lagged_correlation_heatmap": {
        "target_col": demo_dataset.target_column,
        "independent_vars": demo_dataset.feature_columns,
    },
    "engle_granger_coint": {
        "threshold": 0.05,
    },
}

vm.run_test_plan("time_series_multivariate",
                 config=config, 
                 dataset=vm_dataset)

## Feature Engineering

### Treatment of Frequencies
Show the frequencies of each variable in the raw dataset.

In [None]:
frequencies = identify_frequencies(df)
display(frequencies)

Handle frequencies by resampling all variables to a common frequency.

In [None]:
preprocessed_df = resample_to_common_frequency(df, common_frequency=demo_dataset.frequency)
frequencies = identify_frequencies(preprocessed_df)
display(frequencies)
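
How `resample_to_common_frequency` works internally is not shown in this notebook. As a hedged illustration of the general idea, the sketch below uses plain pandas to bring a daily and a monthly series onto a common monthly frequency by keeping the last available observation in each month. The column names and values are invented for the example.

```python
import pandas as pd

# A daily series and a monthly series covering January and February 2020
daily = pd.Series(
    range(1, 61),
    index=pd.date_range("2020-01-01", periods=60, freq="D"),
    dtype="float64",
)
monthly = pd.Series(
    [3.5, 3.6],
    index=pd.date_range("2020-01-01", periods=2, freq="MS"),
)

# Combining them on one index leaves NaN wherever the monthly series has no value
mixed = pd.concat({"daily_rate": daily, "monthly_rate": monthly}, axis=1)

# Resample to month-start frequency, keeping the last non-missing value per month
resampled = mixed.resample("MS").last()
print(resampled)
```

Taking the last observation is only one possible aggregation; a monthly mean (`.mean()`) is another common choice, and the right one depends on how each variable is defined.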

**Run the Time Series Dataset Test Suite**

Run the test suite on the resampled dataset.

In [None]:
vm_dataset = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=demo_dataset.target_column,
)

full_suite = vm.run_test_suite(
    "time_series_dataset",
    dataset=vm_dataset,
)

### Treatment of Missing Values
Handle the missing values by dropping all rows that contain `nan` values.

In [None]:
preprocessed_df = preprocessed_df.dropna()
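
With default arguments, `DataFrame.dropna()` removes every row that contains at least one missing value, so it can shorten the series considerably. The small sketch below uses made-up values to report how many rows are lost:

```python
import pandas as pd

nan = float("nan")

# Illustrative frame: each column is missing a different month
sample = pd.DataFrame(
    {"rate_a": [1.0, nan, 3.0, 4.0], "rate_b": [0.5, 0.6, nan, 0.8]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)

cleaned = sample.dropna()
rows_dropped = len(sample) - len(cleaned)
print(f"Dropped {rows_dropped} of {len(sample)} rows")
```

Dropping interior rows leaves gaps in the date index, which is one reason to check the frequencies again afterwards.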

**Run the Time Series Dataset Test Suite**

Run the same test suite to confirm that no missing values remain and that all variables share the same frequency.

In [None]:
vm_dataset = vm.init_dataset(
    dataset=preprocessed_df,
    target_column=demo_dataset.target_column,
)

full_suite = vm.run_test_suite(
    "time_series_dataset",
    dataset=vm_dataset,
)

## Model Training

### Load Train Data, Test Data and Models

In [None]:
# Currently, only pre-trained FRED models are available
from validmind.datasets.regression import fred as demo_dataset
model_A, train_df_A, test_df_A = demo_dataset.load_model('fred_loan_rates_model_3')
model_B, train_df_B, test_df_B = demo_dataset.load_model('fred_loan_rates_model_4')

### Create ValidMind Datasets

In [None]:
# Initialize training and testing datasets for model A
vm_train_ds_A = vm.init_dataset(dataset=train_df_A, type="generic", target_column=demo_dataset.target_column)
vm_test_ds_A = vm.init_dataset(dataset=test_df_A, type="generic", target_column=demo_dataset.target_column)

# Initialize training and testing datasets for model B
vm_train_ds_B = vm.init_dataset(dataset=train_df_B, type="generic", target_column=demo_dataset.target_column)
vm_test_ds_B = vm.init_dataset(dataset=test_df_B, type="generic", target_column=demo_dataset.target_column)

### Create ValidMind Models

In [None]:
# Initialize model A
vm_model_A = vm.init_model(
    model=model_A,
    train_ds=vm_train_ds_A,
    test_ds=vm_test_ds_A,
)

# Initialize model B
vm_model_B = vm.init_model(
    model=model_B,
    train_ds=vm_train_ds_B,
    test_ds=vm_test_ds_B,
)

list_of_models = [vm_model_A, vm_model_B]

### Data Split Description

**Run the Regression Model Description Test Plan**

In [None]:
test_plan = vm.run_test_plan(
    "regression_model_description",
    model = vm_model_B,
    models = list_of_models
)

### Train and Test Data Validation

**Run the Time Series Dataset Test Suite**

In [None]:
vm_dataset = vm.init_dataset(
    dataset=test_df_B,
    target_column=demo_dataset.target_column,
)

full_suite = vm.run_test_suite(
    "time_series_dataset",
    dataset=vm_dataset,
)

## Model Testing

### Performance

**Run the Regression Model Evaluation Test Plan**

In [None]:
test_plan = vm.run_test_plan(
    "regression_models_evaluation",
    model = vm_model_B,
    models = list_of_models
)

### Forecasting

**Run the Time Series Forecast Test Plan**

In [None]:
config = {
    "regression_forecast_plot_levels": {
        "transformation": "integrate",
    }
}

test_plan = vm.run_test_plan(
    "time_series_forecast",
    models = list_of_models,
    config = config
)
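
The `"integrate"` transformation suggests that the models forecast differenced values, which are then converted back to levels for plotting. Assuming that reading, integration amounts to a cumulative sum of the differences added to a known starting level. The helper name and the numbers below are illustrative and are not taken from the ValidMind library.

```python
from itertools import accumulate


def integrate_differences(start_level, differences):
    """Rebuild a level series from first differences and a starting level."""
    # accumulate yields start_level, then start_level + d1, then + d2, ...
    return list(accumulate(differences, initial=start_level))


# Illustrative predicted changes and the last observed level before the forecast
predicted_diffs = [0.5, -0.25, 0.25]
levels = integrate_differences(5.0, predicted_diffs)
print(levels)
```

Because each level builds on the previous one, errors in the predicted differences accumulate along the forecast horizon.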

### Sensitivity Analysis 

In [None]:
config = {
    "regression_sensitivity_plot": {
        "transformation": "integrate",
        "shocks": [0.1, 0.2],
    }
}

test_plan = vm.run_test_plan(
    "time_series_sensitivity",
    models = list_of_models,
    config = config
)
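
The `shocks` values of 0.1 and 0.2 read naturally as 10% and 20% changes applied to the inputs. The following sketch assumes a multiplicative shock applied to one feature at a time in a toy linear model. The coefficients, feature names, and helper functions are hypothetical, and the ValidMind test may define shocks differently.

```python
def predict(features, intercept, coefficients):
    """Toy linear model: intercept plus a weighted sum of the features."""
    return intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )


def shock_feature(features, name, shock):
    """Return a copy of the features with one of them scaled by (1 + shock)."""
    shocked = dict(features)
    shocked[name] = shocked[name] * (1 + shock)
    return shocked


# Hypothetical model and observation
intercept = 1.0
coefficients = {"x1": 2.0, "x2": 0.5}
observation = {"x1": 1.0, "x2": 4.0}

baseline = predict(observation, intercept, coefficients)
for shock in [0.1, 0.2]:
    for name in observation:
        shocked = predict(
            shock_feature(observation, name, shock), intercept, coefficients
        )
        print(f"shock={shock:.0%} on {name}: change = {shocked - baseline:+.3f}")
```

Comparing the changes across features shows which inputs the prediction is most sensitive to under this assumed shock definition.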