# Document a time series forecasting model

## Install the client library

The client library provides Python support for the ValidMind Developer Framework. To install it:

In [1]:
%pip install -q validmind

Note: you may need to restart the kernel to use updated packages.
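
After the install finishes (and after restarting the kernel if prompted), it can help to confirm that the package is visible to the current interpreter. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of ValidMind:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("validmind"))
```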


## Initialize the client library

ValidMind generates a unique _code snippet_ for each registered model to connect with your developer environment. You initialize the client library with this code snippet, which ensures that your documentation and tests are uploaded to the correct model when you run the notebook.

Get your code snippet:

1. In a browser, log in to the [Platform UI](https://app.prod.validmind.ai).

2. In the left sidebar, navigate to **Model Inventory** and click **+ Register new model**.

3. Enter the model details, making sure to select **Time Series Forecasting** as the template and **Credit Risk - Underwriting - Loan** as the use case, and click **Continue**. ([Need more help?](https://docs.validmind.ai/guide/register-models-in-model-inventory.html))

4. Go to **Getting Started** and click **Copy snippet to clipboard**.

Next, replace this placeholder with your own code snippet:

In [2]:
# Replace with your code snippet

import validmind as vm

vm.init(
  api_host = "https://api.prod.validmind.ai/api/v1/tracking",
  api_key = "...",
  api_secret = "...",
  project = "..."
)

2024-06-10 09:56:54,584 - INFO(validmind.api_client): Connected to ValidMind. Project: [Juan] Time Series Forecast Model - Initial Validation (clx4ol0e2002o29if4p849ayi)
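
If you would rather not paste the key and secret into the notebook, one option is to read them from environment variables and pass them to `vm.init()`. This is a sketch under assumptions: the variable names (`VM_API_HOST`, `VM_API_KEY`, `VM_API_SECRET`, `VM_PROJECT`) and the helper `vm_init_kwargs` are hypothetical, and the keyword arguments simply mirror the snippet above.

```python
import os


def vm_init_kwargs(env=os.environ):
    """Collect the arguments for vm.init() from environment variables.

    The variable names used here are hypothetical; pick whatever fits your setup.
    """
    required = ("VM_API_KEY", "VM_API_SECRET", "VM_PROJECT")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        # Falls back to the host shown in the snippet above.
        "api_host": env.get(
            "VM_API_HOST", "https://api.prod.validmind.ai/api/v1/tracking"
        ),
        "api_key": env["VM_API_KEY"],
        "api_secret": env["VM_API_SECRET"],
        "project": env["VM_PROJECT"],
    }


# Usage, once the variables are set in your environment:
# import validmind as vm
# vm.init(**vm_init_kwargs())
```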


## Initialize the Python environment

Next, set up your Python environment so that plots render inline in the notebook:

In [3]:
%matplotlib inline

## Load the sample dataset

The sample dataset is provided by the ValidMind library. To use it, import the dataset module and load the data into a pandas [DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), a two-dimensional tabular data structure organized in rows and columns:

In [4]:
from validmind.datasets.regression import fred_timeseries as demo_dataset

target_column = demo_dataset.target_column
feature_columns = demo_dataset.feature_columns

df = demo_dataset.load_data()
df

| DATE | MORTGAGE30US | FEDFUNDS | GS10 | UNRATE |
|---|---|---|---|---|
| 1971-04-01 | 7.29 | 4.16 | 5.83 | 5.9 |
| 1971-05-01 | 7.46 | 4.63 | 6.39 | 5.9 |
| 1971-06-01 | 7.54 | 4.91 | 6.52 | 5.9 |
| 1971-07-01 | 7.69 | 5.31 | 6.73 | 6.0 |
| 1971-08-01 | 7.69 | 5.57 | 6.58 | 6.1 |
| ... | ... | ... | ... | ... |
| 2022-11-01 | 6.58 | 3.78 | 3.89 | 3.6 |
| 2022-12-01 | 6.42 | 4.10 | 3.62 | 3.5 |
| 2023-01-01 | 6.13 | 4.33 | 3.53 | 3.4 |
| 2023-02-01 | 6.50 | 4.57 | 3.75 | 3.6 |
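
To make the DataFrame structure concrete, here is a small sketch that rebuilds the first three rows of the output above by hand. It only illustrates the layout (one row per date, one column per series) and is not how `load_data()` works internally:

```python
import pandas as pd

# First three rows of the sample output, typed in by hand for illustration.
sample = pd.DataFrame(
    {
        "MORTGAGE30US": [7.29, 7.46, 7.54],
        "FEDFUNDS": [4.16, 4.63, 4.91],
        "GS10": [5.83, 6.39, 6.52],
        "UNRATE": [5.9, 5.9, 5.9],
    },
    index=pd.to_datetime(["1971-04-01", "1971-05-01", "1971-06-01"]),
)
sample.index.name = "DATE"

print(sample.shape)  # (number of rows, number of columns)

# Label-based lookup of a single cell: row by date, column by name.
print(sample.loc[pd.Timestamp("1971-05-01"), "FEDFUNDS"])
```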


### Raw data 

In [5]:
vm_dataset = vm.init_dataset(
    input_id="raw_dataset",
    dataset=df,
    target_column=target_column,
)

2024-06-10 09:56:54,625 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


In [6]:
# Set run = True to execute this test and log the result to ValidMind.
run = False
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.TimeSeriesLinePlot:raw_data",
        inputs={"dataset": vm_dataset},
    )
    test.log()

In [7]:
# Set run = True to execute this test and log the result to ValidMind.
run = False
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.TimeSeriesMissingValues",
        inputs={"dataset": vm_dataset},
    )
    test.log()

In [8]:
# Set run = True to execute this test and log the result to ValidMind.
run = False
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.TimeSeriesOutliers",
        inputs={"dataset": vm_dataset},
    )
    test.log()

In [9]:
# Set run = True to execute this test and log the result to ValidMind.
run = False
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.SeasonalDecompose",
        inputs={"dataset": vm_dataset},
    )
    test.log()
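
The four raw-data cells above repeat one pattern: run a test against a dataset, then log the result. If you prefer a single switch, the pattern could be folded into a small helper along these lines. This is a sketch, not part of the ValidMind API: `run_and_log` and `RAW_DATA_TESTS` are illustrative names, and the test runner is passed in as an argument so the helper does not import the client library itself.

```python
RAW_DATA_TESTS = [
    "validmind.data_validation.TimeSeriesLinePlot:raw_data",
    "validmind.data_validation.TimeSeriesMissingValues",
    "validmind.data_validation.TimeSeriesOutliers",
    "validmind.data_validation.SeasonalDecompose",
]


def run_and_log(test_ids, dataset, run_test, enabled=True):
    """Run each test ID against `dataset` and call .log() on every result.

    `run_test` is expected to behave like `vm.tests.run_test` in the cells
    above: it takes a test ID and an `inputs` dict and returns an object
    with a `log()` method.
    """
    if not enabled:
        return []
    results = []
    for test_id in test_ids:
        result = run_test(test_id, inputs={"dataset": dataset})
        result.log()
        results.append(result)
    return results


# Usage in the notebook (assumes `vm` and `vm_dataset` from the cells above):
# run_and_log(RAW_DATA_TESTS, vm_dataset, vm.tests.run_test, enabled=False)
```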

## Preprocess data

In [10]:
from sklearn.model_selection import train_test_split

# Take the first difference of every variable
preprocessed_df = df.diff().dropna()
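
To see what first differencing does, here is a small sketch on three rows taken from the sample output. Each differenced value is the change from the previous month, and the first row is dropped because it has no predecessor:

```python
import pandas as pd

levels = pd.DataFrame(
    {"MORTGAGE30US": [7.29, 7.46, 7.54], "FEDFUNDS": [4.16, 4.63, 4.91]},
    index=pd.to_datetime(["1971-04-01", "1971-05-01", "1971-06-01"]),
)

# diff() subtracts the previous row from each row; the first row becomes NaN
# and dropna() removes it. round(2) only tidies floating-point noise.
changes = levels.diff().dropna().round(2)

print(changes)  # MORTGAGE30US changes: 0.17, 0.08; FEDFUNDS changes: 0.47, 0.28
```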

In [11]:
vm_preprocessed_ds = vm.init_dataset(
    input_id="preprocessed_ds",
    dataset=preprocessed_df,
    target_column=target_column,
)

2024-06-10 09:56:55,768 - INFO(validmind.client): Pandas dataset detected. Initializing VM Dataset instance...


In [12]:
# Set run = False to skip this test.
run = True
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.TimeSeriesLinePlot:preprocessed_data",
        inputs={"dataset": vm_preprocessed_ds},
    )
    test.log()

[Test result output: Time Series Line Plot Preprocessed Data]

In [19]:
# Set run = False to skip this test.
run = True
if run:
    test = vm.tests.run_test(
        "validmind.data_validation.TimeSeriesHistogram",
        inputs={"dataset": vm_preprocessed_ds},
        params={"nbins": 150},
    )
    test.log()

[Test result output: Time Series Histogram]

## Model training

In [21]:
# Split the data into train and test sets in time order (shuffle=False keeps the last 20% as the test set)
train_df, test_df = train_test_split(preprocessed_df, test_size=0.2, shuffle=False)

X_train = train_df.drop(target_column, axis=1)
y_train = train_df[target_column]
X_test = test_df.drop(target_column, axis=1)
y_test = test_df[target_column]
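
With `shuffle=False`, `train_test_split` cuts the data at a single point instead of sampling rows at random, which is what keeps future observations out of the training set. The sketch below illustrates this on a made-up ten-month series; the toy data is an assumption for illustration only:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten monthly observations of a made-up series, purely for illustration.
toy = pd.DataFrame(
    {"y": range(10)},
    index=pd.date_range("2020-01-01", periods=10, freq="MS"),
)

train, test = train_test_split(toy, test_size=0.2, shuffle=False)

# Every training date precedes every test date.
assert train.index.max() < test.index.min()

print(len(train), len(test))  # 8 2
```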

In [24]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Initialize the model
rf = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model
rf.fit(X_train, y_train)

# Make predictions
y_pred_diff = rf.predict(X_test)

# Recover the previous-period level of the target: level[t-1] = level[t] - diff[t]
previous_values = df.loc[X_test.index, target_column] - preprocessed_df.loc[X_test.index, target_column]

# Add the previous levels to the differenced predictions to get predictions on the original scale
y_pred_actual = y_pred_diff + previous_values

# Convert the differenced test targets back to the original scale in the same way
y_test_actual = y_test + previous_values

# Evaluate the model on the original (undifferenced) scale
mse = mean_squared_error(y_test_actual, y_pred_actual)
r2 = r2_score(y_test_actual, y_pred_actual)

print(f"Mean Squared Error: {mse}")
print(f"R-Squared: {r2}")

Mean Squared Error: 0.025865389039866685
R-Squared: 0.9642985147523984
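
The back-transformation in the cell above relies on the identity level[t] = level[t-1] + change[t]. The sketch below checks that identity on the first four `MORTGAGE30US` values from the sample output; it is an illustration only and does not involve the model. Note that `previous_values` in the notebook comes from observed data, so each converted prediction is a one-step-ahead forecast.

```python
import pandas as pd

level = pd.Series(
    [7.29, 7.46, 7.54, 7.69],
    index=pd.to_datetime(["1971-04-01", "1971-05-01", "1971-06-01", "1971-07-01"]),
    name="MORTGAGE30US",
)
diff = level.diff().dropna()

# The previous level is the current level minus its change.
previous = level.loc[diff.index] - diff

# Adding a change (actual or predicted) to the previous level gives a value
# on the original scale.
rebuilt = diff + previous

# The rebuilt values match the original levels up to floating-point error.
assert (rebuilt - level.loc[diff.index]).abs().max() < 1e-12
```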
