# Time-series forecasting - Lap records in Formula 1

## Overview

This step shows how to use the trained model to produce forecasts, following the usual steps of a machine learning pipeline.

A machine learning (ML) pipeline automates the workflow that produces a machine learning model. It consists of multiple sequential steps, from data extraction, preprocessing, and feature engineering to model training and deployment. Usually, it covers the path from development to deployment in an automated manner.

The diagram below shows the experimental workflow (without MLOps) used in this exercise:

<img src="../pictures/machine-learning-pipeline.png" width="800">

Essentially, in this machine learning workflow, the model is the product.

In [71]:
# import libraries

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import pandas as pd
pd.options.mode.chained_assignment = None
import numpy as np
import pickle
from ipywidgets import widgets, interact

## Loading the pre-trained ML model

The model trained in the previous step was saved to a _pickle_ file, so it is loaded here with the _pickle_ module.

In [72]:
with open('../data/05-trained-models/rf-trained-model.pkl', 'rb') as f:
    rf_regressor = pickle.load(f)
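
Unpickling can execute arbitrary code, so `pickle.load` should only be used on files from a trusted source. The following is a minimal sketch of a guarded loader. The helper name `load_model` and the `predict` check are assumptions for illustration and are not part of the original notebook; a stand-in object replaces the real model so the snippet runs on its own.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def load_model(path):
    """Load a pickled object and check that it exposes a predict() method.

    Hypothetical helper: only call it on files you trust, because
    unpickling can execute arbitrary code.
    """
    with open(path, "rb") as f:
        model = pickle.load(f)
    if not callable(getattr(model, "predict", None)):
        raise TypeError(f"{path} does not contain an object with predict()")
    return model


# Demo with stand-in objects written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good_path = Path(tmp) / "model.pkl"
    bad_path = Path(tmp) / "not-a-model.pkl"

    # Stand-in for a trained model: any object with a callable predict.
    good_path.write_bytes(pickle.dumps(SimpleNamespace(predict=len)))
    # A plain dict has no predict(), so the guard should reject it.
    bad_path.write_bytes(pickle.dumps({"weights": [1, 2, 3]}))

    loaded = load_model(good_path)
    try:
        load_model(bad_path)
        rejected = False
    except TypeError:
        rejected = True
```

With the real file, the call would presumably be `load_model('../data/05-trained-models/rf-trained-model.pkl')`.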

## Loading the test files for prediction

The test data is used for validation and evaluation.

In [73]:
df_test_features = pd.read_csv('../data/04-train-test-data/test_data.csv')
df_test_labels = pd.read_csv('../data/04-train-test-data/test_labels.csv')
print(df_test_features.shape)
print(df_test_labels.shape)

(182075, 4)
(182075, 1)
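
The two shapes printed above agree on the row count, which is what the later steps rely on. A small sketch of making that check explicit is shown here; `check_same_length` is a hypothetical helper, and the tiny frames are made-up stand-ins for the CSV files.

```python
import pandas as pd


def check_same_length(**frames):
    """Return the shared row count, or raise ValueError if the counts differ."""
    lengths = {name: len(frame) for name, frame in frames.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"row counts differ: {lengths}")
    return next(iter(lengths.values()))


# Tiny in-memory stand-ins for the CSV files (values are made up).
features = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 30.0]})
labels = pd.DataFrame({"target": [100.0, 200.0, 300.0]})

n_rows = check_same_length(features=features, labels=labels)

# A truncated label file is caught before any prediction is made.
try:
    check_same_length(features=features, labels=labels.iloc[:2])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In the notebook, the same call could take `df_test_features` and `df_test_labels` as arguments.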


In [81]:
X_test = df_test_features.values
y_test = df_test_labels.values
X_test[99:100]

array([[1.00000000e+00, 1.01311000e+05, 9.46555962e+04, 8.97570000e+01]])

In [85]:
df_pred_ident = pd.read_csv('../data/04-train-test-data/pred_ident.csv')
lap_identifier = df_pred_ident.values
print(lap_identifier.shape)
lap_identifier[88666:88667]

(182075, 2)


array([[16, 'Japanese Grand Prix']], dtype=object)

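## Forecasting

In [77]:
# Predict lap times for the test features
lt_pred = rf_regressor.predict(X_test)
lt_pred.shape

(182075,)

In [76]:
# Attach each prediction to its lap identifier (requires lt_pred from the cell above)
np_pred = np.column_stack((lap_identifier, lt_pred))
np_pred[:1]

array([[20, 'Australian Grand Prix', 82736.0]], dtype=object)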
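
Since the test labels are available, the forecasts can be scored against them. The sketch below computes the mean absolute error and root mean squared error with NumPy. The function name and the sample values are assumptions for illustration. The labels are loaded with shape `(n, 1)` while the predictions have shape `(n,)`, so both are flattened before subtracting.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return (MAE, RMSE) after flattening both inputs to 1-D."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError(f"shape mismatch: {y_true.shape} vs {y_pred.shape}")
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    return float(mae), float(rmse)


# Made-up values shaped like the notebook's arrays:
# labels are (n, 1), predictions are (n,).
y_true = np.array([[90000.0], [95000.0], [100000.0]])
y_pred = np.array([91000.0, 94000.0, 103000.0])

mae, rmse = regression_errors(y_true, y_pred)

# Without ravel(), (3, 1) - (3,) broadcasts to a (3, 3) matrix.
broadcast_shape = (y_true - y_pred).shape
```

In the notebook, the equivalent call would be `regression_errors(y_test, lt_pred)`.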

In [78]:
df_pred = pd.DataFrame(lt_pred)
df_pred.to_csv('../data/06-inferences/pred_data.csv', index=False)
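
The file written above holds a single unnamed column of predictions. One possible alternative, sketched here, stores each prediction next to its lap identifier under named columns. The column names `identifier`, `event`, and `predicted_lap_time` are hypothetical, since the notebook does not state them, and the second row of data is made up.

```python
import numpy as np
import pandas as pd

# Stand-ins for lap_identifier (n, 2) and lt_pred (n,); the second row is made up.
lap_identifier = np.array(
    [[20, "Australian Grand Prix"], [16, "Japanese Grand Prix"]],
    dtype=object,
)
lt_pred = np.array([82736.0, 91250.5])

# Column names are hypothetical; the notebook does not state them.
df_out = pd.DataFrame(lap_identifier, columns=["identifier", "event"])
df_out["predicted_lap_time"] = lt_pred

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df_out.to_csv(index=False)
header = csv_text.splitlines()[0]
```

Passing a path such as `'../data/06-inferences/pred_data.csv'` to `to_csv` would write the same table to disk.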

## Further forecasting

To forecast for a specific driver and event, prepare a record in the same layout as the test data and follow the steps above.

Sample data:

[16, 'Japanese Grand Prix', 1, 10368, 9812, 8139]
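
A sketch of turning such a record into model input follows. It assumes the first two fields are the lap identifier (matching the two columns of `pred_ident.csv`) and the remaining four are the features (matching the four columns of `test_data.csv`). `split_record` and `MeanModel` are hypothetical; `MeanModel` only stands in for `rf_regressor` so the snippet runs without the trained model.

```python
import numpy as np

N_IDENT = 2  # assumption: the first two fields identify the lap


def split_record(record, n_ident=N_IDENT):
    """Split a record into identifier fields and a (1, n_features) array."""
    ident = list(record[:n_ident])
    features = np.asarray(record[n_ident:], dtype=float).reshape(1, -1)
    return ident, features


class MeanModel:
    """Hypothetical stand-in for rf_regressor; returns the mean of each row."""

    def predict(self, X):
        return np.asarray(X, dtype=float).mean(axis=1)


record = [16, 'Japanese Grand Prix', 1, 10368, 9812, 8139]
ident, features = split_record(record)

# predict() expects a 2-D array, one row per lap, hence the reshape above.
prediction = MeanModel().predict(features)
```

In the notebook, `rf_regressor.predict(features)` would replace the stand-in call.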

## Summary

In this exercise, the steps are carried out manually for experimental and exploratory purposes.

Once the experiment and development stage is complete, an MLOps (Machine Learning Operations) framework is recommended as guidance for developing, deploying, and monitoring the whole AI/ML project.

MLOps is a set of practices that automate and simplify ML workflows and deployments. It can be used to automate and standardize processes across the ML lifecycle, including model development, testing, integration, release, and infrastructure management.

MLOps is critical for systematically and simultaneously managing the release of new ML models alongside application code and data changes. ML assets are treated like other software assets in a continuous integration and delivery (CI/CD) environment. The following diagram shows the automated MLOps lifecycle for reference.

<img src="../pictures/mlops.png" width="800" align="center"/>