# Model Ranking and Best Model Selection
- Aggregates evaluation results from all trained models:

    - ARIMA (`auto_arima` optimization)
    - XGBoost (Optuna optimization)
    - LSTM (neural network)

- Compares their performance metrics and identifies the best-performing model.

Once the best model is selected, we perform error analysis on it and visualize its performance through plots.

- Purpose:

    - Provide a unified comparison of all candidate models
    - Select the most suitable model for deployment
    - Conduct error analysis and performance visualization for the chosen model

# Root Configuration

In [1]:
import sys
import os
from pathlib import Path

# get project root as parent of current working directory
project_root = Path(os.getcwd()).parent

if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

# Libraries

In [2]:
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from pathlib import Path

# Model Comparison

## Metrics Overview

In [3]:
# model performance path
model_performance_dir = Path(r"../artifacts/model-performance")

# performance files
model_performance = pd.read_csv(model_performance_dir / "a_ModelPerformance.csv")
overfitting_analysis = pd.read_csv(model_performance_dir / "a_OverfittingAnalysis.csv")

In [4]:
# Table 1: All Models Performance
print("=== TABLE 1: OVERALL MODELS PERFORMANCE ===".center(110))
display(model_performance)

                                 === TABLE 1: OVERALL MODELS PERFORMANCE ===                                  


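|   | Model | Test MAE | Test R2-Score | Test MAPE | CV MAE | CV R2 | CV MAPE | RMSE Increase | Overfitting Ratio |
|---|---------|--------|--------|--------|---------|---------|--------|----------|---------|
| 0 | Arima   | 24.663 | -0.071 | 12.54  | 121.647 | -48.653 | 84.104 | -104.375 | 0.234   |
| 1 | XGBoost | 53.752 | -4.351 | 38.378 | 6.272   | 0.294   | 9.406  | 49.399   | 5.913   |
| 2 | LSTM    | 7.595  | 0.919  | 4.117  | 0.07    | -0.191  | 12.897 | 8.624    | 101.567 |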


In [5]:
# Table 2: Overfitting Analysis
print("=== TABLE 2: OVERFITTING ANALYSIS ===".center(70))
display(overfitting_analysis)

                === TABLE 2: OVERFITTING ANALYSIS ===                 


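|   | Model   | CV_RMSE | Test_RMSE | RMSE_Increase | Overfitting_Ratio |
|---|---------|---------|-----------|---------------|-------------------|
| 0 | Arima   | 136.202 | 31.828    | -104.375      | 0.234             |
| 1 | XGBoost | 10.055  | 59.453    | 49.399        | 5.913             |
| 2 | LSTM    | 0.086   | 8.71      | 8.624         | 101.567           |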
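
The notebook does not state how `RMSE_Increase` and `Overfitting_Ratio` are defined. The reported values are consistent with `Test_RMSE - CV_RMSE` and `Test_RMSE / CV_RMSE`, so the sketch below recomputes both columns under that assumption. The DataFrame name `overfitting` is introduced here for illustration only, and the inputs are the rounded values from Table 2.

```python
import pandas as pd

# Values copied from Table 2 (already rounded to three decimals).
overfitting = pd.DataFrame({
    "Model": ["Arima", "XGBoost", "LSTM"],
    "CV_RMSE": [136.202, 10.055, 0.086],
    "Test_RMSE": [31.828, 59.453, 8.71],
})

# Assumed definitions, inferred from the reported numbers (not confirmed by the source):
#   RMSE_Increase     = Test_RMSE - CV_RMSE  (positive: the test error is larger)
#   Overfitting_Ratio = Test_RMSE / CV_RMSE  (above 1: the test error is larger)
overfitting["RMSE_Increase"] = overfitting["Test_RMSE"] - overfitting["CV_RMSE"]
overfitting["Overfitting_Ratio"] = overfitting["Test_RMSE"] / overfitting["CV_RMSE"]

print(overfitting.round(3))
```

Because the inputs are rounded, the recomputed values can differ slightly from the table. The LSTM ratio comes out near 101.3 rather than 101.567, since its `CV_RMSE` of 0.086 carries only two significant digits.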


In [6]:
# rank models by Test MAPE (lower is better); Test MAE and CV MAPE only break ties
model_ranking = model_performance.sort_values(by=['Test MAPE', 'Test MAE', 'CV MAPE'], ascending=True).reset_index(drop=True)

print("=== TABLE 3: MODELS RANKING ===".center(110))
display(model_ranking)


                                       === TABLE 3: MODELS RANKING ===                                        


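|   | Model | Test MAE | Test R2-Score | Test MAPE | CV MAE | CV R2 | CV MAPE | RMSE Increase | Overfitting Ratio |
|---|---------|--------|--------|--------|---------|---------|--------|----------|---------|
| 0 | LSTM    | 7.595  | 0.919  | 4.117  | 0.07    | -0.191  | 12.897 | 8.624    | 101.567 |
| 1 | Arima   | 24.663 | -0.071 | 12.54  | 121.647 | -48.653 | 84.104 | -104.375 | 0.234   |
| 2 | XGBoost | 53.752 | -4.351 | 38.378 | 6.272   | 0.294   | 9.406  | 49.399   | 5.913   |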
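
`sort_values` with several keys sorts lexicographically: later keys matter only when earlier ones tie. With the values above, the ranking is therefore decided by Test MAPE alone. If all three metrics should contribute, one possible alternative (not the method used in this notebook) is to rank each metric separately and average the ranks. The sketch below assumes equal weight per metric; the names `performance`, `ranks`, and `mean_rank_order` are illustrative.

```python
import pandas as pd

# Values copied from Table 1 (only the three ranking metrics).
performance = pd.DataFrame({
    "Model": ["Arima", "XGBoost", "LSTM"],
    "Test MAE": [24.663, 53.752, 7.595],
    "Test MAPE": [12.54, 38.378, 4.117],
    "CV MAPE": [84.104, 9.406, 12.897],
})

metrics = ["Test MAPE", "Test MAE", "CV MAPE"]  # lower is better for all three

# Rank each metric on its own (1 = best), then average the ranks per model.
ranks = performance[metrics].rank(ascending=True, method="min")
performance["Mean Rank"] = ranks.mean(axis=1)

mean_rank_order = performance.sort_values("Mean Rank").reset_index(drop=True)
print(mean_rank_order[["Model", "Mean Rank"]])
```

Under this scheme LSTM still ranks first, while Arima and XGBoost tie on mean rank because XGBoost has the lowest CV MAPE. A tie like this would need an explicit tie-breaking rule before it could drive model selection.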


## Visualize Comparison