# Forecasting Weekly Hotel Cancellations with an ARIMA Model
Remember the weekly time series that you formed in Milestone 1? Your objective is to train an ARIMA model using this newly formed time series to forecast weekly cancellations for the hotel.


## Goal: 
The deliverable for this milestone is a Jupyter notebook outlining the following:

Configuration of an ARIMA model using pmdarima, with the model of best fit ranked according to the BIC

Calculation of model accuracy against the test set using RMSE

In [41]:
import warnings
import pandas as pd

import pmdarima as pm
from sklearn.metrics import root_mean_squared_error

import plotly.graph_objects as go

# Set pandas options to display more columns and rows
pd.set_option("display.max_columns", None)  # Show all columns
pd.set_option("display.max_rows", 50)  # Show up to 50 rows

# Silence warnings because pdarima is verbose
warnings.filterwarnings("ignore")

In [42]:
df = pd.read_csv("data/H1.csv")
df = df.sort_values(by=["ArrivalDateYear", "ArrivalDateWeekNumber"])
df["Date_Year_Week"] = df["ArrivalDateYear"].astype(str) + df[
    "ArrivalDateWeekNumber"
].astype(str)

cancellations_per_week = df.groupby("Date_Year_Week")["IsCanceled"].sum().reset_index()

In [46]:
# Calculate split point at 90% of data. This plot shows there's potentially data missing :).

split_point = int(len(cancellations_per_week) * 0.9)
print(f"Split point: {split_point}")

# Split the data
X_train = cancellations_per_week.iloc[:split_point]["Date_Year_Week"]
X_test = cancellations_per_week.iloc[split_point:]["Date_Year_Week"]

Y_train = cancellations_per_week.iloc[:split_point]["IsCanceled"]
Y_test = cancellations_per_week.iloc[split_point:]["IsCanceled"]

# Train arima
arima = pm.auto_arima(
    Y_train,
    start_p=0,
    start_q=0,
    max_p=10,
    max_q=10,
    start_P=0,
    start_Q=0,
    max_P=10,
    max_Q=10,
    m=51,
    stepwise=True,
    seasonal=True,
    information_criterion="bic",
    trace=True,
    d=1,
    D=1,
    error_action="warn",
    suppress_warnings=True,
    random_state=20,
    n_fits=30,
)

Split point: 103
Performing stepwise search to minimize bic
 ARIMA(0,1,0)(0,1,0)[51]             : BIC=611.924, Time=0.08 sec
 ARIMA(1,1,0)(1,1,0)[51]             : BIC=inf, Time=0.80 sec
 ARIMA(0,1,1)(0,1,1)[51]             : BIC=inf, Time=1.19 sec
 ARIMA(0,1,0)(1,1,0)[51]             : BIC=inf, Time=2.14 sec
 ARIMA(0,1,0)(0,1,1)[51]             : BIC=inf, Time=0.72 sec
 ARIMA(0,1,0)(1,1,1)[51]             : BIC=619.328, Time=2.01 sec
 ARIMA(1,1,0)(0,1,0)[51]             : BIC=592.095, Time=0.15 sec
 ARIMA(1,1,0)(0,1,1)[51]             : BIC=inf, Time=0.70 sec
 ARIMA(1,1,0)(1,1,1)[51]             : BIC=599.826, Time=0.86 sec
 ARIMA(2,1,0)(0,1,0)[51]             : BIC=585.987, Time=0.22 sec
 ARIMA(2,1,0)(1,1,0)[51]             : BIC=589.918, Time=0.96 sec
 ARIMA(2,1,0)(0,1,1)[51]             : BIC=589.918, Time=0.61 sec
 ARIMA(2,1,0)(1,1,1)[51]             : BIC=593.850, Time=1.21 sec
 ARIMA(3,1,0)(0,1,0)[51]             : BIC=587.833, Time=0.29 sec
 ARIMA(2,1,1)(0,1,0)[51]            

In [44]:
y_hat = arima.predict(n_periods=Y_test.shape[0], index=X_test)

rmse = root_mean_squared_error(Y_test, y_hat)

print(f"RMSE: {rmse}")

RMSE: 40.063526202126745


In [45]:
# Create figure
fig = go.Figure()

# Add training data
fig.add_trace(go.Scatter(y=Y_train, x=X_train, name="Training Data", mode="lines"))

# Add test data
fig.add_trace(go.Scatter(y=Y_test, x=X_test, name="Test Data", mode="lines"))

# Add predictions
fig.add_trace(go.Scatter(y=y_hat, x=X_test, name="Predictions", mode="lines"))

# Update layout
fig.update_layout(
    title="Hotel Cancellations: Actual vs Predicted",
    xaxis_title="Week",
    yaxis_title="Number of Cancellations",
    showlegend=True,
)

fig.show()