https://www.kaggle.com/code/prashant111/tutorial-time-series-forecasting-with-prophet

In [0]:
%pip install prophet

In [0]:
%restart_python

In [0]:
spark.sql('use catalog dbacademy')
spark.sql('use schema labuser9128531_1738705451')

In [0]:
current_catalog=spark.sql("SELECT current_catalog() AS catalog").collect()[0]['catalog']
print(current_catalog)

In [0]:
current_db = spark.sql("SELECT current_database()").collect()[0][0]
print(current_db)

In [0]:
#create volume
# volumeName='air'
# spark.sql("CREATE VOLUME " + current_catalog+ "." + current_db + "." + volumeName)

In [0]:
databaseName='air_wy'
spark.sql("create database if not exists " + current_catalog + "." + databaseName)

In [0]:
path='/Volumes/dbacademy/labuser9128531_1738705451/air/AirPassengers.csv'
# Read the CSV file into a DataFrame
df = spark.read.csv(path, header=True, inferSchema=True)

In [0]:
df.show()

In [0]:
from prophet import Prophet
from prophet.plot import plot_plotly
import plotly.offline as py
py.init_notebook_mode()

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')

## Understand data

In [0]:
type(df)

In [0]:
df_pd=df.toPandas()

In [0]:
df_pd.info()

In [0]:
df_pd['Date'] = pd.to_datetime(df_pd['Date'])

In [0]:
df_pd.info()

In [0]:
df_pd.head(10)

In [0]:
# Prophet expects the date column to be named 'ds' and the value column 'y'
df_pd.rename(columns={'Date':'ds', 'Passengers':'y'}, inplace=True)
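
Before fitting, it can help to confirm that the frame has the shape Prophet expects. The sketch below is a minimal check under stated assumptions: `check_prophet_frame` is a hypothetical helper (not part of Prophet), the three rows are toy data, and the checks are deliberately stricter than Prophet itself, which will also parse date strings on its own.

```python
import pandas as pd

def check_prophet_frame(frame):
    """Return a list of problems; an empty list means the frame looks usable."""
    problems = [f"missing column: {col}" for col in ("ds", "y") if col not in frame.columns]
    if problems:
        return problems
    if not pd.api.types.is_datetime64_any_dtype(frame["ds"]):
        problems.append("ds is not a datetime column")
    if not pd.api.types.is_numeric_dtype(frame["y"]):
        problems.append("y is not numeric")
    return problems

# toy rows that mimic the notebook's column names (illustrative values only)
raw = pd.DataFrame({"Date": ["1949-01", "1949-02", "1949-03"],
                    "Passengers": [112, 118, 132]})
raw["Date"] = pd.to_datetime(raw["Date"])
ready = raw.rename(columns={"Date": "ds", "Passengers": "y"})

print(check_prophet_frame(raw))    # problems before renaming
print(check_prophet_frame(ready))  # no problems after renaming
```

In this notebook the equivalent call would be `check_prophet_frame(df_pd)`.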

## Visualize data

In [0]:
ax = df_pd.set_index('ds').plot(figsize=(12, 8))
ax.set_ylabel('Monthly Number of Airline Passengers')
ax.set_xlabel('Date')

plt.show()

## Forecast with Prophet

### Instantiate a Prophet object, specifying any arguments

In [0]:
# set the uncertainty interval to 95% (the Prophet default is 80%)
my_model = Prophet(interval_width=0.95)

### Fit model

In [0]:
my_model.fit(df_pd)

#### Generate 36 datestamps in the future

In [0]:
future_dates = my_model.make_future_dataframe(periods=36, freq='MS') # start of the month
future_dates.head()
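
By default `make_future_dataframe` returns the historical dates followed by the new ones, so `head()` shows the start of the history rather than the forecast horizon. The sketch below reproduces only the future part with plain pandas to show what `periods=36, freq='MS'` means. It assumes the last observed month is 1960-12-01, which holds for the classic AirPassengers series but may not hold for the file used here.

```python
import pandas as pd

# assumption: last observation in the history (classic AirPassengers ends here)
last_observed = pd.Timestamp("1960-12-01")
periods = 36

# 'MS' is the month-start frequency: one timestamp on the first day of each month
candidates = pd.date_range(start=last_observed, periods=periods + 1, freq="MS")

# keep only dates strictly after the history
future_only = candidates[candidates > last_observed][:periods]

print(len(future_only), future_only[0].date(), future_only[-1].date())
```

With the fitted model, the same rows can be selected with `future_dates[future_dates['ds'] > df_pd['ds'].max()]`.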

In [0]:
forecast = my_model.predict(future_dates)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head()

In the plot below, the black dots are the observed values, the blue line is the predicted values, and the blue shaded region is the uncertainty interval.

In [0]:
my_model.plot(forecast, uncertainty=True)
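
One way to sanity-check the shaded region is to measure how many observed points fall inside it. The sketch below is an illustration, not part of Prophet: `interval_coverage` is a hypothetical helper and the two small frames are toy data that only borrow Prophet's column names (`ds`, `y`, `yhat_lower`, `yhat_upper`).

```python
import pandas as pd

def interval_coverage(observed, forecast):
    """Share of observed y values that lie inside [yhat_lower, yhat_upper]."""
    merged = observed.merge(
        forecast[["ds", "yhat_lower", "yhat_upper"]], on="ds", how="inner"
    )
    inside = merged["y"].between(merged["yhat_lower"], merged["yhat_upper"])
    return float(inside.mean())

# toy data: the second observation falls below its interval
dates = pd.date_range("2000-01-01", periods=4, freq="MS")
toy_observed = pd.DataFrame({"ds": dates, "y": [10, 20, 30, 40]})
toy_forecast = pd.DataFrame({
    "ds": dates,
    "yhat_lower": [8, 21, 25, 35],
    "yhat_upper": [12, 25, 35, 45],
})

coverage = interval_coverage(toy_observed, toy_forecast)
print(coverage)
```

Applied to this notebook as `interval_coverage(df_pd, forecast)`, the result is an in-sample figure, so it is only a rough indication of how the 95% interval would behave on future data.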

### View the components

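The trend line shows that the number of passengers increased over the years.

The yearly seasonality shows that passenger numbers peak during the summer (July and August). Because the data is monthly, apparent movements within a month, such as a rise at the end of the month, are not supported by the observations and should be read with caution.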

In [0]:
my_model.plot_components(forecast)

## Adding change points

Changepoints are the points in time where the time series abruptly changes its trajectory.

By default, Prophet places 25 potential changepoints uniformly in the first 80% of the dataset.
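
The default placement can be approximated without Prophet. The sketch below assumes evenly spaced row positions over the first 80% of the history and skips the very first row. `default_changepoint_indexes` is a hypothetical helper, and the 144 rows match the classic AirPassengers series but are an assumption about the file used here.

```python
import math

def default_changepoint_indexes(n_rows, n_changepoints=25, changepoint_range=0.8):
    """Approximate row positions of evenly spaced potential changepoints."""
    # only the first `changepoint_range` share of the history is eligible
    hist_size = math.floor(n_rows * changepoint_range)
    # there cannot be more changepoints than eligible rows after the first one
    n_changepoints = min(n_changepoints, hist_size - 1)
    step = (hist_size - 1) / n_changepoints
    # start at 1 so that no changepoint sits on the first observation
    return [round(i * step) for i in range(1, n_changepoints + 1)]

indexes = default_changepoint_indexes(144)  # assumption: 144 monthly rows
print(len(indexes), indexes[0], indexes[-1])
```

After fitting, the actual dates are available in `my_model.changepoints`, so the approximation can be compared against `df_pd['ds'].iloc[indexes]`.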

Let’s plot vertical lines at the changepoints where the trend rate changes noticeably.

In [0]:
from prophet.plot import add_changepoints_to_plot
fig = my_model.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), my_model, forecast)

In [0]:
my_model.changepoints

In [0]:
len(my_model.changepoints)

In [0]:
# allow changepoints in the first 90% of the history instead of the default 80%
pro_change = Prophet(changepoint_range=0.9)
forecast = pro_change.fit(df_pd).predict(future_dates)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)

In [0]:
# use 20 potential changepoints instead of the default 25 and force yearly seasonality on
pro_change = Prophet(n_changepoints=20, yearly_seasonality=True)
forecast = pro_change.fit(df_pd).predict(future_dates)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)

## Adjust trend
Prophet allows us to adjust the trend if the model overfits or underfits.

changepoint_prior_scale controls how flexible the trend is.

Default value for changepoint_prior_scale is 0.05.

Decrease the value to make the trend less flexible.

Increase the value of changepoint_prior_scale to make the trend more flexible.
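
To see why this parameter matters, it helps to picture the trend as a piecewise-linear curve whose slope may change at each changepoint. Prophet puts a sparse prior on those slope changes, and changepoint_prior_scale is the scale of that prior: a small value pulls the changes toward zero, a large value lets them grow. The sketch below is a simplified illustration with hand-picked slope changes, not Prophet's fitting procedure, and `piecewise_linear_trend` is a hypothetical helper.

```python
def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend value at time t for base rate k, offset m and slope changes deltas."""
    rate, offset = k, m
    for s, delta in zip(changepoints, deltas):
        if t >= s:
            rate += delta        # the slope changes by delta after changepoint s
            offset -= s * delta  # adjust the offset so the curve stays continuous at s
    return rate * t + offset

times = [0.0, 0.25, 0.5, 0.75, 1.0]

# slope change shrunk to zero, as a very small prior scale would encourage
rigid = [piecewise_linear_trend(t, 1.0, 0.0, [0.5], [0.0]) for t in times]

# a large slope change at t = 0.5, as a larger prior scale would permit
flexible = [piecewise_linear_trend(t, 1.0, 0.0, [0.5], [1.0]) for t in times]

print(rigid)
print(flexible)
```

The rigid curve is a single straight line, while the flexible one bends at the changepoint without jumping.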

The next cell increases changepoint_prior_scale to 0.08 to make the trend more flexible.

In [0]:
pro_change = Prophet(n_changepoints=20, yearly_seasonality=True, changepoint_prior_scale=0.08)
forecast = pro_change.fit(df_pd).predict(future_dates)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)

In [0]:
# decrease changepoint_prior_scale to 0.001 to make the trend much less flexible
pro_change = Prophet(n_changepoints=20, yearly_seasonality=True, changepoint_prior_scale=0.001)
forecast = pro_change.fit(df_pd).predict(future_dates)
fig = pro_change.plot(forecast)
a = add_changepoints_to_plot(fig.gca(), pro_change, forecast)