# Introduction

In this notebook, we perform exploratory data analysis and time series forecasting in Python, using the GEFCom2014 energy demand forecasting dataset.

The notebook follows this process:

1. Start with **data description and cleaning**
2. Use **EDA** for better understanding of the dataset
3. **Time series prediction** to forecast energy demand and load.

## Import important libraries

In [None]:
# warnings for programs
import warnings
from datetime import time
from math import sqrt

# scientific math
import numpy as np 
# data processing
import pandas as pd

import statsmodels.api as sm
from sklearn.preprocessing import StandardScaler
# visualisation library
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

from plotly.subplots import make_subplots
import plotly.graph_objects as go

# sklearn

from sklearn.model_selection import train_test_split

%matplotlib inline 
# warnings.filterwarnings("ignore")

# ignore filters
warnings.filterwarnings("ignore")


## Load the data

This tutorial uses the Global Energy Forecasting Competition 2014 (GEFCom2014) dataset, which contains 11 years of hourly temperature and 9 years of hourly load.

In [None]:
# dataset path 
data_path = "data/energy.csv"

# load csv 
data = pd.read_csv(data_path)

# show the first 5 observations in the dataset
data.head()

## Data processing and cleaning

In [None]:
# check the data types
data.info()

From the output above, we can see that the **load** column has some missing values. 


In [None]:
# check for missing values in the dataset
data.isna().sum()

Approximately 18% of the values in the load column are missing. Since the dataset is large, we proceed by removing the observations with missing values.
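The share of missing values per column can be computed directly from `isna()`. The sketch below uses a small made-up frame (the values are illustrative, not taken from GEFCom2014) to show the calculation.

```python
import numpy as np
import pandas as pd

# toy frame standing in for the real data; the values are made up
toy = pd.DataFrame({
    "load": [np.nan, np.nan, 10.0, 12.0, 11.0],
    "T": [30.0, 31.0, 29.0, 28.0, 27.0],
})

# isna() returns booleans, so the column mean is the fraction that is missing
missing_pct = toy.isna().mean() * 100
print(missing_pct.round(1))
```

Applied to the real `data` frame, the same expression would report the percentage quoted above for the load column.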

In [None]:
# drop missing values
data_clean = (data
       .dropna()
       .reset_index(drop=True))

# display the first 5 observations
data_clean.head()

The date column has the `object` dtype and the hour column is an integer. We concatenate these two columns into one and convert the combined column to pandas datetime values, which then become the index.

The combined column needs the format ***dd/mm/yyyy H:M***, which takes a few steps.

We do this in the next cells.

In [None]:
# create a list from the hour column
hour_list = data_clean["Hour"].tolist()
# create an empty list 
hour_list_ = []
# loop through each value in the hour_list
for hours in hour_list:
    # if the value is equal to 24
    if hours == 24:
        # set the number to 0
        hours = 0
    # use the time function to convert the hour value to a time object
    hour_str = time(hour=hours, minute=0).strftime("%H:%M")
    # append the new time value format to the empty list
    hour_list_.append(hour_str)

In [None]:
# replace hour list
data_clean["Hour"] = hour_list_

In [None]:
# concatenate date and time
data_clean["Date"] = data_clean["Date"] + " " + data_clean["Hour"]

In [None]:
# convert date column to date time
data_clean["Date"] = pd.to_datetime(data_clean["Date"], format="%d/%m/%Y %H:%M")
# set new index
data_clean = data_clean.set_index(["Date"])


In [None]:
# check first 24 observations
data_clean.head(24)

00:00 should be at the start of the day, not the end. Let's fix that by sorting the index.

In [None]:
data_clean = (data_clean.
              sort_index()
             )
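A possible alternative, shown here only as a sketch: if the `Hour` column follows an hour-ending convention (an assumption, where hour 24 means the hour that ends at midnight of the following day), the timestamp can be built by adding the hour as a time offset to the parsed date. Hour 24 then lands on 00:00 of the next day rather than 00:00 of the same day. The toy frame below is made up; only the column names `Date`, `Hour` and `load` come from the notebook.

```python
import pandas as pd

# toy rows in the same layout as the notebook's data (values are made up)
toy = pd.DataFrame({
    "Date": ["01/01/2006", "01/01/2006", "02/01/2006"],
    "Hour": [23, 24, 1],
    "load": [1.0, 2.0, 3.0],
})

# parse the date, then add the hour as an offset; hour 24 rolls into the next day
stamp = pd.to_datetime(toy["Date"], format="%d/%m/%Y") + pd.to_timedelta(toy["Hour"], unit="h")

toy_indexed = (toy
               .drop(columns=["Date", "Hour"])
               .set_index(stamp.rename("Date"))
               .sort_index())
print(toy_indexed)
```

Whether this or the approach used in the notebook is right depends on how the dataset defines hour 24, so it is worth checking the dataset documentation before switching.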

Extract time series features - day, hour, quarter, month, year  - for time series analysis

In [None]:
# create a copy of df
data_clean_ = data_clean.copy()
data_clean_.head()

In [None]:
# extract hour 
data_clean_['hour'] = data_clean_.index.hour
# extract day of the week 
data_clean_['dayOfWeek'] = data_clean_.index.dayofweek
# extract quarter
data_clean_['quarter'] = data_clean_.index.quarter
# extract the month
data_clean_['month'] = data_clean_.index.month
# extract the year
data_clean_['year'] = data_clean_.index.year

In [None]:
data_clean_.head()

In [None]:
# make change to day of week column
# 0 - monday to 1 - monday
# 6 - Sunday to 7 - sunday
data_clean_["dayOfWeek"] = data_clean_["dayOfWeek"] + 1 

In [None]:
data_clean = data_clean_

## Exploratory Data Analysis and Time Series Analysis
 
Let us look at the following time-related information and features:
- Trend: investigate whether there is any visible change in load and temperature, such as an increase or decrease in value over a prolonged period of time.
- Seasonality: investigate any recurring and persistent patterns within a fixed time period.
- Cyclic: investigate any recurring upward or downward changes that do not have a fixed period.
- Noise: investigate any irregular and unpredictable changes in the data.

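The `statsmodels` module has the `seasonal_decompose()` function, which allows us to **decompose** the time series into trend, seasonal and residual (noise) components. It does not extract a separate cyclic component; any cyclic behaviour ends up in the trend or the residual.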

In [None]:
# create visualisations for the demand in various timeframes
fig = make_subplots(2,1)

# 
fig.add_trace(
    go.Scatter(x = data_clean.index, y=data_clean["load"],name=" hourly load")
)
fig.add_trace(
    go.Scatter(x = data_clean.index, y=data_clean["T"], name="hourly temperature"), row=2, col=1
)

fig.update_layout(height=900, width=1000, title_text="Energy load ")
fig.show()

In [None]:
# # plot load and temperature column
# fig, (ax1,ax2) = plt.subplots(2,1,figsize=(15, 15))
# ax1.plot(data_clean.index, data_clean["load"], marker="*", linestyle="-", markersize=2)
# ax1.grid(True)
# ax1.set_xlabel("Date")
# ax1.set_ylabel("Load")
# ax1.set_title("Hourly load from 2006 - 2015")

# ax2.plot(data_clean.index, data_clean["T"], marker="*", linestyle="-", markersize=4)
# ax2.grid(True)
# ax2.set_xlabel("Date")
# ax2.set_ylabel("Temperature")
# ax2.set_title("Hourly temperature from 2006 - 2015")
# # matplotlib.rcParams['figure.figsize'] = [24, 6]

In [None]:
# use the square-root rule to determine the number of bins
num_samples = len(data_clean.index)
num_bins = int(sqrt(num_samples))
num_bins

In [None]:
# lets look at the distribution
data_clean["load"].plot.hist(bins=num_bins,  edgecolor = 'black')

In [None]:
data_clean["T"].plot.hist(bins=num_bins,  edgecolor = 'black')

In [None]:
# create visualisations for the demand in various timeframes
fig = make_subplots(5,1)

fig.add_trace(
    go.Box(x = data_clean['hour'], y=data_clean['load'],name="hour")
)
fig.add_trace(
    go.Box(x = data_clean['dayOfWeek'], y=data_clean['load'], name="dayofWeek"), row=2, col=1
)
fig.add_trace(
    go.Box(x = data_clean['month'], y=data_clean['load'], name="month"), row=3, col=1
)
fig.add_trace(
    go.Box(x = data_clean['quarter'], y=data_clean['load'], name="quarter"), row=4, col=1
)
fig.add_trace(
    go.Box(x = data_clean['year'], y=data_clean['load'], name="year"), row=5, col=1
)
fig.update_layout(height=900, width=1000, title_text="Power load by various timeframes")
fig.show()

In [None]:
# create visualisations for the demand in various timeframes
fig = make_subplots(5,1)

fig.add_trace(
    go.Box(x = data_clean['hour'], y=data_clean['T'], name="hour")
)
fig.add_trace(
    go.Box(x = data_clean['dayOfWeek'], y=data_clean['T'], name="dayofWeek"), row=2, col=1
)
fig.add_trace(
    go.Box(x = data_clean['month'], y=data_clean['T'], name="month"), row=3, col=1
)
fig.add_trace(
    go.Box(x = data_clean['quarter'], y=data_clean['T'], name="quarter"), row=4, col=1
)
fig.add_trace(
    go.Box(x = data_clean['year'], y=data_clean['T'], name="year"), row=5, col=1
)
fig.update_layout(height=900, width=1000, title_text="Temperature by various timeframes")
fig.show()

In [None]:
# set the default figure size before plotting so that it applies to this figure
matplotlib.rcParams['figure.figsize'] = [24, 10]
# decompose the hourly load
load_decomposition = sm.tsa.seasonal_decompose(data_clean["load"])
# create figure
fig_load = load_decomposition.plot()

In [None]:
# set the default figure size before plotting so that it applies to this figure
matplotlib.rcParams['figure.figsize'] = [24, 10]
# decompose the hourly temperature
temp_decomposition = sm.tsa.seasonal_decompose(data_clean["T"])
# create figure
fig_temp = temp_decomposition.plot()

From the two seasonal figures above, we can see there is not much information to be gleaned from the plot. Therefore, we will resample the data from an hourly scale to a daily scale. 

Before resampling the dataset, we need to separate the load and temperature columns, because they are aggregated differently. For the load column we want the daily load, which means calculating the **sum** for each day. For the temperature, the **mean** temperature for the day is more informative.
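The two aggregations can be illustrated on a made-up hourly frame. This is a sketch with toy values; only the column names `load` and `T` come from the notebook, and using a single `agg` call with a per-column mapping is one option among several.

```python
import numpy as np
import pandas as pd

# two days of toy hourly data: constant load, temperature equal to the hour of day
idx = pd.date_range("2020-01-01", periods=48, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame({
    "load": np.ones(48),
    "T": np.tile(np.arange(24, dtype=float), 2),
}, index=idx)

# sum the load and average the temperature within each calendar day
daily = hourly.resample("D").agg({"load": "sum", "T": "mean"})
print(daily)
```

One thing to keep in mind is that a daily sum over a day with missing hours is smaller than a full day's total, so gaps in the hourly data show up as artificial dips.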

In [None]:
# separate the load and temperature into their own dataframes
load_df = pd.DataFrame(data_clean["load"])
temp_df = pd.DataFrame(data_clean["T"])

In [None]:
len(load_df)

In [None]:
# resample the load to daily sum
load_df_daily = load_df.resample("D").sum()
# resample the temperature df to daily mean
temp_df_daily = temp_df.resample("D").mean()

In [None]:
len(load_df_daily)

Now, let us decompose these daily time series.

In [None]:
# set the default figure size before plotting so that it applies to this figure
matplotlib.rcParams['figure.figsize'] = [24, 12]
# decompose the daily load
load_decomposition = sm.tsa.seasonal_decompose(load_df_daily["load"])
# create figure
fig_load = load_decomposition.plot()

In [None]:
# set the default figure size before plotting so that it applies to this figure
matplotlib.rcParams['figure.figsize'] = [24, 12]
# decompose the daily mean temperature
temp_decomposition = sm.tsa.seasonal_decompose(temp_df_daily["T"])
# create figure
fig_temp = temp_decomposition.plot()

## Time Series Forecasting

Simple forecasting methods that can be used as benchmarks:

- average method
- naive method
- seasonal naive method
- drift method

### Data Splitting
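To validate the performance of each method, we split the dataset chronologically into train/validation/test sets. The code below first holds out the last 10% of the days as the test set, then splits the remaining 90% into 80% train and 20% validation, which gives approximately:

- train: 72%
- validation: 18%
- test: 10%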
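If exact proportions of the full dataset are wanted, a position-based split is one way to make them explicit. The helper below is a hypothetical sketch (its name and default fractions are illustrative and it is not used elsewhere in this notebook); it keeps the rows in chronological order and never shuffles.

```python
import numpy as np
import pandas as pd

def chronological_split(frame, train_frac=0.7, val_frac=0.2):
    """Split a time-ordered frame into train/validation/test by position."""
    n = len(frame)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    # the remainder after train and validation becomes the test set
    return frame.iloc[:train_end], frame.iloc[train_end:val_end], frame.iloc[val_end:]

# toy daily series with 100 observations
toy = pd.DataFrame(
    {"load": np.arange(100, dtype=float)},
    index=pd.date_range("2020-01-01", periods=100, freq="D"),
)
train, val, test = chronological_split(toy)
print(len(train), len(val), len(test))
```

Slicing with `iloc` avoids the overlap handling that label-based `.loc` slicing needs, since `.loc` includes both endpoints.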

In [None]:
# calculate train_test_indice
train_test_indice = int(len(load_df_daily) * 0.9)
# train_test_indice

# create extract datetimeIndex based on indice
train_indice = load_df_daily.index[train_test_indice]
# train_indice

In [None]:
# test_indice

In [None]:
# create train and test df using indice
# dataframe that contains train and validation - will be split next
load_train_val_df = load_df_daily.loc[:train_indice]
temp_train_val_df = temp_df_daily.loc[:train_indice]
# dataframe that contains test
# shift the train_indice timestamp by one day - so test does not contain last value of validation set
test_indice = train_indice + pd.Timedelta(hours=24)
# use test_indice to split daily
load_test_df = load_df_daily.loc[test_indice:]
temp_test_df = temp_df_daily.loc[test_indice:]

In [None]:
# now split the train_val dataset into train and validation
train_val_indice = int(len(load_train_val_df) *0.8)
# train_val_indice
# create extract datetimeIndex based on indice
# this indice is where the training data ends
# a day later is the validation data starts
val_indice = load_train_val_df.index[train_val_indice]


In [None]:
load_train_df = load_train_val_df.loc[:val_indice]
temp_train_df = temp_train_val_df.loc[:val_indice]
# validation indice
val_start_indice = val_indice + pd.Timedelta(hours=24)
# split validation set
load_val_df = load_train_val_df.loc[val_start_indice:]
temp_val_df = temp_train_val_df.loc[val_start_indice:]

In [None]:
load_train_df.head()

In [None]:
load_train_df.tail()

In [None]:
load_val_df.head()

In [None]:
load_val_df.tail()

In [None]:
load_test_df.head()

In [None]:
load_test_df.tail()

Plot the train and validation sets.

In [145]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = load_train_df.index, y=load_train_df["load"],name="train")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_val_df["load"], name="validation")
)

fig.update_layout(height=900, width=950, title_text="Daily total energy load")
fig.show()

In [None]:
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = temp_train_df.index, y=temp_train_df["T"],name="train")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_val_df["T"], name="validation")
)

fig.update_layout(height=900, width=950, title_text="Daily mean temperature")
fig.show()

In [None]:
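# NOTE: load_avg is defined in the "Average Method" section below;
# run that cell first, otherwise this cell raises a NameError
fig = go.Figure()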
fig.add_trace(
    go.Scatter(x = load_train_df.index, y=load_train_df["load"],name="train")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_val_df["load"], name="validation")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_avg, name="average_forecast")
)
fig.update_layout(height=900, width=950, title_text="Daily total energy load")
fig.show()

In [158]:
def forecast_plots(method, y_forecast, variable="load", title="Forecast energy load"):
    # select the train/validation frames and the y-axis label for the variable
    if variable == "load":
        train_data, val_data = load_train_df, load_val_df
        y_axis = "Energy load (W)"
    elif variable == "T":
        train_data, val_data = temp_train_df, temp_val_df
        # daily means such as 77 in late June indicate Fahrenheit, not Celsius
        y_axis = "Temperature (F)"
    else:
        # stop here instead of failing later with an undefined variable
        raise ValueError(f"Invalid variable {variable!r}; expected 'load' or 'T'")

    fig = go.Figure()

    # training data
    fig.add_trace(
        go.Scatter(x=train_data.index, y=train_data[variable], name="train " + variable)
    )
    # validation data
    fig.add_trace(
        go.Scatter(x=val_data.index, y=val_data[variable], name="validation " + variable)
    )
    # forecast over the validation period
    fig.add_trace(
        go.Scatter(x=val_data.index, y=y_forecast, name=method + "_forecast")
    )
    fig.update_layout(height=900,
                      width=950,
                      title_text=title,
                      xaxis_title="Date",
                      yaxis_title=y_axis
                     )
    fig.show()
    

### Average Method

In this method, the forecasts of all future values are the average of the historical data. 

Let $y_{1},...,y_{T}$ be the historical data. Then

$\hat y_{T+h|T} = \bar y = (y_{1}+...+y_{T})/T$

where $\hat y_{T+h|T}$ is the estimate of $y_{T+h}$ based on the data $y_{1},...,y_{T}$.
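As a minimal sketch of the formula, the forecast is a constant vector holding the mean of the history. The function name `average_forecast` and the toy numbers are illustrative and not part of the notebook's own code.

```python
import numpy as np

def average_forecast(history, horizon):
    """Repeat the mean of the history for every step of the horizon."""
    return np.full(horizon, np.mean(history))

# toy history with mean 4.0
avg_fc = average_forecast([2.0, 4.0, 6.0], horizon=3)
print(avg_fc)
```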

In [None]:
# calculate averages of training data 
# replicate the mean by the number of observations in the validation set
load_avg = list(load_train_df.mean().values) * len(load_val_df)

In [None]:
temp_avg = list(temp_train_df.mean().values) * len(temp_val_df)

In [159]:
forecast_plots("Average",load_avg, "load", "Average Forecast method - load")

In [161]:
forecast_plots("Average",temp_avg, "T", "Average Forecast method - temperature")

### Naive Method

In this method, we set all the forecasts to be the value of the last observation. 

$$\hat y_{T+h|T} = y_{T}$$

This method works well for many economic and financial time series.
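A minimal sketch of the naive forecast, assuming the history is a plain one-dimensional sequence; the function name `naive_forecast` and the toy values are illustrative.

```python
import numpy as np

def naive_forecast(history, horizon):
    """Repeat the last observed value for every step of the horizon."""
    history = np.asarray(history, dtype=float)
    return np.full(horizon, history[-1])

# the last observation, 7.0, is carried forward
naive_fc = naive_forecast([3.0, 5.0, 7.0], horizon=4)
print(naive_fc)
```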

In [127]:
# get the last observation of the load and temp series
load_naive = load_train_df.values.tolist()[-1] * len(load_val_df)


In [128]:
len(load_naive)

591

In [129]:
temp_naive = temp_train_df.values.tolist()[-1] * len(temp_val_df)

In [162]:
forecast_plots("Naive",load_naive, "load", "Naive Forecast method - load")

In [132]:
# plot
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = load_train_df.index, y=load_train_df["load"],name="train")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_val_df["load"], name="validation")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_naive, name="naive_forecast")
)
fig.update_layout(height=900, width=950, title_text="Naive forecast - total energy load")
fig.show()

In [163]:
forecast_plots("Naive",temp_naive, "T", "Naive Forecast method - temperature")

### Seasonal Naive Method

This method is used for highly seasonal data. Each forecast is equal to the last observed value from the same season, for example the same day a year ago. The forecast is written as

$$\hat y_{T+h|T} = y_{T+h-m(k+1)} $$

where:
- m: seasonal period
- k: the integer part of $(h-1)/m$, i.e. the number of complete seasonal periods (years, in the example above) in the forecast period before time $T+h$
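The formula can be sketched directly for an evenly spaced series. This is an illustrative implementation with a hypothetical function name and toy data; note that the notebook's `seasonal_forecast` function instead averages the same calendar date over several past years, which is a smoothed variant rather than the strict formula.

```python
import numpy as np

def seasonal_naive(history, horizon, m):
    """Forecast y_{T+h} as y_{T+h-m(k+1)}, with k the integer part of (h-1)/m."""
    history = np.asarray(history, dtype=float)
    T = len(history)
    forecasts = []
    for h in range(1, horizon + 1):
        k = (h - 1) // m
        # subtract 1 because y_1 is stored at position 0
        forecasts.append(history[T + h - m * (k + 1) - 1])
    return np.array(forecasts)

# two seasons of length 3; the forecast repeats the last season
sn_fc = seasonal_naive([1, 2, 3, 4, 5, 6], horizon=5, m=3)
print(sn_fc)
```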



In [None]:
load_train_df.head()


In [None]:
load_train_df.tail()

In [None]:
years = 6

In [None]:
def seasonal_forecast(years,df_train, df_val):
    # create seasonal prediction list
    seasonal_prediction = []
    # iterate through each value in the validation 
    for day in df_val.index.tolist():
        # create list to store seasonal past observations
        past_days_obs = []
        # iterate through the number of years in the train 
        for year in range(1, years+1):
            # get the timedelta
            time_change = pd.DateOffset(years=year)
            try:
                # calculate previous day
                previous_day = day - time_change
            except Exception as e:
                continue
            
            if previous_day in df_train.index:
                # add to past observation list
                past_days_obs.append(df_train.loc[previous_day].values)
            else:
                continue
        y_seasonal = np.mean(past_days_obs)
        seasonal_prediction.append(y_seasonal)
    
    return seasonal_prediction
                

In [None]:
seasonal_prediction = []
for day in load_val_df.index.tolist():
#     print("#"*20)
#     print(f"validation day-- {day}")
#     print("#"*20)
    past_days_obs = []
    for year in range(1, years+1):
        time_change = pd.DateOffset(years= year) #pd.Timedelta(days=year*365)
        try:
            
            previous_day = day - time_change
            print(previous_day)
            
#             print(load_train_df.loc[previous_day])
        except Exception as e:
            print(f"{e} - can't get previous day")
            # skip this year so a stale previous_day is not reused
            continue
        # check if day is in training
        if previous_day in load_train_df.index:
            # add to list
            past_days_obs.append(load_train_df.loc[previous_day].values)
        else:
            continue
    y_seasonal = np.mean(past_days_obs)
    seasonal_prediction.append(y_seasonal)
    
        

In [None]:
load_seasonal = seasonal_forecast(6, load_train_df, load_val_df)
len(load_seasonal)

In [164]:
forecast_plots("Seasonal",load_seasonal, "load", "Seasonal Naive Forecast method - load")

In [None]:
temp_seasonal = seasonal_forecast(6, temp_train_df, temp_val_df)
print(len(temp_seasonal))

In [165]:
forecast_plots("Seasonal",temp_seasonal, "T", "Seasonal Naive Forecast method - temperature")

### Drift Method

This is a variation of the naive method, where the forecasts are allowed to increase or decrease over time. The amount of change over time is referred to as **drift**, which is the average change seen in the historical data. The forecast for time $T + h$ is:

$$\hat y_{T+h|T}  = y_{T} + \frac{h}{T - 1}\sum_{t=2}^{T} (y_{t} - y_{t-1}) = y_{T} + h \left(\frac{y_{T} - y_{1}}{T - 1}\right)$$

Where:

$y_{T}$ - Last observation

$y_1$ - First observation

h - horizon
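The sketch below, with a hypothetical function name and toy data, computes the drift forecast in vectorized form and checks that the average of the one-step changes equals the slope between the first and last observation, which is why the sum in the formula collapses.

```python
import numpy as np

def drift_forecast_vec(history, horizon):
    """Extend the line through the first and last observation."""
    history = np.asarray(history, dtype=float)
    slope = (history[-1] - history[0]) / (len(history) - 1)
    steps = np.arange(1, horizon + 1)
    return history[-1] + steps * slope

toy_history = [1.0, 3.0, 2.0, 4.0]
drift_fc = drift_forecast_vec(toy_history, horizon=2)

# the mean one-step change telescopes to (y_T - y_1) / (T - 1)
mean_change = np.diff(toy_history).mean()
print(drift_fc, mean_change)
```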

In [None]:
# calculate drift method 
# changing values 
# time period - increase with the length of validation data
# number of past time periods is the length of training set 
# # constants
# last value in training  - yt
# first value in training - y1
# for num in range(1, 20+1):
#     print(num)
# load
# get first observation
load_first = load_train_df.iloc[0].load
# get last observation
load_last = load_train_df.iloc[-1].load
# get load forecast horizon
load_horizon = len(load_val_df)
# get the number of previous observations
load_obs = len(load_train_df)

# temp
# get first observation
temp_first = temp_train_df.iloc[0].values[0]
# get last observation
temp_last = temp_train_df.iloc[-1].values[0]
# get temp forecast horizon
temp_horizon = len(temp_val_df)
# get number of previous observations
temp_obs = len(temp_train_df)

In [None]:
temp_last#.values[0]

In [None]:
def drift_forecast(forecast_horizon, last_obs, first_obs, numb_obs):
    forecast_list = []
    for h in range(1, forecast_horizon+1):
        y_hat = last_obs + (h*(last_obs-first_obs)/(numb_obs - 1))
        forecast_list.append(y_hat)
        
    return forecast_list


In [None]:
# drift forecasts
load_drift = drift_forecast(load_horizon, load_last, load_first, load_obs)
temp_drift = drift_forecast(temp_horizon, temp_last, temp_first, temp_obs)

In [166]:
forecast_plots("Drift",load_drift, "load", "Drift Forecast method - load")

In [167]:
forecast_plots("Drift",temp_drift, "T", "Drift Forecast method - temp")

## Forecast Accuracy Evaluation

The forecast 'error' is simply the difference between an observed value and its forecast. The error is the unpredictable portion of the observation and is written as:
$$e_{T+h} = y_{T+h} - \hat y_{T+h|T}$$

where:

Training data is  $\{y_{1},...,y_{T}\}$

Test data is $\{y_{T+1},y_{T+2},...\}$

The two error measures that will be used in this notebook are the MAE and RMSE.

**Mean absolute error:** is a measure of errors between paired observations expressing the same phenomenon. It is commonly used to measure forecast error in time series analysis. It is simply the average absolute vertical or horizontal distance between each point in a scatter plot and the Y=X line.

$$ MAE = mean(|e_{t}|)$$

**Root mean squared error:** is the standard deviation of the residuals. The residuals measure how far the data points are from the regression line. RMSE measures the spread of these residuals, which tells you how concentrated the data are around the line of best fit.

$$RMSE = \sqrt{mean(e_{t}^{2})}$$
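Both measures can be written in a few lines of NumPy. The sketch below uses hypothetical function names and toy values to show the arithmetic.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean of the absolute forecast errors."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(errors))

def rmse(y_true, y_pred):
    """Square root of the mean squared forecast error."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(errors ** 2))

# errors are 1, 0 and -2
toy_mae = mae([3, 5, 7], [2, 5, 9])
toy_rmse = rmse([3, 5, 7], [2, 5, 9])
print(toy_mae, toy_rmse)
```

Because the errors are squared before averaging, RMSE penalizes large errors more heavily than MAE, so it is never smaller than MAE on the same forecasts.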

In [112]:
from sklearn.metrics import mean_absolute_error as MAE
from sklearn.metrics import mean_squared_error as MSE
from math import sqrt

def RMSE(mse):
    return sqrt(mse)

In [135]:
# calculate errors for mean
# mean method
load_avg_mae = MAE(load_val_df["load"], load_avg)
load_avg_rmse = RMSE(MSE(load_val_df["load"], load_avg))
# naive method
load_naive_mae = MAE(load_val_df["load"], load_naive)
load_naive_rmse = RMSE(MSE(load_val_df["load"], load_naive))
#seasonal method
load_seasonal_mae = MAE(load_val_df["load"], load_seasonal)
load_seasonal_rmse = RMSE(MSE(load_val_df["load"], load_seasonal))
# drift method
load_drift_mae = MAE(load_val_df["load"], load_drift)
load_drift_rmse = RMSE(MSE(load_val_df["load"], load_drift))

In [136]:
load_error_dict = {}
# methods
load_error_dict["methods"] = ["Mean", "Naive", "Seasonal Naive", "Drift"]
load_error_dict["MAE"] = [load_avg_mae, load_naive_mae, load_seasonal_mae, load_drift_mae]
load_error_dict["RMSE"] = [load_avg_rmse, load_naive_rmse, load_seasonal_rmse, load_drift_rmse]
load_error_dict

{'methods': ['Mean', 'Naive', 'Seasonal Naive', 'Drift'],
 'MAE': [6136.60668615951,
  6122.334179357022,
  4467.480795262267,
  6120.9172473026965],
 'RMSE': [7647.158673672504,
  7545.728677618003,
  5851.923820610397,
  7549.86896248381]}

In [137]:
temp_val_df["T"]

Date
2012-06-26    61.070000
2012-06-27    64.180417
2012-06-28    70.264583
2012-06-29    72.125000
2012-06-30    77.360833
                ...    
2014-02-02    35.444167
2014-02-03    27.471667
2014-02-04    22.778333
2014-02-05    21.236250
2014-02-06    13.917083
Freq: D, Name: T, Length: 591, dtype: float64

In [138]:
# mean method
temp_avg_mae = MAE(temp_val_df["T"], temp_avg)
temp_avg_rmse = RMSE(MSE(temp_val_df["T"], temp_avg))
# naive method
temp_naive_mae = MAE(temp_val_df["T"], temp_naive)
temp_naive_rmse = RMSE(MSE(temp_val_df["T"], temp_naive))
#seasonal method
temp_seasonal_mae = MAE(temp_val_df["T"], temp_seasonal)
temp_seasonal_rmse = RMSE(MSE(temp_val_df["T"], temp_seasonal))
# drift method
temp_drift_mae = MAE(temp_val_df["T"], temp_drift)
temp_drift_rmse = RMSE(MSE(temp_val_df["T"], temp_drift))

In [139]:
temp_error_dict = {}
# methods
temp_error_dict["methods"] = ["Mean", "Naive", "Seasonal Naive", "Drift"]
temp_error_dict["MAE"] = [temp_avg_mae, temp_naive_mae, temp_seasonal_mae, temp_drift_mae]
temp_error_dict["RMSE"] = [temp_avg_rmse, temp_naive_rmse, temp_seasonal_rmse, temp_drift_rmse]
temp_error_dict

{'methods': ['Mean', 'Naive', 'Seasonal Naive', 'Drift'],
 'MAE': [16.774913760425243,
  20.070852368866333,
  6.193286073510058,
  23.937166280231267],
 'RMSE': [19.314274682911826,
  25.444145772132845,
  7.911492263044839,
  29.98732946557194]}

In [143]:
load_error_df = pd.DataFrame(load_error_dict)
load_error_df.head()


| | methods | MAE | RMSE |
|---|---|---|---|
| 0 | Mean | 6136.606686 | 7647.158674 |
| 1 | Naive | 6122.334179 | 7545.728678 |
| 2 | Seasonal Naive | 4467.480795 | 5851.923821 |
| 3 | Drift | 6120.917247 | 7549.868962 |


In [144]:
temp_error_df  = pd.DataFrame(temp_error_dict)
temp_error_df.head()

| | methods | MAE | RMSE |
|---|---|---|---|
| 0 | Mean | 16.774914 | 19.314275 |
| 1 | Naive | 20.070852 | 25.444146 |
| 2 | Seasonal Naive | 6.193286 | 7.911492 |
| 3 | Drift | 23.937166 | 29.987329 |


In [170]:
# create combined chart - load
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = load_train_df.index, y=load_train_df["load"],name="train")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_val_df["load"], name="validation")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_avg, name="avg_forecast")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_naive, name="naive_forecast")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_seasonal, name="seasonal_forecast")
)
fig.add_trace(
    go.Scatter(x = load_val_df.index, y=load_drift, name="drift_forecast")
)
fig.update_layout(height=900,
                  width=950,
                  xaxis_title="Date",
                  yaxis_title="Energy(W)",
                  title_text="Load forecast")
fig.show()

In [171]:
# create combined chart - temperature
fig = go.Figure()
fig.add_trace(
    go.Scatter(x = temp_train_df.index, y=temp_train_df["T"],name="train")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_val_df["T"], name="validation")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_avg, name="avg_forecast")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_naive, name="naive_forecast")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_seasonal, name="seasonal_forecast")
)
fig.add_trace(
    go.Scatter(x = temp_val_df.index, y=temp_drift, name="drift_forecast")
)
fig.update_layout(height=900,
                  width=950,
                  xaxis_title="Date",
                  yaxis_title="Temperature (F)",
                  title_text="Temperature forecast")
fig.show()