In this notebook we will forecast electricity load using Multi-layer perceptron. This is one of the example of using MLP for time-series prediction. Before building a neural network we would first visualize the dataset and preprocess it if it seems necessary.

The dataset is provided at [OpenDataNepal.Net](https://opendatanepal.com/dataset/electricity-load-profile-of-nepal-in-2073-nepal-electricity-authority). The dataset is for the year 2073 B.S and the data load profile is given for each 12 months(Baisakh to Chaitra). Also, the data provided is not for each hour it is given at an interval of 1 hour for 1AM to 5AM, and from 5AM to 9PM it is given at an interval of 30 minutes.

First we will combine all the 12 months data into a single data frame using pandas and combine load hours. The preprocessed dataset is provided with this notebook as **Load_Profile_Data_2073.csv**.

In [167]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(0)

In [299]:
dataset = pd.read_csv("Load_Profile_Data_2073.csv")

In [300]:
dataset

Unnamed: 0,1:00,2:00,3:00,4:00,5:00,6:00,7:00,8:00,9:00,10:00,...,15:00,16:00,17:00,18:00,19:00,20:00,21:00,22:00,23:00,0:00
0,717.49,707.64,706.44,737.34,745.09,1540.13,1599.88,1575.38,735.64,736.24,...,670.24,672.74,659.04,1262.68,1785.88,2237.78,2178.08,773.74,725.29,682.14
1,651.39,636.34,615.34,623.74,656.19,1375.33,1608.88,1717.48,880.74,863.34,...,904.54,890.94,837.94,1703.48,2216.28,2374.08,2304.18,835.04,802.89,772.94
2,754.89,744.44,729.04,724.74,769.39,1627.63,1763.18,1827.68,933.04,848.64,...,903.94,877.84,854.94,1888.78,2157.98,2095.28,2245.88,824.44,828.49,803.94
3,751.79,746.94,723.24,732.94,727.89,1511.43,1628.18,1777.18,904.24,914.44,...,903.84,852.14,784.14,1017.58,2035.72,1386.08,1036.88,826.94,771.89,767.84
4,722.89,596.24,595.24,595.24,635.19,1409.83,1556.18,1550.68,728.74,778.04,...,383.54,862.14,702.04,1708.98,2111.48,2385.98,2113.38,810.44,797.39,764.24
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,593.50,613.80,652.70,641.90,651.70,1373.80,1624.80,1841.70,899.60,850.40,...,852.60,847.80,103.50,1675.30,2201.70,2313.50,2219.70,784.20,749.10,720.10
361,692.30,685.60,683.70,698.60,699.60,1546.60,1701.70,1816.20,899.50,870.00,...,784.90,816.90,771.80,1703.40,2018.40,2258.10,2118.70,734.10,718.40,682.90
362,623.70,607.70,605.50,610.10,643.00,1418.60,1588.70,1699.30,871.90,848.60,...,856.60,868.40,802.50,1473.70,1612.90,1744.20,1705.30,784.50,745.40,730.70
363,693.70,682.30,669.50,686.70,715.70,1535.20,1729.10,1843.40,934.80,903.60,...,860.40,871.10,896.20,1667.30,2153.60,2301.80,2065.90,814.70,784.90,767.00


The main thing with time series prediction is the count of previous sequence we want to use. In this case we may use data of previous 3 days or 4 days or 10 days. In this project, we are going to use datas from previous 3 days.

Another interesting thing with time series prediction is **Test Cases**. Generally test cases are splitted from overall dataset in certail ration like 80:20, 70:30 depending upon the size of datasets(If we have a very large dataset then 90:10 split will provide sufficient test cases). The neural net will not see the test cases during training. So they are very important to see the generalization of our neural net. 

The problem is if we drop any data from our dataset then the train data will not be in sequence. Say we have a problem of dertermining next number given 2 inputs and data as 1,2,3,4,5,6,7,8,9,10 and if we drop 5,6 for testing then the remaining data will not be in sequence. So what we will do is given a sequence we will transform that sequence into individual datapoints as [1,2], [2,3], [3,4] ... and also the label will be 3, 4 .... . Now we can randomize the datapoints and drop certain input for testing too as individual datapoints are always in sequence. 


So, for the Electricity load profile let's breakdown the year-long data into individual datapoints.

In [301]:
## First change the data from pandas dataframe to numpy array
data = np.array(dataset)
data = data/10000

In [302]:
data.shape

(365, 24)

In [303]:
datapoints = list()
labelpoints = list()

split_count = 3 # we are going to seperate each 3 sequence
for row in range(data.shape[0] - split_count):
    datapoints.append(data[row:row+split_count, :])
    labelpoints.append(data[row+split_count])
    

## Just taking first 14 hours
datapoints = np.array(datapoints)[:, :, :14]
labelpoints = np.array(labelpoints)[:, :14]

In [304]:
print(datapoints.shape, labelpoints.shape)

(362, 3, 14) (362, 14)


In [305]:
input_size = 14*3
hidden_size = 20
output_size = labelpoints.shape[1]

weight_input_hidden = 0.2 * np.random.random((input_size, hidden_size)) - 0.1
weight_hidden_output = 0.2 * np.random.random((hidden_size, output_size)) - 0.1


def relu(x):
    return (x>=0) * x # returns x if x>=0, else 0

def relu2deriv(x):
    return x>=0

In [306]:
num_iterations = 200
alpha = 0.005

for itera in range(num_iterations):
    error = 0
    
    for idx in range(datapoints.shape[0]):
        label = labelpoints[idx].reshape((14, 1))
        layer_0 = datapoints[idx].reshape((input_size, 1))
        layer_1 = relu(weight_input_hidden.T.dot(layer_0))
        output = weight_hidden_output.T.dot(layer_1)
        error = error + np.sum((output - label)**2)
        
        delta_output = output - label
#         print(output[0:2], label[0:2])
        delta_layer_1 = weight_hidden_output.dot(delta_output) * relu2deriv(layer_1)
        
        weight_hidden_output = weight_hidden_output - alpha * layer_1.dot(delta_output.T)
        weight_input_hidden = weight_input_hidden - alpha * layer_0.dot(delta_layer_1.T)
    if itera % 9 == 0:    
        print(error)
    
    

55.92750282512371
0.6402991255285938
0.6208060175379133
0.6202901110558504
0.6197818948188362
0.6192729643854374
0.6187630754028478
0.6182521133429449
0.6177399798004259
0.6172265642156867
0.6167117449857243
0.616195414198664
0.6156774649907351
0.6151577915590487
0.6146362891620291
0.6141128541219589
0.6135873838298076
0.6130597767523391
0.6125299324414923
0.6119977515460227
0.6114631358254069
0.6109259881659856
0.610386212599357
