# Simple NN for Power Usage Prediction
---
This notebook contains code to train a simple, 2-layer neural net to predict a regression target: `Global_active_power`. It addresses the problem of predicting household power consumption, in line with my stated goal and hypothesis:
> **Predictive goal**: Predict future active power that a household will need on an hourly basis (e.g., at 9am, 10am, etc.) with high accuracy. The best predictions will avoid underestimating the active power. 

>**Hypothesis**: I think power meter readings and _past_ hourly usage will be great predictors for an ML model. 

In previous notebooks, I formatted train and test data in prep for training an ML model, and trained a baseline linear regression model:
> Baseline RMSE: ~0.34.

📍 In this notebook, I will use the same error measure (RMSE) and see whether I can improve on the baseline, both in overall RMSE and in avoiding cases of under-estimation.

### NN Model Creation

The process will be broken down into the following steps:
>1. Load the data
2. Create train/test dataloaders
3. Define a neural network
4. Train the model
5. Evaluate the performance of our trained model on a test dataset
6. Further investigate patterns in errors
7. UX Considerations

These are almost the same steps as when I developed a baseline solution. 

One notable additional step is step 3: **Define a neural network**

In developing an NN-based solution, you'll typically have to define the architecture of that neural network, rather than relying on a default algorithm structure like linear regression. 

Before we begin, we have to import the necessary libraries for working with data and PyTorch.

In [1]:
# import PyTorch libraries
import torch
from torch.utils.data import DataLoader

## reproducibility
torch.manual_seed(0)

# import data libraries
import os
import numpy as np
import matplotlib.pyplot as plt

---
## Step 1: Load the Data


This cell imports a custom Dataset class, `PowerConsumptionDataset`, from `helpers.py`. It reads a specifically formatted pickle file of hourly power usage data and converts that data into Tensors for PyTorch to work with.

In [2]:
from helpers import PowerConsumptionDataset

# creating train and test datasets using the PowerConsumptionDataset class
train_path = 'data/train_hourly.pkl'
train_dataset = PowerConsumptionDataset(pkl_file=train_path)

test_path = 'data/test_hourly.pkl'
test_dataset = PowerConsumptionDataset(pkl_file=test_path)
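
The `PowerConsumptionDataset` code itself lives in `helpers.py` and is not shown in this notebook. As a rough idea of what such a class could look like, here is a minimal sketch. The class name `SketchPowerDataset`, the `target_col` parameter, and the assumption that the pickle holds a pandas DataFrame with numeric feature columns plus one target column are all mine, not taken from the helper.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset


class SketchPowerDataset(Dataset):
    """Hypothetical stand-in for helpers.PowerConsumptionDataset."""

    def __init__(self, pkl_file, target_col="Global_active_power"):
        # ASSUMPTION: the pickle holds a DataFrame with one row per hour,
        # numeric feature columns, and a single target column.
        df = pd.read_pickle(pkl_file)
        features = df.drop(columns=[target_col])
        self.x = torch.tensor(features.to_numpy(), dtype=torch.float32)
        # keep the target 2-D, shape (N, 1), so it lines up with the model output
        self.y = torch.tensor(df[[target_col]].to_numpy(), dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # works for an integer index or a slice such as dataset[:]
        return self.x[idx], self.y[idx]
```

Returning a `(features, target)` tuple from `__getitem__` matches the sample format printed in the next cell.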


In [3]:
# print out a few (3) samples to see that it looks right
# I should have 7 input features and 1 target

for i in range(3):
    sample = train_dataset[i]
    print()
    print(sample)



(tensor([  1.0728,   0.9341,  17.0000, 234.6439,   0.0000,   0.5278,  16.8611]), tensor([4.2229]))

(tensor([  1.3521,   1.0581,  18.0000, 234.5802,   0.0000,   6.7167,  16.8667]), tensor([3.6322]))

(tensor([  1.7886,   1.1519,  19.0000, 233.2325,   0.0000,   1.4333,  16.6833]), tensor([3.4002]))


---
## Step 2: Creating DataLoaders for Train/Test Datasets

DataLoaders allow us to do things like batch data (for batch learning), shuffle data, etc.—they are the standard way to iterate through data for training a PyTorch model.

In [4]:
# how many samples per batch
batch_size = 64

# train and test loaders
train_loader = DataLoader(train_dataset, batch_size=batch_size)

test_loader = DataLoader(test_dataset, batch_size=batch_size)
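
To see what a loader actually yields, the short sketch below iterates over one and records the batch shapes. It uses random stand-in data with the same shapes as this notebook (7 features, 1 target); the names `demo_dataset` and `demo_loader` are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# ASSUMPTION: random stand-in data with the notebook's shapes (7 features, 1 target)
n_samples = 150
demo_dataset = TensorDataset(torch.randn(n_samples, 7), torch.randn(n_samples, 1))
demo_loader = DataLoader(demo_dataset, batch_size=64)

# each iteration yields one (features, targets) batch
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in demo_loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 150 samples and a batch size of 64, the last batch is smaller than the rest. Note that `shuffle` defaults to `False`, so the loaders above walk through the data in its stored order; pass `shuffle=True` if randomized batches are wanted for training.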

---
## Step 3: Define the Neural Network Architecture

The architecture will be responsible for transforming input features into a single target value. 

> This particular example defines a 2-layer NN.


In [5]:
# importing NN modules
import torch.nn as nn
import torch.nn.functional as F

# a simple 2 layer (input-hidden-output) NN
class SimpleNet(nn.Module):
    
    ## Defines the layers of an NN
    def __init__(self, input_dim, hidden_dim, output_dim):
        '''Defines layers of a neural network.
           :param input_dim: Number of input features
           :param hidden_dim: Size of hidden layer(s)
           :param output_dim: Number of outputs
         '''
        super(SimpleNet, self).__init__()
                
        # defining linear layers that go input > hidden > output
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
    
    ## Defines the feedforward behavior of the network
    def forward(self, x):
        '''Feedforward behavior of the net.
           :param x: A batch of input features
           :return: A batch of output values; predictions
         '''
        out = F.relu(self.fc1(x)) # ReLU activation fn applied to output of hidden layer
        out = self.fc2(out) # final output, no activation fn needed 
        return out 

In [6]:
# instantiating the simple NN with specified dimensions

input_dim = 7 # input feats
output_dim = 1 # one target value
hidden_dim = 10 # nodes in hidden layer


model = SimpleNet(input_dim, hidden_dim, output_dim)

# print model layers (from init fn)
model

SimpleNet(
  (fc1): Linear(in_features=7, out_features=10, bias=True)
  (fc2): Linear(in_features=10, out_features=1, bias=True)
)
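
As a quick sanity check on the size of this architecture, the sketch below counts trainable parameters for the same layer sizes. It rebuilds the layers with `nn.Sequential` purely for brevity; `demo_model` is an illustrative name, not part of the notebook.

```python
import torch.nn as nn

# Same layer sizes as SimpleNet above, written with nn.Sequential for brevity
demo_model = nn.Sequential(nn.Linear(7, 10), nn.ReLU(), nn.Linear(10, 1))

# weights + biases: (7*10 + 10) for the hidden layer, (10*1 + 1) for the output
n_params = sum(p.numel() for p in demo_model.parameters() if p.requires_grad)
print(n_params)  # 91
```

A model this small trains quickly on CPU, which makes it a reasonable first step up from linear regression.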

### Define loss and optimization strategy

The loss function defines what a network tries to minimize in terms of comparing actual versus predicted values. In classification tasks, it is common to use a cross entropy loss and in regression tasks, such as this one, you'll commonly see mean squared error or root mean squared error (RMSE). You can also create custom loss functions depending on what you want to optimize for!

The optimizer defines how a neural network updates or learns as a result of trying to minimize the loss function. 

In [7]:
# specify loss function (categorical cross-entropy for classification, mse for regression)
criterion = nn.MSELoss()

# specify optimizer (stochastic gradient descent) and learning rate = 0.001
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
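
Since the goal is to avoid under-estimating power usage, one custom loss worth considering is an asymmetric MSE that weights under-estimates more heavily. This is a sketch of that idea, not something used in this notebook; the class name `UnderEstimatePenaltyMSE` and the weight of 3.0 are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class UnderEstimatePenaltyMSE(nn.Module):
    """Hypothetical loss: MSE with extra weight on under-estimates."""

    def __init__(self, under_weight=3.0):
        super().__init__()
        self.under_weight = under_weight  # ASSUMPTION: illustrative value

    def forward(self, preds, targets):
        errors = targets - preds  # positive => the model under-estimated
        weights = torch.where(errors > 0,
                              torch.full_like(errors, self.under_weight),
                              torch.ones_like(errors))
        return (weights * errors ** 2).mean()


loss_fn = UnderEstimatePenaltyMSE(under_weight=3.0)
target = torch.tensor([[2.0]])
under = loss_fn(torch.tensor([[1.5]]), target)  # predicted too low
over = loss_fn(torch.tensor([[2.5]]), target)   # predicted too high
print(under.item(), over.item())  # 0.75 0.25
```

Both predictions miss by 0.5, but the under-estimate costs three times as much. A model trained with such a loss would report a different loss scale, so it should still be compared to the baseline using plain RMSE.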

---
## Step 4: Train loop + saving a model

In the `helpers.py` file I include code for a training loop that does the following:
* Iterate through the training data in batches provided by the `train_loader` 
* Calculate the loss with the supplied `criterion` (here `nn.MSELoss`) and backpropagate to compute the gradient of that loss with respect to each weight
* Update the weights of this NN to decrease the loss
* After a specified number of epochs, save the final, trained model
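
The steps above can be sketched as follows. This is not the code in `helpers.py`; `train_sketch` is a hypothetical function that borrows the same argument order as the `train` call below, and returning the per-epoch losses alongside the model is my own addition.

```python
import os
import torch


def train_sketch(model, train_loader, n_epochs, optimizer, criterion,
                 model_name, model_dir):
    """Hypothetical training loop mirroring the steps listed above."""
    model.train()
    epoch_losses = []
    for epoch in range(1, n_epochs + 1):
        running_loss, n_seen = 0.0, 0
        for features, targets in train_loader:
            optimizer.zero_grad()      # clear gradients from the previous step
            loss = criterion(model(features), targets)
            loss.backward()            # backpropagate to get gradients
            optimizer.step()           # update the weights
            running_loss += loss.item() * features.size(0)
            n_seen += features.size(0)
        epoch_losses.append(running_loss / n_seen)  # sample-weighted mean loss
        print(f"Epoch: {epoch}, Loss: {epoch_losses[-1]}")
    # save only the learned weights (state_dict) after the final epoch
    os.makedirs(model_dir, exist_ok=True)
    torch.save(model.state_dict(), os.path.join(model_dir, model_name))
    return model, epoch_losses
```

Saving the `state_dict` rather than the whole model object means the class definition is needed again to reload the weights, which is usually the more portable choice.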


📍 **EXPERIMENTAL NOTE:** I'd like to add validation data to implement early-stopping and avoid overfitting even more. For now, I am eye-balling an appropriate number of epochs to run instead.
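
For reference, the early-stopping rule mentioned in the note could look like the sketch below: stop once the validation loss has not improved for a set number of epochs. The function name, the `patience` and `min_delta` parameters, and the loss values are all made up for illustration; no validation split exists in this notebook yet.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Stops once the validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: every epoch was run


# ASSUMPTION: made-up validation losses, purely for illustration
fake_val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stop_epoch = early_stopping_epoch(fake_val_losses, patience=3)
print(stop_epoch)  # 6
```

In practice the weights from the best epoch (epoch 3 in this toy run) would be kept, rather than those from the epoch where training stopped.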

In [8]:
from helpers import train

# I've created this local directory to save trained models
MODEL_DIR = 'saved_models/'
model_name = 'model_10hdn_15ep.pth'

# define number of epochs - times you iterate through the entire training dataset
n_epochs = 15

# call train function with all params
model = train(model, train_loader, n_epochs, optimizer, criterion, model_name, MODEL_DIR)

Epoch: 1, Loss: 1.0393941646499885
Epoch: 2, Loss: 0.6835263245325867
Epoch: 3, Loss: 0.6374956142130516
Epoch: 4, Loss: 0.5903895092453603
Epoch: 5, Loss: 0.5523993382565409
Epoch: 6, Loss: 0.5183460384750251
Epoch: 7, Loss: 0.4992046398724869
Epoch: 8, Loss: 0.4758248953796405
Epoch: 9, Loss: 0.4617599566563142
Epoch: 10, Loss: 0.45188950731171124
Epoch: 11, Loss: 0.44272165272018604
Epoch: 12, Loss: 0.4378582664923988
Epoch: 13, Loss: 0.4333353827146889
Epoch: 14, Loss: 0.42942123668942805
Epoch: 15, Loss: 0.4256879380709833
Saving the model as model_10hdn_15ep.pth


---
## Step 5: Test the Trained Network

Finally, I test my trained model on **test data** which it has NOT seen during training, and evaluate its performance in terms of RMSE. This calculation is completed by another function in `helpers.py`.

Testing on unseen data is a good way to check that our model generalizes well and, in this case, whether it can generalize to future data (2010 vs 2006-2009). 

It may also be useful to be granular in this analysis and take a look at the distribution of errors that the model tends to make by comparing actual versus predicted values. 

In [9]:
from helpers import test_eval

test_rmse = test_eval(model, test_loader, criterion)

print('Test RMSE: {:.6f}\n'.format(test_rmse))


Test RMSE: 0.318717
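
How `test_eval` derives RMSE from the `criterion` is not shown in this notebook. One careful way to compute RMSE over a loader is sketched below, under my own assumptions: `rmse_over_loader` is a hypothetical helper that sums squared errors across all batches and takes a single square root at the end.

```python
import math
import torch


def rmse_over_loader(model, loader):
    """Hypothetical evaluation helper: RMSE over every sample in `loader`."""
    model.eval()
    squared_error_sum, n_seen = 0.0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for features, targets in loader:
            preds = model(features)
            squared_error_sum += ((preds - targets) ** 2).sum().item()
            n_seen += targets.numel()
    # one square root over the pooled squared errors
    return math.sqrt(squared_error_sum / n_seen)
```

Pooling the squared errors matters because averaging per-batch RMSE values generally gives a different (and batch-size dependent) number than the RMSE over the full test set.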



**Simple NN RMSE** 📝

This looks like a _slight_ improvement on the baseline model, which is promising!
> From a baseline of around 0.34 RMSE to a simple NN value of about 0.32 RMSE.

Simple NNs should produce at least comparable results to a good baseline. And, of course, I will have to look at the distribution of errors to see whether this model does better in the way we care about for this use case: being less likely to under-estimate power usage. 

---
## Step 6: Further Comparing Predictions vs Targets

📝 The RMSE looks slightly improved when compared to the linear regression model trained as a baseline, which is promising! 

Next, I compare target versus predicted values in the same way that I would do for a baseline—looking at the distribution of errors and especially attending to under-estimations. For this, I am using turicreate's `show()` but other summary stats or viz tools will work well too. 

In [10]:
# actual and predicted values (from model)
actual = test_dataset[:][1]
preds = model(test_dataset[:][0])

# get diffs and turn into numpy array
diffs = actual - preds
diffs_np = diffs.squeeze().detach().numpy()
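
Because `diffs` is computed as actual minus predicted, positive values are under-estimates. A few summary statistics can quantify them directly; the sketch below uses toy numbers, and `underestimation_summary` is a hypothetical helper that could be applied to `diffs_np`.

```python
import numpy as np


def underestimation_summary(diffs):
    """Summarise errors where diffs = actual - predicted (positive => under-estimate)."""
    diffs = np.asarray(diffs, dtype=float)
    under = diffs[diffs > 0]
    return {
        "under_rate": under.size / diffs.size,                     # share of under-estimates
        "mean_under": float(under.mean()) if under.size else 0.0,  # typical shortfall
        "max_under": float(under.max()) if under.size else 0.0,    # worst shortfall
    }


# ASSUMPTION: toy residuals for illustration; in the notebook this would be diffs_np
toy_diffs = np.array([0.5, -0.2, 0.1, -0.4])
summary = underestimation_summary(toy_diffs)
print(summary)
```

Tracking the under-estimation rate alongside RMSE makes it possible to compare this model with the baseline on the error type that matters most here.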

In [13]:
import turicreate as tc

# convert np array to SFrame for tc distribution viz
diffs_tc = tc.SFrame(diffs_np)

# uncomment to display the distribution of errors
#diffs_tc.show()

---
## Step 7: Further UX Considerations 📝 

At Apple, we are always thinking about the nuances of the user experience for different populations. This section represents answers to a set of questions that ask us to consider inclusive design practices, such as:

* **Failure cases**: What might go wrong, and how does the likelihood of failures vary across users?
* **Delight**: What potential impact of the OBC feature are you most excited about?

For any model you are thinking of putting into production or sharing with a larger team, you should critically consider the different tradeoffs and impacts such a trained model could have on different users. 

### Potential failure cases: 

* **User experience and fairness**: I would like to test this model on different locales around the world to see if performance is fair/even across different geographic locations. 
    * Taking averages and training on limited data will bias this model towards the locale that is best represented in this data and in these averages, which in this case is one locale: a household near Paris, France (from the dataset description)
    * Further, do different locales have the same style and format of sub-meter readings or do they have more/less/different input information? I could imagine that if different sub-meters are attached to different rooms in a house or one house has way more or way fewer appliances, or even if the meters record information in a different format—different models may need to be created depending on the available information.

* **Reducing costly errors**: This model still under-estimates occasionally. That is a costly error because it undermines how power companies prepare to deliver power, so we may want to tune the model or establish rule-based cutoffs that make under-estimations much less likely. 

* **Extreme conditions**: Such a model will likely only work in standard (predictable) conditions; we need a failsafe for, say, blackouts, particularly cold or hot weather where people may be over-using A/C, etc.—I should partner with power companies and users to better understand these failure risks. 
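
One possible rule-based cutoff for the "reducing costly errors" point is to add a fixed safety margin to every prediction, sized from held-out residuals. The sketch below is an assumption-laden illustration: `safety_margin` is a hypothetical helper, the 95% coverage level is arbitrary, and the residuals are synthetic rather than taken from this model.

```python
import numpy as np


def safety_margin(residuals, coverage=0.95):
    """Margin to add to predictions so that roughly `coverage` of the
    held-out residuals (actual - predicted) are no longer under-estimates."""
    return max(0.0, float(np.quantile(residuals, coverage)))


# ASSUMPTION: synthetic residuals stand in for errors on a validation split
rng = np.random.default_rng(0)
residuals = rng.normal(loc=0.0, scale=0.3, size=1000)
margin = safety_margin(residuals, coverage=0.95)

rate_before = np.mean(residuals > 0)
rate_after = np.mean(residuals - margin > 0)  # residuals after raising predictions
print(margin, rate_before, rate_after)
```

The margin trades accuracy for safety: raising every prediction lowers the under-estimation rate but increases over-estimation and RMSE, so the coverage level would need to be agreed with the power companies. It should also be fitted on validation data rather than on the test set.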

### Potential for delight: 

* **Environmental impact**: Ideally, these predictions can be used to efficiently allocate power resources so that none go to waste; low or zero-waste has positive environmental and financial impacts.

* **Open question**: How should these predictions be surfaced to power companies so they can make the most informed decisions? Perhaps it's important that they also know more about how these predictions are made or what the confidence of certain predictions is. I should discuss with power companies to better understand their decision-making processes. 
