<a href="https://colab.research.google.com/github/zelal-Eizaldeen/deeplearning_course/blob/main/3_13pytorch_regression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

- In this programming example, we will implement a network to solve a regression problem using **PyTorch**.

We will use the California housing dataset. Each training example describes a small district in California (a census block group), and the input variables give information about the houses there, such as their age, the average number of rooms, and the location. The output value is the median house price in that district. The idea is to train the network to predict this price, given the input variables that describe the district.

In [None]:
"""
The MIT License (MIT)
Copyright (c) 2021 NVIDIA
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
"""

This dataset is not included in PyTorch, so we will import it from scikit-learn. We are going to train with a batch size of 128 for 256 epochs.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import numpy as np

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
EPOCHS = 256
BATCH_SIZE = 128

We load the dataset with `fetch_california_housing` and separate the inputs (`data`) from the targets (`target`). We then split it into a training part and a test part using `train_test_split` from scikit-learn. We hold out 20% of the dataset as a test dataset and use the rest for training.

In [None]:
# Read dataset and split into train and test.
california_housing = fetch_california_housing()
data = california_housing.get('data')
target = california_housing.get('target')
raw_x_train, raw_x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=0)
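
As a rough check of what this split produces, the sketch below applies the same `train_test_split` call to zero-filled placeholder arrays with the shape of the California housing data (20,640 rows, 8 features). The placeholder arrays and the `split_*` variable names are assumptions made for illustration, and nothing is downloaded.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the California housing data
# (20,640 examples, 8 features); the values themselves are dummies.
fake_data = np.zeros((20640, 8))
fake_target = np.zeros(20640)

split_x_train, split_x_test, split_y_train, split_y_test = train_test_split(
    fake_data, fake_target, test_size=0.2, random_state=0)

print(split_x_train.shape, split_x_test.shape)  # (16512, 8) (4128, 8)
```

With `test_size=0.2`, 4,128 examples are held out for testing and 16,512 remain for training.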

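Next we convert the arrays to the data type used by the model. PyTorch layers use float32 parameters by default, while the arrays loaded from scikit-learn are float64, so we cast everything to float32. If we don't do that, we will get an error message later. We also reshape the targets into a column of shape (N, 1) so that they match the shape of the model's output.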

In [None]:
# Convert to same precision as model.
raw_x_train = raw_x_train.astype(np.float32)
raw_x_test = raw_x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
y_train = np.reshape(y_train, (-1, 1))
y_test = np.reshape(y_test, (-1, 1))
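
To illustrate the error mentioned above, here is a minimal sketch that feeds a float64 array into a default (float32) linear layer. The `layer` and `x64` names are made up for this example, and the exact error text depends on the PyTorch version, so it is only printed, not quoted.

```python
import numpy as np
import torch
import torch.nn as nn

layer = nn.Linear(8, 1)   # parameters are float32 by default
x64 = np.zeros((4, 8))    # NumPy arrays are float64 by default

try:
    layer(torch.from_numpy(x64))  # input dtype does not match the weights
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print('RuntimeError:', err)

# After the cast to float32, the forward pass works.
out = layer(torch.from_numpy(x64.astype(np.float32)))
print(out.shape, out.dtype)
```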


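We want to standardize the input data so that each feature has zero mean and unit standard deviation. Note that the mean and standard deviation are computed on the training set only and are then applied to both the training and the test set.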

In [None]:
# Standardize the data.
x_mean = np.mean(raw_x_train, axis=0)
x_stddev = np.std(raw_x_train, axis=0)
x_train = (raw_x_train - x_mean) / x_stddev
x_test = (raw_x_test - x_mean) / x_stddev
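
The following sketch repeats the same computation on synthetic data to show its effect. The random arrays and the `demo_*` names are assumptions for illustration only. After standardization, every training column has mean 0 and standard deviation 1, while the test columns are only approximately standardized because they reuse the training statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins for the raw features: 8 columns on very different scales.
scales = np.array([1.0, 10.0, 100.0, 0.1, 1000.0, 5.0, 2.0, 50.0])
demo_train = rng.normal(loc=3.0, scale=1.0, size=(1000, 8)) * scales
demo_test = rng.normal(loc=3.0, scale=1.0, size=(200, 8)) * scales

# The statistics come from the training split only.
demo_mean = np.mean(demo_train, axis=0)
demo_std = np.std(demo_train, axis=0)
demo_train_std = (demo_train - demo_mean) / demo_std
demo_test_std = (demo_test - demo_mean) / demo_std

print(np.round(demo_train_std.mean(axis=0), 3))  # approximately 0 per column
print(np.round(demo_train_std.std(axis=0), 3))   # approximately 1 per column
print(np.round(demo_test_std.mean(axis=0), 3))   # close to 0, but not exactly
```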

And then we **create the dataset objects**.

In [None]:
# Create Dataset objects.
trainset = TensorDataset(torch.from_numpy(x_train),
                         torch.from_numpy(y_train))
testset = TensorDataset(torch.from_numpy(x_test),
                        torch.from_numpy(y_test))
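
A `TensorDataset` is typically consumed through a `DataLoader`, which groups the examples into batches. The sketch below uses zero-filled tensors with the shape of the training split (16,512 examples, assuming the 80/20 split of 20,640 rows) to show how many batches a batch size of 128 gives. The `demo_*` names are illustrative, and whether `utilities.py` builds its loaders exactly like this is an assumption.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Dummy tensors shaped like the training split (16,512 examples, 8 features).
demo_x = torch.zeros(16512, 8)
demo_y = torch.zeros(16512, 1)
demo_set = TensorDataset(demo_x, demo_y)

demo_loader = DataLoader(dataset=demo_set, batch_size=128, shuffle=True)
print(len(demo_loader))              # 129 batches per epoch (16512 / 128)

batch_x, batch_y = next(iter(demo_loader))
print(batch_x.shape, batch_y.shape)  # torch.Size([128, 8]) torch.Size([128, 1])
```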

Now it's time to **create the model**. The first layer is a linear layer with eight inputs (one per input variable) and 32 neurons, followed by a ReLU activation function. In PyTorch, the number of inputs to a layer must match the number of outputs from the previous layer, so the output layer takes 32 inputs. The output layer has a single neuron, because we are predicting a single value, and it has no activation function, that is, its output is linear.

In [None]:
# Create model.
model = nn.Sequential(
    nn.Linear(8, 32),
    nn.ReLU(),
    nn.Linear(32, 1)
)
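
As a quick sanity check on the layer sizes, this sketch counts the trainable parameters of the same architecture and pushes a dummy batch through it. The `demo_model` and `dummy_batch` names are only used here.

```python
import torch
import torch.nn as nn

demo_model = nn.Sequential(
    nn.Linear(8, 32),
    nn.ReLU(),
    nn.Linear(32, 1)
)

# Each Linear layer holds in_features * out_features weights plus one bias
# per output: (8 * 32 + 32) + (32 * 1 + 1) = 288 + 33 = 321.
num_params = sum(p.numel() for p in demo_model.parameters())
print(num_params)  # 321

# A batch of 128 examples with 8 features gives 128 single-value outputs.
dummy_batch = torch.zeros(128, 8)
print(demo_model(dummy_batch).shape)  # torch.Size([128, 1])
```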


To initialize the weights, we use **the Xavier (Glorot) uniform initialization function** for the linear layers, and we set the biases to zero. We will use the **Adam optimizer** and the **mean squared error (MSE) loss function**. MSE loss is what you want to use when you have a regression problem and are **using a linear activation function in the output layer.**

In [None]:
# Initialize weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.constant_(module.bias, 0.0)

# Loss function and optimizer
optimizer = torch.optim.Adam(model.parameters())
loss_function = nn.MSELoss()
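
To make these two choices more concrete, the sketch below checks the range that Xavier uniform initialization draws from and compares `nn.MSELoss` with a manual computation. The layer and the three example predictions are made up for illustration.

```python
import math
import torch
import torch.nn as nn

layer = nn.Linear(8, 32)
nn.init.xavier_uniform_(layer.weight)
nn.init.constant_(layer.bias, 0.0)

# With the default gain of 1, Xavier (Glorot) uniform samples weights from
# U(-a, a) where a = sqrt(6 / (fan_in + fan_out)).
bound = math.sqrt(6.0 / (8 + 32))
print(round(bound, 4))  # 0.3873
# Small tolerance for float32 rounding of the bound.
print(layer.weight.abs().max().item() <= bound + 1e-6)  # True

# nn.MSELoss (default reduction) averages the squared differences.
pred = torch.tensor([[1.5], [2.0], [3.0]])
true = torch.tensor([[1.0], [2.0], [5.0]])
manual_mse = ((pred - true) ** 2).mean()
builtin_mse = nn.MSELoss()(pred, true)
print(builtin_mse.item(), manual_mse.item())  # both about 1.4167
```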

# utilities.py

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
import sys, importlib

In [None]:

ROOT = "your_directory_to_utilities.py"
if ROOT not in sys.path:
    sys.path.insert(0, ROOT)

# Clear cache and import
sys.modules.pop("utilities", None)
import utilities; importlib.reload(utilities)

from utilities import train_model  # should work now

In [None]:
from utilities import train_model


To **train the model**, we have refactored the code a little so that we don't have to write the training function over and over again. The training loop now lives in utilities.py, in a function called `train_model`.

This function contains the outer training loop over the epochs and the inner loop over all the batches, which moves each batch to the GPU, does the forward pass, accumulates metrics, and then does the backward pass.

One thing we have changed is that the function now takes a parameter that says **what metric we want to print.** When we look at a classification problem, we want to look at accuracy: how many of the examples did we classify correctly? For a regression problem, that doesn't make sense, because we will almost never predict exactly the right value. Instead we want some other **metric** that tells us **how far away we are from the true value**, so we want to **use mean absolute error (MAE) instead**.

So `train_model` can take either accuracy or mean absolute error as the metric, and depending on which one you give, it calculates the metric a little differently: **it either checks whether the prediction and the label are equal, or it accumulates the absolute error**. We call `train_model` with the model, the device (GPU or CPU), the number of epochs, the batch size, the training and test datasets, the optimizer, the loss function, and finally the metric.
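
Since utilities.py itself is not shown in this notebook, here is a minimal sketch of what such a function might look like. It follows the steps described above, but the names `run_epoch` and `train_model_sketch`, the print format, and the return value (final training and test metric) are assumptions, not the actual contents of utilities.py.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def run_epoch(model, loader, device, loss_function, metric, optimizer=None):
    """One pass over loader; weights are updated only if optimizer is given."""
    training = optimizer is not None
    model.train(training)
    total_loss, total_metric, batches, examples = 0.0, 0.0, 0, 0
    with torch.set_grad_enabled(training):
        for inputs, targets in loader:
            # Move the batch to the chosen device (GPU or CPU).
            inputs, targets = inputs.to(device), targets.to(device)
            if training:
                optimizer.zero_grad()
            # Forward pass.
            outputs = model(inputs)
            loss = loss_function(outputs, targets)
            # Accumulate metrics.
            if metric == 'acc':
                # Classification: count predictions that equal the label.
                total_metric += (outputs.argmax(dim=1) == targets).sum().item()
            else:
                # Regression ('mae'): accumulate the absolute error.
                total_metric += (outputs.detach() - targets).abs().sum().item()
            total_loss += loss.item()
            batches += 1
            examples += targets.shape[0]
            # Backward pass and weight update.
            if training:
                loss.backward()
                optimizer.step()
    return total_loss / batches, total_metric / examples


def train_model_sketch(model, device, epochs, batch_size, trainset, testset,
                       optimizer, loss_function, metric):
    """Hypothetical stand-in for utilities.train_model ('acc' or 'mae')."""
    model.to(device)
    trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True)
    testloader = DataLoader(testset, batch_size=batch_size, shuffle=False)
    train_metric = val_metric = 0.0
    for epoch in range(epochs):
        loss, train_metric = run_epoch(model, trainloader, device,
                                       loss_function, metric, optimizer)
        val_loss, val_metric = run_epoch(model, testloader, device,
                                         loss_function, metric)
        print(f'Epoch {epoch + 1}/{epochs} loss: {loss:.4f} - '
              f'{metric}: {train_metric:.4f} - val_loss: {val_loss:.4f} - '
              f'val_{metric}: {val_metric:.4f}')
    return [train_metric, val_metric]


# Usage on a tiny synthetic regression problem (not the housing data).
torch.manual_seed(0)
toy_x = torch.randn(256, 8)
toy_y = toy_x.sum(dim=1, keepdim=True)
toy_train = TensorDataset(toy_x[:192], toy_y[:192])
toy_test = TensorDataset(toy_x[192:], toy_y[192:])
toy_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
toy_optimizer = torch.optim.Adam(toy_model.parameters())
final_mae = train_model_sketch(toy_model, torch.device('cpu'), 5, 32,
                               toy_train, toy_test, toy_optimizer,
                               nn.MSELoss(), 'mae')
```

The sketch divides the accumulated metric by the number of examples actually seen, so a smaller final batch does not distort the average.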

In [None]:
# Train model.
utilities.train_model(model, device, EPOCHS, BATCH_SIZE, trainset, testset,
            optimizer, loss_function, 'mae')

Epoch 1/256 loss: 2.5985 - mae: 1.2328 - val_loss: 1.0307 - val_mae: 0.7184
Epoch 2/256 loss: 0.8076 - mae: 0.6481 - val_loss: 0.7120 - val_mae: 0.5942
Epoch 3/256 loss: 0.6543 - mae: 0.5801 - val_loss: 0.6172 - val_mae: 0.5524
Epoch 4/256 loss: 0.5738 - mae: 0.5412 - val_loss: 0.5629 - val_mae: 0.5198
Epoch 5/256 loss: 0.5124 - mae: 0.5108 - val_loss: 0.4879 - val_mae: 0.4905
Epoch 6/256 loss: 0.4711 - mae: 0.4890 - val_loss: 0.4560 - val_mae: 0.4724
Epoch 7/256 loss: 0.4401 - mae: 0.4726 - val_loss: 0.4642 - val_mae: 0.4608
Epoch 8/256 loss: 0.4214 - mae: 0.4613 - val_loss: 0.4166 - val_mae: 0.4491
Epoch 9/256 loss: 0.4084 - mae: 0.4547 - val_loss: 0.4349 - val_mae: 0.4473
Epoch 10/256 loss: 0.4006 - mae: 0.4502 - val_loss: 0.4069 - val_mae: 0.4414
Epoch 11/256 loss: 0.3937 - mae: 0.4475 - val_loss: 0.3938 - val_mae: 0.4377
Epoch 12/256 loss: 0.3886 - mae: 0.4437 - val_loss: 0.4048 - val_mae: 0.4362
Epoch 13/256 loss: 0.3855 - mae: 0.4416 - val_loss: 0.3878 - val_mae: 0.4358
Epoch 14

[0.36554122317669, 0.3688814357826204]

We can see that we got a loss of 0.28 and a validation loss of 0.297. We will now also use **this trained model to make some predictions**. We take the test inputs, move them to the device, and then apply the model to them. This produces a prediction for every example in the test dataset. We then print the first three predictions together with the actual labels. Comparing each prediction with its true value, we can see that they are somewhat similar to each other.

In [None]:
# Print first 3 predictions.
inputs = torch.from_numpy(x_test)
inputs = inputs.to(device)
outputs = model(inputs)
for i in range(0, 3):
    print('Prediction: %4.2f' % outputs.data[i].item(),
         ', true value: %4.2f' % y_test[i].item())

Prediction: 1.53 , true value: 1.37
Prediction: 2.50 , true value: 2.41
Prediction: 1.34 , true value: 2.01
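
Looking at three examples only gives a rough impression. A small sketch of how the mean absolute error over a whole set of predictions could be computed is shown below. It uses an untrained model and random stand-in data (all `demo_*` names are assumptions), wrapped in `torch.no_grad()` since no gradients are needed for prediction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
demo_inputs = torch.randn(100, 8)       # stand-in for the standardized x_test
demo_targets = torch.rand(100, 1) * 5   # stand-in for y_test

demo_model.eval()        # inference mode (matters for layers such as dropout)
with torch.no_grad():    # no gradient bookkeeping while predicting
    demo_outputs = demo_model(demo_inputs)

# Mean absolute error over all examples, not just the first three.
mae = (demo_outputs - demo_targets).abs().mean().item()
print('MAE over all examples: %4.2f' % mae)
```

Since scikit-learn expresses this dataset's target in units of $100,000, an MAE of 0.37 on the real data would correspond to an average error of roughly $37,000.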


# Modified Version

Let's now look at how we can modify this network to make it a little better. We will use a deeper network.

If you look at the **definition of the model**, you can see that we now have three layers instead of two: **the first layer now has 256 neurons instead of 32, and then we have another hidden layer, which also has 256 neurons.**
- When I first tried that, I saw that **the training error went down much more than the test error due to overfitting,** so we also added some **dropout layers as regularization to try to reduce the amount of overfitting.**

In [None]:
# Create model.
model = nn.Sequential(
    nn.Linear(8, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(256, 1)
)
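
Dropout behaves differently during training and evaluation, which the sketch below illustrates on a tensor of ones. The `drop` and `ones` names are made up for this example. In training mode, roughly 30% of the entries are zeroed and the surviving ones are scaled by 1 / (1 - 0.3), while in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(0.3)
ones = torch.ones(1000)

drop.train()
train_out = drop(ones)   # random entries zeroed, survivors scaled by 1 / 0.7
drop.eval()
eval_out = drop(ones)    # identity at inference time

zero_fraction = (train_out == 0).float().mean().item()
print(train_out.unique())            # the values 0 and about 1.4286
print(zero_fraction)                 # roughly 0.3
print(torch.equal(eval_out, ones))   # True
```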

We initialize the weights, create the optimizer and the loss function, train the model, and print the first three predictions in the same way as before.

In [None]:
# Initialize weights.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.constant_(module.bias, 0.0)

# Loss function and optimizer
optimizer = torch.optim.Adam(model.parameters())
loss_function = nn.MSELoss()

# Train model.
train_model(model, device, EPOCHS, BATCH_SIZE, trainset, testset,
            optimizer, loss_function, 'mae')

# Print first 3 predictions.
inputs = torch.from_numpy(x_test)
inputs = inputs.to(device)
outputs = model(inputs)
for i in range(0, 3):
    print('Prediction: %4.2f' % outputs.data[i].item(),
         ', true value: %4.2f' % y_test[i].item())

Epoch 1/256 loss: 1.2168 - mae: 0.7010 - val_loss: 0.4743 - val_mae: 0.4871
Epoch 2/256 loss: 0.5147 - mae: 0.5146 - val_loss: 0.3868 - val_mae: 0.4351
Epoch 3/256 loss: 0.4563 - mae: 0.4821 - val_loss: 0.3994 - val_mae: 0.4223
Epoch 4/256 loss: 0.4390 - mae: 0.4720 - val_loss: 0.3825 - val_mae: 0.4189
Epoch 5/256 loss: 0.4196 - mae: 0.4613 - val_loss: 0.6780 - val_mae: 0.4210
Epoch 6/256 loss: 0.4270 - mae: 0.4540 - val_loss: 0.3643 - val_mae: 0.4021
Epoch 7/256 loss: 0.3996 - mae: 0.4449 - val_loss: 0.3372 - val_mae: 0.3999
Epoch 8/256 loss: 0.4082 - mae: 0.4453 - val_loss: 0.3584 - val_mae: 0.3981
Epoch 9/256 loss: 0.3891 - mae: 0.4377 - val_loss: 0.3191 - val_mae: 0.3830
Epoch 10/256 loss: 0.3789 - mae: 0.4315 - val_loss: 0.3209 - val_mae: 0.3849
Epoch 11/256 loss: 0.3678 - mae: 0.4259 - val_loss: 0.3976 - val_mae: 0.3812
Epoch 12/256 loss: 0.3607 - mae: 0.4209 - val_loss: 0.3264 - val_mae: 0.3864
Epoch 13/256 loss: 0.3578 - mae: 0.4195 - val_loss: 0.3200 - val_mae: 0.3699
Epoch 14

We can see now that, after finishing this training, the loss is down to 0.219, and similarly, the validation loss is at 0.239. The predictions are probably also a little bit closer to the true values. So by **making a more complex network with more layers and more neurons per layer, we got a little bit better results from the network.**

You can continue to experiment with this. One thing you could try is to remove the dropout layers. I would expect the training loss to go down and become significantly lower, while the validation loss probably would not, so you would basically have more overfitting of your network in that case.
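
One way to set up that experiment is a small helper that builds the deeper network with or without dropout. The sketch below is only a suggestion: `build_model` and its parameters are hypothetical and not part of the original code, although the layer sizes and the initialization follow the cells above.

```python
import torch.nn as nn


def build_model(hidden=256, dropout=0.3):
    """Hypothetical helper: deeper regression network with optional dropout."""
    layers = [nn.Linear(8, hidden), nn.ReLU()]
    if dropout > 0.0:
        layers.append(nn.Dropout(dropout))
    layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    if dropout > 0.0:
        layers.append(nn.Dropout(dropout))
    layers.append(nn.Linear(hidden, 1))
    model = nn.Sequential(*layers)
    # Same initialization as used earlier in this notebook.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            nn.init.xavier_uniform_(module.weight)
            nn.init.constant_(module.bias, 0.0)
    return model


with_dropout = build_model(dropout=0.3)
without_dropout = build_model(dropout=0.0)
print(sum(isinstance(m, nn.Dropout) for m in with_dropout.modules()))     # 2
print(sum(isinstance(m, nn.Dropout) for m in without_dropout.modules()))  # 0
```

Each variant can then be trained with its own freshly created optimizer and the same `train_model` call as above, so that the training and validation losses of the two runs can be compared.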