**NOTE:** <br>
For best results, please use the provided Docker container and run this notebook inside that.
You can also run the notebook in Google Colab.

# Homegrown Linear Regression (LR) via numpy

Here, we will implement linear regression (a single layer network, with an architecture "2-1", corresponding to a model with 2 inputs, and a single output node) using array programming, and also in network fashion using numpy.

NumPy provides an n-dimensional array class and many functions for manipulating these arrays. NumPy is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. However, we can easily use NumPy to fit a single-layer network (linear regression) to random data by manually implementing the forward pass and the backward pass (gradient computation) with NumPy operations.

The code below trains a linear regression model on random data, so the model learns nothing meaningful; it is just an exercise.
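
For reference, the gradient that the backward pass computes by hand can be derived as follows. This is a sketch in the notation of the code below, writing $m$ for `m_rows`, $X$ for `X_train`, $y$ for `y_train`, and $W$ for `W1`; it assumes a single output column, as in this example:

```latex
\begin{aligned}
\hat{y} &= XW \\
L(W) &= \frac{1}{m}\sum_{i=1}^{m}\left(\hat{y}_i - y_i\right)^2 \\
\nabla_W L &= \frac{2}{m}\,X^{\top}\left(XW - y\right)
\end{aligned}
```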


**QUESTION:** Run the code below, modifying it as needed, and answer the following question. What is the value of the gradient for the first iteration of gradient descent?

In [1]:
import numpy as np

# m_rows is batch size; 
# D_in is input dimension;
# D_out is output dimension.
m_rows, D_in, D_out = 64, 2, 1
#m_rows, D_in, D_out = 64, 1000, 1

np.random.seed(seed=42) #fix the seed
# Create random input and output data
X_train = np.random.randn(m_rows, D_in)
y_train = np.random.randn(m_rows, D_out)

# Randomly initialize weights
W1 = np.random.randn(D_in, D_out) # shape (D_in, D_out): one weight per input feature

learning_rate = 1e-3
for epoch in range(10):  #Gradient descent
    # Forward pass: compute predicted y
    y_pred = X_train.dot(W1)
    #y_pred = X_train @ W1 

    # Compute and print loss
    loss = np.square(y_pred - y_train).mean()
    print(f"Epoch:{epoch}, MSE: {loss}")

    # Backprop: compute the gradient of the loss with respect to W1
    err = 2.0 * (y_pred - y_train)
    grad_W1 = X_train.T.dot(err)/m_rows  #weighted sum of the train data
    print(f"Epoch:{epoch}, Gradient: {np.round(grad_W1, 3)}")
    
    # Update weights via gradient descent
    W1 -= learning_rate * grad_W1
    print('--------------------')
    
print(f"Weights after training: ,{np.round(W1, 3)}")  

Epoch:0, MSE: 2.389670625903901
Epoch:0, Gradient: [[ 0.587]
 [-2.399]]
--------------------
Epoch:1, MSE: 2.383574809721735
Epoch:1, Gradient: [[ 0.587]
 [-2.395]]
--------------------
Epoch:2, MSE: 2.377502693303068
Epoch:2, Gradient: [[ 0.586]
 [-2.39 ]]
--------------------
Epoch:3, MSE: 2.3714541834963083
Epoch:3, Gradient: [[ 0.585]
 [-2.385]]
--------------------
Epoch:4, MSE: 2.3654291875192457
Epoch:4, Gradient: [[ 0.585]
 [-2.38 ]]
--------------------
Epoch:5, MSE: 2.359427612957574
Epoch:5, Gradient: [[ 0.584]
 [-2.375]]
--------------------
Epoch:6, MSE: 2.3534493677634245
Epoch:6, Gradient: [[ 0.583]
 [-2.371]]
--------------------
Epoch:7, MSE: 2.3474943602538985
Epoch:7, Gradient: [[ 0.583]
 [-2.366]]
--------------------
Epoch:8, MSE: 2.3415624991096156
Epoch:8, Gradient: [[ 0.582]
 [-2.361]]
--------------------
Epoch:9, MSE: 2.3356536933732595
Epoch:9, Gradient: [[ 0.581]
 [-2.357]]
--------------------
Weights after training: ,[[ 0.208]
 [-1.222]]
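
The hand-derived gradient used in the cell above can be sanity-checked against a finite-difference approximation. The sketch below is illustrative only: the helper names (`mse_loss`, `analytic_grad`, `numeric_grad`) and the random data are assumptions, not part of the notebook.

```python
import numpy as np

def mse_loss(W, X, y):
    """Mean squared error of the linear model X @ W."""
    return np.square(X @ W - y).mean()

def analytic_grad(W, X, y):
    """Closed-form gradient (2/m) * X^T (XW - y), as in the cell above."""
    return X.T @ (2.0 * (X @ W - y)) / X.shape[0]

def numeric_grad(W, X, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        step = np.zeros_like(W)
        step[idx] = eps
        grad[idx] = (mse_loss(W + step, X, y) - mse_loss(W - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)  # arbitrary seed, unrelated to the cell above
X = rng.standard_normal((64, 2))
y = rng.standard_normal((64, 1))
W = rng.standard_normal((2, 1))

max_diff = np.abs(analytic_grad(W, X, y) - numeric_grad(W, X, y)).max()
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

If the two gradients disagree by more than rounding error, the backward pass is likely wrong.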


# Linear regression using tensors and autograd


Traditionally, say using NumPy, we had to manually implement both the forward and backward passes to train a linear regression model (aka a single-layer neural network). Manually implementing the backward pass is not a big deal for a linear regression model or for a small, shallow network, but it can quickly get very tricky for larger multilayer networks.

Thankfully, we can use automatic differentiation to automate the computation of backward passes in neural networks. The autograd package in PyTorch provides exactly this functionality.

When using autograd,
* **the forward pass of your network (code for making a prediction) will define a computational graph;**
* nodes in the graph will be Tensors,
* and edges will be functions that produce output Tensors from input Tensors.

PyTorch builds up a graph as you compute the forward pass through the network, and one call to **backward()** on some result node (loss function) then augments each intermediate node in the graph with the gradient of the result node with respect to that intermediate node.
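
A minimal sketch of this behaviour, assuming a small random problem (the shapes and the seed are arbitrary and not taken from this notebook): the forward pass builds the graph, `backward()` fills in `W.grad`, and the result is compared with the gradient derived by hand for the mean squared error.

```python
import torch

torch.manual_seed(0)
X = torch.randn(8, 3)
y = torch.randn(8, 1)
W = torch.randn(3, 1, requires_grad=True)

loss = (X @ W - y).pow(2).mean()  # forward pass builds the graph
loss.backward()                   # fills W.grad with d(loss)/dW

# Hand-derived gradient of the mean squared error, for comparison.
manual_grad = 2.0 * X.T @ (X @ W.detach() - y) / X.shape[0]
matches = torch.allclose(W.grad, manual_grad, atol=1e-5)
print(matches)
```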

This sounds complicated, but it's pretty simple to use in practice. In older versions of PyTorch, Tensors had to be wrapped in Variable objects to take part in a computational graph. Since PyTorch 0.4, Variable and Tensor have been merged: a Tensor created with requires_grad=True is itself a node in the graph. If x is such a Tensor, then x.grad is another Tensor holding the gradient of some scalar value (typically the loss) with respect to x.

The terms Variable and Tensor are therefore used interchangeably here.

Here we use PyTorch tensor variables and autograd to implement our single-layer network (linear regression model); now we no longer need to manually implement the backward pass through the network:

**Question** Run the below code and modify as needed to get the value of the first element of the learnt linear regression model. What is the value of the first element of the learnt linear regression model?

In [17]:
import torch
from torch.autograd import Variable

dtype = torch.FloatTensor
# dtype = torch.cuda.FloatTensor # Uncomment this to run on GPU

# m_rows is batch size; D_in is input dimension;
# D_out is output dimension.
m_rows, D_in, D_out = 64, 1000, 1

torch.manual_seed(0)
# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
X_train = torch.randn((m_rows, D_in), requires_grad=False) #use (m_rows, D_in) or m_rows, D_in
Y_train = torch.randn(m_rows, D_out, requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
W1 = torch.randn(D_in, D_out,  requires_grad=True)

learning_rate = 1e-6
for epoch in range(50):
    # Forward pass: compute predicted y using operations on Variables; these
    # are exactly the same operations we used to compute the forward pass using
    # Tensors, but we do not need to keep references to intermediate values since
    # we are not implementing the backward pass by hand.
    Y_pred = X_train.matmul(W1)

    # Compute and print the loss. loss is a 0-dimensional Tensor holding
    # the mean squared error; loss.item() returns it as a Python float.
    loss = (Y_pred - Y_train).pow(2).mean()
    if (epoch +1 )%10 ==0:  #print every 10 epochs
        print(f"Epoch:{epoch+1}, MSE: {loss.data.numpy():.6}")

    # Use autograd to compute the backward pass. This call computes the
    # gradient of the loss with respect to every Tensor with requires_grad=True.
    # After this call, W1.grad is a Tensor holding the gradient
    # of the loss with respect to W1.
    loss.backward()

    # Update weights using gradient descent; W1.data and W1.grad.data are
    # the underlying Tensors of the weights and of their gradient, and
    # operations on them are not tracked by autograd.
    W1.data -= learning_rate * W1.grad.data
    # Manually zero the gradients after updating weights
    W1.grad.data.zero_()

print('------------------------')    
# print the first value of the learnt linear regression model weight

# TODO: Add code below inside the print statement
print(W1[0])

Epoch:10, MSE: 829.467
Epoch:20, MSE: 828.909
Epoch:30, MSE: 828.352
Epoch:40, MSE: 827.794
Epoch:50, MSE: 827.237
------------------------
tensor([1.8839], grad_fn=<SelectBackward0>)
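
In recent PyTorch versions, the same manual update is more commonly written inside a `torch.no_grad()` block rather than through `.data`. The sketch below shows that variant on a smaller toy problem; the shapes, the learning rate, and the number of epochs are illustrative assumptions, not the settings of the cell above.

```python
import torch

torch.manual_seed(0)
X = torch.randn(64, 5)
Y = torch.randn(64, 1)
W = torch.randn(5, 1, requires_grad=True)

learning_rate = 1e-2  # illustrative value
losses = []
for epoch in range(20):
    loss = (X @ W - Y).pow(2).mean()
    losses.append(loss.item())
    loss.backward()
    with torch.no_grad():   # the update itself must not be tracked by autograd
        W -= learning_rate * W.grad
    W.grad.zero_()          # gradients accumulate unless they are reset

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```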


# PyTorch: optim


Up to this point we have updated the weights of our models by manually mutating the .data member for Variables holding learnable parameters.

`W1.data -= learning_rate * W1.grad.data`

This is not a huge burden for simple optimization algorithms like stochastic gradient descent on a single-layer network (such as a multiple linear regression (MLR) model), but in practice we often train neural networks using more sophisticated optimizers like AdaGrad, RMSProp, Adam, etc. More on these optimizers later.

The **optim** package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of commonly used optimization algorithms. Most commonly used methods are already supported, and the interface is general enough, so that more sophisticated ones can be also easily integrated in the future.

The following example brings these two highly scalable concepts, computational graphs and optimization, to life:

* we will use the nn package to define our linear regression module,
* and we will optimize the model using the **Adam** algorithm, a variant of stochastic gradient descent. The **Adam** algorithm comes as part of the **optim** package.
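
To give a feel for what an optimizer such as Adam does on each step, here is a rough NumPy sketch of the standard Adam update rule applied to a toy quadratic. It is not PyTorch's implementation; the function name `adam_step`, the toy objective, and the learning rate of 0.05 are assumptions made for illustration.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction, step t >= 1
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy problem: minimise f(w) = ||w - target||^2.
target = np.array([3.0, -2.0])
w = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = 2 * (w - target)
    w, m, v = adam_step(w, grad, m, v, t, lr=0.05)

print(np.round(w, 3))
```

Unlike plain gradient descent, the step size is scaled per weight by the running estimate of the squared gradient, which is why the optimizer has to keep state (`m`, `v`, `t`) between calls.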

# Building PyTorch Networks using the Sequential API

The **Sequential** class allows us to build a neural network in PyTorch in a high-level quick and modular manner.

* The Sequential class allows us to build PyTorch neural networks on-the-fly without having to build an explicit class.
* This makes it much easier to rapidly build networks and allows us to skip the step where we implement the forward() method. When we build a PyTorch network sequentially, the forward() method is constructed implicitly from the order in which we list the layers.
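
To make the implicit forward() concrete, the sketch below writes the same single-layer model twice: once as an explicit `nn.Module` subclass and once with `Sequential`. The class name `LinearRegression` is hypothetical; after copying the weights across, both models should produce the same output.

```python
import torch

class LinearRegression(torch.nn.Module):  # hypothetical name, for illustration
    def __init__(self, d_in, d_out):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)

    def forward(self, x):
        # With Sequential, this method is derived from the layer order.
        return self.linear(x)

torch.manual_seed(0)
explicit = LinearRegression(1000, 1)
sequential = torch.nn.Sequential(torch.nn.Linear(1000, 1))

# Copy the weights so that both models compute the same function.
sequential[0].load_state_dict(explicit.linear.state_dict())

x = torch.randn(4, 1000)
same = torch.allclose(explicit(x), sequential(x))
print(same)
```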
 

Here we use the Sequential() API to build a single-layer neural network (on closer inspection, it is a linear regression model).

In [18]:
import torch
from torch.autograd import Variable

# m_rows is batch size; D_in is input dimension;
# D_out is output dimension.
m_rows, D_in, D_out = 64, 1000, 1

torch.manual_seed(0)
# Create random Tensors to hold input and outputs, and wrap them in Variables.
# Setting requires_grad=False indicates that we do not need to compute gradients
# with respect to these Variables during the backward pass.
X_train = torch.randn((m_rows, D_in), requires_grad=False) #use (m_rows, D_in) or m_rows, D_in
Y_train = torch.randn(m_rows, D_out, requires_grad=False)

# Create random Tensors for weights, and wrap them in Variables.
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Variables during the backward pass.
# Not needed here: nn.Linear creates and manages its own weights.
#  W1 = torch.randn(D_in, D_out,  requires_grad=True)

# Use the nn package to define our model and loss function.
# use the sequential API makes things simple
model = torch.nn.Sequential(  #  X_train @ W1
    torch.nn.Linear(D_in, D_out),   # X.matmul(W1)
)
# loss scaffolding layer (reduction='mean' replaces the deprecated size_average=True)
loss_fn = torch.nn.MSELoss(reduction='mean')

# Use the optim package to define an Optimizer that will update the weights of
# the model for us. Here we will use Adam; the optim package contains many other
# optimization algorithms. The first argument to the Adam constructor tells the
# optimizer which parameters (Tensors) it should update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
#optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
for epoch in range(50):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(X_train)

    # Compute and print loss.
    loss = loss_fn(y_pred, Y_train)
    if epoch % 10 == 0:
         print(f"Epoch:{epoch}, MSE: {loss.item():.9}")

    # Before the backward pass, use the optimizer object to zero all of the
    # gradients for the parameters it will update (which are the learnable
    # weights of the model). This is because, by default, gradients are
    # accumulated in buffers (i.e., not overwritten) whenever .backward()
    # is called. Check out the docs of torch.autograd.backward for more details.
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters
    optimizer.step()
print(model)


Epoch:0, MSE: 1.48583221
Epoch:10, MSE: 1.24699187
Epoch:20, MSE: 1.03472447
Epoch:30, MSE: 0.849931717
Epoch:40, MSE: 0.691670775
Sequential(
  (0): Linear(in_features=1000, out_features=1, bias=True)
)


# Activation Functions

A single-layer neural network, consisting of an input layer (not counted as a layer) and an output layer, is essentially just a linear regression model. A two-layer network, consisting of an input layer (not counted as a layer), a hidden layer, and an output layer, is still essentially just a linear regression model as long as no activation function is applied between the layers. Both networks are just linear transformations of the inputs. To make these networks nonlinear (and enhance their predictive power), we introduce a non-linear activation function that acts element-wise on the linearly transformed inputs.
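
The claim that stacked linear layers collapse into one can be checked numerically. The NumPy sketch below uses arbitrary random weights (all names and shapes are illustrative assumptions): two linear layers without an activation equal a single linear layer with weights `W1 @ W2` and bias `b1 @ W2 + b2`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((3, 1)), rng.standard_normal(1)

# Two stacked linear layers with no activation in between.
two_layer = (X @ W1 + b1) @ W2 + b2

# The same map written as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b
print(np.allclose(two_layer, one_layer))

# With a ReLU between the layers, the network is in general
# no longer equal to this single linear layer.
with_relu = np.maximum(X @ W1 + b1, 0) @ W2 + b2
print(np.allclose(with_relu, one_layer))
```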

 

A popular activation function is the sigmoid function, one of the most widely used non-linear activation functions. Sigmoid maps any real value into the range between 0 and 1. Here is the mathematical expression for sigmoid:

`f(x) = 1/(1+e^-x)`

```
import numpy as np
def sigmoid_function(x):
    z = (1/(1 + np.exp(-x)))
    return z
```  
    
**ReLU**

The ReLU function is another non-linear activation function that has gained popularity in the deep learning domain. ReLU stands for Rectified Linear Unit. A key advantage of ReLU over other activation functions is that it does not activate all the neurons at the same time: a neuron is deactivated (outputs 0) only when the output of its linear transformation is less than 0; otherwise the value passes through unchanged.

```
def relu_function(x):
    if x<0:
        return 0
    else:
        return x
```

`relu_function(7), relu_function(-7)  # returns (7, 0) as a result`

For the negative part of the input domain, you will notice that the value is zero.

In PyTorch, ReLU is available as a built-in layer via torch.nn.ReLU().
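
The scalar `relu_function` above fails on arrays, because `if x<0` is ambiguous for more than one element. A vectorized sketch in NumPy, with hypothetical function names `relu` and `sigmoid`, could look like this:

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x) for scalars or arrays."""
    return np.maximum(x, 0)

def sigmoid(x):
    """Element-wise logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-7.0, 0.0, 7.0])
print(relu(z))       # negative entries become 0, the rest pass through
print(sigmoid(0.0))  # 0.5, the midpoint of the (0, 1) range
```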


**The following are examples of neural networks consisting of input, hidden, and output layers:**

```    
network1 = nn.Sequential(
    nn.Linear(in_features, out_features),  # H = X.matmul(W1) + b1
    nn.ReLU(),                             # nonlinear activation: ReLU(H)
    nn.Linear(out_features, 1)             # ReLU(H).matmul(W2) + b2
)
```

```
network2 = nn.Sequential(
    nn.Linear(in_features, out_features),  # H = X.matmul(W1) + b1
    nn.Linear(out_features, 1)             # H.matmul(W2) + b2 (no activation, so still linear)
)
```

```
network3 = nn.Sequential(
    nn.Linear(in_features, out_features),   # H1 = X.matmul(W1) + b1
    nn.ReLU(),                              # nonlinear activation: ReLU(H1)
    nn.Linear(out_features, out_features),  # H2 = ReLU(H1).matmul(W2) + b2
    nn.ReLU(),                              # nonlinear activation: ReLU(H2)
    nn.Linear(out_features, 1)              # ReLU(H2).matmul(W3) + b3
)
```

# Boston house price regression via Sequential API

In [6]:
!pip install torchsummary

Collecting torchsummary
  Downloading torchsummary-1.5.1-py3-none-any.whl (2.8 kB)
Installing collected packages: torchsummary
Successfully installed torchsummary-1.5.1


In [14]:
from torchsummary import summary  #install it if necessary using !pip install torchsummary 
import torch
#import torchvision
import torch.utils.data
#import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; this cell needs an older version
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.preprocessing import StandardScaler
import warnings
warnings.filterwarnings('ignore')

# Set random seed for better reproducibility
torch.manual_seed(0)

# if GPU, use cuda-enabled GPU device else CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load data
boston = load_boston()

X = boston['data']
y = boston['target']

in_features = X.shape[1]
# X.shape

# train validation test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, shuffle=True)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.15, random_state=42, shuffle=True)


## Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_validation = scaler.transform(X_validation) #Transform validation set with the same constants
X_test = scaler.transform(X_test) #Transform test set with the same constants

# convert numpy arrays to tensors
X_train_tensor = torch.from_numpy(X_train)
X_validation_tensor = torch.from_numpy(X_validation)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_test_tensor = torch.from_numpy(y_test)
y_validation_tensor = torch.from_numpy(y_validation)

# create TensorDataset in PyTorch
boston_train = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
boston_validation = torch.utils.data.TensorDataset(X_validation_tensor, y_validation_tensor)
boston_test = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)
# create dataloader


train_batch_size = 96
valid_test_batch_size = 16

trainloader_boston = torch.utils.data.DataLoader(boston_train, batch_size=train_batch_size, shuffle=True, num_workers=2)
validloader_boston = torch.utils.data.DataLoader(boston_validation, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)
testloader_boston = torch.utils.data.DataLoader(boston_test, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)


#
# Method to create, define and run a deep neural network model
#
def run_boston_seq_model( 
    hidden_layer_neurons=[32, 16, 8],
    opt=optim.SGD,
    epochs=5,
    learning_rate=1e-3
):
    
    D_in = X_test.shape[1]  # Input layer neurons depend on the input dataset shape
    D_out = 1  # Output layer neurons - depend on what you're trying to predict, here, just a single value
    
    str_neurons = [str(h) for h in hidden_layer_neurons]
    arch_string = f"{D_in}-{'-'.join(str_neurons)}-{D_out}"
    
    # print(arch_string)
    
    layers = [
        torch.nn.Linear(D_in, hidden_layer_neurons[0]),  # X.matmul(W1)
        nn.ReLU(),  # ReLU( X.matmul(W1))
    ]
    
    # Add hidden layers
    for i in range(1, len(hidden_layer_neurons)):
        prev, curr = hidden_layer_neurons[i - 1], hidden_layer_neurons[i]
        layers.append(torch.nn.Linear(prev, curr))
        layers.append(nn.ReLU())
        
    
    # Add final layer
    layers.append(nn.Linear(hidden_layer_neurons[-1], D_out)) # Relu( X.matmul(W1)).matmul(W2))
    
    # Use the nn package to define our model and loss function.
    # Using the Sequential API keeps things simple.
    model = torch.nn.Sequential(*layers)
    model.to(device)  # batches are moved to `device` below, so the model must live there too

    # MSE loss scaffolding layer
    loss_fn = torch.nn.MSELoss(reduction='mean')
    optimizer = opt(model.parameters(), lr=learning_rate)

    print('-'*50)
    print('Model summary:')
    print(model)
    summary(model, (1, D_in))  # input size follows the data (13 features here)
    print('-'*50)

    # Print device used - CPU or GPU
    print(f"Using {device}...")
    
    '''
    Training process:
        Load a batch of data.
        Zero the gradients.
        Run the batch through the network (forward pass).
        Compute the loss from the predicted and true values.
        Backpropagate to get the gradient with respect to the parameters.
        Step the optimizer to update the parameters.
    '''

    loss_history = []
    acc_history = []
    def train_epoch(epoch, model, loss_fn, opt, train_loader):
        running_loss = 0.0
        count = 0
        # dataset API gives us pythonic batching 
        for batch_id, data in enumerate(train_loader):
            inputs, target = data[0].to(device), data[1].to(device)        
            # 1:zero the grad, 2:forward pass, 3:calculate loss,  and 4:backprop!
            opt.zero_grad()
            preds = model(inputs.float()) #prediction over the input data

            # compute loss and gradients
            loss = loss_fn(preds, torch.unsqueeze(target.float(), dim=1))    #mean loss for this batch

            loss.backward() #calculate nabla_w
            loss_history.append(loss.item())
            opt.step()  #update W
            #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit

            running_loss += loss.item()
            count += 1

        train_mse = np.round(running_loss/count, 3)
        return train_mse



    #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit
    def evaluate_model(epoch, model, loss_fn, opt, data_loader, tag = "Test"):
        overall_loss = 0.0
        count = 0
        for i,data in enumerate(data_loader):
            inputs, target = data[0].to(device), data[1].to(device)                
            preds = model(inputs.float())      

            loss = loss_fn(preds, torch.unsqueeze(target.float(), dim=1))           # compute loss value

            overall_loss += (loss.item())  # compute total loss to save to logs
            count += 1

        # compute mean loss
        valid_mse = np.round(overall_loss/count, 3)
        # print(f"{tag} MSE loss: {valid_mse:.3f}")
        return valid_mse
        


    for epoch in range(epochs):
        # print(f"Epoch {epoch+1}")
        train_mse = train_epoch(epoch, model, loss_fn, optimizer, trainloader_boston)
        valid_mse = evaluate_model(epoch, model, loss_fn, optimizer, validloader_boston, tag = "Validation")
        print(f"Epoch {epoch+1}: Train MSE: {train_mse}\t Validation MSE: {valid_mse}")
    print("-"*50)
    test_mse = evaluate_model(epoch, model, loss_fn, optimizer, testloader_boston, tag="Test")
    
    return arch_string, train_mse, valid_mse, test_mse
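
The scaling step in the cell above fits the scaler on the training split only and reuses the same constants for the validation and test splits. The NumPy sketch below illustrates that idea on synthetic data; the arrays are made up, and the sketch only mirrors what standardization does (subtract the training mean, divide by the training standard deviation) rather than reproducing scikit-learn's code.

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=10.0, scale=3.0, size=(100, 2))
test = rng.normal(loc=10.0, scale=3.0, size=(20, 2))

mean = train.mean(axis=0)  # "fit": statistics come from the training split only
std = train.std(axis=0)    # population standard deviation (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # "transform": reuse the training statistics

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
print(test_scaled.mean(axis=0).round(3), test_scaled.std(axis=0).round(3))
```

The training split ends up with zero mean and unit variance by construction, while the test split is only approximately standardized. That is expected: computing statistics on the test split would leak information from it into preprocessing.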

In [21]:
#
# NOTE: Run this cell as many times as you like to achieve a low MSE value
# Experiment with different arguments to the function
#

import pandas as pd
torch.manual_seed(0)
#==================================================#
#    Modify START   #
#==================================================#
'''
(hidden_layer_neurons) - A list of the number of neurons in the hidden layers in order. DEFAULT: [32, 16, 8] => 1st hidden layer: 32 neurons, 2nd: 16, 3rd: 8
(opt) - The optimizer function to use: SGD, Adam, etc.,  DEFAULT: optim.SGD
(epochs) - The total number of epochs to train your model for,  DEFAULT: 5
(learning_rate) - The learning rate to take the gradient descent step with
'''
learning_rate = 1e-3
hidden_layer_neurons = [32, 16, 8]
opt = optim.SGD  # optim.SGD, optim.Adam, etc.
epochs = 5

#==================================================#
#    Modify END #
#==================================================#

arch_string, train_mse, valid_mse, test_mse = run_boston_seq_model(
    hidden_layer_neurons,
    opt,
    epochs,
    learning_rate
)
    
try:
    bostonLog
except NameError:  # first run of this cell: create the log table
    bostonLog = pd.DataFrame(
        columns=[
            "Architecture string",
            "Optimizer",
            "Epochs",
            "Train MSE",
            "Valid MSE",
            "Test MSE",
        ]
    )

bostonLog.loc[len(bostonLog)] = [
    arch_string, 
    f"{opt}", 
    f"{epochs}", 
    f"{train_mse}",
    f"{valid_mse}", 
    f"{test_mse}",
]

bostonLog 

--------------------------------------------------
Model summary:
Sequential(
  (0): Linear(in_features=13, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=16, bias=True)
  (3): ReLU()
  (4): Linear(in_features=16, out_features=8, bias=True)
  (5): ReLU()
  (6): Linear(in_features=8, out_features=1, bias=True)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                [-1, 1, 32]             448
              ReLU-2                [-1, 1, 32]               0
            Linear-3                [-1, 1, 16]             528
              ReLU-4                [-1, 1, 16]               0
            Linear-5                 [-1, 1, 8]             136
              ReLU-6                 [-1, 1, 8]               0
            Linear-7                 [-1, 1, 1]               9
Total params: 1,121
Trainable params: 1,121
Non-trainable params: 0
----

| | Architecture string | Optimizer | Epochs | Train MSE | Valid MSE | Test MSE |
|---|---|---|---|---|---|---|
| 0 | 13-32-16-8-1 | `<class 'torch.optim.sgd.SGD'>` | 5 | 567.431 | 523.641 | 489.015 |
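
The parameter counts in the model summary above follow from the layer sizes: each Linear layer has one weight per input-output pair plus one bias per output. A small sketch (the helper name `linear_param_count` is hypothetical) reproduces them for the 13-32-16-8-1 architecture:

```python
def linear_param_count(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

layer_sizes = [13, 32, 16, 8, 1]  # the 13-32-16-8-1 architecture
counts = [linear_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))  # [448, 528, 136, 9] 1121
```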


# Perform Classification on Iris dataset via Sequential API

In [22]:
import torch
import torch.utils.data
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn import datasets

torch.manual_seed(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load data
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, shuffle = True)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.15, random_state=42, shuffle=True)

## Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_validation = scaler.transform(X_validation) #Transform validation set with the same constants
X_test = scaler.transform(X_test) #Transform test set with the same constants

# convert numpy arrays to tensors
X_train_tensor = torch.from_numpy(X_train)
X_validation_tensor = torch.from_numpy(X_validation)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_validation_tensor = torch.from_numpy(y_validation)
y_test_tensor = torch.from_numpy(y_test)

# create TensorDataset in PyTorch
iris_train = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
iris_validation = torch.utils.data.TensorDataset(X_validation_tensor, y_validation_tensor)
iris_test = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)

# print(X_test.shape)

# create dataloader
# DataLoader is implemented in PyTorch, which will return an iterator to iterate training data by batch.
train_batch_size = 96
valid_test_batch_size = 16
trainloader_iris = torch.utils.data.DataLoader(iris_train, batch_size=train_batch_size, shuffle=True, num_workers=2)
validloader_iris = torch.utils.data.DataLoader(iris_validation, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)
testloader_iris = torch.utils.data.DataLoader(iris_test, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)

#
# Method to create, define and run a deep neural network model
#
def run_iris_model(
    hidden_layer_neurons=[32, 16, 8],
    opt=optim.SGD,
    epochs=5,
    learning_rate=1e-3
):
    D_in = X_test.shape[1]  # Input layer neurons depend on the input dataset shape
    D_out = 3  # Output layer neurons - depend on what you're trying to predict, here, 3 classes
    
    str_neurons = [str(h) for h in hidden_layer_neurons]
    arch_string = f"{D_in}-{'-'.join(str_neurons)}-{D_out}"
    
    layers = [
        torch.nn.Linear(D_in, hidden_layer_neurons[0]),  # X.matmul(W1)
        nn.ReLU(),  # ReLU( X.matmul(W1))
    ]
    
    # Add hidden layers
    for i in range(1, len(hidden_layer_neurons)):
        prev, curr = hidden_layer_neurons[i - 1], hidden_layer_neurons[i]
        layers.append(torch.nn.Linear(prev, curr))
        layers.append(nn.ReLU())
        
    
    # Add final layer
    layers.append(nn.Linear(hidden_layer_neurons[-1], D_out)) # Relu( X.matmul(W1)).matmul(W2))
    
    # Use the nn package to define our model and loss function.
    # use the sequential API makes things simple
    model = torch.nn.Sequential(*layers)

    model.to(device)

    # Use cross-entropy loss with the chosen optimizer (SGD by default).
    loss_fn = nn.CrossEntropyLoss()  # for classification
    optimizer = opt(model.parameters(), lr=learning_rate)

    #summary(model, (4, 20))
    print('-'*50)
    print('Model:')
    print(model)
    print('-'*50)
    
    '''
    Training process:
        Load a batch of data.
        Zero the gradients.
        Run the batch through the network (forward pass).
        Compute the loss from the predicted and true values.
        Backpropagate to get the gradient with respect to the parameters.
        Step the optimizer to update the parameters.
    '''

    loss_history = []
    acc_history = []
    def train_epoch(epoch, model, loss_fn, opt, train_loader):
        running_loss = 0.0
        count = 0
        y_pred = []
        epoch_target = []
        # dataset API gives us pythonic batching 
        for batch_id, data in enumerate(train_loader):
            inputs, target = data[0].to(device), data[1].to(device)        
            # 1:zero the grad, 2:forward pass, 3:calculate loss,  and 4:backprop!
            opt.zero_grad()
            preds = model(inputs.float()) #prediction over the input data

            # compute loss and gradients
            loss = loss_fn(preds, target)    #mean loss for this batch

            loss.backward() #calculate nabla_w
            loss_history.append(loss.item())
            opt.step()  #update W
            y_pred.extend(torch.argmax(preds, dim=1).tolist())
            epoch_target.extend(target.tolist())
            #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit

            running_loss += loss.item()
            count += 1

        loss = np.round(running_loss/count, 3)
        
        #accuracy
        correct = (np.array(y_pred) == np.array(epoch_target))
        accuracy = correct.sum() / correct.size
        accuracy = np.round(accuracy, 3)
        return loss, accuracy



    #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit
    def evaluate_model(epoch, model, loss_fn, opt, data_loader, tag = "Test"):
        overall_loss = 0.0
        count = 0
        y_pred = []
        epoch_target = []
        for i,data in enumerate(data_loader):
            inputs, target = data[0].to(device), data[1].to(device)                
            preds = model(inputs.float())      

            loss = loss_fn(preds, target)           # compute loss value

            overall_loss += (loss.item())  # compute total loss to save to logs
            y_pred.extend(torch.argmax(preds, dim=1).tolist())
            epoch_target.extend(target.tolist())
            count += 1

        # compute mean loss
        loss = np.round(overall_loss/count, 3)
        #accuracy
        correct = (np.array(y_pred) == np.array(epoch_target))
        accuracy = correct.sum() / correct.size
        accuracy = np.round(accuracy, 3)
        return loss, accuracy
        


    for epoch in range(epochs):
        # print(f"Epoch {epoch+1}")
        train_loss, train_accuracy = train_epoch(epoch, model, loss_fn, optimizer, trainloader_iris)
        valid_loss, valid_accuracy = evaluate_model(epoch, model, loss_fn, optimizer, validloader_iris, tag = "Validation")
        print(f"Epoch {epoch+1}: Train Accuracy: {train_accuracy}\t Validation Accuracy: {valid_accuracy}")
    print("-"*50)
    test_loss, test_accuracy = evaluate_model(epoch, model, loss_fn, optimizer, testloader_iris, tag="Test")
    
    return arch_string, train_accuracy, valid_accuracy, test_accuracy
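
The model above outputs raw scores (logits) for the three classes; `nn.CrossEntropyLoss` applies a log-softmax to them internally, and accuracy is computed from the arg-max. The NumPy sketch below mirrors both computations on made-up logits. The function names and data are illustrative assumptions, not PyTorch's implementation.

```python
import numpy as np

def cross_entropy_from_logits(logits, targets):
    """Mean negative log-likelihood of integer class targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def accuracy(logits, targets):
    """Fraction of rows whose largest logit is at the target class."""
    return (logits.argmax(axis=1) == targets).mean()

targets = np.array([0, 1, 2, 1])
uninformed = np.zeros((4, 3))          # equal scores for all three classes
print(cross_entropy_from_logits(uninformed, targets))  # ln(3), about 1.0986

confident = np.eye(3)[targets] * 10.0  # large score on the correct class
print(cross_entropy_from_logits(confident, targets), accuracy(confident, targets))
```

A loss near ln(3) together with an accuracy near one third indicates that a three-class model is doing no better than guessing.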

In [24]:
#
# NOTE: Run this cell as many times as you like to achieve a higher train/test accuracy
# Experiment with different arguments to the function
#

import pandas as pd
torch.manual_seed(0)
#==================================================#
#    Modify START   #
#==================================================#
'''
(hidden_layer_neurons) - A list of the number of neurons in the hidden layers in order. DEFAULT: [32, 16, 8] => 1st hidden layer: 32 neurons, 2nd: 16, 3rd: 8
(opt) - The optimizer function to use: SGD, Adam, etc.,  DEFAULT: optim.SGD
(epochs) - The total number of epochs to train your model for,  DEFAULT: 5
(learning_rate) - The learning rate to take the gradient descent step with
'''

learning_rate = 1e-3
hidden_layer_neurons = [32, 16, 8]
opt = optim.SGD  # optim.SGD, optim.Adam, etc.
epochs = 5

#==================================================#
#    Modify END #
#==================================================#

arch_string, train_accuracy, valid_accuracy, test_accuracy = run_iris_model(
    hidden_layer_neurons,
    opt,
    epochs,
    learning_rate
)
    

try: irisLog  # reuse the log if an earlier run of this cell created it
except NameError: irisLog = pd.DataFrame(
    columns=[
        "Architecture string", 
        "Optimizer", 
        "Epochs", 
        "Train accuracy",
        "Validation accuracy",
        "Test accuracy",
    ]
)

irisLog.loc[len(irisLog)] = [
    arch_string, 
    f"{opt}", 
    f"{epochs}", 
    f"{train_accuracy * 100}%",
    f"{valid_accuracy * 100}%",
    f"{test_accuracy * 100}%",
]

irisLog

--------------------------------------------------
Model:
Sequential(
  (0): Linear(in_features=4, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=16, bias=True)
  (3): ReLU()
  (4): Linear(in_features=16, out_features=8, bias=True)
  (5): ReLU()
  (6): Linear(in_features=8, out_features=3, bias=True)
)
--------------------------------------------------
Epoch 1: Train Accuracy: 0.346	 Validation Accuracy: 0.35
Epoch 2: Train Accuracy: 0.346	 Validation Accuracy: 0.35
Epoch 3: Train Accuracy: 0.346	 Validation Accuracy: 0.35
Epoch 4: Train Accuracy: 0.346	 Validation Accuracy: 0.35
Epoch 5: Train Accuracy: 0.346	 Validation Accuracy: 0.35
--------------------------------------------------


| | Architecture string | Optimizer | Epochs | Train accuracy | Validation accuracy | Test accuracy |
|---|---|---|---|---|---|---|
| 0 | 4-32-16-8-3 | `<class 'torch.optim.sgd.SGD'>` | 5 | 34.599999999999994% | 35.0% | 26.1% |
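
The train accuracy in the log above prints as `34.599999999999994%` because the value is multiplied by 100 before formatting, which exposes binary floating-point rounding. A small sketch of an alternative, assuming the same rounded accuracy value, uses Python's `%` format specifier:

```python
train_accuracy = 0.346  # value taken from the log output above

# Multiplying first exposes the binary rounding error in the product.
print(f"{train_accuracy * 100}%")

# The % format spec scales by 100 and rounds to the requested precision.
print(f"{train_accuracy:.1%}")
```

The second form could be dropped into the logging cell in place of `f"{train_accuracy * 100}%"` if cleaner table entries are wanted.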


# Boston house price regression via OOP API

In [25]:
from torchsummary import summary  # install if necessary: !pip install torchsummary
import torch
#import torchvision
import torch.utils.data
#import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_absolute_error, mean_squared_error
import warnings
warnings.filterwarnings('ignore')


# Set random seed for better reproducibility
torch.manual_seed(0)

# if GPU, use cuda-enabled GPU device else CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load data
boston = load_boston()

X = boston['data']
y = boston['target']

in_features = X.shape[1]
# X.shape

# train validation test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, shuffle=True)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.15, random_state=42, shuffle=True)


## Scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_validation = scaler.transform(X_validation) #Transform validation set with the same constants
X_test = scaler.transform(X_test) #Transform test set with the same constants

# convert numpy arrays to tensors
X_train_tensor = torch.from_numpy(X_train)
X_validation_tensor = torch.from_numpy(X_validation)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_test_tensor = torch.from_numpy(y_test)
y_validation_tensor = torch.from_numpy(y_validation)

# create TensorDataset in PyTorch
boston_train = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
boston_validation = torch.utils.data.TensorDataset(X_validation_tensor, y_validation_tensor)
boston_test = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)
# create dataloader


train_batch_size = 96
valid_test_batch_size = 16

trainloader_boston = torch.utils.data.DataLoader(boston_train, batch_size=train_batch_size, shuffle=True, num_workers=2)
validloader_boston = torch.utils.data.DataLoader(boston_validation, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)
testloader_boston = torch.utils.data.DataLoader(boston_test, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)



#
# Method to create, define and run a deep neural network model
#
def run_boston_oop_model(
    hidden_layer_neurons=[32, 16, 8],
    opt=optim.SGD,
    epochs=5,
    learning_rate=1e-3
):
    
    D_in = X_test.shape[1]  # Input layer neurons depend on the input dataset shape
    D_out = 1  # Output layer neurons - depend on what you're trying to predict, here, just a single value
    
    str_neurons = [str(h) for h in hidden_layer_neurons]
    arch_string = f"{D_in}-{'-'.join(str_neurons)}-{D_out}"

    # Use the OOP API to define a deep neural network model
    #
    class BaseModel(nn.Module):
        """Custom module for a simple  regressor"""
        def __init__(self, in_features, hidden_neurons=[16, 8, 4], n_output=1):
            super(BaseModel, self).__init__()
            self.fc1 = torch.nn.Linear(in_features, hidden_neurons[0])   # 1st hidden layer
            
            # All other intermediate hidden layers
            self.intermediate_layers = torch.nn.ModuleList()
            for i in range(1, len(hidden_neurons)):
                prev, curr = hidden_neurons[i - 1], hidden_neurons[i]
                self.intermediate_layers.append(torch.nn.Linear(prev, curr))
            # print(self.intermediate_layers)
             
            self.fc_output = torch.nn.Linear(hidden_neurons[-1], n_output) # output layer

        def forward(self, x):
            # print(self.intermediate_layers)
            x = F.relu(self.fc1(x))   # activation function for 1st hidden layer
            
            # The intermediate layers
            for i in range(len(self.intermediate_layers)):
                x = F.relu(self.intermediate_layers[i](x))
                
            x = self.fc_output(x)  # Output layer without activation
            return x

    # Print device used - CPU or GPU
    print(f"Using {device}...")

    # create the regression model
    model = BaseModel(in_features=D_in, hidden_neurons=hidden_layer_neurons, n_output=D_out)
    model.to(device) # put on GPU before setting up the optimizer

    # Here, summary will not reflect the actual number of layers as we have a list of intermediate_layers as opposed to a specific layer like self.fc1
    print('-'*50)
    print('Model:')
    print(model)
    summary(model, (1, D_in))  # D_in is 13 for the Boston data
    print('-'*50)

    loss_fn = torch.nn.MSELoss(reduction='mean')
    optimizer = opt(model.parameters(), lr=learning_rate)

    loss_history = []
    acc_history = []

    '''
    Training process:
        Load a batch of data.
        Zero the gradients.
        Forward pass: run the batch through the network to get predictions.
        Compute the loss from the predicted and true values.
        Backprop: compute the gradient of the loss with respect to the parameters.
        Optimizer step: update the parameters using the gradients.
    '''

    def train_epoch(epoch, model, loss_fn, opt, train_loader):
        running_loss = 0.0
        count = 0
        # dataset API gives us pythonic batching 
        for batch_id, data in enumerate(train_loader):
            inputs, target = data[0].to(device), data[1].to(device)        
            # 1:zero the grad, 2:forward pass, 3:calculate loss,  and 4:backprop!
            opt.zero_grad()
            preds = model(inputs.float()) #prediction over the input data

            # compute loss and gradients
            loss = loss_fn(preds, torch.unsqueeze(target.float(), dim=1))    #mean loss for this batch

            loss.backward() #calculate nabla_w
            loss_history.append(loss.item())
            opt.step()  #update W
            #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit

            running_loss += loss.item()
            count += 1

        train_mse = np.round(running_loss/count, 3)
        return train_mse



    #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit
    def evaluate_model(epoch, model, loss_fn, opt, data_loader, tag = "Test"):
        overall_loss = 0.0
        count = 0
        for i,data in enumerate(data_loader):
            inputs, target = data[0].to(device), data[1].to(device)                
            preds = model(inputs.float())      

            loss = loss_fn(preds, torch.unsqueeze(target.float(), dim=1))           # compute loss value

            overall_loss += (loss.item())  # compute total loss to save to logs
            count += 1

        # compute mean loss
        valid_mse = np.round(overall_loss/count, 3)
        # print(f"{tag} MSE loss: {valid_mse:.3f}")
        return valid_mse
        


    for epoch in range(epochs):
        # print(f"Epoch {epoch+1}")
        train_mse = train_epoch(epoch, model, loss_fn, optimizer, trainloader_boston)
        valid_mse = evaluate_model(epoch, model, loss_fn, optimizer, validloader_boston, tag = "Validation")
        print(f"Epoch {epoch+1}: Train MSE: {train_mse}\t Validation MSE: {valid_mse}")
    print("-"*50)
    test_mse = evaluate_model(epoch, model, loss_fn, optimizer, testloader_boston, tag="Test")
    
    return arch_string, train_mse, valid_mse, test_mse
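
The training and evaluation helpers above call `torch.unsqueeze(target.float(), dim=1)` before computing the loss. The sketch below, using made-up numbers rather than the Boston data, illustrates why: when the prediction has shape `(N, 1)` and the target has shape `(N,)`, `MSELoss` broadcasts the two to `(N, N)` and compares every prediction with every target.

```python
import warnings

import torch

loss_fn = torch.nn.MSELoss(reduction="mean")

preds = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), like the model output
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,), like a batch of y values

# Matching shapes: each prediction is compared with its own target.
aligned = loss_fn(preds, torch.unsqueeze(target, dim=1))

# Mismatched shapes: (3, 1) and (3,) broadcast to (3, 3), so each
# prediction is compared with every target. PyTorch warns about this.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    broadcast = loss_fn(preds, target)

print(aligned.item(), broadcast.item())
```

Here the predictions match the targets exactly, so the aligned loss is zero, while the broadcast version reports a nonzero loss from the off-diagonal pairs.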

In [27]:
#
# NOTE: Run this cell as many times as you like to achieve a low MSE value
# Experiment with different arguments to the function
#

import pandas as pd
torch.manual_seed(0)
#==================================================#
#    Modify START   #
#==================================================#
'''
(hidden_layer_neurons) - A list of the number of neurons in the hidden layers in order. DEFAULT: [32, 16, 8] => 1st hidden layer: 32 neurons, 2nd: 16, 3rd: 8
(opt) - The optimizer function to use: SGD, Adam, etc.,  DEFAULT: optim.SGD
(epochs) - The total number of epochs to train your model for,  DEFAULT: 5
(learning_rate) - The learning rate to take the gradient descent step with
'''

learning_rate = 1e-3
hidden_layer_neurons = [32, 16, 8]
opt = optim.SGD  # optim.SGD, optim.Adam, etc.
epochs = 5

#==================================================#
#    Modify END #
#==================================================#

arch_string, train_mse, valid_mse, test_mse = run_boston_oop_model(
    hidden_layer_neurons,
    opt,
    epochs,
    learning_rate
)
    
try: bostonOopLog  # reuse the log if an earlier run of this cell created it
except NameError: bostonOopLog  = pd.DataFrame(
    columns=[
        "Architecture string", 
        "Optimizer", 
        "Epochs", 
        "Train MSE",
        "Validation MSE",
        "Test MSE",
    ]
)

bostonOopLog.loc[len(bostonOopLog )] = [
    arch_string, 
    f"{opt}", 
    f"{epochs}", 
    f"{train_mse}",
    f"{valid_mse}",
    f"{test_mse}",
]

bostonOopLog 

Using cpu...
--------------------------------------------------
Model:
BaseModel(
  (fc1): Linear(in_features=13, out_features=32, bias=True)
  (intermediate_layers): ModuleList(
    (0): Linear(in_features=32, out_features=16, bias=True)
    (1): Linear(in_features=16, out_features=8, bias=True)
  )
  (fc_output): Linear(in_features=8, out_features=1, bias=True)
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                [-1, 1, 32]             448
            Linear-2                [-1, 1, 16]             528
            Linear-3                 [-1, 1, 8]             136
            Linear-4                 [-1, 1, 1]               9
Total params: 1,121
Trainable params: 1,121
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00

| | Architecture string | Optimizer | Epochs | Train MSE | Validation MSE | Test MSE |
|---|---|---|---|---|---|---|
| 0 | 13-32-16-8-1 | `<class 'torch.optim.sgd.SGD'>` | 5 | 567.431 | 523.641 | 489.015 |
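
The parameter counts in the torchsummary output above can be checked by hand: each `Linear` layer holds one weight per input-output pair plus one bias per output. A short sketch of that arithmetic, where `count_dense_params` is a hypothetical helper name:

```python
def count_dense_params(layer_sizes):
    """Parameters per fully connected layer: (inputs + 1 bias) * outputs."""
    return [
        (n_in + 1) * n_out
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


arch_string = "13-32-16-8-1"  # architecture string from the log above
sizes = [int(s) for s in arch_string.split("-")]
per_layer = count_dense_params(sizes)
print(per_layer, sum(per_layer))  # [448, 528, 136, 9] 1121
```

The per-layer values and the total of 1,121 agree with the `Param #` column and the `Total params` line reported by `summary`.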


# Perform Classification on HCDR dataset via Sequential API

## Note:
Have a look at the Home Credit Default Risk data on Kaggle: https://www.kaggle.com/competitions/home-credit-default-risk/overview <br>

There are multiple .csv files provided for this challenge on Kaggle. For this question, we'll use only **application_train.csv** to predict TARGET : 1 or 0. <br>

**DATASET LINK:** Download the .zip of the .csv file from here: https://www.dropbox.com/s/rmqsrhhc8qhyq50/application_train.csv.zip?dl=0

Save the **application_train.csv.zip** in the same directory you're running this notebook in. Please ensure that the .zip is downloaded and then run the next cells.

In [28]:
!apt install unzip

Reading package lists... Done
Building dependency tree       
Reading state information... Done
Suggested packages:
  zip
The following NEW packages will be installed:
  unzip
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 172 kB of archives.
After this operation, 559 kB of additional disk space will be used.
Err:1 http://deb.debian.org/debian stretch/main amd64 unzip amd64 6.0-21+deb9u2
  404  Not Found
E: Failed to fetch http://deb.debian.org/debian/pool/main/u/unzip/unzip_6.0-21+deb9u2_amd64.deb  404  Not Found
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?


In [29]:
!unzip -o application_train.csv.zip

/bin/sh: 1: unzip: not found
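
The `unzip` command was not available in the run above. As a possible alternative (a minimal sketch, assuming the archive sits in the working directory as described earlier), Python's standard `zipfile` module can extract it. `extract_archive` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of zip_path into dest_dir and return the member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


archive_path = Path("application_train.csv.zip")
if archive_path.exists():  # skip quietly when the download is missing
    print(extract_archive(archive_path))
```

This avoids depending on system packages, which is convenient inside containers where `apt` mirrors may be out of date.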


In [33]:
import torch
import torch.utils.data
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer


torch.manual_seed(0)
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load data
hcdr_application = pd.read_csv("application_train.csv")
X = hcdr_application.drop('TARGET', axis = 1)
y = hcdr_application.TARGET
print("Shapes:", X.shape, y.shape)

# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42, shuffle = True)
X_train, X_validation, y_train, y_validation = train_test_split(X_train, y_train, test_size=0.15, random_state=42, shuffle=True)

## Scaling
numerical_features = X.select_dtypes(include = ['int64','float64']).columns
numerical_features = numerical_features.tolist()

num_pipeline = Pipeline([
        ('std', StandardScaler()),
        ('imputer', SimpleImputer(strategy='mean'))
    ])

categorical_features = X.select_dtypes(include = ['object']).columns
categorical_features = categorical_features.tolist()


cat_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy='most_frequent')),
        ('ohe', OneHotEncoder(sparse=False, handle_unknown="ignore"))  # scikit-learn >= 1.2 names this argument sparse_output
    ])

features = numerical_features + categorical_features

data_pipeline = ColumnTransformer([
       ("num_pipeline", num_pipeline, numerical_features),
       ("cat_pipeline", cat_pipeline, categorical_features)],
        remainder='drop',
        n_jobs=-1
    )


X_train = data_pipeline.fit_transform(X_train)
X_validation = data_pipeline.transform(X_validation) #Transform validation set with the same constants
X_test = data_pipeline.transform(X_test) #Transform test set with the same constants


y_train = y_train.to_numpy()
y_validation = y_validation.to_numpy()
y_test = y_test.to_numpy()

# convert numpy arrays to tensors
X_train_tensor = torch.from_numpy(X_train)
X_valid_tensor = torch.from_numpy(X_validation)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_valid_tensor = torch.from_numpy(y_validation)
y_test_tensor = torch.from_numpy(y_test)

# create TensorDataset in PyTorch
hcdr_train = torch.utils.data.TensorDataset(X_train_tensor, y_train_tensor)
hcdr_valid = torch.utils.data.TensorDataset(X_valid_tensor, y_valid_tensor)
hcdr_test = torch.utils.data.TensorDataset(X_test_tensor, y_test_tensor)

# print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
# create dataloader
# PyTorch's DataLoader returns an iterator that yields the data in batches.
train_batch_size = 96
valid_test_batch_size = 64
trainloader_hcdr = torch.utils.data.DataLoader(hcdr_train, batch_size=train_batch_size, shuffle=True, num_workers=2)
validloader_hcdr = torch.utils.data.DataLoader(hcdr_valid, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)
testloader_hcdr = torch.utils.data.DataLoader(hcdr_test, batch_size=valid_test_batch_size, shuffle=True, num_workers=2)

#
# Method to create, define and run a deep neural network model
#
def run_hcdr_model(
    hidden_layer_neurons=[32, 16, 8],
    opt=optim.SGD,
    epochs=5,
    learning_rate=1e-3
):
    
    D_in = X_test.shape[1]  # Input layer neurons depend on the input dataset shape
    D_out = 2  # Output layer neurons - depend on what you're trying to predict, here, 2 classes: 0 and 1
    
    str_neurons = [str(h) for h in hidden_layer_neurons]
    arch_string = f"{D_in}-{'-'.join(str_neurons)}-{D_out}"
    
    layers = [
        torch.nn.Linear(D_in, hidden_layer_neurons[0]),  # X.matmul(W1)
        nn.ReLU(),  # ReLU( X.matmul(W1))
    ]
    
    # Add hidden layers
    for i in range(1, len(hidden_layer_neurons)):
        prev, curr = hidden_layer_neurons[i - 1], hidden_layer_neurons[i]
        layers.append(torch.nn.Linear(prev, curr))
        layers.append(nn.ReLU())
        
    
    # Add final layer
    layers.append(nn.Linear(hidden_layer_neurons[-1], D_out)) # ReLU(X.matmul(W1)).matmul(W2)
    
    # Use the nn package to define our model and loss function.
    # Using the Sequential API keeps things simple.
    model = torch.nn.Sequential(*layers)

    model.to(device)

    # Use cross-entropy loss and the chosen optimizer (SGD by default).
    loss_fn = nn.CrossEntropyLoss()  # for classification
    optimizer = opt(model.parameters(), lr=learning_rate)

    #summary(model, (4, 20))
    print('-'*50)
    print('Model:')
    print(model)
    print('-'*50)
    
    '''
    Training process:
        Load a batch of data.
        Zero the gradients.
        Forward pass: run the batch through the network to get predictions.
        Compute the loss from the predicted and true values.
        Backprop: compute the gradient of the loss with respect to the parameters.
        Optimizer step: update the parameters using the gradients.
    '''

    loss_history = []
    acc_history = []
    def train_epoch(epoch, model, loss_fn, opt, train_loader):
        running_loss = 0.0
        count = 0
        y_pred = []
        epoch_target = []
        # dataset API gives us pythonic batching 
        for batch_id, data in enumerate(train_loader):
            inputs, target = data[0].to(device), data[1].to(device)        
            # 1:zero the grad, 2:forward pass, 3:calculate loss,  and 4:backprop!
            opt.zero_grad()
            preds = model(inputs.float()) #prediction over the input data

            # compute loss and gradients
            loss = loss_fn(preds, target)    #mean loss for this batch

            loss.backward() #calculate nabla_w
            loss_history.append(loss.item())
            opt.step()  #update W
            y_pred.extend(torch.argmax(preds, dim=1).tolist())
            epoch_target.extend(target.tolist())
            #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit

            running_loss += loss.item()
            count += 1

        loss = np.round(running_loss/count, 3)
        
        #accuracy
        correct = (np.array(y_pred) == np.array(epoch_target))
        accuracy = correct.sum() / correct.size
        accuracy = np.round(accuracy, 3)
        return loss, accuracy



    #from IPython.core.debugger import Pdb as pdb;    pdb().set_trace() #breakpoint; dont forget to quit
    def evaluate_model(epoch, model, loss_fn, opt, data_loader, tag = "Test"):
        overall_loss = 0.0
        count = 0
        y_pred = []
        epoch_target = []
        for i,data in enumerate(data_loader):
            inputs, target = data[0].to(device), data[1].to(device)                
            preds = model(inputs.float())      

            loss = loss_fn(preds, target)           # compute loss value

            overall_loss += (loss.item())  # compute total loss to save to logs
            y_pred.extend(torch.argmax(preds, dim=1).tolist())
            epoch_target.extend(target.tolist())
            count += 1

        # compute mean loss
        loss = np.round(overall_loss/count, 3)
        #accuracy
        correct = (np.array(y_pred) == np.array(epoch_target))
        accuracy = correct.sum() / correct.size
        accuracy = np.round(accuracy, 3)
        return loss, accuracy
        


    for epoch in range(epochs):
        # print(f"Epoch {epoch+1}")
        train_loss, train_accuracy = train_epoch(epoch, model, loss_fn, optimizer, trainloader_hcdr)
        valid_loss, valid_accuracy = evaluate_model(epoch, model, loss_fn, optimizer, validloader_hcdr, tag = "Validation")
        print(f"Epoch {epoch+1}: Train Accuracy: {train_accuracy}\t Validation Accuracy: {valid_accuracy}")
    print("-"*50)
    test_loss, test_accuracy = evaluate_model(epoch, model, loss_fn, optimizer, testloader_hcdr, tag="Test")
    
    return arch_string, train_accuracy, valid_accuracy, test_accuracy

Shapes: (307511, 121) (307511,)
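
The raw table has 121 feature columns, yet the first `Linear` layer reported further down has `in_features=245`. The difference comes from one-hot encoding, which turns each categorical column into one column per category. The sketch below shows the same kind of `ColumnTransformer` on a made-up frame (column names and values are invented, not HCDR data). It imputes before scaling, which is a common ordering but differs from the cell above, and it uses `sparse_threshold=0` to force a dense result instead of the encoder's `sparse` argument.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data with missing values in both numeric and categorical columns.
toy = pd.DataFrame({
    "income": [10.0, 20.0, np.nan, 40.0],
    "age": [30.0, np.nan, 50.0, 60.0],
    "contract": ["cash", "revolving", np.nan, "cash"],
})
num_cols, cat_cols = ["income", "age"], ["contract"]

num_pipe = Pipeline([("imputer", SimpleImputer(strategy="mean")),
                     ("std", StandardScaler())])
cat_pipe = Pipeline([("imputer", SimpleImputer(strategy="most_frequent")),
                     ("ohe", OneHotEncoder(handle_unknown="ignore"))])

toy_pipeline = ColumnTransformer(
    [("num", num_pipe, num_cols), ("cat", cat_pipe, cat_cols)],
    remainder="drop",
    sparse_threshold=0,  # always return a dense array
)

toy_out = toy_pipeline.fit_transform(toy)
print(toy_out.shape)  # 2 numeric columns + 2 one-hot columns
```

Three input columns become four output columns here because `contract` has two distinct categories after imputation.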


**NOTE:** 
**The following cell may take longer to run because of the larger dataset and the model parameters you choose.**

In [34]:
#
# NOTE: Run this cell as many times as you like to achieve a higher train/test accuracy
# Experiment with different arguments to the function
#

import pandas as pd
torch.manual_seed(0)
#==================================================#
#    Modify START   #
#==================================================#
'''
(hidden_layer_neurons) - A list of the number of neurons in the hidden layers in order. DEFAULT: [32, 16, 8] => 1st hidden layer: 32 neurons, 2nd: 16, 3rd: 8
(opt) - The optimizer function to use: SGD, Adam, etc.,  DEFAULT: optim.SGD
(epochs) - The total number of epochs to train your model for,  DEFAULT: 5
(learning_rate) - The learning rate to take the gradient descent step with
'''

learning_rate = 1e-3
hidden_layer_neurons = [32, 16, 8]
opt = optim.SGD  # optim.SGD, optim.Adam, etc.
epochs = 5

#==================================================#
#    Modify END #
#==================================================#

arch_string, train_accuracy, valid_accuracy, test_accuracy = run_hcdr_model(
    hidden_layer_neurons,
    opt,
    epochs,
    learning_rate
)
    

try: hcdrLog  # reuse the log if an earlier run of this cell created it
except NameError: hcdrLog = pd.DataFrame(
    columns=[
        "Architecture string", 
        "Optimizer", 
        "Epochs", 
        "Train accuracy",
        "Valid accuracy",
        "Test accuracy",
    ]
)

hcdrLog.loc[len(hcdrLog)] = [
    arch_string, 
    f"{opt}", 
    f"{epochs}", 
    f"{train_accuracy * 100}%",
    f"{valid_accuracy * 100}%",
    f"{test_accuracy * 100}%",
]

hcdrLog

--------------------------------------------------
Model:
Sequential(
  (0): Linear(in_features=245, out_features=32, bias=True)
  (1): ReLU()
  (2): Linear(in_features=32, out_features=16, bias=True)
  (3): ReLU()
  (4): Linear(in_features=16, out_features=8, bias=True)
  (5): ReLU()
  (6): Linear(in_features=8, out_features=2, bias=True)
)
--------------------------------------------------
Epoch 1: Train Accuracy: 0.822	 Validation Accuracy: 0.916
Epoch 2: Train Accuracy: 0.92	 Validation Accuracy: 0.916
Epoch 3: Train Accuracy: 0.92	 Validation Accuracy: 0.916
Epoch 4: Train Accuracy: 0.92	 Validation Accuracy: 0.916
Epoch 5: Train Accuracy: 0.92	 Validation Accuracy: 0.916
--------------------------------------------------


| | Architecture string | Optimizer | Epochs | Train accuracy | Valid accuracy | Test accuracy |
|---|---|---|---|---|---|---|
| 0 | 245-32-16-8-2 | `<class 'torch.optim.sgd.SGD'>` | 5 | 92.0% | 91.60000000000001% | 91.9% |
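
The validation accuracy above is identical in every epoch, which may indicate that the network is predicting a single class for all rows. One way to check is to compare against the accuracy of always predicting the most frequent label. The sketch below uses made-up labels, and `majority_baseline_accuracy` is a hypothetical helper name; applying it to `y_validation` would give the actual baseline for this split.

```python
import numpy as np


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return counts.max() / counts.sum()


toy_labels = np.array([0] * 92 + [1] * 8)  # made-up labels, not the HCDR data
print(majority_baseline_accuracy(toy_labels))  # 0.92
```

If the model's accuracy matches this baseline, accuracy alone is not showing that anything useful was learned, and a metric that accounts for class imbalance would be more informative.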
