<a href="https://colab.research.google.com/github/yuvi-s64/YuviN-DataScience-GenAI-Submissions/blob/main/6_02_DNN_101.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![](https://drive.google.com/uc?export=view&id=1xqQczl0FG-qtNA2_WQYuWePW9oU8irqJ)

# 6.02 Dense Neural Network (with PyTorch)
This will expand on our logistic regression example and take us through building our first neural network. If you haven't already, be sure to check (and, if necessary, switch to) GPU processing by clicking Runtime > Change runtime type and selecting GPU. We can test that this has worked with the following code:

In [1]:
import torch

# Check for GPU availability
print("Num GPUs Available: ", torch.cuda.device_count())

Num GPUs Available:  1


Hopefully your code shows you have 1 GPU available! Next let's get some data. We'll start with another in-built dataset:

In [2]:
# load an in-built (OK, semi-in-built) dataset from scikit-learn
from sklearn.datasets import load_diabetes

import pandas as pd
import numpy as np

# import the data
data = load_diabetes()
data

{'data': array([[ 0.03807591,  0.05068012,  0.06169621, ..., -0.00259226,
          0.01990749, -0.01764613],
        [-0.00188202, -0.04464164, -0.05147406, ..., -0.03949338,
         -0.06833155, -0.09220405],
        [ 0.08529891,  0.05068012,  0.04445121, ..., -0.00259226,
          0.00286131, -0.02593034],
        ...,
        [ 0.04170844,  0.05068012, -0.01590626, ..., -0.01107952,
         -0.04688253,  0.01549073],
        [-0.04547248, -0.04464164,  0.03906215, ...,  0.02655962,
          0.04452873, -0.02593034],
        [-0.04547248, -0.04464164, -0.0730303 , ..., -0.03949338,
         -0.00422151,  0.00306441]]),
 'target': array([151.,  75., 141., 206., 135.,  97., 138.,  63., 110., 310., 101.,
         69., 179., 185., 118., 171., 166., 144.,  97., 168.,  68.,  49.,
         68., 245., 184., 202., 137.,  85., 131., 283., 129.,  59., 341.,
         87.,  65., 102., 265., 276., 252.,  90., 100.,  55.,  61.,  92.,
        259.,  53., 190., 142.,  75., 142., 155., 225.,  59

We are working on a regression problem, with "structured" data which has already been cleaned and normalised. We can skip the usual cleaning/engineering steps. However, we do need to get the data into PyTorch:

In [3]:
# Convert data to PyTorch tensors
X = torch.tensor(data.data, dtype=torch.float32)
y = torch.tensor(data.target, dtype=torch.float32).reshape(-1, 1) # Reshape y to be a column vector
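
As a quick sanity check on the conversion, the sketch below uses small made-up arrays (`features` and `target` are hypothetical stand-ins for `data.data` and `data.target`) to show the shapes and dtypes we expect. The reshape matters because the model will output one column per row; if the targets stayed one-dimensional, the loss function would broadcast the two shapes against each other and PyTorch would warn about the mismatch.

```python
import numpy as np
import torch

# Hypothetical stand-ins for data.data / data.target (3 rows, 2 features)
features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])  # float64 by default
target = np.array([151.0, 75.0, 141.0])

# Same conversion as above: float32 tensors, targets as a column vector
X_demo = torch.tensor(features, dtype=torch.float32)
y_demo = torch.tensor(target, dtype=torch.float32).reshape(-1, 1)

print(X_demo.shape, X_demo.dtype)  # torch.Size([3, 2]) torch.float32
print(y_demo.shape, y_demo.dtype)  # torch.Size([3, 1]) torch.float32
```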

Now that our data is stored in tensors, we can do train/test splitting as before (in fact, we can use sklearn as before):

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

torch.Size([353, 10]) torch.Size([353, 1])
torch.Size([89, 10]) torch.Size([89, 1])


Now we can set up our batches for training. We have 353 training rows, so batches of 50 give 8 batches per epoch (seven full batches plus a final batch of 3). We'll also separate the features and labels:

In [5]:
from torch.utils.data import TensorDataset, DataLoader

# Create TensorDatasets and DataLoaders
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=50, shuffle=True)

test_dataset = TensorDataset(X_test, y_test)
test_loader = DataLoader(test_dataset, batch_size=50, shuffle=False)
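
The batch count follows directly from the training set size. A small arithmetic sketch, using the 353 training rows printed earlier and the batch size of 50 passed to `DataLoader`:

```python
import math

n_train = 353      # rows in X_train, from the shapes printed earlier
batch_size = 50    # as passed to DataLoader

# DataLoader keeps the final partial batch by default, so we round up
n_batches = math.ceil(n_train / batch_size)
last_batch = n_train - (n_batches - 1) * batch_size

print(n_batches)   # 8
print(last_batch)  # 3
```

If the small final batch is unwanted, `DataLoader` accepts `drop_last=True`, which would leave 7 batches of 50 per epoch.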

Now it's time to build our model. We'll keep it simple: a model with an input layer of 10 features, followed by two _Dense_ (fully connected) layers, each with 5 neurons and ReLU activation. Our output layer will be size=1, given this is a regression problem and we want a single output value per prediction.

This will be easier to understand if you have read through the logistic regression tutorial.

In [6]:
import torch
import torch.nn as nn

# Define the model
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        # we'll set up the layers as a sequence using nn.Sequential
        self.layers = nn.Sequential(

            # first layer will be a linear layer that has 5x neurons
            # (5x sets of linear regression)
            # the layer takes the 10 features as input (i.e. 10, 5)
            nn.Linear(10, 5),

            nn.ReLU(), # ReLU activation

            # second linear layer again has 5 neurons
            # this time taking the input as the output of the last layer
            # (which had 5x neurons)
            nn.Linear(5, 5),

            nn.ReLU(), # ReLU again

            # last linear layer takes the output from the previous 5 neurons
            # this time it's a single output with no activation
            # i.e. these are the predictions (regression)
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x) # pass the data through the layers
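
To get a feel for how small this network is, we can count its trainable parameters. The sketch below rebuilds the same layer stack on its own (the name `layers` here is just for illustration) so it runs without the class above:

```python
import torch.nn as nn

# Same layer stack as DiabetesModel, rebuilt so the snippet runs on its own
layers = nn.Sequential(
    nn.Linear(10, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 1),
)

# Each nn.Linear(in, out) holds in*out weights plus out biases
for name, param in layers.named_parameters():
    print(name, tuple(param.shape))

n_params = sum(p.numel() for p in layers.parameters())
print(n_params)  # (10*5 + 5) + (5*5 + 5) + (5*1 + 1) = 91
```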

As before we need to create a model object, specify the loss (criterion) and an optimiser (which we cover next week):

In [7]:
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = DiabetesModel()
criterion = nn.MSELoss() # MSE loss function
optimiser = optim.Adam(model.parameters(), lr=0.001)
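
It can help to see what `nn.MSELoss` computes on a couple of hand-picked values. In this sketch the predictions and targets are made up; note that because the targets in this dataset are in the tens to hundreds, squared errors (and so the loss values) will look large.

```python
import torch
import torch.nn as nn

criterion_demo = nn.MSELoss()  # default reduction averages over all elements

# Two hypothetical predictions and their true values
y_hat = torch.tensor([[100.0], [200.0]])
y_true = torch.tensor([[110.0], [170.0]])

loss = criterion_demo(y_hat, y_true)
print(loss.item())  # ((-10)**2 + 30**2) / 2 = 500.0
```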

Now we can train the model. Again, the logistic regression tutorial (6.01) may help you understand this:

In [8]:
# Move the model to the GPU if available (once, before training starts)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Training loop (example - you'll likely want to add more epochs)
epochs = 100 # 100 epochs

for epoch in range(epochs):
  model.train() # put the model object in train mode
  # use the train_loader to pass the inputs (x) and targets (y)
  for inputs, targets in train_loader:
    # pass the batch to the GPU (hopefully)
    inputs, targets = inputs.to(device), targets.to(device)

    optimiser.zero_grad() # reset the gradients
    outputs = model(inputs) # create outputs
    loss = criterion(outputs, targets) # compare with y to get loss
    loss.backward() # backpropagate the loss (next week)
    optimiser.step() # update the parameters based on this batch

    # on every 10th epoch, print the loss for each batch
    # (this check sits inside the batch loop, so it prints 8 lines per epoch)
    if (epoch+1) % 10 == 0: # modular arithmetic
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

Epoch [10/100], Loss: 35465.5273
Epoch [10/100], Loss: 25143.334
Epoch [10/100], Loss: 30627.375
Epoch [10/100], Loss: 27042.3672
Epoch [10/100], Loss: 19174.418
Epoch [10/100], Loss: 34107.9648
Epoch [10/100], Loss: 36817.3828
Epoch [10/100], Loss: 27329.0547
Epoch [20/100], Loss: 30353.416
Epoch [20/100], Loss: 25781.8887
Epoch [20/100], Loss: 30997.4902
Epoch [20/100], Loss: 23386.625
Epoch [20/100], Loss: 27897.9863
Epoch [20/100], Loss: 37978.4688
Epoch [20/100], Loss: 29129.3945
Epoch [20/100], Loss: 71291.375
Epoch [30/100], Loss: 28124.5
Epoch [30/100], Loss: 31373.875
Epoch [30/100], Loss: 28487.0137
Epoch [30/100], Loss: 26272.9277
Epoch [30/100], Loss: 38156.8242
Epoch [30/100], Loss: 25043.9727
Epoch [30/100], Loss: 29609.5273
Epoch [30/100], Loss: 42630.3984
Epoch [40/100], Loss: 28806.9238
Epoch [40/100], Loss: 30599.252
Epoch [40/100], Loss: 26111.6543
Epoch [40/100], Loss: 29301.1074
Epoch [40/100], Loss: 32741.3398
Epoch [40/100], Loss: 34515.7656
Epoch [40/100], Loss:

In the output shown, the loss is still high and bouncing around from batch to batch, which suggests the model needs more training (100 epochs is not a lot in deep learning terms). Even so, let's evaluate as before:

In [9]:
# Evaluation (example)
model.eval() # testing mode
mse_values = [] # collect the MSE scores

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # predict the test data

        # Calculate Mean Squared Error
        mse = criterion(outputs, targets) # calculate MSE for the batch
        mse_values.append(mse.item()) # add to the list of MSE values

# Calculate and print the average MSE
avg_mse = np.mean(mse_values)
print(f"Average MSE on test set: {avg_mse}")

Average MSE on test set: 26004.1181640625


The test MSE is in line with the training losses above (no obvious sign of overfitting). However, we can probably get better results with tuning and more epochs.
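
One detail worth knowing: averaging the per-batch MSE values treats every batch equally, but the 89 test rows split into batches of 50 and 39. The difference is usually small, but a sample-weighted average gives the exact MSE over the whole test set. A sketch with made-up per-batch values (`batch_mse` is hypothetical, not the notebook's actual numbers):

```python
# Hypothetical per-batch MSE values and batch sizes (89 test rows -> 50 + 39)
batch_mse = [100.0, 200.0]
batch_sizes = [50, 39]

# Plain mean of the batch means (what np.mean(mse_values) computes)
mean_of_batches = sum(batch_mse) / len(batch_mse)

# Weight each batch by its number of rows to get the per-sample MSE
weighted = sum(m * n for m, n in zip(batch_mse, batch_sizes)) / sum(batch_sizes)

print(mean_of_batches)     # 150.0
print(round(weighted, 2))  # 143.82
```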

Let's run the loop again a little differently to collect the predicted values (y_hat) and actuals (y) and add them to a DataFrame for comparison:

In [10]:
# Evaluation
model.eval()
predictions = []
actuals = []

with torch.no_grad():
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs)
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Create DataFrame
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
results_df

    Predicted  Actual
0    0.622101   219.0
1    0.622101    70.0
2    0.622101   202.0
3    0.622101   230.0
4    0.622101   111.0
..        ...     ...
84   0.622101   153.0
85   0.622101    98.0
86   0.622101    37.0
87   0.622101    63.0


Side by side, they don't look great: every prediction is the same value (0.622101), nowhere near the actuals. Can you improve them?
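
Two quick checks can make this kind of failure easier to spot: whether the predictions have collapsed to a single value, and how the model's MSE compares with a naive baseline that always predicts the mean. The sketch below uses the first four rows of the table as illustrative values; in practice the baseline mean would normally come from the training targets.

```python
import numpy as np

# Illustrative values taken from the first four rows of the table
predicted = np.array([0.622101, 0.622101, 0.622101, 0.622101])
actual = np.array([219.0, 70.0, 202.0, 230.0])

# A standard deviation of (almost) zero means every prediction is the same
collapsed = np.isclose(predicted.std(), 0.0)

# Reference point: MSE from always predicting the mean of the actuals
baseline_mse = np.mean((actual - actual.mean()) ** 2)
model_mse = np.mean((actual - predicted) ** 2)

print(collapsed)
print(baseline_mse, model_mse)
```

A model whose MSE is worse than the mean baseline has not yet learned anything useful about the features.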

<br><br>

## EXERCISE #1
Try increasing the number of epochs to 1,000. (When the model is fairly well trained, the loss printed every 10 epochs will be fairly stable and not change much.) Does this give better results?
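
Judging stability is easier with one averaged loss per epoch than with the loss of whichever batch happened to come last. The following is a minimal sketch on hypothetical toy data (`X_toy`, `y_toy`, and the `_toy` objects are made up for illustration), not the notebook's own training run:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)  # make the toy run repeatable

# Hypothetical toy regression data: y = 3*x + noise
X_toy = torch.randn(100, 1)
y_toy = 3 * X_toy + 0.1 * torch.randn(100, 1)
loader = DataLoader(TensorDataset(X_toy, y_toy), batch_size=25, shuffle=True)

model_toy = nn.Linear(1, 1)
criterion_toy = nn.MSELoss()
optimiser_toy = torch.optim.Adam(model_toy.parameters(), lr=0.05)

history = []  # one averaged loss per epoch
for epoch in range(50):
    model_toy.train()
    running, seen = 0.0, 0
    for inputs, targets in loader:
        optimiser_toy.zero_grad()
        loss = criterion_toy(model_toy(inputs), targets)
        loss.backward()
        optimiser_toy.step()
        # weight each batch loss by its size so the epoch mean is per-sample
        running += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    history.append(running / seen)

print(history[0], history[-1])
```

Plotting or printing `history` every few epochs gives a smoother curve, which makes it clearer when training has levelled off.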

<br><br>

## EXERCISE #2 (optional)
Try experimenting with the architecture (number of neurons and/or number of layers). Can we reach an optimal architecture?

In [12]:
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
import pandas as pd

# Re-defining the model class for self-containment in this cell
class DiabetesModel(nn.Module):
    def __init__(self):
        super(DiabetesModel, self).__init__()
        self.layers = nn.Sequential(
            nn.Linear(10, 5),
            nn.ReLU(),
            nn.Linear(5, 5),
            nn.ReLU(),
            nn.Linear(5, 1)
        )

    def forward(self, x):
        return self.layers(x)

# Initialize the model, loss function, and optimizer
# Re-initialize to ensure a fresh training start
model = DiabetesModel()
criterion = nn.MSELoss()
optimiser = optim.Adam(model.parameters(), lr=0.001)

# Determine device for training (GPU if available, else CPU)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Set the new number of epochs as requested
epochs = 1000

print(f"\nStarting training with {epochs} epochs...")

# Training loop
for epoch in range(epochs):
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)

        model.train() # Put the model in training mode
        optimiser.zero_grad() # Reset the gradients
        outputs = model(inputs) # Forward pass
        loss = criterion(outputs, targets) # Calculate loss
        loss.backward() # Backpropagate the loss
        optimiser.step() # Update model parameters

    # Print loss every 100 epochs to track progress
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {round(loss.item(), 4)}')

print("\nTraining complete. Evaluating model...")

# Evaluation
model.eval() # Put the model in evaluation mode
new_mse_values = [] # Collect MSE scores for the new training run
predictions = []
actuals = []

with torch.no_grad(): # Disable gradient calculation for evaluation
    for inputs, targets in test_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = model(inputs) # Make predictions on the test data

        # Calculate Mean Squared Error for the batch
        mse = criterion(outputs, targets)
        new_mse_values.append(mse.item())

        # Store predictions and actuals for DataFrame
        predictions.extend(outputs.cpu().numpy())
        actuals.extend(targets.cpu().numpy())

# Calculate and print the average MSE on the test set
new_avg_mse = np.mean(new_mse_values)
print(f"Average MSE on test set after {epochs} epochs: {new_avg_mse}")

# Create DataFrame for predictions vs. actuals
results_df = pd.DataFrame({'Predicted': np.array(predictions).flatten(), 'Actual': np.array(actuals).flatten()})
print("\nPredictions vs. Actuals (first 5 rows):")
print(results_df.head())

# Comment on the results
previous_avg_mse = 26004.1181640625 # From the previous execution

print("\n--- Commentary on Results ---")
print(f"Previous Average MSE (100 epochs): {previous_avg_mse}")
print(f"New Average MSE ({epochs} epochs): {new_avg_mse}")


# Display the full results_df for inspection
print("\nFull Results DataFrame:")
print(results_df)


Starting training with 1000 epochs...
Epoch [100/1000], Loss: 28960.623
Epoch [200/1000], Loss: 1135.7103
Epoch [300/1000], Loss: 4325.0146
Epoch [400/1000], Loss: 3561.2021
Epoch [500/1000], Loss: 6078.603
Epoch [600/1000], Loss: 6641.3613
Epoch [700/1000], Loss: 3515.5886
Epoch [800/1000], Loss: 3610.2371
Epoch [900/1000], Loss: 1408.5449
Epoch [1000/1000], Loss: 1799.7952

Training complete. Evaluating model...
Average MSE on test set after 1000 epochs: 2871.0079345703125

Predictions vs. Actuals (first 5 rows):
    Predicted  Actual
0  149.570999   219.0
1  172.765121    70.0
2  143.600800   202.0
3  290.782532   230.0
4  131.129929   111.0

--- Commentary on Results ---
Previous Average MSE (100 epochs): 26004.1181640625
New Average MSE (1000 epochs): 2871.0079345703125

Full Results DataFrame:
     Predicted  Actual
0   149.570999   219.0
1   172.765121    70.0
2   143.600800   202.0
3   290.782532   230.0
4   131.129929   111.0
..         ...     ...
84  117.054016   153.0
85  

As we can see from the lower average MSE (about 2,871 versus about 26,004), training for 1,000 epochs led to much better results.
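
For Exercise #2, it can be convenient to build the network from a list of hidden layer sizes rather than editing the class each time. The helper below (`build_mlp` is a hypothetical name, not part of PyTorch or the original notebook) is one possible sketch:

```python
import torch
import torch.nn as nn

def build_mlp(n_inputs, hidden_sizes, n_outputs=1):
    """Stack Linear + ReLU pairs for each hidden size, then a linear output layer."""
    layers = []
    previous = n_inputs
    for size in hidden_sizes:
        layers.append(nn.Linear(previous, size))
        layers.append(nn.ReLU())
        previous = size
    layers.append(nn.Linear(previous, n_outputs))  # no activation: regression output
    return nn.Sequential(*layers)

# The tutorial's architecture, plus a wider and deeper hypothetical alternative
small = build_mlp(10, [5, 5])
wide = build_mlp(10, [32, 16, 8])

batch = torch.randn(4, 10)  # 4 hypothetical rows with 10 features
print(small(batch).shape)   # torch.Size([4, 1])
print(wide(batch).shape)    # torch.Size([4, 1])
```

Each candidate can then be trained with the same loop and compared on test MSE. With only 353 training rows, larger networks may overfit, so it is worth watching the gap between training and test loss.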