# Experimenting/exploring Neural Networks and Transfer Learning with PyTorch

In this notebook we experiment with neural networks using PyTorch and start exploring transfer learning. The following articles were used as references to build the networks:

* https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
* https://www.pluralsight.com/guides/introduction-to-resnet

# Mounting Google Drive

To make it easy to read data from Google Drive, it can be mounted as a "local" drive. This encapsulates the connection to the Google API and reduces the amount of code needed to interact with it.

In [1]:
from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)

Mounted at /content/gdrive


In [2]:
import os
import matplotlib.pyplot as plt
import torch
from torchvision import datasets, transforms
from torch import nn

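As in the previous notebook, we load the images as tensors and define an augmented version of the dataset. The augmented version is defined here but left out of the final dataset (the line that concatenates it is commented out).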

In [3]:
base_folder = "/content/gdrive/MyDrive/08 CS670 Artificial Intelligence/Term Project"
data_folder = os.path.join(base_folder, "transformed_images")
transform = transforms.Compose([
    transforms.Resize(255),
    transforms.ToTensor()]
)

dataset = datasets.ImageFolder(data_folder, transform=transform)

In [4]:
transform = transforms.Compose([
    transforms.RandomRotation(90),
    # Two consecutive horizontal flips at p=0.5 are equivalent to a single one,
    # so the duplicated flip is dropped
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor()])
dataset_augmented = datasets.ImageFolder(data_folder, transform=transform)

In [5]:
# final_dataset = torch.utils.data.ConcatDataset([dataset, dataset_augmented])
final_dataset = torch.utils.data.ConcatDataset([dataset])
# Seeded generator; it must be passed to random_split to make the split reproducible
generator1 = torch.Generator().manual_seed(42)
train_dataset, test_dataset = torch.utils.data.random_split(
    final_dataset, [0.75, 0.25], generator=generator1
)
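
A seeded `torch.Generator` only influences `random_split` when it is handed over through the `generator` argument. The following is a minimal sketch on a toy dataset; `toy_dataset` and `split_indices` are hypothetical names used for illustration only, and fractional lengths assume a PyTorch version that supports them (as the notebook's own call does).

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the image dataset: 20 samples holding the values 0..19
toy_dataset = TensorDataset(torch.arange(20))

def split_indices(seed):
    """Return the (train, test) index lists of a 75/25 split for a given seed."""
    generator = torch.Generator().manual_seed(seed)
    train, test = random_split(toy_dataset, [0.75, 0.25], generator=generator)
    return list(train.indices), list(test.indices)

first = split_indices(42)
second = split_indices(42)

# The same seed should give the same split on every call
print(first == second)
print(len(first[0]), len(first[1]))
```

Without the `generator` argument, `random_split` falls back to the global random state, so the split may differ between runs.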

In [6]:
batch_size = 128
train_dataloader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, drop_last=True)
image, label = next(iter(train_dataloader))

In [7]:
test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True, drop_last=True)

# Build Neural Network

To experiment with neural networks, we build a simple SoftMax regression with only two layers:

* A layer that flattens each 3D image tensor (channels x height x width) into a vector
* A linear layer with two outputs (one for each class label)

The softmax itself is not a layer of the network. The model outputs raw scores (logits) for the two classes, `nn.CrossEntropyLoss` applies the softmax internally during training, and at prediction time we pick the label with the highest score.
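
As a rough illustration of the two points above, the sketch below computes the number of input features of the linear layer for 3 x 255 x 255 images and turns a pair of scores into class probabilities and labels. The score values are made up for the example and do not come from the model.

```python
import torch

# Each image is a (channels, height, width) tensor; flattening yields one long vector
channels, height, width = 3, 255, 255
in_features = channels * height * width
print(in_features)

# Made-up scores for two samples and two classes
logits = torch.tensor([[2.0, 0.5],
                       [-1.0, 3.0]])

# Softmax turns each row of scores into probabilities that sum to 1
probs = torch.softmax(logits, dim=1)

# The predicted label is the index of the highest probability; softmax preserves
# the ordering, so argmax on the raw scores gives the same labels
preds = probs.argmax(dim=1)
print(probs)
print(preds)
```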

In [8]:
class SoftMaxRegression(nn.Module):
    def __init__(self):
        super().__init__()
        # define layers
        self.flatten = nn.Flatten()
        # 3 channels x 255 x 255 pixels = 195075 input features, 2 output classes
        self.linear = nn.Linear(195075, 2)

    def forward(self, x):
        # Now it only takes a call to the layer to make predictions
        y = self.flatten(x)
        y = self.linear(y)
        return y

Here we check whether a GPU is available, instantiate our model, pick our loss function, and define our optimizer. We reuse an example from the Deep Learning class without worrying about "the best choice", as this is just an experiment.

**Note**: in Google Colab, do not forget to change the runtime type to one with a GPU.

In [9]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

# let's instantiate a model
model = SoftMaxRegression().to(device)

# define loss function
loss_fn = nn.CrossEntropyLoss()
# loss_fn = nn.NLLLoss()

# define optimizer
#optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer = torch.optim.Adam(model.parameters())

cuda
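
The cell above keeps `nn.NLLLoss` as a commented-out alternative. As a minimal sketch of how the two losses relate, the example below checks on random scores that `nn.CrossEntropyLoss` on raw scores matches `nn.NLLLoss` on log-softmax outputs. The tensors are made up for illustration; switching the notebook to `nn.NLLLoss` would therefore presumably also require a log-softmax at the end of the model.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up raw scores for 4 samples and 2 classes, with made-up labels
logits = torch.randn(4, 2)
targets = torch.tensor([0, 1, 1, 0])

# CrossEntropyLoss expects raw scores and applies log-softmax internally
ce = nn.CrossEntropyLoss()(logits, targets)

# NLLLoss expects log-probabilities, so log-softmax is applied by hand
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(ce.item(), nll.item())
```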


Here we initialize the weights of our model.

In [10]:
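def init_weights(m):
    # Draw the weights of every linear layer from a normal distribution with std 0.01;
    # biases keep their default initialization
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

model.apply(init_weights);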
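
To see what this initialization does, the sketch below applies the same function to a small stand-in layer and inspects the result. The `nn.Linear(1000, 2)` layer is a hypothetical, much smaller substitute for the model's linear layer.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical small layer standing in for the model's linear layer
layer = nn.Linear(1000, 2)
bias_before = layer.bias.detach().clone()

def init_weights(m):
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)

# Module.apply calls the function on every submodule and on the module itself
layer.apply(init_weights)

# The empirical standard deviation of the weights should be close to 0.01
print(layer.weight.std().item())
# The bias is not touched by init_weights
print(torch.equal(layer.bias.detach(), bias_before))
```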

And here we check how our model is working for one batch of our training set

In [11]:
for x_batch, y_batch in train_dataloader:

  print(x_batch.shape)
  print(y_batch.shape)
  print("Features:\n\t", x_batch[:1], "\nLabels:\n\t", y_batch[:1])

  # nn.Flatten is a module: instantiate it first, then call it on the batch
  print(nn.Flatten()(x_batch).shape)

  x_batch = x_batch.to(device)
  y_batch = y_batch.to(device)

  y_hat = model(x_batch)
  print(y_hat[1])

  ll = loss_fn(y_hat, y_batch )
#   print(ll)

  break   # break after first pair, we just want to observe

torch.Size([128, 3, 255, 255])
torch.Size([128])
Features:
	 tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]],

         [[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          ...,
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]]) 
Labels:
	 tensor([1])

Here we build the function that performs one training step (a single parameter update on one batch) and create it for our model.

In [12]:
def make_train_step(model, loss_fn, optimizer):
    # Builds function that performs a step in the train loop
    def train_step(x, y):
        # Sets model to TRAIN mode
        model.train()
        # Makes predictions
        yhat = model(x)
        # Computes loss
        loss = loss_fn(yhat, y)
        # Computes gradients
        loss.backward()
        # Updates parameters and zeroes gradients
        optimizer.step()
        optimizer.zero_grad()
        # Returns the loss
        return loss.item()

    # Returns the function that will be called inside the train loop
    return train_step

# Creates the train_step function for our model, loss function and optimizer
train_step = make_train_step(model, loss_fn, optimizer)
losses = []
n_epochs = 10
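
As a quick sanity check of this pattern, the sketch below runs the same kind of train step on a tiny synthetic problem and confirms that the loss goes down. The toy data, `toy_model`, and the SGD optimizer are assumptions made for the example and are unrelated to the image dataset.

```python
import torch
from torch import nn

def make_train_step(model, loss_fn, optimizer):
    # Same structure as the notebook's helper: forward, backward, update, reset
    def train_step(x, y):
        model.train()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()
    return train_step

torch.manual_seed(0)

# Synthetic two-class problem: the label depends only on the sign of the first feature
x = torch.randn(64, 4)
y = (x[:, 0] > 0).long()

toy_model = nn.Linear(4, 2)
toy_step = make_train_step(
    toy_model,
    nn.CrossEntropyLoss(),
    torch.optim.SGD(toy_model.parameters(), lr=0.1),
)

# Repeat the step on the same batch and record the loss each time
toy_losses = [toy_step(x, y) for _ in range(50)]
print(toy_losses[0], toy_losses[-1])
```

Calling `optimizer.zero_grad()` right after `optimizer.step()` leaves the gradients cleared for the next call, which has the same effect as clearing them at the start of each step.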

Finally, we train our model for 5 epochs (full passes over the training set).

In [14]:
import numpy as np

model.apply(init_weights)   # re-initialize the weights before training
n_epochs = 5
losses = []
test_losses = []
train_step = make_train_step(model, loss_fn, optimizer)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

for epoch in range(n_epochs):
    print(epoch + 1, "/", n_epochs)
    for x_batch, y_batch in train_dataloader:
        x_batch = x_batch.to(device)
        y_batch = y_batch.to(device)

        loss = train_step(x_batch, y_batch)
        losses.append(loss)

    # `losses` is never reset, so this is the mean over all batches seen so far,
    # not the mean of the current epoch alone
    print("Train loss:", np.mean(losses))

    # Evaluation mode; train_step switches back to train mode on the next batch
    model.eval()
    # torch.no_grad() disables gradient tracking, which is not needed for evaluation
    with torch.no_grad():
        for x_test, y_test in test_dataloader:
            x_test = x_test.to(device)
            y_test = y_test.to(device)

            yhat = model(x_test)
            test_loss = loss_fn(yhat, y_test)
            test_losses.append(test_loss.item())

        # Likewise a running mean over the test batches of all epochs so far
        print("Test loss:", np.mean(test_losses))
#print(model.state_dict())

cuda
1 / 5
Train loss: 8.616334774277426
Test loss: 4.210230523889715
2 / 5
Train loss: 5.395738652258208
Test loss: 2.724452265284278
3 / 5
Train loss: 3.932867935510597
Test loss: 2.0165023948207046
4 / 5
Train loss: 3.089810702272437
Test loss: 1.6321038888259367
5 / 5
Train loss: 2.542619027152206
Test loss: 1.3790089810436421
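
Because the training loop appends to `losses` and `test_losses` without resetting them, the printed values are running means over all epochs so far. The sketch below contrasts a running mean with a per-epoch mean; the loss values are made up for illustration and are not taken from the run above.

```python
from statistics import mean

# Made-up per-batch losses for two epochs (illustrative values only)
batch_losses = [[8.0, 6.0], [4.0, 2.0]]

seen = []            # every batch loss observed so far, never reset
running_means = []   # what a mean over the never-reset list reports
epoch_means = []     # mean over the batches of the current epoch only

for epoch_losses in batch_losses:
    seen.extend(epoch_losses)
    running_means.append(mean(seen))
    epoch_means.append(mean(epoch_losses))

print(running_means)  # [7.0, 5.0]
print(epoch_means)    # [7.0, 3.0]
```

A running mean lags behind the per-epoch value while the loss is decreasing, which is worth keeping in mind when reading the numbers printed above.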


Let's check the accuracy of the trained model

In [15]:
def accuracy(net, test_dataloader):
    # Fraction of correctly classified samples over the whole dataloader
    n_samples = 0
    n_correct = 0
    net.eval()
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for X, y in test_dataloader:
            X = X.to(device)
            y = y.to(device)

            # Predicted class = index of the largest output score
            preds = net(X).argmax(dim=1)

            n_samples += y.shape[0]
            n_correct += (y == preds).sum()

    return n_correct / n_samples

accuracy(model, test_dataloader)

tensor(0.9325, device='cuda:0')

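Even without worrying about "the right choice" or performance, our model achieved an accuracy of 0.9325 (93.25%) on the test set. The reported test loss was always below the reported train loss. Both values are running averages over all epochs, and the train loss is accumulated while the weights are still being updated, so this comparison suggests, but does not by itself establish, that the model generalizes well.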

# Transfer Learning

We use PyTorch to load one of the several pretrained models available and check its architecture.

In [16]:
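import torchvision
# `weights=` replaces the deprecated `pretrained=True` in recent torchvision versions;
# IMAGENET1K_V1 corresponds to the checkpoint downloaded below
net = torchvision.models.resnet101(weights=torchvision.models.ResNet101_Weights.IMAGENET1K_V1)
net = net.cuda() if torch.cuda.is_available() else net
net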

Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /root/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100%|██████████| 171M/171M [00:01<00:00, 139MB/s]


ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 