In [None]:
import numpy as np
import torch
import torch.utils.data
import sklearn.datasets
from sklearn.metrics import accuracy_score

## Introduction

Like NumPy, PyTorch provides basic functions for creating tensors and performing common operations on them.

In [None]:
a = torch.ones(5)
b = torch.full_like(a, 5)

In [None]:
a

In [None]:
b

In [None]:
a + b

However, in contrast to NumPy, PyTorch can perform computations on GPU.
 - See the [CUDA semantics](https://pytorch.org/docs/stable/notes/cuda.html) documentation for details, including how to write device-agnostic code.
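
As a minimal sketch of what device-agnostic code can look like, the snippet below picks the GPU when one is available and falls back to the CPU otherwise. The variable names `device`, `x`, `y` and `z` are illustrative and not part of the original notebook.

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a tensor directly on the chosen device ...
x = torch.ones(5, device=device)
# ... or move an existing tensor there with .to().
y = torch.full((5,), 5.0).to(device)

# Both operands live on the same device, so the addition works on any machine.
z = x + y
print(z.device, z)
```

Because every tensor is created on or moved to `device`, the same code should run unchanged with or without a GPU.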

In [None]:
a.device

In [None]:
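# Move the tensor to the GPU (falls back to the CPU when CUDA is unavailable)
a = a.to('cuda' if torch.cuda.is_available() else 'cpu')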

In [None]:
a.device

In [None]:
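# Uncommenting the next line raises a RuntimeError when `a` is on the GPU and `b` is still on the CPU
# a + b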

In [None]:
# full_like creates the new tensor on the same device as `a`
b = torch.full_like(a, 5)

In [None]:
a + b

## Neural Networks

Because PyTorch supports automatic differentiation, building neural networks with it is straightforward.

All models implemented in PyTorch should subclass the [`torch.nn.Module` class](https://pytorch.org/docs/stable/nn.html?highlight=module#torch.nn.Module). The main method of this class (which is used by many other PyTorch classes) is `forward()`. It defines how your model runs and what outputs it produces for the given inputs.
In the constructor of your model (the `__init__` method) you should initialize all the layers you are going to use. PyTorch provides a large number of commonly used layers that are easy to use. Please refer to the [documentation of PyTorch](https://pytorch.org/docs/stable/nn.html) for a complete list of layers.

Below we are going to declare a simple neural network with two layers and a ReLU activation function between them.

In [None]:
class Net(torch.nn.Module):
    """A basic neural network model with one hidden layer"""
    def __init__(self, nb_features, hidden_size, nb_classes):
        """
        Initialize the model class

        :param nb_features: Number of input features
        :param hidden_size: The size of the hidden layer
        :param nb_classes: Number of classes for classification

        """

        super().__init__()

        self.fc1 = torch.nn.Linear(nb_features, hidden_size)
        self.fc1_activ = torch.nn.ReLU()

        self.fc_logits = torch.nn.Linear(hidden_size, nb_classes)

    def forward(self, inputs):
        """
        Perform the forward pass on the input data

        :param inputs: input data

        """
        z1 = self.fc1(inputs)
        z1_active = self.fc1_activ(z1)

        logits = self.fc_logits(z1_active)

        return logits

In [None]:
model = Net(nb_features=4, hidden_size=8, nb_classes=3)

In [None]:
model

Let's test the model on a random input. Notice how the size of the input data corresponds to the input size of the first layer, and the size of the output corresponds to the output size of the last layer.

In [None]:
inputs = torch.rand(1, 4)

In [None]:
inputs

In [None]:
outputs = model(inputs)

In [None]:
outputs
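
The same shape relationship holds for a whole batch of inputs. The sketch below uses a `torch.nn.Sequential` stand-in with the same layer sizes as `Net(nb_features=4, hidden_size=8, nb_classes=3)` so that it runs on its own. The names `net`, `batch`, `out` and `n_params` are illustrative.

```python
import torch

# Stand-in with the same layer sizes as Net(nb_features=4, hidden_size=8, nb_classes=3)
net = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 3),
)

# A batch of 5 samples with 4 features each
batch = torch.rand(5, 4)
out = net(batch)
print(out.shape)  # torch.Size([5, 3]): one row of 3 logits per sample

# Count trainable parameters: (4*8 + 8) + (8*3 + 3) = 67
n_params = sum(p.numel() for p in net.parameters())
print(n_params)  # 67
```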

### Loss calculation

PyTorch has quite a few pre-defined loss functions that we can use. The most common ones are listed below:
 - [Mean Squared Error loss](https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss)
 - [Cross Entropy loss](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss)
 - [Binary Cross Entropy loss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCELoss)
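
These losses expect differently shaped and typed targets, which is a common source of errors. The following sketch, with made-up values, shows one valid call for each of them. All tensor names are illustrative.

```python
import torch

# Raw model outputs for 2 samples and 3 classes (made-up values)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])

# MSELoss: prediction and target are float tensors of the same shape.
mse = torch.nn.MSELoss()(logits, torch.zeros_like(logits))

# CrossEntropyLoss: raw logits of shape (batch, classes) and
# integer class indices of shape (batch,). It applies log-softmax internally.
class_ids = torch.tensor([0, 2])
ce = torch.nn.CrossEntropyLoss()(logits, class_ids)

# BCELoss: probabilities in [0, 1] and float targets of the same shape.
probs = torch.sigmoid(torch.tensor([1.5, -0.3]))
bce = torch.nn.BCELoss()(probs, torch.tensor([1.0, 0.0]))

print(mse.item(), ce.item(), bce.item())
```

With the default `reduction='mean'`, each call returns a scalar tensor.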

In [None]:
targets = torch.rand_like(outputs)

In [None]:
targets

In [None]:
criterion = torch.nn.MSELoss()
loss = criterion(outputs, targets)

In [None]:
loss

### Gradients

Calling `loss.backward()` runs the backward pass through the network and stores the gradient of the loss with respect to each parameter in that parameter's `.grad` attribute.

In [None]:
model.zero_grad()

In [None]:
print('fc1.bias before backward')
print(model.fc1.bias.grad)

In [None]:
loss.backward()

In [None]:
print('fc1.bias after backward')
print(model.fc1.bias.grad)
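
To see the same mechanism on a function whose derivative is easy to check by hand, here is a small sketch using a single tensor `w` (an illustrative name). It also shows that gradients accumulate across `backward()` calls, which is why they are cleared before each training step.

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# No backward pass has run yet, so there is no gradient.
print(w.grad)  # None

# loss = sum(w^2), so d(loss)/dw = 2 * w
loss = (w ** 2).sum()
loss.backward()
g1 = w.grad.clone()
print(g1)  # values 2, 4, 6

# A second backward pass adds to the stored gradient instead of replacing it.
loss2 = (w ** 2).sum()
loss2.backward()
g2 = w.grad.clone()
print(g2)  # values 4, 8, 12

# Reset the accumulated gradient in place.
w.grad.zero_()
```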

### Parameters update

Alongside the loss functions, PyTorch provides several different optimizers, ranging from the classical [Stochastic Gradient Descent](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD) to [RMSprop](https://pytorch.org/docs/stable/optim.html#torch.optim.RMSprop) and [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam).

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
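
To make concrete what an optimizer's `step()` does, the sketch below compares `torch.optim.SGD` (without momentum or weight decay) against the plain update rule `p = p - lr * grad` applied by hand. SGD is used here only because its rule is the simplest, and Adam applies a more elaborate update. The layers, data and learning rate are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Two layers with identical starting weights
layer_a = torch.nn.Linear(4, 3)
layer_b = torch.nn.Linear(4, 3)
layer_b.load_state_dict(layer_a.state_dict())

x = torch.rand(2, 4)
target = torch.rand(2, 3)
lr = 0.1

# (a) Let the optimizer update the parameters.
opt = torch.optim.SGD(layer_a.parameters(), lr=lr)
opt.zero_grad()
torch.nn.functional.mse_loss(layer_a(x), target).backward()
opt.step()

# (b) Apply the gradient-descent rule by hand.
torch.nn.functional.mse_loss(layer_b(x), target).backward()
with torch.no_grad():
    for p in layer_b.parameters():
        p -= lr * p.grad

# Both layers should end up with the same parameters.
same = all(torch.allclose(pa, pb)
           for pa, pb in zip(layer_a.parameters(), layer_b.parameters()))
print(same)
```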

In general, a training loop consists of the following parts:
1. Clearing the gradients
2. Obtaining inputs and targets, and, possibly, moving them to the GPU
3. Performing the forward pass of the model
4. Calculating the loss
5. Performing the backward pass
6. Updating the weights of the network

In [None]:
optimizer.zero_grad()

inputs = torch.rand(1, 4)
targets = torch.rand(1, 3)

outputs = model(inputs)

loss = criterion(outputs, targets)

loss.backward()
optimizer.step()

## Data loading

In [None]:
iris_data = sklearn.datasets.load_iris()

In [None]:
iris_data.feature_names

In [None]:
iris_data.data[:10,:]

In [None]:
iris_data.target_names

In [None]:
iris_data.target

The [Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.Dataset) class provided by PyTorch is an abstract class representing any dataset used as input to a model. Subclasses only need to override the `__len__` and `__getitem__` methods. Given an index, `__getitem__` returns the corresponding sample (here, the input features and the target).

You might find it useful to have a look at the official [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) on the PyTorch website.

In [None]:
class IrisDataset(torch.utils.data.Dataset):
    """A PyTorch dataset for the Scikit-learn Iris data"""
    def __init__(self, data):
        """
        Initialize the dataset class

        :param data: Scikit-learn Iris data

        """
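        self.feature_names = data.feature_names
        self.target_names = data.target_names
        # float32 features match the default dtype of the model weights
        self.X = data.data.astype(np.float32)
        # CrossEntropyLoss expects integer class indices of type int64
        self.y = data.target.astype(np.int64)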

    def __getitem__(self, index):
        """
        Return the item by its index

        :param index: index of the item

        """
        X = self.X[index]
        y = self.y[index]

        return X, y

    def __len__(self):
        """ Return the length of the dataset """
        return len(self.y)

In [None]:
dataset = IrisDataset(iris_data)

In [None]:
len(dataset)

In [None]:
dataset[0]

[DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader) is another useful PyTorch class. It combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset. The goal of a data loader is to create batches of training examples for the network by sampling the dataset and combining the sampled items into batches.

In [None]:
dataloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)

In [None]:
len(dataloader)
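
The sketch below illustrates how the batch size determines the number of batches and their shapes. So that it runs on its own, it uses a synthetic `TensorDataset` with the same layout as Iris (150 samples, 4 features, 3 classes) instead of the real data. The names `toy_dataset`, `loader`, `n_keep` and `n_drop` are illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in with the Iris layout: 150 samples, 4 features, 3 classes
features = torch.rand(150, 4)
labels = torch.randint(0, 3, (150,))
toy_dataset = TensorDataset(features, labels)

loader = DataLoader(toy_dataset, batch_size=10, shuffle=True)
print(len(loader))  # 15 batches, since 150 / 10 = 15

# Each iteration yields one batch of inputs and the matching targets.
X_batch, y_batch = next(iter(loader))
print(X_batch.shape, y_batch.shape)  # torch.Size([10, 4]) torch.Size([10])

# When the batch size does not divide the dataset size, the last batch is
# smaller unless drop_last=True discards it.
n_keep = len(DataLoader(toy_dataset, batch_size=32))                  # 5
n_drop = len(DataLoader(toy_dataset, batch_size=32, drop_last=True))  # 4
print(n_keep, n_drop)
```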

## Training loop

In [None]:
nb_features = dataset.X.shape[1]
hidden_size = 32
nb_classes = len(set(dataset.y))

model = Net(nb_features, hidden_size, nb_classes)
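# Move the model to the GPU (falls back to the CPU when CUDA is unavailable)
model = model.to('cuda' if torch.cuda.is_available() else 'cpu')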

In [None]:
model

In [None]:
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [None]:
nb_epochs = 9

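# Use whichever device the model's parameters live on
device = next(model.parameters()).device

for i in range(nb_epochs):
    # Make sure the model is in training mode at the start of each epoch
    model.train()
    epoch_losses = []
    for X_batch, y_batch in dataloader:
        # 1. Clear the gradients left over from the previous step
        optimizer.zero_grad()

        # 2. Move the batch to the same device as the model
        X_batch = X_batch.to(device)
        y_batch = y_batch.to(device)

        # 3-4. Forward pass and loss calculation
        logits = model(X_batch)
        loss = criterion(logits, y_batch)

        # 5-6. Backward pass and parameter update
        loss.backward()
        optimizer.step()

        epoch_losses.append(loss.item())

    epoch_loss = np.mean(epoch_losses)
    print(f'Epoch: {i+1}, loss: {epoch_loss:.3f}')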