In [1]:
import numpy as np
import torch
import torch.utils.data
import sklearn.datasets
from sklearn.metrics import accuracy_score

## Introduction

Like NumPy, PyTorch provides basic functions for creating tensors and performing common operations on them.

In [2]:
a = torch.ones(5)
b = torch.full_like(a, 5)

In [3]:
a

tensor([1., 1., 1., 1., 1.])

In [4]:
b

tensor([5., 5., 5., 5., 5.])

In [5]:
a + b

tensor([6., 6., 6., 6., 6.])
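Since the tensor API mirrors NumPy closely, moving data between the two libraries is cheap. A minimal sketch of the round trip (CPU tensors only). Note that `torch.from_numpy` and `Tensor.numpy()` share memory with their source rather than copying it, so an in-place change on one side is visible on the other.

```python
import numpy as np
import torch

arr = np.arange(5, dtype=np.float32)   # [0., 1., 2., 3., 4.]

# NumPy -> PyTorch: the tensor shares memory with `arr`
t = torch.from_numpy(arr)

# The same element-wise operations exist in both libraries
t_sum = t + torch.full_like(t, 5)      # tensor([5., 6., 7., 8., 9.])
arr_sum = arr + np.full_like(arr, 5)   # array([5., 6., 7., 8., 9.])

# Because memory is shared, an in-place change to `arr` shows up in `t`
arr[0] = 100.0
print(t[0].item())  # 100.0

# PyTorch -> NumPy (CPU tensors only); again, no copy is made
back = t_sum.numpy()
print(np.allclose(back, arr_sum))  # True
```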

However, unlike NumPy, PyTorch can also run computations on a GPU.
 - See the [CUDA semantics](https://pytorch.org/docs/stable/notes/cuda.html) documentation for details, including how to write device-agnostic code
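A common device-agnostic pattern is to choose the device once and create or move every tensor there. A minimal sketch (the variable name `device` is our own choice); it falls back to the CPU when no GPU is present, so it runs on any machine:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create tensors directly on the chosen device ...
a = torch.ones(5, device=device)
# ... or move an existing tensor there with .to()
b = torch.full((5,), 5.0).to(device)

c = a + b                  # both operands live on the same device
print(c.device.type)       # 'cuda' or 'cpu', depending on the machine

# Bring results back to the CPU, e.g. before converting them to NumPy
c_cpu = c.cpu()
```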

In [6]:
a.device

device(type='cpu')

In [7]:
a = a.to('cuda')

In [8]:
a.device

device(type='cuda', index=0)

In [9]:
# a + b  # would raise a RuntimeError: `a` is now on the GPU while `b` is still on the CPU

In [9]:
b = torch.full_like(a, 5)

In [10]:
a + b

tensor([6., 6., 6., 6., 6.], device='cuda:0')

## Neural Networks

Since PyTorch supports automatic differentiation, building neural networks with it is straightforward: we only define the forward computation, and PyTorch computes the gradients for us.

All models implemented in PyTorch should subclass the [`torch.nn.Module` class](https://pytorch.org/docs/stable/nn.html?highlight=module#torch.nn.Module). The main method of this class (also implemented by PyTorch's built-in layers) is `forward()`: it defines how the model runs and what outputs it produces given its inputs.
In the constructor of your model (the `__init__` method), you should initialize all the layers you are going to use. PyTorch provides a large number of commonly used layers that are easy to use. Please refer to the [PyTorch documentation](https://pytorch.org/docs/stable/nn.html) for the complete list of layers.

Below we are going to declare a simple neural network with two layers and a ReLU activation function between them.

In [12]:
class Net(torch.nn.Module):
    def __init__(self, nb_features, hidden_size, nb_classes):
        super().__init__()

        self.fc1 = torch.nn.Linear(nb_features, hidden_size)
        self.fc1_activ = torch.nn.ReLU()

        self.fc_logits = torch.nn.Linear(hidden_size, nb_classes)
        
    def forward(self, inputs):
        z1 = self.fc1(inputs)
        z1_active = self.fc1_activ(z1)

        logits = self.fc_logits(z1_active)
        
        return logits

In [13]:
model = Net(nb_features=4, hidden_size=8, nb_classes=3)

In [14]:
model

Net(
  (fc1): Linear(in_features=4, out_features=8, bias=True)
  (fc1_activ): ReLU()
  (fc_logits): Linear(in_features=8, out_features=3, bias=True)
)
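Layers assigned as attributes in `__init__` are registered automatically, which is why they appear when `model` is printed and why `model.parameters()` (passed to the optimizer later) can find their weights. The sketch below repeats the class so it runs on its own, then lists and counts the parameters. Each `Linear(in, out)` layer holds an `out × in` weight matrix plus `out` biases, so the network has 4·8 + 8 = 40 and 8·3 + 3 = 27 parameters, 67 in total.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self, nb_features, hidden_size, nb_classes):
        super().__init__()
        self.fc1 = torch.nn.Linear(nb_features, hidden_size)
        self.fc1_activ = torch.nn.ReLU()
        self.fc_logits = torch.nn.Linear(hidden_size, nb_classes)

    def forward(self, inputs):
        return self.fc_logits(self.fc1_activ(self.fc1(inputs)))

model = Net(nb_features=4, hidden_size=8, nb_classes=3)

# Every layer stored as an attribute contributes its parameters
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# fc1.weight (8, 4)
# fc1.bias (8,)
# fc_logits.weight (3, 8)
# fc_logits.bias (3,)

nb_params = sum(p.numel() for p in model.parameters())
print(nb_params)  # 67
```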

Let's test the model on a random input. Notice how the size of the input data corresponds to the input size of the first layer (4 features) and the size of the output corresponds to the output size of the last layer (3 classes).

In [15]:
inputs = torch.rand(1, 4)

In [16]:
inputs

tensor([[0.7837, 0.6544, 0.1735, 0.5069]])

In [17]:
outputs = model(inputs)

In [18]:
outputs

tensor([[-0.0457, -0.1129, -0.1346]], grad_fn=<AddmmBackward>)
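The first dimension of `inputs` is the batch dimension, so the same model processes any number of samples at once: an `(N, 4)` input yields an `(N, 3)` output. Note that we call `model(inputs)` rather than `model.forward(inputs)`; calling the module runs `forward()` together with any registered hooks. For brevity, this sketch uses a `torch.nn.Sequential` stand-in with the same layers as `Net`:

```python
import torch

# Same layer sizes as Net(nb_features=4, hidden_size=8, nb_classes=3)
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 3),
)

batch = torch.rand(16, 4)        # 16 samples with 4 features each
outputs = model(batch)
print(outputs.shape)             # torch.Size([16, 3])

# unsqueeze(0) adds a batch dimension of size 1 to a single sample
single = torch.rand(4).unsqueeze(0)   # shape (1, 4)
print(model(single).shape)            # torch.Size([1, 3])
```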

### Loss calculation

PyTorch has quite a few predefined loss functions that we can use. The most common ones are listed below:
 - [Mean Squared Error loss](https://pytorch.org/docs/stable/nn.html#torch.nn.MSELoss)
 - [Cross Entropy loss](https://pytorch.org/docs/stable/nn.html#torch.nn.CrossEntropyLoss)
 - [Binary Cross Entropy loss](https://pytorch.org/docs/stable/nn.html#torch.nn.BCELoss)
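These losses expect different inputs. `CrossEntropyLoss` takes raw, unnormalized logits of shape `(N, C)` and integer class indices of shape `(N,)`, and applies `log_softmax` internally, so the model should not end with a softmax layer. `BCELoss`, by contrast, expects probabilities in [0, 1] (for example after a sigmoid). A small check of the cross-entropy equivalence on made-up logits:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])   # (N=2, C=3) raw scores
targets = torch.tensor([0, 2])              # class indices, dtype int64

ce = torch.nn.CrossEntropyLoss()(logits, targets)

# Same value by hand: mean negative log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(len(targets)), targets].mean()

print(torch.allclose(ce, manual))  # True
```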

In [19]:
targets = torch.rand_like(outputs)

In [20]:
targets

tensor([[0.3113, 0.1668, 0.1745]])

In [21]:
criterion = torch.nn.MSELoss()
loss = criterion(outputs, targets)

In [22]:
loss

tensor(0.1004, grad_fn=<MseLossBackward>)
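By default (`reduction='mean'`), `MSELoss` averages the squared differences over all elements. Plugging in the rounded values printed above reproduces the loss: the differences are roughly -0.357, -0.280 and -0.309, whose squares average to about 0.1004.

```python
import torch

# Values copied (rounded to 4 decimals) from the outputs printed above
outputs = torch.tensor([[-0.0457, -0.1129, -0.1346]])
targets = torch.tensor([[0.3113, 0.1668, 0.1745]])

manual = ((outputs - targets) ** 2).mean()
builtin = torch.nn.MSELoss()(outputs, targets)

print(f'{manual.item():.4f}')           # 0.1004
print(torch.allclose(manual, builtin))  # True
```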

### Gradients

Calling `loss.backward()` performs the backward pass of the network and stores the gradient of the loss with respect to each parameter in that parameter's `.grad` attribute.

In [23]:
model.zero_grad()

In [24]:
print('fc1.bias before backward')
print(model.fc1.bias.grad)

fc1.bias before backward
None


In [25]:
loss.backward()

In [26]:
print('fc1.bias after backward')
print(model.fc1.bias.grad)

fc1.bias after backward
tensor([ 0.0952, -0.0072, -0.0719, -0.0127, -0.0388,  0.0000,  0.0000,  0.0279])
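Two details are worth noting here. First, the exact zeros in `fc1.bias.grad` most likely correspond to hidden units whose pre-activation was negative for this input, since ReLU passes no gradient there. Second, gradients accumulate: each call to `backward()` adds to `.grad` instead of overwriting it, which is why the gradients are cleared before every backward pass. A minimal sketch of the accumulation on a single scalar parameter:

```python
import torch

w = torch.tensor(3.0, requires_grad=True)
x = torch.tensor(2.0)

# d(w * x)/dw = x = 2
(w * x).backward()
grad_after_one = w.grad.clone()
print(grad_after_one)   # tensor(2.)

# A second backward pass (on a freshly built graph) adds to the stored gradient
(w * x).backward()
grad_after_two = w.grad.clone()
print(grad_after_two)   # tensor(4.)

# Reset manually; model.zero_grad() clears every parameter's .grad in a similar way
w.grad.zero_()
print(w.grad)           # tensor(0.)
```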


### Parameter update

Alongside the loss functions, PyTorch provides several different optimizers, ranging from the classical [Stochastic Gradient Descent](https://pytorch.org/docs/stable/optim.html#torch.optim.SGD) to [RMSprop](https://pytorch.org/docs/stable/optim.html#torch.optim.RMSprop) and [Adam](https://pytorch.org/docs/stable/optim.html#torch.optim.Adam).

In [27]:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
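Every optimizer receives the parameters to update and a learning rate, and `optimizer.step()` reads the gradients stored in `.grad`. For plain SGD (no momentum, no weight decay) the update is simply `p ← p − lr · p.grad`; Adam additionally keeps running averages of the gradients and of their squares to adapt the step size per parameter. A small check, on a toy layer and loss of our own choosing, that `torch.optim.SGD` performs exactly the plain update:

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(3, 1)
lr = 0.1

# One forward/backward pass to populate the .grad attributes
x = torch.rand(4, 3)
loss = layer(x).pow(2).mean()
loss.backward()

# Parameters we expect after one plain gradient-descent step: p - lr * grad
expected = [p.detach() - lr * p.grad for p in layer.parameters()]

optimizer = torch.optim.SGD(layer.parameters(), lr=lr)
optimizer.step()   # updates the parameters in place

matches = all(torch.allclose(p, e) for p, e in zip(layer.parameters(), expected))
print(matches)     # True
```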

In general, a training loop consists of the following steps:
1. Clearing the gradients
2. Obtaining inputs and targets, and, possibly, moving them to the GPU
3. Performing the forward pass of the model
4. Calculating the loss
5. Performing the backward pass
6. Updating the weights of the network

In [178]:
optimizer.zero_grad()

inputs = torch.rand(1, 4)
targets = torch.rand(1, 3)

outputs = model(inputs)

loss = criterion(outputs, targets)

loss.backward()
optimizer.step()
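The cell above runs a single iteration on one random batch. Repeating the same six steps on a fixed batch should drive the loss down, which is a handy sanity check that gradients flow and the weights are updated. A sketch using a `Sequential` stand-in for `Net` and the device-selection pattern from the CUDA notes; the step count and learning rate are arbitrary choices:

```python
import torch

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.ReLU(), torch.nn.Linear(8, 3)
).to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# A fixed batch, so the losses of successive steps are comparable
inputs = torch.rand(16, 4)
targets = torch.rand(16, 3)

losses = []
for step in range(200):
    optimizer.zero_grad()                          # 1. clear the gradients
    X, y = inputs.to(device), targets.to(device)   # 2. move the data to the device
    outputs = model(X)                             # 3. forward pass
    loss = criterion(outputs, y)                   # 4. compute the loss
    loss.backward()                                # 5. backward pass
    optimizer.step()                               # 6. update the weights
    losses.append(loss.item())

print(f'first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}')
```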

## Data loading

In [179]:
iris_data = sklearn.datasets.load_iris()

In [180]:
iris_data.feature_names

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [181]:
iris_data.data[:10,:]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])

In [182]:
iris_data.target_names

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

In [183]:
iris_data.target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

The [Dataset](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.Dataset) class provided by PyTorch is an abstract class representing a dataset used as input to a model. Subclasses only have to override two methods: `__len__`, which returns the number of samples, and `__getitem__`, which, given an index, returns the corresponding sample (here, a pair of features and target).

You might find it useful to have a look at the official [Data Loading and Processing Tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) on the PyTorch website.

In [184]:
class IrisDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.features_names = data.feature_names
        self.target_names = data.target_names
        self.X = data.data.astype(np.float32)
        self.y = data.target

    def __getitem__(self, index):
        X = self.X[index]
        y = self.y[index]

        return X, y

    def __len__(self):
        return len(self.y)

In [185]:
dataset = IrisDataset(iris_data)

In [186]:
len(dataset)

150

In [187]:
dataset[0]

(array([5.1, 3.5, 1.4, 0.2], dtype=float32), 0)
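When the whole dataset already fits in memory as tensors, a custom class is not strictly necessary: `torch.utils.data.TensorDataset` implements `__len__` and `__getitem__` by indexing each wrapped tensor along its first dimension. A sketch with random stand-in data that only has the shape of Iris (not the real measurements):

```python
import numpy as np
import torch
import torch.utils.data

rng = np.random.default_rng(0)
X = rng.random((150, 4), dtype=np.float32)   # stand-in features, 150 x 4
y = np.repeat([0, 1, 2], 50)                 # 50 samples per class, like Iris

dataset = torch.utils.data.TensorDataset(torch.from_numpy(X), torch.from_numpy(y))

print(len(dataset))                   # 150
features, label = dataset[0]
print(features.shape, label.item())   # torch.Size([4]) 0
```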

[DataLoader](https://pytorch.org/docs/stable/data.html?highlight=dataloader#torch.utils.data.DataLoader) is another useful PyTorch class: it combines a dataset and a sampler, and provides single- or multi-process iterators over the dataset. Its goal is to create batches of training examples for the network by sampling the dataset and combining the sampled items into batches.

In [189]:
dataloader = torch.utils.data.DataLoader(dataset, batch_size=10, shuffle=True)

In [190]:
len(dataloader)

15
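`len(dataloader)` counts batches, not samples: 150 samples in batches of 10 give 15 batches. When the batch size does not divide the dataset size, the last batch is smaller, unless `drop_last=True` is passed. With the default collation, per-sample tensors are stacked along a new leading batch dimension. A sketch on random stand-in data with a batch size of 32:

```python
import math
import torch
import torch.utils.data

dataset = torch.utils.data.TensorDataset(torch.rand(150, 4), torch.randint(0, 3, (150,)))

loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
print(len(loader), math.ceil(150 / 32))   # 5 5

# Shuffling changes which samples land in each batch, not the batch sizes
batch_sizes = [X_batch.shape[0] for X_batch, y_batch in loader]
print(batch_sizes)                         # [32, 32, 32, 32, 22]

# drop_last=True discards the incomplete final batch
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)
print(len(loader))                         # 4
```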

## Training loop

In [199]:
nb_features = dataset.X.shape[1]
hidden_size = 32
nb_classes = len(set(dataset.y))

model = Net(nb_features, hidden_size, nb_classes)
model = model.to('cuda')

In [200]:
model

Net(
  (fc1): Linear(in_features=4, out_features=32, bias=True)
  (fc1_activ): ReLU()
  (fc_logits): Linear(in_features=32, out_features=3, bias=True)
)

In [201]:
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

In [203]:
nb_epochs = 9

for i in range(nb_epochs):
    model.train()  # training mode; setting it once per epoch is enough
    epoch_losses = []
    for X_batch, y_batch in dataloader:
        optimizer.zero_grad()
        
        X_batch = X_batch.to('cuda')
        y_batch = y_batch.to('cuda')
        
        logits = model(X_batch)
        loss = criterion(logits, y_batch)
        
        loss.backward()
        optimizer.step()
        
        epoch_losses.append(loss.item())
        
    epoch_loss = np.mean(epoch_losses)
    print(f'Epoch: {i+1}, loss: {epoch_loss:.3f}')

Epoch: 1, loss: 0.511
Epoch: 2, loss: 0.490
Epoch: 3, loss: 0.468
Epoch: 4, loss: 0.451
Epoch: 5, loss: 0.431
Epoch: 6, loss: 0.417
Epoch: 7, loss: 0.405
Epoch: 8, loss: 0.386
Epoch: 9, loss: 0.376
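The loss alone says little about how well the classifier performs. `accuracy_score`, imported at the top of the notebook, can be applied to the predicted classes, i.e. the argmax of the logits. Evaluation should run in eval mode and under `torch.no_grad()`, which skips building the autograd graph. A self-contained sketch that trains a `Sequential` stand-in for `Net` on the full Iris data and measures accuracy on that same data (there is no held-out split, so the estimate is optimistic); the epoch count and learning rate are our own choices:

```python
import numpy as np
import torch
import sklearn.datasets
from sklearn.metrics import accuracy_score

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

iris = sklearn.datasets.load_iris()
X = torch.from_numpy(iris.data.astype(np.float32)).to(device)
y = torch.from_numpy(iris.target.astype(np.int64)).to(device)  # CrossEntropyLoss needs int64

model = torch.nn.Sequential(
    torch.nn.Linear(4, 32), torch.nn.ReLU(), torch.nn.Linear(32, 3)
).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
for epoch in range(100):          # full-batch training, for brevity
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

model.eval()                      # switch layers such as dropout to inference mode
with torch.no_grad():             # no gradients are needed for evaluation
    preds = model(X).argmax(dim=1)

# accuracy_score works on CPU array-likes
acc = accuracy_score(y.cpu().numpy(), preds.cpu().numpy())
print(f'accuracy: {acc:.3f}')
```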
