## Introduction

There are many machine learning frameworks, ranging from those for more traditional ML methods such as random forests and linear regression (scikit-learn) to those focused on deep learning and neural networks ([PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/), [Darknet](https://pjreddie.com/darknet/), etc.). Then there are frameworks that provide more than just the ML methods, such as Hugging Face, Spark ML, and MLflow.

Since we are interested in deep learning and neural networks here, you will at least need to know the names PyTorch (and Torch), TensorFlow, and Keras. PyTorch was developed by Facebook and TensorFlow by Google. Keras is a high-level interface on top of TensorFlow that makes it easier to use; it used to be a standalone library but is now integrated into TensorFlow.

PyTorch is known for its ease of use (and its dynamic graph), while TensorFlow has been seen as more production-ready. That was in the early days, however; today they are quite similar, even in terms of syntax.

The differences between ML frameworks shrink even further with the [Open Neural Network Exchange (ONNX)](https://onnx.ai/). ONNX aims to be a common format for neural network models, so you can develop your model in any of the existing frameworks and easily export it to, or import it from, the ONNX format.

For our session, we will look at PyTorch, since it is still slightly easier to use than TensorFlow.

## PyTorch 101

From the previous session, what are the most common components of a neural network (and of its training)?
They are:
- The dataset (input and ground truth)
- The model (with different types of layers and activation functions)
- The loss function
- The optimizer

### The dataset
PyTorch provides a class to define a `Dataset`:
```python
from torch.utils.data.dataset import Dataset
```

To create your own dataset, you can extend this class and implement the `__len__` and `__getitem__` methods. Think of it as an indexable Python sequence: it knows how many samples it holds and can return the sample at a given position.
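
As a minimal sketch of that contract, the hypothetical `SquaresDataset` below (an illustrative name, not part of PyTorch) pairs each number with its square. Wrapping it in a `DataLoader` then yields batches.

```python
import torch
from torch import Tensor
from torch.utils.data import DataLoader, Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i squared)."""

    def __init__(self, size: int) -> None:
        # Inputs have shape [size, 1]; targets are their element-wise squares.
        self.inputs = torch.arange(size, dtype=torch.float32).unsqueeze(1)
        self.targets = self.inputs ** 2

    def __len__(self) -> int:
        return len(self.inputs)

    def __getitem__(self, index: int) -> tuple[Tensor, Tensor]:
        return self.inputs[index], self.targets[index]


squares = SquaresDataset(6)
loader = DataLoader(squares, batch_size=4, shuffle=False)
batch_shapes = [tuple(x.shape) for x, _ in loader]
print(len(squares), batch_shapes)  # 6 [(4, 1), (2, 1)]
```

The last batch is smaller because 6 samples do not divide evenly into batches of 4.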

A more elaborate example:
```python
from typing import Tuple
from abc import abstractmethod

from torch import Tensor
from torch.utils.data.dataset import Dataset


class IImageClassificationDataset(Dataset):
    """
    Interface for image classification dataset.
    """
    @abstractmethod
    def __len__(self) -> int:
        pass

    @abstractmethod
    def __getitem__(self, index: int) -> Tuple[Tensor, Tensor]:
        """
        :param index: sample position inside dataset
        :return: tuple of input image and groundtruth class as tensors
            Input image shape: [C, H, W]
                C: Number of channels for input image. 1 for intensity-based,
                3 for color-based.
            Output label shape: [N]
                N: Number of classes
        """
        pass

```

However, providing a `Dataset` is not mandatory; we can also simply convert our inputs into tensors and feed them to our network.
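
A small sketch of that shortcut, assuming an illustrative `nn.Linear` layer stands in for the network: a nested Python list is turned into a tensor and passed straight through.

```python
import torch
from torch import nn

# Four samples with three features each, built directly from a Python list.
inputs = torch.tensor([[0.0, 1.0, 2.0],
                       [1.0, 0.0, 1.0],
                       [2.0, 2.0, 0.0],
                       [0.5, 0.5, 0.5]])

layer = nn.Linear(3, 2)  # illustrative stand-in: 3 input features, 2 outputs
outputs = layer(inputs)
print(outputs.shape)  # torch.Size([4, 2])
```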

### The model

The most basic object in a neural network is the `Tensor`. Tensors are the data being passed around: the inputs, the weights and biases, and the outputs are all `Tensor` objects.
```python
from torch import Tensor
```
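
To make this concrete, here is a short sketch with made-up values: one tensor plays the role of the input, another the role of a trainable weight, and `backward()` fills in the gradient of the result with respect to that weight.

```python
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(x.shape, x.dtype)  # torch.Size([2, 2]) torch.float32

# A trainable "weight": requires_grad=True asks PyTorch to track gradients.
w = torch.tensor([[0.5], [-1.0]], requires_grad=True)

y = (x @ w).sum()  # (1*0.5 - 2*1) + (3*0.5 - 4*1) = -4.0
y.backward()       # fills w.grad with dy/dw, the column sums of x
print(y.item(), w.grad.tolist())  # -4.0 [[4.0], [6.0]]
```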

The neural network layers reside in the `torch.nn` module. Some basic neural network layers:
```python
from torch import nn
nn.Linear # apply a linear transformation to the input
nn.Conv1d # apply a convolution to the input, 1 dimensional
nn.Conv2d
nn.Conv3d
nn.RNN    # apply a recurrent layer
nn.LSTM
nn.GRU
nn.MaxPool1d # Max pooling layer, 1 dimensional
nn.Dropout   # Dropout layer, a powerful yet simple regularization technique
...

# Activation
nn.ReLU
nn.LeakyReLU
nn.Softmax
...
```
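
Each of these is a class: you instantiate it once and then call the instance like a function. The sketch below uses arbitrary example values to show a few of them in action.

```python
import torch
from torch import nn

x = torch.tensor([[-1.0, 0.0, 2.0]])

relu = nn.ReLU()
print(relu(x))  # negatives are clamped to 0: tensor([[0., 0., 2.]])

softmax = nn.Softmax(dim=1)
probs = softmax(x)  # each row becomes a probability distribution
print(probs.sum(dim=1))

linear = nn.Linear(3, 2)  # weights are randomly initialised
print(linear(x).shape)    # torch.Size([1, 2])

dropout = nn.Dropout(p=0.5)
dropout.eval()  # in eval mode dropout passes its input through unchanged
print(torch.equal(dropout(x), x))  # True
```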

A neural network model is defined by extending the `nn.Module` class. We need to define at least 2 methods:
- `__init__`: defines which layers your network has
- `forward`: defines how the data is passed through your network

Example:

```python
import torch
from torch import Tensor, nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout(0.25)
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(9216, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.log_softmax = nn.LogSoftmax(dim=1)
        self.max_pool2d = nn.MaxPool2d(2)

    # x represents our data
    def forward(self, x):
        # Pass data through conv1
        x = self.conv1(x)
        # Use the rectified-linear activation function over x
        x = self.relu(x)

        x = self.conv2(x)
        x = self.relu(x)

        # Run max pooling over x
        x = self.max_pool2d(x)
        # Pass data through dropout1
        x = self.dropout1(x)
        # Flatten x with start_dim=1
        x = torch.flatten(x, 1)
        # Pass data through fc1
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)

        # Apply log-softmax to x
        output = self.log_softmax(x)
        return output
```

```python
# Equates to one random 28x28 image
random_data = torch.rand((1, 1, 28, 28))

my_nn = Net()
result = my_nn(random_data)
print(result)
```

```
tensor([[-2.3570, -2.3486, -2.1649, -2.4237, -2.3033, -2.2226, -2.4016, -2.2909,
         -2.2010, -2.3461]], grad_fn=<LogSoftmaxBackward0>)
```
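
The `9216` input size of `fc1` follows from the shapes of the earlier layers. The sketch below rebuilds just the convolution and pooling steps (with fresh, untrained layers) to trace those shapes, and checks that log-softmax outputs exponentiate to probabilities that sum to 1.

```python
import torch
from torch import nn

x = torch.rand(1, 1, 28, 28)
x = nn.Conv2d(1, 32, 3, 1)(x)   # 28 - 3 + 1 = 26 -> [1, 32, 26, 26]
x = nn.Conv2d(32, 64, 3, 1)(x)  # 26 - 3 + 1 = 24 -> [1, 64, 24, 24]
x = nn.MaxPool2d(2)(x)          # 24 / 2 = 12     -> [1, 64, 12, 12]
flat = torch.flatten(x, 1)      # 64 * 12 * 12 = 9216
print(flat.shape)               # torch.Size([1, 9216])

# Log-probabilities exponentiate to probabilities that sum to (about) 1.
log_probs = nn.LogSoftmax(dim=1)(torch.rand(1, 10))
total = torch.exp(log_probs).sum().item()
print(total)
```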


### The loss function

Loss functions also reside in the `torch.nn` module.
Refer to the [list of loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions) in the documentation.

Example:
```python
# Initialize the loss function
loss_fn = nn.CrossEntropyLoss()
```
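
`nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally. The sketch below, using made-up scores, shows that it matches `nn.LogSoftmax` followed by `nn.NLLLoss`.

```python
import torch
from torch import nn

# Raw scores for 2 samples and 3 classes (illustrative values).
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])  # correct class index for each sample

ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step form: log-softmax, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, targets)

print(ce.item(), nll.item())
print(torch.allclose(ce, nll))  # True
```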

### The optimizer

Optimizers reside in the `torch.optim` module. Refer to the [`torch.optim` documentation](https://pytorch.org/docs/stable/optim.html#).

Example:
```python
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
```
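
The snippet above assumes `model` and `learning_rate` already exist. A self-contained sketch of how an optimizer is typically driven, with a stand-in one-layer model and illustrative values, could look like this:

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(1, 1)  # stand-in model
learning_rate = 0.1      # illustrative value
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
loss_fn = nn.MSELoss()

x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x  # target relation: y = 2x

loss_before = loss_fn(model(x), y).item()
for _ in range(50):
    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()        # compute gradients of the loss
    optimizer.step()       # update the parameters using those gradients
loss_after = loss_fn(model(x), y).item()
print(loss_before, loss_after)
```

The loss after the loop should be lower than before it, since each step moves the parameters against the gradient.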

## Hands-on example

Consider the same simple example from Session 1.

Consider a 1-dimensional world with 2 countries, A and B, whose border is at point x = 3. This means that all of the land with value < 3 belongs to country A and all of the land with value >= 3 belongs to country B. Now create a model that represents this!

First, we can convert the 2 classes into a numeric representation:

A: 0

B: 1

```python
from torch import Tensor
from torch.utils.data.dataset import Dataset

class SimpleDataset(Dataset):
    def __init__(self) -> None:
        # Each sample is (position, target row).
        # As written: x < 3 -> [0, 1], x >= 3 -> [1, 0].
        self.dataset = [
            (Tensor([[2.99]]), Tensor([[0, 1]])),
            (Tensor([[-1]]), Tensor([[0, 1]])),
            (Tensor([[3]]), Tensor([[1, 0]])),
            (Tensor([[3.000001]]), Tensor([[1, 0]])),
            (Tensor([[2.99999]]), Tensor([[0, 1]])),
            (Tensor([[3.000001]]), Tensor([[1, 0]])),
            (Tensor([[10]]), Tensor([[1, 0]])),
        ]

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> tuple[Tensor, Tensor]:
        return self.dataset[index]
```

```python
dataset = SimpleDataset()
dataset[0]
```

```
(tensor([[2.9900]]), tensor([[0., 1.]]))
```

```python
import torch
from torch import Tensor, nn, optim

class SimpleNet(nn.Module):
    def __init__(self) -> None:
        super(SimpleNet, self).__init__()
        self._first_layer = nn.Linear(1, 2, bias=False)
        self._log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x: Tensor) -> Tensor:
        first = self._first_layer(x)

        output = self._log_softmax(first)
        return output
```

```python
model = SimpleNet()
```

```python
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

for epoch in range(100):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(dataset, 0):
        # get the inputs; data is a tuple of (inputs, labels)
        inputs, labels = data

        # zero the parameter gradients
        # NOTE: with 7 samples this only fires at i == 0, so gradients
        # accumulate across the remaining samples of each epoch
        if i % 10 == 0:
            optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
    print(f'[Epoch: {epoch + 1}] loss: {running_loss / len(dataset):.3f}')
    running_loss = 0.0

print('Finished Training')
```


```
[Epoch: 1] loss: 2.728
[Epoch: 2] loss: 2.700
[Epoch: 3] loss: 2.672
[Epoch: 4] loss: 2.644
[Epoch: 5] loss: 2.616
[Epoch: 6] loss: 2.588
[Epoch: 7] loss: 2.561
[Epoch: 8] loss: 2.533
[Epoch: 9] loss: 2.506
[Epoch: 10] loss: 2.479
[Epoch: 11] loss: 2.452
[Epoch: 12] loss: 2.426
[Epoch: 13] loss: 2.399
[Epoch: 14] loss: 2.373
[Epoch: 15] loss: 2.347
[Epoch: 16] loss: 2.321
[Epoch: 17] loss: 2.295
[Epoch: 18] loss: 2.269
[Epoch: 19] loss: 2.243
[Epoch: 20] loss: 2.218
[Epoch: 21] loss: 2.192
[Epoch: 22] loss: 2.167
[Epoch: 23] loss: 2.142
[Epoch: 24] loss: 2.117
[Epoch: 25] loss: 2.092
[Epoch: 26] loss: 2.067
[Epoch: 27] loss: 2.043
[Epoch: 28] loss: 2.018
[Epoch: 29] loss: 1.994
[Epoch: 30] loss: 1.970
[Epoch: 31] loss: 1.946
[Epoch: 32] loss: 1.923
[Epoch: 33] loss: 1.899
[Epoch: 34] loss: 1.876
[Epoch: 35] loss: 1.853
[Epoch: 36] loss: 1.830
[Epoch: 37] loss: 1.807
[Epoch: 38] loss: 1.784
[Epoch: 39] loss: 1.762
[Epoch: 40] loss: 1.739
[Epoch: 41] loss: 1.717
[Epoch: 42] loss: 1.695
...
```

```python
torch.exp(model(Tensor([[3.1]])))
```

```
tensor([[0.4222, 0.5778]], grad_fn=<ExpBackward0>)
```

```python
list(model.parameters())
```

```
[Parameter containing:
 tensor([[-0.3267],
         [-0.2255]], requires_grad=True)]
```
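
As a sanity check, the prediction for x = 3.1 can be recomputed by hand from the two printed weights. The sketch below copies them (rounded to 4 decimals) and applies the same linear map and softmax that `SimpleNet` uses.

```python
import torch

# Weights as printed above (rounded to 4 decimals); the layer has no bias.
weights = torch.tensor([[-0.3267], [-0.2255]])
x = torch.tensor([[3.1]])

logits = x @ weights.T  # shape [1, 2]: [w0 * x, w1 * x]
probs = torch.softmax(logits, dim=1)
print(probs)  # approximately [[0.4222, 0.5778]]

# Both logits are multiples of x, so they are equal at x = 0,
# which is where this model's two classes become equally likely.
at_zero = torch.softmax(torch.tensor([[0.0]]) @ weights.T, dim=1)
print(at_zero)  # tensor([[0.5000, 0.5000]])
```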

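Given the current training samples and this model, this might be the best fit (or at least a local minimum). Note that `SimpleNet` has no bias term (`bias=False`), so both of its outputs are proportional to x and its decision boundary can only sit at x = 0, not at x = 3. This is an example of a neural network being an approximation and not an analytical solution.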
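
For comparison, here is a sketch of one possible variant, not the setup used above: it keeps the same positions but adds a bias term so the boundary can move away from x = 0, zeroes the gradients on every step, trains on the whole set at once, and uses integer class indices (A = 0, B = 1). The name `model_with_bias`, the learning rate, and the number of steps are illustrative assumptions.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Same positions as in SimpleDataset; labels are class indices (A = 0, B = 1).
positions = torch.tensor(
    [[2.99], [-1.0], [3.0], [3.000001], [2.99999], [3.000001], [10.0]]
)
labels = torch.tensor([0, 0, 1, 1, 0, 1, 1])

model_with_bias = nn.Linear(1, 2)  # bias=True by default
criterion = nn.CrossEntropyLoss()  # expects raw logits, so no LogSoftmax here
optimizer = optim.Adam(model_with_bias.parameters(), lr=0.05)

for _ in range(1000):
    optimizer.zero_grad()  # reset gradients on every step
    loss = criterion(model_with_bias(positions), labels)
    loss.backward()
    optimizer.step()

# Classify one point well inside each country.
with torch.no_grad():
    test_points = torch.tensor([[0.0], [6.0]])
    predictions = model_with_bias(test_points).argmax(dim=1)
print(predictions.tolist())
```

With these settings the model should assign x = 0 to country A and x = 6 to country B. Points very close to the border remain hard to separate, because the samples at 2.99999 and 3.000001 are nearly identical inputs with different labels.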