# Neural Networks in PyTorch

Let's put our knowledge to use and see how to build a neural network in PyTorch for a given task. If we stick to the iris dataset from before, we know that we have 4 features for every data point (flower) and want to predict its species.

This already gives us a lot of information about what we should do. We want to build a classifier that takes the 4 features as input and spits out the probabilities of each species.

We can start with a class inheriting from `nn.Module`, as we have done before. This time we will use some more of the modules and functions that PyTorch provides in order to do actual machine learning. There are lots of nice building blocks to choose from; [have a look](https://pytorch.org/docs/stable/nn.html).

We often call these building blocks layers. They are themselves PyTorch modules (subclasses of `nn.Module`) that bundle specific parameters with the operations executed on them. We will start very simple, with linear layers and the rectified linear unit (ReLU) activation. The `Linear` layer applies a linear transformation to the data $x$:

$y = xA^T + b$, with the weight matrix $A$ and the bias $b$ being trainable parameters of class `Parameter`.

In [20]:
# let's build a little classifier for iris

import torch # needed below for torch.randn and torch.matmul
import torch.nn as nn
import torch.nn.functional as F

# define the network
class Classifier(nn.Module):
    def __init__(self, in_features:int, hidden_features:int, out_features:int):
        super().__init__()
        self.fc1 = nn.Linear( # define the first fully connected layer
            in_features,
            hidden_features
        )
        self.fc2 = nn.Linear( # define the second fully connected layer
            hidden_features,
            out_features
        )
        
    def forward(self, x):
        z = self.fc1(x) # apply the first fully connected layer
        z = F.relu(z) # apply the relu activation function (non-linearity)
        z = self.fc2(z) # apply the second fully connected layer
        return F.softmax(z, dim=1) # apply the softmax activation function to return probabilities for each class

In [21]:
iris_classifier = Classifier(in_features=4, hidden_features=64, out_features=3) # create an instance of the network
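To see what "probabilities of each species" means in practice, here is a minimal sketch of a forward pass. It is self-contained, so instead of reusing `Classifier` it builds a stand-in with `nn.Sequential` that has the same layer structure; the batch of 5 flowers is random made-up data, not real iris measurements, and the names `model`, `batch` and `probs` are only illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random weights and inputs reproducible

# Stand-in with the same structure as Classifier(4, 64, 3):
# Linear -> ReLU -> Linear -> Softmax
model = nn.Sequential(
    nn.Linear(4, 64),
    nn.ReLU(),
    nn.Linear(64, 3),
    nn.Softmax(dim=1),  # normalize over the class dimension
)

batch = torch.randn(5, 4)  # 5 made-up flowers with 4 features each

with torch.no_grad():  # no gradients needed for a plain forward pass
    probs = model(batch)

print(probs.shape)       # one row per flower, one column per species
print(probs.sum(dim=1))  # each row sums to (approximately) 1
```

Since the network is untrained, the probabilities are essentially arbitrary at this point; only their shape and the fact that each row sums to 1 are meaningful.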

Now that we have an example, we can get a better intuition about how the Linear layer works.

In [22]:
### printing some things that happen in the network forward pass and architecture
dummy_input = torch.randn(1, 4)
print("an input sample would have the shape: ", dummy_input.shape)
print("the first layer weights have the shape: ", iris_classifier.fc1.weight.shape)
print("the first layer bias has the shape: ", iris_classifier.fc1.bias.shape)
print("the first layer output is of shape: ", iris_classifier.fc1(dummy_input).shape)

an input sample would have the shape:  torch.Size([1, 4])
the first layer weights have the shape:  torch.Size([64, 4])
the first layer bias has the shape:  torch.Size([64])
the first layer output is of shape:  torch.Size([1, 64])
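These shapes also tell us how many trainable numbers each layer holds. As a small sketch, the helper below (`count_parameters`, a hypothetical name introduced here for illustration) sums the number of elements over all parameters of a module, using two fresh `nn.Linear` layers with the same sizes as `fc1` and `fc2`.

```python
import torch.nn as nn

# Fresh layers with the same sizes as fc1 and fc2 in the classifier
fc1 = nn.Linear(4, 64)
fc2 = nn.Linear(64, 3)

def count_parameters(module: nn.Module) -> int:
    """Count all trainable scalar values in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

n_fc1 = count_parameters(fc1)  # 64*4 weights + 64 biases = 320
n_fc2 = count_parameters(fc2)  # 3*64 weights + 3 biases = 195

print(type(fc1.weight))  # weights and biases are nn.Parameter objects
print(n_fc1, n_fc2, n_fc1 + n_fc2)
```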


In principle, the linear layer computes the following (but more efficiently, in compiled C++ code):

In [23]:
pseudo_linear_output = torch.matmul(dummy_input, iris_classifier.fc1.weight.t()) + iris_classifier.fc1.bias
print(pseudo_linear_output.shape)

torch.Size([1, 64])
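Matching shapes alone do not prove that the values agree. As a quick sanity check, the sketch below compares the manual computation $xA^T + b$ with the layer's own output and with `F.linear`, using a fresh `nn.Linear` layer and a random input so that it runs on its own. The tolerance `atol=1e-6` is an assumption chosen to absorb floating-point rounding differences.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # reproducible weights and input

layer = nn.Linear(4, 64)
x = torch.randn(1, 4)

manual = x @ layer.weight.T + layer.bias            # y = x A^T + b, written out
builtin = layer(x)                                  # what the module computes
functional = F.linear(x, layer.weight, layer.bias)  # functional form of the same operation

# All three should agree up to floating-point rounding
print(torch.allclose(manual, builtin, atol=1e-6))
print(torch.allclose(functional, builtin, atol=1e-6))
```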


The ReLU activation is one of the simplest and most popular activation functions and looks like this:

<img src="images/ReLU_pytorch.png" width="300">

(image from [PyTorch docs](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html))
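In formula form, ReLU is $\mathrm{ReLU}(x) = \max(0, x)$, applied to each element independently. A minimal sketch on a small hand-picked tensor shows that negative values are set to zero while non-negative values pass through unchanged; `torch.clamp` with `min=0` is used here only as an equivalent way to write the same operation.

```python
import torch
import torch.nn.functional as F

z = torch.tensor([-2.0, -0.5, 0.0, 1.5])

relu_out = F.relu(z)                # negatives become 0, the rest is unchanged
clamp_out = torch.clamp(z, min=0)   # same result, written as max(0, z)

print(relu_out)
print(torch.equal(relu_out, clamp_out))
```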