# Introduction to PyTorch
November, 2023

In this tutorial, we introduce the basic principles of PyTorch.

## Table of Contents

* 1) Tensors
* 2) Autograd
* 3) Models
* 4) Training loop
* 5) Datasets


In [2]:
import numpy as np
import matplotlib.pyplot as plt
import torch

### 1) Tensors
Tensors are a specialized data structure similar to NumPy’s ndarrays, except that tensors can run on GPUs or other hardware accelerators.
Furthermore, tensors are optimized for automatic differentiation.

Let's look at some basic manipulations of tensors.

In [None]:
a = torch.tensor([5., 3.])
print(a)
print(a.dtype)
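
As a small sketch of the points above, the following cell performs NumPy-like arithmetic on a tensor and then moves the result to an accelerator if one is available. The names `device`, `c`, `total` and `c_dev` are illustrative and not part of the tutorial; on a machine without a GPU the tensor simply stays on the CPU.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.tensor([5., 3.])   # Python floats give a float32 tensor by default
b = torch.ones_like(a)       # same shape and dtype as `a`

# Element-wise arithmetic and reductions work much like in NumPy.
c = a + 2 * b
total = c.sum()

# .to(device) returns a tensor on the target device
# (the same tensor if it is already there).
c_dev = c.to(device)
print(c, total, c_dev.device)
```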

NumPy's ndarrays and PyTorch's tensors are highly compatible, and it is easy to switch between them.
torch.from_numpy() creates a tensor that shares its underlying memory with the ndarray.
This means that changes to the new tensor are reflected in the ndarray, and vice versa.

In [None]:
a = np.array([[1,2],[3,4]])
print(f'Numpy array:\n {a}')
b = torch.from_numpy(a)
print(f'Tensor created from the array:\n {b}')
b[:,0] = 8
print(f'Numpy array after changes in the tensor:\n {a}')

The conversion also works in the other direction, with the same rule: Tensor.numpy() returns an ndarray that shares memory with the (CPU) tensor.

In [None]:
b = torch.rand(2,3)
print(f'Tensor:\n {b}')
a = b.numpy()
print(f'Numpy array:\n {a}')
a[:,0] = 1
print(f'Numpy array after the change:\n {a}')
print(f'Tensor after changes in the array:\n {b}')

If we do not want the two objects to share the same underlying memory, torch.tensor() creates a copy of the data.

In [None]:
a = np.array([[1,2],[3,4]])
b = torch.tensor(a)
print(f'Numpy array:\n {a}')
print(f'Tensor created from the array:\n {b}')
b[0,0]=8
print(f'Numpy array after changes in the tensor:\n {a}')
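
The difference between the two conversions can also be checked directly. The sketch below uses `np.shares_memory` to compare the underlying buffers; the variable names `shared` and `copied` are illustrative only.

```python
import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])

shared = torch.from_numpy(a)   # tensor over the same buffer as `a`
copied = torch.tensor(a)       # independent copy of the data

# Tensor.numpy() on a CPU tensor returns an array over the tensor's memory,
# so np.shares_memory can compare the underlying buffers.
print(np.shares_memory(a, shared.numpy()))
print(np.shares_memory(a, copied.numpy()))

# Writing through the shared tensor changes `a`; the copy is unaffected.
shared[0, 0] = 8
print(a[0, 0], copied[0, 0].item())
```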

### 2) Autograd
The backpropagation algorithm adjusts the model parameters based on the gradient of the loss function with respect to each parameter.
To calculate these gradients, PyTorch provides a built-in differentiation engine called torch.autograd, which supports automatic computation of gradients for any computational graph.

In [3]:
x = torch.ones(5)  # input tensor
l = torch.zeros(3)  # labels
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.mean(l - z)

w and b are the parameters that we wish to optimize.
To achieve this, we need to compute the gradients of the loss function with respect to w and b.
For that purpose, we set the requires_grad property of those tensors.
Each function that we apply to tensors to construct the computational graph is in fact an object of class Function.
This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step. A reference to it is stored in the grad_fn property of the resulting tensor; leaf tensors created by the user, such as b, have grad_fn set to None.

In [None]:
print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for b = {b.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")

To compute the gradients, we call backward() on the loss; the results are stored in the grad attribute of w and b:

In [10]:
loss.backward()
print(w.grad)
print(b.grad)

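Running this cell a second time, without rebuilding the forward pass, raises the following error, because the intermediate values saved in the graph are freed by the first backward() call:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.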
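
One way to avoid the error above is to rebuild the forward pass before each backward() call. The sketch below wraps the same toy loss in a helper, `forward_loss`, which is a hypothetical name introduced here for illustration. It also resets the accumulated gradients between the two passes and compares the result with the gradient worked out by hand for this particular loss.

```python
import torch

x = torch.ones(5)                          # input tensor
l = torch.zeros(3)                         # labels
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

def forward_loss():
    # Each call records a fresh computational graph.
    z = torch.matmul(x, w) + b
    return torch.mean(l - z)

loss = forward_loss()
loss.backward()
first_grad = w.grad.clone()

# Gradients accumulate in .grad, so reset them before the next backward pass.
w.grad.zero_()
b.grad.zero_()

loss = forward_loss()      # new graph, so backward() is allowed again
loss.backward()

# For this loss, d(loss)/d(b_j) = -1/3 and d(loss)/d(w_ij) = -x_i/3 = -1/3.
print(w.grad)
print(b.grad)
```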

### 3) Models

We create our **neural network** as a subclass of nn.Module, which is the base class for all neural network modules.
This gives us more flexibility in network design.
* Create the NN as a child class of nn.Module
* Call super().__init__() to initialize the parent class
* Define the learnable parameters and the forward propagation

If not specified, all weights and biases are initialized with random values.

There are other ways to create NNs, e.g. nn.Sequential.

In [11]:
from torch import nn

In [12]:
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(1, 2)
        self.relu1 = nn.ReLU()
        self.lin2 = nn.Linear(2, 1)
        # It is possible to assign desired initial values
        self.lin1.weight = nn.Parameter(torch.tensor([[-0.5], [1.5]]))
        self.lin1.bias = nn.Parameter(torch.tensor([0.5, -1.0]))
        

    def forward(self, y):
        y = self.relu1(self.lin1(y))
        return self.lin2(y)

In [13]:
# Initialize the network
net = SimpleNet()
print(net)

SimpleNet(
  (lin1): Linear(in_features=1, out_features=2, bias=True)
  (relu1): ReLU()
  (lin2): Linear(in_features=2, out_features=1, bias=True)
)
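
For comparison, the same architecture could be sketched with nn.Sequential, the alternative mentioned earlier. The name `seq_net` is illustrative; the layer sizes and initial values are copied from SimpleNet. This is only one possible way to write it.

```python
import torch
from torch import nn

# Same layer sizes as SimpleNet: 1 -> 2 -> 1 with a ReLU in between.
seq_net = nn.Sequential(
    nn.Linear(1, 2),
    nn.ReLU(),
    nn.Linear(2, 1),
)

# Layers are indexed by position; copy the desired initial values
# into the first layer without recording the operation in autograd.
with torch.no_grad():
    seq_net[0].weight.copy_(torch.tensor([[-0.5], [1.5]]))
    seq_net[0].bias.copy_(torch.tensor([0.5, -1.0]))

out = seq_net(torch.tensor([2.0]))
print(out.shape)
```

nn.Sequential is compact for a plain chain of layers, while subclassing nn.Module leaves room for arbitrary logic inside forward.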


To use the model, we pass the input data directly to it. This executes the model’s forward method, along with some background operations.

Do not call model.forward() directly!

In [20]:
# Create input tensor
y = torch.tensor(2.).reshape(1,)
# Create target tensor
t = torch.tensor(3.5).reshape(1,)
# Calculate output of the network
xhat = net(y)
print(f'Output of the network {xhat}')

Output of the network tensor([-1.2888], grad_fn=<ViewBackward0>)
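
Forward hooks are one example of the background operations that run when the model is called but are skipped by a direct call to forward(). The sketch below illustrates this with a small stand-in module; `net_demo`, `record_call` and `calls` are hypothetical names used only for this example.

```python
import torch
from torch import nn

net_demo = nn.Linear(1, 1)   # hypothetical stand-in for any nn.Module
calls = []

# A forward hook is invoked after forward() whenever the module is called.
def record_call(module, inputs, output):
    calls.append(output.shape)

handle = net_demo.register_forward_hook(record_call)

inp = torch.tensor([2.0])
net_demo(inp)            # goes through __call__, so the hook fires
net_demo.forward(inp)    # bypasses __call__, so the hook is skipped

handle.remove()
print(len(calls))
```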


### 4) Training loop

We need an **optimizer**, an object that holds the current state and updates the parameters based on the computed gradients.

torch.optim supports the most common optimization algorithms, such as (stochastic) gradient descent.

Various gradient descent optimization algorithms exist and are used in practice to overcome some of the limitations of gradient descent, namely:
* Choice of the learning rate
* Adaptivity of the learning rate
* Local minima

The Adaptive Moment Estimation (Adam) optimizer is usually a good starting choice.

In [21]:
from torch import optim
# Plain stochastic gradient descent
optimizer = optim.SGD(net.parameters(), lr=0.1)
# Adam, used for the training step below
optimizerAdam = optim.Adam(net.parameters(), lr=0.1)
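
To make the role of the optimizer concrete, the sketch below checks that one step of plain SGD (no momentum or weight decay) moves a parameter against its gradient, scaled by the learning rate. The model `layer` and the names `opt`, `w_before` and `expected` are illustrative assumptions, not part of the tutorial.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
layer = nn.Linear(1, 1)                      # hypothetical tiny model
opt = optim.SGD(layer.parameters(), lr=0.1)  # plain SGD, no momentum

inp = torch.tensor([2.0])
target = torch.tensor([3.5])

loss = nn.functional.mse_loss(layer(inp), target)
opt.zero_grad()
loss.backward()

w_before = layer.weight.detach().clone()
grad = layer.weight.grad.clone()
opt.step()

# Plain SGD update: w_new = w_old - lr * grad
expected = w_before - 0.1 * grad
print(torch.allclose(layer.weight.detach(), expected))
```

Adam follows the same zero_grad / backward / step pattern, but rescales each update using running estimates of the gradient moments.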

In [22]:
# Select a loss function
loss_fn = nn.MSELoss()

In [24]:
# repeat until convergence or for desired number of iterations (epochs):
# forward propagation
xhat = net(y)
# compute loss
loss = loss_fn(xhat,t)
# reset gradient
optimizerAdam.zero_grad()
# backprop
loss.backward()
print(f'Gradient with respect to w1: {net.lin1.weight.grad[1]}')
# update parameters
optimizerAdam.step()
print(f'w1 after the weight update: {net.lin1.weight[1]}')

Gradient with respect to w1: tensor([-16.4207])
w1 after the weight update: tensor([0.9304], grad_fn=<SelectBackward0>)
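
The cell above performs a single update. Repeating the same steps for a fixed number of iterations could look like the sketch below. It uses a small linear model (`model`) and plain SGD as stand-ins so that the behaviour is easy to follow; these choices, the learning rate and `num_steps` are assumptions made for the example.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)
model = nn.Linear(1, 1)                    # hypothetical stand-in for `net`
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05)

y = torch.tensor([2.0])                    # input
t = torch.tensor([3.5])                    # target

num_steps = 50
losses = []
for step in range(num_steps):
    xhat = model(y)               # forward propagation
    loss = loss_fn(xhat, t)       # compute loss
    optimizer.zero_grad()         # reset gradients
    loss.backward()               # backprop
    optimizer.step()              # update parameters
    losses.append(loss.item())

print(f'first loss: {losses[0]:.2e}, last loss: {losses[-1]:.2e}')
```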


### 5) Datasets


The DataLoader class in PyTorch helps us load and iterate over the elements of a dataset.

To improve the efficiency of the gradient descent algorithm, the data is generally divided into subsets (**batches**), and the parameters of the network are updated using the gradient computed over one batch rather than over the whole dataset (which could be huge). During training, in every pass over the entire dataset (**epoch**), we update the parameters once for every batch.

The DataLoader manages the dataset so that we can extract the training batches from it during training.

In [25]:
# Create data
x = 2 * np.random.rand(100_000) - 1
y = np.tanh(x)
# Convert data to tensors
y_t = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)
x_t = torch.tensor(x, dtype=torch.float32).reshape(-1, 1)

lr = 0.001  # learning rate
batch_size = 1_000  
num_epochs = 10
optimizer = optim.Adam(net.parameters(), lr=lr)
dataloader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(y_t, x_t),
                                             batch_size=batch_size)
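
To see how a DataLoader splits a dataset into batches, the sketch below iterates once over a small synthetic dataset. The data, the names `inputs`, `targets` and `loader`, and the use of `shuffle=True` are illustrative assumptions; the loader defined above does not shuffle.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Small synthetic dataset: 10 samples with one feature and one target each.
inputs = torch.arange(10, dtype=torch.float32).reshape(-1, 1)
targets = 2 * inputs

dataset = TensorDataset(inputs, targets)
# shuffle=True draws the samples in a new random order every epoch.
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_sizes = []
for batch_inputs, batch_targets in loader:
    batch_sizes.append(batch_inputs.shape[0])

# 10 samples with batch_size=4 give two full batches and one smaller one.
print(len(loader), batch_sizes)
```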

In [28]:
# Training loop
for j in range(num_epochs):
    for (yi, xi) in dataloader:
        x_hat = net(yi)
        loss = loss_fn(x_hat, xi)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    if j % 2 == 0:
        # Evaluate on the full dataset without tracking gradients
        with torch.no_grad():
            x_hat = net(y_t)
            loss = loss_fn(x_hat, x_t)
        print(f'epoch {j}: Loss = {loss.item():.2e}')

epoch 0: Loss = 6.86e-03
epoch 2: Loss = 4.98e-03
epoch 4: Loss = 3.75e-03
epoch 6: Loss = 2.96e-03
epoch 8: Loss = 2.45e-03


It is generally good practice to divide the available data into three sets: training, validation and test.

Training data are used to train the network, and validation data are used to tune the hyperparameters.
Finally, the test dataset should only be used once training is over and we wish to evaluate the performance of the network.
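
Such a split could be sketched with `torch.utils.data.random_split`. The 80/10/10 proportions, the dataset size and the fixed seed below are assumptions chosen for illustration, not recommendations from this tutorial.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic data in the same spirit as above: x in [-1, 1), y = tanh(x).
x = 2 * torch.rand(1000, 1) - 1
y = torch.tanh(x)
full_set = TensorDataset(y, x)

# Hypothetical 80/10/10 split; a seeded generator makes it reproducible.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    full_set, [800, 100, 100], generator=generator
)

print(len(train_set), len(val_set), len(test_set))
```

Each of the three subsets can then be wrapped in its own DataLoader.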