## A practical introduction to PyTorch

The aim of this tutorial is to give a quick overview of the capabilities of PyTorch. PyTorch is a deep learning framework with a style that closely resembles that of numpy (https://numpy.org/).

As a reminder, the fundamental objects in numpy (and in any scientific computing library) are arrays and the operations we can call on them, for instance:

In [6]:
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [0., 1.]])

In [2]:
C = A + B
C

array([[1., 3.],
       [3., 5.]])

In [3]:
np.sum(A)

10.0

PyTorch is similar, though its main objects are called tensors (multidimensional arrays):

In [5]:
import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[0., 1.], [0., 1.]])

In [19]:
C = A + B
C

tensor([[1., 3.],
        [3., 5.]])

In [20]:
torch.sum(A)

tensor(10.)

It is really that simple! Almost every function in numpy has an equivalent in PyTorch (the full list of functions can be found at https://pytorch.org/docs/stable/torch.html).
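
As a small illustration of that correspondence, the sketch below runs a few common operations in both libraries. The variable names are chosen only for this example. One difference worth noticing is that reductions take `axis` in numpy, while PyTorch's documented name for the same argument is `dim`.

```python
import numpy as np
import torch

A_np = np.array([[1., 2.], [3., 4.]])
A_t = torch.tensor([[1., 2.], [3., 4.]])

# Mean of all entries
mean_np = np.mean(A_np)
mean_t = torch.mean(A_t)

# Matrix product (the @ operator works in both libraries)
prod_np = A_np @ A_np
prod_t = A_t @ A_t

# Sum over rows, leaving one value per column
col_sums_np = np.sum(A_np, axis=0)
col_sums_t = torch.sum(A_t, dim=0)

print(mean_np, mean_t)
print(prod_np, prod_t)
print(col_sums_np, col_sums_t)
```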

You can convert easily between numpy arrays and PyTorch tensors, using the tensor method .numpy() and the function torch.from_numpy():

In [21]:
C.numpy()

array([[1., 3.],
       [3., 5.]], dtype=float32)

In [22]:
torch.from_numpy(C.numpy())

tensor([[1., 3.],
        [3., 5.]])
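
One detail to keep in mind: torch.from_numpy does not copy the data, so the tensor and the array share the same memory. The minimal sketch below (with illustrative variable names) shows the effect, and uses .clone() when an independent copy is wanted. It also shows that the dtype is preserved, so a default numpy array becomes a float64 tensor.

```python
import numpy as np
import torch

a = np.zeros(3)               # numpy defaults to float64
t = torch.from_numpy(a)       # no copy: t and a share the same buffer

a[0] = 5.0                    # modify the numpy array in place...
print(t)                      # ...and the change is visible through the tensor

t_copy = torch.from_numpy(a).clone()  # explicit, independent copy
a[1] = 7.0
print(t)                      # sees the second change as well
print(t_copy)                 # does not see it
print(t.dtype)                # dtype is inherited from the numpy array
```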

A very useful tensor attribute (especially for debugging purposes) is .shape, which gives us the dimensions of our tensor:

In [24]:
C.shape

torch.Size([2, 2])
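
As a sketch of how .shape helps when debugging, the snippet below prints the shapes involved in a matrix-vector product. The tensors `X`, `w` and `w_col` are made up for this example.

```python
import torch

X = torch.ones(4, 3)   # 4 rows, 3 columns
w = torch.ones(3)      # vector of length 3

y = X @ w              # (4, 3) @ (3,) gives a vector of length 4
print(X.shape, w.shape, y.shape)

# reshape changes the shape without changing the values
w_col = w.reshape(3, 1)
y_col = X @ w_col      # (4, 3) @ (3, 1) gives a (4, 1) matrix
print(y_col.shape)
```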

### So why use PyTorch instead of numpy?

So far, PyTorch seems to do the same things as numpy. However, it adds several extensions that make it extremely useful in machine learning. Let's have a look at them:

1. Automatic differentiation (autograd)

2. GPU acceleration

3. Deep learning abstractions

#### Automatic differentiation

PyTorch can compute gradients of functions that you write using PyTorch operations. You don't need to write gradients by hand, and PyTorch does not rely on numerical (finite-difference) approximations. Instead, it keeps track of the operations you apply and then uses the chain rule to propagate derivatives through them, as described in https://en.wikipedia.org/wiki/Automatic_differentiation

To do this, you just need to activate a flag on the variables with respect to which you want to compute gradients.
For example, let's define a function $f(x) = \sum_{i=1}^{10} x_i^2$, where $x \in \mathbb{R}^{10}$, and suppose we want to compute $\nabla f(x)$ at $x = (1, 1, \ldots, 1)$.

First, we define the input. Instead of torch.tensor, we use torch.ones, which works like its numpy counterpart.
Note that we have added a flag indicating that we want to compute a gradient with respect to this variable later:

In [109]:
x = torch.ones(10, requires_grad=True)
x

tensor([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], requires_grad=True)

Now, we just define the operations of the function. You could encapsulate them inside a Python function, but this is not strictly necessary:

In [110]:
y = torch.sum(x**2)
y

tensor(10., grad_fn=<SumBackward0>)

Finally, we call the method backward() on y to compute the derivative $\frac{\partial y}{\partial x}$. The gradient is then stored in the .grad attribute of every variable that was created with the requires_grad flag:

In [111]:
y.backward()

In [112]:
x.grad

tensor([2., 2., 2., 2., 2., 2., 2., 2., 2., 2.])
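
The result can be checked against the analytical gradient, $\nabla f(x) = 2x$. The sketch below wraps the same computation in a Python function and also compares one coordinate with a central finite difference. The step size `eps`, the helper vector `e0` and the use of float64 are choices made only for this check; the finite difference is there for comparison and is not how PyTorch computes the gradient.

```python
import torch

def f(x):
    # f(x) = sum_i x_i^2, the same function as above
    return torch.sum(x ** 2)

# float64 keeps the finite-difference comparison numerically clean
x = torch.ones(10, dtype=torch.float64, requires_grad=True)
f(x).backward()

# Analytical gradient: d/dx_i sum_j x_j^2 = 2 * x_i
analytic = 2 * x.detach()
print(torch.allclose(x.grad, analytic))

# Central finite difference along the first coordinate
eps = 1e-3
e0 = torch.zeros(10, dtype=torch.float64)
e0[0] = eps
with torch.no_grad():  # no need to track these operations
    fd = (f(x + e0) - f(x - e0)) / (2 * eps)
print(fd.item(), x.grad[0].item())
```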

This was a very simple example, but you can also compute derivatives through much more complex code, including Python control flow such as loops:

In [113]:
x = torch.ones(5, requires_grad=True)

In [114]:
z = x
while z[0] >= 0.2:
    z = torch.sin(z)
y = torch.sum(z)

In [115]:
y.backward()
x.grad

tensor([0.0062, 0.0062, 0.0062, 0.0062, 0.0062])
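
To see the chain rule at work in this loop, note that each pass applies sin, whose derivative is cos. The gradient is therefore the product of the cosines of all intermediate values. The sketch below accumulates that product by hand (the variable `manual` is introduced only for this check) and compares it with what autograd returns. A slightly loosened tolerance allows for float32 rounding.

```python
import torch

x = torch.ones(5, requires_grad=True)

z = x
manual = torch.ones(5)  # running product of the local derivatives
while z[0] >= 0.2:
    # d/dz sin(z) = cos(z); the chain rule multiplies these factors
    manual = manual * torch.cos(z.detach())
    z = torch.sin(z)
y = torch.sum(z)

y.backward()
print(torch.allclose(x.grad, manual, rtol=1e-4))
```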

#### GPU acceleration

Your CPU will probably have between 4 and 16 cores, so the amount of parallelization it offers is somewhat limited. If you have a GPU (an NVIDIA card with the CUDA drivers installed), you can use its much larger number of cores to accelerate matrix computations.

Let's see a simple example: computing the elementwise square of a big random matrix using numpy, PyTorch on the CPU, and PyTorch on the GPU.

In [22]:
A = np.random.randn(5000, 5000)

In [23]:
%%timeit
B = A ** 2

28.6 ms ± 659 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [24]:
A = torch.from_numpy(A)

In [25]:
%%timeit
B = A ** 2

12.3 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


With PyTorch, you can simply move a tensor to the GPU using .to('cuda'); operations on that tensor then run on the GPU:

In [26]:
A = A.to('cuda')

In [27]:
%%timeit
B = A ** 2

2.46 ms ± 53.4 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Impressive! We have reduced the compute time from 28.6 ms to 2.46 ms, which is more than 10x faster.
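
Outside a notebook, the same experiment can be written so that it also runs on machines without a GPU. The sketch below is one possible way to do it, not the code used for the timings above. The helper `timed_square`, the matrix size and the number of repeats are assumptions made for illustration. Because CUDA operations are launched asynchronously, it calls torch.cuda.synchronize() so the clock only stops once the GPU work has finished.

```python
import time
import torch

# Fall back to the CPU when no CUDA device is available
device = "cuda" if torch.cuda.is_available() else "cpu"

A = torch.randn(1000, 1000, device=device)

def timed_square(A, repeats=10):
    """Average time in seconds of one elementwise square of A."""
    # CUDA kernels run asynchronously, so wait for pending work
    # before starting and before stopping the clock
    if A.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        B = A ** 2
    if A.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats

print(f"{device}: {timed_square(A) * 1e3:.3f} ms per elementwise square")
```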

#### Deep learning abstractions

On top of that, PyTorch offers subpackages containing abstractions for deep learning.
For example:

1. torch.nn contains several types of layers for your neural networks (https://pytorch.org/docs/stable/nn.html)

2. torch.optim contains several optimizers already implemented for you, such as SGD, Adam, etc (https://pytorch.org/docs/stable/optim.html)
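
As a brief preview of how these two subpackages fit together with autograd, here is a minimal sketch that fits a single linear layer to toy data. The data ($y = 2x$), the learning rate and the number of steps are arbitrary choices made only for this illustration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)  # make the random initialization reproducible

# Toy data: y = 2 * x, chosen only for illustration
x = torch.linspace(-1, 1, 20).reshape(-1, 1)
y = 2 * x

model = nn.Linear(1, 1)                            # a layer from torch.nn
optimizer = optim.SGD(model.parameters(), lr=0.1)  # an optimizer from torch.optim
loss_fn = nn.MSELoss()

for step in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()              # autograd computes the gradients
    optimizer.step()             # the optimizer updates the parameters

# The weight should end up close to 2 and the bias close to 0
print(model.weight.item(), model.bias.item())
```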

We will cover these in the next notebooks!