# Intro to PyTorch


## General

> An open source machine learning framework that accelerates the path from research prototyping to production deployment.

- Mostly used to create neural networks
- Designed to take advantage of hardware acceleration (GPU, TPU, Tensor Cores etc.)
- __API based on `numpy`__ (`torch.Tensor` instead of `np.array`)

```python
import numpy as np
import torch
# `F` is a common alias for torch.nn.functional;
# many of its functions (e.g. relu) are also available directly on torch
import torch.nn.functional as F

X = torch.rand(300, 10)  # random uniform on [0, 1)
print(type(X))

W = torch.randn(10)  # random normal (mean 0, std 1)
b = torch.tensor([1.0])  # bias: a one-element tensor holding 1.0 (not random)

y = X @ W + b  # matrix-vector product; the bias is broadcast over all 300 rows

print(y.dtype, y.shape)
```

```
<class 'torch.Tensor'>
torch.float32 torch.Size([300])
```
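Because the API mirrors `numpy`, many operations carry over almost name for name. The sketch below runs the same computation in both libraries; the particular functions (`arange`, `reshape`, `@`, `sum`) are just illustrative picks. Note that `numpy` reductions take `axis=` while the idiomatic PyTorch keyword is `dim=`.

```python
import numpy as np
import torch

a_np = np.arange(6, dtype=np.float32).reshape(2, 3)
a_t = torch.arange(6, dtype=torch.float32).reshape(2, 3)

# Same computation: matrix times its transpose, then column sums
np_result = (a_np @ a_np.T).sum(axis=0)
t_result = (a_t @ a_t.T).sum(dim=0)

print(np_result)
print(t_result)
```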


## Data types

Like `numpy`, PyTorch provides multiple data types, for example:

- `torch.float` (32-bit precision)
- `torch.double` (64-bit precision)
- `torch.half` (16-bit precision)

and many others (see [here](https://pytorch.org/docs/stable/tensor_attributes.html)).

> Usually we will use floating point values (either `float` or `half`), depending on context

### Why not double?

> The default `dtype` in PyTorch is `float` because __it takes half the memory of `double` and is accurate enough__

Also, GPU memory is costly (and usually scarce), hence lower precision (up to a certain point) might be a good solution.
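To make the memory argument concrete, the sketch below allocates one million elements in each of the three dtypes listed above and reports bytes per element via `element_size()`. The element count is an arbitrary choice for illustration.

```python
import torch

print(torch.get_default_dtype())  # the dtype used when none is specified

n = 1_000_000
for dtype in (torch.double, torch.float, torch.half):
    t = torch.empty(n, dtype=dtype)
    # element_size() returns the number of bytes used by one element
    megabytes = t.element_size() * t.nelement() / 1e6
    print(f"{str(dtype):15} {t.element_size()} bytes/element, {megabytes:.0f} MB total")
```

Going from `double` to `float` halves the memory, and `half` halves it again.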

## Casting

One can easily cast PyTorch tensors to desired data types, see below:

```python
import numpy as np

array = np.random.randn(10, 5)
tensor = torch.from_numpy(array)  # keeps numpy's float64 dtype
print(array.dtype, tensor.dtype)
```

```
float64 torch.float64
```


```python
# cast to half type
new_tensor = tensor.half()

new_tensor.dtype
```

```
torch.float16
```

```python
# numpy interoperability (works for tensors on the CPU)
new_tensor.numpy()
```

```
array([[ 2.096  ,  0.3977 ,  1.913  , -0.2268 ,  0.5503 ],
       [-0.838  , -0.693  , -1.229  , -0.961  ,  1.445  ],
       [ 0.5923 , -2.684  ,  0.3794 ,  0.7896 ,  0.0817 ],
       [-0.2156 ,  1.046  , -0.1942 ,  0.1126 , -0.0657 ],
       [-0.899  ,  1.178  ,  0.7314 , -0.4578 ,  1.394  ],
       [ 0.5015 ,  0.747  ,  0.452  , -0.555  ,  1.456  ],
       [-0.2515 ,  1.233  ,  0.753  , -0.01872, -1.093  ],
       [ 0.2747 , -0.95   , -1.289  , -0.1742 , -1.067  ],
       [ 0.1503 ,  1.081  ,  0.4014 , -2.285  ,  0.788  ],
       [ 0.285  ,  1.14   , -0.652  ,  0.97   ,  0.9097 ]], dtype=float16)
```

```python
# upcasting (type promotion): float16 + float64 gives float64
(new_tensor + tensor).dtype
```

```
torch.float64
```
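Two details of casting are worth seeing side by side: `torch.from_numpy` shares memory with the source array (while `torch.tensor` copies it), and `.to(dtype)` is the general form of shortcuts like `.half()`. The sketch also uses `torch.result_type`, which reports the dtype that type promotion would produce for two operands.

```python
import numpy as np
import torch

array = np.zeros(3)                # float64
shared = torch.from_numpy(array)   # shares memory with `array`
copied = torch.tensor(array)       # independent copy

array[0] = 42.0                    # visible through `shared`, not `copied`
print(shared[0].item(), copied[0].item())

# .to(dtype) generalizes .half(), .float(), .double()
as_float = shared.to(torch.float32)
print(as_float.dtype)

# Promotion rule behind the upcasting example: float16 with float64 gives float64
print(torch.result_type(shared.half(), shared))
```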

## Device type

PyTorch can utilize multiple device types. In general:
- we use the CPU for data loading
- we use specialized devices (usually GPU, sometimes TPU) for running the data through the neural network

> TPU support is currently experimental, see the challenges for more info

Let's start by checking if a GPU is available on our machine:

```python
torch.cuda.is_available()
```

```
True
```

Based on this information, we can create a `torch.device` object to use later:

```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(device)
```

```
cuda
```


In the basic training loop below, you will see how this `device` variable lets us write device-agnostic code.
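As a minimal sketch of the idea (it runs on either CPU or GPU): tensors can be created directly on `device` or moved there with `.to(device)`, and they have to come back to the CPU before calling `.numpy()`.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.zeros(3, device=device)  # created directly on the chosen device
b = torch.ones(3).to(device)       # moved there; .to() does not modify the original in place

c = a + b                          # both operands live on the same device

result = c.cpu().numpy()           # .numpy() needs a CPU tensor
print(c.device, result)
```

The same lines work unchanged whether `device` ends up as `cuda` or `cpu`, which is the point of device-agnostic code.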

## Automatic differentiation

In order for neural networks to learn, we need to calculate gradients of the `loss` w.r.t. the parameters (like we did with linear regression previously).

> This time, the differentiation graph (__sometimes also called a tape__) is built for us by PyTorch

![](./images/grad.jpg)

To use PyTorch's [autograd](https://pytorch.org/docs/stable/autograd.html) we need a few changes in the above code.

First, we have to mark tensors which require gradient using `requires_grad=True` argument during creation:

> Most PyTorch functions that create tensors, like `rand`, `randn` etc., accept `requires_grad` as an optional argument!

```python
W = torch.randn(10, requires_grad=True)
b = torch.tensor([1.], requires_grad=True)
```

> Only tensors of floating point (or complex) data type can require gradients! __No integers or the like__
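A quick sketch of this restriction: asking for gradients on an integer tensor fails at creation time with a `RuntimeError`, while the same call with a floating point dtype succeeds.

```python
import torch

ok = torch.zeros(3, requires_grad=True)  # float32 by default: allowed
print(ok.requires_grad)

try:
    torch.zeros(3, dtype=torch.long, requires_grad=True)  # int64: not allowed
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```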

After that we can use them normally:

```python
y = X @ W + b

loss = y.sum()
```

## Running backpropagation

As we did in "Gradient Methods", we can run the backpropagation algorithm explicitly.

> In PyTorch we run backpropagation __on a tensor__ (usually the loss) by calling its `backward()` method

```python
print(W.grad, b.grad)  # no backward pass yet, so both are None

loss.backward()

# Gradients are now stored in the .grad attribute of the leaf tensors
print(W.grad, b.grad)
```

```
None None
tensor([147.2227, 142.4924, 138.7641, 148.3273, 153.3686, 148.1476, 156.7694,
        146.8128, 155.7344, 154.1845]) tensor([300.])
```
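These numbers can be checked by hand. With `loss = (X @ W + b).sum()`, the gradient w.r.t. each `W[j]` is the sum of column `j` of `X`, and the gradient w.r.t. `b` is the number of rows (300). Since `X` is uniform on [0, 1), each column sum is roughly 150, matching the output above. A self-contained sketch of that check:

```python
import torch

X = torch.rand(300, 10)
W = torch.randn(10, requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

loss = (X @ W + b).sum()
loss.backward()

# Analytic gradients: d(loss)/dW_j = sum_i X_ij, d(loss)/db = number of rows
print(torch.allclose(W.grad, X.sum(dim=0)))
print(b.grad)
```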


> If the tensor is a scalar, `backward()` implicitly uses `1` as the initial gradient

> If the tensor is not a scalar, you have to pass an initial gradient tensor of the same shape (the `gradient` argument), see [here](https://pytorch.org/docs/stable/autograd.html#torch.autograd.backward)
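For a non-scalar output, passing a gradient of ones is the same as summing first and calling `backward()` on the scalar. The sketch below verifies that equivalence, resetting `W.grad` between the two passes so the results don't accumulate.

```python
import torch

X = torch.rand(300, 10)
W = torch.randn(10, requires_grad=True)

y = X @ W  # shape (300,), not a scalar; y.backward() alone would raise a RuntimeError

y.backward(gradient=torch.ones_like(y))  # explicit initial gradient
grad_from_vector = W.grad.clone()

W.grad = None                            # reset before the second pass
(X @ W).sum().backward()                 # scalar output: implicit gradient of 1

print(torch.allclose(grad_from_vector, W.grad))
```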

## grad_fn, how PyTorch keeps track of operations

- __PyTorch stores the function that created a tensor (if any) in its `grad_fn`__ attribute
- if `grad_fn` is `None`, the tensor is a __leaf__ (`is_leaf` is `True`), for example one that:
    - was created by the user explicitly (with `requires_grad` set to either `True` or `False`)

See below:

```python
print(y.grad_fn, y.is_leaf, y.requires_grad)

print(W.grad_fn, W.is_leaf, W.requires_grad)

print(X.grad_fn, X.is_leaf, X.requires_grad)
```

```
<AddBackward0 object at 0x7f545fcafdf0> False True
None True True
None True False
```
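A `grad_fn` is only recorded when at least one input requires gradients. The sketch below contrasts the two cases and also shows `torch.no_grad()`, a context manager that disables recording entirely (handy for inference).

```python
import torch

a = torch.randn(3, requires_grad=True)  # user-created leaf, tracked
c = torch.randn(3)                      # user-created leaf, not tracked

d = a * 2  # input requires grad -> result has a grad_fn
e = c * 2  # no input requires grad -> nothing is recorded

print(d.grad_fn, d.is_leaf)  # <MulBackward0 object at ...> False
print(e.grad_fn, e.is_leaf)  # None True

with torch.no_grad():
    f = a * 2  # not recorded, even though `a` requires grad
print(f.grad_fn, f.requires_grad)  # None False
```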


## torch.nn.Module

> [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) is the base class for every deep learning model in PyTorch (usually a neural network)

Hence, we will inherit from it each time we create a more complex module.

Let's see how we can code up linear regression:

```python
import torch.nn as nn  # also a common alias


class LinearRegression(nn.Module):
    def __init__(self, n_features: int):
        # Must be called first: it sets up the internal registries in which
        # the parameters and submodules assigned below are recorded
        super().__init__()

        self.W = nn.Parameter(torch.randn(n_features))
        self.b = nn.Parameter(torch.ones(1))
        self.other_tensor = torch.randn(5)  # plain tensor, NOT a parameter

    def forward(self, X):
        return X @ self.W + self.b
```

### torch.nn.Parameter

> If we want a tensor to be a trainable part of an `nn.Module`, we have to wrap it in `nn.Parameter`

Let's see what parameters our model currently has:

```python
model = LinearRegression(15)

# named_parameters() yields (name, parameter) pairs; parameters() yields only the tensors
for name, parameter in model.named_parameters():
    print(name, parameter.shape)
```

```
W torch.Size([15])
b torch.Size([1])
```


As one can see, `self.other_tensor` __is not registered as a parameter__.

This means it is not returned by `model.parameters()` (so the optimizer won't update it) and it is not moved by `model.to(device)`; it is "merely an attribute".
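If a tensor should be part of the module's state (saved in `state_dict()` and moved by `.to(device)`) without being trained, `nn.Module.register_buffer` is the usual tool. The module `WithBuffer` and the buffer name `running_mean` below are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn


class WithBuffer(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.W = nn.Parameter(torch.randn(5))                 # trainable parameter
        self.register_buffer("running_mean", torch.zeros(5))  # state, not trained
        self.other_tensor = torch.randn(5)                    # plain attribute


module = WithBuffer()
print([name for name, _ in module.named_parameters()])
print(list(module.state_dict().keys()))
```

Only `W` is a parameter; `state_dict()` holds `W` and `running_mean`, while `other_tensor` appears in neither.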

## forward method

Users should implement the logic of the model (how data flows through the neural network) inside this method.

> When running data through our model, we call the model itself (`model(X)`, which invokes `__call__`) instead of `model.forward(X)`. __This ensures any hooks registered for the module run correctly__

```python
output = model(torch.randn(64, 15))

output.shape
```

```
torch.Size([64])
```
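The difference between `model(X)` and `model.forward(X)` shows up as soon as a hook is registered. In this sketch, `log_shape` is a hypothetical forward hook attached with `register_forward_hook`; it fires only on the `__call__` path.

```python
import torch
import torch.nn as nn

layer = nn.Linear(15, 1)
calls = []


def log_shape(module, inputs, output):  # hypothetical hook: records output shapes
    calls.append(tuple(output.shape))


handle = layer.register_forward_hook(log_shape)

x = torch.randn(64, 15)
layer(x)          # goes through __call__: the hook runs
layer.forward(x)  # bypasses __call__: the hook is skipped

print(calls)      # [(64, 1)]
handle.remove()   # detach the hook when it is no longer needed
```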

## Input shape

> Most PyTorch modules expect the batch dimension first: `(batch_size, n_features_1, ..., n_features_N)`

In the case above, the batch size was `64`, with `15` input features to linear regression.
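To keep the batch-first convention for a single example, one can add a leading dimension of size 1 with `unsqueeze(0)`. A minimal sketch using `nn.Linear` with the same 15 input features:

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=15, out_features=1)

batch = torch.randn(64, 15)  # 64 samples, 15 features each
print(layer(batch).shape)    # torch.Size([64, 1])

sample = torch.randn(15)     # one sample without a batch dimension
batched = sample.unsqueeze(0)
print(batched.shape)         # torch.Size([1, 15])
print(layer(batched).shape)  # torch.Size([1, 1])
```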

## Exercise

- Recreate linear regression using [`torch.nn.Linear`](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) (single line)
- Create __LogisticRegression__ using `torch.nn.Linear` inside an `nn.Module` (simply assign the layer to `self`). It should take `in_features` and `out_features` and return the output of the `nn.Linear` layer.

```python
linear_regression = ...

class LogisticRegression(...):
    ...
```

## Basic training loop

Let's assume we already have our data in place (we will use `torch.randn` as input; __we will talk about data and datasets in the next chapter__).

Below is a "standard" (skeleton) training loop for regression tasks.

> __Loop like this is prevalent when using "pure" PyTorch__

```python
model = model.to(device)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Dummy data
X, y = torch.randn(64, 15), torch.randn(64)
X, y = X.to(device), y.to(device)

for epoch in range(20):
    outputs = model(X)
    loss = criterion(outputs, y)

    # Perform backpropagation
    loss.backward()

    # Perform optimization step & zero-out gradient
    optimizer.step()
    optimizer.zero_grad()

    # .item() extracts the Python number from the one-element loss tensor
    print(f"EPOCH: {epoch} | LOSS: {loss.item()}")
```
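To check that such a loop actually learns, the sketch below trains the same `LinearRegression` on hypothetical synthetic data whose targets are an exact linear function of `X`, so the loss should fall steadily. It places `optimizer.zero_grad()` at the start of each iteration, a common alternative to calling it after `step()`; both clear the gradients between updates. The seed, the 100 epochs and the generated data are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducibility of this sketch


class LinearRegression(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(n_features))
        self.b = nn.Parameter(torch.ones(1))

    def forward(self, X):
        return X @ self.W + self.b


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical synthetic data: y is exactly linear in X
X = torch.randn(64, 15)
true_W = torch.randn(15)
y = X @ true_W + 0.5
X, y = X.to(device), y.to(device)

model = LinearRegression(15).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(100):
    optimizer.zero_grad()  # clear gradients left from the previous step
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.6f}")
```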

## Training loop, step by step

- __Both the data and the module have to be moved (cast) to the same device__
- Loss is calculated using the specified "criterion", in our case Mean Squared Error
- `backward()` is run on the resulting loss tensor
- The optimizer takes a step, then the gradients have to be zeroed out before the next optimization step (otherwise they would accumulate)
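The accumulation mentioned in the last point is easy to observe. In this sketch, `w` is a hypothetical two-element parameter: calling `backward()` twice without zeroing adds the gradients, and `optimizer.zero_grad()` clears them.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # hypothetical parameter
optimizer = torch.optim.SGD([w], lr=0.1)

# Without zero_grad: each backward() ADDS into w.grad
seen = []
for step in range(2):
    loss = (w * 3).sum()  # d(loss)/dw = [3, 3] on every pass
    loss.backward()
    seen.append(w.grad.tolist())
print(seen)  # [[3.0, 3.0], [6.0, 6.0]]

# With zero_grad: the stale gradient is cleared before the next backward pass
optimizer.zero_grad()
(w * 3).sum().backward()
print(w.grad)  # tensor([3., 3.])
```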

## Summary

- PyTorch can be thought of as `numpy` on the GPU, with tools for building neural networks
- PyTorch provides different data types:
    - `float` is a good default (a good balance between necessary precision, performance and memory usage)
- PyTorch can run on different devices:
    - GPU is used for running neural networks
    - CPU is used for data loading and other preprocessing tasks
- We should write device-agnostic code (basics shown here)
- We should inherit from `torch.nn.Module` when creating neural networks
    - Override `__init__` and `forward`
    - Run the model by calling it (`model(X)`, i.e. `__call__`), not via `forward` directly
- Basic PyTorch training loop consists of:
    - Setting up criterion
    - Moving model and data to device
    - Backpropagating loss
    - Taking optimizer step
    - Zeroing out gradients in the model using `optimizer.zero_grad()`


## Challenges

- Use your created logistic regression in this loop. Change `y` to be
- What is "automatic mixed precision"? Check out [this part of the documentation](https://pytorch.org/docs/stable/notes/amp_examples.html). When should one use it?
- Read more about [TPUs](https://medium.com/pytorch/get-started-with-pytorch-cloud-tpus-and-colab-a24757b8f7fc) provided by Google. Which neural networks are likely to get performance improvements with this device?
- Check out [CUDA semantics](https://pytorch.org/docs/stable/notes/cuda.html) in PyTorch. How do you choose a specific GPU device if there are several?
- What are the aforementioned hooks? Check out [this article](https://medium.com/the-dl/how-to-use-pytorch-hooks-5041d777f904) for more information
- Check the available functions and attributes of [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)