<center><img src="img/torch.png" alt="drawing" width="300"/></center>

[PyTorch](https://github.com/pytorch/pytorch) is a free and open-source machine learning framework for Python that facilitates building deep learning projects. It was originally developed by Meta AI and is now part of the Linux Foundation umbrella. It emphasizes flexibility and allows deep learning models to be expressed in idiomatic Python. This approachability and ease of use found early adopters in the research community, and in the years since its first release, it has grown into one of the most prominent deep learning tools across a broad range of applications.

It provides two high-level features:
* Tensor computation (like NumPy) with strong GPU acceleration.
* Deep neural networks built on a tape-based autograd system.

As Python does for programming, PyTorch provides an excellent introduction to deep learning. It minimizes cognitive overhead while focusing on flexibility and speed. It also defaults to immediate execution for operations. At the same time, PyTorch has been proven to be fully qualified for use in professional contexts for real-world, high-profile work.
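As a rough illustration of immediate (eager) execution and GPU acceleration, the sketch below picks a device and runs a small computation. The variable names are illustrative, and the code falls back to the CPU when no CUDA device is visible.

```python
import torch

# Pick a GPU if one is visible to PyTorch, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Operations run immediately: `y` holds concrete values as soon as
# this line finishes, with no separate compile or session step.
x = torch.ones(2, 2, device=device)
y = x * 3 + 1

print(y.device.type)  # "cuda" or "cpu", depending on the machine
print(y.cpu())        # every entry equals 4.0
```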

The PyTorch ecosystem contains some useful libraries and submodules that we will be using throughout the notebooks. Here are some of the most important ones:

* **torch**: The core PyTorch library.
* **torch.nn**: Submodule designed to help create and train NNs. It allows easy prototyping and the building of complex models in just a few lines of code.
* **torch.utils**: Utilities submodule for PyTorch.
* **torchmetrics**: Metrics library for PyTorch.
* **torchviz**: Model visualization library.
* **torchinfo**: Library for summarizing a PyTorch model.
* **torchvision**: Computer vision library for PyTorch.
* **torchtext**: Text library for PyTorch.

In [1]:
import numpy as np
import torch

In [2]:
print(f'Torch Version: {torch.__version__}')

Torch Version: 2.1.0


# Tensors

## Creating Tensors

In [3]:
# Scalar (rank 0 tensor)
scalar = torch.tensor(7)
scalar

tensor(7)

In [4]:
# Vector (rank 1 tensor)
vector = torch.tensor([7,7])
vector

tensor([7, 7])

In [5]:
# 2D-matrix (rank 2 tensor)
matrix_2d = torch.tensor([[7,8],
                          [9,10]])
matrix_2d 

tensor([[ 7,  8],
        [ 9, 10]])

In [6]:
# 3D-matrix (rank 3 tensor)
matrix_3d = torch.tensor([[[1,2,3],
                           [3,6,9],
                           [2,4,5]]])
matrix_3d

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

### Zeros And Ones

In [7]:
# Tensor of given dimensions filled with 0
zeros = torch.zeros(2,3,5)
zeros

tensor([[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]])

In [8]:
# Tensor of given dimensions filled with 1
ones = torch.ones(2,3,5)
ones

tensor([[[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]],

        [[1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.],
         [1., 1., 1., 1., 1.]]])

### Random Tensors

In [9]:
# Random tensor of specific size
random_tensor = torch.rand(size=(1,3,3))
random_tensor

tensor([[[0.7107, 0.2321, 0.0438],
         [0.3593, 0.3914, 0.0352],
         [0.0820, 0.4086, 0.1291]]])
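Random tensors differ from run to run. When reproducible values are needed, the default generator can be seeded with `torch.manual_seed`. This is a minimal sketch; the seed value `42` is arbitrary.

```python
import torch

torch.manual_seed(42)               # fix the seed of the default generator
first = torch.rand(size=(1, 3, 3))

torch.manual_seed(42)               # reset to the same seed
second = torch.rand(size=(1, 3, 3))

# Both draws start from the same generator state, so they match exactly.
print(torch.equal(first, second))   # True
```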

### Ranges

In [10]:
# Tensor of a given range
arange = torch.arange(start=0, end=10, step=2)
arange

tensor([0, 2, 4, 6, 8])

### NumPy

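We can convert PyTorch tensors to NumPy arrays and vice versa. For CPU tensors, both `torch.from_numpy()` and `Tensor.numpy()` share memory with their source rather than copying it.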

In [11]:
# Numpy array
numpy_array = np.arange(10)
numpy_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
# Numpy array to torch tensor
torch_tensor = torch.from_numpy(numpy_array)
torch_tensor

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
# Torch tensor to numpy array (idiomatic method call)
numpy_array = torch_tensor.numpy()
numpy_array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
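One detail worth seeing in action: `torch.from_numpy` does not copy the data, so the tensor and the array share memory, while `torch.tensor` makes an independent copy. A small sketch with illustrative variable names:

```python
import numpy as np
import torch

numpy_array = np.arange(5)
shared = torch.from_numpy(numpy_array)  # shares memory with numpy_array
copied = torch.tensor(numpy_array)      # independent copy of the data

numpy_array[0] = 100                    # modify the NumPy array in place

print(shared[0].item())  # 100: the change is visible through the tensor
print(copied[0].item())  # 0: the copy is unaffected
```

If the two objects should not affect each other, copying explicitly (for example with `torch.tensor(...)` or `.clone()`) avoids surprises.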

## Indexing Tensors

We can **index** tensors just like we do in Python lists or NumPy arrays.

In [14]:
# Get first and second value of vector
vector[0], vector[1]

(tensor(7), tensor(7))

In [15]:
# Get the actual value instead of a tensor.
vector[0].item()

7

In [16]:
# Chained indexing selects rows 1-2 of the first matrix (matrix_3d[0, 1:3] is equivalent)
matrix_3d[0][:][1:3]

tensor([[3, 6, 9],
        [2, 4, 5]])
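Chained brackets and comma-separated indices are easy to confuse, and they do not always select the same elements. The sketch below contrasts the two on the same `matrix_3d` values; the names `rows` and `cols` are illustrative.

```python
import torch

matrix_3d = torch.tensor([[[1, 2, 3],
                           [3, 6, 9],
                           [2, 4, 5]]])

# Chained indexing: each bracket applies to the result of the previous one.
# matrix_3d[0] is the 3x3 matrix, [:] keeps it unchanged, [1:3] picks rows 1-2.
rows = matrix_3d[0][:][1:3]

# Comma-separated indexing: one index per dimension.
# 0 picks the first matrix, : keeps every row, 1:3 picks columns 1-2.
cols = matrix_3d[0, :, 1:3]

print(rows.tolist())  # [[3, 6, 9], [2, 4, 5]]
print(cols.tolist())  # [[2, 3], [6, 9], [4, 5]]
```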

## Attributes Of Tensors

We can access certain **attributes** of tensors as follows.

### Data Type: `dtype`

The `dtype` argument (deliberately similar to the standard NumPy argument of the same name) specifies the numerical data type that will be contained in the tensor. The data type specifies the possible values the tensor can hold (integers versus floating point numbers) and the number of bytes per value.

Here’s a list of commonly used values for the `dtype` argument:
- `torch.float32` or `torch.float`: 32-bit floating-point
- `torch.float64` or `torch.double`: 64-bit, double-precision floating-point 
- `torch.float16` or `torch.half`: 16-bit, half-precision floating-point
- `torch.int8`: signed 8-bit integers
- `torch.uint8`: unsigned 8-bit integers
- `torch.int16` or `torch.short`: signed 16-bit integers
- `torch.int32` or `torch.int`: signed 32-bit integers
- `torch.int64` or `torch.long`: signed 64-bit integers
- `torch.bool`: Boolean

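The default data type for floating-point tensors is 32-bit floating-point `torch.float32`. When a tensor is created from Python integers, as with `matrix_3d`, PyTorch infers `torch.int64` instead.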

In [17]:
# Get the data type of tensor
matrix_3d.dtype

torch.int64

In [18]:
# Change from one data type to another
matrix_3d = matrix_3d.to(torch.float64)
matrix_3d.dtype

torch.float64
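To make the inference rules concrete, here is a short sketch showing which `dtype` PyTorch picks for different inputs and how many bytes each element takes. It assumes the default floating-point dtype has not been changed with `torch.set_default_dtype`.

```python
import torch

ints = torch.tensor([1, 2, 3])                            # inferred from Python ints
floats = torch.tensor([1.0, 2.0, 3.0])                    # inferred from Python floats
explicit = torch.tensor([1, 2, 3], dtype=torch.float16)   # set explicitly

print(ints.dtype, floats.dtype, explicit.dtype)
# torch.int64 torch.float32 torch.float16

print(torch.get_default_dtype())  # torch.float32 unless it was changed

# Bytes per element for each tensor
print(ints.element_size(), floats.element_size(), explicit.element_size())
# 8 4 2
```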

### Rank (Number Of Dimensions): `ndim`

The `ndim` attribute tells us the rank, or number of dimensions, of a tensor.

<center><img src="img/torch_01_01.png" alt="drawing" width="400"/></center>

In [19]:
# Rank (number of dimensions) of a tensor
scalar.ndim, vector.ndim, matrix_2d.ndim, matrix_3d.ndim

(0, 1, 2, 3)

### Shape (Number Of Elements): `shape`

Attribute `shape` informs us about the size of the tensor along each dimension.

In [28]:
# Number of elements for each dimension of a tensor
scalar.shape, vector.shape, matrix_2d.shape, matrix_3d.shape

(torch.Size([]), torch.Size([2]), torch.Size([2, 2]), torch.Size([1, 3, 3]))
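Rank, shape, and total element count are related but distinct. The following sketch, using an illustrative tensor `t`, shows how to read individual dimension sizes and the total number of elements.

```python
import torch

t = torch.zeros(2, 3, 5)

print(t.ndim)                                # 3
print(t.shape)                               # torch.Size([2, 3, 5])
print(t.shape[0], t.size(1), t.shape[-1])    # 2 3 5
print(t.numel())                             # 30, the product 2 * 3 * 5
```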

Here is an illustration of what `matrix_3d.shape` represents.

<center><img src="img/torch_01_02.png" alt="drawing" width="500"/></center>

## Manipulating Tensors


As we will see in upcoming sections, certain operations require that the input tensors have a certain number of dimensions (rank) associated with a certain number of elements (shape). Thus, we might need to change the shape of a tensor, add a new dimension, or squeeze an unnecessary dimension. PyTorch provides useful functions (or operations) to achieve this:

* **Transpose**: Transposes a tensor.
* **Reshape**: Reshapes an input tensor to a defined shape.
* **View**: Returns a view of a tensor with a given shape that shares memory with the original tensor, so changing the view also changes the original.
* **Squeeze**: Removes all dimensions of size `1` from a tensor.
* **Unsqueeze**: Adds a dimension of size `1` to a target tensor at a given position.

In [29]:
a = torch.tensor([[1,2,3],[3,4,5]])
a

tensor([[1, 2, 3],
        [3, 4, 5]])

In [30]:
# Transpose tensor
a.transpose(dim0=0, dim1=1)

tensor([[1, 3],
        [2, 4],
        [3, 5]])

In [31]:
# Reshape tensor from 2x3 to 3x2
a.reshape(3,2)

tensor([[1, 2],
        [3, 3],
        [4, 5]])

In [32]:
# Change the view of tensor (use same memory)
a.view(3,2)

tensor([[1, 2],
        [3, 3],
        [4, 5]])
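The outputs of `reshape` and `view` look identical here, so the difference is easy to miss. The sketch below, with illustrative names, shows that a view writes through to the original tensor, and that `view` fails on a non-contiguous tensor where `reshape` still works by copying.

```python
import torch

a = torch.tensor([[1, 2, 3], [3, 4, 5]])

v = a.view(3, 2)      # no copy: v and a share the same storage
v[0, 0] = 100         # writing through the view ...
print(a[0, 0].item()) # ... changes the original: 100

# A transposed 2x3 tensor is not contiguous in memory, so `view`
# cannot flatten it, while `reshape` copies when it has to.
t = a.transpose(0, 1)
print(t.is_contiguous())  # False
try:
    t.view(6)
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = t.reshape(6)
print(flat.tolist())  # [100, 3, 2, 4, 3, 5]
```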

In [33]:
# Unsqueeze tensor
unsqueezed_a = a.unsqueeze(1)
unsqueezed_a

tensor([[[1, 2, 3]],

        [[3, 4, 5]]])

In [34]:
# Squeeze tensor
unsqueezed_a.squeeze()

tensor([[1, 2, 3],
        [3, 4, 5]])
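Both `squeeze` and `unsqueeze` accept a dimension index, which is often safer than removing every size-1 dimension at once. A brief sketch on an illustrative tensor `x`:

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)      # torch.Size([3]): every size-1 dimension removed
print(x.squeeze(0).shape)     # torch.Size([3, 1]): only dimension 0 removed
print(x.squeeze(1).shape)     # torch.Size([1, 3, 1]): dimension 1 has size 3, unchanged
print(x.unsqueeze(0).shape)   # torch.Size([1, 1, 3, 1]): new leading dimension
print(x.unsqueeze(-1).shape)  # torch.Size([1, 3, 1, 1]): new trailing dimension
```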

## Tensor Operations

Applying mathematical operations, in particular linear algebra operations, is necessary for building most machine learning models. In this subsection, we will cover some widely used linear algebra operations.

In [35]:
tensor = torch.tensor([[10, 7], [3, 4]])

In [36]:
# Addition, Multiplication, Subtraction, Division, Power
tensor + 10, tensor * 10, tensor - 10, tensor / 10, tensor ** 10

(tensor([[20, 17],
         [13, 14]]),
 tensor([[100,  70],
         [ 30,  40]]),
 tensor([[ 0, -3],
         [-7, -6]]),
 tensor([[1.0000, 0.7000],
         [0.3000, 0.4000]]),
 tensor([[10000000000,   282475249],
         [      59049,     1048576]]))
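Two details of the output above deserve a closer look: dividing an integer tensor with `/` produces a floating-point result, and the scalar `10` is broadcast to every element. The sketch below shows integer division alternatives and broadcasting against a row vector; the variable names are illustrative.

```python
import torch

tensor = torch.tensor([[10, 7], [3, 4]])

true_div = tensor / 10                                    # true division, float result
floor_div = tensor // 10                                  # floor division, stays integer
trunc_div = torch.div(tensor, 10, rounding_mode="trunc")  # rounds toward zero

print(true_div.dtype)      # torch.float32
print(floor_div.tolist())  # [[1, 0], [0, 0]]
print(trunc_div.tolist())  # [[1, 0], [0, 0]]

# Broadcasting: the row [1, 2] is added to every row of the 2x2 tensor.
broadcast = tensor + torch.tensor([1, 2])
print(broadcast.tolist())  # [[11, 9], [4, 6]]
```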

We can also use the following functions for the basic operations.

In [38]:
a = torch.tensor([[3, 6]], dtype=torch.float32)
b = torch.tensor([[2, 2]], dtype=torch.float32)

In [39]:
# Element-wise addition
addition = torch.add(a, b) # a.add(b) also works

# Element-wise multiplication
multiplication = torch.multiply(a, b) # a.multiply(b) also works

# Matrix multiplication (transpose needed to match tensor dimensions)
matrix_multiplication = torch.matmul(a, b.transpose(0, 1)) # a.matmul(b.transpose(0, 1)) also works

# Element-wise division
division = torch.divide(a, b) # a.divide(b) also works

# Element-wise exponentiation
power = torch.pow(a, b) # a.pow(b) also works

addition, multiplication, matrix_multiplication, division, power

(tensor([[5., 8.]]),
 tensor([[ 6., 12.]]),
 tensor([[18.]]),
 tensor([[1.5000, 3.0000]]),
 tensor([[ 9., 36.]]))
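The same operations are available as operators, and the distinction between element-wise and matrix multiplication is worth spelling out. A minimal sketch using the same values of `a` and `b`; the names `elementwise`, `inner`, and `outer` are illustrative.

```python
import torch

a = torch.tensor([[3., 6.]])  # shape (1, 2)
b = torch.tensor([[2., 2.]])  # shape (1, 2)

elementwise = a * b   # same as torch.multiply(a, b); shape (1, 2)
inner = a @ b.T       # same as torch.matmul(a, b.transpose(0, 1)); shape (1, 1)
outer = a.T @ b       # (2, 1) @ (1, 2) gives shape (2, 2)

print(elementwise.tolist())  # [[6.0, 12.0]]
print(inner.tolist())        # [[18.0]]
print(outer.tolist())        # [[6.0, 6.0], [12.0, 12.0]]
```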

Here are some more basic operations.

In [40]:
# Square
square = torch.square(a) # a.square() also works

# Square root (returns a floating-point tensor)
sqrt = torch.sqrt(a) # a.sqrt() also works

# Natural logarithm (returns a floating-point tensor)
log = torch.log(a) # a.log() also works

# Absolute values
absolute_value = torch.abs(a) # a.abs() also works

# Minimum
minimum = torch.min(a) # a.min() also works

# Minimum element position
argmin = torch.argmin(a) # a.argmin() also works

# Maximum
maximum = torch.max(a) # a.max() also works

# Maximum element position
argmax = torch.argmax(a) # a.argmax() also works

# Mean
mean = torch.mean(a) # a.mean() also works

# Sum
summation = torch.sum(a) # a.sum() also works

square, sqrt, log, absolute_value, minimum, argmin, maximum, argmax, mean, summation

(tensor([[ 9., 36.]]),
 tensor([[1.7321, 2.4495]]),
 tensor([[1.0986, 1.7918]]),
 tensor([[3., 6.]]),
 tensor(3.),
 tensor(0),
 tensor(6.),
 tensor(1),
 tensor(4.5000),
 tensor(9.))
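The reductions above collapse the whole tensor to a single value. Most of them also accept a `dim` argument to reduce along one dimension only, as this sketch with an illustrative matrix `m` shows.

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

print(m.sum().item())                     # 21.0: reduces over all elements
print(m.sum(dim=0).tolist())              # [5.0, 7.0, 9.0]: one sum per column
print(m.mean(dim=1).tolist())             # [2.0, 5.0]: one mean per row
print(m.sum(dim=1, keepdim=True).shape)   # torch.Size([2, 1]): reduced dim kept

# With `dim`, max returns both the values and their indices.
values, indices = m.max(dim=1)
print(values.tolist(), indices.tolist())  # [3.0, 6.0] [2, 2]
```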

## Splitting & Combining Tensors

In this subsection, we will cover PyTorch operations for splitting a tensor into multiple tensors, or the reverse: stacking and concatenating multiple tensors into a single one.

Assume that we have a single tensor, and we want to split it into two or more tensors. For this, PyTorch provides a convenient `torch.chunk()` function, which divides an input tensor into a tuple of tensors. We specify the desired number of splits as an integer using the `chunks` argument, and the dimension to split along using the `dim` argument. If the size of the input tensor along that dimension is divisible by the number of splits, all chunks have the same size. Otherwise, the last chunk is smaller, and the function may return fewer chunks than requested.

In [41]:
t = torch.rand(6)
t

tensor([0.7484, 0.2492, 0.5632, 0.8910, 0.5715, 0.1486])

In [42]:
# Chunk tensor
torch.chunk(t, 3)

(tensor([0.7484, 0.2492]), tensor([0.5632, 0.8910]), tensor([0.5715, 0.1486]))
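To see how `torch.chunk` behaves when the size is not evenly divisible, and how the `dim` argument works on a 2D tensor, consider this sketch. It uses `torch.arange` inputs (illustrative) so the results are easy to read.

```python
import torch

t = torch.arange(7)

# 7 elements in 3 chunks: chunk size is ceil(7 / 3) = 3, the last chunk is shorter.
uneven = [c.tolist() for c in torch.chunk(t, 3)]
print(uneven)  # [[0, 1, 2], [3, 4, 5], [6]]

# 6 elements, 4 chunks requested: chunk size is ceil(6 / 4) = 2,
# so only 3 chunks come back.
fewer = torch.chunk(torch.arange(6), 4)
print(len(fewer))  # 3

# Chunking a 2D tensor along its columns (dim=1).
m = torch.arange(12).reshape(3, 4)
left, right = torch.chunk(m, 2, dim=1)
print(left.shape, right.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
```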

Alternatively, we can provide the desired sizes in a list using the `torch.split()` function.

In [43]:
# Split tensor
torch.split(t, [4,2])

(tensor([0.7484, 0.2492, 0.5632, 0.8910]), tensor([0.5715, 0.1486]))
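`torch.split` also accepts a single integer, which is then interpreted as the size of each piece rather than the number of pieces. A short sketch with illustrative inputs:

```python
import torch

t = torch.arange(7)

# An integer argument is the size of each piece; the last piece may be shorter.
parts = [p.tolist() for p in torch.split(t, 3)]
print(parts)  # [[0, 1, 2], [3, 4, 5], [6]]

# A list of sizes must add up to the size of the chosen dimension (here 1 + 3 = 4).
m = torch.arange(12).reshape(3, 4)
first, rest = torch.split(m, [1, 3], dim=1)
print(first.shape, rest.shape)  # torch.Size([3, 1]) torch.Size([3, 3])
```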

Sometimes, we are working with multiple tensors and need to concatenate or stack them to create a single tensor. In this case, PyTorch functions such as `torch.cat()` and `torch.stack()` come in handy.

In [44]:
A = torch.ones(3)
B = torch.zeros(3)
A, B

(tensor([1., 1., 1.]), tensor([0., 0., 0.]))

In [45]:
# Concatenate tensors together
torch.cat([A, B], dim=0)

tensor([1., 1., 1., 0., 0., 0.])

In [46]:
# Stack tensors together
torch.stack([A, B], dim=1)

tensor([[1., 0.],
        [1., 0.],
        [1., 0.]])
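The key difference is that `torch.cat` joins tensors along an existing dimension, while `torch.stack` creates a new one. Comparing the resulting shapes makes this visible; the sketch reuses `A` and `B` with the same values.

```python
import torch

A = torch.ones(3)
B = torch.zeros(3)

print(torch.cat([A, B], dim=0).shape)    # torch.Size([6]): same rank, longer
print(torch.stack([A, B], dim=0).shape)  # torch.Size([2, 3]): new leading dimension
print(torch.stack([A, B], dim=1).shape)  # torch.Size([3, 2]): new trailing dimension

# Stacking is equivalent to unsqueezing each tensor and then concatenating.
stacked = torch.stack([A, B], dim=0)
manual = torch.cat([A.unsqueeze(0), B.unsqueeze(0)], dim=0)
print(torch.equal(stacked, manual))      # True
```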

## Computation Graphs
PyTorch performs its computations based on a Directed Acyclic Graph (DAG). It uses this computation graph to derive relationships between tensors from the input all the way to the output. The computation graph is simply a network of nodes. Each node represents an operation, which applies a function to its input tensor or tensors and returns zero or more tensors as the output. PyTorch builds this computation graph and uses it to compute the gradients accordingly.

<center><img src="img/torch_01_03.png" alt="drawing" width="250"/></center>

In [47]:
# Create a graph for evaluating z = 2 × (a – b) + c
def compute_z(a, b, c):
    r1 = torch.sub(a, b)
    r2 = torch.mul(r1, 2)
    z = torch.add(r2, c)
    return z

# To carry out the computation, we call `compute_z` with tensor objects as function arguments
compute_z(torch.tensor(1), torch.tensor(2), torch.tensor(3))

tensor(1)

In PyTorch, tensors for which gradients need to be computed allow us to store and update the parameters of our models during training. Such a tensor can be created by setting `requires_grad` to `True` on user-specified initial values.

***Note**: Only tensors of floating-point and complex dtype can require gradients.*

In [48]:
# `requires_grad` is set to `False` by default
a = torch.tensor([1.0, 2.0, 3.0])
print(a.requires_grad)

# `requires_grad` can be set to `True` by running `requires_grad_()`
a.requires_grad_()
print(a.requires_grad)

# Or it can be set to `True` upon construction
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
a

False
True


tensor([1., 2., 3.], requires_grad=True)
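Once inputs require gradients, each result records the operation that produced it, which is how the computation graph becomes visible from Python. The sketch below rebuilds z = 2 × (a - b) + c step by step and inspects the `is_leaf` and `grad_fn` attributes. The exact class name of `grad_fn` is an implementation detail and may vary between versions.

```python
import torch

# Leaf tensors: created directly by the user, with gradient tracking enabled.
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
c = torch.tensor(3.0, requires_grad=True)

r1 = torch.sub(a, b)
r2 = torch.mul(r1, 2)
z = torch.add(r2, c)

print(a.is_leaf, a.grad_fn)      # True None: leaves have no producing operation
print(z.is_leaf)                 # False: z was produced by an operation
print(type(z.grad_fn).__name__)  # name of the node that produced z (an add node)
print(z.requires_grad)           # True: inherited from the inputs
```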

PyTorch supports automatic differentiation, which can be thought of as an implementation of the chain rule for computing gradients of nested functions. When we define a series of operations that results in some output or even intermediate tensors, PyTorch provides a context for calculating gradients of these computed tensors with respect to the nodes they depend on in the computation graph. To compute these gradients, we can call the `backward` method on the resulting tensor (equivalently, the `torch.autograd.backward` function). It computes the sum of gradients of the given tensor with respect to the leaf nodes (terminal nodes) of the graph and accumulates the result in each leaf's `.grad` attribute.

In [49]:
def linear_fn(w,b,x): 
    return torch.add(torch.mul(w, x), b)

In [50]:
# parameters (weight and bias)
w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

# a datapoint (x_0,y_0)
x_0 = torch.tensor([1.4])
y_0 = torch.tensor([2.1])

# compute linear function
y = linear_fn(w=w, b=b, x=x_0)

# compute loss
loss = (y_0 - y).pow(2).sum()

# compute gradient manually by computing derivatives
dloss_dw = (2 * (y_0 - y) * (-x_0)).sum()
dloss_db = (2 * (y_0 - y) * (-1)).sum()

# compute gradients with `backward`
loss.backward()

# verify that they match (isclose is more robust than == for floating-point values)
torch.isclose(dloss_dw, w.grad).item(), torch.isclose(dloss_db, b.grad).item()

(True, True)
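The same mechanism applies to the earlier expression z = 2 × (a - b) + c, whose partial derivatives are 2, -2, and 1. The sketch below checks this and also illustrates two behaviours that matter in training loops: gradients accumulate across `backward` calls until they are reset, and `torch.no_grad()` disables graph recording. Variable names are illustrative.

```python
import torch

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
c = torch.tensor(3.0, requires_grad=True)

z = 2 * (a - b) + c
z.backward()
print(a.grad.item(), b.grad.item(), c.grad.item())  # 2.0 -2.0 1.0

# Gradients accumulate: a second backward pass adds to the stored values.
z = 2 * (a - b) + c
z.backward()
accumulated = a.grad.item()
print(accumulated)    # 4.0

# Reset before the next pass (setting a.grad = None is another option).
a.grad.zero_()
print(a.grad.item())  # 0.0

# Inside no_grad, operations are not recorded in the graph.
with torch.no_grad():
    out = 2 * (a - b) + c
print(out.requires_grad)  # False
```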