# 00. PyTorch Fundamentals

## Importing PyTorch

In [1]:
import torch
torch.__version__

'1.13.1+cpu'

## Introduction to Tensors

Tensors represent data in a numerical way. For example, you could represent an image as a tensor of the shape `[3, 224, 224]`, which might stand for `[color_channels, height, width]`. This would be a three-dimensional tensor.

### Creating Tensors

A **scalar** is a single number, meaning it is a **zero-dimensional tensor**.

In [2]:
scalar = torch.tensor(7)
scalar

tensor(7)

We can check the number of dimensions using the `ndim` attribute.

In [3]:
scalar.ndim

0

Get the number from the tensor using the `item()` method.

In [4]:
scalar.item()

7

A **vector** is a one-dimensional tensor that can contain multiple numbers.

In [5]:
vector = torch.tensor([7, 7])
vector

tensor([7, 7])

In [6]:
vector.ndim

1

The `shape` attribute can give you an idea of how the elements within the tensor are arranged:

In [7]:
vector.shape

torch.Size([2])

This means our vector has a shape of `[2]`, since we placed two elements inside the square brackets.

We can also create **matrices**:

In [8]:
MATRIX = torch.tensor([[7, 8],
                      [9, 10]])
MATRIX

tensor([[ 7,  8],
        [ 9, 10]])

Matrices are flexible like vectors, but contain an extra dimension:

In [9]:
MATRIX.ndim, MATRIX.shape # `MATRIX` is two elements deep and two elements wide

(2, torch.Size([2, 2]))

We can create a tensor like this:

In [10]:
TENSOR = torch.tensor([[[1, 2, 3],
                        [3, 6, 9],
                        [2, 4, 5]]])
TENSOR

tensor([[[1, 2, 3],
         [3, 6, 9],
         [2, 4, 5]]])

Tensors can represent almost anything. This one might represent the sales numbers for steak and almond butter:

| Day of week | 1 | 2 | 3 |
| ----------- | - | - | - |
| Steak sales | 3 | 6 | 9 |
| Almond butter sales | 2 | 4 | 5 |

In [11]:
TENSOR.ndim, TENSOR.shape

(3, torch.Size([1, 3, 3]))
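
One way to keep `ndim` and `shape` apart: `ndim` is the number of entries in `shape`, which matches the number of nested square brackets. The sketch below rebuilds the four tensors from this section (the lowercase variable names and the use of `numel()` to count elements are illustrative additions) and checks that relationship:

```python
import torch

scalar = torch.tensor(7)
vector = torch.tensor([7, 7])
matrix = torch.tensor([[7, 8],
                       [9, 10]])
tensor_3d = torch.tensor([[[1, 2, 3],
                           [3, 6, 9],
                           [2, 4, 5]]])

for t in (scalar, vector, matrix, tensor_3d):
    # ndim is always the length of shape
    assert t.ndim == len(t.shape)
    # numel() is the total number of elements in the tensor
    print(t.ndim, tuple(t.shape), t.numel())
```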

> In practice, scalars and vectors are often denoted as lowercase letters, whereas matrices and tensors are denoted as uppercase letters. The names matrix and tensor are often used interchangeably, but in this case matrices are used to refer to 2-dimensional arrays of numbers, while tensors are n-dimensional.

### Random tensors

Creating tensors by hand is rare. Instead, a machine learning model typically starts with a large tensor of random numbers and adjusts those numbers as it works through data to better represent it.

Essentially, `random numbers -> look at data -> update numbers -> repeat`

As a data scientist, you can define how the model starts (initialization), looks at data (representation) and updates the numbers (optimization).
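
The `random numbers -> look at data -> update numbers -> repeat` loop can be sketched with a single number. This is a toy illustration, not how a real model is trained: the data (`y = 2 * x`), the `learning_rate` value and the hand-written gradient are all assumptions made for the example.

```python
import torch

torch.manual_seed(0)

# Toy data: the pattern to discover is y = 2 * x
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

weight = torch.rand(1)   # 1. start with a random number
learning_rate = 0.05     # hypothetical step size

for _ in range(50):      # 4. repeat
    error = weight * x - y                       # 2. look at the data
    gradient = 2 * (error * x).mean()            # slope of the mean squared error w.r.t. weight
    weight = weight - learning_rate * gradient   # 3. update the number

print(weight)  # ends up very close to 2
```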

Create random tensors using `torch.rand()`:

In [12]:
random_tensor = torch.rand(size=(3, 4))
random_tensor, random_tensor.dtype

(tensor([[0.2638, 0.2019, 0.5282, 0.3042],
         [0.4866, 0.6798, 0.6364, 0.5665],
         [0.0077, 0.2077, 0.8574, 0.4656]]),
 torch.float32)

Because `size` is flexible, we can make the tensor whatever shape we want. For example, we can create a tensor in the common image shape of `[224, 224, 3]` (`[height, width, color_channels]`):

In [13]:
random_image_size_tensor = torch.rand(size=(224, 224, 3))
random_image_size_tensor.shape, random_image_size_tensor.ndim

(torch.Size([224, 224, 3]), 3)

Sometimes, you want to fill a tensor with exclusively zeroes or ones. This is often used in masking (masking some values of a tensor with zeroes to let a model ignore them).

In [14]:
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [15]:
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)
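
As a rough sketch of the masking idea mentioned above, multiplying element-wise by a tensor of ones and zeros keeps some values and blanks out the rest. The `values` tensor and the choice to hide the last column are made up for illustration:

```python
import torch

values = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])

# Start from all ones (keep everything), then zero out the last column
mask = torch.ones(size=(2, 3))
mask[:, -1] = 0

masked = values * mask  # element-wise multiplication
print(masked)
```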

### Creating ranges

Create a range of numbers, like `1..10` or `0..100`, using `torch.arange(start, end, step)`. Note that `end` is exclusive.

In [16]:
zero_to_ten = torch.arange(start=0, end=10, step=1)
zero_to_ten

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
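
Since `end` itself is not included in the result, a range that should contain 10 needs `end=11`. A short sketch of that, plus a non-default `step` (the variable names are illustrative):

```python
import torch

# Every second number from 0 up to (but not including) 10
evens = torch.arange(start=0, end=10, step=2)
print(evens)       # tensor([0, 2, 4, 6, 8])

# end=11 so that 10 is included
one_to_ten = torch.arange(start=1, end=11, step=1)
print(one_to_ten)
```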

Create a tensor of zeros or ones with the same shape as another tensor using `torch.zeros_like(input)` or `torch.ones_like(input)`.

In [17]:
ten_zeros = torch.zeros_like(zero_to_ten)
ten_zeros

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

### Tensor datatypes

Create tensors with specific datatypes using the `dtype` parameter:

In [18]:
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # Defaults to None, so the dtype is inferred from the data (torch.float32 here)
                               device=None, # Defaults to None, which uses the default device (CPU here)
                               requires_grad=False) # If True, operations on the tensor are recorded for autograd
float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))

In [19]:
float_16_tensor = torch.tensor([3.0, 6.0, 9.0], 
                               dtype=torch.float16)
float_16_tensor.dtype

torch.float16
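
When `dtype` is left as `None`, PyTorch infers the datatype from the data. A small sketch of what that looks like, assuming the default float type has not been changed with `torch.set_default_dtype()`:

```python
import torch

int_tensor = torch.tensor([3, 6, 9])          # Python ints -> torch.int64
float_tensor = torch.tensor([3.0, 6.0, 9.0])  # Python floats -> torch.float32
print(int_tensor.dtype, float_tensor.dtype)

# The *_like functions keep the dtype of their input
zeros_from_ints = torch.zeros_like(int_tensor)
print(zeros_from_ints.dtype)
```

This is why `ten_zeros` above prints whole numbers (`0`) while `zeros` prints floats (`0.`).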

### Getting information about tensors

Create a random tensor, and find out details about it:

In [20]:
some_tensor = torch.rand((3, 4))

print(some_tensor)
print(f"Shape of tensor: {some_tensor.shape}")
print(f"Datatype of tensor: {some_tensor.dtype}")
print(f"Device tensor is stored on: {some_tensor.device}") # Defaults to CPU

tensor([[0.8575, 0.7280, 0.6527, 0.2195],
        [0.0964, 0.0273, 0.3026, 0.2053],
        [0.4158, 0.4503, 0.3174, 0.2514]])
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu


### Manipulating tensors

In deep learning, data (images, texts, video, audio, etc.) gets represented as tensors.

A model learns by investigating those tensors and performing a series of operations on them to create a representation of the patterns in the input data.

These operations usually include:
* Addition
* Subtraction
* Element-wise multiplication
* Division
* Matrix multiplication
  
These are the basic building blocks of neural networks.

The fundamental operations addition, subtraction and multiplication work as expected:

In [21]:
tensor = torch.tensor([1, 2, 3])
tensor + 10

tensor([11, 12, 13])

In [22]:
tensor * 10

tensor([10, 20, 30])

In [23]:
tensor # Tensors do not change unless reassigned!

tensor([1, 2, 3])

In [24]:
tensor = tensor - 10
tensor

tensor([-9, -8, -7])

In [25]:
tensor = tensor + 10
tensor

tensor([1, 2, 3])

Note that `tensor + 10` is equivalent to `torch.add(tensor, 10)`. The same goes for multiplication using `*` and `torch.mul()`.
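
A quick check of that equivalence, with division added since it is on the list above but not shown yet. Dividing an integer tensor with `/` returns floats:

```python
import torch

tensor = torch.tensor([1, 2, 3])

# Operators and their function forms give the same result
print(torch.equal(tensor + 10, torch.add(tensor, 10)))  # True
print(torch.equal(tensor * 10, torch.mul(tensor, 10)))  # True

# Division of an integer tensor produces a float tensor
divided = tensor / 10
print(divided, divided.dtype)
```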

### Matrix multiplication

One of the most common operations in machine learning and deep learning algorithms is **matrix multiplication**.

It is implemented by the `torch.matmul()` function.

Reminder:
1. The inner dimensions must match:
  * `(3, 2) @ (3, 2)` won't work
  * `(3, 2) @ (2, 3)` will work
2. The resulting matrix has the shape of the outer dimensions:
  * `(2, 3) @ (3, 2) -> (2, 2)`
  * `(2, 3) @ (3, 2) -> (2, 2)` and `(3, 2) @ (2, 3) -> (3, 3)` both follow this rule
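
The two rules can be checked on random tensors. The names `a` and `b` are placeholders for this sketch:

```python
import torch

a = torch.rand(3, 2)
b = torch.rand(2, 3)

# Inner dimensions match (2 and 2), result takes the outer dimensions
print(torch.matmul(a, b).shape)  # torch.Size([3, 3])
print(torch.matmul(b, a).shape)  # torch.Size([2, 2])

# Inner dimensions do not match (2 and 3), so this raises an error
try:
    torch.matmul(a, a)
except RuntimeError as error:
    print(f"Shape mismatch: {error}")
```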

In [26]:
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [27]:
# Element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [28]:
# Matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)

This is equivalent to using `tensor @ tensor`, although these notes stick to the more explicit `torch.matmul()`.

Implementing matrix multiplication by hand is easy, but not recommended since Python for-loops are computationally expensive:

In [29]:
%%time
res = 0
for i in range(0, len(tensor)):
    res += tensor[i] * tensor[i]
res

CPU times: total: 0 ns
Wall time: 0 ns


tensor(14)

In [30]:
%%time
torch.matmul(tensor, tensor)

CPU times: total: 0 ns
Wall time: 0 ns


tensor(14)
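
With only three elements, both timings above round to zero, so the difference is invisible. A sketch with a larger tensor makes it measurable. The size of 10,000 and the use of `time.perf_counter()` instead of `%%time` are choices made for this example, and the actual timings depend on the machine:

```python
import time
import torch

torch.manual_seed(42)
big = torch.rand(10_000, dtype=torch.float64)

# Hand-written version: one Python-level multiply-and-add per element
start = time.perf_counter()
loop_result = 0
for i in range(len(big)):
    loop_result += big[i] * big[i]
loop_seconds = time.perf_counter() - start

# Built-in version: the whole computation runs inside PyTorch
start = time.perf_counter()
matmul_result = torch.matmul(big, big)
matmul_seconds = time.perf_counter() - start

print(f"for-loop: {loop_seconds:.4f} s, torch.matmul: {matmul_seconds:.6f} s")
```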

### Shape errors

Shape mismatches are one of the most common error types in deep learning.

In [31]:
tensor_A = torch.tensor([[1, 2],
                         [3, 4],
                         [5, 6]],
                        dtype=torch.float32)
tensor_B = torch.tensor([[7, 8],
                         [9, 10],
                         [11, 12]],
                        dtype=torch.float32)
torch.matmul(tensor_A, tensor_B) # This will throw an error

RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)

We can make matrix multiplication work by making the inner dimensions match.

One way of doing this is by using the **transpose**:
* `torch.transpose(input, dim0, dim1)`, where `dim0` and `dim1` are the dimensions to be swapped
* `tensor.T`, where `tensor` is the desired tensor to transpose

In [None]:
tensor_B.T

tensor([[ 7.,  9., 11.],
        [ 8., 10., 12.]])

In [None]:
# Now, the operation works when B is transposed
print(f"Original shapes: {tensor_A.shape}, {tensor_B.shape}\n")
print(f"New shapes: {tensor_A.shape}, {tensor_B.T.shape}\n")
print(f"Result of matrix multiplication w/ transpose:\n{torch.mm(tensor_A, tensor_B.T)}")

Original shapes: torch.Size([3, 2]), torch.Size([3, 2])

New shapes: torch.Size([3, 2]), torch.Size([2, 3])

Result of matrix multiplication w/ transpose:
tensor([[ 23.,  29.,  35.],
        [ 53.,  67.,  81.],
        [ 83., 105., 127.]])
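
For a two-dimensional tensor, the two ways of transposing listed above give the same result. A minimal check, re-creating `tensor_B` so the snippet runs on its own:

```python
import torch

tensor_B = torch.tensor([[7., 8.],
                         [9., 10.],
                         [11., 12.]])

swapped = torch.transpose(tensor_B, 0, 1)  # swap dimension 0 and dimension 1
print(swapped.shape)                       # torch.Size([2, 3])
print(torch.equal(swapped, tensor_B.T))    # True
```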


The `torch.nn.Linear()` module (also known as a feed-forward layer or fully connected layer) implements a matrix multiplication between an input $x$ and a weights matrix $A$, followed by the addition of a bias $b$:

\begin{align*}
y = x \cdot A^T + b
\end{align*}

Where:
* $x$ is the input to the layer (deep learning is a stack of layers like the one above)
* $A$ is the weights matrix created by the layer. It starts out as random numbers and gets adjusted over time as the neural network learns to better represent patterns in the data. The $T$ denotes that the weights matrix gets transposed
* $b$ is the bias term used to slightly offset weights and inputs
* $y$ is the output (a manipulation of the input in the hopes to discover patterns in it)

This is a linear function, and hence can be used to draw a straight line.

In [None]:
# Since the linear layer starts with a random weights matrix, let's make it reproducible
torch.manual_seed(42)
# This uses matrix multiplication
linear = torch.nn.Linear(in_features=2, # matches the inner dimension of the input
                         out_features=6) # sets the outer dimension of the output
x = tensor_A
output = linear(x)
print(f"Input shape:\n{x.shape}\n")
print(f"Output shape:\n{output.shape}\n\nOutput:\n{output}")

Input shape:
torch.Size([3, 2])

Output shape:
torch.Size([3, 6])

Output:
tensor([[2.2368, 1.2292, 0.4714, 0.3864, 0.1309, 0.9838],
        [4.4919, 2.1970, 0.4469, 0.5285, 0.3401, 2.4777],
        [6.7469, 3.1648, 0.4224, 0.6705, 0.5493, 3.9716]],
       grad_fn=<AddmmBackward0>)
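
To connect the layer back to the formula $y = x \cdot A^T + b$, the same output can be computed by hand from the layer's `weight` ($A$) and `bias` ($b$) attributes. A minimal sketch, with the input re-created so it runs on its own:

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)

x = torch.tensor([[1., 2.],
                  [3., 4.],
                  [5., 6.]])

# linear.weight is A with shape [out_features, in_features]; linear.bias is b
print(linear.weight.shape, linear.bias.shape)

# y = x @ A^T + b, written out manually
manual_output = torch.matmul(x, linear.weight.T) + linear.bias
print(torch.allclose(linear(x), manual_output))  # True
```

The weights have shape `[6, 2]`, which is why they are transposed before multiplying with the `[3, 2]` input.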


### Finding the min, max, mean, sum etc (aggregation)

In [None]:
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [None]:
x.min(), x.max(), x.type(torch.float32).mean(), x.sum()

(tensor(0), tensor(90), tensor(45.), tensor(450))

In [None]:
# You can also do the same with torch methods
torch.min(x), torch.max(x), torch.mean(x.type(torch.float32)), torch.sum(x)

(tensor(0), tensor(90), tensor(45.), tensor(450))

You can also get the index of the min or max value using `torch.argmin()` and `torch.argmax()`:

In [None]:
x.argmin(), x.argmax()

(tensor(0), tensor(9))
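
The indices returned by `argmin()` and `argmax()` can be used to look the values up again. The sketch below also goes slightly beyond the text and aggregates along a single dimension using the `dim` argument. The `matrix` tensor is made up for illustration:

```python
import torch

x = torch.arange(0, 100, 10)

# Indexing with the argmax/argmin result returns the max/min value
print(x[x.argmax()], x[x.argmin()])

# Aggregating along one dimension of a matrix
matrix = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])
column_sums = matrix.sum(dim=0)   # collapse the rows
row_means = matrix.mean(dim=1)    # collapse the columns
print(column_sums, row_means)
```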

### Change tensor datatype

Another common issue with deep learning operations is a mismatch of tensor datatypes.

If one tensor is `torch.float64` and the other is `torch.float32`, some operations (matrix multiplication, for example) can raise an error.

Change the datatypes of tensors using `torch.Tensor.type(dtype=None)`, where `dtype` is the desired data type.

In [None]:
tensor = torch.arange(0., 100., 10.)
tensor.dtype # Default is float32

torch.float32

In [None]:
# Create another tensor, same as before but with a float16 data type
tensor_float16 = tensor.type(torch.float16)
tensor_float16

tensor([ 0., 10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)
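
A sketch of what a datatype mismatch can look like in practice. Element-wise operations quietly promote to the larger type, while `torch.matmul()` typically refuses mixed datatypes. The exact error message depends on the PyTorch version, so the example only prints it:

```python
import torch

a = torch.tensor([1., 2., 3.], dtype=torch.float32)
b = torch.tensor([1., 2., 3.], dtype=torch.float64)

# Element-wise operations promote to the larger type
print((a * b).dtype)  # torch.float64

# Matrix multiplication is usually stricter about matching dtypes
try:
    torch.matmul(a, b)
except RuntimeError as error:
    print(f"dtype mismatch: {error}")

# Converting first makes the dtypes agree
result = torch.matmul(a, b.type(torch.float32))
print(result)
```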

### Reshaping, stacking, squeezing and unsqueezing

Deep learning models are all about manipulating tensors in some way. Because of the rules of matrix multiplication, shape mismatches will result in errors. There are methods that help you mix the right elements of tensors with other tensors.

* `torch.reshape(input, shape)`
* `torch.Tensor.view(shape)` returns a view of the original tensor in a different `shape`
* `torch.stack(tensors, dim=0)` joins a sequence of `tensors` along a new `dim` (they must all be the same size)
* `torch.squeeze(input)` removes all dimensions of size `1` from `input`
* `torch.unsqueeze(input, dim)` returns `input` with a dimension of size `1` inserted at `dim`
* `torch.permute(input, dims)` returns view of `input` with rearranged `dims`

In [34]:
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [35]:
# Add an extra dimension
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [36]:
# Change the view (same tensor though)
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [37]:
# Changing z changes x as well
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

In [38]:
# Stack new tensor on top of itself five times
x_stacked = torch.stack([x for _ in range(0, 5)], dim=0)
x_stacked

tensor([[5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.],
        [5., 2., 3., 4., 5., 6., 7.]])
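
The `dim` argument controls where the new dimension goes. A small sketch comparing `dim=0` and `dim=1` (stacking three copies instead of five to keep it short):

```python
import torch

x = torch.arange(1., 8.)

stacked_rows = torch.stack([x, x, x], dim=0)     # new dimension in front
stacked_columns = torch.stack([x, x, x], dim=1)  # new dimension at the end

print(stacked_rows.shape)     # torch.Size([3, 7])
print(stacked_columns.shape)  # torch.Size([7, 3])
```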

In [40]:
x_squeezed = x_reshaped.squeeze()
x_squeezed, x_squeezed.shape

(tensor([5., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [41]:
x_unsqueezed = x_squeezed.unsqueeze(dim=0)
x_unsqueezed, x_unsqueezed.shape

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [44]:
x_original = torch.rand(size=(224, 224, 3))
# Rearrange the axis order
x_permuted = x_original.permute(2, 0, 1) # Shifts axis 0->1, 1->2, 2->0
x_original.shape, x_permuted.shape

(torch.Size([224, 224, 3]), torch.Size([3, 224, 224]))

Keep in mind permuting returns a **view**, meaning it shares data with the original.
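
A short sketch of what sharing data means here. A value written into the original shows up in the permuted tensor, at the position given by the new axis order. The values written are arbitrary:

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# Writing to the original is visible through the permuted view
x_original[0, 0, 0] = 728218.0
print(x_original[0, 0, 0], x_permuted[0, 0, 0])

# [height=5, width=7, channel=1] in the original is [channel=1, height=5, width=7] in the view
x_original[5, 7, 1] = -1.0
print(x_permuted[1, 5, 7])
```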

### Indexing (selecting data)

Indexing tensors in PyTorch is similar to Python list and NumPy indexing:

In [45]:
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

(tensor([[[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]]),
 torch.Size([1, 3, 3]))

In [48]:
x[0], x[0][0], x[0][0][0] # Outer -> inner dimensions

(tensor([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]),
 tensor([1, 2, 3]),
 tensor(1))

In [51]:
x[:, 0] # All values of 0th dimension and 0 index of 1st dimension

tensor([[1, 2, 3]])

In [53]:
x[:, :, 1] # All values of 0th and 1st dimension, only 1 index of 2nd dimension

tensor([[2, 5, 8]])

In [55]:
x[:, 1, 1] # All values of 0 dimension and only 1 index of 1st and 2nd dimension

tensor([5])

In [56]:
x[0, 0, :] # Same as x[0][0]

tensor([1, 2, 3])
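
As a small exercise in the same style, the sketch below selects the last column (`3, 6, 9`) and the single value `9` from the same tensor:

```python
import torch

x = torch.arange(1, 10).reshape(1, 3, 3)

last_column = x[:, :, 2]   # all of the 0th and 1st dimensions, index 2 of the 2nd dimension
bottom_right = x[0, 2, 2]  # a single element

print(last_column)   # tensor([[3, 6, 9]])
print(bottom_right)  # tensor(9)
```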

### PyTorch and NumPy

Two main methods to interact with NumPy:
* `torch.from_numpy(ndarray)` (NumPy array -> PyTorch tensor)
* `torch.Tensor.numpy()` (PyTorch tensor -> NumPy array)

In [57]:
import torch
import numpy as np

array = np.arange(1., 8.)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

> NumPy floating-point arrays are created with data type `float64` by default, so you will usually want to convert the tensor to `float32` using `.type(torch.float32)`.

In [58]:
tensor = torch.ones(7)
numpy_tensor = tensor.numpy()
tensor, numpy_tensor

(tensor([1., 1., 1., 1., 1., 1., 1.]),
 array([1., 1., 1., 1., 1., 1., 1.], dtype=float32))
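
One detail worth knowing: a tensor created with `torch.from_numpy()` shares memory with the source array, so in-place changes to the array show up in the tensor. Reassigning the array name to a new array does not. A minimal sketch:

```python
import numpy as np
import torch

array = np.arange(1., 8.)
tensor = torch.from_numpy(array)

# In-place change to the NumPy array: the tensor sees it too
array[0] = 100.
print(tensor[0])

# This creates a brand-new array; the tensor still points at the old memory
array = array + 1
print(tensor[1], array[1])
```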

### Reproducibility

We've seen that neural networks start with random numbers to describe patterns in data and then try to improve those numbers using tensor operations.

Randomness is powerful, but often we want **reproducibility**.

Use `torch.manual_seed(seed)` to "flavour" the randomness:

In [59]:
import torch

RANDOM_SEED = 42
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_A = torch.rand(3, 4)

# Reset the seed every time a new rand() is called
torch.random.manual_seed(seed=RANDOM_SEED)
random_tensor_B = torch.rand(3, 4)

random_tensor_A, random_tensor_B, random_tensor_A == random_tensor_B

(tensor([[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]]),
 tensor([[0.8823, 0.9150, 0.3829, 0.9593],
         [0.3904, 0.6009, 0.2566, 0.7936],
         [0.9408, 0.1332, 0.9346, 0.5936]]),
 tensor([[True, True, True, True],
         [True, True, True, True],
         [True, True, True, True]]))
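
To see why the seed is reset before the second call, compare what happens with and without the reset. The variable names below are illustrative:

```python
import torch

torch.manual_seed(42)
first = torch.rand(3, 4)
second = torch.rand(3, 4)   # no re-seed: continues the random sequence

torch.manual_seed(42)
replay = torch.rand(3, 4)   # re-seeded: restarts the sequence

print(torch.equal(first, second))  # False
print(torch.equal(first, replay))  # True
```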

### Running tensors on GPU

By default, PyTorch creates tensors and performs operations on the CPU.

The matrix multiplications done by neural networks are processed much faster by a GPU though.
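
A common pattern is to write device-agnostic code that picks the GPU when one is available and falls back to the CPU otherwise. The sketch below assumes an NVIDIA GPU with CUDA for the GPU case. On a machine without one it simply stays on the CPU:

```python
import torch

# Use the GPU if PyTorch can see one, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tensor = torch.tensor([1, 2, 3])
tensor_on_device = tensor.to(device)
print(tensor_on_device.device)

# NumPy only works with CPU memory, so move the tensor back before converting
back_on_cpu = tensor_on_device.cpu().numpy()
print(back_on_cpu)
```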