<a href="https://colab.research.google.com/github/tonyzamyatin/learning-pytorch/blob/master/fcc-course/00_fundamentals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 00. PyTorch Fundamentals
**Goal:** Introduction to everything tensors.

Reference to online course book: https://www.learnpytorch.io/00_pytorch_fundamentals/

In [203]:
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(torch.__version__)

## Introduction to Tensors
### Creating Tensors

In [204]:
# Scalar
s = torch.tensor(7)
s

In [205]:
# Vector
v = torch.tensor([7, 7])
v

In [206]:
v.ndim

In [207]:
v.shape

In [208]:
# Matrix
M = torch.tensor([
    [1, 2],
    [3, 4]
])
M

In [209]:
M.ndim

In [210]:
M.shape

In [211]:
# Second row of matrix (a tensor yet again)
M[1]

In [212]:
# Tensor, with dim 3*4*2, randomly initialized
T = torch.rand(3, 4, 2)
T

In [213]:
T.ndim

In [214]:
T.shape

The first number of the `.shape` attribute is the number of sub-tensors inside the outermost pair of `[]`, the second number is the number of sub-tensors inside each pair of `[]` at the second level, and so on. For the three-dimensional tensor `T` above, `torch.Size([3, 4, 2])` means 3 matrices, each with 4 rows and 2 columns.
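
To make the bracket-counting rule concrete, here is a small sketch with a hand-written tensor (the values and the name `nested` are purely illustrative). Indexing once peels off the outermost pair of `[]`, so the shape loses its first number each time.

```python
import torch

# Illustrative tensor: 2 blocks, each with 3 rows and 4 columns.
nested = torch.tensor([
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    [[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]],
])

print(nested.shape)        # torch.Size([2, 3, 4])
print(nested[0].shape)     # torch.Size([3, 4]) -> one block
print(nested[0][0].shape)  # torch.Size([4])    -> one row of that block
```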

### Naming conventions
Scalars and vectors have lower case names, matrices and higher dimensional tensors have upper case names.

## Random Tensors
### Why random tensors?
Often we want to randomly initialize the parameters of neural networks as an unbiased starting point for training. For this we can use random tensors.

In [215]:
random_img_tensor = torch.rand(224, 224, 3)
random_img_tensor.shape, random_img_tensor.ndim

### Zeros and Ones

Tensors of zeros and ones are useful as masks: multiplying a data tensor element-wise by a mask of the same shape zeroes out selected rows, columns or sub-tensors.

In [216]:
zeros = torch.zeros(3, 3)
zeros

In [217]:
ones = torch.ones(3, 3)
ones

In [218]:
zeros.dtype
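
As a rough illustration of the masking idea mentioned above, the sketch below builds a mask from `torch.ones()` and sets one column to zero. The names `data`, `mask` and `masked` are made up for this example.

```python
import torch

data = torch.arange(1., 10.).reshape(3, 3)  # illustrative data: values 1..9

mask = torch.ones(3, 3)
mask[:, 1] = 0          # zero out the middle column of the mask

masked = data * mask    # element-wise product keeps only unmasked entries
print(masked)
```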

### Creating a range and tensors-like

In [219]:
range_tensor = torch.arange(start=0, end=691, step=69)
range_tensor

In [220]:
# Create a tensor of zeros with the same shape and dtype as range_tensor
eleven_zeros = torch.zeros_like(range_tensor)
eleven_zeros, len(eleven_zeros)

## Tensor datatypes
*Note:* Datatype mismatches are one of the 3 big sources of errors you will run into with PyTorch and deep learning:
1. Tensors are not the right datatype
2. Tensors are not the right shape
3. Tensors are not on the right device

In [221]:
float_32_tensor = torch.tensor([3., 6., 9.],
                               dtype=None,          # datatype of the tensor (e.g. float32, float16); None infers it from the data
                               device=None,         # the device the tensor is located on
                               requires_grad=False) # whether PyTorch should track gradients for operations on this tensor
float_32_tensor, float_32_tensor.dtype, float_32_tensor.device, float_32_tensor.requires_grad

### Most common datatypes
- 32 bit (`torch.float32`): single precision
- 16 bit (`torch.float16`): half precision
- 64 bit (`torch.float64`): double precision

In [222]:
float_16_tensor = float_32_tensor.type(torch.float16)   # Change datatype to float16
float_16_tensor, float_16_tensor.dtype

In [223]:
float_64_tensor = float_32_tensor.type(torch.float64)
float_64_tensor, float_64_tensor.dtype

In [224]:
res_16_32 = float_16_tensor * float_32_tensor
res_16_32, res_16_32.dtype

In [225]:
res_32_64 = float_32_tensor * float_64_tensor
res_32_64, res_32_64.dtype

Some tensor operations work with mixed datatypes: element-wise operations promote the result to the larger datatype (e.g. float16 * float32 gives float32). Other operations (many of the operations during neural network training) will throw an error, however.

In [226]:
int_32_tensor = torch.tensor([3, 6, 9], dtype=torch.int32)

In [227]:
res_int_float = int_32_tensor * float_32_tensor
res_int_float, res_int_float.dtype
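
The promotion behaviour seen in the cells above can be checked without running the operation. `torch.result_type()` reports the datatype an element-wise operation on two operands would produce. The tensor names below are illustrative.

```python
import torch

f16 = torch.tensor([3., 6., 9.], dtype=torch.float16)
f32 = torch.tensor([3., 6., 9.], dtype=torch.float32)
f64 = torch.tensor([3., 6., 9.], dtype=torch.float64)
i32 = torch.tensor([3, 6, 9], dtype=torch.int32)

# Ask PyTorch which dtype an element-wise op on each pair would return.
print(torch.result_type(f16, f32))  # torch.float32
print(torch.result_type(f32, f64))  # torch.float64
print(torch.result_type(i32, f32))  # torch.float32

# The actual operation agrees with the prediction.
product = i32 * f32
print(product.dtype)                # torch.float32
```

When an operation refuses mixed datatypes, casting one operand explicitly (e.g. with `.type()` as shown earlier) is the usual way out.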

## Getting information from tensors
**Tensor attributes:**
1. Tensor is not the right datatype - get information using `tensor.dtype`
2. Tensor is not the right shape - get information using `tensor.shape`
3. Tensor is not on the right device - get information using `tensor.device`

In [228]:
some_tensor = torch.rand(3, 4)
some_tensor

In [229]:
# size() and shape return the same information
some_tensor.size(), some_tensor.shape

In [230]:
some_tensor.dtype, some_tensor.shape, some_tensor.device

In [231]:
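# Change datatype and device of tensor; .to() returns a new tensor and leaves the original unchanged
target_device = "cuda" if torch.cuda.is_available() else "cpu"  # Avoids an error if no NVIDIA GPU with CUDA is available
some_tensor_converted = some_tensor.to(dtype=torch.float16, device=target_device)
some_tensor_converted.dtype, some_tensor_converted.device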

## Manipulating tensors
Tensor operations include:
* Addition
* Subtraction
* Multiplication (element-wise)
* Division
* Matrix multiplication (dot product)

In [232]:
# Add scalar to tensor (the scalar is broadcast to every element of the tensor)
tensor = torch.tensor([1, 2, 3])
tensor + 10

In [233]:
# Multiply tensor by scalar
tensor * 10

In [234]:
# Subtract scalar from tensor
tensor - 10

In [235]:
# PyTorch built-in functions
torch.add(tensor, 10), torch.mul(tensor, 10), torch.subtract(tensor, 10)
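
One way to picture adding a scalar is as adding a tensor of the same shape filled with that scalar. The sketch below compares the two; `torch.full_like()` is only used here to build the explicit version, and the names `explicit` and `broadcast` are illustrative.

```python
import torch

tensor = torch.tensor([1, 2, 3])

explicit = tensor + torch.full_like(tensor, 10)  # add a same-shaped tensor of tens
broadcast = tensor + 10                          # let PyTorch broadcast the scalar

print(torch.equal(explicit, broadcast))  # True
```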

### Rules for matrix multiplication
1. **Inner dimensions** of the two matrices to be multiplied must match, e.g. 3x2 @ 2x3
2. The resulting matrix has the shape of the **outer dimensions**
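
The two rules above can be checked quickly on small random matrices. This is only a sketch: the names `A` and `B` and the flag `mismatch_raised` are illustrative.

```python
import torch

A = torch.rand(3, 2)
B = torch.rand(2, 3)

# Inner dimensions match (2 and 2), result takes the outer dimensions (3, 3).
print((A @ B).shape)    # torch.Size([3, 3])
# Swapping the operands gives inner 3 and 3, outer (2, 2).
print((B @ A).shape)    # torch.Size([2, 2])

# (3, 2) @ (3, 2): the inner dimensions 2 and 3 do not match.
mismatch_raised = False
try:
    A @ A
except RuntimeError as err:
    mismatch_raised = True
    print(f"Shape mismatch: {err}")

# Transposing the second operand fixes the inner dimensions.
print((A @ A.T).shape)  # torch.Size([3, 3])
```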

In [236]:
matrix = torch.rand(10, 3)

In [237]:
%%time
torch.matmul(matrix, matrix.T)    # dot product

In [238]:
%%time
torch.mm(matrix, matrix.T)  # matrix multiplication for 2-D tensors only
# Unlike torch.matmul(), torch.mm() does not broadcast and does not work with vectors.

In [239]:
%%time
# by hand
res_matrix = torch.zeros(10, 10)
for row in range(matrix.shape[0]):
    for col in range(matrix.shape[0]):
        res_matrix[row][col] = (matrix[row] * matrix[col]).sum()
res_matrix

Computationally intensive operations like matrix multiplication are implemented far more efficiently in PyTorch than in a hand-written Python loop. Therefore, it is recommended to use the PyTorch implementation instead of the Python implementation.

In [240]:
%%time
matrix @ matrix.T

This is the shorthand form for matrix multiplication. Note that other libraries such as NumPy also support the `@` operator, so which implementation runs depends on the types of the operands.

## Tensor aggregation
min, max, mean, sum


In [241]:
torch.min(res_matrix), res_matrix.min()

In [242]:
torch.max(res_matrix), res_matrix.max()

In [243]:
torch.mean(res_matrix)

In [244]:
torch.sum(res_matrix), res_matrix.sum()

### Positional min and max
Positional min and max (`argmin()` and `argmax()`) return the index at which the min or max of the target tensor occurs.
*Note:* When called with a `dim` argument, `min()` and `max()` also return these indices as the second return value.

In [245]:
res_matrix.argmin(dim=0), res_matrix.argmax()

If `dim` is not set, the tensor is flattened by default and the min/max index of the flattened tensor is returned.

In [246]:
res_matrix.flatten()[res_matrix.argmax().item()]
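
A flattened index is not always convenient. As a small sketch (with an illustrative tensor `t`), the flat index of a 2-D tensor can be converted back to a row and column with `divmod()`, and passing `dim` to `max()` returns both the values and their positions.

```python
import torch

t = torch.tensor([[1., 9., 3.],
                  [4., 5., 6.]])

flat_index = t.argmax().item()             # index into the flattened tensor
row, col = divmod(flat_index, t.shape[1])  # convert to (row, col) for a 2-D tensor
print(flat_index, (row, col), t[row, col])

# With dim set, max() returns the per-row maxima and where they occur.
values, indices = t.max(dim=1)
print(values, indices)
```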

## Reshaping, stacking, squeezing and unsqueezing tensors
* Reshape - reshapes an input tensor to a defined shape
* View - returns a view of an input tensor with a certain shape that shares memory with the original tensor
* Stacking - combines multiple tensors along a new dimension; `torch.vstack()` and `torch.hstack()` place tensors on top of each other or side by side
* Squeeze - removes all dimensions of size `1` from a tensor
* Unsqueeze - adds a dimension of size `1` to a target tensor
* Permute - returns a view of an input tensor with its dimensions rearranged in a given order

In [247]:
x = torch.arange(1., 11.)
x, x.shape

In [248]:
# Add extra dimension on 0-th dimension
x_reshaped = x.reshape(1, 10)
x_reshaped, x_reshaped.shape

In [249]:
# Add extra dimension on the 1-st dimension
x_reshaped = x_reshaped.reshape(10, 1)
x_reshaped, x_reshaped.shape

In [250]:
x_reshaped = x_reshaped.reshape(2, 5)
x_reshaped, x_reshaped.shape

When reshaping a tensor, the number of elements of the input and output tensor must match.

In [251]:
# Change view
z = x.view(1, 10)
z, z.shape

Same result as with reshaping. `tensor.view()` always returns a view that shares memory with the original tensor, so changes in one tensor object will be reflected in the other. `tensor.reshape()` also returns a view whenever possible and only copies the data when a view cannot be created (e.g. for non-contiguous tensors).

In [252]:
z[:, 0] = 11.
z, x
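
The difference between `view()` and `reshape()` only shows up on non-contiguous tensors. The sketch below uses a transposed tensor as an example and compares `data_ptr()` values to see whether memory is shared; all names are illustrative.

```python
import torch

base = torch.arange(6.)

# On a contiguous tensor, reshape() returns a view that shares memory.
reshaped = base.reshape(2, 3)
shares_memory = reshaped.data_ptr() == base.data_ptr()
print(shares_memory)     # True

# Transposing gives a non-contiguous view of the same memory.
transposed = reshaped.t()

# view() cannot express this layout change and raises an error ...
view_failed = False
try:
    transposed.view(6)
except RuntimeError:
    view_failed = True
print(view_failed)       # True

# ... whereas reshape() falls back to copying the data.
copied = transposed.reshape(6)
copy_shares_memory = copied.data_ptr() == base.data_ptr()
print(copy_shares_memory)  # False
```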

We can combine tensors along a new dimension using `torch.stack()` (to join tensors along an existing dimension, use `torch.cat()`).

In [253]:
# Stack tensors along a new first dimension: each x becomes a row, shape (4, 10)
x_stacked_dim0 = torch.stack([x, x, x, x], dim=0)
x_stacked_dim0, x_stacked_dim0.shape

In [254]:
# Stack along a new second dimension: each x becomes a column, shape (10, 4)
x_stacked_dim1 = torch.stack([x, x, x, x], dim=1)
x_stacked_dim1, x_stacked_dim1.shape

In [255]:
# Throws an error
# torch.stack([x, x, x, x], dim=2)

This throws an error because `x` is one-dimensional: stacking adds one new dimension, so the result has only two dimensions and `dim` must be 0 or 1. We can simply add more dimensions to the original tensor with `tensor.reshape()`.

In [256]:
x_3d = x.reshape(1, 2, 5)
x_3d, x_3d.shape

If we perform the stacking on `dim=2` with the 3d tensor it should work.

In [257]:
x_3d_stacked_dim2 = torch.stack([x_3d, x_3d, x_3d, x_3d], dim=2)
x_3d_stacked_dim2, x_3d_stacked_dim2.shape

We can now also stack along `dim=3`.

In [258]:
x_3d_stacked_dim3 = torch.stack([x_3d, x_3d, x_3d, x_3d], dim=3)
x_3d_stacked_dim3, x_3d_stacked_dim3.shape

In [259]:
# Negative dimension indices for indexing from the back
torch.stack([x, x, x, x], dim=-2), torch.stack([x, x, x, x], dim=0)
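
For comparison, here is a minimal sketch of how `torch.stack()` differs from `torch.cat()`: stacking inserts a new dimension, while concatenating extends an existing one. The names `a`, `stacked` and `joined` are illustrative.

```python
import torch

a = torch.arange(3)

stacked = torch.stack([a, a], dim=0)  # new dimension  -> shape (2, 3)
joined = torch.cat([a, a], dim=0)     # same dimension -> shape (6,)

print(stacked.shape, joined.shape)
```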

Remove all dimensions of size 1 with `torch.squeeze(tensor)` or `tensor.squeeze()`.

In [260]:
x_3d_squeezed = x_3d.squeeze()
x_3d.shape, x_3d_squeezed.shape

Add a dimension of size 1 at position `dim` with `torch.unsqueeze(tensor, dim)` or `tensor.unsqueeze(dim)`.

In [261]:
x_unsqueeze_dim0 = x.unsqueeze(0)
x_unsqueeze_dim1 = x.unsqueeze(1)
x_unsqueeze_dim0.shape, x_unsqueeze_dim1.shape
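
`squeeze()` also accepts a `dim` argument to remove only one specific dimension. The sketch below uses an illustrative tensor `t` and shows that squeezing a dimension whose size is not 1 leaves the shape unchanged.

```python
import torch

t = torch.zeros(1, 3, 1)

print(t.squeeze().shape)                # torch.Size([3])       all size-1 dims removed
print(t.squeeze(0).shape)               # torch.Size([3, 1])    only dim 0 removed
print(t.squeeze(1).shape)               # torch.Size([1, 3, 1]) dim 1 has size 3, nothing happens
print(t.squeeze().unsqueeze(-1).shape)  # torch.Size([3, 1])    negative dims count from the back
```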

In [262]:
# Throws an error
# x.unsqueeze(2)

Rearrange the dimensions of a tensor using `tensor.permute(dims)`, where `dims` lists the original dimensions in their new order.

In [263]:
# We want to permute an image tensor s.t. the color dimension is the first
img_original = torch.rand(244, 244, 3)
img_permuted = img_original.permute([2, 0, 1])
img_original.shape, img_permuted.shape
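
Because `permute()` returns a view, the permuted tensor shares memory with the original. The sketch below uses a deliberately small illustrative "image" to show that a write to the original is visible through the permuted view, and that `.contiguous()` gives an independent copy with its own memory layout.

```python
import torch

img = torch.rand(4, 5, 3)        # small illustrative (height, width, channels) tensor
img_chw = img.permute(2, 0, 1)   # (channels, height, width)

img[0, 0, 0] = 42.0              # write through the original ...
print(img_chw[0, 0, 0])          # ... and the permuted view sees it: tensor(42.)

print(img_chw.is_contiguous())   # False: same memory, different strides
img_chw_copy = img_chw.contiguous()  # copies the data into a contiguous layout
```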

## Indexing (selecting data from tensors)

In [264]:
x = torch.arange(1, 10).reshape(1, 3, 3)
x, x.shape

In [265]:
# Get first tensor inside original tensor
x[0]

In [266]:
# Indexing on dim=1
x[0][0], x[0, 0]

In [267]:
# Indexing on dim=2
x[0][1][1], x[0, 1, 1]

Use `:` to select "all" of a target dimension.

In [268]:
# Get all values of the 0th and 1st dimension but only index 1 of the 2nd dimension.
x[:, :, 1]

In [269]:
# Get all values of the 0th dimension but only index 1 of the 1st and 2nd dimension.
x[:, 1, 1]

In [270]:
# This is the same
x[:, 1], x[:, 1, :]

## PyTorch and Numpy
* Data in NumPy to PyTorch tensor -> `torch.from_numpy(ndarray)`
* PyTorch tensor to NumPy -> `torch.Tensor.numpy()`

In [271]:
# PyTorch tensor from NumPy array
int_array = np.arange(1, 10)
int_32_tensor = torch.from_numpy(int_array)
int_array, int_32_tensor

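*Note:* `np.arange()` creates an integer array when you pass it integers, and `torch.from_numpy()` keeps that datatype. The resulting tensor uses NumPy's default integer type, which is int64 on most platforms (int32 on some, e.g. older NumPy versions on Windows), regardless of the variable name used above.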

In [272]:
float_array = np.arange(1., 10.)
float_64_tensor = torch.from_numpy(float_array)
float_array, float_64_tensor

*Note:* NumPy uses float64 as its default float datatype, whereas PyTorch defaults to float32. This means that we might have to change the datatype to float32 to avoid potential datatype issues later down the line.

In [273]:
float_32_tensor = torch.from_numpy(float_array.astype("float32"))
float_32_tensor

In [274]:
float_32_tensor = torch.from_numpy(float_array).type(torch.float32)
float_32_tensor

In [275]:
# Tensor to NumPy
float_32_array = float_32_tensor.numpy()
float_32_array

**Note:** Conversion from NumPy to PyTorch and vice versa retains the original datatype of the array/tensor.
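
Besides keeping the datatype, `torch.from_numpy()` shares memory with the source array, so later changes to the array show up in the tensor. The sketch below illustrates this; copying the array first is one way to get an independent tensor. All names are illustrative.

```python
import numpy as np
import torch

source_array = np.arange(1., 4.)             # float64 array [1., 2., 3.]
shared = torch.from_numpy(source_array)      # keeps float64 and shares memory
print(shared.dtype)                          # torch.float64

source_array[0] = 100.                       # modify the NumPy array ...
print(shared[0])                             # ... and the tensor changes too

independent = torch.from_numpy(source_array.copy())  # copy first to decouple
source_array[1] = -1.
print(shared[1], independent[1])             # only the shared tensor sees -1
```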

## Reproducibility: trying to remove the random from random
The weights of neural networks are usually initialized randomly. The starting point for the weights of the neural network affects the
weights at the end of training. However, we often want results to be *reproducible*, e.g. when sharing code or publishing a paper.

To achieve reproducible randomness, we use *random seeds*.

In [276]:
random_tensor_A = torch.rand(3, 4)
random_tensor_B = torch.rand(3, 4)
random_tensor_A, random_tensor_B, random_tensor_A == random_tensor_B

In [277]:
# With random seed
RANDOM_SEED = 42    # some numerical value
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)
torch.manual_seed(RANDOM_SEED)      # Reset the seed so the next call reproduces the same numbers
random_tensor_D = torch.rand(3, 4)
random_tensor_C, random_tensor_D, random_tensor_C == random_tensor_D


**Note:** Once a random seed is set, the random number generator produces the same sequence of random numbers every time it is restarted from that seed. This is why we need to reset the seed manually in line 5 to get the same tensor as in line 4. If we remove this line and run the code cell multiple times, the two tensors will differ from each other, but each of them will be identical from run to run.
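
The behaviour described in the note can be reproduced with a small helper that seeds once and then draws two tensors. The function name `draw_pair` is made up for this sketch.

```python
import torch

def draw_pair(seed):
    """Seed once, then draw two tensors from the same random stream."""
    torch.manual_seed(seed)
    return torch.rand(3, 4), torch.rand(3, 4)

first_a, first_b = draw_pair(42)
second_a, second_b = draw_pair(42)

print(torch.equal(first_a, second_a))  # True: same seed, same first draw
print(torch.equal(first_b, second_b))  # True: same seed, same second draw
print(torch.equal(first_a, first_b))   # False: consecutive draws differ
```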

Explanation on StackOverflow: https://stackoverflow.com/a/51425032

Extra resources for reproducibility:
* https://pytorch.org/docs/stable/notes/randomness.html
* https://en.wikipedia.org/wiki/Random_seed

## Running tensors and PyTorch objects on GPUs
GPUs = parallelization of matrix operations = faster computation

### 1. Getting a GPU
1. Free GPUs on Google Colab, suited for small experiments and projects
2. Get your own GPU rack, see https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
3. Cloud computing on GCP, AWS, Azure, etc.

In [278]:
!nvidia-smi

### 2. Check for GPU access with PyTorch

In [279]:
torch.cuda.is_available()

### 3. Write device agnostic code
We want to be able to run the code on all machines but use GPUs when they are available.
See: https://pytorch.org/docs/stable/notes/cuda.html#best-practices

In [280]:
# Setup device: prefer CUDA, then Apple's MPS backend, then fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"
device

In [281]:
torch.cuda.device_count()

### 4. Putting tensors and models on the GPU

In [282]:
# By default tensors are created on the CPU
tensor = torch.tensor([1, 2, 3])
tensor.device

In [283]:
# Move tensor to GPU
tensor_on_gpu = tensor.to(device)
tensor_on_gpu   # On cuda or mps if available, otherwise still on cpu

### 5. Moving tensors back to CPU

In [284]:
# Throws an error because NumPy only works on CPU
# tensor_on_gpu.numpy()

In [285]:
tensor_on_cpu = tensor_on_gpu.cpu() # Returns copy of tensor object in CPU memory
tensor_on_cpu.numpy()
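
Putting the steps of this section together, a device-agnostic round trip might look like the sketch below. The helper `pick_device` is hypothetical (not part of PyTorch), and the sketch assumes a PyTorch version that provides `torch.backends.mps`.

```python
import torch

def pick_device():
    """Hypothetical helper: prefer CUDA, then Apple's MPS backend, then CPU."""
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()

# Create the tensor directly on the chosen device and compute there.
on_device = torch.tensor([1., 2., 3.], device=device)
doubled = on_device * 2

# Move the result back to the CPU before handing it to NumPy.
result = doubled.cpu().numpy()
print(device, result)
```

Calling `.cpu()` on a tensor that already lives on the CPU is harmless, so the same code runs unchanged on machines without a GPU.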

## Exercises

1. Documentation reading - A big part of deep learning (and learning to code in general) is getting familiar with the documentation of a certain framework you're using. We'll be using the PyTorch documentation a lot throughout the rest of this course. So I'd recommend spending 10-minutes reading the following (it's okay if you don't get some things for now, the focus is not yet full understanding, it's awareness). See the documentation on `torch.Tensor` and for `torch.cuda`.

2. Create a random tensor with `shape (7, 7)`.

In [286]:
torch.manual_seed(RANDOM_SEED)
tensor_A = torch.rand(7, 7)
tensor_A

3. Perform a matrix multiplication on the tensor from 2 with another random tensor with `shape (1, 7)` (hint: you may have to transpose the second tensor).

In [287]:
torch.manual_seed(RANDOM_SEED)
tensor_B = torch.rand(1, 7)
torch.matmul(tensor_A, tensor_B.mT)

4. Set the random seed to 0 and do exercises 2 & 3 over again.

In [296]:
torch.manual_seed(0)
tensor_C = torch.rand(7, 7)
tensor_D = torch.rand(1, 7)
torch.matmul(tensor_C, tensor_D.mT)

5. Speaking of random seeds, we saw how to set it with `torch.manual_seed()` but is there a GPU equivalent? (hint: you'll need to look into the documentation for `torch.cuda` for this one). If there is, set the GPU random seed to 1234.

In [289]:
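torch.cuda.manual_seed(1234)    # GPU equivalent of torch.manual_seed(); silently ignored if CUDA is unavailable
tensor_on_gpu = torch.rand(size=(7, 7), device="cuda")  # Requires an NVIDIA GPU with CUDA
tensor_on_gpu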

6. Create two random tensors of `shape (2, 3)` and send them both to the GPU (you'll need access to a GPU for this). Set `torch.manual_seed(1234)` when creating the tensors (this doesn't have to be the GPU random seed).

In [294]:
device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

torch.manual_seed(1234)
tensor_E = torch.rand(2, 3)
tensor_F = torch.rand_like(tensor_E)
tensor_E_on_gpu = tensor_E.to(device)
tensor_F_on_gpu = tensor_F.to(device)
tensor_E_on_gpu, tensor_F_on_gpu

7. Perform a matrix multiplication on the tensors you created in 6 (again, you may have to adjust the shapes of one of the tensors).

In [300]:
%%time
torch.matmul(tensor_E, tensor_F.mT)

In [302]:
%%time
res = torch.matmul(tensor_E_on_gpu, tensor_F_on_gpu.mT)
res

8. Find the maximum and minimum values of the output of 7.

In [303]:
res.min(), res.max()

9. Find the maximum and minimum index values of the output of 7.

In [304]:
res.argmin(), res.argmax()

10. Make a random tensor with `shape (1, 1, 1, 10)` and then create a new tensor with all the 1 dimensions removed to be left with a tensor of `shape (10)`. Set the seed to 7 when you create it and print out the first tensor and its shape as well as the second tensor and its shape.

In [309]:
torch.manual_seed(7)
tensor_G = torch.rand(1, 1, 1, 10)
tensor_G_squeezed = tensor_G.squeeze()
print(f"""
  First tensor:
  {tensor_G}
  shape: {tensor_G.shape}

  Second tensor:
  {tensor_G_squeezed}
  shape: {tensor_G_squeezed.shape}
""")