## PyTorch notes

### What is PyTorch
* [PyTorch website](https://pytorch.org) for the most useful documentation and examples
* the most popular deep learning framework for research
* write fast deep learning code in Python (able to run on a GPU or many GPUs)
* access many pre-built deep learning models (Torch Hub/torchvision.models)
* covers the whole stack: preprocess data, model data, deploy the model in your application/cloud
* originally designed and used in-house by Facebook/Meta (now open source and used by companies such as Tesla, Microsoft, OpenAI)
* find the [comparison of deep learning frameworks from paperswithcode](https://paperswithcode.com/trends)
  + its state-of-the-art listings show benchmarks for deep learning applications/packages
* allows you to work with NVIDIA GPUs through an interface called CUDA

* Google Colab: colab.research.google.com
  + switch to a GPU from the Runtime tab
  + check which GPU you are using with `!nvidia-smi`
  
### PyTorch practice
* PyTorch Fundamentals practice. [The code is from the following link](https://www.learnpytorch.io/00_pytorch_fundamentals/)

In [16]:
# install pytorch without CUDA
! pip3 install torch torchvision torchaudio

Defaulting to user installation because normal site-packages is not writeable


In [11]:
import torch
import numpy as np
torch.__version__

'2.0.1+cpu'

In [12]:
# make sure numpy version is 1.24.1 (> 1.22 version)
np.__version__

'1.24.1'

### Create scalar, vector, matrix and multi-dimensional tensors
* A `torch.Tensor` is a multi-dimensional matrix containing elements of a single data type
* A tensor can be constructed from a Python list or sequence using the `torch.tensor()` constructor
  + `torch.tensor()` always copies data
  + if you have a NumPy array and want to avoid a copy, use `torch.as_tensor()`
* A tensor of a specific data type can be constructed by passing a `torch.dtype` and/or a `torch.device` to a constructor or tensor creation op
    ```python
        # create zeros of shape (2, 4) as integers
        torch.zeros([2, 4], dtype=torch.int32)

        # create a CUDA float64 tensor of shape (2, 4); this needs a CUDA-capable GPU
        cuda0 = torch.device('cuda:0')
        torch.ones([2, 4], dtype=torch.float64, device=cuda0)
    ```       
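
As a minimal sketch of the copy behaviour described above (the variable names are illustrative and not from the original notes), the following compares `torch.tensor()` with `torch.as_tensor()` on a NumPy array. It assumes a CPU array whose dtype PyTorch can use directly, which is the case where `torch.as_tensor()` can avoid the copy.

```python
import numpy as np
import torch

# illustrative source array (not part of the original notes)
source = np.array([1, 2, 3])

copied = torch.tensor(source)     # always copies the data
shared = torch.as_tensor(source)  # reuses the array's memory when dtype and device allow it

# modify the NumPy array in place
source[0] = 100

print(copied)  # the copy still holds the old first value
print(shared)  # the shared tensor sees the new first value
```

If the tensor must never change when the array does, `torch.tensor()` is the safer choice; `torch.as_tensor()` trades that safety for avoiding a copy.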

In [3]:
# create tensor variables

# scalar
scalar = torch.tensor(7)
print(f'scalar={scalar}')

# vector
vector = torch.tensor([7, 7])
print(f'vector={vector}')

# MATRIX
MATRIX = torch.tensor([[7, 7], [8, 9]])
print(f'MATRIX = {MATRIX}')

scalar=7
vector=tensor([7, 7])
MATRIX = tensor([[7, 7],
        [8, 9]])


In [15]:
# generate a random tensor of size (3, 4)
random_tensor = torch.rand(size=(3, 4))
print(random_tensor, random_tensor.dtype)
print(random_tensor.ndim)
print(random_tensor.shape)

tensor([[0.9081, 0.9764, 0.9223, 0.9591],
        [0.1779, 0.7456, 0.1191, 0.8214],
        [0.2575, 0.8997, 0.8899, 0.8312]]) torch.float32
2
torch.Size([3, 4])


In [9]:
# create 3d tensor
random_image_size_tensor = torch.rand(size=(224, 224, 3))
print(random_image_size_tensor.ndim, random_image_size_tensor.shape)

3 torch.Size([224, 224, 3])


#### Converting between torch.Tensor and NumPy array
* a CPU tensor can be converted to a NumPy array by calling its `numpy()` method
  + this needs a NumPy version compatible with the installed PyTorch (1.24.1 is used here)
* a tensor can be created directly from a NumPy array by calling `torch.from_numpy()`
* once bridged/linked, the tensor and the NumPy array share the same underlying memory
  + changing one of them in place also changes the other
* if you want the tensor and the NumPy array to be independent of each other, reassign the tensor or the NumPy array to a new object (e.g. `array = array + 1`) instead of modifying it in place

In [15]:
# convert tensor to numpy array
mat_np = MATRIX.numpy()
print(f'before, mat_np = {mat_np}')

before, mat_np = [[7 7]
 [8 9]]


In [19]:
# after modifying MATRIX, the corresponding numpy array is changed
MATRIX[0] = torch.tensor([4, 4])
print(f'after, mat_np = {mat_np}')

after, mat_np = [[4 4]
 [8 9]]


In [20]:
# create tensor from numpy array
# after changing numpy array, tensor is changed
a = np.ones(5)
b = torch.from_numpy(a)
print(f'before, a={a}, b={b}')
np.add(a, 1, out=a)
print(f'after a={a}, b={b}')

before, a=[1. 1. 1. 1. 1.], b=tensor([1., 1., 1., 1., 1.], dtype=torch.float64)
after a=[2. 2. 2. 2. 2.], b=tensor([2., 2., 2., 2., 2.], dtype=torch.float64)


In [25]:
# in the other direction, changing
# the tensor changes the numpy array
b[0] = torch.tensor(3)
print(a, b)

[3. 2. 2. 2. 2.] tensor([3., 2., 2., 2., 2.], dtype=torch.float64)


In [90]:
# to keep tensor and numpy array independent from each other
# reassign the tensor or numpy array when modifying them
array = np.arange(1.0, 8.0)
tensor = torch.from_numpy(array)
array, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [91]:
# change array, and reassign array 
# to make numpy array and tensor independent
array = array + 1
array, tensor

(array([2., 3., 4., 5., 6., 7., 8.]),
 tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64))

In [92]:
# create numpy array from tensor by tensor.numpy
# modify tensor and re-assign the modified tensor
# to tensor to make array and tensor independent
array1 = tensor.numpy()
tensor = tensor + 1
array1, tensor

(array([1., 2., 3., 4., 5., 6., 7.]),
 tensor([2., 3., 4., 5., 6., 7., 8.], dtype=torch.float64))
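
Besides reassignment, an explicit copy also breaks the link. The sketch below is an illustration rather than part of the original notes: it uses `Tensor.clone()` and `ndarray.copy()`, and the names `independent_tensor` and `independent_array` are made up for this example.

```python
import numpy as np
import torch

array = np.arange(1.0, 8.0)

# clone() gives the tensor its own copy of the data
independent_tensor = torch.from_numpy(array).clone()

# copy() gives the NumPy array its own copy of the data
independent_array = independent_tensor.numpy().copy()

# in-place changes no longer propagate in either direction
array[0] = 100.0
independent_tensor[1] = 200.0

print(array)
print(independent_tensor)
print(independent_array)
```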

#### Create zeros and ones vectors and tensors

In [27]:
zeros = torch.zeros(size=(3, 4))
zeros, zeros.dtype

(tensor([[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]),
 torch.float32)

In [28]:
ones = torch.ones(size=(3, 4))
ones, ones.dtype

(tensor([[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]),
 torch.float32)

In [30]:
# get the numbers in a range, the same as python range
torch.arange(0, 10)

tensor([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### Data types
* tensors can live on the CPU or on a GPU, so each tensor has both a dtype and a device
* a device of `cuda` (e.g. `cuda:0`) means the tensor is stored on a GPU
* most common types are torch.float32 and torch.float64/torch.double
  + also torch.float16
  
* PyTorch operations often expect tensors to have the same dtype
* PyTorch operations expect tensors to be on the same device, rather than one on the CPU and the other on the GPU
* you can get the dtype, device and shape of a tensor from its attributes
  + `some_tensor.shape`
  + `some_tensor.dtype`
  + `some_tensor.device`

In [32]:
# Default datatype for floating-point tensors is float32
float_32_tensor = torch.tensor([3.0, 6.0, 9.0],
                               dtype=None, # defaults to None, which infers the dtype from the data (torch.float32 for Python floats)
                               device=None, # defaults to None, which uses the current default device (the CPU here)
                               requires_grad=False) # if True, operations performed on the tensor are recorded for autograd

float_32_tensor.shape, float_32_tensor.dtype, float_32_tensor.device

(torch.Size([3]), torch.float32, device(type='cpu'))
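
To keep tensors on the same device, a common pattern is to choose the device once and move tensors to it. This is a hedged sketch, not code from the original notes; the names `device`, `cpu_tensor` and `moved_tensor` are illustrative, and on a machine without CUDA it simply stays on the CPU.

```python
import torch

# pick a GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

cpu_tensor = torch.tensor([3.0, 6.0, 9.0])
moved_tensor = cpu_tensor.to(device)  # returns a tensor on the chosen device

# shape and dtype are unchanged; only the device may differ
print(moved_tensor.shape, moved_tensor.dtype, moved_tensor.device)
```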

### Basic operations
* basic operations include
  + addition
  + subtraction
  + multiplication
* all basic operations are vectorized (applied element-wise)
  + operators such as +, - and * return a new tensor; the original is unchanged unless you reassign the result to the original variable
  + you can use functions such as torch.add and torch.mul with the same effect as the operators
  + operators are more commonly used

In [35]:
# create a tensor
tensor = torch.tensor([1, 2, 3])
print(f'original tensor = {tensor}')

# addition
print(f'addition, tensor+10 = {tensor + 10}')

# subtraction
print(f'subtraction, tensor-10 = {tensor - 10}')

# multiplication
print(f'multiplication, tensor*10 = {tensor * 10}')

# all operations will not change original tensor
print(f'tensor after operations = {tensor}')

# use torch.add 
print(f'torch.add(tensor, 10) = {torch.add(tensor, 10)}')

# use torch.multiply
print(f'torch.multiply(tensor, 10) = {torch.multiply(tensor, 10)}')

original tensor = tensor([1, 2, 3])
addition, tensor+10 = tensor([11, 12, 13])
subtraction, tensor-10 = tensor([-9, -8, -7])
multiplication, tensor*10 = tensor([10, 20, 30])
tensor after operations = tensor([1, 2, 3])
torch.add(tensor, 10) = tensor([11, 12, 13])
torch.multiply(tensor, 10) = tensor([10, 20, 30])
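
The difference between returning a new tensor and changing a tensor can be sketched as follows. The in-place method `add_` (note the trailing underscore) is not used in the original notes and is shown here only for contrast; the variable names are illustrative.

```python
import torch

tensor = torch.tensor([1, 2, 3])

# out-of-place: a new tensor is returned, the original is unchanged
result = tensor + 10

# in-place: methods ending in an underscore modify the tensor itself
in_place = torch.tensor([1, 2, 3])
in_place.add_(10)

print(tensor)    # still the original values
print(result)    # original values plus 10
print(in_place)  # modified directly
```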


### Matrix multiplication
* element-wise multiplication of two tensors of identical size
  + directly multiply the two tensors with `*`
* matrix multiplication (inner dimensions must match)
  + use torch.matmul
  + can also use the `@` operator, which is equivalent to torch.matmul; these notes use torch.matmul

In [36]:
# initialize tensor
tensor = torch.tensor([1, 2, 3])
tensor.shape

torch.Size([3])

In [37]:
# element-wise multiplication
tensor * tensor

tensor([1, 4, 9])

In [38]:
# matrix multiplication
torch.matmul(tensor, tensor)

tensor(14)
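
For two 1-D tensors, `torch.matmul` reduces to a dot product. The following sketch (variable names are illustrative) checks that several ways of writing it give the same value, including `torch.dot`, which the original notes do not use.

```python
import torch

tensor = torch.tensor([1, 2, 3])

by_matmul = torch.matmul(tensor, tensor)   # matrix multiplication of two vectors
by_operator = tensor @ tensor              # operator form of the same call
by_dot = torch.dot(tensor, tensor)         # explicit dot product
by_sum = (tensor * tensor).sum()           # element-wise product, then sum

print(by_matmul, by_operator, by_dot, by_sum)
```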

In [41]:
%%time
# a Python for loop is slower than torch.matmul
# (compare the wall time with the next cell)

value = 0
for num in tensor:
    value += num * num
    
value    

CPU times: total: 0 ns
Wall time: 4 ms


tensor(14)

In [42]:
%%time
torch.matmul(tensor, tensor)

CPU times: total: 0 ns
Wall time: 0 ns


tensor(14)

#### Matrix shape matching in matrix multiplications
* a tensor has a `.T` property, which returns its transpose
* torch.transpose(input, dim0, dim1) swaps the two given dimensions of a tensor
* the torch.nn.Linear() module, also known as a feed-forward layer or fully connected layer, implements a matrix multiplication between an input x and a weight matrix A, plus a bias b: y = xA^T + b
  + in PyTorch, each sample occupies a row, so the number of features is the number of columns
  + torch.nn.Linear takes `in_features` and `out_features`, the number of features of the input and of the output
  + torch.nn.Linear creates a randomly initialized weight matrix of shape (out_features, in_features) and uses it to transform the input into the output

In [50]:
tensor_A = torch.tensor([[1, 2],[3, 4], [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 8], [9, 10], [11, 12]], dtype=torch.float32)

In [51]:
# view tensor_A and tensor_B
print(tensor_A)
print(tensor_B)

tensor([[1., 2.],
        [3., 4.],
        [5., 6.]])
tensor([[ 7.,  8.],
        [ 9., 10.],
        [11., 12.]])


In [52]:
torch.mm(tensor_A, tensor_B.T)

tensor([[ 23.,  29.,  35.],
        [ 53.,  67.,  81.],
        [ 83., 105., 127.]])
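
The inner-dimension rule can be checked directly. This sketch reuses the values of `tensor_A` and `tensor_B` from above; the flag `mismatch_raised` and the result names are illustrative. It assumes that `torch.mm` reports incompatible shapes as a `RuntimeError`.

```python
import torch

tensor_A = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)
tensor_B = torch.tensor([[7, 8], [9, 10], [11, 12]], dtype=torch.float32)

# (3, 2) x (3, 2): inner dimensions 2 and 3 do not match
try:
    torch.mm(tensor_A, tensor_B)
    mismatch_raised = False
except RuntimeError as error:
    mismatch_raised = True
    print(f"shape mismatch: {error}")

# transposing either operand makes the inner dimensions agree
product_3x3 = torch.mm(tensor_A, tensor_B.T)  # (3, 2) x (2, 3) -> (3, 3)
product_2x2 = torch.mm(tensor_A.T, tensor_B)  # (2, 3) x (3, 2) -> (2, 2)

print(product_3x3.shape, product_2x2.shape)
```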

In [53]:
# use torch.nn.Linear to create a fully
# connected layer that transforms the
# input matrix into the output matrix
torch.manual_seed(42)

linear = torch.nn.Linear(in_features=2, out_features=6)
x = tensor_A
output = linear(x)

print(f'input shape: {x.shape}')
print(f'output shape: {output.shape}')

input shape: torch.Size([3, 2])
output shape: torch.Size([3, 6])
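
To make the matrix multiplication inside the layer visible, the sketch below recomputes the output by hand from the layer's `weight` and `bias` parameters. The name `manual` is illustrative, and the comparison uses `torch.allclose` because the two computations may differ by floating-point rounding.

```python
import torch

torch.manual_seed(42)
linear = torch.nn.Linear(in_features=2, out_features=6)
x = torch.tensor([[1, 2], [3, 4], [5, 6]], dtype=torch.float32)

output = linear(x)

# same computation written out: input times transposed weight, plus bias
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape)             # (out_features, in_features)
print(torch.allclose(output, manual))  # compare up to rounding error
```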


### Transformation of Tensors
#### Aggregation of tensor
* we can find the min, max, mean and sum of tensors using tensor's aggregation methods
  + x.min()
  + x.max()
  + x.mean()
  + x.sum()

In [54]:
x = torch.arange(0, 100, 10)
x

tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])

In [56]:
print(f"min is {x.min()}")
print(f"max is {x.max()}")
print(f"average is {x.type(torch.float32).mean()}")
print(f"sum is {x.sum()}")
      

min is 0
max is 90
average is 45.0
sum is 450
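
The cast to `torch.float32` before `mean()` matters because `torch.arange` with integer arguments produces an integer tensor. The sketch below assumes, as in the PyTorch versions I am aware of, that calling `mean()` on an integer tensor raises a `RuntimeError`; the flag `mean_failed` is illustrative.

```python
import torch

x = torch.arange(0, 100, 10)  # integer tensor (torch.int64)

# mean() needs a floating-point (or complex) dtype
try:
    x.mean()
    mean_failed = False
except RuntimeError as error:
    mean_failed = True
    print(f"mean on an integer tensor failed: {error}")

# casting first makes the call valid
mean_value = x.type(torch.float32).mean()
print(x.dtype, mean_value)
```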


#### Change tensor datatype
* A common issue with deep learning operations is having your tensors in different datatypes
  + if one tensor is in torch.float64 and another is in torch.float32, you might run into errors
  + you can create a tensor with the desired datatype from the current tensor with tensor.type(dtype)
    + the function returns a new tensor with that dtype, holding the original data content
* Different datatypes
  + the lower the number (e.g. 32, 16, 8), the less precisely a computer stores the value
  + a lower amount of storage generally results in faster computation and a smaller overall model
  + mobile-based neural networks often operate with 8-bit integers, which are smaller and faster to run but less accurate than their float32 counterparts

In [59]:
# create a tensor and check its datatype
tensor = torch.arange(10., 100., 10.)
tensor.dtype

torch.float32

In [60]:
# create a float16 tensor
tensor_float16 = tensor.type(dtype=torch.float16)
tensor_float16

tensor([10., 20., 30., 40., 50., 60., 70., 80., 90.], dtype=torch.float16)
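
The storage and precision trade-off can be illustrated with a small sketch. It uses `Tensor.element_size()` to read the bytes per element and compares how closely each dtype stores the value 0.1; the variable names are illustrative.

```python
import torch

tensor = torch.arange(10., 100., 10.)
tensor_float16 = tensor.type(torch.float16)

# bytes used per element in each dtype
bytes_float32 = tensor.element_size()
bytes_float16 = tensor_float16.element_size()

# 0.1 cannot be stored exactly; float16 is further off than float32
value_float32 = torch.tensor(0.1, dtype=torch.float32)
value_float16 = value_float32.type(torch.float16)
error_float32 = abs(value_float32.item() - 0.1)
error_float16 = abs(value_float16.item() - 0.1)

print(bytes_float32, bytes_float16)
print(error_float32, error_float16)
```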

#### Reshape, stacking, squeezing and unsqueezing
* torch.reshape(input, shape). Can also use torch.Tensor.reshape()
* torch.Tensor.view(shape)
  + returns a view of the original tensor in a different shape that shares the same data as the original tensor
* torch.stack(tensors, dim=0)
  + concatenates a sequence of tensors along a new dimension (dim); all tensors must be the same size
* torch.squeeze(input)
  + removes all dimensions of size 1 from input
* torch.unsqueeze(input, dim)
  + returns input with a dimension of size 1 added at dim
* torch.permute(input, dims)
  + returns a view of the original input with its dimensions permuted (rearranged) according to dims

#### Reshape code examples

In [61]:
# create a tensor
import torch

x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [62]:
# reshape the tensor by adding an extra dimension
x_reshaped = x.reshape(1, 7)
x, x_reshaped, x_reshaped.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]),
 tensor([[1., 2., 3., 4., 5., 6., 7.]]),
 torch.Size([1, 7]))

In [63]:
# reshape(1,7) is the same as unsqueeze
torch.unsqueeze(x, 0)

tensor([[1., 2., 3., 4., 5., 6., 7.]])

In [69]:
# create a view with Tensor.view
z = x.view(1, 7)
z, z.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [68]:
# note that a view shares the same underlying data with the original tensor
# if you change the view, the original tensor is changed, too
z[:, 0] = 5
z, x

(tensor([[5., 2., 3., 4., 5., 6., 7.]]), tensor([5., 2., 3., 4., 5., 6., 7.]))

In [72]:
# you can specify the element by indices to reset element
z[0][0] = 1.
z, x

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), tensor([1., 2., 3., 4., 5., 6., 7.]))
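
One way to confirm that a view shares memory with the original tensor is to compare the addresses returned by `Tensor.data_ptr()`. This is a small sketch for illustration; the name `shares_memory` is made up here.

```python
import torch

x = torch.arange(1., 8.)
z = x.view(1, 7)

# both tensors start at the same memory address
shares_memory = x.data_ptr() == z.data_ptr()
print(shares_memory)

# writing through the view is visible in the original tensor
z[0, 0] = 5.
print(x)
```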

#### Stack code examples

In [76]:
# stack tensors on top of each other
# is done by setting dim = 0. This is
# also called vertical stack
x = torch.arange(1., 8.)
x_stacked_v = torch.stack([x, x, x, x, x], dim = 0)
x, x_stacked_v, x_stacked_v.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]),
 tensor([[1., 2., 3., 4., 5., 6., 7.],
         [1., 2., 3., 4., 5., 6., 7.],
         [1., 2., 3., 4., 5., 6., 7.],
         [1., 2., 3., 4., 5., 6., 7.],
         [1., 2., 3., 4., 5., 6., 7.]]),
 torch.Size([5, 7]))

In [77]:
# stack the tensors as columns by setting
# dim = 1 (a horizontal stack)
x_stacked_h = torch.stack([x, x, x, x, x], dim = 1)
x,x_stacked_h, x_stacked_h.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]),
 tensor([[1., 1., 1., 1., 1.],
         [2., 2., 2., 2., 2.],
         [3., 3., 3., 3., 3.],
         [4., 4., 4., 4., 4.],
         [5., 5., 5., 5., 5.],
         [6., 6., 6., 6., 6.],
         [7., 7., 7., 7., 7.]]),
 torch.Size([7, 5]))
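
The relationship between the two stacking directions can be checked in a few lines. In this sketch the names `rows` and `columns` are illustrative; for 1-D inputs, stacking along `dim=1` gives the transpose of stacking along `dim=0`.

```python
import torch

x = torch.arange(1., 8.)

rows = torch.stack([x, x, x, x, x], dim=0)     # each x becomes a row
columns = torch.stack([x, x, x, x, x], dim=1)  # each x becomes a column

print(rows.shape, columns.shape)

# the horizontal stack equals the transposed vertical stack
is_transpose = torch.equal(columns, rows.T)
print(is_transpose)
```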

#### Squeeze and unsqueeze code example

In [78]:
x = torch.arange(1., 8.)
x, x.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))

In [79]:
x_reshaped = x.reshape(1, 7)
x_reshaped, x_reshaped.shape

(tensor([[1., 2., 3., 4., 5., 6., 7.]]), torch.Size([1, 7]))

In [80]:
x_squeezed = x_reshaped.squeeze()
x_squeezed, x_squeezed.shape

(tensor([1., 2., 3., 4., 5., 6., 7.]), torch.Size([7]))
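
The cells above only squeeze; as a complementary sketch (with illustrative variable names), `unsqueeze` adds a dimension of size 1 at the chosen position, and `squeeze` removes it again.

```python
import torch

x = torch.arange(1., 8.)

row = x.unsqueeze(dim=0)     # new dimension in front: one row of 7 values
column = x.unsqueeze(dim=1)  # new dimension at the end: 7 rows of one value

print(row.shape, column.shape)

# squeeze removes the size-1 dimension again
restored = column.squeeze()
print(restored.shape)
```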

#### Permute tensor's axis order
* permute operation returns a view of the original tensor
* if you change the values in the view, the original tensor will be changed accordingly

In [81]:
x_original = torch.rand(size=(224, 224, 3))

# permute the original tensor to rearrange the axis order:
# the new axes 0, 1, 2 are the original axes 2, 0, 1,
# so the shape (224, 224, 3) becomes (3, 224, 224)
x_permuted = x_original.permute(2, 0, 1)

x_original.shape, x_permuted.shape

(torch.Size([224, 224, 3]), torch.Size([3, 224, 224]))
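
A short sketch of the shared-data behaviour: a value written into the original tensor shows up in the permuted view at the correspondingly rearranged index. The index values and the name `value_in_view` are chosen only for illustration.

```python
import torch

x_original = torch.rand(size=(224, 224, 3))
x_permuted = x_original.permute(2, 0, 1)

# write through the original tensor at (height=1, width=2, channel=0)
x_original[1, 2, 0] = 0.5

# the same element sits at (channel=0, height=1, width=2) in the view
value_in_view = x_permuted[0, 1, 2].item()
print(value_in_view)
```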

#### Indexing
* this is used to select a portion/slice of data from a tensor
* a key concept is that indexing values goes outer dimension -> inner dimension
* you can use `:` to specify "all values in this dimension" and then use a comma to add another dimension

In [82]:
# create a tensor with three dimensions 
x = torch.arange(1., 10.).reshape(1, 3, 3)
x, x.shape

(tensor([[[1., 2., 3.],
          [4., 5., 6.],
          [7., 8., 9.]]]),
 torch.Size([1, 3, 3]))

In [83]:
# index the first, second and third dimension
# note the dimension is from outer to inner
print(f'First dimension:\n{x[0]}')
print(f'second dimension:\n{x[0][0]}')
print(f'third dimension:\n{x[0][0][0]}')

First dimension:
tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])
second dimension:
tensor([1., 2., 3.])
third dimension:
1.0


In [84]:
# demonstration of using : to select all

# get all values of the 0th dimension and index 0 of the 1st dimension
# note that there is only one element in the 0th dimension, so this
# holds the same values as x[0, 0, :], but it keeps the 0th
# dimension and has shape (1, 3) instead of (3,)
x[:, 0]

tensor([[1., 2., 3.]])

In [86]:
# get all values of the 0th and 1st dimensions but only index 1 of the 2nd dimension
x[:, :, 1]

tensor([[2., 5., 8.]])

In [88]:
# get all values of the 0th dimension but only index 1 of the
# 1st and 2nd dimensions
x[:, 1, 1]

tensor([5.])

In [89]:
# get index 0 of 0th and 1st dimension and all values of 2nd dimension
x[0, 0, :] # same as x[0][0]

tensor([1., 2., 3.])
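
The effect of `:` versus an integer index on the result's shape can be summarised in a short sketch (variable names are illustrative): an integer index drops that dimension, while `:` keeps it.

```python
import torch

x = torch.arange(1., 10.).reshape(1, 3, 3)

kept = x[:, 0]       # keeps the 0th dimension
dropped = x[0, 0]    # drops the 0th and 1st dimensions
column = x[:, :, 1]  # keeps the 0th and 1st dimensions
single = x[:, 1, 1]  # keeps only the 0th dimension

print(kept.shape, dropped.shape, column.shape, single.shape)
```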

### Reproducibility
* the aim is to get the same results each time the same code is run, ideally also on different computers
* fix the source of randomness with torch.manual_seed(seed), where seed is an integer

In [94]:
import random

# set random seed
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# The seed has to be reset before each rand() call that should repeat
# Otherwise, random_tensor_D will be different from random_tensor_C
torch.manual_seed(RANDOM_SEED)
random_tensor_D = torch.rand(3, 4)

# test if tensor_C == tensor_D
random_tensor_C == random_tensor_D



tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
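
To see why the seed is reset, the following sketch compares a second draw made without reseeding with one made after reseeding. The names `first`, `second` and `repeated` are illustrative, and `torch.equal` is used to reduce the comparison to a single boolean.

```python
import torch

RANDOM_SEED = 42

torch.manual_seed(RANDOM_SEED)
first = torch.rand(3, 4)
second = torch.rand(3, 4)    # no reseed: continues the random sequence

torch.manual_seed(RANDOM_SEED)
repeated = torch.rand(3, 4)  # reseeded: restarts the sequence

print(torch.equal(first, second))    # draws without reseeding differ
print(torch.equal(first, repeated))  # draws after reseeding match
```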