# PyTorch Introduction

Welcome to the introduction of PyTorch. PyTorch is a scientific computing package targeted for two main purposes: 

1. A replacement for NumPy with the ability to use the power of GPUs.

2. A deep learning framework that enables the flexible and swift building of neural network models.

Let's get started!


### Goals of this tutorial

- Getting to know PyTorch and understanding how it is different from numpy

- Understanding PyTorch's Tensor and Pytorch's Autograd

## Enable GPUs on Colab

Having a library that has GPU support is one thing, the other is actually owning the hardware. Alternatively, you can use google colab though we have to manually enable it.

To enable GPU support in Google Colab go to `Menu > Runtime > Change runtime type` and enable the GPU hardware accelerator to speed up your trainings considerably. However, this functionality might not be available at any time.


# Installing PyTorch

Pytorch provides support for accelerating computation using CUDA enabled GPU's. If your workstation has an NVIDIA GPU, install PyTorch along with the CUDA component.

#### Install [PyTorch](https://pytorch.org/) and [torchvision](https://github.com/pytorch/vision)

For this class we will use the current Pytorch version 1.11. To install, please uncomment and run the proper line in the upcoming cell depending on your operating system (and CUDA setup). We won't go into details of installing

In [1]:
# Install a pip package in the current Jupyter kernel
import sys

# For google colab
# !python -m pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html

# For Linux and probably Windows (CPU)
# !{sys.executable} -m pip install torch==1.11.0+cpu torchvision==0.12.0+cpu -f https://download.pytorch.org/whl/torch_stable.html

# For Linux and probably Windows (Prerequisites: Nvidia GPU + CUDA toolkit 11.3)
# !{sys.executable} -m pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

# For OS X/Mac
# !{sys.executable} -m pip install torch==1.11.0 torchvision==0.12.0 -f https://download.pytorch.org/whl/torch_stable.html

<div class="alert alert-block alert-warning">
    <b>Nvidia GPU</b>
    <p>If you have a rather recent Nvidia GPU, you can go ahead and install the CUDA toolkit together with a current version of cudnn (though it is possible to use other versions as long as you build it yourself). Afterwards, you can run the respective line in the cell above.</p>
    <p>There are multiple setups on how to install those on both Linux and Windows, but it depends on your setup. If you want to utilize your GPU you have to go through those steps. Use the forum for help if you get stuck.</p>
    <br>
    <b>Google Colab Pytorch Installation Time</b>
    <p>Google colab might use an older/newer version of pytorch. Since we are mostly using defualt functionality, you should be fine by using the default colab version to avoid the long installation time at your own risk.</p>
</div>

#### Checking PyTorch Installation and Version

In [2]:
import torch
import torchvision
print(f"PyTorch version Installed: {torch.__version__}\nTorchvision version Installed: {torchvision.__version__}\n")
if not torch.__version__.startswith("1.11"):
    print(f"you are using an another version of PyTorch. We expect PyTorch 1.11.0. You may continue using your version but it"
          " might cause dependency and compatibility issues.")
if not torchvision.__version__.startswith("0.12"):
    print(f"you are using an another version of torchvision. We expect torchvision 0.12. You can continue with your version but it"
          " might cause dependency and compatibility issues.")
# set printing options for nice output in this notebook
torch.set_printoptions(profile="short")

PyTorch version Installed: 1.10.2+cu113
Torchvision version Installed: 0.11.3+cu113

you are using an another version of PyTorch. We expect PyTorch 1.11.0. You may continue using your version but it might cause dependency and compatibility issues.
you are using an another version of torchvision. We expect torchvision 0.12. You can continue with your version but it might cause dependency and compatibility issues.


That's the end of installation. Let's dive right into PyTorch!


The following block imports the required packages for the rest of the notebook.

In [3]:

import numpy as np
import matplotlib.pyplot as plt
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler

import os
import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'

%load_ext autoreload
%autoreload 2
%matplotlib inline

## 1. Tensors

[`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) is the central class of PyTorch.
Tensors are similar to NumPyâ€™s ndarrays. The advantage of using Tensors is that 

* one can easily transfer them from CPU to GPU and therefore computations on tensors can be accelerated with a GPU.
* they store additionally the gradients, if requires_grad=True is set, which is needed for efficient backpropagation.

## 1.1 Initializing Tensor
Let us construct a NumPy array and a tensor of shape (2,3) directly from data values.

The interfaces are very similar.

In [9]:
# Initializing the Numpy Array
array_np = np.array([[1,2,3],[5,6,7]]) #NumPy array
# Initializing the Tensor
array_ts = torch.tensor([[1,2,3],[4,5,6]]) # Tensor

print(f"Variable array_np:\nDatatype: {type(array_np)}\nShape: {array_np.shape}")
print(f"Values:\n{array_np}")
print(f"\n\nVariable array_ts:\nDatatype {type(array_ts)}\nShape: {array_ts.shape}")
print(f"Values:\n{array_ts.cpu().numpy()}")

Variable array_np:
Datatype: <class 'numpy.ndarray'>
Shape: (2, 3)
Values:
[[1 2 3]
 [5 6 7]]


Variable array_ts:
Datatype <class 'torch.Tensor'>
Shape: torch.Size([2, 3])
Values:
[[1 2 3]
 [4 5 6]]


## 1.2 Conversion between NumPy array and Tensor

The conversion between NumPy ndarray and PyTorch tensor is quite easy.


In [8]:
# Conversion
array_np = np.array([1, 2, 3])
# Conversion from  a numpy array to a Tensor
array_ts_2 = torch.from_numpy(array_np)

# Conversion from  Tensor to numpy array
array_np_2 = array_ts_2.numpy() 

# Change a value of the np_array
array_np_2[1] = -1 

# Changes in the numpy array will also change the values in the tensor
assert(array_np[1] == array_np_2[1])

<div class="alert alert-block alert-info"><b></b> During the conversion, both ndarrays and the Tensor share the same memory address. Changes in value of one will
affect the other.</div>

## 1.3 Operations on Tensor

### 1.3.1 Indexing

We can use the NumPy array-like indexing for Tensors.

In [14]:
array_ts[:2,:2][:, 0]

tensor([0, 0])

In [10]:
# Let us take the first two columns from the original tensor array and save it in a new one
b = array_ts[:2, :2] 

# Let's assign the value of first column of the new variable to be zero 
b[:, 0] = 0 
print(b)

tensor([[0, 2],
        [0, 5]])


We will now select elements which satisfy a particular condition. In this example, let's find those elements of tensor which are array greater than one.

In [15]:
# Index of the elements with value greater than one
mask = array_ts > 1 
new_array = array_ts[mask]
print(new_array)

tensor([2, 3, 5, 6])


Let's try performing the same operation in a single line of code!

In [16]:
c = array_ts[array_ts>1]

# Is the result same as the array from the previous cell?
print(c == new_array)

tensor([True, True, True, True])


### 1.3.2 Mathematical operations on Tensor


#### Element-wise operations on Tensors

In [17]:
x = torch.tensor([[1,2],[3,4]])
y = torch.tensor([[5,6],[7,8]])

# Addition - Syntax 1
print(f"x + y: \n{(x + y).cpu().numpy()}")

# Addition - Syntax 2
print(f"x + y: \n{torch.add(x, y).cpu().numpy()}")

# Addition - Syntax 3
result_add = torch.empty(2, 2)
torch.add(x, y, out=result_add)
print(f"x + y: \n{result_add.cpu().numpy()}")

x + y: 
[[ 6  8]
 [10 12]]
x + y: 
[[ 6  8]
 [10 12]]
x + y: 
[[ 6.  8.]
 [10. 12.]]


Note: We only added `.cpu().numpy()` to receive a better formatted print statement.

Similar syntax holds for other element-wise operations such as subtraction and multiplication.

When dividing two integers in NumPy as well PyTorch, the result is always a **float**.   
For example,

In [18]:
x_np = np.array([[1,2],[3,4]])
y_np = np.array([[5,6],[7,8]])
print(f"x / y: \n{x_np / y_np}")

x / y: 
[[0.2        0.33333333]
 [0.42857143 0.5       ]]


#### Matrix Multiplication

PyTorch offers different options for doing matrix matrix multiplication.

If you want to do matrix mupliplication with more then two tensors you can use [torch.einsum()](https://pytorch.org/docs/stable/generated/torch.einsum.html). Einsum allows computing many common multi-dimensional linear algebraic array operations by representing them in a short-hand format based on the Einstein summation convention.

In [20]:
tensor1 = torch.randn(3, 3)
tensor2 = torch.randn(3)

# Matrix Multiplication - Syntax 1
output1 = tensor1 @ tensor2
# Matrix Multiplication - Syntax 2
output2 = torch.matmul(tensor1, tensor2)
# Matrix Multiplication - Syntax 3
output3 = torch.einsum("ij,j->i", tensor1, tensor2)

print(f"Matrix mutlplication\nInputs:\n{tensor1.cpu().numpy()}\nand\n{tensor2.cpu().numpy()} \n\n",
      f"Output1: \n{output1.cpu().numpy()}\n",
      f"Output2: \n{output2.cpu().numpy()}\n",
      f"Output3: \n{output3.cpu().numpy()}")

assert output1.equal(output2)
assert output2.equal(output3)

Matrix mutlplication
Inputs:
[[ 0.48588315 -0.8225588   0.14899746]
 [ 0.69823587 -0.8034909   0.16900674]
 [ 0.8192266  -0.43813497  0.12787765]]
and
[ 0.57209957 -0.61691636  0.88316196] 

 Output1: 
[0.9170124 1.0444074 0.8519085]
 Output2: 
[0.9170124 1.0444074 0.8519085]
 Output3: 
[0.9170124 1.0444074 0.8519085]


Doing matrix multiplication with more than two tensors.

In [21]:
tensor1 = torch.randn(3)
tensor2 = torch.randn(3, 3)
tensor3 = torch.randn(3)
# Matrix Multiplication - Syntax 1
output1 = tensor1 @ tensor2 @ tensor3
# Matrix Multiplication - Syntax 2
output2 = torch.einsum("i,ij,j", tensor1, tensor2, tensor3)

print(f"Chain multiplication:\n{output1}\n{output2}")

Chain multiplication:
-1.0988529920578003
-1.0988529920578003


When working with PyTorch operations are often done over a batch. PyTorch offers batching and broadcasting for matrix multiplication.

In [24]:
tensor1 = torch.randn(3, 4)
tensor2 = torch.randn(4)
print(
    f"matrix x vector multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

matrix x vector multiplication:
 Input shapes:
 [3, 4] and [4]
 Output shape:
 [3]



In [25]:
# vector x vector
tensor1 = torch.randn(3)
tensor2 = torch.randn(3)
torch.matmul(tensor1, tensor2).size()
print(
    f"vector x vector multiplication:\n",
    f"Input shapes:\n", 
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# matrix x vector
tensor1 = torch.randn(3, 4)
tensor2 = torch.randn(4)
print(
    f"matrix x vector multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x broadcasted vector
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
print(
    f"batched matrix x broadcasted vector multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x batched matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)
print(
    f"batched matrix x batched matrix multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x broadcasted matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4, 5)
print(
    f"batched matrix x broadcasted matrix multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

vector x vector multiplication:
 Input shapes:
 [3] and [3]
 Output shape:
 []

matrix x vector multiplication:
 Input shapes:
 [3, 4] and [4]
 Output shape:
 [3]

batched matrix x broadcasted vector multiplication:
 Input shapes:
 [10, 3, 4] and [4]
 Output shape:
 [10, 3]

batched matrix x batched matrix multiplication:
 Input shapes:
 [10, 3, 4] and [10, 4, 5]
 Output shape:
 [10, 3, 5]

batched matrix x broadcasted matrix multiplication:
 Input shapes:
 [10, 3, 4] and [4, 5]
 Output shape:
 [10, 3, 5]



For addtional mathematical operations check out the [PyTorch](https://pytorch.org/docs/stable/index.html) documentation

## 1.4 Gradients

We create two tensors a and b with requires_grad=True. This signals to `autograd` that every operation on them should be tracked. We create another tensor ``Q`` from ``a`` and ``b``. 

$Q = 3a^3 - b^2$

`autograd` then let us compute the gradient of ``Q`` with respect to ``a`` and ``b``. In this case

$\frac{\partial Q}{\partial a} = 9a^2$

$\frac{\partial Q}{\partial b} = -2b$

In [35]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
q  = 3*a**3 - b**2

q.sum().backward()
print(a.grad.cpu())
print(b.grad.cpu())

tensor([36., 81.])
tensor([-12.,  -8.])


In [36]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# compute the a function with the pytorch tensors
Q = 3*a**3 - b**2

# call backward on a function to compute the gradient
Q.sum().backward()
print(f"Gradients:\na:\n{a.grad.cpu().numpy()}\nb:\n{b.grad.cpu().numpy()}")

Gradients:
a:
[36. 81.]
b:
[-12.  -8.]


Disable the gradient computation for single tensors by setting `requires_grad=False`.

In [37]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=False)

# compute the a function with the pytorch tensors
Q = 3*a**3 - b**2

# call backward on a function to compute the gradient
Q.sum().backward()
print(f"Gradients:\na:\n{a.grad.cpu().numpy()}\nb:\n{b.grad}")

Gradients:
a:
[36. 81.]
b:
None


When doing evaluations you can wrap a code block in 
`with torch.no_grad()`
to prevent gradient computation.

In [38]:
with torch.no_grad():
  a = torch.tensor([2., 3.], requires_grad=True)
  b = torch.tensor([6., 4.], requires_grad=False)

  # compute the a function with the pytorch tensors
  Q = 3*a**3 - b**2

  # call backward with torch.no_grad() enabled results in a runtime error
  try:
    Q.sum().backward()
  except RuntimeError as e:
    print(f"RuntimeError: {e}")
  print(f"Gradients:\na:\n{a.grad}\nb:\n{b.grad}")

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Gradients:
a:
None
b:
None


## 1.5 Devices

When training a neural network, it is important to make sure that all the required tensors as well as the model are on the same device. Tensors can be moved between the CPU and GPU using `.to` method.

Let us check if a GPU is available. If it is available, we will assign it to `device` and move the tensor `x` to the GPU.

In [39]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

print(f"Original device: {x.device}") # "cpu"

tensor = x.to(device)
print(f"Current device: {tensor.device}") #"cpu" or "cuda"

cuda
Original device: cpu
Current device: cuda:0


So `x` has been moved on to a CUDA device for those who have a GPU; otherwise it's still on the CPU.

<div class="alert alert-block alert-info"><b>Tip:</b> Try including the <b>.to(device)</b> calls in your codes. It is then easier to port the code to run on a GPU.</div>

## 1.6 Timing with PyTorch

Timing CPU-only operations can be done with standard python timing operations, e.g. timeit.

Since CUDA is asynchronous, timing GPU operations needs some additional tools. One option uses CUDA events. Timing the matrix multiplication is done by sandwiching the call between CUDA events.

Other timing options that use the PyTorch [autograd profiler](https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20profiler#torch.autograd.profiler.profile) are also possible.

In [40]:
import time

# create random variables to do matrix multiplication with
A = torch.randn((10, 10000, 10000), device="cpu")
b = torch.randn((10000, 1), device="cpu")

start_cpu = time.perf_counter()
results_cpu = A @ b
end_cpu = time.perf_counter()

# Waits for everything to finish running
print(f"Time with cpu in sec: \n{end_cpu - start_cpu}")

A.to(device)
b.to(device)

# create a start and end cuda event used for timing
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
results_gpu = A @ b
end.record()

# Waits for everything to finish running
torch.cuda.synchronize()
print(f"Timing with {device} in sec: \n{start.elapsed_time(end) / 1000}")

Time with cpu in sec: 
0.1541735000000699
Timing with cuda in sec: 
0.12709046173095703
