# Exercise 00
## PyTorch Introduction - Part I

Welcome to this introduction to PyTorch. PyTorch is a scientific computing package that serves two main purposes:

1. A replacement for NumPy with the ability to use the power of GPUs.
2. A deep learning framework that enables the flexible and swift building of neural network models.

Let's get started!


### Goals of this tutorial

- Getting to know PyTorch and understanding how it differs from NumPy
- Understanding PyTorch's Tensor and PyTorch's Autograd

#### Checking PyTorch Installation and Version
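Before we begin with the code, we first have to check that PyTorch is installed and which version is available. For this, just run the next code cell.

In [None]:
import torch
import torchvision
# print the installed versions
print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")
# set printing options for nice output in this notebook
torch.set_printoptions(profile="short")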

Now, we are almost ready to dive right into PyTorch! But first, we need to import the rest of the packages required for this notebook. A cell like this is included at the top of every exercise notebook.

We added the following options to the notebook to make your experience smoother:
- %load_ext autoreload
- %autoreload 2
- %matplotlib inline

The first two options continuously reload code that lives outside of this notebook, so changes to external files take effect without restarting the kernel. This becomes important later on in the exercises. The last option renders matplotlib plots inline in the Jupyter notebook.

In [None]:
import numpy as np

%load_ext autoreload
%autoreload 2
%matplotlib inline

## 1. Tensors

[`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) is the central class of PyTorch.
Tensors are similar to NumPy’s ndarrays. The advantages of using Tensors are that

* they can easily be transferred from CPU to GPU, so computations on them can be accelerated with a GPU.
* they additionally store gradients if `requires_grad=True` is set, which is needed for efficient backpropagation.

## 1.1 Initializing a Tensor
Let us construct a NumPy array and a tensor of shape (2,3) directly from data values.

The interfaces are very similar.

In [None]:
# Initializing the Numpy Array
array_np = np.array([[1,2,3],[4,5,6]]) # NumPy array
# Initializing the Tensor
array_ts = torch.tensor([[1,2,3],[4,5,6]]) # Tensor

print(f"Variable array_np:\nDatatype: {type(array_np)}\nShape: {array_np.shape}")
print(f"Values:\n{array_np}")
print(f"\n\nVariable array_ts:\nDatatype: {type(array_ts)}\nShape: {array_ts.shape}")
print(f"Values:\n{array_ts.cpu().numpy()}")

## 1.2 Conversion between NumPy array and Tensor

The conversion between NumPy ndarray and PyTorch tensor is quite easy.


In [None]:
# Start from a NumPy array
array_np = np.array([1, 2, 3])
# Conversion from a NumPy array to a Tensor
array_ts_2 = torch.from_numpy(array_np)

# Conversion from a Tensor to a NumPy array
array_np_2 = array_ts_2.numpy()

# Change a value of the converted NumPy array
array_np_2[1] = -1

# The change is visible in the original array and in the tensor, since all three share memory
assert array_np[1] == array_np_2[1]
assert array_ts_2[1].item() == -1

<div class="alert alert-block alert-info"><b>Note:</b> Both conversions avoid copying: the two ndarrays and the Tensor share the same underlying memory, so changing a value in one
also changes the others.</div>
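
Memory sharing is specific to `torch.from_numpy` and `Tensor.numpy()`. The following minimal sketch contrasts it with `torch.tensor`, which copies its input. The names `source`, `shared` and `copied` are chosen for illustration only.

```python
import numpy as np
import torch

source = np.array([1, 2, 3])

shared = torch.from_numpy(source)  # shares memory with `source`
copied = torch.tensor(source)      # copies the data into new memory

# Modify the NumPy array after both tensors were created
source[0] = 100

print(shared)  # the first element reflects the change
print(copied)  # still holds the original values
```

If you need a tensor that is independent of the NumPy array, `torch.tensor(...)` is therefore the safer choice.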

## 1.3 Operations on Tensors

### 1.3.1 Indexing

We can use the NumPy array-like indexing for Tensors.

In [None]:
# Take the first two columns of the original tensor and store them in a new variable
# (basic slicing returns a view, not a copy)
b = array_ts[:2, :2]

# Set the first column of the new variable to zero; since b is a view, array_ts changes too
b[:, 0] = 0
print(b)
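
Because basic slicing returns a view, the assignment to `b` above also writes into `array_ts`. The sketch below shows the difference between a view and an independent copy made with `clone()`. The variable names are illustrative and not part of the exercise code.

```python
import torch

original = torch.tensor([[1, 2, 3], [4, 5, 6]])

view = original[:, :2]                  # basic slicing returns a view
independent = original[:, :2].clone()   # clone() creates a separate copy

view[0, 0] = -1          # also changes `original`
independent[0, 1] = -2   # leaves `original` untouched

print(original)
```

Only the write through `view` shows up in `original`.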

We will now select elements that satisfy a particular condition. In this example, let's find the elements of the tensor that are greater than one.

In [None]:
# Index of the elements with value greater than one
mask = array_ts > 1 
new_array = array_ts[mask]
print(new_array)

Let's try performing the same operation in a single line of code!

In [None]:
c = array_ts[array_ts>1]

# Is the result same as the array from the previous cell?
print(c == new_array)

### 1.3.2 Mathematical operations on Tensors


#### Element-wise operations on Tensors

In [None]:
x = torch.tensor([[1,2],[3,4]])
y = torch.tensor([[5,6],[7,8]])

# Addition - Syntax 1
print(f"x + y: \n{(x + y).cpu().numpy()}")

# Addition - Syntax 2
print(f"x + y: \n{torch.add(x, y).cpu().numpy()}")

# Addition - Syntax 3
result_add = torch.empty(2, 2)
torch.add(x, y, out=result_add)
print(f"x + y: \n{result_add.cpu().numpy()}")

Note: We only added `.cpu().numpy()` to receive a better formatted print statement.

Similar syntax holds for other element-wise operations such as subtraction and multiplication.

When dividing two integers with `/`, the result is always a **float**, in NumPy as well as in PyTorch.   
For example,

In [None]:
x_np = np.array([[1,2],[3,4]])
y_np = np.array([[5,6],[7,8]])
print(f"x / y: \n{x_np / y_np}")
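
The cell above only covers NumPy. A minimal sketch of the same behavior in PyTorch follows, assuming a reasonably recent PyTorch release in which `torch.div` accepts the `rounding_mode` argument.

```python
import torch

x = torch.tensor([[1, 2], [3, 4]])
y = torch.tensor([[5, 6], [7, 8]])

# True division of integer tensors returns a floating-point tensor
true_div = x / y

# Floor division keeps the integer dtype
floor_div = torch.div(y, x, rounding_mode="floor")

print(f"x / y has dtype {true_div.dtype}:\n{true_div}")
print(f"floor(y / x) has dtype {floor_div.dtype}:\n{floor_div}")
```

Use floor division when the result is needed as an index or a count, since those require integer tensors.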

#### Matrix Multiplication

PyTorch offers several options for matrix multiplication.

If you want to multiply more than two tensors, you can use [torch.einsum()](https://pytorch.org/docs/stable/generated/torch.einsum.html). Einsum computes many common multi-dimensional linear algebra operations by representing them in a short-hand format based on the Einstein summation convention.

In [None]:
tensor1 = torch.randn(3, 3)
tensor2 = torch.randn(3)

# Matrix Multiplication - Syntax 1
output1 = tensor1 @ tensor2
# Matrix Multiplication - Syntax 2
output2 = torch.matmul(tensor1, tensor2)
# Matrix Multiplication - Syntax 3
output3 = torch.einsum("ij,j->i", tensor1, tensor2)

print(f"Matrix multiplication\nInputs:\n{tensor1.cpu().numpy()}\nand\n{tensor2.cpu().numpy()} \n\n",
      f"Output1: \n{output1.cpu().numpy()}\n",
      f"Output2: \n{output2.cpu().numpy()}\n",
      f"Output3: \n{output3.cpu().numpy()}")

assert output1.equal(output2)
# einsum may use a different summation order, so compare up to floating-point tolerance
assert torch.allclose(output2, output3)

The same approach works for products of more than two tensors, either by chaining `@` or with a single `einsum` call.

In [None]:
tensor1 = torch.randn(3)
tensor2 = torch.randn(3, 3)
tensor3 = torch.randn(3)
# Matrix Multiplication - Syntax 1
output1 = tensor1 @ tensor2 @ tensor3
# Matrix Multiplication - Syntax 2
output2 = torch.einsum("i,ij,j", tensor1, tensor2, tensor3)

print(f"Chain multiplication:\n{output1}\n{output2}")
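
To make the einsum notation more concrete, here is a small sketch of a few common patterns next to their conventional counterparts. Each letter names one dimension, and letters missing on the right-hand side of `->` are summed over. The tensor names are illustrative.

```python
import torch

A = torch.randn(2, 3)
B = torch.randn(3, 4)
S = torch.randn(3, 3)
batch_A = torch.randn(5, 2, 3)
batch_B = torch.randn(5, 3, 4)

matmul = torch.einsum("ij,jk->ik", A, B)                   # like A @ B
transpose = torch.einsum("ij->ji", A)                      # like A.T
trace = torch.einsum("ii->", S)                            # like torch.trace(S)
batched = torch.einsum("bij,bjk->bik", batch_A, batch_B)   # like torch.bmm

# Compare up to floating-point tolerance, since the summation order may differ
print(torch.allclose(matmul, A @ B))
print(torch.allclose(trace, torch.trace(S)))
print(torch.allclose(batched, torch.bmm(batch_A, batch_B)))
```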

### 1.3.3 Broadcasting

Broadcasting automatically expands tensors to compatible shapes, so you do not have to reshape or repeat them by hand, which is often confusing and a common source of bugs. Broadcasting is also available in NumPy, but it is especially handy in PyTorch because operations are often applied over a whole batch. We show some examples of broadcasting with matrix multiplication.

In [None]:
# vector x vector
tensor1 = torch.randn(3)
tensor2 = torch.randn(3)
print(
    f"vector x vector multiplication:\n",
    f"Input shapes:\n", 
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# matrix x vector
tensor1 = torch.randn(3, 4)
tensor2 = torch.randn(4)
print(
    f"matrix x vector multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x broadcasted vector
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4)
print(
    f"batched matrix x broadcasted vector multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x batched matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(10, 4, 5)
print(
    f"batched matrix x batched matrix multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)

# batched matrix x broadcasted matrix
tensor1 = torch.randn(10, 3, 4)
tensor2 = torch.randn(4, 5)
print(
    f"batched matrix x broadcasted matrix multiplication:\n",
    f"Input shapes:\n",
    f"{[size for size in tensor1.size()]} and {[size for size in tensor2.size()]}\n",
    f"Output shape:\n",
    f"{[size for  size in torch.matmul(tensor1, tensor2).size()]}\n"
)
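
For element-wise operations the broadcasting rule is simple: shapes are compared from the last dimension backwards, and two dimensions are compatible if they are equal, if one of them is 1, or if one of them is missing. The sketch below illustrates this with addition. It assumes a PyTorch version that provides `torch.broadcast_shapes`.

```python
import torch

a = torch.ones(10, 3, 4)
b = torch.ones(4)      # treated like shape (1, 1, 4)
c = torch.ones(3, 1)   # treated like shape (1, 3, 1)

print((a + b).shape)
print((a + c).shape)

# The resulting shape can be queried without allocating any tensors
print(torch.broadcast_shapes((3, 1), (1, 4)))

# Trailing dimensions 4 and 3 are neither equal nor 1, so this fails
try:
    a + torch.ones(3)
except RuntimeError as error:
    print(f"Shapes (10, 3, 4) and (3,) do not broadcast: {error}")
```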

Broadcasting often goes hand in hand with the `Tensor.unsqueeze(dim)` method, which returns a Tensor with an additional dimension of size 1 inserted at position `dim`. We show its use together with tensor addition. For additional mathematical operations, check out the [PyTorch documentation](https://pytorch.org/docs/stable/index.html).

In [None]:
# adding a vector to each row of a matrix
tensor1 = torch.randn(2, 3)
tensor2 = torch.randn(3)
print(
    f"adding a vector to each row of a matrix:\n",
    f"Inputs:\n", 
    f"{tensor1.cpu().numpy()}\n",
    f"{tensor2.cpu().numpy()}\n",
    f"Outputs:\n",
    f"{(tensor1 + tensor2.unsqueeze(0)).cpu().numpy()}\n"
)

# adding each element of a vector to each element of another vector
tensor1 = torch.randn(3)
tensor2 = torch.randn(4)
print(
    f"adding each element of a vector to each element of another vector:\n",
    f"Inputs:\n", 
    f"{tensor1.cpu().numpy()}\n",
    f"{tensor2.cpu().numpy()}\n",
    f"Outputs:\n",
    f"{(tensor1.unsqueeze(1) + tensor2.unsqueeze(0)).cpu().numpy()}\n"
)
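
As a small sketch, the same size-1 dimensions can also be inserted with NumPy-style `None` indexing, and removed again with `squeeze`. The variable names are illustrative.

```python
import torch

v = torch.arange(3)

column = v.unsqueeze(1)   # shape (3, 1)
row = v.unsqueeze(0)      # shape (1, 3)

# Indexing with None inserts a dimension of size 1 as well
print(column.shape == v[:, None].shape)
print(row.shape == v[None, :].shape)

# squeeze(dim) removes the size-1 dimension again
print(column.squeeze(1).shape == v.shape)

# (3, 1) + (1, 3) broadcasts to a (3, 3) table of pairwise sums
outer_sum = column + row
print(outer_sum)
```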

## 1.4 Gradients

We create two tensors ``a`` and ``b`` with `requires_grad=True`. This signals to `autograd` that every operation on them should be tracked. We then create another tensor ``Q`` from ``a`` and ``b``:

$Q = 3a^3 - b^2$

`autograd` then lets us compute the gradient of ``Q`` with respect to ``a`` and ``b``. In this case

$\frac{\partial Q}{\partial a} = 9a^2$

$\frac{\partial Q}{\partial b} = -2b$

In [None]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

# compute a function of the PyTorch tensors
Q = 3*a**3 - b**2

# backward() needs a scalar, so we sum Q first; this fills a.grad and b.grad
Q.sum().backward()
print(f"Gradients:\na:\n{a.grad.cpu().numpy()}\nb:\n{b.grad.cpu().numpy()}")
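
As a sanity check, the gradients computed by `autograd` can be compared with the analytic derivatives given above. This sketch also shows an alternative to `Q.sum().backward()`: passing a tensor of ones as the `gradient` argument, which weights every element of `Q` by 1.

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3 * a**3 - b**2

# Equivalent to Q.sum().backward() for this example
Q.backward(gradient=torch.ones_like(Q))

# Analytic derivatives: dQ/da = 9a^2 and dQ/db = -2b
expected_a = 9 * a.detach() ** 2
expected_b = -2 * b.detach()

print(torch.allclose(a.grad, expected_a))  # True
print(torch.allclose(b.grad, expected_b))  # True
```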

Gradient computation can be disabled for individual tensors by setting `requires_grad=False`. The `.grad` attribute of such a tensor stays `None`.

In [None]:
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=False)

# compute a function of the PyTorch tensors
Q = 3*a**3 - b**2

# call backward on the summed output to compute the gradients
Q.sum().backward()
print(f"Gradients:\na:\n{a.grad.cpu().numpy()}\nb:\n{b.grad}")

During evaluation, you can wrap a code block in
`with torch.no_grad():`
to prevent operations from being tracked for gradient computation.

In [None]:
with torch.no_grad():
  a = torch.tensor([2., 3.], requires_grad=True)
  b = torch.tensor([6., 4.], requires_grad=False)

  # compute a function of the PyTorch tensors
  Q = 3*a**3 - b**2

  # calling backward on a result computed under torch.no_grad() raises a RuntimeError
  try:
    Q.sum().backward()
  except RuntimeError as e:
    print(f"RuntimeError: {e}")
  print(f"Gradients:\na:\n{a.grad}\nb:\n{b.grad}")
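
The error above occurs because results computed inside `torch.no_grad()` are not connected to a computation graph. A minimal sketch that makes this visible through the `requires_grad` flag of the results:

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)

# Outside no_grad the operation is tracked
tracked = 3 * a**3

# Inside no_grad the same operation is not tracked
with torch.no_grad():
    untracked = 3 * a**3

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
print(a.requires_grad)          # True, the input tensor itself is unchanged
```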

## 1.5 Devices

When training a neural network, it is important to make sure that all the required tensors as well as the model are on the same device. Tensors can be moved between the CPU and GPU using the `.to` method.

Let us check if a GPU is available. If it is available, we will assign it to `device` and move the tensor `x` to the GPU.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"The selected device is {device}")

x = torch.rand(3, 4, device="cpu")
print(f"Original device: {x.device}") # "cpu"

tensor = x.to(device)
print(f"Current device: {tensor.device}") #"cpu" or "cuda"

print(f"The original Tensor remains on: {x.device}") # "cpu"

By calling `.to(device)` we obtain a tensor on our selected device. If a GPU is available, this is a new copy on the CUDA device, and `x` itself remains on the CPU. Without a GPU, the selected device is the CPU, where `x` already lives, so `.to(device)` simply returns `x`.

You might have noticed that the device for the tensor is `cuda:0` instead of `cuda`. The `:0` signals that the tensor is on the first GPU. This becomes very important when you start to do distributed training on multiple GPUs.

<div class="alert alert-block alert-info"><b>Tip:</b> Try including the <b>.to(device)</b> calls in your code. It is then easier to port the code to run on a GPU.</div>
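
A minimal sketch of such device-agnostic code: the device is selected once, tensors are either created directly on it or moved to it, and results are brought back to the CPU before converting to NumPy. The names `weights`, `inputs` and `outputs` are placeholders for illustration.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

weights = torch.randn(4, 2, device=device)  # created directly on the device
inputs = torch.randn(3, 4).to(device)       # created on the CPU, then moved

# Both operands live on the same device, so the operation is valid
outputs = inputs @ weights
print(outputs.device)

# NumPy only works with CPU memory, so move the result back first
result = outputs.cpu().numpy()
print(result.shape)
```

The same script runs unchanged on machines with and without a GPU.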

## 1.6 Timing with PyTorch

Timing CPU-only operations can be done with standard Python timing tools, e.g. `timeit` or `time.perf_counter`.

Since CUDA is asynchronous, timing GPU operations needs some additional tools. One option uses CUDA events. Timing the matrix multiplication is done by sandwiching the call between CUDA events.

Other timing options that use the PyTorch [autograd profiler](https://pytorch.org/docs/stable/autograd.html?highlight=autograd%20profiler#torch.autograd.profiler.profile) are also possible.

In [None]:
import time

# create random variables to do matrix multiplication with
A = torch.randn((10, 10000, 10000), device="cpu")
b = torch.randn((10000, 1), device="cpu")

start_cpu = time.perf_counter()
results_cpu = A @ b
end_cpu = time.perf_counter()

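# The CPU matmul is synchronous, so the two timestamps bracket the full computation
print(f"Time with cpu in sec: \n{end_cpu - start_cpu}")

if torch.cuda.is_available():
    A = A.to(device)
    b = b.to(device)

    # create a start and end CUDA event used for timing
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    results_gpu = A @ b
    end.record()

    # wait for all queued GPU work to finish before reading the timer
    torch.cuda.synchronize()
    # elapsed_time returns milliseconds, so divide by 1000 to get seconds
    print(f"Timing with {device} in sec: \n{start.elapsed_time(end) / 1000}")
else:
    print("No CUDA device available, skipping the GPU timing.")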
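
A single measurement like the one above can be noisy. As a rough sketch of the `timeit` option mentioned at the start of this section, the CPU operation can be repeated and averaged. The tensor sizes and the number of repeats are arbitrary choices for illustration.

```python
import timeit

import torch

A = torch.randn(256, 256)
b = torch.randn(256, 1)

# Run the multiplication several times and average the elapsed time
repeats = 100
total = timeit.timeit(lambda: A @ b, number=repeats)

print(f"Mean CPU time per call in sec: {total / repeats}")
```

This approach is only suitable for CPU tensors. For GPU tensors the asynchronous execution still has to be accounted for, for example with CUDA events as shown above.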