<a href="https://colab.research.google.com/github/vin136/uncertainty-estimates/blob/vin-ideas_1/nbs/tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tools


## Pytorch-lightning
PyTorch Lightning removes boilerplate code without sacrificing the flexibility of PyTorch.

In [2]:
# Needed when running in Colab; for a local run, the packages should already be installed.
!pip install pytorch-lightning==1.5.7


In [2]:
import time

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.utils.data as data

%matplotlib inline
from IPython.display import set_matplotlib_formats
from matplotlib.colors import to_rgba
from tqdm.notebook import tqdm  # Progress bar

set_matplotlib_formats("svg", "pdf")

In [5]:
torch.manual_seed(42)  # Setting the seed

<torch._C.Generator at 0x7fd4f2607ab0>

### Pytorch-tour

In [13]:
# Tensors
torch.tensor([2,4,8])

tensor([2, 4, 8])

In [24]:
# create tensors implicitly
torch.arange(0,10,1)
torch.zeros(10)
torch.randn(2,3)
torch.rand(10)

tensor([0.1053, 0.2695, 0.3588, 0.1994, 0.5472, 0.0062, 0.9516, 0.0753, 0.8860,
        0.5832])

In [30]:
# converting between NumPy arrays and torch tensors
a = np.array([0,8,7.7])
a_t = torch.from_numpy(a)
#torch tensor
a_t
#numpy array - call .cpu before as the original tensor might be on a GPU
a_t.cpu().numpy()

array([0. , 8. , 7.7])
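One detail of the conversion above is worth seeing in action: `torch.from_numpy` does not copy the data, so the tensor and the array share memory, whereas `torch.tensor` makes a copy. A minimal sketch (the variable names are illustrative only):

```python
import numpy as np
import torch

a = np.array([0, 8, 7.7])
a_t = torch.from_numpy(a)   # shares memory with `a`, keeps dtype float64

a[0] = 100.0                # modify the NumPy array in place
print(a_t)                  # the tensor reflects the change
print(a_t.dtype)            # torch.float64

b_t = torch.tensor(a)       # torch.tensor copies the data instead
a[1] = -1.0
print(b_t[1])               # unaffected by the later write to `a`
```

If an independent tensor is needed, copying explicitly (for example with `torch.tensor(a)` or `a_t.clone()`) avoids surprises.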

In [31]:
#getting shapes

t = torch.rand(4,2,3)
t.shape
b,r,c = t.size()
print(f"size of tensor : {b,r,c}")

size of tensor : (4, 2, 3)


In [35]:
# operations on tensors


x1 = torch.rand(2)
x2 = torch.rand(2)
#creates new tensor
x1+x2

tensor([0.7498, 0.3936])

In [36]:
#inplace ops
print(f"x1 before:{x1}")
print(f"x2 before:{x2}")
x1.add_(x2)
print(f"x1 after:{x1}")
print(f"x2 after:{x2}")

x1 before:tensor([0.1716, 0.3336])
x2 before:tensor([0.5782, 0.0600])
x1 after:tensor([0.7498, 0.3936])
x2 after:tensor([0.5782, 0.0600])
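To make the difference between the two kinds of operation concrete, the sketch below compares the storage address before and after each one. It assumes only standard tensor methods; the names `ptr_before` and `x3` are illustrative.

```python
import torch

x1 = torch.rand(2)
x2 = torch.rand(2)

ptr_before = x1.data_ptr()            # address of the underlying storage
x1.add_(x2)                           # in-place: writes into the existing storage
print(x1.data_ptr() == ptr_before)    # True

x3 = x1 + x2                          # out-of-place: allocates a new tensor
print(x3.data_ptr() == ptr_before)    # False
```

In-place operations are marked by a trailing underscore and save memory, but they overwrite values that autograd may need later, so they are best used with care on tensors that require gradients.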


In [38]:
#reshape ops

x = torch.rand(1,2,4)
x.view(8)

tensor([0.3289, 0.1054, 0.9192, 0.4008, 0.9302, 0.6558, 0.0766, 0.8460])

In [41]:
y = x.permute([1,0,2])
y.shape

torch.Size([2, 1, 4])
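`view` and `permute` both reuse the original storage, which has a consequence: after a permutation or transpose the tensor may no longer be contiguous, and `view` can then fail. A small sketch of this, using a hypothetical 2x3 tensor rather than the one above:

```python
import torch

x = torch.arange(6).view(2, 3)   # contiguous, shape (2, 3)
y = x.t()                        # transpose: same storage, different strides
print(y.is_contiguous())         # False

try:
    y.view(6)                    # view needs strides compatible with the new shape
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat = y.reshape(6)              # reshape copies when necessary
print(flat)                      # tensor([0, 3, 1, 4, 2, 5])
```

`y.contiguous().view(6)` gives the same result as `y.reshape(6)` here.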

In [43]:
# matrix multiplication: (2, 3) @ (3, 1) -> (2, 1)

x = torch.tensor([1.0,2.0,3.0]).view(3,1)
W = torch.rand(2,3)
print(x.shape,W.shape)

torch.Size([3, 1]) torch.Size([2, 3])


In [45]:
torch.matmul(W,x)

tensor([[3.4876],
        [2.7736]])

In [47]:
W@x

tensor([[3.4876],
        [2.7736]])
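The product above involves two plain 2-D matrices. Broadcasting comes into play when one operand carries extra batch dimensions: `torch.matmul` then applies the 2-D product to every batch entry. A minimal sketch, with an assumed batch of four column vectors:

```python
import torch

W = torch.rand(2, 3)          # one weight matrix
xs = torch.rand(4, 3, 1)      # a batch of four column vectors

out = torch.matmul(W, xs)     # W is broadcast over the batch dimension
print(out.shape)              # torch.Size([4, 2, 1])

# Each batch entry equals the plain 2-D product
print(torch.allclose(out[0], W @ xs[0]))
```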

In [49]:
# Gradients
x = torch.ones((2,))
x.requires_grad

False

In [50]:
# if we need grad w.r.t this variable
x.requires_grad_(True)
x.requires_grad

True

#### Dynamic computation graphs

In order to get familiar with the concept of a computation graph, we will create one for the following function:

$$y = \frac{1}{|x|}\sum_i \left[(x_i + 2)^2 + 3\right]$$

You could imagine that $x$ are our parameters, and we want to optimize (either maximize or minimize) the output $y$.
For this, we want to obtain the gradients $\partial y / \partial \mathbf{x}$.
For our example, we'll use $\mathbf{x}=[0,1,2]$ as our input.

In [52]:
x = torch.arange(3,dtype=torch.float32,requires_grad=True)
x

tensor([0., 1., 2.], requires_grad=True)

In [54]:
a = x+2
b = a**2
c = b+3
y = c.mean()
y

tensor(12.6667, grad_fn=<MeanBackward0>)

In [55]:
y.backward()
print(x.grad)

tensor([1.3333, 2.0000, 2.6667])
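A point that often causes confusion: `backward()` adds to whatever is already stored in `.grad` rather than overwriting it. The sketch below repeats the computation to show the accumulation; the helper `f` is a hypothetical wrapper around the same function.

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)

def f(x):
    # Same function as above: mean of (x + 2)^2 + 3
    return ((x + 2) ** 2 + 3).mean()

f(x).backward()
first = x.grad.clone()   # gradient after one backward pass

f(x).backward()          # a second pass adds to the stored gradient
print(x.grad)            # twice the first result

x.grad.zero_()           # reset before the next pass
f(x).backward()
print(torch.allclose(x.grad, first))
```

This is why training loops typically clear the gradients before each backward pass.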


We can also verify these gradients by hand.

---


We will calculate the gradients using the chain rule, in the same way as PyTorch did it:

$$\frac{\partial y}{\partial x_i} = \frac{\partial y}{\partial c_i}\frac{\partial c_i}{\partial b_i}\frac{\partial b_i}{\partial a_i}\frac{\partial a_i}{\partial x_i}$$

Note that we have simplified this equation to index notation, using the fact that all operations besides the mean act element-wise and do not combine the elements of the tensor.
The partial derivatives are:

$$
\frac{\partial a_i}{\partial x_i} = 1,\hspace{1cm}
\frac{\partial b_i}{\partial a_i} = 2\cdot a_i,\hspace{1cm}
\frac{\partial c_i}{\partial b_i} = 1,\hspace{1cm}
\frac{\partial y}{\partial c_i} = \frac{1}{3}
$$

Hence, with the input being $\mathbf{x}=[0,1,2]$, our gradients are $\partial y/\partial \mathbf{x}=[4/3,2,8/3]$.
The previous code cell should have printed the same result.
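The hand derivation can also be checked numerically. The sketch below multiplies the four partial derivatives together, giving $2(x_i + 2)/3$, and compares the result with what autograd computes; the name `manual` is illustrative.

```python
import torch

x = torch.arange(3, dtype=torch.float32, requires_grad=True)
y = ((x + 2) ** 2 + 3).mean()
y.backward()

# Chain rule: dy/dx_i = (1/3) * 1 * 2 * (x_i + 2) * 1
manual = 2 * (x.detach() + 2) / x.numel()
print(manual)
print(torch.allclose(x.grad, manual))
```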

In [3]:
# GPUs
gpu_avail = torch.cuda.is_available()
print(f"Is the GPU available? {gpu_avail}")

Is the GPU available? True


In [4]:
#specify device
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print("Device", device)

x = torch.zeros(2, 3)
x = x.to(device)
print("X", x)

Device cuda
X tensor([[0., 0., 0.],
        [0., 0., 0.]], device='cuda:0')
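Besides moving an existing tensor with `.to(device)`, a tensor can be created directly on the target device, and a module's parameters can be moved the same way. A minimal sketch that falls back to the CPU when no GPU is present; the small `nn.Linear` model is only a placeholder:

```python
import torch
import torch.nn as nn

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

x = torch.zeros(2, 3, device=device)   # allocated on the device, no extra copy
model = nn.Linear(3, 1).to(device)     # moves the layer's parameters

out = model(x)                         # inputs and parameters share a device
print(out.device, out.shape)
```

Operands of an operation need to live on the same device, so the input and the model are both placed on `device` before the forward pass.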


In [5]:
# gpu speed-up
x = torch.randn(10000, 10000)

# CPU version
start_time = time.time()
_ = torch.matmul(x, x)
end_time = time.time()
print(f"CPU time: {(end_time - start_time):6.5f}s")

# GPU version
if torch.cuda.is_available():
    x = x.to(device)
    # CUDA is asynchronous, so we need to use different timing functions
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    _ = torch.matmul(x, x)
    end.record()
    torch.cuda.synchronize()  # Waits for everything to finish running on the GPU
    print(f"GPU time: {0.001 * start.elapsed_time(end):6.5f}s")  # Milliseconds to seconds

CPU time: 26.72527s
GPU time: 0.92781s
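An alternative to CUDA events is to synchronize explicitly around a wall-clock timer, which lets the same code time either device. The helper `time_matmul` below is a hypothetical sketch, not part of PyTorch; note that the first CUDA call can include one-off start-up overhead, so a warm-up run may be advisable before measuring.

```python
import time
import torch

def time_matmul(n, device):
    """Time one n x n matrix product on `device` (illustrative helper)."""
    x = torch.randn(n, n, device=device)
    if device.type == "cuda":
        torch.cuda.synchronize()      # finish pending work before starting the clock
    start = time.perf_counter()
    _ = torch.matmul(x, x)
    if device.type == "cuda":
        torch.cuda.synchronize()      # wait for the kernel to finish
    return time.perf_counter() - start

device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
elapsed = time_matmul(512, device)
print(f"{device}: {elapsed:.5f}s")
```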


In [6]:
# GPU operations have a separate seed we also want to set
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)
    torch.cuda.manual_seed_all(42)

# Additionally, some operations on a GPU are implemented stochastically for efficiency
# We want to ensure that all operations are deterministic on GPU (if used) for reproducibility
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
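The seeding calls scattered through this notebook can be gathered into one function. The helper `set_seed` below is a hypothetical convenience, not a PyTorch API; it also seeds Python's `random` module and NumPy, on the assumption that they are used alongside PyTorch.

```python
import random

import numpy as np
import torch

def set_seed(seed):
    """Seed every random number generator we rely on (illustrative helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)
print(torch.equal(a, b))   # True: same seed, same random numbers
```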

### Lightning-tour

In [7]:
# Let's get straight into how to build models using pytorch-lightning


