# day 04: PyTorch warmup and Dataloaders

*Special thanks to the YSDA team for the materials provided.*
The second part is based on the official PyTorch tutorials and [this Kaggle kernel](https://www.kaggle.com/pinocookie/pytorch-dataset-and-dataloader).

What comes today:
- Introduction to PyTorch
- Automatic gradient computation
- Logistic regression (it's a neural network, actually ;) )
- Datasets and DataLoaders

![img](https://pytorch.org/tutorials/_static/pytorch-logo-dark.svg)

__This notebook__ will teach you to use the low-level core of PyTorch. You can install it [here](http://pytorch.org/).

__PyTorch feels__ different from other frameworks (like TensorFlow/Theano) on almost every level. TensorFlow (in its classic 1.x graph mode) makes your code live in two "worlds" simultaneously: symbolic graphs and actual tensors. First you declare a symbolic "recipe" of how to get from inputs to outputs, then you feed it actual minibatches of data. In PyTorch, __there's only one world__: all tensors have a numeric value.

You compute outputs on the fly without pre-declaring anything. The code looks almost exactly like pure numpy, with one exception: PyTorch computes gradients for you. It can also run stuff on GPU. And it has a number of pre-implemented building blocks for your neural nets. [And a few more things.](https://medium.com/towards-data-science/pytorch-vs-tensorflow-spotting-the-difference-25c75777377b)
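To make "only one world" concrete, here is a minimal sketch (the tensor values are arbitrary): the output has a value as soon as it is computed, and calling `.backward()` fills in the gradient.

```python
import torch

# A tensor with an actual value; requires_grad=True asks PyTorch to track operations on it
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Computed on the fly, numpy-style: y = 1 + 4 + 9
y = (x ** 2).sum()
print(y.item())  # 14.0

# Backpropagate: dy/dx = 2 * x is stored in x.grad
y.backward()
print(x.grad)  # tensor([2., 4., 6.])
```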

Let's dive into it!

In [1]:
# In Google Colab uncomment this

# ! wget https://raw.githubusercontent.com/v-goncharenko/madmo-21-04/madmo_04_optimization_regularization/notmnist.py

In [2]:
import numpy as np
import torch


print(torch.__version__)

1.7.1


In [3]:
import matplotlib.pyplot as plt
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

import torch
from torch.utils.data import DataLoader, Dataset

import torchvision
from torchvision import transforms

## Warming up: Tensormancy

**The [_disclaimer_](https://gist.githubusercontent.com/justheuristic/e2c1fa28ca02670cabc42cacf3902796/raw/fd3d935cef63a01b85ed2790b5c11c370245cbd7/stddisclaimer.h)**

Let's define a function in polar coordinates:
$$\rho(\theta) = \left(1 + 0.9 \cdot \cos(6\theta)\right) \cdot \left(1 + 0.01 \cdot \cos(24\theta)\right) \cdot \left(0.5 + 0.05 \cdot \cos(200\theta)\right) \cdot \left(10 + \sin(10\theta)\right)$$


Then convert it into Cartesian coordinates ([howto](http://www.mathsisfun.com/polar-cartesian-coordinates.html)) and plot the result.

Use torch tensors only: no lists, loops, numpy arrays, etc.
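For reference, a polar point $(\rho, \theta)$ maps to $x = \rho \cos\theta$ and $y = \rho \sin\theta$. Below is a sketch of the pattern on a different, simpler curve (a cardioid, chosen here only for illustration so the exercise stays yours): every step is an element-wise operation on a whole tensor, with no Python loops.

```python
import numpy as np
import torch

theta = torch.linspace(-np.pi, np.pi, steps=1000)

# An illustrative curve (cardioid), NOT the formula from the exercise
rho = 1 + torch.cos(theta)

# Polar -> Cartesian, applied to all 1000 points at once
x = rho * torch.cos(theta)
y = rho * torch.sin(theta)
```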

In [None]:
theta = torch.linspace(-np.pi, np.pi, steps=1000)

# compute rho(theta) as per formula above
rho = None

# Now convert polar (rho, theta) pairs into cartesian (x,y) to plot them.
x = None  # <your_code_here>
y = None  # <your_code_here>


plt.figure(figsize=[6, 6])
plt.fill(x.numpy(), y.numpy(), color="red")
plt.grid()

## Task 1: The game of life

Now it's time for you to make something more challenging. We'll implement Conway's [Game of Life](http://web.stanford.edu/~cdebs/GameOfLife/) in _pure pytorch_. 

While this is still a toy task, implementing the game of life this way has one cool benefit: __you'll be able to run it on GPU!__ Indeed, what could be a better use of your GPU than simulating the game of life on 1M×1M grids?

![img](https://cdn.tutsplus.com/gamedev/authors/legacy/Stephane%20Beniak/2012/09/11/Preview_Image.png)
If you've skipped the URL above out of sloth, here's the game of life:
* You have a 2D grid of cells, where each cell is "alive" (1) or "dead" (0)
* A cell's neighbors are the 8 cells surrounding it
* Any living cell that has 2 or 3 living neighbors survives; otherwise it dies (0, 1, or 4+ neighbors)
* Any dead cell with exactly 3 living neighbors becomes alive

For this task, you are given a reference numpy implementation that you must convert to pytorch.
_[numpy code inspired by: https://github.com/rougier/numpy-100]_


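__Note:__ You can find convolution in `torch.nn.functional.conv2d(Z, filters)`. Note that it expects a different input format: `Z` must have shape `[batch, in_channels, height, width]` and `filters` must have shape `[out_channels, in_channels, kernel_height, kernel_width]`.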
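A small sketch of what this means in practice, using an arbitrary all-ones image and filter: a plain matrix needs two extra leading dimensions, and `padding=1` keeps the output the same size as the input (zero padding, similar to `mode="same"` in `correlate2d`).

```python
import torch
import torch.nn.functional as F

image = torch.ones(5, 5)   # a plain [height, width] matrix
kernel = torch.ones(3, 3)  # a 3x3 filter

# input:   [batch, in_channels, height, width]
# weights: [out_channels, in_channels, kernel_height, kernel_width]
out = F.conv2d(image[None, None], kernel[None, None], padding=1)
print(out.shape)  # torch.Size([1, 1, 5, 5])

# Drop the extra dimensions again; with zero padding, corner cells see 4 ones,
# edge cells see 6 and interior cells see all 9
result = out[0, 0]
print(result[0, 0].item(), result[0, 2].item(), result[2, 2].item())  # 4.0 6.0 9.0
```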

__Note 2:__ From the mathematical standpoint, PyTorch convolution is actually cross-correlation. The two operations differ only in that true convolution flips the kernel along both axes, so for a kernel that is symmetric under this flip (like the neighbor-counting filter below) they give identical results. More info: [video tutorial](https://www.youtube.com/watch?v=C3EEy8adxvc), [scipy functions review](http://programmerz.ru/questions/26903/2d-convolution-in-python-similar-to-matlabs-conv2-question), [stack overflow source](https://stackoverflow.com/questions/31139977/comparing-matlabs-conv2-with-scipys-convolve2d).
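The difference is easy to check numerically. Here is a sketch with a random, deliberately non-symmetric kernel (the arrays are arbitrary): `torch.nn.functional.conv2d` agrees with `scipy.signal.correlate2d`, and it agrees with `scipy.signal.convolve2d` only after the kernel is flipped along both axes.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import convolve2d, correlate2d

rng = np.random.default_rng(0)
image = rng.standard_normal((6, 6))
kernel = rng.standard_normal((3, 3))  # arbitrary, deliberately not symmetric

# PyTorch's conv2d ("valid" mode: no padding), computed in float64 like numpy
torch_out = F.conv2d(
    torch.from_numpy(image)[None, None],
    torch.from_numpy(kernel)[None, None],
)[0, 0].numpy()

matches_correlation = np.allclose(torch_out, correlate2d(image, kernel, mode="valid"))
# True convolution flips the kernel, so flip it back to get the same result
matches_flipped_convolution = np.allclose(
    torch_out, convolve2d(image, kernel[::-1, ::-1], mode="valid")
)
print(matches_correlation, matches_flipped_convolution)  # True True
```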

In [None]:
from scipy.signal import correlate2d


def np_update(Z):
    # Count neighbours with convolution
    filters = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])

    N = correlate2d(Z, filters, mode="same")

    # Apply rules
    birth = (N == 3) & (Z == 0)
    survive = ((N == 2) | (N == 3)) & (Z == 1)

    Z[:] = birth | survive
    return Z

In [None]:
def torch_update(Z):
    """
    Implement an update function that does to Z exactly the same as np_update.
    :param Z: torch.FloatTensor of shape [height,width] containing 0s (dead) and 1s (alive)
    :returns: torch.FloatTensor Z after updates.

    You can either create a new tensor or modify Z in place.
    """

    # <Your code here!>

    return Z

In [None]:
# initial frame
Z_numpy = np.random.choice([0, 1], p=(0.5, 0.5), size=(100, 100))
Z = torch.from_numpy(Z_numpy).type(torch.FloatTensor)

# your debugging playground :)
Z_new = torch_update(Z.clone())

# tests
Z_reference = np_update(Z_numpy.copy())
assert np.all(
    Z_new.numpy() == Z_reference
), "your pytorch implementation doesn't match np_update. Compare Z_new with Z_reference to investigate."
print("Well done!")

In [None]:
plt.ion()

# initialize game field
Z = np.random.choice([0, 1], size=(100, 100))
Z = torch.from_numpy(Z).type(torch.FloatTensor)

fig = plt.figure()
ax = fig.add_subplot(111)
fig.show()

for _ in range(100):

    # update
    Z = torch_update(Z)

    # re-draw image
    ax.clear()
    ax.imshow(Z.numpy(), cmap="gray")
    fig.canvas.draw()

In [None]:
# Some fun setups for your amusement

# parallel stripes
Z = np.arange(100) % 2 + np.zeros([100, 100])
# with a small imperfection
Z[48:52, 50] = 1

Z = torch.from_numpy(Z).type(torch.FloatTensor)

fig = plt.figure()
ax = fig.add_subplot(111)
fig.show()

for _ in range(100):
    Z = torch_update(Z)
    ax.clear()
    ax.imshow(Z.numpy(), cmap="gray")
    fig.canvas.draw()

More fun with Game of Life: [video](https://www.youtube.com/watch?v=C2vgICfQawE)

## Task 2: Getting serious with NotMNIST

In [None]:
from notmnist import load_notmnist


X_train, y_train, X_test, y_test = load_notmnist(letters="AB", test_size=0.25)  # hold out 25% of the images for testing
X_train, X_test = X_train.reshape([-1, 784]), X_test.reshape([-1, 784])

print("Train size = %i, test_size = %i" % (len(X_train), len(X_test)))

In [None]:
for i in [0, 1]:
    plt.subplot(1, 2, i + 1)
    plt.imshow(X_train[i].reshape([28, 28]))
    plt.title(str(y_train[i]))

Let's start with layers. The main abstraction here is __`torch.nn.Module`__: a container for parameters (and submodules) together with a `forward` computation.
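As an illustration of the abstraction, here is a sketch of logistic regression written as a custom `nn.Module` subclass. The class name `LogisticRegression` and the variable `demo_model` are hypothetical, used only for this example; the notebook itself builds a similar model (a linear layer followed by a sigmoid) with `nn.Sequential`.

```python
import torch
from torch import nn


class LogisticRegression(nn.Module):
    """Hypothetical example: one linear layer followed by a sigmoid."""

    def __init__(self, in_features):
        super().__init__()
        # Layers assigned as attributes are registered as submodules,
        # so their weights appear in .parameters() and reach the optimizer
        self.linear = nn.Linear(in_features, 1)

    def forward(self, x):
        # Calling demo_model(x) runs forward(); [:, 0] turns [N, 1] into [N]
        return torch.sigmoid(self.linear(x))[:, 0]


demo_model = LogisticRegression(784)
probs = demo_model(torch.randn(4, 784))
print(probs.shape)  # torch.Size([4])
print(sum(p.numel() for p in demo_model.parameters()))  # 785 (784 weights + 1 bias)
```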

In [None]:
import torch.nn.functional as F
from torch import nn

### Putting it all together

In [None]:
# create network again just in case
model = nn.Sequential()
model.add_module("first", nn.Linear(784, 1))
model.add_module("second", nn.Sigmoid())

opt = torch.optim.Adam(model.parameters(), lr=1e-3)

In [None]:
history = []

for i in range(100):

    # sample 256 random images
    ix = np.random.randint(0, len(X_train), 256)
    x_batch = torch.tensor(X_train[ix], dtype=torch.float32)
    y_batch = torch.tensor(y_train[ix], dtype=torch.float32)

    # predict probabilities
    y_predicted = None  ### YOUR CODE

    assert y_predicted.dim() == 1, "did you forget to select the first column with [:, 0]?"

    # compute the binary cross-entropy loss
    loss = None  ### YOUR CODE

    # compute gradients
    ### YOUR CODE

    # Adam step
    ### YOUR CODE

    # clear gradients
    ### YOUR CODE

    history.append(loss.data.numpy())

    if i % 10 == 0:
        print("step #%i | mean loss = %.3f" % (i, np.mean(history[-10:])))
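For logistic regression, the usual loss is binary cross-entropy, $-\frac{1}{N}\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$. Here is a sketch (with made-up probabilities and labels) checking that PyTorch's built-in `F.binary_cross_entropy` matches the formula:

```python
import torch
import torch.nn.functional as F

p = torch.tensor([0.9, 0.2, 0.6])  # predicted probabilities of class 1 (made up)
y = torch.tensor([1.0, 0.0, 1.0])  # true labels (made up)

# Binary cross-entropy written out by hand ...
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
# ... and the built-in version (mean reduction by default)
builtin = F.binary_cross_entropy(p, y)

print(torch.allclose(manual, builtin))  # True
```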

In [None]:
# use your model to predict classes (0 or 1) for all test samples
predicted_y_test = None  ### YOUR CODE
predicted_y_test = np.array(predicted_y_test > 0.5)

assert isinstance(predicted_y_test, np.ndarray), "please return np array, not %s" % type(
    predicted_y_test
)
assert predicted_y_test.shape == y_test.shape, "please predict one class for each test sample"
assert np.in1d(predicted_y_test, y_test).all(), "please predict class indexes"

accuracy = np.mean(predicted_y_test == y_test)

print("Test accuracy: %.5f" % accuracy)
assert accuracy > 0.95, "try training longer"

print("Great job!")

## Task 3: Using the Dataloader

In [None]:
import matplotlib.pyplot as plt
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import torch
import torchvision
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms

In [None]:
class DatasetMNIST(Dataset):
    def __init__(self, file_path, transform=None):
        self.data, self.labels, _, _ = load_notmnist(path=file_path, test_size=0)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # self.data stores each image as a (Channels, Height, Width) array;
        # transpose it to (Height, Width, Channels), the numpy layout that
        # torchvision.transforms.ToTensor() expects. ToTensor() converts it back
        # to a (C, H, W) tensor (and rescales to [0, 1] only for uint8 input).
        image = self.data[index].transpose(1, 2, 0)
        label = self.labels[index]

        if self.transform is not None:
            image = self.transform(image)

        return image, label

In [None]:
train_dataset = DatasetMNIST("./notMNIST_small", transform=None)

In [None]:
# indexing a Dataset calls its __getitem__(index) method
img, lab = train_dataset[0]

In [None]:
print(img.shape)
print(type(img))

In [None]:
a = torchvision.transforms.ToTensor()

In [None]:
a(img).shape

In [None]:
for i in [0, 1]:
    plt.subplot(1, 2, i + 1)
    plt.imshow(train_dataset[i][0].reshape([28, 28]))
    plt.title(str(train_dataset[i][1]))

#### To the DataLoader

In [None]:
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True)

A `DataLoader` is iterable, so we can turn it into an iterator with the built-in `iter()` function.

In [None]:
train_iter = iter(train_loader)
print(type(train_iter))

We can fetch one batch of images and labels with the built-in `next()` function.

In [None]:
images, labels = next(train_iter)

print("images shape on batch size = {}".format(images.size()))
print("labels shape on batch size = {}".format(labels.size()))
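To see what the default batching does without the notMNIST files, here is a sketch on a hypothetical toy dataset built with `torch.utils.data.TensorDataset`: samples are stacked along a new leading batch dimension, and the last batch is smaller when the dataset size isn't a multiple of `batch_size` (unless `drop_last=True` is passed).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 20 single-channel 28x28 "images" with integer labels
features = torch.randn(20, 1, 28, 28)
targets = torch.arange(20) % 10
toy_loader = DataLoader(TensorDataset(features, targets), batch_size=8, shuffle=True)

# One batch: samples are stacked along a new leading dimension
xb, yb = next(iter(toy_loader))
print(xb.shape, yb.shape)  # torch.Size([8, 1, 28, 28]) torch.Size([8])

# A full pass yields batches of 8, 8 and the remaining 4 samples
batch_sizes = [len(label_batch) for _, label_batch in toy_loader]
print(batch_sizes)  # [8, 8, 4]
```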

In [None]:
images.shape

In [None]:
# make_grid expects a tensor of shape (batch_size, channels, height, width),
# but our batch is (batch_size, height, width, channels), so permute it first
grid = torchvision.utils.make_grid(images.permute([0, 3, 1, 2]))

plt.imshow(grid.numpy().transpose((1, 2, 0)))
plt.axis("off")
plt.title(labels.numpy());

And now with transformations:

In [None]:
train_dataset_with_transform = DatasetMNIST(
    "./notMNIST_small", transform=torchvision.transforms.ToTensor()
)

In [None]:
img, lab = train_dataset_with_transform[0]

print("shape of the first image: {}".format(img.size()))

In [None]:
train_loader2 = DataLoader(train_dataset_with_transform, batch_size=8, shuffle=True)

train_iter2 = iter(train_loader2)
print(type(train_iter2))

images, labels = next(train_iter2)

print("images shape on batch size = {}".format(images.size()))
print("labels shape on batch size = {}".format(labels.size()))

In [None]:
grid = torchvision.utils.make_grid(images)

plt.imshow(grid.numpy().transpose((1, 2, 0)))
plt.axis("off")
plt.title(labels.numpy());

### Composing several transformations

To chain several transformations (for example, for data augmentation), pass them as a list to `torchvision.transforms.Compose`. Its implementation looks roughly like this:

```python
class Compose(object):
    """Composes several transforms together.
    Args:
        transforms (list of ``Transform`` objects): list of transforms to compose.
    Example:
        >>> transforms.Compose([
        >>>     transforms.CenterCrop(10),
        >>>     transforms.ToTensor(),
        >>> ])
    """

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img

    def __repr__(self):
        format_string = self.__class__.__name__ + '('
        for t in self.transforms:
            format_string += '\n'
            format_string += '    {0}'.format(t)
        format_string += '\n)'
        return format_string
```


`Compose` applies the transforms to the image one after another, in list order, inside its `__call__` method.

In [None]:
class ReshapeToVector:
    """Flatten a (C, H, W) image tensor into a 1D vector of length C * H * W."""

    def __call__(self, pic):
        return pic.reshape(-1)

    def __repr__(self):
        return self.__class__.__name__ + "()"

In [None]:
a = ReshapeToVector()

In [None]:
a(img).shape

In [None]:
new_transform = torchvision.transforms.Compose(
    [torchvision.transforms.ToTensor(), ReshapeToVector()]
)

### Putting it all together

In [None]:
train_dataset_final = DatasetMNIST("./notMNIST_small", transform=new_transform)

In [None]:
train_loader = DataLoader(train_dataset_final, batch_size=8, shuffle=True)

train_iter = iter(train_loader)
print(type(train_iter))

images, labels = next(train_iter)

print("images shape on batch size = {}".format(images.size()))
print("labels shape on batch size = {}".format(labels.size()))

In [None]:
# create network again just in case
model = nn.Sequential()
model.add_module("first", nn.Linear(784, 10))
# no Softmax layer here: F.cross_entropy expects raw scores (logits)
# and applies log-softmax internally

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
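`F.cross_entropy`, used in the training loop below, combines `log_softmax` and `nll_loss`, so the model should output raw, unnormalized scores; putting a softmax layer in front of it would apply softmax twice. Here is a sketch with random logits (the values are arbitrary) confirming the equivalence:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)          # raw class scores for 4 samples, 10 classes
labels = torch.tensor([0, 3, 9, 3])  # integer class indices

ce = F.cross_entropy(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)
print(torch.allclose(ce, manual))  # True
```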

In [None]:
history = []

for i in range(100):
    # fetch the next batch (8 images, as set by batch_size above)
    x_batch, y_batch = next(train_iter)

    # predict class scores (logits)
    y_predicted = model(x_batch)

    # compute loss; cross_entropy applies log-softmax internally
    loss = F.cross_entropy(y_predicted, y_batch, reduction="mean")

    # compute gradients
    loss.backward()

    # Adam step
    opt.step()

    # clear gradients
    opt.zero_grad()

    history.append(loss.item())

    if i % 10 == 0:
        print("step #%i | mean loss = %.3f" % (i, np.mean(history[-10:])))

### Your turn
Try adding more transformations (e.g. random crop, rotation, etc.) and train your model!

## More about pytorch

* Using torch on GPU and multi-GPU - [link](http://pytorch.org/docs/master/notes/cuda.html)
* More tutorials on pytorch - [link](http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html)
* Pytorch examples - a repo that implements many cool DL models in pytorch - [link](https://github.com/pytorch/examples)
* Practical pytorch - a repo that implements some... other cool DL models... yes, in pytorch - [link](https://github.com/spro/practical-pytorch)
* And some more - [link](https://www.reddit.com/r/pytorch/comments/6z0yeo/pytorch_and_pytorch_tricks_for_kaggle/)