[![Dataflowr](https://raw.githubusercontent.com/dataflowr/website/master/_assets/dataflowr_logo.png)](https://dataflowr.github.io/website/)

# [Module 5](https://dataflowr.github.io/website/modules/5-stacking-layers/): overfitting an MLP on CIFAR10

Training loop over CIFAR10 (50,000 train images, 10,000 test images). What happens if you
- switch the training to a GPU? Is it faster?
- remove the `ReLU()`?
- increase the learning rate?
- stack more layers?
- perform more epochs?

Can you completely overfit the training set (i.e. get 100% accuracy)?
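
For the GPU question above, here is a minimal sketch of what "switching to a GPU" involves: the model and every batch have to live on the same device. The name `device` and the random `batch` are placeholders chosen for illustration, not part of the notebook, and the code falls back to the CPU when no CUDA device is found.

```python
import torch
import torch.nn as nn

# Pick a GPU when one is available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Same architecture as the notebook's MLP, moved to the chosen device.
net = nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10)).to(device)
criterion = nn.CrossEntropyLoss()

# Stand-in for one flattened CIFAR10 batch (random values, illustration only).
batch = torch.randn(64, 3 * 32 * 32)
targets = torch.randint(0, 10, (64,))

# Inputs and targets must be moved to the same device as the model.
batch, targets = batch.to(device), targets.to(device)
output = net(batch)
loss = criterion(output, targets)
print(output.shape, output.device, loss.item())
```

In the real training loop, the two `.to(device)` calls would go at the top of the loop body, for both the training and the test loader. Whether this is faster for such a small network is exactly what the exercise asks you to measure.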

This code is not modular at all. Create a function for each specific task.
(Hint: see [this](https://github.com/pytorch/examples/blob/master/mnist/main.py).)
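
One possible way to split the loop into functions is sketched below. The function names `train_epoch` and `evaluate` are hypothetical, and a tiny random dataset stands in for CIFAR10 so the sketch runs without a download. Treat it as a starting point rather than the expected solution.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_epoch(net, loader, criterion, optimizer):
    """Run one training epoch; return (mean loss per sample, accuracy in %)."""
    net.train()
    total_loss, correct, seen = 0.0, 0, 0
    for batch, targets in loader:
        output = net(batch)
        loss = criterion(output, targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # criterion averages over the batch, so weight by the batch size
        total_loss += loss.item() * batch.size(0)
        correct += (output.argmax(dim=1) == targets).sum().item()
        seen += batch.size(0)
    return total_loss / seen, 100 * correct / seen


def evaluate(net, loader):
    """Return accuracy in % over the loader, without tracking gradients."""
    net.eval()
    correct, seen = 0, 0
    with torch.no_grad():
        for batch, targets in loader:
            correct += (net(batch).argmax(dim=1) == targets).sum().item()
            seen += batch.size(0)
    return 100 * correct / seen


# Tiny random dataset standing in for CIFAR10 (illustration only).
torch.manual_seed(0)
data = TensorDataset(torch.randn(256, 3 * 32 * 32), torch.randint(0, 10, (256,)))
loader = DataLoader(data, batch_size=64)

net = nn.Sequential(nn.Linear(3 * 32 * 32, 100), nn.ReLU(), nn.Linear(100, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

for epoch in range(3):
    loss, acc = train_epoch(net, loader, criterion, optimizer)
    print('Epoch {}: loss {:.4f}, accuracy {:.2f}%'.format(epoch, loss, acc))
test_acc = evaluate(net, loader)
print('Accuracy {:.2f}%'.format(test_acc))
```

Counting the samples actually seen (`seen`) avoids assuming that every batch holds exactly 64 images.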

Your training went well. Good. Why not save the weights of the network (`net.state_dict()`) using `torch.save()`?
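
A minimal sketch of saving and restoring the weights follows. The file name `mlp_cifar10.pt` and the temporary directory are assumptions made for this example, and the network here is freshly initialized rather than trained.

```python
import os
import tempfile

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10))

# Hypothetical file name, written to a temporary directory for this sketch.
path = os.path.join(tempfile.gettempdir(), 'mlp_cifar10.pt')
torch.save(net.state_dict(), path)

# Rebuild the same architecture, then load the saved weights into it.
restored = nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10))
restored.load_state_dict(torch.load(path))
restored.eval()

# Every tensor in the restored model should match the original exactly.
same = all(torch.equal(a, b)
           for a, b in zip(net.state_dict().values(), restored.state_dict().values()))
print(same)
```

Only the tensors are stored, not the architecture, so `load_state_dict()` needs a model built with the same layers.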

In [None]:
import torch
import torchvision
import torch.nn as nn
import torchvision.transforms as t

# define network structure
net = nn.Sequential(nn.Linear(3 * 32 * 32, 1000), nn.ReLU(), nn.Linear(1000, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr = 0.01, momentum=0.9)

# load data
to_tensor =  t.ToTensor()
normalize = t.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
flatten =  t.Lambda(lambda x:x.view(-1))

transform_list = t.Compose([to_tensor, normalize, flatten])
train_set = torchvision.datasets.CIFAR10(root='.', train=True, transform=transform_list, download=True)
test_set = torchvision.datasets.CIFAR10(root='.', train=False, transform=transform_list, download=True)

train_loader = torch.utils.data.DataLoader(train_set, batch_size=64)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64)

# === Train === ###
net.train()

# train loop
for epoch in range(3):
    train_correct = 0
    train_loss = 0
    print('Epoch {}'.format(epoch))

    # loop per epoch
    for i, (batch, targets) in enumerate(train_loader):

        output = net(batch)
        loss = criterion(output, targets)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        pred = output.max(1, keepdim=True)[1]
        train_correct += pred.eq(targets.view_as(pred)).sum().item()
        train_loss += loss.item()  # accumulate a Python float, not a tensor attached to the graph

        if i % 100 == 10: print('Train loss {:.4f}, Train accuracy {:.2f}%'.format(
            train_loss / ((i+1) * 64), 100 * train_correct / ((i+1) * 64)))

print('End of training.\n')

# === Test === ###
test_correct = 0
net.eval()

# loop, over whole test set (no gradients needed for evaluation)
with torch.no_grad():
    for i, (batch, targets) in enumerate(test_loader):

        output = net(batch)
        pred = output.max(1, keepdim=True)[1]
        test_correct += pred.eq(targets.view_as(pred)).sum().item()

# divide by the true number of test images: the last batch holds fewer than 64
print('End of testing. Test accuracy {:.2f}%'.format(
    100 * test_correct / len(test_set)))

Epoch 0
Train loss 0.0347, Train accuracy 17.76%
Train loss 0.0296, Train accuracy 32.55%
Train loss 0.0282, Train accuracy 36.53%
Train loss 0.0275, Train accuracy 38.36%
Train loss 0.0267, Train accuracy 39.88%
Train loss 0.0264, Train accuracy 40.66%
Train loss 0.0261, Train accuracy 41.36%
Train loss 0.0258, Train accuracy 41.90%
Epoch 1
Train loss 0.0229, Train accuracy 48.58%
Train loss 0.0229, Train accuracy 48.80%
Train loss 0.0229, Train accuracy 48.96%
Train loss 0.0228, Train accuracy 49.18%
Train loss 0.0226, Train accuracy 49.52%
Train loss 0.0225, Train accuracy 49.65%
Train loss 0.0224, Train accuracy 49.79%
Train loss 0.0224, Train accuracy 49.94%
Epoch 2
Train loss 0.0209, Train accuracy 53.12%
Train loss 0.0212, Train accuracy 53.03%
Train loss 0.0212, Train accuracy 52.98%
Train loss 0.0211, Train accuracy 53.46%
Train loss 0.0209, Train accuracy 53.69%
Train loss 0.0208, Train accuracy 53.82%
Train loss 0.0208, Train accuracy 53.84%


## Autograd tips and tricks

Pointers are everywhere!

In [None]:
net = nn.Linear(2, 2)
w = net.weight
print(w)

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
net.weight.data -= 0.01 * net.weight.grad # <--- What is this?
print(w)

In [None]:
net = nn.Linear(2, 2)
w = net.weight.clone()
print(w)

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
net.weight.data -= 0.01 * net.weight.grad # <--- What is this?
print(w)
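
The difference between the two cells above can be checked directly. In this sketch, `alias` and `snapshot` are illustrative names, and the input is fixed to ones so the gradient is guaranteed to be nonzero. The update is written inside `torch.no_grad()`, a common alternative to going through `.data`.

```python
import torch
import torch.nn as nn

net = nn.Linear(2, 2)
alias = net.weight                       # same Parameter object, no copy is made
snapshot = net.weight.detach().clone()   # independent copy of the current values

x = torch.ones(1, 2)                     # fixed input so the gradient is nonzero
net(x).sum().backward()

# Manual gradient step, done in place and not recorded by autograd.
with torch.no_grad():
    net.weight -= 0.01 * net.weight.grad

print(alias is net.weight)                # the alias is the very same object
print(torch.equal(alias, net.weight))     # so it follows the update
print(torch.equal(snapshot, net.weight))  # the clone keeps the old values
```

Assigning a parameter to a new name only copies the reference, while `clone()` allocates new storage that later in-place updates do not touch.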

Sharing weights

In [None]:
net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net[0].weight = net[1].weight  # weight sharing

x = torch.rand(1, 2)
y = net(x).sum()
y.backward()
print(net[0].weight.grad)
print(net[1].weight.grad)
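
To see why both prints show the same gradient, the sketch below checks that the two layers hold one and the same `Parameter` after the assignment. The fixed input `torch.ones(1, 2)` is an assumption made for reproducibility.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2))
net[0].weight = net[1].weight  # both layers now hold the same Parameter

print(net[0].weight is net[1].weight)

# parameters() skips duplicates: one shared weight plus two biases.
n_params = len(list(net.parameters()))
print(n_params)

x = torch.ones(1, 2)
net(x).sum().backward()

# A single .grad tensor accumulates the contributions of both layers.
shared_grad = net[0].weight.grad.data_ptr() == net[1].weight.grad.data_ptr()
print(shared_grad)
```

Because the weight appears only once in `net.parameters()`, an optimizer would update it once per step, using the summed gradient.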
