<a href="https://colab.research.google.com/github/usm-cos-432/InClass/blob/master/chapter4/mlp_concise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The following additional libraries are needed to run this
notebook. Note that running on Colab is experimental; please report a GitHub
issue if you have any problems.

In [None]:
!pip install d2l==0.14.4


# Concise Implementation of Multilayer Perceptrons
:label:`sec_mlp_concise`

As you might expect, by relying on the high-level APIs,
we can implement MLPs even more concisely.


In [None]:
from d2l import torch as d2l
import torch
from torch import nn
import numpy as np
import torchvision
from torchvision import transforms
from torch.utils import data
from torch.utils.data.sampler import SubsetRandomSampler

In [None]:
def load_data_fashion_mnist(batch_size, resize=None):
    """Download Fashion-MNIST and return train, validation, and test iterators."""
    trans = [transforms.ToTensor()]
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    mnist_train = torchvision.datasets.FashionMNIST(
        root="../data", train=True, transform=trans, download=True)
    mnist_test = torchvision.datasets.FashionMNIST(
        root="../data", train=False, transform=trans, download=True)
    # Hold out a random 20% of the training set for validation.
    indices = list(range(len(mnist_train)))
    np.random.shuffle(indices)
    split = int(np.floor(0.20 * len(mnist_train)))
    mnist_train_sample = SubsetRandomSampler(indices[split:])
    mnist_valid_sample = SubsetRandomSampler(indices[:split])

    # The train and validation loaders share one dataset but sample disjoint indices.
    return (data.DataLoader(mnist_train, batch_size, sampler=mnist_train_sample),
            data.DataLoader(mnist_train, batch_size, sampler=mnist_valid_sample),
            data.DataLoader(mnist_test, batch_size, shuffle=False))
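
The 80/20 split inside `load_data_fashion_mnist` can be checked on a small scale. The sketch below is an illustration only: it swaps Fashion-MNIST for a hypothetical 100-example `TensorDataset` (the names `toy_dataset`, `train_idx`, and `valid_idx` are not part of the notebook) and follows the same recipe of shuffling the indices and handing each part to a `SubsetRandomSampler`.

```python
import numpy as np
import torch
from torch.utils import data
from torch.utils.data.sampler import SubsetRandomSampler

# Hypothetical stand-in for Fashion-MNIST: example i has feature value i.
features = torch.arange(100, dtype=torch.float32).reshape(100, 1)
labels = torch.arange(100) % 10
toy_dataset = data.TensorDataset(features, labels)

# Same recipe as in load_data_fashion_mnist: shuffle, then hold out 20%.
np.random.seed(0)
indices = list(range(len(toy_dataset)))
np.random.shuffle(indices)
split = int(np.floor(0.20 * len(toy_dataset)))
train_idx, valid_idx = indices[split:], indices[:split]

train_loader = data.DataLoader(toy_dataset, batch_size=16,
                               sampler=SubsetRandomSampler(train_idx))
valid_loader = data.DataLoader(toy_dataset, batch_size=16,
                               sampler=SubsetRandomSampler(valid_idx))

# Collect the example ids that each loader actually yields in one pass.
train_seen = {int(x) for X, _ in train_loader for x in X.flatten()}
valid_seen = {int(x) for X, _ in valid_loader for x in X.flatten()}
print(len(train_seen), len(valid_seen), train_seen.isdisjoint(valid_seen))
```

Both loaders wrap the same dataset object; only the sampler decides which examples each one can draw, so no example appears in both the training and the validation iterator.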

In [None]:
def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater): 
    """Train a model (defined in Chapter 3)."""
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.0, 1.0],
                        legend=['train loss', 'train acc', 'test acc'])
    for epoch in range(num_epochs):
        train_metrics = d2l.train_epoch_ch3(net, train_iter, loss, updater)
        test_acc = d2l.evaluate_accuracy(net, test_iter)
        animator.add(epoch + 1, train_metrics + (test_acc,))

In [None]:
batch_size = 24
train_iter, valid_iter, test_iter = load_data_fashion_mnist(batch_size)
X, y = next(iter(train_iter))
d2l.show_images(X.reshape(batch_size, 28, 28), 2, 9, titles=d2l.get_fashion_mnist_labels(y));


## Model

Compared with our concise implementation
of softmax regression
(:numref:`sec_softmax_concise`),
the only difference is that we add
*two* fully connected layers
(previously, we added *one*).
The first is our hidden layer,
which contains 256 hidden units
and applies the ReLU activation function.
The second is our output layer.


In [None]:
net = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

def init_weights(m):
    # Draw the weights of every linear layer from N(0, 0.01^2).
    if isinstance(m, nn.Linear):
        torch.nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights)
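
To see how the layers described above transform a minibatch, it can help to trace the tensor shape through the network. The following is a minimal sketch, assuming a random tensor as a stand-in for four 28x28 grayscale images; the names `mlp`, `shapes`, and `num_params` are introduced here for illustration and the model is rebuilt so that the snippet runs on its own.

```python
import torch
from torch import nn

# Same architecture as above, rebuilt so this snippet is self-contained.
mlp = nn.Sequential(nn.Flatten(),
                    nn.Linear(784, 256),
                    nn.ReLU(),
                    nn.Linear(256, 10))

# Random stand-in for a minibatch of 4 single-channel 28x28 images.
X = torch.randn(4, 1, 28, 28)
shapes = []
for layer in mlp:
    X = layer(X)
    shapes.append((layer.__class__.__name__, tuple(X.shape)))

for name, shape in shapes:
    print(f"{name:8s} -> {shape}")

# 784*256 + 256 weights and biases in the hidden layer, 256*10 + 10 in the output.
num_params = sum(p.numel() for p in mlp.parameters())
print(num_params)  # 203530
```

The flatten step turns each image into a 784-dimensional vector, the hidden layer maps it to 256 units, and the output layer produces one logit per class.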

The training loop is exactly the same
as when we implemented softmax regression.
This modularity enables us to separate
matters concerning the model architecture
from orthogonal considerations.


In [None]:
batch_size, lr, num_epochs = 256, 0.1, 10
loss = nn.CrossEntropyLoss()
trainer = torch.optim.SGD(net.parameters(), lr=lr)

In [None]:
train_iter, valid_iter, test_iter = load_data_fashion_mnist(batch_size)
net.apply(init_weights)
train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)
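
Because `train_ch3` delegates each pass over the data to `d2l.train_epoch_ch3`, the per-minibatch update is not visible in this notebook. The sketch below shows what one epoch of minibatch SGD typically looks like in plain PyTorch. It is an illustration under stated assumptions, not the `d2l` source: `train_one_epoch` and the toy two-feature dataset are hypothetical names and data introduced here.

```python
import torch
from torch import nn
from torch.utils import data

def train_one_epoch(net, train_iter, loss, optimizer):
    """Run one pass over train_iter and return (mean loss, accuracy)."""
    net.train()
    total_loss, num_correct, num_examples = 0.0, 0, 0
    for X, y in train_iter:
        y_hat = net(X)
        l = loss(y_hat, y)           # mean loss over the minibatch
        optimizer.zero_grad()        # clear gradients from the previous step
        l.backward()
        optimizer.step()
        total_loss += float(l) * len(y)
        num_correct += int((y_hat.argmax(dim=1) == y).sum())
        num_examples += len(y)
    return total_loss / num_examples, num_correct / num_examples

# Hypothetical toy problem: the label is 1 when the two features sum to > 0.
torch.manual_seed(0)
X = torch.randn(512, 2)
y = (X.sum(dim=1) > 0).long()
toy_iter = data.DataLoader(data.TensorDataset(X, y), batch_size=64, shuffle=True)

toy_net = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 2))
toy_loss = nn.CrossEntropyLoss()
toy_trainer = torch.optim.SGD(toy_net.parameters(), lr=0.1)

history = [train_one_epoch(toy_net, toy_iter, toy_loss, toy_trainer)
           for _ in range(20)]
print(history[0], history[-1])
```

Nothing in this loop depends on the architecture of `net`, which is why the same training code serves both softmax regression and the MLP.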

In [None]:
print(d2l.evaluate_accuracy(net, train_iter))
print(d2l.evaluate_accuracy(net, test_iter))
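
The accuracy reported above is the fraction of examples whose highest-scoring class matches the label. A minimal sketch of such an evaluation follows, assuming a hypothetical helper named `accuracy` (it is not the `d2l` implementation) and a tiny probe model whose predictions are known in advance.

```python
import torch
from torch import nn
from torch.utils import data

def accuracy(net, data_iter):
    """Fraction of examples in data_iter that net classifies correctly."""
    net.eval()                       # switch off training-only behavior
    correct, total = 0, 0
    with torch.no_grad():            # no gradients needed for evaluation
        for X, y in data_iter:
            correct += int((net(X).argmax(dim=1) == y).sum())
            total += len(y)
    return correct / total

# Hypothetical probe: identity weights, so the prediction is the index
# of the largest input feature.
probe = nn.Linear(3, 3, bias=False)
with torch.no_grad():
    probe.weight.copy_(torch.eye(3))

X = torch.eye(3).repeat(4, 1)        # 12 one-hot rows
y = torch.arange(3).repeat(4)        # labels matching the one-hot position
right = data.DataLoader(data.TensorDataset(X, y), batch_size=5)
wrong = data.DataLoader(data.TensorDataset(X, (y + 1) % 3), batch_size=5)

acc_right = accuracy(probe, right)
acc_wrong = accuracy(probe, wrong)
print(acc_right, acc_wrong)  # 1.0 0.0
```

A helper of this shape accepts any iterator of `(X, y)` minibatches, so the held-out `valid_iter` returned by `load_data_fashion_mnist` could be scored the same way as the training and test iterators.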
