# Multiclass logistic regression with ``gluon``

Now that we've built a [logistic regression model from scratch](./P02-C03-softmax-regression-scratch.ipynb), let's make this more efficient with ``gluon``. If you completed the corresponding chapters on linear regression, you might be tempted to rest your eyes a little in this one. We'll be using ``gluon`` in a rather similar way, and since the library is reasonably well designed, you won't have to do much differently. To keep you awake we'll introduce a few subtle tricks.

Let's start by importing the standard packages. 

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

## Set the context

We'll set the context. In the linear regression tutorial we did all of our computation on the CPU (`mx.cpu()`) just to keep things simple. When you've got 2-dimensional data and scalar labels, a smartwatch can probably handle the job. In this tutorial, though, we'll be working with a considerably larger dataset. If you happen to be running this code on a server with a GPU and remembered to build MXNet with ``CUDA=1``, you might want to substitute the following line for its commented-out counterpart.

In [2]:
########################
#  Set the context to CPU
########################
ctx = mx.cpu()

########################
#  If you have GPU, instead call:
########################
#ctx = mx.gpu()

## The MNIST Dataset

We won't suck up too much wind describing the MNIST dataset for a second time. If you're unfamiliar with the dataset and are reading these chapters out of sequence, take a look at the data section in the previous chapter on [softmax regression from scratch](./P02-C03-softmax-regression-scratch.ipynb).

In [3]:
mnist = mx.test_utils.get_mnist()

## Data Iterators

We'll load up data iterators corresponding to the training and test splits of our dataset. 

In [4]:
batch_size = 64
num_inputs = 784
num_outputs = 10
train_data = mx.io.NDArrayIter(mnist["train_data"], mnist["train_label"], batch_size, shuffle=True)
test_data = mx.io.NDArrayIter(mnist["test_data"], mnist["test_label"], batch_size, shuffle=True)

Notice that we also loaded an iterator over the *test* data. After we train on the training dataset, we're going to want to evaluate our model on data it hasn't seen. Otherwise, for all we know, our model could be doing something stupid (or treacherous?) like memorizing the training examples and regurgitating the labels on command.

## Multiclass Logistic Regression

Now we're going to define our model. 
Remember from [our tutorial on linear regression with ``gluon``](./P02-C02-linear-regression-gluon)
that we add ``Dense`` layers by calling ``net.add(gluon.nn.Dense(num_outputs))``. 
This leaves the parameter shapes underspecified, 
but ``gluon`` will infer the desired shapes 
the first time we pass real data through the network.


In [5]:
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(num_outputs))
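To make the idea of deferred shape inference concrete, here is a minimal sketch in plain Python. The ``LazyDense`` class is hypothetical and written only for illustration; it is not how ``gluon`` implements ``Dense``. It just shows that a layer can know its number of outputs up front and wait for the first real input before allocating its weights.

```python
import random


class LazyDense:
    """Hypothetical stand-in for a dense layer with deferred shape inference."""

    def __init__(self, num_outputs):
        self.num_outputs = num_outputs
        self.weight = None  # allocated on the first forward pass
        self.bias = [0.0] * num_outputs

    def __call__(self, x):
        # x is a single example, given as a list of floats.
        if self.weight is None:
            num_inputs = len(x)  # infer the input dimension from real data
            self.weight = [
                [random.gauss(0.0, 1.0) for _ in range(num_inputs)]
                for _ in range(self.num_outputs)
            ]
        # One output per weight row: dot(row, x) + bias.
        return [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


layer = LazyDense(10)
assert layer.weight is None        # nothing allocated yet
out = layer([0.5] * 784)           # the first call fixes the weight shape
weight_shape = (len(layer.weight), len(layer.weight[0]))
print(weight_shape, len(out))
```

After the first call the weight matrix has one row per output and one column per input feature, which is why we only had to tell the layer about ``num_outputs``.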

## Parameter initialization


In [6]:
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=ctx)

## Softmax Cross Entropy Loss

Note that we didn't have to include a softmax layer because MXNet has an efficient function that simultaneously computes the softmax activation and the cross-entropy loss. However, if you ever need the output probabilities, you can apply ``nd.softmax`` to the network's output yourself.

In [7]:
loss = gluon.loss.SoftmaxCrossEntropyLoss()
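One common reason to compute softmax and cross-entropy together is numerical stability: the loss can be written directly in terms of the raw outputs (logits) without ever forming tiny probabilities and taking their logs. The sketch below illustrates that idea in plain Python for a single example. The function names are our own, and this is not MXNet's implementation.

```python
import math


def softmax(logits):
    # Subtracting the max leaves the result unchanged but avoids overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label):
    # -log(softmax(logits)[label]) rewritten as log-sum-exp minus the true logit.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
loss_value = softmax_cross_entropy(logits, 0)

# Both routes agree on ordinary inputs ...
print(loss_value, -math.log(probs[0]))

# ... and the fused form stays finite even for extreme logits.
big = softmax_cross_entropy([1000.0, 0.0, 0.0], 0)
print(big)
```

The same ``softmax`` helper shows how you would recover class probabilities from raw outputs when you need them.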

## Optimizer

And let's instantiate an optimizer to make our updates.

In [8]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
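As a reminder of what a plain SGD update does, here is a minimal sketch in pure Python. It assumes the gradients have been summed over the examples in a batch and are divided by the batch size before the step, which is the role the argument to ``trainer.step`` plays later in the training loop. The ``sgd_step`` function is hypothetical and ignores extras such as momentum or weight decay.

```python
def sgd_step(params, grads, learning_rate, batch_size):
    """Return updated parameters after one vanilla SGD step.

    grads are assumed to be summed over the batch, so we rescale
    by 1 / batch_size to get the average gradient.
    """
    return [p - learning_rate * g / batch_size for p, g in zip(params, grads)]


params = [1.0, -2.0]
summed_grads = [64.0, -32.0]   # made-up gradients summed over 64 examples
new_params = sgd_step(params, summed_grads, learning_rate=0.1, batch_size=64)
print(new_params)
```

With an average gradient of 1.0 and -0.5, each parameter moves by the learning rate times that average, in the direction that reduces the loss.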

## Evaluation Metric

This time, let's simplify the evaluation code by relying on MXNet's built-in ``metric`` package.

In [9]:
def evaluate_accuracy(data_iterator, net):
    acc = mx.metric.Accuracy()
    data_iterator.reset()
    for i, batch in enumerate(data_iterator):
        data = batch.data[0].as_in_context(ctx).reshape((-1,784))
        label = batch.label[0].as_in_context(ctx)
        output = net(data)
        predictions = nd.argmax(output, axis=1)
        acc.update(preds=predictions, labels=label)
    return acc.get()[1]
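If you're curious what the accuracy computation boils down to, here is a small sketch in plain Python: take the index of the largest output for each example and count how often it matches the label. The helper names and the toy data are made up for illustration.

```python
def argmax(scores):
    # Index of the largest score (the first one wins on ties).
    return max(range(len(scores)), key=lambda k: scores[k])


def accuracy(outputs, labels):
    correct = sum(1 for scores, y in zip(outputs, labels) if argmax(scores) == y)
    return correct / len(labels)


# Four toy examples with three classes each.
outputs = [
    [0.1, 2.0, -1.0],
    [3.0, 0.0, 0.5],
    [0.2, 0.1, 0.9],
    [1.5, 1.0, 0.0],
]
labels = [1, 0, 2, 1]

acc_value = accuracy(outputs, labels)
print(acc_value)  # 3 of the 4 predictions match, so 0.75
```

Note that we never need the softmax here: the largest raw output is also the class with the largest probability.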

Because we initialized our model randomly, and because roughly one tenth of all examples belong to each of the ten classes, we should have an accuracy in the ballpark of .10.

In [10]:
evaluate_accuracy(test_data, net)

0.11743630573248408

## Execute training loop

In [11]:
epochs = 10
moving_loss = 0.

for e in range(epochs):
    train_data.reset()
    for i, batch in enumerate(train_data):
        data = batch.data[0].as_in_context(ctx).reshape((-1,784))
        label = batch.label[0].as_in_context(ctx)
        with autograd.record():
            output = net(data)
            cross_entropy = loss(output, label)
        cross_entropy.backward()
        trainer.step(data.shape[0])
        
        ##########################
        #  Keep a moving average of the losses
        #  (averaged over every example in the batch)
        ##########################
        if i == 0:
            moving_loss = np.mean(cross_entropy.asnumpy())
        else:
            moving_loss = .99 * moving_loss + .01 * np.mean(cross_entropy.asnumpy())
            
    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" % (e, moving_loss, train_accuracy, test_accuracy))    
    

Epoch 0. Loss: 0.993500187288, Train_acc 0.79157782516, Test_acc 0.803742038217
Epoch 1. Loss: 0.613952599768, Train_acc 0.836587153518, Test_acc 0.846835191083
Epoch 2. Loss: 0.501101848272, Train_acc 0.856093416844, Test_acc 0.863654458599
Epoch 3. Loss: 0.447380604863, Train_acc 0.867154184435, Test_acc 0.873308121019
Epoch 4. Loss: 0.415179233267, Train_acc 0.874083821962, Test_acc 0.878582802548
Epoch 5. Loss: 0.393141356786, Train_acc 0.879480943497, Test_acc 0.883260350318
Epoch 6. Loss: 0.376974711482, Train_acc 0.883795309168, Test_acc 0.887141719745
Epoch 7. Loss: 0.36447996155, Train_acc 0.887043576759, Test_acc 0.890326433121
Epoch 8. Loss: 0.354348139062, Train_acc 0.890175239872, Test_acc 0.893113057325
Epoch 9. Loss: 0.345816105349, Train_acc 0.892457356077, Test_acc 0.894705414013
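The loss reported above is an exponential moving average rather than the loss of any single batch. The sketch below isolates that update in plain Python, using the same 0.99 / 0.01 weights as the training loop; the function name and the toy loss values are made up for illustration.

```python
def update_moving_loss(moving_loss, batch_loss, step, decay=0.99):
    """Exponential moving average of the per-batch loss."""
    if step == 0:
        # Start from the first observed loss instead of an arbitrary 0.
        return batch_loss
    return decay * moving_loss + (1 - decay) * batch_loss


moving = 0.0
for step, batch_loss in enumerate([2.0, 1.0, 1.0]):
    moving = update_moving_loss(moving, batch_loss, step)
print(moving)
```

Because each new batch only contributes 1% of the estimate, the moving loss reacts slowly. That smooths out batch-to-batch noise, but it also means the printed value lags behind the model's current loss.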


## Conclusion

Now let's take a look at how to implement modern neural networks. 

For whinges or inquiries, [open an issue on GitHub.](https://github.com/zackchase/mxnet-the-straight-dope)