# Multiclass Logistic Regression with ``gluon``

Now that we've built a [logistic regression model from scratch](http://5-softmax-reression-scratch.ipynb), let's make this more efficient with ``gluon``.

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

We'll also want to set the compute context for our modeling. Feel free to go ahead and change this to mx.gpu(0) if you're running on an appropriately endowed machine.

In [2]:
ctx = mx.cpu()

## The MNIST Dataset

First, we'll grab the data.

In [3]:
mnist = mx.test_utils.get_mnist()

## Data Iterators

And load up two data iterators.

In [4]:
batch_size = 64
train_data = mx.io.NDArrayIter(mnist["train_data"], mnist["train_label"], batch_size, shuffle=True)
test_data = mx.io.NDArrayIter(mnist["test_data"], mnist["test_label"], batch_size, shuffle=True)

We're also going to want to load up an iterator with *test* data. After we train on the training dataset we're going to want to test our model on the test data. Otherwise, for all we know, our model could be doing something stupid (or treacherous?) like memorizing the training examples and regurgitating the labels on command.

## Multiclass Logistic Regression

Now we're going to define our model. 

In [5]:
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(10))

## Parameter initialization


In [6]:
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)

## Softmax Cross Entropy Loss

Note, we didn't have to include the softmax layer because MXNet's has an efficient function that simultaneously computes the softmax activation and cross-entropy loss.

In [7]:
loss = gluon.loss.SoftmaxCrossEntropyLoss()

## Optimizer

And let's instantiate an optimizer to make our updates

In [8]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

## Evaluation Metric

This time, let's simplify the evaluation code by relying on MXNet's built-in ``metric`` package.

In [9]:
metric = mx.metric.Accuracy()

def evaluate_accuracy(data_iterator, net):
    numerator = 0.
    denominator = 0.
    
    data_iterator.reset()
    for i, batch in enumerate(data_iterator):
        with autograd.record():
            data = batch.data[0].as_in_context(ctx).reshape((-1,784))
            label = batch.label[0].as_in_context(ctx)
            label_one_hot = nd.one_hot(label, 10)
            output = net(data)
        
        metric.update([label], [output])
    return metric.get()[1]

Because we initialized our model randomly, and because roughly one tenth of all examples belong to each fo the ten classes, we should have an accuracy in the ball park of .10.

In [10]:
evaluate_accuracy(test_data, net)

0.081608280254777066

## Execute training loop

In [11]:
epochs = 10
moving_loss = 0.

for e in range(epochs):
    train_data.reset()
    for i, batch in enumerate(train_data):
        data = batch.data[0].as_in_context(ctx).reshape((-1,784))
        label = batch.label[0].as_in_context(ctx)
        with autograd.record():
            output = net(data)
            cross_entropy = loss(output, label)
            cross_entropy.backward()
        trainer.step(data.shape[0])
        
        ##########################
        #  Keep a moving average of the losses
        ##########################
        if i == 0:
            moving_loss = np.mean(cross_entropy.asnumpy()[0])
        else:
            moving_loss = .99 * moving_loss + .01 * np.mean(cross_entropy.asnumpy()[0])
            
    test_accuracy = evaluate_accuracy(test_data, net)
    train_accuracy = evaluate_accuracy(train_data, net)
    print("Epoch %s. Loss: %s, Train_acc %s, Test_acc %s" % (e, moving_loss, train_accuracy, test_accuracy))    
    

Epoch 0. Loss: 0.350726645904, Train_acc 0.79873452476, Test_acc 0.493929140127
Epoch 1. Loss: 0.309667650436, Train_acc 0.850653760119, Test_acc 0.811479772889
Epoch 2. Loss: 0.292941445752, Train_acc 0.870837267577, Test_acc 0.854632587859
Epoch 3. Loss: 0.283371088237, Train_acc 0.881794825876, Test_acc 0.87282057516
Epoch 4. Loss: 0.27702516471, Train_acc 0.8889132413, Test_acc 0.883015551768
Epoch 5. Loss: 0.272453047183, Train_acc 0.893979021109, Test_acc 0.889718107618
Epoch 6. Loss: 0.268978957032, Train_acc 0.897760323447, Test_acc 0.894558487071
Epoch 7. Loss: 0.266237516287, Train_acc 0.900735603342, Test_acc 0.898203487279
Epoch 8. Loss: 0.264009937258, Train_acc 0.903152154914, Test_acc 0.901070365881
Epoch 9. Loss: 0.262155878024, Train_acc 0.905145685154, Test_acc 0.903425853083


## Conclusion

Now let's take a look at how to implement modern neural networks. 