# Linear regression with ``gluon``

Now that we've implemented a whole neural network from scratch, using nothing but ``mx.ndarray`` and ``mxnet.autograd``, let's see how we can make the same model while doing a lot less work.

Again, let's import some packages, this time adding ``mxnet.gluon`` to the list of dependencies.

In [2]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon

## Set the context

And let's also set a context where we'll do most of the computation.

In [3]:
ctx = mx.cpu()

## Build the dataset

Again we'll look at the problem of linear regression and stick with the same synthetic data. 

In [4]:
X = nd.random_normal(shape=(10000,2))
y = 2* X[:,0] - 3.4 * X[:,1] + 4.2 + .01 * nd.random_normal(shape=(10000,))

## Load the data iterator

We'll stick with the ``NDArrayIter`` for handling our data batching.

In [5]:
batch_size = 4
train_data = mx.io.NDArrayIter(X, y, batch_size, shuffle=True)
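
To make the batching idea concrete, here is a minimal plain-Python sketch of what a shuffling mini-batch iterator does. The helper name ``iterate_minibatches`` is hypothetical and this is not how ``NDArrayIter`` is implemented. In particular, the sketch simply yields a shorter final batch when the dataset size is not a multiple of ``batch_size``, which is not necessarily what ``NDArrayIter`` does.

```python
import random

def iterate_minibatches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (feature_batch, label_batch) pairs that cover the dataset once."""
    indices = list(range(len(features)))
    if shuffle:
        # Visit the examples in a random order on each pass.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chosen = indices[start:start + batch_size]
        yield [features[i] for i in chosen], [labels[i] for i in chosen]

# Ten toy examples in batches of four: the last batch is simply shorter.
toy_features = [[float(i), float(-i)] for i in range(10)]
toy_labels = [float(i) for i in range(10)]
batches = list(iterate_minibatches(toy_features, toy_labels, batch_size=4, seed=0))
batch_sizes = [len(feature_batch) for feature_batch, _ in batches]
```

With 10000 examples and a batch size of 4, as in our dataset, one pass yields 2500 full batches.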

## Define the model

Before, we had to individually allocate our parameters and then compose them as a model. While it's good to know how to do things from scratch, with ``gluon`` we can usually just compose a network from predefined standard layers.

In [6]:
net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(1, in_units=2))

## Initialize parameters

Before we can do anything with this model, we'll have to initialize the weights. *MXNet* provides a variety of common initializers in ``mxnet.init``. Note that we also pass a context to ``initialize``. That's how we tell the ``gluon`` model where to store its parameters. Once we start training deep nets, we'll generally want to keep parameters on one or more GPUs.

In [7]:
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
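
For intuition about what the initializer is doing, here is a rough plain-Python sketch of Xavier initialization in its uniform, "average fan" form. We are assuming that this is the variant ``mx.init.Xavier`` uses by default. The function ``xavier_uniform`` is a hypothetical helper for illustration, not part of *MXNet*.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, magnitude=2.24, rng=random):
    """Draw a fan_out x fan_in weight matrix from U(-scale, scale)."""
    # Assumed rule: scale = sqrt(magnitude / average of fan_in and fan_out).
    factor = (fan_in + fan_out) / 2.0
    scale = math.sqrt(magnitude / factor)
    weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
               for _ in range(fan_out)]
    return weights, scale

# Our Dense layer maps 2 inputs to 1 output, so the weight matrix is 1 x 2.
weights, scale = xavier_uniform(fan_in=2, fan_out=1, rng=random.Random(0))
```

Under these assumptions, every initial weight of our tiny layer lies within about 1.22 of zero.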

## Define loss

Instead of writing our own loss function, we're just going to call down to ``gluon.loss.L2Loss``.

In [8]:
loss = gluon.loss.L2Loss()
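
As a reminder of what this loss measures, here is a small plain-Python sketch. It assumes, to the best of our understanding, that ``L2Loss`` returns half the squared difference between prediction and label for each example. The function ``l2_loss`` below is our own illustration, not the library code.

```python
def l2_loss(predictions, labels):
    """Per-example loss: half the squared difference."""
    return [0.5 * (p - t) ** 2 for p, t in zip(predictions, labels)]

# One loss value per example, just as the gluon loss returns one per row.
losses = l2_loss([1.0, 2.0, 4.0], [1.0, 0.0, 5.0])
# losses == [0.0, 2.0, 0.5]
```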

## Optimizer

Instead of writing gradient descent from scratch every time, we can instantiate a ``gluon.Trainer``, passing it a dictionary of parameters.

In [9]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
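
To connect this back to the from-scratch version, here is a sketch of the update we understand a plain ``'sgd'`` trainer to perform on each step, assuming no momentum and no weight decay. Gradients summed over a batch are divided by the batch size, scaled by the learning rate, and subtracted from the parameters. The function ``sgd_step`` is hypothetical.

```python
def sgd_step(params, grads, learning_rate, batch_size):
    """Return updated parameters: p - learning_rate * (g / batch_size)."""
    return [p - learning_rate * g / batch_size for p, g in zip(params, grads)]

# Two parameters, with gradients that were summed over a batch of four.
new_params = sgd_step([1.0, -2.0], [4.0, 8.0], learning_rate=0.1, batch_size=4)
```

This is why the batch size gets passed to ``trainer.step`` in the training loop: it normalizes the summed gradient.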

## Execute training loop

You might have noticed that it was a bit more concise to express our model in ``gluon``. For example, we didn't have to individually allocate parameters, define our loss function, or implement stochastic gradient descent. The benefits of relying on ``gluon``'s abstractions will grow substantially once we start working with much more complex models. But once we have all the basic pieces in place, the training loop itself is quite similar to what we would do if implementing everything from scratch. 

To refresh your memory: for some number of ``epochs``, we'll make a complete pass over the dataset (``train_data``), grabbing one mini-batch of inputs and the corresponding ground-truth labels at a time. 

Then, for each batch, we'll go through the following ritual:
* Generate predictions (``output``) and the loss (``mse``) by executing a forward pass through the network.
* Calculate gradients by making a backward pass through the network (``mse.backward()``).
* Update the model parameters by invoking our SGD optimizer (``trainer.step``).
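
For comparison, the three steps above can be sketched in plain Python with hand-derived gradients. This is only an illustration of the ritual, not the ``gluon`` code. It uses a smaller dataset (1000 points), and names like ``true_w`` and ``grad_w`` are our own.

```python
import random

rng = random.Random(0)
true_w, true_b = [2.0, -3.4], 4.2

# Synthetic data in the spirit of the dataset above, but smaller.
features = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
labels = [true_w[0] * x[0] + true_w[1] * x[1] + true_b + 0.01 * rng.gauss(0, 1)
          for x in features]

w, b = [0.0, 0.0], 0.0
learning_rate, batch_size = 0.1, 4

for epoch in range(2):
    order = list(range(len(features)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for i in batch:
            x, target = features[i], labels[i]
            # 1. Forward pass: prediction error (the loss is 0.5 * error ** 2).
            error = w[0] * x[0] + w[1] * x[1] + b - target
            # 2. Backward pass: gradients of the loss, summed over the batch.
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        # 3. Update: one SGD step on the batch-averaged gradient.
        w = [w[j] - learning_rate * grad_w[j] / len(batch) for j in range(2)]
        b -= learning_rate * grad_b / len(batch)
```

After two passes, ``w`` and ``b`` should sit close to the true values 2, -3.4 and 4.2. With ``gluon``, steps 1 to 3 shrink to the forward call, ``backward()``, and ``trainer.step``.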

In [10]:
epochs = 2

for e in range(epochs):
    moving_loss = 0.
    train_data.reset()
    for i, batch in enumerate(train_data):
        data = batch.data[0].as_in_context(ctx)
        label = batch.label[0].as_in_context(ctx).reshape((-1,1))
        with autograd.record():
            output = net(data)
            mse = loss(output, label)
        mse.backward()
        trainer.step(data.shape[0])
        
        #  Keep a moving average of the losses
        moving_loss = .99 * moving_loss + .01 * nd.sum(mse).asscalar()
    
        if i % 500 == 0:
            print("Epoch %s, batch %s. Moving avg of loss: %s" % (e, i, moving_loss))    

Epoch 0, batch 0. Moving avg of loss: 0.550449523926
Epoch 0, batch 500. Moving avg of loss: 0.0248766543482
Epoch 0, batch 1000. Moving avg of loss: 0.000377714028904
Epoch 0, batch 1500. Moving avg of loss: 0.000196770685601
Epoch 0, batch 2000. Moving avg of loss: 0.000200642416255
Epoch 1, batch 0. Moving avg of loss: 1.96520282771e-06
Epoch 1, batch 500. Moving avg of loss: 0.000204079477443
Epoch 1, batch 1000. Moving avg of loss: 0.000215603294103
Epoch 1, batch 1500. Moving avg of loss: 0.000195705539767
Epoch 1, batch 2000. Moving avg of loss: 0.000200635417732
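
One detail in this log deserves a note: the value at ``Epoch 1, batch 0`` is about a hundred times smaller than its neighbours. That is an artifact of resetting ``moving_loss`` to zero at the start of every epoch, so the first reading is only about 1% of the batch loss. The sketch below replays the same update rule on a constant loss; the helper ``moving_average`` is ours.

```python
def moving_average(values, decay=0.99, start=0.0):
    """Exponential moving average with the same update rule as the loop above."""
    average = start
    history = []
    for value in values:
        average = decay * average + (1.0 - decay) * value
        history.append(average)
    return history

# A constant per-batch loss of 2e-4, roughly the plateau seen in the log.
history = moving_average([2e-4] * 1000)
first, last = history[0], history[-1]
```

Here ``first`` comes out near 2e-6, while ``last`` has climbed back to roughly 2e-4, which matches the pattern in the printed losses.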


## Conclusion 

As you can see, even for a simple example like linear regression, ``gluon`` can help you to write quick, clean code. Next, we'll repeat this exercise for multilayer perceptrons, extending these lessons to deep neural networks and (comparatively) real datasets. 

For whinges or inquiries, [open an issue on GitHub](https://github.com/zackchase/mxnet-the-straight-dope).