# Linear regression with ``gluon``

Now that we've implemented a whole neural network from scratch, using nothing but ``mx.ndarray`` and ``mxnet.autograd``, let's see how we can make the same model while doing a lot less work.

Again, let's import some packages, this time adding ``mxnet.gluon`` to the list of dependencies.

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon
import numpy as np

## Set the context

And let's also set a context where we'll do most of the computation.

In [2]:
ctx = mx.cpu()

## Build the dataset

Again we'll look at the problem of linear regression and stick with synthetic data. 

In [3]:
X = np.random.randn(10000,2)
Y = 2* X[:,0] - 3.4 * X[:,1] + 4.2 + .01 * np.random.normal(size=10000)
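
To make the generating rule explicit, here is a minimal sketch in plain Python (no MXNet or NumPy needed). The helper name `noiseless_target` and the fixed seed are assumptions made for illustration; only the coefficients 2, -3.4, 4.2 and the noise scale 0.01 come from the cell above.

```python
import random

def noiseless_target(x1, x2):
    """The linear rule behind the synthetic labels: 2*x1 - 3.4*x2 + 4.2."""
    return 2 * x1 - 3.4 * x2 + 4.2

rng = random.Random(0)  # fixed seed, chosen only so the sketch is repeatable
samples = []
for _ in range(5):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # Label = linear rule + small Gaussian noise (standard deviation 0.01).
    y = noiseless_target(x1, x2) + 0.01 * rng.gauss(0, 1)
    samples.append((x1, x2, y))

# The residuals are exactly the scaled noise terms.
residuals = [y - noiseless_target(x1, x2) for x1, x2, y in samples]
```

A model that learns weights close to (2, -3.4) and a bias close to 4.2 has therefore recovered everything except the noise.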

## Load the data iterator

We'll stick with the ``NDArrayIter`` for handling our data batching.

In [4]:
batch_size = 4
train_data = mx.io.NDArrayIter(X, Y, batch_size, shuffle=True)
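
Conceptually, a batching iterator shuffles the example indices and hands out consecutive slices. The sketch below is a plain-Python illustration of that idea, not the implementation of ``NDArrayIter``; the function name `iterate_batches` is hypothetical, and yielding a shorter final batch is an assumption of this sketch (the real iterator has its own options for that case).

```python
import random

def iterate_batches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (feature_batch, label_batch) pairs that cover the dataset once."""
    indices = list(range(len(features)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        yield [features[i] for i in chunk], [labels[i] for i in chunk]

# Ten toy examples with two features each.
features = [[float(i), float(-i)] for i in range(10)]
labels = [float(i) for i in range(10)]

batches = list(iterate_batches(features, labels, batch_size=4, seed=0))
batch_sizes = [len(feature_batch) for feature_batch, _ in batches]
```

With the 10,000 examples and batch size of 4 used above, the data divide evenly into 2,500 batches per epoch, so the short-batch case never arises here.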

## Define the model

Before, we had to individually allocate our parameters and then compose them as a model. While it's good to know how to do things from scratch, with ``gluon`` we can usually just compose a network from predefined standard layers.

In [5]:
net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(1))
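
A single dense layer with one output unit and no activation computes a weighted sum of the inputs plus a bias, which is exactly linear regression. The following plain-Python sketch shows that computation; the function name `dense_forward` and the hand-picked parameter values are assumptions for illustration only.

```python
def dense_forward(batch, weights, bias):
    """Return one prediction per row: dot(weights, row) + bias."""
    return [sum(w * x for w, x in zip(weights, row)) + bias for row in batch]

# Using the true coefficients of the synthetic data as stand-in parameters.
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
predictions = dense_forward(batch, weights=[2.0, -3.4], bias=4.2)
```

In the real model the weights and bias start out at their initial values and are learned during training.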

## Initialize parameters

Before we can do anything with this model we'll have to initialize the weights. *MXNet* provides a variety of common initializers in ``mxnet.init``. 

In [6]:
net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)
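
As a rough illustration of what this initializer does, the sketch below assumes the commonly documented Xavier rule with uniform sampling and averaged fan-in/fan-out: weights are drawn from the range [-c, c] with c = sqrt(magnitude / ((fan_in + fan_out) / 2)). The function name `xavier_uniform_bound` is hypothetical, and the zero bias is an assumption about the layer's default rather than something stated above.

```python
import math
import random

def xavier_uniform_bound(fan_in, fan_out, magnitude):
    """Half-width c of the uniform range [-c, c] under the assumed rule."""
    factor = (fan_in + fan_out) / 2.0
    return math.sqrt(magnitude / factor)

# Our layer maps 2 input features to 1 output unit.
bound = xavier_uniform_bound(fan_in=2, fan_out=1, magnitude=2.24)

rng = random.Random(0)  # fixed seed, only for repeatability of the sketch
weights = [rng.uniform(-bound, bound) for _ in range(2)]
bias = 0.0  # assumed default: biases start at zero
```

For a model this small the exact starting point matters little; the choice of initializer becomes more important for the deeper networks that come later.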

## Define loss

Instead of writing our own loss function, we're just going to call down to ``gluon.loss.L2Loss``.

In [7]:
loss = gluon.loss.L2Loss()
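
For intuition, the squared-error loss can be sketched in plain Python as below. The factor of one half follows how the ``L2Loss`` documentation describes it, to the best of my knowledge; treat it as an assumption and check the documentation for your version. The function name `l2_loss` is hypothetical.

```python
def l2_loss(predictions, labels):
    """Per-example squared error with a factor of one half."""
    return [0.5 * (p - y) ** 2 for p, y in zip(predictions, labels)]

# One loss value per example, not a single scalar for the whole batch.
losses = l2_loss([3.0, 1.0, 0.0], [1.0, 1.0, 2.0])
```

Returning one value per example is what lets the training loop later pick out or average individual losses.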

## Optimizer

Instead of writing gradient descent from scratch every time, we can instantiate a ``gluon.Trainer``, passing it the model's parameters, the name of an optimizer, and a dictionary of optimizer hyperparameters.

In [8]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
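
The update that plain SGD performs can be sketched in a few lines of Python. This is an illustration of the rule, not the library's code: the function name `sgd_step` is hypothetical, and it assumes the gradients have been summed over the batch and are then normalized by the batch size, which is the role the batch-size argument plays when the trainer takes a step.

```python
def sgd_step(params, summed_grads, learning_rate, batch_size):
    """Return params moved against the batch-averaged gradient."""
    scale = learning_rate / batch_size
    return [p - scale * g for p, g in zip(params, summed_grads)]

# Two parameters, gradients summed over a batch of 4 examples.
new_params = sgd_step([1.0, -2.0], [4.0, 8.0], learning_rate=0.1, batch_size=4)
```

Dividing by the batch size keeps the effective step size comparable when the batch size changes.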

## Execute training loop

Now that we have all the pieces, all we have to do is wire them together by writing a training loop. First we'll define ``epochs``, the number of passes to make over the dataset. Then for each pass, we'll iterate through ``train_data``, grabbing batches of examples and their corresponding labels. 

For each batch, we'll go through the following ritual:
* Generate predictions (``output``) and compute the loss (``mse``) by executing a forward pass through the network.
* Calculate gradients by making a backward pass through the network (``mse.backward()``).
* Update the model parameters by invoking our SGD optimizer (``trainer.step``).
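
To see what these three steps amount to without any framework, here is a minimal plain-Python sketch on a smaller synthetic dataset. Everything in it is an assumption made for illustration: the dataset size of 1,000, the fixed seed, the zero starting parameters, and the hand-derived gradients of a half squared-error loss. It is not what ``gluon`` does internally, only the same arithmetic spelled out.

```python
import random

rng = random.Random(0)  # fixed seed, only for repeatability
true_w, true_b = [2.0, -3.4], 4.2

# Synthetic data following the same linear rule as the dataset above.
features = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
labels = [true_w[0] * x[0] + true_w[1] * x[1] + true_b + 0.01 * rng.gauss(0, 1)
          for x in features]

w, b = [0.0, 0.0], 0.0
learning_rate, batch_size = 0.1, 4

for epoch in range(2):
    order = list(range(len(features)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        grad_w, grad_b = [0.0, 0.0], 0.0
        for i in batch:
            x, y = features[i], labels[i]
            # Step 1: forward pass gives the prediction.
            prediction = w[0] * x[0] + w[1] * x[1] + b
            # Step 2: backward pass, gradient of 0.5 * (prediction - y)**2.
            error = prediction - y
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        # Step 3: SGD update with the batch-averaged gradient.
        scale = learning_rate / len(batch)
        w = [w[0] - scale * grad_w[0], w[1] - scale * grad_w[1]]
        b -= scale * grad_b
```

After the two passes, `w` and `b` should sit close to the true coefficients. The ``gluon`` version below replaces the hand-written gradient and update code with ``autograd`` and the trainer.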

In [9]:
epochs = 2
moving_loss = 0.

for e in range(epochs):
    train_data.reset()
    for i, batch in enumerate(train_data):
        data = batch.data[0].as_in_context(ctx)
        label = batch.label[0].as_in_context(ctx).reshape((-1,1))
        with autograd.record():
            output = net(data)
            mse = loss(output, label)
        mse.backward()
        trainer.step(data.shape[0])
        
        ##########################
        #  Keep a moving average of the losses
        #  (mse.asnumpy()[0] is the loss of the first example in the batch)
        ##########################
        if i == 0:
            moving_loss = np.mean(mse.asnumpy()[0])
        else:
            moving_loss = .99 * moving_loss + .01 * np.mean(mse.asnumpy()[0])
            
        if i % 500 == 0:
            print("Epoch %s, batch %s. Moving avg of loss: %s" % (e, i, moving_loss))    

Epoch 0, batch 0. Moving avg of loss: 4.36152
Epoch 0, batch 500. Moving avg of loss: 0.032082290376
Epoch 0, batch 1000. Moving avg of loss: 0.00026563421502
Epoch 0, batch 1500. Moving avg of loss: 4.25103789282e-05
Epoch 0, batch 2000. Moving avg of loss: 4.52190411882e-05
Epoch 1, batch 0. Moving avg of loss: 9.47882e-07
Epoch 1, batch 500. Moving avg of loss: 4.71480747516e-05
Epoch 1, batch 1000. Moving avg of loss: 5.51478557681e-05
Epoch 1, batch 1500. Moving avg of loss: 4.1127381874e-05
Epoch 1, batch 2000. Moving avg of loss: 4.52099542295e-05
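
The logged numbers are an exponential moving average rather than raw batch losses. A minimal sketch of that bookkeeping, with the hypothetical helper name `update_moving_average` and made-up loss values, looks like this:

```python
def update_moving_average(previous, new_value, step, decay=0.99):
    """Seed the average at step 0, then blend in 1% of each new value."""
    if step == 0:
        return new_value
    return decay * previous + (1 - decay) * new_value

moving = 0.0
for step, batch_loss in enumerate([4.0, 2.0, 1.0]):
    moving = update_moving_average(moving, batch_loss, step)
```

Because the training loop seeds the average whenever ``i == 0``, it restarts from a single raw loss value at the beginning of every epoch. That is why the "Epoch 1, batch 0" line can differ noticeably from the lines around it.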


## Conclusion 

As you can see, even for a simple example like linear regression, ``gluon`` can help you to write quick, clean code. Next, we'll repeat this exercise for multilayer perceptrons, extending these lessons to deep neural networks and (comparatively) real datasets.