# Linear regression with ``gluon``

Now that we've implemented a whole neural network from scratch, using nothing but ``mx.ndarray`` and ``mxnet.autograd``, let's see how we can make the same model while doing a lot less work. 

Again, let's import some packages, this time adding ``mxnet.gluon`` to the list of dependencies.

In [1]:
from __future__ import print_function
import mxnet as mx
from mxnet import nd, autograd
from mxnet import gluon

## Set the context

We'll also want to set a context to tell gluon where to do most of the computation.

In [2]:
ctx = mx.cpu()

## Build the dataset

Again we'll look at the problem of linear regression and stick with the same synthetic data. 

In [3]:
num_inputs = 2
num_outputs = 1
num_examples = 10000

X = nd.random_normal(shape=(num_examples, num_inputs))
y = 2* X[:,0] - 3.4 * X[:,1] + 4.2 + .01 * nd.random_normal(shape=(num_examples,))

## Load the data iterator

We'll use ``gluon``'s ``DataLoader``, wrapped around an ``ArrayDataset``, to handle our data batching.

In [4]:
batch_size = 4
train_data = mx.gluon.data.DataLoader(mx.gluon.data.ArrayDataset(X, y),
                                      batch_size=batch_size, shuffle=True)
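
To make the batching concrete, here is a minimal plain-Python sketch of what a shuffling data loader does conceptually. It is not how ``gluon`` implements ``DataLoader``; the function name ``minibatches`` and the toy data are hypothetical, chosen only for illustration.

```python
import random

def minibatches(features, labels, batch_size, shuffle=True, seed=None):
    """Yield (feature_batch, label_batch) pairs covering the dataset once."""
    indices = list(range(len(features)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]

# Toy data: 10 examples with 2 features each (hypothetical values).
toy_X = [[float(i), float(-i)] for i in range(10)]
toy_y = [float(i) for i in range(10)]

batches = list(minibatches(toy_X, toy_y, batch_size=4, seed=0))
print(len(batches))                    # 3
print([len(xb) for xb, _ in batches])  # [4, 4, 2]
```

This sketch keeps the short final batch. With the 10000 examples and ``batch_size = 4`` used here, one pass over the data yields exactly 2500 batches, so the question of a partial batch does not arise.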

## Define the model

When we implemented things from scratch, we had to individually allocate parameters and then compose them together as a model. While it's good to know how to do things from scratch, with ``gluon``, we can just compose a network from predefined layers. For a linear model, the appropriate layer is called ``Dense``. It's called a *dense* layer because every node in the input is connected to every node in the subsequent layer. That description seems excessive because we only have one output here. But in most subsequent chapters we'll work with networks that have multiple outputs.
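
As a rough illustration of what a dense layer computes, the sketch below multiplies each input row by a weight matrix of shape (outputs, inputs) and adds a bias. It is written in plain Python rather than with ``gluon``, and ``dense_forward`` is a hypothetical helper name, not part of any library.

```python
def dense_forward(inputs, weight, bias):
    """Compute inputs @ weight.T + bias using lists of lists.

    inputs: (batch, in_units), weight: (units, in_units), bias: (units,)
    """
    return [
        [sum(x_j * w_j for x_j, w_j in zip(x, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for x in inputs
    ]

# The true parameters used to generate the synthetic data above.
weight = [[2.0, -3.4]]   # one output unit, two input units
bias = [4.2]

outputs = dense_forward([[1.0, 1.0], [0.0, 0.0]], weight, bias)
print(outputs)  # roughly [[2.8], [4.2]], up to floating-point rounding
```

Every input feeds into every output, which is all that "dense" means here. With a single output unit, the layer reduces to the familiar linear regression model.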

Unless we're planning to make some wild decisions (and at some point, we will!), the easiest way to throw together a neural network is to rely on ``gluon.nn.Sequential``. Once instantiated, a ``Sequential`` just stores a chain of layers. Presented with data, the ``Sequential`` executes each layer in turn, feeding the output of one layer into the next. We'll delve deeper into these details later, when we actually have more than one layer to work with. For now, let's just instantiate the ``Sequential``.

In [5]:
net = gluon.nn.Sequential()
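
The idea of a container that stores layers and calls them in order can be sketched in a few lines of plain Python. ``TinySequential`` below is a hypothetical stand-in written for illustration; it is not ``gluon``'s implementation and leaves out everything related to parameters.

```python
class TinySequential:
    """Hypothetical container: stores layers and calls them in order."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        self.layers.append(layer)

    def __call__(self, x):
        # Feed the output of each layer into the next one.
        for layer in self.layers:
            x = layer(x)
        return x

chain = TinySequential()
chain.add(lambda x: 2 * x)   # first "layer": double the input
chain.add(lambda x: x + 1)   # second "layer": add one

print(chain(3))  # (2 * 3) + 1 = 7
```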

Recall that in our linear regression example, the number of inputs is 2 and the number of outputs is 1. We can then add a single ``Dense`` layer. The most direct way to do this is to specify the number of inputs and the number of outputs.

In [6]:
with net.name_scope():
    net.add(gluon.nn.Dense(1, in_units=2))

This tells ``gluon`` all that it needs in order to allocate memory for the weights. After that, all we need to do is initialize the weights, instantiate a loss and an optimizer, and we can start training.

## Shape inference

One slick feature that we can take advantage of in ``gluon`` is shape inference on parameters. 
Instead of explicitly declaring the number of inputs to a layer, 
we can simply state the number of outputs. 

In [7]:
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(1))

You might wonder, how can gluon allocate our parameters if it doesn't know what shape they should take? We'll elaborate on this and more of ``gluon``'s internal workings in [our chapter on plumbing](./P03.5-C01-plumbing.ipynb), but here's the short version. In fact, ``gluon`` doesn't allocate our parameters. Instead it defers allocation to the first time we actually make a forward pass through the model with real data. Then, when ``gluon`` sees the shape of our data, it can infer the shapes of all of the parameters.

## Initialize parameters


This is all we need to do to define our network. However, we're not ready to pass it data just yet. If you try calling ``net(nd.array([[0,1]]))``, you'll find the following hideous error message:

``RuntimeError: Parameter dense1_weight has not been initialized. Note that you should initialize parameters and create Trainer with Block.collect_params() instead of Block.params because the later does not include Parameters of nested child Blocks``.

That's because we haven't yet told ``gluon`` what the *initial values* for our parameters should be. Note that the missing *input dimensionality* is not the problem: the dimensions are bound the first time ``net(x)`` is called. This is a common theme in MXNet: work is done only when needed (lazy evaluation), using all the information available at the time the result is needed.

Before we can do anything with this model, we must initialize its parameters. MXNet provides a variety of common initializers in ``mxnet.init``. To keep things consistent with the model we built by hand, we'll initialize each parameter by sampling from a standard normal distribution. Note that we also pass a *context* to ``initialize``. This is how we tell ``gluon`` where to store our model's parameters. Once we start training deep nets, we'll generally want to keep parameters on one or more GPUs.

In [8]:
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=ctx)

## Deferred Initialization

Since ``gluon`` doesn't know the shape of our net's parameters, 
and we haven't even allocated memory for them yet, 
it might seem bizarre that we can initialize them. 
This is where ``gluon`` does a little more magic to make our lives easier.
When we call ``initialize``, ``gluon`` associates each parameter with an initializer.
However, the *actual initialization* is deferred until the shapes have been inferred. 
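
The pattern can be sketched in plain Python. ``LazyDense`` below is a hypothetical toy class, not ``gluon``'s code: ``initialize`` only records how values should be drawn, and the weights are allocated and filled on the first forward pass, once the input width is known. Starting the bias at zero is an assumption of this sketch.

```python
import random

class LazyDense:
    """Hypothetical dense layer that defers allocation until it sees data."""

    def __init__(self, units):
        self.units = units
        self.initializer = None   # set by initialize(), used later
        self.weight = None        # not allocated yet
        self.bias = None

    def initialize(self, initializer):
        # Only record *how* to initialize; the shape is still unknown.
        self.initializer = initializer

    def __call__(self, inputs):
        if self.initializer is None:
            raise RuntimeError("parameters have not been initialized")
        if self.weight is None:
            in_units = len(inputs[0])   # infer the shape from the data
            self.weight = [[self.initializer() for _ in range(in_units)]
                           for _ in range(self.units)]
            self.bias = [0.0] * self.units   # assumption: bias starts at zero
        return [[sum(x * w for x, w in zip(row, w_row)) + b
                 for w_row, b in zip(self.weight, self.bias)]
                for row in inputs]

layer = LazyDense(units=1)
rng = random.Random(0)
layer.initialize(lambda: rng.gauss(0.0, 1.0))
print(layer.weight)                              # None: nothing allocated yet
out = layer([[0.0, 1.0]])                        # first forward pass fixes in_units = 2
print(len(layer.weight), len(layer.weight[0]))   # 1 2
```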



## Define loss

Instead of writing our own loss function, we're just going to call down to ``gluon.loss.L2Loss``.

In [9]:
square_loss = gluon.loss.L2Loss()
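
For intuition, the quantity computed per example is half the squared difference between prediction and label. The plain-Python sketch below illustrates that formula on made-up numbers; ``l2_loss`` is a hypothetical helper, not the ``gluon`` class.

```python
def l2_loss(predictions, labels):
    """Per-example loss 0.5 * (prediction - label)**2."""
    return [0.5 * (p - t) ** 2 for p, t in zip(predictions, labels)]

losses = l2_loss([3.0, 1.0, -2.0], [1.0, 1.0, 0.0])
print(losses)                     # [2.0, 0.0, 2.0]
print(sum(losses) / len(losses))  # mean over the batch
```

Because of the factor of one half, averaging these values over a batch gives half of the mean squared error rather than the mean squared error itself.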

## Optimizer

Instead of writing gradient descent from scratch every time, we can instantiate a ``gluon.Trainer``, passing it a dictionary of parameters, the name of the optimizer (``'sgd'``), and the optimizer's hyperparameters.

In [None]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})
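
As a reminder of what one SGD update does, here is a plain-Python sketch under the assumption that the gradients have been summed over the batch and are normalized by the batch size at update time, which is the role the batch size plays when it is passed to ``trainer.step``. The helper ``sgd_step`` and the numbers are hypothetical.

```python
def sgd_step(params, grads, learning_rate, batch_size):
    """Return updated params: p - learning_rate * g / batch_size."""
    return [p - learning_rate * g / batch_size for p, g in zip(params, grads)]

params = [1.0, -1.0]
summed_grads = [4.0, -8.0]   # hypothetical gradients summed over 4 examples
new_params = sgd_step(params, summed_grads, learning_rate=0.1, batch_size=4)
print(new_params)            # close to [0.9, -0.8]
```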

## Execute training loop

You might have noticed that it was a bit more concise to express our model in ``gluon``. For example, we didn't have to individually allocate parameters, define our loss function, or implement stochastic gradient descent. The benefits of relying on ``gluon``'s abstractions will grow substantially once we start working with much more complex models. But once we have all the basic pieces in place, the training loop itself is quite similar to what we would do if implementing everything from scratch. 

To refresh your memory: for some number of ``epochs``, we'll make a complete pass over the dataset (``train_data``), grabbing one mini-batch of inputs and the corresponding ground-truth labels at a time. 

Then, for each batch, we'll go through the following ritual. So that this process becomes maximally ritualistic, we'll repeat it verbatim:
* Generate predictions (``yhat``) and the loss (``loss``) by executing a forward pass through the network.
* Calculate gradients by making a backwards pass through the network (``loss.backward()``). 
* Update the model parameters by invoking our SGD optimizer.
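
For reference, the three steps above can be written out without any framework. The following plain-Python sketch trains the same kind of linear model with hand-derived gradients. It uses a smaller dataset (1000 examples rather than 10000), and all variable names are chosen for illustration only; it is not the ``gluon`` loop.

```python
import random

rng = random.Random(0)
true_w, true_b = [2.0, -3.4], 4.2

# Synthetic data following the same recipe as above, but smaller.
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(1000)]
y = [true_w[0] * x[0] + true_w[1] * x[1] + true_b + 0.01 * rng.gauss(0, 1)
     for x in X]

w, b = [0.0, 0.0], 0.0
learning_rate, batch_size, epochs = 0.1, 4, 2

for epoch in range(epochs):
    order = list(range(len(X)))
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        # 1. Forward pass: prediction error for each example in the batch.
        errors = [w[0] * X[i][0] + w[1] * X[i][1] + b - y[i] for i in batch]
        # 2. Backward pass: gradient of the batch mean of 0.5 * error**2.
        grad_w0 = sum(e * X[i][0] for e, i in zip(errors, batch)) / len(batch)
        grad_w1 = sum(e * X[i][1] for e, i in zip(errors, batch)) / len(batch)
        grad_b = sum(errors) / len(batch)
        # 3. Update: one SGD step on every parameter.
        w = [w[0] - learning_rate * grad_w0, w[1] - learning_rate * grad_w1]
        b -= learning_rate * grad_b

print(w, b)  # should land close to [2.0, -3.4] and 4.2
```

In the ``gluon`` version, steps 2 and 3 collapse into ``loss.backward()`` and ``trainer.step(batch_size)``, so none of the gradient formulas need to be written by hand.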

In [None]:
epochs = 2
smoothing_constant = .01

for e in range(epochs):
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        with autograd.record():
            output = net(data)
            loss = square_loss(output, label)
        loss.backward()
        trainer.step(batch_size)
        
        ##########################
        #  Keep a moving average of the losses
        ##########################
        curr_loss = nd.mean(loss).asscalar()
        moving_loss = (curr_loss if ((i == 0) and (e == 0)) 
                       else (1 - smoothing_constant) * moving_loss + (smoothing_constant) * curr_loss)
    
    print("Epoch %s. Moving avg of MSE: %s" % (e, moving_loss))       

Epoch 0. Moving avg of MSE: 12.3373
Epoch 0. Moving avg of MSE: 12.2992141533
Epoch 0. Moving avg of MSE: 12.4694043085
...
Epoch 0. Moving avg of MSE: 1.4500345576
...
Epoch 0. Moving avg of MSE: 0.173983416149
...
Epoch 0. Moving avg of MSE: 0.0183632721852
...
Epoch 0. Moving avg of MSE: 0.00174345902734
...
Epoch 0. Moving avg of MSE: 0.000201171644679
...
Epoch 0. Moving avg of MSE: 6.74835396255e-05
...
Epoch 0. Moving avg of MSE: 5.2466733331e-05
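
The moving average reported in the training loop can be studied in isolation. The sketch below applies the same update rule, seeded with the first value, to a made-up sequence of losses; ``moving_average`` is a hypothetical helper written for illustration.

```python
def moving_average(values, smoothing_constant=0.01):
    """Exponentially weighted moving average, seeded with the first value."""
    averaged = []
    current = None
    for value in values:
        if current is None:
            current = value
        else:
            current = ((1 - smoothing_constant) * current
                       + smoothing_constant * value)
        averaged.append(current)
    return averaged

# Hypothetical per-batch losses: one large loss followed by zeros.
losses = [10.0] + [0.0] * 300
smoothed = moving_average(losses)
print(smoothed[0], smoothed[100], smoothed[300])
```

With a smoothing constant of 0.01, the influence of an old loss shrinks by a factor of 0.99 per batch, which is about 0.37 after 100 batches. This is why the reported value lags behind the current loss early in training.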

## Conclusion 

As you can see, even for a simple example like linear regression, ``gluon`` can help you to write quick, clean code. Next, we'll repeat this exercise for multilayer perceptrons, extending these lessons to deep neural networks and (comparatively) real datasets. 

For whinges or inquiries, [open an issue on GitHub](https://github.com/zackchase/mxnet-the-straight-dope).