MXNet’s autograd package expedites this work by automatically calculating derivatives. 

And while most other libraries require that you compile a symbolic graph before taking automatic derivatives, `mxnet.autograd`, like PyTorch, allows you to take derivatives while writing ordinary imperative code.

Every time you make a pass through your model, autograd builds a graph on the fly, through which it can immediately backpropagate gradients.
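To make "builds a graph on the fly" concrete, here is a toy sketch in plain Python. It is not how MXNet implements autograd; the `Var` class and its fields are hypothetical names invented for this illustration, and it supports only scalar multiplication. Each arithmetic operation creates a node that remembers its inputs, and `backward()` walks those links in reverse to accumulate gradients.

```python
class Var:
    """Hypothetical toy scalar that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_var, derivative of this node w.r.t. that parent).
        self.parents = parents

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        # d(a*b)/da = b and d(a*b)/db = a
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def backward(self, head_grad=1.0):
        # Collect nodes so that every node appears after all of its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = head_grad
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local_derivative in node.parents:
                parent.grad += node.grad * local_derivative


a = Var(3.0)
b = a * 2   # a graph node is created as this line executes
c = b * a   # a second node, linked to both b and a
c.backward()
print(a.grad)  # derivative of 2 * a**2 at a = 3
```

There is no separate compile step in this sketch: the graph exists only because the multiplication lines actually ran, which is the property the paragraph above describes.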

In [8]:
import mxnet as mx
from mxnet import nd
from mxnet import autograd

In [10]:
mx.random.seed(1234)

As a toy example, let’s say that we are interested in differentiating a function `f = 2 * (x ** 2)` with respect to the parameter `x`. We can start by assigning an initial value to `x`.

Once we compute the gradient of `f` with respect to `x`, we'll need a place to store it. In MXNet, we can tell an NDArray that we plan to store a gradient by invoking its **attach_grad()** method.

In [24]:
x = nd.array([[1,2],[3,4]])
x.attach_grad()

In [25]:
# Record the computation: every operation inside this block is added to the graph
with autograd.record():
    y = x * 2
    z = y * x  # z = 2 * x**2, i.e. the function f above

In [26]:
# Backpropagate: compute dz/dx and store the result in x.grad
z.backward()

In [27]:
print(x.grad)

print('')
print(x)
print(y)
print(z)


[[  4.   8.]
 [ 12.  16.]]
<NDArray 2x2 @cpu(0)>


[[ 1.  2.]
 [ 3.  4.]]
<NDArray 2x2 @cpu(0)>

[[ 2.  4.]
 [ 6.  8.]]
<NDArray 2x2 @cpu(0)>

[[  2.   8.]
 [ 18.  32.]]
<NDArray 2x2 @cpu(0)>
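The printed `x.grad` matches the analytic derivative: since `z = 2 * x**2`, we have `dz/dx = 4 * x`, which gives `[[4, 8], [12, 16]]` for the `x` above. As a sanity check that does not depend on MXNet, the sketch below compares that formula with a central finite difference. The helper names `f` and `central_difference` and the step size `eps` are assumptions made for this illustration.

```python
def f(x):
    """Same computation as the recorded block: z = (x * 2) * x."""
    y = x * 2
    return y * x


def central_difference(func, x, eps=1e-6):
    """Approximate the derivative of func at scalar x numerically."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


xs = [[1.0, 2.0], [3.0, 4.0]]

# Analytic gradient, applied elementwise: dz/dx = 4 * x
analytic = [[4 * v for v in row] for row in xs]

# Numerical gradient, one element at a time
numeric = [[central_difference(f, v) for v in row] for row in xs]

for analytic_row, numeric_row in zip(analytic, numeric):
    print(analytic_row, [round(n, 4) for n in numeric_row])
```

Comparing against a numerical derivative like this is a common way to confirm that a gradient computation is wired up correctly.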


## Using head gradients

In [22]:
with autograd.record():
    y = x*2
    z = y*x

# Head gradient: one weight per element of z, multiplied into dz/dx during backprop
head_gradient = nd.array([[10, 1.], [0.1, 0.01]])
z.backward(head_gradient)
print(x.grad)


[[ 40.           8.        ]
 [  1.20000005   0.16      ]]
<NDArray 2x2 @cpu(0)>
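A head gradient is the gradient of some downstream quantity with respect to `z`. It is useful when `z` is an intermediate result rather than the final output. By the chain rule, passing it to `backward` scales each element of `dz/dx = 4 * x` by the matching element of the head gradient. The plain-Python sketch below reproduces the printed result by hand; the variable names are chosen for this illustration only.

```python
x_vals = [[1.0, 2.0], [3.0, 4.0]]
head = [[10.0, 1.0], [0.1, 0.01]]

# Chain rule, elementwise: x.grad = head * dz/dx = head * 4 * x
expected_grad = [
    [h * 4 * v for h, v in zip(head_row, x_row)]
    for head_row, x_row in zip(head, x_vals)
]

for row in expected_grad:
    print([round(g, 6) for g in row])
```

The `1.20000005` in the MXNet output is the same value as `1.2` here; the extra digits appear because NDArrays use 32-bit floats by default.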
