# Chainer Tutorial

## Core Concept

### Define-by-Run

In Chainer, the network is defined on the fly by the actual forward computation. More precisely, Chainer stores the history of the computation rather than the programming logic itself. This lets you use the full power of Python's control flow: for example, Chainer needs no special constructs to put conditionals and loops into a network definition. This Define-by-Run scheme is the core concept of Chainer.
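To make the idea concrete, here is a toy, pure-Python sketch of define-by-run reverse-mode differentiation. The `Scalar` class and the `forward` function are hypothetical illustrations, not Chainer's internals. Each arithmetic operation records its inputs while it executes, so the graph that `backward()` walks is exactly what the Python loop and `if` did on that call.

```python
# Hypothetical toy autograd, NOT Chainer's implementation: every operation
# records its inputs and local derivatives at the moment it executes.
class Scalar:
    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # (input, d(output)/d(input)) pairs captured during the forward pass
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Scalar) else Scalar(other)
        return Scalar(self.value * other.value,
                      ((self, other.value), (other, self.value)))

    def backward(self):
        # Order the recorded graph so each node comes after its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        # Reverse-mode chain rule, starting from d(self)/d(self) = 1
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += local_grad * node.grad


def forward(x, n_steps):
    # Plain Python loop and if: the graph is whatever actually runs
    h = x
    for _ in range(n_steps):
        if h.value > 0:
            h = h * x
        else:
            h = h + x
    return h


x = Scalar(3.0)
y = forward(x, 2)   # both iterations take the h * x branch, so y = x**3
y.backward()
print(y.value, x.grad)  # 27.0 27.0
```

Calling `forward` with a negative input takes the other branch and records a different graph (here `y = 3 * x`), with no change to the code.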

### Variable

A `Variable` is an array wrapper that keeps track of the computation that produced it.
Every variable holds a data array of type either <b>numpy.ndarray</b> (on the CPU) or <b>cupy.ndarray</b> (on the GPU).

```python
import numpy as np
import chainer
from chainer import cuda, Function, gradient_check, Variable, optimizers, serializers, utils
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
```

```python
x_data = np.array([5], dtype=np.float32)
x = Variable(x_data)
x.data
```

```
array([ 5.], dtype=float32)
```

```python
y = x**2 - 2 * x + 1
y.data
```

```
array([ 16.], dtype=float32)
```

```python
y.backward()
x.grad
```

```
array([ 8.], dtype=float32)
```
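Working the derivative by hand confirms both printed values, since the function is a perfect square:

$$
y = x^2 - 2x + 1 = (x - 1)^2, \qquad \frac{dy}{dx} = 2x - 2 = 2(x - 1)
$$

At $x = 5$ this gives $y = (5 - 1)^2 = 16$ and $dy/dx = 2(5 - 1) = 8$, matching `y.data` and `x.grad`.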

```python
x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
y = x**2 - 2*x + 1
y.grad = np.ones((2, 3), dtype=np.float32)
y.backward()
x.grad
```

```
array([[  0.,   2.,   4.],
       [  6.,   8.,  10.]], dtype=float32)
```
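Unlike the single-element case, `y` here holds six elements, so `backward()` needs an initial gradient supplied by hand. Assigning ones to `y.grad` makes `x.grad` the elementwise derivative 2x - 2, because each element of `y` depends only on the matching element of `x`. As a check that does not rely on Chainer, a central finite difference in plain NumPy reproduces the same array. `central_diff` is a hypothetical helper name, and float64 is used to keep rounding error small.

```python
import numpy as np


def f(x):
    # Same function as in the cell above, applied elementwise
    return x**2 - 2*x + 1


def central_diff(func, x, eps=1e-3):
    # Elementwise estimate (f(x + eps) - f(x - eps)) / (2 * eps);
    # valid here because f acts on each element independently
    return (func(x + eps) - func(x - eps)) / (2 * eps)


x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
numeric = central_diff(f, x)
analytic = 2*x - 2
print(np.allclose(numeric, analytic))  # True
```

For a quadratic, the central difference is exact up to floating-point rounding, which is why the two arrays agree so closely. Chainer ships its own numerical gradient utilities in `chainer.gradient_check` (imported at the top of this notebook) for this kind of test.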

### Links

To write neural networks, we have to combine functions with parameters and optimize those parameters. Links serve this purpose: a Link is an object that holds parameters (i.e. optimization targets).

The most fundamental links behave like regular functions in which some of the arguments are replaced by the link's own parameters. Higher-level links are introduced later, but for now, think of links simply as functions with parameters.
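As a mental model, a link can be pictured as a callable object that owns its parameter arrays. The `NumpyLinear` class below is a hypothetical NumPy sketch, not Chainer's `L.Linear`. It follows the same weight layout as the `W` printed below, (out_size, in_size), with a zero-initialized bias; the scaled-normal initialization is an arbitrary choice for the sketch.

```python
import numpy as np


def linear(x, W, b):
    # Plain function: the caller supplies every argument, W and b included
    return x @ W.T + b


class NumpyLinear:
    """Hypothetical link-like object, not chainer.links.Linear.

    It owns W and b, so a caller only passes the input x.
    """

    def __init__(self, in_size, out_size, seed=0):
        rng = np.random.default_rng(seed)
        # Weight layout (out_size, in_size); the scaled-normal
        # initialization is an assumption of this sketch
        self.W = rng.normal(0.0, np.sqrt(1.0 / in_size), (out_size, in_size))
        self.b = np.zeros(out_size)

    def __call__(self, x):
        # Same computation as linear(), with the parameters filled in
        return linear(x, self.W, self.b)


f = NumpyLinear(3, 2)
x = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(f(x).shape)  # (2, 2)
```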

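```python
f = L.Linear(3, 2)  # 3 inputs, 2 outputs
print(f.W.data)     # W has shape (2, 3) and is initialized randomly
print(f.b.data)     # b is initialized to zeros
```

```
[[ 0.33271191 -0.38989511  0.05334002]
 [ 0.44967735  0.26993233 -0.8891862 ]]
[ 0.  0.]
```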


##### An instance of the Linear link acts like a usual function:

```python
x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
y = f(x)
y.data
```

```
array([[-0.28705826, -1.67801654],
       [-0.2985878 , -2.18674636]], dtype=float32)
```
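Assuming the link computes y = x W^T + b, which is consistent with `W` having shape (2, 3) for 3 inputs and 2 outputs, the output above can be reproduced in NumPy from the printed `W`. The results agree with `y.data` up to float32 rounding.

```python
import numpy as np

# W copied from the printed output above; b is all zeros
W = np.array([[0.33271191, -0.38989511, 0.05334002],
              [0.44967735, 0.26993233, -0.8891862]], dtype=np.float32)
b = np.zeros(2, dtype=np.float32)
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)

# Each row of x is one sample; every row is mapped to x_row W^T + b
y = x @ W.T + b
print(y)
```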

```python
f.cleargrads()  # gradients accumulate, so reset them (supersedes the deprecated zerograds())
y.grad = np.ones((2, 2), dtype=np.float32)
y.backward()
print(f.W.grad)
print(f.b.grad)
```

```
[[ 5.  7.  9.]
 [ 5.  7.  9.]]
[ 2.  2.]
```
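These values follow from the chain rule. For y = x W^T + b and an upstream gradient `gy` (here, the ones assigned to `y.grad`), the weight gradient is gy^T x and the bias gradient is `gy` summed over the batch axis; neither depends on the current value of `W`. A plain NumPy check:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
gy = np.ones((2, 2), dtype=np.float32)  # same initial gradient as y.grad above

# For y = x W^T + b:
#   dL/dW[i, j] = sum_n gy[n, i] * x[n, j]   ->  gy^T x
#   dL/db[i]    = sum_n gy[n, i]
gW = gy.T @ x
gb = gy.sum(axis=0)
print(gW)
print(gb)
```

Both rows of `W.grad` are identical because every output unit receives the same upstream gradient of 1, and `b.grad` counts the two samples in the batch.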


### Chain

Most neural network architectures contain multiple links. For example, a multi-layer perceptron consists of multiple linear layers. We can write complex procedures with parameters by combining multiple links, either in a plain function or in a class that holds the links as attributes:

```python
l1 = L.Linear(4, 3)
l2 = L.Linear(3, 2)
def my_forward(x):
    h = l1(x)
    return l2(h)
```

```python
class MyProc(object):
    def __init__(self):
        self.l1 = L.Linear(4, 3)
        self.l2 = L.Linear(3, 2)

    def forward(self, x):
        h = self.l1(x)
        return self.l2(h)
```

This works, but a practical model also needs parameter management, CPU/GPU migration, and robust, flexible save/load features. These features are <b>all supported by the Chain class in Chainer</b>, so it is better to write the model as a subclass of Chain:

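```python
class MyChain(Chain):
    def __init__(self):
        super(MyChain, self).__init__()
        # Links assigned inside init_scope() (Chainer v2 and later)
        # are registered as children of this chain
        with self.init_scope():
            self.l1 = L.Linear(4, 3)
            self.l2 = L.Linear(3, 2)

    def __call__(self, x):
        h = self.l1(x)
        return self.l2(h)
```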
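Registering the child links matters because of parameter management: the container must find every parameter of every child so that an optimizer or serializer can visit them. The sketch below is a hypothetical, NumPy-only illustration of that recursive walk, not Chainer's implementation. `MiniLink` and `MiniChain` are invented names, and the path-style parameter names are only loosely modeled on those reported by Chainer's `namedparams()`.

```python
import numpy as np


class MiniLink:
    """Hypothetical stand-in for a link that owns named parameter arrays."""

    def __init__(self, **params):
        self._params = params

    def namedparams(self, prefix=""):
        for name, value in self._params.items():
            yield prefix + "/" + name, value


class MiniChain(MiniLink):
    """Hypothetical container that registers child links and walks them."""

    def __init__(self, **children):
        super().__init__()
        self._children = children

    def namedparams(self, prefix=""):
        # The chain's own parameters (none here), then each child's, recursively
        yield from super().namedparams(prefix)
        for name, child in self._children.items():
            yield from child.namedparams(prefix + "/" + name)


# Same shapes as MyChain: Linear(4, 3) followed by Linear(3, 2)
model = MiniChain(
    l1=MiniLink(W=np.zeros((3, 4)), b=np.zeros(3)),
    l2=MiniLink(W=np.zeros((2, 3)), b=np.zeros(2)),
)
for path, array in model.namedparams():
    print(path, array.shape)
```

Because the walk is recursive, a chain nested inside another chain needs no extra bookkeeping.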

### Optimizer

To find good parameter values, we optimize them with the Optimizer class, which runs a numerical optimization algorithm on a given link.
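The algorithm behind `optimizers.SGD` is plain gradient descent: each parameter moves against its gradient, $\theta \leftarrow \theta - \eta \, \partial L / \partial \theta$ (Chainer's SGD uses a learning rate of 0.01 by default). The NumPy sketch below minimizes the function from the Variable section, starting from the same x = 5; the learning rate of 0.1 and the helper name `sgd_step` are assumptions chosen so that it converges in a few dozen steps.

```python
import numpy as np


def sgd_step(param, grad, lr):
    # Vanilla SGD: move the parameter against its gradient
    return param - lr * grad


x = np.array([5.0])      # same starting point as in the Variable section
lr = 0.1                 # assumed learning rate for this sketch
for step in range(50):
    grad = 2 * x - 2     # dy/dx for y = x**2 - 2*x + 1
    x = sgd_step(x, grad, lr)
print(x)                 # approaches the minimum at x = 1
```

With this learning rate, each step shrinks the distance to the minimum by a factor of 0.8, since x - 1 becomes (1 - 2 * 0.1)(x - 1).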

```python
model = MyChain()
optimizer = optimizers.SGD()
optimizer.setup(model)
```

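The setup() method prepares the optimizer for the given link. To run one optimization step, clear the gradients, compute new ones by calling backward() on a loss, and then call update():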

```python
model.cleargrads()
# ... compute a loss with model and call loss.backward() to fill the gradients ...
optimizer.update()
```

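Alternatively, pass a loss function and its arguments to update(). The optimizer then takes care of clearing the gradients and calling backward() on the returned loss before it updates the parameters. Here a mean-squared-error loss serves as an example:

```python
def lossfun(x, t):
    # Example loss: mean squared error between the model output and a target
    return F.mean_squared_error(model(x), t)

x = np.random.uniform(-1, 1, (2, 4)).astype(np.float32)
t = np.zeros((2, 2), dtype=np.float32)
optimizer.update(lossfun, x, t)
```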
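The control flow behind the two calling styles can be sketched without Chainer. In the hypothetical NumPy sketch below, `MiniSGD`, its `grads` dictionary, `mse`, and `lossfun` are all invented for illustration, and the model is a single linear map trained toward a zero target. Because NumPy has no autograd, `lossfun` writes its gradients by hand, which is the job `loss.backward()` does in Chainer.

```python
import numpy as np


class MiniSGD:
    """Hypothetical NumPy optimizer mimicking the two update() call styles."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def setup(self, params):
        # params: dict of name -> NumPy array, updated in place
        self.params = params
        self.grads = {name: np.zeros_like(p) for name, p in params.items()}

    def update(self, lossfun=None, *args):
        if lossfun is not None:
            # Second style: reset the gradients, then let lossfun fill them
            for g in self.grads.values():
                g.fill(0.0)
            lossfun(*args)
        # Both styles finish with the same plain SGD step
        for name, p in self.params.items():
            p -= self.lr * self.grads[name]


rng = np.random.default_rng(0)
params = {"W": rng.normal(size=(2, 4)), "b": np.zeros(2)}
optimizer = MiniSGD(lr=0.1)
optimizer.setup(params)


def mse(x, t):
    # Mean squared error of the linear model y = x W^T + b
    diff = x @ params["W"].T + params["b"] - t
    return np.mean(diff ** 2), diff


def lossfun(x, t):
    # Gradients written out by hand (no autograd in this sketch)
    loss, diff = mse(x, t)
    gy = 2 * diff / diff.size
    optimizer.grads["W"] += gy.T @ x
    optimizer.grads["b"] += gy.sum(axis=0)
    return loss


x = rng.uniform(-1, 1, (8, 4))
t = np.zeros((8, 2))
initial_loss, _ = mse(x, t)
for _ in range(100):
    optimizer.update(lossfun, x, t)
final_loss, _ = mse(x, t)
print(initial_loss, final_loss)
```

The loss shrinks over the 100 calls because the target is reachable (W = 0, b = 0) and the learning rate is small enough for this convex problem.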