In [2]:
import numpy as np
import matplotlib.pyplot as plt

# 8-4. Initializers

To avoid vanishing or exploding gradients, we want the activations of each layer to stay on the order of 1, neither much smaller nor much larger.

A careful choice of initialization method helps with this.
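
As a rough illustration of the point above, the sketch below pushes random inputs through a stack of tanh layers and records the activation variance after each layer. The helper name `activation_variances`, the width of 100, the depth of 10, and the two weight scales (0.01 and 1.0) are assumptions chosen for illustration; they do not come from this notebook.

```python
import numpy as np

def activation_variances(scale, n_layers=10, width=100, n_samples=1000, seed=0):
    """Return the variance of the tanh activations after each layer.

    Weights are drawn from N(0, scale^2). All sizes are illustrative.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n_samples, width))  # inputs with unit variance
    variances = []
    for _ in range(n_layers):
        w = rng.standard_normal((width, width)) * scale
        a = np.tanh(a @ w)
        variances.append(a.var())
    return variances

small = activation_variances(scale=0.01)  # weights too small
large = activation_variances(scale=1.0)   # weights too large

print("scale=0.01, variance at last layer:", small[-1])
print("scale=1.0,  variance at last layer:", large[-1])
```

With the small scale the activations collapse toward 0 layer by layer. With the large scale tanh saturates near ±1, where its derivative is close to 0. Either way the gradient signal degrades in a deep network.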

## Xavier Initialization

$$
W \sim U \left[ -\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}} , \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}} \right]
$$

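Here $n_j$ and $n_{j+1}$ are the sizes of the layers that $W$ connects, i.e. the number of input and output units. Works well with tanh activations. Sometimes called Glorot initialization, after the first author, Xavier Glorot.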

X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. Proc. AISTATS, volume 9, pp. 249-256, 2010.

In [47]:
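def xavier_init(n_in, n_out):
    """Sample an (n_in, n_out) weight matrix from the Xavier uniform distribution."""
    # Half-width of the uniform interval: sqrt(6 / (n_in + n_out))
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(low=-limit, high=limit, size=(n_in, n_out))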
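
A quick sanity check on the limit: a uniform distribution on $[-a, a]$ has variance $a^2/3$, so $a = \sqrt{6/(n_{in} + n_{out})}$ gives a weight variance of $2/(n_{in} + n_{out})$. The sketch below repeats `xavier_init` so that it runs on its own and compares the empirical variance with that target. The sizes 300 and 200 and the fixed seed are arbitrary choices for illustration.

```python
import numpy as np

def xavier_init(n_in, n_out):
    limit = np.sqrt(6.0 / (n_in + n_out))
    return np.random.uniform(low=-limit, high=limit, size=(n_in, n_out))

n_in, n_out = 300, 200  # arbitrary layer sizes
np.random.seed(0)       # fixed seed only for reproducibility
w = xavier_init(n_in, n_out)

limit = np.sqrt(6.0 / (n_in + n_out))
target_var = 2.0 / (n_in + n_out)  # equals limit**2 / 3

print("empirical variance:", w.var())
print("target variance:   ", target_var)
```

The two printed values should agree closely, since the matrix holds 60,000 samples.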

#### Naive initialization

In [75]:
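x = np.random.randn(4, 10)        # 4 samples, 10 input features
w = np.random.rand(10, 5) * 0.01  # small weights, uniform on [0, 0.01)
z = np.dot(x, w)                  # pre-activations
a = np.tanh(z)                    # activations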

In [76]:
a

array([[-0.03169883, -0.01401587, -0.03051881, -0.01716881, -0.0068985 ],
       [ 0.00604145, -0.00514469, -0.00155403, -0.01120579, -0.00641372],
       [-0.01578172, -0.00402429, -0.00561204, -0.02732289, -0.01917034],
       [-0.02890823, -0.01879694, -0.00350305, -0.02548757, -0.02527769]])

In [77]:
print(np.mean(x))
print(np.var(x))

-0.213624441131
1.09739694476


In [78]:
print(np.mean(w))
print(np.var(w))

0.00514046097504
8.07739736653e-06


In [79]:
print(np.mean(a))
print(np.var(a))

-0.0146231177692
0.000116552335896


#### Xavier initialization

In [80]:
x = np.random.randn(4, 10)
w = xavier_init(10, 5)
z = np.dot(x, w)
a = np.tanh(z)

In [81]:
a

array([[ 0.90007808,  0.83604129,  0.78359356, -0.93803927, -0.62864109],
       [-0.53958882, -0.89232924, -0.61626704,  0.07091654,  0.5156064 ],
       [ 0.68572553,  0.56603592,  0.80902637, -0.83510455, -0.77763795],
       [-0.00522897, -0.77569961, -0.78265741, -0.39748203, -0.86856651]])

In [82]:
print(np.mean(x))
print(np.var(x))

0.0592525909824
1.13072093473


In [83]:
print(np.mean(w))
print(np.var(w))

-0.0060916606688
0.144268379321


In [84]:
print(np.mean(a))
print(np.var(a))

-0.144510939779
0.479629251386


## He Initialization


K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. IEEE International Conference on Computer Vision (ICCV), 2015.
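
This section gives only the reference. As a minimal sketch, assuming the zero-mean Gaussian form with standard deviation $\sqrt{2 / n_{in}}$ that the cited paper proposes for ReLU layers, an initializer in the style of `xavier_init` could look like the following. The function name `he_init`, the layer sizes, and the fixed seed are hypothetical choices for illustration.

```python
import numpy as np

def he_init(n_in, n_out):
    """Sample an (n_in, n_out) weight matrix from N(0, 2 / n_in)."""
    std = np.sqrt(2.0 / n_in)
    return np.random.randn(n_in, n_out) * std

np.random.seed(0)              # fixed seed only for reproducibility
x = np.random.randn(1000, 100)  # unit-variance inputs
w = he_init(100, 50)
z = np.dot(x, w)
a = np.maximum(z, 0)           # ReLU activation

print(np.var(w))        # close to 2 / 100
print(np.mean(x ** 2))  # mean square of the inputs
print(np.mean(a ** 2))  # mean square of the ReLU outputs
```

The factor 2 compensates for ReLU zeroing roughly half of the pre-activations, so the mean square of the outputs stays close to that of the inputs.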