In [1]:
import numpy as np

## The softmax function

The softmax function allows us to transform a vector of real numbers into a probability distribution. Its formula is:

$$
\mathrm{softmax}(x, i) = \frac{e^{x_i}}{\sum_{j=1}^{n}{e^{x_j}}}
$$
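As a quick illustration, here is a minimal sketch that applies the formula directly to a small vector. The name `softmax_naive` is ours and is for illustration only. The outputs are all positive and sum to 1, which is what makes them a probability distribution.

```python
import numpy as np

def softmax_naive(x):
    # Direct translation of the formula: exponentiate each entry, then normalize.
    e_x = np.exp(x)
    return e_x / np.sum(e_x)

p = softmax_naive(np.array([1.0, 2.0, 3.0]))
print(p)        # every entry is positive; the largest input gets the largest probability
print(p.sum())  # entries sum to 1 (up to floating-point rounding)
```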

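$e^x$ grows very quickly, so large inputs overflow floating-point limits: in double precision, `np.exp(710.)` is already `inf`. Multiplying the numerator and the denominator by $e^{-c}$ for any constant $c$ leaves the fraction unchanged, so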
$$
\mathrm{softmax}(x,i) = \frac{e^{x_i-c}}{\sum_{j=1}^{n}{e^{x_j - c}}}
$$
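We can therefore choose $c = \max_j x_j$, i.e. subtract the maximum value of the vector from each element. Every exponent $x_i - c$ is then at most 0, so no term overflows, and the largest term equals 1, so the denominator is at least 1.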
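A small sketch of the problem and the fix, using inputs chosen (by us) to be large enough to overflow: the naive formula turns `inf / inf` into `nan`, while subtracting the maximum first gives the same result as the softmax of `[0, 1]`. `np.errstate` is only used to silence the expected overflow warnings.

```python
import numpy as np

x = np.array([1000.0, 1001.0])

# Naive version: np.exp(1000.0) overflows to inf, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(x) / np.sum(np.exp(x))

# Shifted version: subtract c = max(x), so the largest exponent is exp(0) = 1.
e_shifted = np.exp(x - np.max(x))
shifted = e_shifted / np.sum(e_shifted)

print(naive)    # [nan nan]
print(shifted)  # identical to the softmax of [0, 1]
```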

In [2]:
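def softmax(x):
    """Row-wise softmax of a 2-D array, or softmax of a 1-D vector."""
    orig_shape = x.shape

    if len(x.shape) > 1:
        # Matrix: subtract each row's maximum for numerical stability
        c = -np.max(x, axis=1, keepdims=True)
        e_x = np.exp(x + c)
        x = e_x / e_x.sum(axis=1, keepdims=True)
    else:
        # Vector: subtract the global maximum for numerical stability
        c = -np.max(x)
        e_x = np.exp(x + c)
        x = e_x / np.sum(e_x)

    assert x.shape == orig_shape
    return x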
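The two branches above differ only in which axis the maximum and the sum run over. A minimal sketch, assuming we always want the softmax over the last axis, collapses them into one code path with `keepdims=True`; `softmax_last_axis` is a hypothetical name, not part of the original notebook. The second row of the example would overflow without the max subtraction.

```python
import numpy as np

def softmax_last_axis(x):
    # Hypothetical variant: softmax along the last axis for arrays of any rank.
    # keepdims=True keeps the reduced axis so the max and sum broadcast back.
    shifted = x - np.max(x, axis=-1, keepdims=True)
    e_x = np.exp(shifted)
    return e_x / np.sum(e_x, axis=-1, keepdims=True)

m = np.array([[1.0, 2.0, 3.0],
              [1000.0, 1000.0, 1000.0]])
out = softmax_last_axis(m)
print(out.sum(axis=1))  # each row sums to 1
print(out[1])           # equal inputs give a uniform distribution
```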

When implementing gradient computations, it is useful to check intermediate results before going off on a tangent. An easy way to do this is to compute the gradient numerically and compare it with the analytical gradient.

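We use a simple centered-difference approximation of the gradient: each element $x_i$ is perturbed by a small step $h$ in both directions while the other elements stay fixed.

$$
\frac{\partial f}{\partial x_i} \approx \frac{f(x_i + h) - f(x_i - h)}{2h}
$$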
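To see why the centered formula is preferred over the one-sided $(f(x+h) - f(x))/h$, here is a small sketch on $f(x) = x^3$ at $x = 2$, where the exact derivative is 12. The test function and the point are our own choices. For this cubic, the forward difference is off by $3xh + h^2$, while the centered one is off by only $h^2$.

```python
import numpy as np

def f(x):
    return x ** 3  # exact derivative: 3 * x**2

x, h = 2.0, 1e-4
exact = 3 * x ** 2  # 12.0

forward = (f(x + h) - f(x)) / h             # error shrinks like O(h)
centered = (f(x + h) - f(x - h)) / (2 * h)  # error shrinks like O(h^2)

print(abs(forward - exact))   # roughly 6e-4
print(abs(centered - exact))  # roughly 1e-8, much smaller
```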

To visit every element of $x$, whether it is a vector or a matrix, we use [NumPy multi-index iteration](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.nditer.html#tracking-an-index-or-multi-index).
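A short sketch of what multi-index iteration produces on a small 2-by-3 array (the array is our own example): `it.multi_index` is the tuple index of the current element, which can be used directly as `x[ix]`. For this simple read-only case, `np.ndindex(x.shape)` yields the same tuples.

```python
import numpy as np

x = np.arange(6.0).reshape(2, 3)
it = np.nditer(x, flags=["multi_index"])
indices = []
while not it.finished:
    indices.append(it.multi_index)  # tuple index of the current element
    it.iternext()

print(indices)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```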

In [13]:
def check_gradient(f, x):
    _, grad = f(x)  # analytical gradient at the unperturbed x
    h = 1e-4
    
    # x can be a vector or a matrix, so we want to compute the derivative according to each element.
    # This is straight out of the cs224n code.
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        
        old_value = x[ix]
        x[ix] = old_value + h
        fx_1, _ = f(x)
        x[ix] = old_value - h
        fx_2, _ = f(x)
        x[ix] = old_value  # restore the exact original value
        
        numgrad = (fx_1 - fx_2) / (2 * h)
        
        # we can now check the numerical gradient against the computed gradient.
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        if reldiff > 1e-5:
            print("Gradient check failed at index {}: {} should be {}".format(ix, grad[ix], numgrad))
            return
        
        it.iternext()
    
    print("Gradients seem to be ok!")

Let's try the gradient checker on a simple function, $f(x) = 2x_0$, whose gradient is 2.

In [14]:
def test_gradient_f(x):
    return 2. * x[0], np.ones_like(x) * 2.

check_gradient(test_gradient_f, np.array([1.]))

Gradients seem to be ok!
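The test above only exercises a one-element vector with a constant gradient. A slightly stronger sanity check, sketched below under our own assumptions, runs the same centered-difference comparison on a 2-by-2 matrix for $f(x) = \sum_{ij} x_{ij}^2$ (gradient $2x$) and on a deliberately wrong gradient, which the check should reject. `max_relative_error`, `sum_of_squares`, and `sum_of_squares_buggy` are hypothetical names; the helper reuses the relative-error formula from `check_gradient` but returns the worst error instead of printing.

```python
import numpy as np

def max_relative_error(f, x, h=1e-4):
    # Hypothetical helper: same centered-difference check as check_gradient,
    # but returns the worst relative error over all elements.
    _, grad = f(x)
    worst = 0.0
    for ix in np.ndindex(x.shape):
        old_value = x[ix]
        x[ix] = old_value + h
        fx_1, _ = f(x)
        x[ix] = old_value - h
        fx_2, _ = f(x)
        x[ix] = old_value  # restore the original entry
        numgrad = (fx_1 - fx_2) / (2 * h)
        reldiff = abs(numgrad - grad[ix]) / max(1, abs(numgrad), abs(grad[ix]))
        worst = max(worst, reldiff)
    return worst

def sum_of_squares(x):
    return np.sum(x ** 2), 2 * x  # correct gradient

def sum_of_squares_buggy(x):
    return np.sum(x ** 2), x.copy()  # wrong: missing the factor 2

x = np.array([[1.0, -2.0], [0.5, 3.0]])
print(max_relative_error(sum_of_squares, x) < 1e-5)        # True
print(max_relative_error(sum_of_squares_buggy, x) < 1e-5)  # False
```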


When 