In [7]:
import numpy as np
import theano.tensor as T

Load the data that was generated earlier using Gibbs sampling:

In [8]:
data = np.load('gibbs-sample.dat.npy')
print('Shape of data: ', data.shape)

Shape of data:  (50000, 16)


Here we shall implement the cost function of Minimum Probability Flow (MPF):

It can be shown that, for the energy $E_{x}(W,b)=-\frac{1}{2}x^TWx-b^Tx$ with binary $x\in\{0,1\}^d$ and a symmetric $W$ with zero diagonal, we have
$$\frac{1}{2}\left[E_x(W,b)-E_{x'}(W,b)\right]=(1/2-x_h)(Wx+b)_h$$
where $x$ and $x'$ are data vectors with a [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) of one that differ in bit $h$. The MPF cost function, denoted by $K(\theta)$, is given by:
$$K(\theta) = \frac{\epsilon}{|D|}\sum_{x\in D}\sum_{h=1}^{d}\exp\left[(1/2-x)_h(Wx+b)_h\right]$$
where $x$ is a vector in the dataset $D$, $d$ is the dimension of $x$, $\epsilon$ is a small positive constant, and $W$ and $b$ are the weights and biases to be learnt.
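
To make the double sum concrete, here is a minimal sketch that evaluates $K(\theta)$ term by term on a tiny made-up dataset and compares it with a vectorised expression. The function name `mpf_cost_loops`, its signature, and the toy arrays are assumptions for illustration only; they are not part of this notebook's data or code.

```python
import numpy as np

def mpf_cost_loops(X, W, b, epsilon):
    """Evaluate K(theta) by following the double sum literally.

    X is an (n_samples, d) array of 0/1 values, W is (d, d), b has length d.
    Illustrative helper, not part of the notebook.
    """
    n_samples, d = X.shape
    b = np.ravel(b)
    total = 0.0
    for x in X:                      # sum over x in D
        field = W.dot(x) + b         # the vector (Wx + b)
        for h in range(d):           # sum over h = 1..d
            total += np.exp((0.5 - x[h]) * field[h])
    return epsilon * total / n_samples

# Tiny made-up example (not the Gibbs samples loaded above)
rng = np.random.default_rng(0)
X_toy = rng.integers(0, 2, size=(5, 4)).astype(float)
W_toy = rng.random((4, 4))
b_toy = rng.random(4)
k_loops = mpf_cost_loops(X_toy, W_toy, b_toy, epsilon=0.01)

# Same quantity, vectorised: rows of X are samples, so Wx becomes X.dot(W.T)
k_vec = 0.01 / 5 * np.sum(np.exp((0.5 - X_toy) * (X_toy.dot(W_toy.T) + b_toy)))
```

The loop version is slow but mirrors the formula one-to-one, which makes it a convenient reference when checking a vectorised implementation.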

Initialise parameters for $W$ and $b$:

In [9]:
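# Parameters
v = 16              # number of units (the dimension d of each data vector)
epsilon = 0.01      # the constant epsilon in K(theta)
D = data.shape[0]   # number of samples, |D|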

In [15]:
W = np.random.rand(v, v)
b = np.random.rand(1, v)
print('Shape of W:', W.shape)
print('Shape of b:', b.shape)

Shape of W: (16, 16)
Shape of b: (1, 16)
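
Single-bit-flip identities of the kind used in the cost are typically derived for a symmetric $W$ with zero diagonal, which a plain `np.random.rand(v, v)` draw does not guarantee. As a hedged sketch, assuming that constraint is wanted, the initial weights could be symmetrised as follows. The names `W_sym` and `b_init` and the fixed seed are illustrative choices, not part of the notebook.

```python
import numpy as np

v = 16  # number of units, as in the notebook

rng = np.random.default_rng(0)   # seeded only to make the sketch reproducible
A = rng.random((v, v))
W_sym = 0.5 * (A + A.T)          # symmetrise: W_sym[i, j] == W_sym[j, i]
np.fill_diagonal(W_sym, 0.0)     # remove self-couplings
b_init = rng.random((1, v))      # biases keep the (1, v) shape used above
```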


Compute the initial cost, $K(\theta)$:

In [21]:
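def Kcost(data, W, b):
    # Exponent (1/2 - x)_h (Wx + b)_h; rows of data are samples, so Wx is data.dot(W.T)
    exponent = (0.5 - data) * (data.dot(W.T) + b)
    return np.sum(np.exp(exponent)) * (epsilon / D)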
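
One quick sanity check for an implementation of $K(\theta)$: with $W=0$ and $b=0$ every exponential equals 1, so the cost reduces to $\epsilon d$ regardless of the data. The sketch below uses a hypothetical helper `mpf_cost` that takes `epsilon` explicitly, and random stand-in data rather than the Gibbs samples.

```python
import numpy as np

def mpf_cost(X, W, b, epsilon):
    """Vectorised K(theta); hypothetical helper that takes epsilon explicitly."""
    # (1/2 - x)_h (Wx + b)_h for every sample (row) and unit (column)
    exponent = (0.5 - X) * (X.dot(W.T) + b)
    return epsilon * np.sum(np.exp(exponent)) / X.shape[0]

rng = np.random.default_rng(1)
# Stand-in binary data with 16 units (not the samples loaded earlier)
X_toy = rng.integers(0, 2, size=(100, 16)).astype(float)

# With zero weights and biases, K should equal epsilon * d
k_zero = mpf_cost(X_toy, np.zeros((16, 16)), np.zeros((1, 16)), epsilon=0.01)
```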

Compute the initial gradient: