In [1]:
import numpy as np
import theano.tensor as T

Load the data that was generated earlier using Gibbs sampling:

In [2]:
data = np.load('gibbs-sample.dat.npy')
print ('Shape of data: ', data.shape)

Shape of data:  (50000, 16)


Here we shall implement the cost function of Minimum Probability Flow (MPF):

It can be shown that, for $E_{x}(W,b)=-\frac{1}{2}x^TWx-bx$ with $W$ symmetric and zero on its diagonal, we have
$$\frac{1}{2}\left[E_x(W,b)-E_{x'}(W,b)\right]=(1/2-x_h)(Wx+b)_h$$
where $x$ and $x'$ are binary data vectors with a [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) of one that differ only in component $h$. The cost function of MPF, denoted by $K(\theta)$, is given by:
$$K(\theta) = \frac{\epsilon}{|D|}\sum_{x\in D}\sum_{h=1}^{d}\exp\left[(1/2-x)_h(Wx+b)_h\right]$$
where $x$ is a vector in the dataset $D$, $d$ is the dimension of $x$, and $W$ and $b$ are the weights and biases to be learnt.
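
To make the double sum concrete, here is a minimal sketch that evaluates $K(\theta)$ term by term on a small random binary dataset and compares the result with a vectorised form. The names `mpf_cost_loops` and `mpf_cost_vectorised`, the toy data, and the symmetric zero-diagonal `W` are illustrative assumptions, not part of the notebook. Because `W` is symmetric here, $(Wx)_h$ and $(x^TW)_h$ coincide, so the row-wise product `X.dot(W)` agrees with the formula.

```python
import numpy as np

def mpf_cost_loops(X, W, b, epsilon):
    """Evaluate K(theta) with explicit loops (hypothetical helper)."""
    n_samples, d = X.shape
    total = 0.0
    for x in X:                       # sum over x in D
        field = W.dot(x) + b          # (Wx + b), shape (d,)
        for h in range(d):            # sum over the d single-bit flips
            total += np.exp((0.5 - x[h]) * field[h])
    return epsilon * total / n_samples

def mpf_cost_vectorised(X, W, b, epsilon):
    """Same quantity computed with array operations (hypothetical helper)."""
    return (epsilon / X.shape[0]) * np.sum(np.exp((0.5 - X) * (X.dot(W) + b)))

# Toy inputs (assumed for illustration): 20 random binary vectors of length 4
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(20, 4)).astype(float)
A = rng.rand(4, 4)
W = (A + A.T) / 2.0                   # symmetric weights
np.fill_diagonal(W, 0.0)              # no self-connections
b = rng.rand(4)

k_loops = mpf_cost_loops(X, W, b, 0.01)
k_vec = mpf_cost_vectorised(X, W, b, 0.01)
print(k_loops, k_vec)
```

If `W` is not symmetric, the two functions generally disagree, because `X.dot(W)` then computes $x^TW$ rather than $Wx$.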

Initialise parameters for $W$ and $b$:

In [3]:
# Parameters
v = 16
epsilon = 0.01
D = data.shape[0]

In [4]:
W = np.random.rand(v, v)
b = np.random.rand(1, v)
print ('Shape of W:', W.shape)
print ('Shape of b:', b.shape)

Shape of W: (16, 16)
Shape of b: (1, 16)
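
Note that `np.random.rand` draws every entry independently, so the `W` above is neither symmetric nor zero on its diagonal. If the model is meant to have symmetric couplings with no self-connections (an assumption here; the notebook does not say so explicitly), one possible initialisation is sketched below. The helper name `init_symmetric_params` and its `seed` argument are hypothetical.

```python
import numpy as np

def init_symmetric_params(v, seed=None):
    """Return a symmetric, zero-diagonal W of shape (v, v) and a bias b of shape (1, v)."""
    rng = np.random.RandomState(seed)
    A = rng.rand(v, v)
    W = (A + A.T) / 2.0           # average with the transpose to symmetrise
    np.fill_diagonal(W, 0.0)      # remove self-connections
    b = rng.rand(1, v)
    return W, b

W_sym, b_sym = init_symmetric_params(16, seed=0)
print('Symmetric:', np.allclose(W_sym, W_sym.T))
print('Zero diagonal:', np.all(np.diag(W_sym) == 0))
```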


Compute the initial cost, $K(\theta)$:

In [9]:
def Kcost(data, W, b, epsilon):
    """
    Returns the cost.
    Inputs:
    - data: (D, n) numpy array whose rows are the states of the n-unit network
    - W: (n, n) numpy array of the weight matrix
    - b: (1, n) numpy array of biases
    - epsilon: parameter for the cost
    """
    D = data.shape[0]
    # exp[(1/2 - x)_h (xW + b)_h] for every sample and unit, summed and scaled
    return np.sum(np.exp((0.5 - data) * (data.dot(W) + b))) * (epsilon/D)

In [10]:
c = Kcost(data, W, b, epsilon)
print (c)

0.757915739647


Compute the initial gradient:

In [16]:
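def initialgrad(data, W, b, epsilon):
    """
    Computes the initial gradient of the cost function.
    Inputs:
    - data: (D, n) numpy array whose rows are the states of the n-unit network
    - W: (n, n) numpy array of the weight matrix
    - b: (1, n) numpy array of biases
    - epsilon: parameter for the cost
    Returns (Wgrad, bgrad), with the same shapes as W and b.
    """
    D = data.shape[0]
    delta = 0.5 - data
    # Per-sample, per-unit term exp[(1/2 - x)_h (xW + b)_h], shape (D, n)
    expterm = np.exp(delta * (data.dot(W) + b))
    # Derivative of the cost with respect to every entry of W, shape (n, n)
    G = (epsilon / D) * data.T.dot(delta * expterm)
    # W is treated as symmetric: accumulate G[i, j] + G[j, i] in the upper
    # triangle (k=-1 keeps the diagonal from being counted twice)
    Wgrad = np.triu(G) + np.tril(G, -1).T
    # Derivative with respect to each bias, shape (1, n)
    bgrad = (epsilon / D) * np.sum(delta * expterm, axis=0, keepdims=True)
    return Wgrad, bgrad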
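
A gradient routine of this kind is easy to get subtly wrong, so a finite-difference check is a useful safeguard. The sketch below is self-contained and makes several assumptions: `mpf_cost` repeats the cost formula, `mpf_grad_full` is a hypothetical helper that differentiates it with respect to every entry of `W` and `b` (no symmetry constraint), `numerical_grad` is a hypothetical central-difference helper, and the toy data are random bits rather than the Gibbs samples.

```python
import numpy as np

def mpf_cost(X, W, b, epsilon):
    """MPF cost for binary rows of X (same formula as the cost above)."""
    return (epsilon / X.shape[0]) * np.sum(np.exp((0.5 - X) * (X.dot(W) + b)))

def mpf_grad_full(X, W, b, epsilon):
    """Analytic gradient with respect to all entries of W and b (hypothetical helper)."""
    delta = 0.5 - X
    weighted = delta * np.exp(delta * (X.dot(W) + b))    # shape (D, n)
    Wgrad = (epsilon / X.shape[0]) * X.T.dot(weighted)   # shape (n, n)
    bgrad = (epsilon / X.shape[0]) * weighted.sum(axis=0, keepdims=True)
    return Wgrad, bgrad

def numerical_grad(f, P, step=1e-6):
    """Central differences of the scalar f() with respect to each entry of P."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + step
        f_plus = f()
        P[idx] = old - step
        f_minus = f()
        P[idx] = old                  # restore the entry before moving on
        G[idx] = (f_plus - f_minus) / (2 * step)
    return G

# Toy inputs (assumed for illustration)
rng = np.random.RandomState(1)
X = rng.randint(0, 2, size=(30, 5)).astype(float)
W = rng.rand(5, 5)
b = rng.rand(1, 5)
eps = 0.01

Wgrad, bgrad = mpf_grad_full(X, W, b, eps)
cost = lambda: mpf_cost(X, W, b, eps)   # reads W and b, which are perturbed in place
W_err = np.max(np.abs(Wgrad - numerical_grad(cost, W)))
b_err = np.max(np.abs(bgrad - numerical_grad(cost, b)))
print('max |error| in Wgrad:', W_err)
print('max |error| in bgrad:', b_err)
```

Both errors should be tiny compared with the gradient entries; a large discrepancy would point to a mistake in the analytic expression.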
    
    
    