# L10b: Training a Boltzmann Machine

___
In this lab, we will train a _small_ Boltzmann machine on some simple datasets. 

## Tasks
Before we get started, we'll quickly review how a Boltzmann machine is trained. Then, you'll execute the `Run All Cells` command to check whether you (or your neighbor) have any code or setup issues. If you hit code issues, raise your hand, and let's get those fixed!

* __Task 1: Machine we will learn__: In this task, we'll set up a _three-node_ Boltzmann machine with random parameters, sample it, and look at its stationary distribution.
* __Task 2: Estimate the Boltzmann machine parameters__: In this task, we'll use the training algorithm to estimate the weights $\mathbf{W}$ and biases $\mathbf{b}$ of the machine.
* __Task 3: Compare with Theoretical Expectations__: In this task, we'll compare the trained model against the theoretical expectations.

Let's get started!
___

In [1]:
include("Include.jl"); # Include the required packages and codes from Include.jl

Set some constants that we will use later.

In [2]:
number_of_nodes = 3; # number of nodes in the system
β = 1.0; # temperature parameter for the system
number_of_turns = 1000; # number of turns that we take in the simulation

## Training a Boltzmann Machine
Suppose we have a collection of patterns $\mathbf{X} = \left\{\mathbf{x}^{(1)},\mathbf{x}^{(2)},\ldots,\mathbf{x}^{(m)}\right\}$, where $\mathbf{x}^{(i)}\in\left\{-1,1\right\}^{|\mathcal{V}|}$ is a binary vector of size $|\mathcal{V}|$ and $m$ is the number of patterns. We want to learn the parameters of the Boltzmann Machine $\mathcal{B}$ such that the stationary distribution of the Boltzmann Machine matches the distribution of the training patterns $\mathbf{X}$.

* __Goal__: The goal of training the Boltzmann Machine is to learn the weights $\mathbf{W}$ and biases $\mathbf{b}$ of the network such that the stationary distribution of the Boltzmann Machine matches the distribution of the training patterns in the dataset $\mathbf{X}$.
* __Gradient ascent__: The learning algorithm for the Boltzmann Machine is based on gradient ascent. The idea is to adjust the weights and biases of the network in the direction of the gradient of the log-likelihood of the training patterns. This will maximize the likelihood of observing the training patterns given the weights and biases of the network.
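
For reference, the gradient that this ascent follows can be written out explicitly. This is a sketch under one assumption that the text above does not state: the energy of a state is $E(\mathbf{s}) = -\frac{1}{2}\sum_{i\neq j}w_{ij}s_{i}s_{j} - \sum_{i}b_{i}s_{i}$ with symmetric weights $w_{ij} = w_{ji}$. The index $k$ runs over the training patterns, and $\langle\cdot\rangle_{\mathbf{S}}$ denotes an average over the stationary distribution of the model.

$$
\begin{aligned}
P(\mathbf{s}) &= \frac{1}{Z}\exp\left(-\beta E(\mathbf{s})\right),\qquad Z = \sum_{\mathbf{s}^{\prime}}\exp\left(-\beta E(\mathbf{s}^{\prime})\right)\\
\mathcal{L}(\mathbf{W},\mathbf{b}) &= \frac{1}{m}\sum_{k=1}^{m}\ln P\left(\mathbf{x}^{(k)}\right)\\
\frac{\partial\mathcal{L}}{\partial w_{ij}} &= \beta\left(\langle{x_{i}x_{j}}\rangle_{\mathbf{X}} - \langle{s_{i}s_{j}}\rangle_{\mathbf{S}}\right)\\
\frac{\partial\mathcal{L}}{\partial b_{i}} &= \beta\left(\langle{x_{i}}\rangle_{\mathbf{X}} - \langle{s_{i}}\rangle_{\mathbf{S}}\right)
\end{aligned}
$$

With $\beta = 1$, stepping a distance $\eta$ along these gradients increases each weight by $\eta$ times the gap between the data-dependent and model-dependent pair averages, and each bias by $\eta$ times the gap between the corresponding single-node averages.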

### Training Algorithm
The training algorithm for the Boltzmann Machine maximizes the log-likelihood of observing the training patterns $\mathbf{x}^{(i)}\in\mathbf{X}$ given the weights $\mathbf{W}$ and biases $\mathbf{b}$ of the network. The algorithm is:

__Initialize__: Set the weights $\mathbf{W}$ and biases $\mathbf{b}$ of the network to some initial guess, e.g., using the Hopfield network Hebbian learning rule. Set the learning rate $\eta$, the temperature parameter $\beta = 1$, and the number of turns $T$. Precompute the data-dependent expectations $\langle{x_{i}x_{j}}\rangle_{\mathbf{X}}$ and $\langle{x_{i}}\rangle_{\mathbf{X}}$ by averaging over every training pattern in $\mathbf{X}$.

1. Simulate the Boltzmann Machine $\mathcal{B}$ until it becomes stationary (or for a fixed number of turns $T$). Then, generate a set of stationary samples $\mathbf{S} = \left\{\mathbf{s}^{(1)},\mathbf{s}^{(2)},\ldots,\mathbf{s}^{(m)}\right\}$.
2. Compute the model-dependent expectations $\langle{s_{i}s_{j}}\rangle_{\mathbf{S}}$ and $\langle{s_{i}}\rangle_{\mathbf{S}}$ using the stationary samples $\mathbf{s}^{(i)}\in\mathbf{S}$.
3. Update the weights of the network using the following update rule: $w_{ij}^{\prime} = w_{ij} + \Delta{w_{ij}}$ where $\Delta{w_{ij}} = \eta\left(\langle{x_{i}x_{j}}\rangle_{\mathbf{X}} - \langle{s_{i}s_{j}}\rangle_{\mathbf{S}}\right)$. The hyperparameter $\eta$ is the learning rate, $\langle{x_{i}x_{j}}\rangle_{\mathbf{X}}$ is the data-dependent expectation, and $\langle{s_{i}s_{j}}\rangle_{\mathbf{S}}$ is the model-dependent expectation. The update rule is applied to all weights in the network, i.e., $\forall i,j\in\mathcal{V}$.
4. Update the biases of the network using the following update rule: $b_{i}^{\prime} = b_{i} + \Delta{b_{i}}$ where $\Delta{b_{i}} = \eta\left(\langle{x_{i}}\rangle_{\mathbf{X}} - \langle{s_{i}}\rangle_{\mathbf{S}}\right)$. Here, $\langle{x_{i}}\rangle_{\mathbf{X}}$ is the data-dependent expectation and $\langle{s_{i}}\rangle_{\mathbf{S}}$ is the model-dependent expectation. The update rule is applied to all biases in the network, i.e., $\forall i\in\mathcal{V}$.
5. Repeat steps 1-4 until convergence (or for a fixed number of iterations). The samples must be regenerated on every pass because the updated parameters change the stationary distribution.
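
The steps above can be sketched in a few lines of Julia. This is a minimal illustration, not the lab's implementation: `gibbs_samples` and `train` are hypothetical helpers written for this example, the sampler is a simple single-node Gibbs update that assumes the energy $E(\mathbf{s}) = -\frac{1}{2}\mathbf{s}^{\top}\mathbf{W}\mathbf{s} - \mathbf{b}^{\top}\mathbf{s}$ with symmetric weights, and the initial guess is all zeros rather than the Hebbian rule. It also uses every sampled turn (no burn-in), which is a simplification.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical sampler (not the lab's simulate method): at each turn, pick a random
# node and resample it from its conditional distribution given the other nodes.
function gibbs_samples(W, b, s0; T = 1000, β = 1.0, rng = Random.default_rng())
    s = copy(s0)
    S = zeros(Int, length(s), T)            # one sampled state per column
    for t in 1:T
        i = rand(rng, 1:length(s))
        h = dot(W[i, :], s) + b[i]          # local field at node i (W[i,i] = 0)
        p = 1 / (1 + exp(-2β * h))          # P(sᵢ = +1 | other nodes)
        s[i] = rand(rng) < p ? 1 : -1
        S[:, t] = s
    end
    return S
end

# Hypothetical training loop: X holds one ±1 training pattern per column.
function train(X; η = 0.05, β = 1.0, T = 2000, iterations = 200, rng = Random.default_rng())
    n, m = size(X)
    xx_data = (X * X') ./ m                 # data-dependent ⟨xᵢxⱼ⟩
    x_data = vec(mean(X, dims = 2))         # data-dependent ⟨xᵢ⟩
    W = zeros(n, n); b = zeros(n)           # simple initial guess (assumption)
    for _ in 1:iterations
        S = gibbs_samples(W, b, X[:, 1]; T = T, β = β, rng = rng)  # step 1
        ss_model = (S * S') ./ T            # step 2: model-dependent ⟨sᵢsⱼ⟩
        s_model = vec(mean(S, dims = 2))    # step 2: model-dependent ⟨sᵢ⟩
        W .+= η .* (xx_data .- ss_model)    # step 3: weight update
        W[diagind(W)] .= 0.0                # keep the no-self-connection constraint
        b .+= η .* (x_data .- s_model)      # step 4: bias update
    end
    return W, b
end

# usage: four made-up patterns on three nodes
X_demo = [1 1 1 -1; 1 1 1 -1; -1 -1 -1 1]
W_fit, b_fit = train(X_demo; iterations = 50, T = 500, rng = MersenneTwister(1))
```

Because both expectation matrices are symmetric, the learned weights stay symmetric without any extra bookkeeping. The sample averages are noisy, so in practice the parameters fluctuate around their optimum rather than converging exactly.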

## Task 1: Machine we will learn
In this task, let's review the dynamics of the Boltzmann machine that we are trying to learn. We'll use a very simple _three-node_ Boltzmann machine (which is small enough to compute all the possible configurations). 

First, let's set up our model of the Boltzmann machine with some random parameters that we will learn in the next task. We'll save the random weights in the `W::Array{Float64,2}` matrix and the random biases in the `b::Array{Float64,1}` vector.

In [3]:
W,b = let 

    # initialize some random weights and biases
    W = randn(number_of_nodes, number_of_nodes);
    b = randn(number_of_nodes);

    # symmetrize the weights (a Boltzmann machine has wᵢⱼ = wⱼᵢ)
    W = (W + W') / 2;

    # zero the diagonal of the weights (no self connections)
    W = W - diagm(diag(W));

    # return -
    W, b
end;

Next, let's build a model of the test Boltzmann machine. We'll use [the `MySimpleBoltzmannMachineModel` struct](src/Types.jl) to represent the machine, and we build an instance of this type [using a `build(...)` method](src/Factory.jl). The struct has `W::Array{Float64,2}` and `b::Array{Float64,1}` fields that we set when we build an instance of the model.

In [4]:
model = build(MySimpleBoltzmannMachineModel, (
    W = W,
    b = b,
));

### Sample the test model
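Next, let's sample the test model. Starting from the initial state `sₒ`, we call the `simulate(...)` method for `number_of_turns` turns at the temperature parameter `β`. We keep the sampled states `S` (one column per turn) and the array `E` returned alongside them, and discard the turn vector `T`.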

In [7]:
S,E = let

    # initialize -
    sₒ = [-1,1,1]; # setup the initial state of the system

    # run the simulation
    (T,S,E) = simulate(model, sₒ, T = number_of_turns, β = β); # simulate the model 

    # return the data (we don't need the turn vector)
    S,E
end;

In [8]:
S

3×1000 Matrix{Int64}:
 -1   1   1  1  1  -1   1   1   1   1  …   1   1   1   1   1   1  1   1   1
  1   1   1  1  1   1   1   1   1   1      1   1   1   1   1   1  1   1   1
  1  -1  -1  1  1  -1  -1  -1  -1  -1     -1  -1  -1  -1  -1  -1  1  -1  -1

### What is the stationary distribution?
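Because the machine has only three nodes, there are just $2^{3} = 8$ possible states. We can therefore compare how often each state appears in the columns of `S` with the stationary (Boltzmann) distribution $P(\mathbf{s}) \propto \exp\left(-\beta E(\mathbf{s})\right)$, where $E(\mathbf{s})$ is the energy of state $\mathbf{s}$.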
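
With only three nodes, the stationary distribution can be checked by brute force. The sketch below enumerates every state, computes the exact Boltzmann probabilities, and tallies how often each state appears in a sample matrix. It assumes the energy convention $E(\mathbf{s}) = -\frac{1}{2}\mathbf{s}^{\top}\mathbf{W}\mathbf{s} - \mathbf{b}^{\top}\mathbf{s}$, which the lab text does not state. The helpers `energy`, `all_states`, `boltzmann_distribution`, and `empirical_distribution` are hypothetical names for this example, not functions from the lab's `src` directory, and the demo parameters are made up.

```julia
using LinearAlgebra

# assumed energy convention: E(s) = -½ sᵀWs - bᵀs
energy(W, b, s) = -0.5 * (s' * W * s) - b' * s

# enumerate all 2ⁿ states with entries ±1, one state per column
function all_states(n)
    states = zeros(Int, n, 2^n)
    for k in 0:(2^n - 1), i in 1:n
        states[i, k + 1] = ((k >> (i - 1)) & 1) == 1 ? 1 : -1
    end
    return states
end

# exact Boltzmann distribution over the enumerated states
function boltzmann_distribution(W, b; β = 1.0)
    states = all_states(length(b))
    weights = [exp(-β * energy(W, b, states[:, k])) for k in 1:size(states, 2)]
    return states, weights ./ sum(weights)   # dividing by the sum applies 1/Z
end

# fraction of the columns of S that equal each enumerated state
function empirical_distribution(S, states)
    T = size(S, 2)
    return [count(t -> S[:, t] == states[:, k], 1:T) / T for k in 1:size(states, 2)]
end

# usage with made-up symmetric weights and biases
W_demo = [0.0 1.0 -0.5; 1.0 0.0 0.25; -0.5 0.25 0.0]
b_demo = [0.1, -0.2, 0.3]
states, p_exact = boltzmann_distribution(W_demo, b_demo; β = 1.0)
```

Calling `empirical_distribution(S, states)` on the sample matrix from the simulation gives the observed frequencies in the same state order as `p_exact`, so the two vectors can be compared entry by entry. If the sampler has reached stationarity and uses the same energy convention, the two should agree up to sampling noise.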

## Task 2: Estimate the Boltzmann machine parameters
Fill me in.

## Task 3: Compare with Theoretical Expectations
Fill me in.