

# RBMs and DBNs

## Learning Objectives
By the end of this lesson, you will be able to:


1) Describe RBMs and DBNs <br/>
2) Implement RBMs to reconstruct the input image <br/>
3) Use Convolution theory within RBMs <br/>

## Collaborative Filtering

The collaborative filtering technique is used to build intelligent recommendation systems. It filters items or products that a user might like based on likes or reactions of similar users.


There are two types of collaborative filtering:

* Memory-based: It consists of two methods, user-based and item-based. User-based collaborative filtering predicts the target user's ratings from other users who have rated items similarly, whereas item-based collaborative filtering predicts them from items similar to those the user has already rated.

* Model-based: It uses machine learning to predict user ratings for an item. Typical techniques include PCA, SVD, and neural networks.

Consider the below example of collaborative filtering, where you examine eight users and their ratings against seven movies.

![alt text](https://drive.google.com/uc?id=12sLyoyqsPPFbUx7BWDqbC2rgUZ09nCf5)

The value in each cell shows the score that the user has given to the movie after watching it.
![image.png](https://drive.google.com/uc?id=1tPsi1ULtMWX9SxzpnhMrpu_PcMknJar9)
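To make the memory-based approach concrete, here is a minimal sketch of user-based collaborative filtering. The small ratings matrix, the use of cosine similarity, and the function names are all assumptions chosen for illustration; they are not the data shown in the figures above.

```python
import math

# Hypothetical ratings: rows are users, columns are movies; 0 means "not rated".
ratings = [
    [5, 4, 0, 1],   # user 0 (target): has not rated movie 2
    [5, 5, 4, 1],   # user 1: tastes close to user 0
    [1, 2, 1, 5],   # user 2: very different tastes
]

def cosine_similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = [(x, y) for x, y in zip(a, b) if x > 0 and y > 0]
    if not shared:
        return 0.0
    dot = sum(x * y for x, y in shared)
    norm_a = math.sqrt(sum(x * x for x, _ in shared))
    norm_b = math.sqrt(sum(y * y for _, y in shared))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    weighted_sum, weight_total = 0.0, 0.0
    for other, row in enumerate(ratings):
        if other == user or row[movie] == 0:
            continue  # skip the target user and users who have not rated the movie
        sim = cosine_similarity(ratings[user], row)
        weighted_sum += sim * row[movie]
        weight_total += sim
    return weighted_sum / weight_total if weight_total else 0.0

prediction = predict_rating(ratings, user=0, movie=2)
print(round(prediction, 2))
```

The prediction leans toward the rating given by the most similar user, which is the core idea behind the user-based method.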

## Neural Network

Now, imagine you have a type of neural network that has only two layers: the input layer and the hidden layer.
![image.png](3.png)

## Collaborative Filtering and Neural Network
When you feed the first user vector into the network, it goes through the network and finally fires up some units in the hidden layer.


![image.png](https://drive.google.com/uc?id=1q857UtW1TTdTuauB-e2hH6aZkn2oMj2s)


Then, the values of the hidden layer are fed back into the network, and a vector that is almost the same as the input vector is reconstructed as the output.


![image.png](https://drive.google.com/uc?id=1Fj73xwSRZdXfXopomPwD3KA4Y7fd5rtG)

Next, you feed the second user's ratings, which are not very different from the first user's. The same hidden units are turned on, and the network output is nearly the same.

![image.png](https://drive.google.com/uc?id=1RG_rE7fcFYKNrspicqd6g0Xx9M0m6y68)


You can repeat it for the third user as well as the fourth user.


![image.png](https://drive.google.com/uc?id=1vGMFVc73ts3CwOfO8bWU8tKwvuou8yX5)

Now, feed a user that has a completely different idea about these movies. When you feed the respective rating values into the network, different hidden units get turned on giving way to the reconstruction of a different vector.


![image.png](https://drive.google.com/uc?id=1eDpYMpp3fGW_VP9Xys7a4Iq6Ujw3iR7m)

The reconstructed vector is almost the same as that of user number six, and the process can be repeated for other users.


![image.png](https://drive.google.com/uc?id=1ZH-rn_5r3cbLc5tTiUTFA8k-3YL2-JjF)

Now, consider user number eight. She or he hasn't watched movie six but does have preferences that are almost the same as those of users five and six.


![image.png](https://drive.google.com/uc?id=1yD9yZBex93Wsmbgl-KtL1ElA7UCrGki4)

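If you now feed this vector into your network, it'll fire up the same hidden units as users five and six. Reconstructing the visible layer from those hidden units then fills in a value for movie six, which serves as the predicted rating for user eight.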


![image.png](https://drive.google.com/uc?id=1ckFRBQfM-_FPFSXBgv2yrtWRtFUNzQ4J)

## Boltzmann Machines

Boltzmann machines are named after the Boltzmann distribution.
<br><br>
**Boltzmann Distribution:**<br>
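It comes from statistical mechanics and gives the probability that a system is in a particular state as a function of that state's energy and the temperature of the system: low-energy states are more probable, and the preference for them weakens as the temperature rises.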
<br><br>
**History:**<br>
The Boltzmann machine was introduced in 1985 by Geoffrey Hinton, then at Carnegie Mellon University, and Terry Sejnowski, then a professor at Johns Hopkins University.
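As a small illustration of the Boltzmann distribution, the sketch below turns a list of state energies into probabilities proportional to exp(-E / T). The energies and temperatures are hypothetical values, and the Boltzmann constant is assumed to be absorbed into the temperature.

```python
import math

def boltzmann_probabilities(energies, temperature=1.0):
    """p(state) is proportional to exp(-E / T); normalise so the values sum to 1."""
    weights = [math.exp(-e / temperature) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]

energies = [0.0, 1.0, 2.0]                      # hypothetical state energies
cold = boltzmann_probabilities(energies, temperature=0.5)
hot = boltzmann_probabilities(energies, temperature=5.0)

print([round(p, 3) for p in cold])              # strongly favours the lowest energy
print([round(p, 3) for p in hot])               # closer to uniform
```

At a low temperature almost all the probability sits on the lowest-energy state; at a high temperature the states become nearly equally likely.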

### What Are Boltzmann Machines?

Boltzmann machines are stochastic and generative neural networks that can learn internal representations and solve difficult problems of combinatorics.

### Overview of the Functioning of Boltzmann Machines 

Boltzmann machines consist of two types of nodes:
* Hidden nodes (h)
* Visible nodes (v)

![boltzmann](https://drive.google.com/uc?id=1rhhgnH05qlS2hxXQtsGwsFlALaADqRnW)

**Why are Boltzmann machines stochastic?**<br>

Boltzmann machines are stochastic because each unit switches on or off with a probability that depends on its total input, rather than computing a deterministic value. They also have no output nodes: instead of producing a 1 or 0 as a prediction, they learn the patterns in the data itself.
<br><br>
**How are Boltzmann machines different from traditional neural networks?**<br>
* Boltzmann machines have connections among the input nodes. 
In the image above, you can see that all the nodes are connected to each other, irrespective of their type. These unrestricted connections let the nodes share information among themselves and help the network generate subsequent data on its own.

* Measurement depends on the data present in the visible nodes and not in the hidden nodes.

### Limitations of Boltzmann Machines

* Have increased frequency of weight adjustment and no measures to determine the number of times the weight change is allowed

* Require more time to collect statistics in order to calculate probabilities

* Are ambiguous about adjusting the temperature during simulated annealing

* Have no measures to decide whether the network has reached the equilibrium temperature or not

## Restricted Boltzmann Machines

RBMs are shallow neural networks.


![image.png](https://drive.google.com/uc?id=1YqxzjPqkN6PAs7iMpDs9g9B_q_gAOOip)   

* Have two layers 
* Are unsupervised
* Find patterns in the data by reconstructing the input
* Have no intra-layer connections between neurons
* Are used for regression, classification, dimensionality reduction, feature learning, collaborative filtering, and topic modeling


They are restricted because neurons within the same layer are not connected.


![image.png](https://drive.google.com/uc?id=1kOksg7Eai5qoklOjO7hgU6oL65v2Dhd7)


## Feeding RBM

On feeding an input image, the values that appear in the hidden layer can be considered as features learned automatically from the input data.

![image.png](https://drive.google.com/uc?id=1sgUBPSUuyq-gBxF8U3rJ-B4GBwfI805G)


The values in the hidden units form a compact representation of the data, with lower dimensionality than the original input.

## Applications of RBM

* Dimensionality Reduction
* Collaborative Filtering
* Feature Extraction
* Deep Belief Networks

    Note: Deep Belief Networks are created by stacking multiple RBMs and then fine-tuning the stack with backpropagation and gradient descent-based optimization.

## Learning Process of RBM


The learning process consists of several forward and backward passes, where the RBM tries to reconstruct the input data and feed it to a visible layer for a new forward pass. The weights and biases of the neural net are adjusted accordingly.


![image.png](https://drive.google.com/uc?id=1Wn04_KR207_uvO2blrIrBqhbbj3l8exO)

## Problem Statement: 
Consider the MNIST dataset and the classification model from the previous lesson for image classification. To increase the model's accuracy, the amount of training data must be increased. Apply RBMs here, as they are known for their ability to reconstruct images.
<br>
<br>
## Objective: 
Classify images of MNIST data and reconstruct them using RBM.

Link to dataset: https://www.dropbox.com/s/39ohoapd2o88ky3/train%20%281%29.csv?dl=0



### Current State of Neuron

The energy of neuron **i** in a state is calculated using the total input on connections of all active neurons and its own bias:

![image.png](https://drive.google.com/uc?id=1RHNDn1rhXWTLkrEVbloYxl_cV5aS8e7l)

### Probability of Neuron Activation

**Sigmoid Function**<br>
The probability that the neuron **i** will be active is given by:<br>
![image.png](https://drive.google.com/uc?id=1Vvw2oUDLSEr7wVuykPmnVvrZ3AFvlqQ0)
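The two steps above (total input, then sigmoid) can be sketched as follows. Because the equations are shown only as images, the symbol names, the symmetric weight matrix, and the numbers below are assumptions for illustration.

```python
import math

def total_input(i, states, weights, biases):
    """Bias of unit i plus the weights from every active unit connected to it."""
    return biases[i] + sum(weights[i][j] * s
                           for j, s in enumerate(states) if j != i)

def activation_probability(z):
    """Sigmoid: maps the total input to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

states = [1, 0, 1]                    # hypothetical current on/off states
weights = [[0.0, 0.5, -1.0],          # hypothetical symmetric weights, zero diagonal
           [0.5, 0.0, 2.0],
           [-1.0, 2.0, 0.0]]
biases = [0.1, -0.3, 0.2]

z1 = total_input(1, states, weights, biases)
p1 = activation_probability(z1)
print(z1, round(p1, 3))
```

Only the active neighbours (state 1) contribute to the total input, so switching a neighbour off removes its weight from the sum.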

### The Energy Factor

Now, if the neurons change their state, the change is driven by Boltzmann’s energy equation:

![image.png](https://drive.google.com/uc?id=1TQhmE6qq0HpUe4wlJkzy44xhf14ClZEj)

### The Energy Function

In a Boltzmann machine, the energy of a state vector is defined as:

![image.png](https://drive.google.com/uc?id=1MZa5I1uIU86IJ_iTlhDK7968BR6GVXMW)

The energy function can be further generalized with respect to v, h as:

![image.png](https://drive.google.com/uc?id=1IDr4zrJ1eveAwgduJxbXo2XUEgIICz1n)

The probability of the whole system with respect to v, h can be represented as:

![image.png](https://drive.google.com/uc?id=1pPbbC3PIo3r2HFSmAvbxjHazVHAAaHb3)

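Where:<br>
$Z = \sum_{v,h} e^{-E(v,h)}$ is the partition function, the summation of $e^{-E(v,h)}$ over all pairs of visible and hidden vectors.

The above formula gives the probability of any joint configuration of the network, and the activation probability of each neuron follows from it.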
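For a network small enough to enumerate, the energy and the joint probability can be computed by brute force. The sketch below assumes the commonly used RBM energy, E(v, h) = -Σ a_i v_i - Σ b_j h_j - Σ v_i W_ij h_j; the names `W`, `a`, `b` and all parameter values are hypothetical and may not match the notation in the images above.

```python
import itertools
import math

W = [[1.0, -0.5],      # W[i][j]: weight between visible unit i and hidden unit j
     [0.5, 1.5]]
a = [0.0, 0.2]         # visible biases (hypothetical values)
b = [-0.1, 0.3]        # hidden biases (hypothetical values)

def energy(v, h):
    """Energy of one joint configuration (v, h) under the assumed form."""
    visible_term = sum(a[i] * v[i] for i in range(len(v)))
    hidden_term = sum(b[j] * h[j] for j in range(len(h)))
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    return -(visible_term + hidden_term + interaction)

# Enumerate every binary state of two units and sum over all (v, h) pairs.
states = list(itertools.product([0, 1], repeat=2))
Z = sum(math.exp(-energy(v, h)) for v in states for h in states)

def joint_probability(v, h):
    return math.exp(-energy(v, h)) / Z

total = sum(joint_probability(v, h) for v in states for h in states)
print(round(total, 6))
```

The enumeration grows exponentially with the number of units, which is why the partition function is not computed exactly for networks of realistic size.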

### An Activated Hidden Neuron

The probability of a hidden neuron getting activated can be expressed collectively as:

![image.png](https://drive.google.com/uc?id=1W34lQIBR9blqfVXEjpXQ54eIKdhp5F0v)

### An Activated Visible Neuron

The probability of a visible neuron getting activated can be expressed collectively as:

![image.png](https://drive.google.com/uc?id=1vD7frgMJv58BDR6IH3dlWzy0JV7WCSEn)
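The two conditional probabilities can be sketched in code as below. This assumes the standard RBM form, in which a unit's activation probability is the sigmoid of its bias plus the weighted states of the opposite layer; the function names and parameter values are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probabilities(v, W, b):
    """p(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probabilities(h, W, a):
    """p(v_i = 1 | h) for every visible unit i."""
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def sample(probabilities, rng):
    """Turn each unit on with its own probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]

rng = random.Random(0)
W = [[2.0, -1.0], [2.0, -1.0], [-1.0, 2.0]]   # 3 visible x 2 hidden (hypothetical)
a = [0.0, 0.0, 0.0]
b = [0.0, 0.0]

v = [1, 1, 0]
p_h = hidden_probabilities(v, W, b)   # which hidden units this input favours
h = sample(p_h, rng)                  # stochastic hidden states
p_v = visible_probabilities(h, W, a)  # probabilities used to reconstruct the input
print([round(p, 3) for p in p_h], h)
```

Because there are no connections within a layer, all units in one layer can be updated independently once the other layer is fixed.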

## RBM Training Procedure

### Forward Pass

Inputs are converted into binary values, and the input vector is then fed to the network, where its values are combined with individual weights and corresponding biases.

![image.png](https://drive.google.com/uc?id=1D4eTRz_sOzFSg_L12R0o2HRcb--FQqyJ)

### Backward Pass

Activations are combined with an individual weight and a bias. Results are passed on to the visible layer.

![image.png](https://drive.google.com/uc?id=1uvM4Vig8UZJrO3dWlvxvabHeCML4k27E)

### Divergence Calculation

Input 𝑥 and samples 𝑥~ are compared in the visible layer. The parameters are then updated and the steps are repeated.

![image.png](https://drive.google.com/uc?id=1QO5xEU0AhknmuCDALvOrjDPFictbFeEp)


![image.png](https://drive.google.com/uc?id=1UhSf9bahBfPkJZ4wNrQysdLi1cExKy_O)
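The forward pass, backward pass, and parameter update can be combined into a single training step. The sketch below uses one-step contrastive divergence (CD-1), a common choice for training RBMs; it is an assumption that this matches the update shown in the images. The toy data, learning rate, and function names are hypothetical, and the reconstruction uses probabilities rather than sampled visible states.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def cd1_update(v0, W, a, b, rng, lr=0.1):
    """One CD-1 step on a single binary example; updates W, a, b in place."""
    ph0 = hidden_probs(v0, W, b)                       # forward pass
    h0 = [1 if rng.random() < p else 0 for p in ph0]   # sample hidden states
    pv1 = visible_probs(h0, W, a)                      # backward pass (reconstruction)
    ph1 = hidden_probs(pv1, W, b)                      # forward pass on the reconstruction
    for i in range(len(a)):
        for j in range(len(b)):
            # data-driven statistics minus reconstruction-driven statistics
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        a[i] += lr * (v0[i] - pv1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return sum((x - y) ** 2 for x, y in zip(v0, pv1))  # squared reconstruction error

rng = random.Random(0)
n_visible, n_hidden = 6, 2
W = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
a = [0.0] * n_visible
b = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]        # two toy binary patterns

errors = []
for epoch in range(500):
    errors.append(sum(cd1_update(v, W, a, b, rng) for v in data))

print(round(errors[0], 3), round(sum(errors[-50:]) / 50, 3))
```

The reconstruction error is only a rough progress indicator; it is not the quantity that contrastive divergence optimizes.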

## Deep Belief Networks (DBN)

* DBNs can be represented as stacked RBMs. The hidden layer of one RBM is the visible layer of the one above it.
* An unsupervised pretraining step is performed by training the layers one RBM at a time.
* The output for one set is used as the input for the next one.

![image.png](https://drive.google.com/uc?id=1uSxFTRyuzWarmK2oWxO3Jmli794ihIXA)
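The layer-by-layer pretraining described above can be sketched as a loop that trains one RBM, then passes its hidden activations to the next one. The CD-1 trainer, layer sizes, toy data, and function names here are assumptions for illustration, and supervised fine-tuning is not included.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(a))]

def train_rbm(data, n_hidden, rng, epochs=200, lr=0.1):
    """Train one RBM with CD-1 and return its parameters (W, a, b)."""
    n_visible = len(data[0])
    W = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    a, b = [0.0] * n_visible, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, b)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            pv1 = visible_probs(h0, W, a)
            ph1 = hidden_probs(pv1, W, b)
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
                a[i] += lr * (v0[i] - pv1[i])
            for j in range(n_hidden):
                b[j] += lr * (ph0[j] - ph1[j])
    return W, a, b

def train_dbn(data, layer_sizes, rng):
    """Greedy layer-wise pretraining: each RBM is trained on the layer below it."""
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(layer_input, n_hidden, rng)
        layers.append((W, a, b))
        # The hidden activations of this RBM become the next RBM's visible data.
        layer_input = [hidden_probs(v, W, b) for v in layer_input]
    return layers, layer_input

rng = random.Random(0)
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]   # toy binary patterns
layers, top_features = train_dbn(data, layer_sizes=[4, 2], rng=rng)
print([(len(W), len(W[0])) for W, _, _ in layers])
```

In this sketch the second RBM receives hidden probabilities rather than sampled binary states as its input, which is one of several possible choices.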

### Supervised Fine Tuning of DBNs

* The network can be further optimized by gradient descent with respect to a supervised training criterion.
* The parameters are slightly updated as a small set of labeled samples is introduced.

![image.png](https://drive.google.com/uc?id=1DtSdH-IBNY2udrOW5ANwJijjcjmnnO9v)

### DBN Energy Equations


![image.png](https://drive.google.com/uc?id=1piIAeScisYpOHLHN72KjUr8PY7xYUwA6)

### RBM vs. DBN

#### RBM <br>![image.png](https://drive.google.com/uc?id=19srGI4T3vL6hHQiK5LUgtc8DSDIVTTGl)

#### DBN <br>![image.png](https://drive.google.com/uc?id=1K3mj8YrX2Q6N70H0Wv7HVsFwf3Em6fgm)

### Application of DBN

* **Image recognition**<br>
DBNs can be used in image recognition, where the input is a picture and the output is its category. This can prove useful in the healthcare sector, where a medical condition can be diagnosed on the basis of photographs. For example, automated classification of pathogens could reduce the reliance on specialists during epidemics.

* **Video recognition**<br>
DBNs help in video recognition, where they work like vision: the purpose is to find meaning in data that is present in the form of video. For example, identifying a person's gesture or finding an object. 

* **Motion-capture data**<br>
Motion-capture data is used to track the movement of people or objects, and DBNs help model this data. Motion capture relies not only on the appearance of the object or person but also on velocity and distance. Motion capture is widely used in filmmaking and video game development.

## Convolutional RBM

* Convolutional RBM is based on the idea of translation invariance.
* It can be considered as a variant of RBM.
* It has 𝑁_𝑣 x 𝑁_𝑣 visible units.
* It has 𝑁_𝑘 x 𝑁_𝑘 hidden units per group.
* Each group has a 𝑁_𝑤 x 𝑁_𝑤 filter.
* A bias 𝑏^𝑘 is associated with each hidden group.

### Feature Maps in CRBM

* The hidden units are partitioned into K submatrices called feature maps (H1, H2, . . . , HK).
* Each hidden unit represents the presence of a specific feature.

![image.png](https://drive.google.com/uc?id=1KYic-4Mm-RYfxbdh65n2QdM1gt2r-0TJ)

### Convolving Filters and Hidden Units

One filter is convolved against each hidden group.


![image.png](https://drive.google.com/uc?id=1Xpc42_nyRUoM2O-gF2KYmDzd_qY1xZx5)

![image.png](https://drive.google.com/uc?id=1P0EG_ubvk62gqtaNebT3amgUxFBdsSqH)

![image.png](https://drive.google.com/uc?id=1NkDwy2_2dnG4wRKLLUcNzTd4ER4Mbv1v)






![image.png](https://drive.google.com/uc?id=1z-rLj9ZgUnJBukL-I5hoNfJwmkf9YbVW)

![image.png](https://drive.google.com/uc?id=1mTAKJ25sLQYn--OpB0Asg14Og3cvYeHP)
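As a rough sketch of how one hidden group could be computed, the code below slides a single filter over the visible layer and applies a sigmoid at each position. It assumes only "valid" filter positions are used, so that the group has side length 𝑁_𝑘 = 𝑁_𝑣 - 𝑁_𝑤 + 1; the image, filter, bias, and function name are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_group_probs(visible, filt, bias):
    """Slide an N_w x N_w filter over an N_v x N_v image (valid positions only)."""
    n_v, n_w = len(visible), len(filt)
    n_k = n_v - n_w + 1                      # assumed side length of the hidden group
    return [[sigmoid(bias + sum(filt[r][c] * visible[i + r][j + c]
                                for r in range(n_w) for c in range(n_w)))
             for j in range(n_k)]
            for i in range(n_k)]

visible = [[0, 0, 0, 0],                     # hypothetical 4 x 4 binary image
           [0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 0, 0, 0]]
filt = [[1.0, 1.0],                          # hypothetical 2 x 2 filter
        [1.0, 1.0]]

probs = hidden_group_probs(visible, filt, bias=-2.0)
for row in probs:
    print([round(p, 3) for p in row])
```

Every hidden unit in the group shares the same filter and bias, so the same feature is detected wherever it appears in the image.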

### Energy in CRBM

The energy in CRBM is given in the below equation:


![image.png](https://drive.google.com/uc?id=1gbKGehYc7gOHqFyI3iUVNevathJ27K84)

### Fully Connected CRBM

CRBM is used as the building block of the local feature detector hierarchy.


![image.png](https://drive.google.com/uc?id=1nm32pmzxtlyzJrLW4RD0F9l21EALfkR8)

Max pooling can be performed to subsample the features in non-overlapping image regions.
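A minimal sketch of max pooling over non-overlapping regions is shown below. It uses plain deterministic max pooling on a hypothetical feature map; a CRBM implementation may instead use a probabilistic variant, which is not shown here.

```python
def max_pool(feature_map, size):
    """Take the maximum over each non-overlapping size x size block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[r + dr][c + dc]
                 for dr in range(size) for dc in range(size))
             for c in range(0, cols - size + 1, size)]
            for r in range(0, rows - size + 1, size)]

feature_map = [[0.1, 0.9, 0.2, 0.3],   # hypothetical 4 x 4 feature map
               [0.4, 0.5, 0.8, 0.1],
               [0.7, 0.2, 0.1, 0.0],
               [0.3, 0.6, 0.4, 0.5]]

pooled = max_pool(feature_map, size=2)
print(pooled)
```

Each 2 x 2 block is reduced to a single value, so the pooled map has half the height and width of the original.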

## Convolutional DBN (CDBN)

* Several max-pooling CRBMs are stacked together 
* The model scales gracefully to realistic image sizes
* CRBMs are translation invariant 

### Training CDBN



![image.png](https://drive.google.com/uc?id=14ax2R0rcJeog8hD5136t3cJCQX5dkHnc)

## Knowledge Check
Click [here](https://drive.google.com/open?id=1wOEiMHF4Oo0A9Vvn6WRUlP3X8f51bheRl3xtj8stcew) for knowledge check

### Key Takeaways

* RBMs are shallow neural networks and try to reconstruct the input data.
* The learning process consists of several forward and backward passes.
* Convolutional RBM is based on the idea of translation invariance.

## Lesson-End Project

### Build a Movie Recommendation System Using RBM

**Problem Scenario**: You work in a content streaming company where you need to create a recommendation system for movies using RBM.

**Objective** : Create a movie recommendation system using RBM with PyTorch.
<br>
<br>
Link to Dataset: https://www.dropbox.com/sh/7k8nymvfww5xtww/AAAQvkeCoiVbTYE8HJN1STxSa?dl=0