In [3]:
import tensorflow as tf

### Why use TensorFlow for implementing Recommender Systems?

TensorFlow can compute the derivatives of a cost function automatically, so you can implement gradient-based learning without working out the calculus by hand.

### Example of calculating a derivative of a Cost Function

We will look at an example of a cost function:

J = (w*x - y) ** 2

Both x and y are set to 1, and w is initialised to 3. At each step we compute the derivative of the cost function with respect to w and use it to update the value of w.
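For reference, here is the derivative that TensorFlow works out for us, written by hand. This is only a worked check of the first update, using the learning rate alpha = 0.1 from the next cell:

$$
\frac{dJ}{dw} = 2\,(wx - y)\,x, \qquad w \leftarrow w - \alpha\,\frac{dJ}{dw}
$$

With x = y = 1 and w = 3, the derivative is 2 * (3 - 1) * 1 = 4, so the first update gives w = 3 - 0.1 * 4 = 2.6.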

In [5]:
w = tf.Variable(3.0)
x = y = 1.0

alpha = 0.1
epochs = 30

for iter in range(epochs):
    
    # To enable automatic differentiation, compute the cost inside tf.GradientTape().
    # The 'tape' records how costJ is calculated, so that it can later
    # calculate the gradient of costJ with respect to w.
    with tf.GradientTape() as tape:
        f_wb = w*x
        costJ = (f_wb - y) ** 2
    
    # Use the tape to calculate the gradient of costJ with respect to w
    [dJ_dW] = tape.gradient(costJ, [w])
    
    # Update the value of w
    w.assign_add(-alpha * dJ_dW)

In [7]:
print(w.numpy()) # Value of w after training

1.002476
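As a sanity check on the result above, the same loop can be written in plain Python with the derivative worked out by hand, dJ/dw = 2 * (w*x - y) * x. This is only a minimal sketch to confirm that the tape computes the same gradient. The `closed_form` line is an extra check that is not in the original notebook: with x = y = 1 and alpha = 0.1, each step shrinks the error (w - 1) by a factor of 0.8.

```python
# Same toy problem as the TensorFlow cell, with the derivative worked out by hand
w = 3.0
x = y = 1.0
alpha = 0.1
epochs = 30

for _ in range(epochs):
    # Analytic derivative of (w*x - y) ** 2 with respect to w
    dJ_dw = 2 * (w * x - y) * x
    w -= alpha * dJ_dw

# Each step multiplies the error (w - 1) by 0.8, starting from an error of 2
closed_form = 1 + 2 * 0.8 ** epochs

print(round(w, 6), round(closed_form, 6))
```

Both values round to 1.002476, which agrees with the value TensorFlow printed.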


Automatic differentiation is often abbreviated to Auto Diff. It is sometimes also called Auto Grad, but Auto Diff is the more accurate term, because Autograd is the name of a specific software package that performs automatic differentiation. Auto Diff is not limited to plain gradient descent: the gradients it produces can also be passed to other optimizers in TensorFlow, such as Adam.

### Implementation of calculation of Derivatives and Adam optimizer for Collaborative Filtering

In [8]:
# Instantiate the optimizer with learning rate 0.1
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)

epochs = 200
for iter in range(epochs):
    with tf.GradientTape() as tape:
        # Compute the collaborative filtering cost inside the tape so that it is recorded.
        # cofiCostFuncV is not a TensorFlow function; it is assumed to be defined elsewhere in the notebook.
        cost_value = cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda_reg)

    # Calculate the gradients of the cost function with respect to X, W, and b
    grads = tape.gradient(cost_value, [X, W, b])

    # Update the values of X, W, and b (these must be tf.Variable objects)
    optimizer.apply_gradients(zip(grads, [X, W, b]))

You may be wondering why we didn't feed the data into a Dense layer of a neural network and train the parameters that way. The reason is that the collaborative filtering model and its cost function don't fit neatly into a Dense layer, so we instead write the cost function ourselves and use Auto Diff to calculate its derivatives and drive the optimizer.
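The training loop above calls `cofiCostFuncV` without showing its body. The sketch below is a hypothetical stand-in, not the notebook's actual function. It assumes the usual regularized squared-error cost over rated entries only, takes a shorter argument list, and runs on a tiny made-up ratings matrix so that the whole loop can be executed end to end. The names `cofi_cost_sketch`, `num_features`, and all the data are illustrative assumptions.

```python
import tensorflow as tf

def cofi_cost_sketch(X, W, b, Y, R, lambda_reg):
    """Hypothetical stand-in for cofiCostFuncV (assumed form, not the original)."""
    # Predicted rating for every (movie, user) pair: X @ W^T + b
    pred = tf.linalg.matmul(X, W, transpose_b=True) + b
    # R is 1 where a rating exists and 0 otherwise, so unrated entries add no error
    err = (pred - Y) * R
    reg = tf.reduce_sum(X ** 2) + tf.reduce_sum(W ** 2)
    return 0.5 * tf.reduce_sum(err ** 2) + (lambda_reg / 2.0) * reg

# Made-up data: 4 movies (rows), 3 users (columns), 2 latent features
num_movies, num_users, num_features = 4, 3, 2
Y = tf.constant([[5., 0., 1.],
                 [4., 0., 0.],
                 [0., 5., 4.],
                 [1., 4., 0.]])
R = tf.constant([[1., 0., 1.],
                 [1., 0., 0.],
                 [0., 1., 1.],
                 [1., 1., 0.]])
lambda_reg = 1.0

# Parameters must be tf.Variable objects so the tape tracks them and the optimizer can update them
tf.random.set_seed(0)
X = tf.Variable(tf.random.normal((num_movies, num_features)))
W = tf.Variable(tf.random.normal((num_users, num_features)))
b = tf.Variable(tf.zeros((1, num_users)))

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-1)
initial_cost = float(cofi_cost_sketch(X, W, b, Y, R, lambda_reg))

for _ in range(200):
    with tf.GradientTape() as tape:
        cost_value = cofi_cost_sketch(X, W, b, Y, R, lambda_reg)
    grads = tape.gradient(cost_value, [X, W, b])
    optimizer.apply_gradients(zip(grads, [X, W, b]))

final_cost = float(cofi_cost_sketch(X, W, b, Y, R, lambda_reg))
print(initial_cost, final_cost)
```

The cost is written entirely with TensorFlow operations so that the tape can differentiate it. A version written with NumPy calls would not be recorded by the tape.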