# <img align="left" src="./images/movie_camera.png"     style=" width:40px;  " > Collaborative Filtering Recommender Systems

In this exercise, you will implement collaborative filtering to build a recommender system for movies. 

# <img align="left" src="./images/film_reel.png"     style=" width:40px;  " > Outline
- [ 1 - Notation](#1)
- [ 2 - Recommender Systems](#2)
- [ 3 - Movie ratings dataset](#3)
- [ 4 - Collaborative filtering learning algorithm](#4)
  - [ 4.1 Collaborative filtering cost function](#4.1)
    - [ Exercise 1](#ex01)
- [ 5 - Learning movie recommendations](#5)
- [ 6 - Recommendations](#6)
- [ 7 - Congratulations!](#7)




In [3]:
import numpy as np
import copy, math
import tensorflow as tf
from tensorflow import keras
from recsys_utils import *
from numpy import loadtxt
from numpy import savetxt

<a name="2"></a>
## 2 - Recommender Systems <img align="left" src="./images/film_rating.png" style=" width:40px;  " >
In this lab, you will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
The goal of a collaborative filtering recommender system is to generate two sets of vectors: for each user, a 'parameter vector' that embodies the user's movie tastes; for each movie, a feature vector of the same size that embodies some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.

The diagram below details how these vectors are learned.

<figure>
   <img src="./images/ColabFilterLearn.PNG"  style="width:740px;height:250px;" >
</figure>

Existing ratings are provided in matrix form as shown. $Y$ contains ratings; 0.5 to 5 inclusive in 0.5 steps. 0 if the movie has not been rated. $R$ has a 1 where movies have been rated. Movies are in rows, users in columns. Each user has a parameter vector $w^{user}$ and bias. Each movie has a feature vector $x^{movie}$. These vectors are simultaneously learned by using the existing user/movie ratings as training data. One training example is shown above: $\mathbf{w}^{(1)} \cdot \mathbf{x}^{(1)} + b^{(1)} = 4$. It is worth noting that the feature vector $x^{movie}$ must satisfy all the users while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach - all the users collaborate to generate the rating set. 

<figure>
   <img src="./images/ColabFilterUse.PNG"  style="width:640px;height:250px;" >
</figure>

Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero.


In this exercise, you will implement the function `compute_cost`, which computes the collaborative filtering
objective function. After implementing the objective function, you will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to detail the data set and data structures that will be used in the lab.

<a name="3"></a>
## 3 - Movie ratings dataset <img align="left" src="./images/film_rating.png"     style=" width:40px;  " >
The data set is derived from the [MovieLens "ml-latest-small"](https://grouplens.org/datasets/movielens/latest/) dataset.   
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]

The original dataset has  9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies. 

Below, you will load the movie dataset into the variables $Y$ and $R$.

The matrix $Y$ (an $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$. The matrix $R$ is a binary-valued indicator matrix, where $R(i,j) = 1$ if user $j$ gave a rating to movie $i$, and $R(i,j)=0$ otherwise. 
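
To make the relationship between $Y$ and $R$ concrete, here is a minimal sketch on a made-up ratings matrix (the values and the names `Y_toy` and `R_toy` are illustrative, not taken from the dataset). It assumes that a stored 0 always means "not rated", which holds here because valid ratings start at 0.5.

```python
import numpy as np

# Toy ratings: 3 movies (rows) x 4 users (columns); 0 means "not rated"
Y_toy = np.array([[5.0, 0.0, 3.5, 0.0],
                  [0.0, 0.0, 4.0, 1.0],
                  [2.5, 0.5, 0.0, 0.0]])

# Indicator matrix: 1 where a rating exists, 0 otherwise
R_toy = (Y_toy != 0).astype(int)

# Average rating of movie 0, using only the users who rated it
mean_movie0 = np.mean(Y_toy[0, R_toy[0, :].astype(bool)])

print(R_toy)
print(f"Average rating for movie 0: {mean_movie0:0.2f}")
```

Masking with $R$ before averaging matters: a plain mean over the row would count the unrated zeros as real ratings.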

Throughout this part of the exercise, you will also be working with the
matrices, $\mathbf{X}$, $\mathbf{W}$ and $\mathbf{b}$: 

$$\mathbf{X} = 
\begin{bmatrix}
--- (\mathbf{x}^{(0)})^T --- \\
--- (\mathbf{x}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{x}^{(n_m-1)})^T --- \\
\end{bmatrix} , \quad
\mathbf{W} = 
\begin{bmatrix}
--- (\mathbf{w}^{(0)})^T --- \\
--- (\mathbf{w}^{(1)})^T --- \\
\vdots \\
--- (\mathbf{w}^{(n_u-1)})^T --- \\
\end{bmatrix},\quad
\mathbf{ b} = 
\begin{bmatrix}
 b^{(0)}  \\
 b^{(1)} \\
\vdots \\
b^{(n_u-1)} \\
\end{bmatrix}\quad
$$ 

The $i$-th row of $\mathbf{X}$ corresponds to the
feature vector $x^{(i)}$ for the $i$-th movie, and the $j$-th row of
$\mathbf{W}$ corresponds to one parameter vector $\mathbf{w}^{(j)}$, for the
$j$-th user. Both $x^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$-dimensional
vectors. For the purposes of this exercise, you will use $n=10$, and
therefore, $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements.
Correspondingly, $\mathbf{X}$ is a
$n_m \times 10$ matrix and $\mathbf{W}$ is a $n_u \times 10$ matrix.
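
The shapes described above can be checked on a small random example. This is only a sketch: the toy sizes and the names `X_toy`, `W_toy`, `b_toy` are assumptions for illustration, and the values are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 3, 10             # toy sizes; the lab uses n_m=4778, n_u=443, n=10

X_toy = rng.normal(size=(n_m, n))  # one row of features per movie
W_toy = rng.normal(size=(n_u, n))  # one row of parameters per user
b_toy = rng.normal(size=(1, n_u))  # one bias per user

# All predictions at once: (n_m, n) @ (n, n_u) -> (n_m, n_u); b broadcasts across rows
P = X_toy @ W_toy.T + b_toy

# Entry (i, j) matches the per-pair formula w^(j) . x^(i) + b^(j)
i, j = 2, 1
single = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j]

print(P.shape, np.isclose(P[i, j], single))
```

Storing $\mathbf{b}$ with shape `(1, n_u)` is what lets NumPy broadcasting add each user's bias down that user's column.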

We will start by loading the movie ratings dataset to understand the structure of the data.
We will load $Y$ and $R$ with the movie dataset.  
We'll also load $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we'll use pre-computed values to develop the cost model.

In [4]:
#Load data
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()
Y, R = load_ratings_small()

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (4778, 443) R (4778, 443)
X (4778, 10)
W (443, 10)
b (1, 443)
num_features 10
num_movies 4778
num_users 443


In [5]:
#
# From the matrix, we can compute statistics like average rating.
#
tsmean =  np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5" )

Average rating for movie 1 : 3.400 / 5


<a name="4"></a>
## 4 - Collaborative filtering learning algorithm <img align="left" src="./images/film_filter.png"     style=" width:40px;  " >

Now, you will begin implementing the collaborative filtering learning
algorithm. You will start by implementing the objective function. 

The collaborative filtering algorithm in the setting of movie
recommendations considers a set of $n$-dimensional vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ and $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$, together with scalar biases $b^{(0)},...,b^{(n_u-1)}$, where the
model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$. Given a dataset that consists of
a set of ratings produced by some users on some movies, you wish to
learn the parameters $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},
\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimize
the squared error).

You will complete the code in `compute_cost` to compute the cost
function for collaborative filtering. 


<a name="4.1"></a>
### 4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\underbrace{
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
}_{regularization}
\tag{1}$$
The first summation in (1) is "for all $i$, $j$ where $r(i,j)$ equals $1$" and could be written:

$$
= \frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\text{regularization}
$$

You should now write `compute_cost` (the collaborative filtering cost function) to return this cost.

<a name="ex01"></a>
### Exercise 1

**For loop Implementation:**   
Start by implementing the cost function using for loops.
Consider developing the cost function in two steps. First, develop the cost function without regularization. A test case that does not include regularization is provided below to test your implementation. Once that is working, add regularization and run the tests that include regularization.  Note that you should be accumulating the cost for user $j$ and movie $i$ only if $R(i,j) = 1$.

In [6]:
def compute_cost(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    f_wb = np.matmul (X,W.T) + b
    f_wb_yr = np.square (f_wb - Y) * R
    J = 0.5 * (np.sum(f_wb_yr) + lambda_ * (np.linalg.norm(W)**2 + np.linalg.norm(X)**2))
    return J
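
The exercise text asks for a for-loop version first, while the cell above is already written with matrix operations. A loop-based sketch that follows equation (1) term by term might look like the following. The names `compute_cost_loops` and `compute_cost_vec` and the random toy data are illustrative assumptions, not part of the lab.

```python
import numpy as np

def compute_cost_loops(X, W, b, Y, R, lambda_):
    """Loop-based collaborative filtering cost, following equation (1)."""
    nm, nu = Y.shape
    J = 0.0
    for j in range(nu):
        for i in range(nm):
            if R[i, j] == 1:   # accumulate only over rated (movie, user) pairs
                err = np.dot(W[j, :], X[i, :]) + b[0, j] - Y[i, j]
                J += 0.5 * err ** 2
    # Regularization over all entries of W and X
    J += (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    return J

def compute_cost_vec(X, W, b, Y, R, lambda_):
    """Vectorized equivalent, used here only for comparison."""
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

# Small random test problem (made-up data, not the lab's dataset)
rng = np.random.default_rng(1)
X_s = rng.normal(size=(4, 3))
W_s = rng.normal(size=(5, 3))
b_s = rng.normal(size=(1, 5))
R_s = rng.integers(0, 2, size=(4, 5))
Y_s = rng.integers(1, 6, size=(4, 5)) * R_s

J_loops = compute_cost_loops(X_s, W_s, b_s, Y_s, R_s, 1.5)
J_vec = compute_cost_vec(X_s, W_s, b_s, Y_s, R_s, 1.5)
print(np.isclose(J_loops, J_vec))
```

Setting `lambda_` to 0 in the loop version gives the unregularized cost, which is a convenient first checkpoint when developing the function in two steps.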


**Vectorized Implementation**

It is important to create a vectorized implementation to compute $J$, since it will later be called many times during optimization. The linear algebra utilized is not the focus of this series, so the implementation is provided. If you are an expert in linear algebra, feel free to create your version without referencing the code below. 

Run the code below and verify that it produces the same results as the non-vectorized version.

In [7]:
def compute_cost_tf(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    
    #
    # We are using TensorFlow APIs to compute the cost.
    # We need to do this so that TensorFlow can track these operations when it needs to compute gradients using AutoDiff
    #
    f_wb_yr = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(f_wb_yr**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

<a name="5"></a>
## 5 - Learning movie recommendations <img align="left" src="./images/film_man_action.png" style=" width:40px;  " >
------------------------------

After you have finished implementing the collaborative filtering cost
function, you can start training your algorithm to make
movie recommendations for yourself. 

In the cell below, you can enter your own movie choices. The algorithm will then make recommendations for you! We have filled out some values according to our preferences, but after you have things working with our choices, you should change this to match your tastes.
A list of all movies in the dataset is in the file [movie list](data/small_movie_list.csv).

In [8]:
movieList, movieList_df = load_Movie_List_pd()

# Initialize my ratings
my_ratings = np.zeros(num_movies)          

#
# Create an array with the ratings given by you.
# Based on these ratings, we want the algorithm to predict what other movies you may like
# Check the file small_movie_list.csv for id of each movie in our dataset
#
my_ratings[2700] = 5   # Toy Story 3 (2010)
my_ratings[2609] = 2   # Persuasion (2007)
my_ratings[29]   = 4   # Road to El Dorado, The (2000)
my_ratings[79]   = 4   # X-Men (2000)
my_ratings[204]  = 3   # The Contenders (2001)
my_ratings[271]  = 1   # Cats & Dogs (2001)
my_ratings[610]  = 5   # Ghost Ship (2002)
my_ratings[837]  = 2   # The Rise and Fall of Gator (2002)
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[653]  = 4   # Lord of the Rings: The Two Towers, The (2002)
my_ratings[246]  = 5   # Shrek (2001)
my_ratings[2716] = 3   # Inception
my_ratings[1150] = 5   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[1051] = 5   # Harry Potter and the Prisoner of Azkaban (2004)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)

#
# Show the IDs of all movies for which you have provided a rating
#
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]
print (my_rated)

print('-- Movie ratings by Viji Sarathy ---')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}');

[29, 79, 204, 246, 271, 382, 610, 622, 653, 793, 837, 929, 988, 1051, 1150, 2609, 2700, 2716, 2925, 2937]
-- Movie ratings by Viji Sarathy ---
Rated 4.0 for  Road to El Dorado, The (2000)
Rated 4.0 for  X-Men (2000)
Rated 3.0 for  Series 7: The Contenders (2001)
Rated 5.0 for  Shrek (2001)
Rated 1.0 for  Cats & Dogs (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Ghost Ship (2002)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 4.0 for  Lord of the Rings: The Two Towers, The (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 2.0 for  Stoked: The Rise and Fall of Gator (2002)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 5.0 for  Harry Potter and the Prisoner of Azkaban (2004)
Rated 5.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 3.0 for  Ince

In [10]:
# Reload ratings and add new ratings
Y, R = load_ratings_small()
num_movies, num_users = Y.shape
num_features = 100

#
# Concatenate the ratings that you provided to the given data
#
Y = np.c_[my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)
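
`normalizeRatings` comes from `recsys_utils`, whose source is not shown here. As a rough sketch of what per-movie mean normalization of this kind could look like, the hypothetical function below subtracts each movie's mean rating, computed over rated entries only. The function name, the handling of movies with no ratings, and the toy data are all assumptions.

```python
import numpy as np

def normalize_ratings_sketch(Y, R):
    """
    Hypothetical stand-in for a helper like normalizeRatings:
    subtract each movie's mean rating, computed over rated entries only.
    """
    counts = np.sum(R, axis=1, keepdims=True)
    # Guard against movies with no ratings (assumption: their mean is treated as 0)
    Ymean = np.sum(Y * R, axis=1, keepdims=True) / np.maximum(counts, 1)
    Ynorm = (Y - Ymean) * R   # unrated entries stay at 0
    return Ynorm, Ymean

Y_toy = np.array([[5.0, 0.0, 3.0],
                  [0.0, 4.0, 0.0]])
R_toy = (Y_toy != 0).astype(int)

Ynorm_toy, Ymean_toy = normalize_ratings_sketch(Y_toy, R_toy)
print(Ymean_toy.ravel())
print(Ynorm_toy)
```

Returning the means with shape `(num_movies, 1)` lets them be added back to a `(num_movies, num_users)` prediction matrix by broadcasting.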

Let's prepare to train the model. Initialize the parameters and select the Adam optimizer.

In [11]:
#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

#
# Set Initial Parameters (W, X), use tf.Variable to track these variables
# W represents the matrix of "parameters" for the users
# X represents the matrix of "embeddings" for the movies (items)
# The parameter 'num_features' can be set to any value, but it is typically much smaller than 'num_users' and 'num_movies'
#
tf.random.set_seed(1234) 
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Let's now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$. 

The operations involved in learning $w$, $b$, and $x$ simultaneously do not fall into the typical 'layers' offered in the TensorFlow neural network package. Consequently, the flow used in Course 2 (Model, Compile(), Fit(), Predict()) is not directly applicable. Instead, we can use a custom training loop.

Recall from earlier labs the steps of gradient descent.
- repeat until convergence:
    - compute forward pass
    - compute the derivatives of the loss relative to parameters
    - update the parameters using the learning rate and the computed derivatives 
    
TensorFlow has the marvelous capability of calculating the derivatives for you. This is shown below. Within the `tf.GradientTape()` section, operations on Tensorflow Variables are tracked. When `tape.gradient()` is later called, it will return the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer. 
This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. Further information can be found by investigating "custom training loops" within the framework of interest.
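
The gradient descent steps listed above can also be illustrated without any framework. The sketch below fits a single parameter on made-up data so that each step is visible. The hand-written derivative stands in for what an automatic differentiation tool would compute, and the toy problem and learning rate are assumptions chosen for illustration.

```python
import numpy as np

# Toy problem: find w minimizing J(w) = 0.5 * sum((w * x - y)^2)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])    # generated with w = 2

w = 0.0
learning_rate = 0.05
for step in range(200):
    # 1. forward pass
    err = w * x - y
    cost = 0.5 * np.sum(err ** 2)
    # 2. derivative of the cost with respect to w (written by hand here)
    dJ_dw = np.sum(err * x)
    # 3. parameter update
    w = w - learning_rate * dJ_dw

print(f"w after training: {w:0.4f}")
```

In the collaborative filtering setting the same three steps apply, with $\mathbf{X}$, $\mathbf{W}$, and $\mathbf{b}$ all updated together on every iteration.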
    


In [12]:
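# Note: this save runs before the training loop below, so the file holds the initial (untrained) values
outfile="./data/weights-and-features-tf.npz"
print (f"Saving computed weights and features to {outfile}")
np.savez (outfile, weights=W, bias=b, features=X)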
              
iterations = 500
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost 
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = compute_cost_tf(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient(cost_value, [X,W,b])

    W_np = W.numpy()
    X_np = X.numpy()
    b_np = b.numpy()
    J = compute_cost_tf(X_np, W_np, b_np, Ynorm, R, lambda_)
    
    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f},  {J:0.1f}")

Saving computed weights and features to ./data/weights-and-features-tf.npz
Training loss at iteration 0: 2321300.0,  2321300.0
Training loss at iteration 20: 136174.9,  136174.9
Training loss at iteration 40: 51870.9,  51870.9
Training loss at iteration 60: 24603.1,  24603.1
Training loss at iteration 80: 13633.0,  13633.0
Training loss at iteration 100: 8489.3,  8489.3
Training l

In [13]:
def compute_gradient(X, W, b, Y, R, RU, RM, lambda_):
    """
    Computes gradients of the collaborative filtering cost, scaled by rating counts:
    the w and b gradients are divided by each user's number of ratings (RU),
    the x gradients by each movie's number of ratings (RM)

    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      RU (ndarray (1, num_users))          : number of ratings given by each user
      RM (ndarray (1, num_movies))         : number of ratings given to each movie
      lambda_ (float): regularization parameter
    Returns:
      dj_dw (ndarray (num_features,num_users)):  The gradient of the cost w.r.t. the parameters w.
      dj_dx (ndarray (num_features,num_movies)): The gradient of the cost w.r.t. the features x.
      dj_db (1, num_users):                      The gradient of the cost w.r.t. the parameter b.

    """
    nm, nu = Y.shape
    delta = (np.matmul (X,W.T) + b - Y) * R

    dj_dw = np.divide (np.matmul(X.T,delta), RU) + np.divide ((lambda_*W.T), RU)
    dj_dx = np.divide (np.transpose(np.matmul(delta,W)), RM) + np.divide ((lambda_*X.T), RM)
    dj_db = np.divide (np.sum(delta,axis=0),RU)

    return dj_dw, dj_dx, dj_db
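
One way to sanity-check hand-written gradient code is to compare it against finite differences of the cost. The sketch below does this for the plain gradient of the cost in equation (1), that is, without the division by rating counts used in the cell above. The helper names (`cost`, `analytic_grads`, `numeric_grad`) and the random toy data are illustrative assumptions.

```python
import numpy as np

def cost(X, W, b, Y, R, lambda_):
    err = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))

def analytic_grads(X, W, b, Y, R, lambda_):
    """Gradients of equation (1) with respect to W, X and b."""
    delta = (X @ W.T + b - Y) * R                 # (num_movies, num_users)
    dj_dw = delta.T @ X + lambda_ * W             # (num_users, num_features)
    dj_dx = delta @ W + lambda_ * X               # (num_movies, num_features)
    dj_db = np.sum(delta, axis=0, keepdims=True)  # (1, num_users)
    return dj_dw, dj_dx, dj_db

def numeric_grad(f, A, eps=1e-6):
    """Central finite differences of the scalar function f with respect to array A."""
    g = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        f_plus = f()
        A[idx] = old - eps
        f_minus = f()
        A[idx] = old          # restore the original entry
        g[idx] = (f_plus - f_minus) / (2 * eps)
    return g

# Small random test problem (made-up data, not the lab's dataset)
rng = np.random.default_rng(2)
X_s = rng.normal(size=(4, 3))
W_s = rng.normal(size=(5, 3))
b_s = rng.normal(size=(1, 5))
R_s = rng.integers(0, 2, size=(4, 5))
Y_s = rng.integers(1, 6, size=(4, 5)) * R_s
f = lambda: cost(X_s, W_s, b_s, Y_s, R_s, 1.0)

dj_dw, dj_dx, dj_db = analytic_grads(X_s, W_s, b_s, Y_s, R_s, 1.0)
ok_w = np.allclose(dj_dw, numeric_grad(f, W_s), atol=1e-4)
ok_x = np.allclose(dj_dx, numeric_grad(f, X_s), atol=1e-4)
ok_b = np.allclose(dj_db, numeric_grad(f, b_s), atol=1e-4)
print(ok_w, ok_x, ok_b)
```

Note that the gradients here have shapes matching `W`, `X`, and `b` directly, whereas the cell above returns `dj_dw` and `dj_dx` transposed.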

In [14]:
# 
# Save the old results
#
W_old = W.numpy()
X_old = X.numpy()
b_old = b.numpy()

In [17]:

#
# Reload ratings and add new ratings
#
Y, R = load_ratings_small()

#
# Concatenate the ratings that you provided to the given data
#
Y = np.c_[my_ratings, Y]
R = np.c_[(my_ratings != 0).astype(int), R]
RU = np.sum(R,axis=0,keepdims=True)
RM = np.transpose(np.sum(R,axis=1,keepdims=True))

#
# Normalize the Dataset
#
Ynorm, Ymean = normalizeRatings(Y, R)

#
#  Useful Parameters
#
num_movies, num_users = Y.shape
num_features = 100

#
# Set Initial Parameters (W, X), use tf.Variable to track these variables
#
tf.random.set_seed(1234) # for consistent results
W_tf = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X_tf = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b_tf = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

#
# Convert the TensorFlow arrays to Numpy arrays
#
W = W_tf.numpy()
X = X_tf.numpy()
b = b_tf.numpy()

#
# Read the saved weights and features from the file
#
start_iter = 0
#infile="./data/weights-and-features-" + str(start_iter) + ".npz"
#print (f"Loading computed weights and features at {start_iter:4d} from {infile}")
#npzfile = np.load(infile)
#W = npzfile["weights"]
#b = npzfile["bias"]
#X = npzfile["features"]


iterations = 500
lambda_ = 1
learning_rate = 1e-2
for iter in range(iterations):
    # Calculate the gradient 
    dj_dw, dj_dx, dj_db = compute_gradient(X, W, b, Ynorm, R, RU, RM, lambda_)
    
    # Update Parameters using w, b, x, alpha and gradient
    W = W - learning_rate * np.transpose (dj_dw)            
    X = X - learning_rate * np.transpose (dj_dx)            
    b = b - learning_rate * dj_db 
    
    # Compute the cost J at each iteration
    J = compute_cost(X, W, b, Ynorm, R, lambda_)
    if iter % 100 == 0:
        print(f"Iteration {iter:4d}: Cost {J:8.2f}   ")

    # Every 20 iterations, save the computed weights & features to a distinct file
    total_iter = start_iter + iter
    if total_iter % 20 == 0:
        outfile="./data/weights-and-features-" + str(total_iter) + ".npz"
        print (f"Saving computed weights and features at {total_iter:4d} to {outfile}")
        np.savez (outfile, weights=W, bias=b, features=X)
              
    

Iteration    0: Cost 1823218.30   
Saving computed weights and features at    0 to ./data/weights-and-features-0.npz
Saving computed weights and features at   20 to ./data/weights-and-features-20.npz
Saving computed weights and features at   40 to ./data/weights-and-features-40.npz
Saving computed weights and features at   60 to ./data/weights-and-features-60.npz
Saving computed weights and features at   80 to ./data/weights-and-features-80.npz
Iteration  100: Cost 145102.74   
Saving computed weights and features at  100 to ./data/weights-and-features-100.npz
Saving computed weights and features at  120 to ./data/weights-and-features-120.npz
Saving computed weights and features at  140 to ./data/weights-and-features-140.npz
Saving computed weights and features at  160 to ./data/weights-and-features-160.npz
Saving computed weights and features at  180 to ./data/weights-and-features-180.npz
Iteration  200: Cost 86162.49   
Saving computed weights and features at  200 to ./data/weights-a

<a name="6"></a>
## 6 - Recommendations
Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as `my_ratings[]` above. To predict the rating of movie $i$ for user $j$, you compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$. This can be computed for all ratings using matrix multiplication.

In [14]:
#
# Read the saved weights and features from the file
#
start_iter = 10000
infile="./data/weights-and-features-" + str(start_iter) + ".npz"
print (f"Loading computed weights and features at {start_iter:4d} from {infile}")
npzfile = np.load(infile)
W = npzfile["weights"]
b = npzfile["bias"]
X = npzfile["features"]


# Make a prediction using trained weights and biases
#p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()
p = np.matmul(X, W.T) + b

#
# When using mean normalization, we subtract the mean rating for each movie from all its ratings
# Before making predictions, we will have to restore these values to the original values.
#
pm = p + Ymean

user_id = 0
user_predictions = pm[:,user_id]
user_ratings = Y[:,user_id]

#
# sort predictions in descending order
#
indices = tf.argsort(user_predictions, direction='DESCENDING')
num_movies, num_users = Y.shape

#
# Predict ratings for up to 25 randomly chosen movies (movies you already rated are skipped)
#
for i in range(25):
    movie_index = np.random.randint(0,num_movies)
    j = indices[movie_index]
    if j not in my_rated:
        print(f'Predicting rating {user_predictions[j]:0.2f} for movie {movieList[j]}')

        
print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(user_ratings)):
    if user_ratings[i] > 0:
        print(f'Original {user_ratings[i]}, Predicted {user_predictions[i]:0.2f} for {movieList[i]}')

Loading computed weights and features at 10000 from ./data/weights-and-features-10000.npz
Predicting rating 2.84 for movie Lovely & Amazing (2001)
Predicting rating 3.31 for movie Cemetery Junction (2010)
Predicting rating 3.81 for movie Trials of Henry Kissinger, The (2002)
Predicting rating 2.91 for movie Number 23, The (2007)
Predicting rating 4.48 for movie Doctor Who: The Time of the Doctor (2013)
Predicting rating 2.02 for movie House of Mirth, The (2000)
Predicting rating 2.79 for movie Batman: Assault on Arkham (2014)
Predicting rating 3.34 for movie Pride & Prejudice (2005)
Predicting rating 3.43 for movie Moana (2016)
Predicting rating 2.31 for movie Someone Marry Barry (2014)
Predicting rating 2.79 for movie Ip Man 2 (2010)
Predicting rating 0.21 for movie Ben-hur (2016)
Predicting rating 3.59 for movie Notes on a Scandal (2006)
Predicting rating 3.79 for movie Down Terrace (2009)
Predicting rating 2.18 for movie Clash of the Titans (2010)
Predicting rating 3.83 for movie Fr

In practice, additional information can be utilized to enhance our predictions. Above, the predicted ratings for the first few hundred movies lie in a small range. We can augment the above by selecting, from those top movies, the ones that have high average ratings and more than 10 ratings. This section uses a [Pandas](https://pandas.pydata.org/) data frame which has many handy sorting features.

In [19]:
filter=(movieList_df["number of ratings"] > 10)
movieList_df["pred"] = user_predictions
movieList_df = movieList_df.reindex(columns=["pred", "mean rating", "number of ratings", "title"])
movieList_df.loc[indices[:300]].loc[filter].sort_values("mean rating", ascending=False)

| | pred | mean rating | number of ratings | title |
|---|---|---|---|---|
| 155 | 4.62001 | 4.155914 | 93 | Snatch (2000) |
| 929 | 4.924036 | 4.118919 | 185 | Lord of the Rings: The Return of the King, The... |
| 2700 | 4.896067 | 4.109091 | 55 | Toy Story 3 (2010) |
| 393 | 4.346708 | 4.106061 | 198 | Lord of the Rings: The Fellowship of the Ring,... |
| 2804 | 4.446866 | 3.989362 | 47 | Harry Potter and the Deathly Hallows: Part 1 (... |
| 3283 | 4.498875 | 3.982143 | 28 | Argo (2012) |
| 773 | 4.511342 | 3.960993 | 141 | Finding Nemo (2003) |
| 1051 | 4.893116 | 3.913978 | 93 | Harry Potter and the Prisoner of Azkaban (2004) |
| 246 | 4.928474 | 3.867647 | 170 | Shrek (2001) |
| 1930 | 4.411047 | 3.862069 | 58 | Harry Potter and the Order of the Phoenix (2007) |
