##### Copyright 2020 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using list optimization in model training

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/recommenders/examples/list_optimization"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/recommenders/blob/main/docs/examples/list_optimization.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/tensorflow/recommenders/blob/main/docs/examples/list_optimization.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/recommenders/docs/examples/list_optimization.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

In [the basic ranking tutorial](basic_ranking), we trained a model that can predict ratings for user/movie pairs. The model was trained to minimize the mean squared error of predicted ratings.

However, optimizing the model's predictions on individual movies is not necessarily the best method for training ranking models. We do not need ranking models to predict scores with great accuracy. Instead, we care more about the ability of the models to generate an ordered list of items that matches the user's preference ordering.

Instead of optimizing the model's predictions on individual query/item pairs, we can optimize the model's ranking of the list as a whole. This method is called list optimization and it can be useful in training ranking models.
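To see why pointwise accuracy and ranking quality can diverge, consider the following toy sketch. The ratings and predictions are made-up values chosen for illustration, and `mse` and `same_order` are hypothetical helpers rather than library functions.

```python
import numpy as np

# True ratings a user gave to three movies (illustrative values).
true_ratings = np.array([5.0, 4.0, 3.0])

# Predictions A: numerically close to the ratings, but the top two are swapped.
preds_a = np.array([4.4, 4.6, 3.0])
# Predictions B: far off in scale, but in exactly the right order.
preds_b = np.array([3.0, 2.0, 1.0])


def mse(y_true, y_pred):
    """Mean squared error between two 1-D arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


def same_order(y_true, y_pred):
    """True if sorting by prediction gives the same order as sorting by rating."""
    return bool(np.array_equal(np.argsort(-y_true), np.argsort(-y_pred)))


mse_a, mse_b = mse(true_ratings, preds_a), mse(true_ratings, preds_b)
order_a, order_b = same_order(true_ratings, preds_a), same_order(true_ratings, preds_b)
print(f"A: mse={mse_a:.2f}, correct order={order_a}")
print(f"B: mse={mse_b:.2f}, correct order={order_b}")
```

Predictions A have the lower squared error yet rank the movies incorrectly, while predictions B have a much higher error but a perfect ordering.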

In this tutorial, we will use TensorFlow Recommenders to build ranking models using list optimization. We will make use of ranking losses and metrics provided by [TensorFlow Ranking](https://github.com/tensorflow/ranking), a TensorFlow package that focuses on [learning to rank](https://www.microsoft.com/en-us/research/publication/learning-to-rank-from-pairwise-approach-to-listwise-approach/).

## Preliminaries

If TensorFlow Ranking is not available in your runtime environment, you can install it using `!pip install tensorflow_ranking`. We then import all necessary packages.

In [0]:
import pprint
import random

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

In [0]:
import tensorflow_recommenders as tfrs
import tensorflow_ranking as tfr

We will continue to use the MovieLens 100K dataset. We now load the datasets and keep only the user id, movie id, and user rating features for this tutorial.

In [0]:
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_id": x["movie_id"],
    "user_id": x["user_id"],
    "user_rating": x["user_rating"],
})
movies = movies.map(lambda x: {
    "movie_id": x["movie_id"],
})

However, we cannot use the MovieLens dataset for list optimization directly. Each example in the MovieLens 100K dataset contains only a single rating given by one user to one movie. To allow list optimization, we have to modify the dataset so that each example contains a list of movies and their ratings. Some movies in the list will be ranked higher than others; the goal of our model will be to make predictions that match this ordering.
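As a rough illustration of this transformation, the following plain-Python sketch groups individual ratings by user and samples fixed-size lists from them. The function `to_listwise`, its parameters, and the toy data are all hypothetical, and the sketch samples only rated movies (no unrated ones), so it is a simplification and not the implementation used in this tutorial.

```python
import collections
import random


def to_listwise(ratings, num_lists_per_user, list_size, seed=42):
    """Turns (user_id, movie_id, rating) triples into list examples."""
    rng = random.Random(seed)

    # Collect all (movie, rating) pairs for each user.
    by_user = collections.defaultdict(list)
    for user_id, movie_id, rating in ratings:
        by_user[user_id].append((movie_id, rating))

    examples = []
    for user_id, rated in by_user.items():
        if len(rated) < list_size:
            continue  # Not enough rated movies to fill one list.
        for _ in range(num_lists_per_user):
            # Sample distinct movies for this list, without replacement.
            sampled = rng.sample(rated, list_size)
            examples.append({
                "user_id": user_id,
                "movie_id": [movie for movie, _rating in sampled],
                "user_rating": [rating for _movie, rating in sampled],
            })
    return examples


toy_ratings = [
    ("u1", "m1", 5.0), ("u1", "m2", 3.0), ("u1", "m3", 1.0),
    ("u2", "m1", 4.0), ("u2", "m4", 2.0),
]
toy_lists = to_listwise(toy_ratings, num_lists_per_user=2, list_size=2)
for example in toy_lists:
    print(example)
```

Each resulting example pairs one user id with a list of movie ids and the matching list of ratings, which is the shape a ranking loss needs.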

To do this, we use the `tfrs.examples.movielens.movielens_to_listwise` helper function. It takes the MovieLens 100K dataset and generates a dataset containing list examples as discussed above. The implementation details can be found in the [source code](https://github.com/tensorflow/recommenders/blob/main/tensorflow_recommenders/examples/movielens.py).

In [0]:
random.seed(42)
tf.random.set_seed(42)

# We sample 50 lists for each user for the training data. For each list we
# sample 10 movies from the movies the user rated and 5 movies from the movies
# the user did not rate.
train, test = tfrs.examples.movielens.movielens_to_listwise(
    ratings,
    train_num_list_per_user=50,
    test_num_list_per_user=10,
    num_pos_examples_per_list=10,
    num_neg_examples_per_list=5,
)

We can inspect an example from the training data. The example includes a user id, a list of 15 movie ids, and their ratings by the user. Note that five of the ratings are 0 because their corresponding movies are not rated by the user.

In [0]:
for example in train.take(1):
  pprint.pprint(example)

## Model definition

We now define the model architecture. We will train the same model with three different losses: mean squared error, pairwise hinge loss, and ListMLE. These three losses correspond to pointwise, pairwise, and listwise optimization, respectively. We will explain each loss in more depth in the sections below.

### Query model and candidate model

We first define the models for encoding the user ids and movie ids. We use `tf.keras.layers.experimental.preprocessing.Hashing` to convert id strings into integers and then generate embeddings accordingly. We generate 32-dimensional embeddings for both user ids and movie ids.

In [0]:
class UserModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    num_hashing_bins = 20_000

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Hashing(num_bins=num_hashing_bins),
        tf.keras.layers.Embedding(num_hashing_bins, 32),
    ])

  def call(self, inputs):

    # Hash the user id from the input dictionary and look up its embedding.
    return self.user_embedding(inputs["user_id"])

In [0]:
class MovieModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    num_hashing_bins = 20_000

    self.movie_embedding = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.Hashing(num_bins=num_hashing_bins),
      tf.keras.layers.Embedding(num_hashing_bins, 32)
    ])

  def call(self, inputs):
    return self.movie_embedding(inputs["movie_id"])

### Combined ranking model

We now define a ranking model using the embedding models we defined above. We will use a ranking model that, given an input, first converts the input user id and movie id features into embeddings using the query and candidate embedding models. It then concatenates the user id embedding with each movie id embedding and passes them through dense layers to produce a ranking score for each movie.
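The repeat-and-concatenate step described above can be traced with a small NumPy sketch. The batch size of 4 and the random embeddings are illustrative assumptions; only the list length of 15 and the embedding size of 32 come from this tutorial.

```python
import numpy as np

batch_size, list_length, embedding_dim = 4, 15, 32
rng = np.random.default_rng(0)

# One embedding per user, and one embedding per movie in each user's list.
user_embeddings = rng.normal(size=(batch_size, embedding_dim))
movie_embeddings = rng.normal(size=(batch_size, list_length, embedding_dim))

# (batch, dim) -> (batch, 1, dim) -> (batch, list_length, dim)
user_repeated = np.repeat(user_embeddings[:, np.newaxis, :], list_length, axis=1)

# Concatenate along the feature axis so every movie sees its user's embedding.
concatenated = np.concatenate([user_repeated, movie_embeddings], axis=2)
print(concatenated.shape)
```

Each of the 15 positions in a list now carries a 64-dimensional vector: the user's embedding followed by that movie's embedding. The dense layers then map each of these vectors to a single score.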

We define the model to take a training loss as a parameter. This way we can train the model using different training losses. We use [normalized discounted cumulative gain (NDCG)](https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG) as our validation metric. NDCG measures a predicted ranking by taking a weighted sum of the actual rating of each candidate. The ratings of movies that are ranked lower by the model would be discounted more. As a result, a good model that ranks highly-rated movies on top would have a high NDCG result. Since this metric takes the ranked position of each candidate into account, it is a listwise metric.
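To make the NDCG description concrete, here is a minimal NumPy sketch. It assumes the commonly used exponential gain `2^rating - 1` and logarithmic discount `1 / log2(1 + rank)`; the exact gain and discount functions used by the metric in this tutorial may differ, and `dcg` and `ndcg` are hypothetical helper names.

```python
import numpy as np


def dcg(ratings_in_ranked_order):
    """Discounted cumulative gain of ratings listed in ranked order."""
    ratings = np.asarray(ratings_in_ranked_order, dtype=float)
    ranks = np.arange(1, len(ratings) + 1)
    gains = 2.0 ** ratings - 1.0
    discounts = 1.0 / np.log2(1.0 + ranks)  # Lower positions count less.
    return float(np.sum(gains * discounts))


def ndcg(ratings, scores):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ratings = np.asarray(ratings, dtype=float)
    scores = np.asarray(scores, dtype=float)
    predicted_order = np.argsort(-scores)  # Highest score first.
    ideal_order = np.argsort(-ratings)     # Highest rating first.
    ideal = dcg(ratings[ideal_order])
    if ideal == 0.0:
        return 0.0  # No relevant items, so the ranking cannot be scored.
    return dcg(ratings[predicted_order]) / ideal


ratings = [5.0, 3.0, 0.0, 1.0]
ndcg_perfect = ndcg(ratings, scores=[0.9, 0.5, 0.1, 0.3])
ndcg_reversed = ndcg(ratings, scores=[0.1, 0.3, 0.9, 0.5])
print(f"perfect ordering: {ndcg_perfect:.4f}")
print(f"reversed ordering: {ndcg_reversed:.4f}")
```

A ranking that matches the rating order scores 1, and any ranking that places highly rated movies lower scores less.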

In [0]:
class MovielensModel(tfrs.Model):

  def __init__(self, query_model, candidate_model, loss):
    super().__init__()
    self.query_model: tf.keras.Model = query_model
    self.candidate_model: tf.keras.Model = candidate_model
    self.dense_layers = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    self.task = tfrs.tasks.Ranking(
        loss=loss,
        metrics=[tfr.keras.metrics.NDCGMetric()],
    )
  
  def call(self, features):
    # We first convert the id features into embeddings.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
    })
    movie_embeddings = self.candidate_model({
        "movie_id": features["movie_id"],
    })
    # We now repeat the user id embedding and then concatenate the resulting
    # tensor with movie id embeddings.
    list_length = features["movie_id"].shape[1]
    user_embedding_repeated = tf.repeat(tf.expand_dims(query_embeddings, 1), [list_length], axis=1)
    concatenated_embeddings = tf.concat([user_embedding_repeated, movie_embeddings], 2)
    # We finally pass the concatenated embeddings to the dense layers to
    # generate ranking scores.
    return self.dense_layers(concatenated_embeddings)

  def compute_loss(self, features, training=False):
    list_scores = self(features)
    return self.task(
        labels=features["user_rating"],
        predictions=tf.squeeze(list_scores, axis=-1),
    )

## Training the models

We now shuffle, batch, and cache the data.

In [0]:
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

We first train a model using mean squared error. This model is very similar to the model in [the basic ranking tutorial](basic_ranking). We train the model to minimize the mean squared error between the actual ratings and predicted ratings. Therefore, this loss is computed individually for each movie and the training is pointwise.

In [0]:
mse_user_model = UserModel()
mse_movie_model = MovieModel()
mse_model = MovielensModel(mse_user_model, mse_movie_model, tf.keras.losses.MeanSquaredError())
mse_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
mse_model.fit(cached_train, epochs=5)

We now train a model using the pairwise hinge loss from TensorFlow Ranking. By minimizing the pairwise hinge loss, the model tries to maximize the difference between its predictions for a highly rated item and a low-rated item: the bigger that difference is, the lower the model loss. However, once the difference is large enough, the loss becomes zero, stopping the model from further optimizing this particular pair and letting it focus on other pairs that are incorrectly ranked.

This loss is not computed for individual movies, but rather for pairs of movies. Hence the training using this loss is pairwise.
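The behavior described above can be sketched in NumPy as follows. This is a simplified, unnormalized version with an assumed margin of 1; the TensorFlow Ranking implementation may weight or reduce the pair losses differently, and `pairwise_hinge_loss` is a hypothetical helper name.

```python
import numpy as np


def pairwise_hinge_loss(ratings, scores, margin=1.0):
    """Sum of hinge losses over all pairs where one item is rated higher."""
    ratings = np.asarray(ratings, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Entry [i, j] compares item i against item j.
    rating_diff = ratings[:, None] - ratings[None, :]
    score_diff = scores[:, None] - scores[None, :]

    # Only pairs where item i is rated strictly higher than item j count.
    valid_pairs = rating_diff > 0
    # The loss is zero once item i outscores item j by at least the margin.
    pair_losses = np.maximum(0.0, margin - score_diff)
    return float(np.sum(pair_losses[valid_pairs]))


ratings = [5.0, 3.0, 1.0]
loss_well_ordered = pairwise_hinge_loss(ratings, scores=[3.0, 1.5, 0.0])
loss_misordered = pairwise_hinge_loss(ratings, scores=[0.0, 1.5, 3.0])
print(loss_well_ordered, loss_misordered)
```

When every higher-rated movie outscores every lower-rated one by at least the margin, the loss is zero, so no further gradient comes from those pairs.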

In [0]:
hinge_user_model = UserModel()
hinge_movie_model = MovieModel()
hinge_model = MovielensModel(hinge_user_model, hinge_movie_model, tfr.keras.losses.get(tfr.losses.RankingLossKey.PAIRWISE_HINGE_LOSS))
hinge_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
hinge_model.fit(cached_train, epochs=5)

We now train a model using the ListMLE loss from TensorFlow Ranking. ListMLE stands for list maximum likelihood estimation. To calculate the ListMLE loss, we first use the user ratings to generate an optimal ranking. We then use the predicted scores to calculate the likelihood of each candidate being out-ranked by any item below it in the optimal ranking. The model tries to minimize this likelihood to ensure that highly rated candidates are not out-ranked by low-rated candidates. You can learn more about the details of ListMLE in section 2.2 of the paper [Position-aware ListMLE: A Sequential Learning Process](http://auai.org/uai2014/proceedings/individuals/164.pdf).

Note that since the likelihood is computed with respect to a candidate and all candidates below it in the optimal ranking, the loss is not pairwise but listwise. Hence the training uses list optimization.
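A minimal NumPy sketch of this loss is shown below, assuming the standard Plackett-Luce formulation: the loss is the negative log-probability of drawing the optimal ranking one item at a time, each time choosing among the remaining items with probability proportional to the exponential of its score. The helper name `list_mle_loss` is hypothetical, and ties in the ratings are broken by position here, which may differ from how TensorFlow Ranking handles them.

```python
import numpy as np


def list_mle_loss(ratings, scores):
    """Negative log-likelihood of the rating-derived ranking under the scores."""
    ratings = np.asarray(ratings, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Arrange the predicted scores in the optimal (rating-descending) order.
    optimal_order = np.argsort(-ratings, kind="stable")
    s = scores[optimal_order]
    s = s - np.max(s)  # Shift for numerical stability; the loss is unchanged.

    # For each position i, log of the sum of exp(s_j) over all j >= i,
    # i.e. over the candidate itself and everything ranked below it.
    suffix_logsumexp = np.log(np.cumsum(np.exp(s)[::-1])[::-1])

    # Each term is small when candidate i clearly outscores those below it.
    return float(np.sum(suffix_logsumexp - s))


ratings = [5.0, 3.0, 1.0]
loss_good = list_mle_loss(ratings, scores=[2.0, 1.0, 0.0])
loss_bad = list_mle_loss(ratings, scores=[0.0, 1.0, 2.0])
print(f"scores agree with ratings: {loss_good:.4f}")
print(f"scores reversed: {loss_bad:.4f}")
```

Every term depends on a candidate together with all candidates below it, which is why a single changed score can affect several terms at once.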

In [0]:
listwise_user_model = UserModel()
listwise_movie_model = MovieModel()
listwise_model = MovielensModel(listwise_user_model, listwise_movie_model, tfr.keras.losses.get(tfr.losses.RankingLossKey.LIST_MLE_LOSS))
listwise_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
listwise_model.fit(cached_train, epochs=5)

## Comparing the models

We now compare and evaluate the three models.

In [0]:
mse_model_result = mse_model.evaluate(cached_test, return_dict=True)
print("NDCG of the MSE Model: {:.4f}".format(mse_model_result["ndcg_metric"]))

In [0]:
hinge_model_result = hinge_model.evaluate(cached_test, return_dict=True)
print("NDCG of the pairwise hinge loss model: {:.4f}".format(hinge_model_result["ndcg_metric_1"]))

In [0]:
listwise_model_result = listwise_model.evaluate(cached_test, return_dict=True)
print("NDCG of the ListMLE model: {:.4f}".format(listwise_model_result["ndcg_metric_2"]))

Of the three models, the model trained using ListMLE has the highest NDCG metric. This result shows how listwise optimization can be used to train ranking models and can potentially produce models that perform better than models optimized in a pointwise or pairwise fashion. One reason might be that a model trained using a listwise loss can optimize its performance on the list of candidates as a whole. In pointwise and pairwise training, the model is optimized for each individual candidate or candidate pair, and the gradients from the losses of different candidates or candidate pairs might conflict, leading to less efficient training.

Nonetheless, the results from this tutorial do not conclusively show that list optimization is better than pairwise or pointwise training. We cannot draw such a conclusion without fully tuning the hyperparameters and running repeated experiments. Different types of training might suit different combinations of hyperparameters and different datasets.