##### Copyright 2020 The TensorFlow Authors.

In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using deep models in retrieval

In [the featurization tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/featurization.ipynb) we incorporated multiple features into our models, but the models consist of only an embedding layer. We can add more dense layers to our models to increase their capacity.

Deep models with multiple layers can approximate more complex patterns and functions than models with only an embedding layer. Furthermore, with more layers, the learnability of the model might also improve. While a model with one hidden layer can, in theory, approximate any continuous function given enough units, in practice models with more hidden layers can learn to approximate complex functions more easily.

Nonetheless, complex models also have their disadvantages. Deeper models typically need more training epochs to converge, and each training step requires more computation. Gradients are also harder to propagate through many layers when updating model parameters. Finally, with more parameters, deep models might overfit, or even simply memorize the training examples, instead of learning a function that generalizes.

In this notebook we will build deep models with multiple layers and compare the results.

## Preliminaries

We first import the necessary packages.

In [0]:
import os
import tempfile

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

In this tutorial we will use the models from [the featurization tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/featurization.ipynb) to generate embeddings. Hence we will only be using the user id, timestamp, and movie title features.

In [0]:
ratings = tfds.load("movielens/100k-ratings", split="train")
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: {
    "movie_title": x["movie_title"],
})

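We compute 1,000 evenly spaced bucket boundaries between the minimum and maximum timestamps. The user model below uses these boundaries to bucketize the timestamp feature.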

In [0]:
max_timestamp = ratings.map(lambda x: x["timestamp"]).reduce(
    tf.cast(0, tf.int64), tf.maximum).numpy().max()
min_timestamp = ratings.map(lambda x: x["timestamp"]).reduce(
    np.int64(1e9), tf.minimum).numpy().min()
timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000,
)
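
To make the effect of such boundaries concrete, here is a minimal NumPy sketch of how values are assigned to buckets. It uses `np.digitize` as a stand-in for the `Discretization` layer used later, on the assumption that both follow the same left-closed bucketing convention. The toy range and the `toy_` names are made up for illustration.

```python
import numpy as np

# Hypothetical timestamp range, chosen only to keep the numbers readable.
toy_min, toy_max = 0, 100
toy_boundaries = np.linspace(toy_min, toy_max, num=5)  # 0, 25, 50, 75, 100

toy_timestamps = np.array([-10, 0, 30, 99, 100, 250])

# For each value, np.digitize returns how many boundaries are <= that value.
bucket_ids = np.digitize(toy_timestamps, toy_boundaries)
print(bucket_ids.tolist())
```

Note that N boundaries produce N + 1 possible bucket indices, because values below the first boundary and values at or above the last boundary each fall into a bucket of their own.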

## Model definition

### Query model

We will define a generic query model that can have different architectures depending on the constructor arguments.

We will use the user model defined in [the featurization tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/featurization.ipynb) as a component of our query model. It will convert input examples into feature embeddings.

In [0]:
class UserModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    num_hashing_bins = 20_000

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Hashing(num_bins=num_hashing_bins),
        tf.keras.layers.Embedding(num_hashing_bins, 32),
    ])
    self.timestamp_embedding = tf.keras.Sequential([
        tf.keras.layers.experimental.preprocessing.Discretization(timestamp_buckets.tolist()),
        tf.keras.layers.Embedding(len(timestamp_buckets) + 2, 32),
    ])
    self.normalized_timestamp = tf.keras.layers.experimental.preprocessing.Normalization()

  def call(self, inputs):

    # Take the input dictionary, pass it through each input layer,
    # and concatenate the result.
    return tf.concat([
        self.user_embedding(inputs["user_id"]),
        self.timestamp_embedding(inputs["timestamp"]),
        self.normalized_timestamp(inputs["timestamp"]),
    ], axis=1)
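
As a rough illustration of what the hashing and embedding layers do, the sketch below maps raw ids to a fixed number of bins, looks up one embedding row per bin, and concatenates the result with a second feature. It is a NumPy approximation under stated assumptions, not the Keras implementation: `zlib.crc32` is a stand-in hash, the embedding tables are random rather than learned, and the sizes are toy values.

```python
import zlib

import numpy as np

num_bins = 8        # toy value; the tutorial uses 20_000
embedding_dim = 4   # toy value; the tutorial uses 32

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(num_bins, embedding_dim))


def hash_to_bin(raw_id):
    # Stand-in hash chosen for determinism; the Keras layer uses its own hash.
    return zlib.crc32(raw_id.encode("utf-8")) % num_bins


def embed(raw_ids):
    # One table row per id; ids that share a bin share an embedding.
    bins = np.array([hash_to_bin(raw_id) for raw_id in raw_ids])
    return embedding_table[bins]


user_vectors = embed(["42", "138", "42"])
# Random stand-in for a second per-example feature embedding.
other_vectors = rng.normal(size=(3, embedding_dim))

# Concatenating along axis 1 yields one wider vector per example.
combined = np.concatenate([user_vectors, other_vectors], axis=1)
print(user_vectors.shape, combined.shape)
```

The same id always maps to the same row, so the first and third rows of `user_vectors` are identical.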

In addition to the embedding model, we also add hidden layers according to the argument `hidden_layer_sizes` to make the query model deep. Since a stack of purely linear layers has the same expressive power as a single linear layer, we use ReLU activations in all hidden layers so that the model can capture nonlinearities. After the hidden layers, we add a projection layer that produces embeddings whose dimensionality is given by `final_embedding_dimension`.

Note that we do not use any activation function on the projection layer. Using an activation function would limit the output space of the final embeddings and might negatively impact the performance of the model. For instance, if ReLUs were used in the projection layer, every component of the output embedding would be non-negative.
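
Both points can be checked numerically. The following NumPy sketch uses random weights (hypothetical, not taken from any trained model) to show that two stacked linear layers are equivalent to a single linear layer, and that applying a ReLU to the final projection leaves no negative components.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))     # a batch of 4 input vectors
w1 = rng.normal(size=(8, 16))   # first layer weights
w2 = rng.normal(size=(16, 3))   # second layer weights

# Without an activation, two layers collapse into one with weights w1 @ w2.
stacked_linear = (x @ w1) @ w2
single_linear = x @ (w1 @ w2)
print(np.allclose(stacked_linear, single_linear))

# A ReLU between the layers breaks that equivalence.
with_relu = np.maximum(x @ w1, 0.0) @ w2

# A ReLU on the final projection clips every negative component to zero.
relu_projection = np.maximum(with_relu, 0.0)
print((relu_projection >= 0).all())
```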

In [0]:
class QueryModel(tf.keras.Model):
  """Model for encoding user queries."""

  def __init__(
      self,
      final_embedding_dimension,
      hidden_layer_sizes=None,
  ):
    """Initializes a query model for encoding user queries.

    Args:
      final_embedding_dimension:
        An integer representing the dimensionality of the final embedding. The
        model would add a final projection layer with
        `final_embedding_dimension` units to ensure that the output embedding
        has the specified number of dimensions.
      hidden_layer_sizes:
        A list of integers where the ith entry represents the number of units
        the ith layer contains.
    
    Returns:
      A query model for encoding queries.
    """
    super().__init__()

    if hidden_layer_sizes is None:
      hidden_layer_sizes = []

    # We first use the user model for generating embeddings.
    self.embedding_model = UserModel()

    dense_layer_list = []
    # We now add the hidden layers.
    for layer_size in hidden_layer_sizes:
      dense_layer_list.append(
          tf.keras.layers.Dense(layer_size, activation="relu"),
      )
    # We finally add a projection layer without any activation function.
    dense_layer_list.append(
        tf.keras.layers.Dense(final_embedding_dimension, activation=None),
    )
    self.dense_layers = tf.keras.Sequential(dense_layer_list, name='dense_layers')
    
  def call(self, inputs):
    feature_embedding = self.embedding_model(inputs)
    return self.dense_layers(feature_embedding)
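
To see how the constructor arguments shape the dense stack, the following plain-Python sketch lists the layers that a given `hidden_layer_sizes` would produce and counts their weights and biases. The helpers `dense_stack_plan` and `count_parameters` are hypothetical, and the input width of 64 is an arbitrary placeholder rather than the actual output width of `UserModel`.

```python
def dense_stack_plan(input_dim, final_embedding_dimension, hidden_layer_sizes=None):
    """Lists (in_units, out_units, activation) for each dense layer."""
    plan = []
    in_units = input_dim
    # Hidden layers use ReLU, mirroring the loop in QueryModel.
    for out_units in (hidden_layer_sizes or []):
        plan.append((in_units, out_units, "relu"))
        in_units = out_units
    # The projection layer has no activation.
    plan.append((in_units, final_embedding_dimension, None))
    return plan


def count_parameters(plan):
    # Each dense layer has in_units * out_units weights plus out_units biases.
    return sum(i * o + o for i, o, _ in plan)


placeholder_input_dim = 64  # assumption for illustration only

for sizes in ([], [64, 64], [128, 128, 64, 64]):
    plan = dense_stack_plan(placeholder_input_dim, 64, sizes)
    print(len(plan), "dense layers,", count_parameters(plan), "parameters")
```

The count grows quickly with depth and width, which is one reason deeper query models may need more data, more epochs, or more regularization.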

### Candidate model

Since we are focusing on exploring different query models, we will keep the candidate model simple and use only the embedding model from [the featurization tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/featurization.ipynb). Note that this model generates embeddings with 64 dimensions.

In [0]:
class MovieModel(tf.keras.Model):
  
  def __init__(self):
    super().__init__()

    num_hashing_bins = 20_000
    max_tokens = 10_000

    self.title_embedding = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.Hashing(num_bins=num_hashing_bins),
      tf.keras.layers.Embedding(num_hashing_bins, 32)
    ])
    self.title_text_embedding = tf.keras.Sequential([
      tf.keras.layers.experimental.preprocessing.TextVectorization(max_tokens=max_tokens),
      tf.keras.layers.Embedding(max_tokens, 32)
    ])

  def call(self, inputs):
    return tf.concat([
        self.title_embedding(inputs["movie_title"]),
        # We average the embedding of individual words to get one embedding vector
        # per title.
        tf.reduce_mean(self.title_text_embedding(inputs["movie_title"]), axis=-2),
    ], axis=1)
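
The averaging step in `call` reduces a per-token tensor to one vector per title. A minimal NumPy sketch of the shapes involved, with random stand-in embeddings and toy sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, tokens_per_title, embedding_dim = 2, 3, 4

# One embedding vector per token: shape (batch, tokens, dim).
token_embeddings = rng.normal(size=(batch_size, tokens_per_title, embedding_dim))

# Averaging over the token axis (axis=-2) leaves one vector per title.
title_embeddings = token_embeddings.mean(axis=-2)
print(token_embeddings.shape, "->", title_embeddings.shape)
```

Because this is a plain mean over the token axis, every token position contributes equally to the title vector.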

### Combined model

We now define a combined model that takes a query model and a candidate model as arguments.

In [0]:
class MovielensModel(tfrs.models.Model):

  def __init__(self, query_model, candidate_model):
    super().__init__()
    self.query_model: tf.keras.Model = query_model
    self.candidate_model: tf.keras.Model = candidate_model
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.candidate_model),
        ),
    )

  def compute_loss(self, features, training=False):
    # We only pass the user id and timestamp features into the query model. This
    # is to ensure that the training inputs would have the same keys as the
    # query inputs. Otherwise the discrepancy in input structure would cause an
    # error when loading the query model after saving it.
    query_embeddings = self.query_model({
        "user_id": features["user_id"],
        "timestamp": features["timestamp"],
    })
    positive_movie_embeddings = self.candidate_model({
        "movie_title": features["movie_title"],
    })
    return self.task(query_embeddings, positive_movie_embeddings)
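
For intuition about what a retrieval loss of this kind optimizes, here is a NumPy sketch of an in-batch softmax loss. It assumes each query is scored against every candidate in the batch by dot product and that the candidate in the same row is the positive. The function `in_batch_softmax_loss` is a hypothetical illustration, not the TFRS implementation, and details such as how the loss is reduced over the batch may differ.

```python
import numpy as np


def in_batch_softmax_loss(query_embeddings, candidate_embeddings):
    """Mean cross-entropy where the positive for row i is candidate i."""
    logits = query_embeddings @ candidate_embeddings.T
    # Subtract the row maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The diagonal holds the log-probability of each query's own candidate.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 64))

aligned_queries = candidates.copy()                 # query i matches candidate i
misaligned_queries = np.roll(candidates, 1, axis=0)  # query i matches candidate i - 1

aligned_loss = in_batch_softmax_loss(aligned_queries, candidates)
misaligned_loss = in_batch_softmax_loss(misaligned_queries, candidates)
print(aligned_loss, misaligned_loss)
```

The loss is small when each query embedding lies closest to its own candidate and large when it lies closest to another row's candidate.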

## Training the model

### Prepare the data

We first split the data into a training set and a testing set.

In [0]:
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()
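
The split relies on shuffling once and then slicing the same fixed order, which is why `reshuffle_each_iteration=False` matters: if the order changed between iterations, `take` and `skip` could select overlapping examples. A plain-Python sketch of the same idea on a toy list:

```python
import random

examples = list(range(10))  # toy stand-in for the ratings

# Shuffle once with a fixed seed, then slice the same order twice.
shuffled_examples = examples[:]
random.Random(42).shuffle(shuffled_examples)

train_part = shuffled_examples[:8]   # analogous to take(8)
test_part = shuffled_examples[8:]    # analogous to skip(8)

print(len(train_part), len(test_part))
```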

### Model with one dense layer

We now define the dimensionality of the final embeddings below. Since `MovieModel` generates embeddings with 64 dimensions, we want our query models to generate 64-dimensional embeddings as well.

In [0]:
final_embedding_dimension = 64

We now define a simple model with only a projection layer and no hidden layer.

In [0]:
one_layer_query_model = QueryModel(
    final_embedding_dimension,
)
one_layer_candidate_model = MovieModel()
one_layer_candidate_model.title_text_embedding.layers[0].adapt(
    movies.map(lambda x: x["movie_title"]),
)
one_layer_model = MovielensModel(
    one_layer_query_model,
    one_layer_candidate_model,
)
one_layer_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
one_layer_model.fit(cached_train, epochs=3)

### Model with three dense layers

We now create and train a model with two hidden layers of size 64 and one projection layer.

In [0]:
three_layer_query_model = QueryModel(
    final_embedding_dimension,
    hidden_layer_sizes=[64, 64],
)
three_layer_candidate_model = MovieModel()
three_layer_candidate_model.title_text_embedding.layers[0].adapt(
    movies.map(lambda x: x["movie_title"]),
)
three_layer_model = MovielensModel(
    three_layer_query_model,
    three_layer_candidate_model,
)
three_layer_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
three_layer_model.fit(cached_train, epochs=3)

### Model with five dense layers

We now create a model with four hidden layers of sizes 128, 128, 64, 64, and a projection layer.

In [0]:
five_layer_query_model = QueryModel(
    final_embedding_dimension,
    hidden_layer_sizes=[128, 128, 64, 64],
)
five_layer_candidate_model = MovieModel()
five_layer_candidate_model.title_text_embedding.layers[0].adapt(
    movies.map(lambda x: x["movie_title"]),
)
five_layer_model = MovielensModel(
    five_layer_query_model,
    five_layer_candidate_model,
)
five_layer_model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [0]:
five_layer_model.fit(cached_train, epochs=3)

## Comparing the models

We now evaluate the three models and compare their results.

In [0]:
one_layer_results = one_layer_model.evaluate(cached_test, return_dict=True)
print("Top 100 categorical accuracy: {:.4f}".format(one_layer_results["factorized_top_k/top_100_categorical_accuracy"]))

In [0]:
three_layer_results = three_layer_model.evaluate(cached_test, return_dict=True)
print("Top 100 categorical accuracy: {:.4f}".format(three_layer_results["factorized_top_k/top_100_categorical_accuracy"]))

In [0]:
five_layer_results = five_layer_model.evaluate(cached_test, return_dict=True)
print("Top 100 categorical accuracy: {:.4f}".format(five_layer_results["factorized_top_k/top_100_categorical_accuracy"]))

We focus on the top 100 categorical accuracy. In many recommender systems, a retrieval model picks candidates for ranking models. The number of candidates should be large enough to ensure items of interest are included. Hence we look at the top 100 candidates returned by each model here.
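
As a rough sketch of what a top-k accuracy measures, the NumPy function below counts a test example as a hit when its true candidate is among the k highest-scoring candidates. The helper `top_k_accuracy` and the toy scores are hypothetical and only illustrate the idea; they are not the TFRS metric implementation.

```python
import numpy as np


def top_k_accuracy(scores, true_indices, k):
    """Fraction of rows whose true candidate is among the k highest scores."""
    # argsort is ascending, so the last k columns are the k best candidates.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == np.asarray(true_indices)[:, None]).any(axis=1)
    return float(hits.mean())


# Rows are queries, columns are candidates.
toy_scores = np.array([
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.7, 0.1],
    [0.4, 0.3, 0.2, 0.1],
])
toy_true_indices = [0, 2, 3]

for k in (1, 2, 4):
    print(k, top_k_accuracy(toy_scores, toy_true_indices, k))
```

Accuracy can only increase as k grows, which is why a large k such as 100 is a forgiving measure suited to a retrieval stage that feeds a later ranking stage.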

The one-layer model has the highest top 100 categorical accuracy. This shows that a more complex model does not guarantee better performance. As mentioned above, complex models require more training epochs. In this case, three epochs might not be sufficient for a three-layer or five-layer model. Complex models also require more regularization, which is not used in this example.

Of course, all these results should be treated with a certain level of skepticism. We did not thoroughly tune all hyperparameters, including the learning rate, the optimizer, and regularization. These hyperparameters may play a huge role in model performance, and we cannot have conclusive results without fully tuning these hyperparameters.

## Serving the model

We now serve our best-performing model, the one-layer model. When a simple recommendation model receives a query, it computes the query's embedding and finds the movies whose embeddings are closest to it. For efficiency, we can precompute the embeddings of all candidates.

In [0]:
movie_embeddings = movies.batch(1_000).map(lambda x: one_layer_model.candidate_model(x))

To create a recommendation model that takes a query and returns the top candidates, we use the `ann` module in TFRS. Since the number of movies in the dataset is small, we can use the `BruteForce` layer to look for top candidates. We pass in the query model as an argument for encoding input queries.

In [0]:
serving_model = tfrs.layers.ann.BruteForce(one_layer_model.query_model)

We then have the recommendation model index the candidates by calling `index`, passing in the precomputed movie embeddings as candidates and the movie titles as identifiers.

In [0]:
serving_model.index(
    candidates=movie_embeddings,
    identifiers=movies.batch(1_000).map(lambda x: x["movie_title"]),
)
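
Conceptually, a brute-force lookup scores the query against every indexed candidate and keeps the best few. The NumPy sketch below assumes dot-product scoring. The function `brute_force_top_k` and the toy embeddings and titles are hypothetical, and this is not the TFRS layer itself.

```python
import numpy as np


def brute_force_top_k(query_embedding, candidate_embeddings, identifiers, k):
    """Scores every candidate against the query and returns the k best."""
    scores = candidate_embeddings @ query_embedding
    # Sort by descending score and keep the first k positions.
    top = np.argsort(-scores)[:k]
    return scores[top], [identifiers[i] for i in top]


toy_candidates = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
])
toy_titles = ["Movie A", "Movie B", "Movie C", "Movie D"]
toy_query = np.array([2.0, 1.0])

top_scores, top_titles = brute_force_top_k(toy_query, toy_candidates, toy_titles, k=2)
print(top_titles, top_scores.tolist())
```

Scoring every candidate costs time proportional to the number of candidates, which is acceptable for a small catalogue like this one.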

We can query the model for recommendations by passing in a dictionary of features.

In [0]:
scores, titles = serving_model(
    {"user_id": np.array(["42"]), "timestamp": np.array([879024327])},
    num_candidates=3,
)
for i, title in enumerate(titles[0].numpy().tolist()):
  print("{:d}. {:s}".format(i + 1, str(title)))

We can save the model using `tf.saved_model.save`.

In [0]:
tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, "s_model")
tf.saved_model.save(serving_model, path)

We can then load the model using `tf.keras.models.load_model`.

In [0]:
loaded_model = tf.keras.models.load_model(path)

We can query the loaded model for recommendations by passing in a dictionary of features.

In [0]:
scores, titles = loaded_model(
    {"user_id": np.array(["42"]), "timestamp": np.array([879024327])}
)
for i, title in enumerate(titles[0].numpy().tolist()):
  print("{:d}. {:s}".format(i + 1, str(title)))

## Next Steps

In this tutorial we expanded our retrieval model with dense layers and activation functions. To see how to create a model that can perform not only retrieval tasks but also rating tasks, take a look at [the multitask tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/multitask.ipynb).