##### Copyright 2020 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Using side information in retrieval

In the [basic retrieval tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb) we built a simple retrieval model for the Movielens dataset, using user and movie ids as the only features.

However, it's often useful to use a richer set of features, for both queries and candidates. For example:

1. In a large candidate catalogue, there may not be enough data per item to accurately estimate an embedding vector for every item. Using item features (categories, descriptions, images) will help build a model that's more accurate and generalizes better to unseen items.
2. Some applications (like fashion) have candidate sets that change frequently, and any given item is available only for a short time. This leaves little time to learn accurate item representations.
3. Users' preferences may change depending on the context they are in. For example, a single user may prefer to consume short-form content when on their phone, and reserve long-form content for their TV. Being able to use the context of the interaction in the model will help capture these nuances.
4. Items' relevance may change over time. Including time as an explicit feature will help the model capture popularity trends, preventing items that were once popular but are no longer relevant from dominating future recommendations.

In this example, we're going to make use of query context to improve our initial model.

## Preliminaries

Let's start with the necessary imports.

In [None]:
import os
import pprint
import tempfile

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

import tensorflow_recommenders as tfrs

We also need to repeat the training/test splits and vocabulary building.

In [None]:
ratings = tfds.load("movie_lens/100k-ratings", split="train")
movies = tfds.load("movie_lens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_id": x["movie_id"],
    "user_id": x["user_id"],
    "timestamp": x["timestamp"],
})
movies = movies.map(lambda x: {
    "movie_id": x["movie_id"],
})

# Randomly shuffle data and split between train and test.
# We fix the random seed to obtain deterministic results.
tf.random.set_seed(24)
shuffled = ratings.shuffle(100_000, seed=24, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

movie_ids = movies.batch(1_000_000).map(lambda x: x["movie_id"])
user_ids = ratings.batch(1_000_000).map(lambda x: x["user_id"])

unique_movie_ids = np.unique(np.concatenate(list(movie_ids)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))

## Using query context information

The first piece of side information we can make use of is the time when a rating is given. This will help us capture two things:

1. Popularity dynamics: some movies are watched a lot when they are first released, but do not become classics. This makes them a good recommendation soon after release, but a bad recommendation later on.
2. Changing user tastes: a user's preferences may drift over time. Letting the model combine user embeddings with time gives it the capacity to capture this.

### Representing time

Our ratings dataset has raw timestamp features:

In [None]:
for row in ratings.take(2).as_numpy_iterator():
  print(f"Timestamp: {row['timestamp']}")

We can't use this directly: we need normalized features in order to make our learning algorithm stable.

There are many ways of doing this:

1. We could treat time as a continuous feature, and scale it to lie between 0 and 1. This could be done via quantile scaling (like in sklearn's [QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html)) or through any of a variety of other methods that map an arbitrary real variable to a fixed interval.
2. We could treat time as a discrete variable: a series of discrete time periods.

The advantage of treating time as a continuous variable is that its effects become easy to estimate: only a few parameters are added to the model. However, given enough data, treating time as discrete gives us a more flexible model, where each period has its own effect rather than following a smooth curve over time.
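
As an illustration of the continuous option, here is a minimal sketch of two ways to map timestamps into the interval [0, 1]. The timestamps are made up, the function names are hypothetical, and the rank-based function is a simplified stand-in for quantile scaling that assumes all values are distinct.

```python
import numpy as np

# Hypothetical Unix timestamps (seconds); not taken from the dataset.
timestamps = np.array(
    [875_000_000, 880_000_000, 880_500_000, 893_000_000], dtype=np.float64)


def min_max_scale(values):
  """Linearly rescales values so the smallest maps to 0 and the largest to 1."""
  low, high = values.min(), values.max()
  return (values - low) / (high - low)


def rank_scale(values):
  """Maps each value to its rank, spread evenly over [0, 1].

  A simplified stand-in for quantile scaling; assumes distinct values.
  """
  ranks = np.argsort(np.argsort(values))
  return ranks / (len(values) - 1)


min_max_scaled = min_max_scale(timestamps)
rank_scaled = rank_scale(timestamps)

print(min_max_scaled)
print(rank_scaled)
```

Min-max scaling preserves the relative gaps between timestamps, while rank-based scaling spreads the values uniformly regardless of how they cluster in time.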

In this example, we're going to adopt the discrete approach. We'll start by dividing time into 1000 equal-width buckets.

In [None]:
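# Find the latest timestamp by reducing over the whole dataset.
max_timestamp = ratings.map(lambda x: x["timestamp"]).reduce(
    tf.cast(0, tf.int64), tf.maximum).numpy().max()
# Find the earliest timestamp. The initial value (1e9) must be larger than
# every timestamp in the data.
min_timestamp = ratings.map(lambda x: x["timestamp"]).reduce(
    np.int64(1e9), tf.minimum).numpy().min()

# Evenly spaced bucket boundaries between the earliest and latest timestamp.
timestamp_buckets = np.linspace(
    min_timestamp, max_timestamp, num=1000)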

print(f"Buckets: {timestamp_buckets[:3]}")
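
To make the bucketing concrete, here is a minimal sketch that uses `np.digitize` as a stand-in for the bucketization step. The small range, the five boundaries, and the sample timestamps are made up for illustration. With left-inclusive buckets, N boundaries yield N + 1 buckets, because values below the first boundary and values at or above the last one each get a bucket of their own.

```python
import numpy as np

# Hypothetical range; the notebook derives the real one from the ratings.
min_ts, max_ts = 0.0, 1_000.0
boundaries = np.linspace(min_ts, max_ts, num=5)  # [0, 250, 500, 750, 1000]

sample_timestamps = np.array([-10.0, 0.0, 100.0, 250.0, 999.0, 1000.0, 2000.0])

# np.digitize returns i such that boundaries[i - 1] <= x < boundaries[i],
# so 5 boundaries give 6 possible bucket indices (0 through 5).
bucket_ids = np.digitize(sample_timestamps, boundaries)

print(bucket_ids)
```

Each bucket index would then select one row of an embedding table, which is how a discrete time period gets its own learned representation.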

### Query model

We first define the dimensionality of the query and candidate representations:

In [None]:
embedding_dimension = 32

We now define the query model. We can map raw features into embeddings by defining appropriate feature columns for our query model.

We then use these features in the model. While we could use complex models here, let's start with something simple: we concatenate the user and time embeddings, and project the result onto the item embedding dimension (remember, the user model and the candidate model must produce outputs of the same dimensionality).

In [None]:
class UserModel(tf.keras.Model):

  def __init__(self, embedding_dimension, timestamp_buckets):
    super().__init__()
    # An embedding column for user ids.
    user_id_feature = tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "user_id", unique_user_ids,
        ),
        embedding_dimension,
    )

    # An embedding column for the bucketized timestamps: there will be a separate
    # embedding for each of the timestamp buckets.
    time_feature = tf.feature_column.embedding_column(
        tf.feature_column.bucketized_column(
            tf.feature_column.numeric_column("timestamp"),
            timestamp_buckets.tolist(),
        ),
        embedding_dimension,
    )
    self.embedding_layer = tf.keras.layers.DenseFeatures(
        [user_id_feature, time_feature],
        name="query_embedding",
    )
    self.dense_layer = tf.keras.layers.Dense(embedding_dimension)

  def call(self, inputs):
    input_embedding = self.embedding_layer(inputs)
    return self.dense_layer(input_embedding)
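
The shapes involved in the concatenate-then-project step can be sketched in plain NumPy. This is an illustration only: the embeddings, kernel, and bias below are random stand-ins for values the model would learn, and the batch size of 4 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

batch_size, embedding_dimension = 4, 32

# Stand-ins for the looked-up user and timestamp-bucket embeddings.
user_embeddings = rng.normal(size=(batch_size, embedding_dimension))
time_embeddings = rng.normal(size=(batch_size, embedding_dimension))

# Concatenate along the feature axis: (batch_size, 2 * embedding_dimension).
concatenated = np.concatenate([user_embeddings, time_embeddings], axis=1)

# A linear projection (random here, learned in practice) brings the result
# back to embedding_dimension so it can be compared with movie embeddings.
kernel = rng.normal(size=(2 * embedding_dimension, embedding_dimension))
bias = np.zeros(embedding_dimension)
query_embeddings = concatenated @ kernel + bias

print(concatenated.shape, query_embeddings.shape)
```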

### Candidate model

Let's keep the candidate model as in the [basic retrieval tutorial](https://github.com/tensorflow/recommenders/blob/main/docs/examples/basic_retrieval.ipynb) - using id information only - and see what effect the inclusion of context information has on the model's accuracy.

In [None]:
class MovieModel(tf.keras.Model):

  def __init__(self, embedding_dimension):
    super().__init__()
    movie_features = [tf.feature_column.embedding_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "movie_id", unique_movie_ids,
        ),
        embedding_dimension,
    )]
    self.embedding_layer = tf.keras.layers.DenseFeatures(movie_features, name="movie_embedding")
  
  def call(self, inputs):
    return self.embedding_layer(inputs)

### Putting it all together

We now put it all together in a model class we'll use for fitting and evaluation.

In [None]:
class MovielensModel(tfrs.models.Model):

  def __init__(self, embedding_dimension, timestamp_buckets):
    super().__init__()
    self.user_model: tf.keras.Model = UserModel(embedding_dimension, timestamp_buckets)
    self.movie_model: tf.keras.Model = MovieModel(embedding_dimension)
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=movies.batch(128).map(self.movie_model)
        )
    )

  def compute_loss(self, features, training=False):
    user_embeddings = self.user_model({"user_id": features["user_id"],
                                       "timestamp": features["timestamp"]})
    positive_movie_embeddings = self.movie_model(
        {"movie_id": features["movie_id"]}
    )

    return self.task(user_embeddings, positive_movie_embeddings)
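
For intuition about what a retrieval task optimizes, here is a minimal NumPy sketch of an in-batch softmax loss, where every other candidate in the batch serves as a negative for a given query. The function name is hypothetical, and this shows the general idea rather than the library's implementation, which may differ in details such as the reduction and sampling corrections.

```python
import numpy as np


def in_batch_softmax_loss(query_embeddings, candidate_embeddings):
  """Mean cross-entropy where the positive for query i is candidate i."""
  # scores[i, j] is the affinity of query i for candidate j.
  scores = query_embeddings @ candidate_embeddings.T
  # Numerically stable log-softmax over each row.
  shifted = scores - scores.max(axis=1, keepdims=True)
  log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
  # The diagonal holds the log-probability assigned to each true pair.
  return -np.mean(np.diag(log_probs))


rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 32))

# Candidates aligned with their queries should give a much lower loss
# than unrelated random candidates.
aligned_loss = in_batch_softmax_loss(queries, 5.0 * queries)
random_loss = in_batch_softmax_loss(queries, rng.normal(size=(8, 32)))

print(aligned_loss, random_loss)
```

The loss falls as each query embedding moves closer to its own positive candidate and away from the other candidates in the batch.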

In [None]:
model = MovielensModel(embedding_dimension, timestamp_buckets)
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.1))

Then shuffle, batch, and cache the training and evaluation data.

In [None]:
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(4096).cache()

Then train the model:

In [None]:
model.fit(cached_train, epochs=3)

### Results

What do the results look like?

In [None]:
model.evaluate(cached_test, return_dict=True)

The top-k accuracies are better, for every value of k, than those of the model from the basic retrieval tutorial. This suggests that accounting for time is useful.
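
As a reminder of what a top-k accuracy measures, here is a minimal NumPy sketch: for each query, score every candidate and check whether the true candidate is among the k highest scores. The function name and the tiny two-dimensional embeddings are made up for illustration.

```python
import numpy as np


def top_k_accuracy(query_embeddings, candidate_embeddings, true_indices, k):
  """Fraction of queries whose true candidate is among the k best scores."""
  scores = query_embeddings @ candidate_embeddings.T
  # Indices of the k best-scoring candidates per query; their order within
  # the top k does not matter for this metric.
  top_k = np.argpartition(-scores, k - 1, axis=1)[:, :k]
  hits = (top_k == true_indices[:, None]).any(axis=1)
  return hits.mean()


# Three queries and four candidates with made-up embeddings.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
queries = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0]])
true_indices = np.array([0, 0, 1])

accuracy_at_1 = top_k_accuracy(queries, candidates, true_indices, k=1)
accuracy_at_2 = top_k_accuracy(queries, candidates, true_indices, k=2)

print(accuracy_at_1, accuracy_at_2)
```

Here the first query ranks its true candidate first, the second ranks it second, and the third misses the top two entirely, so the accuracy rises from 1/3 at k=1 to 2/3 at k=2.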

When using timestamp information, the performance of the model varies with the number of buckets we use when bucketizing the timestamps. Hence it is important to choose the right number of buckets.

In this tutorial we used 1000 buckets and significantly improved the performance of the model. However, in an experiment with 100 buckets, the model using timestamp information did not perform significantly better than the model from the basic retrieval tutorial.
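
To get a feel for how the bucket count changes the time resolution, the sketch below computes the width of one bucket for two settings. The 200-day span is a made-up assumption; the real span comes from the minimum and maximum timestamps in the dataset.

```python
import numpy as np

SECONDS_PER_DAY = 86_400

# Hypothetical span of 200 days, expressed in seconds.
min_ts = 0
max_ts = 200 * SECONDS_PER_DAY

bucket_width_days = {}
for num_boundaries in (100, 1000):
  boundaries = np.linspace(min_ts, max_ts, num=num_boundaries)
  # Boundaries are evenly spaced, so any adjacent pair gives the width.
  width = (boundaries[1] - boundaries[0]) / SECONDS_PER_DAY
  bucket_width_days[num_boundaries] = width
  print(f"{num_boundaries} boundaries: {width:.2f} days per bucket")
```

Under this assumption, moving from 100 to 1000 boundaries shrinks each bucket from about two days to a few hours, at the cost of ten times as many time embeddings to learn.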