https://www.tensorflow.org/recommenders/examples/basic_ranking

Real-world recommender systems are often composed of two stages:

- The retrieval stage is responsible for selecting an initial set of hundreds of candidates from all possible candidates. The main objective of this model is to efficiently weed out all candidates that the user is not interested in. Because the retrieval model may be dealing with millions of candidates, it has to be computationally efficient.

- The ranking stage takes the outputs of the retrieval model and fine-tunes them to select the best possible handful of recommendations. Its task is to narrow down the set of items the user may be interested in to a shortlist of likely candidates.

We're going to focus on the second stage, ranking. If you are interested in the retrieval stage, have a look at our retrieval tutorial.
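
To make the division of labour between the two stages concrete, here is a minimal NumPy sketch. Everything in it is an assumption made for illustration: the toy vectors, the `retrieve` and `rank` helpers, and the `tanh` scorer that stands in for a learned ranking model. It is not the model built in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 1,000 candidate items and one user, each a 16-d vector.
item_vectors = rng.normal(size=(1_000, 16))
user_vector = rng.normal(size=16)

def retrieve(user, items, k=100):
    """Stage 1: cheap dot-product scoring over every candidate, keep the top k."""
    scores = items @ user
    # argpartition finds the k largest scores without fully sorting all of them.
    return np.argpartition(-scores, k - 1)[:k]

def rank(user, items, candidate_ids, top_n=5):
    """Stage 2: a costlier scorer (a stand-in here), applied to the shortlist only."""
    shortlist = items[candidate_ids]
    # Placeholder for a learned ranking model: a nonlinear function of the pair.
    scores = np.tanh(shortlist * user).sum(axis=1)
    order = np.argsort(-scores)[:top_n]
    return candidate_ids[order]

candidates = retrieve(user_vector, item_vectors, k=100)
recommendations = rank(user_vector, item_vectors, candidates, top_n=5)
print(len(candidates), recommendations)
```

The ranking function only ever sees the 100 retrieved candidates, which is why it can afford to be more expensive per item than the retrieval step.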

In this tutorial, we're going to:

- Get our data and split it into a training and test set.
- Implement a ranking model.
- Fit and evaluate it.

In [1]:
import os
import pprint
import tempfile

from typing import Dict, Text

import numpy as np
import pandas as pd

In [2]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [3]:
import tensorflow_recommenders as tfrs

### Preparing the dataset

In [19]:
def load_data_file_cold(file, stats):
    # `stats` is unused while the send_stats call below stays commented out.
    print('loading file:' + file)
    training_df = pd.read_csv(
        file,
        skiprows=[0],  # skip the header row; column names are supplied below
        names=[
            "viewer", "broadcaster", "viewer_age", "viewer_gender",
            "viewer_longitude", "viewer_latitude", "viewer_lang", "viewer_country",
            "broadcaster_age", "broadcaster_gender", "broadcaster_longitude",
            "broadcaster_latitude", "broadcaster_lang", "broadcaster_country",
            "duration", "viewer_network", "broadcaster_network", "count",
        ],
        # np.unicode and np.int are deprecated aliases that recent NumPy
        # releases no longer provide; the built-in str and int are equivalent.
        dtype={
            'viewer': str,
            'broadcaster': str,
            'viewer_age': np.single,
            'viewer_gender': str,
            'viewer_longitude': np.single,
            'viewer_latitude': np.single,
            'viewer_lang': str,
            'viewer_country': str,
            'broadcaster_age': np.single,
            'broadcaster_longitude': np.single,
            'broadcaster_latitude': np.single,
            'broadcaster_lang': str,
            'broadcaster_country': str,
            'viewer_network': str,
            'broadcaster_network': str,
            'count': int
        })

    # Per-column defaults used to fill missing values.
    values = {
        'viewer': 'unknown',
        'broadcaster': 'unknown',
        'viewer_age': 30,
        'viewer_gender': 'unknown',
        'viewer_longitude': 0,
        'viewer_latitude': 0,
        'viewer_lang': 'unknown',
        'viewer_country': 'unknown',
        'broadcaster_age': 30,
        'broadcaster_longitude': 0,
        'broadcaster_latitude': 0,
        'broadcaster_lang': 'unknown',
        'broadcaster_country': 'unknown',
        'duration': 0,
        'viewer_network': 'unknown',
        'broadcaster_network': 'unknown',
        'count': 0
    }
    training_df.fillna(value=values, inplace=True)
#     print(training_df.head(10))
#     print(training_df.iloc[-10:])
#     stats.send_stats('data-size', len(training_df.index))

    # Keep a random 10% of the rows (no seed, so the sample differs per run).
    sampled_df = training_df.sample(frac=0.1)
    print(sampled_df.head(10))
    print(sampled_df.iloc[-10:])
    return sampled_df

def load_training_data_cold(file, stats):
    ratings_df = load_data_file_cold(file, stats)
    print('creating data set')
    training_ds = (
        tf.data.Dataset.from_tensor_slices(
            ({
                "viewer": tf.cast(
                    ratings_df['viewer'].values,
                    tf.string),
                "viewer_gender": tf.cast(
                    ratings_df['viewer_gender'].values,
                    tf.string),
                "viewer_lang": tf.cast(
                    ratings_df['viewer_lang'].values,
                    tf.string),
                "viewer_country": tf.cast(
                    ratings_df['viewer_country'].values,
                    tf.string),
                "viewer_age": tf.cast(
                    ratings_df['viewer_age'].values,
                    tf.int16),
                "viewer_longitude": tf.cast(
                    ratings_df['viewer_longitude'].values,
                    tf.float16),
                "viewer_latitude": tf.cast(
                    ratings_df['viewer_latitude'].values,
                    tf.float16),
                "broadcaster": tf.cast(
                    ratings_df['broadcaster'].values,
                    tf.string),
                "viewer_network": tf.cast(
                    ratings_df['viewer_network'].values,
                    tf.string),
                "broadcaster_network": tf.cast(
                    ratings_df['broadcaster_network'].values,
                    tf.string),
                "duration": tf.cast(
                    ratings_df['duration'].values,
                    tf.float16),
                "count": tf.cast(
                    ratings_df['count'].values,
                    tf.int16),
            })))

    return training_ds
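
The per-column defaults passed to `fillna` can be tried out on a tiny frame. This is a sketch only: the three rows are made up, and just a few of the column names from the loader are reused.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample using a few of the column names from above.
df = pd.DataFrame({
    "viewer": ["meetme:1", None, "skout:3"],
    "viewer_age": [37.0, np.nan, 22.0],
    "count": [4.0, 1.0, np.nan],
})

# One default per column, mirroring the `values` dict in load_data_file_cold.
defaults = {"viewer": "unknown", "viewer_age": 30, "count": 0}
filled = df.fillna(value=defaults)

# After filling, the label column can be cast to an integer type safely.
filled["count"] = filled["count"].astype("int64")
print(filled)
```

Each missing cell takes the default registered for its own column, so string columns get `'unknown'` while numeric columns get a number.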

In [20]:
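def convert_to_dense(x, dims):
    # Densify a 2-D SparseTensor to shape [batch, dims]; dense inputs are only reshaped.
    if isinstance(x, tf.SparseTensor):
        default_value = "" if x.dtype == tf.string else 0
        # tf.sparse_to_dense is not available in TF 2.x; tf.sparse.to_dense replaces it.
        resized = tf.sparse.reset_shape(x, [x.dense_shape[0], dims])
        dense = tf.sparse.to_dense(resized, default_value=default_value)
        return tf.reshape(tf.squeeze(dense), [-1, dims])
    else:
        return tf.reshape(tf.squeeze(x), [-1, dims])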

In [21]:
ratings = load_training_data_cold(file="a3d86f3b-eb45-4641-b05d-30dff7423e6b.csv", stats="")

ratings = ratings.map(lambda x: {
    "broadcaster": x["broadcaster"],
    "viewer": x["viewer"],
    "count": x["count"]
})

loading file:a3d86f3b-eb45-4641-b05d-30dff7423e6b.csv
                   viewer       broadcaster  viewer_age viewer_gender  \
4334600  meetme:317107180  meetme:277122109        37.0        female   
5163378  meetme:317885958  meetme:285610962        42.0          male   
1207811     pof:206886313  meetme:176496069        28.0          male   
2409986   skout:171062405   skout:182827501        53.0          male   
4311241   meetme:95307095     pof:322045884        33.0          male   
2022693   skout:163687348   skout:181243363        22.0          male   
4039038   skout:178025563  meetme:214440162        65.0        female   
3850798   skout:178189061  meetme:271665969        30.0          male   
4815759     pof:113662751   skout:168883218        44.0          male   
580640    skout:178190394   skout:155616729        47.0        female   

         viewer_longitude  viewer_latitude viewer_lang viewer_country  \
4334600        -99.048599        19.325701          es             US

In [7]:
tf.random.set_seed(42)
shuffled = ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)
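
The `take`/`skip` split above uses absolute counts (80,000 and 20,000), not fractions, so it consumes exactly 100,000 examples regardless of how many rows the sampled file holds. A rough NumPy equivalent, with a hypothetical `split_indices` helper and a made-up row count, looks like this:

```python
import numpy as np

def split_indices(n_rows, n_train=80_000, n_test=20_000, seed=42):
    """Shuffle row indices once, then carve out fixed-size train and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    train_idx = order[:n_train]                  # like shuffled.take(80_000)
    test_idx = order[n_train:n_train + n_test]   # like .skip(80_000).take(20_000)
    return train_idx, test_idx

# Hypothetical row count; the real value depends on the sampled CSV.
train_idx, test_idx = split_indices(n_rows=500_000)
print(len(train_idx), len(test_idx))
```

One difference worth keeping in mind: `rng.permutation` shuffles all rows, whereas `Dataset.shuffle` with a buffer of 100,000 only mixes elements within that buffer, so on a larger dataset the tf.data version is an approximate shuffle. Setting `reshuffle_each_iteration=False` keeps the order fixed across iterations, which is what stops the train and test sets from overlapping.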

In [8]:
broadcaster_ids = ratings.batch(1_000_000).map(lambda x: x["broadcaster"])
user_ids = ratings.batch(1_000_000).map(lambda x: x["viewer"])

Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module 'gast' has no attribute 'Constant'

In [9]:
unique_broadcaster_ids = np.unique(np.concatenate(list(broadcaster_ids)))
unique_broadcaster_ids

array([b'meetme:100081867', b'meetme:100104254', b'meetme:100114731', ...,
       b'zoosk:fd7a32868c43ed589efba6653949426c',
       b'zoosk:ff0ba42fa32cddbec949c96694895fe2',
       b'zoosk:ffd69ee0bb59b722020f374298b9e0b9'], dtype=object)

In [10]:
unique_user_ids = np.unique(np.concatenate(list(user_ids)))
unique_user_ids[:10]

array([b'meetme:100030793', b'meetme:100081867', b'meetme:100116030',
       b'meetme:10015227', b'meetme:100155917', b'meetme:100190086',
       b'meetme:100196265', b'meetme:100197023', b'meetme:100198242',
       b'meetme:100200365'], dtype=object)
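
These sorted, de-duplicated arrays become the vocabularies for the string lookup layers in the model. The sketch below imitates the mapping in plain Python, assuming the configuration used later (no mask token, one out-of-vocabulary slot at index 0); the `lookup` helper and the unseen ID are made up for illustration.

```python
import numpy as np

# A few IDs in the style of the output above, with a duplicate.
raw_ids = np.array([b"meetme:100116030", b"meetme:100030793",
                    b"meetme:100116030", b"meetme:100081867"])

# np.unique returns the distinct values in sorted order.
vocabulary = np.unique(raw_ids)

# Index 0 is reserved for out-of-vocabulary IDs; known IDs start at 1.
token_to_index = {token: i + 1 for i, token in enumerate(vocabulary.tolist())}

def lookup(token):
    return token_to_index.get(token, 0)

print(lookup(b"meetme:100081867"), lookup(b"pof:never-seen"))  # 2 0
```

The reserved index is the reason the embedding tables are created with `len(vocabulary) + 1` rows.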

### Candidate / Ranking model

In [11]:
class RankingModel(tf.keras.Model):

    def __init__(self):
        super().__init__()
        embedding_dimension = 32

        # Compute embeddings for viewers (users).
        # StringLookup lives at tf.keras.layers from TF 2.6; older releases
        # expose it under tf.keras.layers.experimental.preprocessing.
        self.user_embeddings = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_user_ids, mask_token=None),
            # One extra row for the out-of-vocabulary index.
            tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
        ])

        # Compute embeddings for broadcasters.
        self.broadcaster_embedding = tf.keras.Sequential([
            tf.keras.layers.StringLookup(
                vocabulary=unique_broadcaster_ids, mask_token=None),
            tf.keras.layers.Embedding(len(unique_broadcaster_ids) + 1, embedding_dimension)
        ])

        # Compute predictions.
        self.ratings = tf.keras.Sequential([
            # Learn multiple dense layers.
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            # Make rating predictions in the final layer.
            tf.keras.layers.Dense(1)
        ])

    def call(self, inputs):
        user_id, broadcaster = inputs

        user_embedding = self.user_embeddings(user_id)
        broadcaster_embedding = self.broadcaster_embedding(broadcaster)

        # Concatenate the two embeddings and score the pair.
        return self.ratings(tf.concat([user_embedding, broadcaster_embedding], axis=1))
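
To follow the tensor shapes through the model, here is a NumPy sketch of the same forward pass: look up two 32-dimensional embeddings, concatenate them, and push the result through layers of width 256, 64 and 1. The weights are random and the vocabulary sizes are toy values, so the scores mean nothing; only the shapes mirror the Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_dim, n_users, n_broadcasters = 32, 5, 7   # toy vocabulary sizes

# One extra row in each table for the out-of-vocabulary index 0.
user_table = rng.normal(size=(n_users + 1, embedding_dim))
broadcaster_table = rng.normal(size=(n_broadcasters + 1, embedding_dim))

def dense(x, w, b, relu=True):
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

# Randomly initialised weights with the same layer widths: 64 -> 256 -> 64 -> 1.
w1, b1 = rng.normal(size=(2 * embedding_dim, 256)) * 0.05, np.zeros(256)
w2, b2 = rng.normal(size=(256, 64)) * 0.05, np.zeros(64)
w3, b3 = rng.normal(size=(64, 1)) * 0.05, np.zeros(1)

def score(user_idx, broadcaster_idx):
    pair = np.concatenate(
        [user_table[user_idx], broadcaster_table[broadcaster_idx]], axis=1)
    hidden = dense(dense(pair, w1, b1), w2, b2)
    return dense(hidden, w3, b3, relu=False)

batch_scores = score(np.array([1, 2, 0]), np.array([3, 3, 7]))
print(batch_scores.shape)  # (3, 1)
```

Each (viewer, broadcaster) pair in the batch yields a single unbounded score, which is why the final layer has one unit and no activation.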

### Loss and metrics

In [12]:
task = tfrs.tasks.Ranking(
  loss = tf.keras.losses.MeanSquaredError(),
  metrics=[tf.keras.metrics.RootMeanSquaredError()]
)

### Full model

In [13]:
# The class name is kept from the MovieLens tutorial this notebook adapts;
# here the ranked items are broadcasters.
class MovielensModel(tfrs.models.Model):

    def __init__(self):
        super().__init__()
        self.ranking_model: tf.keras.Model = RankingModel()
        self.task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()]
        )

    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        return self.ranking_model(
            (features["viewer"], features["broadcaster"]))

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # The interaction count is the regression label.
        labels = features.pop("count")

        rating_predictions = self(features)

        # The task computes the loss and the metrics.
        return self.task(labels=labels, predictions=rating_predictions)

### Fitting and evaluating

In [14]:
model = MovielensModel()
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=0.001))

In [15]:
cached_train = train.shuffle(100_000).batch(8192).cache()
cached_test = test.batch(100_000).cache()
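
With these batch sizes, the number of optimizer steps per epoch is small. The sketch below works it out, assuming the split really delivered 80,000 training and 20,000 test examples (that is, the sampled data had at least 100,000 rows) and that the final partial batch is kept, which is the `batch` default. The `batch_sizes` helper is hypothetical.

```python
def batch_sizes(n_examples, batch_size):
    """Sizes of the batches produced when the final partial batch is kept."""
    n_full, remainder = divmod(n_examples, batch_size)
    return [batch_size] * n_full + ([remainder] if remainder else [])

train_batches = batch_sizes(80_000, 8192)
test_batches = batch_sizes(20_000, 100_000)
print(len(train_batches), train_batches[-1])  # 10 6272
print(len(test_batches), test_batches[-1])    # 1 20000
```

So each epoch runs ten gradient steps, and the whole test set is evaluated as a single batch. Because `cache()` comes after `shuffle()`, the order produced during the first epoch is what gets cached; reshuffling every epoch would need the shuffle placed after the cache.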

In [16]:
model.fit(cached_train, epochs=10)

Epoch 1/10
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: 'arguments' object has no attribute 'posonlyargs'

<keras.callbacks.History at 0x7f8d3d8c6fd0>

In [17]:
model.evaluate(cached_test, return_dict=True)

{'root_mean_squared_error': 4.596546173095703,
 'loss': 21.12823486328125,
 'regularization_loss': 0,
 'total_loss': 21.12823486328125}
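
As a quick sanity check on these numbers: the loss is a mean squared error, and the test set was evaluated as one batch, so the reported RMSE should be the square root of the reported loss. The snippet only re-uses the two values printed above.

```python
import math

# Values copied from the evaluation output above.
reported_loss = 21.12823486328125
reported_rmse = 4.596546173095703

# With a mean-squared-error loss, RMSE should be the square root of the loss
# when both are computed over the same single batch.
assert math.isclose(math.sqrt(reported_loss), reported_rmse, rel_tol=1e-5)
print(math.sqrt(reported_loss))
```

The RMSE is in the same units as the `count` label, so the model's predictions are off by roughly 4.6 counts in the root-mean-square sense.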

### Testing the ranking model

In [29]:
test_ratings = {}
test_broadcasters = ['pof:312971369', 'pof:312971368', 'pof:312971367']
for broadcaster_id in test_broadcasters:
    test_ratings[broadcaster_id] = model({
        "viewer": np.array(["meetme:100116030"]),
        "broadcaster": np.array([broadcaster_id])
    })

print("Ratings:")
# Highest predicted score first.
for broadcaster_id, score in sorted(test_ratings.items(), key=lambda x: x[1], reverse=True):
    print(f"{broadcaster_id}: {score}")

Ratings:
pof:312971369: [[0.34892666]]
pof:312971368: [[0.33957207]]
pof:312971367: [[0.33957207]]
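
The same ordering can be reproduced on plain floats with `np.argsort`, which is handy once there are more than a few candidates. The IDs and scores below are copied from the output above; a stable sort is used so tied candidates keep their input order.

```python
import numpy as np

# Broadcaster IDs and scores copied from the output above.
broadcasters = np.array(["pof:312971369", "pof:312971368", "pof:312971367"])
scores = np.array([0.34892666, 0.33957207, 0.33957207])

# A stable sort on the negated scores gives descending order and keeps
# tied candidates in their original order.
order = np.argsort(-scores, kind="stable")
for broadcaster_id, score in zip(broadcasters[order], scores[order]):
    print(f"{broadcaster_id}: {score:.8f}")
```

Two of the three broadcasters receive exactly the same score. One possible explanation, which the output alone cannot confirm, is that both IDs are missing from the vocabulary and therefore share the out-of-vocabulary embedding.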


### Exporting for serving

In [31]:
tf.saved_model.save(model, "export")

INFO:tensorflow:Assets written to: export/assets

In [32]:
loaded = tf.saved_model.load("export")
loaded({"viewer": np.array(["meetme:100116030"]), "broadcaster": np.array(["pof:312971369"])}).numpy()

array([[0.34892666]], dtype=float32)