Save & Serve tfrs that does not recommend items previously interacted with using BruteForce layer #400

yrianderreumaux · 2021-11-04T18:24:23Z

I am working with a relatively small data set and am noticing that users are being recommended a lot of items they have previously interacted with. I would like to pre-filter these items before saving and serving the model on AI Platform. I realize this could also be done after the list has been generated, but the way our app works makes it rather difficult (and slow) to filter post hoc on the client side.

I have seen a few other issues addressing related questions (e.g., 307, 113). However, I have not seen a definitive solution, and these seem to deal with either excluding a set of items for all users (rather than a unique set for each user), or excluding items from test recommendations.

Currently, I generate an index with 80 recommended items:

index = tfrs.layers.factorized_top_k.BruteForce(model.user_model, k=80)

and then remove any duplicates that existed in the training df:

index.index_from_dataset(
    tf.data.Dataset.zip((
        unique_recipe_id_pred.batch(80),
        unique_recipe_id_pred.batch(80).map(model.recipe_model),
    ))
)

This works well, but how could I also exclude items in the index that users have interacted with? Could the query_with_exclusions function be a possible solution?

Apologies if I have missed something here and thanks in advance for any advice!


msvensson222 · 2021-11-04T19:54:50Z

I'm not sure why you would not use the query_with_exclusions feature; it seems to fit your use case. However, if for some reason you cannot do that, one idea that might work is to make your own version of an "index": get the user and recipe embeddings, and do the math yourself.

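import numpy as np
import tensorflow as tf

# First, select all recipes that your user(s) have not interacted with previously
# (assumed here to be a tf.data.Dataset of recipe ids)
possible_candidates_for_a_user = ...

# Get candidate embeddings for these recipes
candidate_embeddings = recipe_model.predict(possible_candidates_for_a_user.batch(256))
# Keep the recipe ids in an array aligned with the embedding rows
candidates = np.array(list(possible_candidates_for_a_user.as_numpy_iterator()))

# Get user embeddings (customer is assumed to be an array of customer ids)
user_dict = {"customer_no": customer}  # Add what other user features you might have as well
user_embeddings = user_model.predict(tf.data.Dataset.from_tensor_slices(user_dict).batch(256))

# Get top-k (k must not exceed the number of candidates)
k = 80

# Get a score for each recipe
scores = np.dot(user_embeddings, candidate_embeddings.T)
# Put the k best scores in the last k columns (ascending), then reverse them so the best comes first
indices = np.argpartition(scores, range(-k, 0), axis=1)[:, :-(k + 1):-1]

top_k_recipes_to_recommend = candidates[indices]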

Thoughts?

yrianderreumaux · 2021-11-05T21:19:36Z

Thanks @msvensson222 for engaging with the question!
Your proposed solution is a creative one and something I would not have considered.

Two quick follow-up questions: (1) I can see how this would work with a single user, but it's still not clear to me how it would work with a set of users who have each interacted with a different list of recipes. In that case, would possible_candidates_for_a_user be a dictionary with each unique user id referencing a list of all previously seen recipe ids? (2) It's important for me to save the model as a graph that takes in raw JSON user ids so that it can be hosted on AI Platform for on-demand predictions. How could this version of an index be saved in such a way?
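On question (1), one possible way to handle many users at once is to score every user against the full candidate matrix and then mask each user's seen items before taking the top k. The sketch below is only an illustration under assumptions: the function name top_k_unseen, the toy embeddings, and the per-user list of sets are all made up here and are not part of TFRS or of the snippet above.

```python
import numpy as np

def top_k_unseen(user_emb, item_emb, item_ids, seen, k):
    """Return the k best-scoring item ids per user, skipping seen items.

    user_emb: [num_users, dim] array; item_emb: [num_items, dim] array.
    item_ids: [num_items] array of ids aligned with the rows of item_emb.
    seen: one set of already-seen item ids per user, in user_emb row order.
    """
    scores = user_emb @ item_emb.T                      # [num_users, num_items]
    column = {item: j for j, item in enumerate(item_ids.tolist())}
    for u, items in enumerate(seen):
        cols = [column[i] for i in items if i in column]
        if cols:
            scores[u, cols] = -np.inf                   # seen items rank last
    top = np.argsort(-scores, axis=1)[:, :k]            # best score first
    return item_ids[top]

# Toy data (hypothetical): one-dimensional embeddings so the ranking is obvious.
item_ids = np.array(["a", "b", "c", "d"])
item_emb = np.array([[4.0], [3.0], [2.0], [1.0]])
user_emb = np.array([[1.0], [1.0]])
seen = [{"a"}, {"b", "c"}]

recs = top_k_unseen(user_emb, item_emb, item_ids, seen, k=2)
print(recs)
```

Note that if a user has fewer than k unseen items, masked items will still fill the remaining slots, so a real implementation would need to handle that case.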

patrickorlando · 2021-11-18T01:51:21Z

You need to track the state of what users have interacted with separately from the model.
If for each user you maintain a list of item_ids that the user has interacted with, you can then filter them out in your caller API, or wrap the BruteForce Index in a model with some logic to remove them.
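A rough sketch of the second option, wrapping retrieval and filtering in one exportable object, might look like the following. It is not the TFRS BruteForce layer: it is a plain tf.Module written for illustration, and it assumes precomputed user embeddings, a boolean seen-mask with one row per user, and that every queried user id is known. The class name UnseenTopK and all of the toy data are hypothetical.

```python
import tensorflow as tf

class UnseenTopK(tf.Module):
    """Brute-force retrieval that hides items each user has already seen."""

    def __init__(self, user_ids, user_emb, item_ids, item_emb, seen_mask, k):
        super().__init__()
        self.k = k
        self.item_ids = tf.constant(item_ids)                      # [items]
        self.item_emb = tf.constant(item_emb, dtype=tf.float32)    # [items, dim]
        self.user_emb = tf.constant(user_emb, dtype=tf.float32)    # [users, dim]
        self.seen_mask = tf.constant(seen_mask, dtype=tf.bool)     # [users, items]
        # Maps a raw string user id to its row in user_emb / seen_mask.
        self.user_row = tf.lookup.StaticHashTable(
            tf.lookup.KeyValueTensorInitializer(
                tf.constant(user_ids),
                tf.range(len(user_ids), dtype=tf.int64)),
            default_value=-1)

    @tf.function(input_signature=[tf.TensorSpec([None], tf.string)])
    def recommend(self, raw_user_ids):
        # Assumption: all ids are known; unknown ids would map to row -1.
        rows = self.user_row.lookup(raw_user_ids)
        scores = tf.matmul(tf.gather(self.user_emb, rows),
                           self.item_emb, transpose_b=True)
        seen = tf.gather(self.seen_mask, rows)
        # Replace the score of every seen item with -inf so it ranks last.
        neg_inf = tf.constant(float("-inf"), dtype=scores.dtype)
        scores = tf.where(seen, neg_inf, scores)
        top_scores, top_idx = tf.math.top_k(scores, k=self.k)
        return {"scores": top_scores,
                "ids": tf.gather(self.item_ids, top_idx)}

# Toy data (hypothetical): one-dimensional embeddings, two users, four items.
model = UnseenTopK(
    user_ids=["u1", "u2"],
    user_emb=[[1.0], [1.0]],
    item_ids=["a", "b", "c", "d"],
    item_emb=[[4.0], [3.0], [2.0], [1.0]],
    seen_mask=[[True, False, False, False],
               [False, True, True, False]],
    k=2)

out = model.recommend(tf.constant(["u2", "u1"]))
print(out["ids"].numpy())
```

Because recommend has a fixed input signature of raw string ids, an object like this could probably be exported with tf.saved_model.save(model, export_dir, signatures={"serving_default": model.recommend}). The catch is that the seen-mask is frozen at export time, so the model would need to be re-exported whenever interactions change.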

maciejkula · 2021-11-18T01:58:45Z

Patrick's answer is spot on: ultimately, every real production system has a business logic layer that sits between the model and the user. This layer will normally keep track of things like past interactions, and filter those out of recommendations if that makes sense for the product in question.
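As a minimal sketch of such a layer, assuming the served model can be asked for an arbitrary number of ranked items: over-fetch by the number of items the user has already seen, then drop those items and truncate to k. The names recommend_unseen, fetch_top_n, and seen_by_user are hypothetical stand-ins for the real model call and interaction store.

```python
from typing import Callable, Dict, List, Set

def recommend_unseen(
    user_id: str,
    k: int,
    fetch_top_n: Callable[[str, int], List[str]],
    seen_by_user: Dict[str, Set[str]],
) -> List[str]:
    """Return up to k ranked items the user has not interacted with."""
    seen = seen_by_user.get(user_id, set())
    # Ask for k + len(seen) items so that at least k remain after filtering
    # (provided the catalogue is large enough).
    ranked = fetch_top_n(user_id, k + len(seen))
    return [item for item in ranked if item not in seen][:k]

# Stand-in for a call to the served model (hypothetical fixed ranking).
def fake_fetch_top_n(user_id: str, n: int) -> List[str]:
    return ["a", "b", "c", "d", "e"][:n]

seen_by_user = {"u1": {"a", "c"}}

known_user_recs = recommend_unseen("u1", 2, fake_fetch_top_n, seen_by_user)
new_user_recs = recommend_unseen("u9", 2, fake_fetch_top_n, seen_by_user)
print(known_user_recs, new_user_recs)
```

Running this filter on the server next to the model, rather than in the client app, keeps the extra items from being sent over the network.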

yrianderreumaux · 2021-11-19T00:03:18Z

@maciejkula & @patrickorlando, thank you both for your thoughts on this. Could you point me to any code for the second option, wrapping the BruteForce index in a model with filtering logic? I fear my technical skills are not sufficient to figure it out alone. In the meantime, following your suggestion, we have added a middle (business) layer using React Native to filter out previously interacted-with items. It slows us down by ~0.5 seconds per call, but it has solved the problem.
