
Inference at test time? #5

Closed
smeyerhot opened this issue Sep 28, 2021 · 3 comments

Comments

@smeyerhot

How should the FM be used to make predictions? For example, say I train this model on 1000 user-movie pairs. I want to make a prediction for an unseen user: a vector whose values are the predicted ratings for all possible movies. However, in the examples it looks like the same users are used for both training and testing, i.e., for user A the model trains on 80% of the known movie ratings and then tries to predict the remaining 20%. How should we call the model when we want to predict 80% of the ratings for an unseen user, i.e., one not in the training set?

In other words, I would like to take a vector of length n in which m ratings are known and infer the remaining n - m. Would I have to include the m known ratings in the training set?

@smeyerhot changed the title from "Test time?" to "Inference at test time?" on Sep 28, 2021
@tohtsky
Owner

tohtsky commented Sep 29, 2021

So you are considering a pure matrix factorization model (i.e., the only features are user IDs and item IDs), right?
In that case, as you said, you have to include the user's known ratings in the training set.
Is there any reason why you can't do that? (Does training take too much time?)
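To see why the user's own ratings must be in the training data, it helps to write out what an FM computes when its only features are a one-hot user ID and a one-hot item ID. The pure-Python sketch below is not this library's API, and its parameters are random, hypothetical values rather than a trained model. It evaluates the standard second-order FM equation on such a two-hot vector and checks that the result reduces to biased matrix factorization, w0 + w_u + w_i + <v_u, v_i>. A user ID that never appeared in training has no fitted w_u or v_u, so there is nothing to plug in for that user.

```python
import random


def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction for a dense feature vector x.

    Uses the identity
        sum_{j<l} <v_j, v_l> x_j x_l = 0.5 * sum_f [(sum_j v_jf x_j)^2 - sum_j (v_jf x_j)^2]
    """
    linear = w0 + sum(wj * xj for wj, xj in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[j][f] * x[j] for j in range(len(x)))
        s2 = sum((V[j][f] * x[j]) ** 2 for j in range(len(x)))
        pairwise += 0.5 * (s * s - s2)
    return linear + pairwise


def one_hot_user_item(user, item, n_users, n_items):
    """Feature vector with exactly two active entries: the user ID and the item ID."""
    x = [0.0] * (n_users + n_items)
    x[user] = 1.0
    x[n_users + item] = 1.0
    return x


# Hypothetical toy parameters (random, not from any trained model).
rng = random.Random(0)
n_users, n_items, k = 3, 4, 2
n_features = n_users + n_items
w0 = 3.5
w = [rng.gauss(0, 0.1) for _ in range(n_features)]
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_features)]

user, item = 1, 2
x = one_hot_user_item(user, item, n_users, n_items)
fm = fm_predict(x, w0, w, V)

# Biased matrix factorization: global bias + user bias + item bias + <v_u, v_i>.
v_u, v_i = V[user], V[n_users + item]
mf = w0 + w[user] + w[n_users + item] + sum(a * b for a, b in zip(v_u, v_i))
print(abs(fm - mf) < 1e-12)  # True
```

The sketch loops over every feature for readability; a practical implementation would visit only the non-zero entries of x.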

@smeyerhot
Author

Thanks for getting back to me!

Yes, a pure matrix factorization model. No reason right now, but I am worried that it may be expensive to train a new model every time I want to make a new recommendation. I guess that is necessary, though.

Just to recap, I have n items and m < n ratings. I should pass in a table like this:

| User ID | Item ID |
|---------|---------|
| 1       | 1       |
| 1       | 2       |
| 1       | ...     |
| 1       | n - m   |

Here there are n - m rows, one for each unrated item (obviously they won't all be in order).
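Combining this recap with the requirement that the user's m known ratings be part of the training data gives two tables: the known ratings, appended to the training set so the model learns parameters for that user ID, and the n - m unrated (user, item) pairs to score after refitting. A minimal pure-Python sketch; `build_rows` and the toy IDs are hypothetical, and the library's actual fit/predict calls are left out because they depend on its API.

```python
def build_rows(user_id, known_ratings, all_item_ids):
    """Split one user's data into rows for training and rows for prediction.

    known_ratings: dict mapping item ID -> rating for the m rated items.
    all_item_ids:  iterable over all n item IDs.
    Returns (train_rows, predict_rows) where
      train_rows   = (user_id, item_id, rating) for the m known ratings,
                     to be appended to the existing training set before refitting;
      predict_rows = (user_id, item_id) for the n - m unrated items.
    """
    train_rows = [(user_id, item, r) for item, r in sorted(known_ratings.items())]
    predict_rows = [(user_id, item) for item in all_item_ids if item not in known_ratings]
    return train_rows, predict_rows


# Hypothetical example: n = 6 items, user 1 has rated m = 2 of them.
items = range(1, 7)
known = {2: 4.0, 5: 3.0}
train_rows, predict_rows = build_rows(1, known, items)
print(train_rows)    # [(1, 2, 4.0), (1, 5, 3.0)]
print(predict_rows)  # [(1, 1), (1, 3), (1, 4), (1, 6)]
```

Because the known ratings go into training, this approach does require a refit for each new user, which is the cost concern raised above.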

@tohtsky
Owner

tohtsky commented Sep 29, 2021

Thank you for clarifying the setting!
I still believe you have to include known ratings of the user in the training set.
I think this is a kind of cold-start problem.

For reference, a recent article (though it addresses the implicit-feedback setting), https://arxiv.org/abs/1911.07698, discusses a similar problem regarding the evaluation of the "Mult-VAE" model (page 31 of the latest version).
The workaround there (and in the references therein) is to pick other users (known at training time) who have similar rating logs and average their latent factors.
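The neighbor-averaging workaround can be sketched as follows. Several choices here are assumptions, not details taken from the cited article: rating logs are compared with cosine similarity over sparse rating dicts, the k most similar training users are averaged with equal weight, and scores use a plain dot product with no bias terms. The factor values are hypothetical placeholders for what a trained model would provide, and nothing here calls the library's API.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse rating dicts (item ID -> rating).

    Missing ratings count as 0, so only co-rated items contribute to the dot product.
    """
    dot = sum(r * b[i] for i, r in a.items() if i in b)
    na = math.sqrt(sum(r * r for r in a.values()))
    nb = math.sqrt(sum(r * r for r in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def neighbor_average_factors(new_ratings, train_ratings, user_factors, k):
    """Approximate an unseen user's latent vector by averaging the latent vectors
    of the k training users whose rating logs are most similar."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(
        ((cosine(new_ratings, ratings), user) for user, ratings in train_ratings.items()),
        reverse=True,
    )
    neighbors = [user for _, user in ranked[:k]]
    dim = len(user_factors[neighbors[0]])
    avg = [sum(user_factors[u][f] for u in neighbors) / len(neighbors) for f in range(dim)]
    return neighbors, avg


def predict(user_vec, item_vec):
    """Plain dot-product MF score (bias terms omitted in this sketch)."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


# Hypothetical toy data: ratings and factors "as if" learned at training time.
train_ratings = {
    "a": {1: 5.0, 2: 4.0, 3: 1.0},
    "b": {1: 4.0, 2: 5.0, 4: 2.0},
    "c": {3: 5.0, 4: 4.0, 5: 5.0},
}
user_factors = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0]}
item_factors = {1: [5.0, 0.5], 2: [4.5, 0.5], 3: [1.0, 4.5], 4: [1.5, 4.0], 5: [0.5, 5.0]}

new_user = {1: 5.0, 2: 5.0}  # m = 2 known ratings; this user was not in training
neighbors, vec = neighbor_average_factors(new_user, train_ratings, user_factors, k=2)
print(neighbors)  # ['a', 'b']
scores = {i: predict(vec, q) for i, q in item_factors.items() if i not in new_user}
```

No retraining is needed, which addresses the cost concern, but the result is an approximation: the new user's own m ratings only determine which neighbors are chosen, not the averaged factors themselves.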

@tohtsky closed this as completed on Jan 23, 2022

2 participants