issue found with cross term evaluation #1

Open
kgierach opened this issue Oct 9, 2015 · 0 comments

kgierach commented Oct 9, 2015

Hi Ruifeng,

I generated a dataset containing specific cross-term relationships and ran it through your FM engine. With Rendle’s libfm project I get the expected results; with your library, however, I get only noise, or rather false positives. Could you provide some usage tips, or would you be willing to run the dataset yourself and look for a possible error in the code?

I am using the Gradient Descent optimization algorithm.

All individual weights come out as zero, even after trying different values for the learning rate. I’m not actually concerned with the individual weights, though; I consider this a symptom of the underlying problem.

My goal is to locate outlier cross terms in the data. With your library I don’t get a single expected cross term, whereas with libfm I get all of the expected cross terms and only a handful of false positives.

Here’s my expected cross-term list:
1,7
3,9
5,10
6,12
14,15
16,17
19,20

My method for finding the cross terms is:

1. Take the output matrix F of the model.
2. Compute C = F * F_transpose.
3. Go through C row by row: compute the mean and variance of each row, assume a normal distribution, and look for the upper terms exceeding a threshold. If no terms are found, I gradually lower the threshold until I either find some “outliers” or reach a point where there are still none.
4. Examine the list of “outliers” for my cross terms of interest; I don’t care about the order within a pair.

Let me restate that this same method works with Rendle’s libfm engine. I have also tried replicating the algorithm parameters I used with his library when running your code in Spark.
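
For reference, the outlier search described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the exact code used for the results in this report. The function names, the starting threshold of 3 standard deviations, the step size, the lower bound on the threshold, the exclusion of the diagonal of C, and the 0-based feature indices are all assumptions made for the example.

```python
from statistics import mean, pstdev


def gram(F):
    """Return C = F * F_transpose for a factor matrix F given as a list of rows."""
    return [[sum(a * b for a, b in zip(u, v)) for v in F] for u in F]


def find_outlier_pairs(F, start=3.0, floor=1.0, step=0.25):
    """Return (pairs, threshold): unordered index pairs whose entry in C is an
    upper outlier within its row, and the z-score threshold that produced them.

    start, floor and step are hypothetical defaults, not values from the report.
    """
    C = gram(F)
    n = len(C)
    threshold = start
    while threshold >= floor:
        pairs = set()
        for i in range(n):
            # Assumption: skip the diagonal, which holds squared norms
            # of the factor vectors rather than cross terms.
            others = [(j, C[i][j]) for j in range(n) if j != i]
            values = [c for _, c in others]
            mu, sigma = mean(values), pstdev(values)
            if sigma == 0:
                continue  # a constant row has no outliers
            for j, c in others:
                if (c - mu) / sigma > threshold:
                    # Store pairs sorted so that (i, j) and (j, i) coincide.
                    pairs.add((min(i, j), max(i, j)))
        if pairs:
            return pairs, threshold
        threshold -= step  # nothing found: relax the threshold and retry
    return set(), None
```

Calling `find_outlier_pairs(F)` with F holding one row of factors per feature returns the candidate pairs, which can then be compared against the expected cross-term list regardless of order.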

Thank you,
Karl
