A couple of questions #1
Comments
Hello, The NNDescent algorithm, which is currently the most robust, was tested with a dataset of 4 million items (spam titles). For filtering, the cleanest way is to filter your RDD first. In any case, don't hesitate to keep me informed... Regards,
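A minimal sketch of the filter-first idea, in plain Python rather than Spark. The `keep` predicate, the sample vectors, and the brute-force `knn_graph` helper are illustrative assumptions and not part of the library; brute force stands in for NNDescent here only to keep the example self-contained.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_graph(nodes, k):
    """Brute-force k-NN graph: node id -> list of (neighbor id, similarity)."""
    graph = {}
    for i, vi in nodes.items():
        scored = [(j, cosine(vi, vj)) for j, vj in nodes.items() if j != i]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        graph[i] = scored[:k]
    return graph


# Hypothetical data: node id -> feature vector.
nodes = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}


# Hypothetical filter: drop node "d" before the graph is built.
def keep(node_id, vector):
    return node_id != "d"


# Filter first, then build the graph on what remains.
filtered = {i: v for i, v in nodes.items() if keep(i, v)}
graph = knn_graph(filtered, k=1)
```

Because the filter runs before graph construction, the removed nodes never appear as keys or as neighbors. With a Spark RDD the same step would be a `filter` call on the input before it is handed to the graph builder.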
Ok thanks - from a quick look through the code it appears that this could potentially scale out with item size if the number of buckets is made large enough. It will be interesting to confirm. What about For #2, what I meant was rather this: Say I have 20 million user vectors and 5 million item vectors. I'd like to compute the NN for item-item cosine similarity. This is easy, I just pass in Now say I would like to compute the user-item "similarity" (in reality I'd take the dot product, but cosine similarity could suffice, and I guess one could implement a simple dot product "similarity" too). I could pass in an So what I'd want is to apply some step so that, for each user vector, I compute the NN from the set of item vectors. Hope this is clearer.
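The user-to-item case described above can be sketched as a brute-force search in plain Python. This is not the library's API: `top_items_per_user`, the sample vectors, and the pluggable `score` argument are assumptions made for illustration, and an exhaustive scan like this would not scale to 20 million users without an approximate method.

```python
import heapq
from math import sqrt


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity (0.0 if either vector is all zeros)."""
    na, nb = sqrt(dot(a, a)), sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def top_items_per_user(users, items, k, score=cosine):
    """For each user vector, return the k best-scoring items.

    Only user-item pairs are scored; user-user and item-item pairs
    are never compared.
    """
    result = {}
    for uid, uvec in users.items():
        scored = ((score(uvec, ivec), iid) for iid, ivec in items.items())
        best = heapq.nlargest(k, scored)
        result[uid] = [(iid, s) for s, iid in best]
    return result


# Hypothetical data: id -> latent factor vector.
users = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
items = {"i1": [0.9, 0.1], "i2": [0.1, 0.9], "i3": [0.7, 0.7]}

by_cosine = top_items_per_user(users, items, k=2)
by_dot = top_items_per_user(users, items, k=2, score=dot)
```

Passing `score=dot` swaps cosine similarity for the raw dot product mentioned in the comment, without changing the search itself.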
Hello, The drawback of LSHSuperBit (and other partitioning algorithms) is that if For your second question, this use case is currently not possible, but I
What do you think about this? Regards, On Fri, Aug 28, 2015 at 9:45 AM, MLnick notifications@github.com wrote:
Ok, that makes sense. The interface for prefilter looks good, though it looks like Node can only use the id String to do the filter? In which case I'd need to hack the id a bit to be something like Or, can I define a value for the Node that is a
Hi, Just realized that you can achieve the same effect without this new You need to:
I posted an example on GitHub: Don't hesitate to let me know how it works for you... Regards, On Mon, Sep 7, 2015 at 9:52 AM, MLnick notifications@github.com wrote:
For 2, the use case would be, say, to compute user and item vectors in a CF model, then compute the NN vector set for item-item similarity as well as user-item scores.