Right now, the first line of CF::GetRecommendations() reads
rating = w * h
which has the issue that the full rating matrix, of size (number of items) x (number of users), must fit in the system's RAM. Then, we run tree-based kNN on the rating matrix, which is high-dimensional, so the search will be slow:
// Calculate the neighborhood of the queried users.
// This should be a templatized option.
neighbor::AllkNN a(rating, query);
arma::mat resultingDistances; // Temporary storage.
a.Search(numUsersForSimilarity, neighborhood, resultingDistances);
But this isn't necessary. Note that what we are trying to do is find the most similar users (columns), but we have decomposed the input matrix X = W * H. (H is the matrix that holds the user preferences, depending on how you look at it.)
Now, some quick linear algebra gives us that X.col(i) = W * H.col(i). But remember, we are looking for the nearest neighbors of X.col(i), so this is equivalent to the nearest neighbors of H.col(i). Why aren't we searching for the nearest neighbors in the H matrix, then?
A patch for this ticket should also include some information on the speedup obtained (in either a test program or the cf executable), and verification that the module provides the same results (perhaps through the already written tests).
My linear algebra is wrong. X.col(i) = W * H.col(i), but it is not true that d(X.col(i), X.col(j)) = d(H.col(i), H.col(j)) unless we make some assumptions about W, which we can't do. Oops. Self-assigning until I figure out how this can be done.
d(X.col(i), X.col(j)) = d(W H.col(i), W H.col(j)).
For the L2 distance (which is fine for now), note that d(W H.col(i), W H.col(j))^2 = (H.col(i) - H.col(j))^T W^T W (H.col(i) - H.col(j)), so this is the Mahalanobis distance with M^{-1} = W^T W. Decompose M^{-1} = L L^T (Cholesky decomposition), and then multiply H by L^T to obtain H' = L^T H (this takes O(r^2 n) time, for rank r and n users). Then, once this is done,
d(X.col(i), X.col(j)) = d(H'.col(i), H'.col(j))
and each distance calculation takes only O(r) time. We can use simple nearest neighbor search out of the box on H', then.