libSVM Parser Shuffles Documents #44

Closed
eggie5 opened this issue Mar 21, 2019 · 3 comments

@eggie5
Contributor

eggie5 commented Mar 21, 2019

While using the libSVM parser, I noticed that my documents were getting shuffled. I found this line:

np.random.shuffle(doc_list)

Why would one shuffle the documents? Doesn't the order (position) of the relevance labels carry meaning? Wouldn't this throw off DCG?

DCG for reference:

$$\mathrm{DCG} = \sum_{i=1}^{n} \frac{rel_i}{\log_2(i+1)}$$

dcg = lambda r: np.sum(r/np.log2(np.arange(2, r.size+2)))
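As a quick check of the formula, the sketch below evaluates the same `dcg` lambda on a small, made-up list of relevance labels (`[3, 2, 0]` is illustrative, not data from this project) in two orders. It shows that DCG does depend on the rank at which each label sits.

```python
import numpy as np

# DCG of relevance labels r, where r[0] is the label at rank 1, r[1] at rank 2, ...
# Rank i is discounted by log2(i + 1), so ranks 1..n use log2(2)..log2(n + 1).
dcg = lambda r: np.sum(r / np.log2(np.arange(2, r.size + 2)))

labels = np.array([3.0, 2.0, 0.0])  # hypothetical labels, most relevant document first
print(dcg(labels))        # 3/1 + 2/log2(3) + 0/2, about 4.262
print(dcg(labels[::-1]))  # 0/1 + 2/log2(3) + 3/2, about 2.762
```

The second value is lower because the most relevant label has moved from rank 1 to rank 3.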
@ramakumar1729
Member

Learning-to-rank algorithms should ideally not depend on the input order of the documents in a list. DCG and other ranking metrics are computed over the ranking induced by the documents' scores: the documents are sorted by score, and each relevance label is discounted according to the position (rank) its document lands at in that sorted list.
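A minimal sketch of that point, assuming a single query whose documents carry made-up model scores and relevance labels. The helper `dcg_from_scores` is a hypothetical name, not a TF-Ranking function: it sorts documents by descending score before applying the DCG discount, so shuffling the input list (while keeping each score paired with its label) leaves the result unchanged.

```python
import numpy as np

def dcg_from_scores(scores, labels):
    """DCG of the ranking induced by `scores` (hypothetical helper, not the TF-Ranking API)."""
    order = np.argsort(-scores, kind="stable")      # document indices sorted by descending score
    ranked_labels = labels[order]                   # labels read off in rank order
    discounts = np.log2(np.arange(2, ranked_labels.size + 2))
    return np.sum(ranked_labels / discounts)

# Hypothetical scores and labels for one query.
scores = np.array([0.2, 1.5, -0.3, 0.9])
labels = np.array([1.0, 3.0, 0.0, 2.0])

rng = np.random.default_rng(0)
perm = rng.permutation(scores.size)                 # shuffle documents, keeping score/label pairs together

print(dcg_from_scores(scores, labels))
print(dcg_from_scores(scores[perm], labels[perm]))  # same value: input order does not matter
```

The example deliberately avoids tied scores. With ties, the relative order of the tied documents would follow their input order, so shuffling could then change the metric slightly.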

Let me know if you have any further questions.

ramakumar1729 self-assigned this Mar 25, 2019
@eggie5
Contributor Author

eggie5 commented Mar 26, 2019

Is there any particular reason tf_ranking_libsvm.py (the non-notebook example) doesn't shuffle?

https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_libsvm.py#L116

@xuanhuiwang
Contributor

Since we keep only list_size documents per query, shuffling means that a random subset of list_size documents is selected on every call. Without shuffling, we always keep the top (first) list_size documents in the order they appear in the file.
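The trade-off can be sketched as follows. The function `truncate_docs` and its arguments are hypothetical, written only to illustrate the behaviour; they are not the code in the notebook or in tf_ranking_libsvm.py. Without shuffling, the same first `list_size` documents are kept on every call; with shuffling, a different random subset may be kept each time.

```python
import numpy as np

def truncate_docs(doc_list, list_size, shuffle=False, rng=None):
    """Keep at most `list_size` documents for one query (illustrative sketch only)."""
    docs = list(doc_list)
    if shuffle:
        rng = rng if rng is not None else np.random.default_rng()
        order = rng.permutation(len(docs))  # random reordering, similar in effect to np.random.shuffle
        docs = [docs[i] for i in order]
    return docs[:list_size]                 # keep the first list_size documents

docs = ["d0", "d1", "d2", "d3", "d4", "d5"]
print(truncate_docs(docs, 3))                                              # ['d0', 'd1', 'd2']
print(truncate_docs(docs, 3, shuffle=True, rng=np.random.default_rng(1)))  # a random three
```

When a query has fewer than list_size documents, the slice simply returns all of them; any padding to a fixed list size is outside this sketch.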

Personally, I prefer the non-shuffling version.
