libSVM Parser Shuffles Documents #44

eggie5 · 2019-03-21T15:18:00Z

Using the libsvm parser I noticed my documents were getting shuffled. I found this code:

ranking/tensorflow_ranking/python/data.py

Line 411 in d7008f2

np.random.shuffle(doc_list)

Why would one shuffle the documents? Doesn't the order (position) of your relevance labels imply meaning? Wouldn't this throw off DCG?

DCG for reference:

$DCG = \sum_{i=1}^{n} \frac{rel_i}{\log_2(i+1)}$

dcg = lambda r: np.sum(r/np.log2(np.arange(2, r.size+2)))


ramakumar1729 · 2019-03-25T16:46:04Z

Learning-to-rank algorithms should ideally not depend on the input order of the documents in a list. DCG and other metrics are computed over the ranking induced by the scores for the documents, so what matters is the position of each relevance label within that ranking, not its position in the input.
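To illustrate the point, here is a minimal sketch (not code from the library) that computes DCG over the ranking induced by scores and checks that permuting the input list leaves the value unchanged. The function name `dcg_from_scores` and the label and score values are hypothetical, and the scores are assumed to contain no ties.

```python
import numpy as np


def dcg_from_scores(labels, scores):
    """DCG of the ranking obtained by sorting documents by descending score."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # best-scored document first
    ranked_labels = labels[order]
    discounts = np.log2(np.arange(2, ranked_labels.size + 2))
    return float(np.sum(ranked_labels / discounts))


# Hypothetical relevance labels and model scores (no tied scores).
labels = np.array([3.0, 0.0, 2.0, 1.0])
scores = np.array([0.9, 0.1, 0.4, 0.7])

# Shuffle the input order, keeping each label paired with its score.
rng = np.random.default_rng(0)
perm = rng.permutation(labels.size)

dcg_original = dcg_from_scores(labels, scores)
dcg_shuffled = dcg_from_scores(labels[perm], scores[perm])
print(np.isclose(dcg_original, dcg_shuffled))
```

If two documents have tied scores, the sort order between them can depend on the input order, so the invariance holds exactly only when scores are distinct.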

Let me know if you have any further questions.

eggie5 · 2019-03-26T01:17:56Z

Is there any particular reason tf_ranking_libsvm.py (the non-notebook example) doesn't shuffle?

https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_libsvm.py#L116

xuanhuiwang · 2019-04-01T08:49:36Z

Since we keep only list_size documents, shuffling makes us select a random set of list_size documents on every call. Otherwise, we always keep the same first list_size documents.

Personally, I prefer the non-shuffling version.
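The difference between the two behaviours can be sketched as follows. This is an illustrative helper, not the parser's actual code: `select_docs`, its parameters, and the document IDs are hypothetical, and it assumes truncation simply keeps a prefix of the list.

```python
import numpy as np


def select_docs(doc_list, list_size, shuffle, rng=None):
    """Keep at most list_size documents, optionally shuffling first."""
    docs = list(doc_list)  # copy so the caller's list is not modified
    if shuffle:
        if rng is None:
            rng = np.random.default_rng()
        rng.shuffle(docs)  # random order, so the kept prefix is a random subset
    return docs[:list_size]


docs = ["d0", "d1", "d2", "d3", "d4", "d5"]  # hypothetical document IDs

# Without shuffling, every call keeps the same first list_size documents.
first = select_docs(docs, 3, shuffle=False)

# With shuffling, each call keeps a random subset of list_size documents.
sampled = select_docs(docs, 3, shuffle=True, rng=np.random.default_rng(0))
```

With shuffling, documents beyond position list_size still have a chance of being seen across calls; without it, they are never used.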

ramakumar1729 self-assigned this Mar 25, 2019

xuanhuiwang closed this as completed Apr 1, 2019

rishabhsshah mentioned this issue Jun 7, 2019

Are predictions stable over time? #54

Closed
