How is the NDCG metric in TF Ranking computed for the whole dataset?

Can someone please clarify how the NDCG metric is computed? Looking at 
https://github.com/tensorflow/ranking/blob/d8c2e2e64a92923f1448cf5302c92a80bb469a20/tensorflow_ranking/python/metrics.py#L457
I see that a "ranking score" is used to order examples. 

My first question is: how is this "ranking score" computed? The code examples that I found,
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/handling_sparse_features.ipynb
or
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_tfrecord.py#L328
use
logits = tf.compat.v1.layers.dense(cur_layer, units=_GROUP_SIZE)
This suggests that for each example (an answer to a query), the network computes a softmax to decide which rank (position within the group) is most appropriate for that example. When _GROUP_SIZE = 1, the output would be just a score. But what if _GROUP_SIZE > 1? How would a single score value be computed from the logits?

Next question: 
https://github.com/tensorflow/ranking/blob/d8c2e2e64a92923f1448cf5302c92a80bb469a20/tensorflow_ranking/python/metrics.py#L457
also suggests that `predictions` is a `Tensor` with shape [batch_size, list_size]. On which line of
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_tfrecord.py
are such tensors formed? The tutorial https://colab.research.google.com/github/tensorflow/ranking/blob/master/tensorflow_ranking/examples/handling_sparse_features.ipynb does not mention anything about grouping examples into such tensors either.

If the data is indeed organized into such tensors, then I can see how a proper NDCG would be computed: it would be just the average of the NDCG values computed for each row of this tensor, where each row has length list_size. Within each row (a set of documents of size list_size), the standard way of computing NDCG would be used.
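
To make my understanding concrete, here is a minimal NumPy sketch of what I assume happens. It is not taken from the TF Ranking source. The function names (`dcg`, `ndcg_per_list`, `mean_ndcg`), the gain 2^label - 1, the log2 position discount, and the choice to skip lists with no relevant documents are all my assumptions about the "standard" NDCG.

```python
import numpy as np


def dcg(labels_in_rank_order):
    """Discounted cumulative gain of labels already sorted by rank."""
    labels_in_rank_order = np.asarray(labels_in_rank_order, dtype=float)
    gains = 2.0 ** labels_in_rank_order - 1.0  # assumed gain function
    positions = np.arange(labels_in_rank_order.size)
    discounts = 1.0 / np.log2(positions + 2.0)  # assumed position discount
    return float(np.sum(gains * discounts))


def ndcg_per_list(labels, scores):
    """NDCG of one query: order documents by descending score."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ideal_dcg = dcg(np.sort(labels)[::-1])
    if ideal_dcg == 0.0:
        return None  # assumption: lists with no relevant docs are skipped
    return dcg(labels[order]) / ideal_dcg


def mean_ndcg(labels, scores):
    """Average the per-row NDCG over a [batch_size, list_size] batch."""
    per_list = [ndcg_per_list(l, s) for l, s in zip(labels, scores)]
    per_list = [v for v in per_list if v is not None]
    return float(np.mean(per_list)) if per_list else 0.0


# Two queries (rows), three documents each (list_size = 3).
labels = np.array([[3.0, 2.0, 0.0], [0.0, 1.0, 0.0]])
scores = np.array([[0.9, 0.5, 0.1], [0.2, 0.8, 0.1]])
print(mean_ndcg(labels, scores))
```

Is this row-wise average what the metric reports for the whole dataset, or is the aggregation done differently?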

What if _GROUP_SIZE < _LIST_SIZE? Would just the top _GROUP_SIZE documents for each query be used to compute NDCG?
