This repository was archived by the owner on May 21, 2025. It is now read-only.

How is the NDCG metric in TF Ranking computed for the whole dataset? #195

@DavidV17


Can someone please clarify how the NDCG metric is computed? Looking at

predictions: A `Tensor` with shape [batch_size, list_size]. Each value is

I see that a "ranking score" is used to order examples.

My first question is: how is this "ranking score" computed? Code examples that I found:
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/handling_sparse_features.ipynb
or
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_tfrecord.py#L328
use
`logits = tf.compat.v1.layers.dense(cur_layer, units=_GROUP_SIZE)`
This suggests that for each example (an answer to a query), the NN computes a softmax function to decide which rank (position within the group) is most appropriate for this example. When `_GROUP_SIZE = 1`, the output would be just a score. But what if `_GROUP_SIZE > 1`? How would a single score value be computed from the logits?

Next question:

predictions: A `Tensor` with shape [batch_size, list_size]. Each value is

also suggests that `predictions` is a `Tensor` with shape [batch_size, list_size]. On which line of
https://github.com/tensorflow/ranking/blob/master/tensorflow_ranking/examples/tf_ranking_tfrecord.py
are such tensors formed? The tutorial https://colab.research.google.com/github/tensorflow/ranking/blob/master/tensorflow_ranking/examples/handling_sparse_features.ipynb does not mention grouping examples into such tensors either.

If the data is indeed organized into such tensors, then I can see how a proper NDCG would be computed: each row of the tensor holds the list_size documents for one query, the standard NDCG is computed within each row, and the metric for the whole dataset is the average of these per-row NDCG values.
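To make the per-row averaging described above concrete, here is a minimal NumPy sketch. It is not TF Ranking's implementation: the helper names (`dcg`, `ndcg_per_list`, `mean_ndcg`) are hypothetical, the gain `2^rel - 1` and the `log2` position discount are the common textbook choices and are assumed here, and padding, per-example weights and top-k truncation are ignored. A row whose labels are all zero is assigned an NDCG of 0, which is also an assumption.

```python
import numpy as np


def dcg(relevance_in_rank_order):
    """Discounted cumulative gain of relevance labels already in ranked order."""
    rel = np.asarray(relevance_in_rank_order, dtype=float)
    gains = 2.0 ** rel - 1.0                           # assumed exponential gain
    discounts = np.log2(np.arange(2, rel.size + 2))    # log2(rank + 1), rank from 1
    return float(np.sum(gains / discounts))


def ndcg_per_list(labels, scores):
    """NDCG for one query: one row of length list_size."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")         # highest score first
    ideal_dcg = dcg(np.sort(labels)[::-1])             # best achievable ordering
    if ideal_dcg == 0.0:
        return 0.0                                     # assumption: no relevant docs -> 0
    return dcg(labels[order]) / ideal_dcg


def mean_ndcg(labels, scores):
    """Average of per-row NDCG over tensors of shape [batch_size, list_size]."""
    return float(np.mean([ndcg_per_list(l, s) for l, s in zip(labels, scores)]))


# Illustrative data: two queries with three documents each.
labels = np.array([[3.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0]])
scores = np.array([[0.9, 0.5, 0.1],    # ranks the documents in the ideal order
                   [0.8, 0.3, 0.2]])   # ranks an irrelevant document first

per_list = [ndcg_per_list(l, s) for l, s in zip(labels, scores)]
batch_ndcg = mean_ndcg(labels, scores)
print(per_list, batch_ndcg)
```

The first row is ranked in the ideal order, so its NDCG is 1; the second row puts a zero-label document on top and scores lower, and the batch value is the plain mean of the two.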

What if _GROUP_SIZE < _LIST_SIZE? Would just the top _GROUP_SIZE documents for each query be used to compute NDCG?
