
pt.Experiment consistency in missing qids in topics/qrels/runs #228

Merged
3 commits merged into master, Sep 17, 2021

Conversation

seanmacavaney
Collaborator

fixes #226

s_metric = rev_mapping.get(metric, str(metric))
aggregators[s_metric].add(evalMeasuresDict[q][s_metric])
evalMeasuresDict = {m: agg.result() for m, agg in aggregators.items()}
evalMeasuresDict = _ir_measures_to_dict(
Collaborator Author

This change may be controversial. When batch_size is set, it no longer evaluates after each batch, but rather keeps all the results and evaluates everything at the end.

Alternatively, I could update so it filters the qrels each time.
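The "filters the qrels each time" alternative could look roughly like this — a minimal sketch with illustrative names (not PyTerrier's actual API), where the placeholder body just counts qrels per qid in place of real evaluation:

```python
import pandas as pd

def evaluate_in_batches(qrels, run_batches):
    """Process result batches one at a time, restricting the qrels to the
    qids present in each batch, so all results need not be held in memory."""
    per_query = {}
    for batch in run_batches:  # each batch is a DataFrame of results
        batch_qids = batch["qid"].unique()
        batch_qrels = qrels[qrels["qid"].isin(batch_qids)]
        # real evaluation of (batch, batch_qrels) would go here;
        # as a stand-in we record the number of judged docs per qid
        for qid in batch_qids:
            per_query[qid] = len(batch_qrels[batch_qrels["qid"] == qid])
    return per_query
```

A qid that appears in a batch but not in the qrels simply gets no judged documents, which is where the missing-qid consistency question from #226 comes in.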

@@ -95,7 +97,17 @@ def _ir_measures_to_dict(
metric = m.measure
metric = rev_mapping.get(metric, str(metric))
rtr[m.query_id][metric] = m.value
if backfill_qids is not None:
Collaborator Author

It was necessary to move the backfilling down to this level, where rev_mapping is available. It was actually buggy before: if a family name was given and perquery was set, the family name would appear with a 0 in the summary rather than the member names. Tests catch this now.

@@ -56,15 +57,15 @@ def _color_cols(data, col_type,
}

def _convert_measures(metrics : MEASURES_TYPE) -> Tuple[Sequence[BaseMeasure], Dict[BaseMeasure,str]]:
from ir_measures import convert_trec_name
from ir_measures import parse_trec_measure
Collaborator Author

Same function; the old name was deprecated.

@@ -346,6 +354,8 @@ def _apply_round(measure, value):
evalDict[name] = evalMeasuresDict
else:
import builtins
if mrt_needed:
Collaborator Author

mrt only applies when perquery is false.

@cmacdonald
Contributor

Sorry, evaluation after each batch is definitely preferred - if you have lots of queries, this avoids keeping ALL of them in memory, which is especially a challenge for the MSMARCO dev/eval sets.

@seanmacavaney
Collaborator Author

yep, makes sense, I'll update.

@@ -289,6 +299,14 @@ def _apply_round(measure, value):
# the commented variant would drop queries not having any RELEVANT labels
# topics = topics.merge(qrels[qrels["label"] > 0][["qid"]].drop_duplicates())
topics = topics.merge(qrels[["qid"]].drop_duplicates())
if len(topics) == 0:
raise ValueError('There is no overlap between the query IDs found in the topics and qrels. If this is intentional, set filter_qrels=False and drop_unused=False.')
Contributor

qids

@@ -214,6 +223,7 @@ def Experiment(
Applying a batch_size is useful if you have large numbers of topics, and/or if your pipeline requires large amounts of temporary memory
during a run.
drop_unused(bool): If True, will drop topics from the topics dataframe that have qids not appearing in the qrels dataframe.
filter_qrels(bool): If True, will drop topics from the qrels dataframe that have qids not appearing in the topics dataframe.
Contributor

Can filter_qrels and drop_unused both be set to True? Surely not.
One wonders if we should give these similar names:
filter_by_topics and filter_by_qrels?
or filter_by='topics'

Collaborator Author

Yeah, you can have both true. It's essentially left outer join, right outer join, and inner join.

If qrels include A, B and topics include B, C:

  • drop_unused=True filter_qrels=True gives B
  • drop_unused=True filter_qrels=False gives A, B
  • drop_unused=False filter_qrels=True gives B, C
  • drop_unused=False filter_qrels=False gives A, B, C

I'm in favour of giving them more similar names. filter_by_topics (formerly filter_qrels) and filter_by_qrels (formerly drop_unused) sound reasonable to me. Should probably pull from kwargs and give a warning if drop_unused is used for backward compatibility.
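The four combinations can be demonstrated with a toy pandas sketch (the helper name and union-of-qids framing are illustrative, not the actual Experiment code):

```python
import pandas as pd

qrels = pd.DataFrame({"qid": ["A", "B"], "docno": ["d1", "d2"], "label": [1, 1]})
topics = pd.DataFrame({"qid": ["B", "C"], "query": ["b", "c"]})

def evaluated_qids(topics, qrels, drop_unused, filter_qrels):
    """Return the sorted union of qids that would take part in evaluation."""
    if drop_unused:   # keep only topics whose qid appears in the qrels
        topics = topics.merge(qrels[["qid"]].drop_duplicates())
    if filter_qrels:  # keep only qrels whose qid appears in the (filtered) topics
        qrels = qrels.merge(topics[["qid"]].drop_duplicates())
    return sorted(set(topics["qid"]) | set(qrels["qid"]))

# drop_unused=True,  filter_qrels=True  -> ["B"]            (inner join)
# drop_unused=True,  filter_qrels=False -> ["A", "B"]       (right outer)
# drop_unused=False, filter_qrels=True  -> ["B", "C"]       (left outer)
# drop_unused=False, filter_qrels=False -> ["A", "B", "C"]  (full outer)
```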

Contributor

Agreed. We may need something in the documentation about this filtering.

Are we sure

  • filter_by='topics'
  • filter_by='qrels'
  • filter_by='both'
  • filter_by=None

shouldn't be the correct nomenclature?

Collaborator Author

I've been thinking about it in terms of separate decisions the user could make, and as a result, a single option that does both could be confusing.

  1. Do you want to skip topics that do not appear in the qrels? If so, set filter_by_qrels=True.
  2. Do you want to evaluate across all topics from the qrels, even if they do not appear in the requested topics? If so, set filter_by_topics=False. I'd expect this to be particularly uncommon and be limited to situations like the one brought up in pt.Experiment consistency in missing qids in topics/qrels/runs #226.

I could probably be convinced otherwise, though.
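The backward-compatible handling of the old names suggested earlier could be sketched like this — the defaults, error handling, and function name are assumptions for illustration, not PyTerrier's actual signature:

```python
import warnings

def resolve_filter_args(filter_by_topics=True, filter_by_qrels=False, **kwargs):
    """Accept deprecated kwarg names (per the renaming discussed:
    filter_qrels -> filter_by_topics, drop_unused -> filter_by_qrels),
    warning when the old names are used."""
    if "drop_unused" in kwargs:
        warnings.warn("drop_unused is deprecated; use filter_by_qrels",
                      DeprecationWarning)
        filter_by_qrels = kwargs.pop("drop_unused")
    if "filter_qrels" in kwargs:
        warnings.warn("filter_qrels is deprecated; use filter_by_topics",
                      DeprecationWarning)
        filter_by_topics = kwargs.pop("filter_qrels")
    if kwargs:
        raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
    return filter_by_topics, filter_by_qrels
```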

Contributor

Ok, you convinced me. Carry on.

if filter_qrels:
qrels = qrels.merge(topics[["qid"]].drop_duplicates())
if len(qrels) == 0:
raise ValueError('There is no overlap between the query IDs found in the topics and qrels. If this is intentional, set filter_qrels=False and drop_unused=False.')
Contributor

ditto. qids please rather than query IDs.

@@ -1,6 +1,7 @@
from collections import defaultdict
from warnings import warn
import os
import itertools
Contributor

is itertools used?

@@ -95,7 +97,17 @@ def _ir_measures_to_dict(
metric = m.measure
metric = rev_mapping.get(metric, str(metric))
rtr[m.query_id][metric] = m.value
if backfill_qids is not None:
Contributor

I think we need a comment explaining "what backfill is/means"
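For context, "backfill" here appears to mean: any qid present in the qrels that received no evaluated results gets a 0 for every measure, so that summary averages cover all judged queries rather than only the answered ones. A minimal sketch with illustrative names (not the actual implementation):

```python
def backfill_missing_qids(rtr, backfill_qids, measures):
    """Fill in a 0.0 for every measure for each qid that is judged
    (appears in backfill_qids) but absent from the per-query results."""
    for qid in backfill_qids:
        if qid not in rtr:
            rtr[qid] = {m: 0.0 for m in measures}
    return rtr
```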

docs/experiments.rst: comment resolved (outdated)
@cmacdonald cmacdonald merged commit c6d7c97 into master Sep 17, 2021
@cmacdonald cmacdonald added this to the 0.7 milestone Sep 17, 2021
@cmacdonald cmacdonald deleted the irm branch December 20, 2021 18:04