Upstream issue #11961: alternative de-duplication approach #1

jayaddison · 2024-02-07T11:20:51Z

Feature or Bugfix

N/A - discussion pull request.

Purpose

This pull request describes an alternative approach that I had been considering while reviewing "HTML Search: Fix duplicate results" (sphinx-doc/sphinx#11942).

Detail

Drop the title field from the search result de-duplication key (resultStr).

Relates

N/A



wlach · 2024-02-07T12:20:38Z

sphinx/themes/basic/static/searchtools.js

@@ -345,7 +350,8 @@ const Search = {
    // note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept
    let seen = new Set();
    results = results.reverse().reduce((acc, result) => {
-      let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(',');
+      const [docName, _title, anchor, descr, _score, filename] = result;
+      const resultStr = `${docName},${anchor},${descr},${filename}`;


Hmm, I don't see how this would pass the test below. 🤔 The body search and title search will produce different resultStr values (since they have different anchors), so you'd see them both in the results.
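To make the point above concrete, here is a minimal sketch that compares the original key with the title-less key from the diff. The `dedupe` helper, the key function names, and the sample rows are hypothetical and only for illustration; the tuple layout `[docName, title, anchor, descr, score, filename]` is taken from the destructuring in the diff, and the input is assumed to be sorted by ascending score.

```javascript
// Hypothetical helper: keeps the highest-scoring entry per key.
// Assumes `results` is sorted by ascending score, as the comment in the diff implies.
function dedupe(results, keyOf) {
  const seen = new Set();
  // Copy before reversing so the caller's array is not mutated.
  const kept = [...results].reverse().filter((result) => {
    const key = keyOf(result);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return kept.reverse(); // restore ascending-score order
}

// Key from the removed line of the diff: includes the title (index 1).
const originalKey = (result) =>
  result.slice(0, 4).concat([result[5]]).map((v) => String(v)).join(",");

// Key from the added lines of the diff: title and score are left out.
const titlelessKey = ([docName, _title, anchor, descr, _score, filename]) =>
  `${docName},${anchor},${descr},${filename}`;

// Made-up rows for a single page.
const sampleResults = [
  ["index", "Main Page", "", null, 5, "index.rst"],                 // body match, no anchor
  ["index", "Main Page", "#main-page", null, 15, "index.rst"],      // title match
  ["index", "Main Page (alt)", "#main-page", null, 16, "index.rst"], // same target, different title text
];

const byOriginalKey = dedupe(sampleResults, originalKey);
const byTitlelessKey = dedupe(sampleResults, titlelessKey);

console.log(byOriginalKey.length);  // 3
console.log(byTitlelessKey.length); // 2
```

Under these assumptions, dropping the title collapses the two rows that share an anchor, but the anchor-less body match and the anchored title match still yield distinct keys, so both remain in the output.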

The test does succeed with that - could that imply a bug in the test?

There's only one search result because my other fixes haven't been applied (I left them out for ease of review). If you rebase your branch on top of these pull requests the test should start failing:

sphinx-doc#11960
sphinx-doc#11958

Hm. Are you sure? I can still get the test to pass with a docid of 1 instead of zero, and the partial-matching only affects the scoring, if I understand correctly?

Took a look, I think it's actually sphinx-doc#11959 (the multiple matches) that allows multiple results to be generated as you'd expect. If you apply this squashed patch your tests should start failing: https://gist.github.com/wlach/74124608c2113d8eb9737cdced4762db

Bug raised as sphinx-doc#11965.

A further change may be needed to de-duplicate between those client-side, yes, but I think it could be simplified (more like e2d07a1) if the duplicate title index entries are removed.

Hmm, I'm confused: that looks similar to my original solution, which you rejected here:

sphinx-doc#11942 (comment)

It's not ideal, but it's smaller -- there's no isDocumentTitle, and no additional boolean passed down to the de-duplication phase.

(and in fact, it might not even be required. if the index-construction issue is valid and fixed, then we could try removing it and relying purely on an updated de-duplication key)

> It's not ideal, but it's smaller -- there's no isDocumentTitle, and no additional boolean passed down to the de-duplication phase.

Ok, I don't have a super strong opinion: I thought your original critique had a point but I don't think it matters all that much in the end.

> (and in fact, it might not even be required. if the index-construction issue is valid and fixed, then we could try removing it and relying purely on an updated de-duplication key)

I am pretty sure we still have the problem of the title search / term search returning more-or-less the same result. The unit test uses a hand-constructed index without this problem and still reproduces the bug.


wlach · 2024-02-27T14:15:07Z

I'm going to close this, as it looks like the discussion has ended and I'm satisfied with my approach in sphinx-doc#11942.

wlach and others added 10 commits February 7, 2024 11:19

Fix duplicate search results

1a0e6a3

We currently list the same page multiple times if, for example, both the title and content search match it. Since the preview is always the same, this is not very helpful.

Another approach

eea244f

Review feedback

6473d51

HTML Search: Add test, refactoring to make that possible

d2648b2

Keep anchor, de-duplicate on less keys

2d82cb7

De-duplicate on document title only

5819346

experimentation: undo local fix attempt

4208655

experimentation: undo test expectation change

17da9b1

experimentation: alternative de-duplication method

344274b

experimentation: compared to original, omit only the title field from the de-duplication string

a94ab83

jayaddison mentioned this pull request Feb 7, 2024

HTML Search: Fix duplicate results sphinx-doc/sphinx#11942

Closed

jayaddison changed the title ~~Upstream issue #11942: alternative de-duplication approach~~ Upstream issue #11961: alternative de-duplication approach Feb 7, 2024

wlach reviewed Feb 7, 2024


jayaddison added 6 commits February 7, 2024 13:09

experimentation: use jasmine.createSpy to avoid function contract changes; currently requires enabling mutability of _displayNextItem - not great.

1f2eb3d

cleanup: remove unnecessary '.and.returnValue' call-chain

2b7f49f

experimentation: update test case to avoid dependency on fix from sphinx-doc#11960

04ad76a

experimentation: apply patch derived from sphinx-doc#11959

dcc3443

experimentation: expect anchor-less title result

e2d07a1

experimentation: cleanup: remove 'resultStr' de-dup key changes

251bf95

wlach closed this Feb 27, 2024

jayaddison deleted the issue-11961/pr-11942-review-experimentation branch February 27, 2024 18:01
