Cleanup scores, search results and DocID #3849

Merged
merged 33 commits into from
Jan 19, 2024

@donomii donomii (Contributor) commented Dec 4, 2023

What's being changed:

  • Don't assign to additionalProperties["score"] inside bm25f; wait for the explorer to do it
  • Assign SecondarySortValue correctly
  • Do not assign keyword scores to the .Dist field
  • Remove .Score() from StorObj
  • Rename .docID -> .DocID, to support testing from outside the module and for consistency with Weaviate's code guidelines
  • Merge the hybrid results and search results structures
  • Add more tests for hybrid search, including specific tests for relativeScoreFusion
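As Weaviate's documentation describes it, relativeScoreFusion min-max normalizes the vector and keyword scores of each result set to [0, 1] (best result becomes 1, worst becomes 0) and then sums them, weighted by alpha. The Go sketch below illustrates that idea under stated assumptions: the `Result` type and the `normalize` and `relativeScoreFusion` helpers are hypothetical, higher scores are assumed to be better in both lists, and giving every result 1 when all scores in a list are equal is a choice made for this sketch, not taken from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// Result is a hypothetical, simplified search result used only in this sketch.
type Result struct {
	ID    string
	Score float32
}

// normalize min-max scales scores to [0, 1]: the best result gets 1 and the
// worst gets 0. If all scores are equal, every result gets 1 (sketch choice).
func normalize(results []Result) map[string]float32 {
	out := make(map[string]float32, len(results))
	if len(results) == 0 {
		return out
	}
	minS, maxS := results[0].Score, results[0].Score
	for _, r := range results[1:] {
		if r.Score < minS {
			minS = r.Score
		}
		if r.Score > maxS {
			maxS = r.Score
		}
	}
	for _, r := range results {
		if maxS == minS {
			out[r.ID] = 1
		} else {
			out[r.ID] = (r.Score - minS) / (maxS - minS)
		}
	}
	return out
}

// relativeScoreFusion normalizes each list, weights vector scores by alpha and
// keyword scores by 1-alpha, and sums them per ID.
func relativeScoreFusion(vector, keyword []Result, alpha float32) []Result {
	combined := map[string]float32{}
	for id, s := range normalize(vector) {
		combined[id] += alpha * s
	}
	for id, s := range normalize(keyword) {
		combined[id] += (1 - alpha) * s
	}
	fused := make([]Result, 0, len(combined))
	for id, s := range combined {
		fused = append(fused, Result{ID: id, Score: s})
	}
	// Highest fused score first; tie-break on ID so the order is deterministic.
	sort.Slice(fused, func(i, j int) bool {
		if fused[i].Score != fused[j].Score {
			return fused[i].Score > fused[j].Score
		}
		return fused[i].ID < fused[j].ID
	})
	return fused
}

func main() {
	vector := []Result{{"a", 0.9}, {"b", 0.5}, {"c", 0.1}}
	keyword := []Result{{"b", 3.0}, {"c", 1.0}}
	for _, r := range relativeScoreFusion(vector, keyword, 0.5) {
		fmt.Printf("%s %.2f\n", r.ID, r.Score)
	}
}
```

With this sample data, b ranks first: it sits in the middle of the vector list but at the top of the keyword list, which is the kind of case a relativeScoreFusion test would pin down.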

This PR is motivated by user reports of issues including missing or incorrect scores. I was unable to reproduce the exact issues reported, but reviewing the code revealed several opportunities for improvement.

There were some code paths where scores could get out of sync or be misreported, so this PR reworks our internal handling of scores. For instance, DocID was not kept in the same structure as the UUID, Score and other object attributes. This led to code that tried to rebuild the DocID from the UUID, or that attached an auxiliary structure to carry that information. DocID is now kept in the same structure as the rest of the object attributes from the moment the structure is created. DocID is a pointer because 0x00000 is a valid document ID, so there needs to be a way to tell document #0 apart from an uninitialised value.
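A minimal Go sketch of why a pointer helps: with a plain integer field, document 0 and "never set" both read as 0, whereas a nil pointer is distinguishable from a pointer to 0. `SearchResult` and `describeDocID` are hypothetical names, and the `uint64` type for DocID is an assumption for illustration.

```go
package main

import "fmt"

// SearchResult is a hypothetical result type for this sketch. DocID is a
// pointer so that nil ("not set") can be told apart from document 0.
type SearchResult struct {
	DocID *uint64
	ID    string // object UUID
	Score float32
}

// describeDocID reports whether the DocID was set and, if so, its value.
func describeDocID(r SearchResult) string {
	if r.DocID == nil {
		return "docID not set"
	}
	return fmt.Sprintf("docID %d", *r.DocID)
}

func main() {
	zero := uint64(0)
	first := SearchResult{DocID: &zero, ID: "uuid-of-first-doc"}
	unset := SearchResult{ID: "uuid-without-docid"}

	fmt.Println(describeDocID(first)) // docID 0
	fmt.Println(describeDocID(unset)) // docID not set
}
```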

Additionally, the score was sometimes stored in the .Additional[] property structure and sometimes in the .Score field. This patch removes every place where the score was assigned directly to .Additional and uses .Score instead. .Additional is (and was) populated at the end of the request, using values from the result structure.
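The "score lives in one place" rule can be sketched as follows: intermediate steps only update `Score`, and a single final pass copies it into the additional properties returned to the user. The `Result` type and the `rescore` and `populateAdditional` functions are hypothetical illustrations, not Weaviate's actual code.

```go
package main

import "fmt"

// Result is a hypothetical search result: while the request is processed,
// the score is held only in the Score field.
type Result struct {
	ID         string
	Score      float32
	Additional map[string]interface{}
}

// rescore stands in for any intermediate step (ranking, fusion, ...). It only
// touches Score, never Additional, so there is no second copy to drift.
func rescore(results []Result, factor float32) {
	for i := range results {
		results[i].Score *= factor
	}
}

// populateAdditional runs once, at the end of the request, copying the final
// Score into the Additional properties.
func populateAdditional(results []Result) {
	for i := range results {
		if results[i].Additional == nil {
			results[i].Additional = map[string]interface{}{}
		}
		results[i].Additional["score"] = results[i].Score
	}
}

func main() {
	results := []Result{{ID: "a", Score: 1.5}, {ID: "b", Score: 0.5}}
	rescore(results, 2)
	populateAdditional(results)
	for _, r := range results {
		fmt.Println(r.ID, r.Additional["score"])
	}
}
```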

There were also some places where .SecondarySortValue appeared to be set to the wrong value, or not set at all. Correcting these did not cause any tests to fail, so those paths were probably unused, which is a problem in its own right.

The remaining changes are smaller cleanups to match the Weaviate coding style, plus additional tests.

Remaining TODO: we still copy search results more than necessary; moving all functions to accept []*search.Result would save a little memory and time.
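To illustrate the TODO, here is a hedged sketch comparing a copy of a slice of structs with a copy of a slice of pointers. `Result` is a hypothetical stand-in for `search.Result`; the real struct has different fields, and `copyValues` and `copyPointers` are illustrative helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Result is a hypothetical stand-in for search.Result, for illustration only.
type Result struct {
	ID                 string
	Score              float32
	SecondarySortValue float32
	Vector             []float32
	Additional         map[string]interface{}
}

// copyValues duplicates every Result struct into a new slice.
func copyValues(in []Result) []Result {
	out := make([]Result, len(in))
	copy(out, in)
	return out
}

// copyPointers duplicates only the pointers; the Result structs are shared.
func copyPointers(in []*Result) []*Result {
	out := make([]*Result, len(in))
	copy(out, in)
	return out
}

func main() {
	// Bytes copied per element: a whole struct versus a single pointer.
	fmt.Println("per value:", unsafe.Sizeof(Result{}), "per pointer:", unsafe.Sizeof((*Result)(nil)))

	vals := []Result{{ID: "a", Score: 1}}
	vcopy := copyValues(vals)
	vcopy[0].Score = 2 // changes only the copy
	fmt.Println(vals[0].Score, vcopy[0].Score)

	ptrs := []*Result{{ID: "a", Score: 1}}
	pcopy := copyPointers(ptrs)
	pcopy[0].Score = 2 // changes the shared struct
	fmt.Println(ptrs[0].Score, pcopy[0].Score)
}
```

With values, every copy duplicates the whole struct and edits to a copy are invisible to the original; with pointers, only one pointer per element is copied and the structs are shared, so callers must avoid unintended mutation.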

Review checklist

  • Chaos pipeline run or not necessary. Link to pipeline:
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.

@donomii donomii self-assigned this Dec 4, 2023
@donomii donomii changed the title Fix explainScore returning nulls on a search with no query Fix code keeping separate copies of "score", fix assigning score to .Dist Dec 6, 2023
@donomii donomii requested a review from dirkkul December 6, 2023 11:33
@donomii donomii marked this pull request as ready for review December 6, 2023 11:33
@donomii donomii changed the title Fix code keeping separate copies of "score", fix assigning score to .Dist Cleanup scores, search results and DocID Dec 22, 2023
@dirkkul dirkkul previously approved these changes Dec 29, 2023

@dirkkul dirkkul (Contributor) left a comment

Did not look at every change in detail, but this contains nothing scary :)

sonarcloud bot commented Jan 18, 2024

Quality Gate failed

Failed conditions

47.5% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@donomii donomii merged commit 1d9faf5 into master Jan 19, 2024
19 of 20 checks passed
@donomii donomii deleted the explainScore-nulls branch January 19, 2024 07:29