
Optimize BM25 scoring in DAAT MaxScore #1629

Merged
sre-ci-robot merged 1 commit into zilliztech:main from lyang24:optimize-daat-maxscore-bm25-hoist
May 20, 2026


@lyang24
Contributor

lyang24 commented May 16, 2026

issue: #1636

BM25 DAAT_MAXSCORE Benchmark: optimize-daat-maxscore-bm25-hoist vs main

Optimization: Hoist the BM25 document-length normalization term (p2 + p3 * row_sums[doc_id]), which depends only on the document, out of the per-term inner loop in DaatMaxScoreSearcher, and precompute qval * p1 once per cursor in BM25DimScorer.
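A minimal C++ sketch of the two rewrites, assuming the per-term score has the form `qval * tf * p1 / (tf + p2 + p3 * row_sums[doc_id])` with p1 = k1 + 1, p2 = k1 * (1 - b), p3 = k1 * b / avgdl (these definitions are an assumption, not taken from the diff). `Bm25Params`, `TermHit`, `Cursor`, `ScoreDocNaive` and `ScoreDocHoisted` are hypothetical stand-ins, not the actual `DaatMaxScoreSearcher` or `BM25DimScorer` code. The only claim is that both versions compute the same score while the hoisted one does less work per matched term.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical BM25 constants (assumed factorisation, not the actual Knowhere code):
//   p1 = k1 + 1, p2 = k1 * (1 - b), p3 = k1 * b / avgdl
struct Bm25Params {
    float p1, p2, p3;
    Bm25Params(float k1, float b, float avgdl)
        : p1(k1 + 1.0f), p2(k1 * (1.0f - b)), p3(k1 * b / avgdl) {}
};

// One query term matched in the current document: query weight and term frequency.
struct TermHit {
    float qval;
    float tf;
};

// Hypothetical per-term cursor: qval * p1 is computed once when the cursor is built,
// not once per scored document. tf is the frequency at the cursor's current posting.
struct Cursor {
    float qval_p1;
    float tf;
    Cursor(float qval, float tf_in, const Bm25Params& p) : qval_p1(qval * p.p1), tf(tf_in) {}
};

// Before: p2 + p3 * row_sums[doc_id] is recomputed for every matched term.
float ScoreDocNaive(const Bm25Params& p, const std::vector<TermHit>& hits,
                    const std::vector<float>& row_sums, std::size_t doc_id) {
    float score = 0.0f;
    for (const TermHit& h : hits) {
        score += h.qval * h.tf * p.p1 / (h.tf + p.p2 + p.p3 * row_sums[doc_id]);
    }
    return score;
}

// After: the normalisation depends only on doc_id, so it is computed once per
// document, and each cursor contributes its cached qval * p1.
float ScoreDocHoisted(const Bm25Params& p, const std::vector<Cursor>& cursors,
                      const std::vector<float>& row_sums, std::size_t doc_id) {
    const float norm = p.p2 + p.p3 * row_sums[doc_id];  // hoisted out of the term loop
    float score = 0.0f;
    for (const Cursor& c : cursors) {
        score += c.qval_p1 * c.tf / (c.tf + norm);
    }
    return score;
}
```

Because the multiplications are reassociated (`(qval * p1) * tf` instead of `(qval * tf) * p1`), the two results agree only up to float rounding, so a relative-tolerance comparison is the appropriate check rather than exact equality.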

Setup

  • Machine: AWS r6i.2xlarge (Intel Xeon Platinum 8375C, 8 vCPU, 64 GB), us-west-1c; single-core pinned (taskset -c 2)
  • Metric: BM25 (k1=1.2, b=0.75), algo: DAAT_MAXSCORE, topk=10
  • Protocol: warmup=50, repeat=3
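For reference, the per-term BM25 score with the setup's k1 = 1.2 and b = 0.75, assuming the p1/p2/p3 names in the description follow the standard Okapi factorisation (an assumption; the PR does not define them) and that q_t is the per-term query weight `qval`:

```math
\mathrm{score}(q, d)
  = \sum_{t \in q \cap d} q_t \,\frac{\mathrm{tf}_{t,d}\,(k_1 + 1)}
      {\mathrm{tf}_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}
  = \sum_{t \in q \cap d} \frac{(q_t\,p_1)\,\mathrm{tf}_{t,d}}
      {\mathrm{tf}_{t,d} + (p_2 + p_3\,|d|)},
\qquad
p_1 = k_1 + 1 = 2.2,\quad
p_2 = k_1(1 - b) = 0.3,\quad
p_3 = \frac{k_1 b}{\mathrm{avgdl}} = \frac{0.9}{\mathrm{avgdl}}
```

Here |d| plays the role of `row_sums[doc_id]`. The term p2 + p3|d| is the same for every query term matched in document d, so it can be computed once per document, and q_t p1 is the same for every document a term's cursor visits, so it can be cached per cursor.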

Results

| Dataset | Docs / Avg DL | nq | Branch | QPS mean (run 1 / 2 / 3) | QPS Δ | Recall | p95 (ms) |
|---|---|---|---|---|---|---|---|
| MS MARCO | 8.84M / 38.55 | 500 | main | 393.25 (391.97 / 394.49 / 393.29) | | 0.99600 | 8.40 |
| MS MARCO | 8.84M / 38.55 | 500 | branch | 436.92 (437.45 / 434.43 / 438.89) | +11.11% | 0.99600 | 7.59 |
| MS MARCO | 8.84M / 38.55 | 1000 | main | 410.30 (412.86 / 412.31 / 405.72) | | 0.99440 | 7.94 |
| MS MARCO | 8.84M / 38.55 | 1000 | branch | 449.52 (449.89 / 449.14 / 449.52) | +9.56% | 0.99440 | 7.17 |
| HotpotQA | 5.23M / 2.68 | 500 | main | 1649.78 (1640.10 / 1653.93 / 1655.31) | | 0.87840 | 1.51 |
| HotpotQA | 5.23M / 2.68 | 500 | branch | 1763.42 (1751.20 / 1765.47 / 1773.57) | +6.89% | 0.87840 | 1.40 |
| HotpotQA | 5.23M / 2.68 | 1000 | main | 1551.54 (1553.00 / 1552.27 / 1549.36) | | 0.88080 | 1.53 |
| HotpotQA | 5.23M / 2.68 | 1000 | branch | 1788.46 (1781.61 / 1789.68 / 1794.09) | +15.27% | 0.88080 | 1.31 |
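The QPS Δ column appears to be the relative change of the branch's three-run mean over main's. A small C++ sketch that recomputes it from the per-run numbers in the table, assuming that definition (`MeanQps`, `QpsDeltaPercent` and `Round2` are illustrative helpers, not part of the benchmark harness):

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Runs = std::array<double, 3>;

// Mean of the three repeat runs reported for one (dataset, nq, branch) row.
double MeanQps(const Runs& runs) {
    return (runs[0] + runs[1] + runs[2]) / 3.0;
}

// Assumed definition of "QPS Δ": relative change of branch mean over main mean, in percent.
double QpsDeltaPercent(const Runs& main_runs, const Runs& branch_runs) {
    return (MeanQps(branch_runs) / MeanQps(main_runs) - 1.0) * 100.0;
}

// Round to two decimals, the precision used in the table.
double Round2(double x) { return std::round(x * 100.0) / 100.0; }
```

Computed this way, all four deltas match the table to two decimals. For MS MARCO nq=500, the match depends on using the unrounded branch mean (436.923...); dividing the rounded 436.92 by 393.25 would give +11.10% instead.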

@mergify

mergify Bot commented May 16, 2026

@lyang24 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

@mergify

mergify Bot commented May 16, 2026

@lyang24 The e2e Jenkins job failed; comment /run-e2e to trigger the job again.

lyang24 force-pushed the optimize-daat-maxscore-bm25-hoist branch from 4712e80 to 897294a on May 16, 2026 07:43
@mergify

mergify Bot commented May 16, 2026

@lyang24 The e2e Jenkins job failed; comment /run-e2e to trigger the job again.

@zhengbuqian
Collaborator

/run-e2e

@zhengbuqian
Collaborator

this is great, thanks for the PR!

@sre-ci-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lyang24, zhengbuqian

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify

mergify Bot commented May 18, 2026

@lyang24 The e2e Jenkins job failed; comment /run-e2e to trigger the job again.

@alexanderguzhva
Collaborator

waiting for #1619

@alexanderguzhva
Collaborator

@lyang24 please rebase to master. Thanks.

Signed-off-by: lyang24 <lanqingy93@gmail.com>
lyang24 force-pushed the optimize-daat-maxscore-bm25-hoist branch from 897294a to 585548d on May 19, 2026 20:21
@lyang24
Contributor Author

lyang24 commented May 19, 2026

> @lyang24 please rebase to master. Thanks.

Rebased, thanks.

mergify bot added the ci-passed label May 19, 2026
@sparknack
Collaborator

/lgtm

@lyang24
Contributor Author

lyang24 commented May 20, 2026

/tide

@lyang24
Contributor Author

lyang24 commented May 20, 2026

@Mergifyio refresh

@mergify

mergify Bot commented May 20, 2026

refresh

✅ Pull request refreshed

@foxspy
Collaborator

foxspy commented May 20, 2026

/kind improvement

sre-ci-robot merged commit 3b3f6a5 into zilliztech:main May 20, 2026
13 checks passed

6 participants