MMR reranking via mmr_lambda hidden column by MayCXC · Pull Request #6 · vlasky/sqlite-vec

MayCXC · 2026-02-27T22:41:17Z

Rebased version of asg017#267 (issue: asg017#266) for this fork. Adds an mmr_lambda hidden column to vec0 for Maximal Marginal Relevance (MMR) reranking in KNN queries.

```sql
SELECT rowid, distance FROM vec_items
WHERE embedding MATCH ? AND k = 10 AND mmr_lambda = 0.5;
```

Composes with this fork's distance constraints, partition keys, and all vector types/metrics.

Also fixes the pre-existing test_shadow snapshot failure (missing ORDER BY on pragma_table_list).

Full design and rationale in the upstream PR.

Add Maximal Marginal Relevance (MMR) support to the vec0 virtual table. When mmr_lambda is provided in a KNN query, candidates are over-fetched and then greedily re-selected to balance relevance against diversity.

API: `WHERE embedding MATCH ? AND k = 10 AND mmr_lambda = 0.7`

- mmr_lambda range [0.0, 1.0]: 1.0 = pure relevance, 0.0 = pure diversity
- Over-fetch factor: 5x (capped at k_max = 4096)
- Supports float32, int8, and bit vector types
- All distance metrics (L2, cosine, L1, hamming)
- Zero impact when mmr_lambda is not provided
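For readers who want the mechanics, here is a minimal C sketch of over-fetching plus greedy re-selection. It is not the vec0 implementation: mmr_fetch_count, mmr_lambda_valid and mmr_select are hypothetical names, candidates are assumed to arrive with precomputed query distances and a pairwise distance matrix, the 4096 cap is assumed to apply to the fetched candidate count, and both score terms are assumed to be normalized by the largest candidate-to-query distance (max_dist). The score is the standard MMR form: lambda * relevance - (1 - lambda) * (similarity to the closest already-selected item).

```c
/* Hypothetical constants mirroring the parameters listed above. */
#define MMR_OVERFETCH 5
#define MMR_K_MAX 4096

/* How many KNN candidates to fetch before reranking: 5x k, capped at k_max. */
int mmr_fetch_count(int k) {
    long long n = (long long)k * MMR_OVERFETCH;
    return n > MMR_K_MAX ? MMR_K_MAX : (int)n;
}

/* mmr_lambda must lie in [0.0, 1.0]. */
int mmr_lambda_valid(double lambda) {
    return lambda >= 0.0 && lambda <= 1.0;
}

static int already_selected(const int *sel, int n_selected, int i) {
    for (int s = 0; s < n_selected; s++)
        if (sel[s] == i) return 1;
    return 0;
}

/*
 * Greedy MMR over n candidates (ideally sorted nearest-first).
 *   qdist[i]      distance from candidate i to the query
 *   pdist[i*n+j]  distance between candidates i and j
 * Writes up to k candidate indexes into sel[] (room for k entries) and
 * returns how many it chose, which is less than k if candidates run out.
 */
int mmr_select(int n, const double *qdist, const double *pdist,
               double lambda, int k, int *sel) {
    double max_dist = 0.0;
    for (int i = 0; i < n; i++)
        if (qdist[i] > max_dist) max_dist = qdist[i];
    if (max_dist <= 0.0) max_dist = 1.0; /* all distances zero: avoid 0/0 */

    int n_selected = 0;
    while (n_selected < k) {
        int best_idx = -1;
        double best_score = 0.0;
        for (int i = 0; i < n; i++) {
            if (already_selected(sel, n_selected, i)) continue;
            /* Relevance: 1 at distance 0, 0 at max_dist. */
            double relevance = 1.0 - qdist[i] / max_dist;
            /* Redundancy: similarity to the closest already-chosen item. */
            double max_sim = 0.0;
            for (int s = 0; s < n_selected; s++) {
                double sim = 1.0 - pdist[i * n + sel[s]] / max_dist;
                if (s == 0 || sim > max_sim) max_sim = sim;
            }
            double score = lambda * relevance - (1.0 - lambda) * max_sim;
            if (best_idx < 0 || score > best_score) {
                best_idx = i;
                best_score = score;
            }
        }
        if (best_idx < 0) break; /* every candidate already selected */
        sel[n_selected++] = best_idx;
    }
    return n_selected;
}
```

For example, with four candidates where the two nearest are near-duplicates of each other, lambda = 1.0 returns those two, while lambda = 0.5 skips the duplicate in favor of the next distinct candidate.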

9 test functions covering:

- Cosine diversity (baseline vs. lambda = 1.0, 0.5, 0.0)
- L2 distance metric compatibility
- Int8 vector element type
- Cluster monopoly breaking
- Composition with distance constraints
- Composition with partition keys
- Edge cases (k=1, k=0)
- Error handling (invalid lambda range)
- Insert guard for hidden column

pragma_table_list does not guarantee row order. Add ORDER BY name to the two shadow table queries so the snapshot is deterministic.


mceachen · 2026-02-27T23:20:49Z

Thanks for this! Claude (and I) just merged it into my fork.

One fix it made: in vec0_mmr_rerank, the copy-back loop at the end iterated k_target times, but the greedy selection loop can terminate early (via `if (best_idx < 0) break`), leaving the tail of out_rowids/out_distances uninitialized. We added an n_selected counter and an out_n_selected output parameter so that only the entries actually selected are copied back. The caller now sets k_used = n_selected instead of k_used = k_original. (See my referenced commit for details.)

The copy-back loop iterated k_target times, but the greedy selection loop can terminate early via `if (best_idx < 0) break`, leaving the tail of out_rowids/out_distances uninitialized. Add an n_selected counter and out_n_selected output parameter so only actually-selected entries are copied back. The caller now sets k_used = n_selected instead of k_used = k_original. Credit: mceachen (vlasky#6)
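A sketch of the fixed copy-back, assuming candidate rowids and distances live in parallel arrays. The function name mmr_rerank_sketch, its signature, and the stand-in selection rule (nearest not-yet-selected candidate, in place of the real MMR score) are hypothetical; the parts that mirror the fix are the n_selected counter, the copy-back bounded by it, and the out_n_selected output parameter.

```c
#include <stdint.h>

/*
 * Hypothetical sketch of the fixed copy-back. sel_idx must hold k_target
 * entries. The selection rule is a stand-in (nearest unselected candidate);
 * only the n_selected bookkeeping reflects the fix described above.
 */
void mmr_rerank_sketch(int n_candidates, const int64_t *cand_rowids,
                       const float *cand_distances, int k_target,
                       int *sel_idx, int64_t *out_rowids,
                       float *out_distances, int *out_n_selected) {
    int n_selected = 0;
    while (n_selected < k_target) {
        int best_idx = -1;
        for (int i = 0; i < n_candidates; i++) {
            int taken = 0;
            for (int s = 0; s < n_selected; s++)
                if (sel_idx[s] == i) taken = 1;
            if (taken) continue;
            if (best_idx < 0 || cand_distances[i] < cand_distances[best_idx])
                best_idx = i;
        }
        if (best_idx < 0) break; /* early exit: fewer candidates than k_target */
        sel_idx[n_selected++] = best_idx;
    }
    /* Copy back n_selected entries, not k_target: the tail was never written. */
    for (int s = 0; s < n_selected; s++) {
        out_rowids[s] = cand_rowids[sel_idx[s]];
        out_distances[s] = cand_distances[sel_idx[s]];
    }
    *out_n_selected = n_selected; /* caller uses this as k_used */
}
```

With three candidates and k_target = 5, the caller gets three rows and k_used = 3, instead of five rows whose last two were never written.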

vlasky · 2026-02-28T04:07:21Z

Thanks for this contribution! Merged.

I added a follow-up commit (8d4ef9e) to normalize the diversity term in the MMR greedy loop. The relevance term was already normalized to [0,1] by dividing by max_dist, but the inter-candidate similarity used raw distances (1 - d). For cosine distance this is fine since values are already bounded (cosine distance lies in [0,2]), but for L2 and L1, where distances are unbounded, the two terms in the MMR score were on different scales, so mmr_lambda didn't have a consistent effect across distance metrics.

The fix divides the inter-candidate distance by max_dist to match, so both terms are on the same scale regardless of metric.
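A small worked example of the scale mismatch, assuming the MMR score takes the standard form lambda * relevance - (1 - lambda) * similarity, with relevance = 1 - d_q / max_dist. The function names are hypothetical. The scenario uses L2 distances with max_dist = 8 and a candidate at query distance 4 (relevance 0.5) that lies at distance 2 from an already-selected item.

```c
/* Hypothetical helpers contrasting the diversity term before and after the fix. */

/* Before: similarity from the raw inter-candidate distance. */
double sim_raw(double d_ij) { return 1.0 - d_ij; }

/* After: inter-candidate distance divided by max_dist, like the relevance term. */
double sim_normalized(double d_ij, double max_dist) { return 1.0 - d_ij / max_dist; }

/* Relevance term, already normalized by max_dist. */
double relevance(double d_q, double max_dist) { return 1.0 - d_q / max_dist; }

/* Standard MMR score: reward relevance, penalize similarity to selected items. */
double mmr_score(double lambda, double rel, double sim) {
    return lambda * rel - (1.0 - lambda) * sim;
}
```

Under the raw form the similarity is 1 - 2 = -1, so the redundancy penalty turns into a bonus and the score at lambda = 0.5 is 0.75, more than the relevance term alone can contribute (at most 0.5). After normalizing, the similarity is 0.75 and the score is -0.125, with both terms expressed relative to max_dist.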

MayCXC · 2026-02-28T07:06:43Z

@mceachen @vlasky thank you both for the very rapid fixes :)

MayCXC added 3 commits February 27, 2026 22:28

Fix test_shadow snapshot ordering on newer SQLite versions

6480b73


mceachen added a commit to photostructure/sqlite-vec that referenced this pull request Feb 27, 2026

feat: add mmr_lambda hidden column and MMR reranking functionality for KNN queries -- see vlasky#6

bc60431

vlasky merged commit 69dfda1 into vlasky:main Feb 28, 2026


vlasky merged 4 commits into vlasky:main from MayCXC:mmr-reranking-vlasky
