feat(search): add fuzzy match tier for typo tolerance by x3ek · Pull Request #124 · xeek-dev/squishmark

x3ek · 2026-07-03T16:36:59Z

Closes #103

Adds a third, lowest-ranked match tier to the search scorer using stdlib difflib. A query token of 4 or more characters that fails both exact and prefix matching now matches a field token when their SequenceMatcher ratio is 0.8 or higher.
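A minimal sketch of the per-token tier check, assuming "prefix" means the field token starts with the query token. `match_tier`, `MIN_FUZZY_LEN`, and `FUZZY_RATIO` are hypothetical names, not the identifiers in `src/squishmark/services/search.py`; only the 4-character floor and the 0.8 threshold come from this description.

```python
from difflib import SequenceMatcher
from typing import Optional

# Thresholds from the PR description; the constant names are hypothetical.
MIN_FUZZY_LEN = 4
FUZZY_RATIO = 0.8


def match_tier(query_token: str, field_token: str) -> Optional[str]:
    """Return "exact", "prefix", "fuzzy", or None for one token pair.

    Assumption: a prefix hit means the field token starts with the query
    token (e.g. "gum" -> "gumbo").
    """
    if query_token == field_token:
        return "exact"
    if field_token.startswith(query_token):
        return "prefix"
    # Fuzzy is the last resort, and only for tokens long enough that a
    # 0.8 similarity is meaningful; short tokens never fuzzy-match.
    if len(query_token) >= MIN_FUZZY_LEN:
        if SequenceMatcher(None, query_token, field_token).ratio() >= FUZZY_RATIO:
            return "fuzzy"
    return None


for query, field in [("gumbo", "gumbo"), ("gum", "gumbo"),
                     ("gumob", "gumbo"), ("gub", "gumbo")]:
    print(f"{query!r} vs {field!r}: {match_tier(query, field)}")
```

The three-letter `gub` is rejected by the length check before difflib runs, while the transposed `gumob` falls through exact and prefix and lands in the fuzzy tier.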

Ranking guarantee: weights alone cannot keep every fuzzy hit below every real hit across fields (a fuzzy title weight would outscore an exact body weight). So any post that needed at least one fuzzy match sorts behind every post matched entirely by exact/prefix, and the fuzzy-tier weights only order posts within that fuzzy class. AND semantics, the /search contract, and the index format are unchanged.
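A sketch of the two-level sort key this guarantee implies. `ScoredPost`, `needs_fuzzy`, `rank`, and the scores below are hypothetical; only the ordering rule (fuzzy-dependent posts last, score deciding order within each class) comes from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredPost:
    slug: str
    score: float        # summed field/tier weights (hypothetical values below)
    needs_fuzzy: bool   # True if any query token matched only via the fuzzy tier


def rank(posts: List[ScoredPost]) -> List[ScoredPost]:
    # False sorts before True, so every post matched entirely by exact/prefix
    # precedes every fuzzy-dependent post; the score only orders posts within
    # each of those two classes (highest first).
    return sorted(posts, key=lambda p: (p.needs_fuzzy, -p.score))


posts = [
    ScoredPost("typo-in-title", score=3.0, needs_fuzzy=True),   # fuzzy title hit
    ScoredPost("exact-in-body", score=2.0, needs_fuzzy=False),  # exact body hit
]
by_weight_only = sorted(posts, key=lambda p: -p.score)
print([p.slug for p in by_weight_only])  # fuzzy title hit wins on weight alone
print([p.slug for p in rank(posts)])     # exact body hit now ranks first
```

Sorting by weight alone puts the fuzzy title hit first (3.0 > 2.0), which is the cross-field case the guarantee rules out; the tuple key fixes it without retuning any weights.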

Performance: fuzzy matching runs only for tokens that have already failed exact and prefix, behind a real_quick_ratio pre-filter; the measured worst case is ~0.6 ms per post against a 500-token body vocabulary. rapidfuzz remains the escape hatch, per the issue, if content scale grows.
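The pre-filter can follow the same cascade that stdlib `difflib.get_close_matches` uses: `real_quick_ratio()` and `quick_ratio()` are cheap upper bounds on `ratio()`, so a vocabulary token that fails either can be skipped before the full comparison. This is an illustrative sketch: the description names only `real_quick_ratio`, so the `quick_ratio` stage and the `fuzzy_hits` name are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def fuzzy_hits(query_token: str, vocabulary: Iterable[str], cutoff: float = 0.8) -> List[str]:
    """Return the vocabulary tokens whose ratio() against query_token is >= cutoff."""
    matcher = SequenceMatcher()
    # SequenceMatcher caches details about seq2, so the fixed query token goes
    # there once and each vocabulary token is swapped in as seq1.
    matcher.set_seq2(query_token)
    hits = []
    for token in vocabulary:
        matcher.set_seq1(token)
        # real_quick_ratio() (lengths only) and quick_ratio() (character counts)
        # are upper bounds on ratio(); failing either rules the token out cheaply.
        if (matcher.real_quick_ratio() >= cutoff
                and matcher.quick_ratio() >= cutoff
                and matcher.ratio() >= cutoff):
            hits.append(token)
    return hits


print(fuzzy_hits("gumob", ["gumbo", "gumdrop", "soup", "a"]))
```

Here `a` fails on length alone, `soup` and `gumdrop` fail the character-count bound, and only `gumbo` reaches the full `ratio()` call. `difflib.get_close_matches` applies the same three-stage check but returns at most `n` best matches sorted by score, which suits "closest word" lookups more than a yes/no membership test.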

Tests: 9 new cases covering the 0.8 boundary, short-token exclusion, typo recall (gumob, intergalatic), in-field tier ordering, cross-field never-outrank, and AND semantics. One existing test was repointed: its old probe token now legitimately fuzzy-matches, so that case is kept as a positive fuzzy test.
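The 0.8 boundary is tight for a one-transposition typo. SequenceMatcher's ratio is 2·M/T, where M is the number of matched characters and T the combined length of both strings, and `gumob` vs `gumbo` matches only "gum" + "o". A pytest-style sketch of boundary checks; the test names are hypothetical, not the ones in `tests/test_search.py`.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def test_transposition_lands_exactly_on_threshold():
    # Matching blocks "gum" + "o": 2 * 4 / (5 + 5) == 0.8, so ">=" matters.
    assert ratio("gumob", "gumbo") == 0.8


def test_dropped_letter_clears_threshold():
    # Matching blocks "intergala" + "tic": 2 * 12 / (12 + 13) == 0.96.
    assert abs(ratio("intergalatic", "intergalactic") - 0.96) < 1e-9


def test_near_miss_stays_below_threshold():
    # Matching blocks "gum" + "o": 2 * 4 / (5 + 7) is about 0.67.
    assert ratio("gumob", "gumdrop") < 0.8
```

With a strict `>` comparison, `gumob` would silently stop matching `gumbo`, which is why a case sitting exactly on the boundary is worth pinning.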

Live-verified: /search?q=gumob and /search?q=intergalatic both return the gumbo post; exact queries unaffected.

🤖 Generated with Claude Code

Query tokens of 4+ chars that fail exact and prefix now match field tokens at difflib SequenceMatcher ratio 0.8 or higher. Tier ordering is strict (exact > prefix > fuzzy) and posts needing any fuzzy match always rank behind posts matched entirely by exact/prefix. AND semantics and the /search contract are unchanged; the index format is untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR extends SquishMark’s server-side search scorer with a third (lowest) fuzzy-matching tier to improve typo tolerance, while keeping AND semantics and the /search response contract intact.

Changes:

- Added a fuzzy tier (stdlib difflib.SequenceMatcher) to token scoring, gated by minimum token length and a ratio threshold.
- Updated ranking to ensure posts that require any fuzzy-only token match sort behind posts matched entirely via exact/prefix.
- Added focused unit tests for fuzzy threshold boundaries, short-token exclusion, ranking behavior, and AND semantics.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/squishmark/services/search.py` | Adds fuzzy token matching, updates scoring to track fuzzy dependence, and adjusts sort keys to keep fuzzy-dependent posts behind real matches. |
| `tests/test_search.py` | Adds unit tests validating fuzzy recall, threshold behavior, and ranking guarantees. |


x3ek added this to the SquishMark 1.0 milestone Jul 3, 2026

x3ek requested a review from Copilot July 3, 2026 16:37

Copilot started reviewing on behalf of x3ek July 3, 2026 16:37

Copilot AI reviewed Jul 3, 2026


Comment thread on `src/squishmark/services/search.py` (outdated)

fix(search): evaluate fuzzy tier only when a token has no real match

9343aad

Keeps fuzzy from perturbing order among exact/prefix posts and skips the fuzzy scan entirely on queries with real hits.
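One reading of this fix, as a sketch: for each query token, exact and prefix matching run across every field first, and the fuzzy pass runs only if no field produced a real hit; a token with no hit of any tier drops the post (AND semantics). The field names, the `WEIGHTS` table, and the function names are hypothetical, not taken from `search.py`.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Hypothetical per-field, per-tier weights for illustration only.
WEIGHTS = {
    "title": {"exact": 10.0, "prefix": 6.0, "fuzzy": 3.0},
    "body": {"exact": 2.0, "prefix": 1.0, "fuzzy": 0.5},
}


def real_tier(query_token: str, tokens: List[str]) -> Optional[str]:
    """Best exact/prefix tier for one field, or None."""
    if query_token in tokens:
        return "exact"
    if any(t.startswith(query_token) for t in tokens):
        return "prefix"
    return None


def fuzzy_hit(query_token: str, tokens: List[str], cutoff: float = 0.8) -> bool:
    if len(query_token) < 4:  # short tokens never fuzzy-match
        return False
    matcher = SequenceMatcher()
    matcher.set_seq2(query_token)
    for t in tokens:
        matcher.set_seq1(t)
        if matcher.real_quick_ratio() >= cutoff and matcher.ratio() >= cutoff:
            return True
    return False


def score_post(query_tokens: List[str],
               fields: Dict[str, List[str]]) -> Optional[Tuple[float, bool]]:
    """Return (score, needs_fuzzy), or None if any query token matches nothing."""
    score, needs_fuzzy = 0.0, False
    for q in query_tokens:
        real: Dict[str, str] = {}
        for name, tokens in fields.items():
            tier = real_tier(q, tokens)
            if tier is not None:
                real[name] = tier
        if real:
            # A real hit in any field: skip the fuzzy scan for this token.
            score += sum(WEIGHTS[name][tier] for name, tier in real.items())
            continue
        fuzzy_fields = [name for name, tokens in fields.items() if fuzzy_hit(q, tokens)]
        if not fuzzy_fields:
            return None  # AND semantics: every token must match somewhere
        score += sum(WEIGHTS[name]["fuzzy"] for name in fuzzy_fields)
        needs_fuzzy = True
    return score, needs_fuzzy


post = {"title": ["gumbo"], "body": ["roux", "okra", "gumob"]}
print(score_post(["gumbo"], post))           # the body's "gumob" adds nothing
print(score_post(["gumbo", "pasta"], post))  # "pasta" matches nowhere
```

For the `gumbo` query, the exact title hit settles the token, so the `gumob` typo in the body contributes no fuzzy weight; on this reading, that is the perturbation among exact/prefix posts the commit removes.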

x3ek merged commit ada0721 into main Jul 3, 2026
5 checks passed

x3ek deleted the feat/103-fuzzy-search branch July 3, 2026 17:37

x3ek mentioned this pull request Jul 3, 2026

chore(main): release 0.3.0 #102 (merged)

x3ek merged 2 commits into main from feat/103-fuzzy-search
