fix(now-playing): match tracks when equipment embeds "(ft. …)" in the title#501
Conversation
… title DJ equipment (StageLinQ/Denon, streaming metadata) reports featured artists inside the track title (e.g. "Get Low (ft. Ying Yang Twins)") and often only the primary artist, while guests request the plain title listing every collaborator. The fuzzy matcher left "(ft. …)" in the normalized title, collapsing the score below the 0.8 threshold, so accepted requests were never auto-matched and thus never marked played. Reconnecting re-POSTed the same verbose title and re-ran the same failing match. - normalize_track_title strips featured-artist credits (parenthetical, bracketed, and trailing "feat./ft./featuring …"). Named remixes/versions are preserved; "with" is excluded to avoid eating real title words. - fuzzy_match_pending_request uses the multi-artist-aware artist_match_score so "Lil Jon" matches "Lil Jon, The EastSide Boyz, Ying Yang Twins". Regression tests pin the production case (event RMLNDZ / request #491). Closes #500 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Warning Review limit reached
More reviews will be available in 2 hours, 55 minutes, and 27 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more credits in the billing tab to continue. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds two regex patterns ( ChangesFeatured-artist credit stripping and multi-artist matching fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
server/app/services/track_normalizer.py (1)
28-39: Regex patterns have theoretical ReDoS vulnerability, but input constraints mitigate practical risk.Both
FEAT_PAREN_REandFEAT_TRAILING_REexhibit backtracking under pathological input. However, all user-facing endpoints enforcemax_length=255on track titles via Pydantic validation, and testing confirms realistic payloads complete in <1ms:
- 255-char payload: 0.0006s
- 300-char payload: 0.0008s
The vulnerability only manifests with inputs >> 255 characters, which cannot be submitted through the API. While the patterns could be improved for defense-in-depth, this is not a practical security issue given the existing constraints.
Suggested improvement (optional): Constrain whitespace repetitions and use lazy quantifiers:
-FEAT_PAREN_RE = re.compile( - r"[\(\[]\s*(?:featuring|feat|ft)\.?\s+[^\)\]]*[\)\]]", - re.IGNORECASE, -) -FEAT_TRAILING_RE = re.compile( - r"\s+(?:featuring|feat|ft)\.?\s+.*$", - re.IGNORECASE, -) +FEAT_PAREN_RE = re.compile( + r"[\(\[]\s{0,3}(?:featuring|feat|ft)\.?\s{1,3}[^\)\]]{0,100}?[\)\]]", + re.IGNORECASE, +) +FEAT_TRAILING_RE = re.compile( + r"\s+(?:featuring|feat|ft)\.?\s{1,3}.{0,100}?$", + re.IGNORECASE, +)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/app/services/track_normalizer.py` around lines 28 - 39, The regex patterns FEAT_PAREN_RE and FEAT_TRAILING_RE use greedy quantifiers and unbounded whitespace matching that can cause excessive backtracking under pathological input. To improve regex performance and follow defense-in-depth principles, constrain the whitespace repetitions by limiting `\s+` occurrences and convert greedy quantifiers like `.*` in FEAT_TRAILING_RE to lazy quantifiers using `.*?` to reduce backtracking. Additionally, in FEAT_PAREN_RE, consider constraining the `[^\)\]]*` pattern to avoid unbounded matching. These changes will make the patterns more efficient while maintaining the same matching behavior for normal inputs.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/app/services/track_normalizer.py`:
- Around line 28-39: The regex patterns FEAT_PAREN_RE and FEAT_TRAILING_RE use
greedy quantifiers and unbounded whitespace matching that can cause excessive
backtracking under pathological input. To improve regex performance and follow
defense-in-depth principles, constrain the whitespace repetitions by limiting
`\s+` occurrences and convert greedy quantifiers like `.*` in FEAT_TRAILING_RE
to lazy quantifiers using `.*?` to reduce backtracking. Additionally, in
FEAT_PAREN_RE, consider constraining the `[^\)\]]*` pattern to avoid unbounded
matching. These changes will make the patterns more efficient while maintaining
the same matching behavior for normal inputs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 59daa100-5e77-4586-9a2f-969fe31880f4
📒 Files selected for processing (4)
server/app/services/now_playing.pyserver/app/services/track_normalizer.pyserver/tests/test_now_playing.pyserver/tests/test_track_normalizer.py
…tracking
CodeQL flagged FEAT_PAREN_RE / FEAT_TRAILING_RE as polynomial-regex (ReDoS):
an unbounded `\s+` adjacent to a class that also matches spaces backtracks
O(n^2) on long runs of whitespace. Track titles are capped at 255 chars by
Pydantic so it is not practically exploitable, but bound every quantifier
({0,3}/{1,3}/{0,100}/{0,150}) for defense in depth. Behavior is unchanged for
realistic titles. Adds a linear-time regression guard on the two patterns.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
CodeRabbit nitpick addressed (ReDoS in Note: the pre-existing |
…#11 (#517) * fix(security): bound _SPLIT_RE quantifiers to clear ReDoS alert (#516) _SPLIT_RE used unbounded \s+/\s* around literal delimiters, which CodeQL py/polynomial-redos flagged (alert #11): a long run of spaces backtracks quadratically. Not exploitable in prod (every caller caps artist at 255 chars and the public path collapses whitespace), but the regex itself was the last unbounded one in this file — the sibling FEAT_* regexes were already bounded in #501 for the same rule. Bound the whitespace quantifiers to \s{0,3}/\s{1,3} so the pattern is linear-time regardless of input length or caller. Add a ReDoS guard test mirroring the existing feat-regex guard. Closes #516 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: loosen ReDoS-guard wall-clock budget to 2.0s for slow CI runners CodeRabbit flagged the 0.2s threshold as potentially flaky on shared runners. A quadratic regression takes tens of seconds, so 2.0s still catches it with wide margin while removing any flake risk. Applied to both ReDoS guards in this file to keep them consistent. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Why
On production event RMLNDZ, the DJ played "Get Low" and the bridge reported it correctly, but the accepted request stayed in the queue and was never marked played. The DJ also saw the reconnect-but-still-unmatched behaviour.
Ground truth from prod (event id 16):
Get Low (ft. Ying Yang Twins)byLil Jon & The Eastside Boyz.Get LowbyLil Jon, The EastSide Boyz, Ying Yang Twins.now_playing.matched_request_id= NULL — never matched.What
The fuzzy matcher's title normalizer only stripped generic mix suffixes, leaving featured-artist credits like
(ft. Ying Yang Twins)in the title. That collapsedSequenceMatcherto ~0.39, dropping the combined score below the 0.8 threshold so the request never auto-matched (and reconnecting just re-ran the same failing match). The artist side also used the raw ratio instead of the multi-artist-aware scorer.normalize_track_titlenow strips featured-artist credits — parenthetical, bracketed, and trailingfeat./ft./featuring …. Named remixes/versions are preserved;withis excluded so it doesn't eat real title words ("Dancing With Myself").fuzzy_match_pending_requestuses the multi-artist-awareartist_match_score, so equipment reporting onlyLil Jonstill matches a multi-artist request.Bridge-app side (collection-code-vs-join-code display) is a separate, independent fix and will be its own PR.
Testing
test_track_normalizer.py— feat-strip cases + guard forwith/remix preservation)test_now_playing.py— pins prod event RMLNDZ / request feat(setbuilder): autobuild + fill_to_duration agent tools (#442 Family 3) — gated on undo UX #491, incl. primary-artist-only case)🤖 Co-authored by Claude Opus 4.8. Closes #500.
Summary by CodeRabbit
Bug Fixes
Tests