Extract class candidates from inside parenthesized expressions#19775

Closed
mvanhorn wants to merge 1 commit into tailwindlabs:main from mvanhorn:osc/19458-fix-data-loading-class-extraction

@mvanhorn

Summary

Fixes #19458

Tailwind's content scanner doesn't extract class names from inside function-call-like syntax in data attributes such as `data-loading="addClass(opacity-50)"` (used by Symfony Live Components, Alpine.js, HTMX, etc.).

The fix adds `(` and `)` as boundary characters in the Rust extractor. This allows the scanner to recognize class names inside parenthesized expressions like `addClass(opacity-50)`, since the parentheses now act as valid candidate boundaries.

This may produce a few additional harmless false-positive candidates (function names like `addClass` itself), but Tailwind ignores tokens that don't match any utility class, so there's no functional impact.
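To make the boundary idea concrete, here is a deliberately simplified Rust model. It is not Tailwind's extractor (the real one in `crates/oxide` uses state machines and accepts much richer syntax); the names `is_common_boundary` and `extract`, and the set of non-paren boundary bytes, are assumptions for illustration only. In this model a token counts as a candidate only if boundary bytes delimit it on both sides, so toggling `(` and `)` as boundaries is the only difference between the two runs.

```rust
/// Hypothetical toy model, not Tailwind's code: bytes that may sit on
/// either side of a candidate (what the PR calls `Common` boundaries).
fn is_common_boundary(b: u8, parens_are_boundaries: bool) -> bool {
    match b {
        // Assumed baseline boundaries for this sketch.
        b' ' | b'\t' | b'\n' | b'"' | b'\'' | b'\0' => true,
        // The change proposed in this PR.
        b'(' | b')' => parens_are_boundaries,
        _ => false,
    }
}

/// Split on boundary bytes, then keep only tokens made of ASCII letters,
/// digits and `-`: a crude stand-in for "looks like a utility class".
fn extract(input: &str, parens_are_boundaries: bool) -> Vec<&str> {
    input
        .split(|c: char| c.is_ascii() && is_common_boundary(c as u8, parens_are_boundaries))
        .filter(|tok| !tok.is_empty())
        .filter(|tok| tok.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-'))
        .collect()
}
```

With parens off, `addClass(opacity-50)` stays one token that fails validation, so nothing is extracted from the attribute. With parens on, both `opacity-50` and the function name `addClass` come out, which is the false positive the description refers to.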

Test plan

  • Added Rust test: `data-loading="addClass(opacity-50)"` extracts `opacity-50`
  • Added Rust test: `data-loading="addClass(opacity-50) removeClass(hidden)"` extracts both candidates
  • Updated existing test expectations to account for new boundary behavior
  • `cargo test` passes (all Rust tests)
  • `pnpm build && pnpm test` passes (all 4620 Vitest tests)

This contribution was developed with AI assistance (Claude Code).

Add `(` and `)` as Common boundary characters so the content scanner
extracts class names from inside function-call-like syntax in data
attributes (e.g. `data-loading="addClass(opacity-50)"`).

This pattern is used by Symfony Live Components, Alpine.js, and similar
frameworks. Previously, parentheses broke candidate extraction because
they were not recognized as valid boundaries.

Fixes tailwindlabs#19458

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 10, 2026

Walkthrough

The change expands the boundary detection logic in the extractor to recognize parentheses '(' and ')' as boundary delimiters alongside null bytes. This enables the tokenizer to properly handle function-call-like syntax within data attributes (e.g., data-loading="addClass(opacity-50)"). The production change is minimal—updating the Common variant of the Class enum in boundary.rs. The remaining changes are test updates across multiple files to reflect the new extraction behavior, including tokens previously ignored in parenthetical contexts and nested function calls. Whitespace adjustments in a HAML test fixture align indentation without functional changes.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding parentheses as boundary characters to enable class extraction from parenthesized expressions. |
| Description check | ✅ Passed | The description clearly relates to the changeset by explaining the fix for extracting classes from parenthesized data attributes and detailing test coverage. |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #19458 by adding parentheses as boundary characters to extract class names from function-call-like syntax in data attributes. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the stated objectives: boundary character additions, test updates, and fixture adjustments for supporting parenthesized expression extraction. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/oxide/src/extractor/boundary.rs`:
- Around line 60-66: Change the paren boundary mapping so '(' is a Before
boundary and ')' is an After boundary instead of both being mapped to Common: in
the enum where Common, Before and After are annotated with #[bytes(...)] replace
#[bytes(b'(', b')')] on Common with #[bytes(b'\0')], add #[bytes(b'(')] to the
Before variant, and add #[bytes(b')')] to the After variant so candidates start
after '(' and end before ')'.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 908fecb2-5648-41d0-a8b3-6eb84a398b80

📥 Commits

Reviewing files that changed from the base of the PR and between c586bd6 and d957ba3.

📒 Files selected for processing (6)
  • crates/oxide/src/extractor/boundary.rs
  • crates/oxide/src/extractor/candidate_machine.rs
  • crates/oxide/src/extractor/mod.rs
  • crates/oxide/src/extractor/named_utility_machine.rs
  • crates/oxide/src/extractor/pre_processors/test-fixtures/haml/dst-17051.haml
  • crates/oxide/src/extractor/utility_machine.rs

Comment on lines +60 to +66
// Function-call-like syntax in data attributes, e.g.:
//
// ```
// <div data-loading="addClass(opacity-50)">
//                            ^          ^
// ```
#[bytes(b'(', b')')]
Contributor

⚠️ Potential issue | 🟠 Major

Split `(` and `)` into directional boundaries instead of `Common`.

Line 66 makes `(` valid as an after-boundary and `)` valid as a before-boundary, which broadens extraction beyond the regression you’re fixing. You can already see that downstream: `crates/oxide/src/extractor/mod.rs` Line 446 now extracts `bg-[red]` from `bg-[red](--my-color)`, and the updated tests start accepting wrapper names like `addClass`, `fn`, and `variant`. The issue only needs candidates to start after `(` and end before `)`, so `(` should be `Before` and `)` should be `After`.
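The review's argument can be checked with a similar toy model. It is hypothetical: `Class`, `classify` and `is_bounded` are illustrative names, the space/quote/NUL mapping is assumed, and only the `>`/`<` and paren mappings are taken from the suggested diff. A span is accepted when the byte before it is `Common` or `Before` (or it starts the input) and the byte after it is `Common` or `After` (or it ends the input).

```rust
/// Hypothetical sketch of the three boundary classes discussed in review.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Class {
    Common, // allowed immediately before or after a candidate
    Before, // allowed only immediately before a candidate
    After,  // allowed only immediately after a candidate
    Other,  // not a boundary
}

/// `directional = false` maps both parens to `Common` (this PR);
/// `directional = true` maps `(` to `Before` and `)` to `After` (the review).
fn classify(b: u8, directional: bool) -> Class {
    match b {
        b' ' | b'"' | b'\0' => Class::Common,
        b'>' => Class::Before,
        b'<' => Class::After,
        b'(' => if directional { Class::Before } else { Class::Common },
        b')' => if directional { Class::After } else { Class::Common },
        _ => Class::Other,
    }
}

/// Is the byte span `input[start..end]` properly delimited?
fn is_bounded(input: &[u8], start: usize, end: usize, directional: bool) -> bool {
    let before_ok = start == 0
        || matches!(classify(input[start - 1], directional), Class::Common | Class::Before);
    let after_ok = end == input.len()
        || matches!(classify(input[end], directional), Class::Common | Class::After);
    before_ok && after_ok
}
```

Both mappings accept `opacity-50` inside `addClass(opacity-50)`, but only the `Common` mapping also accepts `addClass` and `bg-[red]` when they are directly followed by `(`, which matches the extra extractions the review points out.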

♻️ Narrow the paren boundary scope
@@
-    #[bytes(b'\0')]
-    // Function-call-like syntax in data attributes, e.g.:
-    //
-    // ```
-    // <div data-loading="addClass(opacity-50)">
-    //                            ^           ^
-    // ```
-    #[bytes(b'(', b')')]
+    #[bytes(b'\0')]
     Common,
@@
-    #[bytes(b'>')]
+    #[bytes(b'>')]
+    #[bytes(b'(')]
     Before,
@@
-    #[bytes(b'<')]
+    #[bytes(b'<')]
+    #[bytes(b')')]
     After,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Function-call-like syntax in data attributes, e.g.:
//
// ```
// <div data-loading="addClass(opacity-50)">
//                            ^          ^
// ```
#[bytes(b'(', b')')]
#[bytes(b'\0')]
Common,
#[bytes(b'>')]
#[bytes(b'(')]
Before,
#[bytes(b'<')]
#[bytes(b')')]
After,

@RobinMalfait
Member

Hey! Thanks for the PR, but I'm going to say no to this solution.

> This may produce a few additional harmless false-positive candidates

While they are false positives, there are often not just a few of them: it can be hundreds of additional candidates. It's true that Tailwind's core will skip them most of the time, but we still have to pass them from Rust to the Node.js world and process them just to know whether they are valid classes.

I tried it on a few repos, small and big:

  • 3318 → 3380: +62 candidates (one of our templates)
  • 1732 → 1862: +130 candidates (my personal website)
  • 13532 → 13861: +329 candidates (tailwindcss.com)
  • 10309 → 10711: +402 candidates (tailwindcss.com/plus)
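The figures in the list read as before/after candidate totals with the delta appended (for example, 3318 + 62 = 3380). Assuming that reading, a quick calculation puts the growth in relative terms; the project labels below are shorthand for the ones named in the comment.

```rust
/// Growth in candidate count as a percentage of the baseline.
fn percent_increase(before: u32, after: u32) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}

/// (project, candidates before, candidates after), as read from the comment.
const RUNS: [(&str, u32, u32); 4] = [
    ("template", 3318, 3380),
    ("personal website", 1732, 1862),
    ("tailwindcss.com", 13532, 13861),
    ("tailwindcss.com/plus", 10309, 10711),
];
```

On these four projects the increase works out to roughly 1.9%, 7.5%, 2.4% and 3.9%, or 923 extra candidates in total, all of them false positives according to the comment.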

In the grand scheme of things, these are relatively low numbers, but they add up over time, especially since all of these additional candidates are false positives: this was tested on repos that don't have the problem described in the original issue.


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Source classes are not detected correctly when specified as data-loading="addClass(opacity-50)"

2 participants