fix(mdx): add regex fallback for MDX dependency detection on compilation failure#10174
fix(mdx): add regex fallback for MDX dependency detection on compilation failure#10174GiladShoham merged 10 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request adds resilience to MDX dependency detection by implementing a regex-based fallback when MDX v3 compilation fails on legacy syntax. The fix addresses compatibility issues with older MDX files that may contain HTML comments, escaped characters, bare variable declarations, or unclosed tags.
Changes:
- Modified
MDXDependencyDetectorto use lighter MDX options (excluding problematic rehype plugins) for compilation - Added regex-based import detection as a fallback when compilation fails
- Enhanced the detector with logger support and error handling
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| scopes/mdx/mdx/mdx.main.runtime.ts | Passes logger instance to MDXDependencyDetector constructor |
| scopes/mdx/mdx/mdx.detector.ts | Implements fallback regex-based dependency detection, adds error handling, and introduces lighter MDX options for detection |
| // isSupported is always called immediately before detect() for the same file. | ||
| this.currentFilename = context.filename; | ||
| return this.supportedExtensions.includes(context.ext); |
There was a problem hiding this comment.
This comment is misleading. While isSupported is called before detect is extracted (precinct/index.ts:123-125), the detect method is not invoked until later (line 210), potentially after other files have been processed. This delay is precisely why the race condition with currentFilename exists (see related comment on lines 51-52).
scopes/mdx/mdx/mdx.detector.ts
Outdated
| const importRegex = /import\s+(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)*\s*from\s*['"]([^'"]+)['"]/g; | ||
| const modules: string[] = []; | ||
| let match: RegExpExecArray | null; | ||
| while ((match = importRegex.exec(source)) !== null) { | ||
| modules.push(match[1]); | ||
| } | ||
| return modules; | ||
| } | ||
|
|
There was a problem hiding this comment.
The new fallback functionality (regex-based import detection and error handling) lacks test coverage. Consider adding tests for:
- Files with legacy MDX syntax that trigger the fallback (HTML comments, unclosed tags, etc.)
- Verifying the regex correctly extracts imports in fallback scenarios
- Edge cases like multiline imports, mixed quote styles, and TypeScript type imports
Note that existing tests in mdx.detector.spec.ts are commented out, so adding new tests would require resolving the ESM import issues first.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
| private currentFilename?: string; | ||
|
|
There was a problem hiding this comment.
Using instance state (currentFilename) creates a race condition in concurrent file processing. The detector's isSupported method is called, then the detect method is extracted and returned (see precinct/index.ts line 125), and finally called later (line 210). Between isSupported and detect, other files may be processed, causing currentFilename to be overwritten with the wrong filename.
Since the detect method signature is defined by the DependencyDetector interface and cannot include a filename parameter, consider removing the filename from warning messages entirely, or passing the filename through a different mechanism (e.g., a WeakMap keyed by source content, though this has its own trade-offs).
|
@GiladShoham I've opened a new pull request, #10175, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| const msg = `MDX compilation failed, falling back to regex-based import detection.${fileRef} Error: ${err.message}`; | ||
| this.logger?.consoleWarning(msg); |
There was a problem hiding this comment.
The error message includes err.message which could potentially expose sensitive information about the file structure, internal paths, or implementation details to end users. Consider sanitizing or truncating the error message, or logging the full error details only to the logger (not to console) and showing a simplified message to users via consoleWarning.
| const msg = `MDX compilation failed, falling back to regex-based import detection.${fileRef} Error: ${err.message}`; | |
| this.logger?.consoleWarning(msg); | |
| const userMsg = `MDX compilation failed, falling back to regex-based import detection.${fileRef}`; | |
| this.logger?.consoleWarning(userMsg); | |
| // Log full error details internally without exposing them via consoleWarning. | |
| const detailedMsg = `MDX compilation error${fileRef}: ${err?.message || String(err)}`; | |
| (this.logger as any)?.error?.(detailedMsg, err); |
| isSupported(context: FileContext): boolean { | ||
| // Capture filename for use in detect() warning messages. | ||
| // isSupported is always called immediately before detect() for the same file. | ||
| this.currentFilename = context.filename; |
There was a problem hiding this comment.
The implementation assumes that isSupported() is "always called immediately before detect() for the same file" (line 67), but this assumption may not hold in all usage contexts. In precinct/index.ts:122-126, the detect method is extracted and may be called later as a detached function, potentially processing different files in between. This could lead to currentFilename being incorrect when detect() is called, resulting in misleading error messages. Consider passing the filename as a parameter to detect() if the DependencyDetector interface allows it, or document this as a known limitation.
scopes/mdx/mdx/mdx.detector.ts
Outdated
| */ | ||
| function detectImportsWithRegex(source: string): string[] { | ||
| const importRegex = | ||
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; |
There was a problem hiding this comment.
The regex pattern \{[^}]*\} will fail to match nested braces in destructured imports, such as import { foo: { bar } } from 'module'. While this is an uncommon pattern, it is valid JavaScript and could appear in MDX files. Consider using a more robust parser or documenting this limitation.
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; | |
| /import\s+(?:type\s+)?(?:[^'"\\r\\n]*?)\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; |
scopes/mdx/mdx/mdx.detector.ts
Outdated
| const importRegex = | ||
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; |
There was a problem hiding this comment.
The regex pattern does not handle multiline imports, which are valid in JavaScript/MDX. For example, imports like:
import {
Component,
useState
} from 'react'
will not be detected. Consider adding the s (dotAll) flag or modifying the regex to handle newlines within the import statement. Alternatively, document this limitation explicitly in the function's JSDoc comment.
scopes/mdx/mdx/mdx.detector.ts
Outdated
| /** | ||
| * MDX options for dependency detection only. | ||
| * Uses the remark plugins from mdxOptions (for frontmatter and import extraction) | ||
| * but excludes rehype plugins like rehypeMdxCodeProps that fail on legacy | ||
| * code fence meta syntax (e.g. `live=true`). | ||
| */ | ||
| const detectorMdxOptions = { | ||
| remarkPlugins: mdxOptions.remarkPlugins, | ||
| jsxImportSource: mdxOptions.jsxImportSource, | ||
| }; |
There was a problem hiding this comment.
The new detectorMdxOptions configuration removes rehype plugins to avoid failures with legacy syntax, but this changes the compilation behavior compared to the original mdxOptions. This could lead to inconsistencies where the detector successfully compiles with the lighter config, but the actual MDX compilation later fails with the full config. Consider documenting this tradeoff and whether it could lead to false positives in dependency detection.
scopes/mdx/mdx/mdx.detector.ts
Outdated
| */ | ||
| function detectImportsWithRegex(source: string): string[] { | ||
| const importRegex = | ||
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; |
There was a problem hiding this comment.
The regex fallback does not handle export ... from statements (e.g., export { Component } from 'module' or export * from 'module'), which create dependencies just like imports. While less common in MDX files, these are valid JavaScript syntax that could appear in MDX code blocks and should be detected to ensure complete dependency resolution.
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; | |
| /(?:import|export)\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; |
scopes/mdx/mdx/mdx.detector.ts
Outdated
| function detectImportsWithRegex(source: string): string[] { | ||
| const importRegex = | ||
| /import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g; | ||
| const modules: string[] = []; | ||
| let match: RegExpExecArray | null; | ||
| while ((match = importRegex.exec(source)) !== null) { | ||
| const moduleName = match[1] || match[2]; | ||
| if (moduleName) modules.push(moduleName); | ||
| } | ||
| return modules; | ||
| } |
There was a problem hiding this comment.
The new regex fallback functionality lacks test coverage. While the original tests are commented out (line 7), the new fallback logic should be tested to ensure it correctly handles:
- Various import patterns (default, named, star, side-effect)
- Type imports
- Multiline imports
- Edge cases where MDX compilation fails
- Error handling and logging behavior
Consider adding tests for the detectImportsWithRegex function and the fallback path in the detect method.
…mports (#10175) Addresses feedback requesting test coverage for the MDX detector's regex-based fallback that handles legacy MDX syntax incompatible with MDX v3. ## Changes **Test Coverage (40+ test cases)** - Valid MDX syntax: default, named, star, and side-effect imports - Regex fallback triggers: HTML comments, unclosed tags, escaped characters, bare variables - Edge cases: type imports, mixed quotes, scoped packages, relative paths, whitespace, code blocks **Bug Fix: Side-Effect Imports** The original regex didn't match `import "module"` statements. Updated pattern now handles both: ```javascript // Both patterns now work import React from 'react'; // ✓ Pattern A: from imports import './styles.css'; // ✓ Pattern B: side-effect imports (was broken) ``` **Code Quality** - Extracted regex to `IMPORT_STATEMENT_REGEX` constant with detailed documentation - Documented pattern structure and capture groups with examples - Noted acceptable limitations: matches imports in comments/code blocks (trade-off for fallback simplicity) ## Implementation The regex uses two alternation patterns after `import\s+`: - **Pattern A** (group 1): `[type] <specifiers> from "module"` - handles all standard import forms - **Pattern B** (group 2): `"module"` - handles side-effect imports The `import\s+` prefix outside the alternation ensures both patterns require the import keyword, preventing false matches on arbitrary quoted strings. <!-- START COPILOT CODING AGENT TIPS --> --- 💬 We'd love your input! Share your thoughts on Copilot coding agent in our [2 minute survey](https://gh.io/copilot-coding-agent-survey). --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: GiladShoham <3840769+GiladShoham@users.noreply.github.com> Co-authored-by: Gilad Shoham <shoham.gilad@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| isSupported(context: FileContext): boolean { | ||
| // Capture filename for use in detect() warning messages. | ||
| // isSupported is always called immediately before detect() for the same file. | ||
| this.currentFilename = context.filename; |
There was a problem hiding this comment.
The comment states "isSupported is always called immediately before detect() for the same file," but this is only true by accident due to a cache key mismatch in the precinct code. The detector caches functions at typeToDetective[ext] (e.g., ".mdx"), but lookups use typeToDetective[type] where type="" for MDX files since they're not in the extToType map. This prevents caching from working, ensuring isSupported is called for every file.
While this happens to make currentFilename tracking work correctly, it's fragile and relies on unintended behavior. Consider either:
- Removing the currentFilename tracking and omitting filename from the warning
- Adding a comment explaining this quirk
- Passing filename directly to detect() if the DependencyDetector interface is updated in the future
| // Bind detect to preserve `this` context when passed as a detached function. | ||
| this.detect = this.detect.bind(this); | ||
| } |
There was a problem hiding this comment.
The method binding in the constructor is unnecessary for isSupported since it's always called through the detector object (not extracted), but it is necessary for detect since that method is extracted and cached. However, due to the caching mechanism, the binding alone doesn't solve the stale currentFilename issue identified in the previous comment. If the filename tracking is removed, the binding for detect should be kept for safety, but the binding for isSupported can be removed.
| * | ||
| * Limitations: | ||
| * - Will match imports in comments (e.g., // import "x") | ||
| * - Will match imports in code blocks/strings if they appear to be syntactically valid |
There was a problem hiding this comment.
The regex pattern has a potential issue with matching certain edge cases. The pattern \{[^}]*\} for named imports will fail on nested braces or complex destructuring patterns like import { a: { b } } from 'module'. While this is an uncommon pattern and the fallback is meant for error cases, consider documenting this limitation in the regex comment section alongside the existing limitations about code blocks and comments.
| * - Will match imports in code blocks/strings if they appear to be syntactically valid | |
| * - Will match imports in code blocks/strings if they appear to be syntactically valid | |
| * - Uses a simple `{[^}]*}` pattern for named imports, so it does not support nested braces or | |
| * complex destructuring patterns (e.g. `import { a: { b } } from "module"`); this is acceptable | |
| * for this best-effort fallback and may cause such patterns to be skipped or partially matched. |
MDX v3 compilation can fail on files with legacy syntax (HTML comments, escaped characters, bare variable declarations, unclosed tags). When this happens, the dependency detector now falls back to regex-based import extraction instead of crashing.
Also uses a lighter MDX options config (no rehype plugins) for detection to avoid failures from
rehypeMdxCodePropson legacy code fence meta syntax (e.g.live=true).