Skip to content

fix(mdx): add regex fallback for MDX dependency detection on compilation failure#10174

Merged
GiladShoham merged 10 commits intomasterfrom
fix/mdx-detector-fallback
Feb 2, 2026
Merged

fix(mdx): add regex fallback for MDX dependency detection on compilation failure#10174
GiladShoham merged 10 commits intomasterfrom
fix/mdx-detector-fallback

Conversation

@GiladShoham
Copy link
Copy Markdown
Member

MDX v3 compilation can fail on files with legacy syntax (HTML comments, escaped characters, bare variable declarations, unclosed tags). When this happens, the dependency detector now falls back to regex-based import extraction instead of crashing.

Also uses a lighter MDX options config (no rehype plugins) for detection to avoid failures from rehypeMdxCodeProps on legacy code fence meta syntax (e.g. live=true).

Copilot AI review requested due to automatic review settings February 1, 2026 19:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds resilience to MDX dependency detection by implementing a regex-based fallback when MDX v3 compilation fails on legacy syntax. The fix addresses compatibility issues with older MDX files that may contain HTML comments, escaped characters, bare variable declarations, or unclosed tags.

Changes:

  • Modified MDXDependencyDetector to use lighter MDX options (excluding problematic rehype plugins) for compilation
  • Added regex-based import detection as a fallback when compilation fails
  • Enhanced the detector with logger support and error handling

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
scopes/mdx/mdx/mdx.main.runtime.ts Passes logger instance to MDXDependencyDetector constructor
scopes/mdx/mdx/mdx.detector.ts Implements fallback regex-based dependency detection, adds error handling, and introduces lighter MDX options for detection

Comment on lines +65 to 67
// isSupported is always called immediately before detect() for the same file.
this.currentFilename = context.filename;
return this.supportedExtensions.includes(context.ext);
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is misleading. While isSupported is called before detect is extracted (precinct/index.ts:123-125), the detect method is not invoked until later (line 210), potentially after other files have been processed. This delay is precisely why the race condition with currentFilename exists (see related comment on lines 51-52).

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +48
const importRegex = /import\s+(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)*\s*from\s*['"]([^'"]+)['"]/g;
const modules: string[] = [];
let match: RegExpExecArray | null;
while ((match = importRegex.exec(source)) !== null) {
modules.push(match[1]);
}
return modules;
}

Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new fallback functionality (regex-based import detection and error handling) lacks test coverage. Consider adding tests for:

  1. Files with legacy MDX syntax that trigger the fallback (HTML comments, unclosed tags, etc.)
  2. Verifying the regex correctly extracts imports in fallback scenarios
  3. Edge cases like multiline imports, mixed quote styles, and TypeScript type imports

Note that existing tests in mdx.detector.spec.ts are commented out, so adding new tests would require resolving the ESM import issues first.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment on lines +51 to +52
private currentFilename?: string;

Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using instance state (currentFilename) creates a race condition in concurrent file processing. The detector's isSupported method is called, then the detect method is extracted and returned (see precinct/index.ts line 125), and finally called later (line 210). Between isSupported and detect, other files may be processed, causing currentFilename to be overwritten with the wrong filename.

Since the detect method signature is defined by the DependencyDetector interface and cannot include a filename parameter, consider removing the filename from warning messages entirely, or passing the filename through a different mechanism (e.g., a WeakMap keyed by source content, though this has its own trade-offs).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 1, 2026

@GiladShoham I've opened a new pull request, #10175, to work on those changes. Once the pull request is ready, I'll request review from you.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 1, 2026 20:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

Comment on lines +84 to +85
const msg = `MDX compilation failed, falling back to regex-based import detection.${fileRef} Error: ${err.message}`;
this.logger?.consoleWarning(msg);
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message includes err.message which could potentially expose sensitive information about the file structure, internal paths, or implementation details to end users. Consider sanitizing or truncating the error message, or logging the full error details only to the logger (not to console) and showing a simplified message to users via consoleWarning.

Suggested change
const msg = `MDX compilation failed, falling back to regex-based import detection.${fileRef} Error: ${err.message}`;
this.logger?.consoleWarning(msg);
const userMsg = `MDX compilation failed, falling back to regex-based import detection.${fileRef}`;
this.logger?.consoleWarning(userMsg);
// Log full error details internally without exposing them via consoleWarning.
const detailedMsg = `MDX compilation error${fileRef}: ${err?.message || String(err)}`;
(this.logger as any)?.error?.(detailedMsg, err);

Copilot uses AI. Check for mistakes.
Comment on lines 25 to +68
isSupported(context: FileContext): boolean {
// Capture filename for use in detect() warning messages.
// isSupported is always called immediately before detect() for the same file.
this.currentFilename = context.filename;
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation assumes that isSupported() is "always called immediately before detect() for the same file" (line 67), but this assumption may not hold in all usage contexts. In precinct/index.ts:122-126, the detect method is extracted and may be called later as a detached function, potentially processing different files in between. This could lead to currentFilename being incorrect when detect() is called, resulting in misleading error messages. Consider passing the filename as a parameter to detect() if the DependencyDetector interface allows it, or document this as a known limitation.

Copilot uses AI. Check for mistakes.
*/
function detectImportsWithRegex(source: string): string[] {
const importRegex =
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern \{[^}]*\} will fail to match nested braces in destructured imports, such as import { foo: { bar } } from 'module'. While this is an uncommon pattern, it is valid JavaScript and could appear in MDX files. Consider using a more robust parser or documenting this limitation.

Suggested change
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
/import\s+(?:type\s+)?(?:[^'"\\r\\n]*?)\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +41
const importRegex =
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern does not handle multiline imports, which are valid in JavaScript/MDX. For example, imports like:

import {
  Component,
  useState
} from 'react'

will not be detected. Consider adding the s (dotAll) flag or modifying the regex to handle newlines within the import statement. Alternatively, document this limitation explicitly in the function's JSDoc comment.

Copilot uses AI. Check for mistakes.
Comment on lines +23 to +32
/**
* MDX options for dependency detection only.
* Uses the remark plugins from mdxOptions (for frontmatter and import extraction)
* but excludes rehype plugins like rehypeMdxCodeProps that fail on legacy
* code fence meta syntax (e.g. `live=true`).
*/
const detectorMdxOptions = {
remarkPlugins: mdxOptions.remarkPlugins,
jsxImportSource: mdxOptions.jsxImportSource,
};
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new detectorMdxOptions configuration removes rehype plugins to avoid failures with legacy syntax, but this changes the compilation behavior compared to the original mdxOptions. This could lead to inconsistencies where the detector successfully compiles with the lighter config, but the actual MDX compilation later fails with the full config. Consider documenting this tradeoff and whether it could lead to false positives in dependency detection.

Copilot uses AI. Check for mistakes.
*/
function detectImportsWithRegex(source: string): string[] {
const importRegex =
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex fallback does not handle export ... from statements (e.g., export { Component } from 'module' or export * from 'module'), which create dependencies just like imports. While less common in MDX files, these are valid JavaScript syntax that could appear in MDX code blocks and should be detected to ensure complete dependency resolution.

Suggested change
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
/(?:import|export)\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +49
function detectImportsWithRegex(source: string): string[] {
const importRegex =
/import\s+(?:type\s+)?(?:(?:\{[^}]*\}|\*\s+as\s+\w+|\w+)\s*,?\s*)?\s*from\s*['"]([^'"]+)['"]|import\s+['"]([^'"]+)['"]/g;
const modules: string[] = [];
let match: RegExpExecArray | null;
while ((match = importRegex.exec(source)) !== null) {
const moduleName = match[1] || match[2];
if (moduleName) modules.push(moduleName);
}
return modules;
}
Copy link

Copilot AI Feb 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new regex fallback functionality lacks test coverage. While the original tests are commented out (line 7), the new fallback logic should be tested to ensure it correctly handles:

  1. Various import patterns (default, named, star, side-effect)
  2. Type imports
  3. Multiline imports
  4. Edge cases where MDX compilation fails
  5. Error handling and logging behavior

Consider adding tests for the detectImportsWithRegex function and the fallback path in the detect method.

Copilot uses AI. Check for mistakes.
Copilot AI and others added 2 commits February 2, 2026 14:54
…mports (#10175)

Addresses feedback requesting test coverage for the MDX detector's
regex-based fallback that handles legacy MDX syntax incompatible with
MDX v3.

## Changes

**Test Coverage (40+ test cases)**
- Valid MDX syntax: default, named, star, and side-effect imports
- Regex fallback triggers: HTML comments, unclosed tags, escaped
characters, bare variables
- Edge cases: type imports, mixed quotes, scoped packages, relative
paths, whitespace, code blocks

**Bug Fix: Side-Effect Imports**
The original regex didn't match `import "module"` statements. Updated
pattern now handles both:
```javascript
// Both patterns now work
import React from 'react';     // ✓ Pattern A: from imports
import './styles.css';          // ✓ Pattern B: side-effect imports (was broken)
```

**Code Quality**
- Extracted regex to `IMPORT_STATEMENT_REGEX` constant with detailed
documentation
- Documented pattern structure and capture groups with examples
- Noted acceptable limitations: matches imports in comments/code blocks
(trade-off for fallback simplicity)

## Implementation

The regex uses two alternation patterns after `import\s+`:
- **Pattern A** (group 1): `[type] <specifiers> from "module"` - handles
all standard import forms
- **Pattern B** (group 2): `"module"` - handles side-effect imports

The `import\s+` prefix outside the alternation ensures both patterns
require the import keyword, preventing false matches on arbitrary quoted
strings.

<!-- START COPILOT CODING AGENT TIPS -->
---

💬 We'd love your input! Share your thoughts on Copilot coding agent in
our [2 minute survey](https://gh.io/copilot-coding-agent-survey).

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: GiladShoham <3840769+GiladShoham@users.noreply.github.com>
Co-authored-by: Gilad Shoham <shoham.gilad@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 2, 2026 12:57
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment on lines 25 to +108
isSupported(context: FileContext): boolean {
// Capture filename for use in detect() warning messages.
// isSupported is always called immediately before detect() for the same file.
this.currentFilename = context.filename;
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states "isSupported is always called immediately before detect() for the same file," but this is only true by accident due to a cache key mismatch in the precinct code. The detector caches functions at typeToDetective[ext] (e.g., ".mdx"), but lookups use typeToDetective[type] where type="" for MDX files since they're not in the extToType map. This prevents caching from working, ensuring isSupported is called for every file.

While this happens to make currentFilename tracking work correctly, it's fragile and relies on unintended behavior. Consider either:

  1. Removing the currentFilename tracking and omitting filename from the warning
  2. Adding a comment explaining this quirk
  3. Passing filename directly to detect() if the DependencyDetector interface is updated in the future

Copilot uses AI. Check for mistakes.
Comment on lines +100 to +102
// Bind detect to preserve `this` context when passed as a detached function.
this.detect = this.detect.bind(this);
}
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method binding in the constructor is unnecessary for isSupported since it's always called through the detector object (not extracted), but it is necessary for detect since that method is extracted and cached. However, due to the caching mechanism, the binding alone doesn't solve the stale currentFilename issue identified in the previous comment. If the filename tracking is removed, the binding for detect should be kept for safety, but the binding for isSupported can be removed.

Copilot uses AI. Check for mistakes.
Copilot AI review requested due to automatic review settings February 2, 2026 13:32
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

*
* Limitations:
* - Will match imports in comments (e.g., // import "x")
* - Will match imports in code blocks/strings if they appear to be syntactically valid
Copy link

Copilot AI Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern has a potential issue with matching certain edge cases. The pattern \{[^}]*\} for named imports will fail on nested braces or complex destructuring patterns like import { a: { b } } from 'module'. While this is an uncommon pattern and the fallback is meant for error cases, consider documenting this limitation in the regex comment section alongside the existing limitations about code blocks and comments.

Suggested change
* - Will match imports in code blocks/strings if they appear to be syntactically valid
* - Will match imports in code blocks/strings if they appear to be syntactically valid
* - Uses a simple `{[^}]*}` pattern for named imports, so it does not support nested braces or
* complex destructuring patterns (e.g. `import { a: { b } } from "module"`); this is acceptable
* for this best-effort fallback and may cause such patterns to be skipped or partially matched.

Copilot uses AI. Check for mistakes.
@GiladShoham GiladShoham merged commit 2d92cae into master Feb 2, 2026
18 checks passed
@GiladShoham GiladShoham deleted the fix/mdx-detector-fallback branch February 2, 2026 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants