Ensure @source globs with symlinks are preserved #20203

Merged
RobinMalfait merged 12 commits into main from fix/issue-17985
Jun 7, 2026

Member

@RobinMalfait RobinMalfait commented Jun 7, 2026

This PR fixes an issue when working with @source and @source not that involves symlinks.

Internally, our sources are mapped to a source entry that has a base path and a pattern. You can think of this as inserting a .gitignore file in the base path for the given pattern.

However, we optimize these entries by moving as many "static" parts as possible into the base path. For example:

@source "./some/folder/here/*.html";

Is mapped to something like:

{ base: "/projects/my-project", pattern: "./some/folder/here/*.html" }

We then optimize it by turning it into:

{ base: "/projects/my-project/some/folder/here", pattern: "*.html" }
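To make the optimization concrete, here is a minimal Rust sketch of hoisting static segments from the pattern into the base. It is not the actual scanner code: `SourceEntry`, its field types, and `hoist_static_parts` are hypothetical names used for illustration, and the sketch only treats `*` as a dynamic segment.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical stand-in for the scanner's source entry.
#[derive(Debug, PartialEq)]
struct SourceEntry {
    base: PathBuf,
    pattern: String,
}

/// Move leading segments without `*` from the pattern into the base.
fn hoist_static_parts(entry: SourceEntry) -> SourceEntry {
    let mut base = entry.base;
    let mut pattern_parts: Vec<String> = Vec::new();
    let mut in_pattern = false;

    let components: Vec<Component> = Path::new(&entry.pattern).components().collect();
    let last = components.len().saturating_sub(1);

    for (index, component) in components.into_iter().enumerate() {
        match component {
            // A leading `./` carries no information, so skip it.
            Component::CurDir => {}
            Component::Normal(part) => {
                let text = part.to_string_lossy();
                // Once a dynamic segment is seen, everything after it stays in
                // the pattern. The final segment also stays, so the pattern is
                // never empty.
                if in_pattern || text.contains('*') || index == last {
                    in_pattern = true;
                    pattern_parts.push(text.into_owned());
                } else {
                    // Push the segment exactly as written (no canonicalization).
                    base.push(part);
                }
            }
            // `..`, root, and prefix components are out of scope for this
            // sketch; they are kept in the pattern untouched.
            other => {
                in_pattern = true;
                pattern_parts.push(other.as_os_str().to_string_lossy().into_owned());
            }
        }
    }

    SourceEntry { base, pattern: pattern_parts.join("/") }
}
```

Calling `hoist_static_parts` with base `/projects/my-project` and pattern `./some/folder/here/*.html` yields the optimized entry shown above.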

While doing this, we also use dunce::canonicalize to resolve the actual paths on disk. This means that symlinks are resolved to their real paths. This can cause issues because the "real" path is not what you wrote in the @source directives.

So before, it could be that you have this:

@source "./some/symlinked-folder/here/*.html";

Which was mapped to this on the Rust side:

{ base: "/projects/my-project", pattern: "./some/symlinked-folder/here/*.html" }

But was then optimized to:

{ base: "/projects/my-project/some/actual-folder/here", pattern: "*.html" }

...and we lost the symlinked-folder information. This causes issues as seen in #17985.

With this PR, we keep the symlinked information in those globs since that's what you wrote in those @source directives.
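The difference between the two approaches can be reproduced with a small Unix-only Rust sketch. It is an illustration under assumptions, not the scanner's code: `resolve_both_ways` is a hypothetical helper, the folder names are taken from the example above, and `std::fs::canonicalize` stands in for `dunce::canonicalize`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: creates `actual-folder` plus a symlink to it inside
/// `root`, then returns (canonicalized path, path joined as written).
#[cfg(unix)]
fn resolve_both_ways(root: &Path) -> io::Result<(PathBuf, PathBuf)> {
    let actual = root.join("actual-folder");
    let link = root.join("symlinked-folder");
    fs::create_dir_all(&actual)?;

    // Remove a leftover link from a previous run; ignore "not found" errors.
    let _ = fs::remove_file(&link);
    std::os::unix::fs::symlink(&actual, &link)?;

    // Canonicalizing follows the symlink and returns the real path on disk.
    let canonical = fs::canonicalize(&link)?;
    // Joining the name as written keeps the symlink name in the path.
    let as_written = root.join("symlinked-folder");

    Ok((canonical, as_written))
}
```

The canonicalized path ends in `actual-folder`, while the joined path still ends in `symlinked-folder`, which is the name an `@source not` rule written against the symlink would need to match.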

While setting up integration tests, I stumbled upon another issue: ignoring a symlinked folder but including a single file from that ignored folder resulted in that file being ignored as well. Let's look at an example:

@source     '../lib';
@source not '../lib/ignored';
@source     '../lib/ignored/except.html';

Earlier I mentioned that we create .gitignore files based on these @source directives. In this case, when we're dealing with a folder, we use **/* as the contents.

Looking at the example above, we should essentially have something like this:

# lib/.gitignore
# @source '../lib'
!**/*

# lib/ignored/.gitignore
# @source '../lib/ignored'
**/*

# @source '../lib/ignored/except.html'
!except.html

Since it's a .gitignore file, we have to invert the globs. However, the bug I noticed is that the resulting .gitignore files didn't look like the above. Instead, they looked like this:

# lib/.gitignore
# @source '../lib'
!**/*

# lib/ignored/.gitignore
# @source '../lib/ignored/except.html'
!except.html

# @source '../lib/ignored'
**/*

Notice how the !except.html and **/* are flipped. When dealing with .gitignore files, the order is important.
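Why the order matters can be shown with a simplified last-match-wins model in Rust. This is a sketch under assumptions, not full gitignore semantics: `Rule`, `matches`, and `is_ignored` are hypothetical, and the matcher only understands `**/*` and exact file names.

```rust
/// One line of a simplified ignore file. `negated` corresponds to a leading `!`.
struct Rule {
    negated: bool,
    pattern: &'static str,
}

/// Hypothetical matcher: `**/*` matches everything, any other pattern must
/// equal the file name exactly.
fn matches(pattern: &str, file: &str) -> bool {
    pattern == "**/*" || pattern == file
}

/// Last matching rule wins: a plain rule ignores the file, a negated rule
/// re-includes it.
fn is_ignored(rules: &[Rule], file: &str) -> bool {
    let mut ignored = false;
    for rule in rules {
        if matches(rule.pattern, file) {
            ignored = !rule.negated;
        }
    }
    ignored
}
```

With the rules in the intended order (`**/*` followed by `!except.html`), `except.html` is re-included. With the flipped order, `**/*` matches last and the file is ignored again.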

This was caused by how we stored the patterns internally: we kept a BTreeMap of BTreeSets, where each key was a base path and each value was a set of patterns. The patterns were sorted because of the BTreeSet... which is not what we want.
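The sorting effect is easy to reproduce in isolation. The following Rust sketch uses hypothetical helper names (`sorted_patterns`, `ordered_patterns`) and plain strings rather than the scanner's real types. It shows that a `BTreeSet` puts `!except.html` before `**/*` (because `!` sorts before `*`), while a `Vec` keeps declaration order.

```rust
use std::collections::BTreeSet;

/// Collect patterns through a `BTreeSet`: the result is sorted, not in
/// declaration order.
fn sorted_patterns(declared: &[&str]) -> Vec<String> {
    declared
        .iter()
        .map(|pattern| pattern.to_string())
        .collect::<BTreeSet<String>>()
        .into_iter()
        .collect()
}

/// Collect patterns into a `Vec`: declaration order is preserved.
fn ordered_patterns(declared: &[&str]) -> Vec<String> {
    declared.iter().map(|pattern| pattern.to_string()).collect()
}
```

Given the declared order `**/*`, `!except.html`, the set-based version returns the flipped order described above, and the vector-based version returns the patterns as declared.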

Fixes: #17985
Closes: #20091

Test plan

  1. Added new tests in the scanner tests (on the Rust side)
  2. Added integration tests with a symlink to another folder, outside of the current folder
  3. Added integration tests with a symlink to another folder, inside of the current folder
  4. Added integration tests to ensure that the order of @source directives with a folder + file is preserved.
  5. Since we're dealing with symlinks in these tests, let's test all OSes [ci-all]

When we are dealing with `@source` patterns, we want to optimize them by
moving as many static parts as possible from the pattern to the base path.

E.g.:
```
@source "./foo/bar/baz/*.html";
```

Will look like:
```
SourceEntry {
  base = "/projects/project-a",
  pattern = "/foo/bar/baz/*.html",
}
```

And becomes:
```
SourceEntry {
  base = "/projects/project-a/foo/bar/baz",
  pattern = "/*.html",
}
```

But with this change, we don't canonicalize them, meaning that if we
were referencing a symlink then we keep using the symlink in the
pattern. We won't use the resolved canonical path all of a sudden.
@RobinMalfait RobinMalfait requested a review from a team as a code owner June 7, 2026 17:06
Contributor

coderabbitai Bot commented Jun 7, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f60b1748-3591-498a-a8ae-fde64fdb6d82

📥 Commits

Reviewing files that changed from the base of the PR and between 76fdf97 and 8522472.

📒 Files selected for processing (1)
  • CHANGELOG.md

Walkthrough

This PR refactors the scanner's ignore-rule storage mechanism from BTreeMap/BTreeSet to a vector-based structure with an emit closure for grouping patterns, rewrites the source pattern optimization algorithm to use component hoisting for better handling of static and dynamic path segments, and adds comprehensive symlink support throughout the test infrastructure with corresponding scanner and CLI integration tests that validate negated @source rules exclude symlinked content correctly across sibling projects.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main fix: preserving symlink paths in @source globs during optimization.
Description check ✅ Passed The description thoroughly explains the symlink preservation issue, the BTreeSet ordering bug, and includes comprehensive test coverage details.
Linked Issues check ✅ Passed The PR fully addresses both issues: preserves symlink paths in optimized entries [#17985], maintains correct .gitignore rule ordering [#20091], and includes comprehensive tests across multiple test suites.
Out of Scope Changes check ✅ Passed All changes directly support the stated objectives: refactoring pattern aggregation to preserve order, component-based hoisting to preserve symlinks, test additions validating symlink behavior, and integration test support for symlinks.


@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (3)
integrations/utils.ts (3)

52-52: ⚡ Quick win

Parameter naming is semantically backwards.

The first parameter is what the symlink points to (typically called target or source), and the second is where the symlink is created (typically called link or destination). The current naming (dst, src) reverses this convention, which can confuse callers. Additionally, the implementation (line 319) names the first parameter target, creating an inconsistency with the signature.

♻️ Rename parameters for clarity
-    symlink(dst: string, src: string): Promise<void>
+    symlink(target: string, link: string): Promise<void>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integrations/utils.ts` at line 52, The symlink signature currently uses
semantically reversed names (symlink(dst: string, src: string)): rename
parameters to match implementation and common conventions (e.g., symlink(target:
string, link: string): Promise<void>) and update the implementation (the
function that currently references `target` at line ~319) to use the `link` name
for the path where the symlink is created; propagate the new parameter names to
any internal callers, JSDoc/comments, and exports so the API is consistent and
avoid breaking external callers by updating call sites accordingly.

319-329: 💤 Low value

Remove unnecessary target parent directory creation.

Creating the parent directory of the symlink target (lines 320-322) is unnecessary. Symlinks can point to non-existent targets, and when targets do exist, their parent directories are already created by earlier fs.write() calls in the test setup loop (lines 427-439). Only the parent of the symlink itself (srcParent) needs to exist.

♻️ Remove redundant mkdir
       async symlink(target, src) {
         let targetAbsolute = path.join(root, target)
-        let targetParent = path.dirname(targetAbsolute)
-        await fs.mkdir(targetParent, { recursive: true })
-
         let srcAbsolute = path.join(root, src)
         let srcParent = path.dirname(srcAbsolute)
         await fs.mkdir(srcParent, { recursive: true })

         await fs.symlink(targetAbsolute, srcAbsolute)
       },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integrations/utils.ts` around lines 319 - 329, The symlink helper
unnecessarily creates the target's parent directory; update the async symlink
function to stop creating targetParent and only ensure the symlink's parent
exists: remove or omit the fs.mkdir call that uses targetParent/targetAbsolute,
keep computing targetAbsolute for fs.symlink and preserve creation of srcParent
(srcAbsolute) with fs.mkdir({ recursive: true}) before calling
fs.symlink(targetAbsolute, srcAbsolute). Ensure references to targetAbsolute,
srcParent/srcAbsolute, and fs.symlink remain intact.

428-438: 💤 Low value

Clarify comment wording.

The comment "The symlink path is relative to the target destination's path" is ambiguous. It's unclear whether "target destination" refers to the symlink location or what the symlink points to.

✏️ Suggested rewording
         if (content.toString().startsWith('symlink:')) {
-          // The symlink path is relative to the target destination's path
+          // The path after 'symlink:' is relative to the symlink's location
           let target = path.join(
             filename,
             content.toString().slice('symlink:'.length),
           )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integrations/utils.ts` around lines 428 - 438, The comment is ambiguous about
what "target destination" refers to; update it to explicitly say that the string
after 'symlink:' is a path for the symlink target resolved relative to the
symlink's directory (i.e., the directory containing filename), and, if you
intended resolution against that directory, consider using
path.dirname(filename) when building target; reference the code paths using
content.toString().startsWith('symlink:'), the target variable, path.join(...),
filename, and context.fs.symlink.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 1dad4c43-5368-40b3-a8c9-f7d4db4b214c

📥 Commits

Reviewing files that changed from the base of the PR and between ad66939 and ab1cec0.

📒 Files selected for processing (12)
  • crates/oxide/src/extractor/candidate_machine.rs
  • crates/oxide/src/extractor/mod.rs
  • crates/oxide/src/extractor/pre_processors/haml.rs
  • crates/oxide/src/extractor/pre_processors/pug.rs
  • crates/oxide/src/extractor/pre_processors/ruby.rs
  • crates/oxide/src/extractor/pre_processors/rust.rs
  • crates/oxide/src/extractor/pre_processors/slim.rs
  • crates/oxide/src/scanner/mod.rs
  • crates/oxide/src/scanner/sources.rs
  • crates/oxide/tests/scanner.rs
  • integrations/cli/index.test.ts
  • integrations/utils.ts

Contributor

greptile-apps Bot commented Jun 7, 2026

Confidence Score: 5/5

The changes are safe to merge; the two bugs being fixed are well-scoped and backed by thorough tests across all OSes.

The optimize() rewrite is correct: only the initial base is canonicalized, path components are pushed by their original symlinked names, and path_to_posix_string handles Windows separator normalization. The Vec switch in create_walker correctly preserves @source declaration order.

No files require special attention.

Reviews (4): Last reviewed commit: "udpate changelog"

Comment thread crates/oxide/src/scanner/sources.rs
Comment thread integrations/utils.ts
Comment on lines +319 to +329
async symlink(target, src) {
  let targetAbsolute = path.join(root, target)
  let targetParent = path.dirname(targetAbsolute)
  await fs.mkdir(targetParent, { recursive: true })

  let srcAbsolute = path.join(root, src)
  let srcParent = path.dirname(srcAbsolute)
  await fs.mkdir(srcParent, { recursive: true })

  await fs.symlink(targetAbsolute, srcAbsolute)
},

P2 Windows symlink type depends on key ordering in the config object

fs.symlink(targetAbsolute, srcAbsolute) is called without an explicit type argument. On Windows, Node.js auto-detects 'file' vs 'dir' based on whether the target exists at call time — if it doesn't exist yet, it defaults to 'file', which silently creates an invalid symlink for directory targets. The current integration tests happen to define target directory files before the symlink: entry (so auto-detection succeeds), but if a future test lists the symlink key first in the config.fs object, directory symlinks will silently be created as file symlinks on Windows and the walker will fail to traverse them. Passing 'junction' (or 'dir') as the third argument, or processing all regular-file writes before symlinks, would make this robust regardless of key order.

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/oxide/src/scanner/sources.rs (1)

129-149: ⚠️ Potential issue | 🟠 Major

Handle ? and character classes ([...]) when switching to pattern mode in PublicSourceEntry::optimize()

optimize() only switches to ComponentStage::Pattern when part.contains('*'). The optimized (base, pattern) is later matched with fast_glob::glob_match, which supports ? and character classes ([ab], [a-z], etc.), so segments containing those metacharacters can be hoisted into base and treated as literal paths, changing glob semantics.

Suggested fix
-                        Component::Normal(part) if part.to_string_lossy().contains("*") => {
+                        Component::Normal(part) if has_glob_meta(part) => {
                             new_pattern.push(component);
                             stage = ComponentStage::Pattern;
                         }
fn has_glob_meta(part: &std::ffi::OsStr) -> bool {
    let part = part.to_string_lossy();
    part.contains('*') || part.contains('?') || part.contains('[')
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/oxide/src/scanner/sources.rs` around lines 129 - 149,
PublicSourceEntry::optimize() currently only switches to ComponentStage::Pattern
when a component contains '*', which misses other glob metacharacters supported
by fast_glob::glob_match (like '?' and character classes '['). Add a helper
(e.g. has_glob_meta(part: &OsStr) -> bool) that checks for '*' or '?' or '[' on
the component string, then replace occurrences of
part.to_string_lossy().contains("*") with has_glob_meta(&part) in the
Component::Normal match arms (including the last-component branch that decides
between pushing to base or new_pattern) so any segment with '?', '[' or '*' is
treated as pattern and moves to ComponentStage::Pattern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5bdd0d19-884a-4975-93af-b8ef8bfb53dc

📥 Commits

Reviewing files that changed from the base of the PR and between ab1cec0 and 506e6a9.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • crates/oxide/src/scanner/mod.rs
  • crates/oxide/src/scanner/sources.rs
  • crates/oxide/tests/scanner.rs
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/oxide/tests/scanner.rs

Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
crates/oxide/src/scanner/sources.rs (1)

249-313: ⚡ Quick win

Consider adding unit tests for absolute path handling.

The new Component::Prefix and Component::RootDir branches (lines 152-162) aren't exercised by the existing unit tests. Adding tests for patterns like /absolute/path/**/*.html (Unix) and C:\absolute\path\**\*.html (Windows, if CI supports it) would help ensure the platform-specific logic remains correct.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/oxide/src/scanner/sources.rs` around lines 249 - 313, Add unit tests
that exercise absolute-path branches by constructing PublicSourceEntry instances
whose pattern starts with an absolute path so optimize() will hit
Component::RootDir and Component::Prefix handling (e.g. patterns like
"/absolute/path/**/*.html" on Unix and "C:\\absolute\\path\\**\\*.html" on
Windows where CI supports it); ensure each test creates the on-disk directory
(fs::create_dir_all), calls source.optimize(), and asserts that source.base is
the dunce::canonicalize(...) of the hoisted directory and that source.pattern
preserves the leading "/**/*.html" (or equivalent) per the existing assertions
to validate the new branches in optimize().

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fbd00595-086e-492e-91b3-30eb3120843b

📥 Commits

Reviewing files that changed from the base of the PR and between 506e6a9 and 76fdf97.

📒 Files selected for processing (1)
  • crates/oxide/src/scanner/sources.rs

@RobinMalfait RobinMalfait merged commit 3f58e52 into main Jun 7, 2026
21 checks passed
@RobinMalfait RobinMalfait deleted the fix/issue-17985 branch June 7, 2026 18:44
Development

Successfully merging this pull request may close these issues.

@source not: Cannot ignore symlinked directory because symlink is optimized away

1 participant