Ensure @source globs with symlinks are preserved#20203
Conversation
When we are dealing with `@source` patterns, we want to optimize them by moving as many static parts from the pattern to the base path. E.g.: ``` @source "./foo/bar/baz/*.html"; ``` Will look like: ``` SourceEntry { base = "/projects/project-a", pattern = "/foo/bar/baz/*.html", } ``` And becomes: ``` SourceEntry { base = "/projects/project-a/foo/bar/baz", pattern = "/*.html", } ``` But with this change, we don't canonicalize them, meaning that if we were referencing a symlink then we keep using the symlink in the pattern. We won't use the resolved canonical path all of a sudden.
ab1cec0 to
2154245
Compare
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThis PR refactors the scanner's ignore-rule storage mechanism from 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
integrations/utils.ts (3)
52-52: ⚡ Quick winParameter naming is semantically backwards.
The first parameter is what the symlink points to (typically called
targetorsource), and the second is where the symlink is created (typically calledlinkordestination). The current naming(dst, src)reverses this convention, which can confuse callers. Additionally, the implementation (line 319) names the first parametertarget, creating an inconsistency with the signature.♻️ Rename parameters for clarity
- symlink(dst: string, src: string): Promise<void> + symlink(target: string, link: string): Promise<void>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integrations/utils.ts` at line 52, The symlink signature currently uses semantically reversed names (symlink(dst: string, src: string)): rename parameters to match implementation and common conventions (e.g., symlink(target: string, link: string): Promise<void>) and update the implementation (the function that currently references `target` at line ~319) to use the `link` name for the path where the symlink is created; propagate the new parameter names to any internal callers, JSDoc/comments, and exports so the API is consistent and avoid breaking external callers by updating call sites accordingly.
319-329: 💤 Low valueRemove unnecessary target parent directory creation.
Creating the parent directory of the symlink target (lines 320-322) is unnecessary. Symlinks can point to non-existent targets, and when targets do exist, their parent directories are already created by earlier
fs.write()calls in the test setup loop (lines 427-439). Only the parent of the symlink itself (srcParent) needs to exist.♻️ Remove redundant mkdir
async symlink(target, src) { let targetAbsolute = path.join(root, target) - let targetParent = path.dirname(targetAbsolute) - await fs.mkdir(targetParent, { recursive: true }) - let srcAbsolute = path.join(root, src) let srcParent = path.dirname(srcAbsolute) await fs.mkdir(srcParent, { recursive: true }) await fs.symlink(targetAbsolute, srcAbsolute) },🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integrations/utils.ts` around lines 319 - 329, The symlink helper unnecessarily creates the target's parent directory; update the async symlink function to stop creating targetParent and only ensure the symlink's parent exists: remove or omit the fs.mkdir call that uses targetParent/targetAbsolute, keep computing targetAbsolute for fs.symlink and preserve creation of srcParent (srcAbsolute) with fs.mkdir({ recursive: true}) before calling fs.symlink(targetAbsolute, srcAbsolute). Ensure references to targetAbsolute, srcParent/srcAbsolute, and fs.symlink remain intact.
428-438: 💤 Low valueClarify comment wording.
The comment "The symlink path is relative to the target destination's path" is ambiguous. It's unclear whether "target destination" refers to the symlink location or what the symlink points to.
✏️ Suggested rewording
if (content.toString().startsWith('symlink:')) { - // The symlink path is relative to the target destination's path + // The path after 'symlink:' is relative to the symlink's location let target = path.join( filename, content.toString().slice('symlink:'.length), )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@integrations/utils.ts` around lines 428 - 438, The comment is ambiguous about what "target destination" refers to; update it to explicitly say that the string after 'symlink:' is a path for the symlink target resolved relative to the symlink's directory (i.e., the directory containing filename), and, if you intended resolution against that directory, consider using path.dirname(filename) when building target; reference the code paths using content.toString().startsWith('symlink:'), the target variable, path.join(...), filename, and context.fs.symlink.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@integrations/utils.ts`:
- Line 52: The symlink signature currently uses semantically reversed names
(symlink(dst: string, src: string)): rename parameters to match implementation
and common conventions (e.g., symlink(target: string, link: string):
Promise<void>) and update the implementation (the function that currently
references `target` at line ~319) to use the `link` name for the path where the
symlink is created; propagate the new parameter names to any internal callers,
JSDoc/comments, and exports so the API is consistent and avoid breaking external
callers by updating call sites accordingly.
- Around line 319-329: The symlink helper unnecessarily creates the target's
parent directory; update the async symlink function to stop creating
targetParent and only ensure the symlink's parent exists: remove or omit the
fs.mkdir call that uses targetParent/targetAbsolute, keep computing
targetAbsolute for fs.symlink and preserve creation of srcParent (srcAbsolute)
with fs.mkdir({ recursive: true}) before calling fs.symlink(targetAbsolute,
srcAbsolute). Ensure references to targetAbsolute, srcParent/srcAbsolute, and
fs.symlink remain intact.
- Around line 428-438: The comment is ambiguous about what "target destination"
refers to; update it to explicitly say that the string after 'symlink:' is a
path for the symlink target resolved relative to the symlink's directory (i.e.,
the directory containing filename), and, if you intended resolution against that
directory, consider using path.dirname(filename) when building target; reference
the code paths using content.toString().startsWith('symlink:'), the target
variable, path.join(...), filename, and context.fs.symlink.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1dad4c43-5368-40b3-a8c9-f7d4db4b214c
📒 Files selected for processing (12)
crates/oxide/src/extractor/candidate_machine.rscrates/oxide/src/extractor/mod.rscrates/oxide/src/extractor/pre_processors/haml.rscrates/oxide/src/extractor/pre_processors/pug.rscrates/oxide/src/extractor/pre_processors/ruby.rscrates/oxide/src/extractor/pre_processors/rust.rscrates/oxide/src/extractor/pre_processors/slim.rscrates/oxide/src/scanner/mod.rscrates/oxide/src/scanner/sources.rscrates/oxide/tests/scanner.rsintegrations/cli/index.test.tsintegrations/utils.ts
Confidence Score: 5/5The changes are safe to merge; the two bugs being fixed are well-scoped and backed by thorough tests across all OSes. The optimize() rewrite is correct: only the initial base is canonicalized, path components are pushed by their original symlinked names, and path_to_posix_string handles Windows separator normalization. The Vec switch in create_walker correctly preserves @source declaration order. No files require special attention. Reviews (4): Last reviewed commit: "udpate changelog" | Re-trigger Greptile |
| async symlink(target, src) { | ||
| let targetAbsolute = path.join(root, target) | ||
| let targetParent = path.dirname(targetAbsolute) | ||
| await fs.mkdir(targetParent, { recursive: true }) | ||
|
|
||
| let srcAbsolute = path.join(root, src) | ||
| let srcParent = path.dirname(srcAbsolute) | ||
| await fs.mkdir(srcParent, { recursive: true }) | ||
|
|
||
| await fs.symlink(targetAbsolute, srcAbsolute) | ||
| }, |
There was a problem hiding this comment.
Windows symlink type depends on key ordering in the config object
fs.symlink(targetAbsolute, srcAbsolute) is called without an explicit type argument. On Windows, Node.js auto-detects 'file' vs 'dir' based on whether the target exists at call time — if it doesn't exist yet, it defaults to 'file', which silently creates an invalid symlink for directory targets. The current integration tests happen to define target directory files before the symlink: entry (so auto-detection succeeds), but if a future test lists the symlink key first in the config.fs object, directory symlinks will silently be created as file symlinks on Windows and the walker will fail to traverse them. Passing 'junction' (or 'dir') as the third argument, or processing all regular-file writes before symlinks, would make this robust regardless of key order.
+ move pinned pattern check (leading `/`) to the optimize step as well.
2154245 to
506e6a9
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/oxide/src/scanner/sources.rs (1)
129-149:⚠️ Potential issue | 🟠 MajorHandle
?and character classes ([...]) when switching to pattern mode inPublicSourceEntry::optimize()
optimize()only switches toComponentStage::Patternwhenpart.contains('*'). The optimized(base, pattern)is later matched withfast_glob::glob_match, which supports?and character classes ([ab],[a-z], etc.), so segments containing those metacharacters can be hoisted intobaseand treated as literal paths, changing glob semantics.Suggested fix
- Component::Normal(part) if part.to_string_lossy().contains("*") => { + Component::Normal(part) if has_glob_meta(part) => { new_pattern.push(component); stage = ComponentStage::Pattern; }fn has_glob_meta(part: &std::ffi::OsStr) -> bool { let part = part.to_string_lossy(); part.contains('*') || part.contains('?') || part.contains('[') }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/oxide/src/scanner/sources.rs` around lines 129 - 149, PublicSourceEntry::optimize() currently only switches to ComponentStage::Pattern when a component contains '*', which misses other glob metacharacters supported by fast_glob::glob_match (like '?' and character classes '['). Add a helper (e.g. has_glob_meta(part: &OsStr) -> bool) that checks for '*' or '?' or '[' on the component string, then replace occurrences of part.to_string_lossy().contains("*") with has_glob_meta(&part) in the Component::Normal match arms (including the last-component branch that decides between pushing to base or new_pattern) so any segment with '?', '[' or '*' is treated as pattern and moves to ComponentStage::Pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@crates/oxide/src/scanner/sources.rs`:
- Around line 129-149: PublicSourceEntry::optimize() currently only switches to
ComponentStage::Pattern when a component contains '*', which misses other glob
metacharacters supported by fast_glob::glob_match (like '?' and character
classes '['). Add a helper (e.g. has_glob_meta(part: &OsStr) -> bool) that
checks for '*' or '?' or '[' on the component string, then replace occurrences
of part.to_string_lossy().contains("*") with has_glob_meta(&part) in the
Component::Normal match arms (including the last-component branch that decides
between pushing to base or new_pattern) so any segment with '?', '[' or '*' is
treated as pattern and moves to ComponentStage::Pattern.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5bdd0d19-884a-4975-93af-b8ef8bfb53dc
📒 Files selected for processing (4)
CHANGELOG.mdcrates/oxide/src/scanner/mod.rscrates/oxide/src/scanner/sources.rscrates/oxide/tests/scanner.rs
✅ Files skipped from review due to trivial changes (1)
- CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/oxide/tests/scanner.rs
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/oxide/src/scanner/sources.rs (1)
249-313: ⚡ Quick winConsider adding unit tests for absolute path handling.
The new
Component::PrefixandComponent::RootDirbranches (lines 152-162) aren't exercised by the existing unit tests. Adding tests for patterns like/absolute/path/**/*.html(Unix) andC:\absolute\path\**\*.html(Windows, if CI supports it) would help ensure the platform-specific logic remains correct.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/oxide/src/scanner/sources.rs` around lines 249 - 313, Add unit tests that exercise absolute-path branches by constructing PublicSourceEntry instances whose pattern starts with an absolute path so optimize() will hit Component::RootDir and Component::Prefix handling (e.g. patterns like "/absolute/path/**/*.html" on Unix and "C:\\absolute\\path\\**\\*.html" on Windows where CI supports it); ensure each test creates the on-disk directory (fs::create_dir_all), calls source.optimize(), and asserts that source.base is the dunce::canonicalize(...) of the hoisted directory and that source.pattern preserves the leading "/**/*.html" (or equivalent) per the existing assertions to validate the new branches in optimize().
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@crates/oxide/src/scanner/sources.rs`:
- Around line 249-313: Add unit tests that exercise absolute-path branches by
constructing PublicSourceEntry instances whose pattern starts with an absolute
path so optimize() will hit Component::RootDir and Component::Prefix handling
(e.g. patterns like "/absolute/path/**/*.html" on Unix and
"C:\\absolute\\path\\**\\*.html" on Windows where CI supports it); ensure each
test creates the on-disk directory (fs::create_dir_all), calls
source.optimize(), and asserts that source.base is the dunce::canonicalize(...)
of the hoisted directory and that source.pattern preserves the leading
"/**/*.html" (or equivalent) per the existing assertions to validate the new
branches in optimize().
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fbd00595-086e-492e-91b3-30eb3120843b
📒 Files selected for processing (1)
crates/oxide/src/scanner/sources.rs
This PR fixes an issue when working with
@sourceand@source notthat involves symlinks.Internally our sources are mapped to a source entry where we have a
basepath and apattern. You can think about this where we insert a.gitignorefile in thebasepath for the given pattern.However, we optimize these entries to move as many "static" parts into the base path. For example:
Is mapped to something like:
We then optimize it by turning it into:
While doing this, we also use
dunce::canonicalizeto resolve the actual paths on disk. This means that a symlink is resolved to their real paths. This can cause issues because the "real" path is not what you wrote in the@sourcedirectives.So before, it could be that you have this:
Which was mapped to this on the Rust side:
But was then optimized to:
...and we lost the
symlinked-folderinformation. This causes issues as seen in #17985.With this PR, we keep the symlinked information in those globs since that's what you wrote in those
@sourcedirectives.While setting up integration tests, I stumbled upon an issue because I wanted to test that ignoring a symlinked folder, but including a single particular file of that ignored folder resulted in that file being ignored as well. Let's look at an example:
Earlier I mentioned that we create
.gitignorefiles based on these@sourcedirectives. In this case, when we're dealing with a folder, we use**/*as the contents.Looking at the example above, we should essentially have something like this:
Since it's a
.gitignorefile, we have to invert the globs. But the bug I noticed is that in reality the result of those gitignores didn't look like the above, it looked like:Notice how the
!except.htmland**/*are flipped. When dealing with.gitignorefiles, the order is important.This was caused because internally we kept a
BTreeMapofBTreeSets where the map was the base path and a set of patterns. The patterns were sorted because of theBTreeSet... which is not what we want.Fixes: #17985
Closes: #20091
Test plan
@sourcefiles with a folder + file is sorted correctly.