Allow partial build of repo metadata for repos exceeding max limits #12166
Overview
This PR changes repo metadata tree construction so large repositories can be partially built breadth-first with lazy-loaded remainders, while keeping fail-fast behavior for codebase embedding.
Concerns
- The new `StopAndLazyLoad` budget path still records files after the quota reaches zero within an already-open directory, so flat or wide directories can exceed `MAX_FILES_PER_REPO` instead of stopping at the cap.
- This changes user-facing file explorer behavior for oversized repositories, but the PR description does not include screenshots or a screen recording demonstrating the degraded/partial tree behavior end to end. For this user-facing change, please include visual evidence.
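To illustrate the first concern, here is a minimal sketch of checking the remaining budget before every file rather than once per directory. The helper name `record_files_with_budget` and its signature are hypothetical and are not taken from the PR; the sketch only shows the shape of a per-file check that would keep a wide directory from overshooting the cap.

```rust
/// Hypothetical helper (not from the PR): records file names from a single
/// directory listing, checking the remaining budget before each file.
/// Returns the recorded files and whether the listing was cut short.
fn record_files_with_budget<'a>(
    files: &[&'a str],
    remaining: &mut usize,
) -> (Vec<&'a str>, bool) {
    let mut recorded = Vec::new();
    for &file in files {
        if *remaining == 0 {
            // Budget exhausted mid-directory: stop here and report truncation
            // so the caller can treat the directory as only partially loaded.
            return (recorded, true);
        }
        recorded.push(file);
        *remaining -= 1;
    }
    (recorded, false)
}
```

With a budget of 2 and a directory of three files, this records the first two files and reports truncation instead of recording all three.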
Verdict
Found: 0 critical, 2 important, 0 suggestions
Request changes
cc @alokedesai
BudgetExceededBehavior::FailFast => true,
BudgetExceededBehavior::StopAndLazyLoad => {
    quota.is_none_or(|remaining| remaining > 0)
        || matches_ignored_path_interest(
curious why we let ignored path interests go through even if quota is 0?
This is the "special case" paths like skill directories
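As a rough illustration of that special case, the descend decision could look like the sketch below. Only the `BudgetExceededBehavior` variants and the overall boolean shape come from the diff above; `should_descend`, `matches_interest`, and the string-based path matching are assumptions made for the example, since the real signature of `matches_ignored_path_interest` is not shown in this thread.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BudgetExceededBehavior {
    FailFast,
    StopAndLazyLoad,
}

/// Hypothetical stand-in for `matches_ignored_path_interest`: true when
/// `path` is an interest path, lies inside one, or is an ancestor of one
/// (ancestors must be opened so the interest path stays reachable).
fn matches_interest(path: &str, interests: &[&str]) -> bool {
    interests.iter().any(|interest| {
        path == *interest
            || path.starts_with(&format!("{interest}/"))
            || interest.starts_with(&format!("{path}/"))
    })
}

/// Hypothetical helper: decides whether the builder opens a directory.
fn should_descend(
    behavior: BudgetExceededBehavior,
    quota: Option<usize>,
    path: &str,
    interests: &[&str],
) -> bool {
    match behavior {
        // FailFast always descends; exceeding the budget surfaces as an error elsewhere.
        BudgetExceededBehavior::FailFast => true,
        // StopAndLazyLoad descends while budget remains (or is unlimited),
        // and past the budget only for interest paths.
        BudgetExceededBehavior::StopAndLazyLoad => {
            quota.map_or(true, |remaining| remaining > 0)
                || matches_interest(path, interests)
        }
    }
}
```

Under these assumptions, `.agents/skills` is still opened with a quota of 0, while an ordinary directory such as `src` is left as an unloaded placeholder.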
// Preserve existing behavior: failing to read the
// root directory propagates, while an unreadable
// nested directory is left as an unloaded placeholder.
if job.is_root {
is this still the existing behavior we want to preserve?
Yeah I think we should preserve the existing behavior (at least for this PR)
/// exhausted the builder stops descending breadth-first and leaves the
/// remaining directories as unloaded placeholders (lazy-loaded on demand)
/// rather than failing or collapsing the tree to a single level.
const MAX_FILES_PER_REPO: usize = 200_000;
sorry i'm a bit confused by this, i thought we were going to just index the whole thing? or maybe set a memory limit, not a hardcoded max number of files?
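For context on what the file cap does in this PR, the behavior described in the doc comment can be sketched over an in-memory tree as below. This is a simplified illustration under stated assumptions, not the PR's implementation: `Node`, `Scan`, `BudgetExceeded`, `scan_breadth_first`, and `sample_tree` are hypothetical names, the real builder reads the filesystem, and gitignore handling and interest paths are left out.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BudgetExceededBehavior {
    FailFast,
    StopAndLazyLoad,
}

/// Hypothetical in-memory stand-in for a directory tree.
enum Node {
    File(&'static str),
    Dir(&'static str, Vec<Node>),
}

#[derive(Debug, PartialEq)]
struct Scan {
    files: Vec<String>,
    /// Directories left as placeholders to be lazy-loaded on demand.
    unloaded_dirs: Vec<String>,
}

#[derive(Debug, PartialEq)]
struct BudgetExceeded;

fn join(prefix: &str, name: &str) -> String {
    if prefix.is_empty() { name.to_string() } else { format!("{prefix}/{name}") }
}

/// Breadth-first scan that counts only files against `max_files`.
fn scan_breadth_first(
    root: &[Node],
    max_files: usize,
    behavior: BudgetExceededBehavior,
) -> Result<Scan, BudgetExceeded> {
    let mut remaining = max_files;
    let mut scan = Scan { files: Vec::new(), unloaded_dirs: Vec::new() };
    let mut queue: VecDeque<(String, &[Node])> = VecDeque::new();
    queue.push_back((String::new(), root));

    while let Some((prefix, children)) = queue.pop_front() {
        if remaining == 0 && behavior == BudgetExceededBehavior::StopAndLazyLoad {
            // Budget already spent: keep the directory as a placeholder.
            scan.unloaded_dirs.push(prefix);
            continue;
        }
        for child in children {
            match child {
                // Directories are queued, not counted, so siblings are
                // visited before any of them is explored in depth.
                Node::Dir(name, grandchildren) => {
                    queue.push_back((join(&prefix, name), grandchildren.as_slice()))
                }
                Node::File(name) => {
                    if remaining == 0 {
                        match behavior {
                            BudgetExceededBehavior::FailFast => return Err(BudgetExceeded),
                            BudgetExceededBehavior::StopAndLazyLoad => {
                                // Ran out mid-directory: mark it unloaded and
                                // stop processing its remaining children.
                                scan.unloaded_dirs.push(prefix.clone());
                                break;
                            }
                        };
                    }
                    scan.files.push(join(&prefix, name));
                    remaining -= 1;
                }
            }
        }
    }
    Ok(scan)
}

/// Small fixture: a/{deep/{d1.rs,d2.rs}, a1.rs}, b/{b1.rs}, top.rs
fn sample_tree() -> Vec<Node> {
    vec![
        Node::Dir("a", vec![
            Node::Dir("deep", vec![Node::File("d1.rs"), Node::File("d2.rs")]),
            Node::File("a1.rs"),
        ]),
        Node::Dir("b", vec![Node::File("b1.rs")]),
        Node::File("top.rs"),
    ]
}
```

On the fixture with a budget of 3, `StopAndLazyLoad` records `top.rs`, `a/a1.rs`, and `b/b1.rs` and leaves `a/deep` unloaded, so both top-level directories get coverage; `FailFast` with the same budget returns `Err(BudgetExceeded)`.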

Description
This makes `repo_metadata` build a partial file tree for git repos that exceed the indexing file limit, instead of falling back to a shallow `depth=1` tree.
Motivation
The `depth=1` fallback is awkward to build on. Today, when a repo exceeds `MAX_FILES_PER_REPO`, we rebuild the entire repo as a shallow `depth=1` tree with lazily-loaded subdirectories. That shape is hard for downstream consumers to use: migrating skills discovery and project-context/rules onto repo metadata would each need to implement custom, on-demand expansion of a lazily-loaded tree just to find files below the first level. A partial-but-real tree removes that burden.
Approach
- The budget applies to files (`MAX_FILES_PER_REPO`), since gitignored directories are already lazy and don't count toward it.
- Breadth-first traversal gives better coverage for @-context / file search than DFS's "first subtree fully, rest empty," and matches Zed's queue-based scan order.
- Paths matching `ignored_path_interests` (e.g. skill provider directories like `.agents/skills`) are always expanded, even past the budget, so discovery-critical files stay reachable regardless of repo size.
Consumers that must not operate on a partial tree can opt into the previous behavior: `Entry::build_tree` now takes a `BudgetExceededBehavior` parameter, and codebase embedding passes `FailFast` (the file limit there is an intentional cost cap).
Linked Issue
`ready-to-spec` or `ready-to-implement`.
Testing
Added `repo_metadata` unit tests covering: breadth-first coverage with the remainder left unloaded on budget exhaustion, interest paths loading past the budget, directories/gitignored files not consuming the budget, full coverage within budget, and `FailFast` erroring vs. succeeding.
`cargo nextest run -p repo_metadata --features local_fs` passes; `cargo clippy` clean on `repo_metadata` + `ai`; `cargo check -p warp` builds.
I have manually tested my changes locally with `./script/run`
Agent Mode
CHANGELOG-NONE