Allow partial build of repo metadata for repos exceeding max limits #12166

Merged
kevinyang372 merged 1 commit into master from kevin/allow-partial-build-of-repo-metadata on Jun 3, 2026

Conversation

Member

@kevinyang372 kevinyang372 commented Jun 3, 2026

Description

This makes repo_metadata build a partial file tree for git repos that exceed the indexing file limit, instead of falling back to a shallow depth=1 tree.

Motivation

  • The depth=1 fallback is awkward to build on. Today, when a repo exceeds MAX_FILES_PER_REPO, we rebuild the entire repo as a shallow depth=1 tree with lazily-loaded subdirectories. That shape is hard for downstream consumers to use: migrating skills discovery and project-context/rules onto repo metadata would each need to implement custom, on-demand expansion of a lazily-loaded tree just to find files below the first level. A partial-but-real tree removes that burden.
  • It unblocks non-recursive watcher registration for lazily-loaded repos. As we move toward registering file watchers non-recursively for lazily-loaded directories, a budgeted/partial tree gives us much better watcher behavior on large repos (we watch what we actually loaded rather than the whole tree).

Approach

  • Conservative, bounded "unbounded" indexing. Other popular editors (VSCode, Zed) effectively index the whole non-ignored tree with no hard cap. We move in that direction but stay conservative for now with an upper bound (MAX_FILES_PER_REPO), since gitignored directories are already lazy and don't count toward it.
  • Load everything until the budget, then mark the rest lazy. When the file budget is exhausted, the builder stops descending and leaves the remaining directories as unloaded placeholders (lazy-loaded on demand) instead of erroring or collapsing the whole repo to a single level. Every node scanned before the budget is kept.
  • BFS instead of DFS. The builder now does a queue-driven, level-order breadth-first traversal rather than recursive depth-first. This spends the budget evenly and shallow-first (all of level 1, then level 2, …), which is far more useful for the file tree / @-context / file search than DFS's "first subtree fully, rest empty," and matches Zed's queue-based scan order.
  • Paths of interest are always included. Registered ignored_path_interests (e.g. skill provider directories like .agents/skills) are always expanded, even past the budget, so discovery-critical files stay reachable regardless of repo size.
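
The budgeted breadth-first scan described above can be sketched roughly as follows. This is a minimal illustration over an in-memory tree, not the code in this PR: `Dir`, `Scan`, `scan_breadth_first`, and `sample_repo` are hypothetical names, gitignore handling and interest paths are left out, and the budget is only checked when a directory is opened.

```rust
use std::collections::VecDeque;

/// Hypothetical in-memory stand-in for a directory on disk.
pub struct Dir {
    pub name: &'static str,
    pub files: Vec<&'static str>,
    pub dirs: Vec<Dir>,
}

/// Result of a scan: recorded files (in visit order) and directories
/// left as unloaded placeholders once the budget ran out.
#[derive(Debug, Default, PartialEq)]
pub struct Scan {
    pub files: Vec<String>,
    pub lazy_dirs: Vec<String>,
}

/// Level-order traversal: the FIFO queue guarantees that every directory
/// at depth N is visited before any directory at depth N + 1.
pub fn scan_breadth_first(root: &Dir, max_files: usize) -> Scan {
    let mut scan = Scan::default();
    let mut remaining = max_files;
    let mut queue: VecDeque<(String, &Dir)> = VecDeque::new();
    queue.push_back((root.name.to_string(), root));

    while let Some((path, dir)) = queue.pop_front() {
        if remaining == 0 {
            // Budget exhausted: keep a placeholder and do not descend.
            // Everything recorded earlier stays in the result.
            scan.lazy_dirs.push(path);
            continue;
        }
        // Only files consume the budget; directories are free.
        // Simplification: a directory opened with budget left records all
        // of its files, so a wide directory can overshoot `max_files`.
        for file in &dir.files {
            scan.files.push(format!("{}/{}", path, file));
            remaining = remaining.saturating_sub(1);
        }
        for child in &dir.dirs {
            queue.push_back((format!("{}/{}", path, child.name), child));
        }
    }
    scan
}

/// Small sample tree used to exercise the scan.
pub fn sample_repo() -> Dir {
    Dir {
        name: "repo",
        files: vec!["a.txt"],
        dirs: vec![
            Dir {
                name: "src",
                files: vec!["main.rs", "lib.rs"],
                dirs: vec![Dir { name: "deep", files: vec!["x.rs"], dirs: vec![] }],
            },
            Dir { name: "docs", files: vec!["readme.md"], dirs: vec![] },
        ],
    }
}
```

With a budget of 4 on the sample tree, the scan records the root file and both first-level directories before running out, and leaves only `repo/src/deep` as a placeholder. A depth-first walk with the same budget would have spent its last slot inside `src/deep` and never reached `docs`.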

Consumers that must not operate on a partial tree can opt into the previous behavior: Entry::build_tree now takes a BudgetExceededBehavior parameter, and codebase embedding passes FailFast (the file limit there is an intentional cost cap).
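
To make the two modes concrete, here is a hedged sketch of how a caller-selected behavior might change the outcome when a repo exceeds the limit. Only the `BudgetExceededBehavior` variant names come from this PR; `BuildOutcome`, `BudgetExceeded`, and `build_with_budget` are hypothetical and stand in for the real `Entry::build_tree`, whose signature is not shown here.

```rust
/// Variant names as used in this PR; the derive list is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BudgetExceededBehavior {
    FailFast,
    StopAndLazyLoad,
}

/// Hypothetical result type for the sketch.
#[derive(Debug, PartialEq)]
pub enum BuildOutcome {
    /// Every file fit within the limit.
    Complete { files: usize },
    /// The limit was hit; the remainder is left lazy.
    Partial { files: usize },
}

/// Hypothetical error returned in fail-fast mode.
#[derive(Debug, PartialEq)]
pub struct BudgetExceeded {
    pub limit: usize,
}

/// Models only the decision at the limit, not the traversal itself.
pub fn build_with_budget(
    total_files: usize,
    limit: usize,
    behavior: BudgetExceededBehavior,
) -> Result<BuildOutcome, BudgetExceeded> {
    if total_files <= limit {
        return Ok(BuildOutcome::Complete { files: total_files });
    }
    match behavior {
        // Cost-capped consumers (e.g. embedding) get an error, not a partial tree.
        BudgetExceededBehavior::FailFast => Err(BudgetExceeded { limit }),
        // Everyone else keeps what was loaded and lazy-loads the rest.
        BudgetExceededBehavior::StopAndLazyLoad => Ok(BuildOutcome::Partial { files: limit }),
    }
}
```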

Linked Issue

  • The linked issue is labeled ready-to-spec or ready-to-implement.
  • Where appropriate, screenshots or a short video of the implementation are included below (especially for user-visible or UI changes).

Testing

  • Added repo_metadata unit tests covering: breadth-first coverage with the remainder left unloaded on budget exhaustion, interest paths loading past the budget, directories/gitignored files not consuming the budget, full coverage within budget, and FailFast erroring vs. succeeding.

  • cargo nextest run -p repo_metadata --features local_fs passes; cargo clippy clean on repo_metadata + ai; cargo check -p warp builds.

  • I have manually tested my changes locally with ./script/run

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

CHANGELOG-NONE

@cla-bot cla-bot Bot added the cla-signed label Jun 3, 2026
Contributor

oz-for-oss Bot commented Jun 3, 2026

@kevinyang372

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

Contributor

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR changes repo metadata tree construction so large repositories can be partially built breadth-first with lazy-loaded remainders, while keeping fail-fast behavior for codebase embedding.

Concerns

  • The new StopAndLazyLoad budget path still records files after the quota reaches zero within an already-open directory, so flat or wide directories can exceed MAX_FILES_PER_REPO instead of stopping at the cap.
  • This changes user-facing file explorer behavior for oversized repositories, but the PR description does not include screenshots or a screen recording demonstrating the degraded/partial tree behavior end to end. For this user-facing change, please include visual evidence.
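
One way to address the first concern is to charge the budget per file rather than per directory. The helper below is a hypothetical sketch (`take_within_budget` is not a function from this PR, and the merged change may handle this differently): it keeps at most `remaining` files from a single directory listing and reports whether the listing was cut short, so the caller could mark that directory as only partially loaded.

```rust
/// Keep at most `*remaining` files from one directory listing.
/// Returns the kept files and whether any were dropped.
pub fn take_within_budget<'a>(
    files: &[&'a str],
    remaining: &mut usize,
) -> (Vec<&'a str>, bool) {
    // Never take more than the budget allows, even mid-directory.
    let take = files.len().min(*remaining);
    *remaining -= take;
    (files[..take].to_vec(), take < files.len())
}
```

With this shape the total number of recorded files can never exceed the initial budget, regardless of how flat or wide a single directory is.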

Verdict

Found: 0 critical, 2 important, 0 suggestions

Request changes

Comment thread crates/repo_metadata/src/entry.rs
@kevinyang372 kevinyang372 requested a review from moirahuang June 3, 2026 20:42
Member Author

cc @alokedesai

Contributor

@moirahuang moirahuang left a comment

approving to unblock

    BudgetExceededBehavior::FailFast => true,
    BudgetExceededBehavior::StopAndLazyLoad => {
        quota.is_none_or(|remaining| remaining > 0)
            || matches_ignored_path_interest(
Contributor

curious why we let ignored path interests go through even if the quota is 0?

Member Author

These are the "special case" paths, like skill directories, which have to stay reachable even when the quota is 0.
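
For readers following this thread, a self-contained sketch of the descend decision might look like the following. It is modeled on the snippet under review but is not the PR's code: `should_descend` and `matches_interest` are hypothetical names, `quota: None` is assumed to mean "no limit", and the matching rule (the interest itself, anything under it, or any ancestor needed to reach it) is an assumption about what `matches_ignored_path_interest` does.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum BudgetExceededBehavior {
    FailFast,
    StopAndLazyLoad,
}

/// Hypothetical stand-in for `matches_ignored_path_interest`.
/// A path matches when it is an interest, lies under one, or is an
/// ancestor directory that must be opened to reach one.
fn matches_interest(path: &str, interests: &[&str]) -> bool {
    interests.iter().any(|&interest| {
        path == interest
            || path
                .strip_prefix(interest)
                .map_or(false, |rest| rest.starts_with('/'))
            || interest
                .strip_prefix(path)
                .map_or(false, |rest| rest.starts_with('/'))
    })
}

/// Decide whether the builder should open `path`.
pub fn should_descend(
    behavior: BudgetExceededBehavior,
    quota: Option<usize>,
    path: &str,
    interests: &[&str],
) -> bool {
    match behavior {
        // Fail-fast never skips a directory here; presumably the budget
        // error is raised where files are counted.
        BudgetExceededBehavior::FailFast => true,
        BudgetExceededBehavior::StopAndLazyLoad => {
            // `map_or(true, ..)` is equivalent to `is_none_or(..)`.
            quota.map_or(true, |remaining| remaining > 0)
                || matches_interest(path, interests)
        }
    }
}
```

With the quota at 0, `src` is skipped while `.agents`, `.agents/skills`, and anything below it are still opened, which is the special case discussed in this thread.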

    // Preserve existing behavior: failing to read the
    // root directory propagates, while an unreadable
    // nested directory is left as an unloaded placeholder.
    if job.is_root {
Contributor

is this still the existing behavior we want to preserve?

Member Author

Yeah I think we should preserve the existing behavior (at least for this PR)
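
The behavior being preserved can be sketched as below. This is an illustration of the rule stated in the code comment, not the PR's implementation: `DirState` and `resolve_listing` are hypothetical names, and the directory listing is passed in as an `io::Result` instead of being read from disk.

```rust
use std::io;

/// Hypothetical state of a directory node after an attempted read.
#[derive(Debug, PartialEq)]
pub enum DirState {
    Loaded(Vec<String>),
    /// Placeholder that can be lazy-loaded later.
    Unloaded,
}

/// Root read failures propagate to the caller; failures on nested
/// directories are swallowed and the directory is left unloaded.
pub fn resolve_listing(
    is_root: bool,
    listing: io::Result<Vec<String>>,
) -> io::Result<DirState> {
    match listing {
        Ok(entries) => Ok(DirState::Loaded(entries)),
        Err(err) if is_root => Err(err),
        Err(_) => Ok(DirState::Unloaded),
    }
}
```

The asymmetry means a single unreadable subdirectory (for example, one with restrictive permissions) does not fail the whole build, while an unreadable repo root still surfaces as an error.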

/// exhausted the builder stops descending breadth-first and leaves the
/// remaining directories as unloaded placeholders (lazy-loaded on demand)
/// rather than failing or collapsing the tree to a single level.
const MAX_FILES_PER_REPO: usize = 200_000;
Contributor

sorry i'm a bit confused by this, i thought we were going to just index the whole thing? or maybe set a memory limit, not a hardcoded max number of files?

Member Author

discussed offline

@kevinyang372 kevinyang372 requested a review from alokedesai June 3, 2026 21:49
@kevinyang372 kevinyang372 merged commit 0f97ef1 into master Jun 3, 2026
42 checks passed
@kevinyang372 kevinyang372 deleted the kevin/allow-partial-build-of-repo-metadata branch June 3, 2026 21:51