Status buffer performance for large repo is poor due to untracked file checks #2260

Closed
zivarah opened this issue Jan 26, 2024 · 8 comments

@zivarah

zivarah commented Jan 26, 2024

I'll start off by saying that I fully acknowledge that the problems I'm about to describe are due to the fact that I'm working with a repo that is far larger than it should be, and doing so on Windows to boot. That said, this is the hand I've been dealt and I'd sure like to be able to use fugitive while playing it.

My company has an extremely large repository. To mitigate this, we:

  • Limit our fetch refspecs to only relevant branches
  • Use sparse checkouts in our worktrees
  • Utilize fsmonitor
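
A minimal sketch of these three mitigations, run against a throwaway repository. The directory names (`src`, `docs`), the branch name `main`, and the remote name `origin` are placeholders chosen for illustration, not details of the repository described here. The built-in filesystem monitor is only available in recent Git versions on some platforms (Windows and macOS), so the last step only records the setting.

```shell
#!/bin/sh
set -eu

# Throwaway repository with two top-level directories (placeholder names).
repo=$(mktemp -d)
git init -q "$repo"
mkdir -p "$repo/src" "$repo/docs"
echo code > "$repo/src/a.txt"
echo text > "$repo/docs/b.txt"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m init

# 1. Fetch only one branch instead of every branch on the remote
#    (assumes a remote named "origin" and a branch named "main").
git -C "$repo" config remote.origin.fetch '+refs/heads/main:refs/remotes/origin/main'
fetch_refspec=$(git -C "$repo" config --get remote.origin.fetch)

# 2. Sparse checkout in cone mode: keep only src/ (plus top-level files).
git -C "$repo" sparse-checkout init --cone
git -C "$repo" sparse-checkout set src
sparse_dirs=$(git -C "$repo" sparse-checkout list)
if [ -e "$repo/docs/b.txt" ]; then docs_present=yes; else docs_present=no; fi

# 3. Ask Git to use its built-in filesystem monitor where supported.
git -C "$repo" config core.fsmonitor true
fsmonitor=$(git -C "$repo" config --get core.fsmonitor)

echo "fetch refspec:  $fetch_refspec"
echo "sparse dirs:    $sparse_dirs"
echo "docs/ on disk:  $docs_present"
echo "core.fsmonitor: $fsmonitor"

rm -rf "$repo"
```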

To demonstrate the scale, here is the output of git-sizer run from our most prevalent sparse-checkout setup:

$  git-sizer -v
Processing blobs: 10198549
Processing trees: 31778397
Processing commits: 3758160
Matching commits to trees: 3758160
Processing annotated tags: 0
Processing references: 5
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Overall repository size      |           |                                |
| * Commits                    |           |                                |
|   * Count                    |  3.76 M   | *******                        |
|   * Total size               |  1.25 GiB | *****                          |
| * Trees                      |           |                                |
|   * Count                    |  31.8 M   | *********************          |
|   * Total size               |  43.9 GiB | ***********************        |
|   * Total tree entries       |  1.12 G   | **********************         |
| * Blobs                      |           |                                |
|   * Count                    |  10.2 M   | ******                         |
|   * Total size               |   729 GiB | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Annotated tags             |           |                                |
|   * Count                    |     0     |                                |
| * References                 |           |                                |
|   * Count                    |     5     |                                |
|     * Branches               |     1     |                                |
|     * Remote-tracking refs   |     2     |                                |
|     * Other                  |     2     |                                |
|                              |           |                                |
| Biggest objects              |           |                                |
| * Commits                    |           |                                |
|   * Maximum size         [1] |   189 KiB | ***                            |
|   * Maximum parents      [2] |     5     |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [3] |  14.6 k   | **************                 |
| * Blobs                      |           |                                |
|   * Maximum size         [4] |  86.5 MiB | *********                      |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum history depth      |   868 k   | *                              |
| * Maximum tag depth          |     0     |                                |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Number of directories  [5] |  81.6 k   | !!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| * Maximum path depth     [5] |    16     | *                              |
| * Maximum path length    [6] |   205 B   | **                             |
| * Number of files        [5] |   422 k   | ********                       |
| * Total size of files    [7] |  4.38 GiB | ****                           |
| * Number of symlinks     [8] |     4     |                                |
| * Number of submodules       |     0     |                                |

...

Unfortunately, when working in one of these sparse checkouts, fugitive is essentially unusably slow.

Some quick and dirty measurements:

  • Status buffer reload via :G ("cached", meaning the second reload in a row): 8s
    • Same result (unsurprisingly) when focusing the status buffer after modifying a file
  • Staging a changed file: 8s
  • Unstaging a file: 12s
  • Discarding unstaged changes for a file: 8s

I suspect that the vast majority of this time is simply identifying untracked files:

$ time git status
On branch feature
Your branch is up to date with 'origin/feature'.

You are in a sparse checkout.


It took 6.44 seconds to enumerate untracked files,
but the results were cached, and subsequent runs may be faster.
See 'git help status' for information on how to improve this.

nothing to commit, working tree clean

real    0m7.237s
user    0m0.000s
sys     0m0.015s

Whereas if I omit those:

$ time git status -uno
On branch feature
Your branch is up to date with 'feature'.

You are in a sparse checkout.

nothing to commit (use -u to show untracked files)

real    0m0.152s
user    0m0.000s
sys     0m0.031s

I tried out the untracked cache, which does help significantly, but git status is still spending 3-4 seconds on the untracked files.

I understand that in general, we want to be pretty thoughtful about adding new configuration options. That said, I would be surprised if we can solve this for large repositories without disabling some behaviors, which we wouldn't want to force on the 99% of users that don't have such an insanely large repository.

So the low-hanging fruit seems to be:

  • Allow disabling of the automatic status buffer refresh entirely
  • Could we allow just the untracked file portion to be configurable?
    • I'm envisioning that the status buffer could be configured to use -uno when calling git status unless some special "full reload" command is given

I am of course open to other suggestions here -- I'm just spitballing based on what little data I have. I'm happy to gather more data if you can give me specific things you want measured and/or better ways of getting real data.

Thanks for all of your hard work on this project and other vim extensions -- you're a legend.

@zivarah
Author

zivarah commented Jan 26, 2024

As a super dirty proof of concept, if I update autoload/fugitive.vim to replace this:

let status_cmd = cmd + ['status', '-bz']

with this:

let status_cmd = cmd + ['status', '-bz', '-uno']

Then the status buffer is essentially instant again.

@tpope
Owner

tpope commented Jan 27, 2024

You can disable the enumeration of untracked files with a Git option:

git config status.showUntrackedFiles no

This will also disable untracked files in a plain git status invocation, but I would expect that would be a good thing.
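
A minimal sketch of this setting and of two ways to override it for a single command, run against a throwaway repository (the file name `untracked.txt` is a placeholder). Both `--untracked-files=normal` and `git -c status.showUntrackedFiles=normal` bring back the full listing for one invocation without touching the stored configuration.

```shell
#!/bin/sh
set -eu

# Throwaway repository containing a single untracked file.
repo=$(mktemp -d)
git init -q "$repo"
echo scratch > "$repo/untracked.txt"

# Hide untracked files by default in this repository.
git -C "$repo" config status.showUntrackedFiles no
quiet=$(git -C "$repo" status --porcelain)

# Override for one invocation with the command-line flag...
full_flag=$(git -C "$repo" status --porcelain --untracked-files=normal)

# ...or with a one-off configuration value.
full_cfg=$(git -C "$repo" -c status.showUntrackedFiles=normal status --porcelain)

echo "with config 'no':        [$quiet]"
echo "--untracked-files=normal: [$full_flag]"
echo "-c override:              [$full_cfg]"

rm -rf "$repo"
```

For occasional manual use, an alias such as `git config alias.fullstatus 'status --untracked-files=normal'` (the alias name is hypothetical) keeps the fast default while leaving a full status one command away.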

Regarding #2207, I need to post a followup there (subscribe to notifications to follow along), but the short answer is I think I'd rather channel that effort into making the git status call asynchronous.

tpope closed this as completed Jan 27, 2024

@zivarah
Author

zivarah commented Jan 27, 2024

Thanks for the reply.

> This will also disable untracked files in a plain git status invocation, but I would expect that would be a good thing.

I'm actually torn here. Generally, when I'm running git status manually, I think I'd want to see everything. That would be a much lower-volume and intentionally-initiated workflow compared to the fugitive status buffer reloading, which happens constantly.

So I don't love this as a workaround, but I think I can accept it as long as I have a way to do a "full status" reload of the fugitive status buffer. Is there a good way to set config or CLI options for a specific invocation of :G that I could make a keymapping for?

@tpope
Owner

tpope commented Jan 27, 2024

> Thanks for the reply.
>
> > This will also disable untracked files in a plain git status invocation, but I would expect that would be a good thing.
>
> I'm actually torn here. Generally, when I'm running git status manually, I think I'd want to see everything. That would be a much lower-volume and intentionally-initiated workflow compared to the fugitive status buffer reloading, which happens constantly.

I am not, in principle, opposed to making a Fugitive-specific option for this, but I've been hesitant to commit to a convention for repository-specific Fugitive settings.

> So I don't love this as a workaround, but I think I can accept it as long as I have a way to do a "full status" reload of the fugitive status buffer. Is there a good way to set config or CLI options for a specific invocation of :G that I could make a keymapping for?

There's no way to pass options to :G, and I don't think this is a road I want to go down. Right now :G does little more than open a specific file path. Giving it options would raise questions about what happens when the path is opened in multiple windows, or persisted in :mksession, etc.

You could build this out from scratch by invoking :call FugitiveExecute(['config', 'status.showUntrackedFiles', 'normal']), then :G, then :call FugitiveExecute(['config', 'status.showUntrackedFiles', 'no']). It's janky, but might be better than nothing.

@zivarah
Author

zivarah commented Jan 27, 2024

> I am not, in principle, opposed to making a Fugitive specific option for this, but I've been hesitant to commit to a convention for repository-specific Fugitive settings.

That's very understandable. I could see using custom git config values (git config --local fugitive.statusBuffer.showUntracked false) or something like that, but I understand the desire to not go down this road without a strong case.
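
To illustrate the idea only: Git stores and returns arbitrary `section.subsection.key` values, so a plugin could in principle read a key like the one above. The key `fugitive.statusBuffer.showUntracked` is the hypothetical name from this comment; Fugitive does not read it. A minimal sketch against a throwaway repository:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# Store a repository-local value under a custom (hypothetical) key.
git -C "$repo" config --local fugitive.statusBuffer.showUntracked false

# Read it back, letting Git normalize it to a boolean.
value=$(git -C "$repo" config --type=bool --get fugitive.statusBuffer.showUntracked)

# A key that was never set can fall back to a default supplied by the caller.
# "fugitive.statusBuffer.missing" is another made-up key, used only to show this.
fallback=$(git -C "$repo" config --type=bool --default true --get fugitive.statusBuffer.missing)

echo "showUntracked: $value"
echo "missing key:   $fallback"

rm -rf "$repo"
```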

> There's no way to pass options to :G, and I don't think this is a road I want to go down. Right now :G does little more than open a specific file path. Giving it options would raise question about what happens when the path is opened in multiple windows, or persisted in :mksession etc.

Totally fair.

> You could build this out from scratch by invoking :call FugitiveExecute(['config', 'status.showUntrackedFiles', 'normal']), then :G, then :call FugitiveExecute(['config', 'status.showUntrackedFiles', 'no']). It's janky, but might be better than nothing.

This sort of workaround is okay for me, I think. It looks like I should be able to use FugitiveConfigGet to also get the current value and restore that at the end, making this a relatively painless workaround with no significant side effects. I will play with this when I have some time and reply here with whatever I come up with in case it's helpful for anyone else.

@zivarah
Author

zivarah commented Jan 28, 2024

I ended up with the following, which seems to work well:

	function! FugitiveStatusFullRefresh() abort
		" Force a full refresh of the status buffer, including untracked files.
		" This is useful for large repositories where `status.showUntrackedFiles`
		" has been set to `no` to avoid a long delay when reloading the status
		" buffer.

		" Save the current value of `status.showUntrackedFiles` so that we can
		" restore it after the refresh.
		let show_untracked_files = FugitiveConfigGet('status.showUntrackedFiles')

		" If `status.showUntrackedFiles` is not set, just do a normal `:G`.
		if ! len(show_untracked_files)
			G
			return
		endif

		" Set `status.showUntrackedFiles` to `normal` so that untracked files
		" are included in the status buffer.
		call FugitiveExecute(['config', 'status.showUntrackedFiles', 'normal'])
		" Refresh the status buffer
		try
			G
		finally
			" Restore the original value of `status.showUntrackedFiles`,
			" even if `:G` raised an error.
			call FugitiveExecute(['config', 'status.showUntrackedFiles', show_untracked_files])
		endtry
	endfunction

@tpope
Owner

tpope commented Feb 13, 2024

Have you tried git config core.untrackedCache true and git config core.fsmonitor true, to see if those improve performance?

@zivarah
Author

zivarah commented Feb 13, 2024

> Have you tried git config core.untrackedCache true and git config core.fsmonitor true, to see if those improve performance?

Yep -- We're already using feature.manyFiles (which implies core.untrackedCache per git-config), and we're also using core.fsmonitor as mentioned in my original post.

My understanding is that core.fsmonitor helps immensely with tracked file performance, but does not improve untracked file performance at all. core.untrackedCache helps quite a bit but we still end up with several-second status calls even with it set.
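
One thing that can be confusing when checking these settings: `feature.manyFiles` changes Git's defaults rather than writing the individual options, so `core.untrackedCache` does not appear as an explicit entry unless it is set separately. A minimal sketch of inspecting this in a throwaway repository:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
git init -q "$repo"

# Enable the large-repository defaults and the filesystem monitor setting.
git -C "$repo" config feature.manyFiles true
git -C "$repo" config core.fsmonitor true

many_files=$(git -C "$repo" config --type=bool --get feature.manyFiles)
fsmonitor=$(git -C "$repo" config --type=bool --get core.fsmonitor)

# core.untrackedCache is only implied by feature.manyFiles, so an explicit
# lookup in the repository config finds nothing.
explicit_cache=$(git -C "$repo" config --local --get core.untrackedCache || echo "not set explicitly")

echo "feature.manyFiles:   $many_files"
echo "core.fsmonitor:      $fsmonitor"
echo "core.untrackedCache: $explicit_cache"

rm -rf "$repo"
```

On a real working tree, `git update-index --test-untracked-cache` can additionally check whether the filesystem supports the untracked cache; it takes a few seconds to run, so it is left out of the sketch.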

It's a tragically large repo =(

So far I am finding the workaround you suggested to work quite well: the performance of git status with status.showUntrackedFiles set to no is very snappy (~0.2s once it's warmed up).
