Git performance problems with large device count #3121

Closed
jerji opened this issue Apr 12, 2024 · 6 comments · Fixed by #3192

@jerji
Contributor

jerji commented Apr 12, 2024

Dear Maintainer,

My company uses Oxidized to store device configurations for various kinds of network devices. One of our instances currently has about 700 devices, and its bare git repo has grown to roughly 300 GB. Requests for device versions now time out, probably because of the size of the repo.

Are there any optimizations you think might help in this situation, or is there any profiling I could help with? Oxidized is an important tool for us and we want to help where we can. :)

@robertcheramy
Collaborator

Could you try running git gc on your repository?

@jerji
Contributor Author

jerji commented Apr 15, 2024

I started it overnight; we will see when it completes. The last time I tried to run it, the server ran out of disk space before it completed. I have since copied the repository to a new drive.

@ytti
Owner

ytti commented Jun 12, 2024

I recall looking into this years ago, trying to figure out whether I'm using rugged incorrectly and leaving trash behind, whether rugged itself is doing something incorrectly and leaving trash behind, or whether this is the expected accumulation of trash in git. I couldn't answer my question.

@robertcheramy
Collaborator

git gc is not supported in libgit2/rugged (see libgit2/libgit2#3247), so we have to document how to run it manually. This is what I want to document in docs/Troubleshooting.md.

I've read in #1805 (comment) that the git implementation in oxidized (or rugged?) is not the usual way, so I'll have a look at it. As you have already checked this, I'm not sure I will find anything.

@ytti
Owner

ytti commented Jun 12, 2024

I don't think there is anything particularly odd about the bare repo; we simply have no use for checked-out files lying about. We look at the file in HEAD, and if it is the same as the one we fetched, we don't make a commit; otherwise we do.

But as we're not manipulating any files, we don't really need to check them out. If we did check them out, we'd give more opportunities for people to break things.
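The "commit only if the file in HEAD differs" check can be sketched with plain git plumbing commands. This is only an illustration under stated assumptions: Oxidized does this through rugged in output/git.rb, not by shelling out, and the function name `config_changed` and its arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Oxidized): decide whether a freshly
# fetched config needs a new commit, without checking any files out.
#
#   $1 = path to the (bare) repository
#   $2 = file name inside the repository, e.g. the device name
#   $3 = local file holding the freshly fetched config
#
# Returns 0 if a commit is needed (content differs, or file not in HEAD yet),
# and 1 if the content stored at HEAD is identical.
config_changed() {
    repo=$1
    path=$2
    fetched=$3

    # Blob ID of the file as stored in HEAD; fails quietly if HEAD or the
    # file does not exist, in which case a commit is needed.
    stored_id=$(git --git-dir="$repo" rev-parse --verify --quiet "HEAD:$path") || return 0

    # Blob ID the fetched content would get; nothing is written to the repo.
    fetched_id=$(git --git-dir="$repo" hash-object --stdin < "$fetched")

    [ "$stored_id" != "$fetched_id" ]
}
```

Comparing blob IDs rather than file contents means nothing has to be checked out or read back from the repository, which matches the point above about not needing a working tree.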

@robertcheramy
Collaborator

robertcheramy commented Jun 14, 2024

I've looked deeper into output/git.rb.

I find it confusing that we write an index to a bare repository.

The README.md of libgit2/rugged reads the tree from the index every time and does not write the index to disk.

This is a problem when someone tries to fix the git repository as in #1805 inside a clone and pushes it back: as it is a bare repository, git does not update the index, and oxidized ignores the current HEAD and works on the old index. I will address this in #1805 as it is off-topic here.

Besides the index issue, I can't see any incorrect use of rugged. We are simply making a lot of commits to a lot of files, and this slows git down after some time. The solution is to pack the loose objects into a packfile, which is what git gc is for, and this is (currently) not supported by libgit2/rugged. So the only current solution is to document how to run git gc.
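A manual run could look roughly like the sketch below. The helper names `loose_objects` and `pack_repo` are hypothetical, and the repository path is whatever the git output in your Oxidized configuration points to. It is probably safest to stop Oxidized while the repack runs (an assumption, not something confirmed in this thread), and to make sure there is free disk space, since git writes the new packfile before it removes the objects it replaces.

```shell
#!/bin/sh
# Hypothetical helpers for packing a bare repository by hand.

# Print the number of loose objects, taken from the "count:" line of
# `git count-objects -v`.
loose_objects() {
    git --git-dir="$1" count-objects -v | awk '/^count:/ { print $2 }'
}

# Report the loose-object count, run `git gc`, and report the count again.
#   $1 = path to the (bare) repository
pack_repo() {
    repo=$1
    echo "loose objects before: $(loose_objects "$repo")"
    git --git-dir="$repo" gc --quiet || return 1
    echo "loose objects after: $(loose_objects "$repo")"
}
```

Usage would be `pack_repo /path/to/oxidized.git`, with the path replaced by your own repository. On a very large repository the first run can take a long time.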
