-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimization: Check file timestamps #127
Conversation
The current version performs a `git commit` on every repository every 5 seconds. If nothing changed, it simply has no effect. This strategy is eating up sometimes 10-20% of my CPU (on a poor dev machine), due to the massive amount of hashing needed to commit so many repositories. This change creates an optimization by tracking file changed timestamps and comparing them against last commit timestamps. To do this, we have to traverse all files in each repository, which `git commit` is doing anyway. The big difference is that we only need to traverse to the inode (directory) for the file timestamp entries, so it should be significantly faster. Considered design: It has been suggested to use the `notify` crate to be notified of file changes. The trouble here was a global limit on file descriptors that can be watched, so you can't watch all files. I considered some machine learning appraches to guessing which files should be watched, but abandoned the approach because every strategy I could think of inolved a fallback to eactly what I'm doing here. Considered design: Use SQLite to remember timestamps of every file. This was needlessly complex because I only need to remember one timestamp - the most recent one. And more important, the last dura commit already stores it, so SQLite wasn't adding any value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a major improvement without adding much complexity at all! Code is all good apart from a few little bits - one unwrap()
I think should definitely be changed, but the rest are minor points.
I tried this on another computer and I'm getting even higher CPU utilization — 15-25% — it's possible I just installed it wrong. Looking into it... |
UPDATE: I found the performance issue. Over the weekend, after I produced that build that performed well on my old laptop, I started adding unit tests while having a few drinks. As expected, it seemed like a very good idea to call Well... it turns out there's a lot of kernel calls in vs here's dura without the I inexplicably have some failing tests now. I think the other PR must have somehow broken them. Working... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All lgtm, give me a heads up when the final tests are sorted and I'll have another check over :)
Merged, because this is definitely better than it was before. I'm going to make a new PR for more improvements |
The current version performs a
git commit
on every repository every 5 seconds. If nothing changed, it simply has no effect. This strategy is eating up sometimes 10-20% of my CPU (on a poor dev machine), due to the massive amount of hashing needed to commit so many repositories.This change creates an optimization by tracking file changed timestamps and comparing them against last commit timestamps. To do this, we have to traverse all files in each repository, which
git commit
is doing anyway. The big difference is that we only need to traverse to the inode (directory) for the file timestamp entries, so it should be significantly faster.Considered Design:
notify
It has been suggested to use the
notify
crate to be notified of file changes. The trouble here was a global limit on file descriptors that can be watched, so you can't watch all files. I considered some machine learning appraches to guessing which files should be watched, but abandoned the approach because every strategy I could think of inolved a fallback to eactly what I'm doing here.Considered Design: SQLite
Use SQLite to remember timestamps of every file. This was needlessly complex because I only need to remember one timestamp - the most recent one. And more important, the last dura commit already stores it, so SQLite wasn't adding any value.
Results
On the laptop where I was getting ~10% CPU utilization, it's now less that 0.1% (it doesn't register). There's a stark drop in the Activity Monitor graph after I installed the updated version.