Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tail: reduce CPU load for polling #3618

Merged
merged 42 commits into from
Jun 21, 2022
Merged

tail: reduce CPU load for polling #3618

merged 42 commits into from
Jun 21, 2022

Conversation

jhscheer
Copy link
Contributor

@jhscheer jhscheer commented Jun 12, 2022

This should fix: #3616, fix: #3633
together with #3606 (comment) and #3611 (comment)

This uses the recently introduced compare_contents feature of notify PollWatcher.

This feature will evaluate the contents of changed files to determine if they have indeed changed using a fast hashing algorithm. By enabling this feature, performance will be significantly impacted as all files will need to be read and hashed at each poll_interval.

This means that there is no more need for extra tight poll intervals and workarounds to catch every event. This also means that the relevant GNU test-suite checks should be more stable now (no more fluctuation between PASS/FAIL).

* unifiy delays: 500 ms for inotify, 1500 ms for polling
* fix test_follow_name_move2
* unifiy delays: 500 ms for inotify, 1500 ms for polling
* fix test_follow_name_move2
This reduces the CPU load for polling drastically (from ~80% down to ~5%)
by removing/fixing several previous workarounds related to polling,
while still passing all related GNU test-suite checks.
* set Notify::PollWatcher delay to: sleep_sec/10 instead of
  sleep_sec/100
* set recv_timeout to sleep_sec instead of sleep_sec/100
* remove the manual polling of watched files

Bugs:
* fix an issue with headers to consistently pass
"test_follow_name_retry_headers" and "gnu/tests/tail-2/overlay-headers.sh"

Code clean-up and refactor
* make fields of struct FileHandling private (and add getters/setters)
to ensure that the paths are absolute and match the paths returned by
Notify::Events
* replace calls to "crash!" with "return USimpleError"
* clean-up formatting
This reduces the CPU load for polling drastically (from ~80% down to ~5%)
by removing/fixing several previous workarounds related to polling,
while still passing all related GNU test-suite checks.
* set Notify::PollWatcher delay to: sleep_sec/10 instead of
  sleep_sec/100
* set recv_timeout to sleep_sec instead of sleep_sec/100

Bugs:
* fix an issue with headers to consistently pass
"test_follow_name_retry_headers" and "gnu/tests/tail-2/overlay-headers.sh"

Code clean-up and refactor
* make fields of struct FileHandling private (and add getters/setters)
to ensure that the paths are absolute and match the paths returned by
Notify::Events
* replace calls to "crash!" with "return USimpleError"
* clean-up formatting
@sylvestre
Copy link
Sponsor Contributor

I guess you saw:
Error: GNU test failed: tests/tail-2/truncate. tests/tail-2/truncate is passing on 'main'. Maybe you have to rebase?

@jhscheer jhscheer marked this pull request as draft June 13, 2022 21:01
* add/fix manual polling to pass:
    "gnu/tests/tail-2/F-vs-rename.sh"
    "gnu/tests/tail-2/truncate.sh"

* pin notify version because it is a pre-release
* code clean-up & refactoring
Setting notify::PollWatcher::compare_contents to true
makes many workarounds for polling unneccessary.
Setting notify::PollWatcher::compare_contents to true
makes many workarounds for polling unneccessary.
The overlay-headers.sh test intends to check `tail` for:
"redundant headers for overlapping inotify events while it was
suspended". However, it then runs `tail ---dis` which disables
inotify events, i.e. it only uses polling.

Since this test suspends/resumes `tail` and our polling implementation
differs slightly from GNU's `tail`, this test is fluctuating between
PASS/FAIL, depending on scheduling.

There are two issues here:
1. This test does not work as intended because it does not test
inotify events.
2. For `---dis` uu_tail passes this test only sometimes.

For 1. I opened a bug#55996 at bug-coreutils@gnu.org

For 2. this commit uses the same fix as suggested in the bug report
by removing the `---dis` flag. With this change the test uses inotify
events and in turn `uu_tail` should always pass overlay-headers.sh test.
* remove read_some condition from ProcessChecker::is_dead check
  to make make pid.sh test pass consistently

* remove manual polling (again)
@jhscheer jhscheer marked this pull request as ready for review June 16, 2022 19:23
@jhscheer
Copy link
Contributor Author

jhscheer commented Jun 17, 2022

Warning: Congrats! The gnu test tests/tail-2/F-vs-rename is no longer failing!
Warning: Congrats! The gnu test tests/tail-2/overlay-headers is no longer failing!
Error: GNU test failed: tests/tail-2/pid. tests/tail-2/pid is passing on 'main'. Maybe you have to rebase?

I run the pid.sh test in a loop on my system and get 100/100 passes.
I tried a few things, but this test keeps failing in the CI.
I would need to patch in debug code into pid.sh to get more information other than that the exit code is 1. However, I don't feel like debugging for the CI.

Could a volunteer please check this branch out and give me feedback if this test is passing/failing on their system?

@jhscheer
Copy link
Contributor Author

I tried on two systems but I can't reproduce it either :/

Thanks. I think I found the culprit now and the last 5 CI test runs were green.

@sylvestre
Copy link
Sponsor Contributor

Sounds good.
I wish the patch was smaller but this is great work, bravo :)

@sylvestre
Copy link
Sponsor Contributor

The windows tasks seem to be stalled. Do you see that too?

@jhscheer
Copy link
Contributor Author

The windows tasks seem to be stalled. Do you see that too?

Yes, I see Windows and macOS still in progress.

@sylvestre
Copy link
Sponsor Contributor

They will timeout, i think there is a bug.

Following the parent dir instead of the file itself is a workaround
only intended for linux (inotify).
@sylvestre
Copy link
Sponsor Contributor

Are you aware of Error: GNU test failed: tests/tail-2/F-headers. tests/tail-2/F-headers is passing on 'main'. Maybe you have to rebase? ?

@jhscheer
Copy link
Contributor Author

Finally everything is green.
(It doesn't look like the Code Coverage fail for windows is related to this PR)

Are you aware of Error: GNU test failed: tests/tail-2/F-headers. tests/tail-2/F-headers is passing on 'main'. Maybe you have to rebase? ?

Yes. I removed the ---disable-inotify option from the test and I hope that this test is now stable. If this test keeps failing some times, I'm at a loss to what I could try next.

If other tests that use this option fail some times, I would try the same workaround.

@sylvestre
Copy link
Sponsor Contributor

I am working on a grcov workaround
and reported the issue upstream mozilla/grcov#849

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_tail::test_retry7 is intermittent test_follow_name_truncate4 is intermittent
2 participants