Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

poll() returns immdiately with timeouts less than one millisecond #1614

Closed
yump opened this issue Sep 2, 2022 · 2 comments · Fixed by #1615 or #1616
Closed

poll() returns immdiately with timeouts less than one millisecond #1614

yump opened this issue Sep 2, 2022 · 2 comments · Fixed by #1615 or #1616

Comments

@yump
Copy link
Contributor

yump commented Sep 2, 2022

This contradicts the documentation, which says:

Note that the timeout will be rounded up to the system clock granularity (usually 1ms), and kernel scheduling delays mean that the blocking interval may be overrun by a small amount.

I ran into this chasing down high CPU usage from below's replay view, which is just a TUI for viewing existing logs and does not collect samples. Some notes from the investigation, trimmed for brevity:

below replay --time now consumes enormous CPU time even though the screen is completely static.

  • ~4% (4200 MHz) built from upstream git

As for the bulk of the CPU hogging, Looks like culprits are,

(snip!)

  1. There are some 50,000 calls to epoll_wait() per second.

Very suspicious profile (event=cycles):

- 96.97% cursive_core::cursive_run::CursiveRunner<C>::run
   - 96.78% cursive_core::cursive_run::CursiveRunner<C>::step (inlined)
      - 73.51% cursive_core::cursive_run::CursiveRunner<C>::process_events (inlined)
         - 73.23% <cursive_buffered_backend::BufferedBackend as cursive_core::backend::Backend>::poll_event
              <cursive::backends::crossterm::Backend as cursive_core::backend::Backend>::poll_event
              crossterm::event::poll
            - crossterm::event::poll_internal (inlined)
               - 73.18% crossterm::event::read::InternalEventReader::poll
                  - 65.03% <crossterm::event::source::unix::UnixInternalEventSource as crossterm::event::source::EventSource>::try_read
                     + 57.68% mio::poll::Poll::poll
                     + 4.68% crossterm::event::timeout::PollTimeout::leftover (inlined)
                     + 1.73% crossterm::event::timeout::PollTimeout::new (inlined)
                     + 0.06% mio::event::events::Events::is_empty

I built crossterm, modified it's example code to poll with 1ms timeout like crossterm uses it, and threw it in gdb. Backtrace looks like:

#0  0x00007ffff7e8c74a in epoll_wait () from /lib64/libc.so.6
#1  0x00005555555ca615 in mio::sys::unix::selector::epoll::Selector::select (self=0x555555668c20, events=0x555555668c30, timeout=...)
    at /home/rhaley/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.8.4/src/sys/unix/selector/epoll.rs:68
#2  0x00005555555cdd22 in mio::poll::Poll::poll (self=0x555555668c20, events=0x555555668c30, timeout=...)
    at /home/rhaley/.cargo/registry/src/github.com-1ecc6299db9ec823/mio-0.8.4/src/poll.rs:400
#3  0x0000555555585be6 in crossterm::event::source::unix::{impl#1}::try_read (self=0x555555668c20, timeout=...) at src/event/source/unix.rs:83
#4  0x000055555557d6cb in crossterm::event::read::InternalEventReader::poll<crossterm::event::filter::EventFilter> (
    self=0x5555556630a0 <crossterm::event::INTERNAL_EVENT_READER+8>, timeout=..., filter=0x5555556228dc) at src/event/read.rs:64
#5  0x000055555556f7c4 in crossterm::event::poll_internal<crossterm::event::filter::EventFilter> (timeout=..., filter=0x5555556228dc) at src/event.rs:233
#6  0x0000555555578958 in crossterm::event::poll (timeout=...) at src/event.rs:168
#7  0x000055555556c58e in event_poll_read::print_events () at examples/event-poll-read.rs:25
#8  0x000055555556cb3b in event_poll_read::main () at examples/event-poll-read.rs:55

The main problem is that mio rounds <1ms timeouts to 0, which results in epoll_wait() returning immediately. But how does a =1ms timeout become a <1ms timeout?

Because crossterm doesn't pass the timeout straight to mio. Instead, it converts it to a crossterm::timeout::PollTimeout, and then passes the leftover time to mio. The leftover time is calculated as the original timeout less the elapsed time since the PollTimout was constructed. In fact, crossterm converts back and forth between std::time::Duration and its own PollTimeout by calling leftover() and constructing new PollTimeouts several times, potentially losing a few nanos on each conversion.

Compounding the problem, crossterm polls repeatedly in a loop until the timeout is fully elapsed. So the 1ms timeout becomes a 1ms spinwait.

Patched mio to round timeouts up instead of down. Result:

Before, with 4% CPU usage:

07:10:00
@[tracepoint:syscalls:sys_enter_openat, below, 2606215, 2606215]: 8
@[tracepoint:syscalls:sys_enter_ioctl, below, 2606215, 2606215]: 8
@[tracepoint:syscalls:sys_enter_close, below, 2606215, 2606215]: 8
@[tracepoint:syscalls:sys_enter_clock_nanosleep, below, 2606215, 2606215]: 32
@[tracepoint:syscalls:sys_enter_epoll_wait, below, 2606215, 2606215]: 50275

After, with 0.9% CPU usage:

07:06:30
@[tracepoint:syscalls:sys_enter_openat, below, 2571785, 2571785]: 8
@[tracepoint:syscalls:sys_enter_ioctl, below, 2571785, 2571785]: 8
@[tracepoint:syscalls:sys_enter_close, below, 2571785, 2571785]: 8
@[tracepoint:syscalls:sys_enter_clock_nanosleep, below, 2571785, 2571785]: 31
@[tracepoint:syscalls:sys_enter_epoll_wait, below, 2571785, 2571785]: 31

(per-second syscall counts collected with this bpftrace script)

@Thomasdezeeuw
Copy link
Collaborator

Nice debugging @yump, I left a comment in #1615.

@yump yump changed the title poll() returns immdiately with timeouts less than one second poll() returns immdiately with timeouts less than one millisecond Sep 3, 2022
@Thomasdezeeuw
Copy link
Collaborator

Reopening this because it's still a possible problem on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants