
fix(pty): use async io to avoid polling #523

Merged: 1 commit into zellij-org:main on May 24, 2021
Conversation

@kxt (Contributor) commented May 19, 2021

This patch fixes #509 by using async read instead of polling a
non-blocking fd. This reduces CPU usage when the ptys are idle.
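
As a rough illustration of the approach (a minimal sketch using async-std, not the actual code from this patch): instead of busy-polling a non-blocking fd, an async read suspends the task until the pty produces data.

```rust
use async_std::fs::File;
use async_std::prelude::*;

// Hypothetical reader loop: read(..).await parks the task while the
// pty is idle, so an idle pane costs no CPU.
async fn stream_pty(mut pty: File) -> std::io::Result<()> {
    let mut buf = [0u8; 65536];
    loop {
        let n = pty.read(&mut buf).await?;
        if n == 0 {
            break; // EOF: the other end of the pty was closed
        }
        // forward buf[..n] to the screen thread here
    }
    Ok(())
}
```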

@kxt (Contributor, Author) commented May 19, 2021

I measured cat-ing 1048576 lines using hyperfine.

this PR:
  Time (mean ± σ):     732.8 ms ±  37.9 ms    [User: 0.7 ms, System: 521.6 ms]
  Range (min … max):   672.4 ms … 780.7 ms    10 runs
0.12.0:
  Time (mean ± σ):      1.744 s ±  0.053 s    [User: 1.8 ms, System: 560.5 ms]
  Range (min … max):    1.648 s …  1.838 s    10 runs
alacritty:
  Time (mean ± σ):     625.5 ms ±  10.2 ms    [User: 1.0 ms, System: 611.1 ms]
  Range (min … max):   613.6 ms … 648.8 ms    10 runs
tmux:
  Time (mean ± σ):      1.532 s ±  0.026 s    [User: 1.5 ms, System: 740.8 ms]
  Range (min … max):    1.483 s …  1.567 s    10 runs

However, during these measurements I noticed a problem. This benchmark measures the time it takes for cat to finish; however, due to the multi-threaded architecture of zellij, if rendering takes more time than producing the output (running cat in this example), the unbounded to_screen mpsc queue starts growing.

This results in both user-visible lag and ballooning memory usage from all the PtyBytes sitting in the mpsc queue.

This problem is exacerbated by this PR, as it makes streaming faster.

Perhaps the task::sleep in stream_terminal_bytes is meant to remedy this problem; however, a fixed amount of waiting is either:

  • too little, causing lagging
  • too much, because rendering would be able to handle more, making cat slower unnecessarily

The proper solution would be having back pressure between Pty and Screen. I'll open a bug for this issue so we can discuss the details.
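
To make the back-pressure idea concrete, here is a minimal sketch using std's bounded sync_channel; zellij's actual channels and message types differ, and the names below are illustrative.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn main() {
    // A bounded queue: send() blocks once 64 chunks are in flight, so a
    // fast pty thread cannot outrun a slow screen thread without bound.
    let (to_screen, from_pty) = sync_channel::<Vec<u8>>(64);

    let screen = thread::spawn(move || {
        for bytes in from_pty {
            // "render" the chunk; slow rendering now slows the producer too
            let _ = bytes.len();
        }
    });

    // The "pty thread": produce chunks as fast as possible.
    for chunk in std::iter::repeat(vec![b'x'; 4096]).take(1_000) {
        to_screen.send(chunk).unwrap(); // blocks while the queue is full
    }
    drop(to_screen); // close the channel so the screen thread exits
    screen.join().unwrap();
}
```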

@kxt force-pushed the 509-async-pty branch 2 times, most recently from b307b63 to f8784f5, on May 19, 2021 07:51
@imsnif (Member) commented May 19, 2021

Hey @kxt - this looks great at a very brief first glance. I want to give this a more thorough dive (both to test it and understand the difference in behaviour and the new code changes). Unfortunately, I've been having a bit of a day, so I won't have a chance to get to it today. I'm setting some time aside for it tomorrow though. Thank you for your patience :)

@kxt (Contributor, Author) commented May 19, 2021

Unfortunately the code is a bit involved, sorry about that. I might be able to clean it up a little before you dive into it tomorrow.

@imsnif (Member) commented May 19, 2021

Nah, it's fine :) The only thing that might be good (for me and future readers) is to add comments around the more advanced features to explain why you used them (e.g. the async trait and the unsafe block).

@kxt (Contributor, Author) commented May 19, 2021

I've updated the patch, hopefully stream_terminal_bytes is a bit more straightforward.

async_trait is not a big deal: Rust still does not support async fn in traits, so there is a proc macro that automagically converts async fn methods into normal fns that return Futures.
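
For future readers, a minimal sketch of what the macro enables; the trait and method below are hypothetical, not the actual ServerOsApi.

```rust
use async_trait::async_trait;

#[async_trait]
trait PtyRead {
    // A plain `async fn` in a trait is rejected by the compiler (as of
    // Rust 1.52); the macro rewrites this into an ordinary fn that
    // returns a boxed Future.
    async fn read_bytes(&mut self, buf: &mut [u8]) -> std::io::Result<usize>;
}
```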

The unsafe part, however, is a big deal. from_raw_fd is unsafe because it takes ownership of the file descriptor, meaning the fd will be closed once the AsyncReader is dropped. This is why there is an AsyncReader object at all, instead of something like async fn async_read(fd: RawFd, buf: &mut [u8]) -> Result<usize> in ServerOsApi that converts fd into a File internally: such a function would close fd after the first read.

Closing the fd when the AsyncReader is dropped is just what we want, as we've already finished with that pty by then. However, it also means that the ptys were not being closed before this patch, which might be the cause of #518.
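
A small sketch of that pitfall (the helper below is hypothetical, not zellij code):

```rust
use std::fs::File;
use std::io::Read;
use std::os::unix::io::{FromRawFd, RawFd};

// Broken by design: the File takes ownership of `fd`, so the fd is
// closed when `file` is dropped at the end of the function. A second
// call with the same fd would read from a closed (or reused) descriptor.
fn read_once(fd: RawFd, buf: &mut [u8]) -> std::io::Result<usize> {
    // SAFETY: the caller must guarantee that `fd` is valid and that
    // nothing else still owns it; that contract is why this is unsafe.
    let mut file = unsafe { File::from_raw_fd(fd) };
    file.read(buf)
    // `file` dropped here, closing fd. Owning the fd in a long-lived
    // AsyncReader instead closes it exactly once, on drop.
}
```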

@imsnif (Member) left a review comment

Hey! Thanks for the updates. This really looks great. Certainly much better in every way than what we had before. I left a couple of minor nitpicks, but I don't have much to say about the code otherwise.

The behaviour change is a little problematic though, as you pointed out in the linked issue. I'm going to comment on it here because I think it'll be easier. (orig issue: #525)

> An alternative explanation (kudos to @aeinstein): running nearly at the speed of light, cat is experiencing time dilation compared to a stationary observer.

(I laughed quite a bit at this)

> I made a proof-of-concept using a bounded to_screen queue, and I ran into a deadlock: [...] which is already filled by pty thread, so this send is blocking.

Kudos on all the threading. I agree that this is a problematic part of our architecture. IMO a better solution would be to decouple the plugin rendering from the plugin state updating. Maybe even to have them be long-lived and operating in the background, sending their visual state to the screen thread to be rendered as needed. But I guess @TheLostLambda might have more to say about this.

There's also ongoing work being done on the rendering itself to cut down its cost significantly, which I think will be a big help here.

Either way, I hope these changes don't need to block this PR. Do you think there's some way to work around the backpressure issue? Even adding an artificial amount of sleep that won't be 100% right on every machine but will be good enough? Or some other solution?

Because visually, to users, this looks like a performance degradation for large amounts of output - and I'm hesitant to worsen the UX here.

Review threads: src/tests/fakes.rs (outdated, resolved), zellij-server/src/os_input_output.rs (outdated, resolved), zellij-server/src/pty.rs (resolved)
@kxt (Contributor, Author) commented May 21, 2021

> Do you think there's some way to work around the backpressure issue? Even adding an artificial amount of sleep that won't be 100% right on every machine but will be good enough? Or some other solution?
>
> Because visually, to users, this looks like a performance degradation for large amounts of output - and I'm hesitant to worsen the UX here.

This is a YMMV kind of problem. The code currently has a 10ms sleep, the same as the old code. However, due to the higher throughput, there might be some systems that were somewhat laggy with the old code because it kept the screen thread nearly 100% fed, and the new code will tip it over 100% and trigger #525. But higher throughput also means that the sleep could be increased: I tried 30ms, and the new code has both higher throughput and less lag than the old code with 10ms (the lag I did not measure; it just felt more responsive on my system). This is not an official endorsement of increasing the sleep time.

I have a few more ideas for implementing back pressure that I want to try before we upend the entire threading architecture, I'll keep #525 posted.

@imsnif (Member) commented May 24, 2021

Hey @kxt - I'm not sure where things stand. I see you requested another review on this, but I thought we were going with the crossbeam changes discussed in #525 - so how would you like to continue?

@kxt (Contributor, Author) commented May 24, 2021

I did not know that #525 was blocking this PR, and it's not clear to me why it would.

@imsnif (Member) commented May 24, 2021

My apologies for the confusion, then. I took another look at this, and while the performance is a little janky when catting large files, it's MUCH faster. I think the jankiness is this PR exposing the soon-to-be-fixed flaws of our render function itself.

@imsnif merged commit 2168793 into zellij-org:main on May 24, 2021
@imsnif (Member) commented May 24, 2021

Also, thanks for this @kxt! And thanks for your patience with the merges from main, I know it can be frustrating.

@kxt deleted the 509-async-pty branch on May 24, 2021 15:20
Closes: CPU usage when idle (#509)