
Replace repeated polling in Outbox.loop() with an asyncio event (see #2482) #2867

Merged
merged 14 commits into from
Apr 16, 2024

Conversation

@afullerx (Contributor) commented Apr 10, 2024

This change replaces the continuous-polling implementation of the Outbox.loop() method with one based on asyncio events. Together with #2827, it resolves issue #2482 while lowering idle CPU usage by over 80% (to just 0.02% on my system).

I was unable to use the most straightforward approach because, in Python versions prior to 3.10, asyncio synchronization primitives attach themselves to the current event loop during initialization. In this case, the event loop the Outbox is created on is different from the one executing its loop() method. To maintain compatibility with Python 3.8, I therefore went with a slightly more complex lazy-initialization approach.
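The lazy-initialization idea can be sketched as follows. This is an illustrative reconstruction, not NiceGUI's actual code; the names `_enqueue_event`, `enqueue()`, and `loop()` mirror the discussion but the bodies are simplified:

```python
import asyncio
from typing import Optional


class Outbox:
    """Minimal sketch of lazy event creation for Python 3.8 compatibility."""

    def __init__(self) -> None:
        # On Python < 3.10, asyncio.Event() binds to the event loop that is
        # current at construction time. The Outbox may be created on a
        # different loop than the one that runs loop(), so creation is deferred.
        self._enqueue_event: Optional[asyncio.Event] = None

    def enqueue(self) -> None:
        # Producers may call this before the loop has created the event.
        if self._enqueue_event is not None:
            self._enqueue_event.set()

    async def loop(self) -> None:
        # Create the event lazily, on the loop that actually awaits it.
        if self._enqueue_event is None:
            self._enqueue_event = asyncio.Event()
        await self._enqueue_event.wait()
```

Because the event is created inside the coroutine that awaits it, it is guaranteed to be attached to the correct loop even on Python 3.8.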

At first, I removed the asyncio.sleep(), reasoning that lower latency in delivering messages and updates could improve performance. However, during automated testing this led to an increase in sporadic timing-related errors, so I concluded that some batching was actually beneficial for system performance. When adding it back, I chose 5 ms as the batch duration because it matches the average processing delay of the current implementation and didn't increase the sporadic errors. This value could likely be optimized further.
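The batching described above can be sketched like this (a hedged illustration, not the PR's literal code; `drain_batch` and the 5 ms constant are just for demonstration):

```python
import asyncio

BATCH_DELAY = 0.005  # 5 ms batch window, as chosen in this PR


async def drain_batch(queue: asyncio.Queue, event: asyncio.Event) -> list:
    """Wait until something is enqueued, then sleep briefly so that updates
    arriving within the window are delivered together in one batch."""
    await event.wait()
    await asyncio.sleep(BATCH_DELAY)  # let nearby updates accumulate
    event.clear()
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return items
```

The short sleep trades a few milliseconds of latency for fewer, larger deliveries, which is the batching effect the tests apparently relied on.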

The automated tests were run using Python 3.8 in the Docker dev container. Ad hoc testing was done natively on Windows 10 using Python 3.12.

@falkoschindler (Contributor) left a comment


Thanks a lot for the pull request, @afullerx!
I like this approach very much. There are just a few things I don't quite understand or would implement differently. Can you check my comments, please? 🙂

nicegui/outbox.py: three review comments (outdated, resolved)
@falkoschindler added the bug label Apr 11, 2024
@falkoschindler (Contributor) commented:

I just noticed that I completely forgot to read your initial post and immediately jumped into your code instead - probably out of excitement and curiosity to see how you solved this problem. Therefore I missed your valuable explanations and asked kind of redundant questions. Shame on me. 🫤

Especially considering this being your first pull request ever, I really appreciate the effort you've put into the implementation, testing and documentation!

@falkoschindler falkoschindler modified the milestones: 1.4.21, 1.4.22 Apr 12, 2024
@falkoschindler (Contributor) left a comment


I just reviewed and improved the code a bit:

  • With await asyncio.wait_for(self._enqueue_event.wait(), timeout=1.0) we can await an asyncio event with a timeout. If the timeout occurs, we simply continue; the while loop keeps iterating as long as _should_stop is not set.
  • If there is no client connection, we continue after a short delay. This way we check for a client connection in 0.1-second intervals.
  • The 0.1-second delay is a bit more defensive. I guess this is why I had to adjust two pytests.
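One iteration of the reviewed loop might look roughly like this (an illustrative sketch with made-up names and return values, not the merged code):

```python
import asyncio


async def outbox_cycle(event: asyncio.Event, has_socket_connection: bool) -> str:
    """One cycle of the outbox loop as described in the bullets above."""
    if not has_socket_connection:
        await asyncio.sleep(0.1)  # re-check for a client connection every 0.1 s
        return 'no client'
    try:
        # Wake when the event is set, or after 1 s at the latest, so the
        # surrounding while loop can still notice _should_stop.
        await asyncio.wait_for(event.wait(), timeout=1.0)
    except (TimeoutError, asyncio.TimeoutError):  # asyncio.TimeoutError for Python < 3.11
        return 'timeout'
    return 'event set'
```

The 1-second timeout bounds how long a stop request can go unnoticed, while the 0.1-second sleep bounds how stale the connection check can be.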

Regarding the batching of multiple messages: if multiple updates happened within 0.01 seconds, the old implementation waited and submitted them at once. The new implementation will submit a single update, but only once the currently running task yields and the outbox loop runs another cycle. So if you synchronously update multiple UI elements, their updates will still be batched.

Anyway, @rodja and I will thoroughly review this pull request once again to make sure we don't break anything.

@afullerx (Contributor, Author) commented:

@falkoschindler I like the improvements. I saw just one issue: prior to Python 3.11, asyncio.wait_for() raised asyncio.TimeoutError, so we also need to catch that. I verified with testing that it's necessary, so I went ahead and added it.
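The version difference can be demonstrated with a small probe (illustrative; `which_timeout` is not part of the PR). On Python 3.11+, asyncio.TimeoutError is an alias of the builtin TimeoutError, so catching asyncio.TimeoutError is safe on every supported version:

```python
import asyncio


async def which_timeout() -> type:
    """Return the exception class asyncio.wait_for() raises on timeout.
    Pre-3.11 this is asyncio.TimeoutError; on 3.11+ it is the builtin
    TimeoutError (of which asyncio.TimeoutError is an alias)."""
    try:
        await asyncio.wait_for(asyncio.sleep(10), timeout=0.01)
    except asyncio.TimeoutError as e:
        return type(e)
    raise AssertionError('expected a timeout')
```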

Good point about the natural batching that occurs with synchronous updates. I should have thought of that. It makes me even more curious why I saw an increase in test failures and why adding a sleep fixed it. Hopefully just an anomaly with my low-performance test setup and small changes in timing.

@afullerx (Contributor, Author) commented:

@falkoschindler Since the addition of asyncio.wait_for(), I'm getting lockups on about every third test run in the dev container. I'm well on my way to working out what's happening, but it's taking a while because every experiment can take over an hour to complete.

@afullerx (Contributor, Author) commented:

Fixed in the latest commit. The fix is pretty simple, but figuring out what was going on definitely wasn't. I thought it would be an event-related deadlock; it turned out to be an obscure issue preventing task termination when asyncio.wait_for() is called in a loop with the event already set.

The hangs occurred when self.client.has_socket_connection became False during test exit while updates were still in the queue. Since self._should_stop often isn't set during disorderly test teardowns, we ended up calling wait_for() in an infinite loop. Normally this wouldn't be a problem, because the Outbox.loop() task is abruptly terminated regardless of what it's doing. For some unknown reason, the repeated calls to wait_for() prevented this typical behavior, causing the process to hang. The solution is simply to check self._enqueue_event.is_set() before calling wait_for().

This change also provides a useful optimization, since every call to asyncio.wait_for() creates a new task and incurs multiple context switches. I validated the fix by running the tests in a continuous loop for several hours without issue.
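The fix can be sketched as a small helper (a hedged illustration with an invented name, not the literal PR code):

```python
import asyncio


async def wait_for_enqueue(event: asyncio.Event, timeout: float = 1.0) -> None:
    """Skip asyncio.wait_for() entirely when the event is already set.
    Each wait_for() call wraps its awaitable in a new task, so this check
    also avoids needless task creation and context switches."""
    if event.is_set():
        return
    try:
        await asyncio.wait_for(event.wait(), timeout=timeout)
    except (TimeoutError, asyncio.TimeoutError):  # asyncio.TimeoutError pre-3.11
        pass  # a timeout just means the caller's loop runs another cycle
```

With the fast path in place, a loop spinning on an already-set event never enters wait_for(), which is what reportedly stopped the hangs during teardown.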

@falkoschindler (Contributor) left a comment


Thank you so much for your thorough investigation and beautiful solution, @afullerx! It's a nice improvement to avoid creating tasks just to exit immediately because the event is already set. And if it resolves the hanging tests, all the better! 😀

@falkoschindler falkoschindler merged commit a2544c1 into zauberzeug:main Apr 16, 2024
1 check passed
@afullerx afullerx deleted the improve_outbox branch June 1, 2024 02:10
Successfully merging this pull request may close these issues.

Wrapping the menu_and_tabs example into a @ui.page("/") causes continuous 100% CPU load