🐛 fix(execute): adopt CPython subprocess stream handling#3715
Merged
gaborbernat merged 4 commits intotox-dev:mainfrom Feb 17, 2026
Merged
🐛 fix(execute): adopt CPython subprocess stream handling#3715gaborbernat merged 4 commits intotox-dev:mainfrom
gaborbernat merged 4 commits intotox-dev:mainfrom
Conversation
3aebd45 to
934a277
Compare
…adlocks The previous implementation used a two-phase approach with stop events and separate drain phases, leading to race conditions where data could arrive between thread stop and drain execution. This caused occasional deadlocks and incomplete stream reading, particularly during interrupt handling. Replaced the custom implementation with CPython's proven approach: Unix: Use selectors.DefaultSelector (poll/epoll/kqueue) instead of basic select.select, with 32KB chunks instead of 1KB. Properly handle EINTR for signal safety during interrupt scenarios. Windows: Simplified from complex overlapped I/O to straightforward blocking reads with fh.read(32KB) in a loop. Removed asyncio dependencies and the error-prone overlapped mechanism that was causing sporadic failures. Both platforms now read until EOF naturally rather than checking stop events in the hot path, only consulting the stop flag between select() calls. The drain phase is conditionally executed based on the drain parameter, allowing long-running processes like the pep517 backend to skip unnecessary draining. This matches the battle-tested implementation from CPython's subprocess.py which has handled these edge cases correctly for years.
934a277 to
1dfd36c
Compare
Threads were deadlocking on Windows because `ov.getresult(True)` blocks indefinitely and cannot respond to stop events. When the main thread sets the stop event to terminate subprocess readers, the I/O threads remained stuck in the blocking wait, causing test timeouts and preventing graceful shutdown. Changed to `getresult(False)` with periodic polling that checks the stop event every 50ms. This preserves Windows' efficient overlapped I/O mechanism while allowing threads to exit cleanly when signaled. The polling interval matches the Unix implementation's timeout for consistency across platforms.
8395f06 to
2bde0bf
Compare
efa998d to
04db466
Compare
bc10ac5 to
dfff2a5
Compare
The polling approach with getresult(False) was losing subprocess output data because the read thread would exit when the stop event was set before data could be fully captured, causing 29 test failures on Windows CI. Switched to blocking getresult(True) which ensures complete data capture for each overlapped read operation. The stop event is checked between complete reads rather than during them, allowing threads to exit cleanly while guaranteeing all subprocess output is processed.
dfff2a5 to
ee3e719
Compare
Reader threads on Windows were deadlocking because pep517_backend.close() called execute.__exit__() before terminating the subprocess. This created a chicken-and-egg problem where readers couldn't exit because the subprocess still had open pipes, but the subprocess wouldn't be killed until readers exited first. Reordered operations to terminate the subprocess first, which closes the pipes and allows overlapped I/O operations to complete naturally. This eliminates the need for arbitrary timeouts in the Windows reader polling loop, resulting in cleaner code that relies on proper process lifecycle management instead of guessing when it's safe to give up.
Member
Author
|
This is now available in version 4.36.1 🎉 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Subprocess output reading could deadlock on Windows during parallel test execution or when handling large amounts of output. The root cause was a chicken-and-egg problem where reader threads waited for the subprocess to close its pipes, but the subprocess wouldn't be terminated until after the readers exited. 🔒 This manifested as timeouts in CI when tox tried to clean up long-running backend processes.
The fix adopts CPython's approach to subprocess stream handling. On Unix, we switched from
select.select()to theselectorsmodule with proper EOF detection and interrupt handling. On Windows, we use overlapped I/O with non-blocking polling, mirroring CPython's implementation. Most importantly, we reordered the shutdown sequence inpep517_backend.pyto terminate subprocesses before stopping reader threads, ensuring pipes close and pending I/O operations complete naturally. ⚡This eliminates arbitrary timeouts and race conditions in the process cleanup logic. The implementation now handles EINTR signals gracefully and reads larger chunks (32KB instead of 1KB) for better performance with high-volume output.
References:
selectorsmodule: https://docs.python.org/3/library/selectors.html_overlappedmodule: https://github.com/python/cpython/blob/main/Modules/overlapped.c