
Inconsistent subprocess.Popen.communicate() behavior between Windows and Posix on non-byte memoryview input #134453

Open
tikuma-lsuhsc opened this issue May 21, 2025 · 1 comment
Assignees
Labels
3.13 (bugs and security fixes); stdlib (Python modules in the Lib dir); topic-subprocess (Subprocess issues); type-bug (An unexpected behavior, bug, or error)

Comments

@tikuma-lsuhsc

tikuma-lsuhsc commented May 21, 2025

Bug report

Bug description:

This is not a bug per se, since the usage diverges from the documentation, but the POSIX version of subprocess.Popen.communicate() behaves less than ideally when a large "memoryview"-able object with non-byte items is passed as the input argument. For example, suppose I have a simple pass-through process which takes a long input array x of length greater than 512 and returns it as is:

import subprocess as sp
import numpy as np

x = np.random.randn(44100)  # float64, an 8-byte data type
ret = sp.run('pass_thru', input=x, stdout=sp.PIPE)  # capture stdout so ret.stdout is not None
y = np.frombuffer(ret.stdout, dtype=float)
assert np.array_equal(x, y)

This example fails the assertion only on POSIX because the returned array y is shorter than x.

This appears to stem from the POSIX code path of sp.Popen.communicate():

if self._input:
    input_view = memoryview(self._input)

#[snip]

if key.fileobj is self.stdin:
    chunk = input_view[self._input_offset:self._input_offset + _PIPE_BUF]

    try:
        self._input_offset += os.write(key.fd, chunk)
    except BrokenPipeError:
        selector.unregister(key.fileobj)
        key.fileobj.close()
    else:
        if self._input_offset >= len(self._input):
            #[snip]

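The indexed chunk of input_view sends more bytes than expected if self._input is not bytes-like, because slicing counts items and each item may span several bytes. os.write() returns the number of bytes written, so self._input_offset counts bytes, while len(self._input) counts the number of items. As such, self._input_offset >= len(self._input) becomes true after the first few writes, well before all of the data has been sent, due to the mismatch in their units.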
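
The unit mismatch can be reproduced without NumPy or a subprocess. The following is a minimal sketch, assuming array.array as a stand-in for the NumPy array and a chunk size of 512 (the local name _PIPE_BUF and its value are assumptions made for illustration):

```python
from array import array

_PIPE_BUF = 512  # assumed chunk size for this sketch

# Stand-in for the NumPy array in the report: 1000 C doubles.
data = array("d", [0.0] * 1000)
view = memoryview(data)

# Slicing a non-byte memoryview selects items, not bytes.
chunk = view[0:_PIPE_BUF]

print(len(data))     # number of items
print(view.nbytes)   # number of bytes: items * itemsize
print(chunk.nbytes)  # bytes a single os.write(fd, chunk) could send

# A single full write of this chunk already exceeds the item count,
# so a byte offset compared against len(data) signals completion too early.
print(chunk.nbytes >= len(data))
```

With 8-byte items, one 512-item chunk is 4096 bytes, which is already larger than the 1000-item length used in the completion check.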

I think the if statement should check the bytes written against input_view.nbytes instead of len(self._input).
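
One way to keep both sides of the comparison in bytes is sketched below. This is not the CPython implementation; write_all is a hypothetical helper, and the chunk size of 512 is an assumption. It casts the input to a flat byte view so that slicing, len(), and the value returned by os.write() all share the same unit:

```python
import os
from array import array

_PIPE_BUF = 512  # assumed chunk size for this sketch


def write_all(fd, data, chunk_size=_PIPE_BUF):
    """Write every byte of a contiguous buffer to fd in chunks (hypothetical helper)."""
    # After the cast, indices and len() count bytes regardless of item size.
    byte_view = memoryview(data).cast("B")
    offset = 0
    while offset < len(byte_view):
        offset += os.write(fd, byte_view[offset:offset + chunk_size])
    return offset


# Small payload (400 doubles) so it fits in the pipe buffer without a reader.
samples = array("d", [float(i) for i in range(400)])
read_fd, write_fd = os.pipe()
written = write_all(write_fd, samples)
os.close(write_fd)

# Drain the pipe until end-of-file.
received = b""
while True:
    block = os.read(read_fd, 4096)
    if not block:
        break
    received += block
os.close(read_fd)

print(written == len(received))
```

Comparing against memoryview(data).nbytes while keeping the original view would fix the completion check, but the slice indices would still be in items; casting to bytes addresses both.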

This use case is not officially supported, as the input is expected to be bytes-like and not an arbitrary memoryview object. However, it works under Windows, and the behavior of this POSIX code path is rather nasty because it works (as wrongly expected) for a short input, which is often the case in testing.

I believe an arbitrary memoryview object otherwise works as a subprocess input (based on my extensive use passing audio and video data to and from FFmpeg), so perhaps I should label this issue as a feature request.

P.S. A fix for the example is to use ret = sp.run('pass_thru', input=x.view('b'), stdout=sp.PIPE).
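
The same workaround can be written with only the standard library. In this sketch the pass-through child is a hypothetical stand-in for pass_thru (a Python one-liner that copies stdin to stdout), and array.array replaces the NumPy array:

```python
import subprocess
import sys
from array import array

# Hypothetical pass-through child: copies stdin to stdout unchanged.
PASS_THRU = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
]

x = array("d", [i / 7 for i in range(44100)])  # 8-byte items

# Cast to a byte view so that len() and slicing count bytes.
ret = subprocess.run(
    PASS_THRU,
    input=memoryview(x).cast("B"),
    stdout=subprocess.PIPE,
    check=True,
)

# Rebuild the array from the raw bytes returned by the child.
y = array("d")
y.frombytes(ret.stdout)
print(y == x)
```

Passing the byte view explicitly avoids depending on how a given Python version tracks progress for non-byte inputs.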

CPython versions tested on:

3.13

Operating systems tested on:

No response

Linked PRs

@tikuma-lsuhsc tikuma-lsuhsc added the type-bug An unexpected behavior, bug, or error label May 21, 2025
@ZeroIntensity ZeroIntensity added the topic-subprocess Subprocess issues. label May 21, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label May 23, 2025
@gpshead gpshead changed the title from "Inconsistent subprocess.Popen.communicate() behavior between Windows and Posix" to "Inconsistent subprocess.Popen.communicate() behavior between Windows and Posix on non-byte memoryview" May 30, 2025
@gpshead gpshead changed the title from "Inconsistent subprocess.Popen.communicate() behavior between Windows and Posix on non-byte memoryview" to "Inconsistent subprocess.Popen.communicate() behavior between Windows and Posix on non-byte memoryview input" May 30, 2025
@gpshead gpshead self-assigned this May 30, 2025
gpshead added a commit to gpshead/cpython that referenced this issue May 30, 2025
Fix inconsistent subprocess.Popen.communicate() behavior between Windows
and POSIX when using memoryview objects with non-byte elements as input.

On POSIX systems, the code was incorrectly comparing bytes written against
element count instead of byte count, causing data truncation for large
inputs with non-byte element types.

Changes:
- Cast memoryview inputs to byte view when input is already a memoryview
- Fix progress tracking to use len(input_view) instead of len(self._input)
- Add comprehensive test coverage for memoryview inputs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
gpshead added a commit to gpshead/cpython that referenced this issue May 30, 2025
pre-commit-whitespace-fixup
@gpshead gpshead added the 3.13 bugs and security fixes label May 30, 2025
@gpshead
Member

gpshead commented May 30, 2025

Thanks for the nice bug writeup. My PR leans into "well, why not" and fixes it as a bug because it already worked on one platform so we may as well make that behavior the definition of how it works.
