Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use FS encoding and errors for unicode args #1499

Closed
wants to merge 3 commits into from
Closed

Conversation

twm
Copy link
Contributor

@twm twm commented Dec 28, 2020

This partially reverts 85069b5 but should have the same effect.

Scope and purpose

The error reported Twisted #10070:

Traceback (most recent call last):
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/defer.py", line 1445, in _inlineCallbacks
    result = current_context.run(g.send, result)
  File "/buildbot/buildbot-job/build/master/buildbot/test/unit/scripts/test_start.py", line 117, in test_start_timeout_number_string
    res = yield self.runStart(start_timeout='10')
  File "/buildbot/buildbot-job/build/master/buildbot/test/unit/scripts/test_start.py", line 87, in runStart
    return getProcessOutputAndValue(sys.executable, args=args, env=env)
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/utils.py", line 174, in getProcessOutputAndValue
    return _callProtocolWithDeferred(
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/utils.py", line 28, in _callProtocolWithDeferred
    reactor.spawnProcess(p, executable, (executable,) + tuple(args), env, path)
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/posixbase.py", line 386, in spawnProcess
    args, env = self._checkProcessArgs(args, env)
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/base.py", line 1088, in _checkProcessArgs
    _arg = argChecker(arg)
  File "/buildbot/buildbot-job/build/sandbox/lib/python3.8/site-packages/twisted/internet/base.py", line 1074, in argChecker
    arg = arg.encode(defaultEncoding)
builtins.TypeError: encode() argument 'encoding' must be str, not None

This points at this code: ​

# If any of the following environment variables:
# - PYTHONUTF8
# - PYTHONIOENCODING
#
# are set before the Python interpreter runs, they will affect the
# value of sys.stdout.encoding
defaultEncoding = sys.stdout.encoding
# Common check function
def argChecker(arg: Union[bytes, str]) -> Optional[bytes]:
"""
Return either L{bytes} or L{None}. If the given value is not
allowable for some reason, L{None} is returned. Otherwise, a
possibly different object which should be used in place of arg is
returned. This forces unicode encoding to happen now, rather than
implicitly later.
"""
if isinstance(arg, str):
try:
arg = arg.encode(defaultEncoding)

It looks like defaultEncoding used to default to sys.getfilesystemencoding(): ​85069b5 It's also documented to be possible for std.stdout to be None on Windows so using sys.stdout.encoding seems broken in the general case.

It seems like just the environment variables added for #9858 should be sufficient to solve the problem there, as they activate UTF-8 mode on all supported Python versions. However it also looks off to me that we aren't using the system-configured encoding error handler (via the sublimely named os.getfilesystemencodeerrors()). That's what should allow non-text stuff to round-trip

I have no idea how to add tests for this, but I think that the additional coverage added in PR #1302 may be sufficient.

Contributor Checklist:

This partially reverts 85069b5 but
should have the same effect.
The sys.getfilesystemencodeerrors function seems to be missing from the
sys type stub. I've submitted a PR upstream:
python/typeshed#4864
@adiroiban
Copy link
Member

Here is my take on this #1501 ... don't revert and try to fix and update the tests

@twm
Copy link
Contributor Author

twm commented Dec 28, 2020

Ooops, I forgot 3.5 is still supported.

As I look at this post-sleep I'm beginning to wonder why this str to byte coercion stuff is running on Windows at all. Isn't the environment natively UTF-16 on Windows?

@rodrigc
Copy link
Contributor

rodrigc commented Jan 3, 2021

This patch is not correct. This will break certain versions of Python

@rodrigc rodrigc closed this Jan 3, 2021
@adiroiban
Copy link
Member

Well, we were trying to find the right formula for Win, Linux and macOS. I am not sure if closing the PR without any comment on what would be the right fix is the best way to move this forward.

Note that there is also #1501 for which the progress was stalled as I didn't had time to setup a macOS and WIndows dev VM with all the supported python version.
My local Win and macOS VM are only at 2.7 ...but plan to upgrade them to 3.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants