Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9378.disttrial vs non ascii bytes #959

Merged
merged 6 commits into from
Feb 20, 2018
Merged

Conversation

exarkun
Copy link
Member

@exarkun exarkun commented Feb 19, 2018

Almost certainly the normal case on Python 2 is for the error message to
be bytes, not unicode.  The implicit ASCII decode is what breaks things.
It seems to be irrelevant on Python 3.
@adiroiban adiroiban requested a review from a team February 19, 2018 19:41
)
if _PY3:
test_addFailureNonASCII.skip = (
"Exceptions only convert to unicode on Python 3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the implementation, I see there is no different behaviour.

I think that this should not be skipped on Py3 but rather update the implementation to have an exception on py3.

twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the implementation having no different behavior, this is true but there is a difference in Python itself which is that str(Exception()) returns bytes on Python 2 and unicode on Python 3.

I'm not sure what you mean about updating the implementation to have an exception on Python 3 means.

message when called with a L{Failure}, even if it includes encoded
non-ASCII content.
"""
content = u"\N{SNOWMAN}".encode("utf-8")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the test.

I was expecting to see content = u"\N{SNOWMAN}" , that is have an exception which has Unicode as args[0]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failure that I observe comes from having bytes that can't be decoded as ASCII. So this requires bytes here, not unicode.

@exarkun exarkun requested a review from a team February 19, 2018 19:51
@@ -97,7 +97,9 @@ def addFailure(self, test, fail):
if isinstance(testName, unicode):
testName = testName.encode("utf-8")
failure = self._getFailure(fail)
fail = failure.getErrorMessage().encode("utf-8")
fail = failure.getErrorMessage()
if isinstance(fail, unicode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my suggestion was to implement it as if not _PY3 and isinstance(fail, unicode): with the hope that you will get the same result in the existing test...

but just by reviewing this code I don't understand what is going on here...

in the test you pass bytes, and then it ends up here as Unicode? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I think there is poor test coverage for all of the encoding/decoding that is happening in this code. That's a pre-existing defect but I didn't do much to improve things. The one test I added is specifically for the failure that I encountered. There are probably some more tests that should be added.

The test I added passes bytes so of course the isinstance(fail, unicode) check returns False and the encode("utf-8") does not happen. The case where that isinstance returns True may be exercised indirectly somewhere but I don't know. If you actually run disttrial on Python 3, you'll get isinstance(fail, unicode) as True every time because that's the only case allowed on Python 3.

I see how this could be fixed with a _PY3 check in the implementation, now, because Python 2 also enforces the type of fail - as bytes (compared to unicode for Python 3). So, indeed, Python version should tell you exactly if encoding is required or not.

I don't think that helps the tests, though, because the actual value that comes out is different depending on whether Python 2 or Python 3 is in use. As you observed:

twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"

where the byte string has been rendered as a unicode string with escapes. I suppose the test could also check _PY3 and make a different comparison depending on Python version... I don't really like that but maybe it's the right thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe not _PY3 and isinstance(fail, unicode) will always be False because Failure.getErrorMessage() twisted.python.reflect.safe_str, which in turn calls str on its argument and returns either that or the str of the exception that raised. On Python 2, then, Failure.getErrorMessage() can never return unicode.

The fact that Failure.getErrorMessage's (implicit) API contract requires that it return a native string implies the condition should not _PY3 instead of isinstance(fail, unicode).

But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial, including on the receiving end of a dist Worker. The only reason they've been encoded here is that disttrial's AMP commands are defined in terms of String on Python 2 and 3.

I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8 in favor of Unicode AMP commands under Python 3. I also think that requires its own PR.

Rather than belabor this PR, I'd rather see it merged as-is and any _PY3 related checks implemented in https://twistedmatrix.com/trac/ticket/9379.

Copy link
Member

@markrwilliams markrwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not fix the underlying problem; for example, WorkerReporter.addError has the same issue, so a test that raises an exception other than FailTest with an encoded message (e.g., raise RuntimeError(u"\N{SNOWMAN}")) will hit an equivalent bug.

I believe the correct solution is https://twistedmatrix.com/trac/ticket/9379, but no PR for that has been submitted. Let's merge this in the meantime. Please fix the news fragment first.

@@ -97,7 +97,9 @@ def addFailure(self, test, fail):
if isinstance(testName, unicode):
testName = testName.encode("utf-8")
failure = self._getFailure(fail)
fail = failure.getErrorMessage().encode("utf-8")
fail = failure.getErrorMessage()
if isinstance(fail, unicode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe not _PY3 and isinstance(fail, unicode) will always be False because Failure.getErrorMessage() twisted.python.reflect.safe_str, which in turn calls str on its argument and returns either that or the str of the exception that raised. On Python 2, then, Failure.getErrorMessage() can never return unicode.

The fact that Failure.getErrorMessage's (implicit) API contract requires that it return a native string implies the condition should not _PY3 instead of isinstance(fail, unicode).

But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial, including on the receiving end of a dist Worker. The only reason they've been encoded here is that disttrial's AMP commands are defined in terms of String on Python 2 and 3.

I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8 in favor of Unicode AMP commands under Python 3. I also think that requires its own PR.

Rather than belabor this PR, I'd rather see it merged as-is and any _PY3 related checks implemented in https://twistedmatrix.com/trac/ticket/9379.

@@ -0,0 +1 @@
`trial -j` no longer crashes on test failure messages containing non-ASCII bytes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention that this applies only on Python 2.

@exarkun exarkun merged commit c164919 into trunk Feb 20, 2018
@exarkun exarkun deleted the 9378.disttrial-vs-non-ascii-bytes branch February 20, 2018 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants