-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
9378.disttrial vs non ascii bytes #959
Conversation
Almost certainly the normal case on Python 2 is for the error message to be bytes, not unicode. The implicit ASCII decode is what breaks things.
It seems to be irrelevant on Python 3.
) | ||
if _PY3: | ||
test_addFailureNonASCII.skip = ( | ||
"Exceptions only convert to unicode on Python 3" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the implementation, I see there is no different behaviour.
I think that this should not be skipped on Py3 but rather update the implementation to have an exception on py3.
twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the implementation having no different behavior, this is true but there is a difference in Python itself which is that str(Exception()) returns bytes on Python 2 and unicode on Python 3.
I'm not sure what you mean about updating the implementation to have an exception on Python 3 means.
message when called with a L{Failure}, even if it includes encoded | ||
non-ASCII content. | ||
""" | ||
content = u"\N{SNOWMAN}".encode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the test.
I was expecting to see content = u"\N{SNOWMAN}"
, that is have an exception which has Unicode as args[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failure that I observe comes from having bytes that can't be decoded as ASCII. So this requires bytes here, not unicode.
@@ -97,7 +97,9 @@ def addFailure(self, test, fail): | |||
if isinstance(testName, unicode): | |||
testName = testName.encode("utf-8") | |||
failure = self._getFailure(fail) | |||
fail = failure.getErrorMessage().encode("utf-8") | |||
fail = failure.getErrorMessage() | |||
if isinstance(fail, unicode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my suggestion was to implement it as if not _PY3 and isinstance(fail, unicode):
with the hope that you will get the same result in the existing test...
but just by reviewing this code I don't understand what is going on here...
in the test you pass bytes, and then it ends up here as Unicode? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I think there is poor test coverage for all of the encoding/decoding that is happening in this code. That's a pre-existing defect but I didn't do much to improve things. The one test I added is specifically for the failure that I encountered. There are probably some more tests that should be added.
The test I added passes bytes so of course the isinstance(fail, unicode)
check returns False and the encode("utf-8")
does not happen. The case where that isinstance
returns True may be exercised indirectly somewhere but I don't know. If you actually run disttrial on Python 3, you'll get isinstance(fail, unicode)
as True every time because that's the only case allowed on Python 3.
I see how this could be fixed with a _PY3
check in the implementation, now, because Python 2 also enforces the type of fail
- as bytes
(compared to unicode
for Python 3). So, indeed, Python version should tell you exactly if encoding is required or not.
I don't think that helps the tests, though, because the actual value that comes out is different depending on whether Python 2 or Python 3 is in use. As you observed:
twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"
where the byte string has been rendered as a unicode string with escapes. I suppose the test could also check _PY3 and make a different comparison depending on Python version... I don't really like that but maybe it's the right thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe not _PY3 and isinstance(fail, unicode)
will always be False
because Failure.getErrorMessage()
twisted.python.reflect.safe_str
, which in turn calls str
on its argument and returns either that or the str
of the exception that raised. On Python 2, then, Failure.getErrorMessage()
can never return unicode
.
The fact that Failure.getErrorMessage
's (implicit) API contract requires that it return a native string implies the condition should not _PY3
instead of isinstance(fail, unicode)
.
But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial
, including on the receiving end of a dist Worker
. The only reason they've been encoded here is that disttrial
's AMP commands are defined in terms of String
on Python 2 and 3.
I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8
in favor of Unicode
AMP commands under Python 3. I also think that requires its own PR.
Rather than belabor this PR, I'd rather see it merged as-is and any _PY3
related checks implemented in https://twistedmatrix.com/trac/ticket/9379.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not fix the underlying problem; for example, WorkerReporter.addError
has the same issue, so a test that raises an exception other than FailTest
with an encoded message (e.g., raise RuntimeError(u"\N{SNOWMAN}")
) will hit an equivalent bug.
I believe the correct solution is https://twistedmatrix.com/trac/ticket/9379, but no PR for that has been submitted. Let's merge this in the meantime. Please fix the news fragment first.
@@ -97,7 +97,9 @@ def addFailure(self, test, fail): | |||
if isinstance(testName, unicode): | |||
testName = testName.encode("utf-8") | |||
failure = self._getFailure(fail) | |||
fail = failure.getErrorMessage().encode("utf-8") | |||
fail = failure.getErrorMessage() | |||
if isinstance(fail, unicode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe not _PY3 and isinstance(fail, unicode)
will always be False
because Failure.getErrorMessage()
twisted.python.reflect.safe_str
, which in turn calls str
on its argument and returns either that or the str
of the exception that raised. On Python 2, then, Failure.getErrorMessage()
can never return unicode
.
The fact that Failure.getErrorMessage
's (implicit) API contract requires that it return a native string implies the condition should not _PY3
instead of isinstance(fail, unicode)
.
But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial
, including on the receiving end of a dist Worker
. The only reason they've been encoded here is that disttrial
's AMP commands are defined in terms of String
on Python 2 and 3.
I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8
in favor of Unicode
AMP commands under Python 3. I also think that requires its own PR.
Rather than belabor this PR, I'd rather see it merged as-is and any _PY3
related checks implemented in https://twistedmatrix.com/trac/ticket/9379.
@@ -0,0 +1 @@ | |||
`trial -j` no longer crashes on test failure messages containing non-ASCII bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention that this applies only on Python 2.
https://twistedmatrix.com/trac/ticket/9378