9378.disttrial vs non ascii bytes #959

exarkun · 2018-02-19T16:01:46Z

https://twistedmatrix.com/trac/ticket/9378

Almost certainly the normal case on Python 2 is for the error message to be bytes, not unicode. The implicit ASCII decode is what breaks things.

It seems to be irrelevant on Python 3.

adiroiban · 2018-02-19T19:46:08Z

src/twisted/trial/_dist/test/test_workerreporter.py

+        )
+    if _PY3:
+        test_addFailureNonASCII.skip = (
+            "Exceptions only convert to unicode on Python 3"


From the implementation, I see there is no different behaviour.

I think that this should not be skipped on Py3 but rather update the implementation to have an exception on py3.

twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"

Regarding the implementation having no different behavior, this is true but there is a difference in Python itself which is that str(Exception()) returns bytes on Python 2 and unicode on Python 3.

I'm not sure what you mean about updating the implementation to have an exception on Python 3 means.

adiroiban · 2018-02-19T19:47:26Z

src/twisted/trial/_dist/test/test_workerreporter.py

+        message when called with a L{Failure}, even if it includes encoded
+        non-ASCII content.
+        """
+        content = u"\N{SNOWMAN}".encode("utf-8")


I don't understand the test.

I was expecting to see content = u"\N{SNOWMAN}" , that is have an exception which has Unicode as args[0]

The failure that I observe comes from having bytes that can't be decoded as ASCII. So this requires bytes here, not unicode.

adiroiban · 2018-02-19T19:57:00Z

src/twisted/trial/_dist/workerreporter.py

@@ -97,7 +97,9 @@ def addFailure(self, test, fail):
        if isinstance(testName, unicode):
            testName = testName.encode("utf-8")
        failure = self._getFailure(fail)
-        fail = failure.getErrorMessage().encode("utf-8")
+        fail = failure.getErrorMessage()
+        if isinstance(fail, unicode):


my suggestion was to implement it as if not _PY3 and isinstance(fail, unicode): with the hope that you will get the same result in the existing test...

but just by reviewing this code I don't understand what is going on here...

in the test you pass bytes, and then it ends up here as Unicode? :)

Ah, I see. I think there is poor test coverage for all of the encoding/decoding that is happening in this code. That's a pre-existing defect but I didn't do much to improve things. The one test I added is specifically for the failure that I encountered. There are probably some more tests that should be added.

The test I added passes bytes so of course the isinstance(fail, unicode) check returns False and the encode("utf-8") does not happen. The case where that isinstance returns True may be exercised indirectly somewhere but I don't know. If you actually run disttrial on Python 3, you'll get isinstance(fail, unicode) as True every time because that's the only case allowed on Python 3.

I see how this could be fixed with a _PY3 check in the implementation, now, because Python 2 also enforces the type of fail - as bytes (compared to unicode for Python 3). So, indeed, Python version should tell you exactly if encoding is required or not.

I don't think that helps the tests, though, because the actual value that comes out is different depending on whether Python 2 or Python 3 is in use. As you observed:

twisted.trial.unittest.FailTest: b'\xe2\x98\x83' != b"b'\\xe2\\x98\\x83'"

where the byte string has been rendered as a unicode string with escapes. I suppose the test could also check _PY3 and make a different comparison depending on Python version... I don't really like that but maybe it's the right thing.

I believe not _PY3 and isinstance(fail, unicode) will always be False because Failure.getErrorMessage() twisted.python.reflect.safe_str, which in turn calls str on its argument and returns either that or the str of the exception that raised. On Python 2, then, Failure.getErrorMessage() can never return unicode.

The fact that Failure.getErrorMessage's (implicit) API contract requires that it return a native string implies the condition should not _PY3 instead of isinstance(fail, unicode).

But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial, including on the receiving end of a dist Worker. The only reason they've been encoded here is that disttrial's AMP commands are defined in terms of String on Python 2 and 3.

I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8 in favor of Unicode AMP commands under Python 3. I also think that requires its own PR.

Rather than belabor this PR, I'd rather see it merged as-is and any _PY3 related checks implemented in https://twistedmatrix.com/trac/ticket/9379.

markrwilliams

This does not fix the underlying problem; for example, WorkerReporter.addError has the same issue, so a test that raises an exception other than FailTest with an encoded message (e.g., raise RuntimeError(u"\N{SNOWMAN}")) will hit an equivalent bug.

I believe the correct solution is https://twistedmatrix.com/trac/ticket/9379, but no PR for that has been submitted. Let's merge this in the meantime. Please fix the news fragment first.

markrwilliams · 2018-02-20T00:43:53Z

src/twisted/trial/_dist/workerreporter.py

@@ -97,7 +97,9 @@ def addFailure(self, test, fail):
        if isinstance(testName, unicode):
            testName = testName.encode("utf-8")
        failure = self._getFailure(fail)
-        fail = failure.getErrorMessage().encode("utf-8")
+        fail = failure.getErrorMessage()
+        if isinstance(fail, unicode):


I believe not _PY3 and isinstance(fail, unicode) will always be False because Failure.getErrorMessage() twisted.python.reflect.safe_str, which in turn calls str on its argument and returns either that or the str of the exception that raised. On Python 2, then, Failure.getErrorMessage() can never return unicode.

The fact that Failure.getErrorMessage's (implicit) API contract requires that it return a native string implies the condition should not _PY3 instead of isinstance(fail, unicode).

But it also implies encoding is happening at the wrong level. All of these things - the test name, the error message, the class name, and the traceback frames - are native strings and are dealt with as such elsewhere in trial, including on the receiving end of a dist Worker. The only reason they've been encoded here is that disttrial's AMP commands are defined in terms of String on Python 2 and 3.

I think a better solution would be to remove all the ad-hoc transcodings to and from utf-8 in favor of Unicode AMP commands under Python 3. I also think that requires its own PR.

Rather than belabor this PR, I'd rather see it merged as-is and any _PY3 related checks implemented in https://twistedmatrix.com/trac/ticket/9379.

markrwilliams · 2018-02-20T00:44:29Z

src/twisted/newsfragments/9378.bugfix

@@ -0,0 +1 @@
+`trial -j`  no longer crashes on test failure messages containing non-ASCII bytes.


Please mention that this applies only on Python 2.

exarkun added 5 commits February 19, 2018 10:57

Add a failing test for the worker reporter

c8d457a

Add a type check before encoding

f32707d

Almost certainly the normal case on Python 2 is for the error message to be bytes, not unicode. The implicit ASCII decode is what breaks things.

Add the news fragment.

676bd73

Skip the new test on Python 3

52272e5

It seems to be irrelevant on Python 3.

wrap long lines

af22593

adiroiban requested a review from a team February 19, 2018 19:41

adiroiban reviewed Feb 19, 2018

View reviewed changes

adiroiban added the needs-review label Feb 19, 2018

exarkun requested a review from a team February 19, 2018 19:51

adiroiban reviewed Feb 19, 2018

View reviewed changes

markrwilliams approved these changes Feb 20, 2018

View reviewed changes

markrwilliams removed the needs-review label Feb 20, 2018

markrwilliams mentioned this pull request Feb 20, 2018

twisted.trial._dist's AMP Commands use Unicode on Python 3. #960

Merged

3 tasks

mention this is a Python 2 related change

d6625fc

exarkun merged commit c164919 into trunk Feb 20, 2018

exarkun deleted the 9378.disttrial-vs-non-ascii-bytes branch February 20, 2018 14:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

9378.disttrial vs non ascii bytes #959

9378.disttrial vs non ascii bytes #959

exarkun commented Feb 19, 2018

adiroiban Feb 19, 2018

exarkun Feb 19, 2018

adiroiban Feb 19, 2018

exarkun Feb 19, 2018

adiroiban Feb 19, 2018

exarkun Feb 19, 2018

markrwilliams Feb 20, 2018

markrwilliams left a comment

markrwilliams Feb 20, 2018

markrwilliams Feb 20, 2018

		@@ -0,0 +1 @@
		`trial -j` no longer crashes on test failure messages containing non-ASCII bytes.

9378.disttrial vs non ascii bytes #959

9378.disttrial vs non ascii bytes #959

Conversation

exarkun commented Feb 19, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

markrwilliams left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment