-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert "Create a utility function for converting unicode to bytes" #984
Conversation
if handle in self.openDirs: | ||
raise KeyError("already opened this directory") | ||
self.openDirs[handle] = [ensureBytes(dirObj), iter(dirObj)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of a bug created by ensureBytes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for looking into this.
Do we have a test to cover this behavior and prevent us from regression on this bug?
We can revert all these lines, but unless we have tests, they can be later modified and the bug will not be noticed.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I was unclear: this line is covered by many tests, but they could not catch the bug introduced by ensureBytes
.
ensureBytes
returns the object unmodified if it is not a "unicode" object:
>>> from twisted.python.bytes import ensureBytes
>>> type(ensureBytes(object()))
<type 'object'>
dirObj
should never be a string-like object, but ensureBytes
allows its type to be either "unicode" or anything. Imagine what would happen if you used ensureBytes
in a buggy protocol implementation; it could erroneously pass an arbitrary object through to its transport (which could in turn be another protocol), eventually resulting in a type error deep inside Twisted. Conversion functions like ensureBytes
must always return objects of their claimed return type to callers to avoid propagating bugs.
That's exactly the precaution taken by networkString
.
On Python 2
>>> from twisted.python.compat import networkString
>>> networkString(object())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mrw/src/python/twisted/src/twisted/python/compat.py", line 537, in networkString
raise TypeError("Can only pass-through bytes on Python 2")
TypeError: Can only pass-through bytes on Python 2
>>> from twisted.python.compat import networkString
>>> networkString(object())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/mrw/src/python/twisted/src/twisted/python/compat.py", line 523, in networkString
raise TypeError("Can only convert text to bytes on Python 3")
TypeError: Can only convert text to bytes on Python 3
ensureBytes
could be modified to check that it's received either a "unicode" or bytes
object, encoding the former to bytes with UTF-8 or the provided codec, passing through the latter, and raising an exception otherwise. But now then it would be mimicking networkString
, except that it would allow callers to accept both bytes and text; I can't think of any APIs that actually work this way, other than problematic ones like Python 3's file system path interface. That's why this function is a net negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy to revert this, but I am trying to see if we can do more than just reverting the change and allowing similar bugs to surface in the future.
Sorry, I was unclear: this line is covered by many tests, but they could not catch the bug introduced by ensureBytes.
A test just touching this line is not enough :)
I was asking if there is a test which can trigger this bug and which can fail if the objects are not right.
Since the tests can't catch the bugs, I would say that we have not tests :) .
I would like to use to opportunity to detect the place where encoding errors can occur and write explicit tests for this behaviour.
Now, we can merge this revert as it is and try to write more tests in a follow up... which might be never
Ok. It looks like this is also a revert for #937 ... but the PR description does not talk about this :) I was confused as I didn't remember approved those changes in conch. |
@markrwilliams Thanks for raising this issue :) My review mentioned that a carefull review is needed when this code is used... and that we should monitor how this is used and abused :) For me, is easier to audit the code with ensureBytes, rather than look for all different variant so if / else. I agee that ensureBytes was abuses (and pretty soon). I was wrong to think that this will help with the review process... and is hard to monitor. I wasn't aware of #937 I approve this revert... but please create a ticket for it, as it this is not just a revert/reopen of a single ticket. I take the blame for open the gate, but I think that @wiml as the reviewer for #937 didn't pay attention to the usage of this method and to the test coverage. |
@adiroiban This is not quite a reversion of #937. Reverting #961 broke the build because #937 introduced dependencies on Good call on creating a new ticket. Here it is: https://twistedmatrix.com/trac/ticket/9401 Also, thank you for your help on this and also Twisted generally. Maintaining a project as large and old as Twisted is hard and I appreciate all your effort and all of @rodrigc's effort, too! |
@markrwilliams my understanding of ensureBytes is that it should be used for non-compat reason. |
@adiroiban Generally, there's little reason why we should expose those APIs apart from compat reasons. There's very few cases where either/or is acceptable -- an API should allow one of bytes, unicode, or str -- and preferably not multiple. We do allow multiple a lot in t.web, for example, but that's mostly because t.web is a giant hulking mess of backwards compatibility and API confusion. |
https://twistedmatrix.com/trac/ticket/9401
Reverts #961
twisted.python.bytes.ensureBytes
) in a module that conflicts with a built-in name.twisted.python.compat
, because it is a shim for porting code. If/when Twisted only supports Python 3, there will be almost no circumstances under which APIs might take both strings or bytes. It also should be private (its name should begin with a leading underscore) so that people know it's not for use outside of Twisted. And its module's name should not conflict with a built-in name.