Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Create a utility function for converting unicode to bytes" #984

Merged
merged 2 commits into from
Mar 31, 2018

Conversation

markrwilliams
Copy link
Member

@markrwilliams markrwilliams commented Mar 28, 2018

https://twistedmatrix.com/trac/ticket/9401

Reverts #961

  1. This of dubious value and will obscure real bugs.
  2. The approving review did not adequately address previous reviewers' concerns.
  3. This introduces a new public API (twisted.python.bytes.ensureBytes) in a module that conflicts with a built-in name.
  1. is the most critical reason for a reversion. If this is going to be used in Twisted, it must be in twisted.python.compat, because it is a shim for porting code. If/when Twisted only supports Python 3, there will be almost no circumstances under which APIs might take both strings or bytes. It also should be private (its name should begin with a leading underscore) so that people know it's not for use outside of Twisted. And its module's name should not conflict with a built-in name.

if handle in self.openDirs:
raise KeyError("already opened this directory")
self.openDirs[handle] = [ensureBytes(dirObj), iter(dirObj)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of a bug created by ensureBytes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for looking into this.

Do we have a test to cover this behavior and prevent us from regression on this bug?

We can revert all these lines, but unless we have tests, they can be later modified and the bug will not be noticed.

Thanks!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was unclear: this line is covered by many tests, but they could not catch the bug introduced by ensureBytes.

ensureBytes returns the object unmodified if it is not a "unicode" object:

>>> from twisted.python.bytes import ensureBytes
>>> type(ensureBytes(object()))
<type 'object'>

dirObj should never be a string-like object, but ensureBytes allows its type to be either "unicode" or anything. Imagine what would happen if you used ensureBytes in a buggy protocol implementation; it could erroneously pass an arbitrary object through to its transport (which could in turn be another protocol), eventually resulting in a type error deep inside Twisted. Conversion functions like ensureBytes must always return objects of their claimed return type to callers to avoid propagating bugs.

That's exactly the precaution taken by networkString.
On Python 2

>>> from twisted.python.compat import networkString
>>> networkString(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mrw/src/python/twisted/src/twisted/python/compat.py", line 537, in networkString
    raise TypeError("Can only pass-through bytes on Python 2")
TypeError: Can only pass-through bytes on Python 2

On Python 3

>>> from twisted.python.compat import networkString
>>> networkString(object())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mrw/src/python/twisted/src/twisted/python/compat.py", line 523, in networkString
    raise TypeError("Can only convert text to bytes on Python 3")
TypeError: Can only convert text to bytes on Python 3

ensureBytes could be modified to check that it's received either a "unicode" or bytes object, encoding the former to bytes with UTF-8 or the provided codec, passing through the latter, and raising an exception otherwise. But now then it would be mimicking networkString, except that it would allow callers to accept both bytes and text; I can't think of any APIs that actually work this way, other than problematic ones like Python 3's file system path interface. That's why this function is a net negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am happy to revert this, but I am trying to see if we can do more than just reverting the change and allowing similar bugs to surface in the future.

Sorry, I was unclear: this line is covered by many tests, but they could not catch the bug introduced by ensureBytes.

A test just touching this line is not enough :)
I was asking if there is a test which can trigger this bug and which can fail if the objects are not right.

Since the tests can't catch the bugs, I would say that we have not tests :) .

I would like to use to opportunity to detect the place where encoding errors can occur and write explicit tests for this behaviour.

Now, we can merge this revert as it is and try to write more tests in a follow up... which might be never

@adiroiban
Copy link
Member

I am a bit confused by this PR... it reverts #961 ...but in the diff of #961 I only see the introduction of the function, without this function being used in the code.

So it looks like this PR is more than a revert for trac 9343

@adiroiban
Copy link
Member

Ok. It looks like this is also a revert for #937 ... but the PR description does not talk about this :)

I was confused as I didn't remember approved those changes in conch.

@adiroiban
Copy link
Member

@markrwilliams Thanks for raising this issue :)

My review mentioned that a carefull review is needed when this code is used... and that we should monitor how this is used and abused :)

For me, is easier to audit the code with ensureBytes, rather than look for all different variant so if / else.

I agee that ensureBytes was abuses (and pretty soon).

I was wrong to think that this will help with the review process... and is hard to monitor. I wasn't aware of #937

I approve this revert... but please create a ticket for it, as it this is not just a revert/reopen of a single ticket.

I take the blame for open the gate, but I think that @wiml as the reviewer for #937 didn't pay attention to the usage of this method and to the test coverage.

@markrwilliams
Copy link
Member Author

@adiroiban This is not quite a reversion of #937. Reverting #961 broke the build because #937 introduced dependencies on ensureBytes. This PR thus undoes only those parts of #937.

Good call on creating a new ticket. Here it is: https://twistedmatrix.com/trac/ticket/9401

Also, thank you for your help on this and also Twisted generally. Maintaining a project as large and old as Twisted is hard and I appreciate all your effort and all of @rodrigc's effort, too!

@hawkowl hawkowl merged commit e94e592 into trunk Mar 31, 2018
@adiroiban
Copy link
Member

@markrwilliams my understanding of ensureBytes is that it should be used for non-compat reason.
That is, it is designed as a helper for APIs which accept both Unicode and bytes on both Python2 and Python3

@hawkowl
Copy link
Member

hawkowl commented Mar 31, 2018

@adiroiban Generally, there's little reason why we should expose those APIs apart from compat reasons. There's very few cases where either/or is acceptable -- an API should allow one of bytes, unicode, or str -- and preferably not multiple. We do allow multiple a lot in t.web, for example, but that's mostly because t.web is a giant hulking mess of backwards compatibility and API confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants