New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 'lines' property to always use 'bytes' #206

Merged
merged 10 commits into from Apr 23, 2018

Conversation

Projects
None yet
8 participants
@witsch
Member

witsch commented Oct 21, 2017

The 'lines' property type could not be used with bytes so far, as str will include the prefix and quotes:

$ python3.6
Python 3.6.3 (default, Oct 21 2017, 11:47:54) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> str(b'foo')
"b'foo'"
>>> 

This fixes the type to always store bytes as was already the case with Python 2.

@witsch witsch added the bug label Oct 21, 2017

@witsch witsch requested review from davisagli and hannosch Oct 21, 2017

witsch added a commit to zopefoundation/Products.GenericSetup that referenced this pull request Oct 21, 2017

@@ -135,7 +135,7 @@ def field2lines(v):
if isinstance(v, (list, tuple)):
result = []
for item in v:
result.append(str(item))
result.append(field2bytes(item))
return result
return field2text(v).splitlines()

This comment has been minimized.

@hannosch

hannosch Oct 21, 2017

Member

Having the last return statement with field2text (which calls field2string) and the changed one using field2bytes looks inconsistent to me. Does the last line also have to be changed to use field2bytes?

This comment has been minimized.

@witsch

witsch Oct 22, 2017

Member

I agree, but I guess we should first sort out and agree which direction we want to take, i.e. lists of text vs lists of bytes. There are no hard reasons for either I suppose, but like I said, we think "bytes" feel more right.

In any case, the str is wrong and cannot stay (imho).

@tseaver

This comment has been minimized.

Member

tseaver commented Oct 22, 2017

I'm pretty sure that a lines property should be returning lists of text, not lists of bytes.

@witsch

This comment has been minimized.

Member

witsch commented Oct 22, 2017

@hannosch

This comment has been minimized.

Member

hannosch commented Oct 22, 2017

Hey. I think we had a similar lengthy discussion at some sprint and went the other way, where we tried to make all the normal converters (without the u prefix) use native strings. That is use the str type under both Python 2 and 3, even though they mean something very different.

The reasoning was that almost nobody ever used the u-prefixed converters and most people used ascii-only content with these converters.

But that means you'd have to use some other converter if you actually wanted binary content.

I don't remember all the details, and am not convinced the way we decided to do this before is the right way. So I'm all open for arguments and changing this all again.

@davisagli

This comment has been minimized.

Member

davisagli commented Oct 29, 2017

Hmm. I think using native strings makes fine sense for when these converters are used to process request params. But they are also used when values are stored by a PropertyManager, which is the case that @witsch and @gotcha were working on. I'm concerned that if this converter returns native strings, then after loading a database created with Zope 2 into Zope 4 on Python 3, existing values of lines properties will be bytes but newly written ones will be stored as text. Or is this an example of something that needs to be dealt with by app-specific migration/db conversion code?

@leorochael

This comment has been minimized.

Contributor

leorochael commented Oct 29, 2017

Back when these converters like lines were created, there was no Unicode in Python (and no u-converters like ulines.

So I agree most people using lines indeed intended to get lines of text, and this is what they got, in the form of a list of (byte) strings that just so happened to contain bytes in whatever encoding they were using (most likely iso-8859-1).

These lines of text encoded as bytestrings were
then likely sent to whatever storage or processing back-end was in use, with the expectation that they would be processed "as is".

So while I like the argument that "lines of text" is what people expect when they use "lines" (hence we should be giving them a list of Unicode strings), in practice this will probably break existing applications that are migrating to Python 3 (and are already facing a whole set of challenges) throwing Unicode errors far from the point of cause, and with no quick workaround (since there is no blines converter).

IMO, lines returning a list of Unicode strings only on Python 3 is DWIMy, and goes against at least 3 entries in the Zen of Python (EIBTI, ITFOARTTTG, TSBOOWTDI). I think it should return "list of bytes" in both Python versions.

@tseaver

This comment has been minimized.

Member

tseaver commented Oct 30, 2017

OK, I'll bow to the consensus.

@icemac

This comment has been minimized.

Member

icemac commented Dec 8, 2017

Is there still something to do before this PR can be merged?

@gotcha

This comment has been minimized.

Member

gotcha commented Dec 8, 2017

@witsch

This comment has been minimized.

Member

witsch commented Dec 8, 2017

I can have a look on monday (I meant to wrap this up anyway :))

@witsch

This comment has been minimized.

Member

witsch commented Dec 22, 2017

I finally looked into this again and ended up pushing a work-in-progress commit I had already started back at the sprint. Actually I meant to amend it in order to fix the two remaining test errors, which (both) look like this:

Error in test test_processInputs_w_simple_marshalling (ZPublisher.tests.testHTTPRequest.HTTPRequestTests)
Traceback (most recent call last):
  File "…/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "…/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "…/ZPublisher/tests/testHTTPRequest.py", line 277, in test_processInputs_w_simple_marshalling
    req = self._processInputs(inputs)
  File "…/ZPublisher/tests/testHTTPRequest.py", line 146, in _processInputs
    req.processInputs()
  File "…/ZPublisher/HTTPRequest.py", line 698, in processInputs
    if '<' in tainted[i]:
TypeError: a bytes-like object is required, not 'str'

However, after digging a bit deeper I'm not sure how to best solve this. If we change lines to expect and return lists of bytes, we'll probably also need a TaintedBytes class (in AccessControl/tainted.py) or make TaintedString "byte-aware" somehow (because the instance holding the "tainted" byte sequence also needs to return bytes again).

But before giving this a try, I'd like to hear your thoughts about it… :)

@gotcha

This comment has been minimized.

Member

gotcha commented Jan 28, 2018

I explored how much work it was to add TaintedBytes. Not that much actually.
Needs review though !

Depends on zopefoundation/AccessControl#57

@dataflake

I'm afraid I don't have the bandwidth to look at this right now.

@icemac icemac added this to the 4.0b4 milestone Apr 12, 2018

@icemac

This comment has been minimized.

Member

icemac commented Apr 20, 2018

@gotcha Do you think this PR can be merged now?

@icemac

icemac approved these changes Apr 23, 2018

LGTM.

Minimize diff.
[skip ci]

@icemac icemac merged commit eb4baa0 into master Apr 23, 2018

@icemac icemac deleted the lines-property branch Apr 23, 2018

@icemac

This comment has been minimized.

Member

icemac commented Apr 23, 2018

Thank you all for working on this PR! 😃

@gotcha

This comment has been minimized.

Member

gotcha commented Apr 23, 2018

Thanks to you @icemac for following up !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment