Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix DTML upload for Python 3. #265

Merged
merged 4 commits into from Jun 12, 2018
Merged

Fix DTML upload for Python 3. #265

merged 4 commits into from Jun 12, 2018

Conversation

icemac
Copy link
Member

@icemac icemac commented Apr 4, 2018

Without this change uploading a file to a DTMLMethod via ZMI breaks because self.munge(file) tries to use a string regex on a bytes object (the file variable).

What do you think is this patch the way to go?

Open tasks:

  • Discuss solution
  • Write test (method is currently untested)
  • Port fix to DTMLDocument (it duplicates the code)
  • Write test for DTMLDocument

See zopefoundation/Products.SiteErrorLog#11 which uses this method in tests and fails with AttributeError: 'str' object has no attribute 'read'. in OFS/DTMLMethod.py:272

@icemac icemac added the bug label Apr 4, 2018
@icemac icemac added this to the 4.0b4 milestone Apr 4, 2018
@icemac icemac added the question label Apr 6, 2018
@@ -266,10 +266,12 @@ def manage_upload(self, file='', REQUEST=None):
if self.wl_isLocked():
raise ResourceLockedError('This DTML Method is locked.')

if not isinstance(file, binary_type):
if not isinstance(file, basestring):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basestring is not defined in Python 3

Copy link
Member Author

@icemac icemac May 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right in general, but it is defined inside the module:

if sys.version_info >= (3, ):
basestring = str

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That if sys.version_info could use the PY3 symbol imported above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is not completely the same it is close enough, as we will drop Python 2 support before Python 4 comes out. :)

@icemac icemac modified the milestones: 4.0b4, 4.0b5 Apr 23, 2018
if REQUEST and not file:
raise ValueError('No file specified')
file = file.read()
if PY3:
Copy link
Member

@rbu rbu May 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the conversion binary_type->text_type happen for Python 2 as well? I would assume the file type is always binary_type (in which case the method's default should also be changed to file=b'').

Plus, if the desired output type is text_type, then you might also want to decode inputs that are non-file-like (such as file=b'fnord').

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbu I think you are generally right. But there are databases out in the world which have stored the file contents as bytes (aka Python 2 str). Can we provide a way to convert their file contents to text so we do not have to live forever with file contents wich might be bytes?

Copy link
Member

@rbu rbu Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see what you're saying. In the old code, Python 2 would always have str byte strings and byte file objects. Those were read, resulting in bytes being saved to ZODB. As I understand it, per the WSGI spec, file will usually be a bytes file object, so read()ing from it needs to be decoded. So the goal of this code is to always save byte strings in PY2 and always save text strings in PY3.

To make that consistent, though, the default method parameter needs to be decoded as well. What do you think of this:

        if not isinstance(file, basestring):
            if REQUEST and not file:
                raise ValueError('No file specified')
            file = file.read()

        if PY3 and isinstance(file, binary_type):
                file = file.decode('utf-8')

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right if manage_upload is called with a file parameter which is bytes it should be decoded, too.

@rbu
Copy link
Member

rbu commented Jun 5, 2018

FYI: I've rebased this on top of master

@icemac
Copy link
Member Author

icemac commented Jun 8, 2018

I changed the behaviour of manage_upload to store always a native str. (This seems to be the same what manage_edit does.) Does this sound reasonable? If yes I'll implement the same for DTMLDocument.

@leorochael
Copy link
Contributor

@icemac it sounds reasonable that both manage_upload and manage_edit store the same thing.

@@ -42,7 +44,7 @@
from OFS.SimpleItem import Item_w__name__
from ZPublisher.Iterators import IStreamIterator

if sys.version_info >= (3, ):
if PY3:
basestring = str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current state, you don't need the basestring symbol anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.

Copy link
Member

@rbu rbu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good to me

@icemac icemac changed the title [WIP] Fix DTML upload for Python 3. Fix DTML upload for Python 3. Jun 12, 2018
@icemac icemac merged commit e6a85ac into master Jun 12, 2018
@icemac icemac deleted the fix-dtml-upload branch June 12, 2018 07:17
@icemac icemac removed this from the 4.0 final milestone Oct 11, 2018
@icemac icemac added this to the 4.0b6 milestone Oct 11, 2018
rbu added a commit that referenced this pull request Nov 20, 2018
DTML documents need to be stored as native strings ("str" in both Py2 and Py3),
which was implemented in #265 for the manage_upload case (create a document,
then upload the content).

This commit introduces the conditional type conversion for the "add" case
as well (create a document with initial uploaded content).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants