Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue15 - Fix unicode handling in make_query() #78

Merged
merged 9 commits into from
May 5, 2017

Conversation

flipmcf
Copy link
Contributor

@flipmcf flipmcf commented Nov 21, 2016

#15

@flipmcf
Copy link
Contributor Author

flipmcf commented Nov 21, 2016

The previous pull suffered from some serious mission creep. This pull is designed to be easily reviewed.

Copy link
Member

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the little change I'd request.

@@ -188,6 +203,9 @@ def make_query(*args, **kwargs):
qlist = complex_marshal(list(d.items()))
for i in range(len(qlist)):
k, m, v = qlist[i]
if type(v) == unicode:
_default_encoding()
v = v.encode(_DEFAULT_ENCODING)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be nicer if _default_encoding() returns _DEFAULT_ENCODING. This makes the code more readable IMHO as the function has to be called anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the same thing.

if isinstance(v, unicode):
_default_encoding()
encoding = _DEFAULT_ENCODING
return ':%s:ustring' % (encoding,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

@@ -166,6 +166,15 @@ def __init__(self, sequence, size, start=0, end=0,
# "make_query(bstart=batch.end)" to the other.


_DEFAULT_ENCODING = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added as a suggestion by tseaver #4 (comment)

#Do not do this at import time.
#Call '_default_encoding()' at run time to retrieve it from config, if present
#If not configured, will be 'utf8' by default.
_DEFAULT_ENCODING = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4 (comment)

Although it looks like this everywhere else:

# This may get overwritten during configuration
default_encoding = 'utf-8'

try:
_DEFAULT_ENCODING = config.zpublisher_default_encoding
except AttributeError:
_DEFAULT_ENCODING = 'utf8'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should force utf8 for those who have not added a zpublisher_default_encoding in their config (eg - everyone getting this update) But release notes and docs should address the new configuration Maybe with this pull request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: at least the changelog should inform about this change.

@@ -188,6 +203,9 @@ def make_query(*args, **kwargs):
qlist = complex_marshal(list(d.items()))
for i in range(len(qlist)):
k, m, v = qlist[i]
if type(v) == unicode:
_default_encoding()
v = v.encode(_DEFAULT_ENCODING)
qlist[i] = '%s%s=%s' % (quote(k), m, quote(str(v)))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the culprit - just throwing all values into str() without type checking for a unicode with non-ascii characters

However, for backwards compatibility, I think we should only do this for unicode values that cannot be encoded with ascii. I think after reading ZPublisher.HTTPRequest, it converts all string to unicode, so a round-trip from HTTPRequest, make_query(self.request.form) for example, would magically turn strings into :utf8:ustring . This is untested, but I'm pretty sure it's true.

self.assertEqual(simple_marshal(DateTime()), ":date")

def testMarshalUnicode(self):
self.assertEqual(simple_marshal(u'unic\xF3de'), ":utf8:ustring")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also, for backwards compatibility:

self.assertEqual(simple_marshal(u'ascii_only_unicode'), "")

like we do for 'string' above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It least it would not hurt. :)

Copy link
Contributor Author

@flipmcf flipmcf Nov 23, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might hurt. If you get a unicode from a query, then make a query with it - it magically changes to a string when you marshal back into a query string.

Think about the ZPublisher -> makequery -> ZPublisher round trip. I don't believe we should modify query strings. This specifically revolves around search page results that need to carry the search string into the pagination - The query string should remain unmodified except for batch size and start being appended in that specific case.

However, everything should be unicode.... So....

I'm really torn here.

@@ -204,8 +205,7 @@ def make_query(*args, **kwargs):
for i in range(len(qlist)):
k, m, v = qlist[i]
if type(v) == unicode:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that I've overlooked this one. As in line 296 this should be if isinstance(v, unicode):

self.assertEqual(simple_marshal(DateTime()), ":date")

def testMarshalUnicode(self):
self.assertEqual(simple_marshal(u'unic\xF3de'), ":utf8:ustring")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It least it would not hurt. :)

Copy link
Member

@icemac icemac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@icemac
Copy link
Member

icemac commented Nov 24, 2016

@tseaver What do you think about this PR?

@flipmcf
Copy link
Contributor Author

flipmcf commented Nov 29, 2016

@tseaver I took out a lot of the scope creep we had in the previous PR. I would be happy to attack that stuff in a separate PR tho.

@tseaver tseaver merged commit 50ce4db into zopefoundation:master May 5, 2017
@tseaver
Copy link
Member

tseaver commented May 5, 2017

@flipmcf and @icemac: Thanks for all the work!

@flipmcf flipmcf deleted the issue15 branch May 12, 2017 14:00
anand-sd pushed a commit to anand-sd/Zope that referenced this pull request Jul 29, 2024
CI setup for manylinux wheels on tagged commit
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants