-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue15 - Fix unicode handling in make_query() #78
Conversation
…Fix makequery to work with unicode
The previous pull suffered from some serious mission creep. This pull is designed to be easily reviewed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides the little change I'd request.
@@ -188,6 +203,9 @@ def make_query(*args, **kwargs): | |||
qlist = complex_marshal(list(d.items())) | |||
for i in range(len(qlist)): | |||
k, m, v = qlist[i] | |||
if type(v) == unicode: | |||
_default_encoding() | |||
v = v.encode(_DEFAULT_ENCODING) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nicer if _default_encoding()
returns _DEFAULT_ENCODING
. This makes the code more readable IMHO as the function has to be called anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the same thing.
if isinstance(v, unicode): | ||
_default_encoding() | ||
encoding = _DEFAULT_ENCODING | ||
return ':%s:ustring' % (encoding,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
@@ -166,6 +166,15 @@ def __init__(self, sequence, size, start=0, end=0, | |||
# "make_query(bstart=batch.end)" to the other. | |||
|
|||
|
|||
_DEFAULT_ENCODING = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was added as a suggestion by tseaver #4 (comment)
#Do not do this at import time. | ||
#Call '_default_encoding()' at run time to retrieve it from config, if present | ||
#If not configured, will be 'utf8' by default. | ||
_DEFAULT_ENCODING = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although it looks like this everywhere else:
# This may get overwritten during configuration
default_encoding = 'utf-8'
try: | ||
_DEFAULT_ENCODING = config.zpublisher_default_encoding | ||
except AttributeError: | ||
_DEFAULT_ENCODING = 'utf8' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should force utf8 for those who have not added a zpublisher_default_encoding in their config (eg - everyone getting this update) But release notes and docs should address the new configuration Maybe with this pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right: at least the changelog should inform about this change.
@@ -188,6 +203,9 @@ def make_query(*args, **kwargs): | |||
qlist = complex_marshal(list(d.items())) | |||
for i in range(len(qlist)): | |||
k, m, v = qlist[i] | |||
if type(v) == unicode: | |||
_default_encoding() | |||
v = v.encode(_DEFAULT_ENCODING) | |||
qlist[i] = '%s%s=%s' % (quote(k), m, quote(str(v))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the culprit - just throwing all values into str() without type checking for a unicode with non-ascii characters
However, for backwards compatibility, I think we should only do this for unicode values that cannot be encoded with ascii. I think after reading ZPublisher.HTTPRequest, it converts all string to unicode, so a round-trip from HTTPRequest, make_query(self.request.form)
for example, would magically turn strings into :utf8:ustring . This is untested, but I'm pretty sure it's true.
self.assertEqual(simple_marshal(DateTime()), ":date") | ||
|
||
def testMarshalUnicode(self): | ||
self.assertEqual(simple_marshal(u'unic\xF3de'), ":utf8:ustring") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also, for backwards compatibility:
self.assertEqual(simple_marshal(u'ascii_only_unicode'), "")
like we do for 'string' above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It least it would not hurt. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might hurt. If you get a unicode from a query, then make a query with it - it magically changes to a string when you marshal back into a query string.
Think about the ZPublisher -> makequery -> ZPublisher round trip. I don't believe we should modify query strings. This specifically revolves around search page results that need to carry the search string into the pagination - The query string should remain unmodified except for batch size and start being appended in that specific case.
However, everything should be unicode.... So....
I'm really torn here.
@@ -204,8 +205,7 @@ def make_query(*args, **kwargs): | |||
for i in range(len(qlist)): | |||
k, m, v = qlist[i] | |||
if type(v) == unicode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that I've overlooked this one. As in line 296 this should be if isinstance(v, unicode):
self.assertEqual(simple_marshal(DateTime()), ":date") | ||
|
||
def testMarshalUnicode(self): | ||
self.assertEqual(simple_marshal(u'unic\xF3de'), ":utf8:ustring") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It least it would not hurt. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found one more little glitch and documentation would be helpful, too.
See https://github.com/zopefoundation/Zope/pull/78/files/b8a135655a3d68d02379b4ec1976186d881ce92e..b4a5c62449b08b01dd87eaeb094b7561c40dbd24#diff-436468549e8e0256cc3d9427b6c63a04L206
@tseaver What do you think about this PR? |
@tseaver I took out a lot of the scope creep we had in the previous PR. I would be happy to attack that stuff in a separate PR tho. |
CI setup for manylinux wheels on tagged commit
#15