-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for PyPy and Python 3.5+ #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM. I only have some questions resp. suggestions.
src/zope/app/locales/extract.py
Outdated
from zope.i18nmessageid import Message | ||
from zope.app.locales.interfaces import IPOTEntry, IPOTMaker, ITokenEater | ||
|
||
if not hasattr(tokenize, 'generate_tokens'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to understand if it does not contain a not
. Plus it would be nice to keep the Python 3 version in the if
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why I used an if
there. It should probably be a try/except ImportError
.
src/zope/app/locales/extract.py
Outdated
DEFAULT_CHARSET = 'UTF-8' | ||
DEFAULT_ENCODING = '8bit' | ||
_import_chickens = {}, {}, ("*",) # dead chickens needed by __import__ | ||
|
||
pot_header = '''\ | ||
pot_header = u'''\ | ||
############################################################################## | ||
# | ||
# Copyright (c) 2003-2004 Zope Foundation and Contributors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove or at least change these year numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/zope/app/locales/extract.py
Outdated
>>> entry = POTEntry(Message("Oe", default="\xd6")) | ||
>>> entry.write(FakeFile()) | ||
string with the DEFAULT_ENCODING, too (note that Message always runs | ||
``unicode`` on its value and default value; on Python 3, ``unicode`` is str, so if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should disallow bytes
on Python 3. There should be no code in the wild using bytes on Python 3. Porting code to Python 3 would end in a Python 3 str
by using 'asdf'
or u'asdf'
as message argument. If someone actually uses b'asdf'
he should change his code as MessageId
always was a subclass of unicode
.
But I fear this cannot be fixed here. *sigh*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's kinda messed up, not just on Python 3, but also on Python 2 if the C implementation of Message isn't available: you get a UnicodeDecodeError because the Python implementation does unicode(default)
.
The C implementation is not doing that, which is why it works. It actually looks like the C implementation isn't getting imported on Python 3, due to a bug in the import line. (Sigh.)
I think that the Python implementation of Message
should probably not be doing string conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zopefoundation/zope.i18nmessageid#6 Should fix both issues.
@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line): | |||
|
|||
def write(self, file): | |||
if self.comments: | |||
file.write(self.comments) | |||
file.write(self.comments.encode(DEFAULT_CHARSET) | |||
if not isinstance(self.comments, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easer to understand without the not
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the not
easier for two reasons. First, because bytes
is the same on Python 2 and 3. The alternative spelling is comments if isinstance(comments, text_type) else comments.encode(...)
. So now you have to go follow the indirection of text_type
to see what that's defined as, and keep in mind that it differs on Python 2 and 3.
Second, the emphasis of something needing to be done flows easier to me if it's the first part of the line: Do this if that otherwise default seems more clear than default if that otherwise do this.
But that's subjective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET)
.
I've no preference for either version.
for filename, line in self.locations: | ||
file.write('#: %s:%s\n' % (filename, line)) | ||
file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using the ''.format()
notation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bytes objects don't support .format
in Python 3 😦 %
is available again as of 3.5.
python3.6 -c 'print(b"abc".format)'
Traceback (most recent call last):
File "<string>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'format'
What did people do before 3.5? Manual joining? This module makes somewhat extensive use of %
formatting so it would be annoying to try to change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did people do before 3.5?
- Complain a lot.
- Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)
- Complain some more
- Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)
- Petition and convince Guido to add back %-formatting to byte strings in 3.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamadden Sorry, I was not aware of this problem.
src/zope/app/locales/extract.py
Outdated
# names[:] = filter(lambda x:x not in exclude, names) | ||
# files += [os.path.join(dirname, name) | ||
# for name in fnmatch.filter(names, pattern) | ||
# if name not in exclude] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented-out code?
engine.catalog['default'] = {} | ||
catalog.update(engine.catalog['default']) | ||
for msgid, locations in catalog.items(): | ||
catalog[msgid] = map(lambda l: (l[0], l[1][0]), locations) | ||
for msgid, locations in list(catalog.items()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is list
actually necessary here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. The next line is catalog[msgid] = ...
. So if the items aren't copied we're mutating the dictionary as we're iterating over it, which raises exceptions.
src/zope/app/locales/pygettext.py
Outdated
import operator | ||
|
||
|
||
if not hasattr(tokenize, 'generate_tokens'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto.
src/zope/app/locales/pygettext.py
Outdated
@@ -257,7 +265,7 @@ def __init__(self, options): | |||
def __call__(self, ttype, tstring, stup, etup, line): | |||
# dispatch | |||
## import token | |||
## print >> sys.stderr, 'ttype:', token.tok_name[ttype], \ | |||
## print(\, file=sys.stderr, 'ttype:', token.tok_name[ttype]) | |||
## 'tstring:', tstring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was an amusing take on \
. An automated refactoring tool/regexp replace, I presume?
src/zope/app/locales/pygettext.py
Outdated
isdocstring = 1 | ||
# k is the message string, v is a dictionary-set of (filename, | ||
# lineno) tuples. We want to sort the entries in v first by | ||
# file name and then by line number. | ||
v = v.keys() | ||
v = list(v.keys()) | ||
v.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v = sorted(v.keys())
?
… POTEntry equality and sorting.
- Remove unused functions from extract's TokenEater. - Tests for py_strings. - Pin to correct version of zope.i18nmessageid
The bug in zope.i18nmesageid has been addressed, so unless someone feels strongly about Python 3.4 support being part of this (held up by bytes formatting) I would say this is good to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -135,35 +160,58 @@ def addLocationComment(self, filename, line): | |||
|
|||
def write(self, file): | |||
if self.comments: | |||
file.write(self.comments) | |||
file.write(self.comments.encode(DEFAULT_CHARSET) | |||
if not isinstance(self.comments, bytes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you accidentally negated too much -- the alternative would be self.comments if isinstance(self.comments, bytes) else self.comments.encode(DEFAULT_CHARSET)
.
I've no preference for either version.
for filename, line in self.locations: | ||
file.write('#: %s:%s\n' % (filename, line)) | ||
file.write(b'#: %s:%d\n' % (filename.encode(DEFAULT_CHARSET), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did people do before 3.5?
- Complain a lot.
- Use b''.join([b'#: ', filename,encode(DEFAULT_CHARSET), b':', str(line).encode('ascii')]) (I do not recommend this!)
- Complain some more
- Use ('#: %s:%d\n' % (filename, line)).encode(DEFAULT_CHARSET)
- Petition and convince Guido to add back %-formatting to byte strings in 3.5
if default is not None: | ||
default = unicode(default) | ||
if default is not None and not isinstance(default, text_type): | ||
default = default.decode(DEFAULT_CHARSET) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. This changes the logic: previously this method would only accept Unicode or pure-ASCII bytestrings; now we're accepting arbitrary bytestrings and assuming a specific charset.
Was this intentional? Why?
Where is this helper method being called from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was intentional, although maybe it wasn't fully thought through?
It was mostly intended to make it symmetric with the behaviour exposed by the POTentry doctest exposed, where different things happen on Python 2 and Python 3 for text_type(b'\xd6')
. In that specific example, one raises an exception, one returns nonsense results; calling str
on bytes
objects in Python 3 is Not What You Want To Do.
Since UTF-8 is a strict superset of ASCII, this shouldn't be narrowing the possible inputs, just expanding them, which seemed reasonable to get consistent behaviour.
This method is called from the __openseen
method of the token eater state machine when it gets the closing )
character for the _('translated expression', 'default')
.
So since this is called from the tokenizer, this may not actually really matter. On all Python versions, the tokenizer produces native strings; on Python 3 it does this by determining the file's encoding using comments or the BOM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very good point: we don't want to get "b'Foo'"
on Python 3 with code that would raise UnicodeErrors on Python 2.
Especially when the only thing calling this is the tokenizer so it's all moot.
Thank you for taking your time to explain!
tokenize.COMMENT): | ||
# there was no class docstring | ||
self.__state = self.__waiting | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dead code elimination? sorry that was too cryptic
I'm guessing these methods were deleted because they were never called from anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. I verified in d6b9465 that (a) coverage testing didn't hit them (even though it does hit the nearly identical implementation in pygettext.py) and then (b) that this class implements a manual state machine, and the only way those two methods could be called was if they were manually assigned to self.__state
or explicitly called; they weren't; and (c) they were double-dunder private methods so nobody else should be calling them either.
print >> sys.stderr, ("Could not figure out the i18n domain" | ||
"for module %s, assuming it is OK" % module_name) | ||
print("Could not figure out the i18n domain" | ||
" for module %s, assuming it is OK" % module_name, file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing that missing space!
src/zope/app/locales/extract.py
Outdated
"output file: %r\n" \ | ||
"Python only: %r" \ | ||
% (base_dir, path, site_zcml, exclude_dirs, domain, | ||
include_default_domain, output_file, python_only)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't need to indent every line if you didn't add the spurious space after a (
;)
src/zope/app/locales/pygettext.py
Outdated
# Sort the entries. First sort each particular entry's keys, then | ||
# sort all the entries by their first item. | ||
reverse = {} | ||
for k, v in self.__messages.items(): | ||
keys = v.keys() | ||
keys = list(v.keys()) | ||
keys.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use keys = sorted(v.keys())
like elsewhere?
finally: | ||
sys.exit = _exit | ||
sys.stderr = stderr | ||
sys.stdout = stdout |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if using unittest.mock
wouldn't make this simpler.
(Well, the code's already been written, so probably not now ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might have...but not on Python 2.7 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
D'oh! I forgot that the difference between 2.6 and 2.7 was unittest2 (extra assertions, support for skipping), but unittest.mock showed up much later.
src/zope/app/locales/tests.py
Outdated
'-s', 'no such file'], 1) | ||
self.assertIn('does not exist', err.getvalue()) | ||
|
||
temp = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like when every invocation of tempfile.mkdtemp()
has a prefix='test-zope-app-locales-'
or something similar, so that people have a chance to track down which piece of software leaves clutter behind.
Second, can you maybe use self.addCleanup(shutil.rmtree, temp)
here so the directory is deleted even if the test fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good points. There are other tests that use mkdtemp
here and I just did what they did, but there's no reason not to clean them up too.
src/zope/app/locales/tests.py
Outdated
|
||
def test_extract(self): | ||
me = __file__ | ||
if me.endswith(".pyc"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well handle .pyo
with .endswith((".pyc", ".pyo"))
.
Not that I know anyone who runs unit tests of random software with python -O
. (I remember people arguing one shouldn't use bare assert
s for test in case they want to run the test suite with -O
, but I don't believe they actually do it.)
Anticipating a forthcoming request: @jamadden, you now have PyPI rights to zope.app.locales. |
Re: Python 3.4 -- I say ignore it until somebody files a wishlist bug explaining why they need Python 3.4 support. |
Thanks for the reviews! |
Python 3.5+ because it uses % formatting on byte strings.
Coverage is OK, but not super great.